UIMA pos tagger invalid output


I gave two inputs to the UIMA tagger:
1) JOHN IS VERY HAPPY TODAY.
2) john is very happy today.
In case 1) everything turns out to be a name, and in case 2) nothing turns out to be a name.
Can someone please help me with this?

POS taggers tend to a) be case sensitive and b) fall back to noun for words they do not know. Thus what probably happens is this: for 1), it knows none of the words and falls back to noun. For 2), it does not recognize "john" as a name because it does not start with a capital letter, and believes it to be something else.
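To make the two effects concrete, here is a toy sketch in Python. It is not UIMA and not how any real tagger is implemented: the `LEXICON` dictionary, the `toy_tag` function and the fallback rules are all made up for illustration, assuming a case-sensitive lookup and a "capitalized unknown word is a name" guess.

```python
# Toy illustration only: LEXICON and the fallback rules are hypothetical
# and are not taken from UIMA or any real tagger.
LEXICON = {"is": "VERB", "very": "ADV", "happy": "ADJ", "today": "NOUN"}

def toy_tag(sentence):
    tags = []
    for word in sentence.rstrip(".").split():
        if word in LEXICON:
            # Case-sensitive lookup: "is" is known, "IS" is not.
            tags.append((word, LEXICON[word]))
        elif word[0].isupper():
            # Unknown and capitalized: guess that it is a name.
            tags.append((word, "NAME"))
        else:
            # Unknown and lowercase: fall back to a plain noun.
            tags.append((word, "NOUN"))
    return tags

upper = toy_tag("JOHN IS VERY HAPPY TODAY.")
lower = toy_tag("john is very happy today.")
print(upper)
print(lower)
```

With these assumed rules, every word of the upper-case sentence misses the lexicon and is guessed to be a name, while the lower-case sentence produces no name at all.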

Related

Multiple regex in one command

Disclaimer: I have no engineering background whatsoever - please don't hold it against me ;)
What I'm trying to do:
Scan a bunch of text strings and find the ones that
- are more than one word
- contain title case (at least one capitalized word after the first one)
- but exclude specific proper nouns that don't get checked for title case
- and disregard any parameters in curly brackets
Example: Today, a Man walked his dogs named {FIDO} and {Fifi} down the Street.
Expectation: Flag the string for title capitalization because of Man and Street, not because of Today, {FIDO} or {Fifi}
Example: Don't post that video on TikTok.
Expectation: No flag because TikTok is a proper noun
I have bits and pieces, none of them error-free according to what https://www.regextester.com/ keeps telling me, so I'm really hoping for help from this community.
What I've tried (piecemeal, but not all together):
(?=([A-Z][a-z]+\s+[A-Z][a-z]+))
^(?!(WordA|WordB)$)
^((?!{*}))

I think your problem is not really solvable solely with regex...
My recommendation would be to split the input on [\s\W]+ (e.g. with Python's re.split; if you really need strings with more than one word, you can check the length of the result), keep each resulting word whose first character is uppercase (e.g. with Python's str.isupper applied to that first character), and finally filter against a dictionary.
[\s\W]+ matches all whitespace and non-word characters, yielding words...
The reasoning behind this different approach: compiling all "proper nouns" into a regex is practically impossible, and isupper also works with non-Latin letters (e.g. when your strings are Unicode, [A-Z] won't be sufficient to detect uppercase). Filtering with a dictionary is a more straightforward approach and much easier to maintain (I would recommend using a set or another data type suited for fast lookups).
Maybe if you can define your use case more clearly, we can work out a pure regex solution...
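A minimal sketch of that approach in Python might look like the following. The function name `title_case_words`, the `PROPER_NOUNS` set and the step that blanks out `{...}` parameters before splitting are assumptions added for illustration; they are not part of the recommendation above.

```python
import re

# Hypothetical allow-list of proper nouns that should never trigger a flag.
PROPER_NOUNS = {"TikTok"}

def title_case_words(text, proper_nouns=PROPER_NOUNS):
    # Blank out {parameters} so they are never inspected.
    cleaned = re.sub(r"\{[^{}]*\}", " ", text)
    # Split on runs of whitespace / non-word characters, dropping empty pieces.
    words = [w for w in re.split(r"[\s\W]+", cleaned) if w]
    if len(words) < 2:
        return []
    # Skip the first word; keep later words that start with an uppercase
    # character and are not known proper nouns.
    return [w for w in words[1:] if w[0].isupper() and w not in proper_nouns]

flagged = title_case_words(
    "Today, a Man walked his dogs named {FIDO} and {Fifi} down the Street."
)
not_flagged = title_case_words("Don't post that video on TikTok.")
print(flagged)      # ['Man', 'Street']
print(not_flagged)  # []
```

A string is flagged when the returned list is non-empty. Keeping the proper nouns in a set means adding a new exception is a data change rather than a regex change.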

How do I use pg_trgm to be more permissible

I used pg_trgm to check string matches and I am pretty happy with the results, but it does not behave exactly the way I want. I want searches like "poduto" to find "produtos" (the r is missing), and also "sofáa" to find "sofa". I am using PostgreSQL 9.6.
It does find "vermelho" when I type "vermelo" (the h is missing), and it does find "sofa" when I type "sof". It seems that only some letters in the middle can be left out, while a final letter can always be missing. I want to be able to miss any letter in the middle of the word, and also to make two mistakes, as in the case of sofáa and sofá (I used an accent and one additional "a").

The solution is to lower pg_trgm.similarity_threshold (or pg_trgm.word_similarity_threshold if you are using <% or %>).
Then words with lower similarity will also be found.
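To see why the threshold matters, here is a rough Python approximation of trigram similarity. It follows the rule described in the pg_trgm documentation (lowercase, pad each word with two leading spaces and one trailing space, compare the sets of trigrams), but it is only a sketch: the function names are hypothetical, and it ignores locale and accent handling, so real scores from PostgreSQL may differ.

```python
import re

def trigrams(text):
    # Approximation: lowercase, split into alphanumeric words, pad each
    # word with two leading spaces and one trailing space, then collect
    # every 3-character window.
    grams = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams

def similarity(a, b):
    # Shared trigrams divided by all distinct trigrams of both strings.
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def matches(a, b, threshold=0.3):
    # Sketch of the threshold check; 0.3 is the documented default.
    return similarity(a, b) >= threshold

print(similarity("vermelo", "vermelho"))
print(matches("vermelo", "vermelho", threshold=0.6))
print(matches("vermelo", "vermelho", threshold=0.5))
```

The same pair of words is rejected at a high threshold and accepted at a lower one, which is the effect of lowering the setting.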

why I can't use capital as the first character of the var in akka cluster distribute data example

I am trying the official example on https://doc.akka.io/docs/akka/current/distributed-data.html#using-the-replicator
(The first scala example on this page)
But it seems strange when I change my code a little bit.
I recorded a video of what I changed in the code. The only change I made is the name of the variable on line 16, from DataKey to dataKey. I just renamed it.
https://photos.app.goo.gl/CZrnNZlW85e9MaF73
Now the question is why this happens.
Can't I use a capital as the first character of the var in this example?
Please help me figure this out. Thanks very much.
Akka Version:2.5.9
Scala Version:2.11.12
IDE:IntelliJ IDEA 2017.3.3 Community Edition

Regarding the pattern matching with @: the @ binds the whole matched object to a name, so you can work with the object itself after the match. In your example, if the incoming message is a Changed(DataKey) object, it is bound to c, and you then retrieve the value for DataKey through the get method on the object itself:
case c @ Changed(DataKey) ⇒
val data = c.get(DataKey)

I finally found the answer to the question!
https://www.safaribooksonline.com/library/view/programming-scala-2nd/9781491950135/ch04.html
There are a few rules and gotchas to keep in mind when writing case clauses. The compiler assumes that a term that starts with a capital letter is a type name, while a term that begins with a lowercase letter is assumed to be the name of a variable that will hold an extracted or matched value.
In case clauses, a term that begins with a lowercase letter is assumed to be the name of a new variable that will hold an extracted value. To refer to a previously defined variable, enclose it in back ticks. Conversely, a term that begins with an uppercase letter is assumed to be a type name.

I can't understand the behaviour of btrim()

I'm currently working with PostgreSQL. I learned about the function btrim and checked many websites for an explanation, but I don't really understand it.
Here they mention this example:
btrim('xyxtrimyyx', 'xyz')
It gives trim.
When I try this example:
btrim('xyxtrimyyx', 'yzz')
or
btrim('xyxtrimyyx', 'y')
I get this: xyxtrimyyx
I don't understand this. Why didn't it remove the y?

From the docs you point to, the definition says:
Remove the longest string consisting only of characters in characters
(a space by default) from the start and end of string
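Your examples don't work because the function strips from both ends of the text, and only for as long as the characters it meets are among those specified. 'xyxtrimyyx' begins and ends with x, which is in neither 'yzz' nor 'y', so stripping stops immediately on both sides and the string comes back unchanged.
Let's take a look at the first example (from the docs):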
btrim('xyxtrimyyx', 'xyz')
This returns trim, because it goes through xyxtrimyyx and gets up to the t and doesn't see that letter in xyz, so that is where the function stops stripping from the front.
We are now left with trimyyx
Now we do the same, but from the end of the string.
While one of xyz is the last letter, remove that letter.
We do this until m, so we are left with trim.
Note: I have never worked with any form of SQL. I could be wrong about the exact way that PostgreSQL does this, but I am fairly certain from the docs that this is how it is done.
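The walk-through above can be sketched in Python. This is an illustration of the documented behaviour, not PostgreSQL's actual implementation, and the name `btrim_sketch` is made up for the example. Python's built-in `str.strip(chars)` follows the same rule, so it is used as a cross-check.

```python
def btrim_sketch(text, characters=" "):
    start, end = 0, len(text)
    # Advance from the front while the current character is in the set.
    while start < end and text[start] in characters:
        start += 1
    # Retreat from the back under the same rule.
    while end > start and text[end - 1] in characters:
        end -= 1
    return text[start:end]

print(btrim_sketch("xyxtrimyyx", "xyz"))  # trim
print(btrim_sketch("xyxtrimyyx", "yzz"))  # xyxtrimyyx
print(btrim_sketch("xyxtrimyyx", "y"))    # xyxtrimyyx

# Cross-check against Python's built-in, which behaves the same way.
print("xyxtrimyyx".strip("xyz"))          # trim
```

The second argument is a set of characters, not a substring: repeating a letter ('yzz') changes nothing, and a y in the middle of the string is never reached because the x at each end stops the scan first.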

Unexpected behavior in uppercase regexp using Emacs

I am trying to have an inbetween step for deleting words backwards, which should stop at a capital letter (in the case of camelCase).
For this I thought to use the following to obtain the position of the first capital letter backwards:
(search-backward-regexp "[:upper:]")
If you run this with point after the last parenthesis, point moves here:
(search-backward-regexp "[:upper | :]"), that is, to just after the r.
How so?

(search-backward-regexp "[[:upper:]]")
On its own, [:upper:] is not yet the "upper" character class; it is simply a bracket expression that matches a single character, which has to be one of ":", "u", "p", "e" or "r".
Only the second pair of brackets makes it search for the class.
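The single-bracket reading can be reproduced outside Emacs. The sketch below uses Python's re module purely as an illustration: Python has no [[:upper:]] class at all, so it only demonstrates how "[:upper:]" is read as a plain set of characters, and taking the last match is an assumed stand-in for a backward search from the end of the text.

```python
import re

text = '(search-backward-regexp "[:upper:]")'

# With one pair of brackets this is just the set {":", "u", "p", "e", "r"}.
hits = list(re.finditer(r"[:upper:]", text))

# The last hit is the nearest match when scanning back from the end.
last = hits[-1]
print(repr(last.group()))
print(text[:last.start()])

# An actual uppercase test finds nothing in this text.
print(re.search(r"[A-Z]", text))
```

The nearest match is the final colon, which is why point ends up just after the r of "upper".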

Andreas's answer is correct. However, based on what you are trying to achieve, I would suggest taking a look at subword-mode (it comes bundled with Emacs, at least in modern versions).

I realize now that it has to do with the variable case-fold-search.
When set to t, it will ignore case in searches.
