Can't convert string to lower case in Ruta - uima

I have a working Ruta script. All I want to do is convert a string variable to lowercase with ASSIGN(s1, toLowerCase(s2)), where both s1 and s2 are strings. The script works with ASSIGN(s1, s2), but as soon as I add toLowerCase it fails with an error that is not very helpful:
2021-08-28 11:27:39 ERROR AnnotateFileHandler:67 - org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class "org.apache.uima.ruta.engine.RutaEngine" failed. (Descriptor: )

I found an answer posted by Peter.
I had to change the way I configured my Ruta engine so that it imports the string functions, like this:
createEngineDescription(RutaEngine.class,
        RutaEngine.PARAM_MAIN_SCRIPT, "system8.annotator.system8",
        // register the extensions that provide the boolean and string functions (e.g. toLowerCase)
        RutaEngine.PARAM_ADDITIONAL_EXTENSIONS, new String[] {
                BooleanOperationsExtension.class.getName(),
                StringOperationsExtension.class.getName() })
Thank goodness for Peter Kluegl

Related

Spark: write text file without ignoring escape (backslash)

I'm trying to write a Dataset to a text file.
Example:
datasets.write.text(path)
What I want is to write "some\text" (the string the Dataset contains).
For Scala to interpret this string, the backslash has to be escaped in the literal, like this:
val text: String = "some\\text"
Of course, when I test this in Scala, it prints the correct value ("some\text").
But when I write this Dataset with spark.write, it appears to be written as "some\\text".
Reading the internal code, I only found an escape option for CSV writing.
Is there any way to solve this problem?
Thanks
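A useful first check is what the value contains at runtime, as opposed to how the literal is written in source. The following Java sketch is a plain-JVM stand-in for the Scala snippet: it does not call Spark, and the class name BackslashCheck is hypothetical. It shows that the literal "some\\text" holds a single backslash and that writing it with java.nio.file.Files (Java 11+) keeps exactly one.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Plain-Java check (no Spark): what does the escaped source literal contain at runtime?
class BackslashCheck {
    // "some\\text" in source code is the 9-character runtime string some\text
    static final String TEXT = "some\\text";

    // Counts the backslash characters in a string
    static long countBackslashes(String s) {
        return s.chars().filter(c -> c == '\\').count();
    }

    // Writes the string to a temporary file with plain NIO and reads it back
    static String roundTrip(String s) throws IOException {
        Path tmp = Files.createTempFile("backslash", ".txt");
        try {
            Files.writeString(tmp, s, StandardCharsets.UTF_8);
            return Files.readString(tmp, StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(TEXT);                    // prints: some\text
        System.out.println(TEXT.length());           // prints: 9
        System.out.println(countBackslashes(TEXT));  // prints: 1
        System.out.println(roundTrip(TEXT));         // prints: some\text
    }
}
```

If the file written by Spark shows two backslashes, it may be worth verifying, as an assumption to test rather than a known cause, whether the Dataset already held two before the write, for example because the value came from a source that escaped it.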

Find String Length in Azure Data Factory v2

I'm starting out with ADFv2 and am facing an issue while trying to find the length of a string. Although the portal says length('abc') should work, it does not work for me. What is the workaround for this?
If the variable is of type String, convert the value to a string, because length() returns an integer (see the string conversion function: https://learn.microsoft.com/en-us/azure/data-factory/control-flow-expression-language-functions#string):
string(length('abc'))
Hope this helped!

RUTA: Multiline Annotation

Full disclosure: New to RUTA.
I have a multi-line match that uses a regex to find the entity, but now I need the line break removed from the annotation.
My Ruta rule looks like this: "(?i)\\b[A-Z]{2}[[0-9]{1,}[\n]{0,}[0-9]{1,}]{1,}" -> EntitType;
My results end up like this:
S01234
25475
How can I get it to be S0123425475?
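An annotation always covers a span of the original document text, so its covered text still contains the line break. Instead, you can store a modified string in a feature, for example: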
DECLARE EntitType (String normalized);
e:EntitType{-> e.normalized = replaceAll(e.ct, "\n", "")};
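If the input may contain Windows line endings, replacing only "\n" leaves a "\r" behind. As an illustration of the same normalization step outside Ruta, here is a Java sketch using String.replaceAll with a pattern that covers "\n", "\r\n" and a lone "\r". The class name NormalizeEntity is hypothetical, and whether Ruta's replaceAll accepts the same pattern syntax is an assumption to check.

```java
// Java illustration (not Ruta): normalize an entity's covered text by dropping line breaks.
class NormalizeEntity {
    // Removes Windows (\r\n), Unix (\n) and lone \r line breaks
    static String normalize(String coveredText) {
        return coveredText.replaceAll("\\r?\\n|\\r", "");
    }

    public static void main(String[] args) {
        System.out.println(normalize("S01234\n25475"));    // prints: S0123425475
        System.out.println(normalize("S01234\r\n25475"));  // prints: S0123425475
    }
}
```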
DISCLAIMER: I am a developer of UIMA Ruta

How to retrieve compound words from a string list - UIMA Ruta

Sample Script:
DECLARE Name,TEST;
"Peter"->Name;
"der Groot"->Name;
"Robert"->Name;
"de Leew"->Name;
"O'Sullivan"->Name;
STRING s;
STRINGLIST slist;
Name{-> MATCHEDTEXT(s), ADD(slist,s),LOG(s)};
ANY+ {INLIST(slist)->MARK(TEST)};
Received Output:
Peter
Robert
Expected Output:
Peter
der Groot
Robert
de Leew
O'Sullivan
Sample Input:
Peter
der Groot
Robert
de Leew
O'Sullivan
I tried to mark the STRINGLIST values with an annotation type, but the received output differs from the expected output.
The condition at the rule element ANY+ is validated for every single ANY, so it fails as soon as one token is not in the list (e.g. "der" in "der Groot"), and the rule can only match single tokens.
Should the last rule annotate only positions directly after Name annotations?
If not, then you can do something like this:
Name{-> MATCHEDTEXT(s), ADD(slist,s)};
MARKFAST(TEST, slist);
If yes, the situation gets more complicated because you do not have candidates with the correct span. You cannot solve this with a combination of ANY and INLIST: you need either candidates with the correct span or fragments in the list. Instead, I'd recommend an additional rule that fixes the result:
Name{-> MATCHEDTEXT(s), ADD(slist,s)};
MARKFAST(TEST, slist);
ANY{-ENDSWITH(Name)} #TEST{-> UNMARK(TEST)};
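To make the difference between the two approaches concrete, here is a small Java illustration. It is not Ruta's implementation, the class and method names are hypothetical, and the regex tokenizer only approximates Ruta's (it splits O'Sullivan into O, ' and Sullivan). A per-token lookup, like ANY+{INLIST(slist)}, finds only single-token entries, while searching for each complete list entry, roughly the idea behind MARKFAST with a string list, also finds the multi-word names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DictionaryMatch {
    // Rough tokenizer: runs of letters, runs of digits, or any single other non-space character
    private static final Pattern TOKEN = Pattern.compile("\\p{L}+|\\p{N}+|\\S");

    // Per-token lookup: each token is checked against the list on its own
    static List<String> perTokenMatches(String text, Set<String> entries) {
        List<String> found = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            if (entries.contains(m.group())) {
                found.add(m.group());
            }
        }
        return found;
    }

    // Whole-entry search: every occurrence of every entry, ordered by position.
    // Naive: uses indexOf and does not check token boundaries.
    static List<String> wholeEntryMatches(String text, Set<String> entries) {
        List<int[]> spans = new ArrayList<>();
        for (String entry : entries) {
            for (int i = text.indexOf(entry); i >= 0; i = text.indexOf(entry, i + 1)) {
                spans.add(new int[] {i, i + entry.length()});
            }
        }
        spans.sort((a, b) -> Integer.compare(a[0], b[0]));
        List<String> found = new ArrayList<>();
        for (int[] span : spans) {
            found.add(text.substring(span[0], span[1]));
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Peter\nder Groot\nRobert\nde Leew\nO'Sullivan";
        Set<String> names = Set.of("Peter", "der Groot", "Robert", "de Leew", "O'Sullivan");
        System.out.println(perTokenMatches(text, names));    // [Peter, Robert]
        System.out.println(wholeEntryMatches(text, names));  // [Peter, der Groot, Robert, de Leew, O'Sullivan]
    }
}
```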
DISCLAIMER: I am a developer of UIMA Ruta

Can't enter raw_parser in PostgreSQL?

I'm trying to analyze how PostgreSQL parses a query. After tracing the PostgreSQL source code by embedding printf() calls here and there, I found that the query is parsed into a raw parse tree by raw_parser, which is located in parser.c.
The strange thing is that I've already embedded a dummy printf() in raw_parser, but after reinstalling PostgreSQL and executing a query, my dummy printf() output does not appear on the screen!
Can anybody help me find where I went wrong?
Thanks in advance :D
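If you use fprintf(stderr, "...."), you will find the output in the server log. Don't forget that you are not working with the server directly: the query runs in a server backend process, and what that process prints does not go to your client session. For debugging purposes there is the elog function; at the NOTICE level it works like printf for the client application, because the message is sent to the client:
elog(NOTICE, "some text");
The format string works like printf's, but remember that PostgreSQL's own string types differ from C strings: a text value is not a zero-terminated C string, so you cannot pass it to %s directly. Convert it first (for example with text_to_cstring()); integer and float variables can be printed as usual.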