UIMA Ruta wordlist case ignore - uima

I have a list of words to match in a WORDLIST, "MonthNames.txt".
I want to mark all occurrences of these words in a given document, regardless of case.
PACKAGE uima.ruta.example;
WORDLIST MonthNameList = 'MonthNames.txt';
DECLARE MonthNames;
DECLARE MonthNameValue;
// Regex to be used in finding dates
STRING monthNameValueRegex = "(?i)(january|february|march|april|may|june|july|august|september|october|november|december|jan|feb|mar|apr|jun|jul|aug|sept|oct|nov|dec)";
// Mark month name
Document{-> MARKFAST(MonthNames, MonthNameList)};
Document{CONTAINS(MonthNames) -> MARK(MonthNameValue)};
Document{REGEXP(monthNameValueRegex) -> MARK(MonthNameValue)};
Is there any way to do this?
I tried:
Document{-> MARKFAST(MonthNames, MonthNameList,true)};
But as far as I can tell, that only ignores whitespace, not case.
Please help.

Passing true as the third argument to MARKFAST makes the match ignore case.
Document{-> MARKFAST(MonthNames, MonthNameList,true)};
Thanks to Peter for his help.

Related

How to determine if a string contains any element of a set

I have a sentence, and I want to determine if it contains any elements of a set.
val sentence = "Hello, today is a fine day to learn scala"
val mySet = Set("day", "scala")
What about:
mySet.exists(word => sentence.contains(word))
It will return true if at least one word from the set is present in the string.
Here's a solution that:
is case-insensitive ("scala" matches "Scala")
ignores substrings ("rat" does not match "rats")
ignores punctuation (!?,-) unless it is explicitly included in mySet
mySet.mkString("(?i)\\b(", "|", ")\\b")
.r.unanchored
.matches(sentence)
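A variation on the regex approach above, as a minimal sketch: it quotes each word with Pattern.quote so that regex metacharacters in the set are treated literally, and it uses findFirstIn to search anywhere in the sentence. The object name WordSetMatch and the helper containsWord are made up for illustration, and returning false for an empty set is an assumption about the desired behaviour.

```scala
import java.util.regex.Pattern

object WordSetMatch {
  // Hypothetical helper: true if `sentence` contains any element of `words`
  // as a whole word, ignoring case.
  def containsWord(sentence: String, words: Set[String]): Boolean = {
    // Pattern.quote escapes regex metacharacters inside each word.
    val alternatives = words.map(w => Pattern.quote(w)).mkString("|")
    val regex = ("(?i)\\b(?:" + alternatives + ")\\b").r
    // An empty set would otherwise produce a pattern that matches at any word boundary.
    words.nonEmpty && regex.findFirstIn(sentence).isDefined
  }
}
```

For example, WordSetMatch.containsWord(sentence, mySet) can be used in place of the chained call on mySet.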

WORDTABLE - Not matching the word - UIMA RUTA

I've tried to match words using WORDTABLE, but some text is not matched.
In the input below, the word Afghanistan is not matched. If I remove A Coruña;n.a. from the WORDTABLE, it is matched.
Sample Input:
Afghanistan
Report
report
Sample CSV ( test.csv):
Afghanistan;Afghan.
report;rep.
A Coruña;n.a.
Code:
PACKAGE uima.ruta.example;
RETAINTYPE(SPACE);
WORDTABLE Table = 'test.csv';
DECLARE Annotation Abbr(STRING short);
Document{->MARKTABLE(Abbr, 1, Table,true,0,"",0, "short" = 2)};
RETAINTYPE;
This is most likely caused by the whitespace in the wordlist. There are several options to avoid this problem, e.g., activating the configuration parameter dictRemoveWS.

Scala: Transforming List of Strings containing long descriptions to list of strings containing only last sentences

I have a List[String], for example:
val test=List("this is, an extremely long sentence. Check; But. I want this sentence.",
"Another. extremely. long. (for eg. description). But I want this sentence.",
..)
I want the result to be like:
List("I want this sentence", "But I want this sentence"..)
I tried a few approaches, but they didn't work:
test.map(x=>x.split(".").reverse.head)
test.map(x=>x.split(".").last)
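Try this (it returns only the last sentence of the last string in the list):
test.reverse.head.split("\\.").last
To handle any exception (for example, from an empty list), wrap the call in scala.util.Try:
import scala.util.Try
Try(List[String]().reverse.head.split("\\.").last).getOrElse("YOUR_DEFAULT_STRING")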
You can map over your List, split each String, and then take the last element. Try the code below.
val list = List("this is, an extremely long sentence. Check; But. I want this sentence.",
"Another. extremely. long. (for eg. description). But I want this sentence.")
list.map(_.split("\\.").last.trim)
It will give you
List(I want this sentence, But I want this sentence)
test.map (_.split("\\.").last)
split takes a regular expression, in which the dot matches any character, so you have to escape it.
Maybe you want to include question marks and exclamation marks:
test.map (_.split("[!?.]").last)
and trim surrounding whitespace:
test.map (_.split("[!?.]").last.trim)
reverse.head would also have worked, but last already does the same job:
scala> test.map (_.split("[!?.]").reverse.head.trim)
res138: List[String] = List(I want this sentence, But I want this sentence)
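To see why the escape matters, here is a small sketch comparing the three ways of splitting one of the sample sentences. The object name SplitOnDot and its value names are made up for illustration.

```scala
object SplitOnDot {
  val text = "Check; But. I want this sentence."

  // "." is a regex that matches any character, so every character is a
  // delimiter; once trailing empty strings are dropped, nothing is left.
  val unescaped: Array[String] = text.split(".")

  // "\\." matches a literal full stop.
  val escaped: Array[String] = text.split("\\.")

  // A Char argument is taken literally, so no escaping is needed.
  val byChar: Array[String] = text.split('.')
}
```

Calling .last on the empty array produced by the unescaped version throws an exception, which is why the original attempts failed.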
You can do this a number of ways:
For each string in your original list: split by ., reverse the list, take the first value
test.map(_.split('.').reverse.headOption)
// List(Some( I want this sentence), Some( But I want this sentence))
.headOption results in Some("string") or None, and you can do something like a .getOrElse("no valid string found") on it. You can trim the unwanted whitespace if you want.
Regex match
test.map { sentence =>
  val regex = ".*\\.\\s*([^.]*)\\.$".r
  val regex(value) = sentence
  value
}
This captures the text at the end of a string that is preceded by a full stop (plus any whitespace) and followed by the final full stop. You can adjust the regex to change the rules; regex101.com is a very good place to experiment if you want to learn more about regular expressions.
This approach is better suited to more complicated examples and requirements, so it's worth keeping in mind. If you are worried that the regex might not match (the extraction throws a MatchError in that case), you can check that it matches before extracting:
test.map { sentence =>
  val regexString = ".*\\.\\s*([^.]*)\\.$"
  val regex = regexString.r
  if(sentence.matches(regexString)) {
    val regex(value) = sentence
    value
  } else ""
}
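An alternative to checking matches first, as a minimal sketch: use the regex in a pattern match and return an Option, so the no-match case is handled without evaluating the pattern twice. The object name RegexLastSentence and the helper extract are made up for illustration.

```scala
object RegexLastSentence {
  // Same pattern as above: a full stop, optional whitespace, then the final
  // sentence (containing no full stops) followed by the closing full stop.
  private val LastSentence = ".*\\.\\s*([^.]*)\\.$".r

  // Hypothetical helper: Some(sentence) when the pattern matches, None otherwise.
  def extract(text: String): Option[String] = text match {
    case LastSentence(value) => Some(value)
    case _                   => None
  }
}
```

With this, test.map(s => RegexLastSentence.extract(s).getOrElse("")) gives the same result as the version with the explicit matches check.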
Take the last after splitting the string by .
test.map(_.split('.').map(_.trim).lastOption)
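Putting the split-based pieces together, here is a sketch that splits on a literal full stop, trims each part, drops empty parts, and falls back to a default when nothing is left. The object name LastSentence and the helper lastSentence are made up for illustration; the fallback text is the one suggested above.

```scala
object LastSentence {
  // Hypothetical helper: the last non-empty, trimmed sentence of `text`, if any.
  def lastSentence(text: String): Option[String] =
    text.split('.').map(_.trim).filter(_.nonEmpty).lastOption

  val test = List(
    "this is, an extremely long sentence. Check; But. I want this sentence.",
    "Another. extremely. long. (for eg. description). But I want this sentence."
  )

  // Fall back to a placeholder when a string contains no sentence at all.
  val result: List[String] =
    test.map(s => lastSentence(s).getOrElse("no valid string found"))
}
```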

swift, how i find the letter that i want in sentence or word?

I want to find letters within the elements of the variable a.
What code can I use to search?
Example:
var a = ["coffee","juice","water"]
The search letters are "co".
The search result should be "coffee".
Which search method should I use?
First, you need to go through the array and select the elements that match a condition; the filter method does that. Here the condition is that a word contains a given string, so use contains (called containsString before Swift 3):
a.filter { $0.contains("co") }

find whether a string is substring of other string in SML NJ

In SML/NJ, I want to find out whether a string is a substring of another string, and to find its index. Can anyone help me with this?
The Substring.position function is the only one I can find in the basis library that seems to do string search. Unfortunately, the Substring module is kind of hard to use, so I wrote the following function to use it. Just pass two strings, and it will return an option: NONE if not found, or SOME of the index if it is found:
fun index (str, substr) = let
  val (pref, suff) = Substring.position substr (Substring.full str)
  val (s, i, n) = Substring.base suff
in
  if i = size str then
    NONE
  else
    SOME i
end;
You have all the Substring functions, but if you also want the position of the match, the easiest option is to do it yourself with a linear scan.
Explode both strings into character lists, then compare the first character of the substring with each character of the source string, incrementing a position counter each time the comparison fails. When the characters match, move on to the next character of the substring as well, without moving the position counter. If the substring becomes empty (that is, you are left with the empty list), you have matched all of it and can return the position counter. If the matching fails partway through, go back to where the first match started, skip one character (incrementing the position counter), and start over.
Hope this helps you get started on doing this yourself.
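The linear scan described above can be sketched as follows. This is written in Scala rather than SML, and it indexes into the strings instead of exploding them into lists, so treat it as an illustration of the idea and not a translation. The object name NaiveSearch and the helper indexOfSub are made up for illustration.

```scala
object NaiveSearch {
  // Hypothetical helper: index of the first occurrence of `sub` in `str`,
  // found by trying each start position in turn (naive linear scan).
  def indexOfSub(str: String, sub: String): Option[Int] = {
    // True if `sub` matches `str` starting at position `start`.
    def matchesAt(start: Int): Boolean =
      sub.indices.forall(j => str.charAt(start + j) == sub.charAt(j))

    // Only start positions that leave room for all of `sub` are candidates.
    (0 to str.length - sub.length).find(matchesAt)
  }
}
```

Like the SML function above, it returns an option: None if the substring is not found, or Some of the index if it is.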