UIMA Ruta: annotate a fixed-length sequence of words from a specific word list

I have a WORDTABLE containing numbers expressed as strings (zero, one, two, ..., n) plus the respective digits as features. I am trying to annotate a sequence of a fixed length of stringified numbers.
E.g.:
one two three four -> should be annotated
one two three four five six -> should not be annotated
So far I have done
WORDTABLE numbers = "numbers.csv";
DECLARE Annotation number(STRING int_string, STRING digit);
DECLARE Annotation numberSequence;
Document{-> MARKTABLE(number, 1, numbers, "digit" = 2)};
(number number) {-> MARK(numberSequence)};
This matches a sequence of stringified numbers. What I want is to fix the length of the sequence, something like:
number[4,4] {-> MARK(numberSequence)};
where the minimum and maximum number of stringified numbers in the sequence are both set to the same value, for example 4.
Is it possible to do this?

Here's an exemplary rule for annotating text positions if there are exactly four annotations of the type number:
ANY{-PARTOF(number)} @number[4,4] {-> MARK(numberSequence)} ANY{-PARTOF(number)};
DISCLAIMER: I am a developer of UIMA Ruta
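To make the matching logic of the rule above easier to follow outside of Ruta, here is a minimal Python sketch of the same idea: find maximal runs of number words and keep only the runs whose length is exactly four. It is not Ruta code; NUMBER_WORDS is a hypothetical stand-in for the WORDTABLE and find_number_sequences is a name made up for this illustration.

```python
import re

# Hypothetical stand-in for the WORDTABLE: number words mapped to digits.
NUMBER_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def find_number_sequences(text, length=4):
    """Return (start, end) token index pairs of maximal runs of number
    words whose run length is exactly `length` (end is exclusive)."""
    tokens = re.findall(r"\w+", text.lower())
    spans = []
    run_start = None
    # The empty sentinel token is never a number word, so it closes a
    # run that reaches the end of the text.
    for i, tok in enumerate(tokens + [""]):
        if tok in NUMBER_WORDS:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start == length:
                spans.append((run_start, i))
            run_start = None
    return spans


print(find_number_sequences("dial one two three four now"))
print(find_number_sequences("one two three four five six"))
```

One difference worth noting: the rule as written has an ANY element on both sides of the sequence, while this sketch also accepts a run at the very start or end of the text.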

Related

Extract specific values (n digit long and starting with a number) PSQL

I have two strings like these and I want to extract the 6 digits starting with "2007" using PostgreSQL. Please help.
"472a62b98b-2004-07-l-de_k-de_n-email_t-cus_200703"
"21d6bd96f2-2004-04-l-de_c-de_m-email_t-cus_200705-b"
Results would be:
"200703"
"200705"
You can use substring() with a regex:
substring(the_column from '2007[0-9]{2}')
This extracts the first substring that starts with 2007 and is followed by (exactly) two digits.
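For comparison, the same pattern can be tried outside the database. This is a small Python sketch using the standard re module; extract_period is a hypothetical helper name, and returning None for a non-match is chosen here to mirror the NULL that substring() gives when nothing matches.

```python
import re


def extract_period(value, prefix="2007"):
    """Return the first occurrence of `prefix` followed by two digits,
    or None when the value contains no such substring."""
    match = re.search(re.escape(prefix) + r"[0-9]{2}", value)
    return match.group(0) if match else None


samples = [
    "472a62b98b-2004-07-l-de_k-de_n-email_t-cus_200703",
    "21d6bd96f2-2004-04-l-de_c-de_m-email_t-cus_200705-b",
]
for s in samples:
    print(extract_period(s))
```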

Find rows where string contains certain character at specific place

I have a field in my database, that contains 10 characters:
E.g.: 1234567891
I want to find the rows where the field has, e.g., the digits 8 and 9 in places 5 and 6.
So for example,
if the rows are
a) 1234567891
b) 1234897891
c) 1234877891
I only want b) returned in my select.
The type of the field is string/character varying.
I have tried using:
where field like '%89%'
but that won't work, because I need it to be 89 at a specific place in the string.
The fastest solution would be
WHERE substr(field, 5, 2) = '89'
If the positions are not adjacent, you end up with two conditions joined with AND.
In a LIKE pattern, the underscore (_) matches exactly one character, so you can skip the first four characters with four underscores:
where field like '____89%'
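To check the positional logic against the three example rows, here is a short Python sketch. has_at_position is a hypothetical helper that takes a 1-based position like SQL's substr(); the rows are the ones from the question.

```python
def has_at_position(value, needle, position):
    """True if `needle` occurs in `value` starting at the 1-based
    `position`, mirroring substr(value, position, len(needle))."""
    start = position - 1
    return value[start:start + len(needle)] == needle


rows = {"a": "1234567891", "b": "1234897891", "c": "1234877891"}
matches = [key for key, value in rows.items()
           if has_at_position(value, "89", 5)]
print(matches)
```

Note that all three rows contain "89" at places 8 and 9, which is why the position has to be 5 and why '%89%' is too loose.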

Cutting off decimals in Year conversion in Crystal Syntax

I'm printing the following year as a string in a report but it prints as 2,018.00. How do I have it print as a four digit year string without decimals or the comma? The Truncate() didn't seem to work.
CStr (Year({Date}) + 1)
You can either omit the CStr-function and set the number format on the formatting tab or, if the formula needs to return a string, you can use the arguments of the CStr- or ToText-function (which are equivalent).
Either set the second argument to define the number format:
CStr(Year({Date}) + 1, "####")
Or set the second and third arguments to use 0 decimals and an empty string as the thousands separator:
CStr(Year({Date}) + 1, 0, "")
What is happening is that Year() returns a Number, and CStr() then renders it with the default number format: a thousands separator and two decimal places.
One way around this is to remove the CStr() function from your formula. You can then format the field as a Number: right-click the field, select Format Field, and on the Number tab set the Style to one that displays neither a separator nor decimals.
If you need to concatenate this value with another string, you can get a little more creative and use the LEFT() and REPLACE() functions like this:
Left(Replace(Cstr(Year({Date}) + 1), ",", ""), 4)
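The effect is easy to reproduce in another language. The Python sketch below is only an analogy, not Crystal syntax: default_number_text imitates a default format with a thousands separator and two decimals, year_text converts without any number formatting, and strip_formatting mimics the Left/Replace workaround. All three names are made up for this illustration.

```python
def default_number_text(value):
    """Format like the '2,018.00' seen in the report: thousands
    separator plus two decimal places."""
    return format(value, ",.2f")


def year_text(value):
    """Plain four-digit year string, no separator and no decimals."""
    return str(int(value))


def strip_formatting(text):
    """Mimic Left(Replace(text, ',', ''), 4) on an already formatted value."""
    return text.replace(",", "")[:4]


print(default_number_text(2018))
print(year_text(2018.0))
print(strip_formatting("2,018.00"))
```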

finding a comma in string

I receive values such as [23567,0,0,0,0,0] and [452221,0,0,0,0,0]. About 100 of these values are displayed continuously, and I want to display only the sensor value, i.e. 23567 for the first sample and 452221 for the second. For that I have written this code:
value = str2double(str(2:7));
So I want to find the comma in the output and display only the value before the first comma.
As proposed in a comment by excaza, MATLAB has dedicated functions for such purposes, such as sscanf:
sscanf(str,'[%d')
which matches but ignores the first [, and returns the next (i.e. the first) number as a double variable, and not as a string.
Still, I like the idea of using regular expressions to match the numbers. Instead of matching all zeros and commas, and replacing them by '' as proposed by Sardar_Usama, I would suggest directly matching the numbers using regexp.
You can return all numbers in str (still as string!) with
nums = regexp(str,'\d*','match')
and convert the first number to a double variable with
str2double(nums{1})
To match only the first number in str, we can use the regexp
nums = regexp(str,'\[(\d*),','tokens')
which finds a literal [ (escaped as \[, since an unescaped [ would open a character class), then takes an arbitrary number of digits (0-9), and stops when it finds a ,. By enclosing the \d* in parentheses, only the part in parentheses is returned, i.e. only the number without [ and ,.
Final Note: if you continue working with strings, you could/should consider the regexp solution. If you convert it to a double anyways, using sscanf is probably faster and easier.
You can use regexprep as follows:
str='[23567,0,0,0,0,0]' ;
required=regexprep(str(2:end-1),',0','')
%Taking str(2:end-1) to exclude brackets, and then removing all ,0
If there can be values other than 0 after , , you can use the following more general approach instead:
required=regexprep(str(2:end-1),',[-+]?\d*\.?\d*','')
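For readers who want to test the "first number before the first comma" pattern outside MATLAB, here is a minimal Python sketch of the same regular-expression idea. first_value is a hypothetical helper; it assumes each sample starts with [ followed by an integer, as in the question.

```python
import re


def first_value(sample):
    """Return the integer between the opening '[' and the first comma,
    or None if the sample does not have that shape."""
    match = re.match(r"\[\s*([-+]?\d+)\s*,", sample)
    return int(match.group(1)) if match else None


for sample in ["[23567,0,0,0,0,0]", "[452221,0,0,0,0,0]"]:
    print(first_value(sample))
```

Because the pattern anchors on the opening bracket and the first comma, it does not depend on the sensor value having a fixed number of digits, which is where str(2:7) breaks down.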

reading via matlab a number after a specific string in a txt file

Let me re-explain my problem: in a large file a.txt I have
Amount of Food is 1
Desired Travel is 5
I need to read the 1 after the 'Amount of Food is ' expression and the 5 after the 'Desired Travel is ' expression. Thanks again.
With regexpi you can simply look for numbers in your strings.
The syntax is as simple as this:
startIndex = regexpi(str,expression)
where the expression parameter is a regular expression (e.g. '\d*' to retrieve consecutive digits).
In your specific case a way to perform this with regular expressions would be:
First you have to decide what strings are valid in your search
for example:
firstpar = 'First parameter is [0-9]+';
means that you are looking for a string 'First parameter is '
that ends with a sequence of digits.
Then you could use regexp or regexpi in the following way:
results = regexp(mystring, firstpar, 'match');
Where mystring is the text you perform the search on and 'match' means that you want parts of the text as output, not indexes.
Now, results is a cell array with each cell containing a string that appeared in your text and matched your firstpar definition. To extract just the numbers from the cell array of strings you could use regexp again, this time together with cellfun, which applies your command to every cell of a cell array:
numbers = cellfun(@(x) str2num(regexp(x, '[0-9]+', 'match', 'once')), results);
numbers is an array of numbers that you were looking for.
You can do the same for different string patterns. If you want more general string definitions (instead of the straightforward firstpar that we used here), read the MATLAB documentation about regular expressions (alexcasalboni pasted it in his comment), scroll down to Input Arguments and expand 'expressions'.
The difference between regexp and regexpi is that the latter is case insensitive.
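The two-step approach above (match the labelled phrase, then pull out the digits) can be collapsed into one pattern with a capture group. Here is a minimal Python sketch of that idea, not MATLAB code; read_values is a hypothetical helper, the labels come from the question, and the case-insensitive flag plays the role of regexpi.

```python
import re


def read_values(text, labels):
    """Map each label to the integer that follows its first occurrence
    in `text`. Labels that are not found are left out of the result."""
    values = {}
    for label in labels:
        match = re.search(re.escape(label) + r"\s*(\d+)", text,
                          flags=re.IGNORECASE)
        if match:
            values[label] = int(match.group(1))
    return values


sample = "Amount of Food is 1\nDesired Travel is 5\n"
print(read_values(sample, ["Amount of Food is", "Desired Travel is"]))
```

For a real file, the text would come from reading a.txt into a string first; the sketch uses an inline sample so that it runs on its own.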