UIMA Ruta: Copy the feature value from a contained annotation to a containing annotation - uima

Note: This seems closely related to "Setting feature value to the count of containing annotation in UIMA Ruta", but I cannot quite apply the answer to my situation.
I am analyzing plain-text documents that are assumed to have the following structure: one Document (of course), containing many Sections, each of which has exactly one Heading.
I am being asked to identify sections by checking whether their headings satisfy conditions. A useful and obvious sort of condition would be: does the heading match a given regular expression? A less-useful but perhaps more achievable condition would be: does the heading contain a given text?
I have already achieved this by taking a list of (regular expression, section title) tuples and, at design time, writing rules like the following for each member of the list:
BLOCK(forEach) SECTION{} {
...
HEADING{REGEXP(".*table.*contents.*", true) ->
SETFEATURE("value", "Table of Contents")};
...
}
SECTION{ -> SETFEATURE("value", "Table of Contents")}
<- { HEADING.value == "Table of Contents"; };
This approach is fairly straightforward but has a few big drawbacks:
It heavily violates the DRY principle: even when writing the rule for just one section, the rule author must repeat the section title (it should only need to be specified once)
It makes the script needlessly long and unwieldy
It puts a big burden on the rule author, who ideally would only need to know regular expressions, not Ruta
So I wanted to refactor to achieve the following goals:
A text file is used to store the regular expressions and corresponding titles, and the rule iterates over these pairs
Features, rather than types, are used to distinguish sections/headings (e.g., as above, SECTION.value == "Table of Contents" rather than a TableOfContentsSection type)
After looking over the UIMA Ruta reference to see which options were available to achieve these goals, I settled on the following:
1. Use a WORDTABLE to store tuples of (section title, words to find or a regex if possible, lookup type), for instance: Table of Contents,contents,sectiontitle
2. Use MARKTABLE to mark an intermediate annotation type LookupMatch, whose hint feature contains the section title and whose lookup feature contains the type of lookup
3. For each HEADING, check whether a LookupMatch with lookup == "sectiontitle" is inside it; if so, copy LookupMatch.hint to the heading's value feature.
4. For each SECTION, check whether a HEADING with a value is inside it; if so, copy that value to SECTION.value.
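To make step 2 concrete outside of Ruta, here is a rough plain-Ruby model of what the MARKTABLE call is meant to produce. It is only a sketch under stated assumptions: LookupMatch is modeled as a hypothetical Ruby Struct, the CSV is split naively on commas (no quoted fields), and matching is done on raw substrings, whereas Ruta matches against token sequences.

```ruby
# Hypothetical stand-in for the LookupMatch annotation: offsets plus the
# "hint" and "lookup" features that MARKTABLE fills from columns 1 and 3.
LookupMatch = Struct.new(:begin_pos, :end_pos, :hint, :lookup)

# Naive CSV reading (assumption: no quoted fields or embedded commas).
def read_word_table(csv_source)
  csv_source.each_line.map { |line| line.chomp.split(",", -1) }
            .reject { |row| row.size < 3 }
end

# Rough analogue of MARKTABLE(LookupMatch, 2, table, true, ..., "hint" = 1, "lookup" = 3):
# search each row's column 2 case-insensitively and record columns 1 and 3.
# Unlike Ruta, this matches raw substrings rather than token sequences.
def mark_table(text, csv_source)
  read_word_table(csv_source).flat_map do |hint, words, lookup|
    pattern = Regexp.new(Regexp.escape(words), Regexp::IGNORECASE)
    hits = []
    text.scan(pattern) do
      m = Regexp.last_match
      hits << LookupMatch.new(m.begin(0), m.end(0), hint, lookup)
    end
    hits
  end
end

SAMPLE_TABLE = <<~CSV
  Table of Contents,contents,sectiontitle
  Definitions,definitions,sectiontitle
CSV

mark_table("TABLE OF CONTENTS\n1. Definitions", SAMPLE_TABLE).each do |m|
  puts "#{m.begin_pos}-#{m.end_pos} hint=#{m.hint} lookup=#{m.lookup}"
end
```

Each resulting match carries the section title as its hint, which is exactly the value steps 3 and 4 then need to move upward.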
Unsurprisingly, steps 3 and 4 turned out to be hard to implement. That's where I am stuck and why I am asking for help.
// STEP 1
WORDTABLE Structure_Heading_WordTable =
'/uima/resource/structure/Structure_Heading_WordTable.csv';
// STEP 2
Document.docType == "Contract"{
-> MARKTABLE(LookupMatch, // annotation
2, // lookup column #
Structure_Heading_WordTable, // word table to lookup
true, // case-insensitivity
0, // min. word length for case-insensitive matching
"", // characters to ignore
0, // max. number of ignored characters
"hint" = 1, "lookup" = 3 // features
)
};
// STEPS 3 AND 4 ... ???
BLOCK(ForEach) LookupMatch.lookup == "sectiontitle"{} {
???
}
HEADING{ -> SETFEATURE("value", ???)} <- {
???
};
Here is my first real stab at it:
HEADING{ -> SETFEATURE("value", lookupMatchHint)} <- {
LookupMatch.lookup == "HeadingWords"{ -> GETFEATURE("hint", lookupMatchHint)};
};
TL;DR
How can I conditionally copy a feature value from one annotation to another? GETFEATURE seems to assume that there is only one matching annotation.
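One way to think about steps 3 and 4 is as a generic "copy a feature from a contained annotation that satisfies a condition" operation. The plain-Ruby sketch below models only that logic (it is not Ruta code); Annotation, copy_from_contained and the rule of taking the first contained match are hypothetical choices for illustration, and that tie-break is exactly the part GETFEATURE leaves open.

```ruby
# Hypothetical stand-in for a Ruta annotation: a type, offsets and a feature map.
Annotation = Struct.new(:type, :begin_pos, :end_pos, :features) do
  # True if `other` lies completely inside this annotation.
  def covers?(other)
    begin_pos <= other.begin_pos && other.end_pos <= end_pos
  end
end

# For every annotation of container_type, pick the FIRST contained annotation of
# source_type that satisfies the condition and copy one feature value across.
# Choosing the first match is an assumption; any other tie-break could be used.
def copy_from_contained(annotations, container_type, source_type, from, to, &condition)
  condition ||= proc { true }
  annotations.select { |a| a.type == container_type }.each do |container|
    first = annotations
              .select { |a| a.type == source_type && container.covers?(a) && condition.call(a) }
              .min_by(&:begin_pos)
    container.features[to] = first.features[from] if first
  end
end

doc = [
  Annotation.new(:section, 0, 100, {}),
  Annotation.new(:heading, 0, 20, {}),
  Annotation.new(:lookup_match, 9, 17, { "hint" => "Table of Contents", "lookup" => "sectiontitle" }),
]
# Step 3: HEADING.value <- hint of a contained LookupMatch with lookup == "sectiontitle"
copy_from_contained(doc, :heading, :lookup_match, "hint", "value") { |m| m.features["lookup"] == "sectiontitle" }
# Step 4: SECTION.value <- value of a contained HEADING that has one
copy_from_contained(doc, :section, :heading, "value", "value") { |h| h.features.key?("value") }
puts doc[0].features["value"] # prints Table of Contents
```

Running step 3 before step 4 matters: the section can only pick up a value once its heading has one.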

Related

How to access list element in FEEL list literal expression in DMN?

I have the following list of objects:
"animals":[
{
"family":"cat",
"color":"grey"
},
{
"family":"dog",
"color":"white"
}
]
I want to access the first animal object whose family is "dog" and whose color is "white". I am trying to do it like this:
animals[family = "dog" and color = "white"][0]
But it shows the following warning:
FEEL WARN while evaluating literal expression 'animals[ family = .... [string clipped after 50 chars, total length is 82]': Index out of bound: list of 1 elements, index 0; will evaluate as FEEL null
What exactly is incorrect here? I feel I am doing something semantically wrong. I consulted the FEEL specification but could not figure out what's wrong, and I also read Red Hat's DMN decision modeling documentation, but I am still clueless. Please help.
In FEEL, list indexes start at 1.
So the expression to access the first matching animal object is actually:
animals[family = "dog" and color = "white"][1]
This is documented in the DMN specification at page 126:
The first element of a list L can be accessed using L[1] and the last
element can be accessed using L[-1].
For a friendlier reference, this is also documented in the Drools documentation:
Elements in a list can be accessed by index, where the first element
is 1. Negative indexes can access elements starting from the end of
the list so that -1 is the last element.
...and equivalently in the productized Red Hat version of the documentation.
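As a quick sanity check of the indexing rules quoted above, here is a small Ruby emulation of FEEL list access. feel_index is a hypothetical helper, not part of any DMN engine; it just mirrors the documented behavior: index 1 is the first element, -1 is the last, and an out-of-range index (including 0) evaluates to null, as in the Drools warning from the question.

```ruby
# Emulates FEEL list indexing: 1-based, negative counts from the end, and
# anything out of range (including 0) evaluates to null (nil here).
def feel_index(list, i)
  return nil if i.zero? || i.abs > list.size
  i.positive? ? list[i - 1] : list[i]
end

ANIMALS = [
  { "family" => "cat", "color" => "grey" },
  { "family" => "dog", "color" => "white" },
].freeze

# animals[family = "dog" and color = "white"] -> a one-element list
white_dogs = ANIMALS.select { |a| a["family"] == "dog" && a["color"] == "white" }

p feel_index(white_dogs, 0) # nil, like the FEEL warning about index 0
p feel_index(white_dogs, 1) # the dog/white object
```

The filter step is what produces the "list of 1 elements" mentioned in the warning; only the index base is wrong.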

Multiple Conditions in MailMerge Field

I would like to include up to 3 conditions in a MailMerge field. Below is my current field, which returns "Checked" if checkbox 1 is checked.
if"<<cb1>>"="Yes" "Checked""Unchecked"
I would like to also check cb2 and cb3, i.e. test whether any of the three is checked.
How can this be done?
P.S. I left out the { } braces, as I am not sure whether they are required here.
Edit: I tried the following structure, but the output was Yes:
if
"if"<<cb1>>"="Yes" "1""0""
+
"if"<<cb2>>"="Yes" "1""0""
+
"if"<<cb3>>"="Yes" "1""0""
>0
"1 or more checked""None checked"
Try a field coded as:
{IF{={IF«cb1»= "Yes" 1 0}+{IF«cb2»= "Yes" 1 0}+{IF«cb3»= "Yes" 1 0}}> 0 "1 or more checked" "None checked"}
Note: The field brace pairs (i.e. '{ }') for the above example are all created in the document itself, via Ctrl-F9 (Cmd-F9 on a Mac or, if you’re using a laptop, you might need to use Ctrl-Fn-F9); you can't simply type them or copy & paste them from this message. Nor is it practical to add them via any of the standard Word dialogues. Likewise, the chevrons (i.e. '« »') are part of the actual mergefields - which you can insert from the 'Insert Merge Field' dropdown (i.e. you can't type or copy & paste them from this message, either). The spaces represented in the field constructions are all required.
For example, you can create a field with logic such as: “If Condition 1 is met, then if Condition 2 is also met, display Result 1”. This is equivalent to saying “if both conditions are met, display Result 1”; but unfortunately you can't use “and” in Word fields.
Such a field would look something like this:
{ IF [Condition 1] { IF [Condition 2] [Display Result 1] "" } "" }
The easiest way to create nested fields of this type is to create them separately, then cut and paste one field into the other.
Source
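The field logic above boils down to a sum for "any of" and nesting for "all of". A plain-Ruby model of both might look like this; checkbox_summary and nested_if are hypothetical names, and Word itself only understands the field codes.

```ruby
# Models the answer's field: each inner IF contributes 1 or 0, the = field adds
# them up, and the outer IF tests whether the total is greater than 0 (logical OR).
def checkbox_summary(*checkboxes)
  total = checkboxes.sum { |cb| cb == "Yes" ? 1 : 0 }
  total > 0 ? "1 or more checked" : "None checked"
end

# Models { IF [Condition 1] { IF [Condition 2] [Display Result 1] "" } "" }:
# the inner IF is only reached when the outer condition holds (logical AND).
def nested_if(condition1, condition2, result)
  condition1 ? (condition2 ? result : "") : ""
end

puts checkbox_summary("No", "Yes", "No")        # 1 or more checked
puts checkbox_summary("No", "No", "No")         # None checked
puts nested_if(true, false, "Result 1").inspect # ""
```

The same summing trick extends to any number of checkboxes, which is why it scales better than nesting when you need OR rather than AND.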

Can we set tolerance level on regex annotator in Ruta?

I am annotating Borrower Name
"Borrower Name" -> BorrowerNameKeyword ( "label" = "Borrower Name");
But this text comes from OCR, and at times "Borrower Name" comes through as "B0rr0wer Nane". Is it possible to set a tolerance limit so that such text still gets annotated as BorrowerNameKeyword?
Is there any other approach that could help here?
I thought of dictionary-based correction, but that won't help, as it could also auto-correct words that are already right.
You could achieve that with regular expressions in UIMA Ruta. For your particular example, the following rule should work:
"B.rr.wer\\sNa.e" -> BorrowerName;
Likewise, you can write more regular-expression variants to cover other OCR errors.
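To see what the answer's pattern accepts, and how a tighter variant behaves, the regular expressions can be tried out directly in Ruby. TIGHT is a hypothetical alternative that only tolerates the o/0 and m/n confusions from the question; it is not part of the answer.

```ruby
# The pattern from the Ruta rule: "." accepts any single character in the
# positions OCR tends to garble. (The Ruta rule writes \\s inside a string
# literal; a Ruby regexp literal only needs \s.)
LOOSE = /B.rr.wer\sNa.e/

# A tighter, hypothetical variant that only allows known confusions (o/0, m/n).
TIGHT = /B[o0]rr[o0]wer\sNa[mn]e/

["Borrower Name", "B0rr0wer Nane", "Bxrrxwer Naxe"].each do |text|
  puts format("%-15s loose=%-5s tight=%s", text, LOOSE.match?(text), TIGHT.match?(text))
end
```

The loose pattern is more forgiving but also accepts arbitrary characters in those positions; character classes trade some recall for fewer false positives.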

How to generate code from custom scoping through instances?

I'm trying to write a code generator using Xtext. Instances of types are declared in the corresponding DSL, and attributes can be referenced through those instances via custom scoping (see the example below). The linking goes directly from the referencing element to the attribute, so there is no information about the surrounding instance; but for code generation I need exactly the qualified name that was written in the DSL file. Is there any way to figure out through which instance the feature is actually referenced?
My first idea was to call the ScopeProvider again during code generation. That works, but it cannot distinguish two instances of the same type, because the first matching attribute is chosen; so if there are multiple instances, the generator cannot tell which one is meant.
My second idea was to use information from the corresponding DSL file, but I have no idea how to make this work. I searched extensively for a way to get the corresponding DSL file from the current model, but could not find any helpful answer.
My third idea was to store the instance as a hidden field in the referencing element, but I could not find a solution for this approach either.
Small extract of Grammar (simplified):
Screen:
(features += ScreenFeature)*
;
ScreenFeature:
name=ID ':' type=[ClientEntity]
;
ClientEntity:
(features += Feature)*
;
Feature:
name=ID ':' type=DefaultValue
;
DefaultValue:
'String'|'int'|'double'|'boolean'
;
ChangeViewParam:
param=[ScreenFeature|QualifiedName] ':' value=[ScreenFeature|QualifiedName]
;
DSL-Example:
ClientEntity Car {
id : int
name : String
}
Screen Details {
car : Car
car2 : Car
[...]
car2.id : car.id
}
Generation output of the first approach (for the line car2.id : car.id):
car.id : car.id
Expected:
car2.id : car.id
I hope you can understand my problem and have an idea how to solve it. Thanks for your help!
You can use org.eclipse.xtext.nodemodel.util.NodeModelUtils.findNodesForFeature(EObject, EStructuralFeature) to obtain the nodes for YourDslPackage.Literals.CHANGE_VIEW_PARAM__PARAM (there should be only one) and ask that node for its text.
Alternatively, you could split your param=[ScreenFeature|QualifiedName] into two references, one to the instance and one to its attribute.
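Why the second suggestion helps can be shown with a small, hypothetical Ruby model of the linked objects (this is not Xtext/EMF API): with one cross-reference the generator only holds the shared attribute object, while two references keep the instance as well. In the grammar this would mean something shaped like instance=[ScreenFeature] '.' attribute=[Feature]; that shape is an untested assumption, and the attribute reference would still need scoping relative to the chosen instance.

```ruby
# Hypothetical, simplified view of the linked model as the generator sees it.
Attribute = Struct.new(:name)              # e.g. "id" declared in ClientEntity Car
Instance  = Struct.new(:name, :attributes) # e.g. "car2 : Car" in Screen Details

# Single reference: only the attribute is stored, so "car.id" and "car2.id"
# both resolve to the same Attribute object and the instance is lost.
SingleRef = Struct.new(:attribute)

# Split reference: the instance and the attribute are stored separately.
SplitRef = Struct.new(:instance, :attribute) do
  def qualified_name
    "#{instance.name}.#{attribute.name}"
  end
end

id   = Attribute.new("id")
car  = Instance.new("car", [id])
car2 = Instance.new("car2", [id])

param = SplitRef.new(car2, id)
value = SplitRef.new(car, id)
puts "#{param.qualified_name} : #{value.qualified_name}" # car2.id : car.id

# With SingleRef the generator only has `id`; looking up "the instance that
# owns id" (like re-running the scope provider) returns the first match, car.
owner = [car, car2].find { |i| i.attributes.include?(SingleRef.new(id).attribute) }
puts owner.name # car
```

The second lookup reproduces the "car.id : car.id" output from the question, which is the information loss the split reference avoids.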

How do I dynamically build a search block in sunspot?

I am converting a Rails app from using acts_as_solr to sunspot.
The app uses Solr's field search capability, which acts_as_solr exposed. You could give it a query string like this:
title:"The thing to search"
and it would search for that string in the title field.
While converting to Sunspot, I parse out the field-specific portions of the query string, and I need to generate the search block dynamically. Something like this:
Sunspot.search(table_clazz) do
keywords(first_string, :fields => :title)
keywords(second_string, :fields => :description)
...
paginate(:page => page, :per_page => per_page)
end
This is complicated by also needing duration ranges (in seconds, as integers) and negation when the query requires it.
On the current system, users can search for something in the title, exclude records that have something else in another field, and scope by duration.
In a nutshell, how do I generate these blocks dynamically?
I recently did this kind of thing using instance_eval to evaluate procs (created elsewhere) in the context of the Sunspot search block.
The advantage is that these procs can be created anywhere in your application yet you can write them with the same syntax as if you were inside a sunspot search block.
Here's a quick example to get you started for your particular case:
def build_sunspot_query(table_clazz, conditions, page, per_page)
  condition_procs = conditions.map { |c| build_condition(c) }
  Sunspot.search(table_clazz) do
    condition_procs.each { |c| instance_eval(&c) }
    paginate(:page => page, :per_page => per_page)
  end
end

def build_condition(condition)
  Proc.new do
    # write this code as if it was inside the sunspot search block
    # (the conditions below use symbol keys, so read condition[:words])
    keywords condition[:words], :fields => condition[:field].to_sym
  end
end

conditions = [{words: "tasty pizza", field: "title"},
              {words: "cheap", field: "description"}]
build_sunspot_query(table_clazz, conditions, page, per_page)
By the way, if you need to, you can even instance_eval a proc inside of another proc (in my case I composed arbitrarily-nested 'and'/'or' conditions).
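The instance_eval technique itself does not depend on Sunspot. The following self-contained sketch uses a hypothetical ToySearch class in place of Sunspot's DSL, so it only illustrates the mechanics: procs are built elsewhere, written as if inside the search block, and evaluated with the DSL object as self.

```ruby
# Hypothetical toy DSL standing in for Sunspot's search block (not Sunspot's API).
class ToySearch
  attr_reader :clauses

  def initialize
    @clauses = []
  end

  def keywords(words, fields:)
    @clauses << "#{fields}:(#{words})"
  end

  def paginate(page:, per_page:)
    @clauses << "page=#{page},per_page=#{per_page}"
  end
end

# Built anywhere in the app; the body is written as if inside the search block.
def build_toy_condition(condition)
  proc { keywords condition[:words], fields: condition[:field].to_sym }
end

def run_search(conditions, page: 1, per_page: 10)
  condition_procs = conditions.map { |c| build_toy_condition(c) }
  search = ToySearch.new
  search.instance_eval do
    # self is now the ToySearch, but locals such as condition_procs are still visible
    condition_procs.each { |c| instance_eval(&c) }
    paginate page: page, per_page: per_page
  end
  search
end

search = run_search([{ words: "tasty pizza", field: "title" },
                     { words: "cheap", field: "description" }])
p search.clauses
# prints ["title:(tasty pizza)", "description:(cheap)", "page=1,per_page=10"]
```

Because each proc is a closure, it keeps access to its own condition hash even though it runs with a different self.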
Sunspot provides a method called Sunspot.new_search which lets you build the search conditions incrementally and execute it on demand.
An example from Sunspot's source code:
search = Sunspot.new_search do
with(:blog_id, 1)
end
search.build do
keywords('some keywords')
end
search.build do
order_by(:published_at, :desc)
end
search.execute
# This is equivalent to:
Sunspot.search do
with(:blog_id, 1)
keywords('some keywords')
order_by(:published_at, :desc)
end
With this flexibility, you should be able to build your query dynamically. Also, you can extract common conditions to a method, like so:
def blog_facets
lambda { |s|
s.facet(:published_year)
s.facet(:author)
}
end
search = Sunspot.new_search(Blog)
search.build(&blog_facets)
search.execute
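The new_search/build/execute pattern can likewise be modeled without Sunspot. ToyDSL and ToyIncrementalSearch below are hypothetical stand-ins, not Sunspot classes, and Sunspot's real dispatch may differ in detail. The sketch just stores the blocks and replays them on execute, passing the DSL object to one-argument blocks (like blog_facets) and instance_eval'ing blocks without arguments.

```ruby
# Toy DSL object that just records which methods were called.
class ToyDSL
  attr_reader :calls

  def initialize
    @calls = []
  end

  def with(field, value)
    @calls << [:with, field, value]
  end

  def keywords(text)
    @calls << [:keywords, text]
  end

  def facet(field)
    @calls << [:facet, field]
  end
end

# Toy analogue of new_search / build / execute: blocks are stored as they are
# added and only replayed against a fresh DSL object when execute is called.
class ToyIncrementalSearch
  def initialize(&block)
    @blocks = []
    build(&block) if block
  end

  def build(&block)
    @blocks << block
    self
  end

  def execute
    dsl = ToyDSL.new
    @blocks.each do |b|
      # A block taking one argument gets the DSL object (like blog_facets' |s|);
      # a block without arguments is evaluated with the DSL object as self.
      b.arity == 1 ? b.call(dsl) : dsl.instance_eval(&b)
    end
    dsl.calls
  end
end

facets = lambda { |s| s.facet(:published_year); s.facet(:author) }

search = ToyIncrementalSearch.new { with(:blog_id, 1) }
search.build { keywords("some keywords") }
search.build(&facets)
p search.execute
# prints [[:with, :blog_id, 1], [:keywords, "some keywords"], [:facet, :published_year], [:facet, :author]]
```

Deferring evaluation until execute is what lets conditions be added from different places in the code in any order.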
I have solved this myself. My solution was to compile the required scopes as strings, concatenate them, and then eval them inside the search block.
This required a separate query builder library that interrogates the Solr indexes to ensure that no scope is created for a non-existent index field.
The code is very specific to my project, and too long to post in full, but this is what I do:
1. Split the search terms
This gives me an array of the terms, or terms plus fields:
['field:term', 'non field terms']
2. This is passed to the query builder.
The builder converts the array to scopes, based on what indexes are available. This method is an example that takes the model class, field and value and returns the scope if the field is indexed.
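def convert_text_query_to_search_scope(model_clazz, field, value)
  if field_is_indexed?(model_clazz, field)
    # escape backslashes as well as single quotes: the scope string is eval'd later,
    # and an unescaped trailing backslash would break out of the '...' literal
    escaped_value = value.gsub(/[\\']/) { |ch| "\\#{ch}" }
    "keywords('#{escaped_value}', :fields => [:#{field}])"
  else
    ""
  end
end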
3. Join all the scopes
The generated scopes are joined with join("\n"), and the result is eval'd inside the search block.
This approach allows the user to select the models they want to search, and optionally to do field-specific searching. The system then searches only the models that have the specified fields (or common fields), ignoring the rest.
The method to check if the field is indexed is:
# based on http://blog.locomotivellc.com/post/6321969631/sunspot-introspection
def field_is_indexed?(model_clazz, field)
# first part returns an array of all indexed fields - text and other types - plus ':class'
Sunspot::Setup.for(model_clazz).all_field_factories.map(&:name).include?(field.to_sym)
end
And if anyone needs it, a check for sortability:
def field_is_sortable?(classes_to_check, field)
  return false unless field.present?

  # sortable only if every class indexes the field (true for an empty list, as before)
  classes_to_check.all? do |table_clazz|
    Sunspot::Setup.for(table_clazz).field_factories.map(&:name).include?(field.to_sym)
  end
end
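Steps 1 to 3 of this answer can be sketched end to end in plain Ruby. INDEXED_FIELDS is a hypothetical stand-in for the Sunspot introspection done by field_is_indexed?, and treating plain terms as unrestricted keywords is an assumption. Because the resulting string is eval'd, the sketch escapes backslashes as well as single quotes; escaping only quotes lets a value ending in a backslash break out of the literal.

```ruby
# Hypothetical stand-in for the Sunspot index introspection in the answer:
# the set of fields assumed to be indexed for the model.
INDEXED_FIELDS = %i[title description].freeze

# Wrap a value in a single-quoted Ruby literal. Inside '...' the only escapes
# are \\ and \', so escaping both characters makes the literal eval back to
# exactly the original value.
def ruby_single_quote(value)
  "'" + value.gsub(/[\\']/) { |ch| "\\#{ch}" } + "'"
end

# Step 2: turn one "field:term" (or plain term) into a scope string.
# Treating plain terms as unrestricted keywords is an assumption.
def term_to_scope(term)
  field, value = term.include?(":") ? term.split(":", 2) : [nil, term]
  return "keywords(#{ruby_single_quote(value)})" if field.nil?
  return "" unless INDEXED_FIELDS.include?(field.to_sym) # skip unknown fields

  "keywords(#{ruby_single_quote(value)}, :fields => [:#{field}])"
end

# Step 3: join the non-empty scopes into one source string for eval.
def build_scope_source(terms)
  terms.map { |t| term_to_scope(t) }.reject(&:empty?).join("\n")
end

puts build_scope_source(["title:it's", "colour:red", "cheap"])
```

The unknown "colour" field is silently dropped, matching the answer's behavior of returning an empty scope for non-indexed fields.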