Looping issue - UIMA Ruta - uima

Objective:
To assign heading levels.
The first heading is assigned level 1. I extract its font family and size and find the matching headings. Once a level has been assigned, I unmark the heading, preserving the headings and their features in another annotation (HeadingHierarchy). Once a level is finished, I call the same block again and again as long as any heading is left in the Headinglevel annotation.
Issue:
The script works fine for finding all level-1 headings. But when the block is executed via the CALL action, it finds only the first match for each level (level 2 onwards). Hence the total number of levels for the input below becomes 10, whereas it should be 4.
Input (.txt):
Apache UIMA Ruta Overview =>Arial,18
What is Apache UIMA Ruta? =>Arial,16
Getting started =>Arial,16
UIMA Analysis Engines =>Arial,16
Ruta Engine =>Times New Roman,14
Configuration Parameters =>Arial,10
Annotation Writer =>Times New Roman,14
Configuration Parameters =>Arial,10
Apache UIMA Ruta Language =>Arial,18
Syntax =>Arial,16
Rule elements and their matching order =>Arial,16
Script:
PACKAGE uima.ruta.example;
DECLARE Headinglevel(STRING family, INT size, INT level);
DECLARE HeadingHierarchy(STRING family, INT size, INT level);
DECLARE FontFamily, FontSize;
STRING family;
INT size;
RETAINTYPE(BREAK);
BREAK? #{-PARTOF(Headinglevel)} #SPECIAL+ W+ COMMA NUM{->MARK(Headinglevel,2,6), MARK(HeadingHierarchy,2,6), MARK(FontFamily,4), MARK(FontSize,6)};
RETAINTYPE;
h:Headinglevel{->h.family = family, HeadingHierarchy.family = family}
<-{FontFamily{PARSE(family)};};
h:Headinglevel{->h.size = size, HeadingHierarchy.size = size}
<-{FontSize{PARSE(size)};};
INT i=1;
BLOCK(ForEachHeadLevel)Document{}
{
# h:Headinglevel{-> family = h.family, size = h.size};
h:Headinglevel{AND(h.family == family, h.size == size)-> h.level=i, HeadingHierarchy.level = i, UNMARK(h)};
}
Headinglevel{->i=i+1, CALL(Test2.ForEachHeadLevel)};
Document{->LOG(" LEVELS : " + (i))};
Expected Output:
HeadingHierarchy Feature
Apache UIMA... =>Arial,18 level: 1
What is Apa... =>Arial,16 level: 2
Getting sta... =>Arial,16 level: 2
UIMA Analys... =>Arial,16 level: 2
Ruta Engine... =>Times New Roman,14 level: 3
Configurati... =>Arial,10 level: 4
Annotation ... =>Times New Roman,14 level: 3
Configurati... =>Arial,10 level: 4
Apache UIMA... =>Arial,18 level: 1
Syntax =>Ar... =>Arial,16 level: 2
Rule elemen... =>Arial,16 level: 2

The problem is that CALL restricts the window to the span matched by the rule element. This means that the BLOCK is only executed within an existing Headinglevel annotation. However, the second rule in the block needs the complete document to do its job.
This is most likely not the best solution, but it is the first one that came to mind.
With DOCUMENTBLOCK, you can reset the window within the BLOCK to the complete document, regardless of the restriction imposed by the CALL action:
BLOCK (ForEachHeadLevel)Document{}
{
DOCUMENTBLOCK Document{}
{
# h:Headinglevel{-> family = h.family, size = h.size};
h:Headinglevel{AND(h.family == family, h.size == size)-> h.level=i, HeadingHierarchy.level = i, UNMARK(h)};
}
}
DOCUMENTBLOCK is a block extension. You need to include org.apache.uima.ruta.block.DocumentBlockExtension in the additionalExtensions configuration parameter.
Here's another solution using a FOREACH block:
INT i=0;
FOREACH(hl) Headinglevel{}{
hl{IS(Headinglevel)-> i=i+1, family = hl.family, size = hl.size};
h:Headinglevel{h.family == family, h.size == size -> h.level=i, HeadingHierarchy.level = i, UNMARK(h)};
}
DISCLAIMER: I am a developer of UIMA Ruta
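For readers who do not use Ruta, the intended result of the FOREACH solution can be paraphrased in plain Python. This is only a sketch of the logic, assuming a heading's level is the order in which its (family, size) pair first appears; the function name assign_levels and the tuple layout are hypothetical and not part of Ruta.

```python
def assign_levels(headings):
    """Return one level per heading, numbered by the order in which
    each distinct (family, size) pair first appears."""
    level_of_style = {}
    levels = []
    for _title, family, size in headings:
        style = (family, size)
        if style not in level_of_style:
            # A style not seen before opens the next level.
            level_of_style[style] = len(level_of_style) + 1
        levels.append(level_of_style[style])
    return levels


# The input from the question, as (title, family, size) tuples.
headings = [
    ("Apache UIMA Ruta Overview", "Arial", 18),
    ("What is Apache UIMA Ruta?", "Arial", 16),
    ("Getting started", "Arial", 16),
    ("UIMA Analysis Engines", "Arial", 16),
    ("Ruta Engine", "Times New Roman", 14),
    ("Configuration Parameters", "Arial", 10),
    ("Annotation Writer", "Times New Roman", 14),
    ("Configuration Parameters", "Arial", 10),
    ("Apache UIMA Ruta Language", "Arial", 18),
    ("Syntax", "Arial", 16),
    ("Rule elements and their matching order", "Arial", 16),
]

levels = assign_levels(headings)
print(levels)  # [1, 2, 2, 2, 3, 4, 3, 4, 1, 2, 2]
```

The result has four distinct levels, which matches the expected output listed in the question.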

Related

Add okta units to a pint registry and handle percent conversion

I have a pint registry that already defines many climate specific units, but I struggle to add a new one for cloud coverage.
I would like to add "okta"[0] as a new unit.
Oktas are an integer scale from 0 to 9.
A conversion from percent to oktas is possible with the formula oktas = round(percent_value * 8 / 100)[1]
I added okta as a new unit in pint with:
my_registry.define(
pint.unit.UnitDefinition("okta", "okta", ("octa", "octas",), pint.converters.ScaleConverter(0.01)
))
But my question is: how do I add an automatic conversion from percent to oktas and vice versa?
I tried:
units.define("okta = percent * 8 / 100"): this works when calling units.Quantity(10, "okta").to("percent"), but the result is (obviously) not rounded.
units.define("okta = round(percent * 8 / 100)"): this throws "UndefinedUnitError: 'round' is not defined in the unit registry" when calling units.Quantity(10, "okta").to("percent").
I also tried to create a "cloudiness" Context and add a transformation to it with add_transformation, but without success.
References:
[0] https://en.wikipedia.org/wiki/Okta
[1] at Chapter 2.1.2, on variable "CC" https://knmi-ecad-assets-prd.s3.amazonaws.com/documents/atbd.pdf
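Independent of pint, the rounding formula quoted in the question can be sketched in plain Python. This only illustrates the arithmetic and is not a pint integration; the function names are hypothetical. Note that Python's built-in round rounds exact halves to the nearest even integer, which may or may not be the rounding rule intended by reference [1].

```python
def percent_to_okta(percent):
    """Apply oktas = round(percent * 8 / 100) from the question."""
    return round(percent * 8 / 100)


def okta_to_percent(okta):
    """Inverse scaling without rounding: one okta is one eighth of 100 %."""
    return okta * 100 / 8


print(percent_to_okta(50))   # 4
print(percent_to_okta(10))   # 1  (0.8 rounds up to 1)
print(okta_to_percent(4))    # 50.0
```

Because of the rounding, the conversion is not reversible in general: 10 percent maps to 1 okta, which maps back to 12.5 percent.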

count with an input from user in Terraform?

I want to create a number of resources with Terraform, depending on the number the user enters. The number has to be between 2 and 5.
I tried:
in vars.tf:
variable "user_count" {
type = number
default = 2
description = "How many number of VMs to create (minimum 2 and no more than 5): "
}
The problem here is that it creates the resources with the default number, 2.
Another case:
variable "user_count" {
type = number
description = "How many number of VMs to create (minimum 2 and no more than 5): "
}
Here, without the default parameter, I get the message/description, but the user can enter anything!
How can I make this work? The user should get a message, and the number should be verified to be between 2 and 5; otherwise the resources should not be created.
Any help is appreciated - I am really stuck on this!
You can try a custom validation rule:
variable "user_count" {
type = number
description = "How many number of VMs to create (minimum 2 and no more than 5): "
validation {
condition = var.user_count > 1 && var.user_count < 6
error_message = "The user_count value must be between 2 and 5."
}
}
A possibly better option, instead of checking a number, is to declare the variable as a string and use a regex to check that the value is 2, 3, 4 or 5. The pattern has to be anchored; otherwise a value such as 25 would also pass.
variable "user_count" {
type = string
description = "How many number of VMs to create (minimum 2 and no more than 5): "
validation {
# regex(...) fails if it cannot find a match; ^ and $ tie the pattern to the whole value
condition = can(regex("^[2-5]$", var.user_count))
error_message = "The user_count value must be 2, 3, 4 or 5."
}
}
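One detail worth checking with the regex approach is anchoring: a pattern like 2|3|4|5 matches any string that merely contains one of those digits, while ^[2-5]$ accepts exactly one digit from 2 to 5. The sketch below uses Python's re.search as a stand-in for Terraform's regex function, on the assumption that both search for a match anywhere in the string and treat these simple patterns the same way; the helper name accepted is hypothetical.

```python
import re


def accepted(pattern, value):
    """Mimic can(regex(pattern, value)): True if the pattern matches
    anywhere in the value, False otherwise."""
    return re.search(pattern, value) is not None


unanchored = "2|3|4|5"
anchored = "^[2-5]$"

# Print each candidate with the verdict of both patterns.
for value in ["2", "5", "25", "12", "6"]:
    print(value, accepted(unanchored, value), accepted(anchored, value))
```

With the unanchored pattern, "25" and "12" pass even though they are outside the allowed range; the anchored pattern rejects them.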

Possible to write Typology into dataset?

I am new to R and TraMineR.
I built a typology of a life course dataset with TraMineR and the cluster library in R
(using this guide: http://traminer.unige.ch/preview-typology.shtml).
Now I have the cases sorted into different types (4 types in all).
I want to analyse one particular type in more depth, but I need to know which cases (I have case numbers) belong to which type.
Is it possible to write the type each case was assigned to into the dataset itself as a new variable? Or is there another way?
In the example of the referenced guide, the type is obtained as follows, using optimal matching distances with substitution costs based on transition probabilities:
library(TraMineR)
data(mvad)
mvad.seq <- seqdef(mvad, 17:86)
dist.om1 <- seqdist(mvad.seq, method = "OM", indel = 1, sm = "TRATE")
library(cluster)
clusterward1 <- agnes(dist.om1, diss = TRUE, method = "ward")
cl1.4 <- cutree(clusterward1, k = 4)
cl1.4 is a vector with the cluster membership of the sequences, in the order corresponding to the mvad dataset. (It could be convenient to transform it into a factor.) Therefore, you can simply add this variable as an additional column to the dataset. For instance, to name this new column Type:
mvad$Type <- cl1.4
tail(mvad[,c("id","Type")]) ## id and Type of the last 6 sequences
## id Type
## 707 707 3
## 708 708 3
## 709 709 4
## 710 710 2
## 711 711 1
## 712 712 4

Anylogic referencing columns in a collection

I am using a collection to represent the available trucks in a system. Each index holds a 1 or a 0, where 1 means the truck at that index is available. I am trying to randomly select one of the available trucks and assign its index number to a customer ID. I get an error saying the left-hand side of an assignment must be a variable, and it highlights the part of the code reading Available_Trucks() = 1. This is the code:
agent.ID = randomWhere(Available_Trucks, Available_Trucks() = 1);
The way you are doing it won't work. When applied to a collection of integers, randomWhere returns an element of the collection (in this case 1 or 0).
So doing
randomWhere(Available_Trucks,at->at==1); //this is the right syntax
will always return 1, since that is the value of the element chosen from the collection. What you need is the index of an element of the collection that is equal to 1, and you will have to write a function for that yourself. Something like this (probably not the best way, but it works): agent.ID=getRandomAvailableTruck(Available_Trucks);
The function getRandomAvailableTruck takes a collection as its argument (probably an ArrayList). It returns -1 if there is no available truck.
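// Count the trucks currently flagged as available (value 1).
int availableTrucks = count(collection, c -> c == 1);
if (availableTrucks == 0) return -1;
// Choose which of the available trucks to take (1-based).
int rand = uniform_discr(1, availableTrucks);
int i = 0; // available trucks seen so far
int j = 0; // current index in the collection
while (i < rand) {
    if (collection.get(j) == 1) {
        i++;
        if (i == rand) {
            return j; // index of the chosen truck
        }
    }
    j++;
}
return -1;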
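The same idea, selecting a random index among the entries equal to 1, can be sketched in Python for illustration. This is not AnyLogic code; the function name and the optional rng parameter are hypothetical. It first collects the indexes of all available trucks and then picks one of them uniformly.

```python
import random


def get_random_available_truck(availability, rng=random):
    """Return the index of a randomly chosen entry equal to 1,
    or -1 if no truck is available."""
    available_indexes = [i for i, flag in enumerate(availability) if flag == 1]
    if not available_indexes:
        return -1
    return rng.choice(available_indexes)


# Trucks 0, 2 and 3 are available; truck 1 is not.
chosen = get_random_available_truck([1, 0, 1, 1])
print(chosen)
```

Building the list of candidate indexes costs one pass over the collection, the same as the counting loop above, and makes the "no truck available" case explicit.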
Another idea: instead of using 0 and 1 for availability, use consecutive numbers 1, 2, 3, 4, 5, etc., and put a 0 where a truck is not available. For instance, if truck 3 is not available, the array will be 1,2,0,4,5, and if it is available it will be 1,2,3,4,5.
In that case you can use
agent.ID=randomWhere(Available_Trucks,at->at>0);
But you will get an error if there is no available truck, so check for that.
Nevertheless, what you are doing is horrible practice. There is a much easier way if your truck is an agent and you store the availability in the truck itself.
Then you can just do
Truck truck=randomWhere(trucks,t->t.available==1);
if(truck!=null)
agent.ID=truck.ID;

How make logic OR between vertex Indexes in Titan 1.0 / TP3 3.01 using predicate Text

During my migration from TP2 0.54 to TP3 (Titan 1.0 / TinkerPop 3.01),
I'm trying to build a Gremlin query that performs a logical OR, using Text predicates, between properties on different vertex indexes.
Something like:
------------------- PRE-DEFINED ES INDEXES: ------------------
tg = TitanFactory.open('../conf/titan-cassandra-es.properties')
tm = tg.openManagement();
g=tg.traversal();
PropertyKey pNodeType = createPropertyKey(tm, "nodeType", String.class, Cardinality.SINGLE);
PropertyKey userContent = createPropertyKey(tm, "userContent", String.class, Cardinality.SINGLE);
PropertyKey storyContent = createPropertyKey(tm, "storyContent", String.class, Cardinality.SINGLE);
//"storyContent" : is elasticsearch backend index - mixed
tm.buildIndex(indexName, Vertex.class).addKey(storyContent, Mapping.TEXTSTRING.asParameter()).addKey(pNodeType, Mapping.TEXTSTRING.asParameter()).buildMixedIndex("search");
//"userContent" : is elasticsearch backend index - mixed
tm.buildIndex(indexName, Vertex.class).addKey(userContent, Mapping.TEXTSTRING.asParameter()).addKey(pNodeType, Mapping.TEXTSTRING.asParameter()).buildMixedIndex("search");
v1= g.addVertex()
v1.property("nodeType","USER")
v1.property("userContent" , "dccsdsadas")
v2= g.addVertex()
v2.property("nodeType","STORY")
v2.property("storyContent" , "abdsds")
v3= g.addVertex()
v3.property("nodeType","STORY")
v3.property("storyContent" , "xxxx")
v4= g.addVertex()
v4.property("nodeType","STORY")
v4.property("storyContent" , "abdsds")
// etc.
------------------- EXPECTED RESULT: -----------
I want to return all vertices whose "storyContent" property matches a text prefix, OR all vertices whose "userContent" property matches its own text prefix.
In this case the query should return v1 and v2: v3 does not match, and v4 duplicates the content of v2, so it must be removed by a dedup step.
g.V().has("storyContent", textContainsPrefix("ab")) "OR" has("userContent", textContainsPrefix("dc"))
or maybe :
g.V().or(_().has('storyContent', textContainsPrefix("abc")), _().has('userContent', textContainsPrefix("dcc")))
PS,
I thought of using the TP3 OR step with dedup, but Gremlin throws an error ...
Thanks for any help
Vitaly
How about something along these lines:
g.V().or(
has('storyContent', textContainsPrefix("abc")),
has('userContent', textContainsPrefix("dcc"))
)
Edit - as mentioned in the comments, this query won't use any index. It must be split into two separate queries.
See TinkerPop v3.0.1 Drop Step documentation and Titan v1.0.0 Ch. 20 - Index Parameters and Full-Text Search documentation.
With Titan, you might have to import the text predicates first:
import static com.thinkaurelius.titan.core.attribute.Text.*
_() is TinkerPop 2 syntax and is no longer used in TinkerPop 3. You now use anonymous traversals as predicates, which sometimes have to start with __. for steps whose names are reserved keywords in Groovy (for example __.in()).