Using UIMA Ruta: How do I annotate the first token of a text and use that annotation further?

I would like to annotate the first token of a text and use that annotation in following rules. I have tried different patterns:
Token.begin == 0 (doesn't work, although there definitely is a token that begins at 0)
Token{STARTSWITH(DocumentMetaData)}; (also doesn't work)
The only pattern that works is:
Document{->MARKFIRST(First)};
But if I try to use that annotation e.g. in the following way:
First{->MARK(FirstAgain)};
it doesn't work again. This makes absolutely no sense to me. There seems to be a really weird behaviour with annotations that start at 0.

This trivial task can be a bit tricky indeed, mainly because of the visibility settings. I do not know why your rules in the question do not work without having a look at the text that should be processed.
As of UIMA Ruta 2.7.0, I prefer a rule like:
# Token{->First};
Here some additional thoughts about the rules in the question:
Token.begin == 0;
Normally, there is no token that begins at 0, since the document starts with some whitespace or line breaks. If there actually is a token that starts at offset 0 and the rule does not match, then something invisible is covering the begin or the end of the token. This depends, of course, on the filtering settings, but if you did not change them, it could be a BOM (byte order mark).
Token{STARTSWITH(DocumentMetaData)};
Here, either the problem above applies, or the begin offset is not identical. If the DocumentMetaData covers the complete document, then I would bet on the leading whitespaces. Another reason could be that the internal indexing is broken, e.g., the tokens or the DocumentMetaData are created by an external analysis engine which was called with EXEC and no reindexing was configured in the action. This situation could also occur with unfortunate optimizations using the config params.
Document{->MARKFIRST(First)};
First{->MARK(FirstAgain)};
MARKFIRST creates an annotation using the offset of the first RutaBasic in the matched context IIRC. If the document starts with something invisible, e.g., a line break, then the second rule cannot match.
As general advice for situations like this, where some obviously simple rules do not work as expected, I recommend adding some additional rules and using the debugging config with the explanation view. A rule like Token; can directly highlight whether the visibility settings are problematic for the given tokens.
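Independently of Ruta, it can help to check what the raw text actually starts with. The following is a minimal sketch in plain Java, not part of the Ruta API; the class and method names are made up for illustration. It assumes that "invisible" means either a BOM (U+FEFF) or anything Java's Character.isWhitespace accepts, which may differ from the filtering settings of your Ruta script.

```java
class LeadingInvisibleCheck {

    // Returns the offset of the first character that is neither a BOM (U+FEFF)
    // nor whitespace according to Character.isWhitespace, or -1 if there is none.
    // Assumption: this only approximates what Ruta treats as invisible by default.
    static int firstVisibleOffset(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c != '\uFEFF' && !Character.isWhitespace(c)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // A text that really starts with a visible character.
        System.out.println(firstVisibleOffset("Hello world"));
        // A text with a leading BOM: the first visible character is not at offset 0.
        System.out.println(firstVisibleOffset("\uFEFFHello world"));
        // A text with a leading line break and two spaces.
        System.out.println(firstVisibleOffset("\n  Hello world"));
    }
}
```

If the returned offset is greater than 0, the first token most likely does not begin at 0, or something invisible precedes it, which would explain why rules anchored at the document start do not match.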
DISCLAIMER: I am a developer of UIMA Ruta

Related

Possibility of a multilanguage 'source' name with Twincat Eventlogger

Roald has written an excellent guide for the Twincat Eventlogger.
https://roald87.github.io/twincat/2020/11/03/twincat-eventlogger-plc-part.html
https://roald87.github.io/twincat/2021/01/20/twincat-eventlogger-hmi-part.html
For us this is exactly what we want. There is, however, one thing I haven't figured out: how to get the source name of the alarm in multiple languages in the HMI. params::sourceName gives the path in the software (example: MAIN.fbConveyor1.Cylinder1). This path can be customized when initializing the alarm (as Roald has shown). This doesn't work in my case, since I would like to define a generic alarm (example: "Cylinder not retracted within maximum time") that is instantiated multiple times.
I was thinking of using the source as a way to show the operator where the alarm occurs. We use this way (path) already for saving machine settings among other things. The machines we build are installed all over the world, so multilanguage is a must.
Beckhoff does support multilanguage alarm names (when defined), but the source is not defined statically; it is generated dynamically.
Anyone have an idea how this problem can be solved?
If I understand your question correctly, then being able to parameterize the event text with information of the source of the problem should help you out.
If you define the event text as Cylinder {0} has not retracted in time. then you can add the arguments of that text during runtime.
IF bRaiseAlarm THEN
    bRaiseAlarm := FALSE;
    fbAlarm.ipArguments.Clear().AddString('Alice');
    fbAlarm.Raise(0);
END_IF
However, since this is also stated in the articles you mentioned, I am unsure whether this would solve your problem.
'Alice' in this example, can be hard to localize. The following options come to my mind.
The string can be based on an ENUM. Enums can have textlist support, so if you add your translations there, that should allow multilingual output. However... this does require a lot of setup, placing translations inside your code, and making sure the PLC application is aware of the language that the parameter should use.
Use tags to mark the source device, as tags can be language invariant. It is not the most user-friendly method, but it could work for you. It would become something like: "Cylinder 'AA.1123' did not retract in time.". 'AA.1123' as a tag would have to be stored inside your PLC code as a string. You will have to trust that your operator can relate the tag back to the actual source.
Hopefully, this helped, or else please help me understand the problem better.

How do I configure the ModSecurity engine to be ON for a single attack type and DetectionOnly for all others?

I need to gradually implement ModSecurity. It must be configured to only block attacks by a single attack type (e.g. SQLi), but log all other attacks from the other attack types.
For ease of upgrading the owasp rules, it is recommended to avoid modifying the original owasp rules. Ideally I'm looking for a solution which will follow this guideline and won't require modifying the original owasp rules.
Currently my test configuration is only accomplishing part of this. With this Debian installation of ModSecurity, I have removed individual rule files from /usr/share/modsecurity-crs/rules/*.conf from the configuration. This allows me to enable ModSecurity with engine=on and only the rule sets for the particular attack type loaded in the configuration, but it is not logging the incidents of other attack types.
You’ve a few options:
1) Use anomaly scoring and the sql_injection_score value that the OWASP CRS sets for SQLi rules.
Set your mode to DetectionOnly.
Set your anomaly scoring values very high in
Add a new rule that blocks if sql_injection_score is above a certain amount.
This can be achieved with an extra rule like this:
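SecRule TX:SQL_INJECTION_SCORE "@gt 1" \
    "id:9999,\
    phase:2,\
    ctl:ruleEngine=On,\
    block"
Set the "@gt 1" to an appropriate threshold. The rule has to be loaded after the CRS SQLi rules so that the score is already set, and it has to run in a phase where blocking is still possible (phase 5, the logging phase, is too late for that).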
The OWASP CRS sets similar variables for other categories as well.
2) Load the rules individually and add rules before and after the SQLi rules to switch the rule engine on and back to DetectionOnly.
Within a phase, rules are executed in the order specified. You can use this to have a config like the following:
SecRuleEngine DetectionOnly
Include rules/other_rules_1.conf
Include rules/other_rules_2.conf
SecAction "id:9000,phase:2,pass,nolog,ctl:ruleEngine=On"
Include rules/sqli_rules.conf
SecAction "id:9001,phase:2,pass,nolog,ctl:ruleEngine=DetectionOnly"
Include rules/other_rules_3.conf
Include rules/other_rules_4.conf
However, if a category contains rules in several phases, then you'll need to add several SecActions, one pair for each phase used.
3) Activate the rules you want by altering their actions to include turning on the rule engine.
Set your mode to DetectionOnly.
Use SecRuleUpdateActionById to add a ctl:ruleEngine=On to the rules you want on. It would be nice if there was a SecRuleUpdateActionByTag or SecRuleAddActionByTag, but there isn't (though it has been asked for in the past).
This is probably a bit fragile, as it depends on knowing the specific rule ids and also requires checking the actions per rule or assuming they are all the same. It is probably better to just edit the CRS files, to be honest.
This is probably the best option if you only want to enable a set of rules rather than a full category.
4) Edit the files, to do the same as above directly.
This is not a bad option if you know this will be a short term option and eventually you hope to enable all rules anyway. Revert the file back when ready.
Alternatively leave the original rules in place and copy the rules, giving them new ids, and with the addition of the ctl:ruleEngine=on action.

Turn off Blazegraph inference

When I add triples that use rdfs:subPropertyOf, Blazegraph adds a reflexive triple. After doing some research, I came to the conclusion that I could turn off this behavior by going into the blazegraph.properties file and uncommenting this line:
com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms
I then deleted the journal and namespace my app is using, started up the service again, recreated the namespace, and tested to see if I could add a subPropertyOf triple without the extra triples being added in, and I still can't. Is there anything else I need to do? Or am I incorrect in thinking this behavior is inference?

UIMA Ruta negating conditions

This might be a trivial question, but I'm new to Ruta so bear with me please.
My testdata consists of numbers in the following format:
0.1mm 0,11mm 1.1mm 1,1mm 1mm
I use the following rule to annotate the first four examples:
((NUM(COMMA|PERIOD)NUM) W{REGEXP("mm")}) {-> nummm};
Document{->MARK(nummm)};
Now I want to annotate "1mm", for example, too, but I'm kind of stuck right now, because I have no idea how to do this. I tried negating Conditions, like AFTER (as in "if NUM mm not after comma or period"), but it either didn't work or the syntax was wrong. Any help would be appreciated!
EDIT: I should add that I want to annotate "1mm", but not the 1mm part after a comma or period. As of right now, I basically annotate everything twice.
There are really a lot of ways to specify this in UIMA Ruta.
Here's the first thing that came to my mind:
(NUM{-PARTOF(nummm)} (PM{PARTOF({COMMA,PERIOD})} NUM)? W{REGEXP("mm")}){-> nummm};
This is probably not the "best" rule but should do what you want. There are three main changes:
I made the middle part of the rule optional so that it also matches on a single NUM.
I added the negated PARTOF at the first rule element, so the matching will fail if the starting point is already covered by a nummm annotation. The - is a shortcut for the NOT condition.
I replaced the expensive disjunctive composed rule element with a simple one just because it is not really necessary here.
This rule works because the actions of a rule match are already executed before the next rule match is considered.
DISCLAIMER: I am a developer of UIMA Ruta.

driver.findElement doesn't find the tab element

I have this problem with my test: the call
driver.findElement(By.xpath("//html/body/div[2]/div/div/div[2]/div[2]/div/div[2]/div/div/div/div/div/div/div/ul/li[2]/a[2]/em/span/span/span")).click();
doesn't find the element.
Eclipse shows this error message:
Cannot locate a node using
//html/body/div[2]/div/div/div[2]/div[2]/div/div[2]/div/div/div/div/div/div/div/ul/li[2]/a[2]/em/span/span/span
EDIT : Post edited to reflect answer to actual problem. Original answer follows.
Long XPath expressions are fragile, and tests are prone to fail when relying on them : a completely unrelated change somewhere else in the document can mess everything up, and even if you're aware of the problem, the tests' code is just harder to maintain.
In this particular case, since the site is generated by GWT, it's even worse - there is little control over the actual HTML changes. A good solution when using GWT is to use the ensureDebugId method (see link in comments).
Are you sure that this XPath expression is correct? Do other tests work with this driver?
I'd recommend avoiding the use of long XPath expressions like that - wouldn't it be safer in the long term to start the expression at an id-specified div somewhere in the page rather than at the root of the DOM ?
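To illustrate why an id-anchored expression tends to survive unrelated markup changes, here is a small sketch that uses only the XPath support in the Java standard library instead of a browser. The markup, the id "tabs", and the class name are invented for the example; they are not taken from the page in the question. With Selenium, the anchored expression would be passed to By.xpath in the same way as the long one.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathFragilityDemo {

    // Hypothetical page: a tab list with an id.
    static final String BEFORE =
        "<html><body><div><ul id='tabs'><li><a>One</a></li><li><a>Two</a></li></ul></div></body></html>";

    // The same page after an unrelated wrapper div was added around the content.
    static final String AFTER =
        "<html><body><div><div><ul id='tabs'><li><a>One</a></li><li><a>Two</a></li></ul></div></div></body></html>";

    // Expression that spells out the whole path from the document root.
    static final String ABSOLUTE = "/html/body/div/ul/li[2]/a";

    // Expression that starts at the element with a known id.
    static final String ANCHORED = "//ul[@id='tabs']/li[2]/a";

    // Parses the markup and returns how many nodes the expression selects.
    static int countMatches(String markup, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("before, absolute: " + countMatches(BEFORE, ABSOLUTE));
        System.out.println("before, anchored: " + countMatches(BEFORE, ANCHORED));
        System.out.println("after,  absolute: " + countMatches(AFTER, ABSOLUTE));
        System.out.println("after,  anchored: " + countMatches(AFTER, ANCHORED));
    }
}
```

In this sketch both expressions select the second tab link in the original markup, but only the anchored one still selects it once the extra wrapper div is present.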