Using covered text on an annotator in UIMA RUTA - uima

I would like to use the covered text of an annotation as an input in the subsequent code (e.g., to name another annotation). Is it possible to recall the covered text and refer to it again in the code? For example, if I have the following text -
Heading1
.............(Text 1)
Heading2
..............(Text 2)
Code:
DECLARE Header
"Heading1" {-> MARK(Header)}
DECLARE Text_Heading1 (where Heading1 = covered text of header)
Is it possible to do this in Ruta?
Thanks a lot!

DECLARE Header;
DECLARE TextHeading (STRING headerText);
"Heading." -> Header;
Header{-> TextHeading, TextHeading.headerText=Header.ct};
"ct" refers to the covered text of the matched annotation as a feature. If the annotations should be created at different positions, you may need to use additional language elements like a string variable, MATCHEDTEXT, and CREATE/GATHER or label expressions.

Related

xpath for named LabeledStatement or BreakStatement

I want to create a PMD rule to forbid the usage of labeled statements.
Sadly, I could not find a common XPath for such statements.
I need an XPath query which finds
//LabeledStatement for the labels themselves
and for the
ContinueStatement and BreakStatement I would need a possibility to check whether a label is defined there.
From the PMD Rule builder (XPath builder) the labels are defined as:
BreakStatement:loop (loop is the defined label name and could be anything)
ContinueStatement:loop (loop is the defined label name and could be anything)
Can someone give me a hint what XPath I should define?
You are on a very good track. Using the rule designer is a great way to figure this one out, especially since PMD 6.0.0, which revamped the GUI.
As you figured, //LabeledStatement will match all labels (which you don't want), and //BreakStatement and //ContinueStatement will flag all breaks / continues, which you only want to flag if they are followed by a label.
Therefore, you simply need to check whether those have a label set or not. Using the designer to inspect the properties of those AST nodes makes this easy to figure out: the attribute where the label is stored is Image, which is null when no label is defined. As XPath stringifies all attributes, a null value becomes an empty string.
Therefore:
//LabeledStatement | //BreakStatement[@Image != ""] | //ContinueStatement[@Image != ""]
Will match:
All labels
All breaks with a label
All continues with a label
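To see how the attribute test behaves, here is a minimal sketch that evaluates the same expression with the JDK's built-in XPath engine. It does not use PMD: the XML string is a hand-written stand-in for the AST, and the class and method names are made up for illustration. The toy document models an unlabeled break as an empty Image attribute and an unlabeled continue as a missing one; in XPath 1.0 both fail the [@Image != ""] predicate.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LabelXPathDemo {
    // Hand-written XML stand-in for a Java AST; this is NOT output produced by PMD.
    static final String TOY_AST =
        "<CompilationUnit>"
        + "<LabeledStatement Image='loop'/>"
        + "<BreakStatement Image='loop'/>"     // break loop;
        + "<BreakStatement Image=''/>"         // plain break; (empty label)
        + "<ContinueStatement Image='loop'/>"  // continue loop;
        + "<ContinueStatement/>"               // plain continue; (no attribute)
        + "</CompilationUnit>";

    // The expression from the answer above.
    static final String RULE =
        "//LabeledStatement | //BreakStatement[@Image != \"\"] | //ContinueStatement[@Image != \"\"]";

    // Counts the nodes selected by the expression in the given XML text.
    static int countMatches(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Label, labeled break and labeled continue match; the two plain statements do not.
        System.out.println(countMatches(TOY_AST, RULE)); // prints 3
    }
}
```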

Modify all text output by TYPO3

I would like to create a "cleanup" extension that replaces various characters (quotes by guillemets) in all kinds of textfields in TYPO3.
I thought about extending <f:format.html> or parseFunc, but I don't know where to "plug in" so I get to replace output content easily before it's cached.
Any ideas, can you give me an example?
If you don't mind regexing, try this:
$GLOBALS['TYPO3_CONF_VARS']['SC_OPTIONS']['tslib/class.tslib_fe.php']['cleanUpQuotes'][] = \NAMESPACE\Your\Extension::class;
Insert it into ext_localconf.php and this part is done.
The next step is the class itself:
public function cleanUpQuotes(TypoScriptFrontendController $parentObject)
{
$parentObject->content = DO_YOUR_THING_HERE
}
There is also another possibility, which can replace any string in the whole page, as it operates on the rendered page (and not only on single fields).
You can even use regular expressions.
Look at my answer -> here

To find whether annotation type is part of the covered text

Sample Input file:
<p class="Head1"><a name="para1">Sections 87-89</a></p>
some text
<p class="Head2"><a name="para2">Sections 90-92</a></p>
some text
<p class="ParaFL"><a name="para3">Some Text1</a></p>
<p class="ParaFirstLineInd"><a name="para4">Some Text2</a></p>
For example, in the sample input file I annotate "Sections 87-89" and "Sections 90-92" as Head1. Now I want to compare the annotation type (Head1) with the class of the enclosing tag (class="Head1", class="Head"). If the annotation type is not equal to the class, I want to set a feature "class changed" on the corresponding annotation. Similarly, "Some Text1" and "Some Text2" are annotated as ParaFL (annotation type).
It depends on how the required information is represented. I assume that the class information is represented by the HtmlTypeSystem in Ruta.
There are two language elements missing in Ruta (2.4.0) that would be needed to solve this. The main problem is that the attribute information of an HTML tag is stored in two separate arrays, and there is no option in Ruta to iterate over them jointly. The second one is autoboxing of types to strings.
I recommend creating an analysis engine (which can also be executed from within a Ruta script) that creates new annotations with one string feature containing the required information. Then you can compare the annotation to the feature value. Autoboxing does not convert the short type names, so I would add a feature to your annotation types holding the corresponding type/class values. Then you can compare the feature values.

using JFace text Singlelinerule

Is it possible to use the SingleLineRule in RuleBasedPartitionScanner to detect
whether the partition starts with an alphabetic character or a space?
If you have rules for //, //* and /* you don't need a rule to cover the remaining text - that text will be put in the default IDocument.DEFAULT_CONTENT_TYPE partition.
Update:
Neither SingleLineRule nor its parent class PatternRule supports testing for a range of characters. However, you could write your own implementation of IPredicateRule to do this; look at the PatternRule implementation to see how columns and matching are handled.
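The core of such a custom rule can be sketched without JFace on the classpath. In the sketch below, StringScanner is a hypothetical stand-in that only mimics the read()/unread() behaviour of ICharacterScanner, and evaluate returns a boolean where a real IPredicateRule would return its success token or Token.UNDEFINED. The important detail is that a rule which does not match must unread what it has read, so that the next rule sees the same character.

```java
public class FirstCharRuleSketch {
    /** Hypothetical stand-in for JFace's ICharacterScanner (read/unread only). */
    static final class StringScanner {
        static final int EOF = -1;
        private final String text;
        private int offset;

        StringScanner(String text) { this.text = text; }

        // Returns the next character, or EOF; always advances so unread() stays symmetric.
        int read() {
            int c = offset < text.length() ? text.charAt(offset) : EOF;
            offset++;
            return c;
        }

        void unread() { offset--; }

        int offset() { return offset; }
    }

    // Matches a line whose first character is a letter or a space.
    // On a match the scanner is left just after the line delimiter (or at end of input);
    // on a mismatch the scanner is rewound to where it started.
    static boolean evaluate(StringScanner scanner) {
        int c = scanner.read();
        if (c == StringScanner.EOF || !(Character.isLetter(c) || c == ' ')) {
            scanner.unread(); // rewind so other rules can inspect this character
            return false;
        }
        do {
            c = scanner.read(); // consume the rest of the line
        } while (c != StringScanner.EOF && c != '\n');
        if (c == StringScanner.EOF) {
            scanner.unread(); // do not stay positioned past the end of the input
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(evaluate(new StringScanner("abc\n// comment"))); // prints true
        System.out.println(evaluate(new StringScanner("// comment")));      // prints false
    }
}
```

In a real implementation the same logic would live in evaluate(ICharacterScanner, boolean), with line delimiters taken from the scanner rather than hard-coded as '\n'.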

Custom field in backend with new extensions

When I build a new extension with Quickstarter, I customize the backend in the file ext_tables.php, using this line:
$TCA['tt_content']['types'][$_EXTKEY . '_pi1']['showitem'] = 'CType, header,media;Images';
There I can add new fields and even rename them. But sometimes I find weird suffixes on a field, for example "media;;;;1-1-1", which control other things that appear around the controls in the backend.
How can I know what these codes mean?
Have a look at the TCA documentation; it contains a description of the types section ($TCA['tt_content']['types']), including a table that explains ['showitem']. Each comma-separated entry consists of up to five parts, separated by semicolons:
Part 1: Field name reference (Required!)
Part 2: Alternative field label (string or LLL reference)
Part 3: Palette number (referring to an entry in the "palettes" section).
Part 4: Special configuration (split by colon ( : )), e.g. 'nowrap' and 'richtext[(list of keys or *)]' (see “Additional $TCA features”)
Part 5: Form style codes (see “Visual style of TCEforms”)
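As a small illustration of the five parts listed above, the following sketch splits a single showitem entry on semicolons and pads missing parts with empty strings. It is only a reading aid and not TYPO3 code; the class and method names are made up for this example.

```java
import java.util.Arrays;
import java.util.List;

public class ShowItemParts {
    // Names of the five positions, in the order given in the list above.
    static final String[] PART_NAMES = {
        "field name", "alternative label", "palette number",
        "special configuration", "form style codes"
    };

    // Splits one showitem entry on ';', keeps empty parts, and pads to five parts.
    static List<String> split(String entry) {
        String[] raw = entry.trim().split(";", -1); // -1 keeps trailing empty parts
        String[] parts = new String[PART_NAMES.length];
        for (int i = 0; i < parts.length; i++) {
            parts[i] = i < raw.length ? raw[i] : "";
        }
        return Arrays.asList(parts);
    }

    public static void main(String[] args) {
        // "media;;;;1-1-1": field "media", parts 2 to 4 empty, style codes "1-1-1".
        List<String> parts = split("media;;;;1-1-1");
        for (int i = 0; i < parts.size(); i++) {
            System.out.println(PART_NAMES[i] + " = '" + parts.get(i) + "'");
        }
    }
}
```

Applied to the entry from the question, "media;Images" yields the field name "media" with the alternative label "Images" and three empty trailing parts.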