UIMA RUTA - Sofa mapping in Aggregate Pipeline

This is in regards to the question "UIMA RUTA - how to do find & replace using regular expression and groups".
I'm trying to set up Sofa mappings as suggested there. I have an aggregate AE with several AEs and am trying to incorporate 2 RUTA AEs/scripts within this pipeline. Both RUTA AEs (and associated scripts) are responsible for REGEXP find and replace using a Modifier. The 2nd AE is dependent on the output of the first AE. I had to configure the modifier's outputView of the 2nd AE, otherwise I was getting a 'Sofa data already set' exception.
In essence, I'm unable to weave the output of one as the input of the other AE.
The setup I have is similar to the following:
_initialview --Input> (Normalizer1 RUTA AE) --Output> norm_1_out
norm_1_out --Input> (Normalizer2 RUTA AE) --Output> norm_2_out
norm_2_out --Input> (Other AE)
Here's the aggregate AE descriptor:
<?xml version="1.0" encoding="UTF-8"?>
<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
  <primitive>false</primitive>
  <delegateAnalysisEngineSpecifiers>
    <delegateAnalysisEngine key="NormalizerPrepStep1">
      <import location="../../../ruta-annotators/desc/NormalizeNumbersEngine.xml"/>
    </delegateAnalysisEngine>
    <delegateAnalysisEngine key="NormalizerPrepStep2">
      <import location="../../../ruta-annotators/desc/NormalizeRangesEngine.xml"/>
    </delegateAnalysisEngine>
    <delegateAnalysisEngine key="Normalizer">
      <import location="../../../ruta-annotators/desc/NormalizerEngine.xml"/>
    </delegateAnalysisEngine>
    <delegateAnalysisEngine key="SimpleAnnotator">
      <import location="../../../textanalyzer/desc/analysis_engine/SimpleAnnotator.xml"/>
    </delegateAnalysisEngine>
  </delegateAnalysisEngineSpecifiers>
  <analysisEngineMetaData>
    <name>RUTAAggregatePlaintextProcessor</name>
    <description>Runs the complete pipeline for annotating documents in plain text format.</description>
    <version/>
    <vendor/>
    <configurationParameters searchStrategy="language_fallback">
      <configurationParameter>
        <name>SegmentID</name>
        <description/>
        <type>String</type>
        <multiValued>false</multiValued>
        <mandatory>false</mandatory>
        <overrides>
          <parameter>SimpleAnnotator/SegmentID</parameter>
        </overrides>
      </configurationParameter>
    </configurationParameters>
    <configurationParameterSettings/>
    <flowConstraints>
      <fixedFlow>
        <node>NormalizerPrepStep1</node>
        <node>NormalizerPrepStep2</node>
        <node>Normalizer</node>
        <node>SimpleAnnotator</node>
      </fixedFlow>
    </flowConstraints>
    <typePriorities>
      <name>Ordering</name>
      <description>For subiterator</description>
      <version>1.0</version>
      <priorityList>
      </priorityList>
    </typePriorities>
    <fsIndexCollection/>
    <capabilities>
      <capability>
        <inputs/>
        <outputs/>
        <inputSofas>
          <sofaName>norm_1_out</sofaName>
          <sofaName>norm_2_out</sofaName>
          <sofaName>normalized</sofaName>
        </inputSofas>
        <languagesSupported/>
      </capability>
    </capabilities>
    <operationalProperties>
      <modifiesCas>true</modifiesCas>
      <multipleDeploymentAllowed>true</multipleDeploymentAllowed>
      <outputsNewCASes>false</outputsNewCASes>
    </operationalProperties>
  </analysisEngineMetaData>
  <resourceManagerConfiguration/>
  <sofaMappings>
    <sofaMapping>
      <componentKey>SimpleAnnotator</componentKey>
      <aggregateSofaName>normalized</aggregateSofaName>
    </sofaMapping>
    <sofaMapping>
      <componentKey>NormalizerPrepStep2</componentKey>
      <aggregateSofaName>norm_1_out</aggregateSofaName>
    </sofaMapping>
    <sofaMapping>
      <componentKey>Normalizer</componentKey>
      <aggregateSofaName>norm_2_out</aggregateSofaName>
    </sofaMapping>
  </sofaMappings>
</analysisEngineDescription>
A few things to note:
All three RUTA AEs (step1, step2, normalizer) use the RUTA Modifier.
The above setup throws an exception "No sofaFS with name norm_2_out found." - this happens after step 2.
I have tried switching 'norm_2_out' to 'modified' as the input sofa to the normalizer; this seems to move the processing to the next step in the pipeline (normalizer), but that throws an exception "Data for Sofa feature setLocalSofaData() has already been set." at
org.apache.uima.ruta.engine.RutaModifier.process(RutaModifier.java:107)
I have also tried RUTA 2.2.0 (snapshot) with the same result.
As I'm relatively new to both UIMA and RUTA, I'm not sure if I'm doing something wrong or if there's a limitation that I'm running into.
BTW, I'm using RUTA 2.1.0.
Thanks

The first thing that I noticed in your example is that you have to specify output sofas in your AAE. Those are all sofas that are created in the AAE, e.g., by one of its components.
Then, there are sofa mappings missing. You have to connect the output views of the AEs with the input views of the other AEs. In your example, I only see the default input views (see the descriptor sketch at the end of this answer).
I created a unit test, which can be applied as an example for this task.
The test is here: https://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core/src/test/java/org/apache/uima/ruta/engine/CascadedModifierTest.java
The resources (descriptors) used in the test are here: https://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core/src/test/resources/org/apache/uima/ruta/engine
Mind that I deleted the absolute paths in the Ruta descriptors and adapted the namespace of the imported scripts. They are now loaded from the classpath for the test instead of using absolute paths.
The test calls an aggregate analysis engine AAE.xml, which imports and maps five analysis engines:
CWEngine.xml: a simple Ruta script (CW.ruta) that replaces capitalized words: CW{->REPLACE("CW")};
ModiferCW.xml: a normal modifier
SWEngine.xml: a simple Ruta script (SW.ruta) that replaces small-written words: SW{->REPLACE("SW")};
ModiferSW.xml: a normal modifier
SimpleEngine.xml: a simple Ruta script (Simple.ruta) that defines a new type and matches on "CW" followed by "SW": DECLARE CwSw; ("CW" "SW"){-> CwSw};
The aggregate analysis engine defines three views: global1 (input), global2 (output) and global3 (output). The sofa mapping of the components is the following:
global1 -> [CWEngine, ModiferCW] -> global2 -> [SWEngine, ModiferSW] -> global3 -> [SimpleEngine]
Given the text "Peter is tired." in the view global1, the aggregate analysis engine creates two new views, with the view global3 containing the text "CW SW SW." and one annotation of the type Simple.CwSw.
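Applied back to the descriptor in the question, the missing pieces are (a) the output sofa declarations for the views that are created inside the AAE and (b) mappings that connect each modifier's output view to the aggregate view that the next component reads as its input. Below is a minimal, hedged sketch of those two fragments; the component sofa name "modified" is an assumption based on the default output view of the RUTA Modifier, and the delegate keys are taken from the question, so adjust both to your actual descriptors:
<!-- inside the capability element, next to the inputSofas -->
<outputSofas>
  <sofaName>norm_1_out</sofaName>
  <sofaName>norm_2_out</sofaName>
</outputSofas>
<!-- sofa mappings connecting the cascade -->
<sofaMappings>
  <!-- the output view of the first modifier becomes the aggregate view norm_1_out -->
  <sofaMapping>
    <componentKey>NormalizerPrepStep1</componentKey>
    <componentSofaName>modified</componentSofaName>
    <aggregateSofaName>norm_1_out</aggregateSofaName>
  </sofaMapping>
  <!-- the second step reads norm_1_out as its default input view... -->
  <sofaMapping>
    <componentKey>NormalizerPrepStep2</componentKey>
    <aggregateSofaName>norm_1_out</aggregateSofaName>
  </sofaMapping>
  <!-- ...and writes its own modified view to norm_2_out -->
  <sofaMapping>
    <componentKey>NormalizerPrepStep2</componentKey>
    <componentSofaName>modified</componentSofaName>
    <aggregateSofaName>norm_2_out</aggregateSofaName>
  </sofaMapping>
  <!-- the last normalizer then reads norm_2_out as its default input view -->
  <sofaMapping>
    <componentKey>Normalizer</componentKey>
    <aggregateSofaName>norm_2_out</aggregateSofaName>
  </sofaMapping>
</sofaMappings>
Note that a sofaMapping without a componentSofaName maps the component's default (initial) view. The linked CascadedModifierTest wires up the CW/SW cascade in exactly this way, so its AAE.xml is probably the safest template to copy from.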

Related

reltable is not working with DITA Open Toolkit 3.0

I have created the map file below, but it is giving me an error in DITA Open Toolkit version 3.0.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<map id="ducks">
  <title>Ducks</title>
  <reltable>
    <relheader>
      <relcolspec type="concept"/>
      <relcolspec type="task"/>
      <relcolspec type="reference"/>
    </relheader>
    <relrow>
      <relcell><topicref href="c.dita"/></relcell>
      <relcell><topicref href="t.dita"/></relcell>
      <relcell><topicref href="r.dita"/></relcell>
    </relrow>
  </reltable>
</map>
The files c.dita (concept), t.dita (task), and r.dita (reference) are available.
Regards
Deepak Bhatia
Is this the full map? If so, then the problem is that you are missing the structure that generates the output. Add topicref elements to build the content and it should work, as sketched below.
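For illustration only, here is a hedged sketch of what the map could look like with topicref elements added; it assumes the three topics should simply appear in the map's navigation and reuses the file names from the question:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<map id="ducks">
  <title>Ducks</title>
  <!-- topicrefs outside the reltable generate the actual content/navigation -->
  <topicref href="c.dita"/>
  <topicref href="t.dita"/>
  <topicref href="r.dita"/>
  <!-- keep the reltable from the question unchanged below -->
  <reltable>
    ...
  </reltable>
</map>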

UIMA Ruta - basic example

I am trying the example of UIMA Ruta here.
I want to create a Ruta script and apply it to my text (from plain Java, without any Workbench).
1. How do I get the type system descriptor from plain Java (without the Workbench)?
2. When do I get it with the Workbench? (If I "run" the Ruta script, no description is created.)
The main question is whether the script declares new types.
If no new types are declared, the linked examples in the documentation should be sufficient.
If new types are declared in the script, then a type system description needs to be created and included in the creation process of the CAS before the script can be applied on the CAS.
The type system description of a script, containing the type descriptions of the types declared within the script, can be created in the following ways:
The Ruta Workbench creates the type system description automatically for each script within a simple Ruta Project when the script is saved. If no description is created, the script is most likely not parseable and contains syntax errors.
In Maven-built projects, the ruta-maven-plugin can be utilized to create the type system descriptions of Ruta scripts (see the configuration sketch right after this list).
In plain Java, the RutaDescriptorFactory can be utilized to create the type system description programmatically; a code example follows further below.
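As a rough illustration of the Maven option, this is what a ruta-maven-plugin configuration might look like in the pom.xml. Treat it as a hedged sketch: the version number is an assumption and the plugin has further configuration options (script and descriptor directories), so check the plugin's documentation for the exact values:
<!-- hedged sketch of the ruta-maven-plugin; the version is an assumption -->
<plugin>
  <groupId>org.apache.uima</groupId>
  <artifactId>ruta-maven-plugin</artifactId>
  <version>2.2.0</version>
  <executions>
    <execution>
      <goals>
        <!-- generates analysis engine and type system descriptors for the project's Ruta scripts -->
        <goal>generate</goal>
      </goals>
    </execution>
  </executions>
</plugin>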
There are several ways to create and execute a Ruta-based analysis engine in plain Java code. Here's an example without using additional files:
String rutaScript = "DECLARE MyType; CW{-> MyType};";

// parse the script and create the descriptor information
RutaDescriptorFactory descriptorFactory = new RutaDescriptorFactory();
RutaBuildOptions options = new RutaBuildOptions();
options.setResolveImports(true);
options.setImportByName(true);
RutaDescriptorInformation descriptorInformation = descriptorFactory
        .parseDescriptorInformation(rutaScript, options);

// replace null values for build environment if necessary (e.g., location in classpath)
Pair<AnalysisEngineDescription, TypeSystemDescription> descriptions = descriptorFactory
        .createDescriptions(null, null, descriptorInformation, options, null, null, null);
AnalysisEngineDescription rutaAnalysisEngineDescription = descriptions.getKey();
rutaAnalysisEngineDescription.getAnalysisEngineMetaData().getConfigurationParameterSettings()
        .setParameterValue(RutaEngine.PARAM_RULES, rutaScript);
TypeSystemDescription rutaTypeSystemDescription = descriptions.getValue();

// directly set the type system description since no file will be created
rutaAnalysisEngineDescription.getAnalysisEngineMetaData().setTypeSystem(rutaTypeSystemDescription);
ResourceManager resourceManager = UIMAFramework.newDefaultResourceManager();
AnalysisEngine ae = UIMAFramework.produceAnalysisEngine(rutaAnalysisEngineDescription);

// merge the script's type system with the type system descriptions found on the classpath
List<TypeSystemDescription> typeSystemDescriptions = new ArrayList<>();
TypeSystemDescription scannedTypeSystemDescription = TypeSystemDescriptionFactory.createTypeSystemDescription();
typeSystemDescriptions.add(scannedTypeSystemDescription);
typeSystemDescriptions.add(rutaTypeSystemDescription);
TypeSystemDescription mergedTypeSystemDescription = CasCreationUtils.mergeTypeSystems(typeSystemDescriptions, resourceManager);

// create a CAS with the merged type system, set the text, and run the engine
JCas jCas = JCasFactory.createJCas(mergedTypeSystemDescription);
CAS cas = jCas.getCas();
jCas.setDocumentText("This is my document.");
ae.process(jCas);

// print the covered text of all MyType annotations created by the script
Collection<AnnotationFS> select = CasUtil.select(cas, cas.getTypeSystem().getType("Anonymous.MyType"));
for (AnnotationFS each : select) {
    System.out.println(each.getCoveredText());
}
DISCLAIMER: I am a developer of UIMA Ruta

apigee overwriting the target.url with a custom variable

I have been trying to overwrite target.url with a variable using the Assign Message policy. Per other solutions, I have put this in the Target Endpoint section. The issue is that unless I hard-code the root section of the URL, the substitution fails. I have tried all the commented Value statements below, and then started adding the Ref statements to attempt to solve the issue - to no avail. You can see I have tried cutting the target into various snippets using Extract policies, but cannot get a solution that works.
Thanks for the help.
For the purposes of the code snippet below:
entireURL = "http://my.root.url/thestuff/morestuff"
AppServerURL = "my.root.url/thestuff/morestuff"
AppServerRoot = "my.root.url"
AppServerSfx = "thestuff/morestuff"
Code from the Assign Message policy:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<AssignMessage async="false" continueOnError="false" enabled="true" name="Post-to-named-serverL">
  <DisplayName>Post to named server</DisplayName>
  <FaultRules/>
  <Properties/>
  <AssignVariable>
    <Name>target.url</Name>
    <Value>http://${AppServerRoot}/{AppServerSfx}</Value>
    <Ref/>
    <!--
    <Value>http://my.root.url/{AppServerSfx}</Value> works but I need the root changed
    <Value>http://{AppServerRoot}/{AppServerSfx}</Value>
    <Value>http://${AppServerRoot}/{AppServerSfx}</Value>
    <Value>http://{AppServerURL}</Value>
    <Value>http://${AppServerURL}</Value>
    <Value>entireURL</Value>
    <Value>{entireURL}</Value> -- this was my first try
    <Value>${entireURL}</Value>
    <Ref>entireURL</Ref>
    <Ref>{entireURL}</Ref>
    <Ref>${entireURL}</Ref>
    -->
  </AssignVariable>
  <IgnoreUnresolvedVariables>false</IgnoreUnresolvedVariables>
  <AssignTo createNew="false" transport="http" type="request"/>
</AssignMessage>
You are correctly putting the target.url manipulation in the Target Request flow.
Using AssignMessage/AssignVariable can be limiting. The Value element doesn't allow you to do any variable substitutions.
The following worked for me:
<Ref>entireURL</Ref>
Ref also doesn't allow variable substitutions -- it just takes the name of the variable. Since you have to build the value of that variable ahead of time, using the Ref example above doesn't buy you much.
I usually accomplish target URL rewriting using a JavaScript callout with code similar to the following:
// read the URL pieces that were extracted earlier in the flow
var appServerRoot = context.getVariable("AppServerRoot");
var appServerSfx = context.getVariable("AppServerSfx");
// rebuild the full URL and overwrite target.url before the target request is sent
context.setVariable("target.url", "http://" + appServerRoot + "/" + appServerSfx);
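For completeness, a hedged sketch of how such a callout is typically wired up; the policy name and script file name here are made up for illustration:
<!-- hypothetical JavaScript policy; name and resource file are illustrative -->
<Javascript async="false" continueOnError="false" enabled="true" timeLimit="200" name="JS-Rewrite-Target-Url">
  <ResourceURL>jsc://rewriteTargetUrl.js</ResourceURL>
</Javascript>
The policy is then attached as a Step in the target endpoint's request PreFlow, so it runs after the variables AppServerRoot and AppServerSfx have been populated but before the backend call is made.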

Do not automatically add references when using the Template producer

I'm doing a CodeFluent Entities project and I use the Template Producer to generate a report that prints some statistics about my model.
As far as I can see, this producer automatically adds two references (CodeFluent.Runtime.dll and CodeFluent.Runtime.Web.dll).
This is a great feature; nevertheless, in my case I don't generate any C# classes, so the target project doesn't really need those references.
How can I disable this behavior?
The Template producer inherits from the CodeDomProducer (the one that generates the BOM). This allows the Template producer to have some useful methods like AddToGeneratedFiles, which adds the file to the Visual Studio target project, or AddCompilationReferences, which adds references to the target project.
The producer also inherits some configuration options like Target Project Layout of type CodeFluent.Model.Design.TargetProjectLayoutOptions:
[Flags]
public enum TargetProjectLayoutOptions
{
    None = 0x0,

    [Description("Update All")]
    Update = UpdateReferences | UpdateItems,

    [Description("Update References")]
    UpdateReferences = 0x1,

    [Description("Update Items")]
    UpdateItems = 0x2,

    [Description("Do Not Remove Existing Items")]
    DontRemove = 0x4,

    Default = Update,
}
As you can see, this allows you not to update project references. So, to answer your question, your producer configuration should look like this:
<cf:producer name="Template" typeName="CodeFluent.Producers.CodeDom.TemplateProducer, CodeFluent.Producers.CodeDom">
  <cf:configuration cfx:targetProjectLayout="UpdateItems" [other options] />
</cf:producer>
Happy templating :)
I'm not sure you really can remove the reference to CodeFluent.Runtime.dll, since it actually contains the template engine used in the main project; see http://blog.codefluententities.com/2013/12/26/exploring-the-codefluent-runtime-the-template-engine/.
But since the generated code may not reference CodeFluent.Runtime.dll, it is not mandatory in the target project.
I agree that it could be useful to remove the CodeFluent.Runtime.Web.dll reference if you do not need it (it actually does not contain very many things, according to http://www.softfluent.com/documentation/).

Plone 4 - History for second workflow won't show in @@historyview

I have a Dexterity content type in Plone 4.2.4. The versioning works fine with the default workflow for this content type, although it is not a workflow shipped with Plone, but a custom-made one.
However, when I enable a second workflow for the same type, everything but the versioning works fine:
Additional permissions managed by the second workflow are working.
The state changes are working.
The difference:
I used different state_variable names for the workflows, which seems to make sense in order to have a catalogable field for the state of the second workflow.
I've tried to use the same state variable name, but that didn't help. I have the workflow variable review_history also set in the 2nd workflow and sufficient permissions in the context.
I am (mostly) sure that I got the permission concept, but I have no clue how permissions get calculated when multiple workflows are involved.
Any idea why the second workflow does not leave a trace in my content type's history?
Thanks very much in advance.
Update
I've reordered the workflows as Ida Ebkes suggested and did see that all transitions from the 2nd workflow get stored properly. So it seems to be an issue with the history view.
Since these workflows indeed describe concurrent behaviors of a content type, I really would like to stick with separate workflows and ideally different workflow state variables and catalog indexes.
What I now think needs to be done, is to tweak the historyview.
Here is how I did it. It works for Plone 4.2.4 at least.
Since the problem was a display problem, I just had to tweak my history viewlet. Therefore, I created a folder named viewlets in my product root and created an __init__.py and a configure.zcml file. Then I copied content_history.pt, history_view.pt, review_history.pt and content.py from plone/app/layout/viewlets/ (omelette) to the newly created folder.
The configure.zcml contains two view registrations:
<browser:view
    for="*"
    name="my-contenthistory"
    class=".content.ContentHistoryView"
    permission="zope2.View"
    />
<browser:page
    for="*"
    name="my-historyview"
    template="history_view.pt"
    permission="zope2.View"
    />
Furthermore, I copied the whole WorkflowHistoryViewlet class from content.py under a different class name, TransferHistoryViewlet in this case. Then I changed mostly the part that corresponds to the workflow state variable, which was not review_state but transfer_state. I further found that the initial usage of the 2nd workflow also creates a 'created' entry in the history of the 2nd workflow, which I just filtered out:
transfer_history = [x for x in transfer_history if x['action'] is not None]
Then I corrected the view name in history_view.pt to my new view name:
<div tal:replace="structure here/@@my-contenthistory">Content History</div>
Finally, I added my class as a parent of the ContentHistoryViewlet class in content.py:
class ContentHistoryViewlet(WorkflowHistoryViewlet, TransferHistoryViewlet):

    index = ViewPageTemplateFile("content_history.pt")

    @memoize
    def getUserInfo(self, userid):
        [...]

    def fullHistory(self):
        history = self.workflowHistory() + self.revisionHistory() + self.transferHistory()
        if len(history) == 0:
            return None
        history.sort(key=lambda x: x["time"], reverse=True)
        return history
and registered the .zcml in the product's configure.zcml:
<include package=".viewlets" />
Then I modified content_history.pt and also changed the definition of action_id in the upper part of the file:
[...]
action_id python:item['action'] or item.get('review_state', False) or item.get('transfer_state', False);
[...]
After rebooting the monster and a product reinstall, all state changes from both workflows are shown in the my-historyview.