UIMA: Plug & Play Annotators for differen teams' chains - uima

Assume I have a UIMA toolchain that does something like this:
tokenize -> POS tagging -> assign my custom tags/annotations -> use the custom tags to assign more tags -> further processing.
Would it be possible tou use a third party, let's say entity-recognition (that uses POS tags but does not need much more), right after the POS-tagging, in between the two custom things or afterwards?
I'm asking this questions because I can see complications due to the type systems. In particular the most difficult case may be pluggin a thrid party ER annotator in between the custom things or right after them. The third party annotator won't expect our custom tags to be there.
However, there are just additional annotations that have to be "passed through" the annotator without looking at them or modifying them. So, in principle, I'd assume that this is possible. I just don't know if UIMA supports this or is all about writing full chains on your own with strict typing everywhere.
If this isn't possible out of the box, could we write the custom annotators in a way such that they can be plugged anywhere where POS tags are available independent from if there are other annotations present. I.e. as authors of annotators take care that there may be some necessary annotations, some annotations we add and any number of annotations that may be present or not and we do not care about them and only pass them through?

The third party annotator won't expect our custom tags to be there.
If I understand correctly, you are concerned that your custom annotations might collide with the third-party NER, right? It won't, unless your code adds exactly the same annotations.
This is the strength of UIMA: every Analysis Engine (AE) is independent of the others, it only cares about the annotations that are passed in the CAS. For example, say you have an AE that expects annotation of type my.namespace.Token. It doesn't matter which AE created these annotations, as long as there are present in the CAS.
The price to pay for this flexibility is that you (as a developer) have to make sure that the required annotation for each AE are present. For example, if an AE expects annotations of type my.namespace.Sentence but none are present, this AE won't be able to do any processing.

Related

How do I check valid method name for an Object in Pascal?

I have a class (character) with inherited classes (solider, medic etc) that have specific game related methods. E.g. Shoot or Heal.
I want it so that the user can type in Heal, for example, and the program can check what type of character they have and therefore see if that is a valid name of a method in that Object.
I know it's possible in other languages but can't see how to do it in Pascal. It must work in Free Pascal as well as Delphi. Thank you
You don't need to be able to check for the validity of a method name to do this, and it is probably preferable if you don't.
You could do check a method name's using RTTI, but that is implemented somewhat differently in FreePascal than Delphi, (in particular for extended RTTI).
However, it would be far more straightforward to implement your own look-up mechanism to resolve in-game entity-names, properties and verbs in a dictionary of some sort. That would be trivial in both FP and Delphi and independent of the compiler used. It would also allow the names used by the end-user to be independent of the names used in code, which would be easier internationalisation, etc. It would also avoid the problem which would arise if an in-game identifier contained a character not permitted in a Pascal identifier (such as a space, accented character or whatever).
PS: You didn't ask this, BUT ... if I were contemplating writing a text-game of any size, I would seriously consider doing it as a hybrid Delphi of Prolog: Delpi for the gui and Prolog as a far easier language in which to code in-game actions, objects and rules, and there is one paricular implementation, Amzi Prolog, which has a very rich interface for interfacing a Prolog engine with Delphi -see https://www.amzi.com/#apls. Amzi used to be commercial but is now PD, fwiw.

Are there reliable algorithms for generating robust DOM node selectors, given only the target node?

In writing a scraper, we typically use some kind of selector to identify particular nodes of interest. Ideally the selectors should continue to work even as the page changes over time. A lot of the common approaches like grabbing nodes by id are fragile on frequently updated pages and impossible on some nodes. I'm trying to find good algorithms for generating robust selectors, but since there doesn't seem to be a standard terminology for this problem, it's hard to find everything that's out there.
Here are the selector DSLs I already know.
XPath selectors - Implemented everywhere from JS to the popular
Python and Ruby scraping libraries.
CSS selectors - Found in many of the places where you can find xpath
selectors.
High level selectors - Here I'll give the example of Chickenfoot,
which allows users to write click("begin tutorial") to find a link
with the text "Begin Tutorial." Usually these are implemented on top of
xpath and CSS selectors. I'd love to find out about more members of
this language family.
Visual selectors - This would be the approach taken by, for instance,
Sikuli, which makes it appear as though the program is calling a
function on a screengrab of the relevant node. I don't know any
web-specific instances of this approach, but I imagine there are
some.
Here are the selector generation algorithms I already know. By a selector generation algorithm I mean an algorithm that takes a node as input and produces a robust selector as output.
iMacros: Finds all elements with the same node type and text as the
target element, finds the target element's index in this list list. Uses
the node type, text, and index as the selector. Also includes id
for forms and form elements.
CoScripter: Uses element's text if available. If not, uses preceding
text.
Selenium: Uses id where available. Uses various other attributes
otherwise, such as image alt text, links' displayed texts, buttons'
displayed texts.
Wargo System: Uses element text.
Many systems: Many systems use the xpath from the root to the target node, or some
suffix of that xpath.
All of these selector generation algorithms fail on some nodes. Are there better approaches out there? Or other approaches that I could combine with these algorithms to produce a better hybrid algorithm?
When I started investigating this topic for some work I am doing, I was also surprised by how little information is available on this topic.
I did find this 2003 paper, but unfortunately, I only have access to the abstract:
Abe, Mari, and Masahiro Hori. “Robust Pointing by XPath Language: Authoring Support and Empirical Evaluation.” In Proceedings of the 2003 Symposium on Applications and the Internet, 156 – . SAINT ’03. Washington, DC, USA: IEEE Computer Society, 2003.
For my own use, I followed the approach in Tim Vasil's 50-line jquery plugin. I won't reproduce the code which is available at that link, but instead I'll describe it:
It recursively traverses up the DOM tree from the element, building a selector "backwards". At each level:
If the node has an ID, just use that and skip all the parents; they aren't added to the selector.
If node has a tag name or a set of classes that is unique among its siblings, use that as the selector. Otherwise, use :nth-child.
Since I will be storing element contents between visits to a page, I'm thinking of implementing some "blunder detection" here, maybe using a percentage change from last visit to detect if the selector may be grabbing the wrong element.

Has anyone implement DITA 1.2 Subject scheme in their work?

I would like to know if there is anyone who has implemented the subjectscheme maps of DITA1.2 in their work? If yes, can you please break-up the example to show:
how to do it?
when not to use it?
I am aware of the theory behind it, but I am yet to implement the same and I wanted to know if there are things I must keep in mind during the planning and implementation phase.
An example is here:
How to use DITA subjectSchemes?
The DITA 1.2 spec also has a good example (3.1.5.1.1).
What you can currently do with subject scheme maps is:
define a taxonomy
bind the taxonomy to a profiling or flagging attribute, so that it the attribute only takes a value that you have defined
filter or flag elements that have a defined value with a DITAVAL file.
Advantage 1: Since you have a taxonomy, filtering a parent value also filters its children, which is convenient.
Advantage 2: You can fully define and thus control the list of values, which prevents tag bloat.
Advantage 3: You can reuse the subject scheme map in many topic maps, in the usual modular DITA way, so you can apply the same taxonomies anywhere.
These appear to be the main uses for a subject scheme map at present.
The only disadvantages I have found is that I can think of other hypothetical uses for subject scheme maps such as faceted browsing, but I don't think any implementation exists. The DITA-OT doesn't have anything like that yet anyway.

CMFWorkflow and Marker Interfaces

I'm currently prototyping a small project in Plone and trying to KISS as much as possible while the requirements are still in flux. To that end, I've resisted creating any custom content types for now and have been using marker interfaces to distinguish between "types" of content.
Now that I'm looking at workflow, I've realised that they're bound to types, and there doesn't seem to be a mechanism for assigning them to markers. I think I could wrap portal_workflow with my own version that looks for markers and returns the appropriate workflow if found, however, this doesn't feel like a tenable approach.
Is there a way of assigning workflow to markers that I've missed, or should I just bite the bullet and create some lightweight custom content types instead?
There's not really a built-in feature to use markers, but at http://www.martinaspeli.net/articles/dcworkflows-hidden-gems, Martin Aspeli hints that it is possible:
Note that in Plone, the workflow chain of an object is looked up by
multi-adapting the object and the workflow to the IWorkflowChain
interface. The adapter factory should return a tuple of string
workflow names (IWorkflowChain is a specialisation of IReadSequence,
i.e. a tuple). The default obviously looks at the mappings in the
portal_workflow tool, but it is possible to override the mapping, e.g.
in resopnse to some marker interface.

Searching for a concept like 'verbosity' in Modelica

I'm struggling with the size of output files for large Modelica models. Off course, I can protect some objects in order to remove them completely from the result file. However, that gives rise to two problems:
it's not possible to redeclare protected objects
if i want to test my model in detail (eg for a short time period), i need to declare those objects publicly again in order to see their variables
I wonder if there's a trick to set the 'verbosity' of a Modelica model. Maybe what I would like is a third keyword next to public, protected, eg. transparent. Then, when setting up a simulation, I want be able to set the verbosity level to 1, or 2 with the following effect:
1--> consider all transparentelements as protected
2--> consider all transparentelements as public
This effect would propagate to all models and submodels.
I don't think this already exists. But is there an easy workaround?
Thanks,
Roel
As Michael Tiller wrote above, this is not handled the same way in all Modelica tools and there is no definite answer. To give an OpenModelica-specific answer, it's possible to use simulate(ModelName,outputFilter="regex"), to store only the variables that fully match the given regex (default is .*, matching any variable).
Roel,
I know several people wrestling with this issue. At the moment, all of this depends on the tool being used. I don't know how other tools handle filtering of results, but in Dymola you control it (as you point out) by giving the signals special qualifiers (e.g. protected).
One thing I've done in the past is to extend from a model and then add a bunch of output signals for things I'm interested in. Then you can select "Outputs" in Dymola to make sure those get in the results file. This is far from perfect because a) listing everything you want can get tedious and b) referencing protected variables is not strictly allowed (although Dymola lets you get away with it but issues a warning).
At Dassault, we are actively discussing this idea and hope to provide some better functionality along these lines. It isn't clear whether such functionality will be strictly tool specific or whether it will involve the language somehow. But if it is language related, we will (of course) work with the design group to formulate a specification that other tool vendors can support as well.
In SystemModeler, you go to the Settings tab in the Experiment Browswer in Simulation Center. Click on Output on the bottom and select which variables to store.
(The options are state variables, derivatives, algebraic variables, parameters, protected variables and if you mark the Store simulation log-option, you'll get some interesting statistics on events over time and function evaluations, opening another possibility to track down parts of the simulation and model that creates more evaluations)
I am not sure if this helps you, but in Dymola you can go to Simulation->Setup->Output and mark a checkbox saying "Store Protected variables". That way it is possible to declare most variables as protected: during normal simulation they are not stored, but when debugging your model, you just mark that checkbox and they are stored.
Of course that is not the same as your suggested keyword transparent, but maybe it helps a little...
A bit late, but in Dymola 2013 FD01 and later you can select which variables to store based on names (and model names) using the annotation __Dymola_selections, and even filter on user-defined tags - so you could create a tag name "transparent" in the model. See "Matching and variable selections" in the manual.