How many languages does UIMA Ruta support? - uima

I am new to text analysis, UIMA, and UIMA Ruta, and I am working on new Java-based software for intelligent document processing. I am currently going through the reading material on UIMA and Ruta. One question I still have no clear answer to: how many different languages does UIMA Ruta support? I would also be grateful for any help, links, or docs on what I should read to build intelligent document processing software capable of analyzing documents in multiple languages. Thanks -Rahul

Ruta itself is a (scripting) language which is language-agnostic and per se does not support any particular set of (natural) languages. You can write Ruta scripts for any language such as English, Spanish, Chinese, etc.
For example, take a look at the Learning by Example section in the official Ruta reference. It presents a simple script that marks animals in English texts. As should be obvious, you could do the same for any language by adapting the regular expressions in the example code.
Therefore, which languages your system will support depends entirely on your Ruta scripts and not Ruta itself.
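To illustrate the point in plain Java rather than Ruta (a rough sketch only, not the script from the Ruta reference): the matching logic below is identical for every language, and only the pattern changes. The class name, the language codes, and the word lists are hypothetical and chosen just for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AnimalMarker {
    private static final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;

    // Hypothetical per-language word lists: the only language-specific part.
    static final Map<String, Pattern> ANIMALS = Map.of(
            "en", Pattern.compile("\\b(dog|cat|horse)s?\\b", FLAGS),
            "es", Pattern.compile("\\b(perro|gato|caballo)s?\\b", FLAGS));

    // Returns every animal mention found in the text for the given language code.
    static List<String> mark(String language, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = ANIMALS.get(language).matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(mark("en", "The dog chased two cats."));       // [dog, cats]
        System.out.println(mark("es", "El perro persigue a dos gatos.")); // [perro, gatos]
    }
}
```

Supporting another language here means adding one more pattern, which mirrors how you would adapt the rules of a Ruta script.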

Related

Making a DSL and an interpreter in Eclipse?

I have to build a DSL and an interpreter for it, I think using the Eclipse Modeling Framework, but I don't have much information about it. I have four months to do it and I am very lost.
The DSL has to read files from sensors, and with the DSL you should be able to perform complex math operations. Does anyone know of a free resource, book, tutorial, or guide where I can read about this (I can't find anything useful), or can anyone give me some clues on how to start? Thank you so much.
I tried to find examples of something similar and couldn't find anything.
Eclipse Modeling Project: A Domain-Specific Language (DSL) Toolkit and EMF: Eclipse Modeling Framework (2nd Edition) are two great books on this topic that you can get used for about five dollars each. While not free, they are well worth the small price. There is also a newer reference Implementing Domain Specific Languages with Xtext and Xtend that seems very relevant but I have not had the pleasure of reading it yet, so can't vouch for it.
There are also many free talks on these subjects on YouTube, and the EMF, Xtext, etc. websites all have quite a few tutorials.
Also, based on this question: Interpreter vs. Code Generator Xtext, Xtext does not appear to support interpreters but Xbase may.
There are examples of using Xtext to build an interpreter / interpreted language (e.g. https://eclipse.org/Xtext/documentation/202_scripting.html).
For me, it took a while to get all the plugin configurations correct, but it is well documented, both on the Xtext website and in the GitHub tutorials.
Also, look at Xtend (http://www.eclipse.org/xtend/), as it is a major building block of the Xtext framework.
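To get a feel for what "interpreter" means here before diving into the tooling, a minimal sketch in plain Java (no Xtext or EMF involved) may help. It evaluates arithmetic expressions directly while parsing them, with identifiers resolved against a map of sensor readings. The grammar, the class name, and the sensor names are all made up for illustration; a real DSL built with Xtext would get its parser generated from a grammar file instead.

```java
import java.util.Map;

final class SensorExprInterpreter {
    private final String src;
    private final Map<String, Double> readings;
    private int pos = 0;

    private SensorExprInterpreter(String src, Map<String, Double> readings) {
        this.src = src;
        this.readings = readings;
    }

    // Parses and evaluates the whole expression in one pass.
    static double eval(String src, Map<String, Double> readings) {
        SensorExprInterpreter p = new SensorExprInterpreter(src, readings);
        double value = p.expr();
        p.skipSpaces();
        if (p.pos != src.length()) {
            throw new IllegalArgumentException("Unexpected input at " + p.pos);
        }
        return value;
    }

    // expr := term (('+' | '-') term)*
    private double expr() {
        double v = term();
        while (true) {
            if (accept('+')) v += term();
            else if (accept('-')) v -= term();
            else return v;
        }
    }

    // term := factor (('*' | '/') factor)*
    private double term() {
        double v = factor();
        while (true) {
            if (accept('*')) v *= factor();
            else if (accept('/')) v /= factor();
            else return v;
        }
    }

    // factor := number | sensorName | '(' expr ')'
    private double factor() {
        if (accept('(')) {
            double v = expr();
            if (!accept(')')) throw new IllegalArgumentException("Expected ')' at " + pos);
            return v;
        }
        int start = pos;
        while (pos < src.length() && isTokenChar(src.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("Expected a value at " + pos);
        String token = src.substring(start, pos);
        if (Character.isDigit(token.charAt(0))) return Double.parseDouble(token);
        Double reading = readings.get(token);
        if (reading == null) throw new IllegalArgumentException("Unknown sensor: " + token);
        return reading;
    }

    private static boolean isTokenChar(char c) {
        return Character.isLetterOrDigit(c) || c == '.' || c == '_';
    }

    // Skips whitespace, then consumes c if it is the next character.
    private boolean accept(char c) {
        skipSpaces();
        if (pos < src.length() && src.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    public static void main(String[] args) {
        Map<String, Double> readings = Map.of("temp", 20.0, "offset", 1.5);
        System.out.println(eval("(temp + offset) * 2", readings)); // 43.0
    }
}
```

A code generator would instead translate the same expression into source code in another language and leave the evaluation to that language's compiler or runtime.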

How/are you supposed to use the DKPro libraries with UIMA Ruta?

I have studied the default UIMA Ruta Workbench Eclipse project enough to significantly understand its moving parts - for instance, why the input/ and output/ folders behave as they do, how to accomplish the project using the jcasgen and other Maven plugins, etc.
But even after hours of studying the project and playing with Maven to try to get it to work, I still have a lot of trouble doing something very simple: using the DKPro libraries (the types especially) from a Ruta script.
My fundamental question is this: what is the path of least resistance towards using the types and analysis components from the DKPro and TC libraries within a Ruta script?
My specific questions are:
I noticed that in the desc/type folder of many api jars there are TypeSystemDescription XML files that would appear to be appropriate for use with Ruta. Is there some way of getting a "master" TypeSystemDescription XML file for the DKPro components?
Is there a project of significant complexity that uses both Ruta and DKPro that I can study?
What is the distinction between an AnalysisEngine as in what you do with Ruta scripts and an Analysis Component you write in Java?
Edited to reflect less frustration
Actually, the Ruta and DKPro people do workshops together and sit happily around the campfire afterwards - or at least in a cocktail bar and have some drinks. Unfortunately, we don't get around to doing that very often.
The kind and number of questions you are asking calls for a tutorial ;)
Did you have a look at the slides and examples from our joint workshop at GSCL 2013?
They include several examples of how to use DKPro Core and Ruta together. In those examples, there is a Maven project responsible for fetching the DKPro Core dependencies, and separate Ruta projects then have a dependency on that Maven project and use the analysis engines.
It should also work to have a single project with both the Ruta and Maven natures.
The way to get a single type descriptor for all DKPro Core types in your classpath (or rather for all uimaFIT-enabled types in your classpath) is
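import java.io.OutputStream;
import org.apache.uima.fit.factory.TypeSystemDescriptionFactory;
// Merges all type systems uimaFIT can detect on the classpath and writes them as one XML descriptor.
// Both calls declare checked exceptions, so wrap them in try/catch or declare them on the method.
OutputStream os = ...
TypeSystemDescriptionFactory.createTypeSystemDescription().toXML(os);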
Check out the GSCL 2013 tutorial examples.
AnalysisComponent represents the view from the inside, i.e. from the perspective of the developer of components (the view from within the framework). AnalysisEngine represents the view from the outside, i.e. from the user of a component/workflow. However, typically one would say "I'm implementing a new analysis engine" and mean "I'm going to subclass JCasAnnotator_ImplBase (an implementation of AnalysisComponent)". See also this post on the UIMA developer mailing list.
Disclosure: I am a DKPro Core developer and an Apache UIMA developer.

Free Pascal without Lazarus

Could any of you chaps point me in the direction of a good tutorial or book for Free Pascal that does not rely on using Lazarus? Ideally I would like just to write code in a text editor and compile/link using the command-line. I have no existing knowledge of Pascal at all.
Unfortunately all the material I have uncovered through Google assumes the use of Lazarus, prior knowledge of Pascal or refers only to the pre-Delphi/FPC versions of Pascal without 'modern' features such as dynamic strings, objects and so forth.
Lazarus doesn't change what the base dialect can do. It is only an IDE and a visual library on top of Free Pascal; nearly everything nonvisual is FPC.
Just install FPC and compile with
fpc <programname>
As a reference for the dialect, see the Free Pascal reference guide, which contains nonvisual examples. Once you have a basic level of knowledge, you can adapt Lazarus-related examples to your own environment.

What is the fastest way to create a cross-platform IDE for a new programming language?

The title already says most of what I'm after, but let me state some of the requirements explicitly:
The language is not widely used, so I assume that writing a new language tokenizer etc. will probably be required.
Cross-platform means at least Linux, Mac OS and Windows
Minimal features: Syntax highlighting and Code-completion (aka "IntelliSense")
Preferable features: Interactive debugging
Assumption: The developer is not an expert in any one programming language (although mediocre in a few, and eager to learn new techniques), so the focus is on an environment / tools that quickly get a developer up to speed and are productive enough to reach the goal as fast as possible.
Xtext would be the perfect fit for these requirements. All you need to do is to define your grammar and you have your parser, linker, editor, etc. Of course all of this can be customized to your needs.
If your language compiles down to Java, you also get expressions and debugging out of the box.
Lazarus + SynEdit + SynCompletion (cross-platform + syntax highlighting + autocompletion); interactive debugging is way too difficult, I guess.

Tools for manual translation of Constants/Messages .properties files

I'm looking for some tools that could be used by human translators during the process of translating our GWT application into other languages.
Currently, we have the English version of the .properties files containing constants and messages, and we need to create the files for other languages. The tool should be easy to use, so that even someone who is not an IT enthusiast can master it.
Or would you suggest another method for translating the texts?
I heard the "community" approach is becoming quite popular; by that I mean that you upload your texts to some (?) forum, and the community there creates the translations into other languages. But as I said, I don't know much about this.
Are there any online platforms for this purpose?
Any other ideas?
See my SO answer for "VB 6 source code, speech text is in french want to translate to english". The same answer works if you replace the computer language "VB6" by "JavaScript".