Ruta - Abbreviation finding - UIMA

Is there any way, using UIMA Ruta, to find abbreviations that are used in short form before their expansion appears?
Sample Input Document
Data science” is widely recognized as an increasingly powerful force in the realm of web management and development, as well as in society in general. ML is an application of artificial intelligence. On the He found an automated teller machine (ATM). Allowing these companies to realize continuous innovation and improvement in user experience through rapid any time money (ATM) app. These ATM latter two companies are working to regain competitive advantages in the evolving web using data science techniques including natural language processing (NLP) and machine learning (ML)
Problem
I want to get only ML, not ATM, because ATM is used in short form only after its expansion. Is there a way to do so?

Here is an example of how to project annotations using a simplified definition detection. Does that help?
PACKAGE uima.example;
DECLARE AbbreviationDefinition;
DECLARE AbbreviationLongform;
DECLARE Abbreviation;
STRINGLIST definedAccronyms;
INT expectedWordcount;
(W[expectedWordcount, expectedWordcount]{-> AbbreviationLongform}
SPECIAL.ct=="("
c:#CAP{-> Abbreviation}<-{c{-> expectedWordcount = (c.end-c.begin)};}
SPECIAL.ct==")"
){-> AbbreviationDefinition};
// TODO check first characters of Abbreviation and AbbreviationLongform and remove annotations again if required
a:Abbreviation{PARTOF(AbbreviationDefinition) -> ADD(definedAccronyms, a.ct)};
MARKFAST(Abbreviation, definedAccronyms);
Abbreviation->{a:#Abbreviation{-> UNMARK(a)} ANY; ANY a:#Abbreviation{-> UNMARK(a)};};
a:Abbreviation{CONTAINS(Abbreviation,2,2) -> UNMARK(a)};
DISCLAIMER: I am a developer of UIMA Ruta
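Outside of Ruta, the check the question asks for (a short form whose first bare use comes before its first parenthesised definition) can be sketched in Python. This is only an illustration under simplifying assumptions: a definition is taken to be an all-caps token in parentheses such as "(ATM)", the function name is made up for this example, and the long form is not validated against the letters of the abbreviation.

```python
import re

def abbreviations_used_before_definition(text):
    """Return abbreviations whose first bare use precedes their first '(ABBR)' definition."""
    # Offset of the first parenthesised definition of each abbreviation, e.g. "(ATM)".
    first_definition = {}
    for m in re.finditer(r"\(([A-Z]{2,})\)", text):
        first_definition.setdefault(m.group(1), m.start(1))

    result = []
    for abbr, def_pos in first_definition.items():
        # A bare use is the same token when it is not wrapped in parentheses.
        bare = re.search(r"(?<!\()\b" + re.escape(abbr) + r"\b(?!\))", text)
        if bare and bare.start() < def_pos:
            result.append(abbr)
    return result

sample = (
    "ML is an application of artificial intelligence. "
    "He found an automated teller machine (ATM). These ATM companies use "
    "natural language processing (NLP) and machine learning (ML)"
)
print(abbreviations_used_before_definition(sample))  # ['ML']
```

ATM is skipped because its bare use follows "(ATM)", and NLP is skipped because it never occurs outside its definition.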

Related

Converting SBML model into a simulatable Matlab Function

I'm looking for a tool to convert an SBML model into a Matlab function. I've tried the SBMLTranslate() function from libSBML, but this returns a Matlab struct, not a function. Does anybody know if such a tool exists? Thanks
There are at least three efforts in this direction:
Frank Bergmann offers an online service for SBML translation where you can upload an SBML file and it will generate a MATLAB file. The comments at the top of the generated MATLAB file explain how to use the results. The C++ source code is available on SourceForge.
Bergmann's code referenced above was used by Stanley Gu to create sbml2matlab, a Windows standalone program. Off-hand, I don't know whether Gu's version changed or enhanced the algorithm used by the Bergmann version, but it seems likely. (Note: Gu now works at Google and does not maintain this code anymore, as far as I know.)
The Systems Biology Format Converter (SBFC) is a framework written principally by Nicolas Rodriguez; it includes a collection of converters, one of which is an SBML-to-MATLAB converter. This converter is written in Java.
I have not compared the results of the translators myself yet, so cannot speak to the differences or quality of output. If you try them and have any feedback to relate, please let the authors know. Knowing what has or hasn't worked for real users will help improve things in the future.
A final caveat is that all of these have been research projects, so make sure to set your expectations accordingly. (This is not a criticism of the authors; the authors are very good – I know most of them personally – but the reality of academic development work is that we all lack the time and resources to make these systems comprehensive, hardened, polished, and documented to the degree that we wish we could.)

Convert MIndiGolog fluents to the IndiGolog causes_val format

I am using Eclipse (version: Kepler Service Release 1) with Prolog Development Tool (PDT) plug-in for Prolog development in Eclipse. Used these installation instructions: http://sewiki.iai.uni-bonn.de/research/pdt/docs/v0.x/download.
I am working with Multi-Agent IndiGolog (MIndiGolog) 0 (the preliminary Prolog version of MIndiGolog), downloaded from here: http://www.rfk.id.au/ramblings/research/thesis/. I want to use MIndiGolog because it represents time and duration of actions very nicely (I want to do temporal planning), and it supports planning for multiple agents (including concurrency).
MIndiGolog is a high-level programming language based on situation calculus. Everything in the language is exactly according to situation calculus. This however does not fit with the project I'm working on.
Another high-level programming language, Incremental Deterministic (Con)Golog (IndiGolog) (download from here: http://sourceforge.net/p/indigolog/code/ci/master/tree/), also written in Prolog, is also (loosely) based on situation calculus, but uses fluents in a very different way. It makes use of causes_val predicates to denote which action changes which fluent in what way, and it does not include the situation in the fluent!
However, this is what the rest of the team actually wants. I need to rewrite MIndiGolog so that it is still an offline planner, with the nice representation of time and duration of actions, but with the causes_val predicate of IndiGolog to change the values of the fluents.
I find this extremely hard to do, as my knowledge in Prolog and of situation calculus only covers the basics, but they see me as the expert. I feel like I'm in over my head and could use all the help and/or advice I can get.
I already removed the situations from my fluents, made a planning domain with causes_val predicates, and tried to add IndiGolog code into MIndiGolog. But with no luck. Running the planner just returns "false." And I can make little sense of the trace, even when I use the GUI-tracer version of the SWI-Prolog debugger or when I try to place spy points as strategically as possible.
Thanks in advance,
Best, PJ
If you are still interested (sounds like you might not be): this isn't actually very hard.
If you look at Reiter's book, you will find that causes_val clauses are just effect axioms, while the fluents that mention the situation are usually defined by successor state axioms. There is a deterministic way to convert from the former to the latter, and the correct interpretation of the causes_val clauses is done in the implementation of regression. This is always the same, so you can just copy that part of the Prolog code from IndiGolog into your flavor.
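To make the idea concrete, here is a rough Python sketch (not the IndiGolog code itself) of how effect axioms plus a frame assumption yield a fluent's value after a sequence of actions. The door/light domain, the tuple layout, and the names CAUSES_VAL, INITIALLY and has_val are all invented for this illustration; they only mirror the general shape of a causes_val(Action, Fluent, Value, Condition) clause.

```python
# Effect axioms in the spirit of causes_val(Action, Fluent, Value, Condition).
# The domain below (a door and a light) is hypothetical.
CAUSES_VAL = [
    ("open_door", "door", "open", lambda val: val("door") == "closed"),
    ("close_door", "door", "closed", lambda val: True),
    ("toggle_light", "light", "on", lambda val: val("light") == "off"),
    ("toggle_light", "light", "off", lambda val: val("light") == "on"),
]
INITIALLY = {"door": "closed", "light": "off"}

def has_val(fluent, history):
    """Value of `fluent` after executing `history` (a list of actions, oldest first)."""
    if not history:
        return INITIALLY[fluent]
    *earlier, last = history
    # Conditions are evaluated in the state that held before the last action.
    before = lambda f: has_val(f, earlier)
    for action, f, value, condition in CAUSES_VAL:
        if action == last and f == fluent and condition(before):
            return value
    # Frame assumption: no applicable effect axiom, so the value carries over.
    return before(fluent)

print(has_val("light", ["toggle_light", "toggle_light"]))  # off
print(has_val("door", ["open_door", "toggle_light"]))      # open
```

The fluents themselves never carry a situation argument; the action history is threaded through the evaluation function instead, which is the role the regression code plays.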

Dynamic typing and programming distributed systems

Coming from Scala (and Akka), I recently began looking at other languages that were designed with distributed computing in mind, namely Erlang (and a tiny bit of Oz and Bloom). Both Erlang and Oz are dynamically typed, and if I remember correctly (will try to find link) people have tried to add types to Erlang and managed to type a good portion of it, but could not successfully coerce the system to make it fit the last bit?
Oz, while a research language, is certainly interesting to me, but that is dynamically typed as well.
Bloom's current implementation is in Ruby, and is consequently dynamically typed.
To my knowledge, Scala (and I suppose Haskell, though I believe that was built initially more as an exploration into pure lazy functional languages as opposed to distributed systems) is the only language that is statically typed and offers language-level abstractions (for lack of a better term) for distributed computing.
I am just wondering if there are inherent advantages of dynamic typing over static typing, specifically in the context of providing language level abstractions for programming distributed systems.
Not really. For example, the same group that invented Oz later did some work on Alice ML, a project whose mission statement was to rethink Oz as a typed, functional language. And although it remained a research project, I'd argue that it was enough proof of concept to demonstrate that the same basic functionality can be supported in such a setting.
(Full disclosure: I was a PhD student in that group at the time, and the type system of Alice ML was my thesis.)
Edit: The problem with adding types to Erlang isn't distribution, it simply is an instance of the general problem that adding types to a language after the fact never works out well. On the other hand, there still is Dialyzer for Erlang.
Edit 2: I should mention that there were other interesting research projects for typed distributed languages, e.g. Acute, which had a scope similar to Alice ML, or ML5, which used modal types to enable stronger checking of mobility characteristics. But they have only survived in the form of papers.
There are no inherent advantages of dynamic typing over static typing for distributed systems. Both have their own advantages and disadvantages in general.
Erlang (Akka is inspired by Erlang's actor model) is dynamically typed. Dynamic typing in Erlang was historically chosen for simple reasons: those who first implemented Erlang mostly came from dynamically typed languages, particularly Prolog, and as such, making Erlang dynamic was the most natural option to them. Erlang was built with failure in mind.
Static typing helps catch many errors at compile time rather than at runtime, as is the case with dynamic typing. Static typing was tried in Erlang and it was a failure. But dynamic typing helps with faster prototyping. Check this link for reference, which talks a lot about the difference.
Subjectively, I would rather think about the solution/algorithm for a problem than about the type of each variable that I use in the algorithm. It also helps in quick development.
Here are a few links which might help:
BenefitsOfDynamicTyping
static-typing-vs-dynamic-typing
BizarroStaticTypingDebate
Cloud Haskell is maturing quickly, statically-typed, and awesome. The only thing it doesn't feature is Erlang-style hot code swapping - that's the real "killer feature" of dynamically-typed distributed systems (the "last bit" that made Erlang difficult to statically type).

Tooling for expressive, feature rich numeric computations on the JVM

I am looking for numeric computation tooling on the JVM. My major requirements are expressiveness/readability, ease of use, evaluation and features in terms of mathematical functions. I guess I am after something like the Matlab kernel (probably including some basic libraries and w/o graphics) on the JVM. I'd like to be able to "throw" computational code at a running JVM and have this code evaluated. I don't want to worry about types. Arbitrary precision and performance are not so important.
I guess there are some nice libraries out there but I think an appropriate language on top is needed to get the expressiveness.
Which tooling would you guys suggest to address expressive, feature rich numeric computation on the JVM ?
From the jGroovyLab page:
The GroovyLab environment aims to provide a Matlab/Scilab-like scientific computing platform that is supported by a scripting engine implemented in the Groovy language. The GroovyLab user can work either with a Matlab-like command console, or with a flexible editor based on the jsyntaxpane (http://code.google.com/p/jsyntaxpane/) component, that offers more convenient code development. Also, GroovyLab supports Computer Algebra based on the symja (http://code.google.com/p/symja/) project.
And there is also GroovyLab:
GroovyLab is a collection of Groovy classes to provide matlab-like syntax and basic features (linear algebra, 2D/3D plots). It is based on jmathplot and jmatharray libs:
Groovy has a smooth learning curve for Java programmers and a flexible syntax similar to Ruby. It is also pretty easy to write a DSL on it.
Though Groovy's performance is pretty good for a dynamic language, you can use static compilation if you need it.
Most of MathWorks Matlab is built on the Intel Math Kernel Library (MKL), which is (IMHO) the unbeatable champion in linear algebra computations. There is Java support, but it costs 500 dollars (the MKL, not just the Java support)...
The second-best option if you want to use Java is jblas, which uses BLAS and LAPACK, the industry standards for linear algebra.
Pure Java libraries apparently perform horribly, see here...
Spire sounds like it's aiming at the area you're looking at. It takes advantage of a lot of recent Scala features such as macros to get decent performance without having to sacrifice the expressiveness of being in a high-level language.
There's also breeze, which is targeted at machine learning but includes a fair amount of linear algebra stuff.
Depending how much work you want to get into and what languages you're already familiar with, Incanter in the Clojure world might be worth a look. Also quickly evolving in Clojure right now is core.matrix, which aims to encapsulate high-level common abstractions in linear algebra implemented with various methods or packages.
You highlighted expressiveness in your post, and the nice thing about Clojure is that, as a Lisp, it is possible to make or extend DSLs to closely match problem domains. This is one of the big draws of the language (and of Lisps in general).
I'm the original author of core.matrix for Clojure. So I have a clear affinity and much more knowledge in this specific space. That said, I'm still going to try and give you an honest answer :-)
I was in the same position as you a year or so back, looking for a solution for numeric computation that would be scalable, flexible and suitable for deployment as a clustered cloud service.
I ended up going with Clojure for the following reasons:
Functional Programming: Clojure is a functional programming language at heart, more so than most other languages (although not as much as Haskell....). Lazy infinite sequences, persistent data structures, immutability throughout etc. Makes for elegant code when you are dealing with big computations.
Metaprogramming: I saw a need to do code generation for vector / computational expressions. Hence being a Lisp was a big plus: once you have done code generation in a homoiconic language with a "whole language" macro system then it's hard to find anything else that comes close.
Concurrency: Clojure has an impressive and novel approach to multi-core concurrency. If you haven't seen it then watch: http://www.infoq.com/presentations/Value-Identity-State-Rich-Hickey
Interactive REPL: Something I've always felt is very important for data work. You want to be able to work with your code / data "live" to get a real feel for its properties. Having a dynamically typed language with an interactive REPL works wonders here.
JVM based: big advantage for pragmatic purposes, because of the huge library / tool ecosystem and the excellent engineering in the JVM as a runtime platform.
Community: I saw a lot of innovation going on in Clojure, particularly around the general area of data and analytics.
The main thing Clojure was lacking at that time was a good library / API for matrix operations. There were some nice tools in Incanter, but they weren't very general purpose or performant. Hence I started developing core.matrix, which is shaping up to be an idiomatic Clojure-flavoured equivalent of NumPy / SciPy. Right now it is still a work in progress but good enough for production use if you are careful.
In terms of low-level matrix support, I also maintain vectorz-clj, which is my attempt to provide a core.matrix implementation that offers high performance vector/matrix operations while remaining pure Java (i.e. no native dependencies). If you are interested in the performance of this, you may like to see:
http://clojurefun.wordpress.com/2013/03/07/achieving-awesome-numerical-performance-in-clojure/
My second choice after Clojure would have been Scala. I liked Scala's slightly greater maturity and decent static type system. Both the languages are JVM based so the library / tool side was a tie. It was probably the Lisp features that clinched it.
If you happen to have access to Mathematica, then it's fairly easy to get it working with the JVM by means of J/Link. For Clojure, Clojuratica is an excellent library to make that as seamless as possible, although it's not been maintained for a while and it may take some effort to get it working in modern environments again.

What are "not so well defined problems" that LISP is supposed to solve?

Most people agree that LISP helps to solve problems that are not well defined, or that are not fully understood at the beginning of the project.
"Not fully understood" might indicate that we don't know what problem we are trying to solve, so the developer refines the problem domain continuously. But isn't this process language independent?
All this refinement does not take away the need for, say, developing algorithms/solutions for the final problem that does need to be solved. And that is the actual work.
So, I'm not sure what advantage LISP provides if the developer has no idea where he's going i.e. solving a problem that is not finalised yet.
Lisp (not "LISP") has a number of advantages when you're facing problems that are not well-defined. First of all, you have a REPL where you can quickly experiment -- that helps in sketching out quick functions and trying to play with them, leading to a very rapid development cycle. Second, having a dynamically typed language works well in this context too: with a statically typed language you need to "design more" before you begin, and changing the design leads to changing more code -- in contrast, with Lisps you just write the code and the data it operates on can change as needed. In addition to these, there are the usual benefits of a functional language -- one with first class lambda functions, etc (eg, garbage collection).
In general, these advantages have been finding their way into other languages. For example, Javascript has everything that I listed so far. But there is one more advantage for Lisps that is still not present in other languages -- macros. This is an important tool to use when your problem calls for a domain specific language. Basically, in Lisp you can extend the language with constructs that are specific to your problem -- even if these constructs lead to a completely different language.
Finally, you need to plan ahead for what happens when the code becomes more than a quick experiment. In this case you want your language to cope with "growing scripts into applications" -- for example, having a module system means that you can get a more "serious" application. For example, in Racket you can get your solution separated into such modules, where each can be written in its own language -- it even has a statically typed language which makes it possible to start with a dynamically typed development cycle and once the code becomes more stable and/or big enough that maintenance becomes difficult, you can switch some modules into the static language and get the usual benefits from that. Racket is actually unique among Lisps and Schemes in this kind of support, but even with others the situation is still far more advanced than in non-Lisp languages.
In AI (Artificial Intelligence) historically Lisp was seen as the AI assembly language. It was used to build higher-level languages which help to work with the problem domain in a more direct way. Many of these domains need a lot of 'knowledge' for finding usable answers.
A typical example is an expert system for, say, oil exploration. The expert system gets as inputs (geological) observations and gives information about the chances to find oil, what kind of oil, in what depths, etc. To do that it needs 'expert knowledge' how to interpret the data. When you start such a project to develop such an expert system it is typically not clear what kind of inferences are needed, what kind of 'knowledge' experts can provide and how this 'knowledge' can be written down for a computer.
In this case one typically develops new languages on top of Lisp and you are not working with a fixed predefined language.
As an example see this old paper about Dipmeter Advisor, a Lisp-based expert system developed by Schlumberger in the 1980s.
So, Lisp does not solve any problems. But it was originally used to solve problems that are complex to program, by providing new language layers which should make it easier to express the domain 'knowledge', rules, constraints, etc. to find solutions which are not straight forward to compute.
The "big" win with a language that allows for incremental development is that you (typically) have a read-eval-print loop (or "listener" or "console") that you interact with, plus you tend not to lose state when you compile and load new code.
The ability to keep state around from test run to test run means that lengthy computations that are untouched by your changes can simply be kept around instead of being re-computed.
This allows you to experiment and iterate faster. Being able to iterate faster means that exploration is less of a hassle. Very useful for exploratory programming, something that is typical when dealing with less well-defined problems.
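The same workflow can be imitated in any language with a live session. As a rough Python sketch (Python rather than Lisp, and with a made-up load_dataset standing in for a lengthy computation), the expensive result is computed once and stays in the session while the function under development is redefined:

```python
calls = {"load": 0}

def load_dataset():
    """Stand-in for a slow computation; counts how often it actually runs."""
    calls["load"] += 1
    return list(range(1_000_000))

# Session state: computed once and kept while the code below is revised.
data = load_dataset()

def summarize(values):  # first attempt
    return sum(values)

first = summarize(data)

def summarize(values):  # revised definition, as if re-evaluated at the prompt
    return sum(values) / len(values)

second = summarize(data)

print(first, second, calls["load"])  # 499999500000 499999.5 1
```

Only summarize was re-evaluated; the data it works on was never recomputed.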