UIMA Ruta - Gather with optional annotation - uima

I need to gather a few annotations to create a new annotation. For example, suppose
a is marked as annotation A
b is marked as annotation B
c is marked as annotation C
I want to create another annotation D that has A,B and C as features, but B needs to be optional.
A B? C{-> GATHER D, 1, 2, "a" = 1, "b" = 2, c=3)};
This does not work if B is missing, and I understand that it is because of the numbers associated. Is there a workaround for this?
Thanks!

This should work just fine. Your example with some syntax corrections and some prerequisites:
DECLARE A, B, C;
DECLARE D (A a, B b, C c);
"a" -> A;
"b" -> B;
"c" -> C;
A B? C{-> GATHER(D, 1, 2, "a" = 1, "b" = 2, "c" = 3)};
... tested on text "a b c a c" with UIMA Ruta 2.4.0
DISCLAIMER: I am a developer of UIMA Ruta

Related

Is there a function that transforms/maps both Either's Left and Right cases taking two transformation functions respectively?

I have not found a function in Scala or Haskell that can transform/map both Either's Left and Right cases taking two transformation functions at the same time, namely a function that is of the type
(A => C, B => D) => Either[C, D]
for Either[A, B] in Scala, or the type
(a -> c, b -> d) -> Either a b -> Either c d
in Haskell. In Scala, it would be equivalent to calling fold like this:
def mapLeftOrRight[A, B, C, D](e: Either[A, B], fa: A => C, fb: B => D): Either[C, D] =
e.fold(a => Left(fa(a)), b => Right(fb(b)))
Or in Haskell, it would be equivalent to calling either like this:
mapLeftOrRight :: (a -> c) -> (b -> d) -> Either a b -> Either c d
mapLeftOrRight fa fb = either (Left . fa) (Right . fb)
Does a function like this exist in the library? If not, I think something like this is quite practical, why do the language designers choose not to put it there?
Don't know about Scala, but Haskell has a search engine for type signatures. It doesn't give results for the one you wrote, but that's just because you take a tuple argument while Haskell functions are by convention curried†. https://hoogle.haskell.org/?hoogle=(a -> c) -> (b -> d) -> Either a b -> Either c d does give matches, the most obvious being:
mapBoth :: (a -> c) -> (b -> d) -> Either a b -> Either c d
...actually, even Google finds that, because the type variables happen to be exactly as you thought. (Hoogle also finds it if you write it (x -> y) -> (p -> q) -> Either x p -> Either y q.)
But actually, as Martijn said, this behaviour for Either is only a special case of a bifunctor, and indeed Hoogle also gives you the more general form, which is defined in the base library:
bimap :: Bifunctor p => (a -> b) -> (c -> d) -> p a c -> p b d
†TBH I'm a bit disappointed that Hoogle doesn't by itself figure out to curry the signature or to swap arguments. Pretty sure it actually used to do that automatically, but at some point they simplified the algorithm because with the huge number of libraries the time it took and number of results got out of hand.
Cats provides Bifunctor, for example
import cats.implicits._
val e: Either[String, Int] = Right(41)
e.bimap(e => s"boom: $e", v => 1 + v)
// res0: Either[String,Int] = Right(42)
The behaviour you are talking about is a bifunctor behaviour, and would commonly be called bimap. In Haskell, a bifunctor for either is available: https://hackage.haskell.org/package/bifunctors-5/docs/Data-Bifunctor.html
Apart from the fold you show, another implementation in scala would be either.map(fb).left.map(fa)
There isn't such a method in the scala stdlib, probably because it wasn't found useful or fundamental enough. I can somewhat relate to that: mapping both sides in one operation instead of mapping each side individually doesn't come across as fundamental or useful enough to warrant inclusion in the scala stdlib to me either. The bifunctor is available in Cats though.
In Haskell, the method exists on Either as mapBoth and BiFunctor is in base.
In Haskell, you can use Control.Arrow.(+++), which works on any ArrowChoice:
(+++) :: (ArrowChoice arr) => arr a b -> arr c d -> arr (Either a c) (Either b d)
infixr 2 +++
Specialised to the function arrow arr ~ (->), that is:
(+++) :: (a -> b) -> (c -> d) -> Either a c -> Either b d
Hoogle won’t find +++ if you search for the type specialised to functions, but you can find generalised operators like this by replacing -> in the signature you want with a type variable: x a c -> x b d -> x (Either a b) (Either c d).
An example of usage:
renderResults
:: FilePath
-> Int
-> Int
-> [Either String Int]
-> [Either String String]
renderResults file line column
= fmap ((prefix ++) +++ show)
where
prefix = concat [file, ":", show line, ":", show column, ": error: "]
renderResults "test" 12 34 [Right 1, Left "beans", Right 2, Left "bears"]
==
[ Right "1"
, Left "test:12:34: error: beans"
, Right "2"
, Left "test:12:34: error: bears"
]
There is also the related operator Control.Arrow.(|||) which does not tag the result with Either:
(|||) :: arr a c -> a b c -> arr (Either a b) c
infixr 2 |||
Specialised to (->):
(|||) :: (a -> c) -> (b -> c) -> Either a b -> c
Example:
assertRights :: [Either String a] -> [a]
assertRights = fmap (error ||| id)
sum $ assertRights [Right 1, Right 2]
==
3
sum $ assertRights [Right 1, Left "oh no"]
==
error "oh no"
(|||) is a generalisation of the either function in the Haskell Prelude for matching on Eithers. It’s used in the desugaring of if and case in arrow proc notation.

What is a BitVector and how to use it as return in Breeze Scala?

I am doing a comparison between two BreezeDenseVectors with the following way a :< b and what i get as a return is a BitVector.
I haven't worked again with this and everything i read about it, was not helpful enough.
Can anyone explain to me how they work?
Additionally, by printing the output, i get: {0, 1, 2, 3, 4 }. What is this supposed to mean?
You can check BitVectorTest.scala for more detail usage.
Basically, a :< b gives you a BitVector, which indicates which elements in a smaller than the ones in b.
For example val a = DenseVector[Int](4, 9, 3); val b = DenseVector[Int](8, 2, 5); a :< b will gives you BitVector(0, 2), it means that a(0) < b(0) and a(2) < b(2), which is correct.

The right way to normalize database into 3NF

I have this database:
R(A, B, C, D, E)
Keys: A
F = {A -> B, D -> E, C -> D}
I normalize it into 3NF like this:
R(A, B, C, D, E)
Keys: AD
F = {AD -> B, AD -> E, C -> D}
What I do is when I check D -> E, D is not a superkey and E is not a key attribute, so I treat D and A as a superkey {AD}. When I check C -> D, C is not a key but D is a key attribute so it's OK.
Is my normalization correctly?
There is a problem in your input data. If the relation R has the dependencies F = {A -> B, D -> E, C -> D}, then A cannot be a key. In fact, a key is a set of attributes whose closure determines all the attributes of the relation, which is not the case here, since:
A+ = AB
From F, the (only) possible key is AC, in fact
AC+ = ABCD
Normalizing means to reduce the redundancy by decomposing a relation in other relations in which the functional dependencies do not violate the normal form, and such that joining the decomposed relations, one can obtain the original one.
In you solution, you do not decompose the relation, but only change the set of dependencies with other dependencies not implied by the first set.
A correct decomposition would be instead the following:
R1 < (A B) ,
{ A → B } >
R2 < (C D) ,
{ C → D } >
R3 < (D E) ,
{ D → E } >
R4 < (A C) ,
{ } >
The algorithm to decompose a relation into 3NF can be found on any good book on databases.

New collection or nested array?

Supposed we have a db called A. The structure of A can be:
1) A( a, b, c, d).
a, b, c, d are collections.
And the element in each collection is like { _id:id, data : data }
2) A(k).
k(a, b, c, d)
k is a colletion. and a, b, c, d are elements inside k.
a, b, c, d are like
{
type : 'a / b / c / d',
data : [
{_id : id1, data : data1 },
{_id : id2, data : data2},
...
]
}
the daily operations are { get, inserting element into, empty element of } a, b, c and d.
Which one is better in terms of efficiency?
#Markus-W-Mahlberg is right about your actual-use-case.
As you are using mongodb and it uses documents not tabular data structure (such as ms-sql), your both approaches work fine and if you define right index, u get same performance.
But in my opinion if your types (a, b, c and d ) have different structures (different properties, different queries, different update scenarios, aggregation plans and ...) Use way1, other wise use Way2 with right index.

Anorm is just locking on executeUpdate

I have a very simple Play 2.1 Scala project. As in, this is the only code in it so far. I have a task which I am running with an Akka.system.scheduler. I have some code to select from the database (currently the standard test H2 instance) and I'm following the documentation example almost exactly.
DB.withConnection { implicit c =>
Logger.info("2")
var x = SQL("insert into x (a, b, c) values ({a, b, c})").on(
'a -> a,
'b -> b,
'c -> c
)
Logger.info("2.5")
x.executeUpdate()
Logger.info("3")
It never gets past 2.5. I haven't got any other database operations happening (except for evolutions).
Help?
Based on your link, shouldn't your SQL statement look like:
var x = SQL("insert into x (a, b, c) values ({a}, {b}, {c})").on(
"a" -> a,
"b" -> b,
"c" -> c
)
In the question the values don't have individual braces: {a, b, c}.