Abbreviations Using UIMA Ruta

I tried to tag abbreviations in some files using UIMA Ruta. I used the simple script below, but it does not work for certain abbreviations.
My algorithm goes something like this:
1. Split the abbreviation into letters/numbers (ATM -> A,T,M; IC3 -> I,C,3)
2. Convert numbers into letters (I,C,3 -> I,C,C,C)
3. Read the current sentence and match the letters to the initial letters of words (stop words may or may not be included)
But I don't know how to achieve this in Ruta. Where can I look for such looping and control structures?
Sample Input:
The National Academies of Science, Engineering, and Medicine (NAS)
registered nurses (RNs)
Licensed practical nurses (LPNs)
Asian/Pacific Islander Americans (APIAs)
Crime&Investigation Network (CI)
Internet Crime Complaint Center (“IC3”)
Practice Management <PM>
Script:
CW (EnglishStopWord?|SPECIAL?)? CW (EnglishStopWord?|SPECIAL?)? CW (EnglishStopWord?|SPECIAL?)? CW (EnglishStopWord?|SPECIAL?)? CW LParen CAP RParen{-> MARK(DZC_ABBREVIATIONS, 1, 12)};
CW (EnglishStopWord?|SPECIAL?)? CW (EnglishStopWord?|SPECIAL?)? CW (EnglishStopWord?|SPECIAL?)? CW (EnglishStopWord?|SPECIAL?)? CW{-PARTOF(DZC_ABBREVIATIONS)} LParen CAP RParen{-PARTOF(DZC_ABBREVIATIONS) -> MARK(DZC_ABBREVIATIONS, 1, 12)};
CW (EnglishStopWord?|SPECIAL?)? CW (EnglishStopWord?|SPECIAL?)? CW (EnglishStopWord?|SPECIAL?)? CW (LParen CAP SW? RParen){-PARTOF(DZC_ABBREVIATIONS) -> MARK(DZC_ABBREVIATIONS, 1, 11)};
Untagged abbreviations:
Chronic Kidney Disease in Children (CKiD)
Society of Intercultural Education, Training, and Research (SIETAR)
The National Academies of Science, Engineering, and Medicine (NAS)
Internet Crime Complaint Center (“IC3”)
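Ruta aside, the three steps can be prototyped outside UIMA to check the matching logic before encoding it as rules. The following is a minimal plain-Python sketch, not a Ruta solution. The stop-word list, the bracket and quote handling, and the rule that a trailing lowercase "s" marks a plural are assumptions made for illustration. It matches letters to word initials from right to left, starting at the word just before the bracket, and lets stop words be skipped.

```python
import re

# Illustrative stop-word list (an assumption; it stands in for the
# EnglishStopWord annotations used in the Ruta script).
STOP_WORDS = {"of", "and", "the", "in", "for", "on", "to", "at", "by", "with"}

# An abbreviation inside (...) or <...>, optionally wrapped in curly or straight quotes.
ABBR_RE = re.compile(r'[(<]\s*["\u201c]?([A-Za-z][A-Za-z0-9]*)["\u201d]?\s*[)>]')
WORD_RE = re.compile(r"[A-Za-z0-9]+")


def expand_letters(abbr):
    """Steps 1 and 2: split into letters and expand digits (IC3 -> I,C,C,C)."""
    if len(abbr) > 1 and abbr.endswith("s") and abbr[-2].isupper():
        abbr = abbr[:-1]  # treat a trailing lowercase "s" as a plural: RNs -> RN
    letters = []
    for ch in abbr:
        if ch.isdigit():
            if letters:  # digit n: the previous letter occurs n times in total
                letters.extend(letters[-1] * (int(ch) - 1))
        else:
            letters.append(ch)
    return letters


def _align(words, letters, wi, li):
    """Step 3, right to left. Returns the index of the first word of the
    expansion, or None if the letters cannot be matched."""
    if li < 0:
        return wi + 1  # every letter has been matched
    if wi < 0:
        return None  # ran out of words
    word = words[wi]
    if word[0].lower() == letters[li].lower():
        start = _align(words, letters, wi - 1, li - 1)
        if start is not None:
            return start
    if word.lower() in STOP_WORDS:  # a stop word may also be skipped
        return _align(words, letters, wi - 1, li)
    return None


def find_abbreviations(sentence):
    """Return (abbreviation, expansion) pairs found in one sentence."""
    results = []
    for m in ABBR_RE.finditer(sentence):
        letters = expand_letters(m.group(1))
        if not letters:
            continue
        tokens = list(WORD_RE.finditer(sentence, 0, m.start()))
        words = [t.group() for t in tokens]
        start = _align(words, letters, len(words) - 1, len(letters) - 1)
        if start is not None:
            expansion = sentence[tokens[start].start():m.start()].rstrip()
            results.append((m.group(1), expansion))
    return results


print(find_abbreviations("registered nurses (RNs)"))
# [('RNs', 'registered nurses')]
```

With this strict alignment, IC3, SIETAR, RNs, APIAs and PM are found, but CI, NAS and CKiD still come out unmatched: a content word (Network, Medicine, Children) sits between the expansion and the bracket, or the letters are out of order. Handling those would need extra rules, for example allowing a limited number of content words to be skipped.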

Factor an equation with several variables

I have this code:
factor(sqrt((diff(theta, x1))^2+(diff(theta, y1))^2+(diff(theta, z1))^2));
The two equations are identical, but Maple doesn't see it (their difference gives me an awful expression). Is there a way to get Maple to factor the first equation?
When you write, "The two equations are identicals..." you seem to be indicating that you think that they are mathematically equivalent.
That is false in general.
Under the assumptions that all the unknowns are real, Maple can simplify the difference to zero.
Below, I give a counter-example where the two expressions are not equal.
It is bad etiquette here to post images of code instead of plain-text code.
restart;
with(VectorCalculus):
r1:=<x1,y1,z1>:
r2:=<x2,y2,z2>:
r3:=<x3,y3,z3>:
A:=r1 &x r2:
B:=r3 &x r2:
theta:=arccos(DotProduct(A,B)/(Norm(A)*Norm(B))):
sintheta1:=Norm(r1 &x r2)/(Norm(r1)*Norm(r2)):
expr1:=factor(sqrt((diff(theta, x1))^2+(diff(theta, y1))^2
+(diff(theta, z1))^2)):
lprint(expr1);
((x2^2+y2^2+z2^2)/(x1^2*y2^2+x1^2*z2^2-2*x1*x2*y1*y2-
2*x1*x2*z1*z2+x2^2*y1^2+x2^2*z1^2+y1^2*z2^2-
2*y1*y2*z1*z2+y2^2*z1^2))^(1/2)
expr2:=1/(Norm(r1)*sintheta1):
lprint(expr2);
(x2^2+y2^2+z2^2)^(1/2)/((y1*z2-y2*z1)^2
+(-x1*z2+x2*z1)^2+(x1*y2-x2*y1)^2)^(1/2)
Now, under the assumptions that all the unknowns are real,
combine(expr2-expr1) assuming real;
0
Now, a counter-example with (some particular) complex values,
simplify(eval(expr2-expr1, [x1=I, x2=1, y1=0, y2=1, z1=0, z2=1]));
-I*2^(1/2)*3^(1/2)
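To see where the difference comes from, here is the same evaluation done by hand at x1 = I, x2 = y2 = z2 = 1, y1 = z1 = 0, taking principal square roots (as Maple does for sqrt). Every term containing y1 or z1 vanishes, and i denotes Maple's I:

```latex
\begin{aligned}
\text{expr1} &= \sqrt{\frac{3}{x_1^2 y_2^2 + x_1^2 z_2^2}} = \sqrt{\frac{3}{-2}} = i\sqrt{\tfrac{3}{2}},\\
\text{expr2} &= \frac{\sqrt{3}}{\sqrt{(-x_1 z_2)^2 + (x_1 y_2)^2}} = \frac{\sqrt{3}}{\sqrt{-2}} = \frac{\sqrt{3}}{i\sqrt{2}} = -i\sqrt{\tfrac{3}{2}},\\
\text{expr2} - \text{expr1} &= -2i\sqrt{\tfrac{3}{2}} = -i\sqrt{2}\,\sqrt{3}.
\end{aligned}
```

For real values both radicands are positive and sqrt(a/b) = sqrt(a)/sqrt(b) holds; with b = -2 it does not, which is why the two expressions only agree under the assumption that the variables are real.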

How to embed a sentence into a vector

I have a sentence, and I use word2vec to embed each word as a vector. For example, if my sentence has 5 words, I get 5 different vectors (one for each word). What is the best method to turn the complete sentence into a single vector that I can pass to the ANN?
This is an open problem; many approaches exist to creating meaningful sentence vectors.
BoW models, as Fabrizio_P explained
Element-wise vector operations (http://www.aclweb.org/anthology/P/P08/P08-1028.pdf)
Addition (i.e. simply add all the word vectors together, possibly normalizing afterwards)
Multiplication (i.e. multiply all vectors together, element-wise, resulting in a logically grounded embedding)
Arbitrary task-specific recurrent functions (http://www.aclweb.org/anthology/D12-1110)
More sophisticated general-purpose approaches (https://arxiv.org/abs/1508.02354, https://arxiv.org/abs/1506.06726)
Element-wise operations, such as vector addition, suffice for most simple tasks, but obviously exhibit a high amount of information loss as sentences grow larger or the task at hand gets more demanding. Recurrent neural networks are quite good at creating task specific sentence embeddings, but obviously these require training data and some familiarity with machine learning. General purpose sentence embeddings are the most interesting ones from a research perspective, but probably not what you're looking for.
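As a minimal sketch of the element-wise options (addition with optional normalization, and element-wise multiplication), using plain Python lists: the `embeddings` dictionary stands in for vectors from a trained word2vec model, and the toy 2-dimensional vectors in the usage are made up. Skipping words that have no vector is one of several reasonable choices, not a fixed rule.

```python
import math


def compose(words, embeddings, op="add", normalize=True):
    """Combine word vectors into one sentence vector, element-wise.

    op="add": sum the vectors; op="mul": multiply them element-wise.
    Words missing from `embeddings` are skipped."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        raise ValueError("none of the words has an embedding")
    if op == "add":
        result = [sum(column) for column in zip(*vectors)]
    elif op == "mul":
        result = [math.prod(column) for column in zip(*vectors)]
    else:
        raise ValueError(f"unknown op: {op!r}")
    if normalize:
        length = math.sqrt(sum(x * x for x in result))
        if length > 0:  # scale to unit length so long sentences don't dominate
            result = [x / length for x in result]
    return result


toy = {"hello": [1.0, 0.0], "world": [0.0, 1.0]}  # made-up vectors
print(compose(["hello", "world"], toy))  # roughly [0.707, 0.707]
```

Addition keeps a rough "topic" of the sentence but ignores word order, which is the information loss mentioned above; multiplication tends to shrink toward zero as sentences grow, so it is usually paired with normalization.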
You could use the bag-of-words concept, as explained here: https://machinelearningmastery.com/gentle-introduction-bag-words-model/. That is, you collect all of your words and put them in a vocabulary. You can then represent your sentence as a vector where each element is either 1 or 0, depending on whether the corresponding word is in the sentence or not.
For example if your sentence is
Hello my name is Peter.
Your dictionary will be
[Hello, my, name, is, Peter]
The vector for your sentence will be
[1, 1, 1, 1, 1]
If you have another sentence like
I am happy.
Your dictionary will be extended to include those words as well, so it becomes
[Hello, my, name, is, Peter, I, am, happy]
And the vector for your first sentence is extended accordingly:
[1, 1, 1, 1, 1, 0, 0, 0]
As an alternative, you can create a vocabulary where each word is represented by a number:
{Hello: 1, my: 2, name: 3, is: 4, Peter: 5, I: 6, am: 7, happy: 8}
The vector for your first sentence will then be
[1, 2, 3, 4, 5]
For each new sentence you will convert the words into numbers using the vocabulary as reference.
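Both representations can be sketched in a few lines of Python. This is a minimal illustration, assuming a simple tokenizer that splits on word characters and keeps the original casing (so "Hello" and "hello" would be different words); real bag-of-words tools usually lowercase and may use other tokenization rules.

```python
import re


def tokenize(sentence):
    # Keep the original casing so the vocabulary matches the example above.
    return re.findall(r"\w+", sentence)


def build_vocabulary(sentences):
    """Map each new word to the next 1-based id, in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for word in tokenize(sentence):
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


def binary_vector(sentence, vocab):
    """1 if the vocabulary word occurs in the sentence, else 0 (in id order)."""
    present = set(tokenize(sentence))
    return [1 if word in present else 0 for word in sorted(vocab, key=vocab.get)]


def id_sequence(sentence, vocab):
    """Replace each known word by its id; unknown words are dropped."""
    return [vocab[word] for word in tokenize(sentence) if word in vocab]


sentences = ["Hello my name is Peter.", "I am happy."]
vocab = build_vocabulary(sentences)
print(binary_vector(sentences[0], vocab))  # [1, 1, 1, 1, 1, 0, 0, 0]
print(id_sequence(sentences[0], vocab))    # [1, 2, 3, 4, 5]
```

Note that the binary vector has a fixed length (the vocabulary size), while the id sequence has one entry per word, so it needs padding or truncation before it can be fed to a fixed-size ANN input.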
word2vec is an algorithm for creating word embeddings; you can read the details here: https://www.tensorflow.org/tutorials/word2vec
You can run this algorithm on your own dataset, or use pretrained word embeddings that Google (or others) have trained on billions of documents.
The idea is to map each word to a dense vector in an n-dimensional vector space, which captures much more information about words and their relationships.
Put simply, each word is represented by a unique list of numbers, which makes mathematical operations possible on words, sentences and documents.
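One common such operation is cosine similarity, which measures how closely two vectors point in the same direction. Here is a minimal sketch. The 3-dimensional "cat"/"dog"/"car" vectors are made up for illustration; real word2vec vectors typically have a few hundred dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Made-up vectors: "cat" and "dog" point in similar directions, "car" does not.
cat = [0.9, 0.1, 0.3]
dog = [0.8, 0.2, 0.35]
car = [0.1, 0.9, 0.0]
print(cosine_similarity(cat, dog) > cosine_similarity(cat, car))  # True
```

The same function works on sentence vectors built by composing word vectors, which is one way to compare whole sentences.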

Divisor function on calculator

I need to know the factorization of numbers. Why? Well, I am planning to write a story called Math World, and for the base population I have these gender and factor rules:
If male factors outweigh female factors (for example, 4 male factors and 3 female factors), the number is male.
If female factors outweigh male factors, the number is female.
If male factors equal female factors, the number is hermaphroditic (both male and female).
If you take a male number to an integer power, you get a hermaphroditic number.
If you take a female number to an integer power, you get another female number.
Primes alternate between male and female (2 is male, 3 is female, and so on).
But this is only for the base population.
Anyway, I don't have a factorization program on my calculator. I need all the factors, not just the prime factors. How can I do that?
Calculator model:
TI-84 Plus Silver Edition
Code:
:Input "TYPE NUMBER", X
:FOR(A,1,X)
:IF remainder(X,A)=0
:Disp A
:End
:End
:End
This is the code that gives me errors on my calculator. My calculator is in FUNC mode, which is why I entered X using the variable button (it gives a different variable in each mode).
Let's take a look at your code:
:Input "TYPE NUMBER", X
:FOR(A,1,X)
:IF remainder(X,A)=0
:Disp A
:End
:End
:End
First off, programs return automatically in TI-Basic, so there is no need for Stop, Return, or End at the end of a program. Programs behave as if they had Return at the end, with one exception: if something is evaluated on the last line, its value is displayed instead of Done.
Second, unless an If statement is accompanied by Then, no End is needed (only the next line is conditionally evaluated).
Thus, you need only one End token (closing the For loop) instead of three. Also, this may have been a typo on your part, but you should not have a space after the comma in the first line. You probably wanted it before the closing quote instead.
Now your code works, and it looks like this:
:Input "TYPE NUMBER ",X
:For(A,1,X)
:If remainder(X,A)=0
:Disp A
:End
There are still optimizations, however. Because of a quirk in the For( loop, you should keep its closing parenthesis. For the If statement, on the other hand, you could write If 0=remainder(X,A to leave off the closing parenthesis. Additionally, instead of 0= you can simply use not(. Finally, remainder(X,A doesn't work on all TI models and is one byte longer than fPart(X/A. There are no downsides to this replacement: it saves a little space (1 byte) and time, and improves compatibility.
Also, if A is a factor of X, then X/A is also a factor. Thus, we only need to loop up to √(X), which is much more efficient. Here is the final code:
:Input "TYPE NUMBER ",X
:For(A,1,√(X))
:If not(fPart(X/A
:Disp A,X/A
:End
Possible optimizations (up to you):
50/50 - calculate X/A separately and use Ans twice: slightly faster, no size change, slightly less readable.
Would not recommend - leave off the )) in the For( loop to save a little space (2 bytes), at the cost of a noticeable decrease in speed for large numbers, due to the quirk with For( loops.
Up to you - since you already know that 1 and X are factors, you could start the loop from 2. It depends on how you're recording or displaying these values.
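For checking the calculator's output on a computer, here is the same √n idea as a Python sketch (not TI-Basic). Unlike the calculator version, it returns the divisors in ascending order and lists √X only once when X is a perfect square; the TI-Basic loop above would display it twice (for X = 36, it shows 6 twice).

```python
import math


def divisors(n):
    """All positive divisors of n, by trial division up to sqrt(n)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    small, large = [], []
    for a in range(1, math.isqrt(n) + 1):
        if n % a == 0:
            small.append(a)
            if a != n // a:  # avoid listing sqrt(n) twice for perfect squares
                large.append(n // a)
    return small + large[::-1]


print(divisors(36))  # [1, 2, 3, 4, 6, 9, 12, 18, 36]
```

Each divisor a found below √n is paired with n // a above it, so only about √n candidates are tested, matching the O(√n) improvement described here.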
You can improve the speed of the algorithm from O(n) to O(√n) with this if you don't care about the order of the factors.
For(A,1,√(X))
If remainder(X,A)=0
Disp A,X/A
End
Input "TYPE NUMBER:",N
For(A,1,N)
If remainder(N,A)=0
Disp A
End
I wasn't able to test this program, but I'm fairly sure that it works. It simply displays the factors, but it can easily be changed to store them instead, for example in a list.
If the remainder of N divided by A is equal to zero, then A is a factor of N.

Conjunctive normal form of Game of Life

I want to use MiniSat to solve a 7x7 Game of Life, to get the stable generations.
Here is my simplified rule of life and death:
Von Neumann neighborhood of radius 1
A cell whose south, east and north neighbors are alive will be alive.
(xin: north neighbor; xie: east neighbor; xis: south neighbor)
My formula:
But I don't know how to convert this to CNF (conjunctive normal form).
Can someone help me?
The way I learned CNF, the "dead" formula is a single disjunctive clause:
~xin V ~xie V ~xis
... which is simply the application of De Morgan's theorem to the "live" case, which is already in CNF.
Remember, any expression of literals whose operators are all disjunctions or all conjunctions is already in both CNF and DNF.
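The question's formula image is missing, so this sketch assumes the intended rule is that a cell x is alive exactly when xin, xie and xis are all alive, i.e. x ↔ (xin ∧ xie ∧ xis). Its CNF is the three clauses (¬x ∨ xin), (¬x ∨ xie), (¬x ∨ xis) plus (x ∨ ¬xin ∨ ¬xie ∨ ¬xis); the last clause contains the "dead" disjunction from the answer. The code emits these clauses in the DIMACS CNF text format that MiniSat reads. The variable numbering (x = 1, xin = 2, xie = 3, xis = 4) is hypothetical, and a full 7x7 encoding would repeat this for every cell with its own variables.

```python
def and_equivalence_clauses(x, inputs):
    """CNF clauses for x <-> (inputs[0] AND inputs[1] AND ...).

    Literals follow the DIMACS convention: a positive int is a variable,
    a negative int is its negation."""
    clauses = [[-x, v] for v in inputs]         # x implies each input
    clauses.append([x] + [-v for v in inputs])  # all inputs together imply x
    return clauses


def to_dimacs(clauses, num_vars):
    """Render clauses as DIMACS CNF text (each clause ends with 0)."""
    lines = [f"p cnf {num_vars} {len(clauses)}"]
    lines += [" ".join(map(str, clause)) + " 0" for clause in clauses]
    return "\n".join(lines)


def satisfied(clauses, assignment):
    """True if the assignment (variable -> bool) satisfies every clause."""
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)


print(to_dimacs(and_equivalence_clauses(1, [2, 3, 4]), 4))
# p cnf 4 4
# -1 2 0
# -1 3 0
# -1 4 0
# 1 -2 -3 -4 0
```

The `satisfied` helper makes it easy to confirm by brute force that the four clauses are true exactly when x equals xin ∧ xie ∧ xis.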

Orange Bayes algorithm with continuous features

I have a two-class Bayes classification problem with four continuous features. I'm trying to partially reproduce the Bayes algorithm that Orange uses to calculate probabilities, but I haven't managed to obtain the same values that Orange outputs.
Data set size: 150 (class0: 88, class1: 62)
I use the following algorithm:
p(class0 | X1, X2, X3, X4) = L0 / (L0 + L1)
p(class1 | X1, X2, X3, X4) = L1 / (L0 + L1)
where L0 and L1 are likelihoods
L0 = prior_class0 * product( p(Xi|class0) )
L1 = prior_class1 * product( p(Xi|class1) )
prior_class0 and prior_class1 are Laplace estimates:
prior_class0 = (88 + 1) / (150 + 2)
prior_class1 = (62 + 1) / (150 + 2)
Orange uses LOESS to calculate the conditional probabilities (I guess it's not necessary to reproduce that). For this dataset it outputs 49 points for both classes, as given in the Python object classifier.conditional_distributions. By using linear interpolation between the surrounding points for Xi, I can calculate p(Xi|class0) and p(Xi|class1).
1) Can anyone comment on Orange's Bayes algorithm with continuous features?
2) Or, any technical advice on how to set up a compiler/IDE so that I could debug the Orange C++ code and inspect some intermediate results from functions in orange/source/orange/bayes.cpp?
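A minimal sketch of the computation described above, assuming each p(Xi|class) is given as a sorted list of (x, p) points like the 49 LOESS points. Values of x outside the grid are clamped to the nearest end point; that is an assumption, and Orange may handle the ends differently. The one-feature toy grids in the usage are made up.

```python
from bisect import bisect_right


def interpolate(points, x):
    """Linear interpolation in a sorted list of (x_k, p_k) pairs."""
    xs = [p[0] for p in points]
    if x <= xs[0]:
        return points[0][1]   # clamp below the grid (assumption)
    if x >= xs[-1]:
        return points[-1][1]  # clamp above the grid (assumption)
    j = bisect_right(xs, x)   # xs[j-1] <= x < xs[j]
    (x1, p1), (x2, p2) = points[j - 1], points[j]
    return p1 + (p2 - p1) * (x - x1) / (x2 - x1)


def laplace_prior(count, total, n_classes=2):
    """(count + 1) / (total + n_classes), e.g. (88 + 1) / (150 + 2)."""
    return (count + 1) / (total + n_classes)


def posteriors(priors, conditionals, sample):
    """priors[c]: prior of class c; conditionals[c][i]: points for p(Xi|c).

    Returns L_c / sum(L) with L_c = prior_c * product_i p(Xi|c)."""
    likelihoods = []
    for prior, per_feature in zip(priors, conditionals):
        likelihood = prior
        for points, x in zip(per_feature, sample):
            likelihood *= interpolate(points, x)
        likelihoods.append(likelihood)
    total = sum(likelihoods)
    return [lik / total for lik in likelihoods]


priors = [laplace_prior(88, 150), laplace_prior(62, 150)]
toy = [[[(0.0, 0.2), (1.0, 0.4)]],   # class0, one made-up feature
       [[(0.0, 0.6), (1.0, 0.2)]]]   # class1
print(posteriors(priors, toy, [0.5]))
```

Comparing `interpolate` against the individual Orange points first, and only then the full posterior, makes it easier to see which step introduces a discrepancy.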
Orange uses a slightly different formula that, according to Kononenko, gives the same result but allows for better interpretability and m-estimates of probabilities. Instead of product( p(Xi|class0) ) it computes product( p(class0|Xi) / p(class0) ). I don't think this should affect your computation, though you can check. The code that computes those probabilities is at https://github.com/biolab/orange/blob/master/source/orange/bayes.cpp#L169. Note that it does this for all classes in parallel.
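Why the two forms should agree, assuming the prior factor is kept as in L0 = prior_class0 * product(...): by Bayes' rule each factor p(c|Xi)/p(c) equals p(Xi|c)/p(Xi), so for a class c with likelihood L_c

```latex
p(c)\prod_{i=1}^{4}\frac{p(c \mid X_i)}{p(c)}
= \frac{p(c)\prod_{i=1}^{4} p(X_i \mid c)}{\prod_{i=1}^{4} p(X_i)}
= \frac{L_c}{\prod_{i=1}^{4} p(X_i)} .
```

The denominator is the same for class0 and class1, so it cancels in L0 / (L0 + L1). This holds exactly only when all the estimates are mutually consistent; with separately smoothed estimates (m-estimates, LOESS curves) small differences are possible, which is presumably why checking is suggested.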
The other piece of the code you're interested in is the computation of probabilities from LOESS density estimates. It's at https://github.com/biolab/orange/blob/master/source/orange/estimateprob.cpp#L307. Note that most operations there are on vectors, e.g. all variables in *result *= (x-x1)/(x2-x1); are actually vectors.
As for debugging: I wrote this code many years ago with Visual Studio (and am somewhat ashamed to admit it, seeing the terrible coding style I used). I forgot the version and can't check it since I no longer use Windows. I never really debugged Orange on any other OS.
If you load the project and build a debug version, you'll also have to build a debug version of Python. This is actually simple (see the instructions in the Python source code); the problem is that you'd also have to build debug versions of any other binary libraries you use (e.g. numpy). A simpler way is to build a release version of Orange with the debug info flags switched on. This way you can use standard Python and libraries.