Blank results while using Tokens Regex rules to identify Named Entities - macros

I am struggling to write a correct rule involving macros to identify organizations in text.
To identify Matrix Inc. in:
With its rising share prices, Matrix Inc. has come out a winner this quarter.
I am trying to check for words like Inc within the entity, and thus defined a macro and rule as below:
$ORGANIZATION_TITLES = "/pharmaceuticals?|group|corp|corporation|international|co.?|inc.?|incorporated|holdings|motors|ventures|partners|llc|limited liability corporation|pvt.? ltd.?/"
ENV.defaults["stage"] = 1
{
ruleType: "tokens",
pattern: ([$ORGANIZATION_TITLES]),
action: ( Annotate($0, ner, "ORGANIZATION") )
}
ENV.defaults["stage"] = 2
{ ( [{tag:NNP}]+? ($ORGANIZATION_TITLES)) => ORGANIZATION }
I also tried using bindings and then applying the rule:
env.bind("$ORGANIZATION_TITLES", TokenSequencePattern.compile(env, "/pharmaceuticals?|group|corp|corporation|international|co.?|inc.?|incorporated|holdings|motors|ventures|partners|llc|limited liability corporation|pvt.? ltd.?/"));
Nothing seems to be working. I need to define more complex pattern rules involving macros like:
pattern: ( [ { ner:PERSON } ]+ /,/*? ($TITLES_CORPORATE_PREFIXES)*? $TITLES_CORPORATE+? /,/*? /of|for/? /,/*? [ { ner:ORGANIZATION } ]+ )
where $TITLES_CORPORATE_PREFIXES and $TITLES_CORPORATE are macros similar to $ORGANIZATION_TITLES.
What am I doing wrong?
EDIT
Here's my code:
public static void main(String[] args)
{
    String rulesFile = "D:\\Workspace\\resource\\NERRulesFile.txt";
    String dataFile = "D:\\Workspace\\resource\\GoldSetSentences.txt";
    Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    // pipeline.addAnnotator(new TokensRegexAnnotator(rulesFile));
    String inputText = "Bill Edelman , CEO and Chairman , for Paragonix commented on the Supply Agreement with Essential Pharmaceuticals .";
    Annotation document = new Annotation(inputText.toLowerCase());
    pipeline.annotate(document);
    List<CoreMap> sentences = document.get(SentencesAnnotation.class);
    CoreMapExpressionExtractor extractor = CoreMapExpressionExtractor.createExtractorFromFiles(TokenSequencePattern.getNewEnv(), rulesFile);
    /* Next we can go over the annotated sentences and extract the annotated words,
       using the CoreLabel object. */
    for (CoreMap sentence : sentences)
    {
        List<MatchedExpression> matched = extractor.extractExpressions(sentence);
        for (MatchedExpression phrase : matched)
        {
            // Print out matched text and value
            System.out.println("matched: " + phrase.getText() + " with value " + phrase.getValue());
            // Print out token information
            CoreMap cm = phrase.getAnnotation();
            for (CoreLabel token : cm.get(TokensAnnotation.class))
            {
                String word = token.get(TextAnnotation.class);
                String lemma = token.get(LemmaAnnotation.class);
                String pos = token.get(PartOfSpeechAnnotation.class);
                String ne = token.get(NamedEntityTagAnnotation.class);
                System.out.println("matched token: " + "word=" + word + ", lemma=" + lemma + ", pos=" + pos + ", ne=" + ne);
            }
        }
    }
}

Here is a rules file that should work:
ner = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NamedEntityTagAnnotation" }
$ORGANIZATION_TITLES = "/inc\.|corp\./"
{ pattern: ([{pos: NNP}]+ $ORGANIZATION_TITLES), action: ( Annotate($0, ner, "RULE_FOUND_ORG") ) }
I have made some changes to our code base to make the TokensRegexAnnotator more easily accessible. You will need to get the latest version from GitHub: https://github.com/stanfordnlp/CoreNLP
If you run this command (or the equivalent Java API call) it should work:
java -Xmx8g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,tokensregex -tokensregex.rules organization.rules -file samples.txt -outputFormat text -tokensregex.caseInsensitive
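For reference, here is a minimal sketch of what that equivalent Java API call could look like, assuming the rules above are saved as organization.rules (the file name, class name and sample sentence are placeholders):
import java.util.Properties;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class TokensRegexPipelineSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // run tokensregex after the regular NER annotator, mirroring the command line above
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, tokensregex");
        props.setProperty("tokensregex.rules", "organization.rules");
        props.setProperty("tokensregex.caseInsensitive", "true");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation document = new Annotation(
                "Bill Edelman, CEO and Chairman, for Paragonix commented on the Supply Agreement with Essential Pharmaceuticals.");
        pipeline.annotate(document);

        // tokens matched by the rules file should now carry the RULE_FOUND_ORG NER tag
        pipeline.prettyPrint(document, System.out);
    }
}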

Convert List to json object in Talend

I'm trying to convert a list result from a tJava component to a JSON object.
The result from the tJava component is below:
[{run_id=5d0753d58d93b71a1d12cc22_, parent_run_id=null, pipe_invoker=scheduled, path_id=shared, count=33, plex_path=null, invoker=abc.com, nested_pipeline=true, duration=355, start_time=2020-11-20T11:17:32.298000+00:00, lable=MP_SQS, state=Completed, key=57694b41ee, root_ruuid=2_ba32ea346}, {run_id=5bd4c6ea346, parent_run_id=null, pipe_invoker=scheduled, path_id=shared, count=33, plex_path=null, invoker=wwr.com, nested_pipeline=true, duration=355, start_time=2020-11-20T11:17:32.298000+00:00, lable=Summary_MP_SQS, state=Completed, key=55dfff4f, root_ruuid=1246d2-8bdc-1846}]
I tried using tConvertType, and also converting to a String in tMap and applying replace-all several times before storing the result into a JSON file, but nothing works as expected.
The expected end result is a JSON file built from the above output.
I don't think there is an easy way to do that with a Talend component (but I may be wrong :-) ). You essentially need to:
Put quotes around the keys/values
Replace "=" with ":"
But it can be done with regular Java code; here is an example:
String input =
    "[{run_id=1111, parent_run_id=null1, pipe_invoker=scheduled1, path_id=shared, count=33, plex_path=null}," +
    " {run_id=2222, parent_run_id=null2, pipe_invoker=scheduled2, path_id=shared, count=33, plex_path=null}]";
StringBuilder output = new StringBuilder();
input = input.substring(2, input.length() - 2); // remove leading "[{" and trailing "}]"
output.append("[{");
String[] step1 = input.split("\\}, \\{"); // each record, split on "}, {"
int j = 1;
for (String record : step1) {
    String[] step2 = record.split(","); // each key/value pair, split on ","
    int i = 1;
    for (String keyValue : step2) {
        // key=value --> "key":"value"
        output.append("\"" + keyValue.split("=")[0].trim() + "\":\"" + keyValue.split("=")[1].trim() + "\"");
        if (i++ < step2.length) {
            output.append(",");
        }
    }
    if (j++ < step1.length) {
        output.append("} , {");
    }
}
output.append("}]");
/*
output :
[{"run_id":"1111","parent_run_id":"null1","pipe_invoker":"scheduled1","path_id":"shared","count":"33","plex_path":"null"} , {"run_id":"2222","parent_run_id":"null2","pipe_invoker":"scheduled2","path_id":"shared","count":"33","plex_path":"null"}]
*/
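If the end goal is an actual JSON file, the built string can then be written out from the same tJava component; a minimal sketch (the output path is just a placeholder):
// write the generated JSON string to a file (the path is a placeholder)
try {
    java.nio.file.Files.write(java.nio.file.Paths.get("C:/temp/output.json"),
            output.toString().getBytes(java.nio.charset.StandardCharsets.UTF_8));
} catch (java.io.IOException e) {
    e.printStackTrace();
}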

ServiceStack Ormlite Deserialize Array for In Clause

I am storing some query criteria in the db via a ToJson() on the object that contains all the criteria. A simplified example would be:
{"FirstName" :[ {Operator: "=", Value: "John"}, { Operator: "in", Value:" ["Smith", "Jones"]"}], "SomeId": [Operator: "in", Value: "[1,2,3]" }]}
The lists are either string, int, decimal or date. These all map to the same class/table so it is easy via reflection to get FirstName or SomeId's type.
I'm trying to create a where clause based on this information:
if (critKey.Operator == "in")
{
    wb.Values.Add(keySave + i, (object)ConvertList<Members>(key, (string)critKey.Value));
    wb.WhereClause = wb.WhereClause + " And {0} {1} (@{2})".Fmt(critKey.Column, critKey.Operator, keySave + i);
}
else
{
    wb.Values.Add(keySave + i, (object)critKey.Value);
    wb.WhereClause = wb.WhereClause + " And {0} {1} @{2}".Fmt(critKey.Column, critKey.Operator, keySave + i);
}
It generates something like this (example from my tests, yes I know the storenumber part is stupid):
Email = @Email0 And StoreNumber = @StoreNumber0 And StoreNumber in (@StoreNumber1)
I'm running into an issue with the lists. Is there a nice way to do this with any of the OrmLite tools instead of doing it all by hand? The where clause generates fine except when dealing with lists. I'm trying to make it generic but having a hard time with that part.
A second question, maybe related: I can't seem to find out how to use parameters with IN. Coming from NPoco you can do (column in @0, somearray), but I can't seem to find out how to do this without using Sql.In.
I ended up having to write my own parser, as it seems OrmLite doesn't have the same support for query params for lists that NPoco has. Basically I'd prefer to be able to do Where("SomeId in @Ids") and pass in a parameter, but I ended up with this code:
listObject = ConvertListObject<Members>(key, (string)critKey.Value);
wb.WhereClause = wb.WhereClause + " And {0} {1} ({2})"
    .Fmt(critKey.Column, critKey.Operator, listObject.EscapedList(ColumnType<Members>(key)));

public static string EscapedList(this List<object> val, Type t)
{
    var escapedList = "";
    if (t == typeof(int) || t == typeof(float) || t == typeof(decimal))
    {
        escapedList = String.Join(",", val.Select(x => x.ToString()));
    }
    else
    {
        escapedList = String.Join(",", val.Select(x => "'" + x.ToString() + "'"));
    }
    return escapedList;
}
I'd like to see other answers, especially if I'm missing something in OrmLite.
When dealing with lists you can use the following example:
var storeNumbers = new[] { "store1", "store2", "store3" };
var ev = Db.From<MyClass>()
    .Where(p => storeNumbers.Contains(p.StoreNumber));
var result = Db.Select(ev);

Custom procedure fails to collect properties of a class parameter; why?

OK, first of all, I'm a rookie with Caché, so the code will probably be poor, but...
I need to be able to query the Caché database from Java in order to rebuild source files outside of Studio.
I can dump methods etc. without trouble; however, there is one thing which escapes me... For some reason, I cannot dump the properties of parameter EXTENTQUERYSPEC from class Sample.Person (namespace: SAMPLES).
The class reads like this in Studio:
Class Sample.Person Extends (%Persistent, %Populate, %XML.Adaptor)
{
Parameter EXTENTQUERYSPEC = "Name,SSN,Home.City,Home.State";
// etc etc
}
Here is the code of the procedure:
CREATE PROCEDURE CacheQc.getParamDesc(
    IN className VARCHAR(50),
    IN methodName VARCHAR(50),
    OUT description VARCHAR(8192),
    OUT type VARCHAR(50),
    OUT defaultValue VARCHAR(1024)
) RETURNS NUMBER LANGUAGE COS {
    set ref = className _ "||" _ methodName
    set row = ##class(%Dictionary.ParameterDefinition).%OpenId(ref)
    if (row = "") {
        quit 1
    }
    set description = row.Description
    set type = row.Type
    set defaultValue = row.Default
    quit 0
}
And the Java code:
private void getParamDetail(final String className, final String paramName)
    throws SQLException
{
    final String call
        = "{ ? = call CacheQc.getParamDesc(?, ?, ?, ?, ?) }";
    try (
        final CallableStatement statement = connection.prepareCall(call);
    ) {
        statement.registerOutParameter(1, Types.INTEGER);
        statement.setString(2, className);
        statement.setString(3, paramName);
        statement.registerOutParameter(4, Types.VARCHAR);
        statement.registerOutParameter(5, Types.VARCHAR);
        statement.registerOutParameter(6, Types.VARCHAR);
        statement.executeUpdate();
        final int ret = statement.getInt(1);
        // HERE
        if (ret != 0)
            throw new SQLException("failed to read parameter");
        System.out.println(" description: " + statement.getString(4));
        System.out.println(" type : " + statement.getString(5));
        System.out.println(" default : " + statement.getString(6));
    }
}
Now, for the aforementioned class/parameter pair, the condition marked // HERE is always triggered and therefore the exception is thrown... If I comment out that line, I see that all three OUT parameters are null, even defaultValue!
I'd have expected the latter to have the value shown in Studio...
So, why does this happen? Is my procedure broken somehow?
First, you should check that you are sending the right values for className and paramName: the full name, and in the right case. Also, why did you choose stored procedures when you can simply use a SELECT? And you can call your procedure in the System Management Portal to check for possible errors.
select description, type,_Default "Default" from %Dictionary.ParameterDefinition where id='Sample.Person||EXTENTQUERYSPEC'
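If you would rather run that same check from Java than from the System Management Portal, a minimal JDBC sketch could look like this (the URL and credentials are just the usual Samples defaults and may differ on your install):
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import com.intersys.jdbc.CacheDataSource;

public class CheckParameterDefinition {
    public static void main(String[] args) throws Exception {
        CacheDataSource ds = new CacheDataSource();
        ds.setURL("jdbc:Cache://127.0.0.1:56775/Samples"); // adjust host/port/namespace
        ds.setUser("_system");
        ds.setPassword("SYS");
        try (Connection conn = ds.getConnection();
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "select Description, Type, _Default from %Dictionary.ParameterDefinition"
                 + " where id = 'Sample.Person||EXTENTQUERYSPEC'")) {
            while (rs.next()) {
                System.out.println("description: " + rs.getString(1));
                System.out.println("type       : " + rs.getString(2));
                System.out.println("default    : " + rs.getString(3));
            }
        }
    }
}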
Your example works well for me.
package javaapplication3;

import com.intersys.jdbc.CacheDataSource;
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Types;

public class JavaApplication3 {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws SQLException {
        CacheDataSource ds = new CacheDataSource();
        ds.setURL("jdbc:Cache://127.0.0.1:56775/Samples");
        ds.setUser("_system");
        ds.setPassword("SYS");
        Connection dbconnection = ds.getConnection();

        String call = "{ ? = call CacheQc.getParamDesc(?, ?, ?, ?, ?)}";
        CallableStatement statement = dbconnection.prepareCall(call);
        statement.registerOutParameter(1, Types.INTEGER);
        statement.setString(2, "Sample.Person");
        statement.setString(3, "EXTENTQUERYSPEC");
        statement.registerOutParameter(4, Types.VARCHAR);
        statement.registerOutParameter(5, Types.VARCHAR);
        statement.registerOutParameter(6, Types.VARCHAR);
        statement.executeUpdate();

        int ret = statement.getInt(1);
        System.out.println("ret = " + ret);
        System.out.println(" description: " + statement.getString(4));
        System.out.println(" type : " + statement.getString(5));
        System.out.println(" default : " + statement.getString(6));
    }
}
End result:
ret = 0
description: null
type : null
default : Name,SSN,Home.City,Home.State
UPD:
Try changing the code of your procedure to add some debugging, like here:
Class CacheQc.procgetParamDesc Extends %Library.RegisteredObject [ ClassType = "", DdlAllowed, Owner = {UnknownUser}, Not ProcedureBlock ]
{

ClassMethod getParamDesc(className As %Library.String(MAXLEN=50), methodName As %Library.String(MAXLEN=50), Output description As %Library.String(MAXLEN=8192), Output type As %Library.String(MAXLEN=50), Output defaultValue As %Library.String(MAXLEN=1024)) As %Library.Numeric(SCALE=0) [ SqlName = getParamDesc, SqlProc ]
{
    set ref = className _ "||" _ methodName
    set row = ##class(%Dictionary.ParameterDefinition).%OpenId(ref)
    set ^debug($i(^debug))=$lb(ref,row,$system.Status.GetErrorText($g(%objlasterror)))
    if (row = "") {
        quit 1
    }
    set description = row.Description
    set type = row.Type
    set defaultValue = row.Default
    quit 0
}

}
Then, after running the test from Java again, check zw ^debug:
SAMPLES>zw ^debug
^debug=4
^debug(3)=$lb("Sample.Person||EXTENTQUERYSPEC","31#%Dictionary.ParameterDefinition","ERROR #00: (no error description)")
Well, uh, I found the problem... Talk about stupid.
It happens that I had the Sample.Person class open in Studio and had made a "modification" to it, which I deleted just afterwards. Therefore the file was "as new"...
But the procedure doesn't seem to agree with that statement.
I closed the Studio instance where that file was open, chose not to keep the "changes", reran the procedure, and it worked...
Strangely enough, the SQL query worked even with my "fake modification". I guess it's some caching issue...

What is it that should be done here?

I have been following this tutorial to come up with a simple source code editor. (The feature that I want the most is keyword highlighting.) What I do not understand is the last part:
class Scanner extends RuleBasedScanner {

    public Scanner() {
        WordRule rule = new WordRule(new IWordDetector() {
            public boolean isWordStart(char c) {
                return Character.isJavaIdentifierStart(c);
            }

            public boolean isWordPart(char c) {
                return Character.isJavaIdentifierPart(c);
            }
        });
        Token keyword = new Token(new TextAttribute(Editor.KEYWORD, null, SWT.BOLD));
        Token comment = new Token(new TextAttribute(Editor.COMMENT));
        Token string = new Token(new TextAttribute(Editor.STRING));
        // add tokens for each reserved word
        for (int n = 0; n < Parser.KEYWORDS.length; n++) {
            rule.addWord(Parser.KEYWORDS[n], keyword);
        }
        setRules(new IRule[] {
            rule,
            new SingleLineRule("#", null, comment),
            new SingleLineRule("\"", "\"", string, '\\'),
            new SingleLineRule("'", "'", string, '\\'),
            new WhitespaceRule(new IWhitespaceDetector() {
                public boolean isWhitespace(char c) {
                    return Character.isWhitespace(c);
                }
            }),
        });
    }
}
The instruction is as follows:
For each of the keywords in our little language, we define a word entry in our WordRule. We pass our keyword detector, together with rules for recognizing comments, strings, and white spaces to the scanner. With this simple set of rules, the scanner can segment a stream of bytes into sections and then use the underlying rules to color the sections.
Could someone shed some light on this, please? I do not know what I have to do to set the desired keywords.
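For what it's worth, the only thing the scanner needs from you is a string array of reserved words that gets fed into rule.addWord(...). A minimal sketch of what the tutorial's Parser.KEYWORDS constant might look like (the keyword list below is only an illustrative assumption, not the tutorial's actual list):
// Hypothetical Parser class: KEYWORDS holds the reserved words to highlight.
// Replace the entries with the keywords of your own little language.
public class Parser {
    public static final String[] KEYWORDS = {
        "if", "else", "while", "for", "return", "var", "true", "false", "null"
    };
}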

Entity Framework - Table-Valued Functions - Parameter Already Exists

I am using table-valued functions with Entity Framework 5. I just received this error:
A parameter named 'EffectiveDate' already exists in the parameter collection. Parameter names must be unique in the parameter collection. Parameter name: parameter
It is caused by joining calls to table-valued functions that take the same parameter.
Is this a bug/limitation with EF? Is there a workaround? Right now I am auto-generating the code (.edmx file).
It would be really nice if Microsoft would make parameter names unique, at least on a per-context basis.
I've created an issue for this here.
In the meantime, I was able to get this to work by tweaking a few functions in the .Context.tt file, so that it adds a GUID to each parameter name at runtime:
private void WriteFunctionImport(TypeMapper typeMapper, CodeStringGenerator codeStringGenerator, EdmFunction edmFunction, string modelNamespace, bool includeMergeOption)
{
    if (typeMapper.IsComposable(edmFunction))
    {
#>
    [EdmFunction("<#=edmFunction.NamespaceName#>", "<#=edmFunction.Name#>")]
    <#=codeStringGenerator.ComposableFunctionMethod(edmFunction, modelNamespace)#>
    { var guid = Guid.NewGuid().ToString("N"); <#+
        codeStringGenerator.WriteFunctionParameters(edmFunction, " + guid", WriteFunctionParameter);
#>
        <#=codeStringGenerator.ComposableCreateQuery(edmFunction, modelNamespace)#>
    } <#+
    }
    else
    {
#>
    <#=codeStringGenerator.FunctionMethod(edmFunction, modelNamespace, includeMergeOption)#>
    { <#+
        codeStringGenerator.WriteFunctionParameters(edmFunction, "", WriteFunctionParameter);
#>
        <#=codeStringGenerator.ExecuteFunction(edmFunction, modelNamespace, includeMergeOption)#>
    } <#+
        if (typeMapper.GenerateMergeOptionFunction(edmFunction, includeMergeOption))
        {
            WriteFunctionImport(typeMapper, codeStringGenerator, edmFunction, modelNamespace, includeMergeOption: true);
        }
    }
}
...
public void WriteFunctionParameters(EdmFunction edmFunction, string nameSuffix, Action<string, string, string, string> writeParameter)
{
    var parameters = FunctionImportParameter.Create(edmFunction.Parameters, _code, _ef);
    foreach (var parameter in parameters.Where(p => p.NeedsLocalVariable))
    {
        var isNotNull = parameter.IsNullableOfT ? parameter.FunctionParameterName + ".HasValue" : parameter.FunctionParameterName + " != null";
        var notNullInit = "new ObjectParameter(\"" + parameter.EsqlParameterName + "\"" + nameSuffix + ", " + parameter.FunctionParameterName + ")";
        var nullInit = "new ObjectParameter(\"" + parameter.EsqlParameterName + "\"" + nameSuffix + ", typeof(" + parameter.RawClrTypeName + "))";
        writeParameter(parameter.LocalVariableName, isNotNull, notNullInit, nullInit);
    }
}
...
public string ComposableCreateQuery(EdmFunction edmFunction, string modelNamespace)
{
    var parameters = _typeMapper.GetParameters(edmFunction);
    return string.Format(
        CultureInfo.InvariantCulture,
        "return ((IObjectContextAdapter)this).ObjectContext.CreateQuery<{0}>(\"[{1}].[{2}]({3})\"{4});",
        _typeMapper.GetTypeName(_typeMapper.GetReturnType(edmFunction), modelNamespace),
        edmFunction.NamespaceName,
        edmFunction.Name,
        string.Join(", ", parameters.Select(p => "@" + p.EsqlParameterName + "\" + guid + \"").ToArray()),
        _code.StringBefore(", ", string.Join(", ", parameters.Select(p => p.ExecuteParameterName).ToArray())));
}
Not a bug. Maybe a limitation or an omission. Apparently this use case has never been taken into account. EF could use auto-created parameter names, but, yeah, it just doesn't.
You'll have to resort to calling one of the functions with .AsEnumerable(). In my experience, this must be the first function in the join: if you call the second function with .AsEnumerable(), it is still translated to SQL and the name collision still occurs.