How to correctly use the LAST Condition to MARK the last word of the document - uima

I'm trying to mark the last word in the document as an annotation to be used by other rules.
This is what I've tried so far:
DocumentAnnotation{LAST(W) -> MARK(Unit2)};
Document{LAST(W) -> MARK(Unit2)};
Neither of these rules seems to work.
Is it even possible to mark the last word of the document by these means?
The problem is that we are trying to find the last word/period of the document so that a previously marked annotation can be shifted to the last word.
Any help would be greatly appreciated.

You can't use the LAST condition for this scenario. Instead, you can use the MARKLAST action:
DECLARE LastWord;
Document{->MARKLAST(LastWord)};
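Once LastWord is marked, later rules can anchor on it; a minimal sketch that copies its span into the question's Unit2 type:
DECLARE Unit2;
LastWord{-> MARK(Unit2)};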

Never mind, I'm an oaf;
it's done with
Document{-> MARKLAST(Unit2)};

Grok filter for a time counter HH:MM

I'm quite new to ELK and Grok-filtering, and I'm struggling with parsing this particular pattern in my grok filter.
I've used the grok debugger to try and solve this, but although I like the tool, I just get confused by the custom patterns.
Eventually, I hope to parse lots of log files sent by filebeat to logstash, then send the parsed logs to elasticsearch and display with kibana or some similar visualization tool.
The lines that I need to parse follow the following pattern:
1310 2017-01-01 16:48:54 [325:51] [326:49] [359:57] Some log info text
The first four digits are a log type identifier and will be used for grouping. I've called the field "LogLineID".
The date is formatted YYYY-MM-DD HH:MM:SS, and is parsed ok. I called the field "LogDate".
But now the problem begins. Within the square brackets, I have counters, formatted as MM:SS if you like. I cannot for the life of me find a way to sort these out, but I need to compare these times, hence I want to store them as minutes and seconds, not just numbers.
The first is a counter "TimeSpent",
the second is a counter "TimeStarted" and
the third is a counter "TimeSinceDown".
Then, last, comes the info text, which I've managed to grok with simply applying %{GREEDYDATA:LogInfo}.
I notice that the number of minutes can be far higher than the standard 60 minutes within an hour, so I may be barking up the wrong tree here trying to parse these with date patterns such as TIMESTAMP_ISO8601, but then, I don't really know how else to do this.
So, I came this far:
%{NUMBER:LogLineID} %{TIMESTAMP_ISO8601:LogDate}
and was, as mentioned, able (by cutting away the square-bracket parts) to parse the log info text with
%{GREEDYDATA:LogInfo}
to create a field LogInfo.
But that's where I'm stuck. Could someone please help me figure out the rest?
Massive thanks in advance.
PS: I also found %{NUMBER:duration}, but as far as I could tell it only parses timestamps with a dot, not a colon.
A grok regex expression can help you solve the problem.
But first I want to make sure: do you mean that [325:51], [326:49], and [359:57] are the three components you want to fetch, so that it returns a result like:
TimeSpent: 325:51
TimeStarted: 326:49
TimeSinceDown: 359:57
If I've got that right, you can go one of the following ways:
define your own custom pattern file and add the pattern to it, or
just use the expression in the filter part of your Logstash conf file.
Hope it helps.
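For illustration, here is a sketch of both options (the MMSS pattern name, the patterns_dir path, and the combined field names are mine, not from the thread):
# custom pattern file, e.g. ./patterns/extra
MMSS %{NUMBER}:%{NUMBER}

filter {
  grok {
    patterns_dir => ["./patterns"]
    match => { "message" => "%{NUMBER:LogLineID} %{TIMESTAMP_ISO8601:LogDate} \[%{MMSS:TimeSpent}\] \[%{MMSS:TimeStarted}\] \[%{MMSS:TimeSinceDown}\] %{GREEDYDATA:LogInfo}" }
  }
}
Without a pattern file, the same captures can be inlined with named groups, e.g. \[(?<TimeSpent>\d+:\d+)\].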
Ah, there was a space... Actually, I was misleading myself and everybody in my question, as it was not actually that log line that was causing problems. I just took the first one, not realizing where the problem really was; the one causing problems had a space within the brackets, as such: [ 42:31]. There are also some parts where there are two spaces, so the way I managed to solve this was to include a %{SPACE} between the \[ and the %{NUMBER}:
%{NUMBER:LogLineID} %{TIMESTAMP_ISO8601:LogDate} \[%{SPACE}%{NUMBER:TimeSpentMinutes}\:%{NUMBER:TimeSpentSeconds}\] \[%{SPACE}%{NUMBER:TimeStartedMinutes}\:%{NUMBER:TimeStartedSeconds}\] \[%{SPACE}%{NUMBER:TimeSinceDownMinutes}\:%{NUMBER:TimeSinceDownSeconds}\] %{GREEDYDATA:LogText}
I still haven't solved the merging of minutes and seconds, but I can handle that at a later stage.
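For the merging itself, one option might be a Logstash ruby filter that computes total seconds; a sketch, assuming the field names from the pattern above (event.get/event.set is the ruby filter API in Logstash 5+):
filter {
  ruby {
    code => "event.set('TimeSpentTotalSeconds', event.get('TimeSpentMinutes').to_i * 60 + event.get('TimeSpentSeconds').to_i)"
  }
}
The same line repeated for the other two counters would finish the job.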
Thanks to Lin Don for showing an interest in my problem, and sorry for not replying sooner.
Hope the solution will help others (or even myself) if they're stuck on the same kind of problem.
Note to self: read the logs more carefully before grok'ing... :)

Returning the number of results?

I'm looking to return the number of results found in an ajax fashion on Algolia instant search.
A little field saying something like "There are X results" that refines as the characters are typed.
I've read that you utilise 'nbHits', but I'm unsure of how to go about it, being from a design background.
Thanks for help in advance.
The instantsearch.js stats widget shows the number of results and speed of the search. If you don't want to use the widget, I believe you can still use {{nbHits}} inside of your template wherever you want the number to print.
Very easy when you know how. Thanks for pointing me in the right direction, Josh.
This works:
search.addWidget(
  instantsearch.widgets.stats({
    container: '.no-of-results'
  })
);
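If you want custom wording rather than the widget's default output, stats also accepts templates; a sketch (the body string is illustrative, and the exact template key may differ between instantsearch.js versions):
search.addWidget(
  instantsearch.widgets.stats({
    container: '.no-of-results',
    templates: {
      // {{nbHits}} is interpolated by the widget's template engine
      body: 'There are {{nbHits}} results'
    }
  })
);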

IN selector with asterisk * not working in report selection

What is the proper way to search a table for every record that starts in a similar way? I have tried:
"THESE. WORDS" IN {example_one.job_title} and {example_two.status} = "A"
But I need all combinations, including "THESE. WORDS*". Adding the asterisk doesn't work, I guess because of how IN works.
To summarize the information in the comments:
To limit job_title by the list of values, you need your field on the left-hand side and the values on the right.
For the wildcard match, you may want {example_one.job_title} LIKE 'keyword*'
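Putting it together with the status test from the question, the record-selection formula might look like this (a sketch; Crystal's LIKE supports the * and ? wildcards):
{example_one.job_title} LIKE "THESE. WORDS*" and {example_two.status} = "A"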
If you found this information helpful, you can upvote and/or accept the answer.

Accessing nested elements in python

I am new to Python. I am posting here for the first time, and I know the question might be quite basic, but the problem is I can't figure it out myself.
Let's say I have
List=[("a,"b"),("c","d"),("e","f")]
I want the user to enter one of the elements of one of the tuples as input, and the other element is printed. Or, more precisely, just one of the elements in List[x][0] is the input, and the corresponding List[x][1] element is printed as output. I hope it makes sense.
Thanks!
Please check List in the question. I think you forgot a quote (") in the first tuple: [("a^here,"b"),("c","d"),("e","f")]
I think this might help you.
List=[("a","b"),("c","d"),("e","f")]
c=raw_input('ENTER A CHARACTER-')
for i in xrange(len(List)):
if c in List[i]:
ind=List[i].index(c)
print List[i][abs(ind-1)]
break
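A dict makes the lookup more direct; a sketch in the same Python 2 style as the answer above:
List = [("a", "b"), ("c", "d"), ("e", "f")]
# map each element of a pair to its partner, in both directions
lookup = {}
for x, y in List:
    lookup[x] = y
    lookup[y] = x

c = raw_input('ENTER A CHARACTER-')
print lookup.get(c, 'not found')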

whoosh doesn't search for short words like "C#"

I am using Whoosh to index over 200,000 books, but I have encountered some problems with it.
The Whoosh query parser returns NullQuery for words like "C#" and "C++" that have meta-characters in them, and also for some other short words. These words are used in the title and body of some documents, so I am not using the keyword type for them. I guess the problem is in the analysis or query-parsing phase of searching or indexing, but I can't touch my data blindly. Can anyone help me correct this issue? Thanks.
I fixed the problem by creating a StandardAnalyzer with a regex pattern that meets my requirements. Here is the regex pattern:
'\w+[#+.\w]*'
This makes the tokenizing of fields succeed, and the searching also goes well.
But when I use queries like "some query++*" or "some##*", the parsed query is a single Every query, just the '*'. I also found that this is not related to my analyzer; it is Whoosh's default behavior. So here is my new question: is this behavior correct, or is it a bug?
Note: removing the WildcardPlugin from the query parser solves this problem, but I also need the WildcardPlugin.
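For reference, dropping the plugin from a parser looks like this (a sketch, with schema standing in for your index schema; remove_plugin_class is part of Whoosh's qparser API):
from whoosh import qparser

parser = qparser.QueryParser("title", schema)
parser.remove_plugin_class(qparser.WildcardPlugin)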
Now I am using the following code:
from whoosh import analysis, fields
from whoosh.util import rcompile

# for matching words like '.NET', 'C++' and 'C#'
word_pattern = rcompile('(\.|[\w]+)(\.?\w+|#|\+\+)*')
# I don't need words shorter than two characters, so I don't change the minsize default
analyzer = analysis.StandardAnalyzer(expression=word_pattern)
... now in my schema:
...
title = fields.TEXT(analyzer=analyzer),
...
This solves my first problem, yes. But the main problem is in searching: I don't want to let users search using the Every query, i.e. a plain *. But when I parse queries like C++*, I end up with an Every(*) query. I know that there is some problem, but I can't figure out what it is.
I had the same issue and found out that StandardAnalyzer() uses minsize=2 by default. So in your schema, you have to tell it otherwise.
schema = whoosh.fields.Schema(
    name = whoosh.fields.TEXT(stored=True, analyzer=whoosh.analysis.StandardAnalyzer(minsize=1)),
    # ...
)
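You can see the effect by calling the analyzer directly (a quick sketch; the sample string is illustrative):
from whoosh.analysis import StandardAnalyzer

# the default minsize=2 silently drops one-character tokens like the 'c' in 'C#'
print [t.text for t in StandardAnalyzer()(u"C# C++")]           # -> []
# with minsize=1 the single letters survive ('#' and '++' are still stripped
# by the default tokenizer pattern)
print [t.text for t in StandardAnalyzer(minsize=1)(u"C# C++")]  # -> ['c', 'c']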