Fastparse parse error column numbers missing - scala

I just updated from fastparse 0.3.7 to 0.4.1. There is no longer a column number value in the extras of a Parsed.Failure. I grepped through the source and it seems the functionality has been removed, though it is still in the documentation. Is there some other way to get column info now?

It's just changed a bit. You need to grab the index and the parser that failed, and call StringReprOps.prettyIndex.
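For illustration, here is a minimal sketch against fastparse 0.4.x (the sample parser and input are mine, and the exact package of StringReprOps may differ between 0.4.x point releases):

import fastparse.all._
// depending on the exact 0.4.x release, StringReprOps may live in a different package
import fastparse.StringReprOps

val parser = P( "hello" ~ "\n" ~ "world" )

parser.parse("hello\nwxrld") match {
  case Parsed.Success(value, _) =>
    println(s"parsed: $value")
  case Parsed.Failure(lastParser, index, extra) =>
    // prettyIndex turns the flat character index into a human-readable "line:column" string
    val lineCol = StringReprOps.prettyIndex(extra.input, index)
    println(s"parse failed at $lineCol, expected $lastParser")
}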


Issues with "QUERY(IMPORTRANGE)"

Here's my first question on this forum, though I've read through a lot of good answers here.
Can anyone tell me what I'm doing wrong with my attempt to do a query import from one sheet to a column in another?
Here's the formula I've tried, but all my adjustments still get me a parsing error.
=QUERY(IMPORTRANGE("https://docs.google.com/spreadsheets/d/1yGPdI0eBRNltMQ3Wr8E2cw-wNlysZd-XY3mtAnEyLLY/edit#gid=163356401","Master Treatment Log (Responses)!V2:V")"WHERE Col8="'&B2&'")")
Note that importrange is only needed for imports between spreadsheets. If you only import from one sheet into another within the same spreadsheet, I would suggest using filter() or query().
Assuming the value in B2 is actually a string (and not a number), you can try
=QUERY(IMPORTRANGE("https://docs.google.com/spreadsheets/d/1yGPdI0eBRNltMQ3Wr8E2cw-wNlysZd-XY3mtAnEyLLY/edit#gid=163356401","Master Treatment Log (Responses)!V2:V"), "WHERE Col8='"&B2&"'", 0)
Note the added comma before "WHERE" and the quote placement around the B2 reference. If you want to import a header row, change 0 to 1.
See if that helps? If not, please share a copy of your spreadsheet (sensitive data erased).

Grok filter for a time counter HH:MM

I'm quite new to ELK and Grok-filtering, and I'm struggling with parsing this particular pattern in my grok filter.
I've used the grok debugger to try and solve this, but although I like the tool, I just get confused by the custom patterns.
Eventually, I hope to parse lots of log files sent by filebeat to logstash, then send the parsed logs to elasticsearch and display with kibana or some similar visualization tool.
The lines that I need to parse follow the following pattern:
1310 2017-01-01 16:48:54 [325:51] [326:49] [359:57] Some log info text
The first four digits are a log type identifier and will be used for grouping. I've called the field "LogLineID".
The date is formatted YYYY-MM-DD HH:MM:SS, and is parsed ok. I called the field "LogDate".
But now the problem begins. Within the square brackets, I have counters, formatted as MM:SS if you like. I cannot for the life of me find a way to sort these out, but I need to compare these times, hence I want to store them as minutes and seconds, not just numbers.
The first is a counter "TimeSpent",
the second is a counter "TimeStarted" and
the third is a counter "TimeSinceDown".
Then, last, comes the info text, which I've managed to grok with simply applying %{GREEDYDATA:LogInfo}.
I notice that the number of minutes can be far higher than the standard 60 minutes within an hour, so I may be barking up the wrong tree trying to parse these with date patterns such as TIMESTAMP_ISO8601, but then, I don't really know how else to do this.
So, I came this far:
%{NUMBER:LogLineID} %{TIMESTAMP_ISO8601:LogDate}
and was, as mentioned, able (by cutting away the square-bracket parts) to parse the log info text with
%{GREEDYDATA:LogInfo}
to create a field LogInfo.
But that's where I'm stuck. Could someone please help me figure out the rest?
Massive thanks in advance.
PS! I also found %{NUMBER:duration}, but as far as I could tell it only parses timestamps with a dot, not a colon.
A grok regex expression can help you solve the problem.
But first I want to make sure: do you mean that [325:51] [326:49] [359:57] are the three components you want to fetch? If so, it would return results like:
TimeSpent: 325:51
TimeStarted: 326:49
TimeSinceDown: 359:57
If I've understood correctly, you can use one of the following approaches (both are sketched below):
Define your own custom pattern file and add the pattern there.
Use the expression directly in the filter part of your Logstash conf file.
Hope this helps.
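As a hedged illustration of both suggestions (the field names come from the question; the MINSEC pattern name, the ./patterns directory, and the message field are assumptions of mine):

filter {
  grok {
    # Option 1: point patterns_dir at a folder containing a custom pattern file
    # with a line such as:  MINSEC %{NUMBER}:%{NUMBER}
    # and then write %{MINSEC:TimeSpent} etc. in the match below.
    patterns_dir => ["./patterns"]
    # Option 2: capture the counters inline with named groups.
    match => {
      "message" => "%{NUMBER:LogLineID} %{TIMESTAMP_ISO8601:LogDate} \[(?<TimeSpent>%{NUMBER}:%{NUMBER})\] \[(?<TimeStarted>%{NUMBER}:%{NUMBER})\] \[(?<TimeSinceDown>%{NUMBER}:%{NUMBER})\] %{GREEDYDATA:LogInfo}"
    }
  }
}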
Ah, there was a space. Actually, I was misleading myself and everybody in my question: it was not that log line that was causing problems. I just took the first one, not realizing where the problem really was. The line causing problems had a space within the brackets, like [ 42:31]. There are also some parts where there are two spaces, so the way I managed to solve this was to include a %{SPACE} between the \[ and the %{NUMBER}:
%{NUMBER:LogLineID} %{TIMESTAMP_ISO8601:LogDate} \[%{SPACE}%{NUMBER:TimeSpentMinutes}\:%{NUMBER:TimeSpentSeconds}\] \[%{SPACE}%{NUMBER:TimeStartedMinutes}\:%{NUMBER:TimeStartedSeconds}\] \[%{SPACE}%{NUMBER:TimeSinceDownMinutes}\:%{NUMBER:TimeSinceDownSeconds}\] %{GREEDYDATA:LogText}
I still haven't solved the merging of minutes and seconds, but I can handle that at a later stage (one possible approach is sketched at the end of this answer).
Thanks to Lin Don for showing an interest in my problem, and sorry for not replying sooner.
Hope the solution will help others (or even myself) if they're stuck on the same kind of problem.
Note to myself: Read the logs more carefully before grok'ing.. :)
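For the minutes-and-seconds merge left open above, one hedged option is a Logstash ruby filter (field names taken from the pattern above; the *TotalSeconds target fields are made up, and older Logstash versions use event['field'] = ... instead of event.get/event.set):

filter {
  ruby {
    code => "
      # combine each pair of minute/second captures into a single total-seconds
      # field so the three counters can be compared numerically
      event.set('TimeSpentTotalSeconds', event.get('TimeSpentMinutes').to_i * 60 + event.get('TimeSpentSeconds').to_i)
      event.set('TimeStartedTotalSeconds', event.get('TimeStartedMinutes').to_i * 60 + event.get('TimeStartedSeconds').to_i)
      event.set('TimeSinceDownTotalSeconds', event.get('TimeSinceDownMinutes').to_i * 60 + event.get('TimeSinceDownSeconds').to_i)
    "
  }
}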

Talend: time dimension configuration error in tOracleOutput

I still have this problem
Exception in component tOracleOutput_1
java.sql.SQLSyntaxErrorException: ORA-00904: : invalid identifier
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:447)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)
at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:951)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:513)
There is some corrupt code in your job. What you can do first is check whether any code has been generated for this job. If not, try removing or disabling each component in turn, run again, and see whether the error persists.
I have had this as well. What usually helps is restarting Talend or restarting the computer.
If that doesn't help, there is something wrong with the job. Then I check every schema, every connection, every tMap, every item in the job to see if there is an error which Talend doesn't show me.
To check if the code generation system works, you can always click on the Code tab and see if something comes up.
EDIT
An error ORA-00904 comes up. This leads to the suggestion that a column is named wrongly as seen here: https://dba.stackexchange.com/questions/129641/ora-00904-error-while-querying-the-oracle-database-table
To avoid ORA-00904, column names must
begin with a letter.
consist only of alphanumeric and the special characters ($_#); other characters need double quotation marks around them.
be less than or equal to thirty characters.

Using a csv feeder to check a count i.e. convert to Int

I am using Gatling 2.0.0-SNAPSHOT (from a few months ago) and I have a CSV file:
search,min_results
test,1000
testing,50
...
I would, ideally, like to check that the number of results are equal or greater than the min_results column, something like:
.get(...)
.check(
jsonPath("$..results")
.count
.is(>= "${min_results}".toInt)
The last line doesn't work, as it probably isn't valid Scala, but also "${min_results}".toInt tries to convert the literal ${min_results} rather than its value to an Int.
I would settle for just fixing the conversion toInt problem but this might result in a slightly less robust script. However, I could then add a max_results column and use the .in(xx to yy).
Thanks
In this case, toInt definitely won't work, as it would be called before the EL has been resolved.
However, this code snippet would achieve what you want:
.get(...)
.check(
jsonPath("$...results")
.count
.greaterThanOrEqual(session => session("min_results").as[String].toInt))
Here, the toInt works as expected, since the min_results attribute has already been resolved from the session by the time the function is evaluated.
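For context, here is a sketch of how this could sit in a full simulation, mirroring the snippet above (the search_terms.csv file name, the /search endpoint, the base URL and the scenario name are assumptions; Gatling 2 final style):

import io.gatling.core.Predef._
import io.gatling.http.Predef._

class SearchSimulation extends Simulation {

  // each CSV row puts "search" and "min_results" into the session as Strings
  val searchFeeder = csv("search_terms.csv").circular

  val scn = scenario("search with minimum result count")
    .feed(searchFeeder)
    .exec(
      http("search")
        .get("/search?q=${search}")
        .check(
          jsonPath("$..results").count
            // resolve min_results from the session at check time, then convert it to an Int
            .greaterThanOrEqual(session => session("min_results").as[String].toInt)
        )
    )

  setUp(scn.inject(atOnceUsers(1)).protocols(http.baseURL("http://localhost:8080")))
}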

Talend How To Pass Last Modified File Into TFileInputDelimited?

I have searched all over, and read this post.
But it doesn't seem complete and doesn't work.
The situation: I need to get the last modified file from a directory on the local machine. I then need to pass that file into the fileinputdelimited component.
I currently have:
tfilelist --> iterate --> titeratetoflow --> tsamplerow
--> tflowtoiterate --> tfileinputdelimited --> tlogrow (just to make sure it's pulling the right file)
But it doesn't work. I have configured it so that titeratetoflow has a column called
"FileName" with "((String)globalMap.get("CURRENT_FILE"))" as the value,
"FileDirectory" with ((String)globalMap.get("CURRENT_FILEDIRECTORY")) as value, and
"FileAndDirectory" with ((String)globalMap.get("CURRENT_FILEPATH")) as value.
The tsamplerow is limited to "1".
The tflowtoiterate is set so that
"FileNameOnly" is value of "FileName"
"FileDirectoryOnly" is "FileDirectory" and
"FilePathComplete" is "FileAndDirectory"
In the File location field of the tfileinputdelimited, I have "((String)globalMap.get("FilePathComplete"))"
When it runs, I get an error saying it cannot find the file or path. If I cut out the file input component and have it send straight to the tlogrow, it shows a single blank line.
Any ideas?
I'm not sure if you've just slightly misconfigured the job here but it seems to work fine for me.
Here are a few screenshots showing my job design:
The only thing I can think of just by looking at your post is that you might have slightly messed up the key value pair combinations in the tFlowToIterate. I tend to find that the default settings there work fine pretty much all of the time and it makes it a little more obvious what it's doing as well.
EDIT: Actually, it looks like you might be using the wrong values in your tIterateToFlow. The tFileList will throw the values for the file paths etc. into the global map, but it will preface them with the unique component name. If you hit Ctrl+Space in the value window it should prompt you with a list of available values (these are also specified in the "Outline" tab of the studio). It typically makes an implicit conversion to String, but here you will need to convert explicitly, so use .toString() instead of (String).
Another way to get the last modified file is as below:
tFileList(sorted DESC by file modified date) ------> tFixedFlowInput (schema - filename, filenumber) ----->tHashOutput
here in tFixedFlowInput
filename = (String)globalMap.get("tFileList_1_CURRENT_FILEPATH")+"/"+(String)globalMap.get("tFileList_1_CURRENT_FILE")
filenumber = (Integer)globalMap.get("tFileList_1_NB_FILE")
What the above accomplishes is getting a list of all files in the directory with their number/rank, where the last modified file will have filenumber = 1, the next one 2, and so on.
Now, on SubjobOk of the above tFileList, you can have a tHashInput which reads from the above tHashOutput and filters only the row where filenumber == 1, i.e. the last modified file.
tHashInput (link to tHashoutput) ---->tFilterRow(filenumber==1)------>tLogRow
One reason why you are getting null is probably that you used globalMap.get("CURRENT_FILEPATH") instead of globalMap.get("tFileList_1_CURRENT_FILEPATH").
A simple solution to the above problem could be as below:
tFileList(sorted ASC by file modified date)--> tIterateToFlow --> tJava( just to end the subjob).
Then, on subjob OK --> tfileinput (use (String)globalMap.get("tFileList_1_CURRENT_FILE") or (String)globalMap.get("tFileList_1_CURRENT_FILEPATH") as the file name / file path).
Explanation:
Since tFileList iterates over all the files in ASC order, it will always have the latest file name stored in the globalMap on the last iteration. The list is only iterated up to tIterateToFlow, so after this component (String)globalMap.get("tFileList_1_CURRENT_FILE") will always give the last file name from the iterated list, which is the latest file in our case.
Main flow and component view: (screenshots in the original answer)