oozie initial-instance and start time giving error on missing dataset - oozie-coordinator

I am new to Oozie and trying to understand dataset.xml. I have the following dataset and am trying to understand what exactly Oozie is trying to validate here. What is the meaning of initial-instance, and what is the uri-template doing here? (It is not clear to me from the Oozie documentation.)
<dataset name="sample" frequency="${coord:hours(1)}" initial-instance="2022-01-10T00:00Z" timezone="UTC">
<uri-template>${hdfsdir}/filepath/${YEAR}${MONTH}${DAY}${HOUR}</uri-template>
<done-flag>_SUCCESS</done-flag>
</dataset>
Similarly, in the coordinator I have the following for the input and output datasets. What is the significance of current(-5) and of the start parameter here?
<coordinator-app name="test" frequency="${freq}" start="2022-01-10T00:00Z" end="2023-04-11T00:00Z" timezone="UTC" xmlns="uri:oozie:coordinator:0.4" xmlns:sla="uri:oozie:sla:0.2">
<data-in name="raw" dataset="raw_data">
<instance>${coord:current(-5)}</instance>
</data-in>
<data-out name="processed" dataset="raw_out">
<instance>${coord:current(-5)}</instance>
</data-out>
Can someone explain what Oozie is expecting of these datasets?
Thanks, bab

Without looking at the documentation, here's what I can guess.
initial-instance - When is the dataset first available? If you try to provide a timestamp before this in a workflow or coordinator, you can expect an error.
After that, a positive frequency "counts up" instances from that timestamp (here, one instance per hour).
uri-template uses built-in Oozie variables (${YEAR}, ${MONTH}, ${DAY}, ${HOUR}) to describe where each instance's files live in the filesystem; the done-flag means an instance only counts as available once a _SUCCESS file exists in that directory.
coord:current(-5) multiplies 5 by the dataset frequency and returns the 5th previous instance, i.e. the dataset instance 5 hours before the coordinator action's nominal time.
So, for your example, you have dataset name="sample" defined, but your data-in and data-out tags do not reference it (they point at raw_data and raw_out), so I don't think anything will run...
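To make it concrete, here is a hedged sketch of how the pieces fit together (the datasets/input-events wrapper elements and the example times are mine, not from your coordinator):
<datasets>
<dataset name="sample" frequency="${coord:hours(1)}" initial-instance="2022-01-10T00:00Z" timezone="UTC">
<uri-template>${hdfsdir}/filepath/${YEAR}${MONTH}${DAY}${HOUR}</uri-template>
<done-flag>_SUCCESS</done-flag>
</dataset>
</datasets>
<input-events>
<!-- dataset must match the name of a defined dataset -->
<data-in name="raw" dataset="sample">
<instance>${coord:current(-5)}</instance>
</data-in>
</input-events>
With this, for a coordinator action whose nominal time is, say, 2022-01-15T10:00Z, coord:current(-5) on the hourly dataset resolves to 2022-01-15T05:00Z, so Oozie materializes the action but waits until ${hdfsdir}/filepath/2022011505/_SUCCESS exists before running it.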
Here's the docs for coord:current (might say something different from my answer) https://oozie.apache.org/docs/5.2.1/CoordinatorFunctionalSpec.html#a6.6.1._coord:currentint_n_EL_Function_for_Synchronous_Datasets
Section 5.1 seems to mostly answer your question

Related

Movesense, setting the system time

I am trying to set the system time in Movesense. I couldn't find an example of that, but based on the documentation I think that this should do:
asyncPut(WB_RES::LOCAL::TIME(),
AsyncRequestOptions::Empty,
(int64_t)0);
In this case, I'm just trying to reset the epoch to zero but onPutResults gives me
HTTP_CODE_BAD_REQUEST
So what is the right way?
The minimum timestamp seems to be 1483228800000000 us, which corresponds to 1.1.2017. So you can't set the time to the 70's, which zero would do.
This should be documented in the yaml api but currently is not. We will add that to the list of tasks to make sure it's documented in the next release of device-lib.
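So a hedged sketch of a PUT that should be accepted (reusing the asyncPut call from the question; the constant name is mine):
// 1483228800000000 us corresponds to 1.1.2017 00:00 UTC, the earliest accepted value
static const int64_t MIN_TIME_US = 1483228800000000LL;
asyncPut(WB_RES::LOCAL::TIME(),
AsyncRequestOptions::Empty,
MIN_TIME_US); // any value >= MIN_TIME_US should avoid HTTP_CODE_BAD_REQUEST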

Grok filter for a time counter HH:MM

I'm quite new to ELK and Grok-filtering, and I'm struggling with parsing this particular pattern in my grok filter.
I've used the grok debugger to try and solve this, but although I like the tool, I just get confused by the custom patterns.
Eventually, I hope to parse lots of log files sent by filebeat to logstash, then send the parsed logs to elasticsearch and display with kibana or some similar visualization tool.
The lines that I need to parse follow the following pattern:
1310 2017-01-01 16:48:54 [325:51] [326:49] [359:57] Some log info text
The first four digits are a log type identifier, and will be used for grouping. I've called the field "LogLineID".
The date is formatted YYYY-MM-DD HH:MM:SS, and is parsed ok. I called the field "LogDate".
But now the problem begins. Within the square brackets, I have counters, formatted as MM:SS if you like. I cannot for the life of me find a way to sort these out, but I need to compare these times, hence I want to store them as minutes and seconds, not just numbers.
The first is a counter "TimeSpent",
the second is a counter "TimeStarted" and
the third is a counter "TimeSinceDown".
Then, last, comes the info text, which I've managed to grok with simply applying %{GREEDYDATA:LogInfo}.
I notice that the number of minutes can be far higher than the standard 60 minutes within an hour, so I may be barking up the wrong tree here trying to parse it with date patterns such as TIMESTAMP_ISO8601, but then I don't really know how else to do this.
So, I came this far:
%{NUMBER:LogLineID} %{TIMESTAMP_ISO8601:LogDate}
and was, as mentioned, able (by cutting away the square-bracket parts) to parse the log info text with
%{GREEDYDATA:LogInfo}
to create a field LogInfo.
But that's where I'm stuck. Could someone please help me figure out the rest?
Massive thanks in advance.
PS! I also found %{NUMBER:duration}, but as far as I could tell it only parses numbers with a dot, not a colon.
A grok regex expression can help you solve the problem.
But first I want to make sure I understand: do you mean that [325:51] [326:49] [359:57] are the three components you want to fetch? And that it should return a result like:
TimeSpent: 325:51
TimeStarted: 326:49
TimeSinceDown: 359:57
If I have that right, you can go one of the following ways (see the sketch below):
define your own custom pattern file and add the pattern there, or
just use the expression directly in the filter part of your logstash conf file.
Hope it helps.
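For example, here is a hedged sketch along those lines (the MINSEC pattern name is mine, and pattern_definitions needs a reasonably recent logstash-filter-grok; otherwise put MINSEC in a custom pattern file and point patterns_dir at it):
filter {
  grok {
    # custom sub-pattern for counters like 325:51 (minutes may exceed 59)
    pattern_definitions => { "MINSEC" => "%{NUMBER}:%{NUMBER}" }
    match => {
      "message" => "%{NUMBER:LogLineID} %{TIMESTAMP_ISO8601:LogDate} \[%{MINSEC:TimeSpent}\] \[%{MINSEC:TimeStarted}\] \[%{MINSEC:TimeSinceDown}\] %{GREEDYDATA:LogInfo}"
    }
  }
}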
Ah, there was a space.. Actually, I was misleading myself and everybody in my question, as it was not actually that log line that was causing problems. I just took the first one, not realizing where the problem really was, but the one causing problems had a space within the brackets, like this: [ 42:31]. There are also some parts with two spaces, so the way I managed to solve this was to include a %{SPACE} between the \[ and the %{NUMBER}:
%{NUMBER:LogLineID} %{TIMESTAMP_ISO8601:LogDate} \[%{SPACE}%{NUMBER:TimeSpentMinutes}\:%{NUMBER:TimeSpentSeconds}\] \[%{SPACE}%{NUMBER:TimeStartedMinutes}\:%{NUMBER:TimeStartedSeconds}\] \[%{SPACE}%{NUMBER:TimeSinceDownMinutes}\:%{NUMBER:TimeSinceDownSeconds}\] %{GREEDYDATA:LogText}
I still haven't solved the merging of minutes and seconds, but I can also handle this at a later stage.
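(For later: one hedged way to do that merge is a logstash ruby filter; this assumes the newer event API, and the target field names are just suggestions.)
filter {
  ruby {
    # e.g. TimeSpentMinutes=42, TimeSpentSeconds=31 becomes TimeSpentTotalSeconds=2551
    code => "
      event.set('TimeSpentTotalSeconds', event.get('TimeSpentMinutes').to_i * 60 + event.get('TimeSpentSeconds').to_i)
      event.set('TimeStartedTotalSeconds', event.get('TimeStartedMinutes').to_i * 60 + event.get('TimeStartedSeconds').to_i)
      event.set('TimeSinceDownTotalSeconds', event.get('TimeSinceDownMinutes').to_i * 60 + event.get('TimeSinceDownSeconds').to_i)
    "
  }
}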
Thanks to Lin Don for showing an interest in my problem, and sorry for not replying sooner.
Hope the solution will help others (or even myself) if they're stuck on the same kind of problem.
Note to myself: Read the logs more carefully before grok'ing.. :)

connecting nulls in logi analytics charts

I am new to Logi Analytics. I am using Series.Line to plot a graph with values retrieved from MongoDB. The values are different for each date. Sometimes it happens that for the very first date there is no value in the MongoDB collection itself. When we create a graph, it ignores the first point, which actually has no value, and starts from the second point, which has a value.
In the Series.Line there is an attribute "Connect Nulls", and we set that to "true". However it does not make any impact. Can anyone please help me solve this problem?
I am adding my code snippet here:
<ChartCanvas
AutoQuicktip="True"
BorderColor="#cfcfcf"
BorderColorTransparency="1"
ChartBorderThickness="1"
ID="lineChart"
NoDataCaption="#Request.noDataDisplay~"
SpacingLeft="50"
SpacingRight="50"
>
<Series ChartXDataColumn="dateCalculatedColumn"
ChartXDataColumnType="DateTime" ChartYDataColumn="count"
ConnectNulls="True"
ID="engagementSeriesLine" Type="Line"
>
<DataLayer
ConnectionID="connMetrics"
ID="dlLineGraph"
MongoRunCommand="{
//My query here
}"
Type="MongoRunCommand"
>
<CalculatedColumn
Formula="(new Date(#Data.day~).getMonth()+1)+"/"+new Date(#Data.day~).getDate()+"/"+new Date(#Data.day~).getFullYear()"
ID="dateCalculatedColumn"
/>
<CrosstabFilter
CrosstabColumn="network"
CrosstabLabelColumn="dateCalculatedColumn"
CrosstabValueColumn="count"
CrosstabValueFunction="sum"
ID="rdCrosstabValue"
/>
</DataLayer>
</Series>
<ChartXAxis
AxisType="DateTimeLinear"
ScaleLowerBound="#Request.stDate~"
ScaleUpperBound="#Request.edDate~"
>
<AxisLabelStyle
Format="MM/dd"
/>
</ChartXAxis>
</ChartCanvas>
Request you to please help.
Unfortunately, if it is a bug in Logi, especially one they've fixed in a newer version, your only real solution is to update the version you are running. Luckily that's pretty easy to do; it just requires some regression testing to make sure your existing application isn't going to have any issues.

How to get triggers from .nxe files with FieldTrip Toolbox

I'm trying to analyse TMS-EEG data from Nexstim with the FieldTrip Toolbox. I want to make a trial matrix from my raw .nxe data. But how do I know which triggers to assign to cfg.trialdef.eventvalue when cfg is the output variable? I'm trying to mimic the same kind of code as in the tutorial: http://www.fieldtriptoolbox.org/tutorial/tms-eeg
I came up with a solution to the problem. With the command event = ft_read_event('filename.nxe') I got a struct with the fields type, value, sample, duration and offset, and this is all I need.
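For anyone landing here, a hedged sketch of how that struct can feed into a trial definition (the cfg field names follow the FieldTrip TMS-EEG tutorial; the event type, value, and pre/post windows are placeholders to replace with what your file actually contains):
% list the trigger types and values present in the .nxe file
event = ft_read_event('filename.nxe');
disp(unique({event.type}));     % event types
disp(unique([event.value]));    % trigger codes usable for cfg.trialdef.eventvalue
% define trials around the chosen trigger
cfg = [];
cfg.dataset             = 'filename.nxe';
cfg.trialdef.eventtype  = 'Trigger';   % placeholder: one of the types printed above
cfg.trialdef.eventvalue = 1;           % placeholder: one of the values printed above
cfg.trialdef.prestim    = 0.5;         % seconds before the trigger
cfg.trialdef.poststim   = 1.5;         % seconds after the trigger
cfg = ft_definetrial(cfg);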

Job executed with no data in Spark Streaming

My code:
// messages is JavaPairDStream<K, V>
Fun01(messages)
Fun02(messages)
Fun03(messages)
Fun01, Fun02, and Fun03 all have transformations and output operations (foreachRDD).
Fun01 and Fun03 both executed as expected, which proves that "messages" is not null or empty.
On the Spark application UI, I found Fun02's output stage under "Spark stages", which proves it was executed.
The first line of Fun02 is a map function; I added logging in it. I also added logging for every step in Fun02, and they all show that no data comes through.
Does somebody know the possible reasons? Thanks very much.
#maasg Fun02's logic is:
msg_02 = messages.mapToPair(...)
msg_03 = msg_02.reduceByKeyAndWindow(...)
msg_04 = msg_03.mapValues(...)
msg_05 = msg_04.reduceByKeyAndWindow(...)
msg_06 = msg_05.filter(...)
msg_07 = msg_06.filter(...)
msg_07.cache()
msg_07.foreachRDD(...)
I have tested on Spark 1.1 and Spark 1.2, which are the versions supported by my company's Spark cluster.
It seems that this is a bug in Spark 1.1 and Spark 1.2, fixed in Spark 1.3.
I posted my test results here: http://secfree.github.io/blog/2015/05/08/spark-streaming-reducebykeyandwindow-data-lost.html .
When two reduceByKeyAndWindow operations are used back to back, "data lost" may appear depending on the window and slide values.
I cannot find the bug in Spark's issue list, so I cannot get the patch.
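For reference, a hedged sketch (Spark 1.x Java API) of the shape that triggers it for me: two reduceByKeyAndWindow calls chained on the same pair DStream. The class name, durations, and the sum function below are illustrative, not my production code, and the intermediate mapValues/filter steps are omitted.
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaPairDStream;

public class TwoWindowsSketch {
    // chains two windowed reduces, mirroring the msg_03 and msg_05 steps above
    public static JavaPairDStream<String, Integer> chain(JavaPairDStream<String, Integer> messages) {
        Function2<Integer, Integer, Integer> sum = new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer a, Integer b) {
                return a + b;
            }
        };
        // first windowed reduce: 60 s window, sliding every 10 s
        JavaPairDStream<String, Integer> first =
            messages.reduceByKeyAndWindow(sum, new Duration(60000), new Duration(10000));
        // second windowed reduce on the result: 300 s window, sliding every 60 s
        return first.reduceByKeyAndWindow(sum, new Duration(300000), new Duration(60000));
    }
}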