How can I add the time that a message arrives in a tickerplant from a feedhandler in kdb+?

I'm wondering what the best way is to add the time that data arrives in a kdb+ tickerplant from a feedhandler. As the data arrives in list form, would it be correct to just append .z.z to the end of the list using the join operator?

Are you using the standard tick library from KxSystems (kdb-tick)? I think it automatically adds the time that data reaches the tickerplant for you.
https://github.com/KxSystems/kdb-tick/blob/master/tick.q
On line 38:
if[not -16=type first first x;                        / incoming data has no timespan column yet
 if[d<"d"$a:.z.P;.z.ts[]];                            / date has rolled over: fire the end-of-day timer
 a:"n"$a;                                             / cast the current local timestamp to a timespan
 x:$[0>type first x;a,x;(enlist(count first x)#a),x]  / prepend it: an atom for a single row, a column for a batch
 ];
Here x is the table/list being sent to the tickerplant and a is the arrival time as a timespan; note that it is prepended as the first column rather than appended at the end.
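For comparison outside q, here is a minimal sketch in Java of the same pattern: capture the arrival time once and prepend it to any row that does not already carry one. All names and types here are illustrative, not a kdb+ API:
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: mimics tick.q's upd logic of stamping arrival time.
public class ArrivalStamper {
    // A "row" here is just a list of column values; real kdb+ IPC types differ.
    static List<Object> stamp(List<Object> row) {
        if (!row.isEmpty() && row.get(0) instanceof LocalTime) {
            return row; // already timestamped upstream by the feedhandler
        }
        List<Object> stamped = new ArrayList<>();
        stamped.add(LocalTime.now()); // arrival time, prepended as the first column
        stamped.addAll(row);
        return stamped;
    }

    public static void main(String[] args) {
        System.out.println(stamp(new ArrayList<>(List.of("AAPL", 182.5, 100))));
        // e.g. [12:34:56.789, AAPL, 182.5, 100]
    }
}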

How to check if the stream of rows has ended

Is there a way for me to know if the stream of rows has ended, that is, if the job is on the last row?
What I'm trying to do is perform an action for every 10 rows. My problem is the last rows: with 115 rows, for example, the action won't fire for the last 5, but I need it to.
There is no built-in functionality in Talend which tells you if you're on the last row. You can work around this using one of the following:
Get the row count beforehand. For instance, if you have a file, you can use tFileRowCount to count its rows; then, when you process the file, keep a variable with your current row number, so you can tell when you've reached the last row. If your data come from a database, you could either issue a query that returns the total number of rows beforehand, or modify your main query to return the total row count in an additional column (using ranking functions) and use that. A sketch of this approach follows these options.
Do some processing after the subjob has ended. There may be situations where you need special processing for the last row; you can achieve this by getting the last row processed by the previous subjob, which you have already saved (for instance, by putting a tSetGlobalVar after your target), so that when your subjob is done the variable contains the last written value.
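Here is a minimal standalone sketch of the first option's logic in plain Java; the total and the batch action are stand-ins, not Talend API:
// Compare a running counter to a known total so the final partial batch
// (rows 111-115 in the question's example) still fires.
public class LastRowDemo {
    static void flushBatch(int upTo) {
        System.out.println("every-10-rows action, up to row " + upTo);
    }

    public static void main(String[] args) {
        int totalRows = 115; // known beforehand, e.g. via tFileRowCount or a COUNT(*) query
        for (int currentRow = 1; currentRow <= totalRows; currentRow++) {
            boolean isLastRow = currentRow == totalRows;
            if (currentRow % 10 == 0 || isLastRow) {
                flushBatch(currentRow); // fires at 10, 20, ..., 110 and finally at 115
            }
        }
    }
}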
Edit
For your use case, what you could do is first store the result of the API call in memory using tHashOutput, then read it back with a tHashInput in order to process it; at that point you'll know how many rows you retrieved, via tHashOutput's global variable tHashOutput_X_NB_LINE.
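In a tJava that runs once that subjob is done, reading the count would look roughly like this (assuming the component is named tHashOutput_1; the explicit map exists only so the sketch compiles standalone, since Talend normally injects globalMap into its generated code):
import java.util.HashMap;
import java.util.Map;

public class NbLineDemo {
    public static void main(String[] args) {
        // Stand-in for Talend's injected globalMap.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tHashOutput_1_NB_LINE", 115); // Talend sets this after the subjob

        // The line you would actually write in a tJava component:
        Integer total = (Integer) globalMap.get("tHashOutput_1_NB_LINE");
        System.out.println("rows retrieved: " + total);
    }
}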

How to Handle Rows that Change over Time in Druid

I'm wondering how we could handle data that changes over time in Druid. I realize that Druid is built for streaming data where we wouldn't expect a particular row to have data elements change. However, I'm working on a project where we want to stream transactional data from a logistics management system, but there's a calculation that happens in that system that can change for a particular transaction based on other transactions. What I mean:
- 9th of the month: I post transaction A with a date of today (the 9th), which brings the stock on hand to 0 units.
- 10th of the month: I post transaction B with a date of the 1st of the month, crediting my stock amount by 10 units. At this time (on the 10th of the month), the stock on hand for transaction A recalculates to 10 units. The same would be true for ALL transactions after the 1st of the month.
As I understand it, we would re-extract transaction A, resulting in transaction A2.
The stock on hand dimension is incredibly important to our metrics. Specifically, identifying when stockouts occur (when stock on hand = 0). In the above example, if I have two rows for transaction A, I would be mistakenly identifying a stockout with transaction A1, whereas transaction A2 is the source of truth.
Is there any ability to archive a row and replace it with an updated row, or do we need to add logic to our queries that finds the rows with the freshest timestamp per transaction id?
Thanks
I have two thoughts that I hope help you. The key documentation for this is "Updating Existing Data": http://druid.io/docs/latest/ingestion/update-existing-data.html which gives you three options: Lookup Tables, Reindexing, and Delta Ingestion. The last one, Delta Ingestion, is only for adding new rows to old segments, so it's not very useful for you; let's go over the other two.
Reindexing: You can crunch all the numbers that change in your ETL process, identify the segments that would need to be reloaded, and simply have Druid re-index those segments. That will replace the stock-on-hand value for A in your example whenever you want, whenever you do the re-indexing.
Lookups: If you have stock values for multiple products, you can store the product id in the segment and have that be immutable, but lookup the stock-on-hand value in a lookup. So, you would store:
A, 2018-01-01, product-id: 123
And in your lookup, you'd have:
product-id: 123, stock-on-hand: 0
And later, you'd update the lookup and change that to 10. This would update any rows that reference product-id: 123.
I can't be sure, but you may be mixing up dimensions and metrics while you're doing this, and you may need to read over that terminology in OLAP descriptions like this one: https://en.wikipedia.org/wiki/Online_analytical_processing
Good luck!
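If you do end up with the query-side fallback the asker mentions (keeping every version and picking the freshest row per transaction id), the post-processing is straightforward. A minimal illustrative Java sketch, with made-up row fields:
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: keep the freshest version of each transaction.
public class FreshestRow {
    record Row(String txnId, long ingestTimestamp, int stockOnHand) {}

    static Map<String, Row> dedupe(List<Row> rows) {
        Map<String, Row> freshest = new HashMap<>();
        for (Row r : rows) {
            // keep whichever version has the later ingestion timestamp
            freshest.merge(r.txnId(), r,
                (a, b) -> a.ingestTimestamp() >= b.ingestTimestamp() ? a : b);
        }
        return freshest;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("A", 9, 0),   // A1: posted on the 9th, stock on hand 0
            new Row("A", 10, 10)  // A2: re-extracted on the 10th, stock on hand 10
        );
        System.out.println(dedupe(rows)); // only A2 survives
    }
}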

Finding Difference from Previous Day Volume to Next Day Volume

I have been struggling to find the new incoming volume per day.
I have these categories: Total Ticket, Resolved, Closed and Daily Left.
The calculation is that, every day, resolved and closed tickets are moved out of the queue, and
'Daily Left = Total Ticket - (Pending + Closed)'
Now there is some carry-forward every day, so the total ticket count for the next day includes some volume, i.e. the Daily Left of the previous day.
I am not able to figure out how to show that number. I have tried using PREVIOUS_VALUE but it is not helping. Please suggest. Attaching a screenshot of the data.
For the 3rd, the number of records is 33; however, there is 1 carried forward from the previous day, so the Fresh Volume should be 32. I have used this formula to calculate it, but it is not giving the correct result:
sum([Number of Records]) - (PREVIOUS_VALUE([Daily Left Volume]))
This is taking the leftover of the current day, not the previous day.
I am also using the LOOKUP function, but that does not give the correct output either. The Tableau output after using the LOOKUP function was attached as well.
I am new to this community and don't have enough reputation to comment :P. So I'm writing a few possible solutions here:
1) Make sure the data is sorted by date and is unique at the date level. If it is not, PREVIOUS_VALUE or LOOKUP might not work.
2) Another solution is to take a RUNNING_SUM of every field and then apply the operations; this should give the right answer.
3) If that doesn't work, would it be possible to change the way you import the data?
a) Simply create another field, Date_past = Date - 1, in your raw data.
b) Duplicate your data.
c) Join the two data sets on Date = Date_past.
d) Now you have all of today's data and the previous day's in one row, and you can perform the operations you need (a sketch of the arithmetic follows).
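A standalone Java sketch of the arithmetic behind that join (the values are made up to match the question: 33 records on the 3rd, 1 left over from the 2nd):
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative: line up each day with the previous day's "Daily Left"
// and subtract it from that day's record count.
public class FreshVolume {
    public static void main(String[] args) {
        // date -> {number of records, daily left}, in date order
        Map<String, int[]> days = new LinkedHashMap<>();
        days.put("2nd", new int[]{30, 1});
        days.put("3rd", new int[]{33, 2});

        int prevDailyLeft = 0; // no carry-forward before the first day
        for (Map.Entry<String, int[]> day : days.entrySet()) {
            int fresh = day.getValue()[0] - prevDailyLeft;
            System.out.println(day.getKey() + " fresh volume: " + fresh); // 3rd -> 32
            prevDailyLeft = day.getValue()[1];
        }
    }
}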

tJavaFlex behaviour when changing loop position

Having some problems in a job, and I suspect it is due to a lack of understanding of tJavaFlex. I am generating 10 rows in this test job, and have a loop inside a tJavaFlex:
So there are 10 rows coming in, and a loop in the Start and End sections. I was expecting that for each row coming in, it would generate 10 identical rows coming out, and that I would see iterations 0,1,2,3...9 for each row.
What I got instead looks like the entire job running 10 times, so I have 100 random values coming through the flow from the tRowGenerator.
If I move the for loop into the Main Code section, I get close to the behaviour I was expecting: each row coming in should be repeated 10 times, so that 1 row coming in produces 10 output rows.
But even then my tLogRow seems to generate only one row for each 10 iterations (looking at the tLogRow output after iteration 9 above, why not 10 items?). I had thought I would get 10 rows for each single row coming in, and that I would see this in the tLogRow.
What I need to do is take a value from a field coming in, do some regexp parsing to split it into an array, and then for each item in the array create a line in the output flow; i.e. 1 row coming in can be turned into x rows coming out using a String.split() method.
Can someone explain the behaviour above, and also advise on the best approach to get one value coming in, do some java manipulation and then generate multiple rows coming out?
Any advice appreciated.
Yes, you are not using it correctly.
The Start part is for initialising variables (executed once, before the first row).
The Main part is executed once for each row.
The Final part (executed once, after the last row) is where you would, for example, store results in a global variable.
Since the Main code is executed for every row in a tJavaFlex, don't put a for loop inside it; you can do it like the example in the screenshot.
Your tJavaFlex behaviour is normal: you have ten rows, so for each row the for loop is executed 10 times (i < 10).
You can use it like this:
You don't need to create your own loop.
By putting the for loop in the Start code, your main code will be triggered by both the loop and the incoming rows, and it will be executed n*r times.
The behaviour of a subjob that contains a tJavaFlex reveals that the component before the tJavaFlex is included in its starting code, and the component after it is included in the ending code, though that may depend on many conditions such as data propagation and trigger type.
Start code:
System.out.print("tJavaFlex is starting...");
int i = 0;
Main code:
i++;
System.out.print("tJavaFlex inside Main Code...iteration:"+i);
row8.ITEM_NAME = row7.ITEM_NAME;
row8.ITEM_COUNT = row7.ITEM_COUNT;
End code:
System.out.print("tJavaFlex is ending...");
System.out.print(row7.ITEM_NAME);
Instead of a main flow in row5, try using an iterate flow to connect the tJavaFlex.
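For the actual requirement (one incoming value split into several outgoing rows), tNormalize is the component usually reached for; whichever component hosts it, the core Java logic is just a split. A standalone sketch, with the separator and field purely illustrative:
import java.util.ArrayList;
import java.util.List;

// Illustrative: turn one incoming value into several outgoing rows via split().
public class SplitToRows {
    static List<String> explode(String incomingField) {
        List<String> outRows = new ArrayList<>();
        for (String item : incomingField.split(";")) { // the separator is an assumption
            outRows.add(item.trim());                  // one output row per array element
        }
        return outRows;
    }

    public static void main(String[] args) {
        // one row in, three rows out
        System.out.println(explode("red; green; blue")); // [red, green, blue]
    }
}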

Summation of numbers in a list

I have a long list of precipitation data in a txt file in this form:
50810,200301110422,0.1
50810,200301110422,0.1
50810,200301110422,0.1
50810,200301110423,0.1
50810,200301110423,0.1
50810,200301110423,0.1
50810 is the station number
200301110422/23 is the date and time
0.1 is the amount of precipitation in mm.
One txt file covers one year, and every time there is more than one line with the same date and time, I want to sum the precipitation.
So, in this case I want the outcome to be:
50810,200301110422,0.3
50810,200301110423,0.3
I already know how to read the txt file and write a new one. I just need something to put in the MATLAB code to combine the lines with the same date and time.
Any suggestions on how to do this?
I presume you mean the same station and date-time, so you can sum the precipitation by location?
Iterate through each line and do a split to break each line into its elements, e.g.:
If previous station & date-time = current station & date-time then
add the value.
else
save what you've got to your new text file and proceed to the next.
OR
Copy the structure into a LIST and use some LINQ on it.
OR
Really, you'd want to chuck all this info into a SQL table and then use some aggregate queries on the table. I'd probably go down the SQL route, personally, as it makes it easier to re-extract and manipulate later on.
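The asker wanted MATLAB, and the options above are language-agnostic, so purely as an illustration of the iterate-and-accumulate idea, here it is as a standalone Java sketch (file I/O omitted; the input lines are hard-coded from the question):
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative: sum precipitation for lines sharing station + date-time.
public class SumPrecipitation {
    public static void main(String[] args) {
        String[] lines = {
            "50810,200301110422,0.1", "50810,200301110422,0.1", "50810,200301110422,0.1",
            "50810,200301110423,0.1", "50810,200301110423,0.1", "50810,200301110423,0.1"
        };
        // key = "station,datetime"; LinkedHashMap preserves input order
        Map<String, Double> totals = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            totals.merge(parts[0] + "," + parts[1], Double.parseDouble(parts[2]), Double::sum);
        }
        totals.forEach((key, sum) -> System.out.printf("%s,%.1f%n", key, sum));
        // prints: 50810,200301110422,0.3 and 50810,200301110423,0.3
    }
}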