How to check if the stream of rows has ended - talend

Is there a way for me to know if the stream of rows has ended? That is, if the job is on the last row?
What I'm trying to do is run some processing for every 10 rows. My problem is the last rows: with 115 rows, for example, the last 5 won't be processed, but I need them to be.

There is no built-in functionality in Talend which tells you if you're on the last row. You can work around this using one of the following:
Get the row count beforehand. For instance, if you have a file, you
can use tFileRowCount to count the number of rows, then when you
process your file, you use a variable for your current row
number, and so you can tell if you've reached the last row. If your
data come from a database, you could either issue a query that
returns the total number of rows beforehand, or modify your main
query to return the total number of rows in an additional column and
use that (using ranking functions).
Do some processing after the subjob has ended: if you need special
processing for the last row, you can get the last row processed by
the previous subjob. To do that, save each row as it is written, for
instance by putting a tSetGlobalVar after your target; when your
subjob is done, your variable contains the last written value.
Edit
For your use case, you could first store the result of the API call in memory using tHashOutput, then read it with a tHashInput in order to process it. You will then know how many rows you have retrieved from tHashOutput's global variable tHashOutput_X_NB_LINE.
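As a rough sketch of the "know the total beforehand" idea, the check below fires every 10 rows and once more on the final row. It assumes the total is already known (for example from tFileRowCount or the tHashOutput NB_LINE variable); the class and method names are made up for illustration and are not part of Talend.

```java
public class BatchBoundary {
    // True when the current row closes a full batch or is the very last row.
    // currentRow is 1-based; totalRows is assumed to be known in advance.
    static boolean isBatchBoundary(int currentRow, int totalRows, int batchSize) {
        return currentRow % batchSize == 0 || currentRow == totalRows;
    }

    public static void main(String[] args) {
        int totalRows = 115; // the figure used in the question
        int batches = 0;
        for (int row = 1; row <= totalRows; row++) {
            if (isBatchBoundary(row, totalRows, 10)) {
                batches++; // this is where the "every 10 rows" processing would go
            }
        }
        System.out.println("batches: " + batches); // prints "batches: 12"
    }
}
```

With 115 rows this gives 11 full batches plus one final batch of 5, so the trailing rows are no longer skipped.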

Related

Talend - How to get tFlowToIterate size and tFileInputRegex size?

Good day,
I have a tFileInputRegex component and a tFlowToIterate to read data from a text file. I can see the number of rows being processed, as follows:
That is 3412 rows. How can I get this value in tJava_2?
Currently I am using _NB_LINE but getting null:
System.out.println("total is " + (Integer)globalMap.get("tFileInputRegex_1_NB_LINE"));
System.out.println("total2 is " + (Integer)globalMap.get("tFlowToIterate_2_NB_LINE"));
In your example, tJava_2 executes within the iteration, i.e. once for each row. In that component, you can use globalMap.get("tFlowToIterate_2_CURRENT_ITERATION").toString() to get the number of rows processed so far. Please note that instead of casting it to Integer you need to convert it to a string as shown above in order to output it the way you do.
If you need the total number of rows, you can use globalMap.get("tFileInputRegex_1_NB_LINE").toString() - but it is only available after the end of the loop, which means the component where you access it needs to be connected to tFileInputRegex_1 via OnSubjobOk trigger.
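To illustrate why the NB_LINE lookup prints null inside the iteration, here is a small stand-alone sketch. It uses a plain HashMap as a stand-in for Talend's globalMap, and the describe helper is a hypothetical name, not a Talend API.

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalMapDemo {
    // Returns the stored counter as text, or "n/a" when the key is not set yet.
    static String describe(Map<String, Object> globalMap, String key) {
        Object value = globalMap.get(key);
        return value == null ? "n/a" : value.toString();
    }

    public static void main(String[] args) {
        // Stand-in for Talend's globalMap (assumption: it behaves like a Map<String, Object>).
        Map<String, Object> globalMap = new HashMap<>();

        // During the iteration only the iteration counter has been set.
        globalMap.put("tFlowToIterate_2_CURRENT_ITERATION", 7);
        System.out.println("so far: " + describe(globalMap, "tFlowToIterate_2_CURRENT_ITERATION"));
        System.out.println("total: " + describe(globalMap, "tFileInputRegex_1_NB_LINE"));

        // Once the subjob has finished, the total becomes available.
        globalMap.put("tFileInputRegex_1_NB_LINE", 3412);
        System.out.println("total: " + describe(globalMap, "tFileInputRegex_1_NB_LINE"));
    }
}
```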

Passing Coldfusion query result to Form

I have two Coldfusion templates (getdata.cfm and generate.cfm). The first template getdata.cfm will retrieve from a database with a query, in addition to other tasks. It will retrieve exactly 16 rows of data and each row will have 8 fields. Such as this:
<cfquery datasource="xyz" name="lista">
SELECT n1,n2,n3,n4,n5,n6,n7,n8
FROM atable
WHERE product = "abc"
ORDER BY date DESC LIMIT 16
</cfquery>
The second template will generate some random numbers and compare them against these 16 rows. There is a Refresh button on the second template to regenerate the numbers. This is how I would like it to work.
However, right now the only way for it to work is to have the database <cfquery> in the second template, generate.cfm. That means every time I press Refresh, it will access the database, retrieve the same 16 rows, and generate the random numbers. This is not ideal: because the 16 rows are always the same, it makes no sense to retrieve them every time a new set of random numbers gets generated. It would be best to get them once, in the first template, and somehow pass them to the second template. The 16 sets of numbers will need to be displayed on the screen at all times. The matched and unmatched numbers need to be shown.
How can I pass the whole query result from the first template to the second one without having to pass the 16 records as 16 lists via the form as form fields? Is this even possible? Thanks in advance.
Generate a set of random numbers.
Compare those numbers against a static set of data.
Repeat.
Do you need to do the comparison in the application (CFML) code? Can you generate the set of random numbers and send them to the DB as part of the query in a single request? That way, you get the records from the DB that match your set of numbers and not all 16.
Then every refresh would send the new set of random numbers to the DB, returning only the relevant data.
Alternately, you can use cfquery with the cachedWithin attribute in order to store the results of the query into memory for a specific amount of time while refreshing your random set of numbers.
https://cfdocs.org/cfquery

T-SQL set a rotating Flag (True/False) in records using Stored Proc

I can do this using multiple commands in C# for the app I'm creating, but prefer a stored proc to eliminate issues with latency/locks, etc. (hopefully):
I have a table of 10 extensions (important fields):
SortOrder, Extension, IsUsed
First record will be set to IsUsed = true
When calling the stored proc, I need the IsUsed of the NEXT record in sort order to be set to true, and the current record that is true set to false. When I hit the last record, rotate back to the first record.
Use Case: I need to rotate through a bank of usable numbers. Multiple people use the app, so we cannot reuse a number within the last 4 minutes (a bank of 10 will suffice, but we can extend if necessary). When the user requests a number, they get the next available one. I can build the table however needed, so any and all options to achieve the use case are welcome.
I need to set the flag to true on the 1st record when stored proc is called. All other records should be false.
I have seen this, which is of interest, but doesn't quite answer:
Get "next" row from SQL Server database and flag it in single transaction
If all that you're using this for is to return a number to identify a session, I'd suggest scrapping the whole table idea and letting SQL Server do the work for you.
You can create a SEQUENCE object that will cycle and return the next value for you, without needing to write any code or maintain any tables.
CREATE SEQUENCE dbo.Extension
AS integer
START WITH 5
INCREMENT BY 5
MINVALUE 5
MAXVALUE 50
CYCLE;
This will return the number 5 the first time it's called, up to the number 50 on call number 10, and then start back over. You can adjust the numbers in the code to more or less do whatever you would like, though.
Get the next value like this:
SELECT NEXT VALUE FOR dbo.Extension;
And when/if you need to extend the range:
ALTER SEQUENCE dbo.Extension
MAXVALUE 100;
Play around with the idea on the Rextester demo.
Edit: In light of the comments above and below, I'd still stick with a SEQUENCE, I think.
Every time your code calls the table for an extension, use a query along the lines of this:
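-- NEXT VALUE FOR is not allowed directly in a WHERE clause,
-- so read it into a variable first.
-- Assumes SortOrder holds the same values the sequence produces.
DECLARE @Next integer = NEXT VALUE FOR dbo.Extension;
SELECT
Extension
FROM
ExtTable
WHERE
SortOrder = @Next;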
Functionally, this should do what you're after, again with no code to write or maintain.
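To make the cycling behaviour concrete, here is a small Java model of what a sequence defined with START WITH 5, INCREMENT BY 5, MINVALUE 5, MAXVALUE 50 and CYCLE hands out. It is only an in-memory illustration with made-up names; it is not how SQL Server implements sequences and does not replace the SEQUENCE object.

```java
public class CyclingSequence {
    private final int min;
    private final int max;
    private final int increment;
    private int next;

    CyclingSequence(int start, int increment, int min, int max) {
        this.next = start;
        this.increment = increment;
        this.min = min;
        this.max = max;
    }

    // Returns the current value and advances; wraps back to min once max is passed.
    // synchronized so concurrent callers never receive the same value twice in a row.
    synchronized int nextValue() {
        int value = next;
        next += increment;
        if (next > max) {
            next = min;
        }
        return value;
    }

    public static void main(String[] args) {
        CyclingSequence extension = new CyclingSequence(5, 5, 5, 50);
        for (int call = 1; call <= 11; call++) {
            System.out.println("call " + call + ": " + extension.nextValue());
        }
    }
}
```

The first ten calls yield 5 through 50, and the eleventh starts again at 5.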

tJavaFlex behaviour when changing loop position

I am having some problems in a job, and I suspect it is due to a lack of understanding of tJavaFlex. I am generating 10 rows in this test job, and am running a loop inside a tJavaFlex:
So there are 10 rows coming in, and a loop in the Start and End section. I was expecting that for each row coming in, it would generate 10 identical rows coming out. And that I would see iterations 0,1,2,3....9 for each row.
What I got was this. This looks to me like the entire job is running 10 times, and so I have 100 random values coming through the flow from the tRowGenerator.
If I move the for loop into the Main Code section, I get close to the behaviour I was expecting. I am expecting each row when it comes in to be repeated 10 times, and for 1 row coming in to produce 10 output rows. What I get is this.
But even then my tLogRow is only generating one row for each 10 iterations it seems (look at the tLogRow output after iteration 9 above why not 10 items?). I had thought I would be getting 10 rows for each single row coming in and I would see this in the tLogRow.
What I need to do is take a value from a field coming in, do some reg exp parsing and split into an array, and then for each item in the array create lines in the output flow. i.e. 1 row coming in can be turned into x number of rows coming out using a string.split() method.
Can someone explain the behaviour above, and also advise on the best approach to get one value coming in, do some java manipulation and then generate multiple rows coming out?
Any advice appreciated.
Yes, you are not using it correctly.
The Start code is for initialising variables (executed once, before the first row).
The Main code is where you put the body of your loop (executed once for each row).
The End code is where you store a result in a global variable, for example (executed once, after the last row).
The Main code is executed for each row in a tJavaFlex, so don't put a for loop inside it; you can do it like the example in the screenshot.
Your tJavaFlex behaviour is normal: you have ten rows, so for each row the for loop will be executed 10 times (i<10).
You can use it like this:
You don't need to create your own loop.
By putting the for loop in the Start code, your main code will be triggered both by the loop and by the incoming rows, so it will be executed n*r times.
The behaviour of a subjob that contains a tJavaFlex reveals that the component before the tJavaFlex is included in its Start code and the component after it is included in its End code, but that may depend on many conditions, such as data propagation and trigger type.
start code :
System.out.print("tJavaFlex is starting...");
int i = 0;
Main code :
i++;
System.out.print("tJavaFlex inside Main Code...iteration:"+i);
row8.ITEM_NAME = row7.ITEM_NAME;
row8.ITEM_COUNT = row7.ITEM_COUNT;
End code :
System.out.print("tJavaFlex is ending...");
System.out.print(row7.ITEM_NAME);
Instead of a main flow in row5, try using an iterate flow to connect the tJavaFlex.
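The n*r effect described in the answers can be reproduced with a simplified model of how the three code sections wrap the incoming flow. This is only a sketch of the idea, not the code Talend generates; the class and method names are hypothetical.

```java
public class FlexLoopDemo {
    // Counts how often the Main code runs.
    // loopInStart = true mimics a for loop opened in Start code and closed in End code;
    // loopInStart = false mimics the intended use, with no hand-written loop.
    static int countMainExecutions(int incomingRows, int loopSize, boolean loopInStart) {
        int mainExecutions = 0;
        int outerPasses = loopInStart ? loopSize : 1;
        for (int i = 0; i < outerPasses; i++) {       // loop opened in Start code (if any)
            for (int r = 0; r < incomingRows; r++) {  // implicit per-row loop of the flow
                mainExecutions++;                     // Main code body
            }
        }                                             // loop closed in End code
        return mainExecutions;
    }

    public static void main(String[] args) {
        System.out.println(countMainExecutions(10, 10, true));  // prints 100
        System.out.println(countMainExecutions(10, 10, false)); // prints 10
    }
}
```

With 10 incoming rows and a 10-step loop in the Start code, the Main code runs 100 times, which matches the 100 random values seen in the question.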

Tableau Future and Current References

Tough problem I am working on here.
I have a table of CustomerIDs and CallDates. I want to measure whether there is a 'repeat call' within a certain period of time (up to 30 days).
I plan on creating a parameter called RepeatTime which is a range from 0 - 30 days, so the user can slide a scale to see the number/percentage of total repeats.
In Excel, I have this working. I sort CustomerID in order and then sort CallDate from earliest to latest. I then have formulas like:
=IF(AND(CurrentCustomerID = FutureCustomerID, FutureCallDate - CurrentCallDate <= RepeatTime), 1,0)
CurrentCustomerID = the current row, and the FutureCustomerID = the following row (so it is saying if the customer ID is the same).
FutureCallDate = the following row and the CurrentCallDate = the current row. It is subtracting the future call time from the first call time to measure the time in between.
The goal is to be able to see, dynamically, how many customers called in for a specific reason within maybe 4 hours or 1 day or 5 days, etc. All of the way up until 30 days (this is our actual metric but it is good to see the calls which are repeats within a shorter time frame so we can investigate).
I had a similar problem, see here for detailed version Array calculation in Tableau, maxif routine
In your case, that is basically the same thing as mine, so you could apply that solution, but I find it easier to understand the one I'm about to give, I would do:
1) Create a calculated field called RepeatTime:
DATEDIFF('day',LOOKUP(MAX(CallDates),-1),MAX(CallDates))
This will calculate how many days have passed from the previous call to the current one. You can add an IFNULL to avoid getting Null values for the first entry.
2) Drag CustomersID, CallDates and RepeatTime to the worksheet (can be on the marks tab, don't need to be on rows or column).
3) Configure the table calculation of RepeatTime: Compute using Advanced..., partitioning by CustomersID, addressing CallDates.
Also Sort by Field CallDates, Maximum, Ascending.
This will guarantee the table calculation works properly
4) Now you have a base that you can use for what you need. You can either export it to csv or mdb and connect to it.
The best approach, actually, is to have this RepeatTime field calculated outside Tableau, on your database, so it's already there when you connect to it. But this is a way to use Tableau to do the calculation for you.
Unfortunately, there's no way to do this directly with your database.
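If the repeat flag is computed outside Tableau, the Excel logic from the question (sort by customer and date, then compare each call with the following one) could look roughly like this. The Call class, its field names and the sample figures in the usage note are assumptions made for illustration.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RepeatCalls {
    // Hypothetical row type: one call by one customer on one date.
    static final class Call {
        final String customerId;
        final LocalDate callDate;

        Call(String customerId, LocalDate callDate) {
            this.customerId = customerId;
            this.callDate = callDate;
        }
    }

    // Counts calls that are followed by another call from the same customer
    // within repeatDays days (the role of the RepeatTime parameter).
    static int countRepeats(List<Call> calls, long repeatDays) {
        List<Call> sorted = new ArrayList<>(calls);
        sorted.sort(Comparator.comparing((Call c) -> c.customerId)
                .thenComparing((Call c) -> c.callDate));
        int repeats = 0;
        for (int i = 0; i + 1 < sorted.size(); i++) {
            Call current = sorted.get(i);
            Call future = sorted.get(i + 1);
            long gap = ChronoUnit.DAYS.between(current.callDate, future.callDate);
            if (current.customerId.equals(future.customerId) && gap <= repeatDays) {
                repeats++;
            }
        }
        return repeats;
    }
}
```

For example, a customer calling on 1 January and again on 3 January counts as one repeat for any RepeatTime of 2 days or more.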