Talend - How to loop over an output from tMap

I have a requirement where I am getting min_age and max_age from tMap. I want to loop over these two values, i.e. from min_age to max_age, and insert one record into a Cassandra table per iteration value.
For example, if min_age is 10 and max_age is 15, then I want to insert the records 10, 11, 12, 13, 14 and 15 into the Cassandra table.
I tried to work out a solution but did not succeed. The tLoop component seemed to be the best fit, but unfortunately tMap does not have an Iterate connector.
Can anyone help?

You could just use tFlowToIterate to convert each row of your input flow into an iteration. Then add a loop on each iterated row to iterate further over the min/max values. Finally, you can convert your iterations back to a flow using tIterateToFlow.
Here is a test job:
This is how I set up my tRowGenerator to try and simulate your input flow (you can use more than one row if you want):
And the component settings for your loop:
And the component settings to convert your iteration back to a flow:
And this is the log output:
Starting job testjob at 11:35 19/06/2017.
[statistics] connecting to socket on port 3914
[statistics] connected
10
11
12
13
14
15
[statistics] disconnected
Job testjob ended at 11:35 19/06/2017. [exit code=0]

You could have a subjob like this: [inputflow] --> tFlowToIterate --> tLoop --> tFixedFlowInput --> DBInsert
In tLoop, just use the globalMap variables created by tFlowToIterate to populate your "from" and "to" fields (press Ctrl+Space in these fields to find the variables created by tFlowToIterate).
In tFixedFlowInput, you can construct the data to be inserted: it is the current value of your tLoop, ((Integer)globalMap.get("tLoop_1_CURRENT_VALUE"))
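As a rough illustration of what this subjob does, here is a minimal stand-alone Java sketch. It is not Talend-generated code: the class name AgeRangeLoopSketch, the HashMap standing in for globalMap, and the connection name "row1" in the keys are all assumptions made for the example. It only shows the idea of reading the min/max values, looping inclusively, and reading the loop counter back for each row to insert.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-alone simulation; in a real job Talend provides globalMap and the loop.
class AgeRangeLoopSketch {

    // Mimics a tLoop that runs from "from" to "to" (inclusive) with step 1.
    static List<Integer> expand(Map<String, Object> globalMap) {
        // Assumed keys in the style "<connection>.<column>"; "row1" is hypothetical.
        int from = (Integer) globalMap.get("row1.min_age");
        int to = (Integer) globalMap.get("row1.max_age");

        List<Integer> rows = new ArrayList<>();
        for (int current = from; current <= to; current++) {
            // Simulates tLoop publishing its counter for the current iteration.
            globalMap.put("tLoop_1_CURRENT_VALUE", current);
            // Simulates tFixedFlowInput reading the counter to build one row.
            rows.add((Integer) globalMap.get("tLoop_1_CURRENT_VALUE"));
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("row1.min_age", 10);
        globalMap.put("row1.max_age", 15);
        // One value per record that would be inserted into the table.
        System.out.println(expand(globalMap));
    }
}
```

With min_age 10 and max_age 15 this yields the six values 10 to 15, matching the log output shown in the first answer.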

Related

Talend - How to get tFlowToIterate size and tFileInputRegex size?

Good day,
I have a tFileInputRegex and a tFlowToIterate component reading data from a text file, and I can see the number of rows being processed, as follows:
That is 3412 rows. How can I get this value in tJava_2?
Currently I am using _NB_LINE but getting null:
System.out.println("total is " + (Integer)globalMap.get("tFileInputRegex_1_NB_LINE"));
System.out.println("total2 is " + (Integer)globalMap.get("tFlowToIterate_2_NB_LINE"));
In your example, tJava_2 executes within the iteration, i.e. once for each row. In that component, you can use globalMap.get("tFlowToIterate_2_CURRENT_ITERATION").toString() to get the number of rows processed so far. Please note that instead of casting it to Integer you need to convert it to a string as shown above in order to output it the way you do.
If you need the total number of rows, you can use globalMap.get("tFileInputRegex_1_NB_LINE").toString() - but it is only available after the end of the loop, which means the component where you access it needs to be connected to tFileInputRegex_1 via OnSubjobOk trigger.
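To make the timing issue concrete, here is a small stand-alone Java sketch. It assumes, as described above, that the NB_LINE entry is simply absent from globalMap while the iteration is still running. The class RowCountSketch, the helper describe and the HashMap standing in for globalMap are hypothetical names used only for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone simulation of reading counters from globalMap at different times.
class RowCountSketch {

    // Reads a counter without throwing when the entry has not been written yet.
    static String describe(Map<String, Object> globalMap, String key) {
        Object value = globalMap.get(key);
        return value == null ? "not set yet" : value.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();

        // Inside the iteration: only the running counter exists.
        globalMap.put("tFlowToIterate_2_CURRENT_ITERATION", 1);
        System.out.println("so far: " + describe(globalMap, "tFlowToIterate_2_CURRENT_ITERATION"));
        System.out.println("total:  " + describe(globalMap, "tFileInputRegex_1_NB_LINE"));

        // After the subjob has finished: the total is now available.
        globalMap.put("tFileInputRegex_1_NB_LINE", 3412);
        System.out.println("total:  " + describe(globalMap, "tFileInputRegex_1_NB_LINE"));
    }
}
```

The null check matters because calling toString() directly on a missing entry would throw a NullPointerException.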

SPSS aggregate on 2 variables

I am trying to compute an N_BREAK that has to satisfy a condition. I have a variable that is either 1 or 0; let's call it "HT". Every lopnr also appears in multiple rows, so the first 10 rows can be ID number 1, the next 20 can be ID number 2, and so on.
My question is: how do I create an N_BREAK with lopnr as the break variable that only counts rows with HT=1? I cannot select only the 1s on variable HT beforehand, since I need the 0s in the file.
A few simple ways to do this:
1 - USE FILTER
filter by HT.
aggregate ....
when you get back to original dataset, use:
filter off.
use all.
2 - COPY DATASET
dataset name orig.
dataset copy foragg.
dataset activate foragg.
select if HT.
aggregate....
3 - TEMPORARY SELECTION
temporary.
select if HT.
aggregate....

Execute pentaho job completely for first row before starting job for second row

My root job has two steps:
a Transformation Executor (to copy rows to results) and a Job Executor (executing for each input row).
What I want is for my sub-job to execute completely for the first incoming row before it starts execution for the second row.
Click on the Job executor step and check the box Execute for every input row.
Tell me if it is not what you need.
Unless you specify a value other than 1 under Change Number Of Copies To Start (right-click any transformation entry to see that option), that will always be the expected behaviour.
If the number is greater than 1, the Job Executor will have that number of copies running in parallel, with the input rows distributed among them (for example, with 100 input rows and 10 copies, each copy will process 10 rows no matter what).
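The arithmetic in that example can be sketched as follows. This is a plain Java illustration, not Pentaho code, and it assumes the rows are handed out to the copies in round-robin order; the class name CopyDistributionSketch and the method rowsPerCopy are hypothetical.

```java
// Counts how many rows each parallel copy receives under round-robin distribution.
class CopyDistributionSketch {

    static int[] rowsPerCopy(int rowCount, int copies) {
        int[] counts = new int[copies];
        for (int row = 0; row < rowCount; row++) {
            // Row 0 goes to copy 0, row 1 to copy 1, and so on, wrapping around.
            counts[row % copies]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = rowsPerCopy(100, 10);
        for (int copy = 0; copy < counts.length; copy++) {
            System.out.println("copy " + copy + " processes " + counts[copy] + " rows");
        }
    }
}
```

With a single copy, every row goes through the same Job Executor one after another, which is the sequential behaviour the question asks for.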

tJavaFlex behaviour when changing loop position

I am having some problems in a job, and I suspect it is due to a lack of understanding of tJavaFlex. I am generating 10 rows in this test job, and I have written a loop inside a tJavaFlex:
So there are 10 rows coming in, and a loop in the Start and End section. I was expecting that for each row coming in, it would generate 10 identical rows coming out. And that I would see iterations 0,1,2,3....9 for each row.
What I got was this. This looks to me like the entire job is running 10 times, and so I have 100 random values coming through the flow from the tRowGenerator.
If I move the for loop into the Main Code section, I get close to the behaviour I was expecting. I am expecting each row when it comes in to be repeated 10 times, and for 1 row coming in to produce 10 output rows. What I get is this.
But even then my tLogRow seems to produce only one row for every 10 iterations (look at the tLogRow output after iteration 9 above: why not 10 items?). I had thought I would get 10 rows for each single row coming in and would see this in the tLogRow.
What I need to do is take a value from a field coming in, do some reg exp parsing and split into an array, and then for each item in the array create lines in the output flow. i.e. 1 row coming in can be turned into x number of rows coming out using a string.split() method.
Can someone explain the behaviour above, and also advise on the best approach to get one value coming in, do some java manipulation and then generate multiple rows coming out?
Any advice appreciated.
Yes, you are not using it correctly.
The Start code is for initialising variables (executed once, before the first row).
The Main code is where you put your per-row logic (executed once for each row).
The End code is where you can, for example, store results in a global variable (executed once, after the last row).
The Main code is executed for each row in a tJavaFlex, so don't put a for loop inside it; you can do it like the example in the screenshot.
Your tJavaFlex behaviour is normal: you have ten rows, so for each row the for loop is executed 10 times (i<10).
You can use it like this:
You don't need to create your own loop.
By putting the for loop in the Start code, your Main code will be triggered both by the loop and by the incoming rows, so it will be executed n*r times.
The behaviour of a subjob that contains a tJavaFlex reveals that the component before the tJavaFlex is included within its Start code, and the component after it is included in its End code, but that may depend on many conditions such as data propagation and trigger type.
Start code:
System.out.print("tJavaFlex is starting...");
int i = 0;
Main code:
i++;
System.out.print("tJavaFlex inside Main Code...iteration:"+i);
row8.ITEM_NAME = row7.ITEM_NAME;
row8.ITEM_COUNT = row7.ITEM_COUNT;
End code:
System.out.print("tJavaFlex is ending...");
System.out.print(row7.ITEM_NAME);
Instead of a main flow on row5, try using an Iterate link to connect the tJavaFlex.
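The two behaviours observed in the question can be reproduced with a plain Java sketch. It is not Talend-generated code; it assumes, as one of the answers suggests, that a loop opened in the Start code and closed in the End code ends up wrapping the loop that produces the incoming rows. The class JavaFlexLoopSketch and its method names are hypothetical.

```java
// Simulates where a hand-written for loop ends up relative to the row loop.
class JavaFlexLoopSketch {

    // Loop opened in Start code, closed in End code: it wraps the whole row loop.
    static int loopInStartCode(int n, int rowCount) {
        int mainExecutions = 0;
        for (int i = 0; i < n; i++) {              // opened by the Start code
            for (int r = 0; r < rowCount; r++) {   // upstream rows are produced again
                mainExecutions++;                  // Main code runs once here
            }
        }                                          // closed by the End code
        return mainExecutions;
    }

    // Loop placed inside Main code: n iterations per row, but one output row per input row.
    static int[] loopInMainCode(int n, int rowCount) {
        int iterations = 0;
        int outputRows = 0;
        for (int r = 0; r < rowCount; r++) {       // one pass per incoming row
            for (int i = 0; i < n; i++) {
                iterations++;                      // work done inside the Main code
            }
            outputRows++;                          // the row is passed on once, after Main code
        }
        return new int[] { iterations, outputRows };
    }

    public static void main(String[] args) {
        System.out.println("loop in Start code, Main executions: " + loopInStartCode(10, 10));
        int[] result = loopInMainCode(10, 10);
        System.out.println("loop in Main code, iterations: " + result[0] + ", output rows: " + result[1]);
    }
}
```

Under these assumptions, the first variant explains the 100 random values (n*r executions), and the second explains why tLogRow shows only one row per 10 iterations.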

Compare current value with previous value in DataStage

I have input like below:
empid salary
10 1000
20 2000
30 3000
40 4000
The output I require in a sequential file is like below; that is, prevsal should hold the salary of the previous row:
empid salary prevsal
10 1000 null
20 2000 1000
30 3000 2000
40 4000 3000
I tried using a transformer with a stage variable defined as prevsal=inputlink.salary and an output column defined as inputlink.salary=prevsal. I know that doesn't work logically, and indeed it didn't work. Can anyone suggest a solution for this?
You are on the right track: a transformer with stage variables is the way to go.
Remember that within the transformer the data is processed top down. This means the first (topmost) stage variable is processed first, then the second and so on, and finally the data is put on the output links.
Given your input column: inputlink.salary
Assume two stage variables: svPrevSalary (topmost)
and a second one, svCurrentSalary.
Try the following assignments in the stage variable section:
1. svCurrentSalary (=) svPrevSalary
2. inputlink.salary (=) svCurrentSalary
Use
svPrevSalary
as the derivation of the output link field.
Please note that the (=) only conveys the idea: each line reads "derivation (=) stage variable", so for the first stage variable you specify just svCurrentSalary as its derivation.
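The top-down evaluation order can be mimicked in a few lines of plain Java. This is only a sketch of the logic, not DataStage code; the class PrevSalarySketch is hypothetical, and it assumes both stage variables start out as null so that the first row gets a null prevsal.

```java
import java.util.ArrayList;
import java.util.List;

// Simulates two stage variables evaluated top down for every input row.
class PrevSalarySketch {

    static List<Integer> previousSalaries(int[] salaries) {
        Integer svPrevSalary = null;     // topmost stage variable (assumed initial value: null)
        Integer svCurrentSalary = null;  // second stage variable (assumed initial value: null)
        List<Integer> prevsal = new ArrayList<>();

        for (int salary : salaries) {
            // 1. Evaluated first: still holds the value stored for the previous row.
            svPrevSalary = svCurrentSalary;
            // 2. Evaluated second: remembers this row's salary for the next row.
            svCurrentSalary = salary;
            // The output column is derived from svPrevSalary.
            prevsal.add(svPrevSalary);
        }
        return prevsal;
    }

    public static void main(String[] args) {
        System.out.println(previousSalaries(new int[] { 1000, 2000, 3000, 4000 }));
    }
}
```

Swapping the two assignments would make svPrevSalary equal to the current salary, which is why the order of the stage variables matters.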
I faced the same problem when I started out: I was not getting the expected result.
For this question, we have to note two things.
1. Which job type you are working with: server, sequential or parallel. I am working in the parallel environment.
2. Please remember the order of execution of the links, i.e. the input/output order.
Code: curr_salary -> Prev_Sal,
link.salary -> curr_Salary
Then map Prev_Sal to the prev_salary column on the output link.
Note:
If you are working in the parallel environment, you have to set the execution mode to sequential in every stage.
Go to Stage -> Advanced -> Execution mode -> Sequential.
I think it should work; I have done this in practice.
Thanks.