I want to execute some subjob if the previously processed number of rows are greater than N. To do this, i'm using the following configuration:
tFixedFlowInput have some rows.
tAggregateRow uses the count function and outputs one row with the number.
tSetGlobalVar then stores this value into a global var that I can check in the Run If connector (In this case, (Integer)globalMap.get("tSetGlobalVar_1") > 3 ).
tMsgBox then shows if the condition is true.
What I would like is to do the same, but in a more elegant way, using the minimum components required. I would like to connect tAggregateRow with the Run If connector directly (or even tFixedFlow) with tMsgBox, but I haven't found a way to refer to the number of rows previously processed without using the output row2.count variable.
How could I do something like this?
What should I put in the If condition to refer to the tAggregateRow operation result without connecting it to another meaningless component like exposed at the beginning?
for any talend component look under outline tab under the left side workspace pane at the bottom. this lists down the properties available via global variables for that component. Some properties like count of records inserted by output components are only available once the component is executed completely (After).
for your case you can try directly using ((Integer)globalMap.get("tFixedFlowInput_1_NB_LINE")) which gives number of lines (after) given by tFixedFlowInput.
Related
I have a transformation in Pentaho Data Integration that stores data in several tables from a database.
But this database has constraints, meaning I can't put things in a table before the related data is put in another table.
Sometimes it works, sometimes it doesn't, depends on concurrency luck.
So I need to assure that Table Output 1 gets entirely run before Table Output 2 starts.
How can I do this?
You can use a step named "Block this step until steps finish".
You place it before the step that needs to wait. And inside the block you define which steps are to be waited for.
Below, suppose, Table Output 2 contains a foreign key to a field in table 1, but the rows you're going to reference in table 2 still don't exist in table 1. This means Table Output 2 needs to wait until Table Output finishes.
Place the "block" connected before Table Output 2:
Then enter the properties of the "block" step. Inside, add Table Output in the list (and any other steps you want to wait for):
For that, you can use job instead of transformation. because in transformation all the steps run parallelly. so use a job in that add the first transformation in which table output1 will be executed first and in second transformation table output2 will be performed
I am designing a ADF pipeline that copies rows from a SQL table to a folder in Azure Data Lake. After that the rows in SQL should be deleted. But for this delete action takes place I want to know if the number rows that are copied are the same as the number of rows that were I selected in the beginning of the pipeline.
Is there a way to get the rowcount fo the copy action and use this in another action (like a lookup)
Edit follow up question:
Bo Xiao's answer is OK. BUt then I have a follow up question. After the copy-activity I put an If Condition with the following expression:
#activity('LookUpActivity').output.firstRow.RecordsRead == #{activity('copyActivity').output.rowsCopied
But then I get the error: #activity('LookUpActivity').output.firstRow.RecordsRead == #{activity('copyActivity').output.rowsCopied
Isn't it possible to compare output parameters of two activities to see if this is True?
extra edit: I just found an error in this piece of code. I forgot a "{" at the begin of the code. But then the code is still wrong. To compare two outputs from earlier activities the code must be:
#equals(activity('LookUpActivity').output.firstRow.RecordsRead,activity('copyActivity').output.rowsCopied)
You can find copied rows in activity output as pictured below.
And you can use the output value like this:
#activity('copyActivity').output.rowsCopied
I have one item extracted from a JMX interrogation and I want to compare it against another one extracted from a database. These values shouldn't be equal but first one should contain/include the second - if not then I need an alert.
Obviously I tried with str() function but this one doesn't accept an {item.last} as parameter for the V string:
{node1:ITEM1.str({node2:ITEM2.last()})}=0
Any other idea?
That is currently not possible with the built-in functionality. While it might be possible in Zabbix 3.4, that is not fully clear yet. Please also note that only numeric values can be compared in the Zabbix trigger expressions.
I am pretty new to Pentaho so my query might sound very novice.
I have written a transformation in which am using CSV file input step and table input step.
Steps I followed:
Initially, I created a parameter in transformation properties. The
parameter birthdate doesn't have any default value set.
I have used this parameter in postgresql query in table input step
in the following manner:
select * from person where EXTRACT(YEAR FROM birthdate) > ${birthdate};
I am reading the CSV file using CSV file input step. How do I assign the birthdate value which is present in my CSV file to the parameter which I created in the transformation?
(OR)
Could you guide me the process of assigning the CSV field value directly to the SQL query used in the table input step without the use of a parameter?
TLDR;
I recommend using a "database join" step like in my third suggestion below.
See the last image for reference
First idea - Using Table Input as originally asked
Well, you don't need any parameter for that, unless you are going to provide the value for that parameter when asking the transformation to run. If you need to read data from a CSV you can do that with this approach.
First, read your CSV and make sure your rows are ok.
After that, use a select values to keep only the columns to be used as parameters.
In the table input, use a placeholder (?) to determine where to place the data and ask it to run for each row that it receives from the source step.
Just keep in ming that the order of columns received by the table input (the columns out of the select values) is the same order that it will be used for the placeholders (?). This should not be a problem with your question that uses only one placeholder, but keep that in mind as you ramp up using Pentaho.
Second idea, using a Database Lookup
This is another approach where you can't personalize the query made to the database and may experience a better performance because you can set a "Enable cache" flag and if you don't need to use a function on your where clause this is really recommended.
Third idea, using a Database Join
That is my recommended approach if you need a function on your where clause. It looks a lot like the Table Input approach but you can skip the select values step and select what columns to use, repeat the same column a bunch of times and enable a "outer join" flag that returns the rows without result from the query
ProTip: If you feel the transformation running too slow, try to use multiple copies from the step (documentation here) and obviously make sure the table have the appropriate indexes in place.
Yes there's a way of assigning directly without the use of parameter. Do as follows.
Use Block this step until steps finish to halt the table input step till csv input step completes.
Following is how you configure each step.
Note:
Postgres query should be select * from person where EXTRACT(YEAR
FROM birthdate) > ?::integer
Check Execute for each row and Replace variables in in Table input step.
Select only the birthday column in CSV input step.
I am currently working on a job something like this
The design is to,extract some data from customers,(say first name,last name) to one excel file,other data (say address) is to goto other excel file,i added a identity to tMap Numeric("s1",1,1) but it is starting from 1,3,5,7,9,11,13.... and on other excel it getting 2,4,6,8,10,12,...
but i need both excel to have same identity 1,2,3,4,5,6,....N
so that i can map the records
so can somebody guide me on this?
edit:
The autoincrement returns 1,2,3,4,5,6,... this is fine when thers only one tMap component in the job,but not similar when 2 tMaps are used ?
This is because the numeric sequence is static. Since you have only one sequence called "s1", it will be incremented twice at every iteration (one time for each tMap it's invoked in).
Just use some unique labels (ie. "s1" and "s2") to force the use of two independent sequences, thus the solution of your problem.