Put logic before the output component in Talend

I have a Talend job that has the following components:
- 1 database input component that fetches data from the database
- 1 output component that writes the fetched data into a flat file.
Now I have a scenario wherein, out of two fields in the fetched data, only one should be written to the output, based on some if/else logic.
Can anyone assist me with this? Is it done using tMap?

Your if/else logic can go in the tMap, as a ternary operator in the output of your tMap:
condition?resultifOK:resultifKO
For example, say you want to insert into your file the value of ColumnA if it equals "MY CONDITION", and otherwise insert the value of ColumnB.
You'll have, directly in the output row of tMap:
"MY CONDITION".equals(row1.ColumnA)?row1.ColumnA:row1.ColumnB
You can't directly use an if/else in tMap. If you have several conditions, consider using a Routine instead of nesting multiple ternary operators.
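For illustration, a Routine is just a Java class with static methods (under Code > Routines) that you can call from any tMap expression. A minimal sketch, assuming a routine named RowUtils and the condition above (all names are made up for the example):
package routines;

public class RowUtils {
    // Returns ColumnA when it matches the condition, otherwise ColumnB.
    public static String pickColumn(String columnA, String columnB) {
        if ("MY CONDITION".equals(columnA)) {
            return columnA;
        }
        return columnB;
    }
}
The tMap output expression then becomes: RowUtils.pickColumn(row1.ColumnA, row1.ColumnB)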

Throw error on invalid lookup in Talend job that populates an output table

I have a tMap component in a Talend job. The objective is to get a row from an input table, perform a column lookup in another input table, and write an output table populating one of the columns with the retrieved value (see screenshot below).
If the lookup is unsuccessful, I generate a row in an "invalid rows" table. This works fine; however, it is not the solution I'm looking for.
Instead, I want to stop the entire process and throw an error on the first unsuccessful lookup. Is this possible in Talend? The error that is thrown should contain the value that failed the lookup.
UPDATE
A tFileOutputDelimited component would do the job.
So the flow would be: tMap -> invalid_row -> tFileOutputDelimited -> tDie
Note: you have to go to the advanced settings of the tFileOutputDelimited component and tick the "split output into multiple files" option, setting 1 rather than 1000.
For more flexibility, simply use two tMaps rather than one tMap.
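If you also want the failing value inside the error, one possible sketch (component and names here are assumptions, not from the original answer): store the key on the invalid_row flow with a tSetGlobalVar under "FAILED_LOOKUP", then reference it in the tDie's Die message expression:
"Lookup failed for value: " + (String) globalMap.get("FAILED_LOOKUP")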

How to Select or Filter out values with endsWith() in Mapping data flow

I would like to filter out all values that do not end with ":Q_TT".
I have tried with the following activities
My bronze data has a column named "pnt_name". One of the rows in the table ends with ":Q_TT", so I would expect the Exists activity to pass that row through.
Custom expression in Exists1
endsWith(':Q_TT', pnt_name)
In the future I would like the SourceData dataset to hold the filter values.
Thanks very much
You should use a Filter activity instead of an Exists activity for your case.
The pipeline (where you can see the 2 test cases of A and B:Q_TT):
Here is the preview of the Filter activity; you should use the expression endsWith(pnt_name, ":Q_TT"). You can see A is removed and B:Q_TT is kept.
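Note the argument order: the first argument is the string to test and the second is the suffix, so the original attempt endsWith(':Q_TT', pnt_name) checks the wrong thing. The working filter condition is:
endsWith(pnt_name, ':Q_TT')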

Pivot data in Talend

I have some data which I need to pivot in Talend. This is a sample:
brandname,metric,value
A,xyz,2
B,xyz,2
A,abc,3
C,def,1
C,ghi,6
A,ghi,1
Now I need this data to be pivoted on the metric column like this:
brandname,abc,def,ghi,xyz
A,3,null,1,2
B,null,null,null,2
C,null,1,6,null
Currently I am using tPivotToColumnsDelimited to pivot the data to a file and then reading back from that file. However, having to store data in an external file and read it back is messy and adds unnecessary overhead.
Is there a way to do this with Talend without writing to an external file? I tried to use tDenormalize, but as far as I understand, it returns the rows as 1 column, which is not what I need. I also looked for a 3rd-party component in TalendExchange but couldn't find anything useful.
Thank you for your help.
Assuming that your metrics are fixed, you can use their names as columns of the output. The solution has two parts: first, a tMap that transposes the value of each input row (in) into the corresponding column of the output row (out), and second, a tAggregateRow that groups the map's output rows by brandname.
For the tMap you'd have to fill the columns conditionally like this, for example for the output column named "abc":
out.abc = "abc".equals(in.metric)?in.value:null
In the tAggregateRow you'd have to group by out.brandname and aggregate each column with the sum function, ignoring nulls.
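Spelled out for the sample metrics, the tMap output expressions repeat the same pattern per column (make sure the value columns use a nullable type such as Integer in the output schema so that null is accepted):
out.abc = "abc".equals(in.metric)?in.value:null
out.def = "def".equals(in.metric)?in.value:null
out.ghi = "ghi".equals(in.metric)?in.value:null
out.xyz = "xyz".equals(in.metric)?in.value:null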

How to assign csv field value to SQL query written inside table input step in Pentaho Spoon

I am pretty new to Pentaho so my query might sound very novice.
I have written a transformation in which I am using a CSV file input step and a table input step.
Steps I followed:
- Initially, I created a parameter in the transformation properties. The parameter birthdate doesn't have any default value set.
- I used this parameter in the PostgreSQL query in the table input step in the following manner:
select * from person where EXTRACT(YEAR FROM birthdate) > ${birthdate};
I am reading the CSV file using the CSV file input step. How do I assign the birthdate value present in my CSV file to the parameter which I created in the transformation?
(OR)
Could you guide me through the process of assigning the CSV field value directly to the SQL query used in the table input step, without the use of a parameter?
TLDR;
I recommend using a "database join" step like in my third suggestion below.
See the last image for reference
First idea - Using Table Input as originally asked
Well, you don't need any parameter for that, unless you are going to provide the value for that parameter when asking the transformation to run. If you need to read data from a CSV, you can do that with this approach.
First, read your CSV and make sure your rows are ok.
After that, use a Select values step to keep only the columns to be used as parameters.
In the table input, use a placeholder (?) to determine where to place the data and ask it to run for each row that it receives from the source step.
Just keep in mind that the order of columns received by the table input (the columns out of the Select values step) is the same order in which they will be used for the placeholders (?). This should not be a problem in your case, which uses only one placeholder, but keep it in mind as you ramp up using Pentaho.
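With this approach, the Table input query uses a positional placeholder instead of the named parameter, something like:
select * from person where EXTRACT(YEAR FROM birthdate) > ?
with "Insert data from step" pointing at the Select values step and "Execute for each row" checked.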
Second idea, using a Database Lookup
This is another approach where you can't customize the query sent to the database, but you may see better performance because you can set the "Enable cache" flag. If you don't need to use a function in your where clause, this is really the recommended option.
Third idea, using a Database Join
That is my recommended approach if you need a function in your where clause. It looks a lot like the Table Input approach, but you can skip the Select values step and choose which columns to use, repeat the same column as many times as needed, and enable an "outer join" flag that returns rows even when the query finds no result.
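For reference, a sketch of the Database Join step's SQL under the same assumed names: the query again uses a placeholder, with birthdate listed in the step's parameter table:
select * from person where EXTRACT(YEAR FROM birthdate) > ?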
ProTip: If the transformation runs too slowly, try using multiple copies of the step (documentation here) and, obviously, make sure the table has the appropriate indexes in place.
Yes, there's a way to assign it directly without the use of a parameter. Do as follows.
Use a "Block this step until steps finish" step to halt the table input step until the CSV input step completes.
Following is how you configure each step.
Note:
The Postgres query should be select * from person where EXTRACT(YEAR FROM birthdate) > ?::integer
Check "Execute for each row" and "Replace variables in script" in the Table input step.
Select only the birthdate column in the CSV input step.

Transpose data using Talend

I have this kind of data:
I need to transpose this data into something like this using Talend:
Help would be much appreciated.
dbh's suggestion should work indeed, but I did not try it.
However, I have another solution which doesn't require changing the input format and is not too complicated to implement. Indeed, the job has only 2 transformation components (tDenormalize and tMap).
The job looks like the following:
Explanation :
Your input is read from a CSV file (could be a database or any other kind of input)
The tDenormalize component will denormalize your value column (column 2) based on the value of the id column (column 1), separating fields with a specific delimiter (";" in my case), resulting in the 2 rows shown.
tMap: split the aggregated column into multiple columns by using Java's String.split() method and spreading the resulting array over multiple columns. The tMap should look like this:
Since Talend doesn't accept storing Array objects, make sure to store the split String as an Object. Then cast that Object back into an array on the right side of the map.
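A sketch of those tMap expressions, assuming the denormalized column arrives as row2.value with ";" as the delimiter and the intermediate variable is named Var.parts (all names here are assumptions):
Var.parts = row2.value.split(";")
((String[]) Var.parts)[0]
((String[]) Var.parts)[1]
((String[]) Var.parts)[2]
The first line goes in the Var table (declared with type Object); the indexed casts are the expressions of the individual output columns.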
That approach should give you the expected result.
IMPORTANT:
tDenormalize might shuffle the rows, meaning that for bigger inputs you might encounter unsorted output. Make sure to sort it if needed, or use tDenormalizeSortedRow instead.
tDenormalize is similar to an aggregation component, meaning it scans the whole input before processing, which can lead to performance issues with particularly big inputs (tens of millions of records).
Your input is probably wrong (you have 5 entries with id 1, and 6 entries with id 2). Since 6 columns are expected, you should always have 6 lines per id. If not, then you should implement dbh's solution, and you probably HAVE TO add a column with a key.
You can use Talend's tPivotToColumnsDelimited component to achieve this. You will most likely need an additional column in your data to represent the field name.
Like "Identifier, field name, value "
Then you can use this component to pivot the data and write a file as output. If you need to process the data further, read the resulting file with tFileInoutDelimited .
See docs and an example at
https://help.talend.com/display/TalendOpenStudioComponentsReferenceGuide521EN/13.43+tPivotToColumnsDelimited