tHiveInput in a loop/iterate? - talend

I currently have a file with a sql query on each row.
I'd like to read each row of this file with a tHDFSInput, and execute the query with a tHiveInput.
How can I do that?
I have something like this:
But it only goes through the tHiveInput once.

You should consider using a tFlowToIterate component between the tHDFSInput and the subjob containing your tHiveInput.
In the example below, I generate a flow containing a sequence of numbers, then for each number I run a request against my database (I confess it is not a tHiveInput, but I guess the logic is the same).
Here is the configuration of the request I use in my tDBRow_1:
Hope it helps.
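As a rough picture of what the iterate link does, here is a minimal plain-Java sketch, not Talend-generated code. It assumes the incoming flow is named row1 and has a single column named query (both names are hypothetical), and it uses an ordinary HashMap as a stand-in for Talend's globalMap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FlowToIterateSketch {

    // Stand-in for the subjob started by the iterate link. A real job would
    // hand the query to tHiveInput; here we only record what would be run.
    static void runSubjob(Map<String, Object> globalMap, List<String> executed) {
        // "row1.query" is an assumed key: flow name + "." + column name.
        String query = (String) globalMap.get("row1.query");
        executed.add(query);
    }

    // Stand-in for tFlowToIterate: publish the current row, then trigger
    // the subjob once for that row before moving on to the next one.
    static List<String> iterate(List<String> rows) {
        Map<String, Object> globalMap = new HashMap<>();
        List<String> executed = new ArrayList<>();
        for (String row : rows) {
            globalMap.put("row1.query", row);
            runSubjob(globalMap, executed);
        }
        return executed;
    }

    public static void main(String[] args) {
        List<String> queries = Arrays.asList("select 1", "select 2");
        System.out.println(iterate(queries));
    }
}
```

Under the same naming assumptions, the query field of the tHiveInput would hold an expression such as (String) globalMap.get("row1.query") instead of a fixed SQL string.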

Informatica SQ returns different result

I am trying to pull data from DB2 via Informatica. I have a Source Qualifier (SQ) query that pulls a few fields by joining 4 different tables.
When I run the query directly in the database, it returns the expected result; however, when I run it in Informatica with the debugger, I see something else.
Please note that the data in all columns match perfectly, except for one single column.
The weird thing is that this is a calculated field based on a CASE statement:
CASE WHEN Column1='3' THEN 'N' ELSE 'Y' END.
Since this is a calculated field one character long, I connected a port of 1 character length from one of the sources to the SQ.
This returns 'Y' when executed in the database, but when I copy and paste the same query into the SQ in Informatica and run it, I get 'E'. That value should never be possible, as I expect only 'N' or 'Y'. I have verified the column order and it is in the right place. This is very strange; is something going wrong because of the CASE statement?
Save yourself the hassle: put an Expression transformation after the Source Qualifier, calculate the port value there, then forget about it.
I think I found the issue. We use Informatica PowerExchange to connect to an AS400 system (DB2), and it seems that when a flag value is set on the AS400 and passed to Informatica via PowerExchange, it is converted to binary. Solving this requires an entry in the PowerExchange configuration file.
Unfortunately, I was not aware that it could be related to PowerExchange rather than PowerCenter itself.
Thanks for your assistance! Below is the KB article about it.
https://kb.informatica.com/solution/4/Pages/17498.aspx

Talend component that takes multiple rows as input and returns one row as output?

I have an XML file (tFileInputXML) as the starting point of my job. From that XML, I'd like to "combine" all its rows into a Java List/array/whatever and get that List as output.
Is there a component in Talend that offers such a mechanism?
NB: I've already tried the tJavaFlex component, but it still outputs many rows.
Thank you in advance.
You need to read the file, map its fields using tXMLMap, and then process them in a Java component:
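A minimal plain-Java sketch of the accumulation idea, assuming each row is reduced to a single String value. The class and method names are hypothetical, and the comments map the three parts of the loop onto the Start, Main, and End code sections of a tJavaFlex.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowCollectorSketch {

    // Collects every incoming row into one list and returns that list once.
    static List<String> collect(Iterable<String> rows) {
        // "Start code": runs once, before the first row arrives.
        List<String> all = new ArrayList<>();

        // "Main code": runs once per incoming row.
        for (String row : rows) {
            all.add(row);
        }

        // "End code": runs once, after the last row; the list is complete.
        return all;
    }

    public static void main(String[] args) {
        System.out.println(collect(Arrays.asList("a", "b", "c")));
    }
}
```

The component still sees rows one at a time; the single combined value only exists once the end part has run, so whatever consumes the list has to run after that point.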

How to assign csv field value to SQL query written inside table input step in Pentaho Spoon

I am pretty new to Pentaho so my query might sound very novice.
I have written a transformation in which I am using a CSV file input step and a Table input step.
Steps I followed:
Initially, I created a parameter in the transformation properties. The parameter birthdate doesn't have any default value set.
I have used this parameter in a PostgreSQL query in the Table input step in the following manner:
select * from person where EXTRACT(YEAR FROM birthdate) > ${birthdate};
I am reading the CSV file using CSV file input step. How do I assign the birthdate value which is present in my CSV file to the parameter which I created in the transformation?
(OR)
Could you guide me through the process of assigning the CSV field value directly to the SQL query used in the Table input step, without using a parameter?
TL;DR
I recommend using a "database join" step like in my third suggestion below.
See the last image for reference
First idea - Using Table Input as originally asked
Well, you don't need any parameter for that, unless you are going to provide the value for that parameter when asking the transformation to run. If you need to read data from a CSV you can do that with this approach.
First, read your CSV and make sure your rows are ok.
After that, use a Select values step to keep only the columns to be used as parameters.
In the Table input, use a placeholder (?) to mark where the data goes, and ask it to run for each row that it receives from the source step.
Just keep in mind that the placeholders (?) are filled in the same order as the columns received by the Table input (the columns coming out of the Select values step). This should not be a problem for your question, which uses only one placeholder, but keep it in mind as you ramp up using Pentaho.
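To picture what running the query once per incoming row means, here is a small in-memory Java sketch. It does not use Pentaho or a database: a hard-coded list of made-up birthdates stands in for the person table, and the method names are hypothetical.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExecuteForEachRowSketch {

    // Made-up stand-in for the person table: only the birthdate column.
    static final List<LocalDate> PERSON = Arrays.asList(
            LocalDate.of(1985, 3, 14),
            LocalDate.of(1992, 7, 1),
            LocalDate.of(2001, 11, 30));

    // In-memory equivalent of:
    //   select * from person where EXTRACT(YEAR FROM birthdate) > ?
    static List<LocalDate> query(int yearParam) {
        List<LocalDate> result = new ArrayList<>();
        for (LocalDate birthdate : PERSON) {
            if (birthdate.getYear() > yearParam) {
                result.add(birthdate);
            }
        }
        return result;
    }

    // One query execution per incoming CSV row; the value of that row
    // fills the placeholder, and all results are appended in row order.
    static List<LocalDate> executeForEachRow(List<Integer> csvYears) {
        List<LocalDate> out = new ArrayList<>();
        for (int year : csvYears) {
            out.addAll(query(year));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(executeForEachRow(Arrays.asList(1990, 2000)));
    }
}
```

Note that a person matching several CSV rows appears once per matching row in the combined output, which is worth keeping in mind when the CSV holds more than one value.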
Second idea, using a Database Lookup
This is another approach. You can't customize the query sent to the database, but you may get better performance because you can set the "Enable cache" flag. If you don't need a function in your where clause, this is really recommended.
Third idea, using a Database Join
That is my recommended approach if you need a function in your where clause. It looks a lot like the Table Input approach, but you can skip the Select values step and choose which columns to use, repeat the same column several times, and enable an "outer join" flag that also returns the rows for which the query has no result.
ProTip: If the transformation runs too slowly, try running multiple copies of the step and, obviously, make sure the table has the appropriate indexes in place.
Yes, there's a way to assign it directly without using a parameter. Do as follows.
Use a "Block this step until steps finish" step to hold the Table input step until the CSV input step completes.
Following is how you configure each step.
Note:
The Postgres query should be: select * from person where EXTRACT(YEAR FROM birthdate) > ?::integer
Check Execute for each row and Replace variables in the Table input step.
Select only the birthday column in CSV input step.

Talend tMap Set Default Value for Rejected Inner Joins and connect them with the main data flow

I've got the following problem.
I have several tMaps, each with a lookup, and at the end all the data is written to a database. The following mockup illustrates it:
There can be values in the main data stream which are not found in the lookup tables. For these values there is a reject path which catches them from the specific tMap.
Requirements:
When an inner join is rejected, the looked-up value should be set to a default value (for example 0, which could be done in the schema of the tMap). After that, these "corrected" records should be added back to the "normal" main data flow and go on to the next lookup.
The tUnite component cannot handle this case because it cannot be part of a data flow loop.
Does anybody have an idea how to solve this problem?
Cheers.
The answer was so easy that I didn't see it at first. I just have to change the join model from inner join to left outer join, so that all the formerly rejected values carry a null value. Afterwards I can check the columns in the tMap and set them to a default value if they are null.
row1.id == null ? 0 : row1.id
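The same null check can be tried outside Talend as a small Java sketch. It assumes the looked-up column is a nullable Integer (a primitive int can never be null, so the test would not compile against one); the class and method names are hypothetical.

```java
public class LookupDefaultSketch {

    // Returns the looked-up id, or 0 when the left outer join found no match
    // and the lookup column was therefore left null.
    static int idOrDefault(Integer lookedUpId) {
        return lookedUpId == null ? 0 : lookedUpId;
    }

    public static void main(String[] args) {
        System.out.println(idOrDefault(null)); // no match in the lookup
        System.out.println(idOrDefault(42));   // match found
    }
}
```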
Cheers.
If I understand correctly what you are trying to accomplish, you will need staging files or staging tables in the database. Once you get the rejected rows, write them to a file or table. The accepted rows also go to a staging table (a different one from the rejected rows). Then you can union both tables or files by reading them. The key point is having a staging structure. I attach a picture of how it would look. In the picture, the staging structure is a MySQL table.
Let me know if it helps!

How to log (or see) all inserts performed in a talend job

I have a Job in talend that inserts data into a table.
Can I get these SQL statements (i.e. "insert into tabla(a,b)values(....)")?
You can see the inserted data by adding a tLogRow, but if you want to see the generated insert in real time, you can use the debugger.
For example, for the following job:
Above you can see the data inserted from an Excel file into a MySQL table. This was generated using tLogRow. But if you want the generated SQL statement, you can see it here using the debugger:
Hope it helps.
You could simply place a tLogRow component either before or after your database output component to log things to the console if you are interested in seeing what data is being sent to the database.
I think it's impossible to see (it would be a nice improvement in new releases). My problem was that when I changed the source of my database output (Oracle SID to Oracle RAC), the inserts were still made in the older database.
I fixed it by changing the XML code in the "item" file. Even after the change, the older params attached to the Oracle SID were still there.
Thanks a lot!! Have a nice weekend Goon10 and ydaetskcoR!
You can check the generated Java code. You'll see an:
INSERT INTO (columns) VALUES (?,?,?)
That's the insert PreparedStatement. Talend uses prepared statements to do the inserts, so only one INSERT statement is generated and sent; the values are bound separately for each row. In the main part of the component it will call
setString(position, value)
Please refer to: http://docs.oracle.com/javase/tutorial/jdbc/basics/prepared.html
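A hedged sketch of that pattern in plain JDBC follows; it is not the code Talend generates. So that it runs without a database, it uses a recording stand-in built with java.lang.reflect.Proxy instead of a real connection. The table and column names and the helper methods are assumptions for illustration only.

```java
import java.lang.reflect.Proxy;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class PreparedInsertSketch {

    // The statement text is fixed; only the bound values change per row.
    // With a real database: connection.prepareStatement(SQL).
    static final String SQL = "INSERT INTO tabla (a, b) VALUES (?, ?)";

    // Binds one row and executes: parameter index first (1-based), then value.
    static void insertRow(PreparedStatement ps, String a, String b) throws SQLException {
        ps.setString(1, a);
        ps.setString(2, b);
        ps.executeUpdate();
    }

    // Stand-in statement that records calls instead of talking to a database.
    static PreparedStatement recordingStatement(List<String> log) {
        return (PreparedStatement) Proxy.newProxyInstance(
                PreparedInsertSketch.class.getClassLoader(),
                new Class<?>[] { PreparedStatement.class },
                (proxy, method, args) -> {
                    if (method.getName().equals("setString")) {
                        log.add(args[0] + "=" + args[1]);
                        return null;
                    }
                    if (method.getName().equals("executeUpdate")) {
                        log.add("execute");
                        return 0;
                    }
                    throw new UnsupportedOperationException(method.getName());
                });
    }

    // Inserts two rows through the same statement and returns the call log.
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        PreparedStatement ps = recordingStatement(log);
        try {
            insertRow(ps, "x1", "y1");
            insertRow(ps, "x2", "y2");
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(SQL);
        System.out.println(demo());
    }
}
```

Because the values never appear in the SQL text, logging the statement alone shows only the placeholders; to see what was actually inserted, the bound values have to be logged as well, which is what a tLogRow on the incoming flow gives you.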