I am using Talend 6.1.1 and I have two components, tMysqlInput and tFixedFlowInput.
The schema is the same for both components, and I am trying to combine the data they generate.
For example, the schema has two columns, col1 and col2.
The output of the tMysqlInput component is:
1,2
2,3
The output of the tFixedFlowInput component is:
3,4
4,5
Now the output I am expecting is the combination of both outputs.
It should be like:
1,2
2,3
3,4
4,5
Please help me combine the outputs of these two components.
An alternative to using tUnite is tHashOutput.
For example:
tMySqlInput--main-->tHashOutput
|
onSubjobOK
|
tFixedFlowInput--main-->tHashOutput
|
onSubjobOK
|
tHashInput--main-->tFileOutputDelimited
In the second tHashOutput, make sure to associate it with the first tHashOutput.
In the tHashInput, make sure to associate it with the first tHashOutput.
tUnite would generally be preferred, but depending upon the case tHashOutput can be appropriate.
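The tHashOutput/tHashInput wiring above is pure GUI, but the data flow it produces can be sketched in plain Python (an illustration only, not Talend code): linking the second tHashOutput to the first means both subjobs append to one shared buffer, which the tHashInput then reads.

```python
# Sketch of the shared tHash buffer; the tuples stand in for (col1, col2) rows.
hash_buffer = []

mysql_rows = [(1, 2), (2, 3)]   # rows produced by tMysqlInput
fixed_rows = [(3, 4), (4, 5)]   # rows produced by tFixedFlowInput

# Subjob 1: tMysqlInput --main--> tHashOutput (writes to the buffer)
hash_buffer.extend(mysql_rows)

# Subjob 2: tFixedFlowInput --main--> tHashOutput (linked, same buffer)
hash_buffer.extend(fixed_rows)

# Subjob 3: tHashInput reads the combined buffer and feeds the output file.
for col1, col2 in hash_buffer:
    print(f"{col1},{col2}")  # prints 1,2  2,3  3,4  4,5 on separate lines
```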
If the schema is exactly the same, you can send the row output of both components into a tUnite component
https://help.talend.com/display/TalendComponentsReferenceGuide54EN/tUnite
I am trying to merge 2 CSV files (in Azure Data Factory) which have different schemas. Below is the scenario:
CSV 1: 15 columns -> say 5 dimensions and 10 metrics (x1, x2, ..., x10)
CSV 2: 15 columns -> 5 dimensions (same as above) and 10 metrics (different from above: y1, y2, ..., y10)
So my schemas are different. Now I have to merge both CSV files so that the 5 dimensions come with all 20 metrics.
I tried a Data Transformation using the Select operation. That gives me 2 rows in the merged file: one row with the 5 dimensions and the first 10 metrics, and a second row with the 5 dimensions and the other 10 metrics. This is incorrect, as I am looking for a single row with the 5 dimensions and all 20 metrics (x1, x2, ..., x10, y1, y2, ..., y10).
Any help on this issue is much appreciated.
Thank you #sac for the update, and thank you #Joel Cochran for the suggestion. Posting this as an answer to help other community members.
Use a Join transformation with the Join type set to Inner join. Use the key columns, or common columns (the dimension columns), from the 2 input files as your join condition. This will output all columns from file1 and file2.
Use a Select transformation to get the required select list from the join output.
Refer to the process below for the implementation:
(i) Join the 2 source files, with an inner join and the key columns in the join condition.
(ii) The output of the Join transformation will list all the columns from source1 and all the columns from source2 (including the duplicate key columns from both source files).
(iii) Use a Select transformation to remove the duplicate (or not required in the select list) columns from the join output.
(iv) The output of the Select transformation.
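Outside ADF, the join-then-select steps above can be sketched in plain Python; the column names and values are invented for illustration, with 2 dimensions and 2 metrics per file standing in for the 5 and 10.

```python
# Hypothetical rows: the dimension columns are shared, the metrics differ.
csv1 = [{"region": "EU", "product": "A", "x1": 10, "x2": 20}]
csv2 = [{"region": "EU", "product": "A", "y1": 7, "y2": 9}]

dims = ["region", "product"]  # join keys (the common dimension columns)

# Step (i)/(ii): inner join on the dimension columns.
joined = []
for r1 in csv1:
    for r2 in csv2:
        if all(r1[d] == r2[d] for d in dims):
            # Step (iii): merging the dicts keeps the key columns only once,
            # which is what the Select transformation achieves in ADF.
            joined.append({**r1, **r2})

print(joined)  # one row: 2 dimensions plus all 4 metrics
```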
The output of my tMap is below:
|src_table|src_columname
--------------------------
|Account |ID
|Account |Name
|Account |Owner
|Contact |ID
|Contact |Name
|Contact |FirstName
|Contact |LastName
I want the output in two tables: first Account and second Contact.
Account
-----------------
ID |Name |Owner |
Contact
-------------------------------
ID |Name |FirstName |LastName |
I am a beginner in Talend. Please tell me which component I need to use to get the above output.
Actually, I am not an expert user and I have not found a solution. The scenario is:
I'm trying to migrate some 10 tables from a SQL Server DB to an Oracle DB, and I wish to use Talend, but I don't know how I could do it. First I tried the method below: I created many subjobs, mapping table by table in one job. Because each table has a different structure, I created a different subjob with the corresponding schema, for example:
tOracleInput_1--main-->tMSSQLOutput_1 (migrate table1)
|
onSubjobOK
|
tOracleInput_2--main-->tMSSQLOutput_2 (migrate table2)
|
onSubjobOK
|
...other subjobs for other tables...
But I do not want to create many subjobs. Is there any way to create one subjob for all the tables?
You'll have to use multiple outputs in your tMap, and use the filter option on them.
Add a second output to your tMap.
Then activate a filter on both outputs (the filter button in the yellow title bar).
In output #1, put a filter on src_table, like "Account".equals(row2.src_table)
In output #2, put a filter on src_table, like "Contact".equals(row2.src_table)
Then you will have only Accounts in your first output and only Contacts in your second output.
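As a rough sketch of what the two filters do (plain Python rather than the Java expressions tMap uses, with the rows from the question):

```python
# Each dict stands for one tMap input row.
rows = [
    {"src_table": "Account", "src_columname": "ID"},
    {"src_table": "Account", "src_columname": "Name"},
    {"src_table": "Account", "src_columname": "Owner"},
    {"src_table": "Contact", "src_columname": "ID"},
    {"src_table": "Contact", "src_columname": "Name"},
]

# Output #1 filter, equivalent to "Account".equals(row2.src_table)
accounts = [r for r in rows if r["src_table"] == "Account"]
# Output #2 filter, equivalent to "Contact".equals(row2.src_table)
contacts = [r for r in rows if r["src_table"] == "Contact"]

print([r["src_columname"] for r in accounts])  # → ['ID', 'Name', 'Owner']
print([r["src_columname"] for r in contacts])  # → ['ID', 'Name']
```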
I am working in Cloud Dataprep and I have a case like this:
Basically, I need to create new rows in column 2 based on how many rows there are with matching data in column 1.
Is it possible, and how?
I understand that the scenario you want is: obtain all values from column1 that match a value present in column2. There are many things to consider in this scenario which you did not describe, such as: can values in column2 be repeated? If there is a value in column2 that is missing from column1, what should happen? And what happens the other way around?
However, as a general approach to this issue, I would do the following flow:
With a flow such as this one, you take the input table, which has two columns like this:
In the FIRST_COLUMN and SECOND_COLUMN recipes you split the two columns into different branches and do the steps necessary to clean each column. In column1, I understand nothing needs to be done. In column2, I understand that you will have to remove duplicates (again, this is my guess; it depends on your specific implementation, which you have not completely described) and delete empty values. You can do that by applying the following transforms:
Finally, you can join both columns together. Depending on your needs (only values present in both columns should appear, only values present in columnX should appear, etc.) you should apply a different JOIN strategy. You should use a Join key like column1 = column2 (as in the image), and if you choose only the second column in the left-side menu, you will have a single-column result.
Note that in this case I used an Inner-join, but using other JOIN types will provide completely different results. Use the one that fits your requirements better.
I have a Talend Job, where somehow a closed loop is formed by the components. Image is as follows:
The schemas of both tMap outputs are the same. After connecting one tMap to the tUnite, when I try to connect the second tMap, it does not connect.
I have heard that Talend does not allow a closed loop in a Job. Is that true? If so, why?
Someone had a similar question here, but found no answers.
Talend actually creates a Java program; essentially that is the reason for the limitation you've encountered.
tUnite takes all the data provided by each of its inputs in turn, i.e. all of A, then all of B, then all of C.
It cannot take row 1 from A, then row 1 from B, then row 1 from C, then row 2 from A, then row 2 from B, etc., because of the nature of the programming loops generated for each flow.
However, tMap multiple outputs and tReplicate do produce row 1 to A, then row 1 to B, then row 1 to C, then row 2 to A, then row 2 to B, etc.
This is why you cannot split and then rejoin flows.
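A simplified sketch of why the generated loops behave this way (illustrative Python, not the actual Java Talend generates): each input flow compiles to its own loop, so a union runs the loops one after another, while a split/replicate pushes each row to every output inside a single loop.

```python
a = [1, 2]   # rows of input flow A
b = [10, 20] # rows of input flow B

# tUnite-style: one loop per input flow, run sequentially.
union_order = []
for row in a:
    union_order.append(("A", row))
for row in b:
    union_order.append(("B", row))

# tReplicate-style: one loop, each row sent to every output before the next row.
replicate_order = []
for row in a:
    replicate_order.append(("out1", row))
    replicate_order.append(("out2", row))

print(union_order)      # all of A, then all of B
print(replicate_order)  # row-by-row, interleaved across outputs
```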
PreetyK has explained the why. I'll explain how to work around this limitation.
You can store the output from tMap_10 and tMap_11 in a tHashOutput each. On the 2nd tHashOutput you must check the "Link with a tHashOutput" checkbox and then select the other tHashOutput from the drop-down list. This tells it to write to the same buffer as the 1st tHashOutput, effectively making a union of your tMap_10 and tMap_11 outputs.
In the next subjob, you use a tHashInput to read from your tHashOutput (you need only a single tHashInput, as the 2 outputs share the same data).
Here are some screenshots :
Then the tHashInput:
Note that by default these components are hidden. You have to go to File > Project Settings > Designer > Palette Settings, and then move them from the left pane to the right pane as below. You will then find them in your palette.
I am adept in both SQL and CR, but this is something I've never had to do.
In CR, I load a table that will always contain 1 record. There is a range of columns (like Column1, Column2, ..., Column60). (Bad design, I know, but I can't do anything to change that.)
Thanks to this old design I have to manually add each column in my report like this:
-----------
| TABLE |
-----------
| Column1 |
| Column2 |
| Column3 |
| ... |
-----------
Now I would like to create a subreport and a datasource for it in such a way that [Column1...Column60] becomes a collection [Row1...Row60]. I want to be able to use the detail section of the subreport to generate the table dynamically. That would save me a lot of time.
Is there any way to do that? Maybe a different approach to what I had in mind?
Edit
#Siva: I'll describe it the best way I can. The table consists of 500+ columns and will only ever hold 1 record (never more). Because normalization was never taken into account when these tables were created (the Objective C / DBF ages), columns like Brand01, Brand02, Brand03, ..., Brand60 should have been placed in a separate table named "Brands".
The document itself is pretty straightforward considering there's only one record. But some columns have to be pivoted (stacked vertically) and placed in a table layout on the document, which is a lot of work if you have to do it manually. That's why I wanted to feed a range of columns into my subreport, so I can use the detail section of the subreport to generate the table layout automatically.
OK, got it... I will try to answer to the extent possible.
You need 2 columns in the report: one showing the 60 column names as 60 rows, and one showing the data of those 60 columns. There are two ways I can think of to do this.
If the columns are static and the report needs to be developed only once, then, though it is a tedious manual job, create 120 formulas: 60 for the row names, in which you write the column names, and 60 for the data of the respective columns, and place them in the report. Since you have only one record, you will get the correct data. Like below:
Formula 1:
column1 name // write manually
Formula 2:
databasefield for column1 // this has the data for column1
The above will be one row in the report. In this way you will get 120 formulas and 60 rows, and you don't need a subreport here; the main report will do the job.
If you are expecting dynamic behaviour (even though the columns are static), you can create a view on the database side, or a datatable (please note I have no experience with datatables; use them at your convenience).
Create it in such a way that it has 2 columns, and in the report use a cross-tab, which will give you the dynamic behaviour.
In the cross-tab, column1 will be the rows part and column2 will be the data.
Here also I don't see any requirement for a subreport; you can use the main report directly. If you want a subreport, you can use one as well; no harm, since you have only 1 record.
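The column-to-row pivot described above, whether done with 120 formulas or with a 2-column view feeding a cross-tab, amounts to the following, sketched in Python with invented column names:

```python
# One wide record, standing in for the 500+-column table (names are invented).
record = {"Brand01": "Acme", "Brand02": "Globex", "Brand03": "Initech"}

# Pivot the columns into (column name, value) rows: the 1st column holds the
# names, the 2nd column holds the data, exactly the 2-column layout described.
rows = [(name, value) for name, value in record.items()]

for name, value in rows:
    print(f"{name} | {value}")  # one report row per original column
```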