Merge columns from various files in Talend

I am trying to merge columns from files in a folder using Talend (the files are local).
Example: a folder contains 4 files (there could also be 'n' files).
Each file has a single column with 100 values.
So after the merge, the output file should have 4 (or 'n') columns with 100 records in it.
Is it possible to merge this way using Talend components?
I tried 2 files in a tMap, but the output record count gets multiplied (records in the first file × records in the second file).
Any help would be appreciated.
Thanks.

You have to determine how to join the data from the different files.
If row number N of each file has to be matched with row number N of the other files, then you must set a sequence on each of your files and join on those sequences to get your result. Be careful: you are totally dependent on the order of the data in each file.
Then you can have this job :
tFileInputdelimited_1 --> tMap_1 --->{tMap_5
tFileInputdelimited_2 --> tMap_2 --->{tMap_5
tFileInputdelimited_3 --> tMap_3 --->{tMap_5
tFileInputdelimited_4 --> tMap_4 --->{tMap_5
In tMap_1 through tMap_4, copy the input to the output and add a "sequence" column (datatype Integer) to the output, populated with Numeric.sequence("IDENTIFIER1",1,1). You then have 2 columns in the output: your data and a unique sequence number.
Be careful to use a different identifier for each source.
Then in tMap_5, just join on the different sequences and map each input column to the output.
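Outside Talend, the same row-number join can be sketched in plain Python (a minimal illustration of the logic only, assuming one column per input file; function and variable names are made up for the example):

```python
import csv
from itertools import zip_longest

def merge_columns(paths, out_path):
    """Read the single column from each input file and write the columns
    side by side, matching rows purely by position: row N of every file
    joins row N of the others, just like the sequence join in tMap_5."""
    columns = []
    for path in paths:
        with open(path, newline="") as f:
            columns.append([row[0] for row in csv.reader(f)])

    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        # zip_longest pads shorter files with "" if the row counts differ
        for row in zip_longest(*columns, fillvalue=""):
            writer.writerow(row)
```

As in the Talend job, the result depends entirely on the row order of each file.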

Related

ADF map source columns startswith to sink columns in SQL table

I have an ADF data flow with many CSV files as a source and a SQL database as a sink. The data in the CSV files is similar, at 170-plus columns wide, but not all of the files have the same columns. Additionally, some column names differ between files, but each column name starts with the same 3-digit prefix. Example: 203-student name, 644-student GPA.
Is it possible to map source columns using the first 3 characters?
Go back to the data flow designer and edit the data flow.
Click on the Parameters tab.
Create a new parameter and choose the string array data type.
For the default value, enter ['203-student name','203-student grade','203-student-marks'] as per your requirement.
Add a Select transformation. The Select transformation will be used to map incoming columns to new column names for output.
We're going to change the first 3 column names to the new names defined in the parameter.
To do this, add 3 rule-based mapping entries in the bottom pane.
For the first column, the matching rule will be position==1 and the name will be $parameter1[1].
Follow the same pattern for columns 2 and 3.
Click on the Inspect and Data Preview tabs of the Select transformation to view the new column names.
Reference - https://learn.microsoft.com/en-us/azure/data-factory/tutorial-data-flow-dynamic-columns#parameterized-column-mapping
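The prefix-matching idea can also be sketched outside ADF. This Python snippet shows the underlying logic of mapping source columns to sink columns by their first 3 characters (the sink column names in `SINK_COLUMNS` are hypothetical, chosen to match the question's examples):

```python
# Canonical sink column names, keyed by the 3-digit prefix that is shared
# across all the source files. These names are assumptions for illustration.
SINK_COLUMNS = {"203": "student_name", "644": "student_gpa"}

def remap_header(header):
    """Map each source column to its sink name via the first 3 characters;
    columns with an unknown prefix are passed through unchanged."""
    return [SINK_COLUMNS.get(col[:3], col) for col in header]
```

This is the same rule the Select transformation applies declaratively: match by a property of the column (here, its prefix), not by its full name.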

Combine two sources with different schemas in a single file keeping the schema per row in ADF Dataflow

I need to combine two sources into a single sink file while keeping each row's schema. Example:
File 1:
Column 1, Column 2, Column 3, Column 4
A, B, C, D

File 2:
Column 1, Column 2
J, K

Output File:
A, B, C, D
J, K
No header row is needed.
Columns are separated by a comma.
Each row keeps its own structure/schema.
Thanks for help
@Kd85 As per your comment, you want to combine 2 CSV files and store the output in a .txt file.
If you are using a Binary dataset as the sink in a Copy activity, you can only copy from a Binary dataset.
Please refer to https://learn.microsoft.com/en-us/azure/data-factory/format-binary
You can simply use a Union transformation in your data flow.
Choose Output to single file in the sink settings.
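What the Union plus single-file sink produces can be sketched in Python: a plain concatenation of the data rows, with each row keeping its own column count. The function name and the header-skipping behaviour are assumptions for illustration:

```python
def union_to_txt(paths, out_path, skip_header=True):
    """Append the data rows of each CSV to one output file, preserving each
    row's own schema (column count). Header rows are dropped, matching the
    'no header row' requirement in the question."""
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as f:
                if skip_header:
                    next(f, None)          # discard the header line
                for line in f:
                    if line.strip():       # skip blank lines
                        out.write(line.rstrip("\n") + "\n")
```

Because no schema alignment happens, the 4-column row and the 2-column row land in the output unchanged, exactly as in the expected output above.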

Azure Data Factory merge 2 csv files with different schema

I am trying to merge 2 CSV files (in Azure Data Factory) which have different schemas. Below is the scenario:
CSV 1: 15 columns -> 5 dimensions and 10 metrics (x1, x2, ... x10)
CSV 2: 15 columns -> 5 dimensions (same as above) and 10 metrics (different from above: y1, y2, ... y10)
So my schemas are different. Now I have to merge both CSV files so that the 5 dimensions come with all 20 metrics.
I tried a data transformation using the Select operation. That gives me 2 rows in the merged file: one row with the first 5 dimensions and 10 metrics, and a second row with the next 5 dimensions and 10 metrics. This is incorrect, as I am looking for a single row with 5 dimensions and all 20 metrics (x1, x2, ... x10, y1, y2, ... y10).
Any help is much appreciated on this issue
Thank you @sac for the update and thank you @Joel Cochran for the suggestion. Posting it as an answer to help other community members.
Use a Join transformation with the Join type set to Inner join. Use the key columns, or common columns (the dimension columns), from the 2 input files as your join condition. This will output all columns from file1 and file2.
Use a Select transformation to get the required select list from the join output.
Refer to the process below for the implementation:
(i) Join the 2 source files with an inner join and the key columns in the join condition.
(ii) The output of the Join transformation will list all the columns from source1 and all the columns from source2 (including duplicate key columns from both source files).
(iii) Use a Select transformation to remove the duplicate (or otherwise unneeded) columns from the join output.
(iv) The output of the Select transformation gives the merged result.
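The Join-then-Select pattern above can be sketched in Python; this minimal illustration (function name and file layout are assumptions) does an inner join on the dimension columns and drops the duplicate key columns, which is steps (i)-(iii) in miniature:

```python
import csv

def inner_join(file1, file2, keys):
    """Inner-join two CSVs on their shared dimension columns, producing one
    merged row per key with the metrics of both files side by side."""
    def load(path):
        with open(path, newline="") as f:
            return list(csv.DictReader(f))

    # Index file2 rows by their dimension key for O(1) lookup.
    rows2 = {tuple(r[k] for k in keys): r for r in load(file2)}

    merged = []
    for r1 in load(file1):
        key = tuple(r1[k] for k in keys)
        if key in rows2:                      # inner join: keep matches only
            combined = dict(r1)
            for col, val in rows2[key].items():
                if col not in keys:           # drop duplicate key columns
                    combined[col] = val
            merged.append(combined)
    return merged
```

Each output row then carries the 5 dimensions once plus all 20 metrics, as required.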

How to compare the columns (2nd and 3rd) of 2 .csv files based on a condition in MATLAB or Octave

I have a program based on some functionality, and the results are written to 2 .csv files (file1 and file2).
The files consist of the same rows and columns; the columns are named condition, supply1, supply2.
I want to read both files into the program, load them, and then compare the contents of supply1 and supply2 based on the condition and number.
We have to compare the respective contents and display the difference of supply1 and supply2 between those 2 files.
E.g.: condition ta and number 1 in the 4th row of file1 has to be compared with condition ta and number 1 in the 1st row of file2.
Please help me with this.
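The matching-and-differencing described above can be sketched in Python (the column names condition, number, supply1, supply2 are assumed from the description; the actual files may differ):

```python
import csv

def supply_differences(file1, file2):
    """Match rows across two CSVs on the (condition, number) pair, ignoring
    row order, and return the supply1/supply2 differences per matched key.
    Assumed columns: condition, number, supply1, supply2."""
    def index(path):
        with open(path, newline="") as f:
            return {(r["condition"], r["number"]): r for r in csv.DictReader(f)}

    a, b = index(file1), index(file2)
    diffs = {}
    for key in a.keys() & b.keys():           # keys present in both files
        diffs[key] = (
            float(a[key]["supply1"]) - float(b[key]["supply1"]),
            float(a[key]["supply2"]) - float(b[key]["supply2"]),
        )
    return diffs
```

The same approach works in Octave with `csv2cell`/`textscan` plus a lookup on the condition and number columns; the key point is matching by (condition, number) rather than by row position.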

How to split 2 or more delimited columns in a single row to multiple rows using Talend

I am trying to move data from a CSV file to a DB table. There are 2 delimited columns in the CSV file (values separated by ";"). I would like to create a row for each of the delimited values at matching indexes, as shown below. The assumption is that both columns contain the same number of delimited items.
Example CSV Input:
Labels  Values
A;B;C   1;2;3
D       4
F;G     5;6
Expected Output:
Labels  Values
A       1
B       2
C       3
D       4
F       5
G       6
How can I achieve this? I have tried using tNormalize, but this only works for a single column. I also tried 2 successive tNormalize components, but as expected that resulted in unwanted combinations.
Thanks
Read your CSV file with a tFileInputDelimited and define your schema for the file.
Assuming you are using MySQL, also drop a tMysqlOutput component onto your designer to save your parsed file to the DB.
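The index-matched split itself, which two chained tNormalize components cannot do (they produce the cross product), can be sketched in Python; in Talend this logic would live in a tJavaFlex or a custom routine between the input and output components (function name is an assumption):

```python
def split_parallel(rows):
    """Expand rows whose Labels and Values fields hold ';'-delimited lists
    into one output row per pair, matching items by index.
    Assumes both fields contain the same number of items, per the question."""
    out = []
    for label_str, value_str in rows:
        labels = label_str.split(";")
        values = value_str.split(";")
        out.extend(zip(labels, values))   # pair item i with item i
    return out
```

The key difference from normalizing each column separately is the `zip`: items are paired positionally instead of combined pairwise.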