How to split 2 or more delimited columns in a single row into multiple rows using Talend

I am trying to move data from a CSV file to a DB table. There are 2 delimited columns in the CSV file (separated by ";"). I would like to create a row for each of the delimited values at matching indexes, as shown below. The assumption is that both columns contain the same number of delimited items.
Example CSV Input:
Labels Values
A;B;C 1;2;3
D 4
F;G 5;6
Expected Output:
Labels Values
A 1
B 2
C 3
D 4
F 5
G 6
How can I achieve this? I have tried using tNormalize, but it only works on a single column. I also tried two successive tNormalize components, but as expected that produced unwanted combinations.
Thanks

Read your CSV file with a tFileInputDelimited, and
define your schema for the file.
Assuming you are using MySQL, also drop a tMysqlOutput component onto your designer to save the parsed rows to the DB.
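The steps above cover the read and the write, but not the split itself. One way to sketch that middle step (my own suggestion, not the only option): zip the two delimited columns into one string of "label,value" pairs with a small user routine, normalize it into rows with tNormalize, then split the pairs back into two columns with tExtractDelimitedFields. The routine name zipDelimited is hypothetical:

// User routine: pairs up the i-th items of two semicolon-delimited strings
// as "label,value" entries joined by ";".
// e.g. zipDelimited("A;B;C", "1;2;3") returns "A,1;B,2;C,3"
public static String zipDelimited(String labels, String values) {
    String[] l = labels.split(";");
    String[] v = values.split(";");
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < l.length; i++) { // assumes both columns hold the same number of items
        if (i > 0) out.append(";");
        out.append(l[i]).append(",").append(v[i]);
    }
    return out.toString();
}

Call it from a tMap output expression, e.g. zipDelimited(row1.Labels, row1.Values), run the result through tNormalize (item separator ";"), then through tExtractDelimitedFields (field separator ",") to recover the Labels and Values columns before the tMysqlOutput.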

Related

How can I split a column delimited with spaces?

I have a CSV delimited with spaces, where the number of spaces can vary between 9, 10, and 11. Is there a way to split the column in two with Azure Data Factory?
Examples:
This is my CSV.
I tried using data flows, but when I execute the data flow it throws an error.
PS: the CSV has 4,000,000 rows.
I need to solve the problem using Azure Data Factory; the CSV needs to end up in my DW.
I have the following data in my sample CSV file, with either 9, 10, or 11 spaces between the values.
Now, after reading the column, use a derived column transformation to split the required column on 9 spaces (the minimum number of spaces).
req1 : split(col1,'         ')[1]
req2 : split(col1,'         ')[2]
This will split the data into an array of 2 elements, where the element at index 1 has no spaces in its value and the element at index 2 carries the leftover leading spaces.
Now apply ltrim on the req2 column. Check the length of this column before and after the transformation to confirm that we are eliminating the leading spaces.
req2 : ltrim(req2)
After doing this, you can check the length of req2; here it comes out as 1.
Now, select only the required columns and write them to the required sink.
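The two steps can also be folded into a single pass; this consolidated form is my own rewrite of the answer's expressions, not part of the original (the delimiter between the quotes is nine literal spaces):

req1 : split(col1,'         ')[1]
req2 : ltrim(split(col1,'         ')[2])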

Combine two sources with different schemas into a single file, keeping the schema per row, in ADF Dataflow

I need to combine two sources into a single sink file while keeping each row's own schema. Example:
File 1
Column 1   Column 2   Column 3   Column 4
A          B          C          D
File 2
Column 1   Column 2
J          K
Output File
A, B, C, D
J, K
No header row is needed.
Each column is separated by a comma.
Each row keeps its own structure/schema.
Thanks for help
@Kd85 As per your comment, you want to combine 2 CSV files and store the output in a .txt file.
If you are using a Binary dataset as the sink in a Copy activity, you can only copy from a Binary dataset.
Please refer to https://learn.microsoft.com/en-us/azure/data-factory/format-binary
You can simply use a Union transformation in your data flow.
Choose "Output to single file" in the sink settings.
Resulting .txt:

How to add a trailer/footer to a CSV dataframe in Azure Blob with PySpark

I have a solution which goes like this:
df1 --> dataframe 1 with 50 columns of data
df2 --> dataframe 2 holding the footer/trailer, with 3 columns of data: Trailer, count of rows, date
So I added the remaining 47 columns as "","","",... and so on,
so that I can union the 2 dataframes:
df3 = df1.union(df2)
Now if I want to save:
# write the combined dataframe out as a single CSV file
df3.coalesce(1).write.format("com.databricks.spark.csv") \
    .option("header", "true").mode("overwrite") \
    .save(output_blob_path)
So now I am getting the footer as well, like this: Trailer,400,20210805,"","","","","","","",... and so on.
Can anyone suggest how to remove the ,"","","",... double quotes from the last row?
I want to save this file in a blob container; any help would be appreciated.
You can try defining the structure of the dataframe to treat the entire row as a single column for both files, and then perform the union. This way you don't need to add extra columns to dataframe 2, only to get stuck in the tricky situation of removing them after the union.
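A minimal PySpark sketch of that idea, assuming hypothetical input paths (spark.read.text loads each whole line into a single string column named value):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read each file as raw text: the whole line lands in one string column,
# so no padding columns (and no quoted empty strings) are ever introduced.
data = spark.read.text("data_file_path")        # hypothetical path
trailer = spark.read.text("trailer_file_path")  # hypothetical path

# Union the two single-column dataframes and write the lines back out verbatim.
data.union(trailer).coalesce(1).write.mode("overwrite").text(output_blob_path)

Note that the text writer has no header option, so if a header line is needed it has to be prepended as an ordinary row.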

How to compare the columns (2nd and 3rd) of 2 .csv files based on a condition in MATLAB or Octave

I have a program based on some functionality, and the results are written to 2 .csv files (file1 and file2).
The files consist of the same rows and columns; the columns are named condition, supply1, supply2.
I want to read both files into the program, load them, and then compare the contents of supply1 and supply2, based on the condition and number.
We have to compare the respective contents and display the difference of supply1 and supply2 between those 2 files.
E.g.: condition ta and number 1 in the 4th row of file1 has to be compared with condition ta and number 1 in the 1st row of file2.
Please help me with this.

Merge Columns from various files in Talend

I am trying to merge columns from files in a folder using Talend (the files are local).
Example: there are 4 files in a folder (there could also be 'n' files).
Each file has one column holding 100 values.
So after the merge, the output file would have 4 (or 'n') columns with 100 records in it.
Is it possible to merge this way using Talend components?
I tried with 2 files in tMap, but the output records get multiplied (the records in the first file * the records in the second file).
Any help would be appreciated.
Thanks.
You have to decide how to join the data from the different files.
If row number N of each file has to be matched with row number N of the other files, then you must add a sequence to each of your files and join on the sequences to get your result. Careful: you are totally dependent on the order of the data in each file.
Then you can have this job :
tFileInputDelimited_1 --> tMap_1 --->{tMap_5
tFileInputDelimited_2 --> tMap_2 --->{tMap_5
tFileInputDelimited_3 --> tMap_3 --->{tMap_5
tFileInputDelimited_4 --> tMap_4 --->{tMap_5
In tMap_1 through tMap_4, copy the input to the output and add a "sequence" column (datatype Integer) to the output, populated with Numeric.sequence("IDENTIFIER1",1,1). You then have 2 columns in each output: your data and a unique sequence.
Be careful to use a different identifier for each source.
Then in tMap_5, just join on the different sequences and pick up each inputColumn, as sketched below.
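Concretely, the output expression for the added sequence column in each tMap would look like this (the identifier strings below are made-up examples; the only requirement is that each source uses its own):

Numeric.sequence("seq_file1", 1, 1)   // in tMap_1
Numeric.sequence("seq_file2", 1, 1)   // in tMap_2
Numeric.sequence("seq_file3", 1, 1)   // in tMap_3
Numeric.sequence("seq_file4", 1, 1)   // in tMap_4

In tMap_5, joining each flow on this sequence column then lines up row N of every file, with each input's data column mapped to its own output column.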