Datastage: Split multiple sequential rows under different columns - datastage

I've got this type of data in my Database. Imagine that File_Name is the column name and so I need to take all the rows (Under "File_name") and put them into different columns with different Names.
File_Name (Column Name)
File1 (First Row)
File2 (Second Row)
File3 (Third Row)
And I need to put them in another file like this:
File_Name1 (Column Name1) ,File_Name2 (Column Name2), File_Name3 (Column Name3)
File1 (Under First column), File2 (Under Second Column), File3 (Under Third column)
Is there a stage that can help me? I tried using the Pivot but I can't really figure how to set it with just one input column.

So assuming you just want a single result row from your input (that is what I understood from your question) I would use a Transformer (or Column Generator) to add an artificial column with a value of 1 for all rows.
You tried already with the Pivot Enterprise stage and with that additional column it will be possible to transform it into the result you need.

Related

Is it possible to generate the space separated header row using data factory copy activity?

I am using azure sql as source dataset and delimited file as sink dataset in the copy activity.
I tried copy activity but First row as header gives comma separated headers.
Is there way to change the header output style ?
Please note spacing is unequal (h3...h4)
In this repro, I tried to give
1 space between 1st and 2nd column,
2 spaces between 2nd and 3rd column,
3 spaces between 3rd and 4th column.
Also, I tried to give same column name for column2 and column3. The approach is as follows.
Data is copied from Azure SQL database to datalake in comma delimitted format as a staging file.
This staging file is taken as a source in Dataflow activity.
In source dataset, first row as header is not checked.
Data preview of Source transformation:
Derived column transformation is added to change the column name of column2 and column3.
In this case, date_col of column1 is header data. Thus when column1 is 'date_col' replace column2 and column3 data with same column name.
column_2 = iif(Column_1=='date_col','ECIX',Column_2);
column_3 = iif(Column_1=='date_col','ECIX',Column_3);
Again derived column transformation is added to concat all the columns with spaces. Column name is given as concat . Value for this column is
concat(Column_1,' ',Column_2,' ',Column_3,' ',Column_4)
Select transformation is added and only concat column is selected here.
In sink, new delimited file is added as a sink dataset. And in sink dataset also , first row as header is not checked.
Output file screenshot
After pipeline is run, the target file looks like this.
Keeping the source as azure sql itself in the dataflow, I created a single derived column 'OUTDC' and added all the columns from the source like this:
(h1)+' '+(h2)+' '+(h3)
Then fed the OUTDC to a delimited sink and kept the Headers option as single string like this:
['h1 h2 h2']

ADF map source columns startswith to sink columns in SQL table

I have a ADF data flow with many csv files as a source and a SQL database as a sink. The data in the csv files are similar with 170 plus columns wide however not all of the files have the same columns. Additionally, some column names are different in each file, but each column name starts with the same corresponding 3 digits. Example: 203-student name, 644-student GPA.
Is it possible to map source columns using the first 3 characters?
Go back to the data flow designer and edit the data flow.
Click on the parameters tab
Create a new parameter and choose string array data type
For the default value as per your requirement, enter ['203-student name','203-student grade',’203-student-marks']
Add a Select transformation. The Select transformation will be used to map incoming columns to new column names for output.
We're going to change the first 3 column names to the new names defined in the parameter
To do this, add 3 rule-based mapping entries in the bottom pane
For the first column, the matching rule will be position==1 and the name will be $parameter11
Follow the same pattern for column 2 and 3
Click on the Inspect and Data Preview tabs of the Select transformation to view the new column name.
Reference - https://learn.microsoft.com/en-us/azure/data-factory/tutorial-data-flow-dynamic-columns#parameterized-column-mapping

How can I rename several columns in dataprep?

I have more than 100 columns in dataprep whose names are like:
my column name 1
my column name 2
I would like to rename the name of the columns to be:
my_column_name_1
my_column_name_2
I have tried to do a rename, changing " " by "_". However, dataprep only changes the first whitespace! Is there any way to change all the whitespaces?
Another question, when I do a function like rename, it is done just for a column. I can add more columns writing the name of de column. Is there any way to select all columns without writing all the names?
thank you so much!
You can shift-select multiple columns to Transform when the data is in column view mode.
Select the columns to apply to and then choose the transformation.
JSDBroughton answer did the trick for me although it's not so clear how to do it. Change your view to Columns (second icon from the left on the toolbar). Select the first column, then hold Shift and select the last column. You should now have all columns selected. Then right clock and select Rename. A new Recipe step will be added with all your columns already added. Then set the Option to "Find and replace".
In terms removing all the spaces I couldn't find any Cloud Dataprep pattern or Regular Expression which let me replace all my spaces in my columns. Having said that my columns had a maximum of 4 spaces so I simply added the same step multiple times. I used the Regular Expression \s to match spaces and I replaced them underscores.

Merge Columns from various files in Talend

I am trying to achieve column merge of files in a folder using Talend.(Files are local)
Example:- 4 files are there in a folder. ( there could be 'n' number of files also)
Each file would have one column having 100 values.
So after merge, the output file would have 4 or 'n' number of columns with 100 records in it.
Is it possible to merge this way using Talend components ?
Tried with 2 files in tmap , the output records becomes multiplied ( the record in first file * the record in second file ).
Any help would be appreciated.
Thanks.
You have to determine how to join data from the different files.
If row number N of each file has to be matched with row number N of the other files, then you must set a sequence on each of your file, and join the sequences in order to get your result. Careful, you are totally depending on the order of data in each file.
Then you can have this job :
tFileInputdelimited_1 --> tMap_1 --->{tMap_5
tFileInputdelimited_2 --> tMap_2 --->{tMap_5
tFileInputdelimited_3 --> tMap_3 --->{tMap_5
tFileInputdelimited_4 --> tMap_4 --->{tMap_5
In tMaps from 1 to 4, copy the input to the output, and add a "sequence" column (datatype integer) to your output, populate it with Numeric.sequence("IDENTIFIER1",1,1) . Then you have 2 columns in output : your data and a unique sequence.
Be careful to use different identifiers for each source.
Then in tMap_5, just join the different sequences, and get your inputColumn.

Merge all the data in the second column for each unique value in the first column

I have two columns of data. Some of the data in the first column repeats (they represent questions). The data in the second column is unique (they represent multiple answers to the same question).
I need to merge all the data in the second column for each unique value in the first column. e.g.:
Q,A
1,yes.
1,is possible.
2,no.
2,not possible.
2,cannot do this.
2,impossible.
3,maybe.
merged to:
Q,A
1,yes.is possible.
2,no.not possible.cannot do this.impossible.
3,maybe.
Something like this is crude but may be adequate:
=IF(A1=A2,C1&B2,B2)
copied down to suit. Then select the last entry (identifiable with something like =A1=A2 copied down to suit) for each Question number.
Questions in column A sorted in order
Answers in column B
In C1 use =B1
In C2 use =if(a2=A1,C1&B2,B2)
Drag down formula in C2.
It will keep adding the lines together as long as the question remains the same. When it gets to a new question, it'll start a new string. The last time each question is listed will be the complete string in column C.
Create a 2 column project in Google Refine
Sort by Q column (if not already sorted) and make sort permanent
Blank Down on Q column to remove duplicate values
On A column, do Edit Cells -> Merge multi-valued cells