ADF Add Header to CSV Sink - azure-data-factory

Anyone know how to add header to csv sink? I have a data flow that's source is a database table. Then I have used derived column and concatenated the columns to make one column and split the data in the column by commas (done in the source via a query). I have then selected the column that has been concatenated to be export to csv.
Data example:
Matt,Smith,10
Therefore I technically only have one column, however, I want to add a header for each section of the data.
Desired output:
FirstName,LastName,Age
Matt,Smith,10

You can add headers in CSV file.
Select Data Flow Activity.
Select Source and use Select activity.
Add column names as shown in below screenshot.
Finally add Sink and run Pipeline.

Related

Is it possible to generate the space separated header row using data factory copy activity?

I am using azure sql as source dataset and delimited file as sink dataset in the copy activity.
I tried copy activity but First row as header gives comma separated headers.
Is there way to change the header output style ?
Please note spacing is unequal (h3...h4)
In this repro, I tried to give
1 space between 1st and 2nd column,
2 spaces between 2nd and 3rd column,
3 spaces between 3rd and 4th column.
Also, I tried to give same column name for column2 and column3. The approach is as follows.
Data is copied from Azure SQL database to datalake in comma delimitted format as a staging file.
This staging file is taken as a source in Dataflow activity.
In source dataset, first row as header is not checked.
Data preview of Source transformation:
Derived column transformation is added to change the column name of column2 and column3.
In this case, date_col of column1 is header data. Thus when column1 is 'date_col' replace column2 and column3 data with same column name.
column_2 = iif(Column_1=='date_col','ECIX',Column_2);
column_3 = iif(Column_1=='date_col','ECIX',Column_3);
Again derived column transformation is added to concat all the columns with spaces. Column name is given as concat . Value for this column is
concat(Column_1,' ',Column_2,' ',Column_3,' ',Column_4)
Select transformation is added and only concat column is selected here.
In sink, new delimited file is added as a sink dataset. And in sink dataset also , first row as header is not checked.
Output file screenshot
After pipeline is run, the target file looks like this.
Keeping the source as azure sql itself in the dataflow, I created a single derived column 'OUTDC' and added all the columns from the source like this:
(h1)+' '+(h2)+' '+(h3)
Then fed the OUTDC to a delimited sink and kept the Headers option as single string like this:
['h1 h2 h2']

ADF map source columns startswith to sink columns in SQL table

I have a ADF data flow with many csv files as a source and a SQL database as a sink. The data in the csv files are similar with 170 plus columns wide however not all of the files have the same columns. Additionally, some column names are different in each file, but each column name starts with the same corresponding 3 digits. Example: 203-student name, 644-student GPA.
Is it possible to map source columns using the first 3 characters?
Go back to the data flow designer and edit the data flow.
Click on the parameters tab
Create a new parameter and choose string array data type
For the default value as per your requirement, enter ['203-student name','203-student grade',’203-student-marks']
Add a Select transformation. The Select transformation will be used to map incoming columns to new column names for output.
We're going to change the first 3 column names to the new names defined in the parameter
To do this, add 3 rule-based mapping entries in the bottom pane
For the first column, the matching rule will be position==1 and the name will be $parameter11
Follow the same pattern for column 2 and 3
Click on the Inspect and Data Preview tabs of the Select transformation to view the new column name.
Reference - https://learn.microsoft.com/en-us/azure/data-factory/tutorial-data-flow-dynamic-columns#parameterized-column-mapping

Handling delimited files in Azure Data factory

I have got a very large table with around 28 columns and 900k records.
I converted it to CSV file (Pipe separated) and then tried to use that file for feeding another table using ADF itself.
When I tried to use that file, it keeps triggering an error saying some column datatype mismatch.
So excavating more into the data I have found few rows having Pipe (|) symbol in their text itself. So at the time coverting it back, the text after the pipe been considered for the next column and thus the error.
So how to handle the conversion into CSV efficiently when there are texts with delimiters in their columns.
Option1: If there is a possibility, I would suggest changing the delimiter to other than pipe(|), as the column value also contains pipe in its text.
Option2: In the CSV dataset, select a Quote character to identify the columns.
Step1: Copying data from table1 to CSV.
Source:
Sink CSV dataset:
Output:
Step2: Loading same CSV data to table2 with a copy activity.
CSV output file of Step1.
Source CSV dataset:
Sink dataset:
Output:

Custom Row Delimiter in Azure Data Factory (ADF)

I have a CSV file that terminates a row with Comma and CRLF.
I set my dataset to ",\r\n" but when I ran the pipeline, it won't accept this, thinking it's multiple values in the delimiter... If I don't put the comma in the dataset row delimiter, when pipeline runs, it thinks that there's an unnamed header. Is it possible in ADF to have this combination as a delimeter (comma + crlf) - ",\r\n"?
FirstName,LastName,Occupation,<CRLF Char>
Michael,Jordan,Doctor,<CRLF Char>
Update:
When running the copy activity, I encountered the same problem as you.
Then I selcet Line feed(\n)as Row delimiter at Source.
Add Column mapping as follows:
When I run debug, the csv file was successfully copied into Azure SQL table.
I created a simple test. Do you just want ADF to read 3 columns?
This is the origin csv file.
In ADF, If we use default Row delimiter and Column delimiter settings, select First row as header.
We also can select Edit and enter \r\n at Row delimiter field.
You can import schema here.

Skip lines while reading csv - Azure Data Factory

I am trying to copy data from Blob to Azure SQL using data flows within a pipeline.
Data Files is in csv format and the Header is at 4th row in the csv file.
i want to use the header as is what is available in the csv data file.
I want to loop through all the files and upload data.
Thanks
Add a Surrogate Key transformation and then a Filter transformation to filter out row number 4.
You need to first uncheck the "First row as header" in your CSV dataset. Then you can use the "Skip line count" field in the copy data activity source tab and skip any number of lines you want.