Custom Row Delimiter in Azure Data Factory (ADF)

I have a CSV file in which each row ends with a comma followed by CRLF.
I set the row delimiter in my dataset to ",\r\n", but when I ran the pipeline it would not accept this, treating it as multiple values in the delimiter. If I leave the comma out of the row delimiter, the pipeline thinks there is an unnamed header. Is it possible in ADF to use this combination (comma + CRLF), ",\r\n", as a row delimiter?
FirstName,LastName,Occupation,<CRLF Char>
Michael,Jordan,Doctor,<CRLF Char>

Update:
When running the copy activity, I encountered the same problem as you.
I then selected Line feed (\n) as the Row delimiter at the Source.
I added a column mapping as follows:
When I ran debug, the CSV file was successfully copied into the Azure SQL table.
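To illustrate why the trailing comma shows up as an unnamed header, here is a minimal sketch using Python's standard csv module rather than ADF itself: each row parses into four fields, the last one empty, and keeping only the first three fields plays the role of mapping just the three real columns. The helper name read_first_columns is hypothetical.

```python
import csv
import io

# Sample from the question: every row ends with a comma followed by CRLF.
raw = "FirstName,LastName,Occupation,\r\nMichael,Jordan,Doctor,\r\n"

# Parsed with a normal line ending, the trailing comma yields a 4th, empty field
# (this empty field is what shows up as the "unnamed header").
raw_fields = next(csv.reader(io.StringIO(raw, newline="")))
print(raw_fields)  # ['FirstName', 'LastName', 'Occupation', '']

def read_first_columns(text: str, keep: int) -> list[list[str]]:
    """Hypothetical helper: parse rows and keep only the first `keep` fields,
    similar in spirit to mapping only the real columns in the copy activity."""
    reader = csv.reader(io.StringIO(text, newline=""))
    return [row[:keep] for row in reader]

rows = read_first_columns(raw, keep=3)
print(rows)  # [['FirstName', 'LastName', 'Occupation'], ['Michael', 'Jordan', 'Doctor']]
```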
I created a simple test. Do you just want ADF to read 3 columns?
This is the original CSV file.
In ADF, if we use the default Row delimiter and Column delimiter settings, we select First row as header.
We can also select Edit and enter \r\n in the Row delimiter field.
You can import the schema here.

Related

Is it possible to generate the space separated header row using data factory copy activity?

I am using Azure SQL as the source dataset and a delimited file as the sink dataset in the copy activity.
I tried the copy activity, but First row as header gives comma-separated headers.
Is there a way to change the header output style?
Please note the spacing is unequal (h3...h4).
In this repro, I used:
1 space between the 1st and 2nd columns,
2 spaces between the 2nd and 3rd columns,
3 spaces between the 3rd and 4th columns.
I also gave column2 and column3 the same column name. The approach is as follows.
Data is copied from the Azure SQL database to the data lake in comma-delimited format as a staging file.
This staging file is used as the source in a Data flow activity.
In the source dataset, First row as header is not checked.
Data preview of the Source transformation:
A Derived column transformation is added to change the column names of column2 and column3.
In this case, date_col in column1 is the header data. Thus, when column1 is 'date_col', the values of column2 and column3 are replaced with the same column name.
column_2 = iif(Column_1=='date_col','ECIX',Column_2);
column_3 = iif(Column_1=='date_col','ECIX',Column_3);
Another Derived column transformation is added to concatenate all the columns with spaces. The new column is named concat, and its value is
concat(Column_1,' ',Column_2,'  ',Column_3,'   ',Column_4)
A Select transformation is added, and only the concat column is selected.
In the sink, a new delimited-text dataset is added as the sink dataset. In the sink dataset, First row as header is also not checked.
After the pipeline runs, the target file looks like this.
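The data flow logic above can be sketched in plain Python to make the order of operations clear: the header arrives as an ordinary row (First row as header unchecked), only the row whose first value is 'date_col' gets 'ECIX' in columns 2 and 3, and every row is then joined with 1, 2 and 3 spaces as the repro describes. The column names other than date_col, the sample values, and the helper names are hypothetical; this is an illustration, not ADF expression syntax.

```python
# Staging rows read with "First row as header" unchecked, so the header row
# arrives as ordinary data. Values other than 'date_col' are hypothetical.
rows = [
    ["date_col", "col_a", "col_b", "amount"],
    ["2023-01-01", "x", "y", "10"],
]

def rename_header(row: list[str]) -> list[str]:
    """Mimic the two iif() derived columns: only the header row is changed."""
    c1, c2, c3, c4 = row
    if c1 == "date_col":
        c2 = c3 = "ECIX"
    return [c1, c2, c3, c4]

def concat_unequal(row: list[str]) -> str:
    """Mimic concat(): 1, 2 and 3 spaces between consecutive columns."""
    c1, c2, c3, c4 = row
    return c1 + " " + c2 + "  " + c3 + "   " + c4

lines = [concat_unequal(rename_header(r)) for r in rows]
print("\n".join(lines))
```

Because the header is just another row here, the same concatenation that formats the data also produces the space-separated header line.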
Keeping Azure SQL as the source in the data flow, I created a single derived column 'OUTDC' and added all the source columns to it like this:
(h1)+' '+(h2)+' '+(h3)
Then I fed OUTDC to a delimited-text sink and set the Headers option to a single string like this:
['h1 h2 h2']

ADF Add Header to CSV Sink

Does anyone know how to add a header to a CSV sink? I have a data flow whose source is a database table. I used a derived column to concatenate the columns into one column, with the values separated by commas (done in the source via a query). I then selected the concatenated column to be exported to CSV.
Data example:
Matt,Smith,10
Therefore, I technically only have one column; however, I want to add a header for each section of the data.
Desired output:
FirstName,LastName,Age
Matt,Smith,10
You can add headers to the CSV file.
Select the Data Flow activity.
Select the Source and add a Select transformation.
Add the column names as shown in the screenshot below.
Finally, add a Sink and run the pipeline.
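A minimal Python sketch of the idea behind this answer, assuming the data is kept as separate, named columns instead of one pre-concatenated string: once each column has a name, the writer can emit the header row by itself. csv.DictWriter only stands in for the ADF sink as an analogy; the column names and values come from the question.

```python
import csv
import io

# Column names and values from the question's desired output.
fieldnames = ["FirstName", "LastName", "Age"]
rows = [{"FirstName": "Matt", "LastName": "Smith", "Age": 10}]

buf = io.StringIO()
# lineterminator="\n" keeps the output Unix-style; the csv default is "\r\n".
writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
writer.writeheader()    # header row built from the column names
writer.writerows(rows)  # one line per record, fields joined by commas
print(buf.getvalue())
```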

Handling delimited files in Azure Data factory

I have a very large table with around 28 columns and 900k records.
I converted it to a CSV file (pipe-separated) and then tried to use that file to load another table, also with ADF.
When I tried to use that file, it kept throwing an error about a column data type mismatch.
Digging deeper into the data, I found a few rows containing the pipe (|) character in their text. So when converting back, the text after the pipe was treated as the next column, hence the error.
How can I handle the conversion to CSV efficiently when column values contain the delimiter?
Option 1: If possible, I would suggest changing the delimiter to something other than pipe (|), since the column values also contain pipes.
Option 2: In the CSV dataset, select a Quote character so that column values containing the delimiter are enclosed in quotes.
Step 1: Copy data from table1 to CSV.
Source:
Sink CSV dataset:
Output:
Step 2: Load the same CSV data into table2 with a copy activity.
CSV output file of Step 1:
Source CSV dataset:
Sink dataset:
Output:
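The effect of Option 2 can be reproduced with Python's csv module as a sketch (the sample rows are hypothetical): with a quote character set, only the value containing | is wrapped in quotes on write, so reading it back with the same delimiter and quote character restores the original columns, whereas a naive split on | yields an extra column.

```python
import csv
import io

# Hypothetical rows; the middle value of the second row contains the delimiter.
rows = [["1", "plain text", "10"], ["2", "text with | pipe", "20"]]

def to_psv(data: list[list[str]]) -> str:
    """Write pipe-separated text, quoting only fields that need it."""
    buf = io.StringIO()
    # QUOTE_MINIMAL wraps a field in quotechar only when it contains the
    # delimiter, the quote character, or a line break.
    writer = csv.writer(buf, delimiter="|", quotechar='"',
                        quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerows(data)
    return buf.getvalue()

def from_psv(text: str) -> list[list[str]]:
    """Read pipe-separated text with the same delimiter and quote character."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="|", quotechar='"')
    return list(reader)

text = to_psv(rows)
print(text)
print(from_psv(text) == rows)                       # True
print([len(line.split("|")) for line in text.splitlines()])  # [3, 4]
```

In this sketch the writer and reader use the same quote character; presumably the sink dataset in Step 1 and the source dataset in Step 2 need matching settings as well.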

Skip lines while reading csv - Azure Data Factory

I am trying to copy data from Blob to Azure SQL using data flows within a pipeline.
The data files are in CSV format, and the header is on the 4th row of each file.
I want to use the header as it appears in the CSV data file.
I want to loop through all the files and upload the data.
Add a Surrogate Key transformation and then a Filter transformation to filter out row number 4.
You need to first uncheck the "First row as header" in your CSV dataset. Then you can use the "Skip line count" field in the copy data activity source tab and skip any number of lines you want.
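As a rough Python illustration of skipping a fixed number of lines before parsing (the file contents and the helper name read_with_skip are hypothetical): skipping three lines makes the header on line 4 the first line read, so it can then be used as the header. This does not model exactly how ADF combines Skip line count with First row as header.

```python
import csv
import io

# Hypothetical file: three preamble lines, header on line 4, then data rows.
raw = "Report title\nGenerated 2023-01-01\nSource: sales\nId,Name\n1,Alpha\n2,Beta\n"

def read_with_skip(text: str, skip_lines: int) -> list[dict[str, str]]:
    """Discard the first `skip_lines` lines, then parse the rest with a header."""
    lines = io.StringIO(text, newline="")
    for _ in range(skip_lines):  # similar in spirit to "Skip line count"
        next(lines)
    # The next line is now the first one read, so DictReader uses it as the header.
    return list(csv.DictReader(lines))

records = read_with_skip(raw, skip_lines=3)
print(records)
```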

PostgreSQL: Execute query write results to csv file - datatype money gets broken into two columns because of comma

After running "Execute query, write results to file", the money-type columns in my output file get split into two columns. E.g., if my revenue is $500, it is displayed correctly, but if my revenue is $1,500.00 it gets split into two columns, $1 and 500.00.
Can you please help me get my results into a CSV file with a single column for the money data type?
What is this command "execute query write results to file"? Do you mean COPY? If so, have a look at the FORCE QUOTE option http://www.postgresql.org/docs/current/static/sql-copy.html
E.g.:
COPY yourtable to '/some/path/and/file.csv' CSV HEADER FORCE QUOTE *;
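To see the problem and the fix from the consumer's side, here is a small Python sketch (not PostgreSQL itself; the row values are hypothetical apart from $1,500.00): splitting an unquoted line on commas breaks the money value in two, while quoting every field, similar in spirit to FORCE QUOTE *, keeps it in one column when the line is parsed by a CSV reader.

```python
import csv
import io

# One hypothetical output row: the money value contains a comma.
row = ["1001", "$1,500.00"]

# Unquoted output: the comma inside the value acts as a column separator.
unquoted = ",".join(row)
print(unquoted.split(","))  # ['1001', '$1', '500.00']

# Quote every field, similar in spirit to FORCE QUOTE *.
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n").writerow(row)
quoted = buf.getvalue()
print(quoted)  # "1001","$1,500.00"

# A CSV-aware reader keeps the quoted money value in a single column.
print(next(csv.reader(io.StringIO(quoted))))  # ['1001', '$1,500.00']
```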
Note: if the application that is consuming the csv files still fails because of the comma, you can change the delimiter from "," to whatever works for you (eg. "|").
Additionally, if you want TSV rather than CSV, you can omit the CSV HEADER keywords (and FORCE QUOTE, which is only allowed in CSV mode), and the results will be output in tab-separated format.
In some regions the comma is the computer's list separator; in others, the semicolon is. So I think you need to replace the comma when you write the values to CSV.