Handling delimited files in Azure Data Factory

I have a very large table with around 28 columns and 900k records.
I converted it to a pipe-separated CSV file and then tried to use that file to feed another table, again using ADF.
When I used that file, the copy kept failing with a column datatype mismatch error.
Digging further into the data, I found a few rows whose text contains the pipe (|) symbol itself. When the file is read back, the text after the pipe is treated as belonging to the next column, hence the error.
How can I handle the conversion to CSV efficiently when some column values contain the delimiter?

Option 1: If possible, change the delimiter to something other than pipe (|), since some column values contain pipes in their text.
Option 2: In the CSV dataset, select a quote character so that values containing the delimiter are enclosed and parsed as a single column.
Step 1: Copy data from table1 to CSV.
Source: (screenshot)
Sink CSV dataset: (screenshot)
Output: (screenshot)
Step 2: Load the same CSV data into table2 with a copy activity.
The source is the CSV output file of Step 1.
Source CSV dataset: (screenshot)
Sink dataset: (screenshot)
Output: (screenshot)
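Outside of ADF, the effect of Option 2 can be sketched with Python's csv module (the file name and sample rows below are made up for illustration): with a quote character configured, values that contain the pipe are enclosed on write and come back as a single column on read.

import csv

rows = [
    ["1", "plain value", "text with a | pipe inside"],
    ["2", "another value", "no pipe here"],
]

# Write pipe-delimited data; QUOTE_MINIMAL quotes only the values
# that contain the delimiter, e.g. "text with a | pipe inside".
with open("table1.csv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="|", quotechar='"', quoting=csv.QUOTE_MINIMAL)
    writer.writerow(["id", "name", "description"])
    writer.writerows(rows)

# Reading back with the same quote character keeps the embedded
# pipe inside one column instead of splitting the row.
with open("table1.csv", newline="") as f:
    for row in csv.reader(f, delimiter="|", quotechar='"'):
        print(len(row), row)  # always 3 columns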

Related

Is it possible to generate a space-separated header row using the Data Factory copy activity?

I am using Azure SQL as the source dataset and a delimited file as the sink dataset in the copy activity.
I tried the copy activity, but First row as header produces comma-separated headers.
Is there a way to change the header output style?
Please note the spacing is unequal (h3...h4).
In this repro, I tried to give
1 space between the 1st and 2nd columns,
2 spaces between the 2nd and 3rd columns,
3 spaces between the 3rd and 4th columns.
Also, I gave the same column name to column2 and column3. The approach is as follows.
Data is copied from the Azure SQL database to the data lake in comma-delimited format as a staging file.
This staging file is taken as the source in a Data Flow activity.
In the source dataset, first row as header is not checked.
Data preview of the source transformation: (screenshot)
A derived column transformation is added to change the column names of column2 and column3.
In this case, the 'date_col' value in column1 identifies the header row. So when Column_1 is 'date_col', the Column_2 and Column_3 data are replaced with the same column name:
column_2 = iif(Column_1=='date_col','ECIX',Column_2);
column_3 = iif(Column_1=='date_col','ECIX',Column_3);
Another derived column transformation is added to concatenate all the columns with the required spaces. The column name is given as concat, and the value for this column is
concat(Column_1,' ',Column_2,'  ',Column_3,'   ',Column_4)
A select transformation is added, and only the concat column is selected here.
In the sink, a new delimited file is added as the sink dataset; in the sink dataset, first row as header is likewise not checked.
After the pipeline runs, the target file looks like this: (screenshot of the output file)
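For reference, the same transformation chain can be sketched in Python (staging.csv and target.txt are hypothetical file names; the 'date_col'/'ECIX' values come from the repro above):

import csv

# Source: read the staging file without treating the first row as a
# header, mirroring the unchecked "first row as header" setting.
with open("staging.csv", newline="") as f:
    rows = list(csv.reader(f))

out_lines = []
for c1, c2, c3, c4 in rows:  # the repro has exactly 4 columns
    # Derived column step: on the header row (Column_1 == 'date_col'),
    # give column2 and column3 the same name.
    if c1 == "date_col":
        c2 = c3 = "ECIX"
    # Concat step: join with 1, 2 and 3 spaces respectively.
    out_lines.append(c1 + " " + c2 + "  " + c3 + "   " + c4)

# Sink: single-column output with no header row.
with open("target.txt", "w", newline="") as f:
    f.write("\n".join(out_lines) + "\n")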
Alternatively, keeping Azure SQL itself as the source in the data flow, I created a single derived column 'OUTDC' and added all the columns from the source like this:
(h1)+' '+(h2)+' '+(h3)
Then I fed OUTDC to a delimited sink and set the Headers option to a single string like this:
['h1 h2 h2']

ADF Add Header to CSV Sink

Does anyone know how to add a header to a CSV sink? I have a data flow whose source is a database table. I used a derived column to concatenate the columns into one column, with the data split by commas (done in the source via a query). I then selected the concatenated column to be exported to CSV.
Data example:
Matt,Smith,10
So technically I only have one column; however, I want to add a header for each section of the data.
Desired output:
FirstName,LastName,Age
Matt,Smith,10
You can add headers to the CSV file.
Select the Data Flow activity.
Select the source and add a select transformation.
Add the column names as shown in the screenshot below. (screenshot)
Finally, add the sink and run the pipeline.
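Outside of a data flow, the same fix is simply writing the header line before the data; a minimal Python sketch (hypothetical file names):

import csv

header = ["FirstName", "LastName", "Age"]

with open("data.csv", newline="") as src, \
     open("with_header.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    writer.writerow(header)            # the header row the sink was missing
    writer.writerows(csv.reader(src))  # existing rows, e.g. Matt,Smith,10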

Custom Row Delimiter in Azure Data Factory (ADF)

I have a CSV file in which each row is terminated with a comma followed by CRLF.
I set my dataset's row delimiter to ",\r\n", but when I ran the pipeline it would not accept this, treating it as multiple values in the delimiter. If I leave the comma out of the dataset's row delimiter, the pipeline thinks there is an unnamed header. Is it possible in ADF to have this combination (comma + CRLF), i.e. ",\r\n", as a row delimiter?
FirstName,LastName,Occupation,<CRLF Char>
Michael,Jordan,Doctor,<CRLF Char>
Update:
When running the copy activity, I encountered the same problem as you.
I then selected Line feed (\n) as the row delimiter at the source.
Column mapping is added as follows: (screenshot)
When I ran the debug, the CSV file was successfully copied into the Azure SQL table.
I created a simple test. Do you just want ADF to read 3 columns?
This is the original CSV file: (screenshot)
In ADF, if we use the default row delimiter and column delimiter settings, select First row as header.
We can also select Edit and enter \r\n in the Row delimiter field.
You can import the schema here: (screenshot)
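To see why both workarounds behave as they do, here is a small Python sketch (the sample content is made up from the question): with \n or \r\n as the row delimiter and , as the column delimiter, the trailing comma simply yields an empty extra column, which a column mapping or imported schema can ignore.

import csv
import io

raw = "FirstName,LastName,Occupation,\r\nMichael,Jordan,Doctor,\r\n"

# Each record parses into 4 fields; the last one is empty because of
# the trailing comma before the line break.
for row in csv.reader(io.StringIO(raw)):
    print(row)       # e.g. ['FirstName', 'LastName', 'Occupation', '']
    print(row[:-1])  # drop the empty field, as the column mapping does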

Read csv file excluding first column and first line

I have a CSV file containing 8 lines and 1777 columns.
I need to read all the contents in MATLAB, excluding the first line and the first column. The first line and first column contain strings, which MATLAB can't parse as numeric data.
Do you have any idea?
data = csvread(filepath);
The code above reads all the contents
As suggested, csvread with a range will read in the numeric data; for example, data = csvread(filepath, 1, 1) starts reading at the second row and second column, skipping the string header row and first column. If you would like to read in the strings as well (which are presumably column headers), you can use readtable:
t = readtable(filepath);
This will create a table with the column headers in your file as variable names of the columns of the table. This way you can keep the strings associated with the data, if need be.

PostgreSQL: Execute query, write results to CSV file - the money datatype gets broken into two columns because of the comma

After running Execute query, write results to file, the money columns in my output file get broken into two. E.g. if my revenue is $500 it is displayed correctly, but if my revenue is $1,500.00 there is an issue: it gets broken into two columns, $1 and $500.00.
Can you please help me get my results into a single CSV column for the money datatype?
What is this command "execute query write results to file"? Do you mean COPY? If so, have a look at the FORCE QUOTE option: http://www.postgresql.org/docs/current/static/sql-copy.html
E.g.:
COPY yourtable to '/some/path/and/file.csv' CSV HEADER FORCE QUOTE *;
Note: if the application consuming the CSV files still fails because of the comma, you can change the delimiter from "," to whatever works for you (e.g. "|").
Additionally, if you do not want CSV, but you do want TSV, you can omit the CSV HEADER keywords and the results will output in tab-separated format.
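As a sketch of driving the same COPY from Python (psycopg2, with placeholder connection settings, table and file names):

import psycopg2

conn = psycopg2.connect("dbname=mydb user=me")  # placeholder credentials

sql = """
COPY yourtable TO STDOUT
WITH (FORMAT csv, HEADER, FORCE_QUOTE *)
"""

# FORCE_QUOTE * quotes every non-NULL value, so $1,500.00 is written
# as "$1,500.00" and stays in a single CSV column.
with conn, conn.cursor() as cur, open("file.csv", "w", newline="") as f:
    cur.copy_expert(sql, f)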
The comma is the computer's list separator in some regions, while in others the semicolon is the list separator, so I think you need to replace the comma when you write it to CSV.