Skip lines while reading csv - Azure Data Factory - azure-data-factory

I am trying to copy data from Blob to Azure SQL using data flows within a pipeline.
Data Files is in csv format and the Header is at 4th row in the csv file.
i want to use the header as is what is available in the csv data file.
I want to loop through all the files and upload data.
Thanks

Add a Surrogate Key transformation and then a Filter transformation to filter out row number 4.

You need to first uncheck the "First row as header" in your CSV dataset. Then you can use the "Skip line count" field in the copy data activity source tab and skip any number of lines you want.

Related

Data Factory - Can I use the date field in a CSV to determine the destination folder in Copy Activity

I have some CSV files that I want to copy to a specific folder in ADLS based on the date column within the file.
i.e. CSV file has a column named "date" that reads "2022-02-23" on all rows. I want to copy that file to a folder that has the corresponding year and month, such as "/curated/UK/ProjectABC/2022/02"
I've got a Lookup activity that's pointing to the source CSV file and populating a Set Variable activity with the month using this dynamic content - #substring(string(activity('Lookup1').output.firstrow.date),5,2)
Would this be the right approach, to use a variable?
I cant use variables in the Directory portion of the Sink Dataset, as far as I know.
Have you come across this situation before?
Sounds like you're on the right path. You can use absolutely use Dataset parameters:
Then populate them in your pipeline using a variable (or parameter, or expression):

ADF Add Header to CSV Sink

Anyone know how to add header to csv sink? I have a data flow that's source is a database table. Then I have used derived column and concatenated the columns to make one column and split the data in the column by commas (done in the source via a query). I have then selected the column that has been concatenated to be export to csv.
Data example:
Matt,Smith,10
Therefore I technically only have one column, however, I want to add a header for each section of the data.
Desired output:
FirstName,LastName,Age
Matt,Smith,10
You can add headers in CSV file.
Select Data Flow Activity.
Select Source and use Select activity.
Add column names as shown in below screenshot.
Finally add Sink and run Pipeline.

Handling delimited files in Azure Data factory

I have got a very large table with around 28 columns and 900k records.
I converted it to CSV file (Pipe separated) and then tried to use that file for feeding another table using ADF itself.
When I tried to use that file, it keeps triggering an error saying some column datatype mismatch.
So excavating more into the data I have found few rows having Pipe (|) symbol in their text itself. So at the time coverting it back, the text after the pipe been considered for the next column and thus the error.
So how to handle the conversion into CSV efficiently when there are texts with delimiters in their columns.
Option1: If there is a possibility, I would suggest changing the delimiter to other than pipe(|), as the column value also contains pipe in its text.
Option2: In the CSV dataset, select a Quote character to identify the columns.
Step1: Copying data from table1 to CSV.
Source:
Sink CSV dataset:
Output:
Step2: Loading same CSV data to table2 with a copy activity.
CSV output file of Step1.
Source CSV dataset:
Sink dataset:
Output:

Custom Row Delimiter in Azure Data Factory (ADF)

I have a CSV file that terminates a row with Comma and CRLF.
I set my dataset to ",\r\n" but when I ran the pipeline, it won't accept this, thinking it's multiple values in the delimiter... If I don't put the comma in the dataset row delimiter, when pipeline runs, it thinks that there's an unnamed header. Is it possible in ADF to have this combination as a delimeter (comma + crlf) - ",\r\n"?
FirstName,LastName,Occupation,<CRLF Char>
Michael,Jordan,Doctor,<CRLF Char>
Update:
When running the copy activity, I encountered the same problem as you.
Then I selcet Line feed(\n)as Row delimiter at Source.
Add Column mapping as follows:
When I run debug, the csv file was successfully copied into Azure SQL table.
I created a simple test. Do you just want ADF to read 3 columns?
This is the origin csv file.
In ADF, If we use default Row delimiter and Column delimiter settings, select First row as header.
We also can select Edit and enter \r\n at Row delimiter field.
You can import schema here.

How can we Exclude the unnecessary rows from Excel File while doing Data Load using Copy activity in ADF

I have a excel file which is semi-structured. There is data in a table, but there are dividers in certain rows that should needs to be ignored.
The processing of the data should start with the column headers(Col1 , col2 ....) and only process the rows with actual data.
Could anyone suggest the way to achieve this using copy activity in adf .
My source is xls file and target is ADLA (Parquet file)
Any help appreciated. Thanks in advance.
The most closest solution is that you need manually choose data range in the excel file:
Ref: https://learn.microsoft.com/en-us/azure/data-factory/format-excel#dataset-properties
HTH.