Azure Data Factory V2 Copy Activity - Save List of All Copied Files

I have pipelines that copy files from on-premises to different sinks, such as on-premises and SFTP.
I would like to save a list of all files that were copied in each run for reporting.
I tried using Get Metadata and For Each, but not sure how to save the output to a flat file or even a database table.
Alternatively, is it possible to find the list of objects that were copied somewhere in the Data Factory logs?
Thank you

Update:
Items: @activity('Get Metadata1').output.childItems
If you want to record the source file names, yes, we can. As you said, we need to use the Get Metadata and ForEach activities.
I've created a test to save the source file names of the Copy activity into a SQL table.
As we all know, we can get the file list via Child items in the Get Metadata activity.
The dataset of the Get Metadata1 activity specifies the container, which contains several files.
The list of files in the test container is as follows:
Inside the ForEach activity, we can traverse this array. I set up a Copy activity named Copy-Files to copy files from source to destination.
@item().name represents each file in the test container. I keyed in the dynamic content @item().name to specify the file name, so the ForEach sequentially passes in the file names from the test container. This executes the copy task in batches, with each batch passing in one file name to be copied, so that we can record each file name into the database table later.
Then I set another Copy activity to save the file names into a SQL table. Here I'm using Azure SQL and I've created a simple table.
create table dbo.File_Names(
Copy_File_Name varchar(max)
);
As this post also said, we can use the similar syntax select '@{item().name}' as Copy_File_Name to access activity data in ADF. Note: the alias name should be the same as the column name in the SQL table.
Then we can sink the file names into the SQL table.
Select the table which was created previously.
After I run debug, I can see all the file names are saved into the table.
If you want to add more information, you can refer to the post I mentioned previously.
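Putting the steps above together, the ForEach with the recording Copy activity might look roughly like this in pipeline JSON. This is a sketch, not the original poster's exact definition; the activity names and source/sink types are placeholders:

```json
{
  "name": "ForEach1",
  "type": "ForEach",
  "typeProperties": {
    "items": {
      "value": "@activity('Get Metadata1').output.childItems",
      "type": "Expression"
    },
    "isSequential": true,
    "activities": [
      {
        "name": "Copy-Files",
        "type": "Copy",
        "typeProperties": {
          "source": { "type": "BlobSource" },
          "sink": { "type": "BlobSink" }
        }
      },
      {
        "name": "Record-File-Name",
        "type": "Copy",
        "dependsOn": [
          { "activity": "Copy-Files", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": {
          "source": {
            "type": "AzureSqlSource",
            "sqlReaderQuery": "select '@{item().name}' as Copy_File_Name"
          },
          "sink": { "type": "AzureSqlSink" }
        }
      }
    ]
  }
}
```

The sqlReaderQuery uses the same select '@{item().name}' as Copy_File_Name pattern as the answer; the alias must match the Copy_File_Name column in dbo.File_Names.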

Related

How to dynamically load the names of files in different folders to Azure SQL DB

I need to create an ADF pipeline. I have 6 different folders in my Blob storage and each folder contains 20 files. I need to load all of these file names, along with some other pipeline parameters, into an Azure SQL DB table using a stored procedure. The file names start with the letter Q. How can we achieve this?
You can get the list of files using a Get Metadata activity and pass each file name to a Stored Procedure activity by looping through a ForEach activity.
Source folders:
Files in folders:
ADF pipeline:
Get the list of files from the Get Metadata activity. Create a dataset with folder and filename parameters and pass values from the Get Metadata activity as below.
If you provide Q* in the file name, you will get all the files that start with Q.
Get Metadata output:
Pass the child items to ForEach activity.
@activity('Get Metadata1').output.childItems
Add a Stored Procedure activity inside the ForEach activity and pass the current item name to the stored procedure parameter. You can add more parameters and pass pipeline parameters to the stored procedure.
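As a rough sketch, the Stored Procedure activity inside the ForEach could be defined as below. The procedure name usp_InsertFileName and the parameter name FileName are hypothetical, not taken from the question:

```json
{
  "name": "Stored procedure1",
  "type": "SqlServerStoredProcedure",
  "typeProperties": {
    "storedProcedureName": "[dbo].[usp_InsertFileName]",
    "storedProcedureParameters": {
      "FileName": {
        "value": {
          "value": "@item().name",
          "type": "Expression"
        },
        "type": "String"
      }
    }
  }
}
```

Additional parameters (for example pipeline parameters via @pipeline().parameters.yourParam) can be added to storedProcedureParameters in the same shape.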

Required help in removing the column from text file using ADF

I have a sample file like this. Using Data Factory, I need to create another text file as output with the first two columns removed. Is there any way to generate a file like the one below?
Source file:
Output file :
Core Data Factory (i.e. not including Mapping Data Flows) is not gifted with many abilities to do data transformation (which this is), however it can do some things. It can change formats (e.g. .csv to JSON), it can add some metadata columns (like $$FILENAME), and it can remove columns, simply by using the mapping in the Copy activity.
1. Add a Copy activity to your pipeline and set the source to your main file.
2. Set the sink to your target file name. It can be the same name as your original file, but I would make it different for audit-trail purposes.
3. Import the schema of your file; make sure the separator in the dataset is set to semicolon ';'.
4. Now press the Trash can button to delete the mappings for columns 1 and 2.
5. Run your pipeline. The output file should not have the two columns.
My results:
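For reference, the resulting mapping in the Copy activity corresponds to a translator fragment like the one below, where only the columns you keep are mapped. The column names Column_3 and Column_4 are placeholders; yours will come from the imported schema:

```json
"translator": {
  "type": "TabularTranslator",
  "mappings": [
    { "source": { "name": "Column_3" }, "sink": { "name": "Column_3" } },
    { "source": { "name": "Column_4" }, "sink": { "name": "Column_4" } }
  ]
}
```

Columns 1 and 2 simply have no mapping entry, so they are dropped from the output file.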
You can accomplish this task by using the Select transformation in a mapping data flow in Azure Data Factory (ADF). You can delete any unwanted columns from your delimited text file in the data flow transformation.
I tested the same in my environment and it is working fine.
Please follow the below steps:
1. Create the Azure Data Factory using the Azure portal.
2. Upload the data at the source (e.g., a blob container).
3. Create a linked service to connect the blob storage with ADF as shown below.
4. Then create DelimitedText datasets using the above linked service for the source and sink files. In the source dataset, set the Column delimiter to Semicolon (;). Also, in the Schema tab, select Import schema From connection/store.
5. Create a data flow. Select the source dataset from your datasets list. Click on the + symbol and add Select from the options as shown below.
6. In the settings, select the columns you want to delete and then click on the delete option.
7. Add the sink at the end. In the Sink tab, use the sink dataset you created earlier in step 4. In the Settings tab, for the File name option select Output to single file and give the filename in the option below.
8. Now create a pipeline and use the Data flow activity. Select the data flow you created. Click on the Trigger Now option to run the pipeline.
Check the output file at the sink location. You can see my input and output files below.
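The source dataset described above might look roughly like this in JSON; the dataset, container, and linked-service names are placeholders:

```json
{
  "name": "SourceDelimitedText",
  "properties": {
    "type": "DelimitedText",
    "linkedServiceName": {
      "referenceName": "AzureBlobStorage1",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "container": "input"
      },
      "columnDelimiter": ";",
      "firstRowAsHeader": true
    }
  }
}
```

The sink dataset is the same shape, pointing at the output container.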

Copying header files from a txt file to other csv files using azure data factory

I am new to Azure Data Factory. I have a list of txt files (created by splitting a huge CSV file, flights.txt). The txt files are named flightaa, flightab, flightac, etc. Only the first file, flightaa, has the header.
All my files are stored in the Azure blob storage in the input container. I am transforming the file name from flightaa to flight_1.csv with a header for each file. I am using the Azure data factory to copy from the input container to the output along with a header for each file.
How can I store the header from one file and use it as a header for all output files in azure data factory? Any suggestions?
You can manually extract your header from flightaa.txt to a new file to use as a source, and leave the First row as header option unchecked.
Then you can use a Get Metadata activity and loop over all your files to add the header by using Union in a Data Flow.
Details:
1. Create 4 datasets and two variables as below.
2. Get all your txt files with a Get Metadata activity.
3. Loop the child items with a ForEach activity and check the Sequential option.
4. Get the array index by using two Set Variable activities and pass it to the Data Flow.
5. Create two sources in the Data Flow and union the two sources. Source1 is content_file, Source2 is headers.
6. Sink to your output container.
Add a Data Flow activity to your pipeline. Inside the data flow, make one source for the file with the header and a second source that uses a wildcard to read the files with no header.
After the first source with the header, add a Union transformation with the second source.

How to copy data from a csv to an Azure SQL Server table?

I have a dataset based on a csv file. This exposes data as follows:
Name,Age
John,23
I have an Azure SQL Server instance with a table named: [People]
This has columns
Name, Age
I am using the Copy Data task activity and trying to copy data from the csv data set into the azure table.
There is no option to indicate the table name as a source. Instead I have a space to input a Stored Procedure name?
How does this work? Where do I put the target table name in the image below?
You should DEFINITELY have a table name to write to. If you don't have a table, something is wrong with your setup. Anyway, make sure you have a table to write to and that the field names in your table match the fields in the CSV file. Then follow the steps outlined in the link below. There are several steps to click through, but all are pretty intuitive, so just follow the instructions step by step and you should be fine.
http://normalian.hatenablog.com/entry/2017/09/04/233320
You can add records into the SQL Database table directly without stored procedures, by configuring the table value on the sink dataset rather than on the Copy activity, which is what is happening here.
Have a look at the below screenshot which shows the Table field within my dataset.
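For illustration, a sink dataset with the table set might look like the fragment below. The dataset and linked-service names are placeholders; older dataset definitions use a single tableName property instead of separate schema and table:

```json
{
  "name": "PeopleTable",
  "properties": {
    "type": "AzureSqlTable",
    "linkedServiceName": {
      "referenceName": "AzureSqlDatabase1",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "schema": "dbo",
      "table": "People"
    }
  }
}
```

With the table configured here, the Copy activity's sink only needs to reference this dataset; no stored procedure is required.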

Retrieve blob file name in Copy Data activity

I download json files from a web API and store them in blob storage using a Copy Data activity and binary copy. Next I would like to use another Copy Data activity to extract a value from each json file in the blob container and store the value together with its ID in a database. The ID is part of the filename, but is there some way to extract the filename?
You can do the following set of activities:
1) A Get Metadata activity: configure a dataset pointing to the blob folder, and add Child Items to the Field list.
2) A ForEach activity that takes every item from the Get Metadata activity and iterates over them. To do this, configure Items to be @activity('NameOfGetMetadataActivity').output.childItems
3) Inside the ForEach, you can extract the filename of each file using the following expression: @item().name
After this continue as you see fit, either adding functions to get the ID or copy the entire name.
Hope this helped!
After setting up the source dataset (file/file path with a wildcard) and the destination/sink as a table:
Add a Copy activity and set up the source and sink.
Add Additional columns.
Provide a name for the additional column and the value "$$FILEPATH".
Import the mapping and voila: your additional column should be in the list of source columns, marked "Additional".
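In pipeline JSON, the Additional columns setting corresponds to the additionalColumns property on the Copy activity source, roughly as below. The column name filePath is just an example; $$FILEPATH is the reserved value for the source file path:

```json
"source": {
  "type": "DelimitedTextSource",
  "additionalColumns": [
    {
      "name": "filePath",
      "value": "$$FILEPATH"
    }
  ]
}
```

The added column then flows through the mapping like any source column, so it can be written to the database alongside the extracted values.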