Need to read a CSV file using Azure Data Factory Activity

I have a CSV file.
I need to read each row, access the column values, and run activities for each row inside a ForEach.
Which activity can I use to achieve this?

Assuming that the CSV file is in cloud storage, you can use the Lookup activity. Please be aware that the Lookup activity has a limit of 5,000 rows at this time. Once you have done that, you can use a ForEach loop to iterate through the rows, for example:
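A minimal sketch of the two activities (the activity and dataset names LookupCsvRows, ForEachRow and CsvDataset are hypothetical). The Lookup reads the delimited-text dataset with "First row only" unchecked, and the ForEach iterates over the returned rows:

{
  "activities": [
    {
      "name": "LookupCsvRows",
      "type": "Lookup",
      "typeProperties": {
        "source": { "type": "DelimitedTextSource" },
        "dataset": { "referenceName": "CsvDataset", "type": "DatasetReference" },
        "firstRowOnly": false
      }
    },
    {
      "name": "ForEachRow",
      "type": "ForEach",
      "dependsOn": [ { "activity": "LookupCsvRows", "dependencyConditions": [ "Succeeded" ] } ],
      "typeProperties": {
        "items": { "value": "@activity('LookupCsvRows').output.value", "type": "Expression" },
        "activities": [ ]
      }
    }
  ]
}

Inside the loop, each column of the current row can be referenced as @item().YourColumnName (hypothetical column name).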
Hope this helps

Related

Compare Get Metadata activity output with Lookup output

I was trying to do a delta load using ADF. I have a Get Metadata activity on my blob location to read all the files using Child Items, and a Lookup activity which reads the already-loaded file names from a SQL table. Following this I have a Filter activity which should filter the blob file list down to the new files.
Expression on Items:
@activity('GetRawFilenames').output.childItems
Expression on Condition:
@not(contains(activity('GetLoadedFilesList').output.value,item().name))
But it is still not filtering the file list in the Filter output. Could the experts please help? Thanks in advance.
Have a look at this; it describes the same problem. The expression on the condition should be:
@not(contains(join(activity('GetLoadedFilesList').output.value,','),item().name))
After joining the output of the GetLoadedFilesList activity into a single string, your code should be working.
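As a rough sketch, the Filter activity definition would then look something like this (the activity name FilterNewFiles is hypothetical; GetRawFilenames and GetLoadedFilesList are taken from the question):

{
  "name": "FilterNewFiles",
  "type": "Filter",
  "dependsOn": [
    { "activity": "GetRawFilenames", "dependencyConditions": [ "Succeeded" ] },
    { "activity": "GetLoadedFilesList", "dependencyConditions": [ "Succeeded" ] }
  ],
  "typeProperties": {
    "items": {
      "value": "@activity('GetRawFilenames').output.childItems",
      "type": "Expression"
    },
    "condition": {
      "value": "@not(contains(join(activity('GetLoadedFilesList').output.value,','),item().name))",
      "type": "Expression"
    }
  }
}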

Transform FormRecognizer output in Azure Data Factory

I want to extract the tables in a PDF file and insert that data into an output sink (CSV, Azure SQL, etc.).
I have tried the following:
Analyze the PDF document using the Form Recognizer General Document model, as I just want to scrape the tables
Call the "Get Analyze Result" REST API from ADF to get the table array
Now I want to loop through every table and its cells and insert the data into an Azure SQL table.
How do I achieve this effectively?
One way I see is to use JSON parsing along with a looping mechanism in ADF to transform the Form Recognizer output row by row.
Note: I have checked this post already:
Extract PDF table data using Azure Form Recognizer
You should be able to achieve this using the Cognitive Services API with the external call transformation: https://youtu.be/r22nthp-f4g?t=400
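If you instead stay with the pipeline approach from the question (a Web activity calling Get Analyze Result, then looping), a rough sketch follows. It assumes a v3.x response shape where the tables sit under analyzeResult.tables, and it pushes the inner loop over cells into a child pipeline (InsertCellsPipeline, a hypothetical name), since ForEach activities cannot be nested directly:

{
  "name": "ForEachTable",
  "type": "ForEach",
  "dependsOn": [ { "activity": "GetAnalyzeResult", "dependencyConditions": [ "Succeeded" ] } ],
  "typeProperties": {
    "items": {
      "value": "@activity('GetAnalyzeResult').output.analyzeResult.tables",
      "type": "Expression"
    },
    "activities": [
      {
        "name": "InsertTableCells",
        "type": "ExecutePipeline",
        "typeProperties": {
          "pipeline": { "referenceName": "InsertCellsPipeline", "type": "PipelineReference" },
          "parameters": {
            "cells": { "value": "@item().cells", "type": "Expression" }
          },
          "waitOnCompletion": true
        }
      }
    ]
  }
}

The child pipeline would take the cells array as a parameter and write the rows to Azure SQL.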

How to iterate all the Excel sheets present in an Excel file in Azure Data Factory

I have an Excel file with 5 sheets: Sheet1, Sheet2, Sheet3, Sheet4, Sheet5.
In the future, the user can add Sheet6, Sheet7 as well.
I want to create a pipeline to copy all the sheet data into a single table, iterating over all the sheets in the Excel file and copying each sheet's data to that table.
In my first approach, I created an Array variable, assigned ["Sheet1", "Sheet2", "Sheet3", "Sheet4", "Sheet5"] to it, and used a ForEach loop; inside the ForEach I copy the sheet data to a single table.
In my second approach, I use a Lookup activity to fetch the sheet info from a SQL table and then a ForEach loop to copy each sheet's data into the table.
But in both approaches, whenever a user adds a new sheet, I need to either update my ADF pipeline (approach 1) or update the SQL table where the sheet info is stored (approach 2).
I don't want to update either the pipeline or the SQL table to pick up data from the new additional sheet. It should iterate dynamically and load all the sheets' data into a single table, always doing a truncate and load.
Currently, getting the sheet names dynamically in ADF is not possible.
So you would have to write custom logic to get the list of sheet names and then iterate over it with a ForEach.
For that you can leverage Azure Automation, an Azure Function, etc., and call it from ADF, as sketched below.
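A hedged sketch of the ADF side, assuming you have written an Azure Function that reads the workbook and returns a JSON body like {"sheets": ["Sheet1", "Sheet2", ...]}; the function name, linked service name, and property names below are hypothetical:

{
  "activities": [
    {
      "name": "GetSheetNames",
      "type": "AzureFunctionActivity",
      "linkedServiceName": { "referenceName": "AzureFunctionLinkedService", "type": "LinkedServiceReference" },
      "typeProperties": {
        "functionName": "ListExcelSheets",
        "method": "POST",
        "body": { "blobPath": "container/myfile.xlsx" }
      }
    },
    {
      "name": "ForEachSheet",
      "type": "ForEach",
      "dependsOn": [ { "activity": "GetSheetNames", "dependencyConditions": [ "Succeeded" ] } ],
      "typeProperties": {
        "items": { "value": "@activity('GetSheetNames').output.sheets", "type": "Expression" },
        "activities": [ ]
      }
    }
  ]
}

Inside the loop, a Copy activity would use an Excel dataset whose sheet name is parameterized with @item().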
ADF - How to copy an Excel Sheet with Multiple Sheets into separate .csv files
I am afraid that this feature is not available at this point in time; as Excel support is still relatively new in ADF v2, this feature might not be there, but you can submit feedback or create a feature request for this with Microsoft here.
For continuing the job, you will have to follow the same approach that you are using, adding the sheet names manually.
Alternatively, if you don't want to add the sheet names yourself, you can give the user access to update an ADF parameter, by granting a custom role that only allows updating it, and then ask them to update the parameter list as soon as they add a new sheet.
Thanks!

How to get the iteration ID for items in an array using Azure Data Factory

I have a simple ADF pipeline which contains one Lookup (which loads the names of the tables to be migrated) and a ForEach activity (which contains a Copy activity and a Function App that loads the data into BigQuery). I want to get the iteration ID and send it to the Azure Function App.
Let's say the Lookup returns a JSON with three tables in it (A, B, C). I want to get the iteration ID inside the ForEach loop, for example 1 for A, 2 for B and 3 for C.
Any help on this will be highly appreciated.
I agree this is a common requirement, but there seems to be no direct way to get the array index inside the ForEach activity. However, you could try my little trick with an Azure Function activity.
Step 1: Create a text file (named index.txt) in some blob storage path and store the value 1 in it (to use as the array index).
Step 2: Inside the ForEach activity, use a Lookup activity to read the value of index.txt. The first time, it is 1.
Step 3: After that, execute an Azure Function activity to increment the value by 1, so that next time it is 2.
Step 4: When the ForEach activity finishes, you can reset the value to 0 with an Azure Function activity.
No need to create two Azure Functions, just one. You can pass a boolean parameter to distinguish whether the invocation is for the reset or for the increment, as in the sketch below.
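A rough sketch of the Azure Function activity call placed inside the loop (the function, linked service, and parameter names are hypothetical); the same function either increments or resets the counter file depending on the boolean it receives:

{
  "name": "IncrementIndex",
  "type": "AzureFunctionActivity",
  "linkedServiceName": { "referenceName": "AzureFunctionLinkedService", "type": "LinkedServiceReference" },
  "typeProperties": {
    "functionName": "UpdateIndexCounter",
    "method": "POST",
    "body": { "reset": false }
  }
}

After the ForEach completes, call the same function once more with { "reset": true } to reset index.txt.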
In the lookup table from which I was going to pick the source and destination tables/databases, I added another column with the iterator number (1, 2, 3, 4, ...) for each row that the Lookup activity retrieves.
Then inside Azure Data Factory, I read that column inside the ForEach loop. For each of the source and destination tables I have a self-made iterator and used that for my purpose. It worked perfectly fine for me.
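For example, the Lookup could return rows shaped like the following (the column names are hypothetical), and inside the ForEach the value is read as @item().IteratorId and passed to the Function App:

[
  { "SourceTable": "A", "DestinationTable": "A_dest", "IteratorId": 1 },
  { "SourceTable": "B", "DestinationTable": "B_dest", "IteratorId": 2 },
  { "SourceTable": "C", "DestinationTable": "C_dest", "IteratorId": 3 }
]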

Retrieve blob file name in Copy Data activity

I download JSON files from a web API and store them in blob storage using a Copy Data activity and binary copy. Next I would like to use another Copy Data activity to extract a value from each JSON file in the blob container and store the value together with its ID in a database. The ID is part of the filename, but is there some way to extract the filename?
You can do the following set of activities:
1) A Get Metadata activity: configure a dataset pointing to the blob folder, and add Child Items to the Field list.
2) A ForEach activity that takes every item from the Get Metadata activity and iterates over them. To do this, configure the Items to be @activity('NameOfGetMetadataActivity').output.childItems
3) Inside the ForEach, you can extract the filename of each file using the following function: item().name
After this continue as you see fit, either adding functions to get the ID or copy the entire name.
Hope this helped!
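A minimal sketch of steps 1 and 2 (the activity and dataset names GetBlobFileList, ForEachFile and BlobFolderDataset are hypothetical):

{
  "activities": [
    {
      "name": "GetBlobFileList",
      "type": "GetMetadata",
      "typeProperties": {
        "dataset": { "referenceName": "BlobFolderDataset", "type": "DatasetReference" },
        "fieldList": [ "childItems" ]
      }
    },
    {
      "name": "ForEachFile",
      "type": "ForEach",
      "dependsOn": [ { "activity": "GetBlobFileList", "dependencyConditions": [ "Succeeded" ] } ],
      "typeProperties": {
        "items": { "value": "@activity('GetBlobFileList').output.childItems", "type": "Expression" },
        "activities": [ ]
      }
    }
  ]
}

Inside the loop, @item().name gives the file name, from which the ID can be parsed with string functions such as substring() or split().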
After setting up a dataset for the source file/file path (with a wildcard) and the destination/sink table:
1) Add a Copy activity and set up the source and sink.
2) Add Additional Columns in the source.
3) Provide a name for the additional column and the value "$$FILEPATH".
4) Import the mapping and voila - your additional column should be in the list of source columns, marked "Additional".
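A sketch of what the Copy activity would contain (the activity, column, and sink names are hypothetical); $$FILEPATH is a reserved value that ADF replaces with the path of the file each row came from:

{
  "name": "CopyJsonWithFileName",
  "type": "Copy",
  "typeProperties": {
    "source": {
      "type": "JsonSource",
      "additionalColumns": [
        { "name": "SourceFilePath", "value": "$$FILEPATH" }
      ]
    },
    "sink": { "type": "AzureSqlSink" }
  }
}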