Nested ForEach in ADF (Azure Data Factory) - azure-data-factory

I have a set of JSON files that I want to iterate over. Each file has a field containing a list of links, each pointing to an image. The goal is to download each image from its link using the Binary format (I tested with several links and it already works).
My problem is the nested ForEach. I can iterate over all the JSON files, but when I add a second ForEach to iterate over the links, with a Copy data activity to download the images via an Execute Pipeline, I get this error:
"ErrorCode=InvalidTemplate, ErrorMessage=cannot reference action 'Copy data1'. Action 'Copy data1' must either be in 'runAfter' path, or be a Trigger"
Example of file:
t1.json
{
"type": "jean",
"image":[
"pngmart.com/files/7/Denim-Jean-PNG-Transparent-Image.png",
"https://img2.freepng.fr/20171218/882/men-s-jeans-png-image-5a387658387590.0344736015136497522313.jpg",
"https://img2.freepng.fr/20171201/ed5/blue-jeans-png-image-5a21ed9dc7f436.281334271512172957819.jpg"
]
}
t1.json
{
"type": "socks",
"image":[ "https://upload.wikimedia.org/wikipedia/commons/thumb/5/52/Fun_socks.png/667px-Fun_socks.png",
"https://upload.wikimedia.org/wikipedia/commons/e/ed/Bulk_tube_socks.png",
"https://cdn.picpng.com/socks/socks-face-30640.png"
]
}
Do you have a solution?
Thanks

As per the documentation you cannot nest For Each activities in Azure Data Factory (ADF) or Synapse Pipelines, but you can use the Execute Pipeline activity to create nested pipelines, where the parent has a For Each activity and the child pipeline does too. You can also chain For Each activities one after the other, but not nest them.
Excerpt from the documentation:
Limitation: You can't nest a ForEach loop inside another ForEach loop (or an Until loop).
Workaround: Design a two-level pipeline where the outer pipeline with the outer ForEach loop iterates over an inner pipeline with the nested loop.
It may be that multiple nested pipelines are not what you want, in which case you could pass this looping off to another activity, e.g. a Stored Procedure, Databricks Notebook, Synapse Notebook (if you're in Azure Synapse Analytics) etc. One example here might be to load the JSON files into a table (or dataframe), extract the filenames once and then loop through that list, rather than through each file. Just an idea.
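As a rough illustration of the notebook idea, here is a minimal Python sketch that flattens the image lists of several JSON documents into one list that a single loop can walk. The function names `normalize_url` and `collect_image_links` are hypothetical, the sample documents are trimmed copies of the files in the question, and prefixing `https://` to scheme-less links is an assumption about those hosts.

```python
import json
from urllib.parse import urlparse


def normalize_url(link: str) -> str:
    """Prefix https:// when a link has no scheme (assumption: the host serves HTTPS)."""
    return link if urlparse(link).scheme else "https://" + link


def collect_image_links(documents):
    """Flatten the 'image' lists of parsed JSON documents into (type, url) pairs."""
    pairs = []
    for doc in documents:
        for link in doc.get("image", []):
            pairs.append((doc.get("type"), normalize_url(link)))
    return pairs


# Trimmed copies of the two sample files from the question.
raw_files = [
    '{"type": "jean", "image": ["pngmart.com/files/7/Denim-Jean-PNG-Transparent-Image.png"]}',
    '{"type": "socks", "image": ["https://upload.wikimedia.org/wikipedia/commons/e/ed/Bulk_tube_socks.png"]}',
]

links = collect_image_links(json.loads(text) for text in raw_files)
for item_type, url in links:
    print(item_type, url)
```

The resulting flat list could then feed one loop (in the notebook itself, or a single ForEach), so no second level of iteration is needed.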

I reproduced this and was able to copy all the links by looping the Copy data activity inside the ForEach activity and using the Execute Pipeline activity.
Parent pipeline:
If you have multiple JSON files, get the files list using the Get Metadata activity.
Loop the child items using the ForEach activity and add the Execute Pipeline activity to get the data from each file by passing the current item as a parameter (@item().name).
Child pipeline:
Create a parameter to store the file name from the parent pipeline.
Using the lookup activity, get the data from the current JSON file.
Filename property: @pipeline().parameters.filename
Here I have added https:// to your first image link as it is not validating in the copy activity and giving an error.
Pass the output to the ForEach activity and loop through each image value.
@activity('Lookup1').output.value[0].image
Add Copy data activity inside ForEach activity to copy each link from source to sink.
I have created a binary dataset with the HttpServer linked service and created a parameter for the Base URL in the linked service.
Passing the linked service parameter value from the dataset.
Pass the dataset parameter value from the copy activity source to use the current item (link) in the linked service.
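The two-level structure described above can be sketched as pipeline JSON fragments, built here as Python dicts so they can be printed and checked. This is only an outline under assumptions: the activity names (`ForEachFile`, `ForEachImage`, `Get Metadata1`, `Execute Pipeline1`) and the child pipeline name `ChildPipeline` are hypothetical, the Copy data activity and datasets are omitted, and expressions use the `@` prefix of ADF dynamic content.

```python
import json


def expression(value: str) -> dict:
    """Wrap a string as an ADF dynamic-content expression object."""
    return {"value": value, "type": "Expression"}


# Parent pipeline: loop over the files returned by Get Metadata and call the child.
parent_foreach = {
    "name": "ForEachFile",  # hypothetical name
    "type": "ForEach",
    "typeProperties": {
        # 'Get Metadata1' is an assumed activity name
        "items": expression("@activity('Get Metadata1').output.childItems"),
        "activities": [
            {
                "name": "Execute Pipeline1",  # hypothetical name
                "type": "ExecutePipeline",
                "typeProperties": {
                    "pipeline": {
                        "referenceName": "ChildPipeline",  # hypothetical name
                        "type": "PipelineReference",
                    },
                    # the current file name becomes the child's 'filename' parameter
                    "parameters": {"filename": expression("@item().name")},
                    "waitOnCompletion": True,
                },
            }
        ],
    },
}

# Child pipeline: loop over the image links read by the Lookup activity.
child_foreach = {
    "name": "ForEachImage",  # hypothetical name
    "type": "ForEach",
    "typeProperties": {
        "items": expression("@activity('Lookup1').output.value[0].image"),
        "activities": [],  # the Copy data activity would go here (omitted)
    },
}

print(json.dumps(parent_foreach, indent=2))
print(json.dumps(child_foreach, indent=2))
```

Neither ForEach contains another ForEach; the nesting happens only through the Execute Pipeline call, which is what avoids the limitation.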

Related

Use Lookup and For Each Iteration to pull data from different analytics.dev.azure.com projects

Hi, I would just like to ask if this is possible. I am currently working in ADF; what I want to do is get work items from analytics.dev.azure.com/[Organization]/[Project] and copy them to a SQL Database. I am already doing this for one project, but I want to do it for multiple projects without creating multiple copy tasks within ADF, by running a Lookup into a ForEach that iterates through all the team analytics URLs. Is there any way to do this?
We can use the Lookup and ForEach activities to copy data to SQL DB tables from all URLs. Below are the steps.
Create a lookup table which contains the entire list of URLs.
Next, in the ForEach activity's settings, type the following in Items to get the output of the Lookup activity:
@activity('Lookup1').output.value
Inside the ForEach activity, use a Copy activity.
In the source, create a dataset and an HTTP linked service. Enter the base URL and relative URL. I have stored the relative URLs in the lookup table, so I have given @{item().url} as the relative URL.
In the sink, create an Azure SQL Database table for each item in the ForEach activity, or use the existing tables and copy data to those tables.

how to check if all my 5 files are present inside a folder using ADF

I have 5 files stored in each of two folders in blob storage. I need to check whether all 5 files are present in both folders. If yes, execute the rest of the pipeline; otherwise wait until all 5 files are placed in the folder.
How can I achieve this using ADF?
You can use the Get Metadata activity to get the file details within a folder.
So the flow would be:
an Until activity
within the Until, use two Get Metadata activities, one for each folder, and set a variable once all 10 files are there
Create dataset for your blob storage
Create pipeline with:
Get Metadata childitems on your container - it has an output.count property
IF activity to carry out other tasks IF count >= 5
Create a trigger for this pipeline with type "Storage Events" that runs on blob created event
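The condition these answers wait for can be stated compactly in Python. This is a sketch of the check only, not ADF code: `all_files_present` is a hypothetical helper, and the file and folder names are made up for illustration. In a pipeline, the listings would come from the child items returned by Get Metadata for each folder.

```python
def all_files_present(expected_names, folder_listings):
    """Return True only if every folder listing contains every expected file name."""
    expected = set(expected_names)
    return all(expected <= set(names) for names in folder_listings.values())


# Made-up file and folder names for illustration.
expected = ["a.csv", "b.csv", "c.csv", "d.csv", "e.csv"]
listings = {
    "folder1": ["a.csv", "b.csv", "c.csv", "d.csv", "e.csv"],
    "folder2": ["a.csv", "b.csv", "c.csv", "d.csv"],  # e.csv has not arrived yet
}

ready = all_files_present(expected, listings)
print(ready)  # False: keep waiting

listings["folder2"].append("e.csv")
print(all_files_present(expected, listings))  # True: run the rest of the pipeline
```

Comparing names rather than counts means five unrelated files in a folder would not satisfy the check.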

How to configure, Event trigger

I want to configure an event-based trigger on blob creation, but I have files only directly in the container, in container/files format (no folder inside the container). In this case, how do I configure the trigger? What should be given under 'Blob path begins with'?
The Blob path begins with and Blob path ends with properties allow you to specify the containers, folders, and blob names for which you want to receive events. Your storage event trigger requires at least one of these properties to be defined. You can use a variety of patterns for both Blob path begins with and Blob path ends with properties, as shown in the examples later in this article.
Blob path begins with: The blob path must start with a folder path.
Valid values include 2018/ and 2018/april/shoes.csv. This field can't
be selected if a container isn't selected.
If your files are only in the container, in container/files format, I'm afraid we can't do that.
For more details, please refer to: Create a trigger that runs a pipeline in response to a storage event
But if you consider a Logic App, it has the trigger When a blob is added or modified (properties only):
This operation triggers a flow when one or more blobs are added or
modified in a container. This trigger will only fetch the file
metadata. To get the file content, you can use the "Get file content"
operation. The trigger does not fire if a file is added/updated in a
subfolder. If it is required to trigger on subfolders, multiple
triggers should be created.
You can use this Logic App trigger together with the Create a pipeline run or Get a pipeline run action to achieve what you need.

Azure data factory pipeline - copy blob and store filename in a DocumentDB or in Azure SQL

I set up 2 blob storage folders called "input" and "output". My pipeline gets triggered when a new file arrives in "input" and copies that file to the "output" folder. Furthermore I do have a Get Metadata activity where I receive the copied filename(s).
Now I would like to store the filename(s) of the copied data into a DocumentDB.
I tried to use the ForEach activity with it, but here I am stuck.
Basically I tried to use parts from this answer: Add file name as column in data factory pipeline destination
But I don't know what to assign as Source in the CopyData activity since my source are the filenames from the ForEach activity - or am I wrong?
Based on your requirements, I suggest combining a Blob Trigger Azure Function with your current Azure Data Factory setup.
Step 1: still use the event trigger in ADF to transfer between input and output.
Step 2: assign a Blob Trigger Azure Function to the output folder.
Step 3: the function will be triggered as soon as a new file is created in it. Then get the file name and use the Document DB SDK to store it in Document DB.
.net document db SDK: https://learn.microsoft.com/en-us/azure/cosmos-db/sql-api-sdk-dotnet
Blob trigger bindings, please refer to here: https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob
You may try using a custom activity to insert filenames into Document DB.
You can pass filenames as parameters to the custom activity, and write your own code to insert data into Document Db.
https://learn.microsoft.com/en-us/azure/data-factory/transform-data-using-dotnet-custom-activity
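Whichever route is used (a blob-triggered function or a custom activity), the code has to turn a blob path into a document to store. A minimal Python sketch of that step follows; `filename_document` and the field names `fileName` and `blobPath` are hypothetical, the sample path is made up, and the actual write to Document DB through the SDK is left out.

```python
import posixpath
import uuid


def filename_document(blob_path: str) -> dict:
    """Build a document describing one copied blob.

    The id is derived deterministically from the path, so processing the
    same blob twice yields the same id instead of a second document id.
    """
    return {
        "id": str(uuid.uuid5(uuid.NAMESPACE_URL, blob_path)),
        "fileName": posixpath.basename(blob_path),  # hypothetical field name
        "blobPath": blob_path,  # hypothetical field name
    }


doc = filename_document("output/report-2018.csv")  # made-up blob path
print(doc["fileName"])
print(doc["id"])
```

Inside a function or custom activity, the returned dict would be passed to whatever insert or upsert call the chosen SDK provides.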

ADF - Using parameter from loop in a copy data flow

I've created a loop that iterates over URLs to fetch OData. Then, a copy flow is created for every path in the OData service. However, I need to be able to pass the URL into these tables as well.
How can I add the URL (@pipeline().parameters.ProjectUrl) to my sink when I am unable to import schemas because I'm working with parameters? Note that my query is a select, like so:
$select=Field1,Field2,Field3
I'd like to add my parameter here, so it gets added to the tables.
Thanks!
By copy flow do you mean 'a mapping data flow, which is used just to copy'?
If that is the case, go into the flow and add a parameter. Keep the type as string (not all of the flow parameter types have pipeline equivalents). Go back to the pipeline and look at the Execute Data Flow activity. The parameter will now be visible in the activity. When you click on the 'value' field, you can choose between 'Pipeline expression' and 'Data flow expression'. Choose 'Pipeline expression' and place @pipeline().parameters.ProjectUrl here.