Can Azure Data Factory trigger an event when a new container with files is added to a storage account? If not, how can this be implemented?
You need to read this for full details.
The Blob path begins with and Blob path ends with properties allow you to specify the containers, folders, and blob names for which you want to receive events.
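As a rough illustration of how the two filters combine, here is a minimal Python sketch. It only models the matching idea (a prefix test and a suffix test on the blob path); the function name `matches_trigger` is made up for this example and is not part of any Azure SDK, and the real service may normalize paths differently.

```python
def matches_trigger(blob_path: str, begins_with: str = "", ends_with: str = "") -> bool:
    """Return True if blob_path passes both filters.

    An empty filter matches everything, but at least one of the two
    filters must be defined (assumption: mirrors the trigger's own rule).
    """
    if not begins_with and not ends_with:
        raise ValueError("define at least one of begins_with / ends_with")
    return blob_path.startswith(begins_with) and blob_path.endswith(ends_with)


# Example paths are illustrative only.
print(matches_trigger("2018/april/shoes.csv", begins_with="2018/"))
print(matches_trigger("2019/april/shoes.csv", begins_with="2018/"))
print(matches_trigger("2018/april/shoes.csv", ends_with=".csv"))
```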
Related
I have 5 files stored in each of two folders in blob storage. I need to check whether all 5 files are present in both folders. If yes, execute the rest of the pipeline; otherwise wait until all 5 files have been placed in each folder.
How can I achieve this using ADF?
You can use the Get Metadata activity to get the file details within a folder.
So the flow would be:
an Until activity
within the Until, two Get Metadata activities (one for each folder), and set a variable once all 10 files are there
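The Until loop described above can be sketched in plain Python to make the control flow concrete. This is only a model of the logic, not ADF code: `wait_for_files` and the `list_folder` callable are hypothetical names, and `list_folder` stands in for whatever returns the file names in a folder (the Get Metadata child items in ADF).

```python
import time
from typing import Callable, Iterable, Set


def wait_for_files(
    list_folder: Callable[[str], Iterable[str]],
    folders: Iterable[str],
    expected: Set[str],
    poll_seconds: float = 30.0,
    max_polls: int = 20,
) -> bool:
    """Poll until every folder contains every expected file name.

    Returns True as soon as all folders are complete, or False if
    max_polls checks pass without that happening.
    """
    folder_list = list(folders)
    for attempt in range(max_polls):
        # One listing per folder, like one Get Metadata activity per folder.
        if all(expected <= set(list_folder(folder)) for folder in folder_list):
            return True
        if attempt < max_polls - 1:
            time.sleep(poll_seconds)
    return False
```

The `max_polls` cap plays the role of the Until activity's timeout, so the loop cannot wait forever if a file never arrives.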
Create dataset for your blob storage
Create pipeline with:
Get Metadata childitems on your container - it has an output.count property
If Condition activity to carry out the other tasks if count >= 5
Create a trigger for this pipeline with type "Storage Events" that runs on blob created event
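To make the count check concrete, here is a small Python sketch of the condition. It assumes the Get Metadata child items arrive as a list of objects with `name` and `type` fields (`File` or `Folder`); the helper names are hypothetical. In the pipeline itself the equivalent condition would likely be an expression along the lines of `@greaterOrEquals(length(activity('Get Metadata1').output.childItems), 5)`, where `Get Metadata1` is a placeholder activity name.

```python
from typing import Dict, List


def count_files(child_items: List[Dict[str, str]]) -> int:
    """Count only entries of type 'File', ignoring sub-folders."""
    return sum(1 for item in child_items if item.get("type") == "File")


def should_run(child_items: List[Dict[str, str]], minimum: int = 5) -> bool:
    """True when the container holds at least `minimum` files."""
    return count_files(child_items) >= minimum


# Illustrative data shaped like a childItems list.
sample = [{"name": f"file{i}.csv", "type": "File"} for i in range(5)]
sample.append({"name": "archive", "type": "Folder"})
print(should_run(sample))
```

Counting only `File` entries is a design choice for this sketch; if folders should count too, compare the length of the whole list instead.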
I want to configure an event-based trigger on blob creation, but my files sit directly in the container, in container/files format (no folder inside the container). In this case, how do I configure the trigger? What should be given under 'Blob path begins with'?
The Blob path begins with and Blob path ends with properties allow you to specify the containers, folders, and blob names for which you want to receive events. Your storage event trigger requires at least one of these properties to be defined. You can use a variety of patterns for both Blob path begins with and Blob path ends with properties, as shown in the examples later in this article.
Blob path begins with: The blob path must start with a folder path.
Valid values include 2018/ and 2018/april/shoes.csv. This field can't
be selected if a container isn't selected.
If your files are only in the container in container/files format, I'm afraid we can't do that.
For more details, see: Create a trigger that runs a pipeline in response to a storage event
But if you are open to using a Logic App, it has the trigger "When a blob is added or modified (properties only)":
This operation triggers a flow when one or more blobs are added or
modified in a container. This trigger will only fetch the file
metadata. To get the file content, you can use the "Get file content"
operation. The trigger does not fire if a file is added/updated in a
subfolder. If it is required to trigger on subfolders, multiple
triggers should be created.
You can combine this Logic App trigger with the "Create a pipeline run" or "Get a pipeline run" action to achieve what you need.
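If you would rather start the pipeline from your own code than from the Logic App action, the same thing can be done through the Data Factory REST API's pipeline "Create Run" operation. The sketch below only builds the request URL; the subscription, resource group, factory, and pipeline values are placeholders, the helper name is made up, and you would still need to POST to the URL with a valid Azure AD bearer token.

```python
from urllib.parse import quote

# API version used by the Data Factory (v2) management REST API.
API_VERSION = "2018-06-01"


def create_run_url(subscription_id: str, resource_group: str,
                   factory_name: str, pipeline_name: str) -> str:
    """Build the management URL for starting a pipeline run (HTTP POST)."""
    parts = [quote(p, safe="") for p in
             (subscription_id, resource_group, factory_name, pipeline_name)]
    return (
        "https://management.azure.com/subscriptions/{}/resourceGroups/{}"
        "/providers/Microsoft.DataFactory/factories/{}/pipelines/{}/createRun"
        "?api-version={}"
    ).format(*parts, API_VERSION)


# Placeholder values, for illustration only.
print(create_run_url("sub-id", "my-rg", "my-factory", "my-pipeline"))
```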
I am looking to copy files from one blob storage location to another using Azure Data Factory. However, I want to pick only the files whose names start with, say, AAABBBCCC, XXXYYYZZZ, or MMMNNNOOO, and ignore the rest.
In ADF, use the Copy activity and the wildcard path to set your matching file patterns.
You could use a prefix to pick the files that you want to copy. And this sample shows how to copy blob to blob using Azure Data Factory.
prefix: Specifies a string that filters the results to return only blobs whose name begins with the specified prefix.
// List blobs whose names start with "AAABBBCCC" in the container.
// Assumes `client` is a BlobContainerClient (Azure.Storage.Blobs).
await foreach (BlobItem blobItem in client.GetBlobsAsync(prefix: "AAABBBCCC"))
{
    Console.WriteLine(blobItem.Name);
}
With ADF setting:
Set Wildcard paths with AAABBBCCC*. For more details, see here.
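Since the question has three prefixes, here is a small Python sketch of what the wildcard selection amounts to. It uses the standard library's `fnmatchcase` as a stand-in for ADF's wildcard matching (note that `fnmatch` also understands `[...]` character sets, which may not behave the same way in ADF). In the pipeline, one way to cover several prefixes would be to loop over the patterns, for example with a ForEach around the Copy activity; that part is a suggestion, not something the setting above does by itself.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# The three prefixes from the question, written as wildcard patterns.
PATTERNS = ("AAABBBCCC*", "XXXYYYZZZ*", "MMMNNNOOO*")


def select_blobs(names: Iterable[str], patterns: Iterable[str] = PATTERNS) -> List[str]:
    """Keep only names that match at least one pattern (case-sensitive)."""
    pattern_list = list(patterns)
    return [n for n in names if any(fnmatchcase(n, p) for p in pattern_list)]


# Illustrative file names only.
names = ["AAABBBCCC_01.csv", "XXXYYYZZZ.txt", "other.csv", "aaabbbccc.csv"]
print(select_blobs(names))
```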
I am a newbie to Azure Data Factory and am working on the Copy activity. I want to prevent the Copy activity from running for files that are empty. Can anyone help me out with this?
Also, what would happen if an empty file is encountered in the Copy activity? Will there be an error?
You can use the Lookup activity, which reads and returns the content of a configuration file or table, to check whether the file has any rows before running the copy.
Lookup activity in Azure Data Factory
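A rough sketch of the check you would build on the Lookup result, written in Python just to show the logic. It assumes the Lookup output carries a `count` field (when "First row only" is off) or a `firstRow` object (when it is on); treat those shapes as assumptions to verify against your own run output, and `has_rows` is a made-up helper name. In the pipeline this would be an If Condition wrapping the Copy activity.

```python
from typing import Any, Dict


def has_rows(lookup_output: Dict[str, Any]) -> bool:
    """Decide whether a Lookup result indicates a non-empty file."""
    if "count" in lookup_output:
        # Shape assumed when "First row only" is disabled.
        return lookup_output["count"] > 0
    # Shape assumed when "First row only" is enabled.
    return bool(lookup_output.get("firstRow"))


# Illustrative outputs only.
print(has_rows({"count": 0, "value": []}))
print(has_rows({"firstRow": {"id": "1"}}))
```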
I set up 2 blob storage folders called "input" and "output". My pipeline gets triggered when a new file arrives in "input" and copies that file to the "output" folder. Furthermore I do have a Get Metadata activity where I receive the copied filename(s).
Now I would like to store the filename(s) of the copied data into a DocumentDB.
I tried to use the ForEach activity with it, but here I am stuck.
Basically I tried to use parts from this answer: Add file name as column in data factory pipeline destination
But I don't know what to assign as the Source in the Copy Data activity, since my source is the list of filenames from the ForEach activity. Or am I wrong?
Based on your requirements, I suggest combining a blob-triggered Azure Function with your current Azure Data Factory setup.
Step 1: Keep using the event trigger in ADF to copy files from input to output.
Step 2: Attach a blob-triggered Azure Function to the output folder.
Step 3: The function is triggered as soon as a new file is created in that folder. It then gets the file name and uses the DocumentDB SDK to store it in DocumentDB.
.net document db SDK: https://learn.microsoft.com/en-us/azure/cosmos-db/sql-api-sdk-dotnet
Blob trigger bindings: https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob
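The core of Step 3 can be sketched as follows. This is plain Python with an in-memory stand-in for the database so it runs anywhere; `build_document`, `InMemoryStore`, and `on_blob_created` are hypothetical names, and in a real function you would replace the store with the DocumentDB (Cosmos DB) SDK client and take the blob path from the trigger binding.

```python
import posixpath
from typing import Dict, List


def build_document(blob_path: str) -> Dict[str, str]:
    """Turn a blob path such as 'output/sales.csv' into a document."""
    return {
        # '/' is not allowed in document ids, so replace it.
        "id": blob_path.replace("/", "_"),
        "fileName": posixpath.basename(blob_path),
        "blobPath": blob_path,
    }


class InMemoryStore:
    """Stand-in for a document collection (assumption, for illustration)."""

    def __init__(self) -> None:
        self.items: List[Dict[str, str]] = []

    def upsert(self, document: Dict[str, str]) -> None:
        # Replace any existing document with the same id, then add the new one.
        self.items = [d for d in self.items if d["id"] != document["id"]]
        self.items.append(document)


def on_blob_created(blob_path: str, store: InMemoryStore) -> None:
    """What the blob-triggered function body would do for each new file."""
    store.upsert(build_document(blob_path))
```

Using an upsert rather than an insert keeps the function safe to re-run if the same blob triggers it twice.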
You could try using a custom activity to insert the filenames into DocumentDB.
You can pass the filenames as parameters to the custom activity and write your own code to insert the data into DocumentDB.
https://learn.microsoft.com/en-us/azure/data-factory/transform-data-using-dotnet-custom-activity