How to prevent copying of an empty file through Azure Data Factory?

I am a newbie to Azure Data Factory and working on the Copy activity. I want to prevent the Copy activity from running for files which are empty. Can anyone help me out with this?
Also, what would happen if an empty file is encountered by the Copy activity? Will there be any error?

You can use the Lookup activity, which reads and returns the content of a configuration file or table; you can then check its output before running the Copy activity (a sketch follows below).
Lookup activity in Azure Data Factory
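For illustration only, here is a rough sketch of how the Copy activity could be gated on the Lookup result with an If Condition. It assumes a Lookup activity named 'Lookup1' with firstRowOnly set to false; all names are placeholders, and the Copy activity's source and sink are omitted for brevity:

{
    "name": "If file has rows",
    "type": "IfCondition",
    "dependsOn": [
        { "activity": "Lookup1", "dependencyConditions": [ "Succeeded" ] }
    ],
    "typeProperties": {
        "expression": {
            "value": "@greater(activity('Lookup1').output.count, 0)",
            "type": "Expression"
        },
        "ifTrueActivities": [
            { "name": "Copy data1", "type": "Copy", "typeProperties": { } }
        ]
    }
}

With firstRowOnly set to false, the Lookup output includes a count of rows read, so the copy only runs when the file actually has content.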

Related

Dynamic source in Azure Data Factory copy activity

So I have a copy activity in an Azure Data Factory pipeline that copies data from one ODBC data source to Data Lake.
Now I have another ODBC source for which I would like to do the same copy activity.
Is it possible to parameterize the source part of a copy activity?
Yes, this is possible. Please have a look at this doc:
https://learn.microsoft.com/en-us/azure/data-factory/connector-odbc?tabs=data-factory#dataset-properties
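As a rough illustration (the dataset, linked service, and parameter names here are placeholders, not taken from the linked doc), a parameterized ODBC dataset could look like this, with the table name supplied per copy:

{
    "name": "OdbcSourceDataset",
    "properties": {
        "type": "OdbcTable",
        "linkedServiceName": {
            "referenceName": "OdbcLinkedService",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "tableName": { "type": "string" }
        },
        "typeProperties": {
            "tableName": "@dataset().tableName"
        }
    }
}

Each Copy activity (or a pipeline parameter) can then pass a different tableName so one dataset serves both sources; if the connection itself differs, the linked service can be parameterized in a similar way.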

Azure ADF - ContentMD5 field in Get Metadata activity is always null

If I manually upload txt or csv files to Azure Blob Storage, when using the Get Metadata activity I always get "contentMD5": null, while the other fields of the output are always populated. I also tried copying from on-premises to Blob Storage using AzCopy, but I have the same issue. I am using ADF v2.
[Screenshot: Get Metadata output with "contentMD5": null]
Any idea why this happens?
Thanks
Actually, that's not a Data Factory error.
Please check the file properties in your Blob Storage. I also have one file whose CONTENT-MD5 property is empty:
That is why the contentMD5 field in the Get Metadata output is null.
How to solve this problem?
I downloaded test.csv to my computer, deleted it from the container (also deleting the blob snapshots), and re-uploaded it. The CONTENT-MD5 property is now populated:
Run the Get Metadata activity and check the output - contentMD5 is now returned.
Hope this helps.
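For reference, a minimal sketch of a Get Metadata activity requesting the field (the activity and dataset names are placeholders); contentMD5 will still come back null unless the blob actually has its Content-MD5 property set:

{
    "name": "Get Metadata1",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": {
            "referenceName": "BlobFileDataset",
            "type": "DatasetReference"
        },
        "fieldList": [ "itemName", "size", "lastModified", "contentMD5" ]
    }
}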

Find the number of files available in Azure data lake directory using azure data factory

I am working on a pipeline where our data sources are CSV files stored in Azure Data Lake. I was able to process all the files using Get Metadata and ForEach activities. Now I need to find the number of files available in the Azure Data Lake directory. How can we achieve that? I couldn't find any item count argument in the Get Metadata activity. I have noticed that the input of the ForEach activity contains an itemsCount value. Is there any way to access this?
Regards,
Sandeep
Since the output of a childItems Get Metadata activity is a list of objects, why not just get the length of this list?
@{length(activity('Get Metadata1').output.childItems)}
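For example, the count could be captured in a pipeline variable (the variable name here is just illustrative, declared as a String on the pipeline):

{
    "name": "Set file count",
    "type": "SetVariable",
    "typeProperties": {
        "variableName": "fileCount",
        "value": "@{length(activity('Get Metadata1').output.childItems)}"
    }
}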

Azure data factory pipeline - copy blob and store filename in a DocumentDB or in Azure SQL

I set up two Blob Storage folders called "input" and "output". My pipeline gets triggered when a new file arrives in "input" and copies that file to the "output" folder. Furthermore, I have a Get Metadata activity where I receive the copied filename(s).
Now I would like to store the filename(s) of the copied data in a DocumentDB.
I tried to use the ForEach activity for this, but here I am stuck.
Basically I tried to use parts of this answer: Add file name as column in data factory pipeline destination
But I don't know what to assign as the source in the Copy Data activity, since my source is the filenames from the ForEach activity - or am I wrong?
Based on your requirements, I suggest using a blob-triggered Azure Function alongside your current Azure Data Factory pipeline.
Step 1: Still use the event trigger in ADF to transfer files between "input" and "output".
Step 2: Point a blob-triggered Azure Function at the "output" folder.
Step 3: The function will be triggered as soon as a new file is created there. It can then get the file name and use the Document DB SDK to store it in Document DB (a binding sketch follows after the links below).
.NET Document DB SDK: https://learn.microsoft.com/en-us/azure/cosmos-db/sql-api-sdk-dotnet
Blob trigger bindings, please refer to: https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob
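As a rough sketch of the function bindings (the database, collection, and connection setting names are placeholders, and the exact Cosmos DB property names vary slightly between Functions extension versions), a function.json like the following would fire on new blobs in "output" and let the function write one document per file:

{
    "bindings": [
        {
            "name": "myBlob",
            "type": "blobTrigger",
            "direction": "in",
            "path": "output/{name}",
            "connection": "AzureWebJobsStorage"
        },
        {
            "name": "outputDocument",
            "type": "cosmosDB",
            "direction": "out",
            "databaseName": "FilesDb",
            "collectionName": "FileNames",
            "createIfNotExists": true,
            "connectionStringSetting": "CosmosDBConnection"
        }
    ]
}

Inside the function, the {name} token from the trigger path gives the blob's file name, which the function can write to the output document.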
You may try using a Custom activity to insert the filenames into Document DB.
You can pass the filenames as parameters to the Custom activity and write your own code to insert the data into Document DB (see the sketch below).
https://learn.microsoft.com/en-us/azure/data-factory/transform-data-using-dotnet-custom-activity
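For illustration, a rough sketch of a Custom activity inside the ForEach, passing the current file name via extendedProperties (the command, linked service, and property names are placeholders):

{
    "name": "InsertFileName",
    "type": "Custom",
    "linkedServiceName": {
        "referenceName": "AzureBatchLinkedService",
        "type": "LinkedServiceReference"
    },
    "typeProperties": {
        "command": "cmd /c InsertFileName.exe",
        "extendedProperties": {
            "fileName": "@item().name"
        }
    }
}

The custom code running on the Batch node can read the extended properties from the activity.json file that Data Factory drops into the working folder and insert the value into Document DB.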

API access from Azure Data Factory

I want to create an ADF pipeline which needs to access an API, get data from it using some filter parameters, and write the output in JSON format to Data Lake. How can I do that?
After the JSON is available in the lake, it needs to be converted to a CSV file. How can I do that?
You can create a pipeline with a copy activity from the HTTP connector to the Data Lake connector. Use HTTP as the copy source to access the API (https://learn.microsoft.com/en-us/azure/data-factory/connector-http) and specify the format in the dataset as JSON. Refer to https://learn.microsoft.com/en-us/azure/data-factory/supported-file-formats-and-compression-codecs#json-format for how to define the schema. Use the Data Lake connector as the copy sink, specify the format as Text, and adjust settings such as the row delimiter and column delimiter according to your needs.
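A rough sketch of the copy activity body, assuming an HTTP dataset and a Data Lake Store dataset have already been defined (the activity and dataset names are placeholders):

{
    "name": "CopyFromApi",
    "type": "Copy",
    "inputs": [
        { "referenceName": "HttpJsonDataset", "type": "DatasetReference" }
    ],
    "outputs": [
        { "referenceName": "DataLakeJsonDataset", "type": "DatasetReference" }
    ],
    "typeProperties": {
        "source": { "type": "HttpSource", "httpRequestTimeout": "00:01:40" },
        "sink": { "type": "AzureDataLakeStoreSink" }
    }
}

The filter parameters would typically go into the relativeUrl of the HTTP dataset, which can itself be parameterized.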
The workflow below may meet your requirements:
Use a Copy activity in ADF v2 where the source dataset is the HTTP data store and the destination is Azure Data Lake Store. The HTTP source data store allows you to fetch data by calling the API, and the Copy activity will copy the data into your destination data lake.
Chain a U-SQL activity after the Copy activity; once the Copy activity succeeds, it will run a U-SQL script to convert the JSON file to a CSV file.