Azure ADF - ContentMD5 field in Get Metadata activity is always null

If I manually upload txt or csv files to Azure Blob Storage, the Get Metadata activity always returns "contentMD5": null, while the other fields of the output are populated. I also tried copying a file from on-premises to Blob Storage using AzCopy, but I have the same issue. I am using ADF v2.
[screenshot: Get Metadata output showing "contentMD5": null]
Any idea why this would happen?
Thanks

Actually, that's not a Data Factory error.
Please check the file properties in your Blob Storage. I also have one file without a CONTENT-MD5 value:
That's why the Get Metadata result for contentMD5 is null.
How to solve this problem?
I downloaded the test.csv to my computer, deleted it (along with its blob snapshots) from the container, then re-uploaded it. We can see the CONTENT-MD5 now:
Run the Get Metadata activity and check the output:
Hope this helps.
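A likely reason the property starts out null: the Content-MD5 blob property is only populated if the uploader computes and sets it at upload time. If you are uploading with AzCopy v10, it only sets this property when asked to. A sketch of the flag (the account, container, and SAS placeholders are mine):

    # --put-md5 makes AzCopy compute the file's MD5 hash and store it
    # as the blob's Content-MD5 property at upload time (AzCopy v10)
    azcopy copy "C:\data\test.csv" "https://<account>.blob.core.windows.net/<container>/test.csv?<SAS>" --put-md5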

Related

Copy subdirs + files from storage acct to ADX using Data Factory

I'm trying to copy files from an Azure Storage blob to ADX using Data Factory, but I can't find a way to do this with JSON datasets (rather than binary) so that I can map the schema. I would love to have a template for this.
I have tried to follow the resolution mentioned here (Get metadata from Blob storage with "folder like structure" using Azure Data Factory pipeline), but I'm lacking some more guidance (this is my first project using ADF).
Now I am getting another error: Function 'length' expects its parameter to be an array or a string.
I'm actually also looking for a complete guide to setting this up.
Here is my overall use-case target: https://learn.microsoft.com/en-us/azure/sentinel/store-logs-in-azure-data-explorer?tabs=azure-storage-azure-data-factory. The documentation is missing the detailed steps for step 6, "Create a data pipeline with a copy activity, based on when the blob properties were last modified", which requires extra understanding of Azure Data Factory (see Copy activity in Azure Data Factory and Azure Synapse Analytics).
It seems I was going in the wrong direction.
I have actually found a simple solution for setting this up. I tried the Copy Data tool, and it seems to be doing what I want. So case closed :-)
From the error message you have shared, it seems the dynamic expression passing the childItems of your Get Metadata activity to the ForEach activity's items property is causing this problem.
You might be using Items = @activity('GetMetadataActivity').output instead of @activity('GetMetadataActivity').output.childItems.
Please use Items = @activity('GetMetadataActivity').output.childItems in your ForEach activity, which should resolve the error.
Here is a video demonstration by a community volunteer where this error has been explained in detail: ADF Error Function Length expects its parameter to be array or string
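As a sketch of where that expression lives, the relevant part of the ForEach activity's JSON definition would look roughly like this (the activity names match the example above; the inner activities are omitted):

    {
        "name": "ForEach1",
        "type": "ForEach",
        "dependsOn": [
            { "activity": "GetMetadataActivity", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": {
            "items": {
                "value": "@activity('GetMetadataActivity').output.childItems",
                "type": "Expression"
            },
            "activities": []
        }
    }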

Azure Data Factory DataFlow Source CSV File Header Keep Changing

I am trying to load a CSV file from source Blob Storage with the first-row-as-header option selected, but across multiple debug runs the header keeps changing, so I am not able to insert the data into the target SQL DB.
Kindly suggest how to handle this scenario. I expect either that a static header needs to be configured on the source, or that I would have to rename the existing columns on the ADF side.
Thanks
In the source settings, "Allow schema drift" needs to be ticked.
"Allow schema drift" should be turned on in the sink as well.
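For reference, a minimal sketch of what those two settings correspond to in the underlying data flow script (the source1/sink1 transformation names are placeholders):

    source(allowSchemaDrift: true,
        validateSchema: false) ~> source1
    source1 sink(allowSchemaDrift: true,
        validateSchema: false) ~> sink1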

How to prevent copying of empty file through azure data factory?

I am a newbie to Azure Data Factory and am working on the Copy activity. I want to prevent the Copy activity from running for files which are empty. Can anyone help me out with this?
Also, what would happen if the Copy activity encounters an empty file? Will there be any error?
You can use a Lookup activity, which reads and returns the content of a configuration file or table.
Lookup activity in Azure Data Factory
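Another common pattern (a sketch, not part of the answer above, and the activity names are mine): read the file's size field with a Get Metadata activity, then wrap the copy in an If Condition so it only runs for non-empty files. The Copy activity would go inside ifTrueActivities:

    {
        "name": "IfFileNotEmpty",
        "type": "IfCondition",
        "dependsOn": [
            { "activity": "GetFileMetadata", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": {
            "expression": {
                "value": "@greater(activity('GetFileMetadata').output.size, 0)",
                "type": "Expression"
            },
            "ifTrueActivities": []
        }
    }

As for the second question: in general, copying an empty file does not fail the Copy activity; it simply copies zero bytes (or zero rows) to the sink.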

Find the number of files available in Azure data lake directory using azure data factory

I am working on a pipeline where our data sources are CSV files stored in Azure Data Lake. I was able to process all the files using the Get Metadata and ForEach activities. Now I need to find the number of files available in the Azure Data Lake. How can we achieve that? I couldn't find any item-count argument in the Get Metadata activity. I have noticed that the input of the ForEach activity contains an itemsCount value. Is there any way to access this?
Regards,
Sandeep
Since the childItems output of a Get Metadata activity is a list of objects, why not just take the length of that list?
@{length(activity('Get Metadata1').output.childItems)}
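If you need to keep that count for later activities, it can go into a pipeline variable (a sketch; the activity and variable names are mine, and since pipeline variables are String, Boolean, or Array, the number is stringified):

    {
        "name": "SetFileCount",
        "type": "SetVariable",
        "dependsOn": [
            { "activity": "Get Metadata1", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": {
            "variableName": "fileCount",
            "value": {
                "value": "@{length(activity('Get Metadata1').output.childItems)}",
                "type": "Expression"
            }
        }
    }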

Azure data factory pipeline - copy blob and store filename in a DocumentDB or in Azure SQL

I set up two Blob Storage folders called "input" and "output". My pipeline gets triggered when a new file arrives in "input" and copies that file to the "output" folder. Furthermore, I have a Get Metadata activity from which I receive the copied filename(s).
Now I would like to store the filename(s) of the copied data in a DocumentDB.
I tried to use the ForEach activity for this, but here I am stuck.
Basically I tried to use parts from this answer: Add file name as column in data factory pipeline destination
But I don't know what to assign as the source in the Copy Data activity, since my source is the filenames from the ForEach activity - or am I wrong?
Based on your requirements, I suggest combining a blob-triggered Azure Function with your current Azure Data Factory setup.
Step 1: keep using the event trigger in ADF to transfer files between input and output.
Step 2: point a blob-triggered Azure Function at the output folder (a sketch of the bindings follows below).
Step 3: the function is triggered as soon as a new file is created there. It can then get the file name and use the Document DB SDK to store it in Document DB.
.NET Document DB SDK: https://learn.microsoft.com/en-us/azure/cosmos-db/sql-api-sdk-dotnet
For blob trigger bindings, please refer to: https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob
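As a rough sketch of steps 2 and 3, the function's bindings (function.json) could pair a blob trigger on the output container with a Cosmos DB output binding; the function body then only has to write a small document containing the blob name captured by {name}. All names here are placeholders, and the exact binding properties depend on your Functions runtime and extension version:

    {
        "bindings": [
            {
                "name": "inputBlob",
                "type": "blobTrigger",
                "direction": "in",
                "path": "output/{name}",
                "connection": "AzureWebJobsStorage"
            },
            {
                "name": "outputDocument",
                "type": "cosmosDB",
                "direction": "out",
                "databaseName": "adf",
                "collectionName": "filenames",
                "connectionStringSetting": "CosmosDBConnection",
                "createIfNotExists": true
            }
        ]
    }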
You may try using a custom activity to insert the filenames into Document DB.
You can pass the filenames as parameters to the custom activity and write your own code to insert the data into Document DB.
https://learn.microsoft.com/en-us/azure/data-factory/transform-data-using-dotnet-custom-activity