Trouble scanning a Data Lake Gen 2 source in Purview

I am using Azure Purview and have created an Azure Data Lake Storage Gen2 resource. When I try to scan the source, the connection fails (error screenshot attached).
I tried to use this link to figure out how to solve it:
https://learn.microsoft.com/en-gb/azure/purview/manage-credentials?wt.mc_id=mspurview_inproduct_scan_msiauth_csadai
But I haven't been able to. Any help would be greatly appreciated.

I needed to copy the 'Managed Identity Name' for the Purview account and add a role assignment of 'Storage Blob Data Reader' for it on the ADLS Gen 2 account. Here is a more detailed breakdown:
https://learn.microsoft.com/en-us/answers/questions/565738/error-3835-purview-failed-to-access-the-adls-gen2.html
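The same role assignment can also be created programmatically. Below is a minimal sketch of the request body for the Azure role assignments REST API (a PUT against the ADLS Gen2 account's scope at .../providers/Microsoft.Authorization/roleAssignments/<new-guid>?api-version=2022-04-01). The subscription ID and principal object ID are placeholders, and the GUID is the built-in role definition ID for 'Storage Blob Data Reader':

    {
      "properties": {
        "roleDefinitionId": "/subscriptions/<subscription-id>/providers/Microsoft.Authorization/roleDefinitions/2a2b9908-6ea1-4ae2-8e65-a410df84e7d1",
        "principalId": "<object-id-of-the-Purview-managed-identity>",
        "principalType": "ServicePrincipal"
      }
    }

Role assignments can take a few minutes to propagate, so retry the scan's test connection after a short wait.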

Related

Copy subdirs + files from storage acct to ADX using Data Factory

I'm trying to copy files from Azure Storage blobs to ADX (Azure Data Explorer) using Data Factory, but I can't find a solution that uses JSON datasets (not binary) so that I can map the schema. I would love to have a template to do this.
I have tried to follow the resolution mentioned here (Get metadata from Blob storage with "folder like structure" using Azure Data Factory pipeline), but I'm lacking some more guidance (this is my first project using ADF).
Now I am getting another error (screenshot attached).
I'm actually also looking for a complete guide to setting this up.
Here is my overall use-case target: https://learn.microsoft.com/en-us/azure/sentinel/store-logs-in-azure-data-explorer?tabs=azure-storage-azure-data-factory - but the documentation is missing the detailed steps. In step 6, "Create a data pipeline with a copy activity, based on when the blob properties were last modified", it only says: "This step requires an extra understanding of Azure Data Factory. For more information, see Copy activity in Azure Data Factory and Azure Synapse Analytics."
It seems I was going in the wrong direction.
I have actually found a simple solution for setting this up: I have tried the Copy Data tool, and it seems to be doing what I want. So case closed :-)
From the error message you have shared, it seems the dynamic expression passing the childItems of your Get Metadata activity to the ForEach activity's items is causing this problem.
You might be using Items = @activity('GetMetadataActivity').output instead of @activity('GetMetadataActivity').output.childItems.
Please use Items = @activity('GetMetadataActivity').output.childItems in your ForEach activity, which should resolve your error.
Here is a video demonstration by a community volunteer where this error is explained in detail: ADF Error Function Length expects its parameter to be array or string
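For illustration, this is roughly what the corrected ForEach activity looks like in the pipeline JSON; the activity names are placeholders and the inner Copy activity is stubbed out:

    {
      "name": "ForEachFile",
      "type": "ForEach",
      "dependsOn": [
        {
          "activity": "GetMetadataActivity",
          "dependencyConditions": [ "Succeeded" ]
        }
      ],
      "typeProperties": {
        "items": {
          "value": "@activity('GetMetadataActivity').output.childItems",
          "type": "Expression"
        },
        "activities": [
          { "name": "CopyBlobToADX", "type": "Copy" }
        ]
      }
    }

Inside the loop, @item().name resolves to the name of the current file from childItems.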

Dynamic source in Azure Data Factory copy activity

So I have a copy activity in an Azure Data Factory pipeline that copies data from one ODBC data source to Data Lake.
Now I have another ODBC source for which I would like to do the same copy.
Is it possible to parameterize the source part of a copy activity?
Yes, this is possible. Please have a look at this doc:
https://learn.microsoft.com/en-us/azure/data-factory/connector-odbc?tabs=data-factory#dataset-properties
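As a sketch of what that looks like in practice, the ODBC dataset can declare a parameter and reference it in its typeProperties, so the same dataset (and the same copy activity) can serve multiple sources; all names here are placeholders:

    {
      "name": "OdbcSourceDataset",
      "properties": {
        "type": "OdbcTable",
        "linkedServiceName": {
          "referenceName": "OdbcLinkedService",
          "type": "LinkedServiceReference"
        },
        "parameters": {
          "tableName": { "type": "string" }
        },
        "typeProperties": {
          "tableName": {
            "value": "@dataset().tableName",
            "type": "Expression"
          }
        }
      }
    }

The copy activity then supplies a value for tableName in its source dataset reference, so adding the second ODBC source is just a matter of passing a different parameter value (and, if the connection differs, parameterizing the linked service in a similar way).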

Find the number of files available in Azure data lake directory using azure data factory

I am working on a pipeline where our data sources are CSV files stored in Azure Data Lake. I was able to process all the files using the Get Metadata and ForEach activities. Now I need to find the number of files available in the Azure Data Lake directory. How can we achieve that? I couldn't find any item count argument in the Get Metadata activity. I have noticed that the input of the ForEach activity contains an itemsCount value. Is there any way to access this?
Regards,
Sandeep
Since the output of a Get Metadata activity's childItems field is a list of objects, why not just take the length of that list?
@{length(activity('Get Metadata1').output.childItems)}
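For example, assuming the Get Metadata activity is named 'Get Metadata1' and has 'Child items' in its field list, the count can be captured into a String pipeline variable with a Set Variable activity (a sketch; names are placeholders):

    {
      "name": "SetFileCount",
      "type": "SetVariable",
      "dependsOn": [
        {
          "activity": "Get Metadata1",
          "dependencyConditions": [ "Succeeded" ]
        }
      ],
      "typeProperties": {
        "variableName": "fileCount",
        "value": {
          "value": "@{length(activity('Get Metadata1').output.childItems)}",
          "type": "Expression"
        }
      }
    }

Because @{...} is string interpolation, the stored count is a string; wrap it with int() in later expressions if you need a numeric comparison.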

how to merge two csv files in azure data factory

I want to update the target CSV file (located in Azure Data Lake Store) with delta records that arrive every day (the delta file sits in blob storage). If an existing record has been updated, I want to update the same record in the target file; if a delta record is new, I want to append it to the target CSV file in Azure Data Lake Store. I want to implement this using Azure Data Factory, preferably with an ADF Data Flow.
I tried to do this using an Azure Data Factory Data Flow task, but I observed that while it is possible to create a new target file after the merge, I wasn't able to update the existing file.
Please let me know if there is any PowerShell or other way to update the target file.
We have a sample template that shows you how to update an existing file from a new file using ADF Data Flows. The file type is Parquet, but it will work for CSV as well.
Go to New > Pipeline from Template and look for "Parquet CRUD Operations". You can open up that Data Flow to see how it's done.

Azure data factory pipeline - copy blob and store filename in a DocumentDB or in Azure SQL

I set up 2 blob storage folders called "input" and "output". My pipeline gets triggered when a new file arrives in "input" and copies that file to the "output" folder. Furthermore, I have a Get Metadata activity that gives me the copied filename(s).
Now I would like to store the filename(s) of the copied data in a DocumentDB.
I tried to use the ForEach activity for this, but here I am stuck.
Basically I tried to use parts from this answer: Add file name as column in data factory pipeline destination
But I don't know what to assign as the Source in the Copy Data activity, since my source is the filenames from the ForEach activity - or am I wrong?
Based on your requirements, I suggest using a Blob Trigger Azure Function in combination with your current Azure Data Factory setup.
Step 1: keep using the event trigger in ADF to copy files from "input" to "output".
Step 2: point a Blob Trigger Azure Function at the "output" folder.
Step 3: the function is triggered as soon as a new file is created there. Get the file name and use the DocumentDB SDK to store it in DocumentDB.
.NET DocumentDB SDK: https://learn.microsoft.com/en-us/azure/cosmos-db/sql-api-sdk-dotnet
For blob trigger bindings, please refer to: https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob
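As a rough sketch of steps 2 and 3, a function.json along these lines wires a blob trigger on the "output" container to a Cosmos DB (DocumentDB) output binding, so the function body only needs to emit a document containing the captured file name. Binding property names vary slightly between extension versions, and the database, collection, and connection setting names here are placeholders:

    {
      "bindings": [
        {
          "name": "copiedBlob",
          "type": "blobTrigger",
          "direction": "in",
          "path": "output/{name}",
          "connection": "StorageConnectionSetting"
        },
        {
          "name": "outputDocument",
          "type": "cosmosDB",
          "direction": "out",
          "databaseName": "FileTracking",
          "collectionName": "CopiedFiles",
          "connectionStringSetting": "CosmosDbConnectionSetting"
        }
      ]
    }

The {name} token in the trigger path is bound to the blob's file name, so the function can write a document such as { "fileName": "<that name>" } without calling the SDK directly.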
You may try using a custom activity to insert the filenames into DocumentDB.
You can pass the filenames as parameters to the custom activity and write your own code to insert the data into DocumentDB.
https://learn.microsoft.com/en-us/azure/data-factory/transform-data-using-dotnet-custom-activity
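If you go the custom activity route, the activity definition would look roughly like this; it assumes an Azure Batch linked service (custom activities run on Azure Batch), and the command, names, and the expression passing the filenames through extendedProperties are placeholders to adapt:

    {
      "name": "StoreFileNamesInDocumentDb",
      "type": "Custom",
      "linkedServiceName": {
        "referenceName": "AzureBatchLinkedService",
        "type": "LinkedServiceReference"
      },
      "typeProperties": {
        "command": "StoreFileNames.exe",
        "extendedProperties": {
          "fileNames": {
            "value": "@string(activity('Get Metadata1').output.childItems)",
            "type": "Expression"
          }
        }
      }
    }

Your executable can then read the extendedProperties from the activity.json file that Data Factory places in the working directory and insert the names into DocumentDB using the SDK.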