Error browsing directory under ADLS Gen2 container for Azure Data Factory

I am creating a dataset in Azure Data Factory. The dataset will be a Parquet file inside a directory under a certain container in an ADLS Gen2 account. The container name is 'raw', and the directory I want to place the file into is source/system1/FullLoad. When I click Browse next to File path, I can access the container, but I cannot access the directory. When I click the 'source' folder, I get the error shown below.
How can I drill down to the desired directory? As the error message indicates, I suspect it has something to do with permissions to access the data (the Parquet file doesn't exist yet, as it will be used as a sink in a copy activity that hasn't been run yet), but I don't know how to resolve it.

Thanks for confirming. Posting the resolution here in case anyone else faces this issue.
The user or managed identity your data factory uses needs the Storage Blob Data Contributor role on the storage account. You can check this from the Azure portal: go to your storage account, navigate to the container and then the directory, click Access Control (IAM) in the left panel, and review the role assignments. If the role is missing, add a Storage Blob Data Contributor role assignment for your managed identity.
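To double-check the data-plane access outside ADF, here is a minimal Python sketch, assuming the azure-identity and azure-storage-file-datalake packages and a placeholder account name, that lists the same directory with the identity in question; an AuthorizationPermissionMismatch (403) error here confirms the missing role assignment.

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# DefaultAzureCredential picks up the signed-in user or the managed identity.
credential = DefaultAzureCredential()

# "mystorageaccount" is a placeholder; container and directory names are from the question.
service = DataLakeServiceClient(
    account_url="https://mystorageaccount.dfs.core.windows.net",
    credential=credential,
)
file_system = service.get_file_system_client("raw")

# Listing paths needs a data-plane role such as Storage Blob Data Contributor;
# an AuthorizationPermissionMismatch error here means the role assignment is missing.
for path in file_system.get_paths(path="source/system1/FullLoad"):
    print(path.name)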

Related

How to copy blob file to SAS URL in a Synapse pipeline

I have a zipped blob file in my storage account, with a linked service and a binary dataset to use the file as the source in a copy activity. An outside service I call in a web activity returns a writable SAS URL to a different storage account, in this format:
https://foo.blob.core.windows.net/dmf/43de9fb6-3b96-4f47-b730-eb8de040859dblah.zip?sv=2014-02-14&sr=b&sig=0mgvh25htg45b5u4ty5E%2Bf0ahMwFkHVy3iTC2nh%2FIKw%3D&st=2022-08-13T02%3A19%3A33Z&se=2022-08-13T02%3A54%3A33Z&sp=rw
I tried adding a SAS-based Azure Blob linked service: I added a parameter for the URI on the linked service, then added a dataset bound to it with a parameter for the URI as well, and I pass the SAS URI dynamically all the way down to the linked service. The copy fails each time with "The remote server returned an error: (403)". I have to be doing something wrong but I'm not sure what it is. I'd appreciate any input, thanks.
I tried to reproduce the same scenario in my environment and got the same error.
To resolve the 403 error, allow access from all networks in the storage account's networking settings, and also check whether the Storage Blob Data Contributor role has been assigned. If not, go to the Azure Storage account -> Access control (IAM) -> + Add role assignment and assign Storage Blob Data Contributor.
Now it's working.
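One quick way to isolate the problem is to test the SAS URL outside the pipeline. The sketch below, assuming the azure-storage-blob package (the URL is a placeholder), uploads a small test blob straight to the SAS URL; if this also returns 403, the token permissions/expiry or the storage firewall is at fault rather than the linked service.

from azure.storage.blob import BlobClient

# Placeholder: paste the full writable SAS URL returned by the web activity.
sas_url = "https://foo.blob.core.windows.net/dmf/test.zip?<sas-token>"

# Upload a tiny test blob straight to the SAS URL. A 403 here means the token
# (sp=rw, expiry window) or the storage account firewall is the problem, not ADF.
blob = BlobClient.from_blob_url(sas_url)
blob.upload_blob(b"test payload", overwrite=True)
print("SAS URL is writable")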

Azure Data Factory - Batch Accounts - BlobAccessDenied

I'm trying to use a custom activity in Data Factory to execute, on a Batch account pool, a Python batch script stored in blob storage.
I followed the Microsoft tutorial https://learn.microsoft.com/en-us/azure/batch/tutorial-run-python-batch-azure-data-factory
My problem is that when I execute the ADF pipeline, the activity fails:
When I check in the Batch Explorer tool, I get this BlobAccessDenied message:
Depending on the execution, it happens for all the ADF reference files and also for my batch file.
I have linked the storage account to the Batch account.
I'm new to this and I'm not sure what I must do to solve this.
Thank you in advance for your help.
I tried to reproduce the issue and it is working fine for me.
Please check the following points while creating the pipeline.
Check whether you have pasted the storage account connection string at line number 6 of the main.py file (a sketch of main.py follows these steps).
You need to create a Blob Storage and a Batch linked service in Azure Data Factory (ADF). These linked services are required in the "Azure Batch" and "Settings" tabs when configuring the ADF pipeline. Follow the steps below to create the linked services.
In the ADF portal, click the 'Manage' icon on the left and then click + New to create the Blob Storage linked service.
Search for "Azure Blob Storage" and then click Continue.
Fill in the required details for your storage account, test the connection, and then click Apply.
Similarly, search for the Azure Batch linked service (under the Compute tab).
Fill in the details of your Batch account, select the previously created storage linked service under "Storage linked service name", and then test the connection. Click Save.
Later, when you create the custom activity in your ADF pipeline, provide the Batch linked service name under the "Azure Batch" tab.
Under the "Settings" tab, provide the storage linked service name and the other required information. In "Folder Path", provide the blob folder that contains the main.py and iris.csv files.
Once this is done, you can validate, debug, publish, and trigger the pipeline. The pipeline should run successfully.
Once the pipeline has run successfully, you will see the iris_setosa.csv file in your output blob.
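For reference, here is a rough sketch of what the tutorial's main.py does, rewritten against the current azure-storage-blob package (the tutorial itself uses an older SDK); the connection string, container names and the iris column name are assumptions/placeholders.

from io import StringIO

import pandas as pd
from azure.storage.blob import BlobServiceClient

# Line 6 of the tutorial's main.py is where the storage connection string goes.
CONNECTION_STRING = "<storage-account-connection-string>"

service = BlobServiceClient.from_connection_string(CONNECTION_STRING)

# Download the input file that the "Folder Path" setting points at
# (container and blob names are placeholders).
input_blob = service.get_blob_client(container="input", blob="iris.csv")
csv_text = input_blob.download_blob().readall().decode("utf-8")
df = pd.read_csv(StringIO(csv_text))

# Keep only the setosa rows (the column name "species" is an assumption about the
# sample data) and write iris_setosa.csv back to the output container.
setosa = df[df["species"] == "setosa"]
output_blob = service.get_blob_client(container="output", blob="iris_setosa.csv")
output_blob.upload_blob(setosa.to_csv(index=False), overwrite=True)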

Can you use Azure Data Factory to extract data from an Excel workbook that has Azure Information Protection applied

I have an internal document (Excel) that has an Azure Information Protection / O365 unified sensitivity label applied to it.
I'm trying to extract that data, but I'm getting an encryption error because, rightly so, the information is encrypted.
The process:
The document is pulled from SharePoint into a blob storage container, and then Azure Data Factory picks up the file with a Copy activity and reads the contents into an Azure SQL Database.
Error message:
ErrorCode=EncryptedExcelIsNotSupported,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Encrypted excel file 'dummy file.xlsx' is not supported, please remove its password.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=ICSharpCode.SharpZipLib.Zip.ZipException,Message=Wrong Local header signature: 0xE011CFD0,Source=ICSharpCode.SharpZipLib,'
I have a linked service using a service principal that can connect to the file, but previewing the data results in a message saying the file is encrypted.
I presume I would need to give permissions to the service principal, but I'm still stuck on what those would be.
I tried adding Azure Rights Management read/create in the API permissions, but that still hasn't worked.
Data Factory can't read file data that is protected by another service.
If you want to copy data from the encrypted files, you must have the permission to access (decrypt) them.
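If you want to confirm locally that the file really is a protected container rather than a plain workbook, note that the signature in the error (0xE011CFD0) is the OLE compound-file magic read as a little-endian integer. The sketch below (the file path is a placeholder) checks the first bytes of the file.

# Magic numbers: a plain .xlsx is a ZIP archive; an encrypted or label-protected
# workbook is wrapped in an OLE compound file, whose first four bytes (D0 CF 11 E0)
# are exactly the 0xE011CFD0 value reported in the ADF error.
MAGIC_ZIP = b"PK\x03\x04"
MAGIC_CFB = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"

# Placeholder path; point it at a local copy of the workbook.
with open("dummy file.xlsx", "rb") as f:
    header = f.read(8)

if header.startswith(MAGIC_ZIP):
    print("Plain .xlsx - ADF can read this")
elif header == MAGIC_CFB:
    print("Compound file container - the workbook is encrypted or label-protected")
else:
    print("Unrecognised header:", header.hex())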

Linked Service with self-hosted integration runtime is not supported in data flow in Azure Data Factory

Steps to reproduce:
I first created a Copy Data activity in the pipeline to simply transfer CSV files from an Azure VM to Azure Blob Storage. I always use IRPOC1 as the integration runtime for the connection, and I connect to my Blob Storage using a SAS URI and SAS token.
After validating and running the first Copy Data activity, I successfully transferred the CSV files from my VM to Blob Storage.
I then tried to add a new Data Flow after the Copy Data activity.
In my Data Flow, the source is the Blob Storage containing the CSV files transferred from the VM, and the sink is my Azure SQL Database, with a successful connection.
However, when I ran validation, I got this error message on my Data Flow source:
Linked Service with self-hosted integration runtime is not supported in data flow.
I saw someone reply on a Microsoft Azure documentation issue on GitHub that I need to use Copy Data to transfer the data to Blob first, and then use this blob as the data flow source. This is what I did, but I still have the same error. Could you please let me know how I can fix this?
The Data Flow source dataset must use a Linked Service that uses an Azure IR, not a self-hosted IR.
Go to the dataset in your Data Flow source and click "Open". On the dataset page, click "Edit" next to the linked service.
In the linked service dialog, make sure you are using an Azure integration runtime, not a self-hosted IR.
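If you are not sure which linked services are bound to a self-hosted IR, a sketch like the one below, assuming the azure-identity and azure-mgmt-datafactory packages (subscription, resource group and factory names are placeholders), lists each linked service with the integration runtime it references.

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Print each linked service and the integration runtime it references; entries
# without an explicit connect_via use the default Azure IR (AutoResolveIntegrationRuntime).
for ls in client.linked_services.list_by_factory("<resource-group>", "<factory-name>"):
    connect_via = getattr(ls.properties, "connect_via", None)
    ir_name = connect_via.reference_name if connect_via else "AutoResolveIntegrationRuntime"
    print(f"{ls.name}: {ir_name}")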

Unable to transfer GCS bucket from one account to another

I am trying to create a transfer job in Data Transfer to copy all files in a bucket belonging to one account to an existing bucket belonging to another account.
I have access to both the source and destination buckets and get a "green light" in the wizard, but when I try to run the transfer job I get the following error message:
To complete this transfer, you need the 'storage.buckets.setIamPolicy'
permission for the source bucket. Ask the bucket's administrator to
grant you the required permission and try again.
I have tried applying various roles to the user running the transfer job, but I can't figure out how to overcome this problem.
Can anyone help me on this?
The storage.buckets.setIamPolicy permission can be granted with either the roles/storage.legacyBucketOwner or the roles/iam.securityAdmin role. It may be needed to preserve the permissions applied to the source objects. (A minimal copy sketch follows the permissions list below.)
Permissions for copying an object:
storage.objects.create (for the destination bucket)
storage.objects.delete (for the destination bucket)
storage.objects.get (for the source object)
storage.objects.getIamPolicy (for the source object)
storage.objects.setIamPolicy (for the destination bucket)
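To illustrate, the sketch below performs the same bucket-to-bucket copy with the google-cloud-storage package (bucket names are placeholders); the identity running it, or the transfer service account, needs the object permissions listed above on the respective buckets.

from google.cloud import storage

client = storage.Client()

source_bucket = client.bucket("source-bucket")            # placeholder names
destination_bucket = client.bucket("destination-bucket")

# copy_blob needs storage.objects.get on the source object and
# storage.objects.create on the destination bucket.
for blob in client.list_blobs("source-bucket"):
    source_bucket.copy_blob(blob, destination_bucket, blob.name)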
Please see:
Cloud IAM > Documentation > Understanding roles
Cloud Storage > Documentation > Reference > Cloud IAM roles