Microsoft Purview Data Lake Gen2 scan fails to discover assets - azure-purview

I registered a Data Lake Gen2 resource and gave the Purview managed identity read access to the data lake storage account, as explained in detail in the Purview documentation. I then started a scan and made sure that the connection tested as valid.
Yet when I go to the next screen (Scope your scan) and try to open the container, I get the following access control error:
Purview scan Request failed with status code 403 Error: (3835) Failed to access the ADLS Gen2 storage with the Managed Identity.
It is no surprise that the scan does not find any assets other than the data lake itself.
What can be done to solve this problem so the scan registers all the assets in my data lake?
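For reference, the failing step is essentially a data-plane read of the container, which can be reproduced with a short script along the lines of the sketch below (the account and container names are placeholders); the Purview managed identity needs the equivalent data-plane permission on the storage account, typically the Storage Blob Data Reader role, rather than only control-plane Reader access.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Hypothetical account and container names - replace with your own.
service = DataLakeServiceClient(
    account_url="https://<storage-account>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
file_system = service.get_file_system_client("<container>")

# Listing paths exercises the same data-plane read permission the scan relies on.
for path in file_system.get_paths(recursive=False):
    print(path.name)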

Related

Delete a file in sharepoint using Azure Data Factory Delete Activity

I am trying to delete a file that is located in a SharePoint directory after a successful Copy activity. The Delete activity has the following properties:
Linked Service : HTTP
DataSet : Excel
Additional Header: #{concat('Authorization: Bearer ',activity('GetToken').output.access_token)}
Here, GetToken is the Web activity in ADF that obtains an access token for SharePoint.
When I run the pipeline, I get the error below:
Invalid delete activity payload with 'folderPath' that is required and cannot be empty.
I have no clue how to tackle this.
As per my understanding, you are trying to delete a file in SharePoint Online using Azure Data Factory.
Currently, the Delete activity in ADF only supports the data stores listed below, and SharePoint Online is not among them, which is why you are receiving the above error.
Azure Blob storage
Azure Data Lake Storage Gen1
Azure Data Lake Storage Gen2
Azure Files
File System
FTP
SFTP
Amazon S3
Amazon S3 Compatible Storage
Google Cloud Storage
Oracle Cloud Storage
HDFS
Image: Delete activity Supported Data stores
Ref: Delete activity supported data sources
As a workaround you may try exploring the HTTP connector, or you can use a custom activity and write your own code to delete files from SharePoint; a rough sketch of such a call follows.
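For illustration only, a custom activity (or an Azure Function called from the pipeline) could issue the delete through the SharePoint REST API along these lines; the site URL, file path, and the way the bearer token is obtained are assumptions that must be adapted to your tenant.
import requests

# Hypothetical values - replace with your own site and file path.
site_url = "https://contoso.sharepoint.com/sites/MySite"
file_path = "/sites/MySite/Shared Documents/report.xlsx"
access_token = "<bearer token, e.g. obtained the same way as in the GetToken web activity>"

# SharePoint REST API: delete a file addressed by its server-relative URL.
response = requests.post(
    site_url + "/_api/web/GetFileByServerRelativeUrl('" + file_path + "')",
    headers={
        "Authorization": "Bearer " + access_token,
        "Accept": "application/json;odata=verbose",
        "IF-MATCH": "*",            # ignore the file's ETag
        "X-HTTP-Method": "DELETE",  # the REST API tunnels DELETE over POST
    },
)
response.raise_for_status()
print("Deleted", file_path)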
Hope this info helps.

error browsing directory under ADLS Gen2 container for Azure Data Factory

I am creating a dataset in Azure Data Factory. This dataset will be a Parquet file within a directory under a certain container in an ADLS Gen2 account. The container name is 'raw', and the directory that I want to place the file into is source/system1/FullLoad. When I click on Browse next to File path, I am able to access the container, but I cannot access the directory. When I hit folder 'source', I get the error shown below.
How can I drill down to the desired directory? As the error message indicates, I suspect it is something to do with permissions to access the data (the Parquet file doesn't exist yet, as it will be used as a sink in a copy activity that hasn't been run yet), but I don't know how to resolve it.
Thanks for confirming. Posting the resolution here for anyone else who faces this issue.
The user or managed identity you are using for your data factory should have Storage Blob Data Contributor access on the storage account. You can check this from the Azure portal: go to your storage account, navigate to the container and then the directory, click Access Control (IAM) in the left panel, and check the role assignments. If it is missing, add a Storage Blob Data Contributor role assignment for your managed identity; a scripted sketch of doing so is below.
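The sketch uses the Azure SDK for Python; the subscription, resource group, storage account, and the managed identity's object ID are placeholders, and the same assignment can of course be made in the portal as described above.
from uuid import uuid4
from azure.identity import DefaultAzureCredential
from azure.mgmt.authorization import AuthorizationManagementClient

# Hypothetical identifiers - replace with your own values.
subscription_id = "<subscription-id>"
scope = ("/subscriptions/" + subscription_id +
         "/resourceGroups/<resource-group>"
         "/providers/Microsoft.Storage/storageAccounts/<storage-account>")
principal_id = "<object ID of the data factory's managed identity>"

# Built-in role definition ID of Storage Blob Data Contributor.
role_definition_id = ("/subscriptions/" + subscription_id +
                      "/providers/Microsoft.Authorization/roleDefinitions/"
                      "ba92f5b4-2d11-453d-a403-e96b0029c9fe")

client = AuthorizationManagementClient(DefaultAzureCredential(), subscription_id)
client.role_assignments.create(
    scope,
    str(uuid4()),  # each role assignment needs a new GUID as its name
    {
        "role_definition_id": role_definition_id,
        "principal_id": principal_id,
        "principal_type": "ServicePrincipal",  # managed identities are service principals
    },
)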

Linked Service with self-hosted integration runtime is not supported in data flow in Azure Data Factory

Steps to reproduce:
I first created a Copy Data activity in the pipeline to simply transfer CSV files from an Azure VM to Azure Blob storage. I always use IRPOC1 as the integration runtime for the connection, and I connect to my Blob storage using a SAS URI and SAS token.
After validating and running my first Copy Data activity, the CSV files are successfully transferred from my VM to Blob storage.
I tried to add a new Data Flow after the Copy Data activity
In my Data Flow, my source is the Blob storage containing the CSV files transferred from the VM, and my sink is my Azure SQL Database, with a successful connection.
However, when I ran validation, I got the error message on my Data Flow Source:
Linked Service with self-hosted integration runtime is not supported in data flow.
I saw someone reply on a Microsoft Azure documentation issue on GitHub that I need to use Copy Data to transfer the data to Blob storage first and then use that blob as the source. This is what I did, but I still get the same error. Could you please let me know how I can fix this?
The Data Flow source dataset must use a Linked Service that uses an Azure IR, not a self-hosted IR.
Go to the dataset in your data flow Source, click "Open". In the dataset page, click "Edit" next to Linked Service.
In the Linked Service dialog, make sure you are using an Azure Integration Runtime (for example, AutoResolveIntegrationRuntime), not a Self-hosted IR; a scripted sketch of such a linked service follows.
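This sketch uses the azure-mgmt-datafactory package to recreate the Blob storage linked service so that it resolves to the Azure AutoResolveIntegrationRuntime instead of the self-hosted IRPOC1; the subscription, resource group, factory name, linked service name, and SAS URI are placeholders, so treat it as an illustration rather than a drop-in fix.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureBlobStorageLinkedService,
    IntegrationRuntimeReference,
    LinkedServiceResource,
    SecureString,
)

# Hypothetical names - replace with your own subscription, resource group and factory.
client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

blob_ls = AzureBlobStorageLinkedService(
    sas_uri=SecureString(value="<SAS URI of the Blob storage account>"),
    # Point the linked service at the Azure IR (or omit connect_via entirely,
    # which defaults to it) so Data Flows can use datasets based on it.
    connect_via=IntegrationRuntimeReference(reference_name="AutoResolveIntegrationRuntime"),
)

client.linked_services.create_or_update(
    "<resource-group>",
    "<data-factory-name>",
    "BlobStorageForDataFlow",
    LinkedServiceResource(properties=blob_ls),
)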

Databricks fails accessing a Data Lake Gen1 while trying to enumerate a directory

I am using (well... trying to use) Azure Databricks and I have created a notebook.
I would like the notebook to connect to my Azure Data Lake (Gen1) and transform the data. I followed the documentation and put the code in the first cell of my notebook:
spark.conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
spark.conf.set("dfs.adls.oauth2.client.id", "**using the application ID of the registered application**")
spark.conf.set("dfs.adls.oauth2.credential", "**using one of the registered application keys**")
spark.conf.set("dfs.adls.oauth2.refresh.url", "https://login.microsoftonline.com/**using my-tenant-id**/oauth2/token")
dbutils.fs.ls("adl://**using my data lake uri**.azuredatalakestore.net/tenantdata/events")
The execution fails with this error:
com.microsoft.azure.datalake.store.ADLException: Error enumerating directory /
Operation null failed with exception java.io.IOException : Server returned HTTP response code: 400 for URL: https://login.microsoftonline.com/using my-tenant-id/oauth2/token
Last encountered exception thrown after 5 tries. [java.io.IOException,java.io.IOException,java.io.IOException,java.io.IOException,java.io.IOException]
[ServerRequestId:null]
at com.microsoft.azure.datalake.store.ADLStoreClient.getExceptionFromResponse(ADLStoreClient.java:1169)
at com.microsoft.azure.datalake.store.ADLStoreClient.enumerateDirectoryInternal(ADLStoreClient.java:558)
at com.microsoft.azure.datalake.store.ADLStoreClient.enumerateDirectory(ADLStoreClient.java:534)
at com.microsoft.azure.datalake.store.ADLStoreClient.enumerateDirectory(ADLStoreClient.java:398)
at com.microsoft.azure.datalake.store.ADLStoreClient.enumerateDirectory(ADLStoreClient.java:384)
I have given the registered application the Reader role on the Data Lake:
Question
How can I allow Spark to access the Data Lake?
Update
I have granted both the tenantdata and events folders Read and Execute access:
The RBAC roles on the Gen1 lake do not grant access to the data (just to the resource itself), with the exception of the Owner role, which grants Super User access and therefore full data access.
You must grant access to the folders/files themselves using POSIX permissions (ACLs), either with Data Explorer in the portal or with Azure Storage Explorer; a scripted sketch follows the quote below.
This guide explains in detail how to do that: https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-access-control
Reference: https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-secure-data
Only the Owner role automatically enables file system access. The Contributor, Reader, and all other roles require ACLs to enable any level of access to folders and files.
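For illustration, applying those ACLs from code could look roughly like this; it uses the azure-datalake-store Python package, and the store name, the service principal's object ID, and the exact paths are assumptions to adapt (the portal's Data Explorer achieves the same result).
from azure.datalake.store import core, lib

# Hypothetical values - replace with your tenant, app registration, and store name.
tenant_id = "<my-tenant-id>"
client_id = "<application ID of the registered application>"
client_secret = "<one of the registered application keys>"
store_name = "<my data lake name>"
# ACLs are granted to the service principal's object ID, not the application ID.
sp_object_id = "<object ID of the registered application's service principal>"

token = lib.auth(tenant_id=tenant_id, client_id=client_id, client_secret=client_secret)
adl = core.AzureDLFileSystem(token, store_name=store_name)

# Execute (--x) is needed on every parent folder along the path,
# Read + Execute (r-x) on the folder that Spark will actually enumerate.
adl.modify_acl_entries("/", acl_spec="user:" + sp_object_id + ":--x")
adl.modify_acl_entries("/tenantdata", acl_spec="user:" + sp_object_id + ":--x")
adl.modify_acl_entries("/tenantdata/events", acl_spec="user:" + sp_object_id + ":r-x")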

Copy data from Data Lake Storage to a database in Azure Analysis Services using Copy activity

Is there any way to copy data from Azure Data Lake Storage to a database in Azure Analysis Services using Azure Data Factory?
I am trying to use the Copy activity for this task, but I don't know how to specify the Analysis Services database as the destination in the output dataset.
For data connectors that are not in the Data Factory support list, you can write a custom activity to access your data; a sketch of one approach follows the link below.
Here is the doc: https://learn.microsoft.com/en-us/azure/data-factory/data-factory-use-custom-activities
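For illustration only: since Analysis Services normally pulls data from its sources rather than having data pushed into it, one possible shape for that custom code is to trigger a model refresh through the Azure Analysis Services asynchronous refresh REST API once the data lake files are in place; the server region, server and model names, and the app registration used for the token are assumptions.
import requests
from azure.identity import ClientSecretCredential

# Hypothetical values - replace with your own tenant, app registration, server and model.
credential = ClientSecretCredential("<tenant-id>", "<client-id>", "<client-secret>")
token = credential.get_token("https://*.asazure.windows.net/.default").token

region = "<server-region>"      # the region part of asazure://<region>.asazure.windows.net/<server>
server_name = "<server-name>"
model_name = "<model-name>"

# Start a full refresh so the model re-reads its sources (e.g. files in the data lake).
response = requests.post(
    "https://" + region + ".asazure.windows.net/servers/" + server_name
    + "/models/" + model_name + "/refreshes",
    headers={"Authorization": "Bearer " + token, "Content-Type": "application/json"},
    json={"Type": "Full", "CommitMode": "transactional", "MaxParallelism": 2},
)
response.raise_for_status()
print("Refresh accepted, status URL:", response.headers.get("Location"))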
Thanks,
Charles