Databricks fails accessing a Data Lake Gen1 while trying to enumerate a directory - scala

I am using (well... trying to use) Azure Databricks and I have created a notebook.
I would like the notebook to connect to my Azure Data Lake (Gen1) and transform the data. I followed the documentation and put this code in the first cell of my notebook:
spark.conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
spark.conf.set("dfs.adls.oauth2.client.id", "**using the application ID of the registered application**")
spark.conf.set("dfs.adls.oauth2.credential", "**using one of the registered application keys**")
spark.conf.set("dfs.adls.oauth2.refresh.url", "https://login.microsoftonline.com/**using my-tenant-id**/oauth2/token")
dbutils.fs.ls("adl://**using my data lake uri**.azuredatalakestore.net/tenantdata/events")
The execution fails with this error:
com.microsoft.azure.datalake.store.ADLException: Error enumerating directory /
Operation null failed with exception java.io.IOException : Server returned HTTP response code: 400 for URL: https://login.microsoftonline.com/using my-tenant-id/oauth2/token
Last encountered exception thrown after 5 tries. [java.io.IOException,java.io.IOException,java.io.IOException,java.io.IOException,java.io.IOException] [ServerRequestId:null]
    at com.microsoft.azure.datalake.store.ADLStoreClient.getExceptionFromResponse(ADLStoreClient.java:1169)
    at com.microsoft.azure.datalake.store.ADLStoreClient.enumerateDirectoryInternal(ADLStoreClient.java:558)
    at com.microsoft.azure.datalake.store.ADLStoreClient.enumerateDirectory(ADLStoreClient.java:534)
    at com.microsoft.azure.datalake.store.ADLStoreClient.enumerateDirectory(ADLStoreClient.java:398)
    at com.microsoft.azure.datalake.store.ADLStoreClient.enumerateDirectory(ADLStoreClient.java:384)
I have given the registered application the Reader role on the Data Lake:
Question
How can I allow Spark to access the Data Lake?
Update
I have granted both the tenantdata and events folders Read and Execute access:

The RBAC roles on the Gen1 lake do not grant access to the data, only to the resource itself, with the exception of the Owner role, which grants Super User access and therefore full data access.
You must grant access to the folders/files themselves using POSIX permissions, either via Data Explorer in the Portal or with Azure Storage Explorer.
This guide explains in detail how to do that: https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-access-control
Reference: https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-secure-data
Only the Owner role automatically enables file system access. The Contributor, Reader, and all other roles require ACLs to enable any level of access to folders and files.
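Once the POSIX ACLs are in place (Read and Execute on every folder along the path, and Read on the files), the session-scoped spark.conf.set calls from the question should work as-is. As an alternative, the store can be mounted once so the credentials do not have to be repeated in every notebook. The snippet below is only a minimal sketch of that pattern, using the same placeholders as above:

// Minimal mount sketch for ADLS Gen1 (placeholders as in the question)
val configs = Map(
  "dfs.adls.oauth2.access.token.provider.type" -> "ClientCredential",
  "dfs.adls.oauth2.client.id" -> "<application-id>",
  "dfs.adls.oauth2.credential" -> "<application-key>",
  "dfs.adls.oauth2.refresh.url" -> "https://login.microsoftonline.com/<tenant-id>/oauth2/token"
)

dbutils.fs.mount(
  source = "adl://<data-lake-name>.azuredatalakestore.net/tenantdata",
  mountPoint = "/mnt/tenantdata",
  extraConfigs = configs
)

// Once mounted, the listing that previously failed becomes:
display(dbutils.fs.ls("/mnt/tenantdata/events"))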

Related

Facing issue while using synapsesql (####.dfs.core.windows.net not found)

I was working on connecting a dedicated SQL pool (formerly SQL DWH) to Synapse Spark notebooks using spark.read.synapsesql(). I am able to write data as a table but not able to read data from the table.
val df: DataFrame = spark.read.option(Constants.SERVER, "XXXXX.database.windows.net")
    .option(Constants.USER, "XXXXX")
    .option(Constants.PASSWORD, "XXXXX")
    .option(Constants.TEMP_FOLDER, "abfss://xxxxx@xxxx.dfs.core.windows.net/Tempfolder/")
    .synapsesql("dedicated-poc.dbo.customer")
com.microsoft.spark.sqlanalytics.SQLAnalyticsConnectorException: com.microsoft.sqlserver.jdbc.SQLServerException: External file access failed due to internal error: 'Error occurred while accessing HDFS: Java exception raised on call to HdfsBridge_Connect.
Java exception message: Configuration property XXXXXXXX.dfs.core.windows.net not found.'
    at com.microsoft.spark.sqlanalytics.ItemsScanBuilder$PlanInputPartitionsUtilities$.extractDataAndGetLocation(ItemsScanBuilder.scala:183)
Permissions: we have Owner and Storage Blob Data Contributor access for Synapse and for the specific user.
To resolve the above exception, please try the following:
Try updating the code by adding the configuration below (a combined Scala sketch follows the reference links at the end of this answer):
spark._jsc.hadoopConfiguration().set("fs.azure.account.key.xxxxx.dfs.core.windows.net", "xxxx==")
To read data from the table, try including a date data type in the SQL pool and then read.
Note:
Synapse RBAC roles do not grant permissions to create or manage SQL pools, Apache Spark pools, and Integration runtimes in Azure Synapse workspaces. Azure Owner or Azure Contributor roles on the resource group are required for these actions.
Give the Azure Owner role on the resource group rather than only to Synapse and the specific user.
Check whether any firewall rule is blocking the connectivity and disable it.
If the issue still persists, raise an Azure support request.
For more detail, please refer to the links below:
Azure Synapse RBAC roles - Azure Synapse Analytics | Microsoft Docs
azure databricks - File read from ADLS Gen2 Error - Configuration property xxx.dfs.core.windows.net not found - Stack Overflow
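For completeness, here is a minimal Scala sketch of the suggestion above. The connector imports shown are an assumption based on the Synapse Dedicated SQL pool connector; the server, credentials, storage account, and key are the same placeholders used in the question:

// Assumed imports for the Synapse Dedicated SQL pool connector
import org.apache.spark.sql.DataFrame
import com.microsoft.spark.sqlanalytics.utils.Constants
import org.apache.spark.sql.SqlAnalyticsConnector._

// Make the staging storage account key visible to the Hadoop layer before reading
// (placeholder account name and key)
spark.sparkContext.hadoopConfiguration
  .set("fs.azure.account.key.xxxxx.dfs.core.windows.net", "xxxx==")

val df: DataFrame = spark.read
  .option(Constants.SERVER, "XXXXX.database.windows.net")
  .option(Constants.USER, "XXXXX")
  .option(Constants.PASSWORD, "XXXXX")
  .option(Constants.TEMP_FOLDER, "abfss://xxxxx@xxxx.dfs.core.windows.net/Tempfolder/")
  .synapsesql("dedicated-poc.dbo.customer")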

Create Service Principal Connection from Crystal Reports to Azure Synapse Analytics

I have data held in an Azure Data Lake Gen 2 storage container. I would like to provision this data for an existing report authored in Crystal Reports using SQL on demand.
During development I used my own Azure AD login via an ODBC connection on my local machine. I have access to the Synapse environment and also the data lake. This worked successfully and although slow, pulled all information required.
To deploy this solution correctly I need to remove my AAD creds and use a provisioned service principal. I have given the service principal read access to the data lake and also added the principal to the SQL database. Now I am stuck on how to use the principal to connect from Crystal Reports.
I have tried the same authentication type as with my AAD login, but now I am using a client ID rather than an email, so when the system prompts for connection details it expects a sign-in and does not accept the client ID.
Does anyone have any suggestions on how to connect Crystal Reports this way, or any other way?
Also: my org wants this user/app registration to have only restricted permissions, so adding them to the "Synapse Administrator" RBAC role won't work.
Thanks
Tom
Found a way around this.
Create a service account user in the Azure Portal. Head to Synapse Analytics and open a blank SQL script to give the user minimal permissions.
USE [master]
CREATE LOGIN [serviceaccountsynapseuser@company.onmicrosoft.com] FROM EXTERNAL PROVIDER
GRANT CONNECT ANY DATABASE TO [serviceaccountsynapseuser@company.onmicrosoft.com]
GRANT SELECT ALL USER SECURABLES TO [serviceaccountsynapseuser@company.onmicrosoft.com]

USE [Reporting] -- the serverless SQL database
CREATE USER [serviceaccountsynapseuser@company.onmicrosoft.com] FROM EXTERNAL PROVIDER
ALTER ROLE db_datareader ADD MEMBER [serviceaccountsynapseuser@company.onmicrosoft.com]
Finally, head to the storage account and give the user the Storage Blob Data Reader role.
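From Crystal Reports, that service account can then be used through an ordinary SQL Server ODBC DSN with Azure AD password authentication. The fragment below is only a sketch; the driver name, the serverless endpoint format, and the placeholders are assumptions rather than details from the original post:

Driver={ODBC Driver 17 for SQL Server};
Server=<workspace-name>-ondemand.sql.azuresynapse.net;
Database=Reporting;
Authentication=ActiveDirectoryPassword;
UID=serviceaccountsynapseuser@company.onmicrosoft.com;
PWD=<service-account-password>;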

Can you extract data using Azure Data Factory from an Excel workbook that has Azure Information Protection applied?

I have an internal document (Excel) that has Azure Information Protection / O365 unified sensitivity labelling applied to it.
I'm trying to extract that data, but I'm getting an encryption error because, rightly so, the information is encrypted.
The process:
The document is pulled from SharePoint into a blob storage container, and then Azure Data Factory picks up the file using the Copy activity and reads the contents into an Azure SQL Database.
Error message:
ErrorCode=EncryptedExcelIsNotSupported,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Encrypted excel file 'dummy file.xlsx' is not supported, please remove its password.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=ICSharpCode.SharpZipLib.Zip.ZipException,Message=Wrong Local header signature: 0xE011CFD0,Source=ICSharpCode.SharpZipLib,'
I have a Linked Service using a Service principal that can connect to the file, but previewing the data results in a message saying the file is encrypted.
I presume I would need to give permissions to the Service Principal, but I'm still stuck on what those would be.
I tried adding Azure Rights Management read/create in the API permissions, but that still hasn't worked.
Data Factory can't read file data that is protected by another service.
If you want to copy data from encrypted files, you must have permission to access them.

Google Data Studio - "No Data Set Access" when displaying charts from PostgreSQL data source

I'm hitting the following error when trying to display graphs with any of my PostgreSQL data sources.
No Data Set Access
Insufficient permissions to the underlying data set.
Access denied, please check your username and password.
Request is missing required authentication credential. Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project.
I've whitelisted all Google Data Studio IPs on my PostgreSQL instance, and I have no issue adding the corresponding data source to my Google Data Studio report (Add data > PostgreSQL > Authenticate (using a PostgreSQL user) > Add), but every time I try to add a graph I get this error message.
Does anyone know what is going wrong here?
I was able to solve the issue by granting all privileges on all tables to the user I use to authenticate on Google Data Studio. You need to run the following SQL query as a superuser (such as postgres):
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO my_user;
Another option to solve the issue is authenticating with a superuser (such as postgres).
If you happen to be blocked by the errors appearing on the charts, I recommend adding the data with a SELECT * FROM my_table under "CUSTOM QUERY" instead of using "TABLES"; the error messages there are more explicit.

Unable to transfer GCS bucket from one account to another

I am trying to create a transfer job in Data Transfer, to copy all files in a bucket belonging to one account to an existing bucket belonging to another account.
I have access to both the source and destination buckets and get a "green light" in the wizard, but when I try to run the transfer job I get the following error message:
To complete this transfer, you need the 'storage.buckets.setIamPolicy'
permission for the source bucket. Ask the bucket's administrator to
grant you the required permission and try again.
I have tried to apply various roles to the user running the transfer job, but I can't figure out how to overcome this problem.
Can anyone help me on this?
The storage.buckets.setIamPolicy permission can be granted with either the roles/storage.legacyBucketOwner or the roles/iam.securityAdmin role. It may be needed so that the permissions applied to the source objects can be preserved.
Permissions for copying an object:
storage.objects.create (for the destination bucket)
storage.objects.delete (for the destination bucket)
storage.objects.get (for the source object)
storage.objects.getIamPolicy (for the source object)
storage.objects.setIamPolicy (for the destination bucket)
Please see:
Cloud IAM > Documentation > Understanding roles
Cloud Storage > Documentation > Reference > Cloud IAM roles
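If you prefer to grant the role programmatically rather than through the Console, a rough sketch with the google-cloud-storage Java client (used here from Scala) looks like the following; the bucket name and member e-mail are placeholders, and the caller must already be allowed to change the bucket's IAM policy:

import com.google.cloud.Identity
import com.google.cloud.storage.{StorageOptions, StorageRoles}

object GrantLegacyBucketOwner {
  def main(args: Array[String]): Unit = {
    val storage = StorageOptions.getDefaultInstance.getService

    val bucketName = "my-source-bucket"                          // placeholder
    val member     = Identity.user("transfer-user@example.com")  // placeholder

    // Read-modify-write the bucket's IAM policy, adding roles/storage.legacyBucketOwner,
    // which contains storage.buckets.setIamPolicy.
    val policy  = storage.getIamPolicy(bucketName)
    val updated = policy.toBuilder
      .addIdentity(StorageRoles.legacyBucketOwner(), member)
      .build()

    storage.setIamPolicy(bucketName, updated)
  }
}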