Facing issue while using synapsesql (####.dfs.core.windows.net not found) - azure-devops

I was working on connecting a dedicated SQL pool (formerly SQL DW) to Synapse Spark notebooks, using spark.read.synapsesql(). I'm able to write data to a table but not able to read data from the table.
import org.apache.spark.sql.DataFrame
import com.microsoft.spark.sqlanalytics.utils.Constants
import org.apache.spark.sql.SqlAnalyticsConnector._

val df: DataFrame = spark.read.option(Constants.SERVER, "XXXXX.database.windows.net")
.option(Constants.USER, "XXXXX")
.option(Constants.PASSWORD, "XXXXX")
.option(Constants.TEMP_FOLDER, "abfss://xxxxx@xxxx.dfs.core.windows.net/Tempfolder/")
.synapsesql("dedicated-poc.dbo.customer")
com.microsoft.spark.sqlanalytics.SQLAnalyticsConnectorException: com.microsoft.sqlserver.jdbc.SQLServerException: External file access failed due to internal error: 'Error occurred while accessing HDFS: Java exception raised on call to HdfsBridge_Connect.
Java exception message: Configuration property XXXXXXXX.dfs.core.windows.net not found.' at com.microsoft.spark.sqlanalytics.ItemsScanBuilder$PlanInputPartitionsUtilities$.extractDataAndGetLocation(ItemsScanBuilder.scala:183)
Permissions: we have Owner and Storage Blob Data Contributor access for both the Synapse workspace and the specific user.

To resolve the above exception, please try the following:
Try updating the code by adding the line below:
spark._jsc.hadoopConfiguration().set("fs.azure.account.key.xxxxx.dfs.core.windows.net", "xxxx==")
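Since the snippet in the question is Scala rather than PySpark, a minimal sketch of the same setting in Scala (reusing the placeholder account name and key from above) would be:
// Scala equivalent of the PySpark call above; account name and key are placeholders
spark.sparkContext.hadoopConfiguration
  .set("fs.azure.account.key.xxxxx.dfs.core.windows.net", "xxxx==")
The key should presumably be for the storage account referenced in Constants.TEMP_FOLDER, since that is the account the connector stages data through during the read.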
To read data from the table, try including a date data type in the SQL pool table and then read.
Note:
Synapse RBAC roles do not grant permissions to create or manage SQL pools, Apache Spark pools, and Integration runtimes in Azure Synapse workspaces. Azure Owner or Azure Contributor roles on the resource group are required for these actions.
Grant the Azure Owner role on the resource group instead of to the Synapse workspace and the specific user.
Check whether any firewall rule is blocking connectivity and disable it.
If the issue still persists, raise an Azure support request.
For more details, please refer to the links below:
Azure Synapse RBAC roles - Azure Synapse Analytics | Microsoft Docs
azure databricks - File read from ADLS Gen2 Error - Configuration property xxx.dfs.core.windows.net not found - Stack Overflow

Related

Cloud Data Fusion: Permission denied due to datastream.connectionProfiles.discover

I am trying to create a Cloud Data Fusion replication job from Oracle to BigQuery and am receiving the below error:
Failed to connect to the database due to below error:
io.grpc.StatusRuntimeException: PERMISSION_DENIED: Permission 'datastream.connectionProfiles.discover' denied on 'projects/<>/locations/us-central1/connectionProfiles'
I am following the steps mentioned in the official Google documentation.
I was able to grant the Datastream Admin role to the Dataproc service account.
The Cloud Data Fusion service account is not available on this project's IAM page, so I am not sure how to assign the Datastream Admin role to the Data Fusion service account.
Any help is appreciated.
Found it: we assign the roles in the way described above.

Azure Machine Learning workspace's storage account permission issue

I was working with the Azure ML CLI v2 to deploy a real-time endpoint with the command az ml online-deployment through an Azure pipeline. I had double-confirmed that the service connection used in this pipeline task had been granted the permissions below in the Azure Portal, but it still shows the same error.
ERROR: Error with code: You don't have permission to alter this storage account. Ensure that you have been assigned both Storage Blob Data Reader and Storage Blob Data Contributor roles.
Using the same service connection, we are able to create an online endpoint with az ml online-endpoint create in the same and in other workspaces.
The issue was resolved. I did not change anything in the service principal; running it on the second day with the same YAML got through the issue. I guess there was some permission propagation delay, though longer than usual.

Azure Data Factory Managed Identity connection to Databricks

I've just created new Azure Databricks and Azure Data Factory services inside my subscription.
For ADF, I've also created a system-assigned managed identity via Terraform. Then I added this managed identity to the owners of the Databricks workspace, and I also added the service principal to the admins inside the Databricks workspace (tried both via Terraform and via SCIM).
When I try adding a Databricks linked service to Data Factory, I always receive the error:
<title>Error 403 User not authorized.</title>
</head>
<body><h2>HTTP ERROR 403</h2>
<p>Problem accessing /api/2.0/clusters/get. Reason:
<pre> User not authorized.</pre></p>
SCIM shows that my application ID is in the admins group (SCIM response screenshot).
What am I doing wrong?
If one attempts to set up a linked service to a Databricks workspace without the correct role assignment in place, it will fail.
To grant the correct role assignment:
Grant the Contributor role to the managed identity. The managed identity in this instance is the name of the Data Factory that the Databricks linked service will be created on.
The "Contributor" role assignment can be granted on the Databricks workspace via the Azure Portal.
This should resolve your issue.

Azure Synapse analytics connectivity with Tableau

I have created a serverless Synapse Analytics SQL pool with a database and a table in it.
I have tried using a SQL query to view the data within Synapse Analytics and can view it as expected. But when I try to connect Tableau Desktop version 2020.2.9 (the connector is only available in version 2020.2 and above) using the connector provided in Tableau, the connection is established successfully and I can see the list of databases and tables; however, when I click on the table to view its data, the issue below pops up.
An error occurred while communicating with Azure Synapse Analytics
Unable to connect to the server. Check that the server is running and that you have access privileges to the requested database.
Error Code: 2F0F5E42
[Microsoft][ODBC SQL Server Driver][SQL Server]External table 'dbo' is not accessible because location does not exist or it is used by another process.
The table "[dbo].[Diagnosis]" does not exist.
You will have to check what type of Synapse SQL pool you are using, whether a standard dedicated pool or an on-demand (serverless) pool. Further, you can reach out to the Tableau support team to check the connectivity based on your pool type.

Databricks fails accessing a Data Lake Gen1 while trying to enumerate a directory

I am using (well... trying to use) Azure Databricks and I have created a notebook.
I would like the notebook to connect to my Azure Data Lake (Gen1) and transform the data. I followed the documentation and put the code in the first cell of my notebook:
spark.conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
spark.conf.set("dfs.adls.oauth2.client.id", "**using the application ID of the registered application**")
spark.conf.set("dfs.adls.oauth2.credential", "**using one of the registered application keys**")
spark.conf.set("dfs.adls.oauth2.refresh.url", "https://login.microsoftonline.com/**using my-tenant-id**/oauth2/token")
dbutils.fs.ls("adl://**using my data lake uri**.azuredatalakestore.net/tenantdata/events")
The execution fails with this error:
com.microsoft.azure.datalake.store.ADLException: Error enumerating directory /
Operation null failed with exception java.io.IOException : Server returned HTTP response code: 400 for URL: https://login.microsoftonline.com/using my-tenant-id/oauth2/token
Last encountered exception thrown after 5 tries. [java.io.IOException,java.io.IOException,java.io.IOException,java.io.IOException,java.io.IOException]
[ServerRequestId:null]
at com.microsoft.azure.datalake.store.ADLStoreClient.getExceptionFromResponse(ADLStoreClient.java:1169)
at com.microsoft.azure.datalake.store.ADLStoreClient.enumerateDirectoryInternal(ADLStoreClient.java:558)
at com.microsoft.azure.datalake.store.ADLStoreClient.enumerateDirectory(ADLStoreClient.java:534)
at com.microsoft.azure.datalake.store.ADLStoreClient.enumerateDirectory(ADLStoreClient.java:398)
at com.microsoft.azure.datalake.store.ADLStoreClient.enumerateDirectory(ADLStoreClient.java:384)
I have given the registered application the Reader role on the Data Lake:
Question
How can I allow Spark to access the Data Lake?
Update
I have granted both the tenantdata and events folders Read and Execute access:
The RBAC roles on the Gen1 lake do not grant access to the data (just to the resource itself), with the exception of the Owner role, which grants Super User access and therefore full data access.
You must grant access to the folders/files themselves, using Data Explorer in the Portal or Azure Storage Explorer, via POSIX permissions (ACLs).
This guide explains the details of how to do that: https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-access-control
Reference: https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-secure-data
Only the Owner role automatically enables file system access. The Contributor, Reader, and all other roles require ACLs to enable any level of access to folders and files.
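If you prefer to script the ACL grants rather than use Data Explorer or Storage Explorer, a rough Scala sketch using the azure-data-lake-store Java SDK (the same client library that appears in the stack trace) could look like the following. The tenant ID, application credentials, service-principal object ID, and the exact permission level are placeholders/assumptions for illustration, not values from the question:
import java.util.Arrays
import com.microsoft.azure.datalake.store.ADLStoreClient
import com.microsoft.azure.datalake.store.acl.{AclAction, AclEntry, AclScope, AclType}
import com.microsoft.azure.datalake.store.oauth2.ClientCredsTokenProvider

// OAuth2 client-credentials token for the registered application (placeholders)
val tokenProvider = new ClientCredsTokenProvider(
  "https://login.microsoftonline.com/<tenant-id>/oauth2/token", // token endpoint
  "<application-id>",                                           // client id
  "<application-key>")                                          // client secret

val client = ADLStoreClient.createClient("<datalake>.azuredatalakestore.net", tokenProvider)

// Grant read+execute on the folders the notebook needs to list.
// <service-principal-object-id> is the AAD object ID of the registered application (hypothetical value).
val aclSpec = Arrays.asList(
  new AclEntry(AclScope.ACCESS, AclType.USER, "<service-principal-object-id>", AclAction.READ_EXECUTE))
client.modifyAclEntries("/tenantdata", aclSpec)
client.modifyAclEntries("/tenantdata/events", aclSpec)
Per the access-control guide linked above, the principal also needs Execute on every parent folder in the path, and Read+Execute on the items being enumerated (or default ACLs on the parent), for directory listing to succeed.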