Azure Synapse Exception while reading table from synapse Dwh - pyspark

While reading from table I'm getting
jdbc.SQLServerException : Create External Table As Sect statement failed as the path ####### could not be used for export.
Error Code :105005

jdbc.SQLServerException : Create External Table As Sect statement
failed as the path ####### could not be used for export. Error Code
:105005
This error occurs because of PolyBase can't complete the operation. The operation failure can be due to the following reasons :
Network failure when you try to access the Azure blob storage
The configuration of the Azure storage account.
You can fix this issue by following this article it helps you resolve the problem that occurs when you do a CREATE EXTERNAL TABLE AS SELECT.
For more in detail, please refer below links:
https://learn.microsoft.com/en-us/troubleshoot/sql/analytics-platform-system/error-cetas-to-blob-storage
https://www.sqlservercentral.com/articles/access-external-data-from-azure-synapse-analytics-using-polybase
https://knowledge.informatica.com/s/article/000175628?language=en_US

Related

ADF Copy data from Azure Data Bricks Delta Lake to Azure Sql Server

I'm trying to use the data copy activity to extract information from azure databricks delta lake, but I've noticed that it doesn't pass the information directly from the delta lake to the SQL server I need, but must pass it to an azure blob storage, when running it, it throws the following error
ErrorCode=AzureDatabricksCommandError,Hit an error when running the command in Azure Databricks. Error details: Failure to initialize configurationInvalid configuration value detected for fs.azure.account.key Caused by: Invalid configuration value detected for fs.azure.account.key
Looking for information I found a possible solution but it didn't work.
Invalid configuration value detected for fs.azure.account.key copy activity fails
Does anyone have any idea how the hell to pass information from an azure databricks delta lake table to a table in Sql Server??
These are some images of the structure that I have in ADF:
In the image I get a message that tells me that I must have a Storage Account to continue
These are the configuration images, and execution failed:
Conf:
Fail:
Thank you very much
The solution for this problem was the following:
Correct the way the Storage Access Key configuration was being defined:
in the instruction: spark.hadoop.fs.azure.account.key..blob.core.windows.net
The following change must be made:
spark.hadoop.fs.azure.account.key.
storageaccountname.dfs.core.windows.net
Does anyone have any idea how the hell to pass information from an azure databricks delta lake table to a table in Sql Server??
To achieve Above scenario, follow below steps:
First go to your Databricks cluster Edit it and under Advance options >> spark >> spark config Add below code if you are using blob storage.
spark.hadoop.fs.azure.account.key.<storageaccountname>.blob.core.windows.net <Accesskey>
spark.databricks.delta.optimizeWrite.enabled true
spark.databricks.delta.autoCompact.enabled true
After that as you are using SQL Database as a sink.
Enable staging and give same blob storage account linked service as Staging account linked service give storage path from your blob storage.
And then debug it. make sure you complete Prerequisites from official document.
My sample Input:
Output in SQL:

Synapse suddenly started having problem about having hash distribution column in a Merge

I started geting error at 06-25-2022 in my Fact table flows. Before that there was no problem and nothing has changed.
The Error is:
Operation on target Fact_XX failed: Operation on target Merge_XX failed: Execution fail against sql server. Sql error number: 100090. Error Message: Updating a distribution key column in a MERGE statement is not supported.
Sql error number: 100090. Error Message: Updating a distribution key column in a MERGE statement is not supported.
You got this error because updating a distribution key column through MERGE command is not supported in azure synapse currently.
The MERGE is currently in preview for Azure Synapse Analytics.
You can refer to the official documentation for more details about this as given below: -
https://learn.microsoft.com/en-us/sql/t-sql/statements/merge-transact-sql?view=azure-sqldw-latest&preserve-view=true
It clearly states that The MERGE command in Azure Synapse Analytics, which is presently in preview, may under certain conditions, leave the target table in an inconsistent state, with rows placed in the wrong distribution, causing later queries to return wrong results in some cases.

Azure Data Factory - source dataset fails with "path does not resolve to any file(s)" when sink to a different directory is in progress

We have an ADF pipeline with Copy activity to transfer data from Azure Table Storage to a JSON file in an Azure Blob Storage container. When the data transfer is in progress, other pipelines that use this dataset as a source fail with the following error "Job failed due to reason: Path does not resolve to any file(s)".
The dataset has a property that indicates the container directory. This property is populated by the trigger time of the pipeline copying the data, so it writes to a different directory in each run. The other failing pipelines use a directory corresponding to an earlier run of the pipeline copying the data and I have confirmed that the path does exist.
Anyone knows why this is happening and how to solve it?
Probably your expression in directory and file textbox inside the dataset is not correct.
Check this link : Azure data flow not showing / in path to data source

Azure Data Factory CICD error: The document creation or update failed because of invalid reference

All, when running a build pipeline using Azure Devops with ARM template, the process is consistently failing when trying to deploy a dataset or a reference to a dataset with this error:
ARM Template deployment: Resource Group scope (AzureResourceManagerTemplateDeployment)
BadRequest: The document creation or update failed because of invalid reference 'dataset_1'.
I've tried renaming the dataset and also recreating it to see if that would help.
I then deleted the dataset_1.json file from the repo and still get the same message so it's some reference to this dataset and not the dataset itself I think. I've looked through all the other files for references to this but they all look fine.
Any ideas on how to troubleshoot this?
thanks
try this
Looks like you have created 'myTestLinkedService' linked service, tested connection but haven't published it yet and trying to reference that linked service in the new dataset that you are trying to create using Powershell.
In order to reference any data factory entity from Powershell, please make sure those entities are published first. Please try publishing the linked service first from the portal and then try to run your Powershell script to create the new dataset/actvitiy.
I think I found the issue. When I went into the detailed logs I found that in addition to this error there was an error message about an invalid SQL connection string, so I though it may be related since the dataset in question uses Azure SQL database linked service.
I adjusted the connection string and this seems to have solved the issue.

External Table on DELTA format files in ADLS Gen 1

We have number of databricks DELTA tables created on ADLS Gen1. and also, there are external tables built on top each of those tables in one of the databricks workspace.
similarly, I am trying to create same sort of external tables on the same DELTA format files,but in different workspace.
I do have read only access via Service principle on ADLS Gen1. So I can read DELTA files through spark data-frames, as in given below:
read_data_df = spark.read.format("delta").load('dbfs:/mnt/data/<foldername>')
I can even able to create hive external tables, but I do see following warning while reading data from the same table:
Error in SQL statement: AnalysisException: Incompatible format detected.
A transaction log for Databricks Delta was found at `dbfs:/mnt/data/<foldername>/_delta_log`,
but you are trying to read from `dbfs:/mnt/data/<foldername>` using format("hive"). You must use
'format("delta")' when reading and writing to a delta table.
To disable this check, SET spark.databricks.delta.formatCheck.enabled=false
To learn more about Delta, see https://learn.microsoft.com/azure/databricks/delta/index
;
If I create external table 'using DELTA', then I see a different access error as in:
Caused by: org.apache.hadoop.security.AccessControlException:
OPEN failed with error 0x83090aa2 (Forbidden. ACL verification failed.
Either the resource does not exist or the user is not authorized to perform the requested operation.).
failed with error 0x83090aa2 (Forbidden. ACL verification failed.
Either the resource does not exist or the user is not authorized to perform the requested operation.).
Does it mean that I would need full access, rather just READ ONLY?, on those underneath file system?
Thanks
Resolved after upgrading to Databricks Runtime environment to runtime version DBR-7.3.