Azure Data Factory - source dataset fails with "path does not resolve to any file(s)" when sink to a different directory is in progress

We have an ADF pipeline with a Copy activity that transfers data from Azure Table Storage to a JSON file in an Azure Blob Storage container. While the data transfer is in progress, other pipelines that use this dataset as a source fail with the following error: "Job failed due to reason: Path does not resolve to any file(s)".
The dataset has a property that indicates the container directory. This property is populated from the trigger time of the pipeline copying the data, so each run writes to a different directory. The other, failing pipelines use a directory corresponding to an earlier run of the copying pipeline, and I have confirmed that the path does exist.
Does anyone know why this is happening and how to solve it?

The expression in the directory and file text boxes inside the dataset is probably not correct.
Check this link: Azure data flow not showing / in path to data source
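For comparison, a trigger-time-driven directory is usually wired through a dataset parameter, roughly like this (a sketch: the container name "data", the parameter name "outputFolder", and the date format are assumptions, not values from the question). In the dataset's location:
"location": {
    "type": "AzureBlobStorageLocation",
    "container": "data",
    "folderPath": { "value": "@dataset().outputFolder", "type": "Expression" }
}
And in the pipeline that uses the dataset, the parameter value:
"outputFolder": { "value": "@formatDateTime(pipeline().TriggerTime, 'yyyy/MM/dd/HH')", "type": "Expression" }
A stray or missing slash in the resolved folder path is exactly the kind of mismatch the linked question describes, and it can surface as the "Path does not resolve to any file(s)" error.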

Related

ADF Copy data from Azure Databricks Delta Lake to Azure SQL Server

I'm trying to use the Copy data activity to extract information from an Azure Databricks Delta Lake, but I've noticed that it doesn't pass the information directly from the Delta Lake to the SQL Server I need; it must first pass it through an Azure Blob Storage account. When I run it, it throws the following error:
ErrorCode=AzureDatabricksCommandError,Hit an error when running the command in Azure Databricks. Error details: Failure to initialize configurationInvalid configuration value detected for fs.azure.account.key Caused by: Invalid configuration value detected for fs.azure.account.key
Looking for information, I found a possible solution, but it didn't work:
Invalid configuration value detected for fs.azure.account.key copy activity fails
Does anyone have any idea how to pass information from an Azure Databricks Delta Lake table to a table in SQL Server?
Here are some images of the structure I have in ADF: one shows a message telling me that I must have a Storage Account to continue, and the others show the configuration and the failed execution.
Thank you very much
The solution for this problem was to correct the way the Storage Access Key configuration was defined. The setting
spark.hadoop.fs.azure.account.key.<storageaccountname>.blob.core.windows.net
must be changed to
spark.hadoop.fs.azure.account.key.<storageaccountname>.dfs.core.windows.net
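Put together, the corrected entry in the cluster's Spark config would look something like the line below (the account name and the access-key value are placeholders, not values from the post; the key name and its value are separated by a space, as in the answer further down):
spark.hadoop.fs.azure.account.key.mystorageaccount.dfs.core.windows.net <storage-account-access-key>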
To pass information from an Azure Databricks Delta Lake table to a table in SQL Server, follow the steps below.
First, go to your Databricks cluster, edit it, and under Advanced options >> Spark >> Spark config, add the configuration below if you are using Blob storage:
spark.hadoop.fs.azure.account.key.<storageaccountname>.blob.core.windows.net <Accesskey>
spark.databricks.delta.optimizeWrite.enabled true
spark.databricks.delta.autoCompact.enabled true
After that, since you are using SQL Database as the sink, enable staging in the Copy activity settings: use the same Blob storage account's linked service as the staging account linked service, and provide a storage path in your Blob storage.
Then debug it. Make sure you complete the prerequisites from the official document.
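For reference, a minimal sketch of the Copy activity's staging settings in the pipeline JSON, assuming a Blob storage linked service named AzureBlobStorageLS and a staging folder named staging (both names are assumptions, not values from the post):
"enableStaging": true,
"stagingSettings": {
    "linkedServiceName": { "referenceName": "AzureBlobStorageLS", "type": "LinkedServiceReference" },
    "path": "staging"
}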
Screenshots show my sample input and the resulting output in SQL.

Extra Blob Created after Sink in Data Flow

I'm importing from Snowflake to Azure Blob using a data flow activity in Azure Data Factory.
I noticed that whenever I create a blob through the sink (placed inside the provider/Inbound/ folder), I get an extra empty blob file outside Inbound.
Does this happen for all data flow sinks to blob?
I created a data flow and loaded data to blob from Snowflake, and I don't see any additional blob file generated outside my sink folder.
Make sure the sink connection points to the correct folder, and double-check whether any process other than this data flow is running that could be creating an extra file outside the sink folder.
Screenshots: the Snowflake source, the sink, the output file path used to generate the file, the sink setting that adds a date as the file name, the output folder, and the output file generated after executing the data flow.

How to Merge files using For each activity in Azure Data Factory

I am using ADF to copy files from a file server to Azure Blob storage. The files in the directory have the same structure without headers, and I need to merge them into a single file in Blob storage.
I created an ADF pipeline that uses a Get Metadata activity to fetch the childItems and a ForEach activity to loop through the files one by one.
Inside the ForEach activity there is a Copy data activity where I use the file name from the Get Metadata activity.
In the sink settings, I use Merge files as the copy behaviour.
When I execute the pipeline, the copy activity gets executed 3 times and the file in Blob storage gets overwritten with the last file. How do I merge all 3 files?
I know we can use a wildcard pattern to select files. But suppose I have 3 files to begin with when the Get Metadata activity runs, and a 4th file is added to the folder by the time control reaches the copy activity: with the wildcard pattern I would process all 4 files, while the Get Metadata activity gives me only the names of the 3 files, which I use for archiving. That would not be right.
Any help is appreciated
You don't need a ForEach for this. Just one Copy activity that merges all three files.
The trick is to identify the source files using file path wildcards. If the requirement is to merge all files from the source dataset, then the Merge files copy behaviour in the Copy activity should be sufficient, as in the sketch below.
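As a sketch, the relevant Copy activity typeProperties could look like this, assuming the file system (file server) connector on the source, a delimited-text format, and a *.csv wildcard (folder and wildcard values are placeholders, not taken from the question):
"typeProperties": {
    "source": {
        "type": "DelimitedTextSource",
        "storeSettings": {
            "type": "FileServerReadSettings",
            "recursive": true,
            "wildcardFileName": "*.csv"
        }
    },
    "sink": {
        "type": "DelimitedTextSink",
        "storeSettings": {
            "type": "AzureBlobStorageWriteSettings",
            "copyBehavior": "MergeFiles"
        }
    }
}
With MergeFiles, every file matched by the wildcard in a single run is written to one output file, so no ForEach loop is needed.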

Azure Data Factory copy data - how do I save an http downloaded zip to blob store?

I have a simple Copy data activity with an HTTP connector source and Azure Blob Storage as the sink. The file is a zip file, so I am using a binary dataset for both the source and the sink.
The data is properly fetched (I believe, looking at the bytes transferred). However, I cannot save it to the Blob store. In this scenario you do not get to set the filename, only the path (container/directory); the filename used is the name of the file that I fetched.
However, the filename used in the sink step is prefixed with a backslash. It does not exist in the source, I can find no way to remove it, and with a filename like that I get a failure:
Failure happened on 'Sink' side. ErrorCode=UserErrorFailedFileOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Upload file failed at path extract/coEDW\XXXX_Data_etc.zip.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=Microsoft.WindowsAzure.Storage.StorageException,Message=The remote server returned an error: (404) Not Found.,Source=Microsoft.WindowsAzure.Storage,StorageExtendedMessage=The specified resource does not exist. RequestId:bfe4e2f6-501e-002e-6a21-eaf10e000000 Time:2021-01-14T02:59:24.3300081Z,,''Type=System.Net.WebException,Message=The remote server returned an error: (404) Not Found.,Source=Microsoft.WindowsAzure.Storage,'
(filename masked by me)
I am sure the fix is simple, but I cannot figure this out. Can anyone help?
Thanks.
You will have to add a dynamic file name for your Blob sink.
You can use the example below to see how to add a file name dynamically:
In this example, the file name includes date and time fields to mark each file with its date and time.
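As a sketch, the sink dataset's file name can be set through dynamic content along these lines (the container "extract" and folder "coEDW" are taken from the error path in the question; the "download_" prefix and the timestamp format are assumptions):
"location": {
    "type": "AzureBlobStorageLocation",
    "container": "extract",
    "folderPath": "coEDW",
    "fileName": { "value": "@concat('download_', formatDateTime(utcNow(), 'yyyyMMddHHmmss'), '.zip')", "type": "Expression" }
}
Giving the sink an explicit file name this way also sidesteps the backslash-prefixed name that is otherwise derived from the source.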
Let me know if that works.

Error ingesting a flat file into Azure Data Lake Store using Azure Data Lake

I am getting the error below for certain time slices while using the Copy activity in a Data Factory pipeline, yet for other slices it copies the file successfully to the specified Data Lake folder. I don't understand whether this is an issue with the factory or the lake, or a Data Management Gateway communication failure. The pipeline creates 2 GUID folders underneath the specified lake folder and a 0 KB "temp" file, but the Copy activity fails with the error below.
FileWriter|Error trying to write the file|uri:https://bcbsne.azuredatalakestore.net/webhdfs/v1/Consumer/FlatFiles/2017071013/2a621c14-bdac-4cd6-a0d3-efba4a4526a0/5a3ac937-8176-469d-b6c6-ca738f8ab3a6/_tmp_test.txt-0.tmp?op=APPEND&overwrite=true&user.name=testUser&api-version=2014-01-01&offset=0&length=27&append=true,Content:
Job ID: 41ff39a9-f6e0-4b94-8f9d-625dec7f84de
Log ID: Error