Extra Blob Created after Sink in Data Flow - azure-data-factory

I'm importing data from Snowflake to Azure Blob Storage using a data flow activity in Azure Data Factory.
I noticed that whenever I create a blob through the sink (placed inside the provider/Inbound/ folder), I get an extra empty blob file outside Inbound.
Does this happen for every data flow sink to blob?

I created a data flow and loaded data from Snowflake to blob storage, and I don't see any additional blob file generated outside my sink folder.
Make sure the sink connection points to the correct folder, and also double-check whether any process other than this data flow is running that could be creating the extra file outside the sink folder. A sketch of a sink dataset scoped to the Inbound folder follows the screenshots below.
Snowflake source:
Sink:
Output file path to generate the out file:
Sink setting to add a date as the filename:
Output folder:
Output file generated after executing the data flow.
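As a point of comparison, a sink dataset that can only write under provider/Inbound/ would look roughly like the sketch below. This is a minimal illustration rather than the poster's actual configuration: the dataset and linked-service names are placeholders, and it assumes "provider" is the container with "Inbound" as the folder inside it.

    {
        "name": "BlobInboundSink",
        "properties": {
            "type": "DelimitedText",
            "linkedServiceName": {
                "referenceName": "AzureBlobStorageLS",
                "type": "LinkedServiceReference"
            },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": "provider",
                    "folderPath": "Inbound"
                },
                "columnDelimiter": ",",
                "firstRowAsHeader": true
            }
        }
    }

The date-based file name shown in the sink settings screenshot can be built in the data flow sink with an expression such as concat(toString(currentDate(), 'yyyyMMdd'), '.csv'). Note that if folderPath were left empty, files would land at the container root, which is one plausible explanation for a stray blob appearing outside Inbound.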

Related

Azure Data Factory - source dataset fails with "path does not resolve to any file(s)" when sink to a different directory is in progress

We have an ADF pipeline with Copy activity to transfer data from Azure Table Storage to a JSON file in an Azure Blob Storage container. When the data transfer is in progress, other pipelines that use this dataset as a source fail with the following error "Job failed due to reason: Path does not resolve to any file(s)".
The dataset has a property that indicates the container directory. This property is populated by the trigger time of the pipeline copying the data, so it writes to a different directory in each run. The other failing pipelines use a directory corresponding to an earlier run of the pipeline copying the data and I have confirmed that the path does exist.
Does anyone know why this is happening and how to solve it?
The expression in the directory or file name text box inside the dataset is probably incorrect.
Check this link : Azure data flow not showing / in path to data source
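If the directory is built dynamically from the trigger time, one way to keep the expression verifiable is a parameterized dataset along the lines of the sketch below. The names here are placeholders rather than anything from the original post.

    {
        "name": "JsonOutputDataset",
        "properties": {
            "type": "Json",
            "linkedServiceName": {
                "referenceName": "AzureBlobStorageLS",
                "type": "LinkedServiceReference"
            },
            "parameters": {
                "runFolder": { "type": "String" }
            },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": "output",
                    "folderPath": {
                        "value": "@dataset().runFolder",
                        "type": "Expression"
                    }
                }
            }
        }
    }

The writing pipeline would pass something like @formatDateTime(pipeline().TriggerTime, 'yyyy/MM/dd/HH') for runFolder, and the reading pipelines would pass the folder of the earlier run they want; any slash, casing, or format mismatch between the two expressions then shows up directly in the resolved path in the monitoring view.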

Dataset format for copying CSV files from an SFTP server to blob storage

Hi, I want to copy CSV files from an SFTP server to blob storage using the copy activity of ADF without processing the content. Is there a difference between using a binary dataset and a CSV dataset for the source and sink?
If I understand the ask correctly, you want to copy CSV files from SFTP to blob storage as-is. If that is the case, you can use a binary dataset on both the source and sink of your copy activity. When using a Binary dataset, the service does not parse the file content but treats it as-is. Whereas if you use the CSV file format, the service will parse the file content and you will have to configure file specs in the connection settings of your dataset.
Please note that when using a Binary dataset in a copy activity, you can only copy from a Binary dataset to a Binary dataset.
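As a rough illustration of that setup (dataset and linked-service names are placeholders), the copy activity would reference two datasets of type Binary and use a BinarySource/BinarySink pair:

    {
        "name": "CopyCsvFilesAsIs",
        "type": "Copy",
        "inputs": [ { "referenceName": "SftpBinaryFiles", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "BlobBinaryFiles", "type": "DatasetReference" } ],
        "typeProperties": {
            "source": {
                "type": "BinarySource",
                "storeSettings": {
                    "type": "SftpReadSettings",
                    "recursive": true,
                    "wildcardFileName": "*.csv"
                }
            },
            "sink": {
                "type": "BinarySink",
                "storeSettings": {
                    "type": "AzureBlobStorageWriteSettings"
                }
            }
        }
    }

Because both datasets are Binary, there are no delimiter, encoding, or header settings to maintain, and the files arrive in blob storage byte-for-byte as they were on the SFTP server.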

How to use file name prefix in Data Factory when importing data into Azure data lake from SAP BW Open Hub?

I have a source of SAP BW Open Hub in data factory and a sink of Azure data lake gen2 and am using a copy activity to move the data.
I am attempting to transfer the data to the lake and split it into numerous files, with 200000 rows per file. I would also like to be able to prefix all of the filenames, e.g. 'cust_', so the files would be something along the lines of cust_1, cust_2, cust_3, etc.
This only seems to be an issue when using SAP BW Open Hub as a source (it works fine when using SQL Server as a source). Please see the warning message below. After checking with our internal SAP BW team, they assure me that the data is in a tabular format and no explicit partitioning is enabled, so there shouldn't be an issue.
When executing the copy activity, the files are transferred to the lake but the file name prefix setting is ignored, and the filenames instead are set automatically, as below (the name seems to be automatically made up of the SAP BW Open Hub table and the request ID):
Here is the source config:
All other properties on the other tabs are set to default and have been unchanged.
QUESTION: without using a data flow, is there any way to split the files when pulling from SAP BW Open Hub and also be able to dictate the filenames in the lake?
I tried to reproduce the issue, and it works fine with a workaround. Instead of splitting the data while copying from SAP BW to Azure Data Lake Storage, first copy the entire data set (without partitioning) into an Azure SQL Database. Please follow Copy data from SAP Business Warehouse by using Azure Data Factory (make sure to use Azure SQL Database as the sink).
Now that the data is in your Azure SQL Database, you can simply use a second copy activity to copy the data to Azure Data Lake Storage.
In the source configuration, keep "Partition option" as None. A JSON sketch of this second copy activity follows the screenshots below.
Source Config:
Sink config:
Output:
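Here is a sketch of that second copy activity (Azure SQL Database to the lake). The dataset names are placeholders; the 200000-row split and the 'cust_' prefix are the values from the question, and they sit in the delimited-text write settings of the sink:

    {
        "name": "SqlToLakeSplitFiles",
        "type": "Copy",
        "inputs": [ { "referenceName": "AzureSqlStagingTable", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "LakeCsvFolder", "type": "DatasetReference" } ],
        "typeProperties": {
            "source": {
                "type": "AzureSqlSource",
                "partitionOption": "None"
            },
            "sink": {
                "type": "DelimitedTextSink",
                "storeSettings": {
                    "type": "AzureBlobFSWriteSettings"
                },
                "formatSettings": {
                    "type": "DelimitedTextWriteSettings",
                    "fileExtension": ".csv",
                    "maxRowsPerFile": 200000,
                    "fileNamePrefix": "cust_"
                }
            }
        }
    }

With these settings the sink emits a sequence of files that start with cust_ and hold at most 200000 rows each, which is the behaviour the SAP BW Open Hub source would not honour directly.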

Azure Data Factory copy data - how do I save an http downloaded zip to blob store?

I have a simple copy data activity with an HTTP connector source and Azure Blob Storage as the sink. The file is a zip file, so I am using a binary dataset for both the source and the sink.
The data is properly fetched (I believe - looking at bytes transferred). However, I cannot save it to the Blob Store. In this scenario, you do not get to set the filename, only the path (container/directory). The filename used is the name of the file that I fetched.
However, the filename used in the sink step is prefixed with a backslash. It does not exist in the source, and I can find no way to remove it, and with a filename like that, I get a failure:
Failure happened on 'Sink' side. ErrorCode=UserErrorFailedFileOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Upload file failed at path extract/coEDW\XXXX_Data_etc.zip.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=Microsoft.WindowsAzure.Storage.StorageException,Message=The remote server returned an error: (404) Not Found.,Source=Microsoft.WindowsAzure.Storage,StorageExtendedMessage=The specified resource does not exist. RequestId:bfe4e2f6-501e-002e-6a21-eaf10e000000 Time:2021-01-14T02:59:24.3300081Z,,''Type=System.Net.WebException,Message=The remote server returned an error: (404) Not Found.,Source=Microsoft.WindowsAzure.Storage,'
(filename masked by me)
I am sure the fix is simple, but I cannot figure this out. Can anyone help?
Thanks.
You will have to add a dynamic file name for your Blob sink.
You can use the below example to see how to dynamically add a file name using variables:
In this example, the file name includes date and time fields to mark each file with its date and time.
Let me know if that works.
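For reference, here is one way to sketch that dynamic name directly in the Binary sink dataset, using an expression rather than variables. The container and folder are taken from the error path above; the 'download_' prefix and dataset name are just illustrations.

    {
        "name": "BlobZipSink",
        "properties": {
            "type": "Binary",
            "linkedServiceName": {
                "referenceName": "AzureBlobStorageLS",
                "type": "LinkedServiceReference"
            },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": "extract",
                    "folderPath": "coEDW",
                    "fileName": {
                        "value": "@concat('download_', formatDateTime(utcNow(), 'yyyyMMddHHmmss'), '.zip')",
                        "type": "Expression"
                    }
                }
            }
        }
    }

Setting the file name explicitly means the sink no longer falls back to the source-derived name, which should avoid the backslash-prefixed path that caused the 404.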

Error ingesting a flat file into Azure Data Lake Store using Azure Data Factory

I am getting the error below for certain time slices while using the Copy activity in a Data Factory pipeline, but for other slices it is able to copy the file successfully to the specified Data Lake folder. I don't understand whether this is an issue with the factory, the lake, or a Data Management Gateway communication failure. The pipeline creates two GUID folders underneath the specified lake folder and a 0 KB "temp" file, but the Copy activity fails with the error below.
FileWriter|Error trying to write the file|uri:https://bcbsne.azuredatalakestore.net/webhdfs/v1/Consumer/FlatFiles/2017071013/2a621c14-bdac-4cd6-a0d3-efba4a4526a0/5a3ac937-8176-469d-b6c6-ca738f8ab3a6/_tmp_test.txt-0.tmp?op=APPEND&overwrite=true&user.name=testUser&api-version=2014-01-01&offset=0&length=27&append=true,Content:
Job ID: 41ff39a9-f6e0-4b94-8f9d-625dec7f84de
Log ID: Error