Issue while updating copy activity in ADF - azure-data-factory

I want to replace a particular string in the columns of a source Excel file.
My source contains n columns. I need to check whether the string apple exists in any of the columns. If it does, I need to replace apple with the string orange and output the Excel file. How can I do this in ADF?
Note: I cannot use data flows because we are using a self-hosted VM.

Excel files have many limitations in ADF. For example, Excel is not supported as a sink in either the copy activity or data flows.
You can raise a feature request for that in ADF.
So, try this operation with a CSV and copy the result to a CSV in Blob storage, which you can later convert to Excel on your local machine.
For operations like this, a data flow is a better option than ordinary activities, since data flows are built for transformations.
But data flows don't support self-hosted linked services.
So, as a workaround, first copy the Excel file to Blob storage as a CSV using a copy activity. Create a Blob linked service for the data flow to use.
Now follow the process below in the data flow.
Source CSV from Blob:
Derived column transformation:
Give the condition for each column, for example case(col1=="apple", "orange", col1).
Sink:
In the sink settings, choose Output to single file.
After the pipeline runs, a CSV is generated in Blob storage. You can convert it to Excel on your local machine.
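For readers who want to check the replacement logic outside ADF, the derived-column condition above can be sketched in plain Python. This is only an illustration of the same exact-match rule applied to every column of a CSV; the function name and the sample data are made up for the example and are not part of ADF.

```python
import csv
import io


def replace_in_all_columns(csv_text: str, old: str = "apple", new: str = "orange") -> str:
    """Replace every cell equal to `old` with `new`, in any column.

    Mirrors case(col == "apple", "orange", col) applied to each column.
    The first row is treated as a header and copied unchanged.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for index, row in enumerate(reader):
        if index == 0:
            writer.writerow(row)  # header row: keep as-is
        else:
            writer.writerow([new if cell == old else cell for cell in row])
    return out.getvalue()


# Hypothetical sample data, not taken from the question.
sample = "col1,col2,col3\napple,1,pear\nkiwi,apple,3\n"
result = replace_in_all_columns(sample)
print(result)
```

Like the case() expression, this matches whole cell values only; a cell such as "apple pie" is left untouched.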

Related

Move Entire Azure Data Lake Folders using Data Factory?

I'm currently using Azure Data Factory to load flat file data from our Gen 2 data lake into Synapse database tables. Unfortunately, we receive (many) thousands of files into timestamped folders for each feed. I'm currently using Synapse external tables to copy this data into standard heap tables.
Since each folder contains so many files, I'd like to move (or Copy/Delete) the entire folder (after processing) somewhere else in the lake. Is there some practical way to do that with Azure Data Factory?
Yes, you can use a copy activity with a wildcard. I tried to reproduce this in my environment and got the following results:
First, add the source dataset and select the wildcard option with the folder name. In my scenario, the folder is named pool.
Then select the sink dataset with the target file path.
The pipeline run succeeded. It transferred the file from one location to the other with the required name.
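To make the move (copy, then delete) concrete, here is a rough sketch of what the pipeline JSON could look like, built as a Python dictionary. It assumes Binary datasets on ADLS Gen2; the dataset names SourceBinary, SinkBinary, and SourceFolderBinary and the activity names are hypothetical, and the exact properties should be checked against the Copy and Delete activity documentation before use.

```python
import json


def build_move_folder_pipeline(folder: str) -> dict:
    """Build a pipeline definition that copies a folder and then deletes the source."""
    copy_activity = {
        "name": "CopyFolder",
        "type": "Copy",
        # Hypothetical dataset names; SourceBinary is assumed to point at the file system root.
        "inputs": [{"referenceName": "SourceBinary", "type": "DatasetReference"}],
        "outputs": [{"referenceName": "SinkBinary", "type": "DatasetReference"}],
        "typeProperties": {
            "source": {
                "type": "BinarySource",
                "storeSettings": {
                    "type": "AzureBlobFSReadSettings",
                    "recursive": True,
                    "wildcardFolderPath": folder,
                    "wildcardFileName": "*",
                },
            },
            "sink": {
                "type": "BinarySink",
                "storeSettings": {"type": "AzureBlobFSWriteSettings"},
            },
        },
    }
    delete_activity = {
        "name": "DeleteSourceFolder",
        "type": "Delete",
        # Run only after the copy has succeeded.
        "dependsOn": [{"activity": "CopyFolder", "dependencyConditions": ["Succeeded"]}],
        "typeProperties": {
            # Assumption: SourceFolderBinary points at the folder that was just copied.
            "dataset": {"referenceName": "SourceFolderBinary", "type": "DatasetReference"},
            "storeSettings": {"type": "AzureBlobFSReadSettings", "recursive": True},
            "enableLogging": False,
        },
    }
    return {"name": "MoveFolder", "properties": {"activities": [copy_activity, delete_activity]}}


pipeline = build_move_folder_pipeline("pool")
print(json.dumps(pipeline, indent=2))
```

Chaining the Delete activity on the Succeeded condition means the source folder is removed only when the copy completed without errors.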

ADF Staged Copy Not Applying Schema Mapping for XML

I'm trying to copy data between a SOAP Web Service and an Azure SQL Database. When I use the staging option of the copy activity, mappings are not applied and no data is copied. If I disable the stage and write directly to a text file, mappings are applied as expected. How can I make the mappings apply when the staging option is enabled?
Additional Information
Source: HTTP
Sink: Azure SQL Database
Direct copies between the source and sink do not work because of where they're located, so I need to stage the copy.
However, when staging the copy, the defined mappings are not being applied and the sink database table ends up with a single null row.
When using a delimited text sink without a staging step, the mappings work as expected.
However, as soon as I add a staging step, the same issue occurs with a delimited text sink.
I have reproduced the same issue. I used HTTP (XML response) as the source and Azure SQL DB as the sink, with the staged data also stored as an XML file. In that setup, some columns are null and only part of the data is copied to the sink. The mapping is not applied as defined in the Mapping tab of the copy activity.
This issue does not occur when another source format, such as a delimited file, is used. In those cases, the mapping is applied as defined in the Mapping tab.
As a workaround, you can try using two copy activities that run sequentially: one from the HTTP source to Blob storage, and then one from Blob storage to the sink.
In copy activity 1, HTTP is the source and a CSV file in Blob storage is the sink.
In the Mapping tab, I defined the corresponding mapping.
In copy activity 2, I used the same Blob storage dataset as the source dataset and an Azure SQL DB dataset as the sink.
In copy activity 2, I tested with both auto-mapping and manual mapping. Both worked in my case.
Final Sink Table
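The idea behind the first copy activity, flattening the XML response into delimited rows before loading it, can be illustrated outside ADF with a short Python sketch. This is not what ADF runs internally; the element names Item, Id, and Name are hypothetical, and a real SOAP response would also carry namespaces and an envelope that this sketch ignores.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_csv(xml_text: str, row_tag: str, columns: list) -> str:
    """Flatten repeated <row_tag> elements into CSV rows, one column per child element."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)  # header row
    for row in root.iter(row_tag):
        # Missing child elements become empty strings instead of failing.
        writer.writerow([row.findtext(col, default="") for col in columns])
    return out.getvalue()


# Hypothetical response body, not taken from the question.
sample_xml = (
    "<Response>"
    "<Item><Id>1</Id><Name>first</Name></Item>"
    "<Item><Id>2</Id></Item>"
    "</Response>"
)
csv_text = xml_to_csv(sample_xml, "Item", ["Id", "Name"])
print(csv_text)
```

Once the data is in this tabular form, the second step (delimited text to SQL) only has to map flat columns, which is the case that worked in the test above.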

Azure data factory: Implementing the SCD2 on txt files

I have flat files in an ADLS source.
For the full load, we add 2 columns: Insert and datatimestamp.
For the change load, we need to look up against the full data: rows that already exist in the full data should be marked as Updated, and rows that do not exist should be marked as Insert and copied.
Below is the approach I tried, but I'm unable to get it working.
Can anyone help me with this?
Thank you; waiting for a quick response.
Currently, updating an existing flat file through an Azure Data Factory sink is not supported. You have to create a new flat file.
You can also use a data flow activity to read the full and incremental data and load them into a new file in the sink transformation.
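The lookup logic described in the question can be sketched in Python to show what the new output file would contain. This is a minimal illustration, not a data flow definition; the key column name id, the output column names Flag and LoadTimestamp, and the sample rows are all assumptions made for the example.

```python
from datetime import datetime, timezone


def tag_change_rows(full_rows, change_rows, key, load_time=None):
    """Mark each change row as Update (key exists in the full data) or Insert (it does not)."""
    load_time = load_time or datetime.now(timezone.utc).isoformat()
    existing_keys = {row[key] for row in full_rows}
    tagged = []
    for row in change_rows:
        flag = "Update" if row[key] in existing_keys else "Insert"
        # Build a new row instead of modifying the input, since the output is a new file.
        tagged.append({**row, "Flag": flag, "LoadTimestamp": load_time})
    return tagged


# Hypothetical rows keyed on an assumed "id" column.
full = [{"id": "1", "name": "a"}, {"id": "2", "name": "b"}]
changes = [{"id": "2", "name": "b2"}, {"id": "3", "name": "c"}]
tagged_rows = tag_change_rows(full, changes, key="id", load_time="2024-01-01T00:00:00+00:00")
for tagged_row in tagged_rows:
    print(tagged_row)
```

In a data flow, the same result would typically come from joining or looking up the incremental source against the full source and deriving the flag column before the sink.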

Infer schema from a .csv file in Azure Blob storage and generate a DDL script automatically using Azure SQL

Every time a .csv file appears in Blob storage, I have to create the DDL for it manually in Azure SQL. The data type of each field is based on the values it contains.
The file has 400 columns, so doing this manually takes a lot of time.
Could someone please suggest how to automate this with a stored procedure or script, so that when we execute it, it creates the table or a DDL script based on the file in Blob storage?
I am not sure whether this is possible, or whether there is a better way to handle such a scenario.
I would appreciate your suggestions.
Many thanks
This can be achieved in multiple ways. Since you mentioned automating it, you can use an Azure function.
First, create a function that reads the CSV file from Blob storage:
Read a CSV Blob file in Azure
Then add the code to generate the DDL statement:
Uploading and Importing CSV File to SQL Server
The Azure function can be scheduled, or triggered when new files are added to Blob storage.
If this is a once-a-day requirement that can also be done manually, you can download the file from Blob storage and use the 'Import Flat File' functionality in SSMS, where you specify the CSV file and it creates the schema based on the existing column values.
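As a rough idea of what the DDL-generating step could look like, here is a small Python sketch that infers a column type from the values in each CSV column and emits a CREATE TABLE statement. The type rules (BIGINT, FLOAT, otherwise NVARCHAR sized to the longest value) are a simplification chosen for the example, not the rules SSMS uses; the table name and sample data are hypothetical, and reading the file from Blob storage is left out.

```python
import csv
import io


def infer_sql_type(values):
    """Pick a SQL type from sample values: BIGINT, FLOAT, or NVARCHAR(n)."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "NVARCHAR(255)"  # assumption: fallback for columns with no data

    def all_parse(cast):
        try:
            for value in non_empty:
                cast(value)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "BIGINT"
    if all_parse(float):
        return "FLOAT"
    max_len = max(len(v) for v in non_empty)
    return "NVARCHAR(MAX)" if max_len > 4000 else f"NVARCHAR({max_len})"


def generate_ddl(csv_text, table):
    """Build a CREATE TABLE statement from a CSV with a header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = []
    for index, name in enumerate(header):
        column_values = [row[index] if index < len(row) else "" for row in data]
        safe_name = name.replace("]", "]]")  # escape closing brackets in identifiers
        lines.append(f"    [{safe_name}] {infer_sql_type(column_values)} NULL")
    return f"CREATE TABLE [dbo].[{table}] (\n" + ",\n".join(lines) + "\n);"


# Hypothetical sample file content.
sample_csv = "id,price,name\n1,2.5,apple\n2,3,kiwi\n"
ddl = generate_ddl(sample_csv, "Fruit")
print(ddl)
```

Inferring types from one file is fragile: a later file with longer strings or non-numeric values in a numeric column would no longer fit, so sizing columns generously or scanning several files may be safer.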

How to remove extra files when sinking CSV files to Azure Data Lake Gen2 with Azure Data Factory data flow?

I have completed the data flow tutorial. The sink currently creates 4 files in Azure Data Lake Gen2.
I suppose this is related to the HDFS file system.
Is it possible to save without the success, committed, and started files?
What is the best practice? Should they be removed after saving to Data Lake Gen2?
Are they needed in further data processing?
https://learn.microsoft.com/en-us/azure/data-factory/tutorial-data-flow
There are a couple of options available.
You can specify the output file name in the Sink transformation settings.
Select Output to single file from the file name option dropdown and provide the output file name.
You could also parameterize the output file name as required. Refer to this SO thread.
Alternatively, you can add a Delete activity after the data flow activity in the pipeline to delete the extra files from the folder.
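If the cleanup is done outside ADF instead, the extra files can be recognized by name. The sketch below separates marker files from data files in a folder listing; the patterns _SUCCESS, _committed_*, and _started_* are assumed from the file names the question describes, and the listing itself is hypothetical.

```python
from fnmatch import fnmatchcase

# Assumed name patterns for the marker files written next to the data files.
MARKER_PATTERNS = ("_SUCCESS", "_committed_*", "_started_*")


def split_marker_files(names):
    """Return (data_files, marker_files) for a list of file names in one folder."""
    markers = [n for n in names if any(fnmatchcase(n, p) for p in MARKER_PATTERNS)]
    data = [n for n in names if n not in markers]
    return data, markers


# Hypothetical folder listing after a data flow run.
listing = ["_SUCCESS", "_committed_123", "_started_123", "part-00000.csv"]
data_files, marker_files = split_marker_files(listing)
print(data_files)
print(marker_files)
```

The same patterns could be used as wildcard file names in a Delete activity, one pattern per activity or via a ForEach loop.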