Azure Data Factory Copy using Variable - azure-data-factory

I am copying data from a REST API to an Azure SQL database. The copy is working fine, but there is a column which isn't being returned by the API.
What I want to do is add this column to the source. I've got a variable called symbol which I want to use as the source column. However, this isn't working:
[screenshot: Mapping]
Any ideas?

This functionality is available using the "Additional Columns" feature of the Copy Activity.
If you navigate to the "Source" area, the bottom of the page shows a section where you can add Additional Columns. Clicking the "New" button lets you enter a name and a value (which can be dynamic); that column will be added to the output.
Source(s):
https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-overview#add-additional-columns-during-copy
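As a rough sketch, the additionalColumns entry lives in the Copy Activity's source definition. The snippet below builds that JSON fragment in Python; the RestSource type and the expression-object shape for the dynamic value are assumptions based on the linked documentation, with the variable name symbol taken from the question.

```python
import json

# Hypothetical sketch of the Copy Activity "source" block with an
# additional column fed by the pipeline variable "symbol".
source = {
    "type": "RestSource",  # assumed source type for a REST API copy
    "additionalColumns": [
        {
            "name": "symbol",
            # Dynamic content is authored as an ADF expression object:
            "value": {"value": "@variables('symbol')", "type": "Expression"},
        }
    ],
}

print(json.dumps(source, indent=2))
```

Static values can be given as plain strings; only dynamic content needs the expression form.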

As far as I know, the Copy Activity may not meet your requirements. Please see the error conditions in the link:
Source data store query result does not have a column name that is specified in the input dataset "structure" section.
Sink data store (if with a pre-defined schema) does not have a column name that is specified in the output dataset "structure" section.
Either fewer or more columns in the "structure" of the sink dataset than specified in the mapping.
Duplicate mapping.
I think Mapping Data Flow is your choice. You could add a Derived Column transformation before the sink dataset and create a parameter named Symbol.
Then set the derived column to the value of Symbol.
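In the Derived Column expression you would reference the data flow parameter as $Symbol. As a plain-Python sketch of the effect, assuming rows arrive as dictionaries, the transformation simply stamps the parameter value onto every row:

```python
# Sketch of what the suggested Derived Column does: every row gains a
# "Symbol" column whose value comes from the data-flow parameter
# (modelled here as a function argument).
def add_symbol_column(rows, symbol):
    return [{**row, "Symbol": symbol} for row in rows]

rows = [{"price": 10.5}, {"price": 11.2}]      # stand-in for the API rows
print(add_symbol_column(rows, "MSFT"))
# → [{'price': 10.5, 'Symbol': 'MSFT'}, {'price': 11.2, 'Symbol': 'MSFT'}]
```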

You can use the Copy Activity with a stored procedure sink to do that. See my answer here for more info.

Related

ADF Pipeline include fixed text in output

The overall aim of the pipeline is to copy from XML to Oracle.
One of the source columns is a datetime that needs formatting, so I'm using an intermediate copy activity to copy from XML to CSV as instructed in this answer
From the CSV to the table it is a simple mapping, except for the need for an additional target column with a fixed value of '365Response'.
I've tried adding this as an additional column as shown below:
However, on the mapping tab, I'm not able to select the new additional column:
What did I do wrong?
Your process for adding an Additional Column in the Copy Activity looks correct. If the additional column is not showing in the mapping, you can clear the mapping and import the schema again to refresh it.

Azure Data Factory - Data Flow - Derived Column Issue

I am using the Azure Data Flow Derived Column transformation to create some new columns.
For example, this is my source, and I can preview the data.
But from DerivedColumn1 I cannot see these columns, even in the Expression Editor.
Expression Editor:
Has something changed in ADF, or am I doing something wrong?
According to your screenshot, the column names are being read as a data row, which will also give you an error in the sink column mapping. Please set "first row as header" in the Excel dataset.
If you don't check it, the column names will be treated as the first data row:
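The dataset in this question is Excel, but the same idea is easy to demonstrate with a CSV and Python's csv module: with a header, the first line supplies the column names; without one, it is just another data row.

```python
import csv
import io

data = "a,b,c\n1,2,3\n"

# "First row as header" checked: the first line becomes the column names.
with_header = list(csv.DictReader(io.StringIO(data)))
print(with_header)      # → [{'a': '1', 'b': '2', 'c': '3'}]

# Unchecked: the header line is read as an ordinary data row.
without_header = list(csv.reader(io.StringIO(data)))
print(without_header)   # → [['a', 'b', 'c'], ['1', '2', '3']]
```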
For your issue, you could try the workarounds below:
Import the source schema in the Projection tab, then delete the Derived Column activity and add it again.
Drop the data flow and create a new one. Sometimes data flows have glitches; refreshing the browser or simply recreating the data flow will resolve them.

Schema compliance in Azure data factory

I am trying to do schema compliance of an input file in ADF. I have tried the below.
Get Metadata Activity
The schema validation that is available in source activity
But the above seems to only check whether a particular field is present in the specified position. Also, ADF by default treats all of these fields as strings, since the input is a flat file.
I want to check the position and the datatype as well. For example:
empid,name,salary
1,abc,10
2,def,20
3,ghi,50
xyz,jkl,10
The row with empid xyz needs to be rejected, as it is not of a numeric data type. Any help is appreciated.
You can use a data flow and create a Filter transformation to achieve this. Below is my test:
1. Create a source.
2. Create a filter and use this expression: regexMatch(empid,'(\\d+)')
3. Output:
Hope this can help you.
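The same check can be sketched in Python against the sample rows. Note that the pattern (\d+) is unanchored, so it would also match mixed values such as 'a1'; an anchored match (in ADF, something like regexMatch(empid,'^(\\d+)$')) is the stricter test, and that is what the sketch below uses.

```python
import re

# The sample rows from the question.
rows = [
    {"empid": "1", "name": "abc", "salary": "10"},
    {"empid": "2", "name": "def", "salary": "20"},
    {"empid": "3", "name": "ghi", "salary": "50"},
    {"empid": "xyz", "name": "jkl", "salary": "10"},
]

# Keep only rows whose empid is entirely digits (anchored match).
valid = [r for r in rows if re.fullmatch(r"\d+", r["empid"])]
print([r["empid"] for r in valid])   # → ['1', '2', '3']
```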

Case-insensitive column names breaks the Data Preview mode in Data Flow of Data Factory

I have a csv file in my ADLS:
a,A,b
1,1,1
2,2,2
3,3,3
When I load this data into a delimited text Dataset in ADF with first row as header the data preview seems correct (see picture below). The schema has the names a, A and b for columns.
However, now I want to use this dataset in Mapping Data Flow and here does the Data Preview mode break. The second column name (A) is seen as duplicate and no preview can be loaded.
All other functionality in Data Flow keeps working fine; it is only the Data Preview tab that gives an error. All subsequent transformation nodes also give this error in their Data Preview.
Moreover, if the data contains two exactly identical column names (e.g. a, a, b), then the Dataset recognizes the columns as duplicates and puts a "1" and "2" after the names. It is only when the names are case-sensitively different but case-insensitively equal that the Dataset raises no error while Data Flow does.
Is this a known error? Is it possible to change a specific column name in the dataset before loading it into Data Flow? Or is there just something I'm missing?
I tested it and got the error in the source Data Preview:
I asked Azure support for help and they are testing now. Please wait for my update.
Update:
I sent Azure Support the test.csv file. They tested it and replied to me. If you insist on using "first row as header", Data Factory cannot resolve the error. The solution is to re-edit the csv file. Even Azure SQL Database doesn't support creating a table with two columns of the same name; column names are case-insensitive.
For example, a statement like CREATE TABLE test (a int, A int) is not supported under a default, case-insensitive collation.
Here's the full email message:
Hi Leon,
Good morning! Thanks for your information.
I have tested the sample file you share with me and reproduce the issue. The data preview is alright by default when I connect to your sample file.
But I noticed during our troubleshooting session that a, A, b are column names, so you have checked "first row as header" in your source connection. Please confirm this is right and you want to use a, A, b as column headers. If so, it will be an error, because there's no text transform for "A" in the schema.
Hope you can understand that the column names don't influence the data transform, and it's okay to change them to make sure no errors block the data flow.
There are two tips for you to remove the block: one is to change the column name in your source csv directly, or you can click the import schema button (in the screenshot below) on the schema tab and choose a sample file to redefine the schema, which also allows you to change the column name.
Hope this helps.
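Since the support team's fix is to re-edit the csv, one hypothetical pre-processing step is to rename case-insensitive duplicate headers before the file reaches the Data Flow, mimicking the "1"/"2" suffixing the Dataset already applies to exact duplicates:

```python
from collections import Counter

def dedupe_headers(headers):
    """Rename headers that collide case-insensitively: a, A, b -> a, A_1, b."""
    counts = Counter(h.lower() for h in headers)   # collisions, ignoring case
    seen = Counter()
    renamed = []
    for h in headers:
        key = h.lower()
        seen[key] += 1
        # Leave the first occurrence alone; suffix later collisions.
        if counts[key] > 1 and seen[key] > 1:
            renamed.append(f"{h}_{seen[key] - 1}")
        else:
            renamed.append(h)
    return renamed

print(dedupe_headers(["a", "A", "b"]))   # → ['a', 'A_1', 'b']
```

The same function also handles exact duplicates (a, a, b becomes a, a_1, b), so it can be applied uniformly to any header row.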

Copy Data - How to skip Identity columns

I'm designing a Copy Data task where the Sink SQL Server table contains an Identity column. The Copy Data task always wants me to map that column when, in my opinion, it should just not include the column in the list of columns to map. Does anyone know how I can get the ADF Copy Data task to ignore Sink Identity columns?
If you are using the Copy Data tool, and in your SQL Server the ID column is set as auto-increment, then it should not show up at the mapping step. Please tell us if that is not the case.
If you are using the create pipeline/dataset approach, you could just go to the sink dataset's schema tab and remove the id column. Then go to the copy activity's mapping tab and click Import schemas again. The ID column should have disappeared.
You could run a SET IDENTITY_INSERT <table> ON statement for the given table before executing the copy step. After it completes, set it back to OFF.