Schema compliance in Azure data factory - azure-data-factory

I am trying to validate the schema of an input file in ADF. I have tried the following:
Get Metadata activity
The schema validation that is available in the source activity
But these seem to check only whether a particular field is present in the specified position. Also, Azure treats all of these fields as strings by default, since the input is a flat file.
I want to check both the position and the data type. For example:
empid,name,salary
1,abc,10
2,def,20
3,ghi,50
xyz,jkl,10
The row with empid xyz needs to be rejected because it is not of a numeric data type. Any help is appreciated.

You can use a data flow with a filter transformation to achieve this.
Below is my test:
1. Create a source.
2. Create a filter and use this expression: regexMatch(empid,'(\\d+)')
3. Output: only rows whose empid matches the pattern pass the filter.
Hope this can help you.
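For anyone who wants to check the same rule outside ADF, the filter logic can be sketched in Python. This only illustrates the idea and is not how the data flow runs internally. The function name split_rows is hypothetical, and the sketch uses a full-string match, which may be stricter than the data flow expression if that one accepts partial matches.

```python
import csv
import io
import re

SAMPLE = """empid,name,salary
1,abc,10
2,def,20
3,ghi,50
xyz,jkl,10
"""

# Full-string match: the whole field must consist of digits.
DIGITS = re.compile(r"\d+")


def split_rows(text):
    """Return (accepted, rejected) rows depending on whether empid is all digits."""
    accepted, rejected = [], []
    for row in csv.DictReader(io.StringIO(text)):
        if DIGITS.fullmatch(row["empid"]):
            accepted.append(row)
        else:
            rejected.append(row)
    return accepted, rejected


accepted, rejected = split_rows(SAMPLE)
print([r["empid"] for r in accepted])
print([r["empid"] for r in rejected])
```

Keeping the rejected rows in a separate list mirrors what you would get in a data flow by sending non-matching rows to a second sink instead of dropping them.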

Related

ADF Pipeline include fixed text in output

The overall aim of the pipeline is to copy from XML to Oracle.
One of the source columns is a datetime that needs formatting, so I'm using an intermediate copy activity to copy from XML to CSV, as instructed in this answer.
Copying from the CSV to the table is a simple mapping, except that I need an additional target column with a fixed value of '365Response'.
I've tried adding this as an additional column as shown below:
However, on the mapping tab, I'm not able to select the new additional column:
What did I do wrong?
Your process to add an Additional column in the copy activity looks correct. If the Additional column is not showing in the mapping, you can clear the mapping and import schema again to refresh the mapping.
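For reference, the additional column is stored in the copy activity's source definition as an additionalColumns entry. The Python snippet below only builds and prints that JSON fragment as a sketch. The column name ResponseType and the source type DelimitedTextSource are assumptions, since the question does not show them; the fixed value is the one from the question.

```python
import json

# Sketch of the copy activity "source" fragment.
source = {
    "type": "DelimitedTextSource",  # assumed source type for the CSV step
    "additionalColumns": [
        # "ResponseType" is a hypothetical column name.
        {"name": "ResponseType", "value": "365Response"},
    ],
}

print(json.dumps(source, indent=2))
```

If the JSON view of your activity already contains an entry like this, the column is defined and only the mapping needs to be refreshed.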

How to set parameters in SQL Server table from Copy Data Activity - Source: XML / Sink: SQL Server Table / Mapping: XML column

I have a question and hope someone in the forum can help. I am able to pull data from a SOAP API call into a SQL Server table (an xml data type column, actually) via a Copy Data Activity. The pipeline that runs this process is metadata driven, so how can I write other parameters to the same SQL Server table for the same run? I am using a Copy Data Activity to load XML data into the SQL Server table, but in the Mapping tab I am not able to select other parameters and point them to other SQL table columns.
In addition, I am using a ForEach Activity so that the Copy Data Activity iterates over several values of one column in a SQL Server table.
I would appreciate any advice on this.
Thanks
David
Thank you for your interest, I will try to be more explicit with this image: Hopefully this clarifies things a little. Given the current scenario, how can I pass the StoreId and CustomerNumber parameters to the table Stage.XmlDataTable?
Bear in mind that in the mapping step I am only able to map the XML data from the current API call and write it into the XmlData column of Stage.XmlDataTable.
Thanks in advance David
You can add your parameters using Additional Columns in the Copy data activity Source.
When you import schema in mapping you can see the additional columns added in source.
Refer to this MS document for more details on adding additional columns during the copy.

Compare Get metadata activity with lookup output

I am trying to do a delta load using ADF. I have a Get Metadata activity on my blob location that reads all the files using childItems, and a Lookup activity that reads the loaded file names from a SQL table. After these I have a Filter activity which should filter out the new files from the blob location.
Expression on Items:
@activity('GetRawFilenames').output.childItems
Expression on Condition:
@not(contains(activity('GetLoadedFilesList').output.value,item().name))
But it is still not filtering the file list in the filter output. Could the experts please help? Thanks in advance.
Have a look at this; it describes the same problem. The expression on the condition should be:
@not(contains(join(activity('GetLoadedFilesList').output.value,','),item().name))
After joining the output of the GetLoadedFilesList activity into a single string, the condition should work.
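Why the original condition excludes nothing can be illustrated in Python. A Lookup activity returns output.value as an array of row objects, so testing whether that array contains a bare file name never succeeds. The sketch below assumes a lookup column called filename (hypothetical) and compares names against names instead of searching a joined string. It mirrors the logic only and is not ADF code.

```python
# Shape of Get Metadata childItems: objects with a name and a type.
child_items = [
    {"name": "a.csv", "type": "File"},
    {"name": "b.csv", "type": "File"},
    {"name": "c.csv", "type": "File"},
]

# Shape of Lookup output.value: one object per row (column name is assumed).
loaded_rows = [{"filename": "a.csv"}, {"filename": "b.csv"}]

# Original condition: a string never equals a row object, so nothing is excluded.
naive = [item for item in child_items if item["name"] not in loaded_rows]

# Compare file names against the extracted names instead.
loaded_names = {row["filename"] for row in loaded_rows}
new_files = [item["name"] for item in child_items if item["name"] not in loaded_names]

print(len(naive))
print(new_files)
```

Comparing exact names also sidesteps a possible pitfall of a substring check on a joined string, where a new file called a.csv would be treated as loaded if data.csv is already in the list.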

Azure Data Factory - Data Flow - Derived Column Issue

I am using a Derived Column transformation in an Azure Data Flow to create some new columns.
Example:
This is my source, and I can preview the data.
But in DerivedColumn1 I cannot see these columns, not even in the Expression Editor.
Expression Editor:
Has something changed in ADF, or am I doing something wrong?
According to your screenshot, the column names are being read as a data row. Please select "First row as header" in the Excel dataset; otherwise you will get an error in the sink column mapping.
If you don't check it, the column names will be treated as the first data row:
For your issue, you could try the workarounds below:
Import the source schema in Projection, then delete the Derived Column activity and add it again.
Drop the data flow and create a new one. Data flows sometimes have bugs; refreshing the browser or recreating the data flow will solve the problem.

Azure Data Factory Copy using Variable

I am copying data from a REST API to an Azure SQL database. The copy is working fine, but there is a column which isn't returned by the API.
What I want to do is to add this column to the source. I've got a variable called symbol which I want to use as the source column. However, this isn't working:
Mapping
Any ideas?
This functionality is available using the "Additional Columns" feature of the Copy Activity.
If you navigate to the "Source" area, the bottom of the page will show you an area where you can add Additional Columns. Clicking the "New" button will let you enter a name and a value (which can be dynamic), which will be added to the output.
Source(s):
https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-overview#add-additional-columns-during-copy
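In the pipeline JSON, a dynamic additional column value is written as an expression object rather than a plain string. The sketch below builds that fragment in Python purely for illustration. The source type RestSource and the column name Symbol are assumptions; the variable name symbol comes from the question.

```python
import json

# Additional column whose value is evaluated from a pipeline variable at run time.
additional_columns = [
    {
        "name": "Symbol",  # assumed target column name
        "value": {
            "value": "@variables('symbol')",
            "type": "Expression",
        },
    }
]

# Assumed source type for a REST dataset.
source = {"type": "RestSource", "additionalColumns": additional_columns}

print(json.dumps(source, indent=2))
```

Comparing this with the JSON view of your own copy activity is a quick way to confirm that the value was saved as an expression and not as literal text.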
As far as I know, the copy activity may not be able to meet your requirements. Please see the error conditions in the link:
Source data store query result does not have a column name that is specified in the input dataset "structure" section.
Sink data store (if with pre-defined schema) does not have a column name that is specified in the output dataset "structure" section.
Either fewer columns or more columns in the "structure" of sink dataset than specified in the mapping.
Duplicate mapping.
I think Mapping Data Flow is the better choice. You could add a derived column before the sink dataset and create a parameter named Symbol.
Then set the derived column to the value of Symbol.
You can use the Copy Activity with a stored proc sink to do that. See my answer here for more info.