Copy Data - How to skip Identity columns - azure-data-factory

I'm designing a Copy Data task where the Sink SQL Server table contains an Identity column. The Copy Data task always wants me to map that column when, in my opinion, it should just not include the column in the list of columns to map. Does anyone know how I can get the ADF Copy Data task to ignore Sink Identity columns?

If you are using the Copy Data tool, and the ID column in your SQL Server table is set as auto-increment (identity), then it should not show up at the mapping step. Please tell us if that is not the case.
If you are creating the pipeline/dataset yourself, you could just go to the sink dataset Schema tab and remove the ID column. Then go to the copy activity Mapping tab and click Import schemas again. The ID column should have disappeared now.
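For reference, a minimal sketch of what an auto-increment (identity) column looks like on the sink table; the table and column names here are hypothetical:

-- Hypothetical sink table: Id is an identity (auto-increment) column,
-- so SQL Server generates it and it does not need to appear in the copy mapping.
CREATE TABLE dbo.TargetTable
(
    Id   INT IDENTITY(1,1) PRIMARY KEY,
    Name NVARCHAR(100) NOT NULL,
    Age  INT NULL
);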

You could include a SET IDENTITY_INSERT <table> ON statement for the given table before executing the copy step. After the copy completes, set it back to OFF.
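For example, a minimal T-SQL sketch of the idea, reusing the hypothetical dbo.TargetTable above. Note that IDENTITY_INSERT is session-scoped and is only needed if you actually want to write explicit values into the identity column:

-- Allow explicit values to be inserted into the identity column.
SET IDENTITY_INSERT dbo.TargetTable ON;

-- Run the copy (or the equivalent INSERTs) that supplies explicit Id values.
INSERT INTO dbo.TargetTable (Id, Name, Age) VALUES (42, 'John', 23);

-- Switch it back off once the copy has completed.
SET IDENTITY_INSERT dbo.TargetTable OFF;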

Related

How to set parameters in a SQL Server table from a Copy Data Activity - Source: XML / Sink: SQL Server Table / Mapping: XML column

I have a question; hopefully someone in the forum can give some help here. I am able to pull data from a SOAP API call into a SQL Server table (an xml data type field, actually) via a Copy Data activity. The pipeline that runs this process is metadata driven, so how could I write other parameters into the same SQL Server table for the same run? I am using a Copy Data activity to load XML data into the SQL Server table, but in the Mapping tab I am not able to select other parameters in order to point them to other SQL table columns.
In addition, I am using a ForEach activity so that the Copy Data activity iterates over several values of one column in the SQL Server table.
I would appreciate any advice on this.
Thanks
David
Thank you for your interest; I will try to be more explicit with this image: Hopefully this clarifies things a little bit. Given the current scenario, how could I pass the StoreId and CustomerNumber parameters to the table Stage.XmlDataTable?
Taking into account that in the mapping step I am only able to map the XML data from the current API call and then write it into the Stage.XmlDataTable - XmlData column.
Thanks in advance David
You can add your parameters using Additional Columns in the Copy data activity source.
When you import the schema in the Mapping tab, you can see the additional columns added to the source.
Refer to this MS document for more details on adding additional columns during the copy.
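As an illustration only, a hypothetical sketch of Stage.XmlDataTable based on the columns mentioned in the question (not a confirmed schema): the additional columns from the source can then be mapped to ordinary sink columns alongside the XML payload.

-- Hypothetical sketch of the sink table; names and types are assumptions.
CREATE TABLE Stage.XmlDataTable
(
    StoreId        INT           NULL,  -- mapped from an Additional Column in the source
    CustomerNumber NVARCHAR(50)  NULL,  -- mapped from an Additional Column in the source
    XmlData        XML           NULL   -- the XML payload returned by the SOAP API call
);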

Help required in removing columns from a text file using ADF

I have a sample file like this. Using Data Factory, I need to create another text file as output with the first two columns removed. Is there any way I can generate a file like the one below?
Source file:
Output file :
Core Data Factory (i.e. not including Mapping Data Flows) is not gifted with many abilities to do data transformation (which this is); however, it can do some things. It can change formats (e.g. .csv to JSON), it can add some metadata columns (like $$FILENAME), and it can remove columns, simply by using the mapping in the Copy activity.
1. Add a Copy activity to your pipeline and set the source to your main file.
2. Set the Sink to your target file name. It can be the same name as your original file but I would make it different for audit trail purposes.
3. Import the schema of your file, and make sure the separator in the dataset is set to semi-colon ';'.
4. Now press the Trash can button to delete the mappings for columns 1 and 2.
5. Run your pipeline. The output file should not have the two columns.
My results:
You can accomplish this task by using the Select transformation in a mapping data flow in Azure Data Factory (ADF). You can delete any unwanted columns from your delimited text file with this transformation.
I tested the same in my environment and it is working fine.
Please follow the steps below:
1. Create the Azure Data Factory using the Azure portal.
2. Upload the data to the source (e.g. a blob container).
3. Create a linked service to connect the blob storage with ADF, as shown below.
4. Then, create DelimitedText datasets using the above linked service for the source and sink files. In the source dataset, set the Column delimiter to Semicolon (;). Also, in the Schema tab, select Import schema From connection/store.
5. Create a data flow. Select the source dataset from your datasets list. Click on the + symbol to add a Select transformation from the options, as shown below.
6. In the settings, select the columns you want to delete and then click on the delete option.
7. Add the sink at the end. In the Sink tab, use the sink dataset you created earlier in step 4. In the Settings tab, for the File name option, select Output to single file and give the filename in the option below.
8. Now create a pipeline and use the Data flow activity. Select the data flow you created. Click on the Trigger now option to run the pipeline.
Check the output file at the sink location. You can see my input and output files below.

Azure Data Factory - Data Flow - Derived Column Issue

I am using Azure Data Flow - DerivedColumn to create some new columns.
Ex:
this is my source and I can preview the data.
But from DerivedColumn1 I cannot see these columns, even in the Expression Editor.
Expression Editor:
Has something changed in ADF, or am I doing something wrong?
According to your screenshot, the column names are being treated as a data row; otherwise you would get an error in the Sink column mapping. Please set "first row as header" in the Excel dataset.
If you don't check it, the column names will be considered as the first data row:
For your issue, you could try the workarounds below:
1. Import the source schema in Projection, then delete the Derived Column and add it again.
2. Drop the data flow and create a new one. Sometimes a data flow may have bugs; refreshing the browser or just recreating the data flow will solve it.

How to copy data from a csv to an Azure SQL Server table?

I have a dataset based on a csv file. This exposes data as follows:
Name,Age
John,23
I have an Azure SQL Server instance with a table named: [People]
This has columns
Name, Age
I am using the Copy Data activity and trying to copy data from the csv dataset into the Azure table.
There is no option to indicate the table name as a source. Instead I have a space to input a Stored Procedure name?
How does this work? Where do I put the target table name in the image below?
You should DEFINITELY have a table name to write to. If you don't have a table, something is wrong with your setup. Make sure you have a table to write to and that the field names in your table match the fields in the CSV file. Then follow the steps outlined in the post linked below. There are several steps to click through, but they are all pretty intuitive, so just follow the instructions step by step and you should be fine.
http://normalian.hatenablog.com/entry/2017/09/04/233320
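For reference, a sketch of a [People] table whose column names match the CSV header; this mirrors the table described in the question, and the column types are assumptions:

-- Target table matching the CSV header Name,Age.
CREATE TABLE dbo.People
(
    Name NVARCHAR(100) NOT NULL,
    Age  INT NULL
);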
You can add records into the SQL Database table directly, without stored procedures, by configuring the table name on the sink dataset rather than on the Copy activity, which is what is happening here.
Have a look at the below screenshot which shows the Table field within my dataset.

How to know in Talend if tMySQLInput will overwrite data?

I have an existing Talend Open Studio tMySQLInput component with some SQL code inside it, in order to retrieve some joined columns, linked to a tMySQLOutput component (pointing to an already existing MySQL table) with a few records.
QUESTION:
Will the "tMySQLInput" component overwrite the already existing table data that the tMySQLOutput component relates to? I mean is there an option to check in the tMySQLInput our output in order to say, overwrite each time this job is executed ?
Thank you all.
Yes, there is an option in tMySQLOutput where you can specify what action you want to perform on your table. Follow these steps:
Go to the Component tab of tMySQLOutput; it will open the basic settings of this component.
If you look closer you will find Action on table. This is the action you can perform on the table that tMySQLOutput points to. It has options such as Default, Drop and create table, etc.
Then you have Action on data. These are the operations you can perform on the data, like Insert, Update, etc.
In your case I suppose you can choose Action on table as Default and Action on data as Insert. The Default action would not do anything to the table, and the Insert option would insert the records at the end of the table. But with Insert, if you have duplicate rows, the job will stop the moment it finds a duplicate row.
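To illustrate the duplicate-row point, a minimal MySQL sketch (table and column names are hypothetical): with the Insert action, the job fails as soon as a row violates a unique or primary key on the target table.

-- Hypothetical target table with a primary key.
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name        VARCHAR(100)
);

INSERT INTO customers (customer_id, name) VALUES (1, 'Alice');  -- succeeds
INSERT INTO customers (customer_id, name) VALUES (1, 'Bob');    -- fails: duplicate key, the job stops here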