Transfer data from U-SQL managed table to Azure SQL Database table - azure-data-factory

I have a U-SQL managed table that contains schematized structured data.
CREATE TABLE [AdlaDb].[dbo].[User]
(
UserGuid Guid,
Postcode string,
Age int?,
DateOfBirth DateTime?
)
And an Azure SQL Database table:
CREATE TABLE [SqlDb].[dbo].[User]
(
UserGuid uniqueidentifier NOT NULL,
Postcode varchar(15) NULL,
Age int NULL,
DateOfBirth Date NULL
)
I would like to transfer data from the U-SQL managed table to the Azure SQL DB table without losing the data types.
I'm using Azure Data Factory, and it seems that I cannot:
- directly query the U-SQL managed table as an input dataset for Data Factory
- do a federated write query to Azure SQL DB
Hence I have an intermediate step where I copy from the U-SQL managed table to Azure Blob storage and then move the data to the Azure SQL DB table. Doing this, I lose the data types and have to apply type conversions/transformations again before inserting.
Is there a better way to transfer data from a U-SQL managed table to an Azure SQL Database table without losing the data types? Or am I missing something?

At this point you have the following option:
1. Export the U-SQL table into an intermediate format (e.g., CSV) in ADLS or blob storage.
2. Use ADF to move the file into Azure SQL DB.
I know that the ADF team has a work item to do this for you. I will ask them to reply to this thread as well.
Directly writing into a table from a U-SQL script has a lot of challenges due to the fault-tolerant retry and scale-out processing in U-SQL. This makes atomic writing in parallel into a transacted store a bit more complex (see for example http://www.vldb.org/conf/1996/P460.PDF).

There is now another option for transferring data from a U-SQL managed table to an Azure SQL Database table.
1. Write out the data from the U-SQL managed table, or from a U-SQL script, to Azure Blob Storage as a text file (.csv, .txt, etc.).
2. Use the public preview feature in Azure SQL Database, BULK INSERT, and wrap it in a stored procedure.
3. Add a Stored Procedure activity in Azure Data Factory to schedule it.
Note: There is one thing to be aware of when creating the DATABASE SCOPED CREDENTIAL; see this Stack Overflow question.
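The BULK INSERT route above can be sketched roughly as follows. This is a minimal sketch, not the answerer's actual script: the credential name BlobSasCredential, the data source name UserExportBlob, the procedure name dbo.LoadUserFromBlob, the file path user/user.csv and all bracketed placeholders are assumptions made for illustration. It also assumes the exported CSV has a header row and columns in the same order as the target table.

```sql
-- One-time setup. All names and secrets here are placeholders.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

-- Assumption: access is granted through a SAS token. The documented
-- requirement is that SECRET holds the token without a leading '?'.
CREATE DATABASE SCOPED CREDENTIAL BlobSasCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<SAS token without the leading ?>';

CREATE EXTERNAL DATA SOURCE UserExportBlob
WITH (
    TYPE = BLOB_STORAGE,
    LOCATION = 'https://<account>.blob.core.windows.net/<container>',
    CREDENTIAL = BlobSasCredential
);
GO  -- batch separator: CREATE PROCEDURE must start its own batch

-- Hypothetical procedure for the ADF Stored Procedure activity to call.
CREATE PROCEDURE dbo.LoadUserFromBlob
AS
BEGIN
    SET NOCOUNT ON;

    -- The text values are converted to the target column types
    -- (uniqueidentifier, varchar(15), int, date) during the load.
    BULK INSERT dbo.[User]
    FROM 'user/user.csv'
    WITH (
        DATA_SOURCE = 'UserExportBlob',
        FORMAT = 'CSV',
        FIRSTROW = 2  -- skip the assumed header row
    );
END;
```

Because the load targets the typed table directly, the type conversion happens inside BULK INSERT rather than in a separate transformation step.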

Related

Can I use Azure Data Factory for ETL Testing as well?

If I have to do data migration and ETL testing for data in Azure SQL DBs, can I use Azure Data Factory?
If yes, then please provide some links explaining how, or some tutorials or pages where I can find details.
Thanks in advance!
Sunil
Yes, you can do ETL testing by using Data Factory. I copied all tables from one database to another using the process below.
I listed all the tables with this query:
SELECT TABLE_SCHEMA, TABLE_NAME FROM information_schema.TABLES
WHERE TABLE_TYPE = 'BASE TABLE' AND TABLE_SCHEMA = 'dbo'
I created a pipeline and added a Lookup activity that retrieves the tables of the database by running that query.
I created linked services for the source and the sink. On success of the Lookup activity, I added a ForEach activity with the Sequential option enabled and placed a Copy activity inside it. I created a dynamic dataset using the source linked service. To retrieve all the tables from the database, I supplied the table details dynamically as follows:
I created schema and table parameters on the dataset. The dynamic content for the schema is @dataset().schema and for the table is @dataset().table. In the Copy activity, the value passed for schema is @item().TABLE_SCHEMA and for table is @item().TABLE_NAME.
I created a dataset using the sink linked service, created the same parameters, entered the same values as for the source, and enabled the auto create table option.
I executed the pipeline, and it ran successfully.
All tables were copied to the target database.
In this way you can copy all the tables from one database to another.

Azure Data Factory Copy Pipeline with Geography Data Type

I am trying to copy a geography data type from a production DB to another DB on a nightly schedule. I really wanted to use upsert as the write behavior, but it seems that geography is not supported with this method. I was reading a similar post about bringing the data through ADF as well-known text and then converting it, but I keep getting confused about what to do with the data once it has been brought over as well-known text. I would appreciate any advice, thank you.
I tried ADF pipelines and data flows. I also tried converting the data type once the data was in the destination, but then I was not able to run the pipeline again.
I tried to upsert data with the geography data type from one Azure SQL database to another using the Copy activity and got an error message.
Then I did the upsert using a Data Flow activity. The steps are below.
A source table is added to the data flow. It was created and populated as follows:
CREATE TABLE SpatialTable
( id int ,
GeogCol1 geography,
GeogCol2 AS GeogCol1.STAsText() );
INSERT INTO SpatialTable (id,GeogCol1)
VALUES (1,geography::STGeomFromText('LINESTRING(-122.360 46.656, -122.343 46.656 )', 4326));
INSERT INTO SpatialTable (id,GeogCol1)
VALUES (2,geography::STGeomFromText('POLYGON((-122.358 47.653 , -122.348 47.649, -122.348 47.658, -122.358 47.658, -122.358 47.653))', 4326));
Then an Alter Row transformation is added, and in Alter Row Conditions, "Upsert if" is set to isNull(id)==false(). (The sink table is upserted based on the id column.)
Then, in the Sink, the dataset for the target table is given. In the sink settings, the update method is set to Allow Upsert and the required key column is given. (Here the id column is selected.)
When the pipeline is run for the first time, the data is inserted into the target table.
When the pipeline is run a second time, after updating existing data and inserting new records in the source, the data is upserted correctly.
The source data was changed for id=1, and a new row was inserted with id=3.
The sink data reflects the changes made in the source.
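For the well-known text route mentioned in the question, one possible sketch is to copy the text column (such as GeogCol2 above) into a staging table and then convert it back to geography inside a MERGE on the destination. This is not the method used in the answer above. The tables dbo.SpatialStaging and dbo.SpatialTarget and the column GeogWkt are hypothetical names, and SRID 4326 is assumed because the sample inserts use it.

```sql
-- Hypothetical staging table that the Copy activity loads each night,
-- with the geography value carried as well-known text.
CREATE TABLE dbo.SpatialStaging
(
    id      int           NOT NULL,
    GeogWkt nvarchar(max) NULL
);

-- Hypothetical target table holding the real geography column.
CREATE TABLE dbo.SpatialTarget
(
    id       int       NOT NULL PRIMARY KEY,
    GeogCol1 geography NULL
);

-- Upsert from staging into the target, converting the text back to
-- geography. SRID 4326 is an assumption taken from the sample data.
MERGE dbo.SpatialTarget AS t
USING dbo.SpatialStaging AS s
    ON t.id = s.id
WHEN MATCHED THEN
    UPDATE SET t.GeogCol1 = geography::STGeomFromText(s.GeogWkt, 4326)
WHEN NOT MATCHED BY TARGET THEN
    INSERT (id, GeogCol1)
    VALUES (s.id, geography::STGeomFromText(s.GeogWkt, 4326));

-- Clear the staging table so the next nightly run starts empty.
TRUNCATE TABLE dbo.SpatialStaging;
```

Since the target keeps its geography column and only the staging table holds text, the pipeline can be rerun without changing the type of a destination column between runs.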

Copy activity auto-creates nvarchar(max) columns

I have Azure Data Factory copy activity which loads parquet files to Azure Synapse. Sink is configured as shown below:
After data loading completed I had a staging table structure like this:
Then I create a temp table based on the staging one. This had been working fine until today, when newly created tables suddenly received nvarchar(max) columns instead of nvarchar(4000):
Temp table creation now fails with an obvious error:
Column 'currency_abbreviation' has a data type that cannot participate in a columnstore index.
Why has the auto-created table definition changed, and how can I return to the "normal" behavior without nvarchar(max) columns?
I've got exactly the same problem! I'm using a data factory to read CSV files into my Azure data warehouse. This used to result in nvarchar(4000) columns, but now they are all nvarchar(max). I also get the error
Column xxx has a data type that cannot participate in a columnstore index.
My solution for now is to change my SQL code and use a CAST to change the formats, but there must be a setting in the data factory to get the former results back...
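The CAST workaround described above might look roughly like this in a Synapse dedicated SQL pool. It is a sketch only: the table names dbo.stg_currency and dbo.tmp_currency and the ROUND_ROBIN distribution are assumptions, and only the column named in the error message is shown.

```sql
-- Build the working table with CTAS, narrowing the auto-created
-- nvarchar(max) column so it can take part in a columnstore index.
-- Table names and the distribution choice are hypothetical.
CREATE TABLE dbo.tmp_currency
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT
    CAST(currency_abbreviation AS nvarchar(4000)) AS currency_abbreviation
FROM dbo.stg_currency;
```

Note that an explicit CAST to nvarchar(4000) truncates longer values without raising an error, so it is worth checking the maximum length in the staging table first.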

Use dynamic value as table name of a table storage in Azure Data Factory

I have an ADF pipeline that uses copy data activity for copying data from blob storage to table storage. This pipeline runs on a trigger once every day. I have provided a table name in table storage data set as 'Table1'.
Instead of providing a hard-coded table name (Table1), is it possible to provide a dynamic value, so that the run ID of the pipeline run is used as the table name in table storage and the data is copied from blob storage to that table?
You could set a dynamic value as table name.
For example, you can add parameter to the table storage dataset:
Then you can set the pipeline parameter to specify the table name:
However, we cannot use the run ID of the pipeline run as the table name in table storage and copy data from blob storage to that table.
Hope this helps.

Azure Data Factory -> Copy from SQL to Table Storage (boolean mapping)

I am adding a pipeline in Azure Data Factory to migrate data from SQL to Table storage.
Everything seems to work fine; however, I observed that a bit column is not getting copied as expected.
I have a field "IsMinor" in the SQL DB.
If I don't add an explicit mapping for the bit column, it is copied as null.
If I set it to 'True' or 'False' from SQL, it is copied as String instead of Boolean in Table storage.
I also tried to specify the type while mapping the field, i.e. "IsMinor (Boolean)"; however, that didn't work either.
Following is my sample table
I want the bit value above to be copied as Boolean in Table storage instead of String.
I tried copying Boolean data from my SQL database to Table storage, and it works.
As you know, SQL Server doesn't support a boolean data type, so I created a table that uses a bit column instead:
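The answerer's actual table definition is not reproduced in the thread. A hypothetical table of this kind, with a bit column standing in for Boolean, could be sketched as follows. The table name dbo.Person and every column except IsMinor are assumptions.

```sql
-- Hypothetical source table: SQL Server has no boolean type,
-- so the flag is stored as bit (1 = true, 0 = false).
CREATE TABLE dbo.Person
(
    Id      int          NOT NULL PRIMARY KEY,
    Name    nvarchar(50) NULL,
    IsMinor bit          NULL
);

-- Illustrative rows only.
INSERT INTO dbo.Person (Id, Name, IsMinor)
VALUES (1, N'Alice', 1),
       (2, N'Bob', 0);
```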
The data preview looks fine in the source dataset:
I just created a table test1 in Table storage and let Data Factory create the PartitionKey and RowKey automatically.
I ran the pipeline and checked the data in test1 with Storage Explorer:
According to the document Understanding the Table service data model, Table storage does support Boolean property types.
Hope this helps.