Copy activity auto-creates nvarchar(max) columns - azure-data-factory

I have an Azure Data Factory copy activity that loads parquet files into Azure Synapse. The sink is configured as shown below:
After the data load completed, I had a staging table structure like this:
Then I create a temp table based on the staging one. This had been working fine until today, when newly created tables suddenly received the nvarchar(max) type instead of nvarchar(4000):
Temp table creation now fails with an obvious error:
Column 'currency_abbreviation' has a data type that cannot participate in a columnstore index.
Why has the auto-created table definition changed, and how can I return to the "normal" behavior without nvarchar(max) columns?

I've got exactly the same problem! I'm using a data factory to read CSV files into my Azure data warehouse, and this used to result in nvarchar(4000) columns, but now they are all nvarchar(max). I also get the error
Column xxx has a data type that cannot participate in a columnstore index.
My solution for now is to change my SQL code and use a CAST to change the formats, but there must be a setting in the data factory to get the former results back...
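A sketch of that CAST workaround for the Synapse case above, assuming the staging table is named dbo.stg_currency and showing only the offending column; casting down from nvarchar(max) lets the clustered columnstore index be created again:

-- Recreate the temp table with an explicit cast so no column stays nvarchar(max)
-- (dbo.stg_currency is an assumed name for the auto-created staging table)
CREATE TABLE #currency_tmp
WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX)
AS
SELECT CAST(currency_abbreviation AS nvarchar(4000)) AS currency_abbreviation
FROM dbo.stg_currency;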

Related

Azure Data Factory Copy Pipeline with Geography Data Type

I am trying to get a geography data type from a production DB to another DB on a nightly basis. I really wanted to leverage upsert as the write activity, but it seems that geography is not supported with this method. I was reading a similar post about bringing the data through ADF as a well-known text (WKT) data type and then changing it, but I keep getting confused about what to do with the data once it is brought over as well-known text. I would appreciate any advice, thank you.
I tried to use ADF pipelines and data flows, and tried to convert the data type once it was in the destination, but then I was not able to run the pipeline again.
I tried to upsert data with the geography data type from one Azure SQL database to another using a copy activity and got an error message.
Then I did the upsert using a dataflow activity. Below are the steps.
A source table is taken in the dataflow as in the image below.
CREATE TABLE SpatialTable
(
    id int,
    GeogCol1 geography,
    GeogCol2 AS GeogCol1.STAsText()
);
INSERT INTO SpatialTable (id, GeogCol1)
VALUES (1, geography::STGeomFromText('LINESTRING(-122.360 46.656, -122.343 46.656)', 4326));
INSERT INTO SpatialTable (id, GeogCol1)
VALUES (2, geography::STGeomFromText('POLYGON((-122.357 47.653, -122.348 47.649, -122.348 47.658, -122.358 47.658, -122.358 47.653))', 4326));
Then an Alter Row transformation is added, and in Alter Row Conditions, Upsert if isNull(id)==false() is given. (The sink table is upserted based on the column id.)
Then, in the sink, the dataset for the target table is given. In the sink settings, the update method is set to Allow upsert and the required key column is given (here, column id is selected).
When the pipeline is run for the first time, data is inserted into the target table.
When the pipeline is run a second time, after updating existing data and inserting new records into the source, the data is upserted correctly.
Source data is changed for id=1 and a new row is inserted with id=3.
The sink data reflects the changes made in the source.
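If you prefer the well-known-text route mentioned in the question instead of the dataflow, a minimal sketch (the staging and target table names here are assumptions) is to land the geography value as WKT text with the copy activity and convert it back on the destination:

-- Assumed staging table populated by the copy activity, holding the geography value as WKT text
CREATE TABLE dbo.SpatialStaging
(
    id int,
    GeogWkt nvarchar(max)
);

-- Convert the WKT back to geography when loading the destination table
-- (dbo.SpatialTarget is assumed to have the same shape as SpatialTable above)
INSERT INTO dbo.SpatialTarget (id, GeogCol1)
SELECT id, geography::STGeomFromText(GeogWkt, 4326)
FROM dbo.SpatialStaging;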

View query used in Data Factory Copy Data operation

In Azure Data Factory, I'm doing a pretty vanilla 'Copy Data' operation. One dataset to another.
Can I view the query being used to perform the copy operation? Apparently there is a syntax error, but I've only used the drag-and-drop menus. Here's the error:
ErrorCode=SqlOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=A
database operation failed. Please search error to get more
details.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Data.SqlClient.SqlException,Message=Incorrect
syntax near the keyword 'from'.,Source=.Net SqlClient Data
Provider,SqlErrorNumber=156,Class=15,ErrorCode=-2146232060,State=1,Errors=[{Class=15,Number=156,State=1,Message=Incorrect
syntax near the keyword 'from'.,},],'
Extra context
1. Clear the schema and import the schema again.
2. This mostly happens when there are changes to the table schema after creating the pipeline and datasets; verify once.
3. The schema and datasets should be refreshed when there are changes to the SQL table schema.
4. For any table or view name used in the query, use brackets, e.g. [dbo].[persons] (see the example below).
5. In the datasets, select the table name.
6. Try to publish before testing.
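For example, the bracketed source query referred to in point 4 would look something like this (the column and table names are just placeholders):

SELECT PersonID, LastName, FirstName
FROM [dbo].[persons]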
I followed the same scenario and reproduced the same thing. I didn't get any error, as you can see; the above error mainly happens because of the schema.
Source dataset:
In the source dataset, I manually added and connected to a SQL database with a sample SQL table.
Sink dataset:
In the sink dataset, I added another SQL database with auto create table, write behavior: Upsert, key column: PersonID.
Before execution there was no table in the SQL database; after the execution succeeded, I got this sample output in the Azure SQL database.

Azure Data Factory - ingesting data from a CSV file to a SQL table - copy activity SQL sink stored procedure - table type and table type parameter name

I am running one of our existing Azure Data Factory pipelines that inserts into a SQL table, where the sink is a SQL Server stored procedure.
I supplied the table type and the table type parameter name, which maps to my table.
I am getting the error: Failure happened on 'Sink' side.
Stored procedure:
CREATE PROCEDURE [ods].[Insert_EJ_Bing]
    @InputTable [ods].[EJ_Bing_Type] READONLY,
    @FileName varchar(1000)
AS
BEGIN
    insert into #workingtable
    (
        [ROWHASH],
        [Date]
    )
    select [ROWHASH],
           [Date]
    from @InputTable
end
Error message:
Failure happened on 'Sink' side.
ErrorCode=InvalidParameter,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The value of the property '' is invalid:
'Invalid 3 part name format for TypeName.'.,Source=,''Type=System.ArgumentException,Message=Invalid 3 part name format for TypeName.,Source=System.Data
Can anyone please tell me where I am going wrong?
There is nothing wrong with your approach or code. You are getting this error because of the way you specified the Table type value in the sink.
I got the same error as you with the above table type.
Change the Table type from [dbo.][tabletype] to [dbo].[tabletype]
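For reference, the table type itself is a two-part object name; a minimal sketch of how it might be declared (the column types are assumptions, since they are not shown in the question) is:

-- Hypothetical definition of the user-defined table type used by the stored procedure above
CREATE TYPE [ods].[EJ_Bing_Type] AS TABLE
(
    [ROWHASH] varchar(64),
    [Date] date
);

In the copy activity sink, the Table type field should then contain exactly that two-part name, [ods].[EJ_Bing_Type], with the dot between the two bracketed parts.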
You can see my copy activity is successful.
Rather than typing the stored procedure name and table type names, you can do it like below.

Azure Data Factory -> Copy from SQL to Table Storage (boolean mapping)

I am adding a pipeline in Azure Data Factory to migrate data from SQL to Table storage.
All seems to be working fine; however, I observed that the bit column is not getting copied as expected.
I have a field "IsMinor" in the SQL DB.
If I don't add an explicit mapping for the bit column, it is copied as null.
If I set it as 'True' or 'False' from SQL, it is copied as a String instead of a Boolean in Table storage.
I also tried to specify the type while mapping the field, i.e. "IsMinor (Boolean)", however that didn't work either.
Following is my sample table
I want the bit value above to be copied as "Boolean" in Table storage instead of String.
I tried copying the boolean data from my SQL database to Table storage, and it works.
As you know, SQL Server doesn't have a boolean data type (bit is the closest equivalent), so I created a table like this:
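Roughly, the source table would have been along these lines, with IsMinor declared as bit (the exact columns from the original screenshot are not shown, so the names here are assumptions):

-- Assumed shape of the source table used for the test
CREATE TABLE dbo.PersonTest
(
    Id int,
    Name varchar(50),
    IsMinor bit
);

INSERT INTO dbo.PersonTest (Id, Name, IsMinor)
VALUES (1, 'Alice', 1), (2, 'Bob', 0);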
The data preview looks fine in the source dataset:
I just created a table test1 in Table storage and let the data factory create the PartitionKey and RowKey automatically.
Run the pipeline and check the data in test1 with Storage Explorer:
According to the document Understanding the Table service data model, Table storage does support Boolean property types.
Hope this helps.

Transfer data from U-SQL managed table to Azure SQL Database table

I have a U-SQL managed table that contains schematized structured data.
CREATE TABLE [AdlaDb].[dbo].[User]
(
    UserGuid Guid,
    Postcode string,
    Age int?,
    DateOfBirth DateTime?
)
And an Azure SQL Database table.
CREATE TABLE [SqlDb].[dbo].[User]
(
    UserGuid uniqueidentifier NOT NULL,
    Postcode varchar(15) NULL,
    Age int NULL,
    DateOfBirth Date NULL
)
I would like to transfer data from U-SQL managed table to Azure SQLDB table without losing the data types.
I'm using Azure Data Factory, but it seems I cannot
directly query the U-SQL managed table as an input dataset for Data Factory, or
do a federated write query to Azure SQL DB.
Hence I have an intermediate step where I copy from the U-SQL managed table to Azure Blob storage and then move the data into the Azure SQL DB table. Doing this, I lose the data types and have to apply type conversions/transformations again before inserting.
Is there any better way to transfer data from U-SQL managed table to Azure SQL Database table without losing data type? Or am I missing something?
At this point you have the following option:
Export the U-SQL table into an intermediate format (e.g., CSV) in ADLS or blob storage (see the sketch after this list).
Use ADF to move the file into Azure SQL DB.
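For the export step, a minimal U-SQL script over the table above might look like this (the output path is an assumption):

// Export the U-SQL managed table to a CSV file with a header row
@users =
    SELECT UserGuid, Postcode, Age, DateOfBirth
    FROM [AdlaDb].[dbo].[User];

OUTPUT @users
TO "/output/users.csv"
USING Outputters.Csv(outputHeader : true);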
I know that the ADF team has a work item to do this for you. I will ask them to reply to this thread as well.
Directly writing into a table from a U-SQL script has a lot of challenges due to the fault-tolerant retry and scale-out processing in U-SQL. This makes atomic writing in parallel into a transacted store a bit more complex (see for example http://www.vldb.org/conf/1996/P460.PDF).
There is now another option to transfer data from a U-SQL managed table to an Azure SQL Database table:
Write out the data from the U-SQL managed table or from a U-SQL script to Azure Blob Storage as a text file (.csv, .txt, etc.).
Then make use of the public preview feature in Azure SQL Database - BULK INSERT - and wrap it into a stored procedure (sketched after the note below).
Add a Stored Procedure activity in Azure Data Factory to schedule it.
Note: There is one thing to be aware of when creating a DATABASE SCOPED CREDENTIAL; refer to this Stack Overflow question.
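A minimal sketch of such a stored procedure, assuming the exported file sits in a blob container exposed through an external data source named AzureBlobStore (created with the database scoped credential mentioned in the note) and targeting the [dbo].[User] table above:

-- BULK INSERT from Azure Blob Storage into the target table.
-- AzureBlobStore is an assumed EXTERNAL DATA SOURCE backed by a DATABASE SCOPED CREDENTIAL.
CREATE PROCEDURE dbo.Load_User_FromBlob
AS
BEGIN
    BULK INSERT [dbo].[User]
    FROM 'users/users.csv'            -- path relative to the external data source (assumed)
    WITH (
        DATA_SOURCE = 'AzureBlobStore',
        FORMAT = 'CSV',
        FIRSTROW = 2,                 -- skip the header row written by the exporter
        FIELDTERMINATOR = ',',
        ROWTERMINATOR = '\n'
    );
END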