Azure Data Factory Copy Pipeline with Geography Data Type

I am trying to copy a geography data type from a production DB to another DB on a nightly basis. I really wanted to leverage upsert as the write behavior, but it seems that geography is not supported with this method. I was reading a similar post about bringing the data through ADF as well-known text (WKT) and then converting it back, but I keep getting confused about what to do with the data once it has been brought over as WKT. I would appreciate any advice, thank you.
I tried using ADF pipelines and data flows, and I tried converting the data type once it was in the destination, but then I was not able to run the pipeline again.

I tried to upsert data with the geography data type from one Azure SQL database to another using the copy activity and got an error message.
Then I did the upsert using a data flow activity. Below are the steps.
A source table is created as follows and used as the data flow source.
CREATE TABLE SpatialTable
(
    id int,
    GeogCol1 geography,
    GeogCol2 AS GeogCol1.STAsText()
);
INSERT INTO SpatialTable (id, GeogCol1)
VALUES (1, geography::STGeomFromText('LINESTRING(-122.360 46.656, -122.343 46.656)', 4326));
INSERT INTO SpatialTable (id, GeogCol1)
VALUES (2, geography::STGeomFromText('POLYGON((-122.357 47.653, -122.348 47.649, -122.348 47.658, -122.358 47.658, -122.358 47.653))', 4326));
Then an Alter Row transformation is added, with the Alter Row condition Upsert if isNull(id)==false(). (The sink table is upserted based on the column id.)
Then a sink dataset for the target table is given. In the sink settings, the update method is set to Allow upsert and the required key column is given (here the column id is selected).
When the pipeline is run for the first time, data is inserted into the target table.
When the pipeline is run a second time, after updating existing data and inserting new records into the source, the data is upserted correctly.
The source data is changed for id=1 and a new row is inserted with id=3.
The sink data reflects the changes made in the source.
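If you would rather keep the copy activity and the well-known-text approach mentioned in the question, a minimal sketch follows. It assumes the copy activity lands the WKT text in a staging table called SpatialStaging with columns id and GeogWkt (both hypothetical names), and then converts the text back to geography in T-SQL:
-- Hypothetical staging table filled by the copy activity with WKT text
-- CREATE TABLE SpatialStaging (id int, GeogWkt nvarchar(max));
MERGE SpatialTable AS tgt
USING SpatialStaging AS src
    ON tgt.id = src.id
WHEN MATCHED THEN
    UPDATE SET tgt.GeogCol1 = geography::STGeomFromText(src.GeogWkt, 4326)
WHEN NOT MATCHED THEN
    INSERT (id, GeogCol1)
    VALUES (src.id, geography::STGeomFromText(src.GeogWkt, 4326));
This MERGE could be run from a Stored Procedure or Script activity after the copy, so the nightly upsert still happens without the copy activity ever writing the geography column directly.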

Related

Custom logging in Azure Data Factory

I'm new to ADF and trying to build an Azure Data Flow Pipeline. I'm reading from a Snowflake data source and checking the data against several business rules. After each check, I'm writing the bad records to a csv file. Now, my requirement is that I need to create a log table which shows the business rule and the number of records that failed to pass that particular business rule. I've attached a screenshot of my ADF data flow as well as the structure of the table I'm trying to populate.
My idea was to create a stored proc that will be called at the end of each business rule, so that a record is created in the database. However, I'm unable to add an SP from the data flow. I found that I can get the rows written to a sink from the pipeline. However, I can't figure out how to tie the sink name and the rows written together, and how to iterate the stored procedure over all the business rules.
Snapshot of what my data flow looks like
The columns that I want to populate
In my data flow activity, I used sink1 and sink2 to store the data that violates business rule 1 and business rule 2 respectively. I created a stored procedure that records the business rule and the failed-row count in a log table, and then used the Stored Procedure activity in ADF to insert the records. Below are the steps.
Table for the log file:
CREATE TABLE [dbo].[log_file](
    [BusinessRule] [varchar](50) NULL, -- business rule
    [count] [varchar](50) NULL         -- failed rows count
) ON [PRIMARY]
GO
Stored procedure for inserting records into the log file through Data Factory:
CREATE PROC [dbo].[usp_insert_log_file] (@BusinessRule varchar(100), @count varchar(10))
AS
BEGIN
    INSERT INTO log_file VALUES (@BusinessRule, @count)
END
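To sanity-check the procedure outside ADF, it can be called directly; the values below are just placeholders:
EXEC [dbo].[usp_insert_log_file] @BusinessRule = 'Business_Rule_1', @count = '25';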
The data flow activity has two sinks and is chained to two Stored Procedure activities.
The stored procedure has two parameters, BusinessRule and count.
In Stored Procedure 1, enter the corresponding business rule for sink1 in the BusinessRule parameter, and for the count parameter pass sink1's rowsWritten value from the output of the data flow activity:
BusinessRule: 'Business_Rule_1'
Count:
@string(activity('Data flow1').output.runStatus.metrics.sink1.rowsWritten)
Similarly, in the Stored Procedure 2 activity, enter the corresponding business rule and pass sink2's count value in the parameters:
BusinessRule: 'Business_Rule_2'
Count:
@string(activity('Data flow1').output.runStatus.metrics.sink2.rowsWritten)
In this way, we can insert data into the log file from the data flow activity using the Stored Procedure activity.

NiFi passing date from ExecuteSQL to PutDatabaseRecord

I need to insert the result of an SQL query into a Postgres table.
For that I use ExecuteSQL("Use Avro Logical Types" true) and PutDatabaseRecord("StatementType" INSERT, "Record Reader" AvroReader) processors.
The insert doesn't work because NiFi converts the date to this number: '1322683200000', while the column in the destination table is of type date.
I suppose I should either add the UpdateRecord processor between the ExecuteSQL and PutDatabaseRecord processors, or use the "Data Record Path" property in the PutDatabaseRecord processor.
But I can't find an example of configuring the UpdateRecord processor or of filling in the "Data Record Path" property.
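For reference, that number is an epoch value in milliseconds; as a quick, purely illustrative check of what date it encodes, you could cast it on the Postgres side:
-- decode the epoch-millisecond value into a date (result depends on the session time zone)
SELECT to_timestamp(1322683200000 / 1000.0)::date AS decoded_date;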
I tried the same process on my side with a dummy select current_date as value; in the ExecuteSQL processor and then passed the result on to PutDatabaseRecord to insert the data into Postgres.
The flow was able to insert the date value into the sample table that I created for testing. Can you try the same and see if you have any issues, or provide some sample data and details of how you are building the data flow?
The insert started to work when I put the columns in the SQL query in the same order as in the destination table.
I thought that NiFi would match columns by name.

ADF mapping data flow only inserting, never updating

I have an ADF data flow that will only insert. It never updates rows.
Below is a screenshot of the flow, and the Alter Row task that sets the insert/update policies.
data flow
alter row task
There is a source table and a destination table.
There is a source table for new data.
A lookup is done against the key of the destination table.
Two columns are then generated: a hash of the source data and a hash of the destination data.
In the Alter Row task, the policies are as follows:
Insert: if the lookup found no matching id.
Update: if lookup found a matching id and the checksums do not match (i.e. user exists but data is different between the source and existing record).
Otherwise it should do nothing.
The Sink allows insert and updates:
Even so, on first run it inserts all records but on second run it inserts all the records again, even if they exist.
I think I am misunderstanding the process, so I would appreciate any expertise or advice.
Thank you, Joel Cochran, for your valuable input. I repro'd the scenario and am posting it as an answer to help other community members.
If you are using the upsert method in the sink, add an Alter Row transformation with an Upsert if condition and write the expression for the upsert condition.
If you are using insert and update as the update methods in the sink, then in the Alter Row transformation use both Insert if and Update if conditions, so that data is inserted or updated in the sink based on those conditions; a sketch of the conditions is shown below.
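As a rough illustration only (the names lookupId, srcHash, and dstHash are hypothetical, standing in for the looked-up key and the two generated hash columns), the Alter Row conditions for this scenario would look something like:
Insert if: isNull(lookupId)
Update if: !isNull(lookupId) && srcHash != dstHash
with the sink set to allow both inserts and updates and the key column specified.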

Copy activity auto-creates nvarchar(max) columns

I have an Azure Data Factory copy activity which loads parquet files into Azure Synapse. The sink is configured as shown below:
After data loading completed, I had a staging table structure like this:
Then I create a temp table based on the staging one. This had been working fine until today, when newly created tables suddenly received the nvarchar(max) type instead of nvarchar(4000):
Temp table creation now fails with an obvious error:
Column 'currency_abbreviation' has a data type that cannot participate in a columnstore index.
Why has the auto-create table definition changed, and how can I return to the "normal" behavior without nvarchar(max) columns?
I've got exactly the same problem! I'm using a data factory to read CSV files into my Azure data warehouse, and this used to result in nvarchar(4000) columns, but now they are all nvarchar(max). I also get the error
Column xxx has a data type that cannot participate in a columnstore index.
My solution for now is to change my SQL code and use a CAST to change the formats, but there must be a setting in the data factory to get the former results back...
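A minimal sketch of that CAST workaround, assuming the staging table is stg.currency and the offending column is currency_abbreviation (both names hypothetical), rebuilt as a CTAS in the dedicated SQL pool:
-- cast the nvarchar(max) column down so it can participate in the clustered columnstore index
CREATE TABLE dbo.temp_currency
WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX)
AS
SELECT CAST(currency_abbreviation AS nvarchar(4000)) AS currency_abbreviation
FROM stg.currency;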

Azure Data Factory -> Copy from SQL to Table Storage (boolean mapping)

I am adding a pipeline in Azure Data Factory to migrate data from SQL to Table storage.
All seems to be working fine; however, I observed that the bit column is not getting copied as expected.
I have a field "IsMinor" in the SQL DB.
If I don't add an explicit mapping for the bit column, it is copied as null.
If I set it as 'True' or 'False' from SQL, it is copied as a String instead of a Boolean in Table storage.
I also tried to specify the type while mapping the field, i.e. "IsMinor (Boolean)", but that didn't work either.
Following is my sample table
I want the bit value above to be copied as "Boolean" in Table storage instead of String.
I tried copying the boolean data from my SQL database to Table storage, and it works.
As you know, SQL Server doesn't have a boolean data type, so I created a table like this:
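The original screenshot isn't reproduced here; a comparable test table with a bit column, using hypothetical names and values, would be:
-- bit is SQL Server's closest equivalent to a boolean
CREATE TABLE dbo.PersonTest
(
    Id int PRIMARY KEY,
    IsMinor bit NOT NULL
);
INSERT INTO dbo.PersonTest (Id, IsMinor)
VALUES (1, 1), (2, 0);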
All the data previews look fine in the source dataset:
I just created a table test1 in Table storage and let the data factory create the PartitionKey and RowKey automatically.
Run the pipeline and check the data in test1 with Storage Explorer:
From the document Understanding the Table service data model, Table storage does support Boolean property types.
Hope this helps.