Copy Data Sink Validation - azure-data-factory

How can I use the Copy data activity to check against sink values?
My data sources:
SourceDataset: Source_SQL_DB
DestinationDataset: Destination_SQL_DB
SourceTable: SourceTableName
Columns: Name, Age, Gender, Location
DestinationTable: DestinationTableName
Columns: Name, Age, Gender, Location
Below is my scenario:
I have to validate the source before moving data to the sink table, by checking that the destination does not already have the values.
With Copy data I can load the data directly.
How do I pass the Location filter in the source query, since my source will be connected to the source dataset only?
select * from SourceTableName where Location in (select distinct Location from DestinationTableName)
How do I check whether the name is present in the destination table? If the name is present, I should not insert the data.
select * from SourceTableName where name not in (select distinct name from DestinationTableName )

Assuming both your source and sink are SQL, you can use a Lookup activity to get the list of names and locations as a comma-separated string and either save it in a variable or use it directly in the source query.
Another way would be to load the source data as-is into a staging table and then leverage a Stored Procedure activity, as sketched below.
The final way would be to use data flows.
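As a rough sketch of the staging-table approach, assuming the Copy activity first loads the source rows into a staging table (the StagingTableName table and the procedure name are assumptions, not from the original post), the procedure then inserts only the rows whose Name is not already in the destination:

CREATE PROCEDURE dbo.InsertNewNames
AS
BEGIN
    -- insert only the rows whose Name is not already present in the destination
    INSERT INTO DestinationTableName (Name, Age, Gender, Location)
    SELECT s.Name, s.Age, s.Gender, s.Location
    FROM StagingTableName s
    WHERE NOT EXISTS (SELECT 1 FROM DestinationTableName d WHERE d.Name = s.Name);

    -- empty the staging table for the next pipeline run
    TRUNCATE TABLE StagingTableName;
END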

Related

Azure Synapse Upsert Record into Dedicated Sql Pool

We have a requirement to fetch JSON data from Data Lake storage and insert/update data in Synapse tables based on the lastmodified field in the source JSON and the table column.
We need to either insert or update a record based on the following conditions:
if (sourceJson.id == table.id) // assume record already exists
{
    if (sourceJson.lastmodified > table.lastmodified) {
        // update existing record
    }
    else if (sourceJson.lastmodified < table.lastmodified) {
        // ignore record
    }
}
else {
    // insert record
}
Is there any way to achieve this? If so, please help me by sharing a sample flow.
Thanks
The Copy data activity and Azure data flows both have an Upsert option, but they would not satisfy your requirement.
Since you have a key column id and also a special condition based on which you want to either insert or ignore a record, you can first create a stored procedure in your Azure Synapse dedicated SQL pool.
The following is the data available in my table:
The following is the data available in my JSON:
[
    {
        "id": 1,
        "first_name": "Ana",
        "lastmodified": "2022-09-10 07:00:00"
    },
    {
        "id": 2,
        "first_name": "Cassy",
        "lastmodified": "2022-09-07 07:00:00"
    },
    {
        "id": 5,
        "first_name": "Topson",
        "lastmodified": "2022-09-10 07:00:00"
    }
]
Use a Lookup activity to read the input JSON file. Create a dataset, uncheck First row only, and run it. The following is my debug output:
Now, create a stored procedure. I have created it directly in my Synapse pool (you can use a Script activity to create it instead).
CREATE PROCEDURE mymerge
    @array varchar(max)
AS
BEGIN
    -- insert records whose id is not present in the table
    INSERT INTO demo1
    SELECT * FROM OPENJSON(@array) WITH (id int, first_name varchar(30), lastmodified datetime)
    WHERE id NOT IN (SELECT id FROM demo1);

    -- use MERGE to update records based on the matching id and lastmodified condition
    MERGE INTO demo1 AS tgt
    USING (SELECT * FROM OPENJSON(@array) WITH (id int, first_name varchar(30), lastmodified datetime)
           WHERE id IN (SELECT id FROM demo1)) AS ip
    ON (tgt.id = ip.id AND ip.lastmodified > tgt.lastmodified)
    WHEN MATCHED THEN
        UPDATE SET tgt.first_name = ip.first_name, tgt.lastmodified = ip.lastmodified;
END
Create a Stored procedure activity. Select the stored procedure created above and pass the Lookup output array as a string parameter to the stored procedure to get the required result.
@string(activity('Lookup1').output.value)
Running this would give the required result.
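To sanity-check the procedure outside the pipeline, you could also execute it directly against the pool with a small JSON string (a minimal sketch using one record from the sample data above):

EXEC mymerge @array = '[{"id":5,"first_name":"Topson","lastmodified":"2022-09-10 07:00:00"}]';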

adf v2 copy data tool with 787 tables (sql server) only creates pipeline with 340 in pipeline parameters json. why?

I'm using Azure Data Factory's Copy Data Tool. My source is an on-prem SQL Server. My sink target is Azure SQL MI. I selected 787 tables and the entire wizard process works fine and seemingly generates the artifacts (pipelines, datasets, linked services, etc.).
When I ran the pipeline it only copied 340 tables, because the pipeline parameter only contained the JSON for 340 tables instead of 787! I repeated the same Copy Data Tool steps and got the same result a second time. The pipeline parameter is of type 'array' and appears to be JSON that was generated to feed all the tables I selected during the ADF Copy Data Tool wizard.
Am I hitting some sort of limit and do you know a way around it?
I think we can use a query to get all the table names and then pass these table names to a copy activity.
select concat(s.name, '.', ov.name) as tableName
from sysobjects o
join sys.objects ov on o.id = ov.object_id
join sys.schemas s on ov.schema_id = s.schema_id
where o.xtype = 'U'
For example:
Use a Lookup activity to get the list of table names.
Then iterate over the list with a ForEach activity via the expression @activity('Lookup1').output.value.
Inside the ForEach activity, we can use a Copy activity.
We can edit the table name on the Source tab by keying in the dynamic content @item().tableName.
In the same way we can edit the table name on the Sink tab.

Extract and process select/update query constraints in Teiid ddl in Virtual DB

I am using a Teiid VDB model where I need to extract the query constraints inside the DDL and use them in a stored procedure to fetch results of my choice. For example, if I run the following query:
select * from Student where student_name = 'st123', I want to pass st123 to my procedure and return the results based on some processing.
How can I extract this constraint inside the DDL instead of Teiid doing the filtering for me and returning the matching row? Is there a way to handle this in the VDB instead of developing a connector?
See http://teiid.github.io/teiid-documents/master/content/reference/r_procedural-relational-command.html
If you have the procedure:
create virtual procedure student (in student_name string) returns table (<some cols>) as
begin
if (student_name like '...')
...
end
then you can call it as if it were a table:
select * from student where student_name = 'st123'
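For illustration, here is a fuller sketch of such a procedure; the Student columns (student_id, student_name) and the source model name src are assumptions, not from the original answer:

create virtual procedure student (in student_name string) returns table (student_id integer, student_name string) as
begin
    -- the value from the WHERE clause arrives here as the parameter, so any
    -- custom processing can happen before rows are returned; the parameter is
    -- qualified with the procedure name to distinguish it from the column
    select s.student_id, s.student_name
    from src.Student s
    where s.student_name = student.student_name;
end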

shredding xml file with xmltable db2

Is it possible to give XMLTABLE an XML file stored on my desktop PC as input?
How?
INSERT INTO abc(name)
SELECT x.name
FROM XMLTABLE('$i/product' PASSING CAST(? AS XML) as "i"
     COLUMNS
         name VARCHAR(10) PATH 'name') as x;
How do I pass my file stored on the desktop of my PC?
Thank you.
SQL statements can't access files outside of the database manager. To load data in an XML file and write it to a table, you'd either need to use a database utility (like LOAD or IMPORT), or write your own program to read the data from files on the client machine (i.e., your PC) and perform the inserts.
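As a rough sketch of the client-program route, the program running on your PC reads the XML file into a string and binds it to the parameter marker of a statement like the following; the abc table and element names come from the query above, while the use of XMLPARSE and the CLOB size are assumptions:

-- prepared and executed from a client program; the program reads the XML file
-- from the desktop into a string and binds it to the ? parameter marker
INSERT INTO abc (name)
SELECT x.name
FROM XMLTABLE('$i/product' PASSING XMLPARSE(DOCUMENT CAST(? AS CLOB(1M))) AS "i"
              COLUMNS
                  name VARCHAR(10) PATH 'name') AS x;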

How to update an XML column in DB2 with XMLQuery?

I want to update an XML column in DB2 with dynamic values, that is, with values that I will pick from another table and insert into the XML column.
I know how to insert a node along with its value by hard-coding it, e.g.
<data>some_value</data>
I want to do it in the following way:
UPDATE my_table SET my_table_column = XMLQuery(..... <data>???</data>)
WHERE my_table_id = other_table_id;
Where I have placed ???, I need some kind of select statement that will come up with the actual value for the node.