How to create error based on Lookup value in Data Factory? - azure-data-factory

I have an Azure Data Factory pipeline.
After processing data, I would like to run a validation against the Azure SQL database to catch exceptions that were not caught by Data Factory. There are situations where no new rows are created because of errors in the system.
So I would create a Lookup activity with a SELECT COUNT statement to check whether a specific ID exists or not.
A value of 0 would mean that the required row was not created and an error should be raised in Data Factory.
How do I raise an error for Data Factory monitoring when the lookup value is 0?

You can use a Lookup activity against the SQL sink with a query that counts the rows in it, then use an If Condition activity to compare the result with 0 and proceed with whatever action is necessary.
Query: SELECT count(*) FROM [dbo].[myStudents]
If Condition expression: @equals(activity('Lookup SQL').output.value, 0)
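The exact property to compare depends on how the Lookup is configured. A minimal sketch, assuming the count column is aliased as cnt, "First row only" is enabled on the Lookup, and the id being checked (123 here) is only illustrative:
Query: SELECT COUNT(*) AS cnt FROM [dbo].[myStudents] WHERE id = 123
If Condition expression: @equals(activity('Lookup SQL').output.firstRow.cnt, 0)
The True branch of the If Condition can then run whatever activity should surface the error.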

Related

How to map the iterator value to sink in adf

A question concerning Azure Data Factory.
I need to persist the iterator value from a Lookup activity (an Id column from a SQL table) to my sink together with other values.
How do I do that?
I thought that I could just reference the iterator value as @{item().id} in the source, mapped to a destination column name from my SQL table sink. That doesn't seem to work: the resulting value in the destination column is NULL.
I have used two Lookup activities, one for the id values and the other for the remaining values. Then, to combine these values and insert them into the sink table, I have done the following.
The ids Lookup activity output is as follows:
I have one more column to combine with the above id values. The following is the Lookup output for that:
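Since only the shape of these outputs matters here, a hypothetical sketch of the two Lookup outputs (the id and gname values are illustrative):
ids output:            { "count": 2, "value": [ { "id": 1 }, { "id": 2 } ] }
remaining rows output: { "count": 2, "value": [ { "gname": "group1" }, { "gname": "group2" } ] }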
I have given the following dynamic content as the items value of the For Each activity:
@range(0, length(activity('ids').output.value))
Inside the For Each activity, I have used the following Script activity query to insert the data into the sink table as required:
insert into t1 values(@{activity('ids').output.value[item()].id},'@{activity('remaining rows').output.value[item()].gname}')
The data is inserted successfully, and the following is the reference image of the same:

Azure Data Factory Copy Activity Pipeline Destination Mapping String Format Date to Sql Date column Warning

I am using a copy activity to load data from Azure Data Factory to an on-premises SQL table.
In the copy activity column mapping I can see a warning message saying the source column is a string with a date and time value (2022-09-13 12:53:28), so I created the target SQL table column with a date data type.
When importing the mapping in the copy activity, whatever date column I map to SQL raises this warning in ADF. Kindly advise how we can resolve it.
screenshot:
The warning just indicates that the copy activity will truncate source column data when additional date/time information is found in a column value. There would not be any error in this case, but there might be data loss.
In your case, since the column value is 2022-09-13 12:53:28, it will be inserted into the datetime column without any truncation issue.
The following is a demonstration where I try to insert the following source data:
id,first_name,date
1,Wenona,2022-09-13 12:53:28
2,Erhard,2022-09-13 13:53:28
3,Imelda,2022-09-13 14:53:28
The copy activity runs successfully and inserts the data. The following is my target table data after inserting:
When I insert the following data instead, the fractional seconds are truncated to just two fractional-second digits, as shown below.
id,first_name,date
1,Wenona,2022-09-13 12:53:28.11111
2,Erhard,2022-09-13 13:53:28.11111
3,Imelda,2022-09-13 14:53:28.11111
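How much is truncated depends on the fractional-second precision of the target column. A minimal sketch of a hypothetical target table (the table and column names are assumptions); a datetime2 column with higher precision, up to datetime2(7), keeps more of the fractional seconds:
-- hypothetical target table; a datetime2(7) column preserves up to 7 fractional-second digits
CREATE TABLE dbo.students (
    id int,
    first_name varchar(50),
    [date] datetime2(7)
);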

ADF add ActivityRunID to Sink table

I have a sink table that I would like to populate with the ActivityRunID of the Copy Data iteration within the Until loop.
I understand that I cannot map the ActivityRunID within the Copy Data task until that task has completed. My Until loop looks like this:
Is there an easy way to populate my sink with the RunID once the Copy Data task has finished? I was thinking of populating the sink with a dummy GUID and then using a Lookup task to populate it in a subsequent task.
You can use the Lookup activity if you just want to pull back your dummy GUID.
Alternatively, you can generate a GUID using Data Factory dynamic expressions, store it in a variable with a Set Variable activity, and use that variable directly in later activities.
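A minimal sketch of that Set Variable approach (the variable name runGuid is assumed):
Set Variable value: @guid()
Later reference:    @variables('runGuid')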
If you want to use the Copy Data activity RunID, create a stored procedure that updates the value in the sink from an input parameter, and pass the activity RunID as that parameter from a Stored Procedure activity.
Parameter value: @activity('Copy data1').ActivityRunId
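A minimal sketch of such a stored procedure, assuming a hypothetical sink table dbo.SinkTable whose RunId column is left NULL by the Copy Data task:
CREATE PROCEDURE dbo.usp_SetRunId
    @RunId nvarchar(100)
AS
BEGIN
    -- stamp the rows loaded by this iteration with the Copy Data ActivityRunId
    UPDATE dbo.SinkTable
    SET RunId = @RunId
    WHERE RunId IS NULL;
END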

azure data factory how to fetch source distinct count and target distinct count for data validation

I am copying data from Azure SQL to Azure Storage. I need validation between source and target: the record count for each column in the source and destination, and also the distinct record count for each column in the source and destination.
The output must be in a table or file.
As I understand it, the source table and the target table are both treated as sources.
You could then use a Data Flow to achieve that (with queries like the sketch below):
Source 1: source table --> use a query to get the columnname, sourcerowcount and sourcedistinctcount.
Source 2: target table --> use a query to get the targetrowcount and targetdistinctcount.
Join: join the two source outputs.
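A minimal sketch of what such a source query might look like, assuming a hypothetical source table dbo.srcTable with a column col1 (repeat or UNION ALL one row per column to validate):
SELECT 'col1' AS columnname,
       COUNT(col1) AS sourcerowcount,
       COUNT(DISTINCT col1) AS sourcedistinctcount
FROM dbo.srcTable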
HTH.

How do I query Postgresql with IDs from a parquet file in a Data Factory pipeline

I have an Azure pipeline that moves data from one point to another in parquet files. I need to join in some data from a PostgreSQL database in an AWS tenancy, using a unique ID. I am using a data flow to create the unique ID I need by concatenating two separate columns. I am trying to create a where clause, e.g.
select * from tablename where unique_id in ('id1','id2','id3'...)
I can do a lookup query against the database, but I can't figure out how to build the list of IDs from the data flow output into a parameter that I can use in the select statement. I tried using a Set Variable activity and was going to put that into a ForEach, but the Set Variable doesn't like the output of the data flow (an object instead of an array): "The variable 'xxx' of type 'Array' cannot be initialized or updated with value of type 'Object'. The variable 'xxx' only supports values of types 'Array'." I've used a flatten to try to transform it to an array, but I think the sink operation is putting it back into JSON?
What is a workable approach to getting the IDs into a string that I can put into a lookup query?
Some notes:
The parquet file has a small number of unique IDs compared to the total unique IDs in the database.
If this were an Azure PostgreSQL database I could just use a join in the data flow, but the generic PostgreSQL driver isn't available in data flows. I can't copy the entire database over to Azure just to do the join, and I need the data flow in Azure for non-technical reasons.
Edit:
For clarity's sake, I am trying to replace local Python code that does the following:
import pandas as pd
query = "select * from mytable where id_number in "
df = pd.read_parquet("input_file.parquet")
# build the unique id by concatenating two columns
df['id_number'] = df.country_code + df.id
# append the ids as a SQL tuple literal and query the database over an existing connection
df_other_data = pd.read_sql(query + str(tuple(df.id_number)), conn)
I'd like to replace this locally executing code with ADF. In the ADF process, I have to replace the transformation of the IDs which seems easy enough if a couple of different ways. Once I have the IDs in the proper format in a column in a dataset, I can't figure out how to query a database that isn't supported by Data Flow and restrict it to only the IDs I need so I don't bring down the entire database.
ADF variables can only store simple types, so instead you can define an Array-type parameter in ADF and set a default value. ADF parameters support elements of any type, including complex JSON structures.
For example:
Define a JSON array:
[{"name": "Steve","id": "001","tt_1": 0,"tt_2": 4,"tt3_": 1},{"name": "Tom","id": "002","tt_1": 10,"tt_2": 8,"tt3_": 1}]
Define an Array-type parameter and set its default value:
This way we will not get any error.
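A minimal sketch of referencing such a parameter in later dynamic content (the parameter name ids is assumed):
@pipeline().parameters.ids[0].name
@pipeline().parameters.ids[1].id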