Custom logging in Azure Data Factory

I'm new to ADF and trying to build an Azure Data Flow Pipeline. I'm reading from a Snowflake data source and checking the data against several business rules. After each check, I'm writing the bad records to a csv file. Now, my requirement is that I need to create a log table which shows the business rule and the number of records that failed to pass that particular business rule. I've attached a screenshot of my ADF data flow as well as the structure of the table I'm trying to populate.
My idea was to create a stored proc that is called at the end of each business rule, so that a record is created in the database. However, I'm unable to add an SP from the data flow. I found that I can get the rows written to a sink from the pipeline, but I can't work out how to tie the sink name and the rows written together and run the stored procedure for every business rule.
A snapshot of my data flow:
The columns I want to populate:

In my data flow activity, sink1 and sink2 store the data that violates business rule 1 and business rule 2 respectively. I created a stored procedure that records the business rule and the failed-row count in the log table, and then used Execute Stored Procedure activities in ADF to insert the records. Below are the steps.
Table for the log file:
CREATE TABLE [dbo].[log_file](
    [BusinessRule] [varchar](50) NULL, -- business rule
    [count] [varchar](50) NULL        -- failed rows count
) ON [PRIMARY]
GO
Stored procedure for inserting records into the log file through Data Factory:
CREATE PROC [dbo].[usp_insert_log_file] (@BusinessRule varchar(100), @count varchar(10))
AS
BEGIN
    INSERT INTO log_file VALUES (@BusinessRule, @count)
END
The data flow activity has two sinks and is chained with Execute Stored Procedure activities.
The stored procedure has two parameters, BusinessRule and count.
In the Stored procedure 1 activity, enter the business rule corresponding to sink1 in the BusinessRule parameter, and for the count parameter pass sink1's rowsWritten value from the output of the data flow activity:
BusinessRule: 'Business_Rule_1'
Count:
@string(activity('Data flow1').output.runStatus.metrics.sink1.rowsWritten)
Similarly, in the Stored procedure 2 activity, enter the corresponding business rule and pass sink2's count value in the parameters:
BusinessRule: 'Business_Rule_2'
Count:
@string(activity('Data flow1').output.runStatus.metrics.sink2.rowsWritten)
In this way, we can insert data into the log file from the data flow activity using the Execute Stored Procedure activity.
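To make the wiring concrete, here is a small sketch (outside ADF, so names and the JSON shape are illustrative, inferred from the expressions above) of how the data flow activity's output maps each sink's rowsWritten to a (business rule, count) row for the log table:

```python
# Hypothetical data flow activity output, shaped the way the
# @string(activity('Data flow1').output.runStatus.metrics.sinkN.rowsWritten)
# expressions above assume.
activity_output = {
    "runStatus": {
        "metrics": {
            "sink1": {"rowsWritten": 12},
            "sink2": {"rowsWritten": 7},
        }
    }
}

# Illustrative mapping from sink name to business rule.
sink_to_rule = {"sink1": "Business_Rule_1", "sink2": "Business_Rule_2"}

def build_log_rows(output, mapping):
    """Return (BusinessRule, count) tuples, one per sink."""
    metrics = output["runStatus"]["metrics"]
    return [(rule, str(metrics[sink]["rowsWritten"]))
            for sink, rule in sorted(mapping.items())]

rows = build_log_rows(activity_output, sink_to_rule)
```

Each tuple in `rows` corresponds to one Execute Stored Procedure call against usp_insert_log_file.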


Azure Data Factory Copy Pipeline with Geography Data Type

I am trying to get a geography data type from a production DB to another DB on a nightly basis. I really wanted to leverage upsert as the write activity, but it seems that geography is not supported with this method. I was reading a similar post about bringing the data through ADF as well-known text (WKT) and then converting it, but I keep getting confused about what to do with the data once it is brought over as WKT. I would appreciate any advice, thank you.
I tried ADF pipelines and data flows. I tried to convert the data type once it was in the destination, but then I was not able to run the pipeline again.
I tried to upsert the data with the geography data type from one Azure SQL database to another using the Copy activity and got an error message.
Then I did the upsert using a data flow activity. Below are the steps.
A source table is used in the data flow:
CREATE TABLE SpatialTable
( id int ,
GeogCol1 geography,
GeogCol2 AS GeogCol1.STAsText() );
INSERT INTO SpatialTable (id,GeogCol1)
VALUES (1,geography::STGeomFromText('LINESTRING(-122.360 46.656, -122.343 46.656 )', 4326));
INSERT INTO SpatialTable (id,GeogCol1)
VALUES (2,geography::STGeomFromText('POLYGON((-122.357 47.653 , -122.348 47.649, -122.348 47.658, -122.358 47.658, -122.358 47.653))', 4326));
Then an Alter Row transformation is added, and in the Alter Row conditions, Upsert if isNull(id)==false() is given. (The sink table is upserted based on the column id.)
Then, in the sink, the dataset for the target table is given. In the sink settings, the update method Allow upsert is selected and the required key column is given (here, the column id).
When the pipeline is run for the first time, data is inserted into the target table.
When the pipeline is run a second time, after updating existing data and inserting new records into the source, the data is upserted correctly.
The source data is changed for id=1 and a new row is inserted with id=3.
The sink data reflects the changes made in the source.
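The key-based upsert semantics of the sink (update when the key already exists, insert otherwise) can be sketched outside ADF as well. This illustration uses sqlite3 purely as a stand-in for Azure SQL and carries the geography value as its WKT text, since the workaround above goes through well-known text anyway; table and column names mirror SpatialTable but are otherwise illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SpatialTable (id INTEGER PRIMARY KEY, GeogCol1 TEXT)")

def upsert(conn, rows):
    # INSERT ... ON CONFLICT mirrors the sink's "Allow upsert" keyed on id:
    # update if the key matches, else insert.
    conn.executemany(
        "INSERT INTO SpatialTable (id, GeogCol1) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET GeogCol1 = excluded.GeogCol1",
        rows,
    )

# First run: both rows are inserted.
upsert(conn, [(1, "LINESTRING(-122.360 46.656, -122.343 46.656)"),
              (2, "POINT(-122.348 47.649)")])

# Second run: id=1 changed, id=3 new -- id=1 is updated, id=3 inserted.
upsert(conn, [(1, "LINESTRING(0 0, 1 1)"),
              (3, "POINT(-122.349 47.651)")])

result = dict(conn.execute("SELECT id, GeogCol1 FROM SpatialTable ORDER BY id"))
```

After the second run, `result` holds three rows, with the id=1 geometry replaced, matching the behavior described above.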

ADF mapping data flow only inserting, never updating

I have an ADF data flow that will only insert. It never updates rows.
Below is a screenshot of the flow and the Alter Row task that sets the insert/update policies.
data flow
alter row task
There is a source table and a destination table.
There is a source table for new data.
A lookup is done against the key of the destination table.
Two columns are then generated: a hash of the source data and a hash of the destination data.
In the alter row task, the policies are as follows:
Insert: if the lookup found no matching id.
Update: if lookup found a matching id and the checksums do not match (i.e. user exists but data is different between the source and existing record).
Otherwise it should do nothing.
The Sink allows insert and updates:
Even so, on first run it inserts all records but on second run it inserts all the records again, even if they exist.
I think I am misunderstanding the process, so I'd appreciate any expertise or advice.
Thank you Joel Cochran for your valuable input. I repro'd the scenario and am posting it as an answer to help other community members.
If you are using the upsert method in the sink, add an Alter Row transformation with an Upsert if condition and write the expression for the upsert condition.
If you are using Insert and Update as your update methods in the sink, then in the Alter Row transformation use both Insert if and Update if conditions, so that data is inserted or updated in the sink according to the Alter Row conditions.
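The hash-compare decision described in the question can be sketched as plain code (names are illustrative): hash the source row and the matching destination row, then mark the row insert, update, or skip, exactly as the Alter Row conditions should.

```python
import hashlib

def row_hash(row):
    # Stable hash over the row's data columns, mirroring the two
    # generated hash columns in the flow.
    joined = "|".join(str(row[k]) for k in sorted(row))
    return hashlib.sha256(joined.encode()).hexdigest()

def alter_row_policy(source_row, dest_row):
    """Return 'insert', 'update', or 'skip' for one source row.

    dest_row is None when the lookup found no matching id."""
    if dest_row is None:
        return "insert"      # Insert if: lookup found no matching id
    if row_hash(source_row) != row_hash(dest_row):
        return "update"      # Update if: id matched but data differs
    return "skip"            # otherwise do nothing
```

If the second run still inserts everything, the usual culprit is that the lookup/condition never takes the first branch's complement, so checking these three cases individually is a quick sanity test.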

Call an Azure Cosmos DB UDF from a PowerShell script

I have a UDF in Cosmos DB; it takes a parameter and returns the documents that meet a condition based on that parameter.
Each document returned by this UDF has three fields:
Customer ID
Modified Date
Customer Status
I need this information in a SQL Server stored procedure in another database.
I am thinking of having a PowerShell script bring this data over from Cosmos DB, store it in a table local to the SQL Server database, and then use this table in the SP.
I am wondering whether my approach to fetching data from Cosmos DB into a SQL Server database is right, and if so, whether a Cosmos DB UDF can be executed from a PowerShell script and the result set returned by the UDF used.
Based on your description, maybe you could use Azure Data Factory.
Step 1: Follow the article to create a Copy activity.
Step 2: Configure the Cosmos DB source data:
SQL:
SELECT udf.adf(c.fields).CustomerID,
       udf.adf(c.fields).ModifiedDate,
       udf.adf(c.fields).CustomerStatus
FROM c
Then, please follow the steps from this doc:
Step 3: Configure your Sink dataset:
Step 4: Configure Sink section in copy activity as follows:
Step 5: In your database, define the table type with the same name as sqlWriterTableType. Note that the schema of the table type should be the same as the schema returned by your input data.
CREATE TYPE [dbo].[CsvType] AS TABLE(
    [ID] [varchar](256) NOT NULL,
    [Date] [varchar](256) NOT NULL,
    [Status] [varchar](256) NOT NULL
)
Step 6: In your database, define the stored procedure with the same name as sqlWriterStoredProcedureName. It handles input data from your specified source and merges it into the output table. Note that the parameter name of the stored procedure should be the same as the tableName defined in the dataset.
CREATE PROCEDURE convertCsv @ctest [dbo].[CsvType] READONLY
AS
BEGIN
    MERGE [dbo].[adf] AS target
    USING @ctest AS source
    ON (1=1)
    WHEN NOT MATCHED THEN
        INSERT (id, data, status)
        VALUES (source.ID, source.Date, source.Status);
END
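For completeness, the Cosmos DB side can also be reached from a script, which is closer to the PowerShell idea in the question: issue the UDF projection query through a client library and hand the rows to the stored procedure. A hedged Python sketch, assuming the azure-cosmos SDK's `query_items` call on a container client; a fake container stands in below so the call shape is visible without a live account.

```python
# Query text matching the UDF projection above; udf.adf is the UDF name
# used in this answer.
UDF_QUERY = (
    "SELECT udf.adf(c.fields).CustomerID, "
    "udf.adf(c.fields).ModifiedDate, "
    "udf.adf(c.fields).CustomerStatus FROM c"
)

def fetch_customers(container):
    # With the real SDK, `container` would be an azure.cosmos ContainerProxy;
    # query_items takes the query text and a cross-partition flag.
    return list(container.query_items(query=UDF_QUERY,
                                      enable_cross_partition_query=True))

class FakeContainer:
    """Stand-in for a Cosmos container client in this sketch."""
    def query_items(self, query, enable_cross_partition_query=False):
        assert "udf.adf" in query
        yield {"CustomerID": "C1", "ModifiedDate": "2024-01-01",
               "CustomerStatus": "Active"}

rows = fetch_customers(FakeContainer())
```

Each returned dict maps directly onto one row of the [CsvType] table type fed to the stored procedure.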

Spring Batch reader for temp table (create and insert) and stored procedure execution combination

I am creating a Spring Batch application to migrate data from a legacy Sybase database to CSV files that can be loaded into target systems.
I am facing a problem in designing the reader configuration.
Possible kinds of input for the reader:
1. A direct SQL query (JdbcCursorItemReader is suitable) - no issues
2. A stored procedure (StoredProcedureItemReader is suitable) - no issues
3. The sequence of steps below executed to get the input - my problem:
Create a temp table
Insert values into the temp table
Execute a stored procedure (reads input from the temp table, processes it, and writes output into the same temp table)
Read data from the temp table
I am blocked on requirement 3; kindly help me with a solution.
Note: I am doing Spring boot application with dynamic configuration for Spring Batch.
ItemReader<TreeMap<Integer, TableColumn>> itemReader = ReaderBuilder.getReader(sybaseDataSource, sybaseJdbcTemplate, workflowBean);
ItemProcessor<TreeMap<Integer, TableColumn>, TreeMap<Integer, TableColumn>> itemProcessor = ProcessorBuilder.getProcessor(workflowBean);
ItemWriter<TreeMap<Integer, TableColumn>> itemWriter = WriterBuilder.getWriter(workflowBean);
JobCompletionNotificationListener listener = new JobCompletionNotificationListener();

SimpleStepBuilder<TreeMap<Integer, TableColumn>, TreeMap<Integer, TableColumn>> stepBuilder = stepBuilderFactory
        .get(CommonJobEnum.SBTCH_JOB_STEP_COMMON_NAME.getValue()).allowStartIfComplete(true)
        .<TreeMap<Integer, TableColumn>, TreeMap<Integer, TableColumn>>chunk(10000).reader(itemReader);
if (itemProcessor != null) {
    stepBuilder.processor(itemProcessor);
}
Step step = stepBuilder.writer(itemWriter).build();

String jobName = workflowBean.getiMTWorkflowTemplate().getNameWflTemplate() + workflowBean.getIdWorkflow();
job = jobBuilderFactory.get(jobName).incrementer(new RunIdIncrementer()).listener(listener).flow(step).end().build();
jobLauncher.run(job, jobParameters);
'Sybase' was the name of a company (bought out by SAP several years ago). There were (at least) four different database products produced under the Sybase name: Adaptive Server Enterprise (ASE), SQL Anywhere, IQ, and Advantage DB.
It would help if you state which Sybase database product you're trying to extract data from.
Assuming you're talking about ASE ...
If all you need to do is pull data out of Sybase tables then why jump through all the hoops of writing SQL, procs, Spring code, etc? Or is this some sort of homework assignment (but even so, why go this route)?
Just use bcp (OS-level utility that comes with the Sybase dataserver) to pull the data from your Sybase tables. With a couple command line flags you can tell bcp to write the data to a delimited file.
I'm pretty sure you will have issues accessing, from within a stored procedure, a temporary table that was created outside the procedure, because the stored proc runs as the writer of the proc, not the executor. The temp table doesn't belong to, and isn't visible to, the writer of the proc, so the proc can't access it.
You could either create the temp table within the proc, or use a permanent table and either lock the table while you are using it for this, or add a key column that you pass into the proc so it only processes the data you have just passed in.
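The create-insert-procedure-read sequence from the question can be sketched in miniature. This uses Python with sqlite3 purely to show the sequence (not ASE syntax): all four steps share one connection, just as an ASE temp table is only visible to the session that created it, and the processing is scoped by a key passed in, as suggested above; table and key names are illustrative.

```python
import sqlite3

# One connection/session for all four steps.
conn = sqlite3.connect(":memory:")

# Step 1: create the temp table.
conn.execute("CREATE TEMP TABLE staging (batch_key TEXT, val INTEGER)")

# Step 2: insert values into the temp table.
conn.executemany("INSERT INTO staging VALUES (?, ?)",
                 [("run42", 1), ("run42", 2), ("other", 9)])

def process_batch(conn, batch_key):
    # Step 3: stand-in for the stored procedure -- it reads the temp
    # table, processes rows for the given key, and writes results back
    # into the same table.
    conn.execute("UPDATE staging SET val = val * 10 WHERE batch_key = ?",
                 (batch_key,))

process_batch(conn, "run42")

# Step 4: read the processed rows (what the ItemReader would consume).
out = [v for (v,) in conn.execute(
    "SELECT val FROM staging WHERE batch_key = ? ORDER BY val", ("run42",))]
```

In Spring Batch this whole sequence would typically live in a Tasklet step (or a reader backed by a single held connection) so the temp table survives across the steps.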

Triggering Code when a specific value is inserted into a table column in an Azure SQL Table

I am seeking suggestions on methods to trigger the running of code when a specific event occurs.
Basically I need to monitor all inserts into a table and compare a column value against a parameter set in another table.
For example, when a new record is added to the table and the column [Temperature] is greater than 30 (a value set in another table), send an alert email to notify of this situation.
You can create a trigger (special type of stored procedure) that is automatically executed after an insert happened. Documentation for triggers is here: https://technet.microsoft.com/en-us/library/ms189799(v=sql.120).aspx
You will not be able to send an email directly from Azure SQL Database, though.
Depending on how quickly you need the notification after the insert, you could insert into yet another table from within the trigger, query that new table periodically (e.g. using a script in Azure Automation), and keep the email logic outside the database.
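The trigger-plus-alert-table pattern suggested above can be sketched in miniature. This uses sqlite3 trigger syntax purely for illustration (Azure SQL uses CREATE TRIGGER ... AFTER INSERT with the inserted pseudo-table instead of NEW), with the threshold read from a parameter table and matching rows copied to an alerts table for an external process to poll; all table names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Readings (Id INTEGER PRIMARY KEY, Temperature REAL);
CREATE TABLE Thresholds (Name TEXT, Value REAL);
CREATE TABLE Alerts (ReadingId INTEGER, Temperature REAL);
INSERT INTO Thresholds VALUES ('MaxTemp', 30);

-- Fires after every insert; copies rows over the configured threshold
-- into Alerts, which an external job polls and emails about.
CREATE TRIGGER temp_alert AFTER INSERT ON Readings
WHEN NEW.Temperature > (SELECT Value FROM Thresholds WHERE Name = 'MaxTemp')
BEGIN
    INSERT INTO Alerts VALUES (NEW.Id, NEW.Temperature);
END;
""")

conn.executemany("INSERT INTO Readings (Temperature) VALUES (?)",
                 [(25.0,), (31.5,), (42.0,)])
alerts = list(conn.execute("SELECT Temperature FROM Alerts ORDER BY Temperature"))
```

Only the readings above 30 land in Alerts; the email logic stays outside the database, as the answer recommends.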