Stored Procedure as Source in Data Flow - azure-data-factory

I'm trying to execute a stored procedure that returns rows as output, but when I try it as the Data Flow source I get this error message:
DF-SYS-01 at Source 'source1':
com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near the keyword 'EXEC'.
My source uses the Query option and I'm trying to execute
"EXEC [UVREP].spFeedsProduct 'HH',-2"
Can't I use a stored procedure as a source in a Data Flow? I'm able to do the same thing in a Copy Data activity and it works fine. What am I doing wrong?

ADF Data Flow source can take queries or UDFs, but not sprocs.
https://learn.microsoft.com/en-us/azure/data-factory/connector-azure-sql-database#source-transformation
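If the proc's logic is really just a single SELECT (no temp tables or side effects), one option that follows from the "queries or UDFs" support is to rewrite it as an inline table-valued function and call it from the Query option. A minimal sketch, where the function body, the [UVREP].Product table and its columns are illustrative placeholders rather than the real proc:

    -- Hypothetical inline TVF standing in for spFeedsProduct's SELECT logic.
    CREATE FUNCTION [UVREP].tvfFeedsProduct (@FeedCode varchar(10), @DayOffset int)
    RETURNS TABLE
    AS
    RETURN
    (
        SELECT p.ProductId, p.ProductName, p.FeedCode
        FROM   [UVREP].Product AS p
        WHERE  p.FeedCode = @FeedCode
          AND  p.LoadDate >= DATEADD(day, @DayOffset, CAST(GETDATE() AS date))
    );
    GO

    -- The Data Flow source's Query option can then run a plain SELECT:
    SELECT * FROM [UVREP].tvfFeedsProduct('HH', -2);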
As Joel mentioned in the comments, you can use an ADF Stored Procedure activity in the pipeline to execute the sproc before your data flow and store the results in a table or staging file (Parquet/CSV) for the data flow source to read.
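If changing the proc isn't an option, a minimal sketch of that staging approach could look like the following, where [UVREP].stgFeedsProduct is a hypothetical staging table whose columns match the proc's result set and spStageFeedsProduct is a wrapper proc the Stored Procedure activity calls before the data flow runs:

    -- Wrapper proc executed by a Stored Procedure activity ahead of the data flow.
    CREATE PROCEDURE [UVREP].spStageFeedsProduct
    AS
    BEGIN
        SET NOCOUNT ON;
        TRUNCATE TABLE [UVREP].stgFeedsProduct;
        INSERT INTO [UVREP].stgFeedsProduct
        EXEC [UVREP].spFeedsProduct 'HH', -2;
    END
    GO

    -- The data flow source then simply reads:
    -- SELECT * FROM [UVREP].stgFeedsProduct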

Thanks MarkKromer and JoelCochran.
Instead of the stored procedure, I modified it to use views, with a pipeline that runs a Lookup and a Data Flow inside a ForEach loop. I have to copy about 12 tables to three different sinks.
Is there a better way?
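For reference, a minimal sketch of what one such view might look like; the view name, table and columns below are illustrative, not the actual logic that was in the stored procedure:

    -- One view per table to be copied; the data flow source points at the view.
    CREATE VIEW [UVREP].vwFeedsProduct
    AS
    SELECT p.ProductId, p.ProductName, p.FeedCode
    FROM   [UVREP].Product AS p
    WHERE  p.FeedCode = 'HH';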

Related

ADF Mapping Dataflow Temp Table issue inside SP call

I have a mapping data flow inside a ForEach activity which I'm using to copy several tables into ADLS; in the data flow's source, I call a stored procedure in my Synapse environment. In the SP, I create a small temp table to store some values which I later use when processing a query.
When I run the pipeline, I get an error on the mapping data flow: "SQLServerException: 111212: Operation cannot be performed within a transaction." If I remove the temp table and just do a simple select * from a small table, it returns the data fine; it's only when I bring back the temp table that I get the issue.
Have you ever seen this before, and is there a way around it?
If you go through the official MS docs, this error is very well documented.
Failed with an error: "SQLServerException: 111212; Operation cannot be performed within a transaction."
Symptoms
When you use the Azure SQL Database as a sink in the data flow to preview data, debug/trigger run and do other activities, you may find your job fails with the following error message:
{"StatusCode":"DFExecutorUserError","Message":"Job failed due to reason: at Sink 'sink': shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerException: 111212;Operation cannot be performed within a transaction.","Details":"at Sink 'sink': shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerException: 111212;Operation cannot be performed within a transaction."}
Cause
The error "111212;Operation cannot be performed within a transaction." only occurs in a Synapse dedicated SQL pool, but you have mistakenly used Azure SQL Database as the connector instead.
Recommendation
Confirm whether your SQL database is actually a Synapse dedicated SQL pool. If so, use Azure Synapse Analytics as the connector instead.
So after running some tests around this issue, it seems that mapping data flows do not like temp tables when calling my stored procedure.
The way I ended up fixing this was to use a CTE instead of the temp table, which, believe it or not, runs a bit faster than the temp table did.
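For anyone hitting the same error, a hedged sketch of the kind of change involved (the table and column names are placeholders, not the actual proc):

    -- Before: a temp table inside the proc trips the "within a transaction" error
    -- when the proc is called from a data flow source.
    -- CREATE TABLE #LookupValues (KeyValue int, SomeDate date);
    -- INSERT INTO #LookupValues (KeyValue, SomeDate) VALUES (1, '2021-01-01');
    -- SELECT t.* FROM dbo.SomeTable AS t
    -- JOIN #LookupValues AS lv ON t.KeyValue = lv.KeyValue;

    -- After: the same values expressed as a CTE, which the data flow source accepts.
    WITH LookupValues AS
    (
        SELECT 1 AS KeyValue, CAST('2021-01-01' AS date) AS SomeDate
    )
    SELECT t.*
    FROM   dbo.SomeTable AS t
    JOIN   LookupValues  AS lv ON t.KeyValue = lv.KeyValue;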
@KarthikBhyresh I looked at that article before, but it wasn't an issue with the sink: I was using a Synapse LS as my source and Data Lake Storage as my sink, so I knew from the beginning that it did not apply to my issue, even though it was the same error number.

ADF: How do I clear a table in SQL?

I have a pipeline that ingests data from Kusto, does some simple transformation, and flows the data to SQL. It will be run once per day, and needs to clear the sink tables in SQL. I thought this would be straightforward (and probably is) but I can't figure out how to do it. Thanks for any assistance!
As @wBob said, if you are using a Copy activity in ADF, you can enter TRUNCATE TABLE <your-table-name> as the Pre-copy script. It will execute that T-SQL just before the copy runs.
Alternatively, you can write a stored procedure that runs prior to the transformation and deletes your staging data:
Stored procedure -> do transformation
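A minimal sketch of such a proc, with placeholder table names, that a Stored Procedure activity can call before the transformation step:

    -- Clears the sink/staging tables ahead of the daily load.
    CREATE PROCEDURE dbo.spClearSinkTables
    AS
    BEGIN
        SET NOCOUNT ON;
        TRUNCATE TABLE dbo.SinkTable1;   -- replace with your actual sink tables
        TRUNCATE TABLE dbo.SinkTable2;
    END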

Azure Data Factory V2 - Calling a stored procedure that returns multiple result set

I want to create an ADF v2 pipeline that calls a stored procedure in Azure SQL Database. The stored procedure has input parameters and returns multiple result sets (around 3). We need to extract them, loading either to Blob storage as 4 different files or into tables.
Is there a way to do this in a pipeline?
In SSIS there is the option to use a Script Component to extract them: https://www.timmitchell.net/post/2015/04/27/the-ssis-object-variable-and-multiple-result-sets/
Looking for suggestions in Data Factory.
You cannot easily accomplish that in Azure Data Factory (ADF), as the Stored Procedure activity does not support result sets at all and the Copy activity does not support multiple result sets. However, with a few small changes you could get the same outcome. You have a couple of options:
1. If the code and SSIS package already exist and you want to minimise your refactoring, you could host the package in ADF via the SSIS integration runtime (SSIS-IR).
2. You could possibly accomplish this with an Azure Function, which is roughly equivalent to an SSIS Script Task, but that seems like a bit of a waste of time to me. It's an unproven pattern and you have simpler options, such as:
3. Break the stored proc into parts: have it process its data but not return any result sets, altering the proc to write the three result sets to tables instead. Then have multiple Copy activities run in parallel and copy that data to blob storage after the main Stored Procedure activity has finished.
It's also possible to trick the Lookup activity into running stored procedures for you, but the output is limited to 5,000 rows and you can't pipe it into a Copy activity afterwards. I would recommend option 3, which will get you the same outcome with only a few changes to your proc.
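To make option 3 concrete, here is a hedged sketch; the procedure, tables and columns are illustrative, the point being that each result set is written to its own table for a Copy activity to pick up afterwards:

    ALTER PROCEDURE dbo.spExtractForBlob
        @RunDate date
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Result set 1: persisted to a table instead of being returned to the caller.
        TRUNCATE TABLE dbo.Extract_Customers;
        INSERT INTO dbo.Extract_Customers (CustomerId, CustomerName)
        SELECT CustomerId, CustomerName
        FROM   dbo.Customers
        WHERE  ModifiedDate >= @RunDate;

        -- Result set 2
        TRUNCATE TABLE dbo.Extract_Orders;
        INSERT INTO dbo.Extract_Orders (OrderId, CustomerId, OrderDate)
        SELECT OrderId, CustomerId, OrderDate
        FROM   dbo.Orders
        WHERE  OrderDate >= @RunDate;

        -- Result set 3
        TRUNCATE TABLE dbo.Extract_OrderLines;
        INSERT INTO dbo.Extract_OrderLines (OrderId, LineNumber, Quantity)
        SELECT OrderId, LineNumber, Quantity
        FROM   dbo.OrderLines
        WHERE  ModifiedDate >= @RunDate;
    END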

How to Update Table in Snowflake using Azure Data Factory

I have two tables in Snowflake named table1 and table2. Table1 is the source table, which contains incremental data, and table2 is the target table.
So my use case is that I have to take data from table1 and update table2, but this process has to be done using Azure Data Factory.
I tried to create a data flow in ADF, but it didn't allow me to connect to Snowflake directly as it is not in the supported sources list; the native Snowflake connector only supports the Copy activity. So as a workaround, I first created a Copy activity which copies the data from Snowflake to Azure Blob storage. Then I used the Azure Blob as the source for a data flow to build my SCD1 implementation and saved the output as CSV files.
Now my question is how I should update the data in the target table2, because if I directly use a Copy activity to load the CSV files into Snowflake, it will result in duplicate records on the Snowflake side. For instance, let's say table2 contains the row
id,name,age,data
1234,kristopher,24,somedata
and table1 contains
id,name,age,data
1234,kristopher,24,some-new-data
So now I have the table1 data in CSV, which has to be loaded into Snowflake. If I load it directly, the result looks something like this:
id,name,age,data
1234,kristopher,24,somedata
1234,kristopher,24,some-new-data
But I only need
1234,kristopher,24,some-new-data
Let me know if some more explanation is required. I am new to Azure Data Factory and Snowflake as well.
Thanks
As you have observed, the ADF Data Flows currently don't support Snowflake datasets as a source.
You could theoretically follow this design pattern, but it seems like a lot of work for the requirement you have described. An alternative would be to go down the Azure Function route, but again I would trade off the requirement against the effort required.
If it didn't have to be in ADF, then a quick approach would be to use a Snowflake Task to schedule some SQL to manage the SCD behavior for you.
I hope this helps.
Best regards,
Dan.
You can put your logic in a Snowflake stored procedure, then execute your stored proc from ADF.
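Whichever route you take, the SCD1 update itself usually boils down to a Snowflake MERGE, whether it runs inside a scheduled Task or inside the stored procedure that ADF calls. A hedged sketch against the tables from the question, assuming id is the key and that table1 (or a transient table loaded from the staged CSV files) holds the incremental rows:

    -- Upsert incremental rows into the target: matched ids are overwritten, new ids inserted.
    MERGE INTO table2 AS tgt
    USING table1 AS src
        ON tgt.id = src.id
    WHEN MATCHED THEN UPDATE SET
        tgt.name = src.name,
        tgt.age  = src.age,
        tgt.data = src.data
    WHEN NOT MATCHED THEN INSERT (id, name, age, data)
        VALUES (src.id, src.name, src.age, src.data);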

How to get max of a given column from ADF Copy Data activity

I have a Copy Data activity with an on-premises SQL Server as the source and ADLS Gen2 as the sink. There is a control table to pick up the tableName, watermarkDateColumn and the watermarkDatetime used to pull incremental data from the source database.
After the data is pulled/loaded into the sink, I want to get the max of the watermarkDateColumn in my dataset. Can it be obtained from @activity('copyActivity1').output?
I'm not allowed to use an extra Lookup activity to query the source table for max(watermarkDateColumn) in the pipeline.
The Copy activity can only be used for data movement, not for any aggregation, so @activity('copyActivity1').output won't help. Since you said you can't use a Lookup activity, I'm afraid your requirement isn't achievable as things stand.
If you prefer not to use additional activities, I suggest using a Data Flow activity instead, which is more flexible; there is a built-in Aggregate transformation in data flows.