I have a UDF in Cosmos DB that takes a parameter and returns the documents that meet a condition based on that parameter.
Each document returned by this UDF has 3 fields:
Customer ID
Modified Date
Customer Status
I need this information in a SQL Server SP present in another database.
I am thinking of having a PowerShell script bring this data from Cosmos DB, store it in a table local to the SQL Server database, and then use this table in the SP.
I am wondering whether this approach to fetching data from Cosmos DB into a SQL Server database is right, and if so, whether a Cosmos DB UDF can be executed from a PowerShell script so that the result set it returns can be used.
Based on your description, maybe you could use Azure Data Factory.
Step 1: Follow the article to create a Copy activity.
Step 2: Configure the Cosmos DB source data with the following SQL:
SELECT udf.adf(c.fields).CustomerID,
       udf.adf(c.fields).ModifiedDate,
       udf.adf(c.fields).CustomerStatus
FROM c
Then, please follow the steps from this doc:
Step 3: Configure your Sink dataset:
Step 4: Configure the Sink section in the copy activity as follows:
Step 5: In your database, define the table type with the same name as sqlWriterTableType. Note that the schema of the table type should be the same as the schema returned by your input data.
CREATE TYPE [dbo].[CsvType] AS TABLE(
    [ID] [varchar](256) NOT NULL,
    [Date] [varchar](256) NOT NULL,
    [Status] [varchar](256) NOT NULL
)
Step 6: In your database, define the stored procedure with the same name as sqlWriterStoredProcedureName. It handles input data from your specified source and merges it into the output table. Note that the parameter name of the stored procedure should be the same as the "tableName" defined in the dataset.
CREATE PROCEDURE convertCsv @ctest [dbo].[CsvType] READONLY
AS
BEGIN
    MERGE [dbo].[adf] AS target
    USING @ctest AS source
    ON (1 = 1)
    WHEN NOT MATCHED THEN
        INSERT (id, data, status)
        VALUES (source.ID, source.Date, source.Status);
END
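The MERGE above assumes a target table [dbo].[adf] whose columns match the INSERT list. A minimal sketch of that table (column names are taken from the INSERT, the types are an assumption based on the table type defined in Step 5):

-- Hypothetical target table for the MERGE above; adjust names and types to your schema.
CREATE TABLE [dbo].[adf](
    [id]     [varchar](256) NOT NULL,  -- Customer ID
    [data]   [varchar](256) NULL,      -- Modified Date
    [status] [varchar](256) NULL       -- Customer Status
)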
Related
I am trying to get a geography data type from a production DB to another DB on a nightly basis. I really wanted to leverage upsert as the write activity, but it seems that geography is not supported with this method. I was reading a similar post about bringing the data through ADF as well-known text (WKT) and then converting it, but I keep getting confused about what to do with the data once it is brought over as well-known text. I would appreciate any advice, thank you.
I tried to utilize ADF pipelines and data flows. I tried to convert the data type once it was in the destination, but then I was not able to run the pipeline again.
I tried to upsert data with the geography data type from one Azure SQL database to another using a copy activity and got an error message.
Then, I did the upsert using a dataflow activity. Below are the steps.
A source table is taken in the dataflow as in the image below.
CREATE TABLE SpatialTable
( id int ,
GeogCol1 geography,
GeogCol2 AS GeogCol1.STAsText() );
INSERT INTO SpatialTable (id,GeogCol1)
VALUES (1,geography::STGeomFromText('LINESTRING(-122.360 46.656, -122.343 46.656 )', 4326));
INSERT INTO SpatialTable (id,GeogCol1)
VALUES (2,geography::STGeomFromText('POLYGON((-122.357 47.653 , -122.348 47.649, -122.348 47.658, -122.358 47.658, -122.358 47.653))', 4326));
Then an Alter Row transformation is added, and in the Alter Row conditions, Upsert if isNull(id)==false() is given. (Based on the column id, the sink table is upserted.)
Then, in the Sink, the dataset for the target table is given. In the sink settings, the Update method is set to Allow Upsert and the required Key column is given. (Here the column id is selected.)
When the pipeline is run for the first time, data is inserted into the target table.
When the pipeline is run a second time, after updating existing data and inserting new records into the source, the data is upserted correctly.
Source data is changed for id=1 and a new row is inserted with id=3.
The sink data reflects the changes done in the source.
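If the data lands in the other database as well-known text (as produced by the GeogCol2 computed column above), it can be rebuilt as a geography column on the sink side. A minimal sketch, assuming a hypothetical target table where the WKT is staged in a text column and SRID 4326 is used:

-- Hypothetical target: WKT is landed as text by the pipeline, then converted back to geography.
CREATE TABLE SpatialTarget
( id int PRIMARY KEY,
  GeogWkt nvarchar(max) NULL,  -- well-known text copied over by the pipeline
  GeogCol1 geography NULL );   -- rebuilt spatial column

-- Run after each load to convert the landed WKT back into the geography column.
UPDATE SpatialTarget
SET GeogCol1 = geography::STGeomFromText(GeogWkt, 4326)
WHERE GeogWkt IS NOT NULL;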
I'm new to ADF and trying to build an Azure Data Flow Pipeline. I'm reading from a Snowflake data source and checking the data against several business rules. After each check, I'm writing the bad records to a csv file. Now, my requirement is that I need to create a log table which shows the business rule and the number of records that failed to pass that particular business rule. I've attached a screenshot of my ADF data flow as well as the structure of the table I'm trying to populate.
My idea was to create a stored proc that will be called at the end of each business rule, so that a record is created in the database. However, I'm unable to add an SP from the data flow. I found that I can get the rows written to a sink from the pipeline. However, I am not sure how I can tie the sink name and the rows written together and iterate the stored procedure for all the business rules.
Snapshot of how my data flow looks
The columns that I want to populate
I have used sink1 and sink2 in my dataflow activity for storing the data that violates business rule 1 and rule 2 respectively. I've created a stored procedure for recording the business rule and the failed-row count in the log file. Then the Execute Stored Procedure activity in ADF is used and records are inserted into the log file. Below are the steps.
Table for log-file.
CREATE TABLE [dbo].[log_file](
    [BusinessRule] [varchar](50) NULL, -- business rule
    [count] [varchar](50) NULL         -- failed rows count
) ON [PRIMARY]
GO
Stored procedure for inserting records into the log file through Data Factory.
CREATE PROC [dbo].[usp_insert_log_file] (@BusinessRule varchar(100), @count varchar(10))
AS
BEGIN
    INSERT INTO log_file VALUES (@BusinessRule, @count)
END
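For reference, the call that each Stored Procedure activity ends up issuing is equivalent to something like the following (the values shown are only an illustration; in ADF the count comes from the data flow's rowsWritten metric):

-- Example invocation of the logging procedure defined above.
EXEC [dbo].[usp_insert_log_file] @BusinessRule = 'Business_Rule_1', @count = '42';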
The dataflow activity has two sinks and is chained with Execute Stored Procedure activities.
The stored procedure has two parameters, BusinessRule and count.
In the Stored procedure1 activity, enter the corresponding business rule for sink1 in the BusinessRule field, and for the count field, pass sink1's rowsWritten value from the output of the data flow activity.
BusinessRule: 'Business_Rule_1'
Count:
@string(activity('Data flow1').output.runStatus.metrics.sink1.rowsWritten)
Similarly, in the Stored procedure2 activity, pass sink2's count value and enter the corresponding business rule in the parameters:
BusinessRule: 'Business_Rule_2'
Count:
@string(activity('Data flow1').output.runStatus.metrics.sink2.rowsWritten)
In this way, we can insert data into the log file from the data flow activity using the Execute Stored Procedure activity.
I have an Azure Data Factory copy activity which loads parquet files into Azure Synapse. The sink is configured as shown below:
After data loading completed, I had a staging table structure like this:
Then I create a temp table based on the staging one, and it had been working fine until today, when newly created tables suddenly received the nvarchar(max) type instead of nvarchar(4000):
Temp table creation now fails with the obvious error:
Column 'currency_abbreviation' has a data type that cannot participate in a columnstore index.'
Why has the auto-created table definition changed, and how can I return to the "normal" behavior without nvarchar(max) columns?
I've got exactly the same problem! I'm using a data factory to read CSV files into my Azure data warehouse, and this used to result in nvarchar(4000) columns, but now they are all nvarchar(max). I also get the error
Column xxx has a data type that cannot participate in a columnstore index.
My solution for now is to change my SQL code and use a CAST to change the formats, but there must be a setting in the data factory to get the former results back...
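A minimal sketch of that CAST workaround, assuming the temp table is built with CTAS from a hypothetical staging table stg.currency whose currency_abbreviation column now arrives as nvarchar(max):

-- Hypothetical names; casting nvarchar(max) down means the default clustered
-- columnstore index of the new table is valid again.
CREATE TABLE #currency_tmp
WITH ( DISTRIBUTION = ROUND_ROBIN )
AS
SELECT CAST(currency_abbreviation AS nvarchar(4000)) AS currency_abbreviation
FROM stg.currency;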
I have a U-SQL managed table that contains schematized structured data.
CREATE TABLE [AdlaDb].[dbo].[User]
(
    UserGuid Guid,
    Postcode string,
    Age int?,
    DateOfBirth DateTime?
)
And an Azure SQL Database table.
CREATE TABLE [SqlDb].[dbo].[User]
(
    UserGuid uniqueidentifier NOT NULL,
    Postcode varchar(15) NULL,
    Age int NULL,
    DateOfBirth Date NULL
)
I would like to transfer data from U-SQL managed table to Azure SQLDB table without losing the data types.
I'm using Azure Data Factory, and it seems I cannot:
directly query the U-SQL managed table as an input dataset for data factory
do a federated write query to Azure SQLDB
Hence I'm having an intermediate step where I copy from the U-SQL managed table to Azure Blob storage and then move the data to the Azure SQL DB table. Doing this, I lose the data types and have to do type conversions/transformations again before inserting.
Is there any better way to transfer data from U-SQL managed table to Azure SQL Database table without losing data type? Or am I missing something?
At this point you have the following option:
Export the U-SQL table into an intermediate format (e.g., CSV) in ADLS or blob storage.
Use ADF to move the file into Azure SQL DB.
I know that the ADF team has a work item to do this for you. I will ask them to reply to this thread as well.
Directly writing into a table from a U-SQL script has a lot of challenges due to the fault-tolerant retry and scale-out processing in U-SQL. This makes atomic writing in parallel into a transacted store a bit more complex (see for example http://www.vldb.org/conf/1996/P460.PDF).
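As a rough illustration of the export step, a U-SQL script can read the managed table and write it out for ADF to pick up (the output path and outputter settings here are assumptions, not part of the original answer):

// Read the managed table and write it out as a CSV file for ADF to copy.
@users =
    SELECT UserGuid, Postcode, Age, DateOfBirth
    FROM [AdlaDb].[dbo].[User];

OUTPUT @users
TO "/output/user.csv"
USING Outputters.Csv(outputHeader : true);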
There is now another option to transfer data from a U-SQL managed table to an Azure SQL Database table:
Write out the data from the U-SQL managed table, or from a U-SQL script, to Azure Blob Storage as a text file (.csv, .txt, etc.)
Then make use of the public preview feature in Azure SQL Database, BULK INSERT, and wrap it in a stored procedure
Add a Stored Procedure activity in Azure Data Factory to schedule it
Note: There is one thing to be aware of when creating the DATABASE SCOPED CREDENTIAL; refer to this Stack Overflow question
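A minimal sketch of that BULK INSERT wrapper, assuming a hypothetical external data source and file name (the storage URL, credential, and CSV layout are illustrative only, and the database scoped credential from the note above must already exist):

-- Hypothetical external data source pointing at the blob container that holds the exported file.
CREATE EXTERNAL DATA SOURCE MyBlobSource
WITH ( TYPE = BLOB_STORAGE,
       LOCATION = 'https://myaccount.blob.core.windows.net/mycontainer',
       CREDENTIAL = MyBlobCredential );
GO

-- Stored procedure that ADF's Stored Procedure activity can call on a schedule.
CREATE PROCEDURE dbo.usp_LoadUserFromBlob
AS
BEGIN
    BULK INSERT dbo.[User]
    FROM 'user.csv'
    WITH ( DATA_SOURCE = 'MyBlobSource',
           FORMAT = 'CSV',
           FIRSTROW = 2 );   -- skip the header row
END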
I have a stored proc in Db2 (LUW) on schema A which looks like the below.
CREATE PROCEDURE A.GET_TOTAL (IN ID CHARACTER(23))
BEGIN
DECLARE CURSOR1 CURSOR WITH HOLD WITH RETURN TO CLIENT FOR
SELECT * FROM B.employee e where e.id=ID
END
The given sproc, which exists on schema "A", runs a query on another schema "B". This schema name B may change based on application logic. How can I pass the schema name as a parameter to this sproc?
First, I do not think that stored procedure works because the select statement is not defined in a cursor or a prepared statement.
If you want to execute dynamic SQL in a stored procedure, you need to define it in a statement variable, then PREPARE it and EXECUTE it (or open a cursor over it).
Let's suppose you pass the schema name as a parameter; if you want to change the schema, you can dynamically execute "SET CURRENT SCHEMA" or concatenate the schema name into your query.
For more information: http://www.toadworld.com/platforms/ibmdb2/w/wiki/7461.prepare-execute-and-parameter-markers.aspx
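A rough sketch of the concatenation approach in SQL PL (parameter names and lengths are illustrative, and the procedure is assumed to gain a second parameter for the schema):

-- Hypothetical rewrite: the query text is built dynamically, prepared, and opened with a parameter marker.
CREATE PROCEDURE A.GET_TOTAL (IN P_ID CHARACTER(23), IN P_SCHEMA VARCHAR(128))
BEGIN
  DECLARE V_SQL VARCHAR(512);
  DECLARE CURSOR1 CURSOR WITH HOLD WITH RETURN TO CLIENT FOR S1;
  SET V_SQL = 'SELECT * FROM ' || P_SCHEMA || '.employee e WHERE e.id = ?';
  PREPARE S1 FROM V_SQL;
  OPEN CURSOR1 USING P_ID;
END

When creating this through the CLP, a non-default statement terminator is needed because the body itself contains semicolons.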