Escaping single quotes in ADF dataflow - azure-data-factory

I am executing this query:
select * from schema.country where country_code='abc' and country='INDIA'
inside an ADF dataflow. I am getting this query from the database, so I don't want to hardcode it inside the data flow. All the data flow inputs are adding single quotes while executing, and the dataflow is failing with a parsing error.
How can I escape single quotes?

You can pass your query in double quotes ("") in the dataflow to escape the single quotes.
Please go through the below three approaches based on how the query is used in the dataflow.
This is my sample data from the database:
If you are using Query in the source of the dataflow, you can give the below expression in the Expression builder.
"select * from [dbo].[country] where country_code='abc' and country_name='INDIA'"
Output Data preview:
If you want to get your query from your database, then create a stored procedure for the query and access it inside the dataflow.
My stored procedure in the database:
create or alter procedure dbo.proc2
as
begin
select * from [dbo].[country] where country_code='abc' and country_name='INDIA'
end;
Use this stored procedure in the dataflow like below. You will get the same output as above.
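As a rough sketch only (the option labels can vary slightly between ADF versions), the source settings of the dataflow would point at the procedure created above:
Input: Stored procedure
Stored procedure name: dbo.proc2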
If you are using a pipeline parameter to pass the query to the dataflow, then create a parameter in the dataflow with a default value. Give your query in double quotes in the pipeline parameters expression and check the Expression check box after this.
Then give $parameter1 in the expression builder of the query and execute. Please check this SO thread to learn more about it.
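As a minimal sketch of this approach (parameter1 is just an assumed name and the default value shown is a placeholder):
Dataflow parameter: parameter1, type string, default value "select 1 as placeholder"
Value passed from the pipeline (dynamic content, Expression box checked):
"select * from [dbo].[country] where country_code='abc' and country_name='INDIA'"
Source query expression in the dataflow:
$parameter1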
My csv in sink after pipeline execution in this process:

Related

Azure Data Factory Data Flow: pipeline expression issue for timestamp string

I am working on ADF from Snowflake to ADLS using a data flow.
I am using this pipeline expression:
@concat('SELECT * FROM mySchema.myTable WHERE loadDate >= ', '''', '2022-07-01', '''')
It failed with the error message:
Operation on target Copy Data failed:
{"StatusCode":"DF-Executor-StoreIsNotDefined","Message":"Job failed
due to reason: The store configuration is not defined. This error is
potentially caused by invalid parameter assignment in the
pipeline.","Details":""}
(It worked when I directly ran the query below in Snowflake:
SELECT * FROM mySchema.myTable WHERE loadDate >= '2022-07-01')
But when I used the pipeline expression below (removing the Where clause):
@concat('SELECT * FROM mySchema.myTable')
It worked.
Or if I used the pipeline expression below (using a different Where clause without a timestamp comparison):
@concat('SELECT * FROM mySchema.myTable WHERE loadDate is not null')
This also worked.
So, my question is: why did the first expression fail? How should I fix it?
Thanks in advance.
You can give the date in the string itself, as suggested by @Sally Dabbah.
This is my repro with Azure SQL Database for your reference:
Give the query in double quotes and the date in single quotes in the dynamic content pipeline expression, like below.
"select * from dbo.myschema where loaddate >= '2022-07-01'"
Then give this parameter as $query in the dataflow expression.
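For the Snowflake query in the question, a minimal sketch of the same pattern (assuming the dataflow parameter is named query) is to set the parameter's value in the pipeline to the double-quoted string below and use $query as the source query expression:
"SELECT * FROM mySchema.myTable WHERE loadDate >= '2022-07-01'"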
Pipeline succeeded:
Data in the csv file in the sink (to single file):

Azure data factory: pass where clause as a string to dynamic query with quotes

I have a Lookup that retrieves a few records from an MS SQL table containing the schema, table name and a whole where clause. These values are passed to a Copy Data activity (within a ForEach). In the Copy Data activity I use a dynamic query statement like:
@concat('select a.*, current_date as crt_tms from ',item().shm_nam,'.',item().tab_nam,
item().where_clause )
This construction works fine without the where_clause or with a where clause with an integer. But it goes wrong with strings like:
'a where a.CODSYSBRN ='XXX' ;'
It's about the quote (').
How can I pass it through?
I know that the where clause as a fixed string in the dynamic query works when I double the single quotes (to escape them):
'a where a.CODSYSBRN =''XXX'' ;'
The point is I need the where clause to be completely dynamic because it differs per table.
Whatever I try, I get this kind of error:
Syntax error or access violation;257 sql syntax error: incorrect syntax near "where a"
PS: I also tested this, but with the same result:
select a.*, current_date as crt_tms from @{item().shm_nam}.@{item().tab_nam} a @{item().where_clause}
As you have mentioned, you are getting the whole where clause from the lookup table, so the stored where clause must already include the column values quoted appropriately for string and integer types.
Example lookup table:
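For illustration only (these rows and values are made up), the lookup table could look like this, with string values already wrapped in single quotes inside the stored where clause:
schma_name | table_name | where_clause
dbo        | customer   | where CODSYSBRN = 'XXX'
dbo        | orders     | where order_id = 100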
In your copy activity, you can use the concat() function, as you were already doing, to combine static values and parameters.
@concat('select * from ',item().schma_name,'.',item().table_name,' ',item().where_clause)
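With the first sample row above, this expression would evaluate to a query like:
select * from dbo.customer where CODSYSBRN = 'XXX'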
For debugging purposes, I have added the expression in a Set variable activity to see its value.
Iteration1:
Iteration2:

How to fetch Select statement from variable for Copy data activity in Azure Data Factory?

I have a source query in a Copy data activity in Azure Data Factory. Currently it is a static query, but I would like to get the entire statement from the database and use that in the activity's Query.
When I try to pass the SQL statement as a variable, there is an issue.
Lookup SQL Query for fetching Select statement:
SELECT [source_query] FROM [SalesLT].[configuration] WHERE [Id] = '1'
The query returns the following, which is set as a variable:
SELECT Count(CustomerId) Source_Rows FROM [SalesLT].[Customer]
Error with Copy data:
Type=System.Data.SqlClient.SqlException,Message=Incorrect syntax near ')'.,Source=.Net SqlClient Data
You can't directly use the pipeline parameter in Query; instead, you need to use the @concat function to concatenate your SQL query with parameters. Please refer to the example below:
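As a rough sketch only (table_name here is a hypothetical pipeline parameter, not something from the question), concatenating a query with a parameter in the Query field's dynamic content looks like this:
@concat('SELECT Count(CustomerId) Source_Rows FROM ', pipeline().parameters.table_name)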
I suggest you go through this tutorial from @TechBrothersIT to get more clarification on the related issue.

Datastage multiple parametric (conditionned) query execution

I would like to create a job that, based on some values in Table A, executes a Select query on Table B where the WHERE condition must be parametric.
For example: I have 10 columns in A with 100 rows filled. 9 of my columns can be nullable, so I have to create a query that checks whether a value is null; if it is null, it must NOT be considered a search criterion in the Select statement.
I thought about using a SPARSE lookup where I'd pass a string built by concatenating the search parameters that are not null, but the job fails because you need to map the columns.
I even created a file with the queries as strings, then looped over the file and passed each string as a variable to the DB2 connector stage. It works... but I have more than 10000 rows, which means 10000 queries... not that fast.
Thanks for your help.
PS: I'm new to this stuff :D
What you can do is use the Before SQL option at your source/target stage. Your job will have at least two stages: one source DB2 stage with a Copy, Sequential File or Peek stage as the target, or a Row Generator and a target DB2 connector.
In your input DB2 connector you can pass your SQL script as a parameter into Before SQL, provided it is generated in advance, and supply it as the value of the connector's Before SQL property. Your actual SQL statement will use a "dummy" script such as "select current date from sysibm.sysdummy1" to complete the execution.
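A rough sketch of that arrangement in the DB2 connector, assuming the pre-generated query is supplied through a job parameter (the parameter name here is made up):
Before SQL statement: #pGeneratedQuery#
SQL statement: select current date from sysibm.sysdummy1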
Hope it makes sense.

Is there a way to run a single PSQL Aggregate Function without hitting a database table?

For example, I'd like to run:
REGEXP_REPLACE("What's My Name?", "[^a-z0-9_\-]", "-");
and simply see what it returns, instead of doing a search against a DB Table. I tried to run it in the CLI and got
ERROR: syntax error at or near "REGEXP_REPLACE"
LINE 1: REGEXP_REPLACE("What's My Name?", "[^a-z0-9_\-]", "-")
(I'm trying to be generic; I'd like to be able to use this for other PSQL Aggregate Functions as well.)
Remember, this is SQL, so every output you get is a relation. Hence to calculate the result of a function, you need to run SELECT to retrieve the function's value.
Unfortunately, in many DBs, SELECT requires a table. In Oracle land, there's dual to work around this problem:
SELECT REGEXP_REPLACE('What''s My Name?', '[^a-z0-9_\-]', '-') FROM dual;
PostgreSQL, however, allows you to execute a SELECT query without having to specify a table:
SELECT REGEXP_REPLACE('What''s My Name?', '[^a-z0-9_\-]', '-');
Note that the string quote in SQL is ', not ". PostgreSQL uses double quotes to quote identifiers.
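For example, both kinds of quoting can appear in the same statement; the string literal doubles its inner quote, and the column alias is a quoted identifier:
SELECT 'What''s My Name?' AS "result";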
Side note: Not every function is an aggregate. An aggregate is a function that combines multiple values into a single output value. REGEXP_REPLACE() is just a normal string function.
Is there a way to run a single PSQL Aggregate Function without hitting a database table?
Yes, in PostgreSQL you can write a SELECT statement without a FROM part at all.
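And if you want to feed an aggregate some values without touching a stored table, a VALUES list can stand in for one:
SELECT max(v) FROM (VALUES (1), (5), (3)) AS t(v);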