DataFlow Query Not Replacing Variable - azure-data-factory

I have a working SQL query
... where AcctPD >= '@{variables('StartDate')}'
and want to use a variable such as StartDate with a value of 202211, a varchar(6).
If I put ... where AcctPD >= '202211' in the SQL, the query works and data is returned. But if I run the version above with the variable substitution, I get:
at Source 'XXXXx': Parse error at line: 36, column: 41: Incorrect syntax near 'StartDate'.
SQL Server error encountered while reading from the given table or while executing the given query.
Why?

ADF pipeline variables and pipeline parameters cannot be used directly in a data flow. To use the value of a pipeline variable, a data flow parameter has to be created and the pipeline variable's value passed to that parameter. I tried to reproduce this.
A pipeline variable StartDate is created and a value is assigned.
The data flow query is first run without using the variable. Data is read from the SQL table without any error.
When the variable is used in the query, the same error is produced.
To solve this, a data flow parameter named par_StartDate is created.
In the source transformation's Query option, Open expression builder is selected.
The query is written as "select * from Target_merged_table where createdat='{$par_StartDate}'" in the data flow expression builder.
A sink transformation is added with a CSV file as the sink dataset.
This data flow is added to a Data flow activity in the pipeline, and the value of the pipeline variable StartDate is passed to the data flow parameter par_StartDate (the assignment is sketched after these steps).
When the pipeline is run, the data flow executes successfully.
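The hand-off itself is a minimal expression on the Data flow activity's Parameters tab: par_StartDate is given the pipeline expression
@variables('StartDate')
so the data flow receives whatever value the pipeline variable holds at run time.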

Related

How to combine parameter and static value in Azure Data Factory?

I have Copy Data Activity in Azure Data Factory.
I have OnPrem File system as Sink dataset.
Folder value is currently "Dev/Customers/Nissan".
I would like to use @pipeline().parameters.Environment for the "Dev" value.
However, the following did not work:
"@pipeline().parameters.OnPremLoad_OnPremEnv/Customers/Nissan"
Please advise how to concatenate the parameter and the static value.
Use the concat() function to combine the parameter with the static value in the expression.
Ex:
Expression: @concat(pipeline().parameters.Environment,'/Customers/Nissan')
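Equivalently, string interpolation can be used in the dynamic content box (a sketch, assuming the parameter is named Environment as above):
@{pipeline().parameters.Environment}/Customers/Nissan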

How can I check if a JSON field exists using an ADF expression?

I want to do some activity in an ADF pipeline, but only if a field in a JSON output is present. What kind of ADF expression can I use to check that?
I set up two JSON files for testing, one with a firstName attribute and one without.
I then created a Lookup activity to get the contents of the JSON file and a Set Variable activity for testing the expression; I often use this pattern, as it's a good way to test and view expression results iteratively.
I then created a Boolean variable (which is one of the datatypes supported by Azure Data Factory and Synapse pipelines) and the expression I am using to check the existence of the attribute is this:
@bool(contains(activity('Lookup1').output.firstRow, 'firstName'))
You can then use that boolean variable in an If activity, to execute subsequent activities conditionally based on the value of the variable.
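For example, assuming the Boolean variable is named hasFirstName (the name is illustrative), the If Condition activity's expression could simply be
@variables('hasFirstName')
or, skipping the intermediate variable, the same check could be placed directly in the If Condition:
@bool(contains(activity('Lookup1').output.firstRow, 'firstName'))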

How do I query PostgreSQL with IDs from a parquet file in a Data Factory pipeline

I have an Azure pipeline that moves data from one point to another in parquet files. I need to join in some data from a PostgreSQL database that sits in an AWS tenancy, keyed by a unique ID. I am using a data flow to create the unique ID I need from two separate columns using a concatenation. I am trying to create a where clause, e.g.
select * from tablename where unique_id in ('id1','id2','id3'...)
I can do a lookup query against the database, but I can't figure out how to turn the data flow output into a list of IDs in a parameter that I can use in the select statement. I tried using a Set Variable activity and was going to feed that into a ForEach, but Set Variable doesn't like the output of the data flow (an object instead of an array): "The variable 'xxx' of type 'Array' cannot be initialized or updated with value of type 'Object'. The variable 'xxx' only supports values of types 'Array'." I've used a flatten transformation to try to convert it to an array, but I think the sink operation is putting it back into JSON?
What is a workable approach to getting the IDs into a string that I can put into a lookup query?
Some notes:
The parquet file has a small number of unique IDs compared to the total unique IDs in the database.
If this were an Azure PostgreSQL database I could just do the join in the data flow, but the generic PostgreSQL driver isn't available in data flows. I can't copy the entire database over to Azure just to do the join, and I need the data flow in Azure for non-technical reasons.
Edit:
For clarity's sake, I am trying to replace local Python code that does the following:
query = "select * from mytable where id_number in "
df = pd.read_parquet("input_file.parquet")
df['id_number'] = df.country_code + df.id
df_other_data = pd.read_sql(query + str(tuple(df.id_number)), conn)
I'd like to replace this locally executing code with ADF. In the ADF process, I have to replace the transformation of the IDs, which seems easy enough in a couple of different ways. Once I have the IDs in the proper format in a column of a dataset, I can't figure out how to query a database that isn't supported by Data Flow and restrict the query to only the IDs I need, so I don't bring down the entire database.
ADF variables can only store simple types, so we can instead define an Array type parameter in ADF and set a default value. ADF parameters support elements of any type, including complex JSON structures.
For example:
Define a json array:
[{"name": "Steve","id": "001","tt_1": 0,"tt_2": 4,"tt3_": 1},{"name": "Tom","id": "002","tt_1": 10,"tt_2": 8,"tt3_": 1}]
Define an Array type parameter and set the JSON array above as its default value.
So we will not get any error.
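As a rough sketch of how such a parameter could then feed the lookup query (assuming, unlike the sample above, it holds a flat array of ID strings such as ["id1","id2","id3"] and is named IdList), the IN list can be assembled with join():
@concat('select * from tablename where unique_id in (''', join(pipeline().parameters.IdList, ''','''), ''')')
which evaluates to the statement shown in the question, e.g. select * from tablename where unique_id in ('id1','id2','id3').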

Query by date in Azure Data Factory Pipeline

I want to use a query in a copy job for my source in an Azure Data Factory pipeline together with a date function - here is the dummy query:
SELECT * FROM public.report_campaign_leaflet WHERE day="{today - 1d}"
I've found some documentation about dynamic content and some other things, but no information on how to use date functions directly in a SQL query.
Maybe someone has a hint for me?
Thanks & best,
Michael
Here is a possible solution for your problem.
In your copy activity, on the source side, choose Query in the Use query option, and then write an expression in the query box.
Here is the expression: @concat('SELECT * FROM public.report_campaign_leaflet WHERE day=','"',formatDateTime(addDays(utcnow(),-1), 'yyyy-MM-dd'),'"')
The formatDateTime function will just format the output of addDays(utcnow(),-1) into yyyy-MM-dd format.
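For instance, on a hypothetical run at 2022-11-15T09:30:00Z, addDays(utcnow(),-1) falls on the previous day, so the whole expression evaluates to a query ending in day="2022-11-14".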
Alternatively, you can have a pipeline parameter, processDate for example, set its value from an expression in the trigger definition, and then just reference that parameter in the query. (suggestion)
You need to replace the double quote (") with a single quote, which is escaped inside the expression string as two single quotes (''):
@concat('SELECT * FROM public.report_campaign_leaflet WHERE day=','''',formatDateTime(addDays(utcnow(),-1), 'yyyy-MM-dd'),'''')
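An equivalent sketch using string interpolation in the query box, rather than concat(), would be:
SELECT * FROM public.report_campaign_leaflet WHERE day='@{formatDateTime(addDays(utcnow(),-1), 'yyyy-MM-dd')}'
Either form sends a plain SQL string with yesterday's date wrapped in single quotes to the source.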

PHP and sanitizing strings for use in dynamically created DB2 queries

I'm relatively new to DB2 for IBM i and am wondering about the proper methods for cleansing data used in a dynamically generated query in PHP.
For example, if writing a PHP class which handles all database interactions, one would have to pass table names and such, some of which cannot be passed in using db2_bind_param(). Does db2_prepare() cleanse the statement on its own? Or is it possible that a malformed query can be "executed" within a db2_prepare() call? I know there is db2_execute(), but the DB is doing something in db2_prepare() and I'm not sure what (just syntax validation?).
I know that if the passed values are in no way affected by user input there shouldn't be much of an issue, but if one wanted to cleanse data before using it in a query (without using db2_prepare()/db2_execute()), what is the checklist for DB2? The only thing I can find is to escape single quotes by prefixing them with another single quote. Is that really all there is to watch out for?
There is no magic "cleansing" happening when you call db2_prepare() -- it will simply attempt to compile the string you pass as a single SQL statement. If it is not a valid DB2 SQL statement, the error will be returned. Same with db2_exec(), only it will do in one call what db2_prepare() and db2_execute() do separately.
EDIT (to address further questions from the OP).
Execution of every SQL statement has three stages:
Compilation (or preparation), when the statement is parsed, syntactically and semantically analyzed, the user's privileges are determined, and the statement execution plan is created.
Parameter binding -- an optional step that is only necessary when the statement contains parameter markers. At this stage each parameter data type is verified to match what the statement text expects based on the preparation.
Execution proper, when the query plan generated at step 1 is performed by the database engine, optionally using the parameter (variable) values provided at step 2. The statement results, if any, are then returned to the client.
db2_prepare(), db2_bind_param(), and db2_execute() correspond to steps 1, 2 and 3 respectively. db2_exec() combines steps 1 and 3, skipping step 2 and assuming the absence of parameter markers.
Now, speaking about parameter safety, the binding step ensures that the supplied parameter values conform to the expected data type constraints. For example, in a query containing something like ...WHERE MyIntCol = ?, if I attempt to bind a character value to that parameter, it will generate an error.
If instead I were to use db2_exec() and compose a statement like so:
$stmt = "SELECT * FROM MyTab WHERE MyIntCol=" . $parm
I could easily pass something like "0 or 1=1" as the value of $parm, which would produce a perfectly valid SQL statement that would then be successfully parsed, prepared, and executed by db2_exec().
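By contrast, here is a minimal sketch of the parameterized flow described in the three steps above (connection details, table, and column names are illustrative):
// Hypothetical connection; replace database, user, and password with your own.
$conn = db2_connect('MYDB', 'db2user', 'password');
// Step 1: compile. The parameter marker (?) is part of the statement's
// structure, so a bound value can never alter that structure.
$stmt = db2_prepare($conn, 'SELECT * FROM MyTab WHERE MyIntCol = ?');
// Step 2: bind the PHP variable $parm to the first (and only) marker.
$parm = 42;
db2_bind_param($stmt, 1, 'parm', DB2_PARAM_IN);
// Step 3: execute the prepared plan and fetch the results.
db2_execute($stmt);
while ($row = db2_fetch_assoc($stmt)) {
    print_r($row);
}
Passing "0 or 1=1" as $parm here would simply fail the integer type check at binding/execution time instead of rewriting the query.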