Azure data factory: pass where clause as a string to dynamic query with quotes - azure-data-factory

I have a Lookup that retrieves a few records from an MS SQL table containing a schema, a table name and a complete where clause. These values are passed to a Copy data activity inside a ForEach. In the Copy data activity I use a dynamic query like:
@concat('select a.*, current_date as crt_tms from ',item().shm_nam,'.',item().tab_nam,
item().where_clause )
This construction works fine without the where_clause, or with a where clause that compares against an integer. But it goes wrong with strings like:
'a where a.CODSYSBRN ='XXX' ;'
The problem is the single quote (').
How can I pass it through?
I know that the where clause works as a fixed string in the dynamic query when I double the single quotes to escape them:
'a where a.CODSYSBRN =''XXX'' ;'
The point is that the where clause needs to be completely dynamic, because it differs per table.
Whatever I try, I get this kind of error:
Syntax error or access violation;257 sql syntax error: incorrect syntax near "where a"
PS: I also tested this, with the same result:
select a.*, current_date as crt_tms from @{item().shm_nam}.@{item().tab_nam} a @{item().where_clause}

As you mentioned, the whole where clause comes from the lookup table, so each stored clause must already contain its column values, quoted for string types and unquoted for integer types.
[Screenshot: example lookup table]
In your copy activity, you can use the concat() function, as you were already doing, to combine static values and parameters. Note the ' ' argument that separates the table name from the where clause.
@concat('select * from ',item().schma_name,'.',item().table_name,' ',item().where_clause)
For debugging purposes, I added the expression to a Set variable activity to see the value it produces.
[Screenshots: Set variable output for iteration 1 and iteration 2]
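To make the effect of the separating space concrete, here is a minimal Python sketch of the same concatenation outside Data Factory. The function name build_query and the sample row are hypothetical and only stand in for the ForEach item. As far as the concatenation itself is concerned, quotes stored in the lookup value arrive in the final query unchanged, so the sketch assumes the clause is stored with ordinary single quotes rather than doubled ones.

```python
def build_query(schema: str, table: str, where_clause: str) -> str:
    """Mimic concat('select * from ', schema, '.', table, ' ', where_clause)."""
    return "select * from " + schema + "." + table + " " + where_clause


# Hypothetical lookup row; the where clause holds ordinary single quotes.
row = {
    "schma_name": "dbo",
    "table_name": "branches",
    "where_clause": "a where a.CODSYSBRN = 'XXX'",
}

query = build_query(row["schma_name"], row["table_name"], row["where_clause"])
print(query)
```

Without the ' ' argument the table name and the alias would run together (branchesa where ...), which would be one plausible source of a syntax error near the where clause.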

Related

PostgreSQL, allow to filter by not existing fields

I'm using PostgreSQL with a Go driver. Sometimes I need to query non-existent fields, just to check whether something exists in the DB. Before querying, I can't tell whether a field exists. Example:
where size=10 or length=10
By default I get the error column "length" does not exist; however, the size column could exist and I could get some results.
Is it possible to handle such cases to return what is possible?
EDIT:
Yes, I could get all the existing columns first. But the initial queries can be rather complex and not created by me directly, I can only modify them.
That means the query can be simple like the previous example and can be much more complex like this:
WHERE size=10 OR (length=10 AND n='example') OR (c BETWEEN 1 and 5 AND p='Mars')
If the missing columns are length and c, does that mean I have to parse the SQL, split it by OR (or other operators), check every part of the query, remove any part with missing columns, and finally generate a new SQL query?
Any easier way?
I would check the information schema first:
"select column_name from INFORMATION_SCHEMA.COLUMNS where table_name ='table_name';"
Then build the query based on the result.
Why don't you get a list of columns that are in the table first? Like this
select column_name
from information_schema.columns
where table_name = 'table_name' and (column_name = 'size' or column_name = 'length');
The result will be the columns that exist.
There is no way to do what you want, except by constructing an SQL string from the list of available columns, which you can get by querying information_schema.columns.
SQL statements are parsed before they are executed, and there is no conditional compilation or short-circuiting, so you get an error if a non-existent column is referenced.
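As a rough illustration of "constructing an SQL string from the list of available columns", here is a minimal Python sketch. The function name build_where and the sample column set are hypothetical; in practice the set would come from a query against information_schema.columns. The sketch only covers a flat list of OR-ed equality conditions, not the nested expressions from the question, which would still need real parsing.

```python
def build_where(conditions, existing_columns):
    """Keep only conditions on existing columns and OR them together.

    conditions: list of (column_name, value) pairs.
    existing_columns: set of column names known to exist in the table.
    Returns (sql_fragment, params) using %s placeholders for the values.
    """
    kept = [(col, val) for col, val in conditions if col in existing_columns]
    if not kept:
        # Nothing can match if none of the requested columns exist.
        return "WHERE false", []
    sql = "WHERE " + " OR ".join(f"{col} = %s" for col, _ in kept)
    return sql, [val for _, val in kept]


# Hypothetical result of the information_schema lookup.
existing = {"size", "n", "p"}
sql, params = build_where([("size", 10), ("length", 10)], existing)
print(sql, params)
```

Column names are only ever interpolated after they have been matched against the known column set, and the values stay in placeholders, so the fragment can be handed to a driver together with params.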

Is there a way to run a single PSQL Aggregate Function without hitting a database table?

For example, I'd like to run:
REGEXP_REPLACE("What's My Name?", "[^a-z0-9_\-]", "-");
and simply see what it returns, instead of doing a search against a DB Table. I tried to run it in the CLI and got
ERROR: syntax error at or near "REGEXP_REPLACE"
LINE 1: REGEXP_REPLACE("What's My Name?", "[^a-z0-9_\-]", "-")
(I'm trying to be generic; I'd like to be able to use this for other PSQL Aggregate Functions as well.)
Remember, this is SQL, so every output you get is a relation. Hence to calculate the result of a function, you need to run SELECT to retrieve the function's value.
Unfortunately, in many DBs, SELECT requires a table. In Oracle land, there's dual to work around this problem:
SELECT REGEXP_REPLACE('What''s My Name?', '[^a-z0-9_\-]', '-') FROM dual;
PostgreSQL, however, allows you to execute a SELECT query without having to specify a table:
SELECT REGEXP_REPLACE('What''s My Name?', '[^a-z0-9_\-]', '-');
Note that the string quote in SQL is ', not ". PostgreSQL uses double quotes to quote identifiers.
Side note: Not every function is an aggregate. An aggregate is a function that combines multiple values into a single output value. REGEXP_REPLACE() is just a normal string function.
Is there a way to run a single PSQL Aggregate Function without hitting a database table?
Yes, in PostgreSQL you can write a SELECT statement without a FROM part at all.
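The same idea can be tried without a PostgreSQL server at hand. The sketch below uses Python's built-in sqlite3 module purely as a stand-in: SQLite also accepts a SELECT without FROM and also uses single quotes for string literals. SQLite has no REGEXP_REPLACE, so upper() is used here as an arbitrary scalar function; that substitution is an assumption made for the sake of a runnable example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # empty database, no tables at all

# No FROM clause; the apostrophe inside the literal is escaped by doubling it.
value = conn.execute("SELECT upper('What''s My Name?')").fetchone()[0]
print(value)

conn.close()
```

Writing the literal with double quotes instead would make the database treat it as an identifier, which is the mistake the answer above points out.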

How to escape SQL parameter in JDBC when PreparedStatement won't work?

I have a string that I want to pass to SQL. To prevent SQL injection and other quoting and escaping problems, the best practice is to use a PreparedStatement with ?. For example:
val ps = conn.prepareStatement("select * from foo where name = ?")
ps.setString(1, name)
But for some SQL code, this won't work. For example, here is an attempt to create a view in PostgreSQL:
val ps = conn.prepareStatement("create temp view v1 as select * from foo where name = ?")
ps.setString(1, name)
val rs = ps.execute()
This throws an error:
org.postgresql.util.PSQLException: ERROR: there is no parameter $1
Apparently it doesn't allow parameters in CREATE VIEW. How do you get around this and safely escape the string?
Prepared statements are used to plan a complex statement once and then execute it multiple (= very many) times with different parameter values. Simple statements have no noticeable benefit from using a prepared statement because planning them is trivial. DDL statements are not supported at all, so that is most likely the cause of the error, although the error message is confusing.
From the documentation:
PREPARE name [ ( data_type [, ...] ) ] AS statement
statement: Any SELECT, INSERT, UPDATE, DELETE, or VALUES statement.
The PreparedStatement class does document that you can use DDL in the executeUpdate() method, but from a logical standpoint that is just nonsense, at least in PostgreSQL.
Instead, you should use a Statement and then call execute() or executeUpdate() (a bit odd that the latter method would support DDL statements because there is no update being performed).
Preventing SQL-injection
In order to prevent SQL-injection you can use a few PostgreSQL functions:
quote_literal() - As can be expected, this quotes a literal parameter value so that it is safe in the query. Not only does this protect you from Bobby Tables, but also from the likes of Mr. O'Brien.
quote_nullable() - For literals like above, but generates proper code when the parameter IS NULL.
quote_ident() - Double-quotes any table or column name that might cause problems for the parser, such as a table named from with columns int, type and from: SELECT int, type, from FROM from WHERE int = type becomes SELECT "int", "type", "from" FROM "from" WHERE "int" = "type".
You can use these functions directly in your SQL statements and then let PostgreSQL deal with nasty input.
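To show what this kind of quoting amounts to, here is a minimal Python sketch of the two core rules. These helpers are simplified stand-ins written for illustration, not the PostgreSQL functions themselves: the literal helper assumes standard_conforming_strings is on (so backslashes need no special treatment), and the identifier helper always adds quotes, whereas PostgreSQL's quote_ident only quotes when necessary. Where the driver or the server offers its own escaping, that should be preferred.

```python
def quote_literal(value: str) -> str:
    """Wrap a string in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_ident(name: str) -> str:
    """Wrap an identifier in double quotes, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


name = "O'Brien"
# DDL cannot take a bind parameter here, so the escaped literal is inlined.
ddl = "create temp view v1 as select * from foo where name = " + quote_literal(name)
print(ddl)
print(quote_ident("from"))
```

The resulting string can then be run through a plain Statement, as suggested above.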

Running a DBLink INSERT query using Psycopg2, running into formatting problems

I'm trying to insert into a remote database using DBLink, and I'm making the query through Psycopg2. This is my code that takes a list of column names called fields and a list of column values called values:
fields_str = ",".join(fields)
values_str = ("%s, " * len(values))[:-2]
query = """SELECT dblink_exec('{0}', 'INSERT INTO {1} ({2}) VALUES ({3});')""".format(
    self.foreign_db,
    table,
    fields_str,
    values_str)
This doesn't work because DBLink expects two single quotes around every string value like in the example in their documentation: SELECT dblink_exec('myconn', 'insert into foo values(21,''z'',''{"a0","b0","c0"}'');');
The problem is that Psycopg2 sticks an "E" in front of all my strings to escape them. That means neither adding single quotes around all my values nor adding single quotes around all my "%s" parts in my query can fix this. The general problem seems to be that Psycopg2 cannot properly handle escaping strings within a DBLink query string.
psycopg2.ProgrammingError: syntax error at or near "f5907160"
LINE 1: ...ar,tag,oid,objecttype,updatedat,usedat) VALUES (E'f5907160-4...
Any way around this?

Multiple SQL queries in a report

The requirement is to generate a single report connecting to a single DB:
Query1 is a group by query and has a bar chart and pie chart based on it.
Query2 is a simple query on which a table gets created.
Both these queries need results based on a WHERE clause, which is supplied dynamically.
Can somebody point me to some examples on how to achieve this?
Thank you.
You can tell JasperReports to use a parameter to define part of the query using the $P!{PARAMETER_NAME} syntax. This tells JasperReports to use the literal value of PARAMETER_NAME as part of the query. You can then do:
Create a parameter named WHERE_CLAUSE in the report.
Give WHERE_CLAUSE a default value of 1=1.
Consider the following SQL statement:
SELECT * FROM table WHERE $P!{WHERE_CLAUSE}
The $P! expression changes the literal SQL statement to:
SELECT * FROM table WHERE 1=1
That is a valid query. Note the difference between $P{} and $P!{} -- the exclamation mark (!) is important.
You can then supply the SQL conditions dynamically.
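The textual substitution described above can be pictured with a short Python sketch. This is only a model of the behaviour the answer describes, not JasperReports code; the function name expand_literal_params and the sample condition are hypothetical.

```python
import re


def expand_literal_params(query: str, params: dict) -> str:
    """Replace each $P!{NAME} with the raw text of params[NAME]."""
    return re.sub(r"\$P!\{(\w+)\}", lambda m: str(params[m.group(1)]), query)


template = "SELECT * FROM table WHERE $P!{WHERE_CLAUSE}"

# With the default value the query stays valid and unfiltered.
default_query = expand_literal_params(template, {"WHERE_CLAUSE": "1=1"})
# A dynamically supplied condition is spliced in verbatim.
filtered_query = expand_literal_params(template, {"WHERE_CLAUSE": "region = 'EU'"})

print(default_query)
print(filtered_query)
```

Because the value is pasted into the SQL text as-is, it should only ever be built from trusted input. The same WHERE_CLAUSE parameter can be reused in both report queries.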