Can I safely prevent SQL Injection using PostgreSQL's Dollar-Quoted String Constants?
I know the best way to handle dynamic queries is to generate them in an application layer with a parameterized query, but that's not what this question is about. All of the business logic is in stored procedures.
I have a stored procedure that takes parameters, generates a query, runs it, formats the results and returns them as a chunk of text. This function is passed a table name, column names and WHERE parameters. The WHERE parameters passed to the function come from user-entered data in the database. I would like to make sure that the strings are sanitized so the query that is built is safe.
Using PostgreSQL's dollar-quoted string constants, I should be able to safely sanitize all string input other than ' $$ '. However, if I do a string replace on "$" to escape it, I should be able to do a string comparison that is safe.
Stored Procedure:
function_name(tablename text, colnames text[], whereparam text)
--Build dynamic query...
Function Call:
SELECT
function_name('tablename', ARRAY['col1', 'col2', 'col3'], 'AND replace(col1, ''$'', ''/$'') = $$' || replace(alt_string_col, '$', '/$') || '$$ ')
FROM alttable
WHERE alt_id = 123;
Query Generated:
SELECT col1, col2, col3 FROM tablename WHERE 1=1 AND replace(col1, '$', '/$') = $$un/safe'user /$/$ data;$$
Since I'm escaping the col1 field before I compare it to escaped user data, even if the user enters, "un/safe'user $$ data;" in the field, alt_string_col, the double dollar sign does not break the query and the comparison passes.
Is this a safe way to escape strings in a PostgreSQL stored procedure?
Edit1
Thanks to Erwin Brandstetter. Using the USING clause for EXECUTE I was able to create a function that can be called like this:
SELECT function_name(
'tablename',
ARRAY['col1', 'col2', 'col3'],
ARRAY[' AND col1 = $1 ', ' OR col2 = $5 '],
quote_literal(alt_string_col)::text, --Text 1-4
NULL::text,
NULL::text,
NULL::text,
alt_active_col::boolean, --Bool 1-4
NULL::boolean,
NULL::boolean,
NULL::boolean,
NULL::integer, --Int 1-4
NULL::integer,
NULL::integer,
NULL::integer
)
FROM alttable
WHERE alt_id = 123;
It gives some flexibility to the WHERE clauses that can be passed in.
Inside the stored procedure I have something like this for the EXECUTE statement.
FOR results IN EXECUTE(builtquery) USING
textParm1,
textParm2,
textParm3,
textParm4,
boolParm1,
boolParm2,
boolParm3,
boolParm4,
intParm1,
intParm2,
intParm3,
intParm4
LOOP
-- Do some stuff
END LOOP;
Use quote_ident() to safeguard against SQL injection while concatenating identifiers. Or format() in Postgres 9.1 or later.
Use the USING clause for EXECUTE in PL/pgSQL code to pass values. Or at least quote_literal().
To make sure a table name exists (and is quoted and schema-qualified automatically where necessary when concatenated), use the special data type regclass.
More about executing dynamic SQL with PL/pgSQL:
PostgreSQL parameterized Order By / Limit in table function
Table name as a PostgreSQL function parameter
Since PostgreSQL 9.0 you can also use anonymous code blocks with the DO statement to execute dynamic SQL.
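A minimal sketch of how these pieces can fit together, assuming Postgres 9.1 or later. The function count_matches, its parameter names and the table demo_items are hypothetical and only serve the illustration; they are not from the question. The value containing $$ is passed with USING, so it never becomes part of the SQL string and no dollar-sign escaping is needed.

```sql
-- Hypothetical demo table holding the awkward value from the question.
CREATE TEMP TABLE demo_items (col1 text, col2 text);
INSERT INTO demo_items VALUES ('un/safe''user $$ data;', 'x'), ('plain', 'y');

-- _tbl is regclass: the cast fails if the table does not exist, and the name
-- is quoted and schema-qualified as needed when it is converted to text.
CREATE FUNCTION pg_temp.count_matches(_tbl regclass, _col text, _val text)
  RETURNS bigint
  LANGUAGE plpgsql AS
$func$
DECLARE
   _ct bigint;
BEGIN
   -- %s for the regclass value (already safe), %I for the column name,
   -- $1 for the user-supplied value handed over with USING.
   EXECUTE format('SELECT count(*) FROM %s WHERE %I = $1', _tbl, _col)
   INTO  _ct
   USING _val;
   RETURN _ct;
END
$func$;

-- Expected to count the single row whose col1 holds the "$$" value.
SELECT pg_temp.count_matches('demo_items', 'col1', 'un/safe''user $$ data;');
```

The function lives in pg_temp only so that the sketch cleans up after itself; a permanent function would work the same way.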
Related
I have a dynamic SQL statement I'm building up that is something like
my_interval_var := interval '60 days';
sql_query := format($f$ AND NOT EXISTS (SELECT 1 FROM some_table st WHERE st.%s = %s.id AND st.expired_date >= %L )$f$, var1, var2, now() - my_interval_var)
Regarding my first question, the timestamp seemed to be inserted correctly after the now() - my_interval_var computation. However, I want to make sure I don't need a cast, because the only way I could get it to work was with %L, the string-literal placeholder. Or does Postgres allow direct comparisons with strings that represent a time without a cast? For example:
some_column <= '2021-12-31 00:00:00'; -- is ::timestamp cast needed?
Second of all, regarding the sql_query variable that I concatenated an SQL String into above, I actually wanted to skip the Format I did, and directly inject this sql_query variable into an EXECUTE...FORMAT...USING statement.
I couldn't get it to work, but something like this:
EXECUTE format($f$ SELECT *
FROM %I tbl_alias
WHERE tbl_alias.%s = %L
%s ) USING var1, var2, var3, sql_query;
Is it possible to leave the Dynamic SQL Identifiers %I %L and %s inside the variable and FORMAT it at the EXECUTE... level? Something tells me this isn't possible, but it would be really cool.
Also last question I didn't want to add, but I feel someone might have a quick answer.
I was using the following loop:
FOR temprecord IN
EXECUTE format('SELECT myCol1, myCol2, myCol3
FROM %I tbl', var1)
LOOP
EXECUTE temprecord.someColumnOnMyTbl;
END LOOP;
...but I could not for the life of me get the EXECUTE temprecord.someColumnOnMyTbl statement to work when I made the query dynamic. I tried every kind of identifier, using FORMAT, USING...
I thought columns were strings like %s, because I do that for columns all the time when they are aliased, like alias.%s = 'some string literal'.
Anyway, I couldn't get it to work. I wanted to make the column name dynamic and tried all of these:
EXECUTE format($f$ %I.%s $f$, var1, var2);
EXECUTE format($f$ %$1.%$2 $f$) USING var1, var2;
EXECUTE format($f$ %I.someColumnOn%s $f$, var1, var2);
EXECUTE format($f$ $1.someColumnOn$2 $f$) USING var1, var2;
I tried more than that, and I actually got some data from the DB when I made the temprecord variable an %I. But I am selecting 3 columns, and it looked like something went wrong with the second identifier: I got a syntax error, and it seemed to be trying to concatenate all 3 columns of the query results...
I did try hardcoding it and that worked fine... any help appreciated!
A string literal is a value of type unknown. Postgres always casts it to some target binary format, and the type is deduced from context. When you use the format function with the %L placeholder, any binary value is converted to a string and escaped as a Postgres string literal (protection against syntax errors and SQL injection). When you use the USING clause, the binary value is passed directly to the executor. That is a little faster, and there is no possibility of losing information in the cast to string. Apart from these points, the practical effect of %L and the USING clause is almost the same.
Your variable is of type timestamp. The expired_date column is probably of type date, so some timestamp-to-date conversion is necessary.
The format function is just a string function; it only builds a string. For better readability it supports placeholders that ensure correct escaping and a correct resulting SQL string. %L works like calling quote_literal, and %I works like quote_ident (for column and table names). %s inserts a string without escaping or quoting. The result of format (when you use it in an EXECUTE command) should be a valid SQL statement. You can also pass it to RAISE NOTICE to print the result to the debug output, which is usually a good idea:
DECLARE
query text;
x date DEFAULT current_date;
y int;
BEGIN
query := format('.... WHERE inserted = $1', ...);
RAISE NOTICE 'dynamic query will be: %', query;
EXECUTE query USING x INTO y;
...
The USING clause allows parameters in dynamic SQL (the EXECUTE command). Usually, format's placeholders should be used for table or column names, and USING for everything else.
For date and timestamp (basic scalar types), the following two statements behave 99.99% the same:
EXECUTE format('select count(*) from foo where inserted = %L', current_date) INTO ..
EXECUTE 'select count(*) from foo where inserted = $1' USING current_date INTO ..
You cannot use query parameters in column-name or table-name positions; this is a limitation of the USING clause. In all other cases, this clause should be preferred.
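The two variants can be compared side by side with a sketch like the following. The table demo_events and the variable names are hypothetical; the column name expired_date and the 60-day interval are taken from the question. Both counts are expected to agree, and neither variant needs an explicit cast.

```sql
DO
$do$
DECLARE
   _tbl    text := 'demo_events';                       -- hypothetical table name
   _cutoff timestamptz := now() - interval '60 days';
   _n1     bigint;
   _n2     bigint;
BEGIN
   CREATE TEMP TABLE demo_events (id int, expired_date timestamptz);
   INSERT INTO demo_events VALUES (1, now()), (2, now() - interval '90 days');

   -- Variant 1: the value is embedded as an escaped string literal by %L;
   -- its type is resolved from the column it is compared with.
   EXECUTE format('SELECT count(*) FROM %I WHERE expired_date >= %L', _tbl, _cutoff)
   INTO _n1;

   -- Variant 2: the value is passed as a typed parameter with USING.
   EXECUTE format('SELECT count(*) FROM %I WHERE expired_date >= $1', _tbl)
   INTO  _n2
   USING _cutoff;

   RAISE NOTICE 'with %%L: %, with USING: %', _n1, _n2;
END
$do$;
```

In both statements only the table name goes through a format placeholder, which matches the rule of thumb above: identifiers via format, values via USING where possible.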
I am trying to write a function that dynamically creates an SQL statement, but I am facing problems with typecasts. How can I tell from the type of a field whether its value needs to be quoted?
-- using this I can recover the types of each field
-- but I do not have a simple way to express that for a specific type it need
-- to quote
create table test (
id serial not null,
employee_name text,
salary decimal(12,2),
hire_date date,
active boolean
);
select column_name, data_type, null as need_to_be_quoted
from information_schema.columns
where table_name = 'test';
column         type     need to be quoted (this is the missing information)
---------------------------------------------------------------------------
id             integer  false
employee_name  text     true
salary         decimal  false
hire_date      date     true
active         boolean  false
The quote_ident docs say:
Return the given string suitably quoted to be used as an identifier in an SQL statement string. Quotes are added only if necessary
But it is not what I was expecting:
insert into test (employee_name, salary, hire_date, active)
values (quote_ident('John Doe'), quote_ident(100000), quote_ident(current_date), quote_ident(true));
This is kind of necessary because I am trying to generate the statement string dynamically.
I have values to be inserted into some table, and I can discover the type of each value. But to generate the insert statement string, I need to know whether a value of a specific type should be quoted or not, for example:
text: type should be quoted in the string statement
boolean: should not be quoted
numeric: should not be quoted
date: should be quoted
Don't quote. Quoting adds complexity and if you don't get it just right it has syntax and security issues.
Instead use bind parameters. The details depend on what database library you're working with, but the basic idea is always the same. First prepare the statement with placeholders for your values. Then execute it passing in the values. The database takes care of putting the values into the statement. You can execute the same prepared statement multiple times optimizing your coding and database calls.
Here's how you'd do it in PL/pgSQL with EXECUTE. The linked documentation has lots of information about safely executing dynamic queries.
do $$
begin
execute '
insert into test (employee_name, salary, hire_date, active)
values ($1, $2, $3, $4)
' using 'John Doe', 100000, current_date, true;
end;
$$;
Furthermore, while writing a SQL builder is a good exercise, it's also very complicated and very easy to get subtly wrong. There are plenty of libraries which will build SQL for you.
Here's your statement using PHP's Laravel.
DB::table('test')->insert([
'employee_name' => 'John Doe',
'salary' => 100000,
'hire_date' => DB::raw('current_date'),
'active' => true
]);
If you're curious about how SQL builders work, you can dig inside Laravel for ideas. Writing one in PL/pgSQL is ambitious, but I don't know of one that exists. Good luck!
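When the table or column names are themselves dynamic, the same bind-parameter approach still applies to the values; only the identifiers need to be built into the string. A minimal sketch, assuming Postgres 9.4 or later (for WITH ORDINALITY) and using a temporary copy of the test table from the question; the variable names _tbl, _cols and _sql are hypothetical:

```sql
CREATE TEMP TABLE test (
   id            serial NOT NULL,
   employee_name text,
   salary        decimal(12,2),
   hire_date     date,
   active        boolean
);

DO
$do$
DECLARE
   _tbl  text   := 'test';
   _cols text[] := ARRAY['employee_name', 'salary', 'hire_date', 'active'];
   _sql  text;
BEGIN
   -- Identifiers are quoted with %I / quote_ident(); values stay as $n placeholders.
   SELECT format('INSERT INTO %I (%s) VALUES ($1, $2, $3, $4)',
                 _tbl, string_agg(quote_ident(col), ', ' ORDER BY ord))
   INTO   _sql
   FROM   unnest(_cols) WITH ORDINALITY AS c(col, ord);

   RAISE NOTICE '%', _sql;   -- inspect the generated statement

   -- No per-type quoting decision: each value is passed with its own type.
   EXECUTE _sql USING 'John Doe', 100000, current_date, true;
END
$do$;

SELECT employee_name, salary, hire_date, active FROM test;
```

The number of placeholders is fixed at four here to keep the sketch short; a general builder would also have to generate the placeholder list.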
I have a multi-tenant database where each tenant gets their own schema. Each schema has a set of materialized views used in full-text searches.
The following function takes a schema name and a table name and concatenates them into schema.table_name format:
CREATE OR REPLACE FUNCTION create_table_name(_schema text, _tbl text, OUT result text)
AS 'select $1 || ''.'' || $2'
LANGUAGE SQL
It works as expected in pgAdmin.
I'm trying to use this function in a prepared statement, like this:
SELECT p.id AS id,
ts_rank(
p.document, plainto_tsquery(unaccent(?))
) AS rank
FROM create_table_name(?, 'project_search') AS p
WHERE p.document @@ plainto_tsquery(unaccent(?))
OR p.name ILIKE ?
However, when I run it, I get the following error:
ERROR 42703 (undefined_column) column p.id does not exist
If I "hard-code" the schema and table name though, it works.
Why am I getting this error?
P.S. I should note that I am aware of the dangers of this approach, but the schema name always comes from inside my application so I'm not worried about SQL injection.
You want to use the function result as table name in a query, but what you are actually doing is using the function as a table function. This “table” has only one row and one column called result, which explains the error message.
You need dynamic SQL for that, for example by using PL/pgSQL code in a DO statement:
DO
$$DECLARE
...
BEGIN
EXECUTE
format(
E'SELECT p.id AS id,\n'
' ts_rank(\n'
' p.document,\n'
' plainto_tsquery(unaccent($1))\n'
' ) AS rank\n'
'FROM %I.project_search AS p\n'
'WHERE p.document @@ plainto_tsquery(unaccent($1))\n'
'OR p.name ILIKE $2',
schema_name
)
USING fts_query, like_pattern
INTO var1, ...;
...
$$;
To handle more than one result row, you'd use a FOR loop — this is just a simple example to show the principle.
Note how I use format with the %I pattern to avoid SQL injection. Your function is vulnerable.
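For the multi-row case, one option is a set-returning function with RETURN QUERY EXECUTE instead of an explicit FOR loop. The following is only a sketch under assumptions: the function name search_projects, the schema tenant_demo, and the column types (integer id, text name, tsvector document) are hypothetical, and unaccent() is left out so the example does not depend on that extension being installed.

```sql
-- Hypothetical tenant schema and search table for the demonstration.
CREATE SCHEMA IF NOT EXISTS tenant_demo;
CREATE TABLE IF NOT EXISTS tenant_demo.project_search (
   id       integer,
   name     text,
   document tsvector
);

CREATE OR REPLACE FUNCTION search_projects(_schema text, _fts_query text, _like_pattern text)
  RETURNS TABLE (id integer, rank real)
  LANGUAGE plpgsql AS
$func$
BEGIN
   -- %I quotes the schema name as an identifier; the search terms are
   -- passed as parameters and never concatenated into the query string.
   RETURN QUERY EXECUTE format(
      'SELECT p.id, ts_rank(p.document, plainto_tsquery($1))
       FROM   %I.project_search AS p
       WHERE  p.document @@ plainto_tsquery($1)
          OR  p.name ILIKE $2', _schema)
   USING _fts_query, _like_pattern;
END
$func$;

-- The application can then call this with ordinary bind parameters:
SELECT * FROM search_projects('tenant_demo', 'road bridge', '%bridge%');
```

With a function like this, the prepared statement on the application side only contains value placeholders, and the schema name is handled inside the database.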
How can I write a dynamic SELECT INTO query inside a PL/pgSQL function in Postgres?
Say I have a variable called tb_name which is filled in a FOR loop from information_schema.tables. Now I have a variable called tc which will be taking the row count for each table. I want something like the following:
FOR tb_name in select table_name from information_schema.tables where table_schema='some_schema' and table_name like '%1%'
LOOP
EXECUTE FORMAT('select count(*) into' || tc 'from' || tb_name);
END LOOP
What should be the data type of tb_name and tc in this case?
CREATE OR REPLACE FUNCTION myfunc(_tbl_pattern text, _schema text = 'public')
RETURNS void AS -- or whatever you want to return
$func$
DECLARE
_tb_name information_schema.tables.table_name%TYPE; -- currently varchar
_tc bigint; -- count() returns bigint
BEGIN
FOR _tb_name IN
SELECT table_name
FROM information_schema.tables
WHERE table_schema = _schema
AND table_name ~ _tbl_pattern -- see below!
LOOP
EXECUTE format('SELECT count(*) FROM %I.%I', _schema, _tb_name)
INTO _tc;
-- do something with _tc
END LOOP;
END
$func$ LANGUAGE plpgsql;
Notes
I prepended all parameters and variables with an underscore (_) to avoid naming collisions with table columns. Just a useful convention.
_tc should be bigint, since that's what the aggregate function count() returns.
The data type of _tb_name is derived from its parent column dynamically: information_schema.tables.table_name%TYPE. See the chapter Copying Types in the manual.
Are you sure you only want tables listed in information_schema.tables? Makes sense, but be aware of implications. See:
How to check if a table exists in a given schema
a_horse already pointed to the manual and Andy provided a code example. This is how you assign a single row or value returned from a dynamic query with EXECUTE to a (row) variable. A single column (like count in the example) is decomposed from the row type automatically, so we can assign to the scalar variable tc directly - in the same way we would assign a whole row to a record or row variable. Related:
How to get the value of a dynamically generated field name in PL/pgSQL
Schema-qualify the table name in the dynamic query. There may be other tables of the same name in the current search_path, which would result in completely wrong (and very confusing!) results without schema-qualification. Sneaky bug! Or this schema is not in the search_path at all, which would make the function raise an exception immediately.
How does the search_path influence identifier resolution and the "current schema"
Always quote identifiers properly to defend against SQL injection and random errors. Schema and table have to be quoted separately! See:
Table name as a PostgreSQL function parameter
Truncating all tables in a Postgres database
I use the regular expression operator ~ in table_name ~ _tbl_pattern instead of table_name LIKE ('%' || _tbl_pattern || '%'), that's simpler. Be wary of special characters in the pattern parameter either way! See:
PostgreSQL Reverse LIKE
Escape function for regular expression or LIKE patterns
Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
I set a default for the schema name in the function call: _schema text = 'public'. Just for convenience, you may or may not want that. See:
Assigning default value for type
Addressing your comment: to pass values, use the USING clause like:
EXECUTE format('SELECT count(*) FROM %I.%I
WHERE %I = $1', _schema, _tb_name, column_name)
USING user_def_variable;
Related:
INSERT with dynamic table name in trigger function
It looks like you want the %I placeholder for FORMAT so that it treats your variable as an identifier. Also, the INTO clause should go outside the prepared statement.
FOR tb_name in select table_name from information_schema.tables where table_schema='some_schema' and table_name like '%1%'
LOOP
EXECUTE FORMAT('select count(*) from %I', tb_name) INTO tc;
END LOOP;
I have a T-SQL stored procedure that builds dynamic SQL in which the parameters reference other variables whose values contain single quotes and other reserved characters. The script fails whenever such a value occurs.
Attached is sample code demonstrating the situation.
Any idea how to solve this?
DECLARE @TransVocObj XML,@xmlfragment XML,@SQL NVARCHAR(MAX)
SELECT @TransVocObj = '<TransactionVoucherViewModel><TransactionRows></TransactionRows></TransactionVoucherViewModel>'
DECLARE @Narration varchar(100)
SET @Narration ='AABBCC''DD''EEFF'-- @Narration ='AABBCCDDEEFF'
Select @xmlfragment=
'<TransactionRow>'+'<Description>'+@Narration +'</Description>'+'<DebitAmount>'+CONVERT(VARCHAR(30),500.00)+'</DebitAmount>'+'</TransactionRow>'
SET @SQL=N' SET @TransVocObj.modify(''insert '+ CONVERT(NVARCHAR(MAX),@xmlfragment)+' into (/TransactionVoucherViewModel/TransactionRows)[1] '') '
EXECUTE sp_executesql @SQL,N'@TransVocObj XML Output,@xmlfragment XML',@TransVocObj OUTPUT,@xmlfragment
SELECT T.Item.query('.//Description').value('.','VARCHAR(60)') FROM @TransVocObj.nodes('//TransactionRows/TransactionRow') AS T(Item)
The database server is MS SQL SERVER 2005
You can double up the single-quote characters within @Narration using the REPLACE function. So, when you build @xmlfragment it can look like:
Select @xmlfragment=
'<TransactionRow>'+'<Description>'+REPLACE(@Narration,'''','''''')+'</Description>'+'<DebitAmount>'+CONVERT(VARCHAR(30),500.00)+'</DebitAmount>'+'</TransactionRow>'