PostgreSQL exception handling on select statements with calculations

I'm having issues with exception handling in PostgreSQL, and I'm looking for tips on how to tackle this.
Say I want to do the following:
SELECT col1 / col2
FROM table
The first problem that arises is that if at some point a value in col2 = 0, the query throws an exception. A solution to this is to add a NULLIF() in the denominator.
In our case, users can make their own formulas, which have to be parsed by the database, so we do not have the knowledge about the division in advance. We could make a fancy formula parser that adds NULLIF() in the right places, but then don't get me started on taking square root of negative numbers..
So I'm wondering if there is a better solution to the problem. Does something like this exist?
SELECT col1 / col2
exception
return null
FROM table
Or do I need to make use of the 'function' feature of postgresql? Is it possible to combine two columns like this?
CREATE OR REPLACE FUNCTION somefunction(col1 real, col2 real) RETURNS real AS $$
BEGIN
RETURN col1 / col2;
EXCEPTION WHEN division_by_zero THEN
RETURN NULL; -- swallow the error and yield NULL instead
END;
$$ LANGUAGE plpgsql;
SELECT somefunction(col1, col2, ..)
FROM table;
So keep in mind that we do not know the formula in advance!
Thanks!

In PostgreSQL you can define your own operators. Given your function, you could write:
CREATE OPERATOR public./ (
LEFTARG = real,
RIGHTARG = real,
FUNCTION = public.somefunction
);
SET search_path = public, pg_catalog;
This would use your custom function for every division.
Allowing your users to run SQL statements is a bad idea. SQL is so powerful that they can do anything - they can change the search_path, and they can easily bring your database server to its knees.
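The same wrapping idea extends to the square-root case mentioned in the question. The following is only a minimal sketch, not part of the original answer: safe_sqrt is a hypothetical name, and it assumes the input is double precision. It traps the error PostgreSQL raises for the square root of a negative number and returns NULL instead.

```sql
-- Hypothetical helper (name and signature are assumptions for illustration).
CREATE OR REPLACE FUNCTION safe_sqrt(x double precision)
RETURNS double precision
LANGUAGE plpgsql IMMUTABLE AS $$
BEGIN
    RETURN sqrt(x);
EXCEPTION
    -- raised for sqrt() of a negative number (SQLSTATE 2201F)
    WHEN invalid_argument_for_power_function THEN
        RETURN NULL;
END;
$$;

SELECT safe_sqrt(9) AS ok_value, safe_sqrt(-1) AS null_value;
```

Keep in mind that a PL/pgSQL block with an EXCEPTION clause is noticeably more expensive to enter and exit than one without, so wrapping every operation this way has a cost on large tables.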

PostgreSQL. Is such function vulnerable to SQL injection or is it safe?

Functions that look problematic
I'm exploring a PostgreSQL database and I see a recurring pattern:
CREATE OR REPLACE FUNCTION paginated_class(_orderby text DEFAULT NULL, _limit int DEFAULT 10, _offset int DEFAULT 0)
RETURNS SETOF pg_class
LANGUAGE PLPGSQL
AS $$
BEGIN
return query execute'
select * from pg_class
order by '|| coalesce (_orderby, 'relname ASC') ||'
limit $1 offset $2
'
USING _limit, _offset;
END;
$$;
Sample usage:
SELECT * FROM paginated_class(_orderby:='reltype DESC, relowner ASC ')
The recurring elements are:
_orderby is passed as text. It could be any combination of fields of returned SETOF type. E.g. 'relname ASC, reltype DESC'
_orderby parameter is not sanitized or checked in any way
_limit and _offset are integers
DB Fiddle for that: https://www.db-fiddle.com/f/vF6bCN37yDrjBiTEsdEwX6/1
Question: is such function vulnerable to SQL injection or not?
Judging by its appearance, one might suspect that such a function is vulnerable to SQL injection.
But all my attempts to find an exploiting combination of parameters failed.
E.g.
CREATE TABLE T(id int);
SELECT * FROM paginated_class(_orderby:='reltype; DROP TABLE T; SELECT * FROM pg_class');
will return "Query Error: error: cannot open multi-query plan as cursor".
I did not find a way to exploit the vulnerability, if it exists, with UPDATE/INSERT/DELETE.
So can we conclude that such function is actually safe?
If so: then why?
UPDATE. Possible plan for attack
Maybe I was not clear: I'm not asking about general guidelines but for experimental exploit of vulnerability or proof that such exploit is not possible.
DB Fiddle for this: https://www.db-fiddle.com/f/vF6bCN37yDrjBiTEsdEwX6/4 (or you can provide other of course)
My conclusions so far
A. Such an attack could be possible if _orderby contains these parts:
1. SQL code that suppresses the output of the first SELECT
2. something harmful
3. select * from pg_class, so that it satisfies RETURNS SETOF pg_class
E.g.
SELECT * FROM paginated_class(_orderby:='relname; DELETE FROM my_table; SELECT * FROM pg_class')
Parts 2 and 3 are easy. I don't know a way to do part 1.
This will generate: "error: cannot open multi-query plan as cursor"
B. If it's not possible to suppress the first SELECT, then:
every PostgreSQL function call runs inside a transaction
because of the error, this transaction will be rolled back
there are no autonomous transactions as in Oracle
for non-transactional operations: I know only of sequence-related operations
everything else, be it DML or DDL, is transactional
So? Can we conclude that such function is actually safe?
Or I'm missing something?
UPDATE 2. Attack using prepared function
From answer https://stackoverflow.com/a/69189090/1168212
A. It's possible to mount a denial-of-service attack by passing an expensive calculation
B. Side-effects:
If you put a function with side effects into the ORDER BY clause, you may also be able to modify data.
Let's try the latter:
CREATE FUNCTION harmful_fn()
RETURNS bool
LANGUAGE SQL
AS '
DELETE FROM my_table;
SELECT true;
';
SELECT * FROM paginated_class(_orderby:='harmful_fn()', _limit:=1);
https://www.db-fiddle.com/f/vF6bCN37yDrjBiTEsdEwX6/8
Yes.
So if an attacker has the right to create functions, a non-DoS attack is possible too.
I accept Laurenz Albe's answer, but is it possible to mount a non-DoS attack without a function?
Ideas?
No, that is not safe. An attacker could put any code into your ORDER BY clause via the _orderby parameter.
For example, you can pass an arbitrary subquery, as long as it returns only a single row: (SELECT max(i) FROM generate_series(1, 100000000000000) AS i). That can easily be used for a denial-of-service attack, if the query is expensive enough. Or, like with this example, you can cause a (brief) out-of-space condition with temporary files.
If you put a function with side effects into the ORDER BY clause, you may also be able to modify data.
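One way to close this hole is to stop accepting a free-form ORDER BY fragment. The following is a sketch only, not taken from the question or the answer: paginated_class_safe and its parameters are hypothetical, and it assumes sorting by a single column is enough. The column name is checked against the catalog and then quoted with format()'s %I, while the sort direction is reduced to a boolean.

```sql
-- Hypothetical variant of paginated_class (names are assumptions).
CREATE OR REPLACE FUNCTION paginated_class_safe(
    _orderby_col text    DEFAULT 'relname',
    _descending  boolean DEFAULT false,
    _limit       int     DEFAULT 10,
    _offset      int     DEFAULT 0)
RETURNS SETOF pg_class
LANGUAGE plpgsql
AS $$
BEGIN
    -- Accept only names that are real, live columns of pg_class.
    IF NOT EXISTS (
        SELECT 1
        FROM pg_attribute
        WHERE attrelid = 'pg_catalog.pg_class'::regclass
          AND attname = _orderby_col
          AND attnum > 0
          AND NOT attisdropped
    ) THEN
        RAISE EXCEPTION 'invalid sort column: %', _orderby_col;
    END IF;

    -- %I quotes the identifier; the direction comes from a fixed set.
    RETURN QUERY EXECUTE format(
        'SELECT * FROM pg_catalog.pg_class ORDER BY %I %s LIMIT $1 OFFSET $2',
        _orderby_col,
        CASE WHEN _descending THEN 'DESC' ELSE 'ASC' END)
    USING _limit, _offset;
END;
$$;

SELECT relname FROM paginated_class_safe('relname', true, 3, 0);
```

With this shape, neither a subquery nor a function call can reach the ORDER BY clause, because anything that is not a plain column name is rejected before the query text is built.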

Postgresql & PgAdmin SIMPLE variable [duplicate]

How do I declare a variable for use in a PostgreSQL 8.3 query?
In MS SQL Server I can do this:
DECLARE @myvar INT
SET @myvar = 5
SELECT *
FROM somewhere
WHERE something = @myvar
How do I do the same in PostgreSQL? According to the documentation variables are declared simply as "name type;", but this gives me a syntax error:
myvar INTEGER;
Could someone give me an example of the correct syntax?
I accomplished the same goal by using a WITH clause. It's nowhere near as elegant, but it can do the same thing, though for this example it's really overkill. I also don't particularly recommend this.
WITH myconstants (var1, var2) as (
values (5, 'foo')
)
SELECT *
FROM somewhere, myconstants
WHERE something = var1
OR something_else = var2;
There is no such feature in PostgreSQL. You can do it only in pl/PgSQL (or other pl/*), but not in plain SQL.
An exception is WITH () query which can work as a variable, or even tuple of variables. It allows you to return a table of temporary values.
WITH master_user AS (
SELECT
login,
registration_date
FROM users
WHERE ...
)
SELECT *
FROM users
WHERE master_login = (SELECT login
FROM master_user)
AND (SELECT registration_date
FROM master_user) > ...;
You could also try this in PLPGSQL:
DO $$
DECLARE myvar integer;
BEGIN
SELECT 5 INTO myvar;
DROP TABLE IF EXISTS tmp_table;
CREATE TABLE tmp_table AS
SELECT * FROM yourtable WHERE id = myvar;
END $$;
SELECT * FROM tmp_table;
The above requires Postgres 9.0 or later.
Dynamic Config Settings
You can "abuse" dynamic config settings for this:
-- choose some prefix that is unlikely to be used by postgres
set session my.vars.id = '1';
select *
from person
where id = current_setting('my.vars.id')::int;
Config settings are always varchar values, so you need to cast them to the correct data type when using them. This works with any SQL client, whereas \set only works in psql.
The above requires Postgres 9.2 or later.
For previous versions, the variable had to be declared in postgresql.conf prior to being used, so it limited its usability somewhat. Actually not the variable completely, but the config "class" which is essentially the prefix. But once the prefix was defined, any variable could be used without changing postgresql.conf
It depends on your client.
However, if you're using the psql client, then you can use the following:
my_db=> \set myvar 5
my_db=> SELECT :myvar + 1 AS my_var_plus_1;
my_var_plus_1
---------------
6
If you are using text variables you need to quote.
\set myvar 'sometextvalue'
select * from sometable where name = :'myvar';
This solution is based on the one proposed by fei0x but it has the advantages that there is no need to join the value list of constants in the query and constants can be easily listed at the start of the query. It also works in recursive queries.
Basically, every constant is a single-value table declared in a WITH clause which can then be called anywhere in the remaining part of the query.
Basic example with two constants:
WITH
constant_1_str AS (VALUES ('Hello World')),
constant_2_int AS (VALUES (100))
SELECT *
FROM some_table
WHERE table_column = (table constant_1_str)
LIMIT (table constant_2_int)
Alternatively, you can use SELECT * FROM constant_name instead of TABLE constant_name, since the TABLE shorthand might not be valid in SQL dialects other than PostgreSQL.
Using a Temp Table outside of pl/PgSQL
Outside of using pl/pgsql or other pl/* language as suggested, this is the only other possibility I could think of.
begin;
select 5::int as var into temp table myvar;
select *
from somewhere s, myvar v
where s.something = v.var;
commit;
I want to propose an improvement to @DarioBarrionuevo's answer that makes it simpler by leveraging temporary tables.
DO $$
DECLARE myvar integer = 5;
BEGIN
CREATE TEMP TABLE tmp_table ON COMMIT DROP AS
-- put here your query with variables:
SELECT *
FROM yourtable
WHERE id = myvar;
END $$;
SELECT * FROM tmp_table;
True, there is no clear and unambiguous way to declare a single-value variable. What you can do is
with myVar as (select 'any value really')
then, to access the value stored in this construction, you do
(select * from myVar)
for example
with var as (select 123)
... where id = (select * from var)
You can also resort to tool-specific features, such as DBeaver's own proprietary syntax:
@set name = 'me'
SELECT :name;
SELECT ${name};
DELETE FROM book b
WHERE b.author_id IN (SELECT a.id FROM author AS a WHERE a.name = :name);
As you will have gathered from the other answers, PostgreSQL doesn’t have this mechanism in straight SQL, though you can now use an anonymous block. However, you can do something similar with a Common Table Expression (CTE):
WITH vars AS (
SELECT 5 AS myvar
)
SELECT *
FROM somewhere,vars
WHERE something = vars.myvar;
You can, of course, have as many variables as you like, and they can also be derived. For example:
WITH vars AS (
SELECT
'1980-01-01'::date AS start,
'1999-12-31'::date AS end,
(SELECT avg(height) FROM customers) AS avg_height
)
SELECT *
FROM customers,vars
WHERE (dob BETWEEN vars.start AND vars.end) AND height<vars.avg_height;
The process is:
Generate a one-row cte using SELECT without a table (in Oracle you will need to include FROM DUAL).
CROSS JOIN the cte with the other table. Although there is a CROSS JOIN syntax, the older comma syntax is slightly more readable.
Note that I have cast the dates to avoid possible issues in the SELECT clause. I used PostgreSQL’s shorter syntax, but you could have used the more formal CAST('1980-01-01' AS date) for cross-dialect compatibility.
Normally, you want to avoid cross joins, but since you’re only cross joining a single row, this has the effect of simply widening the table with the variable data.
In many cases, you don’t need to include the vars. prefix if the names don’t clash with the names in the other table. I include it here to make the point clear.
Also, you can go on to add more CTEs.
This also works in all current versions of MSSQL and MySQL, which do support variables, as well as SQLite which doesn’t, and Oracle which sort of does and sort of doesn’t.
Here is an example using PREPARE statements. You still can't use ?, but you can use $n notation:
PREPARE foo(integer) AS
SELECT *
FROM somewhere
WHERE something = $1;
EXECUTE foo(5);
DEALLOCATE foo;
In DBeaver you can use parameters in queries just like you can from code, so this will work:
SELECT *
FROM somewhere
WHERE something = :myvar
When you run the query DBeaver will ask you for the value for :myvar and run the query.
Here is a code segment using a plain variable in the psql terminal. I have used it a few times, but I still need to figure out a better way. Here I am working with a string variable; with an integer variable you don't need the triple quotes. The triple quotes become single quotes at query time; otherwise you get a syntax error. There might be a way to eliminate the triple quotes for string variables. Please update if you find an improvement.
\set strainname '''B.1.1.7'''
select *
from covid19strain
where name = :strainname ;
In psql, you can use these 'variables' as macros. Note that they get "evaluated" every time they are used, rather than at the time that they are "set".
Simple example:
\set my_random '(SELECT random())'
select :my_random; -- gives 0.23330629315990592
select :my_random; -- gives 0.67458399344433542
This gives a different answer each time.
However, you can still use these as a valuable shorthand to avoid repeating lots of subselects.
\set the_id '(SELECT id FROM table_1 WHERE name = ''xxx'' LIMIT 1)'
and then use it in your queries later as
:the_id
e.g.
INSERT INTO table2 (table1_id,x,y,z) VALUES (:the_id, 1,2,3)
Note that you have to double the single quotes inside the variable's value, because the whole thing is then string-interpolated (i.e. macro-expanded) into your query.

Postgres using functions inside queries

I have a table with common word values to match against brands - so when someone types in "coke" I want to match any possible brand names associated with it as well as the original term.
CREATE TABLE word_association ( commonterm TEXT, assocterm TEXT);
INSERT INTO word_association VALUES ('coke', 'coca-cola'), ('coke', 'cocacola'), ('coke', 'coca cola');
I have a function to create a list of these values in a pipe-delim string for pattern matching:
CREATE OR REPLACE FUNCTION usp_get_search_terms(userterm text)
RETURNS text AS
$BODY$DECLARE
returnstr TEXT DEFAULT '';
BEGIN
SET DATESTYLE TO DMY;
returnstr := userterm;
IF EXISTS (SELECT 1 FROM word_association WHERE LOWER(commonterm) = LOWER(userterm)) THEN
SELECT returnstr || '|' || string_agg(assocterm, '|') INTO returnstr
FROM word_association
WHERE commonterm = userterm;
END IF;
RETURN returnstr;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
ALTER FUNCTION usp_get_search_terms(text)
OWNER TO customer_role;
If you call SELECT * FROM usp_get_search_terms('coke') you end up with
coke|coca-cola|cocacola|coca cola
EDIT: this function runs <100ms so it works fine.
I want to run a query with this text inserted e.g.
SELECT X.article_number, X.online_description
FROM articles X
WHERE LOWER(X.online_description) % usp_get_search_terms ('coke');
This takes approx 56s to run against my table of ~500K records.
If I get the raw text and use it in the query it takes ~300ms e.g.
SELECT X.article_number, X.online_description
FROM articles X
WHERE X.online_description % '(coke|coca-cola|cocacola|coca cola)';
The result sets are identical.
I've tried modifying the output string from the function, e.g. to enclose it in quotes and parentheses, but it doesn't seem to make a difference.
Can someone please advise why there is a difference here? Is it the data type or something about calling functions inside queries? Thanks.
Your function might take 100ms, but it's not calling your function once; it's calling it 500,000 times.
It's because your function is declared VOLATILE. This tells Postgres that either the function returns different values when called multiple times within a query (like clock_timestamp() or random()), or that it alters the state of the database in some way (for example, by inserting records).
If your function contains only SELECTs, with no INSERTs, calls to other VOLATILE functions, or other side-effects, then you can declare it STABLE instead. This tells the planner that it can call the function just once and reuse the result without affecting the outcome of the query.
But your function does have side-effects, due to the SET DATESTYLE statement, which takes effect for the rest of the session. I doubt this was the intention, however. You may be able to remove it, as it doesn't look like date formatting is relevant to anything in there. But if it is necessary, the correct approach is to use the SET clause of the CREATE FUNCTION statement to change it only for the duration of the function call:
...
$BODY$
LANGUAGE plpgsql STABLE
SET DATESTYLE TO DMY
COST 100;
The other issue with the slow version of the query is the call to LOWER(X.online_description), which will prevent the query from utilising the index (since online_description is indexed, but LOWER(online_description) is not).
With these changes, the performance of both queries is the same; see this SQLFiddle.
So the answer came to me about dawn this morning - CTEs to the rescue!
Particularly as this is the "simple" version of a very large query, it helps to get this defined once in isolation, then do the matching against it. The alternative (given I'm calling this from a NodeJS platform) is to have one request retrieve the string of terms, then make another request to pass the string back. Not elegant.
WITH matches AS
( SELECT * FROM usp_get_search_terms('coke') )
, main AS
( SELECT X.article_number, X.online_description
FROM articles X
JOIN matches M ON X.online_description % M.usp_get_search_terms )
SELECT * FROM main
Execution time is somewhere around 300-500ms depending on term searched and articles returned.
Thanks for all your input guys - I've learned a few things about Postgres that my MS-SQL background didn't necessarily prepare me for :)
Have you tried removing the IF EXISTS() and simply using:
SELECT returnstr || '|' || string_agg(assocterm, '|') INTO returnstr
FROM word_association
WHERE LOWER(commonterm) = LOWER(userterm)
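One caveat with dropping the IF EXISTS check: string_agg returns NULL when no row matches, and concatenating NULL with || makes the whole result NULL. A minimal sketch of a workaround, assuming the word_association table from the question (get_search_terms_sketch is a hypothetical name), uses concat_ws, which skips NULL arguments:

```sql
-- Hypothetical rewrite of the question's function (name is an assumption).
-- STABLE is appropriate here because the body only reads data.
CREATE OR REPLACE FUNCTION get_search_terms_sketch(userterm text)
RETURNS text
LANGUAGE plpgsql STABLE AS $$
BEGIN
    RETURN (
        -- concat_ws ignores NULLs, so an unmatched term is returned unchanged
        SELECT concat_ws('|', userterm, string_agg(assocterm, '|'))
        FROM word_association
        WHERE lower(commonterm) = lower(userterm)
    );
END;
$$;
```

Called with a term that has no associations, this returns just the term itself rather than NULL.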
Instead of calling the function for each row, call it once:
select x.article_number, x.online_description
from
woolworths.articles x
cross join
woolworths.usp_get_search_terms ('coke') c (s)
where lower(x.online_description) % s

Executing queries dynamically in PL/pgSQL

I have found solutions (I think) to the problem I'm about to ask for on Oracle and SQL Server, but can't seem to translate this into a Postgres solution. I am using Postgres 9.3.6.
The idea is to be able to generate "metadata" about the table content for profiling purposes. This can only be done (AFAIK) by having queries run for each column so as to find out, say... min/max/count values and such. In order to automate the procedure, it is preferable to have the queries generated by the DB, then executed.
With an example salesdata table, I'm able to generate a select query for each column, returning the min() value, using the following snippet:
SELECT 'SELECT min('||column_name||') as minval_'||column_name||' from salesdata '
FROM information_schema.columns
WHERE table_name = 'salesdata'
The advantage being that the db will generate the code regardless of the number of columns.
Now there's a myriad places I had in mind for storing these queries, either a variable of some sort, or a table column, the idea being to then have these queries execute.
I thought of storing the generated queries in a variable then executing them using the EXECUTE (or EXECUTE IMMEDIATE) statement which is the approach employed here (see right pane), but Postgres won't let me declare a variable outside a function and I've been scratching my head with how this would fit together, whether that's even the direction to follow, perhaps there's something simpler.
Would you have any pointers? I'm currently trying something like this, inspired by this other question, but have no idea whether I'm headed in the right direction:
CREATE OR REPLACE FUNCTION foo()
RETURNS void AS
$$
DECLARE
dyn_sql text;
BEGIN
dyn_sql := SELECT 'SELECT min('||column_name||') from salesdata'
FROM information_schema.columns
WHERE table_name = 'salesdata';
execute dyn_sql
END
$$ LANGUAGE PLPGSQL;
System statistics
Before you roll your own, have a look at the system table pg_statistic or the view pg_stats:
This view allows access only to rows of pg_statistic that correspond
to tables the user has permission to read, and therefore it is safe to
allow public read access to this view.
It might already have some of the statistics you are about to compute. It's populated by ANALYZE, so you might run that for new (or any) tables before checking.
-- ANALYZE tbl; -- optionally, to init / refresh
SELECT * FROM pg_stats
WHERE tablename = 'tbl'
AND schemaname = 'public';
Generic dynamic plpgsql function
You want to return the minimum value for every column in a given table. This is not a trivial task, because a function (like SQL in general) demands to know the return type at creation time - or at least at call time with the help of polymorphic data types.
This function does everything automatically and safely. Works for any table, as long as the aggregate function min() is allowed for every column. But you need to know your way around PL/pgSQL.
CREATE OR REPLACE FUNCTION f_min_of(_tbl anyelement)
RETURNS SETOF anyelement
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE (
SELECT format('SELECT (t::%2$s).* FROM (SELECT min(%1$s) FROM %2$s) t'
, string_agg(quote_ident(attname), '), min(' ORDER BY attnum)
, pg_typeof(_tbl)::text)
FROM pg_attribute
WHERE attrelid = pg_typeof(_tbl)::text::regclass
AND NOT attisdropped -- no dropped (dead) columns
AND attnum > 0 -- no system columns
);
END
$func$;
Call (important!):
SELECT * FROM f_min_of(NULL::tbl); -- tbl being the table name
You need to understand these concepts:
Dynamic SQL in plpgsql with EXECUTE
Polymorphic types
Row types and table types in Postgres
How to defend against SQL injection
Aggregate functions
System catalogs
Related answers with detailed explanations:
Table name as a PostgreSQL function parameter
Refactor a PL/pgSQL function to return the output of various SELECT queries
Postgres data type cast
How to set value of composite variable field using dynamic SQL
How to check if a table exists in a given schema
Select columns with particular column names in PostgreSQL
Generate series of dates - using date type as input
Special difficulty with type mismatch
I am taking advantage of Postgres defining a row type for every existing table. Using the concept of polymorphic types I am able to create one function that works for any table.
However, some aggregate functions return related but different data types as compared to the underlying column. For instance, min(varchar_column) returns text, which is bit-compatible, but not exactly the same data type. PL/pgSQL functions have a weak spot here and insist on data types exactly as declared in the RETURNS clause. No attempt to cast, not even implicit casts, not to speak of assignment casts.
That should be improved. Tested with Postgres 9.3. Did not retest with 9.4, but I am pretty sure, nothing has changed in this area.
That's where this construct comes in as workaround:
SELECT (t::tbl).* FROM (SELECT ... FROM tbl) t;
By casting the whole row to the row type of the underlying table explicitly we force assignment casts to get original data types for every column.
This might fail for some aggregate function. sum() returns numeric for a sum(bigint_column) to accommodate for a sum overflowing the base data type. Casting back to bigint might fail ...
@Erwin Brandstetter, many thanks for the extensive answer. pg_stats does indeed provide a few things, but what I really need to draw a complete profile is a variety of things: min and max values, counts, count of nulls, mean, etc. So a bunch of queries have to be run for each column, some with GROUP BY and such.
Also, thanks for highlighting the importance of data types; I was sort of expecting this to throw a spanner in the works at some point. My main concern was how to automate the query generation and its execution, the latter especially.
I have tried the function you provided (I will probably need to start learning some plpgsql) but get an error at the SELECT (t::tbl):
ERROR: type "tbl" does not exist
By the way, what is the (t::abc) notation called? In Python this would be a list slice, but that's probably not the case in PL/pgSQL.
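For the narrower goal of generating one query per column and executing it, a loop with EXECUTE ... INTO may be enough. The following is a rough sketch under stated assumptions, not the answerer's method: column_minimums is a hypothetical name, every column is assumed to support min(), and results are cast to text to sidestep the return-type problem described above.

```sql
-- Hypothetical profiling helper (name and return shape are assumptions).
CREATE OR REPLACE FUNCTION column_minimums(_tbl regclass)
RETURNS TABLE (col_name text, min_value text)
LANGUAGE plpgsql AS $$
DECLARE
    _col name;
BEGIN
    FOR _col IN
        SELECT attname
        FROM pg_attribute
        WHERE attrelid = _tbl
          AND attnum > 0          -- no system columns
          AND NOT attisdropped    -- no dropped columns
        ORDER BY attnum
    LOOP
        col_name := _col;
        -- %I quotes the column name; %s renders the regclass safely quoted
        EXECUTE format('SELECT min(%I)::text FROM %s', _col, _tbl)
        INTO min_value;
        RETURN NEXT;
    END LOOP;
END;
$$;
```

A call such as SELECT * FROM column_minimums('salesdata') would then return one row per column. Other aggregates (max, count, count of nulls) could be added to the generated query in the same way.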

In postgres (plpgsql), how to make a function that returns select * on a variable table_name?

Basically, at least for proof of concept, I want a function where I can run:
SELECT res('table_name'); and this will give me the results of SELECT * FROM table_name;.
The issue I am having is schema...in the declaration of the function I have:
CREATE OR REPLACE FUNCTION res(table_name TEXT) RETURNS SETOF THISISTHEPROBLEM AS
The problem is that I do not know how to declare my return, as it wants me to specify a table or a schema, and I won't have that until the function is actually run.
Any ideas?
You can do this, but as mentioned before you have to add a column definition list in the SELECT query.
CREATE OR REPLACE FUNCTION res(table_name TEXT) RETURNS SETOF record AS $$
BEGIN
RETURN QUERY EXECUTE 'SELECT * FROM ' || table_name;
END;
$$ LANGUAGE plpgsql;
SELECT * FROM res('sometable') sometable (col1 INTEGER, col2 INTEGER, col3 SMALLINT, col4 TEXT);
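Note that the function above concatenates table_name directly into the query text, which leaves it open to SQL injection. A hedged variant (res_safe is a hypothetical name) takes a regclass parameter instead: the cast fails for anything that is not an existing relation, and format() renders the value as a properly quoted identifier.

```sql
-- Hypothetical variant of res() (name is an assumption).
CREATE OR REPLACE FUNCTION res_safe(_tbl regclass)
RETURNS SETOF record
LANGUAGE plpgsql AS $$
BEGIN
    -- _tbl is output as a quoted (and, if needed, schema-qualified) name
    RETURN QUERY EXECUTE format('SELECT * FROM %s', _tbl);
END;
$$;
```

The caller still has to supply a column definition list, exactly as with the original function.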
Why for any real practical purpose would you just want to pass in table and select * from it? For fun maybe?
You can't do it without defining some kind of known output like jack and rudi show. Or doing it like depesz does here using output parameters http://www.depesz.com/index.php/2008/05/03/waiting-for-84-return-query-execute-and-cursor_tuple_fraction/.
A few hack-around-the-wall approaches are to issue RAISE NOTICEs in a loop and print out the result set one row at a time. Or you could create a function called get_rows_TABLENAME with a definition for every table you want to return; just use code to generate the function definitions. But again, I'm not sure how much value there is in doing a select * from a table, especially with no constraints, other than for fun or making the DBA's blood boil.
Now in SQL Server you can have a stored procedure return a dynamic result set. This is both a blessing and curse as you can't be certain what comes back without looking up the definition. For me I look at PostgreSQL's implementation to be the more sound way to go about it.
Even if you manage to do this (see rudi-moore's answer for a way if you have 8.4 or above), you will have to expand the type explicitly in the select, e.g.:
SELECT res('table_name') as foo(id int,...)