PostgreSQL Set-Returning Function Call Optimization - postgresql

I have the following problem with PostgreSQL 9.3.
There is a view encapsulating a non-trivial query to some resources (e.g., documents). Let's illustrate it as simply as:
CREATE VIEW vw_resources AS
SELECT * FROM documents; -- there are several joined tables in fact...
The client application uses the view usually with some WHERE conditions on several fields, and might also use paging of the results, so OFFSET and LIMIT may also be applied.
Now, on top of the actual resource list computed by vw_resources, I only want to display resources which the current user is allowed to access. There is quite a complex set of rules regarding privileges (they depend on several attributes of the resources in question, explicit ACLs, implicit rules based on user roles or relations to other users...) so I wanted to encapsulate all of them in a single function. To prevent repetitive costly queries for each resource, the function takes a list of resource IDs, evaluates the privileges for all of them at once, and returns the set of the requested resource IDs together with the corresponding privileges (read/write is distinguished). It looks roughly like this:
CREATE FUNCTION list_privileges(resource_ids BIGINT[])
RETURNS TABLE (resource_id BIGINT, privilege TEXT)
AS $function$
BEGIN
-- the function lists privileges for a user that would get passed in an argument - omitting that for simplicity
RAISE NOTICE 'list_privileges called'; -- for diagnostic purposes
-- for illustration, let's simply grant write privileges for any odd resource:
RETURN QUERY SELECT id, (CASE WHEN id % 2 = 1 THEN 'write' ELSE 'none' END)
FROM unnest(resource_ids) id;
END;
$function$ LANGUAGE plpgsql STABLE;
The question is how to integrate such a function in the vw_resources view for it to give only resources the user is privileged for (i.e., has 'read' or 'write' privilege).
A trivial solution would use a CTE:
CREATE VIEW vw_resources AS
WITH base_data AS (
SELECT * FROM documents
)
SELECT base_data.*, priv.privilege
FROM base_data
JOIN list_privileges((SELECT array_agg(resource_id) FROM base_data)) AS priv USING (resource_id)
WHERE privilege IN ('read', 'write');
The problem is that the view itself gives too many rows - some WHERE conditions and OFFSET/LIMIT clauses are only applied to the view itself, like SELECT * FROM vw_resources WHERE resource_id IN (1,2,3) LIMIT 10 (any complex filtering might be requested by the client application). And since PostgreSQL is unable to push the conditions down into the CTE, the list_privileges(BIGINT[]) function ends up evaluating privileges for all resources in the database, which effectively kills the performance.
So I attempted to use a window function which would collect resource IDs from the whole result set and join the list_privileges(BIGINT[]) function in an outer query, as illustrated below, but the list_privileges(BIGINT[]) function ends up being called repeatedly, once for each row (as shown by the 'list_privileges called' notices), which defeats the purpose of the batching:
CREATE VIEW vw_resources AS
SELECT d.*, priv.privilege
FROM (
SELECT *, array_agg(resource_id) OVER () AS collected
FROM documents
) AS d
JOIN list_privileges(d.collected) AS priv USING (resource_id)
WHERE privilege IN ('read', 'write');
I could resort to forcing clients to issue two separate queries, the first reading vw_resources without privileges applied, the second calling the list_privileges(BIGINT[]) function with the list of resource IDs fetched by the first query, and then filtering out the disallowed resources on the client side. It is quite clumsy for the client, though, and obtaining e.g. the first 20 allowed resources would be practically impossible, as limiting the first query simply does not cut it - if some resources are filtered out due to privileges then we simply don't have 20 rows in the overall result...
Any help welcome!
P.S. For the sake of completeness, I append a sample documents table:
CREATE TABLE documents (resource_id BIGINT, content TEXT);
INSERT INTO documents VALUES (1,'a'),(2,'b'),(3,'c');

If you must use plpgsql then create the function taking no arguments
create function list_privileges()
returns table (resource_id bigint, privilege text)
as $function$
begin
raise notice 'list_privileges called'; -- for diagnostic purposes
-- the cast matters: RETURN QUERY expects bigint here, and a bare 1 is integer
return query select 1::bigint, case when 1 % 2 = 1 then 'write' else 'none' end;
end;
$function$ language plpgsql stable;
And join it to the other complex query to form the vw_resources view
create view vw_resources as
select *
from
documents d
inner join
list_privileges() using(resource_id)
The filter conditions will be added at query time
select *
from vw_resources
where
resource_id in (1,2,3)
and
privilege in ('read', 'write')
Let the planner do its optimization magic and check the explain output before any "premature optimization".
This is just a conjecture: The function might make it harder or impossible for the planner to optimize.
If plpgsql is not really necessary, which is often the case, I would just create a view instead of the function
create view vw_list_privileges as
select
1 as resource_id,
case when 1 % 2 = 1 then 'write' else 'none' end as privilege
And join it the same way to the complex query
create view vw_resources as
select *
from
documents d
inner join
vw_list_privileges using(resource_id)
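To follow up on "check the explain output": a minimal sketch, assuming vw_resources and list_privileges() have been created as in this answer and the sample documents table from the question exists. It only shows how to look at the plan; what the plan will contain depends on your version and data.

```sql
-- ANALYZE actually runs the query, so the RAISE NOTICE inside
-- list_privileges() fires and you can count how often it is called.
-- VERBOSE additionally prints the output columns of each plan node.
EXPLAIN (ANALYZE, VERBOSE)
SELECT *
FROM vw_resources
WHERE resource_id IN (1, 2, 3)
  AND privilege IN ('read', 'write');
```

In the output, look for the "Function Scan on list_privileges" node and its "loops" value, and compare that with the number of 'list_privileges called' notices.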

Related

PostgreSQL. Is such function vulnerable to SQL injection or is it safe?

Functions that look problematic
I'm exploring postgresql database and I see a recurring pattern:
CREATE OR REPLACE FUNCTION paginated_class(_orderby text DEFAULT NULL, _limit int DEFAULT 10, _offset int DEFAULT 0)
RETURNS SETOF pg_class
LANGUAGE PLPGSQL
AS $$
BEGIN
return query execute '
select * from pg_class
order by ' || coalesce(_orderby, 'relname ASC') || '
limit $1 offset $2
'
USING _limit, _offset;
END;
$$;
Sample usage:
SELECT * FROM paginated_class(_orderby:='reltype DESC, relowner ASC ')
The recurring elements are:
_orderby is passed as text. It could be any combination of fields of returned SETOF type. E.g. 'relname ASC, reltype DESC'
_orderby parameter is not sanitized or checked in any way
_limit and _offset are integers
DB Fiddle for that: https://www.db-fiddle.com/f/vF6bCN37yDrjBiTEsdEwX6/1
Question: is such function vulnerable to SQL injection or not?
On the surface, such a function looks vulnerable to SQL injection.
But all my attempts to find a combination of parameters that exploits it have failed.
E.g.
CREATE TABLE T(id int);
SELECT * FROM paginated_class(_orderby:='reltype; DROP TABLE T; SELECT * FROM pg_class');
will return "Query Error: error: cannot open multi-query plan as cursor".
I did not find a way to exploit the vulnerability, if it exists, with UPDATE/INSERT/DELETE.
So can we conclude that such function is actually safe?
If so: then why?
UPDATE. Possible plan for attack
Maybe I was not clear: I'm not asking about general guidelines but for experimental exploit of vulnerability or proof that such exploit is not possible.
DB Fiddle for this: https://www.db-fiddle.com/f/vF6bCN37yDrjBiTEsdEwX6/4 (or you can provide other of course)
My conclusions so far
A. Such an attack could be possible if _orderby had these parts:
1. SQL code that suppresses the output of the first SELECT
2. something harmful
3. select * from pg_class, so that it satisfies RETURNS SETOF pg_class
E.g.
SELECT * FROM paginated_class(_orderby:='relname; DELETE FROM my_table; SELECT * FROM pg_class')
Parts 2 and 3 are easy. I don't know a way to do part 1.
This will generate: "error: cannot open multi-query plan as cursor"
B. If it's not possible to suppress first SELECT
Then:
every PostgreSQL function runs inside a transaction
because of the error, this transaction will be rolled back
there is no autonomous transactions as in Oracle
for non-transactional ops: I know only about sequence-related ops
everything else be that DML or DDL is transactional
So? Can we conclude that such function is actually safe?
Or I'm missing something?
UPDATE 2. Attack using prepared function
From answer https://stackoverflow.com/a/69189090/1168212
A. A denial-of-service attack is possible by putting an expensive calculation into the ORDER BY clause
B. Side-effects:
If you put a function with side effects into the ORDER BY clause, you may also be able to modify data.
Let's try the latter:
CREATE FUNCTION harmful_fn()
RETURNS bool
LANGUAGE SQL
AS '
DELETE FROM my_table;
SELECT true;
';
SELECT * FROM paginated_class(_orderby:='harmful_fn()', _limit:=1);
https://www.db-fiddle.com/f/vF6bCN37yDrjBiTEsdEwX6/8
Yes.
So if an attacker has right to create functions: non-DOS attack is possible too.
I accept Laurenz Albe answer but: is it possible to do non-DOS attack without function?
Ideas?
No, that is not safe. An attacker could put any code into your ORDER BY clause via the _orderby parameter.
For example, you can pass an arbitrary subquery, as long as it returns only a single row: (SELECT max(i) FROM generate_series(1, 100000000000000) AS i). That can easily be used for a denial-of-service attack, if the query is expensive enough. Or, like with this example, you can cause a (brief) out-of-space condition with temporary files.
If you put a function with side effects into the ORDER BY clause, you may also be able to modify data.
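One way to close the hole is to stop concatenating free text into the query. The following is only a sketch under assumptions, not the original function: paginated_class_safe and its _sort_col / _sort_dir parameters are hypothetical names, and it deliberately supports a single sort column instead of an arbitrary ORDER BY list. The column name goes through format() with %I, so it is always treated as one quoted identifier, and the direction is checked against the two allowed keywords.

```sql
-- Hypothetical hardened variant: one sort column plus a validated direction.
CREATE OR REPLACE FUNCTION paginated_class_safe(
    _sort_col text DEFAULT 'relname',
    _sort_dir text DEFAULT 'ASC',
    _limit    int  DEFAULT 10,
    _offset   int  DEFAULT 0)
RETURNS SETOF pg_class
LANGUAGE plpgsql AS
$$
BEGIN
    -- Accept only the two keywords; anything else is rejected outright.
    IF _sort_dir IS NULL OR upper(_sort_dir) NOT IN ('ASC', 'DESC') THEN
        RAISE EXCEPTION 'invalid sort direction: %', _sort_dir;
    END IF;

    RETURN QUERY EXECUTE format(
        'SELECT * FROM pg_class ORDER BY %I %s LIMIT $1 OFFSET $2',
        _sort_col,           -- %I quotes the value as an identifier
        upper(_sort_dir))    -- validated above
    USING _limit, _offset;   -- values are passed as parameters, not text
END;
$$;

-- Usage:
SELECT * FROM paginated_class_safe('reltype', 'DESC', 5, 0);
```

With this shape, passing something like 'harmful_fn()' as _sort_col is interpreted as a column named "harmful_fn()", which does not exist, so the statement fails instead of calling the function.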

Are multiple calls to a STABLE function optimized across CTEs?

According to PostgreSQL: Documentation: 38.7. Function Volatility Categories:
A STABLE function cannot modify the database and is guaranteed to return the same results given the same arguments for all rows within a single statement. This category allows the optimizer to optimize multiple calls of the function to a single call.
What about when using CTEs? If I call a STABLE function inside a CTE and then again inside the primary SELECT query, will the optimizer optimize both calls of the function to a single call?
How can you tell? (I don't know how to use EXPLAIN.)
In the impractical example below, I want to make sure that the function get_user_by_id() is only called once.
CREATE TABLE users (
id bigserial PRIMARY KEY,
username varchar(64) NOT NULL
);
CREATE FUNCTION get_user_by_id(_id bigint) RETURNS users AS $$
SELECT *
FROM users
WHERE id = _id
$$ LANGUAGE SQL STABLE;
INSERT INTO users (username) VALUES ('user1');
WITH error AS (
SELECT -1 AS code
WHERE (SELECT get_user_by_id(3) IS NULL)
)
SELECT code AS id, NULL AS username FROM error
UNION ALL
SELECT * FROM get_user_by_id(3) WHERE id IS NOT NULL;
I think the answer is no, multiple calls to a STABLE function are not optimized across CTEs. I don't know how to prove it, but according to PostgreSQL: Documentation: 11: 7.8. WITH Queries (Common Table Expressions), CTEs seem to be separate "auxiliary statements". And, they don't seem to be well-integrated with the parent query. For example:
. . . the optimizer is less able to push restrictions from the parent query down into a WITH query than an ordinary subquery. The WITH query will generally be evaluated as written, without suppression of rows that the parent query might discard afterwards.
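As for "how can you tell" without reading EXPLAIN output: one rough way is to make the calls visible. The sketch below assumes the users table from the question; get_user_by_id_traced is a hypothetical PL/pgSQL stand-in for get_user_by_id() that raises a notice every time it is executed, so you can simply count the notices.

```sql
-- Hypothetical traced copy of get_user_by_id(); same volatility (STABLE).
CREATE FUNCTION get_user_by_id_traced(_id bigint) RETURNS users AS $$
DECLARE
    result users;
BEGIN
    RAISE NOTICE 'get_user_by_id_traced(%) called', _id;
    SELECT * INTO result FROM users WHERE id = _id;
    RETURN result;  -- all fields are NULL when no row matched
END;
$$ LANGUAGE plpgsql STABLE;

-- The query from the question, using the traced function in both places.
WITH error AS (
    SELECT -1 AS code
    WHERE (SELECT get_user_by_id_traced(3) IS NULL)
)
SELECT code AS id, NULL AS username FROM error
UNION ALL
SELECT * FROM get_user_by_id_traced(3) WHERE id IS NOT NULL;
```

Each NOTICE line printed by the client corresponds to one actual execution of the function. Keep in mind that a PL/pgSQL function is never inlined the way a simple SQL function can be, so this measures call counts, not the exact plan of the original.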

Hana - Select without Cursor

I am doing a select in a stored procedure with a cursor. I would like to know if I could do the same select without using a cursor.
PROCEDURE "DOUGLAS"."tmp.douglas.yahoo::testando" ( )
LANGUAGE SQLSCRIPT
SQL SECURITY INVOKER
DEFAULT SCHEMA DOUGLAS
AS
BEGIN
/*****************************
procedure logic
*****************************/
declare R varchar(50);
declare cursor users FOR select * from USERS WHERE CREATE_TIME between ADD_SECONDS (CURRENT_TIMESTAMP , -7200 ) and CURRENT_TIMESTAMP;
FOR R AS users DO
CALL _SYS_REPO.GRANT_ACTIVATED_ROLE('dux.health.model.roles::finalUser',R.USER_NAME);
END FOR;
END;
Technically you could convert the result set into an ARRAY and then loop over the array - but for what?
The main problem is that you want to automatically grant permissions to any users that match your time-based WHERE condition.
This is not a good idea in most scenarios.
The point of avoiding cursors is to allow the DBMS to optimize SQL commands: you tell the DBMS what data you want, not how to produce it.
In this example it really wouldn't make any difference performance wise.
A much more important factor is that you run SELECT * even though you only need the USER_NAME and that your R variable is declared as VARCHAR(50) (which is wrong if you wanted to store the USER_NAME in it) but never actually used.
The R variable in the FOR loop exists in a different validity context and actually contains the current row of the cursor.

Executing queries dynamically in PL/pgSQL

I have found solutions (I think) to the problem I'm about to ask for on Oracle and SQL Server, but can't seem to translate this into a Postgres solution. I am using Postgres 9.3.6.
The idea is to be able to generate "metadata" about the table content for profiling purposes. This can only be done (AFAIK) by having queries run for each column so as to find out, say... min/max/count values and such. In order to automate the procedure, it is preferable to have the queries generated by the DB, then executed.
With an example salesdata table, I'm able to generate a select query for each column, returning the min() value, using the following snippet:
SELECT 'SELECT min('||column_name||') as minval_'||column_name||' from salesdata '
FROM information_schema.columns
WHERE table_name = 'salesdata'
The advantage being that the db will generate the code regardless of the number of columns.
Now there's a myriad places I had in mind for storing these queries, either a variable of some sort, or a table column, the idea being to then have these queries execute.
I thought of storing the generated queries in a variable then executing them using the EXECUTE (or EXECUTE IMMEDIATE) statement which is the approach employed here (see right pane), but Postgres won't let me declare a variable outside a function and I've been scratching my head with how this would fit together, whether that's even the direction to follow, perhaps there's something simpler.
Would you have any pointers, I'm currently trying something like this, inspired by this other question but have no idea whether I'm headed in the right direction:
CREATE OR REPLACE FUNCTION foo()
RETURNS void AS
$$
DECLARE
dyn_sql text;
BEGIN
dyn_sql := SELECT 'SELECT min('||column_name||') from salesdata'
FROM information_schema.columns
WHERE table_name = 'salesdata';
execute dyn_sql
END
$$ LANGUAGE PLPGSQL;
System statistics
Before you roll your own, have a look at the system table pg_statistic or the view pg_stats:
This view allows access only to rows of pg_statistic that correspond
to tables the user has permission to read, and therefore it is safe to
allow public read access to this view.
It might already have some of the statistics you are about to compute. It's populated by ANALYZE, so you might run that for new (or any) tables before checking.
-- ANALYZE tbl; -- optionally, to init / refresh
SELECT * FROM pg_stats
WHERE tablename = 'tbl'
AND schemaname = 'public';
Generic dynamic plpgsql function
You want to return the minimum value for every column in a given table. This is not a trivial task, because a function (like SQL in general) demands to know the return type at creation time - or at least at call time with the help of polymorphic data types.
This function does everything automatically and safely. Works for any table, as long as the aggregate function min() is allowed for every column. But you need to know your way around PL/pgSQL.
CREATE OR REPLACE FUNCTION f_min_of(_tbl anyelement)
RETURNS SETOF anyelement
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE (
SELECT format('SELECT (t::%2$s).* FROM (SELECT min(%1$s) FROM %2$s) t'
, string_agg(quote_ident(attname), '), min(' ORDER BY attnum)
, pg_typeof(_tbl)::text)
FROM pg_attribute
WHERE attrelid = pg_typeof(_tbl)::text::regclass
AND NOT attisdropped -- no dropped (dead) columns
AND attnum > 0 -- no system columns
);
END
$func$;
Call (important!):
SELECT * FROM f_min_of(NULL::tbl); -- tbl being the table name
You need to understand these concepts:
Dynamic SQL in plpgsql with EXECUTE
Polymorphic types
Row types and table types in Postgres
How to defend against SQL injection
Aggregate functions
System catalogs
Related answer with detailed explanation:
Table name as a PostgreSQL function parameter
Refactor a PL/pgSQL function to return the output of various SELECT queries
Postgres data type cast
How to set value of composite variable field using dynamic SQL
How to check if a table exists in a given schema
Select columns with particular column names in PostgreSQL
Generate series of dates - using date type as input
Special difficulty with type mismatch
I am taking advantage of Postgres defining a row type for every existing table. Using the concept of polymorphic types I am able to create one function that works for any table.
However, some aggregate functions return related but different data types as compared to the underlying column. For instance, min(varchar_column) returns text, which is bit-compatible, but not exactly the same data type. PL/pgSQL functions have a weak spot here and insist on data types exactly as declared in the RETURNS clause. No attempt to cast, not even implicit casts, not to speak of assignment casts.
That should be improved. Tested with Postgres 9.3. Did not retest with 9.4, but I am pretty sure, nothing has changed in this area.
That's where this construct comes in as workaround:
SELECT (t::tbl).* FROM (SELECT ... FROM tbl) t;
By casting the whole row to the row type of the underlying table explicitly we force assignment casts to get original data types for every column.
This might fail for some aggregate function. sum() returns numeric for a sum(bigint_column) to accommodate for a sum overflowing the base data type. Casting back to bigint might fail ...
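If one result row per column is acceptable, the type-mismatch problem can be sidestepped by returning everything as text. This is a simpler, less general sketch and not the function above: f_column_minima and its output column names are hypothetical, and it assumes min() is defined for every column type in the table (it will raise an error otherwise).

```sql
-- Hypothetical helper: one (column_name, min_value) row per table column.
CREATE OR REPLACE FUNCTION f_column_minima(_tbl regclass)
RETURNS TABLE (column_name text, min_value text)
LANGUAGE plpgsql AS
$func$
DECLARE
    _col name;
BEGIN
    FOR _col IN
        SELECT attname
        FROM pg_attribute
        WHERE attrelid = _tbl
          AND attnum > 0          -- no system columns
          AND NOT attisdropped    -- no dropped columns
        ORDER BY attnum
    LOOP
        column_name := _col;
        -- %I quotes the column name; a regclass value prints as a
        -- properly quoted (and, if needed, schema-qualified) table name.
        EXECUTE format('SELECT min(%I)::text FROM %s', _col, _tbl)
        INTO min_value;
        RETURN NEXT;
    END LOOP;
END
$func$;

-- Usage, with the salesdata table from the question:
SELECT * FROM f_column_minima('salesdata');
```

This runs one query per column, so it scans the table repeatedly; the single-query approach above is cheaper for large tables.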
@Erwin Brandstetter, many thanks for the extensive answer. pg_stats does indeed provide a few things, but what I really need in order to draw a complete profile is a variety of things: min and max values, counts, count of nulls, mean, etc., so a bunch of queries have to be run for each column, some with GROUP BY and such.
Also, thanks for highlighting the importance of data types. I was sort of expecting this to throw a spanner in the works at some point; my main concern was how to automate the query generation and, above all, its execution.
I have tried the function you provide (I will probably need to start learning some plpgsql) but get an error at the SELECT (t::tbl):
ERROR: type "tbl" does not exist
By the way, what is the (t::abc) notation called? In Python this would be a list slice, but that's probably not the case in PL/pgSQL.
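For what it's worth, tbl in the answer stands for the table name, as the comment on the call says, so the error is expected when it is typed literally. And :: is PostgreSQL's cast operator, not a slice. A minimal sketch, assuming f_min_of() from the answer has been created and salesdata is the table from the question:

```sql
-- Pass a NULL of the table's row type; the function derives the table from it.
SELECT * FROM f_min_of(NULL::salesdata);

-- The same call written with the standard-SQL CAST syntax.
SELECT * FROM f_min_of(CAST(NULL AS salesdata));
```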

How to ensure all Postgres queries have WHERE clause?

I am building a multi tenant system in which many clients data will be in the same database.
I am paranoid about some developer forgetting to put the appropriate "WHERE clientid = " onto every query.
Is there a way to, at the database level, ensure that every query has the correct WHERE = clause, thereby ensuring that no query will ever be executed without also specifying which client the query is for?
I was wondering if maybe the query rewrite rules could do this but it's not clear to me if they can do so.
thanks
Deny permissions on the table t for all users. Then give them permission on a function f that returns the table and accepts the parameter client_id:
create or replace function f(_client_id integer)
returns setof t as
$$
select *
from t
where client_id = _client_id
$$ language sql
;
select * from f(1);
client_id | v
-----------+---
1 | 2
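Note that for users without privileges on t to be able to call f, the function has to run with its owner's privileges. A sketch of the full setup under assumptions: app_user is a hypothetical role, and t is assumed to live in the public schema.

```sql
-- Take direct table access away (app_user is a hypothetical role).
REVOKE ALL ON t FROM PUBLIC;
REVOKE ALL ON t FROM app_user;

-- SECURITY DEFINER makes the function run with the privileges of its owner,
-- who must still be allowed to read t. Pinning search_path is the usual
-- precaution for such functions.
CREATE OR REPLACE FUNCTION f(_client_id integer)
RETURNS SETOF t AS
$$
    SELECT *
    FROM t
    WHERE client_id = _client_id
$$ LANGUAGE sql STABLE SECURITY DEFINER
   SET search_path = public, pg_temp;

-- Functions are executable by PUBLIC by default; restrict that explicitly.
REVOKE ALL ON FUNCTION f(integer) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION f(integer) TO app_user;
```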
Another way is to create a VIEW for:
SELECT *
FROM t
WHERE t.client_id = current_setting('session_vars.client_id')::integer; -- current_setting() returns text
And use SET session_vars.client_id = 1234 at the start of the session.
Deny access to the tables, and leave only permissions for the views.
You may need to create rewrite rules for UPDATE, DELETE, INSERT for the views (it depends on your PostgreSQL version).
Performance penalty will be small (if any) because PostgreSQL will rewrite the queries before execution.
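Put together, the view approach might look like the sketch below. The names t_for_client and app_user are hypothetical; the cast to integer is there because current_setting() returns text. A view reads its underlying table with the view owner's privileges, which is why the role needs no rights on t itself.

```sql
-- Hypothetical view exposing only the current client's rows.
CREATE VIEW t_for_client AS
SELECT *
FROM t
WHERE t.client_id = current_setting('session_vars.client_id')::integer;

-- app_user (hypothetical role) may read the view but not the table.
REVOKE ALL ON t FROM app_user;
GRANT SELECT ON t_for_client TO app_user;

-- At the start of each session:
SET session_vars.client_id = '1234';
SELECT * FROM t_for_client;
```

Be aware that a session can issue SET again with a different value, so this guards against forgotten WHERE clauses rather than against a client that deliberately switches tenants.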