Why is SELECT without columns valid - postgresql

I accidentally wrote a query like select from my_table; and surprisingly it is a valid statement. Even more interesting to me is that even SELECT; is a valid query in PostgreSQL. You can write a lot of funny queries with this:
select union all select;
with t as (select) select;
select from (select) a, (select) b;
select where exists (select);
create table a (b int); with t as (select) insert into a (select from t);
Is this a consequence of some definition in the SQL standard, is there some use case for it, or is it just funny behavior that no one cared to programmatically restrict?

Right from the manual:
The list of output expressions after SELECT can be empty, producing a zero-column result table. This is not valid syntax according to the SQL standard. PostgreSQL allows it to be consistent with allowing zero-column tables. However, an empty list is not allowed when DISTINCT is used.
The possibility of "zero-column" tables is a side effect of table inheritance, if I'm not mistaken. There were discussions about this on the Postgres mailing lists (but I can't find them right now).
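You can produce a zero-column table yourself by dropping its last column, and the empty select list is then the natural way to query it (a quick illustration, not from the original discussion):

CREATE TABLE zero_cols (a int);
INSERT INTO zero_cols VALUES (1), (2);
ALTER TABLE zero_cols DROP COLUMN a;  -- leaves a table with zero columns

SELECT FROM zero_cols;  -- valid: two rows, zero columns each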

Related

PostgreSQL. Is such function vulnerable to SQL injection or is it safe?

Functions that look problematic
I'm exploring a PostgreSQL database and I see a recurring pattern:
CREATE OR REPLACE FUNCTION paginated_class(_orderby text DEFAULT NULL, _limit int DEFAULT 10, _offset int DEFAULT 0)
RETURNS SETOF pg_class
LANGUAGE plpgsql
AS $$
BEGIN
RETURN QUERY EXECUTE '
select * from pg_class
order by ' || coalesce(_orderby, 'relname ASC') || '
limit $1 offset $2'
USING _limit, _offset;
END;
$$;
Sample usage:
SELECT * FROM paginated_class(_orderby:='reltype DESC, relowner ASC ')
The recurring pattern is:
_orderby is passed as text. It can be any combination of fields of the returned SETOF type, e.g. 'relname ASC, reltype DESC'
the _orderby parameter is not sanitized or checked in any way
_limit and _offset are integers
DB Fiddle for that: https://www.db-fiddle.com/f/vF6bCN37yDrjBiTEsdEwX6/1
Question: is such function vulnerable to SQL injection or not?
From the outside it is natural to suspect that such a function is vulnerable to SQL injection.
But all my attempts to find a combination of parameters that exploits it have failed.
E.g.
CREATE TABLE T(id int);
SELECT * FROM paginated_class(_orderby:='reltype; DROP TABLE T; SELECT * FROM pg_class');
will return "Query Error: error: cannot open multi-query plan as cursor".
I have not found a way to exploit the vulnerability, if it exists, with UPDATE/INSERT/DELETE.
So can we conclude that such a function is actually safe?
If so: why?
UPDATE. Possible plan of attack
Maybe I was not clear: I'm not asking for general guidelines, but for an experimental exploit of the vulnerability, or proof that such an exploit is not possible.
DB Fiddle for this: https://www.db-fiddle.com/f/vF6bCN37yDrjBiTEsdEwX6/4 (or you can provide another one, of course)
My conclusions so far
A. Such an attack could be possible if _orderby consisted of three parts:
SQL code that suppresses the output of the first SELECT
something harmful
select * from pg_class, so that it satisfies RETURNS SETOF pg_class
E.g.
SELECT * FROM paginated_class(_orderby:='relname; DELETE FROM my_table; SELECT * FROM pg_class')
Parts 2 and 3 are easy; I don't know a way to do part 1.
This will generate: "error: cannot open multi-query plan as cursor"
B. If it's not possible to suppress the first SELECT
Then:
every PostgreSQL function runs in a single transaction
because of the error, this transaction will be rolled back
there are no autonomous transactions as in Oracle
the only non-transactional operations I know of are sequence-related
everything else, be it DML or DDL, is transactional
So, can we conclude that such a function is actually safe?
Or am I missing something?
UPDATE 2. Attack using a prepared function
From answer https://stackoverflow.com/a/69189090/1168212
A. It's possible to mount a denial-of-service attack by injecting an expensive calculation.
B. Side-effects:
If you put a function with side effects into the ORDER BY clause, you may also be able to modify data.
Let's try the latter:
CREATE FUNCTION harmful_fn()
RETURNS bool
LANGUAGE SQL
AS '
DELETE FROM my_table;
SELECT true;
';
SELECT * FROM paginated_class(_orderby:='harmful_fn()', _limit:=1);
https://www.db-fiddle.com/f/vF6bCN37yDrjBiTEsdEwX6/8
Yes.
So if an attacker has the right to create functions, a non-DoS attack is possible too.
I accepted Laurenz Albe's answer, but: is it possible to do a non-DoS attack without a function?
Ideas?
No, that is not safe. An attacker could put any code into your ORDER BY clause via the _orderby parameter.
For example, you can pass an arbitrary subquery, as long as it returns only a single row: (SELECT max(i) FROM generate_series(1, 100000000000000) AS i). That can easily be used for a denial-of-service attack, if the query is expensive enough. Or, like with this example, you can cause a (brief) out-of-space condition with temporary files.
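Plugged into the function from the question, that could look like this (my rendering of the call, based on the subquery above):

SELECT * FROM paginated_class(
    _orderby := '(SELECT max(i) FROM generate_series(1, 100000000000000) AS i)'
);
-- generated query: select * from pg_class
--                  order by (SELECT max(i) FROM ...) limit $1 offset $2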
If you put a function with side effects into the ORDER BY clause, you may also be able to modify data.
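If the ordering really must come from the caller, one common remedy is to validate it instead of interpolating it blindly. A sketch of mine, not part of the original answer, restricted to a single sort column:

CREATE OR REPLACE FUNCTION paginated_class_safe(
    _orderby text DEFAULT 'relname',
    _dir     text DEFAULT 'ASC',
    _limit   int  DEFAULT 10,
    _offset  int  DEFAULT 0)
RETURNS SETOF pg_class
LANGUAGE plpgsql AS
$$
BEGIN
   -- accept only an existing column of pg_class and a fixed direction keyword
   IF _orderby IS NULL
      OR _orderby NOT IN (SELECT attname::text FROM pg_attribute
                          WHERE attrelid = 'pg_catalog.pg_class'::regclass
                            AND attnum > 0 AND NOT attisdropped)
      OR upper(_dir) NOT IN ('ASC', 'DESC') THEN
      RAISE EXCEPTION 'invalid ORDER BY specification';
   END IF;
   -- %I quotes the identifier; LIMIT/OFFSET stay as bound parameters
   RETURN QUERY EXECUTE format(
      'SELECT * FROM pg_class ORDER BY %I %s LIMIT $1 OFFSET $2',
      _orderby, upper(_dir))
   USING _limit, _offset;
END
$$;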

Executing queries dynamically in PL/pgSQL

I have found solutions (I think) to the problem I'm about to ask for on Oracle and SQL Server, but can't seem to translate this into a Postgres solution. I am using Postgres 9.3.6.
The idea is to be able to generate "metadata" about the table content for profiling purposes. This can only be done (AFAIK) by having queries run for each column so as to find out, say... min/max/count values and such. In order to automate the procedure, it is preferable to have the queries generated by the DB, then executed.
With an example salesdata table, I'm able to generate a select query for each column, returning the min() value, using the following snippet:
SELECT 'SELECT min('||column_name||') as minval_'||column_name||' from salesdata '
FROM information_schema.columns
WHERE table_name = 'salesdata'
The advantage being that the db will generate the code regardless of the number of columns.
Now there are myriad places I had in mind for storing these queries, either a variable of some sort or a table column, the idea being to then have these queries executed.
I thought of storing the generated queries in a variable and then executing them using the EXECUTE (or EXECUTE IMMEDIATE) statement, which is the approach employed here (see right pane). But Postgres won't let me declare a variable outside a function, and I've been scratching my head over how this would fit together, whether that's even the direction to follow, or whether there's something simpler.
Would you have any pointers? I'm currently trying something like this, inspired by this other question, but have no idea whether I'm headed in the right direction:
CREATE OR REPLACE FUNCTION foo()
RETURNS void AS
$$
DECLARE
dyn_sql text;
BEGIN
SELECT 'SELECT min(' || column_name || ') FROM salesdata'
INTO dyn_sql
FROM information_schema.columns
WHERE table_name = 'salesdata'
LIMIT 1; -- without LIMIT 1 this returns one row per column
EXECUTE dyn_sql;
END
$$ LANGUAGE plpgsql;
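For reference, an anonymous DO block can hold such a variable and run EXECUTE without creating a permanent function; a minimal sketch, assuming the salesdata table from above:

DO $$
DECLARE
   q text;
BEGIN
   FOR q IN
      SELECT 'SELECT min(' || quote_ident(column_name) || ') FROM salesdata'
      FROM information_schema.columns
      WHERE table_name = 'salesdata'
   LOOP
      EXECUTE q;  -- runs each generated query; a DO block discards the results
   END LOOP;
END
$$;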
System statistics
Before you roll your own, have a look at the system table pg_statistic or the view pg_stats:
This view allows access only to rows of pg_statistic that correspond
to tables the user has permission to read, and therefore it is safe to
allow public read access to this view.
It might already have some of the statistics you are about to compute. It's populated by ANALYZE, so you might run that for new (or any) tables before checking.
-- ANALYZE tbl; -- optionally, to init / refresh
SELECT * FROM pg_stats
WHERE tablename = 'tbl'
AND schemaname = 'public';
Generic dynamic plpgsql function
You want to return the minimum value for every column in a given table. This is not a trivial task, because a function (like SQL in general) demands to know the return type at creation time - or at least at call time with the help of polymorphic data types.
This function does everything automatically and safely. Works for any table, as long as the aggregate function min() is allowed for every column. But you need to know your way around PL/pgSQL.
CREATE OR REPLACE FUNCTION f_min_of(_tbl anyelement)
RETURNS SETOF anyelement
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE (
SELECT format('SELECT (t::%2$s).* FROM (SELECT min(%1$s) FROM %2$s) t'
, string_agg(quote_ident(attname), '), min(' ORDER BY attnum)
, pg_typeof(_tbl)::text)
FROM pg_attribute
WHERE attrelid = pg_typeof(_tbl)::text::regclass
AND NOT attisdropped -- no dropped (dead) columns
AND attnum > 0 -- no system columns
);
END
$func$;
Call (important!):
SELECT * FROM f_min_of(NULL::tbl); -- tbl being the table name
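For instance, with a small test table (hypothetical, just to show the call pattern):

CREATE TABLE tbl (id int, created timestamptz, name text);
INSERT INTO tbl VALUES
  (2, now(), 'b')
, (1, now() - interval '1 day', 'a');

SELECT * FROM f_min_of(NULL::tbl);
-- one row with the minimum of every column: (1, <yesterday>, 'a')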
You need to understand these concepts:
Dynamic SQL in plpgsql with EXECUTE
Polymorphic types
Row types and table types in Postgres
How to defend against SQL injection
Aggregate functions
System catalogs
Related answers with detailed explanations:
Table name as a PostgreSQL function parameter
Refactor a PL/pgSQL function to return the output of various SELECT queries
Postgres data type cast
How to set value of composite variable field using dynamic SQL
How to check if a table exists in a given schema
Select columns with particular column names in PostgreSQL
Generate series of dates - using date type as input
Special difficulty with type mismatch
I am taking advantage of Postgres defining a row type for every existing table. Using the concept of polymorphic types I am able to create one function that works for any table.
However, some aggregate functions return related but different data types as compared to the underlying column. For instance, min(varchar_column) returns text, which is bit-compatible, but not exactly the same data type. PL/pgSQL functions have a weak spot here and insist on data types exactly as declared in the RETURNS clause. No attempt to cast, not even implicit casts, not to speak of assignment casts.
That should be improved. Tested with Postgres 9.3. Did not retest with 9.4, but I am pretty sure nothing has changed in this area.
That's where this construct comes in as workaround:
SELECT (t::tbl).* FROM (SELECT ... FROM tbl) t;
By casting the whole row to the row type of the underlying table explicitly we force assignment casts to get original data types for every column.
This might fail for some aggregate functions. sum() returns numeric for a sum(bigint_column) to accommodate a sum overflowing the base data type. Casting back to bigint might fail ...
@Erwin Brandstetter, many thanks for the extensive answer. pg_stats does indeed provide a few things, but what I really need to draw a complete profile is a variety of things: min/max values, counts, counts of nulls, means, etc., so a bunch of queries have to be run for each column, some with GROUP BY and such.
Also, thanks for highlighting the importance of data types. I was sort of expecting this to throw a spanner in the works at some point; my main concern was with how to automate the query generation and, above all, its execution.
I have tried the function you provided (I probably will need to start learning some PL/pgSQL) but get an error at the SELECT (t::tbl):
ERROR: type "tbl" does not exist
By the way, what is the (t::abc) notation referred to as? In Python this would be a slice, but that's probably not the case in PL/pgSQL.

Postgres Rules Preventing CTE Queries

Using Postgres 9.3:
I am attempting to automatically populate a table when an insert is performed on another table. This seems like a good use for rules, but after adding the rule to the first table, I am no longer able to insert into it using a WITH query. Here is an example:
CREATE TABLE foo (
id INT PRIMARY KEY
);
CREATE TABLE bar (
id INT PRIMARY KEY REFERENCES foo
);
CREATE RULE insertFoo AS ON INSERT TO foo DO INSERT INTO bar VALUES (NEW.id);
WITH a AS (SELECT * FROM (VALUES (1), (2)) b)
INSERT INTO foo SELECT * FROM a
When this is run, I get the error:
ERROR: WITH cannot be used in a query that is rewritten by rules into multiple queries
I have searched for that error string, but am only able to find links to the source code. I know that I can perform the above using row-level triggers instead, but it seems like I should be able to do this at the statement level. Why can I not use the writable CTE, when queries like this can (in this case) be easily re-written as:
INSERT INTO foo SELECT * FROM (VALUES (1), (2)) a
Does anyone know of another way that would accomplish what I am attempting to do, other than 1) using rules, which prevents the use of WITH queries, or 2) using row-level triggers?
        
TL;DR: use triggers, not rules.
Generally speaking, prefer triggers over rules, unless rules are absolutely necessary. (Which, in practice, they never are.)
Using rules introduces heaps of problems which will needlessly complicate your life down the road. You've run into one here. Another (major) one is, for instance, that the number of affected rows will correspond to that of the very last query -- if you're relying on FOUND somewhere and your query is incorrectly reporting that no rows were affected by a query, you'll be in for painful bugs.
Moreover, there's occasional talk of deprecating Postgres rules outright:
http://postgresql.nabble.com/Deprecating-RULES-td5727689.html
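For the example in the question, the trigger-based equivalent might look like this (a sketch; EXECUTE PROCEDURE matches the Postgres 9.3 syntax):

CREATE FUNCTION insert_foo_trg()
RETURNS trigger
LANGUAGE plpgsql AS
$$
BEGIN
   INSERT INTO bar VALUES (NEW.id);  -- mirror the new foo row into bar
   RETURN NEW;
END
$$;

CREATE TRIGGER insert_foo
AFTER INSERT ON foo
FOR EACH ROW EXECUTE PROCEDURE insert_foo_trg();

The WITH query from the question then runs without complaint.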
Like the other answer, I definitely recommend using INSTEAD OF triggers rather than RULEs.
However, if for some reason you don't want to change existing VIEW RULEs and still want to use WITH, you can do so by wrapping the VIEW in a stored procedure:
create function insert_foo(int) returns void as $$
insert into foo values ($1)
$$ language sql;
WITH a AS (SELECT * FROM (VALUES (1), (2)) b)
SELECT insert_foo(a.column1) from a;
This could be useful when using some legacy db through some system that wraps statements with CTEs.

How to cut seconds from an interval column?

In my table, results from the column work_time (interval type) display as 200:00:00. Is it possible to cut the seconds part, so it displays as 200:00? Or, even better: 200h00min (I've seen that it accepts the h unit on insert, so why not output it like this?).
Preferably by altering the work_time column, not by changing the select query.
This is not something you should do by altering a column but by changing the select query in some way. If you change the column you are changing storage and functional uses, and that's not good. To change it on output, you need to modify how it is retrieved.
You have two basic options. The first is to modify your select queries directly, using to_char(myintervalcol, 'HH24:MI')
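For example, with the 200-hour value from the question (quoted text inside the pattern is emitted literally, which also yields the 200h00min form asked about):

SELECT to_char(interval '200:00:00', 'HH24:MI');         -- 200:00
SELECT to_char(interval '200:00:00', 'HH24"h"MI"min"');  -- 200h00min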
However if your issue is that you have a common format you want to have universal access to in your select query, PostgreSQL has a neat trick I call "table methods." You can attach a function to a table in such a way that you can call it in a similar (but not quite identical) syntax to a new column. In this case you would do something like:
CREATE OR REPLACE FUNCTION myinterval_nosecs(mytable) RETURNS text LANGUAGE SQL
IMMUTABLE AS
$$
SELECT to_char($1.myintervalcol, 'HH24:MI');
$$;
This works on the row input, not on the underlying table. As it always returns the same information for the same input, you can mark it immutable and even index the output (meaning it can be evaluated at plan time and an index can be used).
To call this, you'd do something like:
SELECT myinterval_nosecs(m) FROM mytable m;
But you can then use the special syntax above to rewrite that as:
SELECT m.myinterval_nosecs FROM mytable m;
Note that since myinterval_nosecs is a function you cannot omit the m. at the beginning. This is because the query planner will rewrite the query in the former syntax and will not guess as to which relation you mean to run it against.

conditional statement in DB2 without a procedure

I need to execute statements conditionally in DB2. I searched the DB2 documentation and thought IF..THEN..ELSEIF would serve the purpose. But can't I use IF without a procedure?
My DB2 version is 9.7.6.
My requirement is: I have a table, say Group(name, gp_id), and another table Group_attr(gp_id, value, elem_id). We can ignore the elem_id for this requirement.
-> I need to check whether Group contains a specific name.
-> If it does, nothing needs to be done.
-> If it doesn't, I need to add it to Group and then insert corresponding rows into Group_attr. Assume the value and elem_id are static.
You can use an anonymous block for PL/SQL or a compound statement for SQL PL code.
BEGIN ATOMIC
FOR ROW AS
SELECT PK, C1, DISCRETIZE(C1) AS D FROM SOURCE
DO
IF ROW.D IS NULL THEN
INSERT INTO EXCEPT VALUES(ROW.PK, ROW.C1);
ELSE
INSERT INTO TARGET VALUES(ROW.PK, ROW.D);
END IF;
END FOR;
END
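Applied to the Group/Group_attr tables from the question, a compound statement might look roughly like this (a sketch; it assumes gp_id is generated automatically and uses placeholder values; note that GROUP is a reserved word, so a real table would need a different name or quoting):

BEGIN ATOMIC
IF NOT EXISTS (SELECT 1 FROM Group WHERE name = 'Name1') THEN
INSERT INTO Group (name) VALUES ('Name1');
INSERT INTO Group_attr (gp_id, value, elem_id)
SELECT gp_id, 'value1', 'elem1' FROM Group WHERE name = 'Name1';
END IF;
END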
Compound statements :
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.sql.ref.doc/doc/r0004240.html
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.apdv.sqlpl.doc/doc/c0053781.html
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.sql.ref.doc/doc/r0004239.html
Anonymous block :
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.apdv.plsql.doc/doc/c0053848.html
http://www.ibm.com/developerworks/data/library/techarticle/dm-0908anonymousblocks/
Many of these features have been available since version 9.7.
I found a solution for the conditional insert. For the scenario I mentioned, the solution can look like this.
Insert into Group (name)
select 'Name1' from sysibm.sysdummy1
where (select count(*) from Group where name = 'Name1') = 0;

Insert into Group_attr (gp_id, value, elem_id)
select g.gp_id, 'value1', 'elem1'
from Group g
where (select count(*) from Group_attr ga where ga.gp_id = g.gp_id) = 0;
-- In my case Group_attr is sure to contain some data if the group already exists