How to assert the expected number of result rows of a sub-query in PostgreSQL?

Oftentimes I find myself writing code such as:
const fooId = await pool.oneFirst(sql`
  SELECT id
  FROM foo
  WHERE nid = 'BAR'
`);
await pool.query(sql`
  INSERT INTO bar (foo_id)
  VALUES (${fooId})
`);
oneFirst is a Slonik query method that ensures that the query returns exactly 1 result. It is needed here because foo_id also accepts NULL as a valid value, i.e. if SELECT id FROM foo WHERE nid = 'BAR' returned no results, this part of the program would fail silently.
The problem with this approach is that it causes two database roundtrips for what should be a single operation.
In a perfect world, PostgreSQL would support assertions natively, e.g.
INSERT INTO bar (foo_id)
VALUES (
  (
    SELECT id
    FROM foo
    WHERE nid = 'BAR'
    EXPECT 1 RESULT
  )
)
EXPECT 1 RESULT is a made-up DSL.
The expectation is that EXPECT 1 RESULT would cause PostgreSQL to throw an error if that query returns anything other than 1 result.
Since PostgreSQL does not support this natively, what are the client-side solutions?

You can use
const fooId = await pool.oneFirst(sql`
  INSERT INTO bar (foo_id)
  SELECT id
  FROM foo
  WHERE nid = 'BAR'
  RETURNING foo_id;
`);
This will insert all rows matched by the condition in foo into bar, and Slonik will throw if that was not exactly one row.
Alternatively, if you insist on using VALUES with a subquery, you can do
INSERT INTO bar (foo_id)
SELECT tmp.id
FROM (
  VALUES (
    (
      SELECT id
      FROM foo
      WHERE nid = 'BAR'
    )
  )
) AS tmp (id)
WHERE tmp.id IS NOT NULL
Nothing would be inserted if no row matched the condition, and your application could check for that. Postgres would throw an error if multiple rows matched, since a subquery used as an expression must return at most one row.

That's a cool idea.
You can cause an error on extra results by putting the select in a context where only one result is expected. For example, just writing
SELECT (SELECT id FROM foo WHERE nid = 'BAR')
will get you a decent error message on multiple results: error: more than one row returned by a subquery used as an expression.
To handle the case where nothing is returned, you could use COALESCE.
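Combining the two, a minimal sketch (the -1 sentinel is made up; it assumes bar.foo_id carries a foreign-key or CHECK constraint that would reject it, turning the no-row case into an error as well):
INSERT INTO bar (foo_id)
VALUES (
  COALESCE(
    (
      SELECT id
      FROM foo
      WHERE nid = 'BAR'
    ),   -- errors if more than one row is returned
    -1   -- sentinel for "no row matched"; a constraint on bar.foo_id can reject it
  )
);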

Related

postgres case statement with subquery

I have a subquery like this
with subquery as (select host from table_A where << some condition >>)
and in my main query I am querying data from another table called table_B, one of whose columns is called destination_host. Now I need to check whether destination_host is in the list returned by my subquery: if it is, I want to output typeA in my select statement, otherwise typeB. My select statement looks something like
select name, place, destination_host
from table_B
where <<some condition>>
I want to output a fourth column based on a condition check; let's call it host_category: if the destination_host value exists in the subquery, I want the value typeA, otherwise typeB. Please can you help me understand how to write this? I understand that it is hard to provide guidance without actual data to work with.
I tried using case statements such as this one:
case
  when destination_host in (select host from subquery) then 'typeA'
  when destination_host not in (select host from subquery) then 'typeB'
end as host_category
but I don't think this is the way to solve this problem.
I would use EXISTS:
WITH subquery AS (...)
SELECT CASE WHEN EXISTS (SELECT 1
                         FROM subquery
                         WHERE subquery.host = table_b.destination_host)
            THEN 'typeA'
            ELSE 'typeB'
       END AS host_category
FROM table_b;
With queries like that, you have to take care of NULL values. If table_b.destination_host is NULL, the row will always show up as typeB, because NULL = NULL is not TRUE in SQL.
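If you instead want NULL hosts to match NULL destination hosts, IS NOT DISTINCT FROM treats two NULLs as equal; a sketch against the same tables:
WITH subquery AS (...)
SELECT CASE WHEN EXISTS (SELECT 1
                         FROM subquery
                         -- IS NOT DISTINCT FROM compares NULLs as equal
                         WHERE subquery.host IS NOT DISTINCT FROM table_b.destination_host)
            THEN 'typeA'
            ELSE 'typeB'
       END AS host_category
FROM table_b;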

Postgres dynamic filter conditions

I want to dynamically filter data based on a condition that is stored in a specific column. This condition can change for every row.
For example, I have a table my_table with a couple of columns, one of them called foo, which holds filter conditions such as AND bar > 1, or in the next row AND bar > 2, or in the next row AND bar = 33.
I have a query which looks like:
SELECT something FROM somewhere
LEFT JOIN otherthing ON some_condition
WHERE first_condition AND second_condition AND
      here_i_want_dynamically_load_condition_from_my_table.foo
What is the correct way to do it? I have read some articles about dynamic queries, but I am not able to find a correct way.
This is impossible in pure SQL: at query time, the planner has to know your exact logic. Now, you can hide it away in a function (in pseudo-sql):
CREATE FUNCTION do_I_filter_or_not(some_id integer) RETURNS boolean AS
$$
DECLARE
  value integer;
  condition_type text;
  condition_value integer;
BEGIN
  SELECT some_value INTO value FROM some_table WHERE id = some_id;
  condition_type := ...;   -- query the condition type for this row
  condition_value := ...;  -- query the condition value for this row
  IF condition_type = 'equals' AND value = condition_value THEN
    RETURN true;
  ELSIF condition_type = 'greater_than' AND value > condition_value THEN
    RETURN true;
  ELSIF condition_type = 'lower_than' AND value < condition_value THEN
    RETURN true;
  END IF;
  RETURN false;
END;
$$ LANGUAGE plpgsql;
And query it like this:
SELECT something
FROM somewhere
LEFT JOIN otherthing ON some_condition
WHERE first_condition
  AND second_condition
  AND do_I_filter_or_not(somewhere.id)
Now the performance will be bad: you have to invoke that function on potentially every row in the query, triggering lots of subqueries.
Thinking about it: if you just want <, >, =, and you have a table (filter_criteria) describing for each id what the criterion is, you can do it:
CREATE TABLE filter_criteria (
  some_id integer,
  equals_threshold integer,
  greater_than_threshold integer,
  lower_than_threshold integer
  -- plus a CHECK constraint that exactly one threshold is not null
);
INSERT INTO filter_criteria VALUES (1, null, 5, null); -- for > 5
And query like this:
SELECT something
FROM somewhere
LEFT JOIN otherthing ON some_condition
LEFT JOIN filter_criteria USING (some_id)
WHERE first_condition
  AND second_condition
  AND COALESCE(bar = equals_threshold, true)
  AND COALESCE(bar > greater_than_threshold, true)
  AND COALESCE(bar < lower_than_threshold, true)
The COALESCEs are there to default to not filtering (AND true) if the threshold is missing: bar = equals_threshold yields NULL, not TRUE or FALSE, when the threshold is NULL.
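To see the trick in isolation, each filter line behaves like one of these standalone expressions (made-up literals standing in for bar and the thresholds):
SELECT COALESCE(10 = NULL, true);  -- true: threshold is missing, so don't filter
SELECT COALESCE(10 > 5, true);     -- true: threshold present and satisfied
SELECT COALESCE(4 > 5, true);      -- false: threshold present and not satisfied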
This way the planner knows your exact logic at query time: you're just doing three passes of filtering, with an =, <, and > check each time. That's still more performant than idea #1, with all its subquerying.

RETURNING parameter in SELECT (postgres)

SELECT * FROM users WHERE user_id = '423423432r32' RETURNING name
Does the RETURNING clause exist in SELECT, and if it doesn't, what can I use instead to get the same result?
From your comments, and from what I can understand of your question, what you want is just the result of the query
SELECT name FROM users WHERE user_id = '423423432r32'
The RETURNING clause is used more for things like
insert into table1 (foo, bar, baz)
values (X, Y, Z)
returning *
so that you can see the inserted or updated rows; see the PostgreSQL documentation on the RETURNING clause for more info.
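Note that RETURNING also works with UPDATE and DELETE; for example, against the users table from the question (the new value is made up):
update users
set name = 'new name'
where user_id = '423423432r32'
returning user_id, name;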

postgres `order by` argument type

What is the argument type for the order by clause in Postgresql?
I came across a very strange behaviour (using PostgreSQL 9.5). Namely, the query
select * from unnest(array[1,4,3,2]) as x order by 1;
produces 1,2,3,4 as expected. However the query
select * from unnest(array[1,4,3,2]) as x order by 1::int;
produces 1,4,3,2, which seems strange. Similarly, whenever I replace 1::int with some function (e.g. greatest(0,1)) or even a case expression, the results are unordered (contrary to what I would expect).
So which type should an argument of order by have, and how do I get the expected behaviour?
This is expected (and documented) behaviour:
A sort_expression can also be the column label or number of an output column
So the expression:
order by 1
sorts by the first column of the result set (as defined by the SQL standard)
However the expression:
order by 1::int
sorts by the constant value 1; conceptually it's the same as:
order by 'foo'
With a constant sort value, all rows have the same sort key and thus aren't really sorted.
To sort by an expression, just use that:
order by
  case
    when some_column = 'foo' then 1
    when some_column = 'bar' then 2
    else 3
  end
The above sorts the result based on the result of the case expression.
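Applied to the unnest example from the question, the fix is simply to order by the output column itself rather than by a constant expression:
select * from unnest(array[1,4,3,2]) as x order by x;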
Actually I have a function with an integer argument which indicates the column to be used in the order by clause.
In a case when all columns are of the same type, this can work:
SELECT ....
ORDER BY
  CASE function_to_get_a_column_number()
    WHEN 1 THEN column1
    WHEN 2 THEN column2
    .....
    WHEN 1235 THEN column1235
  END
If columns are of different types, you can try:
SELECT ....
ORDER BY
  CASE function_to_get_a_column_number()
    WHEN 1 THEN column1::varchar
    WHEN 2 THEN column2::varchar
    .....
    WHEN 1235 THEN column1235::varchar
  END
But these "workarounds" are horrible. You need some other approach than the function returning a column number.
Maybe a dynamic SQL ?
I would say that dynamic SQL (thanks @kordirko and the others for the hints) is the best solution to the problem I originally had in mind:
create temp table my_data (
  id serial,
  val text
);
insert into my_data(id, val)
values (default, 'a'), (default, 'c'), (default, 'd'), (default, 'b');
create function fetch_my_data(col text)
returns setof my_data as
$f$
begin
  return query execute $$
    select * from my_data
    order by $$ || quote_ident(col);
end
$f$ language plpgsql;
select * from fetch_my_data('val'); -- order by val
select * from fetch_my_data('id');  -- order by id
In the beginning I thought this could be achieved using a case expression in the argument of the order by clause - the sort_expression. And here comes the tricky part which confused me: when the sort_expression is a kind of identifier (the name or number of a column), the corresponding column is used when ordering the results. But when the sort_expression is some value, we actually order the results using that value itself (computed for each row). This is @a_horse_with_no_name's answer rephrased.
So when I queried ... order by 1::int, in a way I assigned the value 1 to each row and then tried to sort a set of ones, which is clearly useless.
There are some workarounds without dynamic queries, but they require writing more code and do not seem to have any significant advantages.
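For illustration, one such workaround for the my_data example above casts every candidate sort key to text inside a case expression (a sketch; note the cast makes numeric ids sort lexically, e.g. '10' before '2'):
select *
from my_data
order by case 'val'  -- imagine the column name arriving here as a parameter
           when 'id'  then id::text
           when 'val' then val
         end;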

Most efficient way to do a bulk UPDATE with pairs of input

Suppose I want to do a bulk update, setting a=b for a collection of a values. This can easily be done with a sequence of UPDATE queries:
UPDATE foo SET value='foo' WHERE id=1;
UPDATE foo SET value='bar' WHERE id=2;
UPDATE foo SET value='baz' WHERE id=3;
Now suppose I want to do this in bulk. I have a two-dimensional array containing the ids and new values:
[ [ 1, 'foo' ],
  [ 2, 'bar' ],
  [ 3, 'baz' ] ]
Is there an efficient way to do these three UPDATEs in a single SQL query?
Some solutions I have considered:
A temporary table
CREATE TABLE temp ...;
INSERT INTO temp (id,value) VALUES (....);
UPDATE foo USING temp ...
But this really just moves the problem. Although it may be easier (or at least less ugly) to do a bulk INSERT, there are still a minimum of three queries.
Denormalize the input by passing the data pairs as SQL arrays. This makes the query incredibly ugly, though
UPDATE foo
SET value = x.value
FROM (
  SELECT
    split_part(x, ',', 1)::INT AS id,
    split_part(x, ',', 2)::VARCHAR AS value
  FROM (
    SELECT UNNEST(ARRAY['1,foo','2,bar','3,baz']) AS x
  ) AS t
) AS x
WHERE foo.id = x.id
This makes it possible to use a single query, but makes that query ugly, and inefficient (especially for mixed and/or complex data types).
Is there a better solution? Or should I resort to multiple UPDATE queries?
Normally you want to batch-update from a table with sufficient index to make the merge easy:
CREATE TEMP TABLE updates_table
( id integer not null primary key
, val varchar
);
INSERT into updates_table(id, val) VALUES
( 1, 'foo' ) ,( 2, 'bar' ) ,( 3, 'baz' )
;
UPDATE target_table t
SET value = u.val
FROM updates_table u
WHERE t.id = u.id
;
So you should probably populate your updates_table with something like:
INSERT INTO updates_table(id, val)
SELECT
  split_part(x, ',', 1)::INT AS id,
  split_part(x, ',', 2)::VARCHAR AS val
FROM (
  SELECT UNNEST(ARRAY['1,foo','2,bar','3,baz']) AS x
) AS t
;
Remember: an index (or the primary key) on the id field in the updates_table is important. (But for small sets like this one, a hash join will probably be chosen by the optimiser.)
In addition: for updates, it is important to avoid updates that set the same value; these cause extra row versions to be created, plus the resulting VACUUM activity after the update is committed:
UPDATE target_table t
SET value = u.val
FROM updates_table u
WHERE t.id = u.id
  AND (t.value IS NULL OR t.value <> u.val)
;
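A NULL-safe alternative is IS DISTINCT FROM, which also skips the useless NULL-over-NULL update and, unlike the version above, still updates rows where a non-NULL value should become NULL:
UPDATE target_table t
SET value = u.val
FROM updates_table u
WHERE t.id = u.id
  AND t.value IS DISTINCT FROM u.val;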
You can use a CASE conditional expression (note the WHERE clause; without it, every row not listed would have its value set to NULL):
UPDATE foo
SET "value" = CASE id
                WHEN 1 THEN 'foo'
                WHEN 2 THEN 'bar'
                WHEN 3 THEN 'baz'
              END
WHERE id IN (1, 2, 3);