Using perform with a WITH clause in case of multiple rows - postgresql

I'm trying to use PERFORM with a WITH query that returns multiple rows.
CREATE OR REPLACE FUNCTION test_function() RETURNS void AS $$
BEGIN
PERFORM (
WITH selection AS (
SELECT id,
ROW_NUMBER() OVER w AS r,
first_value(id) OVER w AS first_value,
nth_value(id, 5) OVER w AS last_value
FROM mytable
WINDOW w AS (PARTITION BY v.ability_id ORDER BY unit_id ASC)
)
create_question(id, 1, 1, 1)
FROM selection
WHERE ability_id IN (
SELECT ability_id
FROM selection
WHERE last_value > 0.5
ORDER BY first_value DESC
)
AND selection.r <= 5
);
END;
$$ LANGUAGE plpgsql;
and I get the error:
ERROR: more than one row returned by a subquery used as an expression
The postgres doc says it can't be done:
For WITH queries, use PERFORM and then place the query in parentheses. (In this case, the query can only return one row.)
What can be done to solve this problem, apart from writing the WITH query (called selection here) twice?

Remark: Your query is missing a SELECT right before create_question(id, 1, 1, 1).
The trick is to modify the query so that it returns a single row.
You can do that by using an aggregate function, e.g. write:
SELECT
count(create_question(id, 1, 1, 1))
FROM selection
...
Then the query only returns a single row and can be used as a subquery in the PERFORM statement.
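Putting the remark and the aggregate trick together, the whole function could look roughly like this (a sketch: v.ability_id from the question is assumed to be meant as ability_id, which is also added to the CTE's select list so the outer WHERE can reference it):
CREATE OR REPLACE FUNCTION test_function() RETURNS void AS $$
BEGIN
    PERFORM (
        WITH selection AS (
            SELECT id,
                   ability_id,
                   ROW_NUMBER() OVER w AS r,
                   first_value(id) OVER w AS first_value,
                   nth_value(id, 5) OVER w AS last_value
            FROM mytable
            WINDOW w AS (PARTITION BY ability_id ORDER BY unit_id ASC)
        )
        -- count() collapses the result to a single row, as PERFORM (...) requires
        SELECT count(create_question(id, 1, 1, 1))
        FROM selection
        WHERE ability_id IN (
            SELECT ability_id
            FROM selection
            WHERE last_value > 0.5
            ORDER BY first_value DESC
        )
        AND selection.r <= 5
    );
END;
$$ LANGUAGE plpgsql;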


How to insert into after the last row in a table?

I have a table named roombooking.
I wrote this code that inserts a new row into roombooking (don't mind the details, just the hotelbookingID):
CREATE OR REPLACE FUNCTION my_function(startdate date , enddate date,idForHotel integer)
RETURNS void AS
$$
BEGIN
INSERT INTO roombooking("hotelbookingID","roomID","bookedforpersonID"
,checkin,checkout,rate)
SELECT rb."hotelbookingID", r."idRoom", p."idPerson"
,startdate-integer'20', startdate-integer'10', rr.rate
FROM(SELECT "hotelbookingID" FROM roombooking
WHERE "hotelbookingID"=
(select "hotelbookingID"
from roombooking
order by "hotelbookingID" desc
limit 1)+1) rb,
(SELECT "idRoom" FROM room
WHERE "idHotel"=idForHotel) r ,
(SELECT "idPerson" FROM person
ORDER BY random()
LIMIT 1) p,
(SELECT rate FROM roomrate
WHERE "idHotel"=idForHotel) rr;
END;
$$
LANGUAGE 'plpgsql';
The problem here is that I want to insert after the last row, based on the last hotelbookingID (it is in ascending order).
My function works, but I guess it can't find the last row in order to perform the insertion after it. (I think the problem can be spotted here:
SELECT "hotelbookingID" FROM roombooking
WHERE "hotelbookingID"=
(select "hotelbookingID"
from roombooking
order by "hotelbookingID" desc
limit 1)+1)
Any help would be valuable. Thank you.
Any approach that uses a subquery to find the maximum existing id is doomed to suffer from race conditions: if two such INSERTs are running concurrently, they will end up with the same number.
Use an identity column:
ALTER TABLE roombooking
ALTER "hotelbookingID" ADD GENERATED ALWAYS AS IDENTITY (START 100000);
where 100000 is a value greater than the maximum "hotelbookingID" in the table.
Then all you have to do is not insert anything into "hotelbookingID", and the column will be populated automatically.
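For example (a sketch with placeholder values, just to show the column being omitted):
INSERT INTO roombooking("roomID", "bookedforpersonID", checkin, checkout, rate)
VALUES (1, 1, DATE '2023-06-01', DATE '2023-06-11', 99.00);
-- "hotelbookingID" is generated automatically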
That WHERE condition makes no sense. There is no row in the roombooking table whose id is 1 + the largest id in the roombooking table.
You simply want to add 1 to the inserted value:
INSERT INTO roombooking("hotelbookingID", …)
SELECT rb."hotelbookingID" + 1, …
-- ^^^^
FROM (
SELECT "hotelbookingID"
FROM roombooking
ORDER BY "hotelbookingID" DESC
LIMIT 1
) rb,
…
That said, I would recommend simply using a sequence instead (if you don't care about occasional gaps). If you really need continuous numbering, I wouldn't use ORDER BY + LIMIT though. Just use an aggregate, and consider the case where the table is still empty:
INSERT INTO roombooking("hotelbookingID", …)
VALUES ( COALESCE((SELECT max("hotelbookingID") FROM roombooking), 0) + 1, …);
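If you go the sequence route instead, a minimal sketch (the sequence name is illustrative; pick a START value above the current maximum):
CREATE SEQUENCE roombooking_hbid_seq START 100000 OWNED BY roombooking."hotelbookingID";
ALTER TABLE roombooking
    ALTER "hotelbookingID" SET DEFAULT nextval('roombooking_hbid_seq');
After that, simply leave "hotelbookingID" out of the INSERT, as with the identity column above.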

Postgres ANY operator with array selected in a subquery

Can someone explain to me why the 4th select works, but the first 3 do not? (I'm on PostgreSQL 9.3.4 if it matters.)
drop table if exists temp_a;
create temp table temp_a as
(
select array[10,20] as arr
);
select 10 = any(select arr from temp_a); -- ERROR: operator does not exist: integer = integer[]
select 10 = any(select arr::integer[] from temp_a); -- ERROR: operator does not exist: integer = integer[]
select 10 = any((select arr from temp_a)); -- ERROR: operator does not exist: integer = integer[]
select 10 = any((select arr from temp_a)::integer[]); -- works
Here's a sqlfiddle: http://sqlfiddle.com/#!15/56a09/2
You might be expecting an aggregate. Per the documentation:
Note: Boolean aggregates bool_and and bool_or correspond to standard SQL aggregates every and any or some. As for any and some, it seems that there is an ambiguity built into the standard syntax:
SELECT b1 = ANY((SELECT b2 FROM t2 ...)) FROM t1 ...;
Here ANY can be considered either as introducing a subquery, or as being an aggregate function, if the subquery returns one row with a Boolean value. Thus the standard name cannot be given to these aggregates.
In Postgres, the ANY construct exists in two forms: one takes a subquery, the other an array.
The first three queries use the subquery form: each row returned by the subquery is of type int[], and you're comparing it to an int, so there is no matching integer = integer[] operator.
The last query casts the subquery to int[] and therefore uses the array form; it only works because the subquery returns a single row (a scalar subquery).
Exhibit A; this works:
select (select i from (values (array[1])) rows(i))::int[];
But this doesn't:
select (select i from (values (array[1]), (array[2])) rows(i))::int[];
This works as a result (equivalent to your fourth query):
select 1 = any((select i from (values (array[1])) rows(i))::int[]);
But this doesn't (equivalent to your fourth query returning multiple rows):
select 1 = any((select i from (values (array[1]), (array[2])) rows(i))::int[]);
These should also work, btw:
select 1 = any(
select unnest(arr) from temp_a
);
select 1 = any(
select unnest(i)
from (values (array[1]), (array[2])) rows(i)
);
As an aside, also note the array(select ...) construct, since it's occasionally handy:
select 1 = any(array(
select i
from (values (1), (2)) rows(i)
));
and, since each row here is a plain integer (not an array), the subquery form of ANY works directly:
select 1 = any(
select i
from (values (1), (2)) rows(i)
);

PostgreSQL - return most common value for all columns in a table

I've got a table with a lot of columns in it and I want to run a query to find the most common value in each column.
Ordinarily for a single column, I'd run something like:
SELECT country
FROM users
GROUP BY country
ORDER BY count(*) DESC
LIMIT 1
Does PostgreSQL have a built in function for doing this or can anyone suggest a query I could run to achieve this?
Using the same query, for more than one column you should do:
SELECT *
FROM
(
SELECT country
FROM users
GROUP BY 1
ORDER BY count(*) DESC
LIMIT 1
) country
,(
SELECT city
FROM users
GROUP BY 1
ORDER BY count(*) DESC
LIMIT 1
) city
This works for any type and returns all the values in the same row, with each column keeping its original name.
For more columns, just add more subqueries, like:
,(
SELECT someOtherColumn
FROM users
GROUP BY 1
ORDER BY count(*) DESC
LIMIT 1
) someOtherColumn
Edit:
You could also achieve this with window functions. However, it would not be better in performance or readability.
Starting with PostgreSQL 9.4 there is an aggregate function for this:
mode() WITHIN GROUP (ORDER BY sort_expression)
returns the most frequent input value (arbitrarily choosing the first one if there are multiple equally-frequent results)
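With it, the original multi-column problem collapses into a single scan, for example (using columns mentioned in the question and the other answers):
SELECT mode() WITHIN GROUP (ORDER BY country) AS country,
       mode() WITHIN GROUP (ORDER BY city)    AS city
FROM users;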
And for earlier versions, you could create one...
CREATE OR REPLACE FUNCTION mode_array(anyarray)
RETURNS anyelement AS
$BODY$
SELECT a FROM unnest($1) a GROUP BY 1 ORDER BY COUNT(1) DESC, 1 LIMIT 1;
$BODY$
LANGUAGE SQL IMMUTABLE;
CREATE AGGREGATE mode(anyelement)(
SFUNC = array_append, --Function to call for each row. Just builds the array
STYPE = anyarray,
FINALFUNC = mode_array, --Function to call after everything has been added to array
INITCOND = '{}'--Initialize an empty array when starting
) ;
Usage: SELECT mode(column) FROM table;
If I were doing this, I'd write a query like this one:
(SELECT 'country', country
FROM users
GROUP BY country
ORDER BY count(*) DESC
LIMIT 1)
UNION ALL
(SELECT 'city', city
FROM users
GROUP BY city
ORDER BY count(*) DESC
LIMIT 1)
-- etc.
(The parentheses are needed so that each branch's ORDER BY and LIMIT apply to that branch only.)
It should be noted this only works if all the columns are of compatible types. If they are not, you'll probably need a different solution.
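One simple workaround in that case (a sketch) is to cast every value to text, at the cost of losing the original types:
(SELECT 'country' AS col, country::text AS most_common
FROM users
GROUP BY country
ORDER BY count(*) DESC
LIMIT 1)
UNION ALL
(SELECT 'age', age::text
FROM users
GROUP BY age
ORDER BY count(*) DESC
LIMIT 1);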
This window function version will read the users table and the computed table once each. The correlated subquery version will read the users table once for each of the columns. If the columns are many, as in the OP's case, then my guess is that this is faster. SQL Fiddle
select distinct on (country_count, age_count) *
from (
select
country,
count(*) over(partition by country) as country_count,
age,
count(*) over(partition by age) as age_count
from users
) s
order by country_count desc, age_count desc
limit 1

upsert and sub select

I have an upsert statement (http://www.the-art-of-web.com/sql/upsert/) doing an insert whenever a row with the given id does not exist, and updating the column when the row exists:
WITH upsert AS
(UPDATE foo SET counter=counter+1 WHERE id='bar' RETURNING *)
INSERT INTO foo(id, counter) SELECT 'bar', 0
WHERE NOT EXISTS (SELECT * FROM upsert) RETURNING counter;
id is the primary key column (as expected). Up to here everything works fine.
But there is a third column, 'position', which can be used for custom ordering.
In case of an update I want to keep the current value.
But the insert statement needs an additional subquery returning the lowest possible position not in use:
WITH upsert AS
(UPDATE foo SET counter=counter+1 WHERE id='bar' RETURNING *)
INSERT INTO foo(id, counter, position) SELECT 'bar', 0, MIN( position)-1 from foo
WHERE NOT EXISTS (SELECT * FROM upsert) RETURNING counter;
Using this statement I get an error:
ERROR: duplicate key value violates unique constraint "id"
What's wrong here?
The problem is that MIN() applied to 0 rows still returns one row (with a NULL value).
Example:
test=> select min(1) where false;
min
-----
(1 row)
This differs from the same WHERE clause without min():
test=> select 1 where false;
?column?
----------
(0 rows)
So when using MIN() in the subquery feeding the INSERT, it will insert a new row even when the WHERE clause evaluates to false, which defeats the logic of this UPSERT.
I think this can be worked around by introducing another subquery:
WITH upsert AS
(UPDATE foo SET counter=counter+1 WHERE id='bar' RETURNING *)
INSERT INTO foo(id, counter, position)
SELECT * FROM (SELECT 'bar', 0, MIN( position)-1 from foo) s
WHERE NOT EXISTS (SELECT * FROM upsert)
RETURNING counter;
Note, however, that cramming this into a single SQL statement gives no guarantee that it will reliably succeed when such statements run concurrently.
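On PostgreSQL 9.5 and later, the built-in INSERT ... ON CONFLICT form handles the insert-vs-update decision natively and avoids the MIN() pitfall; a sketch, assuming the primary key is on id (the fallback position of 0 for an empty table is an arbitrary choice):
INSERT INTO foo (id, counter, position)
SELECT 'bar', 0, coalesce(min(position) - 1, 0)
FROM foo
ON CONFLICT (id) DO UPDATE
SET counter = foo.counter + 1   -- position keeps its current value on update
RETURNING counter;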
For more, see:
How do I do an UPSERT (MERGE, INSERT … ON DUPLICATE UPDATE) in PostgreSQL?

Join 2 sets based on default order

How do I join 2 sets of records solely based on the default order?
So if I have a table x(col(1,2,3,4,5,6,7)) and another table z(col(a,b,c,d,e,f,g))
it will return
c1 c2
-- --
1 a
2 b
3 c
4 d
5 e
6 f
7 g
Actually, I wanted to join a pair of one-dimensional arrays from parameters and treat them like columns from a table.
Sample code:
CREATE OR REPLACE FUNCTION "Test"(timestamp without time zone[],
timestamp without time zone[])
RETURNS refcursor AS
$BODY$
DECLARE
curr refcursor;
BEGIN
OPEN curr FOR
SELECT DISTINCT "Start" AS x, "End" AS y, COUNT("A"."id")
FROM UNNEST($1) "Start"
INNER JOIN
(
SELECT "End", ROW_NUMBER() OVER(ORDER BY ("End")) rn
FROM UNNEST($2) "End" ORDER BY ("End")
) "End" ON ROW_NUMBER() OVER(ORDER BY ("Start")) = "End".rn
LEFT JOIN "A" ON ("A"."date" BETWEEN x AND y)
GROUP BY 1,2
ORDER BY "Start";
return curr;
END
$BODY$
Now, to answer the real question that was revealed in comments, which appears to be something like:
Given two arrays 'a' and 'b', how do I pair up their elements so I can get the element pairs as column aliases in a query?
There are a couple of ways to tackle this:
If and only if the arrays are of equal length, use multiple unnest functions in the SELECT clause (a deprecated approach that should only be used for backward compatibility);
Use generate_subscripts to loop over the arrays;
Use generate_series over subqueries against array_lower and array_upper to emulate generate_subscripts if you need to support versions too old to have generate_subscripts;
Rely on the order that unnest returns tuples in and hope, like in my other answer and as shown below. It'll work, but it's not guaranteed to keep working in future versions.
Use the WITH ORDINALITY functionality added in PostgreSQL 9.4 (see also its first posting) to get a row number for unnest when 9.4 comes out.
Use multiple-array UNNEST, which is SQL-standard but which PostgreSQL doesn't support yet.
So, say we have function arraypair with array parameters a and b:
CREATE OR REPLACE FUNCTION arraypair (a integer[], b text[])
RETURNS TABLE (col_a integer, col_b text) AS $$
-- blah code here blah
$$ LANGUAGE whatever IMMUTABLE;
and it's invoked as:
SELECT * FROM arraypair( ARRAY[1,2,3,4,5,6,7], ARRAY['a','b','c','d','e','f','g'] );
possible function definitions would be:
SRF-in-SELECT (deprecated)
CREATE OR REPLACE FUNCTION arraypair (a integer[], b text[])
RETURNS TABLE (col_a integer, col_b text) AS $$
SELECT unnest(a), unnest(b);
$$ LANGUAGE sql IMMUTABLE;
Will produce bizarre and unexpected results if the arrays aren't equal in length; see the documentation on set returning functions and their non-standard use in the SELECT list to learn why, and what exactly happens.
generate_subscripts
This is likely the safest option:
CREATE OR REPLACE FUNCTION arraypair (a integer[], b text[])
RETURNS TABLE (col_a integer, col_b text) AS $$
SELECT
a[i], b[i]
FROM generate_subscripts(CASE WHEN array_length(a,1) >= array_length(b,1) THEN a::text[] ELSE b::text[] END, 1) i;
$$ LANGUAGE sql IMMUTABLE;
If the arrays are of unequal length, as written it'll return null elements for the shorter, so it works like a full outer join. Reverse the sense of the case to get an inner-join like effect. The function assumes the arrays are one-dimensional and that they start at index 1. If an entire array argument is NULL then the function returns NULL.
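For illustration, calling it with arrays of unequal length (expected output sketched in the comment):
SELECT * FROM arraypair(ARRAY[1,2,3], ARRAY['a','b']);
--  col_a | col_b
-- -------+-------
--      1 | a
--      2 | b
--      3 |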
A more generalized version would be written in PL/PgSQL and would check array_ndims(a) = 1, check array_lower(a, 1) = 1, test for null arrays, etc. I'll leave that to you.
Hoping for pair-wise returns:
This isn't guaranteed to work, but does with PostgreSQL's current query executor:
CREATE OR REPLACE FUNCTION arraypair (a integer[], b text[])
RETURNS TABLE (col_a integer, col_b text) AS $$
WITH
rn_c1(rn, col) AS (
SELECT row_number() OVER (), c1.col
FROM unnest(a) c1(col)
),
rn_c2(rn, col) AS (
SELECT row_number() OVER (), c2.col
FROM unnest(b) c2(col)
)
SELECT
rn_c1.col AS c1,
rn_c2.col AS c2
FROM rn_c1
INNER JOIN rn_c2 ON (rn_c1.rn = rn_c2.rn);
$$ LANGUAGE sql IMMUTABLE;
I would consider using generate_subscripts much safer.
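WITH ORDINALITY (9.4+):
Once WITH ORDINALITY is available, the same function can be written as follows; a sketch, using a FULL JOIN to get the same outer-join behavior as the generate_subscripts version:
CREATE OR REPLACE FUNCTION arraypair (a integer[], b text[])
RETURNS TABLE (col_a integer, col_b text) AS $$
SELECT ca.col, cb.col
FROM unnest(a) WITH ORDINALITY AS ca(col, rn)
FULL JOIN unnest(b) WITH ORDINALITY AS cb(col, rn) USING (rn);
$$ LANGUAGE sql IMMUTABLE;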
Multi-argument unnest:
This should work, but doesn't because PostgreSQL's unnest doesn't accept multiple input arrays (yet):
SELECT * FROM unnest(a,b);
select x.c1, z.c2
from
x
inner join
(
select
c2,
row_number() over(order by c2) rn
from z
order by c2
) z on x.c1 = z.rn
order by x.c1
If x.c1 is not 1, 2, 3, ... you can do the same as was done with z:
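A sketch of that, numbering both sides the same way:
select x.c1, z.c2
from
(
    select c1, row_number() over(order by c1) rn
    from x
) x
inner join
(
    select c2, row_number() over(order by c2) rn
    from z
) z on x.rn = z.rn
order by x.c1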
The middle ORDER BY is not necessary, as pointed out by Erwin. I tested it like this:
create table t (i integer);
insert into t
select ceil(random() * 100000)
from generate_series(1, 100000);
select
i,
row_number() over(order by i) rn
from t
;
And i comes out ordered. Before running this simple test, I thought it was possible that the rows would be numbered in any order.
By "default order" it sounds like you probably mean the order in which the rows are returned by select * from tablename without an ORDER BY.
If so, this ordering is undefined. The database can return rows in any order that it feels like. You'll find that if you UPDATE a row, it probably moves to a different position in the table.
If you're stuck in a situation where you assumed tables had an order and they don't, you can as a recovery option add a row number based on the on-disk ordering of the tuples within the table:
select row_number() OVER (), *
from the_table
order by ctid
If the output looks right, I recommend that you CREATE TABLE a new table with an extra field, then do an INSERT INTO ... SELECT to insert the data ordered by ctid, then ALTER TABLE ... RENAME the tables and finally fix any foreign key references so they point to the new table.
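A sketch of that recovery procedure (table and column names are illustrative):
-- new table with an explicit row number column in front
CREATE TABLE the_table_new (rn bigint, LIKE the_table);
-- number the rows by their current on-disk order
INSERT INTO the_table_new
SELECT row_number() OVER (ORDER BY ctid), t.*
FROM the_table t;
-- swap the tables; foreign key references must then be repointed by hand
ALTER TABLE the_table RENAME TO the_table_old;
ALTER TABLE the_table_new RENAME TO the_table;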
ctid can be changed by autovacuum, UPDATE, CLUSTER, etc, so it is not something you should ever be using in applications. I'm using it here only because it sounds like you don't have any real ordering or identifier key.
If you need to pair up rows based on their on-disk ordering (an unreliable and unsafe thing to do as noted above), you could per this SQLFiddle try:
WITH
rn_c1(rn, col) AS (
SELECT row_number() OVER (ORDER BY ctid), c1.col
FROM c1
),
rn_c2(rn, col) AS (
SELECT row_number() OVER (ORDER BY ctid), c2.col
FROM c2
)
SELECT
rn_c1.col AS c1,
rn_c2.col AS c2
FROM rn_c1
INNER JOIN rn_c2 ON (rn_c1.rn = rn_c2.rn);
but never rely on this in a production app. If you're really stuck you can use this with CREATE TABLE AS to construct a new table that you can start with when you're working on recovering data from a DB that lacks a required key, but that's about it.
The same approach given above might work with an empty window clause () instead of (ORDER BY ctid) when using sets that lack a ctid, like interim results from functions. It's even less safe then, though, and should be a matter of last resort only.
(See also this newer related answer: https://stackoverflow.com/a/17762282/398670)