Postgres ANY operator with array selected in a subquery - postgresql

Can someone explain to me why the 4th select works, but the first 3 do not? (I'm on PostgreSQL 9.3.4 if it matters.)
drop table if exists temp_a;
create temp table temp_a as
(
select array[10,20] as arr
);
select 10 = any(select arr from temp_a); -- ERROR: operator does not exist: integer = integer[]
select 10 = any(select arr::integer[] from temp_a); -- ERROR: operator does not exist: integer = integer[]
select 10 = any((select arr from temp_a)); -- ERROR: operator does not exist: integer = integer[]
select 10 = any((select arr from temp_a)::integer[]); -- works
Here's a sqlfiddle: http://sqlfiddle.com/#!15/56a09/2

You might be expecting an aggregate. Per the documentation:
Note: Boolean aggregates bool_and and bool_or correspond to standard SQL aggregates every and any or some. As for any and some, it seems that there is an ambiguity built into the standard syntax:
SELECT b1 = ANY((SELECT b2 FROM t2 ...)) FROM t1 ...;
Here ANY can be considered either as introducing a subquery, or as being an aggregate function, if the subquery returns one row with a Boolean value. Thus the standard name cannot be given to these aggregates.
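In Postgres, ANY has two forms. One takes a subquery and compares the left-hand value with each row the subquery returns. The other takes an array expression and compares the left-hand value with each element of the array.
In the first three queries ANY receives a subquery, so 10 is compared with each row, and each row is an int[]. There is no int = int[] operator, so this can't work. As the third query shows, doubling the parentheses is not enough: the inner SELECT is still read as a subquery. The cast in the fourth query turns it into an ordinary scalar subquery expression that yields one int[] value, so the array form of ANY is used.
The last query works only because the subquery returns a single row.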
Exhibit A: this works:
select (select i from (values (array[1])) rows(i))::int[];
But this doesn't; it fails with "more than one row returned by a subquery used as an expression":
select (select i from (values (array[1]), (array[2])) rows(i))::int[];
This works as a result (equivalent to your fourth query):
select 1 = any((select i from (values (array[1])) rows(i))::int[]);
But this doesn't (equivalent to your fourth query returning multiple rows):
select 1 = any((select i from (values (array[1]), (array[2])) rows(i))::int[]);
These should also work, btw:
select 1 = any(
select unnest(arr) from temp_a
);
select 1 = any(
select unnest(i)
from (values (array[1]), (array[2])) rows(i)
);
Also note the array(select ...) construct as an aside, since it's occasionally handy:
select 1 = any(array(
select i
from (values (1), (2)) rows(i)
));
select 1 = any(
select i
from (values (1), (2)) rows(i)
);

Related

Postgres Crosstab query with CTE (with clause)

Recently started working on Postgres and need to pivot data.
I wrote the following query:
select *
from crosstab (
$$
with tmp_kv as (
select distinct pat_id
,col.name as key, replace(replace(replace(value, '[',''), ']', ''),'"','') as value
from (
select p.Id as pat_id, nullif(kv.key,'undefined')::int as key, trim(kv.value::text,'"') as value
from pat_table p
left join e_table e on e.pat_id = p.id and e.id is null
,jsonb_each_text(p.data) as kv
) t
left join lateral (
select name::text as name from public.config_fields fld
where id = t.key
) col on true
)
select pat_id, key, value
from tmp_kv
where nullif(trim(key),'') is not null
order by pat_id, key
$$,$$
select distinct key from tmp_kv -- (Get error "relation "tmp_kv" does not exist" )
where nullif(trim(key),'') is not null
order by 1
$$
) as (
pat_id bigint
...
...
);
Query works if I take the WITH clause out into temporary table. But will be deploying it to production with read replicas, so need it to be working with a CTE. Is there a way?
The two queries passed as strings to the crosstab() function are separate queries.
A CTE can only be attached to a single query.
What you ask for is strictly impossible.
Since you have to spell out the (static) return type for crosstab() anyway, and the result of the query in the 2nd parameter has to match that, it's pointless to use a query with a dynamic result as 2nd parameter to begin with.

Use result of postgres CTE in function

I am having difficulty using the results from a CTE in a function. Given the following Postgres table.
CREATE TABLE directory (
id SERIAL PRIMARY KEY
, name TEXT
, parent_id INTEGER REFERENCES directory(id)
);
INSERT INTO directory (name, parent_id)
VALUES ('Root', NULL), ('D1', 1), ('D2', 2), ('D3', 3);
I have this recursive CTE that returns the descendants of a directory.
WITH RECURSIVE tree AS (
SELECT id
FROM directory
WHERE parent_id = 2
UNION ALL
SELECT directory.id
FROM directory, tree
WHERE directory.parent_id = tree.id
)
The returned values are what I expect and can be made to equal an array
SELECT (SELECT array_agg(id) FROM tree) = ARRAY[3, 4];
I can use an array to select values from the table
SELECT * FROM directory WHERE id = ANY(ARRAY[3, 4]);
However, I cannot use the results of the CTE to accomplish the same thing.
SELECT * FROM directory WHERE id = ANY(SELECT array_agg(id) FROM tree);
The resulting error indicates that there is a type mismatch.
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
However, I am unsure how to correctly accomplish this.
Use:
SELECT *
FROM directory
WHERE id = ANY(SELECT unnest(array_agg(id)) FROM tree);
See detailed explanation in this answer.
Using unnest() in a subquery is a general method for dealing with arrays:
where id = any(select unnest(some_array))
Because array_agg() and unnest() are inverse operations, the query can be as simple as:
SELECT *
FROM directory
WHERE id = ANY(SELECT id FROM tree);
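A small Python model of what the recursive CTE and the two filters compute on the sample data from the question may help. WITH RECURSIVE feeds the rows found in the previous step back into the recursive term until it produces no new rows, and unnest(array_agg(...)) simply hands back the rows array_agg collected. The tuple layout and function names are hypothetical, and the model, like the UNION ALL query itself, assumes the directory tree has no cycles.

```python
from typing import List, Optional, Tuple

# (id, name, parent_id) rows from the question's INSERT
DIRECTORY: List[Tuple[int, str, Optional[int]]] = [
    (1, "Root", None),
    (2, "D1", 1),
    (3, "D2", 2),
    (4, "D3", 3),
]


def tree(root_parent: int) -> List[int]:
    """Model of WITH RECURSIVE tree: ids of all descendants of root_parent."""
    # Non-recursive term: SELECT id FROM directory WHERE parent_id = root_parent
    working = [row_id for row_id, _, parent in DIRECTORY if parent == root_parent]
    result = list(working)
    while working:  # stops once the recursive term returns no rows
        # Recursive term: join directory against the rows of the previous step
        working = [row_id for row_id, _, parent in DIRECTORY if parent in working]
        result.extend(working)  # UNION ALL keeps every row
    return result


def array_agg(rows: List[int]) -> List[int]:
    return list(rows)


def unnest(arr: List[int]) -> List[int]:
    return list(arr)


ids = tree(2)
print(ids)  # [3, 4]
# id = ANY(SELECT unnest(array_agg(id)) FROM tree) versus id = ANY(SELECT id FROM tree)
via_unnest = [row for row in DIRECTORY if row[0] in unnest(array_agg(ids))]
direct = [row for row in DIRECTORY if row[0] in ids]
print(via_unnest == direct)  # True
```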

Join 2 sets based on default order

How do I join 2 sets of records solely based on the default order?
So if I have a table x(col(1,2,3,4,5,6,7)) and another table z(col(a,b,c,d,e,f,g))
it will return
c1 c2
-- --
1 a
2 b
3 c
4 d
5 e
6 f
7 g
Actually, I wanted to join a pair of one dimensional arrays from parameters and treat them like columns from a table.
Sample code:
CREATE OR REPLACE FUNCTION "Test"(timestamp without time zone[],
timestamp without time zone[])
RETURNS refcursor AS
$BODY$
DECLARE
curr refcursor;
BEGIN
OPEN curr FOR
SELECT DISTINCT "Start" AS x, "End" AS y, COUNT("A"."id")
FROM UNNEST($1) "Start"
INNER JOIN
(
SELECT "End", ROW_NUMBER() OVER(ORDER BY ("End")) rn
FROM UNNEST($2) "End" ORDER BY ("End")
) "End" ON ROW_NUMBER() OVER(ORDER BY ("Start")) = "End".rn
LEFT JOIN "A" ON ("A"."date" BETWEEN x AND y)
GROUP BY 1,2
ORDER BY "Start";
return curr;
END
$BODY$
Now, to answer the real question that was revealed in comments, which appears to be something like:
Given two arrays 'a' and 'b', how do I pair up their elements so I can get the element pairs as column aliases in a query?
There are a couple of ways to tackle this:
If and only if the arrays are of equal length, use multiple unnest functions in the SELECT clause (a deprecated approach that should only be used for backward compatibility);
Use generate_subscripts to loop over the arrays;
Use generate_series over subqueries against array_lower and array_upper to emulate generate_subscripts if you need to support versions too old to have generate_subscripts;
Rely on the order in which unnest returns tuples and hope the pairs line up, like in my other answer and as shown below. It'll work, but it's not guaranteed to work in future versions.
Use the WITH ORDINALITY functionality added in PostgreSQL 9.4 (see also its first posting) to get a row number for each element unnest returns.
Use multiple-array UNNEST, which is SQL-standard; PostgreSQL supports it (in the FROM clause only) from 9.4 on.
So, say we have function arraypair with array parameters a and b:
CREATE OR REPLACE FUNCTION arraypair (a integer[], b text[])
RETURNS TABLE (col_a integer, col_b text) AS $$
-- blah code here blah
$$ LANGUAGE whatever IMMUTABLE;
and it's invoked as:
SELECT * FROM arraypair( ARRAY[1,2,3,4,5,6,7], ARRAY['a','b','c','d','e','f','g'] );
possible function definitions would be:
SRF-in-SELECT (deprecated)
CREATE OR REPLACE FUNCTION arraypair (a integer[], b text[])
RETURNS TABLE (col_a integer, col_b text) AS $$
SELECT unnest(a), unnest(b);
$$ LANGUAGE sql IMMUTABLE;
This will produce bizarre and unexpected results if the arrays aren't equal in length; see the documentation on set-returning functions and their non-standard use in the SELECT list to learn why, and what exactly happens.
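The "bizarre" behaviour can be modeled in a few lines of Python. Before PostgreSQL 10, several set-returning functions in one SELECT list were cycled until they all finished at the same time, so the result had the least common multiple of their row counts; PostgreSQL 10 changed this to run them in lockstep and pad the shorter ones with NULLs. The function names below are hypothetical, and the pre-10 model deliberately covers only non-empty arrays.

```python
from itertools import zip_longest
from math import gcd
from typing import List, Optional, Sequence, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def srf_in_select_pre10(a: Sequence[A], b: Sequence[B]) -> List[Tuple[A, B]]:
    """Model of SELECT unnest(a), unnest(b) before PostgreSQL 10."""
    if not a or not b:
        raise ValueError("this model only covers non-empty arrays")
    n_rows = len(a) * len(b) // gcd(len(a), len(b))  # least common multiple
    # Each function restarts from its first element when it runs out.
    return [(a[i % len(a)], b[i % len(b)]) for i in range(n_rows)]


def srf_in_select_pg10(a: Sequence[A], b: Sequence[B]) -> List[Tuple[Optional[A], Optional[B]]]:
    """Model of the same query in PostgreSQL 10 and later: lockstep, NULL padding."""
    return list(zip_longest(a, b, fillvalue=None))


if __name__ == "__main__":
    print(srf_in_select_pre10([1, 2], ["a", "b", "c"]))  # six rows, values repeat
    print(srf_in_select_pg10([1, 2], ["a", "b", "c"]))   # three rows, one NULL
```

With arrays of equal length both models agree, which is why the SELECT-list form only looks safe when the lengths match.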
generate_subscripts
This is likely the safest option:
CREATE OR REPLACE FUNCTION arraypair (a integer[], b text[])
RETURNS TABLE (col_a integer, col_b text) AS $$
SELECT
a[i], b[i]
FROM generate_subscripts(CASE WHEN array_length(a,1) >= array_length(b,1) THEN a::text[] ELSE b::text[] END, 1) i;
$$ LANGUAGE sql IMMUTABLE;
If the arrays are of unequal length, as written it'll return null elements for the shorter, so it works like a full outer join. Reverse the sense of the case to get an inner-join like effect. The function assumes the arrays are one-dimensional and that they start at index 1. If an entire array argument is NULL then the function returns NULL.
A more generalized version would be written in PL/PgSQL and would check array_ndims(a) = 1, check array_lower(a, 1) = 1, test for null arrays, etc. I'll leave that to you.
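The subscript loop in the generate_subscripts version can be sketched in Python as below, assuming one-dimensional arrays that start at subscript 1, as the function above does. Python lists are 0-based, so the hypothetical helper `subscript` converts and returns None for an out-of-range subscript, mirroring how PostgreSQL returns NULL for a subscript past the end of an array. The `inner` flag (also hypothetical) models reversing the sense of the CASE.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def subscript(arr: Sequence[T], i: int) -> Optional[T]:
    """Model of arr[i] for a 1-based PostgreSQL array: NULL (None) if out of range."""
    return arr[i - 1] if 1 <= i <= len(arr) else None


def arraypair(a: Sequence[int], b: Sequence[str],
              inner: bool = False) -> List[Tuple[Optional[int], Optional[str]]]:
    """Model of the generate_subscripts version of arraypair."""
    # CASE WHEN array_length(a,1) >= array_length(b,1) THEN a ELSE b END picks the
    # longer array; the reversed CASE (inner=True) picks the shorter one instead.
    longer_is_a = len(a) >= len(b)
    driver = a if longer_is_a != inner else b
    # generate_subscripts(driver, 1) yields 1 .. array_length(driver, 1)
    return [(subscript(a, i), subscript(b, i)) for i in range(1, len(driver) + 1)]


if __name__ == "__main__":
    print(arraypair([1, 2, 3], ["a", "b"]))              # full-outer-join-like
    print(arraypair([1, 2, 3], ["a", "b"], inner=True))  # inner-join-like
```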
Hoping for pair-wise returns:
This isn't guaranteed to work, but does with PostgreSQL's current query executor:
CREATE OR REPLACE FUNCTION arraypair (a integer[], b text[])
RETURNS TABLE (col_a integer, col_b text) AS $$
WITH
rn_c1(rn, col) AS (
SELECT row_number() OVER (), c1.col
FROM unnest(a) c1(col)
),
rn_c2(rn, col) AS (
SELECT row_number() OVER (), c2.col
FROM unnest(b) c2(col)
)
SELECT
rn_c1.col AS c1,
rn_c2.col AS c2
FROM rn_c1
INNER JOIN rn_c2 ON (rn_c1.rn = rn_c2.rn);
$$ LANGUAGE sql IMMUTABLE;
I would consider using generate_subscripts much safer.
Multi-argument unnest:
This is the SQL-standard form. PostgreSQL's unnest did not accept multiple input arrays before 9.4; from 9.4 on it does, in the FROM clause only, and pads the shorter arrays with NULLs:
SELECT * FROM unnest(a,b);
select x.c1, z.c2
from
x
inner join
(
select
c2,
row_number() over(order by c2) rn
from z
order by c2
) z on x.c1 = z.rn
order by x.c1
If x.c1 is not 1, 2, 3, ..., you can do the same as was done with z.
The middle ORDER BY is not necessary, as pointed out by Erwin. I tested it like this:
create table t (i integer);
insert into t
select ceil(random() * 100000)
from generate_series(1, 100000);
select
i,
row_number() over(order by i) rn
from t
;
And i comes out ordered. Before running this simple test, I thought the rows might be numbered in any order.
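A rough Python model of the join above shows why it needs x.c1 to be exactly 1, 2, 3, ...: row_number() over (order by c2) numbers z's values 1..n in sorted order, and the join matches those numbers against x.c1 literally. Numbering x the same way, as suggested above, removes that requirement. Python lists stand in for the tables and both function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def join_on_row_number(x_c1: Sequence[int], z_c2: Sequence[str]) -> List[Tuple[int, str]]:
    """Model of the query above: x.c1 joined to row_number() over z."""
    # row_number() over (order by c2): number z's values 1..n in sorted order
    z_numbered = {rn: c2 for rn, c2 in enumerate(sorted(z_c2), start=1)}
    # inner join ... on x.c1 = z.rn, then order by x.c1
    return [(c1, z_numbered[c1]) for c1 in sorted(x_c1) if c1 in z_numbered]


def join_both_numbered(x_c1: Sequence[int], z_c2: Sequence[str]) -> List[Tuple[int, str]]:
    """Model of numbering x as well and joining the two row numbers."""
    x_numbered = {rn: c1 for rn, c1 in enumerate(sorted(x_c1), start=1)}
    z_numbered = {rn: c2 for rn, c2 in enumerate(sorted(z_c2), start=1)}
    return [(x_numbered[rn], z_numbered[rn]) for rn in sorted(x_numbered) if rn in z_numbered]


if __name__ == "__main__":
    print(join_on_row_number([1, 2, 3], ["c", "a", "b"]))     # pairs up
    print(join_on_row_number([10, 20, 30], ["a", "b", "c"]))  # no matches
    print(join_both_numbered([10, 20, 30], ["c", "a", "b"]))  # pairs up again
```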
By "default order" it sounds like you probably mean the order in which the rows are returned by select * from tablename without an ORDER BY.
If so, this ordering is undefined. The database can return rows in any order that it feels like. You'll find that if you UPDATE a row, it probably moves to a different position in the table.
If you're stuck in a situation where you assumed tables had an order and they don't, you can as a recovery option add a row number based on the on-disk ordering of the tuples within the table:
select row_number() OVER (ORDER BY ctid), *
from the_table
order by ctid
If the output looks right, I recommend that you CREATE TABLE a new table with an extra field, then do an INSERT INTO ... SELECT to insert the data ordered by ctid, then ALTER TABLE ... RENAME the tables and finally fix any foreign key references so they point to the new table.
ctid can be changed by UPDATE, VACUUM FULL, CLUSTER, etc., so it is not something you should ever be using in applications. I'm using it here only because it sounds like you don't have any real ordering or identifier key.
If you need to pair up rows based on their on-disk ordering (an unreliable and unsafe thing to do as noted above), you could per this SQLFiddle try:
WITH
rn_c1(rn, col) AS (
SELECT row_number() OVER (ORDER BY ctid), c1.col
FROM c1
),
rn_c2(rn, col) AS (
SELECT row_number() OVER (ORDER BY ctid), c2.col
FROM c2
)
SELECT
rn_c1.col AS c1,
rn_c2.col AS c2
FROM rn_c1
INNER JOIN rn_c2 ON (rn_c1.rn = rn_c2.rn);
but never rely on this in a production app. If you're really stuck you can use this with CREATE TABLE AS to construct a new table that you can start with when you're working on recovering data from a DB that lacks a required key, but that's about it.
The same approach given above might work with an empty window clause () instead of (ORDER BY ctid) when using sets that lack a ctid, like interim results from functions. It's even less safe then though, and should be a matter of last resort only.
(See also this newer related answer: https://stackoverflow.com/a/17762282/398670)

How to work around the "Recursive CTE member can refer itself only in FROM clause" requirement?

I'm trying to run a graph search to find all nodes accessible from a starting point, like so:
with recursive
nodes_traversed as (
select START_NODE ID
from START_POSITION
union all
select ed.DST_NODE
from EDGES ed
join nodes_traversed NT
on (NT.ID = ed.START_NODE)
and (ed.DST_NODE not in (select ID from nodes_traversed))
)
select distinct * from nodes_traversed
Unfortunately, when I try to run that, I get an error:
Recursive CTE member (nodes_traversed) can refer itself only in FROM clause.
That "not in select" clause is important to the recursive expression, though, as it provides the ending point. (Without it, you get infinite recursion.) Using generation counting, like in the accepted answer to this question, would not help, since this is a highly cyclic graph.
Is there any way to work around this without having to create a stored proc that does it iteratively?
Here is my solution, which uses a global temporary table. I limit the recursion both by level and by skipping nodes that are already in the temporary table.
I am not sure how it will perform on large data sets.
create procedure get_nodes (
START_NODE integer)
returns (
NODE_ID integer)
as
declare variable C1 integer;
declare variable C2 integer;
begin
/**
create global temporary table id_list(
id integer
);
create index id_list_idx1 ON id_list (id);
*/
delete from id_list;
while ( 1 = 1 ) do
begin
select count(distinct id) from id_list into :c1;
insert into id_list
select id from
(
with recursive nodes_traversed as (
select :START_NODE AS ID , 0 as Lv
from RDB$DATABASE
union all
select ed.DST_NODE , Lv+1
from edges ed
join nodes_traversed NT
on
(NT.ID = ed.START_NODE)
and nt.Lv < 5 -- Max recursion level
and nt.id not in (select id from id_list)
)
select distinct id from nodes_traversed);
select count(distinct id) from id_list into :c2;
if (c1 = c2) then break;
end
for select distinct id from id_list into :node_id do
begin
suspend ;
end
end
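The idea behind the procedure, repeating a depth-limited recursive pass and stopping once a pass adds no new node, is fixed-point iteration with a visited set. Below is a Python sketch of that idea on an in-memory edge list; the function name and edge format are hypothetical. One difference is deliberate: as written, each pass of the procedure appears to restart from START_NODE, which is already in id_list after the first pass, so the second pass seems to add nothing and nodes more than five hops away would be missed. The sketch therefore starts each pass from the deepest nodes found by the previous one.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def reachable(start: int, edges: Iterable[Tuple[int, int]], max_depth: int = 5) -> Set[int]:
    """All nodes reachable from start, found by repeated depth-limited passes."""
    adjacency: Dict[int, List[int]] = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)

    visited: Set[int] = {start}   # plays the role of id_list
    frontier: Set[int] = {start}  # nodes the next pass starts from
    while frontier:               # a pass that finds nothing new ends the loop (c1 = c2)
        level = frontier
        for _ in range(max_depth):  # depth limit per pass, like Lv < 5
            # One recursive step, skipping nodes already recorded (the NOT IN check),
            # which is what keeps cycles from recursing forever.
            level = {dst for node in level for dst in adjacency[node]} - visited
            if not level:
                break
            visited |= level
        frontier = level  # deepest nodes not yet expanded (empty if the pass ran out)
    return visited


if __name__ == "__main__":
    # A chain 1 -> 2 -> ... -> 8 that cycles back to 1, plus a node 9 pointing into it
    edges = [(i, i + 1) for i in range(1, 8)] + [(8, 1), (9, 1)]
    print(sorted(reachable(1, edges)))  # nodes 1..8, more than five hops deep
```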

tsql - using internal stored procedure as parameter in where clause

I'm trying to build a stored procedure that makes use of another stored procedure, taking its result and using it as part of its WHERE clause. For some reason I receive an error:
Invalid object name 'dbo.GetSuitableCategories'.
Here is a copy of the code:
select distinct top 6 * from
(
SELECT TOP 100 *
FROM [dbo].[products] products
where products.categoryId in
(select top 10 categories.categoryid from
[dbo].[GetSuitableCategories]
(
-- @Age
-- ,@Sex
-- ,@Event
1,
1,
1
) categories
ORDER BY NEWID()
)
--and products.Price <=@priceRange
ORDER BY NEWID()
)as d
union
select * from
(
select TOP 1 * FROM [dbo].[products] competingproducts
where competingproducts.categoryId =-2
--and competingproducts.Price <=@priceRange
ORDER BY NEWID()
) as d
and here is [dbo].[GetSuitableCategories] :
if (@gender =0)
begin
select * from categoryTable categories
where categories.gender =3
end
else
begin
select * from categoryTable categories
where categories.gender = @gender
or categories.gender =3
end
I would use an inline table-valued user-defined function, or simply code it inline if no re-use is required:
CREATE FUNCTION dbo.GetSuitableCategories
(
--parameters
)
RETURNS TABLE
AS
RETURN (
select * from categoryTable categories
where categories.gender IN (3, @gender)
)
Some points though:
I assume categoryTable has no gender = 0
Do you have 3 genders in your categoryTable? :-)
Why do you pass in 3 parameters but only use 1? See below, please.
Does @Sex map to @gender?
If you have extra processing on the 3 parameters, then you'll need a multi-statement table-valued function, but beware: these can be slow.
You can't use the results of a stored procedure directly in a select statement
You'll either have to output the results into a temp table, or turn the sproc into a table-valued function to do what you're doing.
I think this is valid, but I'm doing this from memory
create table #tmp (blah, blah)
Insert into #tmp
exec dbo.sprocName