Postgresql sequential add/remove reduction operation - postgresql

I have a table with line numbers and either a "define" or an "undefine" event of an identifier. Example:
line_no | def | undef
--------------------
1 | 'a' | NULL
2 | 'b' | NULL
3 | NULL| 'a'
...
42 | NULL| 'b'
I want to compute the "live variables" information on every line. Iteratively, I'd just write code like the following pseudocode:
live = []
for each r in table:
if r.def:
live.append(r.def)
else:
live.remove(r.undef)
store(r.line_no, live)
The expected result is a table like :
line_no | live
1. | ['a']
2. | ['a', 'b']
3. | ['b']
...
42. | []
I managed to write the equivalent sequential loop as a plpgsql function. However, I was wondering if there is a (maybe more elegant) way as a SQL query? I tried various things using lag() but somehow that never resulted in this "reduction" operation I am looking for?
(bonus points if the query can also partition over a field, such as 'filename')

If you want a pure SQL solution, use a recursive query.
with recursive cte as (
select line_no, array[def] as list
from my_table
where line_no = 1
union all
select
t.line_no,
case when def is null
then array_remove(list, undef)
else array_append(list, def)
end
from my_table t
join cte c on c.line_no = t.line_no- 1
)
select *
from cte;
However, a more efficient and flexible solution in this case may be creating a function.
create or replace function list_of_vars()
returns table (line_no int, list text[])
language plpgsql as $$
declare
arr text[];
rec record;
begin
for rec in
select *
from my_table
order by line_no
loop
if rec.def is null then
arr := array_remove(arr, rec.undef);
else
arr := array_append(arr, rec.def);
end if;
line_no := rec.line_no;
list := arr;
return next;
end loop;
end
$$;
Use:
select *
from list_of_vars()
order by line_no
Test it in db<>fiddle.

One way this can be achieved is to use a recursive query which is in the end pretty similar to the iteration you are thinking of.
with recursive lines as (
select line_no,
(case when def is not null then jsonb_build_array(def) else '[]'::jsonb end) - coalesce(undef, '') live
from the_table
where line_no = 1
union all
select c.line_no,
(p.live || case when def is not null then jsonb_build_array(c.def) else '[]'::jsonb end) - coalesce(c.undef, '')
from the_table c
join lines p on c.line_no - 1 = p.line_no
)
select *
from lines;
jsonb_build_array will happily include a NULL value, that's why we need the somewhat convoluted CASE expression to turn a single value into an array (with one or zero elements).
The - operator for jsonb removes the element on the right hand side from the array. However an null value on the right hand side, will remove all elements, hence the coalesce()
This requires the values for line_no to not have any gaps (and start with 1). If that is not a case, another step would be required to generate a gap-less number for each row (which would make this even slower)
Online example
I doubt if that is actually faster then a "proper" procedural solution.

Related

JSONB select result in one row

I need your help on this. I'm trying to achieve a query for a jsonb column information I have in a table. My jsonb is an array of objects and in every object I have two key/value pairs. In this case, I have a key/value to exclude and only get the another one key without it value. So, I figure it out how to do it like:
jsonb : '[{"track":"value","location":"value"},{"extra":"value","location":"value"},...{"another":"value","location":"value"}]'
SELECT id, jsonb_object_keys((item::jsonb - 'location')::jsonb)
FROM mytable, jsonb_array_elements(theJsonB) with ordinality arr(item,position)
WHERE offer = '0001'
This query, get me the result like
id | jsonb_object_keys
-----------------------
1 | track
1 | extra
... |
1 | another
But I need to get the result in only one row for each id like
id | column1 | column2 | ... | column+
------------------------
1 | track | extra | ... | another
2 | track | extra | ... | another
3 | track | extra | ... | another
4 | track | extra | ... | another
How I could solve this? Thanks in advance, I'm a pretty newbie in SQL but I'm working hard ;-)
If you know the list of the resulting columns only at the runtime then you need some piece of dynamic sql code.
Here is a full dynamic solution which relies on the creation of a user-defined composite type and on the standard functions jsonb_populate_record and jsonb_object_agg :
First you dynamically create the list of keys as a new composite type :
CREATE OR REPLACE PROCEDURE key_list (NewJsonB jsonb) LANGUAGE plpgsql AS
$$
DECLARE key_list text ;
BEGIN
IF NewJsonB IS NULL
THEN
SELECT string_agg(DISTINCT k.object->>'key' || ' text', ',')
INTO key_list
FROM mytable
CROSS JOIN LATERAL jsonb_path_query(theJsonB, '$[*].keyvalue()[*] ? (#.key != "location")') AS k(object) ;
ELSE SELECT string_agg(DISTINCT k.key :: text || ' text', ',')
FROM (SELECT jsonb_object_keys(to_jsonb(a.*)) AS key FROM (SELECT(null :: key_list).*) AS a
UNION ALL
SELECT jsonb_path_query(NewJsonB, '$[*].keyvalue()[*] ? (#.key != "location")')->>'key'
) AS k
INTO key_list ;
END IF ;
EXECUTE 'DROP TYPE IF EXISTS key_list ' ;
EXECUTE 'CREATE TYPE key_list AS (' || COALESCE(key_list, '') || ')' ;
END ;
$$ ;
CALL key_list(NULL) ;
Then you call the procedure key_list() by trigger when the list of keys is supposed to be modified :
CREATE OR REPLACE FUNCTION mytable_insert_update()
RETURNS trigger LANGUAGE plpgsql VOLATILE AS
$$
BEGIN
IF NOT EXISTS (SELECT jsonb_object_keys(to_jsonb(a.*)) FROM (SELECT(null :: key_list).*) AS a)
THEN CALL key_list(NULL) ;
ELSIF EXISTS ( SELECT jsonb_path_query(NEW.theJsonB, '$[*].keyvalue()[*] ? (#.key != "location")')->>'key'
EXCEPT ALL
SELECT jsonb_object_keys(to_jsonb(a.*)) FROM (SELECT(null :: key_list).*) AS a
)
THEN CALL key_list(NEW.theJsonB) ;
END IF ;
RETURN NEW ;
END ;
$$ ;
CREATE OR REPLACE TRIGGER mytable_insert_update AFTER INSERT OR UPDATE OF theJsonB ON mytable
FOR EACH ROW EXECUTE FUNCTION mytable_insert_update() ;
CREATE OR REPLACE FUNCTION mytable_delete()
RETURNS trigger LANGUAGE plpgsql VOLATILE AS
$$
BEGIN
CALL key_list (NULL) ;
RETURN OLD ;
END ;
$$ ;
CREATE OR REPLACE TRIGGER mytable_delete AFTER DELETE ON mytable
FOR EACH ROW EXECUTE FUNCTION mytable_delete() ;
Finally, you should get the expected result with the following query :
SELECT (jsonb_populate_record(NULL :: key_list, jsonb_object_agg(lower(c.object->>'key'), c.object->'key'))).*
FROM mytable AS t
CROSS JOIN LATERAL jsonb_path_query(t.theJsonB, '$[*].keyvalue()[*] ? (#.key != "location")') AS c(object)
GROUP BY t
full test result in dbfiddle.

Appending query with if-else condition in PostgreSQL

I have my function to run SELECT query with 3 condition of HAVING CLAUSE:
having sum() > 0
having sum() <= 0
dont have HAVING CLAUSE
Here is my function:
DROP function getf(arg int);
create or replace function getf(arg int)
returns table (
option_id bigint,
importQuantity bigint,
sold bigint,
remain bigint
)
as $$
begin
if arg = 1 then
return query select b.option_id, SUM(b.import_quantity)::bigint as importQuantity, SUM(b.sold_quantity)::bigint as sold, SUM(b.remaining_quantity)::bigint as remain from batch b where b.product_id = 220 and b.option_id in (select o.id from "option" o where o.barcode like '%%' or o.barcode is null) group by b.option_id having sum(b.remaining_quantity) > 0;
elsif arg = 2 then
return query select b.option_id, SUM(b.import_quantity)::bigint as importQuantity, SUM(b.sold_quantity)::bigint as sold, SUM(b.remaining_quantity)::bigint as remain from batch b where b.product_id = 220 and b.option_id in (select o.id from "option" o where o.barcode like '%%' or o.barcode is null) group by b.option_id having sum(b.remaining_quantity) <= 0;
elsif arg = 3 then
return query select b.option_id, SUM(b.import_quantity)::bigint as importQuantity, SUM(b.sold_quantity)::bigint as sold, SUM(b.remaining_quantity)::bigint as remain from batch b where b.product_id = 220 and b.option_id in (select o.id from "option" o where o.barcode like '%%' or o.barcode is null) group by b.option_id;
end if;
end; $$ language plpgsql;
And I call my function:
select getf(3);
Question
The function work fine. But SELECT query only different at HAVING CLAUSE
How can I use dynamic query to appending HAVING with if-else condition?
Since you need to change the structure of the query not just replace a parameter value within the query you need dynamic SQL. But first lets put on a diet. That is remove unnecessary parts.
b.option_id in (select o.id from "option" o where o.barcode like '%%' or o.barcode is null)
This is unnecessary the WHERE clause contains a tautology (a statement that in always true). Why is this. Well the predicate o.barcode like '%%' actually check if the barcode contains 0 or more characters. Only NULL makes this false, but in that case the other predicate barcode is NULL will evaluate true,so the overall condition is always true. As a result the sub-query always returns as ALL ids in options table and b.option_id is always in the list. (That of course assumes batch.option_id is properly defined as not null and a FK to options).
Now lets consider that HAVING clause. The first two are trivial replacements, just replace the rvalue with '< 0' or '>= 0'. The third option presents a problem, as you need to remove the entire clause rather change the rvalue. Its not all that difficult, however it also can be turned into a simple rvalue change. The HAVING predicate simply needs to evaluate true and we accomplish the same thing. That is achieved by replacing it by 'is not null'. Finally we can create an array of the rvalue replacements avoid any if then logic on the parameter (arg). Other that validating it contains a valid value. So: see demo
create or replace function getf(arg int)
returns table (option_id bigint
,importQuantity bigint
,sold bigint
,remain bigint
)
language plpgsql
as $$
declare
k_having_arg constant text[] = array['> 0','<= 0', 'is not null']; -- Replacement values for RVALUE below
k_base_query constant text =
'select b.option_id'
', SUM(b.import_quantity)::bigint '
', SUM(b.sold_quantity)::bigint '
', SUM(b.remaining_quantity)::bigint '
' from batch b'
' where b.product_id = 220'
' group by b.option_id '
' having sum(b.remaining_quantity) %s; '; -- expressions RVALUE to be replaces
l_exec_query text;
begin
if arg not between 1 and 3 then
raise exception 'Invalid Arg value (%) out of range, Must be 1, 2, or 3.', arg;
end if;
l_exec_query = format (k_base_query,k_having_arg[arg]);
raise notice E'Running query\n%',l_exec_query;
return query execute l_exec_query;
end;
$$;
try using with CTE clause. you can have your query (till group by) in with clause.
Then use can use your if conditions and select from with CTE and and append having clause.
For reference you can check
How to declare a variable in a PostgreSQL query

search for multiple values in comma separated column (postgresql)

I need to fetch all records where these (5565bffd-b1c8-4556-ae5d-4bef61af48f5","5565bffd-cd78-4e6f-ae13-4bef61af48f5) values exists in categories_id column.
The value that I need to search in categories_id can be multiple because it coming from the form.
+--------------------------------------+-------------+-------------+-------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | name | alias | description | categories_id |
+--------------------------------------+-------------+-------------+-------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 5565c08d-9f18-4b76-9cae-4a8261af48f5 | Honeycolony | honeycolony | null | ["5565bffd-7f64-494c-8950-4bef61af48f5"] |
+--------------------------------------+-------------+-------------+-------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| c8f16660-32cf-11e6-b73c-1924f891ba4d | LOFT | loft | null | ["5565bffd-25bc-4b09-8a83-4bef61af48f5","5565bffd-b1c8-4556-ae5d-4bef61af48f5","5565bffd-cd78-4e6f-ae13-4bef61af48f5"] |
+--------------------------------------+-------------+-------------+-------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 5565c17f-80d8-4390-aadf-4a8061af48f5 | Fawn Shoppe | fawn-shoppe | null | ["5565bffd-25bc-4b09-8a83-4bef61af48f5","5565bffd-0744-4740-81f5-4bef61af48f5","5565bffd-b1c8-4556-ae5d-4bef61af48f5","5565bffd-cd78-4e6f-ae13-4bef61af48f5"] |
+--------------------------------------+-------------+-------------+-------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
I have this function which work as in_array function php.
CREATE OR REPLACE FUNCTION public.arraycontain(
x json,
y json)
RETURNS boolean
LANGUAGE 'plpgsql'
COST 100
VOLATILE
AS $BODY$
DECLARE a text;b text;
BEGIN
FOR a IN SELECT json_array_elements_text($1)
LOOP
FOR b IN SELECT json_array_elements_text($2)
LOOP
IF a = b THEN
RETURN TRUE;
END IF;
END LOOP;
END LOOP;
RETURN FALSE ;
END;
$BODY$;
ALTER FUNCTION public.arraycontain(json, json)
OWNER TO postgres;
But when I do this:
select * from "stores"
where arrayContain(stores.categories_id::JSON,'["5565bffd-b1c8-4556-ae5d-4bef61af48f5","5565bffd-cd78-4e6f-ae13-4bef61af48f5"]')
it shows
ERROR: invalid input syntax for type json
DETAIL: The input string
ended unexpectedly.
CONTEXT: JSON data, line 1: SQL state: 22P02
here is the sqlfiddle (I couldn't update the arraycontain function in fiddle.)
My expected output from the fiddle is it should return last 3 rows that is Furbish Studio,Fawn Shoppe AND LOFT if search using this values ["5565bffd-b1c8-4556-ae5d-4bef61af48f5","5565bffd-cd78-4e6f-ae13-4bef61af48f5"])
I am open for any recommendation.
I also tried this query below but it returns empty.
select id
from stores
where string_to_array(categories_id,',') && array['5565bffd-cd78-4e6f-ae13-4bef61af48f5','5565bffd-b1c8-4556-ae5d-4bef61af48f5'];
EDIT:
This code is actually a filter to filter data. So if I only filter using categories it didn't work but if there is a query before it it works
select * from "stores"
where name like '%ab%' and arrayContain(stores.categories_id::JSON,'["5565bffd-b1c8-4556-ae5d-4bef61af48f5","5565bffd-cd78-4e6f-ae13-4bef61af48f5"]')
also the thing that amaze me is that the '%ab%' must contain more than two character if there's below <2 it will throw error. what could be wrong.
Click: demo:db<>fiddle
You can use the ?| operator, which takes a jsonb array (your column in that case) and checks a text array if any elements are included:
SELECT
*
FROM
mytable
WHERE categories_id ?| '{5565bffd-b1c8-4556-ae5d-4bef61af48f5,5565bffd-cd78-4e6f-ae13-4bef61af48f5}'
If your categories_id is not of type json (which is what the error message says) but a simple text array, you can compare two text arrays directly using the && operator:
Click: demo:db<>fiddle
SELECT
*
FROM
mytable
WHERE categories_id && '{5565bffd-b1c8-4556-ae5d-4bef61af48f5,5565bffd-cd78-4e6f-ae13-4bef61af48f5}'
Your code seems to work fine so perhaps sqlfiddle is the problem.
Try changing the separator in the schema building part to / (instead of ;) and make sure you have the correct version for Postgresql. json_array_elements_text is not supported in 9.3 (you can use json_array_elements instead in this case).
Also skip the " in the select statement.
Look here http://sqlfiddle.com/#!17/918b75/1
There might be an error in your data. Perhaps categories_id is an empty string somewhere.
Try this to see the offending data if any.
do $$
declare
r record;
b boolean;
begin
for r in (select * from stores) loop
b:= arrayContain(r.categories_id::JSON,'["5565bffd-b1c8-4556-ae5d-4bef61af48f5","5565bffd-cd78-4e6f-ae13-4bef61af48f5"]') ;
end loop;
exception
when others
then raise notice '%,%',r,r.categories_id;
return;
end;
$$
Best regards.
Bjarni

Loop through columns of RECORD

I need to loop through type RECORD items by key/index, like I can do this using array structures in other programming languages.
For example:
DECLARE
data1 record;
data2 text;
...
BEGIN
...
FOR data1 IN
SELECT
*
FROM
sometable
LOOP
FOR data2 IN
SELECT
unnest( data1 ) -- THIS IS DOESN'T WORK!
LOOP
RETURN NEXT data1[data2]; -- SMTH LIKE THIS
END LOOP;
END LOOP;
As #Pavel explained, it is not simply possible to traverse a record, like you could traverse an array. But there are several ways around it - depending on your exact requirements. Ultimately, since you want to return all values in the same column, you need to cast them to the same type - text is the obvious common ground, because there is a text representation for every type.
Quick and dirty
Say, you have a table with an integer, a text and a date column.
CREATE TEMP TABLE tbl(a int, b text, c date);
INSERT INTO tbl VALUES
(1, '1text', '2012-10-01')
,(2, '2text', '2012-10-02')
,(3, ',3,ex,', '2012-10-03') -- text with commas
,(4, '",4,"ex,"', '2012-10-04') -- text with commas and double quotes
Then the solution can be a simple as:
SELECT unnest(string_to_array(trim(t::text, '()'), ','))
FROM tbl t;
Works for the first two rows, but fails for the special cases of row 3 and 4.
You can easily solve the problem with commas in the text representation:
SELECT unnest(('{' || trim(t::text, '()') || '}')::text[])
FROM tbl t
WHERE a < 4;
This would work fine - except for line 4 which has double quotes in the text representation. Those are escaped by doubling them up. But the array constructor would need them escaped by \. Not sure why this incompatibility is there ...
SELECT ('{' || trim(t::text, '()') || '}') FROM tbl t WHERE a = 4
Yields:
{4,""",4,""ex,""",2012-10-04}
But you would need:
SELECT '{4,"\",4,\"ex,\"",2012-10-04}'::text[]; -- works
Proper solution
If you knew the column names beforehand, a clean solution would be simple:
SELECT unnest(ARRAY[a::text,b::text,c::text])
FROM tbl
Since you operate on records of well know type you can just query the system catalog:
SELECT string_agg(a.attname || '::text', ',' ORDER BY a.attnum)
FROM pg_catalog.pg_attribute a
WHERE a.attrelid = 'tbl'::regclass
AND a.attnum > 0
AND a.attisdropped = FALSE
Put this in a function with dynamic SQL:
CREATE OR REPLACE FUNCTION unnest_table(_tbl text)
RETURNS SETOF text LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE '
SELECT unnest(ARRAY[' || (
SELECT string_agg(a.attname || '::text', ',' ORDER BY a.attnum)
FROM pg_catalog.pg_attribute a
WHERE a.attrelid = _tbl::regclass
AND a.attnum > 0
AND a.attisdropped = false
) || '])
FROM ' || _tbl::regclass;
END
$func$;
Call:
SELECT unnest_table('tbl') AS val
Returns:
val
-----
1
1text
2012-10-01
2
2text
2012-10-02
3
,3,ex,
2012-10-03
4
",4,"ex,"
2012-10-04
This works without installing additional modules. Another option is to install the hstore extension and use it like #Craig demonstrates.
PL/pgSQL isn't really designed for what you want to do. It doesn't consider a record to be iterable, it's a tuple of possibly different and incompatible data types.
PL/pgSQL has EXECUTE for dynamic SQL, but EXECUTE queries cannot refer to PL/pgSQL variables like NEW or other records directly.
What you can do is convert the record to a hstore key/value structure, then iterate over the hstore. Use each(hstore(the_record)), which produces a rowset of key,value tuples. All values are cast to their text representations.
This toy function demonstrates iteration over a record by creating an anonymous ROW(..) - which will have column names f1, f2, f3 - then converting that to hstore, iterating over its column/value pairs, and returning each pair.
CREATE EXTENSION hstore;
CREATE OR REPLACE FUNCTION hs_demo()
RETURNS TABLE ("key" text, "value" text)
LANGUAGE plpgsql AS
$$
DECLARE
data1 record;
hs_row record;
BEGIN
data1 = ROW(1, 2, 'test');
FOR hs_row IN SELECT kv."key", kv."value" FROM each(hstore(data1)) kv
LOOP
"key" = hs_row."key";
"value" = hs_row."value";
RETURN NEXT;
END LOOP;
END;
$$;
In reality you would never write it this way, since the whole loop can be replaced with a simple RETURN QUERY statement and it does the same thing each(hstore) does anyway - so this is only to show how each(hstore(record)) works, and the above function should never actually be used.
This feature is not supported in plpgsql - Record IS NOT hash array like other scripting languages - it is similar to C or ADA, where this functionality is impossible. You can use other PL language like PLPerl or PLPython or some tricks - you can iterate with HSTORE datatype (extension) or via dynamic SQL
see How to set value of composite variable field using dynamic SQL
But request for this functionality usually means, so you do some wrong. When you use PL/pgSQL you have think different than you use Javascript or Python
FOR data2 IN
SELECT d
from unnest( data1 ) s(d)
LOOP
RETURN NEXT data2;
END LOOP;
If you order your results prior to looping, will you accomplish what you want.
for rc in select * from t1 order by t1.key asc loop
return next rc;
end loop;
will do exactly what you need. It is also the fastest way to perform that kind of task.
I wasn't able to find a proper way to loop over record, so what I did is converted record to json first and looped over json
declare
_src_schema varchar := 'db_utility';
_targetjson json;
_key text;
_value text;
BEGIN
select row_to_json(c.*) from information_schema.columns c where c.table_name = prm_table and c.column_name = prm_column
and c.table_schema = _src_schema into _targetjson;
raise notice '_targetjson %', _targetjson;
FOR _key, _value IN
SELECT * FROM jsonb_each_text(_targetjson)
LOOP
-- do some math operation on its corresponding value
RAISE NOTICE '%: %', _key, _value;
END LOOP;
return true;
end;

How to write combinatorics function in postgres?

I have a PostgreSQL table of this form:
base_id int | mods smallint[]
3 | {7,15,48}
I need to populate a table of this form:
combo_id int | base_id int | mods smallint[]
1 | 3 |
2 | 3 | {7}
3 | 3 | {7,15}
4 | 3 | {7,48}
5 | 3 | {7,15,48}
6 | 3 | {15}
7 | 3 | {15,48}
8 | 3 | {48}
I think I could accomplish this using a function that does almost exactly this, iterating over the first table and writing combinations to the second table:
Generate all combinations in SQL
But, I'm a Postgres novice and cannot for the life of me figure out how to do this using plpgsql. It doesn't need to be particularly fast; it will only be run periodically on the backend. The first table has approximately 80 records and a rough calculation suggests we can expect around 2600 records for the second table.
Can anybody at least point me in the right direction?
Edit: Craig: I've got PostgreSQL 9.0. I was successfully able to use UNNEST():
FOR messvar IN SELECT * FROM UNNEST(mods) AS mod WHERE mod BETWEEN 0 AND POWER(2, #n) - 1
LOOP
RAISE NOTICE '%', messvar;
END LOOP;
but then didn't know where to go next.
Edit: For reference, I ended up using Erwin's solution, with a single line added to add a null result ('{}') to each set and the special case Erwin refers to removed:
CREATE OR REPLACE FUNCTION f_combos(_arr integer[], _a integer[] DEFAULT '{}'::integer[], _z integer[] DEFAULT '{}'::integer[])
RETURNS SETOF integer[] LANGUAGE plpgsql AS
$BODY$
DECLARE
i int;
j int;
_up int;
BEGIN
IF array_length(_arr,1) > 0 THEN
_up := array_upper(_arr, 1);
IF _a = '{}' AND _z = '{}' THEN RETURN QUERY SELECT '{}'::int[]; END IF;
FOR i IN array_lower(_arr, 1) .. _up LOOP
FOR j IN i .. _up LOOP
CASE j-i
WHEN 0,1 THEN
RETURN NEXT _a || _arr[i:j] || _z;
ELSE
RETURN NEXT _a || _arr[i:i] || _arr[j:j] || _z;
RETURN QUERY SELECT *
FROM f_combos(_arr[i+1:j-1], _a || _arr[i], _arr[j] || _z);
END CASE;
END LOOP;
END LOOP;
ELSE
RETURN NEXT _arr;
END IF;
END;
$BODY$
Then, I used that function to populate my table:
INSERT INTO e_ecosystem_modified (ide_ecosystem, modifiers)
(SELECT ide_ecosystem, f_combos(modifiers) AS modifiers FROM e_ecosystem WHERE ecosystemgroup <> 'modifier' ORDER BY ide_ecosystem, modifiers);
From 79 rows in my source table with a maximum of 7 items in the modifiers array, the query took 250ms to populate 2630 rows in my output table. Fantastic.
After I slept over it I had a completely new, simpler, faster idea:
CREATE OR REPLACE FUNCTION f_combos(_arr anyarray)
RETURNS TABLE (combo anyarray) LANGUAGE plpgsql AS
$BODY$
BEGIN
IF array_upper(_arr, 1) IS NULL THEN
combo := _arr; RETURN NEXT; RETURN;
END IF;
CASE array_upper(_arr, 1)
-- WHEN 0 THEN -- does not exist
WHEN 1 THEN
RETURN QUERY VALUES ('{}'), (_arr);
WHEN 2 THEN
RETURN QUERY VALUES ('{}'), (_arr[1:1]), (_arr), (_arr[2:2]);
ELSE
RETURN QUERY
WITH x AS (
SELECT f.combo FROM f_combos(_arr[1:array_upper(_arr, 1)-1]) f
)
SELECT x.combo FROM x
UNION ALL
SELECT x.combo || _arr[array_upper(_arr, 1)] FROM x;
END CASE;
END
$BODY$;
Call:
SELECT * FROM f_combos('{1,2,3,4,5,6,7,8,9}'::int[]) ORDER BY 1;
512 rows, total runtime: 2.899 ms
Explain
Treat special cases with NULL and empty array.
Build combinations for a primitive array of two.
Any longer array is broken down into:
the combinations for same array of length n-1
plus all of those combined with element n .. recursively.
Really simple, once you got it.
Works for 1-dimensional integer arrays starting with subscript 1 (see below).
2-3 times as fast as old solution, scales better.
Works for any element type again (using polymorphic types).
Includes the empty array in the result as is displayed in the question (and as #Craig pointed out to me in the comments).
Shorter, more elegant.
This assumes array subscripts starting at 1 (Default). If you are not sure about your values, call the function like this to normalize:
SELECT * FROM f_combos(_arr[array_lower(_arr, 1):array_upper(_arr, 1)]);
Not sure if there is a more elegant way to normalize array subscripts. I posted a question about that:
Normalize array subscripts for 1-dimensional array so they start with 1
Old solution (slower)
CREATE OR REPLACE FUNCTION f_combos2(_arr int[], _a int[] = '{}', _z int[] = '{}')
RETURNS SETOF int[] LANGUAGE plpgsql AS
$BODY$
DECLARE
i int;
j int;
_up int;
BEGIN
IF array_length(_arr,1) > 0 THEN
_up := array_upper(_arr, 1);
FOR i IN array_lower(_arr, 1) .. _up LOOP
FOR j IN i .. _up LOOP
CASE j-i
WHEN 0,1 THEN
RETURN NEXT _a || _arr[i:j] || _z;
WHEN 2 THEN
RETURN NEXT _a || _arr[i:i] || _arr[j:j] || _z;
RETURN NEXT _a || _arr[i:j] || _z;
ELSE
RETURN NEXT _a || _arr[i:i] || _arr[j:j] || _z;
RETURN QUERY SELECT *
FROM f_combos2(_arr[i+1:j-1], _a || _arr[i], _arr[j] || _z);
END CASE;
END LOOP;
END LOOP;
ELSE
RETURN NEXT _arr;
END IF;
END;
$BODY$;
Call:
SELECT * FROM f_combos2('{7,15,48}'::int[]) ORDER BY 1;
Works for 1-dimensional integer arrays.
This could be further optimized, but that's certainly not needed for the scope of this question.
ORDER BY to impose the order displayed in the question.
Provide for NULL or empty array, as NULL is mentioned in the comments.
Tested with PostgreSQL 9.1, but should work with any halfway modern version.
array_lower() and array_upper() have been around for at least since PostgreSQL 7.4. Only parameter defaults are new in version 8.4. Could easily be replaced.
Performance is decent.
SELECT DISTINCT * FROM f_combos('{1,2,3,4,5,6,7,8,9}'::int[]) ORDER BY 1;
511 rows, total runtime: 7.729 ms
Explanation
It builds on this simple form that only creates all combinations of neighboring elements:
CREATE FUNCTION f_combos(_arr int[])
RETURNS SETOF int[] LANGUAGE plpgsql AS
$BODY$
DECLARE
i int;
j int;
_up int;
BEGIN
_up := array_upper(_arr, 1);
FOR i in array_lower(_arr, 1) .. _up LOOP
FOR j in i .. _up LOOP
RETURN NEXT _arr[i:j];
END LOOP;
END LOOP;
END;
$BODY$;
But this will fail for sub-arrays with more than two elements. So:
For any sub-array with 3 elements one array with just the outer two elements is added. this is a shortcut for this special case that improves performance and is not strictly needed.
For any sub-array with more than 3 elements I take the outer two elements and fill in with all combinations of inner elements built by the same function recursively.
One approach is with a recursive CTE. Erwin's updated recursive function is significantly faster and scales better, though, so this is really useful as an interesting different approach. Erwin's updated version is much more practical.
I tried a bit counting approach (see the end) but without a fast way to pluck arbitrary elements from an array it proved slower then either recursive approach.
Recursive CTE combinations function
CREATE OR REPLACE FUNCTION combinations(anyarray) RETURNS SETOF anyarray AS $$
WITH RECURSIVE
items AS (
SELECT row_number() OVER (ORDER BY item) AS rownum, item
FROM (SELECT unnest($1) AS item) unnested
),
q AS (
SELECT 1 AS i, $1[1:0] arr
UNION ALL
SELECT (i+1), CASE x
WHEN 1 THEN array_append(q.arr,(SELECT item FROM items WHERE rownum = i))
ELSE q.arr END
FROM generate_series(0,1) x CROSS JOIN q WHERE i <= array_upper($1,1)
)
SELECT q.arr AS mods
FROM q WHERE i = array_upper($1,1)+1;
$$ LANGUAGE 'sql';
It's a polymorphic function, so it'll work on arrays of any type.
The logic is to iterate over each item in the unnested input set, using a working table. Start with an empty array in the working table, with a generation number of 1. For each entry in the input set insert two new arrays into the working table with an incremented generation number. One of the two is a copy of the input array from the previous generation and the other is the input array with the (generation-number)'th item from the input set appended to it. When the generation number exceeds the number of items in the input set, return the last generation.
Usage
You can use the combinations(smallint[]) function to produce the results you desire, using it as a set-returning function in combinatin with the row_number window function.
-- assuming table structure
regress=# \d comb
Table "public.comb"
Column | Type | Modifiers
---------+------------+-----------
base_id | integer |
mods | smallint[] |
SELECT base_id, row_number() OVER (ORDER BY mod) AS mod_id, mod
FROM (SELECT base_id, combinations(mods) AS mod FROM comb WHERE base_id = 3) x
ORDER BY mod;
Results
regress=# SELECT base_id, row_number() OVER (ORDER BY mod) AS mod_id, mod
regress-# FROM (SELECT base_id, combinations(mods) AS mod FROM comb WHERE base_id = 3) x
regress-# ORDER BY mod;
base_id | mod_id | mod
---------+--------+-----------
3 | 1 | {}
3 | 2 | {7}
3 | 3 | {7,15}
3 | 4 | {7,15,48}
3 | 5 | {7,48}
3 | 6 | {15}
3 | 7 | {15,48}
3 | 8 | {48}
(8 rows)
Time: 2.121 ms
Zero element arrays produce a null result. If you want combinations({}) to return one row {} then a UNION ALL with {} will do the job.
Theory
It appears you want the k-combinations for all k in a k-multicombination, rather than simple combinations. See number of combinations with repetition.
In other words, you want all k-combinations of elements from your set, for all k from 0 to n where n is the set size.
Related SO question: SQL - Find all possible combination, which has the really interesting answer about bit counting.
Bit operations exist in Pg, so a bit counting approach should be possible. You'd expect it to be more efficient, but because it's so slow to select a scattered subset of elements from an array it actually works out slower.
CREATE OR REPLACE FUNCTION bitwise_subarray(arr anyarray, elements integer)
RETURNS anyarray AS $$
SELECT array_agg($1[n+1])
FROM generate_series(0,array_upper($1,1)-1) n WHERE ($2>>n) & 1 = 1;
$$ LANGUAGE sql;
COMMENT ON FUNCTION bitwise_subarray(anyarray,integer) IS 'Return the elements from $1 where the corresponding bit in $2 is set';
CREATE OR REPLACE FUNCTION comb_bits(anyarray) RETURNS SETOF anyarray AS $$
SELECT bitwise_subarray($1, x)
FROM generate_series(0,pow(2,array_upper($1,1))::integer-1) x;
$$ LANGUAGE 'sql';
If you could find a faster way to write bitwise_subarray then comb_bits would be very fast. Like, say, a small C extension function, but I'm only crazy enough to write one of those for an SO answer.