I would like to do some calculations with PostgreSQL in a recursive / iterative way.
Here is a very rudimentary example: there is a line, and on the line are 2 points. In every step I want to create a new point in the middle of every pair of existing points.
(This is just an example and does not really make sense on its own, because we would get infinitely many points and eventually land on positions that coincide with existing ones.)
CREATE TABLE IF NOT EXISTS public.points
(
pos real
)
TABLESPACE pg_default;
INSERT INTO points VALUES (0),(1024);
What I want to achieve is something like this:
| pos | step | Parent A | Parent B |
| -------- | -------- | -------- | -------- |
| 0 | 1 | -1 | -1 |
| 1024 | 1 | -1 | -1 |
| 512 | 2 | 0 | 1024 |
| 256 | 3 | 0 | 512 |
| 768 | 3 | 512 | 1024 |
...
Now I have two starting points. How do I build a query to achieve such a result?
A solution without the step column would also be great, but I think the column can help me avoid doing calculations twice.
After a few tries, and with help from the article Recursive cumulative function - reuse resulting rows as input, I found this solution. Is this the way to go, or do you have a better solution?
CREATE TYPE public.points_type AS
(
pos real,
step integer,
parenta real,
parentb real
);
CREATE TYPE public.points_type_input AS
(
pos real,
step integer
);
CREATE OR REPLACE FUNCTION public.points_calc(
_points points_type_input[])
RETURNS SETOF points_type
LANGUAGE 'plpgsql'
COST 100
VOLATILE PARALLEL UNSAFE
ROWS 1000
AS $BODY$
declare
new_point points_type;
Begin
FOR new_point IN
SELECT (new_p).pos AS pos,
(new_p).stepnow +1 AS step,
(new_p).parentA AS parentA,
(new_p).parentB AS parentB
FROM (
SELECT ((p1.pos + p2.pos)/2)::real AS pos,
(array_agg(p1.pos) OVER (PARTITION BY 1)) AS oldpos,
(max(p1.step) OVER (PARTITION BY 1)) AS stepnow,
p1.pos AS parentA,
p2.pos AS parentB
FROM unnest(_points) p1
CROSS JOIN unnest(_points) p2
WHERE p1.pos < p2.pos
) new_p
WHERE NOT ARRAY[pos] <@ oldpos  -- skip midpoints that already exist
LOOP
RETURN NEXT new_point;
END LOOP;
RETURN;
END
$BODY$;
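For illustration (my own example call, not from the original post), a single expansion step over the two seed points would look like this and should return one row (512, 2, 0, 1024):
SELECT *
FROM points_calc(ARRAY[ ROW(0, 1)::points_type_input,
                        ROW(1024, 1)::points_type_input ]);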
This is the best solution I found. Is there a better way?
CREATE OR REPLACE FUNCTION points_v2(end_step integer)
RETURNS TABLE(pos real, step integer, parenta real, parentb real)
LANGUAGE plpgsql AS
$func$
DECLARE
t record;
a boolean;
cnt integer;
BEGIN
DROP TABLE IF EXISTS pg_temp.result2;
CREATE TEMP TABLE result2 (pos real, step integer, parenta real, parentb real) ON COMMIT DROP;
FOR t IN
TABLE points ORDER BY pos
LOOP
INSERT INTO result2(pos, step, parenta, parentb)
SELECT t.pos, 1 AS step, -1 AS parenta, -1 AS parentb;
END LOOP;
a = true;
cnt = 0;
WHILE a AND cnt<end_step
LOOP
a = false;
cnt = cnt + 1;
FOR t IN
SELECT (newpoints.result).pos,
(newpoints.result).step,
(newpoints.result).parenta,
(newpoints.result).parentb
FROM ( SELECT points_calc(array_agg(ROW(p.pos, p.step)::points_type_input)) AS result
FROM result2 p
) newpoints
LOOP
a = true;
INSERT INTO result2(pos, step, parenta, parentb)
SELECT t.pos, t.step, t.parenta, t.parentb;
END LOOP;
END LOOP;
RETURN QUERY
SELECT r.pos, r.step, r.parenta, r.parentb
FROM result2 r;
END
$func$;
SELECT * FROM points_v2(3);
Is there someone who can help me?
Here is a dbfiddle example.
Thank you very much, best regards, Ludwig Rahlff
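For comparison, here is a minimal sketch (my addition, not from the dbfiddle) of the same midpoint cascade written as a single recursive CTE over segments, with no PL/pgSQL function; it assumes the points table from the top of the question:
WITH RECURSIVE seg(a, b, step) AS (
    -- step 1: the single segment spanned by the two seed points
    SELECT min(pos), max(pos), 1
    FROM points
    UNION ALL
    -- every further step splits each segment at its midpoint into two halves
    SELECT CASE side WHEN 0 THEN a ELSE (a + b) / 2 END,
           CASE side WHEN 0 THEN (a + b) / 2 ELSE b END,
           step + 1
    FROM seg
    CROSS JOIN (VALUES (0), (1)) AS halves(side)
    WHERE step < 3          -- recursion limit, comparable to end_step
)
SELECT pos, 1 AS step, -1::real AS parenta, -1::real AS parentb FROM points
UNION ALL
SELECT (a + b) / 2, step + 1, a, b FROM seg
ORDER BY step, pos;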
I've been wanting a reason to try out CREATE AGGREGATE, and now I have one: Root Mean Square / Quadratic Mean. I posted some broken code that I've since corrected, based on helpful suggestions from jjanes. Here's the working setup, with my custom tools and types schemas; you could use your own.
Now that it's working, I'm finding that the custom aggregate is dramatically slower than raw SQL. The grouping field is indexed, the aggregated field is not. Is this speed difference to be expected, and can it be overcome in SQL or PL/pgSQL?
First, here's the working code:
------------------------------------------------------
-- Create compound type to pass down processing chain
------------------------------------------------------
DROP TYPE IF EXISTS types.rms_state CASCADE;
CREATE TYPE types.rms_state AS (
running_count int4,
running_sum_squares int4
);
------------------------------------------------------
-- Create the per-row function
------------------------------------------------------
DROP FUNCTION IF EXISTS tools.rms_row_function(types.rms_state, int4);
CREATE FUNCTION tools.rms_row_function (
rms_data_in types.rms_state,
value_from_row int4
)
RETURNS types.rms_state
LANGUAGE plpgsql
IMMUTABLE
STRICT
AS $BODY$
DECLARE
rms_data_out types.rms_state;
BEGIN
-- RAISE NOTICE 'rms_row_function: rms_data_in: %', rms_data_in::text;
rms_data_out.running_count := rms_data_in.running_count + 1;
rms_data_out.running_sum_squares := rms_data_in.running_sum_squares + (value_from_row ^ 2);
RETURN rms_data_out;
END;
$BODY$;
------------------------------------------------------
-- Create the final results function
------------------------------------------------------
DROP FUNCTION IF EXISTS tools.rms_result_function(types.rms_state);
CREATE FUNCTION tools.rms_result_function (
rms_data_in types.rms_state
)
RETURNS real
LANGUAGE plpgsql
IMMUTABLE
STRICT
AS $BODY$
DECLARE
rms_out real;
BEGIN
-- RAISE NOTICE 'rms_result_function: rms_data_in: %', rms_data_in::text;
IF (rms_data_in.running_count = 0) THEN
rms_out := 0;
ELSE
rms_out := (rms_data_in.running_sum_squares / rms_data_in.running_count)::real; -- both operands are int4, so this division truncates before the cast
rms_out := rms_out ^ 0.5; -- Get the square root and return it
END IF;
RETURN rms_out;
END;
$BODY$;
------------------------------------------------------
-- Create the aggregate bindings/declaration
------------------------------------------------------
CREATE AGGREGATE tools.rms (int4)
(
sfunc = tools.rms_row_function,
finalfunc = tools.rms_result_function,
stype = types.rms_state,
FINALFUNC_MODIFY = READ_WRITE,
initcond = '(0,0)' -- Reset on each group, must be a textual version of state data.
);
I'm using a field named analytic_productivity.num_inst in my example, but it could be any int4 field. Here's a stripped-down table declaration:
CREATE TABLE IF NOT EXISTS data.analytic_productivity (
id uuid NOT NULL DEFAULT NULL,
facility_id uuid NOT NULL DEFAULT NULL,
num_inst integer NOT NULL DEFAULT 0
);
The facility table is included in the query for a name lookup:
select facility.name_ as facility_name,
sqrt(avg(power(num_inst, 2))) as inst_rms, -- root mean square/quadratic mean,
rms(num_inst) as inst_rms_check
from analytic_productivity
left join facility on facility.id = analytic_productivity.facility_id
group by 1
order by 1
Below are some sample results.
+-----------------+--------------------+----------------+
| facility_name | inst_rms | inst_rms_check |
+-----------------+--------------------+----------------+
| Anderson | 5.191804567965901 | 5.0990195 |
| Baldwin North | 42.24082451064157 | 42.237423 |
| Curvey | 41.75334367003306 | 41.749252 |
| Daodge Creeek | 28.75910443926612 | 28.757608 |
| Edgards | 42.430040392954375 | 42.426407 |
+-----------------+--------------------+----------------+
I'm not alarmed about the slight difference in scores, as I'm using a real, which only supports about six significant digits. (The larger gap in the first row comes from the int4 division in the final function, which truncates the mean of the squares before the cast.)
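A hedged sketch of one thing that may narrow the gap (the function and names below are my own, not from the post): the per-row PL/pgSQL call overhead likely dominates, so the same aggregate with plain SQL transition and final functions, and a float8 state that also sidesteps the truncation noted above, may run noticeably faster; whether it fully closes the gap to the inline expression is something to measure.
CREATE FUNCTION tools.rms_sql_sfunc(state float8[], val int4)
RETURNS float8[]
LANGUAGE sql IMMUTABLE STRICT AS $$
    -- state[1] = running count, state[2] = running sum of squares
    SELECT ARRAY[state[1] + 1, state[2] + val::float8 * val];
$$;

CREATE FUNCTION tools.rms_sql_final(state float8[])
RETURNS real
LANGUAGE sql IMMUTABLE STRICT AS $$
    SELECT (CASE WHEN state[1] = 0 THEN 0
                 ELSE sqrt(state[2] / state[1]) END)::real;
$$;

CREATE AGGREGATE tools.rms_sql (int4) (
    sfunc     = tools.rms_sql_sfunc,
    finalfunc = tools.rms_sql_final,
    stype     = float8[],
    initcond  = '{0,0}'
);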
I have a table with partially consecutive integer ids, i.e. there are blocks such as 1,2,3, 6,7,8, 10, 23,24,25,26.
the gap size is dynamic
the length of the blocks is dynamic
I am racking my brain over a simple solution that selects from the table
and includes a column whose value is the first id of the respective block.
I.e. something like this
select id, first(id) over <what goes here?> first from table;
The result should look as following
| id | first |
|----|-------|
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 6 | 6 |
| 7 | 6 |
| 8 | 6 |
| 10 | 10 |
| 23 | 23 |
| 24 | 23 |
| 25 | 23 |
| 26 | 23 |
Afterwards I could use this column nicely with the PARTITION BY clause of a window function.
What I came up with so far always looked similar to this and didn't succeed:
WITH foo AS (
SELECT LAG(id) OVER (ORDER BY id) AS previous_id,
id AS id,
id - LAG(id, 1, id) OVER (ORDER BY id) AS first_in_sequence
FROM table)
SELECT *,
FIRST_VALUE(id) OVER (ORDER BY id) AS first
FROM foo
ORDER BY id;
Defining a custom postgres function would also be an acceptable solution.
Thanks for any advice,
Marti
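A minimal window-only sketch of one common approach (assuming a table tbl with an integer column id; the names are not from the question): within a block of consecutive ids, id minus its row number is constant, so that difference can serve directly as the partition key.
SELECT id,
       min(id) OVER (PARTITION BY id - rn) AS first   -- id - rn is constant per block
FROM (SELECT id, row_number() OVER (ORDER BY id) AS rn FROM tbl) sub
ORDER BY id;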
In Postgres you can create a custom aggregate. Example:
create or replace function first_in_series_func(int[], int)
returns int[] language sql immutable
as $$
select case
when $1[2] is distinct from $2 - 1 then array[$2, $2]
else array[$1[1], $2] end;
$$;
create or replace function first_in_series_final(int[])
returns int language sql immutable
as $$
select $1[1]
$$;
create aggregate first_in_series(int) (
sfunc = first_in_series_func,
finalfunc = first_in_series_final,
stype = int[]
);
Db<>fiddle.
Read in the docs: User-Defined Aggregates
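A hypothetical usage example (the table and column names are assumptions, not from the answer): since every aggregate can also be invoked as a window function, the new aggregate slots straight into the shape of the query in the question.
SELECT id,
       first_in_series(id) OVER (ORDER BY id) AS first
FROM tbl
ORDER BY id;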
Here is an idea of how this could be done. An implicit cursor is not terribly efficient, though.
create or replace function ff()
returns table (r_id integer, r_first integer)
language plpgsql as
$$
declare
running_previous integer;
running_id integer;
running_first integer := null;
begin
for running_id in select id from _table order by id loop
if running_previous is distinct from running_id - 1 then
running_first := running_id;
end if;
r_id := running_id;
r_first := running_first;
running_previous := running_id;
return next;
end loop;
end
$$;
-- test
select * from ff() as t(id, first);
I just finished writing my first PL/pgSQL function. Here is what it does.
The function attempts to reset duplicate timestamps to NULL:
1. From the table call_records, find all timestamps that are duplicated (using GROUP BY).
2. Loop through each timestamp and find all records with the same timestamp (times - 1 of them, so that only one record for a given timestamp keeps it).
3. Update the timestamp of all the records found in step 2 to NULL.
Here is how the function looks.
CREATE OR REPLACE FUNCTION nullify() RETURNS INTEGER AS $$
DECLARE
T call_records.timestamp%TYPE;
-- Not sure why row_type does not work
-- R call_records%ROWTYPE;
S integer;
CRNS bigint[];
TMPS bigint[];
sql_stmt varchar = '';
BEGIN
FOR T,S IN (select timestamp,count(timestamp) as times from call_records where timestamp IS NOT NULL group by timestamp having count(timestamp) > 1)
LOOP
sql_stmt := format('SELECT ARRAY(select plain_crn from call_records where timestamp=%s limit %s)',T,S-1);
EXECUTE sql_stmt INTO TMPS;
CRNS := array_cat(CRNS,TMPS);
END LOOP;
sql_stmt = format('update call_records set timestamp=null where plain_crn in (%s)',array_to_string(CRNS,','));
RAISE NOTICE '%',sql_stmt;
EXECUTE sql_stmt ;
RETURN 1;
END
$$ LANGUAGE plpgsql;
Help me understand the PL/pgSQL language better by suggesting how this can be done better.
@a_horse_with_no_name: here is how the DB structure looks:
\d+ call_records;
id integer primary key
plain_crn bigint
timestamp bigint
efd integer default 0
id | efd | plain_crn | timestamp
----------+------------+------------+-----------
1 | 2016062936 | 8777444059 | 14688250050095
2 | 2016062940 | 8777444080 | 14688250050095
3 | 2016063012 | 8880000000 | 14688250050020
4 | 2016043011 | 8000000000 | 14688240012012
5 | 2016013011 | 8000000001 | 14688250050020
6 | 2016022011 | 8440000001 |
Now,
select timestamp,count(timestamp) as times from call_records where timestamp IS NOT NULL group by timestamp having count(timestamp) > 1
timestamp | count
-----------------+-----------
14688250050095 | 2
14688250050020 | 2
All I want is to update the duplicate timestamps to NULL so that only one record keeps the given timestamp.
In short, the above query should then return a result like this:
select timestamp,count(timestamp) as times from call_records where timestamp IS NOT NULL group by timestamp;
timestamp | count
-----------------+-----------
14688250050095 | 1
14688250050020 | 1
You can use array variables directly (filter with the predicate = ANY()); using dynamic SQL is wrong for this purpose:
postgres=# DO $$
DECLARE x int[] = '{1,2,3}';
result int[];
BEGIN
SELECT array_agg(v)
FROM generate_series(1,10) g(v)
WHERE v = ANY(x)
INTO result;
RAISE NOTICE 'result is: %', result;
END;
$$;
NOTICE: result is: {1,2,3}
DO
Next: this is a typical void function; it doesn't return anything interesting. Usually such functions return nothing when all is OK, or raise an exception. Returning 1 with RETURN 1 is useless.
CREATE OR REPLACE FUNCTION foo(par int)
RETURNS void AS $$
BEGIN
IF EXISTS(SELECT * FROM footab WHERE id = par)
THEN
...
ELSE
RAISE EXCEPTION 'Missing data for parameter: %', par;
END IF;
END;
$$ LANGUAGE plpgsql;
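As a hedged sketch of the fully set-based alternative (my illustration, not from the answer above): the whole nullify() routine can usually be replaced by a single UPDATE that keeps one row per duplicate timestamp and nulls the rest.
UPDATE call_records c
SET    timestamp = NULL
FROM  (SELECT id,
              row_number() OVER (PARTITION BY timestamp ORDER BY id) AS rn
       FROM   call_records
       WHERE  timestamp IS NOT NULL) d
WHERE  c.id = d.id
AND    d.rn > 1;   -- every row after the first per timestamp gets NULL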
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
CREATE TABLE primes
( pos SERIAL NOT NULL PRIMARY KEY
, val INTEGER NOT NULL
, CONSTRAINT primes_alt UNIQUE (val)
);
CREATE FUNCTION is_prime(_val INTEGER)
RETURNS BOOLEAN
AS $func$
DECLARE ret BOOLEAN ;
BEGIN
SELECT False INTO ret
WHERE EXISTS (SELECT *
FROM primes ex
WHERE ex.val = $1
OR ( (ex.val * ex.val) <= $1 AND ($1 % ex.val) = 0 )
);
RETURN COALESCE(ret, True);
END;
$func$ LANGUAGE plpgsql STABLE;
CREATE VIEW vw_prime_step AS (
-- Note when the table is empty we return {2,3,1} as a bootstrap
SELECT
COALESCE(MAX(val) +2,2) AS start
, COALESCE((MAX(val) * MAX(val))-1, 3) AS stop
, COALESCE(min(val), 1) AS step
FROM primes
);
SELECT * FROM vw_prime_step;
-- The same as a function.
-- Works, but is not usable in a query that alters the primes table.
-- ; even not with the TEMP TABLE construct
CREATE FUNCTION fnc_prime_step ( OUT start INTEGER, OUT stop INTEGER, OUT step INTEGER)
RETURNS RECORD
AS $func$
BEGIN
/***
CREATE TEMP TABLE tmp_limits
ON COMMIT DROP
AS SELECT ps.start,ps.stop,ps.step FROM vw_prime_step ps
;
-- RETURN QUERY
SELECT tl.start,tl.stop,tl.step INTO $1,$2,$3
FROM tmp_limits tl
LIMIT 1
;
***/
SELECT tl.start,tl.stop,tl.step INTO $1,$2,$3
FROM vw_prime_step tl
LIMIT 1;
END;
$func$
-- Try lying ...
-- IMMUTABLE LANGUAGE plpgsql;
-- Try lying ...
Stable LANGUAGE plpgsql;
-- This works
SELECT * FROM fnc_prime_step();
INSERT INTO primes (val)
SELECT gs FROM fnc_prime_step() sss
, generate_series( 2, 3, 1 ) gs
WHERE is_prime(gs) = True
;
-- This works
SELECT * FROM fnc_prime_step();
INSERT INTO primes (val)
SELECT gs FROM fnc_prime_step() sss
, generate_series( 5, 24, 2 ) gs
WHERE is_prime(gs) = True
;
-- This does not work
-- ERROR: function expression in FROM cannot refer to other relations of same query level:1
SELECT * FROM fnc_prime_step();
INSERT INTO primes (val)
SELECT gs FROM fnc_prime_step() sss
, generate_series( sss.start, sss.stop, sss.step ) gs
WHERE is_prime(gs) = True
;
SELECT * FROM primes;
SELECT * FROM fnc_prime_step();
Of course, this question is purely hypothetical; I am not stupid enough to attempt to calculate a table of prime numbers in a DBMS. But the question remains: is there a clean way to hack around the absence of LATERAL?
As you can see, I tried a view (does not work), a function around this view (does not work either), a temp table in that function (njet), and twiddling the function's attributes.
The next step will probably be some trigger hack (but I really, really hate triggers, basically because they are invisible to the strictness of the DBMS schema).
You can use an SRF (set-returning function) in the target list, but there can be some strange corner cases. LATERAL is best.
postgres=# select i, generate_series(1,i) X from generate_series(1,3) g(i);
i | x
---+---
1 | 1
2 | 1
2 | 2
3 | 1
3 | 2
3 | 3
(6 rows)
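Applied to the failing INSERT from the question, that workaround might look like the following sketch (untested, assuming the schema above): call generate_series() in the target list of a subquery over fnc_prime_step(), then filter in an outer query.
INSERT INTO primes (val)
SELECT gs
FROM  (SELECT generate_series(sss.start, sss.stop, sss.step) AS gs
       FROM   fnc_prime_step() sss) sub
WHERE  is_prime(gs);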
I have a PostgreSQL table of this form:
base_id int | mods smallint[]
3 | {7,15,48}
I need to populate a table of this form:
combo_id int | base_id int | mods smallint[]
1 | 3 |
2 | 3 | {7}
3 | 3 | {7,15}
4 | 3 | {7,48}
5 | 3 | {7,15,48}
6 | 3 | {15}
7 | 3 | {15,48}
8 | 3 | {48}
I think I could accomplish this using a function that does almost exactly this, iterating over the first table and writing combinations to the second table:
Generate all combinations in SQL
But, I'm a Postgres novice and cannot for the life of me figure out how to do this using plpgsql. It doesn't need to be particularly fast; it will only be run periodically on the backend. The first table has approximately 80 records and a rough calculation suggests we can expect around 2600 records for the second table.
Can anybody at least point me in the right direction?
Edit: Craig: I've got PostgreSQL 9.0. I was successfully able to use UNNEST():
FOR messvar IN SELECT * FROM UNNEST(mods) AS mod WHERE mod BETWEEN 0 AND POWER(2, #n) - 1
LOOP
RAISE NOTICE '%', messvar;
END LOOP;
but then didn't know where to go next.
Edit: For reference, I ended up using Erwin's solution, with a single line added to add a null result ('{}') to each set and the special case Erwin refers to removed:
CREATE OR REPLACE FUNCTION f_combos(_arr integer[], _a integer[] DEFAULT '{}'::integer[], _z integer[] DEFAULT '{}'::integer[])
RETURNS SETOF integer[] LANGUAGE plpgsql AS
$BODY$
DECLARE
i int;
j int;
_up int;
BEGIN
IF array_length(_arr,1) > 0 THEN
_up := array_upper(_arr, 1);
IF _a = '{}' AND _z = '{}' THEN RETURN QUERY SELECT '{}'::int[]; END IF;
FOR i IN array_lower(_arr, 1) .. _up LOOP
FOR j IN i .. _up LOOP
CASE j-i
WHEN 0,1 THEN
RETURN NEXT _a || _arr[i:j] || _z;
ELSE
RETURN NEXT _a || _arr[i:i] || _arr[j:j] || _z;
RETURN QUERY SELECT *
FROM f_combos(_arr[i+1:j-1], _a || _arr[i], _arr[j] || _z);
END CASE;
END LOOP;
END LOOP;
ELSE
RETURN NEXT _arr;
END IF;
END;
$BODY$;
Then, I used that function to populate my table:
INSERT INTO e_ecosystem_modified (ide_ecosystem, modifiers)
(SELECT ide_ecosystem, f_combos(modifiers) AS modifiers FROM e_ecosystem WHERE ecosystemgroup <> 'modifier' ORDER BY ide_ecosystem, modifiers);
From 79 rows in my source table with a maximum of 7 items in the modifiers array, the query took 250ms to populate 2630 rows in my output table. Fantastic.
After I slept over it I had a completely new, simpler, faster idea:
CREATE OR REPLACE FUNCTION f_combos(_arr anyarray)
RETURNS TABLE (combo anyarray) LANGUAGE plpgsql AS
$BODY$
BEGIN
IF array_upper(_arr, 1) IS NULL THEN
combo := _arr; RETURN NEXT; RETURN;
END IF;
CASE array_upper(_arr, 1)
-- WHEN 0 THEN -- does not exist
WHEN 1 THEN
RETURN QUERY VALUES ('{}'), (_arr);
WHEN 2 THEN
RETURN QUERY VALUES ('{}'), (_arr[1:1]), (_arr), (_arr[2:2]);
ELSE
RETURN QUERY
WITH x AS (
SELECT f.combo FROM f_combos(_arr[1:array_upper(_arr, 1)-1]) f
)
SELECT x.combo FROM x
UNION ALL
SELECT x.combo || _arr[array_upper(_arr, 1)] FROM x;
END CASE;
END
$BODY$;
Call:
SELECT * FROM f_combos('{1,2,3,4,5,6,7,8,9}'::int[]) ORDER BY 1;
512 rows, total runtime: 2.899 ms
Explanation
Treat special cases with NULL and empty array.
Build combinations for a primitive array of two.
Any longer array is broken down into:
the combinations for the same array of length n-1
plus all of those combined with element n ... recursively.
Really simple, once you get it.
Works for 1-dimensional arrays starting with subscript 1 (see below).
2-3 times as fast as the old solution, and scales better.
Works for any element type again (using polymorphic types).
Includes the empty array in the result, as displayed in the question (and as #Craig pointed out to me in the comments).
Shorter, more elegant.
This assumes array subscripts starting at 1 (Default). If you are not sure about your values, call the function like this to normalize:
SELECT * FROM f_combos(_arr[array_lower(_arr, 1):array_upper(_arr, 1)]);
Not sure if there is a more elegant way to normalize array subscripts. I posted a question about that:
Normalize array subscripts for 1-dimensional array so they start with 1
Old solution (slower)
CREATE OR REPLACE FUNCTION f_combos2(_arr int[], _a int[] = '{}', _z int[] = '{}')
RETURNS SETOF int[] LANGUAGE plpgsql AS
$BODY$
DECLARE
i int;
j int;
_up int;
BEGIN
IF array_length(_arr,1) > 0 THEN
_up := array_upper(_arr, 1);
FOR i IN array_lower(_arr, 1) .. _up LOOP
FOR j IN i .. _up LOOP
CASE j-i
WHEN 0,1 THEN
RETURN NEXT _a || _arr[i:j] || _z;
WHEN 2 THEN
RETURN NEXT _a || _arr[i:i] || _arr[j:j] || _z;
RETURN NEXT _a || _arr[i:j] || _z;
ELSE
RETURN NEXT _a || _arr[i:i] || _arr[j:j] || _z;
RETURN QUERY SELECT *
FROM f_combos2(_arr[i+1:j-1], _a || _arr[i], _arr[j] || _z);
END CASE;
END LOOP;
END LOOP;
ELSE
RETURN NEXT _arr;
END IF;
END;
$BODY$;
Call:
SELECT * FROM f_combos2('{7,15,48}'::int[]) ORDER BY 1;
Works for 1-dimensional integer arrays.
This could be further optimized, but that's certainly not needed for the scope of this question.
ORDER BY to impose the order displayed in the question.
Provide for NULL or empty array, as NULL is mentioned in the comments.
Tested with PostgreSQL 9.1, but should work with any halfway modern version.
array_lower() and array_upper() have been around since at least PostgreSQL 7.4. Only the parameter defaults are new in version 8.4; they could easily be replaced.
Performance is decent.
SELECT DISTINCT * FROM f_combos('{1,2,3,4,5,6,7,8,9}'::int[]) ORDER BY 1;
511 rows, total runtime: 7.729 ms
Explanation
It builds on this simple form that only creates all combinations of neighboring elements:
CREATE FUNCTION f_combos(_arr int[])
RETURNS SETOF int[] LANGUAGE plpgsql AS
$BODY$
DECLARE
i int;
j int;
_up int;
BEGIN
_up := array_upper(_arr, 1);
FOR i in array_lower(_arr, 1) .. _up LOOP
FOR j in i .. _up LOOP
RETURN NEXT _arr[i:j];
END LOOP;
END LOOP;
END;
$BODY$;
But this will fail for sub-arrays with more than two elements. So:
For any sub-array with 3 elements, one array with just the outer two elements is added. This is a shortcut for this special case that improves performance and is not strictly needed.
For any sub-array with more than 3 elements, I take the outer two elements and fill in all combinations of inner elements built by the same function recursively.
One approach is a recursive CTE. Erwin's updated recursive function is significantly faster and scales better, though, so this is mainly useful as an interesting alternative approach; his version is much more practical.
I tried a bit-counting approach (see the end), but without a fast way to pluck arbitrary elements from an array it proved slower than either recursive approach.
Recursive CTE combinations function
CREATE OR REPLACE FUNCTION combinations(anyarray) RETURNS SETOF anyarray AS $$
WITH RECURSIVE
items AS (
SELECT row_number() OVER (ORDER BY item) AS rownum, item
FROM (SELECT unnest($1) AS item) unnested
),
q AS (
SELECT 1 AS i, $1[1:0] arr
UNION ALL
SELECT (i+1), CASE x
WHEN 1 THEN array_append(q.arr,(SELECT item FROM items WHERE rownum = i))
ELSE q.arr END
FROM generate_series(0,1) x CROSS JOIN q WHERE i <= array_upper($1,1)
)
SELECT q.arr AS mods
FROM q WHERE i = array_upper($1,1)+1;
$$ LANGUAGE 'sql';
It's a polymorphic function, so it'll work on arrays of any type.
The logic is to iterate over each item in the unnested input set, using a working table. Start with an empty array in the working table, with a generation number of 1. For each entry in the input set, insert two new arrays into the working table with an incremented generation number: one is a copy of the input array from the previous generation, and the other is that array with the (generation-number)'th item from the input set appended to it. When the generation number exceeds the number of items in the input set, return the last generation. For input {7,15}, for example, generation 1 holds {}, generation 2 holds {} and {7}, and generation 3 (the result) holds {}, {7}, {15} and {7,15}.
Usage
You can use the combinations(smallint[]) function to produce the results you desire, using it as a set-returning function in combination with the row_number() window function.
-- assuming table structure
regress=# \d comb
Table "public.comb"
Column | Type | Modifiers
---------+------------+-----------
base_id | integer |
mods | smallint[] |
SELECT base_id, row_number() OVER (ORDER BY mod) AS mod_id, mod
FROM (SELECT base_id, combinations(mods) AS mod FROM comb WHERE base_id = 3) x
ORDER BY mod;
Results
regress=# SELECT base_id, row_number() OVER (ORDER BY mod) AS mod_id, mod
regress-# FROM (SELECT base_id, combinations(mods) AS mod FROM comb WHERE base_id = 3) x
regress-# ORDER BY mod;
base_id | mod_id | mod
---------+--------+-----------
3 | 1 | {}
3 | 2 | {7}
3 | 3 | {7,15}
3 | 4 | {7,15,48}
3 | 5 | {7,48}
3 | 6 | {15}
3 | 7 | {15,48}
3 | 8 | {48}
(8 rows)
Time: 2.121 ms
Zero-element arrays produce no rows. If you want combinations('{}') to return one row {} then a UNION ALL with {} will do the job.
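For example (hypothetical call, matching the function above):
-- combinations('{}') alone returns no rows; append the empty array by hand
SELECT combinations('{}'::smallint[]) AS mods
UNION ALL
SELECT '{}'::smallint[];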
Theory
It appears you want the k-combinations for all k in a k-multicombination, rather than simple combinations. See number of combinations with repetition.
In other words, you want all k-combinations of elements from your set, for all k from 0 to n where n is the set size.
Related SO question: SQL - Find all possible combination, which has the really interesting answer about bit counting.
Bit operations exist in Pg, so a bit counting approach should be possible. You'd expect it to be more efficient, but because it's so slow to select a scattered subset of elements from an array it actually works out slower.
CREATE OR REPLACE FUNCTION bitwise_subarray(arr anyarray, elements integer)
RETURNS anyarray AS $$
SELECT array_agg($1[n+1])
FROM generate_series(0,array_upper($1,1)-1) n WHERE ($2>>n) & 1 = 1;
$$ LANGUAGE sql;
COMMENT ON FUNCTION bitwise_subarray(anyarray,integer) IS 'Return the elements from $1 where the corresponding bit in $2 is set';
CREATE OR REPLACE FUNCTION comb_bits(anyarray) RETURNS SETOF anyarray AS $$
SELECT bitwise_subarray($1, x)
FROM generate_series(0,pow(2,array_upper($1,1))::integer-1) x;
$$ LANGUAGE 'sql';
If you could find a faster way to write bitwise_subarray then comb_bits would be very fast. Like, say, a small C extension function, but I'm only crazy enough to write one of those for an SO answer.