How can i generate a linestring with a sql question from this source of numbers, where all coordinate items are separated with comma:
[16.49422,48.8011,16.49432,48.8012,16.49441,48.80127,16.49451,48.80131,16.49464,48.80135,16.49471,48.80139]
Linestring should be separated by each second number with comma.
LINESTRING(16.49422 48.8011,1 6.49432 48.8012, ... )
I would probably create a function for this, as it makes the actual SQL that has to convert the JSON array to the "string" a lot easier to deal with:
create function jsonb_array_to_linestring(p_input jsonb)
returns text
as
$$
declare
l_num_elements int;
l_idx int;
l_result text;
begin
l_num_elements := jsonb_array_length(p_input);
if l_num_elements = 2 then
return 'point('||(p_input ->> 0)||' '||(p_input ->> 1)||')';
end if;
l_result := 'linestring(';
for l_idx in 0 .. l_num_elements - 2 by 2 loop
l_result := l_result || (p_input ->> l_idx) || ' ' || (p_input ->> l_idx + 1);
if l_idx < l_num_elements - 2 then
l_result := l_result || ',' ;
end if;
end loop;
l_result := l_result || ')';
return l_result;
end;
$$
language plpgsql;
Then you can use it like this:
select id, jsonb_array_to_linestring(input)
from test;
Online example
This assumes your column is defined as jsonb (which it should be). If you are using json instead, you need to adjust the code to that.
step-by-step demo:db<>fiddle
SELECT
st_makeline(point order by index) -- 6
FROM (
SELECT
ceil(index::numeric / 2) as index, -- 3
st_makepoint( -- 5
MAX(value) FILTER (WHERE index % 2 = 1), -- 4
MAX(value) FILTER (WHERE index % 2 = 0)
) as point
FROM
unnest( -- 1
ARRAY[16.49422,48.8011,16.49432,48.8012,16.49441,48.80127,16.49451,48.80131,16.49464,48.80135,16.49471,48.80139
]) WITH ORDINALITY elements(value, index) -- 2
GROUP BY 1 -- 3
) s
Extract all array elements into one record per element
This adds an index to the elements to store their position within the original index
Calculate the coordinate pairs. Here I used ceil(... / 2) to generate the same value for indexes: old index pair (1, 2) -> new index 1, (3, 4) -> 2, (5, 6) -> 3, ... This can be used for grouping
Now we use a conditional aggregation to create two columns: One for the odd indexes and one for the even, to generate a X/Y pair for your coordinates.
These pairs can be used to create a Point
Afterwards all Points can be merged into one line. To ensure the correct order, we use the recently created pair index.
If your input is not a normal Postgres array but a JSON array, you have to change 2 things (demo:db<>fiddle):
In (1): unnest(...) to json_array_elements_text(...)
In (4): MAX(value) to MAX(value::numeric)
Related
The goal is a custom aggregate function made with CREATE AGGREGATE called string_agg_oxford; it is an aggregate function that works similar to string_agg except it is smart enough to know how many items it is aggregating so that it can place "and" in front of the last item.
So where string_agg(items, ', ') would return "item1, item2, item3", string_agg_oxford(items) will return "item1, item2, and item3".
My failed attempt starts with a type for our accumulator that includes the total number of rows and the index for the current row:
CREATE TYPE oxford_accumulator as (
row_count numeric,
i numeric,
acc text
);
Now we need our accumulator function:
CREATE OR REPLACE FUNCTION oxford_acc (acc oxford_accumulator, curr text)
RETURNS oxford_accumulator
LANGUAGE PLPGSQL
AS $$
BEGIN
IF acc.i + 1 = acc.row_count THEN
RETURN (acc.row_count, acc.i + 1, acc.acc || curr);
END IF;
IF (acc.i + 2 = acc.row_count) AND (acc.row_count = 2) THEN
RETURN (acc.row_count, acc.i + 1, acc.acc || curr || ' and ');
END IF;
IF (i + 2 = acc.row_count) THEN
RETURN (acc.row_count, acc.i + 1, acc.acc || curr || ', and ');
END IF;
RETURN (acc.row_count, acc.i + 1, acc.acc || curr || ', ');
END;
$$;
because the accumulator has swallowed up the total count and the index we have to release this information when the accumulator is finished with an ffunc.
CREATE OR REPLACE FUNCTION oxford_final (acc oxford_accumulator)
RETURNS text
LANGUAGE PLPGSQL
AS $$
BEGIN
RETURN acc.acc;
END;
$$;
My idea falls apart here where we need to wire it all up because there does not seem to be a way to parametrize the total row count... so fail.
CREATE OR REPLACE AGGREGATE string_agg_oxford (text, row_count numeric) (
INITCOND = (row_count, 0, ''),
-- ^^^ fail
STYPE = oxford_accumulator,
SFUNC = oxford_acc,
FINALFUNC = oxford_final
);
I know something similar can be achieved with a regular function, but I'm not ready to give up yet if there's a way to do this as an aggregator that could be used in a select statement like SELECT string_agg_oxford(clients.full_name) FROM matters GROUP BY matters.matter_id;
Right, you don't know you are done until you are done. You would have to make two passes over the data, one to get the count and the other to construct the string. That wouldn't be very efficient, and there is no obvious way to make PostgreSQL do it that way that could be used as a simple aggregate. I think you could use a window function to get the total count, then an aggregate which takes two arguments (string and total count), but I think you would need to write that part in an "outer query", otherwise it wouldn't have access to the total count.
The straightfoward way to do this would be to use the same accumulator and transition function as string_agg(x, ', '), but have your finalizer function just peel the last value out of the accumulator, and tack it back on with an ', and '.
Or you could define an accumulator with (acc text, prev text) and have the transition function tack the prev into acc (if prev isn't null) then store its current value into prev. Then have the finalizer tack the final prev into acc with the ', and ' separator.
Initial condition must be "a string constant in the form accepted for the data type state_data_type" so no way to pass a dynamic value. Credit to #a_horse_with_no_name
For efficiency reasons, it would be easier to "peel the last value out of the accumulator." Credit to #jjanes.
Lets use a unique separator to alleviate any issues matching on ", " because that is fairly common - so we'll use a unique separator and then replace those in the ffunc as is appropriate.
Postgres does not support having capture groups within its lookahead assertions, so instead of a fancy regular expression to find the last occurrence of our unique separator, we will reverse the string and tackle things upside down.
CREATE OR REPLACE FUNCTION oxford_acc (acc text, curr text)
RETURNS text
LANGUAGE PLPGSQL
AS $$
BEGIN
RETURN acc || curr || '#$#$';
-- ^^^ unique separator, can be anything unique
END;
$$;
CREATE OR REPLACE FUNCTION oxford_final (acc text)
RETURNS text
LANGUAGE PLPGSQL
AS $$
DECLARE
my_result text;
my_counter numeric;
BEGIN
SELECT left(acc, -4) INTO my_result;
-- ^^^ removes the last separator
SELECT count(*) FROM regexp_matches(my_result, '#\$#\$', 'g') INTO my_counter;
IF my_counter = 0 THEN
RETURN my_result;
END IF;
IF my_counter = 1 THEN
RETURN regexp_replace(my_result, '#\$#\$', ' and ');
END IF;
SELECT reverse(my_result) INTO my_result;
SELECT regexp_replace(my_result, '\$#\$#', ' dna ,') INTO my_result;
SELECT reverse(my_result) INTO my_result;
RETURN regexp_replace(
my_result,
'#\$#\$',
', ',
'g'
);
END;
$$;
CREATE OR REPLACE AGGREGATE oxford_agg (text) (
INITCOND = '',
STYPE = text,
SFUNC = oxford_acc,
FINALFUNC = oxford_final
);
Suggestions for improvements welcome!
I have a column with data that looks like this in a single field:
"a,a,b,b,c,a,b,b,b,a,a,a,a,a,a,c,a,a,b"
Using some sort of regex or SQL function I would like to make it look like this:
"a,b,c,a,b,a,c,a,b"
Essentially I am trying to get rid of repeated values that appear in order but keep the unique changes from one value to another.
My knowledge of reg-expressions pretty much ends at removing duplicates. Any help is greatly appreciated!
use regexp:
SELECT regexp_replace('a,a,b,b,c,a,b,b,b,a,a,a,a,a,a,c,a,a,b', '(\w)(,\1)+', '\1', 'g')
(\w)(,\1)+ mutches: (any word char) and following (, and this same word char) more than one time...
Fiddle example
RegExr example
You can convert the elements into rows, check if the previous row is different to the current and then keep only those where something changed. This can then be aggregated back into a comma separated list:
select string_agg(ch, ',' order by idx)
from (
select u.ch, u.idx,
coalesce(u.ch <> lag(u.ch) over (order by u.idx), true) as is_change
from unnest(string_to_array('a,a,b,b,c,a,b,b,b,a,a,a,a,a,a,c,a,a,b', ',')) with ordinality as u(ch, idx)
) t
where is_change
The with ordinality returns the original array index, so that we can sort the elements correctly when aggregating them.
This can also be put into a function:
create or replace function cleanup(p_input text)
returns text
as
$$
select string_agg(ch, ',' order by idx)
from (
select u.ch, u.idx,
coalesce(u.ch <> lag(u.ch) over (order by u.idx), true) as is_change
from unnest(string_to_array(p_input, ',')) with ordinality as u(ch, idx)
) t
where is_change;
$$
language sql;
Online example
My understanding is:
If the character is the same as previous character, you want to remove it from the string.
So I will use while loop and if statement in this case:
--CREATE TABLE TEST (ID VARCHAR(100));
--INSERT INTO TEST VALUES ('a,a,b,b,c,a,b,b,b,a,a,a,a,a,a,c,a,a,b');
DO $$
DECLARE
V_NEWSTRING VARCHAR(100) := '';
V_I INTEGER := 1;
V_LENGTH INTEGER := 0;
V_CURRENT VARCHAR(10) := '';
V_LAST VARCHAR(10) := '';
BEGIN
SELECT LENGTH(ID) FROM TEST INTO V_LENGTH;
WHILE V_I <= V_LENGTH LOOP
SELECT SUBSTRING(ID,V_I,1) from TEST INTO V_CURRENT;
IF V_CURRENT <> V_LAST THEN
V_NEWSTRING = V_NEWSTRING || V_CURRENT || ',';
END IF;
V_LAST = V_CURRENT;
V_I = V_I + 2;
END LOOP;
raise notice 'Value: %', V_NEWSTRING;
END $$;
Test Result (PostgreSQL-9.4):
I'm converting a BDE query (Paradox) to a Firebird (2.5, not 3.x) and I have a very convenient conversion in it:
select TRIM(' 1') as order1, CAST(' 1' AS INTEGER) AS order2 --> 1
select TRIM(' 1 bis') as order1, CAST(' 1 bis' AS INTEGER) AS order2 --> 1
Then ordering by the cast value then the trimmed value (ORDER order2, order1) provide me the result I need:
1
1 bis
2 ter
100
101 bis
However, in Firebird casting an incorrect integer will raise an exception and I did not find any way around to provide same result. I think I can tell if a number is present with something like below, but I couldn't find a way to extract it.
TRIM(' 1 bis') similar to '[ [:ALPHA:]]*[[:DIGIT:]]+[ [:ALPHA:]]*'
[EDIT]
I had to handle cases where text were before the number, so using #Arioch'The's trigger, I got this running great:
SET TERM ^ ;
CREATE TRIGGER SET_MYTABLE_INTVALUE FOR MYTABLE ACTIVE
BEFORE UPDATE OR INSERT POSITION 0
AS
DECLARE I INTEGER;
DECLARE S VARCHAR(13);
DECLARE C VARCHAR(1);
DECLARE R VARCHAR(13);
BEGIN
IF (NEW.INTVALUE is not null) THEN EXIT;
S = TRIM( NEW.VALUE );
R = NULL;
I = 1;
WHILE (I <= CHAR_LENGTH(S)) DO
BEGIN
C = SUBSTRING( S FROM I FOR 1 );
IF ((C >= '0') AND (C <= '9')) THEN LEAVE;
I = I + 1;
END
WHILE (I <= CHAR_LENGTH(S)) DO
BEGIN
C = SUBSTRING( S FROM I FOR 1 );
IF (C < '0') THEN LEAVE;
IF (C > '9') THEN LEAVE;
IF (C IS NULL) THEN LEAVE;
IF (R IS NULL) THEN R=C; ELSE R = R || C;
I = I + 1;
END
NEW.INTVALUE = CAST(R AS INTEGER);
END^
SET TERM ; ^
Converting such a table, you have to add a special indexed integer column for keeping the extracted integer data.
Note, this query while using "very convenient conversion" is actually rather bad: you should use indexed columns to sort (order) large amounts of data, otherwise you are going into slow execution and waste a lot of memory/disk for temporary sorting tables.
So you have to add an extra integer indexed column and to use it in the query.
Next question is how to populate that column.
Better would be to do it once, when you move your entire database and application from BDE to Firebird. And from that point make your application when entering new data rows fill BOTH varchar and integer columns properly.
One time conversion can be done by your convertor application, then.
Or you can use selectable Stored Procedure that would repeat the table with such and added column. Or you can make Execute Block that would iterate through the table and update its rows calculating the said integer value.
How to SELECT a PROCEDURE in Firebird 2.5
If you would need to keep legacy applications, that only insert text column but not integer column, then I think you would have to use BEFORE UPDATE OR INSERT triggers in Firebird, that would parse the text column value letter by letter and extract integer from it. And then make sure your application never changes that integer column directly.
See a trigger example at Trigger on Update Firebird
PSQL language documentation: https://www.firebirdsql.org/file/documentation/reference_manuals/fblangref25-en/html/fblangref25-psql.html
Whether you would write procedure or trigger to populate the said added integer indexed column, you would have to make simple loop over characters, copying string from first digit until first non-digit.
https://www.firebirdsql.org/file/documentation/reference_manuals/fblangref25-en/html/fblangref25-functions-scalarfuncs.html#fblangref25-functions-string
https://www.firebirdsql.org/file/documentation/reference_manuals/fblangref25-en/html/fblangref25-psql-coding.html#fblangref25-psql-declare-variable
Something like that
CREATE TRIGGER my_trigger FOR my_table
BEFORE UPDATE OR INSERT
AS
DECLARE I integer;
DECLARE S VARCHAR(100);
DECLARE C VARCHAR(100);
DECLARE R VARCHAR(100);
BEGIN
S = TRIM( NEW.MY_TXT_COLUMN );
R = NULL;
I = 1;
WHILE (i <= CHAR_LENGTH(S)) DO
BEGIN
C = SUBSTRING( s FROM i FOR 1 );
IF (C < '0') THEN LEAVE;
IF (C > '9') THEN LEAVE;
IF (C IS NULL) THEN LEAVE;
IF (R IS NULL) THEN R=C; ELSE R = R || C;
I = I + 1;
END
NEW.MY_INT_COLUMN = CAST(R AS INTEGER);
END;
In this example your ORDER order2, order1 would become
SELECT ..... FROM my_table ORDER BY MY_INT_COLUMN, MY_TXT_COLUMN
Additionally, it seems your column actually contains a compound data: an integer index and an optional textual postfix. If so, then the data you have is not normalized and the table better be restructured.
CREATE TABLE my_table (
ORDER_Int INTEGER NOT NULL,
ORDER_PostFix VARCHAR(24) CHECK( ORDER_PostFix = TRIM(ORDER_PostFix) ),
......
ORDER_TXT COMPUTED BY (ORDER_INT || COALESCE( ' ' || ORDER_PostFix, '' )),
PRIMARY KEY (ORDER_Int, ORDER_PostFix )
);
When you would move your data from Paradox to Firebird - make your convertor application check and split those values like "1 bis" into two new columns.
And your query then would be like
SELECT ORDER_TXT, ... FROM my_table ORDER BY ORDER_Int, ORDER_PostFix
if you're using fb2.5 you can use the following:
execute block (txt varchar(100) = :txt )
returns (res integer)
as
declare i integer;
begin
i=1;
while (i<=char_length(:txt)) do begin
if (substring(:txt from i for 1) not similar to '[[:DIGIT:]]')
then txt =replace(:txt,substring(:txt from i for 1),'');
else i=i+1;
end
res = :txt;
suspend;
end
in fb3.0 you have more convenient way to do the same
select
cast(substring(:txt||'#' similar '%#"[[:DIGIT:]]+#"%' escape '#') as integer)
from rdb$database
--assuming that the field is varchar(15))
select cast(field as integer) from table;
Worked in firebird version 2.5.
I need to loop through type RECORD items by key/index, like I can do this using array structures in other programming languages.
For example:
DECLARE
data1 record;
data2 text;
...
BEGIN
...
FOR data1 IN
SELECT
*
FROM
sometable
LOOP
FOR data2 IN
SELECT
unnest( data1 ) -- THIS IS DOESN'T WORK!
LOOP
RETURN NEXT data1[data2]; -- SMTH LIKE THIS
END LOOP;
END LOOP;
As #Pavel explained, it is not simply possible to traverse a record, like you could traverse an array. But there are several ways around it - depending on your exact requirements. Ultimately, since you want to return all values in the same column, you need to cast them to the same type - text is the obvious common ground, because there is a text representation for every type.
Quick and dirty
Say, you have a table with an integer, a text and a date column.
CREATE TEMP TABLE tbl(a int, b text, c date);
INSERT INTO tbl VALUES
(1, '1text', '2012-10-01')
,(2, '2text', '2012-10-02')
,(3, ',3,ex,', '2012-10-03') -- text with commas
,(4, '",4,"ex,"', '2012-10-04') -- text with commas and double quotes
Then the solution can be a simple as:
SELECT unnest(string_to_array(trim(t::text, '()'), ','))
FROM tbl t;
Works for the first two rows, but fails for the special cases of row 3 and 4.
You can easily solve the problem with commas in the text representation:
SELECT unnest(('{' || trim(t::text, '()') || '}')::text[])
FROM tbl t
WHERE a < 4;
This would work fine - except for line 4 which has double quotes in the text representation. Those are escaped by doubling them up. But the array constructor would need them escaped by \. Not sure why this incompatibility is there ...
SELECT ('{' || trim(t::text, '()') || '}') FROM tbl t WHERE a = 4
Yields:
{4,""",4,""ex,""",2012-10-04}
But you would need:
SELECT '{4,"\",4,\"ex,\"",2012-10-04}'::text[]; -- works
Proper solution
If you knew the column names beforehand, a clean solution would be simple:
SELECT unnest(ARRAY[a::text,b::text,c::text])
FROM tbl
Since you operate on records of well know type you can just query the system catalog:
SELECT string_agg(a.attname || '::text', ',' ORDER BY a.attnum)
FROM pg_catalog.pg_attribute a
WHERE a.attrelid = 'tbl'::regclass
AND a.attnum > 0
AND a.attisdropped = FALSE
Put this in a function with dynamic SQL:
CREATE OR REPLACE FUNCTION unnest_table(_tbl text)
RETURNS SETOF text LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE '
SELECT unnest(ARRAY[' || (
SELECT string_agg(a.attname || '::text', ',' ORDER BY a.attnum)
FROM pg_catalog.pg_attribute a
WHERE a.attrelid = _tbl::regclass
AND a.attnum > 0
AND a.attisdropped = false
) || '])
FROM ' || _tbl::regclass;
END
$func$;
Call:
SELECT unnest_table('tbl') AS val
Returns:
val
-----
1
1text
2012-10-01
2
2text
2012-10-02
3
,3,ex,
2012-10-03
4
",4,"ex,"
2012-10-04
This works without installing additional modules. Another option is to install the hstore extension and use it like #Craig demonstrates.
PL/pgSQL isn't really designed for what you want to do. It doesn't consider a record to be iterable, it's a tuple of possibly different and incompatible data types.
PL/pgSQL has EXECUTE for dynamic SQL, but EXECUTE queries cannot refer to PL/pgSQL variables like NEW or other records directly.
What you can do is convert the record to a hstore key/value structure, then iterate over the hstore. Use each(hstore(the_record)), which produces a rowset of key,value tuples. All values are cast to their text representations.
This toy function demonstrates iteration over a record by creating an anonymous ROW(..) - which will have column names f1, f2, f3 - then converting that to hstore, iterating over its column/value pairs, and returning each pair.
CREATE EXTENSION hstore;
CREATE OR REPLACE FUNCTION hs_demo()
RETURNS TABLE ("key" text, "value" text)
LANGUAGE plpgsql AS
$$
DECLARE
data1 record;
hs_row record;
BEGIN
data1 = ROW(1, 2, 'test');
FOR hs_row IN SELECT kv."key", kv."value" FROM each(hstore(data1)) kv
LOOP
"key" = hs_row."key";
"value" = hs_row."value";
RETURN NEXT;
END LOOP;
END;
$$;
In reality you would never write it this way, since the whole loop can be replaced with a simple RETURN QUERY statement and it does the same thing each(hstore) does anyway - so this is only to show how each(hstore(record)) works, and the above function should never actually be used.
This feature is not supported in plpgsql - Record IS NOT hash array like other scripting languages - it is similar to C or ADA, where this functionality is impossible. You can use other PL language like PLPerl or PLPython or some tricks - you can iterate with HSTORE datatype (extension) or via dynamic SQL
see How to set value of composite variable field using dynamic SQL
But request for this functionality usually means, so you do some wrong. When you use PL/pgSQL you have think different than you use Javascript or Python
FOR data2 IN
SELECT d
from unnest( data1 ) s(d)
LOOP
RETURN NEXT data2;
END LOOP;
If you order your results prior to looping, will you accomplish what you want.
for rc in select * from t1 order by t1.key asc loop
return next rc;
end loop;
will do exactly what you need. It is also the fastest way to perform that kind of task.
I wasn't able to find a proper way to loop over record, so what I did is converted record to json first and looped over json
declare
_src_schema varchar := 'db_utility';
_targetjson json;
_key text;
_value text;
BEGIN
select row_to_json(c.*) from information_schema.columns c where c.table_name = prm_table and c.column_name = prm_column
and c.table_schema = _src_schema into _targetjson;
raise notice '_targetjson %', _targetjson;
FOR _key, _value IN
SELECT * FROM jsonb_each_text(_targetjson)
LOOP
-- do some math operation on its corresponding value
RAISE NOTICE '%: %', _key, _value;
END LOOP;
return true;
end;
I have a PostgreSQL table of this form:
base_id int | mods smallint[]
3 | {7,15,48}
I need to populate a table of this form:
combo_id int | base_id int | mods smallint[]
1 | 3 |
2 | 3 | {7}
3 | 3 | {7,15}
4 | 3 | {7,48}
5 | 3 | {7,15,48}
6 | 3 | {15}
7 | 3 | {15,48}
8 | 3 | {48}
I think I could accomplish this using a function that does almost exactly this, iterating over the first table and writing combinations to the second table:
Generate all combinations in SQL
But, I'm a Postgres novice and cannot for the life of me figure out how to do this using plpgsql. It doesn't need to be particularly fast; it will only be run periodically on the backend. The first table has approximately 80 records and a rough calculation suggests we can expect around 2600 records for the second table.
Can anybody at least point me in the right direction?
Edit: Craig: I've got PostgreSQL 9.0. I was successfully able to use UNNEST():
FOR messvar IN SELECT * FROM UNNEST(mods) AS mod WHERE mod BETWEEN 0 AND POWER(2, #n) - 1
LOOP
RAISE NOTICE '%', messvar;
END LOOP;
but then didn't know where to go next.
Edit: For reference, I ended up using Erwin's solution, with a single line added to add a null result ('{}') to each set and the special case Erwin refers to removed:
CREATE OR REPLACE FUNCTION f_combos(_arr integer[], _a integer[] DEFAULT '{}'::integer[], _z integer[] DEFAULT '{}'::integer[])
RETURNS SETOF integer[] LANGUAGE plpgsql AS
$BODY$
DECLARE
i int;
j int;
_up int;
BEGIN
IF array_length(_arr,1) > 0 THEN
_up := array_upper(_arr, 1);
IF _a = '{}' AND _z = '{}' THEN RETURN QUERY SELECT '{}'::int[]; END IF;
FOR i IN array_lower(_arr, 1) .. _up LOOP
FOR j IN i .. _up LOOP
CASE j-i
WHEN 0,1 THEN
RETURN NEXT _a || _arr[i:j] || _z;
ELSE
RETURN NEXT _a || _arr[i:i] || _arr[j:j] || _z;
RETURN QUERY SELECT *
FROM f_combos(_arr[i+1:j-1], _a || _arr[i], _arr[j] || _z);
END CASE;
END LOOP;
END LOOP;
ELSE
RETURN NEXT _arr;
END IF;
END;
$BODY$
Then, I used that function to populate my table:
INSERT INTO e_ecosystem_modified (ide_ecosystem, modifiers)
(SELECT ide_ecosystem, f_combos(modifiers) AS modifiers FROM e_ecosystem WHERE ecosystemgroup <> 'modifier' ORDER BY ide_ecosystem, modifiers);
From 79 rows in my source table with a maximum of 7 items in the modifiers array, the query took 250ms to populate 2630 rows in my output table. Fantastic.
After I slept over it I had a completely new, simpler, faster idea:
CREATE OR REPLACE FUNCTION f_combos(_arr anyarray)
RETURNS TABLE (combo anyarray) LANGUAGE plpgsql AS
$BODY$
BEGIN
IF array_upper(_arr, 1) IS NULL THEN
combo := _arr; RETURN NEXT; RETURN;
END IF;
CASE array_upper(_arr, 1)
-- WHEN 0 THEN -- does not exist
WHEN 1 THEN
RETURN QUERY VALUES ('{}'), (_arr);
WHEN 2 THEN
RETURN QUERY VALUES ('{}'), (_arr[1:1]), (_arr), (_arr[2:2]);
ELSE
RETURN QUERY
WITH x AS (
SELECT f.combo FROM f_combos(_arr[1:array_upper(_arr, 1)-1]) f
)
SELECT x.combo FROM x
UNION ALL
SELECT x.combo || _arr[array_upper(_arr, 1)] FROM x;
END CASE;
END
$BODY$;
Call:
SELECT * FROM f_combos('{1,2,3,4,5,6,7,8,9}'::int[]) ORDER BY 1;
512 rows, total runtime: 2.899 ms
Explain
Treat special cases with NULL and empty array.
Build combinations for a primitive array of two.
Any longer array is broken down into:
the combinations for same array of length n-1
plus all of those combined with element n .. recursively.
Really simple, once you got it.
Works for 1-dimensional integer arrays starting with subscript 1 (see below).
2-3 times as fast as old solution, scales better.
Works for any element type again (using polymorphic types).
Includes the empty array in the result as is displayed in the question (and as #Craig pointed out to me in the comments).
Shorter, more elegant.
This assumes array subscripts starting at 1 (Default). If you are not sure about your values, call the function like this to normalize:
SELECT * FROM f_combos(_arr[array_lower(_arr, 1):array_upper(_arr, 1)]);
Not sure if there is a more elegant way to normalize array subscripts. I posted a question about that:
Normalize array subscripts for 1-dimensional array so they start with 1
Old solution (slower)
CREATE OR REPLACE FUNCTION f_combos2(_arr int[], _a int[] = '{}', _z int[] = '{}')
RETURNS SETOF int[] LANGUAGE plpgsql AS
$BODY$
DECLARE
i int;
j int;
_up int;
BEGIN
IF array_length(_arr,1) > 0 THEN
_up := array_upper(_arr, 1);
FOR i IN array_lower(_arr, 1) .. _up LOOP
FOR j IN i .. _up LOOP
CASE j-i
WHEN 0,1 THEN
RETURN NEXT _a || _arr[i:j] || _z;
WHEN 2 THEN
RETURN NEXT _a || _arr[i:i] || _arr[j:j] || _z;
RETURN NEXT _a || _arr[i:j] || _z;
ELSE
RETURN NEXT _a || _arr[i:i] || _arr[j:j] || _z;
RETURN QUERY SELECT *
FROM f_combos2(_arr[i+1:j-1], _a || _arr[i], _arr[j] || _z);
END CASE;
END LOOP;
END LOOP;
ELSE
RETURN NEXT _arr;
END IF;
END;
$BODY$;
Call:
SELECT * FROM f_combos2('{7,15,48}'::int[]) ORDER BY 1;
Works for 1-dimensional integer arrays.
This could be further optimized, but that's certainly not needed for the scope of this question.
ORDER BY to impose the order displayed in the question.
Provide for NULL or empty array, as NULL is mentioned in the comments.
Tested with PostgreSQL 9.1, but should work with any halfway modern version.
array_lower() and array_upper() have been around for at least since PostgreSQL 7.4. Only parameter defaults are new in version 8.4. Could easily be replaced.
Performance is decent.
SELECT DISTINCT * FROM f_combos('{1,2,3,4,5,6,7,8,9}'::int[]) ORDER BY 1;
511 rows, total runtime: 7.729 ms
Explanation
It builds on this simple form that only creates all combinations of neighboring elements:
CREATE FUNCTION f_combos(_arr int[])
RETURNS SETOF int[] LANGUAGE plpgsql AS
$BODY$
DECLARE
i int;
j int;
_up int;
BEGIN
_up := array_upper(_arr, 1);
FOR i in array_lower(_arr, 1) .. _up LOOP
FOR j in i .. _up LOOP
RETURN NEXT _arr[i:j];
END LOOP;
END LOOP;
END;
$BODY$;
But this will fail for sub-arrays with more than two elements. So:
For any sub-array with 3 elements one array with just the outer two elements is added. this is a shortcut for this special case that improves performance and is not strictly needed.
For any sub-array with more than 3 elements I take the outer two elements and fill in with all combinations of inner elements built by the same function recursively.
One approach is with a recursive CTE. Erwin's updated recursive function is significantly faster and scales better, though, so this is really useful as an interesting different approach. Erwin's updated version is much more practical.
I tried a bit counting approach (see the end) but without a fast way to pluck arbitrary elements from an array it proved slower then either recursive approach.
Recursive CTE combinations function
CREATE OR REPLACE FUNCTION combinations(anyarray) RETURNS SETOF anyarray AS $$
WITH RECURSIVE
items AS (
SELECT row_number() OVER (ORDER BY item) AS rownum, item
FROM (SELECT unnest($1) AS item) unnested
),
q AS (
SELECT 1 AS i, $1[1:0] arr
UNION ALL
SELECT (i+1), CASE x
WHEN 1 THEN array_append(q.arr,(SELECT item FROM items WHERE rownum = i))
ELSE q.arr END
FROM generate_series(0,1) x CROSS JOIN q WHERE i <= array_upper($1,1)
)
SELECT q.arr AS mods
FROM q WHERE i = array_upper($1,1)+1;
$$ LANGUAGE 'sql';
It's a polymorphic function, so it'll work on arrays of any type.
The logic is to iterate over each item in the unnested input set, using a working table. Start with an empty array in the working table, with a generation number of 1. For each entry in the input set insert two new arrays into the working table with an incremented generation number. One of the two is a copy of the input array from the previous generation and the other is the input array with the (generation-number)'th item from the input set appended to it. When the generation number exceeds the number of items in the input set, return the last generation.
Usage
You can use the combinations(smallint[]) function to produce the results you desire, using it as a set-returning function in combinatin with the row_number window function.
-- assuming table structure
regress=# \d comb
Table "public.comb"
Column | Type | Modifiers
---------+------------+-----------
base_id | integer |
mods | smallint[] |
SELECT base_id, row_number() OVER (ORDER BY mod) AS mod_id, mod
FROM (SELECT base_id, combinations(mods) AS mod FROM comb WHERE base_id = 3) x
ORDER BY mod;
Results
regress=# SELECT base_id, row_number() OVER (ORDER BY mod) AS mod_id, mod
regress-# FROM (SELECT base_id, combinations(mods) AS mod FROM comb WHERE base_id = 3) x
regress-# ORDER BY mod;
base_id | mod_id | mod
---------+--------+-----------
3 | 1 | {}
3 | 2 | {7}
3 | 3 | {7,15}
3 | 4 | {7,15,48}
3 | 5 | {7,48}
3 | 6 | {15}
3 | 7 | {15,48}
3 | 8 | {48}
(8 rows)
Time: 2.121 ms
Zero element arrays produce a null result. If you want combinations({}) to return one row {} then a UNION ALL with {} will do the job.
Theory
It appears you want the k-combinations for all k in a k-multicombination, rather than simple combinations. See number of combinations with repetition.
In other words, you want all k-combinations of elements from your set, for all k from 0 to n where n is the set size.
Related SO question: SQL - Find all possible combination, which has the really interesting answer about bit counting.
Bit operations exist in Pg, so a bit counting approach should be possible. You'd expect it to be more efficient, but because it's so slow to select a scattered subset of elements from an array it actually works out slower.
CREATE OR REPLACE FUNCTION bitwise_subarray(arr anyarray, elements integer)
RETURNS anyarray AS $$
SELECT array_agg($1[n+1])
FROM generate_series(0,array_upper($1,1)-1) n WHERE ($2>>n) & 1 = 1;
$$ LANGUAGE sql;
COMMENT ON FUNCTION bitwise_subarray(anyarray,integer) IS 'Return the elements from $1 where the corresponding bit in $2 is set';
CREATE OR REPLACE FUNCTION comb_bits(anyarray) RETURNS SETOF anyarray AS $$
SELECT bitwise_subarray($1, x)
FROM generate_series(0,pow(2,array_upper($1,1))::integer-1) x;
$$ LANGUAGE 'sql';
If you could find a faster way to write bitwise_subarray then comb_bits would be very fast. Like, say, a small C extension function, but I'm only crazy enough to write one of those for an SO answer.