PostgreSQL function or stored procedure that outputs multiple columns? - postgresql

Here is what I ideally want. Imagine that I have a table with the row A.
I want to do:
SELECT A, func(A) FROM table
and for the output to have say 4 columns.
Is there any way to do this? I have seen things on custom types or whatever that let you sort of get a result that would look like
A,(B,C,D)
But it would be really great if I could have that one function return multiple columns without any more finagling.
Is there anything that can do something like this?

If the function func returns only 1 row with 3 values, such as:
CREATE OR REPLACE FUNCTION func
(
input_val integer,
OUT output_val1 integer,
OUT output_val2 integer,
OUT output_val3 integer
)
AS $$
BEGIN
output_val1 := input_val + 1;
output_val2 := input_val + 2;
output_val3 := input_val + 3;
END;
$$ LANGUAGE plpgsql;
and you then execute SELECT a, func(a) FROM table1 you'll get:
a | func
integer | record
========|==========
1 | (2, 3, 4)
2 | (3, 4, 5)
3 | (4, 5, 6)
but, if you execute:
SELECT a, (f).output_val1, (f).output_val2, (f).output_val3
FROM (SELECT a, func(a) AS f FROM table1) AS x
you'll get:
a | output_val1 | output_val2 | output_val3
integer | integer | integer | integer
========|=============|=============|=============
1 | 2 | 3 | 4
2 | 3 | 4 | 5
3 | 4 | 5 | 6
or, using CTE (Common Table Expressions), if you execute:
WITH temp AS (SELECT a, func(a) AS f FROM table1)
SELECT a, (f).output_val1, (f).output_val2, (f).output_val3 FROM temp
you'll also get:
a | output_val1 | output_val2 | output_val3
integer | integer | integer | integer
========|=============|=============|=============
1 | 2 | 3 | 4
2 | 3 | 4 | 5
3 | 4 | 5 | 6
Note: you may also use the following queries to obtain the same results:
SELECT a, (f).*
FROM (SELECT a, func(a) AS f FROM table1) AS x
or
WITH temp AS (SELECT a, func(a) AS f FROM table1)
SELECT a, (f).* FROM temp

I agree with bambam's answer, but would like to point out that JackPDouglas's more succinct syntax, SELECT a, (func(a)).* FROM table1, would (in my tests) actually execute the function once for each column returned, whereas the CTE version executes it only once per row. So the CTE version is preferable if the function takes a long time to execute.
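On PostgreSQL 9.3 or later, a LATERAL join is another way to get the expanded columns with a single function call per row. The sketch below assumes the func and table1 definitions from the answer above; the aliases t and f are only illustrative.

```sql
-- Call func once per row of table1 and expose its OUT parameters as columns.
SELECT t.a, f.output_val1, f.output_val2, f.output_val3
FROM table1 AS t
CROSS JOIN LATERAL func(t.a) AS f;
```

Because the function sits in the FROM clause, there is no composite value to unpack afterwards, so neither a subquery nor a CTE is needed.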

If the function always returns 3 columns, you can do something like this:
CREATE TYPE sometype AS (b INT, c TEXT, d TEXT);
CREATE OR REPLACE FUNCTION func(a TEXT) RETURNS SETOF sometype AS $$
BEGIN
RETURN QUERY EXECUTE format('SELECT b, c, d FROM %I', a); -- %I quotes the table name as an identifier
END;
$$ LANGUAGE plpgsql;
SELECT a, (f).b, (f).c, (f).d
FROM (SELECT a, func(a) AS f FROM table) x;
If you can access the tables from within a view, you might be able to create a view like this:
CREATE VIEW v AS
SELECT 'tab1' AS a, b, c, d FROM tab1 WHERE 'tab1' IN (SELECT a FROM table)
UNION
SELECT 'tab2' AS a, b, c, d FROM tab2 WHERE 'tab2' IN (SELECT a FROM table)
UNION
SELECT 'tab3' AS a, b, c, d FROM tab3 WHERE 'tab3' IN (SELECT a FROM table);
Then it's just a SELECT * FROM v. But again, this looks like a case where table inheritance could be used.

I think you will want to return a single record, with multiple columns? In that case you can use the return-type RECORD for example. This will allow you to return an anonymous variable with as many columns as you want. You can find more information about all the different variables here:
http://www.postgresql.org/docs/9.0/static/plpgsql-declarations.html
And about return types:
http://www.postgresql.org/docs/9.0/static/xfunc-sql.html#XFUNC-OUTPUT-PARAMETERS
If you want to return multiple records with multiple columns, first check and see if you have to use a stored procedure for this. It might be an option to just use a VIEW (and query it with a WHERE-clause) instead. If that's not a good option, there is the possibility of returning a TABLE from a stored procedure in version 9.0.
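As a rough sketch of the RETURNS TABLE option mentioned above: the function name func_table and the column names val1, val2 and val3 are made up for illustration, and the body just returns three integers derived from its input.

```sql
-- Hypothetical example: a function that returns a one-row table with three columns.
CREATE OR REPLACE FUNCTION func_table(input_val integer)
RETURNS TABLE (val1 integer, val2 integer, val3 integer)
AS $$
BEGIN
    RETURN QUERY SELECT input_val + 1, input_val + 2, input_val + 3;
END;
$$ LANGUAGE plpgsql;

-- Used in the FROM clause, the result comes back as ordinary columns.
SELECT * FROM func_table(1);  -- one row: 2, 3, 4
```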

Related

How to count values in a column that come after a particular value

I have a column that has two types of values A or B. I need to find the count of values that come after the first occurrence of A.
eg
column
B
B
B
A
B
A
B
The result in this case would be 4, as there are 4 entries from the first occurrence of A onward (including that A).
You can use a subquery to get the id (or whatever column you are using to order the rows) of the first 'A'.
CREATE TABLE t (
id serial,
col char(1));
insert into t (col) values
('B'),
('B'),
('A'),
('B'),
('A'),
('B');
6 rows affected
select
count(*) NUM
from t
where id >=
(select MIN(id)
from t
where col = 'A');
| num |
| --: |
| 4 |

Why subqueried function does not insert new rows?

I need a function to insert rows because one column's (serialno) default value should be the same as the PK id.
I have defined table:
CREATE SEQUENCE some_table_id_seq
INCREMENT 1
START 1
MINVALUE 1
MAXVALUE 9223372036854775807
CACHE 1;
CREATE TABLE some_table
(
id bigint NOT NULL DEFAULT nextval('some_table_id_seq'::regclass),
itemid integer NOT NULL,
serialno bigint,
CONSTRAINT stockitem_pkey PRIMARY KEY (id),
CONSTRAINT stockitem_serialno_key UNIQUE (serialno)
);
and a function to insert a given number of rows:
CREATE OR REPLACE FUNCTION insert_item(itemid int, count int DEFAULT 1) RETURNS SETOF bigint AS
$func$
DECLARE
ids bigint[] DEFAULT '{}';
id bigint;
BEGIN
FOR counter IN 1..count LOOP
id := NEXTVAL( 'some_table_id_seq' );
INSERT INTO some_table (id, itemid, serialno) VALUES (id, itemid, id);
ids := array_append(ids, id);
END LOOP;
RETURN QUERY SELECT unnest(ids);
END
$func$
LANGUAGE plpgsql;
And inserting with it works fine:
$ select insert_item(123, 10);
insert_item
-------------
1
2
3
4
5
6
7
8
9
10
(10 rows)
$ select * from some_table;
id | itemid | serialno
----+--------+----------
1 | 123 | 1
2 | 123 | 2
3 | 123 | 3
4 | 123 | 4
5 | 123 | 5
6 | 123 | 6
7 | 123 | 7
8 | 123 | 8
9 | 123 | 9
10 | 123 | 10
(10 rows)
But if I use the function insert_item as a subquery, it no longer seems to work:
$ select id, itemid from some_table where id in (select insert_item(123, 10));
id | itemid
----+--------
(0 rows)
I created a dummy function, insert_dumb, to test in a subquery:
CREATE OR REPLACE FUNCTION insert_dumb(itemid int, count int DEFAULT 1) RETURNS SETOF bigint AS
$func$
DECLARE
ids bigint[] DEFAULT '{}';
BEGIN
FOR counter IN 1..count LOOP
ids := array_append(ids, counter::bigint);
END LOOP;
RETURN QUERY SELECT unnest(ids);
END
$func$
LANGUAGE plpgsql;
and this works in a subquery as expected:
$ select id, itemid from some_table where id in (select insert_dumb(123, 10));
id | itemid
----+--------
1 | 123
2 | 123
3 | 123
4 | 123
5 | 123
6 | 123
7 | 123
8 | 123
9 | 123
10 | 123
(10 rows)
Why does the insert_item function not insert new rows when called as a subquery? I tried adding RAISE NOTICE to the loop, and it runs as expected, reporting a new id every time (and advancing the sequence), but no new rows are added to the table.
I made all the setup available as fiddle
I am using Postgres 11 on Ubuntu.
EDIT
Of course, I left out my real reason, and it pays off...
I need the insert_item function to return ids so that I can use it in an UPDATE statement, like:
update some_table set some_text = 'x' where id in (select insert_item(123, 10));
In addition to the why-question: it is understandable that I get no ids back (because the queries share the same snapshot), but the function runs all the needed INSERTs without affecting the table. Shouldn't those rows be available in the next query?
The problem is that the subquery and the surrounding query share the same snapshot, that is, they see the same state of the database. Hence the outer query cannot see the rows inserted by the inner query.
See the documentation (which explains that in the context of WITH, although it also applies here):
The sub-statements in WITH are executed concurrently with each other and with the main query. Therefore, when using data-modifying statements in WITH, the order in which the specified updates actually happen is unpredictable. All the statements are executed with the same snapshot (see Chapter 13), so they cannot “see” one another's effects on the target tables.
In addition, there is a second problem with your approach: if you run EXPLAIN (ANALYZE) on your statement, you will find that the subquery is not executed at all! Since the table is empty, there is no id, and running the subquery is not necessary to calculate the (empty) result.
You will have to run that in two different statements. Or, better, do it in a different fashion: updating a row that you just inserted is unnecessarily wasteful.
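One way to "do it in a different fashion" is to supply the value at insert time, so that no follow-up UPDATE is needed. This is only a sketch: it assumes some_table has the some_text column used in the question's UPDATE (it is not part of the CREATE TABLE shown), and it hard-codes the values 123, 10 and 'x' from the question.

```sql
-- Draw each id from the sequence once, then reuse it for both id and serialno.
INSERT INTO some_table (id, itemid, serialno, some_text)
SELECT s.id, 123, s.id, 'x'   -- some_text is an assumed column
FROM (
    SELECT nextval('some_table_id_seq') AS id
    FROM generate_series(1, 10)
) AS s
RETURNING id;
```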
Laurenz explained the visibility problem, but you don't need the subquery at all if you rewrite your function to return the actual table rows, rather than just the IDs:
CREATE OR REPLACE FUNCTION insert_item(itemid int, count int DEFAULT 1)
RETURNS setof some_table
AS
$func$
INSERT INTO some_table (id, itemid, serialno)
select NEXTVAL( 'some_table_id_seq' ), itemid, currval('some_table_id_seq')
from generate_series(1,count)
returning *;
$func$
LANGUAGE sql;
Then you can use it like this:
select id, itemid
from insert_item(123, 10);
And you get the complete inserted rows.

Split array of values into rows in Amazon Redshift

How can I split an array of values in a column into corresponding rows in Redshift, using a delimiter (,)?
Input Data:-
—————————————
Empid | Items
—————————————
1001| A, B
1002| B
1003| C, D, E
Required Output:-
—————————————
Empid | Items
—————————————
1001| A
1001| B
1002| B
1003| C
1003| D
1003| E
Any help is appreciated.
Thanks
Based on the official docs, you can do this with a JOIN!
Let's say your input is:
—————————————
empid | items
—————————————
1001| [A, B]
1002| [B]
1003| [C, D, E]
1004| []
Then you can do it as:
SELECT t.empid, items as item
FROM table_name AS t
LEFT JOIN t.items AS items ON TRUE
This returns:
—————————————
empid | item
—————————————
1001| A
1001| B
1002| B
1003| C
1003| D
1003| E
1004| <NULL>
There is no way to "functionally" do this in redshift.
Instead you need something like this
select empid,'A' as items from input where items ilike '%A%'
union all
select empid,'B' as items from input where items ilike '%B%'
union all
select empid,'C' as items from input where items ilike '%C%'
union all
select empid,'D' as items from input where items ilike '%D%'
union all
select empid,'E' as items from input where items ilike '%E%'
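When the item values are not known in advance, a similar set-based result can be sketched by joining to a small list of positions and picking out the n-th item. This assumes the source table is named input as in the query above, that items is a comma-delimited string, and that no row holds more than five items; the numbers CTE is a hypothetical helper that would need more rows for longer lists.

```sql
-- Hypothetical helper: one row per possible item position (1..5).
WITH numbers AS (
    SELECT 1 AS n UNION ALL
    SELECT 2 UNION ALL
    SELECT 3 UNION ALL
    SELECT 4 UNION ALL
    SELECT 5
)
SELECT t.empid,
       TRIM(SPLIT_PART(t.items, ',', numbers.n)) AS item
FROM input AS t
-- Keep only positions that exist in this row: number of commas + 1.
JOIN numbers ON numbers.n <= REGEXP_COUNT(t.items, ',') + 1
ORDER BY t.empid, numbers.n;
```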
Actually, with the addition of stored procedures to Redshift, this is possible.
The procedure below accepts two parameters (source_table and target_table).
Assuming both tables exist, it transforms the data as described in the question.
The way it works is:
Reads data from source table row by row
Finds out max items in Items column
In a loop extracts each item
Inserts id + item combo into target table
CREATE OR REPLACE PROCEDURE Array_to_Rows(source_table VARCHAR, target_table VARCHAR)
LANGUAGE plpgsql
AS $$
DECLARE i INTEGER;
rec RECORD;
query VARCHAR;
item VARCHAR;
cnt INTEGER;
BEGIN
query := 'SELECT * FROM ' || source_table;
FOR rec IN EXECUTE query
LOOP
select INTO cnt regexp_count(rec.items,',')+1;
i := 1;
<< items_loop >>
LOOP
SELECT INTO item trim(split_part(rec.items,',',i));
EXECUTE 'INSERT INTO ' || target_table || ' values (' || rec.Empid || ',''' || item ||''')';
i := i + 1;
EXIT items_loop WHEN (i > cnt);
END LOOP;
END LOOP;
END;
$$
Usage: CALL Array_to_Rows('source table name','target table name')
With the test data in the question it took less than 0.2 seconds; I don't know how big the OP's data set is.
Output is
Empid item
1001 A
1001 B
1002 B
1003 C
1003 D
1003 E

Query with condition on array items in PostgreSQL

I would like to select rows in a table in which a certain number of items in an array column meet a comparison condition (>= n). Is this possible without using unnest?
unnest() is a natural way to count filtered elements in an array.
However, you can hide this in an sql function like this:
create or replace function number_of_elements(arr int[], val int)
returns bigint language sql
as $$
select count(*)
from unnest(arr) e
where e > val;
$$;
with test(id, arr) as (
values
(1, array[1,2,3,4]),
(2, array[3,4,5,6]))
select id, arr, number_of_elements(arr, 3)
from test;
id | arr | number_of_elements
----+-----------+--------------------
1 | {1,2,3,4} | 1
2 | {3,4,5,6} | 3
(2 rows)
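To actually select rows where at least n elements satisfy the condition, the same function can go in the WHERE clause. A small sketch using the test data above, with a threshold of 2 chosen arbitrarily:

```sql
with test(id, arr) as (
    values
    (1, array[1,2,3,4]),
    (2, array[3,4,5,6]))
select id, arr
from test
-- keep rows where at least 2 elements are greater than 3
where number_of_elements(arr, 3) >= 2;
```

With this data only the row with id 2 qualifies, since it has three elements greater than 3 while the row with id 1 has one.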

Postgres: The best way to optimize "greater than" query

What is the best way to optimize a query that joins a table to itself on the next id value within the same group? For now I have something like this:
CREATE OR REPLACE FUNCTION select_next_id(bigint, bigint) RETURNS bigint AS $body$
DECLARE
_id bigint;
BEGIN
SELECT id INTO _id FROM table WHERE id_group = $2 AND id > $1 ORDER BY id ASC LIMIT 1;
RETURN _id;
END;
$body$ LANGUAGE plpgsql;
And the JOIN query:
SELECT * FROM table t1
JOIN table t2 ON t2.id = select_next_id(t1.id, t1.id_group)
The table has more than 2 million rows, and the query takes a very long time. Is there a faster way to do this? I also have a UNIQUE INDEX on column id, but I guess it's not very helpful.
Some sample data:
id | id_group
=============
1 | 1
2 | 1
3 | 1
4 | 2
5 | 2
6 | 2
20 | 4
25 | 4
37 | 4
40 | 1
55 | 2
And I want to receive something like this:
id | id_next
1 | 2
2 | 3
3 | 40
4 | 5
5 | 6
6 | 55
and so on.
For the query in the function, you need an index on (id_group, id), not just (id).
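A minimal sketch of such an index; the index name is arbitrary and the_table stands in for the real table name:

```sql
-- Composite index: equality on id_group first, then a range scan on id.
CREATE INDEX the_table_id_group_id_idx ON the_table (id_group, id);
```

Putting id_group first lets the planner jump to one group and read the ids in order, which matches the WHERE id_group = $2 AND id > $1 ORDER BY id LIMIT 1 pattern.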
Next, you don't need the overhead of plpgsql in the function itself, and you can give the planner a few hints by marking it as STABLE and giving it a small cost:
CREATE OR REPLACE FUNCTION select_next_id(bigint, bigint) RETURNS bigint AS $body$
SELECT id FROM table WHERE id_group = $2 AND id > $1 ORDER BY id ASC LIMIT 1;
$body$ LANGUAGE sql STABLE COST 10;
In the final query, depending on what you're actually trying to do, you might be able to get rid of the join and the function call by using lead() as highlighted by the horse:
http://www.postgresql.org/docs/current/static/tutorial-window.html
I'm not entirely sure, but I think you want something like this:
select id,
lead(id) over (partition by id_group order by id) as id_next
from the_table
order by id, id_next;