Postgres - Return multiple columns in function - postgresql

At the company I work at, it is common to store some metadata with each table row, such as the time at which the row was created and by whom (in reality there is more metadata, but I will stick to these two for simplicity).
The columns are "creationDate" and "createdBy". Explicitly selecting each of these columns is quite tedious, and I would prefer a solution where I just tell SQL to select all of the metadata columns at once.
Standard SQL (with an explicit column list) looks like this:
SELECT "foo", "creationDate", "createdBy" from "bar";
and I would like a solution that looks something like this:
SELECT "foo", select_default_meta() from "bar";
Also please note that a simple select * from "bar" is not a viable solution!

demo
BEGIN;
CREATE temp TABLE bar (
col1 text,
col2 int,
col3 date,
col4 timestamptz,
creationDate date,
created_by text,
last_update timestamptz,
last_updated_by text
);
CREATE temp VIEW bar_meta AS
SELECT
creationDate,
created_by,
last_update,
last_updated_by
FROM
bar;
COMMIT;
CREATE OR REPLACE FUNCTION bar_metas (IN text)
RETURNS TABLE (
creationDate date,
created_by text,
last_update timestamptz,
last_updated_by text
)
AS $a$
BEGIN
RAISE NOTICE '%', quote_ident($1);
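-- A table name cannot be passed as a value in a static query,
-- so build the statement dynamically (%I quotes the identifier).
RETURN query EXECUTE format(
'SELECT creationDate, created_by, last_update, last_updated_by FROM %I',
$1);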
END;
$a$
LANGUAGE plpgsql;
Expand the meta columns from the function:
SELECT
*
FROM (
SELECT
col1,
col2,
col3
FROM
bar) c,
LATERAL bar_metas ('bar');
Or use the view directly:
SELECT
col1,
col2,
col3,
b.*
FROM
bar
CROSS JOIN bar_meta b;
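Demo of both the view and the function. The view is much easier, but access control on views is somewhat tricky. Note that neither query correlates the metadata with the outer row: CROSS JOIN bar_meta (and bar_metas('bar'), which never references c) pairs each row of bar with every row the view or function returns. To get each row's own metadata, join on a key column or pass the row itself to a function.
With SELECT "foo", select_default_meta() FROM "bar", the function call produces only one column (a single composite value). Further explanation: https://www.postgresql.org/docs/current/xfunc-sql.html#XFUNC-SQL-COMPOSITE-FUNCTIONS
To make a function expand into all of its columns, call it in the FROM clause (select * from function()) or write (function()).* in the select list. Calling it in FROM next to another table is why LATERAL is needed here.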
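A minimal sketch of the per-row alternative, assuming PostgreSQL and a hypothetical table item (all names here are made up for illustration): instead of reading the whole table, the function receives the current row and returns only its metadata as a composite value, so CROSS JOIN LATERAL pairs each row with its own metadata rather than with every metadata row.

```sql
-- Hypothetical table: every row carries its own metadata columns.
CREATE TABLE item (
    id           int PRIMARY KEY,
    foo          text,
    creationdate date,
    created_by   text
);
INSERT INTO item VALUES
    (1, 'a', DATE '2024-01-01', 'alice'),
    (2, 'b', DATE '2024-02-01', 'bob');

-- Describe the metadata columns once, as a composite type.
CREATE TYPE item_meta_t AS (creationdate date, created_by text);

-- Takes a whole row of item and returns only its metadata part.
CREATE FUNCTION item_meta(r item)
RETURNS item_meta_t
LANGUAGE sql STABLE AS
$$ SELECT r.creationdate, r.created_by $$;

-- Each row is paired with its own metadata (no cross product).
SELECT i.foo, m.*
FROM item AS i
CROSS JOIN LATERAL item_meta(i) AS m;
```

Writing (item_meta(i)).* directly in the select list should give the same columns and is closer to the select_default_meta() syntax from the question, but the function may then be evaluated once per expanded column.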

Related

ERROR: column "int4" specified more than once

Steps for Execution:
Table Creation
CREATE TABLE xyz.table_a(
id bigint NOT NULL,
scores jsonb,
CONSTRAINT table_a_pkey PRIMARY KEY (id)
);
Add some dummy data:
INSERT INTO xyz.table_a(
id, scores)
VALUES (1, '{"a":20,"b":20}');
Function Creation
CREATE OR REPLACE FUNCTION xyz.example(
table_name text,
regular_columns text,
json_column text,
view_name text
) RETURNS text
LANGUAGE 'plpgsql'
COST 100
VOLATILE
AS $BODY$
DECLARE
cols TEXT;
cols_sum TEXT;
BEGIN
EXECUTE
format(
$ex$SELECT string_agg(
format(
'CAST(%2$s->>%%1$L AS INTEGER)',
key),
', '
)
FROM (SELECT DISTINCT key
FROM %1$s, jsonb_each(%2$s)
ORDER BY 1
) s;$ex$,
table_name, json_column
)
INTO cols;
EXECUTE
format(
$ex$SELECT string_agg(
format(
'CAST(%2$s->>%%1$L AS INTEGER)',
key
),
'+'
)
FROM (SELECT DISTINCT key
FROM %1$s, jsonb_each(%2$s)
ORDER BY 1) s;$ex$,
table_name, json_column
)
INTO cols_sum;
EXECUTE
format(
$ex$DROP VIEW IF EXISTS %2$s;
CREATE VIEW %2$s AS
SELECT %3$s, %4$s, SUM(%5$s) AS total
FROM %1$s
GROUP BY %3$s$ex$,
table_name, view_name, regular_columns, cols, cols_sum
);
RETURN cols;
END
$BODY$;
Call Function
SELECT xyz.example(
'xyz.table_a',
' id',
'scores',
'xyz.view_table_a'
);
Once you run these steps, I am getting an error
ERROR: column "int4" specified more than once
CONTEXT: SQL statement "
DROP VIEW IF EXISTS xyz.view_table_a;
CREATE VIEW xyz.view_table_a AS
SELECT id, CAST(scores->>'a' AS INTEGER), CAST(scores->>'b' AS INTEGER), SUM(CAST(scores->>'a' AS INTEGER)+CAST(scores->>'b' AS INTEGER)) AS total FROM xyz.table_a GROUP BY id
Look at the error message closely:
...
SELECT id, CAST(scores->>'a' AS INTEGER), CAST(scores->>'b' AS INTEGER),
...
There are multiple expressions without a column alias. A plain column reference like "id" defaults to the column's name. But other expressions default to the internal type name, which is "int4" for integer. One might assume that the JSON key name is used, but that's not so: CAST(scores->>'a' AS INTEGER) is just another expression returning an unnamed integer value.
This still works for a plain SELECT, because Postgres tolerates duplicate column names in the (outer) SELECT list. But a VIEW cannot be created that way; that would result in ambiguities.
Either add column aliases to expressions in the SELECT list:
SELECT id, CAST(scores->>'a' AS INTEGER) AS a, CAST(scores->>'b' AS INTEGER) AS b, ...
Or add a list of column names to CREATE VIEW:
CREATE VIEW xyz.view_table_a(id, a, b, ...) AS ...
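To see the naming behaviour in isolation, here is a minimal sketch (PostgreSQL assumed; the view names v_alias and v_list are made up) that runs the same two casts as a plain query and then as views, with the failing variant left commented out:

```sql
-- Fine as a plain query: both result columns are simply called "int4".
SELECT CAST('1' AS integer), CAST('2' AS integer);

-- Fails with: column "int4" specified more than once
-- CREATE VIEW v_dup AS SELECT CAST('1' AS integer), CAST('2' AS integer);

-- Fix 1: aliases in the SELECT list.
CREATE VIEW v_alias AS
SELECT CAST('1' AS integer) AS a, CAST('2' AS integer) AS b;

-- Fix 2: a column name list on CREATE VIEW.
CREATE VIEW v_list(a, b) AS
SELECT CAST('1' AS integer), CAST('2' AS integer);
```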
Something like this should fix your function (preserving the literal spelling of JSON key names):
...
format(
'CAST(%2$s->>%%1$L AS INTEGER) AS %%1$I',
key),
...
See the working demo here:
db<>fiddle here
As an aside, your nested format() calls make the code pretty hard to read and maintain.
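One possible way to flatten the nesting, sketched under assumptions: a hypothetical function example_v2 and demo table scores_demo, the table passed as regclass (so format('%s') quotes it), and a single grouping column instead of a free-form column list. It runs one dynamic query to collect the JSON keys, builds both expression lists with ordinary, non-nested format() calls (aliasing every cast), and then creates the view.

```sql
-- Hypothetical demo table shaped like xyz.table_a.
CREATE TABLE scores_demo (id bigint PRIMARY KEY, scores jsonb);
INSERT INTO scores_demo VALUES (1, '{"a":20,"b":20}');

CREATE OR REPLACE FUNCTION example_v2(
    table_name  regclass,  -- rendered quoted/qualified by format('%s')
    id_column   text,      -- a single grouping column (simplification)
    json_column text,
    view_name   text       -- inserted verbatim, as in the original
) RETURNS text
LANGUAGE plpgsql AS
$func$
DECLARE
    keys     text[];
    cols     text;
    cols_sum text;
BEGIN
    -- The only dynamic query needed to inspect the data: sorted distinct JSON keys.
    EXECUTE format('SELECT array_agg(DISTINCT k ORDER BY k) FROM %s, jsonb_object_keys(%I) AS k',
                   table_name, json_column)
    INTO keys;

    -- Static SQL builds both expression lists; every cast gets an alias.
    SELECT string_agg(format('CAST(%I->>%L AS integer) AS %I', json_column, k, k), ', ' ORDER BY k),
           string_agg(format('CAST(%I->>%L AS integer)', json_column, k), ' + ' ORDER BY k)
    INTO cols, cols_sum
    FROM unnest(keys) AS k;

    EXECUTE format('DROP VIEW IF EXISTS %s', view_name);
    EXECUTE format('CREATE VIEW %s AS SELECT %I, %s, SUM(%s) AS total FROM %s GROUP BY %I',
                   view_name, id_column, cols, cols_sum, table_name, id_column);
    RETURN cols;
END
$func$;

SELECT example_v2('scores_demo', 'id', 'scores', 'scores_demo_view');
```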

Run a stored procedure using select columns as input parameters?

I have a select query that returns a dataset with "n" records in one column. I would like to use this column as the parameter in a stored procedure. Below a reduced example of my case.
The query:
SELECT code FROM rawproducts
The dataset:
CODE
1
2
3
The stored procedure:
ALTER PROCEDURE [dbo].[MyInsertSP]
(@code INT)
AS
BEGIN
INSERT INTO PRODUCTS (description, price, stock)
SELECT description, price, stock
FROM INVENTORY I
WHERE I.icode = @code
END
I already have the actual query and stored procedure done; I just am not sure how to put them both together.
I would appreciate any assistance here! Thank you!
PS: of course the stored procedure is not as simple as above. I just choose to use a very silly example to keep things small here. :)
Here are two methods for you. The first uses a loop without a cursor:
DECLARE @code_list TABLE (code INT, row_id INT);
INSERT INTO @code_list SELECT code, ROW_NUMBER() OVER (ORDER BY code) AS row_id FROM rawproducts;
DECLARE @count INT;
SELECT @count = COUNT(*) FROM @code_list;
WHILE @count > 0
BEGIN
DECLARE @code INT;
SELECT @code = code FROM @code_list WHERE row_id = @count;
EXEC MyInsertSP @code;
DELETE FROM @code_list WHERE row_id = @count;
SELECT @count = COUNT(*) FROM @code_list;
END;
This works by putting the codes into a table variable, and assigning a number from 1..n to each row. Then we loop through them, one at a time, deleting them as they are processed, until there is nothing left in the table variable.
But here's what I would consider a better method:
CREATE TYPE dbo.code_list AS TABLE (code INT);
GO
CREATE PROCEDURE MyInsertSP (
@code_list dbo.code_list READONLY) -- table-valued parameters must be READONLY
AS
BEGIN
INSERT INTO PRODUCTS (
[description],
price,
stock)
SELECT
i.[description],
i.price,
i.stock
FROM
INVENTORY i
INNER JOIN @code_list cl ON cl.code = i.icode;
END;
GO
DECLARE @code_list dbo.code_list;
INSERT INTO @code_list SELECT code FROM rawproducts;
EXEC MyInsertSP @code_list = @code_list;
To get this to work, I create a user-defined table type, then use it to pass a list of codes into the stored procedure. It means slightly rewriting your stored procedure, but the actual code to do the work is much smaller.
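PostgreSQL has no table-valued parameters. A common substitute, sketched here assuming PostgreSQL 11 or later (for CREATE PROCEDURE) and hypothetical lowercase table and procedure names, is to pass the whole list as one array argument and filter with = ANY:

```sql
-- Hypothetical PostgreSQL stand-ins for the question's tables.
CREATE TABLE rawproducts (code int);
CREATE TABLE inventory (icode int, description text, price numeric(10,2), stock int);
CREATE TABLE products (description text, price numeric(10,2), stock int);

INSERT INTO rawproducts VALUES (1), (2);
INSERT INTO inventory VALUES (1, 'one', 1.50, 5), (2, 'two', 2.50, 7), (3, 'three', 3.50, 9);

-- The whole list arrives as a single int[] argument.
CREATE PROCEDURE my_insert_sp(_codes int[])
LANGUAGE sql AS
$$
INSERT INTO products (description, price, stock)
SELECT i.description, i.price, i.stock
FROM inventory AS i
WHERE i.icode = ANY (_codes);
$$;

-- Collect the SELECT result into an array and make one call.
CALL my_insert_sp(ARRAY(SELECT code FROM rawproducts));
```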
(how to) Run a stored procedure using select columns as input parameters?
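What you are looking for is APPLY: APPLY is how you use columns as input parameters. It works with table-valued functions rather than stored procedures, so the procedure's logic moves into an inline function. The only thing unclear is how/where the input column is populated. Let's start with sample data: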
IF OBJECT_ID('dbo.Products', 'U') IS NOT NULL DROP TABLE dbo.Products;
IF OBJECT_ID('dbo.Inventory','U') IS NOT NULL DROP TABLE dbo.Inventory;
IF OBJECT_ID('dbo.Code','U') IS NOT NULL DROP TABLE dbo.Code;
CREATE TABLE dbo.Products
(
[description] VARCHAR(1000) NULL,
price DECIMAL(10,2) NOT NULL,
stock INT NOT NULL
);
CREATE TABLE dbo.Inventory
(
icode INT NOT NULL,
[description] VARCHAR(1000) NULL,
price DECIMAL(10,2) NOT NULL,
stock INT NOT NULL
);
CREATE TABLE dbo.Code(icode INT NOT NULL);
INSERT dbo.Inventory
VALUES (10,'',20.10,3),(11,'',40.10,3),(11,'',25.23,3),(11,'',55.23,3),(12,'',50.23,3),
(15,'',33.10,3),(15,'',19.16,5),(18,'',75.00,3),(21,'',88.00,3),(21,'',100.99,3);
CREATE CLUSTERED INDEX uq_inventory ON dbo.Inventory(icode);
The function:
CREATE FUNCTION dbo.fnInventory(@code INT)
RETURNS TABLE AS RETURN
SELECT i.[description], i.price, i.stock
FROM dbo.Inventory I
WHERE I.icode = @code;
USE:
DECLARE @code TABLE (icode INT);
INSERT @code VALUES (10),(11);
SELECT f.[description], f.price, f.stock
FROM @code AS c
CROSS APPLY dbo.fnInventory(c.icode) AS f;
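For comparison, PostgreSQL spells CROSS APPLY as CROSS JOIN LATERAL. A minimal sketch of the same pattern, assuming PostgreSQL and a hypothetical inventory_demo table and fn_inventory function loaded with a subset of the sample rows above:

```sql
CREATE TABLE inventory_demo (
    icode       int NOT NULL,
    description text,
    price       numeric(10,2) NOT NULL,
    stock       int NOT NULL
);
INSERT INTO inventory_demo VALUES (10, '', 20.10, 3), (11, '', 40.10, 3), (12, '', 50.23, 3);

-- SQL-language set-returning function, analogous to dbo.fnInventory.
CREATE FUNCTION fn_inventory(_code int)
RETURNS TABLE (description text, price numeric, stock int)
LANGUAGE sql STABLE AS
$$
SELECT i.description, i.price, i.stock
FROM inventory_demo AS i
WHERE i.icode = _code
$$;

-- Each code from the left side is fed into the function, row by row.
SELECT f.description, f.price, f.stock
FROM (VALUES (10), (11)) AS c(icode)
CROSS JOIN LATERAL fn_inventory(c.icode) AS f;
```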
Results:
description price stock
-------------- -------- -----------
20.10 3
40.10 3
25.23 3
55.23 3
Updated Proc (note my comments):
ALTER PROC dbo.MyInsertSP -- (1) Lose the input param
AS
-- (2) Code that populates the "code" table
INSERT dbo.Code VALUES (10),(11);
-- (3) Use CROSS APPLY to pass the values from dbo.code to your function
INSERT dbo.Products ([description], price, stock)
SELECT f.[description], f.price, f.stock
FROM dbo.code AS c
CROSS APPLY dbo.fnInventory(c.icode) AS f;
This ^^^ is how it's done.

Copy rows into same table, but change value of one field

I have a list of values:
(56957,85697,56325,45698,21367,56397,14758,39656)
and a 'template' row in a table.
I want to do this:
for value in valuelist:
{
insert into table1 (field1, field2, field3, field4)
select value1, value2, value3, (value)
from table1
where ID = (ID of template row)
}
I know how I would do this in code, like C# for instance, but I'm not sure how to 'loop' this while passing a new value to the insert statement. (I know that code makes no sense; I'm just trying to convey what I'm trying to accomplish.)
There is no need to loop here: SQL is a set-based language, and you apply your operations to entire sets of data at once rather than looping through them row by row.
INSERT statements can take their rows either from an explicit list of values or from the result of a regular SELECT statement, for example:
insert into table1(col1, col2)
select col3
,col4
from table2;
Nothing stops you from selecting your data from the same table you are inserting into, which will duplicate all your data:
insert into table1(col1, col2)
select col1
,col2
from table1;
If you want to edit one of these column values - say by incrementing the value currently held, you simply apply this logic to your select statement and make sure the resultant dataset matches your target table in number of columns and data types:
insert into table1(col1, col2)
select col1
,col2+1 as col2
from table1;
Optionally, if you only want to do this for a subset of those values, just add a standard where clause:
insert into table1(col1, col2)
select col1
,col2+1 as col2
from table1
where col1 = <your value>;
Now, if this isn't enough for you to work it out yourself, you can join your dataset to your values list to get a copy of the data to be inserted for each value in that list. Because you want each row to join to each value, you can use a cross join:
declare @v table(value int);
insert into @v values(56957),(85697),(56325),(45698),(21367),(56397),(14758),(39656);
insert into table1(col1, col2, value)
select t.col1
,t.col2
,v.value
from table1 as t
cross join @v as v
where t.ID = <ID of template row>;
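The value list does not have to live in a table variable. A minimal sketch using a VALUES list as a derived table, assuming a hypothetical table1 whose template row has id = 1; the DDL is PostgreSQL, while the INSERT ... SELECT with a VALUES derived table should also be accepted by SQL Server 2008 and later:

```sql
-- Hypothetical target table holding one template row (id = 1).
CREATE TABLE table1 (id serial PRIMARY KEY, field1 text, field2 text, field3 text, field4 int);
INSERT INTO table1 (field1, field2, field3, field4) VALUES ('x', 'y', 'z', NULL);

-- Copy the template row once per value; only field4 differs between copies.
INSERT INTO table1 (field1, field2, field3, field4)
SELECT t.field1, t.field2, t.field3, v.val
FROM table1 AS t
CROSS JOIN (VALUES (56957), (85697), (56325), (45698),
                   (21367), (56397), (14758), (39656)) AS v(val)
WHERE t.id = 1;
```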

How do I avoid listing all the table columns in a PostgreSQL returns statement?

I have a PostgreSQL function similar to this:
CREATE OR REPLACE FUNCTION dbo.MyTestFunction(
_ID INT
)
RETURNS dbo.MyTable AS
$$
SELECT *,
(SELECT Name FROM dbo.MySecondTable WHERE RecordID = PersonID)
FROM dbo.MyTable
WHERE PersonID = _ID
$$ LANGUAGE SQL STABLE;
I would really like to NOT have to replace the RETURNS dbo.MyTable AS with something like:
RETURNS TABLE(
col1 INT,
col2 TEXT,
col3 BOOLEAN,
col4 TEXT
) AS
and list out all the columns of MyTable and Name of MySecondTable. Is this something that can be done? Thanks.
--EDIT--
To clarify I have to return ALL columns in MyTable and 1 column from MySecondTable. If MyTable has >15 columns, I don't want to have to list out all the columns in a RETURNS TABLE (col1.. coln).
You just list the columns that you want returned in the SELECT portion of your SQL statement:
SELECT t1.column1, t1.column2,
(SELECT Name FROM dbo.MySecondTable WHERE RecordID = PersonID)
FROM dbo.MyTable t1
WHERE PersonID = _ID
Now you'll just get column1, column2, and name returned.
Furthermore, you'll probably find better performance using a LEFT OUTER JOIN in your FROM portion of the SQL statement as opposed to the correlated subquery you have now:
SELECT t1.column1, t1.column2, t2.Name
FROM dbo.MyTable t1
LEFT OUTER JOIN dbo.MySecondTable t2 ON
t2.RecordID = t1.PersonID
WHERE PersonID = _ID
Took a bit of a guess on where RecordID and PersonID were coming from, but that's the general idea.
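If the goal really is to return every column of MyTable plus Name without spelling them out, one option in PostgreSQL is to let a view define the row type and return SETOF that view. A minimal sketch, assuming hypothetical lowercase stand-ins (my_table, my_second_table, my_table_with_name, my_test_function) for the objects in the question:

```sql
-- Hypothetical stand-ins for dbo.MyTable and dbo.MySecondTable.
CREATE TABLE my_table (personid int PRIMARY KEY, first_name text, active boolean);
CREATE TABLE my_second_table (recordid int, name text);

-- The view's row type = all columns of my_table plus name.
CREATE VIEW my_table_with_name AS
SELECT t.*, s.name
FROM my_table AS t
LEFT JOIN my_second_table AS s ON s.recordid = t.personid;

-- A view's row type can serve as a function's return type.
CREATE FUNCTION my_test_function(_id int)
RETURNS SETOF my_table_with_name
LANGUAGE sql STABLE AS
$$
SELECT * FROM my_table_with_name WHERE personid = _id
$$;

INSERT INTO my_table VALUES (1, 'Ann', true);
INSERT INTO my_second_table VALUES (1, 'second');

SELECT * FROM my_test_function(1);
```

Note that t.* is expanded when the view is created, so columns added to my_table later only show up after the view is recreated.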

Can't drop temp table in Postgres function: "being used by active queries in this session"

The function is expected to take a table called waypoints as input and run through the function body.
drop function if exists everything(waypoints);
create function everything(waypoints) RETURNS TABLE(node int, xy text[]) as $$
BEGIN
drop table if exists bbox;
create temporary table bbox(...);
insert into bbox
select ... from waypoints;
drop table if exists b_spaces;
create temporary table b_spaces(
...
);
insert into b_spaces
select ...
drop table if exists b_graph; -- Line the error flags.
create temporary table b_graph(
...
);
insert into b_graph
select ...
drop table if exists local_green;
create temporary table local_green(
...
);
insert into local_green
...
with aug_temp as (
select ...
)
insert into b_graph(source, target, cost) (
(select ... from aug_temp)
UNION
(select ... from aug_temp)
);
return query
with
results as (
select id1, ... from b_graph -- The relation being complained about.
),
pkg as (
select loc, ...
)
select id1, array_agg(loc)
from pkg
group by id1;
return;
END;
$$ LANGUAGE plpgsql;
This returns the error: cannot DROP TABLE b_graph because it is being used by active queries in this session
How do I go about rectifying this issue?
The error message is rather obvious, you cannot drop a temp table while it is being used.
You might be able to avoid the problem by adding ON COMMIT DROP:
Temporary table and loops in a function
However, this can probably be simpler. If you don't need all those temp tables to begin with (which I suspect), you can replace them all with CTEs (or most of them probably even with cheaper subqueries) and simplify to one big query. Can be plpgsql or just SQL:
CREATE FUNCTION everything(waypoints)
RETURNS TABLE(node int, xy text[]) AS
$func$
WITH bbox AS (SELECT ... FROM waypoints) -- the table, not the function parameter!
, b_spaces AS (SELECT ... )
, b_graph AS (SELECT ... )
, local_green AS (SELECT ... )
, aug_temp AS (SELECT ... )
, b_graph2(source, target, cost) AS (
SELECT ... FROM b_graph
UNION ALL -- guessing you really want UNION ALL
SELECT ... FROM aug_temp
UNION ALL
SELECT ... FROM aug_temp
)
, results AS (SELECT id1, ... FROM b_graph2)
, pkg AS (SELECT loc, ... )
SELECT id1, array_agg(loc)
FROM pkg
GROUP BY id1
$func$ LANGUAGE sql;
Views only store a query ("the recipe"), not the actual resulting values ("the soup").
It's typically cheaper to use CTEs instead of creating temp tables.
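A small self-contained illustration of that idea, assuming PostgreSQL and a made-up waypoints_demo table and padded_bbox function (the real waypoints columns are not shown in the question): two former temp-table steps become chained CTEs inside one SQL function, so nothing has to be created or dropped between calls.

```sql
-- Hypothetical input table.
CREATE TABLE waypoints_demo (id int PRIMARY KEY, x float8, y float8);
INSERT INTO waypoints_demo VALUES (1, 0, 0), (2, 3, 4), (3, 1, 2);

CREATE FUNCTION padded_bbox(_pad float8)
RETURNS TABLE (x_lo float8, y_lo float8, x_hi float8, y_hi float8)
LANGUAGE sql STABLE AS
$func$
WITH bbox AS (      -- formerly a CREATE TEMP TABLE + INSERT step
    SELECT min(x) AS x_lo, min(y) AS y_lo, max(x) AS x_hi, max(y) AS y_hi
    FROM waypoints_demo
), padded AS (      -- a later step reads the earlier CTE like a table
    SELECT x_lo - _pad AS x_lo, y_lo - _pad AS y_lo,
           x_hi + _pad AS x_hi, y_hi + _pad AS y_hi
    FROM bbox
)
SELECT x_lo, y_lo, x_hi, y_hi FROM padded
$func$;

SELECT * FROM padded_bbox(1);
```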
Derived tables in queries, sorted by their typical overall performance (exceptions for special cases involving indexes). From slow to fast:
CREATE TABLE
CREATE UNLOGGED TABLE
CREATE TEMP TABLE
CTE
subquery
UNION tries to fold duplicate rows. Typically, people really want UNION ALL, which just appends rows: it is faster and does not try to remove duplicates.
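The difference is easy to check directly (PostgreSQL syntax assumed; the aliases are arbitrary):

```sql
-- UNION removes duplicate rows (an extra de-duplication step).
SELECT count(*) AS n FROM (SELECT 1 AS v UNION SELECT 1) AS s;      -- n = 1

-- UNION ALL just appends the second result to the first.
SELECT count(*) AS n FROM (SELECT 1 AS v UNION ALL SELECT 1) AS s;  -- n = 2
```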