Reusing json parsed input in postgres plpgsql function - postgresql

I have a plpgsql function that takes a jsonb input, and uses it to first check something, and then again in a query to get results. Something like:
CREATE OR REPLACE FUNCTION public.my_func(
    a jsonb,
    OUT inserted integer)
  RETURNS integer
  LANGUAGE 'plpgsql'
  COST 100.0
  VOLATILE NOT LEAKPROOF
AS $function$
BEGIN
  -- fail if there's something already there
  IF EXISTS (
    SELECT t.x
    FROM   jsonb_populate_recordset(null::my_type, a) f
    INNER  JOIN some_table t ON f.x = t.x AND f.y = t.y
  ) THEN
    RAISE EXCEPTION 'concurrency violation... already present.';
  END IF;
  -- straight insert, and collect number of inserted
  WITH inserted_rows AS (
    INSERT INTO some_table (x, y, z)
    SELECT f.x, f.y, f.z
    FROM   jsonb_populate_recordset(null::my_type, a) f
    RETURNING 1
  )
  SELECT count(*) FROM inserted_rows INTO inserted;
END
$function$;
Here, I'm using jsonb_populate_recordset(null::my_type, a) both in the IF check, and also in the actual insert. Is there a way to do the parsing once - perhaps via a variable of some sort? Or would the query optimiser kick in and ensure the parse operation happens only once?

If I understand correctly, you are looking for something like this:
CREATE OR REPLACE FUNCTION public.my_func(
    a jsonb,
    OUT inserted integer)
  RETURNS integer
  LANGUAGE 'plpgsql'
  COST 100.0
  VOLATILE NOT LEAKPROOF
AS $function$
BEGIN
  WITH checked_rows AS (
    SELECT f.x, f.y, f.z, t.x IS NOT NULL AS present
    FROM   jsonb_populate_recordset(null::my_type, a) f
    LEFT   JOIN some_table t ON f.x = t.x AND f.y = t.y
  ), violated_rows AS (
    SELECT count(*) AS violated
    FROM   checked_rows AS c
    WHERE  c.present
  ), inserted_rows AS (
    INSERT INTO some_table (x, y, z)
    SELECT c.x, c.y, c.z
    FROM   checked_rows AS c
    WHERE  (SELECT violated FROM violated_rows) = 0
    RETURNING 1
  )
  SELECT count(*) FROM inserted_rows INTO inserted;
  IF inserted = 0 THEN
    RAISE EXCEPTION 'concurrency violation... already present.';
  END IF;
END;
$function$;

A jsonb value does not need to be parsed more than once; parsing happens at the assignment:
while jsonb data is stored in a decomposed binary format that makes it slightly slower to input due to added conversion overhead, but significantly faster to process, since no reparsing is needed.
https://www.postgresql.org/docs/current/datatype-json.html
The jsonb_populate_recordset function is declared STABLE:
STABLE indicates that the function cannot modify the database, and that within a single table scan it will consistently return the same result for the same argument values, but that its result could change across SQL statements.
https://www.postgresql.org/docs/current/sql-createfunction.html
I am not sure about this: on the one hand, a UDF call is considered a single statement; on the other hand, a UDF can contain multiple statements. Clarification is needed.
Finally, if you want to cache such things, you can use arrays:
CREATE OR REPLACE FUNCTION public.my_func(
    a jsonb,
    OUT inserted integer)
  RETURNS integer
  LANGUAGE 'plpgsql'
  COST 100.0
  VOLATILE NOT LEAKPROOF
AS $function$
DECLARE
  d my_type[];  -- the variable used for caching
BEGIN
  SELECT array_agg(f) INTO d FROM jsonb_populate_recordset(null::my_type, a) AS f;
  -- fail if there's something already there
  IF EXISTS (
    SELECT *
    FROM   some_table t
    WHERE  (t.x, t.y) IN (SELECT x, y FROM unnest(d))
  ) THEN
    RAISE EXCEPTION 'concurrency violation... already present.';
  END IF;
  -- straight insert, and collect number of inserted
  WITH inserted_rows AS (
    INSERT INTO some_table (x, y, z)
    SELECT f.x, f.y, f.z
    FROM   unnest(d) f
    RETURNING 1
  )
  SELECT count(*) FROM inserted_rows INTO inserted;
END
$function$;

If you actually want to reuse a result set repeatedly, the general solution would be a temporary table. Example:
Using temp table in PL/pgSQL procedure for cleaning tables
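A minimal sketch of that approach inside the function body (assuming the same jsonb parameter a and row type my_type as above; tmp_input is a hypothetical name, and ON COMMIT DROP plus the DROP guard keep repeated calls in one session from colliding):
DROP TABLE IF EXISTS tmp_input;
CREATE TEMP TABLE tmp_input ON COMMIT DROP AS
SELECT * FROM jsonb_populate_recordset(null::my_type, a);
-- the EXISTS check and the INSERT can then both read FROM tmp_input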
However, that's rather expensive. Looks like all you need is a UNIQUE constraint or index:
Simple and safe with UNIQUE constraint
ALTER TABLE some_table ADD CONSTRAINT some_table_x_y_uni UNIQUE (x,y);
As opposed to your procedural attempt, this is also concurrency-safe (no race conditions). Much faster, too.
Then the function can be dead simple:
CREATE OR REPLACE FUNCTION public.my_func(a jsonb, OUT inserted integer) AS
$func$
BEGIN
   INSERT INTO some_table (x, y, z)
   SELECT f.x, f.y, f.z
   FROM   jsonb_populate_recordset(null::my_type, a) f;

   GET DIAGNOSTICS inserted = ROW_COUNT;  -- OUT param, we're done here
END
$func$ LANGUAGE plpgsql;
If any (x,y) is already present in some_table you get your exception. Choose an instructive name for the constraint, since it is reported in the error message.
And we can just read the command tag with GET DIAGNOSTICS, which is substantially cheaper than running another count query.
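For illustration, a caller can trap the constraint violation explicitly; a minimal sketch, assuming my_type has the fields x, y, z and using the constraint added above:
DO
$$
BEGIN
   PERFORM public.my_func('[{"x":1,"y":1,"z":1}]'::jsonb);
EXCEPTION WHEN unique_violation THEN
   RAISE NOTICE 'duplicate (x,y) rejected: %', SQLERRM;
END
$$;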
Related:
How does PostgreSQL enforce the UNIQUE constraint / what type of index does it use?
UNIQUE constraint not possible?
For the unlikely case that a UNIQUE constraint should not be feasible, you can still have it rather simple:
CREATE OR REPLACE FUNCTION public.my_func(a jsonb, OUT inserted integer) AS
$func$
BEGIN
   INSERT INTO some_table (x, y, z)
   SELECT f.x, f.y, f.z             -- empty result set if there are any violations
   FROM  (
      SELECT f.x, f.y, f.z, count(t.x) OVER () AS conflicts
      FROM   jsonb_populate_recordset(null::my_type, a) f
      LEFT   JOIN some_table t USING (x,y)
      ) f
   WHERE  f.conflicts = 0;

   GET DIAGNOSTICS inserted = ROW_COUNT;

   IF inserted = 0 THEN
      RAISE EXCEPTION 'concurrency violation... already present.';
   END IF;
END
$func$ LANGUAGE plpgsql;
Count the number of violations in the same query. (count() only counts non-null values). Related:
Best way to get result count before LIMIT was applied
You should have at least a simple index on some_table (x,y) anyway.
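That can be as simple as (a hypothetical index name):
CREATE INDEX some_table_x_y_idx ON some_table (x, y);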
It's important to know that plpgsql does not return results before control exits the function, so the exception cancels the return: the user never gets any results, only the error message. We added a code example to the manual.
Note, however, that there are race conditions here under concurrent write load. Related:
Is SELECT or INSERT in a function prone to race conditions?
Would the query planner avoid repeated evaluation?
Certainly not between multiple SQL statements.
Even if the function itself is defined STABLE or IMMUTABLE (jsonb_populate_recordset() in the example is STABLE), the query planner does not know that values of input parameters are unchanged between calls. It would be expensive to keep track and make sure of it.
Actually, since plpgsql treats SQL statements like prepared statements, that's plain impossible, since the query is planned before parameter values are fed to the planned query.


How to get a function to work within a DO statement in Postgresql

I have a long and complex plpgsql function that creates a bunch of temporary tables nested within a WHILE statement to get the optimal result. When the condition has been met, I insert the result into an existing table. The function is far too long to post here, but this is an example:
CREATE OR REPLACE FUNCTION public.test_function(id_input integer, val_input numeric)
RETURNS VOID AS
$BODY$
DECLARE
    test_val numeric := -1;  -- must start below 0 so the loop is entered
BEGIN
    WHILE test_val < 0
    LOOP
        DROP TABLE IF EXISTS temp_table;  -- allow the loop to recreate it
        CREATE TEMP TABLE temp_table AS
        SELECT a.existing_val - val_input AS new_val
        FROM existing_table a
        WHERE a.id = id_input;
        test_val := (SELECT new_val FROM temp_table);
        val_input := val_input + 1;
    END LOOP;
    INSERT INTO output_table (id, new_val)
    SELECT id_input, a.new_val
    FROM temp_table a;
END;
$BODY$
LANGUAGE plpgsql;
The function works if I call it like this: SELECT test_function(1, 1000). However, I would like to run this function on a table with 60,000+ rows, like this:
SELECT test_function(a.id, a.val_input)
FROM data_table a;
It works when I use a subset of the data_table, say 1000 rows. However, when I run it on the full table (60,000+ rows) I get the following error: "AbortTransaction while in COMMIT state". After some reading I found out that a function runs in a single transaction and only commits when it finishes, so in my case the inserts do not occur until the function has finished running, which takes about 4 hours. So does anyone know what is going on?
As a workaround I tried nesting the function in a DO statement so the inserts are committed straight away:
DO
$do$
DECLARE
   r data_table%rowtype;
BEGIN
   FOR r IN
      SELECT * FROM data_table
   LOOP
      SELECT public.test_function(r.id, r.val_input);
   END LOOP;
END
$do$;
I then get the following error: "ERROR: query has no destination for result data", which I guess means I need to rewrite the function to use PERFORM instead of SELECT. However, I have not had any luck with this as yet.
Any ideas?
Since you are not interested in the function result, you should use
PERFORM public.test_function(r.id, r.val_input);
instead of
SELECT public.test_function(r.id, r.val_input);
The latter syntax would only work if you add INTO some_variable as a destination for the query result.
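Applied to the DO block from the question, the working version is simply:
DO
$do$
DECLARE
   r data_table%rowtype;
BEGIN
   FOR r IN
      SELECT * FROM data_table
   LOOP
      PERFORM public.test_function(r.id, r.val_input);
   END LOOP;
END
$do$;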
Thank you all for your suggestions. I ended up using Jim Jones's suggestion and converting the function to a procedure which allowed me to use COMMIT after I did the INSERT. I also followed Jeremy's suggestion and moved from temp tables to CTE's. This solved the problem for me.
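For the record, a minimal sketch of that procedure conversion (hypothetical procedure name; COMMIT inside the loop requires Postgres 11+ and a top-level CALL outside an explicit transaction block):
CREATE OR REPLACE PROCEDURE public.run_test_function()
LANGUAGE plpgsql AS
$proc$
DECLARE
   r record;
BEGIN
   FOR r IN SELECT id, val_input FROM data_table LOOP
      PERFORM public.test_function(r.id, r.val_input);
      COMMIT;  -- makes each iteration's work durable right away
   END LOOP;
END
$proc$;

CALL public.run_test_function();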

Query on Return Statement - PostgreSQL

I have this question, I was doing some migration from SQL Server to PostgreSQL 12.
The scenario, I am trying to accomplish:
The function should have a RETURN Statement, be it with SETOF 'tableType' or RETURN TABLE ( some number of columns )
The body starts with a count of records; if no record is found based on the input parameters, then simply return zero (0), else return the entire set of records defined in the RETURN statement.
The equivalent in SQL Server or Oracle: you can just put a SELECT statement inside a procedure to accomplish this. But it is kind of difficult in PostgreSQL.
Any suggestion, please.
What I could accomplish so far: if no record is found, it will simply return NULL, maybe using PERFORM, or maybe selecting NULL as the column name for the returning tableType columns.
I hope I am clear !
What I want is something like -
============================================================
CREATE OR REPLACE FUNCTION public.get_some_data(
    id integer)
RETURNS TABLE ( id_1 integer, name character varying )
LANGUAGE 'plpgsql'
AS $BODY$
DECLARE
    p_id alias for $1;
    v_cnt integer := 0;
BEGIN
    SELECT COUNT(1) INTO v_cnt
    FROM public.exampleTable e
    WHERE p_id = e.id::integer;
    IF v_cnt = 0 THEN
        SELECT 0;          -- just return zero here
    ELSE
        SELECT a.id, a.name
        FROM public.exampleTable a
        WHERE a.id = p_id; -- return the whole result set here
    END IF;
END;
$BODY$;
If you just want to return a set of a single table, using returns setof some_table is indeed the easiest way. The most basic SQL function to do that would be:
create function get_data()
returns setof some_table
as
$$
select *
from some_table;
$$
language sql;
PL/pgSQL isn't really necessary to put a SELECT statement into a function, but if you need to do other things, you need to use RETURN QUERY in a PL/pgSQL function:
create function get_data()
returns setof some_table
as
$$
begin
return query
select *
from some_table;
end;
$$
language plpgsql;
A function has exactly one return type. You can't have a function that sometimes returns an integer and sometimes returns thousands of rows with a dozen columns.
The only thing you could do, if you insist on returning something, is something like this:
create function get_data()
returns setof some_table
as
$$
begin
return query
select *
from some_table;
if not found then
return query
select (null::some_table).*;
end if;
end;
$$
language plpgsql;
But I would consider the above an extremely ugly and confusing (not to say stupid) solution. I certainly wouldn't let that pass through a code review.
The caller of the function can test if something was returned in the same way I implemented that ugly hack: check the found variable after using the function.
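For example, from another PL/pgSQL block (a minimal sketch; PERFORM discards the rows but still sets FOUND):
DO
$$
BEGIN
   PERFORM * FROM get_data();
   IF NOT FOUND THEN
      RAISE NOTICE 'get_data() returned no rows';
   END IF;
END
$$;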
One more hack to get as close as possible to what you want. But I will repeat what others have told you: you cannot do what you want directly. Just because MS SQL Server lets you get away with poor coding does not mean Postgres is obligated to do so. As the link by a_horse_with_no_name implies, converting code is easy once you migrate how you think about the problem in the first place. The closest you can get is to return a tuple with a 0 id. The following is one way.
create or replace function public.get_some_data(
    p_id integer)
returns table ( id integer, name character varying )
language plpgsql
as $$
declare
    v_at_least_one boolean = false;
    v_exp_rec record;
begin
    for v_exp_rec in
        select a.id, a.name
        from public.exampletable a
        where a.id = p_id
        union all
        select 0, null
    loop
        if v_exp_rec.id::integer > 0
           or (v_exp_rec.id::integer = 0 and not v_at_least_one)
        then
            id = v_exp_rec.id;
            name = v_exp_rec.name;
            return next;
            v_at_least_one = true;
        end if;
    end loop;
    return;
end
$$;
But that is still just a hack, and it assumes there is no valid row with id=0. A much better approach would be for the calling routine to check what the function returns (it has to do that in one way or another anyway) and let the function just return the data found instead of making up data. That is the mindset shift. Doing that, you can reduce this function to a simple select statement:
create or replace function public.get_some_data2(
p_id integer)
returns table ( id integer, name character varying )
language sql strict
as $$
select a.id, a.name
from public.exampletable a
where a.id = p_id;
$$;
Or one of the other solutions offered.

Postgres FOR LOOP

I am trying to get 25 random samples of 15,000 IDs from a table. Instead of manually pressing run every time, I'm trying to do a loop. Which I fully understand is not the optimum use of Postgres, but it is the tool I have. This is what I have so far:
for i in 1..25 LOOP
insert into playtime.meta_random_sample
select i, ID
from tbl
order by random() limit 15000
end loop
Procedural elements like loops are not part of the SQL language and can only be used inside the body of a procedural language function, procedure (Postgres 11 or later) or a DO statement, where such additional elements are defined by the respective procedural language. The default is PL/pgSQL, but there are others.
Example with plpgsql:
DO
$do$
BEGIN
   FOR i IN 1..25 LOOP
      INSERT INTO playtime.meta_random_sample
             (col_i, col_id) -- declare target columns!
      SELECT i, id
      FROM   tbl
      ORDER  BY random()
      LIMIT  15000;
   END LOOP;
END
$do$;
For many tasks that can be solved with a loop, there is a shorter and faster set-based solution around the corner. Pure SQL equivalent for your example:
INSERT INTO playtime.meta_random_sample (col_i, col_id)
SELECT t.*
FROM   generate_series(1,25) i
CROSS  JOIN LATERAL (
   SELECT i, id
   FROM   tbl
   ORDER  BY random()
   LIMIT  15000
   ) t;
About generate_series():
What is the expected behaviour for multiple set-returning functions in SELECT clause?
About optimizing performance of random selections:
Best way to select random rows PostgreSQL
Below is an example you can use:
create temp table test2 (
id1 numeric,
id2 numeric,
id3 numeric,
id4 numeric,
id5 numeric,
id6 numeric,
id7 numeric,
id8 numeric,
id9 numeric,
id10 numeric)
with (oids = false);
do
$do$
declare
i int;
begin
for i in 1..100000
loop
insert into test2 values (random(), i * random(), i / random(), i + random(), i * random(), i / random(), i + random(), i * random(), i / random(), i + random());
end loop;
end;
$do$;
I just ran into this question and, while it is old, I figured I'd add an answer for the archives. The OP asked about for loops, but their goal was to gather a random sample of rows from the table. For that task, Postgres 9.5+ offers the TABLESAMPLE clause in FROM. Here's a good rundown:
https://www.2ndquadrant.com/en/blog/tablesample-in-postgresql-9-5-2/
I tend to use Bernoulli as it's row-based rather than page-based, but the original question is about a specific row count. For that, there's a built-in extension:
https://www.postgresql.org/docs/current/tsm-system-rows.html
CREATE EXTENSION tsm_system_rows;
Then you can grab whatever number of rows you want:
select * from playtime tablesample system_rows (15);
I find it more convenient to make a connection using a procedural programming language (like Python) and do these types of queries.
import psycopg2

connection_psql = psycopg2.connect(user="admin_user",
                                   password="***",
                                   port="5432",
                                   database="myDB",
                                   host="[ENDPOINT]")
cursor_psql = connection_psql.cursor()

myList = [...]
for item in myList:
    cursor_psql.execute('''
      -- The query goes here
    ''')

connection_psql.commit()
cursor_psql.close()
Here is one complex Postgres function involving a UUID array, a FOR loop, a CASE condition, and an enum data update. The function walks over each order, checks the condition, and updates the individual row.
CREATE OR REPLACE FUNCTION order_status_update() RETURNS void AS $$
DECLARE
    oid_list uuid[];
    oid uuid;
BEGIN
    SELECT array_agg(order_id) INTO oid_list FROM "order";  -- "order" is a reserved word, so quote it
    FOREACH oid IN ARRAY oid_list
    LOOP
        WITH status_cmp AS (SELECT COUNT(sku) = 0 AS empty,
                                   COUNT(sku) < COUNT(sku_order_id) AS partial,
                                   COUNT(sku) = COUNT(sku_order_id) AS full
                            FROM fulfillment
                            WHERE order_id = oid)
        UPDATE "order"
        SET status = CASE WHEN status_cmp.empty THEN 'EMPTY'::orderstatus
                          WHEN status_cmp.full THEN 'FULL'::orderstatus
                          WHEN status_cmp.partial THEN 'PARTIAL'::orderstatus
                          ELSE null
                     END
        FROM status_cmp
        WHERE order_id = oid;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
To run the above function
SELECT order_status_update();
Using a procedure:
CREATE or replace PROCEDURE pg_temp_3.insert_data()
LANGUAGE SQL
BEGIN ATOMIC
INSERT INTO meta_random_sample(col_serial, parent_id)
SELECT t.*
FROM generate_series(1,25) i
CROSS JOIN LATERAL (
SELECT i, parent_id
FROM parent_tree order by random() limit 2
) t;
END;
Call the procedure.
call pg_temp_3.insert_data();
PostgreSQL manual: https://www.postgresql.org/docs/current/sql-createprocedure.html

update many to many records with postgresql / plpgsql

I have a web based system that has several tables (postgres/pgsql) that hold many-to-many relationships, such as:
table x
column_id1 smallint FK
column_id2 smallint FK
In this scenario the update is made based on column_id2
At first, to update these records, we would run the following function:
-- edited to protect the innocent
-- edited to protect the innocent
CREATE FUNCTION associate_id1_with_id2(integer[], integer) RETURNS integer
AS $_$
DECLARE
    a alias for $1;
    b alias for $2;
    i integer;
BEGIN
    delete from tablex where column_id2 = b;
    FOR i IN array_lower(a,1) .. array_upper(a,1) LOOP
        INSERT INTO tablex (
            column_id2,
            column_id1)
        VALUES (
            b,
            a[i]);
    end loop;
    RETURN i;
END;
$_$
LANGUAGE plpgsql;
That seemed sloppy, and now with the addition of auditing it really shows.
What I am trying to do now is only delete and insert the necessary rows.
I have been trying various forms of the following with no luck:
CREATE OR REPLACE FUNCTION associate_id1_with_id2(integer[], integer) RETURNS integer
AS $_$
DECLARE
a alias for $1;
b alias for $2;
c varchar;
i integer;
BEGIN
c = array_to_string($1,',');
INSERT INTO tablex (
column_id2,
column_id1)
(
SELECT column_id2, column_id1
FROM tablex
WHERE column_id2 = b
AND column_id1 NOT IN (c)
);
DELETE FROM tablex
WHERE column_id2 = b
AND column_id1 NOT IN (c);
RETURN i;
END;
$_$
LANGUAGE plpgsql;
Depending on the version of the function I'm attempting, there are various errors, such as needing explicit type casts (I'm guessing it doesn't like c being varchar?) for the current version.
First off, is my approach correct, or is there a more elegant solution, given there are a couple of tables for which this type of handling is required? If not, could you please point me in the right direction?
If this is the right approach, could you please assist with the array conversion for the NOT IN portion of the WHERE clause?
Instead of array_to_string, use unnest to transform the array into a set of rows (as if it was a table), and the problem can be solved with vanilla SQL:
INSERT INTO tablex(column_id1,column_id2)
select ai,b from unnest(a) as ai where not exists
(select 1 from tablex where column_id1=ai and column_id2=b);
DELETE FROM tablex
where column_id2=b and column_id1 not in
(select ai from unnest(a) as ai);
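Wrapped back into the function from the question, that could look like this (a sketch with the same aliased parameters; it returns void since the old loop counter no longer means anything):
CREATE OR REPLACE FUNCTION associate_id1_with_id2(integer[], integer) RETURNS void
AS $_$
DECLARE
    a alias for $1;
    b alias for $2;
BEGIN
    INSERT INTO tablex (column_id1, column_id2)
    SELECT ai, b FROM unnest(a) AS ai
    WHERE NOT EXISTS
        (SELECT 1 FROM tablex WHERE column_id1 = ai AND column_id2 = b);

    DELETE FROM tablex
    WHERE column_id2 = b AND column_id1 NOT IN
        (SELECT ai FROM unnest(a) AS ai);
END;
$_$
LANGUAGE plpgsql;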

Stored function with temporary table in postgresql

I'm new to writing stored functions in PostgreSQL and in general. I'm trying to write one with an input parameter that returns a set of results stored in a temporary table.
I do the following in my function:
1) Get a list of all the consumers and store their IDs in a temp table.
2) Iterate over a particular table, retrieve values corresponding to each value from the above list, and store them in a temp table.
3) Return the temp table.
Here's the function that I've tried to write by myself:
create or replace function getPumps(status varchar) returns setof record as $$ -- setof record?
DECLARE
    cons_id integer[];
    i integer;
    temp table tmp_table; -- Point B
BEGIN
    select consumer_id into cons_id from db_consumer_pump_details;
    FOR i in select * from cons_id LOOP
        select objectid, pump_id, pump_serial_id, repdate, pumpmake, db_consumer_pump_details.status,
               db_consumer.consumer_name, db_consumer.wenexa_id, db_consumer.rr_no
        into tmp_table
        from db_consumer_pump_details
        inner join db_consumer on db_consumer.consumer_id = db_consumer_pump_details.consumer_id
        where db_consumer_pump_details.consumer_id = i and db_consumer_pump_details.status = $1 -- Point A
        order by db_consumer_pump_details.consumer_id, pump_id, createddate desc limit 2;
    END LOOP;
    return tmp_table;
END;
$$
LANGUAGE plpgsql;
However, I'm not sure about my approach, and whether I'm right at points A and B as I've marked in the code above. And I'm getting a load of errors while trying to create the temporary table.
EDIT: Got the function to work, but I get the following error when I try to run the function.
ERROR: array value must start with "{" or dimension information
Here's my revised function.
create temp table tmp_table(objectid integer,pump_id integer,pump_serial_id varchar(50),repdate timestamp with time zone,pumpmake varchar(50),status varchar(2),consumer_name varchar(50),wenexa_id varchar(50),rr_no varchar(25));
select consumer_id into cons_id from db_consumer_pump_details;
FOR i in select * from cons_id LOOP
insert into tmp_table
select objectid,pump_id,pump_serial_id,repdate,pumpmake,db_consumer_pump_details.status,db_consumer.consumer_name,db_consumer.wenexa_id,db_consumer.rr_no from db_consumer_pump_details inner join db_consumer on db_consumer.consumer_id=db_consumer_pump_details.consumer_id where db_consumer_pump_details.consumer_id=i and db_consumer_pump_details.status=$1
order by db_consumer_pump_details.consumer_id,pump_id,createddate desc limit 2;
END LOOP;
return query (select * from tmp_table);
drop table tmp_table;
END;
$$
LANGUAGE plpgsql;
AFAIK one can't declare tables as variables in Postgres. What you can do is create one in your function body and use it throughout (or even outside of the function). Beware though, as temporary tables aren't dropped until the end of the session or commit.
The way to go is to use RETURN NEXT or RETURN QUERY
As for the function result type I always found RETURNS TABLE to be more readable.
edit:
Your cons_id array is unnecessary; just iterate over the values returned by the select.
Also, you can have multiple RETURN QUERY statements in a single function; each one appends the result of its query to the result returned by the function.
In your case:
CREATE OR REPLACE FUNCTION getPumps(status varchar)
RETURNS TABLE (objectid INTEGER, pump_id INTEGER, pump_serial_id INTEGER....)
AS
$$
DECLARE
    i integer;  -- the loop variable must be declared
BEGIN
    FOR i IN SELECT consumer_id FROM db_consumer_pump_details LOOP
        RETURN QUERY(
            SELECT objectid, pump_id, pump_serial_id, repdate, pumpmake, db_consumer_pump_details.status,
                   db_consumer.consumer_name, db_consumer.wenexa_id, db_consumer.rr_no
            FROM db_consumer_pump_details
            INNER JOIN db_consumer ON db_consumer.consumer_id = db_consumer_pump_details.consumer_id
            WHERE db_consumer_pump_details.consumer_id = i AND db_consumer_pump_details.status = $1
            ORDER BY db_consumer_pump_details.consumer_id, pump_id, createddate DESC LIMIT 2
        );
    END LOOP;
END;
$$ LANGUAGE plpgsql;
edit2:
You probably want to take a look at this solution to the groupwise-k-maximum problem, as that's exactly what you're dealing with here.
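For illustration, a set-based sketch of that idea for this case (assuming the column names from the question, with $1 standing for the function's status parameter as it would appear inside the function; row_number() keeps the 2 newest rows per consumer in a single pass, no loop or temp table needed):
SELECT objectid, pump_id, pump_serial_id, repdate, pumpmake, status, consumer_name, wenexa_id, rr_no
FROM (
    SELECT d.objectid, d.pump_id, d.pump_serial_id, d.repdate, d.pumpmake, d.status,
           c.consumer_name, c.wenexa_id, c.rr_no,
           row_number() OVER (PARTITION BY d.consumer_id
                              ORDER BY d.createddate DESC) AS rn
    FROM db_consumer_pump_details d
    JOIN db_consumer c ON c.consumer_id = d.consumer_id
    WHERE d.status = $1
) sub
WHERE rn <= 2;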
it might be easier to just return a table (or query)
CREATE FUNCTION extended_sales(p_itemno int)
RETURNS TABLE(quantity int, total numeric) AS $$
BEGIN
RETURN QUERY SELECT quantity, quantity * price FROM sales
WHERE itemno = p_itemno;
END;
$$ LANGUAGE plpgsql;
(copied from postgresql docs)