I'm writing some pl/pgsql code in postgres - and have come up to this issue. Simplified, my code looks like this:
declare
resulter mytype%rowtype;
...
for resulter in
select id, [a lot of other fields]
from mytable [joining a lot of other tables]
where [some reasonable where clause]
loop
if [certain condition on the resulter] then
[add id to a set];
end if;
return next resulter;
end loop;
select into myvar sum([some field])
from anothertable
where id in ([my set from above])
The question is around the [add to set]. In another scenario in the past, I used to deal with it this way:
declare
myset varchar := '';
...
loop
if [condition] then
myset || ',' || id;
end if;
return next resulter;
end loop;
execute 'select sum([field]) from anothertable where id in (' || trim(leading ',' from myset) || ')' into myvar
However this doesn't seem too efficient to me when the number of id's to be added to this set is large. What other options do I have for keeping track of this set and then using it?
-- update --
Obviously, another option is to create a temporary table and insert ids into it when needed. Then in the last select statement have a sub-select on that temporary table - like so:
create temporary table x (id integer);
loop
if [condition] then
insert into x values (id);
end if;
return next resulter;
end loop;
select into myvar sum([field]) from anothertable where id in (select id from x);
Any other options? Also, what would be the most efficient, considering that there may be many thousands of relevant ID's.
In my opinion, temp tables are the most efficient way to handle this:
create temp table x(id integer not null primary key) on commit drop;
Related
I'm working with PostgreSQL 9.5.
I'm creating a trigger in PL/pgSQL, that adds a record to a table (synthese_poly) when an INSERT is performed on a second table (operation_poly), with other tables data.
The trigger works well, except for some variables, that are not filled (especially the ones I try to fill with an array_to_string() function).
This is the code:
-- Function: bdtravaux.totablesynth_fn()
-- DROP FUNCTION bdtravaux.totablesynth_fn();
CREATE OR REPLACE FUNCTION bdtravaux.totablesynth_fn()
RETURNS trigger AS
$BODY$
DECLARE
varoperateur varchar;
varchantvol boolean;
BEGIN
IF (TG_OP = 'INSERT') THEN
varsortie_id := NEW.sortie;
varopeid := NEW.operation_id;
--The following « SELECT » queries take data in third-party tables and fill variables, which will be used in the final insertion query.
SELECT array_to_string(array_agg(DISTINCT oper.operateurs),'; ')
INTO varoperateur
FROM bdtravaux.join_operateurs oper INNER JOIN bdtravaux.operation_poly o ON (oper.id_joinop=o.id_oper)
WHERE o.operation_id = varopeid;
SELECT CASE WHEN o.ope_chvol = 0 THEN 'f' ELSE 't' END as opechvol INTO varchantvol
FROM bdtravaux.operation_poly o WHERE o.operation_id = varopeid;
-- «INSERT» query
INSERT INTO bdtravaux.synthese_poly (soperateur, schantvol) SELECT varoperateur, varchantvol;
RAISE NOTICE 'varoperateur value : (%)', varoperateur;
RAISE NOTICE 'varchantvol value : (%)', varchantvol;
END IF;
RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
ALTER FUNCTION bdtravaux.totablesynth_fn()
OWNER TO postgres;
And this is the trigger :
-- Trigger: totablesynth on bdtravaux.operation_poly
-- DROP TRIGGER totablesynth ON bdtravaux.operation_poly;
CREATE TRIGGER totablesynth
AFTER INSERT
ON bdtravaux.operation_poly
FOR EACH ROW
WHEN ((new.chantfini = true))
EXECUTE PROCEDURE bdtravaux.totablesynth_fn();
The varchantvol variable is correctly filled, but varoperateur stays desperately empty (NULL value) (and so on for the corresponding field in the synthese_poly table).
Note:
The SELECT array_to_string(…) ... query itself (launched with pgAdmin, without INTO varoperateur and replacing varopeid with a value) works well, and returns a string.
I tried to change array_to_string() function and variables' data types (using ::varchar or ::text …), nothing works.
Do you see what can happen?
using array_agg
You can replace array_to_string(array_agg(DISTINCT oper.operateurs),'; ') with
string_agg(DISTINCT oper.operateurs,'; ')
And you can use order by to sort the text in the agregate
string_agg(DISTINCT oper.operateurs,'; ' ORDER BY oper.operateurs)
My educated guess: you have a trigger with BEFORE INSERT ON bdtravaux.operation_poly. And operation_id is its serial PK column.
In this case, the query with WHERE o.operation_id = varopeid
(where varopeid has been filled with NEW.operation_id) can never find any rows because the row is not in the table, yet.
array_agg() has no role in this.
Would work with a trigger AFTER INSERT ON bdtravaux.operation_poly. But if id_oper is from the same inserted row, you can just simplify to:
SELECT array_to_string(array_agg(DISTINCT oper.operateurs),'; ')
INTO varoperateur
FROM bdtravaux.join_operateurs oper
WHERE oper.id_joinop = NEW.id_oper;
And keep the BEFORE trigger.
The whole function might be simpler, can probably done with a single query.
I have few SELECT statements that are made by joining multiple tables.
Each select is returning single row but from 10 to 20 fields in that row. What is the easiest way to store that data for later use?
I would like to avoid creating 50 variables and writing select into statements, using cursors and loop for single row is not the smartest idea from what i read around.
Is there a good way to do this?
Stupid example so you can get general idea
SELECT t1.field1
, t1.field2
, t1.field3
, t1.field4
, t2.field5
, t2.field6
, t2.field7
, t3.field8
FROM table1 t1
JOIN table2 t2 ON something
JOIN table3 t3 ON something
Sorry for errors in my english and thanks in advance
Create a view from your select statement. Thereafter can reference a record by using a single variable of type <viewname>%ROWTYPE.
Another option would be to wrap the select in an implicit cursor loop:
DECLARE
strvar VARCHAR2(400); -- demo purpose only
BEGIN
-- ...
FOR i IN (
-- ... here goesyour select statement ...
) LOOP
strvar := i.field1 || i.field2; -- ... whatever
END LOOP;
-- ...
END;
-- ...
Still another option is the declaration of a record type and a record variable:
DECLARE
TYPE tRec IS RECORD (
field1 table1.field1%TYPE
, field2 table1.field2%TYPE
, field3 table1.field3%TYPE
, field4 table1.field4%TYPE
, field5 table2.field5%TYPE
, field6 table2.field6%TYPE
, field7 table2.field7%TYPE
, field8 table3.field8%TYPE
)
r tRec;
BEGIN
-- ...
SELECT --...
INTO r
FROM --...
;
-- ...
END;
-- ...
Every time you make a select you will have a cursor - there's no escape from that.
Views and explicit cursors are a way to reuse select statements. Which one is better option depends on the case.
rowtype-attribute is handy way to create records automatically based on tables/views/cursors. I think this is the closest to your requirement what PL/SQL can offer.
create or replace package so51 is
cursor cursor1 is -- an example single row query
select
dual.* -- all table columns included
,'Y' as str
,rownum as num
,sysdate as date_
from dual
;
function get_data return cursor1%rowtype;
end;
/
show errors
create or replace package body so51 is
-- function never changes when cursor1 changes
function get_data return cursor1%rowtype is
v_data cursor1%rowtype;
begin
open cursor1;
fetch cursor1 into v_data;
close cursor1;
return v_data;
end;
end;
/
show errors
declare
v_data constant so51.cursor1%rowtype := so51.get_data;
begin
-- use only the data you need
dbms_output.put_line('v_data.dummy = ' || v_data.dummy);
dbms_output.put_line('v_data.str = ' || v_data.str);
dbms_output.put_line('v_data.num = ' || v_data.num);
dbms_output.put_line('v_data.date_ = ' || v_data.date_);
end;
/
Example run
SQL> #so51.sql
Package created.
No errors.
Package body created.
No errors.
v_data.dummy = X
v_data.str = Y
v_data.num = 1
v_data.date_ = 2015-11-23 09:42:02
PL/SQL procedure successfully completed.
SQL>
I need to loop through type RECORD items by key/index, like I can do this using array structures in other programming languages.
For example:
DECLARE
data1 record;
data2 text;
...
BEGIN
...
FOR data1 IN
SELECT
*
FROM
sometable
LOOP
FOR data2 IN
SELECT
unnest( data1 ) -- THIS IS DOESN'T WORK!
LOOP
RETURN NEXT data1[data2]; -- SMTH LIKE THIS
END LOOP;
END LOOP;
As #Pavel explained, it is not simply possible to traverse a record, like you could traverse an array. But there are several ways around it - depending on your exact requirements. Ultimately, since you want to return all values in the same column, you need to cast them to the same type - text is the obvious common ground, because there is a text representation for every type.
Quick and dirty
Say, you have a table with an integer, a text and a date column.
CREATE TEMP TABLE tbl(a int, b text, c date);
INSERT INTO tbl VALUES
(1, '1text', '2012-10-01')
,(2, '2text', '2012-10-02')
,(3, ',3,ex,', '2012-10-03') -- text with commas
,(4, '",4,"ex,"', '2012-10-04') -- text with commas and double quotes
Then the solution can be a simple as:
SELECT unnest(string_to_array(trim(t::text, '()'), ','))
FROM tbl t;
Works for the first two rows, but fails for the special cases of row 3 and 4.
You can easily solve the problem with commas in the text representation:
SELECT unnest(('{' || trim(t::text, '()') || '}')::text[])
FROM tbl t
WHERE a < 4;
This would work fine - except for line 4 which has double quotes in the text representation. Those are escaped by doubling them up. But the array constructor would need them escaped by \. Not sure why this incompatibility is there ...
SELECT ('{' || trim(t::text, '()') || '}') FROM tbl t WHERE a = 4
Yields:
{4,""",4,""ex,""",2012-10-04}
But you would need:
SELECT '{4,"\",4,\"ex,\"",2012-10-04}'::text[]; -- works
Proper solution
If you knew the column names beforehand, a clean solution would be simple:
SELECT unnest(ARRAY[a::text,b::text,c::text])
FROM tbl
Since you operate on records of well know type you can just query the system catalog:
SELECT string_agg(a.attname || '::text', ',' ORDER BY a.attnum)
FROM pg_catalog.pg_attribute a
WHERE a.attrelid = 'tbl'::regclass
AND a.attnum > 0
AND a.attisdropped = FALSE
Put this in a function with dynamic SQL:
CREATE OR REPLACE FUNCTION unnest_table(_tbl text)
RETURNS SETOF text LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE '
SELECT unnest(ARRAY[' || (
SELECT string_agg(a.attname || '::text', ',' ORDER BY a.attnum)
FROM pg_catalog.pg_attribute a
WHERE a.attrelid = _tbl::regclass
AND a.attnum > 0
AND a.attisdropped = false
) || '])
FROM ' || _tbl::regclass;
END
$func$;
Call:
SELECT unnest_table('tbl') AS val
Returns:
val
-----
1
1text
2012-10-01
2
2text
2012-10-02
3
,3,ex,
2012-10-03
4
",4,"ex,"
2012-10-04
This works without installing additional modules. Another option is to install the hstore extension and use it like #Craig demonstrates.
PL/pgSQL isn't really designed for what you want to do. It doesn't consider a record to be iterable, it's a tuple of possibly different and incompatible data types.
PL/pgSQL has EXECUTE for dynamic SQL, but EXECUTE queries cannot refer to PL/pgSQL variables like NEW or other records directly.
What you can do is convert the record to a hstore key/value structure, then iterate over the hstore. Use each(hstore(the_record)), which produces a rowset of key,value tuples. All values are cast to their text representations.
This toy function demonstrates iteration over a record by creating an anonymous ROW(..) - which will have column names f1, f2, f3 - then converting that to hstore, iterating over its column/value pairs, and returning each pair.
CREATE EXTENSION hstore;
CREATE OR REPLACE FUNCTION hs_demo()
RETURNS TABLE ("key" text, "value" text)
LANGUAGE plpgsql AS
$$
DECLARE
data1 record;
hs_row record;
BEGIN
data1 = ROW(1, 2, 'test');
FOR hs_row IN SELECT kv."key", kv."value" FROM each(hstore(data1)) kv
LOOP
"key" = hs_row."key";
"value" = hs_row."value";
RETURN NEXT;
END LOOP;
END;
$$;
In reality you would never write it this way, since the whole loop can be replaced with a simple RETURN QUERY statement and it does the same thing each(hstore) does anyway - so this is only to show how each(hstore(record)) works, and the above function should never actually be used.
This feature is not supported in plpgsql - Record IS NOT hash array like other scripting languages - it is similar to C or ADA, where this functionality is impossible. You can use other PL language like PLPerl or PLPython or some tricks - you can iterate with HSTORE datatype (extension) or via dynamic SQL
see How to set value of composite variable field using dynamic SQL
But request for this functionality usually means, so you do some wrong. When you use PL/pgSQL you have think different than you use Javascript or Python
FOR data2 IN
SELECT d
from unnest( data1 ) s(d)
LOOP
RETURN NEXT data2;
END LOOP;
If you order your results prior to looping, will you accomplish what you want.
for rc in select * from t1 order by t1.key asc loop
return next rc;
end loop;
will do exactly what you need. It is also the fastest way to perform that kind of task.
I wasn't able to find a proper way to loop over record, so what I did is converted record to json first and looped over json
declare
_src_schema varchar := 'db_utility';
_targetjson json;
_key text;
_value text;
BEGIN
select row_to_json(c.*) from information_schema.columns c where c.table_name = prm_table and c.column_name = prm_column
and c.table_schema = _src_schema into _targetjson;
raise notice '_targetjson %', _targetjson;
FOR _key, _value IN
SELECT * FROM jsonb_each_text(_targetjson)
LOOP
-- do some math operation on its corresponding value
RAISE NOTICE '%: %', _key, _value;
END LOOP;
return true;
end;
Into:
Currently i have scraped all the data into one PostgreSQL 'Bigtable' table(there are about 1.2M rows). Now i need to split the design into separate tables which all have dependency on the Bigtable. Some of the tables might have subtables. The model looks pretty much like snowflake.
Problem:
What would be best option to inserting data into tables? I thought to make the insertion with functions written in 'SQL' or PLgSQL. But the problem is still with auto-generated ID-s.
Also if you know what tools might make this problem solving easier then post!
//Edit i have added example, this not the real case just for illustration
1.2 M rows is not too much. The best tool is sql script executed from console "psql". If you have a some newer version of Pg, then you can use inline functions (DO statement) when it is necessary. But probably the most useful command is INSERT INTO SELECT statement.
-- file conversion.sql
DROP TABLE IF EXISTS f1 CASCADE;
CREATE TABLE f1(a int, b int);
INSERT INTO f1
SELECT x1, y1
FROM data
WHERE x1 = 10;
...
-- end file
psql mydb -f conversion.sql
If I understand your question, you can use a psql function like this:
CREATE OR REPLACE FUNCTION migration() RETURNS integer AS
$BODY$
DECLARE
currentProductId INTEGER;
currentUserId INTEGER;
currentReg RECORD;
BEGIN
FOR currentReg IN
SELECT * FROM bigtable
LOOP
-- Product
SELECT productid INTO currentProductId
FROM product
WHERE name = currentReg.product_name;
IF currentProductId IS NULL THEN
EXECUTE 'INSERT INTO product (name) VALUES (''' || currentReg.product_name || ''') RETURNING productid'
INTO currentProductId;
END IF;
-- User
SELECT userid INTO currentUserId
FROM user
WHERE first_name = currentReg.first_name and last_name = currentReg.last_name;
IF currentUserId IS NULL THEN
EXECUTE 'INSERT INTO user (first_name, last_name) VALUES (''' || currentReg.first_name || ''', ''' || currentReg.last_name || ''') RETURNING userid'
INTO currentUserId;
-- Insert into userAdded too with: currentUserId and currentProductId
[...]
END IF;
-- Rest of tables
[...]
END LOOP;
RETURN 1;
END;
$BODY$
LANGUAGE plpgsql;
select * from migration();
In this case it's assumed that each table runs its own primary key sequence and I have reduced the number of fields in the tables to simplify.
I hope you have been helpful.
No need to use a function for this (unless I misunderstood your problem)
If your id columns are all defined as serial column (i.e. they automatically generate the values), then this can be done with simple INSERT statements. This assumes that the target tables are all empty.
INSERT INTO users (firstname, lastname)
SELECT DISTINCT firstname, lastname
FROM bigtable;
INSERT INTO category (name)
SELECT DISTINCT category_name
FROM bigtable;
-- the following assumes a column categoryid in the product table
-- which is not visible from your screenshot
INSERT INTO product (product_name, description, categoryid)
SELECT DISTINCT b.product_name, b.description, c.categoryid
FROM bigtable b
JOIN category c ON c.category_name = b.category_name;
INSERT INTO product_added (product_productid, user_userid)
SELECT p.productid, u.userid
FROM bigtable b
JOIN product p ON p.product_name = b.product_name
JOIN users u ON u.firstname = b.firstname AND u.lastname = b.lastname
I have this upsert function that allows me to modify the fill_rate column of a row.
CREATE FUNCTION upsert_fillrate_alarming(integer, boolean) RETURNS VOID AS '
DECLARE
num ALIAS FOR $1;
dat ALIAS FOR $2;
BEGIN
LOOP
-- First try to update.
UPDATE alarming SET fill_rate = dat WHERE equipid = num;
IF FOUND THEN
RETURN;
END IF;
-- Since its not there we try to insert the key
-- Notice if we had a concurent key insertion we would error
BEGIN
INSERT INTO alarming (equipid, fill_rate) VALUES (num, dat);
RETURN;
EXCEPTION WHEN unique_violation THEN
-- Loop and try the update again
END;
END LOOP;
END;
' LANGUAGE 'plpgsql';
Is it possible to modify this function to take a column argument as well? Extra bonus points if there is a way to modify the function to take a column and a table.
As an alternative approach, you can do an upsert without a function by using an insert + update with where clauses to make them only succeed in the right case. E.g.
update mytable set col1='value1' where (col2 = 'myId');
insert into mytable select 'value1', 'myId' where not exists (select 1 from mytable where col2='myId');
Which would avoid having lots of custom postgres specific functions.
You want to read about dynamic commands in plsql.
Just build your query and invoke EXECUTE.
Maybe a simpler approach, just less line ;)
CREATE OR REPLACE FUNCTION upsert_tableName(arg1 type, arg2 type) RETURNS VOID AS $$
DECLARE
BEGIN
UPDATE tableName SET col1 = value WHERE colX = arg1 and colY = arg2;
IF NOT FOUND THEN
INSERT INTO tableName values (value, arg1, arg2);
END IF;
END;
$$ LANGUAGE 'plpgsql';