Create a temp table (if not exists) for use in a custom procedure - PostgreSQL

I'm trying to get the hang of using temp tables:
CREATE OR REPLACE FUNCTION test1(user_id BIGINT) RETURNS BIGINT AS
$BODY$
BEGIN
    create temp table temp_table1
    ON COMMIT DELETE ROWS
    as SELECT table1.column1, table1.column2
       FROM table1
       INNER JOIN -- ............
    if exists (select * from temp_table1) then
        -- work with the result
        return 777;
    else
        return 0;
    end if;
END;
$BODY$
LANGUAGE plpgsql;
I want the rows of temp_table1 to be deleted immediately, or as soon as possible, which is why I added ON COMMIT DELETE ROWS. Obviously, I got the error:
ERROR: relation "temp_table1" already exists
I tried to add IF NOT EXISTS, but I couldn't find a working example that does what I'm looking for.
Your suggestions?

DROP the table each time before creating the TEMP table, as below:
BEGIN
DROP TABLE IF EXISTS temp_table1;
create temp table temp_table1
-- the rest of your code goes here
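For completeness, a minimal sketch of the whole function using this pattern (the join from the question is elided, so the WHERE filter on user_id is an assumed placeholder):
CREATE OR REPLACE FUNCTION test1(user_id BIGINT) RETURNS BIGINT AS
$BODY$
BEGIN
    -- Remove any leftover copy from an earlier call in this session,
    -- then rebuild the table from the query.
    DROP TABLE IF EXISTS temp_table1;
    CREATE TEMP TABLE temp_table1 AS
        SELECT table1.column1, table1.column2
        FROM table1
        WHERE table1.column1 = user_id;  -- assumed filter; the original join is elided

    IF EXISTS (SELECT 1 FROM temp_table1) THEN
        -- work with the result
        RETURN 777;
    END IF;
    RETURN 0;
END;
$BODY$
LANGUAGE plpgsql;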

The problem with temp tables is that dropping and recreating them bloats pg_attribute heavily, and one sunny morning you will find db performance dead and pg_attribute at 200+ GB while your db itself is more like 10 GB.
We are very heavy on temp tables, with >500 rps and async I/O via nodejs, and thus experienced very heavy bloating of pg_attribute because of that. All you are left with is very aggressive vacuuming, which halts performance.
All the answers given here do not solve this, because they all bloat pg_attribute heavily.
So the elegant solution is this:
create temp table if not exists my_temp_table (<column definitions>) on commit delete rows;
So you go on playing with temp tables and save your pg_attribute.
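Applied to the original function, a sketch could look like this (column types are assumptions; note the separate INSERT, because with IF NOT EXISTS the AS SELECT form would silently skip the query whenever the table already exists):
CREATE OR REPLACE FUNCTION test1(user_id BIGINT) RETURNS BIGINT AS
$BODY$
BEGIN
    -- Created once per session; ON COMMIT DELETE ROWS empties it at the
    -- end of every transaction, so pg_attribute is not bloated.
    CREATE TEMP TABLE IF NOT EXISTS temp_table1 (
        column1 BIGINT,  -- assumed types, match your real columns
        column2 TEXT
    ) ON COMMIT DELETE ROWS;

    INSERT INTO temp_table1 (column1, column2)
    SELECT table1.column1, table1.column2
    FROM table1
    WHERE table1.column1 = user_id;  -- assumed filter; the original join is elided

    IF EXISTS (SELECT 1 FROM temp_table1) THEN
        RETURN 777;
    END IF;
    RETURN 0;
END;
$BODY$
LANGUAGE plpgsql;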

You want to DROP the temp table after commit (not DELETE ROWS), so:
begin
create temp table temp_table1
on commit drop
...
Documentation
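A short usage sketch (42 is just an example argument): the table lives until the transaction that called the function commits, then vanishes. Note that calling the function twice inside one transaction would hit "relation already exists" again.
BEGIN;             -- open a transaction explicitly
SELECT test1(42);  -- the function creates temp_table1 ... ON COMMIT DROP
COMMIT;            -- temp_table1 is dropped automatically here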


Should I use Plpgsql to loop through table instead of using SQL?

I have a job which runs every night to load changes into a temporary table and apply those changes to the main table.
CREATE TEMP TABLE IF NOT EXISTS tmp AS SELECT * FROM mytable LIMIT 0;
COPY tmp FROM PROGRAM '';
-- 11 SQL queries to update 'mytable' based on data from 'tmp'
I have a large number of queries to delete duplicates from tmp, update values in tmp, update values in the main table and insert new rows into the main table. Is it possible to loop over both tables using plpgsql instead?
UPDATE mytable m
SET "Field" = t."Field" +1
FROM tmp t
WHERE (t."ID" = m."ID");
In this example, it is a simple change of a column value. Instead, I want to do more complex operations on both the main table and the temp table.
EDIT: here is some PSEUDO code of what I imagine.
LOOP tmp t, mytable m
BEGIN
-- operation in plpgsql including UPDATE, INSERT, DELETE
END
WHERE t.ID = m.ID;
You can use plpgsql FOR to loop over query results.
DECLARE
    myrow RECORD;
BEGIN
    FOR myrow IN SELECT * FROM table1 JOIN table2 USING (id)
    LOOP
        -- ... do something with the row ...
    END LOOP;
END
If you want to update a table while looping over it, you can create a FOR UPDATE cursor, but that won't work if the query is a join, because then you're not opening an update cursor on a table.
Note that writing to/updating temp tables is much faster than writing to normal tables, because temp tables have no WAL or crash-recovery overhead, and they're owned by one single connection, so you don't have to worry about locks.
If you put a query inside the loop, it will be executed many times though, which could get pretty slow. It's usually faster to use bulk queries, even if they're complicated.
If you want to UPDATE many rows in the temp table with values that depend on other tables and joins, it could be faster to run several updates on the temp table with different join and WHERE conditions.
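To make the contrast concrete, here is a hedged sketch using the tmp/mytable columns from the question, first row by row, then as the equivalent single bulk statement:
DO $$
DECLARE
    myrow RECORD;
BEGIN
    -- Row-by-row variant: one UPDATE per row in tmp; slow for large tables.
    FOR myrow IN SELECT t."ID", t."Field" FROM tmp t
    LOOP
        UPDATE mytable SET "Field" = myrow."Field" + 1 WHERE "ID" = myrow."ID";
    END LOOP;
END $$;

-- Bulk variant: one statement, usually much faster.
UPDATE mytable m
SET "Field" = t."Field" + 1
FROM tmp t
WHERE t."ID" = m."ID";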

Postgres getting duplicate key exception (DELETE AND INSERT) from inside a plpgsql function

I am trying to update a table with a huge amount of data. On checking, I found that an upsert (update and insert) is slow compared to using a temp table to delete and then insert from the temp table. But I am facing an issue with duplicate data in that scenario.
As far as I understand, the delete and insert happen in the same transaction, so I cannot understand why I am facing a duplicate-data issue when I have already deleted the data with the delete.
This is a multithreaded scenario, but I assume each transaction will have its own set of data.
Any help is appreciated here.
Sample code
CREATE OR REPLACE FUNCTION insertUsingTempTable(a_id int, s_obj int[], p bigint[])
RETURNS BOOLEAN AS $BODY$
DECLARE
    passed BOOLEAN;
BEGIN
    CREATE TEMP TABLE IF NOT EXISTS temp_lists AS SELECT * FROM acm_lists WHERE 0=1;
    TRUNCATE TABLE temp_lists;

    INSERT INTO temp_lists (a_id, s_obj_id, eff_p)
    SELECT a_id, unnest($2::int[]), unnest($3::bit(64)[]);

    IF NOT EXISTS (SELECT t.a_id, t.s_obj_id, count(1)
                   FROM temp_lists t
                   GROUP BY t.a_id, t.s_obj_id
                   HAVING count(1) > 1) THEN
        DELETE FROM acm_lists t
        WHERE EXISTS (SELECT 1 FROM temp_lists t1
                      WHERE t1.a_id = t.a_id AND t1.s_obj_id = t.s_obj_id);

        INSERT INTO acm_lists (a_id, s_obj_id, eff_p)
        SELECT t.a_id, t.s_obj_id, t.eff_p FROM temp_lists t;

        RETURN true;
    END IF;

    RETURN false;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;
Uniqueness is on a_id, s_obj_id in the above code.
I would like to know why a duplicate-key exception occurs at times if I am already deleting the data before inserting. It happens only when multiple transactions run on the table at the same time.
INSERT ... ON CONFLICT DO UPDATE resolves the issue, but it seems to come with a considerable performance hit, so I don't plan to use the ON CONFLICT DO UPDATE approach.
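The reason is the default READ COMMITTED isolation: two concurrent transactions can both run the DELETE before either has committed its INSERT; neither sees the other's uncommitted rows, so both go on to insert the same (a_id, s_obj_id) pairs, and the slower commit hits the unique index. One hedged workaround sketch (not from this thread) is to serialize writers per a_id with a transaction-scoped advisory lock as the first statement of the function body:
-- pg_advisory_xact_lock() blocks until the lock on this key is free and
-- releases it automatically at commit/rollback, so concurrent calls with
-- the same a_id run one after another instead of racing past each other's DELETE.
PERFORM pg_advisory_xact_lock(a_id);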

Delete Duplicate rows in several Postgresql tables

I have a postgres database with several tables like table1, table2, table3. More than 1000 tables.
I imported all of these tables from a script. And apparently the script had issues to import.
Many tables have duplicate rows (all values exactly same).
I am able to go into each table and delete the duplicate rows using DBeaver, but because there are over 1000 tables, it is very time consuming.
Example of tables:
table1
name | gender | age
-----+--------+-----
a    | m      | 20
a    | m      | 20
b    | f      | 21
b    | f      | 21

table2
fruit | hobby
------+---------
x     | running
x     | running
y     | stamp
y     | stamp
How can I do the following:
Identify tables in postgres with duplicate rows.
Delete all duplicate rows, leaving 1 record.
I need to do this on all 1000+ tables at once.
Since you want to automate deduplication across all tables, you need a plpgsql function in which you can write dynamic queries to achieve it.
Try this function:
create or replace function func_dedup(_schemaname varchar) returns void as
$$
declare
    _rec record;
begin
    for _rec in select table_name from information_schema.tables where table_schema = _schemaname
    loop
        -- %I quotes the identifier safely instead of concatenating it into the string
        execute format('create temp table tab_temp as select distinct * from %I', _rec.table_name);
        execute format('truncate %I', _rec.table_name);
        execute format('insert into %I select * from tab_temp', _rec.table_name);
        execute 'drop table tab_temp';
    end loop;
end;
$$
language plpgsql;
Now call your function like below:
select * from func_dedup('your_schema');
Steps:
Get the list of all tables in your schema using the query below, and loop over each table:
select table_name from information_schema.tables where table_schema = _schemaname
Insert all distinct records into a TEMP TABLE.
Truncate your main table.
Insert all your data from the TEMP TABLE back into the main table.
Drop the TEMP TABLE (dropping the temp table here is important, because it has to be recreated on the next loop cycle).
Note - if your tables are very large in size, then consider using a regular table instead of a TEMP TABLE.
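For very large tables, a hedged alternative is to delete the duplicates in place instead of copying every distinct row out and back. This sketch assumes exact full-row duplicates and no NULL columns (= never matches NULLs); table1 stands in for any table from the loop:
-- ctid is a system column identifying a row's physical location;
-- this keeps the physically first copy of each duplicate group.
DELETE FROM table1 a
USING table1 b
WHERE a.ctid > b.ctid  -- a is a later physical copy than b
  AND a = b;           -- whole-row comparison of all user columns
Inside the loop of func_dedup, the same statement can be issued per table via execute format() with %I placeholders.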

What is the scope of a PostgreSQL Temp Table?

I have googled quite a bit, and I have fairly decent reading comprehension, but I don't understand if this script will work in multiple threads on my postgres/postgis box. Here is the code:
DO
$do$
DECLARE
    x RECORD;
    b int;
BEGIN
    create temp table geoms (id serial, geom geometry) on commit drop;
    for x in select id, geom from asdf loop
        truncate table geoms;
        insert into geoms (geom)
        select someGeomfield from sometable where st_intersects(somegeomfield, x.geom);
        -- do something with the records in geoms here... and insert that data somewhere else
    end loop;
end;
$do$;
So, if I run this in more than one client, called from Java, will the scope of the geoms temp table cause problems? If so, any ideas for a solution to this in PostGres would be helpful.
Thanks
One subtle trap you will run into, though (which is why I am not quite ready to declare it "safe"), is that the scope is per session, but people often forget to drop the tables (they are only dropped on disconnect).
I think you are much better off, if you don't need the temp table after your function, dropping it explicitly when you are done with it. This will prevent issues that arise from trying to run the function twice in the same transaction. (Here you are dropping on commit.)
Temp tables in PostgreSQL (or Postgres; "PostGres" doesn't exist) are local to the session in which they are created, so no other session (client) can see temp tables from another session. Both the schema and the data are invisible to others. Your code is safe.
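A quick way to see the session scope for yourself (a sketch; run the two halves in two separate psql connections to the same database):
-- Session A:
CREATE TEMP TABLE geoms (id serial, geom geometry);
SELECT count(*) FROM geoms;   -- works, returns 0

-- Session B (a different connection):
SELECT count(*) FROM geoms;
-- ERROR:  relation "geoms" does not exist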

How to find table creation time?

How can I find the table creation time in PostgreSQL?
Example: if I create a file, I can find the file's creation time; in the same way, I want to know the table creation time.
I had a look through the pg_* tables, and I couldn't find any creation times in there. It's possible to locate the table files, but then on Linux you can't get file creation time. So I think the answer is that you can only find this information on Windows, using the following steps:
get the database id with select datname, datdba from pg_database;
get the table filenode id with select relname, relfilenode from pg_class;
find the table file and look up its creation time; I think the location should be something like <PostgreSQL folder>/main/base/<database id>/<table filenode id> (not sure what it is on Windows).
You can't - the information isn't recorded anywhere. Looking at the table files won't necessarily give you the right information - there are table operations that will create a new file for you, in which case the date would reset.
I don't think it's possible from within PostgreSQL, but you'll probably find it in the underlying table file's creation time.
Suggested here:
SELECT oid FROM pg_database WHERE datname = 'mydb';
Then (assuming the oid is 12345):
ls -l $PGDATA/base/12345/PG_VERSION
This workaround assumes that PG_VERSION is the least likely to be modified after the creation.
NB: if PGDATA is not defined, check Where does PostgreSQL store the database?
Check the data dir location:
SHOW data_directory;
Check the Postgres relation file path:
SELECT pg_relation_filepath('table_name');
This gives you the file path of your relation.
Then check the creation time of the file <data-dir>/<relation-file-path>.
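The two steps can also be combined in plain SQL with pg_stat_file(), which accepts a path relative to the data directory (superuser privileges are required, and the creation column is only populated on Windows, matching the discussion above):
-- modification/change come from the filesystem; creation is NULL on Linux.
SELECT modification, change, creation
FROM pg_stat_file(pg_relation_filepath('table_name'));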
I tried a different approach to get the table creation date, which could help when keeping track of dynamically created tables. Suppose you have an inventory table in your database where you save the creation date of the tables.
CREATE TABLE inventory (id SERIAL, tablename CHARACTER VARYING (128), created_at DATE);
Then, when a table you want to keep track of is created, it's added to your inventory.
CREATE TABLE temp_table_1 (id SERIAL); -- A dynamic table is created
INSERT INTO inventory VALUES (1, 'temp_table_1', '2020-10-07 10:00:00'); -- We add it into the inventory
Then you could take advantage of pg_tables and run something like this to get the creation dates of existing tables:
SELECT pg_tables.tablename, inventory.created_at
FROM pg_tables
INNER JOIN inventory
ON pg_tables.tablename = inventory.tablename
/*
tablename | created_at
--------------+------------
temp_table_1 | 2020-10-07
*/
For my use case it is OK because I work with a set of dynamic tables that I need to keep track of.
P.S.: replace inventory with your own table name.
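One small refinement (my assumption, not part of the answer above): give created_at a default, so the INSERT doesn't need a hard-coded timestamp:
CREATE TABLE inventory (
    id SERIAL,
    tablename CHARACTER VARYING (128),
    created_at TIMESTAMP DEFAULT now()  -- TIMESTAMP keeps the time of day; DATE would truncate it
);

-- Only the table name needs to be supplied now:
INSERT INTO inventory (tablename) VALUES ('temp_table_1');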
I tried a different way to obtain this.
Starting from this discussion, my solution was:
DROP TABLE IF EXISTS t_create_history CASCADE;
CREATE TABLE t_create_history (
gid serial primary key,
object_type varchar(20),
schema_name varchar(50),
object_identity varchar(200),
creation_date timestamp without time zone
);
--delete event trigger before dropping function
DROP EVENT TRIGGER IF EXISTS t_create_history_trigger;
--create history function
DROP FUNCTION IF EXISTS public.t_create_history_func();
CREATE OR REPLACE FUNCTION t_create_history_func()
RETURNS event_trigger
LANGUAGE plpgsql
AS $$
DECLARE
obj record;
BEGIN
FOR obj IN SELECT * FROM pg_event_trigger_ddl_commands() WHERE command_tag IN ('SELECT INTO','CREATE TABLE','CREATE TABLE AS')
LOOP
INSERT INTO public.t_create_history (object_type, schema_name, object_identity, creation_date) SELECT obj.object_type, obj.schema_name, obj.object_identity, now();
END LOOP;
END;
$$;
--ALTER EVENT TRIGGER t_create_history_trigger DISABLE;
--DROP EVENT TRIGGER t_create_history_trigger;
CREATE EVENT TRIGGER t_create_history_trigger ON ddl_command_end
WHEN TAG IN ('SELECT INTO','CREATE TABLE','CREATE TABLE AS')
EXECUTE PROCEDURE t_create_history_func();
In this way you obtain a table that records all table creations.
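Once the event trigger is installed, the history can be queried like any other table, for example:
SELECT schema_name, object_identity, creation_date
FROM t_create_history
ORDER BY creation_date DESC;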
--query
select pslo.stasubtype, pc.relname, pslo.statime
from pg_stat_last_operation pslo
join pg_class pc on (pc.relfilenode = pslo.objid)
    and pslo.staactionname = 'CREATE'
order by pslo.statime desc;
will help to accomplish the desired results
(tried it on Greenplum)
You can get this from pg_stat_last_operation. Here is how to do it:
select * from pg_stat_last_operation where objid = 'table_name'::regclass order by statime;
This table stores the following operations:
select distinct staactionname from pg_stat_last_operation;
staactionname
---------------
ALTER
ANALYZE
CREATE
PARTITION
PRIVILEGE
VACUUM
(6 rows)