What is the scope of a PostgreSQL Temp Table? - plpgsql

I have googled quite a bit, and I have fairly decent reading comprehension, but I can't tell whether this script will work when run from multiple threads against my postgres/postgis box. Here is the code:
DO
$do$
DECLARE
    x RECORD;
    b int;
BEGIN
    CREATE TEMP TABLE geoms (id serial, geom geometry) ON COMMIT DROP;
    FOR x IN SELECT id, geom FROM asdf LOOP
        TRUNCATE TABLE geoms;
        INSERT INTO geoms (geom)
        SELECT somegeomfield FROM sometable WHERE st_intersects(somegeomfield, x.geom);
        -- do something with the records in geoms here... and insert that data somewhere else
    END LOOP;
END;
$do$
So, if I run this from more than one client (called from Java), will the scope of the geoms temp table cause problems? If so, any ideas for a solution in Postgres would be helpful.
Thanks

One subtle trap you will run into, though (which is why I am not quite ready to declare it "safe"), is that the scope is per session, but people often forget to drop the tables, so they only go away on disconnect.
If you don't need the temp table after your function finishes, you are much better off dropping it explicitly when you are done with it. That also prevents the problems that arise from trying to run the function twice in the same transaction. (In your case the ON COMMIT DROP clause already takes care of the cleanup.)

Temp tables in PostgreSQL (or Postgres; "PostGres" doesn't exist) are local to the session in which they are created. No other session (client) can see another session's temp tables; both the schema and the data are invisible to others. Your code is safe.
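A quick way to see this for yourself (a minimal sketch; the table name is made up for the demonstration) is to open two psql sessions side by side:

-- session 1
CREATE TEMP TABLE scope_check (id int);
INSERT INTO scope_check VALUES (1);
SELECT * FROM scope_check;   -- returns the row

-- session 2 (a separate connection)
SELECT * FROM scope_check;   -- ERROR:  relation "scope_check" does not exist

Each session gets its own pg_temp schema, so even identically named temp tables in two sessions never collide.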

Related

PostgreSQL - Left pad value on COPY

I'm bringing in data from Excel into a PostgreSQL database. There's a lot wrong with this data, but one thing that seems to connect several tables is a customer_id.
However, in the customer table I have a unique char(8) that always has a leading zero. Yes, if it were up to me I'd make sure this data weren't so screwy upstream, but I'm dealing with sales folks here, manufacturing there, financing, etc.
And the customer id ALMOST matches across these various sources! It is just that in some data the customer_id doesn't have the leading zero, so customers.id = '01234567' corresponds to orders.customer_id = '1234567'.
I'm using the COPY command in Postgres, which is a new thing to me. Unfortunately, I cannot define a foreign key relationship on customer.id because of this small discrepancy.
How would I do a COPY and tell the column value to add a leading zero? Is this possible? I'm hoping I can do it right in the COPY statement? Thanks for any insight into how to do this!
EDIT:
A comment led me to this documentation. I'll update with an answer after I figure this out. It looks like a BEFORE INSERT trigger is what I'll need.
CREATE TRIGGER trigger_name
{BEFORE | AFTER} { event }
ON table_name
[FOR [EACH] { ROW | STATEMENT }]
EXECUTE PROCEDURE trigger_function
I'm the original poster and this is the answer to my question. I was bringing in data from XLS to PG and the leading zeros on customer_id(s) were dropped when exporting XLS to CSV for a COPY into PG.
Thanks be to an answer here that really pointed me down the right path: Postgresql insert trigger to set value
-- create table
CREATE TABLE T (customer_id char(8));

-- draft function to be used by the trigger. NOTE the doubled single quotes.
CREATE FUNCTION lpad_8_0 ()
RETURNS trigger AS '
BEGIN
    NEW.customer_id := (SELECT LPAD(NEW.customer_id, 8, ''0''));
    RETURN NEW;
END' LANGUAGE 'plpgsql';

-- set up a BEFORE INSERT trigger to execute the lpad_8_0 function
CREATE TRIGGER my_on_before_insert_trigger
    BEFORE INSERT ON T
    FOR EACH ROW
    EXECUTE PROCEDURE lpad_8_0();
-- some sample inserts
INSERT INTO T
VALUES ('1234'), ('7');
Here's a working fiddle: http://sqlfiddle.com/#!17/a176e/1/0
NOTE: If the incoming value is longer than 8 characters, the COPY will still fail, since char(8) can't hold it.
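If you would rather not attach a trigger to the target table, another option is to COPY into a staging table with a plain text column and pad the value while moving the rows over. A minimal sketch (staging_customers, the target column list, and the file path are hypothetical):

-- staging table with an untyped text column, so unpadded ids load cleanly
CREATE TEMP TABLE staging_customers (customer_id text);

-- load the raw CSV as-is
COPY staging_customers (customer_id) FROM '/path/to/customers.csv' WITH (FORMAT csv);

-- pad to 8 characters while inserting into the real table
INSERT INTO customers (id)
SELECT LPAD(customer_id, 8, '0')
FROM staging_customers;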

Is it safe to use temporary tables when an application may try to create them for independent, but simultaneous processes?

I am hoping that I can articulate this effectively, so here it goes:
I am creating a model which will be run on a platform by users, possibly simultaneously, but each model run is marked by a unique integer identifier. This model will execute a series of PostgreSQL queries and eventually write a result elsewhere.
Now because of the required parallelization of model runs, I have to make sure that the processes will not collide, despite running in the same database. I am at a point now where I have to store a list of records, sorted by a score variable and then operate on them. This is the beginning of the query:
DO
$$
DECLARE
    row RECORD;
BEGIN
    DROP TABLE IF EXISTS ranked_clusters;

    CREATE TEMP TABLE ranked_clusters AS (
        SELECT
            pl.cluster_id AS c_id,
            SUM(pl.total_area) AS cluster_score
        FROM
            emob.parking_lots AS pl
        WHERE
            pl.cluster_id IS NOT NULL
            AND run_id = 2005149
        GROUP BY
            pl.cluster_id
        ORDER BY
            cluster_score DESC
    );

    FOR row IN SELECT c_id FROM ranked_clusters LOOP
        RAISE NOTICE 'Cluster %', row.c_id;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
So I create a temporary table called ranked_clusters and then iterate through it, at the moment just logging the identifiers of each record.
I have been careful to only build this list from records which have a run_id value equal to a certain number, so data from the same source, but with a different number will be ignored.
What I am worried about however is that a simultaneous process will also create its own ranked_clusters temporary table, which will collide with the first one, invalidating the results.
So my question is essentially this: Are temporary tables only visible to the session which creates them (or to the cursor object from say, Python)? And is it therefore safe to use a temporary table in this way?
The main reason I ask is because I see that these so-called "temporary" tables seem to persist after I execute the query in PgAdmin III, and the query fails on the next execution because the table already exists. This troubles me because it seems as though the tables are actually globally accessible during their lifetime and would therefore introduce the possibility of a collision when a simultaneous run occurs.
Thanks @a_horse_with_no_name for the explanation, but I am not yet convinced that it is safe, because I have been able to execute the following code:
import psycopg2 as pg2
conn = pg2.connect(dbname=CONFIG["GEODB_NAME"],
                   user=CONFIG["GEODB_USER"],
                   password=CONFIG["GEODB_PASS"],
                   host=CONFIG["GEODB_HOST"],
                   port=CONFIG["GEODB_PORT"])
conn.autocommit = True
cur = conn.cursor()
conn2 = pg2.connect(dbname=CONFIG["GEODB_NAME"],
                    user=CONFIG["GEODB_USER"],
                    password=CONFIG["GEODB_PASS"],
                    host=CONFIG["GEODB_HOST"],
                    port=CONFIG["GEODB_PORT"])
conn2.autocommit = True
cur2 = conn.cursor()  # (sic) the typo described below: this should have been conn2.cursor()
cur.execute("CREATE TEMPORARY TABLE temptable (tempcol INTEGER); INSERT INTO temptable VALUES (0);")
cur2.execute("SELECT tempcol FROM temptable;")
print(cur2.fetchall())
And I receive the value in temptable despite it being created as a temporary table in a completely different connection as the one which queries it afterwards. Am I missing something here? Because it seems like the temporary table is indeed accessible between connections.
The above had a typo: both cursors were actually being spawned from conn, rather than one from conn and the other from conn2. Individual connections in psycopg2 cannot access each other's temporary tables, but cursors spawned from the same connection can.
Temporary tables are only visible to the session (= connection) that created them. Even if two sessions create temp tables with the same name, they won't interfere with each other.
Temporary tables are removed automatically when the session is disconnected.
If you want to automatically remove them when your transaction ends, use the ON COMMIT DROP option when creating the table.
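In the DO block above that would look roughly like this (just the CREATE statement; the ORDER BY is left out, see the note on sort order below):

CREATE TEMP TABLE ranked_clusters
ON COMMIT DROP
AS (
    SELECT
        pl.cluster_id AS c_id,
        SUM(pl.total_area) AS cluster_score
    FROM emob.parking_lots AS pl
    WHERE pl.cluster_id IS NOT NULL
      AND run_id = 2005149
    GROUP BY pl.cluster_id
);

With that, the DROP TABLE IF EXISTS at the top of the block is no longer needed, because each run's table disappears as soon as its transaction commits.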
So the answer is: yes, this is safe.
Unrelated, but: you can't store rows "in a sorted way". Rows in a table have no implicit sort order. The only way you can get a guaranteed sort order is to use an ORDER BY when selecting the rows. The order by that is part of your CREATE TABLE AS statement is pretty much useless.
If you have to rely on the sort order of the rows, the only safe way to do that is in the SELECT statement:
FOR row IN SELECT c_id FROM ranked_clusters ORDER BY cluster_score
LOOP
RAISE NOTICE 'Cluster %', row.c_id;
END LOOP;

Postgresql insert stopped working, duplicate key value violations

About 8 months ago I used a suggestion to set up a holding table, then push to the formal table and prevent duplicate entries, per this post: Best way to prevent duplicate data on copy csv postgresql
It's been working very nicely, but today I noticed some errors and gaps in the data.
Here's my insert statement:
And here's how the index is set up:
And here's an example of the error I'm getting, although on the next cron insert, it went through.
Here's it going through fine:
I haven't noted any large changes in the data incoming. Here's what the data looks like that's coming in now:
In summary, I've noticed recent oddities with the insert statements, and success is erratic, resulting in large data gaps in the database. Thanks for any help, and I'm happy to provide more details, but I wanted to see if my information sounds like something someone else has already dealt with.
Thanks very much for any help,
S
As Gordon pointed out in his answer to your previous question, this approach only works if you have exclusive access to the table. There is a delay between the existence check and the insert itself, and if another process modifies the table during this window, you may end up with duplicates.
If you're on Postgres 9.5+, the best approach is to skip the existence check and simply use an INSERT ... ON CONFLICT DO NOTHING statement.
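For example (a sketch, assuming the duplicates are being caught by the unique index on ltg_data):

INSERT INTO ltg_data (pulsecount, intensity, time, lon, lat, ltg_geom)
SELECT pulsecount, intensity, time, lon, lat, ltg_geom
FROM holding
ON CONFLICT DO NOTHING;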
On earlier versions, the simplest solution (if you can afford to do so) would be to lock the table for the duration of the import. Otherwise, you can emulate ON CONFLICT DO NOTHING (albeit less efficiently) using a loop and an exception handler:
DO $$
DECLARE r RECORD;
BEGIN
    FOR r IN (SELECT * FROM holding) LOOP
        BEGIN
            INSERT INTO ltg_data (pulsecount, intensity, time, lon, lat, ltg_geom)
            VALUES (r.pulsecount, r.intensity, r.time, r.lon, r.lat, r.ltg_geom);
        EXCEPTION WHEN unique_violation THEN
            NULL;  -- duplicate row: skip it
        END;
    END LOOP;
END
$$;
As an aside, DELETE FROM holding is time-consuming and probably unnecessary. Create your staging table as TEMP, and it will be cleaned up automatically at the end of your session. You can easily build a table which matches the structure of your import target:
CREATE TEMP TABLE holding (LIKE ltg_data);
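Sketched end to end (the step that populates holding is whatever your existing COPY/processing does, so it is only indicated by a comment here):

BEGIN;

CREATE TEMP TABLE holding (LIKE ltg_data) ON COMMIT DROP;

-- ... populate holding here, e.g. with your existing COPY ...

INSERT INTO ltg_data
SELECT * FROM holding
ON CONFLICT DO NOTHING;

COMMIT;  -- holding is dropped automatically at this point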

Drop DB2 table if exists

In my script I have to do a lot of selects against a joined table, so instead I decided to put the result of this join into a temporary table.
First I thought:
1. Create table
2. Put the data from the join into a table
3. Drop the table
But then I thought, what if the script fails before I dropped the table?
So I decided to go with:
1. Drop the table
2. Create the table
3. Put the data from the join into a table
I don't really mind if the table is left there until the next time I run the script, so the second option works too.
But what if somebody had already dropped the table?
I saw that some systems have a "drop if exists", but unfortunately DB2 doesn't. I would like to do something that won't make the script die when the DROP TABLE fails.
Ideas? On any of this? Thanks!
EDIT: I forgot to say this is in a Perl script!
The best way to do this is with an anonymous block, like in the code below.
You need to run the DROP TABLE as dynamic SQL and catch the exception in the block.
--#SET TERMINATOR #
begin
    declare statement varchar(128);
    -- SQLSTATE 42704 = "undefined name": ignore the error when the table is not there
    declare continue handler for sqlstate '42704' begin end;
    set statement = 'DROP TABLE MYTABLE';
    execute immediate statement;
end #
This code will run normally in DB2. It does not need to be part of a procedure or a function.
Why not look for the table first? If you find it, it needs to be dropped; if you don't, it doesn't.
There is db2perf_quiet_drop, which might work the way you want. It's a free add-on :)
You can look into this post too:
http://www.dbforums.com/showthread.php?1609047-DB2-equivalent-for-mysql-s-DROP-TABLE-IF-EXISTS
If this doesn't work for you, please let me know what error you are getting so I can try to help :)
Or this might work
if( NOT exists( create table detailval
(
id int,
detaildeptNo int,
info varchar(255)
);
insert into detailval values (1, 1, 'detail values A');
insert into detailval values (2, 1, 'detail values B');
insert into detailval values (3, 1, 'detail values C');
insert into detailval values (4, 2, 'detail values D');
)
)
then customStoredproc('droptable');
end if;
End
I think you should look into working with temporary tables (DECLARE GLOBAL TEMPORARY TABLE). They live in a user temporary tablespace; by default their rows are deleted on commit, and the table itself goes away when the session ends.
You can easily also query syscat.tables like this:
select COUNT(*) from SYSCAT.TABLES where TRIM(TABNAME) = '<some_table_name>'
If this query returns 0, the table does not exist.
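For completeness, a sketch of the DECLARE GLOBAL TEMPORARY TABLE route (the column list and the source tables are made up, and it assumes a USER TEMPORARY tablespace already exists in the database):

DECLARE GLOBAL TEMPORARY TABLE joined_data (
    id   INTEGER,
    info VARCHAR(255)
) ON COMMIT PRESERVE ROWS NOT LOGGED;

-- the declared table is addressed through the SESSION schema
INSERT INTO SESSION.joined_data
SELECT a.id, b.info
FROM   myschema.table_a a
JOIN   myschema.table_b b ON b.a_id = a.id;

-- no DROP needed: the table vanishes when the session ends
SELECT * FROM SESSION.joined_data;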

Create a temp table (if not exists) for use into a custom procedure

I'm trying to get the hang of using temp tables:
CREATE OR REPLACE FUNCTION test1(user_id BIGINT) RETURNS BIGINT AS
$BODY$
BEGIN
    CREATE TEMP TABLE temp_table1
    ON COMMIT DELETE ROWS
    AS SELECT table1.column1, table1.column2
       FROM table1
       INNER JOIN -- ............

    IF EXISTS (SELECT * FROM temp_table1) THEN
        -- work with the result
        RETURN 777;
    ELSE
        RETURN 0;
    END IF;
END;
$BODY$
LANGUAGE plpgsql;
I want the rows of temp_table1 to be deleted immediately, or as soon as possible; that's why I added ON COMMIT DELETE ROWS. Obviously, I got the error:
ERROR: relation "temp_table1" already exists
I tried to add IF NOT EXISTS, but I couldn't find a working example of it that does what I'm looking for.
Your suggestions?
DROP the table each time before creating the TEMP table, as below:
BEGIN
    DROP TABLE IF EXISTS temp_table1;
    create temp table temp_table1
    -- the rest of your code comes here
The problem with temp tables is that dropping and recreating them bloats pg_attribute heavily, and one sunny morning you will find database performance dead and pg_attribute at 200+ GB while your database is more like 10 GB.
We are very heavy on temp tables (>500 requests per second, with async I/O via Node.js) and experienced exactly this kind of pg_attribute bloat. All you are left with is very aggressive vacuuming, which halts performance.
The other answers given here do not solve this, because they all bloat pg_attribute heavily.
So the solution is elegantly this:
create temp table if not exists my_temp_table (<column definitions>) on commit delete rows;
So you go on playing with temp tables and save your pg_attribute.
You want to DROP the temp table on commit (not just DELETE ROWS), so:
begin
create temp table temp_table1
on commit drop
...
Documentation
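Putting the suggestions together, the function might look roughly like this (a sketch only: the column types and the join are placeholders, since the original post elides them):

CREATE OR REPLACE FUNCTION test1(user_id BIGINT) RETURNS BIGINT AS
$BODY$
BEGIN
    -- created once per session, emptied at the end of every transaction,
    -- so repeated calls neither fail nor bloat pg_attribute
    CREATE TEMP TABLE IF NOT EXISTS temp_table1 (
        column1 BIGINT,   -- placeholder type
        column2 TEXT      -- placeholder type
    ) ON COMMIT DELETE ROWS;

    INSERT INTO temp_table1 (column1, column2)
    SELECT table1.column1, table1.column2
    FROM table1;
    -- INNER JOIN ... (elided in the original post)

    IF EXISTS (SELECT 1 FROM temp_table1) THEN
        -- work with the result
        RETURN 777;
    ELSE
        RETURN 0;
    END IF;
END;
$BODY$
LANGUAGE plpgsql;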