I have written a script, using PL/pgSQL, that I run in pgAdmin III. The script deletes existing DB contents and then adds a bunch of "sample" data for the desired testing scenario (usually various types of load tests). Once the data is loaded, I would like to "vacuum analyze" the affected tables, both to recover the space from the deleted records and to accurately reflect the new contents.
I can use various workarounds (e.g. do the VACUUM ANALYZE manually, include drop/create statements for the various structures within the script, etc.), but what I would really like to do is:
DO $$
BEGIN
    -- parent table
    FOR i IN 1..10000 LOOP
        INSERT INTO my_parent_table( ... ) VALUES ...;
    END LOOP;
    VACUUM ANALYZE my_parent_table;

    -- child table
    FOR i IN 1..50000 LOOP
        INSERT INTO my_child_table( ... ) VALUES ...;
    END LOOP;
    VACUUM ANALYZE my_child_table;
END;
$$;
When I run this, I get:
ERROR: VACUUM cannot be executed from a function or multi-command string
So then I tried moving the vacuum statements to the end like so:
DO $$
BEGIN
    -- parent table
    FOR i IN 1..10000 LOOP
        INSERT INTO my_parent_table( ... ) VALUES ...;
    END LOOP;

    -- child table
    FOR i IN 1..50000 LOOP
        INSERT INTO my_child_table( ... ) VALUES ...;
    END LOOP;
END;
$$;
VACUUM ANALYZE my_parent_table;
VACUUM ANALYZE my_child_table;
This gives me the same error. Is there any way I can incorporate the VACUUM ANALYZE into the same script that adds the data?
I am using PostgreSQL v 9.2.
If you are running this from the pgAdmin3 query window with the "execute query" button, that sends the entire script to the server as one string.
If you execute it with the query window's "execute pgScript" button instead, the commands are sent separately and so it would work, except that pgScript does not tolerate the DO syntax for anonymous blocks. You would have to create a function that does the currently anonymous work, and then invoke "execute pgScript" with something like:
select inserts_function();
VACUUM ANALYZE my_parent_table;
VACUUM ANALYZE my_child_table;
The final solution I implemented ended up being a composite of suggestions made in the comments by a_horse_with_no_name and xzilla:
- Using TRUNCATE instead of DELETE avoids the need for a VACUUM.
- ANALYZE can then be used by itself, in-line within the script, as needed.
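A minimal sketch of that composite approach, reusing the hypothetical table names from the question (plain ANALYZE, unlike VACUUM, is transaction-safe and so is allowed inside a DO block):

DO $$
BEGIN
    -- TRUNCATE reclaims the space immediately, so no VACUUM is needed;
    -- truncating parent and child together satisfies any FK between them.
    TRUNCATE my_parent_table, my_child_table;

    -- ... the INSERT loops from the question go here ...

    ANALYZE my_parent_table;
    ANALYZE my_child_table;
END;
$$;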
Related
I have a job which runs every night to load changes into a temporary table and apply those changes to the main table.
CREATE TEMP TABLE IF NOT EXISTS tmp AS SELECT * FROM mytable LIMIT 0;
COPY tmp FROM PROGRAM '';
-- 11 SQL queries to update 'mytable' based on data from 'tmp'
I have a large number of queries to delete duplicates from tmp, update values in tmp, update values in the main table and insert new rows into the main table. Is it possible to loop over both tables using plpgsql instead?
UPDATE mytable m
SET "Field" = t."Field" +1
FROM tmp t
WHERE (t."ID" = m."ID");
In this example, it is a simple change of a column value. Instead, I want to do more complex operations on both the main table and the temp table.
EDIT: here is some pseudocode of what I imagine.
LOOP tmp t, mytable m
BEGIN
-- operation in plpgsql including UPDATE, INSERT, DELETE
END
WHERE t.ID = m.ID;
You can use plpgsql FOR to loop over query results.
DO $$
DECLARE
    myrow RECORD;
BEGIN
    FOR myrow IN SELECT * FROM table1 JOIN table2 USING (id)
    LOOP
        -- ... do something with the row ...
    END LOOP;
END
$$;
If you want to update a table while looping over it, you can create a FOR UPDATE cursor, but that won't work if the query is a join, because then you're not opening an update cursor on a table.
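For the single-table case, here is a hedged sketch of such a cursor loop (table and column names borrowed from the question above):

DO $$
DECLARE
    cur CURSOR FOR SELECT * FROM tmp FOR UPDATE;
    r   RECORD;
BEGIN
    FOR r IN cur LOOP
        -- WHERE CURRENT OF updates exactly the row the cursor is on.
        UPDATE tmp SET "Field" = r."Field" + 1 WHERE CURRENT OF cur;
    END LOOP;
END
$$;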
Note that writing to or updating temp tables is much faster than writing to normal tables, because temp tables don't have WAL or crash-recovery overhead, and they're owned by a single connection, so you don't have to worry about locks.
If you put a query inside the loop, it will be executed many times though, which could get pretty slow. It's usually faster to use bulk queries, even if they're complicated.
If you want to UPDATE many rows in the temp table with values that depend on other tables and joins, it could be faster to run several updates on the temp table with different join and WHERE conditions.
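For example, a sketch in the set-based style, assuming the tmp/mytable schema from the question; this usually beats a row-by-row loop:

-- Pull values from the main table into the temp table in one pass ...
UPDATE tmp t
SET    "Field" = m."Field" + 1
FROM   mytable m
WHERE  t."ID" = m."ID";

-- ... and de-duplicate tmp in another, keeping one row per "ID".
DELETE FROM tmp a
USING  tmp b
WHERE  a."ID" = b."ID"
AND    a.ctid < b.ctid;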
Postgres 13.4
I've got some pg_cron jobs set up to periodically delete older records out of log-like tables. What I'd like to do is to run VACUUM ANALYZE after performing a purge. Unfortunately, I can't work out how to do this in a stored function. Am I missing a trick? Is a stored procedure more appropriate?
As an example, here's one of my purge routines
CREATE OR REPLACE FUNCTION dba.purge_event_log (
    retain_days_in integer_positive DEFAULT 14)
RETURNS int4
AS $BODY$
    WITH
    -- Use a CTE so that we've got a way of returning the count easily.
    deleted AS (
        -- Normal-looking code for this requires a literal:
        --     WHERE your_dts < now() - INTERVAL '14 days'
        -- Don't want to use a literal, SQL injection, etc.
        -- Instead, using an interval constructor to achieve the same result:
        DELETE
        FROM dba.event_log
        WHERE dts < now() - make_interval(days => $1)
        RETURNING *
    ),
    ----------------------------------------
    -- Save details to a custom log table
    ----------------------------------------
    logit AS (
        INSERT INTO dba.event_log (name, details)
        VALUES ('purge_event_log(' || retain_days_in::text || ')',
                'count = ' || (SELECT count(*)::text FROM deleted))
    )
    ----------------------------------------
    -- Return result count
    ----------------------------------------
    SELECT count(*) FROM deleted;
$BODY$
LANGUAGE sql;
COMMENT ON FUNCTION dba.purge_event_log (integer_positive) IS
'Delete dba.event_log records older than the day count passed in, with a default retention period of 14 days.';
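For reference, invoking it looks like this (the 30-day value is just an example):

SELECT dba.purge_event_log(30);  -- keep 30 days of events
SELECT dba.purge_event_log();    -- use the 14-day default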
The truth is, I don't really care about the count(*) result from this routine, in this case. But I might want a result and an additional action in some other, similar context. As you can see, the routine deletes records, uses a CTE to insert a report into another table, and then returns a result. No matter what, I figure this example is a good way to get my head around the alternatives and options in stored functions. The main thing I want to achieve here is to delete records and then run maintenance. If this is an awkward fit for a stored function or procedure, I could write out an entry to a vacuum_list table with the table name, and have another job run through that list.
If there's a smarter way to approach vacuuming without the extra machinery, I'm of course interested in that. But I'm also interested in understanding the limits on what operations you can combine in PL/pgSQL routines.
Pavel Stehule's answer is correct and complete. I decided to follow up a bit here, as I like to dig into bugs in my code, behaviors in Postgres, etc., to get a better sense of what I'm dealing with. I'm including some notes below for anyone who finds them of use.
COMMAND cannot be executed...
The reference to "VACUUM cannot be executed inside a transaction block" gave me a better way to search the docs for similarly restricted commands. The information below probably doesn't cover everything, but it's a start.
Command                  Limitation
CREATE DATABASE
ALTER DATABASE           If creating a new tablespace.
DROP DATABASE
CLUSTER                  Without any parameters.
CREATE TABLESPACE
DROP TABLESPACE
REINDEX                  All in system catalogs, database, or schema.
CREATE SUBSCRIPTION      When creating a replication slot (the default behavior).
ALTER SUBSCRIPTION       With refresh option as true.
DROP SUBSCRIPTION        If the subscription is associated with a replication slot.
COMMIT PREPARED
ROLLBACK PREPARED
DISCARD ALL
VACUUM
The accepted answer indicates that the limitation has nothing to do with the specific server-side language used. I've just come across an older thread that has some excellent explanations and links for stored functions and transactions:
Do stored procedures run in database transaction in Postgres?
Sample Code
I also wondered about stored procedures, as they're allowed to control transactions. I tried them out in PG 13 and, no, the code is treated like a stored function, down to the error messages.
For anyone who goes in for this sort of thing, here are "hello world" samples of SQL and PL/pgSQL stored functions and procedures to test out how VACUUM behaves in these cases. Spoiler: it doesn't work, as advertised.
SQL Function
/*
select * from dba.vacuum_sql_function();
Fails:
ERROR: VACUUM cannot be executed from a function
CONTEXT: SQL function "vacuum_sql_function" statement 1. 0.000 seconds. (Line 13).
*/
DROP FUNCTION IF EXISTS dba.vacuum_sql_function();
CREATE FUNCTION dba.vacuum_sql_function()
RETURNS VOID
LANGUAGE sql
AS $sql_code$
VACUUM ANALYZE activity;
$sql_code$;
select * from dba.vacuum_sql_function(); -- Fails.
PL/pgSQL Function
/*
select * from dba.vacuum_plpgsql_function();
Fails:
ERROR: VACUUM cannot be executed from a function
CONTEXT: SQL statement "VACUUM ANALYZE activity"
PL/pgSQL function vacuum_plpgsql_function() line 4 at SQL statement. 0.000 seconds. (Line 22).
*/
DROP FUNCTION IF EXISTS dba.vacuum_plpgsql_function();
CREATE FUNCTION dba.vacuum_plpgsql_function()
RETURNS VOID
LANGUAGE plpgsql
AS $plpgsql_code$
BEGIN
VACUUM ANALYZE activity;
END
$plpgsql_code$;
select * from dba.vacuum_plpgsql_function();
SQL Procedure
/*
call dba.vacuum_sql_procedure();
ERROR: VACUUM cannot be executed from a function
CONTEXT: SQL function "vacuum_sql_procedure" statement 1. 0.000 seconds. (Line 20).
*/
DROP PROCEDURE IF EXISTS dba.vacuum_sql_procedure();
CREATE PROCEDURE dba.vacuum_sql_procedure()
LANGUAGE SQL
AS $sql_code$
VACUUM ANALYZE activity;
$sql_code$;
call dba.vacuum_sql_procedure();
PL/pgSQL Procedure
/*
call dba.vacuum_plpgsql_procedure();
ERROR: VACUUM cannot be executed from a function
CONTEXT: SQL statement "VACUUM ANALYZE activity"
PL/pgSQL function vacuum_plpgsql_procedure() line 4 at SQL statement. 0.000 seconds. (Line 23).
*/
DROP PROCEDURE IF EXISTS dba.vacuum_plpgsql_procedure();
CREATE PROCEDURE dba.vacuum_plpgsql_procedure()
LANGUAGE plpgsql
AS $plpgsql_code$
BEGIN
VACUUM ANALYZE activity;
END
$plpgsql_code$;
call dba.vacuum_plpgsql_procedure();
Other Options
Plenty. As I understand it, VACUUM, and a handful of other commands, are not supported in server-side code running within Postgres. Therefore, your code needs to start from somewhere else. That can be:
- Whatever cron you've got in your server's OS.
- Any external client you like.
- pg_cron.
As we're deployed on RDS, those last two options are where I'll look. And there's one more:
Let AUTOVACUUM and an occasional VACUUM do their thing.
That's pretty easy to do, and seems to work fine for the bulk of our needs.
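As an illustration of the pg_cron route mentioned above, a sketch assuming the extension is installed (the job name and schedule here are arbitrary); pg_cron runs each job as its own top-level session, so VACUUM is allowed there:

SELECT cron.schedule(
    'nightly-vacuum-event-log',    -- hypothetical job name
    '5 3 * * *',                   -- every day at 03:05
    'VACUUM ANALYZE dba.event_log');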
Another Idea
If you do want a bit more control and some custom logging, I'm imagining a table like this:
CREATE TABLE IF NOT EXISTS dba.vacuum_list (
    database_name text,
    schema_name   text,
    table_name    text,
    run           boolean,
    run_analyze   boolean,
    run_full      boolean,
    last_run_dts  timestamp);

ALTER TABLE dba.vacuum_list ADD CONSTRAINT
    vacuum_list_pk
    PRIMARY KEY (database_name, schema_name, table_name);
That's just a sketch. The idea is like this:
- You INSERT into vacuum_list when a table needs some vacuuming, at least as far as you're concerned.
- In my case, that would be an UPSERT, as I don't need a full log-like table, just a single row per table of interest with the last outcome and/or pending state.
- Periodically, a remote client, etc. connects, reads the table, and executes each specified VACUUM, according to the options specified in the record.
- The external client updates the row with the last run timestamp, and whatever else you're including in the row.
- Optionally, you could include fields for duration and change in relation size pre/post vacuuming.
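The UPSERT mentioned in the second bullet might look something like this sketch (the values are illustrative):

INSERT INTO dba.vacuum_list
    (database_name, schema_name, table_name, run, run_analyze, run_full)
VALUES
    (current_database(), 'dba', 'event_log', true, true, false)
ON CONFLICT ON CONSTRAINT vacuum_list_pk
DO UPDATE SET run         = true,
              run_analyze = EXCLUDED.run_analyze,
              run_full    = EXCLUDED.run_full;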
That last option (recording relation-size change) is what I'm interested in. None of our VACUUM calls were working for quite some time, as there was a months-old dead connection from something server-side. VACUUM appears to run fine in such a case; it just can't delete a whole lot of rows (because of the super old "open" transaction ID, visibility maps, etc.). The only way to see this sort of thing seems to be to run VACUUM VERBOSE and study the output, or to record vacuum time and, more importantly, relation-size change to flag cases where nothing seems to happen when it seems like it should.
VACUUM is a "top level" command. It can never be executed from PL/pgSQL, or from any other PL.
How to prevent or avoid running update or delete statements without where clauses in PostgreSQL?
Something like MySQL's SQL_SAFE_UPDATES setting is needed for PostgreSQL.
For example:
UPDATE table_name SET active=1; -- Prevent this statement or throw error message.
UPDATE table_name SET active=1 WHERE id=1; -- This is allowed
My company's database has many users with INSERT and UPDATE privileges, and any one of them could run such an unsafe update.
How can this scenario be handled?
Is there any way, such as a trigger or an extension, to guard against unsafe updates in PostgreSQL?
I have switched off autocommit to avoid these errors, so I always have a transaction that I can roll back. All you have to do is modify .psqlrc:
\set AUTOCOMMIT off
\echo AUTOCOMMIT = :AUTOCOMMIT
\set PROMPT1 '%[%033[32m%]%/%[%033[0m%]%R%[%033[1;32;40m%]%x%[%033[0m%]%# '
\set PROMPT2 '%[%033[32m%]%/%[%033[0m%]%R%[%033[1;32;40m%]%x%[%033[0m%]%# '
\set PROMPT3 '>> '
You don't have to include the PROMPT statements, but they are helpful because they change the psql prompt to show the transaction status.
Another advantage of this approach is that it gives you a chance to prevent any erroneous changes.
Example (psql):
database=# SELECT * FROM my_table; -- implicit start transaction; see prompt
-- output result
database*# UPDATE my_table SET my_column = 1; -- missed where clause
UPDATE 525125 -- Oh, no!
database*# ROLLBACK; -- Puh! revert wrong changes
ROLLBACK
database=# -- I'm completely operational and all of my circuits working perfectly
There actually was a discussion on the hackers list about this very feature. It had a mixed reception, but might have been accepted if the author had persisted.
As it is, the best you can do is a statement level trigger that bleats if you modify too many rows:
CREATE TABLE deleteme
AS SELECT i FROM generate_series(1, 1000) AS i;
CREATE FUNCTION stop_mass_deletes() RETURNS trigger
LANGUAGE plpgsql AS
$$BEGIN
IF (SELECT count(*) FROM OLD) > TG_ARGV[0]::bigint THEN
RAISE EXCEPTION 'must not modify more than % rows', TG_ARGV[0];
END IF;
RETURN NULL;
END;$$;
CREATE TRIGGER stop_mass_deletes AFTER DELETE ON deleteme
REFERENCING OLD TABLE AS old FOR EACH STATEMENT
EXECUTE FUNCTION stop_mass_deletes(10);
DELETE FROM deleteme WHERE i < 100;
ERROR: must not modify more than 10 rows
CONTEXT: PL/pgSQL function stop_mass_deletes() line 1 at RAISE
DELETE FROM deleteme WHERE i < 10;
DELETE 9
This will have a certain performance impact on deletes.
This works from v10 on, when transition tables were introduced (on v10 itself, write EXECUTE PROCEDURE instead of EXECUTE FUNCTION, which was only added in v11).
If you can afford to make it a little less convenient for your users, you might try revoking the UPDATE privilege from all "standard" users and creating a stored procedure like this:
CREATE FUNCTION update(table_name, col_name, new_value, condition) RETURNS void
/*
Check if condition is acceptable, create and run UPDATE statement
*/
LANGUAGE plpgsql SECURITY DEFINER
Because of SECURITY DEFINER, your users will be able to UPDATE this way despite not having the UPDATE privilege.
I'm not sure if this is a good approach, but it lets you enforce UPDATE (or anything else) requirements as strict as you wish.
Of course, the more complicated the required UPDATEs are, the more complicated your procedure has to be, but if this is mostly about updating a single row by ID (as in your example), it might be worth a try.
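A minimal sketch of that idea, with hypothetical names (safe_update, p_condition, etc.) and only a token safety check; a real version would also need to pin search_path and validate the condition string carefully, since it is interpolated into the query:

CREATE FUNCTION safe_update(
    p_table     regclass,
    p_column    text,
    p_new_value text,
    p_condition text)
RETURNS void
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
BEGIN
    -- Refuse empty conditions, so every UPDATE carries a WHERE clause.
    IF p_condition IS NULL OR btrim(p_condition) = '' THEN
        RAISE EXCEPTION 'a WHERE condition is required';
    END IF;
    EXECUTE format('UPDATE %s SET %I = %L WHERE %s',
                   p_table, p_column, p_new_value, p_condition);
END;
$$;

-- Usage: SELECT safe_update('table_name', 'active', '1', 'id = 1');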
I have a shapefile loaded into a postgis database. This shapefile is frequently updated by the source and thus my current process is:
1. Use shp2pgsql with the -a option to generate INSERT statements.
2. Run the SQL generated in step 1 to append to the database.
Of course, I end up with all the rows from both versions of the shapefile, and what I need is to get rid of all the previous rows and load only the rows from the updated shapefile.
I tried creating a trigger and trigger function in the database:
CREATE TRIGGER drop_all_rows_from_owner_table_trigger
BEFORE INSERT
ON owner_polygons_common_ownership_layer
FOR EACH STATEMENT
EXECUTE PROCEDURE drop_all_rows_from_owner_table();
Here's the trigger function:
CREATE OR REPLACE FUNCTION drop_all_rows_from_owner_table()
RETURNS trigger AS $$
BEGIN
    DELETE FROM owner_polygons_common_ownership_layer;
    RETURN NEW;
END;
$$
LANGUAGE plpgsql;
I believe all I have accomplished is to delete all rows from the table, insert the new rows, then delete them again, because when I look at the table after the process ends, I have zero rows. I used the FOR EACH STATEMENT clause because shp2pgsql created one INSERT statement.
My question is: Are triggers the way to go to accomplish this?
Your trigger function seems right.
However, I don't think this is the way to go: you cannot be sure that shp2pgsql produces a single statement.
If your shapefile grows, it could split the inserts into multiple statements.
So, if you can't use the -d option (which drops and recreates the table), I'd add a step to the process, between 1 and 2, to truncate the table.
You could also prepend the TRUNCATE statement to the generated SQL file, or execute another psql command to truncate the table.
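For instance, the generated SQL file could be given a header like this sketch, making the whole load atomic (the TRUNCATE is rolled back if any INSERT fails):

BEGIN;
TRUNCATE owner_polygons_common_ownership_layer;
-- ... INSERT statements generated by shp2pgsql -a ...
COMMIT;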
I have googled quite a bit, and I have fairly decent reading comprehension, but I don't understand if this script will work in multiple threads on my postgres/postgis box. Here is the code:
DO
$do$
DECLARE
    x RECORD;
    b int;
BEGIN
    CREATE TEMP TABLE geoms (id serial, geom geometry) ON COMMIT DROP;

    FOR x IN SELECT id, geom FROM asdf LOOP
        TRUNCATE TABLE geoms;
        INSERT INTO geoms (geom)
        SELECT someGeomfield
        FROM sometable
        WHERE st_intersects(somegeomfield, x.geom);
        -- do something with the records in geoms here...
        -- and insert that data somewhere else
    END LOOP;
END;
$do$;
So, if I run this in more than one client, called from Java, will the scope of the geoms temp table cause problems? If so, any ideas for a solution to this in PostGres would be helpful.
Thanks
One subtle trap you will run into, though (which is why I am not quite ready to declare it "safe"), is that the scope is per session, and people often forget to drop the tables, so they only drop on disconnect.
If you don't need the temp table after your function, I think you are much better off dropping it explicitly when you are done with it. This prevents issues that arise from trying to run the function twice in the same transaction. (Your code drops on commit, not at the end of the DO block.)
Temp tables in PostgreSQL (or Postgres; "PostGres" doesn't exist) are local to the session in which they are created. No other session (client) can see temp tables from another session; both the schema and the data are invisible to others. Your code is safe.
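A quick way to convince yourself, with two psql sessions (the table matches the script above):

-- Session 1:
CREATE TEMP TABLE geoms (id serial, geom geometry);

-- Session 2 (a separate connection):
SELECT * FROM geoms;
-- ERROR:  relation "geoms" does not exist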