PostgreSQL ALTER TABLE explicit locking doesn't work as expected - postgresql

I'm trying to perform an ALTER TABLE on a huge table without locking out all access to it.
According to ALTER TABLE documentation:
An ACCESS EXCLUSIVE lock is held unless explicitly noted
Hence I do the following:
BEGIN;
LOCK TABLE MyTable IN SHARE MODE;
ALTER TABLE MyTable ALTER COLUMN size SET DATA TYPE BIGINT;
COMMIT;
I expected this to allow SELECT queries on the table while the ALTER runs.
But in fact it doesn't. Looking at pg_locks during the transaction I've found that:
SELECT c.relname, l.mode, l.granted, l.pid
FROM pg_locks as l JOIN pg_class as c on c.oid = l.relation
WHERE relname='MyTable' AND granted IS TRUE;
 relname | mode                | granted | pid
---------+---------------------+---------+------
 MyTable | ShareLock           | t       | 2277
 MyTable | AccessExclusiveLock | t       | 2277
So an AccessExclusiveLock was unexpectedly taken as well, which explains why my SELECTs hang until the end of the transaction.
I'm using PostgreSQL 9.4.

You seem to misinterpret the unless explicitly noted bit.
It means that, since many different actions are grouped under ALTER TABLE, a lock weaker than ACCESS EXCLUSIVE is sufficient for some of them, and when that is the case, it is explicitly noted in the documentation.
For instance (from https://www.postgresql.org/docs/9.5/static/sql-altertable.html):
SET STATISTICS acquires a SHARE UPDATE EXCLUSIVE lock.
...
Changing per-attribute options acquires a SHARE UPDATE EXCLUSIVE lock.
...
Validation acquires only a SHARE UPDATE EXCLUSIVE lock on the table being altered. If the constraint is a foreign key then a ROW SHARE lock is also required on the table referenced by the constraint.
It doesn't mean that the actions that require an ACCESS EXCLUSIVE lock (such as changing the type of a column) can be influenced by a weaker lock explicitly taken earlier on the table in the same transaction. They will need an ACCESS EXCLUSIVE lock no matter what.
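For instance, an ALTER TABLE form that is explicitly noted to take a weaker lock does allow concurrent reads. A minimal sketch against the table from the question (the statistics target value is just an example):
BEGIN;
-- Per the docs quoted above, SET STATISTICS only takes SHARE UPDATE EXCLUSIVE,
-- which does not conflict with the ACCESS SHARE lock taken by SELECT:
ALTER TABLE MyTable ALTER COLUMN size SET STATISTICS 1000;
-- SELECTs from other sessions keep working here.
COMMIT;
By contrast, the SET DATA TYPE form carries no such note, so it takes ACCESS EXCLUSIVE regardless of any weaker lock already held.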

Related

How to check whether table is busy or free before running the ALTER or creating TRIGGER on that table

We have thousands of tables, and a few of them are busy at times. If I execute any ALTER statement or create a trigger on those tables while they are busy, I am unable to do so. How can I check whether a table is busy or free before running ALTER or creating a TRIGGER on it in a PostgreSQL database?
The easiest way would be to run
BEGIN;
LOCK TABLE mytable NOWAIT;
If you get no error, the lock was granted immediately and the ALTER TABLE statement can proceed in the same transaction without waiting. (LOCK TABLE must be run inside a transaction block; with no mode specified it takes ACCESS EXCLUSIVE, the same lock ALTER TABLE needs.)
The query below returns the locked objects in a database.
SELECT t.relname, l.locktype, l.page, l.virtualtransaction, l.pid, l.mode, l.granted
FROM pg_locks l
JOIN pg_stat_all_tables t ON l.relation = t.relid
ORDER BY l.relation ASC;
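A related sketch, assuming you would rather bound the wait than poll: lock_timeout (available since PostgreSQL 9.3) makes the DDL statement itself give up instead of queueing indefinitely behind a busy table. The column added here is purely illustrative:
BEGIN;
SET LOCAL lock_timeout = '1s';             -- abort the lock wait after one second
ALTER TABLE mytable ADD COLUMN note text;  -- errors out instead of waiting if the table is busy
COMMIT;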

Alter Table Set Statistics requires table lock

I have run into a case where Pg always prefers a sequential scan for a table that has around 70M rows. (An index scan is ideal for that query; I confirmed this by setting enable_seq_scan=off, and speed improved 200x.)
So, in order to help Pg understand my data better, I executed this:
ALTER TABLE tablename ALTER COLUMN columnname SET STATISTICS 1000;
Unfortunately this requires a SHARE UPDATE EXCLUSIVE lock, which locks the entire table (too much locking).
Is there a solution to avoid locking for this statement?
Data sharding is done for this table based on primary key range, so I would like Pg to understand my PK better as well, so that it knows which user has a lot of data. Would it help if I increased the statistics target of the primary key column too?
From the very docs you linked
SET STATISTICS
This form sets the per-column statistics-gathering target for subsequent ANALYZE operations. The target can be set in the range 0 to 10000; alternatively, set it to -1 to revert to using the system default statistics target (default_statistics_target). For more information on the use of statistics by the PostgreSQL query planner, refer to Section 14.2.
SET STATISTICS acquires a SHARE UPDATE EXCLUSIVE lock.
And, on the docs for Explicit Locking
SHARE UPDATE EXCLUSIVE
Conflicts with the SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE lock modes. This mode protects a table against concurrent schema changes and VACUUM runs.
Acquired by VACUUM (without FULL), ANALYZE, CREATE INDEX CONCURRENTLY, and ALTER TABLE VALIDATE and other ALTER TABLE variants (for full details see ALTER TABLE).
So you can't change the schema or run VACUUM while the statistics target is being changed. So what? That statement should complete very fast, almost instantly.
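A minimal sketch of why this is harmless in practice, using the names from the question (note the new target only takes effect at the next ANALYZE):
ALTER TABLE tablename ALTER COLUMN columnname SET STATISTICS 1000;
-- Concurrent SELECT/INSERT/UPDATE/DELETE keep working throughout, because
-- SHARE UPDATE EXCLUSIVE does not conflict with ACCESS SHARE or ROW EXCLUSIVE.
ANALYZE tablename;  -- the new statistics target takes effect here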

Is pg_class.relhastriggers a safe alternative to `ALTER TABLE ... DISABLE TRIGGER USER`?

On my table I have a trigger which prevents updates, meaning that rows are effectively immutable once inserted.
When needing to perform retrospective updates to this table (e.g. adding a new calculated field) I have taken the following approach:
ALTER TABLE my_table DISABLE TRIGGER USER;
UPDATE my_table
SET x = (...);
ALTER TABLE my_table ENABLE TRIGGER USER;
The downside to this approach is that it requires an AccessExclusiveLock.
I was wondering if the following is safe for me to use, given it is guaranteed that the rows in the UPDATE are not being updated by other queries:
BEGIN;
UPDATE pg_class
SET relhastriggers = FALSE
WHERE relname = 'my_table';
UPDATE my_table
SET x = (...);
UPDATE pg_class
SET relhastriggers = TRUE
WHERE relname = 'my_table';
COMMIT;
What I have tried so far suggests this is safe and that outside of this transaction the triggers will continue to be applied as normal.
Also, if it is indeed safe for my use case what are the circumstances in which it would not be safe?
I am using Postgres 9.4.8.
Thanks :-)
Updating relhastriggers will behave more like DISABLE TRIGGER ALL, i.e. it will also disable the internal triggers used for foreign key and deferred uniqueness checks. This may or may not be an issue in your case.
I don't know if a direct update of pg_class might violate some assumption made in some corner of the Postgres codebase. But in general, hacking catalog tables is inherently unsafe; even if it doesn't break anything now, there is no guarantee that this will be the case in future versions.
REVOKE UPDATE ON my_table is a far better approach for blocking updates. Superusers are automatically exempt, so anybody with permission to UPDATE pg_class will have no problem with my_table.
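A minimal sketch of that approach (the role name app_user is hypothetical):
REVOKE UPDATE ON my_table FROM app_user;
-- Retrospective updates can then run as the table owner, with no
-- AccessExclusiveLock and no catalog hacking:
UPDATE my_table SET x = (...);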
If, for whatever reason, you really do need to do this with a trigger, there is another (somewhat hacky, but at least supported) way of circumventing it. By default, triggers will not be fired by replication processes (though this can be controlled via the ENABLE ALWAYS clause of ALTER TABLE). This means that you can bypass the trigger by impersonating a replicator:
BEGIN;
SET LOCAL session_replication_role TO replica;
UPDATE my_table SET x = (...);
COMMIT;
As with the catalog update, this will also disable any internal constraint triggers.

Avoid exclusive access locks on referenced tables when DROPping in PostgreSQL

Why does dropping a table in PostgreSQL require ACCESS EXCLUSIVE locks on any referenced tables? How can I reduce this to an ACCESS SHARE lock, or no lock at all? That is, is there a way to drop a relation without locking the referenced table?
I can't find any mention in the documentation of which locks are required, but unless I explicitly acquire locks in the correct order when dropping multiple tables during concurrent operations, I see deadlocks waiting on an AccessExclusiveLock in the logs. Acquiring this restrictive lock on commonly-referenced tables also causes momentary delays to other processes whenever tables are deleted.
To clarify,
CREATE TABLE base (
id SERIAL,
PRIMARY KEY (id)
);
CREATE TABLE main (
id SERIAL,
base_id INT,
PRIMARY KEY (id),
CONSTRAINT fk_main_base FOREIGN KEY (base_id)
REFERENCES base (id)
ON DELETE CASCADE ON UPDATE CASCADE
);
DROP TABLE main; -- why does this need to lock base?
For anyone googling and trying to understand why their drop table (or drop foreign key or add foreign key) got stuck for a long time:
PostgreSQL (I looked at versions 9.4 to 13) foreign key constraints are actually implemented using triggers on both ends of the foreign key.
If you have a company table (id as primary key) and a bank_account table (id as primary key, company_id as foreign key pointing to company.id), then there are actually 2 triggers on the bank_account table and also 2 triggers on the company table.
table_name   | timing       | trigger_name                   | function_name
-------------+--------------+--------------------------------+----------------------
bank_account | AFTER UPDATE | RI_ConstraintTrigger_c_1515961 | RI_FKey_check_upd
bank_account | AFTER INSERT | RI_ConstraintTrigger_c_1515960 | RI_FKey_check_ins
company      | AFTER UPDATE | RI_ConstraintTrigger_a_1515959 | RI_FKey_noaction_upd
company      | AFTER DELETE | RI_ConstraintTrigger_a_1515958 | RI_FKey_noaction_del
Initial creation of those triggers (when creating the foreign key) requires a SHARE ROW EXCLUSIVE lock on those tables (it used to be an ACCESS EXCLUSIVE lock in version 9.4 and earlier). This lock does not conflict with "data reading locks", but will conflict with all other locks, for example a simple INSERT/UPDATE/DELETE into the company table.
Deletion of those triggers (when dropping the foreign key, or the whole table) requires an ACCESS EXCLUSIVE lock on those tables. This lock conflicts with every other lock!
So imagine a scenario where you have a transaction A running that first did a simple SELECT from the company table (causing it to hold an ACCESS SHARE lock on company until the transaction is committed or rolled back) and is now doing some other work for 3 minutes. You try to drop the bank_account table in transaction B. This requires an ACCESS EXCLUSIVE lock, which has to wait until the ACCESS SHARE lock is released first.
In addition to that, all other transactions that want to access the company table (just SELECT, or maybe INSERT/UPDATE/DELETE) will queue up behind the waiting ACCESS EXCLUSIVE lock.
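You can see these internal triggers for yourself; a query along these lines reproduces the table above (pg_trigger's tgisinternal flag marks the RI triggers):
SELECT tgrelid::regclass AS table_name,
       tgname            AS trigger_name,
       tgfoid::regproc   AS function_name
FROM pg_trigger
WHERE tgisinternal
  AND tgrelid IN ('company'::regclass, 'bank_account'::regclass)
ORDER BY tgrelid;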
Long running transactions and DDL changes require delicate handling.
-- SESSION#1
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
BEGIN;
CREATE TABLE base (
id SERIAL
, dummy INTEGER
, PRIMARY KEY (id)
);
CREATE TABLE main (
id SERIAL
, base_id INTEGER
, PRIMARY KEY (id)
, CONSTRAINT fk_main_base FOREIGN KEY (base_id) REFERENCES base (id)
-- comment the next line out (plus maybe the previous one)
ON DELETE CASCADE ON UPDATE CASCADE
);
-- make some data ...
INSERT INTO base (dummy)
SELECT generate_series(1,10)
;
-- make some FK references
INSERT INTO main(base_id)
SELECT id FROM base
WHERE random() < 0.5
;
COMMIT;
BEGIN;
DROP TABLE main; -- why does this need to lock base?
SELECT pg_backend_pid();
-- allow other session to check the locks
-- and attempt an update to "base"
SELECT pg_sleep(20);
-- On rollback the other session will fail.
-- On commit the other session will succeed.
-- In both cases the other session must wait for us to complete.
-- ROLLBACK;
COMMIT;
-- SESSION#2
-- (Start this after session#1 from a different terminal)
SET search_path = tmp, pg_catalog;
PREPARE peeklock(text) AS
SELECT dat.datname
, rel.relname as relrelname
, cat.relname as catrelname
, lck.locktype
-- , lck.database, lck.relation
, lck.page, lck.tuple
-- , lck.virtualxid, lck.transactionid
-- , lck.classid
, lck.objid, lck.objsubid
-- , lck.virtualtransaction
, lck.pid, lck.mode, lck.granted, lck.fastpath
FROM pg_locks lck
LEFT JOIN pg_database dat ON dat.oid = lck.database
LEFT JOIN pg_class rel ON rel.oid = lck.relation
LEFT JOIN pg_class cat ON cat.oid = lck.classid
WHERE EXISTS(
SELECT * FROM pg_locks l
JOIN pg_class c ON c.oid = l.relation AND c.relname = $1
WHERE l.pid =lck.pid
)
;
EXECUTE peeklock( 'base' );
BEGIN;
-- attempt to perform some DDL
ALTER TABLE base ALTER COLUMN id TYPE BIGINT;
-- attempt to perform some DML
UPDATE base SET id = id+100;
COMMIT;
EXECUTE peeklock( 'base' );
\d base
SELECT * FROM base;
I suppose DDL locks everything it touches exclusively for the sake of simplicity — you're not supposed to run DDL involving not-temporary tables during normal operation anyway.
To avoid deadlock you may use advisory lock:
start transaction;
select pg_advisory_xact_lock(0);
drop table main;
commit;
This would ensure that only one client at a time is running DDL involving the referenced tables, so it wouldn't matter in which order the other locks are acquired.
You can avoid locking base for a long time by dropping the foreign key first:
start transaction;
select pg_advisory_xact_lock(0);
alter table main drop constraint fk_main_base;
commit;
start transaction;
drop table main;
commit;
This would still need to lock base exclusively, but for a much shorter time.
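Another sketch, for the deadlock part of the question: take all the locks in one fixed order up front. LOCK TABLE accepts a list of tables and locks them in the order given, so concurrent droppers that use the same order cannot deadlock each other:
BEGIN;
LOCK TABLE base, main IN ACCESS EXCLUSIVE MODE;
DROP TABLE main;
COMMIT;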

Prevent users to create tables in default tablespace

I have a problem and haven't found any clues so far. I'll try to explain it the best I can, but feel free to ask for more details!
Context
I'm working with Postgres 9.2.4 on Windows, and I need to implement some kind of quota administration for each user.
As far as I've read, there's no such built-in functionality, and most answers point to using the file system's quota administration capabilities.
There's one single database, and each user will have his own schema.
The approach I've taken separates each user's data files into different locations by having one tablespace per user, with the user owning his tablespace (so I can apply the quota configuration on a per-folder basis).
This led me to the problem I'm facing...
Problem
It happens that, when creating a table, the user is able to select the pg_default tablespace to store the data.
To add to my confusion, if later I change the tablespace to the one owned by the user, and then try to switch it back to the pg_default tablespace, a permission denied error is thrown.
To clarify the sequence here is some sample code:
-- Creates the table in the default tablespace
CREATE TABLE test_schema.test_table ( )
TABLESPACE pg_default;
-- Changes the tablespace to the one owned by the user
ALTER TABLE test_schema.test_table
SET TABLESPACE user_tablespace;
-- Tries to set back the pg_default tablespace (throws permission denied to pg_default tablespace)
ALTER TABLE test_schema.test_table
SET TABLESPACE pg_default;
All these commands were executed using a user login without administrative privileges. The pg_default tablespace is owned by the postgres login (administrative account).
My guess is that it has something to do with the database tablespace, which is set to use the pg_default tablespace.
Question
Is it possible to constrain a user so that they can only create objects in their own tablespace?
If you use disk quotas then you give yourself an awful lot of work. There is, in fact, an approximate solution in PostgreSQL, with some minor tinkering and no need to create a large number of tablespaces (schemas are still a good idea, to give every user his/her own namespace).
The function pg_total_relation_size(regclass) gives you the total disk space used for a table, including its indexes and TOAST tables. So scan pg_class and sum up:
CREATE VIEW user_disk_usage AS
SELECT r.rolname, SUM(pg_total_relation_size(c.oid)) AS total_disk_usage
FROM pg_class c, pg_roles r
WHERE c.relkind = 'r'
AND c.relowner = r.oid
GROUP BY r.rolname;
This gives you the total disk space used by each owner, irrespective of where tables are located. It is presented as a view definition here for use below.
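For example, to see the biggest consumers:
SELECT * FROM user_disk_usage ORDER BY total_disk_usage DESC;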
To make this work in a reasonably accurate fashion you need to regularly VACUUM ANALYZE your database. If you have low-traffic periods (e.g. 3am-5am daily, or Sunday), run it then using a scheduled job with user postgres. Have that job run the VACUUM first and then call a quota-check function (note that VACUUM itself cannot be executed from inside a function):
CREATE FUNCTION user_quota_check() RETURNS void AS $$
DECLARE
    user_data record;
BEGIN
    -- VACUUM cannot be executed from inside a function, so the scheduled
    -- job must run VACUUM FULL ANALYZE first to get accurate disk use data.
    -- Find users over disk quota
    FOR user_data IN SELECT * FROM user_disk_usage LOOP
        IF (user_data.total_disk_usage > <<your quota>>) THEN
            EXECUTE 'REVOKE CREATE ON SCHEMA ' || <<user''s schema name>> || ', PUBLIC FROM ' || user_data.rolname;
            -- REVOKE INSERT privileges too, unless you work with BEFORE INSERT triggers on all tables
        END IF;
    END LOOP;
END; $$ LANGUAGE plpgsql;
REVOKE ALL ON FUNCTION user_quota_check() FROM PUBLIC;
If the owner goes over the quota you can REVOKE CREATE on all relevant schemas, typically only the schema assigned to the user and the public schema, such that no new tables can be created. You should also REVOKE INSERT on all tables but this is easily circumvented because the owner can GRANT INSERT right back. That, however, could be cause for more drastic action against the user. Preferably you will create a before insert trigger on every table in the database, using a daily sweep just like the one above.
A user will still have SELECT privileges so he/she can still access data. More interestingly, DELETE and TRUNCATE will allow the user to free disk space and remedy the lock-out. The privileges can then be re-instated using something similar to the above function:
CREATE FUNCTION reclaim_disk_space() RETURNS void AS $$
DECLARE
    disk_use bigint;
BEGIN
    -- The user must VACUUM FULL his/her tables before calling this
    -- (VACUUM cannot be executed from inside a function). Slow, and
    -- therefore adequate punishment for going over quota.
    -- Re-instate privileges if enough space was reclaimed.
    SELECT total_disk_usage INTO disk_use
    FROM user_disk_usage
    WHERE rolname = session_user;
    IF (disk_use < <<your quota>>) THEN
        EXECUTE 'GRANT CREATE ON SCHEMA ' || <<user''s schema name>> || ', PUBLIC TO ' || session_user;
        -- GRANT INSERT privileges too, unless you work with BEFORE INSERT triggers on all tables
        RAISE NOTICE 'Disk use under quota limit. Privileges restored.';
    ELSE
        RAISE NOTICE 'Still using too much disk space. Free up more space.';
    END IF;
END; $$ LANGUAGE plpgsql;
The locked-out user can call this function him-/herself after having deleted sufficient data, and vacuumed his/her tables, to go back under the quota limit.
You can add more sophisticated features, such as: a table listing quotas per user (instead of one overall quota), with actual use compared against it; a RAISE NOTICE from an insert trigger when a user goes over 80% of quota (this requires every table to have a BEFORE INSERT trigger, which the postgres user can add in a regular sweep of new tables; the same trigger can deny inserts when over quota); repeating that notice every hour (so record when the last notice was issued); etc.
This solution is approximate because the quotas are not checked in real time. Real-time checking is possible (run user_quota_check() on every insert, modified to check just the tables of the session_user) but most likely adds too much overhead to be worthwhile. Run user_quota_check() overnight for daily management of quotas, and manually flog any user using up too much space during the day.
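A minimal sketch of the BEFORE INSERT trigger mentioned above (the quota is hard-coded here, and the function and trigger names are hypothetical):
CREATE FUNCTION deny_insert_over_quota() RETURNS trigger AS $$
BEGIN
    IF (SELECT total_disk_usage FROM user_disk_usage
        WHERE rolname = session_user) > 1073741824 THEN  -- 1 GB example quota
        RAISE EXCEPTION 'User % is over the disk quota', session_user;
    END IF;
    RETURN NEW;
END; $$ LANGUAGE plpgsql;
-- Attach it to each user table during the regular sweep:
CREATE TRIGGER quota_guard BEFORE INSERT ON test_schema.test_table
    FOR EACH ROW EXECUTE PROCEDURE deny_insert_over_quota();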