Schema pg_dump fails due to a lock on a table - PostgreSQL

I run a backup and restore of a schema every day, and every now and then I get this error:
pg_dump: Error message from server: ERROR: relation not found (OID 86157003)
DETAIL: This can be validly caused by a concurrent delete operation on this object.
pg_dump: The command was: LOCK TABLE myschema.products IN ACCESS SHARE MODE
How can this be avoided? It seems the table was in use at the time, or someone was running something against it. Can I just kill all connections to the DB before restoring, or is there another alternative?
As far as I understand, pg_dump should be able to run even while users are working with the table, but that doesn't seem to be the case.
Thanks,

It is somewhat buried, but the answer lies here:
https://www.postgresql.org/docs/current/app-pgdump.html
From the description of the -j njobs option:
"... To detect this conflict, the pg_dump worker process requests another shared lock using the NOWAIT option. If the worker process is not granted this shared lock, somebody else must have requested an exclusive lock in the meantime and there is no way to continue with the dump, so pg_dump has no choice but to abort the dump."
This is borne out by the command shown in the error message:
LOCK TABLE myschema.products IN ACCESS SHARE MODE
ACCESS SHARE is compatible with every other lock mode except ACCESS EXCLUSIVE. ACCESS EXCLUSIVE is taken by DROP TABLE, TRUNCATE, REINDEX, etc.; see the "Explicit Locking" chapter of the PostgreSQL documentation for the full list. So you need to run the dump at a time when the operations that take ACCESS EXCLUSIVE are known not to happen, or else block or drop the other connections.
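A minimal Python sketch of that reasoning follows. It assumes the conflict table from the PostgreSQL "Explicit Locking" documentation and a simplified queueing rule: a new request is refused if it conflicts with a granted lock or with a request already waiting ahead of it. Under that rule, the worker's NOWAIT request for ACCESS SHARE fails as soon as a DROP TABLE or TRUNCATE is queued for ACCESS EXCLUSIVE. This happens even though ACCESS SHARE never conflicts with the leader's own ACCESS SHARE. This is a model, not PostgreSQL's lock manager.

```python
# Table-level lock conflicts, transcribed from the PostgreSQL "Explicit Locking" docs.
AS, RS, RE, SUE = "ACCESS SHARE", "ROW SHARE", "ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE"
S, SRE, E, AE = "SHARE", "SHARE ROW EXCLUSIVE", "EXCLUSIVE", "ACCESS EXCLUSIVE"

CONFLICTS = {
    AS: {AE},
    RS: {E, AE},
    RE: {S, SRE, E, AE},
    SUE: {SUE, S, SRE, E, AE},
    S: {RE, SUE, SRE, E, AE},
    SRE: {RE, SUE, S, SRE, E, AE},
    E: {RS, RE, SUE, S, SRE, E, AE},
    AE: {AS, RS, RE, SUE, S, SRE, E, AE},
}


def conflicts(a: str, b: str) -> bool:
    """True if lock modes a and b cannot be held on the same table at once."""
    return b in CONFLICTS[a]


def try_lock_nowait(requested: str, granted: list, waiting: list) -> bool:
    """Simplified model (assumption): refuse the request if it conflicts with a
    granted lock or with any request already queued ahead of it."""
    return not any(conflicts(requested, mode) for mode in granted + waiting)


# The pg_dump leader already holds ACCESS SHARE on myschema.products.
granted = [AS]
print(try_lock_nowait(AS, granted, waiting=[]))    # True: the worker gets its lock
print(try_lock_nowait(AS, granted, waiting=[AE]))  # False: a queued DROP TABLE blocks it
```

Refusing instead of waiting is what avoids a deadlock here. The exclusive request waits on the leader, and a waiting worker would in turn wait on the exclusive request.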

Somebody dropped the table between the time pg_dump took an inventory of the tables and the time it tried to lock and dump it.
This can happen if your application is in the habit of dropping tables all the time.
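The race can be sketched in a few lines of Python. In this sketch, the catalog is a hypothetical dict from OID to table name (the second OID and table are made up), and lock_table is a stand-in for LOCK TABLE. The only point is that the table list is read first and the locks are taken afterwards. A DROP TABLE that commits in between therefore leaves pg_dump holding an OID that no longer exists.

```python
class RelationNotFound(Exception):
    pass


def lock_table(catalog: dict, oid: int) -> str:
    """Hypothetical stand-in for LOCK TABLE ... IN ACCESS SHARE MODE."""
    if oid not in catalog:
        raise RelationNotFound(f"relation not found (OID {oid})")
    return catalog[oid]


def dump(catalog: dict, between_steps=lambda catalog: None) -> list:
    inventory = list(catalog)   # step 1: pg_dump lists the tables
    between_steps(catalog)      # another session may run DDL in this window
    return [lock_table(catalog, oid) for oid in inventory]  # step 2: lock each one


catalog = {86157003: "myschema.products", 86157010: "myschema.orders"}
print(dump(dict(catalog)))
try:
    dump(dict(catalog), between_steps=lambda c: c.pop(86157003))
except RelationNotFound as exc:
    print("pg_dump aborts:", exc)
```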

This is not an answer to your main question, but a caution regarding:
As far as I understand, pg_dump could run even if users are doing something with the table but it doesn't seem to be the case.
That assumption holds only if the application performs each logical change in a single transaction. pg_dump reads everything from one snapshot, so a change the application splits across several transactions can be captured half-done. I have known of applications that accomplish some tasks using more than one transaction.
I don't know exactly what the tasks were, or whether using multiple transactions was unavoidable, but dumps could only be trusted when the application was idle or, better yet, when the service was stopped.
For what those applications did, working around downtime or stopping services wasn't a big deal.
I don't know how you'd determine this behaviour without being told by the developers. Just something to consider.
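Here is a minimal sketch of that risk. It uses a hypothetical stock transfer that an application performs as two separate transactions. A snapshot taken between the two commits sees the moved quantity in neither warehouse, even though each individual transaction was consistent.

```python
def transfer_in_two_transactions(db: dict, src: str, dst: str, qty: int, snapshot_hook):
    db[src] -= qty          # transaction 1 commits
    snapshot_hook(db)       # pg_dump's snapshot could be taken here
    db[dst] += qty          # transaction 2 commits


stock = {"warehouse_a": 10, "warehouse_b": 0}   # hypothetical table contents
snapshots = []
transfer_in_two_transactions(stock, "warehouse_a", "warehouse_b", 4,
                             snapshot_hook=lambda db: snapshots.append(dict(db)))

print(sum(stock.values()))         # 10: consistent after both commits
print(sum(snapshots[0].values()))  # 6: the "dump" lost 4 units in flight
```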

Related

After using pg_dump behind pg_bouncer, the search_path appears to be altered and other clients are affected

My network looks like this:
App nodes --(many connections)--> pg_bouncer nodes --(few sessions)--> PostgreSQL nodes
So pg_bouncer multiplexes connections, giving the app nodes the illusion that they are all connected directly.
The issue comes when I launch pg_dump: a few milliseconds after the dump finishes, all app nodes fail with errors saying "relation xxxx does not exist", although the table or sequence is actually there. I'm pretty sure the cause is pg_bouncer manipulating the "search_path" variable, so that the app nodes no longer find the tables in my schema. This happens at dump time even if the dump file is neither imported nor executed.
Note: I've searched SO and Google, and there are many threads about the search_path in the generated file, but that's not what I'm asking about. I have no problem with the generated file; my issue is the pg_bouncer session that other clients are using, and I haven't found anything about this.
The most obvious workaround would be to set the search_path manually in the app, but beware of a fallacy: it's useless for the app to do it once at the beginning, since it may be assigned a different pg_bouncer session at the next transaction. And I can't keep setting it all the time.
The next most obvious workaround would be to set it back to the intended value immediately after launching pg_dump, but there's a race condition here, and the other nodes are quick enough that I fear they would still fail.
Is there a way to stop pg_dump from manipulating this variable, or to make sure it resets it before exiting?
(Also, I'm assuming that pg_dump and search_path are the cause of this; can you suggest a way to confirm it? All the evidence I have is the errors a few milliseconds later, and the set search_path instruction in the generated file, which produces the same errors if executed.)
Thanks
Don't connect pg_dump through pgbouncer with transaction pooling. Just change the port number so it connects directly to the database. pg_dump is incompatible with transaction pooling.
You might be able to make it work anyway by setting server_reset_query_always = 1.
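The following toy model shows why transaction pooling lets one client's session setting leak to another. It is not pgbouncer code. A single shared server connection is handed to whichever client runs the next transaction. A session-level SET (as opposed to SET LOCAL) stays on that connection. reset_always=True stands in for server_reset_query_always = 1 running the reset query (DISCARD ALL by default) when the connection is released. The question reports a set search_path instruction in pg_dump's output; the model assumes pg_dump applies such a change to its own session on the pooled connection.

```python
from dataclasses import dataclass, field

DEFAULTS = {"search_path": '"$user", public'}


@dataclass
class ServerConnection:
    settings: dict = field(default_factory=lambda: dict(DEFAULTS))


class TransactionPool:
    """Toy model: one server connection shared by every client, one transaction at a time."""

    def __init__(self, reset_always: bool = False):
        self.server = ServerConnection()
        self.reset_always = reset_always

    def run_transaction(self, session_sets=None) -> dict:
        conn = self.server                          # whichever client comes next gets it
        conn.settings.update(session_sets or {})    # a session-level SET persists on conn
        seen = dict(conn.settings)                  # what this transaction observes
        if self.reset_always:                       # models server_reset_query_always = 1
            conn.settings = dict(DEFAULTS)          # i.e. DISCARD ALL on release
        return seen


pool = TransactionPool()
pool.run_transaction({"search_path": ""})        # the dump's session clears search_path
print(repr(pool.run_transaction()["search_path"]))  # an app node now sees ''

safe = TransactionPool(reset_always=True)
safe.run_transaction({"search_path": ""})
print(safe.run_transaction()["search_path"])     # back to the default
```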

Dropping index concurrently PostgreSQL 9.1

DROP INDEX CONCURRENTLY first appeared in PostgreSQL 9.2, but my server runs 9.1. Unfortunately, a plain DROP INDEX locks my app for an unpredictable amount of time, which is very painful in production.
Is there a way to drop an index concurrently?
No, there's no simple workaround; if there were, DROP INDEX CONCURRENTLY would have been much less likely to be added in 9.2.
However, you can kill all sessions to force the drop to happen promptly.
What you want to avoid is the drop waiting on a partially acquired exclusive lock: that lock prevents other transactions from proceeding, but it doesn't let the drop proceed either while it waits for other transactions to finish and release their share locks. The surest way to prevent that is to kill all concurrent sessions.
So, in one session:
DROP INDEX my_index;
In another session, as a superuser, terminate all other sessions using the following untested query, which you'll need to adapt appropriately and test before use:
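-- PostgreSQL 9.1: pg_stat_activity exposes procpid and current_query
-- (renamed to pid and query in 9.2).
SELECT pg_terminate_backend(procpid)
FROM pg_stat_activity
WHERE procpid <> (
        -- the session running the DROP; if no row matches, nothing is terminated
        SELECT procpid
        FROM pg_stat_activity
        WHERE current_query = 'DROP INDEX my_index;')
  AND procpid <> pg_backend_pid();  -- don't terminate this session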
Your well-written and well-tested application will immediately reconnect and retry its queries without bothering the user or getting upset, because it knows that transient errors are something it has to cope with, so it runs all its database access in retry loops. If it isn't well written, you'll discover that through a flood of user-visible error messages. If it's really badly written, you'll have to restart it before it gets its head together, but it's rare to see apps that are quite that broken.
This is a heavy-handed approach. You can be gentler about it by joining against pg_locks and only terminating sessions that actually hold a lock on the relation you're interested in or the index you wish to modify. You get to enjoy writing that query, because my interest in working around limitations of older database versions is limited.
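The pg_locks-based variant described above could start from a filter like the one below. It assumes the rows were already fetched with SELECT locktype, relation, pid, mode, granted FROM pg_locks. It also assumes you pass the OIDs of both the index and its table (for example via 'my_index'::regclass::oid), since a plain DROP INDEX needs an exclusive lock on the parent table as well. The OIDs and pids in the sample are hypothetical.

```python
def pids_holding_locks(rows, relation_oids, exclude):
    """Return sorted pids (not in exclude) holding a granted lock on any of relation_oids.

    rows: dicts as returned by
      SELECT locktype, relation, pid, mode, granted FROM pg_locks;
    """
    return sorted({
        row["pid"]
        for row in rows
        if row["locktype"] == "relation"
        and row["relation"] in relation_oids
        and row["granted"]                 # holders only; queued waiters are not the blockers
        and row["pid"] not in exclude
    })


# Hypothetical sample: 16384 = the table, 16390 = my_index, 999 = the DROP session.
rows = [
    {"locktype": "relation", "relation": 16384, "pid": 101, "mode": "AccessShareLock", "granted": True},
    {"locktype": "relation", "relation": 16390, "pid": 999, "mode": "AccessExclusiveLock", "granted": False},
    {"locktype": "relation", "relation": 20000, "pid": 102, "mode": "RowExclusiveLock", "granted": True},
    {"locktype": "transactionid", "relation": None, "pid": 103, "mode": "ExclusiveLock", "granted": True},
]
print(pids_holding_locks(rows, {16384, 16390}, exclude={999}))  # [101]
```

The resulting pids are the candidates to pass to pg_terminate_backend.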

Online backup blocking truncate table

It's documented that in DB2 the TRUNCATE statement is not compatible with online backup, because it gets a Z lock on the table and prevents an online backup from running concurrently.
The lock wait happens when a truncate tries to get a shared lock on an internal online backup object.
Since this is by design in the product, I will have to go for workarounds, so this thread is not about a solution but about why they can't work together. I didn't find a reasonable explanation of why DB2 has such a limitation.
Any insights?
Thanks,
Luciano Moreira
from http://www.ibm.com/developerworks/data/library/techarticle/dm-0501melnyk/
When a table holds a Z lock, no concurrent application can read or update data in that table.
So a Z lock grants exclusive access to a table, denying other applications both read and write access.
from http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.sql.ref.doc/doc/r0053474.html
Exclusive Access: No other session can have a cursor open on the table, or a lock held on the table (SQLSTATE 25001).
from https://sites.google.com/site/umeshdanderdbms/difference-between-truncate-and-delete
Delete is a logged operation, whereas Truncate makes the table empty at the container level.
(Logged operation: DML operations are written to the logs (the redo log in Oracle, the transaction log in DB2, etc.), where they are kept for commit or rollback.)
This is the most interesting part. Truncate just 'forgets' the content of the table, whereas delete removes rows one by one, processing all the triggers, bells, and whistles. Therefore, when you truncate, all open reading cursors become invalid. To prevent stupid stuff like that, you can only completely empty a table when nobody else is accessing it. Online backup obviously needs to read the table, so it is not possible to have both accessing the same table at the same time.

How to profile azure SQL deadlocks?

I know this question has been asked here before, but 1) it's relatively old and 2) it didn't help me much.
I am running into a relatively large number of deadlocks with a few operations on my database. The setup is as follows:
Tables:
Table A with foreign key into Table B.
Operations:
Insert into table A
Insert into table B
Update row in table B
Delete row in table B
Delete row in table A
Problem:
These operations can happen in essentially any order because I have multiple worker roles, so they must be idempotent; however, each worker role works with a different primary key from table A. I am still trying to wrap my head around the concept of locks on tables, and from what I understand, any delete on A will first lock table B, delete the relevant rows there, and then delete the row from A. I currently assume that is an atomic operation, with no window for other locks to be taken between locking table B and locking table A, because I can't imagine a way around that.
I am currently able to catch an exception in Microsoft Visual Studio of the following form:
Transaction (Process ID xxx) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
This exception seems like it can happen on any of the above operations.
My question is: how do I know which locks or transactions are causing the deadlock? Does anyone know of queries that would be useful AFTER we get the exception?
sys.event_log is the answer here.
It lives in your server's master database and should contain an entry for every deadlock graph your database has hit in the last month.
Armed with the deadlock graph, you can follow one of the many tutorials on SQL Server deadlock graph debugging.
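A deadlock graph records which process is waiting on which, and a deadlock is a cycle in that wait-for graph. The following conceptual sketch is not SQL Server's detector. It finds one such cycle with a depth-first search; the spid numbers are hypothetical.

```python
from __future__ import annotations


def find_deadlock(waits_for: dict[str, set[str]]) -> list[str] | None:
    """Return one cycle in the wait-for graph (a deadlock), or None."""
    WHITE, GREY, BLACK = 0, 1, 2        # unvisited, on current path, finished
    color: dict[str, int] = {}
    path: list[str] = []

    def visit(node: str) -> list[str] | None:
        color[node] = GREY
        path.append(node)
        for nxt in sorted(waits_for.get(node, ())):
            state = color.get(nxt, WHITE)
            if state == GREY:                      # back edge: we closed a cycle
                return path[path.index(nxt):]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in sorted(waits_for):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# spid 51 holds table B and wants A; spid 52 holds A and wants B; spid 53 just waits.
waits_for = {"spid 51": {"spid 52"}, "spid 52": {"spid 51"}, "spid 53": {"spid 51"}}
print(find_deadlock(waits_for))  # ['spid 51', 'spid 52']
```

Reading an actual deadlock graph means finding this cycle and then the resources each process holds and requests along it.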
Currently, profiling tools for SQL Azure are practically nonexistent.
The locking problem shouldn't differ much between standard SQL Server and SQL Azure, so I would suggest trying to reproduce the problem in the 'old' world using standard techniques such as good old SQL Server Profiler.
If that approach doesn't prove fruitful, a dirty solution could be to add catch/retry logic.
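The catch/retry idea can be sketched as follows. TransientDBError is a hypothetical stand-in for whatever your driver raises for a retryable failure; for SQL Server, the "chosen as the deadlock victim" error is number 1205. The whole transaction is rerun, which is why the operations need to be idempotent, as the question already requires. Backoff with jitter keeps competing workers from retrying in lockstep.

```python
import random
import time


class TransientDBError(Exception):
    """Hypothetical stand-in for a retryable error, e.g. SQL Server error 1205."""


def with_retries(transaction, attempts=5, base_delay=0.05, sleep=time.sleep):
    """Run transaction(); on a transient error, back off and rerun the whole thing."""
    for attempt in range(1, attempts + 1):
        try:
            return transaction()
        except TransientDBError:
            if attempt == attempts:
                raise  # give up and surface the error to the caller
            # exponential backoff with jitter so deadlocked workers don't collide again
            sleep(base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.5))


calls = 0


def flaky_insert():
    global calls
    calls += 1
    if calls < 3:
        raise TransientDBError("chosen as deadlock victim")
    return "committed"


print(with_retries(flaky_insert, sleep=lambda s: None), calls)  # committed 3
```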
I ran into similar issues recently.
Try adding the WITH (UPDLOCK) hint to your updates.
To try to find the root cause:
Start by running just a single worker role.
Then check:
Are you locking at the right level: table lock, page lock, or row lock?
Are you releasing the locks?
Is your system designed so that all edits to the same row are done by the same machine?
There is a blog post on finding blocking queries here: http://blogs.msdn.com/b/sqlazure/archive/2010/08/13/10049896.aspx
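The last check (all edits to the same row done by the same machine) can be enforced by routing each operation by its primary key. Below is a minimal sketch; the keys and worker count are hypothetical. This reduces contention on the same rows, but it does not by itself rule out deadlocks caused by page or index locks.

```python
import zlib
from collections import defaultdict


def worker_for(primary_key: str, n_workers: int) -> int:
    """Map a key to a fixed worker. zlib.crc32 is stable across processes,
    unlike Python's built-in hash() for strings, which is randomized per process."""
    return zlib.crc32(primary_key.encode("utf-8")) % n_workers


def partition(operations, n_workers):
    """Group (primary_key, operation) pairs so each key is handled by exactly one worker."""
    queues = defaultdict(list)
    for key, op in operations:
        queues[worker_for(key, n_workers)].append((key, op))
    return dict(queues)


ops = [("a-17", "insert A"), ("a-17", "insert B"), ("a-42", "delete A"), ("a-17", "delete B")]
for worker, queue in sorted(partition(ops, n_workers=3).items()):
    print(worker, queue)
```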

How to rollback an update in PostgreSQL

While editing some records in my PostgreSQL database using SQL in the terminal (on Ubuntu Lucid), I made a wrong update.
Instead of -
update mytable set start_time='13:06:00' where id=123;
I typed -
update mytable set start_time='13:06:00';
So all records now have the same start_time value.
Is there a way to undo this change? There are 500+ records in the table, and I do not know what the start_time value for each record was.
Is it lost forever?
I'm assuming it was a transaction that's already committed? If so, that's what "commit" means: you can't go back.
Some data may be recoverable if you're lucky. Stop the database NOW.
Here's an answer I wrote on the same topic earlier. I hope it's helpful.
This might help too: Recoved deleted rows in postgresql.
Unless the data is absolutely critical, just restore from backups, it'll be lots easier and less painful. If you didn't have backups, consider yourself soundly thwacked.
If you catch the mistake and immediately bring down any applications using the database and take it offline, you can potentially use Point-in-Time Recovery (PITR) to replay your Write Ahead Log (WAL) files up to, but not including, the moment when the errant transaction was made. This would return the database to the state it was in prior, thus effectively 'undoing' that transaction.
As an approach for a production application database it has a number of obvious limitations, but there are circumstances in which PITR may be the best option available, especially when critical data loss has occurred. However, it is of no value if archiving was not already configured before the corruption event.
https://www.postgresql.org/docs/current/static/continuous-archiving.html
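The following conceptual sketch illustrates "replay up to, but not including": start from the base backup, apply the logged changes in order, and stop at the errant transaction. In PostgreSQL itself, the stopping point is chosen with recovery settings such as recovery_target_time or recovery_target_xid, together with recovery_target_inclusive = false. The row values and transaction IDs here are made up, and real WAL records are physical page changes rather than row values.

```python
def replay(base_backup: dict, wal: list, stop_before_xid: int) -> dict:
    """Apply logged changes to a copy of the base backup, stopping just before
    the transaction we want to undo (like recovery_target_inclusive = false)."""
    state = dict(base_backup)
    for xid, row_id, new_value in wal:
        if xid == stop_before_xid:
            break
        state[row_id] = new_value
    return state


base = {121: "09:00:00", 122: "10:30:00", 123: "12:00:00"}  # hypothetical start_time values
wal = [
    (900, 122, "11:15:00"),                                                    # a legitimate edit
    (901, 121, "13:06:00"), (901, 122, "13:06:00"), (901, 123, "13:06:00"),  # the errant UPDATE
]
print(replay(base, wal, stop_before_xid=901))
# {121: '09:00:00', 122: '11:15:00', 123: '12:00:00'}
```

Anything committed after the errant transaction is also lost by this approach, which is one of the limitations mentioned above.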
Similar capabilities exist with other relational database engines.