Controlling duration of PostgreSQL lock waits - postgresql

I have a table called deposits
When a deposit is made, the table is locked, so the query looks something like:
SELECT * FROM deposits WHERE id=123 FOR UPDATE
I assume FOR UPDATE is locking the table so that we can manipulate it without another thread stomping on the data.
The problem occurs, though, when other deposits try to get the lock on the table. What happens is, somewhere in between locking the table and calling psql_commit() something is failing and keeping the lock for a stupidly long amount of time. There are a couple of things I need help addressing:
Subsequent queries trying to get the lock should fail, I have tried achieving this with NOWAIT but would prefer a timeout method (because it may be ok to wait, just not wait for a 'stupid amount of time')
Ideally I would head this off at the pass, and have my initial query only hold the lock for a certain amount of time, is this possible with postgresql?
Is there some other magic function I can tack onto the query (similar to NOWAIT) which will only wait for the lock for 4 seconds before failing?
Due to the painfully monolithic spaghetti code nature of the code base, it's not simply a matter of changing global configs, it kinda needs to be a per-query based solution
Thanks for your help guys, I will keep poking around but I haven't had much luck. Is this a non-existent function of psql, because I found this: http://www.postgresql.org/message-id/40286F1F.8050703#optusnet.com.au

I assume FOR UPDATE is locking the table so that we can manipulate it without another thread stomping on the data.
Nope. FOR UPDATE locks only those rows, so that another transaction that attempts to lock them (with FOR SHARE, FOR UPDATE, UPDATE or DELETE) blocks until your transaction commits or rolls back.
If you want a whole table lock that blocks inserts/updates/deletes you probably want LOCK TABLE ... IN EXCLUSIVE MODE.
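For example, taking the deposits table from the question, a table-level lock for the duration of a transaction would look something like:

```sql
BEGIN;
-- Blocks concurrent INSERT/UPDATE/DELETE on the table (but not plain
-- SELECTs) until this transaction commits or rolls back.
LOCK TABLE deposits IN EXCLUSIVE MODE;
-- ... manipulate the table ...
COMMIT;
```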
Subsequent queries trying to get the lock should fail, I have tried achieving this with NOWAIT but would prefer a timeout method (because it may be ok to wait, just not wait for a 'stupid amount of time')
See the lock_timeout setting. This was added in 9.3 and is not available in older versions.
Crude approximations for older versions can be achieved with statement_timeout, but that can lead to statements being cancelled unnecessarily. If statement_timeout is 1s and a statement waits 950ms on a lock, it might then get the lock and proceed, only to be immediately cancelled by a timeout. Not what you want.
There's no query-level way to set lock_timeout, but you can and should just:
SET LOCAL lock_timeout = '1s';
after you BEGIN a transaction.
Ideally I would head this off at the pass, and have my initial query only hold the lock for a certain amount of time, is this possible with postgresql?
There is a statement timeout, but locks are held at transaction level. There's no transaction timeout feature.
If you're running single-statement transactions you can just set a statement_timeout before running the statement to limit how long it can run for. This isn't quite the same thing as limiting how long it can hold a lock, though, because it might wait 900ms of an allowed 1s for the lock, only actually hold the lock for 100ms, then get cancelled by the timeout.
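As a sketch of that single-statement case (the processed column here is illustrative, not from the question):

```sql
-- Cap this session's statements at 1 second of total runtime,
-- including any time spent waiting on locks.
SET statement_timeout = '1s';
UPDATE deposits SET processed = true WHERE id = 123;
RESET statement_timeout;
```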
Is there some other magic function I can tack onto the query (similar to NOWAIT) which will only wait for the lock for 4 seconds before failing?
No. You must:
BEGIN;
SET LOCAL lock_timeout = '4s';
SELECT ...;
COMMIT;
Due to the painfully monolithic spaghetti code nature of the code base, it's not simply a matter of changing global configs, it kinda needs to be a per-query based solution
SET LOCAL is suitable, and preferred, for this.
There's no way to do it in the text of the query, it must be a separate statement.
The mailing list post you linked to is a proposal for an imaginary syntax that was never implemented (at least in a public PostgreSQL release) and does not exist.
In a situation like this you may want to consider "optimistic concurrency control", often called "optimistic locking". It gives you greater control over locking behaviour at the cost of increased rates of query repetition and the need for more application logic.
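A minimal sketch of optimistic locking, assuming you add a version column to the table (the column name and values here are illustrative):

```sql
-- Read the row and remember its version.
SELECT amount, version FROM deposits WHERE id = 123;

-- Later, update only if nobody changed the row in the meantime.
UPDATE deposits
SET amount = amount + 100, version = version + 1
WHERE id = 123 AND version = 7;  -- the version you read earlier
-- If the reported row count is 0, someone else got there first:
-- re-read the row and retry.
```

No long-lived lock is held between the read and the write; the cost is that the application must detect the zero-rowcount case and repeat the work.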

Related

PostgreSQL trigger an event on table update

I'm new to PostgreSQL and I would like to know or have some leads on:
Emit an event (call an API) when a table is updated
My problem is: I have an SSO that inserts a row into an event table when a user does something (login, register, update info). I need to consume these inserts in another solution (a loyalty program) in real time.
For now I have in mind to query the table every minute (in Node.js) and compare the size of the table with its size from the previous minute. I think that is not the right way :)
You can do that with a trigger in principle. If the API is external to the database, you'd need a trigger function written in C or a language like PL/Perl or PL/Python that can perform the action you need.
However, unless this action can be guaranteed to be fast, it may not be a good idea to run it in a trigger. The trigger runs in the same transaction as the triggering statement, so if your trigger happens to run for a long time, you end up with a long database transaction. This has two main disadvantages:
Locks are held for a long time, which harms concurrency and hence performance, and also increases the risk of deadlocks.
Autovacuum cannot remove dead rows that were still active when the transaction started, which can lead to excessive table bloat on busy tables.
To avoid that risk, it is often better to use a queuing system: The trigger creates an entry in the queue, which is a fast action, and worker processes read and process these queue entries asynchronously outside the database.
Implementing a queue in a database is notoriously difficult, so you may want to look for existing solutions.
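A rough sketch of the trigger half of such a queue, assuming an events table and a separate event_queue table (all the names here are illustrative):

```sql
CREATE TABLE event_queue (
    id        bigserial PRIMARY KEY,
    event_id  bigint NOT NULL,
    queued_at timestamptz NOT NULL DEFAULT now()
);

CREATE FUNCTION enqueue_event() RETURNS trigger AS $$
BEGIN
    -- Fast and transactional: just record the new event for later.
    INSERT INTO event_queue (event_id) VALUES (NEW.id);
    -- Optionally wake up listening workers immediately.
    NOTIFY new_event;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER events_enqueue
AFTER INSERT ON events
FOR EACH ROW EXECUTE FUNCTION enqueue_event();
```

A worker process outside the database would then LISTEN for new_event and drain event_queue, for example with SELECT ... FOR UPDATE SKIP LOCKED (available since 9.5) so several workers can run in parallel. (EXECUTE FUNCTION is the 11+ spelling; older versions use EXECUTE PROCEDURE.)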

How to avoid being blocked by deadlocks?

Can I write an UPDATE statement that will simply not bother executing if there's a deadlock?
I have a small, but frequently updated table.
This statement is run quite frequently on it....
UPDATE table_a SET lastChangedTime = 'blah' WHERE pk = 1234;
Where pk is the primary key.
Every now and again this statement gets blocked. That's not in itself a big deal; the issue is that each time there's a lock it seems to take a minute or two for Postgres to sort itself out, and I can lose a lot of data.
table_a is very volatile, and lastChangedTime gets altered all the time, so rather than occasionally having to wait two minutes for the UPDATE to get executed, I'd rather it just didn't bother. Ok, my data might not be as up-to-date as I'd like for this one record, but at least I wouldn't have locked the whole table for 2 minutes.
Update following comments:
The application interacts very simply with the database; it only issues simple, one-line UPDATE and INSERT statements and commits each one immediately. One of the issues causing me a lot of head scratching is how something can work a million times without problem, and then just fail on another record that appears to be identical to all the others.
Final suggestion/question.....
The UPDATE statement is being invoked from a C# application. If I change the 'command timeout' to a very short value - say 1 millisecond would that have the desired effect? or might it end up clogging up the database with lots of broken transactions?
To avoid waiting for locks, first run
SELECT 1 FROM table_a WHERE pk = 1234 FOR UPDATE NOWAIT;
If there is a lock on the row, the statement will fail immediately, and you can go on working on something different.
Mind that the SELECT ... FOR UPDATE statement has to be in the same database transaction as your UPDATE statement.
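Putting it together, the pattern would look something like this (handling the error raised on a lock conflict is up to the application):

```sql
BEGIN;
-- Fails immediately with an error if the row is already locked.
SELECT 1 FROM table_a WHERE pk = 1234 FOR UPDATE NOWAIT;
-- Only reached if the lock was acquired.
UPDATE table_a SET lastChangedTime = 'blah' WHERE pk = 1234;
COMMIT;
```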
As general advice, you should use shorter transactions, which will reduce the length of lock waits and the risk of deadlock.

In Postgres, what does pg_stat_database.xact_commit actually mean?

I'm trying to understand SELECT xact_commit FROM pg_stat_database; According to the docs, it is the "Number of transactions in this database that have been committed". But I turned on logging of all queries (log_min_duration_statement = 0) and it seems there are other things besides queries that can affect xact_commit. For example, connecting a psql client or typing BEGIN; will increase it by various values. There is a step in my application that runs a single query (as confirmed by the log), but consistently increases the counter by 15-20. Does anyone know more specifically what is counted in xact_commit, or if there is a way to count only actual queries?
pg_stat_database.xact_commit really is the number of commits in the database (remember that every statement that is not run in a transaction block actually runs in its own little transaction, so it will cause a commit).
The mystery that remains to be solved is why you see more commits than statements, which seems quite impossible (For example, BEGIN starts a transaction, so by definition it cannot increase xact_commit).
The solution is probably that database activity statistics are collected asynchronously: they are sent to the statistics collector process via a UDP socket, and the statistics collector eventually updates the statistics.
So my guess is that the increased transaction count you see is actually from earlier activities.
Try keeping the database absolutely idle for a while and then try again, then you should see a slower increase.
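To watch the counter for just the database you are connected to, you can run something like:

```sql
-- Current commit count for this database.
SELECT xact_commit
FROM pg_stat_database
WHERE datname = current_database();
```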

postgresql concurrent queries as stored procedures

I have 2 stored procedures that interact with the same datatables.
The first one executes for several hours and the second one is instant.
So if I run the first one, and after that the second one (on a second connection), the second procedure will wait for the first one to end.
It is harmless to my data if both run at the same time; how can I do that?
The fact that the shorter query is blocked while being on a second connection suggests that the longer query is getting an exclusive lock on the table during the query.
That suggests it is doing writes, as if they were both reads there shouldn't be any locking issues. PgAdmin can show what locks are active during the longer query and also if the shorter query is indeed blocked on the longer one.
If the longer query is indeed doing writes, it's possible that you may be able to reduce the lock contention -- by chunking it, for example, which could allow readers in between chunked updates/inserts -- but if it's an operation that requires an exclusive write lock, then it will block everybody until it's done.
It's also possible that you may be able to optimize the query such that it needs to be a lower-level lock that isn't exclusive, but that would all depend on the specifics of what the query is doing and your data.
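To see which session is blocking which, you can query the pg_locks view while the long query is running; a common diagnostic pattern looks something like:

```sql
-- Show ungranted lock requests alongside sessions holding
-- a granted lock on the same relation.
SELECT blocked.pid   AS blocked_pid,
       blocking.pid  AS blocking_pid,
       blocked.locktype,
       blocked.mode  AS wanted_mode,
       blocking.mode AS held_mode
FROM pg_locks blocked
JOIN pg_locks blocking
  ON blocking.relation = blocked.relation
 AND blocking.granted
WHERE NOT blocked.granted;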

PostgreSQL Lock Row on indefinitely time

I want to let a user lock one row for an indefinite time while he works with it, and he must unlock it when done, so that no other user can lock this row for themselves. Is it possible to do this at the database level?
You can do it with a long-lived transaction, but there'll be performance issues with that. This sounds like more of a job for optimistic concurrency control.
You can just open a transaction and do a SELECT 1 FROM mytable WHERE <clause to match the row> FOR UPDATE;, then keep the transaction open until you're done. The problem with this is that it can cause issues with vacuum that result in table and index bloat, where tables fill up with deleted data and indexes fill up with entries pointing to obsolete blocks.
It'd be much better to use an advisory lock. You still have to hold the connection the holds the lock open, but it doesn't have to keep an open idle transaction, so it's much lower impact. Transactions that wish to update the row must explicitly check for a conflicting advisory lock, though, otherwise they can just proceed as if it wasn't locked. This approach also scales poorly to lots of tables (due to limited advisory lock namespace) or lots of concurrent locks (due to number of connections).
You can use a trigger to check for the advisory lock and wait for it if you can't make sure your client apps will always get the advisory lock explicitly. However, this can create deadlock issues.
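A sketch of the advisory-lock approach, using the row's primary key as the lock key (the key value here is illustrative):

```sql
-- Try to take a session-level advisory lock on row 42's key;
-- returns true if acquired, false if someone else already holds it.
SELECT pg_try_advisory_lock(42);

-- ... user works with the row ...

-- Release it explicitly when done (session-level advisory locks
-- are also released automatically when the connection closes).
SELECT pg_advisory_unlock(42);
```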
For that reason, the best approach is probably to have a locked_by field that records a user ID, and a locked_time field that records when it was locked. Do it at the application level and/or with triggers. To deal with concurrent attempts to obtain the lock you can use optimistic concurrency control techniques, where the WHERE clause on the UPDATE that sets locked_by and locked_time will not match if someone else gets there first, so the rowcount will be zero and you'll know you lost the race for the lock and have to re-check. That WHERE clause usually tests locked_by and locked_time. So you'd write something like:
UPDATE t
SET locked_by = 'me', locked_time = current_timestamp
WHERE locked_by IS NULL AND locked_time IS NULL
AND id = [ID of row to update];
(This is a simplified optimistic locking mode for grabbing a lock, where you don't mind if someone else jumped in and did an entire transaction. If you want stricter ordering, you use a row-version column or you check that a last_modified column hasn't changed.)