PostgreSQL advisory locks are not working with Doctrine's DBAL

I'm experiencing a very strange behavior when trying to use advisory locks in Doctrine's DBAL.
I have a Symfony 2 application in which I want to obtain an advisory lock for some entity. I'm making the following query to obtain the lock:
SELECT pg_try_advisory_lock(83049, 5)
Via the following code in PHP:
/** @var \Doctrine\DBAL\Connection */
protected $connection;

public function lock()
{
    return $this->connection->fetchColumn(
        'SELECT pg_try_advisory_lock(?, ?)',
        [$this->getTableOid(), $this->entity->getLockingId()]
    );
}
I've created the following script to test the concurrency:
// Obtaining the lock.
$locker->lock();
// Doing something for ten seconds.
sleep(10);
However, when I run it concurrently, it looks like every instance successfully gets the lock. Also, after the request terminates, the lock appears to be released automatically, even though I never called unlock().
Why is it behaving that way?
Does Doctrine use a single connection for all requests?
Does Doctrine release the locks automatically after the script terminates?

13.3.4. Advisory Locks
PostgreSQL provides a means for creating locks that have
application-defined meanings. These are called advisory locks, because
the system does not enforce their use — it is up to the application to
use them correctly. Advisory locks can be useful for locking
strategies that are an awkward fit for the MVCC model. For example, a
common use of advisory locks is to emulate pessimistic locking
strategies typical of so-called "flat file" data management systems.
While a flag stored in a table could be used for the same purpose,
advisory locks are faster, avoid table bloat, and are automatically
cleaned up by the server at the end of the session.
There are two ways to acquire an advisory lock in PostgreSQL: at
session level or at transaction level. Once acquired at session level,
an advisory lock is held until explicitly released or the session
ends. Unlike standard lock requests, session-level advisory lock
requests do not honor transaction semantics: a lock acquired during a
transaction that is later rolled back will still be held following the
rollback, and likewise an unlock is effective even if the calling
transaction fails later. A lock can be acquired multiple times by its
owning process; for each completed lock request there must be a
corresponding unlock request before the lock is actually released.
Transaction-level lock requests, on the other hand, behave more like
regular lock requests: they are automatically released at the end of
the transaction, and there is no explicit unlock operation. This
behavior is often more convenient than the session-level behavior for
short-term usage of an advisory lock. Session-level and
transaction-level lock requests for the same advisory lock identifier
will block each other in the expected way. If a session already holds
a given advisory lock, additional requests by it will always succeed,
even if other sessions are awaiting the lock; this statement is true
regardless of whether the existing lock hold and new request are at
session level or transaction level.
http://www.postgresql.org/docs/9.1/static/explicit-locking.html
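The quoted passage answers the third question: advisory locks belong to the session, and a typical PHP setup opens its own database connection (session) per request, so the lock goes away when the script's connection closes. A minimal SQL sketch of the two modes, reusing the key pair from the question:

```sql
-- Session level: survives COMMIT/ROLLBACK, held until pg_advisory_unlock()
-- or until the session (connection) ends.
SELECT pg_try_advisory_lock(83049, 5);   -- true if acquired, false otherwise
SELECT pg_advisory_unlock(83049, 5);

-- Transaction level: no unlock function; released automatically at
-- COMMIT or ROLLBACK.
BEGIN;
SELECT pg_try_advisory_xact_lock(83049, 5);
COMMIT;  -- lock released here
```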

Related

When are locks released in Postgres

I have some problems understanding locks. Naturally, locks are released when everything goes smoothly, but I'm unsure of the exact logic for when locks are released once things break down. How long can a lock persist? Can I kill all processes and thereby release all locks? Do I have to explicitly call ROLLBACK?
In general, locks are released when the transaction ends with COMMIT or ROLLBACK.
There are exceptions:
1. Once acquired, a lock is normally held till end of transaction. But if
a lock is acquired after establishing a savepoint, the lock is
released immediately if the savepoint is rolled back to. This is
consistent with the principle that ROLLBACK cancels all effects of the
commands since the savepoint. The same holds for locks acquired within
a PL/pgSQL exception block: an error escape from the block releases
locks acquired within it.
2.
There are two ways to acquire an advisory lock in PostgreSQL: at
session level or at transaction level. Once acquired at session level,
an advisory lock is held until explicitly released or the session
ends.
Killing backend processes will release the locks, but it is not the right way to release them: it should only be used as a last resort if you cannot end the client application cleanly.
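The savepoint exception can be observed directly in psql; `some_table` here is any hypothetical table:

```sql
BEGIN;
SAVEPOINT sp;
LOCK TABLE some_table IN EXCLUSIVE MODE;  -- acquired after the savepoint
ROLLBACK TO SAVEPOINT sp;                 -- the table lock is released here
COMMIT;
```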

maxTransactionLockRequestTimeoutMillis with concurrent transactions

I'm trying to get a better understanding of the lock acquisition behavior on MongoDB transactions. I have a scenario where two concurrent transactions try to modify the same document. Since one transaction will get the write lock on the document first, the second transaction will run into a write conflict and fail.
I stumbled upon the maxTransactionLockRequestTimeoutMillis setting as documented here: https://docs.mongodb.com/manual/reference/parameters/#param.maxTransactionLockRequestTimeoutMillis and it states:
The maximum amount of time in milliseconds that multi-document transactions should wait to acquire locks required by the operations in the transaction.
However, changing this value does not seem to have an impact on the observed behavior with a write conflict. Transaction 2 does not seem to wait for the lock to be released again but immediately runs into a write conflict when another transaction holds the lock (other than concurrent writes outside a transaction which will block and wait for the lock).
Do I understand correctly that the time configured in maxTransactionLockRequestTimeoutMillis does not include actually acquiring the write lock on the document, or is there something wrong with my tests?

In this case from Nygard's "Release it!" why do deadlocks happen?

I'm reading over and over this paragraph from Michael Nygard's book "Release it!" and I still don't understand why exactly deadlocks can happen:
Imagine 100,000 transactions all trying to update the same row of the
same table in the same database. Somebody is bound to get deadlocked.
Once a single transaction with a lock on the user’s profile got hung
(because of the need for a connection from a different resource pool),
all the other database transactions on that row got blocked. Pretty
soon, every single request-handling thread got used up with these
bogus logins. As soon as that happens, the site is down.
When he says "because of the need for a connection from a different resource pool", is this inside the DB engine? What is this other resource pool and why would a connection from this other resource pool be needed?
Then, "every single request-handling thread" refers already not to DB threads, but to application threads, right? And they hung because they're waiting for the DB transactions (that are already hung) to finish?
The problem is in that applications interface with a LOT of different systems, any of which can run in parallel, have internal or external locks, and depend on yet more systems.
A simple example of a deadlock is when two processes each need to acquire the same two locks at the same time to proceed, but can't agree on who goes first and in which order (which is usually what the locks are for in the first place, so it's a chicken-and-egg problem, not exactly trivial). Say processes A and B each need to acquire two locks, #1 and #2, to do their work. While A is locking #1, B is locking #2; then A tries to lock #2 and B tries to lock #1. That's a deadlock: someone has to give in for any work to get done.
In real life, let's say you're running multiple instances of your web application, to be able to serve multiple incoming client requests (e.g. web browsers) at the same time. It doesn't matter if those are threads, processes or coroutines. Instances of your application can hang if they require locks on two database rows. Or they can hang because in addition to a database lock, they also need a lock on a file in the file system. Or they can hang because they need a lock on a file in the file system and they are using a third party remote REST API which also has locks of its own. Or because of infinite other reasons including all of the above simultaneously.
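The two-lock scenario can be sketched in Python (the names are illustrative): with a consistent global acquisition order (always lock1 before lock2) both workers complete, whereas if the two threads took the locks in opposite orders they could block each other forever, exactly as described above.

```python
import threading

lock1 = threading.Lock()
lock2 = threading.Lock()
results = []

def worker(name):
    # Ordered acquisition: every thread takes lock1 first, then lock2.
    # This rules out the circular wait (A holds #1 wants #2, B holds #2
    # wants #1) that produces a deadlock.
    with lock1:
        with lock2:
            results.append(name)

threads = [threading.Thread(target=worker, args=(n,)) for n in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # both workers completed: ['A', 'B']
```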

Curator InterProcessMutex vs InterProcessSemaphoreMutex

What's the difference between InterProcessMutex and InterProcessSemaphoreMutex? The docs say InterProcessSemaphoreMutex is the same as InterProcessMutex except that it's not reentrant, but I don't know what reentrant means.
I'm the main author of Apache Curator. Irrespective of what the docs may or may not say I'd like, for the record, to give the exact use cases for each of the two classes.
InterProcessMutex
InterProcessMutex should be used when you need to be able to lock in a re-entrant manner. This means that a given thread is said to "own" the lock once acquired and can lock it again if needed. This is useful if the thread passes the lock object around to other methods that need not be concerned if the lock has been acquired or not. Note that this also means that only the owning thread can release the lock. Here's an example:
InterProcessMutex lock = new InterProcessMutex(...);
if ( !lock.acquire(...) ) ... // if acquire failed: throw, return, etc
try {
    doWork(lock); // doWork() can safely call lock.acquire() again on the lock
} finally {
    lock.release();
}
Once acquired, if the lock is released in a different thread than the one used to acquire it, an IllegalMonitorStateException is thrown.
InterProcessSemaphoreMutex
InterProcessSemaphoreMutex is a relaxed version of a lock that does not make note of the thread that acquired it. It has simpler semantics: each InterProcessSemaphoreMutex instance can be acquired exactly once and must be balanced by a release (in any thread). For example:
InterProcessSemaphoreMutex lock = new InterProcessSemaphoreMutex(...);
lock.acquire();
lock.acquire(); // this will block forever
I hope this helps. If the docs need clarification we'd appreciate a Pull Request with improvements.
In this context, reentrancy means that a thread can call acquire on the same lock more than once and not block when calling it the second or third time. It must balance all acquires with the same number of releases, however. This StackOverflow post talks more about re-entrant locks:
What is the Re-entrant lock and concept in general?
Specific to Curator, here is what the documentation has to say about the different locks:
Shared Re-entrant lock aka InterProcessMutex:
public void acquire()
Acquire the mutex - blocking until it's
available. Note: the same thread can call acquire re-entrantly. Each
call to acquire must be balanced by a call to release()
Shared lock (non-reentrant) aka InterProcessSemaphoreMutex:
public void acquire()
Acquire the mutex - blocking until it's available. Must be balanced by a call to release().
In the following example, if lock were a reentrant lock, the code below would run to completion and work normally. If lock were not reentrant, the thread would deadlock on the second call to lock.acquire():
lock.acquire();
lock.acquire();
doWork();
lock.release();
lock.release();
Reentrant locks tend to be a little more costly in implementation, but easier to use.
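Curator's classes need a ZooKeeper ensemble to run, but the same distinction exists in Python's standard threading module, which makes for a self-contained sketch: RLock is reentrant while Lock is not (the sketch probes with blocking=False to avoid actually deadlocking).

```python
import threading

# RLock is reentrant: the owning thread may acquire it again without
# blocking, but must balance every acquire with a release.
rlock = threading.RLock()
rlock.acquire()
reacquired = rlock.acquire(blocking=False)  # True: same thread, no deadlock
rlock.release()
rlock.release()

# A plain Lock is not reentrant: a second blocking acquire by the same
# thread would hang forever, so probe non-blockingly instead.
plain = threading.Lock()
plain.acquire()
reacquired_plain = plain.acquire(blocking=False)  # False: already held
plain.release()

print(reacquired, reacquired_plain)  # True False
```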
The above pattern happens frequently when you have multiple public methods into your API that must lock, but your implementation has public methods calling other public methods. You can avoid this by having your public methods do locking and only locking, and then call a private method that assumes it is always executed under the lock; then your private methods can call other private methods without needing to acquire the lock more than once.
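A minimal sketch of that public/private pattern, in Python with a deliberately non-reentrant threading.Lock (the Counter class and its method names are hypothetical): only the public methods take the lock, and the private helper assumes it is already held, so no method ever needs to acquire the lock twice.

```python
import threading

class Counter:
    """Public methods take the (non-reentrant) lock exactly once;
    private helpers assume it is already held and never lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self):
        with self._lock:
            self._increment_locked()

    def add_twice(self):
        # Calls the private helper twice instead of the public
        # increment(), so the non-reentrant lock is taken only once.
        with self._lock:
            self._increment_locked()
            self._increment_locked()

    def _increment_locked(self):
        # Caller must already hold self._lock.
        self._value += 1

c = Counter()
c.increment()
c.add_twice()
print(c._value)  # 3
```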
Edit to address @Randgalt's comments:
Curator's InterProcessMutex requires that the same thread that acquires the lock release it. InterProcessSemaphoreMutex does not. Maybe you misread what I wrote? Maybe I wasn't clear? Don't know. Anyway, this is the case.
This is patently false; neither lock allows you to release it from a thread other than the thread that acquired the lock. Furthermore, this still has nothing to do with the question of "what is re-entrancy" in this context - and in this context, re-entrancy is whether or not you can call acquire on the same lock more than once on the same thread.
InterProcessMutex.release():
public void release()
Perform one release of the mutex if the calling thread is the same thread that acquired it. If the thread had made multiple calls to acquire, the mutex will still be held when this method returns.
InterProcessSemaphoreMutex.release():
public void release()
Perform one release of the mutex if the calling thread is the same thread that acquired it.
Emphasis added. Neither lock allows you to unlock it from any thread other than the thread that owns the lock - and this makes sense, because both locks are mutexes, and that is one of the properties of a mutex.

Pattern for a singleton application process using the database

I have a backend process that maintains state in a PostgreSQL database, which needs to be visible to the frontend. I want to:
Properly handle the backend being stopped and started. This alone is as simple as clearing out the backend state tables on startup.
Guard against multiple instances of the backend trampling each other. There should only be one backend process, but if I accidentally start a second instance, I want to make sure either the first instance is killed, or the second instance is blocked until the first instance dies.
Solutions I can think of include:
Exploit the fact that my backend process listens on a port. If a second instance of the process tries to start, it will fail with "Address already in use". I just have to make sure it does the listen step before connecting to the database and wiping out state tables.
Open a secondary connection and run the following:
BEGIN;
LOCK TABLE initech.backend_lock IN EXCLUSIVE MODE;
Note: the reason for IN EXCLUSIVE MODE is that LOCK defaults to the ACCESS EXCLUSIVE mode, which conflicts with the ACCESS SHARE lock acquired by pg_dump; EXCLUSIVE mode does not.
Don't commit. Leave the table locked until the program dies.
What's a good pattern for maintaining a singleton backend process that maintains state in a PostgreSQL database? Ideally, I would acquire a lock for the duration of the connection, but LOCK TABLE cannot be used outside of a transaction.
Background
Consider an application with a "broker" process which talks to the database, and accepts connections from clients. Any time a client connects, the broker process adds an entry for it to the database. This provides two benefits:
The frontend can query the database to see what clients are connected.
When a row changes in another table called initech.objects, and clients need to know about it, I can create a trigger that generates a list of clients to notify of the change, writes it to a table, then uses NOTIFY to wake up the broker process.
Without the table of connected clients, the application has to figure out what clients to notify. In my case, this turned out to be quite messy: store a copy of the initech.objects table in memory, and any time a row changes, dispatch the old row and new row to handlers that check if the row changed and act if it did. To do it efficiently involves creating "indexes" against both the table-stored-in-memory, and handlers interested in row changes. I'm making a poor replica of SQL's indexing and querying capabilities in the broker program. I'd rather move this work to the database.
In summary, I want the broker process to maintain some of its state in the database. It vastly simplifies dispatching configuration changes to clients, but it requires that only one instance of the broker be connected to the database at a time.
It can be done with advisory locks:
http://www.postgresql.org/docs/9.1/interactive/functions-admin.html#FUNCTIONS-ADVISORY-LOCKS
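A sketch of the advisory-lock approach, assuming an arbitrary application-chosen key (42 here). The blocking form makes a second broker instance wait until the first one's session ends, which is exactly the "blocked until the first instance dies" behavior asked for, and the lock is released automatically even if the holder crashes:

```sql
-- First broker instance: acquire the lock and hold it for the life
-- of the connection.
SELECT pg_advisory_lock(42);      -- blocks until the current holder's session ends

-- Alternatively, fail fast instead of blocking:
SELECT pg_try_advisory_lock(42);  -- false if another instance holds it
```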
I solved this today in a way I thought was concise:
CREATE TYPE mutex as ENUM ('active');
CREATE TABLE singleton (status mutex DEFAULT 'active' NOT NULL UNIQUE);
Then your backend process tries to do this:
INSERT INTO singleton VALUES ('active');
And quits or waits if it fails to do so.