PostgreSQL LISTEN/NOTIFY number of notifications per transaction with identical payloads - postgresql

In the PostgreSQL manual it says:
If the same channel name is signaled multiple times from the same transaction with identical payload strings, the database server can decide to deliver a single notification only.
Do you know how this "decision" is made?

That's an interesting question. The documentation is vague here, but in my experience duplicate notifications are only delivered when subtransactions are involved.
Rather than guess, let's open the PostgreSQL source code. The function that queues a notification checks for duplicates first:
/* no point in making duplicate entries in the list ... */
if (AsyncExistsPendingNotify(channel, payload))
return;
OK, but that does not explain how duplicates can still occur, so let's inspect the AsyncExistsPendingNotify function. A comment inside it gives the answer:
/*
* As we are not checking our parents' lists, we can still get duplicates
* in combination with subtransactions, like in:
*
* begin;
* notify foo '1';
* savepoint foo;
* notify foo '1';
* commit;
*/
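So, that's it: duplicate notifications can occur when subtransactions are used, because the duplicate check only looks at the current (sub)transaction's own list. The documentation could be clearer, but perhaps the PostgreSQL developers left it vague on purpose. I conclude that the server does not treat eliminating duplicates as a strict requirement, so listeners should be prepared to receive them.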
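To make the mechanism concrete, here is a toy Python model of the behaviour the quoted comment describes. It is not PostgreSQL's code: the class and method names are invented for illustration, and it assumes that each subtransaction keeps its own pending list and that lists are simply concatenated at commit. Server versions may differ in how they merge subtransaction lists, so treat it as an illustration of the comment rather than of any particular release.

```python
class NotifyQueue:
    """Toy model of per-(sub)transaction pending-notification lists."""

    def __init__(self):
        # One list per nesting level; index 0 is the top-level transaction.
        self.levels = [[]]

    def savepoint(self):
        # A subtransaction starts with its own empty pending list.
        self.levels.append([])

    def notify(self, channel, payload=""):
        # The duplicate check looks only at the current level, not at parents.
        current = self.levels[-1]
        if (channel, payload) not in current:
            current.append((channel, payload))

    def commit(self):
        # Assumption: child lists are appended to the parent's list
        # without checking for duplicates again.
        delivered = [n for level in self.levels for n in level]
        self.levels = [[]]
        return delivered


queue = NotifyQueue()
queue.notify("foo", "1")
queue.notify("foo", "1")   # dropped: same level, same payload
queue.savepoint()
queue.notify("foo", "1")   # kept: the parent's list is not consulted
delivered = queue.commit()
print(delivered)
```

In this model the second notify is suppressed, while the one issued after the savepoint survives, so two identical notifications are delivered.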

Related

thread safe increment value in db

I have come across a problem and am not sure how to implement it in the database. I use Go on the application side.
I have a product table with a column last_port_used. I need to assign ports to services when someone hits an API. Each call needs to increment last_port_used by 1 for the given product name.
One possible solution would have been to keep this value in a Redis server. Since we don't have Redis, I want to achieve the same thing with PostgreSQL.
I read about locks and I think I need an ACCESS EXCLUSIVE lock. Is this the right way to do it?
product
id
name
start_port // 11000
end_port // 11999
last_port_used // 11023
How do I handle concurrent requests properly?
You could simply do:
UPDATE products SET last_port_used = last_port_used+1
WHERE id=...
AND last_port_used < end_port
RETURNING *
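This performs the update atomically: the UPDATE takes a row lock, so concurrent callers are serialized, and it only succeeds if a port is still available (last_port_used < end_port). RETURNING hands back the updated row, including the assigned port. If no row comes back, the port range is exhausted.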
If you need to lock the row, you can also use SELECT FOR UPDATE.
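As a rough, runnable illustration of the guarded UPDATE, the sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The helper name allocate_port and the sample row are assumptions made for the example. It checks the affected-row count and then reads the value back inside the same transaction, where PostgreSQL would use RETURNING in a single statement.

```python
import sqlite3


def allocate_port(conn, product_id):
    """Return the newly assigned port, or None if the range is used up."""
    with conn:  # commits on success, rolls back on an exception
        cur = conn.execute(
            "UPDATE products SET last_port_used = last_port_used + 1 "
            "WHERE id = ? AND last_port_used < end_port",
            (product_id,),
        )
        if cur.rowcount == 0:
            return None  # unknown product or no free port left
        row = conn.execute(
            "SELECT last_port_used FROM products WHERE id = ?", (product_id,)
        ).fetchone()
        return row[0]


# Hypothetical sample data: two ports are still free.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, "
    "start_port INTEGER, end_port INTEGER, last_port_used INTEGER)"
)
conn.execute("INSERT INTO products VALUES (1, 'demo', 11000, 11999, 11997)")
conn.commit()

first = allocate_port(conn, 1)
second = allocate_port(conn, 1)
third = allocate_port(conn, 1)   # range exhausted
print(first, second, third)
```

The guard in the WHERE clause is what keeps the counter inside the range; the application only has to treat "no row updated" as "no port available".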

Update a row in a table respecting a constraint on another table

book:
id: primary key, integer
title: varchar
borrowed: boolean
borrowed_by_user_id: foreign key user.id
user:
id: primary key, integer
name: varchar
blocked: boolean
The isolation level is READ COMMITTED, because it is the default level in PostgreSQL (this requirement is not mine).
I am using one database transaction to SELECT ... FOR UPDATE a book and lend it to a user if the book is not borrowed yet. Because the book row is locked FOR UPDATE, it cannot be borrowed concurrently.
But there is another problem. We must not lend a book to a blocked user. How can we guarantee that? Even if we check at the beginning that the user is not blocked, the result might be stale, because a concurrent transaction could block the user after that check.
For example, a user can be blocked by a concurrent transaction from the admin's panel.
How can I solve that issue?
I see that I can use SERIALIZABLE. It requires handling errors, yes?
I am not sure how that CHECK works. Could you say more about it?
These are actually two questions.
About the books:
If you lock the book with SELECT ... FOR UPDATE as soon as you consider lending it out, this is an example of “pessimistic locking” and will block the book for all concurrent activity.
That is fine if the transactions are very short – specifically, if there is no user interaction between the locking and the end of the transaction.
Otherwise you should use “optimistic locking”. This can be done in several ways:
Use REPEATABLE READ transaction isolation. Then updating a book that has been modified since you read its data will lead to a serialization error (see the note at the end).
When selecting books, remember the values of the system columns ctid and xmin. Then update as follows:
UPDATE books SET ...
WHERE id = ...
AND ctid = original_ctid AND xmin = original_xmin;
If no row gets updated, somebody must have modified the book since you looked at it.
About the users:
Three ideas:
You use SERIALIZABLE transaction isolation (see the note at the end).
You maintain a counter on the user that contains the number of books the user has borrowed.
Then you can have a check constraint like
ALTER TABLE users ADD CHECK (NOT blocked OR books_borrowed = 0);
Such a check constraint is evaluated whenever a row is inserted or updated and must not evaluate to FALSE, else an error is thrown.
So either the transaction that borrows a book or the transaction that blocks the user must fail (both transactions have to modify the user).
Right before lending a book to a user, you run
SELECT blocked FROM users WHERE id = ... FOR UPDATE;
If you get TRUE, you abort the transaction, otherwise lend out the book.
A concurrent transaction that wants to block the user has to SELECT ... FOR UPDATE on the user as well and only then check if there are any books lent to that user.
That way, no inconsistency can happen: if you want to block a user, all concurrent transactions that want to lend a book to the user must either be completed, so that you see their effect, or they must wait until you are done blocking the user, whereupon they will fail.
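The counter-plus-check-constraint idea above can be tried out in isolation. The sketch below again uses Python's sqlite3 as a stand-in for PostgreSQL, with booleans stored as 0/1 and a hypothetical user row. It only shows the constraint rejecting a forbidden state; it does not exercise two concurrent sessions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " blocked INTEGER NOT NULL DEFAULT 0,"
    " books_borrowed INTEGER NOT NULL DEFAULT 0,"
    " CHECK (NOT blocked OR books_borrowed = 0))"
)
conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")

# Lending a book bumps the counter; the user is not blocked, so this passes.
conn.execute("UPDATE users SET books_borrowed = books_borrowed + 1 WHERE id = 1")
conn.commit()

# Blocking a user who still holds a book violates the constraint.
try:
    conn.execute("UPDATE users SET blocked = 1 WHERE id = 1")
    block_rejected = False
except sqlite3.IntegrityError:
    block_rejected = True
    conn.rollback()

blocked, borrowed = conn.execute(
    "SELECT blocked, books_borrowed FROM users WHERE id = 1"
).fetchone()
print(block_rejected, blocked, borrowed)
```

Because both the lending transaction and the blocking transaction have to update the same user row, the row lock serializes them, and whichever runs second is checked against the other's committed change.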
Note about higher isolation levels:
If you run transactions at an isolation level of REPEATABLE READ or SERIALIZABLE, you can encounter serialization errors. These are not bugs in your program; they are normal and to be expected. If you encounter a serialization error, you have to roll back and try the same transaction again. That is the price you pay for not having to worry about race conditions.
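A retry loop for such errors might look like the following sketch. SerializationFailure is a hypothetical stand-in for whatever exception your driver raises for SQLSTATE 40001 (serialization_failure), and flaky_transaction merely simulates a transaction that fails twice before succeeding.

```python
class SerializationFailure(Exception):
    """Hypothetical stand-in for a driver error with SQLSTATE 40001."""


def run_with_retry(transaction, max_attempts=5):
    """Call transaction() until it succeeds or the attempts are used up."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except SerializationFailure:
            if attempt == max_attempts:
                raise
            # A real caller would roll back here before trying again.


attempts = []


def flaky_transaction():
    # Simulated transaction body: fails on the first two calls.
    attempts.append(1)
    if len(attempts) < 3:
        raise SerializationFailure("could not serialize access")
    return "committed"


result = run_with_retry(flaky_transaction)
print(result, len(attempts))
```

The important design point is that the whole transaction is re-run from the start, including its reads, since the decision it made the first time may no longer hold.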

Using row data in pg_notify trigger as channel name?

Is it possible to use data from the row a trigger is firing on, as the channel of a pg_notify, like this:
CREATE OR REPLACE FUNCTION notify_pricesinserted()
  RETURNS trigger AS $$
DECLARE
BEGIN
  PERFORM pg_notify(
    NEW.my_label,
    row_to_json(NEW)::text);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER notify_pricesinserted
  AFTER INSERT ON prices
  FOR EACH ROW
  EXECUTE PROCEDURE notify_pricesinserted();
EDIT: I found out the reason it was not working is due to the case of my label. If I replace it with lower(NEW.my_label) and also do the same for the listener then it works.
The pg_notify() call works without throwing an error; PostgreSQL places very few restrictions on what a channel name can be. In practice, though, it is probably useless: some session must have issued LISTEN some_channel before the pg_notify() call in order to pick up the payload outside the trigger function. Doing that for a dynamic value is difficult in most situations and probably inefficient in all of them.
If NEW.my_label takes only a small number of well-defined values in your trigger, you might make it work by listening on all possible values. You are probably better off defining a single channel for your table, or perhaps for this specific trigger, and constructing the payload so that you can easily extract the information you need to respond. If you cannot predict the values of NEW.my_label, listening on a per-value channel is plainly impossible.
In your specific case you could have a channel name 'prices' and then do something like:
PERFORM pg_notify('prices', format('%s: %s', NEW.my_label, row_to_json(NEW)::text));
The session with LISTEN prices will receive:
Asynchronous notification "prices" with payload "some_label: {new_row_to_json}" received from server process with PID 8448.
That is a rather silly response (why the "Asynchronous notification "channel" with payload ..." instead of just the payload and the PID?) but you can easily extract the relevant parts and work with those. Since you would have to manipulate the string anyway it is not a big burden to strip away all the PG overhead in one go, on a single channel, making management of the trigger actions far easier.
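On the listening side, splitting such a payload back into its label and row is straightforward. The Python sketch below assumes the client library hands over the payload string itself, that labels never contain the sequence ": ", and that the row columns shown are made up for the example; parse_price_payload is a hypothetical helper name.

```python
import json


def parse_price_payload(payload):
    """Split a 'label: {json}' payload into (label, row_dict)."""
    # Split at the first ': ' only; the JSON part may contain more of them.
    label, sep, body = payload.partition(": ")
    if not sep:
        raise ValueError("payload has no 'label: json' separator")
    return label, json.loads(body)


label, row = parse_price_payload('some_label: {"id": 7, "price": 12.5}')
print(label, row)
```

An alternative with fewer assumptions would be to put the label inside the JSON document itself, so the listener only has to parse JSON.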

Reading postgres NOTICE message in C++ API

I am struggling to read my Postgres NOTICE messages in my C++ API. I can only read EXCEPTION messages using the function PQresultErrorMessage(PGresult), but not lower-level messages.
PQresultErrorField(res, PG_DIAG_SEVERITY) returns null pointer.
How do I read NOTICE and other low level messages?
(Using PostgreSQL 9.2)
Set up a notice receiver or notice processor using PQsetNoticeReceiver / PQsetNoticeProcessor. Both install callbacks that are invoked when notice and warning messages arrive from the server. Note that this may happen before, during, or after processing of query data.
It's safe to assume that after all query results are returned (PQexec or whatever has returned) and you've called PQconsumeInput to make sure there's nothing else waiting, then all notices for the last command are received. The PQconsumeInput shouldn't really be necessary, it's just to be cautious.
See the documentation for libpq.

Atomic get and delete in memcached?

Is there a way to do atomic get-and-delete in memcached?
In other words, I want to get the value for a key if it exists and delete it immediately, so this value can be read once and only once.
I think this pseudocode might work, but note the caveat postscript:
# When setting:
SET key-0 value
SET key-ns 0
# When getting:
ns = INCR key-ns
GET key-{ns - 1}
Constraint: I have millions of keys that could be accessed millions of times, and only a small percentage will have a value set at any given time. I don't want to have to update an atomic counter for every key with every get access request as above.
The canonical, if generic, answer to your question is: a lock-free hash table with a relaxed memory model.
The more relaxed your memory model is, the more you gain from a good lock-free design; it is a way to get more performance out of the same chipset.
Here is a talk about that. I don't think it's possible to answer your question in a single post on hash tables and lock-free programming, so I'm not even trying to do that.
You cannot do this with memcached in a single command, since there is no API that supports exactly what you're asking for. What I would do to get the behavior you're looking for is to implement some sort of marker that signifies whether another client has already read the data. For example, you could create a JSON document as follows:
{
"data": "value",
"used": false
}
When you get the item check to see if it has already been used by another client by examining the used field. If it hasn't been used then set the value using the cas you got from the GET command and make sure that the document is updated to reflect the fact that a client has already accessed this key.
If the set operation fails because the cas is invalid then this means that another client has obtained this item and already updated it in memcached to signify that it has been used. In this case you just cancel whatever you were doing with the item and move on.
If the set operation succeeds then this means your client is the sole owner of this data. You can now delete it from memcached and do whatever processing on it you like.
Note that when doing the set I would also add an expiration time of about 5 seconds. This way, if your application crashes before it finishes deleting them, your documents will clean themselves up.
To put some code to the answer from @mikewied, I think the basic gist is... (using Node.js):
var Memcached = require('memcached');
var memcache = new Memcached('localhost:11211');

var getOnce = function(key, callback) {
  // gets is the check-and-set get (vs regular get)
  memcache.gets(key, function(err, data) {
    if (!data) {
      // Cache miss, nothing to see here.
      callback(null);
    } else {
      var yourData = data[key];
      // Do a check-and-set to remove the data from the cache.
      // This sets the value to null *only* if no one else already did.
      memcache.cas(key, null /* new data */, data.cas, 10, function(err) {
        if (err) {
          // Check-and-set failed! (Here we'll treat it like a cache miss)
          yourData = null;
        }
        callback(yourData);
      });
    }
  });
};
I'm not an expert on Memcached, so I may be wrong. My answer comes from reading the documentation and from my experience using Memcached.
IMO this is not possible with memcached's current implementation.
Here is a simple example that demonstrates the race condition:
Two processes start at the same time.
Both execute a get/delete at the same time.
memcached replies to both get commands at the same time.
Done. The desired result was for the first get/delete to execute atomically and the second get/delete to fail. Instead memcached executed get, get, delete, and then a delete that fails.
An atomic get/delete would require:
a new atomic command in memcached, let's call it get_delete
some sort of synchronization lock shared by all the memcached clients, to ensure both the get and the delete are executed while the lock is held
All clients would then acquire the synchronization lock whenever they need to enter the critical section (i.e. get, delete) and release it after the critical section.
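The interleaving described above can be replayed deterministically against a small in-memory stand-in. FakeCache below is a hypothetical Python class that only imitates memcached's gets/cas/delete semantics; it is not a client for a real server. The second half replays the same interleaving with the check-and-set claim suggested in the earlier answer, where only one client's cas can succeed.

```python
class FakeCache:
    """In-memory stand-in with memcached-like gets/cas/delete semantics."""

    def __init__(self):
        self._items = {}      # key -> (value, cas_id)
        self._next_cas = 1

    def set(self, key, value):
        self._items[key] = (value, self._next_cas)
        self._next_cas += 1

    def gets(self, key):
        # Returns (value, cas_id), or (None, None) on a miss.
        return self._items.get(key, (None, None))

    def cas(self, key, value, cas_id):
        # Succeeds only if nobody changed the item since gets().
        current = self._items.get(key)
        if current is None or current[1] != cas_id:
            return False
        self.set(key, value)
        return True

    def delete(self, key):
        return self._items.pop(key, None) is not None


cache = FakeCache()
cache.set("token", "secret")

# Naive get + delete, interleaved as get, get, delete, delete:
# both clients read the value before either delete runs.
naive_a, _ = cache.gets("token")
naive_b, _ = cache.gets("token")
deleted_a = cache.delete("token")
deleted_b = cache.delete("token")

# Claiming the item with cas before using it: only one client wins.
cache.set("token", "secret")
value_a, cas_a = cache.gets("token")
value_b, cas_b = cache.gets("token")
won_a = cache.cas("token", "claimed", cas_a)
won_b = cache.cas("token", "claimed", cas_b)
print(naive_a, naive_b, deleted_b, won_a, won_b)
```

In the naive run both clients end up holding the value even though the second delete fails; in the cas run the losing client learns that it lost and can discard what it read.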