ConcurrentUpdateException with single user - atg

On my dev box, running a single JBoss instance, I saw several occurrences of ConcurrentUpdateException on the Order item. The Order repository uses simple caching. I am trying to figure out how a CUE could occur on a single instance. Also, a lock is obtained and the order is updated in a transaction, as recommended by ATG.
My assumption is that the lock would prevent concurrent requests from updating the order at the same time. So, what could cause the order version in the database to be greater than the one in the cache?

Related

Stable pagination using Postgres

I want to implement stable pagination using Postgres database as a backend. By stable, I mean if I re-read a page using some pagination token, the results should be identical.
Using insertion timestamps will not work, because clock synchronization errors can make pagination unstable.
I was considering using pg_export_snapshot() as a pagination token. That way, I can reuse it on every read, and the database would guarantee me the same results since I am always using the same snapshot. But the documentation says that
"The snapshot is available for import only until the end of the transaction that exported it."
(https://www.postgresql.org/docs/9.4/functions-admin.html)
Is there any workaround for this? Is there an alternate way to export the snapshot even after the transaction is closed?
You wouldn't need to export snapshots; all you need is a REPEATABLE READ READ ONLY transaction so that the same snapshot is used for the whole transaction. But, as you say, that is a bad idea, because long transactions are quite problematic.
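For illustration only, a minimal sketch of that approach, assuming a hypothetical items table (with the caveat above that the transaction has to stay open for as long as the pagination lasts):

BEGIN ISOLATION LEVEL REPEATABLE READ, READ ONLY;
-- every statement in this transaction sees the same snapshot
SELECT id, title FROM items ORDER BY id LIMIT 50 OFFSET 0;   -- page 1
SELECT id, title FROM items ORDER BY id LIMIT 50 OFFSET 50;  -- page 2
SELECT id, title FROM items ORDER BY id LIMIT 50 OFFSET 0;   -- page 1 again, identical rows
COMMIT;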
With insertion timestamps I see no real problem for insert-only tables, but rows that get deleted or updated will certainly vanish or move unless you use "soft" deletes and updates and leave the old values in the table (which leaves you the problem of how to get rid of those values eventually). That would amount to re-implementing PostgreSQL's multiversioning at the application level and doesn't look very appealing.
Perhaps you could use a scrollable WITH HOLD cursor. Then the database server will materialize the result set when the selecting transaction is committed, and you can fetch forward and backward at your leisure. Sure, that will hog server resources, but you will have to pay somewhere. Just don't forget to close the cursor when you are done.
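A minimal sketch of the WITH HOLD variant, again assuming a hypothetical items table:

BEGIN;
DECLARE page_cur SCROLL CURSOR WITH HOLD FOR
    SELECT id, title FROM items ORDER BY id;
COMMIT;  -- the result set is materialized here and the cursor survives the transaction

FETCH FORWARD 50 FROM page_cur;   -- page 1
FETCH FORWARD 50 FROM page_cur;   -- page 2
MOVE ABSOLUTE 0 IN page_cur;      -- rewind
FETCH FORWARD 50 FROM page_cur;   -- page 1 again, identical rows

CLOSE page_cur;  -- don't forget this, or the materialized result set lingers on the server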
If you prefer to conserve server resources, the obvious alternative would be to fetch the whole result set to the client and implement pagination on the client side alone.

cursors: atomic MOVE then FETCH?

the intent
Basically, pagination: a cursor (created with DECLARE outside any function here, but that can be changed if need be) is addressed concurrently to retrieve batches of rows, which implies repositioning the cursor in order to fetch more than one row (FETCH count seems to be the only way to fetch more than one row).
the context
During a more global transaction (i.e. using one connection), I want to retrieve a range of rows through a cursor. To do so, I:
MOVE the cursor to the desired position (e.g. MOVE 42 FROM "mycursor")
then FETCH the desired number of rows (e.g. FETCH FORWARD 10 FROM "mycursor")
However, this transaction is used by many workers (horizontally scaled), each receiving a set of "coordinates" for the cursor, like LIMIT and OFFSET: the index to MOVE to and the number of rows to FETCH. These workers use the DB connection through HTTP calls to a single DB API, which handles the pool of connections and the transactions' lifetimes.
Because of this concurrent access to the transaction/connection, I need to ensure that the "MOVE then FETCH" pair executes atomically.
the setup
NodeJS workers consuming ranges of rows through a DB API
NodeJS DB API based on pg (latest)
PostgreSQL v10 (can be upgraded if required, all documentation links here are from v12 - latest)
the tries
WITH (MOVE 42 FROM "mycursor") FETCH 10 FROM "mycursor" produces a syntax error; apparently WITH doesn't accept MOVE
MOVE 42 FROM "mycursor"; FETCH 10 FROM "mycursor": since I'm inside a transaction I suppose this could work, but I'm using Node's pg, which apparently doesn't handle several statements in the same call to query() (no error, but no result yielded; I didn't dig into this much since it looks like a hack anyway)
I'm not confident a function would guarantee atomicity; that doesn't seem to be what PARALLEL UNSAFE is about, and since I'm going to have high concurrency, I'd really love some explicitly written assurance about atomicity... (see the sketch below for the kind of function I mean)
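For concreteness, such a server-side function might look roughly like this. This is only a sketch, assuming a hypothetical items table whose row type the cursor returns; whether it helps still depends on the DB API issuing only one statement at a time on the shared connection, since the function merely bundles the MOVE and the FETCH into a single statement.

CREATE OR REPLACE FUNCTION fetch_page(cur refcursor, skip integer, cnt integer)
RETURNS SETOF items
LANGUAGE plpgsql AS $$
DECLARE
    rec items;
BEGIN
    -- reposition and then fetch, all inside one server-side call
    MOVE FORWARD skip FROM cur;
    FOR i IN 1 .. cnt LOOP
        FETCH cur INTO rec;
        EXIT WHEN NOT FOUND;
        RETURN NEXT rec;
    END LOOP;
END;
$$;

-- called as a single statement on the shared connection:
-- SELECT * FROM fetch_page('mycursor', 42, 10);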
the reason
I'd prefer not to rely on LIMIT/OFFSET, as it would require an ORDER BY clause to ensure pagination consistency (as per the docs, Ctrl-F for "unpredictable"), unless (scrollable, WITHOUT HOLD) cursors prove to be far more resource-consuming. "Far more" because that cost has to be weighed against the INSENSITIVE behavior of cursors, which would allow me not to hold a lock on the underlying table during the whole process. If pagination in this context proves not to be feasible with cursors, I'll fall back to that solution, unless you have something better to suggest!
the human side
Hello, and thanks in advance for the help! :)
You can share connections, but you cannot share transactions, so what you are asking for is not possible in this context. Well-behaved tools do not share a connection while a transaction is open on it.

Syncing two mongodb's

I'm trying to figure out how to solve the following use case using MongoDB.
There is a main server where users can upload/edit/... content. Sometimes the users want to work independently of the main server, so they take a defined set of entries from the main server's MongoDB and deploy their own local environment. But after some time they want to get updates for those entries from the main server and/or submit their own updates or new insertions.
Is there any out-of-the-box solution for that? I thought standard replica sets could be used, but the documentation is so sparse that I can't really tell whether it would work if users are, for example, a month offline.

DB2 Read committed without locking?

We have a transaction that is modifying a record. The transaction must call a web service, rolling back the transaction if the service fails (so it can't commit beforehand). Because the record is modified, the client app holds a lock on it. However, the web service must retrieve that record to get information from it as part of its processing. Bam, deadlock.
We use WebSphere, which, for reasons that boggle my mind, defaults to the repeatable read isolation level. We knocked it down to read_committed, thinking that this would retrieve the row without seeking a lock. In our dev environment it seemed to work, but in staging we're getting deadlocks.
I'm not asking why it behaved differently, we probably made a mistake somewhere. Nor am I asking about the specifics of the web service example above, because obviously this same thing could happen elsewhere.
But based on reading the docs, it seems like read_committed DOES acquire a shared lock during read, and as a result will wait for an exclusive lock held by another transaction (in this case the client app). But I don't want to go to read_uncommitted isolation level because I don't want dirty reads. Is there a less extreme solution? I need some middle ground where I can perform reads without any lock-waiting, and retrieve only committed data.
Is there such a goldilocks solution? Not too deadlock-y, not too dirty-read-y? If not in an isolation level, maybe some modifier I can tack onto my SQL? Anything?
I assume you are talking about JDBC isolation levels, not DB2 ones. The difference between read_committed (cursor stability in DB2) and repeatable_read (read stability) is how long the share locks are kept: repeatable_read keeps every lock that satisfied the predicates, whereas read_committed only keeps the lock until another row that matches the predicate is found.
Have you compared the plans? If the plans are different you may end up with different behaviour.
Are there any escalations occurring?
Have you tried CURRENTLY_COMMITTED (assuming you are on 9.7+)?
Before currently committed existed, there were the following registry settings: DB2_SKIPINSERTED, DB2_EVALUNCOMMITTED and DB2_SKIPDELETED.
The lowest isolation level that reads committed rows is read committed.
Usually, you process rows in a DB2 database like this:
1. Read the database row with no read locks (an ordinary SELECT under read committed).
2. Process the data so you have a row with changed values.
3. Read the database row again, this time with a read lock (SELECT FOR UPDATE).
4. Check whether the database row from step 1 matches the database row from step 3.
5. If the rows match, update the database row.
6. If the rows don't match, release the update lock and go back to step 2.
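A rough SQL sketch of that flow, using a hypothetical ORDERS table with columns ID, STATUS and QTY, and literal values standing in for the values remembered from step 1:

-- step 1: plain read under read committed (cursor stability); no lock is kept after the fetch
SELECT STATUS, QTY FROM ORDERS WHERE ID = 42;

-- step 2: process the data in the application, remembering the values just read

-- step 3: read the row again, this time with an update lock
SELECT STATUS, QTY FROM ORDERS WHERE ID = 42 FOR UPDATE;

-- steps 4-6: update only if the row still matches what step 1 returned
-- (here the remembered values were STATUS = 'NEW' and QTY = 3);
-- if no row is updated, release the lock and go back to step 2
UPDATE ORDERS
   SET STATUS = 'SHIPPED'
 WHERE ID = 42
   AND STATUS = 'NEW'
   AND QTY = 3;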

Memcached, Locking and Race Conditions

We are trying to update memcached objects when we write to the database to avoid having to read them from database after inserts/updates.
For our forum post object we have a ViewCount field containing the number of times a post is viewed.
We are afraid that we are introducing a race condition by updating the memcached object, as the same post could be viewed at the same time on another server in the farm.
Any idea how to deal with these kinds of issues? It would seem that some sort of locking is needed, but how do you do that reliably across servers in a farm?
If you're dealing with data that doesn't necessarily need to be updated realtime, and to me the view count is one of them, then you could add an expires field to the objects that are stored in memcache.
Once that expiration happens, it'll go back to the database and read the new value, but until then it will leave it alone.
Of course for new posts you may want this updated more often, but you can code for this.
Memcache only stores one copy of your object in one of its instances, not in many of them, so I wouldn't worry about object locking or anything. That is for the database to handle, not your cache.
Edit:
Memcache offers no guarantee that, when you're getting and setting from varied servers, your data won't get clobbered.
From memcache docs:
A series of commands is not atomic. If you issue a 'get' against an item, operate on the data, then wish to 'set' it back into memcached, you are not guaranteed to be the only process working on that value. In parallel, you could end up overwriting a value set by something else.
Race conditions and stale data
One thing to keep in mind as you design your application to cache data, is how to deal with race conditions and occasional stale data.
Say you cache the latest five comments for display in a sidebar in your application. You decide that the data only needs to be refreshed once per minute. However, you neglect to remember that this sidebar is rendered 50 times per second! Thus, once 60 seconds roll around and the cache expires, suddenly 10+ processes are running the same SQL query to repopulate that cache. Every time the cache expires, a sudden burst of SQL traffic will result.
Worse yet, you have multiple processes updating the same data, and the wrong one ends up updating the cache. Then you have stale, outdated data floating about.
Be mindful of the possible issues in populating or repopulating your cache. Remember that the process of checking memcached, fetching from SQL, and storing into memcached is not atomic at all!
I'm thinking: could a solution be to store the view count separately from the Post object and then do an INCR on it? Of course this would require reading two separate values from memcached when displaying the information.
memcached operations are atomic. The server process will queue the requests and serve each one completely before going on to the next, so there's no need for locking.
edit: memcached has an increment command, which is atomic. You just have to store the counter as a separate value in the cache.
We encountered this in our system. We modified get() so that:
If the value is unset, it sets it to a flag ('g') with an 8 second TTL and returns false, so the calling function generates it.
If the value is not flagged (!== 'g'), it unserializes and returns it.
If the value is flagged (=== 'g'), it waits 1 second and tries again until it's no longer flagged. It will eventually be set by the other process, or expired by the TTL.
Our database load dropped by a factor of 100 when we implemented this.
function get($key) {
    global $m;   // the shared Memcached client instance
    $value = $m->get($key);
    if ($value === false) {
        // Cache miss: flag the key so other processes wait instead of all
        // regenerating it, then return false so the caller generates the value.
        $m->set($key, 'g', 8);   // 8 second TTL on the flag
    } else {
        while ($value === 'g') {
            // Another process is generating the value; poll until it is
            // set or the flag expires.
            sleep(1);
            $value = $m->get($key);
        }
    }
    return $value;
}