Delete an uncommitted inserted row in DB2 (V8.2.7 - Fix 14) - db2

At a client's request, I was asked to switch a web application to the read-uncommitted isolation level (probably a bad idea...).
While testing whether the isolation level was in place, I inserted a row without committing (DbVisualizer: #set autocommit off, then dropped the VPN connection to the database) and started testing my application against that uncommitted insert.
select * from MYTABLE WHERE MY_ID = 'NON_COMMIT_INSERT_ID' WITH UR works fine. Now I would like to "delete" this row, but I have not found any way to do it...
UPDATE: The row did disappear after some time (about 30 minutes). I guess there is some kind of timeout before a rollback is automatically issued. Is there any way to remove an uncommitted row before this happens?

I think this will not be possible using normal SQL statements - the only way to delete the row is to roll back the transaction that inserted it (or wait for the transaction to commit, then delete the row). Since you disconnected from the database at the network level, the 30 minutes you mention is probably a TCP timeout enforced at the operating-system level. Once the TCP connection was terminated, DB2 rolled back the client's transaction automatically.
Still, I think you could administratively force the application to disconnect from the database (using FORCE APPLICATION with the handle obtained from LIST APPLICATIONS), which should roll back the transaction; see http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/core/r0001951.htm for details on these commands.
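For reference, a rough sketch of those commands from the DB2 command line processor; the database alias MYDB and the handle 123 are placeholders, the real handle comes from the LIST APPLICATIONS output:
# list connected applications and note the "Appl. Handle" of the stale DbVisualizer session
db2 list applications for database MYDB show detail
# force that application off; DB2 rolls back its open transaction
db2 "force application (123)"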

It's one thing to read uncommitted rows from a database. There are sometimes good reasons (avoiding read locks) for doing this.
It's another to leave inserted, updated, or deleted rows in a database without a commit or rollback. You should never do this. Either commit or roll back after a database change.

Related

How to avoid long delay before finally getting "40001 could not serialize access due to concurrent update"

We have a Postgres 12 system running one master and two async hot-standby replica servers, and we use SERIALIZABLE transactions. All the database servers have very fast SSD storage for Postgres and 64 GB of RAM. Clients connect directly to the master server if they cannot accept delayed data for a transaction. Read-only clients that accept data up to 5 seconds old use the replica servers for querying data. Read-only clients use REPEATABLE READ transactions.
I'm aware that because we use SERIALIZABLE transactions Postgres might give us false positive matches and force us to repeat transactions. This is fine and expected.
However, the problem I'm seeing is that, randomly, a single-line INSERT or UPDATE query stalls for a very long time. As an example, one error case was as follows (speaking directly to the master to allow modifying table data):
A simple single row insert
insert into restservices (id, parent_id, ...) values ('...', '...', ...);
stalled for 74.62 seconds before finally emitting the error
ERROR 40001 could not serialize access due to concurrent update
with error context
SQL statement "SELECT 1 FROM ONLY "public"."restservices" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"
We log all queries exceeding 40 ms, so I know this kind of stall is rare: maybe a couple of queries a day. We average around 200-400 transactions per second during normal load, with 5-40 queries per transaction.
After finally getting the above error, the client code automatically released two savepoints, rolled back the transaction and disconnected from the database (this cleanup took 2 ms total). It then reconnected to the database 2 ms later and replayed the whole transaction from the start, finishing in 66 ms including the time to connect to the database. So I think this is not about the performance of the client or the master server as a whole. The expected transaction time is between 5-90 ms depending on the transaction.
Is there some PostgreSQL connection or master configuration setting that I can use to make PostgreSQL return the 40001 error faster, even if it causes more transactions to be rolled back? Does anybody know if setting
set local statement_timeout='250'
within the transaction has dangerous side effects? According to the documentation https://www.postgresql.org/docs/12/runtime-config-client.html, "Setting statement_timeout in postgresql.conf is not recommended because it would affect all sessions", but I could set the timeout only for the transactions of this client, which is able to automatically retry the transaction very fast.
Is there anything else to try?
It looks like someone had locked the parent row of the one you were trying to insert. PostgreSQL doesn't know what to do about that until the lock is released, so it blocks. If it failed rather than blocking, and upon failure you retried the exact same thing, the same parent row would (most likely) still be locked, so it would just fail again and you would busy-wait. Busy-waiting is not good, so blocking rather than failing is generally a good thing here. It blocks and then unblocks only to fail, but once it does fail a retry should succeed.
An obvious exception to blocking-better-than-failing is when, on retry, you can pick a different parent row, if that makes sense in your context. In that case, maybe the best thing to do is to explicitly lock the parent row with NOWAIT before attempting the insert. That way you can perhaps deal with failures in a more nuanced way.
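As a minimal sketch of that idea (the parent table name, the key values, and the two-column insert are assumptions, not from the question):
BEGIN;
-- fail immediately with SQLSTATE 55P03 instead of blocking if another session holds the parent row lock
SELECT 1 FROM parent_table WHERE id = 'the-parent-id' FOR KEY SHARE NOWAIT;
INSERT INTO restservices (id, parent_id) VALUES ('new-id', 'the-parent-id');
COMMIT;
On a 55P03 error the client can back off, pick another parent row, or report the conflict instead of busy-waiting.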
If you must retry with the same parent_id, then I think the only real solution is to figure out who is holding the parent row lock for so long, and fix that. I don't think that setting statement_timeout would be hazardous, but it also wouldn't solve your problem, as you would probably just keep retrying until the lock on the offending row is released. (Setting it on the other session, the one holding the lock, might be helpful, depending on what that session is doing while the lock is held.)

What to do after a query when auto_commit is disabled

In some scenarios we should call setAutoCommit(false) before a query; see https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor and "When does the PostgreSQL JDBC driver fetch rows after executing a query?".
But none of these topics mention what to do after the query, when the ResultSet and Statement are closed but the Connection is not (it may be recycled by a connection pool or DataSource).
I have these choices:
Do nothing (keep autoCommit = false for next query)
set autoCommit = true
commit
rollback
Which one is the best practice?
Even queries are executed in a transaction. If you started a transaction (which implicitly happened when you executed the query), then you should also end it. Generally, doing nothing would - with a well-behaved connection pool - result in a rollback when your connection is returned to the pool. However, it is best not to rely on such implicit behaviour, because not all connection pools or drivers will adhere to it. For example, the Oracle JDBC driver will commit on connection close (or at least, it did so in the past; I'm not sure if it still does), and that might not be the correct behaviour for your program. Explicitly calling commit() or rollback() will clearly document the boundaries and expectations of your program.
Though committing or rolling back a transaction that only executed a query (and thus did not modify the database) will have the same end result, I would recommend using commit() rather than rollback(), to clearly indicate that the result was successful. For some databases, committing might be cheaper than rolling back (or vice versa), but such systems usually have heuristics that will convert a commit to a rollback (or vice versa, whichever is cheaper) if the results would be equivalent.
You generally don't need to switch auto-commit mode when you're done. A well-behaved connection pool should do that for you (though not all do, or sometimes you need to explicitly configure this). Double check the behaviour and options of your connection pool to be sure.
If you want to continue using the connection yourself (without returning it to the pool), then switching back to auto-commit mode is sufficient: calling setAutoCommit(true) with an active transaction will automatically commit that transaction.
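A minimal sketch of that pattern in JDBC (the query, table name and dataSource are assumptions, not from the question; this goes inside a method that declares throws SQLException, with the java.sql imports in place):
try (Connection conn = dataSource.getConnection()) {
    conn.setAutoCommit(false);                       // needed e.g. for cursor-based fetching
    try (PreparedStatement ps = conn.prepareStatement("SELECT id FROM mytable");
         ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            // process rs.getLong("id") ...
        }
        conn.commit();                               // explicitly end the implicitly started transaction
    } catch (SQLException e) {
        conn.rollback();                             // make the failure outcome explicit as well
        throw e;
    }
}
// when the connection goes back to the pool, a well-behaved pool resets autoCommit itself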
It depends on what you want to do afterwards. If you want to return to auto-commit mode after the operation:
conn.setAutoCommit(true);
This will automatically commit the open transaction.

Locking row that is being currently inserted to avoid race conditions when reading same

I have a system in which I'll dispatch events as part of transactions when inserting rows into the database.
This is to ensure that events only ever get dispatched for a successful insert, and that inserts are rolled back in case event dispatching fails.
The problem is that I'm seeing a race condition when dispatching an event after a successful insert. Sometimes the corresponding event listeners receive the event before the transaction has been committed, which results in the row not being available at that time.
What I want to achieve is that, when inserting row A, any process trying to read row A must wait until row A's transaction has been committed.
Is this a sound approach and how is it best achieved?
For two processes A and B
How it currently works
A Start transaction
A Attempt insert
A dispatch event
B Receives event and attempts to read the inserted row
B Exception is raised as record is not visible yet
A commits transaction
How I'd want it to work
A Start transaction
A Attempt insert
A dispatch event
B Receives event and attempts to read the inserted row
B The row is currently locked by transaction so it waits until released
A commits transaction
B Lock is released and newly inserted row is returned
Based on the question and the comments, I think the main issue here is that you want "A" to dispatch an event based on the content of an uncommitted row, and you want "B" to read and act on that uncommitted row. But in relational theory and in SQL databases, you don't know whether a commit will succeed. A commit can fail for lots of reasons--not only because some constraint fails, but because of things like insufficient permissions or disk space.
To borrow words from one of your comments, the dbms manages a database's transition from one consistent state to another. Uncommitted rows aren't known to be part of a consistent database state. Only successfully committed rows are part of a consistent database state.
So I think, in general, you need "A" to commit before "B" tries to read the new row. And I think this is true even if you switch to a dbms that supports the "read uncommitted" transaction isolation level. (PostgreSQL does not, at least not in the way you might think.)
That means "B" will be responsible for deleting the new row (or for telling "A" to delete it) if "dispatching" fails. Neither "A" nor "B" can roll back a committed transaction.
It might make more sense for another process, "C", to oversee the actions of both "A" and "B".

Bitronix transaction appears to be committing prematurely

We have a spring-batch process that uses the bitronix transaction manager. On the first pass of a particular step, we see the expected commit behavior - data is only committed to the target database when the transaction boundary is reached.
However, on the second and subsequent passes, rows are committed as soon as they are written. That is, they do not wait for the commit point.
We have confirmed that the Bitronix commit is only called at the expected points.
Has anyone experienced this behavior before? What kind of bug am I looking for?
Java XA is designed in such a way that connections cannot be reused across transactions. Once the transaction is committed, the connection property is changed to autocommit=true, and the connection cannot be used in another transaction until it is returned to the connection pool and retrieved by the XA code again.

Pattern for a singleton application process using the database

I have a backend process that maintains state in a PostgreSQL database, which needs to be visible to the frontend. I want to:
Properly handle the backend being stopped and started. This alone is as simple as clearing out the backend state tables on startup.
Guard against multiple instances of the backend trampling each other. There should only be one backend process, but if I accidentally start a second instance, I want to make sure either the first instance is killed, or the second instance is blocked until the first instance dies.
Solutions I can think of include:
Exploit the fact that my backend process listens on a port. If a second instance of the process tries to start, it will fail with "Address already in use". I just have to make sure it does the listen step before connecting to the database and wiping out state tables.
Open a secondary connection and run the following:
BEGIN;
LOCK TABLE initech.backend_lock IN EXCLUSIVE MODE;
Note: the reason for IN EXCLUSIVE MODE is that LOCK defaults to the AccessExclusive locking mode. This conflicts with the AccessShare lock acquired by pg_dump.
Don't commit. Leave the table locked until the program dies.
What's a good pattern for maintaining a singleton backend process that maintains state in a PostgreSQL database? Ideally, I would acquire a lock for the duration of the connection, but LOCK TABLE cannot be used outside of a transaction.
Background
Consider an application with a "broker" process which talks to the database, and accepts connections from clients. Any time a client connects, the broker process adds an entry for it to the database. This provides two benefits:
The frontend can query the database to see what clients are connected.
When a row changes in another table called initech.objects, and clients need to know about it, I can create a trigger that generates a list of clients to notify of the change, writes it to a table, then uses NOTIFY to wake up the broker process.
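A rough sketch of what such a trigger could look like (the connected_clients and pending_notifications tables, the id column and the channel name are hypothetical; only initech.objects comes from this setup):
CREATE OR REPLACE FUNCTION initech.notify_broker() RETURNS trigger AS $$
BEGIN
  -- record which clients should hear about the changed row (hypothetical tables)
  INSERT INTO initech.pending_notifications (client_id, object_id)
    SELECT client_id, NEW.id FROM initech.connected_clients;
  NOTIFY broker_wakeup;   -- delivered to the listening broker when this transaction commits
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER objects_changed
  AFTER INSERT OR UPDATE ON initech.objects
  FOR EACH ROW EXECUTE PROCEDURE initech.notify_broker();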
Without the table of connected clients, the application has to figure out which clients to notify. In my case, this turned out to be quite messy: store a copy of the initech.objects table in memory, and any time a row changes, dispatch the old row and new row to handlers that check whether the row changed and act if it did. Doing this efficiently involves creating "indexes" against both the table stored in memory and the handlers interested in row changes. I'm making a poor replica of SQL's indexing and querying capabilities in the broker program. I'd rather move this work to the database.
In summary, I want the broker process to maintain some of its state in the database. It vastly simplifies dispatching configuration changes to clients, but it requires that only one instance of the broker be connected to the database at a time.
It can be done with advisory locks:
http://www.postgresql.org/docs/9.1/interactive/functions-admin.html#FUNCTIONS-ADVISORY-LOCKS
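For example (sketch only; the lock key 42 is an arbitrary application-chosen constant):
-- the first broker instance gets the lock; a second instance gets 'false' and can quit or poll
SELECT pg_try_advisory_lock(42);
-- ...keep this connection open for the lifetime of the broker...
-- the lock is released automatically when the session ends, or explicitly with:
SELECT pg_advisory_unlock(42);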
I solved this today in a way I thought was concise:
CREATE TYPE mutex as ENUM ('active');
CREATE TABLE singleton (status mutex DEFAULT 'active' NOT NULL UNIQUE);
Then your backend process tries to do this:
insert into singleton values ('active');
And quits or waits if it fails to do so.