Suspend transaction in Postgres - postgresql

I have seen another database system that offers to suspend transaction. The current transaction is kept intact but put on hold while your code is allowed to work with the database to effect immediate permanent changes to rows. Then you can resume transaction, continuing where you left off with the same locks and other transaction protections in place as if you'd never interrupted it.
For example, say an customer is placing an order, in a transaction. During that transaction, the customer notices their phone number needs to be updated, so we change that data. Next, customer decides to cancel the not-yet-completed order. A rollback of the order has the unintended consequence of also undoing the phone number change. So it would be nice if we could:
Suspend the transaction for the order.
Update the phone number, committed to the database.
Resume the transaction for the order.
Is there some way to suspend a transaction in Postgres? In JDBC?

If a transaction cannot continue, it must roll back.
If your transaction has a point at which you don't know how to carry on, then your transaction logic is flawed, you need to reorganize it - either split into multiple transactions (or sub-transactions, aka save points), or take out the parts that do not belong to the transaction logic.
Is there some way to suspend a transaction in Postgres?
No, no such thing. And the data integrity principle is unconditional as to time.

No.
The closest things are
prepared transactions: this allows (with some conditions) for a transaction to be saved, and then later rolled back or committed.
savepoints: this allows for "nested transactions", where portions of transactions can be rolled back .
Neither of these fit exactly what you are looking for. It seems that our example has two operations that do not need to be part of the same transaction at all, since the phone number update appears to be unrelated to the success of the order. (Also, a long-running transaction is a bad idea....your order should probably be a state machine implemented without long-running transaction.)

Workaround – open second connection
In JDBC, you could just open a second connection to the database.
Do your separate work on that second connection and close. The first connection is still open and remains in its same state. Any active transaction in that first connection remains.

Related

How to catch PostgreSQL ROLLBACK for a function with UOW side-effects?

I am writing a scalar plpgsql function that contains a C function that has a side-effect outside of the database. When the function is invoked, in some arbitrary SQL (trigger, select, write, etc), I want the side-effect to be committed or rolled back on the PostgreSQL unit of work (UOW) boundary. I can handle the UOW commit, but I don't know how to "catch" the database ROLLBACK and rollback the side-effect. The key point is I am writing the function, but don't have control of how it is called, i.e., I can not "force" the call to be in a block with EXCEPTION handlers. Any ideas?
For the commit, I plan to have the plpsql function INSERT into a database TABLE that has a trigger "... AFTER INSERT ... EXECUTE PROCEDURE commit_my_side_effect()", so when the UOW is committed, the row is committed, the AFTER INSERT trigger fires and presto, the side effect is committed;
The only idea I have so far is to pass out the txid_current() to a background worker process. Then on some heartbeat using SPI, check if the txid is not in flight or committed, then it must have been rolled back. But that seems like heavy lifting.
Broadly speaking, a transaction is considered "rolled back" if it's not committed and it's no longer running; in the interests of ACID compliance, an explicit ROLLBACK needs to be functionally identical to yanking the power cord on your server, so in general, there can't be any deliberate action associated with a rollback which you might be able to hook into.
The actual removal of rolled-back data is handled by vacuuming, which works more or less like your proposed background worker: anything written by a transaction which is not running and not committed is a candidate for removal. However, there's a bit more to it than that, as a transaction containing subtransactions (SAVEPOINTs or PL/pgSQL EXCEPTION blocks) can be partially rolled back. In other words, txid_current() alone isn't enough to decide if a change was committed, and I don't know if Postgres exposes enough information about subtransaction states to let you to cater for this.
I think the only sane approach is to move the application of side-effects to an external process, and trigger it after commit, once you know what has actually been committed. Two approaches come to mind:
Have your PL/pgSQL function insert into a work queue which is polled by the external process, or
Feed changes to the process via NOTIFY (notifications are only delivered on commit, and notifications from rolled-back subtransactions are discarded)
Notifications are more lightweight and lower latency (they're delivered asynchronously, so no polling is necessary), but less robust than a table-based approach, as the notification queue is wiped out in the event of a crash or an unexpected disconnection. Of course, if you want crash safety without the downsides of polling, you can simply do both.
I found a feature called ON_ERROR_ROLLBACK, and looking at the implementation,https://github.com/postgres/postgres/blob/master/src/bin/psql/common.c, I think I can "wrap" all the SQL commands using the following pseudo-code to add "fake" savepoint, and "fake" rollback to savepoint and fire off a "rollback_side_effect()":
side_effect_fired = false; // set true if the side_effect udf called
run("SAVEPOINT _savepoint");
run($sqlcommand);
if (txn_status == ERROR && side_effect_fired) {
run("ROLLBACK TO _savepoint"
rollback_side_effect()); // rollback the side effect
}
I probably need a stack of _savepoint. I will run with that!

Executing a trigger AFTER the completion of a transaction

In PostgreSQL, are DEFERRED triggers executed before (within) the completion of the transaction or just after it?
The documentation says:
DEFERRABLE
NOT DEFERRABLE
This controls whether the constraint can be deferred. A constraint
that is not deferrable will be checked immediately after every
command. Checking of constraints that are deferrable can be postponed
until the end of the transaction (using the SET CONSTRAINTS command).
It doesn't specify if it is still inside the transaction or out. My personal experience says that it is inside the transaction and I need it to be outside!
Are DEFERRED (or INITIALLY DEFERRED) triggers executed inside of the transaction? And if they are, how can I postpone their execution to the time when the transaction is completed?
To give you a hint what I'm after, I'm using pg_notify and RabbitMQ (PostgreSQL LISTEN Exchange) to send out messages. I process such messages in an external application. Right now I have a trigger which notifies the external app of the newly inserted records by including the record's id in the message. But in a non-deterministic way, once in a while, when I try to select a record by its id at hand, the record can not be found. That's because the transaction is not complete yet and the record is not actually added to the table. If I can only postpone the execution of the trigger for after the completion of the transaction, everything will work out.
In order to get better answers let me explain the situation even closer to the real world. The actual scenario is a little more complicated than what I explained before. The source code can be found here if anyone's interested. Becuase of reasons that I'm not gonna dig into, I have to send the notification from another database so the notification is actually sent like:
PERFORM * FROM dblink('hq','SELECT pg_notify(''' || channel || ''', ''' || payload || ''')');
Which I'm sure makes the whole situation much more complicated.
Triggers (including all sorts of deferred triggers) fire inside the transaction.
But that is not the problem here, because notifications are delivered between transactions anyway.
The manual on NOTIFY:
NOTIFY interacts with SQL transactions in some important ways.
Firstly, if a NOTIFY is executed inside a transaction, the notify
events are not delivered until and unless the transaction is
committed. This is appropriate, since if the transaction is aborted,
all the commands within it have had no effect, including NOTIFY. But
it can be disconcerting if one is expecting the notification events to
be delivered immediately. Secondly, if a listening session receives a
notification signal while it is within a transaction, the notification
event will not be delivered to its connected client until just after
the transaction is completed (either committed or aborted). Again, the
reasoning is that if a notification were delivered within a
transaction that was later aborted, one would want the notification to
be undone somehow — but the server cannot "take back" a notification
once it has sent it to the client. So notification events are only
delivered between transactions. The upshot of this is that
applications using NOTIFY for real-time signaling should try to keep
their transactions short.
Bold emphasis mine.
pg_notify() is just a convenient wrapper function for the SQL NOTIFY command.
If some rows cannot be found after a notification has been received, there must be a different cause! Go find it. Likely candidates:
Concurrent transactions interfering
Triggers doing something more or different than you think they do.
All sorts of programming errors.
Either way, like the manual suggests, keep transactions that send notifications short.
dblink
Update: Transaction control in a PROCEDURE or DO statement in Postgres 11 or later makes this a lot simpler. Just COMMIT; to (also) send waiting notifications.
Original answer (mostly for Postgres 10 or older):
PERFORM * FROM dblink('hq','SELECT pg_notify(''' || channel || ''', ''' || payload || ''')');
... which should be rewritten with format() to simplify and make the syntax secure:
PRERFORM dblink('hq', format('NOTIFY %I, %L', channel, payload));
dblink is a game-changer here, because it opens a separate transaction in the other database. This is sometimes used to fake autonomous transaction.
Does Postgres support nested or autonomous transactions?
How do I do large non-blocking updates in PostgreSQL?
dblink() waits for the remote command to finish. So the remote transaction will most probably commit first. The manual:
The function returns the row(s) produced by the query.
If you can send notification from the same transaction instead, that would be a clean solution.
Workaround for dblink
If notifications have to be sent from a different transaction, there is a workaround with dblink_send_query():
dblink_send_query sends a query to be executed asynchronously, that is, without immediately waiting for the result.
DO -- or plpgsql function
$$
BEGIN
-- do stuff
PERFORM dblink_connect ('hq', 'your_connstr_or_foreign_server_here');
PERFORM dblink_send_query('con1', format('SELECT pg_sleep(3); NOTIFY %I, %L ', 'Channel', 'payload'));
PERFORM dblink_disconnect('con1');
END
$$;
If you do this right before the end of the transaction, your local transaction gets 3 seconds (pg_sleep(3)) head start to commit. Chose an appropriate number of seconds.
There is an inherent uncertainty to this approach, since you get no error message if anything goes wrong. For a secure solution you need a different design. After successfully sending the command, chances for it to still fail are extremely slim, though. The chance that successful notifications are missed seem much higher, but that's built into your current solution already.
Safe alternative
A safer alternative would be to write to a queue table and poll it like discussed in #Bohemian's answer. This related answer demonstrates how to poll safely:
Postgres UPDATE … LIMIT 1
I'm posting this as an answer, assuming the actual problem you are trying to solve is deferring execution of an external process until after the transaction is completed (rather than the X-Y "problem" you're trying to solve using trigger Kung Fu).
Having the database tell an app to do something is a broken pattern. It's broken because:
There's no fallback if the app doesn't get the message, eg because it's down, network explodes, whatever. Even the app replying with an acknowledgment (which it can't), wouldn't fix this problem (see next point)
There's no sensible way to retry the work if the app gets the message but fails to complete it (for any of lots of reasons)
In contrast, using the database as a persistant queue, and having the app poll it for work, and take the work off the queue when work is complete, has none of the above problems.
There are lots of ways to achieve this. The one I prefer is to have some process (usually trigger on insert, update and delete) put data into a "queue" table. Have another process poll that table for work to do, and delete from the table when work is complete.
It also adds some other benefits:
The production and consumption of work is decoupled, which means you can safely kill and restart your app (which must happen from time to time, eg deploying) - the queue table will happily grow while the app is down, and will drain when the app is back up. You can even replace the app with an entirely new one
If for whatever reason you want to initiate processing of certain items, you can just manually insert rows into the queue table. I used this technique myself to initiate the processing of all items in a database that needed initialising by being put on the queue once. Importantly, I didn't need to do a perfunctory update to every row just to fire the trigger
Getting to your question, a slight delay can be introduced by adding a timestamp column to the queue table and having the poll query only select rows that are older than (say) 1 second, which gives the database time to complete its transaction
You can't overload the app. The app will read only as much work as it can handle. If your queue is growing, you need a faster app, or more apps If multiple consumers are operating, concurrency can be solved by (for example) adding a "token" column to the queue table
Queues that are backed by database tables is the basis of how persistent queues are implemented in commercial grade queue-based platforms, so the pattern is well tested, used and understood.
Leave the database to do what it does best, and the only thing it does well: Manage data. Don't try to make your database server into an app server.

Could I save Postgres transaction and continue work with db within it later

I know about prepared transaction in Postgres, but seems you can just commit or rollback it later. You cannot even view the transaction's db state before you've committed it. Is any way to save transaction for later use?
What I want to achieve actually is a preview (and correcting) of some changes in db (changes are imports from csv file, so user need to see preview before apply it). I want to make changes, add some changes later, see full state of db and apply it (certainly, commit transaction)
I cannot find a very good reference in docs, but I have a very strong feeling that the answer is: No, you cannot do that.
It would mean that when you "save" the transaction, the database would basically have to maintain all of its locks in place for an indefinite amount of time. Even if it was possible, it would mean horrible failure modes and trouble on all fronts.
For the pattern that you are describing, I would use two separate transactions. Import to a staging table and show that to user (or import to the main table but mark rows as "unapproved"). If user approves, in another transactions move or update these rows.
You can always end up in a situation where user can simply leave or crash without clicking "OK" or "Cancel". If what you're describing was possible, you would end up with a hung transaction holding all these resources. In my proposed solution you end up with wasteful rows in "staging" table that you may still show to user later or remove.
You may want to read up on persistence saga. This is actually a very simple example of a well known and researched problem.
To make the long story short, this pattern breaks down a long-running process like yours into smaller operations that are applied and persisted in some way in separate transactions. If any of them happens to fail (or does not occur as expected), you have compensating actions that usually undo what the steps executed so far have done (e.g. by throwing away stale/irrelevant data).
Here's a decent introduction:
https://blog.couchbase.com/saga-pattern-implement-business-transactions-using-microservices-part/#:~:text=The%20SAGA%20Pattern,completion%20of%20the%20previous%20one.
http://vasters.com/clemensv/2012/09/01/Sagas.aspx
This concept was formally introduced in the 80s, but is well alive and relevant today.

Transaction & Locks Problem

with in do transaction, i defined a label and in this label i am accessing a table with exclusive-lock.and at the end of label i have done all the changes in that table. bt now i am with in transaction block.
Now, i tried to access that same table in another session.then it show an error, Table used by another user. So is it possible that, can we release teh table with in transaction,so another user can access it.
For example:
Session 1)
DO TRANSACTION:
---
---
loopb:
REPEAT:
--
--
---------------------> control is here right now.
END. /*repeat*/
--
--
END. /*do transaction*/
Session 2)
I tried to access same table, but it show an error, that table locked by another user.
All those records you touched in the loop using EXCLUSIVE-LOCK will not be available to be locked by another user until the TRANSACTION is complete. There is no getting around this. If the second process needs to lock those records, then all you can do is decrease your TRANSACTION scope in the first process. This is a safety feature so that if an error happens later on in the TRANSACTION, all the changes made during the TRANSACTION will be rolled back. Another way to look at it is if you could release some record locks during a TRANSACTION, you would lose the atomicity (all-or-nothingness) that is part of the definition of a TRANSACTION.
It should be noted that if you don't really need to lock those records in the second process but just need to see their updated value, that is possible. Once the updated records are no longer in the record buffer (or the record lock status is downgraded to a NO-LOCK in the TRANSACTION), they will become limbo locks and you can view their updated values using a NO-LOCK. To make the last record in the loop become a limbo lock, you can either do this
FIND CURRENT tablerecord NO-LOCK.
Or this, if you do not need to access the record buffer any longer:
RELEASE tablerecord.
Other sessions can do a "dirty read" of the record using NO-LOCK. But they will not be able to lock it or update it until the transaction is committed (or rolled back). And that won't happen until the repeat block iterates or you leave it.

How to check for pending operations in a PostgreSQL transaction

I have a session (SQLAlchemy) on PostgreSQL, with an active uncommitted transaction. I have just passed the session to some call tree that may or may not have issued SQL INSERT/UPDATE/DELETE statements, through sqlalchemy.orm or directly through the underlying connection.
Is there a way to check whether there are any pending data-modifying statements in this transaction? I.e. whether commit would be a no-op or not, and whether rollback would discard something or not?
I've seen people point out v$transaction in Oracle for the same thing (see this SO question). I'm looking for something similar to use on PostgreSQL.
Start by checking into system view pg_locks.
http://www.postgresql.org/docs/8.4/interactive/view-pg-locks.html
Consider the following sequence of statements:
select txid_current();
begin;
select txid_current();
If the transaction id returned by the two selects is equal, then there is an open transaction. If not then there wasn't, (but now is).
If the numbers are different, then as a side effect you will just have opened a transaction, which you will probably want to close.
UPDATE: In fact, as #r2evans points out (thanks for the insight!), you don't need the "begin" -- txid_current() will return the same number just if you are in a transaction.
Since Postgres 10:
select txid_current_if_assigned();
will return null if there is no current transaction.
If a Start Transaction has been issued, it will still return null if there have been no updates.
No, not from the database level, really. Perhaps you can add some tracing at the sqlalchemy level to track it?
Also, how do you define a no-op? What if you updated a value to the same value it had before, is that a no-op or not? From the databases perspective, if it had one, it would not be a no-op. But from the application perspective, it probably would.