Rollback in Postgres - postgresql

As far as I know, we can't use start transaction within functions, thus we can't use COMMIT and ROLLBACK in functions.
But how then we ROLLBACK by some if-condition?
How then we can perform a sequence of statements in a specific level of isolation? I mean a situation when an application wants to call a SQL (plpgsql) function and that function really needs to be run in a transaction with a certain isolation level. What to do in such a case?
In which cases then it is really practical to run ROLLBACK? Only when we manually write a script, check something and then ROLLBACK manually if we don't like the result. And in the same case, I see the practicality of savepoints. However, I feel like it is a serious constraint.

If you want to rollback the complete transaction, RAISE an exception.
If you only want to roll back part of your work, start a new block with a BEGIN at the point to which you want to roll back and add an EXCEPTION clause to the block.
Since the transaction is started outside the function, the isolation level already has to be set properly when you are in the function.
You can query
SELECT current_setting('transaction_isolation', TRUE);
and throw an error if the setting is not correct.
is too general or too simple to answer.
You roll back a transaction if you have reached a point in your processing where you want to undo everything you have done so far in the transaction.
Often, that happens implicitly rather than explicitly by throwing an error.

Related

PostgreSQL - how to determine whether a transaction is active?

Let me open by saying: yes, I am aware of Determine if a transaction is active (Postgres)
Unfortunately the sole answer to that question is far too specific to the use case provided, and doesn't actually indicate whether or not a transaction is active.
The select txid_current(); trick suggested by How to check for pending operations in a PostgreSQL transaction doesn't appear to work - I always get the same transaction ID from adjacent calls to that function. Possibly this is because I'm trying to test it from pgAdmin, which is transparently starting transactions...? (Note: I don't actually care whether there are any pending changes or active locks, so looking at pg_locks isn't helpful - what if nothing's been touched since the transaction was started?)
So: How can I determine in PostgreSQL PL/pgSQL code if a transaction is currently active?
One possible use case is: the SP/FN in question will be doing its own explicit transaction management, and calling it with a transaction already active will greatly interfere with that. I want to raise an error so that the coding mistake of calling this SP/FN in a transaction can be corrected.
There are other use cases, though.
Ideally what I'm looking for is an equivalent to MSSQL's ##TRANCOUNT (though I don't really care how deeply the transactions may be nested...)
Postgres runs PL/pgSQL inside the transaction. Thus you can't control transaction from inside PL/pgSQL. Code will look like:
begin;
select plpgsql_fn();
do '/*same any plpgsql*/';
end;
So answering your question:
If you have PL/pgSQL running ATM, you have your transaction active ATM...
Of course you can do some trick, like starting/ending work over dblink or such. but then you can check select txid_current(); over the dblink successfully...
If you want to determine if there have been any data modifications in your transaction, call txid_current_if_assigned(). It returns NULL if nothing has been modified yet.
If you only want to know if you are inside some transaction, you can save yourself the trouble, because you always are.
Before PostgreSQL v11, you cannot use transaction control statements in a function.
I haven't found a clean way to do that, but you can always call BEGIN and if it succeeds it means there is no transaction in progress (don't forget to rollback). If it fails with "there is already a transaction in progress" this means you are within transaction (better not to rollback then).

How to catch PostgreSQL ROLLBACK for a function with UOW side-effects?

I am writing a scalar plpgsql function that contains a C function that has a side-effect outside of the database. When the function is invoked, in some arbitrary SQL (trigger, select, write, etc), I want the side-effect to be committed or rolled back on the PostgreSQL unit of work (UOW) boundary. I can handle the UOW commit, but I don't know how to "catch" the database ROLLBACK and rollback the side-effect. The key point is I am writing the function, but don't have control of how it is called, i.e., I can not "force" the call to be in a block with EXCEPTION handlers. Any ideas?
For the commit, I plan to have the plpsql function INSERT into a database TABLE that has a trigger "... AFTER INSERT ... EXECUTE PROCEDURE commit_my_side_effect()", so when the UOW is committed, the row is committed, the AFTER INSERT trigger fires and presto, the side effect is committed;
The only idea I have so far is to pass out the txid_current() to a background worker process. Then on some heartbeat using SPI, check if the txid is not in flight or committed, then it must have been rolled back. But that seems like heavy lifting.
Broadly speaking, a transaction is considered "rolled back" if it's not committed and it's no longer running; in the interests of ACID compliance, an explicit ROLLBACK needs to be functionally identical to yanking the power cord on your server, so in general, there can't be any deliberate action associated with a rollback which you might be able to hook into.
The actual removal of rolled-back data is handled by vacuuming, which works more or less like your proposed background worker: anything written by a transaction which is not running and not committed is a candidate for removal. However, there's a bit more to it than that, as a transaction containing subtransactions (SAVEPOINTs or PL/pgSQL EXCEPTION blocks) can be partially rolled back. In other words, txid_current() alone isn't enough to decide if a change was committed, and I don't know if Postgres exposes enough information about subtransaction states to let you to cater for this.
I think the only sane approach is to move the application of side-effects to an external process, and trigger it after commit, once you know what has actually been committed. Two approaches come to mind:
Have your PL/pgSQL function insert into a work queue which is polled by the external process, or
Feed changes to the process via NOTIFY (notifications are only delivered on commit, and notifications from rolled-back subtransactions are discarded)
Notifications are more lightweight and lower latency (they're delivered asynchronously, so no polling is necessary), but less robust than a table-based approach, as the notification queue is wiped out in the event of a crash or an unexpected disconnection. Of course, if you want crash safety without the downsides of polling, you can simply do both.
I found a feature called ON_ERROR_ROLLBACK, and looking at the implementation,https://github.com/postgres/postgres/blob/master/src/bin/psql/common.c, I think I can "wrap" all the SQL commands using the following pseudo-code to add "fake" savepoint, and "fake" rollback to savepoint and fire off a "rollback_side_effect()":
side_effect_fired = false; // set true if the side_effect udf called
run("SAVEPOINT _savepoint");
run($sqlcommand);
if (txn_status == ERROR && side_effect_fired) {
run("ROLLBACK TO _savepoint"
rollback_side_effect()); // rollback the side effect
}
I probably need a stack of _savepoint. I will run with that!

Could I save Postgres transaction and continue work with db within it later

I know about prepared transaction in Postgres, but seems you can just commit or rollback it later. You cannot even view the transaction's db state before you've committed it. Is any way to save transaction for later use?
What I want to achieve actually is a preview (and correcting) of some changes in db (changes are imports from csv file, so user need to see preview before apply it). I want to make changes, add some changes later, see full state of db and apply it (certainly, commit transaction)
I cannot find a very good reference in docs, but I have a very strong feeling that the answer is: No, you cannot do that.
It would mean that when you "save" the transaction, the database would basically have to maintain all of its locks in place for an indefinite amount of time. Even if it was possible, it would mean horrible failure modes and trouble on all fronts.
For the pattern that you are describing, I would use two separate transactions. Import to a staging table and show that to user (or import to the main table but mark rows as "unapproved"). If user approves, in another transactions move or update these rows.
You can always end up in a situation where user can simply leave or crash without clicking "OK" or "Cancel". If what you're describing was possible, you would end up with a hung transaction holding all these resources. In my proposed solution you end up with wasteful rows in "staging" table that you may still show to user later or remove.
You may want to read up on persistence saga. This is actually a very simple example of a well known and researched problem.
To make the long story short, this pattern breaks down a long-running process like yours into smaller operations that are applied and persisted in some way in separate transactions. If any of them happens to fail (or does not occur as expected), you have compensating actions that usually undo what the steps executed so far have done (e.g. by throwing away stale/irrelevant data).
Here's a decent introduction:
https://blog.couchbase.com/saga-pattern-implement-business-transactions-using-microservices-part/#:~:text=The%20SAGA%20Pattern,completion%20of%20the%20previous%20one.
http://vasters.com/clemensv/2012/09/01/Sagas.aspx
This concept was formally introduced in the 80s, but is well alive and relevant today.

CREATE DATABASE inside transaction

According to postgresql docs;
CREATE DATABASE cannot be executed inside a transaction block.
Is there a technical reason for this?
When you try it, you get the error:
ERROR: CREATE DATABASE cannot run inside a transaction block
This comes from src/backend/access/transam/xact.c (line 3023 on my sources, but varies by version), in PreventTransactionChain(...).
The comment there explains that:
This routine is to be called by statements that must not run inside
a transaction block, typically because they have non-rollback-able
side effects or do internal commits.
For CREATE DATABASE it's called from src/backend/tcop/utility.c in standard_ProcessUtility under the case for T_CreatedbStmt, but unfortunately there isn't any informative comment that says why specifically CREATE DATABASE isn't safe to run in a transaction.
Looking at the sources, I can see that for one thing it forces a checkpoint.
Overall, though, I don't see anything that really screams out "we can't do this transactionally". It's more "we haven't implemented the functionality to do this transactionally".
It's conceptual reason: files creation has no relation to DB transaction and there is no guaranty that during the rollback they will be deleted.

How to check for pending operations in a PostgreSQL transaction

I have a session (SQLAlchemy) on PostgreSQL, with an active uncommitted transaction. I have just passed the session to some call tree that may or may not have issued SQL INSERT/UPDATE/DELETE statements, through sqlalchemy.orm or directly through the underlying connection.
Is there a way to check whether there are any pending data-modifying statements in this transaction? I.e. whether commit would be a no-op or not, and whether rollback would discard something or not?
I've seen people point out v$transaction in Oracle for the same thing (see this SO question). I'm looking for something similar to use on PostgreSQL.
Start by checking into system view pg_locks.
http://www.postgresql.org/docs/8.4/interactive/view-pg-locks.html
Consider the following sequence of statements:
select txid_current();
begin;
select txid_current();
If the transaction id returned by the two selects is equal, then there is an open transaction. If not then there wasn't, (but now is).
If the numbers are different, then as a side effect you will just have opened a transaction, which you will probably want to close.
UPDATE: In fact, as #r2evans points out (thanks for the insight!), you don't need the "begin" -- txid_current() will return the same number just if you are in a transaction.
Since Postgres 10:
select txid_current_if_assigned();
will return null if there is no current transaction.
If a Start Transaction has been issued, it will still return null if there have been no updates.
No, not from the database level, really. Perhaps you can add some tracing at the sqlalchemy level to track it?
Also, how do you define a no-op? What if you updated a value to the same value it had before, is that a no-op or not? From the databases perspective, if it had one, it would not be a no-op. But from the application perspective, it probably would.