What is a good way of rolling back a transaction in Postgres - amazon-redshift

I want to insert data into a table from a staging table but keep the data unchanged if an error happens.
What I have is a working happy path:
Begin transaction;
DELETE FROM mytable;
INSERT INTO mytable SELECT * FROM mytable_staging ;
Commit transaction;
If the INSERT statement fails, how can I roll back the transaction?

PostgreSQL transactions roll back automatically on error. This follows from the ACID properties:
Atomicity − Ensures that all operations within the work unit are
completed successfully; otherwise, the transaction is aborted at the
point of failure and previous operations are rolled back to their
former state.
Consistency − Ensures that the database properly changes states upon a
successfully committed transaction.
Isolation − Enables transactions to operate independently of and
transparent to each other.
Durability − Ensures that the result or effect of a committed
transaction persists in case of a system failure.

You can roll back a Postgres transaction explicitly using the ROLLBACK [WORK | TRANSACTION] statement:
Begin transaction;
DELETE FROM mytable;
INSERT INTO mytable SELECT * FROM mytable_staging ;
Rollback transaction;
All the SQL commands are case-insensitive and the transaction part of the statement is optional, but I like to include it for clarity.
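If you would rather have the error caught inside the database than by the client, one option in plain PostgreSQL is to put both statements in a PL/pgSQL block with an EXCEPTION clause. This is only a minimal sketch: it assumes stock PostgreSQL 9.0 or later (the DO statement may not be available on Amazon Redshift) and the mytable / mytable_staging tables from the question.

```sql
-- Sketch only: assumes plain PostgreSQL, not necessarily Redshift.
DO $$
BEGIN
    DELETE FROM mytable;
    INSERT INTO mytable SELECT * FROM mytable_staging;
EXCEPTION WHEN OTHERS THEN
    -- Reaching this handler undoes everything done in the block above,
    -- so the DELETE is rolled back and mytable keeps its old rows.
    RAISE NOTICE 'load failed, mytable left unchanged: %', SQLERRM;
END
$$;
```

Because the handler swallows the error, the caller only sees a NOTICE. Replace the RAISE NOTICE with a bare RAISE; if the failure should still propagate.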

Related

PostgreSQL for update statement

PostgreSQL uses the Read Committed isolation level by default. Now I have a transaction which consists of a single DELETE statement, and this DELETE statement has a subquery consisting of a SELECT statement that selects the rows to delete.
Is it true that I have to use FOR UPDATE in the select statement to get no conflicts with other transaction?
My thinking is the following: First the corresponding rows are read out from the table and in a second step these rows are deleted, so another transaction could interfere.
And what about a simple DELETE FROM myTable WHERE id = 4 statement? Do I also have to use FOR UPDATE?
Is it true that I have to use FOR UPDATE in the select statement to
get no conflicts with other transaction?
What does "no conflicts with other transaction" mean to you? You can test this by opening two terminals and executing statements in each of them. Interleaved correctly, the DELETE statement will make the "other transaction" (the one that has its isolation level set to READ COMMITTED) wait until the deleting transaction commits or rolls back.
-- Terminal A
sandbox=# set transaction isolation level read committed;
SET
sandbox=# select * from customer;
 date_of_birth
---------------
 1996-09-29
 1996-09-28
(2 rows)
-- Terminal B
sandbox=# begin transaction;
BEGIN
sandbox=# delete from customer
sandbox-# where date_of_birth = '1996-09-28';
DELETE 1
-- Terminal A
sandbox=# update customer
sandbox-# set date_of_birth = '1900-01-01'
sandbox-# where date_of_birth = '1996-09-28';
(Execution pauses here, waiting for transaction in other terminal.)
-- Terminal B
sandbox=# commit;
COMMIT
sandbox=#
-- Terminal A
UPDATE 0
sandbox=#
See below for the documentation.
And what about a simple DELETE FROM myTable WHERE id = 4 statement? Do
I also have to use FOR UPDATE?
There's no such statement as DELETE . . . FOR UPDATE.
You need to be sensitive to context when you're reading about database updates. Update can mean any change to a database; it can include inserting, deleting, and updating rows. In the docs cited below, "locked as though for update" is explicitly talking about UPDATE and DELETE statements, among others.
Current docs
FOR UPDATE causes the rows retrieved by the SELECT statement to be
locked as though for update. This prevents them from being modified or
deleted by other transactions until the current transaction ends. That
is, other transactions that attempt UPDATE, DELETE, SELECT FOR UPDATE,
SELECT FOR NO KEY UPDATE, SELECT FOR SHARE or SELECT FOR KEY SHARE of
these rows will be blocked until the current transaction ends. The FOR
UPDATE lock mode is also acquired by any DELETE on a row, and also by
an UPDATE that modifies the values on certain columns. Currently, the
set of columns considered for the UPDATE case are those that have an
unique index on them that can be used in a foreign key (so partial
indexes and expressional indexes are not considered), but this may
change in the future. Also, if an UPDATE, DELETE, or SELECT FOR UPDATE
from another transaction has already locked a selected row or rows,
SELECT FOR UPDATE will wait for the other transaction to complete, and
will then lock and return the updated row (or no row, if the row was
deleted).
Short version: the FOR UPDATE in a sub-select is not necessary because the DELETE implementation already does the necessary locking. It would be redundant.
Ideally you should read and digest Concurrency Control to learn how the concurrency issues are dealt with by the SQL engine.
Specifically for the case you're mentioning, I think these couple of excerpts are the most relevant, in Read Committed Isolation Level:
UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands
behave the same as SELECT in terms of searching for target rows: they
will only find target rows that were committed as of the command start
time.
However, such a target row might have already been updated (or
deleted or locked) by another concurrent transaction by the time it is
found. In this case, the would-be updater will wait for the first
updating transaction to commit or roll back (if it is still in
progress).
So one of your two concurrent DELETEs will be made to wait as soon as it tries to delete a row that the other one has already processed. This wait ends only when the other one commits or rolls back. In a way, that means that the engine "detected the conflict" and serialized the two DELETEs in order to deal with that conflict.
If the first updater rolls back, then its effects are
negated and the second updater can proceed with updating the
originally found row. If the first updater commits, the second updater
will ignore the row if the first updater deleted it, otherwise it will
attempt to apply its operation to the updated version of the row.
In your scenario, after the first DELETE has committed and the second one is woken up, the second one will be unable to delete the row it was waiting for, because that row is no longer current: it's gone. That's not an error in this isolation level. The execution just goes on with the other rows, some of which may also have disappeared. Eventually it reports the actual number of rows deleted by this statement, which may differ from the number that the sub-select initially found before the statement was made to wait.
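To see which rows a DELETE really removed, rather than the rows its sub-select first matched, a RETURNING clause can be added. This is a rough sketch against the customer table from the session above; the date filter is made up for illustration.

```sql
-- No FOR UPDATE in the sub-select: the DELETE takes the row locks itself.
DELETE FROM customer
WHERE date_of_birth IN (
    SELECT date_of_birth
    FROM customer
    WHERE date_of_birth < DATE '1996-09-29'   -- hypothetical filter
)
RETURNING date_of_birth;  -- lists only the rows this statement deleted
```

If a concurrent transaction deleted some of the matching rows first and committed, those rows do not appear in the RETURNING output and are not counted in the DELETE n tag.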

T-SQL unfinished transaction lock

I tried to run the following query
BEGIN TRANSACTION t1
SET IDENTITY_INSERT existingTableA On
insert INTO existingTableA (columnsFromTableA)
SELECT (columnsFromIdenticalTableB) from identicalTableB
The problem is that I didn't commit the transaction explicitly, and it appears that I can no longer do so; it's still ongoing. The select operation from tableA never ends because of a lock that I cannot kill ("Distributed transaction with UOW {2B9A3B1B-F3EF-4C5A-9AD8-F434EF9EA3EC} is in prepared state. Only Microsoft Distributed Transaction Coordinator can resolve this transaction. KILL command failed.")
How do I end that transaction?
Run DBCC OPENTRAN.
Then kill the open transaction with KILL xx, where xx is the SPID that DBCC OPENTRAN reports.

Are PostgreSQL functions transactional?

Is a PostgreSQL function such as the following automatically transactional?
CREATE OR REPLACE FUNCTION refresh_materialized_view(name)
RETURNS integer AS
$BODY$
DECLARE
_table_name ALIAS FOR $1;
_entry materialized_views%ROWTYPE;
_result INT;
BEGIN
EXECUTE 'TRUNCATE TABLE ' || _table_name;
UPDATE materialized_views
SET last_refresh = CURRENT_TIMESTAMP
WHERE table_name = _table_name;
RETURN 1;
END
$BODY$
LANGUAGE plpgsql VOLATILE SECURITY DEFINER;
In other words, if an error occurs during the execution of the function, will any changes be rolled back? If this isn't the default behavior, how can I make the function transactional?
PostgreSQL 12 update: there is limited support for top-level PROCEDUREs that can do transaction control. You still cannot manage transactions in regular SQL-callable functions, so the below remains true except when using the new top-level procedures.
Functions are part of the transaction they're called from. Their effects are rolled back if the transaction rolls back. Their work commits if the transaction commits. Any BEGIN ... EXCEPTION blocks within the function operate like (and under the hood use) savepoints, as created by the SAVEPOINT and ROLLBACK TO SAVEPOINT SQL statements.
The function either succeeds in its entirety or fails in its entirety, barring BEGIN ... EXCEPTION error handling. If an error is raised within the function and not handled, the transaction calling the function is aborted. Aborted transactions cannot commit, and if they try to commit the COMMIT is treated as ROLLBACK, same as for any other transaction in error. Observe:
regress=# BEGIN;
BEGIN
regress=# SELECT 1/0;
ERROR: division by zero
regress=# COMMIT;
ROLLBACK
See how the transaction, which is in the error state due to the zero division, rolls back on COMMIT?
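The savepoint-like behaviour of a block with an exception handler can be sketched as follows. The demo_log table and demo_partial function are hypothetical names made up for this illustration.

```sql
-- Hypothetical table and function, for illustration only.
CREATE TABLE demo_log (msg text);

CREATE OR REPLACE FUNCTION demo_partial() RETURNS void AS
$BODY$
BEGIN
    INSERT INTO demo_log VALUES ('kept');
    BEGIN
        INSERT INTO demo_log VALUES ('discarded');
        PERFORM 1/0;  -- raises division_by_zero
    EXCEPTION WHEN division_by_zero THEN
        -- The inner block is rolled back to its implicit savepoint,
        -- so 'discarded' is undone while 'kept' survives.
        NULL;
    END;
END
$BODY$
LANGUAGE plpgsql;

SELECT demo_partial();
SELECT msg FROM demo_log;  -- returns only 'kept'
```

Without the inner EXCEPTION clause the error would propagate, and both inserts would be lost along with the calling transaction.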
If you call a function without an explicit surrounding transaction, the rules are exactly the same as for any other Pg statement; it behaves as if it were wrapped like this:
BEGIN;
SELECT refresh_materialized_view(name);
COMMIT;
(where COMMIT will fail if the SELECT raised an error).
PostgreSQL does not (yet) support autonomous transactions in functions, where the procedure/function could commit/rollback independently of the calling transaction. This can be simulated using a new session via dblink.
BUT, things that aren't transactional or are imperfectly transactional exist in PostgreSQL. If it has non-transactional behaviour in a normal BEGIN; do stuff; COMMIT; block, it has non-transactional behaviour in a function too. For example, nextval and setval, TRUNCATE, etc.
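For contrast, the top-level procedures mentioned in the update at the start of this answer can issue COMMIT and ROLLBACK themselves. The following is a small sketch assuming PostgreSQL 11 or later; demo_steps and demo_commit_each are invented names.

```sql
-- Hypothetical table and procedure, for illustration only.
CREATE TABLE demo_steps (n int);

CREATE PROCEDURE demo_commit_each()
LANGUAGE plpgsql
AS $$
BEGIN
    FOR i IN 1..4 LOOP
        INSERT INTO demo_steps VALUES (i);
        IF i % 2 = 0 THEN
            COMMIT;     -- keep even values
        ELSE
            ROLLBACK;   -- discard odd values
        END IF;
    END LOOP;
END
$$;

-- Must be called outside an explicit BEGIN ... COMMIT block.
CALL demo_commit_each();
SELECT n FROM demo_steps;  -- returns 2 and 4
```

The same body inside a function called with SELECT would fail at the first COMMIT, because functions cannot end the transaction they run in.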
As my knowledge of PostgreSQL is less deep than Craig Ringer's, I will try to give a shorter answer: Yes.
If you execute a function that has an error in it, none of its steps will affect the database.
The same happens if you execute a query in pgAdmin.
For example, if you execute in a query:
update your_table yt set column1 = 10 where yt.id=20;
select anything_that_do_not_exists;
The update to the row with id = 20 in your_table will not be saved in the database.
UPDATE Sep - 2018
To clarify the concept, I have made a little example with the non-transactional function nextval.
First, let's create a sequence:
create sequence test_sequence start 100;
Then, let's execute:
update your_table yt set column1 = 10 where yt.id=20;
select nextval('test_sequence');
select anything_that_do_not_exists;
Now, if we open another query and execute
select nextval('test_sequence');
We will get 101 because the first value (100) was used in the earlier query, even though the update was not committed. That is because sequences are not transactional.
https://www.postgresql.org/docs/current/static/plpgsql-structure.html
It is important not to confuse the use of BEGIN/END for grouping statements in PL/pgSQL with the similarly-named SQL commands for transaction control. PL/pgSQL's BEGIN/END are only for grouping; they do not start or end a transaction. Functions and trigger procedures are always executed within a transaction established by an outer query — they cannot start or commit that transaction, since there would be no context for them to execute in. However, a block containing an EXCEPTION clause effectively forms a subtransaction that can be rolled back without affecting the outer transaction. For more about that see Section 39.6.6.
At the function level, it is not transactional. In other words, each statement in the function belongs to a single transaction, the one created by the database's autocommit default. Autocommit is on by default. In any case, you have to call the function using
select schemaName.functionName()
The above statement 'select schemaName.functionName()' is a single transaction; let's call it T1. All the statements in the function therefore belong to T1. In this way, the function runs in a single transaction.
Postgres 14 update: All statements written between the BEGIN and END of a procedure or function body are executed in a single transaction. Thus, any error arising during execution of this block causes an automatic rollback of the transaction.
Additionally, this atomic transaction includes triggers as well.

Create a stored procedure in PostgreSQL that is never rolled back?

From the PostgreSQL 9.0 manual:
Important: To avoid blocking concurrent transactions that obtain
numbers from the same sequence, a nextval operation is never rolled
back; that is, once a value has been fetched it is considered used,
even if the transaction that did the nextval later aborts. This means
that aborted transactions might leave unused "holes" in the sequence
of assigned values. setval operations are never rolled back, either.
So, how can I create a PL/pgSQL function with the same behaviour: "operation is never rolled back"?
In a call like this, whatever the function changes will NOT be rolled back:
BEGIN;
SELECT composite_nextval(...);
ROLLBACK;
You can use a savepoint after selecting composite_nextval. Then, just rollback to that savepoint and commit the rest.
Something like this:
BEGIN;
SELECT composite_nextval(...);
SAVEPOINT my_savepoint;
INSERT INTO some_table(a) VALUES (2);
ROLLBACK TO SAVEPOINT my_savepoint;
COMMIT;
This way, select composite_nextval(...) will be committed, but insert into some_table will not.

What is the point of "ROLLBACK TRANSACTION named_transaction"?

I've read through MSDN on ROLLBACK TRANSACTION and nesting transactions. While I see the point of ROLLBACK TRANSACTION savepointname, I do not understand ROLLBACK TRANSACTION transactionname.
It only works when transactionname is the outermost transaction
ROLLBACK always rolls back the entire transaction "stack", except in the case of savepointname
Basically, as I read the documentation, except in the case of a save point, ROLLBACK rolls back all transactions (to @@TRANCOUNT=0). The only difference I can see is this snippet:
If a ROLLBACK TRANSACTION transaction_name statement using the name of
the outer transaction is executed at any level of a set of nested
transactions, all of the nested transactions are rolled back.
If a ROLLBACK WORK or ROLLBACK TRANSACTION statement without a
transaction_name parameter is executed at any level of a set of nested
transaction, it rolls back all of the nested transactions, including
the outermost transaction.
From the reading, this suggests to me that when rolling back a named transaction (which must be the name of the outermost transaction), only the nested transactions are rolled back. This would give some meaning to rolling back a named transaction. So I set up a test:
CREATE TABLE #TEMP (id varchar(50))
INSERT INTO #TEMP (id) VALUES ('NO')
SELECT id AS NOTRAN FROM #TEMP
SELECT @@TRANCOUNT AS NOTRAN_TRANCOUNT
BEGIN TRAN OUTERTRAN
INSERT INTO #TEMP (id) VALUES ('OUTER')
SELECT id AS OUTERTRAN FROM #TEMP
SELECT @@TRANCOUNT AS OUTERTRAN_TRANCOUNT
BEGIN TRAN INNERTRAN
INSERT INTO #TEMP (id) VALUES ('INNER')
SELECT id AS INNERTRAN FROM #TEMP
SELECT @@TRANCOUNT AS INNERTRAN_TRANCOUNT
ROLLBACK TRAN OUTERTRAN
IF @@TRANCOUNT > 0 ROLLBACK TRAN
SELECT id AS AFTERROLLBACK FROM #TEMP
SELECT @@TRANCOUNT AS AFTERROLLBACK_TRANCOUNT
DROP TABLE #TEMP
results in (all "X row(s) affected" stuff removed)
NOTRAN
--------------------------------------------------
NO
NOTRAN_TRANCOUNT
----------------
0
OUTERTRAN
--------------------------------------------------
NO
OUTER
OUTERTRAN_TRANCOUNT
-------------------
1
INNERTRAN
--------------------------------------------------
NO
OUTER
INNER
INNERTRAN_TRANCOUNT
-------------------
2
AFTERROLLBACK
--------------------------------------------------
NO
AFTERROLLBACK_TRANCOUNT
-----------------------
0
Note that there is no difference to the output when I change
ROLLBACK TRAN OUTERTRAN
to simply
ROLLBACK TRAN
So what is the point of ROLLBACK TRANSACTION named_transaction?
Save points are exactly what the name implies: 'save points' in the log sequence. The log sequence is always linear. If you roll back to a save point, you roll back everything your transaction did between your current log position and the save point. Consider your example:
LSN 1: BEGIN TRAN OUTERTRAN
LSN 2: INSERT INTO ...
LSN 3: BEGIN TRAN INNERTRAN
LSN 4: INSERT INTO ...
LSN 5: ROLLBACK TRAN OUTERTRAN
At Log Sequence Number (LSN) 1 the OUTERTRAN save point is created. The first INSERT creates LSN 2. Then the INNERTRAN creates a save point with LSN 3. The second INSERT creates a new LSN, 4. The ROLLBACK OUTERTRAN is equivalent to 'ROLLBACK log until the LSN 1'. You cannot 'skip' portions of the log, so you must roll back every operation in the log until LSN 1 (when the save point OUTERTRAN was created) is hit.
On the other hand, if at the last operation you issued ROLLBACK INNERTRAN, the engine would roll back to LSN 3 (where the 'INNERTRAN' save point was inserted in the log), thus preserving LSN 1 and LSN 2 (i.e. the first INSERT).
For a practical example of save points see Exception handling and nested transactions.
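For a partial rollback that can be tried directly, here is a sketch that reuses the #TEMP table from the question but creates the inner marker with SAVE TRANSACTION, the T-SQL statement that creates a named save point. The name INNERSAVE is made up for this example.

```sql
CREATE TABLE #TEMP (id varchar(50));

BEGIN TRAN OUTERTRAN;
INSERT INTO #TEMP (id) VALUES ('OUTER');

SAVE TRAN INNERSAVE;                 -- named save point, hypothetical name
INSERT INTO #TEMP (id) VALUES ('INNER');

ROLLBACK TRAN INNERSAVE;             -- undoes only the second INSERT
SELECT @@TRANCOUNT AS AFTER_PARTIAL_ROLLBACK;  -- still 1

COMMIT TRAN OUTERTRAN;
SELECT id FROM #TEMP;                -- returns only 'OUTER'

DROP TABLE #TEMP;
```

Rolling back to a save point leaves the transaction open, which is why @@TRANCOUNT stays at 1 until the COMMIT.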