redshift serializable-isolation-violation due to select and insert - amazon-redshift

I have two threads:
thread1:
select * from A;
thread2:
insert into A values(...)
When these two threads run in parallel, Redshift raises a serializable isolation violation.
In my understanding, this violation should only be triggered by concurrent writes. Can anyone provide some guidance on why it happens in a select/insert use case?
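A minimal sketch of the scenario (assuming each thread opens an explicit transaction, so the two transactions overlap in time; the inserted value is hypothetical):
-- thread 1
BEGIN;
SELECT * FROM A;
COMMIT;

-- thread 2, running concurrently
BEGIN;
INSERT INTO A VALUES (1);
COMMIT;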

Related

How do you handle errors and commits in Postgres

I am using Postgres 13.5 and I am unsure how to combine commit and error handling in a stored procedure or DO block. I know that if I include the EXCEPTION clause in my block, then I cannot include a commit.
I am new to Postgres. It has also been over 15 years since I have written SQL that was working with transactions. When I was working with transactions I was using Oracle and recall using AUTONOMOUS_TRANSACTION to resolve some of these issues. I am just not sure how to do something like that in Postgres.
Here is a very simplified DO block. As I said above, I know that the Commits will cause the procedure to throw an exception. But if I remove the EXCEPTION clause, how will I trap an error if it happens? After reading many things, I still have not found a solution, so I must be misunderstanding something.
DO
$$
DECLARE
    v_start        timestamptz;
    v_id           integer;
    v_message_type varchar(500);
BEGIN
    select current_timestamp into v_start;
    select q.id, q.message_type into v_id, v_message_type from message_queue q;
    call load_data(v_id, v_message_type);
    commit; -- if Load_Data completes successfully, I want to commit the data
    insert into log (id, message_type, status, start, "end")
    values (v_id, v_message_type, 'Success', v_start, current_timestamp);
    commit; -- commit the log insert for success
EXCEPTION
    WHEN others THEN
        insert into log (id, message_type, status, start, "end", error_message)
        values (v_id, v_message_type, 'Failure', v_start, current_timestamp,
                SQLERRM || ', ' || SQLSTATE);
        commit; -- commit the log insert for failure
END;
$$;
Thanks!
Since this is a pattern that I will have to repeat tens of times, I want to understand the right way to do this.
Since you cannot use transaction management statements in a subtransaction, you will have to move part of the processing to the client side.
But your sample code doesn't need any transaction management at all! Simply remove all the COMMIT statements, and the procedure will work just as you want it to. Remember that PostgreSQL runs in autocommit mode, so your procedure call from the client will automatically run in its own transaction and commit when it is done.
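For illustration, here is the block from the question with all transaction control removed (a minimal sketch; the table, column, and procedure names are taken from the question):
DO
$$
DECLARE
    v_start        timestamptz;
    v_id           integer;
    v_message_type varchar(500);
BEGIN
    v_start := current_timestamp;
    select q.id, q.message_type into v_id, v_message_type from message_queue q;
    call load_data(v_id, v_message_type);
    -- no COMMIT: the surrounding autocommit transaction commits once
    -- the block finishes without error
    insert into log (id, message_type, status, start, "end")
    values (v_id, v_message_type, 'Success', v_start, current_timestamp);
EXCEPTION
    WHEN others THEN
        -- everything since BEGIN is rolled back to this block's implicit
        -- savepoint; the failure row below still commits with the transaction
        insert into log (id, message_type, status, start, "end", error_message)
        values (v_id, v_message_type, 'Failure', v_start, current_timestamp,
                SQLERRM || ', ' || SQLSTATE);
END;
$$;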
But perhaps your sample code is simplified, and you would like more complicated processing (looping etc.) in your actual use cases. So let's discuss your options:
One option is to remove the EXCEPTION handler and move only that part to the client side: if the procedure causes an error, roll back and insert a log message. Another, perhaps cleaner, method is to move the whole transaction management to the client side. In that case, you would replace the complete procedure with client code and call load_data directly from client code.
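A hedged sketch of the second option (values are hypothetical; the client owns the transaction and the error handling, and calls load_data directly):
BEGIN;
CALL load_data(17, 'new_order');
INSERT INTO log (id, message_type, status, start, "end")
VALUES (17, 'new_order', 'Success', current_timestamp, clock_timestamp());
COMMIT;
-- current_timestamp is frozen at transaction start and clock_timestamp()
-- is the wall clock, so the pair gives the start and end times of the run.
-- On any error the client issues ROLLBACK, then inserts a 'Failure' row
-- into log in a fresh transaction.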

Prevent Deadlock Errors with Trigger on high concurrent write table

I have a table that is getting around 1000+ inserts per minute. There is a trigger on it to update a column on another table.
CREATE OR REPLACE FUNCTION clothing_price_update() RETURNS trigger AS $clothing_price_update$
BEGIN
    INSERT INTO clothes (clothing_id, last_price, sale_date)
    VALUES (NEW.clothing_id, NEW.price, NEW."timestamp")
    ON CONFLICT (clothing_id) DO UPDATE
    SET last_price = NEW.price, sale_date = NEW."timestamp";
    RETURN NEW;
END;
$clothing_price_update$ LANGUAGE plpgsql;
CREATE TRIGGER clothing_price_update_trigger BEFORE INSERT OR UPDATE ON sales
FOR EACH ROW EXECUTE PROCEDURE clothing_price_update();
However, I'm randomly getting a Deadlock error. This seems pretty straightforward and there are no other triggers in play. Am I missing something?
sales has data constantly being inserted into it, but it relies on no other tables and no updates occur once data has been added.
Going out on a limb, the typical root cause for deadlocks is that the order of written (locked) rows is inconsistent among concurrent transactions.
Imagine two exactly concurrent transactions:
T1:
INSERT INTO sales(clothing_id, price, timestamp) VALUES
(1, 11, '2000-1-1')
, (2, 22, '2000-2-1');
T2:
INSERT INTO sales(clothing_id, price, timestamp) VALUES
(2, 23, '2000-2-1')
, (1, 12, '2000-1-1');
T1 locks the row with `clothing_id = 1` in `sales` and `clothes`.
T2 locks the row with `clothing_id = 2` in `sales` and `clothes`.
T1 waits for T2 to release locks for `clothing_id = 2`.
T2 waits for T1 to release locks for `clothing_id = 1`.
💣 Deadlock.
Typically, deadlocks are still extremely unlikely, as the time window is so narrow, but with bigger sets / more concurrent transactions / longer transactions / more expensive writes / added cycles for triggers (!) etc. they get more likely.
The trigger itself is not the cause in this scenario (unless it introduces writes out of order!), it only increases the probability of a deadlock actually happening.
The cure is to insert rows in a consistent sort order within the same transaction, and most importantly within the same command. Then the next transaction will wait in line until the first one finishes (COMMIT or ROLLBACK) and releases its locks. The manual:
The best defense against deadlocks is generally to avoid them by being
certain that all applications using a database acquire locks on
multiple objects in a consistent order.
See:
How to simulate deadlock in PostgreSQL?
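Applied to the example above, the cure means sorting the rows of each multi-row INSERT; a minimal sketch, reusing T2's values:
-- T2 rewritten so its rows are written in the same order as T1's:
INSERT INTO sales (clothing_id, price, "timestamp")
SELECT *
FROM (
    VALUES
        (2, 23, timestamptz '2000-02-01')
      , (1, 12, timestamptz '2000-01-01')
) AS v (clothing_id, price, "timestamp")
ORDER BY clothing_id;  -- both transactions now lock clothing_id = 1 first
The trigger then upserts into clothes in the same ascending order, so the two transactions queue behind each other instead of deadlocking.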
Long-running transactions typically add to the problem. See:
Table Locking in PostgreSQL
Aside, you use:
ON CONFLICT (clothing_id) DO UPDATE set last_price = NEW.price ...
You may want to use EXCLUDED instead of NEW here:
ON CONFLICT (clothing_id) DO UPDATE set last_price = EXCLUDED.price ...
Subtle difference: this way, effects of possible triggers ON INSERT are carried over, while pasting NEW again overwrites that. Related:
How to UPSERT multiple rows with individual values in one statement?

Which explicit lock to use for a trigger?

I am trying to understand which type of a lock to use for a trigger function.
Simplified function:
CREATE OR REPLACE FUNCTION max_count() RETURNS TRIGGER AS
$$
DECLARE
    max_row INTEGER := 6;
    association_count INTEGER := 0;
BEGIN
    LOCK TABLE my_table IN ROW EXCLUSIVE MODE;
    SELECT INTO association_count COUNT(*)
    FROM my_table
    WHERE user_id = NEW.user_id;
    IF association_count > max_row THEN
        RAISE EXCEPTION 'Too many rows';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE CONSTRAINT TRIGGER my_max_count
AFTER INSERT OR UPDATE ON my_table
DEFERRABLE INITIALLY DEFERRED
FOR EACH ROW
EXECUTE PROCEDURE max_count();
I was initially planning to use EXCLUSIVE, but it feels too heavy. What I really want is to ensure that during this function's execution no new rows are added to the table with the concerned user_id.
If you want to prevent concurrent transactions from modifying the table, a SHARE lock would be correct. But that could lead to a deadlock if two such transactions run at the same time — each has modified some rows and is blocked by the other one when it tries to escalate the table lock.
Moreover, all table locks that conflict with SHARE UPDATE EXCLUSIVE will lead to autovacuum cancellation, which causes table bloat when it happens too often.
So stay away from table locks, they are usually the wrong thing.
The better way to go about this is to use no explicit locking at all, but to use the SERIALIZABLE isolation level for all transactions that access this table.
Then you can simply use your trigger (without lock), and no anomalies can occur. If you get a serialization error, repeat the transaction.
This comes with a certain performance penalty, but allows more concurrency than a table lock. It also avoids the problems described in the beginning.
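A minimal sketch of that pattern (the inserted value is hypothetical; the retry loop lives in the client):
-- every transaction that touches my_table runs at SERIALIZABLE
BEGIN ISOLATION LEVEL SERIALIZABLE;
INSERT INTO my_table (user_id) VALUES (42);  -- fires the constraint trigger
COMMIT;
-- if any statement fails with SQLSTATE 40001 (serialization_failure),
-- often at COMMIT because the trigger is deferred, rerun the transaction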

PostgreSQL concurrent update selects

I am attempting to implement some sort of update-select for a job queue. I need it to support concurrent processes affecting the same table or database. This server will be used only for the queue, so a database per queue is acceptable. Originally I was thinking about something like the following:
UPDATE queue SET state = 1, ts = now() WHERE id IN (SELECT id FROM queue WHERE state = 0 LIMIT X) RETURNING *;
I have been reading that this will cause a race condition. I read that there is an option for the SELECT subquery to use FOR UPDATE, but then that will lock the rows and concurrent calls will block, whereas I would not mind if they skipped over to the next unlocked row.
So what I am asking for is the best way to build a FIFO queue in Postgres with the least amount of locking.
The typical way to do this is to wrap it in a PLPGSQL function, select FOR UPDATE NOWAIT, and then use exception handling to skip the locked rows.
This does place some additional overhead on the function because exception handling requires additional processor cycles to manage even if there are no exceptions.
As a very simple example:
CREATE OR REPLACE FUNCTION get_all_unlocked_customers() RETURNS SETOF customer
LANGUAGE plpgsql AS
$$
BEGIN
    RETURN QUERY SELECT * FROM customer FOR UPDATE NOWAIT;
EXCEPTION
    WHEN lock_not_available THEN
        NULL;  -- nothing to do: return an empty set if any row is locked
END;
$$;
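Adapted to the queue from the question, the same pattern might look like the sketch below (assumptions: the queue table has id, state, and ts columns as described, and the function name claim_jobs is made up):
CREATE OR REPLACE FUNCTION claim_jobs() RETURNS SETOF queue
LANGUAGE plpgsql AS
$$
BEGIN
    RETURN QUERY
    UPDATE queue
    SET state = 1, ts = now()
    WHERE id IN (SELECT id FROM queue WHERE state = 0 FOR UPDATE NOWAIT)
    RETURNING *;
EXCEPTION
    WHEN lock_not_available THEN
        NULL;  -- a candidate row is locked by another worker:
               -- give up on the whole batch and let the caller retry
END;
$$;
Note the limitation: NOWAIT aborts the whole query on the first locked row, so a blocked worker claims nothing until it retries.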

DB2 deadlock timeout Sqlstate: 40001, reason code 68 due to update statements called from servlet using SQL

I am calling update statements one after the other from a servlet to DB2. I am getting error SQLSTATE 40001, reason code 68, which I found is due to a deadlock timeout.
How can I resolve this issue?
Can it be resolved by setting a query timeout?
If yes, how do I use it with the update statements in the servlet, and where?
Reason code 68 already tells you this is due to a lock timeout (deadlock is reason code 2). It could be due to other users running queries at the same time that use the same data you are accessing, or to your own multiple updates.
Begin by running db2pd -db locktest -locks show detail from a db2 command line to see where the locks are. You'll then need to run something like:
select tabschema, tabname, tableid, tbspaceid
from syscat.tables where tbspaceid = # and tableid = #
filling in the # symbols with the ID numbers you get from the db2pd command output.
Once you see where the locks are, here are some tips:
Deadlock frequency can sometimes be reduced by ensuring that all applications access their common data in the same order – meaning, for example, that they access (and therefore lock) rows in Table A, followed by Table B, followed by Table C, and so on.
taken from: http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.trb.doc/doc/t0055074.html
recommended reading: http://www.ibm.com/developerworks/data/library/techarticle/dm-0511bond/index.html
Addendum: if your servlet or another guilty application uses select statements found to be involved in the deadlock, you can try appending WITH UR to those select statements if accuracy of the newly updated (or inserted) data isn't important.
For me, the solution was adding FOR READ ONLY WITH UR at the end of all my SELECT statements. (Apparently my select statements were returning so much data that they locked the tables long enough to interfere with other SQL statements.)
See https://www.ibm.com/support/knowledgecenter/SSEPEK_10.0.0/sqlref/src/tpc/db2z_sql_isolationclause.html
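For example (a hypothetical query; the table and column names are made up):
-- read uncommitted data so the SELECT takes no row locks
SELECT order_id, status
FROM orders
WHERE customer_id = 42
FOR READ ONLY WITH UR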