PostgreSQL: prevent lock on self table update with left join

I'm on PostgreSQL 9.3. I'm the only one working on the database, and my code runs queries sequentially for unit tests.
Most of the time the following UPDATE query runs without a problem, but sometimes it blocks on the PostgreSQL server. The query then seems to never end, while it normally takes only about 3 seconds.
I should point out that the query runs in a unit-test context, i.e. the data is exactly the same whether the lock happens or not, and my code is the only process that updates the data.
I know there can be locking problems in PostgreSQL when an UPDATE joins a table to itself, especially when a LEFT JOIN is used.
I also know that a LEFT JOIN in an UPDATE can be replaced with a NOT EXISTS, but in my case the LEFT JOIN is much faster because there are few rows to update, while a NOT EXISTS would have to visit nearly all candidate rows.
So my question is: which PostgreSQL commands (like an explicit LOCK on the table) or options (like SELECT FOR UPDATE) should I use to make sure my query runs without a never-ending lock?
Query:
-- for each place of scenario #1, update all owners that
-- are different from scenario #0
UPDATE t_territories AS upt
SET id_owner = diff.id_owner
FROM (
-- list of owners in the source that are different from target
SELECT trg.id_place, src.id_owner
FROM t_territories AS trg
LEFT JOIN t_territories AS src
ON (src.id_scenario = 0)
AND (src.id_place = trg.id_place)
WHERE (trg.id_scenario = 1)
AND (trg.id_owner IS DISTINCT FROM src.id_owner)
-- FOR UPDATE -- SQL error: FOR UPDATE cannot be applied to the nullable side of an outer join
) AS diff
WHERE (upt.id_scenario = 1)
AND (upt.id_place = diff.id_place)
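Note: as the commented-out FOR UPDATE shows, row locking fails on the nullable side of an outer join. A workaround I could try (an untested sketch) is to lock only the non-nullable side with FOR UPDATE OF:
-- pre-lock only the target rows; trg is not on the nullable side
SELECT trg.id_place
FROM t_territories AS trg
LEFT JOIN t_territories AS src
ON (src.id_scenario = 0)
AND (src.id_place = trg.id_place)
WHERE (trg.id_scenario = 1)
AND (trg.id_owner IS DISTINCT FROM src.id_owner)
FOR UPDATE OF trg;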
Table structure:
CREATE TABLE t_territories
(
id_scenario integer NOT NULL,
id_place integer NOT NULL,
id_owner integer,
CONSTRAINT t_territories_pk PRIMARY KEY (id_scenario, id_place),
CONSTRAINT t_territories_fkey_owner FOREIGN KEY (id_owner)
REFERENCES t_owner (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE RESTRICT
)

I think your query was blocked by another query. You can find the blocking query with:
SELECT
COALESCE(blockingl.relation::regclass::text,blockingl.locktype) as locked_item,
now() - blockeda.query_start AS waiting_duration, blockeda.pid AS blocked_pid,
blockeda.query as blocked_query, blockedl.mode as blocked_mode,
blockinga.pid AS blocking_pid, blockinga.query as blocking_query,
blockingl.mode as blocking_mode
FROM pg_catalog.pg_locks blockedl
JOIN pg_stat_activity blockeda ON blockedl.pid = blockeda.pid
JOIN pg_catalog.pg_locks blockingl ON(
( (blockingl.transactionid=blockedl.transactionid) OR
(blockingl.relation=blockedl.relation AND blockingl.locktype=blockedl.locktype)
) AND blockedl.pid != blockingl.pid)
JOIN pg_stat_activity blockinga ON blockingl.pid = blockinga.pid
AND blockinga.datid = blockeda.datid
WHERE NOT blockedl.granted
AND blockinga.datname = current_database()
I found this query here: http://big-elephants.com/2013-09/exploring-query-locks-in-postgres/
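Once you know the blocking pid, you can cancel that query or, more drastically, terminate its backend (12345 below is a placeholder for the blocking_pid returned by the query above):
SELECT pg_cancel_backend(12345);    -- cancel the current query only
SELECT pg_terminate_backend(12345); -- kill the whole session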
You can also take an ACCESS EXCLUSIVE lock to prevent any other query from reading or writing table t_territories:
LOCK t_territories IN ACCESS EXCLUSIVE MODE;
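Note that LOCK can only be used inside a transaction block, and the lock is held until commit, so a minimal sketch would be:
BEGIN;
LOCK t_territories IN ACCESS EXCLUSIVE MODE;
-- run the UPDATE from the question here
COMMIT;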
More info about locks here https://www.postgresql.org/docs/9.1/static/explicit-locking.html

Related

When is it better to use a CTE or a temp table in Postgres?

I am doing a query on a very large data set and I am using WITH (CTE) syntax. This seems to take a while, and I was reading online that temp tables could be faster in these cases. Can someone advise me which direction to go? In the CTE we join to a lot of tables, then we filter on the CTE result.
Only interested in Postgres answers.
What version of PostgreSQL are you using? CTEs perform differently in PostgreSQL versions 11 and older than in versions 12 and above.
In PostgreSQL 11 and older, CTEs are optimization fences: restrictions from the outer query are not passed into the CTE. The database evaluates the query inside the CTE and caches the results (i.e., materializes them), and outer WHERE clauses are applied only later, when the outer query is processed. This means either a full table scan or a full index scan is performed, which results in horrible performance for large tables. To avoid this, apply as many filters as possible in the WHERE clause inside the CTE:
WITH UserRecord AS (SELECT * FROM Users WHERE Id = 100)
SELECT * FROM UserRecord;
PostgreSQL 12 addresses this problem by introducing planner hints that let us control whether the CTE should be materialized or not: MATERIALIZED and NOT MATERIALIZED.
WITH AllUsers AS NOT MATERIALIZED (SELECT * FROM Users)
SELECT * FROM AllUsers WHERE Id = 100;
Note: Text and code examples are taken from my book Migrating your SQL Server Workloads to PostgreSQL
Summary:
PostgreSQL 11 and older: Use Subquery
PostgreSQL 12 and above: Use CTE with NOT MATERIALIZED clause
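To illustrate the pre-12 advice: the CTE above can be written as a plain subquery (same hypothetical Users table), which lets the planner push the filter down:
SELECT * FROM (SELECT * FROM Users) AS AllUsers WHERE Id = 100;
-- which the planner flattens to the equivalent of:
SELECT * FROM Users WHERE Id = 100;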
My follow-up is more than I can fit in a comment, so understand this may not be an answer to the OP per se.
Take the following query, which uses a CTE:
with sales as (
select item, sum (qty) as sales_qty, sum (revenue) as sales_revenue
from sales_data
where country = 'USA'
group by item
),
inventory as (
select item, sum (on_hand_qty) as inventory_qty
from inventory_data
where country = 'USA' and on_hand_qty != 0
group by item
)
select
a.item, a.description, s.sales_qty, s.sales_revenue,
i.inventory_qty, i.inventory_qty * a.cost as inventory_cost
from
all_items a
left join sales s on
a.item = s.item
left join inventory i on
a.item = i.item
There are times when I cannot explain why the query runs slower than I would expect. Sometimes simply materializing the CTEs makes it run better, as expected. Other times it does not, but when I do this:
drop table if exists sales;
drop table if exists inventory;
create temporary table sales as
select item, sum (qty) as sales_qty, sum (revenue) as sales_revenue
from sales_data
where country = 'USA'
group by item;
create temporary table inventory as
select item, sum (on_hand_qty) as inventory_qty
from inventory_data
where country = 'USA' and on_hand_qty != 0
group by item;
select
a.item, a.description, s.sales_qty, s.sales_revenue,
i.inventory_qty, i.inventory_qty * a.cost as inventory_cost
from
all_items a
left join sales s on
a.item = s.item
left join inventory i on
a.item = i.item;
Suddenly all is right in the world.
Temp tables in PostgreSQL are private to the session that created them: both the data and the table structure are dropped automatically when the session ends. Within a session they do persist from query to query, though, which is why to be safe I always drop first:
drop table if exists sales;
And use "if exists" to avoid any errors about the object not existing.
I rarely use these in common queries, for the simple reason that they are not as portable as a single SQL statement (you can't hand the final query to another user without also giving them the temp-table setup). My most common use case is processing within a procedure/function:
create procedure sales_and_inventory()
language plpgsql
as
$BODY$
BEGIN
create temp table sales...
insert into sales_inventory
select ...
drop table sales;
END;
$BODY$;
Hopefully this helps.
Also, to answer your question about indexes: typically I don't, but nothing says that's always the right answer. If I put data into a temp table, I assume I'm going to use all or most of it. That said, if you plan to query it multiple times with conditions where an index makes sense, then by all means do it.
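If you do add an index, it is plain PostgreSQL against the temp table; a sketch reusing the sales table from above:
create index on sales (item); -- postgres generates an index name
analyze sales;                -- refresh planner statistics for the new table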

Does Postgres lock all rows in a query atomically, even across different tables via JOIN?

I am getting a deadlock error in my code. The issue is that the deadlock happens on the very first query of the transaction. This query joins two tables, TableA and TableB, and should lock a single row in TableA with id == table_a_id, plus all the rows in TableB that have a foreign key to that table_a_id.
The query looks as follows (I am using SQLAlchemy; this output comes from printing the equivalent query, whose code is also shown below):
SELECT TableB.id AS TableB_id
FROM TableA JOIN TableB ON TableA.id = TableB.table_a_id
WHERE TableB.id = %(id_1)s FOR UPDATE
The query looks as follows in SQLAlchemy syntax:
query = (
database.query(TableB.id)
.select_from(TableA)
.filter_by(id=table_a_id)
.join((TableB, TableA.id == TableB.table_a_id))
.with_for_update()
)
return query.all()
My question is: will this query atomically lock all those rows from both tables? If so, why would I get a deadlock on exactly this query, given that it's the first query of the transaction?
The query will lock the rows one after another as they are selected; the exact order depends on the execution plan. Perhaps you can add FOR UPDATE OF table_name to lock rows only in the table where you need them locked.
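For example, a sketch of that suggestion applied to the query above (42 is a placeholder for table_a_id; in SQLAlchemy this corresponds to with_for_update(of=TableA)):
SELECT TableB.id AS TableB_id
FROM TableA JOIN TableB ON TableA.id = TableB.table_a_id
WHERE TableA.id = 42
FOR UPDATE OF TableA; -- row locks are taken on TableA only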
I have two more ideas:
rewrite the query so that it locks the rows in a certain order:
WITH b AS MATERIALIZED (
SELECT id, table_a_id
FROM tableb
WHERE id = 42
FOR NO KEY UPDATE
)
SELECT b.id
FROM tablea
WHERE EXISTS (SELECT 1 FROM b
WHERE tablea.id = b.table_a_id)
ORDER BY tablea.id
FOR NO KEY UPDATE;
Performance may not be as good, but if everybody selects like that, you won't get a deadlock.
lock the tables:
LOCK TABLE tablea, tableb IN EXCLUSIVE MODE;
That lock will prevent concurrent row locks and data modifications, so you will be safe from a deadlock.
Only do that as a last-ditch effort, and don't do it too often. If you frequently take high table locks like that, you keep autovacuum from running and endanger the health of your database.

PostgreSQL - How to write a condition on records between the current record's date and the same date plus 5 minutes?

I have something like this. With this piece of code I detect whether a vehicle has stopped for at least 5 minutes.
It works, but with a large amount of data it starts to get slow.
I did a lot of tests and I'm sure that my problem is in the NOT EXISTS block.
My table:
CREATE TABLE public.messages
(
id bigint PRIMARY KEY DEFAULT nextval('messages_id_seq'::regclass),
messagedate timestamp with time zone NOT NULL,
vehicleid integer NOT NULL,
driverid integer NOT NULL,
speedeffective double precision NOT NULL,
-- ... few nonsense properties
)
WITH (
OIDS=FALSE
);
ALTER TABLE public.messages OWNER TO postgres;
CREATE INDEX idx_messages_1 ON public.messages
USING btree (vehicleid, messagedate);
And my query:
SELECT
*
FROM
messages m
WHERE
m.speedeffective > 0
and m.next_speedeffective = 0
and not exists( -- my problem
select id
from messages
where
vehicleid = m.vehicleid
and speedeffective > 5 -- I forgot this condition
and messagedate > m.messagedate
and messagedate <= m.messagedate + interval '5 minutes'
)
I can't figure out how to build the condition in a more performant way.
Edit, day 2:
I added a CTE like this, to pre-filter the rows used by the subquery:
WITH messagesx as (
SELECT
vehicleid,
messagedate
FROM
messages
WHERE
speedeffective > 5
)
and now it works better. I think I'm just missing a little detail.
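For completeness, a sketch of how that CTE might plug back into the original query (my guess at the missing detail):
WITH messagesx as (
SELECT vehicleid, messagedate
FROM messages
WHERE speedeffective > 5
)
SELECT *
FROM messages m
WHERE m.speedeffective > 0
and m.next_speedeffective = 0
and not exists(
select 1
from messagesx x
where x.vehicleid = m.vehicleid
and x.messagedate > m.messagedate
and x.messagedate <= m.messagedate + interval '5 minutes'
)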
Typically, a NOT EXISTS can slow down your query when it requires a scan of the table for each of the outer rows. Try to express the same logic as a join (I'm rewriting the query here without knowing the table, so I might make a mistake):
SELECT
*
FROM
messages m1
LEFT JOIN
messages m2
ON m1.vehicleid = m2.vehicleid
AND m2.speedeffective > 5
AND m2.messagedate > m1.messagedate
AND m2.messagedate <= m1.messagedate + interval '5 minutes'
WHERE
m1.speedeffective > 0
and m1.next_speedeffective = 0
and m2.vehicleid IS NULL
Take note that the NOT EXISTS is rewritten as a miss of the join condition: we keep only the m1 rows for which no matching m2 row exists (an anti-join).
Based on this answer: https://stackoverflow.com/a/36445233/5000827
and on reading about NOT IN, NOT EXISTS and LEFT JOIN (where the joined row is NULL):
for PostgreSQL, NOT EXISTS and LEFT JOIN are both executed as anti-joins and work the same way. (This is why @CountZukula's answer gives almost the same result as mine.)
The problem was the kind of join operation the planner chose: nested loop vs. hash.
So, based on this: https://www.postgresql.org/docs/9.6/static/routine-vacuuming.html
PostgreSQL's VACUUM command has to process each table on a regular basis for several reasons:
To recover or reuse disk space occupied by updated or deleted rows.
To update data statistics used by the PostgreSQL query planner.
To update the visibility map, which speeds up index-only scans.
To protect against loss of very old data due to transaction ID wraparound or multixact ID wraparound.
I ran VACUUM ANALYZE on the messages table, and the same query now runs much faster.
So, with fresh statistics from VACUUM ANALYZE, PostgreSQL's planner can make better decisions.
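To reproduce the check, a minimal sketch (plans will differ per data set):
VACUUM ANALYZE messages;
-- then re-run the query under EXPLAIN (ANALYZE) and compare the join strategy chosen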

When is a Deadlock not a Deadlock?

I'm asking this question because I'm getting a deadlock from time to time that I don't understand.
This is the scenario:
Stored Procedure that updates table A:
UPDATE A
SET A.Column = @SomeValue
WHERE A.ID = @ID
Stored Procedure that inserts into a temp table #temp:
INSERT INTO #temp (Column1,Column2)
SELECT B.Column1, A.Column2
FROM B
INNER JOIN A
ON A.ID = B.ID
WHERE B.Code IN ('Something','SomethingElse')
I see that there could possibly be a lock wait but I fail to see how a deadlock would occur, am I missing something obvious?
EDIT:
The SPs I typed here are obviously simplified versions, but they show the columns involved. The structure of both tables would be:
CREATE TABLE A (ID INT IDENTITY
PRIMARY KEY,
[Column] VARCHAR (100))
CREATE TABLE B (ID INT IDENTITY
PRIMARY KEY,
Code VARCHAR (100))
Since it's causing locks, try specifying the table hint keyword for each table name:
WITH(NOLOCK)
So something like this for your scenario:
INSERT INTO #temp (Column1,Column2)
SELECT B.Column1, A.Column2
FROM B WITH(NOLOCK)
INNER JOIN A WITH(NOLOCK)
ON A.ID = B.ID
WHERE B.Code IN ('Something','SomethingElse')
See how you go then.
You can also look up table hints for T-SQL / SQL Server to see which one suits you best. The one I specified, NOLOCK, will not take shared locks and will read rows even if another process has them locked (dirty reads), so if you don't mind that, you can use it.
I am not sure about temp tables, but you can also use table hints with INSERT: INSERT INTO ... WITH(TABLE_HINT).
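For illustration, the hint syntax on an INSERT target looks like this (TABLOCK is just an example hint here; whether it helps depends on your workload):
INSERT INTO #temp WITH (TABLOCK) (Column1,Column2)
SELECT B.Column1, A.Column2
FROM B WITH (NOLOCK)
INNER JOIN A WITH (NOLOCK)
ON A.ID = B.ID
WHERE B.Code IN ('Something','SomethingElse')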

How to do a safe "SELECT FOR UPDATE" with a WHERE condition over multiple tables on a DB2?

Problem
On a DB2 (version 9.5) the SQL statement
SELECT o.Id FROM Table1 o, Table2 x WHERE [...] FOR UPDATE WITH RR
gives me the error message SQLSTATE=42829 (The FOR UPDATE clause is not allowed because the table specified by the cursor cannot be modified).
Additional info
I need to specify WITH RR, because I'm running on isolation level READ_COMMITTED, but I need my query to block while there is another process running the same query.
Solution so far...
If I instead query like this:
SELECT t.Id FROM Table t WHERE t.Id IN (
SELECT o.Id FROM Table1 o, Table2 x WHERE [...]
) FOR UPDATE WITH RR
everything works fine.
New problem
But now I occasionally get deadlock exceptions when multiple processes perform this query simultaneously.
Question
Is there a way to formulate the FOR UPDATE query without introducing a place where a deadlock can occur?
First: since you are running at isolation level READ_COMMITTED, you do not need to specify WITH RR, which would raise the isolation level to SERIALIZABLE; specifying WITH RS (Read Stability) is enough.
To propagate the FOR UPDATE WITH RS into the inner select, you additionally have to specify USE AND KEEP UPDATE LOCKS.
So the complete statement looks like this:
SELECT t.Id FROM Table t WHERE t.Id IN (
SELECT o.Id FROM Table1 o, Table2 x WHERE [...]
) FOR UPDATE WITH RS USE AND KEEP UPDATE LOCKS
I ran some tests against a DB2 via JDBC, and it worked without deadlocks.