What could be the reason that SELECT queries performed inside a transaction are really slow,
even though the queries filter only on the primary key?
I have two tables. After updating one row in table1, a query on table2 within the same transaction is really slow,
even though the query on table2 is just
select * from table2 where id = 10
table1 and table2 both have a lot of rows.
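One way to narrow this down is to run the slow statement under EXPLAIN from inside the same transaction and check whether the primary-key index is actually being used; a minimal diagnostic sketch, assuming the id from the query above:
BEGIN;
-- the update on table1 that precedes the slow select, e.g.:
-- UPDATE table1 SET ... WHERE ...;
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM table2 WHERE id = 10;
ROLLBACK;
If the plan shows an index scan but the timing is still high, the slowness is likely coming from something other than the lookup itself.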
I am getting a deadlock error in my code. The issue is that the deadlock happens on the very first query of the transaction. This query joins two tables, TableA and TableB, and should lock a single row in TableA with id == table_a_id, as well as all the rows in TableB that have a foreign key referencing that table_a_id.
The query looks as follows (I am using SQLAlchemy; this output comes from printing the equivalent query, and the corresponding code is shown below as well):
SELECT TableB.id AS TableB_id
FROM TableA JOIN TableB ON TableA.id = TableB.table_a_id
WHERE TableB.id = %(id_1)s FOR UPDATE
The query looks as follows in SQLAlchemy syntax:
query = (
    database.query(TableB.id)
    .select_from(TableA)
    .filter_by(id=table_a_id)
    .join((TableB, TableA.id == TableB.table_a_id))
    .with_for_update()
)
return query.all()
My question is: will this query atomically lock all those rows from both tables? If so, why would I get a deadlock on exactly this query, given that it is the first query of the transaction?
The query will lock the rows one after the other as they are selected. The exact order will depend on the execution plan. Perhaps you can add FOR UPDATE OF table_name to lock rows only in the table where you need them locked.
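For example, the generated SQL from the question could be given a narrower locking clause (a sketch; adjust the table name, or names, to whichever rows actually need locking):
SELECT TableB.id AS TableB_id
FROM TableA JOIN TableB ON TableA.id = TableB.table_a_id
WHERE TableB.id = %(id_1)s
FOR UPDATE OF TableB;  -- lock only the TableB rows, not the joined TableA row
In SQLAlchemy this can be expressed via the of parameter of with_for_update(), e.g. .with_for_update(of=TableB).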
I have two more ideas:
rewrite the query so that it locks the rows in a certain order:
WITH b AS MATERIALIZED (
    SELECT id, table_a_id
    FROM tableb
    WHERE id = 42
    FOR NO KEY UPDATE
)
SELECT tablea.id
FROM tablea
WHERE EXISTS (SELECT 1
              FROM b
              WHERE tablea.id = b.table_a_id)
ORDER BY tablea.id
FOR NO KEY UPDATE;
Performance may not be as good, but if everybody selects like that, you won't get a deadlock.
lock the tables:
LOCK TABLE tablea, tableb IN EXCLUSIVE MODE;
That lock will prevent concurrent row locks and data modifications, so you will be safe from a deadlock.
Only do that as a last-ditch effort, and don't do it too often. If you frequently take high table locks like that, you keep autovacuum from running and endanger the health of your database.
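If you do go down that road, keep the transaction that takes the lock as short as possible, because the table locks are held until commit; a minimal sketch:
BEGIN;
LOCK TABLE tablea, tableb IN EXCLUSIVE MODE;
-- perform the row lookups and updates here
COMMIT;  -- both table locks are released at commit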
I am submitting 3 million records from a staging table (table2) into table1 in Postgres. My update and insert queries are below:
UPDATE table1 t
SET col1 = stage.col1,
    col2 = stage.col2,
    col3 = stage.col3
FROM table2 stage
WHERE t.id::uuid = stage.id::uuid
  AND coalesce(t.name, 'name') = coalesce(stage.name, 'name')
  AND coalesce(t.level, 'level') = coalesce(stage.level, 'level');
INSERT INTO table1
    (col1, col2, col3, col4, id, name, level)
SELECT stage.col1,
       stage.col2,
       stage.col3,
       stage.col4,
       stage.id,
       stage.name,
       stage.level
FROM table2 stage
WHERE NOT EXISTS (SELECT 1
                  FROM table1 t
                  WHERE t.id::uuid = stage.id::uuid
                    AND coalesce(t.name, 'name') = coalesce(stage.name, 'name')
                    AND coalesce(t.level, 'level') = coalesce(stage.level, 'level'));
I am facing performance issues (the load takes about 1.5 hours) even though I am using exactly the same indexed keys (btree) as defined on the table. To test the cause, I created a replica of table1 without indexes and was able to load the entire data set in roughly 15-17 minutes, so I am inclined to think the indexes are killing the performance, as there are so many of them (including some unused indexes that I cannot drop due to permission issues). I am looking for suggestions to improve/optimize my query, or perhaps a different strategy for upserting the data, to reduce the load time. Any suggestion is appreciated.
Running EXPLAIN ANALYZE on the query helped me realize it was never using the indexes defined on the target table and was doing a sequential scan over a large number of rows. The cause was that one of the keys used in the update/insert was wrapped in coalesce() in the query but not in the index definition. It means I have to handle NULLs properly before feeding data into my code, but it improved the performance significantly. I am open to further improvements.
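If the coalesce() expressions have to stay in the join conditions, another option (a sketch, assuming the column names from the queries above) is an expression index on the target table that matches those expressions exactly, so the planner can use an index scan instead of a sequential scan:
CREATE INDEX ON table1 (
    (id::uuid),
    (coalesce(name, 'name')),
    (coalesce(level, 'level'))
);
The planner will only consider such an index when the query uses exactly the same expressions.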
Imagine I have this SQL query, and table2 is HUGE.
select product_id, count(product_id)
from table1
where table2_ptr_id in (select id
                        from table2
                        where author is not null)
Will SQL first execute the subquery and load all of table2 into memory? For example, if table1 has 10 rows and table2 has 10 million rows, would it be better to join first and then filter? Or is the database smart enough to optimize the query as written?
You have to EXPLAIN the query to know what it is doing.
However, your query will likely perform better in PostgreSQL if you rewrite it to
SELECT product_id
FROM table1
WHERE EXISTS (SELECT 1
              FROM table2
              WHERE table2.id = table1.table2_ptr_id
                AND table2.author IS NOT NULL);
Then PostgreSQL can use a semi-join, which will probably perform much better with a huge table2.
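To see which plan you actually get, prefix the rewritten query with EXPLAIN (or EXPLAIN (ANALYZE, BUFFERS) for actual row counts and timing); with a huge table2 you would hope to see a semi-join rather than a plan that materializes the whole subquery:
EXPLAIN (ANALYZE, BUFFERS)
SELECT product_id
FROM table1
WHERE EXISTS (SELECT 1
              FROM table2
              WHERE table2.id = table1.table2_ptr_id
                AND table2.author IS NOT NULL);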
Remark: the count in your query doesn't make any sense to me.
I'm trying to implement a many-to-many relationship using PostgreSQL's array type, because it scales better for my use case than a join table would. I have two tables: table1 and table2. table1 is the parent in the relationship and has the column child_ids bigint[] default array[]::bigint[]. A single row in table1 can have upwards of tens of thousands of references to table2 in table1.child_ids, so I want to limit the number of table2 rows returned per table1 row to a maximum of 10. How would I structure this query?
My query to dereference the child ids is SELECT *, json_agg(table2.*) AS children FROM table1 INNER JOIN table2 ON table2.id = ANY(table1.child_ids). I don't see a way to set a limit without limiting the entire result as a whole. Is there a way to either limit this INNER JOIN, or at least use a subquery so that I can use LIMIT to restrict the number of results from table2?
This would have been dead simple with properly normalized tables, but here goes with arrays:
SELECT *
FROM table1 t1, LATERAL (
   SELECT json_agg(c) AS children
   FROM (SELECT *
         FROM table2
         WHERE id = ANY (t1.child_ids)
         LIMIT 10) c
   ) t2;
Of course, you have no influence over which 10 rows of table2 will be selected for each row of table1.
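If you need the choice to be deterministic, you can add an ORDER BY inside the innermost subquery before the LIMIT; a sketch, assuming you want the ten smallest ids:
SELECT *
FROM table1 t1, LATERAL (
   SELECT json_agg(c) AS children
   FROM (SELECT *
         FROM table2
         WHERE id = ANY (t1.child_ids)
         ORDER BY id
         LIMIT 10) c
   ) t2;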
I am new to postgresql (and databases in general) and was hoping to get some pointers on improving the efficiency of the following statement.
I am inserting data from one table to another and do not want to insert duplicate values. Each table has a rid column (a unique identifier) that is indexed and is the primary key.
I am currently using the following statement:
INSERT INTO table1 SELECT * FROM table2 WHERE rid NOT IN (SELECT rid FROM table1).
As of now, table1 has 200,000 records and table2 has 20,000 records. table1 is going to keep growing (probably to around 2,000,000), while table2 will stay at around 20,000 records. The statement currently takes about 15 minutes to run, and I am concerned that as table1 grows this will take way too long. Any suggestions?
This should be more efficient than your current query:
INSERT INTO table1
SELECT *
FROM table2
WHERE NOT EXISTS (
    SELECT 1 FROM table1 WHERE table1.rid = table2.rid
);
insert into table1
select t2.*
from table2 t2
left join table1 t1 on t1.rid = t2.rid
where t1.rid is null;
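Since rid is the primary key of table1, another option on PostgreSQL 9.5 or later (a sketch, not tested against your schema) is to let the primary-key constraint filter out the duplicates:
INSERT INTO table1
SELECT *
FROM table2
ON CONFLICT (rid) DO NOTHING;
Unlike the NOT EXISTS and LEFT JOIN variants, this also stays safe if other sessions insert into table1 concurrently.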