Question
My query (a simple one with a few joins) runs extremely slowly when I have a small amount of data (~50k rows) but fast when I have a bigger amount (~180k rows). The time difference is huge: from a few seconds to almost half an hour.
Attempts
I have re-checked the joins and they are all correct. I also ran VACUUM ANALYZE on the table before running the query, but it didn't change anything. I checked whether any locks were blocking the query and whether connectivity was slow, but neither is the cause.
Therefore, I looked at the output of EXPLAIN. In the slow case the plan performs unnecessary extra sorts and gets stuck in a nested loop that does not appear at all in the plan for the larger data set. I'm not sure how to tell Postgres to use the same plan as in the bigger-dataset scenario.
Based on a comment, I also tried rewriting the query without CTEs, but it doesn't help either: the plan still has the nested loops and the sorts.
Details:
Postgres version: PostgreSQL 12.3
Full query text:
WITH t0 AS (SELECT * FROM original_table WHERE id=0),
t1 AS (SELECT * FROM original_table WHERE id=1),
t2 AS (SELECT * FROM original_table WHERE id=2),
t3 AS (SELECT * FROM original_table WHERE id=3),
t4 AS (SELECT * FROM original_table WHERE id=4)
SELECT
t0.dtime,
t1.dtime,
t3.dtime,
t3.dtime::date,
t4.dtime,
t1.first_id,
t1.field,
t1.second_id,
t1.third_id,
t2.fourth_id,
t4.fourth_id
FROM t1
LEFT JOIN t0 ON t1.first_id=t0.first_id
JOIN t2 ON t1.first_id=t2.first_id AND t1.second_id = t2.second_id AND t1.third_id = t2.third_id
JOIN t3 ON t1.first_id=t3.first_id AND t1.second_id = t3.second_id AND t1.third_id = t3.third_id
JOIN t4 ON t1.first_id=t4.first_id AND t1.second_id = t4.second_id AND t1.fourth_id= t4.third_id
ORDER BY t3.dtime
;
Table definition:
Column | Type
----------+----------------------------
id | smallint
dtime | timestamp without time zone
first_id | character varying(10)
second_id | character varying(10)
third_id | character varying(10)
fourth_id | character varying(10)
field | character varying(10)
Cardinality: slow case ~50k, fast case ~180k
Query plans: output of EXPLAIN (BUFFERS, ANALYZE) for the two cases - slow case https://explain.depesz.com/s/5JDw, fast case: https://explain.depesz.com/s/JMIL
Additional info: the relevant memory configuration is:
name | current_setting | source
---------------+-----------------+---------------------
max_stack_depth | 2MB | environment variable
max_wal_size | 1GB | configuration file
min_wal_size | 80MB | configuration file
shared_buffers | 128MB | configuration file
This often happens to me on SQL Server.
What usually causes the slowness is that the CTE is executed once per joined row.
You can prevent that from happening by selecting into temp-tables, instead of using CTEs.
I assume the same is true for PostgreSQL, but I didn't test it:
DROP TABLE IF EXISTS tempT0;
DROP TABLE IF EXISTS tempT1;
CREATE TEMP TABLE tempT0 AS SELECT * FROM original_table WHERE id=0;
CREATE TEMP TABLE tempT1 AS SELECT * FROM original_table WHERE id=1;
[... etc]
FROM tempT1 AS t1
LEFT JOIN tempT0 AS t0 ON t1.first_id=t0.first_id
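Since temp tables have no statistics until they are analyzed, it may also help to run ANALYZE on them before the big join so the planner can estimate row counts. This is my own addition to the sketch above, not something I have benchmarked:
-- give the planner statistics for the freshly created temp tables
ANALYZE tempT0;
ANALYZE tempT1;
-- [... etc for the remaining temp tables]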
Related
I am using PostgreSQL 13 and have intermediate-level experience with PostgreSQL.
I have a table named tbl_employee. It stores employee details for a number of customers.
Below is my table structure, followed by datatype and index access method
Column | Data Type | Index name | Idx Access Type
-------------+-----------------------------+---------------------------+---------------------------
id | bigint | |
name | character varying | |
customer_id | bigint | idx_customer_id | btree
is_active | boolean | idx_is_active | btree
is_delete | boolean | idx_is_delete | btree
I want to delete employees for specific customer by customer_id.
The table has 1,800,000+ records in total.
When I execute the below query for customer_id 1001, it returns 85,000.
SELECT COUNT(*) FROM tbl_employee WHERE customer_id=1001;
When I perform the delete operation using the below query for this customer, it takes 2 hours 45 minutes to delete the records.
DELETE FROM tbl_employee WHERE customer_id=1001
Problem
My concern is that this query should take less than 1 minute to delete the records. Is it normal for it to take this long, or is there a way we can optimise and reduce the execution time?
Below is the EXPLAIN output of the delete query.
The values are seq_page_cost = 1 and random_page_cost = 4.
Below is the number of pages occupied by the table "tbl_employee" according to pg_class.
Please guide. Thanks
During:
DELETE FROM tbl_employee WHERE customer_id=1001
is there any other operation accessing this table? If only this SQL is accessing the table, I don't think it would take so much time.
In an RDBMS, each SQL statement is also a transaction unless it's wrapped in BEGIN; and COMMIT; to form a multi-statement transaction.
It's possible your multirow DELETE statement is generating a very large transaction that's forcing PostgreSQL to thrash -- to spill its transaction logs from RAM to disk.
You can try repeating this statement until you've deleted all the rows you need to delete:
DELETE FROM tbl_employee WHERE customer_id=1001 LIMIT 1000;
Doing it this way will keep your transactions smaller, and may avoid the thrashing.
Note that DELETE FROM tbl_employee WHERE customer_id=1001 LIMIT 1000; will not work, because PostgreSQL's DELETE does not support LIMIT.
To delete in smaller batches, you can try this instead:
DELETE FROM tbl_employee WHERE ctid IN (SELECT ctid FROM tbl_employee WHERE customer_id=1001 LIMIT 1000);
Repeat until there is nothing left to delete.
Here ctid is a system column of PostgreSQL tables; it identifies the physical location of a row.
I have some data migration that has to occur between a parent and child table. For the sake of simplicity, the schemas are as follows:
 -------       -----------
| event |     | parameter |
 -------       -----------
| id    |     | id        |
| order |     | eventId   |
 -------      | order     |
               -----------
Because of an oversight in business logic that needs to be performed, we need to update parameter.order to match the parent event.order. I have come up with the following SQL to do that:
UPDATE "parameter"
SET "order" = e."order"
FROM "event" e
WHERE "eventId" = e.id
The problem is that this query had not finished after more than 4 hours and I had to clock out, so I cancelled it.
There are 11 million rows on parameter and 4 million rows on event. I've run EXPLAIN on the query and it tells me this:
Update on parameter (cost=706691.80..1706622.39 rows=11217313 width=155)
-> Hash Join (cost=706691.80..1706622.39 rows=11217313 width=155)
Hash Cond: (parameter."eventId" = e.id)
-> Seq Scan on parameter (cost=0.00..435684.13 rows=11217313 width=145)
-> Hash (cost=557324.91..557324.91 rows=7724791 width=26)
-> Seq Scan on event e (cost=0.00..557324.91 rows=7724791 width=26)
Based on this article, the "cost" referenced by EXPLAIN is an "arbitrary unit of computation".
Ultimately, this update needs to be performed, but I would accept it happening in one of two ways:
I am advised of a better way to do this query that executes in a timely manner (I'm open to all suggestions, including updating schemas, indexing, etc.)
The query remains the same but I can somehow get an accurate prediction of the execution time (even if it's hours long). This way, at least, I can manage the team's expectations. I understand that the exact time can't be known without actually running the query, but is there an easy way to "convert" these arbitrary units into an approximate execution time in milliseconds?
Edit for Jim Jones' comment:
I executed the following query:
SELECT psa.pid,locktype,mode,query,query_start,state FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid
I got 9 identical rows like the following:
pid   | locktype | mode            | query       | query_start         | state
------+----------+-----------------+-------------+---------------------+--------
23192 | relation | AccessShareLock | <see below> | 2021-10-26 14:10:01 | active
query column:
--update parameter
--set "order" = e."order"
--from "event" e
--where "eventId" = e.id
SELECT psa.pid,locktype,mode,query,query_start,state FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid
Edit 2: I think I've been stupid here... The query produced by checking these locks is just the commented query. I think that means there's actually nothing to report.
If some rows already have the target value, you can skip those empty updates (which would otherwise be carried out at full cost). Like:
UPDATE parameter p
SET "order" = e."order"
FROM event e
WHERE p."eventId" = e.id
AND p."order" IS DISTINCT FROM e."order"; -- this
If both "order" columns are defined NOT NULL, simplify to:
...
AND p."order" <> e."order";
See:
How do I (or can I) SELECT DISTINCT on multiple columns?
If you have to update all or most rows - and can afford it! - writing a new table may be cheaper overall, like Mike already mentioned. But concurrency and dependent objects may stand in the way.
Aside: use legal, lower-case identifiers, so you don't have to double-quote. Makes your life with Postgres easier.
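For example (illustrative only; pick whatever lower-case name fits your conventions):
-- hypothetical rename so the column no longer needs double-quoting
ALTER TABLE parameter RENAME COLUMN "eventId" TO event_id;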
The query will be slow because for each UPDATE operation, it has to look up the index by id. Even with an index, on a large table, this is a per-row read/write so it is slow.
I'm not sure how to get a good estimate, maybe do 1% of the table and multiply?
I suggest creating a new table, then dropping the old one and renaming the new table.
CREATE TABLE parameter_new AS
SELECT
parameter.id,
parameter."eventId",
e."order"
FROM
parameter
JOIN event AS "e" ON
"e".id = parameter."eventId"
Later, once you verify things:
ALTER TABLE parameter RENAME TO parameter_old;
ALTER TABLE parameter_new RENAME TO parameter;
Later, once you're completely certain:
DROP TABLE parameter_old;
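Keep in mind that CREATE TABLE ... AS copies only the data, not indexes, constraints, or defaults. A minimal sketch of what you would likely recreate after the rename, assuming id was the primary key and "eventId" should be indexed (the index name here is my own):
-- recreate the key and index on the renamed table
ALTER TABLE parameter ADD PRIMARY KEY (id);
CREATE INDEX parameter_event_id_idx ON parameter ("eventId");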
An application I inherited relied on, so to say, the "natural record flow" of a PostgreSQL table, and contained this Delphi code:
query.Open('SELECT * FROM TheTable');
query.Last();
The task is to get all the fields of the last record in the table. I decided to rewrite this query in a more efficient way, something like this:
SELECT * FROM TheTable ORDER BY ReportDate DESC LIMIT 1
but it broke the whole workflow. Some ReportDate values turned out to be NULL. The application really relied on the "natural" record order of the table.
How can I select the physically last record efficiently, without ORDER BY?
To select the physically last record, you can use ctid, the tuple id; to get the last one, just select max(ctid). Something like:
t=# select ctid,* from t order by ctid desc limit 1;
ctid | t
--------+-------------------------------
(5,50) | 2017-06-13 11:41:04.894666+00
(1 row)
and to do it without order by:
t=# select t from t where ctid = (select max(ctid) from t);
t
-------------------------------
2017-06-13 11:41:04.894666+00
(1 row)
It's worth knowing that ctid can only be found via a sequential scan, so fetching the physically latest row will be costly on large data sets.
Version: PostgreSQL 9.4.2
Column | Type | Modifiers
------------+---------+----------------------------------------------------------------
id | integer | not null default nextval('T1_id_seq'::regclass)
name | text |
value | text |
parent_id | integer |
Indexes:
"T1_pkey" PRIMARY KEY, btree (id)
"T1_id_idx" btree (id)
I have two tables like this in PostgreSQL, say T1 and T2, with a tree-like data structure referencing data from their own table.
I need to modify some rows in T1 and insert them into T2 in the exact order in which the rows appeared in T1. What I have done so far is copy the relevant rows from T1 to a temporary table T3 for modification, and insert everything from T3 into T2 once the changes are made.
T3 is created using
CREATE TABLE T3 (LIKE T1 INCLUDING ALL);
INSERT INTO T3 SELECT * FROM T1;
The end result is rather strange. All the data from T3 were copied to T2, but the order of the ids seems to be random.
However, the result is correct if I invoke the same script to copy data from T1 to T3 directly. What is even more bizarre is that it's also correct if I split the above script into two separate scripts:
Create T3 from T1 and copy data from T1 to T3
Copy T3 to T2 using INSERT method.
Any clues?
You didn't specify an ORDER BY clause. Without one, PostgreSQL might fetch the rows for your SELECT in whatever order happens to be fastest to execute.
Try:
CREATE TABLE T3 (LIKE T1 INCLUDING ALL);
INSERT INTO T3
SELECT * FROM T1 ORDER BY T1.id;
Note that strictly there is no guarantee that the INSERT of multiple rows will process rows in the order they are read from the SELECT, but in practice PostgreSQL at this time will always process them in order and it's not likely to change in a hurry.
I've greatly simplified the examples to hopefully produce a clear enough question that can be answered:
Consider a table of events
CREATE TABLE alertable_events
(
unique_id text NOT NULL DEFAULT ''::text,
generated_on timestamp without time zone NOT NULL DEFAULT now(),
message_text text NOT NULL DEFAULT ''::text,
CONSTRAINT pk_alertable_events PRIMARY KEY (unique_id)
)
with the following data:
COPY alertable_events (unique_id,message_text,generated_on) FROM stdin;
one message one 2014-03-20 06:00:00.000000
two message two 2014-03-21 06:00:00.000000
three message three 2014-03-22 06:00:00.000000
four message four 2014-03-23 06:00:00.000000
five message five 2014-03-24 06:00:00.000000
\.
And for each event, there is a list of fields
CREATE TABLE alertable_event_fields
(
unique_id text NOT NULL DEFAULT ''::text,
field_name text NOT NULL,
field_value text NOT NULL DEFAULT ''::text,
CONSTRAINT pk_alertable_event_fields PRIMARY KEY (unique_id, field_name),
CONSTRAINT fk_alertable_event_fields_0 FOREIGN KEY (unique_id)
REFERENCES alertable_events (unique_id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE
)
with the following data:
COPY alertable_event_fields (unique_id,field_name,field_value) FROM stdin;
one field1 a
one field2 b
two field1 z
two field2 y
three field1 a
three field2 m
four field1 a
four field2 b
five field1 z
five field2 y
\.
I want to define a view that produces the following:
| unique_id | fields | message_text | generated_on | updated_on | count |
| five | z|y | message five | 2014-03-21 06:00:00.000000 | 2014-03-24 06:00:00.000000 | 2 |
| four | a|b | message four | 2014-03-20 06:00:00.000000 | 2014-03-23 06:00:00.000000 | 2 |
| three | a|m | message three | 2014-03-22 06:00:00.000000 | 2014-03-22 06:00:00.000000 | 1 |
Notably:
fields is a pipe-delimited string (or any serialization) of the field values (a JSON encoding of field_name:field_value pairs would be even better, but I can work with pipe-delimited for now)
the output is grouped by matching fields. Update 3/30 12:45am: the values are ordered alphabetically by field_name, therefore a|b would not match b|a
a count is produced of the events that match that field set. Updated 3/30 12:45am: there can be a different number of fields per unique_id; a match requires matching all fields, not a subset of them.
generated_on is the timestamp of the first event
updated_on is the timestamp of the most recent event
message_text is the message_text of the most recent event
I've produced this view, and it works for small data sets; however, as the alertable_events table grows, it becomes exceptionally slow. I can only assume I'm doing something wrong in the view, because I have never dealt with anything quite so ugly.
Update 3/30 12:15PM EDT It looks like I may have server tuning problems causing these high run times; see the added explain for more info. If you see a glaring issue there, I'd be greatly interested in tweaking the server's configuration.
Can anyone piece together a view that handles large datasets well and has a significantly better run time than this? Perhaps using hstore? (I'm running 9.2 preferably, though 9.3 if I can have a nice json encoding of the fields.)
Updated 3/30 11:30AM I'm beginning to think my issue may be server tuning (which means I'll need to talk to the SA). Here's a very simple explain (analyze,buffers) which is showing a ridiculous run time for as few as 8k rows in the unduplicated_event_fields
Update 3/30 7:20PM I bumped work_mem to 5MB using SET work_mem='5MB' (which is plenty for the query below); strangely, even though the planner switched to an in-memory quicksort, it actually took on average 100ms longer!
explain (analyze,buffers)
SELECT a.unique_id,
array_to_string(array_agg(a.field_value order by a.field_name),'|') AS "values"
FROM alertable_event_fields a
GROUP BY a.unique_id;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=771.11..892.79 rows=4056 width=80) (actual time=588.679..630.989 rows=4056 loops=1)
Buffers: shared hit=143, temp read=90 written=90
-> Sort (cost=771.11..791.39 rows=8112 width=80) (actual time=588.591..592.622 rows=8112 loops=1)
Sort Key: unique_id
Sort Method: external merge Disk: 712kB
Buffers: shared hit=143, temp read=90 written=90
-> Seq Scan on alertable_event_fields a (cost=0.00..244.40 rows=8112 width=80) (actual time=0.018..5.478 rows=8112 loops=1)
Filter: (message_name = 'LIMIT_STATUS'::text)
Buffers: shared hit=143
Total runtime: 632.323 ms
(10 rows)
Update 3/30 4:10AM EDT I'm still not completely satisfied and would be interested in any further optimization. I have a requirement to support 500msgs/sec steady state, and although most of those should not be "events", I get a little backlogged right now when stress testing.
Update 3/30 12:00PM EDT Here's my most readable iteration yet; unfortunately, for 4000 rows I'm still looking at 600ms run times (see above, as it's mostly contained in the innermost query). Any help here would be greatly appreciated.
CREATE OR REPLACE VIEW views.unduplicated_events AS
SELECT a.unique_id,a.message_text,
b."values",b.generated_on,b.updated_on,b.count
FROM alertable_events a
JOIN (
SELECT b."values",
min(a.generated_on) AS generated_on,
max(a.generated_on) AS updated_on,
count(*) AS count
FROM alertable_events a
JOIN (
SELECT a.unique_id,
array_to_string(array_agg(a.field_value order by a.field_name),'|') AS "values"
FROM alertable_event_fields a
GROUP BY a.unique_id
) b USING (unique_id)
GROUP BY b."values"
) b ON a.generated_on=b.updated_on
ORDER BY updated_on DESC;
Update 3/30 12:00PM EDT removed old stuff as this is getting too long
Some pointers
Invalid query
Your current query is incorrect unless generated_on is unique, which is not declared in the question and probably is not the case:
CREATE OR REPLACE VIEW views.unduplicated_events AS
SELECT ...
FROM alertable_events a
JOIN ( ... ) b ON a.generated_on=b.updated_on -- !! unreliable
Possibly faster
SELECT DISTINCT ON (f.fields)
unique_id -- most recent
, f.fields
, e.message_text -- most recent
, min(e.generated_on) OVER (PARTITION BY f.fields) AS generated_on -- "first"
, e.generated_on AS updated_on -- most recent
, count(*) OVER (PARTITION BY f.fields) AS ct
FROM alertable_events e
JOIN (
SELECT unique_id, array_to_string(array_agg(field_value), '|') AS fields
FROM (
SELECT unique_id, field_value
FROM alertable_event_fields
ORDER BY 1, field_name -- a bit of a hack, but much faster
) f
GROUP BY 1
) f USING (unique_id)
ORDER BY f.fields, e.generated_on DESC;
SQL Fiddle.
The result is currently sorted by fields. If you need a different sort order, you'd need to wrap it in another subquery ...
Major points
The output column name generated_on conflicts with the input column generated_on. You have to table-qualify the column e.generated_on to refer to the input column. I added table-qualification everywhere to make it clear, but it is only actually necessary in the ORDER BY clause. The manual:
If an ORDER BY expression is a simple name that matches both an
output column name and an input column name, ORDER BY will interpret
it as the output column name. This is the opposite of the choice that
GROUP BY will make in the same situation. This inconsistency is made
to be compatible with the SQL standard.
The updated query should also be faster (as intended all along). Run EXPLAIN ANALYZE again.
For the whole query, indexes will hardly be of use. Only if you select specific rows ... One possible exception: a covering index for alertable_event_fields:
CREATE INDEX f_idx1
ON alertable_event_fields (unique_id, field_name, field_value);
Lots of write operations might void the benefit, though.
array_agg(field_value ORDER BY ...) tends to be slower for big sets than pre-sorting in a subquery.
DISTINCT ON is convenient here. Not sure whether it's actually faster, though, since ct and generated_on have to be computed in separate window functions, which requires another sort step.
work_mem: setting it too high can actually harm performance. More in the Postgres Wiki or in "Craig's list".
Generally this is hard to optimize. Indexes fail because the sort order depends on two tables. If you can work with a snapshot, consider a MATERIALIZED VIEW.
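A minimal sketch of the materialized-view route (available since Postgres 9.3), simply reusing the query above; the name unduplicated_events_mv is my own choice:
-- snapshot of the de-duplicated events, refreshed on demand
CREATE MATERIALIZED VIEW unduplicated_events_mv AS
SELECT DISTINCT ON (f.fields)
       unique_id
     , f.fields
     , e.message_text
     , min(e.generated_on) OVER (PARTITION BY f.fields) AS generated_on
     , e.generated_on AS updated_on
     , count(*) OVER (PARTITION BY f.fields) AS ct
FROM   alertable_events e
JOIN  (
   SELECT unique_id, array_to_string(array_agg(field_value), '|') AS fields
   FROM  (
      SELECT unique_id, field_value
      FROM   alertable_event_fields
      ORDER  BY 1, field_name
      ) f
   GROUP  BY 1
   ) f USING (unique_id)
ORDER  BY f.fields, e.generated_on DESC;

-- bring the snapshot up to date whenever needed
REFRESH MATERIALIZED VIEW unduplicated_events_mv;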