PostgreSQL version 10
Windows 10
16GB RAM
SSD
I'm ashamed to admit that, despite searching the hundred years of PG support archives, I cannot figure out this most basic problem. But here it is...
I have big_table with 45 million rows and little_table with 12,000 rows. I need to do a left join to include all big_table rows, along with the IDs of little_table rows where big_table's timestamp falls between the two timestamps in little_table.
This doesn't seem like it should be an extreme operation for PG, but it is taking 2 1/2 hours!
Any ideas on what I can do here? Or do you think I have unwittingly come up against the limitations of my software/hardware combo given the table size?
Thanks!
little_table with 12,000 rows
CREATE TABLE public.little_table
(
id bigint,
start_time timestamp without time zone,
stop_time timestamp without time zone
);
CREATE INDEX idx_little_table
ON public.little_table USING btree
(start_time, stop_time DESC);
big_table with 45 million rows
CREATE TABLE public.big_table
(
id bigint,
datetime timestamp without time zone
) ;
CREATE INDEX idx_big_table
ON public.big_table USING btree
(datetime);
Query
explain analyze
select
bt.id as bt_id,
lt.id as lt_id
from
big_table bt
left join
little_table lt
on
(bt.datetime between lt.start_time and lt.stop_time)
Explain Results
Nested Loop Left Join (cost=0.29..3260589190.64 rows=64945831346 width=16) (actual time=0.672..9163998.367 rows=1374445323 loops=1)
-> Seq Scan on big_table bt (cost=0.00..694755.92 rows=45097792 width=16) (actual time=0.014..10085.746 rows=45098790 loops=1)
-> Index Scan using idx_little_table on little_table lt (cost=0.29..57.89 rows=1440 width=24) (actual time=0.188..0.199 rows=30 loops=45098790)
Index Cond: ((bt.datetime >= start_time) AND (bt.datetime <= stop_time))
Planning time: 0.165 ms
Execution time: 9199473.052 ms
NOTE: My actual query criteria are a bit more complex, but this seems to be the root of the problem. If I can fix this part, I think I can fix the rest.
This query cannot perform any faster.
Since there is no equality operator (=) in your join condition, the only strategy left to PostgreSQL is a nested loop join. 45 million repetitions of an index scan on the small table just take a while.
I would suggest trying to change the start_time and stop_time columns in the little table to a single tsrange column. According to the docs, this datatype supports a GiST index which can speed up the "range contains element" operator @>. Maybe this will do better than the index scan on your current btree.
Generating 1.3 billion rows seems pretty extreme to me. How often do you need to do this, and how fast do you need it to be?
To explain a bit about your current plan:
Index Cond: ((bt.datetime >= start_time) AND (bt.datetime <= stop_time))
While it is not obvious from what is displayed above, this always scans about half the index. It starts at the beginning of the index and stops once start_time > bt.datetime, using bt.datetime <= stop_time as an in-index filter that needs to examine each row before rejecting it.
To flesh out Bergi's answer, you could do this:
alter table little_table add range tsrange;
update little_table set range = tsrange(start_time, stop_time, '[]');
create index on little_table using gist(range);
select
bt.id as bt_id,
lt.id as lt_id
from
big_table bt
left join
little_table lt
on
(bt.datetime <@ lt.range)
In my hands, that is about 4 times faster than your current method.
If your join did not need to be a left join, then you could get some more efficient operations by joining the tables in the opposite order. Perhaps you could get better performance by separating this into two operations, an inner join and then a probe for missing values, and combining the results.
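A rough sketch of that two-step idea, reusing the range column added above (untested against your data, so treat it as a starting point rather than a drop-in replacement):
-- Inner join for the matching rows, plus an anti-join for the unmatched ones.
select bt.id as bt_id, lt.id as lt_id
from big_table bt
join little_table lt on bt.datetime <@ lt.range
union all
select bt.id, null::bigint
from big_table bt
where not exists (
    select 1
    from little_table lt
    where bt.datetime <@ lt.range
);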
Related
I'm trying to improve query performance on a big (500M rows) time-partitioned table. Here is the simplified table structure:
CREATE TABLE execution (
    start_time TIMESTAMP WITH TIME ZONE NOT NULL,
    end_time TIMESTAMP WITH TIME ZONE,
    restriction_criteria VARCHAR(36) NOT NULL
) PARTITION BY RANGE (start_time);
Time partitioning:
- is based on the start_time column because the end_time value is not known when the row is created.
- is used to efficiently implement the retention policy.
The request follows this generic pattern:
SELECT *
FROM execution
WHERE start_time BETWEEN :from AND :to
AND restriction_criteria IN ('123', '456')
ORDER BY end_time DESC, id
FETCH NEXT 20 ROWS ONLY;
I've got the "best" performance using this index:
CREATE INDEX IF NOT EXISTS end_time_desc_start_time_index ON execution USING btree (end_time DESC, start_time);
Yet, performance is not good enough.
Limit (cost=1303.21..27189.31 rows=20 width=1674) (actual time=6791.191..6791.198 rows=20 loops=1)
-> Incremental Sort (cost=1303.21..250693964.74 rows=193689 width=1674) (actual time=6791.189..6791.194 rows=20 loops=1)
" Sort Key: execution.end_time DESC, execution.id"
Presorted Key: execution.end_time
Full-sort Groups: 1 Sort Method: quicksort Average Memory: 64kB Peak Memory: 64kB
-> Merge Append (cost=8.93..250685248.74 rows=193689 width=1674) (actual time=4082.161..6791.047 rows=21 loops=1)
Sort Key: execution.end_time DESC
Subplans Removed: 15
-> Index Scan using execution_2021_10_end_time_start_time_idx on execution_2021_10 execution_1 (cost=0.56..113448316.66 rows=93103 width=1674) (actual time=578.896..578.896 rows=1 loops=1)
Index Cond: ((start_time <= '2021-12-05 02:00:04+00'::timestamp with time zone) AND (start_time >= '2021-10-02 02:00:04+00'::timestamp with time zone))
" Filter: (((restriction_criteria)::text = ANY ('{123,456}'::text[])))"
Rows Removed by Filter: 734
-> Index Scan using execution_2021_11_end_time_start_time_idx on execution_2021_11 execution_2 (cost=0.56..113653576.54 rows=87605 width=1674) (actual time=116.841..116.841 rows=1 loops=1)
Index Cond: ((start_time <= '2021-12-05 02:00:04+00'::timestamp with time zone) AND (start_time >= '2021-10-02 02:00:04+00'::timestamp with time zone))
" Filter: (((restriction_criteria)::text = ANY ('{123,456}'::text[])))"
Rows Removed by Filter: 200
-> Index Scan using execution_2021_12_end_time_start_time_idx on execution_2021_12 execution_3 (cost=0.56..16367185.18 rows=12966 width=1674) (actual time=3386.416..6095.261 rows=21 loops=1)
Index Cond: ((start_time <= '2021-12-05 02:00:04+00'::timestamp with time zone) AND (start_time >= '2021-10-02 02:00:04+00'::timestamp with time zone))
" Filter: (((restriction_criteria)::text = ANY ('{123,456}'::text[])))"
Rows Removed by Filter: 5934
Planning Time: 4.108 ms
Execution Time: 6791.317 ms
The index Filter seems to be very slow.
I set up a multi-column index hoping the filtering would be done in the Index Cond, but it doesn't work:
CREATE INDEX IF NOT EXISTS pagination_index ON execution USING btree (end_time DESC, start_time, restriction_criteria);
My feeling is that the first index column should be end_time because we want to leverage the btree index sorting capability. The second one should be restriction_criteria so that an Index Cond filters out rows which don't match the restriction_criteria. However, this doesn't work because the query planner also needs to check the start_time clause.
The alternative I imagine is to get rid of the partitioning, because a multi-column end_time, restriction_criteria index would work just fine.
Yet, this is not a perfect solution because dealing with our retention policy would become a pain.
Is there another alternative that allows me to keep the start_time partitioning?
I set up a multi-column index hoping the filtering would be done in the Index cond
The index machinery is very circumspect about what code it runs inside the index. It won't call any operators that it doesn't 'trust', because if the operator throws an error then the whole query will error out, possibly due to rows that weren't even user 'visible' in the first place (i.e. ones that were already deleted or created but never committed). No one wants that. Now the =ANY construct could be considered trustable, but it is not. That means it won't be applied in the Index Cond, but must be applied against the table row, which in turn means you need to visit the table, which is probably where all your time is going, visiting random table rows.
I don't know what it would take code-wise to make =ANY trusted. I've made efforts to investigate that in the past but really never got anywhere, the code around the ANY is too complicated for me to grasp. That would be a nice improvement for the future, but won't help you now anyway.
One way around this is to get an index-only scan. At that point it will call arbitrary code in the index, as it already knows the tuple is visible. But it won't do that for you, because you are selecting at least one column not in the index (and also not shown in your CREATE command, but obviously present anyway)
If you create an index like your widest one but adding "id" to the end, and only select from among those columns, then you should get much faster index-only scans with merge appends.
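For example, something along these lines (the index name is made up, and this assumes the widest index shown above plus the id column):
CREATE INDEX IF NOT EXISTS execution_ios_idx
    ON execution USING btree (end_time DESC, start_time, restriction_criteria, id);
As long as the query selects only those four columns, each partition scan can in principle be satisfied from the index alone.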
If you have even more columns than the ones you've shown plus "id", and you really need to select those columns, and don't want to add all of them to the index, then you can use a trick to use an index-only scan anyway by doing a dummy self join:
with t as (SELECT id
FROM execution
WHERE start_time BETWEEN :from AND :to
AND restriction_criteria IN ('123', '456')
ORDER BY end_time DESC, id
FETCH NEXT 20 ROWS ONLY
)
select real.* from execution real join t using (id)
ORDER BY end_time DESC, id
(If "id" is not unique, then you might need to join on additional column. Also, you would need an index on "id", which you probably already have)
This one will still need to visit the table to fetch the extra columns, but only for the 20 rows being returned, not for all the ones failing the restriction_criteria.
If the restriction_criteria is very selective, another approach might be better: an index on or leading with that column. It will need to read and sort all of those rows (in the relevant partitions) before applying the LIMIT, but if it is very selective this will not take long.
While you can have the output sorted if the leading column is end_time, you can reduce the amount of data processed if you use start_time as the leading column.
Since your filter on start_time and restriction_criteria is excluding ~7000 rows in order to retrieve 20, maybe speeding up the filtering is more important than speeding up the sorting.
CREATE INDEX IF NOT EXISTS execution_start_time_restriction_idx
ON execution USING btree (start_time, restriction_criteria);
CREATE INDEX IF NOT EXISTS execution_restriction_start_time_idx
ON execution USING btree (restriction_criteria, start_time);
ANALYZE execution;
If the number of rows matching
FROM execution
WHERE start_time BETWEEN :from AND :to
AND restriction_criteria IN ('123', '456')
is more than the number of rows removed by the filter, then having end_time as the leading column might be a good idea. But the planner should be able to figure that out for you.
In the end, if some of those indexes are not used, you can drop them.
This is related to 2 other questions I posted (sounds like I should post this as a new question) - the feedback helped, but I think the same issue will come back the next time I need to insert data. Things were still running slowly, which forced me to temporarily remove some of the older data so that only 2 months' worth remained in the table that I'm querying.
Indexing strategy for different combinations of WHERE clauses incl. text patterns
How to get date_part query to hit index?
Giving further detail this time - hopefully it will help pinpoint the issue:
PG version 10.7 (running on Heroku)
Total DB size: 18.4GB (this contains 2 months worth of data, and it will grow at approximately the same rate each month)
15GB RAM
Total available storage: 512GB
The largest table (the one that the slowest query is acting on) is 9.6GB (it's the largest chunk of the total DB) - about 10 million records
Schema of the largest table:
-- Table Definition ----------------------------------------------
CREATE TABLE reportimpression (
datelocal timestamp without time zone,
devicename text,
network text,
sitecode text,
advertisername text,
mediafilename text,
gender text,
agegroup text,
views integer,
impressions integer,
dwelltime numeric
);
-- Indices -------------------------------------------------------
CREATE INDEX reportimpression_feb2019_index ON reportimpression(datelocal timestamp_ops) WHERE datelocal >= '2019-02-01 00:00:00'::timestamp without time zone AND datelocal < '2019-03-01 00:00:00'::timestamp without time zone;
CREATE INDEX reportimpression_mar2019_index ON reportimpression(datelocal timestamp_ops) WHERE datelocal >= '2019-03-01 00:00:00'::timestamp without time zone AND datelocal < '2019-04-01 00:00:00'::timestamp without time zone;
CREATE INDEX reportimpression_jan2019_index ON reportimpression(datelocal timestamp_ops) WHERE datelocal >= '2019-01-01 00:00:00'::timestamp without time zone AND datelocal < '2019-02-01 00:00:00'::timestamp without time zone;
Slow query:
SELECT
date_part('hour', datelocal) AS hour,
SUM(CASE WHEN gender = 'male' THEN views ELSE 0 END) AS male,
SUM(CASE WHEN gender = 'female' THEN views ELSE 0 END) AS female
FROM reportimpression
WHERE
datelocal >= '3-1-2019' AND
datelocal < '4-1-2019'
GROUP BY date_part('hour', datelocal)
ORDER BY date_part('hour', datelocal)
The date range in this query will generally be for an entire month (it accepts user input from a web based report) - as you can see, I tried creating an index for each month's worth of data. That helped, but as far as I can tell, unless the query has recently been run (putting the results into the cache), it can still take up to a minute to run.
Explain analyze results:
Finalize GroupAggregate (cost=1035890.38..1035897.86 rows=1361 width=24) (actual time=3536.089..3536.108 rows=24 loops=1)
Group Key: (date_part('hour'::text, datelocal))
-> Sort (cost=1035890.38..1035891.06 rows=1361 width=24) (actual time=3536.083..3536.087 rows=48 loops=1)
Sort Key: (date_part('hour'::text, datelocal))
Sort Method: quicksort Memory: 28kB
-> Gather (cost=1035735.34..1035876.21 rows=1361 width=24) (actual time=3535.926..3579.818 rows=48 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Partial HashAggregate (cost=1034735.34..1034740.11 rows=1361 width=24) (actual time=3532.917..3532.933 rows=24 loops=2)
Group Key: date_part('hour'::text, datelocal)
-> Parallel Index Scan using reportimpression_mar2019_index on reportimpression (cost=0.09..1026482.42 rows=3301168 width=17) (actual time=0.045..2132.174 rows=2801158 loops=2)
Planning time: 0.517 ms
Execution time: 3579.965 ms
I wouldn't think 10 million records would be too much to handle, especially given that I recently bumped up the PG plan that I'm on to try to throw resources at it, so I assume that the issue is still just either my indexes or my queries not being very efficient.
A materialized view is the way to go for what you outlined. Querying past months of read-only data works without refreshing it. You may want to special-case the current month if you need to cover that, too.
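As a sketch (the view name is made up, and this assumes the reportimpression schema shown above), a pre-aggregated materialized view for this report could look roughly like this:
-- Pre-aggregate views per calendar day and hour of day:
CREATE MATERIALIZED VIEW reportimpression_hourly AS
SELECT datelocal::date              AS day_local,
       date_part('hour', datelocal) AS hour,
       SUM(views) FILTER (WHERE gender = 'male')   AS male,
       SUM(views) FILTER (WHERE gender = 'female') AS female
FROM reportimpression
GROUP BY 1, 2;
-- The report query then aggregates the much smaller view:
SELECT hour, SUM(male) AS male, SUM(female) AS female
FROM reportimpression_hourly
WHERE day_local >= '2019-03-01' AND day_local < '2019-04-01'
GROUP BY hour
ORDER BY hour;
-- Refresh after each data load:
REFRESH MATERIALIZED VIEW reportimpression_hourly;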
The underlying query can still benefit from an index, and there are two directions you might take:
First off, partial indexes like you have now won't buy much in your scenario; not worth it. If you collect many more months of data and mostly query by month (and add / drop rows by month), table partitioning might be an idea; then you have your indexes partitioned automatically, too. I'd consider Postgres 11 or even the upcoming Postgres 12 for this, though.
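A rough sketch of declarative monthly partitioning on Postgres 11+ (table and partition names are placeholders, and only a few of the columns are shown):
-- Partitioned parent table, same columns as reportimpression:
CREATE TABLE reportimpression_part (
    datelocal timestamp without time zone,
    gender    text,
    views     integer
    -- ... remaining columns ...
) PARTITION BY RANGE (datelocal);
-- One partition per month:
CREATE TABLE reportimpression_2019_03 PARTITION OF reportimpression_part
    FOR VALUES FROM ('2019-03-01') TO ('2019-04-01');
-- From Postgres 11 on, an index on the parent cascades to every partition:
CREATE INDEX ON reportimpression_part (datelocal);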
If your rows are wide, create an index that allows index-only scans. Like:
CREATE INDEX reportimpression_covering_idx ON reportimpression(datelocal, views, gender);
Related:
How does PostgreSQL perform ORDER BY if a b-tree index is built on that field?
Or INCLUDE additional columns in Postgres 11 or later:
CREATE INDEX reportimpression_covering_idx ON reportimpression(datelocal) INCLUDE (views, gender);
Else, if your rows are physically sorted by datelocal, consider a BRIN index. It's extremely small and probably about as fast as a B-tree index for your case. (But being so small it will stay cached much easier and not push other data out as much.)
CREATE INDEX reportimpression_brin_idx ON reportimpression USING BRIN (datelocal);
You may be interested in CLUSTER or pg_repack to physically sort table rows. pg_repack can do it without exclusive locks on the table and even without a btree index (required by CLUSTER). But it's an additional module not shipped with the standard distribution of Postgres.
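A one-off reordering with CLUSTER could look like this (index name assumed; note that CLUSTER takes an ACCESS EXCLUSIVE lock on the table while it runs):
-- CLUSTER needs a btree index on the sort column:
CREATE INDEX reportimpression_datelocal_btree ON reportimpression (datelocal);
CLUSTER reportimpression USING reportimpression_datelocal_btree;
ANALYZE reportimpression;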
Related:
Optimize Postgres deletion of orphaned records
How to reclaim disk space after delete without rebuilding table?
Your execution plan seems to be doing the right thing.
Things you can do to improve, in descending order of effectiveness:
Use a materialized view that pre-aggregates the data
Don't use a hosted database, use your own iron with good local storage and lots of RAM.
Use only one index instead of several partial ones. This is not primarily performance advice (the query will probably not be measurably slower unless you have a lot of indexes), but it will ease the management burden.
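A possible consolidation, sketched under the assumption that nothing else relies on the per-month partial indexes:
-- One plain index on datelocal replaces the monthly partial indexes:
CREATE INDEX reportimpression_datelocal_idx ON reportimpression (datelocal);
DROP INDEX reportimpression_jan2019_index,
           reportimpression_feb2019_index,
           reportimpression_mar2019_index;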
Let's say I have a table with some columns and a column dt which is of type TIMESTAMP.
I create a (non functional) index on this column.
Then I execute a query
SELECT *
FROM tbl
WHERE
dt::DATE = NOW()::DATE
The question is: will Postgres use the index I've created earlier, and under which circumstances will it or won't it?
I understand that a functional index would cover this case, but does a simple index cover both cases or not when it's a TIMESTAMP -> DATE type conversion?
EDIT:
Performing an EXPLAIN ANALYZE on the query shows that it does not use the index and performs a Seq Scan (table with 3+ million records):
Seq Scan on tbl (cost=0.00..192289.92 rows=17043 width=12) (actual time=7.237..2493.496 rows=4928 loops=1)
Filter: ((dt)::date = (now())::date)
Rows Removed by Filter: 3397155
Total runtime: 2494.546 ms
Let me ask the question differently, then: is it possible to make Postgres use this index, or should I create another one?
A simple index will not work in this case; try it with EXPLAIN.
What you could do to use the simple index is
WHERE dt >= current_date::timestamptz
AND dt < (current_date + 1)::timestamptz
I think that this is pretty readable and the best solution, but if you want to go with your current query, you'll have to add a second index on (dt::date).
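If you go that route, the expression index could look roughly like this (table and index names are placeholders; note the extra parentheses around the expression):
CREATE INDEX tbl_dt_date_idx ON tbl ((dt::date));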
Don't forget that every additional index costs space and slows down the performance of data modifying statements.
I currently have a large table mivehdetailedtrajectory (25B rows) and a small table cell_data_tower (400 rows) that I need to join using PostGIS. Specifically, I need to run this query:
SELECT COUNT(traj.*), tower.id
FROM cell_data_tower tower LEFT OUTER JOIN mivehdetailedtrajectory traj
ON ST_Contains(tower.geom, traj.location)
GROUP BY tower.id
ORDER BY tower.id;
It errors out, complaining that it can't write to disk. This seemed weird for a SELECT, so I ran EXPLAIN:
NOTICE: gserialized_gist_joinsel: jointype 1 not supported
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Sort (cost=28905094882.25..28905094883.25 rows=400 width=120)
Sort Key: tower.id
-> HashAggregate (cost=28905094860.96..28905094864.96 rows=400 width=120)
-> Nested Loop Left Join (cost=0.00..28904927894.80 rows=33393232 width=120)
Join Filter: ((tower.geom && traj.location) AND _st_contains(tower.geom, traj.location))
-> Seq Scan on cell_data_tower tower (cost=0.00..52.00 rows=400 width=153)
-> Materialize (cost=0.00..15839886.96 rows=250449264 width=164)
-> Seq Scan on mivehdetailedtrajectory traj (cost=0.00..8717735.64 rows=250449264 width=164)
I don't understand why Postgres thinks it should materialize the inner table. Also, I don't understand the plan in general, to be honest. It seems like it should keep the cell_data_tower table in memory and iterate over the mivehdetailedtrajectory table. Any thoughts on how I can optimize this to (a) run, and (b) do so in a reasonable amount of time? Specifically, it seems like this should be doable in less than 1 day.
Edit: Postgres version 9.3
Queries that need a lot of memory are those rare places where correlated subqueries perform better (LATERAL JOIN should work too but those are beyond me). Also please note you didn't select tower.id so your result wouldn't be too useful.
SELECT tower.id, (SELECT COUNT(traj.*)
FROM mivehdetailedtrajectory traj
WHERE ST_Contains(tower.geom, traj.location))
FROM cell_data_tower tower
ORDER BY tower.id;
Try running it with LIMIT 1 first. The total runtime should be the runtime for one tower * number of towers.
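For example, as a quick sanity check before the full run (same query as above, just limited to a single tower):
-- Time one tower; total runtime ≈ this runtime × 400 towers.
SELECT tower.id,
       (SELECT COUNT(traj.*)
        FROM mivehdetailedtrajectory traj
        WHERE ST_Contains(tower.geom, traj.location))
FROM cell_data_tower tower
ORDER BY tower.id
LIMIT 1;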
I don't have a DB as big as yours, only 80M rows. But in my case I created a LinkID field to know where each geom is, and I calculate which LinkID is the closest when I insert a new record.
When I found out that a single LinkID lookup takes 30 ms, and that doing it 80M times would take 27 days, I went with pre-calculating those values.
Also, I don't keep all the records; I only keep a month at any time.
I'm currently working on a complex sorting problem in Postgres 9.2
You can find the Source Code used in this Question(simplified) here: http://sqlfiddle.com/#!12/9857e/11
I have a huge (well over 20 million rows) table containing various columns of different types.
CREATE TABLE data_table
(
id bigserial PRIMARY KEY,
column_a character(1),
column_b integer
-- ~100 more columns
);
Let's say I want to sort this table over two columns (ASC).
But I don't want to do that with a simple ORDER BY, because later I might need to insert rows into the sorted output, and the user probably only wants to see 100 rows of the sorted output at once.
To achieve these goals I do the following:
CREATE TABLE meta_table
(
id bigserial PRIMARY KEY,
id_data bigint NOT NULL -- refers to the data_table
);
--Function to get the Column A of the current row
CREATE OR REPLACE FUNCTION get_column_a(bigint)
RETURNS character AS
'SELECT column_a FROM data_table WHERE id=$1'
LANGUAGE sql IMMUTABLE STRICT;
--Function to get the Column B of the current row
CREATE OR REPLACE FUNCTION get_column_b(bigint)
RETURNS integer AS
'SELECT column_b FROM data_table WHERE id=$1'
LANGUAGE sql IMMUTABLE STRICT;
--Creating a index on expression:
CREATE INDEX meta_sort_index
ON meta_table
USING btree
(get_column_a(id_data), get_column_b(id_data), id_data);
And then I copy the IDs of the data_table to the meta_table:
INSERT INTO meta_table(id_data) (SELECT id FROM data_table);
Later I can add additional rows to the table with a similar simple insert.
To get rows 900,000 - 900,099 (100 rows) I can now use:
SELECT get_column_a(id_data), get_column_b(id_data), id_data
FROM meta_table
ORDER BY 1,2,3 OFFSET 900000 LIMIT 100;
(With an additional INNER JOIN on data_table if I want all the data.)
The Resulting Plan is:
Limit (cost=498956.59..499012.03 rows=100 width=8)
-> Index Only Scan using meta_sort_index on meta_table (cost=0.00..554396.21 rows=1000000 width=8)
This is a pretty efficient plan (Index Only Scans are new in Postgres 9.2).
But what if I want to get rows 20,000,000 - 20,000,099 (100 rows)? Same plan, much longer execution time. Well, to improve the OFFSET performance (Improving OFFSET performance in PostgreSQL) I can do the following (let's assume I saved every 100,000th row away into another table).
SELECT get_column_a(id_data), get_column_b(id_data), id_data
FROM meta_table
WHERE (get_column_a(id_data), get_column_b(id_data), id_data ) >= (get_column_a(587857), get_column_b(587857), 587857 )
ORDER BY 1,2,3 LIMIT 100;
This runs much faster. The Resulting Plan is:
Limit (cost=0.51..61.13 rows=100 width=8)
-> Index Only Scan using meta_sort_index on meta_table (cost=0.51..193379.65 rows=318954 width=8)
Index Cond: (ROW((get_column_a(id_data)), (get_column_b(id_data)), id_data) >= ROW('Z'::bpchar, 27857, 587857))
So far everything works perfectly and Postgres does a great job!
Let's assume I want to change the order of the 2nd column to DESC.
But then I would have to change my WHERE clause, because the row comparison operator compares both columns ASC. The same query as above (ASC ordering) could also be written as:
SELECT get_column_a(id_data), get_column_b(id_data), id_data
FROM meta_table
WHERE
(get_column_a(id_data) > get_column_a(587857))
OR (get_column_a(id_data) = get_column_a(587857) AND ((get_column_b(id_data) > get_column_b(587857))
OR ( (get_column_b(id_data) = get_column_b(587857)) AND (id_data >= 587857))))
ORDER BY 1,2,3 LIMIT 100;
Now the plan changes and the query becomes slow:
Limit (cost=0.00..1095.94 rows=100 width=8)
-> Index Only Scan using meta_sort_index on meta_table (cost=0.00..1117877.41 rows=102002 width=8)
Filter: (((get_column_a(id_data)) > 'Z'::bpchar) OR (((get_column_a(id_data)) = 'Z'::bpchar) AND (((get_column_b(id_data)) > 27857) OR (((get_column_b(id_data)) = 27857) AND (id_data >= 587857)))))
How can I use the efficient earlier plan with DESC ordering?
Do you have any better ideas on how to solve the problem?
(I already tried to declare my own type with its own operator classes, but that's too slow.)
You need to rethink your approach. Where to begin? This is a clear example of the performance limits of the sort of functional approach you are taking to SQL. Functions are largely opaque to the planner, and you are forcing two different lookups on data_table for every row retrieved because the stored procedures' plans cannot be folded together.
Now, far worse, you are indexing one table based on data in another. This might work for append-only workloads (inserts allowed but no updates) but it will not work if data_table can ever have updates applied. If the data in data_table ever changes, you will have the index return wrong results.
In these cases, you are almost always better off writing the join explicitly and letting the planner figure out the best way to retrieve the data.
Now your problem is that your index becomes a lot less useful (and a lot more intensive disk I/O-wise) when you change the order of your second column. On the other hand, if you had two different indexes on the data_table and had an explicit join, PostgreSQL could more easily handle this.
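One way to read that suggestion is to index data_table directly and page on it without the meta_table. A sketch (index names are made up, and the literal values are just the ones from the example above):
-- One index per sort order you need:
CREATE INDEX data_table_ab_asc_idx   ON data_table (column_a, column_b, id);
CREATE INDEX data_table_ab_mixed_idx ON data_table (column_a, column_b DESC, id);
-- ASC/ASC keyset paging with a row-wise comparison (matches the first index):
SELECT column_a, column_b, id
FROM data_table
WHERE (column_a, column_b, id) >= ('Z', 27857, 587857)
ORDER BY column_a, column_b, id
LIMIT 100;
-- ASC/DESC output order can use the second index directly:
SELECT column_a, column_b, id
FROM data_table
ORDER BY column_a, column_b DESC, id
LIMIT 100;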