I have the following tables:
Collections (around 100 records)
Events (related to collections, around 6000 per collection)
Sales (related to events, in total around 2m records)
I need to get all sales for a certain collection, sorted either by sales.timestamp DESC (a datetime field) or by sales.id DESC (ids are already in insertion order). To filter by collection_id, however, I first have to join in the events table and then filter on e.collection_id.
To help with this I created a separate index on timestamp: idx_sales_timestamp_desc (timestamp DESC NULLS LAST), alongside the usual pkey index on sales.id
EXPLAIN (analyze,buffers) SELECT * from sales s
LEFT JOIN events e ON s.sales_event = e.sales_event
WHERE e.collection_id = 9
ORDER BY s.id DESC -- identical results with s.timestamp
LIMIT 10;
Without the ORDER BY:
Limit (cost=0.85..196.61 rows=10 width=619) (actual time=0.069..2.416 rows=10 loops=1)
Buffers: shared hit=172 read=3
I/O Timings: read=1.810
-> Nested Loop (cost=0.85..122231.34 rows=6244 width=619) (actual time=0.068..2.413 rows=10 loops=1)
Buffers: shared hit=172 read=3
I/O Timings: read=1.810
-> Index Scan using idx_events_collection_id on events e (cost=0.43..32359.71 rows=9551 width=206) (actual time=0.027..0.074 rows=47 loops=1)
Index Cond: (collection_id = 9)
Buffers: shared hit=24
-> Index Scan using idx_sales_sales_event on sales s (cost=0.42..9.39 rows=2 width=413) (actual time=0.049..0.049 rows=0 loops=47)
Index Cond: (sales_event = e.sales_event)
Buffers: shared hit=148 read=3
I/O Timings: read=1.810
Planning:
Buffers: shared hit=20
Planning Time: 0.418 ms
Execution Time: 2.444 ms
With the ORDER BY:
Limit (cost=1001.00..3353.78 rows=10 width=619) (actual time=1084.650..2353.191 rows=10 loops=1)
Buffers: shared hit=81908 read=6967 dirtied=1930
I/O Timings: read=3732.771
-> Gather Merge (cost=1001.00..1470076.56 rows=6244 width=619) (actual time=1084.649..2352.683 rows=10 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=81908 read=6967 dirtied=1930
I/O Timings: read=3732.771
-> Nested Loop (cost=0.98..1468355.82 rows=2602 width=619) (actual time=622.297..1768.414 rows=6 loops=3)
Buffers: shared hit=81908 read=6967 dirtied=1930
I/O Timings: read=3732.771
-> Parallel Index Scan Backward using sales_pkey on sales s (cost=0.42..58907.93 rows=303693 width=413) (actual time=0.237..301.251 rows=6008 loops=3)
Buffers: shared hit=9958 read=1094 dirtied=1479
I/O Timings: read=513.609
-> Index Scan using events_pkey on events e (cost=0.55..4.64 rows=1 width=206) (actual time=0.243..0.243 rows=0 loops=18024)
Index Cond: (sales_event = s.sales_event)
Filter: (collection_id = 9)
Rows Removed by Filter: 0
Buffers: shared hit=71950 read=5873 dirtied=451
I/O Timings: read=3219.161
Planning:
Buffers: shared hit=20
Planning Time: 0.268 ms
Execution Time: 2354.905 ms
sales DDL:
-- Table Definition ----------------------------------------------
CREATE TABLE sales (
id BIGSERIAL PRIMARY KEY,
transaction text,
sales_event text,
price bigint,
name text,
timestamp timestamp with time zone
);
-- Indices -------------------------------------------------------
CREATE UNIQUE INDEX sales_pkey ON sales(id int8_ops);
CREATE INDEX idx_sales_sales_event ON sales(sales_event text_ops);
CREATE INDEX idx_sales_timestamp_desc ON sales(timestamp timestamptz_ops DESC);
Events DDL:
-- Table Definition ----------------------------------------------
CREATE TABLE events (
created_at timestamp with time zone,
updated_at timestamp with time zone,
sales_event text PRIMARY KEY,
collection_id bigint REFERENCES collections(id)
);
-- Indices -------------------------------------------------------
CREATE UNIQUE INDEX events_pkey ON events(sales_event text_ops);
Without the ORDER BY, I'm at around 500ms. With the ORDER BY, it easily ends up in the 2s to 3min (or longer) range, depending on DB load, even though EXPLAIN shows all indices being used.
The default order when omitting ORDER BY altogether is not the one I want. I kept ANALYZE up to date as well.
How do I solve this in a good way?
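One approach worth trying (a sketch, untested against this schema): materialize the join first so the planner cannot flip to a backward scan of sales_pkey, then sort the small joined result. The MATERIALIZED keyword assumes PostgreSQL 12+; on older versions, an optimization fence such as OFFSET 0 in a subquery has a similar effect.

```sql
-- Sketch only: run the selective join first, then order its (small) output.
WITH hits AS MATERIALIZED (
    SELECT s.*
    FROM sales s
    JOIN events e ON s.sales_event = e.sales_event
    WHERE e.collection_id = 9
)
SELECT *
FROM hits
ORDER BY id DESC
LIMIT 10;
```

This trades the top-N shortcut for a guaranteed join-first plan, which is usually the right trade when the collection matches only a few dozen events.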
This is a follow-up to an issue I posted a while ago.
I have the following code:
SET work_mem = '16MB';
SELECT s.start_date, s.end_date, s.resources, s.activity_index, r.resource_id, sa.usedresourceset
FROM rm_o_resource_usage_instance_splits_new s
INNER JOIN rm_o_resource_usage r ON s.usage_id = r.id
INNER JOIN scheduledactivities sa ON s.activity_index = sa.activity_index AND r.schedule_id = sa.solution_id and s.solution = sa.solution_id
WHERE r.schedule_id = 10
ORDER BY r.resource_id, s.start_date
When I run EXPLAIN (ANALYZE, BUFFERS) I get the following:
Sort (cost=3724.02..3724.29 rows=105 width=89) (actual time=245.802..247.573 rows=22302 loops=1)
Sort Key: r.resource_id, s.start_date
Sort Method: quicksort Memory: 6692kB
Buffers: shared hit=198702 read=5993 written=612
-> Nested Loop (cost=703.76..3720.50 rows=105 width=89) (actual time=1.898..164.741 rows=22302 loops=1)
Buffers: shared hit=198702 read=5993 written=612
-> Hash Join (cost=703.34..3558.54 rows=105 width=101) (actual time=1.815..11.259 rows=22302 loops=1)
Hash Cond: (s.usage_id = r.id)
Buffers: shared hit=3 read=397 written=2
-> Bitmap Heap Scan on rm_o_resource_usage_instance_splits_new s (cost=690.61..3486.58 rows=22477 width=69) (actual time=1.782..5.820 rows=22302 loops=1)
Recheck Cond: (solution = 10)
Heap Blocks: exact=319
Buffers: shared hit=2 read=396 written=2
-> Bitmap Index Scan on rm_o_resource_usage_instance_splits_new_solution_idx (cost=0.00..685.00 rows=22477 width=0) (actual time=1.609..1.609 rows=22302 loops=1)
Index Cond: (solution = 10)
Buffers: shared hit=2 read=77
-> Hash (cost=12.66..12.66 rows=5 width=48) (actual time=0.023..0.023 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=1 read=1
-> Bitmap Heap Scan on rm_o_resource_usage r (cost=4.19..12.66 rows=5 width=48) (actual time=0.020..0.020 rows=1 loops=1)
Recheck Cond: (schedule_id = 10)
Heap Blocks: exact=1
Buffers: shared hit=1 read=1
-> Bitmap Index Scan on rm_o_resource_usage_sched (cost=0.00..4.19 rows=5 width=0) (actual time=0.017..0.017 rows=1 loops=1)
Index Cond: (schedule_id = 10)
Buffers: shared read=1
-> Index Scan using scheduledactivities_activity_index_idx on scheduledactivities sa (cost=0.42..1.53 rows=1 width=16) (actual time=0.004..0.007 rows=1 loops=22302)
Index Cond: (activity_index = s.activity_index)
Filter: (solution_id = 10)
Rows Removed by Filter: 5
Buffers: shared hit=198699 read=5596 written=610
Planning time: 7.070 ms
Execution time: 248.691 ms
Every time I run EXPLAIN, I get roughly the same results. The Execution Time is always between 170ms and 250ms, which, to me is perfectly fine. However, when this query is run through a C++ project (using PQexec(conn, query) where conn is a dedicated connection, and query is the above query), the time it takes seems to vary widely. In general, the query is very quick, and you don't notice a delay. The problem is, that on occasion, this query will take 2 to 3 minutes to complete.
If I open pgAdmin and look at the "server activity" for the database, there are about 30 connections, mostly sitting at "idle". The above query's connection is marked "active", and stays "active" for several minutes.
I am at a loss as to why it randomly takes several minutes to complete the same query, with no change in the data in the DB either. I have tried increasing work_mem, which didn't make any difference (nor did I really expect it to). Any help or suggestions would be greatly appreciated.
There aren't any more specific tags, but I'm currently using Postgres 10.11; it's also been an issue on other 10.x versions. The system is a quad-core Xeon @ 3.4GHz, with an SSD and 24GB of memory.
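(For anyone reproducing this: the auto_explain module used below can be enabled roughly like so. Values are illustrative, and changing shared_preload_libraries requires a server restart.)

```sql
-- postgresql.conf fragment (illustrative thresholds)
-- shared_preload_libraries = 'auto_explain'
-- auto_explain.log_min_duration = '30s'   -- only log plans slower than this
-- auto_explain.log_analyze = on           -- include actual rows and timings
-- auto_explain.log_buffers = on

-- Or, per session, without a restart:
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '30s';
SET auto_explain.log_analyze = on;
```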
Per jjanes's suggestion, I put in auto_explain. Eventually got this output:
duration: 128057.373 ms
plan:
Query Text: SET work_mem = '32MB';SELECT s.start_date, s.end_date, s.resources, s.activity_index, r.resource_id, sa.usedresourceset FROM rm_o_resource_usage_instance_splits_new s INNER JOIN rm_o_resource_usage r ON s.usage_id = r.id INNER JOIN scheduledactivities sa ON s.activity_index = sa.activity_index AND r.schedule_id = sa.solution_id and s.solution = sa.solution_id WHERE r.schedule_id = 12642 ORDER BY r.resource_id, s.start_date
Sort (cost=14.36..14.37 rows=1 width=98) (actual time=128042.083..128043.287 rows=21899 loops=1)
Output: s.start_date, s.end_date, s.resources, s.activity_index, r.resource_id, sa.usedresourceset
Sort Key: r.resource_id, s.start_date
Sort Method: quicksort Memory: 6585kB
Buffers: shared hit=21198435 read=388 dirtied=119
-> Nested Loop (cost=0.85..14.35 rows=1 width=98) (actual time=4.995..127958.935 rows=21899 loops=1)
Output: s.start_date, s.end_date, s.resources, s.activity_index, r.resource_id, sa.usedresourceset
Join Filter: (s.activity_index = sa.activity_index)
Rows Removed by Join Filter: 705476285
Buffers: shared hit=21198435 read=388 dirtied=119
-> Nested Loop (cost=0.42..9.74 rows=1 width=110) (actual time=0.091..227.705 rows=21899 loops=1)
Output: s.start_date, s.end_date, s.resources, s.activity_index, s.solution, r.resource_id, r.schedule_id
Inner Unique: true
Join Filter: (s.usage_id = r.id)
Buffers: shared hit=22102 read=388 dirtied=119
-> Index Scan using rm_o_resource_usage_instance_splits_new_solution_idx on public.rm_o_resource_usage_instance_splits_new s (cost=0.42..8.44 rows=1 width=69) (actual time=0.082..17.418 rows=21899 loops=1)
Output: s.start_time, s.end_time, s.resources, s.activity_index, s.usage_id, s.start_date, s.end_date, s.solution
Index Cond: (s.solution = 12642)
Buffers: shared hit=203 read=388 dirtied=119
-> Seq Scan on public.rm_o_resource_usage r (cost=0.00..1.29 rows=1 width=57) (actual time=0.002..0.002 rows=1 loops=21899)
Output: r.id, r.schedule_id, r.resource_id
Filter: (r.schedule_id = 12642)
Rows Removed by Filter: 26
Buffers: shared hit=21899
-> Index Scan using scheduled_activities_idx on public.scheduledactivities sa (cost=0.42..4.60 rows=1 width=16) (actual time=0.006..4.612 rows=32216 loops=21899)
Output: sa.usedresourceset, sa.activity_index, sa.solution_id
Index Cond: (sa.solution_id = 12642)
Buffers: shared hit=21176333
EDIT: Full definitions of the tables are below:
CREATE TABLE public.rm_o_resource_usage_instance_splits_new
(
start_time integer NOT NULL,
end_time integer NOT NULL,
resources jsonb NOT NULL,
activity_index integer NOT NULL,
usage_id bigint NOT NULL,
start_date text COLLATE pg_catalog."default" NOT NULL,
end_date text COLLATE pg_catalog."default" NOT NULL,
solution bigint NOT NULL,
CONSTRAINT rm_o_resource_usage_instance_splits_new_pkey PRIMARY KEY (start_time, activity_index, usage_id),
CONSTRAINT rm_o_resource_usage_instance_splits_new_solution_fkey FOREIGN KEY (solution)
REFERENCES public.rm_o_schedule_stats (id) MATCH SIMPLE
ON UPDATE CASCADE
ON DELETE CASCADE,
CONSTRAINT rm_o_resource_usage_instance_splits_new_usage_id_fkey FOREIGN KEY (usage_id)
REFERENCES public.rm_o_resource_usage (id) MATCH SIMPLE
ON UPDATE CASCADE
ON DELETE CASCADE
)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;
CREATE INDEX rm_o_resource_usage_instance_splits_new_activity_idx
ON public.rm_o_resource_usage_instance_splits_new USING btree
(activity_index ASC NULLS LAST)
TABLESPACE pg_default;
CREATE INDEX rm_o_resource_usage_instance_splits_new_solution_idx
ON public.rm_o_resource_usage_instance_splits_new USING btree
(solution ASC NULLS LAST)
TABLESPACE pg_default;
CREATE INDEX rm_o_resource_usage_instance_splits_new_usage_idx
ON public.rm_o_resource_usage_instance_splits_new USING btree
(usage_id ASC NULLS LAST)
TABLESPACE pg_default;
CREATE TABLE public.rm_o_resource_usage
(
id bigint NOT NULL DEFAULT nextval('rm_o_resource_usage_id_seq'::regclass),
schedule_id bigint NOT NULL,
resource_id text COLLATE pg_catalog."default" NOT NULL,
CONSTRAINT rm_o_resource_usage_pkey PRIMARY KEY (id),
CONSTRAINT rm_o_resource_usage_schedule_id_fkey FOREIGN KEY (schedule_id)
REFERENCES public.rm_o_schedule_stats (id) MATCH SIMPLE
ON UPDATE CASCADE
ON DELETE CASCADE
)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;
CREATE INDEX rm_o_resource_usage_idx
ON public.rm_o_resource_usage USING btree
(id ASC NULLS LAST)
TABLESPACE pg_default;
CREATE INDEX rm_o_resource_usage_sched
ON public.rm_o_resource_usage USING btree
(schedule_id ASC NULLS LAST)
TABLESPACE pg_default;
CREATE TABLE public.scheduledactivities
(
id bigint NOT NULL DEFAULT nextval('scheduledactivities_id_seq'::regclass),
solution_id bigint NOT NULL,
activity_id text COLLATE pg_catalog."default" NOT NULL,
sequence_index integer,
startminute integer,
finishminute integer,
issue text COLLATE pg_catalog."default",
activity_index integer NOT NULL,
is_objective boolean NOT NULL,
usedresourceset integer DEFAULT '-1'::integer,
start timestamp without time zone,
finish timestamp without time zone,
is_ore boolean,
is_ignored boolean,
CONSTRAINT scheduled_activities_pkey PRIMARY KEY (id),
CONSTRAINT scheduledactivities_solution_id_fkey FOREIGN KEY (solution_id)
REFERENCES public.rm_o_schedule_stats (id) MATCH SIMPLE
ON UPDATE CASCADE
ON DELETE CASCADE
)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;
CREATE INDEX scheduled_activities_activity_id_idx
ON public.scheduledactivities USING btree
(activity_id COLLATE pg_catalog."default" ASC NULLS LAST)
TABLESPACE pg_default;
CREATE INDEX scheduled_activities_id_idx
ON public.scheduledactivities USING btree
(id ASC NULLS LAST)
TABLESPACE pg_default;
CREATE INDEX scheduled_activities_idx
ON public.scheduledactivities USING btree
(solution_id ASC NULLS LAST)
TABLESPACE pg_default;
CREATE INDEX scheduledactivities_activity_index_idx
ON public.scheduledactivities USING btree
(activity_index ASC NULLS LAST)
TABLESPACE pg_default;
EDIT: Additional output from auto_explain after adding an index on scheduledactivities (solution_id, activity_index):
Output: s.start_date, s.end_date, s.resources, s.activity_index, r.resource_id, sa.usedresourceset
Sort Key: r.resource_id, s.start_date
Sort Method: quicksort Memory: 6283kB
Buffers: shared hit=20159117 read=375 dirtied=190
-> Nested Loop (cost=0.85..10.76 rows=1 width=100) (actual time=5.518..122489.627 rows=20761 loops=1)
Output: s.start_date, s.end_date, s.resources, s.activity_index, r.resource_id, sa.usedresourceset
Join Filter: (s.activity_index = sa.activity_index)
Rows Removed by Join Filter: 668815615
Buffers: shared hit=20159117 read=375 dirtied=190
-> Nested Loop (cost=0.42..5.80 rows=1 width=112) (actual time=0.057..217.563 rows=20761 loops=1)
Output: s.start_date, s.end_date, s.resources, s.activity_index, s.solution, r.resource_id, r.schedule_id
Inner Unique: true
Join Filter: (s.usage_id = r.id)
Buffers: shared hit=20947 read=375 dirtied=190
-> Index Scan using rm_o_resource_usage_instance_splits_new_solution_idx on public.rm_o_resource_usage_instance_splits_new s (cost=0.42..4.44 rows=1 width=69) (actual time=0.049..17.622 rows=20761 loops=1)
Output: s.start_time, s.end_time, s.resources, s.activity_index, s.usage_id, s.start_date, s.end_date, s.solution
Index Cond: (s.solution = 12644)
Buffers: shared hit=186 read=375 dirtied=190
-> Seq Scan on public.rm_o_resource_usage r (cost=0.00..1.35 rows=1 width=59) (actual time=0.002..0.002 rows=1 loops=20761)
Output: r.id, r.schedule_id, r.resource_id
Filter: (r.schedule_id = 12644)
Rows Removed by Filter: 22
Buffers: shared hit=20761
-> Index Scan using scheduled_activities_idx on public.scheduledactivities sa (cost=0.42..4.94 rows=1 width=16) (actual time=0.007..4.654 rows=32216 loops=20761)
Output: sa.usedresourceset, sa.activity_index, sa.solution_id
Index Cond: (sa.solution_id = 12644)
Buffers: shared hit=20138170
The easiest way to reproduce the issue is to add more values to the three tables. I didn't delete any, only did a few thousand INSERTs.
-> Index Scan using .. s (cost=0.42..8.44 rows=1 width=69) (actual time=0.082..17.418 rows=21899 loops=1)
Index Cond: (s.solution = 12642)
The planner thinks it will find 1 row, and instead finds 21899. That error can pretty clearly lead to bad plans. And a single equality condition should be estimated quite accurately, so I'd say the statistics on your table are way off. It could be that the autovac launcher is tuned poorly so it doesn't run often enough, or it could be that select parts of your data change very rapidly (did you just insert 21899 rows with s.solution = 12642 immediately before running the query?) and so the stats can't be kept accurate enough.
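If stale statistics right after a bulk insert are the cause, a manual ANALYZE (or more aggressive per-table autoanalyze settings) is worth trying. The threshold below is illustrative, not a recommendation:

```sql
-- Refresh statistics immediately after the bulk insert:
ANALYZE rm_o_resource_usage_instance_splits_new;

-- Or make autoanalyze fire more often for this one table
-- (here: analyze after roughly 1% of rows change):
ALTER TABLE rm_o_resource_usage_instance_splits_new
    SET (autovacuum_analyze_scale_factor = 0.01);
```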
-> Nested Loop ...
Join Filter: (s.activity_index = sa.activity_index)
Rows Removed by Join Filter: 705476285
-> ...
-> Index Scan using scheduled_activities_idx on public.scheduledactivities sa (cost=0.42..4.60 rows=1 width=16) (actual time=0.006..4.612 rows=32216 loops=21899)
Output: sa.usedresourceset, sa.activity_index, sa.solution_id
Index Cond: (sa.solution_id = 12642)
If you can't get it to use the Hash Join, you can at least reduce the harm of the Nested Loop by building an index on scheduledactivities (solution_id, activity_index). That way the activity_index criterion could be part of the Index Condition, rather than being a Join Filter. You could probably then drop the index exclusively on solution_id, as there is little point in maintaining both indexes.
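Concretely, that suggestion would look something like this (the index name is illustrative):

```sql
-- Composite index so activity_index becomes an Index Cond, not a Join Filter:
CREATE INDEX scheduledactivities_solution_activity_idx
    ON public.scheduledactivities (solution_id, activity_index);

-- The single-column index on solution_id is then largely redundant:
DROP INDEX scheduled_activities_idx;
```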
The SQL statement of the fast plan is using WHERE r.schedule_id = 10 and returns about 22000 rows (with estimated 105).
The SQL statement of the slow plan is using WHERE r.schedule_id = 12642 and returns about 21000 rows (with estimated only 1).
The slow plan is using nested loops instead of hash joins, maybe because of a bad join estimate: estimated rows is 1 but actual rows is 21899.
For example in this step:
Nested Loop (cost=0.42..9.74 rows=1 width=110) (actual time=0.091..227.705 rows=21899 loops=1)
If the data does not change, there may be a statistics issue (skewed data) for some columns.
In my query, I just want to fetch rows matching exact WHERE conditions. Those conditions are covered by an index, but EXPLAIN shows a bitmap index scan, and I don't understand why.
My query looks like below:
Select
    r.spend,
    r.date,
    ...
from metadata m
inner join report r
    on m.org_id = r.org_id
    and m.country_or_region = r.country_or_region
    and m.campaign_id = r.campaign_id
    and m.keyword_id = r.keyword_id
where r.org_id = 1 and m.keyword_type = 'KEYWORD'
offset 0 limit 20
Indexes:
Metadata(org_id, keyword_type, country_or_region, campaign_id, keyword_id);
Report(org_id, country_or_region, campaign_id, keyword_id, date);
Explain Analyze:
"Limit (cost=811883.21..910327.87 rows=20 width=8) (actual time=18120.268..18235.831 rows=20 loops=1)"
" -> Gather (cost=811883.21..2702020.67 rows=384 width=8) (actual time=18120.267..18235.791 rows=20 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" -> Parallel Hash Join (cost=810883.21..2700982.27 rows=160 width=8) (actual time=18103.440..18103.496 rows=14 loops=3)"
" Hash Cond: (((r.country_or_region)::text = (m.country_or_region)::text) AND (r.campaign_id = m.campaign_id) AND (r.keyword_id = m.keyword_id))"
" -> Parallel Bitmap Heap Scan on report r (cost=260773.11..2051875.83 rows=3939599 width=35) (actual time=552.601..8532.962 rows=3162553 loops=3)"
" Recheck Cond: (org_id = 479360)"
" Rows Removed by Index Recheck: 21"
" Heap Blocks: exact=20484 lossy=84350"
" -> Bitmap Index Scan on idx_kr_org_date_camp (cost=0.00..258409.35 rows=9455038 width=0) (actual time=539.329..539.329 rows=9487660 loops=1)"
" Index Cond: (org_id = 479360)"
" -> Parallel Hash (cost=527278.08..527278.08 rows=938173 width=26) (actual time=7425.062..7425.062 rows=727133 loops=3)"
" Buckets: 65536 Batches: 64 Memory Usage: 2656kB"
" -> Parallel Bitmap Heap Scan on metadata m (cost=88007.61..527278.08 rows=938173 width=26) (actual time=1007.028..7119.233 rows=727133 loops=3)"
" Recheck Cond: ((org_id = 479360) AND ((keyword_type)::text = 'KEYWORD'::text))"
" Rows Removed by Index Recheck: 3"
" Heap Blocks: exact=14585 lossy=11054"
" -> Bitmap Index Scan on idx_primary (cost=0.00..87444.71 rows=2251615 width=0) (actual time=1014.631..1014.631 rows=2181399 loops=1)"
" Index Cond: ((org_id = 479360) AND ((keyword_type)::text = 'KEYWORD'::text))"
"Planning Time: 0.492 ms"
"Execution Time: 18235.879 ms"
Here I just want to fetch 20 items. Shouldn't this be more efficient?
A Bitmap Index Scan happens when the planner expects a large fraction of rows to satisfy the search conditions (i.e., the conditions are not very selective). In that case, it scans the entire index, building a bitmap of which pages on disk to pull data from (which happens during the Bitmap Heap Scan step). This is better than a Sequential Scan because it only reads the relevant pages, skipping the pages it knows contain no matching data. Depending on the statistics available to the optimizer, a plain Index Scan or Index-Only Scan may not be advantageous, but a bitmap scan is still better than a Sequential Scan.
To complete the answer to the question, an Index-Only Scan is a scan of the index that will pull the relevant data without having to visit the actual table. This is because the relevant data is already in the index. Take, for example, this table:
postgres=# create table foo (id int primary key, name text);
CREATE TABLE
postgres=# insert into foo values (generate_series(1,1000000),'foo');
INSERT 0 1000000
There is an index on the id column of this table, and suppose we call the following query:
postgres=# EXPLAIN ANALYZE SELECT * FROM foo WHERE id < 100;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Index Scan using foo_pkey on foo (cost=0.42..10.25 rows=104 width=8) (actual time=0.012..1.027 rows=99 loops=1)
Index Cond: (id < 100)
Planning Time: 0.190 ms
Execution Time: 2.067 ms
(4 rows)
This query results in an Index scan because it scans the index for the rows that have id < 100, and then visits the actual table on disk to pull the other columns included in the * portion of the SELECT query.
However, suppose we call the following query (notice SELECT id instead of SELECT *):
postgres=# EXPLAIN ANALYZE SELECT id FROM foo WHERE id < 100;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Index Only Scan using foo_pkey on foo (cost=0.42..10.25 rows=104 width=4) (actual time=0.019..0.996 rows=99 loops=1)
Index Cond: (id < 100)
Heap Fetches: 99
Planning Time: 0.098 ms
Execution Time: 1.980 ms
(5 rows)
This results in an Index-Only Scan because only the id column is requested, and it is (naturally) included in the index, so there's no need to visit the actual table on disk to retrieve anything else. This saves time, but the conditions under which it applies are limited.
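As an aside, the Heap Fetches: 99 in that plan means the executor still had to consult the heap to check row visibility. After a VACUUM updates the visibility map, an index-only scan can usually skip that step too:

```sql
VACUUM foo;  -- sets the visibility map for all-visible pages

EXPLAIN ANALYZE SELECT id FROM foo WHERE id < 100;
-- the plan should now report "Heap Fetches: 0"
```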
To answer your question about limiting to 20 results: the limiting occurs after the Bitmap Index Scan has completed, so the runtime will be roughly the same whether you limit to 20, 40, or some other value. With an Index/Index-Only Scan, the executor can stop scanning once it has acquired enough rows to satisfy the LIMIT clause. In your case, with the Bitmap Heap Scan, this isn't possible.
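If early termination matters more than this particular plan shape, one option (a sketch, untested against your data) is to drive the query from report and probe metadata once per row, so the LIMIT can stop the scan early:

```sql
-- Sketch: a correlated EXISTS lets the executor stop as soon as 20 rows
-- qualify, provided the planner chooses an index scan on report (org_id, ...).
SELECT r.spend, r.date
FROM report r
WHERE r.org_id = 1
  AND EXISTS (
      SELECT 1
      FROM metadata m
      WHERE m.org_id = r.org_id
        AND m.keyword_type = 'KEYWORD'
        AND m.country_or_region = r.country_or_region
        AND m.campaign_id = r.campaign_id
        AND m.keyword_id = r.keyword_id
  )
LIMIT 20;
```

Whether this actually wins depends on how selective the EXISTS probe is per report row; if almost no rows qualify, the scan of report can still run long.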
I am currently working on improving the performance of our db. And I need some help from you.
I have a table and its index like this
CREATE TABLE public.ar
(
id integer NOT NULL DEFAULT nextval('id_seq'::regclass),
user_id integer NOT NULL,
duration double precision,
is_idle boolean NOT NULL,
activity_id integer NOT NULL,
device_id integer NOT NULL,
calendar_id integer,
on_taskness integer,
week_id integer,
some_other_column_below,
CONSTRAINT id_ PRIMARY KEY (id),
CONSTRAINT a_unique_key UNIQUE (user_id, device_id, start_time_local, start_time_utc, end_time_local, end_time_utc)
)
CREATE INDEX ar_idx
ON public.ar USING btree
(week_id, calendar_id, user_id, activity_id, duration, on_taskness, is_idle)
TABLESPACE pg_default;
Then I am trying to run a query like this
EXPLAIN ANALYZE
SELECT COUNT(*)
FROM (
SELECT ar.user_id
FROM ar
WHERE ar.user_id = ANY(array[some_data]) -- data size is 352
AND ROUND(ar.duration) >0 AND ar.is_idle = false
AND ar.week_id = ANY(ARRAY[some_data]) -- data size is 37
AND ar.calendar_id = ANY(array[some_data]) -- data size is 16716
GROUP by ar.user_id
) tmp;
And below is the explain result
Aggregate (cost=31389954.72..31389954.73 rows=1 width=8) (actual time=252020.695..252020.695 rows=1 loops=1)
-> Group (cost=31389032.69..31389922.37 rows=2588 width=4) (actual time=251089.270..252020.659 rows=351 loops=1)
Group Key: ar.user_id
-> Sort (cost=31389032.69..31389477.53 rows=177935 width=4) (actual time=251089.268..251776.202 rows=6993358 loops=1)
Sort Key: ar.user_id
Sort Method: external merge Disk: 95672kB
-> Bitmap Heap Scan on ar (cost=609015.18..31371079.88 rows=177935 width=4) (actual time=1670.413..248939.440 rows=6993358 loops=1)
Recheck Cond: ((week_id = ANY ('{some_data}'::integer[])) AND (user_id = ANY ('{some_data}'::integer[])))
Rows Removed by Index Recheck: 2081028
Filter: ((NOT is_idle) AND (round(duration) > '0'::double precision) AND (calendar_id = ANY ('{some_data}'::integer[])))
Rows Removed by Filter: 534017
Heap Blocks: exact=29551 lossy=313127
-> BitmapAnd (cost=609015.18..609015.18 rows=1357521 width=0) (actual time=1666.334..1666.334 rows=0 loops=1)
-> Bitmap Index Scan on test_index_only_scan_idx (cost=0.00..272396.77 rows=6970353 width=0) (actual time=614.366..614.366 rows=7269830 loops=1)
Index Cond: ((week_id = ANY ('{some_data}'::integer[])) AND (is_idle = false))
-> Bitmap Index Scan on unique_key (cost=0.00..336529.20 rows=9948573 width=0) (actual time=1041.999..1041.999 rows=14959355 loops=1)
Index Cond: (user_id = ANY ('{some_data}'::integer[]))
Planning time: 25.563 ms
Execution time: 252029.237 ms
I used distinct as well, and the result is the same.
So my questions are below.
The ar_idx contains user_id, but when searching for rows, why does it use the unique_key instead of the index I created?
I thought GROUP BY would not sort (that is why I did not choose DISTINCT), so why does a sort show up in EXPLAIN ANALYZE?
The running time is pretty long (more than 4 minutes). How do I make it faster? Is the index wrong? Is there anything else I can do?
Be advised, the ar table contains 51585203 rows.
Any help will be appreciated. Thx.
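For reference, a partial index whose predicate matches the constant parts of the WHERE clause is one standard approach here, so that is_idle and round(duration) never have to be rechecked as filters (a sketch; the index name and column order are illustrative):

```sql
-- Partial index: the predicate absorbs the constant filters, and the key
-- columns serve the equality lists in the query.
CREATE INDEX ar_active_idx
    ON public.ar (calendar_id, user_id, week_id)
    WHERE is_idle = false AND round(duration) > 0;
```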
---------------------------update--------------------------
After I created this index, everything goes really fast. I don't understand why; can anyone explain this to me?
CREATE INDEX ar_1_idx
ON public.ar USING btree
(calendar_id, user_id)
TABLESPACE pg_default;
And I changed the old index to
CREATE INDEX ar_idx
ON public.ar USING btree
(week_id, calendar_id, user_id, activity_id, duration, on_taskness, start_time_local, end_time_local) WHERE is_idle IS FALSE
TABLESPACE pg_default;
-----updated analyze results-----------
Aggregate (cost=31216435.97..31216435.98 rows=1 width=8) (actual time=13206.941..13206.941 rows=1 loops=1)
Buffers: shared hit=25940518 read=430315, temp read=31079 written=31079
-> Group (cost=31215436.80..31216403.88 rows=2567 width=4) (actual time=12239.336..13206.894 rows=351 loops=1)
Group Key: ar.user_id
Buffers: shared hit=25940518 read=430315, temp read=31079 written=31079
-> Sort (cost=31215436.80..31215920.34 rows=193417 width=4) (actual time=12239.334..12932.801 rows=6993358 loops=1)
Sort Key: ar.user_id
Sort Method: external merge Disk: 95664kB
Buffers: shared hit=25940518 read=430315, temp read=31079 written=31079
-> Index Scan using ar_1_idx on activity_report ar (cost=0.56..31195807.48 rows=193417 width=4) (actual time=0.275..10387.051 rows=6993358 loops=1)
Index Cond: ((calendar_id = ANY ('{some_data}'::integer[])) AND (user_id = ANY ('{some_data}'::integer[])))
Filter: ((NOT is_idle) AND (round(duration) > '0'::double precision) AND (week_id = ANY ('{some_data}'::integer[])))
Rows Removed by Filter: 590705
Buffers: shared hit=25940518 read=430315
Planning time: 25.577 ms
Execution time: 13217.611 ms
Test table and indexes:
CREATE TABLE public.t (id serial, cb boolean, ci integer, co integer);

INSERT INTO t(cb, ci, co)
SELECT ((round(random()*1))::int)::boolean, round(random()*100), round(random()*100)
FROM generate_series(1, 1000000);

CREATE INDEX "right" ON public.t USING btree (ci, cb, co);
CREATE INDEX wrong ON public.t USING btree (ci, co);
CREATE INDEX right_hack ON public.t USING btree (ci, (cb::integer), co);
The problem is that I can't force PostgreSQL to use the "right" index. The following query uses the "wrong" index instead. That's not optimal, because it applies a Filter (condition: cb = true) and so reads more data from memory, making execution take longer:
explain (analyze, buffers)
SELECT * FROM t WHERE cb = TRUE AND ci = 46 ORDER BY co LIMIT 1000
"Limit (cost=0.42..4063.87 rows=1000 width=13) (actual time=0.057..4.405 rows=1000 loops=1)"
" Buffers: shared hit=1960"
" -> Index Scan using wrong on t (cost=0.42..21784.57 rows=5361 width=13) (actual time=0.055..4.256 rows=1000 loops=1)"
" Index Cond: (ci = 46)"
" Filter: cb"
" Rows Removed by Filter: 967"
" Buffers: shared hit=1960"
"Planning time: 0.318 ms"
"Execution time: 4.530 ms"
But when I cast the bool column to int, it works fine. This is unclear to me, because the selectivity of both indexes (right and right_hack) is the same.
explain (analyze, buffers)
SELECT * FROM t WHERE cb::int = 1 AND ci = 46 ORDER BY co LIMIT 1000
"Limit (cost=0.42..2709.91 rows=1000 width=13) (actual time=0.027..1.484 rows=1000 loops=1)"
" Buffers: shared hit=1003"
" -> Index Scan using right_hack on t (cost=0.42..14525.95 rows=5361 width=13) (actual time=0.025..1.391 rows=1000 loops=1)"
" Index Cond: ((ci = 46) AND ((cb)::integer = 1))"
" Buffers: shared hit=1003"
"Planning time: 0.202 ms"
"Execution time: 1.565 ms"
Are there any limitations of using boolean column inside multicolumn index?
A partial (conditional) index, or two, does seem to work:
CREATE INDEX true_bits ON ttt (ci, co)
WHERE cb = True ;
CREATE INDEX false_bits ON ttt (ci, co)
WHERE cb = False ;
VACUUM ANALYZE ttt;
EXPLAIN (ANALYZE, buffers)
SELECT * FROM ttt
WHERE cb = TRUE AND ci = 46 ORDER BY co LIMIT 1000
;
Plan
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.25..779.19 rows=1000 width=13) (actual time=0.024..1.804 rows=1000 loops=1)
Buffers: shared hit=1001
-> Index Scan using true_bits on ttt (cost=0.25..3653.46 rows=4690 width=13) (actual time=0.020..1.570 rows=1000 loops=1)
Index Cond: (ci = 46)
Buffers: shared hit=1001
Planning time: 0.468 ms
Execution time: 1.949 ms
(7 rows)
Still, there is very little gain from indexes on low-cardinality columns. The chance that an index entry lets us avoid a page read is very small: with a page size of 8KB and a row size of ~20 bytes, there are ~400 records on a page, so (almost) every page will contain both a true record and a false record, and the page has to be read anyway.
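That claim can be sanity-checked with a quick calculation: with ~400 rows per page and a 50/50 true/false split, the probability that a page contains no true row at all is

```sql
SELECT power(0.5, 400) AS p_page_without_true_row;
-- on the order of 1e-121: effectively every page contains a matching row,
-- so no index on cb alone can skip any heap pages for this data.
```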