I recently encountered and solved a problem - but I do not get why there even was a problem to begin with.
Simply put, I have 3 tables in a Postgres 10.5 database:
entities (id, name)
entities_to_stuff(
id,
entities_id -> fk entities.id,
stuff_id -> fk stuff.id,
unique constraint (entities_id, stuff_id)
)
stuff(id, name)
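In DDL form this is roughly the following (a sketch; the column types are placeholders, the constraint name is taken from the plans below):
CREATE TABLE entities (id bigserial PRIMARY KEY, name text);
CREATE TABLE stuff (id bigserial PRIMARY KEY, name text);
CREATE TABLE entities_to_stuff (
    id bigserial PRIMARY KEY,
    entities_id bigint NOT NULL REFERENCES entities (id),
    stuff_id bigint NOT NULL REFERENCES stuff (id),
    CONSTRAINT uq_entities_to_stuff UNIQUE (entities_id, stuff_id)
);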
After inserting about 200k records, this query:
select * from entities_to_stuff where entities_id = 1;
started to take 100 - 400 ms.
As I understand it, creating a unique constraint also creates an index on the unique columns, so I have an index on (entities_id, stuff_id), entities_id being the "leftmost" column.
According to the docs, queries involving the leftmost columns are the most efficient (see the Postgres docs on multicolumn indexes) - so I assumed this index would do for me.
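For illustration, my reading of that doc with the (entities_id, stuff_id) index:
-- constrains the leftmost column, so the index should be usable:
select * from entities_to_stuff where entities_id = 1;
-- skips the leftmost column, so the index is generally not usable efficiently:
select * from entities_to_stuff where stuff_id = 1;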
So I checked the execution plan - it wasn't using the index.
So, just to be sure, I did:
SET enable_seqscan = OFF;
and re-ran the query - it still took well over 100 ms most of the time.
I then got annoyed and created this index:
create index "idx_entities_id" on "entities_to_stuff" ("entities_id");
and suddenly it takes 0.2 ms or even less to run, and the execution plan also uses it when sequential scans are enabled.
How is this index orders of magnitude faster than the one that already existed?
Edit:
execution plan after creating the additional index:
Index Scan using idx_entities_id on entities_to_stuff (cost=0.00..12.04 rows=2 width=32) (actual time=0.049..0.050 rows=1 loops=1)
Index Cond: (entities_id = 199283)
Planning time: 0.378 ms
Execution time: 0.073 ms
plan with just the unique constraint, enable_seqscan = on:
Gather (cost=1000.00..38679.87 rows=2 width=32) (actual time=344.321..1740.861 rows=1 loops=1)
Workers Planned: 2
Workers Launched: 0
-> Parallel Seq Scan on entities_to_stuff (cost=0.00..37679.67 rows=1 width=32) (actual time=344.088..1739.684 rows=1 loops=1)
Filter: (entities_id = 199283)
Rows Removed by Filter: 2907419
Planning time: 0.241 ms
Execution time: 740.888 ms
plan with just the unique constraint, enable_seqscan = off:
Index Scan using uq_entities_to_stuff on entities_to_stuff (cost=0.43..66636.34 rows=2 width=32) (actual time=0.385..553.066 rows=1 loops=1)
Index Cond: (entities_id = 199283)
Planning time: 0.082 ms
Execution time: 553.103 ms
Related
Postgres is using a much heavier Seq Scan on table tracking when an index is available. The first query was the original attempt, which uses a Seq Scan and is therefore slow. I attempted to force an Index Scan with an inner select, but Postgres converted it back to effectively the same query, with nearly the same runtime. I finally copied the list from the inner select of query two to make the third query. Finally Postgres used the Index Scan, which dramatically decreased the runtime. The third query is not viable in a production environment. What will cause Postgres to use the last query plan?
(vacuum was used on both tables)
Tables
tracking (worker_id, localdatetime) total records: 118664105
project_worker (id, project_id) total records: 12935
Index
CREATE INDEX tracking_worker_id_localdatetime_idx ON public.tracking USING btree (worker_id, localdatetime)
Queries
SELECT worker_id, localdatetime FROM tracking t JOIN project_worker pw ON t.worker_id = pw.id WHERE project_id = 68475018
Hash Join (cost=29185.80..2638162.26 rows=19294218 width=16) (actual time=16.912..18376.032 rows=177681 loops=1)
Hash Cond: (t.worker_id = pw.id)
-> Seq Scan on tracking t (cost=0.00..2297293.86 rows=118716186 width=16) (actual time=0.004..8242.891 rows=118674660 loops=1)
-> Hash (cost=29134.80..29134.80 rows=4080 width=8) (actual time=16.855..16.855 rows=2102 loops=1)
Buckets: 4096 Batches: 1 Memory Usage: 115kB
-> Seq Scan on project_worker pw (cost=0.00..29134.80 rows=4080 width=8) (actual time=0.004..16.596 rows=2102 loops=1)
Filter: (project_id = 68475018)
Rows Removed by Filter: 10833
Planning Time: 0.192 ms
Execution Time: 18382.698 ms
SELECT worker_id, localdatetime FROM tracking t WHERE worker_id IN (SELECT id FROM project_worker WHERE project_id = 68475018 LIMIT 500)
Hash Semi Join (cost=6905.32..2923969.14 rows=27733254 width=24) (actual time=19.715..20191.517 rows=20530 loops=1)
Hash Cond: (t.worker_id = project_worker.id)
-> Seq Scan on tracking t (cost=0.00..2296948.27 rows=118698327 width=24) (actual time=0.005..9184.676 rows=118657026 loops=1)
-> Hash (cost=6899.07..6899.07 rows=500 width=8) (actual time=1.103..1.103 rows=500 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 28kB
-> Limit (cost=0.00..6894.07 rows=500 width=8) (actual time=0.006..1.011 rows=500 loops=1)
-> Seq Scan on project_worker (cost=0.00..28982.65 rows=2102 width=8) (actual time=0.005..0.968 rows=500 loops=1)
Filter: (project_id = 68475018)
Rows Removed by Filter: 4493
Planning Time: 0.224 ms
Execution Time: 20192.421 ms
SELECT worker_id, localdatetime FROM tracking t WHERE worker_id IN (322016383,316007840,...,285702579)
Index Scan using tracking_worker_id_localdatetime_idx on tracking t (cost=0.57..4766798.31 rows=21877360 width=24) (actual time=0.079..29.756 rows=22112 loops=1)
" Index Cond: (worker_id = ANY ('{322016383,316007840,...,285702579}'::bigint[]))"
Planning Time: 1.162 ms
Execution Time: 30.884 ms
The ... is in place of the 500 id entries used in the query.
The same query run on another set of 500 ids:
Index Scan using tracking_worker_id_localdatetime_idx on tracking t (cost=0.57..4776714.91 rows=21900980 width=24) (actual time=0.105..5528.109 rows=117838 loops=1)
" Index Cond: (worker_id = ANY ('{286237712,286237844,...,216724213}'::bigint[]))"
Planning Time: 2.105 ms
Execution Time: 5534.948 ms
The distribution of "worker_id" within "tracking" seems very skewed. For one thing, one of your instances of query 3 returns over 5 times as many rows as the other instance. For another, the estimated number of rows is 100 to 1000 times higher than the actual number. This can certainly lead to bad plans (although it is unlikely to be the complete picture).
What is the actual number of distinct values for worker_id within tracking:
select count(distinct worker_id) from tracking;
What does the planner think this value is:
select n_distinct from pg_stats where tablename = 'tracking' and attname = 'worker_id';
If those values are far apart and you force the planner to use a more reasonable value with
alter table tracking alter column worker_id set (n_distinct = <real value>);
analyze tracking;
does that change the plans?
If you want to nudge PostgreSQL towards a nested loop join, try the following:
Create an index on tracking that can be used for an index-only scan:
CREATE INDEX ON tracking (worker_id) INCLUDE (localdatetime);
Make sure that tracking is VACUUMed often, so that an index-only scan is effective.
Reduce random_page_cost and increase effective_cache_size so that the optimizer prices index scans lower (but don't use insane values); see the sketch after this list.
Make sure that you have good estimates on project_worker:
ALTER TABLE project_worker ALTER project_id SET STATISTICS 1000;
ANALYZE project_worker;
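As a sketch of the random_page_cost/effective_cache_size point (the values here are assumptions; tune them to your storage and RAM):
-- closer to 1.0 is reasonable for SSD-backed storage
SET random_page_cost = 1.1;
-- roughly the memory available for caching the database
SET effective_cache_size = '8GB';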
I am working on a relatively simple query:
SELECT row.id, row.name FROM things AS row
WHERE row.type IN (
'00000000-0000-0000-0000-000000031201',
...
)
ORDER BY row.name ASC, row.id ASC
LIMIT 2000;
The problem:
the query is fine if the list contains 25 or fewer UUIDs:
Limit (cost=21530.51..21760.51 rows=2000 width=55) (actual time=5.057..7.780 rows=806 loops=1)
-> Gather Merge (cost=21530.51..36388.05 rows=129196 width=55) (actual time=5.055..6.751 rows=806 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Sort (cost=20530.50..20853.49 rows=129196 width=55) (actual time=2.273..2.546 rows=403 loops=2)
Sort Key: name, id
Sort Method: quicksort Memory: 119kB
-> Parallel Index Only Scan using idx_things_type_name_id on things row (cost=0.69..9562.28 rows=129196 width=55) (actual time=0.065..0.840 rows=403 loops=2)
Index Cond: (type = ANY ('{00000000-0000-0000-0000-000000031201,... (< 24 more)}'::text[]))
Heap Fetches: 0
Planning time: 0.202 ms
Execution time: 8.485 ms
but once the list grows larger than 25 elements a different index is used and the query execution time really goes up:
Limit (cost=1000.58..15740.63 rows=2000 width=55) (actual time=11.553..29789.670 rows=952 loops=1)
-> Gather Merge (cost=1000.58..2400621.01 rows=325592 width=55) (actual time=11.551..29855.053 rows=952 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Index Scan using idx_things_name_id on things row (cost=0.56..2362039.59 rows=135663 width=55) (actual time=3.570..24437.039 rows=317 loops=3)
Filter: ((type)::text = ANY ('{00000000-0000-0000-0000-000000031201,... (> 24 more)}'::text[]))
Rows Removed by Filter: 5478258
Planning time: 0.209 ms
Execution time: 29857.454 ms
Details:
The table contains 16435726 rows, 17 columns. The 3 columns relevant to the query are:
id - varchar(36), not null, unique, primary key
type - varchar(36), foreign key
name - varchar(2000)
The relevant indexes are:
create unique index idx_things_pkey on things (id);
create index idx_things_type on things (type);
create index idx_things_name_id on things (name, id);
create index idx_things_type_name_id on things (type, name, id);
There are 70 different type values, of which 2 account for ~15 million rows. Those two are NOT in the IN list.
Experiments and questions:
I started by checking if this index helps:
create index idx_things_name_id_type ON things (name, id, type);
it did, but only slightly. 12 s is not acceptable:
Limit (cost=1000.71..7638.73 rows=2000 width=55) (actual time=5.888..12120.907 rows=952 loops=1)
-> Gather Merge (cost=1000.71..963238.21 rows=289917 width=55) (actual time=5.886..12154.580 rows=952 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Index Only Scan using idx_things_name_id_type on things row (cost=0.69..928774.57 rows=120799 width=55) (actual time=1.024..9852.923 rows=317 loops=3)
Filter: ((type)::text = ANY ('{00000000-0000-0000-0000-000000031201,... 37 more}'::text[]))
Rows Removed by Filter: 5478258
Heap Fetches: 0
Planning time: 0.638 ms
Execution time: 12156.817 ms
I know that large IN lists are not efficient in Postgres, but I was surprised to hit this with as few as 25 elements. Or is the issue here something else?
I tried the solutions suggested in other posts (inner joining to a VALUES list, changing IN to IN (VALUES ...), ...) but they made things even worse. Here is an example of one of the experiments:
SELECT row.id, row.name
FROM things AS row
WHERE row.type IN (VALUES ('00000000-0000-0000-0000-000000031201'), ... )
ORDER BY row.name ASC, row.id ASC
LIMIT 2000;
Limit (cost=0.56..1254.91 rows=2000 width=55) (actual time=45.718..847919.632 rows=952 loops=1)
-> Nested Loop Semi Join (cost=0.56..10298994.72 rows=16421232 width=55) (actual time=45.714..847917.788 rows=952 loops=1)
Join Filter: ((row.type)::text = "*VALUES*".column1)
Rows Removed by Join Filter: 542360414
-> Index Scan using idx_things_name_id on things row (cost=0.56..2170484.38 rows=16421232 width=92) (actual time=0.132..61387.582 rows=16435726 loops=1)
-> Materialize (cost=0.00..0.58 rows=33 width=32) (actual time=0.001..0.022 rows=33 loops=16435726)
-> Values Scan on "*VALUES*" (cost=0.00..0.41 rows=33 width=32) (actual time=0.004..0.030 rows=33 loops=1)
Planning time: 1.131 ms
Execution time: 847920.680 ms
(9 rows)
Query plan from inner join values():
Limit (cost=0.56..1254.91 rows=2000 width=55) (actual time=38.289..847714.160 rows=952 loops=1)
-> Nested Loop (cost=0.56..10298994.72 rows=16421232 width=55) (actual time=38.287..847712.333 rows=952 loops=1)
Join Filter: ((row.type)::text = "*VALUES*".column1)
Rows Removed by Join Filter: 542378006
-> Index Scan using idx_things_name_id on things row (cost=0.56..2170484.38 rows=16421232 width=92) (actual time=0.019..60303.676 rows=16435726 loops=1)
-> Materialize (cost=0.00..0.58 rows=33 width=32) (actual time=0.001..0.022 rows=33 loops=16435726)
-> Values Scan on "*VALUES*" (cost=0.00..0.41 rows=33 width=32) (actual time=0.002..0.029 rows=33 loops=1)
Planning time: 0.247 ms
Execution time: 847715.215 ms
(9 rows)
Am I doing something incorrectly here?
Any tips on how to handle this?
If any more info is needed I will add it as you guys ask.
ps. The column/table/index names were "anonymised" to comply with the company policy so please do not point to the stupid names :)
I actually figured out what is going on. The Postgres planner is correct in its decision, but it bases that decision on imperfect statistics. The key is these lines of the query plans:
Below 25 UUIDs:
-> Gather Merge (cost=21530.51..36388.05 **rows=129196** width=55)
(actual time=5.055..6.751 **rows=806** loops=1)
Overestimated by ~160 times
Over 25 UUIDs:
-> Gather Merge (cost=1000.58..2400621.01 **rows=325592** width=55)
(actual time=11.551..29855.053 **rows=952** loops=1)
overestimated by ~342(!) times
If this were indeed 325592 rows, then using the index on (name, id), which is already sorted as needed, and filtering from it could be the most efficient approach. But because of the overestimation, Postgres needs to remove over 5M rows to fetch the full result:
**Rows Removed by Filter: 5478258**
I guess Postgres figures that sorting 325592 rows (the query with > 25 UUIDs) would be so expensive that it is more beneficial to walk the already-sorted index, whereas the 129196 rows (the query with < 25 UUIDs) can be sorted in memory.
I took a peek into pg_stats and the statistics were quite unhelpful.
This is because a few types that are not present in the query occur so often that they dominate the statistics; the UUIDs that do occur in the query fall into the histogram and are overestimated.
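The peek was along these lines (a sketch using the standard pg_stats columns):
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'things' AND attname = 'type';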
Increasing the statistics target for this column:
ALTER TABLE things ALTER COLUMN type SET STATISTICS 1000;
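and re-analyzing the table, since ANALYZE is what actually recollects the statistics under the new target:
ANALYZE things;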
Solved the issue.
Now the query executes in 8 ms even for UUID lists with more than 25 elements.
UPDATE:
Because of claims that the ::text cast visible in the query plan is the culprit, I went to the test server and ran:
ALTER TABLE things ALTER COLUMN type TYPE text;
Now there is no cast, but nothing has changed:
Limit (cost=1000.58..20590.31 rows=2000 width=55) (actual time=12.761..30611.878 rows=952 loops=1)
-> Gather Merge (cost=1000.58..2386715.56 rows=243568 width=55) (actual time=12.759..30682.784 rows=952 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Index Scan using idx_things_name_id on things row (cost=0.56..2357601.74 rows=101487 width=55) (actual time=3.994..24190.693 rows=317 loops=3)
Filter: (type = ANY ('{00000000-0000-0000-0000-000000031201,... (> 24 more)}'::text[]))
Rows Removed by Filter: 5478258
Planning time: 0.227 ms
Execution time: 30685.092 ms
(9 rows)
This is related to how PostgreSQL chooses whether or not to flatten the subquery JOIN.
The query optimizer is free to decide how it acts (whether or not it uses indexes) according to internal rules and statistics, trying to arrive at the cheapest plan.
Sometimes, though, this decision is not the best one. You can "force" your optimizer to use the joins as written by changing join_collapse_limit.
You can try using SET join_collapse_limit = 1 and then execute your query to "force" it to use the indexes.
Remember to set this option only for the session; normally the query optimizer gets these decisions right.
This is an option that worked for me for subqueries; maybe trying it with an IN will also help force your query to use the indexes.
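A minimal sketch of keeping it scoped (the query itself is elided):
BEGIN;
SET LOCAL join_collapse_limit = 1;  -- affects only this transaction
-- ... run the problematic query here ...
COMMIT;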
There are 70 different type values, of which 2 account for ~15 million rows. Those two are NOT in the IN list.
And you address this domain using a 36-character key. That is 36*8 bits of space where only 7 bits are needed.
put these 70 different types in a separate LookUpTable, containing a surrogate key
the original type field could have a UNIQUE constraint in this table
refer to this table, using its surrogate id as a Foreign Key
... and: probably similar for the other obese keys
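A minimal sketch of that layout (table and column names here are made up, not the poster's schema):
CREATE TABLE thing_type (
    id smallint PRIMARY KEY,           -- surrogate key: 2 bytes instead of 36
    type varchar(36) NOT NULL UNIQUE   -- the original value, kept unique here
);
ALTER TABLE things
    ADD COLUMN type_id smallint REFERENCES thing_type (id);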
This is the query:
EXPLAIN (analyze, BUFFERS, SETTINGS)
SELECT
operation.id
FROM
operation
RIGHT JOIN (
SELECT uid, did FROM (
SELECT uid, did FROM operation where id = 993754
) t
) parts ON (operation.uid = parts.uid AND operation.did = parts.did)
and EXPLAIN info:
Nested Loop Left Join (cost=0.85..29695.77 rows=100 width=8) (actual time=13.709..13.711 rows=1 loops=1)
Buffers: shared hit=4905
-> Unique (cost=0.42..8.45 rows=1 width=16) (actual time=0.011..0.013 rows=1 loops=1)
Buffers: shared hit=5
-> Index Only Scan using oi on operation operation_1 (cost=0.42..8.44 rows=1 width=16) (actual time=0.011..0.011 rows=1 loops=1)
Index Cond: (id = 993754)
Heap Fetches: 1
Buffers: shared hit=5
-> Index Only Scan using oi on operation (cost=0.42..29686.32 rows=100 width=24) (actual time=13.695..13.696 rows=1 loops=1)
Index Cond: ((uid = operation_1.uid) AND (did = operation_1.did))
Heap Fetches: 1
Buffers: shared hit=4900
Settings: max_parallel_workers_per_gather = '4', min_parallel_index_scan_size = '0', min_parallel_table_scan_size = '0', parallel_setup_cost = '0', parallel_tuple_cost = '0', work_mem = '256MB'
Planning Time: 0.084 ms
Execution Time: 13.728 ms
Why does the Nested Loop take more time than the sum of its children's costs? What can I do about that? The Execution Time should be less than 1 ms, right?
UPDATE:
Nested Loop Left Join (cost=5.88..400.63 rows=101 width=8) (actual time=0.012..0.012 rows=1 loops=1)
Buffers: shared hit=8
-> Index Scan using oi on operation operation_1 (cost=0.42..8.44 rows=1 width=16) (actual time=0.005..0.005 rows=1 loops=1)
Index Cond: (id = 993754)
Buffers: shared hit=4
-> Bitmap Heap Scan on operation (cost=5.45..391.19 rows=100 width=24) (actual time=0.004..0.005 rows=1 loops=1)
Recheck Cond: ((uid = operation_1.uid) AND (did = operation_1.did))
Heap Blocks: exact=1
Buffers: shared hit=4
-> Bitmap Index Scan on ou (cost=0.00..5.42 rows=100 width=0) (actual time=0.003..0.003 rows=1 loops=1)
Index Cond: ((uid = operation_1.uid) AND (did = operation_1.did))
Buffers: shared hit=3
Settings: max_parallel_workers_per_gather = '4', min_parallel_index_scan_size = '0', min_parallel_table_scan_size = '0', parallel_setup_cost = '0', parallel_tuple_cost = '0', work_mem = '256MB'
Planning Time: 0.127 ms
Execution Time: 0.028 ms
Thanks, all of you. When I split the index into btree(id) and btree(uid, did), everything works perfectly - but why could those columns not be used together in one index? Any details or rules?
BTW, the SQL is used for real-time calculation; there is some window-function code not shown here.
The Nested Loop does not take much time actually. The actual time of 13.709..13.711 means that it took 13.709 ms until the first row was ready to be emitted from this node and it took 0.002 ms until it was finished.
Note that the startup cost of 13.709 ms includes the cost of its two child nodes. Both of the child nodes need to emit at least one row before the nested loop can start.
The Unique child began emitting its first (and only) row after 0.011 ms. The Index Only Scan child however only started to emit its first (and only) row after 13.695 ms. This means that most of your actual time spent is in this Index Only Scan.
There is a great answer here which explains the costs and actual times in depth.
Also there is a nice tool at https://explain.depesz.com which calculates an inclusive and exclusive time for each node. Here it is used for your query plan which clearly shows that most of the time is spent in the Index Only Scan.
Since the query is spending almost all of the time in this index only scan, optimizations there will have the most benefit. Creating a separate index for the columns uid and did on the operation table should improve query time a lot.
CREATE INDEX operation_uid_did ON operation(uid, did);
The current execution plan contains 2 index only scans.
A slow one:
-> Index Only Scan using oi on operation (cost=0.42..29686.32 rows=100 width=24) (actual time=13.695..13.696 rows=1 loops=1)
Index Cond: ((uid = operation_1.uid) AND (did = operation_1.did))
Heap Fetches: 1
Buffers: shared hit=4900
And a fast one:
-> Index Only Scan using oi on operation operation_1 (cost=0.42..8.44 rows=1 width=16) (actual time=0.011..0.011 rows=1 loops=1)
Index Cond: (id = 993754)
Heap Fetches: 1
Buffers: shared hit=5
Both of them use the index oi but have different index conditions. Note how the fast one, which uses id as its index condition, only needs to load 5 pages of data (Buffers: shared hit=5). The slow one needs to load 4900 pages instead (Buffers: shared hit=4900). This indicates that the index is optimized for queries on id, but not so much for uid and did. Probably the index oi covers all 3 columns in the order id, uid, did.
A multi-column btree index can only be used efficiently when there are constraints on the leftmost columns. The official documentation about multicolumn indexes explains this very well in depth.
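If that guess is right, the situation looks roughly like this (the definition of oi is inferred from the plan, not confirmed):
-- presumed existing index: id is the leading column
CREATE INDEX oi ON operation (id, uid, did);
-- a condition on (uid, did) alone cannot descend this btree,
-- so it has to scan the whole index; hence the suggested fix:
CREATE INDEX operation_uid_did ON operation (uid, did);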
Why does the Nested Loop take more time than the sum of its children's costs?
Based on your example, it doesn't. Can you elaborate on what makes you think it does?
Anyway, it seems extravagant to visit 4900 pages to fetch 1 tuple. I'm guessing your tables are not getting vacuumed enough.
Although now I prefer Florian's suggestion that "uid" and "did" are not the leading columns of the index, and that is why it is slow. It is basically doing a full index scan, using the index as a skinny version of the table. It is a shame that EXPLAIN output doesn't make it clear when an index is being used in this fashion, rather than the traditional "jump to a specific part of the index".
So you have a missing index.
I have a table with 28 columns and 7M records without primary key.
CREATE TABLE records (
direction smallint,
exporters_id integer,
time_stamp integer
...
)
I created an index on this table and vacuumed the table afterwards (autovacuum is on):
CREATE INDEX exporter_dir_time_only_index ON records
USING btree (exporters_id, direction, time_stamp);
and I want to execute this query:
SELECT count(exporters_id) FROM records WHERE exporters_id = 50
The table has 6982224 records with exporters_id = 50. I expected this query to use an index-only scan to get the result, but it used a sequential scan.
This is "EXPLAIN ANALYZE" output:
Aggregate (cost=204562.25..204562.26 rows=1 width=4) (actual time=1521.862..1521.862 rows=1 loops=1)
-> Seq Scan on sacopre_records (cost=0.00..187106.88 rows=6982149 width=4) (actual time=0.885..1216.211 rows=6982224 loops=1)
Filter: (exporters_id = 50)
Rows Removed by Filter: 2663
Total runtime: 1521.886 ms
but when I change the exporters_id to another id, the query uses an index-only scan:
Aggregate (cost=46.05..46.06 rows=1 width=4) (actual time=0.321..0.321 rows=1 loops=1)
-> Index Only Scan using exporter_dir_time_only_index on sacopre_records (cost=0.43..42.85 rows=1281 width=4) (actual time=0.313..0.315 rows=4 loops=1)
Index Cond: (exporters_id = 47)
Heap Fetches: 0
Total runtime: 0.358 ms
Where is the problem?
The explain is telling you the reason. Look closer.
Aggregate (cost=204562.25..204562.26 rows=1 width=4) (actual time=1521.862..1521.862 rows=1 loops=1)
-> Seq Scan on sacopre_records (cost=0.00..187106.88 rows=6982149 width=4) (actual time=0.885..1216.211 rows=6982224 loops=1)
Filter: (exporters_id = 50)
Rows Removed by Filter: 2663
Total runtime: 1521.886 ms
Your filter is removing only 2663 rows out of the 6982149 rows in the table, so a sequential scan really should be faster than using an index: the disk head has to pass through 6982149 - 2663 = 6979486 records anyway. It reads the entire table sequentially and on the way discards the tiny fraction (about 0.04 %) that does not match your criteria. An index scan, by contrast, would have to jump between the index file(s) and the data file(s) 6979486 times, which would certainly be slower than the 1.5 seconds you are getting now!
I'm experiencing strange PostgreSQL behavior.
I have partitioned a history table into smaller pieces based on time:
History -> History_part_YYYY-MM
Check constraints:
"History_part_2013-11_sentdate_check" CHECK (sentdate >= '2013-11-01 00:00:00-04'::timestamp with time zone AND sentdate < '2013-12-01 00:00:00-05'::timestamp with time zone)
Inherits: "History"
Each partition has its own index on transaction_id column.
"History_part_2013-11_transaction_id_idx" btree (transaction_id)
It is, as far as I know, a 'nothing special' way of partitioning, taken from the Postgres tutorial.
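Reconstructed from the constraint and index shown above, each partition is presumably set up along these lines (9.2-era inheritance partitioning):
CREATE TABLE "History_part_2013-11" (
    CHECK (sentdate >= '2013-11-01 00:00:00-04'::timestamptz
       AND sentdate <  '2013-12-01 00:00:00-05'::timestamptz)
) INHERITS ("History");
CREATE INDEX "History_part_2013-11_transaction_id_idx"
    ON "History_part_2013-11" (transaction_id);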
The problem is that executing this query is slow:
SELECT * FROM "History" WHERE transaction_id = 'MMS-dev-23599-2013-12-11-13:03:53.349735' LIMIT 1;
I was able to narrow the problem down: this query is slow only the FIRST TIME per script; if it is run a second time it is fast. If it is run again in a separate script it is slow again and the second run (in that script) is fast again... I really have no explanation for this. It is not inside any transaction.
Here are sample execution times of two queries run one after the other in the same script:
1.33s SELECT * FROM "History" WHERE transaction_id = 'MMS-dev-14970-2013-12-11-13:18:29.889376' LIMIT 1;...
0.019s SELECT * FROM "History" WHERE transaction_id = 'MMS-dev-14970-2013-12-11-13:18:29.889376' LIMIT 1;
The first query is slow enough that it triggers the 'explain analyze' call, and that looks like this (and is really, really fast too):
Limit (cost=0.00..8.07 rows=1 width=2589) (actual time=0.972..0.973 rows=1 loops=1)
-> Result (cost=0.00..581.07 rows=72 width=2589) (actual time=0.964..0.964 rows=1 loops=1)
-> Append (cost=0.00..581.07 rows=72 width=2589) (actual time=0.958..0.958 rows=1 loops=1)
-> Seq Scan on "History" (cost=0.00..1.00 rows=1 width=3760) (actual time=0.015..0.015 rows=0 loops=1)
Filter: ((transaction_id)::text = 'MMS-dev-23595-2013-12-11-13:20:10.422306'::text)
-> Index Scan using "History_part_2013-10_transaction_id_idx" on "History_part_2013-10" "History" (cost=0.00..8.28 rows=1 width=1829) (actual time=0.040..0.040 rows=0 loops=1)
Index Cond: ((transaction_id)::text = 'MMS-dev-23595-2013-12-11-13:20:10.422306'::text)
-> Index Scan using "History_part_2013-02_transaction_id_idx" on "History_part_2013-02" "History" (cost=0.00..8.32 rows=1 width=1707) (actual time=0.021..0.021 rows=0 loops=1)
Index Cond: ((transaction_id)::text = 'MMS-dev-23595-2013-12-11-13:20:10.422306'::text)
....
and it checks all the tables (around 54 now - a few tables are empty, created for the future) and at the end:
-> Index Scan using "History_part_2014-10_transaction_id_idx" on "History_part_2014-10" "History" (cost=0.00..8.27 rows=1 width=3760) (never executed)
Index Cond: ((transaction_id)::text = 'MMS-dev-23595-2013-12-11-13:20:10.422306'::text)
Total runtime: 6.390 ms
The Total runtime is 0.006 s, yet the first query is always above 1 s - if there are more concurrent scripts running (each with a UNIQUE transaction_id), the first execution can go up to 20 s while the second execution takes a few milliseconds.
Has anyone experienced this? I wonder if there is something wrong I am doing, or maybe this is a Postgres issue?
I upgraded Postgres from 9.2.4 -> 9.2.5 - it seems slightly better, but the issue definitely remains.
UPDATE:
I use this query now:
SELECT * FROM "History" WHERE transaction_id = 'MMS-live-15425-2013-18-11-17:32:20.917198' AND sentdate>='2013-10-18' AND sentdate<'2013-11-19' LIMIT 1
The first time it is run in the script it takes 3 to 8 SECONDS when many queries run at once against this table (if there is only one script at a time it is much faster).
When I change first query in script to (calls the partition table directly):
SELECT * FROM "History_part_2013-11" WHERE transaction_id = 'MMS-live-15425-2013-18-11-17:32:20.917198' AND sentdate>='2013-10-18' AND sentdate<'2013-11-19' LIMIT 1
It takes about 0.03 s - much faster, BUT the next query in the script that goes against the "History" table still takes around 3-8 SECONDS.
Here is the explain analyze of the first query against "History"
Limit (cost=0.00..25.41 rows=1 width=2540) (actual time=0.129..0.130 rows=1 loops=1)
-> Result (cost=0.00..76.23 rows=3 width=2540) (actual time=0.121..0.121 rows=1 loops=1)
-> Append (cost=0.00..76.23 rows=3 width=2540) (actual time=0.117..0.117 rows=1 loops=1)
-> Seq Scan on "History" (cost=0.00..58.00 rows=1 width=3750) (actual time=0.060..0.060 rows=0 loops=1)
Filter: ((sentdate >= '2013-10-18 00:00:00-04'::timestamp with time zone) AND (sentdate < '2013-11-19 00:00:00-05'::timestamp with time zone) AND ((transaction_id)::text = 'MMS-live-15425-2013-18-11-17:32:20.917198'::text))
-> Index Scan using "History_part_2013-11_transaction_id_idx" on "History_part_2013-11" "History" (cost=0.00..8.36 rows=1 width=1985) (actual time=0.051..0.051 rows=1 loops=1)
Index Cond: ((transaction_id)::text = 'MMS-live-15425-2013-18-11-17:32:20.917198'::text)
Filter: ((sentdate >= '2013-10-18 00:00:00-04'::timestamp with time zone) AND (sentdate < '2013-11-19 00:00:00-05'::timestamp with time zone))
-> Index Scan using "History_part_2013-10_transaction_id_idx" on "History_part_2013-10" "History" (cost=0.00..9.87 rows=1 width=1884) (never executed)
Index Cond: ((transaction_id)::text = 'MMS-live-15425-2013-18-11-17:32:20.917198'::text)
Filter: ((sentdate >= '2013-10-18 00:00:00-04'::timestamp with time zone) AND (sentdate < '2013-11-19 00:00:00-05'::timestamp with time zone))
Total runtime: 0.572 ms
It seems it is ALWAYS slow when running against the 'main' History table (but not when calling the partition directly), and only the first time - is that some caching thing? But then why is calling the partition directly so much faster? Querying the main History table no longer checks all the tables.
See the comments above: the partitioning criterion (sentdate) must be included in the query, and it must be a constant expression, for partition exclusion to work.
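For illustration, constant bounds that match a single partition's CHECK range (with the default constraint_exclusion = partition) let the planner prune every other partition at plan time:
SELECT * FROM "History"
WHERE transaction_id = 'MMS-live-15425-2013-18-11-17:32:20.917198'
  AND sentdate >= '2013-11-01' AND sentdate < '2013-12-01'
LIMIT 1;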