PostgreSQL join fetch all rows in table, too slow - postgresql

I have two tables, "commissions" and "mt4_trades". In "mt4_trades" the "ticket" column is the primary key; "commissions" has an "order_id" column related to mt4_trades.ticket as one-to-many (one "ticket" to many "order_id"). And I have this statement:
SELECT commissions.ibs_account AS ibs_account
FROM "public"."mt4_trades"
INNER JOIN commissions ON commissions.order_id = mt4_trades.ticket
WHERE "mt4_trades"."close_time" >= '2014.11.01'
AND "mt4_trades"."close_time" < '2014.12.01'
The commissions table contains about 4 million rows. This statement returns about 480,000 rows, but it is too slow: execution time is 9 seconds. I ran EXPLAIN ANALYZE:
Hash Join (cost=43397.07..216259.97 rows=144233 width=7) (actual time=3993.839..9459.896 rows=488131 loops=1)
Hash Cond: (commissions.order_id = mt4_trades.ticket)
-> Seq Scan on commissions (cost=0.00..116452.08 rows=3997708 width=15) (actual time=0.005..4185.254 rows=3997157 loops=1)
-> Hash (cost=42485.10..42485.10 rows=72958 width=4) (actual time=288.767..288.767 rows=97260 loops=1)
Buckets: 8192 Batches: 1 Memory Usage: 3420kB
-> Index Scan using "INDEX_CLOSETIME" on mt4_trades (cost=0.43..42485.10 rows=72958 width=4) (actual time=0.020..174.810 rows=97260 loops=1)
Index Cond: ((close_time >= '2014-11-01 00:00:00'::timestamp without time zone) AND (close_time < '2014-12-01 00:00:00'::timestamp without time zone))
Total runtime: 9881.979 ms
This row:
-> Seq Scan on commissions (cost=0.00..116452.08 rows=3997708 width=15) (actual time=0.005..4185.254 rows=3997157 loops=1)
means the whole "commissions" table is scanned instead of matching "order_id" against "ticket" first.
Can you help me improve this query? Thanks.

9 seconds to return half a million rows is not terrible, and a sequential scan over 4M rows can be much faster than 100K indexed lookups into those 4M rows. Assuming you've got an index on order_id, you can test this by running set enable_seqscan TO false; before running your query (this only affects the current connection).
Do you really need all 500K rows every time you run this query? Or are you going to be filtering the results? If you're almost always going to be filtering the results in some other way, you probably want to optimize against that query rather than the one that returns all 500K rows.
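A minimal test session along those lines, assuming an index on commissions.order_id exists (the index name here is hypothetical):

```sql
-- Create the join-side index if it doesn't exist yet (name is hypothetical).
CREATE INDEX IF NOT EXISTS commissions_order_id_idx ON commissions (order_id);

-- Disable sequential scans for this connection only, then compare plans.
SET enable_seqscan TO false;
EXPLAIN ANALYZE
SELECT commissions.ibs_account
FROM "public"."mt4_trades"
INNER JOIN commissions ON commissions.order_id = mt4_trades.ticket
WHERE "mt4_trades"."close_time" >= '2014-11-01'
  AND "mt4_trades"."close_time" < '2014-12-01';

-- Restore the default afterwards.
RESET enable_seqscan;
```

If the nested-loop/index plan produced this way is not actually faster than the hash join, the planner's original choice was right.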

Related

Improve PostgreSQL query response

I have one table with about 100K rows, and it keeps growing.
My queries have a bad response time and it affects my front-end user experience.
I want to ask for your help to improve my response time from the DB.
Today PostgreSQL runs on my local machine, a MacBook Pro 13 2019 with 16 GB RAM and an i5 core.
In the future I will put this DB in a Docker container and run it on a better server.
What can I do to improve it for now?
Table Structure:
CREATE TABLE dots
(
dot_id INT,
site_id INT,
latitude float ( 6 ),
longitude float ( 6 ),
rsrp float ( 6 ),
dist INT,
project_id INT,
dist_from_site INT,
geom geometry,
dist_from_ref INT,
file_name VARCHAR
);
The dot_id resets after each bulk insert, i.e. it restarts for each "file_name".
The queries:
Query #1:
await db.query(
`select MAX(rsrp) FROM dots where site_id=$1 and ${table}=$2 and project_id = $3 and file_name ilike $4`,
[site_id, dist, project_id, filename]
);
Time for response: 200ms
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=37159.88..37159.89 rows=1 width=4) (actual time=198.416..201.762 rows=1 loops=1)
Buffers: shared hit=16165 read=16031
-> Gather (cost=37159.66..37159.87 rows=2 width=4) (actual time=198.299..201.752 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=16165 read=16031
-> Partial Aggregate (cost=36159.66..36159.67 rows=1 width=4) (actual time=179.009..179.010 rows=1 loops=3)
Buffers: shared hit=16165 read=16031
-> Parallel Seq Scan on dots (cost=0.00..36150.01 rows=3861 width=4) (actual time=122.889..178.817 rows=1088 loops=3)
Filter: (((file_name)::text ~~* 'BigFile'::text) AND (site_id = 42047) AND (dist_from_ref = 500) AND (project_id = 1))
Rows Removed by Filter: 157073
Buffers: shared hit=16165 read=16031
Planning Time: 0.290 ms
Execution Time: 201.879 ms
(14 rows)
Query #2:
await db.query(
`SELECT DISTINCT (${table}) FROM dots where site_id=$1 and project_id = $2 and file_name ilike $3 order by ${table}`,
[site_id, project_id, filename]
);
Time for response: 1100ms
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Sort (cost=41322.12..41322.31 rows=77 width=4) (actual time=1176.071..1176.077 rows=66 loops=1)
Sort Key: dist_from_ref
Sort Method: quicksort Memory: 28kB
Buffers: shared hit=16175 read=16021
-> HashAggregate (cost=41318.94..41319.71 rows=77 width=4) (actual time=1176.024..1176.042 rows=66 loops=1)
Group Key: dist_from_ref
Batches: 1 Memory Usage: 24kB
Buffers: shared hit=16175 read=16021
-> Seq Scan on dots (cost=0.00..40499.42 rows=327807 width=4) (actual time=0.423..1066.316 rows=326668 loops=1)
Filter: (((file_name)::text ~~* 'BigFile'::text) AND (site_id = 42047) AND (project_id = 1))
Rows Removed by Filter: 147813
Buffers: shared hit=16175 read=16021
Planning:
Buffers: shared hit=5 dirtied=1
Planning Time: 0.242 ms
Execution Time: 1176.125 ms
(16 rows)
Query #3:
await db.query(
`SELECT count(*) FROM dots WHERE site_id = $1 AND ${table} = $2 and project_id = $3 and file_name ilike $4`,
[site_id, dist, project_id, filename]
);
Time for response: 200ms
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=37159.88..37159.89 rows=1 width=8) (actual time=198.725..202.335 rows=1 loops=1)
Buffers: shared hit=16160 read=16036
-> Gather (cost=37159.66..37159.87 rows=2 width=8) (actual time=198.613..202.328 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=16160 read=16036
-> Partial Aggregate (cost=36159.66..36159.67 rows=1 width=8) (actual time=179.182..179.183 rows=1 loops=3)
Buffers: shared hit=16160 read=16036
-> Parallel Seq Scan on dots (cost=0.00..36150.01 rows=3861 width=0) (actual time=119.340..179.020 rows=1088 loops=3)
Filter: (((file_name)::text ~~* 'BigFile'::text) AND (site_id = 42047) AND (dist_from_ref = 500) AND (project_id = 1))
Rows Removed by Filter: 157073
Buffers: shared hit=16160 read=16036
Planning Time: 0.109 ms
Execution Time: 202.377 ms
(14 rows)
The tables did not have any indexes. I added one and it helped a bit: create index idx1 on dots (site_id, project_id, file_name, dist_from_site, dist_from_ref)
OK, that's a bit too much.
An index on columns (a,b) is useful for "where a=..." and for "where a=... and b=...", but not really for "where b=..." alone. An index with many columns uses more disk space and is slower to maintain than one with fewer columns, which is a waste if the extra columns don't make your queries faster. The two dist_ columns in your index will probably never be used.
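The prefix rule can be illustrated on a toy table (table, column, and index names here are made up; on a populated and ANALYZEd table the plans will differ accordingly):

```sql
CREATE TABLE t (a int, b int, c text);
CREATE INDEX t_a_b_idx ON t (a, b);

-- The planner can use t_a_b_idx here (leading column is constrained):
EXPLAIN SELECT * FROM t WHERE a = 1;
EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 2;

-- It cannot use t_a_b_idx efficiently here (no condition on the
-- leading column a), so expect a sequential scan:
EXPLAIN SELECT * FROM t WHERE b = 2;
```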
Indexes are a compromise: if your table has small rows, like two integer columns, and you create an index on those two columns, the index will be as big as the table itself, so you'd better be sure you need it. But in your case your rows are pretty large at around 5 kB and the number of rows is small, so adding one or several indexes on small int columns costs very little overhead.
So, since you very often use WHERE conditions on both site_id and project_id you can create an index on site_id,project_id. This will also work for a WHERE condition on site_id alone. If you sometimes use project_id alone, you can swap the order of the columns so it appears first, or even create another index.
You say the cardinality of these columns is about 30, so selecting on one value of site_id or project_id should hit about 1/30, or 3.3%, of the table, and selecting on both should hit about 0.1% of it, if they are uncorrelated and evenly distributed. This should already give a substantial speedup.
You could also add an index on dist_from_site, and another on dist_from_ref, if those columns have good selectivity (i.e., high cardinality). Postgres can combine indexes with a bitmap index scan. But if, say, 50% of the rows in the table share the same value of dist_from_site, an index will be useless for that value, because it isn't selective enough.
You could also replace the previous 2-column index with two indexes, on (site_id, project_id, dist_from_site) and (site_id, project_id, dist_from_ref). You can try it and see if it is worth the extra resources.
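A sketch of that option (the index names are made up):

```sql
-- Each of these replaces a plain (site_id, project_id) index; both still
-- serve queries filtering on site_id alone or on site_id + project_id.
CREATE INDEX dots_site_proj_dist_site_idx
    ON dots (site_id, project_id, dist_from_site);
CREATE INDEX dots_site_proj_dist_ref_idx
    ON dots (site_id, project_id, dist_from_ref);
```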
Also there's the file_name column and ILIKE. ILIKE can't use a plain btree index, which makes it slow. One solution is an expression index:
CREATE INDEX dots_filename ON dots( lower(file_name) );
and replace your where condition with:
lower(file_name) like lower($4)
This will use the index unless the parameter $4 starts with "%". And if you never use the LIKE '%' wildcards and are only using ILIKE for case-insensitive comparison, you can replace LIKE with the = operator: lower(a) = lower(b) is a case-insensitive comparison.
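Putting that together for Query #1 (the index name and the exact query shape are assumptions based on the queries above):

```sql
CREATE INDEX dots_filename ON dots (lower(file_name));

-- lower(...) = lower(...) is a case-insensitive equality test and can
-- use the expression index above.
SELECT MAX(rsrp)
FROM dots
WHERE site_id = $1
  AND dist_from_ref = $2
  AND project_id = $3
  AND lower(file_name) = lower($4);
```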
Likewise you could replace the previous 2-column index with one on (site_id, project_id, lower(file_name)) if you often use these three together in the WHERE condition. But as said above, it won't speed up searches on file_name alone.
Since your rows are huge, even adding 1 kB of index per row only adds about 20% overhead to your table, so you can afford to overdo it. Go ahead and experiment; you'll see what works best.

Postgres uses Hash Join with Seq Scan when Inner Select Index Cond is faster

Postgres is using a much heavier Seq Scan on table tracking when an index is available. The first query was the original attempt; it uses a Seq Scan and is slow. I tried to force an Index Scan with an inner select, but Postgres converted it back to effectively the same plan with nearly the same runtime. Finally I copied the id list from the inner select of the second query into the third query; Postgres then used the Index Scan, which dramatically decreased the runtime. The third query is not viable in a production environment. What will make Postgres use the last query plan?
(vacuum was used on both tables)
Tables
tracking (worker_id, localdatetime) total records: 118664105
project_worker (id, project_id) total records: 12935
INDEX
CREATE INDEX tracking_worker_id_localdatetime_idx ON public.tracking USING btree (worker_id, localdatetime)
Queries
SELECT worker_id, localdatetime FROM tracking t JOIN project_worker pw ON t.worker_id = pw.id WHERE project_id = 68475018
Hash Join (cost=29185.80..2638162.26 rows=19294218 width=16) (actual time=16.912..18376.032 rows=177681 loops=1)
Hash Cond: (t.worker_id = pw.id)
-> Seq Scan on tracking t (cost=0.00..2297293.86 rows=118716186 width=16) (actual time=0.004..8242.891 rows=118674660 loops=1)
-> Hash (cost=29134.80..29134.80 rows=4080 width=8) (actual time=16.855..16.855 rows=2102 loops=1)
Buckets: 4096 Batches: 1 Memory Usage: 115kB
-> Seq Scan on project_worker pw (cost=0.00..29134.80 rows=4080 width=8) (actual time=0.004..16.596 rows=2102 loops=1)
Filter: (project_id = 68475018)
Rows Removed by Filter: 10833
Planning Time: 0.192 ms
Execution Time: 18382.698 ms
SELECT worker_id, localdatetime FROM tracking t WHERE worker_id IN (SELECT id FROM project_worker WHERE project_id = 68475018 LIMIT 500)
Hash Semi Join (cost=6905.32..2923969.14 rows=27733254 width=24) (actual time=19.715..20191.517 rows=20530 loops=1)
Hash Cond: (t.worker_id = project_worker.id)
-> Seq Scan on tracking t (cost=0.00..2296948.27 rows=118698327 width=24) (actual time=0.005..9184.676 rows=118657026 loops=1)
-> Hash (cost=6899.07..6899.07 rows=500 width=8) (actual time=1.103..1.103 rows=500 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 28kB
-> Limit (cost=0.00..6894.07 rows=500 width=8) (actual time=0.006..1.011 rows=500 loops=1)
-> Seq Scan on project_worker (cost=0.00..28982.65 rows=2102 width=8) (actual time=0.005..0.968 rows=500 loops=1)
Filter: (project_id = 68475018)
Rows Removed by Filter: 4493
Planning Time: 0.224 ms
Execution Time: 20192.421 ms
SELECT worker_id, localdatetime FROM tracking t WHERE worker_id IN (322016383,316007840,...,285702579)
Index Scan using tracking_worker_id_localdatetime_idx on tracking t (cost=0.57..4766798.31 rows=21877360 width=24) (actual time=0.079..29.756 rows=22112 loops=1)
" Index Cond: (worker_id = ANY ('{322016383,316007840,...,285702579}'::bigint[]))"
Planning Time: 1.162 ms
Execution Time: 30.884 ms
... is in place of the 500 id entries used in the query
Same query ran on another set of 500 id's
Index Scan using tracking_worker_id_localdatetime_idx on tracking t (cost=0.57..4776714.91 rows=21900980 width=24) (actual time=0.105..5528.109 rows=117838 loops=1)
" Index Cond: (worker_id = ANY ('{286237712,286237844,...,216724213}'::bigint[]))"
Planning Time: 2.105 ms
Execution Time: 5534.948 ms
The distribution of "worker_id" within "tracking" seems very skewed. For one thing, one of your instances of query 3 returns over 5 times as many rows as the other. For another, the estimated number of rows is 100 to 1000 times higher than the actual number. This can certainly lead to bad plans (although it is unlikely to be the complete picture).
What is the actual number of distinct worker_id values within tracking: select count(distinct worker_id) from tracking? What does the planner think this value is: select n_distinct from pg_stats where tablename='tracking' and attname='worker_id'? If those values are far apart, and if forcing the planner to use a more reasonable value with alter table tracking alter column worker_id set (n_distinct = <real value>); analyze tracking; changes the plans, that confirms the estimate was the problem.
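The diagnostic steps above, spelled out as statements (the value 25000 is a placeholder for whatever the first query actually returns):

```sql
-- Actual number of distinct worker_id values:
SELECT count(DISTINCT worker_id) FROM tracking;

-- What the planner currently believes (negative n_distinct values are
-- fractions of the total row count):
SELECT n_distinct FROM pg_stats
WHERE tablename = 'tracking' AND attname = 'worker_id';

-- If they are far apart, pin a more reasonable value and re-analyze:
ALTER TABLE tracking ALTER COLUMN worker_id SET (n_distinct = 25000);
ANALYZE tracking;
```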
If you want to nudge PostgreSQL towards a nested loop join, try the following:
Create an index on tracking that can be used for an index-only scan:
CREATE INDEX ON tracking (worker_id) INCLUDE (localdatetime);
Make sure that tracking is VACUUMed often, so that an index-only scan is effective.
Reduce random_page_cost and increase effective_cache_size so that the optimizer prices index scans lower (but don't use insane values).
Make sure that you have good estimates on project_worker:
ALTER TABLE project_worker ALTER project_id SET STATISTICS 1000;
ANALYZE project_worker;
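For the cost-parameter point, the settings can first be changed per session to see whether the plan flips to a nested loop (the values here are common starting points for SSD-backed storage, not recommendations for your hardware):

```sql
-- Lower random_page_cost toward seq_page_cost; raise
-- effective_cache_size toward the RAM available for caching.
SET random_page_cost = 1.1;
SET effective_cache_size = '8GB';

EXPLAIN ANALYZE
SELECT worker_id, localdatetime
FROM tracking t
JOIN project_worker pw ON t.worker_id = pw.id
WHERE project_id = 68475018;
```

If the plan improves, the same values can be made permanent in postgresql.conf.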

Postgres not using index to sort data

I have a table learners which has around 3.2 million rows. This table contains user-related information like name and email. I need to optimize some queries that use ORDER BY on some columns. For testing I created a temp_learners table with 0.8 million rows, and I created two indexes on it:
CREATE UNIQUE INDEX "temp_learners_companyId_userId_idx"
ON temp_learners ("companyId" ASC, "userId" ASC, "learnerUserName" ASC, "learnerEmailId" ASC);
and
CREATE INDEX temp_learners_company_name_email_index
ON temp_learners ("companyId", "learnerUserName", "learnerEmailId");
The second index is just for testing.
Now when I run this query:
SELECT *
FROM temp_learners
WHERE "companyId" = 909666665757230431 AND "userId" IN (
4990609084216745771,
4990610022492247987,
4990609742667096366,
4990609476136523663,
5451985767018841230,
5451985767078553638,
5270390122102920730,
4763688819142650938,
5056979692501246449,
5279569274741647114,
5031660827132289520,
4862889373349389098,
5299864070077160421,
4740222596778406913,
5320170488686569878,
5270367618320474818,
5320170488587895729,
5228888485293847415,
4778050469432720821,
5270392314970177842,
4849087862439244546,
5270392117430427860,
5270351184072717902,
5330263074228870897,
4763688829301614114,
4763684609695916489,
5270390232949727716
) ORDER BY "learnerUserName","learnerEmailId";
The query plan used by db is this:
Sort (cost=138.75..138.76 rows=4 width=1581) (actual time=0.169..0.171 rows=27 loops=1)
" Sort Key: ""learnerUserName"", ""learnerEmailId"""
Sort Method: quicksort Memory: 73kB
-> Index Scan using "temp_learners_companyId_userId_idx" on temp_learners (cost=0.55..138.71 rows=4 width=1581) (actual time=0.018..0.112 rows=27 loops=1)
" Index Cond: ((""companyId"" = '909666665757230431'::bigint) AND (""userId"" = ANY ('{4990609084216745771,4990610022492247987,4990609742667096366,4990609476136523663,5451985767018841230,5451985767078553638,5270390122102920730,4763688819142650938,5056979692501246449,5279569274741647114,5031660827132289520,4862889373349389098,5299864070077160421,4740222596778406913,5320170488686569878,5270367618320474818,5320170488587895729,5228888485293847415,4778050469432720821,5270392314970177842,4849087862439244546,5270392117430427860,5270351184072717902,5330263074228870897,4763688829301614114,4763684609695916489,5270390232949727716}'::bigint[])))"
Planning time: 0.116 ms
Execution time: 0.191 ms
Here the sort is not done using an index.
But when I run this query
SELECT *
FROM temp_learners
WHERE "companyId" = 909666665757230431
ORDER BY "learnerUserName","learnerEmailId" limit 500;
this query uses an index for sorting.
Limit (cost=0.42..1360.05 rows=500 width=1581) (actual time=0.018..0.477 rows=500 loops=1)
-> Index Scan using temp_learners_company_name_email_index on temp_learners (cost=0.42..332639.30 rows=122327 width=1581) (actual time=0.018..0.442 rows=500 loops=1)
Index Cond: ("companyId" = '909666665757230431'::bigint)
Planning time: 0.093 ms
Execution time: 0.513 ms
What I am not able to understand is why Postgres does not use the index in the first query. I should also clarify that the normal use case for the learners table is joining with other tables, so the first query I wrote is closer to the join case. For example,
SELECT *
FROM temp_learners AS l
INNER JOIN entity_learners_basic AS elb
ON l."companyId" = elb."companyId" AND l."userId" = elb."userId"
WHERE l."companyId" = 909666665757230431 AND elb."gameId" = 1050403501267716928
ORDER BY "learnerUserName", "learnerEmailId" limit 5000;
Even after correcting the indexes, the query plan does not use an index for sorting.
QUERY PLAN
Limit (cost=3785.11..3785.22 rows=44 width=1767) (actual time=163.554..173.135 rows=5000 loops=1)
-> Sort (cost=3785.11..3785.22 rows=44 width=1767) (actual time=163.553..172.791 rows=5000 loops=1)
" Sort Key: l.""learnerUserName"", l.""learnerEmailId"""
Sort Method: external merge Disk: 35416kB
-> Nested Loop (cost=1.12..3783.91 rows=44 width=1767) (actual time=0.019..63.743 rows=21195 loops=1)
-> Index Scan using primary_index__entity_learners_basic on entity_learners_basic elb (cost=0.57..1109.79 rows=314 width=186) (actual time=0.010..6.221 rows=21195 loops=1)
Index Cond: (("companyId" = '909666665757230431'::bigint) AND ("gameId" = '1050403501267716928'::bigint))
-> Index Scan using "temp_learners_companyId_userId_idx" on temp_learners l (cost=0.55..8.51 rows=1 width=1581) (actual time=0.002..0.002 rows=1 loops=21195)
Index Cond: (("companyId" = '909666665757230431'::bigint) AND ("userId" = elb."userId"))
Planning time: 0.309 ms
Execution time: 178.422 ms
Does Postgres not use indexes when joining and ordering data?
PostgreSQL chooses the plan it thinks will be faster. Using the index that provides rows in the correct order means using a much less selective index, so it doesn't think that will be faster overall.
If you want to force PostgreSQL into believing that sorting is the worst thing in the world, you could set enable_sort=off. If it still sorts after that, then you know PostgreSQL doesn't have the right indexes to avoid sorting, as opposed to just thinking they will not actually be faster.
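A quick way to run that experiment on the join query above (session-local; remember to reset):

```sql
-- If the plan still contains a Sort node with sorting "disabled",
-- no existing index can provide the requested order.
SET enable_sort = off;
EXPLAIN ANALYZE
SELECT *
FROM temp_learners AS l
INNER JOIN entity_learners_basic AS elb
    ON l."companyId" = elb."companyId" AND l."userId" = elb."userId"
WHERE l."companyId" = 909666665757230431
  AND elb."gameId" = 1050403501267716928
ORDER BY "learnerUserName", "learnerEmailId" LIMIT 5000;
RESET enable_sort;
```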
PostgreSQL could use an index on ("companyId", "learnerUserName", "learnerEmailId") for your first query, but the additional IN condition reduces the number of result rows to an estimated 4 rows, which means that the sort won't cost anything at all. So it chooses the index that can support the IN condition.
Rows returned by that index won't automatically be in the correct order, because with more than one element in the IN list the scan returns rows ordered by "userId" before "learnerUserName" and "learnerEmailId".
Without the IN condition, enough rows are returned that PostgreSQL thinks it is cheaper to read them in index order and filter out the rows that don't match the condition.
With your first query, no single index can support both the IN list in the WHERE condition and the ORDER BY clause, so PostgreSQL has to make a choice.

Speed up Postgresql row counts

I've read How do I speed up counting rows in a PostgreSQL table? and https://wiki.postgresql.org/wiki/Slow_Counting but I'm not getting any closer to a better result. The problem with most of the answers to the other question is that they don't do any filtering and rely on table-wide statistics.
I have a table of about 10 million rows, currently the data is fetched in pages (I know this pagination strategy is not ideal) and I have to display a total count back to the user (business requirement). The query is fast, typically < 200ms and looks like this:
explain analyze
SELECT DISTINCT ON (ID) table.*
FROM "table"
WHERE "table"."deleted_at" IS NULL
GROUP BY "table"."id"
ORDER BY "table"."id" DESC
LIMIT 25 OFFSET 200;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=530.48..585.91 rows=25 width=252) (actual time=0.870..0.942 rows=25 loops=1)
-> Unique (cost=87.00..19878232.36 rows=8964709 width=252) (actual time=0.328..0.899 rows=225 loops=1)
-> Group (cost=87.00..15395877.86 rows=8964709 width=252) (actual time=0.327..0.747 rows=225 loops=1)
Group Key: id
-> Index Scan Backward using table_pkey on table (cost=87.00..10913523.36 rows=8964709 width=252) (actual time=0.324..0.535 rows=225 loops=1)
Filter: (deleted_at IS NULL)
Rows Removed by Filter: 397
Planning time: 0.174 ms
Execution time: 0.986 ms
(9 rows)
Time: 77.437 ms
The problem is when I try and display back the count, via:
explain analyze
SELECT COUNT(*) AS count_all, "table"."id" AS id
FROM "table"
WHERE "table"."deleted_at" IS NULL
GROUP BY "table"."id";
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=87.00..21194868.36 rows=10282202 width=4) (actual time=0.016..16984.904 rows=10343557 loops=1)
Group Key: id
-> Index Scan using table_pkey on table (cost=87.00..5771565.36 rows=10282202 width=4) (actual time=0.012..11435.350 rows=10343557 loops=1)
Filter: (deleted_at IS NULL)
Rows Removed by Filter: 2170
Planning time: 0.098 ms
Execution time: 18638.381 ms
(7 rows)
I can't use probabilistic counts right now, but I also can't live with 10-50 seconds to return a count. Is there any other way to speed this up?

postgresql index fetch on partitioned table

I'm experiencing strange PostgreSQL behavior.
I have partitioned a history table into smaller pieces based on time:
History -> History_part_YYYY-MM
Check constraints:
"History_part_2013-11_sentdate_check" CHECK (sentdate >= '2013-11-01 00:00:00-04'::timestamp with time zone AND sentdate < '2013-12-01 00:00:00-05'::timestamp with time zone)
Inherits: "History"
Each partition has its own index on transaction_id column.
History_part_2013-11_transaction_id_idx" btree (transaction_id)
As far as I know this is a 'nothing special' way of partitioning, taken from the Postgres tutorial.
The problem is that this query is slow:
SELECT * FROM "History" WHERE transaction_id = 'MMS-dev-23599-2013-12-11-13:03:53.349735' LIMIT 1;
I was able to narrow the problem down: the query is slow only the FIRST TIME per script; run a second time it is fast. Run again in a separate script it is slow again, and the second run (within the script) is fast again... I really have no explanation for this. It is not inside any transaction.
Here are sample execution times of two queries run one by one in same script:
1.33s SELECT * FROM "History" WHERE transaction_id = 'MMS-dev-14970-2013-12-11-13:18:29.889376' LIMIT 1;...
0.019s SELECT * FROM "History" WHERE transaction_id = 'MMS-dev-14970-2013-12-11-13:18:29.889376' LIMIT 1;
The first query is slow enough that it triggers an 'explain analyze' call, which looks like this (and is really, really fast too):
Limit (cost=0.00..8.07 rows=1 width=2589) (actual time=0.972..0.973 rows=1 loops=1)
-> Result (cost=0.00..581.07 rows=72 width=2589) (actual time=0.964..0.964 rows=1 loops=1)
-> Append (cost=0.00..581.07 rows=72 width=2589) (actual time=0.958..0.958 rows=1 loops=1)
-> Seq Scan on "History" (cost=0.00..1.00 rows=1 width=3760) (actual time=0.015..0.015 rows=0 loops=1)
Filter: ((transaction_id)::text = 'MMS-dev-23595-2013-12-11-13:20:10.422306'::text)
-> Index Scan using "History_part_2013-10_transaction_id_idx" on "History_part_2013-10" "History" (cost=0.00..8.28 rows=1 width=1829) (actual time=0.040..0.040 rows=0 loops=1)
Index Cond: ((transaction_id)::text = 'MMS-dev-23595-2013-12-11-13:20:10.422306'::text)
-> Index Scan using "History_part_2013-02_transaction_id_idx" on "History_part_2013-02" "History" (cost=0.00..8.32 rows=1 width=1707) (actual time=0.021..0.021 rows=0 loops=1)
Index Cond: ((transaction_id)::text = 'MMS-dev-23595-2013-12-11-13:20:10.422306'::text)
....
and it checks all the partitions (around 54 now; a few are empty, created for the future), and at the end
-> Index Scan using "History_part_2014-10_transaction_id_idx" on "History_part_2014-10" "History" (cost=0.00..8.27 rows=1 width=3760) (never executed)
Index Cond: ((transaction_id)::text = 'MMS-dev-23595-2013-12-11-13:20:10.422306'::text)
Total runtime: 6.390 ms
The total runtime is 0.006s, while the first query is always above 1s; if more concurrent scripts are running (each with a UNIQUE transaction_id), the first execution can go up to 20s while the second takes a few milliseconds.
Has anyone experienced this? I wonder if I am doing something wrong or if this is a Postgres issue.
I upgraded postgres from 9.2.4 -> 9.2.5 - it seems it is slightly better but the issue definitely remains.
UPDATE:
I use this query now:
SELECT * FROM "History" WHERE transaction_id = 'MMS-live-15425-2013-18-11-17:32:20.917198' AND sentdate>='2013-10-18' AND sentdate<'2013-11-19' LIMIT 1
The first time it runs in the script it takes 3 to 8 SECONDS when many queries run at once against this table (with only one script at a time it is much faster).
When I change the first query in the script to call the partition table directly:
SELECT * FROM "History_part_2013-11" WHERE transaction_id = 'MMS-live-15425-2013-18-11-17:32:20.917198' AND sentdate>='2013-10-18' AND sentdate<'2013-11-19' LIMIT 1
It takes about 0.03s - much faster, BUT the next query in the script that goes through the "History" table is still around 3-8 SECONDS.
Here is the explain analyze of the first query against "History"
Limit (cost=0.00..25.41 rows=1 width=2540) (actual time=0.129..0.130 rows=1 loops=1)
-> Result (cost=0.00..76.23 rows=3 width=2540) (actual time=0.121..0.121 rows=1 loops=1)
-> Append (cost=0.00..76.23 rows=3 width=2540) (actual time=0.117..0.117 rows=1 loops=1)
-> Seq Scan on "History" (cost=0.00..58.00 rows=1 width=3750) (actual time=0.060..0.060 rows=0 loops=1)
Filter: ((sentdate >= '2013-10-18 00:00:00-04'::timestamp with time zone) AND (sentdate < '2013-11-19 00:00:00-05'::timestamp with time zone) AND ((transaction_id)::text = 'MMS-live-15425-2013-18-11-17:32:20.917198'::text))
-> Index Scan using "History_part_2013-11_transaction_id_idx" on "History_part_2013-11" "History" (cost=0.00..8.36 rows=1 width=1985) (actual time=0.051..0.051 rows=1 loops=1)
Index Cond: ((transaction_id)::text = 'MMS-live-15425-2013-18-11-17:32:20.917198'::text)
Filter: ((sentdate >= '2013-10-18 00:00:00-04'::timestamp with time zone) AND (sentdate < '2013-11-19 00:00:00-05'::timestamp with time zone))
-> Index Scan using "History_part_2013-10_transaction_id_idx" on "History_part_2013-10" "History" (cost=0.00..9.87 rows=1 width=1884) (never executed)
Index Cond: ((transaction_id)::text = 'MMS-live-15425-2013-18-11-17:32:20.917198'::text)
Filter: ((sentdate >= '2013-10-18 00:00:00-04'::timestamp with time zone) AND (sentdate < '2013-11-19 00:00:00-05'::timestamp with time zone))
Total runtime: 0.572 ms
It seems it is ALWAYS slow when running against the 'main' History table (but not when calling a partition directly), and only the first time - is that some caching thing? But then why is calling the partition directly so much faster, given that the query against the main History table no longer checks all the partitions?
See the comments above: the partitioning criterion (sentdate) must be included in the query and must be a constant expression for partition exclusion to work.
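With this inheritance-based partitioning, a query shaped like the following lets the planner skip all but the matching partition, because both sentdate bounds are literal constants that can be compared against each partition's CHECK constraint at plan time (the literal values here are just examples):

```sql
-- Both bounds fall inside a single partition's CHECK range, so only
-- "History_part_2013-11" (plus the near-empty parent) is scanned.
SELECT *
FROM "History"
WHERE transaction_id = 'MMS-live-15425-2013-18-11-17:32:20.917198'
  AND sentdate >= '2013-11-01'
  AND sentdate <  '2013-12-01'
LIMIT 1;
```

If the bounds come from a parameter or a function call instead of constants, constraint exclusion cannot prune partitions and every child index is probed, which matches the slow plans shown above.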