Composite primary key and slow queries - postgresql

I have a table which looks like this:
CREATE TABLE IF NOT EXISTS codes (
code character varying(255) NOT NULL,
used boolean NOT NULL DEFAULT FALSE,
user_id integer NOT NULL,
order_id integer,
created timestamp without time zone,
CONSTRAINT pk_code PRIMARY KEY (code, user_id)
);
And I have simple query:
SELECT COUNT(*) as "count"
FROM "codes" "code"
WHERE "code"."user_id" = {some_id}
AND "code"."order_id" IS NULL;
Which is very slow, explain analyze:
'Aggregate (cost=3576.10..3576.11 rows=1 width=8) (actual time=1471.870..1471.871 rows=1 loops=1)'
' -> Seq Scan on codes (cost=0.00..3323.89 rows=50443 width=17) (actual time=0.139..203.139 rows=49998 loops=1)'
' Filter: ((order_id IS NULL) AND (user_id = 10))'
' Rows Removed by Filter: 116498'
'Planning Time: 1.450 ms'
'Execution Time: 1471.981 ms'
How can I optimize this query?

An index can only be used efficiently if the first index column is used in the WHERE condition (yes, I know there are exceptions, but it is a good rule of thumb).
So to support that query, you should define the primary key the other way around, as (user_id, code). That will guarantee the constraint just as well, but the underlying index will support your query.
If you cannot change the primary key like that, you need another index on user_id alone.
But then, looking on the Rows Removed by Filter, probably a sequential scan is the fastest access strategy, and an index won't help at all. Try it out.

Related

SQL Performance problem with like query after migration from MySQL to PostgreSQL

I migrated my database from MySQL to PostgreSQL with pgloader, it's globally much more efficient but a query with like condition is more slower on PostgreSQL.
MySQL : ~1ms
PostgreSQL : ~110 ms
Table info:
105 columns
23 indexes
1.6M records
Columns info:
name character varying(30) COLLATE pg_catalog."default" NOT NULL,
ratemax3v3 integer NOT NULL DEFAULT 0,
Query is :
SELECT name, realm, region, class, id
FROM personnages
WHERE blacklisted = 0 AND name LIKE 'Krok%' AND region = 'eu'
ORDER BY ratemax3v3 DESC LIMIT 5;
EXPLAIN ANALYSE (PostgreSQL)
Limit (cost=629.10..629.12 rows=5 width=34) (actual time=111.128..111.130 rows=5 loops=1)
-> Sort (cost=629.10..629.40 rows=117 width=34) (actual time=111.126..111.128 rows=5 loops=1)
Sort Key: ratemax3v3 DESC
Sort Method: top-N heapsort Memory: 25kB
-> Bitmap Heap Scan on personnages (cost=9.63..627.16 rows=117 width=34) (actual time=110.619..111.093 rows=75 loops=1)
Recheck Cond: ((name)::text ~~ 'Krok%'::text)
Rows Removed by Index Recheck: 1
Filter: ((blacklisted = 0) AND ((region)::text = 'eu'::text))
Rows Removed by Filter: 13
Heap Blocks: exact=88
-> Bitmap Index Scan on trgm_idx_name (cost=0.00..9.60 rows=158 width=0) (actual time=110.581..110.582 rows=89 loops=1)
Index Cond: ((name)::text ~~ 'Krok%'::text)
Planning Time: 0.268 ms
Execution Time: 111.174 ms
Pgloader have been created indexes on ratemax3v3 and name like:
CREATE INDEX idx_24683_ratemax3v3
ON wow.personnages USING btree
(ratemax3v3 ASC NULLS LAST)
TABLESPACE pg_default;
CREATE INDEX idx_24683_name
ON wow.personnages USING btree
(name COLLATE pg_catalog."default" ASC NULLS LAST)
TABLESPACE pg_default;
I created a new index on name :
CREATE INDEX trgm_idx_name ON wow.personnages USING GIST (name gist_trgm_ops);
I'm quite a beginner with postgresql at the moment.
Do you see anything I could do?
Don't hesitate to ask me if you need more information!
Antoine
To support a LIKE query like that (left anchored) you need to use a special "operator class":
CREATE INDEX ON wow.personnages(name varchar_pattern_ops);
But for your given query, an index on multiple columns would probably be more efficient:
CREATE INDEX ON wow.personnages(region, blacklisted, name varchar_pattern_ops);
Of maybe even a filtered index if e.g. the blacklisted = 0 is a static condition and there are relatively few rows matching that condition.
CREATE INDEX ON wow.personnages(region, name varchar_pattern_ops)
WHERE blacklisted = 0;
If the majority of the rows has blacklisted = 0 that won't really help (and adding the column to the index wouldn't help either). In that case just an index with (region, name varchar_pattern_ops) is probably more efficient.
If your pattern is anchored at the beginning, the following index would perform better:
CREATE INDEX ON personnages (name text_pattern_ops);
Besides, GIN indexes usually perform better than GiST indexes in a case like this. Try with a GIN index.
Finally, it is possible that the trigrams k, kr, kro, rok and ok occur very frequently, which would also make the index perform bad.

Postgres index not using correct plan

My postgresql version is 10.6. I have created an index but that is not used for all where clause condition check. Below are more details :
Create index concurrently ticket_created_at_portal_id_created_by_id_assigned_group_id_idx on ticket(created_at, portal_id, created_by_id, assigned_group);
EXPLAIN (analyze true, verbose true, costs true, buffers true, timing true ) select * from ticket where status is not null
and (assigned_group in ('447') or created_by_id in ('39731566'))
and portal_id=8
and created_at>='2020-12-07T03:00:10.973'
and created_at<='2021-02-05T03:00:10.973'
order by updated_at DESC limit 10;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=18975.23..18975.25 rows=10 width=638) (actual time=278.340..278.345 rows=10 loops=1)
Output: id, action, assigned_agent, assigned_at, assigned_group, attachments, closed_at, created_at, created_by_email, created_by_id, description, first_response_time, parent_id, portal_id, priority, reopened_at, resolution_id, resolved_at, resolved_by_id, resource_id, resource_type, source, status, subject, tags, ticket_category, ticket_id, ticket_sub_category, ticket_sub_sub_category, type, updated_at, custom_fields, updated_by_id, first_assigned_at, mode, sla_breached, agent_assist_tags, comm_vendor, from_email
Buffers: shared hit=2280 read=3105
-> Sort (cost=18975.23..18975.45 rows=87 width=638) (actual time=278.338..278.339 rows=10 loops=1)
Output: id, action, assigned_agent, assigned_at, assigned_group, attachments, closed_at, created_at, created_by_email, created_by_id, description, first_response_time, parent_id, portal_id, priority, reopened_at, resolution_id, resolved_at, resolved_by_id, resource_id, resource_type, source, status, subject, tags, ticket_category, ticket_id, ticket_sub_category, ticket_sub_sub_category, type, updated_at, custom_fields, updated_by_id, first_assigned_at, mode, sla_breached, agent_assist_tags, comm_vendor, from_email
Sort Key: ticket.updated_at DESC
Sort Method: top-N heapsort Memory: 33kB
Buffers: shared hit=2280 read=3105
-> Bitmap Heap Scan on public.ticket (cost=17855.76..18973.35 rows=87 width=638) (actual time=111.871..275.835 rows=1256 loops=1)
Output: id, action, assigned_agent, assigned_at, assigned_group, attachments, closed_at, created_at, created_by_email, created_by_id, description, first_response_time, parent_id, portal_id, priority, reopened_at, resolution_id, resolved_at, resolved_by_id, resource_id, resource_type, source, status, subject, tags, ticket_category, ticket_id, ticket_sub_category, ticket_sub_sub_category, type, updated_at, custom_fields, updated_by_id, first_assigned_at, mode, sla_breached, agent_assist_tags, comm_vendor, from_email
Recheck Cond: (((ticket.assigned_group = '447'::bigint) AND (ticket.portal_id = 8)) OR ((ticket.created_at >= '2020-12-07 03:00:10.973'::timestamp without time zone) AND (ticket.created_at <= '2021-02-05 03:00:10.973'::timestamp without time zone) AND (ticket.portal_id = 8) AND (ticket.created_by_id = '39731566'::bigint)))
Filter: ((ticket.status IS NOT NULL) AND (ticket.created_at >= '2020-12-07 03:00:10.973'::timestamp without time zone) AND (ticket.created_at <= '2021-02-05 03:00:10.973'::timestamp without time zone))
Rows Removed by Filter: 1517
Heap Blocks: exact=2638
Buffers: shared hit=2277 read=3105
-> BitmapOr (cost=17855.76..17855.76 rows=291 width=0) (actual time=106.215..106.216 rows=0 loops=1)
Buffers: shared hit=336 read=2408
-> Bitmap Index Scan on ticket_assigned_group_portal_id_assigned_agent_idx (cost=0.00..11.25 rows=282 width=0) (actual time=10.661..10.661 rows=2776 loops=1)
Index Cond: ((ticket.assigned_group = '447'::bigint) AND (ticket.portal_id = 8))
Buffers: shared hit=4 read=15
-> Bitmap Index Scan on ticket_created_at_portal_id_created_by_id_assigned_group_id_idx (cost=0.00..17844.47 rows=9 width=0) (actual time=95.551..95.551 rows=2 loops=1)
Index Cond: ((ticket.created_at >= '2020-12-07 03:00:10.973'::timestamp without time zone) AND (ticket.created_at <= '2021-02-05 03:00:10.973'::timestamp without time zone) AND (ticket.portal_id = 8) AND (ticket.created_by_id = '39731566'::bigint))
Buffers: shared hit=332 read=2393
Planning time: 43.083 ms
Execution time: 278.556 ms
(25 rows)
ticket_created_at_portal_id_created_by_id_assigned_group_id_idx is having all the columns of where clause except status is not null, but still query is using separate index for Index Cond: ((ticket.assigned_group = '447'::bigint) AND (ticket.portal_id = 8)) which is already present in 2nd index ticket_created_at_portal_id_created_by_id_assigned_group_id_idx.
Why it is so? Even when I include status column as well in the index, still query was using 2 index and hen doing a heavy filter on index heap scan.
How we can optimize it?
Also I did experiment with indexes of table but still unable to remove indexes. It seems that same columns are repeated in multiple indexes for diff queries, Please help if we can reduce these no of indexes.
All indexes for table are:
"ticket_pkey" PRIMARY KEY, btree (id)
"ticket_ticket_id_idx" UNIQUE, btree (ticket_id)
"uk2uors84i0m8sjxc6oaocuy6oj" UNIQUE CONSTRAINT, btree (ticket_id)
"idx_resource_id" btree (resource_id)
"idx_ticket_created_at" btree (created_at)
"ticket_assigned_agent_idx" btree (assigned_agent)
"ticket_assigned_group_idx" btree (assigned_group)
"ticket_assigned_group_portal_id_assigned_agent_idx" btree (assigned_group, portal_id, assigned_agent)
"ticket_created_at_portal_id_created_by_id_assigned_group_id_idx" btree (created_at, portal_id, created_by_id, assigned_group)
"ticket_created_at_portal_id_status_idx" btree (created_at, portal_id, status)
"ticket_id_resolved_at_assigned_group_status_idx" btree (id, resolved_at, assigned_group, status)
How easy would it be to use a phone book (sorted by last-name then first-name) to find every one with a first name of "Francis" whose last name starts with a letter between K and T? Not very easy, because it is not sorted by first name. You would have to go through the entire middle half of the phone book, reading everyone's first name.
Same here. When the first column in your index is used in a range/inequality query rather than equality, it makes all the columns after that one much less efficient. You would want to put the columns used for equality and not in an OR first. Unfortunately, that is only portal_id. The best thing to put next would depend on how selective each of those other conditions are which we can't guess from the info provided.
In deciding this, status IS NULL would be the same thing as equality, but status IS NOT NULL is not as there are any number of values it could be while still being not null, so it is effectively the same as an inequality. If this condition is highly selective, the best way to incorporate it would be in the WHERE of a partial index.
Because of the OR, you might still be best off with 2 indexes which could be combined in a bitmap or.
...(portal_id, assigned_group, created_at) WHERE status IS NOT NULL;
...(portal_id, created_by_id, created_at) WHERE status IS NOT NULL;
Another approach would be to avoid fetching and sorting all of the matching rows, by walking rows in order by updated_at using an index and stopping after 10 of them are found. An index can be used for walking a column in order as long as only things tested for equality (and without ORs) occur before the ORDER BY column, so:
...(portal_id, updated_at) WHERE status IS NOT NULL;

Optimising UUID Lookups in Postgres

All uuid columns below use the native Postgres uuid column type.
Have a lookup table where the uuid (uuid type 4 - so as random as can feasibly be) is the primary key. Regularly pull sequence of rows, say 10,000 from this lookup table.
Then, wish to use that set of uuid's retrieved from the lookup table to query other tables, typically two others, using the UUID's just retrieved. The UUID's in the other tables (tables A and B) are not primary keys. UUID columns in other tables A and B have UNIQUE constraints (btree indices).
Currently not doing this merging using a JOIN of any kind, just simple:
Query lookup table, get uuids.
Query table A using uuids from (1)
Query table B using uuids from (1)
The issue is that queries (2) and (3) are surprisingly slow. So for around 4000 rows in tables A and B, particularly table A, around 30-50 seconds typically. Table A has around 60M rows.
Dealing with just table A, when using EXPLAIN ANALYZE, reports as doing an "Index Scan" on the uuid column in column A, with an Index Cond in the EXPLAIN ANALYZE output.
I've experiment with various WHERE clauses:
uuid = ANY ('{
uuid = ANY(VALUES('
uuid ='uuid1' OR uuid='uuid2' etc ....
And experimented with btree (distinct), hash index table A on uuid, btree and hash index.
By far the fastest (which is still relatively slow) is: btree and use of "ANY ('{" in the WHERE clause.
Various opinions I've read:
Actually doing a proper JOIN e.g. LEFT OUTER JOIN across the three tables.
That the use of uuid type 4 is the problem, it being a randomly generated id, as opposed to a sequence based id.
Possibly experimenting with work_mem.
Anyway. Wondered if anyone else had any other suggestions?
Table: "lookup"
uuid: type uuid. not null. plain storage.
datetime_stamp: type bigint. not null. plain storage.
harvest_date_stamp: type bigint. not null. plain storage.
state: type smallint. not null. plain storage.
Indexes:
"lookup_pkey" PRIMARY KEY, btree (uuid)
"lookup_32ff3898" btree (datetime_stamp)
"lookup_6c8369bc" btree (harvest_date_stamp)
"lookup_9ed39e2e" btree (state)
Has OIDs: no
Table: "article_data"`
int: type integer. not null default nextval('article_data_id_seq'::regclass). plain storage.
title: text.
text: text.
insertion_date: date
harvest_date: timestamp with time zone.
uuid: uuid.
Indexes:
"article_data_pkey" PRIMARY KEY, btree (id)
"article_data_uuid_key" UNIQUE CONSTRAINT, btree (uuid)
Has OIDs: no
Both lookup and article_data have around 65m rows. Two queries:
SELECT uuid FROM lookup WHERE state = 200 LIMIT 4000;
OUTPUT FROM EXPLAIN (ANALYZE, BUFFERS):
Limit (cost=0.00..4661.02 rows=4000 width=16) (actual time=0.009..1.036 rows=4000 loops=1)
Buffers: shared hit=42
-> Seq Scan on lookup (cost=0.00..1482857.00 rows=1272559 width=16) (actual time=0.008..0.777 rows=4000 loops=1)
Filter: (state = 200)
Rows Removed by Filter: 410
Buffers: shared hit=42
Total runtime: 1.196 ms
(7 rows)
Question: Why does this do a sequence scan and not an index scan when there is a btree on state?
SELECT article_data.id, article_data.uuid, article_data.title, article_data.text
FROM article_data
WHERE uuid = ANY ('{f0d5e665-4f21-4337-a54b-cf0b4757db65,..... 3999 more uuid's ....}'::uuid[]);
OUTPUT FROM EXPLAIN (ANALYZE, BUFFERS):
Index Scan using article_data_uuid_key on article_data (cost=5.56..34277.00 rows=4000 width=581) (actual time=0.063..66029.031 rows=400
0 loops=1)
Index Cond: (uuid = ANY ('{f0d5e665-4f21-4337-a54b-cf0b4757db65,5618754f-544b-4700-9d24-c364fd0ba4e9,958e37e3-6e6e-4b2a-b854-48e88ac1fdb7, ba56b483-59b2-4ae5-ae44-910401f3221b,aa4
aca60-a320-4ed3-b7b4-829e6ca63592,05f1c0b9-1f9b-4e1c-8f41-07545d694e6b,7aa4dee9-be17-49df-b0ca-d6e63b0dc023,e9037826-86c4-4bbc-a9d5-6977ff7458af,db5852bf- a447-4a1d-9673-ead2f7045589
,6704d89 .......}'::uuid[]))
Buffers: shared hit=16060 read=4084 dirtied=292
Total runtime: 66041.443 ms
(4 rows)
Question: Why is this so slow, even though it's reading from disk?
Without seeing your table structure and the output of explain analyze..., I'd expect an inner join on the lookup table to give the best performance. (My table_a has about 10 million rows.)
select *
from table_a
inner join
-- Brain dead way to get about 1000 rows
-- from a renamed scratch table.
(select test_uuid from lookup_table
where test_id < 10000) t
on table_a.test_uuid = t.test_uuid;
"Nested Loop (cost=0.72..8208.85 rows=972 width=36) (actual time=0.041..11.825 rows=999 loops=1)"
" -> Index Scan using uuid_test_2_test_id_key on lookup_table (cost=0.29..39.30 rows=972 width=16) (actual time=0.015..0.414 rows=999 loops=1)"
" Index Cond: (test_id Index Scan using uuid_test_test_uuid_key on table_a (cost=0.43..8.39 rows=1 width=20) (actual time=0.010..0.011 rows=1 loops=999)"
" Index Cond: (test_uuid = lookup_table.test_uuid)"
"Planning time: 0.503 ms"
"Execution time: 11.953 ms"

Limit slows down my postgres query

Hi i have a simple query on a single table which runs pretty fast, but i want to page my results and the LIMIT slows down the select incredibly. The Table contains about 80 Million rows. I'm on postgres 9.2.
Without LIMIT it takes 330ms and returns 2100 rows
EXPLAIN SELECT * from interval where username='1228321f131084766f3b0c6e40bc5edc41d4677e' order by time desc
Sort (cost=156599.71..156622.43 rows=45438 width=108)"
Sort Key: "time""
-> Bitmap Heap Scan on "interval" (cost=1608.05..155896.71 rows=45438 width=108)"
Recheck Cond: ((username)::text = '1228321f131084766f3b0c6e40bc5edc41d4677e'::text)"
-> Bitmap Index Scan on interval_username (cost=0.00..1605.77 rows=45438 width=0)"
Index Cond: ((username)::text = '1228321f131084766f3b0c6e40bc5edc41d4677e'::text)
EXPLAIN ANALYZE SELECT * from interval where
username='1228321f131084766f3b0c6e40bc5edc41d4677e' order by time desc
Sort (cost=156599.71..156622.43 rows=45438 width=108) (actual time=1.734..1.887 rows=2131 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 396kB
-> Bitmap Heap Scan on "interval" (cost=1608.05..155896.71 rows=45438 width=108) (actual time=0.425..0.934 rows=2131 loops=1)
Recheck Cond: ((username)::text = '1228321f131084766f3b0c6e40bc5edc41d4677e'::text)
-> Bitmap Index Scan on interval_username (cost=0.00..1605.77 rows=45438 width=0) (actual time=0.402..0.402 rows=2131 loops=1)
Index Cond: ((username)::text = '1228321f131084766f3b0c6e40bc5edc41d4677e'::text)
Total runtime: 2.065 ms
With LIMIT it takes several minuts (i never waited for it to end)
EXPLAIN SELECT * from interval where username='1228321f131084766f3b0c6e40bc5edc41d4677e' order by time desc LIMIT 10
Limit (cost=0.00..6693.99 rows=10 width=108)
-> Index Scan Backward using interval_time on "interval" (cost=0.00..30416156.03 rows=45438 width=108)
Filter: ((username)::text = '1228321f131084766f3b0c6e40bc5edc41d4677e'::text)
Table definition
-- Table: "interval"
-- DROP TABLE "interval";
CREATE TABLE "interval"
(
uuid character varying(255) NOT NULL,
deleted boolean NOT NULL,
id bigint NOT NULL,
"interval" bigint NOT NULL,
"time" timestamp without time zone,
trackerversion character varying(255),
username character varying(255),
CONSTRAINT interval_pkey PRIMARY KEY (uuid),
CONSTRAINT fk_272h71b2gfyov9fwnksyditdd FOREIGN KEY (username)
REFERENCES appuser (panelistcode) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT uk_hyi5iws50qif6jwky9xcch3of UNIQUE (id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE "interval"
OWNER TO postgres;
-- Index: interval_time
-- DROP INDEX interval_time;
CREATE INDEX interval_time
ON "interval"
USING btree
("time");
-- Index: interval_username
-- DROP INDEX interval_username;
CREATE INDEX interval_username
ON "interval"
USING btree
(username COLLATE pg_catalog."default");
-- Index: interval_uuid
-- DROP INDEX interval_uuid;
CREATE INDEX interval_uuid
ON "interval"
USING btree
(uuid COLLATE pg_catalog."default");
Further results
SELECT n_distinct FROM pg_stats WHERE tablename='interval' AND attname='username';
n_distinct=1460
SELECT AVG(length) FROM (SELECT username, COUNT(*) AS length FROM interval GROUP BY username) as freq;
45786.022605591910
SELECT COUNT(*) FROM interval WHERE username='1228321f131084766f3b0c6e40bc5edc41d4677e';
2131
The planner is expecting 45438 rows for username '1228321f131084766f3b0c6e40bc5edc41d4677e', while in reality there are only 2131 rows with it, thus it thinks it will find the 10 rows you want faster by looking backward through the interval_time index.
Try increasing the stats on the username column and see whether the query plan will change.
ALTER TABLE interval ALTER COLUMN username SET STATISTICS 100;
ANALYZE interval;
You can try different values of statistics up to 10000.
If you are still not satisfied with the plan and you are sure that you can do better than the planner and know what you are doing, then you can bypass any index easily by performing some operation over it that does not change its value.
For example, instead of ORDER BY time, you can use ORDER BY time + '0 seconds'::interval. That way any index on the value of time stored in the table will be bypassed. For integer values you can multiply * 1, etc.
The page http://thebuild.com/blog/2014/11/18/when-limit-attacks/ showed that i could force postgres to do better by using CTE
WITH inner_query AS (SELECT * from interval where username='7823721a3eb9243be63c6c3a13dffee44753cda6')
SELECT * FROM inner_query order by time desc LIMIT 10;

postgreSQL get last ID in partitioned tables /

my question is basically the same as this one, but i couldn't find an answer, its also written "to be solved in the next release" and "easy for min/max scans"
PostgreSQL+table partitioning: inefficient max() and min()
CREATE TABLE mc_handstats
(
id integer NOT NULL DEFAULT nextval('mc_handst_id_seq'::regclass),
playerid integer NOT NULL,
CONSTRAINT mc_handst_pkey PRIMARY KEY (id),
);
table is partitioned over playerid.
CREATE TABLE mc_handst_0000 ( CHECK ( playerid >= 0 AND playerid < 10000) ) INHERITS (mc_handst) TABLESPACE ssd01;
CREATE TABLE mc_handst_0010 ( CHECK ( playerid >= 10000 AND playerid < 30000) ) INHERITS (mc_handst) TABLESPACE ssd02;
CREATE TABLE mc_handst_0030 ( CHECK ( playerid >= 30000 AND playerid < 50000) ) INHERITS (mc_handst) TABLESPACE ssd03;
...
CREATE INDEX mc_handst_0000_PlayerID ON mc_handst_0000 (playerid);
CREATE INDEX mc_handst_0010_PlayerID ON mc_handst_0010 (playerid);
CREATE INDEX mc_handst_0030_PlayerID ON mc_handst_0030 (playerid);
...
plus create trigger on playerID
i want to get the last id (i could also get the value for the sequence, but i am used to work with tables/colums), but pSQL seems to be rather stupid scanning the table:
EXPLAIN ANALYZE select max(id) from mc_handstats; (the real query runs forever)
"Aggregate (cost=9080859.04..9080859.05 rows=1 width=4) (actual time=181867.626..181867.626 rows=1 loops=1)"
" -> Append (cost=0.00..8704322.43 rows=150614644 width=4) (actual time=2.460..163638.343 rows=151134891 loops=1)"
" -> Seq Scan on mc_handstats (cost=0.00..0.00 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=1)"
" -> Seq Scan on mc_handst_0000 mc_handstats (cost=0.00..728523.69 rows=12580969 width=4) (actual time=2.457..10800.539 rows=12656647 loops=1)"
...
ALL TABLES
...
"Total runtime: 181867.819 ms"
EXPLAIN ANALYZE select max(id) from mc_handst_1000
"Aggregate (cost=83999.50..83999.51 rows=1 width=4) (actual time=1917.933..1917.933 rows=1 loops=1)"
" -> Seq Scan on mc_handst_1000 (cost=0.00..80507.40 rows=1396840 width=4) (actual time=0.007..1728.268 rows=1396717 loops=1)"
"Total runtime: 1918.494 ms"
the runtime for the partitioned table is 'snap', and completely off the record over the master table. (postgreSQL 9.2)
\d mc_handstats (only the indexes)
Indexes:
"mc_handst_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
"mc_handst_playerid_fkey" FOREIGN KEY (playerid) REFERENCES mc_players(id)
Triggers:
mc_handst_insert_trigger BEFORE INSERT ON mc_handstats FOR EACH ROW EXECUTE PROCEDURE mc_handst_insert_function()
Number of child tables: 20 (Use \d+ to list them.)
\d mc_handst_1000
Indexes:
"mc_handst_1000_playerid" btree (playerid)
Check constraints:
"mc_handst_1000_playerid_check" CHECK (playerid >= 1000000 AND playerid < 1100000)
hm, no PK index in the sub tables. while i don't understand why the result for max(id) is pretty fast on the subtables (as there is no index) and slow from the master table, it seems i need to add an index for PK also for all subtables. maybe that solves it.
CREATE INDEX mc_handst_0010_ID ON mc_handst_0010 (id);
... plus many more ...
and everything fine. still strange why it worked fast on the subtables before, that made me think they are indexed, but i also don't care to much.
thanks for this!
The first thing you need to do is index all the child tables on (id) and see if max(id) is smart enough to do an index scan on each table. I think i should be but I am not entirely sure.
If not, here's what I would do: I would start with currval([sequence_name]) and work back until a record is found. You could do something check blocks of 10 at a time, or the like in what is essentially a sparse scan. This could be done with a CTE like such (again relies on indexes):
WITH RECURSIVE ids (
select max(id) as max_id, currval('mc_handst_id_seq') - 10 as min_block
FROM mc_handst
WHERE id BETWEEN currval('mc_handst_id_seq') - 10 AND currval('mc_handst_id_seq')
UNION ALL
SELECT max(id), i.min_block - 10
FROM mc_handst
JOIN ids i ON id BETWEEN i.min_block - 10 AND i.min_block
WHERE i.max_id IS NULL
)
SELECT max(max_id) from ids;
That should do a sparse scan if the planner won't use an index once the partitions are indexed. In most cases it should only do one scan but it will repeat as necessary to find an id. Note that it might run forever on an empty table.
Assuming a parent's table like this:
CREATE TABLE parent AS (
id not null default nextval('parent_id_seq'::regclass)
... other columns ...
);
Whether you're using a rule or a trigger to divert the INSERTs into the child tables, immediately after the INSERT you may use:
SELECT currval('parent_id_seq'::regclass);
to get the last id inserted by your session, independently of concurrent INSERTs, each session having its own copy of the last sequence value it has obtained.
https://dba.stackexchange.com/questions/58497/return-id-from-partitioned-table-in-postgres