Limit slows down my postgres query - postgresql

Hi, I have a simple query on a single table that runs pretty fast, but I want to page my results, and adding LIMIT slows the SELECT down incredibly. The table contains about 80 million rows. I'm on Postgres 9.2.
Without LIMIT it takes 330ms and returns 2100 rows
EXPLAIN SELECT * from interval where username='1228321f131084766f3b0c6e40bc5edc41d4677e' order by time desc
Sort (cost=156599.71..156622.43 rows=45438 width=108)
Sort Key: "time"
-> Bitmap Heap Scan on "interval" (cost=1608.05..155896.71 rows=45438 width=108)
Recheck Cond: ((username)::text = '1228321f131084766f3b0c6e40bc5edc41d4677e'::text)
-> Bitmap Index Scan on interval_username (cost=0.00..1605.77 rows=45438 width=0)
Index Cond: ((username)::text = '1228321f131084766f3b0c6e40bc5edc41d4677e'::text)
EXPLAIN ANALYZE SELECT * from interval where
username='1228321f131084766f3b0c6e40bc5edc41d4677e' order by time desc
Sort (cost=156599.71..156622.43 rows=45438 width=108) (actual time=1.734..1.887 rows=2131 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 396kB
-> Bitmap Heap Scan on "interval" (cost=1608.05..155896.71 rows=45438 width=108) (actual time=0.425..0.934 rows=2131 loops=1)
Recheck Cond: ((username)::text = '1228321f131084766f3b0c6e40bc5edc41d4677e'::text)
-> Bitmap Index Scan on interval_username (cost=0.00..1605.77 rows=45438 width=0) (actual time=0.402..0.402 rows=2131 loops=1)
Index Cond: ((username)::text = '1228321f131084766f3b0c6e40bc5edc41d4677e'::text)
Total runtime: 2.065 ms
With LIMIT it takes several minutes (I never waited for it to finish)
EXPLAIN SELECT * from interval where username='1228321f131084766f3b0c6e40bc5edc41d4677e' order by time desc LIMIT 10
Limit (cost=0.00..6693.99 rows=10 width=108)
-> Index Scan Backward using interval_time on "interval" (cost=0.00..30416156.03 rows=45438 width=108)
Filter: ((username)::text = '1228321f131084766f3b0c6e40bc5edc41d4677e'::text)
Table definition
-- Table: "interval"
-- DROP TABLE "interval";
CREATE TABLE "interval"
(
uuid character varying(255) NOT NULL,
deleted boolean NOT NULL,
id bigint NOT NULL,
"interval" bigint NOT NULL,
"time" timestamp without time zone,
trackerversion character varying(255),
username character varying(255),
CONSTRAINT interval_pkey PRIMARY KEY (uuid),
CONSTRAINT fk_272h71b2gfyov9fwnksyditdd FOREIGN KEY (username)
REFERENCES appuser (panelistcode) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT uk_hyi5iws50qif6jwky9xcch3of UNIQUE (id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE "interval"
OWNER TO postgres;
-- Index: interval_time
-- DROP INDEX interval_time;
CREATE INDEX interval_time
ON "interval"
USING btree
("time");
-- Index: interval_username
-- DROP INDEX interval_username;
CREATE INDEX interval_username
ON "interval"
USING btree
(username COLLATE pg_catalog."default");
-- Index: interval_uuid
-- DROP INDEX interval_uuid;
CREATE INDEX interval_uuid
ON "interval"
USING btree
(uuid COLLATE pg_catalog."default");
Further results
SELECT n_distinct FROM pg_stats WHERE tablename='interval' AND attname='username';
n_distinct=1460
SELECT AVG(length) FROM (SELECT username, COUNT(*) AS length FROM interval GROUP BY username) as freq;
45786.022605591910
SELECT COUNT(*) FROM interval WHERE username='1228321f131084766f3b0c6e40bc5edc41d4677e';
2131

The planner expects 45438 rows for username '1228321f131084766f3b0c6e40bc5edc41d4677e', while in reality there are only 2131. It therefore assumes it will find the 10 rows you want sooner by scanning backward through the interval_time index.
Try increasing the stats on the username column and see whether the query plan will change.
ALTER TABLE interval ALTER COLUMN username SET STATISTICS 100;
ANALYZE interval;
You can try different values for the statistics target, up to 10000.
If you are still not satisfied with the plan and you are sure that you can do better than the planner and know what you are doing, then you can bypass any index easily by performing some operation over it that does not change its value.
For example, instead of ORDER BY time, you can use ORDER BY time + '0 seconds'::interval. That way any index on the value of time stored in the table will be bypassed. For integer values you can multiply * 1, etc.
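Applied to the query from the question, the expression trick might look like the sketch below. It only rewrites the ORDER BY; whether the planner then falls back to the bitmap scan on interval_username is something to confirm with EXPLAIN.

```sql
-- Adding a zero-length interval leaves the sort order unchanged, but the
-- expression no longer matches the interval_time index, so the planner
-- cannot use a backward scan of that index to satisfy ORDER BY.
SELECT *
FROM "interval"
WHERE username = '1228321f131084766f3b0c6e40bc5edc41d4677e'
ORDER BY "time" + '0 seconds'::interval DESC
LIMIT 10;
```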

The page http://thebuild.com/blog/2014/11/18/when-limit-attacks/ showed that I could force Postgres to choose a better plan by using a CTE:
WITH inner_query AS (SELECT * from interval where username='7823721a3eb9243be63c6c3a13dffee44753cda6')
SELECT * FROM inner_query order by time desc LIMIT 10;
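On the 9.2 server from the question, a CTE always acts as an optimization fence, which is why this works. From PostgreSQL 12 onward the planner may inline such a CTE, so a sketch for newer versions would spell the fence out with MATERIALIZED (this keyword does not exist on 9.2):

```sql
-- PostgreSQL 12+ only: MATERIALIZED forces the CTE to be computed on its own,
-- so the username filter runs before ORDER BY ... LIMIT is applied.
WITH inner_query AS MATERIALIZED (
    SELECT *
    FROM "interval"
    WHERE username = '7823721a3eb9243be63c6c3a13dffee44753cda6'
)
SELECT *
FROM inner_query
ORDER BY "time" DESC
LIMIT 10;
```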

Related

Composite primary key and slow queries

I have a table which looks like this:
CREATE TABLE IF NOT EXISTS codes (
code character varying(255) NOT NULL,
used boolean NOT NULL DEFAULT FALSE,
user_id integer NOT NULL,
order_id integer,
created timestamp without time zone,
CONSTRAINT pk_code PRIMARY KEY (code, user_id)
);
And I have simple query:
SELECT COUNT(*) as "count"
FROM "codes" "code"
WHERE "code"."user_id" = {some_id}
AND "code"."order_id" IS NULL;
Which is very slow, explain analyze:
'Aggregate (cost=3576.10..3576.11 rows=1 width=8) (actual time=1471.870..1471.871 rows=1 loops=1)'
' -> Seq Scan on codes (cost=0.00..3323.89 rows=50443 width=17) (actual time=0.139..203.139 rows=49998 loops=1)'
' Filter: ((order_id IS NULL) AND (user_id = 10))'
' Rows Removed by Filter: 116498'
'Planning Time: 1.450 ms'
'Execution Time: 1471.981 ms'
How can I optimize this query?
An index can only be used efficiently if the first index column is used in the WHERE condition (yes, I know there are exceptions, but it is a good rule of thumb).
So to support that query, you should define the primary key the other way around, as (user_id, code). That will guarantee the constraint just as well, but the underlying index will support your query.
If you cannot change the primary key like that, you need another index on user_id alone.
But then, looking at Rows Removed by Filter, the query matches about 50,000 of roughly 166,000 rows (30%), so a sequential scan is probably the fastest access strategy and an index won't help at all. Try it out.
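The two options could be tried roughly as follows. This is a sketch: the index name codes_user_id_idx is made up for the example, and dropping the primary key will fail if other tables reference it through a foreign key.

```sql
-- Option 1: swap the column order of the primary key so user_id leads.
ALTER TABLE codes
    DROP CONSTRAINT pk_code,
    ADD CONSTRAINT pk_code PRIMARY KEY (user_id, code);

-- Option 2: keep the primary key and add a separate index on user_id.
-- (Hypothetical index name.)
CREATE INDEX codes_user_id_idx ON codes (user_id);

-- Re-check the plan either way (10 is the user_id from the posted plan).
EXPLAIN ANALYZE
SELECT COUNT(*) AS "count"
FROM codes
WHERE user_id = 10
  AND order_id IS NULL;
```

If the plan still shows a Seq Scan after either change, the planner has most likely concluded that reading the whole table is cheaper for this share of rows.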

Postgresql Index Only Scan Doesn't Properly Work On Group By

I have a table like:
CREATE TABLE summary
(
id serial NOT NULL,
user_id bigint NOT NULL,
country character varying(5),
product_id bigint NOT NULL,
category_id bigint NOT NULL,
text_id bigint NOT NULL,
text character varying(255),
product_type integer NOT NULL,
event_name character varying(255),
report_date date NOT NULL,
currency character varying(5),
revenue double precision,
last_event_time timestamp
);
My table size is 1786 MB (excluding indexes). I've created an index on it like this:
CREATE INDEX "idx_as_type_usr_productId_eventTime"
ON summary USING btree
(product_type, user_id, product_id, last_event_time)
INCLUDE(event_name);
And my simple query looks like below:
select
event_name,
max(last_event_time)
from summary s
where s.user_id = ? and s.product_id = ? and s.product_type = ?
and s.last_event_time > '2020-03-01' and s.last_event_time < '2020-03-25'
group by event_name;
When I explain it, the plan looks like this:
HashAggregate (cost=93.82..96.41 rows=259 width=25) (actual time=9187.533..9187.536 rows=10 loops=1)
Group Key: event_name
Buffers: shared hit=70898 read=10579 dirtied=22650
I/O Timings: read=3876.367
-> Index Only Scan using "idx_as_type_usr_productId_eventTime" on summary s (cost=0.56..92.36 rows=292 width=25) (actual time=0.485..9153.812 rows=87322 loops=1)
Index Cond: ((product_type = 2) AND (product_id = ?) AND (product_id = ?) AND (last_event_time > '2020-03-01 00:00:00'::timestamp without time zone) AND (last_event_time < '2020-03-25 00:00:00'::timestamp without time zone))
Heap Fetches: 35967
Buffers: shared hit=70898 read=10579 dirtied=22650
I/O Timings: read=3876.367
Planning Time: 0.452 ms
Execution Time: 9187.583 ms
Everything looks fine here. But when I execute it, it takes more than 10 seconds, and sometimes more than 30 seconds.
If I execute it without GROUP BY, it returns quickly, in less than 2 seconds. What can the effect of GROUP BY be? The selected part is small (about 500 rows).
This table receives about 30 insert/update operations per second. Can that be related to this indexing problem?
Updated:
Query Without - GroupBy:
select
event_name,
last_event_time
from summary s
where s.user_id = ? and s.product_id = ? and s.product_type = ?
and s.last_event_time > '2020-03-01' and s.last_event_time < '2020-03-25';
Explain Without - Group By:
Index Only Scan using "idx_as_type_usr_productId_eventTime" on summary s (cost=0.56..92.36 rows=292 width=25) (actual time=0.023..79.138 rows=87305 loops=1)
Index Cond: ((product_type = ?) AND (user_id = ?) AND (product_id = ?) AND (last_event_time > '2020-03-01 00:00:00'::timestamp without time zone) AND (last_event_time < '2020-03-25 00:00:00'::timestamp without time zone))
Heap Fetches: 22949
Buffers: shared hit=37780 read=12143 dirtied=15156
I/O Timings: read=4418.930
Planning Time: 0.639 ms
Execution Time: 4625.213 ms
There are several problems:
PostgreSQL had to set hint bits, which dirty the pages and cause writes.
PostgreSQL has to fetch table rows from disk to check their visibility.
PostgreSQL has to scan 80000 pages to get 87000 rows, so the index must be totally bloated.
The first two can be taken care of by running
VACUUM summary;
which is always a good idea after a bulk load, and the bloat can be cured by
REINDEX INDEX "idx_as_type_usr_productId_eventTime";
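A slightly more observable version of the same maintenance is sketched below. The CONCURRENTLY variant is an assumption about your server version: it only exists in PostgreSQL 12 and later, and a plain REINDEX as shown above works everywhere but blocks writes to the table while it runs.

```sql
-- VERBOSE reports how many dead row versions were removed; VACUUM also
-- updates the visibility map that index-only scans rely on.
VACUUM (VERBOSE) summary;

-- Index size before and after the rebuild, to see how much bloat went away.
SELECT pg_size_pretty(pg_relation_size('"idx_as_type_usr_productId_eventTime"'));

-- PostgreSQL 12+ only: rebuild the index without blocking writes.
REINDEX INDEX CONCURRENTLY "idx_as_type_usr_productId_eventTime";
```

After the VACUUM, the "Heap Fetches" figure in the plan should drop if the visibility map was the problem.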

Postgres selecting sub-optimal query plan in production

I've got a query with an ORDER and a LIMIT to support a paginated interface:
SELECT segment_members.id AS t0_r0,
segment_members.segment_id AS t0_r1,
segment_members.account_id AS t0_r2,
segment_members.score AS t0_r3,
segment_members.created_at AS t0_r4,
segment_members.updated_at AS t0_r5,
segment_members.posts_count AS t0_r6,
accounts.id AS t1_r0,
accounts.platform AS t1_r1,
accounts.username AS t1_r2,
accounts.created_at AS t1_r3,
accounts.updated_at AS t1_r4,
accounts.remote_id AS t1_r5,
accounts.name AS t1_r6,
accounts.language AS t1_r7,
accounts.description AS t1_r8,
accounts.timezone AS t1_r9,
accounts.profile_image_url AS t1_r10,
accounts.post_count AS t1_r11,
accounts.follower_count AS t1_r12,
accounts.following_count AS t1_r13,
accounts.uri AS t1_r14,
accounts.location AS t1_r15,
accounts.favorite_count AS t1_r16,
accounts.raw AS t1_r17,
accounts.followers_completed_at AS t1_r18,
accounts.followings_completed_at AS t1_r19,
accounts.followers_started_at AS t1_r20,
accounts.followings_started_at AS t1_r21,
accounts.profile_fetched_at AS t1_r22,
accounts.managed_source_id AS t1_r23
FROM segment_members
INNER JOIN accounts ON accounts.id = segment_members.account_id
WHERE segment_members.segment_id = 1
ORDER BY accounts.follower_count ASC LIMIT 20
OFFSET 0;
Here are the indexes on the tables:
accounts
"accounts_pkey" PRIMARY KEY, btree (id)
"index_accounts_on_remote_id_and_platform" UNIQUE, btree (remote_id, platform)
"index_accounts_on_description" btree (description)
"index_accounts_on_favorite_count" btree (favorite_count)
"index_accounts_on_follower_count" btree (follower_count)
"index_accounts_on_following_count" btree (following_count)
"index_accounts_on_lower_username_and_platform" btree (lower(username::text), platform)
"index_accounts_on_post_count" btree (post_count)
"index_accounts_on_profile_fetched_at_and_platform" btree (profile_fetched_at, platform)
"index_accounts_on_username" btree (username)
segment_members
"segment_members_pkey" PRIMARY KEY, btree (id)
"index_segment_members_on_segment_id_and_account_id" UNIQUE, btree (segment_id, account_id)
"index_segment_members_on_account_id" btree (account_id)
"index_segment_members_on_segment_id" btree (segment_id)
In my development and staging databases, the query plan looks like the following, and the query executes very quickly.
Limit (cost=4802.15..4802.20 rows=20 width=2086)
-> Sort (cost=4802.15..4803.20 rows=421 width=2086)
Sort Key: accounts.follower_count
-> Nested Loop (cost=20.12..4790.95 rows=421 width=2086)
-> Bitmap Heap Scan on segment_members (cost=19.69..1244.24 rows=421 width=38)
Recheck Cond: (segment_id = 1)
-> Bitmap Index Scan on index_segment_members_on_segment_id_and_account_id (cost=0.00..19.58 rows=421 width=0)
Index Cond: (segment_id = 1)
-> Index Scan using accounts_pkey on accounts (cost=0.43..8.41 rows=1 width=2048)
Index Cond: (id = segment_members.account_id)
In production, however, the query plan is the following, and the query takes forever (several minutes until it hits the statement timeout).
Limit (cost=0.86..25120.72 rows=20 width=2130)
-> Nested Loop (cost=0.86..4614518.64 rows=3674 width=2130)
-> Index Scan using index_accounts_on_follower_count on accounts (cost=0.43..2779897.53 rows=3434917 width=2092)
-> Index Scan using index_segment_members_on_segment_id_and_account_id on segment_members (cost=0.43..0.52 rows=1 width=38)
Index Cond: ((segment_id = 1) AND (account_id = accounts.id))
accounts has about 6m rows in staging and 3m in production. segment_members has about 300k rows in staging and 4m in production. Is it the differences in table sizes that is causing the differences in the query plan selection? Is there any way I can get Postgres to use the faster query plan in production?
Update:
Here's the EXPLAIN ANALYZE from the slow production server:
Limit (cost=0.86..22525.66 rows=20 width=2127) (actual time=173.148..187568.247 rows=20 loops=1)
-> Nested Loop (cost=0.86..4654749.92 rows=4133 width=2127) (actual time=173.141..187568.193 rows=20 loops=1)
-> Index Scan using index_accounts_on_follower_count on accounts (cost=0.43..2839731.81 rows=3390197 width=2089) (actual time=0.110..180374.279 rows=1401278 loops=1)
-> Index Scan using index_segment_members_on_segment_id_and_account_id on segment_members (cost=0.43..0.53 rows=1 width=38) (actual time=0.003..0.003 rows=0 loops=1401278)
Index Cond: ((segment_id = 1) AND (account_id = accounts.id))
Total runtime: 187568.318 ms
(6 rows)
Either your table statistics are not up to date or the two queries you present are very different. The second one estimates to retrieve 3.5M rows (rows=3434917). ORDER BY / LIMIT 20 is forced to sort all 3.5 million rows to find the top 20, which is going to be extremely expensive - unless you have a matching index.
The first query plan expects to sort 421 rows. Not even close. Different query plans are no surprise.
It would be interesting to see the output of EXPLAIN ANALYZE, not just EXPLAIN. (Expensive for the second query!)
It very much depends on how many account_id for each segment_id. If segment_id is not selective, the query cannot be fast. Your only other option is a MATERIALIZED VIEW with the top n rows per segment_id and an appropriate regime to keep it up to date.
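A materialized view along those lines might be sketched as follows. The view name segment_top_accounts and the cutoff of 100 rows per segment are assumptions for illustration, and materialized views require PostgreSQL 9.3 or later. Pages beyond the cutoff would still have to come from the base tables.

```sql
-- Hypothetical view name and cutoff (100 rows per segment).
CREATE MATERIALIZED VIEW segment_top_accounts AS
SELECT segment_id, account_id, follower_count
FROM (
    SELECT sm.segment_id,
           sm.account_id,
           a.follower_count,
           row_number() OVER (PARTITION BY sm.segment_id
                              ORDER BY a.follower_count) AS rn
    FROM segment_members sm
    JOIN accounts a ON a.id = sm.account_id
) ranked
WHERE rn <= 100;

-- Supports "WHERE segment_id = ? ORDER BY follower_count LIMIT n".
CREATE INDEX ON segment_top_accounts (segment_id, follower_count);

-- Re-run on whatever schedule keeps the data fresh enough.
REFRESH MATERIALIZED VIEW segment_top_accounts;
```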
If your statistics are not up to date, just run ANALYZE on both tables and retry.
It might help to increase the statistics target for selected columns:
ALTER TABLE segment_members ALTER segment_id SET STATISTICS 1000;
ALTER TABLE segment_members ALTER account_id SET STATISTICS 1000;
ALTER TABLE accounts ALTER id SET STATISTICS 1000;
ALTER TABLE accounts ALTER follower_count SET STATISTICS 1000;
ANALYZE segment_members(segment_id, account_id);
ANALYZE accounts (id, follower_count);
Details:
Check statistics targets in PostgreSQL
Keep PostgreSQL from sometimes choosing a bad query plan
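To see which targets are currently in effect before and after the ALTER TABLE statements, a query like the following can help. It is a sketch against the system catalogs; a value of -1 (or NULL on newer releases) means the column uses default_statistics_target.

```sql
-- Default target used for columns without a per-column setting.
SHOW default_statistics_target;

-- Per-column settings for the two tables involved in the query.
SELECT attrelid::regclass AS table_name,
       attname,
       attstattarget
FROM pg_attribute
WHERE attrelid IN ('segment_members'::regclass, 'accounts'::regclass)
  AND attnum > 0          -- skip system columns
  AND NOT attisdropped    -- skip dropped columns
ORDER BY 1, attnum;
```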
Better indexes
In addition to your existing UNIQUE constraint index_segment_members_on_segment_id_and_account_id on segment_members, I suggest a multicolumn index on accounts (the name must differ from the existing index_accounts_on_follower_count):
CREATE INDEX index_accounts_on_id_and_follower_count ON accounts (id, follower_count);
Again, run ANALYZE after creating the index.
Some indexes useless?
All other indexes in your question are irrelevant for this query. They may be useful for other purposes or useless.
This index is 100% dead freight, drop it: the UNIQUE index on (segment_id, account_id) already covers lookups on segment_id alone. (Detailed explanation here.)
"index_segment_members_on_segment_id" btree (segment_id)
This one may be useless:
"index_accounts_on_description" btree (description)
A "description" is typically free text that is hardly ever used to order rows or in a WHERE condition with a suitable operator. But that's just an educated guess.

postgresql hashaggregate query optimization

I am trying to optimize the query below.
select cellid2 as cellid, max(endeks) as turkcell
from (select a.cellid2 as cellid2, b.endeks
from (select geom, cellid as cellid2 from grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000 ) a join (select endeks, st_transform(geom, 2320) as geom_tmp from turkcell_data ) b on st_intersects(a.geom, b.geom_tmp) ) x
group by cellid2 limit 5
and explain analyze returns
"Limit (cost=81808.31..81808.36 rows=5 width=12) (actual time=271376.201..271376.204 rows=5 loops=1)"
" -> HashAggregate (cost=81808.31..81879.63 rows=7132 width=12) (actual time=271376.200..271376.203 rows=5 loops=1)"
" -> Nested Loop (cost=0.00..81772.65 rows=7132 width=12) (actual time=5.128..269753.647 rows=1237707 loops=1)"
" Join Filter: _st_intersects(grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000.geom, st_transform(turkcell_data.geom, 2320))"
" -> Seq Scan on turkcell_data (cost=0.00..809.40 rows=3040 width=3045) (actual time=0.031..7.426 rows=3040 loops=1)"
" -> Index Scan using grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000_geom_gist on grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000 (cost=0.00..24.76 rows=7 width=124) (actual time=0.012..0.799 rows=647 loops=3040)"
" Index Cond: (geom && st_transform(turkcell_data.geom, 2320))"
"Total runtime: 271387.499 ms"
There are indexes on the geometry and cellid columns. I read that ORDER BY ... DESC LIMIT 1 works better than max(). However, since I have a GROUP BY clause, I don't think that applies. Is there a way to do this, or any other way to improve performance?
Table Definitions:
CREATE TABLE grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000
(
regionid numeric,
geom geometry(Geometry,2320),
cellid integer,
turkcell double precision
)
WITH (
OIDS=FALSE
);
ALTER TABLE grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000
OWNER TO postgres;
-- Index: grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000_cellid
-- DROP INDEX grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000_cellid;
CREATE INDEX grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000_cellid
ON grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000
USING btree
(cellid );
-- Index: grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000_geom_gist
-- DROP INDEX grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000_geom_gist;
CREATE INDEX grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000_geom_gist
ON grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000
USING gist
(geom );
CREATE TABLE turkcell_data
(
gid serial NOT NULL,
objectid_1 integer,
objectid integer,
neighbourh numeric,
endeks numeric,
coorx numeric,
coory numeric,
shape_leng numeric,
shape_le_1 numeric,
shape_area numeric,
geom geometry(MultiPolygon,4326),
CONSTRAINT turkcell_data_pkey PRIMARY KEY (gid )
)
WITH (
OIDS=FALSE
);
ALTER TABLE turkcell_data
OWNER TO postgres;
-- Index: turkcell_data_geom_gist
-- DROP INDEX turkcell_data_geom_gist;
CREATE INDEX turkcell_data_geom_gist
ON turkcell_data
USING gist
(geom );
Either store your data re-projected to 2320, index that column, and use it in your join, or create an index on the transformed projection of the geometry in turkcell_data. I usually prefer the latter:
CREATE INDEX turkcell_data_geom_gist2320
ON turkcell_data
USING gist
(st_transform(geom, 2320) );
The other issue might be if your geometries are very complex - if any of your polygons have a relatively large number of points you might get stuck crunching away on the intersection. Try the index first, though.
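With the expression index in place, the join can be written so that it uses exactly the indexed expression. The sketch below is the query from the question with the nested subqueries flattened. Whether the planner actually picks turkcell_data_geom_gist2320 depends on its cost estimates, so check with EXPLAIN ANALYZE.

```sql
-- Refresh statistics so the planner knows about the new expression index.
ANALYZE turkcell_data;

-- Same join as in the question; ST_Transform(b.geom, 2320) matches the
-- expression used in the index definition.
SELECT a.cellid,
       max(b.endeks) AS turkcell
FROM grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000 a
JOIN turkcell_data b
  ON ST_Intersects(a.geom, ST_Transform(b.geom, 2320))
GROUP BY a.cellid
LIMIT 5;
```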

postgreSQL get last ID in partitioned tables

My question is basically the same as this one, but I couldn't find an answer there. It also says "to be solved in the next release" and "easy for min/max scans":
PostgreSQL+table partitioning: inefficient max() and min()
CREATE TABLE mc_handstats
(
id integer NOT NULL DEFAULT nextval('mc_handst_id_seq'::regclass),
playerid integer NOT NULL,
CONSTRAINT mc_handst_pkey PRIMARY KEY (id)
);
The table is partitioned by playerid.
CREATE TABLE mc_handst_0000 ( CHECK ( playerid >= 0 AND playerid < 10000) ) INHERITS (mc_handst) TABLESPACE ssd01;
CREATE TABLE mc_handst_0010 ( CHECK ( playerid >= 10000 AND playerid < 30000) ) INHERITS (mc_handst) TABLESPACE ssd02;
CREATE TABLE mc_handst_0030 ( CHECK ( playerid >= 30000 AND playerid < 50000) ) INHERITS (mc_handst) TABLESPACE ssd03;
...
CREATE INDEX mc_handst_0000_PlayerID ON mc_handst_0000 (playerid);
CREATE INDEX mc_handst_0010_PlayerID ON mc_handst_0010 (playerid);
CREATE INDEX mc_handst_0030_PlayerID ON mc_handst_0030 (playerid);
...
plus a BEFORE INSERT trigger that routes rows by playerid.
I want to get the last id (I could also get the value from the sequence, but I am used to working with tables/columns), but PostgreSQL seems to scan every table rather naively:
EXPLAIN ANALYZE select max(id) from mc_handstats; (the real query runs forever)
"Aggregate (cost=9080859.04..9080859.05 rows=1 width=4) (actual time=181867.626..181867.626 rows=1 loops=1)"
" -> Append (cost=0.00..8704322.43 rows=150614644 width=4) (actual time=2.460..163638.343 rows=151134891 loops=1)"
" -> Seq Scan on mc_handstats (cost=0.00..0.00 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=1)"
" -> Seq Scan on mc_handst_0000 mc_handstats (cost=0.00..728523.69 rows=12580969 width=4) (actual time=2.457..10800.539 rows=12656647 loops=1)"
...
ALL TABLES
...
"Total runtime: 181867.819 ms"
EXPLAIN ANALYZE select max(id) from mc_handst_1000
"Aggregate (cost=83999.50..83999.51 rows=1 width=4) (actual time=1917.933..1917.933 rows=1 loops=1)"
" -> Seq Scan on mc_handst_1000 (cost=0.00..80507.40 rows=1396840 width=4) (actual time=0.007..1728.268 rows=1396717 loops=1)"
"Total runtime: 1918.494 ms"
The runtime on a single partition is a snap, but completely off the charts on the master table (PostgreSQL 9.2).
\d mc_handstats (only the indexes)
Indexes:
"mc_handst_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
"mc_handst_playerid_fkey" FOREIGN KEY (playerid) REFERENCES mc_players(id)
Triggers:
mc_handst_insert_trigger BEFORE INSERT ON mc_handstats FOR EACH ROW EXECUTE PROCEDURE mc_handst_insert_function()
Number of child tables: 20 (Use \d+ to list them.)
\d mc_handst_1000
Indexes:
"mc_handst_1000_playerid" btree (playerid)
Check constraints:
"mc_handst_1000_playerid_check" CHECK (playerid >= 1000000 AND playerid < 1100000)
Hm, no PK index on the child tables. I don't understand why max(id) is fairly fast on the child tables (as there is no index) and slow on the master table, but it seems I need to add an index on id to every child table as well. Maybe that solves it.
CREATE INDEX mc_handst_0010_ID ON mc_handst_0010 (id);
... plus many more ...
And everything is fine. It's still strange that it was fast on the child tables before, which made me think they were indexed, but I don't care too much.
Thanks for this!
The first thing you need to do is index all the child tables on (id) and see if max(id) is smart enough to do an index scan on each table. I think it should be, but I am not entirely sure.
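Rather than writing one CREATE INDEX per child table by hand, the indexes can be generated from the inheritance catalog. This is a sketch that assumes the parent is named mc_handstats, as in the \d output above. Adjust the name if your parent table is mc_handst.

```sql
-- Create an index on (id) for every direct child of the parent table.
DO $$
DECLARE
    child regclass;
BEGIN
    FOR child IN
        SELECT inhrelid::regclass
        FROM pg_inherits
        WHERE inhparent = 'mc_handstats'::regclass
    LOOP
        -- regclass values are rendered as properly quoted table names.
        EXECUTE format('CREATE INDEX ON %s (id)', child);
    END LOOP;
END
$$;

-- Then check whether the plan switches from Seq Scans to index scans.
EXPLAIN SELECT max(id) FROM mc_handstats;
```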
If not, here's what I would do: I would start with currval([sequence_name]) and work backward until a record is found. You could check blocks of 10 ids at a time, or similar, in what is essentially a sparse scan. This could be done with a recursive CTE like the following (again, it relies on indexes):
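-- Note: currval() only works after nextval() has been called in this session.
-- The aggregates sit in scalar subqueries because PostgreSQL does not allow
-- an aggregate directly in the recursive term of a recursive query.
WITH RECURSIVE ids AS (
SELECT (SELECT max(id) FROM mc_handst
WHERE id BETWEEN currval('mc_handst_id_seq') - 10 AND currval('mc_handst_id_seq')) AS max_id,
currval('mc_handst_id_seq') - 10 AS min_block
UNION ALL
SELECT (SELECT max(id) FROM mc_handst
WHERE id BETWEEN i.min_block - 10 AND i.min_block),
i.min_block - 10
FROM ids i
WHERE i.max_id IS NULL
)
SELECT max(max_id) FROM ids;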
That should do a sparse scan if the planner won't use an index once the partitions are indexed. In most cases it should only do one scan but it will repeat as necessary to find an id. Note that it might run forever on an empty table.
Assuming a parent table like this:
CREATE TABLE parent (
id integer NOT NULL DEFAULT nextval('parent_id_seq'::regclass),
... other columns ...
);
Whether you're using a rule or a trigger to divert the INSERTs into the child tables, immediately after the INSERT you may use:
SELECT currval('parent_id_seq'::regclass);
to get the last id inserted by your session, independently of concurrent INSERTs, each session having its own copy of the last sequence value it has obtained.
https://dba.stackexchange.com/questions/58497/return-id-from-partitioned-table-in-postgres