PostgreSQL is not using a straightforward index

I have a PostgreSQL 10.6 database on Amazon RDS. My table is like this:
CREATE TABLE dfo_by_quarter (
release_key int4 NOT NULL,
country varchar(100) NOT NULL,
product_group varchar(100) NOT NULL,
distribution_type varchar(100) NOT NULL,
"year" int2 NOT NULL,
"date" date NULL,
quarter int2 NOT NULL,
category varchar(100) NOT NULL,
units numeric(38,6) NOT NULL,
sales_value_eur numeric(38,6) NOT NULL,
sales_value_usd numeric(38,6) NOT NULL,
sales_value_local numeric(38,6) NOT NULL,
data_status bpchar(1) NOT NULL,
panel_market_units numeric(38,6) NOT NULL,
panel_market_sales_value_eur numeric(38,6) NOT NULL,
panel_market_sales_value_usd numeric(38,6) NOT NULL,
panel_market_sales_value_local numeric(38,6) NOT NULL,
CONSTRAINT pk_dpretailer_dfo_by_quarter PRIMARY KEY (release_key, country, category, product_group, distribution_type, year, quarter),
CONSTRAINT fk_dpretailer_dfo_by_quarter_release FOREIGN KEY (release_key) REFERENCES dpretailer.dfo_release(release_id)
);
I understand that a primary key implies a unique index.
If I simply ask how many rows I have when filtering on non-existent data (release_key = 1 returns nothing), I can see it uses the index:
EXPLAIN
SELECT COUNT(*)
FROM dpretailer.dfo_by_quarter
WHERE release_key = 1
Aggregate (cost=6.32..6.33 rows=1 width=8)
-> Index Only Scan using pk_dpretailer_dfo_by_quarter on dfo_by_quarter (cost=0.55..6.32 rows=1 width=0)
Index Cond: (release_key = 1)
But if I run the same query on a value that returns data, it scans the table, which is bound to be more expensive...
EXPLAIN
SELECT COUNT(*)
FROM dpretailer.dfo_by_quarter
WHERE release_key = 2
Finalize Aggregate (cost=47611.07..47611.08 rows=1 width=8)
-> Gather (cost=47610.86..47611.07 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=46610.86..46610.87 rows=1 width=8)
-> Parallel Seq Scan on dfo_by_quarter (cost=0.00..46307.29 rows=121428 width=0)
Filter: (release_key = 2)
I get that using the index when there is no data makes sense and is driven by the table's statistics (I ran ANALYZE before the tests).
But why not use my index when there is data?
Surely it must be quicker to scan part of an index (because release_key is the first column) than to scan the entire table?
I must be missing something...?
Update 2019-03-07
Thank you for your comments, which are very useful.
This simple query was just me trying to understand why the index was not used...
But I should have known better (I am new to PostgreSQL but have many years of experience with SQL Server), and it makes sense that it is not used, for the reasons you pointed out:
poor selectivity, because my criterion only matches about 20% of the rows
bad table design (too fat, which we knew and are now addressing)
index not "covering" the query, etc...
So let me change my question slightly, if I may...
Our table will be normalised in facts/dimensions (no more varchars in the wrong place).
We do only inserts, never updates and so few deletes that we can ignore it.
The table size will not be huge (on the order of tens of millions of rows).
Our queries will ALWAYS specify an exact release_key value.
Our new version of the table would look like this
CREATE TABLE dfo_by_quarter (
release_key int4 NOT NULL,
country_key int2 NOT NULL,
product_group_key int2 NOT NULL,
distribution_type_key int2 NOT NULL,
category_key int2 NOT NULL,
"year" int2 NOT NULL,
"date" date NULL,
quarter int2 NOT NULL,
units numeric(38,6) NOT NULL,
sales_value_eur numeric(38,6) NOT NULL,
sales_value_usd numeric(38,6) NOT NULL,
sales_value_local numeric(38,6) NOT NULL,
CONSTRAINT pk_milly_dfo_by_quarter PRIMARY KEY (release_key, country_key, category_key, product_group_key, distribution_type_key, year, quarter),
CONSTRAINT fk_milly_dfo_by_quarter_release FOREIGN KEY (release_key) REFERENCES dpretailer.dfo_release(release_id),
CONSTRAINT fk_milly_dim_dfo_category FOREIGN KEY (category_key) REFERENCES milly.dim_dfo_category(category_key),
CONSTRAINT fk_milly_dim_dfo_country FOREIGN KEY (country_key) REFERENCES milly.dim_dfo_country(country_key),
CONSTRAINT fk_milly_dim_dfo_distribution_type FOREIGN KEY (distribution_type_key) REFERENCES milly.dim_dfo_distribution_type(distribution_type_key),
CONSTRAINT fk_milly_dim_dfo_product_group FOREIGN KEY (product_group_key) REFERENCES milly.dim_dfo_product_group(product_group_key)
);
With that in mind, in a SQL Server environment, I could solve this by having a "Clustered" primary key (the entire table being sorted), or having an index on the primary key with INCLUDE option for the other columns required to cover the queries (Units, Values, etc).
Question 1)
In PostgreSQL, is there an equivalent to the SQL Server clustered index? A way to actually sort the entire table? I suppose it might be difficult because PostgreSQL does not do updates "in place", which could make keeping the table sorted expensive...
Or, is there a way to create something like a SQL Server Index WITH INCLUDE(units, values)?
Update: I came across the CLUSTER command, which I suppose is the closest thing.
It would be suitable for us.
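For reference, a sketch of what I have in mind (INCLUDE would require PostgreSQL 11 or later, which I am not on yet; the index name is made up):
-- Covering index similar to SQL Server's INCLUDE, PostgreSQL 11+ only (sketch):
CREATE INDEX idx_dfo_release_year_covering
    ON milly.dfo_by_quarter (release_key, "year", product_group_key)
    INCLUDE (units, sales_value_eur);
-- CLUSTER physically reorders the table once, takes an ACCESS EXCLUSIVE lock,
-- and is not maintained for rows inserted afterwards, so it has to be re-run:
CLUSTER milly.dfo_by_quarter USING pk_milly_dfo_by_quarter;
ANALYZE milly.dfo_by_quarter;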
Question 2)
With the query below
EXPLAIN (ANALYZE, BUFFERS)
WITH "rank_query" AS
(
SELECT
ROW_NUMBER() OVER(PARTITION BY "year" ORDER BY SUM("main"."units") DESC) AS "rank_by",
"year",
"main"."product_group_key" AS "productgroupkey",
SUM("main"."units") AS "salesunits",
SUM("main"."sales_value_eur") AS "salesvalue",
SUM("sales_value_eur")/SUM("units") AS "asp"
FROM "milly"."dfo_by_quarter" AS "main"
WHERE
"release_key" = 17 AND
"main"."year" >= 2010
GROUP BY
"year",
"main"."product_group_key"
)
,BeforeLookup
AS (
SELECT
"year" AS date,
SUM("salesunits") AS "salesunits",
SUM("salesvalue") AS "salesvalue",
SUM("salesvalue")/SUM("salesunits") AS "asp",
CASE WHEN "rank_by" <= 50 THEN "productgroupkey" ELSE -1 END AS "productgroupkey"
FROM
"rank_query"
GROUP BY
"year",
CASE WHEN "rank_by" <= 50 THEN "productgroupkey" ELSE -1 END
)
SELECT BL.date, BL.salesunits, BL.salesvalue, BL.asp
FROM BeforeLookup AS BL
INNER JOIN milly.dim_dfo_product_group PG ON PG.product_group_key = BL.productgroupkey;
I get this
Hash Join (cost=40883.82..40896.46 rows=558 width=98) (actual time=676.565..678.308 rows=663 loops=1)
Hash Cond: (bl.productgroupkey = pg.product_group_key)
Buffers: shared hit=483 read=22719
CTE rank_query
-> WindowAgg (cost=40507.15..40632.63 rows=5577 width=108) (actual time=660.076..668.272 rows=5418 loops=1)
Buffers: shared hit=480 read=22719
-> Sort (cost=40507.15..40521.09 rows=5577 width=68) (actual time=660.062..661.226 rows=5418 loops=1)
Sort Key: main.year, (sum(main.units)) DESC
Sort Method: quicksort Memory: 616kB
Buffers: shared hit=480 read=22719
-> Finalize HashAggregate (cost=40076.46..40160.11 rows=5577 width=68) (actual time=648.762..653.227 rows=5418 loops=1)
Group Key: main.year, main.product_group_key
Buffers: shared hit=480 read=22719
-> Gather (cost=38710.09..39909.15 rows=11154 width=68) (actual time=597.878..622.379 rows=11938 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=480 read=22719
-> Partial HashAggregate (cost=37710.09..37793.75 rows=5577 width=68) (actual time=594.044..600.494 rows=3979 loops=3)
Group Key: main.year, main.product_group_key
Buffers: shared hit=480 read=22719
-> Parallel Seq Scan on dfo_by_quarter main (cost=0.00..36019.74 rows=169035 width=22) (actual time=106.916..357.071 rows=137171 loops=3)
Filter: ((year >= 2010) AND (release_key = 17))
Rows Removed by Filter: 546602
Buffers: shared hit=480 read=22719
CTE beforelookup
-> HashAggregate (cost=223.08..238.43 rows=558 width=102) (actual time=676.293..677.167 rows=663 loops=1)
Group Key: rank_query.year, CASE WHEN (rank_query.rank_by <= 50) THEN (rank_query.productgroupkey)::integer ELSE '-1'::integer END
Buffers: shared hit=480 read=22719
-> CTE Scan on rank_query (cost=0.00..139.43 rows=5577 width=70) (actual time=660.079..672.978 rows=5418 loops=1)
Buffers: shared hit=480 read=22719
-> CTE Scan on beforelookup bl (cost=0.00..11.16 rows=558 width=102) (actual time=676.296..677.665 rows=663 loops=1)
Buffers: shared hit=480 read=22719
-> Hash (cost=7.34..7.34 rows=434 width=4) (actual time=0.253..0.253 rows=435 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 24kB
Buffers: shared hit=3
-> Seq Scan on dim_dfo_product_group pg (cost=0.00..7.34 rows=434 width=4) (actual time=0.017..0.121 rows=435 loops=1)
Buffers: shared hit=3
Planning time: 0.319 ms
Execution time: 678.714 ms
Does anything spring to mind?
If I read it properly, it means my biggest cost by far is the initial scan of the table... but I can't manage to make it use an index...
I had created an index I hoped would help but it got ignored...
CREATE INDEX eric_silly_index ON milly.dfo_by_quarter(release_key, YEAR, date, product_group_key, units, sales_value_eur);
ANALYZE milly.dfo_by_quarter;
I also tried to cluster the table but no visible effect either
CLUSTER milly.dfo_by_quarter USING pk_milly_dfo_by_quarter; -- took 30 seconds (uidev)
ANALYZE milly.dfo_by_quarter;
Many thanks
Eric

Because release_key isn't actually a unique column, it's not possible from the information you've provided to know whether or not the index should be used. If a high percentage of rows have release_key = 2, or even a smaller percentage of rows on a large table, it may not be efficient to use the index.
In part this is because Postgres indexes are indirect -- that is the index actually contains a pointer to the location on disk in the heap where the real tuple lives. So looping through an index requires reading an entry from the index, reading the tuple from the heap, and repeating. For a large number of tuples it's often more valuable to scan the heap directly and avoid the indirect disk access penalty.
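If you want to see what the planner thinks the index-based plan would cost, a quick session-local experiment is to discourage sequential scans and compare the estimates (a diagnostic sketch only; remember to reset it):
SET enable_seqscan = off;   -- discourage seq scans for this session only
EXPLAIN SELECT COUNT(*) FROM dpretailer.dfo_by_quarter WHERE release_key = 2;
RESET enable_seqscan;       -- restore the default behaviour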
Edit:
You generally don't want to be using CLUSTER in PostgreSQL; it's not how indexes are maintained, and it's rare to see that in the wild for that reason.
Your updated query with no data gives this plan:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
CTE Scan on beforelookup bl (cost=8.33..8.35 rows=1 width=98) (actual time=0.143..0.143 rows=0 loops=1)
Buffers: shared hit=4
CTE rank_query
-> WindowAgg (cost=8.24..8.26 rows=1 width=108) (actual time=0.126..0.126 rows=0 loops=1)
Buffers: shared hit=4
-> Sort (cost=8.24..8.24 rows=1 width=68) (actual time=0.060..0.061 rows=0 loops=1)
Sort Key: main.year, (sum(main.units)) DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=4
-> GroupAggregate (cost=8.19..8.23 rows=1 width=68) (actual time=0.011..0.011 rows=0 loops=1)
Group Key: main.year, main.product_group_key
Buffers: shared hit=1
-> Sort (cost=8.19..8.19 rows=1 width=64) (actual time=0.011..0.011 rows=0 loops=1)
Sort Key: main.year, main.product_group_key
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=1
-> Index Scan using pk_milly_dfo_by_quarter on dfo_by_quarter main (cost=0.15..8.18 rows=1 width=64) (actual time=0.003..0.003 rows=0 loops=1)
Index Cond: ((release_key = 17) AND (year >= 2010))
Buffers: shared hit=1
CTE beforelookup
-> HashAggregate (cost=0.04..0.07 rows=1 width=102) (actual time=0.128..0.128 rows=0 loops=1)
Group Key: rank_query.year, CASE WHEN (rank_query.rank_by <= 50) THEN (rank_query.productgroupkey)::integer ELSE '-1'::integer END
Buffers: shared hit=4
-> CTE Scan on rank_query (cost=0.00..0.03 rows=1 width=70) (actual time=0.127..0.127 rows=0 loops=1)
Buffers: shared hit=4
Planning Time: 0.723 ms
Execution Time: 0.485 ms
(27 rows)
So PostgreSQL is entirely capable of using the index for your query, but the planner is deciding that it's not worth it (i.e., the cost of using the index directly is higher than the cost of the parallel sequential scan).
If you set enable_indexscan = off; with no data, you get a bitmap index scan (as I'd expect). If you set enable_bitmapscan = off; with no data, you get a (non-parallel) sequential scan.
You should see the plan change back (with large amounts of data) if you set max_parallel_workers = 0;.
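For reference, these are the session-level switches I mean (a sketch; reset them afterwards):
SET enable_indexscan = off;   -- planner falls back to a bitmap index scan
SET enable_bitmapscan = off;  -- now you get a (non-parallel) sequential scan
SET max_parallel_workers = 0; -- with lots of data, the plan should change back
RESET enable_indexscan;
RESET enable_bitmapscan;
RESET max_parallel_workers;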
But looking at your query's explain results, I'd very much expect using the index to be more expensive and take longer than using the parallel sequential scan. In your updated query you're still scanning a very high percentage of the table and a large number of rows, and you're also forcing heap access by selecting fields that are not in the index. Postgres 11 (I believe) adds covering indexes, which would theoretically allow you to make this query be driven by the index alone, but I'm not at all convinced that in this example it would actually be worth it.

Generally, while possible, a PK spanning 7 columns, several of which are varchar(100), is not optimized for performance, to say the least.
Such an index is large to begin with and tends to bloat quickly, if you have updates on involved columns.
I would operate with a surrogate PK, a serial (or bigserial if you have that many rows). Or IDENTITY. See:
Auto increment table column
And a UNIQUE constraint on all 7 to enforce uniqueness (all are NOT NULL anyway).
If you have lots of counting queries with the only predicate on release_key consider an additional plain btree index on just that column.
The data type varchar(100) for so many columns may not be optimal. Some normalization might help.
More advice depends on missing information ...
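A minimal sketch of that layout, using the normalised columns from your updated table (the identity column and index names are illustrative):
CREATE TABLE dfo_by_quarter (
    dfo_by_quarter_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    release_key           int4 NOT NULL,
    country_key           int2 NOT NULL,
    category_key          int2 NOT NULL,
    product_group_key     int2 NOT NULL,
    distribution_type_key int2 NOT NULL,
    "year"                int2 NOT NULL,
    quarter               int2 NOT NULL,
    units                 numeric(38,6) NOT NULL,
    -- ... remaining measure columns as in the original definition ...
    CONSTRAINT uq_dfo_by_quarter UNIQUE
        (release_key, country_key, category_key, product_group_key,
         distribution_type_key, "year", quarter)
);
-- Plain btree index for counting queries whose only predicate is release_key:
CREATE INDEX idx_dfo_by_quarter_release_key ON dfo_by_quarter (release_key);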

The answer to my initial question (why is PostgreSQL not using my index for something like SELECT COUNT(*)?) can be found in the documentation:
Introduction to VACUUM, ANALYZE, EXPLAIN, and COUNT
In particular: This means that every time a row is read from an index, the engine has to also read the actual row in the table to ensure that the row hasn't been deleted.
This explains a lot why I don't manage to get postgresql to use my indexes when, from a SQL Server perspective, it obviously "should".
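A related, hedged note: index-only scans can avoid most of those heap visits once the visibility map is up to date, which is something VACUUM maintains; the "Heap Fetches" line in EXPLAIN ANALYZE shows how often the heap still had to be checked. A minimal sketch:
VACUUM (ANALYZE) dpretailer.dfo_by_quarter;  -- refresh the visibility map and statistics
EXPLAIN (ANALYZE)
SELECT COUNT(*) FROM dpretailer.dfo_by_quarter WHERE release_key = 2;
-- Look for "Heap Fetches: N" under an Index Only Scan node; a low value means
-- most visibility checks were answered from the visibility map.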

Related

Optimising a Prisma-based PostgreSQL query with indexes or DB settings

I'm trying to optimize a pagination query that runs on a table of Videos joined with Channels, for a project that uses the Prisma ORM.
I don't have a lot of experience adding database indexes to optimize performance and could use some guidance on whether I missed something obvious or I'm simply constrained by the database server hardware.
When extracted, the query that Prisma runs looks like this and takes at least 4-10 seconds to run on 136k videos even after I added some indexes:
SELECT
"public"."Video"."id",
"public"."Video"."youtubeId",
"public"."Video"."channelId",
"public"."Video"."type",
"public"."Video"."status",
"public"."Video"."reviewed",
"public"."Video"."category",
"public"."Video"."youtubeTags",
"public"."Video"."language",
"public"."Video"."title",
"public"."Video"."description",
"public"."Video"."duration",
"public"."Video"."durationSeconds",
"public"."Video"."viewCount",
"public"."Video"."likeCount",
"public"."Video"."commentCount",
"public"."Video"."scheduledStartTime",
"public"."Video"."actualStartTime",
"public"."Video"."actualEndTime",
"public"."Video"."sortTime",
"public"."Video"."createdAt",
"public"."Video"."updatedAt",
"public"."Video"."publishedAt"
FROM
"public"."Video",
(
SELECT
"public"."Video"."sortTime" AS "Video_sortTime_0"
FROM
"public"."Video"
WHERE ("public"."Video"."id") = (29949)
) AS "order_cmp"
WHERE (
("public"."Video"."id") IN(
SELECT
"t0"."id" FROM "public"."Video" AS "t0"
INNER JOIN "public"."Channel" AS "j0" ON ("j0"."id") = ("t0"."channelId")
WHERE (
(NOT "j0"."status" IN('HIDDEN', 'ARCHIVED'))
AND "t0"."id" IS NOT NULL)
)
AND "public"."Video"."status" IN('UPCOMING', 'LIVE', 'PUBLISHED')
AND "public"."Video"."sortTime" <= "order_cmp"."Video_sortTime_0")
ORDER BY
"public"."Video"."sortTime" DESC OFFSET 0;
I haven't yet defined the indexes in my Prisma schema file, I've just been setting them directly on the database. These are the current indexes on the Video and Channel tables:
CREATE UNIQUE INDEX "Video_youtubeId_key" ON public."Video" USING btree ("youtubeId") CREATE INDEX "Video_status_idx" ON public."Video" USING btree (status)
CREATE INDEX "Video_sortTime_idx" ON public."Video" USING btree ("sortTime" DESC)
CREATE UNIQUE INDEX "Video_pkey" ON public."Video" USING btree (id)
CREATE INDEX "Video_channelId_idx" ON public."Video" USING btree ("channelId")
CREATE UNIQUE INDEX "Channel_youtubeId_key" ON public."Channel" USING btree ("youtubeId")
CREATE UNIQUE INDEX "Channel_pkey" ON public."Channel" USING btree (id)
EXPLAIN (ANALYZE,BUFFERS) for the query shows this (analyzer tool link):
Sort (cost=114760.67..114867.67 rows=42801 width=1071) (actual time=4115.144..4170.368 rows=13943 loops=1)
Sort Key: ""Video"".""sortTime"" DESC
Sort Method: external merge Disk: 12552kB
Buffers: shared hit=19049 read=54334 dirtied=168, temp read=1569 written=1573
I/O Timings: read=11229.719
-> Nested Loop (cost=39030.38..91423.62 rows=42801 width=1071) (actual time=2720.873..4037.549 rows=13943 loops=1)
Join Filter: (""Video"".""sortTime"" <= ""Video_1"".""sortTime"")
Rows Removed by Join Filter: 115529
Buffers: shared hit=19049 read=54334 dirtied=168
I/O Timings: read=11229.719
-> Index Scan using ""Video_pkey"" on ""Video"" ""Video_1"" (cost=0.42..8.44 rows=1 width=8) (actual time=0.852..1.642 rows=1 loops=1)
Index Cond: (id = 29949)
Buffers: shared hit=2 read=2
I/O Timings: read=0.809
-> Gather (cost=39029.96..89810.14 rows=128404 width=1071) (actual time=2719.274..4003.170 rows=129472 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=19047 read=54332 dirtied=168
I/O Timings: read=11228.910
-> Parallel Hash Semi Join (cost=38029.96..75969.74 rows=53502 width=1071) (actual time=2695.849..3959.412 rows=43157 loops=3)
Hash Cond: (""Video"".id = t0.id)
Buffers: shared hit=19047 read=54332 dirtied=168
I/O Timings: read=11228.910
-> Parallel Seq Scan on ""Video"" (cost=0.00..37202.99 rows=53938 width=1071) (actual time=0.929..1236.450 rows=43157 loops=3)
Filter: (status = ANY ('{UPCOMING,LIVE,PUBLISHED}'::""VideoStatus""[]))
Rows Removed by Filter: 3160
Buffers: shared hit=9289 read=27118
I/O Timings: read=3526.407
-> Parallel Hash (cost=37312.18..37312.18 rows=57422 width=4) (actual time=2692.172..2692.180 rows=46084 loops=3)
Buckets: 262144 Batches: 1 Memory Usage: 7520kB
Buffers: shared hit=9664 read=27214 dirtied=168
I/O Timings: read=7702.502
-> Hash Join (cost=173.45..37312.18 rows=57422 width=4) (actual time=3.485..2666.998 rows=46084 loops=3)
Hash Cond: (t0.""channelId"" = j0.id)
Buffers: shared hit=9664 read=27214 dirtied=168
I/O Timings: read=7702.502
-> Parallel Seq Scan on ""Video"" t0 (cost=0.00..36985.90 rows=57890 width=8) (actual time=1.774..2646.207 rows=46318 loops=3)
Filter: (id IS NOT NULL)
Buffers: shared hit=9193 read=27214 dirtied=168
I/O Timings: read=7702.502
-> Hash (cost=164.26..164.26 rows=735 width=4) (actual time=1.132..1.136 rows=735 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 34kB
Buffers: shared hit=471
-> Seq Scan on ""Channel"" j0 (cost=0.00..164.26 rows=735 width=4) (actual time=0.024..0.890 rows=735 loops=3)
Filter: (status <> ALL ('{HIDDEN,ARCHIVED}'::""ChannelStatus""[]))
Rows Removed by Filter: 6
Buffers: shared hit=471
Planning Time: 8.134 ms
Execution Time: 4173.202 ms
Now, a hint from that same tool suggests that it needed to use disk space for sorting because my work_mem setting is too low (it needed 12560kB, and on my Lightsail Postgres DB with 1 GB of RAM it's set to '4M').
I'm a bit nervous about bumping work_mem to something like 16 or even 24M on a whim. Is that too much for my server's total RAM? Does this look like my root problem? Is there anything else I can do with indexes or my query?
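For reference, one way I could test this without changing the server-wide setting is to raise work_mem just for a single transaction (a sketch):
BEGIN;
SET LOCAL work_mem = '16MB';          -- applies only inside this transaction
-- re-run the EXPLAIN (ANALYZE, BUFFERS) from above here and check whether
-- "Sort Method: external merge  Disk: ..." changes to an in-memory quicksort
ROLLBACK;                             -- the setting is discarded with the transaction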
If it helps, the actual Prisma query looks like this:
const videos = await ctx.prisma.video.findMany({
  where: {
    channel: {
      NOT: {
        status: {
          in: [ChannelStatus.HIDDEN, ChannelStatus.ARCHIVED],
        },
      },
    },
    status: {
      in: [VideoStatus.UPCOMING, VideoStatus.LIVE, VideoStatus.PUBLISHED],
    },
  },
  include: {
    channel: {
      include: {
        links: true,
      },
    },
  },
  cursor: _args.cursor
    ? {
        id: _args.cursor,
      }
    : undefined,
  skip: _args.cursor ? 1 : 0,
  orderBy: {
    sortTime: 'desc',
  },
  take: Math.min(_args.limit, config.GRAPHQL_MAX_RECENT_VIDEOS),
});
Even if I eliminate the join with the Channel table from the Prisma query entirely, the performance doesn't improve by much and a query still takes 7-8 seconds to run.
This query is a bit of an ORM-generated nested-select mess. Nested selects get in the way of the optimizer; joins are usually better.
If written by hand, the query would be something like this.
select *
from video
join channel on channel.id = video.channelId
where video.status in('UPCOMING', 'LIVE', 'PUBLISHED')
-- Not clear what this is for, might be wacky pagination?
and video.sortTime <= (
select sortTime from video where id = 29949
)
and not channel.status in('HIDDEN', 'ARCHIVED')
order by sortTime desc
offset 0
limit 100
Pretty straightforward. Much easier to understand and optimize.
Same as below, this query would benefit from a single composite index on sortTime, status.
And since you're paginating, using limit to only get as many rows as you need in a page can drastically help with performance. Otherwise Postgres will do all the work to calculate all rows.
The performance is getting killed by multiple sequential scans of video.
-> Parallel Seq Scan on ""Video"" (cost=0.00..37202.99 rows=53938 width=1071) (actual time=0.929..1236.450 rows=43157 loops=3)
Filter: (status = ANY ('{UPCOMING,LIVE,PUBLISHED}'::""VideoStatus""[]))
Rows Removed by Filter: 3160
Buffers: shared hit=9289 read=27118
I/O Timings: read=3526.407
-> Parallel Seq Scan on ""Video"" t0 (cost=0.00..36985.90 rows=57890 width=8) (actual time=1.774..2646.207 rows=46318 loops=3)
Filter: (id IS NOT NULL)
Buffers: shared hit=9193 read=27214 dirtied=168
I/O Timings: read=7702.502
Looking at the where clause...
WHERE (
("public"."Video"."id") IN(
SELECT
"t0"."id" FROM "public"."Video" AS "t0"
INNER JOIN "public"."Channel" AS "j0" ON ("j0"."id") = ("t0"."channelId")
WHERE (
(NOT "j0"."status" IN('HIDDEN', 'ARCHIVED'))
AND "t0"."id" IS NOT NULL)
)
AND "public"."Video"."status" IN('UPCOMING', 'LIVE', 'PUBLISHED')
AND "public"."Video"."sortTime" <= "order_cmp"."Video_sortTime_0")
But you have indexes on video.id and video.status. Why is it doing a seq scan?
In general, Postgres will only use one index per query. Your query needs to check three columns: id, status, and sortTime. Postgres can only use one index, so it uses the one on sortTime and has to seq scan for the rest.
To solve this, try creating a single composite index on both sortTime and status. This will allow Postgres to use an index for both the status and sortTime parts of the query.
create index "Video_sortTime_status_idx" on public."Video" ("sortTime", status)
With this index the separate sortTime index is no longer necessary, drop it.
The second seq scan is from "t0"."id" IS NOT NULL. "t0" is the Video table. "id" is its primary key. It should be impossible for a primary key to be null, so remove that.
I don't think any index or db setting is going to improve your existing query by much.
Two small changes to the existing query do get it to use the "sortTime" index for ordering, but I don't know if you can influence Prisma to make them. One is to add an explicit LIMIT (although I don't know whether that is necessary; I don't know how to test it with the "take" method instead of the LIMIT method), and the other is to move the 29949 subquery out of the join and put it directly into the WHERE:
AND "public"."Video"."sortTime" <= (
SELECT
"public"."Video"."sortTime" AS "Video_sortTime_0"
FROM
"public"."Video"
WHERE ("public"."Video"."id") = (29949)
)
But if Prisma allows you to inject custom queries, I would just rewrite it from scratch along the lines Schwern has suggested.
An improvement in the PostgreSQL planner might get it to work without moving the subquery, but even if we knew exactly what to change and had a high-quality implementation of it and could convince people it was a trade-off free improvement, it would still not be released for over a year (in v16), so it wouldn't help you immediately and I wouldn't have much hope for getting it accepted anyway.

Improve PostgreSQL query response

I have one table with about 100K rows, and it keeps growing and growing.
My queries have a bad response time, and it affects my front-end user experience.
I want to ask for your help to improve my response time from the DB.
Today PostgreSQL runs on my local machine, a 2019 13-inch MacBook Pro with 16 GB of RAM and an i5 CPU.
In the future I will put this DB in Docker and run it on a better server.
What can I do to improve it for now?
Table Structure:
CREATE TABLE dots
(
dot_id INT,
site_id INT,
latitude float ( 6 ),
longitude float ( 6 ),
rsrp float ( 6 ),
dist INT,
project_id INT,
dist_from_site INT,
geom geometry,
dist_from_ref INT,
file_name VARCHAR
);
The dot_id resets after each bulk insert, once per "file_name" value.
Table Dots:
The queries:
Query #1:
await db.query(
`select MAX(rsrp) FROM dots where site_id=$1 and ${table}=$2 and project_id = $3 and file_name ilike $4`,
[site_id, dist, project_id, filename]
);
Time for response: 200ms
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=37159.88..37159.89 rows=1 width=4) (actual time=198.416..201.762 rows=1 loops=1)
Buffers: shared hit=16165 read=16031
-> Gather (cost=37159.66..37159.87 rows=2 width=4) (actual time=198.299..201.752 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=16165 read=16031
-> Partial Aggregate (cost=36159.66..36159.67 rows=1 width=4) (actual time=179.009..179.010 rows=1 loops=3)
Buffers: shared hit=16165 read=16031
-> Parallel Seq Scan on dots (cost=0.00..36150.01 rows=3861 width=4) (actual time=122.889..178.817 rows=1088 loops=3)
Filter: (((file_name)::text ~~* 'BigFile'::text) AND (site_id = 42047) AND (dist_from_ref = 500) AND (project_id = 1))
Rows Removed by Filter: 157073
Buffers: shared hit=16165 read=16031
Planning Time: 0.290 ms
Execution Time: 201.879 ms
(14 rows)
Query #2:
await db.query(
`SELECT DISTINCT (${table}) FROM dots where site_id=$1 and project_id = $2 and file_name ilike $3 order by ${table}`,
[site_id, project_id, filename]
);
Time for response: 1100ms
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Sort (cost=41322.12..41322.31 rows=77 width=4) (actual time=1176.071..1176.077 rows=66 loops=1)
Sort Key: dist_from_ref
Sort Method: quicksort Memory: 28kB
Buffers: shared hit=16175 read=16021
-> HashAggregate (cost=41318.94..41319.71 rows=77 width=4) (actual time=1176.024..1176.042 rows=66 loops=1)
Group Key: dist_from_ref
Batches: 1 Memory Usage: 24kB
Buffers: shared hit=16175 read=16021
-> Seq Scan on dots (cost=0.00..40499.42 rows=327807 width=4) (actual time=0.423..1066.316 rows=326668 loops=1)
Filter: (((file_name)::text ~~* 'BigFile'::text) AND (site_id = 42047) AND (project_id = 1))
Rows Removed by Filter: 147813
Buffers: shared hit=16175 read=16021
Planning:
Buffers: shared hit=5 dirtied=1
Planning Time: 0.242 ms
Execution Time: 1176.125 ms
(16 rows)
Query #3:
await db.query(
`SELECT count(*) FROM dots WHERE site_id = $1 AND ${table} = $2 and project_id = $3 and file_name ilike $4`,
[site_id, dist, project_id, filename]
);
Time for response: 200ms
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=37159.88..37159.89 rows=1 width=8) (actual time=198.725..202.335 rows=1 loops=1)
Buffers: shared hit=16160 read=16036
-> Gather (cost=37159.66..37159.87 rows=2 width=8) (actual time=198.613..202.328 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=16160 read=16036
-> Partial Aggregate (cost=36159.66..36159.67 rows=1 width=8) (actual time=179.182..179.183 rows=1 loops=3)
Buffers: shared hit=16160 read=16036
-> Parallel Seq Scan on dots (cost=0.00..36150.01 rows=3861 width=0) (actual time=119.340..179.020 rows=1088 loops=3)
Filter: (((file_name)::text ~~* 'BigFile'::text) AND (site_id = 42047) AND (dist_from_ref = 500) AND (project_id = 1))
Rows Removed by Filter: 157073
Buffers: shared hit=16160 read=16036
Planning Time: 0.109 ms
Execution Time: 202.377 ms
(14 rows)
The tables do not have any indexes.
I added an index and it helped a bit... create index idx1 on dots (site_id, project_id, file_name, dist_from_site,dist_from_ref)
OK, that's a bit too much.
An index on columns (a,b) is useful for "where a=..." and also for "where a=... and b=..." but it is not really useful for "where b=...". Creating an index with many columns uses more disk space and is slower than with fewer columns, which is a waste if the extra columns in the index don't make your queries faster. Both dist_ columns in the index will probably not be used.
Indices are a compromise: if your table has small rows, like two integer columns, and you create an index on these two columns, then it will be as big as the table, so you'd better be sure you need it. But in your case, your rows are pretty large at around 5kB and the number of rows is small, so adding an index (or several) on small int columns costs very little overhead.
So, since you very often use WHERE conditions on both site_id and project_id you can create an index on site_id,project_id. This will also work for a WHERE condition on site_id alone. If you sometimes use project_id alone, you can swap the order of the columns so it appears first, or even create another index.
You say the cardinality of these columns is about 30, so selecting on one value of site_id or project_id should hit 1/30 or 3.3% of the table, and selecting on both columns should hit 0.1% of the table, if they are uncorrelated and evenly distributed. This should already result in a substantial speedup.
You could also add an index on dist_from_site, and another one on dist_from_ref, if they have good selectivity (i.e., high cardinality in those columns). Postgres can combine indices with a bitmap index scan. But if, say, 50% of the rows in the table have the same value for dist_from_site, then an index will be useless for that value, due to not having enough selectivity.
You could also replace the previous 2-column index with 2 indices on site_id,project_id,dist_from_site and site_id,project_id,dist_from_ref. You can try it, see if it is worth the extra resources.
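For concreteness, a sketch of those suggestions (index names are made up; keep only the variants that match the predicates you actually run):
-- Equality columns you almost always filter on:
CREATE INDEX dots_site_project_idx ON dots (site_id, project_id);
-- Or the wider variants, one per dist_ column, if those columns are selective enough:
CREATE INDEX dots_site_project_dist_site_idx ON dots (site_id, project_id, dist_from_site);
CREATE INDEX dots_site_project_dist_ref_idx  ON dots (site_id, project_id, dist_from_ref);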
Also there's the filename column and ILIKE. ILIKE can't use an index, which means it's slow. One solution is to use an expression index
CREATE INDEX dots_filename ON dots( lower(file_name) );
and replace your where condition with:
lower(file_name) like lower($4)
This will use the index unless the parameter $4 starts with a "%". And if you never use the LIKE '%' wildcards, and you're just using ILIKE for case-insensitive comparison, then you can replace LIKE with the = operator. Basically lower(a) = lower(b) is a case-insensitive comparison.
Likewise you could replace the previous 2-column index with an index on site_id, project_id, lower(file_name) if you often use these three together in the WHERE condition. But as said above, it won't optimize searches on file_name alone.
Since your rows are huge, even adding 1 kB of index per row will only add 20% overhead to your table, so you can overdo it without too much trouble. So go ahead and experiment, you'll see what works best.

Postgresql 10 query with large(?) IN lists in filters

I am working on a relatively simple query:
SELECT row.id, row.name FROM things AS row
WHERE row.type IN (
'00000000-0000-0000-0000-000000031201',
...
)
ORDER BY row.name ASC, row.id ASC
LIMIT 2000;
The problem:
the query is fine if the list contains 25 or less UUIDs:
Limit (cost=21530.51..21760.51 rows=2000 width=55) (actual time=5.057..7.780 rows=806 loops=1)
-> Gather Merge (cost=21530.51..36388.05 rows=129196 width=55) (actual time=5.055..6.751 rows=806 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Sort (cost=20530.50..20853.49 rows=129196 width=55) (actual time=2.273..2.546 rows=403 loops=2)
Sort Key: name, id
Sort Method: quicksort Memory: 119kB
-> Parallel Index Only Scan using idx_things_type_name_id on things row (cost=0.69..9562.28 rows=129196 width=55) (actual time=0.065..0.840 rows=403 loops=2)
Index Cond: (type = ANY ('{00000000-0000-0000-0000-000000031201,... (< 24 more)}'::text[]))
Heap Fetches: 0
Planning time: 0.202 ms
Execution time: 8.485 ms
but once the list grows larger than 25 elements a different index is used and the query execution time really goes up:
Limit (cost=1000.58..15740.63 rows=2000 width=55) (actual time=11.553..29789.670 rows=952 loops=1)
-> Gather Merge (cost=1000.58..2400621.01 rows=325592 width=55) (actual time=11.551..29855.053 rows=952 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Index Scan using idx_things_name_id on things row (cost=0.56..2362039.59 rows=135663 width=55) (actual time=3.570..24437.039 rows=317 loops=3)
Filter: ((type)::text = ANY ('{00000000-0000-0000-0000-000000031201,... (> 24 more)}'::text[]))
Rows Removed by Filter: 5478258
Planning time: 0.209 ms
Execution time: 29857.454 ms
Details:
The table contains 16435726 rows, 17 columns. The 3 columns relevant to the query are:
id - varchar(36), not null, unique, primary key
type - varchar(36), foreign key
name - varchar(2000)
The relevant indexes are:
create unique index idx_things_pkey on things (id);
create index idx_things_type on things (type);
create index idx_things_name_id on things (name, id);
create index idx_things_type_name_id on things (type, name, id);
there are 70 different type values of which 2 account for ~15 million rows. Those two are NOT in the IN list.
Experiments and questions:
I started by checking if this index helps:
create index idx_things_name_id_type ON things (name, id, type);
It did, but only slightly; 12s is not acceptable:
Limit (cost=1000.71..7638.73 rows=2000 width=55) (actual time=5.888..12120.907 rows=952 loops=1)
-> Gather Merge (cost=1000.71..963238.21 rows=289917 width=55) (actual time=5.886..12154.580 rows=952 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Index Only Scan using idx_things_name_id_type on things row (cost=0.69..928774.57 rows=120799 width=55) (actual time=1.024..9852.923 rows=317 loops=3)
Filter: ((type)::text = ANY ('{00000000-0000-0000-0000-000000031201,... 37 more}'::text[]))
Rows Removed by Filter: 5478258
Heap Fetches: 0
Planning time: 0.638 ms
Execution time: 12156.817 ms
I know that large IN lists are not efficient in Postgres, but I was surprised to hit this at as few as 25 elements. Or is the issue here something else?
I tried the solutions suggested in other posts (inner join to a VALUES list, change IN to IN VALUES, ...), but it made things even worse. Here is an example of one of the experiments:
SELECT row.id, row.name
FROM things AS row
WHERE row.type IN (VALUES ('00000000-0000-0000-0000-000000031201'), ... )
ORDER BY row.name ASC, row.id ASC
LIMIT 2000;
Limit (cost=0.56..1254.91 rows=2000 width=55) (actual time=45.718..847919.632 rows=952 loops=1)
-> Nested Loop Semi Join (cost=0.56..10298994.72 rows=16421232 width=55) (actual time=45.714..847917.788 rows=952 loops=1)
Join Filter: ((row.type)::text = "*VALUES*".column1)
Rows Removed by Join Filter: 542360414
-> Index Scan using idx_things_name_id on things row (cost=0.56..2170484.38 rows=16421232 width=92) (actual time=0.132..61387.582 rows=16435726 loops=1)
-> Materialize (cost=0.00..0.58 rows=33 width=32) (actual time=0.001..0.022 rows=33 loops=16435726)
-> Values Scan on "*VALUES*" (cost=0.00..0.41 rows=33 width=32) (actual time=0.004..0.030 rows=33 loops=1)
Planning time: 1.131 ms
Execution time: 847920.680 ms
(9 rows)
Query plan from inner join values():
Limit (cost=0.56..1254.91 rows=2000 width=55) (actual time=38.289..847714.160 rows=952 loops=1)
-> Nested Loop (cost=0.56..10298994.72 rows=16421232 width=55) (actual time=38.287..847712.333 rows=952 loops=1)
Join Filter: ((row.type)::text = "*VALUES*".column1)
Rows Removed by Join Filter: 542378006
-> Index Scan using idx_things_name_id on things row (cost=0.56..2170484.38 rows=16421232 width=92) (actual time=0.019..60303.676 rows=16435726 loops=1)
-> Materialize (cost=0.00..0.58 rows=33 width=32) (actual time=0.001..0.022 rows=33 loops=16435726)
-> Values Scan on "*VALUES*" (cost=0.00..0.41 rows=33 width=32) (actual time=0.002..0.029 rows=33 loops=1)
Planning time: 0.247 ms
Execution time: 847715.215 ms
(9 rows)
Am I doing something incorrectly here?
Any tips on how to handle this?
If any more info is needed I will add it as you guys ask.
ps. The column/table/index names were "anonymised" to comply with the company policy so please do not point to the stupid names :)
I actually figured out what is going on. The Postgres planner is correct in its decision, but it is basing it on imperfect statistics. The key lines of the query plans are these:
Below 25 UUIDs:
-> Gather Merge (cost=21530.51..36388.05 **rows=129196** width=55)
(actual time=5.055..6.751 **rows=806** loops=1)
Overestimated by ~160 times
Over 25 UUIDs:
-> Gather Merge (cost=1000.58..2400621.01 **rows=325592** width=55)
(actual time=11.551..29855.053 **rows=952** loops=1)
Overestimated by ~342(!) times
If this were indeed 325592, then using the index on (name, id), which is already sorted as needed, and filtering from it could be the most efficient approach. But because of the overestimation, Postgres needs to remove over 5M rows to fetch the full result:
**Rows Removed by Filter: 5478258**
I guess Postgres figures that sorting 325592 rows (query with > 25 UUIDs) would be so expensive that it is more beneficial to use the already-sorted index, whereas the 129196 rows (query with < 25 UUIDs) can be sorted in memory.
I took a peek into pg_stats and the statistics were quite unhelpful.
This is because a few types that are not present in the query occur so often that the UUIDs that do occur fall into the histogram and are overestimated.
Increasing the statistics target for this column:
ALTER TABLE things ALTER COLUMN type SET STATISTICS 1000;
Solved the issue.
Now the query executes in 8ms even for UUID lists with more than 25 elements.
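Worth spelling out: the new per-column target only takes effect the next time the table is analyzed, so the full sequence is:
ALTER TABLE things ALTER COLUMN type SET STATISTICS 1000;
ANALYZE things;   -- re-collect statistics so the planner sees the larger sample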
UPDATE:
Because of claims that the ::text cast visible in the query plan is the culprit, I went to the test server and ran:
ALTER TABLE things ALTER COLUMN type TYPE text;
Now there is no cast, but nothing has changed:
Limit (cost=1000.58..20590.31 rows=2000 width=55) (actual time=12.761..30611.878 rows=952 loops=1)
-> Gather Merge (cost=1000.58..2386715.56 rows=243568 width=55) (actual time=12.759..30682.784 rows=952 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Index Scan using idx_things_name_id on things row (cost=0.56..2357601.74 rows=101487 width=55) (actual time=3.994..24190.693 rows=317 loops=3)
Filter: (type = ANY ('{00000000-0000-0000-0000-000000031201,... (> 24 more)}'::text[]))
Rows Removed by Filter: 5478258
Planning time: 0.227 ms
Execution time: 30685.092 ms
(9 rows)
This is related to how PostgreSQL chooses whether or not to use the subquery JOIN.
The query optimizer is free to decide how it acts (whether or not it uses indexes) based on internal rules and statistics, trying to produce the cheapest plan.
But sometimes this decision is not the best one. You can "force" your optimizer to use joins by changing join_collapse_limit.
You can try SET join_collapse_limit = 1 and then execute your query to "force" it to use the indexes.
Remember to set this option only for the session; normally the query optimizer makes the right decisions.
This is an option that worked for me for subqueries; maybe trying it with an IN will also help force your query to use the indexes.
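A sketch of that experiment, kept session-local as advised:
SET join_collapse_limit = 1;  -- keep the written join order; do not let the planner reorder
-- run the query here and compare the plan
RESET join_collapse_limit;    -- restore the default (normally 8)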
there are 70 different type values of which 2 account for ~15 million rows. Those two are NOT in the IN list
And you address this domain using a 36-character key. That is 36*8 bits of space, where only 7 bits are needed.
put these 70 different types in a separate LookUpTable, containing a surrogate key
the original type field could have a UNIQUE constraint in this table
refer to this table, using its surrogate id as a Foreign Key
... and: probably similar for the other obese keys
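A minimal sketch of that lookup table, with illustrative names (the original column is varchar(36)):
CREATE TABLE thing_type (
    type_id  smallserial PRIMARY KEY,        -- surrogate key, 2 bytes
    type_key varchar(36) NOT NULL UNIQUE     -- the original 36-character value
);
-- things would then reference the small surrogate key instead of repeating the text:
-- ALTER TABLE things ADD COLUMN type_id smallint REFERENCES thing_type (type_id);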

Why is only one index used

I have a table
CREATE TABLE timedevent
(
id bigint NOT NULL,
eventdate timestamp with time zone NOT NULL,
newstateids character varying(255) NOT NULL,
sourceid character varying(255) NOT NULL,
CONSTRAINT timedevent_pkey PRIMARY KEY (id)
) WITH (OIDS=FALSE);
with PK id.
I have to query rows between two dates with certain newstate and source from a set of possible sources.
I created btree indexes on eventdate and newstateids and one more (a hash index) on sourceid. Only the index on the date made the queries faster - it seems the other two are not used. Why is that so? How could I make my queries faster?
CREATE INDEX eventdate_index ON timedevent USING btree (eventdate);
CREATE INDEX newstateids_index ON timedevent USING btree (newstateids COLLATE pg_catalog."default");
CREATE INDEX sourceid_index_hash ON timedevent USING hash (sourceid COLLATE pg_catalog."default");
Here is the query as Hibernate generates it:
select this_.id as id1_0_0_, this_.description as descript2_0_0_, this_.eventDate as eventDat3_0_0_, this_.locationId as location4_0_0_, this_.newStateIds as newState5_0_0_, this_.oldStateIds as oldState6_0_0_, this_.sourceId as sourceId7_0_0_
from TimedEvent this_
where ((this_.newStateIds=? and this_.sourceId in (?, ?, ?, ?, ?, ?)))
and this_.eventDate between ? and ?
limit ?
EDIT:
Sorry for the misleading title, but it seems Postgres uses all the indexes. The problem is that my query time still remains the same. Here is the query plan I got:
Limit (cost=25130.29..33155.77 rows=321 width=161) (actual time=705.330..706.744 rows=279 loops=1)
Buffers: shared hit=6 read=8167 written=61
-> Bitmap Heap Scan on timedevent this_ (cost=25130.29..33155.77 rows=321 width=161) (actual time=705.330..706.728 rows=279 loops=1)
Recheck Cond: (((sourceid)::text = ANY ('{"root,kus-chemnitz,ize-159,Anwesend Bad","root,kus-chemnitz,ize-159,Alarmruf","root,kus-chemnitz,ize-159,Bett Alarm 1","root,kus-chemnitz,ize-159,Bett Alarm 2","root,kus-chemnitz,ize-159,Anwesend Zimmer" (...)
Filter: ((eventdate >= '2017-11-01 15:41:00+01'::timestamp with time zone) AND (eventdate <= '2018-03-20 14:58:16.724+01'::timestamp with time zone))
Buffers: shared hit=6 read=8167 written=61
-> BitmapAnd (cost=25130.29..25130.29 rows=2122 width=0) (actual time=232.990..232.990 rows=0 loops=1)
Buffers: shared hit=6 read=2152
-> Bitmap Index Scan on sourceid_index_hash (cost=0.00..1403.36 rows=39182 width=0) (actual time=1.195..1.195 rows=9308 loops=1)
Index Cond: ((sourceid)::text = ANY ('{"root,kus-chemnitz,ize-159,Anwesend Bad","root,kus-chemnitz,ize-159,Alarmruf","root,kus-chemnitz,ize-159,Bett Alarm 1","root,kus-chemnitz,ize-159,Bett Alarm 2","root,kus-chemnitz,ize-159,Anwesend Z (...)
Buffers: shared hit=6 read=26
-> Bitmap Index Scan on state_index (cost=0.00..23726.53 rows=777463 width=0) (actual time=231.160..231.160 rows=776520 loops=1)
Index Cond: ((newstateids)::text = 'ACTIV'::text)
Buffers: shared read=2126
Total runtime: 706.804 ms
After creating an index using btree on (sourceid, newstateids) as a_horse_with_no_name suggested, the cost reduced:
Limit (cost=125.03..8150.52 rows=321 width=161) (actual time=13.611..14.454 rows=279 loops=1)
Buffers: shared hit=18 read=4336
-> Bitmap Heap Scan on timedevent this_ (cost=125.03..8150.52 rows=321 width=161) (actual time=13.609..14.432 rows=279 loops=1)
Recheck Cond: (((sourceid)::text = ANY ('{"root,kus-chemnitz,ize-159,Anwesend Bad","root,kus-chemnitz,ize-159,Alarmruf","root,kus-chemnitz,ize-159,Bett Alarm 1","root,kus-chemnitz,ize-159,Bett Alarm 2","root,kus-chemnitz,ize-159,Anwesend Zimmer","r (...)
Filter: ((eventdate >= '2017-11-01 15:41:00+01'::timestamp with time zone) AND (eventdate <= '2018-03-20 14:58:16.724+01'::timestamp with time zone))
Buffers: shared hit=18 read=4336
-> Bitmap Index Scan on src_state_index (cost=0.00..124.95 rows=2122 width=0) (actual time=0.864..0.864 rows=4526 loops=1)
Index Cond: (((sourceid)::text = ANY ('{"root,kus-chemnitz,ize-159,Anwesend Bad","root,kus-chemnitz,ize-159,Alarmruf","root,kus-chemnitz,ize-159,Bett Alarm 1","root,kus-chemnitz,ize-159,Bett Alarm 2","root,kus-chemnitz,ize-159,Anwesend Zimmer (...)
Buffers: shared hit=18 read=44
Total runtime: 14.497 ms"
Basically, only one index is used because the database would have to combine your indexes into one (or combine the results of searches over several indexes) for them all to be useful, and doing that is so expensive that in this case it chooses not to: it uses just the one index relevant to one predicate and checks the other predicates directly against the values in the rows it finds.
One B-tree index with several columns would work better, just as a_horse_with_no_name suggests in the comments. Also note that the order of columns matters a lot (columns used for single-value searches should come first, the one used for a range search later; you want to limit range searches as much as possible).
The database will then walk the index looking for rows that satisfy one predicate using the first column of the index (hopefully narrowing the number of rows a lot), and then the second column and second predicate come into play, and so on.
Using separate B-tree indexes when predicates are combined with the AND operator does not make sense to the database: it would have to use one index to find all rows satisfying one predicate, and then use another index, reading its blocks from disk again, only to get to rows that satisfy the condition relevant to that second index but possibly not the first one. And if the rows are likely to satisfy it anyway, it's probably cheaper to just load the row after using the first index and check the other predicates directly, without using an index.
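A sketch of such a multi-column index for this table, following the column-order rule above (equality columns first, the range column last; the index name is illustrative):
CREATE INDEX timedevent_source_state_date_idx
    ON timedevent (sourceid, newstateids, eventdate);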

postgres text field performance

I am using postgres 9.2.4.
We have a background job that imports user's emails into our system and stores them in a postgres database table.
Below is the table:
CREATE TABLE emails
(
id serial NOT NULL,
subject text,
body text,
personal boolean,
sent_at timestamp without time zone NOT NULL,
created_at timestamp without time zone,
updated_at timestamp without time zone,
account_id integer NOT NULL,
sender_user_id integer,
sender_contact_id integer,
html text,
folder text,
draft boolean DEFAULT false,
check_for_response timestamp without time zone,
send_time timestamp without time zone,
CONSTRAINT emails_pkey PRIMARY KEY (id),
CONSTRAINT emails_account_id_fkey FOREIGN KEY (account_id)
REFERENCES accounts (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT emails_sender_contact_id_fkey FOREIGN KEY (sender_contact_id)
REFERENCES contacts (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE
)
WITH (
OIDS=FALSE
);
ALTER TABLE emails
OWNER TO paulcowan;
-- Index: emails_account_id_index
-- DROP INDEX emails_account_id_index;
CREATE INDEX emails_account_id_index
ON emails
USING btree
(account_id);
-- Index: emails_sender_contact_id_index
-- DROP INDEX emails_sender_contact_id_index;
CREATE INDEX emails_sender_contact_id_index
ON emails
USING btree
(sender_contact_id);
-- Index: emails_sender_user_id_index
-- DROP INDEX emails_sender_user_id_index;
CREATE INDEX emails_sender_user_id_index
ON emails
USING btree
(sender_user_id);
The query is further complicated because I have a view on this table where I pull in other data:
CREATE OR REPLACE VIEW email_graphs AS
SELECT emails.id, emails.subject, emails.body, emails.folder, emails.html,
emails.personal, emails.draft, emails.created_at, emails.updated_at,
emails.sent_at, emails.sender_contact_id, emails.sender_user_id,
emails.addresses, emails.read_by, emails.check_for_response,
emails.send_time, ts.ids AS todo_ids, cs.ids AS call_ids,
ds.ids AS deal_ids, ms.ids AS meeting_ids, c.comments, p.people,
atts.ids AS attachment_ids
FROM emails
LEFT JOIN ( SELECT todos.reference_email_id AS email_id,
array_to_json(array_agg(todos.id)) AS ids
FROM todos
GROUP BY todos.reference_email_id) ts ON ts.email_id = emails.id
LEFT JOIN ( SELECT calls.reference_email_id AS email_id,
array_to_json(array_agg(calls.id)) AS ids
FROM calls
GROUP BY calls.reference_email_id) cs ON cs.email_id = emails.id
LEFT JOIN ( SELECT deals.reference_email_id AS email_id,
array_to_json(array_agg(deals.id)) AS ids
FROM deals
GROUP BY deals.reference_email_id) ds ON ds.email_id = emails.id
LEFT JOIN ( SELECT meetings.reference_email_id AS email_id,
array_to_json(array_agg(meetings.id)) AS ids
FROM meetings
GROUP BY meetings.reference_email_id) ms ON ms.email_id = emails.id
LEFT JOIN ( SELECT comments.email_id,
array_to_json(array_agg(( SELECT row_to_json(r.*) AS row_to_json
FROM ( VALUES (comments.id,comments.text,comments.author_id,comments.created_at,comments.updated_at)) r(id, text, author_id, created_at, updated_at)))) AS comments
FROM comments
WHERE comments.email_id IS NOT NULL
GROUP BY comments.email_id) c ON c.email_id = emails.id
LEFT JOIN ( SELECT email_participants.email_id,
array_to_json(array_agg(( SELECT row_to_json(r.*) AS row_to_json
FROM ( VALUES (email_participants.user_id,email_participants.contact_id,email_participants.kind)) r(user_id, contact_id, kind)))) AS people
FROM email_participants
GROUP BY email_participants.email_id) p ON p.email_id = emails.id
LEFT JOIN ( SELECT attachments.reference_email_id AS email_id,
array_to_json(array_agg(attachments.id)) AS ids
FROM attachments
GROUP BY attachments.reference_email_id) atts ON atts.email_id = emails.id;
ALTER TABLE email_graphs
OWNER TO paulcowan;
We then run paginated queries against this view e.g.
SELECT "email_graphs".* FROM "email_graphs" INNER JOIN "email_participants" ON ("email_participants"."email_id" = "email_graphs"."id") WHERE (("user_id" = 75) AND ("folder" = 'INBOX')) ORDER BY "sent_at" DESC LIMIT 5 OFFSET 0
As the table has grown, queries on this table have dramatically slowed down.
If I run the paginated query with EXPLAIN ANALYZE
EXPLAIN ANALYZE SELECT "email_graphs".* FROM "email_graphs" INNER JOIN "email_participants" ON ("email_participants"."email_id" = "email_graphs"."id") WHERE (("user_id" = 75) AND ("folder" = 'INBOX')) ORDER BY "sent_at" DESC LIMIT 5 OFFSET 0;
I get this result
-> Seq Scan on deals (cost=0.00..9.11 rows=36 width=8) (actual time=0.003..0.044 rows=34 loops=1)
-> Sort (cost=5.36..5.43 rows=131 width=36) (actual time=0.416..0.416 rows=1 loops=1)
Sort Key: ms.email_id
Sort Method: quicksort Memory: 26kB
-> Subquery Scan on ms (cost=3.52..4.44 rows=131 width=36) (actual time=0.408..0.411 rows=1 loops=1)
-> HashAggregate (cost=3.52..4.05 rows=131 width=8) (actual time=0.406..0.408 rows=1 loops=1)
-> Seq Scan on meetings (cost=0.00..3.39 rows=131 width=8) (actual time=0.006..0.163 rows=161 loops=1)
-> Sort (cost=18.81..18.91 rows=199 width=36) (actual time=0.012..0.012 rows=0 loops=1)
Sort Key: c.email_id
Sort Method: quicksort Memory: 25kB
-> Subquery Scan on c (cost=15.90..17.29 rows=199 width=36) (actual time=0.007..0.007 rows=0 loops=1)
-> HashAggregate (cost=15.90..16.70 rows=199 width=60) (actual time=0.006..0.006 rows=0 loops=1)
-> Seq Scan on comments (cost=0.00..12.22 rows=736 width=60) (actual time=0.004..0.004 rows=0 loops=1)
Filter: (email_id IS NOT NULL)
Rows Removed by Filter: 2
SubPlan 1
-> Values Scan on "*VALUES*" (cost=0.00..0.00 rows=1 width=56) (never executed)
-> Materialize (cost=4220.14..4883.55 rows=27275 width=36) (actual time=247.720..1189.545 rows=29516 loops=1)
-> GroupAggregate (cost=4220.14..4788.09 rows=27275 width=15) (actual time=247.715..1131.787 rows=29516 loops=1)
-> Sort (cost=4220.14..4261.86 rows=83426 width=15) (actual time=247.634..339.376 rows=82632 loops=1)
Sort Key: public.email_participants.email_id
Sort Method: external sort Disk: 1760kB
-> Seq Scan on email_participants (cost=0.00..2856.28 rows=83426 width=15) (actual time=0.009..88.938 rows=82720 loops=1)
SubPlan 2
-> Values Scan on "*VALUES*" (cost=0.00..0.00 rows=1 width=40) (actual time=0.004..0.005 rows=1 loops=82631)
-> Sort (cost=2.01..2.01 rows=1 width=36) (actual time=0.074..0.077 rows=3 loops=1)
Sort Key: atts.email_id
Sort Method: quicksort Memory: 25kB
-> Subquery Scan on atts (cost=2.00..2.01 rows=1 width=36) (actual time=0.048..0.060 rows=3 loops=1)
-> HashAggregate (cost=2.00..2.01 rows=1 width=8) (actual time=0.045..0.051 rows=3 loops=1)
-> Seq Scan on attachments (cost=0.00..2.00 rows=1 width=8) (actual time=0.013..0.021 rows=5 loops=1)
-> Index Only Scan using email_participants_email_id_user_id_index on email_participants (cost=0.00..990.04 rows=269 width=4) (actual time=1.357..2.886 rows=43 loops=1)
Index Cond: (user_id = 75)
Heap Fetches: 43
Total runtime: 1642.157 ms
(75 rows)
I am definitely not looking for fixes :) or a refactored query. Any sort of high-level advice would be most welcome.
Per my comment, the gist of the issue lies in aggregates that get joined with one another. This prevents the use of indexes, and yields a bunch of merge joins (and a materialize) in your query plan…
Put another way, think of it as a plan so convoluted that Postgres proceeds by materializing temporary tables in memory, and then sorting them repeatedly until they're all merge-joined as appropriate. From where I'm standing, the full hogwash seems to amount to selecting all rows from all tables, and all of their possible relationships. Once it has been figured out and conjured up into existence, Postgres proceeds to sorting the mess in order to extract the top-n rows.
Anyway, you want to rewrite the query so it can actually use indexes to begin with.
Part of this is simple. This, for instance, is a big no no:
select …,
ts.ids AS todo_ids, cs.ids AS call_ids,
ds.ids AS deal_ids, ms.ids AS meeting_ids, c.comments, p.people,
atts.ids AS attachment_ids
Fetch the emails in one query. Fetch the related objects in separate queries with a bite-sized email_id in (…) clause. Merely doing that should speed things up quite a bit.
For the rest, it may or may not be simple or involve some re-engineering of your schema. I only scanned through the incomprehensible monster and its gruesome query plan, so I cannot comment for sure.
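A hedged sketch of that two-step pattern (the ids in step 2 are made up for illustration):
-- 1) Fetch just the page of emails, which can use the indexes on the base tables:
SELECT e.*
FROM emails e
JOIN email_participants p ON p.email_id = e.id
WHERE p.user_id = 75
  AND e.folder = 'INBOX'
ORDER BY e.sent_at DESC
LIMIT 5 OFFSET 0;
-- 2) Fetch the related objects with a bite-sized IN (...) list built from step 1:
SELECT reference_email_id AS email_id, array_to_json(array_agg(id)) AS todo_ids
FROM todos
WHERE reference_email_id IN (101, 102, 103, 104, 105)  -- ids returned by step 1
GROUP BY reference_email_id;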
I think the big view is not likely to ever perform well and that you should break it into more manageable components, but still here are two specific bits of advice that come to mind:
Schema change
Move the text and html bodies out of the main table.
Although large contents get automatically stored in TOAST space, mail parts will often be smaller than the TOAST threshold (~2000 bytes), especially for plain text, so it won't happen systematically.
Each non-TOASTED content inflates the table in a way that is detrimental to I/O performance and caching, if you consider that the primary purpose of the table is to contain the header fields like sender, recipients, date, subject...
I can test this with contents I happen to have in a mail database. On a sample of the 55k mails in my inbox:
average text/plain size: 1511 bytes.
average text/html size: 11895 bytes (but 42395 messages have no html at all)
Size of the mail table without the bodies: 14Mb (no TOAST)
If adding the bodies as 2 more TEXT columns like you have: 59Mb in main storage, 61Mb in TOAST.
Despite TOAST, the main storage appears to be 4 times larger. So when scanning the table without need for the TEXT columns, 80% of the I/Os are wasted. Future row updates are likely to make this worse with the fragmentation effect.
The effect in terms of block reads can be spotted through the pg_statio_all_tables view (compare heap_blks_read + heap_blks_hit before and after the query)
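A minimal sketch of that split (the email_bodies table name is illustrative, not part of your schema):
CREATE TABLE email_bodies (
    email_id integer PRIMARY KEY REFERENCES emails (id) ON DELETE CASCADE,
    body     text,
    html     text
);
-- The emails table keeps only the header-style fields (sender, dates, subject, folder...),
-- so scans that don't need the content touch far fewer blocks; join email_bodies
-- only when a message is actually opened.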
Tuning
This part of the EXPLAIN
Sort Method: external sort Disk: 1760kB
suggests that your work_mem is too small. You don't want to hit the disk for such small sorts. Make it at least 10MB unless you're low on free memory. While you're at it, set shared_buffers to a reasonable value if it's still the default. See http://wiki.postgresql.org/wiki/Performance_Optimization for more.