The table looks something like this:
CREATE TABLE "audit_log" (
"id" int4 NOT NULL DEFAULT nextval('audit_log_id_seq'::regclass),
"entity" varchar(50) COLLATE "public"."ci",
"updated" timestamp(6) NOT NULL,
"transaction_id" uuid,
CONSTRAINT "PK_audit_log" PRIMARY KEY ("id")
);
It contains millions of rows.
I tried adding an index on one column like this:
CREATE INDEX "testing" ON "audit_log" USING btree (
"entity" COLLATE "public"."ci" "pg_catalog"."text_ops" ASC NULLS LAST
);
Then I ran the following query, selecting both the indexed column and the primary key:
EXPLAIN ANALYZE SELECT entity, id FROM audit_log WHERE entity = '1234'
As I expected, the query plan uses both a Bitmap Index Scan (to find the 'entity' column, presumably) and a Bitmap Heap Scan (to retrieve the 'id' column, I assume):
Gather (cost=2640.10..260915.23 rows=87166 width=122) (actual time=2.828..3.764 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Bitmap Heap Scan on audit_log (cost=1640.10..251198.63 rows=36319 width=122) (actual time=0.061..0.062 rows=0 loops=3)
Recheck Cond: ((entity)::text = '1234'::text)
-> Bitmap Index Scan on testing (cost=0.00..1618.31 rows=87166 width=0) (actual time=0.036..0.036 rows=0 loops=1)
Index Cond: ((entity)::text = '1234'::text)
Next I added an INCLUDE column to the index in order to make it cover the above query:
DROP INDEX testing;
CREATE INDEX testing ON audit_log USING btree (
"entity" COLLATE "public"."ci" "pg_catalog"."text_ops" ASC NULLS LAST
)
INCLUDE ("id");
Then I reran my query, but it still does the Bitmap Heap Scan:
Gather (cost=2964.10..261239.23 rows=87166 width=122) (actual time=2.711..3.570 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Bitmap Heap Scan on audit_log (cost=1964.10..251522.63 rows=36319 width=122) (actual time=0.062..0.062 rows=0 loops=3)
Recheck Cond: ((entity)::text = '1234'::text)
-> Bitmap Index Scan on testing (cost=0.00..1942.31 rows=87166 width=0) (actual time=0.029..0.029 rows=0 loops=1)
Index Cond: ((entity)::text = '1234'::text)
Why is that?
PostgreSQL implements row versioning using a concept called visibility. Each query knows which version of a row it can see.
That visibility information is stored in the table row, but not in the index entry, so the table has to be visited just to test whether the row is visible.
Because of that, every bitmap index scan needs a bitmap heap scan.
To overcome this unfortunate property, PostgreSQL introduced the visibility map, a data structure that stores, for each 8kB block of the table, whether all rows in that block are visible to everybody. If that is the case, looking up the table row can be skipped. This is only possible for a regular index scan, not a bitmap index scan.
The visibility map is maintained by VACUUM. So run VACUUM on the table, and you may get an index-only scan.
If that alone is not enough, you could try CLUSTER to rewrite the table in index order.
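As a quick check (a sketch, using the audit_log table and the covering index from above), you can vacuum the table and re-run the query to see whether the plan switches:
VACUUM (ANALYZE) audit_log;  -- also refreshes the visibility map
EXPLAIN ANALYZE SELECT entity, id FROM audit_log WHERE entity = '1234';
-- with a mostly all-visible table and the INCLUDE index in place,
-- this may now show "Index Only Scan using testing"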
Here is some detail on how PostgreSQL estimates the cost of an index scan. The following code is from cost_index in src/backend/optimizer/path/costsize.c:
/*----------
[...]
* If it's an index-only scan, then we will not need to fetch any heap
* pages for which the visibility map shows all tuples are visible.
* Hence, reduce the estimated number of heap fetches accordingly.
* We use the measured fraction of the entire heap that is all-visible,
* which might not be particularly relevant to the subset of the heap
* that this query will fetch; but it's not clear how to do better.
*----------
*/
[...]
if (indexonly)
pages_fetched = ceil(pages_fetched * (1.0 - baserel->allvisfrac));
allvisfrac is calculated using pg_class.relallvisible, which holds an estimate for the number of all-visible pages in the table, and pg_class.relpages.
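To see the numbers this is derived from (a sketch against the audit_log table from the question):
-- relallvisible and relpages feed into allvisfrac
SELECT relname, relpages, relallvisible,
round(relallvisible::numeric / GREATEST(relpages, 1), 2) AS all_visible_fraction
FROM pg_class
WHERE relname = 'audit_log';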
I have a PostgreSQL 10.6 database on Amazon RDS. My table is like this:
CREATE TABLE dfo_by_quarter (
release_key int4 NOT NULL,
country varchar(100) NOT NULL,
product_group varchar(100) NOT NULL,
distribution_type varchar(100) NOT NULL,
"year" int2 NOT NULL,
"date" date NULL,
quarter int2 NOT NULL,
category varchar(100) NOT NULL,
units numeric(38,6) NOT NULL,
sales_value_eur numeric(38,6) NOT NULL,
sales_value_usd numeric(38,6) NOT NULL,
sales_value_local numeric(38,6) NOT NULL,
data_status bpchar(1) NOT NULL,
panel_market_units numeric(38,6) NOT NULL,
panel_market_sales_value_eur numeric(38,6) NOT NULL,
panel_market_sales_value_usd numeric(38,6) NOT NULL,
panel_market_sales_value_local numeric(38,6) NOT NULL,
CONSTRAINT pk_dpretailer_dfo_by_quarter PRIMARY KEY (release_key, country, category, product_group, distribution_type, year, quarter),
CONSTRAINT fk_dpretailer_dfo_by_quarter_release FOREIGN KEY (release_key) REFERENCES dpretailer.dfo_release(release_id)
);
I understand that a primary key implies a unique index.
If I simply ask how many rows I have when filtering on non-existent data (release_key = 1 returns nothing), I can see it uses the index:
EXPLAIN
SELECT COUNT(*)
FROM dpretailer.dfo_by_quarter
WHERE release_key = 1
Aggregate (cost=6.32..6.33 rows=1 width=8)
-> Index Only Scan using pk_dpretailer_dfo_by_quarter on dfo_by_quarter (cost=0.55..6.32 rows=1 width=0)
Index Cond: (release_key = 1)
But if I run the same query on a value that returns data, it scans the table, which is bound to be more expensive...
EXPLAIN
SELECT COUNT(*)
FROM dpretailer.dfo_by_quarter
WHERE release_key = 2
Finalize Aggregate (cost=47611.07..47611.08 rows=1 width=8)
-> Gather (cost=47610.86..47611.07 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=46610.86..46610.87 rows=1 width=8)
-> Parallel Seq Scan on dfo_by_quarter (cost=0.00..46307.29 rows=121428 width=0)
Filter: (release_key = 2)
I get that using the index when there is no data makes sense and is driven by the stats on the table (I ran ANALYZE before the tests).
But why not use my index if there is data?
Surely it must be quicker to scan part of an index (because release_key is the first column) rather than scan the entire table?
I must be missing something...?
Update 2019-03-07
Thank You for your comments, which are very useful.
This simple query was just me trying to understand why the index was not used...
But I should have known better (I am new to PostgreSQL but have many years of experience with SQL Server), and it makes sense that it is not used, as you commented:
bad selectivity because my criteria only filters about 20% of the rows
bad table design (too fat, which we knew and are now addressing)
index not "covering" the query, etc...
So let me change my question "slightly", if I may...
Our table will be normalised in facts/dimensions (no more varchars in the wrong place).
We do only inserts, never updates and so few deletes that we can ignore it.
The table size will not be huge (on the order of tens of millions of rows).
Our queries will ALWAYS specify an exact release_key value.
Our new version of the table would look like this
CREATE TABLE dfo_by_quarter (
release_key int4 NOT NULL,
country_key int2 NOT NULL,
product_group_key int2 NOT NULL,
distribution_type_key int2 NOT NULL,
category_key int2 NOT NULL,
"year" int2 NOT NULL,
"date" date NULL,
quarter int2 NOT NULL,
units numeric(38,6) NOT NULL,
sales_value_eur numeric(38,6) NOT NULL,
sales_value_usd numeric(38,6) NOT NULL,
sales_value_local numeric(38,6) NOT NULL,
CONSTRAINT pk_milly_dfo_by_quarter PRIMARY KEY (release_key, country_key, category_key, product_group_key, distribution_type_key, year, quarter),
CONSTRAINT fk_milly_dfo_by_quarter_release FOREIGN KEY (release_key) REFERENCES dpretailer.dfo_release(release_id),
CONSTRAINT fk_milly_dim_dfo_category FOREIGN KEY (category_key) REFERENCES milly.dim_dfo_category(category_key),
CONSTRAINT fk_milly_dim_dfo_country FOREIGN KEY (country_key) REFERENCES milly.dim_dfo_country(country_key),
CONSTRAINT fk_milly_dim_dfo_distribution_type FOREIGN KEY (distribution_type_key) REFERENCES milly.dim_dfo_distribution_type(distribution_type_key),
CONSTRAINT fk_milly_dim_dfo_product_group FOREIGN KEY (product_group_key) REFERENCES milly.dim_dfo_product_group(product_group_key)
);
With that in mind, in a SQL Server environment, I could solve this by having a "Clustered" primary key (the entire table being sorted), or having an index on the primary key with INCLUDE option for the other columns required to cover the queries (Units, Values, etc).
Question 1)
In PostgreSQL, is there an equivalent to the SQL Server clustered index? A way to actually sort the entire table? I suppose it might be difficult because PostgreSQL does not do updates "in place", which might make sorting expensive...
Or, is there a way to create something like a SQL Server Index WITH INCLUDE(units, values)?
Update: I came across the CLUSTER command, which is the closest thing, I suppose.
It would be suitable for us.
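For reference, PostgreSQL 11 added covering indexes with an INCLUDE clause (the question runs on 10.6, where this is not available). A sketch against the new table, with illustrative column choices:
-- hypothetical covering index: release_key as the key, payload columns carried along
CREATE INDEX ix_dfo_by_quarter_release_covering
ON milly.dfo_by_quarter (release_key)
INCLUDE (year, product_group_key, units, sales_value_eur);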
Question 2
With the query below
EXPLAIN (ANALYZE, BUFFERS)
WITH "rank_query" AS
(
SELECT
ROW_NUMBER() OVER(PARTITION BY "year" ORDER BY SUM("main"."units") DESC) AS "rank_by",
"year",
"main"."product_group_key" AS "productgroupkey",
SUM("main"."units") AS "salesunits",
SUM("main"."sales_value_eur") AS "salesvalue",
SUM("sales_value_eur")/SUM("units") AS "asp"
FROM "milly"."dfo_by_quarter" AS "main"
WHERE
"release_key" = 17 AND
"main"."year" >= 2010
GROUP BY
"year",
"main"."product_group_key"
)
,BeforeLookup
AS (
SELECT
"year" AS date,
SUM("salesunits") AS "salesunits",
SUM("salesvalue") AS "salesvalue",
SUM("salesvalue")/SUM("salesunits") AS "asp",
CASE WHEN "rank_by" <= 50 THEN "productgroupkey" ELSE -1 END AS "productgroupkey"
FROM
"rank_query"
GROUP BY
"year",
CASE WHEN "rank_by" <= 50 THEN "productgroupkey" ELSE -1 END
)
SELECT BL.date, BL.salesunits, BL.salesvalue, BL.asp
FROM BeforeLookup AS BL
INNER JOIN milly.dim_dfo_product_group PG ON PG.product_group_key = BL.productgroupkey;
I get this
Hash Join (cost=40883.82..40896.46 rows=558 width=98) (actual time=676.565..678.308 rows=663 loops=1)
Hash Cond: (bl.productgroupkey = pg.product_group_key)
Buffers: shared hit=483 read=22719
CTE rank_query
-> WindowAgg (cost=40507.15..40632.63 rows=5577 width=108) (actual time=660.076..668.272 rows=5418 loops=1)
Buffers: shared hit=480 read=22719
-> Sort (cost=40507.15..40521.09 rows=5577 width=68) (actual time=660.062..661.226 rows=5418 loops=1)
Sort Key: main.year, (sum(main.units)) DESC
Sort Method: quicksort Memory: 616kB
Buffers: shared hit=480 read=22719
-> Finalize HashAggregate (cost=40076.46..40160.11 rows=5577 width=68) (actual time=648.762..653.227 rows=5418 loops=1)
Group Key: main.year, main.product_group_key
Buffers: shared hit=480 read=22719
-> Gather (cost=38710.09..39909.15 rows=11154 width=68) (actual time=597.878..622.379 rows=11938 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=480 read=22719
-> Partial HashAggregate (cost=37710.09..37793.75 rows=5577 width=68) (actual time=594.044..600.494 rows=3979 loops=3)
Group Key: main.year, main.product_group_key
Buffers: shared hit=480 read=22719
-> Parallel Seq Scan on dfo_by_quarter main (cost=0.00..36019.74 rows=169035 width=22) (actual time=106.916..357.071 rows=137171 loops=3)
Filter: ((year >= 2010) AND (release_key = 17))
Rows Removed by Filter: 546602
Buffers: shared hit=480 read=22719
CTE beforelookup
-> HashAggregate (cost=223.08..238.43 rows=558 width=102) (actual time=676.293..677.167 rows=663 loops=1)
Group Key: rank_query.year, CASE WHEN (rank_query.rank_by <= 50) THEN (rank_query.productgroupkey)::integer ELSE '-1'::integer END
Buffers: shared hit=480 read=22719
-> CTE Scan on rank_query (cost=0.00..139.43 rows=5577 width=70) (actual time=660.079..672.978 rows=5418 loops=1)
Buffers: shared hit=480 read=22719
-> CTE Scan on beforelookup bl (cost=0.00..11.16 rows=558 width=102) (actual time=676.296..677.665 rows=663 loops=1)
Buffers: shared hit=480 read=22719
-> Hash (cost=7.34..7.34 rows=434 width=4) (actual time=0.253..0.253 rows=435 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 24kB
Buffers: shared hit=3
-> Seq Scan on dim_dfo_product_group pg (cost=0.00..7.34 rows=434 width=4) (actual time=0.017..0.121 rows=435 loops=1)
Buffers: shared hit=3
Planning time: 0.319 ms
Execution time: 678.714 ms
Does anything spring to mind?
If I read it properly, it means my biggest cost by far is the initial scan of the table... but I don't manage to make it use an index...
I had created an index I hoped would help but it got ignored...
CREATE INDEX eric_silly_index ON milly.dfo_by_quarter(release_key, YEAR, date, product_group_key, units, sales_value_eur);
ANALYZE milly.dfo_by_quarter;
I also tried to cluster the table, but with no visible effect either:
CLUSTER milly.dfo_by_quarter USING pk_milly_dfo_by_quarter; -- took 30 seconds (uidev)
ANALYZE milly.dfo_by_quarter;
Many thanks
Eric
Because release_key isn't actually a unique column, it's not possible to know from the information you've provided whether or not the index should be used. If a high percentage of rows have release_key = 2, or even if a smaller percentage of rows match on a large table, it may not be efficient to use the index.
In part this is because Postgres indexes are indirect -- that is, the index actually contains a pointer to the location on disk in the heap where the real tuple lives. So looping through an index requires reading an entry from the index, reading the tuple from the heap, and repeating. For a large number of tuples it's often more efficient to scan the heap directly and avoid the indirect disk access penalty.
Edit:
You generally don't want to be using CLUSTER in PostgreSQL; it's not how indexes are maintained, and it's rare to see that in the wild for that reason.
Your updated query with no data gives this plan:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
CTE Scan on beforelookup bl (cost=8.33..8.35 rows=1 width=98) (actual time=0.143..0.143 rows=0 loops=1)
Buffers: shared hit=4
CTE rank_query
-> WindowAgg (cost=8.24..8.26 rows=1 width=108) (actual time=0.126..0.126 rows=0 loops=1)
Buffers: shared hit=4
-> Sort (cost=8.24..8.24 rows=1 width=68) (actual time=0.060..0.061 rows=0 loops=1)
Sort Key: main.year, (sum(main.units)) DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=4
-> GroupAggregate (cost=8.19..8.23 rows=1 width=68) (actual time=0.011..0.011 rows=0 loops=1)
Group Key: main.year, main.product_group_key
Buffers: shared hit=1
-> Sort (cost=8.19..8.19 rows=1 width=64) (actual time=0.011..0.011 rows=0 loops=1)
Sort Key: main.year, main.product_group_key
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=1
-> Index Scan using pk_milly_dfo_by_quarter on dfo_by_quarter main (cost=0.15..8.18 rows=1 width=64) (actual time=0.003..0.003 rows=0 loops=1)
Index Cond: ((release_key = 17) AND (year >= 2010))
Buffers: shared hit=1
CTE beforelookup
-> HashAggregate (cost=0.04..0.07 rows=1 width=102) (actual time=0.128..0.128 rows=0 loops=1)
Group Key: rank_query.year, CASE WHEN (rank_query.rank_by <= 50) THEN (rank_query.productgroupkey)::integer ELSE '-1'::integer END
Buffers: shared hit=4
-> CTE Scan on rank_query (cost=0.00..0.03 rows=1 width=70) (actual time=0.127..0.127 rows=0 loops=1)
Buffers: shared hit=4
Planning Time: 0.723 ms
Execution Time: 0.485 ms
(27 rows)
So PostgreSQL is entirely capable of using the index for your query, but the planner is deciding that it's not worth it (i.e., the cost of using the index directly is higher than the cost of the parallel sequential scan).
If you set enable_indexscan = off; with no data, you get a bitmap index scan (as I'd expect). If you set enable_bitmapscan = off; with no data, you get a (non-parallel) sequential scan.
You should see the plan change back (with large amounts of data) if you set max_parallel_workers = 0;.
But looking at your query's explain results, I'd very much expect using the index to be more expensive and take longer than using the parallel sequential scan. In your updated query you're still scanning a very high percentage of the table and a large number of rows, and you're also forcing access to the heap by selecting fields not in the index. Postgres 11 (I believe) adds covering indexes, which would theoretically allow you to make this query be driven by the index alone, but I'm not at all convinced it would actually be worth it in this example.
Generally, while possible, a PK spanning 7 columns, several of which are varchar(100), is not optimized for performance, to say the least.
Such an index is large to begin with and tends to bloat quickly, if you have updates on involved columns.
I would operate with a surrogate PK, a serial (or bigserial if you have that many rows). Or IDENTITY. See:
Auto increment table column
And a UNIQUE constraint on all 7 columns to enforce uniqueness (they are all NOT NULL anyway).
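A minimal sketch of that change against the new table definition (the new column and constraint names are made up):
-- hypothetical: replace the 7-column PK with a surrogate key,
-- keeping the natural key as a UNIQUE constraint
ALTER TABLE dfo_by_quarter
DROP CONSTRAINT pk_milly_dfo_by_quarter,
ADD COLUMN dfo_by_quarter_id bigserial PRIMARY KEY,
ADD CONSTRAINT uq_dfo_by_quarter_natural_key
UNIQUE (release_key, country_key, category_key, product_group_key, distribution_type_key, year, quarter);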
If you have lots of counting queries with the only predicate on release_key, consider an additional plain btree index on just that column.
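That extra index could be as simple as (a sketch; the index name is made up):
-- hypothetical single-column index for queries filtering only on release_key
CREATE INDEX dfo_by_quarter_release_key_idx ON dfo_by_quarter (release_key);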
The data type varchar(100) for so many columns may not be optimal. Some normalization might help.
More advice depends on the missing information ...
The answer to my initial question (why is PostgreSQL not using my index for something like SELECT COUNT(*)?) can be found in the documentation:
Introduction to VACUUM, ANALYZE, EXPLAIN, and COUNT
In particular: "This means that every time a row is read from an index, the engine has to also read the actual row in the table to ensure that the row hasn't been deleted."
This explains a lot about why I couldn't get PostgreSQL to use my indexes when, from a SQL Server perspective, it obviously "should".
I have a large table (30M rows) which has ~10 jsonb B-tree indexes.
When I create a query using few conditions, the query is relatively fast.
When I add more conditions, especially one with a sparse jsonb index (e.g. an integer between 0 and 1,000,000), the query speed drops off dramatically.
I am wondering whether jsonb indexes are slower than indexes on native columns. Should I expect a performance boost from switching to native columns rather than JSON?
Table definition:
id integer
type text
data jsonb
company_index ARRAY
exchange_index ARRAY
eligible boolean
Example query:
SELECT id, data, type
FROM collection.bundles
WHERE ( (ARRAY['.X'] && bundles.exchange_index) AND
type IN ('discussion') AND
( ((data->>'sentiment_score')::bigint > 0 AND
(data->'display_tweet'->'stocktwit'->'id') IS NOT NULL) ) AND
( eligible = true ) AND
((data->'display_tweet'->'stocktwit')->>'id')::bigint IS NULL )
ORDER BY id DESC
LIMIT 50
Output:
Limit (cost=0.56..16197.56 rows=50 width=212) (actual time=31900.874..31900.874 rows=0 loops=1)
Buffers: shared hit=13713180 read=1267819 dirtied=34 written=713
I/O Timings: read=7644.206 write=7.294
-> Index Scan using bundles2_id_desc_idx on bundles (cost=0.56..2401044.17 rows=7412 width=212) (actual time=31900.871..31900.871 rows=0 loops=1)
Filter: (eligible AND ('{.X}'::text[] && exchange_index) AND (type = 'discussion'::text) AND ((((data -> 'display_tweet'::text) -> 'stocktwit'::text) -> 'id'::text) IS NOT NULL) AND (((data ->> 'sentiment_score'::text))::bigint > 0) AND (((((data -> 'display_tweet'::text) -> 'stocktwit'::text) ->> 'id'::text))::bigint IS NULL))
Rows Removed by Filter: 16093269
Buffers: shared hit=13713180 read=1267819 dirtied=34 written=713
I/O Timings: read=7644.206 write=7.294
Planning time: 0.366 ms
Execution time: 31900.909 ms
Note:
There are jsonb B-tree indexes on every jsonb condition used in this query. exchange_index and company_index have GIN indexes.
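The question does not show those definitions; presumably they look roughly like this (a sketch; the expression index name is made up, the GIN index name appears in the plan below):
-- hypothetical B-tree expression index on one jsonb field
CREATE INDEX bundles_sentiment_score_idx
ON collection.bundles (((data ->> 'sentiment_score')::bigint));
-- hypothetical definition of the GIN index backing the && array condition
CREATE INDEX bundles2_exchange_index_ops_idx
ON collection.bundles USING gin (exchange_index);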
UPDATE
After Laurenz's changed query:
Limit (cost=150634.15..150634.27 rows=50 width=211) (actual time=15925.828..15925.828 rows=0 loops=1)
Buffers: shared hit=1137490 read=680349 written=2
I/O Timings: read=2896.702 write=0.038
-> Sort (cost=150634.15..150652.53 rows=7352 width=211) (actual time=15925.827..15925.827 rows=0 loops=1)
Sort Key: bundles.id DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=1137490 read=680349 written=2
I/O Timings: read=2896.702 write=0.038
-> Bitmap Heap Scan on bundles (cost=56666.15..150316.40 rows=7352 width=211) (actual time=15925.816..15925.816 rows=0 loops=1)
Recheck Cond: (('{.X}'::text[] && exchange_index) AND (type = 'discussion'::text))
Filter: (eligible AND ((((data -> 'display_tweet'::text) -> 'stocktwit'::text) -> 'id'::text) IS NOT NULL) AND (((data ->> 'sentiment_score'::text))::bigint > 0) AND (((((data -> 'display_tweet'::text) -> 'stocktwit'::text) ->> 'id'::text))::bigint IS NULL))
Rows Removed by Filter: 273230
Heap Blocks: exact=175975
Buffers: shared hit=1137490 read=680349 written=2
I/O Timings: read=2896.702 write=0.038
-> BitmapAnd (cost=56666.15..56666.15 rows=23817 width=0) (actual time=1895.890..1895.890 rows=0 loops=1)
Buffers: shared hit=37488 read=85559
I/O Timings: read=325.535
-> Bitmap Index Scan on bundles2_exchange_index_ops_idx (cost=0.00..6515.57 rows=863703 width=0) (actual time=218.690..218.690 rows=892669 loops=1)
Index Cond: ('{.X}'::text[] && exchange_index)
Buffers: shared hit=7 read=313
I/O Timings: read=1.458
-> Bitmap Index Scan on bundles_eligible_idx (cost=0.00..23561.74 rows=2476877 width=0) (actual time=436.719..436.719 rows=2569331 loops=1)
Index Cond: (eligible = true)
Buffers: shared hit=37473
-> Bitmap Index Scan on bundles2_type_idx (cost=0.00..26582.83 rows=2706276 width=0) (actual time=1052.267..1052.267 rows=2794517 loops=1)
Index Cond: (type = 'discussion'::text)
Buffers: shared hit=8 read=85246
I/O Timings: read=324.077
Planning time: 0.433 ms
Execution time: 15928.959 ms
None of your fancy indexes are used at all, so the problem is not whether they are fast.
There are several things at play here:
Seeing the dirtied and the written pages during the index scan, I suspect that there are quite a lot of “dead tuples” in your table. When the index scan visits them and notices they are dead, it “kills” those index entries so that subsequent index scans don't have to repeat that work.
If you repeat the query, you will probably notice that the number of blocks touched and the execution time decrease.
You can reduce that problem by running VACUUM on the table or making sure autovacuum processes the table often enough.
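A sketch of both options (the scale factor is an illustrative value, not a tuned recommendation):
-- one-off cleanup of dead tuples
VACUUM collection.bundles;
-- or have autovacuum process the table more often than the default
ALTER TABLE collection.bundles SET (autovacuum_vacuum_scale_factor = 0.02);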
Your major problem, however, is that the LIMIT clause tempts PostgreSQL to use the following strategy:
Since you only want 50 result rows in an order for which you have an index, just examine the table rows in index order and discard all rows that do not match the complicated condition until you have 50 results.
Unfortunately it has to scan 16093319 rows until it has found its 50 hits. The rows at the “high id” end of the table don't match the condition. PostgreSQL does not know about that correlation.
The solution is to discourage PostgreSQL from going down that route. The easiest way would be to drop all indexes on id, but given its name that is probably unfeasible.
The other way is to keep PostgreSQL from “seeing” the LIMIT clause when it plans the scan:
SELECT id, data, type
FROM (SELECT id, data, type
FROM collection.bundles
WHERE /* all your complicated conditions */
OFFSET 0) subquery
ORDER BY id DESC
LIMIT 50;
Remark: You didn't show your index definitions, but it sounds like you have quite a lot of them, possibly too many. Indexes are expensive, so make sure you define only those that give you a clear benefit.
I have an index like this on my candidates table and its first_name column:
CREATE INDEX ix_public_candidates_first_name_not_null
ON public.candidates (first_name)
WHERE first_name IS NOT NULL;
Is Postgres smart enough to know that an equal operator means it can't be null or am I just lucky that my "is not null" index is used in this query?
select *
from public.candidates
where first_name = 'Erik'
EXPLAIN ANALYZE output:
Bitmap Heap Scan on candidates (cost=57.46..8096.88 rows=2714 width=352) (actual time=1.481..18.847 rows=2460 loops=1)
Recheck Cond: (first_name = 'Erik'::citext)
Heap Blocks: exact=2256
-> Bitmap Index Scan on ix_public_candidates_first_name_not_null (cost=0.00..56.78 rows=2714 width=0) (actual time=1.204..1.204 rows=2460 loops=1)
Index Cond: (first_name = 'Erik'::citext)
Planning time: 0.785 ms
Execution time: 19.340 ms
The PostgreSQL optimizer is not based on lucky guesses.
It can indeed infer that anything that matches an equality condition cannot be NULL; the proof is the execution plan you show.
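If you want to see that inference at work, compare the plan above with one for a predicate that contradicts the index condition (a sketch; 'Erik' is just the literal from the question):
-- equality implies first_name IS NOT NULL, so the partial index qualifies
EXPLAIN SELECT * FROM public.candidates WHERE first_name = 'Erik';
-- IS NULL contradicts the index predicate, so this query cannot use it
EXPLAIN SELECT * FROM public.candidates WHERE first_name IS NULL;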
I have defined the following index:
CREATE INDEX
users_search_idx
ON
auth_user
USING
gin(
username gin_trgm_ops,
first_name gin_trgm_ops,
last_name gin_trgm_ops
);
I am performing the following query:
PREPARE user_search (TEXT, INT) AS
SELECT
username,
email,
first_name,
last_name,
( -- would probably do per-field weightings here
s_username + s_first_name + s_last_name
) rank
FROM
auth_user,
similarity(username, $1) s_username,
similarity(first_name, $1) s_first_name,
similarity(last_name, $1) s_last_name
WHERE
username % $1 OR
first_name % $1 OR
last_name % $1
ORDER BY
rank DESC
LIMIT $2;
The auth_user table has 6.2 million rows.
The speed of the query seems to depend very heavily on the number of results potentially returned by the similarity query.
Increasing the similarity threshold via set_limit helps, but reduces usefulness of results by eliminating partial matches.
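(For reference, a sketch of how that threshold is adjusted; 0.5 is just an example value:)
SELECT set_limit(0.5);  -- pg_trgm similarity threshold used by the % operator
SELECT show_limit();    -- check the current threshold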
Some searches return in 200ms, others take ~ 10 seconds.
We have an existing implementation of this feature using Elasticsearch that returns in < 200ms for any query, while doing more complicated (better) ranking.
I would like to know if there is any way we could improve this to get more consistent performance?
It's my understanding that a GIN index (inverted index) is the same basic approach used by Elasticsearch, so I would have thought some optimization is possible.
An EXPLAIN ANALYZE EXECUTE user_search('mel', 20) shows:
Limit (cost=54099.81..54099.86 rows=20 width=52) (actual time=10302.092..10302.104 rows=20 loops=1)
-> Sort (cost=54099.81..54146.66 rows=18739 width=52) (actual time=10302.091..10302.095 rows=20 loops=1)
Sort Key: (((s_username.s_username + s_first_name.s_first_name) + s_last_name.s_last_name)) DESC
Sort Method: top-N heapsort Memory: 26kB
-> Nested Loop (cost=382.74..53601.17 rows=18739 width=52) (actual time=118.164..10293.765 rows=8380 loops=1)
-> Nested Loop (cost=382.74..53132.69 rows=18739 width=56) (actual time=118.150..10262.804 rows=8380 loops=1)
-> Nested Loop (cost=382.74..52757.91 rows=18739 width=52) (actual time=118.142..10233.990 rows=8380 loops=1)
-> Bitmap Heap Scan on auth_user (cost=382.74..52383.13 rows=18739 width=48) (actual time=118.128..10186.816 rows=8380 loops=1)
Recheck Cond: (((username)::text % 'mel'::text) OR ((first_name)::text % 'mel'::text) OR ((last_name)::text % 'mel'::text))
Rows Removed by Index Recheck: 2434523
Heap Blocks: exact=49337 lossy=53104
-> BitmapOr (cost=382.74..382.74 rows=18757 width=0) (actual time=107.436..107.436 rows=0 loops=1)
-> Bitmap Index Scan on users_search_idx (cost=0.00..122.89 rows=6252 width=0) (actual time=40.200..40.200 rows=88908 loops=1)
Index Cond: ((username)::text % 'mel'::text)
-> Bitmap Index Scan on users_search_idx (cost=0.00..122.89 rows=6252 width=0) (actual time=43.847..43.847 rows=102028 loops=1)
Index Cond: ((first_name)::text % 'mel'::text)
-> Bitmap Index Scan on users_search_idx (cost=0.00..122.89 rows=6252 width=0) (actual time=23.387..23.387 rows=58740 loops=1)
Index Cond: ((last_name)::text % 'mel'::text)
-> Function Scan on similarity s_username (cost=0.00..0.01 rows=1 width=4) (actual time=0.004..0.004 rows=1 loops=8380)
-> Function Scan on similarity s_first_name (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=8380)
-> Function Scan on similarity s_last_name (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=8380)
Execution time: 10302.559 ms
Server is Postgres 9.6.1 running on Amazon RDS
update
1.
Shortly after posting the question I found this info: https://www.postgresql.org/message-id/464F3C5D.2000700#enterprisedb.com
So I tried
-> SHOW work_mem;
4MB
-> SET work_mem='12MB';
-> EXECUTE user_search('mel', 20);
(results returned in ~1.5s)
This made a big improvement (previously > 10s)!
1.5s is still way slower than ES for a similar query, so I would still like to hear any suggestions for optimising the query.
2.
In response to comments, and after seeing this question (Postgresql GIN index slower than GIST for pg_trgm), I tried exactly the same set up with a GIST index in place of the GIN one.
Trying the same search above, it returned in ~3.5s, using the default work_mem='4MB'. Increasing work_mem made no difference.
From this I conclude that the GiST index is more memory efficient (it did not hit a pathological case like GIN did) but is slower than GIN when GIN is working properly. This is in line with what's described in the docs recommending the GIN index.
3.
I still don't understand why so much time is spent in:
-> Bitmap Heap Scan on auth_user (cost=382.74..52383.13 rows=18739 width=48) (actual time=118.128..10186.816 rows=8380 loops=1)
Recheck Cond: (((username)::text % 'mel'::text) OR ((first_name)::text % 'mel'::text) OR ((last_name)::text % 'mel'::text))
Rows Removed by Index Recheck: 2434523
Heap Blocks: exact=49337 lossy=53104
I don't understand why this step is needed or what it's doing.
There are three Bitmap Index Scans beneath it, one for each of the % $1 clauses... their results then get combined by a BitmapOr step. These parts are all quite fast.
But even in the case where we don't run out of work_mem, we still spend nearly a whole second in the Bitmap Heap Scan.
I expect much faster results with this approach:
1.
Create a GiST index with 1 column holding concatenated values:
CREATE INDEX users_search_idx ON auth_user
USING gist((username || ' ' || first_name || ' ' || last_name) gist_trgm_ops);
This assumes all 3 columns are defined NOT NULL (you did not specify); else you need to do more.
Why not simplify with concat_ws()? (A sketch of that variant follows the links below.)
Combine two columns and add into one new column
Faster query with pattern-matching on multiple text fields
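Regarding the concat_ws() suggestion: concat_ws() is only STABLE, not IMMUTABLE, so it cannot be used directly in an index expression. A common workaround is an immutable SQL wrapper (a sketch; the function and index names are made up):
-- hypothetical immutable wrapper; array_to_string() skips NULL elements,
-- giving behavior similar to concat_ws()
CREATE FUNCTION immutable_concat_ws(text, VARIADIC text[])
RETURNS text LANGUAGE sql IMMUTABLE AS
'SELECT array_to_string($2, $1)';

CREATE INDEX users_search_ws_idx ON auth_user
USING gist (immutable_concat_ws(' ', username, first_name, last_name) gist_trgm_ops);
The % and <-> expressions in the query would then have to use the same immutable_concat_ws(...) call, so they match the index expression.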
2.
Use a proper nearest-neighbor query, matching the above index:
SELECT username, email, first_name, last_name
, similarity(username , $1) AS s_username
, similarity(first_name, $1) AS s_first_name
, similarity(last_name , $1) AS s_last_name
, row_number() OVER () AS rank -- greatest similarity first
FROM auth_user
WHERE (username || ' ' || first_name || ' ' || last_name) % $1 -- !!
ORDER BY (username || ' ' || first_name || ' ' || last_name) <-> $1 -- !!
LIMIT $2;
Expressions in WHERE and ORDER BY must match the index expression!
In particular, ORDER BY rank (like you had it) will always perform poorly for a small LIMIT picking from a much larger pool of qualifying rows, because it cannot use an index directly: the sophisticated expression behind rank has to be calculated for every qualifying row, and then all of them have to be sorted before the small selection of best matches can be returned. This is much, much more expensive than a true nearest-neighbor query that can pick the best results from the index directly without even looking at the rest.
row_number() with an empty window definition just reflects the ordering produced by the ORDER BY of the same SELECT.
Related answers:
Best index for similarity function
Search in 300 million addresses with pg_trgm
As for your item 3., I added an answer to the question you referenced, which should explain it:
PostgreSQL GIN index slower than GIST for pg_trgm?