EDIT In my original question I noticed a difference between searching for neighbors using a JOIN and using a WHERE .. IN clause, which @LukaszSzozda rightfully pointed out is a semi-join. It turns out my node list had duplicates, which explains why the JOIN took longer to run. Thanks, @LukaszSzozda. The more important aspect of my question remains, though, and it is given below. UPDATE I added the relevant configuration options to the bottom, and updated statistics using ANALYZE (thanks to @joop). I also tested with three different indices (B-Tree, hash, BRIN). Finally, I noticed that using different queries returned different numbers of rows into tmp_nodes, possibly because of different ordering, so I fixed it to a constant set of rather random 8,000 nodes.
In PostgreSQL, my query to search for the neighbors of 8,000 nodes among ~200*10^6 nodes (within ~1.3*10^9 edges) is slow (~30 seconds using a hash index; see the index benchmarking below).
Given the setup I describe below, are there further optimizations for my server software, database, table or query to make the neighbor search faster? I am particularly surprised at this speed considering how well PostgreSQL did on the ArangoDB NoSQL benchmark.
More specifically:
I am aware of AgensGraph, but do not wish to move to a graph-database solution yet, specifically since I cannot tell how well AgensGraph keeps up to date with PostgreSQL. Can someone explain the performance benefits with regard to how the query actually happens in AgensGraph vs. PostgreSQL, so that I can decide whether to migrate?
Are there any configuration tweaks, whether in the server or the OS, that affect my query (according to the plan) and cause it to run longer than needed?
Set up
I have a large graph database (~10^9 edges, ~200*10^6 nodes) in PostgreSQL (PostgreSQL 10.1, which I had to pull from the zesty PPA) stored in the cloud (DigitalOcean, 6-core, 16GB RAM machine, Ubuntu 17.10, Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz), and set up with parameters suggested by PGTune (see bottom). I am querying on-server.
I have created forward- and backward-edges tables (see this question):
CREATE TABLE edges_fwd (src BIGINT, dest BIGINT, PRIMARY KEY (src, dest));
CREATE TABLE edges_back (src BIGINT, dest BIGINT, PRIMARY KEY (dest, src));
and clustered both by the respective keys (just in case):
CLUSTER edges_fwd USING edges_fwd_pkey;
CLUSTER edges_back USING edges_back_pkey;
I turned off enable_seqscan for the purpose of testing my queries (see side note below).
I would like to load all the out-edges for 8,000 nodes (which 8,000 nodes these are can change depending on the user query), whose identifiers are listed in a table tmp_nodes (with a single column, nid). I initially wrote this version of the query (patting myself on the back for already following the lines of the graph talk from PGCon11):
SELECT e.*
FROM tmp_nodes
JOIN edges_fwd AS e
ON e.src = tmp_nodes.nid;
I also tried:
SELECT * FROM edges_fwd AS e
WHERE e.src IN (SELECT nid FROM tmp_nodes);
They are both slow, taking about 30 seconds to run at best (using hash indices). EXPLAIN ANALYZE outputs are shown below.
I expected things to run much faster in general. When looking up 8,000 keys in a clustered table (yes, I know it's not really a clustered index), since the server knows the rows are ordered, I should expect fewer page reads than the total number of rows returned. So while 243,708 rows are fetched, which is not a small number, they are associated with only 8,000 distinct keys, and the number of reads should not be much larger than that: it averages out to about 30 rows per key, or about 1,400 bytes per read (the table is 56GB and has 1.3B rows, so it's about 46 bytes per row, which, by the way, is quite a lot of bloat for 16 bytes of data). This is far below the page size (4K) for the system. I didn't think reading 8,000 pages, even with random access, should take this long.
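(If it helps to sanity-check those figures, the on-disk sizes can be pulled from the catalog with a query like the one below; the per-row estimate will vary a bit with page overhead and how fresh the statistics are.)
SELECT pg_size_pretty(pg_relation_size('edges_fwd'))       AS heap_size,
       pg_size_pretty(pg_total_relation_size('edges_fwd')) AS total_size_with_indexes,
       pg_relation_size('edges_fwd') / NULLIF(n_live_tup, 0) AS approx_bytes_per_row
FROM pg_stat_user_tables
WHERE relname = 'edges_fwd';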
This brings me back to my questions (above).
Forcing index usage
I took advice from answers to another question and, at least for testing (though, since my database is read-only, I might be tempted to use it in production), set enable_seqscan to off in order to force index usage.
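For reference, that is just a session-level setting; the planner then attaches a very large cost penalty to sequential scans, which is why the plans below show costs around 10000000000:
SET enable_seqscan = off;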
I ran each query 5 times - the times varied by a few seconds here and there.
EXPLAIN ANALYZE outputs
Taking care to flush the OS disk cache and restart the server to reflect correct random-seek timings, I used EXPLAIN ANALYZE on both queries. I used two types of indexes - B-Tree and hash. I also tried BRIN with different values for the pages_per_range option (2, 8, 32 and 128), but they were all slower (by orders of magnitude) than those mentioned above. I am giving the results below for reference.
B-Tree index, JOIN query:
Nested Loop (cost=10000000000.58..10025160709.50 rows=15783833 width=16) (actual time=4.546..39152.408 rows=243708 loops=1)
-> Seq Scan on tmp_nodes (cost=10000000000.00..10000000116.00 rows=8000 width=8) (actual time=0.712..15.721 rows=8000 loops=1)
-> Index Only Scan using edges_fwd_pkey on edges_fwd e (cost=0.58..3125.34 rows=1973 width=16) (actual time=4.565..4.879 rows=30 loops=8000)
Index Cond: (src = tmp_nodes.nid)
Heap Fetches: 243708
Planning time: 20.962 ms
Execution time: 39175.454 ms
B-Tree index, WHERE .. IN query (semi-join):
Nested Loop (cost=10000000136.58..10025160809.50 rows=15783833 width=16) (actual time=9.578..42605.783 rows=243708 loops=1)
-> HashAggregate (cost=10000000136.00..10000000216.00 rows=8000 width=8) (actual time=5.903..35.750 rows=8000 loops=1)
Group Key: tmp_nodes.nid
-> Seq Scan on tmp_nodes (cost=10000000000.00..10000000116.00 rows=8000 width=8) (actual time=0.722..2.695 rows=8000 loops=1)
-> Index Only Scan using edges_fwd_pkey on edges_fwd e (cost=0.58..3125.34 rows=1973 width=16) (actual time=4.924..5.309 rows=30 loops=8000)
Index Cond: (src = tmp_nodes.nid)
Heap Fetches: 243708
Planning time: 19.126 ms
Execution time: 42629.084 ms
Hash index, JOIN query:
Nested Loop (cost=10000000051.08..10056052287.01 rows=15783833 width=16) (actual time=3.710..34131.371 rows=243708 loops=1)
-> Seq Scan on tmp_nodes (cost=10000000000.00..10000000116.00 rows=8000 width=8) (actual time=0.960..13.338 rows=8000 loops=1)
-> Bitmap Heap Scan on edges_fwd e (cost=51.08..6986.79 rows=1973 width=16) (actual time=4.086..4.250 rows=30 loops=8000)
Heap Blocks: exact=8094
-> Bitmap Index Scan on ix_edges_fwd_src_hash (cost=0.00..50.58 rows=1973 width=0) (actual time=2.563..2.563 rows=31 loops=8000)
Execution time: 34155.511 ms
Hash index, WHERE .. IN query (semi-join):
Nested Loop (cost=10000000187.08..10056052387.01 rows=15783833 width=16) (actual time=12.766..31834.767 rows=243708 loops=1)
-> HashAggregate (cost=10000000136.00..10000000216.00 rows=8000 width=8) (actual time=6.297..30.760 rows=8000 loops=1)
-> Seq Scan on tmp_nodes (cost=10000000000.00..10000000116.00 rows=8000 width=8) (actual time=0.883..3.108 rows=8000 loops=1)
-> Bitmap Heap Scan on edges_fwd e (cost=51.08..6986.79 rows=1973 width=16) (actual time=3.768..3.958 rows=30 loops=8000)
Heap Blocks: exact=8094
-> Bitmap Index Scan on ix_edges_fwd_src_hash (cost=0.00..50.58 rows=1973 width=0) (actual time=2.340..2.340 rows=31 loops=8000)
Execution time: 31857.692 ms
postgresql.conf settings
I set the following configuration options as suggested by PGTune:
max_connections = 10
shared_buffers = 4GB
effective_cache_size = 12GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 500
random_page_cost = 4
effective_io_concurrency = 2
work_mem = 69905kB
min_wal_size = 4GB
max_wal_size = 8GB
max_worker_processes = 6
max_parallel_workers_per_gather = 3
max_parallel_workers = 6
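(For what it's worth, most of these can also be applied from SQL instead of editing postgresql.conf; a few, such as shared_buffers, still require a restart. For example:)
ALTER SYSTEM SET effective_cache_size = '12GB';
ALTER SYSTEM SET random_page_cost = 4;
SELECT pg_reload_conf();  -- reloadable settings take effect immediately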
You also need indexes to search in the reversed direction:
CREATE TABLE edges_fwd
(src BIGINT
, dest BIGINT
, PRIMARY KEY (src, dest)
);
CREATE UNIQUE INDEX ON edges_fwd(dest, src);
CREATE TABLE edges_back
(src BIGINT
, dest BIGINT
, PRIMARY KEY (dest, src)
);
CREATE UNIQUE INDEX ON edges_back(src, dest);
SELECT fwd.*
FROM edges_back AS bck
JOIN edges_fwd AS fwd
ON fwd.src = bck.src -- bck.src does not have a usable index
WHERE bck.dest = root_id;
The absence of this index causes the hash join (or: table scan).
Also, you could maybe combine the two tables.
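A minimal sketch of that combined layout (one edges table plus a reverse-order index; the names here are only an example):
CREATE TABLE edges
        ( src BIGINT NOT NULL
        , dest BIGINT NOT NULL
        , PRIMARY KEY (src, dest)
        );
CREATE UNIQUE INDEX ON edges (dest, src);  -- covers lookups by destination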
Also, you can force the src and dest columns to be NOT NULL
(a null would make no sense in an edges table)
, and make them FOREIGN KEYs to your nodes table:
CREATE TABLE nodes
(nid BIGINT NOT NULL PRIMARY KEY
-- ... more stuff...
);
CREATE TABLE edges_fwd
(src BIGINT NOT NULL REFERENCES nodes(nid)
, dest BIGINT NOT NULL REFERENCES nodes(nid)
, PRIMARY KEY (src, dest)
);
CREATE TABLE edges_back
(src BIGINT NOT NULL REFERENCES nodes(nid)
, dest BIGINT NOT NULL REFERENCES nodes(nid)
, PRIMARY KEY (dest, src)
);
INSERT INTO nodes(nid)
SELECT a
FROM generate_series(1,1000) a -- 1000 rows
;
INSERT INTO edges_fwd(src, dest)
SELECT a.nid, b.nid
FROM nodes a
JOIN nodes b ON random()< 0.1 --100K rows
;
INSERT INTO edges_back(src, dest)
SELECT a.nid, b.nid
FROM nodes a
JOIN nodes b ON random()< 0.1 --100K rows
;
This will result in this plan:
DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
CREATE TABLE
CREATE TABLE
INSERT 0 1000
INSERT 0 99298
INSERT 0 99671
ANALYZE
ANALYZE
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.50..677.62 rows=9620 width=16) (actual time=0.086..5.299 rows=9630 loops=1)
-> Index Only Scan using edges_back_pkey on edges_back bck (cost=0.25..100.07 rows=97 width=8) (actual time=0.053..0.194 rows=96 loops=1)
Index Cond: (dest = 11)
Heap Fetches: 96
-> Index Only Scan using edges_fwd_pkey on edges_fwd fwd (cost=0.25..5.46 rows=99 width=16) (actual time=0.008..0.037 rows=100 loops=96)
Index Cond: (src = bck.src)
Heap Fetches: 9630
Planning time: 0.480 ms
Execution time: 5.836 ms
(9 rows)
It seems that random access for this kind of setup is just this slow. Running a script to check random access of 8,000 different, random 4K blocks within a large file takes nearly 30 seconds. Using Linux time and the linked script, I get on average something like 24 seconds:
File size: 8586524825 Read size: 4096
32768000 bytes read
real 0m24.076s
So it seems the assumption that random access should be quicker is wrong. Together with the time taken to read the actual index, it means performance is about as good as it can get without a hardware change. To improve performance, I will likely need a RAID setup or a cluster. If a RAID setup improves performance in a close-to-linear fashion, I will accept my own answer.
Related
I have a table with 4707838 rows. When I run the following query on this table it takes around 9 seconds to execute.
SELECT json_agg(
json_build_object('accessorId',
p."accessorId",
'mobile',json_build_object('enabled', p.mobile,'settings',
json_build_object('proximityAccess', p."proximity",
'tapToAccess', p."tapToAccess",
'clickToAccessRange', p."clickToAccessRange",
'remoteAccess',p."remote")
),
'card',json_build_object('enabled',p."card"),
'fingerprint',json_build_object('enabled',p."fingerprint"))
) AS permissions
FROM permissions AS p
WHERE p."accessPointId"=99
The output of explain analyze is as follows:
Aggregate (cost=49860.12..49860.13 rows=1 width=32) (actual time=9011.711..9011.712 rows=1 loops=1)
Buffers: shared read=29720
I/O Timings: read=8192.273
-> Bitmap Heap Scan on permissions p (cost=775.86..49350.25 rows=33991 width=14) (actual time=48.886..8704.470 rows=36556 loops=1)
Recheck Cond: ("accessPointId" = 99)
Heap Blocks: exact=29331
Buffers: shared read=29720
I/O Timings: read=8192.273
-> Bitmap Index Scan on composite_key_accessor_access_point (cost=0.00..767.37 rows=33991 width=0) (actual time=38.767..38.768 rows=37032 loops=1)
Index Cond: ("accessPointId" = 99)
Buffers: shared read=105
I/O Timings: read=32.592
Planning Time: 0.142 ms
Execution Time: 9012.719 ms
This table has a btree index on the accessorId column and a composite index on (accessorId, accessPointId).
Can anyone tell me what could be the reason for this query to be slow even though it uses an index?
Over 90% of the time is spent waiting to get data from disk. At roughly 0.28 ms per read (about 8,192 ms of read time over 29,720 block reads), that is pretty fast for a hard drive (suggesting that much of the data was already in the filesystem cache, or that some of the reads brought in neighboring data that was also eventually required--that is, sequential reads, not just random reads) but slow for an SSD.
If you set enable_bitmapscan=off and clear the cache (or pick a not recently used "accessPointId" value) what performance do you get?
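That test could look roughly like this (a sketch; substitute the full json_agg query from the question for the abbreviated select):
SET enable_bitmapscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)                    -- or the original json_agg(...) expression
FROM permissions AS p
WHERE p."accessPointId" = 99;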
How big is the table? If you are reading a substantial fraction of the table and think you are not getting as much benefit from sequential reads as you should be, you can try making your OS's readahead settings more aggressive. On Linux that is something like sudo blockdev --setra ...
You could put all columns referred to by the query into the index, to enable index-only scans. But given the number of columns you are using, that might be impractical. You would want "accessPointId" to be the first column in the index. By the way, is the index currently being used really on (accessorId, accessPointId)? It looks to me like "accessPointId" is really the first column in that index, not the 2nd one.
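A covering index along those lines could look like the following (PostgreSQL 11+ INCLUDE syntax; the column list is an assumption based on the columns the query touches):
CREATE INDEX ON permissions ("accessPointId")
    INCLUDE ("accessorId", mobile, "proximity", "tapToAccess",
             "clickToAccessRange", "remote", "card", "fingerprint");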
You could cluster the table by an index which has "accessPointId" as the first column. That would group the related records together for faster access. But note it is a slow operation and takes a strong lock on the table while it is running, and future data going into the table won't be clustered, only the current data.
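For example (the index name is hypothetical; any btree whose leading column is "accessPointId" would do):
CREATE INDEX permissions_accesspointid_idx ON permissions ("accessPointId");
CLUSTER permissions USING permissions_accesspointid_idx;
ANALYZE permissions;  -- refresh statistics after the table rewrite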
You could try to increase effective_io_concurrency so that you can have multiple io requests outstanding at a time. How effective this is will depend on your hardware.
I am new to Postgres and am sure I'm doing something wrong.
So I just wondered if anybody had experienced something similar to my experiences below or could point me in the right direction to improve Postgres performance.
My initial goal was to speed up the analytical processing of my Datamarts in various Dashboards by moving from MS SQL Server to Postgres.
To get a sample query to compare speeds I ran query profiler on MS SQL Server whilst referencing a BI dashboard, which produced something similar to this (I know there are redundant columns in the sub query):
SELECT COUNT(*)
FROM (
SELECT
BM.Key_Date, BM.[Actual Date], BM.[Month]
,BM.[Month Number], BM.[Month Year], BM.[No of Working Days]
,SDI.Key_Delivery, SDI.[Order Number], SDI.[Quantity SKU]
,SDI.[Quantity Sales Unit], SDI.[FactSales - GBP], SDI.[NNSA Capsules]
,SFI.[Ship-to], SFI.[Sold-to], SFI.[Sales Force Type], SFI.Region
,SFI.[Top Level Account], SFI.[Customer Organisation]
,EX.Rate
,PDI.[Product Description], PDI.[Product Type Group], PDI.[Product Type],
PDI.[Main Product Categories], PDI.Section, PDI.Family
FROM Fact.SalesDataInvoiced AS SDI
JOIN Dimension.SalesforceInvoiced AS SFI
ON SDI.[Key_Ship-to]=SFI.[Key_Ship-to]
JOIN Dimension.BillingMonth AS BM
ON SDI.[Key_Billing Month]=BM.Key_Date
JOIN Dimension.ProductDataInvoiced AS PDI
ON SDI.[Key_Product Code]=PDI.[Key_Product Code]
CROSS JOIN Dimension.Exchange AS EX
WHERE BM.[Actual Date] BETWEEN '20160101' AND '20211001'
) AS a
GROUP BY [Product Type], [Product Type Group],[Main Product Categories]
I then installed Postgres 14 (on Centos 8) and MS SQL Server Developer 2017 (on windows 10) on separate identical laptops and created a Database and tables from the same csv data files to enable the replication of the above query.
Running a Postgres query with indexing performs massively slower than MS SQL without indexing.
Adding indexes to MS SQL produces results almost instantly.
Because of the difference in processing time I even installed Citus with Postgres14 and created Fact.SalesDataInvoiced as a columnar table (This made the processing time worse).
I have played about with memory settings in postgresql.conf but nothing seems to enable speeds comparable to MSSQL.
Explain Analyze shows that despite the indexes it always runs a sequential scan of all tables. Forcing indexed scans doesn't make any difference to processing time.
Would I be right in thinking Postgres would perform significantly better using a cluster and partitioning? Even if this is the case, surely a simple query like the one I'm trying to run on a standalone machine should be faster?
TABLE DETAILS
Dimension.BillingMonth
Records 120,
Primary Key is KeyDate,
Clustered Unique Index on KeyDate
Dimension.Exchange
Records 1
Dimension.ProductDataInvoiced
Records 275563,
Primary Key is KeyProduct,
Clustered Unique Index on KeyProduct
Dimension.SalesforceInvoiced
Records 377414,
Primary Key is KeyShipTo,
Clustered Unique Index on KeyShipTo
Fact.SalesDataInvoiced
Records 43807943,
Non-Clustered Unique Index on KeyShipTo, KeyProduct, KeyBillingMonth
Any help would be appreciated as previously mentioned I'm sure I must be missing something obvious.
Many thanks in advance.
David
Thank you for the responses. I have placed additional info below.
I forgot to add that my Postgres performance woes came after I'd carried out a full VACUUM and REINDEX. I performed these maintenance tasks after I had imported the data and created my indexes.
Output after querying pg_indexes:
tablename           | indexname                | indexdef
--------------------+--------------------------+----------------------------------------------------------------------------------------------------------------
BillingMonth        | BillingMonth_pkey        | CREATE UNIQUE INDEX BillingMonth_pkey ON public.BillingMonth USING btree (KeyDate)
ProductDataInvoiced | ProductDataInvoiced_pkey | CREATE UNIQUE INDEX ProductDataInvoiced_pkey ON public.ProductDataInvoiced USING btree (KeyProductCode)
SalesforceInvoiced  | SalesforceInvoiced_pkey  | CREATE UNIQUE INDEX SalesforceInvoiced_pkey ON public.SalesforceInvoiced USING btree (KeyShipTo)
SalesDataInvoiced   | CI_SalesData             | CREATE INDEX CI_SalesData ON public.SalesDataInvoiced USING btree (KeyShipTo, KeyProductCode, KeyBillingMonth)
Output After running EXPLAIN (ANALYZE, BUFFERS)
Finalize GroupAggregate (cost=1435439.30..1435565.71 rows=480 width=53) (actual time=25960.468..25973.326 rows=31 loops=1)
Group Key: pdi."ProductType", pdi."ProductTypeGroup", pdi."MainProductCategories"
Buffers: shared hit=71246 read=859119
-> Gather Merge (cost=1435439.30..1435551.31 rows=960 width=53) (actual time=25960.458..25973.282 rows=89 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=71246 read=859119
-> Sort (cost=1434439.28..1434440.48 rows=480 width=53) (actual time=25956.982..25956.989 rows=30 loops=3)
Sort Key: pdi."ProductType", pdi."ProductTypeGroup", pdi."MainProductCategories"
Sort Method: quicksort Memory: 28kB
Buffers: shared hit=71246 read=859119
Worker 0: Sort Method: quicksort Memory: 29kB
Worker 1: Sort Method: quicksort Memory: 29kB
-> Partial HashAggregate (cost=1434413.10..1434417.90 rows=480 width=53) (actual time=25956.878..25956.895 rows=30 loops=3)
Group Key: pdi."ProductType", pdi."ProductTypeGroup", pdi."MainProductCategories"
Batches: 1 Memory Usage: 49kB
Buffers: shared hit=71230 read=859119
Worker 0: Batches: 1 Memory Usage: 49kB
Worker 1: Batches: 1 Memory Usage: 49kB
-> Parallel Hash Join (cost=62124.74..1327935.46 rows=10647764 width=45) (actual time=285.864..19240.004 rows=14602648 loops=3)
Hash Cond: (sdi."KeyShipTo" = sfi."KeyShipTo")
Buffers: shared hit=71230 read=859119
-> Hash Join (cost=19648.48..1257508.51 rows=10647764 width=49) (actual time=204.794..12862.063 rows=14602648 loops=3)
Hash Cond: (sdi."KeyProductCode" = pdi."KeyProductCode")
Buffers: shared hit=32264 read=859119
-> Hash Join (cost=3.67..1091456.95 rows=10647764 width=8) (actual time=0.143..7076.104 rows=14602648 loops=3)
Hash Cond: (sdi."KeyBillingMonth" = bm."KeyDate")
Buffers: shared hit=197 read=859119
-> Parallel Seq Scan on "SalesData_Invoiced" sdi (cost=0.00..1041846.10 rows=18253310 width=12) (actual
time=0.071..2585.596 rows=14602648 loops=3)
Buffers: shared hit=194 read=859119
-> Hash (cost=2.80..2.80 rows=70 width=4) (actual time=0.049..0.050 rows=70 loops=3)
Hash Cond: (sdi."KeyBillingMonth" = bm."KeyDate")
Buffers: shared hit=197 read=859119
-> Parallel Seq Scan on "SalesData_Invoiced" sdi (cost=0.00..1041846.10 rows=18253310 width=12) (actual
time=0.071..2585.596 rows=14602648 loops=3)
Buffers: shared hit=194 read=859119
-> Hash (cost=2.80..2.80 rows=70 width=4) (actual time=0.049..0.050 rows=70 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 11kB
Buffers: shared hit=3
-> Seq Scan on "BillingMonth" bm (cost=0.00..2.80 rows=70 width=4) (actual time=0.012..0.028
rows=70 loops=3)
Filter: (("ActualDate" >= '2016-01-01'::date) AND ("ActualDate" <= '2021-10-01'::date))
Rows Removed by Filter: 50
Buffers: shared hit=3
-> Hash (cost=16200.27..16200.27 rows=275563 width=49) (actual time=203.237..203.238 rows=275563 loops=3)
Buckets: 524288 Batches: 1 Memory Usage: 26832kB
Buffers: shared hit=32067
-> Nested Loop (cost=0.00..16200.27 rows=275563 width=49) (actual time=0.034..104.143 rows=275563 loops=3)
Buffers: shared hit=32067
-> Seq Scan on "Exchange" ex (cost=0.00..1.01 rows=1 width=0) (actual time=0.024..0.024 rows=
1 loops=3)
Buffers: shared hit=3
-> Seq Scan on "ProductData_Invoiced" pdi (cost=0.00..13443.63 rows=275563 width=49) (actual
time=0.007..48.176 rows=275563 loops=3)
Buffers: shared hit=32064
-> Parallel Hash (cost=40510.56..40510.56 rows=157256 width=4) (actual time=79.536..79.536 rows=125805 loops=3)
Buckets: 524288 Batches: 1 Memory Usage: 18912kB
Buffers: shared hit=38938
-> Parallel Seq Scan on "Salesforce_Invoiced" sfi (cost=0.00..40510.56 rows=157256 width=4) (actual time=
0.011..42.968 rows=125805 loops=3)
Buffers: shared hit=38938
Planning:
Buffers: shared hit=426
Planning Time: 1.936 ms
Execution Time: 25973.709 ms
(55 rows)
Firstly, remember to run VACUUM ANALYZE after rebuilding indexes, or sometimes after importing a large amount of data. (VACUUM FULL is mainly useful for returning disk space to the OS, and you'd still need to analyze afterwards, especially after rebuilding indexes.)
It seems from your query that your main table is SalesDataInvoiced (SDI) and that you'd want to use an index on KeyBillingMonth if possible (since it's the main restriction you're placing). In general, you'd also want indexes on the other tables, at least on the columns that are used for the joins.
As the documentation for multi-column indexes in PostgreSQL says:
A multicolumn B-tree index can be used with query conditions that involve any subset of the index's columns, but the index is most efficient when there are constraints on the leading (leftmost) columns. The exact rule is that equality constraints on leading columns, plus any inequality constraints on the first column that does not have an equality constraint, will be used to limit the portion of the index that is scanned. Constraints on columns to the right of these columns are checked in the index, so they save visits to the table proper, but they do not reduce the portion of the index that has to be scanned. For example, given an index on (a, b, c) and a query condition WHERE a = 5 AND b >= 42 AND c < 77, the index would have to be scanned from the first entry with a = 5 and b = 42 up through the last entry with a = 5. Index entries with c >= 77 would be skipped, but they'd still have to be scanned through. This index could in principle be used for queries that have constraints on b and/or c with no constraint on a — but the entire index would have to be scanned, so in most cases the planner would prefer a sequential table scan over using the index.
In your example, the main column you'd want to use a constraint on (KeyBillingMonth) is in third position, so it's unlikely to be used.
CREATE INDEX CI_SalesData ON public.SalesDataInvoiced
USING btree (KeyShipTo, KeyProductCode, KeyBillingMonth)
Creating the following index should make that more likely:
CREATE INDEX ON SalesDataInvoiced(KeyBillingMonth);
Then, run VACUUM ANALYZE and try your query again.
You may also want an index on BillingMonth(ActualDate), but that's not necessarily useful since there seem to be few rows (and most of them are returned by your query).
It's not clear what the BillingMonth table is for. If it's basically about truncating the ActualDate to have the first day of the month, you could for example get rid of the join on BillingMonth and use the constraint on SalesDataInvoiced.KeyBillingMonth directly. For example ... WHERE SDI.KeyBillingMonth BETWEEN '2016-01-01' AND '2021-10-01' ....
As a side note, as far as I know, BETWEEN is inclusive of its upper bound. I'd imagine a query like this is meant to represent monthly statistics, so it should probably not include 2021-10-01 itself when it doesn't include the rest of that month.
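If monthly boundaries are the intent, a half-open range avoids the inclusive upper bound (assuming KeyBillingMonth is a date):
WHERE SDI."KeyBillingMonth" >= DATE '2016-01-01'
  AND SDI."KeyBillingMonth" <  DATE '2021-10-01'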
I have a table where I store information about a customer and the timestamp, and time range of an event.
The indices I use look as follows:
event_index (customer_id, time)
state_index (customer_id, end, start desc)
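In DDL terms these would be roughly the following (the events table name is an assumption; "time" and "end" are quoted because they are keywords):
CREATE INDEX event_index ON events (customer_id, "time");
CREATE INDEX state_index ON states (customer_id, "end", start DESC);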
The vast majority of queries query the last few days about state and events.
This is a sample query text (events have an identical problem to the one I'll describe for states):
SELECT "states".*
FROM "states"
WHERE ("states"."customer_id" = $1 AND "states"."start" < $2)
AND ("states"."end" IS NULL OR "states"."end" > $3)
AND ("states"."obsolete" = $4)
ORDER BY "states"."start" DESC
I see that sometimes the query planner uses only the customer_id to filter, and then filters all of the customer's rows with a heap scan:
Sort (cost=103089.00..103096.17 rows=2869 width=78)
Sort Key: start DESC
-> Bitmap Heap Scan on states (cost=1222.56..102924.23 rows=2869 width=78)
Recheck Cond: (customer_id = '----'::bpchar)
Filter: ((NOT obsolete) AND ((start)::double precision < '1557711009'::double precision) AND ((end IS NULL) OR ((end)::double precision > '1557666000'::double precision)))
-> Bitmap Index Scan on states_index (cost=0.00..1221.85 rows=26820 width=0)
Index Cond: (customer_id = '----'::bpchar)
This is in contrast to what I see in a session manually:
Sort Key: start DESC
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on states (cost=111.12..9338.04 rows=1 width=78) (actual time=141.674..141.674 rows=0 loops=1)
Recheck Cond: (((customer_id = '-----'::bpchar) AND (end IS NULL) AND (start < '1557349200'::numeric)) OR ((customer_id = '----'::bpchar) AND (end > '1557249200'::numeric) AND (start < '1557349200'::numeric)))
Filter: ((NOT obsolete) AND ((title)::text = '---'::text))
Rows Removed by Filter: 112
Heap Blocks: exact=101
-> BitmapOr (cost=111.12..111.12 rows=2333 width=0) (actual time=4.198..4.198 rows=0 loops=1)
-> Bitmap Index Scan on states_index (cost=0.00..4.57 rows=1 width=0) (actual time=0.086..0.086 rows=0 loops=1)
Index Cond: ((customer_id = '----'::bpchar) AND (end IS NULL) AND (start < '1557349200'::numeric))
-> Bitmap Index Scan on state_index (cost=0.00..106.55 rows=2332 width=0) (actual time=4.109..4.109 rows=112 loops=1)
Index Cond: ((customer_id = '---'::bpchar) AND (end > '1557262800'::numeric) AND (start < '1557349200'::numeric))
In other words, the query planner sometimes chooses to use only the first column of the index, which slows the query significantly.
I can see why it makes sense to just bring in all of a customer's data when it's small enough and filter in memory, but the problem is that this data is very sparse and probably not entirely cached (data from a year ago is probably not cached for the customer; the database is a few hundred GBs). If the index were used to its fullest extent (as in the second example), the result should be much faster, since recent data is cached.
I used a partial index on the last week to see if the query time drops but postgres only uses it sometimes. This solves the problem when the partial index is used since old rows do not exist in that index - but sadly postgres still selects the bigger index even when it doesn't have to. I ran vacuum analyze but to no visible effect.
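A partial index of that kind might look like this (hypothetical; the cutoff has to be a constant, which is also why such an index goes stale and needs recreating):
CREATE INDEX states_recent_idx ON states (customer_id, "end", start DESC)
    WHERE start > 1557100000;  -- epoch cutoff roughly one week back at creation time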
I tried to see the cache hits using this:
Database Name | Temporary files | Size of temporary files | Block Hits | Block Reads
------------------+-----------------+-------------------------+---------------+-------------
customers | 1922 | 18784440622 | 69553504584 | 2401546773
And then I calculated (block_hits/(block_hits + block_reads)):
>>> 69553504584.0 / (69553504584.0 + 2401546773.0)
0.9666243477322406
So this shows me a ~96.6% cache hit rate (I want it much closer to 100% because I know the nature of the queries).
I also tried increasing statistics (SET STATISTICS) on the customer_id, start and end columns, since it seemed to be a suggestion for people facing query planner issues. It didn't help either (and I ran ANALYZE afterwards).
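A statistics bump of that kind would look roughly like this (a sketch; 1000 is an arbitrary target):
ALTER TABLE states ALTER COLUMN customer_id SET STATISTICS 1000;
ALTER TABLE states ALTER COLUMN start SET STATISTICS 1000;
ALTER TABLE states ALTER COLUMN "end" SET STATISTICS 1000;
ANALYZE states;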
After reading further about this issue I saw that there is a way to make the query planner prefer index scans by using a lower random_page_cost than the default (4). I also saw a post backing that here:
https://amplitude.engineering/how-a-single-postgresql-config-change-improved-slow-query-performance-by-50x-85593b8991b0
Does this make sense for my use case? Will it make the query planner use the index to the fullest more often (preferably always)?
If not - is there something else I can do to lower the query time? I know partitioning can be very effective, but it seems to be overkill and is not fully supported in my current Postgres version (9.5.9), as far as I can tell from what I've read.
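For reference, lowering random_page_cost can be done per session or cluster-wide (1.1 is only an example value, commonly suggested for SSD-backed or well-cached data):
SET random_page_cost = 1.1;               -- current session only, for testing
ALTER SYSTEM SET random_page_cost = 1.1;  -- cluster-wide (PostgreSQL 9.4+)
SELECT pg_reload_conf();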
Update: After lowering random_page_cost I don't see a conclusive difference. There are still times when the query planner chooses to use only part of the index, with a much slower result.
Any suggestions are very welcome.
Thanks :)
I have created a 36M rows table with an index on type column:
CREATE TABLE items AS
SELECT
(random()*36000000)::integer AS id,
(random()*10000)::integer AS type,
md5(random()::text) AS s
FROM
generate_series(1,36000000);
CREATE INDEX items_type_idx ON items USING btree ("type");
I run this simple query and expect postgresql to use my index:
explain select count(*) from "items" group by "type";
But the query planner decides to use Seq Scan instead:
HashAggregate (cost=734592.00..734627.90 rows=3590 width=12) (actual time=6477.913..6478.344 rows=3601 loops=1)
Group Key: type
-> Seq Scan on items (cost=0.00..554593.00 rows=35999800 width=4) (actual time=0.044..1820.522 rows=36000000 loops=1)
Planning time: 0.107 ms
Execution time: 6478.525 ms
Time without EXPLAIN: 5s 979ms
I have tried several solutions from here and here:
Run VACUUM ANALYZE
Configure default_statistics_target, random_page_cost, work_mem
but nothing helps apart from setting enable_seqscan = OFF:
SET enable_seqscan = OFF;
explain select count(*) from "items" group by "type";
GroupAggregate (cost=0.56..1114880.46 rows=3590 width=12) (actual time=5.637..5256.406 rows=3601 loops=1)
Group Key: type
-> Index Only Scan using items_type_idx on items (cost=0.56..934845.56 rows=35999800 width=4) (actual time=0.074..2783.896 rows=36000000 loops=1)
Heap Fetches: 0
Planning time: 0.103 ms
Execution time: 5256.667 ms
Time without EXPLAIN: 659ms
Query with index scan is about 10x faster on my machine.
Is there a better solution than setting enable_seqscan?
UPD1
My postgresql version is 9.6.3, work_mem = 4MB (tried 64MB), random_page_cost = 4 (tried 1.1), max_parallel_workers_per_gather = 0 (tried 4).
UPD2
I have tried to fill the type column not with random numbers, but with i / 10000 to make pg_stats.correlation = 1 - still a seq scan.
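That variant would be something like the following (a sketch of what is described above; integer division makes type increase monotonically, giving correlation = 1):
CREATE TABLE items AS
SELECT
    g AS id,
    g / 10000 AS type,
    md5(random()::text) AS s
FROM generate_series(1, 36000000) g;
CREATE INDEX items_type_idx ON items USING btree ("type");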
UPD3
@jgh is 100% right:
This typically only happens when the table's row width is much wider than some indexes
I've made the column data large and now Postgres uses the index. Thanks, everyone!
The Index-only scans wiki says
It is important to realise that the planner is concerned with
minimising the total cost of the query. With databases, the cost of
I/O typically dominates. For that reason, "count(*) without any
predicate" queries will only use an index-only scan if the index is
significantly smaller than its table. This typically only happens when
the table's row width is much wider than some indexes'.
and
Index-only scans are only used when the planner surmises that that
will reduce the total amount of I/O required, according to its
imperfect cost-based modelling. This all heavily depends on visibility
of tuples, if an index would be used anyway (i.e. how selective a
predicate is, etc), and if there is actually an index available that
could be used by an index-only scan in principle
Accordingly, your index is not considered "significantly smaller" and the entire dataset is to be read, which leads the planner to use a seq scan.
We have queries of the form
select sum(acol)
where xpath_exists('/Root/KeyValue[Key="val"]/Value//text()', xmlcol)
What index can be built to speed up the WHERE clause?
A btree index created using
create index idx_01 using btree(xpath_exists('/Root/KeyValue[Key="val"]/Value//text()', xmlcol))
does not seem to be used at all.
EDIT
Setting enable_seqscan to off, the query using xpath_exists is much faster (by one order of magnitude), and the plan clearly shows it using the corresponding index (the btree index built with xpath_exists).
Any clue why PostgreSQL would not be using the index and instead attempts a much slower sequential scan?
Since I do not want to disable sequential scanning globally, I am back to square one and happily welcome any suggestions.
EDIT 2 - Explain plans
See below - Cost of first plan (seqscan off) is slightly higher but processing time much faster
b2box=# set enable_seqscan=off;
SET
b2box=# explain analyze
Select count(*)
from B2HEAD.item
where cluster = 'B2BOX' and ( ( xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()', content) ) ) offset 0 limit 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=22766.63..22766.64 rows=1 width=0) (actual time=606.042..606.042 rows=1 loops=1)
-> Aggregate (cost=22766.63..22766.64 rows=1 width=0) (actual time=606.039..606.039 rows=1 loops=1)
-> Bitmap Heap Scan on item (cost=1058.65..22701.38 rows=26102 width=0) (actual time=3.290..603.823 rows=4085 loops=1)
Filter: (xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()'::text, content, '{}'::text[]) AND ((cluster)::text = 'B2BOX'::text))
-> Bitmap Index Scan on item_counter_01 (cost=0.00..1052.13 rows=56515 width=0) (actual time=2.283..2.283 rows=4085 loops=1)
Index Cond: (xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()'::text, content, '{}'::text[]) = true)
Total runtime: 606.136 ms
(7 rows)
plan on explain.depesz.com
b2box=# set enable_seqscan=on;
SET
b2box=# explain analyze
Select count(*)
from B2HEAD.item
where cluster = 'B2BOX' and ( ( xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()', content) ) ) offset 0 limit 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=22555.71..22555.72 rows=1 width=0) (actual time=10864.163..10864.163 rows=1 loops=1)
-> Aggregate (cost=22555.71..22555.72 rows=1 width=0) (actual time=10864.160..10864.160 rows=1 loops=1)
-> Seq Scan on item (cost=0.00..22490.45 rows=26102 width=0) (actual time=33.574..10861.672 rows=4085 loops=1)
Filter: (xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()'::text, content, '{}'::text[]) AND ((cluster)::text = 'B2BOX'::text))
Rows Removed by Filter: 108945
Total runtime: 10864.242 ms
(6 rows)
plan on explain.depesz.com
Planner cost parameters
Cost of first plan (seqscan off) is slightly higher but processing time much faster
This tells me that your random_page_cost and seq_page_cost are probably wrong. You're likely on storage with fast random I/O - either because most of the database is cached in RAM or because you're using SSD, SAN with cache, or other storage where random I/O is inherently fast.
Try:
SET random_page_cost = 1;
SET seq_page_cost = 1.1;
to greatly reduce the cost parameter differences, then re-run. If that does the job, consider changing those params in postgresql.conf.
Your row-count estimates are reasonable, so it doesn't look like a planner mis-estimation problem or a problem with bad table statistics.
Incorrect query
Your query is also incorrect. OFFSET 0 LIMIT 1 without an ORDER BY will produce unpredictable results unless you're guaranteed to have exactly one match, in which case the OFFSET ... LIMIT ... clauses are unnecessary and can be removed entirely.
You're usually much better off phrasing such queries as SELECT max(...) or SELECT min(...) where possible; PostgreSQL will tend to be able to use an index to just pluck off the desired value without doing an expensive table scan or an index scan and sort.
Tips
BTW, for future questions the PostgreSQL wiki has some good information in the performance category and a guide to asking Slow query questions.