I am trying to understand the physical layout of a PostgreSQL index. What I have learned so far is that an index is stored as a set of pages organized in a B-tree data structure. I am trying to understand how vacuuming impacts indexes. Does it help to contain their size?
B-tree indexes are decades-old technology, so a web search will turn up plenty of good, detailed descriptions. In a nutshell:
A B-tree is a balanced tree of index pages (8KB in PostgreSQL), that is, every branch of the tree has the same depth.
The tree is usually drawn upside down, the starting (top) node is the root node, and the pages at the bottom are called leaf nodes.
Each level of the tree partitions the search space; the deeper the level, the finer the partitioning, until the individual index entries are reached in the leaf nodes.
Each entry in an index page points to a table entry (in the leaf nodes) or to another index page at the next level.
This is a sketch of an index with depth three, but mind the following:
some nodes are omitted, in reality all leaf nodes are on level 3
in reality there are not three entries (keys) in one node, but around 100
┌───────────┐
level 1 (root node) │ 20 75 100 │
└───────────┘
╱ ╱ │ ╲
╱ ╱ │ ╲
╱ ╱ │ ╲
┌───────────┐┌─────┐┌──────────┐┌─────┐
level 2 │ 5 10 15 ││ ... ││ 80 87 95 ││ ... │
└───────────┘└─────┘└──────────┘└─────┘
╱ ╱ │ ╲
╱ ╱ │ ╲
╱ ╱ │ ╲
┌─────┐┌─────┐┌──────────┐┌─────┐
level 3 (leaf nodes) │ ... ││ ... ││ 89 91 92 ││ ... │
└─────┘└─────┘└──────────┘└─────┘
Some notes:
The pointers to the next level are actually in the gaps between the entries; searching in an index is like “drilling down” to the correct leaf page.
Each node is also linked with its siblings to facilitate insertion and deletion of nodes.
When a node is full, it is split into two new nodes. This splitting can recurse upward and even reach the root node. When the root node is split, the depth of the index increases by 1.
In real life, the depth of a B-tree index can hardly exceed 5.
When an index entry is deleted, an empty space remains. There are techniques to consolidate that by joining pages, but this is tricky, and PostgreSQL doesn't do that.
Now to your question:
When a table (heap) entry is removed by VACUUM because it is not visible for any active snapshot, the corresponding entry in the index is removed as well. This results in empty space in the index, which can be reused by future index entries.
Empty index pages can be deleted, but the depth of the index will never be reduced. So a mass deletion can (after VACUUM has done its job) reduce the index size, but it will more likely result in a bloated index with pages that contain only a few keys and lots of empty space.
A certain amount of index bloat (up to more than 50%) is normal, but if unusual usage patterns like mass updates and deletes cause bad index bloat, you'll have to rewrite the index with REINDEX to get rid of it. Unfortunately, this operation locks the index, so all concurrent access is blocked until it is done.
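If you want to measure how bloated an index actually is before rebuilding, the pgstattuple contrib extension can report it; a minimal sketch (the index name is just a placeholder):

CREATE EXTENSION IF NOT EXISTS pgstattuple;

SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('my_table_my_index');

-- rebuilds the index, blocking concurrent access to it until it is done
REINDEX INDEX my_table_my_index;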
I have an ltree column containing a tree with a depth of 3. I'm trying to write a query that can select all children at a specific depth (level 1 = get all parents, 2 = get all children, 3 = get all grandchildren). I know this is pretty straightforward with nlevel:
SELECT path FROM hierarchies
WHERE
nlevel(path) = 1
LIMIT 1000;
I have 200,000 dummy records and it's pretty fast (~170 ms). However, this query uses a sequential scan. I think it'd be better to write it in a way that takes advantage of the ltree operators supported by the GiST index. Frustratingly, I can't seem to wrap my brain around them, and I haven't found a similar question on SO or DBA (besides this one on finding leaves)
Any advice is appreciated!
The only index that could support your query is a simple B-tree index on an expression.
create index on hierarchies((nlevel(path)));
Note however that it is quite possible for the planner to choose a sequential scan anyway, for example when the number of rows with level 1 is much larger than for the other levels.
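To verify which plan you actually get, compare EXPLAIN output with and without the index; a minimal sketch (level 2 is used here only as an example of a more selective predicate):

EXPLAIN (ANALYZE, BUFFERS)
SELECT path FROM hierarchies
WHERE
    nlevel(path) = 2
LIMIT 1000;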
We have a Postgres 11.2 database which stores time-series of values against a composite key. Given 1 or a number of keys, the query tries to find the latest value(s) in each time-series given a time constraint.
We suffer query timeouts when the data is not cached, because it seems to have to walk a huge number of pages in order to find the data.
Here is the relevant section in the explain. We are getting the data for a single time-series (with 367 values in this example):
-> Index Scan using quotes_idx on quotes q (cost=0.58..8.61 rows=1 width=74) (actual time=0.011..0.283 rows=367 loops=1)
Index Cond: ((client_id = c.id) AND (quote_detail_id = qd.id) AND (effective_at <= '2019-09-26 00:59:59+01'::timestamp with time zone) AND (effective_at >= '0001-12-31 23:58:45-00:01:15 BC'::timestamp with time zone))
Buffers: shared hit=374
This is the definition of the index in question:
CREATE UNIQUE INDEX quotes_idx ON quotes.quotes USING btree (client_id, quote_detail_id, effective_at);
Where the columns are 2x int4 and a timestamptz, respectively.
Assuming I'm reading the output correctly, why is the engine walking 374 pages (~3 MB, given our 8 KB page size) in order to return ~26 KB of data (367 rows of width 74 bytes)?
When we scale up the number of keys (say, 500) the engine ends up walking over 150k pages (over 1GB), which when not cached, takes a significant time.
Note, the average row size in the underlying table is 82 bytes (over 11 columns), and the table contains around 700 million rows.
Thanks in advance for any insights!
The 367 rows found in your index scan are probably stored in more than 300 table blocks (that is not surprising in a large table). So PostgreSQL has to access all these blocks to come up with a result.
This would perform much better if the rows were all concentrated in a few table blocks, in other words, if the logical ordering of the index corresponded to the physical order of the rows in the table. In PostgreSQL terms, a high correlation would be beneficial.
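You can check the current correlation of the leading index columns in the pg_stats view; a minimal sketch, following the schema and column names from the index definition above:

SELECT attname, correlation
FROM pg_stats
WHERE schemaname = 'quotes'
  AND tablename = 'quotes'
  AND attname IN ('client_id', 'quote_detail_id', 'effective_at');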
You can force PostgreSQL to rewrite the entire table in the correct order with
CLUSTER quotes USING quotes_idx;
Then your query should become much faster.
There are some disadvantages though:
While CLUSTER is running, the table is not accessible. This usually means downtime.
Right after CLUSTER, performance will be good, but PostgreSQL does not maintain the ordering. Subsequent data modifications will reduce the correlation.
To keep the query performing well, you'll have to schedule CLUSTER regularly.
Reading 374 blocks to obtain 367 rows is not unexpected. CLUSTERing the data is one way to address that, as already mentioned. Another possibility is to add some more columns into the index column list (by creating a new index and dropping the old one), so that the query can be satisfied with an index-only-scan.
This requires no downtime if the index is created concurrently. You do have to keep the table well-vacuumed, which can be tricky, as the autovacuum parameters were really not designed with index-only scans in mind. It requires no maintenance other than the vacuuming, so I would prefer this method if the list (and size) of columns you need to add to the index is small.
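A sketch of that approach on PostgreSQL 11 with a covering index; the extra column name is purely hypothetical:

-- "price" is a placeholder for whatever column(s) the query actually selects
CREATE UNIQUE INDEX CONCURRENTLY quotes_covering_idx
    ON quotes.quotes USING btree (client_id, quote_detail_id, effective_at)
    INCLUDE (price);

-- only possible if quotes_idx does not back a unique or primary key constraint
DROP INDEX CONCURRENTLY quotes.quotes_idx;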
Question
I would like to know: How can I rewrite/alter my search query/strategy to get an acceptable performance for my end users?
The search
I'm implementing a search for our users; they can search for candidates on our system based on:
A professional group they fall into,
A location + radius,
A full text search.
The query
select v.id
from (
select
c.id,
c.ts_description,
c.latitude,
c.longitude,
g.group
from entities.candidates c
join entities.candidates_connections cc on cc.candidates_id = c.id
join system.groups g on cc.systems_id = g.id
) v
-- Group selection
where v.group = 'medical'
-- Location + radius
and earth_distance(ll_to_earth(v.latitude, v.longitude), ll_to_earth(50.87050439999999, -1.2191283)) < 48270
-- Full text search
and v.ts_description @@ to_tsquery('simple', 'nurse | doctor')
;
Data size & benchmarks
I am working with 1.7 million records
I have the 3 conditions in order of impact which were benchmarked in isolation:
Group clause: 3s & reduces to 700k records
Location clause: 8s & reduces to 54k records
Full text clause: 60s+ & reduces to 10k records
When combined they seem to take 71s, which is the full impact of the 3 clauses in isolation. My expectation was that when putting all 3 clauses together they would work sequentially, i.e. each on the subset of data left by the previous clause, so the timing should reduce dramatically - but this has not happened.
What I've tried
All join conditions & where clauses are indexed
Notably the ts_description index (GIN) is 2GB
lat/lng is indexed with ll_to_earth() to reduce the impact inline
I nested each where clause into a different subquery in order
Changed the order of all clauses & subqueries
Increased the shared_buffers size to increase the potential cache hits
It seems you do not need the subquery, and it is also good practice to filter on numeric fields. So, instead of filtering with where v.group = 'medical', for example, create a dictionary table and just filter on a numeric code such as where g.group = 1:
select distinct c.id
from entities.candidates c
join entities.candidates_connections cc on cc.candidates_id = c.id
join system.groups g on cc.systems_id = g.id
-- assumes the group is stored as a numeric code via the dictionary table
where g.group = 1
and earth_distance(ll_to_earth(c.latitude, c.longitude), ll_to_earth(50.87050439999999, -1.2191283)) < 48270
and c.ts_description @@ to_tsquery('simple', 'nurse | doctor');
Also, use EXPLAIN ANALYZE to inspect and verify your execution plan. These quick tips should give you a clear improvement.
There were some best-practice measures that I had not considered; I have since implemented them and gained a substantial performance increase:
tsvector Index Size Reduction
I was storing up to 25,000 characters in the tsvector, which meant that more complicated full text search queries had an immense amount of work to do. I reduced this to 10,000 characters, which has made a big difference, and for my use case this is an acceptable trade-off.
Create a Materialised View
I created a materialised view that contains the join, which offloads a little bit of the work. Additionally, I built my indexes on it and run a concurrent refresh on a 2 hour interval. This gives me a pretty stable table to work with.
Even though my search yields 10k records, I end up paginating on the front-end, so I only ever bring back up to 100 results to the screen. This allows me to join onto the original table for only the 100 records I'm going to send back.
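A minimal sketch of that setup, with the view name, the unique key, and the scheduling mechanism all being assumptions on my part:

CREATE MATERIALIZED VIEW candidate_search AS
SELECT c.id,
       c.ts_description,
       c.latitude,
       c.longitude,
       g."group" AS candidate_group
FROM entities.candidates c
JOIN entities.candidates_connections cc ON cc.candidates_id = c.id
JOIN system.groups g ON cc.systems_id = g.id;

-- REFRESH ... CONCURRENTLY needs a unique index; (id, candidate_group) is assumed unique here
CREATE UNIQUE INDEX ON candidate_search (id, candidate_group);
CREATE INDEX ON candidate_search USING gin (ts_description);
CREATE INDEX ON candidate_search USING gist (ll_to_earth(latitude, longitude));

-- run e.g. from cron every 2 hours
REFRESH MATERIALIZED VIEW CONCURRENTLY candidate_search;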
Increase RAM & utilise pg_prewarm
I increased the server RAM to give me enough space to hold my materialised view, then ran pg_prewarm on the materialised view. Keeping it in memory yielded the biggest performance increase for me, bringing a 2-minute query down to 3 seconds.
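For reference, warming the cache looks roughly like this (the relation name follows the sketch above and is an assumption):

CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('candidate_search');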
I have been struggling with a marker-clustering problem with 1000+ markers (that should be put on a Google map). I am not very keen on rendering large JSON structures with all the markers, nor am I fond of complex server-side "geo" computations with PostGIS.
The solution I came up with is to divide the world map into a hierarchical spatial tree, say a quad tree, where each point in my db is assigned "coordinates" in that tree. These coordinates are strings in which the character at position x is the index of the tile at tier x, e.g. '031232320012'. The length of the string depends on the number of zoom levels that will be enabled for the front-end map. Basically, if a user moves or zooms the map, I'll launch an Ajax GET request with the current zoom level and viewport coordinates as parameters. Then in the back-end I plan to build a string that points to the "viewport at the given zoom level", e.g. '02113', and I want to find all points that have this prefix ('02113') in the tree-coordinates column.
EDIT: I will also need fast GROUP BY, e.g. SELECT count(*) from points GROUP BY left(coordinates, 5);
My question is how to perform these operations as fast as possible? My database is PostgreSQL.
Then in the back-end I plan to build a string that points to the "viewport at the given zoom level", e.g. '02113', and I want to find all points that have this prefix ('02113') in the tree-coordinates column.
An ordinary index should perform well on any modern dbms as long as you're looking at the left-most five (or six or seven) characters of a string in an indexed column.
SELECT ...
...
WHERE column_name LIKE '02113%';
In PostgreSQL, you can also build an index on an expression. So you could create an index on the first five characters.
CREATE INDEX your_index_name ON your_table (left(column_name, 5));
I'd expect PostgreSQL's query optimizer to pick the right index if there were three or four like that. (One for 5 characters, one for 6 characters, etc.)
I built a table and populated it with a million rows of random data.
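The setup isn't shown, but a sketch of how such a test table could be built (the data generation is an assumption; it produces 12-character tile strings like those in the question):

create table coords (s text);

insert into coords (s)
select string_agg(floor(random() * 4)::int::text, '')
from generate_series(1, 1000000) as r(i)
cross join generate_series(1, 12) as p(n)
group by r.i;

create index coords_left_idx1 on coords (left(s, 5));
analyze coords;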
In the following query, PostgreSQL's query optimizer did pick the right index.
explain analyze
select s
from coords
where left(s, 5) = '12345';
It returned in 0.1 ms.
I also tested using GROUP BY. Again, PostgreSQL's query optimizer picked the right index.
"GroupAggregate (cost=0.00..62783.15 rows=899423 width=8) (actual time=91.300..3096.788 rows=90 loops=1)"
" -> Index Scan using coords_left_idx1 on coords (cost=0.00..46540.36 rows=1000000 width=8) (actual time=0.051..2915.265 rows=1000000 loops=1)"
"Total runtime: 3096.914 ms"
An expression like left(name, 2) in the GROUP BY clause will require PostgreSQL to touch every row in the index, if not every row in the table. That's why my query took 3096ms; it had to touch a million rows in the index. But you can see from the EXPLAIN plan that it used the index.
Ordinarily, I'd expect a geographic application to use a bounding box against a PostGIS table to reduce the number of rows you access. If your quad tree implementation can't do better than that, I'd stick with PostGIS long enough to become an expert with it. (You won't know for sure that it can't do the job until you've spent some time in it.)
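For comparison, the bounding-box approach against a PostGIS table is a short query over a GiST-indexed geometry column; a sketch, with the table, column, and envelope coordinates all being placeholders:

SELECT id
FROM points
WHERE geom && ST_MakeEnvelope(14.2, 50.0, 14.6, 50.2, 4326);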
Have a 3 part composite key Int, Int, Int on a large table
Insert speed degrades due to fragmentation
PK1 does not fragment (inserts are in order and never revised)
But PK2 and PK3 fragment badly and quickly
What strategy should I use for index maintenance?
Is there a way to rebuild the index with the following?
PK1 fill factor 100
PK2 fill factor 10
PK3 fill factor 10
No - it's ONE index - you cannot have different fill factors on the columns of a single index ... the index structure is made up of entries of (PK1, PK2, PK3) and this tuple combined is stored on the pages. You can only set fill factors for the index/page - not for individual parts of a compound index.
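A sketch of setting a single fill factor for the whole index (PostgreSQL syntax here, with a hypothetical index name; other engines do the same thing via a rebuild option):

ALTER INDEX my_composite_pkey SET (fillfactor = 80);
-- the new fill factor only takes effect when the index is rebuilt
REINDEX INDEX my_composite_pkey;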
My typical approach would be to use something like 70% or 80% on an index I suspect of fragmentation, and then just observe. See how fast and how badly it fragments. If it's unbearable later in the day - lower the fill factor even more. Typically, with a 70-80% fill factor, you should be fine during the day, and if you rebuild those critical indexes every night, your system should work fine.