This is a follow-up question to my previous post regarding BRIN indexes in Postgres 9.5: Postgres choosing BTREE instead of BRIN index.
Is there an intelligent way to calculate what the
pages_per_range
value should be set to when creating a BRIN index? I have a fact table that will grow by about 1 million rows every day, and I want to place a BRIN index on the date_key column (integer). A typical value would be 20170801 (August 1, 2017).
From my previous post I found that setting it to half the default value (I set it to 64) led to a huge speed increase in query time. I can choose an arbitrary value (like 64), but I wanted to know if anyone has another (more intelligent) method?
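For context, one non-arbitrary approach is purely empirical: build the index at a few pages_per_range settings and compare the index size and the timing of a representative query. A sketch, assuming a hypothetical fact_table with the date_key column from the question:

```sql
-- Build the BRIN index at one candidate setting and measure it.
CREATE INDEX fact_brin_128 ON fact_table USING brin (date_key)
    WITH (pages_per_range = 128);

-- Index size (BRIN indexes are tiny, so halving the range is usually cheap):
SELECT pg_size_pretty(pg_relation_size('fact_brin_128'));

-- Then run EXPLAIN (ANALYZE, BUFFERS) on a typical date-range query,
-- drop the index, and repeat with pages_per_range = 64, 32, ...
DROP INDEX fact_brin_128;
```

Since BRIN indexes are so small and quick to build, sweeping a handful of settings this way is usually practical even on a large fact table.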
Thanks
Ryan
Related
What is the difference between a BRIN index and a table partition in PostgreSQL? When should I use one instead of the other? It seems that they provide very similar benefits and also have similar use cases.
Example
Suppose we have the following table structure
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    store_id INT,
    client_id INT,
    created_at timestamp,
    information jsonb
);
that has the following characteristics:
orders can only be inserted; deletions are not allowed, and updates are very rare and never involve the created_at column
the created_at column contains the timestamp of the row's insertion into the database, so the values in the column are strictly increasing
almost every query uses the created_at column in a condition, and some of them may also use the store_id and client_id columns
the most accessed rows are the most recent ones in terms of the created_at column
some queries may return a few records (for example, analyzing a single record or the records created in a small time interval), while others may scan a vast number of records (for example, aggregate functions for a dashboard)
I have chosen this example because it's very common and because (in my opinion) both approaches could be used. In this case, which should I choose: a BRIN index on the whole table, or a partitioned table, perhaps with a btree index (or just a simple btree index without partitioning)? Does the table size influence the choice?
I have used both features (although I'll caveat that my experience with partitioning is from back when you had to use inheritance + constraints, before the introduction of CREATE TABLE ... PARTITION BY). You are correct that they seem similar-ish on a surface level, but they function by completely different mechanisms.
Table partitioning basically works as follows: replace all references to table with (select * from table_partition1 union all select * from table_partition2 /* repeat for all partitions */). The partitions have a constraint on the partition columns, so that if those columns appear in a WHERE clause, the constraints can be applied up front to prune which partitions are actually scanned. In other words, if table_partition1 has CHECK (client_id = 1) and your WHERE has client_id = 2, table_partition1 will be skipped, since the constraint guarantees that no row in this partition can pass that WHERE.
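With the modern declarative syntax, a range-partitioned version of the orders table might look like this (a sketch, using hypothetical partition names and PostgreSQL 10+ syntax):

```sql
CREATE TABLE orders_part (
    id BIGINT,
    client_id INT,
    created_at timestamp
) PARTITION BY RANGE (created_at);

CREATE TABLE orders_2017 PARTITION OF orders_part
    FOR VALUES FROM ('2017-01-01') TO ('2018-01-01');
CREATE TABLE orders_2018 PARTITION OF orders_part
    FOR VALUES FROM ('2018-01-01') TO ('2019-01-01');

-- The planner can prune orders_2017 entirely, since the partition bounds
-- exclude every row in it from matching this WHERE clause:
EXPLAIN SELECT * FROM orders_part WHERE created_at >= '2018-06-01';
```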
BRIN indexes, in contrast, choose a block-range size for the table and then, for each range of blocks, record the min/max bounds of the indexed column. This allows WHERE conditions to skip entire block ranges when we can see, say, that the maximum created_at in a particular range of rows is below a created_at >= {some_value} clause in your WHERE.
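A minimal BRIN setup on the orders table from the question would be:

```sql
CREATE INDEX orders_created_brin ON orders USING brin (created_at);

-- Because rows are inserted in created_at order, each block range has
-- narrow min/max bounds, so a recency query can skip most of the table:
EXPLAIN SELECT * FROM orders WHERE created_at >= now() - interval '1 day';
```

Note that the index only stores the per-range summaries, which is why it stays tiny regardless of table size.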
I can't tell you a definitive answer for your case as to which would work better. Well, that's not true, actually: the definitive answer is, "benchmark it for your own data" ;)
This is kind of fuzzy, but my general feeling is that BRIN is lightweight, and table partitioning is not. BRIN is something that can be added on to an existing table without much trouble, the indexes themselves are very small, and the impact on writes is not major (at least, not without inordinately many indices). Table partitioning, on the other hand, is a different way of representing the data on-disk; you are actually determining into which data files particular rows will be written. This requires a much more involved migration process when introducing it to an existing dataset.
However, the set of query optimizations available for table partitioning is much greater. Not only is there the constraint exclusion I described above, but you can also have indices (even BRIN ones!) on each individual partition. Of course, you can also have BRIN + other indices on a single-big-table, but I'm not sure that is particularly helpful IRL.
A few other thoughts: BRIN is good for monotonic data (timestamps, incrementing IDs, etc.); the more correlated the on-disk ordering is to the indexed value, the more effective a BRIN index can be at pruning blocks to be scanned. Things like customer IDs, however, are unlikely to work well with BRIN; any given block of rows is likely to contain at least one relatively low and one relatively high ID. However, fields like that work quite well for partitioning: a partition per client, or partitioning on the modulus of a customer ID (which would more commonly be called sharding), is a good way of scaling horizontally, almost without bound.
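Partitioning on the modulus of a customer ID can be expressed directly with hash partitioning (a sketch, assuming PostgreSQL 11+ and hypothetical table names):

```sql
CREATE TABLE orders_by_client (
    client_id INT,
    created_at timestamp
) PARTITION BY HASH (client_id);

-- Four partitions; each client's rows always land in the same one:
CREATE TABLE orders_by_client_0 PARTITION OF orders_by_client
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE orders_by_client_1 PARTITION OF orders_by_client
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
-- ...and likewise for remainders 2 and 3.
```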
Any update, even if it does not change the indexed column, will make a BRIN index pretty useless (unless it is a HOT update). Even without that, there are differences, for example:
partitioning allows you to get rid of lots of data efficiently, a BRIN index won't
a partitioned table allows one autovacuum worker per partition, which improves autovacuum performance
But if your only concern is to efficiently select all rows for a certain value of the index or partitioning key, both may offer about the same benefit.
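To illustrate the data-removal point: with a hypothetical range-partitioned table orders_part holding a partition orders_2017, discarding a whole year of data is essentially a metadata operation, versus a slow DELETE plus vacuum on one big table (which a BRIN index does nothing to help):

```sql
-- Detach first so the partition can be archived or dropped independently:
ALTER TABLE orders_part DETACH PARTITION orders_2017;
DROP TABLE orders_2017;
```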
So I have a (logged) table with two columns A, B, containing text.
They basically contain the same type of information, it's just two columns because of where the data came from.
I wanted to have a table of all unique values (so I made that column the primary key), not caring which column a value came from. But when I asked Postgres to do
insert into new_table(value) select A from old_table on conflict (value) do nothing; (and later on same thing for column B)
it used 1 cpu core, and only read from my SSD with about 5 MB/s. I stopped it after a couple of hours.
I suspected that it might be because the b-tree is slow, so I added a hash index on the only column in my new table. But it still uses one core to the max and reads from the SSD at only 5 MB/s. My Java program can insert into a HashSet at 150 MB/s or more, so Postgres should be way faster than 5 MB/s, right? I've analyzed my old table and made my new table unlogged for faster inserts, yet it still uses one core and reads extremely slowly.
How to fix this?
EDIT: This is the EXPLAIN output for the above query. It seems like Postgres is using the b-tree it created for the primary key instead of my (much faster, isn't it?) hash index.
Insert on users (cost=0.00..28648717.24 rows=1340108416 width=14)
Conflict Resolution: NOTHING
Conflict Arbiter Indexes: users_pkey
-> Seq Scan on games (cost=0.00..28648717.24 rows=1340108416 width=14)
The ON CONFLICT mechanism is primarily for resolving concurrency-induced conflicts. You can use it in a "static" case like this, but other methods will be more efficient.
Just insert only distinct values in the first place:
insert into new_table(value)
select A from old_table union
select B from old_table
For increased performance, don't add the primary key until after the table is populated. And set work_mem to the largest value you credibly can.
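Putting that advice together, the whole load might look like this (a sketch with the question's table and column names; the exact work_mem value depends on available RAM):

```sql
SET work_mem = '1GB';              -- lets the sort/hash behind UNION run in memory

CREATE TABLE new_table (value text);

INSERT INTO new_table (value)
SELECT A FROM old_table
UNION                              -- UNION (not UNION ALL) removes duplicates
SELECT B FROM old_table;

-- Build the btree once, over the finished data, instead of row by row:
ALTER TABLE new_table ADD PRIMARY KEY (value);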
My java program can hashset that at at least 150 MB/s,
That is working with the hashset entirely in memory. PostgreSQL indexes are disk-based structures. They do benefit from caching, but that only goes so far and depends on hardware and settings you haven't told us about.
Seems like postgres is using the b-tree it created for the primary key instead of my (much faster, isn't it??) Hash index.
It can only use the index which defines the constraint, which is the btree index, as hash indexes cannot support primary key constraints. You could define an EXCLUDE constraint using the hash index, but that would just make it slower yet. And in general, hash indexes are not "much faster" than btree indexes in PostgreSQL.
I tried to create several types of indexes on the same column of my table to see how they compare. All of them I was able to create quickly, except for a HASH index. I have read about how they got better in recent Postgres versions, but I guess they may still have some limitations.
My table has 96 477 996 rows, and the column I tried the indexes on is of type integer.
CREATE INDEX gpps_brin_index ON cdc_s5_gpps_ind USING brin (id_transformace) WITH (pages_per_range='256');
--27s 879ms
-- drop index gpps_brin_index;
CREATE INDEX gpps_gin_index ON cdc_s5_gpps_ind USING gin (id_transformace);
-- 1m 13s
-- drop index gpps_gin_index;
CREATE INDEX gpps_btree_index ON cdc_s5_gpps_ind (id_transformace);
-- 45s 744ms
-- drop index gpps_btree_index;
But the hash index didn't finish even after 38 minutes:
CREATE INDEX gpps_hash_index ON cdc_s5_gpps_ind USING hash (id_transformace);
I tried setting work memory to 4GB to see if it makes any difference, but there was no change.
So if the other indexes are created within a minute, there is probably something wrong with the hash index. I tried to create one on a small table and it finished quickly, so it seems there is some size limitation beyond which the index starts to struggle. Can someone confirm this, or is there something I am missing?
EDIT: As explained by jjanes, I tried a hash index on another column which has only unique values (row id), and that HASH index was created in 2m34s.
PostgreSQL 12.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5), 64-bit
Say you have 100 distinct values, each occurring about 1 million times. So only 100 buckets can ever be occupied. Once each id_transformace value has its own bucket, then no matter how many more times you split a bucket, all the rows follow one path of the split and end up in the same bucket again. So each occupied bucket will have a long list of overflow pages. And I don't think there is a fast path to get to the end of such a list; you have to traverse it each time you need to add a record to the end.
So you get degenerate build performance when you have a large number of rows, but with only a small number of distinct values. This is not a general problem with large tables, but is specific to this situation.
This could possibly be improved for bulk index creation by creating a fast-path to the end of the overflow page list or the most-recently used bucket, but even if it were I still don't think this index type would be well suited for this type of data.
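You can check how skewed a column is before picking an index type by looking at the planner statistics (available after an ANALYZE of the table); a tiny n_distinct on ~96 million rows is exactly the degenerate case described above:

```sql
SELECT n_distinct, most_common_vals
FROM pg_stats
WHERE tablename = 'cdc_s5_gpps_ind'
  AND attname = 'id_transformace';
```

A small positive n_distinct (or a negative value very close to 0, which expresses distinct values as a fraction of row count) means few distinct values and long hash overflow chains.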
I'm running Postgres 9.5 and am playing around with BRIN indexes. I have a fact table with about 150 million rows and I'm trying to get PG to use a BRIN index. My query is:
select sum(transaction_amt),
       sum(total_amt)
from fact_transaction
where transaction_date_key between 20170101 and 20170201
I created both a BTREE index and a BRIN index (default pages_per_range value of 128) on column transaction_date_key (the above query refers to January to February 2017). I would have thought that PG would choose the BRIN index; however, it goes with the BTREE index. Here is the explain plan:
https://explain.depesz.com/s/uPI
I then deleted the BTREE index, did a VACUUM / ANALYZE on the table, and re-ran the query. It did choose the BRIN index, but the run time was considerably longer:
https://explain.depesz.com/s/5VXi
In fact my tests were all faster when using the BTREE index rather than the BRIN index. I thought it was supposed to be the opposite?
I'd prefer to use the BRIN index because of its smaller size however I can't seem to get PG to use it.
Note: I loaded the data, starting from January 2017 through to June 2017 (defined via transaction_date_key) as I read that physical table ordering makes a difference when using BRIN indexes.
Does anyone know why PG is choosing to use the BTREE index and why BRIN is so much slower in my case?
It seems like the BRIN index scan is not very selective – it returns 30 million rows, all of which have to be re-checked, which is where the time is spent.
That probably means that transaction_date_key is not well correlated with the physical location of the rows in the table.
A BRIN index works by “lumping together” ranges of table blocks (how many can be configured with the storage parameter pages_per_range, whose default value is 128). The minimum and maximum of the indexed value for each range of blocks is stored.
So a lot of block ranges in your table contain transaction_date_key between 20170101 and 20170201, and all of these blocks have to be scanned to compute the query result.
I see two options to improve the situation:
Lower the pages_per_range storage parameter. That will make the index bigger, but it will reduce the number of “false positive” blocks.
Cluster the table on the transaction_date_key attribute. As you have found out, that requires (at least temporarily) a B-tree index on the column.
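Both options can be sketched with the table and column names from the question (index names are hypothetical):

```sql
-- Option 1: a finer-grained BRIN index (bigger, but fewer false-positive blocks):
CREATE INDEX fact_brin ON fact_transaction USING brin (transaction_date_key)
    WITH (pages_per_range = 32);

-- Option 2: physically reorder the table, which needs a btree (at least temporarily):
CREATE INDEX fact_btree ON fact_transaction (transaction_date_key);
CLUSTER fact_transaction USING fact_btree;
ANALYZE fact_transaction;
```

Be aware that CLUSTER rewrites the whole table under an exclusive lock, and the ordering is not maintained for rows inserted afterwards.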
In my app I have a concept of "seasons", which change discretely over time. All the entities are related to some season. All entities have season-based indices as well as some indices on other fields.
When a season change occurs, PostgreSQL decides to use a filtered scan plan based on the season index rather than the more specific field indices. At the beginning of the season the estimated cost of that plan is very low, so it looks fine, but the problem is that a season change brings MANY users right at the start of the season, so the scan-based query plan becomes bad very fast: it simply scans all the entities in the new season and filters for the target items.
After the first auto-analyze, Postgres decides to use a good plan, BUT the auto-analyze runs VERY SLOWLY due to contention, and I suppose it snowballs: the more requests are made, the more contention there is due to the bad plan, and thus the auto-analyze runs more and more slowly. The longest auto-analyze run was about an hour last week, and it is becoming a real problem. I know the PostgreSQL architects decided not to allow choosing the index used by a query, but what is the best way to overcome my problem?
Just to clarify, here is the DDL, one of the "slow" queries, and the EXPLAIN results before and after auto-analyze.
DDL
CREATE TABLE race_results (
    id INTEGER PRIMARY KEY NOT NULL DEFAULT nextval('race_results_id_seq'::regclass),
    user_id INTEGER NOT NULL,
    opponent_id INTEGER,
    season_id INTEGER NOT NULL,
    type RACE_TYPE NOT NULL DEFAULT 'battle'::race_type,
    elo_delta INTEGER NOT NULL,
    opponent_elo_delta INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX race_results_type_user_id_index ON race_results USING BTREE (season_id, type, user_id);
CREATE INDEX race_results_type_opponent_id_index ON race_results USING BTREE (season_id, type, opponent_id);
CREATE INDEX race_results_opponent_id_index ON race_results USING BTREE (opponent_id);
CREATE INDEX race_results_user_id_index ON race_results USING BTREE (user_id);
Query
SELECT 1000 + COALESCE(SUM(CASE WHEN user_id = 6446 THEN elo_delta ELSE opponent_elo_delta END), 0)
FROM race_results
WHERE type = 'battle' :: race_type AND (user_id = 6446 OR opponent_id = 6446) AND
season_id = current_season_id()
Results of EXPLAIN before auto-analyze (as you can see, more than a thousand rows are already being removed by the filter, and it soon becomes hundreds of thousands for each request):
Results of EXPLAIN ANALYZE after auto-analyze (now Postgres decides to use the right index and no filtering is needed anymore; but the problem is that the auto-analyze takes too long, partly due to the contention caused by the inefficient index selection shown above):
PS: For now I'm working around the problem by turning off the application server 10 seconds after the season changes, so that Postgres gets the new data and starts the auto-analyze, and turning it back on when the auto-analyze finishes. But that solution involves downtime, which is not desirable, and overall it looks weird.
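One thing worth trying before resorting to downtime (a sketch, not benchmarked against this workload): trigger a manual ANALYZE of just this table immediately after the season rolls over, instead of waiting for autovacuum to notice the change.

```sql
-- Run right after the season change, e.g. from a cron job or the app itself;
-- a single-table ANALYZE only samples rows, so it is usually fast:
ANALYZE race_results;
```

Getting fresh statistics in before the user surge arrives may let the planner pick the good plan from the start, avoiding the contention snowball.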
Finally I found the solution. It's not perfect and I will not mark it as the best one, however it works and could help someone.
Instead of indices on (season, type, user/opponent id), I now have these indices:
CREATE INDEX race_results_type_user_id_index ON race_results USING BTREE (user_id, season_id, type);
CREATE INDEX race_results_type_opponent_id_index ON race_results USING BTREE (opponent_id, season_id, type);
One problem appeared: I needed an index on season anyway for other queries, but when I add the index
CREATE INDEX race_results_season_index ON race_results USING BTREE (season_id);
the planner tries to use it again instead of the right indices, and the whole situation repeats. What I've done is simply add one more field, season_id_clone, which contains the same data as season_id, and I made an index on it. Now, when I need to filter on season (not including the queries from the first post), I use season_id_clone in the query. I know it's weird, but I haven't found anything better.
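The workaround described above can be sketched as follows (on a large table the backfill UPDATE rewrites many rows, so it may need to be batched):

```sql
ALTER TABLE race_results ADD COLUMN season_id_clone INTEGER;

-- Backfill the clone from the real column:
UPDATE race_results SET season_id_clone = season_id;

CREATE INDEX race_results_season_clone_index
    ON race_results USING BTREE (season_id_clone);

-- Season-only queries filter on the clone, so the planner never considers
-- this index for the user/opponent queries from the first post:
SELECT count(*) FROM race_results WHERE season_id_clone = 42;
```

The application also has to keep season_id_clone in sync on insert (or do so with a trigger), which is part of why this solution feels weird.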