We have recently upgraded our OLTP Production DB (2TB) from v9.2.9.21 to 9.5.1.6 using pg_upgrade.
The upgrade went without incident and we have been running for a week; however, we have found that the optimizer is ignoring indexes on two of our largest partitioned tables. (Note: this is a different issue to 38943115; the data migrated with no issues.)
The tables have individual B-tree indexes on BIGINT columns, and queries against them previously returned in under a second. Post-upgrade the same queries take up to 16 minutes (unusable for our customers). The partitions are <100GB, with 2-3 partitions per table.
We suspected index corruption, so we tried adding duplicate indexes and analyzing, but the new indexes are still ignored unless we force them with enable_seqscan=off or reduce random_page_cost to 2 (not practical system-wide). Even when the new indexes are used, the query response times are still appalling (16 minutes).
We have tried increasing effective_cache_size, with no effect. The DB runs 24x7, so we cannot reindex the tables/partitions.
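For reference, the workarounds we have been testing look roughly like this when applied per session (rather than system-wide in postgresql.conf):
SET enable_seqscan = off;     -- steer the planner away from sequential scans
SET random_page_cost = 2;     -- make index access look cheaper to the planner
-- ... run the affected query here ...
RESET enable_seqscan;
RESET random_page_cost;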
The indexes are defined like so:
CREATE INDEX table1_column1_index ON table1 USING btree (column1);
CREATE INDEX table1part1_column1_index ON table1part1 USING btree (column1);
CREATE INDEX table1part2_column1_index ON table1part2 USING btree (column1);
CREATE INDEX table1part3_column1_index ON table1part3 USING btree (column1);
...and this repeats for each subsequent column in the query (the query plan isn't using composite indexes).
Has anyone encountered this or could suggest any further steps?
So I have a (logged) table with two columns A, B, containing text.
They basically contain the same type of information, it's just two columns because of where the data came from.
I wanted a table of all unique values (so I made its single column the primary key), not caring which original column each value came from. But when I asked Postgres to run
insert into new_table(value) select A from old_table on conflict (value) do nothing; (and later on same thing for column B)
it used 1 cpu core, and only read from my SSD with about 5 MB/s. I stopped it after a couple of hours.
I suspected it might be because the B-tree is slow, so I added a hash index on the only attribute in my new table. But it's still maxing out one core and reading from the SSD at only 5 MB/s. My Java program can build a HashSet of the same data at at least 150 MB/s, so Postgres should be way faster than 5 MB/s, right? I've analyzed my old table and made my new table unlogged for faster inserts, yet it still uses one core and reads extremely slowly.
How to fix this?
EDIT: This is the EXPLAIN output for the above query. It seems Postgres is using the B-tree it created for the primary key instead of my (much faster, isn't it??) hash index.
Insert on users (cost=0.00..28648717.24 rows=1340108416 width=14)
Conflict Resolution: NOTHING
Conflict Arbiter Indexes: users_pkey
-> Seq Scan on games (cost=0.00..28648717.24 rows=1340108416 width=14)
The ON CONFLICT mechanism is primarily for resolving concurrency-induced conflicts. You can use it in a "static" case like this, but other methods will be more efficient.
Just insert only distinct values in the first place:
insert into new_table(value)
select A from old_table union
select B from old_table
For increased performance, don't add the primary key until after the table is populated. And set work_mem to the largest value you credibly can.
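Putting that together, a minimal sketch (table and column names from the question; the work_mem value is only a placeholder, size it to your hardware):
-- Give this session enough memory so the UNION can de-duplicate in memory if possible.
SET work_mem = '1GB';                -- placeholder value

-- Load the data while the table has no primary key yet...
INSERT INTO new_table(value)
SELECT A FROM old_table
UNION
SELECT B FROM old_table;

-- ...then build the unique index once, after the data is in place.
ALTER TABLE new_table ADD PRIMARY KEY (value);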
My Java program can build a HashSet of the same data at at least 150 MB/s,
That is working with the hashset entirely in memory. PostgreSQL indexes are disk-based structures. They do benefit from caching, but that only goes so far and depends on hardware and settings you haven't told us about.
It seems Postgres is using the B-tree it created for the primary key instead of my (much faster, isn't it??) hash index.
It can only use the index which defines the constraint, which is the btree index, as hash indexes cannot support primary key constraints. You could define an EXCLUDE constraint using the hash index, but that would just make it slower yet. And in general, hash indexes are not "much faster" than btree indexes in PostgreSQL.
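For completeness, such an EXCLUDE constraint would look roughly like this (constraint name made up; again, not recommended here):
-- Enforces uniqueness of value via the hash index machinery; slower than a plain primary key.
ALTER TABLE new_table
    ADD CONSTRAINT new_table_value_excl EXCLUDE USING hash (value WITH =);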
I'm running into something I cannot explain. I have been googling for a few days now and have not yet found the cause of my "problem": the PostgreSQL planner is making a (relatively simple) query take a massive amount of time.
Let's start from the top (I've tried to remove as much useless information as possible so the tables may look pointless but trust me, they're not):
I have the following schema:
CREATE TABLE ct_log (
ID integer,
CONSTRAINT ctl_pk
PRIMARY KEY (ID)
);
CREATE TABLE ct_log_entry (
CERTIFICATE_ID bigint NOT NULL,
ENTRY_ID bigint NOT NULL,
ENTRY_TIMESTAMP timestamp NOT NULL,
CT_LOG_ID integer NOT NULL,
CONSTRAINT ctle_ctl_fk
FOREIGN KEY (CT_LOG_ID)
REFERENCES ct_log(ID)
) PARTITION BY RANGE (ENTRY_TIMESTAMP);
-- I will not repeat this one 7 times, but there are partitions for each year from 2013-2020:
CREATE TABLE ct_log_entry_2020 PARTITION OF ct_log_entry
FOR VALUES FROM ('2020-01-01T00:00:00'::timestamp) TO ('2021-01-01T00:00:00'::timestamp);
CREATE INDEX ctle_c ON ct_log_entry (CERTIFICATE_ID);
CREATE INDEX ctle_e ON ct_log_entry (ENTRY_ID);
CREATE INDEX ctle_t ON ct_log_entry (ENTRY_TIMESTAMP);
CREATE INDEX ctle_le ON ct_log_entry (CT_LOG_ID, ENTRY_ID DESC);
(in case you are curious about the full schema: https://github.com/crtsh/certwatch_db/blob/master/sql/create_schema.sql)
And this is the query I am trying to run:
SELECT ctl.ID, latest.entry_id
FROM ct_log ctl
LEFT JOIN LATERAL (
SELECT coalesce(max(entry_id), -1) entry_id
FROM ct_log_entry ctle
WHERE ctle.ct_log_id = ctl.id
) latest ON TRUE;
For people who know https://crt.sh this might look familiar, because this is indeed the schema from crt.sh. That makes it a bit interesting, since crt.sh provides public PostgreSQL access, allowing me to compare query plans between my own server and theirs.
My server query plan (~700s): https://explain.depesz.com/s/ZKkt
Public crt.sh query plan (~3ms): https://explain.depesz.com/s/01Ht
This difference is quite noticeable, but I'm not sure why, because as far as I know I have the correct indexes for this to be very fast, and the same indexes as the crt.sh server.
It looks like my instance is using a backwards index scan instead of an index-only scan for the largest two partitions. This was not always the case: previously it executed using the same query plan as the crt.sh instance, but for some reason it decided to stop doing that.
(This is the amount of data in those tables in case it's not clear from the query plans: https://d.bouma.dev/wUjdXJXk1OzF. I cannot see how much is in the crt.sh database because they don't provide access to the individual partitions)
Now onto the list of things I've tried (the commands, roughly as run, are sketched right after the list):
ANALYZE the ct_log_entry (and ct_log_entry_* tables created by the partitioning)
VACUUM ANALYZE the ct_log_entry (and ct_log_entry_* tables created by the partitioning)
VACUUM FULL the ct_log_entry (and ct_log_entry_* tables created by the partitioning)
Dropping the ctle_le index and recreating it again (this worked once for me giving me a few hours of great performance until I imported more data and it went with the backwards scan again)
REINDEX INDEX the ctle_le index on each ct_log_entry_* table
SET random_page_cost = x;, tried 1, 1.1, 4 and 5 (according to many SO answers and blog posts)
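Roughly, those commands looked like this (using the 2020 partition as a stand-in for each one; the per-partition index name below is a guess at the auto-generated name):
ANALYZE ct_log_entry;
VACUUM ANALYZE ct_log_entry_2020;
VACUUM FULL ct_log_entry_2020;

-- Drop and recreate the composite index (on PG 12 this cascades to the partition indexes):
DROP INDEX ctle_le;
CREATE INDEX ctle_le ON ct_log_entry (CT_LOG_ID, ENTRY_ID DESC);

-- Reindex a partition's copy of that index:
REINDEX INDEX ct_log_entry_2020_ct_log_id_entry_id_idx;

-- Planner cost experiments, per session:
SET random_page_cost = 1.1;          -- also tried 1, 4 and 5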
The only difference I notice is that crt.sh is running PostgreSQL 12.1 and I'm running 12.3, but as far as I can tell that shouldn't have any impact.
Also, before you say "yes, well, but you cannot run this amount of data on your laptop": the server I'm running is a dedicated box with 32 available threads and 128GB RAM, with eight 2TB Samsung EVO 860 drives in hardware RAID 5 (yes, I know this is bad if a drive fails; that's another issue I'll deal with later, but the read performance should be excellent). I don't know what hardware crt.sh runs, but since I have only a fraction of the data imported I don't see my hardware being the issue here (yet).
I've also "tuned" my config using the guide here: https://pgtune.leopard.in.ua/#/.
Happy to provide more info where needed, but hoping someone can point out a flaw and/or provide a solution to the problem and show PostgreSQL how to use the optimal path!
We have a table with nearly 2 billion events recorded. As per our data model, each event is uniquely identified by a combined primary key over 4 columns. Excluding the primary key, there are 5 B-tree indexes, each on a single different column. So 6 B-tree indexes in total.
The events recorded span for years and now we need to remove the data older than 1 year.
We have a time column with long values recorded for each event, and we use the following query:
delete from events where ctid = any ( array (select ctid from events where time < 1517423400000 limit 10000) )
Do the indexes get updated?
During testing, they didn't.
After insertion,
total_table_size - 27893760
table_size - 7659520
index_size - 20209664
After deletion,
total_table_size - 20226048
table_size - 0
index_size - 20209664
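For reference, figures like the ones above can be obtained with the built-in size functions, e.g.:
SELECT pg_total_relation_size('events') AS total_table_size,
       pg_relation_size('events')       AS table_size,
       pg_indexes_size('events')        AS index_size;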
A REINDEX can be done to rebuild them:
Command: REINDEX
Description: rebuild indexes
Syntax:
REINDEX { INDEX | TABLE | DATABASE | SYSTEM } name [ FORCE ]
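For example, using the events table from the question (note that REINDEX blocks writes to the table while it runs):
REINDEX TABLE events;
-- or rebuild a single bloated index; the index name here is only illustrative:
REINDEX INDEX events_time_idx;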
That said, the method suggested by a_horse_with_no_name is the better solution.
What we had:
Postgres version 9.4.
One table with 2 billion rows and 21 columns (all bigint), a combined primary key over 5 columns, and 5 single-column indexes, with data spanning 2 years.
It looks similar to time-series data, with a time column containing a UNIX timestamp, except that it's an analytics project, so the time values do not arrive in strictly increasing order. The table was insert- and select-only (most select queries use aggregate functions).
What we need: we only need to keep 6 months of data, so the older data has to be removed.
What we did (with less knowledge on Postgres internals):
Delete rows in batches of 10,000.
Initially the deletes were fast, taking milliseconds, but as bloat built up each batch delete slowed to nearly 10 s. Then autovacuum got triggered and ran for almost 3 months. The insert rate was high, and each batch delete also increased the WAL volume. Poor stats on the table made the regular queries so slow that they ran for minutes and hours.
So we decided to go for partitioning, and implemented it using table inheritance in 9.4.
Note: Postgres has declarative partitioning from version 10, which handles most of the manual work needed with inheritance-based partitioning.
Please go through the official docs, as they have a clear explanation.
Simplified, this is how we implemented it (a minimal sketch follows the list):
Create parent table
Create child tables inheriting from it, with CHECK constraints. (We had monthly partitions and created them using a scheduler.)
Indexes need to be created separately for each child table.
To drop old data, just drop the child table; no vacuum is needed and it is instant.
Make sure the Postgres setting constraint_exclusion is set to partition.
VACUUM ANALYZE the old partition after you start inserting into the new one. (In our case, it helped the query planner use an index-only scan instead of a seq scan.)
Using triggers to route inserts, as mentioned in the docs, may make inserts slower, so we deviated from that: since we partitioned on the time column, we calculated the target table name at the application level from the time value before every insert, and it didn't affect our insert rate.
Also read other caveats mentioned there.
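A minimal sketch of the inheritance-based setup described above (table, column and index names, the monthly range and the sample values are illustrative, not our exact DDL):
-- Parent table: holds no data itself.
CREATE TABLE events (
    id      bigint NOT NULL,
    time    bigint NOT NULL,          -- UNIX timestamp in milliseconds
    payload bigint
);

-- One child per month, with a CHECK constraint so the planner can prune it.
CREATE TABLE events_2018_02 (
    CHECK (time >= 1517443200000 AND time < 1519862400000)
) INHERITS (events);

-- Indexes must be created separately on each child.
CREATE INDEX events_2018_02_time_idx ON events_2018_02 (time);

-- The application computes the child table name from the time value and inserts directly:
INSERT INTO events_2018_02 (id, time, payload) VALUES (1, 1517500000000, 42);

-- Pruning with inheritance relies on this setting (the default):
SET constraint_exclusion = partition;

-- Dropping an old month is instant and leaves no bloat behind, e.g.:
-- DROP TABLE events_2017_08;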
I have a simple table with a primary key ID and other columns that are not so interesting. There is not much data in it, only several thousand records. However, there are many constraints on this table: it is referenced by other tables (more than 200) via foreign keys to the ID column. Every time I try to insert something into it, all the constraints are checked, and each insert takes around 2-3 seconds to complete. There are B-tree indexes on all tables, so the query planner uses index scans as well as sequential ones.
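For reference, the referencing constraints can be listed from the catalog like this (my_table stands in for the real table name):
-- List every foreign key that points at my_table (placeholder name):
SELECT conname,
       conrelid::regclass AS referencing_table
FROM pg_constraint
WHERE contype = 'f'
  AND confrelid = 'my_table'::regclass;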
My question: is there any option or anything else I can apply in order to speed up these inserts? I tried disabling sequential scans to force index scans only, but this didn't help. Partitioning wouldn't be helpful either, I think. So please advise.
The PostgreSQL version is 10.0.
Thank you!
I'm running Postgres 9.5 and am playing around with BRIN indexes. I have a fact table with about 150 million rows and I'm trying to get PG to use a BRIN index. My query is:
select sum(transaction_amt),
sum (total_amt)
from fact_transaction
where transaction_date_key between 20170101 and 20170201
I created both a B-tree index and a BRIN index (default pages_per_range value of 128) on column transaction_date_key (the above query refers to January to February 2017). I would have thought that PG would choose the BRIN index; however, it goes with the B-tree index. Here is the explain plan:
https://explain.depesz.com/s/uPI
I then deleted the B-tree index, did a VACUUM ANALYZE on the table, and re-ran the query. It did choose the BRIN index; however, the run time was considerably longer:
https://explain.depesz.com/s/5VXi
In fact, all my tests were faster when using the B-tree index rather than the BRIN index. I thought it was supposed to be the opposite?
I'd prefer to use the BRIN index because of its smaller size however I can't seem to get PG to use it.
Note: I loaded the data starting from January 2017 through to June 2017 (defined via transaction_date_key), as I read that physical table ordering makes a difference when using BRIN indexes.
Does anyone know why PG is choosing to use the B-tree index, and why BRIN is so much slower in my case?
It seems like the BRIN index scan is not very selective – it returns 30 million rows, all of which have to be re-checked, which is where the time is spent.
That probably means that transaction_date_key is not well correlated with the physical location of the rows in the table.
A BRIN index works by “lumping together” ranges of table blocks (how many can be configured with the storage parameter pages_per_range, whose default value is 128). The minimum and maximum of the indexed value are stored for each range of blocks.
So a lot of block ranges in your table contain transaction_date_key between 20170101 and 20170201, and all of these blocks have to be scanned to compute the query result.
I see two options to improve the situation (both sketched after this list):
Lower the pages_per_range storage parameter. That will make the index bigger, but it will reduce the number of “false positive” blocks.
Cluster the table on the transaction_date_key attribute. As you have found out, that requires (at least temporarily) a B-tree index on the column.
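A rough sketch of both options, reusing the table and column from the question (the index names are made up):
-- Option 1: rebuild the BRIN index with smaller block ranges (bigger index, fewer false-positive blocks).
DROP INDEX IF EXISTS fact_transaction_date_brin;
CREATE INDEX fact_transaction_date_brin
    ON fact_transaction USING brin (transaction_date_key)
    WITH (pages_per_range = 32);

-- Option 2: physically reorder the table by the key (needs a B-tree index and takes an exclusive lock).
CREATE INDEX fact_transaction_date_btree ON fact_transaction (transaction_date_key);
CLUSTER fact_transaction USING fact_transaction_date_btree;
ANALYZE fact_transaction;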