PostgreSQL query using index scan backward instead of index only scan - postgresql

I'm running into something I cannot explain. I have been googling for a few days now and have not yet found the cause of my "problem" with the PostgreSQL planner causing a (relatively simple) query to take massive amounts of time.
Let's start from the top (I've tried to remove as much useless information as possible so the tables may look pointless but trust me, they're not):
I have the following schema:
CREATE TABLE ct_log (
    ID integer,
    CONSTRAINT ctl_pk
        PRIMARY KEY (ID)
);
CREATE TABLE ct_log_entry (
    CERTIFICATE_ID bigint NOT NULL,
    ENTRY_ID bigint NOT NULL,
    ENTRY_TIMESTAMP timestamp NOT NULL,
    CT_LOG_ID integer NOT NULL,
    CONSTRAINT ctle_ctl_fk
        FOREIGN KEY (CT_LOG_ID)
        REFERENCES ct_log(ID)
) PARTITION BY RANGE (ENTRY_TIMESTAMP);
-- I will not repeat this one 7 times, but there are partitions for each year from 2013-2020:
CREATE TABLE ct_log_entry_2020 PARTITION OF ct_log_entry
    FOR VALUES FROM ('2020-01-01T00:00:00'::timestamp) TO ('2021-01-01T00:00:00'::timestamp);
CREATE INDEX ctle_c ON ct_log_entry (CERTIFICATE_ID);
CREATE INDEX ctle_e ON ct_log_entry (ENTRY_ID);
CREATE INDEX ctle_t ON ct_log_entry (ENTRY_TIMESTAMP);
CREATE INDEX ctle_le ON ct_log_entry (CT_LOG_ID, ENTRY_ID DESC);
(in case you are curious about the full schema: https://github.com/crtsh/certwatch_db/blob/master/sql/create_schema.sql)
And this is the query I am trying to run:
SELECT ctl.ID, latest.entry_id
FROM ct_log ctl
LEFT JOIN LATERAL (
    SELECT coalesce(max(entry_id), -1) entry_id
    FROM ct_log_entry ctle
    WHERE ctle.ct_log_id = ctl.id
) latest ON TRUE;
For people who know https://crt.sh this might look familiar, because this is indeed the schema from crt.sh. That makes it a bit interesting, since crt.sh provides public PostgreSQL access, allowing me to compare query plans between my own server and theirs.
My server query plan (~700s): https://explain.depesz.com/s/ZKkt
Public crt.sh query plan (~3ms): https://explain.depesz.com/s/01Ht
This difference is quite noticeable (:sad_smile:), but I'm not sure why, because as far as I know I have the correct indexes for this to be very fast, and the same indexes as the crt.sh server.
It looks like my instance is using a backwards index scan instead of an index only scan for the 2 largest partitions. This was not always the case: previously it executed with the same query plan as the crt.sh instance, but for some reason it decided to stop doing that.
(This is the amount of data in those tables in case it's not clear from the query plans: https://d.bouma.dev/wUjdXJXk1OzF. I cannot see how much is in the crt.sh database because they don't provide access to the individual partitions)
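For anyone who wants to reproduce the comparison: the linked plans come from running the query under EXPLAIN, something along these lines (the exact EXPLAIN options used for the linked plans are my guess):
-- Sketch only; the ANALYZE/BUFFERS options are an assumption, not necessarily what produced the linked plans.
EXPLAIN (ANALYZE, BUFFERS)
SELECT ctl.ID, latest.entry_id
FROM ct_log ctl
LEFT JOIN LATERAL (
    SELECT coalesce(max(entry_id), -1) entry_id
    FROM ct_log_entry ctle
    WHERE ctle.ct_log_id = ctl.id
) latest ON TRUE;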
Now onto the list of things I've tried (a rough SQL sketch follows the list):
ANALYZE the ct_log_entry (and ct_log_entry_* tables created by the partitioning)
VACUUM ANALYZE the ct_log_entry (and ct_log_entry_* tables created by the partitioning)
VACUUM FULL the ct_log_entry (and ct_log_entry_* tables created by the partitioning)
Dropping the ctle_le index and recreating it (this worked once, giving me a few hours of great performance, until I imported more data and it went back to the backwards scan)
REINDEX INDEX the ctle_le index on each ct_log_entry_* table
SET random_page_cost = x; with x as 1, 1.1, 4 and 5 (following many SO answers and blog posts)
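For completeness, a rough SQL sketch of what those attempts looked like. The per-partition index name below is my guess at PostgreSQL's auto-generated child index name, and every command was repeated for each yearly partition:
-- Illustrative sketch only; repeat for ct_log_entry_2013 .. ct_log_entry_2020.
ANALYZE ct_log_entry;
ANALYZE ct_log_entry_2020;
VACUUM ANALYZE ct_log_entry_2020;
VACUUM FULL ct_log_entry_2020;
-- The child index name here is assumed from PostgreSQL's default naming scheme.
REINDEX INDEX ct_log_entry_2020_ct_log_id_entry_id_idx;
-- Recreate the partitioned index from scratch:
DROP INDEX ctle_le;
CREATE INDEX ctle_le ON ct_log_entry (CT_LOG_ID, ENTRY_ID DESC);
-- Planner cost experiments (per session):
SET random_page_cost = 1.1;  -- also tried 1, 4 and 5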
The only thing I notice that is different is that crt.sh is running PostgreSQL 12.1 and I'm running 12.3, but as far as I can tell that shouldn't have any impact.
Also, before you say "yes, well, but you cannot run this amount of data on your laptop": the server I'm running is a dedicated box with 32 available threads and 128GB RAM, with 8 2TB Samsung EVO 860 drives in hardware RAID 5 (yes, I know this is bad if a drive fails; that's another issue I'll deal with later, but the read performance should be excellent). I don't know what hardware crt.sh is running, but since I only have a fraction of the data imported I don't see my hardware being the issue here (yet).
I've also "tuned" my config using the guide here: https://pgtune.leopard.in.ua/#/.
Happy to provide more info where needed, but I'm hoping someone can point me to a flaw and/or provide a solution to resolve the problem and show PostgreSQL how to use the optimal path!

Related

How to enable index-sequential files in postgres

I am writing an application backed by a Postgres DB.
The application is like a logging system; the main table looks like this:
create table if not exists logs
(
    user_id bigint not null,
    log bytea not null,
    timestamp timestamptz not null default (clock_timestamp() at time zone 'UTC')
);
One of the main queries is to fetch all logs for a certain user_id, ordered by timestamp desc. It would be nice if, under the hood, Postgres stored all rows for the same user_id in one page or in sequential pages, instead of scattering them here and there on the disk.
As I recall from textbooks, this is the so-called "index-sequential file", right? How can I guide Postgres to do that?
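For concreteness, the query looks roughly like this (the actual user_id is a parameter; 42 is just an example):
-- Sketch of the main lookup described above.
SELECT log, timestamp
FROM logs
WHERE user_id = 42
ORDER BY timestamp DESC;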
The simple thing to do is to create a B-tree index to speed up the search:
CREATE INDEX logs_user_time_idx ON logs (user_id, timestamp);
That would speed up the query, but take extra space on the disk and slow down all INSERT operations on the table (the index has to be maintained). There is no free lunch!
I assume that you were talking about that when you mentioned "index-sequential files". But perhaps you meant what is called a clustered index or index-organized table, which essentially keeps the table itself in a certain order. That can speed up searches like that even more. However, PostgreSQL does not have that feature.
The best you can do to make disk access more efficient in PostgreSQL is to run the CLUSTER command, which rewrites the table in index order:
CLUSTER logs USING logs_user_time_idx;
But be warned:
That statement rewrites the whole table, so it could take a long time. During that time, the table is inaccessible.
Subsequent INSERTs won't maintain the order in the table, so it “rots” over time, and after a while you will have to CLUSTER the table again.
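If you do go that route, PostgreSQL remembers which index a table was last clustered on, so re-clustering later can be as simple as:
-- Re-clusters "logs" on the index recorded by the earlier CLUSTER ... USING run.
CLUSTER logs;
-- Or re-cluster every previously clustered table you own in the current database:
CLUSTER;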

Postgres not very fast at finding unique values in table with about 1.3 billion rows

So I have a (logged) table with two columns A, B, containing text.
They basically contain the same type of information, it's just two columns because of where the data came from.
I wanted to have a table of all unique values (so I made the single column the primary key), not caring which column a value came from. But when I asked Postgres to do
insert into new_table(value) select A from old_table on conflict (value) do nothing; (and later the same thing for column B)
it used 1 CPU core and read from my SSD at only about 5 MB/s. I stopped it after a couple of hours.
I suspected that it might be because the b-tree is slow, so I added a hash index on the only attribute in my new table. But it's still using 1 core to the max and reading from the SSD at only 5 MB/s. My Java program can build a hash set from this data at at least 150 MB/s, so Postgres should be way faster than 5 MB/s, right? I've analyzed my old table and made my new table unlogged for faster inserts, yet it still uses 1 core and reads extremely slowly.
How to fix this?
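For context, the setup described above amounts to roughly this (table and index names are simplified guesses):
-- Rough reconstruction of what I have; names are illustrative.
CREATE UNLOGGED TABLE new_table (
    value text PRIMARY KEY
);
CREATE INDEX new_table_value_hash ON new_table USING hash (value);

INSERT INTO new_table (value)
SELECT A FROM old_table
ON CONFLICT (value) DO NOTHING;
-- ...and later the same thing for column B.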
EDIT: This is the EXPLAIN output for the above query. It seems like Postgres is using the b-tree it created for the primary key instead of my (much faster, isn't it??) hash index.
Insert on users  (cost=0.00..28648717.24 rows=1340108416 width=14)
  Conflict Resolution: NOTHING
  Conflict Arbiter Indexes: users_pkey
  ->  Seq Scan on games  (cost=0.00..28648717.24 rows=1340108416 width=14)
The ON CONFLICT mechanism is primarily for resolving concurrency-induced conflicts. You can use it in a "static" case like this, but other methods will be more efficient.
Just insert only distinct values in the first place:
insert into new_table(value)
select A from old_table
union
select B from old_table;
For increased performance, don't add the primary key until after the table is populated. And set work_mem to the largest value you credibly can.
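A sketch of that approach (the work_mem value is just a placeholder; adjust it to what your RAM allows):
-- Sketch only; UNLOGGED mirrors what the question already does for faster loading.
SET work_mem = '2GB';

CREATE UNLOGGED TABLE new_table (value text);

INSERT INTO new_table (value)
SELECT A FROM old_table
UNION
SELECT B FROM old_table;

-- Add the constraint (and its btree index) only after the data is loaded.
ALTER TABLE new_table ADD PRIMARY KEY (value);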
My java program can hashset that at at least 150 MB/s,
That is working with the hashset entirely in memory. PostgreSQL indexes are disk-based structures. They do benefit from caching, but that only goes so far and depends on hardware and settings you haven't told us about.
Seems like postgres is using the b-tree it created for the primary key instead of my (much faster, isn't it??) Hash index.
It can only use the index which defines the constraint, which is the btree index, as hash indexes cannot support primary key constraints. You could define an EXCLUDE constraint using the hash index, but that would just make it slower yet. And in general, hash indexes are not "much faster" than btree indexes in PostgreSQL.
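For reference, an exclusion constraint backed by a hash index would look like this; it is shown only to illustrate the point above, not as a recommendation:
-- Not recommended here; only illustrates the EXCLUDE-with-hash option mentioned above.
ALTER TABLE new_table
    ADD CONSTRAINT new_table_value_excl EXCLUDE USING hash (value WITH =);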

PostgreSQL insert into a table referenced by too many other tables

I have a simple table with a primary key ID and other columns that are not so interesting. There is not much data in it, just several thousand records. However, there are too many constraints on this table: it is referenced by other tables (more than 200) with foreign keys to the ID column. Every time I try to insert something into it, all the constraints are checked and each insert takes around 2-3 seconds to complete. There are btree indexes on all tables, so the query planner uses index scans as well as sequential ones.
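To illustrate the structure (all names here are made up; the real schema has more than 200 referencing tables):
-- Simplified illustration of the schema described above.
CREATE TABLE main_table (
    id integer PRIMARY KEY
    -- ...other, less interesting columns
);

CREATE TABLE ref_1 (
    id serial PRIMARY KEY,
    main_id integer REFERENCES main_table (id)
);
-- ref_2 through ref_200+ follow the same pattern.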
My question is: is there any option or anything else I can apply in order to speed up these inserts? I tried disabling sequential scans to use index scans only, but this didn't help. Partitioning wouldn't help either, I think. So please advise.
The version of PostgreSQL is 10.0.
Thank you!

Postgres ignores indexes after upgrade to 9.5

We have recently upgraded our OLTP Production DB (2TB) from v9.2.9.21 to 9.5.1.6 using pg_upgrade.
The upgrade went without incident and we have been running for a week; however, we have found that the optimizer is ignoring indexes on 2 of our largest partitioned tables. (Note: this is a different issue from 38943115; the data migrated with no issues.)
The tables are constructed with individual btree indexes on BIGINT columns, for which the optimizer would previously answer queries in under a second. Post-upgrade the queries take up to 16 minutes (unusable for our customers). The partitions are <100GB, with 2-3 partitions per table.
We suspected index corruption and tried adding duplicate indexes and analyzing; however, the new indexes are still ignored unless we force their use with enable_seqscan=no or reduce random_page_cost to 2 (not practical system-wide). The query response times using the new indexes are still appalling (16 minutes).
We have tried increasing effective_cache_size, but with no effect. The DB is 24x7, so we cannot reindex the tables/partitions.
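For reference, the forcing we did was at session level only, roughly:
-- Session-level overrides used for testing; not practical system-wide.
SET enable_seqscan = off;      -- same effect as enable_seqscan=no
SET random_page_cost = 2;
-- ...then re-run EXPLAIN ANALYZE on the affected query in the same session.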
The indexes are defined like so:
CREATE INDEX table1_column1_index ON table1 USING btree (column1);
CREATE INDEX table1part1_column1_index ON table1part1 USING btree (column1);
CREATE INDEX table1part2_column1_index ON table1part2 USING btree (column1);
CREATE INDEX table1part3_column1_index ON table1part3 USING btree (column1);
...and this repeats for each subsequent column in the query (the query plan isn't using composite indexes).
Has anyone encountered this or could suggest any further steps?

PostgreSQL becomes unresponsive when a new index value is added

In my app I have a concept of "seasons", which change discretely over time. All the entities are related to some season, and all entities have season-based indices as well as indices on other fields. When a season change occurs, PostgreSQL decides to use a filtered scan plan based on the season index rather than the more specific field indices. At the beginning of the season the cost of such a plan is very small, so that's OK, but the problem is that a season change brings MANY users right at the very beginning of the season, so the scan-based query plan becomes bad very fast: it simply scans all the entities in the new season and filters for the target items. After the first auto-analyze Postgres decides to use a good plan, BUT the auto-analyze runs VERY SLOWLY due to contention, and I suppose it's like a snowball: the more requests come in, the more contention there is due to the bad plan, and thus the auto-analyze runs more and more slowly. Last week auto-analyze took about an hour, and it is becoming a real problem. I know the PostgreSQL architects decided not to allow choosing which index a query uses, but what is the best way to overcome my problem then?
Just to clarify, here is a DDL, one of the "slow" queries and explain results before and after auto analyze.
DDL
CREATE TABLE race_results (
    id INTEGER PRIMARY KEY NOT NULL DEFAULT nextval('race_results_id_seq'::regclass),
    user_id INTEGER NOT NULL,
    opponent_id INTEGER,
    season_id INTEGER NOT NULL,
    type RACE_TYPE NOT NULL DEFAULT 'battle'::race_type,
    elo_delta INTEGER NOT NULL,
    opponent_elo_delta INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX race_results_type_user_id_index ON race_results USING BTREE (season_id, type, user_id);
CREATE INDEX race_results_type_opponent_id_index ON race_results USING BTREE (season_id, type, opponent_id);
CREATE INDEX race_results_opponent_id_index ON race_results USING BTREE (opponent_id);
CREATE INDEX race_results_user_id_index ON race_results USING BTREE (user_id);
Query
SELECT 1000 + COALESCE(SUM(CASE WHEN user_id = 6446 THEN elo_delta ELSE opponent_elo_delta END), 0)
FROM race_results
WHERE type = 'battle'::race_type
  AND (user_id = 6446 OR opponent_id = 6446)
  AND season_id = current_season_id()
Results of EXPLAIN before auto-analyze (as you can see, more than a thousand items are already removed by the filter, and soon it becomes hundreds of thousands for each request):
Results of EXPLAIN ANALYZE after auto-analyze (now Postgres decides to use the right index and no filtering is needed anymore, but the problem is that auto-analyze takes too long, partly due to the contention caused by the ineffective index selection in the previous plan):
PS: Right now I'm working around the problem by turning off the application server 10 seconds after the season changes, so that Postgres gets the new data and starts auto-analyze, and turning it back on when auto-analyze finishes. But that solution involves downtime, which is not desirable, and overall it looks weird.
Finally I found the solution. It's not perfect and I will not mark it as the best one, however it works and could help someone.
Instead of indices on (season, type, user/opponent id), I now have these indices:
CREATE INDEX race_results_type_user_id_index ON race_results USING BTREE (user_id, season_id, type);
CREATE INDEX race_results_type_opponent_id_index ON race_results USING BTREE (opponent_id, season_id, type);
One problem appeared: I needed an index on season anyway for other queries, but when I add the index
CREATE INDEX race_results_season_index ON race_results USING BTREE (season_id);
the planner tries to use it again instead of the right indices, and the whole situation repeats. What I've done is simply add one more field, 'season_id_clone', which contains the same data as 'season_id', and I made an index on it. Now when I need to filter something based on season (not including the queries from the first post), I use season_id_clone in the query. I know it's weird, but I haven't found anything better.
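Sketched out, the workaround looks like this (the index name and the example query are illustrative; how season_id_clone is kept in sync with season_id is up to the application):
-- Workaround sketch: duplicate the season column and index only the clone.
ALTER TABLE race_results ADD COLUMN season_id_clone INTEGER;
UPDATE race_results SET season_id_clone = season_id;
CREATE INDEX race_results_season_clone_index ON race_results USING BTREE (season_id_clone);

-- Queries that only filter by season now reference the clone column, e.g.:
SELECT count(*)
FROM race_results
WHERE season_id_clone = current_season_id();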