Autovacuum and partitioned tables - postgresql

The Postgres documentation says that partitioned tables are not processed by autovacuum. Yet I see that the last_autovacuum column of pg_stat_user_tables is populated with recent timestamps for live partitions.
Does that mean these timestamps are set by the background worker that only prevents transaction ID wraparound, without actually performing ANALYZE and VACUUM? Or what else could populate them?
Besides, given that the partitions are large and active enough, should I run both ANALYZE and VACUUM manually on those partitions? If so, does the order matter?
UPDATE
I'm trying to elaborate, thanks to the comments given.
Given that vacuum should work the same way on a partition as on a regular table, what could cause the occupied disk space to grow much faster after partitioning? Before partitioning it was nearly a linear function of the record count.
What is also confusing: when looking at the running autovacuum processes, I see that those related to partitions are marked "to prevent wraparound", while others are not. Is that pure coincidence, or is there something to check?
The documentation describes a partitioned table as a more or less virtual entity without its own storage. What is the point of noting that it is not vacuumed?

The statement from the documentation is true, but misleading. Autovacuum does not process the partitioned table itself, but it processes the partitions, which are regular PostgreSQL tables. So dead tuples get removed, the visibility map gets updated, and so on. In short, there is nothing to worry about as far as vacuuming is concerned. Remember that the partitioned table itself does not hold any data!
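You can see this in pg_stat_user_tables: the autovacuum timestamps appear on the partitions, not on the parent. A quick check, assuming a hypothetical partitioned table named measurement whose partitions share that name prefix:
-- last_autovacuum/last_autoanalyze stay NULL for the parent table
-- and are populated for the partitions autovacuum has processed.
SELECT relname, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname LIKE 'measurement%';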
What the documentation warns you about is ANALYZE. Autovacuum also launches automatic ANALYZE jobs to collect accurate table statistics. This works fine on the partitions, but no table statistics are collected for the partitioned table itself, so you have to run ANALYZE manually on the partitioned table to get that data. In practice, I find that not to be a problem, since the optimizer generates plans for each individual partition anyway, and there it has accurate statistics.
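A sketch of that manual step, again using the hypothetical name measurement; ANALYZE on the parent collects statistics spanning all partitions:
-- Gather optimizer statistics for the partitioned table as a whole,
-- e.g. from cron or any scheduler after large data changes.
ANALYZE measurement;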

Related

Stop autovacuum on old partition

We're having some issues when autovacuum triggers on one of our large tables (~100 GB).
Our ETL jobs only hit the last three partitions of this table, but, from my understanding, when autovacuum is triggered on a partition the whole table is vacuumed, and this causes the ETL job to wait until it's finished.
So far I've tried setting autovacuum_vacuum_scale_factor to both a higher and a lower value, which yields approximately the same execution time for our job.
Since a rather large number of INSERTs/UPDATEs is performed on this table, I would like to set a low value for autovacuum_vacuum_scale_factor on the three most recent partitions, but prevent the vacuuming of older partitions.
We already run a vacuum script every evening, so I'm thinking about setting autovacuum_enabled to off on the older partitions and letting the script handle the vacuum on those tables, but I'm not sure whether that's the right way to treat this problem.
Other parameters I've stumbled upon are vacuum_freeze_min_age and autovacuum_freeze_max_age, but I'm not sure I understand how to use them.
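A minimal sketch of those per-partition settings, using hypothetical partition names (my_table_2024_06 for a recent partition, my_table_2023_01 for an old one); both are storage parameters, so they can be set on individual partitions:
-- Recent, heavily written partition: make autovacuum trigger sooner.
ALTER TABLE my_table_2024_06 SET (autovacuum_vacuum_scale_factor = 0.01);
-- Old, static partition: leave routine vacuuming to the nightly script.
ALTER TABLE my_table_2023_01 SET (autovacuum_enabled = off);
Note that even with autovacuum_enabled = off, an anti-wraparound autovacuum will still run once the partition's transaction ID age exceeds autovacuum_freeze_max_age.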

PostgreSQL ANALYZE statistics & Replication

On my primary I ran a VACUUM then an ANALYZE on all databases, then when I check pg_stat_user_tables, the last_analyze column shows a current timestamp which is great.
When I check my replication instance, there are no values in the last_analyze column. I was assuming this timestamp would also eventually populate? Is this known behaviour?
The reason I ask is that after that VACUUM/ANALYZE on the primary, I'm running into some extremely slow queries on the replication instance. I ran an EXPLAIN plan prior to the VACUUM/ANALYZE on a query and it ran in 5 seconds... now it's taking 65 seconds. The EXPLAIN shows it's not using a lot of indexes that it should be.
PostgreSQL has two different stats systems. One records data about the distribution of values in the columns; this one is transactional and propagates to the replica via the WAL.
The other records data about turnover on the tables and when the last VACUUM/ANALYZE was done. This system is used to decide when to schedule a new VACUUM/ANALYZE (to keep the first system from getting too far out of date). It is not transactional and does not propagate to the replica.
So the replica has the latest column value distribution statistics (as soon as the WAL has replayed, anyway), but it doesn't know how recent they are.
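A quick way to see the difference on the replica (my_table is a hypothetical table name): the distribution statistics are there, while the activity timestamps are not:
-- Column distribution statistics: shipped via WAL, so present on the replica.
SELECT attname, n_distinct, null_frac FROM pg_stats WHERE tablename = 'my_table';
-- Activity statistics: maintained per instance, so last_analyze stays NULL here.
SELECT relname, last_vacuum, last_analyze FROM pg_stat_user_tables WHERE relname = 'my_table';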

Postgres Partitioning Query Performance when Partitioned for Delete

We are on PostgreSQL 12 and looking to partition a group of tables that are all related by Data Source Name. A source can have tens of millions of records, and the whole dataset makes up about 900 GB of space across the 2000 data sources. We don't have a good way to update these records, so we are looking at a full dump and reload any time we need to update data for a source. This is why we are looking at partitioning: we can load the new data into a new partition, detach (and later drop) the partition that currently houses the data, and then attach the new partition with the latest data. Queries will be performed via a single ID field. My concern is that, since we are partitioning by source name and querying by an ID that isn't used in the partition definition, we won't be able to utilize any partition pruning and our queries will suffer for it.
How concerned should we be with query performance for this use case? There will be an index defined on the ID that is being queried, but based on the Postgres documentation it can add a lot of planning time and use a lot of memory to service queries that look at many partitions.
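A rough sketch of the swap workflow described above, assuming a hypothetical parent table datasets that is list-partitioned on the source name:
-- Build the refreshed data in a standalone table (datasets_source_a_new),
-- then swap it in for the old partition.
ALTER TABLE datasets DETACH PARTITION datasets_source_a;
ALTER TABLE datasets ATTACH PARTITION datasets_source_a_new FOR VALUES IN ('source_a');
DROP TABLE datasets_source_a;  -- drop the old data once nothing references it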
Performance will suffer, but how much will depend on the number of partitions. The more partitions you have, the slower both planning and execution time get, so keep the number low.
You can save on query planning time by defining a prepared statement and reusing it.
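For example (reusing the hypothetical table name datasets from above, with an id column):
PREPARE get_by_id(bigint) AS SELECT * FROM datasets WHERE id = $1;
EXECUTE get_by_id(42);
After a few executions PostgreSQL may switch to a cached generic plan, so the cost of planning across all partitions is paid once rather than on every query.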

Slow bulk read from Postgres Read replica while updating the rows we read

We have on RDS a main Postgres server and a read replica.
We constantly write and update new data for the last couple of days.
Reading from the read replica works fine when looking at older data, but reading data from the last couple of days, where we keep updating it on the main server, is painfully slow.
Queries that take 2-3 minutes on old data can timeout after 20 minutes when querying data from the last day or two.
Looking at monitors such as CPU usage, I don't see any extra load on the read replica.
Is there a solution for this?
You are accessing over 65 buffers for every visible row found in the index scan (and over 500 buffers for each row returned by the index scan, since 90% are filtered out by the mmsi criterion).
One issue is that your index is not as selective as it could be. If you had the index on (day, mmsi) rather than just (day), it should be about 10 times faster.
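A sketch of that composite index; the table and column names come from the surrounding discussion, so adjust them to the actual schema:
-- CONCURRENTLY avoids blocking writes while the index is built.
CREATE INDEX CONCURRENTLY simplified_blips_day_mmsi_idx ON simplified_blips (day, mmsi);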
But it also looks like you have a massive amount of bloat.
You are probably not vacuuming the table often enough. With the UPDATE pattern you describe, all the vacuum work accumulates in the newest data, but the activity counters are evaluated against the full table size, so autovacuum does not run often enough to suit the needs of the new data. You could lower the scale factor for this table:
alter table simplified_blips set (autovacuum_vacuum_scale_factor = 0.01)
Or if you partition the data based on "day", the partitions for newer days will naturally get vacuumed more often, because the rate of updates is judged against the size of each partition rather than being diluted by the size of all the older, inactive partitions. Also, each vacuum run will take less work, since it doesn't have to scan all of the indexes of the entire table, just the indexes of the active partitions.
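A minimal sketch of what partitioning by "day" could look like here (hypothetical names and dates; the real table has more columns):
CREATE TABLE simplified_blips_part (day date NOT NULL, mmsi bigint) PARTITION BY RANGE (day);
-- One partition per day; only the partitions that still receive updates
-- accumulate dead tuples, so autovacuum concentrates on them.
CREATE TABLE simplified_blips_2024_06_01 PARTITION OF simplified_blips_part
    FOR VALUES FROM ('2024-06-01') TO ('2024-06-02');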
As suggested, the problem was bloat.
When you update a record, PostgreSQL's MVCC design creates a new version of the record rather than modifying it in place.
After the update you end up with a "dead record" (a.k.a. a dead tuple).
Every once in a while the database runs autovacuum and cleans the dead tuples out of the table.
Usually autovacuum is fine, but if your table is really large and frequently updated, you should consider making the autovacuum vacuum and analyze thresholds more aggressive.

Overhead of using many partitions in Postgres

In the following link, the creator of a tool I use (Airflow) suggests to create partitions for daily snapshots of dimension tables. I am wondering about the overhead of doing something like this in Postgres.
I am using the Postgres 10 built in partitioning for several tables, but mostly at a monthly or yearly level for facts. I never tried implementing a daily partition for dimensions before and it seems scary. It would simplify things though in several areas for me in case I need to rerun old tasks.
https://medium.com/@maximebeauchemin/functional-data-engineering-a-modern-paradigm-for-batch-data-processing-2327ec32c42a
Simple. With dimension snapshots where a new partition is appended at each ETL schedule. The dimension table becomes a collection of dimension snapshots where each partition contains the full dimension as-of a point in time. “But only a small percentage of the data changes every day, that’s a lot of data duplication!”. That’s right, though typically dimension tables are negligible in size in proportion to facts. It’s also an elegant way to solve SCD-type problematic by its simplicity and reproducibility. Now that storage and compute are dirt cheap compared to engineering time, snapshoting dimensions make sense in most cases.
While the traditional type-2 slowly changing dimension approach is conceptually sound and may be more computationally efficient overall, it’s cumbersome to manage. The processes around this approach, like managing surrogate keys on dimensions and performing surrogate key lookup when loading facts, are error-prone, full of mutations and hardly reproducible.
I have worked with systems with different levels of partitioning.
Generally, any partitioning is OK as long as you have check constraints on the partitions that allow the query planner to find the partitions relevant to a query, or you will have to query specific partitions directly for some special cases. Otherwise you will see sequential scans over all partitions even for simple queries.
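A minimal sketch of the check-constraint approach with inheritance-style partitioning (hypothetical names); with the declarative partitioning in PostgreSQL 10+ the partition bounds play this role instead:
CREATE TABLE events (id bigint, created date NOT NULL);
-- Child table whose CHECK constraint tells the planner which rows it can hold.
CREATE TABLE events_2024_06 (CHECK (created >= '2024-06-01' AND created < '2024-07-01')) INHERITS (events);
-- With constraint exclusion, queries filtering on "created" skip children
-- whose constraints cannot match.
SET constraint_exclusion = partition;
SELECT count(*) FROM events WHERE created = '2024-06-15';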
Daily partitions are completely OK, do not worry. I have even worked with a PG-based data collector that needed a partition for every 5 minutes of data because it collected several TB per day.
The number of partitions only becomes a bigger problem when you get into the thousands or tens of thousands of partitions; at that scale the problems move to a different level.
You will have to set a proper max_locks_per_transaction, for example, to be able to work with them, because even a simple SELECT over the parent table takes an AccessShareLock on every partition, which is not exactly nice, but that is how PG inheritance works.
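For example, a setting along these lines in postgresql.conf (the value is illustrative and depends on how many partitions a single query can touch; changing it requires a restart):
max_locks_per_transaction = 256   # default is 64; raise it when queries touch many partitions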
Add to that higher planning time: in our data warehouse we sometimes see planning times of several minutes for queries that take only seconds to execute, which is not great. But it is hard to do anything about it, because that is how the current PG planner works.
Still, the pros outweigh the cons, so I highly recommend using whatever partitioning granularity you need.