So I have cancelled a global VACUUM FULL out of necessity; there will be tables that were not processed, and those can now be targeted individually.
The problem: VACUUM (FULL, ANALYZE) does not update last_vacuum, apparently a known issue for a decade.
How can I identify the tables that were completed, so that I might by extension identify the complement of those? I cannot find a duplicate of this, but I find it hard to believe this is the first time the question has been asked. I am aware that this could have been extracted from verbose output.
VACUUM (FULL) isn't really VACUUM, strange as that seems. Rather, it is CLUSTER without a special ordering. The reason for this oddity is partly that the implementation of VACUUM (FULL) was radically changed in version 9.0. Since it is so different from normal VACUUM, it is not tracked in pg_stat_user_tables.last_vacuum, and its progress is tracked in pg_stat_progress_cluster rather than in pg_stat_progress_vacuum.
Apart from pg_stat_user_tables.last_analyze, which you can use since you ran VACUUM (FULL, ANALYZE), you could look at the creation timestamp of the data files. That would work, since VACUUM (FULL) creates a new copy of the table.
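For the first approach, a minimal sketch; the cutoff timestamp is only a placeholder, and you would use a point in time shortly before the VACUUM FULL started:
SELECT relname, last_analyze
FROM pg_stat_user_tables
WHERE last_analyze IS NULL
   OR last_analyze < TIMESTAMP '2020-01-01 00:00:00'  -- placeholder cutoff
ORDER BY relname;
The file creation timestamps take a little more work.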
On Windows, you can use the following query for that:
SELECT t.oid::regclass,
       s.creation
FROM pg_class AS t
   JOIN pg_database AS d
      ON d.datname = current_database()
   JOIN pg_tablespace AS ts
      ON CASE WHEN t.reltablespace = 0 THEN d.dattablespace
              ELSE t.reltablespace
         END = ts.oid
   CROSS JOIN LATERAL pg_stat_file(
                         CASE ts.spcname
                            WHEN 'pg_default' THEN 'base/' || d.oid
                            WHEN 'pg_global'  THEN 'global'
                            ELSE 'pg_tblspc/' || ts.oid || '/' || d.oid
                         END
                         || '/' || pg_relation_filenode(t.oid::regclass)
                      ) AS s
WHERE t.relkind = 'r'
ORDER BY s.creation;
On other operating systems, pg_stat_file() returns NULL for creation, and you'd have to go look into the file system yourself.
I have a process which is creating thousands of temporary tables a day to import data into a system.
It is using the form of:
create temp table if not exists test_table_temp as
select * from test_table where 1=0;
This very quickly creates a lot of dead rows in pg_attribute as it is constantly making lots of new columns and deleting them shortly afterwards for these tables. I have seen solutions elsewhere that suggest using on commit delete rows. However, this does not appear to have the desired effect either.
To test the above, you can create two separate sessions on a test database. In one of them, check:
select count(*)
from pg_catalog.pg_attribute;
and also note down the value for n_dead_tup from:
select n_dead_tup
from pg_stat_sys_tables
where relname = 'pg_attribute';
On the other one, create a temp table (will need another table to select from):
create temp table if not exists test_table_temp on commit delete rows as
select * from test_table where 1=0;
The pg_attribute count immediately goes up, even before we reach the commit. Upon closing the session that created the temp table, the pg_attribute count goes back down, but n_dead_tup goes up, suggesting that vacuuming is still required.
I guess my real question is: have I missed something above, or is the only way to deal with this issue to vacuum aggressively and take the performance hit that comes with it?
Thanks for any responses in advance.
No, you have understood the situation correctly.
You either need to make autovacuum more aggressive, or you need to use fewer temporary tables.
Unfortunately you cannot change the storage parameters on a catalog table – at least not in a supported fashion that will survive an upgrade – so you will have to do so for the whole cluster.
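As a rough sketch of the first option, assuming you are fine with changing the settings cluster-wide (the exact values here are only illustrative):
-- Illustrative values only: trigger autovacuum sooner and check more often.
ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.01;
ALTER SYSTEM SET autovacuum_naptime = '15s';
SELECT pg_reload_conf();
Both settings can be changed without a restart.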
After a 9.6 -> 12.3 pg_upgrade we noticed something serious: some SELECTs were missing rows from their results!
REINDEX or DROP/CREATE INDEX fixed the problem.
Upgrade steps:
Stop 9.6
rsync the 9.6 data and bin files from CentOS 7 to CentOS 8 (with 12 pre-installed)
pg_upgrade
./analyze_new_cluster.sh
./delete_old_cluster.sh
Per database we found 1-3 corrupt UNIQUE indexes, each missing roughly 20 values.
We found a very useful tool, amcheck:
https://www.postgresql.org/docs/10/amcheck.html
SELECT bt_index_check(c.oid), c.relname, c.relpages
FROM pg_index i
JOIN pg_opclass op ON i.indclass[0] = op.oid
JOIN pg_am am ON op.opcmethod = am.oid
JOIN pg_class c ON i.indexrelid = c.oid
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE am.amname = 'btree' AND n.nspname = 'pg_catalog'
-- Don't check temp tables, which may be from another session:
AND c.relpersistence != 't'
-- Function may throw an error when this is omitted:
AND i.indisready AND i.indisvalid
ORDER BY c.relpages DESC LIMIT 10;
VERY IMPORTANT: when validating your own database, comment out the AND n.nspname = 'pg_catalog' and LIMIT 10 restrictions so that bt_index_check runs against your indexes too!
And yes, the function throws an error if it finds a corrupt index.
Why did the indexes go wrong?
How can we be sure our new DB is consistent and the upgrade was successful?
Why did the indexes go wrong?
The most likely explanation is a difference in the glibc versions between CentOS 7 and CentOS 8.
This blog post gives some more insights about this.
In general when changing glibc versions (e.g. because of system patches or OS upgrade), you should re-index all indexes that include text, varchar or char columns.
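For example, a hedged sketch of finding the affected indexes; it only looks at plain column references (not expressions or domains over text), so treat it as a starting point rather than a complete list:
SELECT DISTINCT i.indexrelid::regclass AS index_name
FROM pg_index i
JOIN pg_attribute a ON a.attrelid = i.indrelid
                   AND a.attnum = ANY (i.indkey)
-- 'bpchar' is the internal name of char(n)
WHERE a.atttypid IN ('text'::regtype, 'varchar'::regtype, 'bpchar'::regtype)
ORDER BY 1;
Each of those can then be rebuilt with REINDEX INDEX.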
This is not something Postgres can directly influence, though the ability to use ICU collations is a partial answer to that problem. But if the operating system's ICU version is updated (again, e.g. implicitly through system patches), you would have the same problem there (although it seems the ICU libraries are updated less often than glibc).
I think there is some work in progress to at least warn the user, but to my knowledge nothing has been committed in the current version 12 or the upcoming version 13.
I have a bloated table, its name is "role_info".
There are about 20K insert operations and a lot of update operations per day; there are no delete operations.
The table is about 4063GB now.
We have migrated the table to another database using dump, and the new table is about 62GB, so the table on the old database is bloated very seriously.
PostgreSQL version: 9.5.4
The table schema is below:
CREATE TABLE "role_info" (
"roleId" bigint NOT NULL,
"playerId" bigint NOT NULL,
"serverId" int NOT NULL,
"status" int NOT NULL,
"baseData" bytea NOT NULL,
"detailData" bytea NOT NULL,
PRIMARY KEY ("roleId")
);
CREATE INDEX "idx_role_info_serverId_playerId_roleId" ON "role_info" ("serverId", "playerId", "roleId");
The average size of the 'detailData' field is about 13 KB per row.
There are some SQL execution results below:
1)
SELECT
relname AS name,
pg_stat_get_live_tuples(c.oid) AS lives,
pg_stat_get_dead_tuples(c.oid) AS deads
FROM pg_class c
ORDER BY deads DESC;
Execution Result:
2)
SELECT *,
Pg_size_pretty(total_bytes) AS total,
Pg_size_pretty(index_bytes) AS INDEX,
Pg_size_pretty(toast_bytes) AS toast,
Pg_size_pretty(table_bytes) AS TABLE
FROM (SELECT *,
total_bytes - index_bytes - Coalesce(toast_bytes, 0) AS
table_bytes
FROM (SELECT c.oid,
nspname AS table_schema,
relname AS TABLE_NAME,
c.reltuples AS row_estimate,
Pg_total_relation_size(c.oid) AS total_bytes,
Pg_indexes_size(c.oid) AS index_bytes,
Pg_total_relation_size(reltoastrelid) AS toast_bytes
FROM pg_class c
LEFT JOIN pg_namespace n
ON n.oid = c.relnamespace
WHERE relkind = 'r') a
WHERE table_schema = 'public'
ORDER BY total_bytes DESC) a;
Execution Result:
3)
I have tried to VACUUM FULL the table "role_info", but it seemed to be blocked by some other process and didn't execute at all.
select * from pg_stat_activity where query like '%VACUUM%' and query not like '%pg_stat_activity%';
Execution Result:
select * from pg_locks;
Execution Result:
These are the vacuum parameters:
I have two questions:
How do I deal with the table bloat? autovacuum doesn't seem to be working.
Why was the VACUUM FULL blocked?
With your autovacuum settings, it will sleep for 20ms once for every 10 pages (200 cost_limit / 20 cost_dirty) it dirties. Even more often, in fact, because cost_hit and cost_miss are charged as well. At that rate it would take over 12 days just to autovacuum a 4063GB table that mostly needs its pages dirtied. And that is only the throttling time, not counting the actual work time or the repeated scanning of the indexes, so the actual run time could be months. The chances of autovacuum getting to run to completion in one sitting without being interrupted by something could be pretty low. Does your database get restarted often? Do you build and drop indexes on this table a lot, or add and drop partitions, or run ALTER TABLE?
Note that in v12, the default setting of autovacuum_vacuum_cost_delay was lowered by a factor of 10. This is not because of some change to the code in v12; it is because we realized the old default was just not sensible on modern hardware. So it would probably make sense to backport this change to your existing database, if not go even further. Before 12 you can't lower it below 1 ms, but you could lower it to 1 ms and also either increase autovacuum_vacuum_cost_limit or lower the vacuum_cost_page_* settings.
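A sketch of what that backport could look like on a pre-12 cluster (the values are illustrative, not a recommendation):
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '1ms';  -- the pre-12 minimum non-zero value
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;   -- raise the cost budget as well
SELECT pg_reload_conf();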
Now this analysis is based on the table already being extremely bloated. Why didn't autovacuum prevent it from getting this bloated in the first place, back when the table was small enough to be autovacuumed in a reasonable time? That is hard to say. We really have no evidence of what happened back then. Maybe your settings were even more throttled than they are now (although that is unlikely, as it looks like you just accepted the defaults), or maybe it was constantly interrupted by something. What is the "autovacuum_count" from pg_stat_all_tables for the table and its toast table?
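Something along these lines, using the table name from the question:
SELECT relname, autovacuum_count, last_autovacuum
FROM pg_stat_all_tables
WHERE relid = 'role_info'::regclass
   OR relid = (SELECT reltoastrelid FROM pg_class
               WHERE oid = 'role_info'::regclass);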
Why was the VACUUM FULL blocked?
Because that is how it works, as documented. That is why it is important to avoid getting into this situation in the first place. VACUUM FULL needs to swap around filenodes at the end, and needs an AccessExclusive lock to do that. It could take a weaker lock at first and then try to upgrade to AccessExclusive later, but lock upgrades have a strong deadlock risk, so it takes the strongest lock it needs up front.
You need a maintenance window where no one else is using the table. If you think you are already in such window, then you should look at the query text for the process doing the blocking. Because the lock already held is ShareUpdateExclusive, the thing holding it is not a normal query/DML, but some kind of DDL or maintenance operation.
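One way to do that, sketched for the table from the question (on 9.6+ pg_blocking_pids() would be more convenient, but this also works on 9.5):
SELECT a.pid, a.state, a.query, l.mode, l.granted
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.relation = 'role_info'::regclass;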
If you can't take a maintenance window now, then you can at least do a manual VACUUM without the FULL. This takes a much weaker lock. It probably won't shrink the table dramatically, but should at least free up space for internal reuse so it stops getting even bigger while you figure out when you can schedule a maintenance window or what your other next steps are.
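For example (VERBOSE is optional, but lets you watch what it is doing):
VACUUM (VERBOSE) role_info;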
I can list the partitions with
SELECT
child.relname AS child_schema
FROM pg_inherits
JOIN pg_class child ON pg_inherits.inhrelid = child.oid ;
Is it guaranteed that they are listed in creation order? Because then only an additional LIMIT 1 would be required. Otherwise, this will print the oldest one, i.e. the one with the lowest number in its name (my partitions are named name_1, name_2, name_3, ...):
SELECT
MIN ( trim(leading 'name_' from child.relname)::int ) AS child_schema
FROM pg_inherits
JOIN pg_class child ON pg_inherits.inhrelid = child.oid ;
Then I need to create a script which uses the result to execute DROP TABLE? Is there no easier way?
Is it guaranteed that they are listed in creation order?
No. It will likely hold as long as the query uses sequential scans and no tables have been dropped, but if you change the query and the plan changes, you could get rather unexpected ordering. I would also expect that once free space in the catalog is reused, the ordering may change as well.
Your current trim query is the best way. Stick with it.
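If it helps, here is a sketch of the "script" step using the same trim logic (assuming the name_<n> naming scheme from your question; drop with care):
DO $$
DECLARE
    oldest text;
BEGIN
    -- like the query in the question, this looks at all inheritance children
    SELECT child.relname INTO oldest
    FROM pg_inherits
    JOIN pg_class child ON pg_inherits.inhrelid = child.oid
    ORDER BY trim(leading 'name_' from child.relname)::int
    LIMIT 1;

    IF oldest IS NOT NULL THEN
        EXECUTE format('DROP TABLE %I', oldest);  -- %I quotes the identifier safely
    END IF;
END
$$;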
I have a DB table with 25M rows, ~3K each (i.e. ~75GB), that together with multiple indexes I use (an additional 15-20GB) will not fit entirely in memory (64GB on machine). A typical query locates 300 rows thru an index, optionally filters them down to ~50-300 rows using other indexes, finally fetching the matching rows. Response times vary between 20ms on a warm DB to 20 secs on a cold DB. I have two related questions:
At any given time how can I check what portion (%) of specific tables and indexes is cached in memory?
What is the best way to warm up the cache before opening the DB to queries? E.g. "select *" forces a sequential scan (~15 minutes on a cold DB), but response times following it are still poor. Is there a built-in way to do this instead of via queries?
Thanks, feel free to also reply by email (info#shauldar.com)
-- Shaul
Regarding your first point, the contrib module "pg_buffercache" allows you to inspect the contents of the buffer cache. I like to define this:
create or replace view util.buffercache_hogs as
select case
when pg_buffercache.reldatabase = 0
then '- global'
when pg_buffercache.reldatabase <> (select pg_database.oid from pg_database where pg_database.datname = current_database())
then '- database ' || quote_literal(pg_database.datname)
when pg_namespace.nspname = 'pg_catalog'
then '- system catalogues'
when pg_class.oid is null and pg_buffercache.relfilenode > 0
then '- unknown file ' || pg_buffercache.relfilenode
when pg_namespace.nspname = 'pg_toast' and pg_class.relname ~ '^pg_toast_[0-9]+$'
then (substring(pg_class.relname, 10)::oid)::regclass || ' TOAST'::text
when pg_namespace.nspname = 'pg_toast' and pg_class.relname ~ '^pg_toast_[0-9]+_index$'
then ((rtrim(substring(pg_class.relname, 10), '_index'))::oid)::regclass || ' TOAST index'
else pg_class.oid::regclass::text
end as key,
count(*) as buffers, sum(case when pg_buffercache.isdirty then 1 else 0 end) as dirty_buffers,
round(count(*) / (SELECT pg_settings.setting FROM pg_settings WHERE pg_settings.name = 'shared_buffers')::numeric, 4) as hog_factor
from pg_buffercache
left join pg_database on pg_database.oid = pg_buffercache.reldatabase
left join pg_class on pg_class.relfilenode = pg_buffercache.relfilenode
left join pg_namespace on pg_namespace.oid = pg_class.relnamespace
group by 1
order by 2 desc;
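For the "what % of a specific table is cached" part of your question, a direct sketch against pg_buffercache (the table name is just a placeholder; the same works for an index):
select count(*) as cached_buffers,
       round(100.0 * count(*) * current_setting('block_size')::int
             / pg_relation_size('my_table'), 1) as pct_of_table_cached
from pg_buffercache b
join pg_class c on c.relfilenode = b.relfilenode
where c.oid = 'my_table'::regclass
  -- main fork of this relation only; indexes and TOAST are separate relations
  and b.reldatabase = (select oid from pg_database
                       where datname = current_database());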
Additionally, the "pageinspect" contrib module allows you to access a specific page from a relation, so I suppose you could simply loop through all the pages in a relation grabbing them?
select count(get_raw_page('information_schema.sql_features', n))
from generate_series(0,
(select relpages-1 from pg_class where relname = 'sql_features')) n;
This will load all of information_schema.sql_features into the cache.
2) I usually solve this by having a log of queries from a live system and replaying them. This warms up the typical parts of the data and not the parts that aren't as frequently used (which would otherwise waste RAM).
Ad. 1 - I have absolutely no idea.
Ad. 2 - why don't you just randomly choose some queries that you know are important and run them on the cold server? The more queries you run, the better the warm-up will be.
Don't try to warm up memory; that's PostgreSQL's and the OS's job. Instead, divide the tables (and indexes) into partitions and try to work with smaller data sets. If you manage to establish a good partitioning scheme, huge indexes and tables are no longer a problem. And if you still want to warm up tables and indexes, they may then fit completely in RAM because they are smaller than before.