High number of live/dead tuples in PostgreSQL / VACUUM not working

There is a table with 200 rows, but the number of live tuples reported for it is much higher (around 60K).
select count(*) from subscriber_offset_manager;
count
-------
200
(1 row)
SELECT schemaname,relname,n_live_tup,n_dead_tup FROM pg_stat_user_tables where relname='subscriber_offset_manager' ORDER BY n_dead_tup
;
schemaname | relname | n_live_tup | n_dead_tup
------------+---------------------------+------------+------------
public | subscriber_offset_manager | 61453 | 5
(1 row)
But judging from pg_stat_activity and pg_locks, we cannot find any open connection that touches the table.
SELECT query, state,locktype,mode
FROM pg_locks
JOIN pg_stat_activity
USING (pid)
WHERE relation::regclass = 'subscriber_offset_manager'::regclass
;
query | state | locktype | mode
-------+-------+----------+------
(0 rows)
I also tried a full vacuum on this table. These are the results:
No rows are ever removed.
Sometimes all the live tuples become dead tuples.
Here is the output:
vacuum FULL VERBOSE ANALYZE subscriber_offset_manager;
INFO: vacuuming "public.subscriber_offset_manager"
INFO: "subscriber_offset_manager": found 0 removable, 67920 nonremovable row versions in 714 pages
DETAIL: 67720 dead row versions cannot be removed yet.
CPU 0.01s/0.06u sec elapsed 0.13 sec.
INFO: analyzing "public.subscriber_offset_manager"
INFO: "subscriber_offset_manager": scanned 710 of 710 pages, containing 200 live rows and 67720 dead rows; 200 rows in sample, 200 estimated total rows
VACUUM
SELECT schemaname,relname,n_live_tup,n_dead_tup FROM pg_stat_user_tables where relname='subscriber_offset_manager' ORDER BY n_dead_tup
;
schemaname | relname | n_live_tup | n_dead_tup
------------+---------------------------+------------+------------
public | subscriber_offset_manager | 200 | 67749
And after 10 seconds:
SELECT schemaname,relname,n_live_tup,n_dead_tup FROM pg_stat_user_tables where relname='subscriber_offset_manager' ORDER BY n_dead_tup
;
schemaname | relname | n_live_tup | n_dead_tup
------------+---------------------------+------------+------------
public | subscriber_offset_manager | 68325 | 132
How our app queries this table:
Our application generally selects some rows and, based on some business calculation, updates the row.
Select query (select based on some id):
select * from subscriber_offset_manager where shard_id=1 ;
Update query: updates some other column for the selected shard id.
Around 20 threads do this in parallel, and each thread works on only one row.
The app is written in Java and we use Hibernate for DB operations.
PostgreSQL version is 9.3.24.
One more interesting observation:
- When I stop my Java app and then run a full vacuum, it works fine (the number of rows and live tuples become equal). So something goes wrong when we select and update continuously from the Java app.
Problem/Issue
These live tuples sometimes turn into dead tuples, and after some time become live again.
Because of this behaviour, selects from the table take longer and the load on the server increases, since there are lots of live/dead tuples.

I know three things that keep VACUUM from doing its job:
Long running transactions.
Prepared transactions that did not get committed.
Stale replication slots.
See my blog post for details.
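A rough way to check each of the three from psql is sketched below. The views and columns come from the standard system catalogs, but which ones exist depends on the server version: pg_replication_slots was only added in 9.4, so on an older server (such as the 9.3 in the question) the third query simply fails and that cause can be ruled out.

```sql
-- 1. Long-running (or idle) transactions, oldest first.
SELECT pid, state, xact_start, now() - xact_start AS xact_age, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start;

-- 2. Prepared transactions that were never committed or rolled back.
SELECT gid, prepared, owner, database
FROM pg_prepared_xacts
ORDER BY prepared;

-- 3. Replication slots (PostgreSQL 9.4 and later only).
SELECT slot_name, slot_type, active, xmin, catalog_xmin
FROM pg_replication_slots;
```

Anything old that shows up in one of these results is a candidate for what is holding VACUUM back, regardless of which table it last touched.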

I found the issue ☺.
To understand the issue, consider the following flow:
Thread 1 -
Opens a Hibernate session
Makes some queries on Table-A
Selects from subscriber_offset_manager
Updates subscriber_offset_manager
Closes the session
Many threads of type Thread-1 run in parallel.
Thread 2 -
Threads of this type also run in parallel.
Opens a Hibernate session
Makes some select queries on Table-A
Does not close the session (session leak)
Temporary solution: if I close all the connections made by Thread-2 using pg_cancel_backend, vacuuming starts working.
We have also recreated the issue many times, tried this solution, and it worked.
Now, the following doubts are still not answered:
Why is Postgres not showing any data related to table "subscriber_offset_manager"?
The issue does not reproduce if, instead of running Thread-2, we run the select on Table-A using psql.
Why does Postgres behave like this with JDBC?
Some more mind-blowing observations:
Even if we run the queries on "subscriber_offset_manager" in a different session, the issue still occurs.
We found many instances where Thread 2 is working on some third table "Table-C" and the issue still occurs.
The state of all these transactions in pg_stat_activity is "idle in transaction".
@Erwin Brandstetter and @Laurenz Albe, do you know whether there is a bug related to postgres/jdbc?
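For what it's worth, leaked sessions of this kind can be listed with a query along the following lines. This is only a sketch: the 5 minute threshold is an arbitrary assumption, and pg_terminate_backend closes the whole connection (pg_cancel_backend only cancels the query that is currently running), so the application has to cope with dropped connections.

```sql
-- Sessions that hold a transaction open while doing nothing.
SELECT pid, usename, application_name, xact_start, state_change, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND xact_start < now() - interval '5 minutes'   -- arbitrary threshold
ORDER BY xact_start;

-- Terminate them (needs superuser, or the same role as the target session).
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND xact_start < now() - interval '5 minutes'
  AND pid <> pg_backend_pid();
```

For a session that is not active, the query column shows the last statement it ran, which is not necessarily one that touches the bloated table.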

There might be locks after all, your query might be misleading:
SELECT query, state,locktype,mode
FROM pg_locks
JOIN pg_stat_activity USING (pid)
WHERE relation = 'subscriber_offset_manager'::regclass
pg_locks.pid can be NULL, then the join would eliminate rows. The manual for Postgres 9.3:
Process ID of the server process holding or awaiting this lock, or null if the lock is held by a prepared transaction
Bold emphasis mine. (Still the same in pg 10.)
Do you get anything for the simple query?
SELECT * FROM pg_locks
WHERE relation = 'subscriber_offset_manager'::regclass;
This could explain why VACUUM complains:
DETAIL: 67720 dead row versions cannot be removed yet.
This, in turn, would point to problems in your application logic / queries, locking more rows than necessary.
My first idea would be long running transactions, where even a simple SELECT (acquiring a lowly ACCESS SHARE lock) can block VACUUM from doing its job. 20 threads in parallel might chain up and lock out VACUUM indefinitely. Keep your transactions (and their locks) as brief as possible. And make sure your queries are optimized and don't lock more rows than necessary.
One more thing to note: transaction isolation levels SERIALIZABLE or REPEATABLE READ make it much harder for VACUUM to clean up. Default READ COMMITTED mode is less restrictive, but VACUUM can still be blocked as discussed.
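Coming back to the lock query at the top: if you still want the session details next to each lock, a LEFT JOIN keeps the rows where pid is NULL instead of dropping them. A sketch of that query rewritten this way:

```sql
SELECT l.pid, l.locktype, l.mode, l.granted, a.state, a.query
FROM pg_locks l
LEFT JOIN pg_stat_activity a ON a.pid = l.pid   -- keeps locks with NULL pid
WHERE l.relation = 'subscriber_offset_manager'::regclass;
```

Rows with an empty state and query are the ones the inner join would have hidden.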
Related:
What are the consequences of not ending a database transaction?
Postgres UPDATE … LIMIT 1
VACUUM VERBOSE outputs, nonremovable “dead row versions cannot be removed yet”?

Related

Why does Postgres VACUUM FULL ANALYZE gives performance boost but VACUUM ANALYZE does not

I have a large database with the largest tables having more than 30 million records. The database server is a dedicated server with 64 cores, 128 GB RAM running ubuntu and postgres 12. So the server is more powerful than we normally need. The server receives around 300-400 new records every second.
The problem is that almost after 1 week or 10 days of use the database becomes extremely slow, therefore we have to perform VACUUM FULL ANALYZE, and after this everything goes back to normal. But we have to put our server in maintenance mode and then perform this operation every week which is a pain.
I came up with the idea that we don't need a VACUUM FULL and can just run ANALYZE on the database, as it can run in parallel, but this didn't work. There were no performance gains after running it. Even when I run a simple VACUUM on the whole database and then run ANALYZE after it, it still doesn't give the kind of performance boost that we get from VACUUM FULL ANALYZE.
I know that VACUUM FULL copies the data from the old table to a new table and deletes the old table. But what else does it do?
Update:
I have also reindexed the 15 largest tables, to confirm whether this would speed up the database. But this didn't work either.
So I had to execute VACUUM FULL ANALYZE, as I didn't see any other way. Now I am trying to identify the slow queries.
Thanks to jjanes, I was able to enable track_io_timing and also identified a few queries where indexes can be added. I am querying pg_stat_statements like this:
SELECT * FROM pg_stat_statements ORDER BY total_time DESC;
And i get this result.
userid | 10
dbid | 16401
queryid | -3264485807545194012
query | update events set field1 = $1, field2 = $2 , field3= $3, field4 = $4 , field5 =$5 where id = $6
calls | 104559
total_time | 106180828.60536088
min_time | 3.326082
max_time | 259055.09376800002
mean_time | 1015.5111334783633
stddev_time | 1665.0715182035976
rows | 104559
shared_blks_hit | 4456728574
shared_blks_read | 4838722113
shared_blks_dirtied | 879809
shared_blks_written | 326809
local_blks_hit | 0
local_blks_read | 0
local_blks_dirtied | 0
local_blks_written | 0
temp_blks_read | 0
temp_blks_written | 0
blk_read_time | 15074237.05887792
blk_write_time | 15691.634870000113
This query simply updates 1 record, and the table size is around 30 Million records.
Question: This query already uses an index. Can you please advise what the next step should be, and why it is slow? Also, what does the I/O information show?
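To read the I/O part of that output, it can help to derive a few per-call figures from the raw counters. The query below is a sketch that assumes the PostgreSQL 12 column names shown above (total_time rather than the later total_exec_time); the aliases are made up for illustration.

```sql
SELECT query,
       calls,
       round((total_time / calls)::numeric, 2) AS mean_ms,
       -- buffers touched per call, cached or not
       round((shared_blks_hit + shared_blks_read)::numeric / calls, 0) AS blks_per_call,
       -- share of buffer accesses served from shared_buffers
       round(100.0 * shared_blks_hit
             / nullif(shared_blks_hit + shared_blks_read, 0), 1) AS hit_pct,
       -- share of run time spent waiting on reads (needs track_io_timing)
       round((100.0 * blk_read_time / nullif(total_time, 0))::numeric, 1) AS read_time_pct
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;
```

For the row shown above, that works out to roughly 89,000 buffer accesses per call, a cache hit rate of about 48%, and about 14% of the total time spent waiting on reads. Touching that many buffers to update one row by id is far more than a single index lookup would normally need, so it may be worth checking the plan with EXPLAIN (ANALYZE, BUFFERS), inside a transaction that is rolled back, before assuming the index is doing its job.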
As you say, VACUUM FULL is an expensive command. PG's secret weapon is autovacuum, which monitors database stats and attempts to target tables with dead tuples. Read about how to tune it for the database as a whole, and possibly for big tables.

Why is the postgreSQL waiting while executing vacuum full table? 4T table data

I have a bloated table, its name is "role_info".
There are about 20K insert operations and a lot of update operations per day, there are no delete operations.
The table is about 4063GB now.
We have migrated the table to another database using dump, and the new table is about 62GB, so the table on the old database is bloated very seriously.
PostgreSQL version: 9.5.4
The table schema is below:
CREATE TABLE "role_info" (
"roleId" bigint NOT NULL,
"playerId" bigint NOT NULL,
"serverId" int NOT NULL,
"status" int NOT NULL,
"baseData" bytea NOT NULL,
"detailData" bytea NOT NULL,
PRIMARY KEY ("roleId")
);
CREATE INDEX "idx_role_info_serverId_playerId_roleId" ON "role_info" ("serverId", "playerId", "roleId");
The average size of field 'detailData' is about 13KB each line.
There are some SQL execution results below:
1)
SELECT
relname AS name,
pg_stat_get_live_tuples(c.oid) AS lives,
pg_stat_get_dead_tuples(c.oid) AS deads
FROM pg_class c
ORDER BY deads DESC;
Execution Result:
2)
SELECT *,
Pg_size_pretty(total_bytes) AS total,
Pg_size_pretty(index_bytes) AS INDEX,
Pg_size_pretty(toast_bytes) AS toast,
Pg_size_pretty(table_bytes) AS TABLE
FROM (SELECT *,
total_bytes - index_bytes - Coalesce(toast_bytes, 0) AS
table_bytes
FROM (SELECT c.oid,
nspname AS table_schema,
relname AS TABLE_NAME,
c.reltuples AS row_estimate,
Pg_total_relation_size(c.oid) AS total_bytes,
Pg_indexes_size(c.oid) AS index_bytes,
Pg_total_relation_size(reltoastrelid) AS toast_bytes
FROM pg_class c
LEFT JOIN pg_namespace n
ON n.oid = c.relnamespace
WHERE relkind = 'r') a
WHERE table_schema = 'public'
ORDER BY total_bytes DESC) a;
Execution Result:
3)
I have tried to vacuum full the table "role_info", but it seemed blocked by some other process, and didn't execute at all.
select * from pg_stat_activity where query like '%VACUUM%' and query not like '%pg_stat_activity%';
Execution Result:
select * from pg_locks;
Execution Result:
These are the vacuum parameters:
I have two questions:
How to deal with table bloat? Autovacuum does not seem to be working.
Why was the VACUUM FULL blocked?
With your autovacuum settings, it will sleep for 20ms once for every 10 pages (200 cost_limit / 20 cost_dirty) it dirties. Even more, because there will be cost_hit and cost_miss as well. At that rate it would take over 12 days to autovacuum a 4063GB table which is mostly in need of dirtying pages. That is just the throttling time, not counting the actual work time, nor the repeated scanning of the indexes. So the actual run time could be months. The chances of autovacuum getting to run to completion in one sitting without being interrupted by something could be pretty low. Does your database get restarted often? Do you build and drop indexes on this table a lot, or add and drop partitions, or run ALTER TABLE?
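The 12-day figure can be reproduced with a back-of-the-envelope calculation, assuming the default 8 kB page size and that every page ends up being dirtied:

```sql
-- 4063 GB of 8 kB pages, 10 dirtied pages per 20 ms sleep, result in days.
SELECT round(4063.0 * 1024 * 1024 / 8   -- pages in the table
             / 10                        -- pages handled per sleep
             * 0.020                     -- seconds slept each time
             / 86400, 1) AS throttle_days;
```

This comes out at a bit over 12 days of pure sleep time, before any real work is counted.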
Note that in v12, the default setting of autovacuum_vacuum_cost_delay was lowered by a factor of 10. This is not just because of some change to the code in v12, it was because we realized the default setting was just not sensible on modern hardware. So it would probably make sense to backport this change to your existing database, if not go even further. Before 12, you can't lower it to less than 1 ms, but you could lower it to 1 ms and also either increase autovacuum_vacuum_cost_limit or lower the vacuum_cost_page_* settings.
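As a sketch, the throttling could be relaxed for just this table through storage parameters, leaving the global defaults alone. The values 1 and 2000 are illustrative assumptions, not recommendations; they need to be sized against what the storage can sustain.

```sql
-- Per-table override for the main heap.
ALTER TABLE role_info SET (
    autovacuum_vacuum_cost_delay = 1,      -- milliseconds
    autovacuum_vacuum_cost_limit = 2000
);

-- The same for the table's TOAST relation, which presumably holds
-- most of the large bytea values.
ALTER TABLE role_info SET (
    toast.autovacuum_vacuum_cost_delay = 1,
    toast.autovacuum_vacuum_cost_limit = 2000
);

-- Check what is currently set on the main table.
SELECT relname, reloptions FROM pg_class WHERE relname = 'role_info';
```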
Now this analysis is based on the table already being extremely bloated. Why didn't autovacuum prevent it from getting this bloated in the first place, back when the table was small enough to be autovacuumed in a reasonable time? That is hard to say. We really have no evidence as to what happened back then. Maybe your settings were even more throttled than they are now (although unlikely as it looks like you just accepted the defaults), maybe it was constantly interrupted by something. What is the "autovacuum_count" from pg_stat_all_tables for the table and its toast table?
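The counters asked about here can be read with something like the following. It is a sketch; it looks up the TOAST table through pg_class.reltoastrelid rather than assuming its name.

```sql
SELECT s.relname, s.autovacuum_count, s.last_autovacuum,
       s.vacuum_count, s.last_vacuum, s.n_dead_tup
FROM pg_stat_all_tables s
WHERE s.relid = 'role_info'::regclass
   OR s.relid = (SELECT reltoastrelid
                 FROM pg_class
                 WHERE oid = 'role_info'::regclass);
```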
Why was the VACUUM FULL blocked?
Because that is how it works, as documented. That is why it is important to avoid getting into this situation in the first place. VACUUM FULL needs to swap around filenodes at the end, and needs an AccessExclusive lock to do that. It could take a weaker lock at first and then try to upgrade to AccessExclusive later, but lock upgrades have a strong deadlock risk, so it takes the strongest lock it needs up front.
You need a maintenance window where no one else is using the table. If you think you are already in such a window, then you should look at the query text for the process doing the blocking. Because the lock already held is ShareUpdateExclusive, the thing holding it is not a normal query/DML, but some kind of DDL or maintenance operation.
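To see what is holding the conflicting lock, a query along these lines lists every lock on the table together with the session behind it. It is only a sketch; the LEFT JOIN is there so that locks with no matching session row still show up.

```sql
SELECT l.pid, l.mode, l.granted, a.state, a.xact_start, a.query
FROM pg_locks l
LEFT JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.relation = 'role_info'::regclass
ORDER BY l.granted DESC, a.xact_start;
```

Rows with granted = true are the holders; the waiting VACUUM FULL shows up with granted = false.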
If you can't take a maintenance window now, then you can at least do a manual VACUUM without the FULL. This takes a much weaker lock. It probably won't shrink the table dramatically, but should at least free up space for internal reuse so it stops getting even bigger while you figure out when you can schedule a maintenance window or what your other next steps are.

Postgres slow distinct query for multiple columns

I have a very simple query that is taking way too long to run.
SELECT DISTINCT col1,col2,col3,col4 FROM tbl1;
What indexes do I need to add to speed up? I ran a simple vacuum; command and added the following index but neither helped.
CREATE INDEX tbl_idx ON tbl1(col1,col2,col3,col4);
The table has 400k rows. In fact counting them is taking extremely long as well. Running a simple
SELECT count(*) from tbl1;
is taking 8 seconds. So it's possible my problems are with vacuuming or reindexing or something else; I'm not sure.
Here is the explain command
EXPLAIN SELECT DISTINCT col1,col2,col3,col4 FROM tbl1;
QUERY PLAN
---------------------------------------------------------------------------------
Unique (cost=3259846.80..3449267.51 rows=137830 width=25)
-> Sort (cost=3259846.80..3297730.94 rows=15153657 width=25)
Sort Key: col1, col2, col3, col4
-> Seq Scan on tbl1 (cost=0.00..727403.57 rows=15153657 width=25)
(4 rows)
Edit: I'm currently running vacuum full; which hopefully fixes the issue and then maybe someone can give me some pointers on how to fix where I went wrong. It is several hours in and still going as far as I can tell. I did run
select relname, last_autoanalyze, last_autovacuum, last_vacuum, n_dead_tup from pg_stat_all_tables where n_dead_tup >0;
and the table has nearly 16 million n_dead_tup rows.
My data doesn't change that frequently so I ended up creating a materialized view
CREATE MATERIALIZED VIEW tbl1_distinct_view AS SELECT DISTINCT col1,col2,col3,col4 FROM tbl1;
that I refresh with a cronjob once a day at 6am
0 6 * * * psql -U mydb mydb -c 'REFRESH MATERIALIZED VIEW tbl1_distinct_view;'
Try forcing the database to use your index:
set enable_seqscan=off ;
SELECT DISTINCT col1,col2,col3,col4 FROM tbl1;
set enable_seqscan=on ;
VACUUM and VACUUM FULL are two commands that sound the same but have very different effects.
VACUUM scans a table for tuples that it no longer needs, so that it can overwrite that space during INSERT or UPDATE statements. This command only looks at deleted rows, and does not "defragment" the table - it leaves the space usage the same, but simply marks some space as "dead" in order that it can be reused.
VACUUM FULL looks at every row, and reclaims the space left by deleted rows and dead tuples, essentially "defragmenting" the table. If this is done on a live table, it can take a very long time, and can result in heavy weight locks, increased IO, and index bloat.
I imagine what you need is a VACUUM followed by an ANALYZE, which will rebuild your statistics for each table, improving index performance. These should be performed reasonably regularly in low-usage times for a database. Only if you have a lot of space to reclaim (due to lots of DELETE statements) should you use VACUUM FULL.
Anyhow, since you've run a VACUUM FULL, once that is complete you should run an ANALYZE on the database, followed by a REINDEX (on the database), and then an EXPLAIN on your query again. You should notice an improvement.

n_dead_tup vs dead_tuple_count in postgresql?

I initially thought n_dead_tup and dead_tuple_count in PostgreSQL give the same counts, but they don't seem to. I do not quite understand what exactly the difference is.
Following are my observations:
Created a table with 10k rows.
Updated all the 10k rows. Now I have 10k dead tuples.
SELECT dead_tuple_count FROM public.pgstattuple('public.vacuum_test');
dead_tuple_count
------------------
10002
select * from pg_stat_get_dead_tuples('18466');
pg_stat_get_dead_tuples
-------------------------
10002
I did vacuum full on the table. As expected dead_tuple_count is 0.
SELECT dead_tuple_count FROM public.pgstattuple('public.vacuum_test');
dead_tuple_count
------------------
0
But n_dead_tup from pg_stat_all_tables i.e pg_stat_get_dead_tuples('18466') is still 10002:
select * from pg_stat_get_dead_tuples('18466');
pg_stat_get_dead_tuples
-------------------------
10002
I repeated this process several times and observed that number of updated tuples is getting added to the stat n_dead_tup after every update.
So what exactly is VACUUM doing here?
And what is the difference between n_dead_tup and dead_tuple_count?
pgstattuple scans the tables and calculates real-time results. It can be quite slow for a big table, but produces accurate results.
Access to the pg_stat views, directly or via functions like pg_stat_get_dead_tuples, uses the most recent data collected by ANALYZE. So it can be out of date, especially if you just made big changes. However, it's very fast to access.
If you ANALYZE the table, the stats will match again, or close. They'll often not be exactly the same because the stats from ANALYZE are just estimates.
BTW, it's time to upgrade from 8.4 to something current.
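To watch the two numbers converge, one can refresh the statistics and read both sources side by side. A sketch, assuming the pgstattuple functions are installed in the public schema as in the question:

```sql
ANALYZE public.vacuum_test;

-- Estimate kept in the statistics views.
SELECT n_live_tup, n_dead_tup, last_analyze
FROM pg_stat_all_tables
WHERE relid = 'public.vacuum_test'::regclass;

-- Exact count from a full scan of the table.
SELECT tuple_count, dead_tuple_count
FROM public.pgstattuple('public.vacuum_test');
```

The statistics views are updated asynchronously, so the first result may lag the ANALYZE by a moment.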
VACUUM FULL is a little buggy in that it doesn't reset those statistics counters. An ordinary VACUUM would do so.