checkpoint_completion_target being ignored - postgresql

I'm testing checkpoint_completion_target in RDS PostgreSQL and see that checkpoint is taking total time of 28.5 seconds. However, I configured the
checkpoint_completion_target = 0.9
checkpoint_timeout = 300
According to this, should the checkpoint spread for 300*0.9 which is 270 seconds?
PostgreSQL version 11.10
Log:
2021-03-19 16:06:47 UTC::#:[25023]:LOG: checkpoint starting: time
2021-03-19 16:07:16 UTC::#:[25023]:LOG: checkpoint complete: wrote 283 buffers (0.2%); 0 WAL file(s) added, 0 removed, 1 recycled; write=28.500 s, sync=0.006 s, total=28.533 s; sync files=56, longest=0.006 s, average=0.000 s; distance=64990 kB, estimate=68721 kB
https://www.postgresql.org/docs/10/runtime-config-wal.html
https://www.postgresql.org/docs/11/wal-configuration.html

The checkpointer implements its throttling by napping in 0.1 second chunks. And there is no provision for taking more than one nap per buffer needing to be written. So if there is very little work to be done, it will finish early despite the setting of checkpoint_completion_target.

Related

Postgres - pg_statistic vacuum timeout

In my Aurora Postgres server I continuously see a vacuum timeout every minute for the following:
autovacuum: VACUUM pg_catalog.pg_statistic
I tried doing it manually and got following output:
INFO: vacuuming "pg_catalog.pg_statistic"
INFO: "pg_statistic": found 0 removable, 409109 nonremovable row versions in 19981 out of 19990 pages
DETAIL: 408390 dead row versions cannot be removed yet, oldest xmin: 4230263474
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 9 frozen pages.
0 pages are entirely empty.
CPU: user: 0.06 s, system: 0.00 s, elapsed: 0.07 s.
INFO: vacuuming "pg_toast.pg_toast_2619"
INFO: "pg_toast_2619": found 0 removable, 272673 nonremovable row versions in 61558 out of 61558 pages
DETAIL: 219 dead row versions cannot be removed yet, oldest xmin: 4230263474
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.14 s, system: 0.00 s, elapsed: 0.14 s.
VACUUM
Query returned successfully in 5 secs 55 msec.
Anyone can point to the reason why this is happening?
I think your autovacuum might work as well, Timeout:VacuumDelay might only be a metric From AWS
A process is waiting in a cost-based vacuum delay point.
The most similar setting is autovacuum_vacuum_cost_delay which exists in PostgreSQL.
That want to slow down autovacuum. autovacuum will sleep for these many milliseconds when a cleanup reaching autovacuum_vacuum_cost_limit cost is done.
We can try to use pg_stat_user_tables to verify the latest time of autovacuum.
SELECT
schemaname, relname,
last_autovacuum,
last_autoanalyze
FROM pg_stat_user_tables;

Postgres walwriter and background writer using memory

I just did a large import of renderd tiles for an osm server. I want to start my next process (running the import of nominatim) but it takes a lot of memory. The problem I have is that walwriter, background writer, checkpointer are consuming 131GB of memory. I checked pg_top and the processes are sleeping. Is there any way to clear these processes safely or just force postgres to complete the walwriter and background writer?
I am using Postgres v12, and shared_buffers is set to 128GB.
HTOP:
pg_top:
last pid: 628600; load avg 0.08, 0.03, 0.04; up 1+00:31:38 02:22:22
5 5 sleeping
CPU states: 0.0% user, 0.0% nice, 0.0% system, 100% idle, 0.0% iowait
Memory: 487G used, 16G free, 546M buffers, 253G cached
DB activity: 0 tps, 0 rollbs/s, 0 buffer r/s, 100 hit%, 43 row r/s, 0 row w/s -
DB I/O: 0 reads/s, 0 KB/s, 0 writes/s, 0 KB/s
DB disk: 3088.7 GB total, 2538.8 GB free (17% used)
Swap: 45M used, 8147M free, 588K cached
627692 postgres 20 0 131G 4368K sleep 0:00 0.00% 0.00% postgres: 12/main: background writer
627691 postgres 20 0 131G 6056K sleep 0:00 0.00% 0.00% postgres: 12/main: checkpointer
627693 postgres 20 0 131G 4368K sleep 0:00 0.00% 0.00% postgres: 12/main: walwriter
628601 postgres 20 0 131G 11M sleep 0:00 0.00% 0.00% postgres: 12/main: postgres postgres [local] idle
627695 postgres 20 0 131G 6924K sleep 0:00 0.00% 0.00% postgres: 12/main: logical replication launcher
pg_wal directory:
Everything is just fine, and htop is lying to you.
Of course the background processes that access shared buffers will use that memory, and since it is shared memory, it is reported for each of these processes. In reality, it is allocated only once.
The shared memory allocated by PostgreSQL is slightly bigger than shared_buffers, so if that parameter is set to 128GB, you reported data make sense and are perfectly normal.
If you set max_wal_size = 32GB, it is normal to have a lot of WAL segments.

pg_top output analysis of puppetdb with postgres

I recently started using a tool called pg_top that shows statistics for Postgres, however since I am not verify versed with the internals of Postgres I needed a bit of clarification on the output.
last pid: 6152; load avg: 19.1, 18.6, 20.4; up 119+20:31:38 13:09:41
41 processes: 5 running, 36 sleeping
CPU states: 52.1% user, 0.0% nice, 0.8% system, 47.1% idle, 0.0% iowait
Memory: 47G used, 16G free, 2524M buffers, 20G cached
DB activity: 151 tps, 0 rollbs/s, 253403 buffer r/s, 86 hit%, 1550639 row r/s,
21 row w/s
DB I/O: 0 reads/s, 0 KB/s, 35 writes/s, 2538 KB/s
DB disk: 233.6 GB total, 195.1 GB free (16% used)
Swap:
My question is under the DB Activity, the 1.5 million row r/s, is that a lot? If so what can be done to improve it? I am running puppetdb 2.3.8, with 6.8 million resources, 2500 nodes, and Postgres 9.1. All of this runs on a single 24 core box with 64GB of memory.

Postgres gets out of memory errors despite having plenty of free memory

I have a server running Postgres 9.1.15. The server has 2GB of RAM and no swap. Intermittently Postgres will start getting "out of memory" errors on some SELECTs, and will continue doing so until I restart Postgres or some of the clients that are connected to it. What's weird is that when this happens, free still reports over 500MB of free memory.
select version();:
PostgreSQL 9.1.15 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
uname -a:
Linux db 3.2.0-23-virtual #36-Ubuntu SMP Tue Apr 10 22:29:03 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
Postgresql.conf (everything else is commented out/default):
max_connections = 100
shared_buffers = 500MB
work_mem = 2000kB
maintenance_work_mem = 128MB
wal_buffers = 16MB
checkpoint_segments = 32
checkpoint_completion_target = 0.9
random_page_cost = 2.0
effective_cache_size = 1000MB
default_statistics_target = 100
log_temp_files = 0
I got these values from pgtune (I chose "mixed type of applications") and have been fiddling with them based on what I've read, without making much real progress. At the moment there's 68 connections, which is a typical number (I'm not using pgbouncer or any other connection poolers yet).
/etc/sysctl.conf:
kernel.shmmax=1050451968
kernel.shmall=256458
vm.overcommit_ratio=100
vm.overcommit_memory=2
I first changed overcommit_memory to 2 about a fortnight ago after the OOM killer killed the Postgres server. Prior to that the server had been running fine for a long time. The errors I get now are less catastrophic but much more annoying because they are much more frequent.
I haven't had much luck pinpointing the first event that causes postgres to run "out of memory" - it seems to be different each time. The most recent time it crashed, the first three lines logged were:
2015-04-07 05:32:39 UTC ERROR: out of memory
2015-04-07 05:32:39 UTC DETAIL: Failed on request of size 125.
2015-04-07 05:32:39 UTC CONTEXT: automatic analyze of table "xxx.public.delayed_jobs"
TopMemoryContext: 68688 total in 10 blocks; 4560 free (4 chunks); 64128 used
[... snipped heaps of lines which I can provide if they are useful ...]
---
2015-04-07 05:33:58 UTC ERROR: out of memory
2015-04-07 05:33:58 UTC DETAIL: Failed on request of size 16.
2015-04-07 05:33:58 UTC STATEMENT: SELECT oid, typname, typelem, typdelim, typinput FROM pg_type
2015-04-07 05:33:59 UTC LOG: could not fork new process for connection: Cannot allocate memory
2015-04-07 05:33:59 UTC LOG: could not fork new process for connection: Cannot allocate memory
2015-04-07 05:33:59 UTC LOG: could not fork new process for connection: Cannot allocate memory
TopMemoryContext: 396368 total in 50 blocks; 10160 free (28 chunks); 386208 used
[... snipped heaps of lines which I can provide if they are useful ...]
---
2015-04-07 05:33:59 UTC ERROR: out of memory
2015-04-07 05:33:59 UTC DETAIL: Failed on request of size 1840.
2015-04-07 05:33:59 UTC STATEMENT: SELECT... [nested select with 4 joins, 19 ands, and 2 order bys]
TopMemoryContext: 388176 total in 49 blocks; 17264 free (55 chunks); 370912 used
The crash before that, a few hours earlier, just had three instances of that last query as the first three lines of the crash. That query gets run very often, so I'm not sure if the issues are because of this query, or if it just comes up in the error log because it's a reasonably complex SELECT getting run all the time. That said, here's an EXPLAIN ANALYZE of it: http://explain.depesz.com/s/r00
This is what ulimit -a for the postgres user looks like:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 15956
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 15956
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
I'll try and get the exact numbers from free next time there's a crash, in the meantime this is a braindump of all the info I have.
Any ideas on where to go from here?
I just ran into this same issue with a ~2.5 GB plain-text SQL file I was trying to restore. I scaled my Digital Ocean server up to 64 GB RAM, created a 10 GB swap file, and tried again. I got an out-of-memory error with 50 GB free, and no swap in use.
I scaled back my server to the small 1 GB instance I was using (requiring a reboot) and figured I'd give it another shot for no other reason than I was frustrated. I started the import and realized I forgot to create my temporary swap file again.
I created it in the middle of the import. psql made it a lot further before crashing. It made it through 5 additional tables.
I think there must be a bug allocating memory in psql.
Can you check if there's any swap memory available when the error raises up?
I've remove completely the swap memory in my Linux desktop (just for testing other things...) and I got the exactly same error! I'm pretty sure that this is what is going on with you too.
It is a bit suspicious that you report the same free memory size as your shared_buffers size. Are you sure you are looking the right values?
Output of free command at the time of crash would be useful as well as the content of /proc/meminfo
Beware that setting overcommit_memory to 2 is not so effective if you see the overcommit_ratio to 100. It will basically limits the memory allocation to the size swap (0 in this case) + 100% of physical RAM, which doesn't take into account any space for shared memory and disk caches.
You should probably set overcommit_ratio to 50.

New zero sized sphinx index

I have just added a new index in sphinx server where mysql is running on local server, sphinx quires to local mysql, it searches data but index is zero sized,
sing config file '/usr/local/sphinx/etc/sphinx.conf'...
indexing index 'test'...
collected 861000 docs, 0.0 MB
collected 932575 docs, 0.0 MB
total 932575 docs, 0 bytes
total 479.180 sec, 0 bytes/sec, 1946.18 docs/sec
total 0 reads, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
total 4 writes, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
rotating indices: succesfully sent SIGHUP to searchd (pid=5545).
please help
This issue was resolved after restarting sphinx service and .new.sp* files which sphinx created also gone after restarting sphinx.