Weird Postgres Xlog - postgresql-9.4

I have a question about the Postgres xlog. According to the Postgres documentation, the Postgres xlog should be in this format: 0000000100000044000000FE. I don't understand why the xlog format in my database is 000000010000004400000**5**FE. Where does the additional 5 come from?
Here are the real database xlogs for Postgres 9.4:
```
cat PG_VERSION
9.4
ll pg_xlog/ | tail -n 10
-rw------- 1 postgres postgres 2097152 May 12 03:24 0000000100000044000005FE
-rw------- 1 postgres postgres 2097152 May 12 03:24 0000000100000044000005FF
-rw------- 1 postgres postgres 2097152 May 12 03:24 000000010000004400000600
-rw------- 1 postgres postgres 2097152 May 12 03:24 000000010000004400000601
-rw------- 1 postgres postgres 2097152 May 12 03:24 000000010000004400000602
-rw------- 1 postgres postgres 2097152 May 12 03:24 000000010000004400000603
-rw------- 1 postgres postgres 2097152 May 12 03:24 000000010000004400000604
-rw------- 1 postgres postgres 2097152 May 12 03:24 000000010000004400000605
-rw------- 1 postgres postgres 2097152 May 12 03:24 000000010000004400000606
drwx------ 2 postgres postgres 45056 May 12 15:10 archive_status/
```
Thank you in advance

There is nothing "weird" about the filenames. You correctly stated the format, but the value you quoted is only an example; the actual names on a real system will be different.
Quote from the manual:
Segment files are given ever-increasing numbers as names, starting at 000000010000000000000000. The numbers do not wrap, but it will take a very, very long time to exhaust the available stock of numbers.
So the filename reflects an ever-increasing number that has some structure in it. You can't expect any specific values in there; on a busy system they will simply keep increasing.
The first 8 hex digits (the 00000001) are the timeline ID, if I am not mistaken; the remaining 16 digits are the sequence number of the segment within that timeline.
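If you want to see which segment the server is writing right now, you can ask it directly. A minimal check (assuming you can connect as the postgres superuser; pg_current_xlog_location() and pg_xlogfile_name() are the function names in 9.4) could be:
```
# Map the current WAL write position to the segment file it lives in (9.4 function names).
psql -U postgres -c "SELECT pg_current_xlog_location(), pg_xlogfile_name(pg_current_xlog_location());"
```
The returned file name has the same timeline/segment structure as the files you listed.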

Related

PostgreSQL keeps WAL segments not required by any replication slot

I have wal_keep_segments set to 3000, but the pg_xlog directory contains more than 6000 WAL segments. Interestingly, there are ~3000 files dated after Aug 14, so the files dated before Aug 14 should not exist, I guess. Also, these files have the executable bit set.
```
$ ls -al pg_xlog | grep -A2 -B2 00000001000034DB0000003B
-rwx------ 1 postgres postgres 16777216 Jul 19 07:58 00000001000034DB00000039
-rwx------ 1 postgres postgres 16777216 Jul 19 07:58 00000001000034DB0000003A
-rwx------ 1 postgres postgres 16777216 Jul 19 07:58 00000001000034DB0000003B
-rw------- 1 postgres postgres 16777216 Aug 14 19:17 0000000100003826000000EA
-rw------- 1 postgres postgres 16777216 Aug 14 19:17 0000000100003826000000EB
```
This cluster has no replication slots; archive_mode is enabled but archive_command is set to /bin/true. I think the new WAL segments are being recycled and the total amount is about 6000, but Postgres does not delete the old files for some reason. Any ideas?
PostgreSQL is not in the habit of setting executable flags on WAL segments.
Besides, it looks like there is a gap in the numbering.
These files must have ended up there by accident; you can delete them.
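If you want to double-check before removing anything, you can list only the files that carry the suspicious executable bit; in this case those are exactly the pre-gap files. A cautious sketch (run from the data directory as the postgres user, after reviewing the listing) might look like:
```
# List WAL segment files with the user-executable bit set -- the accidental ones here.
find pg_xlog -maxdepth 1 -type f -perm -u+x -ls

# Once the listing has been reviewed, the same expression with -delete removes them:
# find pg_xlog -maxdepth 1 -type f -perm -u+x -delete
```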

pg_top output analysis of puppetdb with postgres

I recently started using a tool called pg_top that shows statistics for Postgres; however, since I am not very well versed in the internals of Postgres, I need a bit of clarification on the output.
last pid: 6152; load avg: 19.1, 18.6, 20.4; up 119+20:31:38 13:09:41
41 processes: 5 running, 36 sleeping
CPU states: 52.1% user, 0.0% nice, 0.8% system, 47.1% idle, 0.0% iowait
Memory: 47G used, 16G free, 2524M buffers, 20G cached
DB activity: 151 tps, 0 rollbs/s, 253403 buffer r/s, 86 hit%, 1550639 row r/s, 21 row w/s
DB I/O: 0 reads/s, 0 KB/s, 35 writes/s, 2538 KB/s
DB disk: 233.6 GB total, 195.1 GB free (16% used)
Swap:
My question is about the DB activity line: is 1.5 million row r/s a lot? If so, what can be done to improve it? I am running PuppetDB 2.3.8 with 6.8 million resources and 2500 nodes, and Postgres 9.1. All of this runs on a single 24-core box with 64GB of memory.
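For what it's worth, the row r/s figure appears to be a rate derived from the cumulative tuple counters in pg_stat_database, so you can sample the raw numbers yourself to see where the reads come from (a rough sketch; the puppetdb database name and the 10-second interval are just assumptions):
```
# Sample the cumulative counters pg_top's rates appear to be derived from,
# twice with a 10-second gap, to compute rows read per second by hand.
psql -U postgres -d puppetdb -c "SELECT tup_returned, tup_fetched FROM pg_stat_database WHERE datname = current_database();"
sleep 10
psql -U postgres -d puppetdb -c "SELECT tup_returned, tup_fetched FROM pg_stat_database WHERE datname = current_database();"
```
A tup_returned figure that dwarfs tup_fetched is usually taken as a sign of large sequential scans, which pg_stat_user_tables (seq_scan, seq_tup_read) can break down per table.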

Postgres gets out of memory errors despite having plenty of free memory

I have a server running Postgres 9.1.15. The server has 2GB of RAM and no swap. Intermittently Postgres will start getting "out of memory" errors on some SELECTs, and will continue doing so until I restart Postgres or some of the clients that are connected to it. What's weird is that when this happens, free still reports over 500MB of free memory.
select version();:
PostgreSQL 9.1.15 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
uname -a:
Linux db 3.2.0-23-virtual #36-Ubuntu SMP Tue Apr 10 22:29:03 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
Postgresql.conf (everything else is commented out/default):
max_connections = 100
shared_buffers = 500MB
work_mem = 2000kB
maintenance_work_mem = 128MB
wal_buffers = 16MB
checkpoint_segments = 32
checkpoint_completion_target = 0.9
random_page_cost = 2.0
effective_cache_size = 1000MB
default_statistics_target = 100
log_temp_files = 0
I got these values from pgtune (I chose "mixed type of applications") and have been fiddling with them based on what I've read, without making much real progress. At the moment there are 68 connections, which is a typical number (I'm not using pgbouncer or any other connection pooler yet).
/etc/sysctl.conf:
kernel.shmmax=1050451968
kernel.shmall=256458
vm.overcommit_ratio=100
vm.overcommit_memory=2
I first changed overcommit_memory to 2 about a fortnight ago after the OOM killer killed the Postgres server. Prior to that the server had been running fine for a long time. The errors I get now are less catastrophic but much more annoying because they are much more frequent.
I haven't had much luck pinpointing the first event that causes postgres to run "out of memory" - it seems to be different each time. The most recent time it crashed, the first three lines logged were:
2015-04-07 05:32:39 UTC ERROR: out of memory
2015-04-07 05:32:39 UTC DETAIL: Failed on request of size 125.
2015-04-07 05:32:39 UTC CONTEXT: automatic analyze of table "xxx.public.delayed_jobs"
TopMemoryContext: 68688 total in 10 blocks; 4560 free (4 chunks); 64128 used
[... snipped heaps of lines which I can provide if they are useful ...]
---
2015-04-07 05:33:58 UTC ERROR: out of memory
2015-04-07 05:33:58 UTC DETAIL: Failed on request of size 16.
2015-04-07 05:33:58 UTC STATEMENT: SELECT oid, typname, typelem, typdelim, typinput FROM pg_type
2015-04-07 05:33:59 UTC LOG: could not fork new process for connection: Cannot allocate memory
2015-04-07 05:33:59 UTC LOG: could not fork new process for connection: Cannot allocate memory
2015-04-07 05:33:59 UTC LOG: could not fork new process for connection: Cannot allocate memory
TopMemoryContext: 396368 total in 50 blocks; 10160 free (28 chunks); 386208 used
[... snipped heaps of lines which I can provide if they are useful ...]
---
2015-04-07 05:33:59 UTC ERROR: out of memory
2015-04-07 05:33:59 UTC DETAIL: Failed on request of size 1840.
2015-04-07 05:33:59 UTC STATEMENT: SELECT... [nested select with 4 joins, 19 ands, and 2 order bys]
TopMemoryContext: 388176 total in 49 blocks; 17264 free (55 chunks); 370912 used
The crash before that, a few hours earlier, just had three instances of that last query as the first three lines of the crash. That query gets run very often, so I'm not sure if the issues are because of this query, or if it just comes up in the error log because it's a reasonably complex SELECT getting run all the time. That said, here's an EXPLAIN ANALYZE of it: http://explain.depesz.com/s/r00
This is what ulimit -a for the postgres user looks like:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 15956
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 15956
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
I'll try to get the exact numbers from free next time there's a crash; in the meantime, this is a braindump of all the info I have.
Any ideas on where to go from here?
I just ran into this same issue with a ~2.5 GB plain-text SQL file I was trying to restore. I scaled my Digital Ocean server up to 64 GB RAM, created a 10 GB swap file, and tried again. I got an out-of-memory error with 50 GB free, and no swap in use.
I scaled back my server to the small 1 GB instance I was using (requiring a reboot) and figured I'd give it another shot for no other reason than I was frustrated. I started the import and realized I forgot to create my temporary swap file again.
I created it in the middle of the import. psql made it a lot further before crashing. It made it through 5 additional tables.
I think there must be a bug allocating memory in psql.
Can you check whether there is any swap available when the error comes up?
I completely removed the swap on my Linux desktop (just for testing other things...) and I got exactly the same error! I'm pretty sure that this is what is going on with you too.
It is a bit suspicious that you report the same amount of free memory as your shared_buffers size. Are you sure you are looking at the right values?
Output of the free command at the time of the crash would be useful, as well as the contents of /proc/meminfo.
Beware that setting overcommit_memory to 2 is not very effective when overcommit_ratio is 100. It basically limits memory allocation to the size of swap (0 in this case) + 100% of physical RAM, which leaves no room for shared memory and disk caches.
You should probably set overcommit_ratio to 50.
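To see how much headroom the current settings actually leave, compare the kernel's commit limit with what has already been committed (both are standard fields in /proc/meminfo):
```
# Current overcommit policy and ratio
cat /proc/sys/vm/overcommit_memory /proc/sys/vm/overcommit_ratio

# With overcommit_memory=2: CommitLimit = swap + (overcommit_ratio/100) * RAM.
# Allocations start failing with ENOMEM once Committed_AS approaches CommitLimit.
grep -E 'CommitLimit|Committed_AS' /proc/meminfo
```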

Postgresql Logger Process

I'm trying to determine if Postgres 9.3 still has a logger process. It isn't referenced anywhere in the "PostgreSQL 9.3.4 Documentation", and I can't find it in my cluster's process list (see below). Also, does anyone know of a good general overview of the memory structures in 9.3?
postgres 21397 1 0 20:51 pts/1 00:00:00 /opt/PostgreSQL/9.3/bin/postgres
postgres 21399 21397 0 20:51 ? 00:00:00 postgres: checkpointer process
postgres 21400 21397 0 20:51 ? 00:00:00 postgres: writer process
postgres 21401 21397 0 20:51 ? 00:00:00 postgres: wal writer process
postgres 21402 21397 0 20:51 ? 00:00:00 postgres: autovacuum launcher process
postgres 21403 21397 0 20:51 ? 00:00:00 postgres: archiver process last was 0001000004000092
postgres 21404 21397 0 20:51 ? 00:00:00 postgres: stats collector process
Thanks
Jim
Postgres has a logging collector process, which is controlled through the config parameter logging_collector.
So in your postgresql.conf file, you would make sure this is set:
logging_collector = on
The blurb on this param from the postgres doc:
This parameter enables the logging collector, which is a background
process that captures log messages sent to stderr and redirects them
into log files. This approach is often more useful than logging to
syslog, since some types of messages might not appear in syslog
output. (One common example is dynamic-linker failure messages;
another is error messages produced by scripts such as
archive_command.) This parameter can only be set at server start.
It will show up in the process list with the following description:
postgres: logger process
For more info: http://www.postgresql.org/docs/current/static/runtime-config-logging.html
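After setting that and restarting the server, you can confirm both the setting and the extra process (a quick check, assuming you can run ps and connect as postgres):
```
# The collector shows up as an extra background process...
ps -ef | grep '[p]ostgres: logger'
# ...and the live setting can be confirmed from SQL:
psql -U postgres -c "SHOW logging_collector;"
```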
Regarding the memory structures, I'm not sure offhand, but would recommend you post that as a separate question.

Finding size of postgres shared memory segment on CentOS 6.3

I am trying to debug some shared memory issues with Postgres 9.3.1 and CentOS release 6.3 (Final). Using top, I can see that many of the postgres connections are using shared memory:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3534 postgres 20 0 2330m 1.4g 1.1g S 0.0 20.4 1:06.99 postgres: deploy mtalcott 10.222.154.172(53495) idle
9143 postgres 20 0 2221m 1.1g 983m S 0.0 16.9 0:14.75 postgres: deploy mtalcott 10.222.154.167(35811) idle
6026 postgres 20 0 2341m 1.1g 864m S 0.0 16.4 0:46.56 postgres: deploy mtalcott 10.222.154.167(37110) idle
18538 postgres 20 0 2327m 1.1g 865m S 0.0 16.1 2:06.59 postgres: deploy mtalcott 10.222.154.172(47796) idle
1575 postgres 20 0 2358m 1.1g 858m S 0.0 15.9 1:41.76 postgres: deploy mtalcott 10.222.154.172(52560) idle
...
There are about 29 total idle connections. However, sudo ipcs -m only shows:
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x0052e2c1 163840 postgres 600 48 21
Surprisingly, it only shows it using 48 bytes. Why doesn't ipcs show a larger segment? Is there a different command I should be using?
I think it is because your Postgres is version 9.3, which uses POSIX-style shared memory: the main segment is allocated with mmap, and only a tiny System V segment is kept around, which is the 48-byte entry you see. ipcs -m only shows System V shared memory segments, which is what prior Postgres versions used.
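To see the actual shared memory in use, you have to look at the process memory mappings rather than ipcs. A rough sketch (3534 is just one of the backend PIDs from the top output above):
```
# The main shared memory area in 9.3 is an mmap'd region, invisible to ipcs.
# Show the largest mappings of one backend; the big shared mapping is roughly
# shared_buffers sized and is what top counts in the SHR column.
sudo pmap -x 3534 | sort -k2 -n | tail
```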