How to manage the wals dir on a PostgreSQL replication server?

I have two PostgreSQL 9 servers running on Amazon EC2. One is the master, the other is a replication server in standby.
The replication server keeps failing as its hard drive fills up. It appears that the following directory is constantly growing:
/usr/local/pgsql/wals
There are thousands of files like:
-rw------- 1 pgsql users 16777216 Jan 3 20:36 000000010000001B000000A2
-rw------- 1 pgsql users 16777216 Jan 3 20:40 000000010000001B000000A3
-rw------- 1 pgsql users 16777216 Jan 3 20:46 000000010000001B000000A4
How do you set this up so it doesn't fail? Do the WAL files need to be rotated automatically, or is something else going on? I could really use your advice. Thank you.

You should configure an archive_cleanup_command in your recovery.conf file. Check out pg_archivecleanup; it's made for this purpose.
The WAL files could also serve as an archive for backup and recovery purposes, so they are not automatically deleted if you are just doing replication.
(Alternatively, you could use whatever hand-crafted method you like to clean up the archive, but that could be a bit difficult and error prone.)
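For example, a minimal recovery.conf on the standby might look like this (a sketch: the /usr/local/pgsql/wals path is taken from your listing, and restore_command is shown only for context, assuming the standby restores from that same directory):
restore_command = 'cp /usr/local/pgsql/wals/%f %p'
archive_cleanup_command = 'pg_archivecleanup /usr/local/pgsql/wals %r'
At each restartpoint, pg_archivecleanup then removes every segment older than the last one the standby still needs (%r), so the directory stops growing without bound.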

There are two settings in the postgresql.conf file for this: wal_keep_segments and checkpoint_segments. I have mine set to wal_keep_segments = 128 and checkpoint_segments = 128, and replication works fine.
wal_keep_segments is the minimum number of WAL segments kept in the directory for standbys.
checkpoint_segments limits how many segments accumulate between checkpoints, which roughly caps the maximum.
I set both values on both servers.
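As a sketch, the relevant postgresql.conf lines would be (128 is the value from my setup; tune it to how far your standby is allowed to lag):
wal_keep_segments = 128      # keep at least 128 segments (~2 GB at 16 MB each) for standbys
checkpoint_segments = 128    # checkpoint after at most 128 new segments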

Related

Creating a copy of the database in PostgreSQL

I'm trying to create the first copy of my database. I'm using PostgreSQL on Ubuntu 16+ with Django.
I found this documentation to create a copy:
I'm trying to export the entire database to a file so that I can add it to another server. I tried this:
pg_dump app_prod > test_copy
pg_dump --host=localhost --username=app --dbname=app_prod --file=testdb.sql
After running ls I can see the dump files in my directory, but when I browse with e.g. WinSCP they are not visible.
How can I take these files, copy them to my Windows system, and upload them to another Ubuntu server?
I think it is enough to make them visible in WinSCP. How can I do this?
EDIT:
drwxr-xr-x 3 postgres postgres 4096 Oct 4 08:06 9.5
-rw-rw-r-- 1 postgres postgres 3578964 Jan 18 10:46 test_copy
-rw-rw-r-- 1 postgres postgres 0 Jan 18 10:54 testdb.sql
It seems like this was resolved in the comments: you were looking at the wrong folder in the WinSCP folder explorer.
There are a few items worth noting to bolster the good advice already given:
Your ls -l output indicates that the SQL file is zero bytes in size, so something has gone wrong there. If you manage to transfer it to your local machine, you will find it is empty.
Also, try not to store database dumps in /var/lib/postgresql - this is where your PostgreSQL database keeps live database files on the server, and you don't want to risk changing or deleting anything here. Use /home/maddie instead (change the username as appropriate).
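A minimal sketch of the whole transfer, assuming the dump is written to /home/maddie and the target server is reachable as user@otherserver (both names are illustrative):
pg_dump --host=localhost --username=app --dbname=app_prod --file=/home/maddie/testdb.sql
scp /home/maddie/testdb.sql user@otherserver:/tmp/    # or download from /home/maddie with WinSCP
psql --host=localhost --username=app --dbname=app_prod --file=/tmp/testdb.sql    # run on the target server, against an existing empty database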

Is it safe to delete archive_status log files in PostgreSQL 10?

I am not a DBA, but I run PostgreSQL 10 on a production server. I installed it from BigSQL and set up replication from the production server to another server. Replication itself is working, but the production server has run out of disk space: running du shows that the pg_wal folder holds 17 GB of files, each 16 MB in size.
After some Googling I changed my postgresql.conf file as follows:
wal_level = logical
archive_mode = on
archive_command = 'cp -i %p /etc/bigsql/data/pg10/pg_wal/archive_status/%f'
After these changes the directory pg_wal/archive_status held 16 GB of logs. So my question is: should I delete them manually, or will the system delete them automatically?
And if archive_mode is set to on, will the WAL files be removed automatically?
Thank you for your time.
This depends on how you do your backups and whether you'd ever need to restore the database to some point in time.
Only a full offline filesystem backup (offline meaning the database is shut down) or an online logical backup with pg_dumpall does not need those files for a restore.
You'd need those files to restore a filesystem backup created while the database is running; without them the backup will fail to restore. That said, some backup solutions (like Barman) copy the needed WAL files automatically.
You'd also need those files if your replica database ever falls behind the master for some reason, or if you need to restore the database to some past point in time.
These files compress well, though - typically to less than 10% of their original size - so you can write your archive_command to compress them automatically instead of just copying.
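For example, a compressing archive_command might look like this (a sketch only; /var/lib/pgsql/wal_archive is a hypothetical archive directory, ideally on a different disk than pg_wal, unlike the archive_status path used in the question):
archive_command = 'test ! -f /var/lib/pgsql/wal_archive/%f.gz && gzip < %p > /var/lib/pgsql/wal_archive/%f.gz'
The test guard follows the pattern from the PostgreSQL docs: it refuses to overwrite a segment that has already been archived.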
And you should delete them from the archive eventually. I'd recommend not deleting them until they are at least a month old and at least two full successful backups have been taken since they were created.
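An age-based cleanup along those lines could then run from cron (again a sketch, using the same hypothetical archive directory; make sure two full backups really cover that window before enabling it):
find /var/lib/pgsql/wal_archive -name '*.gz' -mtime +30 -delete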

How can I tell if barman is receiving the WAL stream during the day?

I followed the directions in this and this. I've also successfully backed up from one server and restored it to another server. My barman is on a dedicated machine. Looking good. But how can I tell if it's receiving the WAL stream during the day?
I can see the base backups in [barman-server]:/var/lib/barman
barman check mydb is reporting good things
[root@barman barman]# barman check mydb
Server mydb:
PostgreSQL: OK
is_superuser: OK
PostgreSQL streaming: OK
wal_level: OK
replication slot: OK
directories: OK
retention policy settings: OK
backup maximum age: OK (interval provided: 7 days, latest backup age: 24 minutes)
compression settings: OK
failed backups: OK (there are 0 failed backups)
minimum redundancy requirements: OK (have 3 backups, expected at least 0)
pg_basebackup: OK
pg_basebackup compatible: OK
pg_basebackup supports tablespaces mapping: OK
pg_receivexlog: OK
pg_receivexlog compatible: OK
receive-wal running: OK
archiver errors: OK
I have added a cron entry to run the barman backup mydb command (I believe this takes additional base backups):
[root@barman ~]# cat /etc/cron.d/do_backups
30 23 * * * /usr/bin/barman backup mydb
I share this user's opinion that this doesn't belong in a separate cron job -- it belongs in the /etc/barman.d/*.conf files as some kind of setting that says "take a base backup every X days" or some such, but that's not what this question is about.
How do I tell if this is receiving the WAL stream intra-day?
What do I look for to see some progress?
Is there a way to see the IP address or a database connection for this so I know for sure?
(I think I need a little education on WAL streams as well.) Are WAL streams something that the PG server "sends" to Barman, or are they "pulled" by a process on the Barman machine?
Barman uses the barman cron command to make sure WAL streaming actually works as expected.
You can see the related documentation here.
This command runs every minute and is added to your system cron if you installed Barman via the Debian/Fedora packages.
On Debian you can check it at /etc/cron.d/barman.
To get a sense of what the Barman cron job does, set log_level to DEBUG in /etc/barman.conf and watch the Barman log via tailf /var/log/barman/barman.log.
Every minute this command picks up new WAL files and archives them.
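If you want to watch the stream yourself intra-day, a few hedged checks (the /var/lib/barman path matches your setup; mydb is your server name):
ps aux | grep '[p]g_receivexlog'    # Barman's receive-wal process should be running
ls -lt /var/lib/barman/mydb/streaming | head    # new segments should keep appearing here
psql -c "select client_addr, state from pg_stat_replication;"    # run on the PG server; shows Barman's IP
The pg_stat_replication query also answers the push/pull question: Barman's pg_receivexlog connects to PostgreSQL as a streaming replication client and pulls the WAL, so it shows up there like any other standby.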

Moved a Postgres data file and Postgres started writing to a new file with the same name

I have a directory where postgres was writing to a file: 15426233.4
-rw------- 1 postgres sql 1.0G Feb 4 13:41 15426233
-rw------- 1 postgres sql 149M Feb 4 13:41 15426233.4
-rw------- 1 postgres sql 1.0G Feb 4 13:41 15426233.3
drwx------ 3 postgres sql 75K Feb 4 13:40 .
-rw------- 1 postgres sql 1.0G Feb 4 13:34 15426233.2
-rw------- 1 postgres sql 1.0G Feb 4 13:28 15426233.1
Initially this file, 15426233.4, was under /data5/PG_9.1/15411/15426233.4. Because the /data5 partition was filling up, I moved it to /data8. Normally we then create a symlink so that /data5/PG_9.1/15411/15426233.4 points to /data8/PG_9.1/15411/15426233.4, but due to the lack of disk space the symlink creation failed even though the move succeeded. A while later Postgres started writing to a new file at /data5/PG_9.1/15411/15426233.4. Can I stop the database, consolidate the data from /data5/PG_9.1/15411/15426233.4 into /data8/PG_9.1/15411/15426233.4, create a symlink from /data5/PG_9.1/15411/15426233.4 to /data8/PG_9.1/15411/15426233.4, and restart Postgres?
Never, ever, mess with individual files in the data directory. And never ever change catalog tables (pg_tablespace) manually. Everything you need to do can be done through SQL or by handling the complete data directory.
You have several ways to move data to a bigger disk:
Tablespaces
When you create a new tablespace you can put that on the other disk. Then move the tables in question to the new tablespace:
create tablespace bigdata location '/path/to/bigdisk';
Then move the tables to the tablespace:
alter table bigtable set tablespace bigdata;
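To confirm the move afterwards, you can check the catalog (bigtable and bigdata are the example names from above):
select tablespace from pg_tables where tablename = 'bigtable';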
There is no need to mess around with the data directory manually.
Move the data directory
Caution!
Only do this after stopping Postgres!
Once you have stopped the Postgres server, copy the complete(!) existing data directory to the new disk. It's safer to copy the data and delete the original only after verifying everything is OK, rather than moving it right away.
After copying the data, adjust postgresql.conf to point it to the new data directory: http://www.postgresql.org/docs/current/static/runtime-config-file-locations.html
Rename the old data to make sure you will see an error if the configuration wasn't changed properly.
Start Postgres again.
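A minimal sketch of those steps, assuming the old data directory is /data5/PG_9.1 and the new disk is mounted at /data8 (paths are illustrative, based on the question):
pg_ctl -D /data5/PG_9.1 stop
cp -a /data5/PG_9.1 /data8/PG_9.1    # copy the complete data directory, preserving ownership and permissions
mv /data5/PG_9.1 /data5/PG_9.1.old    # rename the old one so a stale configuration fails loudly
pg_ctl -D /data8/PG_9.1 start    # or update data_directory / your init script and start as usual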
The physical layout of the disk storage is documented in the manual:
http://www.postgresql.org/docs/current/static/storage.html

Restoring a MongoDB database fails using VMC tunneling

I have successfully published an app to CloudFoundry. When I try to seed the database using VMC tunneling and mongorestore, only part of the data is transferred; the restore process hangs partway into the collection. If I use mongorestore to restore the dump to my local Mongo instance, it works well.
$vmc tunnel energy mongorestore
Opening tunnel on port 10000... OK
Waiting for local tunnel to become available... OK
Directory or filename to restore from> ./dump/energy
connected to: localhost:10000
Wed Jan 16 09:22:25 ./dump/energy/twohourlyhistoryDatas.bson
Wed Jan 16 09:22:25 going into namespace [db.twohourlyhistoryDatas]
Wed Jan 16 09:22:27 warning: Restoring to db.twohourlyhistoryDatas without dropping.
Restored data will be inserted without raising errors; check your server log
795 objects found
Wed Jan 16 09:22:27 Creating index: { key: { _id: 1 }, ns: "db.twohourlyhistoryDatas", name: "_id_" }
I've left this for several hours and it hasn't finished. Using a network monitor I can see the data being transferred for 10-15 seconds, and then stopping suddenly. Turning on verbose mode for vmc hasn't given any failures. Running mongorestore directly with the same command and very verbose output also hasn't shed any light on the problem.
Apart from this, using CloudFoundry has been outstandingly easy. Any suggestions on where to look now to resolve the issue are welcome!
There are size limits on the database (for Mongo it's 240 MB), and there are also time limits on operations over the tunnel. How big is the database?
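A quick way to check your dump against that limit (the path matches the one used in the restore above):
du -sh ./dump/energy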