optimize (disk/memory) of dockerized postgres database containing test data - postgresql

Context
For local development and testing within a CI pipeline, I want a postgres docker image that contains some data sampled from production (a few tens of MBs). I will periodically rebuild this image to ensure the sampled data stays fresh.
I don't care at all about data integrity, but I care quite a bit about image size and container disk/memory usage when run. Startup time should be at most a couple of mins.
What I've built
I have a docker file that builds on top of one of the official postgres (postgis) docker images, but it actually initializes the database and uses pg_restore to insert my sample data.
Attempted optimising
I use a multistage build, just copying the postgres directory into the final image (this helps as I used node during the build).
I notice that the pg_xlog directory is quite large, and logically seems redundant here since I would happily checkpoint and ditch any WAL before sealing the image. I can't figure out how to get rid of it. I tried starting postgres with the following flags:
--min_wal_size=2 --max_wal_size=3 --archive_mode=off --wal_keep_segments
and running CHECKPOINT and waiting for a few seconds, but it doesn't seem to change anything. I also tried deleting the contents of the directory, but that seemed to break the database on its next startup.
Rather than put the actual database within the image, I could just leave a pg_dump file in the image and have the image entrypoint build the database from that. I think this would improve the image size (though I'm not clear why the database should take up much more space than the dump, unless indexes are especially big - I actually thought the dump format was less compact than the database itself, so this might offset the index size). This would obviously impact on startup time (but not prohibitively so).
Summary/Questions
Am I going about this the right way? If so, what kind of disk/memory optimizations can I use? In particular can I remove/shrink pg_xlog?
I'm using Postgres 9.5 and Postgis 2.X.

Was the server ever run with a larger max_wal_size than 3? If so, it could have "recycled" ahead a lot of wal files by renaming old ones for future use. Once those are renamed, they will never be removed until after they are used, even if max_wal_size is later reduced.
Regarding "I also tried deleting the contents of the directory, but that seemed to break the database on its next startup": you can fix that by using pg_resetxlog. Just don't get in the habit of running it blindly; it is very dangerous to run outside of a test environment.
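To check whether recycled segments are what is taking up the space, it can help to measure the WAL directory before and after changing settings. The following is a minimal sketch, not part of the original answer: the function name wal_usage is made up for illustration, and it relies only on the fact that WAL segment files have names consisting of exactly 24 hexadecimal characters.

```python
import os
import re

# WAL segment files are named with exactly 24 hexadecimal characters;
# other entries (for example the archive_status directory) are skipped.
WAL_SEGMENT_NAME = re.compile(r"^[0-9A-F]{24}$")


def wal_usage(wal_dir):
    """Return (segment_count, total_bytes) for the WAL segments in wal_dir.

    wal_usage is a hypothetical helper written for this example.
    It only reads directory metadata and never modifies anything.
    """
    count = 0
    total = 0
    for entry in os.scandir(wal_dir):
        if entry.is_file() and WAL_SEGMENT_NAME.match(entry.name):
            count += 1
            total += entry.stat().st_size
    return count, total
```

Called as wal_usage("/var/lib/postgresql/data/pg_xlog") inside the container (that path assumes the default data directory of the official image and the pre-10 directory name), it reports how many segments exist and how many bytes they occupy.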

Related

Are there any negative performance or functionality downsides to using pg_upgrade with --link option afterwards?

I'm about to upgrade a quite large PostgreSQL cluster from 9.3 to 11.
The upgrade
The cluster is approximately 1.2 TB in size. The server has 64 cores, 192 GB of RAM, and a disk system consisting of a fast HW RAID 10 array of 8 DC-edition SSDs. I am performing the upgrade by first replicating the data to a new server with streaming replication, then upgrading that server to 11.
I tested the upgrade using pg_upgrade with the --link option; this takes less than a minute. I also tested the regular upgrade (without --link) with many jobs; that takes several hours (4+).
Questions
Now the obvious choice is of course for me to use the --link option. However, all this makes me wonder: are there any downsides (performance- or functionality-wise) to using it over the regular, slower method? I do not know the internal workings of PostgreSQL data structures, but I have a feeling there could be a performance difference after the upgrade between rewriting the data entirely and just using hard links, whatever that means.
Considerations
The only drawback of --link that I can find in the documentation is that the old data directory can no longer be accessed after the upgrade is performed (https://www.postgresql.org/docs/11/pgupgrade.htm). However, that is only a safety concern, not a performance drawback, and it doesn't really apply in my case since I replicate the data first.
The only other thing I can think of is reclaiming space, with whatever performance upsides that might have. However, as I understand it, that can also be achieved by running a database-wide VACUUM FULL (or CLUSTER?) after the upgrade with --link has completed. Also, reclaiming space does not have much performance impact on an SSD, as I understand it.
I would appreciate it if anyone could shed some light on this.
There is absolutely no downside to using hard links (with the exception you noted, that the old cluster is dead and has to be removed).
A hard link is in no way different from a normal file.
A “file” in UNIX is in reality an “inode”, a structure containing file metadata. An entry in a directory is a (hard) link to that inode.
If you create another hard link to the inode, the same file will be in two different directories, but that has no impact whatsoever on the behavior of the file.
Of course you must make sure that you don't start both the old and the new server. Instant data corruption would ensue. That's why you should remove the old cluster as soon as possible.
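The inode explanation above can be checked with a few lines of Python. This is only an illustrative sketch using a temporary directory; the file names and the function hard_link_demo are made up and have nothing to do with what pg_upgrade actually names its files.

```python
import os
import tempfile


def hard_link_demo():
    """Create a file plus a hard link to it and report what the OS sees.

    Returns (same_inode, link_count, surviving_content).
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical names standing in for a file in the old and new cluster.
        old_name = os.path.join(tmp, "old_cluster_file")
        new_name = os.path.join(tmp, "new_cluster_file")
        with open(old_name, "w") as f:
            f.write("table data")

        # A second directory entry pointing at the same inode.
        os.link(old_name, new_name)
        same_inode = os.stat(old_name).st_ino == os.stat(new_name).st_ino
        link_count = os.stat(new_name).st_nlink

        # Removing one name leaves the data reachable through the other.
        os.remove(old_name)
        with open(new_name) as f:
            surviving_content = f.read()
        return same_inode, link_count, surviving_content
```

Both names resolve to the same inode, the link count is 2 while both exist, and deleting the "old" name does not touch the data, which is why removing the old cluster after a --link upgrade is safe for the new one.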

Safe way to backup PostgreSQL when using Persistent Disk

I’m trying to set up daily backups (using Persistent Disk snapshots) for a PostgreSQL instance I’m running on Google Compute Engine and whose data directory lives on a Persistent Disk.
Now, according to the Persistent Disk Backups blog post, I should:
stop my application (PostgreSQL)
fsfreeze my file system to prevent further modifications and flush pending blocks to disk
take a Persistent Disk snapshot
unfreeze my filesystem
start my application (PostgreSQL)
This obviously brings with it some downtime (each of the steps took from seconds to minutes in my tests) that I’d like to avoid or at least minimize.
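For reference, the five steps from the blog post could be scripted roughly as below. This is a sketch under several assumptions that are not in the original post: PostgreSQL is managed by systemd under the service name "postgresql", the data disk is mounted at the given mount point, and the gcloud CLI is installed and authenticated. The function name run_frozen_snapshot and the injectable run parameter are made up for illustration.

```python
import subprocess


def run_frozen_snapshot(disk, snapshot_name, zone, mount_point,
                        service="postgresql", run=subprocess.check_call):
    """Stop the database, freeze the filesystem, snapshot, then undo both.

    The try/finally blocks make sure the filesystem is unfrozen and the
    service restarted even if the snapshot command fails.
    `service` and the use of systemctl are assumptions about the host.
    """
    run(["systemctl", "stop", service])
    try:
        run(["fsfreeze", "-f", mount_point])
        try:
            run(["gcloud", "compute", "disks", "snapshot", disk,
                 "--snapshot-names", snapshot_name, "--zone", zone])
        finally:
            run(["fsfreeze", "-u", mount_point])
    finally:
        run(["systemctl", "start", service])
```

Passing a different callable as run (for example one that only logs the commands) allows a dry run without touching the system.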
The steps of the blog post are labeled as necessary to ensure the snapshot is consistent (I’m assuming on the filesystem level), but I’m not interested in a clean filesystem, I’m interested in being able to restore all the data that’s in my PostgreSQL instance from such a snapshot.
PostgreSQL uses fsync when committing, so all data which PostgreSQL acknowledges as committed has made its way to the disk already (fsync goes to the disk).
For the purpose of this discussion, I think it makes sense to compare a Persistent Disk snapshot without stopping PostgreSQL and without using fsfreeze with a filesystem on a disk that has just experienced an unexpected power outage.
After reading https://wiki.postgresql.org/wiki/Corruption and http://www.postgresql.org/docs/current/static/wal-reliability.html, my understanding is that all committed data should survive an unexpected power outage.
My questions are:
1) Is my comparison with an unexpected power outage accurate or am I missing anything?
2) Can I take snapshots without stopping PostgreSQL and without using fsfreeze or am I missing some side-effect?
3) If the answer to the above is that I shouldn’t just take a snapshot, would it be idiomatic to create another Persistent Disk, periodically use pg_dumpall(1) to dump the entire database and then snapshot that other Persistent Disk?
1) Yes, though it should be even safer to take a snapshot. The fsfreeze stuff is really to be 100% safe (anecdotally: I never use fsfreeze on my PDs and have not run into issues)
2) Yes, but there is no 100% guarantee that it will always work (paranoid solution: take a snapshot, spin up a temp VM with that snapshot, check the disk is ok, and delete the VM. This can be automated)
3) No, I would not recommend this over snapshots. It will take a lot more time, might degrade your DB performance, and what happens if something happens in the middle of a dump? Also, PDs are very expensive for incremental backups. Snapshots are diffed, so you don't have to pay for the whole disk every copy (just the first one), only the changes.
Possible recommendation:
Do #3, but then create a snapshot of the new PD and then delete the PD.
https://cloud.google.com/compute/docs/disks/persistent-disks#creating_snapshots has recently been updated and now includes this new paragraph:
If you skip this step, only data which was successfully flushed to disk by the application will be included in the snapshot. The application experiences this scenario as if it was a sudden power outage.
So the answers to my original questions are:
1) Yes
2) Yes
3) N/A, since the answer to ② is Yes.

Mongodb normal exit before applying a write lock

I am using Python, Scrapy, and MongoDB for my web scraping project. I used to scrape 40 GB of data daily. Is there a setting in the mongodb.conf file that makes MongoDB exit normally before it applies a write lock on the db due to a disk-full error?
Every time I hit this disk-full error in MongoDB, I have to manually re-install MongoDB to remove the write lock from the db. I can't run the repair and compact commands on the database because running them also requires free space.
MongoDB doesn't handle disk-full errors very well in certain cases, but you do not have to uninstall and then re-install MongoDB to remove the lock file. Instead, you can just remove the mongod.lock file. As long as you have journalling enabled, your data should be good. Of course, at that moment, you can't add more data to the MongoDB databases.
You probably wouldn't need repair, and compact only helps if you have actually deleted data from MongoDB, because compact does not compress data.
Constantly adding and then later deleting data can cause fragmentation and leave lots of disk space unused. You can mostly prevent that by using the usePowerOf2Sizes option that you can set on collections. compact mitigates this as well by rewriting the database files, but as you said, you need free disk space for this. I would advise you to also add some monitoring that warns you when your data size reaches 50% of your full disk space. At that point, there is still plenty of time to use compact to reclaim unused space.
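The monitoring suggestion could be approximated with a small check like the one below. It is a sketch only: it measures how full the filesystem holding a given path is, which is a stand-in for "data size relative to disk space" and assumes the MongoDB data directory has the disk mostly to itself. The function names and the idea of running it from cron are assumptions made for illustration.

```python
import shutil


def disk_usage_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def should_warn(path, threshold=0.5):
    """Return True once the filesystem is at least `threshold` full.

    The default of 0.5 mirrors the 50% figure suggested above.
    """
    return disk_usage_fraction(path) >= threshold
```

A scheduled job could call should_warn on the MongoDB data directory and send a notification when it returns True, leaving enough free space to still run compact.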

postgresql: Accidentally deleted pg_filenode.map

Is there any way to recover or re-create a pg_filenode.map file that was accidentally deleted? Or is there any solution for fixing this issue without affecting the database? Any suggestions to fix this issue are highly appreciated! The postgres version that we have is 9.0 running on Red Hat Linux 5. Thanks!
STOP TRYING TO FIX ANYTHING RIGHT NOW. Everything you do risks making it worse.
Treat this as critical database corruption. Read and act on this wiki article.
Only once you have followed its advice should you even consider attempting repair or recovery.
Since you may have some hope of recovering the deleted file if it hasn't been overwritten yet, you should also STOP THE ENTIRE SERVER MACHINE or unmount the file system PostgreSQL is on and disk image it.
If this data is important to you I advise you to contact professional support. This will cost you, but is probably your best chance of getting your data back after a severe administrator mistake like this. See PostgreSQL professional support. (Disclaimer: I work for one of the listed companies as shown in my SO profile).
It's possible you could reconstruct pg_filenode.map by hand using information about the table structure and contents extracted from the on-disk tables. Probably a big job, though.
First, if this is urgent and valuable, I strongly recommend contacting professional support initially. However, if you can work on a disk image and it is not time critical, here are important points to note and how to proceed (we recently had to recover a bad pg_filenode.map). Moreover, you are better off working on a disk image of a disk image.
What follows is what I learned from having to recover a damaged file due to an incomplete write on the containing directory. It is current to PostgreSQL 10, but that could change at any time.
Before you begin
Data recovery is risky business. Always note what recovery means to your organization, what data loss is tolerable, what downtime is tolerable etc before you begin. Work on a copy of a copy if you can. If anything doesn't seem right, circle back, evaluate what went wrong and make sure you understand why before proceeding.
What this file is and what it does
The standard file node map for PostgreSQL is stored in the pg_class relation which is referenced by object id inside the Pg catalogs. Unfortunately you need a way to bootstrap the mappings of the system tables so you can look up this sort of information.
In most deployments this file will never be written. It can be copied from a new initdb on the same version of Postgres with the same options passed to initdb aside from data directory. However this is not guaranteed.
Several things can change this mapping. If you do a vacuum full or similar on system catalogs, this can change the mapping from the default and then copying in a fresh file from an initdb will not help.
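Before trying anything, it may be useful to establish exactly which map files are missing, since a cluster has one pg_filenode.map in global/ and one in each database directory under base/. The read-only sketch below (run it against a copy, as advised above) lists the directories that lack the file. The function name is made up, and it only looks at the default layout, not at directories under pg_tblspc.

```python
import os


def dirs_missing_filenode_map(data_dir):
    """List directories under data_dir that should hold pg_filenode.map but don't.

    Checks global/ and every numerically named database directory in base/.
    Purely read-only: nothing in data_dir is created or changed.
    """
    candidates = [os.path.join(data_dir, "global")]
    base = os.path.join(data_dir, "base")
    if os.path.isdir(base):
        for name in sorted(os.listdir(base)):
            path = os.path.join(base, name)
            # Database directories are named after the database OID.
            if name.isdigit() and os.path.isdir(path):
                candidates.append(path)
    return [
        d for d in candidates
        if os.path.isdir(d)
        and not os.path.isfile(os.path.join(d, "pg_filenode.map"))
    ]
```

The result tells you whether the shared map, a per-database map, or both need attention, which narrows down which file from a fresh initdb would even be a candidate replacement.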
Some Things to Try
The first thing to try (on a copy of a copy!) is to replace the file with one from a fresh initdb onto another filesystem from the same server (this could be a thumb drive or whatever). This may work. It may not work.
If that fails, then it would be possible perhaps to use pg_filedump and custom scripting/C programming to create a new file based on efforts to look through the data of each relation file in the data directory. This would be significant work as Craig notes above.
If you get it to work
Take a fresh pg_dump of your database and restore it into a fresh initdb. This way you know everything is consistent and complete.

MongoDB: mongodump/restore vs. backing up files directly

I'm wondering about experiences people have had with MongoDB backups. Assuming a filesystem snapshot is not an option, what have your experiences been with mongodump/restore versus doing a write lock and backing up the files? Have you run into any bugs with one method that caused you to switch?
From the reading I've done so far, it seems like mongodump/restore has the advantage of being able to run it while the server is live, but I'm not sure how well it will scale.
Locking and copying files is only an option when you don't have heavy write load.
mongodump can be run against a live server. It will create some additional load, so don't do it during peak hours. Also, it is advised to do it on a secondary node (if you don't use replica sets, you should).
There are some complications when you have a DB so large that no single machine can hold it. See this document.
Also, if you have a replica set, you can take down one of the secondaries and copy its files directly. See http://www.mongodb.org/display/DOCS/Backups:
A simple approach is just to stop the database, back up the data files, and resume. This is safe but of course requires downtime. This can be done on a secondary without requiring downtime, but you must ensure your oplog is large enough to cover the time the secondary is unavailable so that it can catch up again when you restart it.
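The oplog requirement in the quoted passage comes down to simple arithmetic: the oplog must hold more hours of writes than the secondary will be offline. Below is a rough sketch of that check. The function names, the inputs (oplog size and average oplog write rate, which you would have to measure yourself), and the safety factor of 2 are all assumptions made for illustration, not values from the MongoDB documentation.

```python
def oplog_window_hours(oplog_size_mb, oplog_write_rate_mb_per_hour):
    """Estimate how many hours of writes the oplog can hold."""
    if oplog_write_rate_mb_per_hour <= 0:
        raise ValueError("write rate must be positive")
    return oplog_size_mb / oplog_write_rate_mb_per_hour


def secondary_can_catch_up(oplog_size_mb, oplog_write_rate_mb_per_hour,
                           downtime_hours, safety_factor=2.0):
    """Return True if the oplog window covers the downtime with some margin.

    safety_factor is an arbitrary cushion for write bursts during the backup.
    """
    window = oplog_window_hours(oplog_size_mb, oplog_write_rate_mb_per_hour)
    return window >= downtime_hours * safety_factor
```

For example, a 1024 MB oplog filling at 256 MB per hour gives a 4 hour window, enough for a 1.5 hour file copy with the default margin but not for a 3 hour one.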