How to recover a PostgreSQL 13.5 database without a backup file on an Ubuntu 20.04 server?

I had 10-15 microservice databases running in production on an Ubuntu server. I accidentally deleted everything in the /var/lib/postgresql/** folder with the command sudo rm -r *. I think PGDATA is inside the /var/lib/postgresql/13/ folder.
I tried TestDisk to restore this folder, but it showed everything as deleted except the 13/ folder.
I only have backup files from a long time ago.
Is there any way to restore the latest data?

If you don't have a backup of the deleted files and TestDisk was not able to recover them, you may want to try another data recovery tool such as extundelete or PhotoRec. These tools scan the partition for data that is no longer referenced by the file system, which can include deleted files.
The chances of recovery drop as time passes and the partition sees more activity, because every new write can overwrite the deleted data and make it unrecoverable. Stop writing to that partition and attempt the recovery as soon as possible after the deletion.
If you are unable to recover the deleted files using these tools, you may want to consider seeking the assistance of a professional data recovery service. These services typically have specialized equipment and expertise that can be used to recover data from damaged or formatted disks. However, these services can be expensive, so it's important to weigh the value of the data against the cost of recovery.

Related

Difference between incremental backup and WAL archiving with pgBackRest

As far as I understand:
WAL archiving pushes the WAL files to a storage location as they are generated.
An incremental backup pushes all the WAL files created since the last backup.
So, assuming my WAL archiving is set up correctly:
Why would I need incremental backups?
Shouldn't the cost of incremental backups be almost zero?
Most of the documentation I found focuses on the high-level implementation (e.g. how to set up WAL archiving or incremental backups) rather than the internals (what happens when I trigger an incremental backup).
My question can probably be answered with a link to some documentation, but my google-fu has failed me so far.
Backups are not copies of the WAL files, they're copies of the cluster's whole data directory. As it says in the docs, an incremental backup contains:
those database cluster files that have changed since the last backup (which can be another incremental backup, a differential backup, or a full backup)
WALs alone aren't enough to restore a database; they only record changes to the cluster files, so they require a backup as a starting point.
The need for periodic backups (incremental or otherwise) has primarily to do with recovery time. Technically, you could just hold on to your original full backup plus years' worth of WAL files, but replaying them all in the event of a failure could take hours or days, and you likely can't tolerate that kind of downtime.
A new backup also means that you can safely discard any older WALs (assuming you don't still need them for point-in-time recovery), meaning less data to store, and less data whose integrity you're relying on in order to recover.
If you want to know more about what pgBackRest is actually doing under the hood, it's all covered pretty thoroughly in the Postgres docs.

How to fix this ERROR: could not open file "pg_wal/00000003.history", or what *not* to do when PostgreSQL runs out of disk space

If you ever run out of space in your pg_wal directory and you can't grow it (or you have to switch to a standby because the primary went down with a full pg_wal directory), then when you move WAL files to a different location, make sure not to move pg_wal/00000003.history, as it is a required file when starting pg_basebackup for streaming replication. If you have already moved the file to a different location, bring pg_wal/00000003.history back into the pg_wal directory.
If you have actually deleted the file, you could try creating a new one to see whether that works. I have not tried this myself, so you may be the first or second to try :).
I faced this issue in my production environment and resolved it by copying pg_wal/00000003.history back from where I had moved it, after which my streaming replication ran without errors.
Never, ever, manually mess with the contents of pg_wal. In v10, we went to some effort to rename what was formerly known as pg_xlog to pg_wal, precisely to discourage misguided people who thought that these were "just log files" from deleting files there.
If you run out of space, there is one thing you can do: move the entire pg_wal to a different file system where you have space and put a symbolic link to the new location into the PostgreSQL data directory.
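As a rough illustration of that move-and-symlink step, here is a minimal Python sketch. The helper name relocate_pg_wal and its arguments are made up for this example, and it assumes the PostgreSQL server is fully stopped and that the script runs as the user that owns the data directory (a copy across file systems does not preserve ownership).

```python
import os
import shutil
from pathlib import Path


def relocate_pg_wal(data_dir, new_parent):
    """Move <data_dir>/pg_wal to <new_parent>/pg_wal and leave a symlink behind.

    Assumption: the PostgreSQL server is stopped while this runs.
    Hypothetical helper for illustration, not part of PostgreSQL.
    """
    old = Path(data_dir) / "pg_wal"
    new = Path(new_parent) / "pg_wal"
    if old.is_symlink():
        raise RuntimeError(f"{old} is already a symbolic link")
    if not old.is_dir():
        raise RuntimeError(f"{old} is not a directory")
    if new.exists():
        raise RuntimeError(f"{new} already exists")
    # Moves the whole directory; falls back to copy + delete across file systems.
    shutil.move(str(old), str(new))
    # Put a symbolic link at the old location so the server finds its WAL again.
    os.symlink(new.resolve(), old, target_is_directory=True)
    return new
```

Nothing inside pg_wal is touched individually: the directory is moved as a whole, which is the point of the advice above.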

How to configure WAL archiving for a cluster that *only* hosts dev or test databases?

I've got a dev and test database for a project, i.e. databases that I use to either run my project or run tests, locally. They're both in the same cluster ('instance' – I come from Redmond).
Note that my local cluster is different than the cluster that hosts the production database.
How should I configure those databases with respect to archiving the WAL files?
I'd like to be able to 'build' or 'rebuild' either of those databases by restoring from a base backup and running seed data scripts.
But how should I configure the databases or the cluster for archiving WAL files? I understand that I need them if I want to recover the database. I think that's unlikely (I didn't even know about WAL or its files, or that they are presumably shared by all of the databases in the same cluster, which seems weird and scary coming from Microsoft SQL Server).
In the event that I rebuild one of the databases, I should delete the WAL files since the base backup – how can I do that?
But I also don't want to have to worry about the size of the WAL files growing indefinitely. I don't want to be forced to rebuild just to save space. What can I do to prevent this?
My local cluster only contains a single dev and test database for my project, i.e. losing data from one of these databases is (or should be) no big deal. Even having to recreate the cluster itself, and the two databases, is fine and not an issue if it's even just easier than otherwise to restore the two databases to a 'working' condition for local development and testing.
In other words, I don't care about the data in either database. I will ensure – separate from WAL archiving – that I can restore either database to a state sufficient for my needs.
Also, I'd like to document (e.g. in code) how to configure my local cluster and the two databases so that other developers for the same project can use the same setup for their local clusters. These clusters are all distinct from the cluster that hosts the production database.
Rather than trying to manage your WAL files manually, it's generally recommended that you let a third-party tool take care of that for you. There are several options, but pgBackRest is the most popular of the open-source offerings out there.
Each database instance writes its own WAL stream, chopped into segments of 16 MB by default.
Most other relational databases do much the same thing, including Microsoft SQL Server (the differences are in the names and organization of these files).
The WAL contains the physical information required to replay transactions. Imagine it as information like: "in file x, block 2734, change 24 bytes at offset 543 as follows: ..."
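To make that picture concrete, here is a toy Python model of replay. It is only a sketch: ToyRecord and replay are invented for this illustration, real WAL records carry far more (LSNs, checksums, full-page images, transaction information), and the only real-world number in it is PostgreSQL's default 8192-byte block size.

```python
from dataclasses import dataclass

BLOCK_SIZE = 8192  # PostgreSQL's default block (page) size


@dataclass(frozen=True)
class ToyRecord:
    """Hypothetical, heavily simplified stand-in for one WAL record."""
    file: str      # which data file to change
    block: int     # block number inside that file
    offset: int    # byte offset inside the block
    data: bytes    # the new bytes to write there


def replay(files, records):
    """Apply the records in order to in-memory 'files' (dict of bytearrays)."""
    for rec in records:
        if rec.offset + len(rec.data) > BLOCK_SIZE:
            raise ValueError("change does not fit inside one block")
        buf = files.setdefault(rec.file, bytearray())
        needed = (rec.block + 1) * BLOCK_SIZE
        if len(buf) < needed:
            # Extend the toy file with zeroed blocks if it is too short.
            buf.extend(bytes(needed - len(buf)))
        start = rec.block * BLOCK_SIZE + rec.offset
        buf[start:start + len(rec.data)] = rec.data
```

Replaying ToyRecord("x", 2734, 543, b"A" * 24) is the toy equivalent of "in file x, block 2734, change 24 bytes at offset 543". It also shows why WAL alone is not enough: the records only make sense when applied on top of a known starting copy of the files.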
With a base backup and this information you can restore any given point in time in the life of the database since the end of the base backup.
Each PostgreSQL cluster writes its own "WAL stream". The files are named with long weird hexadecimal numbers that never repeat, so there is no danger that a later WAL segment of a cluster can conflict with an earlier WAL segment of the same cluster.
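Those names are less weird than they look: 24 hexadecimal digits, made of an 8-digit timeline ID followed by two 8-digit fields that together give the segment's position in the stream. A small Python sketch of the decoding, assuming the default 16 MB segment size (parse_wal_name is a name made up for this example):

```python
import re

_WAL_NAME = re.compile(r"[0-9A-F]{24}")

# With 16 MB segments, each 4 GB "log" unit holds 256 segments.
SEGMENTS_PER_LOG = 0x100000000 // (16 * 1024 * 1024)


def parse_wal_name(name):
    """Return (timeline, segment_number) for a 24-hex-digit WAL file name."""
    if not _WAL_NAME.fullmatch(name):
        raise ValueError(f"not a WAL segment name: {name!r}")
    timeline = int(name[0:8], 16)
    log = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    return timeline, log * SEGMENTS_PER_LOG + seg
```

For example, parse_wal_name("000000010000000000000088") gives (1, 136): timeline 1, segment number 0x88. Since the segment number only ever grows within a timeline, names within one cluster do not repeat, but two different clusters will happily produce the same names.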
You have to make sure that WAL is archived to a different machine, otherwise the exercise is pretty useless. If you have several clusters on the same machine, make sure that you archive them to different directories (or locations in general), because the names of the WAL segments of different clusters will collide.
About retention: You want to keep around your backups for some time. Once you get rid of a base backup, you can also get rid of all WAL segments from before that base backup. There is the pg_archivecleanup executable that can help you get rid of all archived WAL segments older than a given base backup.
"I'd like to be able to 'build' or 'rebuild' either of those databases by restoring from a base backup and running seed data scripts."
Where is the base backup coming from? If you are restoring the PROD base backup and running the seed scripts over it, then you don't need WAL archiving at all on test/dev. But then what you get will be a clone of PROD, which means it will not have different databases for test and for dev in the same instance, since (presumably) PROD doesn't have that.
If the base backup is coming from someplace else, you will have to describe what it is. That will dictate your WAL needs.
Trying to run one instance with both test and dev on it seems like a false economy to me. Just run two instances.
Setting archive_mode=off will entirely disable WAL archiving. There will still be "live" WAL files in the pg_wal (or, before v10, pg_xlog) directory, but these get removed or recycled automatically after each checkpoint, so you should not need to manage them, other than by controlling how often checkpoints take place (and making sure you don't have any replication slots hanging around). The WAL archive and the live WAL files are different things. The live WAL files are mandatory and are needed to automatically recover from something like a power failure. The WAL archive may be needed to manually recover from a hard-drive crash or the total destruction of your server, and probably isn't needed at all on dev/test.

Can't find wal backup files

I have two servers, a master and a replica, that work together in asynchronous replication mode; the replica seems to be working fine, since any change on the master is mirrored on the replica right away. There is also an archive process that copies the WAL files from the master to another file system to keep them safe. My question is: which WAL files can I delete by means of pg_archivecleanup? I guess that I need to look for WAL files with a .backup extension on the master, and then I could delete the WAL files older than the last backup.
For instance, if I have these files in the master server
000000010000000000000089
000000010000000000000088.00000028.backup
000000010000000000000088
000000010000000000000087
000000010000000000000086
...
I conclude that it is safe to delete 000000010000000000000088 and older files, and keep the newer ones.
The problem is that I can't find .backup files anywhere: neither on the master, nor on the replica, nor in the archive location.
The *.backup files are created if you perform an online backup with pg_basebackup or similar and you are archiving WAL files using the archive_command.
In that case you can use pg_archivecleanup with such a file as argument to automatically remove all WAL archives that are older than that backup.
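To make the rule concrete, here is a small Python sketch (not the real tool) of which files would be removed for a given .backup file. It assumes, as far as I understand pg_archivecleanup's behaviour, that every segment sorting before the named one is removed and that the 8-digit timeline prefix is ignored in the comparison. removable_segments is a hypothetical helper; it only lists names and deletes nothing.

```python
import re

_SEGMENT = re.compile(r"[0-9A-F]{24}")


def removable_segments(archive_files, oldest_kept):
    """List archived WAL segments that logically precede oldest_kept.

    oldest_kept may be a plain segment name or a .backup file name;
    only its first 24 characters (the segment name) are used.
    """
    cutoff = oldest_kept[:24]
    if not _SEGMENT.fullmatch(cutoff):
        raise ValueError(f"not a WAL segment name: {cutoff!r}")
    return sorted(
        name for name in archive_files
        # Compare everything after the timeline prefix (assumption, see above).
        if _SEGMENT.fullmatch(name) and name[8:] < cutoff[8:]
    )
```

With the file list from the question and 000000010000000000000088.00000028.backup as the argument, this returns only the segments ending in 86 and 87. Segment 000000010000000000000088 itself is kept, because the backup starts inside it.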
You either use a different backup method (pg_dump?), or you are not using archive_command.
With pg_dump you cannot do archive recovery, so you wouldn't need to archive WALs at all. If you are using a different archive method like pg_receivewal, you won't get the *.backup files and you have to think of a different method to remove your old WAL archives.
One simple method to purge your old WAL archives is to simply remove all those that are older than your retention time.
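A minimal Python sketch of such an age-based purge, under the assumption that a file's modification time reflects when it was archived. The function name, its parameters, and the dry-run default are choices made for this example; the retention period must of course be longer than the age of the oldest base backup you still want to restore from.

```python
import time
from pathlib import Path


def purge_older_than(archive_dir, retention_days, now=None, dry_run=True):
    """Remove files in archive_dir whose mtime is older than retention_days.

    Returns the sorted names of the files that were (or, with dry_run=True,
    would be) removed. Subdirectories are left alone.
    """
    reference = time.time() if now is None else now
    cutoff = reference - retention_days * 86400
    removed = []
    for path in sorted(Path(archive_dir).iterdir()):
        if path.is_file() and path.stat().st_mtime < cutoff:
            if not dry_run:
                path.unlink()
            removed.append(path.name)
    return removed
```

Running it first with dry_run=True and reading the returned list is a cheap way to check that nothing still needed is about to go.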
The files are still being generated and archived (unless you turned that off) on the master. They are also passed by streaming to the replica, where they are kept in pg_wal, but the replica automatically cleans them up at every restartpoint. You can get the replica to keep them permanently by setting archive_mode=always on the replica, but it sounds like you don't want that.
If the only purpose of the archive (by the master) is to save files for use by the replica in case it falls too far behind for streaming (not for disaster recovery or PITR), then you can use pg_archivecleanup to clean them up automatically. It is invoked on the replica (not the master), typically through archive_cleanup_command, but it must have write access to the archive directory. So you can either mount the archive as a network file share, or wrap pg_archivecleanup in ssh so that it runs on the master rather than the replica.

Data Recovery: PostgreSQL showing base volume under postgres pg_default tablespace, but does not recognize separate databases

I had an instance of Postgres (v 9.2) running locally on Windows 7. I have yet to isolate the cause, but PG became corrupted in such a way that the server abruptly stopped, and the service would shut down immediately when I attempted to restart it. I reinstalled 9.2, and that fixed the problem with the service not starting. However, pgAdmin now does not show any of the databases that were there previously (yet the files are still there in the data\base directory). Oddly, the size of the pg_default tablespace shows 11GB, the correct size, but it does not show any of the databases or objects under the dependencies. The backups I have are a few days old, so I would like to restore the databases directly from the files. How do I get PG to recognize the database files that are in the data\base directory?
In general, every data recovery job is unique. You aren't going to find a simple answer, and these require a lot of hands-on troubleshooting. If you are going to do this yourself, I have some pointers below for getting started. If the data is important, hire an expert (2ndQuadrant, PgExperts, etc).
A few general rules:
Work on a copy of the files (i.e. back up your data directory and all tablespaces, and work on that, on another computer). Better yet, create a validated copy and work on a copy of that.
After having made and verified copies (ideally with hashes of data), run hardware diagnostics on the corrupted system to see what went wrong.
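One way to do the "verified copies" part is to hash every file in the original and in the copy and compare the results. A minimal Python sketch, with helper names invented for this example and SHA-256 picked as an arbitrary but reasonable hash:

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MB chunks so large relation files fit in memory.
                for chunk in iter(lambda: f.read(1024 * 1024), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def copies_match(original, copy):
    """True if both trees contain the same files with the same contents."""
    return tree_digests(original) == tree_digests(copy)
```

Run this with the server stopped; otherwise files can change between the copy and the check. Keeping the digest map of the original around also lets you confirm later that your working copy has not drifted from it.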
Now, to get started, you will probably want to look over the PostgreSQL architecture docs and the source code relating to on-disk layout. You will probably need a hex editor. You will certainly want to look at the system tables to see why the relations are not showing up. If you don't have a good understanding of memory and disk alignment issues on your platform, you need to brush up on that as well.