Data Recovery: PostgreSQL showing base volume under postgres pg_default tablespace, but does not recognize separate databases - postgresql

I had an instance of Postgres (v 9.2) running locally on Windows 7. I have yet to isolate the cause, but PG became corrupted in such a way that the server abruptly stopped, and the service would shut down immediately when I attempted to restart it. I reinstalled 9.2, and that fixed the problem with the service not starting. However, pgAdmin now does not show any of the databases that were there previously (yet the files are still there in the data\base directory). Oddly, the pg_default tablespace shows 11GB, the correct size, but none of the databases or objects appear under its dependencies. The backups I have are a few days old, so I would like to restore the databases directly from the files. How do I get PG to recognize the database files that are in the data/base directory?

In general, every data recovery job is unique. You aren't going to find a simple answer, and these require a lot of hands-on troubleshooting. If you are going to do this yourself, I have some pointers below for getting started. If the data is important, hire an expert (2ndQuadrant, PgExperts, etc).
A few general rules:
Work on a copy of the files (i.e. back up your data directory and all tablespaces, and work on that, on another computer). Better yet, create a validated copy and work on a copy of that.
After having made and verified copies (ideally with file hashes of the data; see the sketch below), run hardware diagnostics on the corrupted system to see what went wrong.
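A minimal sketch of making a hash-verified copy, assuming a Unix-like environment and example paths (on Windows you would use the equivalent copy and checksum tools):

    # copy the whole data directory, then hash both trees and compare
    rsync -a /var/lib/postgresql/9.2/main/ /backup/pgdata-copy/
    (cd /var/lib/postgresql/9.2/main && find . -type f -exec sha256sum {} + | sort -k2) > /tmp/orig.sha256
    (cd /backup/pgdata-copy && find . -type f -exec sha256sum {} + | sort -k2) > /tmp/copy.sha256
    diff /tmp/orig.sha256 /tmp/copy.sha256 && echo "copy verified"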
Now to get started, you will probably want to look over the PostgreSQL architecture docs and the source code relating to the on-disk layout. You will probably need a hex editor. You will certainly want to look at the system tables to see why the relations are not showing up. If you don't have a good understanding of memory and disk alignment issues on your platform, you will need to brush up on that as well.
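For example, a quick way to see the mismatch between the catalogs and the disk (a sketch; the psql user is an assumption):

    # Each subdirectory of data/base is named after a database OID.
    # If pg_database has no row matching a subdirectory, the reinstalled
    # catalog simply does not know about those old files.
    psql -U postgres -c "SELECT oid, datname FROM pg_database ORDER BY oid;"
    dir data\base        # or on a Unix-like system: ls data/base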

Related

How to recover PostgreSQL 13.5 database without backup file in Ubuntu 20.04 server?

I had 10-15 microservice databases running in production on an Ubuntu server. I accidentally deleted everything in the /var/lib/postgresql/** folder with the command sudo rm -r *. I think PGDATA is inside the /var/lib/postgresql/13/ folder.
I tried TestDisk to restore this folder, but it showed everything deleted except the 13/ folder.
I only have backup files from a long time ago.
Is there any way to restore the most recent data?
If you don't have a backup of the deleted files and testdisk was not able to recover them, you may want to try using another data recovery tool such as extundelete or photorec. These tools work by scanning the partition and looking for data that is no longer referenced by the file system, which can include deleted files.
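A hedged sketch of how these tools are typically invoked (the device name and paths are examples; run them against an unmounted or read-only filesystem, ideally against an image of the disk):

    # stop all writes to the affected filesystem first
    sudo umount /dev/sdb1
    # ext3/ext4 only: try to restore a deleted directory tree
    sudo extundelete /dev/sdb1 --restore-directory var/lib/postgresql
    # file-carving alternative; interactive, recovers files by signature
    sudo photorec /dev/sdb1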
It's important to note that the chances of successfully recovering deleted files decrease as more time passes and more activity occurs on the partition, so it's best to try to recover the files as soon as possible after the deletion. In addition, the more you use the partition after the deletion, the more likely it is that the deleted data will be overwritten, making it impossible to recover.
If you are unable to recover the deleted files using these tools, you may want to consider seeking the assistance of a professional data recovery service. These services typically have specialized equipment and expertise that can be used to recover data from damaged or formatted disks. However, these services can be expensive, so it's important to weigh the value of the data against the cost of recovery.

Install Postgres on removable volume on linux?

Cloud platforms like Linode.com often provide hot-pluggable storage volumes that you can easily attach and detach from a Linux virtual machine without restarting it.
I am looking for a way to install Postgres so that its data and configuration ends up on a volume that I have mounted to the virtual machine. The end result should allow me to shut down the machine, detach the volume, spin up another machine with an identical version of Postgres already installed, attach the volume and have Postgres work just like it did on the old machine with all the data, file system permissions and server-wide configuration intact.
Is such a thing possible? Is there a reliable way to move installations (i.e. databases and configuration, not the actual binaries) of Postgres across machines?
CLARIFICATION: the virtual machine has two disks:
the "built-in" one which is created when the VM is created and mounted to /. That's where Postgres gets installed to and you can't move this disk.
the hot-pluggable disk which you can easily attach and detach from a running VM. This is where I want Postgres data and configuration to be so I can just detach the disk (after shutting down the VM to prevent data loss/corruption) and attach it to another VM when I want my data to move so it behaves like it did on the old VM (i.e. no failures to start Postgres, no errors about permissions or missing files, etc).
This works just fine. It is not really any different from starting and stopping PostgreSQL without removing the disk. There are a couple of things to consider though.
You have to make sure it is stopped and all writes are synced before unmounting the volume. Obvious enough, and I can't believe you'd be able to unmount before the sync completed, but it's worth repeating.
You will want the same version of PostgreSQL, probably on the same version of operating system with the same locales too. Different distributions might compile it with different options.
Although you can put configuration and data in the same directory hierarchy, most distros tend to put config in /etc. If you compile from source yourself this won't be a problem. Alternatively, you can usually override the default locations or, and this is probably simpler, bind-mount the data and config directories into the places your distro expects.
Note that if your storage allows you to connect the same volume to multiple hosts in some sort of "read only" mode that won't work.
Edit: steps from a comment moved into the body for easier reading; a shell sketch of the commands follows the list.
1. Start up PG, create a table, and put one row in it.
2. Stop PG.
3. Mount your volume at /mnt/db.
4. rsync /var/lib/postgresql/NN/main to /mnt/db/pg_data and /etc/postgresql/NN/main to /mnt/db/pg_etc.
5. Rename /var/lib/postgresql/NN/main, adding .OLD to the name, and do the same with the /etc directory.
6. Bind-mount the directories from /mnt to replace them.
7. Restart PG.
8. Test.
9. Repeat: return to step 8 until you are happy.
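A sketch of those steps as shell commands, assuming a Debian/Ubuntu-style layout with systemd; NN is your PostgreSQL major version and /mnt/db is an example mount point:

    sudo systemctl stop postgresql
    sudo rsync -a /var/lib/postgresql/NN/main/ /mnt/db/pg_data/
    sudo rsync -a /etc/postgresql/NN/main/ /mnt/db/pg_etc/
    # pg_data must stay owned by postgres with mode 700
    sudo mv /var/lib/postgresql/NN/main /var/lib/postgresql/NN/main.OLD
    sudo mv /etc/postgresql/NN/main /etc/postgresql/NN/main.OLD
    sudo mkdir /var/lib/postgresql/NN/main /etc/postgresql/NN/main
    sudo mount --bind /mnt/db/pg_data /var/lib/postgresql/NN/main
    sudo mount --bind /mnt/db/pg_etc /etc/postgresql/NN/main
    sudo systemctl start postgresql
    # add the bind mounts to /etc/fstab if you want them to survive a reboot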

How to configure WAL archiving for a cluster that *only* hosts dev or test databases?

I've got a dev and test database for a project, i.e. databases that I use to either run my project or run tests, locally. They're both in the same cluster ('instance' – I come from Redmond).
Note that my local cluster is different than the cluster that hosts the production database.
How should I configure those databases with respect to archiving the WAL files?
I'd like to be able to 'build' or 'rebuild' either of those databases by restoring from a base backup and running seed data scripts.
But how should I configure the databases or the cluster for archiving WAL files? I understand that I need them if I want to recover the database. I think that's unlikely (I didn't even know about 'WAL' or its files, or that they're presumably shared by all of the databases in the same cluster, which seems weird and scary coming from Microsoft SQL Server).
In the event that I rebuild one of the databases, I should delete the WAL files written since the base backup – how can I do that?
But I also don't want to have to worry about the size of the WAL files growing indefinitely. I don't want to be forced to rebuild just to save space. What can I do to prevent this?
My local cluster only contains a single dev and test database for my project, i.e. losing data from one of these databases is (or should be) no big deal. Even having to recreate the cluster itself, and the two databases, is fine and not an issue if it's even just easier than otherwise to restore the two databases to a 'working' condition for local development and testing.
In other words, I don't care about the data in either database. I will ensure – separate from WAL archiving – that I can restore either database to a state sufficient for my needs.
Also, I'd like to document (e.g. in code) how to configure my local cluster and the two databases so that other developers for the same project can use the same setup for their local clusters. These clusters are all distinct from the cluster that hosts the production database.
Rather than trying to manage your WAL files manually, it's generally recommended that you let a third-party tool take care of that for you. There are several options, but pgBackRest is the most popular of the open-source offerings out there.
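A minimal sketch of what that looks like, assuming pgBackRest with a stanza named "main" and example paths (check the pgBackRest docs for your actual layout):

    # /etc/pgbackrest/pgbackrest.conf (example):
    #   [global]
    #   repo1-path=/var/lib/pgbackrest
    #   repo1-retention-full=2
    #   [main]
    #   pg1-path=/var/lib/postgresql/15/main
    # postgresql.conf:
    #   archive_mode = on
    #   archive_command = 'pgbackrest --stanza=main archive-push %p'
    sudo -u postgres pgbackrest --stanza=main stanza-create
    sudo -u postgres pgbackrest --stanza=main check
    sudo -u postgres pgbackrest --stanza=main backup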
Each database instance writes its WAL stream, chopped into segments of 16 MB.
Every other relational database does the same thing, even Microsoft SQL Server (the differences are in the name and organization of these files).
The WAL contains the physical information required to replay transactions. Imagine it as information like: "in file x, block 2734, change 24 bytes at offset 543 as follows: ..."
With a base backup and this information, you can restore to any given point in time in the life of the database since the end of the base backup.
Each PostgreSQL cluster writes its own "WAL stream". The files are named with long weird hexadecimal numbers that never repeat, so there is no danger that a later WAL segment of a cluster can conflict with an earlier WAL segment of the same cluster.
You have to make sure that WAL is archived to a different machine, otherwise the exercise is pretty useless. If you have several clusters on the same machine, make sure that you archive them to different directories (or locations in general), because the names of the WAL segments of different clusters will collide.
About retention: you want to keep your backups around for some time. Once you get rid of a base backup, you can also get rid of all WAL segments from before that base backup. There is the pg_archivecleanup executable that can help you get rid of all archived WAL segments older than a given base backup.
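For example (a sketch with assumed hostnames and paths; the pg_archivecleanup invocation mirrors the style of the example in the PostgreSQL docs):

    # postgresql.conf on the archiving cluster: ship each segment to another machine,
    # into a directory dedicated to this cluster
    #   archive_mode = on
    #   archive_command = 'rsync -a %p archive-host:/mnt/wal_archive/cluster1/%f'
    # later, drop every archived segment older than the oldest base backup you keep;
    # the .backup file name comes from that base backup's backup history file
    pg_archivecleanup /mnt/wal_archive/cluster1 000000010000003700000010.00000020.backup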
I'd like to be able to 'build' or 'rebuild' either of those databases by restoring from a base backup and running seed data scripts.
Where is the base backup coming from? If you are restoring the PROD base backup and running the seed scripts over it, then you don't need WAL archiving at all on test/dev. But then what you get will be a clone of PROD, which means it will not have separate databases for test and dev in the same instance, since (presumably) PROD doesn't have that.
If the base backup is coming from someplace else, you will have to describe what it is. That will dictate your WAL needs.
Trying to run one instance with both test and dev on it seems like a false economy to me. Just run two instances.
Setting archive_mode=off will entirely disable a wal archive. There will still be "live" WAL files in the pg_wal or pg_xlog directory, but these get removed/recycled automatically after each checkpoint--you should not need to manage these, other than by controlling how often checkpoints take place (and making sure you don't have any replication slots hanging around). The WAL archive and the live WAL files are different things. The live WAL files are mandatory and are needed to automatically recover from something like a power failure. The WAL archive may be needed to manually recover from a hard-drive crash or the total destruction of your server, and probably isn't needed at all on dev/test.
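A small sketch of that setup on a throwaway dev/test cluster (the settings and queries are standard; the paths depend on your install):

    # postgresql.conf:
    #   archive_mode = off
    # confirm nothing is holding live WAL back, then let a checkpoint recycle old segments
    psql -c "SELECT slot_name, active FROM pg_replication_slots;"   # should return no rows
    psql -c "CHECKPOINT;"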

Using vagrant & puppet, how to create and restore a database on fresh postgresql-server instance?

I have freshly provisioned instances of Apache and Postgres all set to go. I would like to restore a dump or mount a logical volume with data to the Postgres instance. Likewise, I'd like to ensure that the dump is written out or the volume unmounted when I bring the instance down.
Can I use a logical volume this way? How should I approach this?
I see this:
How to handle data such as Mysql, web sites sources with Vagrant?
The other answer had the following suggestions. Below I will discuss their implications for PostgreSQL.
In the current version of Vagrant (1.0.3), you have two main options:
Use shared folders. You can put your MySQL data directory into a shared folder so that the data comes back onto your host machine. The con of this is that shared folders are actually quite slow compared to the native VM filesystem in VirtualBox, and you can run into weird permission issues as well.
Set up a task (rake, make, etc.) to copy your MySQL data to your shared folder on demand. Then, before you decide to destroy your VM, you can run the task to export your data to your shared folder, and you can reimport the data when you bring your VM back up.
The shared folders approach may work, but if you do this you need to be extremely careful with file permissions. PostgreSQL tends to be very paranoid about this, so you may have to be cautious about group permissions.
I would recommend something based on the second approach with a base backup (using pg_basebackup), since you get a copy of your database. You can also archive your WAL segments to that directory to have something that can be restored on demand to near-present conditions.
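A sketch of taking such a base backup into the shared folder (/vagrant is Vagrant's default shared folder; the user and host are assumptions):

    # tar-format, compressed base backup, with the WAL needed to make it consistent
    pg_basebackup -h localhost -U postgres -D /vagrant/pg_backup -Ft -z -X stream -P
    # to restore, extract the tarballs into an empty data directory owned by postgres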

How to scale MongoDB?

I know that MongoDB can scale vertically. But what if I am running out of disk?
I am currently using EC2 with EBS. As you know, an EBS volume has to be created with a fixed size.
What if MongoDB grows bigger than the EBS size? Do I have to create a larger EBS volume and copy the files over?
Or should we start more MongoDB instances, each connected to a different EBS disk? In that case, I could connect to a different instance for different databases.
If you're running out of disk, you obviously need to get a bigger disk.
There are several ways to migrate your data, it really depends on the type of up-time you need. First steps of course involve bundling the machine and creating the new volume.
These tips go from easiest to hardest.
Can you take the database completely off-line for several minutes?
If so, do this (migration by copy; a command sketch follows the list):
Mount new EBS on the server.
Stop your app from connecting to Mongo.
Shut down mongod and wait for everything to write (check the logs)
Copy all of the data files (and probably the logs) to the new EBS volume.
While the copy is happening, update your mongod start script (or config file) to point to the new volume.
Start mongod and check connection
Restart your app.
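A hedged sketch of those steps, assuming a typical Linux install with data in /var/lib/mongodb and an example device name for the new EBS volume:

    sudo service mongod stop                      # after stopping the app
    sudo mkfs -t xfs /dev/xvdf
    sudo mkdir -p /data/new-ebs && sudo mount /dev/xvdf /data/new-ebs
    sudo rsync -a /var/lib/mongodb/ /data/new-ebs/mongodb/
    # point mongod at the new volume, e.g. dbpath=/data/new-ebs/mongodb in the config file
    sudo service mongod start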
Can you take the database off-line for just a few minutes?
If so, do this (slaving and switch):
Start up a new instance and mount the new EBS on that server.
Install / start mongod as a --slave pointing at the current database. (you may need to re-start the current as --master)
The slave will do a fresh synchronization. Once the slave is up-to-date, you'll do a "switch" (next steps).
Turn off writes from the system.
Shut down the original mongod process.
Re-start the "new" mongod as a master instead of the slave.
Re-activate system writes pointing at the new master.
Done correctly those last three steps can happen in minutes or even seconds.
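For reference, a sketch of the legacy master/slave flags this answer relies on (hosts and paths are examples; replica sets later replaced this mechanism):

    # on the new server, seed it from the current database
    mongod --slave --source old-host:27017 --dbpath /data/new-ebs/mongodb
    # once it has caught up: stop writes, shut down the old mongod, then promote
    mongod --master --dbpath /data/new-ebs/mongodb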
Can you not afford any down-time?
If so, do this (master-master):
Start up a new instance and mount the new EBS on that server.
Install / start mongod as a master and a slave against the current database. (may need to re-start current as master, minimal down-time?)
The new computer should do a fresh synchronization.
Once the new computer is up-to-date, switch the system to point at the new server.
I know it seems like this last version is actually the best, but it can be a little dicey (as of this writing). The reason is simply that I've honestly had a lot of issues with "Master-Master" replication, especially if you don't start with both active.
If you plan on using this method, I highly suggest a smaller practice run first. If something bombs here, Mongo might simply wipe all of your data files which will have the effect of taking more stuff down.
If you get a good version of this please post the commands, I'd like to see it in action.
Doesn't the E in EBS stand for elastic, meaning something like resizing on the fly?
Currently the MongoDB team is working on finishing sharding, which will allow horizontal scaling by partitioning data across different servers. Give it a month or two and it will work fine. The developers are quite good at keeping their promises.
http://api.mongodb.org/wiki/current/Sharding%20Introduction.html
http://api.mongodb.org/wiki/current/Sharding%20Limits.html
You could slave the bigger disk off the smaller one until it's caught up, or fsync+lock, take a file system snapshot, and copy it onto the bigger disk.
Well, I am using MongoDB now. I am pretty amazed by the performance it delivers, especially on some simple sorting.
I believe it's a good tool for simple web application logic. The remaining concerns for me are how to scale and back up. I will continue to explore.
The only disadvantage is that I don't have any good tools to inspect the data stored inside. For example, I want to put my logging from MySQL into Mongo as well. However, it's pretty difficult for me to view the log. Previously, I could use a MySQL query to fetch what I wanted easily.
Anyway, it's a good tool and I will continue to use it.