Can two Postgres services share one common PGDATA folder, one at a time? - postgresql

Can I share data between two Postgres services on separate machines (the PGDATA folder will be in a shared location) if only one service runs at a time?

PostgreSQL has a number of ways to make sure that you cannot start two postmaster processes on the same data directory, but if you mount a filesystem on two machines, these mechanisms will fail. So you would have to make very sure that you don't start servers on both machines; that would lead to data corruption. Moreover, you'd have to make sure that the remote file system is reliable. A Windows network share isn't, for example.
So, all in all, my only recommendation is "don't do that". For high availability, use a proven shared-nothing solution such as Patroni.
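To illustrate why the lock file stops helping on a shared mount, here is a deliberately simplified Python model of the stale-lock check. It assumes only that the first line of postmaster.pid holds the postmaster's PID. The function name, the path and the sample PIDs are made up for illustration, and the real server performs further checks that are just as local to one machine.

```python
from typing import Iterable


def lock_looks_stale(pid_file_text: str, local_pids: Iterable[int]) -> bool:
    """Return True if the PID recorded in postmaster.pid is not running locally.

    Simplified model: PostgreSQL does more than this, but its checks are
    made against the machine it is starting on, not against other hosts.
    """
    recorded_pid = int(pid_file_text.splitlines()[0].strip())
    return recorded_pid not in set(local_pids)


# Hypothetical scenario: machine A runs the postmaster as PID 4242 and the
# data directory (including postmaster.pid) is also mounted on machine B.
pid_file = "4242\n/mnt/shared/pgdata\n"
machine_a_pids = {1, 4242}
machine_b_pids = {1, 977}

print(lock_looks_stale(pid_file, machine_a_pids))  # False: A sees the live server
print(lock_looks_stale(pid_file, machine_b_pids))  # True: B thinks the lock is stale
```

Machine B has no process 4242, so from its point of view the lock file is a leftover from a crash and nothing stops a second server from starting.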

Related

Install Postgres on removable volume on linux?

Cloud platforms like Linode.com often provide hot-pluggable storage volumes that you can easily attach and detach from a Linux virtual machine without restarting it.
I am looking for a way to install Postgres so that its data and configuration end up on a volume that I have mounted to the virtual machine. The end result should allow me to shut down the machine, detach the volume, spin up another machine with an identical version of Postgres already installed, attach the volume and have Postgres work just like it did on the old machine with all the data, file system permissions and server-wide configuration intact.
Is such a thing possible? Is there a reliable way to move installations (i.e. databases and configuration, not the actual binaries) of Postgres across machines?
CLARIFICATION: the virtual machine has two disks:
the "built-in" one which is created when the VM is created and mounted to /. That's where Postgres gets installed to and you can't move this disk.
the hot-pluggable disk which you can easily attach and detach from a running VM. This is where I want Postgres data and configuration to be so I can just detach the disk (after shutting down the VM to prevent data loss/corruption) and attach it to another VM when I want my data to move so it behaves like it did on the old VM (i.e. no failures to start Postgres, no errors about permissions or missing files, etc).
This works just fine. It is not really any different to starting and stopping PostgreSQL and not removing the disk. There are a couple of things to consider though.
You have to make sure PostgreSQL is stopped and all writes are synced before unmounting the volume. Obvious enough, and I can't believe you'd be able to unmount before the sync completed, but worth repeating.
You will want the same version of PostgreSQL, probably on the same version of operating system with the same locales too. Different distributions might compile it with different options.
Although you can put configuration and data in the same directory hierarchy, most distros tend to put config in /etc. If you compile from source yourself this won't be a problem. Alternatively, you can usually override the default locations or, and this is probably simpler, bind-mount the data and config directories into the places your distro expects.
Note that if your storage lets you attach the same volume to multiple hosts in some sort of "read-only" mode, that won't work: PostgreSQL needs to write to its data directory.
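As a rough pre-detach check, you can look for postmaster.pid, which a clean shutdown removes from the data directory. The sketch below is only that, a sketch: the function name is made up, and a throwaway temporary directory stands in for a real data directory.

```python
import tempfile
from pathlib import Path


def server_appears_stopped(data_dir: str) -> bool:
    """True if data_dir holds no postmaster.pid, which a clean shutdown removes."""
    return not (Path(data_dir) / "postmaster.pid").exists()


# Demonstration against a throwaway directory standing in for a data directory.
with tempfile.TemporaryDirectory() as demo_dir:
    before = server_appears_stopped(demo_dir)
    (Path(demo_dir) / "postmaster.pid").write_text("4242\n")
    after = server_appears_stopped(demo_dir)

print(before)  # True: no lock file present
print(after)   # False: lock file present
```

The check is conservative: a lock file left behind by a crash also makes it return False, which is a good reason to look at the server before detaching anything.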
Edit: steps from comment moved into body for easier reading.
1. Start up PG, create a table, put one row in it.
2. Stop PG.
3. Mount your volume at /mnt/db
4. rsync /var/lib/postgresql/NN/main to /mnt/db/pg_data and /etc/postgresql/NN/main to /mnt/db/pg_etc
5. Rename /var/lib/postgresql/NN/main and add .OLD to the name and do the same with the /etc
6. Bind-mount the dirs from /mnt to replace them
7. Restart PG
8. Test
9. Repeat
10. Return to step 8 until you are happy
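The copy, rename and bind-mount steps could be scripted roughly as follows. This is a dry-run sketch that only prints commands. The systemctl service name `postgresql`, the function name and the extra `mkdir` for the empty mount points are my assumptions, and mounting the volume itself is left out because the device name depends on your provider. Everything would need to run as root.

```python
import shlex


def relocation_plan(version: str, volume: str = "/mnt/db") -> list:
    """Return the shell commands for the relocation steps, without running them."""
    data = f"/var/lib/postgresql/{version}/main"
    conf = f"/etc/postgresql/{version}/main"
    return [
        ["systemctl", "stop", "postgresql"],               # assumed service name
        ["rsync", "-a", f"{data}/", f"{volume}/pg_data/"],  # -a keeps owners and modes
        ["rsync", "-a", f"{conf}/", f"{volume}/pg_etc/"],
        ["mv", data, f"{data}.OLD"],
        ["mv", conf, f"{conf}.OLD"],
        ["mkdir", data, conf],                              # empty mount points
        ["mount", "--bind", f"{volume}/pg_data", data],
        ["mount", "--bind", f"{volume}/pg_etc", conf],
        ["systemctl", "start", "postgresql"],
    ]


# "NN" is kept as the version placeholder; substitute your own version.
plan = relocation_plan("NN")
for command in plan:
    print(shlex.join(command))
```

Printing the plan first makes it easy to review the paths before running anything that renames directories.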

Using multiple PostgreSQL servers with a single shared network data directory?

How does PostgreSQL handle running multiple servers on different machines using a shared data directory? Does it automatically handle this under-the-hood without problems? Is it possible, but requiring some special configuration? Or is this a bad idea in general?
I'm doing some data science on a high-performance machine cluster, where I submit jobs, the job is run by a random machine, and each machine has access to a shared network drive. Currently, I'm using SQLite, where this use-case works fine. A single shared SQLite database file can handle multiple connections from different machines without trouble.
I'm now attempting to switch over to PostgreSQL. Intercommunication between the machines of the cluster is surprisingly not straightforward. So while the immediate solution should be having one server which all the other machines connect to, this might not end up being practical. Ideally, I could just continue doing what I've been doing with the SQLite setup. That is, have each machine run its own PostgreSQL server, which then connects to the shared databases.
No, no, no and yes.
A PostgreSQL installation ("cluster" is the term used in the manuals) expects to be in charge of all of its files. It carefully coordinates access between multiple processes accessing those files. You are supposed to access PostgreSQL in a client/server manner over a socket (unix if local, tcp if not).
Running several servers against one shared data directory is not supported with PostgreSQL. It will lead to corruption and data loss. If you can't simplify your networking, then you had best stick with SQLite. (Assuming it is actually safe with SQLite, something I haven't verified.)
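For reference, the client/server model means every job builds a connection string for the one server instead of opening files. A minimal sketch of libpq keyword/value strings follows. The database name, host name and socket directory are hypothetical, and the helper function is mine, not part of any library.

```python
def conninfo(dbname: str, host: str, port: int = 5432) -> str:
    """Build a libpq keyword/value connection string.

    libpq treats a host value starting with "/" as the directory of a
    Unix-domain socket; anything else is a TCP host name or address.
    """
    return f"host={host} port={port} dbname={dbname}"


# Hypothetical names: a "results" database on a server "db-node.cluster.local".
local_job = conninfo("results", "/var/run/postgresql")     # same machine, Unix socket
remote_job = conninfo("results", "db-node.cluster.local")  # other machine, TCP

print(local_job)
print(remote_job)
```

Either string can be handed to any libpq-based client; only the machine running the server ever touches the data directory.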

can many Postgres processes run with the same data directory?

I have an application running in multiple pods. You can imagine the app as a web application which connects to Postgres (so each container has both app and Postgres processes). I would like to mount the volume into each pod at /var/lib/postgresql/data so that every app can have the same state of the database. They can read/write at the same time.
This is just an idea of how I will go.
My question is: is there any concern I need to be aware of? Or is this the totally wrong way to go?
Or will it be better to separate Postgres from the app container into a single pod and let the app containers connect to that one pod?
If my questions show knowledge I lack, please provide links I should read, thank you!
This will absolutely fail to work, and PostgreSQL will try to prevent you from starting several postmasters against the same data directory as well as it can. If you still manage to do it, instant data corruption will ensue.
The correct way to do this is to have a single database server and have all your “pods” connect to that one. If you have many of these “pods”, you should probably use a connection pooler like pgbouncer to fight the problems caused by too many database connections.
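A back-of-the-envelope calculation shows when direct connections become a problem. The sketch below assumes the stock settings of max_connections = 100 and superuser_reserved_connections = 3. The pod counts and per-pod connection numbers are invented for illustration.

```python
def connections_needed(pods: int, per_pod: int) -> int:
    """Server connections opened if every pod connects directly."""
    return pods * per_pod


def fits(pods: int, per_pod: int, max_connections: int = 100,
         reserved: int = 3) -> bool:
    """True if direct connections stay within the slots open to ordinary users."""
    return connections_needed(pods, per_pod) <= max_connections - reserved


print(connections_needed(40, 5))  # 200
print(fits(40, 5))                # False: 200 > 97
print(fits(10, 5))                # True: 50 <= 97
```

Once the first case applies, a pooler in front of the server is usually a better answer than simply raising max_connections.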

In a containerized cluster, should mongodb servers be running on a worker or a core service?

I'm trying to implement an architecture that's similar to CoreOS's production architecture (shown below).
Should I run the database as a central service or on one or more of the workers?
I figured the database needs some kind of replication, which makes me think that putting it in the worker cluster makes more sense, but I'm just not sure.
This should be run as a worker. The central services are the basic things that come with CoreOS (mainly etcd). The workers host your applications, the database being one of them. You do have a persistence issue because your database will have state to remember between restarts. So there is a bigger question of how you provide that persistence. One way to do it is to use a host file, give the database an affinity to that host, and mount the host file. Another thing you might consider is running more than one database (if your db technology supports that) and replicating that database so you have two (or more) copies on different workers (non-affinity). If your database creates transaction logs that can be applied to a backup, you can manage those transaction logs in a worker.
Another thing to consider is not using a container for your database. The database is a weird animal, its care and feeding is not like the rest of the applications. So it is reasonable (in my opinion) to have your database managed and maintained outside the scope of your cluster (but still reachable by the cluster).

Using vagrant & puppet, how to create and restore a database on fresh postgresql-server instance?

I have freshly provisioned instances of apache and postgres all set to go. I would like to restore a dump or mount a logical volume with data to the postgres instance. Likewise, I'd like to ensure that the dump is written out or the volume unmounted when I bring the instance down.
Can I use a logical volume this way? How should I approach this?
I see this:
How to handle data such as Mysql, web sites sources with Vagrant?
The other answer had the following suggestions. Below I will discuss their implications for PostgreSQL.
In the current version of Vagrant (1.0.3), you have two main options:
Use shared folders. You can put your MySQL data directory into a shared folder so that the data comes back onto your host machine. The con of this is that shared folders are actually quite slow compared to the native VM filesystem in VirtualBox, and you can run into weird permission issues as well.
Setup a task (rake, make, etc.) to copy your MySQL data to your shared folder on demand. Then, before you decide to destroy your VM, you can run the task to export your data to your shared folder, then you can reimport the data when you bring your VM back up.
The shared folders approach may work, but if you do this you need to be extremely careful with file permissions. PostgreSQL tends to be very paranoid about this, so you may have to be cautious about group permissions.
I would recommend something based on the second approach with a base backup (using pg_basebackup), since that gives you a complete copy of your database. You can also archive your WAL segments to that directory to have something that can be restored on demand to near-present conditions.
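A sketch of what the task could generate is shown below. The directory /vagrant/pg_backup is an assumed location inside the shared folder, and both function names are made up. The pg_basebackup flags and the postgresql.conf settings themselves are standard ones. Note that pg_basebackup expects the target directory to be empty or absent, and the connecting role needs replication privileges.

```python
import shlex


def base_backup_command(target_dir: str) -> list:
    """pg_basebackup invocation: gzip-compressed tar output, WAL streamed alongside."""
    return ["pg_basebackup", "-D", target_dir, "-Ft", "-z", "-X", "stream", "-P"]


def archive_settings(archive_dir: str) -> list:
    """postgresql.conf lines that copy each finished WAL segment to archive_dir."""
    return [
        "wal_level = replica",
        "archive_mode = on",
        f"archive_command = 'test ! -f {archive_dir}/%f && cp %p {archive_dir}/%f'",
    ]


# Assumed paths inside the Vagrant shared folder.
backup_cmd = base_backup_command("/vagrant/pg_backup/base")
settings = archive_settings("/vagrant/pg_backup/wal")

print(shlex.join(backup_cmd))
print("\n".join(settings))
```

Restoring then means unpacking the base backup into a fresh data directory and letting the server replay the archived WAL segments.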