Mirror one database to another in PostgreSQL - postgresql

I know the usual way to set up a master/slave DB in Postgres is to have two DB servers, but unfortunately I have only one server for now.
How can I mirror my production DB into another "backup DB" in real time? I want to give another user access to the mirrored DB, so that even if he does something there it will not affect production.

Nothing stops you setting up hot standby streaming replication, or another replication option like Londiste, between two PostgreSQL instances on the same computer.
The two copies of PostgreSQL must use different ports, but that's the only real restriction.
How to set up the second PostgreSQL instance depends on your operating system and how you installed PostgreSQL, which you have not mentioned.
You'll want to use streaming replication with hot standby if you want a read-only replica. If you want it to be read/write, then you can do a one-off copy of the database with pg_basebackup and not keep them in sync after that. Or you can use a tool like Londiste to replicate changes selectively.
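As a rough sketch of the streaming-replication route on a single machine, the helper below assembles the two commands involved: pg_basebackup to clone the running instance into a new data directory, and pg_ctl to start that clone on a second port. The data directory /var/lib/postgresql/replica, the role name replicator, and the ports 5432 and 5433 are assumptions for illustration, and the primary is assumed to already accept replication connections for that role. The commands are only printed here, not executed.

```python
import shlex


def clone_commands(data_dir, primary_port=5432, replica_port=5433, user="replicator"):
    """Return (basebackup_argv, start_argv) for a standby on the same host."""
    basebackup = [
        "pg_basebackup",
        "-h", "localhost",
        "-p", str(primary_port),
        "-U", user,          # hypothetical role with the REPLICATION privilege
        "-D", data_dir,      # new, empty data directory for the second instance
        "-X", "stream",      # stream WAL during the copy so the backup is usable on its own
        "-R",                # write standby connection settings into the new data directory
        "-P",                # show progress
    ]
    # Start the clone on its own port; -o passes options through to the server.
    start = ["pg_ctl", "-D", data_dir, "-o", f"-p {replica_port}", "start"]
    return basebackup, start


backup_cmd, start_cmd = clone_commands("/var/lib/postgresql/replica")
print(shlex.join(backup_cmd))
print(shlex.join(start_cmd))
```

For the one-off read/write copy, the same sketch applies without the -R flag, so the clone starts as an independent server rather than a standby.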

You can run multiple instances of PostgreSQL on the same computer, by using different ports.

Related

Postgres pg_dumpall consistency across databases that are backed up

Is there a way to backup all of the databases on a hyperscaler managed postgres server as of a certain time in order to maintain data consistency between the databases with either pg_dumpall, pg_dump or something else?
Background:
With the use of micro-services, an application may have many databases associated with it on a single hyperscaler-managed Postgres server. The hyperscalers do perform a functional snapshot backup; however, when a hyperscaler-managed Postgres server is accidentally deleted, its backups are lost as well. These hyperscalers provide locks to prevent accidental deletion of a Postgres server and mention that their support teams can be contacted to restore a deleted server; nevertheless, we still had a Postgres server get deleted. We were able to recover by contacting the hyperscaler's support team, but we would like to have a second way of backing up a hyperscaler-managed Postgres server.
I realize that the micro-services should be able to auto-recover to a data consistent point but the reality is that many of the micro-services have not been designed nor written to that requirement. I really do not want to get into the aspect of micro-service design and want to retain this to be a DBA backup question.
pg_dumpall will not take a consistent backup across all databases. Each database backup will be consistent, but the snapshots for the backups of the different databases will be taken at different times.
If you need a consistent backup across several databases in a single cluster, use an online file system backup with pg_basebackup.
You can use pg_basebackup, which most PostgreSQL DBAs end up using daily, whether through scripts or manually. It takes an online base backup of the whole cluster, so it can run against a production server while the server stays in use, and the resulting base backup can be used for recovery in many situations.
The pg_basebackup documentation has more details.
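A minimal sketch of how such a backup might be scripted: the function below builds a pg_basebackup command that writes a compressed tar backup into a timestamped directory. The host db.example.internal, the role backup_user, and the /backups root are placeholders, and the sketch assumes the server accepts replication connections from that role, which is not guaranteed on a managed service.

```python
import shlex
from datetime import datetime, timezone


def base_backup_command(host, user, backup_root, now=None):
    """Build a pg_basebackup argv that writes into a UTC-timestamped directory."""
    now = now or datetime.now(timezone.utc)
    target = f"{backup_root}/{now:%Y%m%dT%H%M%SZ}"
    return [
        "pg_basebackup",
        "-h", host,
        "-U", user,
        "-D", target,
        "-F", "t",       # tar output instead of a plain data directory
        "-z",            # gzip the tar files
        "-X", "stream",  # include the WAL needed to make the backup consistent
        "-P",            # show progress
    ]


cmd = base_backup_command("db.example.internal", "backup_user", "/backups")
print(shlex.join(cmd))
```

Because the backup covers the whole cluster at once, every database in it is restored to the same point in time.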

SQL Live Backup Over Intermittent Connection

I have a few PCs that have local PostgreSQL databases running, just logging data. Data is only ever inserted, never removed or updated. The remote PCs are connected to the internet by cellular modem and depending on their location, often do not have internet access. When they do have an internet connection I would like them to push a copy of their databases to a central location and keep the remote database up to date with any new data. Essentially, I need an 'rsync' for databases.
At first it seemed like what I need is to set up PostgreSQL Hot-Standby but I'm unsure if this is actually what I need because my situation seems to differ from the examples I've seen.
Each remote PC has a Postgres server with a single database that has a unique name, the tables within the DBs have generic names. I would like to synchronize these databases to a single remote Postgres server. I think this should be okay due to the unique DB names.
My connectivity is very intermittent, with days to weeks without a connection. I've seen PgAdmin be very reliable despite a terrible (cellular) internet connection; if Postgres Hot-Standby is the same, I may be all right.
As far as I can see my options are either to set up PostgreSQL Hot-Standby, or roll my own solution. I don't want to roll my own solution. However it is simple enough if I can't find anything better; a Python daemon run by systemd to find the diff between the local and remote DB, then push the new rows from the local to the remote DB. But I'm sure someone has solved this problem, I just haven't found the solution yet.
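You don't need hot standby (which is the PostgreSQL term for being able to query the replicated database), only streaming replication. You need one central standby server for each intermittently connected remote database server, because a physical standby replicates an entire cluster. If you use replication slots, each remote server keeps its WAL until its standby has received it, so the standby can always catch up after an outage, however long. The price is that WAL accumulates on the remote PC's disk while it is offline.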
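Since a replication slot makes the remote server hold on to WAL while the link is down, it is worth watching how much is being retained. The sketch below shows one way to create a physical slot and to list slots that retain more than a chosen amount of WAL. It assumes PostgreSQL 10 or later and a DB-API cursor that uses %s placeholders (as psycopg2 does); the slot names are hypothetical.

```python
CREATE_SLOT_SQL = "SELECT pg_create_physical_replication_slot(%s)"

# Bytes of WAL the server must keep for each slot (NULL if the slot was never used).
RETAINED_WAL_SQL = """
SELECT slot_name,
       active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots
"""


def create_slot(cursor, slot_name):
    """Create a physical replication slot on the server the cursor is connected to."""
    cursor.execute(CREATE_SLOT_SQL, (slot_name,))


def slots_over_limit(cursor, limit_bytes):
    """Return the names of slots retaining more than limit_bytes of WAL."""
    cursor.execute(RETAINED_WAL_SQL)
    return [
        name
        for name, _active, retained in cursor.fetchall()
        if retained is not None and retained > limit_bytes
    ]
```

A cron job or systemd timer on each remote PC could call slots_over_limit and raise an alert before the disk fills up.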

Postgres and multiple locations of data storage

Postgres's default storage location is on my C: drive. I would like to restore a backup to another database but access it via the same Postgres server instance. The issue is that the DB is too big to be restored on the same C: drive. Would it be possible to tell Postgres that the second database should be restored and placed on another location/drive (while still keeping the first one)? For example, database1 on my C: drive and database2 on my D: drive?
Otherwise the second best solution would be to install 2 separate Postgres instances - but that also seems a bit overkill?
That should be entirely achievable, if you've used the postgres pg_dump command.
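The pg_dump command does not create the database, so you create it yourself first. Use CREATE TABLESPACE to specify the location. The directory must already exist, be empty, and be owned by the account that runs the PostgreSQL service.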
CREATE TABLESPACE secondspace LOCATION 'D:\postgresdata';
CREATE DATABASE seconddb TABLESPACE secondspace;
This creates an empty database on the D: drive.
Then the standard restore from a pg_dump should work:
psql seconddb < dumpfile
Replication
Sounds like you need database replication.
There are several ways to do this with Postgres, one built-in, and other approaches using add-on libraries.
Built-in replication feature
The built-in replication feature is likely to suit your needs. See the manual. In this approach, you have an instance of Postgres running on your primary server, doing reads and writes of your data. On a second server, an entirely separate computer, you run another instance of Postgres known as the replica. You first set up the replica by doing a full backup of your database on the first server, and restore to the second server.
Next you configure the replication feature. The replica needs to know it is playing the role of a replica rather than a regular database server. And the primary server needs to know the replica exists, so that every database change, every insert, modification, and deletion, can be communicated.
WAL
This communication happens via WAL files.
The Write-Ahead Log (WAL) feature in Postgres is where the database writes all changes first to the WAL, and only after that is complete, then writes to the actual database. After a crash, power outage, or other failure, the database upon restarting replays the WAL as a "To-Do" list: changes from committed transactions that had not yet reached the data files are reapplied, while a transaction left incomplete never takes effect.
Every so often the current WAL file is closed, with a new WAL file created to take over the work. With file-based replication (log shipping), each closed WAL file is copied to the replica. The replica then incorporates that WAL file, following the same "To-Do" list of changes as written in it. So all changes are made to the replica database exactly as they were made to the primary database. Your replica is an exact match to the primary, except for a slight lag in time: with log shipping the replica is up to one WAL file behind the primary, while streaming replication sends WAL records as they are generated and keeps the lag smaller still.
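To make the idea concrete, here is a toy model of log-then-apply and of a replica replaying the primary's log. It is only an illustration of the principle, not how Postgres implements WAL; the ToyStore class and its methods are invented for this example.

```python
class ToyStore:
    """A key-value store that logs every change before applying it."""

    def __init__(self):
        self.wal = []    # ordered list of change records (the "To-Do" list)
        self.data = {}   # the "actual database"

    def write(self, key, value):
        record = ("set", key, value)
        self.wal.append(record)   # 1. record the change in the log first
        self._apply(record)       # 2. only then change the data

    def _apply(self, record):
        op, key, value = record
        if op == "set":
            self.data[key] = value

    def replay(self, records):
        """Apply records received from another store, in order."""
        for record in records:
            self.wal.append(record)
            self._apply(record)


primary = ToyStore()
replica = ToyStore()
primary.write("a", 1)
primary.write("b", 2)

# Ship everything the replica has not seen yet, then replay it there.
replica.replay(primary.wal[len(replica.wal):])
print(replica.data == primary.data)  # prints True
```

The replica ends up identical because it applies the same records in the same order; the only difference from the primary is how far behind in the log it currently is.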
In times of trouble, the replica serves as a warm stand-by. You can shut down the primary, then tell the replica that it is now the primary (promote it). In Postgres terminology, a hot stand-by is a replica that also accepts read-only queries while it replicates. Automatic take-over when the primary seems to have failed is not built in; it requires external tooling, and there are pros and cons to automating it.
Offload read-only queries
As a bonus feature, the replica can be used for read-only queries. If your database is heavily used, you can offload some of the work burden from your primary to the replica. Any queries that do not require the absolute latest information can be shifted by connecting to the replica rather than the original. For example, a quarterly sales report likely does not need the latest data stored in the active WAL file that has not yet arrived on the replica.
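One way to decide whether a given query can tolerate the replica's lag is to measure that lag first. The sketch below assumes a DB-API cursor connected to the replica; the function names and the staleness threshold are illustrative. Note that this measurement overstates the lag when the primary has been idle, because it is based on the time of the last replayed transaction.

```python
LAG_SQL = """
SELECT CASE WHEN pg_is_in_recovery()
            THEN EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())
            ELSE 0
       END AS lag_seconds
"""


def replica_lag_seconds(cursor):
    """Return the approximate replay lag in seconds, or None if it is unknown."""
    cursor.execute(LAG_SQL)
    (lag,) = cursor.fetchone()
    return None if lag is None else float(lag)


def choose_server(lag_seconds, max_staleness_seconds):
    """Return 'replica' when it is fresh enough for the query, else 'primary'."""
    if lag_seconds is not None and lag_seconds <= max_staleness_seconds:
        return "replica"
    return "primary"
```

A quarterly report might pass a max_staleness_seconds of several minutes, while a query that must see the latest writes would simply go to the primary.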
Physical replication means all databases are copied
Caveat: This built-in replication feature is physical replication. This means all the changes to the entire Postgres installation (formally known as a cluster, not to be confused with a hardware cluster) are copied to the replica. If you use one Postgres server to serve multiple databases, all those databases must be replicated; you cannot pick and choose which get copied over. For selective copying, Postgres 10 and later also offer built-in logical replication, which replicates chosen tables through publications and subscriptions.
More to learn
I am being brief here. The topics of replication, high-availability, and disaster-recovery are broad and complex, too much for an Answer on Stack Overflow.
Tip: This kind of Question might have been better asked on the sister site, DBA.StackExchange.com.

Heroku horizontal scaling

I have a python app running on Heroku using a PostgreSQL database. If I create a database follower, will that follower be used to balance the read database load automatically? I know this provides me a failover copy of sorts, but will it relieve my database load?
No. A follower does not take read traffic automatically; you'll need to configure your Python software to send read queries to the follower and writes to the master database in order to actually relieve your database load.
If you're using Django, you'll want to read this: https://docs.djangoproject.com/en/1.8/topics/db/multi-db/
If you're using SQLAlchemy, you'll want to read this: "read slave, read-write master setup"
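For the Django case, a minimal sketch of a database router that sends reads to a follower and writes to the master might look like the following. The class name and the "follower" alias are assumptions; both "default" and "follower" would have to be defined in your DATABASES setting, with "follower" pointing at the Heroku follower's URL.

```python
import random


class PrimaryFollowerRouter:
    """Route reads to a follower database and writes to the master."""

    # Aliases assumed to exist in settings.DATABASES.
    followers = ["follower"]

    def db_for_read(self, model, **hints):
        # Spread reads across however many followers are configured.
        return random.choice(self.followers)

    def db_for_write(self, model, **hints):
        return "default"

    def allow_relation(self, obj1, obj2, **hints):
        # Both aliases hold the same data, so relations are always fine.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Schema changes only run on the master; the follower receives them by replication.
        return db == "default"
```

The router is activated by listing its dotted path in the DATABASE_ROUTERS setting. Keep in mind that a read issued right after a write may not see that write yet, because the follower lags slightly behind.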

MongoDB replica set to stand alone backup and restore

For development reasons, I need to backup a production replica set mongodb and restore it on a stand alone, different machine test instance.
Some docs talk about the opposite (standalone to replica set), but I cannot find this downgrade/rollback path.
What's the way to go in this case?
No matter how many nodes you have in a replica set, each of them holds the same data.
So getting the data is easy - just use mongodump (preferably against the secondary, for performance reasons) and then mongorestore into a new mongod for your development stand-alone system.
mongodump does not pick up any replication-related collections (they live in a database called local). If you end up taking a file system snapshot of a replica node rather than using mongodump, be sure to drop the local database when you restore the snapshot into your development stand-alone server, and then restart mongod so that it will properly detect that it is not part of a replica set.
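The mongodump and mongorestore steps could be scripted along these lines. This is a sketch only: the replica set URI, host names, and dump directory are placeholders, and the commands are printed rather than run. The readPreference=secondary option in the URI asks mongodump to read from a secondary.

```python
import shlex


def dump_and_restore_commands(replica_set_uri, dump_dir,
                              target_host="localhost", target_port=27017):
    """Return (mongodump_argv, mongorestore_argv) for copying data to a stand-alone mongod."""
    dump = ["mongodump", f"--uri={replica_set_uri}", f"--out={dump_dir}"]
    restore = [
        "mongorestore",
        f"--host={target_host}",
        f"--port={target_port}",
        dump_dir,  # directory produced by mongodump
    ]
    return dump, restore


# Hypothetical replica set members and set name.
uri = ("mongodb://db1.example.com:27017,db2.example.com:27017/"
       "?replicaSet=rs0&readPreference=secondary")
dump_cmd, restore_cmd = dump_and_restore_commands(uri, "/tmp/prod-dump")
print(shlex.join(dump_cmd))
print(shlex.join(restore_cmd))
```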