PostgreSQL - Periodically copying data from one database to another - postgresql

I'm trying to set up an architecture with 2 databases, say preview and live, that have the exact same schemas. The use case is that edits can be made to the preview database and then pushed to the live database after they are vetted and approved. The production application would read from the live database.
What would be the most appropriate way to push all data from the preview database to the live database without bringing the live database down? Ideally the copy from preview to live would be an atomic transaction.
I've worked with this type of setup in MSSQL, but I'm fairly new to Postgres. So I'm open to hearing other ways to architect this (with Schemas perhaps?).
EDIT: The main reason to use separate databases is that I may need more than 1 target database (not just a single "live" database). I also may need to switch target databases on the fly without altering the source database schema.

I think what you're looking for is a "hot standby". This would be a separate instance of Postgresql, possibly on the same server but usually not, which is a near-real-time replica of the primary server.
In broad strokes, this is done by shipping the binary transaction logs from the primary server to the backup server, and then "replaying" them there. The exact mechanism for transmitting the logs may vary depending on your requirements.
Fortunately, the docs on this are excellent:
https://www.postgresql.org/docs/9.3/static/warm-standby.html
https://www.postgresql.org/docs/9.0/static/hot-standby.html

Related

Moodle asynchronous replication

I'm looking to deploy moodle in the cloud however I have some 50 odd sites which require access to this moodle possibly even temporarily offline. So I'm looking into replicating moodle down onto each site. From what I understand there are 2 data stores that require replication, moodledata and the database, postgresql in our case. moodledata if I'm not mistaken contains the multimedia data and the database among other things all the user records. Luckily the multimedia data will be centralized and is thus synched only one way down to the nodes, that seems doable. Where I'm stuck is how do I handle the Postgres database where the sync will need to be bidirectional?

How do you sync many writable MongoDB databases?

What I mean by witable is that you can CRUD on each database, and it automatically syncs with the other so that all of them are synced all the time (as much as possible).
I want to start a project for a company with some tricks.
The company is present in many locations (at least 5) and wants the app to run locally (with local database), but when there's a change(Create Update or Delete), the change is propagated to the other databases.
The goal is to have them all synced at every moment, but with the possibility that if internet connection is lost on one site, they continue to use the app properly since they are actually connected to the local database. That's why they don't one a totally online database.
They use MongoDB.
I saw the replica sets technology, but since it's with a unique master, it seems complicated.
Please can you share solutions to such a situation?

How do I move data from RDS of one AWS account to another account

We have our web services and database set up on AWS a while back and application is now in production. For some reason, we need to terminate the old AWS and move everything under a newly created AWS account. Application and all the infrastructure are pretty straightforward. It is trickier for data though. The current database is still receiving lots of data on daily basis. So it is best to migrate the data after we turn off the old application and switch on new platform.
Both source RDS and target RDS are Postgres. We have about 40GB data to transfer. There are three approaches I could think of and they all have drawbacks.
Take a snapshot of the first RDS and restore it in second one. Problem is I don't need to transfer all the data from source to destination. Probably just records after 10/01 is enough. Also snapshot works best to restore in an empty rds that is just created. For our case, the new RDS will start receiving data already after the cutoff. Only after that, the data will be transferred from old account to new account otherwise we will lose data.
Dump data from tables in old RDS and backup in new RDS. This will have the same problem as #1. Also, if I dump data to local machine and then back up from local, the network speed is bottleneck.
Export table data to csv files and import to new RDS. The advantage is this method allows pick and choose and some data cleaning as well. But it takes forever to export a big fact table to local csv file. Another problem is, for some of the tables, I have surrogate row IDs which are serial (auto-incremental). The row IDs of exported csv may conflicting with existing data in new RDS tables.
I wonder if there is a better way to do it. Maybe some ETL tool AWS has which does point to point direct transfer without involving using local computer as the middle point.
In 2022 the simplest way to achieve this task is using AWS Database Migration Services (AWS DMS).
You can create a migration task, and set the original database as the source endpoint, and the new database as a destination endpoint.
Next create a task with "Full load, ongoing replication" settings.
More details here: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.PostgreSQL.html
I have recently moved the data of RDS from one account to other using Bucardo (https://bucardo.org/). Please refer the following blogs
https://www.compose.com/articles/using-bucardo-5-3-to-migrate-a-live-postgresql-database/
https://bucardo.org/pipermail/bucardo-general/2017-February/002875.html
Though this has not mentioned exactly about migration between two RDS account, this could help setting things. We still need some intermediate point such as EC2 instance where we need to configure this Bucardo and migrate the data between accounts. If you are looking for more information, I am happy to help.
In short, we need to take a manual snapshot of the source db and restore it in the another account (https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ShareSnapshot.html) and with Bucardo set up in the EC2 instance, we can start to sync the data using triggers and that will update the data in destination db as and then the new data comes in to the source DB.

Point in time reqovery for just one schema or database

I'm going to develop a multi-tenant application, where each tenant lives in its own database or schema (I've not decided this yet).
In this scenario, if I wanted to use point in time recovery (PITR), I also want to have it per-tenant. If a tenant has a problem, I want to be able to roll back only his database or schema and not the whole server.
While I found information how to do backup/restore in such situations with pg_dump and pg_restore, I haven't found any information for PITR.
Is this even possible? If yes, only per database or even per schema?
I can imagine that postgres maybe stores the log of the whole server in a single file, which may be the reason why it could not be possible. But I may be wrong..

Firebird 2.5.1 List databases in use by the server (superserver mode)

I want to write a C++ administrative app to simplify management of DBs I am in charge of. Currently, when I want to tell if there are users connected to multiple Firebird databases operated by 2 different instances of it, I have to connect to every single DB and check. That's ok, but I don't want to register every new database that is being created when i don't look, I want some way to list databases that are currently open or otherwise in use by the server. Current 2 uses of this functionality I can think of are:
Auto-inclusion in backup procedure
Application update, which require users to log off (one-look and I would be able to tell whom to kick or at least which department to call)
Firebird does not have an API to list all available databases. Technically Firebird simply doesn't know about the existence of a database until you actually connect to it.
You might be able to find all databases that are being connected to using the Trace API or the monitoring tables, but that does not exclude the possibility that other databases exist on your system.