Will "PostgreSQL Streaming Replication" fit this use case? - postgresql

I am designing an application for public organizations.
The purpose is to record data (text and video streams) which will be produced in "local" offices, where connectivity is not guaranteed, and where the power will be available only during the occurrences of meetings.
One of the requisites of the project is the "locality" of the data storage, since data is considered "sensitive" and "important".
One second requisite of the project is to publish to a web server a portion of the data produced during the meetings.
The database server shall be PostgreSQL.
I plan to set up a second PostgreSQL database server on the web infrastructure hosting the web server, and synchronize it with the "local" database.
The "public" database will be accessed only by *selection queryes" (no writes).
I see PostgreSQL does implement "Streaming Replication" PostgreSQL Streaming Replication since version 9.0.
The question(s):
Is PostgreSQL Streaming Replication ready for primetime?
Does it fit the use case I describe above?
Should I expect any major problem?
Could you suggest alternative, better solutions?

Yes it is the best solution for your case you should know that
the master database and standby database will be 100% identiques
standby database will not allow to write (read only)
If you have the configuration of master - standby you will not have problems , but if you use master - master configuration , it may cause some problems .

Related

Streaming replication solution in Postgres

I'm reading the article below how to achieve streaming replication in Postgres DB.
https://www.percona.com/blog/2018/09/07/setting-up-streaming-replication-postgresql
Some things are not quite clear
1) Are both DB instances active OR the slave instance is just a clone of master (o it communicates with master, but not the backend?
2) If DB master node failed, what will happen until second node will get back online? Is this covered by default by just having wal sender and wal receiver processes or something else needs to be added?
3) Which DB_HOST:PORT should be configured in the backend app if for example I have two backend nodes (both of them are active)?
If hot_standby = on in postgresql.conf, clients can connect to the standby, but only read data and not modify them. The standby is an identical physical cooy of the primary, just as if you had copied it file by file.
If the primary fails, the standby will remain up and running, but you still can only read data until somebody promotes the standby. You have to understand that PostgreSQL does not ship with cluster software that allows this to happen automatically. You have to usr some other software like Patroni for that.
That depends on the API your software is using. With libpq (the C API) or JDBC you can have a connection string that contains both servers and will select the primary automatically, but with other clients you may have to use external load balancing software.

PostgreSQL Multi master Synchronisation

I have a scenario as follows,
One cloud server is running an application with PGSQL as DB
Multiple local servers are running with same application with PGSQL as DB
User may access the cloud server for read/write data
User may access any of the local server to read/write data
What I need is synchronisation between all these databases. The synchronisation can be done live if connectivity is available, or immediately when connectivity is available.
Please guide me with some inputs, where can i start from.
Rethink your requirements.
Multimaster replication is full of pitfalls, and it is easy to get your databases out of sync unless you plan carefully. You'd probably be better off with a single master node.
That said, you could look at BDR by 2ndQuadrant which provides such functionality.

Postgres replication + incremental backup

I'm using PostgreSQL 9.6.
Is it possible to have replication and incremental backup on the same setup
I would like to have high availability setup. On the main site I will have two servers with replication between them and pgpool will handle the failover in case the primary server goes down.
I would also like to have another remote site for geographical redundancy. This site will be active only if the main site is no longer functioning. The remote site does not need to be updated in real-time. Therefore, if it saves resources I thought about having incremental backup and restore from the main site to the remote site. In other words the main site primary server will replicate its data to the main site secondary server. In addition it will also generate incremental backup and that backup will be restored on the remote site.
From your answer I understood that it is possible to have both replication and incremental backup. However, will this solution be better (resource consumption, reliability etc.) than just have replication to both the main secondary server and the remote site server?
Yes, you can have PITR and streaming replication in use at the same time. Streaming replication can fall back to restoring from the WAL archive if it loses direct connectivity to the master too.
Lots more detail in the manual. It's hard to be more specific with a rather open and vague question - what, exactly, do you mean by "incremental backup"? etc.

Which PostgreSQL replication solution to use for my specific scenario

I need to replicate a PostgreSQL database server as follows:
Two servers are adjacent to each-other - one is the master and the other standby. If the master fails, the standby takes over. Replication from master to slave needs to be failsafe, hence, synchronous. The standby will not be used for any querying unless it has become a master. So, no high-availability/load-balancing is required.
There is another backup server at a remote location. Data from the master server mentioned above will be replicated to this remote server asynchronously and in batches. Time is not a factor at all in this replication - a couple of hours is just fine. This server would be used just for backup.
I've studied the currently available replication solutions from the PostgreSQL docs as well as from Google, but can't decide which combination of synchronous-asynchronous solutions would I need.
The closest I came up with is using pgpool-II for scenario 1 and Mammoth for scenario 2. However, as pgpool is statement-based, what would happen to queries containing rand() and now()?
Please note that I'd rather use free and open-source replication tools.
Also, just a side question - according to scenario 1 above, when the master fails, the standby will take over. Would the master-slave role be reversed after that, or would after the recovery of the master server the slave would go back to its standby state?
Any suggestion would be highly appreciated. Thanks.
I suggest using DRBD for scenario 1 and either 9.0 built-in replication or Slony for scenario 2.
Before PostgreSQL 9.1 (not yet released), there is no other synchronous replication solution available, and DRBD is widely established for this purpose. Together with Pacemaker or Heartbeat, which come with all the scripts needed for PostgreSQL monitoring and switchover, you have a very robust and fairly easy to manage solution. (In fact, I'd consider continuing to use DRBD even after 9.1 comes out; it's just a lot easier and has a longer track record.)
For the cross-site asynchronous, you could try the built-in replication of PostgreSQL 9.0, perhaps in conjunction with repmgr for monitoring and management. Alternatively, you could try the (now a bit) old-school Slony, but I'd guess it will more complicated for your needs.
You didn't mention if the server in question was on a specific version or if this was a new project with the freedom to choose the version. The answers vary based on that information.
If you are starting with a clean slate, I would recommend designing based on the PostgreSQL 9.1 beta. The final version will be released long before you would be ready to go into a production environment and it has binary synchronous replication built-in.
I've been using the built-in asynchronous replication in PostgreSQL for years in almost the exact same scenario you describe and it has always been rock-solid for me. It's become even better with 9.0 with Hot standby and it's become much easier to configure and maintain. 9.1 provides the only missing piece you require.
However, if you are trying to replicate an existing server, built-in asynchronous replication with aggressive settings for "checkpoint_timeout" a very frequent backup of unarchived WAL files could be sufficient until you can upgrade to 9.1.
The bottom line here is that you can get exactly what you want is with stock PostgreSQL 9.1--no third-party products required.
As for failover, it is not an automatic process, you'll need to handle that yourself. I would recommend that after a failover, switching the roles of the two machines until either the next failover event or until a controlled manual failover during a scheduled outage during a slow period of use. Again, this is not automatic and much be managed by the administrator (via shell scripts, presumably).

Load Balancing and Failover for Read-Only PostgreSQL Database

Scenario
Multiple application servers host web services written in Java, running in SpringSource dm Server. To implement a new requirement, they will need to query a read-only PostgreSQL database.
Issue
To support redundancy, at least two PostgreSQL instances will be running. Access to PostgreSQL must be load balanced and must auto-fail over to currently running instances if an instance should go down. Auto-discovery of newly running instances is desirable but not required.
Research
I have reviewed the official PostgreSQL documentation on this issue. However, that focuses on the more general case of read/write access to the database. Top google results tend to lead to older newsgroup messages or dead projects such as Sequoia or DB Balancer, as well as one active project PG Pool II
Question
What are your real-world experiences with PG Pool II? What other simple and reliable alternatives are available?
PostgreSQL's wiki also lists clustering solutions, and the page on Replication, Clustering, and Connection Pooling has a table showing which solutions are suitable for load balancing.
I'm looking forward to PostgreSQL 9.0's combination of Hot Standby and Streaming Replication.
Have you looked at SQL Relay?
The standard solution for something like this is to look at Slony, Londiste or Bucardo. They all provide async replication to many slaves, where the slaves are read-only.
You then implement the load-balancing independent of this - on the TCP layer with something like HAProxy. Such a solution will be able to do failover of the read connections (though you'll still loose transaction visibility on a failover, and have to start new transaction on the new slave - but that's fine for most people)
Then all you have left is failover of the master role. There are supported ways of doing it on all these systems. None of them are automatic by default (because automatic failover of a database master role is really dangerous - consider the situation you are in once you've got split brain), but they can be automated easily if the requirement needs this for the master as well.