PostgreSQL Multi-master Synchronisation

I have a scenario as follows:
One cloud server is running an application with PostgreSQL as its DB.
Multiple local servers are running the same application with PostgreSQL as their DB.
Users may access the cloud server to read/write data.
Users may access any of the local servers to read/write data.
What I need is synchronisation between all these databases. The synchronisation can be done live if connectivity is available, or immediately after connectivity is restored.
Please guide me with some inputs on where I can start.

Rethink your requirements.
Multi-master replication is full of pitfalls, and it is easy to get your databases out of sync unless you plan carefully. You'd probably be better off with a single master node.
That said, you could look at BDR by 2ndQuadrant, which provides such functionality.

Related

real-time sync between local Postgres instance and Azure Cloud Postgres instance

I need to set up a real-time sync process between an on-premise PostgreSQL instance and a cloud PostgreSQL instance. Please let me know what options are available through which I can achieve it.
Do I have to use any specific tool, or can it be managed through replication?
Please advise.
Use PgPool
http://www.pgpool.net/mediawiki/index.php/Main_Page
from their web page:
pgpool-II can manage multiple PostgreSQL servers. Using the replication function enables creating a realtime backup on 2 or more physical disks, so that the service can continue without stopping servers in case of a disk failure.
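To give a feel for how this looks from the application side, here is a minimal Python sketch (using psycopg2) based on the assumption that pgpool-II is running on a hypothetical host and listening on its default port 9999 in front of the on-premise and cloud PostgreSQL servers. The host name, database name and credentials are placeholders; the point is that pgpool speaks the ordinary PostgreSQL protocol, so the client code does not change.

```python
# Minimal sketch, assuming pgpool-II runs at pgpool.example.org:9999 (its
# default port) in front of the backend PostgreSQL servers. All names and
# credentials below are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="pgpool.example.org",  # hypothetical host running pgpool-II
    port=9999,                  # pgpool-II's default listening port
    dbname="appdb",
    user="appuser",
    password="secret",
)
with conn, conn.cursor() as cur:
    # To the client this is just a PostgreSQL server; pgpool forwards the
    # statements to its backends according to the configured mode.
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])
conn.close()
```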

Will "PostgreSQL Streaming Replication" fit this use case?

I am designing an application for public organizations.
The purpose is to record data (text and video streams) produced in "local" offices, where connectivity is not guaranteed and where power will be available only while meetings are taking place.
One of the requirements of the project is the "locality" of the data storage, since the data is considered "sensitive" and "important".
A second requirement of the project is to publish a portion of the data produced during the meetings to a web server.
The database server shall be PostgreSQL.
I plan to set up a second PostgreSQL database server on the web infrastructure hosting the web server, and synchronize it with the "local" database.
The "public" database will be accessed only by *selection queryes" (no writes).
I see PostgreSQL does implement "Streaming Replication" PostgreSQL Streaming Replication since version 9.0.
The question(s):
Is PostgreSQL Streaming Replication ready for primetime?
Does it fit the use case I describe above?
Should I expect any major problem?
Could you suggest alternative, better solutions?
Yes, it is the best solution for your case. You should know that:
the master database and the standby database will be 100% identical;
the standby database will not allow writes (it is read-only).
With a master-standby configuration you will not have problems, but a master-master configuration may cause some problems.
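As a rough illustration of these two points, here is a minimal Python sketch (using psycopg2, with hypothetical host names and credentials) that checks a master-standby pair on a reasonably recent PostgreSQL: it lists the standbys streaming from the primary via pg_stat_replication, and confirms the standby is in read-only recovery via pg_is_in_recovery().

```python
# Minimal sketch: verify streaming replication status on a master-standby pair.
# Host names, credentials and database names are placeholders.
import psycopg2

PRIMARY_DSN = "host=primary.example.org dbname=meetings user=monitor password=secret"
STANDBY_DSN = "host=standby.example.org dbname=meetings user=monitor password=secret"

def check_primary():
    # pg_stat_replication lists the standbys currently streaming from this server.
    with psycopg2.connect(PRIMARY_DSN) as conn, conn.cursor() as cur:
        cur.execute("SELECT client_addr, state FROM pg_stat_replication;")
        for client_addr, state in cur.fetchall():
            print(f"standby {client_addr}: {state}")

def check_standby():
    # pg_is_in_recovery() returns true on a hot standby, which only accepts reads.
    with psycopg2.connect(STANDBY_DSN) as conn, conn.cursor() as cur:
        cur.execute("SELECT pg_is_in_recovery();")
        print("standby in recovery (read-only):", cur.fetchone()[0])

if __name__ == "__main__":
    check_primary()
    check_standby()
```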

Is there any way to know if a CouchDB database is the source of a pull continuous replication?

For my example, let's say we have two servers. Server A creates a continuous pull replication with a local database on Server A. The source of this pull replication is a database on Server B.
I know that Server A can monitor the status of the replication either by the _replicator database if it was created that way or by querying _active_tasks. Nevertheless, is there any way for Server B to know that it is the source of a continuous pull replication, except by monitoring the GET requests?
Even then, since we are using Cloudant as our Server B, monitoring through a proxy is not an option. So if a database on Cloudant is part of a replication that was not created on the Cloudant server, there is absolutely no way to know it, since it won't show up in Cloudant's _active_tasks. Am I correct?
EDIT: I communicated with Samantha Scharr from Cloudant Support, and she said that "making logs available to our clients is a concern that we are working on". This will not be such a problem once that is done.
Thank you,
Paul
There is no such way. For CouchDB, a replication process is not something special to track.
Say you have three instances: A, B and C. CouchDB allows you to run a replication process on A that replicates data from B to C. On instance A the replication will be explicitly listed in _active_tasks, since replication runs as a separate Erlang process. But to instances B and C it just looks as if some HTTP client is calling their public API resources with some payload. They will never know that someone is trying to keep them in sync.
Theoretically, you could write a log parser or a proxy that detects remote replications by analyzing HTTP requests based on the Replication protocol definition.
But I fear you would have to make it smart enough to avoid a lot of false-positive matches for regular clients.
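Here is a small Python sketch of that asymmetry, with placeholder URLs and credentials: it defines a continuous pull replication in Server A's _replicator database and then inspects A's _active_tasks. Server B never gets an equivalent entry; it only sees ordinary HTTP traffic against its public API.

```python
# Minimal sketch of the asymmetry described above: the replication document
# lives only on Server A; Server B just receives ordinary HTTP requests.
# URLs and credentials are placeholders.
import requests

SERVER_A = "http://admin:secret@server-a.example.org:5984"
SOURCE_ON_B = "https://user:secret@server-b.example.org:5984/source_db"

# Define a continuous pull replication on Server A.
doc = {
    "_id": "pull_from_b",
    "source": SOURCE_ON_B,   # remote database on Server B
    "target": "local_copy",  # local database on Server A
    "continuous": True,
}
requests.put(f"{SERVER_A}/_replicator/pull_from_b", json=doc).raise_for_status()

# Server A can see the running replication in its _active_tasks...
tasks = requests.get(f"{SERVER_A}/_active_tasks").json()
print([t for t in tasks if t.get("type") == "replication"])

# ...but Server B has no corresponding task; it only sees GET/POST requests
# against its public API, exactly as the answer explains.
```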

Aws app with Heroku Postgres database

Is it possible to have an app running on AWS EC2 and have its database running on Heroku Postgres?
If it is, what are the downsides I should consider?
Since Heroku is hosted on AWS, is there a way to know the location of the machine running my database?
Would hosting my app in the same region as the database help to keep the performance up?
I would like to hear some opinions about this; I've been searching the topic without much success.
You can determine the public-facing location of your Heroku DB at any given time with a traceroute ... but there's no guarantee that it'll stay at that location, or that there isn't any internal re-routing going on. You'd probably want to speak directly with Heroku support about ways to make sure your Heroku DB instances are local to your AWS application instances, as that certainly would benefit performance. See if you can find out which availability zone, or at least which major region, they run the DB in, and whether you can "pin" your database instance to a given region/zone.
Amazon's RDS looks OK, but doesn't support PostgreSQL. Please keep nagging them to.
I'd probably just run the DB on AWS if performance wasn't particularly important. Use a RAID 10 of provisioned-IOPS EBS volumes on an EBS-optimized instance and you'll get kind-of-OK performance (but at a really big price); alternately, you can use non-crash-safe SSD-based instance-store servers and rely on replication and backups to keep your data safe.
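If you want a quick sense of how much the network path between your EC2 app and the Heroku database costs you, a rough Python sketch like the following can help: it times a trivial query over a number of round trips. The DSN is a placeholder; in practice it would come from the DATABASE_URL your Heroku Postgres add-on exposes.

```python
# Rough sketch: estimate round-trip latency from the app host to the database
# by timing a trivial query. The DSN below is a placeholder.
import time
import psycopg2

DSN = "host=db.example.com port=5432 dbname=mydb user=myuser password=secret"

conn = psycopg2.connect(DSN)
cur = conn.cursor()

samples = []
for _ in range(20):
    start = time.perf_counter()
    cur.execute("SELECT 1;")
    cur.fetchone()
    samples.append((time.perf_counter() - start) * 1000.0)

print(f"median round trip: {sorted(samples)[len(samples) // 2]:.1f} ms")
conn.close()
```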
I don't have any experience with Heroku PostgreSQL.
Generally, of course, you can run your own service on Amazon EC2 and use the managed database services of Heroku.
Downsides might be:
nobody guarantees that Heroku exclusively uses AWS, and you probably can't determine the physical Heroku service location within the cloud, so you will have to deal with network latencies
in addition to your external traffic fees, you'll have to pay for the database traffic unless you talk to a server in the same availability zone in the same region
My suggestion (without knowing any details about the pros of Heroku):
Have a look at Amazon RDS if you don't want to run a database server on your own.
http://aws.amazon.com/de/rds/
I have been operating around 70 server instances on AWS, both RDS and EC2, for more than a year now, and I can't imagine any simpler way to keep your stuff running.

Load Balancing and Failover for Read-Only PostgreSQL Database

Scenario
Multiple application servers host web services written in Java, running in SpringSource dm Server. To implement a new requirement, they will need to query a read-only PostgreSQL database.
Issue
To support redundancy, at least two PostgreSQL instances will be running. Access to PostgreSQL must be load balanced and must auto-fail over to currently running instances if an instance should go down. Auto-discovery of newly running instances is desirable but not required.
Research
I have reviewed the official PostgreSQL documentation on this issue. However, that focuses on the more general case of read/write access to the database. Top Google results tend to lead to older newsgroup messages or dead projects such as Sequoia or DB Balancer, as well as one active project, PG Pool II.
Question
What are your real-world experiences with PG Pool II? What other simple and reliable alternatives are available?
PostgreSQL's wiki also lists clustering solutions, and the page on Replication, Clustering, and Connection Pooling has a table showing which solutions are suitable for load balancing.
I'm looking forward to PostgreSQL 9.0's combination of Hot Standby and Streaming Replication.
Have you looked at SQL Relay?
The standard solution for something like this is to look at Slony, Londiste or Bucardo. They all provide async replication to many slaves, where the slaves are read-only.
You then implement the load balancing independently of this - on the TCP layer with something like HAProxy. Such a solution will be able to fail over the read connections (though you'll still lose transaction visibility on a failover, and have to start a new transaction on the new slave - but that's fine for most people).
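On the application side, that consequence is easy to handle: when the balancer fails over, the current connection drops and the read has to be retried on a fresh connection (and therefore a fresh transaction) against whichever slave the balancer now picks. A minimal Python sketch, assuming a hypothetical HAProxy front end at pg-read.example.org:5000 and placeholder credentials:

```python
# Minimal sketch: retry a read-only query through a TCP load balancer after a
# failover. Host, port, database and credentials are placeholders.
import psycopg2
from psycopg2 import OperationalError

BALANCER_DSN = "host=pg-read.example.org port=5000 dbname=app user=reader password=secret"

def run_read_query(sql, params=None, retries=3):
    last_error = None
    for _ in range(retries):
        conn = None
        try:
            # A new connection means a new transaction on whichever slave the
            # balancer routes us to; the previous snapshot's visibility is gone.
            conn = psycopg2.connect(BALANCER_DSN)
            with conn.cursor() as cur:
                cur.execute(sql, params)
                return cur.fetchall()
        except OperationalError as exc:  # connection dropped, e.g. during failover
            last_error = exc
        finally:
            if conn is not None:
                conn.close()
    raise last_error

print(run_read_query("SELECT count(*) FROM orders;"))
```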
Then all you have left is failover of the master role. There are supported ways of doing it in all these systems. None of them is automatic by default (because automatic failover of a database master role is really dangerous - consider the situation you are in once you've got split brain), but they can be automated easily if the requirements call for this for the master as well.