Is there any way to know if a CouchDB database is the source of a continuous pull replication?

For my example, let's say we have two servers. Server A creates a continuous pull replication with a local database on Server A. The source of this pull replication is a database on Server B.
I know that Server A can monitor the status of the replication either through the _replicator database (if the replication was created that way) or by querying _active_tasks. However, is there any way for Server B to know that it is the source of a continuous pull replication, other than by monitoring the GET requests?
Even that is not possible for us: we are using Cloudant as Server B, so monitoring through a proxy is not an option. So if a database on Cloudant is part of a replication that was not created on the Cloudant server, there is absolutely no way to know it, since the replication won't show up in Cloudant's _active_tasks. Am I correct?
EDIT: I contacted Samantha Scharr from Cloudant Support, and she said that "making logs available to our clients is a concern that we are working on". This will be much less of a problem once that is done.
Thank you,
Paul

There is no such way. To CouchDB, a replication process is nothing special to keep track of.
Say you have three instances: A, B and C. CouchDB allows you to run a replication process on A that replicates data from B to C. On instance A, the replication will be listed explicitly in _active_tasks, since it runs within a separate Erlang process. But to instances B and C it looks like an ordinary HTTP client calling their public API resources with some payload. They will never know that someone is trying to keep them in sync.
Theoretically, you could write a log parser or a proxy that detects remote replications by analyzing HTTP requests against the Replication Protocol definition.
But I fear you would have to make it smart enough not to produce a lot of false-positive matches for regular clients.
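As a rough illustration of such a log parser, the sketch below flags clients that both read a checkpoint document under /<db>/_local/ and follow a continuous _changes feed, two things the Replication Protocol has a replicator do against its source. The log format, the database name "orders", the client addresses and the function name are all hypothetical, and the feed mode a replicator uses may differ between CouchDB versions, so the feed=continuous check is an assumption too. An application that keeps its own _local document and listens to _changes would still be a false positive, exactly as described above.

```python
import re
from collections import defaultdict

# Hypothetical access-log format: "<client> <METHOD> <path>".
# Real CouchDB/Cloudant log lines look different; adjust the regex.
LOG_LINE = re.compile(r"^(?P<client>\S+) (?P<method>[A-Z]+) (?P<path>\S+)$")


def suspected_replicators(lines, db):
    """Return clients whose requests against `db` look like a replicator
    pulling from it: a read of a checkpoint under /<db>/_local/ plus a
    continuous _changes feed. This is a heuristic, not proof."""
    marks = defaultdict(set)
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m:
            continue  # skip lines in an unexpected format
        client, method, path = m.group("client", "method", "path")
        if method != "GET":
            continue
        if path.startswith(f"/{db}/_local/"):
            marks[client].add("checkpoint")
        elif path.startswith(f"/{db}/_changes") and "feed=continuous" in path:
            marks[client].add("continuous_changes")
    wanted = {"checkpoint", "continuous_changes"}
    return sorted(c for c, seen in marks.items() if wanted <= seen)


logs = [
    "10.0.0.5 GET /orders/_local/3f2a9c",
    "10.0.0.5 GET /orders/_changes?feed=continuous&style=all_docs&since=0",
    "10.0.0.9 GET /orders/_changes?feed=continuous&since=now",
    "10.0.0.9 GET /orders/doc1",
]
print(suspected_replicators(logs, "orders"))  # ['10.0.0.5']
```

Client 10.0.0.9 only listens to the feed, so it is not reported; requiring both signals is what keeps the false-positive rate down.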

Related

MongoDB replication and failover with just two server instances

So I'm in the process of laying out an architecture for a system that I intend to build myself. One of the features of the system should be redundancy, so that Server B can take over in the event that Server A fails.
The problem is that MongoDB supports replication with automatic failover, but not when you only have two MongoDB instances: if one fails, the survivor cannot appoint itself primary, because on its own it does not hold a majority of the replica set's votes.
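The limitation above is majority arithmetic: a member can only become primary with the votes of a strict majority of the voting members. A minimal worked check (the function names are illustrative, not a MongoDB API), including the common workaround of adding an arbiter, a member that votes but holds no data:

```python
def majority(voting_members: int) -> int:
    """Smallest number of votes that is a strict majority."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members: int, reachable_voters: int) -> bool:
    """True if the members that can still talk to each other hold a majority."""
    return reachable_voters >= majority(voting_members)


# Two data-bearing members, one of them fails: 1 < 2, no primary.
print(can_elect_primary(voting_members=2, reachable_voters=1))  # False
# Same two members plus an arbiter (votes, holds no data): 2 >= 2.
print(can_elect_primary(voting_members=3, reachable_voters=2))  # True
```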
As I see it, I therefore have 2 options:
Have a small service that listens to the MongoDB change stream on Server A and synchronizes Server B on every change event
Use replication and accept that automatic failover is not supported (and write a failover script that appoints a primary according to my own rules)
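For option 1, here is a sketch of the "apply each change event to Server B" step, assuming events shaped like MongoDB change stream documents (operationType, documentKey, fullDocument, updateDescription). Server B is simulated with a plain dict keyed by _id; a real service would iterate over something like PyMongo's collection.watch() and write to Server B through the driver. Note that change streams are only available on replica sets and sharded clusters, so Server A would itself have to run as a (possibly single-member) replica set.

```python
from typing import Any, Dict

Doc = Dict[str, Any]


def apply_change(target: Dict[Any, Doc], event: Doc) -> None:
    """Apply one change-stream-style event to `target`, an in-memory
    stand-in for the collection on Server B, keyed by _id."""
    op = event["operationType"]
    key = event["documentKey"]["_id"]
    if op in ("insert", "replace"):
        target[key] = dict(event["fullDocument"])
    elif op == "update":
        desc = event["updateDescription"]
        doc = target.setdefault(key, {"_id": key})
        # Simplification: only top-level field names, not dotted paths.
        doc.update(desc.get("updatedFields", {}))
        for field in desc.get("removedFields", []):
            doc.pop(field, None)
    elif op == "delete":
        target.pop(key, None)
    # Other operation types (drop, rename, invalidate, ...) are ignored here.


server_b = {}
for event in [
    {"operationType": "insert", "documentKey": {"_id": 1},
     "fullDocument": {"_id": 1, "name": "a", "n": 1}},
    {"operationType": "update", "documentKey": {"_id": 1},
     "updateDescription": {"updatedFields": {"n": 2}, "removedFields": ["name"]}},
]:
    apply_change(server_b, event)
print(server_b)  # {1: {'_id': 1, 'n': 2}}
```

Caveats show up even in this sketch: dotted update paths need extra handling, and events that occur while the service is down are missed unless it persists the resume token and reopens the stream with resumeAfter.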
I don't have much practical experience with MongoDB, which is why I would like to hear from you:
whether my two solutions are feasible
what caveats I may run into

PostgreSQL Multi-Master Synchronisation

I have a scenario as follows,
One cloud server is running an application with PGSQL as DB
Multiple local servers are running with same application with PGSQL as DB
User may access the cloud server for read/write data
User may access any of the local server to read/write data
What I need is synchronisation between all these databases. The synchronisation should happen live when connectivity is available, or as soon as connectivity is restored.
Please give me some pointers on where I can start.
Rethink your requirements.
Multimaster replication is full of pitfalls, and it is easy to get your databases out of sync unless you plan carefully. You'd probably be better off with a single master node.
That said, you could look at BDR by 2ndQuadrant which provides such functionality.
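To make the "easy to get out of sync" pitfall concrete, here is a generic illustration (not a description of how BDR resolves conflicts) of two masters updating the same row while disconnected, with a simple last-write-wins rule applied on sync. Both nodes end up agreeing, but one write is silently lost. The row, values and node names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Version:
    value: int
    ts: float   # commit timestamp on the node that made the change
    node: str   # tie-breaker when timestamps are equal


def last_write_wins(a: Version, b: Version) -> Version:
    """Keep the version with the later timestamp (node name breaks ties)."""
    return max(a, b, key=lambda v: (v.ts, v.node))


# Stock for one item starts at 10 on every node.
cloud = Version(value=9, ts=100.0, node="cloud")    # sold 1 while online
local = Version(value=8, ts=101.0, node="local-1")  # sold 2 while offline
merged = last_write_wins(cloud, local)
print(merged.value)  # 8, although 10 - 1 - 2 = 7
```

Avoiding this requires application-level conflict handling (for example recording sales as separate rows instead of overwriting a counter), which is exactly the careful planning the answer refers to.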

Will "PostgreSQL Streaming Replication" fit this use case?

I am designing an application for public organizations.
The purpose is to record data (text and video streams) which will be produced in "local" offices, where connectivity is not guaranteed and power will be available only while meetings are taking place.
One of the requirements of the project is the "locality" of the data storage, since the data is considered "sensitive" and "important".
A second requirement of the project is to publish a portion of the data produced during the meetings to a web server.
The database server shall be PostgreSQL.
I plan to set up a second PostgreSQL database server on the web infrastructure hosting the web server, and synchronize it with the "local" database.
The "public" database will be accessed only by "select" queries (no writes).
I see that PostgreSQL has implemented "Streaming Replication" since version 9.0.
The question(s):
Is PostgreSQL Streaming Replication ready for primetime?
Does it fit the use case I describe above?
Should I expect any major problem?
Could you suggest alternative, better solutions?
Yes, it is the best solution for your case. You should know that:
the master and standby databases will be identical copies (with asynchronous replication, the default, the standby may lag slightly behind the master)
the standby database will not accept writes (it is read-only)
With a master-standby configuration you should not have problems, but a master-master configuration may cause some.
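A toy model (not PostgreSQL's actual implementation) of why a standby fits the intermittent-connectivity scenario: the primary records every write in an ordered log, and the read-only standby replays whatever it has not yet applied whenever the link comes back. In real PostgreSQL the primary must still hold the WAL the standby has not received (see wal_keep_segments, later wal_keep_size, or replication slots), otherwise the standby has to be rebuilt. Also, because a physical standby copies everything, the whole local database, not only the published portion, would end up on the web server. The class names and keys are hypothetical.

```python
class Primary:
    """Accepts writes and records each one in an ordered log (a stand-in for the WAL)."""

    def __init__(self):
        self.log = []
        self.data = {}

    def write(self, key, value):
        self.log.append((key, value))
        self.data[key] = value


class Standby:
    """Read-only copy that replays the primary's log in order."""

    def __init__(self):
        self.applied = 0  # number of log records already replayed
        self.data = {}

    def catch_up(self, primary):
        # After a disconnect, replay everything not yet applied, in order.
        for key, value in primary.log[self.applied:]:
            self.data[key] = value
        self.applied = len(primary.log)

    def write(self, key, value):
        raise PermissionError("standby is read-only")


local, public = Primary(), Standby()
local.write("meeting-1", "minutes v1")
public.catch_up(local)                # link up: standby is current
local.write("meeting-1", "minutes v2")
local.write("meeting-2", "agenda")    # link down: standby falls behind
public.catch_up(local)                # link back: replays the two missed records
print(public.data == local.data)      # True
```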

Can I keep two mongo databases synced?

I have an app that can run in offline mode. If offline it uses a local mongo database, if it has a data connection it will use a remote mongo database.
Is there an easy way to sync these two databases and make sure they both have the union of their collections and documents?
EDIT: Effectively there are two databases, each of which may receive insertions and deletions that are not happening on the other. At fixed points in time I would like both databases to show the union of the two.
For example, over a period of time:
DB1.insert(A)
DB1.insert(B)
DB2.insert(C)
DB1.remove(A)
RUN SYNC
DB1 = DB2 = {B, C}
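A plain union cannot express deletions: if A had already been copied to DB2 at an earlier sync, the union would bring it back. A minimal sketch of the desired behaviour, assuming each side records tombstones (ids removed since the last sync) and that documents can be reduced to ids; the class and function names are hypothetical:

```python
class SyncedSet:
    """A database reduced to a set of document ids, plus tombstones for
    ids removed since the last sync."""

    def __init__(self):
        self.docs = set()
        self.tombstones = set()

    def insert(self, doc):
        self.docs.add(doc)
        self.tombstones.discard(doc)

    def remove(self, doc):
        self.docs.discard(doc)
        self.tombstones.add(doc)


def sync(db1, db2):
    # Union of what both sides hold, minus anything either side deleted.
    merged = (db1.docs | db2.docs) - (db1.tombstones | db2.tombstones)
    for db in (db1, db2):
        db.docs = set(merged)
        db.tombstones.clear()
    return merged


DB1, DB2 = SyncedSet(), SyncedSet()
DB1.insert("A")
DB1.insert("B")
DB2.insert("C")
DB1.remove("A")
print(sorted(sync(DB1, DB2)))  # ['B', 'C']
```

In this design a deletion on one side wins over a re-insert of the same id on the other side between two syncs; a different rule would need per-document timestamps.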
EDIT2: I've been doing some reading. It's not the intended purpose, but could the local databases be set up as secondary members of the remote replica set and used that way? The problem is that I think replica set hosts must be reachable via resolvable DNS names, and I'm not sure how the remote could reach the local host.
You could use a replica set, but MongoDB doesn't support master-master replication. Let's assume you have a setup like this:
two nodes with priority 1, which will be used as remote servers
a single arbiter to ensure a majority if one of the remotes dies
5 local DBs with priority set to 0
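The topology above can be written as a replica set configuration document of the kind passed to rs.initiate() in the mongo shell. The sketch builds it in Python; the host names and the set name "rs0" are hypothetical. One assumption is added: the local members get votes: 0. Only then do the arbiter and one remote form a majority (2 of 3 voters) as described; if the five local members kept their default vote there would be 8 voters, a majority of 5, so the remote side alone (3 votes) could not elect a primary while the apps are offline, and 8 voters would also exceed MongoDB's limit of 7 voting members.

```python
def build_config(remote_hosts, arbiter_host, local_hosts, name="rs0"):
    """Build a replica-set config document for the topology above."""
    members = []
    for host in remote_hosts:
        members.append({"_id": len(members), "host": host, "priority": 1})
    members.append({"_id": len(members), "host": arbiter_host, "arbiterOnly": True})
    for host in local_hosts:
        # priority 0: never becomes primary.
        # votes 0 (assumption): going offline does not cost the set its majority.
        members.append({"_id": len(members), "host": host, "priority": 0, "votes": 0})
    return {"_id": name, "members": members}


def voting_members(config):
    # Members vote by default unless "votes" is set to 0.
    return sum(m.get("votes", 1) for m in config["members"])


cfg = build_config(
    ["remote1.example.com:27017", "remote2.example.com:27017"],
    "arbiter.example.com:27017",
    [f"local{i}.example.com:27017" for i in range(1, 6)],
)
voters = voting_members(cfg)
print(len(cfg["members"]), voters, voters // 2 + 1)  # 8 3 2
```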
When your application goes offline, its local member will stay secondary, so you won't be able to perform writes. When you go online it will sync changes from the remote DBs, but you still need some way of syncing local changes. One way of dealing with this could be a local fallback DB that is used for writes while you are offline. When you go online, you push all new records to the master. Dealing with updates is a little trickier, but doable.
Another problem is that it won't scale if you need to add more applications. If I remember correctly, there is a limit of 12 members per replica set (in MongoDB 3.0 and later the limit is 50 members, of which at most 7 can vote). For a small cluster, DNS resolution could be solved by using SSH tunnels.
Another way of dealing with the problem could be a small RESTful service plus document timestamps. Whenever the app is online, it can periodically push local inserts to the remote DB and pull data from it.
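A sketch of the timestamp-based approach, assuming every document carries an updated_at value and the app remembers two watermarks: how far it has pushed and how far it has pulled. Plain dicts stand in for the local DB and for the hypothetical REST service; conflicts are resolved by the newer timestamp, which assumes the clocks involved roughly agree, and deletions would additionally need tombstones.

```python
from typing import Dict, Tuple

# doc id -> (value, updated_at timestamp)
Store = Dict[str, Tuple[str, float]]


def changed_since(store: Store, since: float) -> Store:
    return {k: v for k, v in store.items() if v[1] > since}


def merge_into(store: Store, incoming: Store) -> None:
    # Keep whichever version carries the newer timestamp.
    for key, (value, ts) in incoming.items():
        if key not in store or store[key][1] < ts:
            store[key] = (value, ts)


def sync_once(local: Store, remote: Store, pushed_until: float, pulled_until: float):
    """Push local changes newer than pushed_until, pull remote changes newer
    than pulled_until, and return the advanced (pushed_until, pulled_until)."""
    push = changed_since(local, pushed_until)
    pull = changed_since(remote, pulled_until)
    merge_into(remote, push)   # in practice: send to the hypothetical REST service
    merge_into(local, pull)    # in practice: fetch "changes since" from it
    new_pushed = max([pushed_until] + [ts for _, ts in push.values()])
    new_pulled = max([pulled_until] + [ts for _, ts in pull.values()])
    return new_pushed, new_pulled


local = {"a": ("written offline", 5.0)}
remote = {"b": ("written on server", 3.0)}
marks = sync_once(local, remote, 0.0, 0.0)
print(local == remote, marks)  # True (5.0, 3.0)
```

Keeping separate watermarks for each direction means a pull is never skipped just because local writes carried later timestamps.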

Load Balancing and Failover for Read-Only PostgreSQL Database

Scenario
Multiple application servers host web services written in Java, running in SpringSource dm Server. To implement a new requirement, they will need to query a read-only PostgreSQL database.
Issue
To support redundancy, at least two PostgreSQL instances will be running. Access to PostgreSQL must be load balanced and must auto-fail over to currently running instances if an instance should go down. Auto-discovery of newly running instances is desirable but not required.
Research
I have reviewed the official PostgreSQL documentation on this issue. However, it focuses on the more general case of read/write access to the database. Top Google results tend to lead to older newsgroup messages or dead projects such as Sequoia or DB Balancer, as well as one active project, PG Pool II.
Question
What are your real-world experiences with PG Pool II? What other simple and reliable alternatives are available?
PostgreSQL's wiki also lists clustering solutions, and the page on Replication, Clustering, and Connection Pooling has a table showing which solutions are suitable for load balancing.
I'm looking forward to PostgreSQL 9.0's combination of Hot Standby and Streaming Replication.
Have you looked at SQL Relay?
The standard solution for something like this is to look at Slony, Londiste or Bucardo. They all provide async replication to many slaves, where the slaves are read-only.
You then implement the load balancing independently of this, on the TCP layer, with something like HAProxy. Such a solution will be able to fail over the read connections (though you'll still lose transaction visibility on a failover and have to start a new transaction on the new slave, but that's fine for most people).
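The selection logic such a TCP balancer applies to read replicas can be sketched as round-robin over the backends that are currently marked healthy. This is a toy model, not HAProxy itself; in a real deployment the health state comes from periodic health checks, and the backend addresses below are hypothetical.

```python
from itertools import cycle


class ReadBalancer:
    """Round-robin over read-only replicas, skipping ones marked down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._ring = cycle(self.backends)

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        self.healthy.add(backend)

    def pick(self):
        # Try each backend at most once per call.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy read replica available")


lb = ReadBalancer(["pg-ro-1:5432", "pg-ro-2:5432"])
print([lb.pick() for _ in range(4)])  # alternates between the two replicas
lb.mark_down("pg-ro-1:5432")          # e.g. a health check failed
print([lb.pick() for _ in range(3)])  # only pg-ro-2:5432 is handed out
```

Only new connections are routed this way; a transaction that was open on the failed replica is lost, which matches the caveat above.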
Then all you have left is failover of the master role. There are supported ways of doing it on all these systems. None of them are automatic by default (because automatic failover of a database master role is really dangerous: consider the situation you are in once you've got split brain), but they can easily be automated if the requirement applies to the master as well.
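Why naive automatic failover leads to split brain, and the majority rule that systems such as MongoDB replica sets use to avoid it, can be shown with two small decision functions. This is an illustration under simplified assumptions, not any particular tool's logic; a real setup must also make the old master stop accepting writes (fencing), which is not modeled here.

```python
def naive_should_promote(sees_master: bool) -> bool:
    # Promote as soon as the master looks unreachable from here.
    return not sees_master


def quorum_should_promote(sees_master: bool, visible_nodes: int, cluster_size: int) -> bool:
    # Promote only if the master looks down AND this node (counting itself)
    # can see a strict majority of the cluster.
    return (not sees_master) and visible_nodes > cluster_size // 2


# Two nodes, the link between them breaks; the master is still up and taking writes.
print(naive_should_promote(sees_master=False))  # True -> two masters: split brain
print(quorum_should_promote(False, visible_nodes=1, cluster_size=2))  # False
# Three nodes, the master is cut off alone; the standby sees itself and one peer.
print(quorum_should_promote(False, visible_nodes=2, cluster_size=3))  # True
```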