I currently have a MongoDB replica set consisting of 1 primary and 2 secondaries, used by a read-only application. I'm adding a 2nd read-only application that requires access to the same data. I am considering using the same RS for both applications, but was wondering if there's a way to create a specific type of configuration with Mongo that works something like this:
1 primary that handles all writes, but is not seen as part of a replica set by the applications, and 2 sets of read-only secondaries that each replicate writes from that primary. Conceptually, something like:
/----> RS1: |Secondary1|Secondary2|..|SecondaryN| <--- App1
PRIMARY|=>
\----> RS2: |Secondary1|Secondary2|..|SecondaryN| <--- App2
Is this sort of configuration possible at all? What similar architectures could I consider for this use-case?
Thanks in advance.
Brett
I came across a way to implement this using mongo tooling:
Create a replica set to use as a master. Data updates are written to this RS (represented by "PRIMARY" in the diagram). Do not enable authentication on this host.
Create 2 replica sets with the same data (completely independent of each other).
Schedule regular mongooplog runs, using the RS from #1 as the --from source and each of the RSs from #2 as the --host target (see the manual); a sketch follows below.
Authentication can be set up on the RSs from #2 so that applications only get read access to the data.
I haven't tested this yet, but from what I can tell, this approach should work for my objectives - is there anything I've overlooked?
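For step 3, something like the following cron-driven Node.js script could run the replay; this is an untested sketch and all hostnames are placeholders:

const { execSync } = require('child_process');

const from = 'master.example.com:27017';      // the RS from step 1
const targets = [
  'rs1-a.example.com:27017',                  // a member of each RS from step 2
  'rs2-a.example.com:27017',
];

// Replay recent oplog entries from the master RS onto each read-only RS.
for (const host of targets) {
  execSync(`mongooplog --from ${from} --host ${host}`, { stdio: 'inherit' });
}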
Edit:
While trying to use this approach, I ran into issues when using mongooplog with authentication on the destination. Over and above that, mongooplog doesn't cater for authentication on the source (--from) RS, so I wrote a tool to cater for this:
https://github.com/brettcave/mongo-oplogreplay
It supports authentication on both source and destination, as well as replicaset targets.
I have a request asking for a read-only schema replica for a role in PostgreSQL. After reading the documentation and better understanding replication in PostgreSQL, I'm trying to identify whether or not I can create the publisher and subscriber within the same database.
Any thoughts on the best approach without having a second server would be greatly appreciated.
You asked two different questions. Same database? No. Since pub/sub requires tables to have the same name (including schema) on both ends, you would be trying to replicate a table onto itself. Using logical replication plugins other than the built-in one might get around this restriction.
Same server? Yes. You can replicate between two databases of the same instance (but see the note in the docs about some extra hoops you need to jump through) or between two instances on the same host. So whichever of those things you meant by "same server", yes, you can.
But it seems like an odd way to do this. If the access is read only, why does it matter whether it is to a replica of the real data or to the real data itself?
I need to create a replica of an existing database that would copy any changing operation from master to slave, i.e. create a mirror of sorts. I found a lot of examples on the web, but they all describe the process when master and slave are on different servers.
I would like to create the replica on the same server where the master is located, without spinning up a second instance of Postgres.
Is it possible to do so, and could you point me in a direction where I could find a solution for how to do it?
Thank you.
P.S. I understand that replication on 2 servers is better, but I just need to do it on one common server.
If you want physical replication, you will need to run two instances of PostgreSQL. If they are on the same server machine, they will need to have different port numbers. The different port numbers are the only complexity; otherwise it is just like running on two different servers.
If you want logical replication, you can do that within a single instance, but you will need to jump through some hoops to create the subscription intra-instance, as described in the "Notes" section of the CREATE SUBSCRIPTION documentation; a sketch follows below.
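As a hedged illustration of those hoops, the following Node.js sketch (using the pg module) publishes one table from a source database and subscribes to it from a second database in the same local cluster. The database names srcdb and dstdb, the table public.accounts, and the postgres role are made up; it assumes wal_level = logical, superuser rights, passwordless local connections, and that the table already exists with the same definition in the target database:

const { Client } = require('pg');

async function setup() {
  const src = new Client({ host: 'localhost', database: 'srcdb', user: 'postgres' });
  const dst = new Client({ host: 'localhost', database: 'dstdb', user: 'postgres' });
  await src.connect();
  await dst.connect();

  // 1. Publish the table in the source database.
  await src.query('CREATE PUBLICATION mypub FOR TABLE public.accounts');

  // 2. Create the replication slot manually in the source database, because
  //    CREATE SUBSCRIPTION hangs if it has to create a slot on the same
  //    cluster it runs in (see the "Notes" of CREATE SUBSCRIPTION).
  await src.query("SELECT pg_create_logical_replication_slot('mysub', 'pgoutput')");

  // 3. Create the subscription in the target database, reusing that slot.
  await dst.query(
    "CREATE SUBSCRIPTION mysub " +
    "CONNECTION 'host=localhost dbname=srcdb user=postgres' " +
    "PUBLICATION mypub WITH (create_slot = false)"
  );

  await src.end();
  await dst.end();
}

setup().catch(console.error);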
You could consider using a simple trigger to insert/update/delete data on the other database as soon as the main one gets modified (via dblink or postgres_fdw, since a trigger cannot write to another database directly).
A more "professional" way would be to use synchronous replication.
I would like to develop a multi-tenant web application using a PostgreSQL DB, having the data of each tenant in a dedicated schema.
Each query or update will access only a single tenant schema and/or the public schema.
Assuming I will, at some point, need to scale out and have several PostgreSQL servers, is there some automatic way in which I can connect to a single load balancer of some sort, that will redirect the queries/updates to the relevant server, based on the required schema?
The challenging part of this question is the 'automatic way'. I have a feeling that Postgres is moving that way; maybe 9.5 or later will have multi-master tendencies, with partitioning allowing data to be spread across a cluster so that your frontend doesn't have to change.
Assuming that your tenants can operate in separate databases, and you are looking for a way to operate a query in the correct database, perhaps something like DNS could be used during your connection to the database, using the tenant ID as a component in the DNS host. Something like:
tenant_1.example.com -> 192.168.0.10
tenant_2.example.com -> 192.168.0.11
tenant_3.example.com -> 192.168.0.11
etc.example.com -> 192.168.0.X
Then you could use the connection as a map to the correct DB installation (see the sketch below). The tricky part here is the overlapping data that all tenants would need access to. If that overlapping data needs to be joined against, it will have to exist in all databases, either copied or accessed via dblink. If the overlapping data needs to be updated, then automatic is going to be tough. Good question.
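A minimal sketch of the idea, assuming hypothetical DNS names of the form tenant_<id>.example.com, a database named app, a role app_user, and a schema per tenant; a real setup would validate the tenant ID and pool connections:

const { Client } = require('pg');

async function queryForTenant(tenantId, sql, params) {
  const client = new Client({
    host: `tenant_${tenantId}.example.com`,  // DNS maps the tenant to a server
    database: 'app',
    user: 'app_user',
  });
  await client.connect();
  try {
    // Per-tenant data lives in a schema named after the tenant, plus public.
    await client.query(`SET search_path TO tenant_${tenantId}, public`);
    return await client.query(sql, params);
  } finally {
    await client.end();
  }
}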
I have to set up a database that can handle failover (if one crashes, the other takes over). For that, I decided to use MongoDB:
I set up a replica set with two instances. Each instance is running on a separate VM. I have several questions:
It is recommended to use at least 3 instances in a replica set. Is it ok to use only two?
I have two instances, and therefore two IP addresses. Which IP should I give to my application that needs to read/write to the database? When a database is down, how will the requests be redirected to the instance that is still up?
Some help to get started would be great!
It is recommended to use at least 3 instances in a replica set. Is it ok to use only two?
No, the minimum requirement for a replica set is three members (see the docs), but the third could be an arbiter, even though that is not recommended. A sketch of such a configuration follows below.
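For illustration, a minimal mongo shell sketch of a three-member set with two data-bearing members and an arbiter; the hostnames are placeholders:

rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "vm1.example.com:27017" },
    { _id: 1, host: "vm2.example.com:27017" },
    { _id: 2, host: "vm3.example.com:27017", arbiterOnly: true }  // votes, holds no data
  ]
});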
I have two instances, and therefore two IP addresses. Which IP should I give to my application that needs to read/write to the database? When a database is down, how will the requests be redirected to the instance that is still up?
There are two alternatives:
#1 (recommended)
You provide the driver with all the addresses (for more detailed information on how, visit the docs); an example with the Node.js driver follows (it is similar with the other drivers). This way the driver knows all, or at least more than one, of the instances directly, which avoids the problem described in #2 where the single specified instance is down.
var MongoClient = require('mongodb').MongoClient;
// Pass every member as a seed; the driver discovers the rest of the set from any reachable one.
MongoClient.connect('mongodb://[server1],[server2],[...]/[database]?replicaSet=[name]', function(err, db) {
    // err is set if none of the seed members could be reached.
});
#2
You provide the driver with just one of them (probably the primary) and MongoDB will figure out the rest. However, if your app starts up while the specified instance is down, the driver will not be able to find the other instances and therefore cannot connect to MongoDB.
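For comparison, a minimal sketch of this variant with the same driver and placeholders, passing only a single seed address:

var MongoClient = require('mongodb').MongoClient;
// Single seed: discovery only works if this member is reachable at startup.
MongoClient.connect('mongodb://[server1]/[database]?replicaSet=[name]', function(err, db) {
});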
I need some way to push data from client databases to a central database. Basically, there are several instances of MongoDB running on remote machines (clients), and I need some method to periodically update the central Mongo database with the newly added and modified documents from the clients. Each client must replicate its records to the single central server.
E.g.:
If I have 3 Mongo instances running on 3 machines, each holding 10GB of data, then after the data migration the 4th machine's MongoDB must have 30GB of data. The central MongoDB machine must be periodically updated with the data of all those 3 machines. But these 3 machines not only get new documents; existing documents in them may also get updated. I would like the central MongoDB machine to receive these updates as well.
Your desired replication strategy is not formally supported by MongoDB.
A MongoDB replica set consists of a single primary with asynchronous replication to one or more secondary servers in the same replica set. You cannot configure a replica set with multiple primaries or replication to a different replica set.
However, there are a few possible approaches for your use case depending on how actively you want to keep your central server up to date and the volume of data/updates you need to manage.
Some general caveats:
Merging data from multiple standalone servers can create unexpected conflicts. For example, unique indexes would not know about documents created on other servers.
Ideally the data you are consolidating will still be separated by a unique database name per origin server so you don't have strange crosstalk between disparate documents that happen to have the same namespace and _id shared by different origin servers.
Approach #1: use mongodump and mongorestore
If you just need to periodically sync content to your central server, one way to do so is using mongodump and mongorestore. You can schedule a periodic mongodump from each of your standalone instances and use mongorestore to import them into the central server.
Caveats:
There is a --db parameter for mongorestore that allows you to restore into a different database from the original name (if needed)
mongorestore only performs inserts into the existing database (i.e. does not perform updates or upserts). If existing data with the same _id already exists on the target database, mongorestore will not replace it.
You can use mongodump options such as --query to be more selective on data to export (for example, only select recent data rather than all)
If you want to limit the amount of data to dump & restore on each run (for example, only exporting "changed" data), you will need to work out how to handle updates and deletions on the central server.
Given the caveats, the simplest use of this approach would be to do a full dump & restore (i.e. using mongorestore --drop) to ensure all changes are copied.
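As a rough, untested sketch of that simplest variant, the following Node.js script (run from cron, for example) dumps a hypothetical app database from each standalone client and restores it into a per-client database on the central server; all hostnames and names are placeholders:

const { execSync } = require('child_process');

const clients = ['client1.example.com', 'client2.example.com', 'client3.example.com'];

for (const host of clients) {
  const dumpDir = `/data/dumps/${host}`;
  // Dump the "app" database from this client instance.
  execSync(`mongodump --host ${host} --db app --out ${dumpDir}`, { stdio: 'inherit' });
  // Drop and re-import it on the central server under a per-client database name,
  // to keep the origin servers separated (see the caveat above).
  execSync(
    `mongorestore --host central.example.com --drop --db app_${host.split('.')[0]} ${dumpDir}/app`,
    { stdio: 'inherit' }
  );
}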
Approach #2: use a tailable cursor with the MongoDB oplog.
If you need more real-time or incremental replication, a possible approach is creating tailable cursors on the MongoDB replication oplog.
This approach is basically "roll your own replication". You would have to write an application which tails the oplog on each of your MongoDB instances and looks for changes of interest to save to your central server. For example, you may only want to replicate changes for selective namespaces (databases or collections).
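To give an idea of what that entails, here is a hedged Node.js sketch that tails the oplog of one source member and re-applies inserts, updates, and deletes for one namespace to a central server. The hostnames and the app.orders namespace are made up, the last-seen timestamp is not persisted, and the simplistic update handling ignores the newer (4.2+) oplog update format:

const { MongoClient } = require('mongodb');

async function tailOplog() {
  const source = await MongoClient.connect('mongodb://client1.example.com:27017');
  const central = await MongoClient.connect('mongodb://central.example.com:27017');

  const oplog = source.db('local').collection('oplog.rs');
  // Start from the newest entry; a real implementation would persist the last seen ts.
  const [latest] = await oplog.find().sort({ $natural: -1 }).limit(1).toArray();
  let lastTs = latest.ts;

  // Tailable cursor over the capped oplog collection, filtered to one namespace.
  const cursor = oplog.find(
    { ts: { $gt: lastTs }, ns: 'app.orders', op: { $in: ['i', 'u', 'd'] } },
    { tailable: true, awaitData: true, noCursorTimeout: true }
  );

  const target = central.db('app_client1').collection('orders');
  for await (const entry of cursor) {
    if (entry.op === 'i') {
      // Insert: the full document is in entry.o.
      await target.replaceOne({ _id: entry.o._id }, entry.o, { upsert: true });
    } else if (entry.op === 'u') {
      // Update: entry.o2 holds the _id, entry.o a modifier or a replacement doc.
      const isModifier = Object.keys(entry.o).some((k) => k.startsWith('$'));
      if (isModifier) {
        await target.updateOne(entry.o2, entry.o);
      } else {
        await target.replaceOne(entry.o2, entry.o, { upsert: true });
      }
    } else if (entry.op === 'd') {
      // Delete: entry.o holds the _id of the removed document.
      await target.deleteOne(entry.o);
    }
    lastTs = entry.ts;  // a real implementation would persist this
  }
}

tailOplog().catch(console.error);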
A related tool that may be of interest is the experimental Mongo Connector from 10gen labs. This is a Python module that provides an interface for tailing the replication oplog.
Caveats:
You have to implement your own code for this, and learn/understand how to work with the oplog documents
There may be an alternative product which better supports your desired replication model "out of the box".
You should be aware that MongoDB only offers replica sets for replication, and a replica set always means one primary and multiple secondaries. Writes always go to the primary server. Apparently you want multi-master replication, which is not supported by MongoDB, so you may want to look into a different technology like CouchDB or Couchbase. MongoDB is not the right fit here.
There may be a way since MongoDB 3.6 to achieve your goal: Change Streams.
Change streams allow applications to access real-time data changes without the complexity and risk of tailing the oplog. Applications can use change streams to subscribe to all data changes on a single collection, a database, or an entire deployment, and immediately react to them. Because change streams use the aggregation framework, applications can also filter for specific changes or transform the notifications at will.
There are some configuration options that affect whether you can use Change Streams or not, so please read about them.
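As a hedged sketch of how this could look with a recent Node.js driver, the following watches one collection on a client replica set and upserts the changes into a central server. The hostnames, database and collection names are made up, and the source must be running as a replica set for change streams to be available:

const { MongoClient } = require('mongodb');

async function forwardChanges() {
  const source = await MongoClient.connect('mongodb://client1.example.com:27017/?replicaSet=rsClient1');
  const central = await MongoClient.connect('mongodb://central.example.com:27017');

  // Watch one collection; db.watch() or client.watch() widen the scope.
  const changeStream = source.db('app').collection('orders').watch(
    [{ $match: { operationType: { $in: ['insert', 'update', 'replace', 'delete'] } } }],
    { fullDocument: 'updateLookup' }  // deliver the full post-image for updates
  );

  const target = central.db('app_client1').collection('orders');
  for await (const change of changeStream) {
    if (change.operationType === 'delete') {
      await target.deleteOne({ _id: change.documentKey._id });
    } else if (change.fullDocument) {
      // insert / update / replace: upsert the current full document.
      await target.replaceOne({ _id: change.documentKey._id }, change.fullDocument, { upsert: true });
    }
  }
}

forwardChanges().catch(console.error);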
Another option is Delayed Replica Set Members.
Because delayed members are a "rolling backup" or a running "historical" snapshot of the data set, they may help you recover from various kinds of human error. For example, a delayed member can make it possible to recover from unsuccessful application upgrades and operator errors including dropped databases and collections.
Hidden Replica Set Members may be another option to consider.
A hidden member maintains a copy of the primary's data set but is invisible to client applications. Hidden members are good for workloads with different usage patterns from the other members in the replica set.
Another option may be to configure a Priority 0 Replica Set Member.
A priority 0 member cannot become primary and cannot trigger elections; otherwise it maintains a copy of the data set and functions like a normal secondary.
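For reference, a minimal mongo shell sketch of how these member options are applied; the member index (2) and the one-hour delay are just examples:

cfg = rs.conf();
cfg.members[2].priority = 0;       // can never become primary
cfg.members[2].hidden = true;      // invisible to client applications
cfg.members[2].slaveDelay = 3600;  // stay one hour behind the primary
                                   // (renamed secondaryDelaySecs in MongoDB 5.0+)
rs.reconfig(cfg);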
I am interested in these options myself, but I haven't decided what approach I will use.