Background: I have a very specific use case where I have an existing MongoDB that I need to interact with via reads, but I have to ensure that the data can never be modified. However I also need to trigger some form of event when new data comes in so I can do post processing on it.
The current plan is to use replication to get the data onto a slave for the read processing. However for my purposes I only care about new data in various document stores. Part of the issue is that I can not modify the existing MongoDB and not all the data is timestamped, so there is no incremental way to handle this that I can think of.
Question: Is it possible to fire an event from a slave that would tell me I have new data and what it is? I will only have access to the slave DB, as the master will be locked.
I may have some limited ability to change the master DB, but I can not expect to change the document structure at all.
Instead of using a master/slave configuration you could instead use a replica set with a priority 0 secondary (so that it can never become primary).
You can tail the oplog on that secondary looking for insert operations.
Related
I want to setup PostgreSQL databases in a way that the one (primary) server receives data from multiple agents, does some post-processing and regularly syncs to read-only (archive) replica. The sync does not have to be real-time. The replica server is used to feed the processed data to other clients.
The idea is that the replica will keep a complete database in the historical sense but the primary will keep all data from all agents (before some aggregation etc).
The primary server would ideally keep only part of the database e.g. last month.
The more I think about it, the term replica is not correct here.
What I probably need is to setup a procedure (for the aggregation etc.) and as a last step send the resulted data the 'replica' and delete portion of the primary DB.
Is there a better term for my use case?
I am new in mongodb and i want to know How mongodb handels users requests.
What happened if the multiple users fire the multiple insert commands or read commands at the same time.
2:-When or where Snapshot coming in to the picture.(Which phase).
Multiple Inserts and Multiple Reads
MongoDB allows multiple clients to read and write the same data.
In order to ensure consistency, it uses locking and other concurrency control measures to prevent multiple clients from modifying the same piece of data simultaneously
Read this documentation it will give you complete info about concurrency
concurrency reference
MongoDB allows very fast writes and updates by default. The tradeoff is that you are not explicitly notified of failures.By default most drivers do asynchronous, ‘unsafe’ writes - this means that the driver does not return an error directly, similar to INSERT DELAYED with MySQL. If you want to know if something succeeded, you have to manually check for errors using getLastError.
MongoDB doesn't offer durability if you use the default configuration. It writes once every minute data to the disk.
This can be configured using j Option and Write Concern on the insert query.
write-concern reference
Snapshot
The $snapshot operator prevents the cursor from returning a document more than once because an intervening write operation results in a move of the document.
Even in snapshot mode, objects inserted or deleted during the lifetime of the cursor may or may not be returned.
snapshot reference
References: here and here
Hope it Helps!!
I am asking that question in the context of journaling in mongodb. As per the mongodb documentation. A write operation first come into the private view.So the Quetion is if multiple write operation have been performed at the same time,than multiple private view will be created...
2;-Checkpoints and snapshot:in the journaling process which point of place snapshot of data is available..?
I'm trying to set up an app that will act as a front end to an externally updated mongo database. The data will be pushed into the database by another process.
I so far have the app connecting to the external mongo instance and pulling data out with on issues, but its not reactive (not seeing any of the new data going into the mongo database).
I've done some digging and it so far can only find that I might need to set up replica sets and use oplog, is there a way to do this without going to replica sets (or is that the best way anyway)?
The code so far is really simple, a single collection, a single publication (pulling out the last 10 records from the database) and a single template just displaying that data.
No deps that I've written (not sure if that's what I'm missing).
Thanks.
Any reason not to use Oplog? For what I've read it is the recommended approach even if your DB isn't updated by an external process, and a must if it does.
Nevertheless, without Oplog your app should see the changes on the DB made by the external process anyway. It should take longer (up to 10 seconds), but it should update.
I have one mongo instance running on amazon. There are 5M docs in a single collection. And 20docs/1sec come in data. No index. And my server just have 50G space, already used 22G.
Now I need to do some data analyse for those data, but because on index, I execute one query, the db is block and can't insert data until I restart the server.
And data keep come in, so I worry about the space is not enough.
What I'm trying to do is build another server, setup a new mongo instance, then copy the data into it. Then add index on the new one and do the analyse.
Waht is the best way, any suggestion?
Probably the best way is to just create an index in the background. This will not block anything and you can then just run the indexed query on your node. Creating an index in the background takes a bit longer but it does prevent the blocking:
db.collection.ensureIndex( { col: 1 }, { background: true } );
See also: http://docs.mongodb.org/manual/reference/method/db.collection.ensureIndex/
If you really want a secondary to do analysis, then you can create a replica set from your existing member. But for that you will have to take MongoDB down - and restart it with the replSet parameter. After starting it with that parameter, you can now add a new replica set member which will sync the data. This synching will also impact performance as lots of data will have to be copied. The primary will also need more disk space now because of the oplog that MongoDB needs to sync secondaries with.
mongodump and mongorestore can also be an option but then the data between the two nodes will not stay in sync. You would have to run the dump+restore each time you want to run analysis on the new data. In that case, a replica set might be better.
A replica set really wants 3 members though, to prevent a split brain in case a node goes down. This can be another data node, but in your case you would probably want to set-up an arbiter. If you don't want automatic failover (I don't think you'd need it in this case, as you're just doing analysis), then set up your replica set two nodes, but make the second (new) one hidden: http://docs.mongodb.org/manual/tutorial/configure-a-hidden-replica-set-member/
set up replica set from this existing member, and then add the index
on the secondary and do analysis.
Take a mongodump and restore to a new server and do the analysis
Need some way to push data from clients database to central database.Basically, there are several instances of MongoDB running on remote machines [clients] , and need some method to periodically update central mongo database with newly added and modified documents in clients.it must replicate its records to the single central server
Eg:
If I have 3 mongo instances running on 3 machines each having data of 10GB then after the data migration 4th machine's mongoDB must have 30GB of data. And cenral mongoDB machine must get periodically updated with data of all those 3 machines. But these 3 machines not only get new documents but existing documents in them may get updated. I would like the central mongoDB machine also to get these updations.
Your desired replication strategy is not formally supported by MongoDB.
A MongoDB replica set consists of a single primary with asynchronous replication to one or more secondary servers in the same replica set. You cannot configure a replica set with multiple primaries or replication to a different replica set.
However, there are a few possible approaches for your use case depending on how actively you want to keep your central server up to date and the volume of data/updates you need to manage.
Some general caveats:
Merging data from multiple standalone servers can create unexpected conflicts. For example, unique indexes would not know about documents created on other servers.
Ideally the data you are consolidating will still be separated by a unique database name per origin server so you don't have strange crosstalk between disparate documents that happen to have the same namespace and _id shared by different origin servers.
Approach #1: use mongodump and mongorestore
If you just need to periodically sync content to your central server, one way to do so is using mongodump and mongorestore. You can schedule a periodic mongodump from each of your standalone instances and use mongorestore to import them into the central server.
Caveats:
There is a --db parameter for mongorestore that allows you to restore into a different database from the original name (if needed)
mongorestore only performs inserts into the existing database (i.e. does not perform updates or upserts). If existing data with the same _id already exists on the target database, mongorestore will not replace it.
You can use mongodump options such as --query to be more selective on data to export (for example, only select recent data rather than all)
If you want to limit the amount of data to dump & restore on each run (for example, only exporting "changed" data), you will need to work out how to handle updates and deletions on the central server.
Given the caveats, the simplest use of this approach would be to do a full dump & restore (i.e. using mongorestore --drop) to ensure all changes are copied.
Approach #2: use a tailable cursor with the MongoDB oplog.
If you need more realtime or incremental replication, a possible approach is creating tailable cursors on the MongoDB replication oplog.
This approach is basically "roll your own replication". You would have to write an application which tails the oplog on each of your MongoDB instances and looks for changes of interest to save to your central server. For example, you may only want to replicate changes for selective namespaces (databases or collections).
A related tool that may be of interest is the experimental Mongo Connector from 10gen labs. This is a Python module that provides an interface for tailing the replication oplog.
Caveats:
You have to implement your own code for this, and learn/understand how to work with the oplog documents
There may be an alternative product which better supports your desired replication model "out of the box".
You should be aware that there are only replica set for doing replication there a replicat set always means: one primary, multiple secondary. Write always go to the primary server. Appearently you want multi-master replication which is not supported by MongoDB. So you want to look into a different technology like CouchDB or CouchBase. MongoDB is barrel burst here.
There may be a way since MongoDB 3.6 to achieve your goal: Change Streams.
Change streams allow applications to access real-time data changes without the complexity and risk of tailing the oplog. Applications can use change streams to subscribe to all data changes on a single collection, a database, or an entire deployment, and immediately react to them. Because change streams use the aggregation framework, applications can also filter for specific changes or transform the notifications at will.
There are some configuration options that affect whether you can use Change Streams or not, so please read about them.
Another option is Delayed Replica Set Members.
Because delayed members are a "rolling backup" or a running "historical" snapshot of the data set, they may help you recover from various kinds of human error. For example, a delayed member can make it possible to recover from unsuccessful application upgrades and operator errors including dropped databases and collections.
Hidden Replica Set Members may be another option to consider.
A hidden member maintains a copy of the primary's data set but is invisible to client applications. Hidden members are good for workloads with different usage patterns from the other members in the replica set.
Another option may be to configure a Priority 0 Replica Set Member.
Because delayed members are a "rolling backup" or a running "historical" snapshot of the data set, they may help you recover from various kinds of human error.
I am interested in these options myself, but I haven't decided what approach I will use.