I understand that MongoDB has the ability to do a bulkWrite / bulkExec per collection... but what I don't understand is how to do it across the whole database.
Currently I'm doing things in parallel via Promise.all([collectionA.op1, collectionB.op2, ...]), but this seems incredibly inefficient: it makes a new network request to Mongo for each operation.
It seems that if I could batch up all the instructions I have and send them to Mongo in one request, it would be much more efficient.
Does MongoDB support this? If not, why wouldn't it?
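For anyone sketching this out: the closest built-in approach is to batch per collection, grouping the pending operations so each collection gets a single bulkWrite (one round trip each) rather than one request per operation. A minimal sketch with the Node.js driver, where the database name, collection names, and operation contents are all hypothetical:

const { MongoClient } = require("mongodb");

async function run() {
  const client = await MongoClient.connect("mongodb://localhost:27017");
  const db = client.db("mydb"); // placeholder database name

  // One bulkWrite (one round trip) per collection, run in parallel,
  // instead of one network request per individual operation.
  await Promise.all([
    db.collection("collectionA").bulkWrite([
      { insertOne: { document: { x: 1 } } },
      { updateOne: { filter: { x: 1 }, update: { $set: { y: 2 } } } },
    ]),
    db.collection("collectionB").bulkWrite([
      { deleteOne: { filter: { stale: true } } },
    ]),
  ]);

  await client.close();
}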
I am new to MongoDB. I read here that MongoDB does not support multi-document transactions: http://docs.mongodb.org/manual/faq/fundamentals/.
If I want to save data in two collections (A and B) atomically, I can't do that using MongoDB, i.e. if the save fails for B, A will still have the data. Isn't that a big disadvantage?
Still, people use MongoDB rather than an RDBMS. Why?
MongoDB 4.0 adds support for multi-document ACID transactions now.
For reference, see https://docs.mongodb.com/manual/core/transactions/
UPDATE
MongoDB has already started to support multi-document transactions:
https://docs.mongodb.com/manual/core/transactions/
MongoDB does not support multi-document transactions.
However, MongoDB does provide atomic operations on a single document. Often these document-level atomic operations are sufficient to solve problems that would require ACID transactions in a relational database.
For example, in MongoDB, you can embed related data in nested arrays or nested documents within a single document and update the entire document in a single atomic operation. Relational databases might represent the same kind of data with multiple tables and rows, which would require transaction support to update the data atomically.
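As a sketch of that pattern (the collection and field names here are made up for illustration): an order document that embeds its line items can add an item and adjust its total in a single atomic update, where a normalized relational schema would need a transaction across two tables.

// One atomic write touches a single document: push a new line item
// and bump the order total together -- no transaction needed.
db.collection("orders").updateOne(
  { _id: orderId },
  {
    $push: { items: { sku: "A-1", qty: 2, price: 19.99 } },
    $inc: { total: 39.98 },
  }
);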
MongoDB doesn't support transactions, but saving one document is atomic.
So it is better to design your database schema in such a way that all the data that needs to be saved atomically is placed in one document.
MongoDB does not support transactions as in a relational DB. The ACID guarantees of transactions are a distinct piece of functionality provided by the storage engine; in MySQL, for example, they come from InnoDB.
Some of the features of the InnoDB engine in MySQL:
Crash Recovery
Double write buffer
Auto commit settings
Isolation Level
This is what the MongoDB community has to say:
MongoDB does not have support for traditional locking or complex transactions with rollback.
MongoDB aims to be lightweight, fast, and predictable in its performance. By keeping transaction support extremely simple, MongoDB can provide greater performance especially for partitioned or replicated systems with a number of database server processes.
The purpose of a transaction is to make sure that the whole database stays consistent while multiple operations take place.
But contrary to most relational databases, MongoDB isn't designed to run on a single host. It is designed to be set up as a cluster of multiple shards, where each shard is a replica set of multiple servers (optionally at different geographical locations).
But if you are still looking for ways to make transactions possible:
Try using the document-level atomicity provided by Mongo.
Two-phase commit in Mongo provides a simple transaction mechanism for basic operations.
mongomvcc is built on top of Mongo and also supports transactions, according to its authors.
A hybrid of MySQL and Mongo.
Multi-document updates, or "multi-document transactions", using the two-phase commit approach described here: http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/
This question is quite old, but for anyone who stumbles upon this page: you could use fawn. It's an npm package that solves this exact problem. Disclosure: I wrote it.
Say you have two bank accounts, one belongs to John Smith and the other belongs to Broke Individual. You would like to transfer $20 from John Smith to Broke Individual. Assuming all first name and last name pairs are unique, this might look like:
var Fawn = require("fawn");

// fawn must be initialized before tasks are created; it accepts a
// connection string or an existing mongoose instance.
Fawn.init("mongodb://127.0.0.1:27017/mydb");

var task = Fawn.Task();

// assuming "Accounts" is the Accounts collection
task.update("Accounts", {firstName: "John", lastName: "Smith"}, {$inc: {balance: -20}})
    .update("Accounts", {firstName: "Broke", lastName: "Individual"}, {$inc: {balance: 20}})
    .run()
    .then(function(){
        // update is complete
    })
    .catch(function(err){
        // everything has been rolled back;
        // log the error which caused the failure
        console.log(err);
    });
Caveat:
Tasks are currently not isolated (working on that), so, technically, it's possible for two tasks to retrieve and edit the same document, because that's just how MongoDB works.
It's really just a generic implementation of the two-phase commit example on the tutorial site: https://docs.mongodb.com/manual/tutorial/perform-two-phase-commits/
Starting with version 4.0, MongoDB adds support for multi-document transactions, so you get the power of the document model with ACID guarantees in MongoDB.
Transactions in MongoDB will be like transactions in relational databases.
For details visit this link: https://www.mongodb.com/blog/post/multi-document-transactions-in-mongodb?jmp=community
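As a minimal sketch of what that looks like with the Node.js driver (3.1+), assuming this runs inside an async function, client is an already-connected MongoClient, and the "bank" database and account documents are hypothetical:

const session = client.startSession();
session.startTransaction();
try {
  const accounts = client.db("bank").collection("accounts");
  // Both updates happen atomically: either both commit or neither does.
  await accounts.updateOne({ _id: "A" }, { $inc: { balance: -20 } }, { session });
  await accounts.updateOne({ _id: "B" }, { $inc: { balance: 20 } }, { session });
  await session.commitTransaction();
} catch (err) {
  await session.abortTransaction(); // neither update is applied
  throw err;
} finally {
  session.endSession();
}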
MongoDB 3.2 only supports single-document transactions (document-level atomicity). For multi-document workflows, see the two-phase commit tutorial: https://docs.mongodb.com/v3.2/tutorial/perform-two-phase-commits/
How does Meteor handle the process of DB indexing? I've read that there are no indexes at this time, but I'm particularly concerned with very large data sets, joined with multiple lookups, etc., which will really impact performance. Are these issues taken care of by Mongo and Meteor?
I am coming from a Rails/PostgreSQL background and am about 2 days into Meteor and Mongo.
Thanks.
Meteor does expose a method for creating indexes, which maps to the mongo method db.collection.ensureIndex
You can access it on each Meteor.Collection instance, on the server. For example:
if (Meteor.isServer) {
  var myCollection = new Meteor.Collection("dummy");
  // create a compound index on 'dummy' over field1 & field2
  myCollection._ensureIndex({field1: 1, field2: 1});
}
From a performance POV, create indexes based on what you publish, but avoid over-indexing.
With oplog tailing, the initial query will only run occasionally; subsequent changes arrive from the oplog.
Without oplog tailing, Meteor will re-run the query every 10 seconds, so better indexes give a large gain.
Got a response from the Discover Meteor book folks:
Sacha Greif: Actually, we are in the process of writing a new sidebar to address migrations. You'll have access to it for free if you're on the Full or Premium packages :)
Regarding indexes, I think we might address that in an upcoming blog post :)
Thanks much for the reply. I'm looking forward to both.
Can anyone give example use cases of when you would benefit from using Redis and MongoDB in conjunction with each other?
Redis and MongoDB can be used together with good results. A company well known for running MongoDB and Redis (along with MySQL and Sphinx) is Craigslist. See this presentation from Jeremy Zawodny.
MongoDB is interesting for persistent, document oriented, data indexed in various ways. Redis is more interesting for volatile data, or latency sensitive semi-persistent data.
Here are a few examples of concrete usage of Redis on top of MongoDB.
Pre-2.2 MongoDB does not yet have an expiration mechanism. Capped collections cannot really be used to implement a real TTL. Redis has a TTL-based expiration mechanism, making it convenient to store volatile data. For instance, user sessions are commonly stored in Redis, while user data will be stored and indexed in MongoDB. Note that MongoDB 2.2 has introduced a low-accuracy expiration mechanism at the collection level (to be used for purging data, for instance).
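A sketch of the session idea, using the classic callback-style node_redis client (the key scheme and variables are made up):

var redis = require("redis");
var client = redis.createClient();

// Redis expires the session automatically after 30 minutes;
// the durable user profile stays stored and indexed in MongoDB.
client.setex("session:" + sessionId, 1800, JSON.stringify(sessionData));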
Redis provides a convenient set datatype and its associated operations (union, intersection, difference on multiple sets, etc ...). It is quite easy to implement a basic faceted search or tagging engine on top of this feature, which is an interesting addition to MongoDB more traditional indexing capabilities.
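For instance, a tagging engine sketched with the same client (tag and article IDs are hypothetical): keep one Redis set per tag and intersect them, while the article documents themselves live in MongoDB.

// Each tag maps to a set of MongoDB _ids.
client.sadd("tag:mongodb", "article:1", "article:2");
client.sadd("tag:redis", "article:2", "article:3");

// Articles tagged with BOTH mongodb and redis:
client.sinter("tag:mongodb", "tag:redis", function (err, ids) {
  // ids => ["article:2"]; fetch the full documents from MongoDB by _id
});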
Redis supports efficient blocking pop operations on lists. This can be used to implement an ad-hoc distributed queuing system. It is more flexible than MongoDB tailable cursors IMO, since a backend application can listen to several queues with a timeout, transfer items to another queue atomically, etc ... If the application requires some queuing, it makes sense to store the queue in Redis, and keep the persistent functional data in MongoDB.
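The queuing idea might look like this sketch (the job payload and handler are hypothetical; blocking commands should run on a connection dedicated to the worker):

// Producer: push a job whose payload references a MongoDB document.
client.rpush("jobs", JSON.stringify({ orderId: "o123" }));

// Worker: block for up to 30 seconds waiting for the next job.
client.blpop("jobs", 30, function (err, reply) {
  // reply => ["jobs", "<payload>"], or null on timeout
  if (reply) handleJob(JSON.parse(reply[1])); // handleJob is hypothetical
});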
Redis also offers a pub/sub mechanism. In a distributed application, an event propagation system may be useful. This is again an excellent use case for Redis, while the persistent data are kept in MongoDB.
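A pub/sub sketch along the same lines; a subscribing connection can't issue other commands, hence the duplicate():

// Subscriber (dedicated connection):
var sub = client.duplicate();
sub.subscribe("events");
sub.on("message", function (channel, message) {
  // react to the event; the persistent state still lives in MongoDB
});

// Publisher, elsewhere in the application:
client.publish("events", JSON.stringify({ type: "user.created" }));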
Because it is much easier to design a data model with MongoDB than with Redis (Redis is more low-level), it is interesting to benefit from the flexibility of MongoDB for main persistent data, and from the extra features provided by Redis (low latency, item expiration, queues, pub/sub, atomic blocks, etc ...). It is indeed a good combination.
Please note you should never run a Redis and MongoDB server on the same machine. MongoDB memory is designed to be swapped out, Redis is not. If MongoDB triggers some swapping activity, the performance of Redis will be catastrophic. They should be isolated on different nodes.
Obviously there are far more differences than this, but for an extremely high-level overview:
For use-cases:
Redis is often used as a caching layer or shared whiteboard for distributed computation.
MongoDB is often used as a swap-out replacement for traditional SQL databases.
Technically:
Redis is an in-memory db with disk persistence (the whole db needs to fit in RAM).
MongoDB is a disk-backed db which only needs enough RAM for the indexes.
There is some overlap, but it is extremely common to use both. Here's why:
MongoDB can store more data cheaper.
Redis is faster for the entire dataset.
MongoDB's culture is "store it all, figure out access patterns later"
Redis's culture is "carefully consider how you'll access data, then store"
Both have open source tools that depend on them, many of which are used together.
Redis can be used as a replacement for a traditional datastore, but it's most often used alongside another longer-term store, like Mongo, PostgreSQL, MySQL, etc.
Redis works excellently with MongoDB as a caching server. Here is what happens.
Any time mongoose issues a query, it will first go to the cache server.
The cache server will check whether that exact query has ever been issued before.
If it hasn't, the cache server will forward the query to MongoDB, and Mongo will execute it.
The result of that query then goes back to the cache server, which stores it: the cache maintains a record mapping each query that has been issued to the response that came back for it.
The cache server then sends the response back to mongoose; mongoose gives it to Express, and it eventually ends up inside the application.
Any time the exact same query is issued again, mongoose sends it to the cache server; if the cache server sees that this query was issued before, it will not forward it to MongoDB. Instead, it takes the response it got last time and immediately sends it back to mongoose. There are no indexes involved here, no full collection scan, nothing.
We are doing a simple lookup: has this query been executed? If yes, take the stored response and send it back immediately, without sending anything to Mongo.
We have the mongoose server, the cache server (Redis), and MongoDB.
On the cache server there is a key-value data store, where every key is some query issued before and the value is the result of that query.
So maybe we are looking up a bunch of blog posts by _id, and the keys in the cache are the _ids of the records we have looked up before.
Imagine mongoose issues a new query trying to find a blog post with an _id of 123. The query flows into the cache server, which checks whether it has a result for any query looking for an _id of 123.
If it does not, the query is passed on to the MongoDB instance. MongoDB executes the query, gets a response, and sends it back.
That result is sent back to the cache server, which immediately forwards it to mongoose, so we get as fast a response as possible.
Right after that, the cache server also takes the query that was issued, adds it to its collection of previously issued queries, and stores the result right up against the query.
If we issue the same query again in the future, it hits the cache server, which looks at its keys, sees it has already found that blog post, and sends the result directly to mongoose without reaching out to Mongo.
There is no complex query logic, no indexes, nothing like that. It's a simple key-value lookup, as fast as possible.
That's an overview of how the cache server (Redis) works with MongoDB.
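A minimal cache-aside sketch of that read path (the key scheme and helper name are made up; real projects often hook mongoose's Query.prototype.exec instead):

const redis = require("redis");
const { promisify } = require("util");

const client = redis.createClient();
const getAsync = promisify(client.get).bind(client);

// Look in Redis first; fall back to MongoDB on a miss and
// remember the answer for next time.
async function cachedFindById(collection, id) {
  const key = collection.collectionName + ":" + id;

  const cached = await getAsync(key);
  if (cached) return JSON.parse(cached); // hit: Mongo never sees the query

  const doc = await collection.findOne({ _id: id }); // miss: ask Mongo
  client.set(key, JSON.stringify(doc)); // store the result against the key
  return doc;
}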
Now there are other concerns. Are we caching data forever? How do we update records?
We don't want to store data in the cache forever and keep reading stale values from it.
The cache server is not used for any write actions; the cache layer is only for reading data. Writes always go to the MongoDB instance, and any time we write data we need to clear any data stored on the cache server that is related to the record we just updated in Mongo.
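Continuing the sketch above, the write path just deletes the affected key after updating Mongo (names are again hypothetical):

// The write goes to MongoDB; the related cache entry is invalidated
// so the next read repopulates it with fresh data.
async function updateById(collection, id, changes) {
  await collection.updateOne({ _id: id }, { $set: changes });
  client.del(collection.collectionName + ":" + id); // drop the stale entry
}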
What is actually happening behind the scenes with a big InsertBatch if one is writing to a sharded cluster? Does MongoDB actually support bulk insert, or is InsertBatch actually inserting one document at a time at the server level? How does this work with sharding? Does this mean that a mongos will look at every item in the batch to figure out the shard key of each item and then route it to the right server? This would break bulk insert, if it exists, and does not seem efficient. What are the mechanics of InsertBatch for a sharded setup? I am using version 2.0 and willing to upgrade if that makes any difference.
Bulk inserts are an actual MongoDB feature and are (somewhat) more performant than separate per-document inserts due to fewer roundtrips.
In a sharded environment, if mongos receives a bulk insert it will figure out which part of the batch has to be sent to which shard. There are no differences between 2.0 and 2.1, and it is the most efficient way to bulk insert data into a sharded database.
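For illustration, a bulk insert is a single call carrying an array of documents; a sketch with a modern Node.js driver (the collection and fields are hypothetical, and drivers of the 2.0 era exposed the same thing as insert with an array):

// One round trip carries the whole batch; against a sharded collection,
// mongos splits the batch by shard-key range and forwards each piece.
db.collection("events").insertMany(
  [
    { userId: 1, type: "click" },
    { userId: 2, type: "view" },
    { userId: 3, type: "click" },
  ],
  { ordered: false } // unordered lets shards apply their pieces in parallel
);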
If you're curious how exactly mongos works, have a look at its source code here:
https://github.com/mongodb/mongo/tree/master/src/mongo/s
Is it possible to run MongoDB commands, like a query to grab additional data or to do an update, from within MongoDB's MapReduce command, either in the Map or the Reduce function?
Is this completely ludicrous to do anyway? Currently I have some documents that refer to separate collections using MongoDB DBRefs.
Thanks for the help!
Is it possible to run MongoDB commands... from within MongoDB's MapReduce command.
In theory, this is possible. In practice there are lots of problems with this.
Problem #1: exponential work. M/R is already pretty intense and poorly logged. Adding queries can easily make M/R run out of control.
Problem #2: context. Imagine that you're running a sharded M/R and you are querying into an unsharded collection. Does the current context even have that connection?
You're basically trying to implement JOIN logic, and MongoDB has no joins. Instead, you may need to build the final data in a couple of phases by running a few loops over a few sets of data.
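A sketch of that phased approach (collection and field names are hypothetical, and it uses the aggregation pipeline rather than map/reduce for the first pass): gather the keys in one pass, fetch the referenced documents with a single $in query, then stitch the results together in application code.

// Phase 1: aggregate over `orders` to get per-customer totals.
const totals = await db.collection("orders").aggregate([
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } },
]).toArray();

// Phase 2: fetch all referenced customer documents in one query.
const customers = await db.collection("customers")
  .find({ _id: { $in: totals.map(t => t._id) } })
  .toArray();

// Phase 3: join in application code.
const byId = new Map(customers.map(c => [String(c._id), c]));
const report = totals.map(t => ({
  customer: byId.get(String(t._id)),
  total: t.total,
}));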