Some blogs say that map-reduce is slower than aggregation. So which one is ideal to use?
If you go through the official documentation, you can clearly see it stated:
For most aggregation operations, the Aggregation Pipeline provides better performance and more coherent interface. However, map-reduce operations provide some flexibility that is not presently available in the aggregation pipeline.
The MongoDB documentation says about map-reduce:
For most aggregation operations, the Aggregation Pipeline provides better performance and more coherent interface. However, map-reduce operations provide some flexibility that is not presently available in the aggregation pipeline.
Does it mean that there are some aggregation operations that cannot be performed in the usual MongoDB aggregation framework but are possible using map-reduce?
In particular, I'm looking for an example of a map-reduce operation that cannot be implemented in the MongoDB aggregation framework.
Thanks!
An example of that "flexibility": basically, if you have any logic that does not fit into the standard aggregation operators, map-reduce is the only option for running it server-side.
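For instance, a per-group median: the pipeline has no operator for it, but a finalize function can compute it with plain JavaScript. A minimal sketch, assuming a hypothetical sales collection with { category, amount } documents:

```javascript
// Hypothetical "sales" collection with { category: <string>, amount: <number> }.
// Map emits each amount wrapped in an object so reduce can merge arrays.
var mapFn = function () {
  emit(this.category, { amounts: [this.amount] });
};

// Reduce must stay associative and re-runnable, so it only merges arrays.
var reduceFn = function (key, values) {
  var merged = [];
  values.forEach(function (v) { merged = merged.concat(v.amounts); });
  return { amounts: merged };
};

// Finalize runs once per key and may use arbitrary JavaScript.
var finalizeFn = function (key, value) {
  var a = value.amounts.sort(function (x, y) { return x - y; });
  return a[Math.floor(a.length / 2)]; // the median
};

db.sales.mapReduce(mapFn, reduceFn, { out: { inline: 1 }, finalize: finalizeFn });
```

Note the reduce step only merges arrays, so this is only practical while each group's values fit comfortably in memory.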
While looking at the documentation for map-reduce, I found that:
NOTE:
For most aggregation operations, the Aggregation Pipeline provides better performance and more coherent interface. However, map-reduce operations provide some flexibility that is not presently available in the aggregation pipeline.
I did not understand much from it.
What are the use cases for using map-reduce over aggregation pipeline?
What flexibility does map-reduce provide?
How big is the performance difference?
For one thing, Map/Reduce in MongoDB wasn't made for ad-hoc queries; there's considerable overhead to M/R. Even a very simple M/R operation on a small dataset can take hundreds of milliseconds because of that overhead.
I can't say much about the performance of M/R compared to the aggregation framework on large datasets in practice, but in theory, M/R operations on a large sharded database should be faster since the shards can run the operations largely in parallel.
As to the flexibility: since M/R actually runs JavaScript functions, you have the full power of the language at your disposal. For example, let's say you wanted to group some data by the cosine of a field's value. Since there's neither a $cos operator in the aggregation framework, nor a meaningful way to build discrete buckets from continuous numbers (something like $truncate), the aggregation framework wouldn't help in that case.
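For illustration, here's roughly how that cosine grouping would look as a map-reduce job; the collection and field names are made up, and Math.cos / Math.floor are ordinary JavaScript available inside the map function:

```javascript
// Hypothetical "measurements" collection with a numeric "angle" field.
var mapFn = function () {
  // Bucket cos(angle) into discrete 0.1-wide buckets.
  var bucket = Math.floor(Math.cos(this.angle) * 10) / 10;
  emit(bucket, 1);
};

var reduceFn = function (key, values) {
  return Array.sum(values); // documents per bucket
};

db.measurements.mapReduce(mapFn, reduceFn, { out: { inline: 1 } });
```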
So, in a nutshell, I'd say the use cases are:
Keeping the results of M/R in a separate collection and updating it from time to time, using the out parameter and merging the results (see the sketch below)
Complex queries on large sharded data sets
Queries that are so complex that you can't use the aggregation framework. I'd say that's a pretty certain sign of a design flaw in the data structure, but in principle, it can help.
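A quick sketch of that first use case, assuming a hypothetical pageviews collection and a lastRun timestamp you persist between runs; out: { reduce: ... } folds the new results into the existing output collection:

```javascript
// Hypothetical "pageviews" collection with { url: <string>, ts: <Date> }.
var mapFn = function () { emit(this.url, 1); };
var reduceFn = function (key, values) { return Array.sum(values); };

// "lastRun" would be loaded from wherever you store it between runs.
db.pageviews.mapReduce(mapFn, reduceFn, {
  query: { ts: { $gt: lastRun } },     // only process documents since the last run
  out: { reduce: "pageview_counts" }   // re-reduce into the existing results
});
```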
When do I prefer MapReduce over the aggregation pipeline in MongoDB, or vice versa? I feel most aggregation operations are suitable for the pipeline. What kind of problem complexity or what use case should make me go for MapReduce?
As a general rule of thumb: When you can do it with the aggregation pipeline, you should.
One reason is that the aggregation pipeline is able to use indexes and internal optimizations between the aggregation steps which are just not possible with MapReduce.
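For example (collection and index are hypothetical), a pipeline with a leading $match can use an index, whereas the optimizer can't see inside MapReduce's JavaScript functions:

```javascript
// Hypothetical "orders" collection; a leading $match can use this index.
db.orders.createIndex({ status: 1 });

db.orders.aggregate([
  { $match: { status: "shipped" } },                           // index-supported
  { $group: { _id: "$customer", total: { $sum: "$amount" } } } // runs on the filtered set
]);
```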
Aggregation is also a lot more secure when the operation is triggered by user input. When there are user-supplied parameters to your query, MapReduce forces you to create JavaScript functions through string concatenation. This opens the door for dangerous JavaScript code injection vulnerabilities. The APIs used for creating aggregation pipeline objects (in most programming languages!) usually have fewer such obvious pitfalls.
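To illustrate, assume a hypothetical user-supplied value userInput:

```javascript
// Dangerous: building the map function from strings. A crafted value can
// close the string literal and inject arbitrary JavaScript on the server.
var mapSrc = "function () { if (this.name === '" + userInput + "') emit(this.name, 1); }";

// Safer: in a pipeline, userInput is passed as data, not code, so it
// cannot change the structure of the query.
db.users.aggregate([
  { $match: { name: userInput } },
  { $group: { _id: "$name", count: { $sum: 1 } } }
]);
```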
There are, however, still a few cases which cannot be done easily, or at all, with aggregation. For those cases, MapReduce still has a reason to exist.
Another limitation of the aggregation framework is that the intermediate dataset after each aggregation step is limited to 100MB unless you use the allowDiskUse option, which really slows down the query. MapReduce usually behaves a lot better when you need to work with a really large dataset.
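The option itself is just a flag on the aggregate call (collection name hypothetical):

```javascript
// Lets stages like $group and $sort spill to temporary files on disk
// instead of failing when they exceed the 100MB in-memory limit.
db.logs.aggregate(
  [{ $group: { _id: "$host", hits: { $sum: 1 } } }],
  { allowDiskUse: true }
);
```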
I wanted to know about some examples or scenarios in MongoDB which can be done by map-reduce but not by the aggregation framework.
Map-reduce is considered to be a very powerful tool/mechanism for aggregating data. Can some of you please share a few scenarios that are not possible without map-reduce?
Thanks & Best Regards.
In MongoDB, the aggregation framework is currently limited to 16MB of returned results.
MapReduce can write its output to a collection and has no size limitations.
MapReduce can group entire documents, whereas the aggregation framework works at the field level. MapReduce can map keys to values and values to keys, which can't be done any other way. MapReduce can also call/use various JavaScript built-in functions, whereas aggregation is limited to the functions and expressions built into its framework.
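A sketch that combines several of those points, assuming a hypothetical articles collection where each document has a tags array: tag values become keys, whole documents become values, and the output goes to a collection instead of being returned inline:

```javascript
var mapFn = function () {
  var doc = this;
  (this.tags || []).forEach(function (tag) {
    emit(tag, { docs: [doc] }); // values become keys, whole documents become values
  });
};

var reduceFn = function (key, values) {
  var merged = [];
  values.forEach(function (v) { merged = merged.concat(v.docs); });
  return { docs: merged }; // practical only for modestly sized groups
};

// Writing to a collection avoids the 16MB limit on returned results.
db.articles.mapReduce(mapFn, reduceFn, { out: "articles_by_tag" });
```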
Do CouchDB or MongoDB support aggregation?
CouchDB uses map/reduce for this. Aggregation happens during the reduce stage.
For general information on CouchDB's map/reduce system, aka views, see:
http://guide.couchdb.org/editions/1/en/views.html
http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
And for some examples of simulating SQL-style aggregation functions, as well as the built-in reduce functions CouchDB provides, see:
http://guide.couchdb.org/editions/1/en/cookbook.html#aggregate
http://wiki.apache.org/couchdb/Built-In_Reduce_Functions
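As a small sketch of how that looks in practice (document shape and design-document name are made up), a view's map function emits key/value pairs and a built-in reduce such as _sum aggregates them:

```javascript
// Design document sketch for a hypothetical database of sale documents
// shaped like { "type": "sale", "item": "...", "price": 12.5 }.
{
  "_id": "_design/sales",
  "views": {
    "total_by_item": {
      "map": "function (doc) { if (doc.type === 'sale') { emit(doc.item, doc.price); } }",
      "reduce": "_sum"
    }
  }
}
// Querying /db/_design/sales/_view/total_by_item?group=true returns one
// summed total per item key: the aggregation happens in the reduce stage.
```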
From MongoDB's point of view, yes.
See: http://www.mongodb.org/display/DOCS/Aggregation
Note there are limitations there (there is work being done to provide greater support for aggregations, see here). Typically, you may need to use MapReduce (see the Mongo docs here), especially on larger result sets.