When do I need map reduce for database queries? - mongodb

In CouchDB you always have to use map-reduce to query results.
In MongoDB you can use its query methods for retrieving data, but it also lets you do map-reduce.
I wonder, when do I actually need map-reduce?
Are those query methods different from map-reduce, or are they just wrappers for map-reduce functions?

MapReduce is needed for aggregations in MongoDB. Normal queries follow a very different (and much faster) code path, and they should always be used for real-time operations. MapReduce is definitely not intended for real-time use; it's more for batch jobs.
Technically, you could write all your queries using MapReduce, but that would be both painful and slow.
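A minimal sketch of that contrast in the mongo shell (the orders collection and status field are illustrative assumptions): the same count expressed first as a normal query and then as map-reduce.

```javascript
// Normal query: the fast code path, fine for real-time use.
db.orders.count({ status: "shipped" });

// Map-reduce equivalent: far more ceremony, runs JavaScript, batch-oriented.
db.orders.mapReduce(
    function () { if (this.status === "shipped") emit("shipped", 1); }, // map
    function (key, values) { return Array.sum(values); },               // reduce
    { out: { inline: 1 } }                                              // return results inline
);
```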

MongoDB Aggregation vs simple query performance?

I am re-asking this question as I thought it should be on a separate thread from this one: in-mongodb-know-index-of-array-element-matched-with-in-operator.
I am using MongoDB, and so far I have written all of my queries using simple operations such as find and update (no aggregations). Then I read in many SO posts (see mongodb-aggregation-match-vs-find-speed, for example) about aggregation, and I thought about why I was increasing computation time on my server, because the more I compute there, the higher the server load becomes; so I tried aggregations and thought I was going in the right direction. But later, on my previous question, andreas-limoli told me not to use aggregations, as they are slow, and to use simple queries and compute on the server instead. Now I am literally in a dilemma about which one to use. I have been working with MongoDB for a year now, but I have no knowledge of its performance as the data size increases, so I really don't know which one to pick.
Also, one more thing I couldn't find anywhere: if aggregation is slower, is it because of $lookup or not? $lookup is the foremost reason I thought about using aggregation, because otherwise I have to execute many queries serially and then compute on the server, which seems very poor compared to aggregation.
I also read about the 100MB restriction on MongoDB aggregation when passing data from one pipeline stage to another. How do people handle that case efficiently? And if they turn on disk usage, given that disk usage slows everything down, how do they handle that case?
I also fetched a sample collection of 30,000 documents and ran an aggregation with $match against an equivalent find query; the aggregation was a little faster, taking 180 ms to execute versus 220 ms for the find.
Please help me out, guys; it would be really helpful for me.
Aggregation pipelines are costly queries. They can impact your performance as the data grows, because of CPU and memory usage. If you can achieve the result with a find query, go for it, because aggregation gets costlier as the data in the DB increases.
The aggregation framework in MongoDB offers the equivalent of join operations in SQL (via $lookup). Aggregation pipelines are generally resource-intensive operations, so if your use case is satisfied by simple queries, you should use those in the first place.
However, if it is absolutely necessary, you can use aggregation pipelines when you need to fetch data from multiple collections.
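For illustration, a minimal $lookup sketch (available since MongoDB 3.2); the orders/customers collections and the customerId field are assumptions, not anything from the question:

```javascript
db.orders.aggregate([
    { $match: { status: "shipped" } },   // filter early to limit the data joined
    { $lookup: {
        from: "customers",               // collection to join against
        localField: "customerId",        // field in orders
        foreignField: "_id",             // field in customers
        as: "customer"                   // output array field on each order
    } }
]);
```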

When to use map reduce over Aggregation Pipeline in MongoDB?

While looking at documentation for map-reduce, I found that:
NOTE: For most aggregation operations, the Aggregation Pipeline provides better performance and a more coherent interface. However, map-reduce operations provide some flexibility that is not presently available in the aggregation pipeline.
I did not understand much from it.
What are the use cases for using map-reduce over aggregation pipeline?
What flexibility does map-reduce provide?
How much delta is there in performance?
For one thing, Map/Reduce in MongoDB wasn't made for ad-hoc queries; there's considerable overhead to M/R. Even a very simple M/R operation on a small dataset can take hundreds of milliseconds because of that overhead.
I can't say much about the performance of M/R compared to the aggregation framework on large datasets in practice, but in theory, M/R operations on a large sharded database should be faster since the shards can run the operations largely in parallel.
As to the flexibility, since M/R actually runs JavaScript methods, you have the full power of the language at your disposal. For example, let's say you wanted to group some data by the cosine of a field's value. Since there's neither a $cos operator in the aggregation framework, nor a meaningful way to build discrete buckets from continuous numbers (something like $truncate), the aggregation framework wouldn't help in that case (see the sketch after the list below).
So, in a nutshell, I'd say the use cases are:
- Keeping the results of M/R in a separate collection and updating it from time to time (using the out parameter and merging the results)
- Complex queries on large sharded data sets
- Queries that are so complex that you can't use the aggregation framework. I'd say that's a pretty certain sign of a design flaw in the data structure, but in principle, it can help
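To make the cosine example above concrete, here is a minimal map-reduce sketch in the mongo shell; the measurements collection and angle field are illustrative assumptions:

```javascript
db.measurements.mapReduce(
    function () {
        // Discrete bucket from a continuous value: cos() truncated to 2 decimals,
        // which the (pre-2.6) aggregation framework couldn't express.
        var bucket = Math.floor(Math.cos(this.angle) * 100) / 100;
        emit(bucket, 1);
    },
    function (key, values) { return Array.sum(values); },  // count per bucket
    { out: { inline: 1 } }
);
```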

(Real time) Small data aggregation MongoDB: triggers?

What is a reliable and efficient way to aggregate small data in MongoDB?
Currently, my data that needs to be aggregated is under 1 GB, but it can grow as high as 10 GB. I'm looking for a real-time strategy, or a near-real-time one (aggregation every 15 minutes).
It seems like the likes of Map/Reduce, Hadoop, and Storm are all overkill. I know that triggers don't exist, but I found this one post that may be ideal for my situation. Is creating a trigger in MongoDB an ideal solution for real-time small data aggregation?
MongoDB has two built-in options for aggregating data - the aggregation framework and map-reduce.
The aggregation framework is faster (executing as native C++ code as opposed to a JavaScript map-reduce job) but more limited in the sorts of aggregations that are supported. Map-reduce is very versatile and can support very complex aggregations but is slower than the aggregation framework and can be more difficult to code.
Either of these would be a good option for near real time aggregation.
One further consideration to take into account is that as of the 2.4 release the aggregation framework returns a single document containing its results and is therefore limited to returning 16MB of data. In contrast, MongoDB map-reduce jobs have no such limitation and may output directly to a collection. In the upcoming 2.6 release of MongoDB, the aggregation framework will also gain the ability to output directly to a collection, using the new $out operator.
Based on the description of your use case, I would recommend using map-reduce as I assume you need to output more than 16MB of data. Also, note that after the first map-reduce run you may run incremental map-reduce jobs that run only on the data that is new/changed and merge the results into the existing output collection.
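A minimal sketch of such an incremental run in the mongo shell; the pageviews collection, the ts and userId fields, and the pageview_totals output collection are illustrative assumptions:

```javascript
// Timestamp persisted from the previous run (illustrative value).
var lastRun = ISODate("2014-01-01T00:00:00Z");

db.pageviews.mapReduce(
    function () { emit(this.userId, 1); },                 // map: one hit per view
    function (key, values) { return Array.sum(values); },  // reduce: sum the hits
    {
        query: { ts: { $gt: lastRun } },    // only process new documents
        out: { reduce: "pageview_totals" }  // re-reduce into the existing output
    }
);
```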
As you know, MongoDB doesn't support triggers, but you can easily implement trigger-like behavior in the application by tailing the MongoDB oplog. This blog post and this SO post cover the topic well.
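A rough sketch of oplog tailing in the legacy mongo shell, assuming a replica set (the oplog only exists on replica-set members); the watched namespace mydb.events is an illustrative assumption:

```javascript
var oplog = db.getSiblingDB("local").oplog.rs;
var cursor = oplog.find({ ns: "mydb.events" })      // watch one namespace
                  .addOption(DBQuery.Option.tailable)
                  .addOption(DBQuery.Option.awaitData);

while (cursor.hasNext()) {
    var entry = cursor.next();
    // entry.op is "i" (insert), "u" (update) or "d" (delete); entry.o is the payload.
    printjson(entry);  // react to the change here, e.g. update an aggregate
}
```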

Why is the db object not accessible from the map function in MongoDB's MapReduce?

In MongoDB's MapReduce, I believe the db object (as in db.anotherCollection.find()) used to be accessible inside the map function. But this feature has been removed (from version 1.6 or so on), which makes joins difficult. What was the reason? Why was it removed?
As at MongoDB 2.4, there are several reasons to disallow access to the db object from within Map/Reduce functions, including:
Deadlocks: There are potential deadlock scenarios between database and/or JavaScript locks called from within the same server-side function.
Performance: The Map/Reduce pattern calls reduce() multiple times; each iteration is a different JavaScript context and would have to open new connections to the database and allocate additional memory for query results. Long-running JavaScript operations will block other operations.
Security: Cross-database queries require appropriate authentication checks.
The above issues could be further complicated for Map/Reduce jobs reading or writing to sharded clusters. The MongoDB Map/Reduce implementation is currently only designed to work with data from a single input collection, and any historical abuses of the db object within Map/Reduce functions should be considered a bug rather than a feature.
If you want to merge data with Map/Reduce, you can use an Incremental Map/Reduce. Depending on what outcome you are trying to achieve, there are other approaches that may be more straightforward, such as adjusting your schema or doing joins in your application code via multiple queries (sketched below).
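For completeness, a tiny sketch of such an application-side join via multiple queries; the orders/customers collections and the customerId field are illustrative assumptions:

```javascript
// Two serial queries instead of a join inside the database.
var order = db.orders.findOne({ status: "pending" });
var customer = db.customers.findOne({ _id: order.customerId });
// Combine order and customer in application code as needed.
```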

Examples which can be done by map reduce only and not aggregation framework in mongodb?

I wanted to know about some examples or scenarios related to MongoDB that can be handled by map-reduce but not by the aggregation framework.
Map-reduce is considered to be a very powerful tool/mechanism for aggregating data. Can some of you please share a few scenarios where it is not possible for the aggregation framework to do what map-reduce can?
Thanks & Best Regards.
In MongoDB, the aggregation framework is currently limited to 16MB of returned results.
MapReduce can write its output to a collection and has no size limitations.
MapReduce can group entire documents, whereas the aggregation framework works at the field level. MapReduce can map keys to values and values to keys, which can't be done any other way. MapReduce can also call/use various JavaScript built-in functions, while aggregation is limited to the functions and expressions built into its framework.
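As an illustration of grouping entire documents, here is a minimal map-reduce sketch in the mongo shell; the orders collection and status field are illustrative assumptions:

```javascript
db.orders.mapReduce(
    function () { emit(this.status, { docs: [this] }); },  // whole document as the value
    function (key, values) {
        // Concatenate the per-key document arrays; the output shape matches the
        // emitted shape, as reduce requires.
        var merged = [];
        values.forEach(function (v) { merged = merged.concat(v.docs); });
        return { docs: merged };
    },
    { out: { inline: 1 } }
);
```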