Why aggregate+sort is faster than find+sort in mongo? - mongodb

I'm using mongoose in my project. When the number of documents in my collection becomes bigger, the method of find+sort becomes slower. So I use aggregate+$sort instead. I just wonder why?

Without seeing your data and your query it is difficult to answer why aggregate+sort is faster than find+sort.
But below are the things that holds good on find and aggregate
A well indexed(Indexing that suits your query) data will always yield faster results on your find query.
The components of aggregation pipeline which you use on your aggregate query, more operations is directly proportional to more execution time.
When you go for aggregation pipeline you can create new fields such as sum, avg and so on, which is not possible in a find.
see this thread for more info
MongoDB {aggregation $match} vs {find} speed

Related

MongoDB Aggregation V/S simple query performance?

I am reasking this question as i thought this question should be on seperate thread from this one in-mongodb-know-index-of-array-element-matched-with-in-operator.
I am using mongoDB and actually i was writing all of my queries using simple queries which are find, update etc. (No Aggregations). Now i read on many SO posts see this one for example mongodb-aggregation-match-vs-find-speed. Now i thought about why increasing computation time on server because as if i will compute more then my server load will become more, so i tried to use aggregations and i thought i am going in right direction now. But later on my previous question andreas-limoli told me about not using aggregations as it is slow and for using simple queries and computing on server. Now literally i am in a delimma about what should i use, i am working with mongoDB from a year now but i don't have any knowledge about its performance when data size increases so i completely don't know which one should i pick.
Also one more thing i didn't find on anywhere, if aggregation is slower than is it because of $lookup or not, because $lookup is the foremost thing i thought about using aggregation because otherwise i have to execute many queries serially and then compute on server which appears to me very poor in front of aggregation.
Also i read about 100MB restriction on mongodb aggregation when passing data from one pipeline to other, so how people handle that case efficiently and also if they turn on Disk usage then because Disk usage slow down everything than how people handle that case.
Also i fetched 30,000 sample collection and tried to run aggregation with $match and find query and i found that aggregation was little bit faster than find query which was aggregation took 180ms to execute where as find took 220 ms to execute.
Please help me out guys please it would be really helpful for me.
Aggregation pipelines are costly queries. It might impact on your performance as an increasing data because of CPU memory. If you can achieve the with find query, go for it because Aggregation is costlier once DB data increases.
Aggregation framework in MongoDB is similar to join operations in SQL. Aggregation pipelines are generally resource intensive operations. So if in case your work is satisfied with simple queries, you should use that one at first place.
However, if it is absolute necessary then you can use aggregation pipelines in case you need to fetch the data from the multiple collections.

How to excute map reduce on map reduce result in mongodb

i want to know whether i can excute map reduce on a result of map reduce function previous like pipeline without write it on a collection, thanks all. My english is bad, hope you understand my question :(
Chaining of map reduce is not supported at this time without storing intermediary data in some kind of collection.
Again map reduce in MongoDB is not very efficient and MongoDB recommend to export data and run map reduce in proper framework like Hadoop if you have to.
Yes, but this could cost you a lot of performance, you have to store the first result in a new colllection, then run next map-reduce on the previous output collection. See this for more infomation.
However, you still can pipeline query results through aggregation pipeline, see this. So consider convert your map-reduce to aggregation.

Query with $in operator and large list of Ids

I have a pretty large number of document Ids to iterate through (say 5k-10k). The $in operator doesn't limit that number starting from mongodb version 2.6. Earlier versions had a combinatorial limit of 4 mio.
That said, does it make sense at all to do something like that in mongodb or is it an anti-pattern with performance penalties and I should iterate manually in the application layer?
It's somewhat of an anti-pattern, but sometimes there's no other choice.
If you can change the schema and make that query redundant then you should. If you can't, doing it yourself will surly be slower than letting MongoDB do it.
However, there is another limit you need to consider. Each document in MongoDB is limited to 16MB and each query is sent as a document so with enough items you may reach that limit and get an exception.

When to use map reduce over Aggregation Pipeline in MongoDB?

While looking at documentation for map-reduce, I found that:
NOTE:
For most aggregation operations, the Aggregation Pipeline provides
better performance and more coherent interface. However, map-reduce
operations provide some flexibility that is not presently available in
the aggregation pipeline.
I did not understand much from it.
What are the use cases for using map-reduce over aggregation pipeline?
What flexibility does map-reduce provide?
How much delta is there in performance?
For one thing, Map/Reduce in MongoDB wasn't made for ad-hoc queries, there's considerable overhead to M/R. Even a very simple M/R operation on a small dataset can take in the hundreds of milliseconds because of that overhead.
I can't say much about the performance of M/R compared to the aggregation framework on large datasets in practice, but in theory, M/R operations on a large sharded database should be faster since the shards can run the operations largely in parallel.
As to the flexibility, since M/R actually runs javascript methods you have the full power of the language at your disposal. For example, let's say you wanted to group some data by the cosine of a field's value. Since there's neither a $cos operator in the aggregation framework, nor a meaningful way to build discrete buckets from continuous numbers (something like $truncate), the aggregation framework wouldn't help in that case.
So, in a nutshell, I'd say the use cases are
keeping the results of M/R in a separate collection and updating it from time to time (using the out parameter and merging the results)
Complex queries on large sharded data sets
Queries that are so complex that you can't use the aggregation framework. I'd say that's a pretty certain sign of a design flaw in the data structure, but in principle, it can help

(Real time) Small data aggregation MongoDB: triggers?

What is a reliable and efficient way to aggregate small data in MongoDB?
Currently, my data that needs to be aggregated is under 1 GB, but can go as high as 10 GB. I'm looking for a real time strategy or near real time (aggregation every 15 minutes).
It seems like the likes of Map/Reduce, Hadoop, Storm are all over kill. I know that triggers don't exist, but I found this one post that may be ideal for my situation. Is creating a trigger in MongoDB an ideal solution for real time small data aggregation?
MongoDB has two built-in options for aggregating data - the aggregation framework and map-reduce.
The aggregation framework is faster (executing as native C++ code as opposed to a JavaScript map-reduce job) but more limited in the sorts of aggregations that are supported. Map-reduce is very versatile and can support very complex aggregations but is slower than the aggregation framework and can be more difficult to code.
Either of these would be a good option for near real time aggregation.
One further consideration to take into account is that as of the 2.4 release the aggregation framework returns a single document containing its results and is therefore limited to returning 16MB of data. In contrast, MongoDB map-reduce jobs have no such limitation and may output directly to a collection. In the upcoming 2.6 release of MongoDB, the aggregation framework will also gain the ability to output directly to a collection, using the new $out operator.
Based on the description of your use case, I would recommend using map-reduce as I assume you need to output more than 16MB of data. Also, note that after the first map-reduce run you may run incremental map-reduce jobs that run only on the data that is new/changed and merge the results into the existing output collection.
As you know, MongoDB doesn't support triggers, but you may easily implement triggers in the application by tailing the MongoDB oplog. This blog post and this SO post cover the topic well.