Aggregation Execution Stats (Mongo)

I can see in the documentation that explain("executionStats") is available for some of the Mongo queries, which is really helpful, but I can't see such a utility for the aggregation query, apart from the explain flag, which doesn't give much. Is there a plan to integrate the full overview into the aggregation query too?

This function can be applied to aggregate queries; it just needs to be called before the .aggregate() function, and with 'executionStats', otherwise you will only get details on the query's pipeline and not its execution.
db.collection.explain('executionStats').aggregate([..]);
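As a minimal sketch (the "orders" collection and its fields are made-up names, not from the question), an executionStats explain of a small pipeline looks like this:

// "orders", "status", "cust_id" and "amount" are illustrative names
db.orders.explain("executionStats").aggregate([
    { $match: { status: "A" } },
    { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
]);

The output then carries executionStats (documents examined, keys examined, execution time) under the pipeline's initial cursor stage, which is the part a plain explain-flag run leaves out.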

Related

Does MongoDB use an index search in the $lookup stage?

I'm querying a collection with the aggregate function in MongoDB and I have to look up some other collections in its aggregation. But I have a question about it:
Does MongoDB use indexes for foreignField? I wasn't able to figure this out and I searched everywhere for this but I didn't get my answer. It must certainly use indexes for it but I just want to be sure.
The best way to determine how the database is executing a query is to generate and examine the explain output for the operation. With aggregations that include the $lookup stage specifically, you will want to use the more verbose .explain("executionStats") mode. You may also use the $indexStats aggregation stage to confirm that the usage count of the intended index is increasing.
The best answer we can give based on the limited information in the question is: MongoDB will probably use the index. Query execution behavior, including index usage, depends on the situation and the version. If you provide more information in your question, then we can provide more specific information. There are also some details about index usage on the $lookup documentation page.
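As a hedged sketch (the "orders"/"inventory" collections and the "item"/"sku" fields are assumptions for illustration, not from the question):

// does the lookup into "inventory" hit the index on "sku"?
db.orders.explain("executionStats").aggregate([
    { $lookup: { from: "inventory", localField: "item", foreignField: "sku", as: "stock" } }
]);

// usage counters for the indexes on the foreign collection
db.inventory.aggregate([ { $indexStats: {} } ]);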

In MongoDB, which is better to use: MapReduce or the aggregation pipeline?

Some blogs say that map-reduce is slower than aggregation. So which one is ideal to use?
If you go through the official documentation you can clearly see it stated:
For most aggregation operations, the Aggregation Pipeline provides better performance and a more coherent interface. However, map-reduce operations provide some flexibility that is not presently available in the aggregation pipeline.

Mongo - How to explain query without executing it

I'm new to Mongo and have searched but don't see a specific answer.
I understand that the Mongo explain method will execute the query in parallel with possible access plans and choose a winning plan based on execution time.
The Best Practices Guide states "The query plan can be calculated and returned without first having to run the query". I cannot find how to do this in the doc.
So what if even the winning plan takes a very long time to execute before it returns even a small result set, for example sorting a large collection?
I've seen older comments that execution stops after the first 101 documents are returned, but again can't find that in official doc.
So my question is: How to get the access plan without executing the query?
Thanks for your help. I'm using Mongo 3.4.
You can set the verbosity of the explain() function. By default it uses queryPlanner, which is (if I'm right) what you are looking for.
Check the other modes in the official MongoDB documentation about explain().
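A minimal sketch, with an invented collection and filter: the default queryPlanner verbosity returns the winning plan without reporting execution of the full query.

// both shell forms use queryPlanner verbosity; neither reports execution stats
db.orders.find({ status: "A" }).explain();               // defaults to "queryPlanner"
db.orders.explain("queryPlanner").find({ status: "A" });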
MongoDB builds Query Plans by running a limited subset of a query, then caches the winning plan so it can be re-used (until something causes a cache flush).
So the "limited subset" means that it will get documents that match up to the 100 docs as a "sample", and go no further, but what the documentation means by
The query plan can be calculated and returned without first having to
run the query
is that you can do an explain before you run the full query. This is good practice because it then populates that Plan Cache for you ahead of time.
To put it simply, if MongoDB notices that one candidate plan is taking longer than a plan that has already completed, it will cease execution and ditch that plan in favour of the more efficient one. For example, you can see this in action by running explain in allPlansExecution mode, where an index is chosen over a collection scan.
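As an illustration (the collection and filter are invented), allPlansExecution reports the trial statistics for every candidate plan, so you can see the rejected collection scan next to the winning index plan:

db.orders.find({ status: "A" }).explain("allPlansExecution");
// inspect queryPlanner.rejectedPlans and executionStats.allPlansExecution;
// note that unlike queryPlanner, this mode does run the winning plan to completion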

Using MapReduce as a Stage in Mongo DB Aggregation Pipeline

I want to use MongoDB's MapReduce functionality along with an aggregation query.
Below are the stages which I see could be part of the aggregation pipeline:
1. Filter docs for which the user has access, based on content in the docs and the passed security context (roles of the user) (using $redact)
2. Filter based on one or more criteria (using $match)
3. Tokenize the words in the returned docs based on the above filtering and populate a collection (using mapReduce), OR return the docs inline
4. Query the populated collection / returned docs inline for words based on user criteria using a like query ($regex) and return the words along with their locations
I am able to achieve steps 1, 2 and 4 in the aggregation pipeline.
I am able to achieve step 3 separately by using the mapReduce functionality in MongoDB.
I want to make the mapreduce operation also as a stage in the aggregation pipeline and use it to receive the filtered docs from the earlier steps and pass the processed result to next step.
The mapReduce operation is based on a sample map and reduce operation. I intend to use the map, reduce and finalize functions as shared in the Stack Overflow question below.
Implement auto-complete feature using MongoDB search
My question is: I do not know if we can have a MapReduce operation as part of the MongoDB aggregation pipeline, and if so, whether we can use its inline output and pass it to the next stage.
I am using Spring Data MongoDB to implement the aggregation solution.
If someone has implemented the same, please help me with this.
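Not an answer to the mapReduce-as-a-stage part, but a rough sketch of how steps 1, 2 and 4 from the list above can sit in one pipeline; the "docs" collection, the "acl" roles array, the "category" filter and the tokenized "words" array are all assumptions for illustration:

var userRoles = [ "analyst" ];                       // roles from the security context
db.docs.aggregate([
    // step 1: prune documents (and sub-documents) the user may not see
    { $redact: {
        $cond: {
            if: { $gt: [ { $size: { $setIntersection: [ { $ifNull: [ "$acl", [] ] }, userRoles ] } }, 0 ] },
            then: "$$DESCEND",
            else: "$$PRUNE"
        }
    } },
    // step 2: ordinary criteria filtering
    { $match: { category: "report" } },
    // step 4: like-style word search over an already tokenized "words" array
    { $match: { words: { $regex: /^intro/ } } }
]);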

MongoDB aggregation comparison: group(), $group and MapReduce

I am somewhat confused about when to use group(), aggregate with $group or mapreduce. I read the documentation at http://www.mongodb.org/display/DOCS/Aggregation for group(), http://docs.mongodb.org/manual/reference/aggregation/group/#_S_group for $group.. Is sharding the only situation where group() won't work? Also, I get this feeling that $group is more powerful than group() because it can be used in conjunction with other pipeline operators from aggregation framework.. How does $group compare with mapreduce? I read somewhere that it doesn't generate any temporary collection whereas mapreduce does. Is that so?
Can someone present an illustration or guide me to a link where these three concepts are explained together, taking the same sample data, so I can compare them easily?
EDIT: Also, it would be great if you can point out anything new specifically in these commands since the new 2.2 release came out.
It is somewhat confusing since the names are similar, but the group() command is a different feature and implementation from the $group pipeline operator in the Aggregation Framework.
The group() command, Aggregation Framework, and MapReduce are collectively aggregation features of MongoDB. There is some overlap in features, but I'll attempt to explain the differences and limitations of each as at MongoDB 2.2.0.
Note: inline result sets mentioned below refer to queries that are processed in memory with results returned at the end of the function call. Alternative output options (currently only available with MapReduce) could include saving results to a new or existing collection.
group() Command
Simple syntax and functionality for grouping .. analogous to GROUP BY in SQL.
Returns result set inline (as an array of grouped items).
Implemented using the JavaScript engine; custom reduce() functions can be written in JavaScript.
Current Limitations
Will not group into a result set with more than 20,000 keys.
Results must fit within the limitations of a BSON document (currently 16MB).
Takes a read lock and does not allow any other threads to execute JavaScript while it is running.
Does not work with sharded collections.
See also: group() command examples.
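As a sketch on an invented "orders" collection (summing amount per status, so the same sample data can be reused in the sections below), the group() command looks roughly like this:

db.orders.group({
    key: { status: 1 },
    cond: { },                                               // optional query filter
    initial: { total: 0 },
    reduce: function(curr, result) { result.total += curr.amount; }
});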
MapReduce
Implements the MapReduce model for processing large data sets.
Can choose from one of several output options (inline, new collection, merge, replace, reduce)
MapReduce functions are written in JavaScript.
Supports non-sharded and sharded input collections.
Can be used for incremental aggregation over large collections.
MongoDB 2.2 implements much better support for sharded map reduce output.
Current Limitations
A single emit can only hold half of MongoDB's maximum BSON document size (16MB).
There is a JavaScript lock so a mongod server can only execute one JavaScript function at a point in time .. however, most steps of the MapReduce are very short so locks can be yielded frequently.
MapReduce functions can be difficult to debug. You can use print() and printjson() to include diagnostic output in the mongod log.
MapReduce is generally not intuitive for programmers trying to translate relational query aggregation experience.
See also: Map/Reduce examples.
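The same grouping written as a map-reduce over that hypothetical "orders" collection, returning the result set inline:

db.orders.mapReduce(
    function() { emit(this.status, this.amount); },          // map
    function(key, values) { return Array.sum(values); },     // reduce
    { query: { }, out: { inline: 1 } }
);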
Aggregation Framework
New feature in the MongoDB 2.2.0 production release (August, 2012).
Designed with specific goals of improving performance and usability.
Returns result set inline.
Supports non-sharded and sharded input collections.
Uses a "pipeline" approach where objects are transformed as they pass through a series of pipeline operators such as matching, projecting, sorting, and grouping.
Pipeline operators need not produce one output document for every input document: operators may also generate new documents or filter out documents.
Using projections you can add computed fields, create new virtual sub-objects, and extract sub-fields into the top-level of results.
Pipeline operators can be repeated as needed (for example, multiple $project or $group steps).
Current Limitations
Results are returned inline, so are limited to the maximum document size supported by the server (16MB)
Doesn't support as many output options as MapReduce
Limited to operators and expressions supported by the Aggregation Framework (i.e. can't write custom functions)
Newest server feature for aggregation, so has more room to mature in terms of documentation, feature set, and usage.
See also: Aggregation Framework examples.
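And the same grouping via the Aggregation Framework on the hypothetical "orders" collection:

db.orders.aggregate([
    { $group: { _id: "$status", total: { $sum: "$amount" } } }
]);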
Can someone present an illustration or guide me to a link where these three concepts are explained together, taking the same sample data, so I can compare them easily?
You generally won't find examples where it would be useful to compare all three approaches, but here are previous StackOverflow questions which show variations:
group() versus Aggregation Framework
MapReduce versus Aggregation Framework