MongoDB and Aggregation Framework Pipeline Stages

I have a doubt: I am using the MongoDB aggregation framework with multiple $lookup stages in the pipeline. Is that going to affect performance? Is there any limitation on the number of stages in an aggregation pipeline?

There is no practical limitation on the number of stages in a pipeline. However, there are result-size and memory limitations; refer to the online documentation. $lookup does not, at least for now, take advantage of indexes. The more data and stages you have, the more time the engine needs to process them.
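A minimal pymongo sketch of such a pipeline (the collection and field names here are purely illustrative): the main thing you can control is putting a selective $match first, so that every subsequent $lookup runs against fewer documents.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["shop"]  # hypothetical database and collections

pipeline = [
    # Filter as early as possible so later $lookup stages touch fewer documents.
    {"$match": {"status": "active"}},
    # First $lookup: join customer details.
    {"$lookup": {
        "from": "customers",
        "localField": "customerId",
        "foreignField": "_id",
        "as": "customer",
    }},
    # Second $lookup: join shipping details.
    {"$lookup": {
        "from": "shipments",
        "localField": "_id",
        "foreignField": "orderId",
        "as": "shipments",
    }},
]

for doc in db.orders.aggregate(pipeline):
    print(doc["_id"])
```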

Related

Get data between stages in an aggregation Pipeline

Is it possible to retrieve the documents between stages in a MongoDB aggregation pipeline?
Imagine that I have an aggregation pipeline running in pymongo with 10 stages, and I want to retrieve some information that is available after stage 8 but will no longer be available in the last stage. Is that possible?
The idea is quite similar to this question, and looking at the answers I found $facet, but it wasn't clear to me whether, if stage 1 of all outputFields is the same, it will be executed only once and perform as expected. Also, as I saw in the docs, $facet does not support indexes, which is a problem in my case.
To retrieve values of particular fields which are changed in subsequent stages, use $set to duplicate those values into new fields.
To retrieve the result set exactly as it exists after the 8th stage, send the first 8 stages as their own pipeline.
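A hedged sketch of both approaches in pymongo (the stage contents and the build_pipeline helper are placeholders, not part of any real API):

```python
# Option 1: snapshot a value with $set before a later stage overwrites it.
pipeline = [
    # ... stages 1-8 ...
    {"$set": {"totalAfterStage8": "$total"}},  # copy kept alongside the final result
    # ... stages 9-10 ...
]

# Option 2: run only the first 8 stages to inspect the intermediate result set.
full_pipeline = build_pipeline()              # hypothetical helper returning all 10 stages
intermediate = list(collection.aggregate(full_pipeline[:8]))
final = list(collection.aggregate(full_pipeline))
```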

MongoDB $facet limitation

Scenario - For a pipeline, the $facet stage has a 16 MB limit on the data it can process and pass to the next stage. This means that if I have millions of records (as in my case), the data produced by any $facet stage will be limited to 16 MB.
Question -
How can I overcome the above problem?
Are there any other pipeline stages that can help in this regard?
Can we fix this issue at the programming level? (Note: I am using the C# MongoDB driver.)
Solutions already looked at :
Using "allowDiskUse" feature -> This doesn't work as expected.
The 16 MB figure is the BSON document size limit: each document, including each document that a $facet stage outputs, cannot be larger than 16 MB. The separate in-memory limit for pipeline stages is about 100 MB (allowDiskUse lets blocking stages spill to disk). So rather than packing everything into a single $facet document, keep the paginated results as ordinary documents in the pipeline.
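One way to stay under the per-document limit is not to put counts and pages into one $facet document at all, but to run them as two small aggregations (a sketch in pymongo; the C# driver calls are analogous, and the filter and field names are assumptions):

```python
page, page_size = 3, 50
match = {"$match": {"status": "active"}}  # illustrative filter

# Total count as its own tiny pipeline.
total = next(collection.aggregate([match, {"$count": "n"}]), {"n": 0})["n"]

# The requested page as a second pipeline; each returned document stays well under 16 MB.
docs = list(collection.aggregate([
    match,
    {"$sort": {"createdAt": -1}},
    {"$skip": (page - 1) * page_size},
    {"$limit": page_size},
]))
```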

In MongoDB, which is better to use: map-reduce or the aggregation pipeline?

Some blogs say that map-reduce is slower than aggregation. So which one is ideal to use?
If you go through the official documentation, you can clearly see it written:
For most aggregation operations, the Aggregation Pipeline provides better performance and more coherent interface. However, map-reduce operations provide some flexibility that is not presently available in the aggregation pipeline.
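For a sense of what the pipeline side looks like, a typical group-and-sum that used to be written as map-reduce is a single $group stage (a hedged sketch; the collection and field names are illustrative):

```python
# Revenue and order count per customer, expressed as an aggregation pipeline.
pipeline = [
    {"$group": {
        "_id": "$customerId",
        "orderCount": {"$sum": 1},
        "revenue": {"$sum": "$amount"},
    }},
]
results = list(db.orders.aggregate(pipeline))
```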

Flexibility of map-reduce in mongoDB

The MongoDB documentation says about map-reduce:
For most aggregation operations, the Aggregation Pipeline provides better performance and more coherent interface. However, map-reduce operations provide some flexibility that is not presently available in the aggregation pipeline.
Does it mean that there are some aggregation operations that cannot be performed in the usual MongoDB aggregation framework but are possible using map-reduce?
In particular, I'm looking for an example of map-reduce that cannot be implemented in the MongoDB aggregation framework.
Thanks!
An example of "flexibility". Basically, if you have any logic, that does not fit into standard aggregation operators, map-reduce is the only option to do it serverside.

Aggregation framework on full table scan

I know that the aggregation framework is most suitable when there is an initial $match stage to limit the collection being aggregated. However, there may be times when the filtered collection is still large, say around 2 million documents, and the aggregation involves $group. Is the aggregation framework fit to work on such a collection given a requirement to output results in at most 5 seconds? Currently I work on a single node. Would performing the aggregation on a sharded cluster bring a significant improvement in performance?
As far as I know, the main limitations are that each document returned by the aggregation cannot exceed the 16 MB BSON document limit (and, in older versions that returned the whole result as one inline document, that cap applied to the entire result), and that a blocking pipeline stage is limited in how much RAM it may use (on the order of 100 MB, unless allowDiskUse is set). That is why $match stages are usually used to reduce the set you work with, and a $project stage to reduce the data per document.
Be aware that in a sharded environment, after $group or $sort stages the partial results are brought back to the mongos before being sent to the next stage of the pipeline. The mongos could potentially be running on the same machine as your application, and if that merge work is not handled carefully it can hurt your application's performance.
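A hedged sketch of the shape that usually keeps such a $group responsive on a large filtered set: match first, project only the fields the group needs, and allow spilling to disk (the field names and date window are assumptions):

```python
from datetime import datetime, timedelta

end = datetime.utcnow()
start = end - timedelta(days=30)

pipeline = [
    {"$match": {"createdAt": {"$gte": start, "$lt": end}}},   # shrink the working set first
    {"$project": {"customerId": 1, "amount": 1}},             # carry only what $group needs
    {"$group": {"_id": "$customerId", "total": {"$sum": "$amount"}}},
]

results = list(collection.aggregate(pipeline, allowDiskUse=True))
```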