MongoDB aggregation pipeline stages

I have an aggregation with, say, 4 stages: $match, $lookup, $unwind, and $project, in that order.
Suppose the $match stage returns no documents. Does the aggregation stop, or does it pass the empty result to the next stage and execute all the remaining stages?
If it executes all the remaining stages, how can I break out of the aggregation when the $match stage matches nothing?
I am asking so that I can minimise the load on the database.

To answer your question: if the $match stage matches nothing, the remaining stages still execute, but they operate on an empty set of documents and the pipeline simply returns an empty result. You don't need to break out of the pipeline.
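You can see why nothing needs to "break" if you model each stage as a function over an array of documents; with no matches, every later stage just operates on an empty array. This is a plain-JavaScript sketch of the semantics with made-up data, not real driver code:

```javascript
// Each stage modeled as a plain function over an array of documents.
const match = (docs) => docs.filter((d) => d.status === "X"); // matches nothing
const project = (docs) => docs.map((d) => ({ name: d.name }));

const input = [
  { name: "a", status: "A" },
  { name: "b", status: "A" },
];

// The "pipeline" still runs every stage; each one just receives [].
const result = project(match(input));
console.log(result); // []
```

Running the later stages over an empty set is essentially free, which is why MongoDB does not provide (or need) an early-exit mechanism here.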
The MongoDB aggregation pipeline consists of stages. Each stage transforms the documents as they pass through the pipeline.
Explanation:
db.orders.aggregate([
  { $match: { status: "A" } },
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
])
First Stage: The $match stage filters the documents by the status field and passes to the next stage those documents that have status equal to "A".
Second Stage: The $group stage groups the documents by the cust_id field to calculate the sum of the amount for each unique cust_id.
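In plain JavaScript, those two stages behave like a filter followed by a reduce. This is only a sketch of the semantics on made-up orders, not how the server executes the pipeline:

```javascript
const orders = [
  { cust_id: "c1", status: "A", amount: 50 },
  { cust_id: "c1", status: "A", amount: 25 },
  { cust_id: "c2", status: "A", amount: 10 },
  { cust_id: "c2", status: "B", amount: 99 }, // filtered out by $match
];

// Stage 1: $match on status
const matched = orders.filter((o) => o.status === "A");

// Stage 2: $group by cust_id with $sum of amount
const totals = matched.reduce((acc, o) => {
  acc[o.cust_id] = (acc[o.cust_id] || 0) + o.amount;
  return acc;
}, {});

console.log(totals); // { c1: 75, c2: 10 }
```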


Use MongoDB $sort twice

Is there a way to use the $sort operator twice within a single aggregation pipeline?
I know that using a singular $sort with two keys works properly, i.e. sort by the first key, then the second.
My current project requires multiple $sort stages to exist, for example
db.collection.aggregate([
  { $sort: { "age": 1 } },
  { $sort: { "score": -1 } }
])
Currently, the second stage doesn't respect the result of the first stage. Is there any workaround for that?
Is it possible to, for example, assign each document a new field 'index' after the first stage, storing its index within the current array of results, and use that field in the second $sort stage?
You can sort on multiple keys in a single $sort stage.
db.collection.aggregate([
  { "$sort": { "age": 1, "score": -1 } }
])
I have set up a Mongo playground link you can refer to:
https://mongoplayground.net/p/ZaRX_XNSXhu
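The reason a single compound $sort is the right tool: a second $sort re-orders the entire result, so only the last sort (plus stability for ties) determines the final order. A plain-JavaScript sketch with hypothetical documents makes the difference visible:

```javascript
const docs = [
  { age: 30, score: 5 },
  { age: 20, score: 9 },
  { age: 20, score: 1 },
];

// Two successive full sorts: the second one wins outright,
// the age order survives only as a tie-breaker for equal scores.
const twoSorts = [...docs]
  .sort((a, b) => a.age - b.age)      // first $sort: age ascending
  .sort((a, b) => b.score - a.score); // second $sort: score descending

// One compound sort: age ascending, then score descending for ties.
const compound = [...docs].sort(
  (a, b) => a.age - b.age || b.score - a.score
);

console.log(twoSorts.map((d) => d.score)); // [9, 5, 1]
console.log(compound.map((d) => [d.age, d.score])); // [[20, 9], [20, 1], [30, 5]]
```

So there is no workaround needed: express the precedence you want as key order inside one $sort stage.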

Why $sort on indexed fields with $group stage does not exceed RAM limit, but $sort alone does?

I have a collection with about 50,000 items and indexes created on e.g. name and _id.
If I use db.items.find().sort({ name: 1, _id: 1 })
or:
db.items.aggregate([
  { $match: {} },
  { $sort: { name: 1, _id: 1 } }
])
then it exceeds the RAM limit: Executor error during find command :: caused by :: Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit. and I have to pass { allowDiskUse: true } to aggregate() if I want this to work.
However, when I use a $group stage in the aggregation pipeline, it does not exceed the RAM limit and it works:
db.items.aggregate([
  { $match: {} },
  { $sort: { name: 1, _id: 1 } },
  { $group: { _id: 1, x: { $push: { _id: '$_id' } } } }
])
Why is this happening with $sort alone, but not with $sort + $group?
I have a theory that it's connected to this documented feature:
If a pipeline sorts and groups by the same field and the $group stage only uses the $first accumulator operator, consider adding an index on the grouped field which matches the sort order. In some cases, the $group stage can use the index to quickly find the first document of each group.
While the pipeline optimizations and the way things "actually" run are a black box, this is the only thing I can think of that is mentioned in the docs, at least.
I'm assuming this optimization kicks in, making the $group stage use the index, meaning the pipeline might be holding less in memory because it scans the index instead. And since you're not returning name, the total result is smaller too.
Again, this is pure speculation, but it's the best I've got.

Trying to select single documents from mongo collection

We have a rudimentary versioning system in a collection that uses a field (pageId) as a root key. Subsequent versions of this page have the same pageId. This allows us to very easily find all versions of a single page.
How do I go about running a query that returns only the most recently modified (by lastModified) document for each distinct pageId?
In pseudo-code you could say:
For each distinct pageId
sort documents based on lastModified descending
and return only the first document
You can use an aggregation pipeline for that.
$sort - Sorts all input documents and returns them to the pipeline in sorted order.
$group - Groups documents by some specified expression and outputs to the next stage a document for each distinct grouping.
$first - Returns the value that results from applying an expression to the first document in a group of documents that share the same group by key.
Example:
db.getCollection('t01').aggregate([
  { $sort: { lastModified: -1 } },
  {
    $group: {
      _id: "$pageId",
      element1: { $first: "$element1" },
      element2: { $first: "$element2" },
      elementN: { $first: "$elementN" }
    }
  }
]);
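The sort-then-take-first logic is easy to check in plain JavaScript; this sketch with made-up page documents mirrors what $sort followed by $group/$first computes:

```javascript
const pages = [
  { pageId: "p1", lastModified: 1, title: "v1" },
  { pageId: "p1", lastModified: 3, title: "v3" },
  { pageId: "p2", lastModified: 2, title: "v2" },
];

// $sort: lastModified descending
const sorted = [...pages].sort((a, b) => b.lastModified - a.lastModified);

// $group by pageId; $first keeps the first (i.e. newest) doc per group
const latest = {};
for (const doc of sorted) {
  if (!(doc.pageId in latest)) latest[doc.pageId] = doc; // $first semantics
}

console.log(Object.values(latest).map((d) => d.title)); // ["v3", "v2"]
```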

SELECT avg(rate) FROM ratings WHERE sid=1 in MongoDB

How do I implement the equivalent of this SQL command in MongoDB?
SELECT avg(rate) FROM ratings WHERE sid=1
No grouping is needed.
Yes, MongoDB has an aggregation framework where you can build a pipeline of the stages you need for the query.
db.collection.aggregate([
  { $match: { "sid": 1 } },
  { $group: { _id: null, rateAvg: { $avg: "$rate" } } }
])
As you know, in a SQL query the WHERE part is applied first; that's why we've placed the $match stage first. $match in MongoDB is roughly equivalent to WHERE in SQL, and $avg in MongoDB works the same as AVG in SQL.
To solve this, use $avg within the $group aggregation pipeline element. Basic pipeline flow:
match on sid=1 (your WHERE clause)
group by sid (there's only one sid to group by at this point, because the others were filtered out by $match), and generate an average within the grouped content
Your pipeline would look something like:
db.rates.aggregate([
  { $match: { "sid": 1 } },
  { $group: { _id: "$sid", rateAvg: { $avg: "$rate" } } }
])
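For illustration, here is the same match-then-average computed in plain JavaScript over a made-up ratings array; a sketch of the semantics only, not driver code:

```javascript
const ratings = [
  { sid: 1, rate: 4 },
  { sid: 1, rate: 2 },
  { sid: 2, rate: 5 }, // excluded by the $match
];

// $match: { sid: 1 }
const matched = ratings.filter((r) => r.sid === 1);

// $group with $avg over the matched documents
const rateAvg = matched.reduce((sum, r) => sum + r.rate, 0) / matched.length;

console.log(rateAvg); // 3
```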

Will aggregation pipeline use index for derived field?

In my aggregation pipeline I have these stages:
{
  "$project" => {
    account_name_i: { "$toLower" => "$account_name" }
  }
},
{
  "$sort" => {
    account_name_i: 1
  }
}
and I have index { account_name: 1 }
My question is: will $sort use the index on account_name? If not, is there any other way to achieve this in the aggregation pipeline?
No. The aggregation pipeline can only use a standard index on a $match or $sort phase that's at the beginning of a pipeline. The rules for using indexes with aggregation pipelines are described in detail in the manual.
You aim to sort on the lower-case version of account_name, most likely to get a case-insensitive sort. To achieve this with an index, store a lower-case-normalized copy of the name in each document
{ "account_name_i" : "TruMAn's HABerdaShEry", "account_name_i_lc" : "truman's haberdashery" }
and index the normalized field ({ "account_name_i_lc" : 1 }).
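A plain-JavaScript sketch (with hypothetical field names and data) of why write-time normalization works: sorting on the stored lowercase key needs no per-query $toLower, so an ordinary index on that field can serve the sort directly:

```javascript
const docs = [
  { account_name: "alpha" },
  { account_name: "Beta" },
  { account_name: "ALpha2" },
];

// Write-time normalization: store a lowercase copy alongside the original.
const withLc = docs.map((d) => ({
  ...d,
  account_name_lc: d.account_name.toLowerCase(),
}));

// Sorting on the precomputed field; in MongoDB an index on
// account_name_lc could serve this same order.
withLc.sort((a, b) => (a.account_name_lc < b.account_name_lc ? -1 : 1));

console.log(withLc.map((d) => d.account_name)); // ["alpha", "ALpha2", "Beta"]
```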
If you use aggregation pipeline stages like $group or $match on a collection, they produce a new set of documents from that collection.
Because that new set of documents has no indexes defined on it, later stages can't use the collection's indexes.
Note: if you use $sort as the first stage of the aggregation pipeline, it can utilize the collection's indexes, because the input to that stage is the collection itself.
Kind of, though only with preprocessing. You can write the result of your $project stage to a collection with $out, index that collection, and query it for sorting:
db.yourSource.aggregate([
  {
    $project: {
      account_name_i: { $toLower: "$account_name" }
    }
  },
  {
    $out: "intermediate"
  }
])
db.intermediate.createIndex({ account_name_i: 1 })
db.intermediate.find({}).sort({ account_name_i: 1 })