Find documents based on a property value progressively in MongoDB

Assume I have a collection with many documents that have a property called "status", which accepts any Int value. I want to find all documents whose status is 1. If there are none, find all documents whose status is 2, and so on. Is there any way to do this in a single query?

That is possible.
If you create an index on {status:1}, you can run a query with limit:1 and sort:{status:1}, and project to return only the status field (excluding _id). This would be a covered query that is quite efficient and should only examine a single index key.
Then use that value to query for matching status. This query would also use the index to minimize the number of documents examined.
The difference between doing this in 2 queries vs 1 is likely small.
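For reference, a minimal mongosh sketch of the two-query approach (the collection and field names are the assumed ones from the question):

// Query 1: covered query that returns only the lowest status value present
const lowest = db.collection.find({}, {_id: 0, status: 1})
  .sort({status: 1})
  .limit(1)
  .toArray()[0];

// Query 2: fetch every document with that status (also served by the {status: 1} index)
if (lowest) {
  db.collection.find({status: lowest.status});
}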
You could perform both in an aggregation:
db.collection.aggregate([
  {$sort: {status: 1}},
  {$limit: 1},
  {$project: {
    _id: 0,
    status: 1
  }},
  {$lookup: {
    as: "matched",
    from: "collection",
    let: {target: "$status"},
    pipeline: [
      {$match: {
        $expr: {
          $eq: ["$status", "$$target"]
        }
      }}
    ]
  }},
  {$unwind: "$matched"},
  {$replaceRoot: {newRoot: "$matched"}}
])
It is not completely clear whether the $lookup part will be able to use the index, so you should test to see if that actually performs better than running 2 queries from the client.
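One way to check (a sketch, using the shell's explain helper) is to compare the execution stats of the pipeline against those of the two separate finds:

// Inspect totalKeysExamined / totalDocsExamined for the $lookup stage
db.collection.explain("executionStats").aggregate([ /* pipeline above */ ])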

Related

Find missing elements from the passed array in a MongoDB query

For example:
animals = ['cat','mat','rat'];
The collection contains only 'cat' and 'mat'; I want the query to return 'rat', which is not in the collection.
The collection contains:
[
  {
    _id: ObjectId,
    animal: 'cat'
  },
  {
    _id: ObjectId,
    animal: 'mat'
  }
]
db.collection.find({'animal':{$nin:animals}})
(or)
db.collection.find({'animal':{$nin:['cat','mat','rat']}})
EDIT:
One option is:
Use $facet to $group all existing values into a set. Using $facet allows the pipeline to continue even if the collection is empty, as @leoll2 mentioned.
$project with $cond to handle both cases: with or without data.
Find the set difference.
db.collection.aggregate([
  {$facet: {data: [{$group: {_id: 0, animals: {$addToSet: "$animal"}}}]}},
  {$project: {
    data: {
      $cond: [{$gt: [{$size: "$data"}, 0]}, {$first: "$data"}, {animals: []}]
    }
  }},
  {$project: {data: "$data.animals"}},
  {$project: {_id: 0, missing: {$setDifference: [animals, "$data"]}}}
])
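A simpler client-side alternative (a sketch, assuming mongosh and that the list of values fits comfortably in the application):

const animals = ['cat', 'mat', 'rat'];
// distinct() returns the 'animal' values that actually exist in the collection
const present = db.collection.distinct('animal', {animal: {$in: animals}});
const missing = animals.filter(a => !present.includes(a));   // ['rat']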

Optional break in aggregation pipeline

I have the following pipeline:
Match a single document from collection collection1.
Take the key k_id, which is an ObjectId, and look up another single document in collection2 with that _id.
$unwind the result.
The problem is, I am not sure that k_id exists or that collection2 contains the document I look up. In that case I'd like to do some checks and break the aggregation pipeline after the first step, but I didn't find any operator with similar functionality.
For now I resort to this complicated mess:
db['collection1'].aggregate([
  {$match: ...},
  // make a field that always exists and put either $k_id or null there
  {$addFields: {
    k_id_ensured: {$ifNull: ['$k_id', null]}
  }},
  // $document_unensured may be an empty array if no documents were found
  {$lookup: {
    from: 'collection2',
    ...,
    as: 'document_unensured'
  }},
  // replace the empty array with [null], ...
  {$addFields: {
    document_ensured: {
      $cond: {
        // (if $document_unensured is an empty array...)
        if: {$eq: [{$size: '$document_unensured'}, 0]},
        // (...then make it contain at least `null`)
        then: [null],
        else: '$document_unensured'
      }
    }
  }},
  // ...because after unwinding an empty array the whole
  // document would just disappear
  {$unwind: {path: '$document_ensured'}},
  {$project: /* delete all unneeded fields */}
])
Is there a more elegant way to do it?
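One shorter variant worth testing is $unwind's preserveNullAndEmptyArrays option, which keeps a document even when its looked-up array is empty. A minimal sketch, assuming a plain localField/foreignField lookup on k_id and that the downstream logic can cope with the field simply being absent:

db['collection1'].aggregate([
  {$match: ...},
  {$lookup: {
    from: 'collection2',
    localField: 'k_id',
    foreignField: '_id',
    as: 'document'
  }},
  // documents whose 'document' array is empty pass through instead of being dropped
  {$unwind: {path: '$document', preserveNullAndEmptyArrays: true}},
  {$project: /* delete all unneeded fields */}
])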

Mongodb aggregation $group followed by $limit for pagination

In a MongoDB aggregation pipeline, does the flow of records from stage to stage happen one record/batch at a time, or does each stage wait until the previous stage has finished processing the whole collection before passing records on?
For example, I have a collection classtest with the following sample records:
{name: "Person1", marks: 20}
{name: "Person2", marks: 20}
{name: "Person1", marks: 20}
I have 1000 records in total for about 100 students, and I have the following aggregate query:
db.classtest.aggregate([
  {$sort: {name: 1}},
  {$group: {_id: '$name', total: {$sum: '$marks'}}},
  {$limit: 5}
])
I have the following questions:
The sort order is lost in the final results. If I place another sort after $group, then the results are sorted properly. Does that mean $group does not maintain the previous sort order?
I would like to limit the results to 5. Does the group operation have to complete for all 1000 records before passing anything to $limit, or does it pass records to the limit stage as they become available and stop processing once the limit is met?
My actual goal is to paginate the results of the aggregation. In the above scenario, if $group maintained sort order and processed only the required number of records, I would apply a $match condition such as {name: {$gte: lastPersonName}} in subsequent page queries.
I do not want to apply $limit before $group because I want results for 5 students, not the first 5 records.
I may not want to use $skip, as that effectively means traversing that many records.
I have solved the problem without needing to maintain another collection and without $group traversing the whole collection, hence I am posting my own answer.
As others have pointed out:
$group doesn't retain order, hence early sorting is not of much help.
$group doesn't do any optimization even if there is a following $limit, i.e., it runs $group on the entire collection.
My use case has the following unique features, which helped me solve it:
There will be a maximum of 10 records per student (minimum of 1).
I am not very particular about page size. The front end is capable of handling varying page sizes.
The following is the aggregation command I have used:
db.classtest.aggregate([
  {$sort: {name: 1}},
  {$limit: 5 * 10},
  {$group: {_id: '$name', total: {$sum: '$marks'}}},
  {$sort: {_id: 1}}
])
Explaining the above:
If $sort immediately precedes $limit, the framework optimizes the amount of data to be sent to the next stage (see the note at http://docs.mongodb.org/manual/reference/operator/aggregation/limit/).
To get a minimum of 5 records (the page size), I need to pass at least 5 (page size) * 10 (max records per student) = 50 records to the $group stage. With this, the size of the final result may be anywhere between 0 and 50.
If the result has fewer than 5 entries, no further pagination is required.
If the result size is greater than 5, there is a chance that the last student's records were not completely processed (i.e., not all of that student's records were grouped), so I discard the last record from the result.
The name in the last retained record is then used as the $match criterion in the subsequent page request, as shown below.
db.classtest.aggregate([
  {$match: {name: {$gt: lastRecordName}}},
  {$sort: {name: 1}},
  {$limit: 5 * 10},
  {$group: {_id: '$name', total: {$sum: '$marks'}}},
  {$sort: {_id: 1}}
])
In the above, the framework will still optimize $match, $sort and $limit together as a single operation, which I have confirmed through the explain plan.
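A minimal mongosh sketch of the client-side handling described above (pageSize and maxPerStudent are assumed constants matching the 5 and 10 used in this answer):

const pageSize = 5, maxPerStudent = 10;

function fetchPage(lastRecordName) {
  const match = lastRecordName ? [{$match: {name: {$gt: lastRecordName}}}] : [];
  let page = db.classtest.aggregate([
    ...match,
    {$sort: {name: 1}},
    {$limit: pageSize * maxPerStudent},
    {$group: {_id: '$name', total: {$sum: '$marks'}}},
    {$sort: {_id: 1}}
  ]).toArray();

  // If more groups than the page size came back, the last student may be
  // only partially grouped, so drop that group as described above.
  if (page.length > pageSize) {
    page = page.slice(0, -1);
  }
  // Use page[page.length - 1]._id as lastRecordName for the next page.
  return page;
}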
The first thing to consider here is that the aggregation framework works with a "pipeline" of stages applied in order to get a result. If you are familiar with processing things on the "command line" or "shell" of your operating system, then you might have some experience with the "pipe" or | operator.
Here is a common unix idiom:
ps -ef | grep mongod | tee "out.txt"
In this case the output of the first command ps -ef is "piped" to the next command grep mongod, which in turn has its output "piped" to tee "out.txt", which writes both to the terminal and to the specified file. This is a "pipeline" where each stage "feeds" the next, in the order of the sequence they are written in.
The same is true of the aggregation pipeline. A "pipeline" here is in fact an "array", which is an ordered set of instructions to be passed in processing the data to a result.
db.classtest.aggregate([
  { "$group": {
    "_id": "$name",
    "total": { "$sum": "$marks" }
  }},
  { "$sort": { "_id": 1 } },
  { "$limit": 5 }
])
So what happens here is that all of the items in the collection are first processed by $group to get their totals. There is no specified "order" to grouping, so there is not much sense in pre-ordering the data; nor is there any point in doing so, since you have not yet reached the later stages.
Then you would $sort the results and also $limit as required.
For your next "page" of data you will want ideally $match on the last unique name found, like so:
db.classtest.aggregate([
  { "$match": { "name": { "$gt": lastNameFound } }},
  { "$group": {
    "_id": "$name",
    "total": { "$sum": "$marks" }
  }},
  { "$sort": { "_id": 1 } },
  { "$limit": 5 }
])
It's not the best solution, but there really are no alternatives for this type of grouping. It will, however, notably get "faster" with each iteration towards the end. Alternately, storing all the unique names (or reading them out of another collection) and "paging" through that list with a "range query" on each aggregation statement may be a viable option, if your data permits it.
Something like:
db.classtest.aggregate([
  { "$match": { "name": { "$gte": "Allan", "$lte": "David" } }},
  { "$group": {
    "_id": "$name",
    "total": { "$sum": "$marks" }
  }},
  { "$sort": { "_id": 1 } }
])
Unfortunately there is not a "limit grouping up until x results" option, so unless you can work with another list, you are basically grouping up everything (and possibly a gradually smaller set each time) with each aggregation query you send.
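A minimal mongosh sketch of that range-query approach (assumes the list of unique names fits in memory; distinct() and slicing into fixed-size chunks are the assumed mechanics):

const names = db.classtest.distinct('name').sort();
const pageSize = 5;

for (let i = 0; i < names.length; i += pageSize) {
  const chunk = names.slice(i, i + pageSize);
  const page = db.classtest.aggregate([
    { "$match": { "name": { "$gte": chunk[0], "$lte": chunk[chunk.length - 1] } }},
    { "$group": { "_id": "$name", "total": { "$sum": "$marks" } }},
    { "$sort": { "_id": 1 } }
  ]).toArray();
  // ... hand `page` to the caller / renderer ...
}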
"$group does not order its output documents." See http://docs.mongodb.org/manual/reference/operator/aggregation/group/
$limit limits the number of processed elements of an immediately preceding $sort operation, not only the number of elements passed to the next stage. See the note at http://docs.mongodb.org/manual/reference/operator/aggregation/limit/
For the very first question you asked, I am not sure, but it appears (see the $limit note above) that a stage n+1 can influence the behaviour of stage n: the limit restricts the sort operation to its first n elements, and the sort operation does not run to completion as it would if the following limit stage did not exist.
Pagination on grouped data in MongoDB
You can't apply pagination directly inside $group, but the following trick can be used.
If you want pagination on grouped data, for example, to group products by category and then return only 5 products per category:
Step 1 - write an aggregation on the products collection and group by category:
{ $group: { _id: '$prdCategoryId', products: { $push: '$$ROOT' } } },
Step 2 - use prdSkip for skipping and prdLimit for limiting the products, passed in dynamically:
{
  $project: {
    // pagination for products
    products: {
      $slice: ['$products', prdSkip, prdLimit],
    }
  }
},
Finally, the query looks like this. The params limit and skip are for category pagination, and prdSkip and prdLimit are for product pagination:
db.products.aggregate([
  { $group: { _id: '$prdCategoryId', products: { $push: '$$ROOT' } } },
  {
    $lookup: {
      from: 'categories',
      localField: '_id',
      foreignField: '_id',
      as: 'categoryProducts',
    },
  },
  {
    $replaceRoot: {
      newRoot: {
        $mergeObjects: [{ $arrayElemAt: ['$categoryProducts', 0] }, '$$ROOT'],
      },
    },
  },
  {
    $project: {
      // pagination for products
      products: {
        $slice: ['$products', prdSkip, prdLimit],
      },
      _id: 1,
      catName: 1,
      catDescription: 1,
    },
  },
])
  .limit(limit) // pagination for category
  .skip(skip);
I used $replaceRoot here to pull the category fields out.
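If the driver in use doesn't expose .skip()/.limit() on the aggregation cursor, the category pagination can be expressed as pipeline stages instead (a sketch; the $sort is an added assumption to give paging a stable order):

db.products.aggregate([
  { $group: { _id: '$prdCategoryId', products: { $push: '$$ROOT' } } },
  { $sort: { _id: 1 } },   // stable order before paging (assumption)
  { $skip: skip },         // pagination for category
  { $limit: limit },
  // ... the $lookup / $replaceRoot / $project stages shown above ...
])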

Return a document with the max value on a number field using MongoDB aggregation

If I have a bunch of documents, for instance:
{
  _id: mongoId,
  subcatId: mongoId,
  viewCount: 2
}
_id is unique but subcatId isn't.
If I wanted to return each document that had the highest viewCount per subcatId, how would I do that using aggregation with Mongoose/MongoDB?
You can do that like this:
db.test.aggregate([
  // Sort the docs by viewCount descending so that the highest ones come first
  {$sort: {viewCount: -1}},
  // Group by subcatId and take the first doc from each group
  {$group: {_id: '$subcatId', doc: {$first: '$$ROOT'}}}
])
The $$ROOT system variable was added in 2.6 and represents the whole document being processed at that stage in the pipeline. All system variables are referenced with the $$ prefix.
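If you want each result to have the original document shape rather than being nested under doc, a $replaceRoot stage can be appended (a sketch, assuming MongoDB 3.4+):

db.test.aggregate([
  {$sort: {viewCount: -1}},
  {$group: {_id: '$subcatId', doc: {$first: '$$ROOT'}}},
  // promote the stored document back to the top level
  {$replaceRoot: {newRoot: '$doc'}}
])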
For older versions of MongoDB, you need to individually add each field you need to the $group:
db.test.aggregate([
  {$sort: {viewCount: -1}},
  {$group: {
    _id: '$subcatId',
    doc_id: {$first: '$_id'},
    viewCount: {$first: '$viewCount'}
  }}
])

mongodb aggregation framework - generate _id from function

Is it possible to have a custom function in the _id field in $group? I couldn't make it work although the documentation seems to indicate that the field can be computed.
For example, let's say I have a set of documents having a number field that ranges 1 to 100. I want to classify the number into several buckets e.g. 1-20, 21-40, etc. Then, I will sum/avg a different field with this bucket identifier. So I am trying to do this:
$group : { _id : bucket("$numberfield") , sum: { $sum: "$otherfield" } }
...where bucket is a function that returns a string e.g. "1-20".
That didn't work.
http://docs.mongodb.org/manual/reference/operator/aggregation/group/#pipe._S_group
For this _id field, you can specify various expressions, including a single field from the documents in the pipeline, a computed value from a previous stage, a document that consists of multiple fields, and other valid expressions, such as constant or subdocument fields.
As of MongoDB 2.4, you cannot implement any custom functions in the Aggregation Framework. If you want to $group by one or more fields, you need to add those either through aggregation operators and expressions or via an explicit update() if you don't want to calculate them each time.
Using the Aggregation Framework you can add a computed bucket field in a $project pipeline step with the $cond operator.
Here is an example of calculating ranges based on numberfield that can then be used in a $group pipeline for sum/avg/etc.:
db.data.aggregate([
  { $project: {
    numberfield: 1,
    someotherfield: 1,
    bucket: {
      $cond: [ { $and: [ { $gte: ["$numberfield", 1] }, { $lte: ["$numberfield", 20] } ] }, '1-20', {
        $cond: [ { $lt: ["$numberfield", 41] }, '21-40', {
          $cond: [ { $lt: ["$numberfield", 61] }, '41-60', {
            $cond: [ { $lt: ["$numberfield", 81] }, '61-80', {
              $cond: [ { $lt: ["$numberfield", 101] }, '81-100', '100+' ]
            }]
          }]
        }]
      }]
    }
  }},
  { $group: {
    _id: "$bucket",
    sum: { $sum: "$someotherfield" }
  }}
])
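On MongoDB 3.4 and later, the same bucketing can be expressed more directly with the $bucket stage. A minimal sketch (note that each output _id is the bucket's lower boundary rather than a "1-20" style label, and values outside the boundaries fall into the default bucket):

db.data.aggregate([
  { $bucket: {
    groupBy: "$numberfield",
    boundaries: [1, 21, 41, 61, 81, 101],   // [1,21) -> 1, [21,41) -> 21, ...
    default: "100+",                        // anything below 1 or at/above 101
    output: { sum: { $sum: "$someotherfield" } }
  }}
])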