I have the following query...
db.getCollection('apprenticeships')
.aggregate([
{
$match: {
'Vacancy._id': { $in: [1, 2, 3] },
}
},
{
$group: {
'_id': {
'VacancyId': '$Vacancy._id',
'Status': '$Status'
},
'Count': { $sum: 1 }
}
},
{
$sort: {
'_id.VacancyId': 1,
'_id.Status': 1
}
}
])
This gives results where each element has the following structure:
{
"_id" : {
"VacancyId" : 1,
"Status" : 90
},
"Count" : 40.0
}
How can I remap that structure so that the elements in the output look like this instead?
{
"VacancyId": 1,
"Status": 90,
"Count": 40
}
You can add a $project stage to the aggregation pipeline that adds the new fields VacancyId and Status and then hides _id:
db.getCollection('apprenticeships')
.aggregate([{
$match: {
'Vacancy._id': {
$in: [1, 2, 3]
},
}
},
{
$group: {
'_id': {
'VacancyId': '$Vacancy._id',
'Status': '$Status'
},
'Count': {
$sum: 1
}
}
},
{
$sort: {
'_id.VacancyId': 1,
'_id.Status': 1
}
},
{
$project: { 'VacancyId': '$_id.VacancyId', 'Status': '$_id.Status', 'Count': '$Count', '_id': 0 }
}
])
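To make the effect of the final $project stage concrete, here is a minimal plain JavaScript sketch of the same reshaping applied to one grouped result. The flattenGroupResult helper is a hypothetical name used only for illustration; it is not part of MongoDB.

```javascript
// Hypothetical helper: mirrors what the $project stage does to one
// grouped document, outside of MongoDB.
function flattenGroupResult(doc) {
  return {
    VacancyId: doc._id.VacancyId, // '$_id.VacancyId'
    Status: doc._id.Status,       // '$_id.Status'
    Count: doc.Count              // '$Count'
    // _id is left out, which corresponds to '_id': 0
  };
}

const grouped = { _id: { VacancyId: 1, Status: 90 }, Count: 40 };
const flat = flattenGroupResult(grouped);
console.log(flat);
```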
I wonder if it's possible to do an average over a MongoDb time series aggregate. For instance, an aggregate that gives the average temp for every minute.
My data looks like this:
[
{
"_id": "57fbebf99929a71d305e2bb2",
"temp": 23.77,
"dateTime": "2016-10-10T19:28:57.923Z",
"_dateTime": 1476127737000
},
{
"_id": "57fbebfa9929a71d305e2bb3",
"temp": 27.16,
"dateTime": "2016-10-10T19:28:58.838Z",
"_dateTime": 1476127738000
},
{
"_id": "57fbebff9929a71d305e2bb4",
"temp": 31.93,
"dateTime": "2016-10-10T19:29:03.848Z",
"_dateTime": 1476127743000
}
]
The code (javascript) looks like this so far..
var results = temperatures.aggregate(
[
{ $project : { "timeSpan" : {$add : [new Date(0),"$_dateTime"] } } },
{ $project : { "minuteRead" : { $minute : "$timeSpan" }} },
{
$group : {
_id : {minuteRead : "$minuteRead" },
count : { $sum : 1 }
}
}
],
function(err, result) {
console.log(result);
}
);
With the output of:
[ { _id: { minuteRead: 30 }, count: 7 },
{ _id: { minuteRead: 29 }, count: 12 },
{ _id: { minuteRead: 28 }, count: 2 } ]
But what I'd like to have is:
[ { _id: { minuteRead: 30 }, avgTemp: 17.6 },
{ _id: { minuteRead: 29 }, avgTemp: 18.3 },
{ _id: { minuteRead: 28 }, avgTemp: 20.1 } ]
Is this possible?
Thank you!
The $avg accumulator would do the trick:
$group : {
_id : {minuteRead : "$minuteRead" },
avgTemp : { $avg :"$temp" }
}
Got it!
temperatures.aggregate(
[
{ $project: { temp:'$temp', "timeSpan": { $add: [new Date(0), "$_dateTime"] } }},
{ $project: { "timestamp": { $minute: "$timeSpan" }, temp:'$temp' } },
{
$group: {
_id: { minuteRead: "$timestamp" },
avgTemp : { $avg :"$temp" }
}
}
],
function (err, result) {
if (err)
console.log("ERROR " + err);
else
console.log(result);
}
);
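As a cross-check of what the pipeline computes, the same grouping can be sketched in plain JavaScript on the three sample readings from the question. The function name averageTempPerMinute is hypothetical and exists only for this illustration.

```javascript
// Sample readings taken from the question (only the fields that matter).
const readings = [
  { temp: 23.77, _dateTime: 1476127737000 },
  { temp: 27.16, _dateTime: 1476127738000 },
  { temp: 31.93, _dateTime: 1476127743000 }
];

// Hypothetical helper: groups by minute of the hour and averages temp.
function averageTempPerMinute(docs) {
  const buckets = new Map(); // minute -> { sum, count }
  for (const doc of docs) {
    // $minute works in UTC unless a timezone is given, so use UTC here too.
    const minute = new Date(doc._dateTime).getUTCMinutes();
    const bucket = buckets.get(minute) || { sum: 0, count: 0 };
    bucket.sum += doc.temp;
    bucket.count += 1;
    buckets.set(minute, bucket);
  }
  return [...buckets].map(([minute, b]) => ({
    _id: { minuteRead: minute },
    avgTemp: b.sum / b.count
  }));
}

const averages = averageTempPerMinute(readings);
console.log(averages);
```

Note that grouping on the minute alone, as in the question, merges readings from the same minute of different hours; if that is not wanted, the grouping key would need to include the hour and date as well.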
I am trying to fetch all records (and count of all records) for a structure like the following,
{
id: 1,
level1: {
level2:
[
{
field1:value1;
},
{
field1:value1;
},
]
}
},
{
id: 2,
level1: {
level2:
[
{
field1:null;
},
{
field1:value1;
},
]
}
}
My requirement is to fetch the number of records that have field1 populated (at least once in level2). I need to fetch either all such ids or the number of such ids.
The query I am using is,
db.table.find({},
{
_id = id,
value: {
$elemMatch: {'level1.level2.field1':{$exists: true}}
}
}
})
Please suggest.
EDIT1:
This is the question I was trying to ask in the comment. I was unable to elucidate in the comment properly. Hence, editing the question.
{
id: 1,
level1: {
level2:
[
{
field1:value1;
},
{
field1:value1;
},
]
}
},
{
id: 2,
level1: {
level2:
[
{
field1:value2;
},
{
field1:value2;
},
{
field1:value2;
}
]
}
}
{
id: 3,
level1: {
level2:
[
{
field1:value1;
},
{
field1:value1;
},
]
}
}
The query we used results in
value1: 4
value2: 3
I want something like
value1: 2 // Once each for documents 1 & 3
value2: 1 // Once for document 2
You can do that with the following find query:
db.table.find({ "level1.level2" : { $elemMatch: { field1 : {$exists: true} } } }, {})
This will return all documents that have a field1 in the "level1.level2" structure.
For the question in your comment ("I had to return a grouping (and the corresponding count) for the values in field1"), you can use the following aggregation:
db.table.aggregate(
[
{
$unwind: "$level1.level2"
},
{
$match: { "level1.level2.field1" : { $exists: true } }
},
{
$group: {
_id : "$level1.level2.field1",
count : {$sum : 1}
}
}
]
);
UPDATE: For your question "'value1 - 2' At level2, for a document, assume all values will be the same for field1."
I hope I understand your question correctly. Instead of grouping only on the value of field1, I added the document _id as an extra grouping key:
db.table.aggregate(
[
{
$unwind: "$level1.level2"
},
{
$match: {
"level1.level2.field1" : { $exists: true }
}
},
{
$group: {
_id : { id : "$_id", field1: "$level1.level2.field1" },
count : {$sum : 1}
}
}
]
);
UPDATE2:
I altered the aggregation and added an extra grouping; the aggregation below gives you the results you want.
db.table.aggregate(
[
{
$unwind: "$level1.level2"
},
{
$match: {
"level1.level2.field1" : { $exists: true }
}
},
{
$group: {
_id : { id : "$_id", field1: "$level1.level2.field1" }
}
},
{
$group: {
_id : { id : "$_id.field1"},
count : { $sum : 1}
}
}
]
);
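The two $group stages can be read as "deduplicate (document, value) pairs, then count per value". A minimal plain JavaScript sketch of that logic on the three example documents follows; countDocsPerValue is a hypothetical helper, and the string values stand in for value1 and value2 from the question.

```javascript
const docs = [
  { _id: 1, level1: { level2: [{ field1: "value1" }, { field1: "value1" }] } },
  { _id: 2, level1: { level2: [{ field1: "value2" }, { field1: "value2" }, { field1: "value2" }] } },
  { _id: 3, level1: { level2: [{ field1: "value1" }, { field1: "value1" }] } }
];

// Hypothetical helper mirroring $unwind, $match and the two $group stages.
function countDocsPerValue(documents) {
  // First grouping: keep each (document id, field1) pair only once.
  const pairs = new Set();
  for (const doc of documents) {
    for (const item of doc.level1.level2) { // $unwind
      if ("field1" in item) {               // $exists: true
        pairs.add(JSON.stringify([doc._id, item.field1]));
      }
    }
  }
  // Second grouping: count the remaining pairs per field1 value.
  const counts = {};
  for (const pair of pairs) {
    const [, value] = JSON.parse(pair);
    counts[value] = (counts[value] || 0) + 1;
  }
  return counts;
}

const perValue = countDocsPerValue(docs);
console.log(perValue);
```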
I have used aggregation for fetching records from mongodb.
$result = $collection->aggregate(array(
array('$match' => $document),
array('$group' => array('_id' => '$book_id', 'date' => array('$max' => '$book_viewed'), 'views' => array('$sum' => 1))),
array('$sort' => $sort),
array('$skip' => $skip),
array('$limit' => $limit),
));
If I execute this query without a limit, 10 records are fetched. But I want to keep the limit at 2 and still get the total record count. How can I do that with aggregation? Please advise. Thanks
Since v3.4, MongoDB has an aggregation pipeline stage named $facet which, in the documentation's own words:
Processes multiple aggregation pipelines within a single stage on the same set of input documents. Each sub-pipeline has its own field in the output document where its results are stored as an array of documents.
In this particular case, this means that one can do something like this:
$result = $collection->aggregate([
{ ...execute queries, group, sort... },
{ ...execute queries, group, sort... },
{ ...execute queries, group, sort... },
{
$facet: {
paginatedResults: [{ $skip: skipPage }, { $limit: perPage }],
totalCount: [
{
$count: 'count'
}
]
}
}
]);
The result will be (with for ex 100 total results):
[
{
"paginatedResults":[{...},{...},{...}, ...],
"totalCount":[{"count":100}]
}
]
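One detail worth handling when reading this result: if no documents reach the $count stage, it emits nothing, so totalCount comes back as an empty array rather than [{"count": 0}]. A minimal sketch of building the stage and unpacking the result defensively is shown below; withPagination and unpackPage are hypothetical helper names, not driver functions.

```javascript
// Hypothetical helper: appends a $facet stage that returns one page of
// results together with the total count.
function withPagination(stages, skip, limit) {
  return [
    ...stages,
    {
      $facet: {
        paginatedResults: [{ $skip: skip }, { $limit: limit }],
        totalCount: [{ $count: "count" }]
      }
    }
  ];
}

// Hypothetical helper: unpacks the single document produced by $facet.
// An empty totalCount array is treated as a total of 0.
function unpackPage(facetResult) {
  const [first] = facetResult;
  const total = first.totalCount.length > 0 ? first.totalCount[0].count : 0;
  return { results: first.paginatedResults, total };
}

const pipeline = withPagination([{ $match: { score: { $gt: 70 } } }], 0, 2);
const emptyPage = unpackPage([{ paginatedResults: [], totalCount: [] }]);
const fullPage = unpackPage([{ paginatedResults: [{ a: 1 }], totalCount: [{ count: 100 }] }]);
console.log(JSON.stringify(pipeline), emptyPage, fullPage);
```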
Getting the paginated results and the total number of results in a single query is one of the most commonly asked questions. I can't explain how I felt when I finally achieved it LOL.
$result = $collection->aggregate(array(
array('$match' => $document),
array('$group' => array('_id' => '$book_id', 'date' => array('$max' => '$book_viewed'), 'views' => array('$sum' => 1))),
array('$sort' => $sort),
// get total, AND preserve the results
array('$group' => array('_id' => null, 'total' => array( '$sum' => 1 ), 'results' => array( '$push' => '$$ROOT' ) ) ),
// apply limit and offset
array('$project' => array( 'total' => 1, 'results' => array( '$slice' => array( '$results', $skip, $limit ) ) ) )
));
Result will look something like this:
[
{
"_id": null,
"total": ...,
"results": [
{...},
{...},
{...},
]
}
]
Use this to find total count in resulting collection.
db.collection.aggregate( [
{ $match : { score : { $gt : 70, $lte : 90 } } },
{ $group: { _id: null, count: { $sum: 1 } } }
] );
You can use toArray function and then get its length for total records count.
db.CollectionName.aggregate([....]).toArray().length
Here are some ways to get total records count while doing MongoDB Aggregation:
Using $count:
db.collection.aggregate([
// Other stages here
{ $count: "Total" }
])
For getting 1000 records this takes on average 2 ms and is the fastest way.
Using .toArray():
db.collection.aggregate([...]).toArray().length
For getting 1000 records this takes on average 18 ms.
Using .itcount():
db.collection.aggregate([...]).itcount()
For getting 1000 records this takes on average 14 ms.
Use the $count aggregation pipeline stage to get the total document count:
Query :
db.collection.aggregate(
[
{
$match: {
...
}
},
{
$group: {
...
}
},
{
$count: "totalCount"
}
]
)
Result:
{
"totalCount" : Number of records (some integer value)
}
I did it this way:
db.collection.aggregate([
{ $match : { score : { $gt : 70, $lte : 90 } } },
{ $group: { _id: null, count: { $sum: 1 } } }
] ).map(function(record, index){
print(index);
});
The aggregate returns an array, so just loop over it and take the final index.
Another way of doing it is:
var count = 0 ;
db.collection.aggregate([
{ $match : { score : { $gt : 70, $lte : 90 } } },
{ $group: { _id: null, count: { $sum: 1 } } }
] ).map(function(record, index){
count++
});
print(count);
//const total_count = await User.find(query).countDocuments();
//const users = await User.find(query).skip(+offset).limit(+limit).sort({[sort]: order}).select('-password');
const result = await User.aggregate([
{$match : query},
{$sort: {[sort]:order}},
{$project: {password: 0, avatarData: 0, tokens: 0}},
{$facet:{
users: [{ $skip: +offset }, { $limit: +limit}],
totalCount: [
{
$count: 'count'
}
]
}}
]);
console.log(JSON.stringify(result));
console.log(result[0]);
return res.status(200).json({users: result[0].users, total_count: result[0].totalCount[0].count});
The solution provided by @Divergent does work, but in my experience it is better to have 2 queries:
First, a query that filters and then groups by ID to get the number of filtered elements. Do not sort here, it is unnecessary.
Second, a query that filters, sorts and paginates.
The solution that pushes $$ROOT and uses $slice runs into the 16MB document size limit for large collections. Also, for large collections the two queries together seem to run faster than the single one that pushes $$ROOT. You can run them in parallel as well, so you are limited only by the slower of the two queries (probably the one that sorts).
I have settled on this solution using 2 queries and the aggregation framework (note: I use node.js in this example, but the idea is the same):
var aggregation = [
{
// If you can match fields at the beginning, match as many as possible, as early as possible.
$match: {...}
},
{
// Projection.
$project: {...}
},
{
// Some things you can match only after projection or grouping, so do it now.
$match: {...}
}
];
// Copy the filtering stages of the pipeline - they are the same for both the count of filtered elements and the pagination query.
var aggregationPaginated = aggregation.slice(0);
// Count filtered elements.
aggregation.push(
{
$group: {
_id: null,
count: { $sum: 1 }
}
}
);
// Sort in pagination query.
aggregationPaginated.push(
{
$sort: sorting
}
);
// Paginate.
aggregationPaginated.push(
{
$limit: skip + length
},
{
$skip: skip
}
);
// I use mongoose.
// Get total count.
model.count(function(errCount, totalCount) {
// Count filtered.
model.aggregate(aggregation)
.allowDiskUse(true)
.exec(
function(errFind, documents) {
if (errFind) {
// Errors.
res.status(503);
return res.json({
'success': false,
'response': 'err_counting'
});
}
else {
// Number of filtered elements.
var numFiltered = documents[0].count;
// Filter, sort and paginate.
model.request.aggregate(aggregationPaginated)
.allowDiskUse(true)
.exec(
function(errFindP, documentsP) {
if (errFindP) {
// Errors.
res.status(503);
return res.json({
'success': false,
'response': 'err_pagination'
});
}
else {
return res.json({
'success': true,
'recordsTotal': totalCount,
'recordsFiltered': numFiltered,
'response': documentsP
});
}
});
}
});
});
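The remark about running the two queries in parallel can be sketched with Promise.all. This is only an outline under stated assumptions: runAggregate is a hypothetical function that takes a pipeline and returns a promise of the result array (for example a thin wrapper around a driver or Mongoose call), and a stub stands in for the database so the sketch runs on its own.

```javascript
// Sketch: run the count query and the page query concurrently.
// `runAggregate` is a hypothetical (pipeline) => Promise<Array> function.
async function countAndPage(runAggregate, filterStages, sorting, skip, length) {
  const countPipeline = [
    ...filterStages,
    { $group: { _id: null, count: { $sum: 1 } } }
  ];
  const pagePipeline = [
    ...filterStages,
    { $sort: sorting },
    { $skip: skip },
    { $limit: length }
  ];
  const [countDocs, page] = await Promise.all([
    runAggregate(countPipeline),
    runAggregate(pagePipeline)
  ]);
  // $group emits no document when nothing matched, so guard the lookup.
  const recordsFiltered = countDocs.length > 0 ? countDocs[0].count : 0;
  return { recordsFiltered, response: page };
}

// Demo with a stub in place of a real database call.
const stubAggregate = async (pipeline) =>
  pipeline.some((stage) => "$group" in stage)
    ? [{ _id: null, count: 5 }]
    : [{ id: 1 }, { id: 2 }];

const demo = countAndPage(stubAggregate, [{ $match: {} }], { id: 1 }, 0, 2);
demo.then((out) => console.log(out));
```

The guard on countDocs covers the case where the filter matches nothing, in which case reading documents[0].count directly would throw.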
This could work for multiple match conditions:
const query = [
{
$facet: {
cancelled: [
{ $match: { orderStatus: 'Cancelled' } },
{ $count: 'cancelled' }
],
pending: [
{ $match: { orderStatus: 'Pending' } },
{ $count: 'pending' }
],
total: [
{ $match: { isActive: true } },
{ $count: 'total' }
]
}
},
{
$project: {
cancelled: { $arrayElemAt: ['$cancelled.cancelled', 0] },
pending: { $arrayElemAt: ['$pending.pending', 0] },
total: { $arrayElemAt: ['$total.total', 0] }
}
}
]
Order.aggregate(query, (error, findRes) => {})
I needed the absolute total count after applying the aggregation. This worked for me:
db.mycollection.aggregate([
{
$group: {
_id: { field1: "$field1", field2: "$field2" },
}
},
{
$group: {
_id: null, count: { $sum: 1 }
}
}
])
Result:
{
"_id" : null,
"count" : 57.0
}
If you don't want to group, then use the following method:
db.collection.aggregate( [
{ $match : { score : { $gt : 70, $lte : 90 } } },
{ $count: 'count' }
] );
Here is an example with Pagination, match and sort in mongoose aggregate
const [response] = await Prescribers.aggregate([
{ $match: searchObj },
{ $sort: sortObj },
{
$facet: {
response: [{ $skip: count * page }, { $limit: count }],
pagination: [
{
$count: 'totalDocs',
},
{
$addFields: {
page: page + 1,
totalPages: {
$floor: {
$divide: ['$totalDocs', count],
},
},
},
},
],
},
},
]);
Here count is the limit of each page and page is the page number. Prescribers is the model.
This would return records similar to this:
"data": {
"response": [
{
"_id": "6349308c90e58c6820bbc682",
"foo": "bar"
},
{
"_id": "6349308c90e58c6820bbc682",
"foo": "bar"
},
{
"_id": "6349308c90e58c6820bbc682",
"foo": "bar"
},
{
"_id": "6349308c90e58c6820bbc682",
"foo": "bar"
},
{
"_id": "6349308c90e58c6820bbc682",
"foo": "bar"
},
{
"_id": "6349308c90e58c6820bbc682",
"foo": "bar"
},
{
"_id": "6349308c90e58c6820bbc682",
"foo": "bar"
},
{
"_id": "6349308c90e58c6820bbc682",
"foo": "bar"
},
{
"_id": "6349308c90e58c6820bbc682",
"foo": "bar"
},
{
"_id": "6349308c90e58c6820bbc682",
"foo": "bar"
}
],
"pagination": [
{
"totalDocs": 592438,
"page": 1,
"totalPages": 59243
}
]
}
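The totalPages value in this output comes from $floor, which drops a final partial page. A small arithmetic sketch of the difference follows, assuming a page size of 10 (inferred from the ten records in the sample response, not stated in the answer):

```javascript
const totalDocs = 592438; // from the sample output
const perPage = 10;       // assumption: inferred from the sample response

const floored = Math.floor(totalDocs / perPage); // what $floor yields
const ceiled = Math.ceil(totalDocs / perPage);   // counts the partial page
const leftover = totalDocs % perPage;            // documents on the last page

console.log(floored, ceiled, leftover);
```

If the last partial page should be counted, the $ceil operator could be used in place of $floor in the $addFields stage.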
Sorry, but I think you need two queries. One for total views and another one for grouped records.
You may find this answer useful.
If you need to $match on nested documents, then see this playground:
https://mongoplayground.net/p/DpX6cFhR_mm
db.collection.aggregate([
{
"$unwind": "$tags"
},
{
"$match": {
"$or": [
{
"tags.name": "Canada"
},
{
"tags.name": "ABC"
}
]
}
},
{
"$group": {
"_id": null,
"count": {
"$sum": 1
}
}
}
])
I had to perform a lookup, match and then count the documents received. Here is how I achieved it using mongoose:
ModelName.aggregate([
{
'$lookup': {
'from': 'categories',
'localField': 'category',
'foreignField': '_id',
'as': 'category'
}
}, {
'$unwind': {
'path': '$category'
}
}, {
'$match': {
'category.price': {
'$lte': 3,
'$gte': 0
}
}
}, {
'$count': 'count'
}
]);
Sample Documents:
{ time: ISODate("2013-10-10T20:55:36Z"), value: 1 }
{ time: ISODate("2013-10-10T22:43:16Z"), value: 2 }
{ time: ISODate("2013-10-11T19:12:66Z"), value: 3 }
{ time: ISODate("2013-10-11T10:15:38Z"), value: 4 }
{ time: ISODate("2013-10-12T04:15:38Z"), value: 5 }
It's easy to get aggregated results that are grouped by date.
But what I want is a query that returns a running total of the aggregation, like:
{ time: "2013-10-10", total: 3, runningTotal: 3 }
{ time: "2013-10-11", total: 7, runningTotal: 10 }
{ time: "2013-10-12", total: 5, runningTotal: 15 }
Is this possible with the MongoDB Aggregation?
EDIT: Since MongoDB v5.0 the preferred approach would be to use the new $setWindowFields aggregation stage as shared by Xavier Guihot.
This does what you need. I have normalised the times in the data so they group together. The idea is to $group and push the times and totals into separate arrays. Then $unwind the time array, which gives each time document its own copy of the totals array. You can then calculate the runningTotal (or something like a rolling average) from the array containing all the data for the different times. The 'index' generated by $unwind is the array index of the total corresponding to that time. It is important to $sort before $unwinding since this ensures the arrays are in the correct order.
db.temp.aggregate(
[
{
'$group': {
'_id': '$time',
'total': { '$sum': '$value' }
}
},
{
'$sort': {
'_id': 1
}
},
{
'$group': {
'_id': 0,
'time': { '$push': '$_id' },
'totals': { '$push': '$total' }
}
},
{
'$unwind': {
'path' : '$time',
'includeArrayIndex' : 'index'
}
},
{
'$project': {
'_id': 0,
'time': { '$dateToString': { 'format': '%Y-%m-%d', 'date': '$time' } },
'total': { '$arrayElemAt': [ '$totals', '$index' ] },
'runningTotal': { '$sum': { '$slice': [ '$totals', { '$add': [ '$index', 1 ] } ] } },
}
},
]
);
I have used something similar on a collection with ~80 000 documents, aggregating to 63 results. I am not sure how well it will work on larger collections, but I have found that performing transformations (projections, array manipulations) on aggregated data does not seem to have a large performance cost once the data is reduced to a manageable size.
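The final $project stage relies on three array operations: $arrayElemAt picks the total for the current index, $slice takes the first index + 1 totals, and $sum adds them up. A minimal plain JavaScript sketch of that same arithmetic, using the daily totals from the question, might look like this:

```javascript
// Arrays as they look after the second $group (already sorted by time).
const times = ["2013-10-10", "2013-10-11", "2013-10-12"];
const totals = [3, 7, 5];

// One row per time, mirroring $unwind with includeArrayIndex.
const rows = times.map((time, index) => ({
  time,
  total: totals[index],                 // $arrayElemAt: ['$totals', '$index']
  runningTotal: totals
    .slice(0, index + 1)                // $slice: ['$totals', index + 1]
    .reduce((sum, t) => sum + t, 0)     // $sum over the sliced array
}));

console.log(rows);
```

Each row re-sums a prefix of the array, so the work grows quadratically with the number of groups; that is consistent with the remark that this is fine once the data has been reduced to a manageable size.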
Here is another approach.
Pipeline:
db.col.aggregate([
{$group : {
_id : { time :{ $dateToString: {format: "%Y-%m-%d", date: "$time", timezone: "-05:00"}}},
value : {$sum : "$value"}
}},
{$addFields : {_id : "$_id.time"}},
{$sort : {_id : 1}},
{$group : {_id : null, data : {$push : "$$ROOT"}}},
{$addFields : {data : {
$reduce : {
input : "$data",
initialValue : {total : 0, d : []},
in : {
total : {$sum : ["$$this.value", "$$value.total"]},
d : {$concatArrays : [
"$$value.d",
[{
_id : "$$this._id",
value : "$$this.value",
runningTotal : {$sum : ["$$value.total", "$$this.value"]}
}]
]}
}
}
}}},
{$unwind : "$data.d"},
{$replaceRoot : {newRoot : "$data.d"}}
]).pretty()
collection
> db.col.find()
{ "_id" : ObjectId("4f442120eb03305789000000"), "time" : ISODate("2013-10-10T20:55:36Z"), "value" : 1 }
{ "_id" : ObjectId("4f442120eb03305789000001"), "time" : ISODate("2013-10-11T04:43:16Z"), "value" : 2 }
{ "_id" : ObjectId("4f442120eb03305789000002"), "time" : ISODate("2013-10-12T03:13:06Z"), "value" : 3 }
{ "_id" : ObjectId("4f442120eb03305789000003"), "time" : ISODate("2013-10-11T10:15:38Z"), "value" : 4 }
{ "_id" : ObjectId("4f442120eb03305789000004"), "time" : ISODate("2013-10-13T02:15:38Z"), "value" : 5 }
result
{ "_id" : "2013-10-10", "value" : 3, "runningTotal" : 3 }
{ "_id" : "2013-10-11", "value" : 7, "runningTotal" : 10 }
{ "_id" : "2013-10-12", "value" : 5, "runningTotal" : 15 }
Here is a solution that does not push previous documents into a new array and then process them. (If the array gets too big, you can exceed the 16MB maximum BSON document size.)
Calculating running totals is as simple as:
db.collection1.aggregate(
[
{
$lookup: {
from: 'collection1',
let: { date_to: '$time' },
pipeline: [
{
$match: {
$expr: {
$lt: [ '$time', '$$date_to' ]
}
}
},
{
$group: {
_id: null,
summary: {
$sum: '$value'
}
}
}
],
as: 'sum_prev_days'
}
},
{
$addFields: {
sum_prev_days: {
$arrayElemAt: [ '$sum_prev_days', 0 ]
}
}
},
{
$addFields: {
running_total: {
$sum: [ '$value', '$sum_prev_days.summary' ]
}
}
},
{
$project: { sum_prev_days: 0 }
}
]
)
What we did: within the $lookup we selected all documents with an earlier datetime and immediately calculated their sum (using $group as the second step of the $lookup's pipeline). The $lookup puts that value into the first element of an array. We pull out the first array element and then calculate the sum: current value + sum of previous values.
If you would like to group transactions into days and then calculate running totals, insert a $group at the beginning and also into the $lookup's pipeline:
db.collection1.aggregate(
[
{
$group: {
_id: {
$substrBytes: ['$time', 0, 10]
},
value: {
$sum: '$value'
}
}
},
{
$lookup: {
from: 'collection1',
let: { date_to: '$_id' },
pipeline: [
{
$group: {
_id: {
$substrBytes: ['$time', 0, 10]
},
value: {
$sum: '$value'
}
}
},
{
$match: {
$expr: {
$lt: [ '$_id', '$$date_to' ]
}
}
},
{
$group: {
_id: null,
summary: {
$sum: '$value'
}
}
}
],
as: 'sum_prev_days'
}
},
{
$addFields: {
sum_prev_days: {
$arrayElemAt: [ '$sum_prev_days', 0 ]
}
}
},
{
$addFields: {
running_total: {
$sum: [ '$value', '$sum_prev_days.summary' ]
}
}
},
{
$project: { sum_prev_days: 0 }
}
]
)
The result is:
{ "_id" : "2013-10-10", "value" : 3, "running_total" : 3 }
{ "_id" : "2013-10-11", "value" : 7, "running_total" : 10 }
{ "_id" : "2013-10-12", "value" : 5, "running_total" : 15 }
Starting in MongoDB 5.0, this is a perfect use case for the new $setWindowFields aggregation stage:
// { time: ISODate("2013-10-10T20:55:36Z"), value: 1 }
// { time: ISODate("2013-10-10T22:43:16Z"), value: 2 }
// { time: ISODate("2013-10-11T12:12:66Z"), value: 3 }
// { time: ISODate("2013-10-11T10:15:38Z"), value: 4 }
// { time: ISODate("2013-10-12T05:15:38Z"), value: 5 }
db.collection.aggregate([
{ $group: {
_id: { $dateToString: { format: "%Y-%m-%d", date: "$time" } },
total: { $sum: "$value" }
}},
// e.g.: { "_id" : "2013-10-11", "total" : 7 }
{ $set: { "date": "$_id" } }, { $unset: ["_id"] },
// e.g.: { "date" : "2013-10-11", "total" : 7 }
{ $setWindowFields: {
sortBy: { date: 1 },
output: {
running: {
$sum: "$total",
window: { documents: [ "unbounded", "current" ] }
}
}
}}
])
// { date: "2013-10-10", total: 3, running: 3 }
// { date: "2013-10-11", total: 7, running: 10 }
// { date: "2013-10-12", total: 5, running: 15 }
Let's focus on the $setWindowFields stage, which:
chronologically sorts the grouped documents by date: sortBy: { date: 1 }
adds the running field to each document: output: { running: { ... } }
which is the $sum of the totals: $sum: "$total"
over a specified span of documents (the window),
which in our case is every document up to and including the current one: window: { documents: [ "unbounded", "current" ] }
Here [ "unbounded", "current" ] means the window covers all documents between the first document (unbounded) and the current document (current).
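To illustrate how the window bounds change the result, here is a minimal plain JavaScript sketch of a documents window over already-sorted values. The windowSum helper is hypothetical and only imitates the semantics described above: a bound is "unbounded", "current", or an integer offset from the current position.

```javascript
// Hypothetical helper imitating a `documents` window over sorted values.
function windowSum(values, [lower, upper]) {
  const resolve = (bound, index, edge) => {
    if (bound === "unbounded") return edge; // start or end of the partition
    if (bound === "current") return index;  // the current document
    return index + bound;                   // integer offset from current
  };
  return values.map((_, index) => {
    const from = Math.max(0, resolve(lower, index, 0));
    const to = Math.min(values.length - 1, resolve(upper, index, values.length - 1));
    let sum = 0;
    for (let i = from; i <= to; i++) sum += values[i];
    return sum;
  });
}

const dailyTotals = [3, 7, 5]; // totals already sorted by date

// Running total: everything from the first document up to the current one.
const running = windowSum(dailyTotals, ["unbounded", "current"]);
// For comparison: only the previous and the current document.
const pairwise = windowSum(dailyTotals, [-1, 0]);

console.log(running, pairwise);
```

With [ "unbounded", "current" ] the sums accumulate, while a bounded window such as [ -1, 0 ] gives a moving sum over two documents.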