I have the following MongoDB collection of documents, each containing a field called "history", which contains an array of sub-documents with fields "date" and "points".
[{
history: [{
date: "2019-20-20",
points: 1,
}, {
date: "2019-20-21",
points: 1,
}, {
date: "2019-20-22",
points: 1,
}, {
date: "2019-20-23",
points: 1,
}],
}, {
history: [{
date: "2019-20-20",
points: 1,
}, {
date: "2019-20-21",
points: 2,
}, {
date: "2019-20-22",
points: 3,
}, {
date: "2019-20-23",
points: 4,
}],
}]
I'm not sure of the best way to construct a query that produces the output below. For the following example, the date range (inclusive) is "2019-20-21" to "2019-20-22". "totalPoints" is a new field containing the sum of all the points in the "history" field across that date range.
[{
history: [{
date: "2019-20-20",
points: 1,
}, {
date: "2019-20-21",
points: 1,
}, {
date: "2019-20-22",
points: 1,
}, {
date: "2019-20-23",
points: 1,
}],
totalPoints: 2,
}, {
history: [{
date: "2019-20-20",
points: 1,
}, {
date: "2019-20-21",
points: 2,
}, {
date: "2019-20-22",
points: 3,
}, {
date: "2019-20-23",
points: 4,
}],
totalPoints: 5,
}]
Below is a general idea of what I'm trying to do:
User.aggregate([{
$addFields: {
totalPoints: { $sum: points in "history" field if date range between "2019-20-21" and "2019-20-22" } ,
}
}]);
The reason I want to create a new "totalPoints" field is that eventually I want to sort by it.
For a single pipeline stage, you can combine $reduce with $filter to get the sum as follows:
var startDate = "2019-20-21";
var endDate = "2019-20-22";
User.aggregate([
{ "$addFields": {
"totalPoints": {
"$reduce": {
"input": {
"$filter": {
"input": "$history",
"as": "el",
"cond": {
"$and": [
{ "$gte": ["$$el.date", startDate] },
{ "$lte": ["$$el.date", endDate ] },
]
}
}
},
"initialValue": 0,
"in": { "$add": [ "$$value", "$$this.points" ] }
}
}
} }
]);
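As a sanity check, the $filter + $reduce logic can be mirrored in plain JavaScript against the sample documents (a sketch; the `docs` variable is illustrative, and the string dates compare lexicographically just as they do in the pipeline):

```javascript
// Sample documents from the question (string dates compare lexicographically).
const docs = [
  { history: [
    { date: "2019-20-20", points: 1 },
    { date: "2019-20-21", points: 1 },
    { date: "2019-20-22", points: 1 },
    { date: "2019-20-23", points: 1 },
  ] },
  { history: [
    { date: "2019-20-20", points: 1 },
    { date: "2019-20-21", points: 2 },
    { date: "2019-20-22", points: 3 },
    { date: "2019-20-23", points: 4 },
  ] },
];

const startDate = "2019-20-21";
const endDate = "2019-20-22";

// $filter -> keep elements inside the range; $reduce -> accumulate points.
const withTotals = docs.map((doc) => ({
  ...doc,
  totalPoints: doc.history
    .filter((el) => el.date >= startDate && el.date <= endDate)
    .reduce((value, el) => value + el.points, 0),
}));

console.log(withTotals.map((d) => d.totalPoints)); // [2, 5]
```

This matches the expected totalPoints values of 2 and 5 from the question.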
Another alternative is to use two pipeline stages, starting your aggregation with a filtered array that contains only the elements matching the date range query. Combine $addFields with $filter for this; the filter condition uses the boolean operator $and with the comparison operators $gte and $lte. The following pipeline shows this:
{ "$addFields": {
"totalPoints": {
"$filter": {
"input": "$history",
"cond": {
"$and": [
{ "$gte": ["$$this.date", "2019-20-21"] },
{ "$lte": ["$$this.date", "2019-20-22"] },
]
}
}
}
} },
On getting the filtered array you can then get the sum easily in the next pipeline stage with $sum, so your complete pipeline becomes:
var startDate = "2019-20-21";
var endDate = "2019-20-22";
User.aggregate([
{ "$addFields": {
"totalPoints": {
"$filter": {
"input": "$history",
"cond": {
"$and": [
{ "$gte": ["$$this.date", startDate] },
{ "$lte": ["$$this.date", endDate ] },
]
}
}
}
} },
{ "$addFields": {
"totalPoints": { "$sum": "$totalPoints.points" }
} }
])
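Since the stated goal is to sort on totalPoints, a $sort stage can simply be appended after the second $addFields. The end-to-end effect (filter, sum, then sort descending) can be sketched in plain JavaScript; `docs` stands in for the collection and the `name` field is a hypothetical identifier added for illustration:

```javascript
// "name" is a made-up field just to identify documents in the output.
const docs = [
  { name: "a", history: [{ date: "2019-20-21", points: 1 }, { date: "2019-20-22", points: 1 }] },
  { name: "b", history: [{ date: "2019-20-21", points: 2 }, { date: "2019-20-22", points: 3 }] },
];

const startDate = "2019-20-21";
const endDate = "2019-20-22";

// Stage 1: $filter the history, Stage 2: $sum the points,
// Stage 3: $sort { totalPoints: -1 }.
const sorted = docs
  .map((doc) => ({
    ...doc,
    totalPoints: doc.history
      .filter((el) => el.date >= startDate && el.date <= endDate)
      .reduce((sum, el) => sum + el.points, 0),
  }))
  .sort((a, b) => b.totalPoints - a.totalPoints);

console.log(sorted.map((d) => [d.name, d.totalPoints])); // [["b", 5], ["a", 2]]
```

In the actual pipeline this third step would just be `{ "$sort": { "totalPoints": -1 } }` appended to the stages above.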
Related
I'll explain my problem here and put a tl;dr at the bottom summarizing the question.
We have a collection called apple_receipt, since we have some Apple purchases in our application. Each document has some fields we will use in this aggregation: price, currency, startedAt and history. Price, currency and startedAt are self-explanatory. History is an array of objects, each containing a price and a startedAt. What we are trying to accomplish is a query that, given a date range of our choice, for example 06-06-2022 through 10-10-2022, returns the total combined price of all receipts whose startedAt falls within that range. We have a document like this:
{
price: 12.9,
currency: 'BRL',
startedAt: 2022-08-10T16:23:42.000+00:00
history: [
{
price: 12.9,
startedAt: 2022-05-10T16:23:42.000+00:00
},
{
price: 12.9,
startedAt: 2022-06-10T16:23:42.000+00:00
},
{
price: 12.9,
startedAt: 2022-07-10T16:23:42.000+00:00
}
]
}
If we query between the dates 06-06-2022 and 10-10-2022, we would get a return like this: totalPrice: 38.7 (the total price of the 3 objects whose dates fall inside that range).
I have tried this so far:
AppleReceipt.aggregate([
{
$project: {
price: 1,
startedAt: 1,
currency: 1,
history: 1,
}
},
{
$unwind: {
path: "$history",
preserveNullAndEmptyArrays: true,
}
},
{
$match: {
$or: [
{ startedAt: {$gte: new Date(filters.begin), $lt: new Date(filters.end)} },
]
}
},
{
$group: {
_id: "$_id",
data: { $push: '$$ROOT' },
totalAmountHelper: { $sum: '$history.price' }
}
},
{
$unwind: "$data"
},
{
$addFields: {
totalAmount: { $add: ['$totalAmountHelper', '$data.price'] }
}
}
])
It does bring me the total value, but I couldn't figure out how to make the $match stage take the dates into account, so that only the documents between those dates contribute to the sum.
tl;dr: I want a query that gets the total sum of the prices of all documents whose startedAt is between the dates we choose. It needs to match the dates inside the history field - which is an array of objects - and also the startedAt outside of the history field.
https://mongoplayground.net/p/lOvRbX24QI9
db.collection.aggregate([
{
$set: {
"history_total": {
"$reduce": {
"input": "$history",
"initialValue": 0,
"in": {
$sum: [
{
"$cond": {
"if": {
$and: [
{
$gte: [
{
$dateFromString: {
dateString: "$$this.startedAt"
}
},
new Date("2022-06-06")
]
},
{
$lt: [
{
$dateFromString: {
dateString: "$$this.startedAt"
}
},
new Date("2022-10-10")
]
},
]
},
"then": "$$this.price",
"else": 0
}
},
"$$value",
]
}
}
}
}
},
{
$set: {
"history_total": {
"$sum": [
"$price",
"$history_total"
]
}
}
}
])
Result:
[
{
"_id": ObjectId("5a934e000102030405000000"),
"currency": "BRL",
"history": [
{
"price": 12.9,
"startedAt": "2022-05-10T16:23:42.000+00:00"
},
{
"price": 12.9,
"startedAt": "2022-06-10T16:23:42.000+00:00"
},
{
"price": 12.9,
"startedAt": "2022-07-10T16:23:42.000+00:00"
}
],
"history_total": 38.7,
"price": 12.9,
"startedAt": "2022-08-10T16:23:42.000+00:00"
}
]
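A plain-JavaScript rendering of the two $set stages, applied to the sample receipt from the question with the range 2022-06-06 to 2022-10-10 (a sketch of the pipeline's arithmetic, not a MongoDB call):

```javascript
const receipt = {
  price: 12.9,
  currency: "BRL",
  startedAt: "2022-08-10T16:23:42.000+00:00",
  history: [
    { price: 12.9, startedAt: "2022-05-10T16:23:42.000+00:00" },
    { price: 12.9, startedAt: "2022-06-10T16:23:42.000+00:00" },
    { price: 12.9, startedAt: "2022-07-10T16:23:42.000+00:00" },
  ],
};

const from = new Date("2022-06-06");
const to = new Date("2022-10-10");

// First $set: $reduce over history, adding the price only when
// the item's startedAt falls inside the range.
let historyTotal = receipt.history.reduce((value, item) => {
  const started = new Date(item.startedAt);
  return value + (started >= from && started < to ? item.price : 0);
}, 0);

// Second $set: add the top-level price.
historyTotal += receipt.price;

console.log(historyTotal.toFixed(1)); // "38.7"
```

Only the 2022-06-10 and 2022-07-10 history entries plus the top-level price contribute, giving the 38.7 the question expects.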
Kudos goes to @user20042973
I have a lists of records like below
[
{
"product": "p1",
"salesdate": "2020-02-01",
"amount": 100
},
{
"product": "p2",
"salesdate": "2020-02-04",
"amount": 200
},
]
On 2nd Feb and 3rd Feb I don't have data, but I need those dates in my result. My expected result is:
[
{
"amount": 100,
"salesdate": "2020-02-01"
},
{
"amount": 0,
"salesdate": "2020-02-02"
},
{
"amount": 0,
"salesdate": "2020-02-03"
},
{
"amount": 200,
"salesdate": "2020-02-04"
}
]
Can I achieve this using MongoDB?
https://mongoplayground.net/p/EiAJdY9jRHn
You can use $reduce for it. Whenever you have to work with date/time values, I recommend the moment.js library. You don't have to use it, but it makes your life easier.
db.collection.aggregate([
// Put all data into one document
{ $group: { _id: null, data: { $push: "$$ROOT" } } },
// Add missing days
{
$addFields: {
data: {
$reduce: {
// Define the range of date
input: { $range: [0, moment().get('date')] },
initialValue: [],
in: {
$let: {
vars: {
ts: {
$add: [moment().startOf('month').toDate(), { $multiply: ["$$this", 1000 * 60 * 60 * 24] }]
}
},
in: {
$concatArrays: [
"$$value",
[{
$ifNull: [
{ $first: { $filter: { input: "$data", cond: { $eq: ["$$this.salesdate", "$$ts"] } } } },
// Default value for missing days
{ salesdate: "$$ts", amount: 0 }
]
}]
]
}
}
}
}
}
}
},
{ $unwind: "$data" },
{ $replaceRoot: { newRoot: "$data" } }
// If required add further $group stages
])
Note, this code returns values from the first day of the current month to the current day (not 2020 as in your sample data). You may adapt the ranges - your requirements are not clear from the question.
Mongo Playground
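For a fixed range such as the question's 2020-02-01 through 2020-02-04, the same densification idea can be sketched in plain JavaScript without moment.js (the in-database version keeps this inside $reduce as above; `records` is illustrative):

```javascript
const records = [
  { product: "p1", salesdate: "2020-02-01", amount: 100 },
  { product: "p2", salesdate: "2020-02-04", amount: 200 },
];

const start = new Date("2020-02-01");
const days = 4;
const oneDay = 1000 * 60 * 60 * 24;

// For each day in the range, take the matching record
// or fall back to a zero-amount default (the $ifNull step).
const filled = Array.from({ length: days }, (_, i) => {
  const d = new Date(start.valueOf() + i * oneDay).toISOString().slice(0, 10);
  const found = records.find((r) => r.salesdate === d);
  return { salesdate: d, amount: found ? found.amount : 0 };
});

console.log(filled.map((f) => f.amount)); // [100, 0, 0, 200]
```

This produces the four entries from the expected result, with 2020-02-02 and 2020-02-03 filled in at amount 0.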
This is my document:
"calendar": {
"_id": "5cd26a886458720f7a66a3b8",
"hotel": "5cd02fe495be1a4f48150447",
"calendar": [
{
"_id": "5cd26a886458720f7a66a413",
"date": "1970-01-01T00:00:00.001Z",
"rooms": [
{
"_id": "5cd26a886458720f7a66a415",
"room": "5cd17d82ca56fe43e24ae5d3",
"price": 10,
"remaining": 8,
"reserved": 0
},
{
"_id": "5cd26a886458720f7a66a414",
"room": "5cd17db6ca56fe43e24ae5d4",
"price": 12,
"remaining": 8,
"reserved": 0
},
{
"_id": "5cd26a886458720f7a66a34",
"room": "5cd17db6ca45fe43e24ae5e7",
"price": 0,
"remaining": 0,
"reserved": 0
}
]
}
]
}
and this is my schema:
const calendarSchema = mongoose.Schema({
hotel: {
type: mongoose.Schema.ObjectId,
ref: "Hotel",
required: true
},
city: {
type: mongoose.Schema.ObjectId,
ref: "City"
},
calendar: [
{
date: Date,
rooms: [
{
room: {
type: mongoose.Schema.ObjectId,
ref: "Room",
required: true
},
price: {
type: Number
},
remaining: {
type: Number
},
reserved: {
type: Number
}
}
]
}
]
});
First of all, as you can see, my calendar stores a hotelId and a cityId and contains an inner calendar array of objects. Nothing fancy here. The query has two conditions:
1. Keep only the calendar entries whose dates fall between startDate and endDate.
2. Within those entries, only show rooms whose price and remaining are non-zero.
After applying these conditions, the query must return only the rooms that match my filter.
I tried the following query, but the outcome is not the result I want:
db.calendars.find({
'calendar': {
'$elemMatch': {
date: {
'$lt': ISODate("2019-05-09T09:37:24.005Z"),
'$lt': ISODate("2019-06-05T09:37:24.005Z")
},
"rooms.$.price": { '$gt': 0 },
"rooms.$.remaining": { '$gt': 0 }
}
}
})
Unfortunately this is not as easy as you describe; it cannot be done with just a find, assuming you want to project ONLY (and all) the rooms that match.
However with an aggregate this is possible, it would look like this:
db.calendars.aggregate([
{
$project:
{
"rooms": {
$filter: {
input: {
"$map": {
"input": "$calendar",
"as": "cal",
"in": {
"$cond": [
{
$and: [{$gt: ["$$cal.date", ISODate("2019-05-09T09:37:24.005Z")]},
{$lt: ["$$cal.date", ISODate("2019-06-05T09:37:24.005Z")]},]
},
{
"rooms": {
"$filter": {
"input": "$$cal.rooms",
"as": "room",
"cond": {
$and: [{"$gt": ["$$room.price", 0]},
{"$gt": ["$$room.remaining", 0]}]
}
}
},
date: "$$cal.date"
},
null
]
}
},
},
as: 'final',
cond: {$size: {$ifNull: ["$$final.rooms", []]}}
}
},
}
},
{
$match: {
"rooms.0": {$exists: true}
}
}
])
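The $map/$filter combination above can be mirrored in plain JavaScript to show what it keeps (a sketch over a trimmed version of the sample calendar; plain Date objects and made-up room ids stand in for ISODate and ObjectId):

```javascript
const calendar = [
  {
    date: new Date("2019-05-20T00:00:00Z"), // inside the range
    rooms: [
      { room: "r1", price: 10, remaining: 8 },
      { room: "r2", price: 0, remaining: 0 },
    ],
  },
  {
    date: new Date("2019-07-01T00:00:00Z"), // outside the range
    rooms: [{ room: "r3", price: 12, remaining: 8 }],
  },
];

const fromD = new Date("2019-05-09T09:37:24.005Z");
const toD = new Date("2019-06-05T09:37:24.005Z");

// Inner $map: keep entries in the date range, with rooms filtered
// to price > 0 && remaining > 0; everything else becomes null.
// Outer $filter: drop entries that are null or whose rooms ended up empty.
const result = calendar
  .map((cal) =>
    cal.date > fromD && cal.date < toD
      ? { date: cal.date, rooms: cal.rooms.filter((r) => r.price > 0 && r.remaining > 0) }
      : null
  )
  .filter((cal) => cal && cal.rooms.length > 0);

console.log(result.length, result[0].rooms.map((r) => r.room)); // 1 [ 'r1' ]
```

Only the in-range entry survives, and within it only the room with non-zero price and remaining.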
UserActivity.aggregate([
{
$match: {
user_id: {$in: user_id},
"tracker_id": {$in:getTrackerId},
date: { $gte: req.app.locals._mongo_date(req.params[3]),$lte: req.app.locals._mongo_date(req.params[4]) }
}
},
{ $sort: {date: 1 } },
{ $unwind: "$duration" },
{
$group: {
_id: {
tracker: '$tracker_id',
$last:"$duration",
year:{$year: '$date'},
month: {$month: '$date'},
day: {$dayOfMonth: '$date'}
},
resultData: {$sum: "$duration"}
}
},
{
$group: {
_id: {
year: "$_id.year",
$last:"$duration",
month:"$_id.month",
day: "$_id.day"
},
resultData: {
$addToSet: {
tracker: "$_id.tracker",
val: "$resultData"
}
}
}
}
], function (err,tAData) {
tAData.forEach(function(key){
console.log(key);
});
});
I got output from this collection
{ _id: { year: 2015, month: 11, day: 1 },
resultData:[ { tracker: 55d2e6b043d77c0877105397, val: 60 },
{ tracker: 55d2e6b043d77c0877105397, val: 75 },
{ tracker: 55d2e6b043d77c0877105397, val: 25 },
{ tracker: 55d2e6b043d77c0877105397, val: 21 } ] }
{ _id: { year: 2015, month: 11, day: 2 },
resultData:[ { tracker: 55d2e6b043d77c0877105397, val: 100 },
{ tracker: 55d2e6b043d77c0877105397, val: 110 },
{ tracker: 55d2e6b043d77c0877105397, val: 40 },
{ tracker: 55d2e6b043d77c0877105397, val: 45 } ] }
But I need this output from the collection instead - I want to fetch the last two records from each group:
{ _id: { year: 2015, month: 11, day: 1 },
resultData:[ { tracker: 55d2e6b043d77c0877105397, val: 25 },
{ tracker: 55d2e6b043d77c0877105397, val: 21 } ] }
{ _id: { year: 2015, month: 11, day: 2 },
resultData:[ { tracker: 55d2e6b043d77c0877105397, val: 40 },
{ tracker: 55d2e6b043d77c0877105397, val: 45 } ] }
You have clear syntax errors in your $group statement with $last, as that is not valid usage there, but I suspect this has something to do with what you are "trying" to do rather than what you are using to get your actual result.
Getting a result with the "best n values" is a bit of a problem for the aggregation framework. There is this recent answer from myself with a longer explanation of the basic case, but it all boils down to the aggregation framework lacking the basic tools to do the "limited" grouping per grouping key that you want.
Doing it badly
The horrible way to approach this is very "iterative" in the number of results you want to return. It basically means pushing everything into an array, then using operators like $first (after sorting in reverse) to take the top result off the stack, then "filtering" that result out of the array (think an array pop or shift operation) and repeating to get the next one.
Basically this with a 2 iteration example:
UserActivity.aggregate(
[
{ "$match": {
"user_id": { "$in": user_id },
"tracker_id": { "$in": getTrackerId },
"date": {
"$gte": startDate,
"$lt": endDate
}
}},
{ "$unwind": "$duration" },
{ "$group": {
"_id": {
"tracker_id": "$tracker_id",
"date": {
"$add": [
{ "$subtract": [
{ "$subtract": [ "$date", new Date(0) ] },
{ "$mod": [
{ "$subtract": [ "$date", new Date(0) ] },
1000 * 60 * 60 * 24
]}
]},
new Date(0)
]
},
"val": { "$sum": "$duration" }
}
}},
{ "$sort": { "_id": 1, "val": -1 } },
{ "$group": {
"_id": "$_id.date",
"resultData": {
"$push": {
"tracker_id": "$_id.tracker_id",
"val": "$val"
}
}
}},
{ "$unwind": "$resultData" },
{ "$group": {
"_id": "$_id",
"last": { "$first": "$resultData" },
"resultData": { "$push": "$resultData" }
}},
{ "$unwind": "$resultData" },
{ "$redact": {
"if": { "$eq": [ "$resultData", "$last" ] },
"then": "$$PRUNE",
"else": "$$KEEP"
}},
{ "$group": {
"_id": "$_id",
"last": { "$first": "$last" },
"secondLast": { "$first": "$resultData" }
}},
{ "$project": {
"resultData": {
"$map": {
"input": [0,1],
"as": "index",
"in": {
"$cond": {
"if": { "$eq": [ "$$index", 0 ] },
"then": "$last",
"else": "$secondLast"
}
}
}
}
}}
],
function (err,tAData) {
console.log(JSON.stringify(tAData,undefined,2))
}
);
This also simplifies your date inputs to startDate and endDate as predetermined Date object values before the pipeline code. But the principles here show this is not a performant or very scalable approach, mostly because everything must be pushed into an array and then manipulated just to extract the values.
Doing it better
A much better approach is to send an aggregation query to the server for each date in the range, as date is what you want as the eventual key. Since you only return each "key" at once, it is easy to just apply $limit to restrict the response.
The ideal case is to perform these queries in parallel and then combine them. Fortunately the node async library provides an async.map or specifically async.mapLimit which performs this function exactly:
N.B. You don't want async.mapSeries for the best performance, since queries would be "serially executed in order", meaning only one operation occurs on the server at a time. The results come back array-ordered, but it will take longer. A client-side sort makes more sense here.
var dates = [],
returnLimit = 2,
OneDay = 1000 * 60 * 60 * 24;
// Produce an array for each date in the range
for (
var myDate = startDate;
myDate < endDate;
myDate = new Date( myDate.valueOf() + OneDay )
) {
dates.push(myDate);
}
async.mapLimit(
dates,
10,
function (date,callback) {
UserActivity.aggregate(
[
{ "$match": {
"user_id": { "$in": user_id },
"tracker_id": { "$in": getTrackerId },
"date": {
"$gte": date,
"$lt": new Date( date.valueOf() + OneDay )
}
}},
{ "$unwind": "$duration" },
{ "$group": {
"_id": {
"tracker_id": "$tracker_id",
"date": {
"$add": [
{ "$subtract": [
{ "$subtract": [ "$date", new Date(0) ] },
{ "$mod": [
{ "$subtract": [ "$date", new Date(0) ] },
OneDay
]}
]},
new Date(0)
]
},
"val": { "$sum": "$duration" }
}
}},
{ "$sort": { "_id": 1, "val": -1 } },
{ "$limit": returnLimit },
{ "$group": {
"_id": "$_id.date",
"resultData": {
"$push": {
"tracker_id": "$_id.tracker_id",
"val": "$val"
}
}
}}
],
function (err,result) {
callback(err,result[0]);
}
);
},
function (err,results) {
if (err) throw err;
results.sort(function (a,b) {
return a._id - b._id;
});
console.log( JSON.stringify( results, undefined, 2 ) );
}
);
Now that is a much cleaner listing and a lot more efficient and scalable than the first approach. By issuing one aggregation per single date and then combining the results, the "limit" there allows up to 10 queries to execute on the server at the same time (tune to your needs), ultimately returning a singular response.
Since these are "async" and not performed in series ( the best performance option ) then you just need to sort the returned array as is done in the final block:
results.sort(function (a,b) {
return a._id - b._id;
});
And then everything is ordered as you would expect in the response.
Forcing the aggregation pipeline to do this where it really is not necessary is a sure path to code that will fail in the future if it does not already do so now. Parallel query operations and combining the results "just makes sense" for efficient and scalable output.
Also note that you should not use $lte with range selections on dates. Even if you have thought it through, the better approach is a "startDate" paired with an "endDate" that is the start of the next "whole day" after the range you want. This makes a cleaner selection than trying to express "the last second of the day".
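The "next whole day" end bound can be derived like this (a small sketch; setUTCDate handles month rollover automatically):

```javascript
// Given any endDate within the last day you want included,
// compute the start of the following UTC day and query with $gte/$lt.
function nextWholeDay(d) {
  const end = new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));
  end.setUTCDate(end.getUTCDate() + 1);
  return end;
}

const lastDayWanted = new Date("2015-11-30T18:45:00Z");
console.log(nextWholeDay(lastDayWanted).toISOString()); // "2015-12-01T00:00:00.000Z"

// Query shape: { date: { $gte: startDate, $lt: nextWholeDay(lastDayWanted) } }
```

Using $gte with $lt against that boundary cleanly includes every moment of the last day without edge cases around milliseconds.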
I am struggling with MongoDb in order to achieve a desirable result.
My Collection looks like:
{
_id: ...
place: 1
city: 6
user: 306
createDate: 2014-08-10 12:20:21,
lastUpdate: 2014-08-14 10:11:01,
data: [
{
customId4: 4,
entryDate: 2014-07-12 12:01:11,
exitDate: 2014-07-12 13:12:12
},
{
customId4: 4,
entryDate: 2014-07-14 00:00:01,
},
{
customId4: 5,
entryDate: 2014-07-15 11:01:11,
exitDate: 2014-07-15 11:05:15
},
{
customId4: 5,
entryDate: 2014-07-22 21:01:11,
exitDate: 2014-07-22 21:23:22
},
{
customId4: 4,
entryDate: 2014-07-23 14:00:11,
},
{
customId4: 4,
entryDate: 2014-07-29 22:00:11,
exitDate: 2014-07-29 23:00:12
},
{
customId4: 5,
entryDate: 2014-08-12 12:01:11,
exitDate: 2014-08-12 13:12:12
},
]
}
So what I would like to achieve is to get the data array entries that fall within a certain interval and that have both entryDate and exitDate set.
For example, if I filter by the interval "2014-07-23 00:00:00 to 2014-08-31 00:00:00" I would like a result like:
{
result: [
{
_id: {
place: 1,
user: 306
},
city: 6,
place: 1,
user: 306,
data: [
{
customId4: 5,
entryDate: 2014-07-22 21:01:11,
exitDate: 2014-07-22 21:23:22
},
{
customId4: 4,
entryDate: 2014-07-29 22:00:11,
exitDate: 2014-07-29 23:00:12
},
]
}
],
ok: 1
}
My custom MongoDB query looks like this (from, to and placeIds are variables properly configured):
db.myColl.aggregate(
{ $match: {
'user': 1,
'data.entryDate': { $gte: from, $lte: to },
'place': { $in: placeIds },
}},
{ $unwind : "$data" },
{ $project: {
'city': 1,
'place': 1,
'user': 1,
'lastUpdate': 1,
'data.entryDate': 1,
'data.exitDate': 1,
'data.custom': 1,
fromValid: { $gte: ["$'data.entryDate'", from]},
toValid: { $lte: ["$'data.entryDate'", to]}}
},
{ $group: {
'_id': {'place': '$place', 'user': '$user'},
'city': {'$first': '$city'},
'place': {'$first': '$place'},
'user': {'$first': '$user'},
'data': { '$push': '$data'}
}}
)
But this doesn't filter the way I want: it outputs every document that meets the $match conditions, and inside the $project stage I am unable to express the per-element condition (I don't know if this is how it has to be done in MongoDB).
Thanks in advance!
You were on the right track, but what you might be missing about the aggregation "pipeline" is that, just like the "|" pipe operator in the unix shell, you "chain" pipeline stages together just as you would chain commands.
So in fact you can have a second $match pipeline stage that does the filtering for you:
db.myColl.aggregate([
{ "$match": {
"user": 1,
"data.entryDate": { "$gte": from, "$lte": to },
"place": { "$in": placeIds },
}},
{ "$unwind": "$data" },
{ "$match": {
"data.entryDate": { "$gte": from, "$lte": to },
}},
{ "$group": {
"_id": "$_id",
"place": { "$first": "$place" },
"city": { "$first": "$city" },
"user": { "$first": "$user" },
"data": { "$push": "$data" }
}}
])
Using the actual _id of the document as a grouping key presuming that you want the document back but just with a filtered array.
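In plain JavaScript terms, the $unwind / second $match / $group-with-$push sequence collapses to filtering the array per document (a sketch with illustrative from/to values and a trimmed document):

```javascript
const doc = {
  _id: 1,
  place: 1,
  city: 6,
  user: 306,
  data: [
    { customId4: 5, entryDate: new Date("2014-07-22T21:01:11Z"), exitDate: new Date("2014-07-22T21:23:22Z") },
    { customId4: 4, entryDate: new Date("2014-07-29T22:00:11Z"), exitDate: new Date("2014-07-29T23:00:12Z") },
    { customId4: 5, entryDate: new Date("2014-08-12T12:01:11Z"), exitDate: new Date("2014-08-12T13:12:12Z") },
  ],
};

const from = new Date("2014-07-23T00:00:00Z");
const to = new Date("2014-08-31T00:00:00Z");

// $unwind -> per-element $match -> $group with $push amounts to:
const regrouped = {
  _id: doc._id,
  place: doc.place,
  city: doc.city,
  user: doc.user,
  data: doc.data.filter((d) => d.entryDate >= from && d.entryDate <= to),
};

console.log(regrouped.data.map((d) => d.customId4)); // [4, 5]
```

The document comes back whole, just with the data array reduced to the in-range entries.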
From MongoDB 2.6, as long as your matching array elements are unique, you could do the same thing within $project using the $map and $setDifference operators:
db.myColl.aggregate([
{ "$match": {
"user": 1,
"data.entryDate": { "$gte": from, "$lte": to },
"place": { "$in": placeIds },
}},
{ "$project": {
"place": 1,
"city": 1,
"user": 1,
"data": {
"$setDifference": [
{ "$map": {
"input": "$data",
"as": "el",
"in": {"$cond": [
{ "$and": [
{ "$gte": [ "$$el.entryDate", from ] },
{ "$lte": [ "$$el.entryDate", to ] }
]},
"$$el",
false
]}
}},
[false]
]
}
}}
])
That does the same logical thing by processing each array element and evaluating whether it meets the conditions. If so, the element content is returned; if not, false is returned. The $setDifference filters out all the false values so that only those that match remain.
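The element-or-false trick translates directly to plain JavaScript (a sketch; the set difference against [false] is just "drop the false placeholders", which is also why the elements must be unique - $setDifference has set semantics and would deduplicate repeats):

```javascript
const data = [
  { customId4: 5, entryDate: new Date("2014-07-22T21:01:11Z") },
  { customId4: 4, entryDate: new Date("2014-07-29T22:00:11Z") },
];

const from = new Date("2014-07-23T00:00:00Z");
const to = new Date("2014-08-31T00:00:00Z");

// $map: the element itself when it matches, false otherwise.
const mapped = data.map((el) => (el.entryDate >= from && el.entryDate <= to ? el : false));

// $setDifference with [false]: remove the false placeholders.
const kept = mapped.filter((el) => el !== false);

console.log(kept.map((el) => el.customId4)); // [4]
```

Only the 2014-07-29 entry survives, matching what the $map/$setDifference pipeline would keep for this interval.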