Search date in array in MongoDB

I have many documents like this one:
{
    "_id" : ObjectId("54a94200aa76d3db6cd51977"),
    "URL" : "http://...",
    "Statistics" : [
        {
            "Date" : ISODate("2010-05-18T18:07:29.000+0000"),
            "Clicks" : NumberInt(250)
        },
        {
            "Date" : ISODate("2010-05-21T12:06:41.000+0000"),
            "Clicks" : NumberInt(165)
        },
        {
            "Date" : ISODate("2010-05-30T08:37:50.000+0000"),
            "Clicks" : NumberInt(263)
        }
    ]
}
My query looks like this:
db.clicks.aggregate([
    { $match : { 'Statistics.Date' : {
        $gte: new Date("2010-05-18T00:00:00.000Z"),
        $lte: new Date("2010-05-18T23:59:59.999Z") } } },
    { $unwind : '$Statistics' },
    { $group : {
        _id : { year : { $year : '$Statistics.Date' },
                month : { $month : '$Statistics.Date' },
                day : { $dayOfMonth : '$Statistics.Date' } },
        Clicks : { $sum : '$Statistics.Clicks' } } },
    { $sort : { _id : 1 } }
])
When I try to sum up the clicks for a specific date, it gives me all dates instead of only one. What am I doing wrong? Thanks in advance.
Edit 1:
As there are more than 80,000 documents in that collection, I can't do a $unwind before the $match. As far as I know, that would also be a bad idea, because it would make the query slower than necessary.
The huge number of documents and the amount of data in them is the reason why I have to use $sum. The document above is just an example; only the structure is the same as in my project.
The above query gives me back something like this:
{
"_id" : [
{
"year" : 2010,
"month" : 5,
"day" : 18
}
],
"Clicks" : 250
},
{
"_id" : [
{
"year" : 2010,
"month" : 4,
"day" : 21
}
],
"Clicks" : 165
},
{
"_id" : [
{
"year" : 2010,
"month" : 5,
"day" : 30
}
],
"Clicks" : 263
}
If I don't use $group, I also have to use $limit, as the result would exceed 16 MB otherwise:
db.clicks.aggregate([
    { $match : { 'Statistics.Date' : {
        $gte: new Date("2010-05-18T00:00:00.000Z"),
        $lte: new Date("2010-05-18T23:59:59.999Z") } } },
    { $unwind : '$Statistics' },
{ $limit : 1 }
])
This results in:
{
"_id" : ObjectId("54a94200aa76d3db6cd51977"),
"URL" : "http://...",
"Statistics" : {
"Date" : {
"sec" : 1274166878,
"usec" : 0
},
"Clicks" : 250
}
}
For performance reasons I have to use $group; not using it is not an option.
As I did all of this in PHP, there may be some errors in the documents, queries and results I mentioned. Hopefully this won't be a problem. I still haven't figured out what's causing my problem. Can anyone help me?
Edit 2:
As this seems to be a performance issue which can't be solved, I'm migrating all the data from the 'Statistics' array into its own collection. Thanks to everyone for your help.

You need to run your $match twice, both before and after the $unwind:
db.clicks.aggregate([
{ $match : { 'Statistics.Date' : {
$gte: new ISODate("2010-05-18T00:00:00.000Z"),
$lte: new ISODate("2010-05-18T23:59:59.999Z") } } },
{ $unwind: '$Statistics' },
{ $match : { 'Statistics.Date' : {
$gte: new ISODate("2010-05-18T00:00:00.000Z"),
$lte: new ISODate("2010-05-18T23:59:59.999Z") } } },
{ $group : {
_id : { year : { $year : '$Statistics.Date' },
month : { $month : '$Statistics.Date' },
day : { $dayOfMonth : '$Statistics.Date' } },
Clicks : { $sum : '$Statistics.Clicks' } } },
{ $sort : { _id : 1 } }
])
The first $match is used to select the documents with at least one Statistics element in the right date range. The second one is used to filter out the other Statistics elements of those docs that aren't in the right date range.
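As an aside (an assumption about the deployment, not part of the original answer): with more than 80,000 documents, a multikey index on the embedded date field lets that first $match use an index rather than scan the whole collection. A minimal sketch:
// Index the embedded date field used by the initial $match stage
db.clicks.createIndex({ "Statistics.Date" : 1 })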

Things may have been solved already, but I'm posting an answer for those who come to this question seeking help.
{ $match : { 'Statistics.Date' : { $gte: new Date("2010-05-18T00:00:00.000Z"),
                                   $lte: new Date("2010-05-18T23:59:59.999Z") } } }
This $match filters the top-level documents, but what you want is to filter the subdocuments inside the Statistics array.
Documents that pass this $match still contain the full Statistics array, so unwinding after it can produce Statistics subdocuments that are outside the date range themselves but whose siblings (elements of the same array) satisfied the $match condition.
Note: a simple find with the positional projection will filter the array too, e.g.
db.col_name.find({ 'Statistics.Date' : { $gte: new Date("2010-05-18T00:00:00.000Z"), $lte: new Date("2010-05-18T23:59:59.999Z") } }, { 'Statistics.$' : 1 })
but it only returns the first matching array element, and $project in the aggregation pipeline does not help with filtering an array of subdocuments.
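On MongoDB 3.2 and newer, however, the $filter operator can do this filtering inside a $project stage. A minimal sketch under that assumption, using the field names and dates from the question:
db.clicks.aggregate([
    { $match : { 'Statistics.Date' : {
        $gte: ISODate("2010-05-18T00:00:00.000Z"),
        $lte: ISODate("2010-05-18T23:59:59.999Z") } } },
    // keep only the array elements that fall inside the date range
    { $project : { Statistics : { $filter : {
        input : "$Statistics",
        as : "stat",
        cond : { $and : [
            { $gte : [ "$$stat.Date", ISODate("2010-05-18T00:00:00.000Z") ] },
            { $lte : [ "$$stat.Date", ISODate("2010-05-18T23:59:59.999Z") ] } ] } } } } },
    { $unwind : '$Statistics' },
    { $group : {
        _id : { year : { $year : '$Statistics.Date' },
                month : { $month : '$Statistics.Date' },
                day : { $dayOfMonth : '$Statistics.Date' } },
        Clicks : { $sum : '$Statistics.Clicks' } } },
    { $sort : { _id : 1 } }
])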

Related

Problem in using indexes in aggregation pipeline

I have a query like this
db.UserPosts.aggregate([
{ "$match" : { "Posts.DateTime" : { "$gte" : ISODate("2018-09-04T11:50:58Z"), "$lte" : ISODate("2018-09-05T11:50:58Z") } } },
{ "$match" : { "UserId" : { "$in" : [NUUID("aaaaaaaa-cccc-dddd-dddd-5369b183cccc"), NUUID("vvvvvvvv-bbbb-ffff-cccc-e0af0c8acccc")] } } },
{ "$project" : { "_id" : 0, "UserId" : 1, "Posts" : 1 } },
{ "$unwind" : "$Posts" },
{ "$unwind" : "$Posts.Comments" },
{ "$sort" : {"Posts.DateTime" : -1} },
{ "$skip" : 0 }, { "$limit" : 20 },
{ "$project" : { "_id" : 0, "UserId" : 1, "DateTime" : "$Posts.DateTime", "Title" : "$Posts.Title", "Type" : "$Posts.Comments.Type", "Comment" : "$Posts.Comments.Description" } },
],{allowDiskUse:true})
I have a compound index
{
"Posts.DateTime" : -1,
"UserId" : 1
}
Posts and Comments are arrays of objects.
I've tried different types of indexes, but the problem is that my index is not used in the $sort stage. I changed the position of the $sort stage but wasn't successful. The index seems to be used for $match but not for $sort. I even tried two simple indexes on those fields, and a combination of the two simple indexes with one compound index, but none of them worked.
I also read the related documentation on the MongoDB website:
Compound Indexes
Use Indexes to Sort Query Results
Index Intersection
Aggregation Pipeline Optimization
Could somebody please help me to find the solution?
I solved this problem by changing my data model and moving DateTime to a higher level in the document.
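What such a remodel could look like, as a minimal sketch (the document layout, the Posts collection name and the sample values are hypothetical, not the original schema): one document per post with DateTime at the top level, so an index prefixed by DateTime can serve both the initial $match and the $sort.
// Hypothetical per-post document with DateTime at the top level
{
    "UserId" : NUUID("aaaaaaaa-cccc-dddd-dddd-5369b183cccc"),
    "DateTime" : ISODate("2018-09-04T12:00:00Z"),
    "Title" : "a post title",
    "Comments" : [ { "Type" : "text", "Description" : "a comment" } ]
}
// The same compound index, now on top-level fields
db.Posts.createIndex({ "DateTime" : -1, "UserId" : 1 })
// An initial $match followed by a $sort on the index prefix can then use the index
db.Posts.aggregate([
    { "$match" : { "DateTime" : { "$gte" : ISODate("2018-09-04T11:50:58Z"),
                                  "$lte" : ISODate("2018-09-05T11:50:58Z") },
                   "UserId"   : { "$in" : [ NUUID("aaaaaaaa-cccc-dddd-dddd-5369b183cccc") ] } } },
    { "$sort" : { "DateTime" : -1 } },
    { "$skip" : 0 }, { "$limit" : 20 }
])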

mongodb $avg aggregation calculation out by a few decimals.

We have a collection in Mongodb which saves a value linked to a timestamp.
Our document looks as follows (I have pasted an actual one here):
{
"_id" : ObjectId("5a99596b0155fe271cfcf41d"),
"Timestamp" : ISODate("2018-03-02T16:00:00.000Z"),
"TagID" : ObjectId("59f8609eefbb4102f4c249e3"),
"Value" : 71.3,
"FileReferenceID" : ObjectId("000000000000000000000000"),
"WasValueInterpolated" : 0
}
What we then do is calculate the average between two intervals for a given period; in more basic terms, we work out an aggregated profile.
The aggregation code we use is:
{[{ "$match" :
{
"TagID" : ObjectId("59f8609eefbb4102f4c249e3") }
},
{
"$match" : { "Timestamp" : { "$gte" : ISODate("2018-03-12T00:00:00.001Z") } }
},
{
"$match" : { "Timestamp" : { "$lte" : ISODate("2018-03-13T00:00:00.001Z") } }
},
{
"$group" :
{
"_id" : { "GroupedMillisecond" :
{
"$let" :
{
"vars" :
{ "newMillisecondField" :
{
"$subtract" : ["$Timestamp", ISODate("2018-03-12T00:00:00.001Z")]
}
},
"in" : { "$subtract" : ["$$newMillisecondField", { "$mod" : ["$$newMillisecondField", NumberLong(1800000)] }] }
}
} }, "AverageValue" : { "$avg" : "$Value" }
}
}, { "$sort" : { "_id.GroupedMillisecond" : 1 } }
]}
The problem is this: the value it should give back is 71.3, but we get back 71.299999999999997.
In the case posted above we are calculating the average value, aggregated half-hourly, for a day. There is only one value logged per half hour (I checked this in the database). The value is also logged as a constant; as far back as I checked manually (a few months back) it is 71.3.
So my question is why does the value differ?
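For reference: 71.3 cannot be represented exactly as an IEEE 754 double, which is the usual cause of results like 71.299999999999997; the stored value (and therefore the $avg of identical stored values) is the nearest representable double, slightly below 71.3. A quick illustration in the mongo shell's JavaScript (not part of the original post):
// Printing 17 significant digits exposes the nearest representable double
(71.3).toPrecision(17)   // "71.299999999999997"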

Get array of Date for each user grouped by $dayOfWeek and operation on it in MongoDb

I want to aggregate my data and build an array of the stored dates, grouped by user and by day, for the current day only. Something like this, given this data (assuming today is February 24th):
{
"_id" : ObjectId("58b0b4b732d3cd188cea9e1b"),
"user" : 1,
"heure" : ISODate("2017-02-24T22:33:27.858Z")
}
{
"_id" : ObjectId("58b0b4b732d3cd188cea9e1b"),
"user" : 1,
"heure" : ISODate("2017-02-24T23:33:27.858Z")
}
{
"_id" : ObjectId("58b0b4b732d3cd188cea9e1b"),
"user" : 2,
"heure" : ISODate("2017-02-24T22:34:27.858Z")
}
{
"_id" : ObjectId("58b0b4b732d3cd188cea9e1b"),
"user" : 1,
"heure" : ISODate("2017-02-25T07:21:27.858Z")
}
And get this:
{
    "_id" : { user : 1, jour : 55 },
    "date" : [ ISODate("2017-02-24T22:33:27.858Z"), ISODate("2017-02-24T23:33:27.858Z") ]
}
{
    "_id" : { user : 2, jour : 55 },
    "date" : [ ISODate("2017-02-24T22:34:27.858Z") ]
}
I tried using $push with $match, but everything failed.
Optionally, I want to have the time between two dates: for user 1, for example, an additional field containing 1 hour. But I don't want to use a date more than once, so with 4 dates in the array I need only two differences: one between the first and second dates, and one between the third and fourth. I would like to see this to learn how to use $cond properly.
Here is my current pipeline:
[
{ $match : { $eq : [ { $dayOfYear : "$heure" }, { $dayOfYear : ISODate() } ] } },
{
$group : {
_id : {
user : "$user",
},
date : {$push: "$heure"},
nombre: { $sum : 1 }
}
}
]
For now, I'm not handling the second part of the aggregation.
For the first filtering part you need to use a $redact stage, as it keeps all documents that match the condition via the $$KEEP system variable returned by $cond based on the $dayOfYear date operator, and discards the other documents with $$PRUNE.
Consider composing your final aggregate pipeline as:
[
{
"$redact": {
"$cond": [
{
"$eq": [
{ "$dayOfYear": "$heure" },
{ "$dayOfYear": new Date() }
]
},
"$$KEEP",
"$$PRUNE"
]
}
},
{
"$group": {
"_id": {
"user": "$user",
"jour": { "$dayOfYear": "$heure" }
},
"date": { "$push": "$heure" },
"nombre": { "$sum": 1 }
}
}
]
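As an aside (an assumption about newer server versions, not part of the original answer): on MongoDB 3.6+ the same day filter can be written with $expr directly inside $match, which avoids $redact. A minimal sketch:
[
    { "$match": { "$expr": { "$eq": [ { "$dayOfYear": "$heure" },
                                      { "$dayOfYear": new Date() } ] } } },
    { "$group": {
        "_id": { "user": "$user", "jour": { "$dayOfYear": "$heure" } },
        "date": { "$push": "$heure" },
        "nombre": { "$sum": 1 }
    } }
]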

Mongodb count() of internal array

I have the following MongoDB collection db.students:
/* 0 */
{
"id" : "0000",
"name" : "John"
"subjects" : [
{
"professor" : "Smith",
"day" : "Monday"
},
{
"professor" : "Smith",
"day" : "Tuesday"
}
]
}
/* 1 */
{
"id" : "0001",
"name" : "Mike"
"subjects" : [
{
"professor" : "Smith",
"day" : "Monday"
}
]
}
I want to find the number of subjects for a given student. I have a query:
db.students.find({'id':'0000'})
that will return the student document. How do I find the count for 'subjects'? Is it doable in a simple query?
If the query returns just one element:
db.students.find({'id':'0000'})[0].subjects.length;
For multiple elements in the cursor:
db.students.find({'id':'0000'}).forEach(function(doc) {
print(doc.subjects.length);
})
Do not forget to check for the existence of subjects, either in the query or before accessing .length.
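A minimal sketch of the query-side existence check (an illustration, assuming the same collection and field names):
// Only match students that actually have a subjects array,
// so .subjects.length cannot fail on a missing field
db.students.find({ 'id' : '0000', 'subjects' : { $exists : true } }).forEach(function(doc) {
    print(doc.subjects.length);
});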
You could use the aggregation framework
db.students.aggregate(
[
{ $match : {'id': '0000'}},
{ $unwind : "$subjects" },
{ $group : { _id : null, number : { $sum : 1 } } }
]
);
The $match stage will filter based on the student's id
The $unwind stage will deconstruct your subjects array to multiple documents
The $group stage is where the count is done; _id is null because you are counting for only one user and only need the total.
You will get a result like:
{ "result" : [ { "_id" : null, "number" : 187 } ], "ok" : 1 }
Just another nice and simple aggregation solution:
db.students.aggregate([
{ $match : { 'id':'0000' } },
{ $project: {
subjectsCount: { $cond: {
if: { $isArray: "$subjects" },
then: { $size: "$subjects" },
else: 0
}
}
}
}
]).then(result => {
// handle result
}).catch(err => {
throw err;
});
Thanks!

Mongo aggregation framework: group users by age

I have a user base stored in mongo. Users may record their date of birth.
I need to run a report aggregating users by age.
I now have a pipeline that groups users by year of birth. However, that is not precise enough because most people are not born on January 1st; so even if they are born in, say, 1970, they may well not be 43 yet.
db.Users.aggregate([
{ $match : { "DateOfBirth" : { $exists : true} } },
{ $project : {"YearOfBirth" : {$year : "$DateOfBirth"} } },
{ $group : { _id : "$YearOfBirth", Total : { $sum : 1} } },
{ $sort : { "Total" : -1 } }
])
Do you know if it's possible to perform some kind of arithmetic within the aggregation framework to exactly calculate the age of a user? Or is this possible with MapReduce only?
It seems like the whole thing is possible with the new Mongo 2.4 version just released, supporting additional Date operations (namely the "$subtract").
Here's how I did it:
db.Users.aggregate([
{ $match : { "DateOfBirth" : { $exists : true} } },
{ $project : {"ageInMillis" : {$subtract : [new Date(), "$DateOfBirth"] } } },
{ $project : {"age" : {$divide : ["$ageInMillis", 31558464000] }}},
// take the floor of the previous number:
{ $project : {"age" : {$subtract : ["$age", {$mod : ["$age",1]}]}}},
{ $group : { _id : "$age", Total : { $sum : 1} } },
{ $sort : { "Total" : -1 } }
])
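On newer servers (MongoDB 3.2+), the $subtract/$mod flooring trick can be replaced with the $floor operator. A sketch of the same pipeline under that assumption:
db.Users.aggregate([
    { $match : { "DateOfBirth" : { $exists : true } } },
    // divide the age in milliseconds by the same year constant, then floor it
    { $project : { "age" : { $floor : { $divide : [
        { $subtract : [ new Date(), "$DateOfBirth" ] }, 31558464000 ] } } } },
    { $group : { _id : "$age", Total : { $sum : 1 } } },
    { $sort : { "Total" : -1 } }
])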
There are not enough date/time and math operators to compute the exact age from the date. But you might be able to create age ranges by composing a dynamic query:
Define your age ranges as cut-off dates (a sketch for computing them dynamically follows the example below):
dt18 = today - 18 years
dt25 = today - 25 years
...
dt65 = today - 65 years
Then do nested conditionals, where you progressively use the cut-off dates as age-group markers, like so:
db.folks.save({ "_id" : 1, "bd" : ISODate("2000-02-03T00:00:00Z") });
db.folks.save({ "_id" : 2, "bd" : ISODate("2010-06-07T00:00:00Z") });
db.folks.save({ "_id" : 3, "bd" : ISODate("1990-10-20T00:00:00Z") });
db.folks.save({ "_id" : 4, "bd" : ISODate("1964-09-23T00:00:00Z") });
db.folks.aggregate(
{
$project: {
ageGroup: {
$cond: [{
$gt: ["$bd",
ISODate("1995-03-19")]
},
"age0_18",
{
$cond: [{
$gt: ["$bd",
ISODate("1988-03-19")]
},
"age18_25",
"age25_plus"]
}]
}
}
},
{
$group: {
_id: "$ageGroup",
count: {
$sum: 1
}
}
})
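A small sketch of how the cut-off dates could be computed dynamically before building the pipeline (plain shell JavaScript; the variable names are illustrative):
// Compute cut-off dates relative to "now", then substitute them
// for the hard-coded ISODate(...) values in the $cond expressions
var dt18 = new Date(); dt18.setFullYear(dt18.getFullYear() - 18);
var dt25 = new Date(); dt25.setFullYear(dt25.getFullYear() - 25);
var dt65 = new Date(); dt65.setFullYear(dt65.getFullYear() - 65);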