Group distinct count() on a single field in a single request - mongodb

Is there a way to get several different counts on a single field in a single request?
Here is the schema for a User document:
UserSchema: {
  name: { type: String, required: true },
  created_at: { type: Date, default: Date.now }
}
I would like to count every User created on 01/05/2013 and on 06/08/2013, and I may need to count more distinct dates as well.
Can I get these counts with a single count(), or should I fetch all the Users with a find() and then count them in JavaScript?

You can use the form of collection.count() that accepts a query, along with $or and date ranges:
db.collection.count({
  "$or": [
    { "created_at": {
      "$gte": new Date("2014-05-01"), "$lt": new Date("2014-05-02")
    }},
    { "created_at": {
      "$gte": new Date("2013-08-06"), "$lt": new Date("2013-08-07")
    }}
  ]
})
Or you can pass that query to .find() and use the cursor count from there if that suits your taste.
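For example, a minimal sketch of that cursor-count form, using the same filter:
db.collection.find({
  "$or": [
    { "created_at": { "$gte": new Date("2014-05-01"), "$lt": new Date("2014-05-02") } },
    { "created_at": { "$gte": new Date("2013-08-06"), "$lt": new Date("2013-08-07") } }
  ]
}).count()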
But then I read your title again: a distinct count per day is a different thing, and it is best to use aggregate to get counts for the distinct days:
db.collection.aggregate([
  // Match the dates you want to filter on
  { "$match": {
    "$or": [
      { "created_at": {
        "$gte": new Date("2014-05-01"),
        "$lt": new Date("2014-05-02")
      }},
      { "created_at": {
        "$gte": new Date("2013-08-06"),
        "$lt": new Date("2013-08-07")
      }}
    ]
  }},
  // Group on the *whole* day and sum the count
  { "$group": {
    "_id": {
      "year": { "$year": "$created_at" },
      "month": { "$month": "$created_at" },
      "day": { "$dayOfMonth": "$created_at" }
    },
    "count": { "$sum": 1 }
  }}
])
And that would give you a distinct count of the documents for each selected day you had added in your $or clause.
No need for looping in code.
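For illustration, the results then come back with one document per day, shaped like this (the counts shown are hypothetical):
{ "_id": { "year": 2014, "month": 5, "day": 1 }, "count": 12 }
{ "_id": { "year": 2013, "month": 8, "day": 6 }, "count": 7 }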

Related

Slow date aggregation query in mongo

I have a mongo collection with 7 million documents; besides a couple of other fields, each document has a 'createdAt' Date field. I have an index { createdAt: 1 } on that field, and the collection is hosted at a dedicated mongo service.
When I try to group by day the query gets really slow. Here is my aggregation query:
{
  "$match": {
    "createdAt": { "$gte": new Date(1472189560111) }
  }
},
{
  "$project": {
    "date": {
      "$dateToString": { "format": "%Y-%m-%d", "date": "$createdAt" }
    },
    "count": 1
  }
},
{
  "$group": {
    "_id": "$date",
    "count": { "$sum": 1 }
  }
},
{
  "$sort": { "_id": 1 }
},
{
  "$project": {
    "date": "$_id",
    "count": 1,
    "_id": 0
  }
}
What's a good strategy to improve the performance? Is there a problem with my aggregation pipeline? Do I need an extra field that contains the day as a Date object with a fixed time like 00:00 and group on that? It seems like such a basic operation that I believe there has to be a native MongoDB way of doing it.
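One way to try the pre-computed field idea from the last paragraph (just a sketch; createdDay is a hypothetical extra field, not something in the existing collection):
// At insert time, store the day truncated to midnight UTC alongside the full timestamp
var createdAt = new Date();
var createdDay = new Date(Date.UTC(
  createdAt.getUTCFullYear(), createdAt.getUTCMonth(), createdAt.getUTCDate()
));
db.collection.insert({ createdAt: createdAt, createdDay: createdDay });

// Grouping then avoids the per-document $dateToString work entirely
db.collection.aggregate([
  { "$match": { "createdDay": { "$gte": new Date("2016-08-26") } } },
  { "$group": { "_id": "$createdDay", "count": { "$sum": 1 } } },
  { "$sort": { "_id": 1 } }
])
An index on the new field would still be needed for the $match to stay fast.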

Finding multiple documents with one query

I have a schema like so:
schema
{
owner: <id to other document type>
created: date
}
I have an array of owner ids: [owner_id_1, owner_id_2, ... owner_id_x]
I want to get a list of documents with these owners, but limited to just the latest one for each owner. Doing the queries individually:
find({ owner: owner_id_1 }).sort({ created: -1 }).limit(1)
But I don't want to have to fire off x of these; I'd like a way to do it in one query if possible.
The .aggregate() method allows you to do this, along with matching the documents via the $in operator:
collection.aggregate([
  { "$match": { "owner": { "$in": [owner_id_1, owner_id_2, ... owner_id_x] } } },
  { "$group": {
    "_id": "$owner",
    "created": { "$max": "$created" }
  }}
])
This gets the maximum ( $max ) "created" value for each "owner" you asked for with $in, which takes an array of values to match against the field in the condition.
If you want more data than just that one field, then use $sort before you $group:
collection.aggregate([
  { "$match": { "owner": { "$in": [owner_id_1, owner_id_2, ... owner_id_x] } } },
  { "$sort": { "owner": 1, "created": -1 } },
  { "$group": {
    "_id": "$owner",
    "created": { "$first": "$created" },
    "docId": { "$first": "$_id" },
    "something": { "$first": "$something" }
  }}
])
And $first takes the first value from each grouping boundary (the descending order was established by the $sort).
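If you need the complete original documents rather than just selected fields, one possible follow-up (a sketch, not part of the original answer) is a second query on the collected docId values:
// Collect the latest _id per owner from the pipeline above, then fetch the full documents
var latest = collection.aggregate([ /* the pipeline shown above */ ]).toArray();
var ids = latest.map(function (doc) { return doc.docId; });
collection.find({ "_id": { "$in": ids } });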

GroupBy DayOfMonth in mongodb but project Complete Date

I have a Collection containing a date field. I want to group it by dayOfMonth but at the time of projection I want to project the complete Date and associated count.
I have a raw Collection in mongodb containing a Timestamp (Date field)
This is my Aggregation query:
db.raw.aggregate(
{
"$match" : { "Timestamp":{$gte:new Date("2012-05-30T00:00:00.000Z"),$lt:new Date("2014-05-31T00:00:00.000Z")}}
},
{
$group:
{
_id: { ApplicationId: "$ApplicationId", date: {$dayOfMonth: '$Timestamp'} },
count: { $sum: 1 }
}
}
)
In the above query I'm grouping by dayOfMonth, but how can I project the complete Date along with the count?
Your "Timestamp" values are clearly actual points in time so there really isn't a "complete date" to return. You could just generally "do the math" based on the date range you are applying and the "day of month" values returned as you process the results returned.
But alternately you could just "apply the math" to the date values in order by rounding the "timestamp" values out to the day. The returned values are no longer date objects, but they are the millisecond since epoch values, so it is relatively easy to "seed" those to date functions:
db.raw.aggregate([
  { "$match": {
    "Timestamp": {
      "$gte": new Date("2012-05-30"),
      "$lt": new Date("2014-05-31")
    }
  }},
  { "$group": {
    "_id": {
      "$subtract": [
        { "$subtract": [ "$Timestamp", new Date("1970-01-01") ] },
        { "$mod": [
          { "$subtract": [ "$Timestamp", new Date("1970-01-01") ] },
          1000 * 60 * 60 * 24
        ]}
      ]
    },
    "count": { "$sum": 1 }
  }}
])
When you subtract one Date object from another, the difference in milliseconds is returned as a number. So this just normalizes to milliseconds since epoch by subtracting the epoch date. The rest is basic date math to round the result down to the start of the day.
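A small sketch of how you might consume those results on the client, turning each grouped _id (milliseconds since epoch) back into a Date object:
db.raw.aggregate([ /* the pipeline shown above */ ]).forEach(function (doc) {
  // _id holds milliseconds since epoch, rounded down to the start of the day
  print(new Date(doc._id) + " : " + doc.count);
});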
Alternately again, you could use the other date aggregation operators and concatenate to a string, but there would usually be a bit more work involved unless those values were for direct use:
db.raw.aggregate([
  { "$match": {
    "Timestamp": {
      "$gte": new Date("2012-05-30"),
      "$lt": new Date("2014-05-31")
    }
  }},
  { "$group": {
    "_id": {
      "$concat": [
        { "$substr": [{ "$year": "$Timestamp" }, 0, 4] },
        "-",
        { "$substr": [{ "$month": "$Timestamp" }, 0, 2] },
        "-",
        { "$substr": [{ "$dayOfMonth": "$Timestamp" }, 0, 2] }
      ]
    },
    "count": { "$sum": 1 }
  }}
])
Neil Lunn has provided a great answer.
There is one more approach that you can use:
db.raw.aggregate([
  { "$match": {
    "Timestamp": { "$gte": new Date("2012-05-30"), "$lt": new Date("2014-07-31") }
  }},
  { "$group": {
    "_id": { "$dayOfMonth": "$Timestamp" },
    "Date": { "$first": "$Timestamp" },
    "count": { "$sum": 1 }
  }}
])
It will return you a Date (the first Timestamp encountered for each day of month) along with the count.
Hope this helps.

Correct query for group by user, per month

I have a MongoDB collection that stores documents in this format:
"name" : "Username",
"timeOfError" : ISODate("...")
I'm using this collection to keep track of who got an error and when it occurred.
What I want to do now is create a query that retrieves errors per user, per month or something similar. Something like this:
{
"result": [
{
"_id": "$name",
"errorsPerMonth": [
{
"month": "0",
"errorsThisMonth": 10
},
{
"month": "1",
"errorsThisMonth": 20
}
]
}
]
}
I have tried several different queries, but none have given the desired result. The closest result came from this query:
db.collection.aggregate(
[
{
$group:
{
_id: { $month: "$timeOfError"},
name: { $push: "$name" },
totalErrorsThisMonth: { $sum: 1 }
}
}
]
);
The problem here is that the $push just adds the username for each error. So I get an array with duplicate names.
You need to compound the _id value in $group:
db.collection.aggregate([
{ "$group": {
"_id": {
"name": "$name",
"month": { "$month": "$timeOfError" }
},
"totalErrors": { "$sum": 1 }
}}
])
The _id is essentially the "grouping key", so whatever elements you want to group by need to be a part of that.
If you want a different order then you can change the grouping key precedence:
db.collection.aggregate([
{ "$group": {
"_id": {
"month": { "$month": "$timeOfError" },
"name": "$name"
},
"totalErrors": { "$sum": 1 }
}}
])
Or if you want a different output order, or have other conditions in your pipeline involving different fields, just add a $sort pipeline stage at the end:
db.collection.aggregate([
{ "$group": {
"_id": {
"month": { "$month": "$timeOfError" },
"name": "$name"
},
"totalErrors": { "$sum": 1 }
}},
{ "$sort": { "_id.name": 1, "_id.month": 1 } }
])
Where you can essentially $sort on whatever you want.
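And if you want the nested errorsPerMonth shape from the question, one possible follow-up (a sketch, not part of the original answer) is a second $group that pushes the per-month totals into an array per user:
db.collection.aggregate([
  { "$group": {
    "_id": { "name": "$name", "month": { "$month": "$timeOfError" } },
    "errorsThisMonth": { "$sum": 1 }
  }},
  { "$group": {
    "_id": "$_id.name",
    "errorsPerMonth": {
      "$push": { "month": "$_id.month", "errorsThisMonth": "$errorsThisMonth" }
    }
  }}
])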

Mongodb mapreduce sorting (optimization) or alternative

I have a few documents that look like this:
{
'page_id': 123131,
'timestamp': ISODate('2014-06-10T12:13:59'),
'processed': false
}
The documents have other fields, but these are the only ones relevant for this purpose. The collection also has an index on these fields:
{
'page_id': 1,
'timestamp': -1
}
I run a mapreduce that returns distinct (page_id, day) results, with day being the date-portion of the timestamp (in the above, it would be 2014-06-10).
This is done with the following mapreduce:
function() {
emit({
site_id: this.page_id,
day: Date.UTC(this.timestamp.getUTCFullYear(),
this.timestamp.getUTCMonth(),
this.timestamp.getUTCDate())
}, {
count: 1
});
}
The reduce-function basically just returns { count: 1 } as I am not really interested in the number, just unique tuples.
I wish to make this more efficient. I tried adding sort: { 'page_id': 1 }, but it triggers an error; googling shows that I can apparently only sort by the key, but since this is not a "raw" key, how does that work?
Also, is there an alternative to this mapreduce that is faster? I know mongodb has distinct, but from what I can gather it only works on one field. Might the group aggregation function be relevant?
The aggregation framework would seem more appropriate, since it runs in native code whereas mapReduce runs under a JavaScript interpreter instance. MapReduce has its uses, but generally the aggregation framework is best suited to common tasks that do not require the specific processing that only JavaScript methods can provide:
db.collection.aggregate([
  { "$group": {
    "_id": {
      "page": "$page_id",
      "day": {
        "year": { "$year": "$timestamp" },
        "month": { "$month": "$timestamp" },
        "day": { "$dayOfMonth": "$timestamp" }
      }
    },
    "count": { "$sum": 1 }
  }}
])
This largely makes use of the date aggregation operators. See other aggregation framework operators for more details.
Of course, if you want to reverse-sort those unique dates (which is the opposite of what mapReduce will do) or sort on other fields, just add a $sort to the end of the pipeline:
db.collection.aggregate([
  { "$group": {
    "_id": {
      "page": "$page_id",
      "day": {
        "year": { "$year": "$timestamp" },
        "month": { "$month": "$timestamp" },
        "day": { "$dayOfMonth": "$timestamp" }
      }
    },
    "count": { "$sum": 1 }
  }},
  { "$sort": {
    "_id.day.year": -1, "_id.day.month": -1, "_id.day.day": -1
  }}
])
You might want to look at the aggregation framework.
A query like this:
collection.aggregate([
  { $group: {
    _id: {
      year: { $year: "$timestamp" },
      month: { $month: "$timestamp" },
      day: { $dayOfMonth: "$timestamp" },
      pageId: "$page_id"
    }
  }}
])
will give you all unique combinations of the fields you're looking for.