Order by date in sub-document and then by document - mongodb

I have a simple "Event" mongo schema. Two sample documents are below :
Event Document #1
{
"event_name": "Some nice event",
"venues": [
{
"venue_name": "venue #1",
"shows": [
{
"show_time": "2014-06-18T07:46:02.415Z",
"capacity": 20
},
{
"show_time": "2014-06-20T07:46:02.415Z",
"capacity": 40
}
]
},
{
"venue_name": "venue #2",
"shows": [
{
"show_time": "2014-06-17T07:46:02.415Z",
"capacity": 20
},
{
"show_time": "2014-06-24T07:46:02.415Z",
"capacity": 40
}
]
}
]
}
Event Document #2
{
"event_name": "Another nice event",
"venues": [
{
"venue_name": "venue #1",
"shows": [
{
"show_time": "2014-06-19T07:46:02.415Z",
"capacity": 20
},
{
"show_time": "2014-06-16T07:46:02.415Z",
"capacity": 40
}
]
}
]
}
I need to query this collection of event documents and fetch the events with the closest shows, with respect to a particular time.
For example, if I had to find events happening on or after 16 Jun, I should get document #2 followed by document #1, with the venue sub-document order as [venue #2, venue #1].
On the other hand, if I wanted events happening on or after 18 Jun, I should get document #1, with [venue #1, venue #2], followed by document #2.
Essentially, I need to be able to sort by the show_time of the nested sub-document, and this sorting should work across multiple venue sub-documents.
According to mongo's documentation, this doesn't seem to be supported, so is there a way of using aggregation to achieve this?
Or is there a way to rejig the schema to support such queries?
Or is MongoDB the wrong tool for such scenarios altogether?

Really good question. Hopefully your dates are stored as real Date values, though the ISO string form sorts lexically in the same order, so the approach is the same. The following form should do it, as long as the value you compare against matches the stored type:
db.event.aggregate([
// Match the "documents" that meet the condition
{ "$match": {
"venues.shows.show_time": { "$gte": new Date("2014-06-16") }
}},
// Unwind the arrays
{ "$unwind": "$venues" },
{ "$unwind": "$venues.shows" },
// Sort the entries just to float the nearest result
{ "$sort": { "venues.shows.show_time": 1 } },
// Find the "earliest" for the venue while grouping
{ "$group": {
"_id": {
"_id": "$_id",
"event_name": "$event_name",
"venue_name": "$venues.venue_name"
},
"shows": {
"$push": "$venues.shows"
},
"earliest": {
"$min": {
"$cond": [
{ "$gte": [
"$venues.shows.show_time",
new Date("2014-06-16")
]},
"$venues.shows.show_time",
null
]
}
}
}},
// Sort those because of the order you want
{ "$sort": { "earliest": 1 } },
// Group back and with the "earliest" document
{ "$group": {
"_id": "$_id._id",
"event_name": { "$first": "$_id.event_name" },
"venues": {
"$push": {
"venue_name": "$_id.venue_name",
"shows": "$shows"
}
},
"earliest": {
"$min": {
"$cond": [
{ "$gte": [
"$earliest",
new Date("2014-06-16")
]},
"$earliest",
null
]
}
}
}},
// Sort by the earliest document
{ "$sort": { "earliest": 1 } },
// Project the fields
{ "$project": {
"event_name": 1,
"venues": 1
}}
])
So most of this looks reasonably straightforward if you have some experience with the aggregation framework. If not, there is some general explanation below, plus a few "funky" things that happen as we go further.
The first steps in aggregation are to $match just like any normal query and then to $unwind the arrays you want to process. The "unwind" statement effectively "de-normalizes" the documents contained in the array to be standard documents by themselves.
The next $sort ends up as a "prettying up" function as the "earliest" event in each "set" will be at the top as a result.
As there are "two" levels of arrays, you do the grouping in two stages via the $group pipeline stage.
The first $group "groups" by "document", "event_name" and "venue". All of the shows are put back into their original array form, but at this time we extract the $min value for the "show_time".
The value taken is not just the ordinary "minimal" value. Here we use the $cond operator to make sure that the value returned must be "greater than or equal to" the date that you were requesting in the query initially. This makes sure that any "earlier" values are not taken into consideration when "sorting".
The next thing to do is to $sort on that "earliest" date, to keep the entries for the "venues" in order. The following stages then do the same as above, but "grouping" back to the original documents this time, then finally "sorting" in the order of which "show_time" would be the "earliest".
The result from the dates shown as input would be your desired result for the 16th:
{
"_id" : ObjectId("53a95263a1923f45a6c2d3dd"),
"event_name" : "Another nice event",
"venues" : [
{
"venue_name" : "venue #1",
"shows" : [
{
"show_time" : ISODate("2014-06-16T07:46:02.415Z"),
"capacity" : 40
},
{
"show_time" : ISODate("2014-06-19T07:46:02.415Z"),
"capacity" : 20
}
]
}
]
}
{
"_id" : ObjectId("53a952b5a1923f45a6c2d3de"),
"event_name" : "Some nice event",
"venues" : [
{
"venue_name" : "venue #2",
"shows" : [
{
"show_time" : ISODate("2014-06-17T07:46:02.415Z"),
"capacity" : 20
},
{
"show_time" : ISODate("2014-06-24T07:46:02.415Z"),
"capacity" : 40
}
]
},
{
"venue_name" : "venue #1",
"shows" : [
{
"show_time" : ISODate("2014-06-18T07:46:02.415Z"),
"capacity" : 20
},
{
"show_time" : ISODate("2014-06-20T07:46:02.415Z"),
"capacity" : 40
}
]
}
]
}
And by changing the input to the 18th you also get the desired result:
{
"_id" : ObjectId("53a952b5a1923f45a6c2d3de"),
"event_name" : "Some nice event",
"venues" : [
{
"venue_name" : "venue #1",
"shows" : [
{
"show_time" : ISODate("2014-06-18T07:46:02.415Z"),
"capacity" : 20
},
{
"show_time" : ISODate("2014-06-20T07:46:02.415Z"),
"capacity" : 40
}
]
},
{
"venue_name" : "venue #2",
"shows" : [
{
"show_time" : ISODate("2014-06-17T07:46:02.415Z"),
"capacity" : 20
},
{
"show_time" : ISODate("2014-06-24T07:46:02.415Z"),
"capacity" : 40
}
]
}
]
}
{
"_id" : ObjectId("53a95263a1923f45a6c2d3dd"),
"event_name" : "Another nice event",
"venues" : [
{
"venue_name" : "venue #1",
"shows" : [
{
"show_time" : ISODate("2014-06-16T07:46:02.415Z"),
"capacity" : 40
},
{
"show_time" : ISODate("2014-06-19T07:46:02.415Z"),
"capacity" : 20
}
]
}
]
}
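The ordering logic described above can also be reproduced in plain JavaScript, which may help when checking the pipeline against small samples. This is only a sketch under assumptions: `earliestOnOrAfter` and `orderEvents` are hypothetical helper names, the documents are assumed to be already loaded with `show_time` as Date values, and venues without any qualifying show are placed last here, which is a choice of this sketch rather than what the pipeline does.

```javascript
// Smallest show time that is on or after the cutoff (the $min over $cond idea).
// Returns Infinity when no show qualifies.
function earliestOnOrAfter(shows, cutoff) {
  const times = shows
    .map((s) => s.show_time.getTime())
    .filter((t) => t >= cutoff.getTime());
  return times.length ? Math.min(...times) : Infinity;
}

// Orders venues inside each event, then the events themselves,
// by their earliest qualifying show.
function orderEvents(events, cutoff) {
  return events
    .map((event) => {
      const venues = event.venues
        .map((v) => ({
          venue_name: v.venue_name,
          // Shows are listed in ascending time order, like the first $sort.
          shows: [...v.shows].sort((a, b) => a.show_time - b.show_time),
          earliest: earliestOnOrAfter(v.shows, cutoff),
        }))
        .sort((a, b) => a.earliest - b.earliest);
      const earliest = Math.min(...venues.map((v) => v.earliest));
      return { event_name: event.event_name, venues, earliest };
    })
    // Drop events with no show on or after the cutoff, like the initial $match.
    .filter((e) => e.earliest !== Infinity)
    .sort((a, b) => a.earliest - b.earliest);
}
```

Calling `orderEvents(docs, new Date("2014-06-16"))` on the two sample documents should give the same event and venue order as the first result listing.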
Also if you want to go further with this, just add an additional $match stage after the $unwind stages, which filters out the individual shows that occur before the date requested in the query:
db.event.aggregate([
{ "$match": {
"venues.shows.show_time": { "$gte": new Date("2014-06-18") }
}},
{ "$unwind": "$venues" },
{ "$unwind": "$venues.shows" },
{ "$match": {
"venues.shows.show_time": { "$gte": new Date("2014-06-18") }
}},
{ "$sort": { "venues.shows.show_time": 1 } },
{ "$group": {
"_id": {
"_id": "$_id",
"event_name": "$event_name",
"venue_name": "$venues.venue_name"
},
"shows": {
"$push": "$venues.shows"
},
"earliest": {
"$min": {
"$cond": [
{ "$gte": [
"$venues.shows.show_time",
new Date("2014-06-18")
]},
"$venues.shows.show_time",
null
]
}
}
}},
{ "$sort": { "earliest": 1 } },
{ "$group": {
"_id": "$_id._id",
"event_name": { "$first": "$_id.event_name" },
"venues": {
"$push": {
"venue_name": "$_id.venue_name",
"shows": "$shows"
}
},
"earliest": {
"$min": {
"$cond": [
{ "$gte": [
"$earliest",
new Date("2014-06-18")
]},
"$earliest",
null
]
}
}
}},
{ "$sort": { "earliest": 1 } },
{ "$project": {
"event_name": 1,
"venues": 1
}}
])
With the result:
{
"_id" : ObjectId("53a952b5a1923f45a6c2d3de"),
"event_name" : "Some nice event",
"venues" : [
{
"venue_name" : "venue #1",
"shows" : [
{
"show_time" : ISODate("2014-06-18T07:46:02.415Z"),
"capacity" : 20
},
{
"show_time" : ISODate("2014-06-20T07:46:02.415Z"),
"capacity" : 40
}
]
},
{
"venue_name" : "venue #2",
"shows" : [
{
"show_time" : ISODate("2014-06-24T07:46:02.415Z"),
"capacity" : 40
}
]
}
]
}
{
"_id" : ObjectId("53a95263a1923f45a6c2d3dd"),
"event_name" : "Another nice event",
"venues" : [
{
"venue_name" : "venue #1",
"shows" : [
{
"show_time" : ISODate("2014-06-19T07:46:02.415Z"),
"capacity" : 20
}
]
}
]
}
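Both pipelines repeat the cutoff date in several places, so it is easy to change one occurrence and miss another. A small builder function can keep them in sync. This is a minimal sketch: `buildClosestShowsPipeline` and its `dropEarlierShows` flag are hypothetical names, and the stages are simply the ones listed in the two pipelines above.

```javascript
// Builds the aggregation pipeline for a given cutoff Date.
// When dropEarlierShows is true, the extra $match after the $unwind
// stages is included so that shows before the cutoff are removed.
function buildClosestShowsPipeline(cutoff, dropEarlierShows = false) {
  const matchShows = () => ({
    $match: { "venues.shows.show_time": { $gte: cutoff } },
  });
  // $min over a $cond that yields null for dates before the cutoff.
  const earliestOf = (field) => ({
    $min: { $cond: [{ $gte: [field, cutoff] }, field, null] },
  });

  const stages = [matchShows(), { $unwind: "$venues" }, { $unwind: "$venues.shows" }];
  if (dropEarlierShows) stages.push(matchShows());
  stages.push(
    { $sort: { "venues.shows.show_time": 1 } },
    { $group: {
        _id: { _id: "$_id", event_name: "$event_name", venue_name: "$venues.venue_name" },
        shows: { $push: "$venues.shows" },
        earliest: earliestOf("$venues.shows.show_time"),
    }},
    { $sort: { earliest: 1 } },
    { $group: {
        _id: "$_id._id",
        event_name: { $first: "$_id.event_name" },
        venues: { $push: { venue_name: "$_id.venue_name", shows: "$shows" } },
        earliest: earliestOf("$earliest"),
    }},
    { $sort: { earliest: 1 } },
    { $project: { event_name: 1, venues: 1 } }
  );
  return stages;
}
```

In the shell this could then be used as `db.event.aggregate(buildClosestShowsPipeline(new Date("2014-06-18"), true))`.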

Related

Combine results based on condition during group by

Mongo query generated out of java code:
{
"pipeline": [{
"$match": {
"Id": "09cd9a5a-85c5-4948-808b-20a52d92381a"
}
},
{
"$group": {
"_id": "$result",
"id": {
"$first": "$result"
},
"labelKey": {
"$first": {
"$ifNull": ["$result",
"$result"]
}
},
"value": {
"$sum": 1
}
}
}]
}
Field 'result' can have values like Approved, Rejected, null and "" (empty string). What I am trying to achieve is combining the count of both null and empty together.
So that the empty string Id will have the count of both null and "", which is equal to 4
I'm sure there's a more "proper" way, but this is what I could quickly come up with:
[
{
"$group" : {
"_id" : "$result",
"id" : {
"$first" : "$result"
},
"labelKey" : {
"$first" : {
"$ifNull" : [
"$result",
"$result"
]
}
},
"value" : {
"$sum" : 1.0
}
}
},
{
"$group" : {
"_id" : {
"$cond" : [{
$or: [
{"$eq": ["$_id", "Approved"]},
{"$eq": ["$_id", "Rejected"]}
]},
"$_id",
""
]
},
"temp" : {
"$push" : {
"_id" : "$_id",
"labelKey" : "$labelKey"
}
},
"count" : {
"$sum" : "$value"
}
}
},
{
"$unwind" : "$temp"
},
{
"$project" : {
"_id" : "$temp._id",
"labelKey": "$temp.labelKey",
"count" : "$count"
}
}
]
Because the second group runs on four documents at most, I don't feel too bad about doing this.
I have used $facet.
The $facet stage lets you run several independent sub-pipelines within a single stage, all over the same input documents. This means you can share the preliminary stages (here the $match), run several aggregations side by side, and then continue with further stages on the combined output.
var queries = [{
"$match": {
"Id": "09cd9a5a-85c5-4948-808b-20a52d92381a"
}
},{
$facet: {
"empty": [
{
$match : {
result : { $in : ['',null]}
}
},{
"$group" : {
"_id" : null,
value : { $sum : 1}
}
}
],
"non_empty": [
{
$match : {
result : { $nin : ['',null]}
}
},{
"$group" : {
"_id" : '$result',
value : { $sum : 1}
}
}
]
}
},
{
$project: {
results: {
$concatArrays: [ "$empty", "$non_empty" ]
}
}
}];
Output :
{
"results": [{
"_id": null,
"value": 52 // count of both '' and null.
}, {
"_id": "Approved",
"value": 83
}, {
"_id": "Rejected",
"value": 3661
}]
}
Changing the group key as below solved the problem, because $ifNull replaces a null (or missing) result with "" so that both fall into the same group:
{
"$group": {
"_id": {
"$ifNull": ["$result", ""]
},
"id": {
"$first": "$result"
},
"labelKey": {
"$first": {
"$ifNull": ["$result",
"$result"]
}
},
"value": {
"$sum": 1
}
}
}
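To see why the $ifNull key merges the two cases, the same grouping can be sketched in plain JavaScript. This is an illustration only: `countByResult` is a hypothetical name and the documents are assumed to be already in memory.

```javascript
// Counts documents per result value, treating null or a missing
// field as "" in the same way { $ifNull: ["$result", ""] } does.
function countByResult(docs) {
  const counts = new Map();
  for (const doc of docs) {
    const key = doc.result ?? ""; // null and undefined both become ""
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts].map(([_id, value]) => ({ _id, value }));
}
```

Only null and missing values are rewritten; any other value, including "", keeps its own key, so null and "" end up under the single "" key.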

Group and count over a start and end range

If I have data in the following format:
[
{
_id: 1,
startDate: ISODate("2017-01-01T00:00:00.000Z"),
endDate: ISODate("2017-02-25T00:00:00.000Z"),
type: 'CAR'
},
{
_id: 2,
startDate: ISODate("2017-02-17T00:00:00.000Z"),
endDate: ISODate("2017-03-22T00:00:00.000Z"),
type: 'HGV'
}
]
Is it possible to retrieve data grouped by 'type', but also with a count of the type for each month in a given date range? For example, the range 2017/1/1 to 2017/4/1 would return:
[
{
_id: 'CAR',
monthCounts: [
/*January*/
{
from: ISODate("2017-01-01T00:00:00.000Z"),
to: ISODate("2017-01-31T23:59:59.999Z"),
count: 1
},
/*February*/
{
from: ISODate("2017-02-01T00:00:00.000Z"),
to: ISODate("2017-02-28T23:59:59.999Z"),
count: 1
},
/*March*/
{
from: ISODate("2017-03-01T00:00:00.000Z"),
to: ISODate("2017-03-31T23:59:59.999Z"),
count: 0
},
]
},
{
_id: 'HGV',
monthCounts: [
{
from: ISODate("2017-01-01T00:00:00.000Z"),
to: ISODate("2017-01-31T23:59:59.999Z"),
count: 0
},
{
from: ISODate("2017-02-01T00:00:00.000Z"),
to: ISODate("2017-02-28T23:59:59.999Z"),
count: 1
},
{
from: ISODate("2017-03-01T00:00:00.000Z"),
to: ISODate("2017-03-31T23:59:59.999Z"),
count: 1
},
]
}
]
The returned format is not really important, but what I am trying to achieve is to retrieve, in a single query, a number of counts for the same grouping (one per month). The input could simply be a start and end date to report from, or more likely an array of the date ranges to group by.
The algorithm for this is basically to "iterate" over the values in the interval between the two dates. MongoDB has a couple of ways to deal with this: mapReduce(), which has always been available, and newer operators available to the aggregate() method.
I'm going to expand on your sample data to deliberately show an overlapping month, since your examples did not have one. This will result in the "HGV" values appearing in "three" months of output.
{
"_id" : 1,
"startDate" : ISODate("2017-01-01T00:00:00Z"),
"endDate" : ISODate("2017-02-25T00:00:00Z"),
"type" : "CAR"
}
{
"_id" : 2,
"startDate" : ISODate("2017-02-17T00:00:00Z"),
"endDate" : ISODate("2017-03-22T00:00:00Z"),
"type" : "HGV"
}
{
"_id" : 3,
"startDate" : ISODate("2017-02-17T00:00:00Z"),
"endDate" : ISODate("2017-04-22T00:00:00Z"),
"type" : "HGV"
}
Aggregate - Requires MongoDB 3.4
db.cars.aggregate([
{ "$addFields": {
"range": {
"$reduce": {
"input": { "$map": {
"input": { "$range": [
{ "$trunc": {
"$divide": [
{ "$subtract": [ "$startDate", new Date(0) ] },
1000
]
}},
{ "$trunc": {
"$divide": [
{ "$subtract": [ "$endDate", new Date(0) ] },
1000
]
}},
60 * 60 * 24
]},
"as": "el",
"in": {
"$let": {
"vars": {
"date": {
"$add": [
{ "$multiply": [ "$$el", 1000 ] },
new Date(0)
]
}
},
"in": {
"$add": [
{ "$multiply": [ { "$year": "$$date" }, 100 ] },
{ "$month": "$$date" }
]
}
}
}
}},
"initialValue": [],
"in": {
"$cond": {
"if": { "$in": [ "$$this", "$$value" ] },
"then": "$$value",
"else": { "$concatArrays": [ "$$value", ["$$this"] ] }
}
}
}
}
}},
{ "$unwind": "$range" },
{ "$group": {
"_id": {
"type": "$type",
"month": "$range"
},
"count": { "$sum": 1 }
}},
{ "$sort": { "_id": 1 } },
{ "$group": {
"_id": "$_id.type",
"monthCounts": {
"$push": { "month": "$_id.month", "count": "$count" }
}
}}
])
The key to making this work is the $range operator, which takes values for a "start" and an "end" as well as an "interval" to apply. The result is an array of values beginning at the "start" and incremented by the interval, stopping before the "end" is reached.
We use this with startDate and endDate to generate the possible dates in between those values. You will note that we need to do some math here since $range only takes 32-bit integers: subtracting new Date(0) gives the milliseconds since the epoch, and dividing by 1000 converts that to seconds, which fits for present-day dates.
Because we want "months", the operations applied extract the month and year values from the generated range. We actually generate the range as the "days" in between since "months" are difficult to deal with in math. The subsequent $reduce operation takes only the "distinct months" from the date range.
The result therefore of the first aggregation pipeline stage is a new field in the document which is an "array" of all the distinct months covered between startDate and endDate. This gives an "iterator" for the rest of the operation.
By "iterator" I mean that when we apply $unwind we get a copy of the original document for every distinct month covered in the interval. This then allows the following two $group stages to first apply a grouping to the common key of "month" and "type" in order to "total" the counts via $sum, and the next $group makes the key just the "type" and puts the results in an array via $push.
This gives the result on the above data:
{
"_id" : "HGV",
"monthCounts" : [
{
"month" : 201702,
"count" : 2
},
{
"month" : 201703,
"count" : 2
},
{
"month" : 201704,
"count" : 1
}
]
}
{
"_id" : "CAR",
"monthCounts" : [
{
"month" : 201701,
"count" : 1
},
{
"month" : 201702,
"count" : 1
}
]
}
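The month keys built by the first $addFields stage can be checked outside the database with a few lines of plain JavaScript. This is a sketch only: `monthKeys` is a hypothetical name, and it steps in milliseconds rather than the seconds used in the pipeline, which does not change the months that are visited.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the distinct YYYYMM keys (in UTC) visited when stepping one day
// at a time from startDate (inclusive) up to endDate (exclusive), which is
// how $range treats its bounds.
function monthKeys(startDate, endDate) {
  const keys = [];
  for (let t = startDate.getTime(); t < endDate.getTime(); t += DAY_MS) {
    const d = new Date(t);
    const key = d.getUTCFullYear() * 100 + (d.getUTCMonth() + 1);
    // Same de-duplication as the $reduce with $in and $concatArrays.
    if (!keys.includes(key)) keys.push(key);
  }
  return keys;
}
```

Because the end bound is exclusive, an endDate that falls exactly at midnight on the first day of a month does not add that month to the list.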
Note that the coverage of "months" is only present where there is actual data. Whilst possible to produce zero values over a range, it requires quite a bit of wrangling to do so and is not very practical. If you want zero values then it is better to add that in post processing in the client once the results have been retrieved.
If you really have your heart set on the zero values, then you should separately query for $min and $max values, and pass these in to "brute force" the pipeline into generating the copies for each supplied possible range value.
So this time the "range" is made externally to all documents, and you then use a $cond expression inside the accumulator to see if the current document falls within the grouped range value. Also, since the generation is "external", we really don't need the MongoDB 3.4 $range operator, so this can be applied to earlier versions as well:
// Get min and max separately
var ranges = db.cars.aggregate([
{ "$group": {
"_id": null,
"startRange": { "$min": "$startDate" },
"endRange": { "$max": "$endDate" }
}}
]).toArray()[0]
// Make the range array externally from all possible values
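var range = [];
// Start from the first day of the first month so that stepping by one month
// cannot skip the final month or overflow on days 29-31
var d = new Date(ranges.startRange.valueOf());
d.setUTCDate(1);
d.setUTCHours(0, 0, 0, 0);
for ( ; d <= ranges.endRange; d.setUTCMonth(d.getUTCMonth()+1)) {
var v = ( d.getUTCFullYear() * 100 ) + d.getUTCMonth()+1;
range.push(v);
}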
// Run conditional aggregation
db.cars.aggregate([
{ "$addFields": { "range": range } },
{ "$unwind": "$range" },
{ "$group": {
"_id": {
"type": "$type",
"month": "$range"
},
"count": {
"$sum": {
"$cond": {
"if": {
"$and": [
{ "$gte": [
"$range",
{ "$add": [
{ "$multiply": [ { "$year": "$startDate" }, 100 ] },
{ "$month": "$startDate" }
]}
]},
{ "$lte": [
"$range",
{ "$add": [
{ "$multiply": [ { "$year": "$endDate" }, 100 ] },
{ "$month": "$endDate" }
]}
]}
]
},
"then": 1,
"else": 0
}
}
}
}},
{ "$sort": { "_id": 1 } },
{ "$group": {
"_id": "$_id.type",
"monthCounts": {
"$push": { "month": "$_id.month", "count": "$count" }
}
}}
])
Which produces the consistent zero fills for all possible months on all groupings:
{
"_id" : "HGV",
"monthCounts" : [
{
"month" : 201701,
"count" : 0
},
{
"month" : 201702,
"count" : 2
},
{
"month" : 201703,
"count" : 2
},
{
"month" : 201704,
"count" : 1
}
]
}
{
"_id" : "CAR",
"monthCounts" : [
{
"month" : 201701,
"count" : 1
},
{
"month" : 201702,
"count" : 1
},
{
"month" : 201703,
"count" : 0
},
{
"month" : 201704,
"count" : 0
}
]
}
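For comparison, the client-side post-processing suggested earlier can be quite short. The following is a minimal sketch that assumes the sparse `monthCounts` arrays have already been fetched; `monthRange` and `zeroFill` are hypothetical helper names.

```javascript
// Lists every YYYYMM key from first to last, inclusive.
function monthRange(first, last) {
  const months = [];
  let year = Math.floor(first / 100);
  let month = first % 100;
  while (year * 100 + month <= last) {
    months.push(year * 100 + month);
    month += 1;
    if (month > 12) { // roll over into January of the next year
      month = 1;
      year += 1;
    }
  }
  return months;
}

// Returns one entry per month, using 0 where the query returned nothing.
function zeroFill(monthCounts, months) {
  const byMonth = new Map(monthCounts.map((m) => [m.month, m.count]));
  return months.map((month) => ({ month, count: byMonth.get(month) || 0 }));
}
```

Applying `zeroFill(doc.monthCounts, monthRange(201701, 201704))` to each document of the sparse result should give the same shape as the zero-filled output above, without the extra document copies in the pipeline.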
MapReduce
All versions of MongoDB support mapReduce, and the simple case of the "iterator" as mentioned above is handled by a for loop in the mapper. We can get output as generated up to the first $group from above by simply doing:
db.cars.mapReduce(
function () {
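// Copy the start date and snap it to the first of the month so that
// stepping by month cannot skip the final month
var d = new Date(this.startDate.valueOf());
d.setUTCDate(1);
d.setUTCHours(0, 0, 0, 0);
for ( ; d <= this.endDate; d.setUTCMonth(d.getUTCMonth()+1) )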
{
var m = new Date(0);
m.setUTCFullYear(d.getUTCFullYear());
m.setUTCMonth(d.getUTCMonth());
emit({ id: this.type, date: m},1);
}
},
function(key,values) {
return Array.sum(values);
},
{ "out": { "inline": 1 } }
)
Which produces:
{
"_id" : {
"id" : "CAR",
"date" : ISODate("2017-01-01T00:00:00Z")
},
"value" : 1
},
{
"_id" : {
"id" : "CAR",
"date" : ISODate("2017-02-01T00:00:00Z")
},
"value" : 1
},
{
"_id" : {
"id" : "HGV",
"date" : ISODate("2017-02-01T00:00:00Z")
},
"value" : 2
},
{
"_id" : {
"id" : "HGV",
"date" : ISODate("2017-03-01T00:00:00Z")
},
"value" : 2
},
{
"_id" : {
"id" : "HGV",
"date" : ISODate("2017-04-01T00:00:00Z")
},
"value" : 1
}
So it does not have the second grouping to compound to arrays, but we did produce the same basic aggregated output.

MongoDB aggregate count based on multiple query fields - (Multiple field count)

My collection will look like this:
{
"_id" : ObjectId("55c8bd1d85b83e06dc54c0eb"),
"name" : "xxx",
"salary" : 10000,
"type" : "type1"
}
{
"_id" : ObjectId("55c8bd1d85b83e06dc54c0eb"),
"name" : "aaa",
"salary" : 10000,
"type" : "type2"
}
{
"_id" : ObjectId("55c8bd1d85b83e06dc54c0eb"),
"name" : "ccc",
"salary" : 10000,
"type" : "type2"
}
My query params will be coming as,
{salary=10000, type=type2}
So based on the query, I need to fetch the counts matching the above query params.
The result should be something like this,
{ category: 'type1', count: 500 } { category: 'type2', count: 200 } { category: 'name', count: 100 }
Right now I am getting the counts by running three different queries and constructing the result, or by iterating on the server side.
Can anyone suggest a good way to get the above result?
Your question is not very clearly presented, but it seems that what you want to do here is count the occurrences of the data in the fields, optionally filtering those fields by the values that match the criteria.
Here the $cond operator allows you to transform a logical condition into a value:
db.collection.aggregate([
{ "$group": {
"_id": null,
"name": { "$sum": 1 },
"salary": {
"$sum": {
"$cond": [
{ "$gte": [ "$salary", 1000 ] },
1,
0
]
}
},
"type": {
"$sum": {
"$cond": [
{ "$eq": [ "$type", "type2" ] },
1,
0
]
}
}
}}
])
All values are in the same document, and it does not really make any sense to split them up here as this is additional work in the pipeline.
{ "_id" : null, "name" : 3, "salary" : 3, "type" : 2 }
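The effect of summing a $cond that yields 1 or 0 is the same as counting the documents that pass a test for each field. As a rough in-memory illustration, assuming the documents are already loaded (`conditionalCounts` is a hypothetical name):

```javascript
// For each named predicate, counts how many documents satisfy it.
// This mirrors { $sum: { $cond: [ <test>, 1, 0 ] } } per output field.
function conditionalCounts(docs, predicates) {
  const counts = {};
  for (const [field, test] of Object.entries(predicates)) {
    counts[field] = docs.filter(test).length;
  }
  return counts;
}
```

For the sample collection, predicates equivalent to the pipeline would be `{ name: () => true, salary: (d) => d.salary >= 1000, type: (d) => d.type === "type2" }`.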
Otherwise the long form, which is not very performant because it needs to make a copy of each document for every key, looks like this:
db.collection.aggregate([
{ "$project": {
"name": 1,
"salary": 1,
"type": 1,
"category": { "$literal": ["name","salary","type"] }
}},
{ "$unwind": "$category" },
{ "$group": {
"_id": "$category",
"count": {
"$sum": {
"$cond": [
{ "$and": [
{ "$eq": [ "$category", "name"] },
{ "$ifNull": [ "$name", false ] }
]},
1,
{ "$cond": [
{ "$and": [
{ "$eq": [ "$category", "salary" ] },
{ "$gte": [ "$salary", 1000 ] }
]},
1,
{ "$cond": [
{ "$and": [
{ "$eq": [ "$category", "type" ] },
{ "$eq": [ "$type", "type2" ] }
]},
1,
0
]}
]}
]
}
}
}}
])
And its output:
{ "_id" : "type", "count" : 2 }
{ "_id" : "salary", "count" : 3 }
{ "_id" : "name", "count" : 3 }
If your documents do not have uniform key names, or you otherwise cannot specify each key in your pipeline condition, then use mapReduce instead:
db.collection.mapReduce(
function() {
var doc = this;
delete doc._id;
Object.keys(this).forEach(function(key) {
var value = (( key == "salary") && ( doc[key] < 1000 ))
? 0
: (( key == "type" ) && ( doc[key] != "type2" ))
? 0
: 1;
emit(key,value);
});
},
function(key,values) {
return Array.sum(values);
},
{
"out": { "inline": 1 }
}
);
And its output:
"results" : [
{
"_id" : "name",
"value" : 3
},
{
"_id" : "salary",
"value" : 3
},
{
"_id" : "type",
"value" : 2
}
]
Which is basically the same thing with a conditional count, except that you only specify the "reverse" of the conditions you want, and only for the fields you want to filter on. And of course this output format is simple to emit as separate documents.
The same approach applies: test the condition on the fields you want conditions for, and return 1 where the condition is met or 0 where it is not, so that summing gives the count.
You can use aggregation as in the following query:
db.collection.aggregate([{
$match: {
salary: 10000,
//add any other condition here
}
}, {
$group: {
_id: "$type",
"count": {
$sum: 1
}
}
}, {
$project: {
"category": "$_id",
"count": 1,
_id: 0
}
}])

MongoDB select distinct and count

I have a product collection which looks like this:
products = [
{
"ref": "1",
"facets": [
{
"type":"category",
"val":"kitchen"
},
{
"type":"category",
"val":"bedroom"
},
{
"type":"material",
"val":"wood"
}
]
},
{
"ref": "2",
"facets": [
{
"type":"category",
"val":"kitchen"
},
{
"type":"category",
"val":"livingroom"
},
{
"type":"material",
"val":"plastic"
}
]
}
]
I would like to select and count the distinct categories and the number of products that have the category (Note that a product can have more than one category). Something like that:
[
{
"category": "kitchen",
"numberOfProducts": 2
},
{
"category": "bedroom",
"numberOfProducts": 1
},
{
"category": "livingroom",
"numberOfProducts": 1
}
]
And it would be better if I could get the same result for each different facet type, something like that:
[
{
"facetType": "category",
"distinctValues":
[
{
"val": "kitchen",
"numberOfProducts": 2
},
{
"val": "livingroom",
"numberOfProducts": 1
},
{
"val": "bedroom",
"numberOfProducts": 1
}
]
},
{
"facetType": "material",
"distinctValues":
[
{
"val": "wood",
"numberOfProducts": 1
},
{
"val": "plastic",
"numberOfProducts": 1
}
]
}
]
I am doing tests with distinct, aggregate and mapReduce, but can't achieve the results needed. Can anybody tell me a good way to do this?
UPDATE:
With aggregate, this gives me the different facet types that a product has, but not the values nor the count of different values:
db.products.aggregate([
{$match:{'content.facets.type':'category'}},
{$group:{ _id: '$content.facets.type'} }
]).pretty();
The following aggregation pipeline will give you the desired result. In the first pipeline step, you need to do an $unwind operation on the facets array so that it is deconstructed to output a document for each element. After the $unwind stage comes the first of the $group operations, which groups the documents from the previous stage by facet type and value and calculates the number of products in each group using $sum. The second $group stage then creates the array that holds the aggregated values by using the $addToSet operator. The final pipeline stage is the $project operation, which reshapes each document by renaming _id to facetType:
var pipeline = [
{ "$unwind": "$facets" },
{
"$group": {
"_id": {
"facetType": "$facets.type",
"value": "$facets.val"
},
"count": { "$sum": 1 }
}
},
{
"$group": {
"_id": "$_id.facetType",
"distinctValues": {
"$addToSet": {
"val": "$_id.value",
"numberOfProducts": "$count"
}
}
}
},
{
"$project": {
"_id": 0,
"facetType": "$_id",
"distinctValues": 1
}
}
];
db.products.aggregate(pipeline);
Output
/* 0 */
{
"result" : [
{
"distinctValues" : [
{
"val" : "kitchen",
"numberOfProducts" : 2
},
{
"val" : "bedroom",
"numberOfProducts" : 1
},
{
"val" : "livingroom",
"numberOfProducts" : 1
}
],
"facetType" : "category"
},
{
"distinctValues" : [
{
"val" : "wood",
"numberOfProducts" : 1
},
{
"val" : "plastic",
"numberOfProducts" : 1
}
],
"facetType" : "material"
}
],
"ok" : 1
}
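The same unwind-then-group idea can be sketched in plain JavaScript for checking expected counts on a small sample. This assumes the products are already in memory and `facetCounts` is a hypothetical name. Like the pipeline, it counts facet entries, so a product that listed the same facet twice would be counted twice.

```javascript
// Counts how often each facet value occurs, grouped by facet type.
function facetCounts(products) {
  const byType = new Map();
  for (const product of products) {
    // Inner loop plays the role of $unwind on the facets array.
    for (const { type, val } of product.facets) {
      if (!byType.has(type)) byType.set(type, new Map());
      const values = byType.get(type);
      values.set(val, (values.get(val) || 0) + 1); // first $group with $sum
    }
  }
  // Second $group plus $project: one document per facet type.
  return [...byType].map(([facetType, values]) => ({
    facetType,
    distinctValues: [...values].map(([val, numberOfProducts]) => ({
      val,
      numberOfProducts,
    })),
  }));
}
```

The values here come out in first-seen order, whereas $addToSet does not guarantee any particular order in the array it builds.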

MongoDB Aggregation group and count strings

I have a problem with counting the different LogStatus values in my collection. I'd like the following result from a query:
Month | ImporterName | NrOfError | NrOfDebug | NrOfInfo | NrOfWarning
So this includes grouping by Month and ImporterName and counting the number of messages with each of the different statuses.
My MongoDB Collection:
{
"_id" : "8ec84cb7-5099-4a9d-be00-a40200a67c5a",
"Messages" : [
{
"LogStatus" : "Error",
"Message" : "My test message"
},
{
"LogStatus" : "Error",
"Message" : "My test message"
},
{
"LogStatus" : "Error",
"Message" : "My test message"
},
{
"LogStatus" : "Error",
"Message" : "My test message"
},
{
"LogStatus" : "Error",
"Message" : "My test message"
}
],
"StartTime" : new Date("2014-12-15T10:06:09.00Z"),
"EndTime" : new Date("2014-12-15T13:06:09.00Z"),
"HasErrors" : true,
"HasWarnings" : false,
"ImporterName" : "MyImporter"
}
I already have the following queries:
db.SessionLogItems.aggregate
([
{
$project:
{
month :{$month : "$StartTime"},
name: "$ImporterName",
status: "$Messages.LogStatus",
_id: 0
}
}
])
result:
month: 12, "name" : "importername", status: ["Error", "Error", "Info"]
and
db.SessionLogItems.aggregate
([
{
$unwind: "$Messages"
},
{
$group: { _id: "$Messages", Number : {$sum : 1 }}
},
{
$sort: {Number : -1 }
}
])
result:
"_id" : { "LogStatus" : "Warning", "Message" : "My test warning" }, "Number" :5
"_id" : { "LogStatus" : "Error", "Message" : "My test message" }, "Number" : 5
But I can't seem to figure out the correct query. Any help is appreciated!
EDIT:
My example above is just one out of many documents. I have several importers, each with a StartTime and EndTime. The importers have several log messages and four possible LogStatus values: "Error", "Info", "Debug", "Warning". I'd like to have an overview per month and per importer of how many errors, infos, debugs and warnings they produced.
Assuming there is no overlap in your "month" between StartTime and EndTime values then you can simply use the StartTime value as the basis for a grouping key. Most of the magic for your other "fields" comes from the $cond operator which decides whether to count the value or not:
db.SessionLogItems.aggregate([
// Unwind the array to de-normalize the documents contained
{ "$unwind": "$Messages" },
// Month and Importer form the grouping key
{ "$group": {
"_id": {
"month": { "$month": "$StartTime" },
"ImporterName": "$ImporterName"
},
"NrOfError": {
"$sum": {
"$cond": [
{ "$eq": [ "$Messages.LogStatus", "Error" ] },
1,
0
]
}
},
"NrOfDebug": {
"$sum": {
"$cond": [
{ "$eq": [ "$Messages.LogStatus", "Debug" ] },
1,
0
]
}
},
"NrOfInfo": {
"$sum": {
"$cond": [
{ "$eq": [ "$Messages.LogStatus", "Info" ] },
1,
0
]
}
},
"NrOfWarning": {
"$sum": {
"$cond": [
{ "$eq": [ "$Messages.LogStatus", "Warning" ] },
1,
0
]
}
}
}}
])
So basically the "LogStatus" value of each message is tested, and 1 is added to the matching counter field while 0 is added to the others.
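Since the four counters differ only in the status string, the $group stage can also be generated from a list. A minimal sketch, assuming the field names from the question; `buildStatusCountPipeline` is a hypothetical helper name:

```javascript
// Builds the $unwind + $group pipeline with one "NrOf<Status>" counter
// per status, each being a $sum over a $cond that yields 1 or 0.
function buildStatusCountPipeline(statuses) {
  const group = {
    _id: {
      month: { $month: "$StartTime" },
      ImporterName: "$ImporterName",
    },
  };
  for (const status of statuses) {
    group["NrOf" + status] = {
      $sum: { $cond: [{ $eq: ["$Messages.LogStatus", status] }, 1, 0] },
    };
  }
  return [{ $unwind: "$Messages" }, { $group: group }];
}
```

It could then be run as `db.SessionLogItems.aggregate(buildStatusCountPipeline(["Error", "Debug", "Info", "Warning"]))`, which should produce the same stages as the pipeline written out by hand.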