I'm attempting to write a query to return the top X terms for each category, e.g. the top 5 or top 10. Each term has an associated category, and based on some help from another Stack Overflow question I've managed to get this:
db.collection.aggregate([
{
$group : {
_id : {
category: "$uri.category",
term: "$uri.term",
},
total: { $sum : 1 }
}
},
{ $sort : { total : -1 } },
{
$group : {
_id : "$_id.category",
terms: {
$push: {
term: "$_id.term",
total: "$total"
}
}
}
}
]);
The above query does work, and returns data that looks something like this:
[
{ category: "movies",
terms: [ { term: "movie 1", total: 5000 }, { term: "movie 2", total: 200 } ... ]
},
{ category: "sports",
terms: [ { term: "football 1", total: 4000 }, { term: "tennis 2", total: 250 } ... ]
},
]
However I'm trying to limit the terms array to a fixed number i.e. 5 or 10 - this will correspond to the X number of searches per category. I've been trying various options such as adding $slice within the $push to reduce the terms array down with no success.
Can this be achieved using the aggregate framework, or should I look at another approach?
As of MongoDB version 3.1.6 you can slice in the $project stage:
{
$project: {
terms: {
$slice: ["$terms", 0, 10]
}
}
}
This limits the number of $pushed items kept in the terms array to 10.
Here's the issue:
https://jira.mongodb.org/browse/SERVER-6074
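Combined with the pipeline from the question, the $slice stage could be used roughly as follows. This is only a sketch: the helper name topTermsPipeline and its parameter n are illustrative and not part of the original posts, and it assumes a server version where $slice is available as an aggregation expression.

```javascript
// Hypothetical helper: the question's pipeline plus a final $slice stage.
// Assumes a MongoDB version that supports $slice inside $project.
function topTermsPipeline(n) {
  return [
    // Count occurrences of each (category, term) pair.
    { $group: { _id: { category: "$uri.category", term: "$uri.term" }, total: { $sum: 1 } } },
    // Highest counts first, so terms are pushed in descending order.
    { $sort: { total: -1 } },
    // Collect the terms of each category into one array.
    { $group: { _id: "$_id.category", terms: { $push: { term: "$_id.term", total: "$total" } } } },
    // Keep only the first n entries of each terms array.
    { $project: { terms: { $slice: ["$terms", n] } } }
  ];
}

// In the mongo shell: db.collection.aggregate(topTermsPipeline(10));
```

Building the pipeline from a function keeps X (5, 10, ...) in one place instead of hard-coding it in the stage.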
It seems that as of MongoDB 2.6, limiting the size of an array using $slice with $push in the .aggregate() function/command is unsupported.
Here's the feature request on the MongoDB issue tracker.
What I would do is output the aggregated result to a collection, then update that collection.
Example:
Setup:
use test;
var rInt = function(x) {
return 1 + ~~(Math.random() * x);
};
var rObj = function() {
return {
"timestamp": new Date(),
"category": "movies" + rInt(5),
"term": "my movie" + rInt(20)
}
};
for (var i = 0, l = 100; i < l; i++) {
db.al.insert(rObj());
}
Aggregate query
db.al_out.drop();
db.al.aggregate([
{
$group : {
_id : {
category: "$category",
term: "$term",
},
total: { $sum : 1 }
}
},
{ $sort : { total : -1 } },
{
$group : {
_id : "$_id.category",
terms: {
$push: {
term: "$_id.term",
total: "$total"
}
}
}
}
,{ $out : "al_out" } // output the documents to `db.al_out`
]);
// limit the size of terms to 3 elements.
db.al_out.update( {}, {
$push : {
terms : {
$each : [],
$slice : 3
}
}
}, {
multi:true
});
Result:
db.al_out.find();
{ "_id" : "movies1", "terms" : [ { "term" : "my movie7", "total" : 3 }, { "term" : "my movie6", "total" : 3 }, { "term" : "my movie17", "total" : 2 } ] }
{ "_id" : "movies2", "terms" : [ { "term" : "my movie3", "total" : 4 }, { "term" : "my movie11", "total" : 2 }, { "term" : "my movie2", "total" : 2 } ] }
{ "_id" : "movies4", "terms" : [ { "term" : "my movie9", "total" : 3 }, { "term" : "my movie1", "total" : 3 }, { "term" : "my movie7", "total" : 2 } ] }
{ "_id" : "movies3", "terms" : [ { "term" : "my movie19", "total" : 5 }, { "term" : "my movie8", "total" : 4 }, { "term" : "my movie14", "total" : 4 } ] }
{ "_id" : "movies5", "terms" : [ { "term" : "my movie7", "total" : 6 }, { "term" : "my movie17", "total" : 4 }, { "term" : "my movie3", "total" : 2 } ] }
I would add a $limit stage after the $sort and before the $group:
{ $limit : 5 },
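This limits the number of documents passed on to the second $group to 5. Note that the limit applies to the whole stream, so it caps the total number of terms across all categories rather than giving 5 per category. It also limits the total number of documents maintained in memory during the sort, which should improve overall performance: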
When a $sort immediately precedes a $limit in the pipeline, the $sort
operation only maintains the top n results as it progresses, where n
is the specified limit, and MongoDB only needs to store n items in
memory.
http://docs.mongodb.org/manual/reference/operator/aggregation/limit/
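One thing worth checking before relying on this: $limit applies to the whole stream of documents, not to each category separately. The following plain JavaScript emulation (no database involved) contrasts limiting before the second group with slicing each array afterwards. The function names are made up for illustration, and the sample rows reuse the totals from the question.

```javascript
// Stand-in for the output of the first $group + $sort stages (sorted by total, descending).
const grouped = [
  { _id: { category: "movies", term: "movie 1" }, total: 5000 },
  { _id: { category: "sports", term: "football 1" }, total: 4000 },
  { _id: { category: "sports", term: "tennis 2" }, total: 250 },
  { _id: { category: "movies", term: "movie 2" }, total: 200 }
];

// Emulates the second $group: push each row's term into its category's array.
function groupByCategory(rows) {
  const byCategory = {};
  for (const row of rows) {
    const terms = byCategory[row._id.category] || (byCategory[row._id.category] = []);
    terms.push({ term: row._id.term, total: row.total });
  }
  return byCategory;
}

// $limit before $group: keeps the first n rows overall.
function limitThenGroup(rows, n) {
  return groupByCategory(rows.slice(0, n));
}

// $group, then slice: keeps the first n terms of every category.
function groupThenSlice(rows, n) {
  const byCategory = groupByCategory(rows);
  for (const category of Object.keys(byCategory)) {
    byCategory[category] = byCategory[category].slice(0, n);
  }
  return byCategory;
}
```

With n = 1, limitThenGroup(grouped, 1) only contains the "movies" category, while groupThenSlice(grouped, 1) keeps the top term of both categories.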
I have a json collection with 300 records like this:
{
salesNumber: 23839,
batch: null
},
{
salesNumber: 389230,
batch: null
}
...etc.
I need to divide this collection into 3 different batches. So, when sorted by salesNumber, the first 100 would be in batch 1, the next 100 would be batch 2, and the last 100 would be batch 3. How do I do this?
I wrote a script to select the first 100, but when I tried to turn it into an array to use in an update, the result was 0 records.
var firstBatchCompleteRecords = db.properties.find(
{
"auction": ObjectId("50")
}
).sort("saleNumber").limit(100);
// This returned 174 records as expected with all the fields
var firstBatch = firstBatchCompleteRecords.distinct( "saleNumber", {});
// This returned 0 records
I was going to take the results of that last query and use it in an update statement:
db.properties.update(
{
"saleNumber":
{
"$in": firstBatch
}
}
,
{
$set:
{
batch: "1"
}
}
,
{
multi: true
}
);
...then I would have created an array using distinct of the next 100 and update those, but I never got that far.
You can get the results using the aggregation framework and store them in a new collection; then you can use this answer to iterate over them and update the fields in the source collection.
Have fun!
db.sn.aggregate([{
$sort : {
salesNumber : 1
}
}, {
$group : {
_id : null,
arrayOfData : {
$push : "$$ROOT"
},
}
}, {
$project : {
_id : 0,
firstHundred : {
$slice : ["$arrayOfData", 0, 100]
},
secondHundred : {
$slice : ["$arrayOfData", 100, 100] // skip the first 100, take the next 100
},
thirdHundred : {
$slice : ["$arrayOfData", 200, 100] // skip the first 200, take the next 100
},
}
}, {
$project : {
"firstHundred.batch" : {
$literal : 1
},
"firstHundred.salesNumber" : 1,
"firstHundred._id" : 1,
"secondHundred.batch" : {
$literal : 2
},
"secondHundred.salesNumber" : 1,
"secondHundred._id" : 1,
"thirdHundred.batch" : {
$literal : 3
},
"thirdHundred.salesNumber" : 1,
"thirdHundred._id" : 1,
}
}, {
$project : {
allValues : {
$setUnion : ["$firstHundred", "$secondHundred", "$thirdHundred"]
}
}
}, {
$unwind : "$allValues"
}, {
$project : {
_id : "$allValues._id",
salesNumber : "$allValues.salesNumber",
batch : "$allValues.batch",
}
}, {
$out : "collectionName"
}
])
db.collectionName.find()
The output generated for 6 documents split into batches of 2:
{
"_id" : ObjectId("5733ade7eeeccba2bd546121"),
"salesNumber" : 389230,
"batch" : 2
}, {
"_id" : ObjectId("5733ade7eeeccba2bd546120"),
"salesNumber" : 23839,
"batch" : 1
}, {
"_id" : ObjectId("5733ade7eeeccba2bd546122"),
"salesNumber" : 43839,
"batch" : 1
}, {
"_id" : ObjectId("5733ade7eeeccba2bd546124"),
"salesNumber" : 63839,
"batch" : 2
}, {
"_id" : ObjectId("5733ade7eeeccba2bd546123"),
"salesNumber" : 589230,
"batch" : 3
}, {
"_id" : ObjectId("5733ade7eeeccba2bd546125"),
"salesNumber" : 789230,
"batch" : 3
}
Any comments welcome!
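The batching rule itself (sort by salesNumber, then number the batches by position) can be sketched in plain JavaScript. The helper name assignBatches is hypothetical, and the sample values are the six salesNumber values from the output above.

```javascript
// Hypothetical helper: returns copies of the records, sorted by salesNumber,
// with batch set to 1 for the first batchSize records, 2 for the next, and so on.
function assignBatches(records, batchSize) {
  return records
    .slice() // copy so the input array is not reordered
    .sort((a, b) => a.salesNumber - b.salesNumber)
    .map((record, index) => Object.assign({}, record, {
      batch: Math.floor(index / batchSize) + 1
    }));
}

const sample = [
  { salesNumber: 389230, batch: null },
  { salesNumber: 23839, batch: null },
  { salesNumber: 43839, batch: null },
  { salesNumber: 63839, batch: null },
  { salesNumber: 589230, batch: null },
  { salesNumber: 789230, batch: null }
];
const batched = assignBatches(sample, 2);
```

For the 300-record case the call would be assignBatches(records, 100); each resulting batch value could then be written back with a $set update keyed on the record's _id.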
I have two different collections, book and music, stored as JSON. First, here is an example of the book collection:
{
"_id" : ObjectId("b1"),
"author" : [
"Mary",
],
"title" : "Book1",
}
{
"_id" : ObjectId("b2"),
"author" : [
"Joe",
"Tony",
"Mary"
],
"title" : "Book2",
}
{
"_id" : ObjectId("b3"),
"author" : [
"Joe",
"Mary"
],
"title" : "Book3",
}
.......
Mary has written 3 books, Joe has written 2 books, and Tony has written 1 book. Second, here is an example of the music collection:
{
"_id" : ObjectId("m1"),
"author" : [
"Tony"
],
"title" : "Music1",
}
{
"_id" : ObjectId("m2"),
"author" : [
"Joe",
"Tony"
],
"title" : "Music2",
}
.......
Tony has 2 pieces of music, Joe has 1, and Mary has 0.
I want to get the number of authors who have written more books than music.
Thus Mary (3 > 0) and Joe (2 > 1) should be counted, but not Tony (1 < 2), so the final result should be 2 (Mary and Joe).
I wrote the following code, but I don't know how to compare the counts:
db.book.aggregate([
{ $project:{ _id:0, author:1}},
{ $unwind:"$author" },
{$group:{_id:"$author", count:{$sum:1}}}
]
)
db.music.aggregate([
{ $project:{ _id:0, author:1}},
{ $unwind:"$author" },
{$group:{_id:"$author", count:{$sum:1}}}
]
)
Is this right so far? How do I do the comparison? Thanks.
To solve this problem, we need to use the $out stage to store the result of both queries in intermediate collections, and then use an aggregation query to join them ($lookup).
db.book.aggregate([{
$project : {
_id : 0,
author : 1
}
}, {
$unwind : "$author"
}, {
$group : {
_id : "$author",
count : {
$sum : 1
}
}
}, {
$project : {
_id : 0,
author : "$_id",
count : 1
}
}, {
$out : "bookAuthors"
}
])
db.music.aggregate([{
$project : {
_id : 0,
author : 1
}
}, {
$unwind : "$author"
}, {
$group : {
_id : "$author",
count : {
$sum : 1
}
}
}, {
$project : {
_id : 0,
author : "$_id",
count : 1
}
}, {
$out : "musicAuthors"
}
])
db.bookAuthors.aggregate([{
$lookup : {
from : "musicAuthors",
localField : "author",
foreignField : "author",
as : "music"
}
}, {
$unwind : {
path : "$music",
preserveNullAndEmptyArrays : true // keep authors with no music at all, e.g. Mary
}
}, {
$project : {
_id : "$author",
result : {
$gt : ["$count", "$music.count"]
},
count : 1,
}
}, {
$match : {
result : true
}
}
])
EDIT CHANGES:
used the author field instead of _id
added a logical expression embedded in the document in the $project stage:
result : { $gt : ["$count", "$music.count"] }
Any questions welcome!
Have fun!
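As a cross-check of the expected answer, the same comparison can be emulated in plain JavaScript on the sample documents from the question, treating an author with no music as having a count of 0. The function names here are illustrative only.

```javascript
// Emulates $unwind + $group: counts how many documents list each author.
function countByAuthor(docs) {
  const counts = {};
  for (const doc of docs) {
    for (const author of doc.author) {
      counts[author] = (counts[author] || 0) + 1;
    }
  }
  return counts;
}

// Authors whose book count is strictly greater than their music count.
function authorsWithMoreBooksThanMusic(books, music) {
  const bookCounts = countByAuthor(books);
  const musicCounts = countByAuthor(music);
  return Object.keys(bookCounts).filter(
    (author) => bookCounts[author] > (musicCounts[author] || 0)
  );
}

const books = [
  { author: ["Mary"], title: "Book1" },
  { author: ["Joe", "Tony", "Mary"], title: "Book2" },
  { author: ["Joe", "Mary"], title: "Book3" }
];
const music = [
  { author: ["Tony"], title: "Music1" },
  { author: ["Joe", "Tony"], title: "Music2" }
];
const winners = authorsWithMoreBooksThanMusic(books, music);
```

Here winners.length gives the number the question asks for.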
I need to get the $sum and $avg of subdocuments; I would like to get the $sum and $avg of Channels[0] and of the other channels as well.
My data structure looks like this:
{
_id : ...,
Location : 1,
Channels : [
{ _id: ...,
Value: 25
},
{
_id: ... ,
Value: 39
},
{
_id: ..,
Value: 12
}
]
}
In order to get the sum and average of the Channels.Value elements for each document in your collection you will need to use MongoDB's aggregation framework. Further, since Channels is an array you will need to use the $unwind operator to deconstruct the array.
Assuming that your collection is called example, here's how you could get both the per-document sum and average of Channels.Value:
db.example.aggregate( [
{
"$unwind" : "$Channels"
},
{
"$group" : {
"_id" : "$_id",
"documentSum" : { "$sum" : "$Channels.Value" },
"documentAvg" : { "$avg" : "$Channels.Value" }
}
}
] )
The output from your post's data would be:
{
"_id" : SomeObjectIdValue,
"documentSum" : 76,
"documentAvg" : 25.333333333333332
}
If you have more than one document in your collection then you will see a result row for each document containing a Channels array.
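As a possible alternative that avoids $unwind, newer servers allow $sum and $avg to be applied directly to an array inside $project. The sketch below assumes MongoDB 3.2 or later; the variable names are illustrative. The second half simply repeats the arithmetic in plain JavaScript on the three values from the question.

```javascript
// Assumption: MongoDB 3.2 or later, where $sum and $avg accept an
// array-valued expression inside $project.
const perDocumentStats = [
  {
    $project: {
      documentSum: { $sum: "$Channels.Value" },
      documentAvg: { $avg: "$Channels.Value" }
    }
  }
];
// In the mongo shell: db.example.aggregate(perDocumentStats);

// The same arithmetic in plain JavaScript, using the values from the question.
const values = [25, 39, 12];
const documentSum = values.reduce((sum, value) => sum + value, 0);
const documentAvg = documentSum / values.length;
```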
Solution 1: Using two groups, based on the example from a previous question:
db.records.aggregate(
[
{ $unwind: "$Channels" },
{ $group: {
_id: {
"loc" : "$Location",
"cId" : "$Channels.Id"
},
"value" : {$sum : "$Channels.Value" },
"average" : {$avg : "$Channels.Value"},
"maximum" : {$max : "$Channels.Value"},
"minimum" : {$min : "$Channels.Value"}
}},
{ $group: {
_id : "$_id.loc",
"ChannelsSummary" : { $push :
{ "channelId" : '$_id.cId',
"value" :'$value',
"average" : '$average',
"maximum" : '$maximum',
"minimum" : '$minimum'
}}
}
}
]
)
Solution 2:
There is a property I didn't show in my original question that might be of help: "Channels.Id", which is independent of "Channels._Id".
db.records.aggregate( [
{
"$unwind" : "$Channels"
},
{
"$group" : {
"_id" : "$Channels.Id",
"documentSum" : { "$sum" : "$Channels.Value" },
"documentAvg" : { "$avg" : "$Channels.Value" }
}
}
] )
I'm trying to implement a nested group query in mongodb and I'm getting stuck trying to add the outer group by. Given the below (simplified) data document:
{
"timestamp" : ISODate(),
"category" : "movies",
"term" : "my movie"
}
I'm trying to achieve a list of all categories and within the categories there should be the top number of terms. I would like my output something like this:
[
{ category: "movies",
terms: [ { term: "movie 1", total: 5000 }, { term: "movie 2", total: 200 } ... ]
},
{ category: "sports",
terms: [ { term: "football 1", total: 4000 }, { term: "tennis 2", total: 250 } ... ]
},
]
My 'inner group' is as shown below, and will get the top 5 for all categories:
db.collection.aggregate([
{ $match : { "timestamp": { $gt: ISODate("2014-08-27") } } },
{ $group : { _id : "$term", total : { $sum : 1 } } },
{ $sort : { total : -1 } },
{ $limit: 5 }
]);
// Outputs:
{ "_id" : "movie 1", "total" : 943 }
{ "_id" : "movie 2", "total" : 752 }
How would I go about implementing the 'outer group'?
Additionally, sometimes the above aggregation returns a null value (not all documents have a term value). How do I go about ignoring the null values?
Thanks in advance.
You will need two groups in this case. The first group generates a stream of documents with one document per term and category:
{ $group : {
_id : {
category: "$category",
term: "$term",
},
total: { $sum : 1 }
}
}
A second group will then merge all documents with the same category into one, using the $push operator to collect the terms and their totals into an array:
{ $group : {
_id : "$_id.category",
terms: {
$push: {
term:"$_id.term",
total:"$total"
}
}
}
}
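Put together, and including a filter for the null terms mentioned in the question, the full pipeline might look like the sketch below. The helper name termsByCategoryPipeline and its since parameter are hypothetical. The $match on term uses { $ne: null }, which excludes documents where term is null or absent.

```javascript
// Hypothetical helper combining the two $group stages described above.
function termsByCategoryPipeline(since) {
  return [
    // Apply the date filter and skip documents whose term is null or missing.
    { $match: { timestamp: { $gt: since }, term: { $ne: null } } },
    // One document per (category, term) pair with its count.
    { $group: { _id: { category: "$category", term: "$term" }, total: { $sum: 1 } } },
    // Highest counts first, so terms are pushed in descending order.
    { $sort: { total: -1 } },
    // One document per category, with its terms collected into an array.
    { $group: { _id: "$_id.category", terms: { $push: { term: "$_id.term", total: "$total" } } } }
  ];
}

// In the mongo shell:
// db.collection.aggregate(termsByCategoryPipeline(ISODate("2014-08-27")));
```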
Query:
db.getCollection('orders').aggregate([
{$match:{
tipo: {$regex:"[A-Z]+"}
}
},
{$group:
{
_id:{
codigo:"1",
tipo:"$tipo",
},
total:{$sum:1}
}
},
{$group:
{
_id:"$_id.codigo",
tipos:
{
$push:
{
tipo:"$_id.tipo",
total:"$total"
}
},
totalGeneral:{$sum:"$total"}
}
}
]);
Response:
{
"_id" : "1",
"tipos" : [
{
"tipo" : "TIPO_01",
"total" : 13.0
},
{
"tipo" : "TIPO_02",
"total" : 2479.0
},
{
"tipo" : "TIPO_03",
"total" : 12445.0
},
{
"tipo" : "TIPO_04",
"total" : 12445.0
},
{
"tipo" : "TIPO_05",
"total" : 21.0
},
{
"tipo" : "TIPO_06",
"total" : 21590.0
},
{
"tipo" : "TIPO_07",
"total" : 1065.0
},
{
"tipo" : "TIPO_08",
"total" : 562.0
}
],
"totalGeneral" : 50620.0
}
I am fairly new to MongoDB and I am playing with the aggregate framework. One of the examples from the documentation shows the following, which returns total number of new user joins per month and lists the month joined:
db.users.aggregate(
[
{ $project : { month_joined : { $month : "$joined" } } } ,
{ $group : { _id : {month_joined:"$month_joined"} , number : { $sum : 1 } } },
{ $sort : { "_id.month_joined" : 1 } }
]
)
The code outputs the following:
{
"_id" : {
"month_joined" : 1
},
"number" : 3
},
{
"_id" : {
"month_joined" : 2
},
"number" : 9
},
{
"_id" : {
"month_joined" : 3
},
"number" : 5
}
Is it possible to also have each object contain the sum of all users that have joined since the start, so I don't have to run over the objects programmatically and calculate it myself?
Example desired output:
{
"_id" : {
"month_joined" : 1
},
"number" : 3,
"total": 3
},
{
"_id" : {
"month_joined" : 2
},
"number" : 9,
"total": 12
},
{
"_id" : {
"month_joined" : 3
},
"number" : 5,
"total": 17
}
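For reference, the running total in the desired output is a cumulative sum over the months in ascending order. Below is a minimal client-side sketch of that calculation, followed by a stage that could compute it on the server, assuming MongoDB 5.0 or later (which provides $setWindowFields). The names withRunningTotal and runningTotalStage are illustrative.

```javascript
// Adds a running "total" to each row; expects rows already sorted by month.
function withRunningTotal(rows) {
  let total = 0;
  return rows.map((row) => {
    total += row.number;
    return Object.assign({}, row, { total: total });
  });
}

const monthly = [
  { _id: { month_joined: 1 }, number: 3 },
  { _id: { month_joined: 2 }, number: 9 },
  { _id: { month_joined: 3 }, number: 5 }
];
const cumulative = withRunningTotal(monthly);

// Assumption: MongoDB 5.0 or later. Appended after the $group stage, this
// sums "number" over all documents from the first up to the current one.
const runningTotalStage = {
  $setWindowFields: {
    sortBy: { "_id.month_joined": 1 },
    output: {
      total: { $sum: "$number", window: { documents: ["unbounded", "current"] } }
    }
  }
};
```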