MongoDB fetch documents with sort by count - mongodb

I have a document with sub-document which looks something like:
{
"name" : "some name1",
"like" : [
{ "date" : ISODate("2012-11-30T19:00:00Z") },
{ "date" : ISODate("2012-12-02T19:00:00Z") },
{ "date" : ISODate("2012-12-01T19:00:00Z") },
{ "date" : ISODate("2012-12-03T19:00:00Z") }
]
}
Is it possible to fetch documents "most liked" (average value for the last 7 days) and sort by the count?

There are a few different ways to solve this problem. The solution I will focus on uses MongoDB's aggregation framework. First, here is an aggregation pipeline that will solve your problem; an explanation of what is happening in the command follows it.
db.testagg.aggregate(
{ $unwind : '$likes' },
{ $group : { _id : '$_id', numlikes : { $sum : 1 }}},
{ $sort : { 'numlikes' : 1}})
This pipeline has 3 main stages:
1) Unwind: this splits up the 'likes' field so that there is 1 'likes' element per document
2) Group: this regroups the documents by the _id field, incrementing the numlikes field for every document it finds. This causes numlikes to equal the number of elements that were in "likes" before the unwind
3) Sort: finally, we sort the results in ascending order by numlikes. In a test I ran, the output of this command is:
{"result" : [
{
"_id" : 1,
"numlikes" : 1
},
{
"_id" : 2,
"numlikes" : 2
},
{
"_id" : 3,
"numlikes" : 3
},
{
"_id" : 4,
"numlikes" : 4
}....
This is for data inserted via:
for (var i=0; i < 100; i++) {
db.testagg.insert({_id : i})
for (var j=0; j < i; j++) {
db.testagg.update({_id : i}, {'$push' : {'likes' : j}})
}
}
Note that this does not completely answer your question as it avoids the issue of picking the date range, but it should hopefully get you started and moving in the right direction.
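As a rough sketch of how the date range could be added, a $match stage can be placed after the $unwind so that only recent likes are counted. This is an assumption about what is wanted, not a tested solution: it uses the field names from the question ("like" with a "date" in each element), and buildRecentLikesPipeline is a hypothetical helper name. Dividing the count by 7 to get a daily average would not change the ordering, so the sketch sorts on the raw count, most liked first.

```javascript
// Build a pipeline that counts only the likes from the last `days` days.
// Field names follow the question's document: a "like" array whose
// elements each carry a "date". The helper name is hypothetical.
function buildRecentLikesPipeline(now, days) {
  const cutoff = new Date(now.getTime() - days * 24 * 60 * 60 * 1000);
  return [
    { $unwind: "$like" },                               // one document per like
    { $match: { "like.date": { $gte: cutoff } } },      // keep recent likes only
    { $group: { _id: "$_id", numlikes: { $sum: 1 } } }, // count likes per document
    { $sort: { numlikes: -1 } }                         // most liked first
  ];
}

const recentLikesPipeline = buildRecentLikesPipeline(
  new Date("2012-12-04T00:00:00Z"),
  7
);
// In the mongo shell this could be run as:
//   db.test.aggregate(recentLikesPipeline)
```

Documents with no likes inside the window drop out of the result entirely, because nothing is left to group after the $match.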
Of course, there are other ways to solve this problem. One solution might be to just do all of the sorting and manipulations client-side. This is just one method for getting the information you desire.
EDIT: If you find this somewhat tedious, there is a ticket to add a $size operator to the aggregation framework. I invite you to watch and potentially upvote it to try to speed the addition of this new operator if you are interested.
https://jira.mongodb.org/browse/SERVER-4899

A better solution would be to keep a count field that records how many likes the document has. While you can use aggregation to do this, the performance will likely not be very good. Having an index on the count field will make read operations fast, and you can use an atomic operation to increment the counter when inserting new likes.
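A minimal sketch of that idea follows. The counter field name likeCount and the helper buildLikeUpdate are hypothetical, and "like" is the array from the question. Because $push and $inc sit in the same update document, both changes are applied to the document in one atomic update.

```javascript
// Build the update document that records one like and bumps the counter.
// "likeCount" is a hypothetical field name chosen for this sketch.
function buildLikeUpdate(date) {
  return {
    $push: { like: { date: date } }, // append the new like
    $inc: { likeCount: 1 }           // keep the counter in step
  };
}

const likeUpdate = buildLikeUpdate(new Date("2012-12-04T19:00:00Z"));
// In the mongo shell this could be applied and read back as follows
// (someId is a placeholder for the document's _id):
//   db.test.update({ _id: someId }, likeUpdate)
//   db.test.createIndex({ likeCount: -1 })
//   db.test.find().sort({ likeCount: -1 })
```

Note that a plain counter like this holds the all-time total, so it does not by itself answer the "last 7 days" part of the question.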

From MongoDB v3.4 onwards, you can simplify the above aggregation query by using $sortByCount:
> db.test.aggregate([
{ $unwind: "$like" },
{ $sortByCount: "$_id" }
]).pretty()
{ "_id" : ObjectId("5864edbfa4d3847e80147698"), "count" : 4 }
Also, as @ACE said, you can now use $size within a projection instead:
db.test.aggregate([
{ $project: { count: { $size : "$like" } } }
]);
{ "_id" : ObjectId("5864edbfa4d3847e80147698"), "count" : 4 }

Related

How to improve aggregate pipeline

I have pipeline
[
{'$match':{templateId:ObjectId('blabla')}},
{
"$sort" : {
"_id" : 1
}
},
{
"$facet" : {
"paginatedResult" : [
{
"$skip" : 0
},
{
"$limit" : 100
}
],
"totalCount" : [
{
"$count" : "count"
}
]
}
}
]
Index:
"key" : {
"templateId" : 1,
"_id" : 1
}
The collection has 10.6M documents, 500k of which have the needed templateId.
The aggregate uses the index:
"planSummary" : "IXSCAN { templateId: 1, _id: 1 }",
But the request takes 16 seconds. What did I do wrong? How can I speed it up?
To start, you should get rid of the $sort stage. The documents are already guaranteed to be sorted by _id thanks to the { templateId: 1, _id: 1 } index, so the outcome is sorting 500k documents which are already sorted anyway.
Next, you shouldn't use the $skip approach. For high page numbers you will skip large numbers of documents, up to almost 500k (index entries rather than documents, but still).
I suggest an alternative approach:
For the first page, calculate an _id that you know for sure falls before the leftmost entry of the index. Say, if you know that you don't have entries dated 2019 or earlier, you can start from a value like this:
var pageStart = ObjectId.fromDate(new Date("2020/01/01"))
Then, your match operator should look like this:
{'$match' : {templateId:ObjectId('blabla'), _id: {$gt: pageStart}}}
For the next pages, keep track of the last document of the previous page: if the rightmost document _id is x in a certain page, then pageStart should be x for the next page.
So your pipeline may look like this:
[
{'$match' : {templateId:ObjectId('blabla'), _id: {$gt: pageStart}}},
{
"$facet" : {
"paginatedResult" : [
{
"$limit" : 100
}
]
}
}
]
Note that the $skip is now missing from the $facet stage as well.
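The page-to-page bookkeeping described above could be sketched like this. Both helper names are hypothetical, and plain values stand in for ObjectIds so the snippet runs outside the shell. The $facet wrapper is left out here because only one branch remains, which is a simplification of this sketch rather than part of the suggestion above.

```javascript
// Build the pipeline for one page of range-based pagination.
// lastSeenId is the _id of the last document on the previous page, or,
// for the first page, a lower bound known to precede every document.
function buildPagePipeline(templateId, lastSeenId, pageSize) {
  return [
    { $match: { templateId: templateId, _id: { $gt: lastSeenId } } },
    { $limit: pageSize }
  ];
}

// Given the documents of the page just fetched, return the value to pass
// as lastSeenId for the next page. An empty page keeps the current start.
function nextPageStart(pageDocs, currentStart) {
  return pageDocs.length > 0 ? pageDocs[pageDocs.length - 1]._id : currentStart;
}

// Example with stand-in values instead of real ObjectIds.
const firstPagePipeline = buildPagePipeline("template-1", 0, 100);
const startForSecondPage = nextPageStart([{ _id: 7 }, { _id: 42 }], 0);
```

The cost of each page then depends on the page size rather than on how deep into the result set the page is.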

Generating Unique Keys when using Aggregation in MongoDB? How to use $out after $unwind?

db.test.find() provides the following document
/* 1 */
{
"_id" : 1,
"relatives" : [
"A",
"B",
"C"
]
}
after $unwind (db.test.aggregate([{ $unwind : "$relatives"}])), it becomes
/* 1 */
{
"_id" : 1,
"relatives" : "A"
}
/* 2 */
{
"_id" : 1,
"relatives" : "B"
}
/* 3 */
{
"_id" : 1,
"relatives" : "C"
}
Now if I want to $out (db.test.aggregate([{ $unwind : "$relatives"}, {"$out" : "new_collection"}])) the document into another collection, I will get a duplicate Key error. There is another question where he/she just wanted to remove the duplicate documents. But as you can see, I will need these different documents. And so, I want to recompute the IDs or create unique IDs for each document so that I can $out the collection successfully.
EDIT 1 :
I was able to solve this by using a ForEach loop...
count = 1;
db.test.aggregate([{ $unwind : "$relatives"}]).forEach(function (element){
element._id = count;
count++;
db.new_collection.save(element);
});
...but I want to know if there is a more elegant way to solve this problem.
Omit the _id with $project, then use $out to write the result to the new collection. Each document written without an _id gets a newly generated one:
db.test.aggregate([
{ $unwind : "$relatives" },
{ $project : { _id : 0, relatives : 1 } },
{ $out : "newCollection" }
]);

MongoDB Calculate Values from Two Arrays, Sort and Limit

I have a MongoDB database storing float arrays. Assume a collection of documents in the following format:
{
"id" : 0,
"vals" : [ 0.8, 0.2, 0.5 ]
}
Having a query array, e.g., with values [ 0.1, 0.3, 0.4 ], I would like to compute for all elements in the collection a distance (e.g., sum of differences; for the given document and query it would be computed by abs(0.8 - 0.1) + abs(0.2 - 0.3) + abs(0.5 - 0.4) = 0.9).
I tried to use the aggregation function of MongoDB to achieve this, but I can't work out how to iterate over the array. (I am not using the built-in geo operations of MongoDB, as the arrays can be rather long)
I also need to sort the results and limit to the top 100, so calculation after reading the data is not desired.
Current Processing is mapReduce
If you need to execute this on the server and sort the top results and just keep the top 100, then you could use mapReduce for this like so:
db.test.mapReduce(
function() {
var input = [0.1,0.3,0.4];
var value = Array.sum(this.vals.map(function(el,idx) {
return Math.abs( el - input[idx] )
}));
emit(null,{ "output": [{ "_id": this._id, "value": value }]});
},
function(key,values) {
var output = [];
values.forEach(function(value) {
value.output.forEach(function(item) {
output.push(item);
});
});
output.sort(function(a,b) {
return b.value - a.value; // descending; a comparator must return a number, not a boolean
});
return { "output": output.slice(0,100) };
},
{ "out": { "inline": 1 } }
)
So the mapper function does the calculation and outputs everything under the same key so all results are sent to the reducer. The end output is going to be contained in an array in a single output document, so it is important both that all results are emitted with the same key value and that the output of each emit is itself an array, so mapReduce can work properly.
The sorting and reduction is done in the reducer itself: as each emitted document is inspected, the elements are put into a single temporary array, sorted, and the top results are returned.
That is important, and it is the reason why the emitter produces an array even if it holds only a single element at first. MapReduce works by processing results in "chunks", so even if all emitted documents have the same key, they are not all processed at once. Rather, the reducer puts its results back into the queue of emitted results to be reduced until there is only a single document left for that particular key.
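To make the chunking point concrete, here is a small in-memory sketch in plain JavaScript (no database involved). reduceTopN is a hypothetical stand-in with the same shape as the reducer above, and the emitted values are made up. It shows that reducing everything in one pass gives the same answer as reducing two chunks separately and then reducing the two partial results, which only works because the reducer's output has the same format as its input.

```javascript
// Stand-in for the reducer: merge all "output" arrays, sort descending
// by value, and keep the top n entries.
function reduceTopN(values, n) {
  const merged = [];
  values.forEach(function (value) {
    value.output.forEach(function (item) {
      merged.push(item);
    });
  });
  merged.sort(function (a, b) {
    return b.value - a.value;
  });
  return { output: merged.slice(0, n) };
}

// Made-up emitted values, one single-element array per document.
const emitted = [5, 1, 9, 3, 7, 2].map(function (v, i) {
  return { output: [{ _id: i, value: v }] };
});

// One pass over everything...
const direct = reduceTopN(emitted, 3);

// ...versus two chunks reduced separately, then reduced again.
const partialA = reduceTopN(emitted.slice(0, 3), 3);
const partialB = reduceTopN(emitted.slice(3), 3);
const rereduced = reduceTopN([partialA, partialB], 3);
```

Here both direct and rereduced hold the entries with values 9, 7 and 5, in that order.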
I'm restricting the "slice" output here to 10 for brevity of listing, and including the stats to make a point, as the 100 reduce cycles called on this 10000-document sample can be seen:
{
"results" : [
{
"_id" : null,
"value" : {
"output" : [
{
"_id" : ObjectId("56558d93138303848b496cd4"),
"value" : 2.2
},
{
"_id" : ObjectId("56558d96138303848b49906e"),
"value" : 2.2
},
{
"_id" : ObjectId("56558d93138303848b496d9a"),
"value" : 2.1
},
{
"_id" : ObjectId("56558d93138303848b496ef2"),
"value" : 2.1
},
{
"_id" : ObjectId("56558d94138303848b497861"),
"value" : 2.1
},
{
"_id" : ObjectId("56558d94138303848b497b58"),
"value" : 2.1
},
{
"_id" : ObjectId("56558d94138303848b497ba5"),
"value" : 2.1
},
{
"_id" : ObjectId("56558d94138303848b497c43"),
"value" : 2.1
},
{
"_id" : ObjectId("56558d95138303848b49842b"),
"value" : 2.1
},
{
"_id" : ObjectId("56558d96138303848b498db4"),
"value" : 2.1
}
]
}
}
],
"timeMillis" : 1758,
"counts" : {
"input" : 10000,
"emit" : 10000,
"reduce" : 100,
"output" : 1
},
"ok" : 1
}
So this is a single document output, in the specific mapReduce format, where the "value" contains an element which is an array of the sorted and limited result.
Future Processing is Aggregate
As of writing, the current latest stable release of MongoDB is 3.0, and this lacks the functionality to make your operation possible. But the upcoming 3.2 release introduces new operators that make this possible:
db.test.aggregate([
{ "$unwind": { "path": "$vals", "includeArrayIndex": "index" }},
{ "$group": {
"_id": "$_id",
"result": {
"$sum": {
"$abs": {
"$subtract": [
"$vals",
{ "$arrayElemAt": [ { "$literal": [0.1,0.3,0.4] }, "$index" ] }
]
}
}
}
}},
{ "$sort": { "result": -1 } },
{ "$limit": 100 }
])
Also limiting to the same 10 results for brevity, you get output like this:
{ "_id" : ObjectId("56558d96138303848b49906e"), "result" : 2.2 }
{ "_id" : ObjectId("56558d93138303848b496cd4"), "result" : 2.2 }
{ "_id" : ObjectId("56558d96138303848b498e31"), "result" : 2.1 }
{ "_id" : ObjectId("56558d94138303848b497c43"), "result" : 2.1 }
{ "_id" : ObjectId("56558d94138303848b497861"), "result" : 2.1 }
{ "_id" : ObjectId("56558d96138303848b499037"), "result" : 2.1 }
{ "_id" : ObjectId("56558d96138303848b498db4"), "result" : 2.1 }
{ "_id" : ObjectId("56558d93138303848b496ef2"), "result" : 2.1 }
{ "_id" : ObjectId("56558d93138303848b496d9a"), "result" : 2.1 }
{ "_id" : ObjectId("56558d96138303848b499182"), "result" : 2.1 }
This is made possible largely due to $unwind being modified to project a field in results that contains the array index, and also due to $arrayElemAt which is a new operator that can extract an array element as a singular value from a provided index.
This allows the "look-up" of values by index position from your input array in order to apply the math to each element. The input array is wrapped in the existing $literal operator so that $arrayElemAt does not complain and recognizes it as an array (this seems to be a small bug at present, as other array functions don't have the problem with direct input), and the appropriate matching value is fetched by using the "index" field produced by $unwind.
The math is done by $subtract and of course another new operator in $abs to meet your functionality. Also since it was necessary to unwind the array in the first place, all of this is done inside a $group stage accumulating all array members per document and applying the addition of entries via the $sum accumulator.
Finally all result documents are processed with $sort and then the $limit is applied to just return the top results.
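The per-document arithmetic can be checked outside the database with a few lines of plain JavaScript. sumAbsDiff is a hypothetical helper name, and the inputs are the sample document and query array from the question. Floating point rounding means the result is only approximately 0.9, so compare with a tolerance rather than with ===.

```javascript
// Sum of absolute element-wise differences between two equal-length arrays.
function sumAbsDiff(vals, query) {
  return vals.reduce(function (acc, el, idx) {
    return acc + Math.abs(el - query[idx]);
  }, 0);
}

// Sample document and query array from the question.
const distance = sumAbsDiff([0.8, 0.2, 0.5], [0.1, 0.3, 0.4]);

// Compare with a tolerance because of floating point rounding.
const matchesExpected = Math.abs(distance - 0.9) < 1e-9;
```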
Summary
Even with the new functionality about to be available to the aggregation framework for MongoDB, it is debatable which approach is actually more efficient for results. This is largely due to there still being a need to $unwind the array content, which effectively produces a copy of each document per array member in the pipeline to be processed, and that generally causes an overhead.
So whilst mapReduce is the only present way to do this until a new release, it may actually outperform the aggregation statement depending on the amount of data to be processed, and despite the fact that the aggregation framework works on native coded operators rather than translated JavaScript operations.
As with all things, testing is always recommended to see which case suits your purposes better and which gives the best performance for your expected processing.
Sample
Of course the expected result for the sample document provided in the question is 0.9 by the math applied. But just for my testing purposes, here is a short listing used to generate some sample data that I wanted to at least verify the mapReduce code was working as it should:
var bulk = db.test.initializeUnorderedBulkOp();
var x = 10000;
while ( x-- ) {
var vals = [0,0,0];
vals = vals.map(function(val) {
return Math.round(Math.random()*10)/10;
});
bulk.insert({ "vals": vals });
if ( x % 1000 == 0) {
bulk.execute();
bulk = db.test.initializeUnorderedBulkOp();
}
}
The arrays are totally random single decimal point values, so there is not a lot of distribution in the listed results I gave as sample output.

Mongoose Mongodb sorting and limiting query of subdocuments

I've got the following design Schema:
{
participants: [String],
conversations: [{
date: Date,
messages: [String]
}]
}
Now I want to get the 6 newest conversations. I have tried a lot but I can't seem to find the solution. I can sort by subdocuments, but in the end, if 1 document has the 6 newest conversations, the query will end up giving me this one document plus 5 others. I would like to get an array like this at the end of the query, or be able to get this particular information:
[{date:'Adate', messages:[]},{date:'Adate2',messages:[]}]
Thanks for your help!
Actually this is not possible with a single query if you are using a MongoDB version lower than 3.1.6.
$slice is supported in the aggregation pipeline in MongoDB version 3.1.6 and above.
If your MongoDB version is below 3.1.6, then you can try the below piece of code:
db.collection.aggregate([
{ $unwind : "$conversations" },
{ $sort : { _id : 1, "conversations.date" : -1 } },
{ $group : { _id : "$_id", conversations : { $push : "$conversations" }, participants : { $first : "$participants" } } },
{ $project : { _id : 1, conversations : 1, participants : 1 } }
]).forEach( function(doc)
{
if( doc.conversations.length > 6)
{
var count = doc.conversations.length - 6;
doc.conversations.splice(6, count );
}
}
)
There is a similar question on Stack Overflow for versions below 3.1.6; please check the link.
For MongoDB version 3.1.6 and above, you can use $slice in the aggregation pipeline to limit the contents of the array.
Try the below code:
db.collection.aggregate([
{ $unwind : "$conversations" },
{ $sort : { _id : 1, "conversations.date" : -1 } },
{ $group : { _id : "$_id", conversations : { $push : "$conversations" }, participants : { $first : "$participants" } } },
{ $project :
{
_id : 1,
participants : 1,
// $slice takes the array and the number of elements to keep
newconversations : { $slice : [ "$conversations", 6 ] }
}
}
])
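Both pipelines above trim each document's own conversations. If the goal is the 6 newest conversations across the whole collection, as the question describes, one possible sketch is to unwind, sort, and limit before reshaping. This is an assumption about the intended result, not part of the answer above. The field names come from the question's schema, and the variable name is hypothetical.

```javascript
// Pipeline sketch: the 6 newest conversations across all documents.
// The output shape ({ date, messages }) follows the array the question
// asks for.
const newestConversationsPipeline = [
  { $unwind: "$conversations" },           // one document per conversation
  { $sort: { "conversations.date": -1 } }, // newest first, across documents
  { $limit: 6 },                           // keep the 6 newest overall
  {
    $project: {
      _id: 0,
      date: "$conversations.date",
      messages: "$conversations.messages"
    }
  }
];
// In the mongo shell this could be run as:
//   db.collection.aggregate(newestConversationsPipeline).toArray()
```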

MongoDB - Get highest value of child

I'm trying to get the highest value of a child value. If I have two documents like this
{
"_id" : ObjectId("5585b8359557d21f44e1d857"),
"test" : {
"number" : 1,
"number2" : 1
}
}
{
"_id" : ObjectId("5585b8569557d21f44e1d858"),
"test" : {
"number" : 2,
"number2" : 1
}
}
How would I get the highest value of key "number"?
Using dot notation:
db.testSOF.find().sort({'test.number': -1}).limit(1)
To get the highest value of the key "number" you could use two approaches here. You could use the aggregation framework where the pipeline would look like this
db.collection.aggregate([
{
"$group": {
"_id": 0,
"max_number": {
"$max": "$test.number"
}
}
}
])
Result:
/* 0 */
{
"result" : [
{
"_id" : 0,
"max_number" : 2
}
],
"ok" : 1
}
or you could use the find() cursor as follows
db.collection.find().sort({"test.number": -1}).limit(1)
max() does not work the way you would expect it to in SQL for Mongo.
This is perhaps going to change in future versions but as of now,
max,min are to be used with indexed keys primarily internally for
sharding.
see http://www.mongodb.org/display/DOCS/min+and+max+Query+Specifiers
Unfortunately for now the only way to get the max value is to sort the
collection desc on that value and take the first.
db.collection.find("_id" => x).sort({"test.number" => -1}).limit(1).first()
quoted from: Getting the highest value of a column in MongoDB