MongoDB: Get documents sorted by a dynamic ranking

I have these documents:
{ "_id" : ObjectId("52abac78f8b13c1e6d05aeed"), "score" : 125494, "updated" : ISODate("2013-12-14T00:55:20.339Z"), "url" : "http://pictwittrer.com/1crfS1t" }
{ "_id" : ObjectId("52abac86f8b13c1e6d05af0f"), "score" : 123166, "updated" : ISODate("2013-12-14T00:55:34.354Z"), "url" : "http://bit.ly/JghJ1N" }
Now I would like to get all documents sorted by this dynamic ranking:
ranking = score / (NOW - updated).abs
ranking is a float value where:
- score is the value of the score property of my document
- the denominator is the absolute difference between NOW (the time at which I execute the query) and the updated field of my document
I want to do this so that old documents are sorted last.

I'm new to MongoDB and the aggregation framework, but based on the answer Tim B gave I came up with this:
db.coll.aggregate([
    // Compute ranking = score / (now - updated); subtracting two dates yields milliseconds.
    { $project : {
        "ranking" : {
            "$divide" : ["$score", { "$subtract" : [new Date(), "$updated"] }]
        }
    } },
    // -1 sorts descending, so old documents (low ranking) come last.
    { $sort : { "ranking" : -1 } }
])
Using $project you can reshape documents to insert computed values, in your case the ranking field. After that, $sort orders the documents by ranking: specify 1 for ascending or -1 for descending. Since old documents get a low ranking, -1 puts them last.
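To make the formula concrete, here is a minimal in-memory sketch in plain JavaScript that applies the same ranking to the two documents from the question. The fixed `now` value and the `rank` helper are assumptions made for illustration only; this does not touch MongoDB and is not how the server evaluates the pipeline.

```javascript
// In-memory copies of the two documents from the question.
const docs = [
  { url: "http://pictwittrer.com/1crfS1t", score: 125494, updated: new Date("2013-12-14T00:55:20.339Z") },
  { url: "http://bit.ly/JghJ1N", score: 123166, updated: new Date("2013-12-14T00:55:34.354Z") },
];

// Assumed fixed "now" so the result is reproducible; a real query would use new Date().
const now = new Date("2013-12-14T01:00:00.000Z");

// Hypothetical helper: ranking = score / |now - updated| (difference in milliseconds),
// sorted descending so that older documents end up last.
function rank(documents, at) {
  return documents
    .map((d) => ({ url: d.url, ranking: d.score / Math.abs(at - d.updated) }))
    .sort((a, b) => b.ranking - a.ranking);
}

const ranked = rank(docs, now);
console.log(ranked.map((d) => d.url));
```

With this `now`, the second document is slightly newer, and its smaller age outweighs its lower score, so it ranks first.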

Look at the MongoDB aggregation framework: you can use a $project stage to compute the score you want and then a $sort stage to sort by that computed score.
http://docs.mongodb.org/manual/core/aggregation-pipeline/
http://docs.mongodb.org/manual/reference/command/aggregate/#dbcmd.aggregate

Related

Mongodb query using db.collection.find() method 100 times faster than using db.collection.aggregate()?

My collection testData has some 4 million documents with identical structure:
{
    "_id" : ObjectId("5932c56571f5a268cea12226"),
    "x" : 1.0,
    "text" : "w592cQzC5aAfZboMujL3knCUlIWgHqZNuUcH0yJNS9U4",
    "country" : "Albania",
    "location" : {
        "longitude" : 118.8775183,
        "latitude" : 75.4316019
    }
}
The collection is indexed on (country, location.longitude) pair.
The following two queries, which I would consider identical and which produce identical output, differ in execution time by a factor of 100:
db.testData.aggregate(
    [
        { $match : {country : "Brazil"} },
        { $sort : { "location.longitude" : 1 } },
        { $project : {"_id" : 0, "country" : 1, "location.longitude" : 1} }
    ]);
(this one produces output within about 6 seconds for the repeated query and about 120 seconds for the first-time query)
db.testData.find(
    { country : "Brazil" },
    {"_id" : 0, "country" : 1, "location.longitude" : 1}
).sort(
    {"location.longitude" : 1}
);
(this one produces output within 15 milliseconds for the repeated query and about 1 second for the first-time query).
What am I missing here? Thanks for any feedback.
The MongoDB find operation fetches documents from a collection according to filters.
MongoDB aggregation groups values from a collection and performs computations on them by running a pipeline of stages, then returns the computed result.
The find operation tends to be faster than aggregation in a case like this because aggregate wraps the work in multiple pipeline stages, with each stage's output serving as the input to the next stage, before the processed result is returned.
The find operation returns a cursor over the documents that match the filters, and the cursor is iterated to access each document.
In this case we only need the documents whose country is Brazil, sorted by longitude in ascending order, which the find operation can do easily.
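The idea that each stage's output feeds the next stage can be sketched in plain JavaScript. The sample documents and the helper names (`match`, `sortByLongitude`, `project`, `runPipeline`) are hypothetical; this only illustrates that both forms describe the same result, and it says nothing about how the server executes them or why the timings differ.

```javascript
// Hypothetical sample data shaped like the documents in the question.
const testData = [
  { x: 1, country: "Brazil", location: { longitude: 40.5, latitude: 1.0 } },
  { x: 2, country: "Albania", location: { longitude: 118.9, latitude: 75.4 } },
  { x: 3, country: "Brazil", location: { longitude: -12.3, latitude: 9.9 } },
];

// Each "stage" takes an array of documents and returns a new array.
const match = (country) => (docs) => docs.filter((d) => d.country === country);
const sortByLongitude = (docs) =>
  [...docs].sort((a, b) => a.location.longitude - b.location.longitude);
const project = (docs) =>
  docs.map((d) => ({ country: d.country, location: { longitude: d.location.longitude } }));

// Pipeline style: the output of one stage is the input of the next.
const runPipeline = (docs, stages) => stages.reduce((acc, stage) => stage(acc), docs);
const aggregateResult = runPipeline(testData, [match("Brazil"), sortByLongitude, project]);

// find style: filter, sort and projection expressed as one chained query.
const findResult = testData
  .filter((d) => d.country === "Brazil")
  .sort((a, b) => a.location.longitude - b.location.longitude)
  .map((d) => ({ country: d.country, location: { longitude: d.location.longitude } }));

console.log(JSON.stringify(aggregateResult) === JSON.stringify(findResult));
```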

MongoDB indexing and counting for subdocument

Please give me some advice on how I can resolve my issue.
I have a MongoDB collection "cards" with over 5 million documents.
Here is an example of a typical document in my collection:
{
"_id" : "300465-905543",
"products" : "00000",
"groupQuality" : {
"defQuality" : 100,
"summQuality" : 92.22
}
}
I need to count the documents with certain products and a certain quality value, so I tried something like this:
db.cards.count({"groupQuality.defQuality" : {$gt : 50, $lte : 100}})
To speed up this operation I created the index {"groupQuality.defQuality" : 1}.
That was a good decision and the count returned quickly, but whenever one more quality group is added to the embedded document "groupQuality", I must create another index for that group.
The number of new quality groups may be huge, so I don't want to build an index for every new group.
I started thinking about creating an index {groupQuality : 1} which would cover all quality groups in the embedded document. Is this possible in MongoDB?
If I can create such an index, how can I write a query that counts documents with certain products and a certain quality value and uses the index on groupQuality?
I tried the following query, but it always returns 0:
db.cards.count({ products : "00000", groupQuality : { defQuality : {$gt : 50, $lte : 100}, summQuality : {$gt : 0, $lte : 100}}})
Where is my mistake?
When you have nested fields, you need to provide the full path for every field you match instead of providing a nested document:
db.cards.count({
    "products" : "00000",
    "groupQuality.defQuality" : {$gt : 50, $lte : 100},
    "groupQuality.summQuality" : {$gt : 0, $lte : 100}
})
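As a rough illustration of what "the full path for every field" means, the sketch below resolves dotted paths against in-memory documents in plain JavaScript and applies the same range checks. The second and third documents and the helpers `getPath` and `inRange` are hypothetical, and this is not MongoDB's actual matching code.

```javascript
// The first document is from the question; the other two are made up for contrast.
const cards = [
  { _id: "300465-905543", products: "00000", groupQuality: { defQuality: 100, summQuality: 92.22 } },
  { _id: "hypothetical-1", products: "00000", groupQuality: { defQuality: 40, summQuality: 80 } },
  { _id: "hypothetical-2", products: "11111", groupQuality: { defQuality: 90, summQuality: 95 } },
];

// Hypothetical helper: walk a dotted path such as "groupQuality.defQuality".
const getPath = (doc, path) =>
  path.split(".").reduce((value, key) => (value == null ? undefined : value[key]), doc);

// Hypothetical helper mirroring { $gt: gt, $lte: lte } for numeric values.
const inRange = (value, gt, lte) => typeof value === "number" && value > gt && value <= lte;

// Each condition targets one leaf field via its full path.
const count = cards.filter(
  (d) =>
    d.products === "00000" &&
    inRange(getPath(d, "groupQuality.defQuality"), 50, 100) &&
    inRange(getPath(d, "groupQuality.summQuality"), 0, 100)
).length;

console.log(count);
```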

matching fields internally in mongodb

I have the following documents in MongoDB:
{
"_id" : ObjectId("517b88decd483543a8bdd95b"),
"studentId" : 23,
"students" : [
{
"id" : 23,
"class" : "a"
},
{
"id" : 55,
"class" : "b"
}
]
}
{
"_id" : ObjectId("517b9d05254e385a07fc4e71"),
"studentId" : 55,
"students" : [
{
"id" : 33,
"class" : "c"
}
]
}
Note: This is not the actual data, but the schema is exactly the same.
Requirement: Find the documents where studentId matches students.id (the id inside the students array) using a single query.
I have tried code like the following:
db.data.aggregate({$match:{"students.id":"$studentId"}},{$group:{_id:"$student"}});
Result: an empty array. If I replace {"students.id":"$studentId"} with {"students.id":33}, it returns the second document shown in the JSON above.
Is it possible to get the documents for this scenario using a single query?
If possible, I'd suggest that you set the condition while storing the data so that you can do a quick truth check (isInStudentsList). It would be super fast to do that type of query.
Otherwise, there is a relatively complex way of using the Aggregation framework pipeline to do what you want in a single query:
db.students.aggregate(
    {$project:
        {studentId: 1, studentIdComp: "$students.id"}},
    {$unwind: "$studentIdComp"},
    {$project : { studentId : 1,
        isStudentEqual: { $eq : [ "$studentId", "$studentIdComp" ] }}},
    {$match: {isStudentEqual: true}})
Given your input example the output would be:
{
"result" : [
{
"_id" : ObjectId("517b88decd483543a8bdd95b"),
"studentId" : 23,
"isStudentEqual" : true
}
],
"ok" : 1
}
A brief explanation of the steps:
Build a projection of the document with just studentId and a new field holding an array of just the ids (so for the first document it would contain [23, 55]).
Using that structure, $unwind. That creates a new temporary document for each array element in the studentIdComp array.
Now, take those documents and create a new projection, which keeps studentId and adds a new field called isStudentEqual that compares two fields for equality: studentId and studentIdComp. Remember that at this point each temporary document contains a single pair of those two fields.
Finally, check that the comparison value isStudentEqual is true and return those documents (which will contain the original document _id and the studentId).
If the student was in the list multiple times, you might need to group the results on studentId or _id to prevent duplicates (but I don't know that you'd need that).
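The four steps above can be traced on the sample documents with a small in-memory sketch in plain JavaScript. It mirrors the pipeline stage by stage using array methods; the `_id` values are written as plain strings here, and none of this is the server's implementation.

```javascript
// The two sample documents from the question (ObjectIds shown as plain strings).
const data = [
  { _id: "517b88decd483543a8bdd95b", studentId: 23, students: [{ id: 23, class: "a" }, { id: 55, class: "b" }] },
  { _id: "517b9d05254e385a07fc4e71", studentId: 55, students: [{ id: 33, class: "c" }] },
];

const result = data
  // Step 1 ($project): keep studentId and collect the embedded ids into studentIdComp.
  .map((d) => ({ _id: d._id, studentId: d.studentId, studentIdComp: d.students.map((s) => s.id) }))
  // Step 2 ($unwind): one temporary document per element of studentIdComp.
  .flatMap((d) => d.studentIdComp.map((id) => ({ _id: d._id, studentId: d.studentId, studentIdComp: id })))
  // Step 3 ($project with $eq): compare the two fields.
  .map((d) => ({ _id: d._id, studentId: d.studentId, isStudentEqual: d.studentId === d.studentIdComp }))
  // Step 4 ($match): keep only the documents where the comparison is true.
  .filter((d) => d.isStudentEqual);

console.log(result);
```

Only the first document survives, because 23 appears in its students array while 55 does not appear in the second document's array.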
Unfortunately, it's impossible.
To solve this problem you need a $where clause (example: "Finding embeded document in mongodb?"), but $where cannot be used within the aggregation framework.
For a known studentId value, a plain find works:
db.data.find({students: {$elemMatch: {id: 23}} , studentId: 23});

group in aggregate framework stopped working properly

I hate this kind of question, but maybe you can point me to something obvious. I'm using Mongo 2.2.2.
I have a collection (in a replica set) with 6M documents, each with a string field called username, on which I have an index. The index was non-unique, but recently I made it unique. Suddenly the following query gives me false alarms that I have duplicates.
db.users.aggregate(
{ $group : {_id : "$username", total : { $sum : 1 } } },
{ $match : { total : { $gte : 2 } } },
{ $sort : {total : -1} } );
which returns
{
"result" : [
{
"_id" : "davidbeges",
"total" : 2
},
{
"_id" : "jesusantonio",
"total" : 2
},
{
"_id" : "elesitasweet",
"total" : 2
},
{
"_id" : "theschoolofbmx",
"total" : 2
},
{
"_id" : "longflight",
"total" : 2
},
{
"_id" : "thenotoriouscma",
"total" : 2
}
],
"ok" : 1
}
I tested this query on a sample collection with a few documents and it works as expected.
Someone from 10gen responded in their JIRA:
Are there any updates on this collection? If so, I'd try adding {$sort: {username:1}} to the front of the pipeline. That will ensure that you only see each username once if it is unique.
If there are updates going on, it is possible that aggregation would see a document twice if it moves due to growth. Another possibility is that a document was deleted after being seen by the aggregation and a new one was inserted with the same username.
So sorting by username before grouping helped.
I think the answer may lie in the fact that your $group is not using an index; it's just doing a scan over the entire collection. These operators can currently use an index in the aggregation framework:
$match $sort $limit $skip
And they will work if placed before:
$project $unwind $group
However, $group by itself will not use an index. When you do your find() test I am betting you are using the index, possibly as a covered index (you can verify by looking at an explain() for that query), rather than scanning the collection. Basically my theory is that your index has no dupes, but your collection does.
Edit: This likely happens because a document is updated/moved during the aggregation operation and hence is seen twice, not because of dupes in the collection as originally thought.
If you add an operator earlier in the pipeline that can use the index but not alter the results fed into $group, then you can avoid the issue.
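Putting the suggestion together, the pipeline from the question with a leading $sort on username could look like the sketch below. It is written as a plain JavaScript array so it can be inspected on its own; the variable name `pipeline` is an assumption.

```javascript
// The question's pipeline with an index-friendly $sort prepended, as suggested above.
const pipeline = [
  { $sort: { username: 1 } }, // can use the username index and does not change what $group receives
  { $group: { _id: "$username", total: { $sum: 1 } } },
  { $match: { total: { $gte: 2 } } },
  { $sort: { total: -1 } },
];

console.log(JSON.stringify(pipeline, null, 2));
```

In the mongo shell this array would be passed as `db.users.aggregate(pipeline)`.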

Using Spring Data Mongodb, is it possible to get the max value of a field without pulling and iterating over an entire collection?

Using mongoTemplate.find(), I specify a Query with which I can call .limit() or .sort():
.limit() returns a Query object
.sort() returns a Sort object
Given this, I can say Query().limit(int).sort(), but this does not perform the desired operation, it merely sorts a limited result set.
I cannot call Query().sort().limit(int) either since .sort() returns a Sort()
So using Spring Data, how do I perform the following as shown in the mongoDB shell? Maybe there's a way to pass a raw query that I haven't found yet?
I would be ok with extending the Paging interface if need be...just doesn't seem to help any. Thanks!
> j = { order: 1 }
{ "order" : 1 }
> k = { order: 2 }
{ "order" : 2 }
> l = { order: 3 }
{ "order" : 3 }
> db.test.save(j)
> db.test.save(k)
> db.test.save(l)
> db.test.find()
{ "_id" : ObjectId("4f74d35b6f54e1f1c5850f19"), "order" : 1 }
{ "_id" : ObjectId("4f74d3606f54e1f1c5850f1a"), "order" : 2 }
{ "_id" : ObjectId("4f74d3666f54e1f1c5850f1b"), "order" : 3 }
> db.test.find().sort({ order : -1 }).limit(1)
{ "_id" : ObjectId("4f74d3666f54e1f1c5850f1b"), "order" : 3 }
You can do this in spring-data-mongodb. Mongo will optimize sort/limit combinations IF the sort field is indexed (or is the @Id field). This produces very fast O(log N) or better results. Otherwise it is still O(N) as opposed to O(N log N), because it will use a top-k algorithm and avoid the global sort (see the MongoDB sort documentation). This is adapted from Mkyong's example, but I do the sort first and then set the limit to one.
Query query = new Query();
query.with(new Sort(Sort.Direction.DESC, "idField"));
query.limit(1);
MyObject maxObject = mongoTemplate.findOne(query, MyObject.class);
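To see why a sort followed by a limit of one does not need a full O(N log N) sort, here is a minimal plain JavaScript sketch of the top-k idea for k = 1: a single pass that remembers the best document seen so far. The `maxBy` helper and the sample data are hypothetical, and this is only an illustration of the complexity argument, not MongoDB's code.

```javascript
// Hypothetical helper: one O(N) pass that keeps only the current best document.
function maxBy(docs, field) {
  let best = null;
  for (const doc of docs) {
    if (best === null || doc[field] > best[field]) {
      best = doc;
    }
  }
  return best;
}

// Sample documents shaped like the ones saved in the shell session above.
const docs = [{ order: 1 }, { order: 3 }, { order: 2 }];
const top = maxBy(docs, "order");
console.log(top);
```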
Normally, things that are done with aggregate SQL queries can be approached in (at least) three ways in NoSQL stores:
- With Map/Reduce. This effectively goes through all the records, but in a more optimized way (it works with multiple threads, and in clusters). Here's the map/reduce tutorial for MongoDB.
- Pre-calculate the max value on each insert, and store it separately. So, whenever you insert a record, you compare it to the previous max value, and if it's greater, update the max value in the db.
- Fetch everything into memory and do the calculation in code. That's the most trivial solution. It would probably work well for small data sets.
Choosing one over the other depends on how you use this max value. If it is needed rarely, for example for some corner-case reporting, you can go with map/reduce. If it is used often, then store the current max.
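The second option, keeping the max up to date on every insert, can be sketched in plain JavaScript as follows. The `MaxTracker` class is hypothetical and works in memory; in a real application the stored max would live in its own document in the database, and deletes or updates of the current max would need extra handling that this sketch leaves out.

```javascript
// Hypothetical in-memory stand-in for "a collection plus a separately stored max".
class MaxTracker {
  constructor() {
    this.records = [];
    this.max = null; // the separately stored max value
  }

  // On each insert, compare against the previous max and update it if needed.
  insert(record) {
    this.records.push(record);
    if (this.max === null || record.order > this.max) {
      this.max = record.order;
    }
  }
}

const tracker = new MaxTracker();
[{ order: 2 }, { order: 3 }, { order: 1 }].forEach((r) => tracker.insert(r));
console.log(tracker.max);
```

Reading the max is then a constant-time lookup, at the cost of one extra comparison per insert.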
As far as I am aware Mongo totally supports sort then limit: see http://www.mongodb.org/display/DOCS/Sorting+and+Natural+Order
Getting the max/min via map/reduce is going to be very slow and should be avoided at all costs.
I don't know anything about Spring Data, but I can recommend Morphia to help with queries. Otherwise a basic way with the Java driver would be:
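import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;

// Legacy Java driver API; db is assumed to be an existing com.mongodb.DB instance.
DBCollection coll = db.getCollection("...");
// Sort by "order" descending and keep only the first document, i.e. the max.
DBCursor cur = coll.find(new BasicDBObject()).sort(new BasicDBObject("order", -1))
        .limit(1);
if (cur.hasNext())
    System.out.println(cur.next());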
Use the aggregation $max operator.
As $max accumulates values across documents in the $group stage, you need a trick:
in the $group stage, use any constant as _id.
Let's take the example given on the MongoDB site:
Consider a sales collection with the following documents:
{ "_id" : 1, "item" : "abc", "price" : 10, "quantity" : 2, "date" : ISODate("2014-01-01T08:00:00Z") }
{ "_id" : 2, "item" : "jkl", "price" : 20, "quantity" : 1, "date" : ISODate("2014-02-03T09:00:00Z") }
{ "_id" : 3, "item" : "xyz", "price" : 5, "quantity" : 5, "date" : ISODate("2014-02-03T09:05:00Z") }
{ "_id" : 4, "item" : "abc", "price" : 10, "quantity" : 10, "date" : ISODate("2014-02-15T08:00:00Z") }
{ "_id" : 5, "item" : "xyz", "price" : 5, "quantity" : 10, "date" : ISODate("2014-02-15T09:05:00Z") }
To find the max price among all the items:
db.sales.aggregate(
    [
        {
            $group:
            {
                _id: "1", //** This is the trick
                maxPrice: { $max: "$price" }
            }
        }
    ]
)
Note that the value of "_id" is "1". You can use any constant.
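The effect of grouping on a constant _id can be imitated in plain JavaScript: every document falls into the same group, so the accumulator sees the whole collection. The sketch below uses the sales documents from above (dates omitted) and is only an illustration, not the server's implementation.

```javascript
// The sales documents from the example, without the date field.
const sales = [
  { _id: 1, item: "abc", price: 10, quantity: 2 },
  { _id: 2, item: "jkl", price: 20, quantity: 1 },
  { _id: 3, item: "xyz", price: 5, quantity: 5 },
  { _id: 4, item: "abc", price: 10, quantity: 10 },
  { _id: 5, item: "xyz", price: 5, quantity: 10 },
];

// Group by a constant key, so all documents share one group.
const groups = new Map();
for (const doc of sales) {
  const key = "1"; // the constant _id from the $group stage
  const current = groups.get(key);
  groups.set(key, {
    _id: key,
    maxPrice: current === undefined ? doc.price : Math.max(current.maxPrice, doc.price),
  });
}

const result = [...groups.values()];
console.log(result);
```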
Since the first answer is correct but the code is obsolete, I'm replying with a similar solution that worked for me:
Query query = new Query();
query.with(Sort.by(Sort.Direction.DESC, "field"));
query.limit(1);
Entity maxEntity = mongoTemplate.findOne(query, Entity.class);