I would like to find() documents in a collection up to the point where a condition has been met a given number of times, rather than only the documents that match the condition. For example, I want to get all persons until I reach the fifth person with black hair, not just five persons with black hair. How can I do this in MongoDB?
Thank you!
Get the start/end ids and then the documents for that range. Supposing you want the consecutive persons between first and fifth person with black hair:
var start = db.persons.find({"hair": "black"}).sort({_id:1}).limit(1).toArray()[0]._id;
var end = db.persons.find({"hair": "black"}).sort({_id:1}).skip(4).limit(1).toArray()[0]._id;
db.persons.find({"_id": {$gte: start, $lte: end }})
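The same start/end idea can be tried without a server. What follows is only a minimal sketch in plain JavaScript, using an in-memory array as a stand-in for the collection (assumed to be sorted by _id already); the helper name rangeThroughNthMatch and the sample data are made up for illustration.

```javascript
// Hypothetical helper: return every document from the first match through the
// nth match (inclusive), mirroring the start/end _id range query above.
// Assumes `docs` is already sorted by _id ascending.
function rangeThroughNthMatch(docs, predicate, n) {
  const matches = docs.filter(predicate);
  if (matches.length < n) {
    return []; // fewer than n matches: there is no well-defined end of range
  }
  const start = matches[0]._id;
  const end = matches[n - 1]._id;
  return docs.filter((d) => d._id >= start && d._id <= end);
}

// Made-up sample data.
const persons = [
  { _id: 1, hair: "brown" },
  { _id: 2, hair: "black" },
  { _id: 3, hair: "red" },
  { _id: 4, hair: "black" },
  { _id: 5, hair: "blond" },
  { _id: 6, hair: "black" },
];

// Everything between the first and the second person with black hair.
const firstTwoBlackRange = rangeThroughNthMatch(persons, (p) => p.hair === "black", 2);
console.log(firstTwoBlackRange.map((p) => p._id)); // ids 2, 3 and 4
```

The guard for "fewer than n matches" is worth carrying over to the shell version as well: there, toArray()[0] is undefined when the skip runs past the last match, and reading ._id from it throws.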
document : {
score:123
}
I have a field in the document called score (an integer). I want to use a range query such as db.collection.find({score: {$gte: 100, $lt: 200}}). I have a fixed number of these ranges (approximately 20).
Should I introduce a new field in the document that records which range the score falls in, and then query on the identifier of that range? For example:
document: {
score: 123,
scoreType: "type1"
}
So which query is better?
1. db.collection.find({score: {$gte: 100, $lt: 200}})
2. db.collection.find({scoreType: "type1"})
In either case I will have to create an index on either score or scoreType.
Which index would tend to perform better?
It depends entirely on your situation. If you are sure the number of documents in your database will always remain the same, then use scoreType.
Keep in mind that scoreType is a fixed value, so it will not help when you query over different ranges. It works for 100 to 200 if the score type was created with that range in mind, but it will not work for other ranges such as 100 to 500 (do you plan on adding a scoreType2?). If you want to keep flexibility, this is a bad idea.
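To make the trade-off concrete, here is a small sketch in plain JavaScript. It assumes fixed buckets of width 100, which the question does not state; the helper names scoreTypeFor and rangeFilterFor are hypothetical.

```javascript
// Assumption for this sketch: fixed buckets of width 100,
// i.e. [0, 100), [100, 200), [200, 300), ...
const BUCKET_WIDTH = 100;

// Hypothetical helper that derives the label stored in scoreType at write time.
function scoreTypeFor(score) {
  return "type" + Math.floor(score / BUCKET_WIDTH);
}

// The equivalent range filter for one bucket, usable with an index on score.
function rangeFilterFor(bucketIndex) {
  return {
    score: {
      $gte: bucketIndex * BUCKET_WIDTH,
      $lt: (bucketIndex + 1) * BUCKET_WIDTH,
    },
  };
}

console.log(scoreTypeFor(123));
console.log(JSON.stringify(rangeFilterFor(1)));
```

The label is frozen when the document is written, so changing the bucket boundaries later means rewriting every document, whereas the range filter can be recomputed for any boundaries at query time.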
I am using the below query on my MongoDB collection which is taking more than an hour to complete.
db.collection.find({language:"hi"}).sort({_id:-1}).skip(5000).limit(1)
I am trying to get the results in batches of 5000 to process in either ascending or descending order for documents with "hi" as the value of the language field. So I am using this query, in which I skip the already processed documents each time by incrementing the "skip" value.
The document count in this collection is just above 20 million.
An index on the field "language" is already created.
The MongoDB version I am using is 2.6.7.
Is there a more appropriate index for this query which can get the result faster?
When you want to sort descending, you should create a multi-field index which uses the field(s) you sort on as descending field(s). You do that by setting those field(s) to -1.
This index should greatly increase the performance of your sort:
db.collection.ensureIndex({ language: 1, _id: -1 });
When you also want to speed up the other case - retrieving sorted in ascending order - create a second index like this:
db.collection.ensureIndex({ language: 1, _id: 1 });
Keep in mind that when you do not sort your results, you receive them in natural order. Natural order is often insertion order, but there is no guarantee for that. There are various events which can cause the natural order to get messed up, so when you care about the order you should always sort explicitly. The only exception to this rule are capped collections which always maintain insertion order.
In order to efficiently "page" through results in the way that you want, it is better to use a "range query" and keep the last value you processed.
Your desired "sort key" here is _id, so that makes things simple:
First you want your index in the correct order, which is done with .createIndex(), the non-deprecated replacement for .ensureIndex():
db.collection.createIndex({ "language": 1, "_id": -1 })
Then you want to do some simple processing, from the start:
var lastId = null;
var cursor = db.collection.find({language:"hi"});
cursor.sort({_id:-1}).limit(5000).forEach(function(doc) {
// do something with your document. But always set the next line
lastId = doc._id;
})
That's the first batch. Now when you move on to the next one:
var cursor = db.collection.find({ "language":"hi", "_id": { "$lt": lastId } });
cursor.sort({_id:-1}).limit(5000).forEach(function(doc) {
// do something with your document. But always set the next line
lastId = doc._id;
})
So that the lastId value is always considered when making the selection. You store this between each batch, and continue on from the last one.
That is much more efficient than processing with .skip(), which, regardless of the index, still needs to "skip" through all the data up to the skip point.
Using the $lt operator here "filters" all the results you already processed, so you can move along much more quickly.
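The two snippets above cover the first batch and one following batch; the full loop can be sketched as below. This is plain JavaScript over an in-memory array so that it runs without a server. fetchBatch is a hypothetical stand-in for the real query, and the sample data is made up.

```javascript
// In-memory stand-in for:
//   db.collection.find({language: lang, _id: {$lt: lastId}}).sort({_id: -1}).limit(n)
// Hypothetical helper; a real run would issue that query instead.
function fetchBatch(docs, lang, lastId, batchSize) {
  return docs
    .filter((d) => d.language === lang && (lastId === null || d._id < lastId))
    .sort((a, b) => b._id - a._id)
    .slice(0, batchSize);
}

// Process every matching document in descending _id order, one batch at a time.
function processAll(docs, lang, batchSize, handle) {
  let lastId = null;
  for (;;) {
    const batch = fetchBatch(docs, lang, lastId, batchSize);
    if (batch.length === 0) break; // nothing left to process
    for (const doc of batch) {
      handle(doc);
      lastId = doc._id; // remember where this batch ended
    }
  }
}

// Made-up data: ten documents with alternating languages.
const sampleDocs = [];
for (let i = 1; i <= 10; i++) {
  sampleDocs.push({ _id: i, language: i % 2 === 0 ? "hi" : "en" });
}

const seenIds = [];
processAll(sampleDocs, "hi", 2, (doc) => seenIds.push(doc._id));
console.log(seenIds); // ids 10, 8, 6, 4, 2
```

Because lastId is the only state carried between batches, it can be persisted and the job resumed later from the same point.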
I have a mongo DB collection that looks something like this:
{
{
_id: objectId('aabbccddeeff'),
objectName: 'MyFirstObject',
objectLength: 0xDEADBEEF,
objectSource: 'Source1',
accessCounter: {
'firstLocationCode' : 283,
'secondLocationCode' : 543,
'ThirdLocationCode' : 564,
'FourthLocationCode' : 12,
}
}
...
}
Now, assuming that this is not the only record in the collection and that most or all of the documents contain the accessCounter subdocument, how would I go about selecting the first x documents that have the most accesses from a specific location?
A sample "query" will be something like:
"Select the first 10 documents From myCollection where the accessCounter.firstLocationCode are the highest"
So a sample result will be the X documents where the chosen accessCounter value is the greatest in the database.
Thank you for taking the time to read my question.
No need for an aggregation, that is a basic query:
db.collection.find().sort({"accessCounter.firstLocationCode":-1}).limit(10)
In order to speed this up, you should first create an index on the embedded field you sort on:
db.collection.ensureIndex({'accessCounter.firstLocationCode':-1})
Note that an index on accessCounter as a whole only supports matches on the entire subdocument, not sorts or queries on its individual fields. If you want to run the same query for all locations, you need one index per location field.
You can speed this up further, in case you only need the accessCounter value, by making this a so-called covered query: a query whose returned values come from the index itself. For example, when accessCounter.secondLocationCode is indexed and you query for the top secondLocationCode values, you should be able to do a covered query (on server versions that can cover fields inside embedded documents) with:
db.collection.find({},{_id:0,"accessCounter.secondLocationCode":1}).sort({"accessCounter.secondLocationCode":-1}).limit(10)
which translates to "Get all documents ('{}'), don't return the _id field as you do by default ('_id:0'), get only the 'accessCounter.secondLocationCode' field ('accessCounter.secondLocationCode:1'). Sort the returned values in descending order and give me the first ten."
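For experimenting with the expected result without a server, the same "top N by one counter" selection can be sketched in plain JavaScript on an in-memory array. The helper name topByLocation and the sample documents are made up, and treating a missing counter as 0 is an assumption of this sketch.

```javascript
// Hypothetical helper: top `n` documents by accessCounter[locationKey], descending.
// Documents without that counter are treated as 0 (an assumption of this sketch).
function topByLocation(docs, locationKey, n) {
  const count = (d) => (d.accessCounter && d.accessCounter[locationKey]) || 0;
  // Copy before sorting so the caller's array is left untouched.
  return [...docs].sort((a, b) => count(b) - count(a)).slice(0, n);
}

// Made-up sample documents.
const objects = [
  { objectName: "A", accessCounter: { firstLocationCode: 283 } },
  { objectName: "B", accessCounter: { firstLocationCode: 900 } },
  { objectName: "C", accessCounter: { secondLocationCode: 5 } },
  { objectName: "D", accessCounter: { firstLocationCode: 12 } },
];

const topTwo = topByLocation(objects, "firstLocationCode", 2);
console.log(topTwo.map((d) => d.objectName)); // "B" then "A"
```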
Ranged pagination is cut and dry when you're paginating based on single unique fields, but how does it work, if at all, in situations with non-unique fields, perhaps several of them at a time?
TL;DR: Is it reasonable or possible to paginate and sort an "advanced search" type query using range-based pagination? This means querying on, and sorting on, user-selected, perhaps non-unique fields.
For example say I wanted to paginate a search for played word docs in a word game. Let's say each doc has a score and a word and I'd like to let users filter and sort on those fields. Neither field is unique. Assume a sorted index on the fields in question.
Starting simple, say the user wants to see all words with a score of 10:
// page 1
db.words.find({score: 10}).limit(pp)
// page 2, all words with the score, ranged on a unique _id, easy enough!
db.words.find({score: 10, _id: {$gt: last_id}}).limit(pp)
But what if the user wanted to get all words with a score less than 10?
// page 1
db.words.find({score: {$lt: 10}}).limit(pp)
// page 2, getting ugly...
db.words.find({
// OR because we need everything lt the last score, but also docs with
// the *same* score as the last score we haven't seen yet
$or: [
{score: last_score, _id: {$gt: last_id}},
{score: {$lt: last_score}}
]
}).limit(pp)
Now what if the user wanted words with a score less than 10, and an alphabetic value greater than "FOO"? The query quickly escalates in complexity, and this is for just one variation of the search form with the default sort.
// page 1
db.words.find({score: {$lt: 10}, word: {$gt: "FOO"}}).limit(pp)
// page 2, officially ugly.
db.words.find({
$or: [
// triple OR because now we need docs that have the *same* score but a
// higher word OR those have the *same* word but a lower score, plus
// the rest
{score: last_score, word: {$gt: last_word}, _id: {$gt: last_id}},
{word: last_word, score: {$lt: last_score}, _id: {$gt: last_id}},
{score: {$lt: last_score}, word: {$gt: last_word}}
]
}).limit(pp)
I suppose writing a query builder for this sort of pattern would be doable, but it seems terribly messy and error prone. I'm leaning toward falling back to skip pagination with a capped result size, but I'd like to use ranged pagination if possible. Am I completely wrong in my thinking of how this would have to work? Is there a better way?
Edit: For the record...
With no viable alternatives thus far I'm actually just using skip based pagination with a limited result set, keeping the skip manageable. For my purposes this is actually sufficient, as there's no real need to search then paginate into the thousands.
You can get ranged pagination by sorting on a unique field and saving the value of that field for the last result. For example:
// first page
var page = db.words.find({
score:{$lt:10},
word:{$gt:"FOO"}
}).sort({"_id":1}).limit(pp);
// Get the _id from the last result
var page_results = page.toArray();
var last_id = page_results[page_results.length-1]._id;
// Use last_id to get your next page
var next_page = db.words.find({
score:{$lt:10},
word:{$gt:"FOO"},
_id:{$gt:last_id}
}).sort({"_id":1}).limit(pp);
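If the pages must be ordered by a non-unique field such as score rather than by _id, the same "remember the last result" idea still applies, provided a unique field like _id is the final sort key. The sketch below builds the "everything after the last document" filter for any such sort. It is plain JavaScript that only constructs the filter object; buildAfterFilter is a hypothetical name, not part of MongoDB, and the sample values are made up.

```javascript
// Hypothetical builder: given the sort keys (ending in a unique key such as _id)
// and the last document of the previous page, produce the $or filter that
// selects everything strictly after that document in the sort order.
function buildAfterFilter(sortSpec, lastDoc) {
  const keys = Object.keys(sortSpec); // relies on insertion order of the keys
  const clauses = [];
  for (let i = 0; i < keys.length; i++) {
    const clause = {};
    // Equal on all earlier sort keys...
    for (let j = 0; j < i; j++) {
      clause[keys[j]] = lastDoc[keys[j]];
    }
    // ...and strictly past the last value on this key.
    const op = sortSpec[keys[i]] === 1 ? "$gt" : "$lt";
    clause[keys[i]] = { [op]: lastDoc[keys[i]] };
    clauses.push(clause);
  }
  return { $or: clauses };
}

// Made-up example: sort by score descending, then _id ascending.
const afterFilter = buildAfterFilter({ score: -1, _id: 1 }, { score: 7, _id: 42 });
console.log(JSON.stringify(afterFilter));
```

In the shell, the result would be combined with the user's own conditions and the same sort, for example db.words.find({$and: [userFilter, afterFilter]}).sort({score: -1, _id: 1}).limit(pp), where userFilter stands for whatever the search form produced.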
How would I create a query to get both the current player's rank and the ranks of the surrounding players? For example, suppose I had a leaderboard collection with name and points:
{name: 'John', pts: 123}
If John was in 23rd place, I would want to show the names of users in the 22nd and 24th place as well.
I could query for a count of leader board items with pts greater than 123 to get John's rank, but how can I efficiently get the one player that is ranked just above and below the current player? Can I get items based on index position alone?
I suppose I can make two queries, first to get the rank position of a user and then a skip/limit query, but that seems inefficient and doesn't seem to make efficient use of the index:
db.leaderboards.find({pts:{$gt:123}}).count();
-> 23
db.leaderboards.find().skip(21).limit(3)
The last query seems to scan across 24 records using its index. Is there a way I can reasonably do this with a range query or something more efficient? I can see this becoming an issue if the user is very low ranked, like 50,000th place.
You'll need to do three queries:
var john = db.players.findOne({name: 'John'})
var next_player = db.players.find(
{_id: {$ne: john._id}, pts: {$gte: john.pts}}).sort({pts:1,name:1}).limit(-1)[0]
var previous_player = db.players.find(
{_id: {$ne: john._id}, pts: {$lte: john.pts}}).sort({pts:-1,name:-1}).limit(-1)[0]
Create indexes on name and pts.
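The logic of those three queries, plus the rank count from the question, can be checked on an in-memory array. This is only a sketch in plain JavaScript; the helper name neighbours and the sample players are made up, and ties on pts are broken by name to match the sort keys used above.

```javascript
// In-memory stand-in for the three queries above (hypothetical helper).
function neighbours(players, name) {
  const me = players.find((p) => p.name === name);
  if (!me) return null;
  const others = players.filter((p) => p !== me);
  // Closest player at or above: lowest pts that is >= mine.
  const above = others
    .filter((p) => p.pts >= me.pts)
    .sort((a, b) => a.pts - b.pts || a.name.localeCompare(b.name))[0] || null;
  // Closest player at or below: highest pts that is <= mine.
  const below = others
    .filter((p) => p.pts <= me.pts)
    .sort((a, b) => b.pts - a.pts || b.name.localeCompare(a.name))[0] || null;
  // Rank is the number of players with strictly more points, plus one.
  const rank = players.filter((p) => p.pts > me.pts).length + 1;
  return { rank, above, below };
}

// Made-up leaderboard.
const leaderboard = [
  { name: "Ann", pts: 200 },
  { name: "Bob", pts: 150 },
  { name: "John", pts: 123 },
  { name: "Cy", pts: 100 },
  { name: "Di", pts: 90 },
];

const johnView = neighbours(leaderboard, "John");
console.log(johnView.rank, johnView.above.name, johnView.below.name);
```

One caveat that applies to the queries as well as to this sketch: because both filters include equal pts, a player tied with John can come back as both the player above and the player below.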
A. Jesse Jiryu Davis's answer is OK, but I think there is a better option in which a geo/2d index could be used.
You could start by creating a 2d index on the pts field, and then query for the N documents nearest to a given point or score. For example, if you want to fetch 10 documents with points near a score of, let's say, 123, you can do this:
db.players.find( { pts: { $near: [ 123, 123 ] } } ).limit(10)
Points probably need to be normalised to fit the 2d index coordinates, but this should work.