Get records from db within a limit range - mongodb

Assume that my database returns 1000 records for a query that I have.
What I wish to do is, using the same query, get the first 100 records, then the next 100, and so on until I have all 1000.
That is, I do not want all 1000 records in one go. I need them in batches of 100.
So something like this perhaps:
query = {
    '$from': 0,
    '$to': 100
}
with the first request followed by
query = {
    '$from': 100,
    '$to': 200
}
for the next request and so on.
I don't want all 1000 results at once. I wish to be able to specify the start and end counts so that I get the result in batches - is this possible in mongodb?

You could use skip and limit for your queries.
For example:
db.myCollection.find().limit(100) //Get the first 100 records
db.myCollection.find().skip(100).limit(100) //Get the next 100 records
Skip becomes expensive for large offsets, though; with only 1000 records I would much rather get all of them in one query and "separate" them into batches client-side.
Here are links to both methods' docs:
http://docs.mongodb.org/manual/reference/method/cursor.skip/
http://docs.mongodb.org/manual/reference/method/cursor.limit/
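Put together, a paging loop over those two methods might look like this minimal shell sketch; the collection name myCollection and the page size of 100 are just illustrative, and the second variant pages on _id instead, which avoids the growing cost of large skips.
var pageSize = 100;

// Variant 1: skip/limit paging (simple, but skip re-scans the offset each time).
for (var page = 0; ; page++) {
    var batch = db.myCollection.find()
                  .sort({_id: 1})
                  .skip(page * pageSize)
                  .limit(pageSize)
                  .toArray();
    if (batch.length === 0) break;
    // ...process this batch of up to 100 records...
}

// Variant 2: "seek" paging on _id, which avoids large skips entirely.
var lastId = null;
while (true) {
    var filter = (lastId === null) ? {} : {_id: {$gt: lastId}};
    var batch = db.myCollection.find(filter)
                  .sort({_id: 1})
                  .limit(pageSize)
                  .toArray();
    if (batch.length === 0) break;
    // ...process this batch...
    lastId = batch[batch.length - 1]._id;
}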

Related

Couchbase N1QL Query getting distinct on the basis of particular fields

I have a document structure which looks something like this:
{
    ...
    "groupedFieldKey": "groupedFieldVal",
    "otherFieldKey": "otherFieldVal",
    "filterFieldKey": "filterFieldVal"
    ...
}
I am trying to fetch all documents which are unique with respect to groupedFieldKey. I also want to fetch otherFieldKey from ANY of these documents. This otherFieldKey has minor changes from one document to another, but I am comfortable with getting ANY of these values.
SELECT DISTINCT groupedFieldKey, otherFieldKey
FROM bucket
WHERE filterFieldKey = "filterFieldVal";
This query fetches all the documents because of the minor variations.
SELECT groupedFieldKey, maxOtherFieldKey
FROM bucket
WHERE filterFieldKey = "filterFieldVal"
GROUP BY groupedFieldKey
LETTING maxOtherFieldKey = MAX(otherFieldKey);
This query works as expected, but it is taking a long time due to the GROUP BY step. As this query is used to show products in the UI, this is not desirable behaviour. I have tried applying indexes, but that has not produced fast results.
Actual details of the records:
Number of records = 100,000
Size per record = Approx 10 KB
Time taken to load the first 10 records: 3s
Is there a better way to do this? A way of getting DISTINCT only on particular fields would be good.
EDIT 1:
You can follow the discussion thread in the Couchbase forum: https://forums.couchbase.com/t/getting-distinct-on-the-basis-of-a-field-with-other-fields/26458
GROUP BY must materialize all the matching documents. You can try a covering index:
CREATE INDEX ix1 ON bucket(filterFieldKey, groupedFieldKey, otherFieldKey);

Why does mongo db (version 3.0.6) return the wrong number of records when we use count with the limit option?

As per the mongo db docs, we can use count with limit.
The limit option specifies the maximum number of documents the cursor will return. But if we use limit with count, it returns the total count and not the limited count.
Why?
Suppose we have 50 records in a collection; count alone returns 50, and if we apply limit(10) then count should return 10, not 50. But count with limit still returns 50.
db.collection.find(<query>).count();
You will get the count of all records matching the query, i.e. count = 50.
db.collection.find(<query>).limit(10).count(true);
You will get the count of the limited documents, i.e. count = 10.
You should set applySkipLimit to true.
http://docs.mongodb.org/manual/reference/method/cursor.count/
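To make the difference concrete, here is a minimal shell sketch; the filter is a placeholder for your own query, and the last line uses the collection-level count() helper, which takes the limit as an option.
var query = { /* your filter */ };
db.collection.find(query).count();                // 50 - the limit is ignored by default
db.collection.find(query).limit(10).count();      // still 50
db.collection.find(query).limit(10).count(true);  // 10 - applySkipLimit set to true
db.collection.count(query, {limit: 10});          // 10 - same idea via the collection helper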

Get top 50 records for a certain value w/ mongo and meteor

In my meteor project, I have a leaderboard of sorts, where it shows players of every level on a chart, spread across every level in the game. For simplicity's sake, let's say there are levels 1-100. Currently, to avoid overloading meteor, I just tell the server to send me every record newer than two weeks old, but that's not sufficient for an accurate leaderboard.
What I'm trying to do is show 50 records representing each level. So, if there are 100 records at level 1, 85 at level 2, 65 at level 3, and 45 at level 4, I want to show the latest 50 records from each level, making it so I would have [50, 50, 50, 45] records, respectively.
The data looks something like this:
{
    snapshotDate: new Date(),
    level: 1,
    hp: 5000,
    str: 100
}
I think this requires some mongodb aggregation, but I couldn't quite figure out how to do this in one query. It would be trivial to do it in two, though - select all records, group by level, sort each level by date, then take the last 50 records from each level. However, I would prefer to do it in one operation, if I could. Is it currently possible to do something like this?
Currently there is no way to pick the top n records of a group in the aggregation pipeline. There is an unresolved open ticket regarding this: https://jira.mongodb.org/browse/SERVER-9377.
There are two solutions to this:
1. Keep your document structure as it is now and aggregate, but grab the top n records and slice off the remaining records for each group on the client side.
Code:
var top_records = [];
db.collection.aggregate([
    // The sort needs to come before the $group, because once the
    // records are grouped by level there is only one document per group.
    {$sort: {"snapshotDate": -1}},
    // Maintain all the records of each level in an array, in sorted order.
    {$group: {"_id": "$level", "recs": {$push: "$$ROOT"}}},
], {allowDiskUse: true}).forEach(function(level) {
    level.recs.splice(50);   // Keep only the top 50 records.
    top_records.push(level);
});
Remember that this loads all the documents for each level and removes the unwanted records on the client side.
2. Alter your document structure to accomplish what you really need. If you always need only the top n records, keep them in sorted order in the root document. This is accomplished using a sorted, capped array.
Your document would look like this:
{
    level: 1,
    records: [{snapshotDate: 2, hp: 5000, str: 100},
              {snapshotDate: 1, hp: 5001, str: 101}]
}
where records is a capped array of size n whose sub-documents are always sorted in descending order of snapshotDate.
To make the records array behave that way, we always use an update operation whenever we need to insert documents into it for any level.
db.collection.update({"level": 1},
    {$push: {
        records: {
            $each: [{snapshotDate: 1, hp: 5000, str: 100},
                    {snapshotDate: 2, hp: 5001, str: 101}],
            $sort: {"snapshotDate": -1},
            $slice: 50   // Always trim the array size to 50.
        }
    }}, {upsert: true})
What this does is keep the size of the records array at 50 and keep the records sorted whenever new sub-documents are inserted for a level.
A simple find, db.collection.find({"level":{$in:[1,2,..]}}), would give you the top 50 records in order, for each selected level.
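If a page ever needs fewer than the stored 50 entries per level, a $slice projection can trim the array at read time; a small sketch following the structure above (the levels and the 10 are illustrative):
db.collection.find(
    {level: {$in: [1, 2, 3]}},
    {records: {$slice: 10}}   // return only the first (newest) 10 array elements
)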

How to find the row number of a row in a sorted MongoDB collection to calculate its percentile?

I have a large MongoDB collection that contains a userID and a counter representing total hits for that user over time. I'd like to be able to calculate a given user's percentile.
Conceptually, what I'd like to do is sort the collection and then get the row number for that given user's record and divide that number by the total count for the collection:
percentile = row_index / total_rows;
How would this be accomplished in MongoDB?
Get the total count with db.yourCollection.count().
Then count the records that have a larger or equal value, using something like
db.yourCollection.find({hits: {$gte: value}}).count()
(assuming the counter field is called hits).
If the total count is 1000 and the count of larger-or-equal records is 950, then you are at 950/1000, i.e. in the top 95%.
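A minimal shell sketch of that calculation, assuming the counter field is called hits and users are looked up by a userID field (adjust the names to your schema):
var userHits   = db.yourCollection.findOne({userID: "user123"}).hits;
var totalRows  = db.yourCollection.count();
var atOrAbove  = db.yourCollection.find({hits: {$gte: userHits}}).count();
print("top " + (100 * atOrAbove / totalRows) + "%");   // e.g. 950 / 1000 -> top 95%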
But if you read from your collection often and write to it rarely, I'd suggest building a new temporary collection with MapReduce that holds records of the form {_id: ..., percent: ...}.
The trivial solution here is to sort by total hits descending. You then cursor through the results until you find your UserID.
Clearly, this solution does not provide great performance if you have to run it a lot. It's easy to get a "top 20", but it's far more computation to get a "bottom 25%".
If this query is really important or you're running it a lot, there are a couple of workarounds.
I think the easiest one is simply to run a job that builds the percentiles for you on a regular basis. Basically you build a collection that looks like this:
{ percent : 95, score : 888888 }
{ percent : 90, score : 777777 }
...
To get a user's percentile, you just look up their score in that relatively small collection. To update those scores, simply run a job on a regular basis that loops through all of the users.
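A sketch of that lookup, assuming the precomputed collection is called percentiles, that percent reads as "better than percent% of users", and that the user document carries a hits counter (all of these names are assumptions):
var userScore = db.users.findOne({userID: "user123"}).hits;
var bracket   = db.percentiles.find({score: {$lte: userScore}})
                              .sort({percent: -1})
                              .limit(1)
                              .toArray()[0];
print(bracket ? "better than " + bracket.percent + "% of users" : "below the lowest bracket");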

Perl mongodb $collection->find: How many roundtrips to mongodb while fetching?

If my collection has 10 records.
my $records = $collection->find;
while (my $record = $records->next) {
    # do something
}
Are there ten roundtrips to the mongodb server?
If so, is there any way to limit it to one roundtrip?
Thanks.
The answer is that it's one query per batch; documents are returned in groups of 100 by default.
If your result set is 250 docs, the first access of the cursor (to get doc 1) loads docs 1-100 into memory; when doc 101 is accessed, another 100 docs are loaded from the server; and finally one more query fetches the last 50 docs.
See the mongodb docs on cursors and the "getmore" command.
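If you want to see or tune how many documents come back per roundtrip, the batch size is adjustable; a minimal mongo shell sketch (the collection name is illustrative, and most drivers expose a similar option):
var cursor = db.myCollection.find().batchSize(10);
while (cursor.hasNext()) {
    printjson(cursor.next());   // the server sends documents 10 at a time via getmore
}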
It's a single query, just like querying an RDBMS.
As per the documentation:
my $cursor = $collection->find({ i => { '$gt' => 42 } });
Executes the given $query and returns a MongoDB::Cursor with the results
my $cursor = $collection->query({ }, { limit => 10, skip => 10 });
Valid query attributes are:
limit - Limit the number of results.
skip - Skip a number of results.
sort_by - Order results.
No, I am absolutely sure that the code above makes only one roundtrip to the server. For example, in C# the same code loads all the data only once, when you start iterating.
while (my $record = $records->next){
^^^ here, on the first iteration, the driver loads all 10 records
It seems logical to me to have only one request to the server.
From the documentation: "The shell find() method returns a cursor object which we can then iterate to retrieve specific documents from the result."
You can use the "mongosniff" tool to figure out the operations over the wire. Apart from that, you basically have no option other than iterating over the cursor... so why do you care?