Implementation of limit in MongoDB

My collection is named trial and its data size is 112 MB.
My query is:
db.trial.find()
and I have added a limit of 10:
db.trial.find.limit(10)
But the limit is not working; the entire query runs.

Replace
db.trial.find.limit(10)
with
db.trial.find().limit(10)
You also mention that the entire collection is being queried. Run this:
db.trial.find().limit(10).explain()
It will tell you how many documents the query looked at before stopping (nscanned; newer server versions report this as totalDocsExamined when you call explain("executionStats")). You will see that the value is 10.
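Why the missing parentheses matter can be shown in plain JavaScript, which is what the mongo shell evaluates. This is only a sketch: fakeCollection is a hypothetical stand-in, not the real driver or shell object. It shows that find without parentheses is just a function reference with no limit method, while find() returns a cursor-like object that has one.

```javascript
// Hypothetical stand-in for a collection; NOT the real MongoDB shell object.
const fakeCollection = {
  docs: Array.from({ length: 100 }, (_, i) => ({ _id: i })),
  find() {
    const docs = this.docs;
    // A minimal cursor-like object that supports limit() and toArray().
    return {
      limit(n) {
        return { toArray: () => docs.slice(0, n) };
      },
      toArray: () => docs.slice(),
    };
  },
};

// Without parentheses, find is only a function reference: it has no limit().
const missingMethod = typeof fakeCollection.find.limit;

let errorName = null;
try {
  fakeCollection.find.limit(10);
} catch (err) {
  errorName = err.name; // calling an undefined property throws
}

// With parentheses, find() returns the cursor, and limit() applies to it.
const limited = fakeCollection.find().limit(10).toArray();

console.log(missingMethod, errorName, limited.length);
```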

The .limit() modifier on its own only limits the results of the query that is processed, so it works as designed to limit the results returned. In its raw form, with no query conditions, the number of documents scanned should simply equal the limit you asked for:
db.trial.find().limit(10)
If your intent is to examine only a set number of documents, you can change this with the $maxScan modifier (note that $maxScan was deprecated in MongoDB 4.0 and is not available in current releases):
db.trial.find({})._addSpecial( "$maxScan" , 11 )
This causes the query engine to give up after the set number of documents has been scanned. That only really matters when there is something meaningful in the query.
If you are actually trying to do paging, you are better off using range queries with $gt, $lt and their cousins to change the range that your query selects.
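The range-based paging idea can be sketched in plain JavaScript over an in-memory array. Everything here is made up for illustration (the docs array, the nextPage helper, and the assumption that _id values are sortable numbers greater than 0); in the shell the same idea would roughly be a find on { _id: { $gt: lastId } } with a sort on _id and a limit.

```javascript
// In-memory stand-in for a collection whose _id values are sortable numbers.
const docs = Array.from({ length: 25 }, (_, i) => ({ _id: i + 1 }));

// Hypothetical helper: return one page of documents whose _id is greater
// than lastId, in ascending _id order.
function nextPage(allDocs, lastId, pageSize) {
  return allDocs
    .filter((d) => d._id > lastId)
    .sort((a, b) => a._id - b._id)
    .slice(0, pageSize);
}

const pages = [];
let lastId = 0; // assumption: every real _id is greater than 0
for (;;) {
  const page = nextPage(docs, lastId, 10);
  if (page.length === 0) break;
  pages.push(page.map((d) => d._id));
  lastId = page[page.length - 1]._id; // remember where this page ended
}

console.log(pages.map((p) => p.length));
```

Each page starts from the last _id already seen, so no documents have to be skipped over to reach a later page.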

Related

How does the limit() option work in mongodb?

Let's say you have a collection of 10,000 documents and I make a find query with the option limit(50). How will MongoDB choose which 50 documents to return?
Will it auto-sort them (maybe by their creation date) or not?
Will the query return the same documents every time it is called? How does the limit option work in MongoDB?
Does MongoDB limit the documents after they are returned or as it queries them? That is, will MongoDB query all documents and then limit the results to 50 documents, or will it query only 50 documents?
The first 50 documents of the result set will be returned.
If you do not sort the documents (or if the order is not well-defined, such as sorting by a field with values that occur multiple times in the result set), the order may change from one execution to the next.
Will it auto-sort them (maybe by their creation date) or not?
No.
Will the query return the same documents every time it is called?
The query may produce the same results for a while and then start producing different results if, for example, another document is inserted into the collection.
That is, will MongoDB query all documents and then limit the results to 50 documents, or will it query only 50 documents?
Depends on the query. If an index is used, only the needed documents will be read from the storage engine. If a sort stage is used in the query execution, all matching documents will be read from storage and sorted, then the required number will be returned and the rest discarded.
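The difference between the two cases can be modelled in a few lines of plain JavaScript. This is a simplified sketch, not MongoDB's actual execution engine; the stored array and both helper functions are invented for illustration. It counts how many documents each strategy has to read to return 3 results.

```javascript
// Collection in arbitrary (storage) order; the values are made up.
const stored = [7, 3, 9, 1, 8, 2, 6, 4, 5, 0].map((v, i) => ({ _id: i, v }));

// No sort: stop reading as soon as `limit` documents have been collected.
function limitOnly(docs, limit) {
  const out = [];
  let examined = 0;
  for (const doc of docs) {
    if (out.length === limit) break;
    examined += 1;
    out.push(doc);
  }
  return { out, examined };
}

// In-memory sort: every document must be read before the first one
// can be returned, even though only `limit` of them are kept.
function sortThenLimit(docs, limit) {
  const examined = docs.length;
  const out = [...docs].sort((a, b) => a.v - b.v).slice(0, limit);
  return { out, examined };
}

const unsorted = limitOnly(stored, 3);
const sorted = sortThenLimit(stored, 3);

console.log(unsorted.examined, sorted.examined);
```

The unsorted result is simply whatever comes first in storage order, which is why it is not guaranteed to stay the same between runs.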

Why does MongoDB find have the same performance as count?

I am running tests against my MongoDB and for some reason find has the same performance as count.
Stats:
orders collection size: ~20M,
orders with product_id 6: ~5K
product_id is indexed for improved performance.
Query: db.orders.find({product_id: 6}) vs db.orders.find({product_id: 6}).count()
Result: the orders for the product vs 5K, after 0.08ms
Why isn't count dramatically faster? It could find the positions of the first and last elements with the product_id index.
As the MongoDB documentation for count states, calling count is the same as calling find, but instead of returning the docs, it just counts them. To perform this count, it iterates over the cursor. It can't just read the index and determine the number of documents from the first and last values of some ID, especially since the index can be on a field other than the ID (and MongoDB IDs are not auto-incrementing). So find and count are basically the same operation, but instead of returning the documents, count goes over them, sums their number, and returns it to you.
Also, if you want a faster result, you could use estimatedDocumentCount, which goes straight to the collection's metadata. The cost is that you lose the ability to ask "How many documents can I expect if I run this query?", because it does not take a query filter. If you need a faster count of the documents matching a query, you could use countDocuments, which is a wrapper around an aggregate query. From my knowledge of MongoDB, that aggregation looks like the fastest way to count query results without calling count. I guess this should be the preferred way, performance-wise, to count documents from now on (it was introduced in version 4.0.3).

MongoDB query: Using Limit together with $near skips few documents

I am currently developing an app that gets a specific number of documents from a collection if their location coordinates fall within a certain distance. I am using an active record library for CodeIgniter, and the query that is generated is as follows:
db.updates.find({locs: { $near: [72.844102008984, 19.130207090604 ], $maxDistance: 5000 }, posted_on : { $lt :1398425538.1942 },}).sort( { posted_on: -1 } ).limit(10).toArray()
The problem I am facing is that the above query skips a few documents that should actually be returned. If I remove limit(10) from the query, the proper documents are returned.
I am not sure, but does using limit() in MongoDB omit some results? Or does it limit the results to only the closest (nearest) documents?
P.S. The documents skipped when using the limit are not always the same; the results appear random.
I suspect you are running into problems with the special nature of the $near query. $near performs both a limit() and a sort() on the cursor that returns the results:
Specifies a point for which a geospatial query returns the closest documents first. The query sorts the documents from nearest to farthest.
By default, queries that use a 2d index return a limit of 100 documents; however you may use limit() to change the number of results.
http://docs.mongodb.org/manual/reference/operator/query/near/
While the documentation does specifically discuss overriding the limit of 100 with your own limit call:
You can further limit the number of results using cursor.limit().
it is silent on adding your own sort(), or on both sorting and overriding the limit at the same time. I suspect you are running into side effects of doing both. Note that it's not incorrect to do both; it just may not produce the results you are looking for. I'd suggest trying the same query using $geoWithin:
http://docs.mongodb.org/manual/reference/operator/query/geoWithin/
$geoWithin does not apply a sort or a limit on the results, so it gives you something of a more raw result set.
Do you have any identical posted_on dates in the system? I recommend sorting by a second key, perhaps _id. If the sort order is non-deterministic, the system may skip documents in a non-deterministic manner. Adding the _id field to your sort order is generally not that expensive if you have an index on the other fields as they will already be very close to the correct order and _id is part of all indexes. ("By default, all collections have an index on the _id field, and applications and users may add additional indexes to support important queries and operations." http://docs.mongodb.org/manual/core/index-single/ )
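To make the tie-breaker suggestion concrete, here is a small plain JavaScript sketch with made-up documents, two of which share the same posted_on value. The comparator is only an illustration of the ordering that a sort on { posted_on: -1, _id: 1 } would ask for; it is not how the server sorts internally.

```javascript
// Made-up documents; "b" and "c" share the same posted_on value.
const updates = [
  { _id: "c", posted_on: 100 },
  { _id: "a", posted_on: 200 },
  { _id: "b", posted_on: 100 },
  { _id: "d", posted_on: 300 },
];

// Newest first; ties on posted_on are broken by _id, so the order is total.
function byPostedOnThenId(x, y) {
  if (x.posted_on !== y.posted_on) return y.posted_on - x.posted_on;
  return x._id < y._id ? -1 : x._id > y._id ? 1 : 0;
}

// The result is the same regardless of the order the documents arrive in.
const order1 = [...updates].sort(byPostedOnThenId).map((d) => d._id);
const order2 = [...updates].reverse().sort(byPostedOnThenId).map((d) => d._id);

console.log(order1, order2);
```

With a total order like this, a limit always cuts the list at the same place, so the same documents are returned on every run.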

Is it faster to use with_limit_and_skip=True when counting query results in pymongo

I'm doing a query where all I want to know is whether there is at least one row in the collection that matches the query, so I pass limit=1 to find(). All I care about is whether the count() of the returned cursor is > 0. Would it be faster to use count(with_limit_and_skip=True) or just count()? Intuitively it seems to me that I should pass with_limit_and_skip=True, because if there are a whole bunch of matching records, the count could stop at my limit of 1.
Maybe this merits an explanation of how limits and skips work under the covers in mongodb/pymongo.
Thanks!
Your intuition is correct. That's the whole point of the with_limit_and_skip flag.
With with_limit_and_skip=False, count() has to count all the matching documents, even if you use limit=1, which is pretty much guaranteed to be slower.
From the docs:
Returns the number of documents in the results set for this query. Does not take limit() and skip() into account by default - set with_limit_and_skip to True if that is the desired behavior.
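The effect of honoring the limit while counting can be modelled in plain JavaScript. This is only a sketch of the idea, not pymongo or the server: countMatches, rows and isEven are hypothetical names, and the examined counter exists just to show how much work each variant does.

```javascript
// Hypothetical helper: count matches and, when `limit` is given,
// stop as soon as that many matches have been found.
function countMatches(docs, predicate, limit = Infinity) {
  let count = 0;
  let examined = 0;
  for (const doc of docs) {
    if (count >= limit) break;
    examined += 1;
    if (predicate(doc)) count += 1;
  }
  return { count, examined };
}

// Made-up data: 1000 rows, half of which match.
const rows = Array.from({ length: 1000 }, (_, i) => ({ _id: i, even: i % 2 === 0 }));
const isEven = (d) => d.even;

const full = countMatches(rows, isEven);      // models count() ignoring the limit
const capped = countMatches(rows, isEven, 1); // models count with limit=1 honored
const exists = capped.count > 0;

console.log(full, capped, exists);
```

For an existence check the capped variant can stop at the first match, while the full count has to visit every matching row.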

How does MongoDB evaluate multiple $or statements?

How will MongoDB evaluate this query:
db.testCol.find(
{
"$or" : [ {a:1, b:12}, {b:9, c:15}, {c:10, d:"foo"} ]
});
When scanning values in a document, if the first OR clause is TRUE, will the other clauses also be evaluated?
Logically, if MongoDB is optimized, the other clauses in the OR statement should not be evaluated, but I don't know how MongoDB is implemented.
UPDATE:
I updated my query because it was wrong and didn't correctly explain what I was trying to accomplish. I need to find a set of documents that have different properties, and if an exact combination of these properties is found, the document must be returned.
The SQL equivalent of my query would be:
SELECT * FROM testCol
WHERE (a = 1 AND b = 12) OR (b = 9 AND c = 15) OR (c = 10 AND d = 'foo');
MongoDB will execute each clause of the $or operation as a separate query and remove duplicates in a post-processing pass. As such, each clause can use a separate index, which is often very useful.
In other words, it will NOT look at one document, see which of the OR clauses apply, and do an early-out if the first clause is a match. Rather, it does a full dataset query per clause and de-dupes after the fact. This may seem less than efficient, but in practice it's almost always faster, since the first approach would only be able to hit at most one index for all clauses, which is rarely efficient.
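The per-clause behaviour described above can be modelled in plain JavaScript. This is a simplified sketch under stated assumptions, not MongoDB's query planner: testCol is a made-up array, and matches and orQuery are hypothetical helpers that run one pass per clause and then de-duplicate by _id.

```javascript
// Made-up documents for illustration.
const testCol = [
  { _id: 1, a: 1, b: 12, c: 10, d: "foo" }, // matches clause 1 and clause 3
  { _id: 2, b: 9, c: 15 },                  // matches clause 2
  { _id: 3, a: 1, b: 99 },                  // matches no clause
];

// A clause matches when every listed field is equal (implicit AND).
const matches = (doc, clause) =>
  Object.entries(clause).every(([key, value]) => doc[key] === value);

// Run each clause as its own pass, then drop documents already seen by _id.
function orQuery(docs, clauses) {
  const seen = new Set();
  const result = [];
  for (const clause of clauses) {
    for (const doc of docs.filter((d) => matches(d, clause))) {
      if (!seen.has(doc._id)) {
        seen.add(doc._id);
        result.push(doc);
      }
    }
  }
  return result;
}

const found = orQuery(testCol, [{ a: 1, b: 12 }, { b: 9, c: 15 }, { c: 10, d: "foo" }]);

console.log(found.map((d) => d._id));
```

Document 1 satisfies two clauses but appears once in the result, because the second hit is dropped in the de-duplication step.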
EDIT: Mongo only skips documents during the de-duplication process, not during the table scans.
Mongo won't check documents that are already part of the result set. So if your first {a:1, b:12} returns 100% of the documents, Mongo is done.
You want to put whatever will grab the most documents as your first evaluated statement because of this. If your first item only grabs 1% of documents, the subsequent item will need to scan the other 99%.
That being said, you are using $or to look for values in a single key. I think you want to use $in for this.
See here for more:
http://books.google.com/books?id=BQS33CxGid4C&lpg=PA48&ots=PqvQJPRUoe&dq=mongo%20tips%20and%20tricks%20%22OR-query%22&pg=PA48#v=onepage&q&f=false