MongoDB Count Inaccurate on Geospatial Queries - mongodb

I'm querying for documents that are close to a location ($near and $maxDistance) and fall within a date range (an $or with 3 sets of $gt/$lt conditions relating to dates/schedules).
I find that $cursor->count() always returns 100 whenever there are 100 or more results, regardless of limit().
$cursor->skip()->limit() seems to work fine, allowing me to skip past more than 100 results (when there are more than 100), but it bothers me that count() always returns 100 and there seems to be no way to determine the full count (other than paging until there are no more results).
I find references to map-reduce not working correctly with geospatial queries, and the MongoDB docs mention a default limit() of 100:
"The above query finds the closest points to (50,50) and returns them sorted by distance (there is no need for an additional sort parameter). Use limit() to specify a maximum number of points to return (a default limit of 100 applies if unspecified)."
Is this a known issue? I'm using the PHP driver.
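A possible workaround is to count with $geoWithin instead, since it applies no implicit limit. A rough sketch, assuming a 2d index on a hypothetical loc field (with a legacy 2d index, $center takes its radius in the same units as $near's $maxDistance; on servers older than 2.4 the operator was $within):
// Count with an equivalent circle; collection and field names are hypothetical
db.places.count({
    loc: { $geoWithin: { $center: [ [ 50, 50 ], 0.5 ] } }
    // ...plus the same $or date-range conditions as the original query
})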

We've been waiting for $or/$and support in geospatial queries for a year:
Estimate: Medium ( < 1 week)
Fix Version/s: planned but not scheduled
https://jira.mongodb.org/browse/SERVER-3984
__
maybe they'll support this by 2014 ;)
__
http://pastebin.com/raw.php?i=FD3xe6Jt
http://www.zopyx.de/blog/goodbye-mongodb
http://blog.engineering.kiip.me/post/20988881092/a-year-with-mongodb
http://blog.schmichael.com/2011/11/05/failing-with-mongodb/

Related

Is there a way to know whether .limit() actually removes any documents?

Using the mongocxx driver (C++ project).
I'm working on a MongoDB query to paginate some results. I'm trying to return the first 10 results while also informing the recipient whether there are more documents to fetch with another query (using an offset). The results are stored in a std::vector after the find query.
Is there any elegant way to do this, preferably without returning all the result documents and then comparing the vector size to the specified page limit?
Current query (without specifics):
db.collection.find({"<some_field>" : <some value>}).limit(10);
This, however, does not tell me whether any documents were cut off by the limit when exactly 10 results are returned.
Currently I'm simply returning the full vector of results and looping through it, breaking if the loop goes over 10 iterations (and setting a "more_items" bool to true).
There are two ways to do this:
Count all the documents matched by the query:
db.collection.count({"<some_field>" : <some value>});
Then, if the count is greater than the page size (10 here), you can set the "more_items" bool to true.
Or find with the limit set to one more than the page size (11 here):
db.collection.find({"<some_field>" : <some value>}).limit(11);
That way you get at most 11 documents.
If you get 11 documents, there are more documents than the actual limit of 10; if you get fewer than 11, there are no more documents beyond the current page.
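A rough shell sketch of the second approach (collection and field names are hypothetical; the same pattern carries over to mongocxx):
// Fetch one more document than the page size to detect whether another page exists
var pageSize = 10;
var docs = db.collection.find({ some_field: "<some value>" }).limit(pageSize + 1).toArray();
var moreItems = docs.length > pageSize;
if (moreItems) docs.pop();   // drop the probe document; docs now holds exactly one page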

Why does MongoDB find have the same performance as count?

I am running tests against my MongoDB and for some reason find has the same performance as count.
Stats:
orders collection size: ~20M,
orders with product_id 6: ~5K
product_id is indexed for improved performance.
Query: db.orders.find({product_id: 6}) vs db.orders.find({product_id: 6}).count()
Result: the matching orders vs. the count of ~5K, both after ~0.08 ms.
Why isn't count dramatically faster? It could find the positions of the first and last matching entries in the product_id index.
As the Mongo documentation for count states, calling count is the same as calling find, but instead of returning the docs it just counts them. To perform this count, it iterates over the cursor. It can't just read the index and determine the number of documents from the first and last value of some ID, especially since you can have an index on some other field that isn't the ID (and Mongo IDs are not auto-incrementing). So find and count are basically the same operation, except that instead of returning the documents, count just walks over them, sums their number, and returns it to you.
Also, if you want a faster result, you could use estimatedDocumentCount (docs), which goes straight to the collection's metadata. The trade-off is that you lose the ability to ask "how many documents can I expect if I run this query?". If you need a faster count of the docs matching a query, you could use countDocuments (docs), which is a wrapper around an aggregation query. From my knowledge of Mongo, that is the fastest way to count query results without calling count, and it should be the preferred way, performance-wise, to count docs from now on (it was introduced in version 4.0.3).
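A small shell sketch of the two alternatives, assuming a MongoDB 4.0+ shell that provides these helpers:
// Fast but query-agnostic: reads the collection metadata only
db.orders.estimatedDocumentCount()
// Query-aware count, implemented as an aggregation under the hood
db.orders.countDocuments({ product_id: 6 })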

Implementation of limit in mongodb

My collection name is trial and its data size is 112 MB.
My query is,
db.trial.find()
and I have added a limit of 10:
db.trial.find.limit(10)
but the limit is not working; the entire collection is being queried.
Replace
db.trial.find.limit(10)
with
db.trial.find().limit(10)
Also, you mention that the entire collection is being queried? Run this:
db.trial.find().limit(10).explain()
It will tell you how many documents it looked at before stopping the query (nscanned). You will see that nscanned will be 10.
The .limit() modifier on its own will only "limit" the results of the query that is processed, so it works as designed to "limit" the results returned. In its raw form, with no query, nscanned should simply be the limit you asked for:
db.trial.find().limit(10)
If your intent is to only operate on a set number of documents you can alter this with the $maxScan modifier:
db.trial.find({})._addSpecial( "$maxScan" , 11 )
Which causes the query engine to "give up" after the set number of documents have been scanned. But that should only really matter when there is something meaningful in the query.
If you are actually trying to do "paging" then you are better off using "range" queries with $gt, $lt and cousins to effectively change the range of selection that is done in your query.
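A minimal sketch of such a range query for paging, assuming you page in _id order and remember the last _id from the previous page:
// First page
var page = db.trial.find().sort({ _id: 1 }).limit(10).toArray();
// Next page: start strictly after the last _id seen, instead of using skip()
var lastId = page[page.length - 1]._id;
db.trial.find({ _id: { $gt: lastId } }).sort({ _id: 1 }).limit(10)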

MongoDB query: Using Limit together with $near skips few documents

I am currently developing an app which gets a specific number of documents from a collection if their location coordinates fall within a certain distance. I am using an Active Record library for CodeIgniter, and the query that is generated is as follows:
db.updates.find({locs: { $near: [72.844102008984, 19.130207090604 ], $maxDistance: 5000 }, posted_on : { $lt :1398425538.1942 },}).sort( { posted_on: -1 } ).limit(10).toArray()
The problem I am facing is that the above query skips a few documents which should actually get pulled. But if I remove the limit(10) from the query, the proper documents get pulled.
I am not sure, but does using limit() in MongoDB omit a few results? Or does it limit the results to only the closest (nearest) documents?
P.S. - The documents skipped by the limit are not always the same; random results are generated.
I suspect you are running into problems with the special nature of the $near query. $near performs both a limit() and a sort() on the cursor returning the results -
Specifies a point for which a geospatial query returns the closest documents first. The query sorts the documents from nearest to farthest.
By default, queries that use a 2d index return a limit of 100 documents; however you may use limit() to change the number of results.
http://docs.mongodb.org/manual/reference/operator/query/near/
While the documentation does specifically discuss overriding the limit of 100 with your own limit call:
You can further limit the number of results using cursor.limit().
It is silent on adding your own sort() or both sorting and overriding the limit at the same time. I suspect you are running into side effects of doing both. Note that it's not incorrect to do both - it just may not produce the results you are looking for. I'd suggest trying the same query using $geoWithin
http://docs.mongodb.org/manual/reference/operator/query/geoWithin/
$geoWithin does not apply a sort or a limit on the results, so it gives you something of a more raw result set.
Do you have any identical posted_on dates in the system? I recommend sorting by a second key, perhaps _id. If the sort order is non-deterministic, the system may skip documents in a non-deterministic manner. Adding the _id field to your sort order is generally not that expensive if you have an index on the other fields, as they will already be very close to the correct order and _id is part of all indexes. ("By default, all collections have an index on the _id field, and applications and users may add additional indexes to support important queries and operations." http://docs.mongodb.org/manual/core/index-single/ )
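A sketch of that tie-breaking sort, reusing the values from the question and adding _id as a secondary key so the order is deterministic:
db.updates.find({
    locs: { $near: [ 72.844102008984, 19.130207090604 ], $maxDistance: 5000 },
    posted_on: { $lt: 1398425538.1942 }
}).sort({ posted_on: -1, _id: -1 }).limit(10)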

Is it faster to use with_limit_and_skip=True when counting query results in pymongo

I'm doing a query where all I want to know is whether there is at least one document in the collection that matches the query, so I pass limit=1 to find(). All I care about is whether the count() of the returned cursor is > 0. Would it be faster to use count(with_limit_and_skip=True) or just count()? Intuitively it seems like I should pass with_limit_and_skip=True, because if there are a whole bunch of matching records then the count could stop at my limit of 1.
Maybe this merits an explanation of how limits and skips work under the covers in mongodb/pymongo.
Thanks!
Your intuition is correct. That's the whole point of the with_limit_and_skip flag.
With with_limit_and_skip=False, count() has to count all the matching documents, even if you use limit=1, which is pretty much guaranteed to be slower.
From the docs:
Returns the number of documents in the results set for this query. Does not take limit() and skip() into account by default - set with_limit_and_skip to True if that is the desired behavior.
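For comparison, the legacy mongo shell exposes the same switch as an optional applySkipLimit argument to cursor.count(); a sketch with a hypothetical query:
// Respects limit(1), so the server can stop counting early
db.collection.find({ some_field: "<some value>" }).limit(1).count(true)
// Ignores limit() and counts every matching document
db.collection.find({ some_field: "<some value>" }).limit(1).count()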