How does sorting work in MongoDB? - mongodb

I have this query: (Doctrine2 ODM)
....
->field('coordinates')
->near(
(float)$lat,
(float)$lng)
->field('date')
->lte($lastdate)
->sort('date','desc')
->limit(9)
->getQuery()
->execute()
->toArray();
It gives me documents with the following dates (example):
2014-03-31 01:51:06
2014-03-31 01:51:02
2014-03-31 01:50:46
2014-03-31 01:50:07
If I change the limit to 20, for example, I get these dates:
2014-03-31 01:52:01
2014-03-31 01:51:42
2014-03-31 01:51:16
2014-03-31 01:51:06
My question is: why were these dates skipped in the first query?
Does MongoDB collect the first documents that match the criteria and then sort them?
That would be very stupid!
I changed the order in the query (criteria after sorting), but it doesn't seem to have any effect.

$near will get you the nearest documents; it's not going to get you newer documents further away.
From the doc:
Specifies a point for which a geospatial query returns the closest documents first. The query sorts the documents from nearest to farthest.
$near without $maxDistance is inherently a sort: without specifying the bounds of the data you're interested in, applying a sort that supersedes the $near ordering would effectively negate the $near.
You might be able to combine a $near query that specifies both $geometry and $maxDistance to filter a result set, then sort to sort on the date of that filtered set, though. I haven't tried it, but it would change the semantics of your $near clause such that it may work.

If you want to sort by distance, use $near. If you want to sort by a different attribute, use the $geoWithin operator (formerly $within), which returns results that are within a radius without sorting them, allowing you to sort on different criteria.
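The difference between the two approaches can be sketched in memory. This is only a model of the behaviour described in the answers above, not MongoDB's implementation: a plain array stands in for the collection, planar distance stands in for geospatial distance, and the `id`, `x` and `y` fields are hypothetical.

```javascript
// In-memory model only; sample documents and field names are made up.
const docs = [
  { id: "a", x: 0, y: 1, date: "2014-03-31 01:51:06" },
  { id: "b", x: 0, y: 2, date: "2014-03-31 01:50:46" },
  { id: "c", x: 0, y: 3, date: "2014-03-31 01:52:01" },
  { id: "d", x: 0, y: 4, date: "2014-03-31 01:51:42" },
];

const distance = (doc, cx, cy) => Math.hypot(doc.x - cx, doc.y - cy);
const byDateDesc = (p, q) => (p.date < q.date ? 1 : p.date > q.date ? -1 : 0);

// Models $near: order by distance first, so the limit keeps the closest
// documents, and only those survivors are re-ordered by date.
function nearestThenSortByDate(items, cx, cy, limit) {
  return [...items]
    .sort((p, q) => distance(p, cx, cy) - distance(q, cx, cy))
    .slice(0, limit)
    .sort(byDateDesc);
}

// Models $geoWithin: filter by radius without ordering, then sort and limit.
function withinThenSortByDate(items, cx, cy, radius, limit) {
  return items
    .filter((doc) => distance(doc, cx, cy) <= radius)
    .sort(byDateDesc)
    .slice(0, limit);
}

const nearIds = nearestThenSortByDate(docs, 0, 0, 2).map((d) => d.id);
const withinIds = withinThenSortByDate(docs, 0, 0, 10, 2).map((d) => d.id);
console.log(nearIds, withinIds);
```

With a limit of 2, the first function never sees the newer but more distant documents, which matches the symptom in the question; the second considers everything inside the radius before sorting by date.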

Related

How to get closest 100 documents in Mongo?

I want to query a database for the first 100 documents that are closest to me in MongoDB. Once I have the closest 100 documents, I want to sort them by custom fields in the documents. Such as createdAt or points. It seems like $near is what I want, but their docs say:
When using sort() with geospatial queries, consider using $geoWithin operator, which does not sort documents, instead of $near.
So it seems like they suggest using $geoWithin, but I don't want to constrain the search to a specific range. Suggestions?

How to fire find query on sub-documents in MongoDB

I am not able to get values from a sub-document, as in the first query below.
> db.posts.find({'repository': {'language':'Python'}}).count()
0
> db.posts.find({'actor': 'swiftlinux'}).count()
12
Can someone tell me how to get results when the query is based on a sub-document?
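It should be:
db.posts.count({'repository.language': 'Python'})
Sub-documents are queried with dot notation. The original query matches only documents whose repository sub-document is exactly {'language': 'Python'}, with no other fields. Also, count() can take the query directly, so there is no need to go through find() first.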
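A small in-memory sketch of the two match styles may make the difference clearer. It is a model, not MongoDB's matcher; the sample documents and the `name` field are hypothetical.

```javascript
// Hypothetical sample data; only 'actor' and 'repository.language' come
// from the question.
const posts = [
  { actor: "swiftlinux", repository: { language: "Python", name: "demo" } },
  { actor: "other", repository: { language: "Python" } },
  { actor: "third", repository: { language: "Ruby", name: "x" } },
];

// Models {'repository': {'language': 'Python'}}: the whole sub-document
// must be equal, including the set of fields and their order.
function matchesExactSubdocument(doc, field, expected) {
  return JSON.stringify(doc[field]) === JSON.stringify(expected);
}

// Models {'repository.language': 'Python'}: walk the dotted path and
// compare only the leaf value.
function matchesDottedPath(doc, path, expected) {
  const leaf = path
    .split(".")
    .reduce((node, key) => (node != null ? node[key] : undefined), doc);
  return leaf === expected;
}

const exactCount = posts.filter((p) =>
  matchesExactSubdocument(p, "repository", { language: "Python" })
).length;
const dottedCount = posts.filter((p) =>
  matchesDottedPath(p, "repository.language", "Python")
).length;
console.log(exactCount, dottedCount);
```

The exact form skips any document whose sub-document carries extra fields, which is one plausible reason the first query in the question counted 0.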

What is MongoDB's $min? How is that different from find().sort({the_field: 1}).limit(1)?

What is the difference among MongoDB's aggregation $min versus query modifier $min versus find().sort() that returns the minimum of the_field with: db.the_collection.find().sort({the_field:1}).limit(1)?
Which one should I use if there are about a few hundred calls per minute to retrieve the minimum element in a collection and work with it independently each time?
Side question: Can someone show me the correct syntax using either $min to give me the minimum of a field in a collection? db.the_collection.find().min({the_field:10}) doesn't work.
To get the document with the lowest value of 'the_field', you should use:
db.the_collection.find().sort({the_field:1}).limit(1)
Here we sort the documents first and then take the first one, as you can see from the query.
Aggregation $min:
It is used when grouping documents into a single document, where the grouped document keeps the minimum of the values from the documents it was built from.
$min operator:
It specifies an inclusive lower bound for a specific index in order to constrain the results of find(). In simple words, if the_field is indexed and we want to put a constraint on the find(), we can use it. It is generally used to improve performance.
The syntax you entered was correct, but it requires an indexed field, and the result will be different from what you actually want.
min() requires an index on the field and forces the query to use that index, even when a better index is available.
So prefer $gte or the sort operation for the query if possible.
db.the_collection.find().sort({the_field:1}).limit(1)
So the above query is a better choice than the $min operation.
See the link below:
http://docs.mongodb.org/manual/reference/operator/meta/min/#interaction-with-index-selection
Running the min operator on a key with no index gives the following error:
Error: error: {
"$err" : "Unable to execute query: error processing query: ns=db.dbName limit=0 skip=0\nTree: status == \"approved\"\nSort: {}\nProj: {}\n planner returned error: unable to find relevant index for max/min query",
"code" : 17007
}
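The three forms discussed above can be contrasted with a small in-memory sketch. It models the results only, assuming a plain array in place of the collection; the sample values are made up, and the ordering returned by the lower-bound model is an assumption about an index-ordered scan.

```javascript
// Hypothetical sample data for the_collection.
const collection = [
  { _id: 1, the_field: 12 },
  { _id: 2, the_field: 7 },
  { _id: 3, the_field: 10 },
  { _id: 4, the_field: 25 },
];

const ascending = (a, b) => a.the_field - b.the_field;

// Models find().sort({the_field: 1}).limit(1): returns the whole document.
function sortLimitOne(items) {
  return [...items].sort(ascending).slice(0, 1);
}

// Models an aggregation group with $min: returns only the smallest value.
function groupMin(items) {
  return items.reduce(
    (lowest, doc) => Math.min(lowest, doc.the_field),
    Infinity
  );
}

// Models find().min({the_field: bound}): an inclusive lower bound on an
// index scan, so it returns every document at or above the bound.
function indexLowerBound(items, bound) {
  return [...items].sort(ascending).filter((doc) => doc.the_field >= bound);
}

const lowestDoc = sortLimitOne(collection);
const lowestValue = groupMin(collection);
const boundedIds = indexLowerBound(collection, 10).map((d) => d._id);
console.log(lowestDoc, lowestValue, boundedIds);
```

In this model the lower bound of 10 excludes the document holding the real minimum (7), which is why find().min({the_field: 10}) does not answer the original question.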

Implementation of limit in mongodb

My collection is named trial and its data size is 112 MB.
My query is:
db.trial.find()
and I have added a limit of 10:
db.trial.find.limit(10)
But the limit is not working; the entire query runs.
Replace
db.trial.find.limit(10)
with
db.trial.find().limit(10)
You also mention that the entire query is running. Run this:
db.trial.find().limit(10).explain()
It will tell you how many documents it looked at before stopping the query (nscanned). You will see that nscanned will be 10.
The .limit() modifier on its own will only "limit" the results of the query that is processed, so it works as designed to "limit" the results returned. In a raw form with no query criteria, though, nscanned should simply equal the limit you asked for:
db.trial.find().limit(10)
If your intent is to only operate on a set number of documents you can alter this with the $maxScan modifier:
db.trial.find({})._addSpecial( "$maxScan" , 11 )
Which causes the query engine to "give up" after the set number of documents have been scanned. But that should only really matter when there is something meaningful in the query.
If you are actually trying to do "paging", then you are better off using "range" queries with $gt and $lt and their cousins to change the range of selection in your query.
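A rough in-memory sketch of range-based paging follows. It assumes paging on an increasing `_id`, with a plain array standing in for the trial collection; the `nextPage` helper is hypothetical and only models the shape of a `$gt` filter followed by an ascending sort and a limit.

```javascript
// Hypothetical stand-in for the trial collection: 25 documents.
const trial = Array.from({ length: 25 }, (_, i) => ({ _id: i + 1 }));

// Models find({_id: {$gt: lastId}}).sort({_id: 1}).limit(pageSize).
// Passing null as lastId requests the first page.
function nextPage(items, lastId, pageSize) {
  return items
    .filter((doc) => lastId === null || doc._id > lastId)
    .sort((a, b) => a._id - b._id)
    .slice(0, pageSize);
}

const firstPage = nextPage(trial, null, 10);
const secondPage = nextPage(trial, firstPage[firstPage.length - 1]._id, 10);
const thirdPage = nextPage(trial, secondPage[secondPage.length - 1]._id, 10);
console.log(firstPage.length, secondPage[0]._id, thirdPage.length);
```

Each page starts from the last key seen rather than from an offset, so the selection range itself moves forward with every request.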

MongoDB query: Using Limit together with $near skips few documents

I am currently developing an app which gets a specific number of documents from a collection if their location coordinates fall within a certain range of distance. I am using an active record library for CodeIgniter, and the query that is generated is as follows:
db.updates.find({locs: { $near: [72.844102008984, 19.130207090604 ], $maxDistance: 5000 }, posted_on : { $lt :1398425538.1942 },}).sort( { posted_on: -1 } ).limit(10).toArray()
The problem I am facing is that the above query skips a few documents which should actually get pulled. But if I remove the limit(10) from the above query, then the proper documents get pulled.
I am not sure, but does using limit() in MongoDB omit a few results? Or does it limit the results to only the closest (nearest) documents?
P.S. The documents skipped when using the limit are not always the same; the results appear random.
I suspect you are running into problems with the special nature of the $near query. $near performs both a limit() and a sort() on the cursor returning the results -
Specifies a point for which a geospatial query returns the closest documents first. The query sorts the documents from nearest to farthest.
By default, queries that use a 2d index return a limit of 100 documents; however you may use limit() to change the number of results.
http://docs.mongodb.org/manual/reference/operator/query/near/
While the documentation does specifically discuss overriding the limit of 100 with your own limit() call:
You can further limit the number of results using cursor.limit().
It is silent on adding your own sort() or both sorting and overriding the limit at the same time. I suspect you are running into side effects of doing both. Note that it's not incorrect to do both - it just may not produce the results you are looking for. I'd suggest trying the same query using $geoWithin
http://docs.mongodb.org/manual/reference/operator/query/geoWithin/
$geoWithin does not apply a sort or a limit on the results, so it gives you something of a more raw result set.
Do you have any identical posted_on dates in the system? I recommend sorting by a second key, perhaps _id. If the sort order is non-deterministic, the system may skip documents in a non-deterministic manner. Adding the _id field to your sort order is generally not that expensive if you have an index on the other fields, as the documents will already be very close to the correct order and every collection has an index on _id. ("By default, all collections have an index on the _id field, and applications and users may add additional indexes to support important queries and operations." http://docs.mongodb.org/manual/core/index-single/ )
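The tie-breaker idea can be sketched in memory as follows. This is a model under the assumption that ties on posted_on are the cause; the sample documents are made up, and the comparator only mirrors the shape of a sort on posted_on descending, then _id ascending.

```javascript
// Hypothetical sample data with three identical posted_on values.
const updates = [
  { _id: 3, posted_on: 100 },
  { _id: 1, posted_on: 100 },
  { _id: 2, posted_on: 200 },
  { _id: 4, posted_on: 100 },
];

// Models sort({posted_on: -1, _id: 1}): newest first, ties broken by _id,
// so no two documents ever compare as equal.
function byPostedOnThenId(a, b) {
  if (a.posted_on !== b.posted_on) return b.posted_on - a.posted_on;
  return a._id - b._id;
}

function firstN(items, limit) {
  return [...items]
    .sort(byPostedOnThenId)
    .slice(0, limit)
    .map((d) => d._id);
}

// The same documents arriving in a different physical order.
const reordered = [...updates].reverse();

const topFromOriginal = firstN(updates, 2);
const topFromReordered = firstN(reordered, 2);
console.log(topFromOriginal, topFromReordered);
```

Because the comparator defines a total order, the limited result is the same regardless of the order in which the documents are encountered.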