Mongo and find always limited to 100 with geo data - mongodb

While experimenting with MongoDB performance I found some strange behaviour in my database.
First I filled it with the following loop:
for (i=0; i < 10000000; i++){db.location.insert( {_id: Math.floor((Math.random()*10000000000)+1), position: [Math.round(Math.random()*10000)/10000, Math.round(Math.random()*10000)/10000]} )}
Next I created a 2d index:
db.location.ensureIndex( {position: "2d"} )
Then I ran this query:
db.location.find( {position: { $near: [1,1], $maxDistance: 1/111.12 } } )
Whatever I try, the size or count of the result is always 100.
I noticed in the documentation that the default limit is 100. I also tried to override it with values larger than 100, without success.
Has anyone encountered this?

To get all documents, use a query like this:
var coordinate = [1, 1];
var maxDistance = 1 / 111.12;
db.location.find({"position": {"$within": {"$center": [coordinate, maxDistance]}}});
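As a rough cross-check of what the $center query should return, the same radius test can be run in memory on a handful of points. This is only a sketch: kmToDegrees and withinCenter are hypothetical helper names (not MongoDB functions), the sample documents are made up, and the 111.12 km-per-degree factor is the approximation used in the question for flat 2d coordinates.

```javascript
// Approximate conversion used in the question: 1 degree is taken as 111.12 km.
const KM_PER_DEGREE = 111.12;

function kmToDegrees(km) {
  return km / KM_PER_DEGREE;
}

// In-memory stand-in for {$within: {$center: [center, radius]}} on flat coordinates.
// Hypothetical helper, not part of MongoDB.
function withinCenter(docs, center, radius) {
  return docs.filter(function (doc) {
    const dx = doc.position[0] - center[0];
    const dy = doc.position[1] - center[1];
    return Math.sqrt(dx * dx + dy * dy) <= radius;
  });
}

// Made-up sample data for illustration only.
const sample = [
  { _id: 1, position: [1, 1] },
  { _id: 2, position: [0.995, 1] },
  { _id: 3, position: [0.5, 0.5] }
];

const hits = withinCenter(sample, [1, 1], kmToDegrees(1));
console.log(hits.map(function (d) { return d._id; }));
```

Comparing the length of such an in-memory result with the count returned by the server on a small test collection is one way to confirm that no results are being cut off.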

From the official documentation:
The $near operator requires a geospatial index: a 2dsphere index for GeoJSON points; a 2d index for legacy coordinate pairs. By default, queries that use a 2d index return a limit of 100 documents; however you may use limit() to change the number of results.
Also look at the 'Note' at the end of this tutorial page.
Update:
As Sumeet wrote in a comment on his answer, this is an open issue.
To make sure your query returns the count you specified in the limit method, you can call .limit(<some_number>).explain().n on your cursor if you are working in the shell.

I just tried the same scenario on my MongoDB and it works fine. I set a limit of 145 and got 145 records. However, as mentioned in the documentation, I get 100 records if I do not specify a limit.
db.location.find( {position: { $near: [1,1], $maxDistance: 1 } } ).limit(145)
I used the statement above.
Please note that I changed the value of $maxDistance so that I get a lot of records.

Related

MongoDB: What is the fastest / is there a way to get the 200 documents with a closest timestamp to a specified list of 200 timestamps, say using a $in [duplicate]

Let's assume I have a collection with documents with a ratio attribute that is a floating point number.
{'ratio':1.437}
How do I write a query to find the single document with the closest value to a given integer without loading them all into memory using a driver and finding one with the smallest value of abs(x-ratio)?
Interesting problem. I don't know if you can do it in a single query, but you can do it in two:
var x = 1; // given integer
closestBelow = db.test.find({ratio: {$lte: x}}).sort({ratio: -1}).limit(1);
closestAbove = db.test.find({ratio: {$gt: x}}).sort({ratio: 1}).limit(1);
Then you just check which of the two docs has the ratio closest to the target integer.
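The final comparison step could look roughly like the following. It is a sketch that assumes each of the two cursors has already been reduced to a single document or to null (for example when no document lies on that side of x); pickClosest is a hypothetical helper name.

```javascript
// Pick whichever of the two candidate documents has the ratio nearest to x.
// Either candidate may be null when no document exists on that side of x.
function pickClosest(below, above, x) {
  if (below === null) return above;
  if (above === null) return below;
  // On a tie, the document at or below x is returned.
  return Math.abs(x - below.ratio) <= Math.abs(above.ratio - x) ? below : above;
}

// Made-up candidates for illustration.
const closest = pickClosest({ ratio: 0.9 }, { ratio: 1.437 }, 1);
console.log(closest);
```

Both queries can use an ordinary index on ratio, so only two documents are ever transferred to the client.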
MongoDB 3.2 Update
The 3.2 release adds support for the $abs absolute value aggregation operator which now allows this to be done in a single aggregate query:
var x = 1;
db.test.aggregate([
// Project a diff field that's the absolute difference along with the original doc.
{$project: {diff: {$abs: {$subtract: [x, '$ratio']}}, doc: '$$ROOT'}},
// Order the docs by diff
{$sort: {diff: 1}},
// Take the first one
{$limit: 1}
])
I have another idea, but it is tricky and requires changing your data structure.
You can use the geospatial index that MongoDB supports.
First, change your data to this structure, keeping the second value at 0:
{'ratio':[1.437, 0]}
Then you can use the $near operator to find the closest ratio value. Because the operator returns a list sorted by distance from the value you give, use limit to get only the closest one.
db.places.find( { ratio : { $near : [50,0] } } ).limit(1)
If you don't want to do this, I think you can just use @JohnnyHK's answer :)

Mongo $geoNear query - incorrect nscanned number and incorrect results

I have a collection of around 6k documents with a 2dsphere index on the location field, example below:
"location" : {
"type" : "Point",
"coordinates" : [
138.576187,
-35.010441
]
}
When using the query below I only get around 450 docs returned, with nscanned around 3k. Every document has a location, and many locations are duplicated. Distances returned for GeoJSON points are in meters, and a distance multiplier of 0.000625 converts them approximately to miles (the exact factor is about 0.000621). To test, I'm expecting a max distance of 32180000000000 to return all the documents on the planet, i.e. 6000.
db.x.aggregate([
{"$geoNear":{
"near":{
"type":"Point",
"coordinates":[-0.3658702,51.45686]
},
"distanceField":"distance",
"limit":100000,
"distanceMultiplier":0.000625,
"maxDistance":32180000000000,
"spherical":true,
}}
])
Why don't I get 6000 documents returned? I'm unable to find the logic behind this behaviour in Mongo. I found this on the Mongo forums:
"geoNear's major limitation is that as a command it can return a result set up to the maximum document size as all of the matched documents are returned in a single result document."
I'm pretty sure that MongoDB has a limit of 16 MB on the results of $geoNear. In https://github.com/mongodb/mongo/blob/master/src/mongo/db/commands/geo_near_cmd.cpp you can see that while the geoNear results are being built, there is this condition:
// Don't make a too-big result object.
if (resultBuilder.len() + resObj.objsize() > BSONObjMaxUserSize) {
    warning() << "Too many geoNear results for query " << rewritten.toString()
              << ", truncating output.";
    break;
}
And in https://github.com/mongodb/mongo/blob/master/src/mongo/bson/util/builder.h you'll see it's limited to 16 MB:
const int BSONObjMaxUserSize = 16 * 1024 * 1024;
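To see how such a cap could produce a result set of a few hundred documents, the truncation logic can be imitated in plain JavaScript. This is a simplified sketch, not the server's implementation: countBeforeTruncation is a hypothetical name, it ignores the builder's own overhead, and the 36 KiB per-document size is an assumption chosen for illustration (the question does not state document sizes).

```javascript
// Same value as BSONObjMaxUserSize quoted above.
const BSON_OBJ_MAX_USER_SIZE = 16 * 1024 * 1024;

// Count how many results fit before the accumulated size would exceed the cap.
// docSizes is a hypothetical array of per-document sizes in bytes.
function countBeforeTruncation(docSizes, cap) {
  let total = 0;
  let count = 0;
  for (const size of docSizes) {
    if (total + size > cap) break; // mirrors the "too-big result object" check
    total += size;
    count += 1;
  }
  return count;
}

// Assumed workload: 6000 documents of 36 KiB each.
const sizes = new Array(6000).fill(36 * 1024);
const fitted = countBeforeTruncation(sizes, BSON_OBJ_MAX_USER_SIZE);
console.log(fitted + " of " + sizes.length + " documents fit");
```

Under that assumption only a few hundred of the 6000 documents fit, which is the same order of magnitude as the roughly 450 reported in the question. Checking the real average document size (for example with db.x.stats().avgObjSize) would confirm or rule out this explanation.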

Meteor collection find sort after filtering

I have a collection of addresses. I would like to filter the collection to keep the 10 nearest addresses, and then sort them from farthest to nearest.
Is it possible to achieve this within a single find request in Meteor?
The following gives me the 10 nearest addresses:
Addresses.find({}, {sort:{distance:1}, limit:10});
but they are ordered by increasing distance. Obviously, if I set distance:-1 they come in decreasing order, but then I get only the 10 farthest addresses…
You need the aggregation framework:
db.collection.aggregate([
    { $sort: { distance: 1 } },   // nearest first
    { $limit: 10 },               // keep the 10 nearest
    { $sort: { distance: -1 } }   // reorder them from farthest to nearest
])
I hope the query is self-explanatory.
If you can't run an aggregation or native Mongo query in MeteorJS, then you'll probably have to reverse the results you got from the DB query programmatically.
If you fetch the result of your search and reverse it, it should work.
Addresses.find({}, {sort:{distance:1}, limit:10}).fetch().reverse()
The only drawback is that the result is now an array and no longer a cursor.
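The sort, limit and reverse sequence described above can be tried on a plain array first. This is a minimal sketch with made-up distances; nearestThenFarthestFirst is a hypothetical helper name and works on any array of objects with a numeric distance field.

```javascript
// Keep the `limit` nearest addresses, then order them from farthest to nearest.
function nearestThenFarthestFirst(addresses, limit) {
  return addresses
    .slice() // copy so the input array is not mutated by sort()
    .sort(function (a, b) { return a.distance - b.distance; })
    .slice(0, limit)
    .reverse();
}

// Made-up data for illustration.
const addresses = [5, 1, 9, 3, 7].map(function (d) { return { distance: d }; });
const ordered = nearestThenFarthestFirst(addresses, 3).map(function (a) { return a.distance; });
console.log(ordered);
```

Here the three nearest distances (1, 3, 5) are kept and come back farthest first, which is the same order the two-stage sort in the aggregation pipeline produces.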

MongoDB $near Returning Max Results with Max Distance

I have the following mongo query (written in PHP)
$q = array("created_at" => array("\$gte" => $mdate), "icv" => 1, "loc" => array("\$near" => array($latitude, $longitude), "\$maxDistance" => 5));
Which is basically:
db.collection.find({loc: {$near: [XX, YY], $maxDistance: 5}, icv: 1, created_at: {$gte: "SOMEDATE"}});
I would like to find ALL the documents that match this query, not just the 100 that it returns by default. If I set a limit on this query, the results go outside the max distance.
Any suggestions?
In this mailing list post, Eliot mentions that $near does not use a cursor, so its results are limited to either 4MB or 100 documents, whichever is reached first. The current documentation says the same, but this is true only for 2d indexes (the documentation should be fixed by DOCS-1841).
If you are storing points in GeoJSON and using 2dsphere indexes (new in version 2.4), $near queries do utilize cursors and should not have a hard upper limit of 100 documents.
Consider the following examples, first using a 2d index:
> var point = [0,0];
> for(i=0;i<200;i++) db.foo.insert({x: point});
> db.foo.ensureIndex({x: "2d"});
> db.foo.find({x: {$near: point}}).count(true);
100
> db.foo.find({x: {$near: point}}).limit(200).count(true);
100
Then using a 2dsphere index with equivalent point data:
> var point = {type: "Point", coordinates: [0,0]};
> for(i=0;i<200;i++) db.bar.insert({x: point});
> db.bar.ensureIndex({x: "2dsphere"})
> db.bar.find({x: {$near: point}}).count(true)
200
> db.bar.find({x: {$near: point}}).limit(150).count(true)
150
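Moving from a 2d index to a 2dsphere index means storing each point as GeoJSON. A small conversion helper might look like the sketch below. toGeoJSONPoint is a hypothetical name, and the helper assumes the legacy pair is ordered longitude first, then latitude, which is the order MongoDB expects.

```javascript
// Convert a legacy [longitude, latitude] pair to a GeoJSON Point.
// toGeoJSONPoint is a hypothetical helper name, not a MongoDB function.
function toGeoJSONPoint(pair) {
  const lng = pair[0];
  const lat = pair[1];
  // Valid ranges: longitude -180..180, latitude -90..90 (both inclusive).
  if (lng < -180 || lng > 180 || lat < -90 || lat > 90) {
    throw new RangeError("expected [longitude, latitude] within valid bounds");
  }
  return { type: "Point", coordinates: [lng, lat] };
}

const converted = toGeoJSONPoint([0, 0]);
console.log(JSON.stringify(converted));
```

The range check is also a cheap way to catch pairs that were stored latitude first, at least when the latitude slot ends up holding a value beyond 90 degrees.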

Geospatial Indexing with a simple key first

After reading about MongoDB and geospatial indexing,
I was surprised that it does not support compound keys that don't start with the 2d index.
I don't know if I would gain anything from it, but right now the MSSQL solution is just as slow/fast.
SELECT TOP 30 * FROM Villages WHERE SID = 10 ORDER BY (math to calc radius from the center point)
This works, but it is slow because it is not smart enough to use an index, so it has to calculate the radius for all villages with that SID.
So in Mongo I wanted to create an index like {sid: 1, loc: "2d"} so I could filter out a lot from the start.
I'm not sure there are any solutions for this. I thought about creating a collection for each sid, since they don't share any information. But what are the disadvantages of this? Or is this how people do it?
Update
The maps are flat, from 800,800 to -800,-800, and villages are placed from the center of the map outwards. There are about 300 different maps which are not related, so they could be in different collections, but I'm not sure about the overhead.
If more information is needed, please let me know.
What I have tried
> var res = db.Villages.find({sid: 464})
> db.Villages.find({loc: {$near: [50, 50]}, sid: {$in: res}})
error: { "$err" : "invalid query", "code" : 12580 }
>
Also tried this
db.Villages.find({loc: {$near: [50, 50]}, sid: {$in: db.Villages.find({sid: 464}, {sid: 1})}})
error: { "$err" : "invalid query", "code" : 12580 }
I'm not really sure what I'm doing wrong, but it's probably something about the syntax. Confused here.
As you stated already, MongoDB cannot accept a location as a secondary key in a geo index. The 2d field has to come first in the index, so you are out of luck when it comes to changing the indexing pattern.
But there is a workaround: instead of the compound index you wanted, you can create one index on sid and one compound index on loc and sid
db.your_collection.ensureIndex({sid : 1})
db.your_collection.ensureIndex({loc : '2d',sid:1})
or two separate indexes on sid and loc
db.your_collection.ensureIndex({sid : 1})
db.your_collection.ensureIndex({loc : '2d'})
(I am not sure which of the above is more efficient, you can try it yourself)
and you can make two different queries to get the results filtered by sid first and by location next, kind of like this
// get the _id values of all documents with the wanted sid
var res_ids = db.your_collection.find({sid: 10}, {_id: 1}).map(function (doc) { return doc._id; });
// then query by location, restricted to those ids
db.your_collection.find({loc: {$near: [50, 50]}, _id: {$in: res_ids}})
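For a small data set, the intended result (filter by sid first, then take the nearest villages on a flat map) can be sketched in memory to verify what the two-step query should return. Everything here is illustrative: nearestInMap is a hypothetical helper, the village documents are made up, and distance is plain Euclidean distance, matching the flat maps described in the question.

```javascript
// In-memory sketch of "filter by sid, then take the nearest N" on flat map coordinates.
// villages is a plain array of {name, sid, loc: [x, y]} objects.
function nearestInMap(villages, sid, center, limit) {
  function dist(v) {
    return Math.hypot(v.loc[0] - center[0], v.loc[1] - center[1]);
  }
  return villages
    .filter(function (v) { return v.sid === sid; }) // filter() returns a new array
    .sort(function (a, b) { return dist(a) - dist(b); })
    .slice(0, limit);
}

// Made-up data for illustration.
const villages = [
  { name: "a", sid: 10, loc: [60, 50] },
  { name: "b", sid: 11, loc: [50, 50] },
  { name: "c", sid: 10, loc: [51, 50] },
  { name: "d", sid: 10, loc: [-800, 800] }
];
const nearest = nearestInMap(villages, 10, [50, 50], 2).map(function (v) { return v.name; });
console.log(nearest);
```

Village "b" sits exactly on the center but belongs to another sid, so it is excluded before any distance is computed, which is the saving the question is after.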