Geospatial Indexing with a simple key first - mongodb

After reading about MongoDB and geospatial indexing,
I was surprised that it does not support compound keys that do not start with the 2d index.
I don't know if I would gain anything from it, but right now the MSSQL solution is just as slow/fast.
SELECT TOP 30 * FROM Villages WHERE SID = 10 ORDER BY (math to calc radius from the center point)
This works, but it is slow because it is not smart enough to use an index, so it has to calculate the radius for all villages with that SID.
So in Mongo I wanted to create an index like {sid: 1, loc: "2d"} so I could filter out a lot from the start.
I'm not sure there are any solutions for this. I thought about creating a collection for each sid since they don't share any information. But what are the disadvantages of this? Or is this how people do it?
Update
The maps are flat: 800,800 to -800,-800, and villages are placed from the center of the map outwards. There are about 300 different maps which are not related, so they could be in different collections, but I'm not sure about the overhead.
If more information is needed, please let me know.
What I have tried
> var res = db.Villages.find({sid: 464})
> db.Villages.find({loc: {$near: [50, 50]}, sid: {$in: res}})
error: { "$err" : "invalid query", "code" : 12580 }
>
Also tried this
db.Villages.find({loc: {$near: [50, 50]}, sid: {$in: db.Villages.find({sid: 464}, {sid: 1})}})
error: { "$err" : "invalid query", "code" : 12580 }
I'm not really sure what I'm doing wrong, but it's probably something about the syntax.

As you stated already, MongoDB cannot accept the location as a secondary key in a geo index; the 2d field has to come first in the index. So you are out of luck as far as changing the index layout goes.
But there is a workaround: instead of the compound index you wanted, you can create one index on sid and one compound index on loc and sid
db.your_collection.ensureIndex({sid : 1})
db.your_collection.ensureIndex({loc : '2d',sid:1})
or two separate indexes on sid and loc
db.your_collection.ensureIndex({sid : 1})
db.your_collection.ensureIndex({loc : '2d'})
(I am not sure which of the two is more efficient; you can try it yourself.)
Then you can make two queries, filtering by sid first and by location next, something like this
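// first query: collect the _id of every document with the wanted sid
var res_ids = db.your_collection.find({sid: 10}, {_id: 1}).map(function (doc) { return doc._id; })
// second query: search by location, restricted to those ids
db.your_collection.find({loc: {$near: [50, 50]}, _id: {$in: res_ids}})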
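If the compound index {loc: '2d', sid: 1} listed above is in place, the sid condition can probably go straight into the $near query, so the two-step lookup may not be needed at all. Note also that $in expects a plain array, so passing a cursor (as in the question's attempts) is a likely cause of the "invalid query" error. A minimal sketch, where buildVillageQuery is a hypothetical helper name and the collection name follows the question:

```javascript
// Hypothetical helper: builds the query document for "villages on one map,
// nearest to a point". It only builds a plain object; nothing is sent to MongoDB.
function buildVillageQuery(sid, point) {
  return {
    loc: { $near: point }, // handled by the 2d part of the index
    sid: sid               // plain equality on the trailing index field
  };
}

var query = buildVillageQuery(464, [50, 50]);

// In the mongo shell this would be used as:
//   db.Villages.ensureIndex({ loc: "2d", sid: 1 })
//   db.Villages.find(query).limit(30)
```

Whether this beats the two-query approach depends on the data, so it is worth checking with explain().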

Related

Using an indexed field for selecting a set of random items from a collection (MongoDB)

I am using MongoDB 2.4.10, and I have a collection of four million records, and a query that creates a subset of no more than 50000 even for our power users. I need to select a random 30 items from this subset, and, given the potential performance issues with skip and limit (especially when doing it 30 times with random skip amounts from 1-50000), I stumbled across the following solution:
Create a field for each record which is a completely random number
Create an index over this field
Sort by the field, and use skip(X).limit(30) to get a page of 30 items that, while consecutive in terms of the random field, actually bear no relation to each other. To the user, they seem random.
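The three steps above can be modeled in plain JavaScript to show why the page looks random. This is an in-memory sketch, not a MongoDB query: the field names a, b, c and d follow the question, while buildSample, randomPage and the sample data are made up for illustration.

```javascript
// Step 1: give every record a completely random number in field d.
function buildSample(count) {
  var docs = [];
  for (var i = 0; i < count; i++) {
    docs.push({
      _id: i,
      a: i % 2 === 0 ? "xyz" : "other",
      b: "ok",
      c: "Image",
      d: Math.random()
    });
  }
  return docs;
}

// Step 3: filter the subset, sort by the random field, then take one page.
function randomPage(docs, skip, limit) {
  return docs
    .filter(function (doc) {
      return ["xyz", "abc"].indexOf(doc.a) !== -1 && doc.b === "ok" && doc.c === "Image";
    })
    .sort(function (x, y) { return x.d - y.d; })
    .slice(skip, skip + limit); // same role as skip(X).limit(30)
}

var page = randomPage(buildSample(1000), 100, 30);
```

The page is consecutive in d, but since d is unrelated to the other fields, the _id values in it are scattered across the subset.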
My index looks like this:
{a: 1, b: 1, c: 1, d: 1}
I also have a separate index:
{d : 1}
'd' is the randomised field.
My query looks like this:
db.content.find({a : {$in : ["xyz", "abc"]}, b : "ok", c : "Image"})
.sort({d : 1}).skip(X).limit(30)
When the collection is small, this works perfectly. However, on our performance and live systems, this query fails, because instead of using the a, b, c, d index, it uses this index only:
{d : 1}
As a result, the query ends up scanning more records than it needs to (by a factor of 25). So, I introduced hint:
db.content.find({a : {$in : ["xyz", "abc"]}, b : "ok", c : "Image"})
.hint({a : 1, b : 1, c : 1, d : 1}).sort({d : 1}).skip(X).limit(30)
This now works great with all values of X up to 11000, and explain() shows the correct index in use. But, when the skip amount exceeds 11000, I get:
{
"$err" : "too much data for sort() with no index. add an index or specify a smaller limit",
"code" : 10128
}
Presumably, the risk of hitting this error is why the query (without the hint) wasn't using this index earlier. So:
Why does Mongo think that the sort has no index to use, when I've forced it to use an index that explicitly includes the sorting field at the end?
Is there a better way of doing this?

Mongo and find always limited to 100 with geo data

While experimenting with Mongo performance I found some strange behaviour in my MongoDB.
First of all I filled it with the following loop:
for (i=0; i < 10000000; i++){db.location.insert( {_id: Math.floor((Math.random()*10000000000)+1), position: [Math.round(Math.random()*10000)/10000, Math.round(Math.random()*10000)/10000]} )}
next:
db.location.ensureIndex( {position: "2d"} )
Then I execute this query:
db.location.find( {position: { $near: [1,1], $maxDistance: 1/111.12 } } )
Whatever I try, I always get a size or count of 100.
I noticed in the documentation that the default limit is 100. I also tried to override it with values bigger than 100. Unfortunately I failed.
Have you ever encountered such a case?
To get all documents, use a query like
coordinate = [1, 1];
maxDistance = 1/111.12;
db.location.find({"position": {"$within": {"$center": [coordinate, maxDistance]}}});
From the official documentation:
The $near operator requires a geospatial index: a 2dsphere index for GeoJSON points; a 2d index for legacy coordinate pairs. By default, queries that use a 2d index return a limit of 100 documents; however you may use limit() to change the number of results.
Also look at the 'Note' at the end of this tutorial page.
Update:
As Sumeet wrote in a comment on his answer, it is an open issue.
To be sure that your query returns the count you specified in the limit method, you could try .limit(<some_number>).explain().n on your cursor, if you are working in the shell.
I just tried the same scenario on my MongoDB and it works fine. I gave a limit of 145 and it gave me 145 records. However, as mentioned in the documentation, it gives me 100 records if I do not specify any limit.
db.location.find( {position: { $near: [1,1], $maxDistance: 1 } } ).limit(145)
I used the statement above.
Please note that I changed the value of $maxDistance so that I get a lot of records.
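The behaviour described in the answers above can be modeled in plain JavaScript (not MongoDB code): a $near-style lookup sorts by distance and is cut off at a limit, 100 by default per the documentation quoted earlier, while a $within/$center-style lookup returns every match. The function names, the planar distance and the sample data are assumptions made for this sketch.

```javascript
function planarDistance(p, q) {
  var dx = p[0] - q[0];
  var dy = p[1] - q[1];
  return Math.sqrt(dx * dx + dy * dy);
}

// Models {$within: {$center: [center, radius]}}: every match, no ordering.
function withinCenter(docs, center, radius) {
  return docs.filter(function (doc) {
    return planarDistance(doc.position, center) <= radius;
  });
}

// Models {$near: center, $maxDistance: maxDistance} followed by limit().
function near(docs, center, maxDistance, limit) {
  var cap = limit === undefined ? 100 : limit; // documented default for 2d indexes
  return withinCenter(docs, center, maxDistance)
    .sort(function (x, y) {
      return planarDistance(x.position, center) - planarDistance(y.position, center);
    })
    .slice(0, cap);
}

// Made-up sample: 500 points on a line, all within 0.5 of [1, 1].
var locations = [];
for (var i = 0; i < 500; i++) {
  locations.push({ _id: i, position: [1 - i / 1000, 1] });
}
```

With this sample, near(locations, [1, 1], 0.5) is capped at the default, near(locations, [1, 1], 0.5, 145) honours the explicit limit, and withinCenter(locations, [1, 1], 0.5) returns all 500 points.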

Mongodb: geospatial indexes and covered index queries

Currently I have an index on a geospatial field in one of my collections set up like this:
collection.ensureIndex({ loc : "2d" }, { min : -10000 , max : 10000,
bits : 32, unique:true});
However, I would like to include one more field in the index, so I can take advantage of covered index queries in one of my use cases. An ensureIndex with multiple fields (compound index) looks something like this:
collection.ensureIndex( { username : 1, password : 1, roles : 1} );
Question is - how do I write my first index spec with an additional field, so that I keep my min/max parameters? In other words, how to specify that the min/max/bits only apply to one of the index fields? My best guess so far is:
collection.ensureIndex({ loc : "2d", field2 : 1 },
{ min : -10000 , max : 10000 , bits : 32, unique:true});
But I have no confidence that this is working as it should!
UPDATE:
There is more info in the documentation here, but it still does not explicitly show how to specify the min/max in this case.
db.collection.getIndexes() should give you the parameters used to construct the index; the min/max/bits options will only ever apply to the "2d" field.
There's no way to see these parameters aside from getIndexes(), but you can easily verify that you aren't allowed to insert the same location/field pair, and that you aren't allowed to insert locs outside your bounds (but field2 can be anything). The bits setting is harder to verify directly, though you can indirectly verify by setting it very low and seeing that nearby points then trigger the unique key violations.
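As a rough illustration of the getIndexes() check, the helper below picks the 2d index out of a list of index documents and reads its options. The sample array is hypothetical: its values mirror the ensureIndex call in the question, but the exact shape of real getIndexes() output may differ between versions. find2dOptions is a made-up name.

```javascript
// Hypothetical getIndexes()-style output, used only for this sketch.
var indexes = [
  { key: { _id: 1 }, name: "_id_" },
  {
    key: { loc: "2d", field2: 1 },
    name: "loc_2d_field2_1",
    unique: true,
    min: -10000,
    max: 10000,
    bits: 32
  }
];

// Returns the options of the first index that has a "2d" key, or null.
function find2dOptions(indexList) {
  for (var i = 0; i < indexList.length; i++) {
    var spec = indexList[i];
    for (var field in spec.key) {
      if (spec.key[field] === "2d") {
        return { field: field, min: spec.min, max: spec.max, bits: spec.bits };
      }
    }
  }
  return null;
}

var geoOptions = find2dOptions(indexes);
// In the shell: find2dOptions(db.collection.getIndexes())
```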

MongoDB "point not in interval of [ -180, 180 )"

I have objects like the following:
{
"_id" : ObjectId("4f7f0d64e4b0db1e18790f10"),
"location" : [
0.674081,
23.473
],
"name" : "Item Name"
}
If I try and create a 2D index on the location field, I get an error:
> db.Place.ensureIndex({location:"2d"});
point not in interval of [ -180, 180 )
I've tried finding documents that are out of bounds but this returns no results:
> db.Place.find({"location.0":{"$lte":-180, "$gte":180}});
> db.Place.find({"location.1":{"$lte":-180, "$gte":180}});
I've also tried the following queries to make sure:
> db.Place.find({"location":{"$size":0}}).count();
0
> db.Place.find({"location":{"$size":1}}).count();
0
> db.Place.find({"location":{"$size":2}}).count();
363485
> db.Place.count();
363485
What other approaches can I use to find the bad document(s)?
For information I'm using MongoDB version 2.0.2 on 32 bit Linux (yes, I know that's not good enough for prod). The documents were imported by a Java document where the location objects are only ever represented as double primitives.
Your queries aren't quite right. You need:
db.Place.find({"location": {"$lte":-180 } })
And
db.Place.find({"location": {"$gte":180 } })
I've run into a similar issue, and I've been able to pinpoint that ensureIndex breaks when either lat or lon is exactly equal to -180 or 180. Look for those cases if you want to get around this issue; otherwise it looks like there's a fix: https://github.com/mongodb/mongo/commit/86b5b89c2785f3f45ad2674f48fe3189c137904c
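The same check can be written as a predicate and run over the documents one by one, which also catches pairs with the wrong shape. This is a sketch in plain JavaScript with made-up sample documents; isSuspectLocation is a hypothetical name, and the inclusive comparison at both ends follows the answers above, which report trouble at exactly -180 and 180.

```javascript
// Flags a coordinate pair whose values sit on or outside the given bounds.
function isSuspectLocation(location, min, max) {
  if (!Array.isArray(location) || location.length !== 2) {
    return true; // a wrong shape is suspicious too
  }
  return location.some(function (value) {
    return typeof value !== "number" || isNaN(value) || value <= min || value >= max;
  });
}

// Made-up sample documents.
var places = [
  { _id: 1, location: [0.674081, 23.473] },
  { _id: 2, location: [180, 10] },
  { _id: 3, location: [-75.2, -180] }
];

var suspects = places.filter(function (doc) {
  return isSuspectLocation(doc.location, -180, 180);
});
// In the shell, the same predicate could be applied inside
// db.Place.find().forEach(function (doc) { ... }).
```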

MongoDB geoNear multiple coordinates

I want to order my results based on their proximity to MULTIPLE points in a 2D space.
I assume this would be done by querying against the first point and then re-querying/checking those results against the second point?
Maybe the code below explains what I am trying to achieve a bit better?
db.runCommand({
geoNear:"places",
near:[ [52.5243, 13.4063], [48.1448, 11.5580] ]
})
Solution: In case anyone is interested, this is how I achieved this (thanks to the answer below)
a = Trip.geo_near([52.5243, 13.4063], :max_distance => 40, :unit => :mi).uniq
b = Trip.geo_near([48.1448, 11.5580], :max_distance => 40, :unit => :mi).uniq
#results = a & b
MongoDB has a whole section in their documentation on geospatial indexing. http://www.mongodb.org/display/DOCS/Geospatial+Indexing
I think what you're looking for is a bounding box query. This is directly from their code examples.
box = [[40.73083, -73.99756], [40.741404, -73.988135]]
db.places.find({"loc" : {"$within" : {"$box" : box}}})
What do you intend the query above to return? Places that are near one OR the other location? In that case, you should run two queries, then union the results in your application code.
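Combining the two result sets in application code could look roughly like this. It is a sketch in plain JavaScript with made-up result lists standing in for the two geoNear calls, and the helper names are hypothetical. The intersection corresponds to the a & b in the asker's solution, and the union to the "one OR the other" case.

```javascript
// Keeps the documents of `first` that also appear in `second`, matched by _id.
function intersectById(first, second) {
  var seen = {};
  second.forEach(function (doc) { seen[String(doc._id)] = true; });
  return first.filter(function (doc) { return seen[String(doc._id)] === true; });
}

// Merges both lists, keeping the first occurrence of each _id.
function unionById(first, second) {
  var seen = {};
  return first.concat(second).filter(function (doc) {
    var key = String(doc._id);
    if (seen[key]) { return false; }
    seen[key] = true;
    return true;
  });
}

// Made-up result sets for the first and second point.
var nearFirst = [{ _id: 1 }, { _id: 2 }, { _id: 3 }];
var nearSecond = [{ _id: 3 }, { _id: 4 }];

var inBoth = intersectById(nearFirst, nearSecond); // near both points
var inEither = unionById(nearFirst, nearSecond);   // near at least one point
```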