How to modify 2dsphere index without downtime for $geoNear queries? - mongodb

$geoNear queries require a geospatial index, and they also require that the collection have only one geospatial index.
From the docs:
https://docs.mongodb.com/v3.4/reference/operator/aggregation/geoNear/#behavior
$geoNear requires a geospatial index.
The $geoNear requires that a collection have at most only one 2d index and/or only one 2dsphere index.
If I need to make changes to an existing geospatial index on a production system with frequent (one every few seconds) $geoNear queries, how would I apply this change without downtime?
I'm using Mongo 3.4 if that matters, and could upgrade to 3.6 if that would make this easier.

I just tried this on MongoDB 4.2.x and it appears to no longer be an issue. I don't know in which version this issue was resolved/improved. I had two 2dsphere indexes on the same collection and no queries were having issues.
According to the docs, this is still an issue, but only for $geoNear queries, and you can work around it by telling it which index to use:
If your collection has multiple 2dsphere index and/or multiple 2d
index, you must use the key option to specify the indexed field path
to use.
If you do not specify the key, you cannot have multiple
2dsphere index and/or multiple 2d index since without the key, index
selection among multiple 2d indexes or 2dsphere indexes is ambiguous.
https://docs.mongodb.com/manual/core/2dsphere/#geonear-and-geonear-restrictions
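A minimal sketch of that workaround, assuming a hypothetical places collection with 2dsphere indexes on two hypothetical fields (location is the one used here). The helper name, field names, and coordinates are illustrative only, and as far as I know the key option requires MongoDB 4.0 or later.

```javascript
// Build a $geoNear pipeline that names the indexed field explicitly via `key`,
// so index selection is unambiguous when several 2dsphere indexes exist.
function geoNearPipeline(fieldPath, lng, lat, maxDistanceMeters) {
  return [
    {
      $geoNear: {
        near: { type: "Point", coordinates: [lng, lat] }, // GeoJSON order: [longitude, latitude]
        key: fieldPath,                // indexed field path to use
        distanceField: "distance",     // output field that receives the computed distance
        maxDistance: maxDistanceMeters,
        spherical: true
      }
    }
  ];
}

const pipeline = geoNearPipeline("location", -73.99, 40.73, 500);
// In the shell: db.places.aggregate(pipeline)
```

With the field named this way, a second 2dsphere index on a different field can exist on the collection while $geoNear queries keep running.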

Related

Indexing in MongoDB [duplicate]

I need to know how indexing in MongoDB improves query performance. Currently my DB is not indexed. How can I index an existing DB? Also, do I need to create a new field just for indexing?
Fundamentally, indexes in MongoDB are similar to indexes in other database systems. MongoDB supports indexes on any field or sub-field contained in documents within a MongoDB collection.
Indexes are covered in detail here and I highly recommend reading this documentation.
There are sections on indexing operations, strategies and creation options, as well as detailed explanations of the various index types such as compound indexes (i.e. an index on multiple fields).
One thing to note is that by default, creating an index is a blocking operation. Creating an index is as simple as:
db.collection.ensureIndex( { zip: 1})
Something like this will be returned, indicating the index was correctly inserted:
Inserted 1 record(s) in 7ms
Building an index on a large collection can take a long time to complete. To avoid blocking in the meantime, the background option lets you continue to use your mongod instance during the index build.
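A minimal sketch of a background build for the zip index above. The variable names are my own. Note that createIndex is the current name for ensureIndex, and, as far as I know, MongoDB 4.2 and later ignore the background option because all index builds use a non-blocking process there.

```javascript
// Index specification and options, kept as plain objects so they can be inspected or reused.
const zipIndexKeys = { zip: 1 };               // ascending index on the `zip` field
const zipIndexOptions = { background: true };  // ask for a build that does not block other operations

// In the shell:
// db.collection.createIndex(zipIndexKeys, zipIndexOptions)
```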
Limitations on indexing in MongoDB are covered here.

MongoDb index intersection usage

I have trouble understanding what MongoDB is doing with my queries. My documents contain almost exclusively array fields, keeping me from using compound indexes.
Every field is indexed with ensureIndex({FieldName:1}).
The query clauses are combined with AND like this:
{$and: [{FIELD1:"field1Val"},{FIELD2:"field2Val"},{FIELD3:"field3Val"}]}
If I run this query, MongoDB appears to be using only one index.
Why isn't MongoDB using all the indexes in parallel and then intersecting them?
The same problem solved with Lucene runs 8 times faster than my MongoDB implementation does now.
(Before v2.6, one of MongoDB's well-known limitations was that it could use only one index per query, except in some special cases using $or.
To improve query speed, you can use hint() to force the index used. Choose the most selective index.)
As the comments say, this is no longer true: MongoDB 2.6 can use index intersection. It seems that at most two indexes can be intersected. See: When are Compound Indexes still relevant in MongoDB 2.6, given the new Index Intersection feature?
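A small sketch of how one might check this, using the field names from the question. The helper mergeAndClauses is hypothetical and only valid when every clause uses a different field, as here. In explain() output, an intersection plan normally shows up as an AND_SORTED or AND_HASH stage.

```javascript
// The explicit $and from the question.
const explicitAnd = {
  $and: [{ FIELD1: "field1Val" }, { FIELD2: "field2Val" }, { FIELD3: "field3Val" }]
};

// Merge the clauses into the equivalent implicit-AND filter.
// Only safe when no field name appears in more than one clause.
function mergeAndClauses(filter) {
  return Object.assign({}, ...filter.$and);
}

const implicitAnd = mergeAndClauses(explicitAnd);

// In the shell, let the planner choose and inspect the plan:
// db.collection.find(implicitAnd).explain()
// Or force a single index (pick the most selective one):
// db.collection.find(implicitAnd).hint({ FIELD1: 1 }).explain()
```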
@JohnnyHK Thank you for the comments; they taught me something new.

GEO2D index implementation in MongoDB

I am using GEO2D index (for data stored as points on a two-dimensional plane) in MongoDB and wondering how it is working under the hood.
There is this page but it did not mention which algorithm it uses.
Is it using R-Tree indexes ?
No, like all other MongoDB indexes it's a B-Tree:
Behavior of Indexes
All indexes in MongoDB are B-tree indexes, which can efficiently
support equality matches and range queries. The index stores items
internally in order sorted by the value of the index field. The
ordering of index entries supports efficient range-based operations
and allows MongoDB to return sorted results using the order of
documents in the index.
http://docs.mongodb.org/manual/core/index-types/
There is an open ticket to implement R-Tree indexing for Geospatial but it's old so it does not appear to be coming any time soon:
https://jira.mongodb.org/browse/SERVER-3551
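To see how a B-tree can serve two-dimensional points at all: as I understand the documentation, a 2d index stores a geohash of each point, computed by recursively dividing the plane into quadrants. The sketch below illustrates that idea only. It is not MongoDB's actual encoding, and the function name, the levels parameter, and the bit order are my own assumptions.

```javascript
// Illustrative quadrant-based geohash: each level halves the x range and the y range
// and appends one bit per axis. Nearby points tend to share a long common prefix,
// which lets an ordered index (such as a B-tree) keep them close together.
function quadrantHash(x, y, levels = 26, min = -180, max = 180) {
  let xMin = min, xMax = max, yMin = min, yMax = max;
  let hash = "";
  for (let i = 0; i < levels; i++) {
    const xMid = (xMin + xMax) / 2;
    const yMid = (yMin + yMax) / 2;
    if (x >= xMid) { hash += "1"; xMin = xMid; } else { hash += "0"; xMax = xMid; }
    if (y >= yMid) { hash += "1"; yMin = yMid; } else { hash += "0"; yMax = yMid; }
  }
  return hash; // string of 2 * levels bits
}

const coarse = quadrantHash(10, 10, 2); // two levels of subdivision, four bits
```

Because the hash is a one-dimensional, sortable value, range scans over hash prefixes correspond to scans over rectangular cells of the plane.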

mongodb geoNear vs near

It looks like MongoDB offers two similar functions for geospatial queries: $near and $geoNear. According to the MongoDB docs:
The geoNear command provides an alternative to the $near operator. In
addition to the functionality of $near, geoNear returns additional
diagnostic information.
It looks like geoNear provides a superset of the near functionality. For example, near seems to only return the closest 100 documents, whereas geoNear lets you specify a maximum. Is there a reason to use near instead of geoNear? Is one more efficient than the other?
Efficiency should be identical for either.
geoNear's major limitation is that as a command it can return a result set up to the maximum document size as all of the matched documents are returned in a single result document. It also requires that a distance field be added to each result document which may or may not be an issue depending on your usage.
$near is a query operator so the results can be larger than a single document (they are still returned in a single response but not a single document). You can also set the maximum number of documents via the query's limit().
I tend to recommend that users stick with the $near unless they need the diagnostics (e.g., distance, or location matched) from the geonear command.
These are the major differences:
$geoNear also gives you distance from the point but $near command doesn't.
$geoNear command requires that the collection have at most only one 2d index and/or only one 2dsphere index whereas geospatial query operators like $near and $geoWithin permit collections to have multiple geospatial indexes.
This is because the $geoNear command has no option to specify the field on which you want to search, whereas with $near you can specify the field name.
The main difference is that $near is a query operator, but $geoNear is an aggregation stage. Both return documents in order of nearest to farthest from the given point.
What this means is that $near can be used in find() queries, but $geoNear cannot. Note that $near is not accepted inside the $match aggregation stage; within an aggregation pipeline, $geoNear must be used instead, as a separate stage and as the first stage of the pipeline.
The options each feature provides also differ. I invite you to review the details in the corresponding documentation sections:
$near documentation
$geoNear documentation
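A minimal side-by-side sketch of the two forms, assuming a hypothetical places collection with a 2dsphere index on a hypothetical location field. The coordinates, distances, and output field name are illustrative.

```javascript
// A GeoJSON point shared by both examples ([longitude, latitude]).
const point = { type: "Point", coordinates: [-73.99, 40.73] };

// $near is a query operator: it sits inside a find() filter, under the indexed field name.
const nearFilter = {
  location: { $near: { $geometry: point, $maxDistance: 500 } }
};

// $geoNear is an aggregation stage: it takes the point directly and writes
// the computed distance into the field named by distanceField.
const geoNearStage = {
  $geoNear: { near: point, distanceField: "distance", maxDistance: 500, spherical: true }
};

// In the shell:
// db.places.find(nearFilter).limit(20)
// db.places.aggregate([geoNearStage, { $limit: 20 }])
```

Both forms return documents nearest first; only the $geoNear stage adds the distance to each result.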
The 100-document limit with geoNear is only the default behaviour; you can change it by setting the num field, as described in the MongoDB documentation (http://docs.mongodb.org/manual/reference/command/geoNear/).
The default is 100, but you can set a higher value. Unfortunately, a skip parameter is missing for the moment
(see https://jira.mongodb.org/browse/SERVER-3925)

How to workaround MongoDB's current limitation of just one geospatial index per collection?

Currently, MongoDB supports only one geospatial index per collection. How can I work around this manually?
Is there some smart way to emulate this kind of index without losing too much accuracy?
You can probably create a second collection that holds only the (same) object id and the secondary geospatial field, with its own geospatial index.
When you want to query on the secondary index, you query that second collection, get the list of ids back, and then query the master collection by those ids.
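A minimal sketch of that two-step lookup, written for the mongo shell, where toArray() returns an array directly (with the Node.js driver the calls would need await). The function and parameter names are hypothetical.

```javascript
// geoColl:  secondary collection holding { _id, <geo field> } with its own geospatial index.
// mainColl: master collection holding the full documents under the same _id values.
function findViaSecondaryGeoCollection(geoColl, mainColl, geoFilter) {
  // Step 1: run the geospatial query on the secondary collection, keeping ids in result order.
  const ids = geoColl.find(geoFilter).toArray().map(doc => doc._id);

  // Step 2: fetch the full documents from the master collection by id.
  const docs = mainColl.find({ _id: { $in: ids } }).toArray();

  // $in does not preserve the order of `ids`, so restore the order from step 1.
  const byId = new Map(docs.map(doc => [String(doc._id), doc]));
  return ids.map(id => byId.get(String(id))).filter(doc => doc !== undefined);
}

// In the shell, with hypothetical collection and field names:
// findViaSecondaryGeoCollection(db.places_geo2, db.places,
//   { altLocation: { $near: [-73.99, 40.73] } })
```

The cost of this design is an extra round trip per query, plus the need to keep the two collections in sync on every insert, update, and delete.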