MongoDB index intersection usage - mongodb

I am having trouble understanding what MongoDB is doing with my queries. My documents consist almost exclusively of array fields, which keeps me from using compound indexes.
Every field is indexed with ensureIndex({FieldName:1}).
The queries are combined with AND like this:
{$and: [{FIELD1:"field1Val"},{FIELD2:"field2Val"},{FIELD3:"field3Val"}]}
If I run this query, MongoDB appears to use only one index.
Why doesn't MongoDB use all the indexes in parallel and then intersect the results?
The same problem solved with Lucene runs 8 times faster than my MongoDB implementation does now.

(Before v2.6, one of MongoDB's well-known limitations was that it could use only one index per query, except in some special cases using $or.
To improve query speed, you can use hint() to force a particular index. Choose the most selective index.)
As the comments say, this is no longer true: since v2.6 MongoDB supports index intersection. It seems that at most 2 indexes can be intersected. See: When are Compound Indexes still relevant in MongoDB 2.6, given the new Index Intersection feature?
@JohnnyHK Thanks for the comments, they taught me something new.
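To make the idea of intersecting indexes concrete, here is a minimal sketch in plain JavaScript. It is not MongoDB's implementation: the documents, the buildIndex and intersect helpers, and the toy "index" (a map from value to a sorted list of _id values) are all assumptions made for illustration.

```javascript
// Hypothetical documents with array fields, as described in the question.
const docs = [
  { _id: 1, FIELD1: ["x", "field1Val"], FIELD2: ["field2Val"], FIELD3: ["field3Val"] },
  { _id: 2, FIELD1: ["field1Val"], FIELD2: ["other"], FIELD3: ["field3Val"] },
  { _id: 3, FIELD1: ["field1Val"], FIELD2: ["field2Val"], FIELD3: ["field3Val", "y"] },
];

// Toy single-field index: value -> ascending list of _id values.
function buildIndex(documents, field) {
  const index = new Map();
  for (const doc of documents) {
    // A Set avoids indexing the same value twice for one document.
    for (const value of new Set(doc[field])) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc._id);
    }
  }
  for (const ids of index.values()) ids.sort((a, b) => a - b);
  return index;
}

// Merge-style intersection of two ascending id lists.
function intersect(left, right) {
  const out = [];
  let i = 0;
  let j = 0;
  while (i < left.length && j < right.length) {
    if (left[i] === right[j]) {
      out.push(left[i]);
      i++;
      j++;
    } else if (left[i] < right[j]) {
      i++;
    } else {
      j++;
    }
  }
  return out;
}

// One id list per AND clause, then intersect them pairwise.
const clauses = { FIELD1: "field1Val", FIELD2: "field2Val", FIELD3: "field3Val" };
const idLists = Object.entries(clauses).map(
  ([field, value]) => buildIndex(docs, field).get(value) || []
);
const matches = idLists.reduce((acc, ids) => intersect(acc, ids));
console.log(matches);
```

Each intersection step costs time proportional to the lengths of both lists, so starting from the most selective clause keeps the intermediate result small, which is the same reasoning behind hinting the most selective index.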

Related

Why $nin is slower than $in, Mongodb

I have a collection with 5M documents and the correct indexes. $in works perfectly, but the same query with $nin is super slow. What is the reason for this?
Super fast:
{'tech': {'$in': ['Wordpress', 'wordpress', 'WORDPRESS']}}
Super slow:
{'tech': {'$nin': ['Wordpress', 'wordpress', 'WORDPRESS']}}
The following explanation is accurate only for Mongo versions prior to 3.2.
Mongo v3.2 has all kinds of storage engine changes which improved performance on this issue.
Now, $nin has one important quality: it is not a selective query. First, let's understand what selectivity means:
Selectivity is the ability of a query to narrow results using the index. Effective indexes are more selective and allow MongoDB to use the index for a larger portion of the work associated with fulfilling the query.
The docs even state it themselves:
For instance, the inequality operators $nin and $ne are not very selective since they often match a large portion of the index. As a result, in many cases, a $nin or $ne query with an index may perform no better than a $nin or $ne query that must scan all documents in a collection.
Back then selectivity was a big deal performance-wise. This all leads us to your question: why isn't the index being used?
Well, when Mongo is asked to create a query plan, it performs a "race" between all available query plans, one of which is a COLLSCAN, i.e. a collection scan, and the first plan to find 101 documents wins. Due to the poor efficiency of non-selective queries, the winning plan is COLLSCAN (which is actually usually faster, depending on the index and the values in the query). Read further about this here
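To put a number on "not selective", here is a small sketch in plain JavaScript. The techValues array is made-up sample data and matchedFraction is a hypothetical helper; the point is only to compare how much of the data each predicate matches.

```javascript
// Hypothetical values of the indexed `tech` field; real data will differ.
const techValues = [
  "Wordpress", "Drupal", "Joomla", "wordpress", "Django",
  "Rails", "Laravel", "WORDPRESS", "Ghost", "Hugo",
];
const wanted = new Set(["Wordpress", "wordpress", "WORDPRESS"]);

// Fraction of values that satisfy a predicate.
function matchedFraction(values, predicate) {
  const matched = values.filter(predicate).length;
  return matched / values.length;
}

// $in only touches the entries for the listed values.
const inFraction = matchedFraction(techValues, (v) => wanted.has(v));
// $nin has to match everything else.
const ninFraction = matchedFraction(techValues, (v) => !wanted.has(v));

console.log({ inFraction, ninFraction });
```

With this sample, $in matches 3 of 10 values and $nin matches the remaining 7. On a real collection where the listed values are rare, the $nin side gets close to the whole index, at which point walking the index saves little over scanning the collection.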
When you have an index (no matter whether you talk about MongoDB or any other database), it is always faster to search for a certain value than to search for everything that is not that value.
For the latter, the database has to scan the entire index, and often the index is not used at all when you look for "not in" or "not equal". Have a look at the execution plan with explain().
Some databases (e.g. Oracle) provide so-called bitmap indexes. They work differently, and usually an IN operation is as fast as a NOT IN operation. But, as usual, they have other drawbacks compared to B*Tree indexes. To my knowledge, Oracle Database is the only major RDBMS which supports bitmap indexes.

Indexing in MongoDB [duplicate]

I need to know how indexing in Mongo improves query performance. Currently my DB is not indexed. How can I index an existing DB? Also, do I need to create a new field only for indexing?
Fundamentally, indexes in MongoDB are similar to indexes in other database systems. MongoDB supports indexes on any field or sub-field contained in documents within a MongoDB collection.
Indexes are covered in detail here and I highly recommend reading this documentation.
There are sections on indexing operations, strategies and creation options, as well as detailed explanations of the various index types such as compound indexes (i.e. an index on multiple fields).
One thing to note is that by default, creating an index is a blocking operation. Creating an index is as simple as:
db.collection.ensureIndex( { zip: 1})
Something like this will be returned, indicating the index was correctly inserted:
Inserted 1 record(s) in 7ms
When you build an index on a large collection, the operation can take a long time to complete. To work around this, the background option allows you to continue using your mongod instance during the index build.
Limitations on indexing in MongoDB are covered here.
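As for how an index improves query performance, here is a rough sketch in plain JavaScript rather than anything MongoDB-specific. The collection, the scanFind and indexFind helpers, and the assumption that zip values are unique are all made up for illustration; it only counts how many entries each approach has to examine.

```javascript
// Hypothetical collection: 1000 documents with a unique `zip` field.
const collection = Array.from({ length: 1000 }, (_, i) => ({ _id: i, zip: 10000 + i }));

// Without an index: every document is examined.
function scanFind(documents, zip) {
  let examined = 0;
  const found = [];
  for (const doc of documents) {
    examined++;
    if (doc.zip === zip) found.push(doc._id);
  }
  return { found, examined };
}

// Toy index: [zip, _id] entries kept sorted by zip.
const zipIndex = collection.map((d) => [d.zip, d._id]).sort((a, b) => a[0] - b[0]);

// With the index: binary search over the sorted entries.
function indexFind(index, zip) {
  let lo = 0;
  let hi = index.length - 1;
  let examined = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    examined++;
    if (index[mid][0] === zip) return { found: [index[mid][1]], examined };
    if (index[mid][0] < zip) lo = mid + 1;
    else hi = mid - 1;
  }
  return { found: [], examined };
}

console.log(scanFind(collection, 10500).examined);
console.log(indexFind(zipIndex, 10500).examined);
```

The scan examines all 1000 documents, while the sorted lookup needs at most about log2(1000), i.e. 10, comparisons. The index is built from an existing field, so no extra field is needed just for indexing.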

How to modify 2dsphere index without downtime for $geoNear queries?

$geoNear queries require a geospatial index, and they also require that there be only one such index.
From the docs:
https://docs.mongodb.com/v3.4/reference/operator/aggregation/geoNear/#behavior
$geoNear requires a geospatial index.
The $geoNear requires that a collection have at most only one 2d index and/or only one 2dsphere index.
If I need to make changes to an existing geospatial index on a production system with frequent (one every few seconds) $geoNear queries, how would I apply this change without downtime?
I'm using Mongo 3.4 if that matters, and could upgrade to 3.6 if that would make this easier.
I just tried this on MongoDB 4.2.x and it appears to no longer be an issue. I don't know in which version this issue was resolved/improved. I had two 2dsphere indexes on the same collection and no queries were having issues.
According to the docs, this is still an issue, but only for $geoNear queries, and you can work around it by telling it which index to use:
If your collection has multiple 2dsphere index and/or multiple 2d
index, you must use the key option to specify the indexed field path
to use.
If you do not specify the key, you cannot have multiple
2dsphere index and/or multiple 2d index since without the key, index
selection among multiple 2d indexes or 2dsphere indexes is ambiguous.
https://docs.mongodb.com/manual/core/2dsphere/#geonear-and-geonear-restrictions
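As a sketch of how the key option could be used for a switch-over: the field names location and locationV2, the coordinates, and the nearPipeline helper are hypothetical, and this assumes the replacement index lives on a different field path than the old one. As far as I know, key was only added in MongoDB 4.0, so it would not help on 3.4 or 3.6.

```javascript
// Builds a $geoNear pipeline that names the indexed field explicitly.
// `indexedField` is a hypothetical field path carrying a 2dsphere index.
function nearPipeline(indexedField, longitude, latitude) {
  return [
    {
      $geoNear: {
        near: { type: "Point", coordinates: [longitude, latitude] },
        distanceField: "distance", // output field for the computed distance
        key: indexedField,         // which geospatial index to use
        spherical: true,
      },
    },
  ];
}

// Queries keep using the old field while the new index is being built...
const oldPipeline = nearPipeline("location", -73.99, 40.73);
// ...and switch once the index on the new field is ready.
const newPipeline = nearPipeline("locationV2", -73.99, 40.73);

console.log(JSON.stringify(newPipeline, null, 2));
```

The resulting array is what would be passed to db.collection.aggregate(). Once nothing references the old field any more, the old index can be dropped.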

GEO2D index implementation in MongoDB

I am using a GEO2D index (for data stored as points on a two-dimensional plane) in MongoDB and am wondering how it works under the hood.
There is this page, but it does not mention which algorithm it uses.
Does it use R-Tree indexes?
No, like all other MongoDB indexes it's a B-Tree:
Behavior of Indexes
All indexes in MongoDB are B-tree indexes, which can efficiently
support equality matches and range queries. The index stores items
internally in order sorted by the value of the index field. The
ordering of index entries supports efficient range-based operations
and allows MongoDB to return sorted results using the order of
documents in the index.
http://docs.mongodb.org/manual/core/index-types/
There is an open ticket to implement R-Tree indexing for Geospatial but it's old so it does not appear to be coming any time soon:
https://jira.mongodb.org/browse/SERVER-3551
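To show how two-dimensional points can live in a B-tree at all, here is a simplified sketch of geohash-style keys, which is the general approach the 2d index documentation describes. This is not MongoDB's actual code: the quantize and geohashKey helpers, the 8-bit precision and the -180 to 180 bounds are assumptions chosen for illustration.

```javascript
// Map a coordinate in [min, max] to an integer cell with `bits` bits.
function quantize(value, min, max, bits) {
  const cells = 2 ** bits;
  const scaled = Math.floor(((value - min) / (max - min)) * cells);
  return Math.min(cells - 1, Math.max(0, scaled));
}

// Interleave the bits of x and y (x first) into a single string key.
// Such one-dimensional keys can be stored in an ordinary sorted index.
function geohashKey(x, y, bits = 8, min = -180, max = 180) {
  const qx = quantize(x, min, max, bits);
  const qy = quantize(y, min, max, bits);
  let key = "";
  for (let i = bits - 1; i >= 0; i--) {
    key += (qx >> i) & 1;
    key += (qy >> i) & 1;
  }
  return key;
}

console.log(geohashKey(10, 10));
console.log(geohashKey(10.5, 10.5));
console.log(geohashKey(-170, 80));
```

Points in the same cell get the same key, and points in the same larger cell share a key prefix, so a range scan over the sorted keys returns candidates from one area of the plane.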

Skipping the first term of a compound index by using hint()

Suppose I have a Mongo collection with fields a and i. I've populated this collection with {a:'a', i : index } where index increases iteratively from 0 to 1000.
I know this is very, very wrong, but can't explain (no pun intended) why:
collection.find({i:{$gt:500}}).explain() confirms that the index was not used (I can see that it scanned all 1,000 documents in the collection).
Somehow forcing Mongo to use the index seems to work though:
collection.find({i:{$gt:500}}).hint({a:1,i:1}).explain()
Edit
The Mongo documentation is very clear that it will only use a compound index if one of your query terms matches the first term of the compound index. In this case, using hint, it appears that Mongo used the compound index {a:1,i:1} even though the query terms do NOT include a. Is this true?
The interesting part about the way MongoDB performs queries is that it may actually run multiple query plans in parallel to determine which one is best. It may have chosen not to use the index because of other experimenting you've done from the shell, or depending on when you added the data and whether it was in memory, etc. (or a few other factors). Looking at the performance numbers, it's not reporting that using the index was actually any faster than not using it (although you shouldn't put much stock in those numbers generally). In this case, the data set is really small.
But, more importantly, according to the MongoDB docs, the output from the hinted run also suggests that the query wasn't covered entirely by the index (indexOnly=false).
That's because your index is a:1, i:1, yet the query is for i. Compound indexes only support searches based on any prefix of the indexed fields (meaning they must be in the order they were specified).
http://docs.mongodb.org/manual/core/read-operations/#query-optimization
FYI: Use the verbose option to see a report of all plans that were considered for the find().
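The prefix rule can be sketched in plain JavaScript. The usablePrefixLength helper is hypothetical and only models the rule as stated above; it does not reproduce the query planner, and it relies on object keys keeping their insertion order (true for non-numeric field names like these).

```javascript
// Counts how many leading fields of a compound index the query constrains.
// 0 means the query does not match any prefix of the index.
function usablePrefixLength(indexSpec, queryFields) {
  const queried = new Set(queryFields);
  let length = 0;
  for (const field of Object.keys(indexSpec)) {
    if (!queried.has(field)) break; // prefix ends at the first missing field
    length++;
  }
  return length;
}

const compoundIndex = { a: 1, i: 1 };

console.log(usablePrefixLength(compoundIndex, ["i"]));      // query on i only
console.log(usablePrefixLength(compoundIndex, ["a"]));      // query on a only
console.log(usablePrefixLength(compoundIndex, ["a", "i"])); // query on both
```

For the query in the question, which filters on i alone, the usable prefix length is 0. That is consistent with the planner not picking {a:1,i:1} on its own, even though hint() can still force a scan of it.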