Index vs Query Criteria - MongoDB

I am new to MongoDB. I have read that indexes limit the number of documents that have to be scanned when we query for data.
Reference: http://docs.mongodb.org/manual/core/indexes-introduction
I am confusing them with query criteria, since query criteria also limit the data.
For example: db.users.find({score:{"$lt":30}})
This example is given in the manual and explained in the context of indexes. What are indexes, and how are they different from query criteria?
Thank you

Indexes in MongoDB are similar, but not identical, to indexes in relational databases, so to get a basic feel you can think of them as analogous. Query criteria define which subset of documents your queries are interested in. An index may be able to USE the query criteria to answer the query faster.
Suppose you have a collection with no indexes and you run db.users.find({score:{$lt:30}}). With no index, you will need to scan the entire collection to answer the query, processing every document regardless of its value. With an index on 'score', the query will be able to use the index to drill down on ONLY the documents that match your query, thereby executing faster.
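As a minimal sketch of that difference (the users collection and score field come from the question; the explain() verbosity string assumes MongoDB 3.0+, where createIndex supersedes ensureIndex):
// Without an index, explain() reports a COLLSCAN (full collection scan)
db.users.find({score: {$lt: 30}}).explain("executionStats")
// Create an ascending index on 'score'
db.users.createIndex({score: 1})
// With the index, explain() reports an IXSCAN that touches only
// the index keys with score < 30
db.users.find({score: {$lt: 30}}).explain("executionStats")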

Query criteria limit the data that is sent from the server to the client, but the server still has to scan each and every document to find the matches. An index, on the other hand, limits how many documents are scanned by maintaining a special data structure (a B-tree in MongoDB).
Ref: http://docs.mongodb.org/manual/core/indexes-introduction

Related

Design MongoDB schema to make many fields queryable

I want to design a schema in MongoDB. In it there are many fields (~6), plus fields in subdocuments (~3), which I can query on. Is there any way to make them queryable faster than a sequential scan? I could have used an index, but the query fields can come in any order, and I don't want to create a compound index for every combination.
Is there any way I can design this kind of schema?

Why $nin is slower than $in, MongoDB

I have a collection with 5M documents and the correct indexes. $in works perfectly, but the same query with $nin is super slow... What is the reason for this?
Super fast:
{'tech': {'$in': ['Wordpress', 'wordpress', 'WORDPRESS']}}
Super slow..
{'tech': {'$nin': ['Wordpress', 'wordpress', 'WORDPRESS']}}
The following explanation is accurate only for Mongo versions prior to 3.2.
Mongo v3.2 introduced all kinds of storage engine changes which improved performance on this issue.
Now, $nin has one important quality: it is not a selective query. First, let's understand what selectivity means:
Selectivity is the ability of a query to narrow results using the index. Effective indexes are more selective and allow MongoDB to use the index for a larger portion of the work associated with fulfilling the query.
Now they even state it themselves:
For instance, the inequality operators $nin and $ne are not very selective since they often match a large portion of the index. As a result, in many cases, a $nin or $ne query with an index may perform no better than a $nin or $ne query that must scan all documents in a collection.
Back then, selectivity was a big deal performance-wise. This all leads us to your question: why isn't the index being used?
Well, when Mongo is asked to create a query plan, it performs a "race" between all available query plans, one of which is a COLLSCAN, i.e. a collection scan, where the first plan to find 101 documents wins. Due to the poor efficiency of non-selective queries, the winning plan (which is actually usually faster, depending on the index and the values in the query) is the COLLSCAN. Read further about this here.
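You can watch the outcome of that race yourself with explain(); a sketch, assuming a single-field index on the tech field from the question:
// With an index on 'tech', the $in query's winningPlan is typically an IXSCAN
db.collection.createIndex({tech: 1})
db.collection.find({tech: {$in: ['Wordpress', 'wordpress', 'WORDPRESS']}}).explain()
// For $nin, the winningPlan is typically a COLLSCAN, because the filter
// would match most of the index keys anyway
db.collection.find({tech: {$nin: ['Wordpress', 'wordpress', 'WORDPRESS']}}).explain()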
When you have an index (no matter whether you are talking about MongoDB or any other database), it is always faster to search for a certain value than for the absence of a value.
The database has to scan the entire index, and often the index is not even used when you look for "not in" or "not equal". Have a look at the execution plan with explain().
Some databases (e.g. Oracle) provide so-called Bitmap Indexes. They work differently, and with them an IN operation is usually as fast as a NOT IN operation. But, as usual, they have other drawbacks compared to B*Tree indexes. To my knowledge, Oracle Database is the only major RDBMS which supports Bitmap Indexes.

Indexing in MongoDB [duplicate]

I need to know how indexing in Mongo improves query performance. Currently my DB is not indexed. How can I index an existing DB? Also, do I need to create a new field only for indexing?
Fundamentally, indexes in MongoDB are similar to indexes in other database systems. MongoDB supports indexes on any field or sub-field contained in documents within a MongoDB collection.
Indexes are covered in detail here and I highly recommend reading this documentation.
There are sections on indexing operations, strategies and creation options as well as a detailed explanations on the various indexes such as compound indexes (i.e. an index on multiple fields).
One thing to note is that by default, creating an index is a blocking operation. Creating an index is as simple as:
db.collection.ensureIndex({ zip: 1 })
Something like this will be returned, indicating the index was correctly inserted:
Inserted 1 record(s) in 7ms
When building an index on a large collection of data, the operation can take a long time to complete. To address this, the background option allows you to continue using your mongod instance while the index builds.
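For example, a sketch reusing the zip index from above (this is the pre-4.2 behaviour; since MongoDB 4.2 the background option is ignored in favour of an improved default build process, and createIndex replaces ensureIndex on modern servers):
// Build the index in the background so the collection stays
// available for reads and writes during the build
db.collection.ensureIndex({ zip: 1 }, { background: true })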
Limitations on indexing in MongoDB are covered here.

Mongo DB update query performance

I would like to understand which of the queries below would be faster when doing updates in MongoDB. I want to update a few thousand records in one stretch.
Accumulating the object ids of those records and firing the update at them using $in, or using a bulk update?
Using one or two fields in the collection which are common to those few thousand records - akin to a "where" clause in SQL - and firing an update using those fields. These fields might or might not be indexed.
I know that the query will be much smaller in the 2nd case, as every single "_id" (oid) is not accumulated. Does accumulating _ids and using them to update documents offer any practical performance advantages?
Does accumulating _ids and using those to update documents offer any practical performance advantages?
Yes because MongoDB will certainly use the _id index (idhack).
In the second method - as you observed - you can't tell whether or not an index will be used for a certain field.
So the answer will be: it depends.
If your collection has millions of documents or more, and/or the number of search fields is quite large, you should prefer the first method - especially if the id list is not small and/or the id values are adjacent.
If your collection is pretty small and you can tolerate a full scan you may prefer the second approach.
In any case, you should verify both methods using explain().
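A sketch of the two methods ('records', 'status' and 'processed' are made-up names, and the old update()/multi shell syntax is assumed):
// Method 1: accumulate the _ids, then update via $in (hits the _id index)
var ids = db.records.find({status: "pending"}, {_id: 1}).toArray().map(function(doc) { return doc._id; });
db.records.update({_id: {$in: ids}}, {$set: {processed: true}}, {multi: true});
// Method 2: update directly by the shared fields; whether an index is
// used depends on whether 'status' is indexed
db.records.update({status: "pending"}, {$set: {processed: true}}, {multi: true});
// Compare the plans for the two match phases
db.records.find({_id: {$in: ids}}).explain();
db.records.find({status: "pending"}).explain();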

MongoDB 3.X : Does it make sense to have only one collection per database

Since MongoDB 3.x introduces document-level locking rather than locks on the collection or database, does it make sense to write all of your data to a single collection, with one extra identifier field, "documentType"?
It would help simulate a "join" through a map-reduce operation.
Couchbase does the same thing with "buckets" instead of collection.
Does anybody see any disadvantages with this approach?
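To make the pattern concrete, a sketch (the collection and field names are made up for illustration):
// Two logical "types" stored in one physical collection,
// distinguished by a documentType field
db.everything.insert({documentType: "user", name: "Alice"})
db.everything.insert({documentType: "order", user: "Alice", total: 99.50})
// Every query then filters on documentType first
db.everything.find({documentType: "order", total: {$gt: 50}})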
There's one big general-case disadvantage: indexes.
With Mongo, you generally want to set up indexes so that most, if not all, of the queries you make use them. So in addition to the one on _id, you'll set up indexes on the primary fields you search by (often compounded with those you sort by).
If you're storing everything in one single collection, that means you need to have all those indexes on that collection. Which means two things:
The indexes will be bigger, since there are more documents to index. Granted, this can be somewhat mitigated by using sparse indexes (see the sketch after these two points).
Inserting or modifying documents in the collection requires Mongo to update all these indexes (where it'd just update the relevant indexes in the standard use-many-collections approach). This kills your write performance.
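A sketch of that sparse-index mitigation, reusing the made-up everything collection from above:
// A sparse index on 'total' only contains entries for documents that
// actually have a 'total' field (the "order" documents), so the other
// document types don't bloat it
db.everything.createIndex({total: 1}, {sparse: true})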
Furthermore, if you have in your application a query that somehow doesn't use one of those many indexes, it needs to scan through the entire collection, which is O(n) where n is the number of documents in the collection -- in your case, that means the number of documents in the entire database.
Collections are cheap. Use them ;)