MongoDB indexing: please explain the difference between 1 and -1 - mongodb

A MongoDB document example has two fields, userid and score. When the index is created, userid is given the value 1 and score the value -1, i.e. { userid: 1, score: -1 }. Please explain the difference between 1 and -1 in MongoDB indexing.
I tried indexing in MongoDB to make queries faster, but with limited knowledge of indexing I am not getting the expected output.

As @deceze suggests in their comment, the basic answer to the question is mentioned in the documentation. The answer is that the value determines whether the key is ascending (1) or descending (-1). But what does this actually mean and is it important?
Many databases use a type of B-tree data structure as the backbone for their (standard) indexes. These data structures are ordered by nature. This specific property is one of the things that make them ideal for supporting a wide range of database queries. For example, it allows them to bound a range scan (such as $lte) as it knows that there will be no more matching values later in the index once it scans to the first key that exceeds the requested maximum value. Similarly, if the definition of the index is compatible with the sort order requested by the client then the database can satisfy the sort for "free" as it scans the index.
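To make the bounded range scan concrete, here is a minimal sketch in plain JavaScript. It assumes a sorted array as a stand-in for the ordered leaf level of a B-tree; the names indexKeys and scanLte are made up for illustration and are not MongoDB APIs.

```javascript
// Sorted array of keys standing in for the ordered entries of an index.
const indexKeys = [3, 8, 15, 21, 34, 55, 89];

// Collect all keys <= max. Because the keys are ordered, the scan can stop
// at the first key that exceeds max: nothing later can match.
function scanLte(keys, max) {
  const matches = [];
  let examined = 0;
  for (const key of keys) {
    examined += 1;
    if (key > max) break;
    matches.push(key);
  }
  return { matches, examined };
}

const result = scanLte(indexKeys, 21);
console.log(result.matches, result.examined);
```

The scan touches one key past the last match and then stops, instead of reading every key in the index.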
For single field indexes as well as sorts on a single field, the direction specified in the index definition doesn't matter. This is because indexes can be traversed in either direction. So if your index is on { timestamp: 1 } but the sort is on { timestamp: -1 }, then the database can choose to traverse the index backwards to satisfy the requested sort.
Directionality becomes important when compound sorts are involved. If the index is defined as { score: 1, timestamp: 1 } then the database will either be less efficient or completely unable to use it to satisfy a sort on { score: 1, timestamp: -1 }. This is a direct consequence of the structure of the underlying B-tree data structure that the index is built on. More information can be found in the documentation.
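The rule can be sketched as a small plain-JavaScript check. This is a simplified model, not the query planner: it assumes a sort is supported only when its keys form a prefix of the index keys and its directions either all match the index or are all inverted. The function name indexSupportsSort is hypothetical.

```javascript
// Returns true when `sortSpec` can be read straight off an index with key
// pattern `indexSpec`, walking the index either forwards or backwards.
function indexSupportsSort(indexSpec, sortSpec) {
  const indexEntries = Object.entries(indexSpec);
  const sortEntries = Object.entries(sortSpec);
  if (sortEntries.length === 0 || sortEntries.length > indexEntries.length) {
    return false;
  }
  // The sort keys must be a prefix of the index keys, in the same order.
  const sameFields = sortEntries.every(([field], i) => indexEntries[i][0] === field);
  if (!sameFields) return false;

  // Forward traversal: every direction matches the index definition.
  const forward = sortEntries.every(([, dir], i) => dir === indexEntries[i][1]);
  // Backward traversal: every direction is the inverse of the index definition.
  const backward = sortEntries.every(([, dir], i) => dir === -indexEntries[i][1]);
  return forward || backward;
}

console.log(indexSupportsSort({ timestamp: 1 }, { timestamp: -1 })); // true
console.log(indexSupportsSort({ score: 1, timestamp: 1 }, { score: 1, timestamp: -1 })); // false
console.log(indexSupportsSort({ score: 1, timestamp: -1 }, { score: -1, timestamp: 1 })); // true
```

In this model, a sort on { score: 1, timestamp: -1 } needs an index defined with mixed directions, such as { score: 1, timestamp: -1 } or its mirror image { score: -1, timestamp: 1 }.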
You also wrote: "I tried indexing in MongoDB to make queries faster, but with limited knowledge of indexing I am not getting the expected output."
This is a much different and more specific question than the one that you have actually asked. If there is a particular query that you are trying to improve the performance of, then it may be beneficial to post a question with that query (and the associated .explain() output). That would allow us to advise on actionable steps to help you improve the performance of the operation (or understand why it cannot be done), as opposed to having broader discussions about index definitions which may or may not be relevant to your issue.

Related

mongodb - Multiple Compound Indexes involving a common field

We have a collection with millions of documents. This data is rendered in the UI for statistics, and hence the time to render is of key importance.
The queries to render the data involve the below fields:
field_a and field_t
field_b and field_t
field_c and field_t
As we are querying millions of documents, we want to use compound indexes to speed up the queries.
To do so, we can simply add 3 different compound indexes as below:
db.mycollection.createIndex( { "field_a": 1, "field_t": 1 } )
db.mycollection.createIndex( { "field_b": 1, "field_t": 1 } )
db.mycollection.createIndex( { "field_c": 1, "field_t": 1 } )
The ESR rule is respected when creating the indexes, as field_a, field_b and field_c are equality checks and field_t is a range check.
Please note that field_t is common in all the 3 indexes.
Instead of creating 3 different indexes, is there a better approach to this?
Does MongoDB provide a more efficient way to handle this scenario, where the same field is used in multiple compound indexes?
Better or more efficient in what regard?
Having the three indexes that you mentioned is the most efficient approach in terms of query performance. They will allow the database to process only the data that is relevant for each query and nothing else. Any other approach would reduce read efficiency (and speed) which may not be a good tradeoff.
Most databases, MongoDB included, typically use a single index when executing a query. This is mostly a consequence of how indexes work. Typically indexes use a B-tree like data structure, which is an ordered set of information. When following the ESR rule (placing equality conditions before range conditions), all of the information for a specific query is contained within a single bounded subtree in the index which can be directly traversed. It loses the ability to do this when the index is not structured in this way (including putting range keys first).
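A toy model in plain JavaScript shows why key order matters. It assumes made-up data (3 values of field_a, 5 values of field_t) and represents an index as the key tuples sorted by the given fields; buildIndex and keysSpanned are hypothetical helpers, and the "span" is only a rough stand-in for how much of the index a bounded scan has to walk.

```javascript
// Hypothetical data: 3 values of field_a times 5 values of field_t.
const docs = [];
for (const a of ["x", "y", "z"]) {
  for (let t = 1; t <= 5; t++) docs.push({ field_a: a, field_t: t });
}

// Build a toy "index": the key tuples sorted by the given fields, in order.
function buildIndex(documents, fields) {
  const compare = (l, r) => {
    for (const f of fields) {
      if (l[f] < r[f]) return -1;
      if (l[f] > r[f]) return 1;
    }
    return 0;
  };
  return [...documents].sort(compare);
}

// Number of index entries between the first and last match, inclusive.
function keysSpanned(index, matches) {
  const positions = index.map((k, i) => (matches(k) ? i : -1)).filter((i) => i >= 0);
  return positions.length === 0 ? 0 : positions[positions.length - 1] - positions[0] + 1;
}

// Equality on field_a, range on field_t.
const query = (k) => k.field_a === "y" && k.field_t >= 2 && k.field_t <= 4;

const equalityFirst = keysSpanned(buildIndex(docs, ["field_a", "field_t"]), query);
const rangeFirst = keysSpanned(buildIndex(docs, ["field_t", "field_a"]), query);
console.log({ equalityFirst, rangeFirst }); // { equalityFirst: 3, rangeFirst: 7 }
```

With the equality field first, the three matches sit next to each other in the index. With the range field first, the same three matches are spread across a span of seven entries, interleaved with keys for other field_a values.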
Other potential approaches using single field indexes would be things like:
Index intersection - where you create (in this case) 4 single field indexes and have the database use 2 for each query. MongoDB typically does not choose this approach very often as it results in scanning larger portions of the index when compared to the compound index approach above.
Using 1 single field index for each query - the database would end up retrieving documents to filter on the other field which could be quite inefficient depending on the selectivity of the other field.
While these may reduce the overall size of the collective indexes, they increase the cost (and decrease the efficiency) of executing the queries. Depending on what you are optimizing for, the approach you've outlined would be considered a best practice in terms of query efficiency.

MongoDB Find performance: single compound index VS two single field indexes

I'm looking for advice about which indexing strategy to use in MongoDB 3.4.
Let's suppose we have a people collection of documents with the following shape:
{
_id: 10,
name: "Bob",
age: 32,
profession: "Hacker"
}
Let's imagine that a web api to query the collection is exposed and that the only possible filters are by name or by age.
A sample call to the api will be something like: http://myAwesomeWebSite/people?name="Bob"&age=25
Such a call will be translated in the following query: db.people.find({name: "Bob", age: 25}).
To better clarify our scenario, consider that:
the field name was already in our documents and we already have an index on that field
we are going to add the new field age due to some new features of our application
the database is only accessible via the web api mentioned above and the most important requirement is to expose a super fast web api
all the calls to the web api will apply a filter on both the fields name and age (put another way, all the calls to the web api will have the same pattern, which is the one shown above)
That said, we have to decide which of the following indexes offer the best performance:
One compound index: {name: 1, age: 1}
Two single-field indexes: {name: 1} and {age: 1}
According to some simple tests, it seems that the single compound index is much more performant than the two single-field indexes.
When executing a single query via the mongo shell, the explain() method suggests that with a single compound index you can query the database nearly ten times faster than with two single-field indexes.
This difference seems to be less dramatic in a more realistic scenario, where instead of executing a single query via the mongo shell, multiple calls are made to two different urls of a nodejs web application. Both urls execute a query to the database and return the fetched data as a json array, one using a collection with the single compound index and the other using a collection with two single-field indexes (both collections having exactly the same documents).
In this test the single compound index still seems to be the best choice in terms of performance, but this time the difference is less marked.
According to the test results, we are considering using the single compound index approach.
Does anyone have experience with this topic? Are we missing any important consideration (maybe some disadvantage of big compound indexes)?
Given a plain standard query (with no limit() or sort() or anything fancy applied) that has a filter condition on two fields (as in name and age in your example), in order to find the resulting documents, MongoDB will either:
1. Do a full collection scan (read every document in the entire collection, parse the BSON, find the values in question, test them against the input and return/discard each document): This is very I/O intensive and hence slow.
2. Use one index that holds one of the fields (use the index tree to locate the relevant subset of documents, followed by a scan of them): Depending on your data distribution/index selectivity this can be very fast or barely provide any benefit (imagine an index on age in a dataset of millions of people between 30 and 40 years old: every lookup would still yield a huge number of documents).
3. Use two indexes that together contain both fields in question (load both indexes, perform key lookups, then calculate the intersection of the results): Again, depending on your data distribution, this may or may not give you great(er) performance. It should, however, in most cases be faster than #2. I would, however, be surprised if it was really 10x slower than #4 (as you mentioned).
4. Use a compound index (two subsequent key lookups immediately lead to the required documents): This will be the fastest option of all given that it requires the fewest and cheapest operations to get to the right documents. In order to ensure the greatest level of reuse (not performance, which won't be affected by this) you should in general start with the most selective field first, so in your case probably name and not age given that a lot of people will have the same age (so low selectivity) compared to name (higher selectivity). But that choice also depends on your concrete scenario and the queries you intend to run against your database. There is a pretty good article on the web about how to best define a compound index taking various aspects of your specific situation into account: https://emptysqua.re/blog/optimizing-mongodb-compound-indexes
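The index-based options above can be compared with a rough plain-JavaScript model. It assumes a handful of made-up people documents and uses a Map from key to _id list as a stand-in for an index; buildIndex and lookup are hypothetical helpers, and the counts only illustrate relative work, not real timings.

```javascript
// Hypothetical sample data.
const people = [
  { _id: 1, name: "Bob", age: 25 },
  { _id: 2, name: "Bob", age: 32 },
  { _id: 3, name: "Alice", age: 25 },
  { _id: 4, name: "Carol", age: 25 },
  { _id: 5, name: "Bob", age: 41 },
  { _id: 6, name: "Dave", age: 25 },
];

// Toy index: maps a key built from the indexed fields to the matching _ids.
function buildIndex(docs, fields) {
  const index = new Map();
  for (const doc of docs) {
    const key = JSON.stringify(fields.map((f) => doc[f]));
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc._id);
  }
  return index;
}

function lookup(index, values) {
  return index.get(JSON.stringify(values)) || [];
}

const byName = buildIndex(people, ["name"]);
const byAge = buildIndex(people, ["age"]);
const byNameAge = buildIndex(people, ["name", "age"]);

// One single-field index: every "Bob" must be fetched and filtered on age.
const fetchedViaName = lookup(byName, ["Bob"]).length;

// Index intersection: read the entries from both indexes, then intersect.
const ageIds = new Set(lookup(byAge, [25]));
const intersected = lookup(byName, ["Bob"]).filter((id) => ageIds.has(id));
const keysReadForIntersection = lookup(byName, ["Bob"]).length + ageIds.size;

// Compound index: a single lookup lands directly on the matches.
const viaCompound = lookup(byNameAge, ["Bob", 25]);

console.log({ fetchedViaName, keysReadForIntersection, intersected, viaCompound });
```

In this model the single-field index fetches three documents to return one, the intersection reads seven index entries to return one, and the compound index reads only the entry it needs.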
Other aspects to consider are: Index updates come at a certain price. However, if all you care about is raw read speed and you only have a few updates every now and again, then you should go for more/bigger indexes.
And last but not least (!) the well over-used bottom line advice: Profile the hell out of your system using real data and perhaps even realistic load scenarios. And also keep measuring as your data/system changes over time.
Additional reads:
https://docs.mongodb.com/manual/core/query-optimization/index.html
https://dba.stackexchange.com/questions/158240/mongodb-index-intersection-does-not-eliminate-the-need-for-creating-compound-in
Index intersection vs. compound index?
mongodb compund index vs. index intersect
How does the order of compound indexes matter in MongoDB performance-wise?
In MongoDB, I am using a large query, how I will create compound index or single index, So My response time boost up

What does nscannedObjects = 0 actually mean?

As far as I understand, the nscannedObjects entry in the explain() output is the number of documents that MongoDB needed to fetch from disk.
My question is: when this value is 0, what does it actually mean besides the explanation above? Does MongoDB keep a cache with some documents stored there?
nscannedObjects=0 means that there was no fetching or filtering to satisfy your query, the query was resolved solely based on indexes. So for example if you were to query for {_id:10} and there were no matching documents you would get nscannedObjects=0.
It has nothing to do with the data being in memory, there is no such distinction with the query plan.
Note that in MongoDB 3.0 and later nscanned and nscannedObjects are now called totalKeysExamined and totalDocsExamined, which is a little more self-explanatory.
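As a rough illustration of how to read these two counters together, here is a plain-JavaScript sketch. The three objects are hand-written to resemble the executionStats section of explain() output in MongoDB 3.0+; the numbers are made up, and describeStats is a hypothetical helper based on a simple heuristic, not something the server provides.

```javascript
// Hand-written, made-up objects shaped like explain()'s executionStats.
const coveredStats = { nReturned: 3, totalKeysExamined: 3, totalDocsExamined: 0 };
const fetchStats = { nReturned: 3, totalKeysExamined: 3, totalDocsExamined: 3 };
const scanStats = { nReturned: 3, totalKeysExamined: 0, totalDocsExamined: 1000 };

// Heuristic reading of the two counters. The stage names in the winning plan
// (for example IXSCAN or COLLSCAN) remain the authoritative source.
function describeStats(stats) {
  if (stats.totalDocsExamined === 0) return "no documents fetched: resolved from index keys";
  if (stats.totalKeysExamined === 0) return "no index keys read: collection scan";
  return "index scan followed by document fetch";
}

for (const stats of [coveredStats, fetchStats, scanStats]) {
  console.log(describeStats(stats));
}
```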
Mongo is a document database, which means that it can interpret the structure of the stored documents (unlike for example key-value stores).
One particular advantage of that approach is that you can build indices on the documents in the database.
An index is a data structure (usually a variant of a B-tree) which allows for fast searching of documents based on some of their attributes (for example id (!= _id) or some other distinctive feature). Indexes are usually kept in memory, allowing very fast access to them.
When you search for documents based on indexed attributes (let's say id > 50), then mongo doesn't need to fetch the document from memory/disk/whatever - it can see which documents match the criteria based solely on the index (note that fetching something from disk is several orders of magnitude slower than a memory lookup). The only time it actually goes to the disk is when you need to fetch the document for further processing (which is not covered by the statistic you cited).
Indices are crucial to achieving high performance, but they also have drawbacks (for example, a rarely used index can slow down inserts and not be worth it, because the index has to be updated after each insertion).

Choosing the right database index type

I have a very simple Mongo database for a personal nodejs project. It's basically just records of registered users.
My most important field is an alpha-numeric string (let's call it user_id and assume it can't be only numeric) of about 15 to 20 characters.
Now the most important operation is checking whether the user exists at all or not. I do this by querying db.collection.find({"user_id": "testuser-123"})
If no record is returned, I save the user along with some other not so important data like first name, last name and signup date.
Now I obviously want to make user_id an index.
I read the Indexing Tutorials on the official MongoDB Manual.
First I tried setting a text index because I thought that would fit the alpha-numeric field. I also tried setting language:none. But it turned out that my query returned in ~12ms instead of 6ms without indexing.
Then I tried just setting an ordered index like {user_id: 1}, but I haven't seen any difference (is it only working for numeric values?).
Can anyone recommend me the best type of index for this case or quickest query to check if the user exists? Or maybe is MongoDB not the best match for this?
Some random thoughts first:
A text index is used to help full text search. Given your description this is not what is needed here, as, if I understand it well, you need to use an exact match of the whole field.
Without any index, MongoDB will use a linear search. Using big O notation, this is an O(n) operation. With an (ordered) index, the search is performed in O(log(n)). That means that an index will dramatically speed up queries when you have many documents. But you will not necessarily see any improvement if you have a small number of documents. In that case, the O(log(n)) index lookup can even be slower than the O(n) scan. Some database management systems don't even bother using the index if the optimizer estimates that it will not provide enough benefit. I don't know if MongoDB does that, though.
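The difference can be sketched in plain JavaScript by counting comparisons. This assumes a sorted array of made-up, zero-padded user ids as a stand-in for an ordered index; linearSearch and binarySearch are illustrative helpers, not MongoDB internals. It also shows that ordering works for strings, not only for numbers.

```javascript
// O(n): check every value until the target is found.
function linearSearch(sorted, target) {
  let comparisons = 0;
  for (const value of sorted) {
    comparisons += 1;
    if (value === target) return { found: true, comparisons };
  }
  return { found: false, comparisons };
}

// O(log n): halve the search range on every step, relying on the ordering.
function binarySearch(sorted, target) {
  let lo = 0;
  let hi = sorted.length - 1;
  let comparisons = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    comparisons += 1;
    if (sorted[mid] === target) return { found: true, comparisons };
    if (sorted[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return { found: false, comparisons };
}

// Hypothetical user ids, zero-padded so string order matches numeric order.
const ids = Array.from({ length: 1000 }, (_, i) => `user-${String(i).padStart(4, "0")}`);
const target = "user-0999";

console.log(linearSearch(ids, target).comparisons); // 1000
console.log(binarySearch(ids, target).comparisons); // 10
```

With only a handful of entries both counts are tiny, which is why an index may show no measurable benefit on a very small collection.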
Given your use case, I think the proper index is a unique index. This is an ordered index that also prevents the insertion of two documents with the same value for the indexed field.
In your application, do not test before insert. In a real application, this could lead to a race condition when you have concurrent inserts. If you use a unique index, just try to insert, and be prepared to gracefully handle an error caused by a duplicate key.
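The insert-first pattern might look like the following plain-JavaScript sketch. It assumes an in-memory class, UsersCollection, as a stand-in for a collection with a unique index on user_id, and a hypothetical registerUser helper; the stand-in mimics only the duplicate key error code (11000) that MongoDB reports, not the real driver API.

```javascript
// In-memory stand-in for a collection with a unique index on user_id.
class UsersCollection {
  constructor() {
    this.byUserId = new Map();
  }

  insertOne(doc) {
    if (this.byUserId.has(doc.user_id)) {
      const err = new Error(`E11000 duplicate key error: user_id "${doc.user_id}"`);
      err.code = 11000; // MongoDB's duplicate key error code
      throw err;
    }
    this.byUserId.set(doc.user_id, doc);
  }
}

// Insert first, and treat a duplicate key error as "user already exists".
function registerUser(users, doc) {
  try {
    users.insertOne(doc);
    return "created";
  } catch (err) {
    if (err.code === 11000) return "already exists";
    throw err; // anything else is a genuine failure
  }
}

const users = new UsersCollection();
console.log(registerUser(users, { user_id: "testuser-123", first_name: "Test" })); // created
console.log(registerUser(users, { user_id: "testuser-123", first_name: "Again" })); // already exists
```

Against a real database, the unique index itself would be created with db.collection.createIndex({ user_id: 1 }, { unique: true }), so that the server, not the application, decides which of two concurrent inserts wins.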

Skipping the first term of a compound index by using hint()

Suppose I have a Mongo collection with fields a and b. I've populated this collection with {a:'a', b : index } where index increases iteratively from 0 to 1000.
I know this is very, very wrong, but can't explain (no pun intended) why:
collection.find({i:{$gt:500}}).explain() confirms that the index was not used (I can see that it scanned all 1,000 documents in the collection).
Somehow forcing Mongo to use the index seems to work though:
collection.find({i:{$gt:500}}).hint({a:1,i:1}).explain()
Edit
The Mongo documentation is very clear that it will only use compound indexes if one of your query terms matches the first term of the compound index. In this case, using hint, it appears that Mongo used the compound index {a:1,i:1} even though the query terms do NOT include a. Is this true?
The interesting part about the way MongoDB performs queries is that it actually may run multiple queries in parallel to determine what is the best plan. It may have chosen to not use the index due to other experimenting you've done from the shell, or even when you added the data and whether it was in memory, etc. (or a few other factors). Looking at the performance numbers, it's not reporting that using the index was actually any faster than not (although you shouldn't put much stock in those numbers generally). In this case, the data set is really small.
But, more importantly, according to the MongoDB docs, the output from the hinted run also suggests that the query wasn't covered entirely by the index (indexOnly=false).
That's because your index is a:1, i:1, yet the query is for i. Compound indexes only support searches based on any prefix of the indexed fields (meaning they must be in the order they were specified).
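The prefix rule can be sketched as a small plain-JavaScript helper. This is a simplified model with a hypothetical function name, usablePrefix: it returns the leading index fields that the query constrains, and an empty result means the query cannot narrow the index scan at all.

```javascript
// Returns the leading fields of the index that also appear in the query.
// The walk stops at the first index field the query does not constrain.
function usablePrefix(indexFields, queryFields) {
  const prefix = [];
  for (const field of indexFields) {
    if (!queryFields.includes(field)) break;
    prefix.push(field);
  }
  return prefix;
}

console.log(usablePrefix(["a", "i"], ["i"])); // []
console.log(usablePrefix(["a", "i"], ["a"])); // [ 'a' ]
console.log(usablePrefix(["a", "i"], ["i", "a"])); // [ 'a', 'i' ]
```

For the query on i alone the usable prefix is empty, which matches the observation above: a hint can force the index to be used, but without a condition on a the scan cannot be narrowed to a small part of it.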
http://docs.mongodb.org/manual/core/read-operations/#query-optimization
FYI: Use the verbose option to see a report of all plans that were considered for the find().