MongoDB Indexing: Multiple single-field vs single compound? - mongodb

I have a collection of geospatial+temporal data with a few additional properties, which I'll be displaying on a map. The collection has a few million documents at this point, and will grow over time.
Each document has the following fields:
Location: [geojson object]
Date: [Date object]
ZoomLevel: [int32]
EntryType: [ObjectID]
I need to be able to rapidly query this collection by any combination of Location (generally a $geoWithin query), Date (generally $gte/$lt), ZoomLevel and EntryType.
What I'm wondering is: Should I make a compound index containing all four fields, or a single index for each field, or some combination thereof? I read in the MongoDB docs the following:
For a compound index that includes a 2dsphere index key along with
keys of other types, only the 2dsphere index field determines whether
the index references a document.
...Which sounds like it means having the 2dsphere index for Location be part of a compound index might be pointless?
Any clarity on this would be much appreciated.

For your use case you will need to use multiple indexes.
If you create one compound index covering all four fields, a query can only use it efficiently when it filters on a leading prefix of the index keys, starting with the first field.
Since you need to query by any combination of these four fields, I suggest you analyze your data access patterns, see exactly which filters you actually use, and create a specific index for each filter or group of filters.
EDIT: For your question about 2dsphere, it does make sense to make them compound.
That note refers to sparseness. A sparse index references only documents that contain the indexed fields; for a compound index with a 2dsphere key, the only documents left out are the ones that do not contain the GeoJSON object or point array in the 2dsphere field.
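As a rough sketch of the compound option, the definitions below pair the 2dsphere key with Date and add a second index for non-spatial filters. Only the field names come from the question; the collection name "entries", the choice of a second index, and the sample polygon and dates are assumptions for illustration. The specs are plain JavaScript objects, so the mongosh calls are left as comments.

```javascript
// Index specs as plain objects. Field names are from the question;
// which combinations to index is an assumption about the access pattern.
const geoDateIndex = { Location: "2dsphere", Date: 1 };
const typeZoomDateIndex = { EntryType: 1, ZoomLevel: 1, Date: 1 };

// A sample filter: documents inside a made-up polygon, within a date range.
const filter = {
  Location: {
    $geoWithin: {
      $geometry: {
        type: "Polygon",
        // One closed ring of [longitude, latitude] pairs.
        coordinates: [[[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]]],
      },
    },
  },
  Date: {
    $gte: new Date("2020-01-01T00:00:00Z"),
    $lt: new Date("2020-02-01T00:00:00Z"),
  },
};

// In mongosh, against a hypothetical "entries" collection:
//   db.entries.createIndex(geoDateIndex);
//   db.entries.createIndex(typeZoomDateIndex);
//   db.entries.find(filter).explain("executionStats");
```

Running explain on the real queries is the way to confirm which of these indexes, if any, the planner actually picks.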

Related

Fundamental misunderstanding of MongoDB indices

So, I read the following definition of indexes in the MongoDB docs.
Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB must perform a collection scan, i.e. scan every document in a collection, to select those documents that match the query statement. If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect.
Indexes are special data structures that store a small portion of the
collection’s data set in an easy to traverse form. The index stores
the value of a specific field or set of fields, ordered by the value
of the field. The ordering of the index entries supports efficient
equality matches and range-based query operations. In addition,
MongoDB can return sorted results by using the ordering in the index.
I have a sample database with a collection called pets. Pets have the following structure.
{
"_id": ObjectId("123abc123abc"),
"name": "My pet's name"
}
I created an index on the name field using the following code.
db.pets.createIndex({"name":1})
What I expect is that the documents in the collection, pets, will be indexed in ascending order based on the name field during queries. The result of this index can potentially reduce the overall query time, especially if a query is strategically structured with available indices in mind. Under that assumption, the following query should return all pets sorted by name in ascending order, but it doesn't.
db.pets.find({},{"_id":0})
Instead, it returns the pets in the order that they were inserted. My conclusion is that I lack a fundamental understanding of how indices work. Can someone please help me to understand?
Yes, this is a misunderstanding of how indexes work.
Indexes don't change the output of a query, only the way the query is processed by the database engine. So db.pets.find({},{"_id":0}) will return the documents in natural order whether or not an index exists.
Indexes are used only when your query makes use of them. Thus,
db.pets.find({name : "My pet's name"},{"_id":0}) and db.pets.find({}, {_id : 0}).sort({name : 1}) will use the {name : 1} index.
You should run explain on your queries to check whether indexes are being used.
You may want to refer to the documentation on how indexes work.
https://docs.mongodb.com/manual/indexes/
https://docs.mongodb.com/manual/tutorial/sort-results-with-indexes/
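To make the distinction concrete, here is a toy in-memory model, not MongoDB's actual implementation: the collection keeps documents in insertion order, and the index is a separate sorted list of entries that point back at them. The sample pet names and the function names are made up for illustration.

```javascript
// Documents in the order they were inserted (the "natural order" here).
const pets = [{ name: "Rex" }, { name: "Ada" }, { name: "Milo" }];

// A toy { name: 1 } index: entries sorted by name, each pointing at a document.
const nameIndex = pets
  .map((doc, position) => ({ key: doc.name, position }))
  .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));

// Models find({}) without a sort: the index is never consulted.
function findAll() {
  return pets.map((doc) => doc.name);
}

// Models find({}).sort({ name: 1 }): walk the index instead of sorting.
function findAllSortedByName() {
  return nameIndex.map((entry) => pets[entry.position].name);
}

console.log(findAll());             // [ 'Rex', 'Ada', 'Milo' ]
console.log(findAllSortedByName()); // [ 'Ada', 'Milo', 'Rex' ]
```

Creating the index changes nothing about the first call; it only gives the second one a cheaper way to produce its order.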

DB Compound indexing best practices Mongo DB

How costly is it to index some fields in MongoDB?
I have a collection where I want uniqueness across a combination of two fields. Everywhere I searched, people suggested a compound index with unique set to true. What I was doing instead was appending both fields into a single field1_field2 key, so that field2 is always unique for a given field1 (with some application logic added), because I thought indexing was costly.
Also, since the MongoDB documentation advises against custom object IDs such as auto-incrementing numbers, I ended up giving big numbers to models like Classes, Students, etc. (where I could easily have used 1, 2, 3 in SQLite). I didn't think of adding a new field for numbering and indexing that field for querying.
What are the best practices for production?
The advantage of a compound index over your own concatenated indexed field is that a compound index can sort results faster than a single index on the concatenated value. It also keeps every document smaller, since no extra field has to be stored.
In your case, if you want to get the documents sorted with values in field1 ascending and in field2 descending, it is better to use a compound index. If you only want to get the documents that have some specific value contained in field1_field2, it does not really matter if you use compound indexes or a regular indexed field.
However, if you already have field1 and field2 in separate fields in the documents, and you also have a field containing field1_field2, it could be better to use a compound index on field1 and field2, and simply delete the field containing field1_field2. This could lower the size of every document and ultimately reduce the size of your database.
Regarding the cost of the indexing, you almost have to index field1_field2 if you want to go down that route anyways. Queries based on unindexed fields in MongoDB are really slow. And it does not take much more time adding a document to a database when the document has an indexed field (we're talking 1 millisecond or so). Note that adding an index on many existing documents can take a few minutes. This is why you usually plan the indexing strategy before adding any documents.
TL;DR:
If you have limited disk space or need to sort the results, go with a compound index and delete field1_field2. Otherwise, use field1_field2, but it has to be indexed!
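One detail the concatenated-key approach has to get right is the separator: if the field values can contain it, two different pairs can produce the same key. The toy check below is an in-memory stand-in, not MongoDB code; it contrasts the hand-rolled key with treating the pair as a tuple, which is closer to what a unique compound index enforces. The class name, the sample values and the "students" collection in the closing comment are hypothetical.

```javascript
// Toy stand-in for a unique compound index on { field1: 1, field2: 1 }.
class UniquePairIndex {
  constructor() {
    this.seen = new Set();
  }
  // Returns true if the pair was new, false if it would be a duplicate.
  insert(field1, field2) {
    const key = JSON.stringify([field1, field2]); // keeps the two values distinct
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}

// The hand-rolled key from the question: field1 + "_" + field2.
const concatKey = (field1, field2) => `${field1}_${field2}`;

const pairIndex = new UniquePairIndex();
console.log(pairIndex.insert("a_b", "c")); // true
console.log(pairIndex.insert("a", "b_c")); // true: a different pair
console.log(pairIndex.insert("a", "b_c")); // false: duplicate pair
console.log(concatKey("a_b", "c") === concatKey("a", "b_c")); // true: the keys collide

// The real thing in mongosh, for a hypothetical "students" collection:
//   db.students.createIndex({ field1: 1, field2: 1 }, { unique: true });
```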

Mongoose indexes at both field and schema levels

I understand that indexing can be a valuable tool for quickly retrieving data, if implemented properly. I would like to be able to scan my documents for a certain field value or a combination of field values.
There are two fields I would be indexing (category, tags). Category is a string and tags is an array. I need to be able to query for items in a specific category and/or items that contain a specific tag.
Here are three examples:
Show me all of the documents in the category: "cars"
Show me all of the documents that contain the tag: "electric"
Show me all of the documents in the "cars" category that contain the "electric" tag
Will a schema level index for both fields suffice for all three scenarios?
docSchema.index({category:1, tags:1});
Or do I also need to define them at the field level, to support the scenarios when I am only searching through a single field?
docSchema = mongoose.Schema({
category: {
type: String,
index: true
},
tags: {
type: [String],
index: true
}
});
docSchema.index({category:1, tags:1}); is a compound index.
This compound index supports the scenarios 1 and 3:
-> Show me all of the documents in the category: "cars"
-> Show me all of the documents in the "cars" category that contain the "electric" tag
To support scenario 2 you will need to define an additional single-field index on the tags field.
docSchema.index({tags:1});
A compound index supports queries that involve all fields in the compound index as well as queries that involve a prefix of the compound index. In this case your compound index supports queries involving both categories and tags as well as queries involving just categories.
To better understand the logic please take a look at the Compound Indexes articles on MongoDB documentation site. Pay special attention to the section that talks about Prefixes.
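The prefix rule can be written down as a small check. This is a deliberate simplification: it only asks whether the queried fields are exactly a leading run of the index keys, and ignores everything else the query planner considers. The function name is made up for illustration.

```javascript
// True when the queried fields are exactly the first N keys of the index.
function matchesIndexPrefix(indexSpec, queryFields) {
  const keys = Object.keys(indexSpec); // key order is the index order
  const wanted = new Set(queryFields);
  if (wanted.size === 0 || wanted.size > keys.length) return false;
  return keys.slice(0, wanted.size).every((key) => wanted.has(key));
}

const compound = { category: 1, tags: 1 };

console.log(matchesIndexPrefix(compound, ["category"]));         // true  (scenario 1)
console.log(matchesIndexPrefix(compound, ["tags"]));             // false (scenario 2)
console.log(matchesIndexPrefix(compound, ["category", "tags"])); // true  (scenario 3)
```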
You need a single-field index on category and a multikey index on tags. You might be tempted to use a compound index instead of one of them, but it is not mandatory if you are using MongoDB >= 2.6, as it has a nice feature called index intersection.
Show me all of the documents in the category: "cars"
Show me all of the documents that contain the tag: "electric"
Show me all of the documents in the "cars" category that contain the "electric" tag
(1) will use the index on category (incl. any index having category as a prefix)
(2) will use the index on tags (incl. any index having tags as a prefix)
(3) will use the index on tags, or the index on category, or the index intersection of both of them (depending on the choice of the query planner).
As a reference, there is a nice discussion about index intersection in the MongoDB blog. Worth reading the entire article. But to quote the conclusion, mostly comparing index intersection to compound indexes:
To be clear, compound indexing will ALWAYS be more performant [than index intersection] IF you know what you are going to be querying on and can create one ahead of time. Furthermore, if your working set is entirely in memory, then you will not reap any of the benefits of Index Intersection as it is primarily based on reducing IO. But in a more ad-hoc case where one cannot predict the shape of the queries and the working set is much larger than available memory, index intersection will automatically take over and choose the most performant path.
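Conceptually, index intersection means scanning two indexes separately and keeping only the documents that both scans return. The sketch below models that with plain sets of document ids; it illustrates the idea only, not how the server implements it, and the sample data and names are invented.

```javascript
// Toy single-field indexes: value -> set of matching document ids.
const categoryIndex = new Map([
  ["cars", new Set([1, 2, 3])],
  ["bikes", new Set([4])],
]);
const tagsIndex = new Map([
  ["electric", new Set([2, 3, 4])],
  ["vintage", new Set([1])],
]);

// Keep the ids that appear in both scans.
function intersect(a, b) {
  return [...a].filter((id) => b.has(id));
}

const electricCars = intersect(
  categoryIndex.get("cars") ?? new Set(),
  tagsIndex.get("electric") ?? new Set()
);
console.log(electricCars); // [ 2, 3 ]
```

A compound index on { category: 1, tags: 1 } would answer the same query with a single scan, which is the point of the quoted conclusion.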

What does the digit "1" mean when creating indexes in mongodb

I am new to mongodb and want to make indexes for a specific collection. I have seen people use a digit "1" in front of the field name when they want to create an index. for example:
db.users.ensureIndex({user_name: 1})
now I want to know what does this digit mean and is it necessary to use it?
It's the type of index. MongoDB supports different kinds of indexes. Ascending and descending keys can be combined freely in a compound index; the other types can take part in a compound index only with restrictions.
1: Ascending B-tree index.
-1: Descending B-tree index. Very similar to the ascending index, but the difference can matter for the behavior of compound indexes.
"hashed": An index on a hash of the field's value. Supports lookup by exact value and is mainly used for hashed sharding, but it is not usable for inexact queries ($gt, $regex or similar).
"text": A text index designed for searching for words in strings with natural language.
"2d": A geospatial index on a flat plane
"2dsphere": A geospatial index on a sphere
For more information, see the documentation of index types.
It defines the index type on that specific field. For example, the value 1 creates an index in ascending order, while the value -1 creates the index in descending order.
For more information, see the Manual
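For a single-field index the direction rarely matters, because the index can be walked in either direction. In a compound index the directions define the order of the entries, which the comparator below tries to show. It is a sketch for 1 and -1 keys only; the score field and the sample data are made up.

```javascript
// Build a comparator that orders documents the way an index with
// ascending (1) and descending (-1) keys would order its entries.
function compareByIndexSpec(indexSpec) {
  const keys = Object.entries(indexSpec);
  return (a, b) => {
    for (const [field, direction] of keys) {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
    }
    return 0;
  };
}

const users = [
  { user_name: "bob", score: 5 },
  { user_name: "amy", score: 7 },
  { user_name: "amy", score: 9 },
];

const ordered = [...users].sort(compareByIndexSpec({ user_name: 1, score: -1 }));
console.log(ordered.map((u) => `${u.user_name}:${u.score}`));
// [ 'amy:9', 'amy:7', 'bob:5' ]
```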

How does mongodb index lists

For example: If I had a db collection called Stores, and each store document has a list of the items they sell, and stores generally share items, then how would mongodb build an index on that?
Would it build a btree index on all possible items and then on each leaf of that tree (each item) will reference the documents which contain it?
Background:
I'm trying to perform queries like this using an index:
db.store.find({merchandise:{$exists:true}}) // where 'merchandise' is a list
db.store.find()[merchandise].count()
would an index on 'merchandise' help me?
If not, is my only option creating a separate meta field on 'merchandise' size, and index that?
Schema:
{ _id: 123456,
name: "Macys",
merchandise: [ 248651234564, 54862101248, 12450184, 1256001456 ]
}
From your sample document, an index built on merchandise will be a multikey index, with an entry for every item in the array. See the Multikey Indexes section of the documentation.
If merchandise were an array of subdocuments, an index on merchandise would treat each whole subdocument as a key; to index a field inside the subdocuments you would index its dotted path instead. With the index you can make queries like
db.store.find({"merchandise": 248651234564}) and it will retrieve all documents having merchandise 248651234564
To get the merchandise count, you can read the length of the merchandise array of a single document, e.g. db.store.find()[index].merchandise.length. So creating a separate field for the merchandise size and indexing it is a feasible option if you want to run queries based on merchandise size.
Hope this helps
If you index a field that contains an array, MongoDB indexes each value in the array separately, in a multikey index. When you have 4 values inside an array, each will act as a key in the index and point to the document(s) that contain it.
You can also use multikey indexes to index fields within objects embedded in arrays. That means that, for an array of embedded documents, you can index a specific field of each embedded document. For example: { "stuffs.thing": 1 }.
Read more about Multikey Indexes
Whether you need these indexes would depend on:
How many queries rely on that specific field?
How many updates, inserts hit that specific field (array)?
How many items will that array contain?
...
Remember that indexes slow writes as they need to be updated as well. I'd consider an explain on my queries to measure performance.
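The multikey behaviour described above roughly matches the asker's guess: one entry per array element, each pointing back to the documents that contain it. Here is a toy version that uses an in-memory Map rather than a B-tree. The second and third stores are invented sample data, and documents without the field are simply skipped, which is a simplification.

```javascript
// Build a toy multikey index: one key per array element,
// each key mapped to the _ids of the documents containing it.
function buildMultikeyIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    // A Set drops repeated values so each document is listed once per key.
    for (const value of new Set(doc[field] ?? [])) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc._id);
    }
  }
  return index;
}

const stores = [
  { _id: 1, name: "Macys", merchandise: [248651234564, 12450184] },
  { _id: 2, name: "Other store", merchandise: [12450184, 1256001456] },
  { _id: 3, name: "No list" },
];

const merchandiseIndex = buildMultikeyIndex(stores, "merchandise");
console.log(merchandiseIndex.get(12450184));     // [ 1, 2 ]
console.log(merchandiseIndex.get(248651234564)); // [ 1 ]
```

Every insert or update that touches the array has to maintain one entry per element, which is why the array size and write volume in the list above matter.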