How does mongodb index lists - mongodb

For example: If I had a db collection called Stores, and each store document has a list of the items they sell, and stores generally share items, then how would mongodb build an index on that?
Would it build a btree index on all possible items and then on each leaf of that tree (each item) will reference the documents which contain it?
Background:
I'm trying to perform queries like this using an index:
db.store.find({merchandise:{$exists:true}}) // where 'merchandise' is a list
db.store.find()[merchandise].count()
would an index on 'merchandise' help me?
If not, is my only option creating a separate meta field on 'merchandise' size, and index that?
Schema:
{ _id: 123456,
name: "Macys",
merchandise: [ 248651234564, 54862101248, 12450184, 1256001456 ]
}

Based on your sample document, an index on merchandise will be a multikey index: it holds a separate entry for every item in the array. See the Multikey Indexes section of the MongoDB documentation.
If merchandise is an array of subdocuments, an index on merchandise indexes each subdocument as a whole value; to index a field inside the subdocuments, create the index on the dotted path to that field. With the index you can run queries like
db.store.find({ merchandise: 248651234564 }) and it will retrieve every document whose merchandise array contains 248651234564.
To count merchandise, you can only read the size of the merchandise field one document at a time, for example db.store.find()[index].merchandise.length. So creating a separate field that holds the merchandise size and indexing it is a feasible option if you want to run queries based on merchandise size.
Hope this helps

If you index a field that contains an array, MongoDB indexes each value in the array separately, in a multikey index. When an array holds 4 values, each one becomes a key in the index and points back to the document(s) that contain it.
You can also use multikey indexes to index fields within objects embedded in arrays. That means you can index one specific field of each embedded document in your array. For example: { "stuffs.thing": 1 }.
Read more about Multikey Indexes
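To make the "one key per array value" idea concrete, here is a toy model in plain JavaScript. It is only a sketch of the concept, not MongoDB's actual storage format: the second store ("Sears") and the helper names buildMultikeyIndex and storesSelling are made up for illustration.

```javascript
// Toy model of a multikey index: one index key per distinct array value,
// each key pointing back to the _id of every document containing it.
// Illustration only; the real index is a B-tree kept by the storage engine.
const stores = [
  { _id: 1, name: "Macys", merchandise: [248651234564, 54862101248, 12450184] },
  { _id: 2, name: "Sears", merchandise: [12450184, 1256001456] }, // hypothetical second store
];

function buildMultikeyIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    const value = doc[field];
    const keys = Array.isArray(value) ? value : [value];
    // De-duplicate so a repeated array element yields one entry per document.
    for (const key of new Set(keys)) {
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(doc._id);
    }
  }
  return index;
}

const merchandiseIndex = buildMultikeyIndex(stores, "merchandise");

// Conceptually what db.store.find({ merchandise: item }) can do with the index:
// jump straight to the key instead of scanning every document.
const storesSelling = (item) => merchandiseIndex.get(item) || [];

console.log(storesSelling(12450184));
```

Note that a shared item such as 12450184 ends up as a single key that references both stores, which is the structure the original question was guessing at.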
Whether you need these indexes would depend on:
How many queries rely on that specific field?
How many updates, inserts hit that specific field (array)?
How many items will that array contain?
...
Remember that indexes slow writes as they need to be updated as well. I'd consider an explain on my queries to measure performance.

Related

How to index and sorting with Pagination using custom field in MongoDB ex: name instead of id

https://scalegrid.io/blog/fast-paging-with-mongodb/
Example : {
_id,
name,
company,
state
}
I've gone through the two scenarios explained in the above link, and it says that sorting by object id performs well when retrieving and sorting the results. Instead of the default sorting by object id, I want to index my own custom fields "name" and "company", and sort and paginate on these two fields (both hold string values).
I am not sure how to use gt or lt for a name; I'm currently blocked on how to provide pagination when a user sorts by name.
How do I index and paginate on two fields?
The answer to your question is
db.Example.createIndex( { name: 1, company: 1 } )
As for pagination, the explanation in the link you shared in your question is good enough. Ex
db.Example.find({ name: "John", company: "Acme" }).limit(10);
For sorting
db.Example.find().sort({ name: 1, company: 1 }).skip(userPassedOffset).limit(userPassedPageSize);
If the user requests documents 21-30 after sorting on name and then company, both in ascending order
db.Example.find().sort({ name: 1, company: 1 }).skip(20).limit(10);
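On the "how do I use gt or lt for a name" part: $gt and $lt also work on strings, so range-based paging can be done on name and company as well. Below is a minimal sketch of building the filter for the next page. The helper name nextPageFilter is hypothetical, and adding _id as a tie-breaker (with a matching { name: 1, company: 1, _id: 1 } index) is my assumption, needed because name and company are not unique.

```javascript
// Builds a filter selecting everything that sorts after `last` under
// { name: 1, company: 1, _id: 1 }. `last` is the final document of the
// previous page; _id breaks ties so the ordering is total.
function nextPageFilter(last) {
  if (!last) return {}; // first page: no lower bound
  return {
    $or: [
      { name: { $gt: last.name } },
      { name: last.name, company: { $gt: last.company } },
      { name: last.name, company: last.company, _id: { $gt: last._id } },
    ],
  };
}

const sortSpec = { name: 1, company: 1, _id: 1 };
const filter = nextPageFilter({ _id: 42, name: "John", company: "Acme" });
console.log(JSON.stringify(filter));

// In the mongo shell this would be used roughly as:
//   db.Example.find(filter).sort(sortSpec).limit(10)
```

Compared with skip(), this avoids walking over all the skipped documents on deep pages, at the cost of only supporting "next page" style navigation.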
For a basic understanding of indexing in MongoDB
Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB must perform a collection scan, i.e. scan every document in a collection, to select those documents that match the query statement. If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect.
Indexes are special data structures, that store a small portion of the collection’s data set in an easy to traverse form. The index stores the value of a specific field or set of fields, ordered by the value of the field.
Default _id Index
MongoDB creates a unique index on the _id field during the creation of a collection. The _id index prevents clients from inserting two documents with the same value for the _id field. You cannot drop this index on the _id field.
Create an Index
Syntax to execute on Mongo Shell
db.collection.createIndex( <key and index type specification>, <options> )
Ex:
db.collection.createIndex( { name: -1 } )
Use 1 for ascending and -1 for descending.
createIndex() only creates an index if an index of the same specification does not already exist.
Index Types
MongoDB provides different index types to support specific types of data and queries, but I would like to mention two important ones.
1. Single Field
In addition to the MongoDB-defined _id index, MongoDB supports the creation of user-defined ascending/descending indexes on a single field of a document.
2. Compound Index
MongoDB also supports user-defined indexes on multiple fields, i.e. compound indexes.
The order of fields listed in a compound index has significance. For instance, if a compound index consists of { name: 1, company: 1 }, the index sorts first by name and then, within each name value, sorts by company.
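A small plain-JavaScript sketch of that ordering may help. It only mimics how entries of a { name: 1, company: 1 } index are ordered; the sample values are made up.

```javascript
// Orders entries the way a { name: 1, company: 1 } index would:
// by name first, then by company within each name value.
const entries = [
  { name: "John", company: "Zeta" },
  { name: "Alice", company: "Beta" },
  { name: "John", company: "Acme" },
];

const compare = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

// Copy before sorting so the original array is left untouched.
const ordered = [...entries].sort(
  (x, y) => compare(x.name, y.name) || compare(x.company, y.company)
);

console.log(ordered.map((e) => `${e.name}/${e.company}`));
```

Because the two "John" entries sit next to each other, a query on name alone can still use the index, while a query on company alone cannot, since equal company values are scattered across different names.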
Source for my answer, and a place to learn more about MongoDB indexing: MongoDB Indexing

In Mongodb how can I view an index's values

I am not asking how to view the indexes on a collection, but how I can look inside an index and see its values.
I have a field that should be unique so I created a unique index and now I want to cross verify that all the documents are present in the index.
Normally you cannot look inside the index; it is an internal B-tree structure. But you can count the entries by reading through the index: db.data.find({},{"_id":1}).hint({"_id":1}).itcount()
In that example I project only the _id field; with hint() I tell the system to use the unique index on _id, and with itcount() I tell it NOT to use the count metadata but to iterate the find cursor and count every item.

MongoDB Indexing: Multiple single-field vs single compound?

I have a collection of geospatial+temporal data with a few additional properties, which I'll be displaying on a map. The collection has a few million documents at this point, and will grow over time.
Each document has the following fields:
Location: [geojson object]
Date: [Date object]
ZoomLevel: [int32]
EntryType: [ObjectID]
I need to be able to rapidly query this collection by any combination of Location (generally a $geoWithin query), Date (generally $gte/$lt), ZoomLevel and EntryType.
What I'm wondering is: Should I make a compound index containing all four fields, or a single index for each field, or some combination thereof? I read in the MongoDB docs the following:
For a compound index that includes a 2dsphere index key along with
keys of other types, only the 2dsphere index field determines whether
the index references a document.
...Which sounds like it means having the 2dsphere index for Location be part of a compound index might be pointless?
Any clarity on this would be much appreciated.
For your use case you will need multiple indexes.
If you create one index covering all the fields of your documents, your queries will only be able to use it when they include the first field in the index.
Since you need to query by any combination of these four fields, I suggest you analyze your data access patterns to see exactly which filters you actually use, and create a specific index for each one or for each group of them.
EDIT: Regarding your question about 2dsphere, it does make sense to include it in a compound index.
That note refers to the 'sparse' option. A sparse index references only the documents that contain the indexed fields; for a 2dsphere index, the only documents left out are the ones that do not contain the GeoJSON object or point array.

Mongoose indexes at both field and schema levels

I understand that indexing can be a valuable tool for quickly retrieving data, if implemented properly. I would like to be able to scan my documents for a certain field value or a combination of field values.
There are two fields I would be indexing (category, tags). Category is a string and tags is an array. I need to be able to query for items in a specific category and/or items that contain a specific tag.
Here are three examples:
Show me all of the documents in the category: "cars"
Show me all of the documents that contain the tag: "electric"
Show me all of the documents in the "cars" category that contain the "electric" tag
Will a schema level index for both fields suffice for all three scenarios?
docSchema.index({category:1, tags:1});
Or do I also need to define them at the field level, to support the scenarios when I am only searching through a single field?
docSchema = mongoose.Schema({
category: {
type: String,
index: true
},
tags: {
type: [String],
index: true
}
});
docSchema.index({category:1, tags:1}); is a compound index.
This compound index supports the scenarios 1 and 3:
-> Show me all of the documents in the category: "cars"
-> Show me all of the documents in the "cars" category that contain the "electric" tag
To support scenario 2 you will need to define an additional single-field index on the tags field.
docSchema.index({tags:1});
A compound index supports queries that involve all fields in the compound index as well as queries that involve a prefix of the compound index. In this case your compound index supports queries involving both categories and tags as well as queries involving just categories.
To better understand the logic please take a look at the Compound Indexes articles on MongoDB documentation site. Pay special attention to the section that talks about Prefixes.
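The prefix rule can be sketched as a tiny check in plain JavaScript. This is a deliberate simplification (the real query planner also considers sort order, range predicates and selectivity), and usablePrefixLength is a hypothetical helper name, not a MongoDB API.

```javascript
// Simplified prefix rule: count how many leading fields of an index
// are constrained by a query. 0 means the index cannot be used to
// narrow the search for that query.
function usablePrefixLength(indexFields, queryFields) {
  let n = 0;
  while (n < indexFields.length && queryFields.includes(indexFields[n])) {
    n++;
  }
  return n;
}

const compound = ["category", "tags"]; // docSchema.index({category:1, tags:1})

const scenario1 = usablePrefixLength(compound, ["category"]);         // category only
const scenario2 = usablePrefixLength(compound, ["tags"]);             // tags only
const scenario3 = usablePrefixLength(compound, ["category", "tags"]); // both fields

console.log({ scenario1, scenario2, scenario3 });
```

Scenario 2 gets a prefix length of 0, which is why it needs its own index on tags.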
You need a single-field index on category and a multikey index on tags. You might be tempted to use a compound index instead of one of them, but that is not mandatory if you are using MongoDB >= 2.6, as it has a nice feature called index intersection.
(1) Show me all of the documents in the category: "cars"
(2) Show me all of the documents that contain the tag: "electric"
(3) Show me all of the documents in the "cars" category that contain the "electric" tag
(1) will use the index on category (incl. any index having category as a prefix)
(2) will use the index on tags (incl. any index having tags as a prefix)
(3) will use the index on tags, or the index on category, or the index intersection of both of them (depending on the choice of the query planner).
As a reference, there is a nice discussion about index intersection in the MongoDB blog. Worth reading the entire article. But to quote the conclusion, mostly comparing index intersection to compound indexes:
To be clear, compound indexing will ALWAYS be more performant [than index intersection] IF you know what you are going to be querying on and can create one ahead of time. Furthermore, if your working set is entirely in memory, then you will not reap any of the benefits of Index Intersection as it is primarily based on reducing IO. But in a more ad-hoc case where one cannot predict the shape of the queries and the working set is much larger than available memory, index intersection will automatically take over and choose the most performant path.

MongoDB indexes issue

MongoDB can store documents with different fields in one collection.
How will indexes work then? If I create an index on a field that is not present in all documents, will the documents that lack it not be indexed?
Documents without the field are still indexed, with a null value stored for that field. You probably want to review this: http://docs.mongodb.org/manual/core/indexes/
If you want the index to leave out documents that don't have the key, you can use a sparse index: http://docs.mongodb.org/manual/administration/indexes/#sparse-indexes
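The difference between the two behaviours can be modelled in a few lines of plain JavaScript. This is only a conceptual sketch; buildIndex, the email field and the sample documents are made up for illustration.

```javascript
// Toy model: a regular index keeps an entry (with a null key) for
// documents lacking the field; a sparse index skips those documents.
function buildIndex(docs, field, { sparse = false } = {}) {
  const entries = [];
  for (const doc of docs) {
    const present = Object.prototype.hasOwnProperty.call(doc, field);
    if (!present && sparse) continue; // sparse: no entry for this document
    entries.push({ key: present ? doc[field] : null, _id: doc._id });
  }
  return entries;
}

const docs = [
  { _id: 1, email: "a@example.com" },
  { _id: 2 }, // no email field at all
  { _id: 3, email: "b@example.com" },
];

const regular = buildIndex(docs, "email");
const sparse = buildIndex(docs, "email", { sparse: true });

console.log(regular.length, sparse.length);
```

One consequence worth keeping in mind: a query served from the sparse index alone cannot see document 2, because that document has no entry in it.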