MongoDB Index on different types - mongodb

We can have { data: "hello" }, { data: 123 } in the same collection and even create a index on it. I'm curious how does mongodb manage the index behind the scene. We can't create single B-tree on different types. Right? However, I did getIndexes to see if another index is created but only one index is created.

There's no problem having two types in the same index. Each key within the index includes the type.
When you query, only objects matching the type you query on will be returned.
So if you query for {data: "hello"}, only strings will be returned, etc.

Related

How to index and sorting with Pagination using custom field in MongoDB ex: name instead of id

https://scalegrid.io/blog/fast-paging-with-mongodb/
Example : {
_id,
name,
company,
state
}
I've gone through the 2 scenarios explained in the above link and it says sorting by object id makes good performance while retrieve and sort the results. Instead of default sorting using object id , I want to index for my own custom field "name" and "company" want to sort and pagination on this two fields (Both fields holds the string value).
I am not sure how we can use gt or lt for a name, currently blocked on how to resolve this to provide pagination when a user sort by name.
How to index and do pagination for two fields?
Answer to your question is
db.Example.createIndex( { name: 1, company: 1 } )
And for pagination explanation the link you have shared on your question is good enough. Ex
db.Example.find({name = "John", country = "Ireland"}). limit(10);
For Sorting
db.Example.find().sort({"name" = 1, "country" = 1}).limit(userPassedLowerLimit).skip(userPassedUpperLimit);
If the user request to fetch 21-30 first documents after sorting on Name then country both in ascending order
db.Example.find().sort({"name" = 1, "country" = 1}).limit(30).skip(20);
For basic understand of Indexing in MonogDB
Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB must perform a collection scan, i.e. scan every document in a collection, to select those documents that match the query statement. If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect.
Indexes are special data structures, that store a small portion of the collection’s data set in an easy to traverse form. The index stores the value of a specific field or set of fields, ordered by the value of the field.
Default _id Index
MongoDB creates a unique index on the _id field during the creation of a collection. The _id index prevents clients from inserting two documents with the same value for the _id field. You cannot drop this index on the _id field.
Create an Index
Syntax to execute on Mongo Shell
db.collection.createIndex( <key and index type specification>, <options> )
Ex:
db.collection.createIndex( { name: -1 } )
for ascending use 1,for descending use -1
The above rich query only creates an index if an index of the same specification does not already exist.
Index Types
MongoDB provides different index types to support specific types of data and queries. But i would like to mention 2 important types
1. Single Field
In addition to the MongoDB-defined _id index, MongoDB supports the creation of user-defined ascending/descending indexes on a single field of a document.
2. Compound Index
MongoDB also supports user-defined indexes on multiple fields, i.e. compound indexes.
The order of fields listed in a compound index has significance. For instance, if a compound index consists of { name: 1, company: 1 }, the index sorts first by name and then, within each name value, sorts by company.
Source for my understanding and answer and to know more about MongoDB indexing MongoDB Indexing

MongoDB index for nested document is not being used

In my collection, I've say the following structure
{
_id: ObjectId("ssxxdfasfsadf"),
a: {
b: "somevalue"
}
}
I've created an index for a.b, which works fine if I use find query as db.collection.find({"a.b": "someothervalue"}).
If I change my query to db.collection.find({a: {b: "somevalue"}}), it's doing a complete collection scan. (Source - find().explain())
Sure, I can modify my application to do the query as "a.b", but I want to avoid that, as I've few other fields in a, on which in future I may need to query.
Is there anyway {a: {b: "somevalue"}} could work with tweaking the index?
Also, is there any advantage/disadvantage of using one or the other?
I would go with the first approach. A quick read through MongoDB's documentation, states the following:
MongoDB uses the dot notation to access the elements of an array and to access the fields of an embedded document.
See MongoDB Dot Notation and Query on Embedded/Nested Documents.
About tweaking the index, you could index the embedded document as a whole:
db.myColl.createIndex({ "a": 1 });
But I don't see the reason of doing this if you only need specific properties indexed. I would be sensitive on the Index Size, especially if the property will be holding a lot of data.

Fundamental misunderstanding of MongoDB indices

So, I read the following definition of indexes from [MongoDB Docs][1].
Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB must perform a collection scan, i.e. scan every document in a collection, to select those documents that match the query statement. If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect.
Indexes are special data structures that store a small portion of the
collection’s data set in an easy to traverse form. The index stores
the value of a specific field or set of fields, ordered by the value
of the field. The ordering of the index entries supports efficient
equality matches and range-based query operations. In addition,
MongoDB can return sorted results by using the ordering in the index.
I have a sample database with a collection called pets. Pets have the following structure.
{
"_id": ObjectId(123abc123abc)
"name": "My pet's name"
}
I created an index on the name field using the following code.
db.pets.createIndex({"name":1})
What I expect is that the documents in the collection, pets, will be indexed in ascending order based on the name field during queries. The result of this index can potentially reduce the overall query time, especially if a query is strategically structured with available indices in mind. Under that assumption, the following query should return all pets sorted by name in ascending order, but it doesn't.
db.pets.find({},{"_id":0})
Instead, it returns the pets in the order that they were inserted. My conclusion is that I lack a fundamental understanding of how indices work. Can someone please help me to understand?
Yes, it is misunderstanding about how indexes work.
Indexes don't change the output of a query but the way query is processed by the database engine. So db.pets.find({},{"_id":0}) will always return the documents in natural order irrespective of whether there is an index or not.
Indexes will be used only when you make use of them in your query. Thus,
db.pets.find({name : "My pet's name"},{"_id":0}) and db.pets.find({}, {_id : 0}).sort({name : 1}) will use the {name : 1} index.
You should run explain on your queries to check if indexes are being used or not.
You may want to refer the documentation on how indexes work.
https://docs.mongodb.com/manual/indexes/
https://docs.mongodb.com/manual/tutorial/sort-results-with-indexes/

MongoDB Indexing: Multiple single-field vs single compound?

I have a collection of geospatial+temporal data with a few additional properties, which I'll be displaying on a map. The collection has a few million documents at this point, and will grow over time.
Each document has the following fields:
Location: [geojson object]
Date: [Date object]
ZoomLevel: [int32]
EntryType: [ObjectID]
I need to be able to rapidly query this collection by any combination of location (generally a geowithin query), Date (generally $gte/$lt), ZoomLevel and EntryType.
What I'm wondering is: Should I make a compound index containing all four fields, or a single index for each field, or some combination thereof? I read in the MongoDB docs the following:
For a compound index that includes a 2dsphere index key along with
keys of other types, only the 2dsphere index field determines whether
the index references a document.
...Which sounds like it means having the 2dsphere index for Location be part of a compound index might be pointless?
Any clarity on this would be much appreciated.
For your use case you will need to use multiple indexes.
If you create one index covering all fields of your documents your queries will only be able to use it when they include the first field in the index.
Since you need to query by any combination of these four fields I suggest you to analyze your data access patterns and see exactly what filters are you actually using and create specific index for each one or group of them.
EDIT: For your question about 2dsphere, it does make sense to make them compound.
This note refers to the 'sparse' option. Sparse index references only documents that contains the index fields, for 2dspheres the only documents that will be left out is the ones that do not contain the geojson/point array.

Distinguish array from single value in a document

I have two type of documents in a mongodb collection:
one where key sessions has a simple value:
{"sessions": NumberLong("10000000000001")}
one where key sessions has an array of values.
{"sessions": [NumberLong("10000000000001")]}
Is there any way to retrieve all documents from the second category, ie. only documents whose value is an arary and not a simple value?
You can use this kind of query for that:
db.collectionName.find( { $where : "Array.isArray(this.sessions)" } );
but you'd better convert all the records to one type to keep the things consistent.
This code can be simple like this:
db.c.find({sessions:{$gte:[]}});
Explanation:
Because you only want to retrieve documents whose sessions data type is array, and by the feature of $gte (if data types are different between tow operands, it returns false; Double, Integer32, Integer64 are considered as same data type.), giving an empty array as the opposite operand will help to retrieve all results by required.
Also , $gt, $lt, $lte for standard query (attention: different behaviors to operaors with same name in expression of aggregation pipeline) have the same feature. I proved this by practice on MongoDB V2.4.8, V2.6.4.