Purpose of Index in Mongoose Schema - mongodb

I am trying to add unique documents to a collection, but I want uniqueness to be decided by two fields together. I found a solution for it online. What confuses me is: what is the purpose of index here? In an RDBMS, index is usually the word used for the row id. What does it mean here, and how does it affect uniqueness?
var patientSchema = mongoose.Schema({
name : String,
fatherOrHusbandName : String,
address : String,
});
patientSchema.index({ email: 1, sweepstakes_id: 1 }, { unique: true });

Indexes support the efficient execution of queries in MongoDB. Without
indexes, MongoDB must perform a collection scan, i.e. scan every
document in a collection, to select those documents that match the
query statement. If an appropriate index exists for a query, MongoDB
can use the index to limit the number of documents it must inspect.
For more details, see the MongoDB documentation on indexes.
Don't be confused that the documentation uses createIndex while your code uses index. createIndex is the MongoDB method; index is the Mongoose schema method, which executes the MongoDB operation internally.
If your collection has no data yet, this will work fine:
patientSchema.index({ email: 1, sweepstakes_id: 1 }, { unique: true });
But if the collection already contains duplicate items, the unique index cannot be built as is. Older MongoDB versions (before 3.0) accepted the dropDups option to force it.
One thing you should know before using dropDups: with dropDups: true, MongoDB keeps one document for each duplicated value and deletes all the others. The option was removed in MongoDB 3.0, so on current versions the duplicates have to be cleaned up before the index is created.
On those older versions it looks like this:
patientSchema.index({ email: 1, sweepstakes_id: 1 }, { unique: true, dropDups: true });
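Before deleting anything, it can help to see which documents actually share the same pair of values. The following is a minimal in-memory sketch of that check, not a MongoDB API: the function name findDuplicateGroups and the sample data are made up for illustration, and on a real collection the same grouping would normally be done on the server.

```javascript
// Group documents by the values of the given fields and return only the
// groups that contain more than one document (i.e. the duplicates).
function findDuplicateGroups(docs, fields) {
  const groups = new Map();
  for (const doc of docs) {
    // The JSON of the value array serves as a composite key.
    const key = JSON.stringify(fields.map((f) => doc[f]));
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(doc);
  }
  return [...groups.values()].filter((group) => group.length > 1);
}

// Hypothetical sample data.
const patientRecords = [
  { _id: 1, email: "a@example.com", sweepstakes_id: 7 },
  { _id: 2, email: "a@example.com", sweepstakes_id: 7 },
  { _id: 3, email: "a@example.com", sweepstakes_id: 8 },
];

const duplicateGroups = findDuplicateGroups(patientRecords, ["email", "sweepstakes_id"]);
// One group is reported: the documents with _id 1 and 2.
console.log(duplicateGroups.map((group) => group.map((doc) => doc._id)));
```

Document 3 is not reported because only its email matches; the pair as a whole is different.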

It actually serves the same purpose as an index in an RDBMS. You just have to think in MongoDB terminology: instead of tables you have collections, and instead of rows you have documents.
With Mongoose you defined a schema for a collection 'Patient', and on this collection (the index belongs to the Patient collection) you defined a unique index on the document properties email and sweepstakes_id.
Every time you save a document in the Patient collection, MongoDB makes sure that the combination of its email and sweepstakes_id values is unique among all other documents.
So, instead of 'rows', think in 'documents'.
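To make the "unique as a combination" rule concrete, here is a small in-memory imitation of it. This is only a sketch of the behaviour described above, not how MongoDB or Mongoose implement indexes; the class name UniquePairCollection is hypothetical.

```javascript
// In-memory stand-in for a collection with a unique compound index.
// It only imitates the rule: the combination of field values must be unique.
class UniquePairCollection {
  constructor(fields) {
    this.fields = fields;
    this.keys = new Set();
    this.docs = [];
  }

  insert(doc) {
    const key = JSON.stringify(this.fields.map((f) => doc[f]));
    if (this.keys.has(key)) {
      throw new Error(`duplicate key: ${key}`);
    }
    this.keys.add(key);
    this.docs.push(doc);
  }
}

const collection = new UniquePairCollection(["email", "sweepstakes_id"]);
collection.insert({ email: "a@example.com", sweepstakes_id: 1 });
collection.insert({ email: "a@example.com", sweepstakes_id: 2 }); // accepted: the pair differs
try {
  collection.insert({ email: "a@example.com", sweepstakes_id: 1 }); // same pair as the first insert
} catch (err) {
  console.log(err.message); // the third insert is rejected
}
```

The same email may appear many times; only a repeated (email, sweepstakes_id) pair is refused.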

Related

How to index and sorting with Pagination using custom field in MongoDB ex: name instead of id

https://scalegrid.io/blog/fast-paging-with-mongodb/
Example : {
_id,
name,
company,
state
}
I've gone through the two scenarios explained in the link above. It says that sorting by ObjectId gives good performance when retrieving and sorting results. Instead of the default sort on ObjectId, I want to index my own custom fields "name" and "company", and sort and paginate on these two fields (both hold string values).
I am not sure how to use gt or lt on a name, and I am currently blocked on how to provide pagination when a user sorts by name.
How do I index and paginate on two fields?
The answer to your question is:
db.Example.createIndex( { name: 1, company: 1 } )
For pagination, the explanation in the link you shared in your question is good enough. For example:
db.Example.find({ name: "John", company: "Acme" }).limit(10);
For sorting:
db.Example.find().sort({ name: 1, company: 1 }).skip(userPassedOffset).limit(userPassedPageSize);
If the user requests documents 21-30 after sorting on name and then company, both in ascending order:
db.Example.find().sort({ name: 1, company: 1 }).skip(20).limit(10);
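On the question of how gt or lt can apply to a name: $gt and $lt also compare strings, so range-based paging works on name and company as well. Below is a sketch of how the filter for the next page could be built from the last document of the previous page. The helper name nextPageFilter is made up, and adding _id as a final tie-breaker (in the sort, and ideally in the index too) is an assumption for the case where name and company together are not unique.

```javascript
// Build the filter for the page that follows `last`, the final document of
// the previous page, assuming a sort of { name: 1, company: 1, _id: 1 }.
function nextPageFilter(last) {
  if (!last) return {}; // first page: no lower bound
  return {
    $or: [
      // a later name
      { name: { $gt: last.name } },
      // same name, later company
      { name: last.name, company: { $gt: last.company } },
      // same name and company: fall back to _id to break the tie
      { name: last.name, company: last.company, _id: { $gt: last._id } },
    ],
  };
}

// Hypothetical last document of the previous page.
const pageFilter = nextPageFilter({ _id: 5, name: "John", company: "Acme" });
console.log(JSON.stringify(pageFilter));

// In the mongo shell the filter would be used roughly like this:
//   db.Example.find(pageFilter).sort({ name: 1, company: 1, _id: 1 }).limit(10)
```

Unlike skip, this approach does not have to walk over all the earlier documents, which is the point made in the linked article about paging on _id.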
For a basic understanding of indexing in MongoDB:
Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB must perform a collection scan, i.e. scan every document in a collection, to select those documents that match the query statement. If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect.
Indexes are special data structures, that store a small portion of the collection’s data set in an easy to traverse form. The index stores the value of a specific field or set of fields, ordered by the value of the field.
Default _id Index
MongoDB creates a unique index on the _id field during the creation of a collection. The _id index prevents clients from inserting two documents with the same value for the _id field. You cannot drop this index on the _id field.
Create an Index
Syntax to execute on Mongo Shell
db.collection.createIndex( <key and index type specification>, <options> )
Ex:
db.collection.createIndex( { name: -1 } )
Use 1 for ascending and -1 for descending.
The createIndex method only creates an index if an index of the same specification does not already exist.
Index Types
MongoDB provides different index types to support specific types of data and queries, but I would like to mention two important types:
1. Single Field
In addition to the MongoDB-defined _id index, MongoDB supports the creation of user-defined ascending/descending indexes on a single field of a document.
2. Compound Index
MongoDB also supports user-defined indexes on multiple fields, i.e. compound indexes.
The order of fields listed in a compound index has significance. For instance, if a compound index consists of { name: 1, company: 1 }, the index sorts first by name and then, within each name value, sorts by company.
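The ordering described above can be imitated with a plain comparator. This is only an in-memory illustration of "first by name, then by company within each name"; the function name byNameThenCompany and the sample rows are made up.

```javascript
// Comparator that mimics the ordering of a { name: 1, company: 1 } index:
// compare by name first, and by company only when the names are equal.
function byNameThenCompany(a, b) {
  if (a.name !== b.name) return a.name < b.name ? -1 : 1;
  if (a.company !== b.company) return a.company < b.company ? -1 : 1;
  return 0;
}

// Hypothetical sample data.
const rows = [
  { name: "Bob", company: "Acme" },
  { name: "Alice", company: "Zeta" },
  { name: "Alice", company: "Acme" },
];

const ordered = [...rows].sort(byNameThenCompany);
console.log(ordered.map((r) => `${r.name}/${r.company}`).join(", "));
// Alice/Acme, Alice/Zeta, Bob/Acme
```

Swapping the two comparisons would give the ordering of { company: 1, name: 1 }, which is a different index.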
Source for my understanding and this answer, and a place to learn more about MongoDB indexing: the MongoDB Indexing documentation.

Validate uniqueness of a relationship model

I have an application where users can follow each other. Once this relationship is made a document is added into the collection. That document has two fields follower and followee. I want to prevent insertions of duplicate relationships. I do not want to query the db, wait for a promise, then insert as this seems like an inefficient approach. I'd rather stop it from saving a new document if the new document's follower and followee matches an existing document.
Look into creating a unique compound index:
db.members.createIndex( { follower: 1, followee: 1 }, { unique: true } )
The created index enforces uniqueness for the combination of follower and followee values.
A unique index ensures that the indexed fields do not store duplicate
values; i.e. enforces uniqueness for the indexed fields. By default,
MongoDB creates a unique index on the _id field during the creation of
a collection
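With the unique index in place, the application can simply attempt the insert and treat a duplicate key error as "already following", with no query beforehand. MongoDB reports such violations with error code 11000. The sketch below assumes a hypothetical insertFn that inserts one document and rejects with an error carrying that numeric code; fakeInsert stands in for the database so the flow can be run on its own.

```javascript
const DUPLICATE_KEY = 11000; // MongoDB's duplicate key error code

// `insertFn` is a hypothetical async function that inserts one document and
// rejects with an error that has a numeric `code` property.
async function followOnce(insertFn, follower, followee) {
  try {
    await insertFn({ follower, followee });
    return true; // relationship created
  } catch (err) {
    if (err && err.code === DUPLICATE_KEY) return false; // already following
    throw err; // anything else is a real failure
  }
}

// Fake insert used only to demonstrate the flow without a database.
const seenPairs = new Set();
async function fakeInsert(doc) {
  const key = JSON.stringify([doc.follower, doc.followee]);
  if (seenPairs.has(key)) {
    const err = new Error("E11000 duplicate key error");
    err.code = DUPLICATE_KEY;
    throw err;
  }
  seenPairs.add(key);
}

followOnce(fakeInsert, "ann", "bob")
  .then(() => followOnce(fakeInsert, "ann", "bob"))
  .then((created) => console.log(created)); // false: the second attempt is a duplicate
```

Errors with any other code are rethrown so that real failures are not silently swallowed.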

Mongo bulk insert and avoid duplicate values for multiple keys

I have two collections items with 120,000 entries and itemHistories with more than 20 million entries. I periodically update all items and itemHistories by fetching an API that lists all history data for an item.
What I need to do is batch insert the history data to the collection while avoiding duplicates. Also the history API returns only date, info, item_id values.
Is it possible to batch insert in Mongo so that it doesn't add duplicates for two values combined (date, item_id)? That is, if there is already an entry with the same date and item_id, don't add it. Basically, the date acts as a unique index within an item_id: duplicate date values are allowed in the collection, but only if the item_id is different for each of the duplicates.
One item can have close to a million entries so I don't think fetching the history from the collection and comparing it to the API response is going to be optimal.
My current idea is to add another key to the collection called hash, holding md5(date,info,item_id), and make it a unique index. Suggestions?
After a little digging in the Mongoose and MongoDB documentation, I found out that there is a thing called a Unique Compound Index that solves my problem and answers this question. Since I had never used indexes before, I didn't know such a thing was possible.
You can also enforce a unique constraint on compound indexes. If you
use the unique constraint on a compound index, then MongoDB will
enforce uniqueness on the combination of the index key values.
For example, to create a unique index on groupNumber, lastname, and
firstname fields of the members collection, use the following
operation in the mongo shell:
db.members.createIndex( { groupNumber: 1, lastname: 1, firstname: 1 }, { unique: true } )
Source: https://docs.mongodb.org/manual/core/index-unique/
In my case I can use this code below to avoid duplicates:
db.itemHistories.createIndex( { date: 1, item_id: 1 }, { unique: true } )
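One thing to keep in mind for batch inserts: by default a bulk insert is ordered and stops at the first error, while passing { ordered: false } lets the remaining documents go through when some are rejected by the unique index. The rejected ones then come back as write errors. The sketch below separates expected duplicates (code 11000) from real failures; the helper name summarizeWriteErrors is made up, and the shape of the error objects (a numeric code per entry) is an assumption that should be checked against the driver in use.

```javascript
const DUPLICATE_KEY_CODE = 11000; // MongoDB's duplicate key error code

// `writeErrors` is assumed to be an array of objects with a numeric `code`,
// one per document that could not be inserted.
function summarizeWriteErrors(writeErrors) {
  const duplicates = writeErrors.filter((e) => e.code === DUPLICATE_KEY_CODE);
  const failures = writeErrors.filter((e) => e.code !== DUPLICATE_KEY_CODE);
  return { skipped: duplicates.length, failures };
}

// Hypothetical errors from inserting a batch with { ordered: false }.
const summary = summarizeWriteErrors([
  { index: 1, code: 11000 },
  { index: 4, code: 11000 },
  { index: 6, code: 121 },
]);
console.log(summary.skipped, summary.failures.length); // 2 1
```

Duplicates can be ignored, since the entry already exists; anything left in failures deserves a closer look.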

mongodb sharding, use multiple fields as the shard key?

I have documents with the following schema:
{
idents: {
list: ['foo', 'bar', ...],
id: 123
}
...
}
The field idents.list is an array of strings and always contains at least one element.
The field idents.id may or may not exist.
Over time, more entries are added to idents.list, and at some point in the future the field idents.id may be set too.
These two fields are used to uniquely identify a document and are therefore relevant for a shard key.
Is it possible to use sharding with this schema?
UPDATE:
Documents are always queried via { 'idents.list': 'foo' } OR { $or: [ { 'idents.list': 'foo' }, { 'idents.id': 42 } ] }
Yes, you can use multiple fields as the shard key. The documentation says:
Use a compound shard key that uses two or three values from all documents that provide the right mix of cardinality with scalable write operations and query isolation.
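https://docs.mongodb.org/manual/tutorial/choose-a-shard-key/
Note, however, that MongoDB does not allow an array field in a shard key (the shard key index cannot be a multikey index), so idents.list itself cannot be part of the key.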

MongoDB multikeys on _id + some value

In MongoDB I have a query which looks like this to find out for which comments the user has already voted:
db.comments.find({
_id: { $in: [...some ids...] },
"votes.uid": "4fe1d64d85d4f4c00d000002"
});
As the documentation says you should have
One index per query
So which is better: creating a multikey index on _id + votes.uid, or is it enough to index just votes.uid because Mongo handles _id automatically anyway?
There is automatically an index on _id.
Depending on your queries (how many ids you have in the $in array) and your data (how many votes you have on one object), you may create an index on votes.uid.
Pay attention to which index is used during query execution, and remember that you can force Mongo to use the index you want by adding .hint({ field: 1 }) or .hint('indexname').