Are MongoDB indexes persistent across restarts?

Referencing the guide at http://docs.mongodb.org/manual/core/indexes/, I cannot tell whether MongoDB indexes on fields are stored persistently.
If ensureIndex() is called (and completes) within an application using MongoDB, what happens if:
The application using MongoDB is restarted. Will a subsequent call to ensureIndex() cause a complete reindex?
The MongoDB server is restarted. Would a later call of ensureIndex() from a client application rebuild?
Is any of this affected by having multiple client sessions? I assume indexing is global across the entire collection per the documentation: "MongoDB defines indexes on a per-collection level."

The application using MongoDB is restarted. Will a subsequent call to ensureIndex() cause a complete reindex?
No, it should (as in every other driver) register as a no-op since the index already exists. Some drivers provide a caching mechanism to detect, without going to the server, whether an index has already been created (e.g. the Python driver).
The MongoDB server is restarted. Would a later call of ensureIndex() from a client application rebuild?
Same as above: the index data is persisted on disk with the rest of the database files, so it survives a server restart and the call registers as a no-op.
Is any of this affected by having multiple client sessions? I assume indexing is global across the entire collection per the documentation: "MongoDB defines indexes on a per-collection level."
Yes, indexes are stored in MongoDB on the collection itself (to be technical, as a namespace within the db.ns file). Since the collection is a single point of knowledge for ensureIndex, and an index build is a single process (much like the write lock, really), multiple connections should not cause the index creation to be registered twice.
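As a minimal sketch of this idempotence in the mongo shell (the users collection and email key here are hypothetical; createIndex() is the modern name for ensureIndex()):

db.users.createIndex( { email: 1 } )   // first call builds the index
db.users.createIndex( { email: 1 } )   // later calls are no-ops; nothing is rebuilt
db.users.getIndexes()                  // the index is still listed after a server restart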

Related

Is it possible to perform rolling index update with prisma on MongoDB?

Prisma requires MongoDB to run as a replica set for some of its transaction functionality to work. This means that, in theory, some expensive and blocking index updates could be performed on a rolling basis, i.e. made to the secondaries first, with a secondary then promoted to primary.
I tried using prisma db push, and it appears to apply indexes to the primary database (my database was unavailable for some time while the indexes were being pushed).
I was wondering if it is possible to perform this rolling index update using Prisma, thus allowing zero downtime?
No.
Prisma is an ORM, meaning it assists with interacting with the data in the database. It is not responsible for management or operation of the database itself.
This is relevant because MongoDB does not support a command to perform index builds in a rolling manner across the replica set. Instead, that procedure is a technique that operators can elect to follow when building indexes. The procedure is outlined here.
Although that functionality is not built into the database itself, managed solutions can optionally build it into their offering. Likely the closest you can get to perform a rolling index build programmatically is to leverage an API that triggers the procedure via the managed solution. For reference, it looks like Atlas offers such an endpoint here.
All that said, MongoDB has made some changes to its index build procedures in recent years. It used to be the case that indexes were built in the foreground. This was a blocking operation and could cause behavior similar to what you described (where the database was unavailable for some time). In those versions there was the ability to specify that indexes be built in the background to reduce the impact.
Since version 4.2 there is no longer the ability to build indexes in the foreground. Index builds now behave more similarly to how they previously did when using the background option. More information is available here. So the impact of building indexes through Prisma (or any client driver) should at least be reduced on MongoDB 4.2+ compared to previous versions.
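For illustration, a hedged sketch of what this looks like on 4.2+ (the collection and keys are hypothetical): the plain createIndex call below uses the optimized build process, holding an exclusive lock only briefly at the start and end, so most reads and writes proceed while it runs.

db.orders.createIndex( { customerId: 1, createdAt: -1 } )   // optimized build on 4.2+; no background option needed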

Is the update order preserved in MongoDB?

If I trigger two updates that are just 1 nanosecond apart, is it possible that the updates could be done out of order? Say, if the first update is more complex than the second.
I understand that MongoDB is eventually consistent, but I'm just not clear on whether or not the order of writes is preserved.
NB: I am using a legacy system with an old version of MongoDB that doesn't have the newer transaction support.
In MongoDB, write operations are atomic at the document level, as every document in a collection is independent and individual on its own. So while one operation is writing to a document, a second operation on the same document has to wait until the first one finishes.
From their docs:
a write operation is atomic on the level of a single document, even if the operation modifies multiple embedded documents within a single document.
Ref: atomicity-in-mongodb
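As a small illustration (the orders collection is hypothetical), a single update touching several embedded fields is applied as one atomic unit, and a concurrent update to the same document waits rather than interleaving:

db.orders.updateOne(
    { _id: 1 },
    { $set: { "shipping.status": "sent", "shipping.carrier": "UPS" }, $inc: { version: 1 } }
)   // all three field changes land together or not at all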
So when can this be an issue? On reads, if your application is read-heavy. Reads can happen during updates: if a read arrives before an update finishes, your app will see the old data, and reading from a secondary can likewise return inconsistent data.
In general, MongoDB is usually hosted as a replica set (a set of at least 3 nodes) in which writes must be targeted at the primary, and by default reads are targeted at the primary as well. If you override the read preference to read from secondaries to keep the primary free (for app reporting, say), you might occasionally see such issues.
But why? Data gets synced from the primary to the secondaries in the background; if that sync is delayed or hasn't completed by the time your application reads, you'll see the issue, though the chances are low. All of this applies up to MongoDB 4.0; from 4.0 onward, apps with a secondary read preference read from a WiredTiger snapshot of the data.
Ref: replica-set-data-synchronization
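A hedged sketch of the secondary-read caveat described above (the accounts collection is hypothetical; run against a replica set):

db.accounts.updateOne( { _id: 1 }, { $set: { balance: 100 } } )   // the write always goes to the primary
db.accounts.find( { _id: 1 } ).readPref("secondary")              // may briefly return the old document if replication lags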

What happens when index creation in MongoDB, running in the background, fails?

There are existing collections in MongoDB that need to be programmatically updated with new indexes.
There is an admin web API in my ASP.NET application that, when invoked, calls the create-index API in MongoDB. To avoid impact from the index build process, it is performed in the background.
It is not known whether the existing data conforms to the index definition, because MongoDB imposes an index key size limit of 1024 bytes, and the values of the indexed fields in some existing documents may add up to more than 1024.
So the question is: what happens when the index build fails because of this?
Also, how can I programmatically (with the C# driver) find the status of the index build operation at a later point in time?
According to the MongoDB documentation:
MongoDB will not create an index on a collection if the index entry for an existing document exceeds the index key limit. Previous versions of MongoDB would create the index but not index such documents.
So this means that, background or foreground, an index key that is too long will cause the creation to fail. However, no matter how you create the index, the session issuing the create-index command will block. This means that if the index build fails, you should be notified by an exception thrown while await-ing the task returned by the Indexes.CreateManyAsync() method.
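In shell terms the failure looks roughly like this (the bigdocs collection is hypothetical; applies to pre-4.2 servers with the default failIndexKeyTooLong setting):

db.bigdocs.createIndex( { hugeField: 1 } )   // fails with a "key too large to index" error if any existing
                                             // document's indexed value exceeds the 1024-byte limit; no index is created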
Since you are unsure whether the data will be affected by the maximum key length, I strongly suggest you test this in a pre-production environment before attempting it in production. Since production is (I assume) active, the pre-production environment won't match the data exactly (writes are still happening), but testing there will reduce the chance of finding a failed index build in production.
Additionally, even if the index can be built, future writes that break that key length will be rejected. This can be avoided by setting the failIndexKeyTooLong server parameter to false. However, this has its own set of caveats. Specifically:
Setting failIndexKeyTooLong to false is a temporary workaround, not a permanent solution to the problem of oversized index keys. With failIndexKeyTooLong set to false, queries can return incomplete results if they use indexes that skip over documents whose indexed fields exceed the Index Key Length Limit.
I strongly suggest you read and understand those docs before implementing that particular parameter.
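For reference, the parameter is set with an admin command; a sketch (pre-4.2 servers only, and per the warning above this is a temporary workaround, not a fix):

db.adminCommand( { setParameter: 1, failIndexKeyTooLong: false } )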
In general, it is considered by many to be bad practice to build an index at run-time. If the collection is already empty this is not a big deal, but on a collection with a large amount of data it can cause the create command to block for quite some time. This is especially true on a busy mongod when creating the index in the background.
If you are building this index on a Replica Set or Sharded Cluster, I strongly recommend you take a look at the documentation specific to those use cases before implementing the build in code.

Is it mandatory to restart MongoDB after adding new index on collection?

A MongoDB collection has become slow to provide data as it has grown huge over time.
I need to add an index on a few fields and have it reflected immediately in searches. So I seek clarification on the following things:
Is it mandatory to restart MongoDB after indexing?
If yes, then is there any way to add index without restarting the server? I don't want any downtime...
MongoDB does not need to be restarted after indexing.
However, by default the createIndex operation blocks reads and writes on the affected database (note that it is not only the collection but the whole db). You may change this behaviour using background mode like this:
db.collectionName.createIndex( { collectionKey: 1 }, { background: true } )
It might seem that your client is blocked when creating the index: the mongo shell session or connection where you are creating the index will block, but other connections to the database will still be able to query and operate on it.
Docs: https://docs.mongodb.com/manual/core/index-creation/
There is no need to restart MongoDB after you add an index!
However, an index can be created in the foreground, which is the default.
What does it mean? The MongoDB documentation states: "By default, creating an index on a populated collection blocks all other operations on a database. When building an index on a populated collection, the database that holds the collection is unavailable for read or write operations until the index build completes. Any operation that requires a read or write lock on all databases will wait for the foreground index build to complete."
For potentially long-running index building operations on standalone deployments, the background option should be used. In that case, the MongoDB database remains available during the index building operation.
To create an index in the background, a snippet like the following should be used.
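A minimal sketch, assuming a hypothetical products collection (note that on MongoDB 4.2+ the background option is ignored, as all builds use the optimized process):

db.products.createIndex( { sku: 1 }, { background: true } )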

Does MongoDB have features such as the triggers and stored procedures of a relational database?

As the title suggests (leaving the map-reduce framework out of it):
If I want to trigger an event to run a consistency check or security operations before a record is inserted, how can I do that with MongoDB?
MongoDB does not support triggers, but people have created solutions around them, mostly using the oplog, though this will only help you if you are running with replica sets, as the oplog is a capped collection that keeps track of data changes for the purposes of replication.
For a nodejs solution see: https://www.npmjs.org/package/mongo-watch or see an earlier SO thread: How to listen for changes to a MongoDB collection?
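A rough sketch of the oplog approach from the shell (requires a replica set; the mydb.users namespace is hypothetical):

var oplog = db.getSiblingDB("local").oplog.rs
oplog.find( { ns: "mydb.users", op: "i" } ).sort( { $natural: -1 } ).limit(5)   // five most recent inserts on that collection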
If you are concerned with consistency, read about write concern in MongoDB: http://docs.mongodb.org/manual/core/write-concern/ You can be as relaxed or as strict as you want by setting the insert's write concern level, from fire-and-hope to getting an acknowledgement from all members of the replica set.
So, if you want to run a consistency check before inserting data, you will probably have to move that logic to the client application and set your write concern to a level that ensures consistency, as illustrated below.
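As an illustration (the users collection is hypothetical; these run in a modern shell against a replica set), the write concern can be chosen per insert:

db.users.insertOne( { email: "a@example.com" }, { writeConcern: { w: 0 } } )            // fire and hope: no acknowledgement
db.users.insertOne( { email: "b@example.com" }, { writeConcern: { w: "majority" } } )   // waits for a majority of members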
MongoDb does not have triggers or stored procedures. While there are solutions some have used to emulate the behavior, since it is not a built-in feature you'll need to decide whether those solutions are effective for you. Searching for "triggers and mongodb" should find dozens. All depend on the oplog and replica sets.
But, given the nature of MongoDb and a typical 3 tier architecture, I would expect that at the point of data insertion, which could be on a web server for example, you would run, on the web server, the necessary consistency and security checks. You wouldn't allow a client such as a mobile application to directly set data into the database collection without some checks.
Many drivers for MongoDb and extension libraries already have validation and consistency checks built in, so there is less to do. Using unique indexes on some fields can also provide a level of consistency that you cannot achieve from the driver alone. Look at calls like findAndModify, which make atomic updates; both are sketched below.
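Two quick sketches of those last points (all names are hypothetical): a unique index enforces consistency on the server itself, and findAndModify-style calls perform atomic read-modify-write updates.

db.users.createIndex( { email: 1 }, { unique: true } )   // duplicate emails are rejected by the server

db.counters.findOneAndUpdate(
    { _id: "orderSeq" },
    { $inc: { seq: 1 } },
    { upsert: true, returnNewDocument: true }   // increments and returns the new value in one atomic step
)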