Can I modify existing index in MongoDB without dropping it ? I don't see anything about it in documentation.
I have an non-unique index on String field. Collection is ~6M documents. It's replica set.
I know I can delete index and add new one. But it's problematic due to two reasons:
1) at time while index doesn't exist some queries will be very very slow.
2) adding new index (in my project) creates very high load on DB which visibly slows down my web-site.
There is no way to alter an index as you describe, and if there was I think the outcome in terms of performance would be similar - how would the database use the half created/altered index while this operation was going on for example?
Instead I would recommend using the background option to build the index on a single node, if that is your configuration - it will take longer but will not interfere with your normal operation as much. Once it is finished you can drop the old index at your leisure.
However, if you have a replica set (recommended) you should be aware that index creation is always (currently) done in the foreground on the secondary. If you want to avoid load on your secondaries, then you should follow the steps outlined here to take a member out one at a time and build the index required before rejoining the set:
http://docs.mongodb.org/manual/administration/indexes/#index-building-replica-sets
Update
Background index builds on secondaries will be possible starting with the 2.6 release (see release notes for details). This is not going to be backported to prior versions, so the above note will be true for versions prior to 2.6.
Finally, As a general note, indexes built in the background will generally be larger and a less efficient than those built in the foreground, so the methodology above will still have its uses.
Related
Prisma requires mogodb to run in a replica set for some of the transactions work. This means that in theory some expensive and blocking index updates can be performed on a rolling bases i.e. made to replicas first and replica then promoted to be primary.
I tried using prisma db push and it appears it applies indexes to primary database (my database was not available for some time while indexes were being pushed).
I was wondering if it is possible to perform this rolling index update using prisma, thus allow zero downtime?
No.
Prisma is an ORM meaning that it assists interacting with data in the database. It is not responsible for management or operation of the database.
This is relevant because MongoDB does not support a command to perform index builds in a rolling manner across the replica set. Instead, that procedure is a technique that operators can elect to follow when building indexes. The procedure is outlined here.
Although that functionality is not built into the database itself, managed solutions can optionally build it into their offering. Likely the closest you can get to perform a rolling index build programmatically is to leverage an API that triggers the procedure via the managed solution. For reference, it looks like Atlas offers such an endpoint here.
All that said, MongoDB has made some changes to their index build procedures in recent years. It used to be the case that indexes were built in the foreground. This was a blocking operation and could cause behavior similar to what you described (where the database was unavailable for some time). There was the ability to specify that indexes be built in the background to reduce the impact in those versions.
Since version 4.2, there is no longer the ability to build indexes in the foreground. Since then, index builds behave more similar to how they previously did when using the background option. More information is available here. So the impact of building indexes through Prisma (or any client drivers) should at least be reduced in version 4.2+ of MongoDB compared to previous versions.
I have a huge collection, more than 2TiB of data. During release of a new feature I add an index of new field, that 100% sure doesn't exist in any document, MongoDB will still perfom a full scan for this field, which may process for a long time.
Is there any hack to just manually create an empty index file with valid structure and notify MongoDB node about it, so it will load index into memory and everything else MongoDB is doing when index is crerated?
Unlike in relational RDBMS, MongoDB creates indexes also on non-existing fields, i.e. it scans the entire collection.
Index creation runs in background, so it should not harm so much.
See createIndexes
Changed in version 4.2.
For feature compatibility version (fcv) "4.2", all index builds use an optimized build process that holds the exclusive lock only at the beginning and end of the build process. The rest of the build process yields to interleaving read and write operations. MongoDB ignores the background option if specified.
If you run MongoDB version 4.2 or earlier, then you may specify option { background: true }
There are existing collections in MongoDB on which need to be programmatically updated for new indexes.
So there is an admin web API in my ASP.net application when invoked will invoke the create index API in MongoDB. In order to not cause an impact due to index building process, it is performed in background.
It is not known whether the existing data is good as per the index definition. Because Mongo DB imposes index key size limit to 1024, and it may be possible that values of indexed fields in some of the existing documents may sum up to length more than 1024.
So the question is when this happens what would happen when the index building fails due to this.
Also how can I programmatically (C# driver) find the status of the index build operation at a later point in time?
According to the MongoDB Documentation
MongoDB will not create an index on a collection if the index entry for an existing document exceeds the index key limit. Previous versions of MongoDB would create the index but not index such documents.
So this means, background or foreground, an index key too long will cause the creation to fail. However, no matter how you create the index, the session issuing the create index command, will block. This means if the index build fails, you should be notified by an exception thrown while await-ing the task returned by the Indexes.CreateManyASync() method.
Since you are unsure if the data will be affected by the maximum key length, I strongly suggest you test this in a pre-production environment before attempting it in production. Since production is (I assume) active, the pre-production environment won't match the data exactly (writes still happening) it will reduce the possibility of finding a failed index build in production.
Additionally, even if the index is able to be built, in the future, writes that break that key length will be rejected. This can be avoided by setting failIndexKeyTooLong server parameter. However this has its own set of caveats. Specifically,
Setting failIndexKeyTooLong to false is a temporary workaround, not a permanent solution to the problem of oversized index keys. With failIndexKeyTooLong set to false, queries can return incomplete results if they use indexes that skip over documents whose indexed fields exceed the Index Key Length Limit.
I strongly suggest you read and understand those docs before implementing that particular parameter.
In general, it is considered by many to be bad practice to build an index at run-time. If the collection is already empty, this is not a big deal, however on a collection with a large amount of data, this can cause the create command to block for quite some time. This is especially true on a busy mongod when creating the index in the background.
If you are building this index on a Replica Set or Sharded Cluster, I strongly recommend you take a look at the documentation specific to those use cases before implementing the build in code.
Per the Mongoose documentation for MongooseJS and MongoDB/Node.js :
When your application starts up, Mongoose automatically calls ensureIndex for each defined index in your schema. While nice for development, it is recommended this behavior be disabled in production since index creation can cause a significant performance impact. Disable the behavior by setting the autoIndex option of your schema to false.
This appears to instruct removal of auto-indexing from mongoose prior to deploying to optimize Mongoose from instructing Mongo to go and churn through all indexes on application startup, which seems to make sense.
What is the proper way to handle indexing in production code? Maybe an external script should generate indexes? Or maybe ensureIndex is unnecessary if a single application is the sole reader/writer to a collection because it will continue an index every time a DB write occurs?
Edit: To supplement, MongoDB provides good documentation for the how to do indexing, but not why or when explicit indexing directives should be done. It seems to me that indexes should be kept up to date by writer applications automatically on collections with existing indexes and that ensureIndex is really more of a one-time thing (done when a new index is being applied), in which case Mongoose's autoIndex should be a no-op under a normal server restart.
I've never understood why the Mongoose documentation so broadly recommends disabling autoIndex in production. Once the index has been added, subsequent ensureIndex calls will simply see that the index already exists and then return. So it only has an effect on performance when you're first creating the index, and at that time the collections are often empty so creating an index would be quick anyway.
My suggestion is to leave autoIndex enabled unless you have a specific situation where it's giving you trouble; like if you want to add a new index to an existing collection that has millions of docs and you want more control over when it's created.
Although I agree with the accepted answer, its worth noting that according to the MongoDB manual, this isn't the recommended way of adding indexes on a production server:
If your application includes ensureIndex() operations, and an index doesn’t exist for other operational concerns, building the index can have a severe impact on the performance of the database.
To avoid performance issues, make sure that your application checks for the indexes at start up using the getIndexes() method or the equivalent method for your driver and terminates if the proper indexes do not exist. Always build indexes in production instances using separate application code, during designated maintenance windows.
Of course, it really depends on how your application is structured and deployed. If you are deploying to Heroku, for example, and you aren't using Heroku's preboot feature, then it is likely your application is not serving requests at all during startup, and so it's probably safe to create an index at that time.
In addition to this, from the accepted answer:
So it only has an effect on performance when you're first creating the index, and at that time the collections are often empty so creating an index would be quick anyway.
If you've managed to get your data model and queries nailed on first time around, this is fine, and often the case. However, if you are adding new functionality to your app, with a new DB query on a property without an index, you'll often find yourself adding an index to a collection containing many existing documents.
This is the time when you need to be careful about adding indexes, and carefully consider the performance implications of doing so. For example, you could create the index in the background:
db.ensureIndex({ name: 1 }, { background: true });
use this block code to handle production mode:
const autoIndex = process.env.NODE_ENV !== 'production';
mongoose.connect('mongodb://localhost/collection', { autoIndex });
I read somewhere that calling ensureIndex() actually creates a collection if it does not exist. But the index is always on some fields, not all of them, so if I ensure an index on say { name:1 } and then add documents to that collection that have many more fields, the index will work? I know we don't have a schema, coming from RDBMS world I just want to make sure. :) I'd like to create indexes when my website starts, but initially the database is empty. I do not need to have any data prior to ensuring indexes, is that correct?
ensureIndex will create the collection if it does not yet exist. It does not matter if you add documents that don't have the property that the index covers, you just can't use that index to find those documents. The way I understand it is that in versions before 1.7.4 a document that is missing a property for which there is an index will be indexed as though it had that property, but will a null value. In versions after 1.7.4 you can create sparse indexes that don't include these objects at all. The difference is slight but may be significant in some situations.
Depending on the circumstances it may not be a good idea to create indexes when the app starts. Consider the situation where you deploy a new version which adds new indexes when it starts up, in development you will not notice this as you only have a small database, but in production you may have a huge database and adding the index will take a lot of time. During the index creation your app will hang and you can't serve requests. You can create indexes with the background flag set to true (the syntax depends on which driver you're using), but in most cases it's better to add indexes manually, or as part of a setup script. That way you will have to think before you update indexes.
Deprecated since version 3.0: db.collection.ensureIndex() has been
replaced by db.collection.createIndex().
Ref: https://docs.mongodb.com/manual/reference/method/db.collection.ensureIndex/