I have a huge collection, more than 2 TiB of data. When releasing a new feature, I add an index on a new field that I am 100% sure does not exist in any document. MongoDB will still perform a full collection scan for this field, which may take a long time.
Is there any hack to manually create an empty index file with a valid structure and notify the MongoDB node about it, so that it loads the index into memory and does everything else MongoDB does when an index is created?
Unlike an RDBMS, MongoDB also creates indexes on non-existing fields, i.e. it scans the entire collection.
Index creation runs in the background, so it should not hurt much.
See createIndexes
Changed in version 4.2.
For feature compatibility version (fcv) "4.2", all index builds use an optimized build process that holds the exclusive lock only at the beginning and end of the build process. The rest of the build process yields to interleaving read and write operations. MongoDB ignores the background option if specified.
If you run a MongoDB version earlier than 4.2, you can specify the option { background: true }
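A minimal sketch of how a deployment script might choose the options, based on the quoted documentation. The helper name indexBuildOptions and the field name newField are made up for illustration, and the sketch assumes the feature compatibility version matches the server version string, which is not always the case:

```javascript
// Hypothetical helper: decide which options to pass to createIndex
// based on the server version string (e.g. the value of db.version()).
// Assumption: the feature compatibility version equals the server version.
function indexBuildOptions(serverVersion) {
  const [major, minor] = serverVersion.split(".").map(Number);
  const before42 = major < 4 || (major === 4 && minor < 2);
  // From 4.2 on the background option is ignored, so leave it out.
  return before42 ? { background: true } : {};
}

// Possible usage in the mongo shell (newField is a placeholder name):
// db.collectionName.createIndex({ newField: 1 }, indexBuildOptions(db.version()));
```

On 4.2 and later the helper returns an empty options object, since the optimized build process is used regardless.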
Related
We're using MongoDB 4.0 with Spring Data MongoDB, and we noticed that when doing some housekeeping by batch-deleting millions of documents with the external tool Studio3T, all index entries on all indexes stayed untouched. I read a lot of MongoDB documentation on this but couldn't find any reference to this behavior.
If this code does not trigger an index update, then which code does?
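import java.time.LocalDateTime;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;

// mongoTemplate is an injected MongoTemplate
Query query = new Query();
query.addCriteria(Criteria.where("modifiedAt").lte(LocalDateTime.now()));
// Does not remove index entries
mongoTemplate.findAllAndRemove(query, MyModel.class);
// Neither does this
mongoTemplate.remove(query, MyModel.class);
// Nor this
mongoTemplate.findAll(MyModel.class).forEach(mongoTemplate::remove);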
Having an effective mechanism for removing documents for housekeeping purposes, with their index entries removed at the same time, is important to us because the index size is growing and no longer fits in memory. We would therefore have to scale up our hardware, which is unnecessarily expensive.
I know there are ways to trigger this manually, e.g. dropping the indexes and recreating them, or using the compact administrative command. However, in a 24/7 online shop use case this seems rather impractical.
There are existing collections in MongoDB that need to be programmatically updated with new indexes.
So there is an admin web API in my ASP.NET application that, when invoked, calls the create index API in MongoDB. To avoid impact from the index build, it is performed in the background.
It is not known whether the existing data is valid for the index definition: MongoDB imposes an index key size limit of 1024 bytes, and the values of the indexed fields in some existing documents may add up to more than 1024 bytes.
So the question is: what happens when the index build fails because of this?
Also, how can I programmatically (with the C# driver) find the status of the index build operation at a later point in time?
According to the MongoDB Documentation
MongoDB will not create an index on a collection if the index entry for an existing document exceeds the index key limit. Previous versions of MongoDB would create the index but not index such documents.
This means that, whether built in the background or the foreground, an index key that is too long will cause the creation to fail. However, no matter how you create the index, the session issuing the create index command will block. So if the index build fails, you should be notified by an exception thrown while awaiting the task returned by the Indexes.CreateManyAsync() method.
Since you are unsure whether the data will be affected by the maximum key length, I strongly suggest you test this in a pre-production environment before attempting it in production. Since production is (I assume) active, the pre-production environment won't match the data exactly (writes are still happening), but testing there will reduce the chance of hitting a failed index build in production.
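As part of such a pre-production test, a rough pre-check over exported documents could flag candidates before the build is attempted. The following is only a sketch for Node.js: the function names are made up, only string values are measured exactly, and the real index entry also includes BSON overhead, so documents somewhat below the threshold may still fail.

```javascript
// Index key limit mentioned in the question, in bytes.
const INDEX_KEY_LIMIT_BYTES = 1024;

// Rough size of one indexed value in bytes. Only strings are measured
// precisely; other types get a hypothetical flat 16-byte estimate.
function approxValueBytes(value) {
  if (typeof value === "string") return Buffer.byteLength(value, "utf8");
  if (value === null || value === undefined) return 0;
  return 16;
}

// Return the documents whose indexed fields add up to at least the limit.
function findOversizedDocs(docs, indexedFields, limit = INDEX_KEY_LIMIT_BYTES) {
  return docs.filter((doc) => {
    const total = indexedFields.reduce(
      (sum, field) => sum + approxValueBytes(doc[field]),
      0
    );
    return total >= limit;
  });
}
```

Passing a lower limit (for example 900) leaves some headroom for the overhead this estimate ignores.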
Additionally, even if the index can be built, future writes that exceed the key length will be rejected. This can be avoided by setting the failIndexKeyTooLong server parameter to false. However, this has its own set of caveats. Specifically,
Setting failIndexKeyTooLong to false is a temporary workaround, not a permanent solution to the problem of oversized index keys. With failIndexKeyTooLong set to false, queries can return incomplete results if they use indexes that skip over documents whose indexed fields exceed the Index Key Length Limit.
I strongly suggest you read and understand those docs before implementing that particular parameter.
In general, many consider it bad practice to build an index at run-time. If the collection is empty, this is not a big deal; on a collection with a large amount of data, however, it can cause the create command to block for quite some time. This is especially true on a busy mongod when creating the index in the background.
If you are building this index on a Replica Set or Sharded Cluster, I strongly recommend you take a look at the documentation specific to those use cases before implementing the build in code.
A MongoDB collection is slow to return data because it has grown huge over time.
I need to add an index on a few fields and have it take effect in searches immediately. So I am seeking clarification on the following:
Is it mandatory to restart MongoDB after indexing?
If yes, then is there any way to add index without restarting the server? I don't want any downtime...
MongoDB does not need to be restarted after indexing.
However, by default, the createIndex operation blocks read/write on the affected database (note that it is not only the collection but the db). You may change the behaviour using background mode like this:
db.collectionName.createIndex( { collectionKey: 1 }, { background: true } )
It might seem that your client is blocked when creating the index. The mongo shell session or connection where you are creating the index will block, but if there are more connections to the database, these will still be able to query and operate on the database.
Docs: https://docs.mongodb.com/manual/core/index-creation/
There is no need to restart MongoDB after you add an index!
However, an index may be created in the foreground, which is the default.
What does that mean? The MongoDB documentation states: ‘By default, creating an index on a populated collection blocks all other operations on a database. When building an index on a populated collection, the database that holds the collection is unavailable for read or write operations until the index build completes. Any operation that requires a read or write lock on all databases will wait for the foreground index build to complete’.
For potentially long-running index building operations on standalone deployments, the background option should be used. In that case, the MongoDB database remains available during the index building operation.
To create an index in the background, pass { background: true } as the options argument to createIndex.
What happens when I add a new document to my collection and run the createIndex() function? Does MongoDB build a new index over the whole collection (time and resource consuming)? Or does MongoDB just update the index for the single document (time and resource saving)? I am not sure because I found this in the documentation (3.0):
By default, MongoDB builds indexes in the foreground, which prevents all read and write operations to the database while the index builds. Also, no operation that requires a read or write lock on all databases (e.g. listDatabases) can occur during a foreground index build.
I am asking because I need a dynamic 2dsphere index that will be updated continuously. If the index had to be rebuilt over the whole collection every time, it would take too much time.
Building an index indexes all existing documents and also causes all future documents to be indexed. So if you insert new documents into an already indexed collection (or update the indexed fields of existing documents), the indexes will be updated.
When you call createIndex and an index of the same type over the same fields already exists, nothing happens.
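The difference between building an index once and maintaining it on each write can be pictured with a toy in-memory model. This is purely an illustration and not how MongoDB implements its indexes; the class ToyCollection and all of its methods are made up:

```javascript
// Toy model: a "collection" with simple single-field indexes.
class ToyCollection {
  constructor() {
    this.docs = [];
    this.indexes = new Map(); // field name -> Map(value -> array of docs)
  }

  // Building an index walks every existing document once.
  // Calling it again for an already indexed field does nothing.
  createIndex(field) {
    if (this.indexes.has(field)) return false;
    const index = new Map();
    for (const doc of this.docs) addToIndex(index, field, doc);
    this.indexes.set(field, index);
    return true;
  }

  // An insert only adds the entries for the new document.
  insert(doc) {
    this.docs.push(doc);
    for (const [field, index] of this.indexes) addToIndex(index, field, doc);
  }

  // Look up documents through the index of the given field.
  findByIndex(field, value) {
    const index = this.indexes.get(field);
    return index ? index.get(value) || [] : [];
  }
}

function addToIndex(index, field, doc) {
  const key = doc[field];
  if (!index.has(key)) index.set(key, []);
  index.get(key).push(doc);
}
```

In this model the full pass over the documents happens only in the first createIndex call; later inserts are found through the index without any rebuild.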
Can I modify an existing index in MongoDB without dropping it? I don't see anything about it in the documentation.
I have a non-unique index on a String field. The collection has ~6M documents. It's a replica set.
I know I can delete the index and add a new one. But that is problematic for two reasons:
1) while the index doesn't exist, some queries will be very slow.
2) adding a new index (in my project) creates a very high load on the DB, which visibly slows down my website.
There is no way to alter an index as you describe, and if there were, I think the outcome in terms of performance would be similar. For example, how would the database use the half-created or half-altered index while the operation was in progress?
Instead I would recommend using the background option to build the index on a single node, if that is your configuration - it will take longer but will not interfere with your normal operation as much. Once it is finished you can drop the old index at your leisure.
However, if you have a replica set (recommended) you should be aware that index creation is always (currently) done in the foreground on the secondary. If you want to avoid load on your secondaries, then you should follow the steps outlined here to take a member out one at a time and build the index required before rejoining the set:
http://docs.mongodb.org/manual/administration/indexes/#index-building-replica-sets
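The order of such a rolling build can be sketched as a small helper. The function name rollingBuildOrder is made up; the input is assumed to look like the members array returned by rs.status(), of which only the name and stateStr fields are used. Putting the primary last, after it has been stepped down, follows the usual rolling procedure and is an assumption beyond what is stated above:

```javascript
// members: array shaped like rs.status().members, using only the
// `name` and `stateStr` fields of each entry.
function rollingBuildOrder(members) {
  const secondaries = members.filter((m) => m.stateStr === "SECONDARY");
  const primaries = members.filter((m) => m.stateStr === "PRIMARY");
  // Secondaries are handled one at a time; the primary goes last,
  // after it has been stepped down.
  return [...secondaries, ...primaries].map((m) => m.name);
}

// Possible usage in the mongo shell:
// rollingBuildOrder(rs.status().members).forEach((host) => print(host));
```

Members in other states (for example arbiters) are left out, since they hold no data to index.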
Update
Background index builds on secondaries will be possible starting with the 2.6 release (see release notes for details). This is not going to be backported to prior versions, so the above note will be true for versions prior to 2.6.
Finally, as a general note, indexes built in the background will generally be larger and less efficient than those built in the foreground, so the methodology above will still have its uses.