When are MongoDB indexes updated?

Question
Are mongodb indexes updated before the success of a write operation is reported to the application or do index updates run in the background? If they run in the background: is there a way to wait for an index update to complete?
Background
I have a document
person1obj = {
  email: 'user@domain.tld',
  [...]
}
in a people collection where a unique index is applied to the email field. Now I'd like to insert another document
person2obj = {
  email: 'user@domain.tld',
  [...]
}
Obviously, I have to change the email field of person1 before person2 can be inserted. With mongoose, the code looks like:
mongoose.model('Person').create(person1obj, function (err, person1) {
  // person1 has been saved to the db and 'user@domain.tld' is
  // added to the *unique* email field index

  // change email for person1 and save
  person1.email = 'otheruser@domain.tld';
  person1.save(function (err, person1) {
    // person1 has been updated in the db
    // QUESTION: is it guaranteed that 'user@domain.tld' has been removed from
    // the index?
    // inserting person2 could fail if the index has not yet been updated
    mongoose.model('Person').create(person2obj, function (err, person2) {
      // ...
    });
  });
});
I have seen a random fail of my unit tests with the error E11000 duplicate key error index which made me wonder if index updates run in the background.
This question is probably related to MongoDB's write concern, but I couldn't find any documentation on the actual process for index updates.

From the FAQ:
How do write operations affect indexes?
Any write operation that alters an indexed field requires an update to the index in addition to the document itself. If you update a document that causes the document to grow beyond the allotted record size, then MongoDB must update all indexes that include this document as part of the update operation.
Therefore, if your application is write-heavy, creating too many indexes might affect performance.

At the very least in the case of unique indexes, the indexing does not run in the background. This is evident from the fact that when you try to write a new document with a duplicate key on a field that is supposed to be unique, you get a duplicate key error right away.
If indexing were to happen asynchronously in the background, Mongo would not be able to tell whether the write actually succeeded. Thus the indexing must happen during the write sequence.
While I have no evidence for this (though Mongo is open source; if you have enough time you can look it up), I believe that all indexing is done during the write sequence, even if it's not a unique index. It wouldn't make sense to have special logic only for writes that affect a unique index.
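You can see this synchronous behavior directly in the shell; a minimal sketch (the people collection name is taken from the question above):

db.people.createIndex({ email: 1 }, { unique: true })
db.people.insert({ email: 'user@domain.tld' })  // reported as successful
db.people.insert({ email: 'user@domain.tld' })  // fails immediately with E11000, not at some later time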

Related

Is it possible to run a "dummy" query to see how many documents _would_ be inserted

I am using MongoDB to track unique views of a resource.
Every time a user views a specific resource for the first time, a new view is logged in the db.
If that same user views the same resource again, the unique compound index on the collection blocks the insert of the duplicate.
For bulk inserts, with { ordered: false }, Mongo allows the new views through and blocks the duplicates. The return value of the insert is an object with an insertedCount property, telling me how many docs made it past the unique index.
In some cases, I want to know how many docs would be inserted before running the query. Then, based on the dummy insertedCount, I would choose to run the query, or not.
Is there a way to test a query and have it do everything except actually inserting the docs?
I could solve this by running some JS server-side to get the answer I need, but I would prefer to let the db do those checks.
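For reference, a minimal sketch of the unordered insert pattern described above with the Node.js driver (the views collection, the field names, and the countNewViews helper are assumptions; exactly how the partial result is exposed on the error varies by driver version):

async function countNewViews(db, docs) {
  // a unique compound index on { userId: 1, resourceId: 1 } is assumed
  try {
    const result = await db.collection('views').insertMany(docs, { ordered: false });
    return result.insertedCount;  // every document was new
  } catch (err) {
    // duplicates raise a bulk-write error, but the non-duplicate documents were still inserted
    return err.result ? err.result.insertedCount : 0;
  }
}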

How do I remove the duplicate key check before insert in MongoDB?

I use MongoDB to manage the DB for a yearly contest. Every year many users just renew their registration. MongoDB rejects duplicate emails, therefore they cannot register if they participated in any edition of earlier years.
My question: is there any way to remove that limitation? Or maybe change the dup-key check to use e.g. the "_id" (or whatever) instead of the "email"?
Apart from the mandatory _id field, MongoDB will only enforce uniqueness based on additional unique indexes that have been created. In your situation it sounds like there may be a unique index defined on { email: 1 }.
If that is not the logic that you wish to enforce, then you should drop that index and replace it with a different one. How exactly you define that really depends on your desired application logic. If you had a registrationYear field, for example, perhaps a compound unique index on both of those fields ({ email: 1, registrationYear: 1 }) would be appropriate.
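For example, the compound variant could be created like this (collection name is illustrative):

db.foo.createIndex({ email: 1, registrationYear: 1 }, { unique: true })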
But this probably isn't the only way to solve the problem. An alternative approach may be to combine a unique index with a partial index. With this approach, you could define an index as follows, assuming that there is some active field in the document that becomes false after the specified amount of time:
db.foo.createIndex({ email: 1 }, { unique: true, partialFilterExpression: { active: true } })
Such an index would only include documents that are currently considered active, therefore only enforcing uniqueness on them. Once it was time to renew and an old document was no longer active, the database would accept a new one.
Another alternative approach would be to just update the existing documents rather than creating new ones. Again this depends on what exactly you are trying to achieve, but you could use a similar approach of marking a document as no longer active and having the registration process perform an upsert (either an insert or an update).
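A minimal sketch of that upsert in the shell (field names are assumptions):

db.foo.updateOne(
  { email: 'user@domain.tld' },
  { $set: { active: true, registeredAt: new Date() } },
  { upsert: true }  // updates the existing registration, or inserts a new one if none matches
)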
Finally, if you don't need historical information at all then you could additionally do some sort of archival or deletion (perhaps via a TTL index) to expire the old documents.
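For the expiry route, a TTL index could look like this (the registeredAt field and the one-year window are assumptions):

// delete registration documents one year after their registeredAt timestamp
db.foo.createIndex({ registeredAt: 1 }, { expireAfterSeconds: 31536000 })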

MongoDB background indexing and unique index

When you create an index in MongoDB, there are two options:
Do foreground indexing and lock all write operations while doing so
Do background indexing and still allow records to be written in the meantime
My question is:
How can something like a unique index be built in the background? What if a duplicate document is inserted while the index is building?
I believe this is the most relevant excerpt from the MongoDB docs:
Background indexing operations run in the background so that other database operations can run while creating the index. However, the mongo shell session or connection where you are creating the index will block until the index build is complete. To continue issuing commands to the database, open another connection or mongo instance.
Queries will not use partially-built indexes: the index will only be usable once the index build is complete.
So this means the client where you issued the command to create the index will remain blocked until the index is fully created. If, from another client, you add a duplicate document while the index is being built, that insert will succeed without an error, but eventually your initial client will get an error saying the index build could not be completed because of a duplicate key on the unique index.
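For reference, a background build of a unique index is requested at creation time, e.g. (collection name is illustrative; the background option applies to MongoDB versions before 4.2, where it is ignored):

db.people.createIndex({ email: 1 }, { unique: true, background: true })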
Now, I actually ended up here while trying to understand what MongoID's index(..., {background: true}) option does. It seems to imply that every write may perform the indexing portion of the write in the background, but my understanding now is that this option only applies to the initial creation of the index. This is explained in the introduction to the docs for the background option of MongoDB's createIndex method (which is not technically the same thing as MongoID's background option, but it clarifies the concept that option relates to):
MongoDB provides several options that only affect the creation of the index [...] This section describes the uses of these creation options and their behavior.
Related: Some options that you can specify to createIndex() control the properties of the index rather than its creation; these are not index creation options. For example, the unique option affects the behavior of the index after creation.
Referring to the MongoDB docs:
If a background index build is in progress when the mongod process terminates, when the instance restarts the index build will restart as foreground index build. If the index build encounters any errors, such as a duplicate key error, the mongod will exit with an error.
So there are two possibilities:
If index creation has completed, then the document you are trying to insert will give you an instant error.
Or, if index creation is still in progress in the background, then you will be able to insert the document (because at the time of insertion the index is not fully there). But later, when the index creation process tries to index your duplicate document, it will exit with an error. This is the same behavior as if you already had duplicate documents and tried to create a foreground index.
@mltsy
If, from another client, you're doing something like adding a duplicate document while the index is being built, it will insert the document without an error.
I am not sure this is correct, as the MongoDB docs describe it as follows:
When building an index on a collection, the database that holds the collection is unavailable for read or write operations until the index build completes.
I used mongoose to test this:
var uniqueUsernameSchema = new Schema({
  username: {
    index: { unique: true, background: false },
    type: String,
    required: true
  }
})
var U = mongoose.model('U1', uniqueUsernameSchema)
var dup = [{ username: 'Val' }, { username: 'Val' }]
U.create(dup, function (e, d) {
  console.log(e, d)
})
The unique index failed to build. This result showed that the foreground option did not block the write operation in MongoDB.

MongoDB update is slower with relevant indexes set

I am testing a small example on a sharded setup and I notice that updating an embedded field is slower when the search fields are indexed.
I know that indexes are updated during inserts, but are the search indexes of the query also updated?
The query for the update and the fields that are updated are not related in any manner.
e.g. (tested with toy data):
{
  id: ...,                      // sharded on the id
  embedded: [
    { a: ..., b: ..., c: ...,   // indexed on a, b, c
      data: ...                 // data is what gets updated
    },
    ...
  ]
}
In the example above, the query for the update is on a, b and c, and the values for the update affect only the data field.
The only reason I can think of is that indexes are updated even if the updates are not on the indexed fields. The search part of the update does seem to use the indexes when issuing a "find" query with explain.
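For example, the index usage of the equivalent find can be checked like this (collection name is illustrative; field paths follow the example above):

db.coll.find({ 'embedded.a': 1, 'embedded.b': 2, 'embedded.c': 3 }).explain()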
Could there be another reason?
I think wdberkeley, in the comments, gives the best explanation:
The document moves because it grows larger, and the indexes are updated every time.
As he also notes, updating multiple keys is "bad"... I think I will avoid this design for now.

Mongo find unique results

What's the easiest way to get all the documents from a collection that are unique based on a single field?
I know I can use db.collection.distinct to get an array of all the distinct values of a field, but I want to get the first (or really any one) document for every distinct value of one field.
e.g. if the database contained:
{number:1, data:'Test 1'}
{number:1, data:'This is something else'}
{number:2, data:'I\'m bad at examples'}
{number:3, data:'I guess there\'s room for one more'}
it would return (based on number being unique):
{number:1, data:'Test 1'}
{number:2, data:'I\'m bad at examples'}
{number:3, data:'I guess there\'s room for one more'}
Edit: I should add that the server is running Mongo 2.0.8, so no aggregation, and there are more results than group will support.
Update to 2.4 and use aggregation :)
When you really need to stick to an old version of MongoDB because there is too much red tape involved, you could use MapReduce.
In MapReduce, the map function transforms each document of the collection into a new document and a distinctive key. The reduce function is used to merge documents with the same distinctive key into one.
Your map function would emit your documents as-is, with the number field as the unique key. It would look like this:
var mapFunction = function(document) {
  emit(document.number, document);
}
Your reduce function receives arrays of documents with the same key and is supposed to somehow turn them into one document. In this case it would just discard all but the first document for each key:
var reduceFunction = function(key, documents) {
  return documents[0];
}
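Putting it together in the shell (collection name is an assumption); with inline output the result is returned directly instead of being written to a collection:

db.mycollection.mapReduce(mapFunction, reduceFunction, { out: { inline: 1 } })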
Unfortunately, MapReduce has some problems. It can't use indexes, so at least two JavaScript functions are executed for every single document in the collection (this can be limited by pre-excluding some documents with the query argument to the mapReduce command). When you have a large collection, this can take a while. You also can't fully control how the documents created by MapReduce are formed. They always have two fields: _id with the key and value with the document you returned for that key.
MapReduce is also hard to debug and troubleshoot.
tl;dr: Update to 2.4