My requirement is to update or add an array field in a large collection. I have an index on the field "Roles". Updating this collection takes around 3 minutes; before creating the index on the "Roles" field, it took less than 40 seconds to update/add fields in the collection. We need the index to read the collection, but it causes trouble during updates. Is it possible to disable an index while updating in MongoDB? Is there any function available in Mongo for this? My MongoDB version is 2.6.5.
Please advise.
In MongoDB, indexes are updated synchronously with each insert/update. There is no way to pause index maintenance.
If your indexes are already created, you have two options:
1. Drop the index and recreate it afterwards, but this has the following impacts:
- Queries executed while the insert/update is happening will miss the index.
- Rebuilding the index is expensive.
2. Wait for the index to be updated on every write.
Queries will not use partially-built indexes: the index will only be usable once the index build is complete.
Source: http://docs.mongodb.org/manual/core/index-creation/
That means your index will block any query on the field/collection as long as the index build is not complete. Therefore you have no choice but to wait for the index to be updated after adding new data.
Alternatively, consider using a different index that better fits your workload.
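If you do choose the drop-and-recreate route, a minimal PyMongo sketch might look like the following (the database, collection, and update are assumptions for illustration):

from pymongo import MongoClient

client = MongoClient()
coll = client.mydb.users  # hypothetical database/collection names

# Drop the index on "Roles" so the bulk write skips index maintenance.
coll.drop_index([("Roles", 1)])

# ... perform the large update/add of the array field here ...
coll.update_many({}, {"$push": {"Roles": "newRole"}})

# Recreate the index; queries issued before the build completes miss it.
coll.create_index([("Roles", 1)])

Keep the trade-off from above in mind: between the drop and the recreate, reads on "Roles" fall back to collection scans.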
Related
I have 13GB of documents in a collection in MongoDB where I need to update a field ip_address. The original values and the replacement values are given in an Excel sheet. I am looping through each value from Excel and updating it using:
old_value = {"ip_address": original_value}
new_value = {"$set": {"ip_address": replacement_value}}
tableConnection.update_many(old_value, new_value)
Processing one update takes over 2 minutes, and I have 1500 updates to do. Is there a better way to do this?
Bulk operations won't speed up your updates by much; the best way to achieve a performance increase is to add an index. This can be as simple as:
db.collection.createIndex({'ip_address': 1})
Refer to the documentation regarding potential blocking on certain older versions of the database: https://docs.mongodb.com/manual/reference/method/db.collection.createIndex/
The index will take up additional storage; if that is an issue, you can delete the index once you've completed the updates.
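Putting that together with the loop from the question, a sketch (PyMongo; the collection variable comes from the question, the Excel-loading part is a placeholder):

from pymongo import MongoClient

client = MongoClient()
tableConnection = client.mydb.servers  # hypothetical database/collection names

# One-time index so each update_many finds matching documents quickly.
tableConnection.create_index([("ip_address", 1)])

# Placeholder: in practice, load (original, replacement) pairs from Excel.
pairs = [("10.0.0.1", "10.1.0.1"), ("10.0.0.2", "10.1.0.2")]

for original_value, replacement_value in pairs:
    tableConnection.update_many(
        {"ip_address": original_value},
        {"$set": {"ip_address": replacement_value}},
    )

# Optional: reclaim the index storage once the updates are done.
tableConnection.drop_index([("ip_address", 1)])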
To add on to the above answer given by Belly Buster, the syntax to perform indexing and bulk_write in PyMongo that worked for me is:
from pymongo import UpdateMany
from pymongo.errors import BulkWriteError

db.collection.create_index("ip_address")

requests = [
    UpdateMany({'ip_address': 'xx.xx.xx.xx'}, {'$set': {'ip_address': 'yy.yy.yy.yy'}}),
    UpdateMany({'ip_address': 'xx.xx.xx.xx'}, {'$set': {'ip_address': 'yy.yy.yy.yy'}}),
]

try:
    db.collection.bulk_write(requests, ordered=False)
except BulkWriteError as bwe:
    print(bwe)
I have a collection with several billion documents and need to create a unique multi-key index for every attribute of my documents.
The problem is, I get an error if I try to do that because the generated keys would be too large.
pymongo.errors.OperationFailure: WiredTigerIndex::insert: key too large to index, failing
I found out MongoDB lets you create hashed indexes, which would resolve this problem; however, they cannot be used for multi-key indexes.
How can I resolve this?
My first idea was to add another attribute to each of my documents containing a hash of every one of its attribute values, and then create an index on that new field.
However, this would mean recalculating the hash every time I add a new attribute, plus the excessive amount of time needed to create both the hashes and the index.
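To illustrate, a rough sketch of that hash-field idea (all names are placeholders):

import hashlib
from pymongo import MongoClient

client = MongoClient()
coll = client.mydb.items  # hypothetical database/collection names

def attribute_hash(doc):
    # Hash a canonical rendering of all attribute values except _id.
    values = [str(doc[k]) for k in sorted(doc) if k != "_id"]
    return hashlib.sha256("|".join(values).encode()).hexdigest()

# Backfill the hash field; on billions of documents this alone is costly.
for doc in coll.find():
    coll.update_one({"_id": doc["_id"]},
                    {"$set": {"attr_hash": attribute_hash(doc)}})

# A unique index on the fixed-size hash stays well under the 1024-byte limit.
coll.create_index("attr_hash", unique=True)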
This is a restriction added in MongoDB 2.6 to prevent the total size of an index entry from exceeding 1024 bytes (also known as the Index Key Length Limit).
In MongoDB 2.6, if you attempt to insert or update a document so that the value of an indexed field is longer than the Index Key Length Limit, the operation will fail and return an error to the client. In previous versions of MongoDB, these operations would successfully insert or modify a document but the index or indexes would not include references to the document.
For migration purposes and other temporary scenarios, you can revert to the 2.4 handling of this case, where the exception is not triggered, by setting this MongoDB server flag:
db.getSiblingDB('admin').runCommand( { setParameter: 1, failIndexKeyTooLong: false } )
This however is not recommended.
Also consider that creating indexes for every attribute of your documents may not be the optimal solution at all.
Have you examined how you query your documents and which fields you key on? Have you used explain to view the query plan? It would be an exception to the rule if you really query on all fields all the time.
Here are the recommended MongoDB indexing strategies.
Excessive indexing has a price as well and should be avoided.
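As a quick check, a PyMongo sketch for inspecting a query plan (collection and filter are placeholders):

from pymongo import MongoClient

client = MongoClient()
coll = client.mydb.items  # hypothetical database/collection names

# explain() reports the winning plan, e.g. an index scan (IXSCAN)
# versus a full collection scan (COLLSCAN) for this filter.
plan = coll.find({"some_field": "some_value"}).explain()
print(plan["queryPlanner"]["winningPlan"])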
I've read the MongoDB docs and learned that if I want to do text search I should create an index. But can I index each piece of data during the insert process? If I can, how should I do it?
The index is created once per collection, e.g. for a text index do this:
db.yourCollection.createIndex({yourText:"text"})
The index is updated automatically on every insert operation. You don't have to do this manually, but keep it in mind: this is part of what makes insert operations expensive. The more indexes you have, the longer it takes to insert a document and update all of them.
If you want a link to the documentation, this topic is covered here.
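For completeness, a minimal PyMongo sketch (database, collection, and field names are assumptions):

from pymongo import MongoClient

client = MongoClient()
coll = client.mydb.articles  # hypothetical database/collection names

# Created once; MongoDB maintains it automatically on every insert/update.
coll.create_index([("yourText", "text")])

coll.insert_one({"yourText": "full text search in mongodb"})

# $text queries use the text index; no per-insert work is required.
for doc in coll.find({"$text": {"$search": "mongodb"}}):
    print(doc)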
I created a compound index on my db using:
db.collection.ensureIndex({a:1,b:1})
Now I realized I need another level in the composition:
db.collection.ensureIndex({a:1,b:1,c:1})
Will MongoDB create a whole new index, or will it know to modify the existing one?
Calling ensureIndex with different fields will create a new index.
You can confirm this after running both commands by using getIndexes to see what indexes exist for your collection:
db.collection.getIndexes()
If you no longer want the original index, you can drop it using:
db.collection.dropIndex({a:1,b:1})
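In PyMongo the equivalent check could look like this (database/collection names are placeholders):

from pymongo import MongoClient

client = MongoClient()
coll = client.mydb.things  # hypothetical database/collection names

coll.create_index([("a", 1), ("b", 1)])
coll.create_index([("a", 1), ("b", 1), ("c", 1)])

# Both compound indexes now exist side by side.
print(coll.index_information())

# Drop the narrower index if it is no longer needed; the {a,b,c} index
# can also serve queries on its {a,b} prefix.
coll.drop_index([("a", 1), ("b", 1)])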
I am testing a small example for a sharded setup and I notice that updating an embedded field is slower when the search fields are indexed.
I know that indexes are updated during inserts but are the search indexes of the query also updated?
The query for the update and the fields that are updated are not related in any manner.
e.g. (tested with toy data):
{
    id: ...,                        (sharded on the id)
    embedded: [
        {
            'a': ..., 'b': ..., 'c': ...,   (indexed on a, b, c)
            data: ...                       (data is what gets updated)
        },
        ...
    ]
}
In the example above, the query for the update is on a, b, c, and the values for the update affect only data.
The only reason I can think of is that indexes are updated even if the updates do not touch the indexed fields. The search part of the update does seem to use the indexes, judging from issuing the equivalent "find" query with explain.
Could there be another reason?
I think wdberkeley, in the comments, gives the best explanation.
The document moves because it grows larger and the indexes are updated every time.
As he also notes, updating multiple keys is "bad"... I think I will avoid this design for now.
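For anyone who wants to verify this themselves, the explain command can also report the plan for the query portion of an update (a sketch, assuming MongoDB 3.0+ and placeholder names throughout):

from pymongo import MongoClient

client = MongoClient()
db = client.mydb  # hypothetical database name

# Explain the query portion of an update without executing it.
plan = db.command({
    "explain": {
        "update": "mycollection",
        "updates": [{
            "q": {"embedded.a": 1, "embedded.b": 2, "embedded.c": 3},
            "u": {"$set": {"embedded.$.data": "new"}},
        }],
    },
    "verbosity": "queryPlanner",
})
print(plan["queryPlanner"]["winningPlan"])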