MongoDB update (add field to nearly every document) is very very slow

I am working on a MongoDB cluster.
One DB named bnccdb, with a collection named AnalysedLiterature. It has about 7,000,000 documents in it.
For each document, I want to add two keys and then update this document.
I am using the Java client. I query each document, add both keys to the BasicDBObject, and then call the save() method to write the object back. I found this so slow that it would take nearly several weeks to update the whole collection.
I wonder whether the reason my update operation is so slow is that I am adding keys.
Growing the documents would cause a disk/block re-arrangement in the background, making the operation extremely time-consuming.
After I changed from save() to update(), the problem remained. From the output of mongostat it is obvious that the fault rate is very high, but I don't know what caused it.
Can anyone help me?
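For reference, the per-document query-modify-save round trip can usually be replaced by a single server-side update with $set. A minimal shell sketch, assuming the two new keys are key1 and key2 (the names and values are hypothetical placeholders):

// Add both keys to every document in one server-side pass,
// avoiding a client round trip per document.
// key1/key2 and their values are placeholders.
db.AnalysedLiterature.update(
    {},
    { $set: { key1: "value1", key2: "value2" } },
    { multi: true }
);

Growing documents can still force document moves on disk under the MMAPv1 storage engine, but this at least eliminates the per-document network round trip.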

Related

Performance loss with big size of collections

I have a collection named "test" with 132K documents in it. When I get the first document of the collection it takes 2-5 ms, but it is not the same for the last document: that one takes 100-200 ms to pull.
So I've decided to ask the community.
My questions
What is the ideal number of documents in one collection for performance?
Why does it take so long to get the last document from the collection? (Admittedly, I only partially know how Mongo works.)
What should I do for this issue and future problems?
After researching how MongoDB works, I found the solution. I didn't have any indexes on my collection, so every query scanned every document. After creating the indexes I needed, queries are much faster than before, around 1 ms.
Conclusion
Create indexes for your collection according to your needs. Indexes affect both write and read performance, so choose them deliberately. And keep researching: there are options such as background, which prevents the index build from blocking other operations.
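A minimal shell sketch of such an index build; the field name createdAt here is a hypothetical placeholder for whatever field your queries actually filter or sort on:

// Build the index in the background (on pre-4.2 servers; newer servers
// ignore this flag and avoid blocking other operations by default).
db.test.createIndex({ createdAt: 1 }, { background: true });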

MongoDB - Save vs Update [duplicate]

I have around 400 fields in my collection (both at the top level and embedded). The following is the nature of the write queries:
All write queries always update a single document, touching an average of 60 fields in that document.
There are indexed fields in the collection, but no write query updates an indexed field.
The volume of write queries is very large.
I can use either .save() or .update() to update the document. With update I pass only the fields that need to be updated, whereas with save I pass the entire document. I want to know whether using update in this case will give me better performance than save (or vice versa), or whether it makes no difference at the database level and both perform equally well.
It doesn't make any significant difference in performance. The reasons are as follows.
When you save or update a document in MongoDB, you are presumably calling save or update from another application, which could be written in C#, Java, JavaScript, PHP, or some other language.
In that case there is inter-process communication (or a network call, if MongoDB is running on another machine). Compared to this, the difference between selectively replacing fields via update and completely replacing the document via save is negligible. Incidentally, both save and update will have a run-time complexity of O(n) if there are no indexes.
For a document with a few hundred fields, the size of the document is probably not big enough to matter. If the update document is significantly smaller than the full document you would pass to save, then use update.
Otherwise, use save or update depending on which is more elegant in the client-side code.
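To make the size difference concrete, here is a hedged shell sketch; the profiles collection and its fields are hypothetical placeholders:

// Full replacement: the entire document travels over the wire and
// replaces whatever is stored under _id 1.
db.profiles.save({ _id: 1, name: "a", status: "active" /* ...hundreds more fields... */ });

// Targeted update: only the changed fields travel over the wire;
// the rest of the stored document is left untouched.
db.profiles.update({ _id: 1 }, { $set: { name: "a", status: "active" } });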

Get nth item from a collection

I'm in the learning phase of mongodb.
I have a test website project where each step of a story lives at domain.com/step;
for instance, step 14 is accessed through domain.com/14.
In other words, for the above case, I need to fetch the 14th document in my collection to serve it.
I've been using the find().skip(n).limit(1) method to return the nth document so far; however, it becomes extremely slow when there are too many documents to skip. So I need a more efficient way to get the nth document in my collection.
Any ideas are appreciated.
Add a field to your documents which tells you which step it is, add an index to that field and query by it.
Document:
{
    step: 14,
    text: "text",
    date: date,
    imageurl: "imageurl"
}
Index:
db.collection.createIndex({step:1});
Query:
db.collection.find({step:14});
Relying on natural order in the collection is not just slow (as you found out), it is also unreliable. When you start a new collection and insert a bunch of documents, you will usually find them in the order you inserted them. But when you change documents after they were inserted, it can happen that the order gets messed up in unpredictable ways. So never rely on insertion order being consistent.
Exception: Capped Collections guarantee that insertion order stays consistent. But there are very few use-cases where these are useful, and I don't think you have such a case here.
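For completeness, a capped collection has to be created explicitly; a minimal sketch (the collection name and size are placeholders):

// Capped collections guarantee that natural order matches insertion
// order, at the cost of a fixed maximum size in bytes.
db.createCollection("log", { capped: true, size: 1048576 });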

What is the preferred way to add many fields to all documents in a MongoDB collection?

I have a Python application that iterates over every document in a MongoDB (3.0.2) collection (typically between 10K and 1M documents) and adds new fields, probably doubling or tripling the number of fields in each document.
My initial thought was that I would upsert each entire revised document (using pyMongo); now I'm questioning that:
Given that the revised documents are significantly bigger should I be inserting only the new fields, or just replacing the document?
Also, is it better to perform a write to the collection on a document by document basis or in bulk?
This is actually a great question, and it can be solved in a few different ways depending on how you are managing your data.
If you are upserting additional fields, does that mean your data gains extra fields at a later point in time, with the only changes being those additional fields? If so, you could set a TTL on your documents so that the old ones drop off over time. Keep in mind that if you do this, you will want an index that sorts your results by descending _id, so that the most recent additions are selected before the older ones.
The benefit of doing it this way is that you are continually writing data, as opposed to seeking and then updating it, so it is faster.
Regarding upserts vs. bulk inserts: bulk inserts are always faster than upserts, since upserting requires you to find the original document first.
Given that the revised documents are significantly bigger should I be inserting only the new fields, or just replacing the document?
You really need to understand your data fully to determine what is best, but if the only change to the data is additional fields, or changes that only need to be considered from that point forward, then bulk inserting and setting a TTL on your older data is the better method from the standpoint of write operations, as opposed to seek, find, and update. When using this method you will want to call db.document.find_one() as opposed to db.document.find(), so that only your current record is returned.
Also, is it better to perform a write to the collection on a document by document basis or in bulk?
Bulk inserts will be faster than inserting each document sequentially.
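As a point of comparison, a hedged shell sketch of batching targeted updates with the bulk API (available since MongoDB 2.6); the collection name mycollection and the helpers computeA/computeB are hypothetical placeholders:

// Queue one targeted $set per document, then send the whole batch
// to the server instead of paying a round trip per document.
var bulk = db.mycollection.initializeUnorderedBulkOp();
db.mycollection.find().forEach(function (doc) {
    // computeA/computeB stand in for whatever derives the new fields.
    bulk.find({ _id: doc._id }).updateOne({
        $set: { newFieldA: computeA(doc), newFieldB: computeB(doc) }
    });
});
bulk.execute();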

Why mongodb only updates the first matching document in the collection?

Consider a collection student contains the following documents.
{ name: "Nithin", age: 23 }
{ name: "Nithin", age: 25 }
{ name: "Nithin", age: 28 }
{ name: "Nithin", age: 12 }
I want to update all the documents whose name is "Nithin", setting age to 60.
If we execute the following query it will only update the first matching document:
db.student.update({ name: "Nithin" }, { $set: { age: 60 } })
To update all the documents I have to use the query
db.student.update({ name: "Nithin" }, { $set: { age: 60 } }, false, true)
or
db.student.update({ name: "Nithin" }, { $set: { age: 60 } }, { multi: true })
Why, by default, does MongoDB not update all the documents when executing db.student.update({ name: "Nithin" }, { $set: { age: 60 } })? What is the motivation for requiring a separate flag to update all the documents? Does it improve performance?
Originally, in the early days of MongoDB (pre 1.1), it was not possible to update multiple documents. The capability was added around version 1.1.3.
You can see it in the release notes, New Feature 268.
I'm guessing this was not enabled by default for backwards compatibility with previous versions.
This may not really be the reason, but I see the additional multi parameter as a safeguard against accidentally updating multiple records when one intends to update only a single document, something like accidentally running UPDATE ... SET in SQL without specifying additional constraints.
Again this is just an assumption but may not really be the case.
I suppose part of the reason might be to avoid people coming from the SQL world to think about multi-document updates as isolated transactions.
In fact, during a long update MongoDB will periodically yield control to other queries which can potentially modify the same dataset.
So, by explicitly setting multi=true you are somewhat acknowledging this fact (well, not really, but I guess that's the spirit...)
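For readers on modern servers (MongoDB 3.2 and later), the same intent is spelled out explicitly by the newer helpers, which makes the single-vs-multi choice hard to miss:

// Updates exactly one matching document.
db.student.updateOne({ name: "Nithin" }, { $set: { age: 60 } });

// Updates every matching document.
db.student.updateMany({ name: "Nithin" }, { $set: { age: 60 } });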