I am using the MongoDB Java driver 3.11.1 and MongoDB 4.2.0 for my development. I am still learning MongoDB. My application receives data and has to either insert a new document or replace an existing one, i.e. do an upsert.
Each document is currently 780-1000 bytes, and each collection can have more than 3 million records.
Approach 1: I tried using findOneAndReplace for each document, and it took more than 15 minutes to save the data.
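For reference, the per-document call looked roughly like the sketch below (mongoCollection, dataList and the Data type are the same ones used in the Approach 2 snippet; the upsert option is an assumption, since the goal is replace-or-insert):
// One findOneAndReplace round trip per record; upsert(true) inserts the document when no DataId matches.
FindOneAndReplaceOptions options = new FindOneAndReplaceOptions().upsert(true);
for (Data data : dataList) {
    mongoCollection.findOneAndReplace(
            eq("DataId", data.getId()),
            new Document(data.getFields()),
            options);
}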
Approach 2: I changed it to bulkWrite using the code below, which took ~6-7 minutes to save 20,000 records.
List<Data> dataList;
List<WriteModel<Document>> updates = new ArrayList<>();
ReplaceOptions updateOptions = new ReplaceOptions().upsert(true); // insert when no document matches DataId
dataList.forEach(data -> {
    Document updatedDocument = new Document(data.getFields());
    updates.add(new ReplaceOneModel<>(eq("DataId", data.getId()), updatedDocument, updateOptions));
});
final BulkWriteResult bulkWriteResult = mongoCollection.bulkWrite(updates);
Approach 3: I tried using collection.insertMany, which takes 2 seconds to store the data.
As per the driver code, insertMany also internally uses MixedBulkWriteOperation to insert the data, similar to bulkWrite.
My questions are:
a) I have to do an upsert operation. Please let me know if I am making any mistakes.
- I created an index on the DataId field, but it made less than 2 milliseconds of difference in performance.
- I tried using a writeConcern of W1, but performance is still the same.
b) Why is insertMany faster than bulkWrite? I could understand a difference of a few seconds, but I cannot figure out why insertMany takes 2-3 seconds while bulkWrite takes 5-7 minutes.
c) Are there any other approaches that can be used to solve this situation?
This problem was solved to a great extent by adding an index on the DataId field. I had previously created an index on DataId but forgot to create it again after recreating the collection.
The link How to improve MongoDB insert performance helped in resolving the problem.
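For completeness, here is a minimal sketch of the approach that ended up working, assuming a MongoCollection<Document> named mongoCollection, the Data type from above, and the usual driver imports (com.mongodb.client.model.*, static Filters.eq); the unique index option and ordered(false) are assumptions, not requirements:
// Create the index once, right after (re)creating the collection, so each
// ReplaceOneModel filter on DataId is an index lookup instead of a full collection scan.
mongoCollection.createIndex(Indexes.ascending("DataId"), new IndexOptions().unique(true));

// Upsert batch: replace the document when DataId already exists, insert it otherwise.
ReplaceOptions replaceOptions = new ReplaceOptions().upsert(true);
List<WriteModel<Document>> requests = new ArrayList<>();
for (Data data : dataList) {
    requests.add(new ReplaceOneModel<>(
            eq("DataId", data.getId()),
            new Document(data.getFields()),
            replaceOptions));
}
// ordered(false) lets the server keep applying the batch even if one write fails.
BulkWriteResult result = mongoCollection.bulkWrite(requests, new BulkWriteOptions().ordered(false));
Without the index, each ReplaceOneModel filter has to scan the multi-million-document collection to find its match, which is most likely why the bulk replace took minutes while insertMany (which never has to look anything up) finished in seconds.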
Related
I have 13GB of documents in a MongoDB collection where I need to update a field, ip_address. The original values and the replacement values are given in an Excel sheet. I am looping through each value from the Excel sheet and updating it using:
old_value={"ip_address":original_value}
new_value={"$set":{"ip_address":replacement_value}
tableConnection.update_many(old_value,new_value)
Processing one update takes over 2 minutes, and I have 1500 updates to do. Is there any better way to do it?
Bulk operations won't speed up your updates by much; the best way to achieve a performance increase is to add an index. This can be as simple as:
db.collection.createIndex({'ip_address': 1})
Refer to the documentation regarding potential blocking on certain older versions of the database: https://docs.mongodb.com/manual/reference/method/db.collection.createIndex/
The index will take up additional storage; if that is an issue, you can delete the index once you've completed the updates.
To add on to the above answer given by Belly Buster, the syntax for indexing and bulk_write in PyMongo that worked for me is:
db.collection.create_index("ip_address")
from pymongo import UpdateMany
from pymongo.errors import BulkWriteError

requests = [UpdateMany({'ip_address': 'xx.xx.xx.xx'}, {'$set': {'ip_address': 'yy.yy.yy.yy'}}),
            UpdateMany({'ip_address': 'xx.xx.xx.xx'}, {'$set': {'ip_address': 'yy.yy.yy.yy'}})]
try:
    db.collection.bulk_write(requests, ordered=False)
except BulkWriteError as bwe:
    print(bwe)
Does anyone know how to create efficient indexes in MongoDB to speed up the response time? I'm getting the right response; it's just taking too much time. I need to get it below 100ms. I'm using the jenssegers/laravel-mongodb package.
Here is the response I'm getting: https://pastebin.com/GHCwaJqT. The limit is 15 rows and it takes 300ms.
$result = $this->model->skip($this->page)->limit($this->limit)->get();
After looking at your response (15 rows), it seems you are getting this data for the same user_id. If that is the case, you can create an index on user_id and use user_id as a filter in the query so that MongoDB uses this index.
I have a 30GB MongoDB 3.6 collection with 500k documents. The main _id field is a float timestamp (I didn't manually define an index, but inserted on the _id field, assuming from the documentation that _id will be used as an index and automatically maintained).
Now, to query the latest data, I do the following in Python 3:
cursor = cry.find().sort('_id', pymongo.DESCENDING).limit(600)
df = list(cursor)
However, just querying the last 600 records takes about 1 minute. How can this be if the index is maintained? Is there a faster way to query (like by natural order), or do I need to re-index, although the documentation says it's done automatically?
I also tried
cursor=cry.find().skip(cry.count() - 1000)
df = list(cursor)
but this is just as slow
My requirement is to update or add an array field in a large collection. I have an index on the field "Roles". While updating this collection it takes around 3 minutes. Before creating the index on the "Roles" field it took less than 40 seconds to update/add fields in the collection. We need the index to read the collection, but it causes trouble during updates. Is it possible to disable an index while updating in MongoDB? Are there any functions available in Mongo for this? My MongoDB version is 2.6.5.
Please advise.
In MongoDB, indexes are updated synchronously with the insert/update. There is no way to pause the updating of indexes.
If your indexes are already created, then you have two options:
1. Drop the index and recreate it, but this has the following impacts:
- Queries executed while the insert/update is happening will miss the index.
- Rebuilding the index is expensive.
2. Wait for the index to be updated.
Queries will not use partially-built indexes: the index will only be usable once the index build is complete.
Source: http://docs.mongodb.org/manual/core/index-creation/
That means the index cannot serve any query on the field/collection as long as the build is not complete. Therefore you have no choice but to wait for the index to be updated after adding new data.
Maybe try using another index.
I am testing a small example on a sharded setup, and I notice that updating an embedded field is slower when the search fields are indexed.
I know that indexes are updated during inserts, but are the search indexes of the query also updated?
The query for the update and the fields that are updated are not related in any manner.
e.g. (tested with toy data):
{
    id: ...,                          (sharded on the id)
    embedded: [
        {
            'a': ..., 'b': ..., 'c': ...,   (indexed on a, b, c)
            data: ...                       (data is what gets updated)
        },
        ...
    ]
}
In the example above, the query for the update is on a, b, c, and the values being updated affect only the data field.
The only reason I can think of is that the indexes are updated even if the updates are not on the indexed fields. The search part of the update does seem to use the indexes when I issue a "find" query with explain.
Could there be another reason?
I think wdberkeley, in the comments, gives the best explanation:
The document moves because it grows larger, and the indexes are updated every time.
As he also notes, updating multiple keys is "bad"... I think I will avoid this design for now.