What is the performance gain of bulk inserts vs. regular inserts in MongoDB, and in pymongo specifically? Are bulk inserts just a wrapper for regular inserts?
Bulk inserts are not just a wrapper around regular inserts. A bulk insert operation sends many documents to the server as a single batch, saving that many database round trips. It is much more performant because you don't have to send each document over the network separately.
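To make the round-trip argument concrete, here is a minimal plain-Python sketch (no real MongoDB, no driver): a fake collection that simply counts network exchanges. The class and method names mirror pymongo's `insert_one`/`insert_many` for familiarity, but this is an illustrative model, not driver code.

```python
class FakeCollection:
    """Stand-in for a driver collection that counts server round trips."""

    def __init__(self):
        self.docs = []
        self.round_trips = 0

    def insert_one(self, doc):
        self.round_trips += 1          # one network exchange per document
        self.docs.append(doc)

    def insert_many(self, docs):
        self.round_trips += 1          # whole batch travels in one message
        self.docs.extend(docs)


docs = [{"n": i} for i in range(1000)]

one_by_one = FakeCollection()
for d in docs:
    one_by_one.insert_one(d)           # 1000 round trips

bulk = FakeCollection()
bulk.insert_many(docs)                 # 1 round trip, same documents stored
```

The stored data is identical in both cases; only the number of network exchanges differs, which is where the performance gain comes from.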
@dsmilkov That's an advantage, as you don't have to open a connection every single time.
Related
I am building a bulk operation for my application, and it only consists of single-document write operations.
However, I need each operation to have MongoDB "retryable writes" enabled correctly.
So I am wondering whether an unordered bulk write works fine for this, or whether retryable writes only work with an ordered bulk operation (which would be less efficient)?
Besides, I have already added the retryable-writes option to my connection string.
Thanks in advance,
Yes, the operations can be retried in an unordered bulk operation. From the MongoDB docs:
Bulk write operations that only consist of single-document write operations [are retryable]. A retryable bulk operation can include any combination of the specified write operations but cannot include any multi-document write operations, such as updateMany.
Note that the write is only retried a single time, so the application still needs to be able to deal with write failures.
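The "retried a single time" behavior can be sketched in plain Python. This is an illustrative model of the semantics, not driver code: the function names (`retryable_write`, `TransientError`) are hypothetical, and a real driver decides internally which errors count as retryable.

```python
class TransientError(Exception):
    """Models a transient, retryable failure (e.g. a primary stepdown)."""


def retryable_write(op):
    """Run op(); on a transient error, retry exactly once.

    A second transient error propagates to the caller, which is why the
    application still has to handle write failures itself.
    """
    try:
        return op()
    except TransientError:
        return op()


calls = {"n": 0}

def flaky_write():
    """Fails on the first attempt, succeeds on the second."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise TransientError("primary stepped down")
    return "ok"


result = retryable_write(flaky_write)   # first attempt fails, retry succeeds
```

If `flaky_write` failed twice in a row, `retryable_write` would raise, so the caller must still be prepared for that outcome.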
I'm using Mongo, and I have multiple queries to insert at a time, so I use a for loop to insert into the db. The problem is that each query falls under a key, so I check whether the key exists or not: if it doesn't, I add it to the db; if it does, I append to it. If I have multiple queries with the same key, then (since Mongo inserts asynchronously) two writes with the same key could both see the key as "nonexistent" in the db, because they could be running in parallel. Is there a way around this?
If you're writing a lot of documents, you're probably better off using bulk operations in Mongo: https://docs.mongodb.com/manual/core/bulk-write-operations/.
You can write the queries as upserts. This question is, I think, very similar to what you are trying to accomplish: How to properly do a Bulk upsert/update in MongoDB.
If you do it as an ordered bulk operation, you should not have the problem of two queries running simultaneously.
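The key point of the upsert suggestion is that the existence check and the write become a single atomic step, instead of a separate check-then-insert that two writers can race through. Here is a plain-Python model of that semantics (in a real driver this would map to an update with `upsert=True` and a `$push`; the model below only illustrates the atomicity argument):

```python
def upsert_append(store, key, value):
    """Create-or-append in one step.

    Because the existence check and the write are a single operation,
    two writers with the same key cannot both take the 'insert' path --
    which is exactly the race the check-then-insert loop suffers from.
    """
    store.setdefault(key, []).append(value)


db = {}
upsert_append(db, "k1", "a")   # key missing: created as ["a"]
upsert_append(db, "k1", "b")   # key exists: appended -> ["a", "b"]
```

The second call appends instead of overwriting, regardless of which writer arrives first.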
I see that Mongo has bulk insert, but nowhere do I see the capability to do bulk inserts across multiple collections.
Since I do not see it anywhere, I'm assuming it's not available in Mongo.
Is there any specific reason for that?
You are correct in that the bulk API operates on single collections only.
There is no specific reason, but the APIs are in general collection-scoped, so a "cross-collection bulk insert" would be a design deviation.
You can, of course, set up multiple bulk API objects in a program, each on a different collection. Keep in mind that while this wouldn't be transactional (in the startTrans-commit-rollback sense), neither is a single-collection bulk insert.
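A sketch of the "one bulk object per collection" idea, in plain Python. The names (`BulkBatch`, `execute`) are illustrative, not a real driver API; the point is that each batch flushes to its own collection in one call, and there is no rollback tying the two together.

```python
class BulkBatch:
    """Queues documents for one collection and flushes them in one call."""

    def __init__(self, collection):
        self.collection = collection   # a list standing in for a collection
        self.pending = []

    def insert(self, doc):
        self.pending.append(doc)

    def execute(self):
        # All pending docs land in one batch. Note there is no rollback:
        # if a later batch fails, this one stays applied (not transactional).
        self.collection.extend(self.pending)
        count = len(self.pending)
        self.pending = []
        return count


orders, audit = [], []
order_bulk, audit_bulk = BulkBatch(orders), BulkBatch(audit)

order_bulk.insert({"order": 1})
audit_bulk.insert({"event": "order-created"})

order_bulk.execute()   # flushes to 'orders' only
audit_bulk.execute()   # flushes to 'audit' only
```

If the second `execute()` failed, the first batch would remain applied, which mirrors the non-transactional caveat above.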
I understand you cannot do transactions in MongoDB, and the thinking is that they're not needed because everything locks the whole database or collection (I am not sure which). However, how then do you perform the following?
How do I chain together multiple insert, update, delete, or select queries in MongoDB so that other queries that might operate on the same data wait until these queries finish? An analogy would be the serializable transaction isolation level in MS SQL Server.
I want to insert/update a record in collection A and update a record in collection B, and then read collections A and B, but I don't want anyone (process or thread) to read or write to collection A or B until BOTH A and B have been updated or inserted by the first queries.
Yes, that's absolutely possible.
It is called ordered bulk operations on planet Mongo and works like this in the mongo shell:
bulk = db.emptyCollection.initializeOrderedBulkOp()
bulk.insert({name: "First document"})
bulk.find({name: "First document"})
    .update({$set: {name: "First document, updated"}})
bulk.execute()
db.emptyCollection.findOne()
> {_id: <someObjectId>, name: "First document, updated"}
Please read the manual regarding Bulk Write Operations for details.
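What makes the shell example above work is the *ordered* guarantee: operations execute strictly in submission order, so the update reliably sees the document inserted just before it. A plain-Python model of that guarantee (illustrative only, not driver code):

```python
collection = []

def execute_ordered(ops):
    """Run operations strictly in submission order, like an ordered bulk op."""
    for op in ops:          # sequential: no reordering, no parallelism
        op()


execute_ordered([
    # insert, then update the document the insert just created
    lambda: collection.append({"name": "First document"}),
    lambda: collection[0].update({"name": "First document, updated"}),
])
```

With an *unordered* bulk operation the server may execute operations in parallel and in any order, so the update could run before the insert, which is exactly why the ordered variant is used here.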
Edit: Somehow I misread your question. It isn't possible across two collections. Remember, though, that you can have different kinds of documents in one collection; some ODMs even allow different models to be saved to the same collection. Exploiting this, you should be able to achieve what you want using the bulk operations above. You may want to combine this with locking to prevent writes. But preventing both reading and writing would amount to a transaction, in terms of global and possibly distributed locks.
Hi, I am using MongoDB as my database. My question is about what happens when I query for one document or for lots of documents. Example:
mongo.GetCollection("orders").Find(Query.EQ("OrderStatus", "unshiped")).ToList();
How do I make sure that the documents in this list are locked so that nobody can edit them, and that whatever I do with these records in code, when I loop through them and then save them, the lock is released?
MongoDB supports atomic operations on single documents. MongoDB does not support traditional locking and complex transactions for a number of reasons:

First, in sharded environments, distributed locks could be expensive and slow. MongoDB's goal is to be lightweight and fast.

We dislike the concept of deadlocks. We want the system to be simple and predictable, without these sorts of surprises.

We want MongoDB to work well for realtime problems. If an operation may execute which locks large amounts of data, it might stop some small, light queries for an extended period of time.
I think your best bet is adding a locked property to your documents and going from there.
You can add an isLocked field to the documents in the collection: before updating, lock the document, and unlock it when the work is finished. If you want a more specific lock mechanism, add a GUID in a LockedId field.
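Here is a plain-Python sketch of the locked-field-with-token approach suggested above. In a real deployment this would map to a single atomic find-and-modify style update that matches an unlocked document and sets the token in one step; the field name `lockId` and the helper names are illustrative.

```python
import uuid


def try_lock(doc, token):
    """Claim doc for token; return True on success.

    Models an atomic test-and-set: server-side, matching {lockId: null}
    and setting {lockId: token} happens as one operation, so two workers
    cannot both claim the same document.
    """
    if doc.get("lockId") is None:
        doc["lockId"] = token
        return True
    return False


def unlock(doc, token):
    """Only the holder of the token may release the lock."""
    if doc.get("lockId") == token:
        doc["lockId"] = None
        return True
    return False


order = {"OrderStatus": "unshipped", "lockId": None}
me, rival = str(uuid.uuid4()), str(uuid.uuid4())

got_it = try_lock(order, me)           # succeeds: document was unlocked
rival_got_it = try_lock(order, rival)  # fails: already claimed by 'me'
```

The per-worker token is what the GUID in the LockedId field buys you: a rival cannot release a lock it does not hold, and the owner can safely unlock after saving its changes.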