I'm using mongo and I have multiple queries to insert at a time, so I use a for loop to insert into the db. The problem is that each query falls under a key, so I check whether the key exists or not: if it doesn't, I add a new document; if it does, I append to it. If I have multiple queries with the same key, then (since mongo inserts asynchronously) the same key could be identified as "nonexistent" more than once, because the checks may run in parallel. Is there a way around this?
If you're writing a lot of documents, you're probably better off using bulk operations in mongo: https://docs.mongodb.com/manual/core/bulk-write-operations/.
You can write the queries as upserts. This question is very similar, I think, to what you are trying to accomplish: How to properly do a Bulk upsert/update in MongoDB.
If you do it as an ordered bulk operation you should not have the problem with two queries running simultaneously.
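For example, here is a minimal pymongo sketch of that idea, assuming documents keyed by a "key" field with an "items" array (collection and field names are placeholders, adjust to your schema):

from pymongo import MongoClient, UpdateOne

client = MongoClient("mongodb://localhost:27017")
coll = client["mydb"]["events"]

# Each incoming query is (key, item); several entries may share the same key.
queries = [("key-a", {"value": 1}), ("key-a", {"value": 2}), ("key-b", {"value": 3})]

ops = [
    # upsert=True creates the document if the key does not exist yet,
    # otherwise it appends to the existing "items" array - so there is no
    # separate existence check that could race.
    UpdateOne({"key": k}, {"$push": {"items": item}}, upsert=True)
    for k, item in queries
]

result = coll.bulk_write(ops, ordered=True)
print(result.upserted_count, "created,", result.modified_count, "appended")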
I want to convert the MongoDB local Oplog file into an actual real query so I can execute that query and get the exact copy database.
Is there any package, file, built-in tool, or script for it?
It's not possible to get the exact query from the oplog entry because MongoDB doesn't save the query.
The oplog has an entry for each atomic modification performed. Multi-inserts/updates/deletes performed on the mongo instance using a single query are converted to multiple entries and written to the oplog collection. For example, if we insert 10,000 documents using Bulk.insert(), 10,000 new entries will be created in the oplog collection. Now the same can also be done by firing 10,000 Collection.insertOne() queries. The oplog entries would look identical! There is no way to tell which one actually happened.
Sorry, but that is impossible.
The reason is that the oplog doesn't store queries. The oplog includes only the changes (inserts, updates, deletes) to the data, and it exists for replication and redo.
Getting an exact copy of a database is called "replication", and that is of course supported by the system.
To "replicate" changes to, for example, a single database or collection, you can use change streams: https://www.mongodb.com/docs/manual/changeStreams/.
You can reconstruct the operations from the oplog. The oplog defines multiple op types, for instance op: "i", "u", "d" for insert, update, and delete. For these types, check the "o"/"o2" fields, which hold the corresponding documents and filters.
Now, based on the op type, call the corresponding driver API: db.collection.insert()/update()/delete().
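As a rough illustration, a replay loop in pymongo could look like the sketch below. Be aware that this is simplified: newer MongoDB versions write update entries in an internal diff format ("$v": 2) that does not map directly onto driver update documents, so treat it as a starting point only.

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
oplog = client["local"]["oplog.rs"]
target = client  # replay against the same (or another) cluster

for entry in oplog.find({"op": {"$in": ["i", "u", "d"]}}):
    db_name, coll_name = entry["ns"].split(".", 1)
    coll = target[db_name][coll_name]
    if entry["op"] == "i":
        coll.insert_one(entry["o"])                # "o" is the full document
    elif entry["op"] == "u":
        coll.replace_one(entry["o2"], entry["o"])  # "o2" is the filter; "o" may be a diff on newer versions
    elif entry["op"] == "d":
        coll.delete_one(entry["o"])                # "o" identifies the document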
Hi, I need to insert around 100,000 records into MongoDB. I am using the BulkWriteOperation API and splitting the whole set into batches of 1,000 records. If any one record in a batch fails to insert, the whole batch is not inserted. Is there any way to get the list of records from the failed batch alone, so that I can recurse and insert the remaining records? Or is there a way to do a bulk insert to MongoDB such that all records except the failed ones are inserted?
Thanks in advance.
Could you also make sure to mention which language you are using?
In the case of Python, I found that using the insert_many operation with ordered=False is better for this use case (this only works if the ordering of your inserts doesn't matter; as required, it will not abort the batch if some of the inserts fail). The BulkWriteError gives the details of the inserts that have failed, and you can use the error code to decide what to do afterwards.
It should work similarly for other languages. Do let me know if it doesn't work.
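As a sketch of the Python variant (the collection name and document contents are made up for the example):

from pymongo import MongoClient
from pymongo.errors import BulkWriteError

client = MongoClient("mongodb://localhost:27017")
coll = client["mydb"]["records"]

docs = [{"_id": i, "value": i} for i in range(1000)]
docs.append({"_id": 5, "value": "dup"})  # will trigger a duplicate key error

try:
    coll.insert_many(docs, ordered=False)
except BulkWriteError as exc:
    # With ordered=False all other documents are still inserted;
    # exc.details lists only the ones that failed.
    for err in exc.details["writeErrors"]:
        print(err["index"], err["code"], err["errmsg"])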
Edit: This question seems similar to another question
I see that mongo has bulk insert, but I don't see any capability to do bulk inserts across multiple collections.
Since I do not see it anywhere, I'm assuming it's not available in Mongo.
Any specific reason for that?
You are correct in that the bulk API operates on single collections only.
There is no specific reason but the APIs in general are collection-scoped so a "cross-collection bulk insert" would be a design deviation.
You can of course set up multiple bulk API objects in a program, each on a different collection. Keep in mind that while this wouldn't be transactional (in the startTrans-commit-rollback sense), neither is bulk insert.
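For instance, a sketch in pymongo (not atomic across the two collections; the names "users" and "audit" are just placeholders):

from pymongo import MongoClient, InsertOne

client = MongoClient("mongodb://localhost:27017")
db = client["mydb"]

users_ops = [InsertOne({"name": "alice"}), InsertOne({"name": "bob"})]
audit_ops = [InsertOne({"event": "user_created", "name": "alice"}),
             InsertOne({"event": "user_created", "name": "bob"})]

# Each bulk_write is scoped to a single collection.
db["users"].bulk_write(users_ops, ordered=True)
db["audit"].bulk_write(audit_ops, ordered=True)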
I understand you cannot do transactions in MongoDB, and the thinking is that they are not needed because every operation locks the whole database or collection (I am not sure which). However, how then do you perform the following?
How do I chain together multiple insert, update, delete, or select queries in MongoDB so that other queries that might operate on the same data wait until these queries finish? An analogy would be serializable transaction isolation in MS SQL Server.
More detail:
I want to insert/update a record in collection A and update a record in collection B, and then read collections A and B, but I don't want anyone (process or thread) to read or write to collection A or B until BOTH A and B have been updated or inserted by the first queries.
Yes, that's absolutely possible.
It is called ordered bulk operations on planet Mongo and works like this in the mongo shell:
bulk = db.emptyCollection.initializeOrderedBulkOp()
bulk.insert({name: "First document"})
bulk.find({name: "First document"})
    .update({$set: {name: "First document, updated"}})
bulk.execute()
db.emptyCollection.findOne()
> {_id: <someObjectId>, name: "First document, updated"}
Please read the manual regarding Bulk Write Operations for details.
Edit: Somehow I misread your question. This isn't possible across two collections. Remember, though, that you can have different kinds of documents in one collection. Some ODMs even allow different models to be saved to the same collection. Exploiting this, you should be able to achieve what you want using the bulk operations above. You may want to combine this with locking to prevent writing. But preventing both reading and writing would amount to a transaction in the sense of global, and possibly distributed, locks.
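As a rough pymongo sketch of that single-collection approach (the "kind" field and names are illustrative, not part of the original answer):

from pymongo import MongoClient, InsertOne, UpdateOne

client = MongoClient("mongodb://localhost:27017")
coll = client["mydb"]["combined"]

ops = [
    # Documents of two "kinds" live in the same collection, so one ordered
    # bulk_write covers what would otherwise span collections A and B.
    InsertOne({"kind": "A", "name": "First document"}),
    UpdateOne({"kind": "B", "name": "Second document"},
              {"$set": {"updated": True}}, upsert=True),
]
coll.bulk_write(ops, ordered=True)  # ordered: executes sequentially, stops on the first error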
What is the performance gain from using bulk inserts vs. regular inserts in MongoDB, and in pymongo specifically? Are bulk inserts just a wrapper around regular inserts?
Bulk inserts are not wrappers around regular inserts. A bulk insert operation sends many documents as a whole, saving that many database round trips. It is much more performant, since you don't have to send each document over the network separately.
@dsmilkov That's an advantage, as you don't have to open a connection every single time.
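A rough way to see the difference yourself (numbers depend entirely on your setup; the collection name is a placeholder):

import time
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
coll = client["mydb"]["bench"]
docs = [{"n": i} for i in range(10_000)]

coll.drop()
start = time.perf_counter()
for d in docs:
    coll.insert_one(dict(d))                   # one network round trip per document
print("insert_one loop:", time.perf_counter() - start)

coll.drop()
start = time.perf_counter()
coll.insert_many([dict(d) for d in docs])      # documents batched into a few round trips
print("insert_many:", time.perf_counter() - start)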