Is it possible to perform multiple DB operations in a single transaction in MongoDB?

Suppose I have two collections, A and B.
I want to perform the following operations:
db.A.remove({_id:1});
db.B.insert({_id:"1","name":"dev"})
I know MongoDB maintains atomicity at the document level. Is it possible to perform the above set of operations in a single transaction?

Yes, now you can!
MongoDB has supported atomic write operations at the level of a single document for a long time, but it did not offer such atomicity for multi-document operations until v4.0.0. With the release of MongoDB transactions, multi-document operations are now atomic as well.
But remember that transactions are only supported in replica sets using the WiredTiger storage engine, not on standalone servers (though standalone support may be added in the future!).
Here is a mongo shell example also provided in the official docs:
// Start a session.
session = db.getMongo().startSession( { readPreference: { mode: "primary" } } );
employeesCollection = session.getDatabase("hr").employees;
eventsCollection = session.getDatabase("reporting").events;
// Start a transaction.
session.startTransaction( { readConcern: { level: "snapshot" }, writeConcern: { w: "majority" } } );
// Run as many operations as you want inside this transaction.
try {
    employeesCollection.updateOne( { employee: 3 }, { $set: { status: "Inactive" } } );
    eventsCollection.insertOne( { employee: 3, status: { new: "Inactive", old: "Active" } } );
} catch (error) {
    // Abort the transaction on error.
    session.abortTransaction();
    throw error;
}
// Commit the transaction using the write concern set at transaction start.
session.commitTransaction();
session.endSession();
I recommend reading the official documentation on transactions to better understand how to use them!
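Applied to the original question, a minimal sketch might look like this (assuming a replica set; the database name test is made up for illustration):
// Remove from A and insert into B as one atomic unit.
session = db.getMongo().startSession();
const A = session.getDatabase("test").A;
const B = session.getDatabase("test").B;
session.startTransaction( { writeConcern: { w: "majority" } } );
try {
    A.deleteOne( { _id: 1 } );
    B.insertOne( { _id: "1", name: "dev" } );
    session.commitTransaction(); // both writes become visible together
} catch (error) {
    session.abortTransaction(); // neither write is applied
    throw error;
}
session.endSession();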

MongoDB cannot guarantee atomicity when more than one document is involved. (Note: this answer predates MongoDB 4.0, which introduced multi-document transactions; see the answer above.)
Also, MongoDB does not offer any single operation that affects more than one collection.
If you want to do this atomically, you need to merge collections A and B into one collection. Remember that MongoDB is a schemaless database: you can store documents of different types in one collection, and a single atomic update operation can make multiple changes to a document. That means a single update can transform a document of type A into a document of type B.
To tell the different types in the same collection apart, you could have a type field and add it to all of your queries, or you could use duck typing and identify types by checking whether a certain field $exists.
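For example, a single atomic update can flip such a type field and reshape the document in one step (the collection and field names here are made up for illustration):
// One atomic operation: the document changes from type "A" to type "B",
// gaining a name field and losing a hypothetical A-only field.
db.AB.updateOne(
    { _id: 1, type: "A" },
    { $set: { type: "B", name: "dev" }, $unset: { someAOnlyField: "" } }
);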

Related

Bulk.getOperations() in MongoDB Node driver

I'd like to view the results of a bulk operation, specifically to know the IDs of the documents that were updated. I understand that this information is made available through the Bulk.getOperations() method. However, it doesn't appear that this method is available through the MongoDB NodeJS library (at least, the one I'm using).
Could you please let me know if there's something I'm doing wrong here:
const bulk = db.collection('companies').initializeOrderedBulkOp()
const results = getLatestFinancialResults() // from remote API
results.forEach(result =>
    bulk.find({
        name: result.companyName,
        report: { $ne: result.report }
    }).updateOne([
        { $unset: 'prevReport' },
        { $set: { prevReport: '$report' } },
        { $unset: 'report' },
        { $set: { report: result.report } }
    ]))
await bulk.execute()
await bulk.getOperations() // <-- fails, undefined in Typescript library
I get a static IDE error:
Uncaught TypeError: bulk.getOperations is not a function
I'd like to view the results of a bulk operation, specifically to know the IDs of the documents that were updated
As of currently (MongoDB server v6.x), there is no method that returns the IDs of updated documents from a bulk operation (only for insert and upsert operations). However, there may be a workaround depending on your use case.
The manual page you linked for Bulk.getOperations() is for mongosh, the MongoDB shell application. If you look into the source code for getOperations() in mongosh, it is just a convenience wrapper around batches. The batches property returns the list of operations sent for the bulk execution.
As you are utilising ordered bulk operations, MongoDB executes the operations serially. If an error occurs during the processing of one of the write operations, MongoDB will return without processing any remaining write operations in the list.
Depending on the use case, you could modify the bulk.find() part to search by _id, for example (using the pipeline form of updateOne so that "$report" is treated as a field reference rather than a literal string):
bulk.find({ "_id": result._id }).updateOne([ { $set: { prevReport: "$report" } } ]);
You should be able to see the _id value of the operation in the batches, i.e.
await bulk.execute();
console.log(JSON.stringify(bulk.batches));
Example output:
{
    "originalZeroIndex": 0,
    "currentIndex": 0,
    "originalIndexes": [0],
    "batchType": 2,
    "operations": [{
        "q": { "_id": "634354787d080d3a1e3da51f" },
        "u": [{ "$set": { "prevReport": "$report" } }]
    }],
    "size": 0,
    "sizeBytes": 0
}
For additional information, you could also inspect the BulkWriteResult; for example, its getLastOp method retrieves the last operation (useful in case of a failure).
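If you need the exact _id values that were updated, one possible workaround is to fetch the matching _ids up front and then target them directly in the bulk operation. This is only a sketch: it assumes no concurrent writer changes the matching set between the read and the bulk execution, and result refers to the same object as in the question's snippet.
// Read the _ids first, then update by _id so the affected IDs are known in advance.
const docs = await db.collection('companies')
    .find({ name: result.companyName, report: { $ne: result.report } })
    .project({ _id: 1 })
    .toArray();
const ids = docs.map(doc => doc._id);
const bulk = db.collection('companies').initializeOrderedBulkOp();
ids.forEach(_id => bulk.find({ _id }).updateOne([{ $set: { prevReport: '$report' } }]));
await bulk.execute();
console.log('updated ids:', ids);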

Performance issues related to $nin/$ne querying in large database

I am working on a pipeline where multiple microservices (workers) modify and add attributes to documents. Some of them have to make sure a document was already processed by another microservice, and/or make sure they don't process a document twice.
I've already tried two different data structures for this: array and object:
{
    ...other_attributes,
    worker_history_array: ["worker_1", "worker_2", ...],
    worker_history_object: {"worker_1": true, "worker_2": true, ...}
}
I also created indexes for the two fields:
{ "worker_history_array": 1 }
{ "worker_history_object.$**": 1 }
Both data structures use the index and work very well when querying for the existence of a worker in the history:
{
"worker_history_array": "worker_1"
}
{
"worker_history_object.worker_1": true
}
But I can't seem to find a query that is fast and hits the index when checking whether a worker has not already processed a document. All of the following queries perform awfully:
{
"worker_history_array": { $ne: "worker_1" }
}
{
"worker_history_array": { $nin: ["worker_1"] }
}
{
"worker_history_object.worker_1": { $exists: false }
}
{
"worker_history_object.worker_1":{ $not: { $exists: true } }
}
{
"worker_history_object.worker_1": { $ne: true }
}
Performance is already bad with 500k documents, but the database will grow to millions of documents.
Is there a way to improve the query performance?
Can I work around the low selectivity of $ne and $nin?
Different index?
Different data structure?
I don't think it matters, but I'm using MongoDB Atlas (MongoDB 4.4.1, a cluster with read replicas) on Google Cloud, and I examined the performance of the queries with MongoDB Compass.
Additional info/restrictions:
Millions of records
Hundreds of workers
I don't know all workers beforehand
Not every worker processes every document (some may only work on documents with type: "x" while others work only on documents with type: "y")
No worker should have knowledge about the pipeline, only about the worker that directly precedes it.
Any help would be greatly appreciated.

Concurrent partial updates in Mongo collection

Consider the following mongo document
{
    _id: ...,
    param1: "oldValue1",
    param2: "oldValue2"
}
Suppose I am trying to do two concurrent partial updates with the following queries:
db.collection.update( { _id: ... }, { $set: { param1: "newValue1" } } )
db.collection.update( { _id: ... }, { $set: { param2: "newValue2" } } )
Will I get the following document state in Mongo after these concurrent partial updates?
{
    _id: ...,
    param1: "newValue1",
    param2: "newValue2"
}
Do two concurrent updates leave the document with both updated values, without a concurrent-modification issue, given that the updates don't touch any common fields?
Yes, regardless of the execution order of the two updates, the doc will end up as you show it. This is because the two atomic $set operations target distinct fields, and any field not referenced in the update isn't modified.
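A quick way to convince yourself in the shell (a sketch; the collection name demo is made up):
db.demo.insertOne( { _id: 1, param1: "oldValue1", param2: "oldValue2" } );
// These two updates can arrive in either order, e.g. from different clients:
db.demo.update( { _id: 1 }, { $set: { param1: "newValue1" } } );
db.demo.update( { _id: 1 }, { $set: { param2: "newValue2" } } );
db.demo.findOne( { _id: 1 } );  // { _id: 1, param1: "newValue1", param2: "newValue2" }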

Isolation of bulk operations in MongoDB

MongoDB 2.6 introduces a new kind of operation called bulk operations. It reminds me of transactions: the user can specify a set of writes and subsequently execute them all, as described below.
var bulk = db.users.initializeOrderedBulkOp();
bulk.insert( { user: "abc123", status: "A", points: 0 } );
bulk.insert( { user: "ijk123", status: "A", points: 0 } );
bulk.insert( { user: "mop123", status: "P", points: 0 } );
bulk.find( { status: "D" } ).remove();
bulk.find( { status: "P" } ).update( { $set: { comment: "Pending" } } );
bulk.execute();
Is a bulk operation atomic? Does a potential consumer experience non-repeatable or phantom reads?
From the mongo docs:
Operations on a single document are always atomic with MongoDB databases, however, operations that involve multiple documents, which are often referred to as "multi-document transactions", are not atomic.
Ordered vs Unordered Operations
Bulk write operations can be either ordered or unordered. With an ordered list of operations, MongoDB executes the operations serially. If an error occurs during the processing of one of the write operations, MongoDB will return without processing any remaining write operations in the list.
http://docs.mongodb.org/manual/core/bulk-write-operations/
Conclusion: bulk operations are not transactions.
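To see this in the shell, here is a minimal sketch (the collection name demo is made up): an error partway through an ordered bulk does not roll back the writes that already succeeded.
var bulk = db.demo.initializeOrderedBulkOp();
bulk.insert( { _id: 1, status: "A" } );
bulk.insert( { _id: 1, status: "B" } );  // duplicate _id -> error here
bulk.insert( { _id: 2, status: "C" } );  // never executed (ordered list stops on error)
try {
    bulk.execute();
} catch (e) {
    print( "bulk failed: " + e );
}
db.demo.count();  // 1 -- the first insert persisted; nothing was rolled back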

In Mongo any way to do check and setting like atomic operation?

Is there any way in Mongo to do check-and-set as an atomic operation? I am building hotel bookings: if there is a free room you can reserve it, but what if two or more people want to reserve at the same time? Is there anything similar to a transaction in Mongo, or any other way to solve this problem?
Yes, that's the classic use case for MongoDB's findAndModify command.
Specifically for pymongo: find_and_modify.
All updates are atomic operations over a single document. find_and_modify additionally locks that document and returns it in the same operation.
This lets you combine a lock over the document during the find with the update that is then applied.
You can find more about atomic operations:
http://www.mongodb.org/display/DOCS/Atomic+Operations
Best,
Norberto
The other answers reference the findAndModify documentation, but a practical example based on the OP's requirements will do it justice:
const current = new ISODate();
const timeAgoBy30Minutes = new Date(current.getTime() - 1000 * 60 * 30).toISOString();
db.runCommand(
    {
        findAndModify: "rooms",
        query: {
            "availability": true,
            "lastChecked": {
                "$lt": timeAgoBy30Minutes
            }
        },
        update: { $set: { availability: false, lastChecked: current.toISOString() } }
    }
)
In the above example, my decision to use db.runCommand versus db.rooms.findAndModify was strategic: db.runCommand returns a reply indicating whether the document was updated, which allows me to perform additional work if it was. findAndModify simply returns the old document, unless the new flag is passed in the argument list, in which case it returns the updated document.
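For instance, the command reply can be inspected like this (a sketch; the query is trimmed down from the example above):
const res = db.runCommand( {
    findAndModify: "rooms",
    query: { availability: true },
    update: { $set: { availability: false } }
} );
if ( res.lastErrorObject && res.lastErrorObject.updatedExisting ) {
    // A room was reserved; res.value holds the pre-update document.
    print( "reserved room " + res.value._id );
} else {
    print( "no free room available" );
}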