Isolation of bulk operations in MongoDB

There's a new kind of operation in 2.6 called bulk operations. It reminds me of transactions: the user can specify a set of writes and then execute them, as described below:
var bulk = db.users.initializeOrderedBulkOp();
bulk.insert( { user: "abc123", status: "A", points: 0 } );
bulk.insert( { user: "ijk123", status: "A", points: 0 } );
bulk.insert( { user: "mop123", status: "P", points: 0 } );
bulk.find( { status: "D" } ).remove();
bulk.find( { status: "P" } ).update( { $set: { comment: "Pending" } } );
bulk.execute();
Is a bulk operation atomic? Does a potential consumer experience non-repeatable or phantom reads?

From mongo docs:
Operations on a single document are always atomic with MongoDB databases, however, operations that involve multiple documents, which are often referred to as "multi-document transactions", are not atomic.
Ordered vs Unordered Operations
Bulk write operations can be either ordered or unordered. With an ordered list of operations, MongoDB executes the operations serially. If an error occurs during the processing of one of the write operations, MongoDB will return without processing any remaining write operations in the list.
http://docs.mongodb.org/manual/core/bulk-write-operations/
Conclusion: bulk operations are not transactions.
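You can see this for yourself with a quick sketch (the db.users collection and the duplicate _id below are made up for the demonstration): when an ordered bulk fails part-way through, the writes that already succeeded stay in place rather than being rolled back.
var bulk = db.users.initializeOrderedBulkOp();
bulk.insert( { _id: 1, user: "abc123", status: "A" } );
bulk.insert( { _id: 1, user: "dup123", status: "A" } );   // duplicate key -> error here
bulk.insert( { _id: 2, user: "xyz789", status: "A" } );
try {
bulk.execute();
} catch (e) {
printjson(e);   // BulkWriteError for the second insert
}
db.users.count();   // 1 -- the first insert persisted, the third was never attempted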

Performance issues related to $nin/$ne querying in large database

I am working on a pipeline where multiple microservices (workers) modify and add attributes to documents. Some of them have to make sure the document was already processed by another microservice and/or make sure they don't process a document twice.
I've already tried two different data structures for this: array and object:
{
...other_attributes,
worker_history_array: ["worker_1", "worker_2", ...],
worker_history_object: {"worker_1": true, "worker_2": true, ...}
}
I also created indexes for the two fields:
{ "worker_history_array": 1 }
{ "worker_history_object.$**": 1 }
Both data structures use the index and work very well when querying for the existence of a worker in the history:
{
"worker_history_array": "worker_1"
}
{
"worker_history_object.worker_1": true
}
But I can't seem to find a query that is fast / hits the index when checking whether a worker has not already processed a document. All of these queries perform awfully:
{
"worker_history_array": { $ne: "worker_1" }
}
{
"worker_history_array": { $nin: ["worker_1"] }
}
{
"worker_history_object.worker_1": { $exists: false }
}
{
"worker_history_object.worker_1":{ $not: { $exists: true } }
}
{
"worker_history_object.worker_1": { $ne: true }
}
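One way to confirm what is happening is to look at the query plan in the shell (a sketch; the collection name docs is a placeholder): an IXSCAN that examines nearly every key, or an outright COLLSCAN, both reflect the low selectivity of the negated predicates.
db.docs.find({ "worker_history_array": { $ne: "worker_1" } }).explain("executionStats")
// inspect queryPlanner.winningPlan for the stage used, and
// executionStats.totalKeysExamined / totalDocsExamined for how much work was done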
Performance is already bad with 500k documents, but the database will grow to millions of documents.
Is there a way to improve the query performance?
Can I work around the low selectivity of $ne and $nin?
Different index?
Different data structure?
I don't think it matters but I'm using MongoDB Atlas (MongoDB 4.4.1, cluster with read replicas) on Google Cloud and examined the performance of the queries with MongoDB Compass.
Additional info/restrictions:
Millions of records
Hundreds of workers
I don't know all workers beforehand
Not every worker processes every document (some may only work on documents with type: "x" while others work only on documents with type: "y")
No worker should have knowledge about the pipeline, only about the worker that directly precedes it.
Any help would be greatly appreciated.

mongodb capped collections consume from offset

We are considering using MongoDB capped collections as our fifo queues.
Our requirements are the following:
Processing of the messages should be done in insertion order
No messages should be lost/skipped
We should be able to start consuming from a specific offset
However we are facing the following issue:
Capped collections guarantee reads in insertion order. However, the _ids are not guaranteed to be monotonic.
This means that if there are multiple producers the following situation can occur:
[
...
{
_id: 5b72f12599757c9e26c0946b,
...
},
{
_id: 5b72f12599757c9e26c0946d,
...
},
{
_id: 5b72f12599757c9e26c0946c,
...
},
{
_id: 5b72f12599757c9e26c0946e,
...
},
...
]
This means that if we start consuming with the following code:
const cursor = collection
.find({ _id: { $gt: "5b72f12599757c9e26c0946d" } })
.tailable()
.cursor();
Then the message with _id 5b72f12599757c9e26c0946c will be skipped.
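A small shell check makes the skip visible (a sketch; the collection name queue is a placeholder). ObjectIds are compared purely by value, so the out-of-order document sorts below the offset even though it was inserted after it:
// "...946c" sorts before "...946d", so a $gt filter on that offset never returns it
db.queue.find({ _id: { $gt: ObjectId("5b72f12599757c9e26c0946d") } })
// -> returns only the document with _id ...946e; ...946c is silently skipped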
So my questions are the following:
Is it possible to guarantee monotonic ids on a capped collection?
Is it possible to start consuming from a specific offset without skipping messages with out-of-order _ids?
Are we missing something?
Thanks in advance.

Is it possible to perform multiple DB operations in a single transaction in MongoDB?

Suppose I have two collections A and B
I want to perform an operation
db.A.remove({_id:1});
db.B.insert({_id:"1","name":"dev"})
I know MongoDB maintains atomicity at the document level. Is it possible to perform the above set of operation in a single transaction?
Yes, now you can!
MongoDB has had atomic write operations at the level of a single document for a long time, but it did not support such atomicity for multi-document operations until v4.0.0. Multi-document operations are now atomic thanks to the introduction of MongoDB transactions.
But remember that transactions are only supported on replica sets using the WiredTiger storage engine, not on standalone servers (though they may be supported on standalone servers in the future too!).
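One practical consequence: for local development you can still get transactions by running a single-node replica set (a sketch; the replica set name rs0 is arbitrary).
// start mongod with: mongod --replSet rs0 ...
// then run once in the shell:
rs.initiate()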
Here is a mongo shell example also provided in the official docs:
// Start a session.
session = db.getMongo().startSession( { readPreference: { mode: "primary" } } );
employeesCollection = session.getDatabase("hr").employees;
eventsCollection = session.getDatabase("reporting").events;
// Start a transaction
session.startTransaction( { readConcern: { level: "snapshot" }, writeConcern: { w: "majority" } } );
//As many operations as you want inside this transaction
try {
employeesCollection.updateOne( { employee: 3 }, { $set: { status: "Inactive" } } );
eventsCollection.insertOne( { employee: 3, status: { new: "Inactive", old: "Active" } } );
} catch (error) {
// Abort transaction on error
session.abortTransaction();
throw error;
}
// Commit the transaction using write concern set at transaction start
session.commitTransaction();
session.endSession();
I recommend reading the official documentation on transactions to better understand how to use them!
MongoDB can not guarantee atomicity when more than one document is involved.
Also, MongoDB does not offer any single operations which affect more than one collection.
If you want to perform the operation above in an atomic manner (without transactions), you need to merge collections A and B into one collection. Remember that MongoDB is a schemaless database: you can store documents of different types in one collection, and a single atomic update operation can make multiple changes to a document. That means a single update can transform a document of type A into a document of type B.
To tell different types in the same collection apart, you could have a type field and add this to all of your queries, or you could use duck-typing and identify types by checking if a certain field $exists.
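To make the last point concrete, here is a sketch (the merged collection name items and the field names type and someAField are assumptions): a single atomic update turns the type-A document into a type-B document, replacing the separate remove-from-A / insert-into-B pair.
db.items.update(
{ _id: 1, type: "A" },
{ $set: { type: "B", name: "dev" }, $unset: { someAField: "" } }
)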

Concurrent partial updates in Mongo collection

Consider the following mongo document
{
_id:...
param1:oldValue1
param2:oldValue2
}
Suppose I am trying to do two concurrent partial updates with the following queries:
db.collection.update( { _id:...} , { $set: { param1 : "newValue1" } } )
db.collection.update( { _id:...} , { $set: { param2 : "newValue2" } } )
Will I get the following document state in mongo after these concurrent partial updates:
{
_id:...
param1:newValue1
param2:newValue2
}
Do two concurrent updates leave the document with both updated values, given that the updates don't touch common fields, without any concurrent modification issue?
Yes, regardless of the execution order of the two updates, the doc will end up as you show it. This is because the two atomic $set operations target distinct fields, and any field not referenced in the update isn't modified.
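A quick way to convince yourself in the shell (a sketch using the collection and values from the question):
db.collection.insert( { _id: 1, param1: "oldValue1", param2: "oldValue2" } )
// run in either order, possibly concurrently:
db.collection.update( { _id: 1 }, { $set: { param1: "newValue1" } } )
db.collection.update( { _id: 1 }, { $set: { param2: "newValue2" } } )
db.collection.findOne( { _id: 1 } )
// -> { "_id" : 1, "param1" : "newValue1", "param2" : "newValue2" }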

MongoDB Batch update performance problems

I understand that MongoDB supports batch inserts, but not batch updates.
Batch-inserting thousands of documents is pretty fast, but updating thousands of documents is incredibly slow. So slow that the workaround I'm using now is to remove the old documents and batch-insert the updated documents. I had to add some flags to mark them as invalid, plus all the machinery required to compensate for a failed mock 'bulk update'. :(
I know this is an awful and unsafe solution, but it's the only way I've been able to reach the required performance.
If you know a better way, please help me.
Thanks!
As long as you're using MongoDB v2.6 or higher, you can use bulk operations to perform updates as well.
Example from the docs:
var bulk = db.items.initializeUnorderedBulkOp();
bulk.find( { status: "D" } ).update( { $set: { status: "I", points: "0" } } );
bulk.find( { item: null } ).update( { $set: { item: "TBD" } } );
bulk.execute();
I had a similar situation. After some trial and error, I created an index (in MongoDB directly or through Mongoose), and now updating thousands of documents is pretty fast using bulk operations, i.e. bulk.find({}).upsert().update({}).
Example:
var bulk = items.collection.initializeOrderedBulkOp();
bulk.find({fieldname: value, active: false}).upsert().updateOne({
$set: updatejsondata,
$setOnInsert: createjsondata
});
Note: to store values into an array you need to use $push; like $set, it has to be included in the update document.
Example:
bulk.find({name: value, active: false}).upsert().updateOne({
$set: updatejsondata,
$push: {logdata: filename + " - " + new Date()},
$setOnInsert: createjsondata
});
Creating the index: in the above case you need to create an index on the Items collection covering the search fields, i.e. name and active.
Example:
Mongo Command Line:
db.items.ensureIndex({name: 1, active: 1}, {unique: false, dropDups: false})
Mongoose Schema:
ItemSchema.index({name: 1, active: 1}, {name: "itemnameactive"});
Hope this helps you out with bulk operations.
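Note that the snippets above only queue the operations; nothing is sent to the server until you call execute(). A sketch of running the batch and checking the outcome:
var result = bulk.execute();
printjson(result);   // BulkWriteResult
result.nMatched;     // documents matched by the find() filters
result.nModified;    // documents actually changed
result.nUpserted;    // documents inserted via the upsert path ($setOnInsert)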