Spring Data MongoDB Concurrent Updates Behavior

Spring Data MongoDB Concurrent Updates Behavior - mongodb

Imagine theres a document containing a single field: {availableSpots: 100}
and there are millions of users, racing to get a spot by sending a request to an API server.
each time a request comes, the server reads the document and if the availableSpot is > 0, it then decrements it by 1 and creates a booking in another collection.
Now i read that mongodb locks the document whenever an update operation is performed.
What will happen if theres a million concurrent requests? will it take a long time because the same document keeps getting locked? Also, the server reads the value of document before it tries to update the document, and by the time it acquires the lock, the spot may not be available anymore.
It is also possible that the threads are getting "availableSpot > 0" is true at the same instant in time, but in reality the availableSpot may not be enough for all the requests. How to deal with this?

The most important thing here is atomicity and concurrency.
1. Atomicity
Your operation to update (decrement by one) if availableSpots > 0 :
db.collection.updateOne({"availableSpots" :{$gt : 0}}, { $inc: { availableSpots: -1 })
is atomic.
$inc is an atomic operation within a single document.
Refer : https://docs.mongodb.com/manual/reference/operator/update/inc/
2. Concurrency
Since MongoDB has document-level concurrency control for write operations. Each update will take a lock on the document.
Now your questions:
What will happen if theres a million concurrent requests?
Yes each update will be performed one by one (due to locking) hence will slow down.
the server reads the value of document before it tries to update the
document, and by the time it acquires the lock, the spot may not be
available anymore.
Since the operation is atomic, this will not happen. It will work as you want, only 100 updates will be executed with number of affected rows greater than 0 or equal to 1.

MongoDB uses Wired Tiger as a default storage engine starting version 3.2.
Wired Tiger provides document level concurrency:
From docs:
WiredTiger uses document-level concurrency control for write
operations. As a result, multiple clients can modify different
documents of a collection at the same time.
For most read and write operations, WiredTiger uses optimistic
concurrency control. WiredTiger uses only intent locks at the global,
database and collection levels. When the storage engine detects
conflicts between two operations, one will incur a write conflict
causing MongoDB to transparently retry that operation.
When multiple clients are trying to update a value in a document, only that document will be locked, but not the entire collections.

My understanding is that you are concerned about the performance of many concurrent ACID-compliant transactions against two separate collections:
a collection (let us call it spots) with one document {availableSpots: 999..}
another collection (let us call it bookings) with multiple documents, one per booking.
Now i read that mongodb locks the document whenever an update operation is performed.
It is also possible that the threads are getting "availableSpot > 0"
is true at the same instant in time, but in reality the availableSpot
may not be enough for all the requests. How to deal with this?
With version 4.0, MongoDB provides the ability to perform multi-document transactions against replica sets. (The forthcoming MongoDB 4.2 will extend this multi-document ACID transaction capability to sharded clusters.)
This means that no write operations within a multi-document transaction (such as updates to both the spots and bookings collections, per your proposed approach) are visible outside the transaction until the transaction commits.
Nevertheless, as noted in the MongoDB documentation on transactions a denormalized approach will usually provide better performance than multi-document transactions:
In most cases, multi-document transaction incurs a greater performance
cost over single document writes, and the availability of
multi-document transaction should not be a replacement for effective
schema design. For many scenarios, the denormalized data model
(embedded documents and arrays) will continue to be optimal for your
data and use cases. That is, for many scenarios, modeling your data
appropriately will minimize the need for multi-document transactions.
In MongoDB, an operation on a single document is atomic. Because you can use embedded documents and arrays to capture relationships between data in a single document structure instead of normalizing across multiple documents and collections, this single-document atomicity obviates the need for multi-document transactions for many practical use cases.
But do bear in mind that your use case, if implemented within one collection as a single denormalized document containing one availableSpots sub-document and many thousands of bookings sub-documents, may not be feasible as the maximum document size is 16MB.
So, in conclusion, a denormalized approach to write atomicity will usually perform better than a multi-document approach, but is constrained by the maximum document size of 16MB.

You can try using findAndModify() option while trying to update the document. Each time you will need to cherry pick whichever field you want to update in that particular document. Also, since mongo db replicates data to Primary and secondary nodes, you may also want to adjust your WriteConcern values as well. You can read more about this in official documentation. I have something similar coded that handles similar kind of concurrency issues in mongoDB using spring mongoTemplate. Let me know if you want any reference related to java with that.

Related

How to lock a Collection in MongoDB

I have a collection in my database
1.I want to lock my collection when the User Updating the Document
2.No operations are Done Expect Reads while Updating the collection for another Users
please give suggestions how to Lock the collection in MongoDB
Best Regards
GSY

MongoDB implements a writer greedy database level lock already.
This means that when a specific document is being written to:
The User collection would be locked
No reads will be available until the data is written
The reason that no reads are available is because MongoDB cannot do a consistent read while writing (darn you physics, you win again).
It is good to note that if you wish for a more complex lock, spanning multiple rows, then this will not be available in MongoDB and there is no real way of implementing such a thing.

MongoDB locking already does that for you. See what operations acquire which lock and what does each lock mean.

See the MongoDB documentation on write operations paying special attention to this section:
Isolation of Write Operations
The modification of a single document is always atomic, even if the write operation modifies >multiple sub-documents within that document. For write operations that modify multiple >documents, the operation as a whole is not atomic, and other operations may interleave.
No other operations are atomic. You can, however, attempt to isolate a write operation that >affects multiple documents using the isolation operator.
To isolate a sequence of write operations from other read and write operations, see Perform >Two Phase Commits.

How Efficient is Mongo DB ISOLATION

I came across a document that Mongo DB maintains a global write lock, wanted to know how efficient it is to support "ISOLATION" of "ACID" as of SQL database.

I came across a document that Mongo DB maintains a global write lock
That's old information, MongoDB is now on a database level lock, maybe sometime in the future collection, however, that has been put back in favour of concurrency.
wanted to know how efficient it is to support "ISOLATION" of "ACID" as of SQL database.
First thing first, MongoDB IS NOT AN ACID DATABASE. If you want ACID you should go with an ACID compliant database. Don't try and make a database do what it isn't designed to do.
As for actual isolation, currently MongoDB has isolation on a single document level with atomic operations such as $inc, $set, $unset and all those others. Isolation does not occur on multiple documents, there is an $isolated ( http://docs.mongodb.org/manual/reference/operator/isolated/ ) operator but it is highly recommened not to use it, plus it isn't supported on sharded collections.
There is also a documentation page on providing isolation levels: http://docs.mongodb.org/manual/tutorial/isolate-sequence-of-operations/ but only findAndModify and indexing can provide some element of isolation whereby other queries will not interferer.
Fundamentally, even if it had atomic operations on multiple documents, MongoDB cannot normally support isolation across many documents, this is due to one of its main concurrency features, the ability to subside out of memory operations for ones in memory.
And so I come back to my original point, if you want ACID go to a ACID tech.

how do non-ACID RethinkDB or MongoDB maintain secondary indexes for non-equal queries

This is more of 'inner workings' undestanding question:
How do noSQL databases that do not support *A*CID (meaning that they cannot update/insert and then rollback data for more than one object in a single transaction) -- update the secondary indexes ?
My understanding is -- that in order to keep the secondary index in sync (other wise it will become stale for reads) -- this has to happen withing the same transaction.
furthermore, if it is possible for index to reside on a different host than the data -- then a distributed lock needs to be present and/or two-phase commit for such an update to work atomically.
But if these databases do not support the multi-object transactions (which means they do not do two-phase commit on data across multiple host) , what method do they use to guarantee that secondary indices that reside in B-trees structures separate from the data are not stale ?

This is a great question.
RethinkDB always stores secondary indexes on the same host as the primary index/data for the table. Even in case of joins, RethinkDB brings the query to the data, so the secondary indexes, primary indexes, and data always reside on the same node. As a result, there is no need for distributed locking protocols such as two phase commit.
RethinkDB does support a limited set of transactional functionality -- single document transactions. Changes to a single document are recorded atomically. Relevant secondary index changes are also recorded as part of that transaction, so either the entire change is recorded, or nothing is recorded at all.
It would be easy to extend the limited transactional functionality to support multiple documents in a single shard, but it would be hard to do it across shards (for the distributed locking reasons you brought up), so we decided not to implement transactions for multiple documents yet.
Hope this helps.

This is a MongoDB answer.
I am not quite sure what your logic here is. Updating a secondary index has nothing to do with being able to rollback multi statement transactions such as a multiple update.
MongoDB has transcactions per a single document, and that is what matters for updating indexes. These operations can be reversed using the journal if the need arises.
this has to happen withing the same transaction.
Yes, much like a RDBMS would. The more indexes you apply the slower your writes will be, and it seems to me you know why.
As the write occurs MongoDB will update all indexes which apply to that collection with the fields that apply to specific indexes.
furthermore, if it is possible for index to reside on a different host than the data
I am unsure if MongoDB allows that, I believe there is a JIRA for it; however, I cannot find that JIRA currently.
then a distributed lock needs to be present and/or two-phase commit for such an update to work atomically.
Most likely. Allowing this feature would be...well, let's just say creating a hairball.
Even in a sharded setup the index of each range resides on the shard itself, not on the config servers.
But if these databases do not support the multi-object transactions (which means they do not do two-phase commit on data across multiple host)
That is not what a two phase commit means. I believe you need to brush up on what a two phase commit is: http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/
I suppose if you are talking about a transaction covering more than one shard then, hmm ok.
what method do they use to guarantee that secondary indices that reside in B-trees structures separate from the data are not stale ?
Agan I am unsure why a multi document transaction would effect whether an index would be stale or not, your not grouping across documents. The exception to that is a unique index but that works on single document updates as well; note that its uniqueness gets kinda hairy in sharded setups and cannot be guaranteed.
In an index you are creating, normally, one entry per document prefix key, uless it is a multikey index on the docment then you can make more than one index, however, either way index updating is done per single object, not by multi document transactions and I am unsure what you logic here is aas such this is the answer I have placed.

RethinkDB always stores secondary index data on the same machine as the data it's indexing. This allows it to be updated within the same transaction. Rethink promises to be ACIDy with single document operations and considers the indexing of a document to be part of the document itself.

Does findAndModify effectively lock the document to prevent update conflicts?

What type of locking does findAndModify() offer? Is is a write lock only, or read/write? Does it prevent simultaneous updates on the same record?

MongoDB has a global (per-instance) write lock, which serializes all updates across all data in the server (though different servers in a sharded cluster will each have their own independent locks). This means that at any given instant in time, only one update is taking place on any document, and therefore only one update for any given document.
findAndModify doesn't do anything different in this regard than an ordinary update -- it just returns the document to you.

According to the MongoDB docs for MongoDB: findAndModify() for under MongoDB: Atomic Operations it should be.

Why doesn't MongoDB use fsync()?

So I have done some research and found out that MongoDB doesn't do fsync(), which means that when you tell the database to write something, the database might tell you it's written, although it's not. Isn't this going against CRUD?
If I'm correct, are there any good reasons for this?

The reason is performance. Without having to write to disk on each change, MongoDB can handle updates faster.
MongoDB tells you when updates have been delivered to the server, not when the updates have been written, as you can read in the documentation on Verifying Propagation of Writes with getLastError:
Note: the current implementation returns when the data has been delivered to [the] servers. Future versions will provide more options for delivery vs. say, physical fsync at the server.
This is going against ACID, more specifically against the D, which stands for durability:
Durability [guarantees] that once the user has been notified of a transaction's success the transaction will not be lost, the transaction's data changes will survive system failure, and that all integrity constraints have been satisfied, so the DBMS won't need to reverse the transaction.
ACID properties mostly apply to traditional RDBMS systems. NoSQL systems, which includes MongoDB, give up on one or more of the ACID properties in order to achieve better scalability. In MongoDB's case durability has been sacrificed for better performance when handling large amounts of updates.
MongoDB and ACID
Most ACID properties are guarantees at transaction level. A transaction is usually a group of queries that should be treated as a single unit. MongoDB has no concept of transactions, again for performance reasons. Therefore most ACID properties don't apply to MongoDB.
A — Atomicity states that a transaction should either succeed or fail. It is not allowed to partially succeed; if part of the transaction fails, the entire transaction should be rolled back. MongoDB supports atomic operations on a document level, but not on a 'transaction' level.
C — Consistency partially refers to atomicity, but also includes referential integrity. A relational database is responsible for making sure that all foreign key references are valid. MongoDB has no concept of foreign keys, so this ACID property doesn't apply.
I — Isolation states that two concurrent transactions are not allowed to interfere with each other; if two transactions try to modify the same data, the second transaction has to wait for the first one to complete. To achieve this, the database will lock the data. MongoDB has no concept of locking, so it doesn't support isolation for multiple operations1). Single operations are isolated.
D — Durability is described above. MongoDB doesn't support true durability (yet), in terms of ACID-ic durability.
Now, you may think that MongoDB is useless compared to RDBMS systems because it lacks transactions and most ACID guarantees. However, part of the reason that transactions exist is that relational databases need to treat certain data as a single entity, but this data has been normalized into multiple tables.
MongoDB allows you to store your data as a single entity. This removes the need for foreign keys and referential integrity in most cases. You also don't need multi-query transactions, because you don't need multiple tables to update a single entity. Most of the times you only have to update a single document, and these operations are atomic in MongoDB.
1) According to the first comment on this page, db.eval() provides isolation for multiple operations. However, according to the documentation you usually want to avoid the use of db.eval().

Is this relevant?
durability: added occasinal file sync
default: sync every 60 seconds, confiruable with syncdelay
http://github.com/mongodb/mongo/commit/c44bff08fd95616302a73e92b48b2853c1fd948d

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse