Why doesn't MongoDB use fsync()?

So I have done some research and found out that MongoDB doesn't fsync() on every write, which means that when you tell the database to write something, the database might tell you it's written even though it isn't. Isn't this going against ACID?
If I'm correct, are there any good reasons for this?

The reason is performance. Without having to write to disk on each change, MongoDB can handle updates faster.
MongoDB tells you when updates have been delivered to the server, not when the updates have been written to disk, as you can read in the documentation on Verifying Propagation of Writes with getLastError:
Note: the current implementation returns when the data has been delivered to [the] servers. Future versions will provide more options for delivery vs. say, physical fsync at the server.
This is going against ACID, more specifically against the D, which stands for durability:
Durability [guarantees] that once the user has been notified of a transaction's success the transaction will not be lost, the transaction's data changes will survive system failure, and that all integrity constraints have been satisfied, so the DBMS won't need to reverse the transaction.
ACID properties mostly apply to traditional RDBMS systems. NoSQL systems, including MongoDB, give up one or more of the ACID properties in order to achieve better scalability. In MongoDB's case, durability has been sacrificed for better performance when handling large volumes of updates.
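To make that concrete: with a modern driver you can opt back into durability per write by raising the write concern, so the acknowledgement only comes back once the write has reached the on-disk journal. A minimal sketch with the Java driver (the connection string and collection names are my own assumptions):

```java
import com.mongodb.WriteConcern;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class JournaledWrite {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            // WriteConcern.JOURNALED: the server acknowledges the write only
            // after it is committed to the on-disk journal, trading write
            // throughput for durability.
            MongoCollection<Document> orders = client
                    .getDatabase("shop")
                    .getCollection("orders")
                    .withWriteConcern(WriteConcern.JOURNALED);
            orders.insertOne(new Document("item", "book").append("qty", 1));
        }
    }
}
```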
MongoDB and ACID
Most ACID properties are guarantees at transaction level. A transaction is usually a group of queries that should be treated as a single unit. MongoDB has no concept of transactions, again for performance reasons. Therefore most ACID properties don't apply to MongoDB.
A — Atomicity states that a transaction should either succeed or fail. It is not allowed to partially succeed; if part of the transaction fails, the entire transaction should be rolled back. MongoDB supports atomic operations on a document level, but not on a 'transaction' level.
C — Consistency partially refers to atomicity, but also includes referential integrity. A relational database is responsible for making sure that all foreign key references are valid. MongoDB has no concept of foreign keys, so this ACID property doesn't apply.
I — Isolation states that two concurrent transactions are not allowed to interfere with each other; if two transactions try to modify the same data, the second transaction has to wait for the first one to complete. To achieve this, the database will lock the data. MongoDB has no concept of locking, so it doesn't support isolation for multiple operations 1). Single operations are isolated.
D — Durability is described above. MongoDB doesn't (yet) support true durability in the ACID sense.
Now, you may think that MongoDB is useless compared to RDBMS systems because it lacks transactions and most ACID guarantees. However, part of the reason that transactions exist is that relational databases need to treat certain data as a single entity, but this data has been normalized into multiple tables.
MongoDB allows you to store your data as a single entity. This removes the need for foreign keys and referential integrity in most cases. You also don't need multi-query transactions, because you don't need multiple tables to update a single entity. Most of the time you only have to update a single document, and these operations are atomic in MongoDB.
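A small sketch of that point (Java driver; database, collection, and field names are made up): one update touching several fields, including an embedded sub-document, is applied atomically, so readers see either all of the changes or none of them.

```java
import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Updates.combine;
import static com.mongodb.client.model.Updates.inc;
import static com.mongodb.client.model.Updates.set;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class AtomicDocumentUpdate {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> customers =
                    client.getDatabase("crm").getCollection("customers");
            // All three changes land in one atomic document-level write:
            // a top-level field, a field inside an embedded document, and a counter.
            customers.updateOne(eq("_id", "c1"),
                    combine(set("name", "Alice Smith"),
                            set("address.city", "Berlin"),
                            inc("version", 1)));
        }
    }
}
```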
1) According to the first comment on this page, db.eval() provides isolation for multiple operations. However, according to the documentation you usually want to avoid the use of db.eval().

Is this relevant?
durability: added occasional file sync
default: sync every 60 seconds, configurable with syncdelay
http://github.com/mongodb/mongo/commit/c44bff08fd95616302a73e92b48b2853c1fd948d

Related

MongoDB rich queries and isolation guarantees

Asking for a friend.
Can we write ACID transactions that contain rich queries with MongoDB version 4?
If so - can I have an example or a pointer to an API?
Thanks in advance.
It seems like we can do it with MongoDB 4.0:
https://docs.mongodb.com/master/core/transactions/
MongoDB provides the ability to perform multi-document transactions against replica sets. Multi-document transactions can be used across multiple operations, collections, databases, and documents. Multi-document transactions provide an “all-or-nothing” proposition. When a transaction commits, all data changes made in the transaction are saved. If any operation in the transaction fails, the transaction aborts and all data changes made in the transaction are discarded without ever becoming visible. Until a transaction commits, no write operations in the transaction are visible outside the transaction.
Some examples in Java:
https://spring.io/blog/2018/06/28/hands-on-mongodb-4-0-transactions-with-spring-data
It seems like in Java you can even set it up to manage transactions using the @Transactional annotation, like in Hibernate.
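For reference, here is a minimal sketch of the same kind of transaction with the plain Java driver (MongoDB 4.0+ on a replica set; the bank/accounts documents are my own assumptions, not taken from the linked examples):

```java
import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Updates.inc;

import com.mongodb.client.ClientSession;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class TransferWithTransaction {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017");
             ClientSession session = client.startSession()) {
            MongoCollection<Document> accounts =
                    client.getDatabase("bank").getCollection("accounts");
            session.startTransaction();
            try {
                // Both updates become visible together at commit, or not at all.
                accounts.updateOne(session, eq("_id", "alice"), inc("balance", -100));
                accounts.updateOne(session, eq("_id", "bob"), inc("balance", 100));
                session.commitTransaction();
            } catch (RuntimeException e) {
                session.abortTransaction();
                throw e;
            }
        }
    }
}
```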
Just be aware that using transactions normally brings an additional performance cost.
In most cases, a multi-document transaction incurs a greater performance cost over single-document writes, and the availability of multi-document transactions should not be a replacement for effective schema design.
From my personal point of view, I would not go for Mongo if you need to rely on transactions a lot.

Is document-level transaction support enough? (in MongoDB)

MongoDB documentation and blog describe its transaction capabilities like this.
MongoDB write operations are ACID-compliant at the document level, including the ability to update embedded arrays and sub-documents atomically.
Now I'm wondering: is this "document-level transaction support" enough?
By "enough" I mean: can it be as good as the transaction support in old-fashioned RDBMSs?
About the possible duplicate: what I had in mind was a general question, namely whether this is "enough" for a developer or not.
I'm going to agree with Joshua on this and add my two cents. In the RDBMS world a transaction is very frequently updating multiple normalized data-bearing structures. A robust level of atomicity is required to ensure that changes are committed to all of those structures as a unit, or rolled back as a unit. In MongoDB you would ideally be designing your schema to keep data that logically belongs together housed together in the same document. This makes document-level atomicity perfectly sufficient for your typical document schema.
I'll also agree that neither RDBMS nor MongoDB transaction handling should be your only line of defense against errors and data corruption. For critical data changes that must be atomic you should always check consistency at the code level post-update.
One final thought: In most RDBMS systems, transaction handling does not always map one-to-one to concurrency. Frequently a large transaction can lock an entire table or tables and cause backlogs in response. In MongoDB, document-level ACID compliance in transaction handling pairs well with document-level concurrency available to those using the WiredTiger storage engine. If designed with both in mind your application can be highly concurrent and completely ACID compliant at the document level, giving you a high level of performance and throughput for transactional workloads.
Cheers,
Bill Finch
Answering this question involves an understanding of schema design in the NoSQL world. If you approach your schema design like you would in an RDBMS, then you will have a very bad time, and not just because of transactions.
If you design your documents properly, however, document-level ACID compliance should be just fine for 99% of use cases. I would even argue that outside that 99%, in the remaining 1% of use cases, you shouldn't be relying on your database for transactions anyway. That would be a really complicated case where you were changing two completely separate things in parallel. Even in an RDBMS, if you were doing this, you would always write a verification in code.
One example might be a bulk update for a banking customer that involved them changing their name and doing an address change at the same time. In an RDBMS these are likely to be separate tables. In MongoDB these will both be in the same document. So this fits in the 99%.
A debit to one account and credit to another would be an example that fits into the 1%. You can wrap that in a transaction in SQL, but if you don't write code to verify the writes afterward, you are going to lose your job. You would never rely on the database for that. Same with MongoDB, where these would be two different documents.
Document-level transactions are good, but not enough for real-world applications. In general you have to think a bit differently than in the RDBMS world and use sub-documents; that way you can solve many situations without collection-wide transactions. But there are enough use cases where you definitely need collection-wide transactions.
The debit/credit situation of an account system is one example... or if you implement a battle game, where two players fight against each other and the winner gets "resources" from the loser... you have to update the resource state of both players together, or roll both back if something fails. This is not handled transactionally by MongoDB the way it is in RDBMS systems.
Once again, as others have said already: you have to think in objects/document structures; that way you can handle many situations where document-level transactions are more than enough...
But collection-wide transactions are on MongoDB's roadmap ;-)
If you are able to include all your logical data in one document, MongoDB is going to be faster and offer higher performance than a relational database. You must be sure that all your data is going to be written, or none of it, at the same time (ACID compliance at the document level).
If you are not in a hurry, MongoDB is working hard to get transactions across collections!
Regards,
Juan
Starting with version 4.0, MongoDB adds support for multi-document transactions. So you will have the power of the document model with ACID guarantees in MongoDB.
For details visit this link: https://www.mongodb.com/blog/post/multi-document-transactions-in-mongodb

How to live without transactions?

Most of the popular NoSQL databases (MongoDB, RethinkDB) do not support ACID transactions. They are very popular today among developers of different systems.
The problem is: how to guarantee data consistency without transactions?
I thought that data consistency is one of the main things in production. Am I wrong?
Maybe there are some techniques to restore data consistency?
I would like to use RethinkDB for my project, but I'm scared about the missing transactions.
I do not know much about RethinkDB, so this answer is primarily based on MongoDB.
While MongoDB cannot provide atomic operations on multiple documents at the same time, it does guarantee atomicity for a single operation which affects one document. That means when one query changes multiple fields of the same document, you can be sure that all these changes will be performed at the same time. Combined with the MongoDB philosophy of keeping a consistent dataset in one document instead of spreading it over many rows of different related tables, this removes many situations where you would need transactions in a relational database.
Not every project needs complex transactions. Sure, there are some domains where it is essential (like most situations where you deal with money), but in other cases it isn't actually that big of a deal when some data is inconsistent for a few milliseconds. You need to consider how important data consistency is for your project. When you come to the conclusion that there are many situations where you do need transactions, then by all means, stick to SQL.
In a pinch, MongoDB can simulate multi-document transactions by using the two-phase commit model. It's not easy to implement, it's not easy to work with, it does not result in a pretty data model, but it is a valid workaround when you have a project which would be perfect for MongoDB in all regards except for that one use-case which just can't do without transactions.
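A heavily condensed sketch of that two-phase commit pattern in Java (all collection and field names here are assumptions; the full recipe also needs recovery and rollback logic for every intermediate state):

```java
import static com.mongodb.client.model.Filters.and;
import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Filters.ne;
import static com.mongodb.client.model.Updates.combine;
import static com.mongodb.client.model.Updates.inc;
import static com.mongodb.client.model.Updates.pull;
import static com.mongodb.client.model.Updates.push;
import static com.mongodb.client.model.Updates.set;

import com.mongodb.client.MongoCollection;
import org.bson.Document;
import org.bson.types.ObjectId;

public class TwoPhaseCommitSketch {
    // A document in "transactions" tracks the state of the transfer so that a
    // crashed run can be found later and either completed or rolled back.
    static void transfer(MongoCollection<Document> transactions,
                         MongoCollection<Document> accounts,
                         String from, String to, int amount) {
        ObjectId txnId = new ObjectId();
        transactions.insertOne(new Document("_id", txnId)
                .append("source", from).append("destination", to)
                .append("value", amount).append("state", "initial"));

        // initial -> pending
        transactions.updateOne(and(eq("_id", txnId), eq("state", "initial")),
                set("state", "pending"));

        // Apply the change to each account exactly once, tagging it with txnId
        // so a retry cannot apply the same debit or credit twice.
        accounts.updateOne(and(eq("_id", from), ne("pendingTransactions", txnId)),
                combine(inc("balance", -amount), push("pendingTransactions", txnId)));
        accounts.updateOne(and(eq("_id", to), ne("pendingTransactions", txnId)),
                combine(inc("balance", amount), push("pendingTransactions", txnId)));

        // pending -> applied, clear the tags, then applied -> done.
        transactions.updateOne(eq("_id", txnId), set("state", "applied"));
        accounts.updateMany(eq("pendingTransactions", txnId),
                pull("pendingTransactions", txnId));
        transactions.updateOne(eq("_id", txnId), set("state", "done"));
    }
}
```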
A lot of popular NoSQL data stores don't support atomic multi-key updates (transactions) out of the box, but most of them provide primitives which allow you to build ACID transactions at the application level.
If a data store supports per-key linearizability and a compare-and-set operation (atomic document updates), then it's enough to implement serializable client-side transactions. For example, this approach is used in Google's Percolator and in the CockroachDB database.
In my blog I created a step-by-step visualization of serializable cross-shard client-side transactions, described the major use cases, and provided links to variants of the algorithm. I hope it will help you to understand how to work with transactions in NoSQL data stores.
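The compare-and-set primitive itself is small; here is a sketch of it in Java against MongoDB, using a version field as the comparison (all names are my own, not from the linked posts):

```java
import static com.mongodb.client.model.Filters.and;
import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Updates.combine;
import static com.mongodb.client.model.Updates.inc;
import static com.mongodb.client.model.Updates.set;

import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class CompareAndSet {
    // Returns true if the document was updated; false means another writer
    // bumped the version first, so the caller should re-read and retry.
    static boolean cas(MongoCollection<Document> coll, Object id,
                       long expectedVersion, String field, Object newValue) {
        Document before = coll.findOneAndUpdate(
                and(eq("_id", id), eq("version", expectedVersion)),
                combine(set(field, newValue), inc("version", 1)));
        return before != null;
    }
}
```

Client-side transaction protocols are built by layering bookkeeping (intents, commit records) on top of exactly this kind of conditional update.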
Among the data stores which support per-key linearizability and CAS are:
Cassandra with lightweight transactions
Riak with consistent buckets
RethinkDB
ZooKeeper
etcd
HBase
DynamoDB
MongoDB
By the way, if you're fine with the Read Committed isolation level then it makes sense to take a look at RAMP transactions by Peter Bailis. They can also be implemented with the same set of primitives.
In RethinkDB, you have some guarantees for atomicity. According to the documentation (https://rethinkdb.com/docs/architecture/):
Write atomicity is supported on a per-document basis – updates to a single JSON document are guaranteed to be atomic. RethinkDB is different from other NoSQL systems in that atomic document updates aren't limited to a small subset of possible operations – any combination of operations that can be performed on a single document is guaranteed to update the document atomically.
When you want to run a non-atomic update, you have to explicitly opt in to it, according to https://www.rethinkdb.com/api/javascript/update/:
nonAtomic: if set to true, executes the update and distributes the result to replicas in a non-atomic fashion. This flag is required to perform non-deterministic updates, such as those that require reading data from another table.
There is an issue tracking transaction support for RethinkDB here: https://github.com/rethinkdb/rethinkdb/issues/4598
Anyway, you don't have full transactions, but you do have some basic guarantees that may be enough for you. Try to design your operations around those basics.

Does MongoDB have features such as triggers and procedures like a relational database?

As the title suggests (leaving out the map-reduce framework):
If I want to trigger an event to run a consistency check or security operations before a record is inserted, how can I do that with MongoDB?
MongoDB does not support triggers, but people have created solutions around them, mostly using the oplog, though this will only help you if you are running with replica sets, as the oplog is a capped collection that keeps track of data changes for the purposes of replication.
For a nodejs solution see: https://www.npmjs.org/package/mongo-watch or see an earlier SO thread: How to listen for changes to a MongoDB collection?
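To give a rough idea of what those solutions do under the hood, here is a sketch in Java that tails the oplog with a tailable cursor (the local.oplog.rs collection and its op/ns fields are real; the rest is an assumption, and production code would also track timestamps so it can resume):

```java
import com.mongodb.CursorType;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import org.bson.Document;

public class OplogTailer {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            // The oplog is a capped collection, so a tailable cursor blocks
            // and waits for new entries instead of terminating at the end.
            MongoCollection<Document> oplog =
                    client.getDatabase("local").getCollection("oplog.rs");
            try (MongoCursor<Document> cursor = oplog.find()
                    .cursorType(CursorType.TailableAwait).iterator()) {
                while (cursor.hasNext()) {
                    Document entry = cursor.next();
                    // "op" is the operation type (i=insert, u=update, d=delete),
                    // "ns" is the namespace it happened in; react to changes here.
                    System.out.println(entry.getString("ns") + " " + entry.get("op"));
                }
            }
        }
    }
}
```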
If you are concerned with consistency, read about write concern in MongoDB: http://docs.mongodb.org/manual/core/write-concern/ You can be as relaxed or as strict as you want by setting insert write concern levels, from fire-and-forget to getting an acknowledgement from all members of the replica set.
So, if you want to run a consistency check before inserting data, you probably will have to move that logic to the client application and set your write concern level to a level that will ensure consistency.
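The two ends of that spectrum look roughly like this with the Java driver (collection name assumed):

```java
import com.mongodb.WriteConcern;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class WriteConcernLevels {
    static void demo(MongoCollection<Document> events) {
        // Fire-and-forget: the driver does not wait for any acknowledgement.
        events.withWriteConcern(WriteConcern.UNACKNOWLEDGED)
              .insertOne(new Document("level", "none"));

        // Strict: acknowledged by a majority of replica set members
        // before the call returns.
        events.withWriteConcern(WriteConcern.MAJORITY)
              .insertOne(new Document("level", "majority"));
    }
}
```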
MongoDb does not have triggers or stored procedures. While there are solutions that some have used to try to emulate the behavior, as it is not a built-in feature, you'll need to decide whether the solutions are effective for you. Searching for "triggers and mongodb" should find dozens. All depend on the oplog and replicas.
But, given the nature of MongoDb and a typical 3 tier architecture, I would expect that at the point of data insertion, which could be on a web server for example, you would run, on the web server, the necessary consistency and security checks. You wouldn't allow a client such as a mobile application to directly set data into the database collection without some checks.
Many drivers for MongoDb and extended libraries have validation and consistency checks built in already, so there is less to do. Using unique indexes for some fields can also provide a level of consistency that you cannot do from the driver alone. Look at calls like findAndModify which make atomic updates.
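For instance, a unique index declared from the Java driver (field name assumed) gives you a consistency guarantee that no driver-side check alone can:

```java
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.IndexOptions;
import com.mongodb.client.model.Indexes;
import org.bson.Document;

public class UniqueIndexExample {
    static void ensureUniqueEmail(MongoCollection<Document> users) {
        // After this, any insert with a duplicate "email" value fails with a
        // duplicate key error, enforced by the server itself.
        users.createIndex(Indexes.ascending("email"),
                new IndexOptions().unique(true));
    }
}
```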

How do non-ACID RethinkDB or MongoDB maintain secondary indexes for non-equality queries?

This is more of an "inner workings" understanding question:
How do NoSQL databases that do not support *A*CID (meaning that they cannot update/insert and then roll back data for more than one object in a single transaction) update their secondary indexes?
My understanding is that in order to keep a secondary index in sync (otherwise it will become stale for reads), this has to happen within the same transaction.
Furthermore, if it is possible for an index to reside on a different host than the data, then a distributed lock and/or two-phase commit needs to be present for such an update to work atomically.
But if these databases do not support multi-object transactions (which means they do not do two-phase commit on data across multiple hosts), what method do they use to guarantee that secondary indexes residing in B-tree structures separate from the data are not stale?
This is a great question.
RethinkDB always stores secondary indexes on the same host as the primary index/data for the table. Even in the case of joins, RethinkDB brings the query to the data, so the secondary indexes, primary indexes, and data always reside on the same node. As a result, there is no need for distributed locking protocols such as two-phase commit.
RethinkDB does support a limited set of transactional functionality -- single document transactions. Changes to a single document are recorded atomically. Relevant secondary index changes are also recorded as part of that transaction, so either the entire change is recorded, or nothing is recorded at all.
It would be easy to extend the limited transactional functionality to support multiple documents in a single shard, but it would be hard to do it across shards (for the distributed locking reasons you brought up), so we decided not to implement transactions for multiple documents yet.
Hope this helps.
This is a MongoDB answer.
I am not quite sure what your logic here is. Updating a secondary index has nothing to do with being able to roll back multi-statement transactions such as a multi-update.
MongoDB has transactions per single document, and that is what matters for updating indexes. These operations can be reversed using the journal if the need arises.
this has to happen within the same transaction.
Yes, much like an RDBMS would. The more indexes you apply, the slower your writes will be, and it seems to me you know why.
As the write occurs MongoDB will update all indexes which apply to that collection with the fields that apply to specific indexes.
Furthermore, if it is possible for an index to reside on a different host than the data
I am unsure if MongoDB allows that; I believe there is a JIRA for it, however, I cannot find that JIRA currently.
then a distributed lock and/or two-phase commit needs to be present for such an update to work atomically.
Most likely. Allowing this feature would be...well, let's just say creating a hairball.
Even in a sharded setup the index of each range resides on the shard itself, not on the config servers.
But if these databases do not support multi-object transactions (which means they do not do two-phase commit on data across multiple hosts)
That is not what a two-phase commit means. I believe you need to brush up on what a two-phase commit is: http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/
I suppose if you are talking about a transaction covering more than one shard then, hmm ok.
what method do they use to guarantee that secondary indexes residing in B-tree structures separate from the data are not stale?
Again, I am unsure why a multi-document transaction would affect whether an index is stale or not; you're not grouping across documents. The exception to that is a unique index, but that works on single-document updates as well; note that its uniqueness gets kinda hairy in sharded setups and cannot be guaranteed.
In an index you normally create one entry per document, unless it is a multikey index on the document, in which case a single document can produce several entries. Either way, index updating is done per single object, not by multi-document transactions, and since I am unsure what your logic here is, this is the answer I have placed.
RethinkDB always stores secondary index data on the same machine as the data it's indexing. This allows it to be updated within the same transaction. Rethink promises to be ACIDy with single document operations and considers the indexing of a document to be part of the document itself.