does mongodb have the properties such as trigger and procedure in a relational database? - mongodb

as the title suggests, include out the map-reduce framework
if i want to trigger an event to run a consistency check or security operations before a record is inserted, how can i do that with MongoDB?

MongoDB does not support triggers, but people have created solutions around them, mostly using the oplog, though this will only help you if you are running with replica sets, as the oplog is a capped collection that keeps track of data changes for the purposes of replication.
For a nodejs solution see: https://www.npmjs.org/package/mongo-watch or see an earlier SO thread: How to listen for changes to a MongoDB collection?
If you are concerned with consistency, read about write concern in mongoDB. http://docs.mongodb.org/manual/core/write-concern/ You can be as relaxed or as strict as you want by setting insert write concern levels, from fire and hope to getting an acknowledgement from all members of the replica set.
So, if you want to run a consistency check before inserting data, you probably will have to move that logic to the client application and set your write concern level to a level that will ensure consistency.

MongoDb does not have triggers or stored procedures. While there are solutions that some have used to try to emulate the behavior, as it is not a built-in feature, you'll need to decide whether the solutions are effective for you. Searching for "triggers and mongodb" should find dozens. All depend on the oplog and replicas.
But, given the nature of MongoDb and a typical 3 tier architecture, I would expect that at the point of data insertion, which could be on a web server for example, you would run, on the web server, the necessary consistency and security checks. You wouldn't allow a client such as a mobile application to directly set data into the database collection without some checks.
Many drivers for MongoDb and extended libraries have validation and consistency checks built in already, so there is less to do. Using unique indexes for some fields can also provide a level of consistency that you cannot do from the driver alone. Look at calls like findAndModify which make atomic updates.

Related

Is document-level Transaction enough? (in mongodb)

MongoDB documentation and blog describe its transaction capabilities like this.
MongoDB write operations are ACID-compliance at the document level- including the >ability to update embedded arrays and sub-documents automatically.
Now I'm wondering is this "document-level transaction support" enough ?
by enough I mean can it be as good as transaction support in old fashioned RDBMSs ?
about the possible duplicate, what i had in mind was a general question, the fact that "is this enough?" for a developer? or not.
I'm going to agree with Joshua on this and add my two cents. In the RDBMS world a transaction is very frequently updating multiple normalized data-bearing structures. A robust level of atomicity is required to ensure that changes are committed to all of those structures as a unit, or rolled back as a unit. In MongoDB you would ideally be designing your schema to keep data that logically belongs together housed together in the same document. This makes document-level atomicity perfectly sufficient for your typical document schema.
I'll also agree that neither RDBMS nor MongoDB transaction handling should be your only line of defense against errors and data corruption. For critical data changes that must be atomic you should always check consistency at the code level post-update.
One final thought: In most RDBMS systems, transaction handling does not always map one-to-one to concurrency. Frequently a large transaction can lock an entire table or tables and cause backlogs in response. In MongoDB, document-level ACID compliance in transaction handling pairs well with document-level concurrency available to those using the WiredTiger storage engine. If designed with both in mind your application can be highly concurrent and completely ACID compliant at the document level, giving you a high level of performance and throughput for transactional workloads.
Cheers,
Bill Finch
Answering this question involves an understanding of schema design in the NoSQL world. If you approach your schema design like you would in an RDBMS, then you will have a very bad time, and not just because of transactions.
If you design your documents properly, however, document level ACID-compliance should be just fine for 99% of use cases. I would even argue that outside that 99% and in that 1% of use cases, you shouldn't be relying on your database for transactions anyways. This would be a really complicated case where you were changing two completely separate things in parallel. Even in an RDBMS if you were doing this, you would always write a verification in code.
One example might be a bulk update for a banking customer that involved them changing their name and doing an address change at the same time. In an RDBMS these are likely to be separate tables. In MongoDB these will both be in the same document. So this fits in the 99%.
A debit to one account and credit to another would be an example that fit into the 1%. You can wrap that in a transaction in SQL, but if you don't write code to verify the writes afterward, you are going to loose your job. You would never rely on the database for that. Same with MongoDB, where these would be two different documents.
Document-level transactions are good, but not enough for real-world applications. in general, you have to think a bit different as in a RDBMS-world, and use "sub-document" and you can solve many situations without collection-wide transactions, but there are enough use-cases, where you definitly need collection-wide transactions.
The debit/credit-situation of an account-system is one example... or if you implement a battle-game, where two player fight against each other and the one (winner) gets "resources" from the other (looser)... you have to update the resource-state of both players in parallel or both have to be rolled back, if something failed. This is not handled by MongoDB transactional as in RDBM-systems.
Once again, as others said already: you have to think in objects/document-structure, there you can handle many situations, where document-level-transactions are far enough...
But the collection-wide-transactions are on the roadmap of MongoDB ;-)
If you are able to include all your logical data in one document, MongoDB is going to be faster and higher performance than a relational database. You must be sure that all your data are going to be written, or not, at the same time (ACID compliant at the document level).
If you are not in a hurry, MongoDB is working hard to get transactions across collections!
Regards,
Juan
Starting from version 4.0 MongoDB will add support for multi-document transactions. So you will have the power of the document model with ACID guarantees in MongoDB.
For details visit this link: https://www.mongodb.com/blog/post/multi-document-transactions-in-mongodb?jmp=community

How to live without transactions?

Most of the popular NoSQL databases (MongoDB, RethinkDB) do not support ACID transactions. They are very popular today within developers of different systems.
The problem is: how to guarantee data consistency without transactions?
I thought that data consistency is one of the main things in production. Am I wrong?
Maybe there is some technics to restore data consistency?
I would like to use RethinkDB for my project, but I'm scare about missed transactions.
I do not know much about RethinkDB, so this answer is primarily based on MongoDB.
while MongoDB can not provide atomic operations on multiple documents at the same time, it does guarantee atomicity for a single operation which affects one document. That means when one query changes multiple fields of the same document, you can be sure that all these changes will be performed at the same time. Combined with the MongoDB philosophy of keeping a consistent dataset in one document instead of spreading it over many rows of different related tables, this removes many situations where you would need transactions in a relational database.
not every project needs complex transactions. Sure, there are some domains where it is essential (like most situations where you deal with money), but in other cases it isn't actually that big of a deal when some data is inconsistent for a few milliseconds. You need to consider how important data consistency is for your project. When you come to the conclusion that there are many situations where you do need transactions, then by all means, stick to SQL.
In a pinch, MongoDB can simulate multi-document transactions by using the two-phase commit model. It's not easy to implement, it's not easy to work with, it does not result in a pretty data model, but it is a valid workaround when you have a project which would be perfect for MongoDB in all regards except for that one use-case which just can't do without transactions.
A lot of popular NoSQL data stores don't support atomic multi-key updates (transactions) of the box but most of them provide primitives which allow you to build ACID transactions on the application level.
If a data store supports per key linearizability and compare-and-set operation (atomic document updates) then it's enough to implement serializable client-side transactions. For example, this approach is used in Google's Percolator and in CockroachDB database.
In my blog I created step-by-step visualization of serializable cross shard client-side transactions, described the major use cases and provided links to the variants of the algorithm. I hope it will help you to understand how to work with transactions with NoSQL data stores.
Among the data stores which support per key linearizability and CAS are:
Cassandra with lightweight transactions
Riak with consistent buckets
RethinkDB
ZooKeeper
Etdc
HBase
DynamoDB
MongoDB
By the way, if you're fine with Read Committed isolation level then it makes sense to take a look on RAMP transactions by Peter Bailis. They can be also implemented with the same set of primitives.
In RethinkDB, you have some guanrantee for atomicity. According to the document https://rethinkdb.com/docs/architecture/
Write atomicity is supported on a per-document basis – updates to a
single JSON document are guaranteed to be atomic. RethinkDB is
different from other NoSQL systems in that atomic document updates
aren’t limited to a small subset of possible operations – any
combination of operations that can be performed on a single document
is guaranteed to update the document atomically
When you want to run a non-atomic update, you have to explicitly opt in for it, according to https://www.rethinkdb.com/api/javascript/update/
nonAtomic: if set to true, executes the update and distributes the
result to replicas in a non-atomic fashion. This flag is required to
perform non-deterministic updates, such as those that require reading
data from another table.
It has an issue to track some Transaction support for RethinkDB here: https://github.com/rethinkdb/rethinkdb/issues/4598
Anyway, you don't have good transaction but you have some basic guarantee that is enough for you. And try to design your operation around those basic thing.

Configure a Mongo replica set to only replicate certain collections

I have a ~3GB mongo database with several dozen collections. Three of these collections handle ~300 queries per second, while the rest sustain a much lower volume. I expect the traffic to continue to grow quickly.
I'd like to set up a replica set to handle the high-traffic collections. It isn't necessary for this new instance to replicate the rest of the database. Is this possible?
Seems like not possible at the moment by built-in features of mongodb and only way to do is to come up with your own manual replication algorithm or use some other tools written by third parties.
https://github.com/wordnik/wordnik-oss project might help you to achieve this according to the following post.
https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/Ap9V4ArGuFo
Describes workaround to filter documents in replication.
Replicate only documents where {'public':true} in MongoDB
Or just replicate the data yourself manually which might worth trying.
Good luck.
No that isn't possible now. What you could do is move those collections into another unreplicated database. But this will cause headaches once these collections see higher traffic too, so you would need to move them into your "replication"-db.
But in general Replication isn't the way to go if you need to scale, it's more considered for DR/failover. Replicaset Secondaries can only (optionally) answer read queries but no write queries, this is something you should keep in mind. So if you have high write load this may not cure your problem.
Once you allow your application to read from secondaries you need to live with eventual consistency, meaning that your application isn't guaranteed to see always the latest data. This is caused due to the asynchronous replication to the secondaries.
Indeed you can cure this problem if you configure your writeconcern, so that the write needs to succeeded on all replicas, before it's considered written and your driver returns. But this may slow down your write operations significant.
So for scaling query execution capabilities I would go with Sharding. This is possible on a per collection level, all unsharded collections will remain on a "default-shard".
Not possible but then if the data size is so small and these collections aren't updated, then the only overhead of having them replicated is the small storage size on the secondary. That is a relatively small price to pay, especially since the collections won't grow in size, compared with writing your own replication logic.
Instead of that archive the data, and have only the latest data set on the production server and the rest of the data can archive on the new server.

creating a different database for each collection in MongoDB 2.2

MongoDB 2.2 has a write lock per database as opposed to a global write lock on the server in previous versions. So would it be ok if i store each collection in a separate database to effectively have a write lock per collection.(This will make it look like MyISAM's table level locking). Is this approach faulty?
There's a key limitation to the locking and that is the local database. That database includes a the oplog collection which is used for replication.
If you're running in production, you should be running with Replica Sets. If you're running with Replica Sets, you need to be aware of the write lock effect on that database.
Breaking out your 10 collections into 10 DBs is useless if they all block waiting for the oplog.
Before taking a large step to re-write, please ensure that the oplog will not cause issues.
Also, be aware that MongoDB implements DB-level security. If you're using any security features, you are now creating more DBs to secure.
Yes that will work, 10gen actually offers this as an option in their talks on locking.
I probably isolate every collection, though. Most databases seem to have 2-5 high activity collections. For the sake of simplicity it's probably better to keep the low activity collections grouped in one DB and put high activity collections in their own databases.

Why doesn't MongoDB use fsync()?

So I have done some research and found out that MongoDB doesn't do fsync(), which means that when you tell the database to write something, the database might tell you it's written, although it's not. Isn't this going against CRUD?
If I'm correct, are there any good reasons for this?
The reason is performance. Without having to write to disk on each change, MongoDB can handle updates faster.
MongoDB tells you when updates have been delivered to the server, not when the updates have been written, as you can read in the documentation on Verifying Propagation of Writes with getLastError:
Note: the current implementation returns when the data has been delivered to [the] servers. Future versions will provide more options for delivery vs. say, physical fsync at the server.
This is going against ACID, more specifically against the D, which stands for durability:
Durability [guarantees] that once the user has been notified of a transaction's success the transaction will not be lost, the transaction's data changes will survive system failure, and that all integrity constraints have been satisfied, so the DBMS won't need to reverse the transaction.
ACID properties mostly apply to traditional RDBMS systems. NoSQL systems, which includes MongoDB, give up on one or more of the ACID properties in order to achieve better scalability. In MongoDB's case durability has been sacrificed for better performance when handling large amounts of updates.
MongoDB and ACID
Most ACID properties are guarantees at transaction level. A transaction is usually a group of queries that should be treated as a single unit. MongoDB has no concept of transactions, again for performance reasons. Therefore most ACID properties don't apply to MongoDB.
A — Atomicity states that a transaction should either succeed or fail. It is not allowed to partially succeed; if part of the transaction fails, the entire transaction should be rolled back. MongoDB supports atomic operations on a document level, but not on a 'transaction' level.
C — Consistency partially refers to atomicity, but also includes referential integrity. A relational database is responsible for making sure that all foreign key references are valid. MongoDB has no concept of foreign keys, so this ACID property doesn't apply.
I — Isolation states that two concurrent transactions are not allowed to interfere with each other; if two transactions try to modify the same data, the second transaction has to wait for the first one to complete. To achieve this, the database will lock the data. MongoDB has no concept of locking, so it doesn't support isolation for multiple operations1). Single operations are isolated.
D — Durability is described above. MongoDB doesn't support true durability (yet), in terms of ACID-ic durability.
Now, you may think that MongoDB is useless compared to RDBMS systems because it lacks transactions and most ACID guarantees. However, part of the reason that transactions exist is that relational databases need to treat certain data as a single entity, but this data has been normalized into multiple tables.
MongoDB allows you to store your data as a single entity. This removes the need for foreign keys and referential integrity in most cases. You also don't need multi-query transactions, because you don't need multiple tables to update a single entity. Most of the times you only have to update a single document, and these operations are atomic in MongoDB.
1) According to the first comment on this page, db.eval() provides isolation for multiple operations. However, according to the documentation you usually want to avoid the use of db.eval().
Is this relevant?
durability: added occasinal file sync
default: sync every 60 seconds, confiruable with syncdelay
http://github.com/mongodb/mongo/commit/c44bff08fd95616302a73e92b48b2853c1fd948d