MongoDB - Is there any advantage in moving frequently updated fields to different collections?

I'm new to MongoDB and document-oriented databases, and while I was migrating a relational database to this whole new concept of storing data, a question came up:
In relational databases it's usually a good idea to create a new table to store frequently updated fields (let's say you have a users table and a last_activity one) so that the slow write operations don't lock the other tables.
Is there any advantage of doing the same in MongoDB, since the read operations seem to be very performant and doing two queries wouldn't be much of a problem?
Thank you all in advance.

Starting with MongoDB 3.2, WiredTiger is the default storage engine. With this engine there is no need to create additional collections.
Also, do not forget to create an index on the fields you filter on when updating.
// createIndex replaces ensureIndex, which is deprecated
db.test.createIndex({name: 1});
db.test.update({"name":"Alex"}, {$set:{"last_name":"alexeev"}})

If you use MMAPv1 (the default storage engine before MongoDB 3.2), then you have collection-level concurrency and it may be beneficial to create new collections for frequently updated fields.
However, the WiredTiger storage engine has document-level concurrency, so there is no need to create additional collections.
https://docs.mongodb.org/v3.0/core/wiredtiger/
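To make the two layouts discussed above concrete, here is a minimal sketch in JavaScript. The collection names `users` and `user_activity` and the key `user_id` are assumptions for illustration (only `last_activity` comes from the question). The helpers only build the arguments you would pass to `updateOne`, so they run without a database.

```javascript
// Hypothetical helpers: they only build the arguments for updateOne().

// Layout 1: last_activity lives inside the user document.
// With WiredTiger's document-level concurrency this is usually enough.
function touchEmbedded(userId, now) {
  return {
    collection: "users",
    filter: { _id: userId },
    update: { $set: { last_activity: now } },
  };
}

// Layout 2: last_activity lives in its own collection (assumed name).
// This may help on MMAPv1, where concurrency is per collection.
function touchSeparate(userId, now) {
  return {
    collection: "user_activity",
    filter: { user_id: userId },
    update: { $set: { last_activity: now } },
    options: { upsert: true }, // create the document on first activity
  };
}

const now = new Date();
console.log(touchEmbedded(42, now).collection);
console.log(touchSeparate(42, now).collection);
```

The second layout costs an extra query whenever you need the user together with the activity timestamp, which is the trade-off the question asks about.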

Related

MongoDB - Database vs Collection memory-wise

Let's say, for example, I need to have 5 collections, each about 10GB.
What is the difference in performance, with emphasis on memory usage, between assigning each collection to its own database versus having all of these collections in the same database?
Also, in this scenario, what's the difference between the MMAPv1 storage engine and WiredTiger?
In MongoDB, database is just a namespace. All data is stored in collections. This is true for both MMAPv1 (deprecated in 4.0, not available anymore in the upcoming 4.2) and WiredTiger storage engine.
Because of this, splitting your data into separate databases makes no difference storage-wise, unless you specify the directoryPerDB setting, which simply puts the collections belonging to each database in their own folder.

In MongoDB, does a lock apply to a collection, a database, or a server?

In a MongoDB server, there may be multiple databases, and each database can have multiple collections, and a collection can have multiple documents.
Does a lock apply to a collection, a database, or a server?
I asked this question because when designing MongoDB database, I want to determine what is stored in a database and what is in a collection. My data can be partitioned into different parts, and I hope to be able to move a part from a MongoDB server to a filesystem, without being hindered by the lock that applies to another part, so I wish to store the parts of data in a way that different parts have different locks.
Thanks.
From the official documentation : https://docs.mongodb.com/manual/faq/concurrency/
Basically, it's global / database / collection.
But with some specific storage engines, it can lock at document level too, for instance with WiredTiger (only with Mongo 3.0+)

Transaction support in MongoDB

I am new to MongoDB. I read that MongoDB does not support multi-document transactions here: http://docs.mongodb.org/manual/faq/fundamentals/.
If I want to save data in two collections (A and B) atomically, then I can't do that using MongoDB, i.e. if the save fails for B, A will still have the data. Isn't that a big disadvantage?
Still, people are using MongoDB rather than RDBMS. Why?
MongoDB 4.0 adds support for multi-document ACID transactions now.
UPDATE
MongoDB has already started to support multi-document transactions.
https://docs.mongodb.com/manual/core/transactions/
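As a rough sketch of what saving to two documents atomically can look like with the official Node.js driver, assuming MongoDB 4.0+ running as a replica set (multi-document transactions are not available on a standalone server). The database name `bank`, the collection name `accounts` and the `balance` field are placeholders, and `client` is assumed to be an already connected `MongoClient`.

```javascript
// Sketch: move `amount` between two accounts inside one transaction.
// `client` is assumed to be a connected MongoClient; names are placeholders.
async function transfer(client, fromId, toId, amount) {
  const accounts = client.db("bank").collection("accounts");
  const session = client.startSession();
  try {
    // withTransaction commits if the callback resolves
    // and aborts the transaction if it throws.
    await session.withTransaction(async () => {
      await accounts.updateOne(
        { _id: fromId },
        { $inc: { balance: -amount } },
        { session } // every operation must carry the session
      );
      await accounts.updateOne(
        { _id: toId },
        { $inc: { balance: amount } },
        { session }
      );
    });
  } finally {
    await session.endSession();
  }
}
```

If the second update fails, the first one is rolled back too, which is the behaviour the question asks for with collections A and B.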
Before version 4.0, MongoDB did not support multi-document transactions.
However, MongoDB does provide atomic operations on a single document. Often these document-level atomic operations are sufficient to solve problems that would require ACID transactions in a relational database.
For example, in MongoDB, you can embed related data in nested arrays or nested documents within a single document and update the entire document in a single atomic operation. Relational databases might represent the same kind of data with multiple tables and rows, which would require transaction support to update the data atomically.
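A small sketch of that idea, assuming a hypothetical order document of the shape `{ _id, total, items: [{ sku, qty, price }] }`. The helper builds a single update document; when passed to `updateOne`, both modifications are applied to the one document atomically.

```javascript
// Hypothetical order document with embedded line items:
// { _id, total, items: [{ sku, qty, price }] }
// One update document changes the array and the total together.
function addItemUpdate(item) {
  return {
    $push: { items: item },                 // append the embedded line item
    $inc: { total: item.qty * item.price }, // adjust the running total
  };
}

const update = addItemUpdate({ sku: "A1", qty: 2, price: 5 });
console.log(JSON.stringify(update));
// Usage (assumed collection name): db.orders.updateOne({ _id: orderId }, update)
```

In a relational layout the line item and the order total would typically sit in two tables, so the same change would need a transaction.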
MongoDB doesn't support transactions, but saving one document is atomic.
So it is better to design your database schema in such a way that all the data that needs to be saved atomically is placed in one document.
MongoDB does not support transactions the way relational databases do. The ACID properties of transactions are a completely different piece of functionality, provided by the storage engines in MySQL.
Some of the features of InnoDB engine in MySQL:
Crash Recovery
Double write buffer
Auto commit settings
Isolation Level
This is what MongoDB community has to say:
MongoDB does not have support for traditional locking or complex transactions with rollback.
MongoDB aims to be lightweight, fast, and predictable in its performance. By keeping transaction support extremely simple, MongoDB can provide greater performance especially for partitioned or replicated systems with a number of database server processes.
The purpose of a transaction is to make sure that the whole database stays consistent while multiple operations take place.
But in contrast to most relational databases, MongoDB isn't designed to run on a single host. It is designed to be set up as a cluster of multiple shards, where each shard is a replica set of multiple servers (optionally at different geographical locations).
But if you are still looking for way to make transactions possible:
Try using the document-level atomicity provided by Mongo
Two-phase commit in Mongo provides a simple transaction mechanism for basic operations
mongomvcc is built on top of Mongo and, according to its authors, also supports transactions
A hybrid of MySQL and Mongo
Multi-document updates or "multi-document transactions" using the two-phase commit approach described here: http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/
This question is quite old but for anyone who stumbles upon this page, you could use fawn. It's an npm package that solves this exact problem. Disclosure: I wrote it
Say you have two bank accounts, one belongs to John Smith and the other belongs to Broke Individual. You would like to transfer $20 from John Smith to Broke Individual. Assuming all first name and last name pairs are unique, this might look like:
var Fawn = require("fawn");
// Fawn must be initialised (see its README) before tasks can run
var task = Fawn.Task();
// assuming "Accounts" is the Accounts collection
task.update("Accounts", {firstName: "John", lastName: "Smith"}, {$inc: {balance: -20}})
  .update("Accounts", {firstName: "Broke", lastName: "Individual"}, {$inc: {balance: 20}})
  .run()
  .then(function(){
    // update is complete
  })
  .catch(function(err){
    // Everything has been rolled back.
    // log the error which caused the failure
    console.log(err);
  });
Caveat:
Tasks are currently not isolated (working on that), so technically it's possible for two tasks to retrieve and edit the same document, simply because that's how MongoDB works.
It's really just a generic implementation of the two phase commit example on the tutorial site: https://docs.mongodb.com/manual/tutorial/perform-two-phase-commits/
Starting from version 4.0 MongoDB will add support for multi-document transactions. So you will have the power of the document model with ACID guarantees in MongoDB.
Transactions in MongoDB will be like transactions in relational databases.
For details visit this link: https://www.mongodb.com/blog/post/multi-document-transactions-in-mongodb?jmp=community
Only single-document transactions are supported (as of version 3.2).
You can see it at: https://docs.mongodb.com/v3.2/tutorial/perform-two-phase-commits/

creating a different database for each collection in MongoDB 2.2

MongoDB 2.2 has a write lock per database, as opposed to the global write lock on the server in previous versions. So would it be OK to store each collection in a separate database, to effectively have a write lock per collection? (This would make it look like MyISAM's table-level locking.) Is this approach faulty?
There's a key limitation to the locking, and that is the local database. That database includes the oplog collection, which is used for replication.
If you're running in production, you should be running with Replica Sets. If you're running with Replica Sets, you need to be aware of the write lock effect on that database.
Breaking out your 10 collections into 10 DBs is useless if they all block waiting for the oplog.
Before taking a large step to re-write, please ensure that the oplog will not cause issues.
Also, be aware that MongoDB implements DB-level security. If you're using any security features, you are now creating more DBs to secure.
Yes that will work, 10gen actually offers this as an option in their talks on locking.
I probably wouldn't isolate every collection, though. Most databases seem to have 2-5 high-activity collections. For the sake of simplicity it's probably better to keep the low-activity collections grouped in one DB and put the high-activity collections in their own databases.
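One way to keep that split manageable in application code is a small routing helper. This is only a sketch: the collection names, the `app_` prefix and the shared `app_main` database are all made up for illustration, and `client` is assumed to be a connected `MongoClient` from the Node.js driver.

```javascript
// Hypothetical routing table: high-activity collections get their own
// database, everything else shares one (all names are placeholders).
const HOT_COLLECTIONS = new Set(["events", "sessions"]);

function databaseFor(collectionName) {
  return HOT_COLLECTIONS.has(collectionName)
    ? "app_" + collectionName
    : "app_main";
}

// Resolve a collection handle through the routing table.
function getCollection(client, name) {
  return client.db(databaseFor(name)).collection(name);
}

console.log(databaseFor("events"), databaseFor("users"));
```

As noted above, this only helps if the oplog is not the real bottleneck, and every extra database also needs its own security setup.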

What are the advantages of using a schema-free database like MongoDB compared to a relational database?

I'm used to using relational databases like MySQL or PostgreSQL, and combined with MVC frameworks such as Symfony, RoR or Django, and I think it works great.
But lately I've heard a lot about MongoDB which is a non-relational database, or, to quote the official definition,
a scalable, high-performance, open source, schema-free, document-oriented database.
I'm really interested in being on edge and want to be aware of all the options I'll have for a next project and choose the best technologies out there.
In which cases using MongoDB (or similar databases) is better than using a "classic" relational databases?
And what are the advantages of MongoDB vs MySQL in general?
Or at least, why is it so different?
If you have pointers to documentation and/or examples, it would be of great help too.
Here are some of the advantages of MongoDB for building web applications:
A document-based data model. The basic unit of storage is analogous to JSON, Python dictionaries, Ruby hashes, etc. This is a rich data structure capable of holding arrays and other documents. This means you can often represent in a single entity a construct that would require several tables to properly represent in a relational db. This is especially useful if your data is immutable.
Deep query-ability. MongoDB supports dynamic queries on documents using a document-based query language that's nearly as powerful as SQL.
No schema migrations. Since MongoDB is schema-free, your code defines your schema.
A clear path to horizontal scalability.
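To illustrate the "your code defines your schema" point from the list above, here is a minimal sketch. The `user` shape and its default values are hypothetical. Instead of running a migration when a field is added, the application fills in defaults for older documents when it reads them.

```javascript
// Hypothetical "user" shape; older documents may lack newer fields.
const USER_DEFAULTS = { name: "", tags: [], newsletter: false };

// Fill in fields that are missing from documents written
// before those fields existed. The input is not modified.
function normalizeUser(doc) {
  return {
    ...USER_DEFAULTS,
    ...doc,
    tags: doc.tags ? [...doc.tags] : [], // never share the default array
  };
}

const oldDoc = { name: "Alex" };                // written before "tags" existed
const newDoc = { name: "Sam", tags: ["admin"], newsletter: true };
console.log(normalizeUser(oldDoc), normalizeUser(newDoc));
```

The flip side is that the schema now lives only in application code, so every reader of the collection has to agree on it.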
You'll need to read more about it and play with it to get a better idea. Here's an online demo:
http://try.mongodb.org/
There are numerous advantages.
For instance, your database schema will be more scalable, you won't have to worry about migrations, and the code will be more pleasant to write. Here's the code of one of my models:
class Setting
  include MongoMapper::Document
  key :news_search, String, :required => true
  key :is_availaible_for_iphone, :required => true, :default => false
  belongs_to :movie
end
Adding a key is just adding a line of code!
There are also other advantages that will appear in the long run, like better scalability and speed.
... But keep in mind that a non-relational database is not better than a relational one. If your database has a lot of relations and normalization, it might make little sense to use something like MongoDB. It's all about finding the right tool for the job.
For more things to read I'd recommend taking a look at "Why I think Mongo is to Databases what Rails was to Frameworks" or this post on the mongodb website. To get excited and if you speak french, take a look at this article explaining how to set up MongoDB from scratch.
Edit: I almost forgot to tell you about this railscast by Ryan. It's very interesting and makes you want to start right away!
The advantage of schema-free is that you can dump whatever your load is in it, and no one will ever have any ground for complaining about it, or for saying that it was wrong.
It also means that whatever you dump in it, remains totally void of meaning after you have done so.
Some would label that a gross disadvantage, some others won't.
The fact that a relational database has a well-established schema, is a consequence of the fact that it has a well-established set of extensional predicates, which are what allows us to attach meaning to what is recorded in the database, and which are also a necessary prerequisite for us to do so.
Without a well-established schema there are no extensional predicates, and without extensional predicates there is no way for the user to make any meaning out of what was stuffed in it.
My experience with Postgres and Mongo after working with both databases in my projects.
Postgres (RDBMS)
Postgres is recommended if your future applications have a complicated schema that needs lots of joins, if all the data is related, or if you have heavy writes. Postgres is open source, fast, ACID compliant, uses less space on disk, performs well for JSON storage too, and includes full serializability of transactions with 3 levels of transaction isolation.
The biggest advantage of staying with Postgres is that we get the best of both worlds. We can store data as JSONB with constraints, consistency and speed. On the other hand, we can use all SQL features for other types of data. The underlying engine is very stable and copes well with a good range of data volumes. It also runs on your choice of hardware and operating system. Postgres provides NoSQL capabilities along with full transaction support, storing JSON documents with constraints on the field data.
General Constraints for Postgres
Scaling Postgres Horizontally is significantly harder, but doable.
Fast read operations cannot be fully achieved with Postgres.
NoSQL Databases
MongoDB (WiredTiger)
MongoDB may beat Postgres in the dimension of "horizontal scale". Storing JSON is what Mongo is optimized to do. Mongo stores its data in a binary format called BSON, which is (roughly) just a binary representation of a superset of JSON. MongoDB stores objects exactly as they were designed. According to MongoDB, for write-intensive applications the new engine (WiredTiger) gives users up to a 10x increase in write performance (I should try this), with an 80 percent reduction in storage utilization, helping to lower storage costs and achieve greater utilization of hardware.
General Constraints of MongoDb
The usage of a schema less storage engine leads to the problem of implicit schemas. These schemas aren’t defined by our storage engine but instead are defined based on application behavior and expectations.
Stand-alone NoSQL technologies do not meet ACID standards because they sacrifice critical data protections in favor of high throughput performance for unstructured applications. It’s not hard to apply ACID on NoSQL databases but it would make database slow and inflexible up to some extent. “Most of the NoSQL limitations were optimized in the newer versions and releases which have overcome its previous limitations up to a great extent”.
It's all about trade offs. MongoDB is fast but not ACID, it has no transactions. It is better than MySQL in some use cases and worse in others.
The lines below are from MongoDB: The Definitive Guide.
There are several good reasons:
Keeping different kinds of documents in the same collection can be a
nightmare for developers and admins. Developers need to make sure
that each query is only returning documents of a certain kind or
that the application code performing a query can handle documents of
different shapes. If we’re querying for blog posts, it’s a hassle to
weed out documents containing author data.
It is much faster to get a list of collections than to extract a
list of the types in a collection. For example, if we had a type key
in the collection that said whether each document was a “skim,”
“whole,” or “chunky monkey” document, it would be much slower to
find those three values in a single collection than to have three
separate collections and query for their names.
Grouping documents of the same kind together in the same collection
allows for data locality. Getting several blog posts from a
collection containing only posts will likely require fewer disk
seeks than getting the same posts from a collection containing
posts and author data.
We begin to impose some structure on our documents when we create
indexes. (This is especially true in the case of unique indexes.)
These indexes are defined per collection. By putting only documents
of a single type into the same collection, we can index our
collections more efficiently.
After a question about databases with textual storage, I glanced at MongoDB and similar systems.
If I understood correctly, they are supposed to be easier to use and set up, and much faster. Perhaps also more secure, as the lack of SQL prevents SQL injection...
Apparently, MongoDB is used mostly for Web applications.
Basically, and they state this themselves, these databases aren't suited for complex queries, data mining, etc. But they shine at quickly retrieving lots of flat data.
MongoDB supports searching by field and by regular expression. Queries can also include user-defined JavaScript functions.
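A short sketch of a field search combined with a regular expression search. The field names `city` and `name` are placeholders; the resulting object is a filter you could pass to `find`. User input is escaped first so that it is matched literally rather than interpreted as a pattern.

```javascript
// Escape regex metacharacters so user input is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a filter: exact match on one field, case-insensitive
// prefix match on another. Field names are placeholders.
function buildFilter(city, namePrefix) {
  return {
    city: city,
    name: { $regex: "^" + escapeRegex(namePrefix), $options: "i" },
  };
}

console.log(JSON.stringify(buildFilter("Paris", "al")));
// Usage (assumed collection name): db.people.find(buildFilter("Paris", "al"))
```

A prefix pattern anchored with `^` is generally the friendliest form for an index on the field, although a case-insensitive match limits how much the index can help.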
MongoDB can be used as a file system, taking advantage of load balancing and data replication features over multiple machines for storing files.