MongoDB vs RDBMS in terms of schema design - mongodb

I have a few questions about MongoDB. I've been using MongoDB since a few days ago and I'm very confused because I have experience only in relational databases such as MySQL and PostgreSQL.
My first attempt using Mongo went wrong: I tried to put all the entities as embedded documents in a single collection. The problem here is that one of these entities had a lot of millions of documents. Another problem is the dificulty of getting some data in that structure. So I designed again my "schema" separating some entities as differents collections with related id's, like a primary key in a RDBMS. This helps me a lot to do some "joins", because when the data is embedded, $lookup returns the whole document (its parent) and not only the ones who matches in the localField and foreignField. This is very strange to me, but I had a lot of flexibility to make querys and get only the data that I needed to.
So, my questions is: if I have different collections, one for each entity and references among them, what's the difference between a RDBMS and MongoDB? Is Mongo (in this context) more efficient than a RDBMS (knowing that some of the entities grows in millions of data every day)?
Thanks!

Related

Single big collection for all products vs Separate collections for each Product category

I'm new to NoSQL and I'm trying to figure out the best way to model my database. I'll be using ArangoDB in the project but I think this question also stands if using MongoDB.
The database will store 12 categories of products. Each category is expected to hold hundreds or thousands of products. Products will also be added / removed constantly.
There will be a number of common fields across all products, but each category will also have unique fields / different restrictions to data.
Keep in mind that there are instances where I'd need to query all the categories at the same time, for example to search a product across all categories, and other instances where I'll only need to query one category.
Should I create one single collection "Product" and use a field to indicate the category, or create a seperate collection for each category?
I've read many questions related to this idea (1 collection vs many) but I haven't been able to reach a conclusion, other than "it dependes".
So my question is: In this specific use case which option would be most optimal, multiple collections vs single collection + sharding, in terms of performance and speed ?
Any help would be appreciated.
As you mentioned, you need to play with your data and use-case. You will have better picture.
Some decisions required as below.
Decide the number of documents you will have in near future. If you will have 1m documents in an year, then try with at least 3m data
Decide the number of indices required.
Decide the number of writes, reads per second.
Decide the size of documents per category.
Decide the query pattern.
Some inputs based on the requirements
If you have more writes with more indices, then single monolithic collection will be slower as multiple indices needs to be updated.
As you have different set of fields per category, you could try with multiple collections.
There is $unionWith to combine data from multiple collections. But do check the performance it purely depends on the above decisions. Note this open issue also.
If you decide to go with monolithic collection, defer the sharding. Implement this once you found that queries are slower.
If you have more writes on the same document, writes will be executed sequentially. It will slow down your read also.
Think of reclaiming the disk space when more data is cleared from the collections. Multiple collections do good here.
The point which forces me to suggest monolithic collections is that I'd need to query all the categories at the same time. You may need to add more categories, but combining all of them in single response would not be better in terms of performance.
As you don't really have a join use case like in RDBMS, you can go with single monolithic collection from model point of view. I doubt you could have a join key.
If any of my points are incorrect, please let me know.
To SQL or to NoSQL?
I think that before you implement this in NoSQL, you should ask yourself why you are doing that. I quite like NoSQL but some data is definitely a better fit to that model than others.
The data you are describing is a classic case for a relational SQL DB. That's fine if it's a hobby project and you want to try NoSQL, but if this is for a production environment or client, you are likely making the situation more difficult for them.
Relational or non-relational?
You mention common fields across all products. If you wish to update these fields and have those updates reflected in all products, then you have relational data.
Background
It may be worth reading Sarah Mei 2013 article about this. Skip to the section "How MongoDB Stores Data" and read from there. Warning: the article is called "Why You Should Never Use MongoDB" and is (perhaps intentionally) somewhat biased against Mongo, so it's important to read this through the correct lens. The message you should get from this article is that MongoDB is not a good fit for every data type.
Two strategies for handling relational data in Mongo:
every time you update one of these common fields, update every product's document with the new common field data. This is generally only ok if you have few updates or few documents, but not both.
use references and do joins.
In Mongo, joins typically happen code-side (multiple db calls)
In Arango (and in other graph dbs, as well as some key-value stores), the joins happen db-side (single db call)
Decisions
These are important factors to consider when deciding which DB to use and how to model your data
I've used MongoDB, ArangoDB and Neo4j.
Mongo definitely has the best tooling and it's easy to find help, but I don't believe it's good fit in this case
Arango is quite pleasant to work with, but doesn't yet have the adoption that it deserves
I wouldn't recommend Neo4j to anyone looking for a NoSQL solution, as its nodes and relations only support flat properties (no nesting, so not real documents)
It may also be worth considering MariaDB or Postgres

Are there technical downsides to using a single collection over multiple collections in MongoDB?

Since MongoDB is schemaless, I could just drop all my documents into a single collection, with a key collection and an index on that key.
For example this:
db.getCollection('dogs').find()
db.getCollection('cars').find()
Would become this:
db.getCollection('all').find({'collection': 'dogs'})
db.getCollection('all').find({'collection': 'cars'})
Is there any technical downside to doing this?
There are multiple reasons to have different collections, maybe the two most importants are:
Performance: even if MongoDB has been designed to be flexible, it doesn't prevent the need to have indexes on fields that will be used during the search. You would have dramatic response times if the collection is too heterogeneous.
Maintenability/evolutivity: design should be driven by the usecases (usually you'll store the data as it's received by the application) and the design should be explicit to anyone looking at the database collections
MongoDB University is a great e-learning platform, it is free and there is in particular this course:
M320: Data Modeling
schema questions are often better understood by working backwards from the queries you'll rely on and how the data will get written.... if you were going to query Field1 AND Field2 together in 1 query statement you do want them in the same collection....dogs and cars don't sound very related while dogs and cats do...so really look at how you're going to want to query.....joining collections is not really ideal - doable via $lookup but not ideal....

When should I create a new collections in MongoDB?

So just a quick best practice question here. How do I know when I should create new collections in MongoDB?
I have an app that queries TV show data. Should each show have its own collection, or should they all be store within one collection with relevant data in the same document. Please explain why you chose the approach you did. (I'm still very new to MongoDB. I'm used to MySql.)
The Two Most Popular Approaches to Schema Design in MongoDB
Embed data into documents and store them in a single collection.
Normalize data across multiple collections.
Embedding Data
There are several reasons why MongoDB doesn't support joins across collections, and I won't get into all of them here. But the main reason why we don't need joins is because we can embed relevant data into a single hierarchical JSON document. We can think of it as pre-joining the data before we store it. In the relational database world, this amounts to denormalizing our data. In MongoDB, this is about the most routine thing we can do.
Normalizing Data
Even though MongoDB doesn't support joins, we can still store related data across multiple collections and still get to it all, albeit in a round about way. This requires us to store a reference to a key from one collection inside another collection. It sounds similar to relational databases, but MongoDB doesn't enforce any of key constraints for us like most relational databases do. Enforcing key constraints is left entirely up to us. We're good enough to manage it though, right?
Accessing all related data in this way means we're required to make at least one query for every collection the data is stored across. It's up to each of us to decide if we can live with that.
When to Embed Data
Embed data when that embedded data will be accessed at the same time as the rest of the document. Pre-joining data that is frequently used together reduces the amount of code we have to write to query across multiple collections. It also reduces the number of round trips to the server.
Embed data when that embedded data only pertains to that single document. Like most rules, we need to give this some thought before blindly following it. If we're storing an address for a user, we don't need to create a separate collection to store addresses just because the user might have a roommate with the same address. Remember, we're not normalizing here, so duplicating data to some degree is ok.
Embed data when you need "transaction-like" writes. Prior to v4.0, MongoDB did not support transactions, though it does guarantee that a single document write is atomic. It'll write the document or it won't. Writes across multiple collections could not be made atomic, and update anomalies could occur for how many ever number of scenarios we can imagine. This is no longer the case since v4.0, however it is still more typical to denormalize data to avoid the need for transactions.
When to Normalize Data
Normalize data when data that applies to many documents changes frequently. So here we're talking about "one to many" relationships. If we have a large number of documents that have a city field with the value "New York" and all of a sudden the city of New York decides to change its name to "New-New York", well then we have to update a lot of documents. Got anomalies? In cases like this where we suspect other cities will follow suit and change their name, then we'd be better off creating a cities collection containing a single document for each city.
Normalize data when data grows frequently. When documents grow, they have to be moved on disk. If we're embedding data that frequently grows beyond its allotted space, that document will have to be moved often. Since these documents are bigger each time they're moved, the process only grows more complex and won't get any better over time. By normalizing those embedded parts that grow frequently, we eliminate the need for the entire document to be moved.
Normalize data when the document is expected to grow larger than 16MB. Documents have a 16MB limit in MongoDB. That's just the way things are. We should start breaking them up into multiple collections if we ever approach that limit.
The Most Important Consideration to Schema Design in MongoDB is...
How our applications access and use data. This requires us to think? Uhg! What data is used together? What data is used mostly as read-only? What data is written to frequently? Let your applications data access patterns drive your schema, not the other way around.
The scope you've described is definitely not too much for "one collection". In fact, being able to store everything in a single place is the whole point of a MongoDB collection.
For the most part, you don't want to be thinking about querying across combined tables as you would in SQL. Unlike in SQL, MongoDB lets you avoid thinking in terms of "JOINs"--in fact MongoDB doesn't even support them natively.
See this slideshare:
http://www.slideshare.net/mongodb/migrating-from-rdbms-to-mongodb?related=1
Specifically look at slides 24 onward. Note how a MongoDB schema is meant to replace the multi-table schemas customary to SQL and RDBMS.
In MongoDB a single document holds all information regarding a record. All records are stored in a single collection.
Also see this question:
MongoDB query multiple collections at once

MongoDB data modeling - separate or combine collections?

i have a question for the performance in meteorJS. Before i used meteorJS is always wrote my Applications in PHP and MySQL. In MySQL i always created a lot of tables with many connections betweens them.
For example:
Table User
id;login;password;email
Table User_Data
user_id;name;age
My questions is now how i have to design my MongoDB collections. Its nice that the collection are build like js objects so i dont have to predesign my tables and can always easy change the collumns. But is it better to combine all data to one collection or to several collections ?
For example:
Table User
_id;login;password;email;data:{name;age}
Is it better or worse for the performance ? Or is it the wrong pattern to design MongoDB Collections ?
The question mainly about MongoDB data modeling. What you'll learn applies to MongoDB used with Meteor or with anything else.
http://docs.mongodb.org/manual/data-modeling/ talks about data modeling with MongoDB and is a good introduction.
In your particular case, you can read more about how to avoid JOINs in MongoDB.

Many-to-many relationships in CouchDB or MongoDB

I have an MSSQL database which I am considering porting to CouchDB or MongoDB. I have a many-to-many relationship within the SQL db which has hundreds of thousands rows in the xref table, corresponding to tens of thousands of rows in the tables on each side of the relationship. Will CouchDB and/or MongoDB be able to handle this data, and what would be the best way of formatting the relevant documents for performant querying? Thanks very much.
For CouchDB, I would highly recommend reading this article about Entity Relationships.
One thing I would note in CouchDB is to be careful of attempting to "normalize" a non-relational data model. The document-based storage offers you a great deal of flexibility, and it's seldom the best idea to abstract everything into as many "document types" as you can think of. Many times, it's best to leave much of your data within the same document unless you have clear cases where separate entities exist.
One common use-case of many-to-many relationships is implementing tagging. There are articles about different methods you can use to accomplish this in CouchDB. It may apply to your requirements, it may not, but it's probably worth a read.
Since the 'collection' model of MongoDB is similar to tables you can of course maintain
the m:n relationship inside a dedicated mapping collection (using the _id of the related documents of the referenced documents from other collections).
If you can: consider redesign your application using embedded documents.
http://www.mongodb.org/display/DOCS/Schema+Design
In general: try to turn off your memories to a RDBMS when working with MongoDB.
Blindly copying the database design from RDBMS to MongoDB is neither helpful nor adviceable nor will it work in general.