Should I use different databases or just different collections in MongoDB to store user information and rest of the database? - mongodb

I am pretty new to MongoDB. I am creating an application where I will have users and a lot of other data.I have already created a database where I am storing user information using MongoDB. Now I have to create a new database or collection to store rest of the data. What are the pros and cons of creating different or different collection ?

I use MongoDB in a very similar way and have already thought a lot about dividing my database. Here are some of the things we considered:
Using 2 databases is harder to maintain, your application will have to know which database to update, also it can increase the costs (even more if you intend to monitor the databases and host on different infrastructure).
Mongo 2 used to lock the entire database when updating, so I think it would be better to separate then, but Mongo 3 with WiredTiger locks only the document, so you won't have the problems we used to have in the past.
One good thing about splitting the database in two is that even if your data explodes one database, the other will still work
IMHO, if you use a decent machine to store your databases and monitor it the right way, you won't have any troubles keeping just one until your system is giant with millions of active users. You can also use Replica Sets and Sharding to increase efficiency.

Related

Is there any performance impact to have multiple Mongo databases?

We are currently working on an application using Mongo and we try to evaluate benefits and constraints on each differents architecture choices related to spreading data on multiple databases/collections or using a single shared one.
Is there any performance penalties between one single database with a lot of collections or many databases with less collections per database ?
From what I understand it does not seem to have any impact because sharding is done per collection basis but I would like some confirmations.
Regards
By performance, I guess you mean read/write speed. Using multiple databases with fewer collections would definitely increase your read/write speed since each database can handle more read/write operations on the collections associated with them. 
However, spreading data across databases this way I believe can bring about extra complexity to your project, depending on how your codebase is structured, it might introduce complexity to your application logic, things like backup and other admin database operations won't be straight forward, cross collection ad-hoc queries for collection that lives in different databases would be next to impossible.
If the goal of the architecture design is to ensure high read/write speed, you can still go with using a single DB that can be auto-scaled at the deployment level. I don't know much about it but I think Replication is a MongoDB feature that can help you achieve such auto-scaling and if you are in for database-as-a-service, you should check out MongoDB Atlas, auto-scaling is out of the box.

MongoDB - Can I shard all new db's (Created by application) automatically?

My team will deploy a new version of our app (Capture social media posts, hashtags etc.) they create a different DB for each user and we may have thousands of collections on each DB. I read all mongoDB shard documentation and I saw that I can only shard an collection or one DB at time, I'm missing something ?
We will start this new version fresh, without any databases and we will grow from 0 again (For now, we have 23k users) but we will escalate this number really quickly (100.000+ at the end of the year)
My question is: I really need a Shard cluster ? (My test setup have 3 shards with 3 microshards, 3 config servers and 2 mongos) for now, in production, i have a large server doing all the hard work but i dont want to scale to top, the horizontal scale is the best choice, i think.
Can I shard all my databases automatically or I really need to do that one by one doing the shard key procedure and so. ?
Thanks in advance
You are reading correctly. What you intend to do is so far away from what any sensible person would do that MongoDB doesn't offer any tools to support this. If you really want to go with this WTF solution, your application will be responsible to set up sharding for each collection it creates. This forces you to give administration permission to the application (despite what any security guides recommend).
"Will you really need a sharded cluster" - that depends on how much data you will have and how often you query it with what kind of query. But it is unlikely to work anyway, because your sharded cluster will have to manage (100,000 databases* 1.000 collections) = a hundred million collections. MongoDB is not designed for scaling in that direction. The cluster will likely be so busy with bookkeeping that you won't really see any notable performance gain.
It is also questionable if clustering would even theoretically make sense. Clustering is usually only useful when you have very large collections. But in your scenario where your data is so heavily fragmented into a million collections, each individual collection is unlikely to be very large.
If you really want to go this route, it might in fact be a better solution to separate the databases physically by assigning each user to a database server.
Or you could just build a database architecture like a normal team would with one database for all users and one collection per type of document. You would then speed up lookups by creating a compound index on user and whatever criteria you used to tell which database a document belonged to. This index might also be a good shard key.

Multiple single-collection databases, or database with multiple collections?

Is there any advantage to using multiple collections within a database, when multiple databases each with a single collection would accomplish the same thing? From what I can gather, using multiple databases reduces lock contention because locks are per-database, so I wonder why you'd ever want to put more than one collection in a single database.
The only downside I've found mentioned is that there's some overhead (~200MB) per database, and that with a large number of databases, OS file handles can become a limitation, but I imagine that if you have enough collections/databases for those to be issues, you've got too many databases. These overheads are bearable in my case; I'd like to know if there's something else I should know about.
EDIT: In my situation there are currently 30 collections spread across 8 databases. I'm asking this question because I think it may be better to make this 30 collections across 30 databases. There's no real reason for the current structure; it was set up by a team who don't know much about databases so picked arbitrarily. It's now used frequently enough for lock contention to be a factor (profiling shows some operations spending >1 second waiting for locks). We'll scale horizontally too, I just saw this as a potential low-hanging fruit since it just means using a different database name for some operations, instead of a different collection name.
Apologies if this has been asked before; the only similar questions I've found have been about whether to use e.g. "a collection per user" which isn't quite the same thing. In my case I have heterogeneous documents which I definitely do want stored in different collections, I'd just like to know whether to store those collections in the same database or not.
may be duplicate of this: creating a different database for each collection in MongoDB 2.2
in my solution I created own database for each large and highload collection, for rest collections I create another common database. Mongodb implements locks on a per-database basis for most read and write operations: http://docs.mongodb.org/manual/faq/concurrency/ But locks in mongoDb not so nasty as in SQL.
This solution increase productivity of mongodb for me.

Using a Database vs a Collection in MongoDB

I am building a site with users who have discussions and write blogs and plan to use MongoDB as the database for the site. Which architecture option would be more efficient and allow for easier data flow between them:
One Database with a Blogs Collection, a Discussions Collection, and a User Activity Collection? Each collection would be sharded as appropriate.
A Blogs Database, a Discussions Database, and a User Activity Database? Each database would be broken into collections and sha rded as appropriate.
It won't make a big difference whether you put everything into a single database or into multiple databases until you find you need to do something that's handled on the database level, for example access control, or placing database files on separate physical devices (to reduce I/O contention).
In addition, currently locking granularity is on the database level so if you happen to have a very large number of small writes having them go to different databases will mean that they will not be contending for the same lock. Since you anticipate sharding you can also place each database on a different shard which may allow you to defer actually needing to shard any particular collection as each shard would only be handling the traffic for that database's collection(s).
I would say if you are in doubt go ahead and put them in separate databases, it's unlikely to hurt and it may help.
Mongo will work, but getting familiar with it may take time depending on your experience.
If you use MySQL (or another SQL db) you may have an easier time. You should probably just create separate tables for your blogs, discussions, and activity, rather than multiple databases.
Another factor to consider is the size of your databases. An SQL database is fine for most applications, even fairly large ones. MongoDB (and other NoSQL db's) are great for scaling big data.
Hope this helps!

creating a different database for each collection in MongoDB 2.2

MongoDB 2.2 has a write lock per database as opposed to a global write lock on the server in previous versions. So would it be ok if i store each collection in a separate database to effectively have a write lock per collection.(This will make it look like MyISAM's table level locking). Is this approach faulty?
There's a key limitation to the locking and that is the local database. That database includes a the oplog collection which is used for replication.
If you're running in production, you should be running with Replica Sets. If you're running with Replica Sets, you need to be aware of the write lock effect on that database.
Breaking out your 10 collections into 10 DBs is useless if they all block waiting for the oplog.
Before taking a large step to re-write, please ensure that the oplog will not cause issues.
Also, be aware that MongoDB implements DB-level security. If you're using any security features, you are now creating more DBs to secure.
Yes that will work, 10gen actually offers this as an option in their talks on locking.
I probably isolate every collection, though. Most databases seem to have 2-5 high activity collections. For the sake of simplicity it's probably better to keep the low activity collections grouped in one DB and put high activity collections in their own databases.