Multiple databases vs. multiple collections - MongoDB 3.2

I know this question has been asked a number of times here. However, I have been unable to find a satisfying answer or reach a conclusion.
This question is specifically about MongoDB version 3.2. Should I have separate DBs and collections for different apps, or just one DB with all collections within it?
To simplify it further, let’s say I have about 15-20k apps on a server. Is it advisable to create a different database for each of these apps (with 10 collections/app), or to create just one database and store all collections (20k apps * 10 collections = 200k collections) in it?
Also, this would be called from a single Node app, so I need to consider the performance impact of having multiple DB connections.

Should I have separate DBs and collections for different apps, or just one DB with all collections within it?
As I understand it, there is no concrete answer or algorithm for this question. It mostly depends on the number of connections to the database.
The number of connections depends on the number of concurrent incoming requests, so first consider logically whether it is really worth creating a different database for each of these apps (with 10 collections per app).
The second way is to measure the response time of each layout with some load testing, if possible.
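The load-testing suggestion above can be sketched roughly as follows. This is a minimal timing harness, not a full load test: `measure_latency` and `stand_in_query` are hypothetical names, and the stand-in would need to be replaced with a real query against each candidate layout (one database per app, or one shared database).

```python
import time
from typing import Callable, Dict, List


def measure_latency(operation: Callable[[], object], runs: int = 200) -> Dict[str, float]:
    """Call `operation` repeatedly and return latency figures in milliseconds."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()

    def percentile(p: float) -> float:
        # Simple index-based percentile on the sorted samples.
        index = min(len(samples) - 1, int(p * len(samples)))
        return samples[index]

    return {"p50": percentile(0.50), "p95": percentile(0.95), "max": samples[-1]}


def stand_in_query() -> int:
    # Placeholder only: swap in a real read or write against the layout under test.
    return sum(range(1000))


result = measure_latency(stand_in_query, runs=100)
print(result)
```

Running the same harness against both layouts, with a realistic number of concurrent callers, gives numbers to compare instead of a guess.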

Related

MongoDB multiple collections or multiple databases

We are using .NET Core and Node.js microservices, some of them with MongoDB.
Currently we have the following DB structure:
Every customer gets their own database.
So if we have a microservice for invoices, every new customer adds 1 new DB for that microservice:
Invoice_customerA
Invoice_customerB
etc...
The collections in each such DB remain the same (usually we have 1-3 collections in each DB).
In terms of logic, we choose the right DB at runtime based on the request input.
I am now thinking about changing this a bit and separating customers at the collection level instead.
Taking the same example as before, the Invoice service would have only 1 DB,
Invoice_allCustomers
and there would be 1 new collection in it for each customer (or more, if the service has more collections):
collection_customerA
collection_customerB
What I am trying to understand is whether there is any difference performance-wise.
Or is it mostly a "cosmetic" change?
Or are there other considerations?
P.S.
If the change is mostly cosmetic, I think the new solution is better for us, since we usually have only 1-2 collections per microservice.
It will also be easier to navigate when there are significantly fewer databases.
As far as I know, in a microservices architecture each service should have its own database. If it is not a separate service, you can use one database with different collections in it. The change is mostly cosmetic, but I should also warn you that MongoDB still has its limits, which are listed in the MongoDB documentation. It really depends on the amount of data that will be stored and retrieved.
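The two layouts from the question can be compared side by side with a small routing helper. This is only a sketch: `resolve_namespace` is a hypothetical function, and the fixed collection name `invoices` is an assumption, since the question does not name the collections inside each per-customer database.

```python
from typing import Tuple


def resolve_namespace(service: str, customer: str, layout: str) -> Tuple[str, str]:
    """Return (database_name, collection_name) for one customer's data.

    layout="db_per_customer":         e.g. Invoice_customerA / invoices
    layout="collection_per_customer": e.g. Invoice_allCustomers / collection_customerA
    """
    if layout == "db_per_customer":
        # "invoices" is a made-up collection name used for illustration.
        return f"{service}_customer{customer}", "invoices"
    if layout == "collection_per_customer":
        return f"{service}_allCustomers", f"collection_customer{customer}"
    raise ValueError(f"unknown layout: {layout}")


print(resolve_namespace("Invoice", "A", "db_per_customer"))
print(resolve_namespace("Invoice", "A", "collection_per_customer"))
```

Either way the application still picks a namespace from the request input at runtime; only the position of the customer in the name changes. With PyMongo, for instance, the resulting pair would be used as `client[db_name][collection_name]`.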

Should I use different databases or just different collections in MongoDB to store user information and rest of the database?

I am pretty new to MongoDB. I am creating an application where I will have users and a lot of other data. I have already created a database where I am storing user information using MongoDB. Now I have to create a new database or collection to store the rest of the data. What are the pros and cons of creating a different database versus a different collection?
I use MongoDB in a very similar way and have already thought a lot about dividing my database. Here are some of the things we considered:
Using 2 databases is harder to maintain: your application has to know which database to update. It can also increase costs (even more so if you intend to monitor the databases and host them on different infrastructure).
Mongo 2 used to lock the entire database when updating, so back then I think it would have been better to separate. Mongo 3 with WiredTiger locks only the document, so you won't have the problems we used to have in the past.
One good thing about splitting the database in two is that even if your data overwhelms one database, the other will still work.
IMHO, if you use a decent machine to store your databases and monitor it the right way, you won't have any troubles keeping just one until your system is giant with millions of active users. You can also use Replica Sets and Sharding to increase efficiency.

Multiple single-collection databases, or database with multiple collections?

Is there any advantage to using multiple collections within a database, when multiple databases each with a single collection would accomplish the same thing? From what I can gather, using multiple databases reduces lock contention because locks are per-database, so I wonder why you'd ever want to put more than one collection in a single database.
The only downside I've found mentioned is that there's some overhead (~200MB) per database, and that with a large number of databases, OS file handles can become a limitation, but I imagine that if you have enough collections/databases for those to be issues, you've got too many databases. These overheads are bearable in my case; I'd like to know if there's something else I should know about.
EDIT: In my situation there are currently 30 collections spread across 8 databases. I'm asking this question because I think it may be better to make this 30 collections across 30 databases. There's no real reason for the current structure; it was set up by a team who don't know much about databases so picked arbitrarily. It's now used frequently enough for lock contention to be a factor (profiling shows some operations spending >1 second waiting for locks). We'll scale horizontally too, I just saw this as a potential low-hanging fruit since it just means using a different database name for some operations, instead of a different collection name.
Apologies if this has been asked before; the only similar questions I've found have been about whether to use e.g. "a collection per user" which isn't quite the same thing. In my case I have heterogeneous documents which I definitely do want stored in different collections, I'd just like to know whether to store those collections in the same database or not.
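Using the figures mentioned in this question (roughly 200 MB of overhead per database, 8 databases today, 30 if every collection gets its own), the extra cost can be estimated with simple arithmetic. The 200 MB value is the rough number quoted above, not a measurement, and `total_overhead_mb` is a hypothetical helper.

```python
PER_DATABASE_OVERHEAD_MB = 200  # rough figure quoted in the question, not measured


def total_overhead_mb(database_count: int, per_db_mb: int = PER_DATABASE_OVERHEAD_MB) -> int:
    """Estimate the fixed overhead for a given number of databases."""
    return database_count * per_db_mb


current = total_overhead_mb(8)    # 30 collections across 8 databases
proposed = total_overhead_mb(30)  # 30 collections across 30 databases
extra = proposed - current

print(f"current: {current} MB, proposed: {proposed} MB, extra: {extra} MB")
```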
This may be a duplicate of: creating a different database for each collection in MongoDB 2.2
In my solution I created a separate database for each large, high-load collection, and one common database for the remaining collections. MongoDB implements locks on a per-database basis for most read and write operations: http://docs.mongodb.org/manual/faq/concurrency/ However, locks in MongoDB are not as nasty as in SQL.
This solution improved MongoDB's performance for me.
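Why moving a busy collection into its own database helps under per-database locking can be shown with a toy model. This is a deliberate simplification: it ignores shared read locks, lock yielding, and everything else the server actually does, and `lock_key` and `contend` are hypothetical names. The database and collection names are made up.

```python
from typing import Tuple

Namespace = Tuple[str, str]  # (database, collection)


def lock_key(namespace: Namespace, granularity: str) -> Tuple[str, ...]:
    """Return the resource a write to `namespace` must lock in this toy model."""
    database, collection = namespace
    if granularity == "database":
        return (database,)
    if granularity == "collection":
        return (database, collection)
    raise ValueError(f"unknown granularity: {granularity}")


def contend(a: Namespace, b: Namespace, granularity: str) -> bool:
    """Two writes contend when they need the same lock."""
    return lock_key(a, granularity) == lock_key(b, granularity)


# Two different collections in the same database share one database-level lock.
print(contend(("main", "events"), ("main", "users"), "database"))      # True
# After moving the busy collection into its own database, they no longer do.
print(contend(("events_db", "events"), ("main", "users"), "database"))  # False
```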

120 mongodb collections vs single collection - which one is more efficient?

I'm new to mongodb and I'm facing a dilemma regarding my DB Schema design:
Should I create one single collection or put my data into several collections (we could call these categories, I suppose)?
Now I know many such questions have been asked, but I believe my case is different for 2 reasons:
If I go for many collections, I'll have to create about 120 and that's it. This won't grow in the future.
I know I'll never need to query or insert into multiple collections. I will always have to query only one, since a document in collection X is not related to any document stored in the other collections. Documents may hold references to other parts of the DB though (like userId etc).
So my question is: could the 120 collections improve query performance? Is this a useful optimization in my case?
Or should I just go for single collection + sharding?
Each collection is expected to hold millions of documents. If I use only one, it will store billions of docs.
Thanks in advance!
------- Edit:
Thanks for the great answers.
In fact the 120 collections is only a self made limit, it's not really optimal:
The data in the collections is related to web publishers. There could be millions of these (any web site can join).
I guess the ideal situation would be if I could create a collection for each publisher (to hold their data only). But obviously, this is not possible due to mongo limitations.
So I came up with the idea of a fixed number of collections to at least distribute the data somehow. Like: collection "A_XX" would hold XX Platform related data for publishers whose names start with "A".. etc. We'll only support a few of these platforms, so 120 collections should be more than enough.
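As a rough illustration of that naming scheme, the mapping from a publisher to its collection could look like the sketch below. `bucket_collection` is a hypothetical helper, and the `OTHER` bucket for names that do not start with a letter A-Z is an assumption, since that case is not covered above.

```python
def bucket_collection(publisher_name: str, platform_code: str) -> str:
    """Map a publisher to a collection named <first letter>_<platform code>."""
    first = publisher_name.strip()[:1].upper()
    if not ("A" <= first <= "Z"):
        # Assumption: names starting with digits, symbols, or nothing share one bucket.
        first = "OTHER"
    return f"{first}_{platform_code}"


print(bucket_collection("acme-news.example", "XX"))  # A_XX
print(bucket_collection("9to5.example", "XX"))       # OTHER_XX
```

One consequence of bucketing by first letter is that bucket sizes follow the distribution of publisher names, so some collections may end up much larger than others.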
On another website someone suggested using many databases instead of many collections. But this means overhead and then I would have to use / manage many different connections.
What do you think about this? Is there a better solution?
Sorry for not being specific enough in my original question.
Thanks in advance
Single Sharded Collection
The edited version of the question makes the actual requirement clearer: you have a collection that can potentially grow very large and you want an approach to partition the data. The artificial collection limit is your own planned partitioning scheme.
In that case, I think you would be best off using a single collection and taking advantage of MongoDB's auto-sharding feature to distribute the data and workload to multiple servers as required. Multiple collections is still a valid approach, but unnecessarily complicates your application code & deployment versus leveraging core MongoDB features. Assuming you choose a good shard key, your data will be automatically balanced across your shards.
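To get a feel for how a well-distributed shard key spreads data, here is a small simulation. It is not MongoDB's mechanism: the server partitions the shard key range into chunks and a balancer migrates them between shards, whereas this sketch just hashes a key and takes it modulo the shard count. The `publisher-N` identifiers and the function names are made up for illustration.

```python
import hashlib
from collections import Counter
from typing import Iterable, List


def shard_for(shard_key_value: str, shard_count: int) -> int:
    """Pick a shard by hashing the key (a stand-in for real chunk placement)."""
    digest = hashlib.sha256(shard_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def distribution(keys: Iterable[str], shard_count: int) -> List[int]:
    """Count how many keys land on each shard."""
    counts = Counter(shard_for(key, shard_count) for key in keys)
    return [counts.get(i, 0) for i in range(shard_count)]


publisher_ids = [f"publisher-{i}" for i in range(10_000)]
per_shard = distribution(publisher_ids, 4)
print(per_shard)
```

With a high-cardinality key such as a publisher id, the counts come out close to even, which is the property a good shard key is meant to provide.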
You do not have to shard immediately; you can defer the decision until you see your workload actually requiring more write scale, knowing the option is there when you need it. You have other options before deciding to shard as well, such as upgrading your servers (disks and memory in particular) to better support your workload. Conversely, you don't want to wait until your system is crushed by workload before sharding, so you definitely need to monitor the growth. I would suggest using the free MongoDB Monitoring Service (MMS) provided by 10gen.
On another website someone suggested using many databases instead of many collections. But this means overhead and then I would have to use / manage many different connections.
Multiple databases will add significantly more administrative overhead, and would likely be overkill and possibly detrimental for your use case. Storage is allocated at the database level, so 120 databases would be consuming much more space than a single database with 120 collections.
Fixed number of collections (original answer)
If you can plan for a fixed number of collections (120 as per your original question description), I think it makes more sense to take this approach rather than using a monolithic collection.
NOTE: the design considerations below still apply, but since the question was updated to clarify that multiple collections are an attempted partitioning scheme, sharding a single collection would be a much more straightforward approach.
The motivations for using separate collections would be:
Your documents for a single large collection will likely have to include some indication of the collection subtype, which may need to be added to multiple indexes and could significantly increase index sizes. With separate collections the subtype is already implicit in the collection namespace.
Sharding is enabled at the collection level. A single large collection only gives you an "all or nothing" approach, whereas individual collections allow you to control which subset(s) of data need to be sharded and choose more appropriate shard keys.
You can use the compact command to defragment individual collections. Note: compact is a blocking operation, so the normal recommendation for a HA production environment would be to deploy a replica set and use rolling maintenance (i.e. compact the secondaries first, then step down and compact the primary).
MongoDB 2.4 (and 2.2) currently have database-level write lock granularity. In practice this has not proven a problem for the vast majority of use cases, however multiple collections would allow you to more easily move high activity collections into separate databases if needed.
Further to the previous point: if you have your data in separate collections, these will be able to take advantage of future improvements in collection-level locking (see SERVER-1240 in the MongoDB Jira issue tracker).
The main problem here is that you will gain very little performance in the current MongoDB versions if you split the data into separate collections within the same database. To get any sort of extra performance over a single-collection setup you would need to move the collections out into separate databases, and then you will have the operational overhead of deciding which database to query, etc.
So yes, you could easily go for 120 collections; however, you won't really gain anything currently because https://jira.mongodb.org/browse/SERVER-1240 is not implemented (and will not be anytime soon).
Housing billions of documents in a single collection isn't too bad. I presume that even if you were to house this in separate collections, it probably would not be on a single server either, just as with sharding a single collection, so any speed reduction due to a multi-server setup would apply in either case.
In my personal opinion, using a single collection is easier on everything.

Using a Database vs a Collection in MongoDB

I am building a site with users who have discussions and write blogs and plan to use MongoDB as the database for the site. Which architecture option would be more efficient and allow for easier data flow between them:
One Database with a Blogs Collection, a Discussions Collection, and a User Activity Collection? Each collection would be sharded as appropriate.
A Blogs Database, a Discussions Database, and a User Activity Database? Each database would be broken into collections and sharded as appropriate.
It won't make a big difference whether you put everything into a single database or into multiple databases until you find you need to do something that's handled on the database level, for example access control, or placing database files on separate physical devices (to reduce I/O contention).
In addition, currently locking granularity is on the database level so if you happen to have a very large number of small writes having them go to different databases will mean that they will not be contending for the same lock. Since you anticipate sharding you can also place each database on a different shard which may allow you to defer actually needing to shard any particular collection as each shard would only be handling the traffic for that database's collection(s).
I would say that if you are in doubt, go ahead and put them in separate databases; it's unlikely to hurt and it may help.
Mongo will work, but getting familiar with it may take time depending on your experience.
If you use MySQL (or another SQL database) you may have an easier time. You should probably just create separate tables for your blogs, discussions, and activity, rather than multiple databases.
Another factor to consider is the size of your databases. An SQL database is fine for most applications, even fairly large ones. MongoDB (and other NoSQL databases) are great for scaling big data.
Hope this helps!