Why does MongoDB have collections - mongodb

MongoDB being document-oriented, the structure of collections seems to be a special case of documents. By that I mean one can define a document to contain other documents. So a collection is just a document containing other documents.
So why do we need collections after all?

Logically yes, you could design a database system like that, but practically speaking no.
A collection has indexes on the documents in it.
A collection requires the documents in it to have unique ids.
A document is limited in size (16 MB for a BSON document).

Object ids (the top-level _id attribute of a document) must be unique within a collection. Documents in different collections may have the same _id, just as in an RDBMS, where the key constraint is per table and multiple tables may contain the same key value.
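The per-collection scope of the _id constraint can be illustrated with a small in-memory model. This is only a sketch: ToyCollection and DuplicateKey are made-up names for illustration, not MongoDB or driver APIs, and a plain dict stands in for the unique index on _id.

```python
class DuplicateKey(Exception):
    """Raised when an _id already exists in the same collection (hypothetical name)."""


class ToyCollection:
    """Toy stand-in for a collection: documents keyed by their _id."""

    def __init__(self, name):
        self.name = name
        self._docs = {}  # plays the role of the unique index on _id

    def insert(self, doc):
        key = doc["_id"]
        if key in self._docs:
            raise DuplicateKey(f"_id {key!r} already exists in {self.name}")
        self._docs[key] = doc

    def count(self):
        return len(self._docs)


users = ToyCollection("users")
orders = ToyCollection("orders")

users.insert({"_id": 1, "name": "alice"})
orders.insert({"_id": 1, "item": "book"})  # same _id, different collection: allowed

try:
    users.insert({"_id": 1, "name": "bob"})  # same _id, same collection: rejected
    duplicate_rejected = False
except DuplicateKey:
    duplicate_rejected = True
```

Each collection keeps its own key space, which is why the second insert succeeds and the third does not.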

A collection is a container for documents. Describing it as a document that contains other documents is somewhat misleading, because a document can already have inner (embedded) documents.
A collection is the unit in which you group documents together. Be aware that, because of the schema-free design, you can put anything in a collection, but doing so is not good design. So a collection is a logical container for documents, the same as a table in the relational world.

Related

MongoDB Collections and documents

I get its point about the dynamic schema architecture. But this confuses me: if it's schemaless, why would we need more than one collection (other than the collection size issue)? Couldn't we put all sorts of documents in one collection?

In meteor/mongo collections, is the _id field unique in its collection or in the entire database?

I want to check that, if I use _id fields that refer to documents from different collections, I will never have a duplicate _id, i.e. one used in two different collections inside the same database.
Using meteor (so both in minimongo and mongodb), is the _id field unique in its collection or in the entire database?
The _id values you have in your database are generated by Meteor using Random.id(). These are unique across all collections.
Please note that the uniqueness of _id values in MongoDB is ensured at the collection level, meaning that there is always a unique index on the _id field for every collection. There is no MongoDB mechanism in place that would ensure _id uniqueness across collections.
In any case, it is quite a safe assumption that Meteor's random IDs will never collide.
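To get a rough feel for why collisions are unlikely, here is a back-of-the-envelope estimate using the birthday bound. The ID length (17 characters) and alphabet size (55 characters) below are assumptions made for illustration and are not stated in the answer above; adjust them to whatever your generator actually uses.

```python
import math

# Assumptions (not taken from the answer above): IDs are 17 characters long,
# drawn uniformly at random from a 55-character alphabet.
ALPHABET_SIZE = 55
ID_LENGTH = 17


def collision_probability(num_ids, alphabet_size=ALPHABET_SIZE, id_length=ID_LENGTH):
    """Birthday-bound approximation: p ~= 1 - exp(-n*(n-1) / (2*N))."""
    space = alphabet_size ** id_length  # N: number of possible IDs
    exponent = -num_ids * (num_ids - 1) / (2 * space)
    return -math.expm1(exponent)  # 1 - exp(exponent), accurate for tiny values


# Estimated chance of at least one collision among a billion random IDs.
p_billion = collision_probability(10**9)
```

Under these assumptions the estimate stays vanishingly small even for very large numbers of IDs, but it remains a probabilistic argument, not a uniqueness guarantee.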

How does MongoDB scale when you have relationships between collections?

I have a MongoDB database in which documents are linked (the data cannot be embedded).
Does the mongos cluster (http://docs.mongodb.org/manual/core/sharding-introduction/) support sharding when the documents are linked?
How does this impact performance?
Thanks!
Considering there is nothing special about referenced documents (it is just a logical relationship inferred by the application layer, not by MongoDB itself), sharding is supported. This applies to "manual" references as well as DBRefs. You can even shard on a DBRef property, although I'm not sure why you'd want to, considering a DBRef should have inherently low cardinality.
There is an impact in performance for both manual and DBRefs, in that multiple queries must be performed to "join" the data. From the docs:
To resolve DBRefs, your application must perform additional queries to
return the referenced documents. Many drivers have helper methods that
form the query for the DBRef automatically. The drivers do not
automatically resolve DBRefs into documents.
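The extra round trip can be sketched with plain Python lists standing in for two collections. The data, the author_id field, and the find_one helper are all hypothetical and are not driver APIs; the point is only that the application issues the second lookup itself.

```python
# Two toy "collections": lists of documents (hypothetical data).
authors = [{"_id": "a1", "name": "Ada"}]
posts = [{"_id": "p1", "title": "Sharding notes", "author_id": "a1"}]


def find_one(collection, **criteria):
    """Return the first document matching all criteria, or None."""
    for doc in collection:
        if all(doc.get(key) == value for key, value in criteria.items()):
            return doc
    return None


# Query 1: fetch the post.
post = find_one(posts, _id="p1")
# Query 2: the application follows the reference by hand.
author = find_one(authors, _id=post["author_id"])

# The "join" happens in application code, not in the database.
joined = {**post, "author": author}
```

With a real database each find_one would be a separate network request, which is where the performance cost comes from.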
There is no such thing as "document links" in MongoDB, just fields in documents of collection A which happen to have the same values as fields of documents in collection B. DBRefs are just a convention at the application layer and get no special treatment whatsoever from the database.
What matters for sharding efficiency is how you define the shard key for the referenced collection. When the field you search by is part of the shard key of the collection, mongos can accelerate it by redirecting the query to the correct shard.
You likely want all documents of collection A which belong to the same document of collection B to reside on the same shard. That means the shard key of A should include the field of A that is a unique identifier of B (ObjectId, name or whatever).
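The co-location idea can be sketched with a toy router. This is a deliberately simplified model (a stable hash modulo a fixed number of shards); it does not reproduce how mongos actually splits and places data, and NUM_SHARDS, shard_for and the b_id field are hypothetical names.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(shard_key_value, num_shards=NUM_SHARDS):
    """Map a shard key value to a shard number using a stable hash."""
    digest = hashlib.sha256(str(shard_key_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# Documents of collection A, each referencing a document of B via b_id.
a_docs = [
    {"_id": 1, "b_id": "b-42", "text": "first"},
    {"_id": 2, "b_id": "b-42", "text": "second"},
    {"_id": 3, "b_id": "b-7", "text": "third"},
]

# Sharding A on b_id: all children of the same B document land on one shard.
placement = {doc["_id"]: shard_for(doc["b_id"]) for doc in a_docs}

# A query for one b_id only needs to visit a single shard.
shards_to_query = {shard_for(doc["b_id"]) for doc in a_docs if doc["b_id"] == "b-42"}
```

Because the routing depends only on b_id, a lookup of all A documents for one B document can be sent to one shard instead of being broadcast to all of them.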

MongoDB - Same _id in different collections

I have two collections called Users and ElectedUsers. ElectedUsers is a subset of Users.
The main reason for having two collections is that each collection is used by different services, so I have to maintain both.
When saving a document to ElectedUsers, the application first fetches it from the Users collection, applies some business logic, and saves it to ElectedUsers with the same _id. So for a particular document, the _id field can be the same in both collections.
I want to know: does this violate best practices? Does it badly affect sharding or any other operation?
If you are using _id as the shard key, then having duplicate _id values can be problematic. Otherwise, if you are not using _id as the shard key and maintain some other globally unique value for sharding, there shouldn't be any issue.
See the sharding FAQ:
http://docs.mongodb.org/manual/faq/sharding/

MongoDB - Using email id as identifier across collections

I have a user collection in which email_id and _id are both unique. I want to store user data across various collections, and I would like to use email_id as the identifier in those collections, because it is easier to query those collections in the shell with email_id than with a complex ObjectId.
Is this the right way? Will it cause any performance problems when creating indexes on long email ids?
Also, don't consider this option if you plan to let users change their email_id in the future.
While relational databases encourage you to normalize your data and spread it over many tables, this approach is usually not the best for MongoDB. MongoDB doesn't support JOINs over multiple collections or even multiple documents from the same collection. So you should try to design your database documents in a way that each query can be satisfied by searching for a single document. That means it is usually a good idea to store all information about a user in one document.
An exception to this is when certain data of the user grows without bound (like the posts made by a user in a forum). First, MongoDB documents have a size limit, and second, when the size of a document increases, the database needs to reallocate its hard drive space frequently. This slows down writes and leads to fragmentation in the database. In that case it's better to put each entity in a different collection.
The size of the fields covered by an index doesn't matter when you search for equality. When you have a unique index on email_id, it should be just as fast as searching by _id.