I have a collection in MongoDB named Order. It has many fields, some of which are shown below:
Order
{
"id": "1",
"name": "Hello1",
"orderType": "Type1",
"date": "2016-09-23T15:07:38.000Z"
...... //11 more fields
}
In the example above I have shown only 4 fields, but the actual collection has around 15 fields. An Order can be one of 2 types, Type1 or Type2, indicated by the orderType field.
Now, a Type2 order has all 15 fields but also carries an additional 5-7 fields. Currently I have a single Order collection, and I am wondering whether I should create 2 different collections, one per order type, or keep this one collection and simply add the extra fields to Type2 documents. I have already written most of the logic assuming a single collection. Will it be worth the effort to split it into 2 collections? If I keep a single collection, am I losing anything in terms of performance?
Keeping the data in a single collection is usually better in terms of performance, since you can usually get the required data in a single query.
From the question this looks like a one-to-one relationship, so embedded documents (or simply extra fields on the same document) would be beneficial. Secondly, you say yourself that you have already written most of the logic assuming one collection, so you should go forward with one collection only, unless you have one-to-many relationships.
I would recommend you go through the Data Model Design documentation and understand when each model is better.
An embedded document would be best, since you have a one-to-one relationship. For more info please check here.
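To make the single-collection suggestion concrete, here is a minimal mongo shell sketch; the extra Type2 field shown (shippingNotes) is just a made-up example, not one of your real fields:
// All orders live in one collection; Type2 documents simply carry the extra fields.
db.Order.insertOne({
  name: "Hello1",
  orderType: "Type1",
  date: ISODate("2016-09-23T15:07:38.000Z")
});
db.Order.insertOne({
  name: "Hello2",
  orderType: "Type2",
  date: ISODate("2016-09-24T10:00:00.000Z"),
  shippingNotes: "fragile"   // hypothetical example of a Type2-only field
});
// An index on orderType keeps per-type queries cheap in the shared collection.
db.Order.createIndex({ orderType: 1 });
db.Order.find({ orderType: "Type2" });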
Related
I have 3 collections interconnected through many-to-many relationships, and I have 2 concerns:
Should I have 2 arrays with 2 ids in each of the 3 collections, or one join collection with 3 ids?
How do I perform reads, inserts, updates and deletes so that everything stays in sync and integrity is ensured?
For most scenarios, I'd probably have the referencing IDs on each of the objects, like this:
{
"_id": "123",
"firstReferenceCollectionId": "abc",
"secondReferenceCollectionId": "def"
}
If your application is going to have massive scale, I'd probably further denormalize the data based on how it's actually being used.
To answer your second question, you really shouldn't need to worry about that with the above approach, since the internal id of the reference won't change when other properties on those objects change. If you go the route of additional denormalization, use meteor add matb33:collection-hooks to sync up data on upserts. Here's the documentation link: https://github.com/matb33/meteor-collection-hooks
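For illustration, here is a rough mongo shell sketch of working with that shape; the collection names (posts, authors, tags) are just placeholders for your three collections:
// Document in the hypothetical "posts" collection referencing the other two:
db.posts.insertOne({
  _id: "123",
  firstReferenceCollectionId: "abc",   // e.g. an _id from the "authors" collection
  secondReferenceCollectionId: "def"   // e.g. an _id from the "tags" collection
});
// Reads are extra lookups by _id, which the default _id index makes cheap:
var post = db.posts.findOne({ _id: "123" });
var author = db.authors.findOne({ _id: post.firstReferenceCollectionId });
var tag = db.tags.findOne({ _id: post.secondReferenceCollectionId });
// Because the referenced _id never changes when other properties change,
// updates to authors or tags need no syncing in posts.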
While reading the article http://blog.mongodb.org/post/88473035333/6-rules-of-thumb-for-mongodb-schema-design-part-3, chapter "Rules of Thumb: Your Guide Through the Rainbow", I came across the terms embedding and denormalizing.
One: favor embedding unless there is a compelling reason not to
Five: Consider the write/read ratio when denormalizing. A field that will mostly be read and only seldom updated is a good candidate for denormalization: if you denormalize a field that is updated frequently then the extra work of finding and updating all the instances is likely to overwhelm the savings that you get from denormalizing.
I know embedding means nesting documents instead of creating separate tables/collections.
But I have no clue what denormalizing means.
Denormalization is the opposite of normalization, a good practice for designing relational databases such as MySQL, where you split one logical dataset into separate tables to reduce redundancy.
Because MongoDB does not support joins between collections, you prefer to duplicate an attribute into two collections if the attribute is read often and rarely updated.
E.g. you want to save data about a person, including the person's gender:
In a SQL database you would create a person table and a gender table, store the gender's ID as a foreign key in the person table, and perform a join to get the full information. This way "male" and "female" each exist only once as a value.
In MongoDB you would save the gender in every person document, so the values "male" and "female" can exist many times, but the lookup is faster because the data is kept together in one document of a single collection.
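A small mongo shell sketch of that trade-off (the persons collection and its fields are just an example):
db.persons.insertOne({ name: "Alice", gender: "female" });
db.persons.insertOne({ name: "Bob", gender: "male" });
// The gender value is duplicated in every document, but a read needs no join:
db.persons.find({ gender: "female" });
// The trade-off: changing a duplicated value means touching every copy,
// which is why rarely-updated fields are the best candidates for denormalizing.
db.persons.updateMany({ gender: "female" }, { $set: { gender: "f" } });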
According to the MongoDB documentation, the _id field (if not specified) is automatically assigned a 12-byte ObjectId.
It says a unique index is created on this field when a collection is created, but what I want to know is: how likely is it that two documents in different collections, but in the same database instance, will have the same ID, if that can even happen?
I want my application to be able to retrieve a document using just the _id field, without knowing which collection it is in, but if I cannot guarantee uniqueness based on the way MongoDB generates IDs, I may need to look for a different way of generating them.
The short answer to your question is: yes, that is possible.
The post below, on a similar topic, should help you understand this better:
Possibility of duplicate Mongo ObjectId's being generated in two different collections?
You are not required to use a BSON ObjectId for the _id field. You could use a hash of a timestamp and some random number, or a field with extremely high cardinality (a US SSN, for example), in order to make it close to impossible that two objects in the world will share the same id.
The _id_ index only requires the _id to be unique per collection, much like in an RDBMS, where two rows in two different tables may very well have the same primary key when it is an auto-incremented integer.
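You can see that per-collection scope in the shell (the collection names here are arbitrary):
db.foo.insertOne({ _id: 1, note: "lives in foo" });
db.bar.insertOne({ _id: 1, note: "lives in bar" });   // fine: different collection
db.foo.insertOne({ _id: 1, note: "again in foo" });   // fails with a duplicate key error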
You cannot retrieve a document solely by its _id. Every driver I am aware of requires you to explicitly name the collection.
My 2 cents: the only thing you could do is manually iterate over the existing collections and query each one for the _id you are looking for, which is... inefficient, to put it politely. I would rather distinguish the documents semantically by an additional field than by the collection they belong to. And remember, MongoDB uses dynamic schemas, so there is no reason to separate documents that semantically belong together but have different sets of fields. I would guess there is something seriously wrong with your schema. Please elaborate so that we can help you with that.
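That said, if you really must do the brute-force lookup, a rough shell sketch would look something like this (inefficient, as said above, since it queries every collection in the database):
function findAnywhere(id) {
  var result = null;
  db.getCollectionNames().forEach(function (name) {
    if (result) return;
    var doc = db.getCollection(name).findOne({ _id: id });
    if (doc) result = { collection: name, document: doc };
  });
  return result;
}
findAnywhere(ObjectId("507f1f77bcf86cd799439011"));   // hypothetical id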
I am using MongoDB as my backend. I have data for movies, music, books and more, which I am storing in one single collection. The compulsory fields for every BSON entry are "_id", "name" and "category". The rest of the fields depend upon the category to which the entry belongs.
For example, I have a movie record stored like this:
{
"_id": <some_id>,
"name": <movie_name>,
"category": "movie",
"director": <director_name>,
"actors": <list_of_actors>,
"genre": <list_of_genre>
}
For music, I have:
{
"_id": <some_id>,
"name": <movie_name>,
"category": "music"
"record_label": <label_name>
"length": <length>
"lyrics": <lyrics>
}
Now, I have 12 different categories, for which only _id, name and category are common fields. The rest of the fields are all different for the different categories. Is my decision to store all data in one single collection fine, or should I make a different collection per category?
A single collection is best if you're searching across categories. Having the single collection might slow performance on inserts, but if you don't have a high write need, that shouldn't matter.
MongoDB allows you to store any field structure in a document, even if every document is different, so that isn't a concern. Having those 3 consistent fields means you can use them as part of an index and to handle your queries. This is a good example of where a schemaless database helps, because you can store everything in a single collection.
There is no performance hit for using a single collection in this way. Indeed, there is actually a benefit because you can shard the collection as a scaling strategy later. Sharding is done on a collection level so you could shard based on the _id field to have them evenly distributed, or use your category field to have certain categories per shard, or even a combination.
One thing to be aware of is future query requirements. If you do need to index the other fields, you can use sparse indexes, which mean that documents without the indexed field won't be in the index and so won't take up any space in it; a handy optimisation.
You should also be aware of document growth if your updates make documents bigger, as this can have a major performance impact.
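As a rough illustration of the sparse-index point above, assuming a hypothetical items collection holding all the categories:
// "director" only exists on movie documents, so a sparse index skips the
// music, book, etc. documents entirely and stays small:
db.items.createIndex({ director: 1 }, { sparse: true });
// The common fields can share one compound index used by every category:
db.items.createIndex({ category: 1, name: 1 });
db.items.find({ category: "movie", director: "Some Director" });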
OK, so the more I develop in MongoDB, the more I wonder about the need for multiple collections versus having one large collection with indexes (since columns and fields can be different for each document, unlike tabular data). If I am trying to develop in the most efficient way possible (meaning less code and reusable code), can I use one collection for all documents and just index on a field? By having all documents in one collection with indexes, I can reuse all my form-processing code and other code, since it will all be inserting into the same collection.
For Example:
Let's say I am developing a contact manager and I have two types of contacts: "individuals" and "businesses". My original thought was to create a collection called individuals and a second collection called businesses, but that was because I'm used to developing in SQL, where this would indeed be appropriate since the columns would be different for each table. The more I thought about the flexibility of document databases, the more I started to ask, "do I really need two collections for this?" If I just add a field to each document called "contact type" and index on that, do I really need two collections? Since the fields/columns in each document do not have to be the same for all (as in SQL), each document can have its own fields as long as I have a "document type" field and an index on that field.
So then I took that concept further and started to think: if I only need one collection for "individuals" and "businesses", do I even need a separate collection for "Users" or "Contact History" or any other data? In theory, couldn't I build the entire solution in one collection and just have a field in each document that specifies the "type" ("Users", "Individual Contacts", "Business Contacts", "Contact History", etc.) and index on it, and if a document is related to another document, I could index on the "parent key/foreign" ID field...
This would allow me to code the front end dynamically, since the form-processing code would all be the same (inserting into the same collection). This would save a lot of coding, but I want to make sure that, by using indexes and secondary indexes, the database would still run fast and not cause future problems as the collection grew. As you can imagine, if everything were in one collection there might be hundreds of thousands or even millions of documents in it as the user base grows, but it would have indexes and secondary indexes to optimize performance.
My question is: is this a common method MongoDB developers use? Why or why not? What are the downsides, if any? If this is a commonly used method, please also give any positives to using it. Thank you.
This is a really big point in Mongo, and the answer is a little more of an art than a science. Having one collection full of gigantic documents is definitely an anti-pattern because it works against many of Mongo's features.
For instance, when retrieving documents, you can only retrieve a whole document out of a collection (not entirely true, but mostly). So if you have huge documents, you're retrieving huge documents each time. Also, having huge documents makes sharding less flexible since only the top level documents are indexed (and hence, sharded) in each collection. You can index values deep into a document, but the index value is associated with the top level document.
At the same time, going purely relational is also an anti-pattern because you've lost a lot of the referential integrity by going to Mongo in the first place. Also, all joins are done in application memory, so each one requires a full round-trip (slow).
So the answer is to do something in between. I'm thinking you'll probably want a collection for individuals and a different collection for businesses in this case. I say this because it seems like businesses have enough metadata associated with them that they could bulk up a lot. (Also, the individual-business relationship seems like a many-to-many.) However, an individual might have a Name object (with first and last properties); it would be a bad idea to make Name a separate collection.
Some info from 10gen about schema design: http://www.mongodb.org/display/DOCS/Schema+Design
EDIT
Also, Mongo has limited support for transactions, in the form of atomic aggregates. When you insert an object into Mongo, the entire object is either inserted or not inserted. So if your application domain requires consistency between certain objects, you probably want to keep them in the same document/collection.
For example, consider an application that requires that a User always has a Name object (containing FirstName, LastName, and MiddleInitial). If a User was somehow inserted with no corresponding Name, the data would be considered to be corrupted. In an RDBMS you would wrap a transaction around the operations to insert User and Name. In Mongo, we make sure Name is in the same document (aggregate) as User to achieve the same effect.
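A minimal shell sketch of that aggregate, with Name embedded (the extra email field is just a made-up example):
// Because Name is embedded, the insert is atomic: there is no way to end up
// with a User document that is missing its Name.
db.users.insertOne({
  name: { firstName: "Jane", lastName: "Doe", middleInitial: "Q" },
  email: "jane@example.com"   // hypothetical extra field
});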
Your example is a little less clear, since I don't understand the business cases. One thing that does come to mind is that Mongo has excellent support for inheritance. It might make sense to put all users, individuals, and potentially businesses into the same collection (depending on how the application is modeled). If one individual has many contacts, you probably want individuals to have an array of IDs. If your application requires that you get a quick preview of contacts, you might consider duplicating part of an individual and storing an array of contact objects.
If you're used to RDBMS thinking, you probably think all your data always has to be consistent. The truth is, that's probably not entirely true. This concept of applying atomic aggregates to the domain has been preached heavily by the DDD community recently. When you look at your domain in depth, like your business users do, the consistency boundaries should become distinct.
MongoDB, and NoSQL in general, is about denormalising data and reducing joins. It goes against normal SQL thinking.
In your case, I don't see any reason why you would want to have separate collections, because that introduces unnecessary complexity and performance overhead. Consider, for example, a screen that displays all contacts in alphabetical order. If you have one single collection for contacts, then it's really easy, but if you have two collections it becomes a more complicated proposition.
Where I would have multiple collections is if your application had multiple users storing contacts; I would then have one collection per user. This makes it very easy to extract that user's contacts.
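To illustrate the "all contacts, alphabetical" point, here is a minimal shell sketch with a single contacts collection (collection and field names are just examples):
db.contacts.insertOne({ contactType: "individual", name: "Jane Doe" });
db.contacts.insertOne({ contactType: "business", name: "Acme Ltd" });
// "All contacts, alphabetical" is a single query because everything is together:
db.contacts.createIndex({ name: 1 });
db.contacts.find().sort({ name: 1 });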