Mongodb database schema design issue - mongodb

I have two collections in my MongoDB database: Users and Clubs.
User schema:
var UserSchema = new Schema({
Name: {
type: String ,
required: true
},
Clubs: [
{type: mongoose.Schema.Types.ObjectId, ref: 'Club'}
]});
Now when a user joins a club, I update the Clubs array. But I also frequently need to fetch all the users for a particular club, so I am creating the club schema as:
var ClubSchema = new Schema({
clubName : {
type: String ,
unique: true ,
required: true
},
members : [
{type: mongoose.Schema.Types.ObjectId,
ref: 'User' ,
default: []
} ]});
My question is: is this the right way to do it, or should I keep the club information in the User collection only? Maybe I just need to optimize the query that fetches all the users belonging to a club.

To be honest, it's hard to say what "the right way" is; it depends on your application's queries and architecture.
I have seen people use the design you describe above, solving a many-to-many relationship by storing references in both collections.
It would support both of your queries:
db.users.find({"Clubs": ObjectId("000000")}); // Find all users belonging to a certain club.
db.clubs.find({"members": ObjectId("111111")}); // Find all clubs a user belongs to.
With two indexes:
db.users.createIndex({"Clubs": 1});
db.clubs.createIndex({"members": 1});
This design can reduce overall consistency, though: when you delete a club, for example, you also need to update every affected user document, and while that update is in progress some documents will be out of date.
If you don't run MongoDB on replicas, don't access the database from distributed systems, and don't consider this a big issue, then go with this design for ease of querying.
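As a rough sketch of what the two-sided design costs on the write path, the hypothetical helper `membershipOps` below builds the pair of updates needed when a user joins or leaves a club. It is plain JavaScript with no driver calls. The collection names `users` and `clubs` are an assumption (Mongoose's default pluralization of the model names); the field names `Clubs` and `members` come from the schemas in the question.

```javascript
// Hypothetical helper: describes the two updates needed to keep
// User.Clubs and Club.members in sync. It only builds plain objects;
// sending them to MongoDB (for example with updateOne) is left to the caller.
function membershipOps(userId, clubId, action) {
  // $addToSet avoids duplicate entries; $pull removes every matching entry.
  const operator = action === 'join' ? '$addToSet' : '$pull';
  return [
    { collection: 'users', filter: { _id: userId }, update: { [operator]: { Clubs: clubId } } },
    { collection: 'clubs', filter: { _id: clubId }, update: { [operator]: { members: userId } } },
  ];
}

// Example with placeholder string ids (real code would pass ObjectId values).
const exampleOps = membershipOps('user-1', 'club-9', 'join');
console.log(JSON.stringify(exampleOps, null, 2));
```

The two updates are separate writes, so a reader can observe the state between them. That gap is the consistency window described above.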
If the consistency issue mentioned above is a deal-breaker, then store the club references only in the users collection. This is probably the most common design, and the one described on the official MongoDB site:
http://docs.mongodb.org/manual/tutorial/model-referenced-one-to-many-relationships-between-documents/
I would suggest optimising your query first, before adding complexity to your updates and inserts (with references in both Users and Clubs, every membership change has to update all related documents).
Hope this helps.

Related

Mongodb schema, keeping a count field or counting documents from separate collection

I am facing a dilemma in my MongoDB schema design.
The site I am working on has support for comments. These comments have their own schema.
comment=
{
_id: objectId,
text: "example",
author: authorObjectId,
postId: objectId
}
Comments can be posted in Posts. Posts have their own schema also.
post=
{
_id: objectId,
title: "example",
author: authorObjectId
}
I need to add a commentCount field to the posts when I return them to the frontend.
From my understanding, there are 2 ways to go about this.
1. Define a commentCount field in the post schema, and increment/decrement it with $inc every time I add or remove a comment.
2. Use aggregation (project + find) when querying posts, and count the comments from the comment collection.
Which solution is better for performance? Which one is the best? I don't have many users right now but I would want to take the right approach from the beginning. Because this solution will be used in many other schemas (upvotes, karma, etc.).
Also! I have tried implementing the aggregation and I couldn't come up with a query that would be able to count documents from comments collection and use the result in the project stage.
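The two options could look roughly like the sketch below. It only builds the update and pipeline objects, which would then be passed to the driver's update and aggregate calls. The collection name `comments` and the helper names are assumptions; the pipeline assumes MongoDB 3.4 or later, where both `$lookup` and `$addFields` are available.

```javascript
// Option 1: keep a counter on the post and adjust it with $inc.
// delta is +1 when a comment is added and -1 when one is removed.
function commentCountUpdate(postId, delta) {
  return { filter: { _id: postId }, update: { $inc: { commentCount: delta } } };
}

// Option 2: count at read time by joining each post with its comments.
// Assumes the comments live in a collection named "comments".
function postsWithCommentCountPipeline() {
  return [
    // Attach every comment whose postId matches the post's _id.
    { $lookup: { from: 'comments', localField: '_id', foreignField: 'postId', as: 'comments' } },
    // Store the size of the joined array as commentCount.
    { $addFields: { commentCount: { $size: '$comments' } } },
    // Drop the joined comments so only the count is returned.
    { $project: { comments: 0 } },
  ];
}

console.log(JSON.stringify(commentCountUpdate('post-1', 1)));
console.log(JSON.stringify(postsWithCommentCountPipeline(), null, 2));
```

Option 1 makes reads cheap but adds a second write per comment; option 2 keeps writes simple but joins the comments on every read, so an index on `postId` in the comments collection would matter.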

Moving from relational db to mongodb

I have a question about best practices for storing data in the database. As an example, I have a Site that has a Country assigned.
Table Countries: id|name|alpha2
Table Sites: id|countryId|name
Each Site has a reference to the country ID.
I would like to create a new website using Meteor and its MongoDB and was wondering how I should store the objects. Do I create the collections "countries" and "sites" and use the country _id as a reference? Then resolve the references using transform?
Looking at SimpleSchema I came up with the following:
Schemas.Country = new SimpleSchema ({
name: {
type: String
},
alpha2: {
type: String,
max: 2
}
});
Schemas.Site = new SimpleSchema({
name: {
type: String,
label: "Site Name"
},
country: {
type: Schemas.Country
}
});
Countries = new Meteor.Collection("countries");
Countries.attachSchema(Schemas.Country);
Sites = new Meteor.Collection("sites");
Sites.attachSchema(Schemas.Site);
I was just wondering how this is then stored in the db. I have 2 collections, but the documents in the sites collection also contain embedded country objects. What if a country changes its alpha2 code (very unlikely)?
The same would continue with a collection called "conditions". Each condition will have a Site defined. I could embed the whole Site object in the condition object. What if the site name changes? Would I need to manually change it in all condition objects?
This confuses me a bit. I am very thankful for all your thoughts.
The challenge with Meteor is that it's tightly bound to Mongo, which is not well suited to building OLTP apps that require a normalized DB design. Mongo is good for OLAP-style apps that fall into the WORM (Write Once Read Many) category. I would like to see Meteor support OrientDB as it does Mongo.
There can be two approaches:
1. Normalize the DB as we do in an RDBMS and then retrieve the data with multiple queries. The article "Reactive joins in Meteor" explains this approach well. Joins in Meteor are proposed for a future release. You can also try the Meteor packages publish-composite or publish-with-relations.
2. Keep the data denormalized, at least partially (for a 1-N relation you can embed things in the document; for an N-N relation you may keep a separate collection). For instance, 'Student' can be embedded in 'Class' as a student will never be in more than 1 class, but to relate 'Student' and 'Subject', they can be in different collections (N-N relation: a student will have more than one subject and each subject will be taken by more than one student). For fetching an N-N relation you can again use the approach mentioned in point 1.
I am not able to give you an exact code example, but I hope it helps.
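To make the normalized approach a little more concrete, here is a minimal in-memory sketch in plain JavaScript (no Meteor APIs). The sample documents, the `countryId` field and the helper `resolveCountries` are all made up for illustration; the helper does roughly what a read-time transform would do when it resolves a stored country _id.

```javascript
// Normalized layout: each site stores only the _id of its country.
// All sample values below are invented for this sketch.
const countries = [
  { _id: 'c1', name: 'Germany', alpha2: 'DE' },
  { _id: 'c2', name: 'France', alpha2: 'FR' },
];
const sites = [
  { _id: 's1', name: 'Site A', countryId: 'c1' },
  { _id: 's2', name: 'Site B', countryId: 'c2' },
];

// Hypothetical helper: attaches the referenced country to each site.
// Sites that point at an unknown country get country: null.
function resolveCountries(siteDocs, countryDocs) {
  const byId = new Map(countryDocs.map((country) => [country._id, country]));
  return siteDocs.map((site) => ({ ...site, country: byId.get(site.countryId) || null }));
}

console.log(resolveCountries(sites, countries));
```

Because the country data lives in one place, a changed alpha2 code or name needs a single update, and every site picks it up the next time it is resolved.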

MongoDB One to Many Relationship

I’m starting to learn MongoDB, and at one point I asked myself how to model a “one to many” relationship in MongoDB. While searching, I found many comments in other posts/articles like “you are thinking relational”.
OK, I agree. There will be cases where duplicated information isn’t a problem, for example CLIENTS-ORDERS.
But suppose you have the table ORDERS, with an embedded DETAIL structure holding the PRODUCTS that a client bought.
For one reason or another, you need to change a product name (or some other piece of information) that is already embedded in several orders.
In the end, you are forced to model a one-to-many relationship in MongoDB (that is, storing an ObjectID field as a link to another collection) to solve this simple problem, aren’t you?
But every article/comment I find about this says that it will hurt performance in Mongo. It’s kind of disappointing.
Is there another way to design this without a performance penalty in MongoDB?
One to Many Relations
In this relationship, many entities map to one entity, e.g.:
- a city has many people who live in it. Say NYC has 8 million people.
Let's assume the data model below:
//city
{
_id: 1,
name: 'NYC',
area: 30,
people: [{
_id: 1,
name: 'name',
gender: 'gender'
.....
},
....
8 million people data inside this array
....
]
}
This won't work because that's going to be REALLY HUGE. Let's flip it around.
//people
{
_id: 1,
name: 'John Doe',
gender: gender,
city: {
_id: 1,
name: 'NYC',
area: '30'
.....
}
}
The problem with this design is that many people obviously live in NYC, so we've duplicated the city data many times over.
Probably, the best way to model this data is to use true linking.
//people
{
_id: 1,
name: 'John Doe',
gender: gender,
city: 'NYC'
}
//city
{
_id: 'NYC',
...
}
In this case, the people collection is linked to the city collection. Since there are no foreign key constraints, we have to keep the links consistent ourselves. So this is a one-to-many relation, and it requires 2 collections. For a small one-to-few relation (which is also one-to-many), like blog post to comments, the comments can be embedded inside the post document as an array.
So, if it's truly one-to-many, 2 collections with linking work best. But for one-to-few, a single collection is generally enough.
The problem is that you over-normalize your data. An order is defined by a customer, who lives at a certain place at a given point in time, pays a certain price valid at the time of the order (which might change heavily over the application's lifetime and which you have to document anyway), and several other parameters which are all valid only at a certain point in time. So to document an order (pun intended), you need to persist all data for that point in time. Let me give you an example:
{ _id: "order123456789",
date: ISODate("2014-08-01T16:25:00.141Z"),
customer: ObjectId("53fb38f0040980c9960ee270"),
items:[ ObjectId("53fb3940040980c9960ee271"),
ObjectId("53fb3940040980c9960ee272"),
ObjectId("53fb3940040980c9960ee273")
],
Total:400
}
Now, as long as neither the customer nor the details of the items change, you are able to reproduce where this order was sent, what the prices on the order were, and the like. But what happens if the customer changes their address? Or if the price of an item changes? You would need to keep track of those changes in their respective documents. It would be much easier and sufficiently efficient to store the order like:
{
_id: "order987654321",
date: ISODate("2014-08-01T16:25:00.141Z"),
customer: {
userID: ObjectId("53fb3940040980c9960ee283"),
recipientName: "Foo Bar",
address: {
street: "742 Evergreen Terrace",
city: "Springfield",
state: null
}
},
items: [
{count:1, productId:ObjectId("53fb3940040980c9960ee300"), price: 42.00 },
{count:3, productId:ObjectId("53fb3940040980c9960ee301"), price: 0.99},
{count:5, productId:ObjectId("53fb3940040980c9960ee302"), price: 199.00}
]
}
With this data model and the use of aggregation pipelines, you have several advantages:
- You don't need to independently keep track of prices, addresses, name changes or gift purchases of a customer; they are already documented.
- Using aggregation pipelines, you can derive price trends without storing pricing data separately. You simply store the price of an item at the time of the order in the order document.
- Even complex analyses such as price elasticity, turnover by state/city and the like can be done with fairly simple aggregations.
In general, it is safe to say that in a document-oriented database, every property or field which is subject to change in the future, and whose change would alter the semantic meaning of the document, should be stored inside the document. Everything which is subject to change in the future but doesn't touch the semantic meaning (the user's password in the example) may be linked via a GUID.
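As an illustration of the kind of aggregation meant here, the sketch below builds a pipeline that derives a per-product price history from order documents shaped like the second example (a `date` field plus `items` with `count`, `productId` and `price`). The function name and the monthly grouping are assumptions; the pipeline would be passed to `aggregate` on whichever collection holds the orders.

```javascript
// Builds an aggregation pipeline over order documents that embed
// their items as [{ count, productId, price }].
function priceHistoryPipeline() {
  return [
    // One output document per ordered item.
    { $unwind: '$items' },
    {
      // Group by product and by the month of the order date.
      $group: {
        _id: {
          productId: '$items.productId',
          month: { $dateToString: { format: '%Y-%m', date: '$date' } },
        },
        avgPrice: { $avg: '$items.price' },
        unitsSold: { $sum: '$items.count' },
      },
    },
    // Order the result by product, then chronologically.
    { $sort: { '_id.productId': 1, '_id.month': 1 } },
  ];
}

console.log(JSON.stringify(priceHistoryPipeline(), null, 2));
```

Because each order carries the price that applied when it was placed, no separate price history collection is needed for this report.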

Node, MongoDB, Mongoose Design Choice - Creating two collections or one collection

I'm struggling with a large design choice for my application's Mongo collections and Mongoose schemas.
My application calls for two account types: Students and Teachers.
The only similarity between the two account types is that they both require the fields: firstName, lastName, email, and password. Other than that, they are different (teachers have "assignments", "tests", students have "homework", etc.)
I have pondered my options extensively, and considered the following design choices:
1. Use mongoose-schema-extend, and create an "abstract" schema for all accounts. Then, extend this schema to create the Teacher and Student schemas. This implies two collections, and therefore some redundant fields. There are also issues with logging in and account creation (checking to see if the email used to log in is a student email or teacher email, etc.)
2. Create one collection "accounts", and add a type field to indicate if the account is a "student" or a "teacher". This implies that entries in the "accounts" collection will be dissimilar. This also requires that I have two mongoose schemas for a single collection.
3. Create an "accounts" collection with a "type" field and an "accountId" field, in addition to a "student" collection and a "teacher" collection. The "type" field will indicate which collection the student-specific or teacher-specific fields reside in, and the "accountId" field will indicate exactly which entry the account is matched with.
I appreciate all input, criticism or suggestions.
I've been down a similar road and I eventually landed on a mix of option 1 and 2.
mongoose-schema-extend simply modifies the prototype of Schema with an #extend() method which, when invoked, performs a deep copy of the passed schema. Most helpful. However, you can control which collection mongoose saves to in MongoDB by passing a collection option to the Schema:
var schema = new Schema({
foo: String,
bar: Boolean
}, { collection: "FooBarBaz" });
Remember: Mongoose understands the concept of a Schema but MongoDB does not. This means you can store dissimilar data and use your custom business logic to control the mess. With that said, you can create a base model called User, force mongoose to use the same collection by using the collection option and then extend off this base model to make your Teachers and Students models.
Make sure you add a type flag in the base model as you suggested in option 2. Not only is this convenient for quick lookups, but it will be critical when working directly with raw MongoDB data.
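A minimal sketch of that idea in plain JavaScript, without requiring Mongoose: the objects below are the kind of field definitions and options that could be handed to `new Schema(definition, options)`. The helper `fieldsFor`, the collection name `accounts`, and the `Array` types for the type-specific fields are assumptions; only the field names come from the question.

```javascript
// Fields shared by every account, plus the type flag.
const baseFields = {
  firstName: String,
  lastName: String,
  email: String,
  password: String,
  type: String, // "teacher" or "student"
};

// Type-specific fields named in the question; the Array types are assumptions.
const extraFields = {
  teacher: { assignments: Array, tests: Array },
  student: { homework: Array },
};

// Both schemas would be created with the same collection option,
// so teachers and students end up in one collection.
const schemaOptions = { collection: 'accounts' };

// Hypothetical helper: returns the full field definition for one account type.
function fieldsFor(type) {
  if (!extraFields[type]) throw new Error('Unknown account type: ' + type);
  return { ...baseFields, ...extraFields[type] };
}

console.log(Object.keys(fieldsFor('teacher')), schemaOptions);
```

Queries against the raw collection can then filter on `type` to separate the two kinds of documents.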
@jibsales has an excellent solution.
One more solution to consider is using population with references (http://mongoosejs.com/docs/populate.html) from the Users collection to the Student and Teacher collections. Some benefits are:
- Entries in each of the three collections (Users, Teachers, Students) are similar in storage.
- It allows you to obtain the fields for the "User" independently of the fields for the referenced collection.
This would require that the schema is modified before an instance is created (and a model is created from the schema), where refType is the desired collection:
var userSchema = new Schema({
_id : Number,
name : String,
age : Number,
stories : [{ type: Schema.Types.ObjectId, ref: refType}]
});

MongoDB schema embedding and nested unique keys

I have a relational SQL DB that's being changed to MongoDB. In SQL there are 3 tables that are relevant: Farm, Division, Wombat (names and purpose changed for this question). There's also a Farmer table which is the equivalent of a users table.
Using Mongoose I've come up with this new schema:
var mongoose = require('mongoose');
var farmSchema = new mongoose.Schema({
// reference to the farmer collection's _id key
farmerId: mongoose.Schema.ObjectId,
name: String, // name of farm
division: [{
divisionId: mongoose.Schema.ObjectId,
name: String,
wombats: [{
wombatId: mongoose.Schema.ObjectId,
name: String,
weight: Number
}]
}]
});
Each of the (now) nested collections has a unique field in it. This will allow me to use Ajax to send just the uniqueId and the weight (for example) to adjust that value instead of updating the entire document when only the weight changes.
This feels like an incorrect SQL adaptation for MongoDB. Is there a better way to do this?
In general, I believe that people tend to embed way too much when using MongoDB.
The most important argument is that having different writers to the same objects makes things a lot more complicated. Working with arrays and embedded objects can be tricky and some modifications are impossible, for instance because there's no positional operator matching in nested arrays.
For your particular scenario, take note that unique array keys might not behave as expected, and that behavior might change in future releases.
It's often desirable to opt for a simple SQL-like schema such as
Farm {
_id : ObjectId("...")
}
Division {
_id : ObjectId("..."),
FarmId : ObjectId("..."),
...
}
Wombat {
_id : ObjectId("..."),
DivisionId : ObjectId("..."),
...
}
Whether embedding is the right approach or not very much depends on usage patterns, data size, concurrent writes, etc. - a key difference to SQL is that there is no one right way to model 1:n or n:n relationships, so you'll have to carefully weigh the pros and cons for each scenario. In my experience, having a unique ID is a pretty strong indicator that the document should be a 'first-class citizen' and have its own collection.
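To compare the two layouts for the "update one wombat's weight" case, the sketch below builds the update for each. The helper names are hypothetical and the functions only return plain objects. The nested variant assumes MongoDB 3.6 or later, where the filtered positional operator `$[<identifier>]` and the `arrayFilters` option are available; on older versions the nested update has no direct equivalent.

```javascript
// Flat layout: each wombat is its own document, so its _id is enough.
function flatWeightUpdate(wombatId, weight) {
  return { filter: { _id: wombatId }, update: { $set: { weight } } };
}

// Embedded layout from the question: farm -> division[] -> wombats[].
// Assumes MongoDB 3.6+ for $[<identifier>] and arrayFilters.
function nestedWeightUpdate(farmId, divisionId, wombatId, weight) {
  return {
    filter: { _id: farmId },
    update: { $set: { 'division.$[d].wombats.$[w].weight': weight } },
    options: {
      // d selects the division, w selects the wombat inside it.
      arrayFilters: [{ 'd.divisionId': divisionId }, { 'w.wombatId': wombatId }],
    },
  };
}

console.log(JSON.stringify(flatWeightUpdate('wombat-1', 42)));
console.log(JSON.stringify(nestedWeightUpdate('farm-1', 'division-1', 'wombat-1', 42), null, 2));
```

The flat update needs one id and touches one small document; the nested one needs all three ids and rewrites part of a potentially large farm document.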