MongoDB schema design: reference schema

I have an application which basically has 3 objects:
User, Company, Address, where a User can belong to a Company, a Company object can have a number of Addresses, and an Address object can have one to many users that belong to it.
From a MongoDB/NoSQL design point of view, what would be more efficient, so that the application can list all users belonging to a company and also to an address?
Currently, my User document is like:
> db.users.find().pretty()
{
"__v" : 1,
"_id" : ObjectId("532b039c17fc6100001a8737"),
"active" : true,
"email" : "norman#khine.net",
"groups" : "member",
"lockUntil" : 0,
"loginAttempts" : 0,
"name" : "Norman Khine",
"password" : "$2a$10$EqptX.RRmsk0.FgFRJOpYe9swH0y.lBrgUsg/IatxErjYPm9bT4yq",
"provider" : [
"github",
"local"
],
"surname" : "",
"tokenExpires" : 1395418101786,
"tokenString" : "xX-gA29ep27Yv_Cg3OHxLLSfURfVYnAloEncWsJeOf3Er8HvoVWaSvCSSddRFHQY"
}
Will it be better to have:
"company": ["_id","_id"...] # this way a user can belong to one or many companies
"address": ["_id", "_id"...] # this way a user can belong to one or many addresses
Or should I put each user_id within the "company" and "address" schemas?
Any advice is much appreciated.

Well, it simply depends on what your application normally loads. If your application is very user-centric, you reference (*) the companies and addresses within the user document. If you normally show company information, reference the users in the company. And if you have an address-centric application... well, I think you know what I mean ;)
(*) References in NoSQL databases CAN hurt performance. This is the reason for something called "pre-joining": simply store together the data you usually need at the same time.
Simple example: if you show the user information in your application, you most likely also want the name of the company they work for on the screen. If you only reference the company, you need an additional lookup just to get the company's name. The idea is to store the reference to the company together with the most frequently needed company data in the user's document. Or, if companies only exist in relation to users and company information will never be maintained on its own, simply put the whole company data into the user's document. MongoDB can then load the document in a single read, with no additional lookups.
If you have too many references in your documents, it is most likely a sign that a redesign of your document structure is overdue. :)
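A minimal sketch of the user-centric layout described above, using plain Python dicts as stand-ins for a collection so it runs without a server. The field names `company_ids`, `address_ids` and `primary_company`, the second user, and the company names are assumptions made up for illustration, not part of the original schema. Against a real MongoDB collection, a filter document such as `{"company_ids": "c1"}` matches documents whose array contains that value, which is what `users_in` imitates here.

```python
# In-memory stand-in for the "users" collection (hypothetical field names).
users = [
    {"_id": "u1", "name": "Norman Khine",
     "company_ids": ["c1"], "address_ids": ["a1", "a2"],
     # "Pre-joined" copy of the company fields the user screen needs.
     "primary_company": {"_id": "c1", "name": "Acme"}},
    {"_id": "u2", "name": "Jane Roe",
     "company_ids": ["c1", "c2"], "address_ids": ["a2"],
     "primary_company": {"_id": "c2", "name": "Globex"}},
]

def users_in(field, ref_id):
    """Names of users whose reference array `field` contains `ref_id`."""
    return [u["name"] for u in users if ref_id in u[field]]

by_company = users_in("company_ids", "c1")   # all users of company c1
by_address = users_in("address_ids", "a1")   # all users at address a1

# The company name is available without a second lookup.
company_name = users[0]["primary_company"]["name"]
```

The trade-off is that the copied company name has to be rewritten in every user document if the company is ever renamed.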

Related

MongoDB Embedding alongside referencing

There is a lot of content about which kinds of relationships to use in a database schema. However, I have not seen anything about mixing both techniques.
The idea is to embed only the necessary attributes, along with a reference. This way the application has the data it needs for rendering and the reference for the update methods.
The problem I see here is that the logic for handling CRUD operations becomes trickier, because it is mandatory to update multiple collections; on the other hand, I get all the information in a single read.
Basic schema for a page that only wants the students names of a classroom:
CLASSROOM COLLECTION
{"_id": ObjectID(),
"students": [{"studentId" : ObjectID(),
"name" : "John Doe",
},
...
]
}
STUDENTS COLLECTION
{"_id": ObjectID(),
"name" : "John Doe",
"address" : "...",
"age" : "...",
"gender": "..."
}
I use the students collection on a different page, and there I do not want any information about the classroom. That is the reason not to embed the students.
I started learning MongoDB a few days ago and I don't know whether this kind of schema causes problems.
You can embed some fields and store other fields in a different collection as you are suggesting.
The issues with such an arrangement in my opinion would be:
What is the authority for a field? For example, what if a field like name is both embedded and stored in the separate collection, and the values differ?
Both updating and querying become awkward as you need to do it differently depending on which field is being worked with. If you make a mistake and go in the wrong place, you create/compound the first issue.
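The two issues above can be made concrete with a small sketch. It uses plain Python dicts in place of the two collections, and it assumes (this is a design choice, not something stated in the question) that the students collection is the authority for `name` and that the classroom copies are refreshed in the same operation. The helper name `rename_student` is hypothetical.

```python
# In-memory stand-ins for the two collections from the question.
students = {"s1": {"_id": "s1", "name": "John Doe", "age": 12}}
classrooms = {
    "c1": {"_id": "c1",
           "students": [{"studentId": "s1", "name": "John Doe"}]},
}

def rename_student(student_id, new_name):
    """Update the authoritative record, then every embedded copy.

    Returns the number of embedded copies that were rewritten.
    """
    students[student_id]["name"] = new_name
    touched = 0
    for room in classrooms.values():
        for entry in room["students"]:
            if entry["studentId"] == student_id:
                entry["name"] = new_name
                touched += 1
    return touched

touched = rename_student("s1", "John Q. Doe")
```

If the second step is skipped or fails, the two copies of `name` disagree, which is exactly the "which one is the authority?" problem.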

NoSQL db schema design

I'm trying to find a way to create the db schema. Most operations to the database will be Read.
Say I'm selling books on the app so the schema might look like this
[
{ title : "Adventures of Huckleberry Finn",
author : ["Mark Twain", "Thomas Becker", "Colin Barling"],
pageCount : 366,
genre: ["satire"],
release: "1884"
},
{ title : "The Great Gatsby",
author : ["F.Scott Fitzgerald"],
pageCount : 443,
genre: ["Novel", "Historical drama"],
release: "1924"
},
{ title : "This Side of Paradise",
author : ["F.Scott Fitzgerald"],
pageCount : 233,
genre: ["Novel"],
release: "1920"
}
]
So most operations would be something like
1) Grab all books by "F.Scott Fitzgerald"
2) Grab books under genre "Novel"
3) Grab all books with page count less than 400
4) Grab books with page count more than 100 no later than 1930
Should I create separate collections just for authors and genres and then reference them, as in a relational database, or embed them like above? It seems that if I embed them, I have to type in the author name manually each time I store data, so I could misspell F.Scott Fitzgerald in one document and it would then be missing from the results.
First of all, I would say that's a nice DB choice.
As far as MongoDB is concerned, the schema should be defined so that it best serves your access patterns. While designing the schema we must also keep in mind that MongoDB doesn't support joins and transactions the way SQL does. Considering these and other factors, I would say your choice of schema is the best one, as it serves your access patterns. Usually whenever we pull a book's details, we need all the information: author, pages, genre, year, price, etc. It is just like object-oriented programming, where a class should hold all of its own properties and any properties that don't belong to it should be kept in another class.
Moving authors into a separate collection just adds an extra collection, and you then need to take care of joins and transactions in your own code. As for your concern about manually typing the author name, I don't quite follow. Let's say a user wants to see books by author "xyz": they click on the author name "xyz" (like a tag) and you run a query that returns all books having the selected name as one of the authors. If the user types the author name manually, it is still just a matter of finding documents by the entered string. I don't see anything manual here.
One more thing: a price key should also be added to every document.
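To illustrate that the embedded layout serves all four access patterns from the question, here is a rough in-memory sketch in Python. The list of dicts stands in for the collection, and `release` is stored as an integer here (an assumption; the question stores it as a string) so that the year comparison works numerically. The comments show the MongoDB filter document each predicate corresponds to; `$lt`, `$gt` and `$lte` are standard query operators.

```python
books = [
    {"title": "Adventures of Huckleberry Finn",
     "author": ["Mark Twain", "Thomas Becker", "Colin Barling"],
     "pageCount": 366, "genre": ["satire"], "release": 1884},
    {"title": "The Great Gatsby",
     "author": ["F.Scott Fitzgerald"],
     "pageCount": 443, "genre": ["Novel", "Historical drama"], "release": 1924},
    {"title": "This Side of Paradise",
     "author": ["F.Scott Fitzgerald"],
     "pageCount": 233, "genre": ["Novel"], "release": 1920},
]

def titles(predicate):
    """Titles of all books for which `predicate` returns True."""
    return [b["title"] for b in books if predicate(b)]

# 1) filter: {"author": "F.Scott Fitzgerald"}
by_author = titles(lambda b: "F.Scott Fitzgerald" in b["author"])
# 2) filter: {"genre": "Novel"}
by_genre = titles(lambda b: "Novel" in b["genre"])
# 3) filter: {"pageCount": {"$lt": 400}}
under_400 = titles(lambda b: b["pageCount"] < 400)
# 4) filter: {"pageCount": {"$gt": 100}, "release": {"$lte": 1930}}
over_100_until_1930 = titles(
    lambda b: b["pageCount"] > 100 and b["release"] <= 1930)
```

None of the four needs a second collection; indexes on `author`, `genre` and `pageCount` would be the usual next step for a read-heavy workload.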

MongoDB N:M relation

I am trying to implement a kind of Access Control List for documents stored in MongoDB. I have users and items, and users have different rights on the items, e.g. read, write, delete... How do I best store this n:m relation in MongoDB?
I came up with the following ideas:
1) An ACL document : {itemID:"1", userID:"2", right:"write"}
2) An ACL per item : {itemID:"1", users: {"2" : "write", "3" : "read"}}
3) Embedded ACL in the user : {_id:"1",..., users: {"2" : "write", "3" : "read"}}
4) Embedded ACL in the item : {_id:"1",..., users: {"2" : "write", "3" : "read"}}
Approaches 1 and 2 clearly have the disadvantage that I have to query MongoDB twice and do the join from ACL to item myself. The same holds true in most cases for approach 3, so to me approach 4 seems the best, but I would be happy to hear some opinions on that!
Cheers,
Klaus
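The fourth idea in the question above (ACL embedded in the item) could be checked roughly like this. It is a sketch with a plain Python dict standing in for the items collection; the `title` field, the `RANK` ordering (that "delete" implies "write" implies "read") and the helper name `can` are all assumptions added for illustration and are not stated in the question.

```python
# In-memory stand-in for the items collection, ACL embedded per item.
items = {
    "1": {"_id": "1", "title": "Report",
          "users": {"2": "write", "3": "read"}},
}

# Assumed ordering of rights: a higher rank includes the lower ones.
RANK = {"read": 1, "write": 2, "delete": 3}

def can(user_id, item_id, right):
    """True if `user_id` holds at least `right` on `item_id`."""
    item = items.get(item_id)
    if item is None:
        return False
    granted = item["users"].get(user_id)
    return granted is not None and RANK[granted] >= RANK[right]

writer_can_read = can("2", "1", "read")
reader_can_write = can("3", "1", "write")
stranger_can_read = can("9", "1", "read")
```

Because the ACL travels with the item, one read fetches both the item and the rights needed to decide whether to show it.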

MongoDB: How are stored objects affected by model changes?

I am considering moving to MongoDB but I lack some basic understanding of it. My main question is "How are stored objects affected by model changes?". Here is a scenario to illustrate what I want to know:
1) I create a "User" model with first_name, last_name, email attributes.
2) I create 25 users in my application that are stored in MongoDB (so they get stored as {first_name: "xxx", last_name: "yyy", email: "zzz"})
3) I add an attribute to my "User" model: username
4) I create 25 new users in my application (so they get stored as {first_name: "xxx", last_name: "yyy", email: "zzz", username: "xyz"})
5) I remove the "first_name" and "last_name" attributes from the "User" model.
6) I update the email address of 5 of the first 25 users.
So here are my questions:
1) After adding the "username" attribute to the "User" model, what happens to the first 25 objects? Do they receive the "username" attribute in their BSON definition with an empty value? My understanding is that they are simply left unaffected.
2) When I remove the "first_name" and "last_name" attributes from the "User" model, what happens to the existing 50 users? I guess the same answer as #1 applies.
3) After I update the email addresses in step 6, what happens to those 5 records? Do they get the "username" added, "first_name" and "last_name" removed and their email addresses updated? Or are only their email addresses updated?
Your intuition is correct. MongoDB requires you to create and enforce the data model in your application (i.e., outside the database). I think this is one of the biggest mental hurdles to get over when making the switch from SQL databases.
So to answer your questions:
1) The original 25 User objects will not automatically receive a "username" attribute. You will either need to manually update the existing users to add a username or update the model to handle the case where no username exists.
2) Same as above. You will either need to manually update the existing records to remove the first_name and last_name attributes or wait until the object is updated to a future version that doesn't include them.
3) It depends on how you do the update. You can either update by replacing the entire record or by using modifiers to change individual fields. If you replace the entire record, then the current version of the model will be saved. If you modify the "email" attribute directly, then the other fields will not be changed.
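The difference between the two kinds of update can be sketched in plain Python without a database. The two helpers below only imitate the semantics: `set_fields` behaves like an update with the `$set` modifier, and `replace_doc` behaves like a whole-document replacement. Both helper names and the sample email value are hypothetical, chosen for illustration.

```python
# One of the first 25 users, stored before "username" existed.
old_user = {"_id": 1, "first_name": "xxx", "last_name": "yyy", "email": "zzz"}

def set_fields(doc, changes):
    """Imitate a field-level update: only the named fields change."""
    updated = dict(doc)
    updated.update(changes)
    return updated

def replace_doc(doc, new_body):
    """Imitate a full replacement: keep _id, discard every other old field."""
    return {"_id": doc["_id"], **new_body}

# Modifier-style update: old fields survive, no username appears.
after_set = set_fields(old_user, {"email": "new@example.com"})

# Replacement with the current model (email + username only).
after_replace = replace_doc(
    old_user, {"email": "new@example.com", "username": None})
```

So whether first_name and last_name disappear depends on which style the application or its ODM uses when it saves, not on anything MongoDB does by itself.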

How to enforce foreign keys in NoSql databases (MongoDB)?

Let's say I have a collection of documents such as:
{ "_id" : 0 , "owner":0, "name":"Doc1"},{ "_id" : 1 , "owner":1, "name":"Doc1"}, etc
And, on the other hand the owners are represented as a separate collection:
{ "_id" : 0 , "username":"John"}, { "_id" : 1 , "username":"Sam"}
How can I make sure that, when I insert a document, it references the user in a correct way? In an old-school RDBMS this could easily be done using a foreign key.
I know that I can check the correctness of the insertion in my business code, BUT what if an attacker tampers with my request to the server and puts "owner" : 100, and Mongo doesn't throw any exception back?
I would like to know how this situation should be handled in a real-world application.
Thank you in advance!
MongoDB doesn't have foreign keys (as you have presumably noticed). Fundamentally the answer is therefore, "Don't let users tamper with the requests. Only let the application insert data that follows your referential integrity rules."
MongoDB is great in lots of ways... but if you find that you need foreign keys, then it's probably not the correct solution to your problem.
To answer your specific question - while MongoDB encourages handling foreign-key relationships on the client side, they also provide the idea of "Database References" - See this help page.
That said, I don't recommend using a DBRef. Either let your client code manage the associations or (better yet) link the documents together from the start. You may want to consider embedding the owner's "documents" inside the owner object itself. Assemble your documents to match your usage patterns and MongoDB will shine.
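One way to read "only let the application insert data that follows your referential integrity rules" is sketched below, with Python dicts standing in for the two collections. The key assumption is that the owner id comes from the authenticated server-side session rather than from the request body; the helper name `insert_document` and the `session_user_id` parameter are hypothetical.

```python
# In-memory stand-ins for the owners and documents collections.
owners = {0: {"_id": 0, "username": "John"}, 1: {"_id": 1, "username": "Sam"}}
documents = []

def insert_document(doc, session_user_id):
    """Insert `doc` with the owner taken from the session, not the request."""
    if session_user_id not in owners:
        raise ValueError("unknown owner")
    # Any "owner" value sent by the client is overwritten here.
    stored = {**doc, "owner": session_user_id}
    documents.append(stored)
    return stored

# A tampered request claims owner 100, but the session belongs to user 0.
stored = insert_document({"_id": 5, "name": "Doc1", "owner": 100}, 0)
```

With this shape a tampered `"owner": 100` never reaches the database, and a session for a non-existent user is rejected before the insert.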
This is a one-to-one relationship. It's better to embed one document in another instead of maintaining separate collections. Check here on how to model them in MongoDB and their advantages.
Although it's not explicitly mentioned in the docs, embedding gives you the same effect as foreign key constraints. To make this idea clear, suppose you have two collections like this:
C1:
{ "_id" : 0 , "owner":0, "name":"Doc1"},{ "_id" : 1 , "owner":1, "name":"Doc1"}, etc
C2:
{ "_id" : 0 , "username":"John"}, { "_id" : 1 , "username":"Sam"}
And if you were to declare foreign key constraint on C2._id to reference C1._id (assuming MongoDB allows it), it would mean that you cannot insert a document into C2 where C2._id is non-existent in C1. Compare this with an embedded document:
{
"_id" : 0 ,
"owner" : 0,
"name" : "Doc1",
"owner_details" : {
"username" : "John"
}
}
Now the owner_details field represents the data from the C2 collection, and the remaining fields represent the data from C1. You can't add an owner_details field to a non-existent document. You're essentially achieving the same effect.
This question was originally answered in 2011, so I decided to post an update here.
Since version 4.0 (released in June 2018), MongoDB has supported multi-document ACID transactions.
Relations can now be modeled using two approaches:
Embedded
Referenced (NEW!)
You can model a referenced relationship like so:
{
"_id":ObjectId("52ffc33cd85242f436000001"),
"contact": "987654321",
"dob": "01-01-1991",
"name": "Tom Benzamin",
"address_ids": [
ObjectId("52ffc4a5d85242602e000000")
]
}
where a sample address document looks like this:
{
"_id":ObjectId("52ffc4a5d85242602e000000"),
"building": "22 A, Indiana Apt",
"pincode": 123456,
"city": "Los Angeles",
"state": "California"
}
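Reading a referenced relationship like the one above takes a second lookup, and nothing in the database stops `address_ids` from pointing at an address that no longer exists. The sketch below uses shortened string ids and in-memory dicts in place of the two collections; the dangling id `"a9"` and the helper name `resolve_addresses` are made up for illustration. Against a real collection the second lookup would typically be a filter on `_id` using the `$in` operator.

```python
# In-memory stand-ins; ids are shortened strings instead of ObjectIds.
people = [{"_id": "p1", "name": "Tom Benzamin", "address_ids": ["a1", "a9"]}]
addresses = {
    "a1": {"_id": "a1", "building": "22 A, Indiana Apt",
           "city": "Los Angeles", "state": "California"},
}

def resolve_addresses(person):
    """Return (found address documents, ids with no matching address)."""
    found = [addresses[i] for i in person["address_ids"] if i in addresses]
    missing = [i for i in person["address_ids"] if i not in addresses]
    return found, missing

found, missing = resolve_addresses(people[0])
```

Reporting the missing ids separately makes dangling references visible to the application instead of silently dropping them.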
If someone really wants to enforce foreign keys in a project or web app, then they should go with a mixed approach, i.e. SQL + NoSQL.
I would store bulky data that doesn't have many references in a NoSQL store, for example hotels or places data.
But for more critical things like OAuth module tables, TokenStore, UserDetails and UserRole (mapping table), etc., you can go with SQL.
I would also recommend that if usernames are unique, you use them as the _id. You will save on an index. In the document being stored, set the value of 'owner' in the application to the value of 'username' when the document is created, and never let any other piece of code update it.
If there are requirements to change the owner, then provide appropriate APIs with the business rules implemented.
There wouldn't be any need for foreign keys.