I am considering moving to MongoDB, but I lack some basic understanding of it. My main question is: "How are stored objects affected by model changes?" Here is a scenario to illustrate what I want to know:
I create a "User" model with first_name, last_name, email attributes.
I create 25 users in my application that are stored in mongo (so they get stored as {first_name: "xxx", last_name: "yyy", email: "zzz"})
I add an attribute to my "User" model: username
I create 25 new users in my application (so they get stored as {first_name: "xxx", last_name: "yyy", email: "zzz", username: "xyz"})
I remove the "first_name" and "last_name" attributes from the "User" model.
I update the email address of 5 of the first 25 users.
So here are my questions:
1. After adding the "username" attribute to the "User" model, what happens to the first 25 objects? Do they receive the "username" attribute in their BSON document with an empty value? My understanding is that they are simply left unaffected.
2. When I remove the "first_name" and "last_name" attributes from the "User" model, what happens to the existing 50 users? I guess the same answer as #1 applies.
3. After I update the email addresses of those 5 records, what happens to them? Do they get "username" added, "first_name" and "last_name" removed, and their email addresses updated? Or are only their email addresses updated?
Your intuition is correct. MongoDB does not enforce a schema by default, so you create and enforce the data model in your application (i.e., outside the database). I think this is one of the biggest mental hurdles to get over when switching from SQL databases.
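As a minimal sketch of what enforcing the model in your application can mean, the snippet below fills in defaults for fields that older documents lack whenever a user is read. The name `normalizeUser` and the choice of `null` as the default are assumptions for illustration; this is plain JavaScript with no driver involved.

```javascript
// Hypothetical application-level "model": MongoDB stores whatever you give it,
// so defaults for fields added later have to be supplied by your own code.
const USER_DEFAULTS = { username: null };

// Fill in any missing fields on a document read from the users collection.
// Documents created before `username` existed simply lack the key.
function normalizeUser(doc) {
  return { ...USER_DEFAULTS, ...doc };
}

// One of the first 25 users, stored before `username` was added to the model.
const oldUser = { first_name: "xxx", last_name: "yyy", email: "zzz" };
// One of the 25 users created afterwards.
const newUser = { first_name: "xxx", last_name: "yyy", email: "zzz", username: "xyz" };

console.log(normalizeUser(oldUser).username); // null
console.log(normalizeUser(newUser).username); // xyz
```

The stored documents are not touched; only the in-memory view gets the missing field.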
So, to answer your questions:
1. The original 25 User objects will not automatically receive a "username" attribute. You will need either to update the existing users manually to add a username, or to update the model to handle the case where no username exists.
2. Same as above. You will need either to update the existing records manually to remove the first_name and last_name attributes, or to wait until each object is rewritten with a future version of the model that doesn't include them.
3. It depends on how you do the update. You can either replace the entire record or use modifiers (such as $set) to change individual fields. If you replace the entire record, the current version of the model will be saved. If you modify only the "email" attribute, the other fields will not be changed.
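To make these answers concrete, the sketch below imitates in memory what the two kinds of update do to one of the first 25 users, and shows the update documents a one-off migration would use. In a real application these correspond to `updateOne(filter, { $set: ... })`, `replaceOne(filter, doc)` and `updateMany(filter, { $unset: ... })`; the helpers `applySet`, `replaceDoc` and `applyUnset` are hypothetical and only handle top-level fields.

```javascript
// One of the first 25 users, stored under the original model.
const original = { _id: 1, first_name: "xxx", last_name: "yyy", email: "zzz" };

// Modifier-style update ($set): only the named fields change.
function applySet(doc, fields) {
  return { ...doc, ...fields };
}

// Replacement (replaceOne, or saving the whole model object): the stored
// document becomes exactly what the current model serialises, apart from _id.
function replaceDoc(doc, replacement) {
  return { _id: doc._id, ...replacement };
}

// $unset: remove the named fields (the values in the spec are ignored).
function applyUnset(doc, unsetSpec) {
  const copy = { ...doc };
  for (const field of Object.keys(unsetSpec)) delete copy[field];
  return copy;
}

// Question 3: updating the email with $set leaves the old fields in place...
const viaSet = applySet(original, { email: "new@example.com" });
// ...while replacing the record writes the current model's shape.
const viaReplace = replaceDoc(original, { email: "new@example.com", username: "xyz" });

// Questions 1 and 2 as one-off migrations (filter/update pairs for updateMany).
const addUsername = { filter: { username: { $exists: false } }, update: { $set: { username: null } } };
const dropNames = { filter: {}, update: { $unset: { first_name: "", last_name: "" } } };

console.log(JSON.stringify(viaSet));
// {"_id":1,"first_name":"xxx","last_name":"yyy","email":"new@example.com"}
console.log(JSON.stringify(viaReplace));
// {"_id":1,"email":"new@example.com","username":"xyz"}
console.log(JSON.stringify(applyUnset(viaSet, dropNames.update.$unset)));
// {"_id":1,"email":"new@example.com"}
```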
There is a lot of material about which kind of relationship (embedding or referencing) to use in a database schema. However, I have not seen anything about mixing both techniques.
The idea is to embed only the necessary attributes, together with a reference. This way, the application has the data it needs for rendering, and the reference for the update methods.
The problem I see here is that the logic for handling CRUD operations becomes trickier, because I must update multiple collections; on the other hand, I get all the information in one single read.
Basic schema for a page that only needs the names of the students in a classroom:
CLASSROOM COLLECTION
{"_id": ObjectId(),
"students": [{"studentId" : ObjectId(),
"name" : "John Doe"
},
...
]
}
STUDENTS COLLECTION
{"_id": ObjectId(),
"name" : "John Doe",
"address" : "...",
"age" : "...",
"gender": "..."
}
I use the students collection on a different page, and there I do not want any information about the classroom. That is the reason I do not embed the full student documents.
I started learning MongoDB a few days ago, and I don't know whether this kind of schema brings any problems.
You can embed some fields and store other fields in a different collection as you are suggesting.
The issues with such an arrangement in my opinion would be:
What is the authority for a field? For example, what if a field like name is both embedded and stored in the separate collection, and the values differ?
Both updating and querying become awkward, as you need to do them differently depending on which field you are working with. If you make a mistake and read or write in the wrong place, you create or compound the first issue.
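To illustrate the second point: renaming a student under this schema touches two collections. The sketch below only builds the operations (filter, update and options objects in the shape accepted by `updateOne`/`updateMany`); `renameStudentOps` is a hypothetical helper, the `$[s]` filtered positional operator with `arrayFilters` requires MongoDB 3.6 or later, and nothing here makes the two writes atomic unless you run them inside a multi-document transaction.

```javascript
// Hypothetical helper: everything needed to rename one student consistently.
// The STUDENTS collection is treated as the authority for `name`; the copy
// embedded in CLASSROOM documents is refreshed from it.
function renameStudentOps(studentId, newName) {
  return [
    {
      collection: "students",
      method: "updateOne",
      filter: { _id: studentId },
      update: { $set: { name: newName } },
    },
    {
      collection: "classrooms",
      method: "updateMany",
      // Only classrooms that actually contain this student.
      filter: { "students.studentId": studentId },
      // $[s] targets every array element matched by the arrayFilters condition.
      update: { $set: { "students.$[s].name": newName } },
      options: { arrayFilters: [{ "s.studentId": studentId }] },
    },
  ];
}

// Usage with the Node.js driver (not executed here; `db` and `id` are assumed):
// for (const op of renameStudentOps(id, "Jane Doe")) {
//   await db.collection(op.collection)[op.method](op.filter, op.update, op.options);
// }
```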
Right now I have just 2 collections: Users and Markers.
Users schema is something like this:
{
username: String,
isVerified: Boolean,
zipCode: Number
}
Each new user I create in Users gets an ObjectId, like this:
{
"_id" : ObjectId("588fbd3e39b266783285d573"),
"username" : "testuser",
"isVerified": true,
"zipCode": 12345
}
Markers is a collection of Marker objects created by users. Each marker has a title, description, latLng, etc. I also want to add the creator's Users.username and some kind of user ID. For the ID, is it fine to just use Users._id?
I'm fairly new to NoSQL database design, so please bear with me. Since I am returning Markers data to the client side, I need a user ID so I can open a user's profile page from a marker; my API can simply take a userId parameter. But should I store both username and userid in each Marker?
Update: after some reading, I've decided to switch to Postgres and learn it instead, since my data is relational. Thanks for the help.
Assuming that your username is unique, you can store either the username or the userid in your Markers collection. MongoDB does not support foreign keys, nor does it enforce referential integrity, so it is up to you to update or delete these references as the data changes.
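Because MongoDB will not maintain these references for you, one way to organise the application-side bookkeeping is sketched below, assuming each marker stores the creator's Users._id as `userId` plus a denormalized `username`. The field names and the helpers `makeMarker`, `onUsernameChange` and `onUserDelete` are illustrative; the returned objects are filter/update pairs for `updateMany`/`deleteMany`, and whether deleting a user should delete their markers is a policy decision for your application.

```javascript
// Build a marker that references its creator. `userId` holds Users._id;
// `username` is a denormalized copy kept only to avoid a second lookup.
function makeMarker(user, title, description, latLng) {
  return { title, description, latLng, userId: user._id, username: user.username };
}

// When a user renames themselves, refresh the copies held in Markers
// (e.g. db.markers.updateMany(op.filter, op.update)).
function onUsernameChange(userId, newUsername) {
  return { filter: { userId }, update: { $set: { username: newUsername } } };
}

// When a user is deleted, find the markers that would otherwise point at a
// non-existent user (e.g. db.markers.deleteMany(op.filter)).
function onUserDelete(userId) {
  return { filter: { userId } };
}

// A plain string stands in for ObjectId("...") so the sketch runs without a driver.
const user = { _id: "588fbd3e39b266783285d573", username: "testuser" };
const marker = makeMarker(user, "Cafe", "Good coffee", [40.7, -74.0]);
console.log(marker.userId, marker.username); // 588fbd3e39b266783285d573 testuser
```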
Is it possible to make a compound index, where one of the fields have a fixed value?
Let's say I want to prevent users from using the same e-mail for different accounts, but only for regular user accounts. I want to allow admins to use the same e-mail in as many places as they want, and even to have a regular user account and an administrative account with the same e-mail.
User.index({ username: 1, email: 1 }, { unique: true })
This is not useful, since it will not allow admins to reuse the email. Is it possible to do something like this?
User.index({ role: "regular_user", username: 1, email: 1 }, { unique: true });
Luis,
Regarding the example you gave: if you create a unique compound index, individual keys can have the same values, but each combination of values across the keys in the index entry can appear only once. So if we had a unique index on {"username" : 1, "role" : 1}, the following inserts would be legal:
> db.users.insert({"username" : "Luis Sieira"})
> db.users.insert({"username" : "Luis Sieira", "role" : "regular"})
> db.users.insert({"username" : "Luis Sieira", "role" : "admin"})
If you tried to insert a second copy of any of the above documents, you would cause a duplicate key exception.
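The rule can be imitated in a few lines: a unique compound index stores one key per document, built from the indexed fields' values, with a missing field indexed as null, and it rejects an insert whose key already exists. This is a simplified model for intuition only (it ignores arrays, sparse and partial indexes, and collation); `UniqueCompoundIndex` is a hypothetical class, not a MongoDB API.

```javascript
// Simplified imitation of a unique compound index such as {username: 1, role: 1}.
class UniqueCompoundIndex {
  constructor(fields) {
    this.fields = fields;
    this.keys = new Set();
  }

  // A missing field contributes null to the key, as in a regular (non-sparse) index.
  keyFor(doc) {
    return JSON.stringify(this.fields.map((f) => (f in doc ? doc[f] : null)));
  }

  insert(doc) {
    const key = this.keyFor(doc);
    if (this.keys.has(key)) throw new Error(`duplicate key ${key}`);
    this.keys.add(key);
  }
}

const idx = new UniqueCompoundIndex(["username", "role"]);
idx.insert({ username: "Luis Sieira" });                  // key ["Luis Sieira",null]
idx.insert({ username: "Luis Sieira", role: "regular" }); // key ["Luis Sieira","regular"]
idx.insert({ username: "Luis Sieira", role: "admin" });   // key ["Luis Sieira","admin"]

try {
  idx.insert({ username: "Luis Sieira", role: "admin" }); // same combination again
} catch (e) {
  console.log(e.message); // duplicate key ["Luis Sieira","admin"]
}
```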
Your Scenarios
I think you could add an allowance field to your schema. When admins insert new accounts, they can use a different value for the allowance. With a unique index on {"username":1,"email":1, "allowance" : 1}, you could make the following inserts legally:
>db.users.insert({"username" : "inspired","email": "i#so.com", "allowance": 0})
>db.users.insert({"username" : "inspired","email": "i#so.com", "allowance": 1})
>db.users.insert({"username" : "inspired","email": "i#so.com", "allowance": 2})
>db.users.insert({"username" : "inspired","email": "i#so.com", "allowance": 3})
Of course, you'll have to handle some of the logic on the client, but this lets you use an allowance code of 0 for regular accounts, and save a higher allowance code (incrementing it or assigning a custom value) each time an admin creates another account.
I hope this offers some direction with using unique compound indexes.
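The client-side logic this answer mentions could look roughly like the following sketch: regular accounts always use allowance 0, so the unique index on {username, email, allowance} still blocks a second regular account, while each admin-created account takes the next unused allowance value. `nextAllowance` is a hypothetical helper operating on documents already fetched for that username/email pair; two concurrent inserts could compute the same value, in which case the unique index rejects one of them and you would retry.

```javascript
// Hypothetical helper for the "allowance" scheme.
// existing: documents already stored with this username/email pair.
function nextAllowance(existing, isAdminCreated) {
  if (!isAdminCreated) return 0; // regular accounts always share allowance 0
  const used = existing.map((d) => d.allowance ?? 0);
  return Math.max(0, ...used) + 1; // next value above anything already used
}

const docs = [
  { username: "inspired", email: "i#so.com", allowance: 0 },
  { username: "inspired", email: "i#so.com", allowance: 1 },
  { username: "inspired", email: "i#so.com", allowance: 2 },
];

console.log(nextAllowance(docs, true));  // 3
console.log(nextAllowance(docs, false)); // 0 (the unique index would reject this insert)
```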
You are on the right track. First things first: if you define an index with the role like this:
User.index({role: 1, username: 1, email: 1}, { unique: true });
Mongo will use null for documents that do not specify the role field. If you insert a user without specifying the role and then try to insert it again, you will get an error, because that combination of the three values already exists in the index. You can use this to your advantage by not including a role (or by using a predefined value for readability, such as the regular_user you proposed).
Now, the tricky part is making the index let admins bypass the uniqueness constraint. The best solution would be to generate some hash and append it to the role. If you just add admins with a role like admin_user, you won't bypass the constraint. Meanwhile, using a role like admin_user_635646 (always with a different suffix) will allow you to insert the same admin multiple times.
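A minimal sketch of generating such a role in Node.js, assuming a random hex suffix from the built-in `crypto` module (`crypto.randomBytes` is a real Node API; the `admin_user_` prefix and the `makeAdminRole`/`isAdminRole` helpers are illustrative). Because the suffix is random rather than sequential, a collision is possible in principle but very unlikely with 8 bytes; if one did occur, the unique index would reject the insert and you could retry. Queries for admins then need a prefix match such as `{ role: /^admin_user_/ }` instead of an equality match.

```javascript
const crypto = require("crypto");

const ADMIN_PREFIX = "admin_user_";

// Each admin account gets a distinct role value, so the unique index on
// {role, username, email} never sees the same combination twice.
function makeAdminRole() {
  return ADMIN_PREFIX + crypto.randomBytes(8).toString("hex");
}

// Regular users keep one fixed role (or no role at all, which indexes as null),
// so a second regular account with the same username/email is rejected.
const REGULAR_ROLE = "regular_user";

function isAdminRole(role) {
  return typeof role === "string" && role.startsWith(ADMIN_PREFIX);
}

const a = makeAdminRole();
const b = makeAdminRole();
console.log(a !== b, isAdminRole(a), isAdminRole(REGULAR_ROLE)); // true true false
```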
I have an application that basically has 3 objects:
User, Company and Address, where a User can belong to a Company, a Company can have a number of Addresses, and an Address can have one or many Users that belong to it.
From a MongoDB/NoSQL design perspective, what would be more efficient, so that the application can list all users belonging to a company and also all users at an address?
Currently, my User document looks like this:
> db.users.find().pretty()
{
"__v" : 1,
"_id" : ObjectId("532b039c17fc6100001a8737"),
"active" : true,
"email" : "norman#khine.net",
"groups" : "member",
"lockUntil" : 0,
"loginAttempts" : 0,
"name" : "Norman Khine",
"password" : "$2a$10$EqptX.RRmsk0.FgFRJOpYe9swH0y.lBrgUsg/IatxErjYPm9bT4yq",
"provider" : [
"github",
"local"
],
"surname" : "",
"tokenExpires" : 1395418101786,
"tokenString" : "xX-gA29ep27Yv_Cg3OHxLLSfURfVYnAloEncWsJeOf3Er8HvoVWaSvCSSddRFHQY"
}
Would it be better to have:
"company": ["_id","_id"...] # this way a user can belong to one or many companies
"address": ["_id", "_id"...] # this way a user can belong to one or many addresses
Or should I put each user_id within the "company" and "address" schemas?
Any advice is much appreciated.
Well, it simply depends on what your application normally loads. If your application is very user-centric, reference (*) the companies and addresses within the user document. If you normally show company information, reference the users in the company. And if your application is address-centric... well, I think you know what I mean ;)
(*) References in NoSQL databases CAN have an impact on performance. This is the reason for something called "pre-joining": simply store together the data you need most of the time.
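As a sketch of the user-centric option, assume each user stores arrays of company and address ids, as proposed in the question. In MongoDB a filter like `{ company: companyId }` matches documents whose `company` array contains that value, and an index on `{ company: 1 }` becomes a multikey index covering every element. The in-memory `findByRef` helper below is hypothetical and only imitates that matching rule for plain values; real ids would be ObjectIds, which you compare with `.equals()` rather than `===`.

```javascript
// User documents referencing companies and addresses by id (strings stand in
// for ObjectIds so the sketch runs without a driver).
const users = [
  { _id: "u1", company: ["c1"], address: ["a1", "a2"] },
  { _id: "u2", company: ["c1", "c2"], address: ["a2"] },
  { _id: "u3", company: ["c2"], address: ["a3"] },
];

// The query you would send: db.users.find({ company: "c1" })
// For an array field, MongoDB matches if any element equals the value;
// this helper imitates that rule for plain values only.
function findByRef(docs, field, id) {
  return docs.filter((d) => (Array.isArray(d[field]) ? d[field].includes(id) : d[field] === id));
}

console.log(JSON.stringify(findByRef(users, "company", "c1").map((u) => u._id))); // ["u1","u2"]
console.log(JSON.stringify(findByRef(users, "address", "a3").map((u) => u._id))); // ["u3"]
```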
Simple example: if you show the user information in your application, you most likely also want to show the name of the company he works for. If you only reference the company, you need an additional lookup just to get the company's name. The idea is to put the reference to the company, together with the most needed company data, into the user's document. Or, if companies only exist in relation to users and company information is never maintained on its own, simply put the whole company data into the user's document. MongoDB can then load everything with a single document read, without additional lookups.
If you have too many references in your documents, it is most likely a sign that your document structure needs a redesign. :)
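A minimal sketch of such a "pre-joined" user document, assuming the company's name is the only company field the user screen needs. The field names and the `embedCompany`/`renameCompanyInUsers` helpers are illustrative. The trade-off is the one described above: renaming a company now also means updating every user that carries the copy, shown here as a filter/update pair for `updateMany`.

```javascript
// Store the reference plus the data most often shown with it.
function embedCompany(user, company) {
  return { ...user, company: { _id: company._id, name: company.name } };
}

// Keeping the copies in sync when a company is renamed:
// db.users.updateMany(op.filter, op.update)
function renameCompanyInUsers(companyId, newName) {
  return {
    filter: { "company._id": companyId },
    update: { $set: { "company.name": newName } },
  };
}

// Hypothetical company; `phone` is NOT copied into the user document.
const company = { _id: "c1", name: "Acme Ltd", phone: "555-0100" };
const user = embedCompany({ _id: "u1", name: "Norman Khine" }, company);
console.log(user.company.name); // Acme Ltd
```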
Let's take a simple example: a blog post. I would store the comments on a particular post within the same document.
messages = { '_id' : ObjectId("4cc179886c0d49bf9424fc74"),
'title' : 'Hello world',
'comments' : [ { 'user_id' : ObjectId("4cc179886c0d49bf9424fc74"),
'comment' : 'hello to you too!'},
{ 'user_id' : ObjectId("4cc1a1830a96c68cc67ef14d"),
'comment' : 'test!!!'},
]
}
The question is: would it make sense to store the username instead of the user's ObjectId (i.e., the primary key)? There are pros and cons to both. The pro is that if I display the username within the comment, I wouldn't have to run a second query. The con is that if "John Doe" decides to change his username, I would need to run an update across my entire collection to change his username within all comments/posts.
What's more efficient?
I would store both fields. This way, you only run one query in the most common case (displaying the comments). Changing a username is really rare, so you will not have to update very often.
I would keep user_id because I don't like using a natural field like username as a key, and matching on an ObjectId should be faster.
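With both fields stored, the rare rename becomes one multi-document update. In the shell it could be written with the filtered positional operator (MongoDB 3.6 or later), for example `db.messages.updateMany({ "comments.user_id": id }, { $set: { "comments.$[c].username": newName } }, { arrayFilters: [{ "c.user_id": id }] })`; the collection name `messages` is borrowed from the variable in the question. The JavaScript below imitates that update on in-memory posts so the effect can be checked; `renameInComments` is a hypothetical helper.

```javascript
// Posts whose comments carry both the reference and the display name.
const posts = [
  {
    _id: "p1",
    title: "Hello world",
    comments: [
      { user_id: "u1", username: "John Doe", comment: "hello to you too!" },
      { user_id: "u2", username: "Jane", comment: "test!!!" },
    ],
  },
  {
    _id: "p2",
    title: "Second post",
    comments: [{ user_id: "u1", username: "John Doe", comment: "me again" }],
  },
];

// Update every embedded copy of one user's name; returns how many posts changed.
function renameInComments(docs, userId, newName) {
  let modified = 0;
  for (const post of docs) {
    let touched = false;
    for (const c of post.comments) {
      if (c.user_id === userId) {
        c.username = newName;
        touched = true;
      }
    }
    if (touched) modified += 1;
  }
  return modified;
}

console.log(renameInComments(posts, "u1", "John Q. Doe")); // 2
```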
Of course, it really depends on how much traffic you're going to get, how many comments you expect to have, etc… But it's likely that “do the simplest thing that works” is your friend here: it's simpler to store only the user_id, so do that until it doesn't work any more (e.g., because you've got a post with 100,000 comments that takes 30 seconds to render), then denormalize and store the username along with the comments.
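If you start by storing only `user_id`, rendering a post needs one extra query rather than one per comment: collect the distinct ids and fetch them together with `$in`, for example `db.users.find({ _id: { $in: ids } }, { username: 1 })`. The sketch below builds that filter and joins the result in memory. The `fetched` array stands in for the query result, the helper names are hypothetical, and the "(deleted user)" fallback is an assumed policy for dangling references.

```javascript
// Collect the distinct user ids referenced by a post's comments.
function commentUserIds(post) {
  return [...new Set(post.comments.map((c) => c.user_id))];
}

// Filter for the single follow-up query: { _id: { $in: [...] } }
function usersFilter(post) {
  return { _id: { $in: commentUserIds(post) } };
}

// Attach usernames to comments using the fetched user documents.
// With real ObjectIds, key the Map by String(id) instead of the object itself.
function withUsernames(post, users) {
  const byId = new Map(users.map((u) => [u._id, u.username]));
  return post.comments.map((c) => ({ ...c, username: byId.get(c.user_id) ?? "(deleted user)" }));
}

const post = {
  title: "Hello world",
  comments: [
    { user_id: "u1", comment: "hello to you too!" },
    { user_id: "u2", comment: "test!!!" },
    { user_id: "u1", comment: "one more" },
  ],
};
const fetched = [{ _id: "u1", username: "John Doe" }, { _id: "u2", username: "Jane" }];

console.log(JSON.stringify(usersFilter(post))); // {"_id":{"$in":["u1","u2"]}}
console.log(JSON.stringify(withUsernames(post, fetched).map((c) => c.username)));
// ["John Doe","Jane","John Doe"]
```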