RESTful API related objects & denormalization (MongoDB)

I'm building a RESTful web application using node.js and MongoDB.
I have a Person model
Person
id: '12345'
name: 'John'
likes: [ {id: '54321', name: 'Mary'} ]
isLikedBy: []
Person
id: '54321'
name: 'Mary'
likes: []
isLikedBy: [ {id: '12345', name: 'John'} ]
What is the best way to model "likes" and "isLikedBy" relationships? Since I am using MongoDB I thought that this is a good way to model the relationships, as there is only one access to database needed to get all data about one person.
How should I design a REST API for this relationship? What if 'John' doesn't like 'Mary' anymore? The server only receives the following PUT request:
Person
id: '12345'
name: 'John'
likes: []
isLikedBy: []
But the server should also update 'Mary', because now she is not liked by 'John' anymore. (I know that MongoDB does not directly support transactions and that I have to implement them myself.)
My ideas:
1. On each update of a Person (of its 'likes' and 'isLikedBy' fields), get this person from the database and compare the stored 'likes' and 'isLikedBy' fields with the request. There is some overhead with this approach, and I do not know if it is in the spirit of RESTful APIs.
2. Make the client send both the original 'likes' and 'isLikedBy' fields and also the new updated ones (or only the diff). This seems even further away from RESTful design, since the client must now be aware of which data was last successfully saved to the server.
3. Create a separate object which would contain the relationship information (3 fields: id, me, whoILike). But this means that each time I would want to get data about a person I would need 2 queries, one for the person and one for relationships and then combine the data into single object.
What should I do?

We've wrestled with this same problem at my company when using Mongo and tracking likes.
After much discussion we decided to store the counts of likes with the entities - in this case storing likes with people.
Opinions on your options:
1. The overhead of doing additional queries is probably a bad idea, especially since "liking" stuff is seen by users as a lightweight operation. In other words, you might find users liking a ton of stuff, which means a lot of writes, and in this case every write has an additional read or two with it.
2. This is a lot of work for the developer to do and it's easy to get wrong.
3. I think it's OK, but I still prefer to store likes with the person. Mongo isn't good at joins like the one you mentioned.
I think you should store the like/liked by fields with the person document. The only thing I would change is the REST call being made.
Maybe something like:
PUT http://www.rest.com/person/123/likes/456
This would say "Person 123 likes 456." Then your REST call makes sure the data is updated. It would update the Person 123 object and the Person 456 object.
To remove something like:
DELETE http://www.rest.com/person/123/likes/456
Keep in mind that a REST call doesn't have to rewrite the entire document. You can do partial updates on the Person document that touch only the modified likes. You can also easily add or remove an element of an array in a document.
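As a rough sketch of those partial updates: the helpers below (`buildLikeOps` and `buildUnlikeOps`, both hypothetical names) only build the filter/update pairs for the two affected Person documents, using MongoDB's `$addToSet` and `$pull` array operators. The `people` collection mentioned in the comment is an assumption, and nothing here talks to a database.

```javascript
// Hypothetical helpers: build the two partial updates behind
// PUT /person/:id/likes/:otherId and DELETE /person/:id/likes/:otherId.
// Field names (id, likes, isLikedBy) follow the Person model in the question.

function buildLikeOps(liker, liked) {
  return [
    // $addToSet appends the entry only if it is not already in the array.
    {
      filter: { id: liker.id },
      update: { $addToSet: { likes: { id: liked.id, name: liked.name } } },
    },
    {
      filter: { id: liked.id },
      update: { $addToSet: { isLikedBy: { id: liker.id, name: liker.name } } },
    },
  ];
}

function buildUnlikeOps(likerId, likedId) {
  return [
    // $pull removes every array element that matches the condition.
    { filter: { id: likerId }, update: { $pull: { likes: { id: likedId } } } },
    { filter: { id: likedId }, update: { $pull: { isLikedBy: { id: likerId } } } },
  ];
}

// With the Node.js driver, each pair could be applied roughly like this
// (assumes a `people` collection; not executed in this sketch):
//   for (const op of ops) await people.updateOne(op.filter, op.update);

const ops = buildUnlikeOps("12345", "54321");
console.log(JSON.stringify(ops, null, 2));
```

The two writes are separate operations, so a failure between them can leave the two documents inconsistent; the handler would still need some retry or repair step.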

Related

Mongodb Storing Friends Relationship

I am using MongoDB for one of the mobile apps that we are developing. It has a contact-sync feature.
I want to know the ideal way of storing relationships (friend relationships, not the RDBMS kind of relationship) in MongoDB, and what architecture to use for it.
I have thought of the following user collection structure:
{
_id: ObjectID(abc),
name: "abc",
contacts: ["def", "ghi"]
}
In the above collection I am treating "def" and "ghi" as the object ids of the friends of user abc. Is this the correct way of doing it, or can someone suggest a better approach that they have implemented?
My only concern is that I should not get stuck or run into performance problems later when retrieving data specific to the user's friends.
For example, consider that I want to get all the activities from the Activities collection done by my friends.
I think you could take advantage of the NoSQL structure and store some more info about each friend:
{
_id: ObjectID(abc),
name: "abc",
contacts: [{id:"def", name:"John"}, {id:"ghi", name:"Sari"} ]
}
To display a basic friend list you will need just one get query, and since you already have the name (or other important related details), you can then check for activities.
The extra overhead with this structure is the need to update the embedded name (and other details) every time a user changes their name, but this is not a big problem: who changes their name frequently?
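To make that overhead concrete, here is a minimal in-memory sketch of the rename step. `renameUser` is a hypothetical helper and the sample documents are made up; the MongoDB call in the comment is what I would expect the equivalent update to look like, not something taken from the answer.

```javascript
// In-memory stand-in for the user collection from the answer above.
const users = [
  { _id: "abc", name: "abc", contacts: [{ id: "def", name: "John" }, { id: "ghi", name: "Sari" }] },
  { _id: "ghi", name: "Sari", contacts: [{ id: "def", name: "John" }] },
  { _id: "def", name: "John", contacts: [] },
];

// Hypothetical helper: rename a user and refresh every embedded copy of the name.
function renameUser(allUsers, userId, newName) {
  let touched = 0;
  for (const user of allUsers) {
    if (user._id === userId) user.name = newName;
    for (const contact of user.contacts) {
      if (contact.id === userId) {
        contact.name = newName;
        touched += 1;
      }
    }
  }
  return touched; // number of embedded copies that were updated
}

// A comparable MongoDB update would likely use the positional operator:
//   updateMany({ "contacts.id": userId }, { $set: { "contacts.$.name": newName } })

const updatedCopies = renameUser(users, "def", "Johnny");
console.log(updatedCopies, users[0].contacts[0].name);
```

The cost of a rename grows with the number of users who list that person as a contact, which is the trade-off for getting the friend list in a single read.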

Using a sub-resource or not?

Let's take the following example:
We want to expose company and employee information from a RESTful API.
Company data should be quite simple:
GET api/v1/companies
GET api/v1/companies/{id}
Employees BELONG to a company, but we still want to retrieve them individually as well, so which solution is best:
Solution 1: Using sub-resources
Get all employees for a company:
GET api/v1/companies/{companyId}/employees
Get a specific employee:
GET api/v1/companies/{companyId}/employees/{employeeId}
Solution 2: Using an independent resource
Get all employees for a company:
GET api/v1/employees?companyId={companyId}
Get a specific employee:
GET api/v1/employees/{employeeId}
Both options seem to have their pros and cons.
With sub-resources, I may not always have the CompanyId on hand when wanting to retrieve an individual employee.
With an independent resource, getting all employees for a company should use the sub-resource approach if we want to be RESTful.
Otherwise, we could use a mix, but this lacks consistency:
Get all employees for a company:
GET api/v1/companies/{companyId}/employees
Get a specific employee:
GET api/v1/employees/{employeeId}
What is the best approach to take in such a situation if we want to stay true to RESTful standards?
For me this sounds like the common many-to-many relationship problem for RESTful services. (see How to handle many-to-many relationships in a RESTful API?)
Your first solution seems good at first but you will have problems whenever you want to access the relation itself.
Instead of returning the employee with the following GET request you should return the relation.
GET api/v1/companies/{companyId}/employees/{employeeId}
If the relation can be identified by 2 keys, this solution seems to be fine. But what happens if the relation is identified by 3 or more IDs? The URI becomes rather long.
GET api/v1/companies/{companyId}/employees/{employeeId}/categories/{categoryId}
In this case I would come up with a separate resource for the relation:
GET api/v1/company-employees/{id}
The returned model in JSON would look like this:
{
"id": 1, <- the id of the relation
"company": {
"id": 2
},
"employee": {
"id": 3
},
"category": {
"id": 4
}
}
I think it would be okay to provide both. If you want the client to browse through the list of companies first, then select a company and then get the list of all employees, the first approach is necessary. If, perhaps in addition, you want the client to be able to filter employees by name or age without knowing the company identifier, you must provide the second approach as well. It depends on what you want the client to do. In my opinion, it would not be necessary to provide the second approach if clients can only filter employees by company identifier.
I would go for the first approach and providing some links to retrieve the subordinate resource.
Take the example of adding a new employee to a company. With the second approach it is harder for the client to make a POST on your collections. Why? Because the client has to know the company id, which is "somewhere else".
With the first approach, as you followed a path, you already know this information (the companyId)... so it's easier for the client to add a new employee.
Back to your example, the main benefit of the second approach shows up if your client wants something like "the number of employees in a city", where you don't care about the notion of a company.
But it seems that you need the notion of company, so I would go for the first.
Also, very related to this question: RESTful design: when to use sub-resources?
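One way to see why providing both URL styles need not cost much: both can resolve to the same lookups. The sketch below is a toy GET dispatcher over in-memory data; `handleGet`, `listEmployees`, `getEmployee` and the sample records are all hypothetical, and a real application would use its framework's router instead.

```javascript
// Hypothetical in-memory data; ids and names are made up for illustration.
const employees = [
  { id: "e1", companyId: "c1", name: "Ann" },
  { id: "e2", companyId: "c1", name: "Bob" },
  { id: "e3", companyId: "c2", name: "Cid" },
];

// Returns all employees, or only those of one company when companyId is given.
function listEmployees(companyId) {
  return employees.filter((e) => !companyId || e.companyId === companyId);
}

function getEmployee(employeeId) {
  return employees.find((e) => e.id === employeeId) || null;
}

// Minimal GET dispatcher: both URL styles end up in the same two lookups.
function handleGet(path) {
  const url = new URL(path, "http://localhost");
  // Drop the "api" and "v1" prefix segments.
  const parts = url.pathname.split("/").filter(Boolean).slice(2);

  // Sub-resource style: companies/{companyId}/employees
  if (parts[0] === "companies" && parts[2] === "employees" && parts.length === 3) {
    return listEmployees(parts[1]);
  }
  // Independent style: employees?companyId={companyId}
  if (parts[0] === "employees" && parts.length === 1) {
    return listEmployees(url.searchParams.get("companyId"));
  }
  // Independent style: employees/{employeeId}
  if (parts[0] === "employees" && parts.length === 2) {
    return getEmployee(parts[1]);
  }
  return null; // unknown route
}

console.log(handleGet("/api/v1/companies/c1/employees").length);
console.log(handleGet("/api/v1/employees?companyId=c1").length);
console.log(handleGet("/api/v1/employees/e3").name);
```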

Many to many relationship on Mongodb based e-learning webapp?

I am relatively new to No-SQL databases. I am designing a data structure for an e-learning web app. There would be X quantity of courses and Y quantity of users.
Every user will be able to take any number of courses.
Every course will be composed of many sections (each section may be a video or a quiz).
I will need to keep track of every section a user takes, so I think the whole course should be part of the user set (for each user), like so:
{
_id: "ed",
name: "Eduardo Ibarra",
courses: [
{
name: "Node JS",
progress: "100%",
section: [
{name: "Introduction", passed:"100%", field3:"x", field4:""},
{name: "Quiz 1", passed:"75%", questions:[...], field3:"x", field4:""},
]
},
{
name: "MongoDB",
progress: "65%",
...
}
]
}
Is this the best way to do it?
I would say: design your database around your queries. One thing is for sure: you will have to do some embedding.
If most of your queries are about what a user is doing, then make the user the primary entity and embed the courses within it. You don't need to embed the entire course info. The info about a course is static. For example, the data about the Node JS course (the content, the author of the course, the exercise files, etc.) will not change, so you can keep the courses' info separately in another collection. But how much of the course a user has completed depends on the individual user. So you should keep only the id of the course (which is stored in the separate 'course' collection), and for each user store the information that relates to that (User, Course) pair embedded in the user collection itself.
Now the most important question: what do you do if you have to perform queries that require a 'join' of the user and course collections? For this you can use JavaScript to first get the courses (and maybe store them in an array or list) and then fetch the users for each of those courses from the user collection, or vice versa. There are a few drivers available online to help you accomplish this. One is UnityJDBC which is available here.
From my experience, knowing what you are going to query from MongoDB is very helpful in designing your database, because the NoSQL nature of MongoDB means there is no single correct design. Every design is incorrect if it does not let you accomplish your task. So knowing beforehand what you will do with the database (i.e. what you will query) is the only guide.
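A minimal sketch of that split, using in-memory arrays in place of collections: static course data is kept separately, the user document stores only the course id plus per-user progress, and the "join" happens in application code. The field names `courseId`, `author` and the helper `withCourseInfo` are assumptions made for illustration.

```javascript
// Static course data lives in its own collection (in-memory stand-in here).
const courses = [
  { _id: "node", name: "Node JS", author: "hypothetical author" },
  { _id: "mongo", name: "MongoDB", author: "hypothetical author" },
];

// The user document keeps only the course id plus per-user progress.
const user = {
  _id: "ed",
  name: "Eduardo Ibarra",
  courses: [
    { courseId: "node", progress: 100, sections: [{ name: "Quiz 1", passed: 75 }] },
    { courseId: "mongo", progress: 65, sections: [] },
  ],
};

// Application-side "join": attach the static course info to each progress entry.
function withCourseInfo(userDoc, allCourses) {
  const byId = new Map(allCourses.map((c) => [c._id, c]));
  return userDoc.courses.map((entry) => ({
    ...entry,
    course: byId.get(entry.courseId) || null, // null if the course is unknown
  }));
}

const joined = withCourseInfo(user, courses);
console.log(joined.map((e) => `${e.course.name}: ${e.progress}%`));
```

With a real database the `courses` array would be the result of one extra query for the ids referenced by the user document.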

Mongo for Meteor data design: opposite of normalizing?

I'm new to Meteor and Mongo. Really digging both, but want to get feedback on something. I am digging into porting an app I made with Django over to Meteor and want to handle certain kinds of relations in a way that makes sense in Meteor. Given, I am more used to thinking about things in a Postgres way. So here goes.
Let's say I have three related collections: Locations, Beverages and Inventories. For this question though, I will only focus on the Locations and the Inventories. Here are the models as I've currently defined them:
Location:
_id: "someID"
beverages:
_id: "someID"
fillTo: "87"
name: "Beer"
orderWhen: "87"
startUnits: "87"
name: "Second"
number: "102"
organization: "The Second One"
Inventories:
_id: "someID"
beverages:
0: Object
name: "Diet Coke"
units: "88"
location: "someID"
timestamp: 1397622495615
user_id: "someID"
But here is my dilemma, I often need to retrieve one or many Inventories documents and need to render the "fillTo", "orderWhen" and "startUnits" per beverage. Doing things the Mongodb way it looks like I should actually be embedding these properties as I store each Inventory. But that feels really non-DRY (and dirty).
On the other hand, it seems like a lot of effort & querying to render a table for each Inventory taken. I would need to go get each Inventory, then lookup "fillTo", "orderWhen" and "startUnits" per beverage per location then render these in a table (I'm not even sure how I'd do that well).
TIA for the feedback!
If you only need this for rendering purposes (i.e. no further queries), then you can use the transform hook like this:
var myAwesomeCursor = Inventories.find(/* selector */, {
  transform: function (doc) {
    _.each(doc.beverages, function (bev) {
      // use whatever method you want to receive these data,
      // possibly from some cache or even another collection
      // bev.fillTo = ...
      // bev.orderWhen = ...
      // bev.startUnits = ...
    });
    // the transform must return the (possibly modified) document
    return doc;
  }
});
Now the myAwesomeCursor can be passed to each helper, and you're done.
In your case you might find denormalizing the inventories so they are a property of locations could be the best option, especially since they are a one-to-many relationship. In MongoDB and several other document databases, denormalizing is often preferred because it requires fewer queries and updates. As you've noticed, joins are not supported and must be done manually. As apendua mentions, Meteor's transform callback is probably the best place for the joins to happen.
However, the inventories may contain many beverage records and could cause the location records to grow too large over time. I highly recommend reading this page in the MongoDB docs (and the rest of the docs, of course). Essentially, this is a complex decision that could eventually have important performance implications for your application. Both normalized and denormalized data models are valid options in MongoDB, and both have their pros and cons.
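For the normalized option, the manual join amounts to copying the per-location settings onto each inventory beverage at read time. The sketch below does this on plain objects; `enrichInventory` is a hypothetical helper, the numbers are made up, `beverages` is assumed to be an array in both documents, and matching beverages by `name` is an assumption since the question does not show a shared id.

```javascript
// In-memory stand-ins; shapes follow the question, values are illustrative.
const location = {
  _id: "loc1",
  name: "Second",
  beverages: [
    { name: "Beer", fillTo: 87, orderWhen: 40, startUnits: 87 },
    { name: "Diet Coke", fillTo: 100, orderWhen: 30, startUnits: 90 },
  ],
};

const inventory = {
  _id: "inv1",
  location: "loc1",
  timestamp: 1397622495615,
  beverages: [{ name: "Diet Coke", units: 88 }],
};

// Hypothetical helper: return a copy of the inventory whose beverages carry
// the location's fillTo / orderWhen / startUnits (matched by beverage name).
function enrichInventory(inv, loc) {
  const settings = new Map(loc.beverages.map((b) => [b.name, b]));
  return {
    ...inv,
    beverages: inv.beverages.map((bev) => {
      const s = settings.get(bev.name) || {};
      return { ...bev, fillTo: s.fillTo, orderWhen: s.orderWhen, startUnits: s.startUnits };
    }),
  };
}

const enriched = enrichInventory(inventory, location);
console.log(enriched.beverages[0]);
```

The same mapping could run inside a transform function, with the location looked up from a cache or another collection.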

Using REST API for lookup lists dependant on context

I am currently trying to decide on the best approach to solve a problem I am having with designing my REST API.
The simplified scenario is my web application has two resources for example departments and employees. Both are security controlled within the business layer.
A user can exist who has access to employee but not to department, however when this user edits an employee they need to be able to select that employee's department from a drop down list (similarly they might have a list of employees that they want to filter by department).
Ordinarily that user would not have access to the department object so wouldn't be able to call /department/ for example but in the case of editing an employee they need the list of departments.
What would be the recommended way of dealing with this, would I return a list of departments on each GET of /employee/ or would I create another resource which was a combination of employee and department objects (department being the full list of departments)?
I can't currently change the security on the objects as this is deeply ingrained in the application logic.
Has anybody got any ideas?
Regards,
Gary
Create a new resource called something like 'DepartmentList'
Note: I think plural names are better.
You have to think of what would make the life of your users (devs) easier.
A combined resource would 'pollute' your api. Your api would expose /employees, /departments and /employeeDepartments. I don't think the latter deserves to be that high in the hierarchy.
It'd also be a little more complex for your users to use:
"To edit an employee you need to set a department, BUT that department is not always available at /department, so you better get it from employeeDepartments ... "
Think of your employee object: GET /employees/123
employee:{
name: John,
...
department: {
id: ID
--a subset of data--
}
}
The subset of data should be enough for users with no rights on departments to work with, and users with the right access may operate on /departments/ID.
Now, how to get the list of available options?
I usually provide a 'special' action /new, which returns a 'form' that users can use as a template to post and create a new resource. This is not an adopted REST 'standard', but it is HATEOAS friendly and really helps the discoverability of your API.
So, GET /employees/new could print
employee:{
name: "",
...
department: [{ id: 1, --subset of data-- },{ id: 2, --subset of data-- }.. ]
}
There is some convention to be agreed on for the format (e.g. the user needs to know that only one department has to be picked). But that's a whole new discussion.
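The two representations described above could be produced along these lines. This is only a sketch on in-memory data: `departmentSubset`, `employeeView` and `newEmployeeTemplate` are hypothetical names, and the choice of `id` and `name` as the exposed subset is an assumption.

```javascript
// Hypothetical department records; `budget` stands in for restricted data.
const departments = [
  { id: 1, name: "Sales", budget: 100000 },
  { id: 2, name: "Support", budget: 50000 },
];

// Only these fields are exposed to users without department access (assumed subset).
function departmentSubset(dept) {
  return { id: dept.id, name: dept.name };
}

// Representation for GET /employees/:id, embedding the department subset.
function employeeView(employee) {
  const dept = departments.find((d) => d.id === employee.departmentId);
  return { name: employee.name, department: dept ? departmentSubset(dept) : null };
}

// Representation for GET /employees/new: an empty template listing the options.
function newEmployeeTemplate() {
  return { name: "", department: departments.map(departmentSubset) };
}

console.log(JSON.stringify(employeeView({ name: "John", departmentId: 2 })));
console.log(JSON.stringify(newEmployeeTemplate()));
```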