Parse.com data design - mongodb

I have a question about how I should set up a database in Parse.com so that queries scale. Below is a picture of a simplified example of what I am trying to do.
As you can see, a user can create a ChatMessage or a MainMessage, and a MainMessage can also have Comments posted to it. However, in a "feed" I want to display both ChatMessages and MainMessages ordered by dateCreated, and I don't think it will be efficient to query ChatMessages, then query MainMessages, and then sort the combined results.
To simplify, I have this...
As you can see, I merged ChatMessage and MainMessage from the first approach into a single Message that has all of the attributes of both, plus a type attribute to distinguish between the two. Is it okay that each Message will have several unused attributes depending on its type? Is that inefficient in Parse.com? Hypothetically, if MainMessage had many more attributes (say 10) than ChatMessage, would this still be okay? I believe the second approach is better because it simplifies the query for the "feed". However, if anybody has anything to say about how I should be setting up the database, any comments are greatly appreciated. Thanks.
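To make the second approach concrete, here is a minimal in-memory sketch of a single Message collection with a type attribute. The field names other than type and dateCreated (text, title, body) and the sample rows are assumptions for illustration; in Parse the same shape would be one class queried with a descending sort on the date column.

```javascript
// Hypothetical stand-in for a single Message class.
// Attributes that do not apply to a given type are simply left out.
const messages = [
  { type: "chat", text: "hi there", dateCreated: new Date("2015-01-03T10:00:00Z") },
  { type: "main", title: "Release notes", body: "Version 2 is out", dateCreated: new Date("2015-01-05T09:00:00Z") },
  { type: "chat", text: "anyone around?", dateCreated: new Date("2015-01-04T12:00:00Z") },
];

// Newest-first feed over both kinds of message in one pass:
// one sort and one limit, with no merge of two result sets.
function feed(all, limit) {
  return [...all]
    .sort((a, b) => b.dateCreated - a.dateCreated)
    .slice(0, limit);
}

const latest = feed(messages, 2);
console.log(latest.map((m) => m.type));
```

The unused attributes cost nothing in this sketch because they are absent rather than stored as empty values; whether the same holds for a given backend is worth checking against its documentation.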

Related

How to properly access children by filtering parents in a single REST API call

I'm rewriting an API to be more RESTful, but I'm struggling with a design issue. I'll explain the situation first and then my question.
SITUATION:
I have two sets of resources: users and items. Each user has a list of items, so the resource path would look something like this:
api/v1/users/{userId}/items
Also, each user has an isPrimary property, but only one user can be primary at a time. This means that if you want to get the primary user, you'd do something like this:
api/v1/users?isPrimary=true
This should return a single "primary" user.
I have a client of my API that wants to get the items of the primary user but can't make two API calls (one to get the primary user and a second to get that user's items by userId). Instead, the client would like to make a single API call.
QUESTION:
How should I go about designing an API that fetches the items of a single user in only one API call when all the client has is the isPrimary query parameter for the user?
MY THOUGHTS:
I think I have some options:
Option 1) api/v1/users?isPrimary=true will return the list of items along with the user data.
I don't like this one, because I have other API clients that call api/v1/users or api/v1/users?isPrimary=true to only get and parse through user data NOT item data. A user can have thousands of items, so returning those items every time would be taxing on both the client and the service.
Option 2) api/v1/users/items?isPrimary=true
I also don't like this because it's ugly and not really RESTful, since there is no {userId} in the path and isPrimary isn't a property of items.
Option 3) api/v1/users?isPrimary=true&isShowingItems=true
This is like the first one, but I use another query parameter to flag whether or not to show the items belonging to the user in the response. The problem is that the query parameter is misleading because there is no isShowingItems property associated with a user.
Any help that you all could provide will be greatly appreciated. Thanks in advance.
There's no real standard solution for this, and all of your solutions are in my mind valid. So my answer will be a bit subjective.
Have you looked at HAL for your API format? HAL has a standard way to embed data from one resource into another (using _embedded), and this sounds like a pretty valid use case for it.
The server can decide whether to embed the items based on a number of criteria, but one cheap solution might be to just add a query parameter like ?embed=items
Even if you don't use HAL, you could still copy this behavior conceptually. Or maybe you only adopt _embedded. At least it re-uses an existing idea rather than building something new.
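As a rough sketch of the ?embed=items idea: the function below builds a HAL-style user representation and only adds _embedded when the client asks for it. The user and item data, the representUser name, and the comma-separated parsing of the embed value are all assumptions for illustration, not part of HAL itself; only _links and _embedded are HAL's reserved property names.

```javascript
// Hypothetical data; a real service would load this from its store.
const primaryUser = { id: 7, name: "Ada", isPrimary: true };
const itemsByUser = { 7: [{ id: 1, label: "first" }, { id: 2, label: "second" }] };

// Build the representation of a user. `embed` is the raw value of an
// assumed ?embed= query parameter, e.g. "items" or undefined.
function representUser(user, embed) {
  const body = {
    ...user,
    _links: {
      self: { href: `/api/v1/users/${user.id}` },
      items: { href: `/api/v1/users/${user.id}/items` },
    },
  };
  const wanted = (embed || "").split(",").map((s) => s.trim());
  if (wanted.includes("items")) {
    // Only clients that asked for items pay for the larger payload.
    body._embedded = { items: itemsByUser[user.id] || [] };
  }
  return body;
}

const plain = representUser(primaryUser, undefined);
const withItems = representUser(primaryUser, "items");
```

Clients that call api/v1/users?isPrimary=true without the parameter keep getting the small response, and the items link still tells them where the full list lives.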
Aside from that practical solution, there is nothing un-RESTful about exposing data at multiple endpoints. So if you created a resource like:
/v1/primary-user-with-items
Then this might be ugly and inconsistent with the rest of your API, but not inherently 'not RESTful' (sorry for the double negative).
You could include a List<User.Fieldset> parameter called fieldsets, and then include things if they are specified in fieldsets. This has the benefit that you can reuse the pattern by adding fieldsets onto any object in your API that has fields you might wish to include.
api/v1/users?isPrimary=true&fieldsets=items

Parse DB Design: How to get all the posts for particular category

I'm creating a discussion system using Parse.com
In my [simplified] system, there are Posts, Categorys, and Comments.
As you probably imagined, Posts can belong to one or more Categorys and can have multiple Comments.
However, often users will want to see all the Posts in a Category. If I set up my database like this
Post (name, content, categories)
Category(name)
I am worried that querying for all the Posts in a Category will be very inefficient (since it will have to check the categories field of every Post).
However, if I design the database like
Post (name, content)
Category(name, posts)
it will be inefficient for me to query which Categorys a Post belongs to, since it will have to search the posts arrays of all the Categorys.
I'm sure this must be a common Database design dilemma but I am still new at this. What is the best way to approach and solve this problem?
What you're looking for is a bi-directional, many-to-many relationship between Post and Category. With Parse, there are at least three approaches you can take.
You can add a column as a PFRelation to the Post table. You can ask a Post for its categories relation, create a query from that, and run it. Conversely, if you have a Category, you can create a Post query with a where clause on the categories key. PFRelations are good if you will have big collections.
If you think better as a relational model, just create a "join" table called CategoryPosts. It would have two pointer columns, one for the Post and another for the Category. This is also very efficient.
Lastly, you could add an array column to either class. Since all of the results are loaded at once, this works best for smaller collections.
These options are described in a little more detail in the Parse Relations Documentation.
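The join-table option can be sketched in memory as follows. The row values and the two helper names are hypothetical; in Parse, each helper would correspond to a query on the CategoryPosts class constrained on one of its two pointer columns.

```javascript
// Hypothetical rows of a CategoryPosts join class:
// one row per (post, category) pair.
const categoryPosts = [
  { post: "p1", category: "news" },
  { post: "p1", category: "tech" },
  { post: "p2", category: "tech" },
  { post: "p3", category: "news" },
];

// All posts in a category: filter on the category column.
function postsInCategory(rows, category) {
  return rows.filter((r) => r.category === category).map((r) => r.post);
}

// All categories of a post: filter on the post column.
function categoriesOfPost(rows, post) {
  return rows.filter((r) => r.post === post).map((r) => r.category);
}

console.log(postsInCategory(categoryPosts, "tech"));
console.log(categoriesOfPost(categoryPosts, "p1"));
```

Both directions are the same kind of lookup on a single column, which is why neither side of the relationship has to scan arrays stored on the other.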

Best way to store/get values referenced from a list in Mongo/ReactiveMongo?

I have a quite common use case - a list of comments. Each comment has an author.
I'm storing the reference from a comment to the author using a reference, since an author can make multiple comments.
Now I'm working with ReactiveMongo and want to try to keep the database access asynchronous, but in this case, I don't know how. I do an asynchronous access to the database, to get the comments, but then for each comment I have to get the author, and until now the only way I know is to loop through the comments and get the user synchronously:
val userOption: Option[JsObject] = Await.result(usersCollection.find(Json.obj("id" -> userId)).one[JsObject], timeout)
//...
Other than that, I could:
Get each user asynchronously, but then I have to introduce some functionality to wait until all users have been fetched before returning the response, and my code is likely to become a mess.
Store the complete user object - at least what I need for the comment (picture, name and such) in each comment. This redundancy could become troublesome to manage, since each time a user changes something (relevant to the data stored in the comments) I would have to go through all the comments in the database and modify it.
What is the correct pattern to apply here?
I tackled this exact problem a while ago.
There are no joins in mongo.
You have to manually take care of the join.
Your options are:
Loop through each comment entry and query mongo for the user. This is what you're doing.
Get all user ids from the comments, query mongo for the users matching these ids, then take care to match each user to its comments. This is just what you did but a little more optimized.
Embed the user in comments or comments in users. I wouldn't recommend this; it is probably not the right place for comments/users.
Think about which user data you need when displaying a comment, and embed just that info in the comment.
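The second option in the list above (one batched lookup instead of one query per comment) can be sketched like this. The sample comments, the fetchedUsers array standing in for the result of a single query with an $in filter on the user ids, and the helper names are all assumptions for illustration.

```javascript
// Hypothetical comments, each holding only a reference to its author.
const comments = [
  { id: "c1", userId: "u1", text: "First!" },
  { id: "c2", userId: "u2", text: "Nice post" },
  { id: "c3", userId: "u1", text: "Thanks" },
];

// Step 1: collect the distinct author ids referenced by the comments.
function authorIds(cs) {
  return [...new Set(cs.map((c) => c.userId))];
}

// Step 2 (stand-in): what a single query for those ids would return.
const fetchedUsers = [
  { id: "u1", name: "Ann" },
  { id: "u2", name: "Bob" },
];

// Step 3: index the users by id and attach each one to its comments.
function attachAuthors(cs, users) {
  const byId = new Map(users.map((u) => [u.id, u]));
  return cs.map((c) => ({ ...c, author: byId.get(c.userId) || null }));
}

const joined = attachAuthors(comments, fetchedUsers);
console.log(authorIds(comments), joined.map((c) => c.author.name));
```

Because only one user query is involved, it can stay asynchronous and be chained after the comments query without waiting on a separate result per comment.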
I ended up going with the last option.
We embedded the user id, first and last name in each comment.
This info is unlikely to change (possibly not even allowed to change after creation?).
If it can change then it is not too hard to tailor the update-user method to update the related comments with the new info (we did that too).
So now no join is needed.
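A minimal in-memory sketch of that approach, under stated assumptions: the users map, the storedComments array, and the function names are hypothetical, and a real implementation would issue a multi-document update against the comments collection instead of looping.

```javascript
// Hypothetical user store and comment store.
const users = new Map([
  ["u1", { id: "u1", firstName: "Ann", lastName: "Lee", email: "ann@example.com" }],
]);
const storedComments = [];

// Only the fields needed to display a comment are copied.
function authorSnapshot(user) {
  return { id: user.id, firstName: user.firstName, lastName: user.lastName };
}

// Writing a comment embeds the snapshot, so reading needs no join.
function addComment(userId, text) {
  const comment = { text, author: authorSnapshot(users.get(userId)) };
  storedComments.push(comment);
  return comment;
}

// The update-user path also refreshes the snapshots in related comments.
function renameUser(userId, firstName, lastName) {
  const user = users.get(userId);
  user.firstName = firstName;
  user.lastName = lastName;
  for (const c of storedComments) {
    if (c.author.id === userId) c.author = authorSnapshot(user);
  }
}

addComment("u1", "Hello");
renameUser("u1", "Anna", "Lee");
console.log(storedComments[0].author);
```

The trade-off is that writes to a user fan out to that user's comments, which is acceptable when the embedded fields change rarely.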

Breeze: complex graph returns only 1 collection

I have a physician graph that looks something like this:
The query I use to get data from a WebApi backend looks like this:
var query = new breeze.EntityQuery().from("Physicians")
.expand("ContactInfo")
.expand("ContactInfo.Phones")
.expand("ContactInfo.Addresses")
.expand("PhysicianNotes")
.expand("PhysicianSpecialties")
.where("ContactInfo.LastName", "startsWith", lastInitial).take(5);
(note the ContactInfo is a pseudonym of the People object)
What I find is that if I request ContactInfo.Phones to be expanded, I get just Phones and no Notes or Specialties. If I comment out the Phones, I get ContactInfo.Addresses and no other collections. If I comment out ContactInfo along with Phones and Addresses, I get Notes only, and so on. Essentially, it seems like I can only get one collection at a time.
So, is this a built-in "don't let the programmer shoot himself in the foot" safeguard, or do I have to enable something?
Or is this graph too complicated? Should I consider a NoSQL object store?
Thanks
You need to put all your expand clauses in a single one like this:
var query = new breeze.EntityQuery().from("Physicians")
.expand("ContactInfo, ContactInfo.Phones, ContactInfo.Addresses, PhysicianNotes, PhysicianSpecialties")
.where("ContactInfo.LastName", "startsWith", lastInitial).take(5);
You can see the documentation here: http://www.breezejs.com/sites/all/apidocs/classes/EntityQuery.html#method_expand
JY told you HOW. But BEWARE of performance consequences ... both on the data tier and over the wire. You can die a miserable death by grabbing too widely and deeply at once.
I saw the take(5) in his sample. That is crucial for restraining a runaway request (something you really must do also on the server). In general, I would reserve extended graph fetches of this kind for queries that pulled a single root entity. If I'm presenting a list for selection and I need data from different parts of the entity graph, I'd use a projection to get exactly what I need to display (assuming, of course, that there is no SQL View readily available for this purpose).
If any of the related items are reference lists (color, status, states, ...), consider bringing them into cache separately in a preparation step. Don't include them in the expand; Breeze will connect them on the client to your queried entities automatically.
Finally, as a matter of syntax, you don't have to repeat the name of a segment. When you write "ContactInfo.Phones", you get both ContactInfos and Phones so you don't need to specify "ContactInfo" by itself.
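The two syntax points above (one expand clause, no repeated segments) can be combined in a small helper. This is only a sketch: minimalExpand is a hypothetical name, not part of Breeze, and it assumes the expand clause is the comma-separated string shown in the answer.

```javascript
// Drop any path that is a prefix segment of a longer path, then join
// the rest into a single comma-separated expand clause.
function minimalExpand(paths) {
  const unique = [...new Set(paths)];
  const kept = unique.filter(
    (p) => !unique.some((other) => other !== p && other.startsWith(p + "."))
  );
  return kept.join(", ");
}

const clause = minimalExpand([
  "ContactInfo",
  "ContactInfo.Phones",
  "ContactInfo.Addresses",
  "PhysicianNotes",
  "PhysicianSpecialties",
]);
console.log(clause);
```

The resulting string would then be passed to a single .expand(clause) call on the query.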

MongoDB ObjectId foreign key implementation recommendation

I'm looking for a recommendation on how best to implement MongoDB foreign key ObjectId fields. There seem to be two possible options, either containing the nested _id field or without.
Take a look at the fkUid field below.
{'_id':ObjectId('4ee12488f047051590000000'), 'fkUid':{'_id':ObjectId('4ee12488f047051590000001')} }
OR
{'_id':ObjectId('4ee12488f047051590000000'), 'fkUid':ObjectId('4ee12488f047051590000001') }
Any recommendations would be much appreciated.
I'm having a hard time coming up with any possible advantages for putting an extra field "layer" in there, so I would personally just store the ObjectId directly in fkUid.
I suggest using the default DBRef implementation, which is described here http://www.mongodb.org/display/DOCS/Database+References and is compatible with most language-specific drivers.
If your question is about the naming of the field (what you have in the title), usually the convention is to name it after the object to which it refers.
Both of the ways you mentioned mean the same thing, but they suit different usages.
Storing fkUid as an object, like 'fkUid':{'_id':ObjectId('4ee12488f047051590000001')}, has its own advantages. For example, suppose there is a website where users can post images and view images posted by other users, and the site shows the poster's name/username next to each image. With this approach you can also store those details in the reference, like 'fkUid':{'_id':ObjectId('4ee12488f047051590000001'), username: 'SOME_X'}. When you read the document from the db, you don't have to send another request to get the username for that _id.
Whereas with the second way, 'fkUid':ObjectId('4ee12488f047051590000001'), you have to send another request to the server just to get the name/username, even though nothing else from that object is needed.
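The difference between the two shapes can be sketched in memory as follows. Plain strings stand in for ObjectId values, and the users lookup table, the lookups counter, and usernameFor are hypothetical names used only to count how many extra reads each shape needs.

```javascript
// Hypothetical users collection, keyed by id (strings stand in for ObjectIds).
const users = { u1: { _id: "u1", username: "SOME_X" } };

// The two document shapes from the question.
const imageWithPlainRef = { _id: "img1", fkUid: "u1" };
const imageWithEmbeddedRef = { _id: "img2", fkUid: { _id: "u1", username: "SOME_X" } };

// Counts simulated round trips to the users collection.
let lookups = 0;
function findUser(id) {
  lookups += 1;
  return users[id];
}

// Resolve the username, using embedded data when it is present.
function usernameFor(image) {
  const ref = image.fkUid;
  const isObject = typeof ref === "object" && ref !== null;
  if (isObject && "username" in ref) return ref.username;
  return findUser(isObject ? ref._id : ref).username;
}

const fromEmbedded = usernameFor(imageWithEmbeddedRef);
const lookupsAfterEmbedded = lookups;
const fromPlain = usernameFor(imageWithPlainRef);
console.log(fromEmbedded, lookupsAfterEmbedded, fromPlain, lookups);
```

The nested form only pays off when extra fields such as username are stored next to the _id; a nested object holding nothing but _id needs the same second read as the plain ObjectId.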