Best way to store/get values referenced from a list in Mongo/RectiveMongo? - mongodb

I have a quite common use case - a list of comments. Each comment has an author.
I'm storing the reference from a comment to the author using a reference, since an author can make multiple comments.
Now I'm working with ReactiveMongo and want to try to keep the database access asynchronous, but in this case, I don't know how. I do an asynchronous access to the database, to get the comments, but then for each comment I have to get the author, and until now the only way I know is to loop through the comments and get the user synchronously:
val userOption:Option[JsObject] = Await.result(usersCollection.find(Json.obj("id" -> userId).one[JsObject], timeout)
//...
Other than that, I could:
Get each user asynchronously but then I have to introduce some functionality to wait until all user were fetched, in order to return the response, and my code is likely to become a mess.
Store the complete user object - at least what I need for the comment (picture, name and such) in each comment. This redundancy could become troublesome to manage, since each time a user changes something (relevant to the data stored in the comments) I would have to go through all the comments in the database and modify it.
What is the correct pattern to apply here?

I tackled this exact problem a while ago.
There are no joins in mongo.
You have to manually take care of the join.
Your options are:
Loop through each comment entry and query mongo for the user. this is what you're doing.
Get all user id's from comments, query mongo for the users matching these ids, then take care to match user to comment.This is just what you did but a little more optimized.
Embed the user in comments or comments in users. Wouldn't recommend this, this is probably not the right place for comments/users.
Think of what set of data do you need from user when displaying a comment, and embed just this info in comment
I ended up going with the last option.
We embedded the user id, first and last name in each comment.
This info is unlikely to change (possibly not even allowed to change after creation?).
If it can change then it is not too hard to tailor the update-user method to update the related comments with the new info (we did that too).
So now no join is needed.

Related

How to properly access children by filtering parents in a single REST API call

I'm rewriting an API to be more RESTful, but I'm struggling with a design issue. I'll explain the situation first and then my question.
SITUATION:
I have two sets resources users and items. Each user has a list of item, so the resource path would like something like this:
api/v1/users/{userId}/items
Also each user has an isPrimary property, but only one user can be primary at a time. This means that if I want to get the primary user you'd do something like this:
api/v1/users?isPrimary=true
This should return a single "primary" user.
I have client of my API that wants to get the items of the primary user, but can't make two API calls (one to get the primary user and the second to get the items of the user, using the userId). Instead the client would like to make a single API call.
QUESTION:
How should I got about designing an API that fetches the items of a single user in only one API call when all the client has is the isPrimary query parameter for the user?
MY THOUGHTS:
I think I have a some options:
Option 1) api/v1/users?isPrimary=true will return the list of items along with the user data.
I don't like this one, because I have other API clients that call api/v1/users or api/v1/users?isPrimary=true to only get and parse through user data NOT item data. A user can have thousands of items, so returning those items every time would be taxing on both the client and the service.
Option 2) api/v1/users/items?isPrimary=true
I also don't like this because it's ugly and not really RESTful since there is not {userId} in the path and isPrimary isn't a property of items.
Option 3) api/v1/users?isPrimary=true&isShowingItems=true
This is like the first one, but I use another query parameter to flag whether or not to show the items belonging to the user in the response. The problem is that the query parameter is misleading because there is no isShowingItems property associated with a user.
Any help that you all could provide will be greatly appreciated. Thanks in advance.
There's no real standard solution for this, and all of your solutions are in my mind valid. So my answer will be a bit subjective.
Have you looked at HAL for your API format? HAL has a standard way to embed data from one resources into another (using _embedded) and it sounds like a pretty valid use-case for this.
The server can decide whether to embed the items based on a number of criteria, but one cheap solution might be to just add a query parameter like ?embed=items
Even if you don't use HAL, conceptually you could still copy this behavior similarly. Or maybe you only use _embedded. At least it's re-using an existing idea over building something new.
Aside from that practical solution, there is nothing in un-RESTful about exposing data at multiple endpoints. So if you created a resource like:
/v1/primary-user-with-items
Then this might be ugly and inconsistent with the rest of your API, but not inherently
'not RESTful' (sorry for the double negative).
You could include a List<User.Fieldset> parameter called fieldsets, and then include things if they are specified in fieldsets. This has the benefit that you can reuse the pattern by adding fieldsets onto any object in your API that has fields you might wish to include.
api/v1/users?isPrimary=true&fieldsets=items

Sails.js one to many embedded associations with mongo

On the sails documentation here it shows modeling one to many associations with what looks like high level referencing.
Lets say I want to use mongo to make a post that has a lot of comments on it. I will take the post as the document and in it I will embed all the comments in one attribute.
If I did it like the documentation, would the mongo adapter automatically, create a document with the comments embedded? or would it do something relational and reference the comments?
If it doesn't embed, how would I go about putting the embedded comments in my model?
Thanks
Mongo doesn't provide associations on its own. Sails uses Waterline for ORM.
You need to create your Comment object yourself and just add its id to the appropriate attribute in the Post instance(which should be a collection), using post.comments.add(comment.id).
Removal is similar, just call post.comments.remove(comment.id)
Note that at some point you might not like to have thousands of commentids being fetched every time you retrieve a Post (or worse, thousands of Comment documents if you populate and fetch). This, of course, is only a concern if you're expecting thousands of comments per post in the first place.
Oh, and don't forget to save your document to finalize the changes.

Is it possible to group multiple collections in mongodb

so I'm working with a database that has multiple collections and some of the data overlaps in the collection . In particular I have a collection called app-launches which contains a field called userId and one called users where the _id of a particular object is actually the same as the userId in app-launches. Is it possible to group the two collections together so I can analyze the data? Or maybe match the the userId in app-launches with the _id in users?
There is no definit answer for your question Jeffrey and none of the experts here can tell you to choose which technique over other just by having this information.
After going through various web pages over internet and mongo documentation and understanding the design patterns used in Mongo over a period of time, How I would design it depends on few things which I can try explaining it here in short.
if you have a One-To-One relation then always prefer to choose Embedding over Linking. e.g. User and its address (assuming user has only one address) thus you can utilize the atomicity (without worrying about transactions) as well easily fetch the records without too and fro to bring other information as in the case of Linking (like in DBRef)
If you have One-To-Many relation then you need to consider whether you can do the stuff by using Embedding (prefer this as explained the benefits in point 1). However, embedding would help you if you always want the information altogether e.g. Post/Comments where your requirement is to get the post and all of its comments by postId let say. But think of a situation where you need to get all the comments (and it related posts) which contains some specific tags in comments. in this case you should prefer Linking Because if you go via Embedding route then you would end up getting all the collection of comments for a post and you have to filter the desired comments.
for a Many-To-Many relations I would prefer two separate entities as well another collection for linking them e.g. Product-Category.
-$

"Why" does Backbone NOT have a save (PUT/POST) method for its collections - is it unRESTful?

I asked a question a while back i.e. "How save an entire backbone collection?". However what intrigues me is that why is a save method not offered? Is it unRESTful to save (PUT/POST) entire collections or is it uncommon to do so in the REST-land?
GET: /MySite/Collections - allowed by collection.fetch()
POST: /MySite/Collections - for the model(s) in the collection to be Posted when calling model.save()
PUT: /MySite/Collections/{id} - for the model(s) to be updated individually
GET: /MySite/Collections/{id} - to fetch an individual model throuth model.fetch()
So why not allow for POST/PUT an entire collection of resources? It is convenient sometimes and although one can wrap/hack out some code using collection.toJSON why not include it? I'm just curious about its absence and the rationale for the same. Frameworks not having the capability of a few things usually implies bad programming/design and are thus left out. Is saving an entire collection 'bad practice'?
The wikipedia article about REST does mention CRUD verbs for collection.
But, in my opinion, a Collection is not a resource, it is not an entity, and it has not state. It is, instead, a bunch of resources. And if there would be an UPDATE command for a Collection it would be nothing else but a multiple UPDATE commands over multiple Models. Having the possibility of multiple UPDATE commands in only one request would be helpful but I think this is not a job for the REST implementation.
Also there will be problems of ambiguity, for example in a Collection that contains already saved Models with id and so on, and others that not, what will a POST command mean?... or an UPDATE command?...
No talking about the increase of the complexity in the server side where, if this Collection REST support should be taken like standard, we should to work the double to accomplish the casuistic.
Summarizing: I don't see any case where the need of a Collection REST command can't be solved with the actual, simpler, only-Model REST commands, so keeping the things as simple as possible I think is a good habit.

How to get list of aggregates using JOliviers's CommonDomain and EventStore?

The repository in the CommonDomain only exposes the "GetById()". So what to do if my Handler needs a list of Customers for example?
On face value of your question, if you needed to perform operations on multiple aggregates, you would just provide the ID's of each aggregate in your command (which the client would obtain from the query side), then you get each aggregate from the repository.
However, looking at one of your comments in response to another answer I see what you are actually referring to is set based validation.
This very question has raised quite a lot debate about how to do this, and Greg Young has written an blog post on it.
The classic question is 'how do I check that the username hasn't already been used when processing my 'CreateUserCommand'. I believe the suggested approach is to assume that the client has already done this check by asking the query side before issuing the command. When the user aggregate is created the UserCreatedEvent will be raised and handled by the query side. Here, the insert query will fail (either because of a check or unique constraint in the DB), and a compensating command would be issued, which would delete the newly created aggregate and perhaps email the user telling them the username is already taken.
The main point is, you assume that the client has done the check. I know this is approach is difficult to grasp at first - but it's the nature of eventual consistency.
Also you might want to read this other question which is similar, and contains some wise words from Udi Dahan.
In the classic event sourcing model, queries like get all customers would be carried out by a separate query handler which listens to all events in the domain and builds a query model to satisfy the relevant questions.
If you need to query customers by last name, for instance, you could listen to all customer created and customer name change events and just update one table of last-name to customer-id pairs. You could hold other information relevant to the UI that is showing the data, or you could simply hold IDs and go to the repository for the relevant customers in order to work further with them.
You don't need list of customers in your handler. Each aggregate MUST be processed in its own transaction. If you want to show this list to user - just build appropriate view.
Your command needs to contain the id of the aggregate root it should operate on.
This id will be looked up by the client sending the command using a view in your readmodel. This view will be populated with data from the events that your AR emits.