DB relationship: implementing a conversation - mongodb

I want to implement a simple conversation feature, where each conversation has a set of messages between two users. My question is: if I have a reference from a message to its conversation, should I also have a reference the other way?
Right now, each message has a conversationId. To retrieve all the messages that belong to a certain conversation, I would do Message.find({conversationId: ..}). If I had stored an array of messages in a conversation object, I could do conversation.messages.
Which way is the convention?

It all depends on usage patterns. First, you normalize: 1 conversation has many messages, 1 message belongs to 1 conversation. That means you've got a 1-to-many (1:M) relationship between conversations and messages.
In a 1:M relationship, the SQL convention is to put the "1" side's key as a foreign key on each of the "M". So each message would have the conversationId.
In Mongo, you have the option of doing the opposite via arrays. Like you said, you could store an array of messageIds in the conversation. This gets pretty messy because for every new message, you have to edit the conversation doc. You're essentially doubling your writes to the DB & keeping the 2 writes in sync is completely on you (e.g. what if the user deletes a message & it's not deleted from the conversation?).
In Mongo, you also have to consider the difference between 1:M and 1:F (1-to-few). Many times it's advantageous to nest 1:F relationships, i.e. make the "F" a subdoc of the "1". There is a limit: each doc cannot exceed 16MB (this may be lifted in future versions). The advantage of nesting subdocs is that updates are atomic because it's all the same doc, not to mention subscriptions in a pub/sub are easier. This may work here, but if you've got a group chat with 20 friends that's been going on for the last 4 years, you might have to get clever (cap it, start a new conversation, etc.).
Nesting would be my recommendation, although your original idea of assigning a conversationId to each message works too (make sure to index it!).
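For illustration, here is a minimal sketch of both options using the Node.js MongoDB driver; the collection names, field names, and IDs are assumptions, not something from the question:
const { MongoClient, ObjectId } = require('mongodb');

async function demo() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const db = client.db('chat');
  const conversationId = new ObjectId(); // hypothetical conversation
  const senderId = new ObjectId();       // hypothetical user

  // Option 1: each message references its conversation (index the field!)
  await db.collection('messages').createIndex({ conversationId: 1 });
  await db.collection('messages').insertOne(
    { conversationId: conversationId, from: senderId, text: 'hello', sentAt: new Date() }
  );
  const messages = await db.collection('messages')
    .find({ conversationId: conversationId })
    .sort({ sentAt: 1 })
    .toArray();

  // Option 2: nest messages as subdocs of the conversation (watch the 16MB doc limit)
  await db.collection('conversations').updateOne(
    { _id: conversationId },
    { $push: { messages: { from: senderId, text: 'hello', sentAt: new Date() } } },
    { upsert: true }
  );

  await client.close();
  return messages;
}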

Related

Should I use one projection per entity or category?

I am coding a new application using a CQRS+ES architecture with EventStoreDB. In my app, I have the following streams:
user-1
user-2
user-3
...
Each stream contains all events regarding a given user.
I am now creating a projection called user-account, which consists of basic data about my user's account (like first name, email, and others).
What is the optimal way to design that projection?
Should I have a single projection for each user, creating projections called:
user-account-1
user-account-2
user-account-3
...
Or a single projection for all user accounts, as a key-value record (that may store millions of keys in the future)?
You can go with one stream per user. Projections are like dimensions. A user can exist in different "dimensions" (CDC naming) and have a different shape in each.
Read https://www.eventstore.com/blog/the-cost-of-creating-a-stream
First, subscribing to individual streams (aggregate or entity streams) won't ever work. You will end up with thousands of subscriptions, which are sitting there doing nothing (how often do the user details change?).
The category stream is one way to go: you project all the events for all the users. Not only do you need just one subscription for all your users, but you also get more interesting possibilities, like "users pending activation" or "blocked users" projections.
I prefer subscribing to $all and applying server-side filtering if necessary. It might have a bit of overhead, as you receive more events than you need, but you get so much more power by combining events from different aggregates.
I wrote a little about it in Eventuous documentation.
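As a rough, framework-free illustration (this is not the Eventuous or EventStoreDB client API; the event type names and the in-memory read model are assumptions), a single catch-up subscription can feed one projection that keeps a record per user:
// Hypothetical projection handler fed by one subscription covering all user events.
// `readModel` stands in for whatever store you project into (a Map here for brevity).
const readModel = new Map();

function projectUserAccount(event) {
  switch (event.type) {
    case 'UserRegistered':
      readModel.set(event.streamId, {
        userId: event.streamId,
        firstName: event.data.firstName,
        email: event.data.email,
        status: 'pending-activation'
      });
      break;
    case 'UserActivated': {
      const account = readModel.get(event.streamId);
      if (account) account.status = 'active';
      break;
    }
    case 'UserEmailChanged': {
      const account = readModel.get(event.streamId);
      if (account) account.email = event.data.email;
      break;
    }
    // events from other aggregates are simply ignored by this projection
  }
}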

Message library

The scenario is: a user sends messages to a group of people.
I was thinking of creating one row for each conversation in one class, where that row contains information such as "sender name" and "receiver", and in addition a column (PFRelation) which connects that row to another class where all the messages between the user and the receiver would be saved.
So this action will happen every time the user starts a new conversation.
The benefit of this approach:
Privacy, because the only conversations being saved are those between the user and the receiver group.
The downside of this approach:
We all know that Parse only provides 30 reqs/s for free, which means 1 min = 1,800 reqs. So every time I create a new class to keep track of a conversation, am I using a lot of requests?
I am looking for suggestions and thoughts on the ideal way to do this before I implement the messenger library.
It sounds like you have come up with something that is similar to what I have used before to implement messaging in an app with Parse as a backend. It's also important to think about how your UI will be querying for data. In general, it's most important to ensure that it is very easy and fast to read data. For most social apps, the following quote from Facebook's engineering team on Haystack is particularly relevant.
Haystack is an object store that we designed for sharing photos on
Facebook where data is written once, read often, never modified, and
rarely deleted.
The crucial piece of information here is written once, read often, never modified, and rarely deleted. No matter what approach you decide to take, keep that in mind while engineering your solution. The approach that I have used before to implement a messaging system using Parse is described below.
Overview
Each row (object) of the Message class corresponds to an individual text, picture, or video message that was posted. Each Message belongs to a Group. A Group can be as small as 2 Users (a private conversation) or grow as large as you like.
The RecentMessage class is the solution I came up with to deal with quickly and easily populating the UI. Each RecentMessage object corresponds to a Group that a given User belongs to. Each User in a Group will have their own RecentMessage object, which is kept up to date using beforeSave/afterSave Cloud Code triggers (a sketch follows the schema below). Whenever a new Message is created, in the afterSave trigger we want to update all of the RecentMessage objects that belong to the Group.
You will most likely have a table in your app which displays all of the conversations that the user is part of. This is easily achieved by querying for all of that user's RecentMessage objects which already contains all of the Group information needed to load the rest of the messages when selected and also contains the most recent message's data (hence the name) to display in the table. Alternatively, RecentMessage could contain a pointer to the most recent Message, however I decided that copying the data was a beneficial tradeoff since it streamlines future queries.
Message
group (pointer to group which message is part of)
user (pointer to user who created it)
text (string)
picture (optional file)
video (optional file)
RecentMessage
group (group pointer)
user (user pointer)
lastMessage (string containing the text of most recent Message)
lastUser (pointer to the User who posted the most recent Message)
Group
members (array of user pointers)
name or whatever other info you want
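Here is a rough sketch of the afterSave trigger described above, written against current Parse Server Cloud Code syntax (the original answer predates it); the field names follow the schema above and error handling is omitted:
// Cloud Code: keep every member's RecentMessage in sync when a Message is saved.
Parse.Cloud.afterSave('Message', async (request) => {
  const message = request.object;
  if (message.existed()) return; // only act on newly created messages

  const query = new Parse.Query('RecentMessage');
  query.equalTo('group', message.get('group'));

  const recents = await query.find({ useMasterKey: true });
  recents.forEach((recent) => {
    recent.set('lastMessage', message.get('text'));
    recent.set('lastUser', message.get('user'));
  });
  await Parse.Object.saveAll(recents, { useMasterKey: true });
});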
Security/Privacy
Security and privacy are imperative when creating messaging functionality in your app. Make sure to read through the Parse Engineering security blog posts, and take your time to let it all soak in: Part I, Part II, Part III, Part IV, Part V.
Most important in our case is Part III, which describes ACLs, or Access Control Lists. Group objects will have an ACL which grants access to all of their member Users. RecentMessage objects will have a restricted read/write ACL for their owner User. Message objects will inherit the ACL of the Group to which they belong, allowing all of the Group members to read. I recommend disabling the write ACL in a save trigger so messages cannot be modified.
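For example, a beforeSave trigger along these lines (a sketch, not part of the original answer) could derive the Message ACL from the Group's members and keep messages read-only:
// Cloud Code: give every Group member read access to a new Message, and nobody write access.
Parse.Cloud.beforeSave('Message', async (request) => {
  const message = request.object;
  if (message.existed()) return; // existing messages are never modified

  const group = await message.get('group').fetch({ useMasterKey: true });
  const acl = new Parse.ACL();
  (group.get('members') || []).forEach((member) => {
    acl.setReadAccess(member.id, true);   // members can read
    acl.setWriteAccess(member.id, false); // but not edit or delete
  });
  message.setACL(acl);
});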
General Remarks
With regards to Parse and the request limit, you need to accept the fact that you will very quickly surpass the 30 req/s free tier. As a general rule of thumb, it's much better to focus on building the best possible user experience than to focus too much on scalability. By and large, issues of scalability rarely come into play because most apps fail. Not saying that to be discouraging, just something to keep in mind to prevent you from falling into the trap of over-engineering at the cost of time :)

How to optimize collection subscription in Meteor?

I'm working on a filtered live search module with Meteor.js.
Usecase & problem:
A user wants to do a search through all the users to find friends. But I cannot afford to send each user the complete users collection. The user filters the search using checkboxes. I'd like to subscribe to just the matched users. What is the best way to do it?
I guess it would be better to create the query client-side, then send it to the method to get back the desired set of users. But I wonder: when the filtering criteria change, does the new subscription erase all of the old one? Because if I do a first search which returns [usr1, usr3, usr5], and after that a search that returns [usr2, usr4], the best would be to keep the first set and simply add the new one to the client-side subscribed collection.
And, in addition, if I then do a third search which should return [usr1, usr3, usr2, usr4], the autorun subscription would not send me anything, as I already have the whole result set in my collection.
The goal is to spare processing and data transfer from the server.
I have some ideas, but I haven't coded enough of it yet to share it in an easily comprehensible way.
How would you advise me to proceed to be as efficient as possible in terms of time and performance?
Thank you all.
David
It depends on your application, but you'll probably send a non-empty string to a publisher which uses that string to search the users collection for matching names. For example:
Meteor.publish('usersByName', function(search) {
  check(search, String);
  // make sure the user is logged in and that search is sufficiently long
  if (!(this.userId && search.length > 2))
    return [];
  // search by case insensitive regular expression
  var selector = {username: new RegExp(search, 'i')};
  // only publish the necessary fields
  var options = {fields: {username: 1}};
  return Meteor.users.find(selector, options);
});
Also see common mistakes for why we limit the fields.
Performance
Meteor is clever enough to keep track of the current document set that each client has for each publisher. When the publisher reruns, it knows to only send the difference between the sets. So the situation you described above is already taken care of for you.
If you were subscribed for users: 1,2,3
Then you restarted the subscription for users 2,3,4
The server would send a removed message for 1 and an added message for 4.
Note this will not happen if you stopped the subscription prior to rerunning it.
To my knowledge, there isn't a way to avoid removed messages when modifying the parameters for a single subscription. I can think of two possible (but tricky) alternatives:
Accumulate the intersection of all prior search queries and use that when subscribing. For example, if a user searched for {height: 5} and then searched for {eyes: 'blue'} you could subscribe with {height: 5, eyes: 'blue'}. This may be hard to implement on the client, but it should accomplish what you want with the minimum network traffic.
Accumulate active subscriptions. Rather than modifying the existing subscription each time the user modifies the search, start a new subscription for the new set of documents, and push the subscription handle to an array. When the template is destroyed, you'll need to iterate through all of the handles and call stop() on them. This should work, but it will consume more resources (both network and server memory + CPU).
Before attempting either of these solutions, I'd recommend benchmarking the worst case scenario without using them. My main concern is that without fairly tight controls, you could end up publishing the entire users collection after successive searches.
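Here is a rough sketch of the second alternative (accumulating subscription handles); the template name, the form selector, and the reuse of the usersByName publication are assumptions:
// client: keep one handle per search and stop them all when the template goes away
Template.userSearch.onCreated(function () {
  this.searchHandles = [];
});

Template.userSearch.events({
  'submit .search-form': function (event, template) {
    event.preventDefault();
    var search = event.target.search.value;
    // each new search adds its matching documents; earlier handles stay active
    template.searchHandles.push(Meteor.subscribe('usersByName', search));
  }
});

Template.userSearch.onDestroyed(function () {
  this.searchHandles.forEach(function (handle) { handle.stop(); });
});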
If you want to go easy on your server, you'll want to send as little data to the client as possible. That means every document you send to the client that is NOT a friend is waste. So let's eliminate all that waste.
Collect your filters (e.g. filters = {sex: 'Male', state: 'Oregon'}). Then call a method to search based on your filter (e.g. Users.find(filters)). Additionally, you can run your own proprietary ranking algorithm to determine the % chance that a person is a friend. Maybe base it off of distance from IP address (or from phone GPS history), mutual friends, etc. This will pay dividends in efficiency in a bit. Index things like GPS coords or other highly unique attributes, and maybe try out composite indexes. But remember, more indexes means slower writes.
Now you've got a cursor with all possible friends, ranked from most likely to least likely.
Next, change your subscription to match those friends, but put a limit:20 on there. Also, only send over the fields you need. That way, if a user wants to skip this step, you only wasted sending 20 partial docs over the wire. Then, have an infinite scroll or 'load more' button the user can click. When they load more, it's an additive subscription, so it's not resending duplicate info. Discover Meteor describes this pattern in great detail, so I won't.
After a few clicks/scrolls, the user won't find any more friends (because you were smart & sorted them) so they will stop trying & move on to the next step. If you returned 200 possible friends & they stop trying after 60, you just saved 140 docs from going through the pipeline. There's your efficiency.
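A minimal sketch of that limit-based, additive pattern (the publication name, filter shape, published fields, and the omitted ranking step are all assumptions):
// server: publish only the requested page of candidates, with minimal fields
Meteor.publish('possibleFriends', function (filters, limit) {
  check(filters, Object);
  check(limit, Number);
  if (!this.userId) return [];
  return Meteor.users.find(filters, {
    fields: {username: 1, 'profile.state': 1},
    limit: Math.min(limit, 200) // hard cap so a client can't request everything
  });
});

// client: bump the limit by 20 on every 'load more'; Meteor only sends the new docs
Session.setDefault('friendLimit', 20);
Tracker.autorun(function () {
  Meteor.subscribe('possibleFriends', Session.get('friendFilters') || {}, Session.get('friendLimit'));
});
// in the 'load more' event handler:
//   Session.set('friendLimit', Session.get('friendLimit') + 20);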

Creation Concurrency with CQRS and EventStore

Baseline info:
I'm using an external OAuth provider for login. If the user authenticates with the external OAuth provider, they are OK to enter my system. However, this user may not yet exist in my system. It's not really a technology issue, but I'm using JOliver EventStore for what it's worth.
Logic:
I'm not given a guid for new users. I just have an email address.
I check my read model before sending a command: if the user email exists, I issue a Login command with the ID; if not, I issue a CreateUser command with a generated ID. My issue is in the case of a new user.
A save occurs in the event store with the new ID.
Issue:
Assume two create commands are somehow issued before the read model is updated, due to a browser refresh or some other anomaly that occurs before consistency with the read model is achieved. That's OK; that's not my problem.
What Happens:
Because the new ID is a Guid comb, there's no chance the event store will know that these two CreateUser commands represent the same user. By the time they get to the read model, the read model will know (because they have the same email) and can merge the two records or take some other compensating action. But now my read model is out of sync with the event store which still thinks these are two separate entities.
Perhaps it doesn't matter because:
Replaying the events will have the same effect on the read model, so that should be OK.
Because both commands are duplicate "Create" commands, they should contain identical information, so it's not like I'm losing anything in the event store.
Can anybody illuminate how they handled similar issues? If some compensating action needs to occur does the read model service issue some kind of compensation command when it realizes it's got a duplicate entry? Is there a simpler methodology I'm not considering?
You're very close to what I'd consider a proper possible solution. The scenario, if I may summarize, is somewhat like this:
Perform the OAuth-entication.
Using the read model decide between a recurring visitor and a new visitor, based on the email address.
In case of a new visitor, send a RegisterNewVisitor command message that gets handled and stored in the eventstore.
Assume there is some concurrency going on that, for the same email address, causes two RegisterNewVisitor messages, each containing what the system thinks is the key associated with the email address. These keys (guids) are different.
Detect this duplicate key issue in the read model and merge both read model records into one record.
Now, instead of merging the records in the read model, why not send a ResolveDuplicateVisitorEmailAddress { Key1, Key2 } command to your domain model, leaving it up to the domain model (the codified form of the business decision to be taken) to resolve this issue? You could even have a dedicated read model to deal with these kinds of issues; the other read model will just get a kind of DuplicateVisitorEmailAddressResolved event and project it into the proper records.
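A very rough, framework-free sketch of that flow (the stream naming, the append helper, and the read-model shape are assumptions, not JOliver EventStore APIs):
// Domain side: the codified business decision, here keep the first key and retire the second.
function handleResolveDuplicateVisitorEmailAddress(command, eventStore) {
  var event = {
    type: 'DuplicateVisitorEmailAddressResolved',
    data: { survivingKey: command.key1, retiredKey: command.key2 }
  };
  eventStore.append('visitor-' + command.key1, event); // assumed append helper
  return event;
}

// Read-model side: project the resolution event into the proper records.
function projectDuplicateResolution(event, readModel) {
  if (event.type !== 'DuplicateVisitorEmailAddressResolved') return;
  var retired = readModel.get(event.data.retiredKey);
  var surviving = readModel.get(event.data.survivingKey);
  if (retired && surviving) {
    // fold anything worth keeping into the surviving record, then drop the duplicate
    readModel.set(event.data.survivingKey, Object.assign({}, retired, surviving));
    readModel.delete(event.data.retiredKey);
  }
}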
Word of warning: You've asked a technical question and I gave you a technical, possible solution. In general, I would not apply this technique unless I had some business indicator that this is worth investing in (what's the frequency of a user logging in concurrently for the first time - maybe solving it this way is just a way of ignoring the root cause (flakey OAuth, no register new visitor process in place, etc)). There are other technical solutions to this problem but I wanted to give you the one closest to what you already have in place. They range from registering new visitors sequentially to keeping an in-memory projection of the visitors not yet in the read model.

How to get a list of aggregates using JOliver's CommonDomain and EventStore?

The repository in the CommonDomain only exposes "GetById()". So what do I do if my Handler needs a list of Customers, for example?
Taking your question at face value: if you needed to perform operations on multiple aggregates, you would just provide the IDs of each aggregate in your command (which the client would obtain from the query side), then get each aggregate from the repository.
However, looking at one of your comments in response to another answer, I see that what you are actually referring to is set-based validation.
This very question has raised quite a lot of debate about how to do this, and Greg Young has written a blog post on it.
The classic question is: how do I check that the username hasn't already been used when processing my CreateUserCommand? I believe the suggested approach is to assume that the client has already done this check by asking the query side before issuing the command. When the user aggregate is created, the UserCreatedEvent will be raised and handled by the query side. Here, the insert query will fail (either because of a check or a unique constraint in the DB), and a compensating command would be issued, which would delete the newly created aggregate and perhaps email the user telling them the username is already taken.
The main point is, you assume that the client has done the check. I know this approach is difficult to grasp at first, but it's the nature of eventual consistency.
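As a sketch of that query-side flow (the collection layout, the compensating command name, and the commandBus are assumptions, not a prescribed API):
// Query-side handler for UserCreatedEvent: a unique key on username enforces the invariant,
// and a violation triggers a compensating command instead of touching the write model directly.
async function onUserCreated(event, db, commandBus) {
  try {
    await db.collection('usernames').insertOne({
      _id: event.username,   // _id is unique, so a second insert with the same username throws
      userId: event.userId
    });
  } catch (err) {
    if (err.code === 11000) { // MongoDB duplicate-key error
      await commandBus.send({
        type: 'DeleteUser',
        userId: event.userId,
        reason: 'username already taken'
      });
      return;
    }
    throw err;
  }
}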
Also you might want to read this other question which is similar, and contains some wise words from Udi Dahan.
In the classic event sourcing model, queries like get all customers would be carried out by a separate query handler which listens to all events in the domain and builds a query model to satisfy the relevant questions.
If you need to query customers by last name, for instance, you could listen to all customer created and customer name change events and just update one table of last-name to customer-id pairs. You could hold other information relevant to the UI that is showing the data, or you could simply hold IDs and go to the repository for the relevant customers in order to work further with them.
You don't need a list of customers in your handler. Each aggregate MUST be processed in its own transaction. If you want to show this list to the user, just build an appropriate view.
Your command needs to contain the id of the aggregate root it should operate on.
This id will be looked up by the client sending the command, using a view in your read model. This view will be populated with data from the events that your AR emits.