Retrieving a list of records in deepstream.io - chat

I'm currently implementing a simple chat in order to learn how to use deepstream.io. Is there an easy way to get an interval from, lets say, a list of records? Imagine the scenario that a user wants to get old chat messages by scrolling back in the history. I could not find anything about this in the documentation, and I have read through the source with no luck.
Is my best bet to work against a database (e.g. RethinkDb) directly or is there an easy way to do it through deepstream?

First: The bad news:
deepstream.io is purely a messaging server - it doesn't look into the data that passes through it. This means that any kind of querying functionality would need to be provided by another system, e.g. a client connected to RethinkDB.
Having said that: There's good news:
We're also looking into adding chat functionality (including extensive history keeping and searching) into our application.
Since chat messages are immutable (won't change once they are send) we will use deepstream events, rather than records. In order to facilitate chat history keeping, we’ll create a "chat history provider", a node process that sits between deepstream and our database and listens for any event that starts with 'chat-'. (Assuming your chat events are named
chat-<chat-name>/<message-id>, e.g. chat-idle-banter/254kbsdf-5mb2soodnv)
On a very high level our chat-history-provider will look like this:
ds.event.listen( /chat-*/, function( chatName, messageData ) {
//Add the timestamp on the server-side, otherwise people
//can change the order of messages by changing their system clock
messageData.timestamp = Date.now();
rethinkdbConnector.set( chatName, messageData );
});
ds.rpc.provide( 'get-chat-history', function( data, response ){
//Query your database here
});
Currently deepstream only supports "listening" for records, but the upcoming version will offer the same kind of functionality for events and rpcs.

Related

Microservices "JOIN" tables within different databases and data replication

I'm trying to achieve data join between entities.
I've got 2 separated microservices which can communicate with each other using events (rabbitmq). And all the requests are currently joined within an api gateway.
Suppose my first service is UserService , and second service is ProductService.
Usually to get a list of products we do an GET request like /products , the same goes when we want to create a product , we do an POST request like /products.
The product schema looks something like this:
{
title: 'ProductTitle`,
description: 'ProductDescriptio',
user: 'userId'
...
}
The user schema looks something like this:
{
username: 'UserUsername`,
email: 'UserEmail'
...
}
So , when creating a product or getting list of products we will not have some details about user like email, username...
What i'm trying to achieve is to get user details when creating or querying for a list of products along with user details like so:
[
{
title: 'ProductTitle`,
description: 'ProductDescriptio',
user: {
username: 'UserUsername`,
email: 'UserEmail'
}
}
]
I could make an REST GET request to UserService , to get the user details for each product.
But my concern is that if UserService goes down the product will not have user details.
What are other ways to JOIN tables ? other than making REST API calls ?
I've read about DATA REPLICATION , but here's another concern how do we keep a copy of user details in ProductService when we create a new product with and POST request ?
Usually i do not want to keep a copy of user details to ProductService if he did not created a product. I could also emit events to each other service.
Approach 1- Data Replication
Data replication is not harmful as long as it makes your service independent and resilient. But too much data replication is not good either. Microservices doesn't fit well every case so we have to compromise on things as well.
Approach 2- Event sourcing and Materialized views
Generally if you have data consist of multiple services you should be considering event sourcing and Materialized views. These views are pre-complied disposable data tables that can be updated using published events from different data services . Say your "user" service publish the event , then you would update your view if another related event is published you can add/update materialized views and so on. These views can be saved in cache for fast retrieval and can be queried to get the data. This pattern adds little complexity but it's highly scale-able.
Event sourcing is basically a store to save all your events and replay the events to reach the particular state of system. Generally we create Materialized views from event store.
Say e.g. you have event store where you keep on saving all your published events. At the same time you are also updating your Materialized views. If you want to query the data then you will be getting it from your Materialized views. Since Materialized views are disposable that can always be generated from event store. Say Materialized views which was in cache got corrupted , you can completely regenerate the view from Event store by replaying the events. Say if i miss the cache hit i can still get the data from event store by replaying the events. You can find more on the following links.
Event Sourcing , Materialized view
Actually we are working with data replication to make each microservice more resilient (giving them the chance to still work even if another service is down).
This can be achieved in many ways, e.g. in your case by making the ProductService listening to the events send by the UserSevice when a user is created, deleted, etc.
Or the UserService could have a feed the ProductService is reading every n minutes or so marking the position last read on the feed. Etc.
There are many thing to consider when designing services and it really depends on your systems mission. E.g. you always have to evaluate the impact of coupling - if it is fine or not for a service not to be able to work when another service is down. Like, how important is a service and how is the impact on other services when this on is not able to work.
If you do not want to keep a copy of data not needed you could just read the data of the users that are related to a product. If a new product is created with a user that is not in your dataset you would then get it from the UserService. This would give you a stronger coupling then replicating everything but a weaker then replicating no data at all.
Again it really depends on what your systems is designed for and what it needs to achieve.

How to merge/consolidate responses from multiple RESTful microservices?

Let's say there are two (or more) RESTful microservices serving JSON. Service (A) stores user information (name, login, password, etc) and service (B) stores messages to/from that user (e.g. sender_id, subject, body, rcpt_ids).
Service (A) on /profile/{user_id} may respond with:
{id: 1, name:'Bob'}
{id: 2, name:'Alice'}
{id: 3, name:'Sue'}
and so on
Service (B) responding at /user/{user_id}/messages returns a list of messages destined for that {user_id} like so:
{id: 1, subj:'Hey', body:'Lorem ipsum', sender_id: 2, rcpt_ids: [1,3]},
{id: 2, subj:'Test', body:'blah blah', sender_id: 3, rcpt_ids: [1]}
How does the client application consuming these services handle putting the message listing together such that names are shown instead of sender/rcpt ids?
Method 1: Pull the list of messages, then start pulling profile info for each id listed in sender_id and rcpt_ids? That may require 100's of requests and could take a while. Rather naive and inefficient and may not scale with complex apps???
Method 2: Pull the list of messages, extract all user ids and make bulk request for all relevant users separately... this assumes such service endpoint exists. There is still delay between getting message listing, extracting user ids, sending request for bulk user info, and then awaiting for bulk user info response.
Ideally I want to serve out a complete response set in one go (messages and user info). My research brings me to merging of responses at service layer... a.k.a. Method 3: API Gateway technique.
But how does one even implement this?
I can obtain list of messages, extract user ids, make a call behind the scenes and obtain users data, merge result sets, then serve this final result up... This works ok with 2 services behind the scenes... But what if the message listing depends on more services... What if I needed to query multiple services behind the scenes, further parse responses of these, query more services based on secondary (tertiary?) results, and then finally merge... where does this madness stop? How does this affect response times?
And I've now effectively created another "client" that combines all microservice responses into one mega-response... which is no different that Method 1 above... except at server level.
Is that how it's done in the "real world"? Any insights? Are there any open source projects that are built on such API Gateway architecture I could examine?
The solution which we used for such problem was denormalization of data and events for updating.
Basically, a microservice has a subset of data it requires from other microservices beforehand so that it doesn't have to call them at run time. This data is managed through events. Other microservices when updated, fire an event with id as a context which can be consumed by any microservice which have any interest in it. This way the data remain in sync (of course it requires some form of failure mechanism for events). This seems lots of work but helps us with any future decisions regarding consolidation of data from different microservices. Our microservice will always have all data available locally for it process any request without synchronous dependency on other services
In your case i.e. for showing names with a message, you can keep an extra property for names in Service(B). So whenever a name update in Service(A) it will fire an update event with id for the updated name. The Service(B) then gets consumes the event, fetches relevant data from Service(A) and updates its database. This way even if Service(A) is down Service(B) will function, albeit with some stale data which will eventually be consistent when Service(A) comes up and you will always have some name to be shown on UI.
https://enterprisecraftsmanship.com/2017/07/05/how-to-request-information-from-multiple-microservices/
You might want to perform response aggregation strategies on your API gateway. I've written an article on how to perform this on ASP.net Core and Ocelot, but there should be a counter-part for other API gateway technologies:
https://www.pogsdotnet.com/2018/09/api-gateway-response-aggregation-with.html
You need to write another service called Aggregator which will internally call both services and get the response and merge/filter them and return the desired result. This can be easily achieved in non-blocking using Mono/Flux in Spring Reactive.
An API Gateway often does API composition.
But this is typical engineering problem where you have microservices which is implementing databases per service pattern.
The API Composition and Command Query Responsibility Segregation (CQRS) pattern are useful ways to implement queries .
Ideally I want to serve out a complete response set in one go
(messages and user info).
The problem you've described is what Facebook realized years ago in which they decided to tackle that by creating an open source specification called GraphQL.
But how does one even implement this?
It is already implemented in various popular programming languages and maybe you can give it a try in the programming language of your choice.

Querying a list of Actors in Azure Service Fabric

I currently have a ReliableActor for every user in the system. This actor is appropriately named User, and for the sake of this question has a Location property. What would be the recommended approach for querying Users by Location?
My current thought is to create a ReliableService that contains a ReliableDictionary. The data in the dictionary would be a projection of the User data. If I did that, then I would need to:
Query the dictionary. After GA, this seems like the recommended approach.
Keep the dictionary in sync. Perhaps through Pub/Sub or IActorEvents.
Another alternative would be to have a persistent store outside Service Fabric, such as a database. This feels wrong, as it goes against some of the ideals of using the Service Fabric. If I did, I would assume something similar to the above but using a Stateless service?
Thank you very much.
I'm personally exploring the use of Actors as the main datastore (ie: source of truth) for my entities. As Actors are added, updated or deleted, I use MassTransit to publish events. I then have Reliable Statefull Services subscribed to these events. The services receive the events and update their internal IReliableDictionary's. The services can then be queried to find the entities required by the client. Each service only keeps the entity data that it requires to perform it's queries.
I'm also exploring the use of EventStore to publish the events as well. That way, if in the future I decide I need to query the entities in a new way, I could create a new service and replay all the events to it.
These Pub/Sub methods do mean the query services are only eventually consistent, but in a distributed system, this seems to be the norm.
While the standard recommendation is definitely as Vaclav's response, if querying is the exception then Actors could still be appropriate. For me whether they're suitable or not is defined by the normal way of accessing them, if it's by key (presumably for a user record it would be) then Actors work well.
It is possible to iterate over Actors, but it's quite a heavy task, so like I say is only appropriate if it's the exceptional case. The following code will build up a set of Actor references, you then iterate over this set to fetch the actors and then can use Linq or similar on the collection that you've built up.
ContinuationToken continuationToken = null;
var actorServiceProxy = ActorServiceProxy.Create("fabric:/MyActorApp/MyActorService", partitionKey);
var queriedActorCount = 0;
do
{
var queryResult = actorServiceProxy.GetActorsAsync(continuationToken, cancellationToken).GetAwaiter().GetResult();
queriedActorCount += queryResult.Items.Count();
continuationToken = queryResult.ContinuationToken;
} while (continuationToken != null);
TLDR: It's not always advisable to query over actors, but it can be achieved if required. Code above will get you started.
if you find yourself needing to query across a data set by some data property, like User.Location, then Reliable Collections are the right answer. Reliable Actors are not meant to be queried over this way.
In your case, a user could simply be a row in a Reliable Dictionary.

How to optimize collection subscription in Meteor?

I'm working on a filtered live search module with Meteor.js.
Usecase & problem:
A user wants to do a search through all the users to find friends. But I cannot afford for each user to ask the complete users collection. The user filter the search using checkboxes. I'd like to subscribe to the matched users. What is the best way to do it ?
I guess it would be better to create the query client-side, then send it the the method to get back the desired set of users. But, I wonder : when the filtering criteria changes, does the new subscription erase all of the old one ? Because, if I do a first search which return me [usr1, usr3, usr5], and after that a search that return me [usr2, usr4], the best would be to keep the first set and simply add the new one to it on the client-side suscribed collection.
And, in addition, if then I do a third research wich should return me [usr1, usr3, usr2, usr4], the autorunned subscription would not send me anything as I already have the whole result set in my collection.
The goal is to spare processing and data transfer from the server.
I have some ideas, but I haven't coded enough of it yet to share it in a easily comprehensive way.
How would you advice me to do to be the more relevant possible in term of time and performance saving ?
Thanks you all.
David
It depends on your application, but you'll probably send a non-empty string to a publisher which uses that string to search the users collection for matching names. For example:
Meteor.publish('usersByName', function(search) {
check(search, String);
// make sure the user is logged in and that search is sufficiently long
if (!(this.userId && search.length > 2))
return [];
// search by case insensitive regular expression
var selector = {username: new RegExp(search, 'i')};
// only publish the necessary fields
var options = {fields: {username: 1}};
return Meteor.users.find(selector, options);
});
Also see common mistakes for why we limit the fields.
performance
Meteor is clever enough to keep track of the current document set that each client has for each publisher. When the publisher reruns, it knows to only send the difference between the sets. So the situation you described above is already taken care of for you.
If you were subscribed for users: 1,2,3
Then you restarted the subscription for users 2,3,4
The server would send a removed message for 1 and an added message for 4.
Note this will not happen if you stopped the subscription prior to rerunning it.
To my knowledge, there isn't a way to avoid removed messages when modifying the parameters for a single subscription. I can think of two possible (but tricky) alternatives:
Accumulate the intersection of all prior search queries and use that when subscribing. For example, if a user searched for {height: 5} and then searched for {eyes: 'blue'} you could subscribe with {height: 5, eyes: 'blue'}. This may be hard to implement on the client, but it should accomplish what you want with the minimum network traffic.
Accumulate active subscriptions. Rather than modifying the existing subscription each time the user modifies the search, start a new subscription for the new set of documents, and push the subscription handle to an array. When the template is destroyed, you'll need to iterate through all of the handles and call stop() on them. This should work, but it will consume more resources (both network and server memory + CPU).
Before attempting either of these solutions, I'd recommend benchmarking the worst case scenario without using them. My main concern is that without fairly tight controls, you could end up publishing the entire users collection after successive searches.
If you want to go easy on your server, you'll want to send as little data to the client as possible. That means every document you send to the client that is NOT a friend is waste. So let's eliminate all that waste.
Collect your filters (eg filters = {sex: 'Male', state: 'Oregon'}). Then call a method to search based on your filter (eg Users.find(filters). Additionally, you can run your own proprietary ranking algorithm to determine the % chance that a person is a friend. Maybe base it off of distance from ip address (or from phone GPS history), mutual friends, etc. This will pay dividends in efficiency in a bit. Index things like GPS coords or other highly unique attributes, maybe try out composite indexes. But remember more indexes means slower writes.
Now you've got a cursor with all possible friends, ranked from most likely to least likely.
Next, change your subscription to match those friends, but put a limit:20 on there. Also, only send over the fields you need. That way, if a user wants to skip this step, you only wasted sending 20 partial docs over the wire. Then, have an infinite scroll or 'load more' button the user can click. When they load more, it's an additive subscription, so it's not resending duplicate info. Discover Meteor describes this pattern in great detail, so I won't.
After a few clicks/scrolls, the user won't find any more friends (because you were smart & sorted them) so they will stop trying & move on to the next step. If you returned 200 possible friends & they stop trying after 60, you just saved 140 docs from going through the pipeline. There's your efficiency.

SO style reputation system with CQRS & Event Sourcing

I am diving into my first forays with CQRS and Event Sourcing and I have a few points Id like some guidance on. I would like to implement a SO style reputation system. This seems a perfect fit for this architecture.
Keeping SO as the example. Say a question is upvoted this generates an UpvoteCommand which increases the questions total score and fires off a QuestionUpvotedEvent.
It seems like the author's User aggregate should subscribe to the QuestionUpvotedEvent which could increase the reputation score. But how/when you do this subscription is not clear to me? In Greg Youngs example the event/command handling is wired up in the global.asax but this doesn't seem to involve any routing based on aggregate Id.
It seems as though every User aggregate would subscribe to every QuestionUpvotedEvent which doesn't seem correct, to make such a scheme work the event handler would have to exhibit behavior to identify if that user owned the question that was just upvoted. Greg Young implied this should not be in event handler code, which should merely involve state change.
What am i getting wrong here?
Any guidance much appreciated.
EDIT
I guess what we are talking about here is inter-aggregate communication between the Question & User aggregates. One solution I can see is that the QuestionUpvotedEvent is subscribed to by a ReputationEventHandler which could then fetch the corresponding User AR and call a corresponding method on this object e.g. YourQuestionWasUpvoted. This would in turn generated a user specific UserQuestionUpvoted event thereby preserving replay ability in the future. Is this heading in the right direction?
EDIT 2
See also the discussion on google groups here.
My understanding is that aggregates themselves should not be be subscribing to events. The domain model only raises events. It's the query side or other infrastructure components (such as an emailing component) that subscribe to events.
Domain Services are designed to work with use-cases/commands that involve more than one aggregate.
What I would do in this situation:
VoteUpQuestionCommand gets invoked.
The handler for VoteUpQuestionCommand calls:
IQuestionVotingService.VoteUpQuestion(Guid questionId, Guid UserId);
This then fecthes both the question & user aggregates, calling the appropriate methods on both, such as user.IncrementReputation(int amount) and question.VoteUp(). This would raise two events; UsersReputationIncreasedEvent and QuestionUpVotedEvent respectively, which would be handled by the query side.
My rule of thumb: if you do inter-AR communication use a saga. It keeps things within the transactional boundary and makes your links explicit => easier to handle/maintain.
The user aggregate should have a QuestionAuthored event... in that event is subscribes to the QuestionUpvotedEvent... similarly it should have a QuestionDeletedEvent and/or QuestionClosedEvent in which it does the proper handling like unsibscribing from the QuestionUpvotedEvent etc.
EDIT - as per comment:
I would implement the Question is an external event source and handle it via a gateway. The gateway in turn is the one responsible for handling any replay correctly so the end result stays exactly the same - except for special events like rejection events...
This is the old question and tagged as answered but I think can add something to it.
After few months of reading, practice and create small framework and application base on CQRS+ES, I think CQRS try to decouple components dependencies and responsibilities. In some resources write for each command you Should change maximum one aggregate on command handler (you can load more than one aggregate on handler but only one of them can change).
So in your case I think the best practice is #Tom answer and you should use saga. If your framework doesn't support saga (Like my small framework) you can create some event handler like UpdateUserReputationByQuestionVotedEvent. In that, handler create UpdateUserReputation(Guid user id, int amount) OR UpdateUserReputation(Guid user id, Guid QuestionId, int amount) OR
UpdateUserReputation(Guid user id, string description, int amount). After command sends to handler, the handler load user by user id and update states and properties. In this type of handling you can create a more complex scenario or workflow.