Confused about React's Flux architecture - waitFor - CQRS

I have my own opinionated way of using React and am building my own framework, inspired by Om. I am implementing something a bit similar to the Flux architecture, with stores that can update themselves on certain events.
What I don't quite understand is why the Flux architecture needs store dependencies at all.
Aren't stores supposed to be self-contained data holders for a given bounded context, as in CQRS architectures?
In an evented system, two CQRS components could end up holding the same data. Do we express store dependencies to avoid holding duplicate data in stores?
Can someone come up with some very concrete use cases where store dependencies are needed and where the problem can hardly be solved in any other way? I can't find any myself.

In RefluxJS we solve waitFor in a couple of ways: one for sequential data flow and another for parallel data flow. I try to model the data stores so they avoid holding the same data (i.e. double-maintained data).
Basically, data stores are CQRS components, and I try to avoid having two data stores end up with the same kind of data. If I need to transform the data in a way that only some components need, I break that out into an "aggregate" data store. Naïve implementation:
var carsStore = Reflux.createStore({
    init: function() {
        this.listenTo(Actions.updateCars, this.updateCallback);
    },
    updateCallback: function() {
        $.ajax('/api/cars', {}).done(function(data) {
            this.trigger(data.cars);
        }.bind(this));
    }
});
We can create another data store that aggregates the data by listening to the carsStore:
var modelsStore = Reflux.createStore({
    init: function() {
        this.listenTo(carsStore, this.carsCallback);
    },
    carsCallback: function(cars) { // passed on from the carsStore trigger
        this.trigger(this.getModels(cars)); // pass on the models
    },
    getModels: function(cars) {
        return _.uniq(_.map(cars, function(car) { return car.model; }));
    }
});
That way your React view components can use one store to get the cars and the other to get the models, which are aggregated from the carsStore.
If a store needs to wait for two parallel data streams to complete, we provide Reflux.all to join actions and stores. This is useful e.g. if you're waiting for data to load from separate REST resources.
var carsAndPartsAreLoaded = Reflux.all(carsStore, partsStore);
// you may now listen to carsAndPartsAreLoaded
// from your data stores and components
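For example (a rough sketch only; the exact shape of the joined payload depends on the RefluxJS version), a store could listen to the joined listenable and trigger only once both resources have loaded:
var carsWithPartsStore = Reflux.createStore({
    init: function() {
        // fires once carsStore and partsStore have each triggered at least once
        this.listenTo(carsAndPartsAreLoaded, this.onBothLoaded);
    },
    onBothLoaded: function(cars, parts) {
        // combine the two payloads however the consuming views need them
        this.trigger({ cars: cars, parts: parts });
    }
});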
Hope this makes sense to you.

I've finally built an application with something akin to Flux stores, without any store dependencies.
Recently Dan Abramov created a framework, Redux, that highlights the composability of Flux stores without the need for any store dependency or waitFor, and I share most of his ideas.
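As a point of comparison, here is a minimal Redux-style sketch (names are illustrative, not the author's code): the car data lives in a single reducer and the "models" view is derived with a plain selector, so no dependent store or waitFor is needed.
const { createStore } = require('redux');

// Reducer: the single source of truth for the car data.
function cars(state = [], action) {
    switch (action.type) {
        case 'CARS_UPDATED':
            return action.cars;
        default:
            return state;
    }
}

const store = createStore(cars);

// Selector: derive the unique models on demand instead of keeping
// a second store in sync with the first one.
const getModels = state => [...new Set(state.map(car => car.model))];

store.dispatch({ type: 'CARS_UPDATED', cars: [{ model: 'S' }, { model: '3' }, { model: 'S' }] });
console.log(getModels(store.getState())); // ['S', '3']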


Firestore database structure in Flutter best practice

I'm creating a fitness app, and so far I came up with the following structure:
Workout
  difficulty (String)
  duration (String)
  exerciseSets (Firestore ref)
ExerciseSet
  repNumber (int)
  exercise (Firestore ref)
and the Exercise object has a few fields describing the exercise.
So right now, if I want to retrieve a whole workout, I need to do at least 3 calls to Firestore: one for the Workout, then I get the ExerciseSets by ref (and there are usually a few in each workout), and then the Exercises by ref as well.
ExerciseSet and Exercise objects are shared between workouts, that's why I keep them in separate docs.
Also, after retrieving all 3 or more snapshots from Firestore, I need to iterate through them to map them to my model. I do something like this currently:
for (var exerciseSet in fsWorkout.exerciseSets) {
  var fsExerciseSet = await _getFsExerciseSet(exerciseSet.ref);
  var set = ExerciseSet.fromFirestoreObject(fsExerciseSet);
  var fsExercise = await _getFsExercise(fsExerciseSet.exerciseRef.ref);
  set.exercise = Exercise.fromFirestoreObject(fsExercise);
  exerciseSets.add(set);
}
return Workout(fsWorkout.difficulty, fsWorkout.duration, exerciseSets);
Does this make sense, or is there a more efficient/easier way to achieve this? It feels like I overcomplicated things.
And is there any advantage to using a Firestore reference instead of just a String field with the ID?
Thanks!
EDIT: I would like to mention that in my case all the data is added once by me, and the client reads the data and needs to retrieve a Workout object that contains all the ExerciseSet and Exercise objects.
You are actually applying an SQL normalization data-modelling strategy to a NoSQL database. This is not the most efficient approach.
In the NoSQL world, you should not be afraid to duplicate data and denormalize your data model. I would suggest you read this "famous" post about NoSQL data-modelling approaches.
So, instead of designing your data model according to SQL normalization, you should, in the NoSQL world, think about it from a query perspective, trying to minimize the number of queries for a given screen/use case.
In your case a common approach would be to use a set of Cloud Functions (which are executed in the back end) to duplicate your data and have all the ExerciseSets and corresponding Exercises in your Workout Firestore document. To keep all of this duplicated data in sync, you would also use Cloud Functions.
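As an illustration only (the collection names, the exerciseIds index field and the embedded layout are assumptions, not your schema), a Cloud Function that keeps the duplicated Exercise data in sync could look roughly like this:
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

// Sketch: when an Exercise document changes, rewrite the embedded copy
// inside every Workout document that references it.
exports.onExerciseUpdated = functions.firestore
    .document('exercises/{exerciseId}')
    .onUpdate(async (change, context) => {
        const exercise = change.after.data();
        const exerciseId = context.params.exerciseId;

        // Assumes each workout doc keeps an "exerciseIds" array for querying
        // and an "exerciseSets" array with the embedded exercise data.
        const workouts = await admin.firestore()
            .collection('workouts')
            .where('exerciseIds', 'array-contains', exerciseId)
            .get();

        const batch = admin.firestore().batch();
        workouts.forEach(doc => {
            const sets = doc.data().exerciseSets.map(set =>
                set.exerciseId === exerciseId ? { ...set, exercise: exercise } : set);
            batch.update(doc.ref, { exerciseSets: sets });
        });
        return batch.commit();
    });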
You could also go for an intermediate approach where you only add the ExerciseSets data to a Workout, and when the user wants to see an ExerciseSet's details (e.g. by clicking on the ExerciseSet link), you query the corresponding Exercises.

Do we perform queries to the event store? When and how?

I am new to event sourcing, but as far as I have understood, when we have a command use case we instantiate an aggregate in memory, apply events to it from the event store so that it is in the correct state, make the proper changes, and then store those changes back to the event store. We also have a read model store that will eventually be updated by these changes.
In my case I have a CreateUserUseCase (which is a command use case) and I want to first check if the user already exists and if the username is already taken. For example something like this:
const userAlreadyExists = await this.userRepo.exists(email);
if (userAlreadyExists) {
  return new EmailAlreadyExistsError(email);
}

const alreadyCreatedUserByUserName = await this.userRepo
  .getUserByUserName(username);
if (alreadyCreatedUserByUserName) {
  return new UsernameTakenError(username);
}

const user = new User(username, password, email);
await this.userRepo.save(user);
So, for the save method I would use the event store and append the uncommitted events to it. What about the exists and getUserByUserName methods, though? On the one hand I want to make a specific query, so I could use my read model store to get the data that I need, but on the other hand this seems to contradict CQRS. So what do we do in these cases? Do we, in some way, perform queries against the event store? And how do we do this?
Thank you in advance!
CQRS shouldn't be interpreted as "don't query the write model": because determining state from the write model for the purpose of command processing entails a query, that restriction is untenable. Instead, interpret it as "it's perfectly acceptable to have a different data model for a query than the one you use for handling intentions to update". This formulation implies that if the write model is a good fit for a given query, it's OK to execute the query against the write model.
Event sourcing, in turn, is arguably (especially in conjunction with certain usage styles) the ultimate in write-optimized data models; accordingly, the event-sourced model makes nearly all queries outside of a fairly small set so inefficient that some form of CQRS is needed.
The query facilities an event store includes are typically limited, but the one query that anything suitable as an event store will support (because it's needed for replaying events) is a compound query amounting to "give me the latest snapshot for this entity and either (if a snapshot exists) the first n events after that snapshot or (if not) the first n events for that entity". The result of that query is dispositive (modulo things like retention, etc.) for the question "has this entity published events?".
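To make that concrete, here is a small sketch (eventStore.readStream is a hypothetical client method, not a particular product's API) of answering "does this user already exist?" directly from the write model, assuming one stream per user keyed by a normalized email:
// Assumption: each user has a stream named after a normalized email.
async function userExists(eventStore, email) {
    const streamId = 'user-' + email.trim().toLowerCase();
    const events = await eventStore.readStream(streamId); // hypothetical "read events for this stream"
    // If the stream has published any events, the user exists.
    return events !== null && events.length > 0;
}
A query like "is this username taken?" has no such natural stream key, so it is usually answered by a read model (or a dedicated reservation stream) rather than by scanning the event store.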

How to persist aggregate/read model from "EventStore" in a database?

Trying to implement Event Sourcing and CQRS for the first time, but got stuck when it came to persisting the aggregates.
This is where I'm at now
I've set up "EventStore" and a stream, "foos"
Connected to it from node-eventstore-client
I subscribe to events with catchup
This is all working fine.
With the help of the eventAppeared event handler function I can build the aggregate, whenever events occur. This is great, but what do I do with it?
Let's say I build an aggregate that is a list of Foos:
[
  {
    id: 'some aggregate uuidv5 made from barId and bazId',
    barId: 'qwe',
    bazId: 'rty',
    isActive: true,
    history: [
      {
        id: 'some event uuid',
        data: {
          isActive: true,
        },
        timestamp: 123456788,
        eventType: 'IsActiveUpdated'
      },
      {
        id: 'some event uuid',
        data: {
          barId: 'qwe',
          bazId: 'rty',
        },
        timestamp: 123456789,
        eventType: 'FooCreated'
      }
    ]
  }
]
To follow CQRS I will build the above aggregate within a Read Model, right? But how do I store this aggregate in a database?
I guess just a NoSQL database should be fine for this, but I definitely need a db, since I will put a gRPC API in front of this and other read models / aggregates.
But how do I actually go from having built the aggregate to persisting it in the db?
I once tried following this tutorial https://blog.insiderattack.net/implementing-event-sourcing-and-cqrs-pattern-with-mongodb-66991e7b72be which was super simple, since you'd use MongoDB both as the event store and just create a view for the aggregate and update that one when new events come in. It had its flaws and limitations (the aggregation pipeline), which is why I have now turned to "EventStore" for the event store part.
But how do I persist the aggregate, which is currently just built and stored in code/memory from events in "EventStore"?
I feel this may be a silly question, but do I have to loop over each item in the array and insert each item in the db table/collection, or is there somehow a way to dump the whole array/aggregate there at once?
What happens after? Do you create a materialized view per aggregate and query against that?
I'm open to picking the best db for this, whether that is postgres/other rdbms, mongodb, cassandra, redis, table storage etc.
Last question: for now I'm just using a single stream, "foos", but at this level I expect new events to happen quite frequently (every couple of seconds or so). As I understand it, you'd still persist it and update it using materialized views, right?
So given that barId and bazId in combination can be used for grouping events, instead of a single stream I'd think more specialized streams such as foos-barId-bazId would be the way to go, to try and reduce the frequency of incoming new events to a point where recreating materialized views will make sense.
Is there a general rule of thumb saying not to recreate/update/refresh materialized views if the update frequency gets below a certain limit? Then the only other alternative would be querying from a normal table/collection?
Edit:
In the end I'm trying to make a gRPC API that has just two RPCs - one for getting a single foo by id and one for getting all foos (with an optional field for filtering by status - but that is not so important). The simplified proto would look something like this:
rpc GetFoo(FooRequest) returns (Foo)
rpc GetFoos(FoosRequest) returns (FoosResponse)

message FooRequest {
  string id = 1; // uuid
}

// If the optional status field is not specified, return all foos
message FoosRequest {
  // If this field is specified, only return the Foos whose isActive matches it
  FooStatus status = 1;

  enum FooStatus {
    UNKNOWN = 0;
    ACTIVE = 1;
    INACTIVE = 2;
  }
}

message FoosResponse {
  repeated Foo foos = 1;
}

message Foo {
  string id = 1; // uuid
  string bar_id = 2; // uuid
  string baz_id = 3; // uuid
  bool is_active = 4;
  repeated Event history = 5;
  google.protobuf.Timestamp last_updated = 6;
}

message Event {
  string id = 1; // uuid
  google.protobuf.Any data = 2;
  google.protobuf.Timestamp timestamp = 3;
  string eventType = 4;
}
The incoming events would look something like this:
{
  id: 'some event uuid',
  barId: 'qwe',
  bazId: 'rty',
  timestamp: 123456789,
  eventType: 'FooCreated'
}
{
  id: 'some event uuid',
  isActive: true,
  timestamp: 123456788,
  eventType: 'IsActiveUpdated'
}
As you can see, there is no uuid to make it possible to GetFoo(uuid) in the gRPC API, which is why I'll generate a uuidv5 from the barId and bazId, which, combined, will form a valid uuid. I'm generating that in the projection/aggregate you see above.
Also, the GetFoos rpc will either return all foos (if the status field is left undefined), or alternatively return the foos whose isActive matches the status field (if specified).
Yet I can't figure out how to continue from the catchup subscription handler.
I have the events stored in "EventStore" (https://eventstore.com/) and, using a subscription with catchup, I have built an aggregate/projection with an array of Foos in the form that I want them. But to be able to get a single Foo by id from my gRPC API, I guess I'll need to store this entire aggregate/projection in a database of some sort, so I can connect and fetch the data from the gRPC API? And every time a new event comes in, do I need to add that event to the database as well, or how does this work?
I think I've read every resource I can possibly find on the internet, but still I'm missing some key pieces of information to figure this out.
The gRPC part is not so important. It could be REST, I guess, but my big question is how to make the aggregated/projected data available to the API service (possibly more APIs will need it as well)? I guess I will need to store the aggregated/projected data, with the generated uuid and history fields, in a database to be able to fetch it by uuid from the API service, but what database, and how does this storing process work, from the catchup event handler where I build the aggregate?
I know exactly how you feel! This is basically what happened to me when I first tried to do CQRS and ES.
I think you have a couple of gaps in your knowledge which I'm sure you will rapidly plug. You hydrate an aggregate from the event stream as you are doing. That IS your aggregate persisted. The read model is something different. Let me explain...
Your read model is the thing you run queries against and that provides data for display in a UI, for example. Your aggregates are not (directly) involved in that. In fact, they should be encapsulated, meaning that you can't 'see' their state from the outside, i.e. no getters and setters, with the exception of the aggregate ID, which would have a getter.
This article gives you a helpful overview of how it all fits together: CQRS + Event Sourcing – Step by Step
The idea is that when an aggregate changes state it can only do so via an event it generates. You store that event in the event store. That event is also published so that read models can be updated.
Also, your aggregate looks more like a typical read model object or DTO. An aggregate is interested in functionality, not properties. So you would expect to see public void functions for issuing commands to the aggregate, but not public properties like isActive or history.
I hope that makes sense.
EDIT:
Here are some more practical suggestions.
"To follow CQRS I will build the above aggregate within a Read Model, right? "
You do not build aggregates in the read model. They are separate things on separate sides of the CQRS equation. Aggregates are on the command side. Queries are done against read models, which are different from aggregates.
Aggregates have public void functions and no getter or setters (with the exception of the aggregate id). They are encapsulated. They generate events when their state changes as a result of a command being issued. These events are stored in an event store and are used to recover the state of an aggregate. In other words, that is how an aggregate is stored.
The events go on to be published so the event handlers and other processes can react to them and update the read model and or trigger new cascading commands.
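To make the command side concrete, here is a purely illustrative sketch (names invented, not a specific framework): the aggregate exposes command methods, records events, and rebuilds its state by replaying them, exposing nothing beyond its id and its uncommitted events:
class FooAggregate {
    constructor(id, history = []) {
        this._id = id;
        this._isActive = false;               // private state, no public getter
        this._uncommitted = [];
        history.forEach(e => this._apply(e)); // rehydrate from the stored events
    }

    get id() { return this._id; }

    // Command: validate the intent, then record an event.
    activate() {
        if (this._isActive) return;           // already active, nothing to record
        this._record({ eventType: 'IsActiveUpdated', data: { isActive: true } });
    }

    _record(event) {
        this._apply(event);
        this._uncommitted.push(event);        // to be appended to the event store
    }

    _apply(event) {
        if (event.eventType === 'IsActiveUpdated') {
            this._isActive = event.data.isActive;
        }
    }

    get uncommittedEvents() { return this._uncommitted; }
}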
"Last question. For now I'm just using a single stream "foos", but at this level I expect new events to happen quite frequently (every couple of seconds or so) but as I understand it you'd still persist it and update it using materialized views right?"
Every couple of seconds is very likely to be fine. I'm more concerned about the "persist and update using materialised views" part. I don't know exactly what you mean by that, but it doesn't sound like you have the right idea. Views should be very simple read models, with no need for the complex relations you find in an RDBMS, and are therefore highly optimised and fast to read.
There can be a lot of confusion around all the terminology and jargon used in DDD, CQRS and ES. I think in this case the confusion lies in what you think an aggregate is. You mention that you would like to persist your aggregate as a read model. As @Codescribler mentioned, at the sink end of your event stream there isn't a concept of an aggregate. Concretely, in ES, commands are applied to aggregates in your domain by loading the previous events pertaining to that aggregate, rehydrating the aggregate by folding each previous event onto it, and then applying the command, which generates more events to be persisted in the event store.
Downstream, a subscribing process receives all the events in order and builds a read model based on the events and the data contained within. The confusion here is that this read model, at this end, is not an aggregate per se. It might very well look exactly like your aggregate at the domain end, or it could be a read model that doesn't use all the events and/or all the event data.
For example, you may choose to use every bit of information and build a read model that looks exactly like the aggregate hydrated up to the newest event (likely your source of confusion). You may instead have another process that builds a read model that only tallies a specific type of event. You might even subscribe to multiple streams and "join" them into a big read model.
As for how to store it, this is really up to you. It seems to me like you are taking the events and rebuilding your aggregate plus a history of events in an in-memory structure. This, of course, doesn't scale, which is why you want to store it at rest in a database. I wouldn't use the in-memory structure, since you would need to do a lot of state diffing when you flush to the database. You should modify the database directly in response to each individual event. Ideally, you would also transactionally store the stream position with that modification so you don't process the same event again in the case of a failure.
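As a rough sketch (the collection names, event shapes and eventNumber field are assumptions, not your exact data), the eventAppeared handler of your catch-up subscription could update a MongoDB read model and its checkpoint per event like this:
// db is a connected Db instance from the official mongodb driver, e.g.:
// const db = (await MongoClient.connect('mongodb://localhost:27017')).db('readmodels');
async function handleEvent(db, event) {
    const foos = db.collection('foos');

    switch (event.eventType) {
        case 'FooCreated':
            await foos.updateOne(
                { _id: event.fooId },      // e.g. the uuidv5 derived from barId + bazId
                { $set: { barId: event.barId, bazId: event.bazId, isActive: false },
                  $push: { history: event } },
                { upsert: true });
            break;
        case 'IsActiveUpdated':
            await foos.updateOne(
                { _id: event.fooId },
                { $set: { isActive: event.isActive }, $push: { history: event } });
            break;
    }

    // Checkpoint: remember the last applied stream position so the projection can
    // resume after a restart without reprocessing events. Ideally this write and
    // the document update above would share a transaction.
    await db.collection('checkpoints').updateOne(
        { _id: 'foos-projection' },
        { $set: { position: event.eventNumber } },
        { upsert: true });
}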
Hope this helps a bit.

How to control data failures in Azure Data Factory Pipelines?

I receive an error from time to time due to incompatible data in my source data set compared to my target data set. I would like to control the action the pipeline takes based on error types, maybe output or drop those particular rows, while completing everything else. Is that possible? Furthermore, is there a simple way to get hold of the actual failing line(s) from Data Factory without accessing and searching the actual source data set?
Copy activity encountered a user error at Sink side: ErrorCode=UserErrorInvalidDataValue,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Column 'Timestamp' contains an invalid value '11667'. Cannot convert '11667' to type 'DateTimeOffset'.,Source=Microsoft.DataTransfer.Common,''Type=System.FormatException,Message=String was not recognized as a valid DateTime.,Source=mscorlib,'.
Thanks
I think you've hit a fairly common problem and limitation within ADF. Although the datasets you define with your JSON allow ADF to understand the structure of the data, that is all they convey: just the structure. The orchestration tool can't do anything to transform or manipulate the data as part of the activity processing.
To answer your question directly, it's certainly possible. But you need to break out the C# and use ADF's extensibility functionality to deal with your bad rows before passing it to the final destination.
I suggest you expand your data factory to include a custom activity where you can build some lower level cleaning processes to divert the bad rows as described.
This is an approach we often take, as not all data is perfect (I wish) and plain ETL or ELT doesn't work. I prefer the acronym ECLT, where the 'C' stands for clean (or cleanse, prepare, etc.). This certainly applies to ADF, because this service doesn't have its own compute or an SSIS-style data flow engine.
So...
In terms of how to do this. First I recommend you check out this blog post on creating ADF custom activities. Link:
https://www.purplefrogsystems.com/paul/2016/11/creating-azure-data-factory-custom-activities/
Then, within your C# class that implements IDotNetActivity, do something like the below.
public IDictionary<string, string> Execute(
    IEnumerable<LinkedService> linkedServices,
    IEnumerable<Dataset> datasets,
    Activity activity,
    IActivityLogger logger)
{
    // ...resolve connection details for the source and destination
    // from linkedServices/datasets here.
    using (StreamReader vReader = new StreamReader(YourSource))
    {
        using (StreamWriter vWriter = new StreamWriter(YourDestination))
        {
            while (!vReader.EndOfStream)
            {
                // data transform logic: read a row, divert or fix bad rows,
                // write the clean rows to the destination
            }
        }
    }

    return new Dictionary<string, string>();
}
You get the idea. Build your own SSIS data flow!
Then write out your clean row as an output dataset, which can be the input for your next ADF activity. Either with multiple pipelines, or as chained activities within a single pipeline.
This is the only way you will get ADF to deal with your bad data in the current service offerings.
Hope this helps

Querying a list of Actors in Azure Service Fabric

I currently have a ReliableActor for every user in the system. This actor is appropriately named User, and for the sake of this question has a Location property. What would be the recommended approach for querying Users by Location?
My current thought is to create a ReliableService that contains a ReliableDictionary. The data in the dictionary would be a projection of the User data. If I did that, then I would need to:
Query the dictionary. After GA, this seems like the recommended approach.
Keep the dictionary in sync. Perhaps through Pub/Sub or IActorEvents.
Another alternative would be to have a persistent store outside Service Fabric, such as a database. This feels wrong, as it goes against some of the ideals of using Service Fabric. If I did, I would assume something similar to the above, but using a stateless service?
Thank you very much.
I'm personally exploring the use of Actors as the main datastore (i.e. the source of truth) for my entities. As Actors are added, updated or deleted, I use MassTransit to publish events. I then have Reliable Stateful Services subscribed to these events. The services receive the events and update their internal IReliableDictionary's. The services can then be queried to find the entities required by the client. Each service only keeps the entity data that it requires to perform its queries.
I'm also exploring the use of EventStore to publish the events. That way, if in the future I decide I need to query the entities in a new way, I could create a new service and replay all the events to it.
These Pub/Sub methods do mean the query services are only eventually consistent, but in a distributed system, this seems to be the norm.
While the standard recommendation is definitely as in Vaclav's response, if querying is the exception then Actors could still be appropriate. For me, whether they're suitable or not is determined by the normal way of accessing them: if it's by key (as it presumably would be for a user record), then Actors work well.
It is possible to iterate over Actors, but it's quite a heavy task, so, like I say, it's only appropriate in the exceptional case. The following code builds up a collection of Actor references; you can then iterate over that collection to fetch the actors and use LINQ or similar on what you've built up.
var actorServiceProxy = ActorServiceProxy.Create(new Uri("fabric:/MyActorApp/MyActorService"), partitionKey);

var actors = new List<ActorInformation>();
ContinuationToken continuationToken = null;
do
{
    // Page through the actors registered with this partition of the actor service.
    var queryResult = actorServiceProxy.GetActorsAsync(continuationToken, cancellationToken).GetAwaiter().GetResult();
    actors.AddRange(queryResult.Items);
    continuationToken = queryResult.ContinuationToken;
} while (continuationToken != null);
TL;DR: It's not always advisable to query over actors, but it can be done if required. The code above will get you started.
If you find yourself needing to query across a data set by some data property, like User.Location, then Reliable Collections are the right answer. Reliable Actors are not meant to be queried this way.
In your case, a user could simply be a row in a Reliable Dictionary.