How to persist and replay NestJS CQRS event and saga across restart? - cqrs

I am making an application which will need to use NestJS' CQRS module, as the requirements naturally lend themselves to that pattern.
Updates to the application logic are expected to be frequent and to happen during busy hours (that's just how my management works...), so the application needs to be able to restart gracefully. However, this means that events started just before the shutdown may not finish, or even if they do, some sagas may not trigger due to some events having happened before the restart... I'd like to ensure that doesn't happen.
I'm aware of NestJS' OnApplicationShutdown and OnApplicationBootstrap hooks, which is exactly for this purpose, but what I'm not sure is what I should do there. How can I capture all events that have unfinished handlers and sagas? Then after a restart, how can I make the event bus aware of the events monitored by sagas, without executing the already executed handlers?
I guess the second part could be worked around with a random ID per event/handler combo, that will be looked up in a log, and if present, the handler will be skipped, and if not, it will be executed and added to the log... But even with such a workaround, I don't see how I could do the first part. There will be a lot of events, and sagas (by definition) execute commands, meaning they have side effects... Even if all commands can become idempotent, the sheer quantity of events and frequent restarts means restarting from the very first command is a no go.
I've seen this package but I'm not sure if it solves this particular use case, or if it's really just logging the events, and pretty much nothing more.

Related

Axon Server auto-scaling split/merge delay

I am implementing auto-scaling in an application using Axon Server, and running in k8s.
I have created ReST endpoints in the application itself, which look at the local configuration (for processors and thread counts) and then speak to the Axon Server ReST API in order to split/merge the processors appropriately. The intent being to use container lifecycle hooks to trigger them.
As a result, if a new instance (pod) of an application is launched, configured for 2 threads on ProcessorA, then my code will make 2 requests to the /v1/components/blah/processors/ProcessorA/segments/split?context=default endpoint on the server. This is in order to make full use of the 2 new threads.
Likewise, when the pod is shut down, it makes 2 similar requests to the merge endpoint on the server.
When scaling up I see the processor split twice, as expected. However, on shutdown I don't see the merge twice unless I put a long (5s) wait between requests. This isn't likely to be particularly stable, so I'm wondering if there's something else I need to be doing.
Perhaps I ought to request the merge, then loop waiting for it to occur, then request another. This seems like it's going to be excessively slow.
There was another question on SO somewhat related, Automatically scale Axon's tracking event processors, where Steven commented that there was no inbuilt auto-scaling in Axon Server at that point in time. I've not seen anything in more recent times either.
As it stands work is underway to improve the split/merge functionality. For one, the result of a split/merge will be returned, which has been resolved under issue #1001.
This should make it so you do not have to wait for the status' to have been updated, which is the likely cause why it (seems to) take long. This functionality will be part of Axon Framework / Server 4.4 by the way, which should be released relatively soon.
Subsequently, discussion are still underway to allow for auto scaling. One requirement deemed important is the capability of a TrackingEventProcessor to process several segments per thread (issue #1434). This will ensure that the TEP can take over several segments to transition the boundary when scaling, for example.
Eventually though, Axon Server should be able to do this for you. It's just not there yet.
So for now I think the most pragmatic solution is indeed to wait for the result to show up on the status'. As said, I trust 4.4 will improve upon this by returning the result of the split/merge operation once called. Lastly, the Axon team is aware this can be improved upon further, hence why discussion on the matter are underway.

Is there a way to rely on Postgres Notify/Listen mechanism?

I have implemented a Notify/Listen mechanism, so when a special request is sent to the web server, using notify I can notify the workers (in Python) that there's a pending request waiting to be processed.
The implementation works fine, but the problem is that if the workers server is restarting, the notification gets lost, since at that particular time there's no listener.
I can implement a service like MQRabbit or similar, but my needs are so simple that implement such a monster is too much.
Is there any way, a configuration variable perhaps, that can give some persistence to the notification mechanism?
Thanks in advance
I don't think there is a way to persist notification channels, but you can simply store the pending requests to a table, and have the worker check for any missed work on startup.
Either a timestamp or a pending/completed flag would work, depending on what kind of work it's doing.
For consistency, you can have the NOTIFY fire from an INSERT trigger on the queue table, and have the worker always check for any remaining work (not just a specific request) when notified.

What triggers UI refresh in CQRS client app?

I am attempting to learn and apply the CQRS design approach (pattern and architecture) to a new project but seem to be missing a key piece.
My client application executes a query and retrieves a list of light-weight, read-only DTOs from the read model. The user selects an item and clicks a button to initiate some action. The action is performed by creating and sending the corresponding command object to the write model (where the command handler carries out the action, updates the data store, etc.) At some point, however, I need to update the UI to reflect changes to the state of the application resulting from the action.
How does the UI know when it is time to refresh the original list?
Additional Info
I have noticed that most articles/blogs discussing CQRS use MVC client apps in their examples. I am working on a Silverlight client right now and am beginning to wonder if the pattern simply doesn't work in that case.
Follow-Up Question
After thinking more about Bartlomiej's response and subsequent discussion, I am wondering about error handling in CQRS. Given that commands are basically fire-and-forget asynchronous operations, how do we report an error condition to the UI?
I see 'refreshing the UI' to take one of two forms:
The operation succeeds, data has changed and the UI should be updated to reflect these changes
The operation fails, data has not changed but the user should be notified of the failure and potential corrective actions.
Even with a Post-Redirect-Get pattern in an MVC, you can't really Redirect until you know the outcome of the operation. None of the examples I've seen thus far address these real-world concerns.
I've been struggling with similar issues for a WPF client. The re-query trigger for any data is dependent on the data your updating, commands tend to fall into categories:
The command is a true fire and forget method, it informs the back-end of a state change but this change does not need to be reflected in the UI, or the change simply isn't important to the UI.
The command will alter the result of a single query
The command will alter the result of multiple queries, usually (in my domain at least) in a cascading fashion, that is, changing the state of a single "high level" piece of data will likely affect many "low level" caches.
My first trigger is the page load, very few items are exempt from this as most pages must assume data has been updated since it was last visited. Though some systems may be able to escape with only updating financial and other critical data in this way.
For short commands I also update data when 'success' is returned from a command. Though this is mostly laziness as IMHO all CQRS commands should be fired asynchronously. It's still an option I couldn't live without but one you may have to if your implementation expects high latency between command and query.
One pattern I'm starting to make use of is the mediator (most MVVM frameworks come with one). When I fire a command, I also fire a message to the mediator specifying which command was launched. Each Cache (A view model property Retriever<T>) listens for commands which affect it and then updates appropriately. I try to minimise the number of messages while still minimising the number of caches that update unnecessary from a single message so I'll (hopefully) eventually end up with a shortlist of update reasons, with each 'reason' updating a list of caches.
Another approach is simple honesty, I find that by exposing graphically how the system updates itself makes users more willing to be patient with it. On firing a command show some UI indicating you're waiting for the successful response, on error you could offer to retry / show the error, on success you start the update of the relevant fields. Baring in mind that this command could have been fired from another terminal (of which you have no knowledge) so data will need to timeout eventually to avoid missing state changes invoked by other machines also.
Noting the irony that the only efficient method of updating cache's and values on a client is to un-separate the commands and queries again, be it through hardcoding or something like a hashmap.
My two cents.
I think MVVM actually fits into CQRS quite well. The ViewModel simply becomes an observable ReadModel.
1 - You initialize your ViewModel state via a query on the ReadModel.
2 - Changes on your ViewModel are automatically reflected on any Views that are bound to it.
3 - Certain changes on your ViewModel trigger a command to propegate to a message queue, an object responsible for sending those commands to the server takes those messages off the queue and sends them to the WriteModel.
4 - Clients should be well formed, meaning the ViewModel should have performed appropriate validation before it ever triggered the command. Once the command has been triggered, any event notifications can be published onto an event bus for the client to communicate changes to other ViewModels or components in the system interested in those changes. These events should carry the relevant information necessary. Typically, this means that other view models usually don't have to re-query the read model as a result of the change unless they are dependent on other data that needs to be retrieved.
5 - There is an object that connects to the message bus on the server for real-time push notifications when other clients make changes that this client is interested in knowing about, falling back to long-polling if necessary. It propagates those to the internal message bus that ties the components on the client together.
6 - The last part to handle is the fact that clients can be occasionally connected, which should be the only reason a command fails (they don't have internet access at the moment), which is when the client should be notified of problems.
In my ASP.NET MVC 3 I use 2 techniques depending on use case:
already well-known Post-Redirect-Get pattern which fits nicely with CQRS. Your MVC action that triggers the command returns a redirection to action that performs a query.
in some cases, like real-time updates of other clients, I rely on domain events/messages. I create an event handler that uses singlarR to push changes to all connected and interested clients.
There are two major ways you can take as far as I know :
1) design your UI , so that the user does not see its changes right away. Like for instance a message to tell him his action is a success, and offering him different choices to continue his work. this should buy you enough time to have updated your readmodel.
2) more complex, but you might keep the information you have send to the server and shows them in the interface.
The most important I guess, educate your user if you can so that they know why the data is not here... yet!
I am thinking about it only now, but these are for sync command handling, not async, in async things go really harder on the brain...the client interface becomes an event eater too..

CQRS/EventStore: How are failures to deliver events handled?

Getting into CQRS and I understand that you have commands (app layer) and events (from the domain).
In the simple case where events are to update the read model, do read model updates fail? If there is no "bug" then I cannot see them failing and as I am using EventStore, I know there is a commit flag which will retry failures.
So my question is do I have to do anything in addition to EventStore to handle failures?
Coming from a world where you do everything in one transaction and now things are done separately is worrying me.
Of course there may be cases where a published event will fail in the read models.
You have to make sure you can detect that and solve it.
The nice thing is that you can replay all the events again and again so you have the chance not only to fix the error. You can also test the fix by replaying every single event if you want.
I use NServiceBus as my publishing mechanism which allows me to use an error queue. Using my other logging tools together with the error queue I can easily determine what happened since I have the error log and the actual message that caused the error in the first place.

Windows Workflow: Persistence and Polling

I'm currently learning the WF framework, so bear with me; mostly I'm looking for where to start looking, not necessarily a direct answer. I just can't seem to figure out how to begin researching what I'd like in The Google.
Let's say I have a simple one-step workflow (much more complicated than that, but for simplicity's sake). This workflow needs to watch a certain record in the database to see when it changes. I don't have the capability to "push" via a trigger from the database when the row changes, so I need to poll for it every so often.
This workflow needs to be persisted to the database to be durable against restarts and whatnot as this is a long-running workflow. I'm trying to figure out the best way to get it to check every 3 minutes or so and also persist to the database. Do the persistence capabilities of the framework allow for that? It seems to be time-based. And since the workflow won't be reawakened by an external event, how does it reload from the database and check the same step it did previously again? Does it attempt the last unfulfilled activity automatically upon reloading?
Do "while" activities with a delay attached to it work at all, or can it be handled solely through the persistence services?
I'm not sure what you mean by "handled soley through persistence services"? Persistence refers only to the storing of an idle workflow.
You could have a Delay and a Code activity in a Sequence in a While loop. When in the Delay the workflow will go idle and may be persisted if necessary. However depending on how much state is needed when persisting the workflow and/or how many such workflows you would have running at any one time may mean that a leaner approach is necessary.
A leaner approach would be to externalise the DB watching and have some "DB watching" workflow service raise an event when the desired change has occured. This service would be added to Workflow runtime.
To that end you need a service contract which is defined by an Inteface with the [ExternalDataExchange] attribute. This interface in turn defines an event that the service will raise when the desired DB change is detected. It also defines a method that a Workflow can call to specify what what change this service should be looking for. The method should accept an instance GUID so that the requesting instance can be found when the DB change is detected.
In the workflow you use a CallExternalMethodActivity to call this services method. You then flow to a HandleExternalEventActivity which listen for the event. At this point the workflow will go idle and can be persisted. It will remain there until the service raises the event.