UML Use-Case Diagram - event-handling

I'm drawing a UML use-case diagram for the following scenario:
External system provides an event - I think my actor here would be the event generated
System ingests the event
System enriches the event
System correlates enriched event with some data existing in the system
If the system finds a hit, a notification is sent to a human actor
If not, event is discarded
So my 2 actors would be: the event generated by this external system and the user that receives the notification.
The event triggers the Ingest event use case
The external user uses the Receive notification use case
Now, I'm not certain how to model the other items in my initial list, or whether they should be modeled at all.
Should I have something like:
Event (actor) - Generate notification (use case) - User (actor)
and then some relationships between the Generate notification use case and other use cases: ingest event, enrich event, correlate event?
Should I model the discard event at all?
Thanks!

Let's look at the definitions.
Actor
I will use a definition from UML Specification, section 18.1.3.1
An Actor models a type of role played by an entity that interacts with the subjects of its associated UseCases (e.g., by exchanging signals and data). Actors may represent roles played by human users, external hardware, or other systems
It clearly states a closed list of who/what can become an Actor. In your case, it is an External System that interacts with your system, so it is your Actor. The other Actor is naturally the User.
Use Case
Here I will support myself with the definition from Alistair Cockburn's "Writing Effective Use Cases" book (section 1.1).
A use case captures a contract between the stakeholders of a system about its behavior. The use case describes the system’s behavior under various conditions as it responds to a request from one of the stakeholders, called the primary actor. The primary actor initiates an interaction with the system to accomplish some goal. The system responds, protecting the interests of all the stakeholders. Different sequences of behavior, or scenarios, can unfold, depending on the particular requests made and conditions surrounding the requests. The use case collects together those different scenarios.
In your case, it is apparent that once the External System (primary Actor) provides an event, the processing is then carried out by your system until either the User (secondary Actor) is notified or the event is discarded. It seems that there are no delays or further interactions required in order to accomplish this goal so it is clearly just one use case, Ingest event.
The processing ending by notifying the User will be your main path while the one where the event gets discarded will be either an alternative or even a negative path (depending on how you look at it).
If you consider the discard as an alternative path, you should model the multiplicity of the association to the User as 0..1 to show that the User is not always notified. You do not have to do that if you treat this as a negative path, as those are considered "failure paths", so not all tasks of the UC have to happen. I would be very careful though. Since you expect discarding to happen on a regular basis, it seems to be an alternative path rather than a negative one.
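As a rough illustration of why this reads as one use case with two paths, the processing could be sketched as below. The class, the `watchlist` lookup and the event fields are hypothetical names made up for the sketch, not part of the question.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Notification:
    recipient: str
    text: str


@dataclass
class EventProcessor:
    """One use case, 'Ingest event', with a main and an alternative path."""
    watchlist: dict  # hypothetical data already in the system: key -> user to notify
    sent: list = field(default_factory=list)

    def enrich(self, event: dict) -> dict:
        # Hypothetical enrichment: derive a normalized lookup key.
        return {**event, "key": event["source_id"].strip().lower()}

    def correlate(self, enriched: dict) -> Optional[str]:
        # Returns the user to notify on a hit, or None when nothing matches.
        return self.watchlist.get(enriched["key"])

    def ingest(self, event: dict) -> Optional[Notification]:
        enriched = self.enrich(event)
        user = self.correlate(enriched)
        if user is None:
            return None  # alternative path: the event is discarded
        note = Notification(user, f"Hit for {enriched['key']}")
        self.sent.append(note)  # main path: the User (secondary actor) is notified
        return note
```

Enrichment, correlation and discarding are steps inside `ingest`; none of them is triggered separately by an actor, which is why they do not appear as use cases of their own.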
Alternative approach
The assumption in my model is that you actively notify the User (e.g. send him a push, a mail or do some other action).
It might be possible though that you just create a notification and the User has to actively read it. In such case, User would not be an Actor of Ingest event at all. Instead, as a result of Ingest event you create a notification (not visible on UC diagram). In addition, User needs an additional use case to Read notifications, in which he is the primary (initiating) Actor.
Summary (TL;DR)
You only have one Use Case in your scenario: Ingest event.
Your Actors are: External System (primary) and User (secondary with multiplicity 0..1).

For your actor: Event is no actor. Event is an event. If you don't have a specific name call it External system.
Ingest event would be ok as UC.
I'd guess that enrichment goes with Ingest event. Same for discarding it. Both are activities inside the use case.
Correlation and sending to the user would be (UC---Actor) Inform User --- User.
Not informing the user would be a path inside the Inform User's activities.
Generally:
Use cases show added value the system under consideration delivers to one of its actors.
Use cases are no functions (those are hidden inside the UC's activities).
If a UC does not add value, it is no use case.

Designing event-based architecture for the customer service

Being a developer with solid experience, i am only entering the world of microservices and event-driven architecture. Things like loose coupling, independent scalability and proper implementation of asynchronous business processes is something that i feel should get simplified as compared with traditional monolith approach. So giving it a try, making a simple PoC for myself.
I am considering making a simple application where user can register, login and change the customer details. However, i want to react on certain events asynchronously:
customer logs in - we send them an email, if the IP address used is new to the system.
customer changes their name, we send them an email notifying of the change.
The idea is to make a separate application that reacts on "CustomerLoggedIn", "CustomerChangeName" events.
Here i can think of three approaches, how to implement this simple functionality, with each of them having some drawbacks. So, when a customer submits their name change:
Store the changed name in the DB and send an event to Kafka when the DB transaction is completed. One of the big problems that arise here is that if a customer has 2 tabs open and almost simultaneously submits a change from the initial name "Bob" to "Alice" in one tab and from "Bob" to "Jim" in another one, on the database level one of the updates overwrites the other, which is ok, however we cannot guarantee the order of the events to be the same. We can use some checks to ensure that the DB update is only done when "the last version" has been seen, thus preventing the second update altogether, so only one event will be emitted. But in the general case, this pattern will not allow us to preserve the same order of events in the DB as in Kafka, unless we do the DB change + Kafka event sending in one distributed transaction, which is an anti-pattern afaik.
Change the name in the DB, and use Debezium or similar DB CDC to capture the event and stream it. Here we get a single event source, so ordering problem is solved, however what bothers me is that i lose the ability to enrich the events with business information. Another related drawback is that CDC will stream all the updates in the "customer" table regardless of the business meaning of the event. So, in this case, i will probably need to build a Kafka Streams application to convert the DB CDC events to business events and decouple the DB structure from event structure. The potential benefit of this approach is that i will be able to capture "direct" DB changes in the same manner as those originated in the application.
Emit the event from the application, without storing it in the DB. One of the subscribers might do the DB persistence, another will do the email sending, etc. The biggest problem i see here is: what do i return to the client? I cannot say "Ok, your name is changed", it's more like "Ok, your request has been recorded and will be processed". If the customer quickly hits refresh, he expects to see his new name, as we don't want to explain to the customers what eventual consistency is, do we? Also the order of processing the same event by "email sender" and "db updater" is not guaranteed, so i can send an email before the change is persisted.
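The "last version seen" check mentioned in the first option can be sketched as a conditional update. This is a minimal illustration on an in-memory SQLite table; the table layout, the `version` column and the function name are assumptions, and publishing to Kafka is left out.

```python
import sqlite3


def change_name(conn, customer_id, new_name, seen_version):
    """Update the name only if the row is still at the version the client saw."""
    cur = conn.execute(
        "UPDATE customer SET name = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_name, customer_id, seen_version),
    )
    conn.commit()
    # rowcount is 1 when the update won, 0 when another writer got there first.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, version INTEGER)")
conn.execute("INSERT INTO customer VALUES (1, 'Bob', 1)")

first = change_name(conn, 1, "Alice", seen_version=1)   # tab 1
second = change_name(conn, 1, "Jim", seen_version=1)    # tab 2, stale version
```

Only the call that returns `True` would go on to emit an event, so the two tabs produce a single event. This does not by itself make the DB write and the event send atomic.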
I am looking for advice regarding any of these three approaches (and maybe some others i am missing), and maybe the use cases where one can be preferable over the others?
It sounds to me like you want event sourcing. In event sourcing, all you need to store is the event: the current state of a customer is derived from replaying the events (either from the beginning of time, or since a snapshot: the snapshot is just an optional optimization). Some other process (there are a few ways to go about this) can then project the events to Kafka for consumption by interested parties. Since every event has a sequence number, you can use the sequence number to prevent concurrent modification (alternatively, the more actor modely event-sourcing implementations can use techniques like cluster sharding in Akka to achieve the same ends).
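A minimal sketch of that idea, assuming a toy in-memory store: state is derived by replaying events, and an append is rejected when the caller's expected sequence number is stale. `EventStore`, `NameChanged` and `ConcurrencyError` are illustrative names, not the API of any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameChanged:
    customer_id: str
    new_name: str


class ConcurrencyError(Exception):
    pass


class EventStore:
    """In-memory stand-in for an event store (not a real product's API)."""

    def __init__(self):
        self._streams = {}

    def load(self, stream_id):
        return list(self._streams.get(stream_id, []))

    def append(self, stream_id, event, expected_seq):
        stream = self._streams.setdefault(stream_id, [])
        # The sequence number is the count of events already stored.
        if len(stream) != expected_seq:
            raise ConcurrencyError(f"expected {expected_seq}, found {len(stream)}")
        stream.append(event)


def current_name(events, initial="Bob"):
    # State is derived by replaying the events in order.
    name = initial
    for e in events:
        name = e.new_name
    return name
```

With two tabs that both loaded the stream at sequence 0, the first append succeeds and the second raises `ConcurrencyError`, so the stored order is the only order there is.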
Doing this, you can have a "write-side" which processes the updates in a strongly consistent manner and can respond to queries which only involve a single customer having seen every update to that point (the consistency boundary basically makes customer in this case an aggregate in domain-driven-design terms). "Read-sides" consuming events are eventually consistent: the latencies are typically fairly short: in this case your services sending emails are read-sides (as would be a hypothetical panel showing names of all customers), but the customer's view of their own data could be served by the write-side.
(The separation into read-sides and write-side (the pluralization is significant) is Command Query Responsibility Segregation, which sometimes gets interpreted as "reads can only be served by a read-side". This is not totally accurate: for one thing a write-side's model needs to be read in order for the write-side to perform its task of validating commands and synchronizing updates, so nearly any CQRS-using project violates that interpretation. CQRS should instead be interpreted as "serve reads from the model that makes the most sense and avoid overcomplicating a model (including that model in the write-side) to support a new read".)
I think I qualify to answer this, having extensively used debezium for simplifying the architecture.
I would prefer Option 2:
Every transaction always results in an event emitted in correct order
Options 1 and 3 have a corner case: what if the transaction succeeds, but the application fails to emit the event?
To your point:
Another related drawback is that CDC will stream all the updates in
the "customer" table regardless of the business meaning of the event.
So, in this case, i will probably need to build a Kafka Streams
application to convert the DB CDC events to business events and
decouple the DB structure from event structure.
I really don't think that is a roadblock. The benefit you get is that other use cases may crop up where another consumer of this topic wants to read other columns of the table.
Options 1 and 3 are only going to tie this to your core application logic, and that does you no favors from a simplification point of view. With option 2, with zero code changes to the core application APIs, a developer can independently work on the events, with no need to understand that core logic.
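The conversion step the question worries about can be quite small. The sketch below assumes a Debezium-style change envelope with `before`, `after` and `op` fields (`"u"` for update); the column names `id` and `name` and the shape of the business event are assumptions. Whether `before` is populated depends on the connector and database configuration, so it is checked.

```python
def to_business_events(change):
    """Map one row-level change on the customer table to business events."""
    before, after = change.get("before"), change.get("after")
    if change.get("op") != "u" or before is None or after is None:
        return []  # inserts, deletes and snapshot reads are ignored in this sketch
    events = []
    if before["name"] != after["name"]:
        events.append({
            "type": "CustomerChangeName",
            "customerId": after["id"],
            "oldName": before["name"],
            "newName": after["name"],
        })
    return events
```

Updates that touch other columns produce no business event here, which keeps the table structure decoupled from the event structure.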

Correct approach to model network interaction in Enterprise Architect

I have a class Actor whose instances send/receive network messages. (E.g. each instance of that class is part of a different process running on a different physical machine.) The network messages are serialized instances of classes MessageA and MessageB whose attributes are sent over the wire. An incoming message is handled by a callback method of my Actor class. An outgoing message is triggered by calling a method of my Actor class.
Hence, I started to model this situation in a class diagram like this:
The network messages are "signals" in EA terms, i.e. classes with a special stereotype (for succinctness the attributes are left out)
My Actor class is a usual class in EA with four corresponding methods
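A rough code rendering of the class described above, to make the four methods concrete. Only `sendMessageA` and `onMessageAReceived` are named in the text; the MessageB counterparts, the message attributes, the JSON encoding and the `dispatch` helper are assumptions for the sketch.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class MessageA:
    text: str = ""   # hypothetical attribute


@dataclass
class MessageB:
    value: int = 0   # hypothetical attribute


class Actor:
    def __init__(self, transport):
        self._transport = transport  # any callable taking bytes; stands in for the network
        self.received = []

    def sendMessageA(self, msg: MessageA) -> None:
        self._send("MessageA", msg)

    def sendMessageB(self, msg: MessageB) -> None:
        self._send("MessageB", msg)

    def onMessageAReceived(self, msg: MessageA) -> None:
        self.received.append(msg)

    def onMessageBReceived(self, msg: MessageB) -> None:
        self.received.append(msg)

    def _send(self, kind, msg) -> None:
        # Serialize the attributes and hand them to the transport; nothing is returned.
        self._transport(json.dumps({"kind": kind, "body": asdict(msg)}).encode())

    def dispatch(self, raw: bytes) -> None:
        # Route an incoming payload to the callback for its message type.
        data = json.loads(raw)
        if data["kind"] == "MessageA":
            self.onMessageAReceived(MessageA(**data["body"]))
        elif data["kind"] == "MessageB":
            self.onMessageBReceived(MessageB(**data["body"]))
```

The link between a payload type and its handler lives in `dispatch`, and the link between a send method and the payload it emits lives in `_send`; those are the two relationships the sequence diagram is asked to express.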
Now, I want to model a typical interaction and started to draw the following sequence diagram:
The messages are not method invocations, but are asynchronous and have kind "signal", which allows me to assign them the correct message type.
However, I wonder how I model
the fact that a message with payload MessageA is handled by onMessageAReceived
that method sendMessageA emits a message with payload MessageA
(Note: In terms of my implementation it is correct that sendMessageA returns void, because sending a network message is asynchronous, offloaded to the underlying OS, and the method returns to its caller after having sent the message.)
in the sequence diagram.
Maybe, my whole approach is completely wrong and I am trying to model something which cannot be modeled like that. In that case some pointers to the correct approach are highly welcome.
Of course there's more than one way to model this (and it does not depend on the tool EA). So you should ask which audience you are talking to and what their domain basically is.
Technical
A SD is well suited to show a physical transport. In that case you concentrate on the way how messages are sent. In this case you will have the physical operations shown as messages. E.g. using sockets, it would be some (a-)synchronous send(message) which assures that the content message is transported from A to B. This could be at any level of technical implementation from rough to single CRCs being sent (or how the operation is internally built to ensure packages are not lost).
Logical
In order to show a more logical aspect it's a good idea to have components (being deployed on multiple hardware) having ports (realizing some interface) along which you have an information flow (which is a connector you will find in EA) that can transport something (that is your message classes).
Overview
You might want to describe both aspects in your model. But likely you will have the focus on the one or other part depending on your overall domain.
There is no single way to model something. Models are always abstractions, which is why we create models. They shall show reality, but more lightweight.

How to store sagas’ data?

From what I read aggregates must only contain properties which are used to protect their invariants.
I also read sagas can be aggregates which makes sense to me.
Now I modeled a registration process using a saga: on a RegistrationStarted event it sends a ReserveEmail command, which will trigger an EmailReserved or EmailReservationFailed event depending on whether the email is free or not. A listener will then either send a validation link or a message telling that an account already exists.
I would like to use data from the RegistrationStarted event in this listener (say the IP and user-agent). How should I do it?
Storing these data in the saga? But they’re not used to protect invariants.
Pushing them through ReserveEmail command and the resulting event? Sounds tedious.
Project the saga to the read model? What about eventual consistency?
Another way?
Rinat Abdullin wrote a good overview of sagas / process managers.
The usual answer is that the saga has copies of the events that it cares about, and uses the information in those events to compute the command messages to send.
List[Command] processManager(List[Event] events)
Pushing them through ReserveEmail command and the resulting event?
Yes, that's the usual approach; we get a list [RegistrationStarted], and we use that to calculate the result [ReserveEmail]. Later on, we'll get [RegistrationStarted, EmailReserved], and we can use that to compute the next set of commands (if any).
Sounds tedious.
The data has to travel between the two capabilities somehow. So you are either copying the data from one message to another, or you are copying a correlation identifier from one message to another and then allowing the consumer to decide how to use the correlation identifier to fetch a copy of the data.
Storing these data in the saga? But they’re not used to protect invariants.
You are typically going to be storing events in the sagas (to keep track of what has happened). That gives you a copy of the data provided in the event. You don't have an invariant to protect because you are just caching a copy of a decision made somewhere else. You won't usually have the process manager running queries to collect additional data.
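A sketch of that shape, under the assumption that the saga simply keeps the events it has seen: the commands are computed from the event list, so the IP and user agent from RegistrationStarted are still at hand when the reservation result arrives. `SendValidationLink` and `SendAccountExistsNotice` are hypothetical command names; the event and `ReserveEmail` names come from the question, and all field names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistrationStarted:
    registration_id: str
    email: str
    ip: str
    user_agent: str


@dataclass(frozen=True)
class EmailReserved:
    registration_id: str


@dataclass(frozen=True)
class EmailReservationFailed:
    registration_id: str


@dataclass(frozen=True)
class ReserveEmail:
    registration_id: str
    email: str


@dataclass(frozen=True)
class SendValidationLink:  # hypothetical follow-up command
    email: str
    ip: str
    user_agent: str


@dataclass(frozen=True)
class SendAccountExistsNotice:  # hypothetical follow-up command
    email: str
    ip: str
    user_agent: str


def process_manager(events):
    """Compute the next commands from the events seen so far."""
    started = next((e for e in events if isinstance(e, RegistrationStarted)), None)
    if started is None:
        return []
    last = events[-1]
    if isinstance(last, RegistrationStarted):
        return [ReserveEmail(started.registration_id, started.email)]
    if isinstance(last, EmailReserved):
        return [SendValidationLink(started.email, started.ip, started.user_agent)]
    if isinstance(last, EmailReservationFailed):
        return [SendAccountExistsNotice(started.email, started.ip, started.user_agent)]
    return []
```

Nothing is threaded through ReserveEmail here; the data is copied from the stored RegistrationStarted event into the follow-up command.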
What about eventual consistency?
By their nature, sagas are always going to be "eventually consistent"; the "state" of an instance of a saga is just cached copies of data controlled elsewhere. The data is probably nanoseconds old by the time the saga sees it, there's no point in pretending that the data is "now".
If I understand correctly I could model my saga as a Registration aggregate storing all the events whose correlation identifier is its own identifier?
Udi Dahan, writing about CQRS:
Here’s the strongest indication I can give you to know that you’re doing CQRS correctly: Your aggregate roots are sagas.

EventStore: learning how to use

I'm trying to learn EventStore. I like the concept, but when I try to apply it in practice I keep getting stuck at the same point.
Let's see the code:
foreach (var k in stream.CommittedEvents)
{
//handling events
}
Two questions about that:
When an app starts up after some maintenance, how do we bookmark in a safe way which event to start reading from? Is there a pattern to use?
As soon as the events are all consumed, the cycle ends... what about the messages arriving at run time? I would expect the call to block until some new message arrives (of course this needs to be handled in a thread) or to have something like BeginRead/EndRead.
Do I have to bind an ESB to handle run-time events, or does EventStore provide some facility to do this?
I try to better explain with an example
Suppose the aggregate is a financial portfolio, and the application is an application showing that portfolio to a trader. Suppose the trader connect to the web app and he looks at his own portfolio. The current state will be the whole history, so I have to read potentially a lot of records to reproduce the status. I guess this could be done by a so called snapshot, but who's responsible for creating it? When one should choose to create an aggregate? How can one guess a snapshot for an aggregate exists ?
For the runtime part: as soon as the user looks at the reconstructed portfolio state, the real-time part begins to run. The user can place an order and a new position can be created by successfully executing that order in the market. How is the portfolio updated by the infrastructure? I would expect, but maybe I'm completely wrong, the same event stream to be the source of that new event (new long position), otherwise I have two paths handling the state of the same aggregate. I would like to know if this is how the strategy is supposed to work, even if it feels a little tricky to have two state agents that can possibly overlap.
Just to clarify how I fear the overlapping:
I know events has to be idempotent, so I know it must not be a
problem anyway,
But let's consider the following:
I subscribe to an event bus before streaming the events to update the state of the portfolio. Some "open position" events appear on the bus: I must handle them, but maybe the portfolio is not in the correct state to handle them since it is not yet up to date. Even if I'm able to handle such events, I will find them again when I read the stream.
More insidious: I open the stream, read all events and create a state. Then I subscribe to the bus: some messages arrive on the bus between the end of the stream reading and the beginning of the subscription: those events are missed and the aggregate is not in the correct state.
Please be patient all, my English is poor and the argument is tricky, hope I managed to share my doubt :)
The current state will be the whole history, so I have to read
potentially a lot of records to reproduce the status. I guess this
could be done by a so called snapshot, but who's responsible for
creating it?
In CQRS and event sourcing, queries are served by projections which are generated from events emitted by aggregates. You don't use the aggregate instance as reconstituted from the event store to display information.
The term snapshot refers specifically to an optimization of the event store which allows rebuilding the aggregate without replaying all of the events.
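As a minimal sketch of that optimization, assuming the snapshot is just the folded state plus the number of events it covers (the event fields `symbol` and `quantity` are made up for the portfolio example):

```python
def apply(state, event):
    """Fold one event into the portfolio state (positions per symbol)."""
    positions = dict(state)
    positions[event["symbol"]] = positions.get(event["symbol"], 0) + event["quantity"]
    return positions


def rebuild(events, snapshot=None):
    """Rebuild state from a snapshot (if any) plus the events recorded after it."""
    state, version = snapshot if snapshot else ({}, 0)
    for event in events[version:]:
        state = apply(state, event)
    return state, len(events)
```

Rebuilding with a snapshot gives the same result as replaying everything; the snapshot only shortens the replay, so it can be created at any time by any process that can read the stream.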
Projections are essentially event handlers which maintain a denormalized view of aggregates. Events emitted from aggregates are published, possibly out of band, and the projection subscribes to and handles those events. A projection can combine multiple aggregates if a requirement exists to display summary information, for instance. In case of a trading application, each view will typically contain data from various aggregates. Projections are designed in a consumer-driven way - application requirements determine the different views of the underlying data that are needed.
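A projection of this kind might look like the sketch below. The `checkpoint` field is an assumption added to address the bookmarking and overlap concerns from the question: the view remembers the position of the last event it applied and ignores anything at or before it. Event fields and the class name are hypothetical.

```python
class PortfolioView:
    """Denormalized read model kept up to date by handling events."""

    def __init__(self):
        self.rows = {}        # (trader, symbol) -> quantity
        self.checkpoint = -1  # position of the last event applied

    def handle(self, position, event):
        if position <= self.checkpoint:
            return False  # already applied, e.g. redelivered after a restart
        key = (event["trader"], event["symbol"])
        self.rows[key] = self.rows.get(key, 0) + event["quantity"]
        self.checkpoint = position
        return True
```

After a restart, reading would resume from `checkpoint + 1`. In a real system the checkpoint would presumably be persisted together with the view so the two cannot drift apart.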
With this type of workflow you have to embrace eventual consistency throughout your application. For instance, if an end user is viewing their portfolio and initiating new trades, the UI has to subscribe to updates to reflect updated projections in an asynchronous manner.
Take a look here for an overview of CQRS and event sourcing.

CQRS events do not contain details needed for updating read model

There is one thing about CQRS I do not get: How to update the read model when the raised event does not contain the details needed for updating the read model.
Unfortunately, this is a quite common scenario.
Example: I add a user to a group, so I send a addUserToGroup(userId, groupId) command. This is received, handled by the command handler, the userAddedToGroup event is created, stored and published.
Now, an event handler receives this event and both IDs. Now there shall be a view that lists all users with the names of the groups they're in. To update the read model for that view, we need the user id (which we have) and the group name (which we don't have, we only have its id).
So the question is: How do I handle this scenario?
Currently, four options come to my mind, all with their specific disadvantages:
The read model asks the domain. => Forbidden, and not even possible, as the domain only has behavior, no (public) state.
The read model reads the group name from another table in the read model. => Works, but what if there is no matching table?
Add the necessary data to the event. => Does not work, as this means that I would have to update all previous events as well, and I cannot foresee which data I may need one day.
Do not handle the event via a "usual" event handler, but start an ETL process in the background that deals with the event store, creates the necessary data and writes the read model. => Works, but to me this seems like way too much overhead for such a simple scenario.
So, the question is: How do I deal with this scenario correctly?
There are two common solutions.
1) "Event Enrichment" is where you indeed put information on the event that reflects the information you are mentioning, e.g. the group name. Doing this is somewhere between modeling your domain properly and cheating. If you know, for instance, that group names change, emitting the name at the moment of the change is not a bad idea. Imagine when you create a line item on a quote or invoice, you want to emit the price of the good sold on the invoice created event. This is because you must honor that price, even if it changes later.
2) Project several streams at once. Write a projector which watches information from the various streams and joins them together. You might watch user and group events as well as your user added to group event. Depending on the ordering of events in your system, you may know that a user is in a group before you know the name of the group, but you should know the general properties of your event store before you get going.
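A small sketch of the second solution: one projector consumes both group events and the userAddedToGroup event and joins them. The `groupCreated` and `groupRenamed` event names and all field names are assumptions. The join happens when the view is read, so a membership that arrives before the group's name is still resolved later.

```python
class UsersWithGroupsProjector:
    def __init__(self):
        self.group_names = {}   # group_id -> name
        self.memberships = {}   # user_id -> list of group_ids, in arrival order

    def handle(self, event):
        kind = event["type"]
        if kind in ("groupCreated", "groupRenamed"):   # hypothetical group events
            self.group_names[event["groupId"]] = event["name"]
        elif kind == "userAddedToGroup":
            self.memberships.setdefault(event["userId"], []).append(event["groupId"])

    def view(self):
        # Join at read time so a name that arrives late is still picked up.
        return {
            user: [self.group_names.get(g, "(unknown)") for g in groups]
            for user, groups in self.memberships.items()
        }
```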
Events don't necessarily represent a one-to-one mapping of the commands that have initiated the process in the first place. For instance, if you have a command:
SubmitPurchaseOrder
    Shopping Cart Id
    Shipping Address
    Billing Address
The resulting event might look like the following:
PurchaseOrderSubmitted
    Items (Id, Name, Amount, Price)
    Shipping Address
    Shipping Provider
    Our Shipping Cost
    Shipping Cost billed to Customer
    Billing Address
    VAT %
    VAT Amount
    First Time Customer
    ...
Usually the information is available to the domain model (either by being provided by the command or as being known internal state of the concerned aggregate or by being calculated as part of processing.)
Additionally the event can be enriched by querying the read model or even a different BC (e.g. to retrieve the actual VAT % depending on state) during processing.
You're correctly assuming that events can (and probably will) change over time. This basically doesn't matter at all if you employ versioning: Add the new event (e.g. PurchaseOrderSubmittedV2) and add an appropriate event handler to all the classes that are supposed to consume it. No need to change the old event; it can still be consumed since you don't modify the interface, you extend it. This basically comes down to a very good example of the Open/Closed Principle in practice.
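Versioning by extension could look roughly like this. The class names, the fields and the `ShippingLabelProjection` consumer are hypothetical; the point is only that the handler for the old event stays untouched while a second handler is added for the new one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PurchaseOrderSubmitted:
    order_id: str
    shipping_address: str


@dataclass(frozen=True)
class PurchaseOrderSubmittedV2:
    order_id: str
    shipping_address: str
    shipping_provider: str  # field added in the newer version


class ShippingLabelProjection:
    def __init__(self):
        self.labels = {}
        self._handlers = {
            PurchaseOrderSubmitted: self._on_v1,
            PurchaseOrderSubmittedV2: self._on_v2,
        }

    def handle(self, event):
        handler = self._handlers.get(type(event))
        if handler:
            handler(event)

    def _on_v1(self, e):
        # Old events carry no provider, so a placeholder is used.
        self.labels[e.order_id] = (e.shipping_address, "unspecified")

    def _on_v2(self, e):
        self.labels[e.order_id] = (e.shipping_address, e.shipping_provider)
```

Old events already in the store replay through `_on_v1` exactly as before, so nothing stored has to be rewritten.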
Option 2 would be fine; your question about "what if there is no matching table" for the group names in the read model wouldn't apply. No data should be deleted; it should only be invalidated when a previous event (say, delete group) was emitted. In the end the row in the groups table is effectively there and you can read the group name without any problem. The only apparent problem could be inconsistency caused by processing speed, but that's another issue: events should be processed in order no matter how fast they are being processed.