Composite unique constraint on business fields with Axon - cqrs

We leverage AxonIQ Framework in our system. We've run into a problem implementing a composite unique constraint based on aggregate business fields.
Consider the following aggregate:
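// Package names as of Axon Framework 4
import java.util.UUID;
import org.axonframework.modelling.command.AggregateIdentifier;
import org.axonframework.spring.stereotype.Aggregate;

@Aggregate
public class PersonnelCardAggregate {
    @AggregateIdentifier
    private UUID personnelCardId;
    private String personnelNumber;
    private Boolean archived;
}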
We want to avoid personnelNumber duplicates among NOT-archived (archived == false) records. At the same time, personnelNumber duplicates may exist among archived records.
A Query Side check does NOT seem to be an option. Given the eventually consistent nature of our system, more than one creation request with the same personnelNumber may be in flight at the same time, and the Query Side may lag behind.
What would the solution be?

What you're asking is an issue that can occur as soon as you start implementing an application along the CQRS paradigm and DDD modeling techniques.
The PersonnelCardAggregate in your scenario maintains the consistency boundary of a single "Personnel Card". You are, however, looking to expand this scope to enforce a uniqueness constraint across all Personnel Cards in your system.
I feel that this blog explains the problem of "Set Based Consistency Validation" you are encountering quite nicely.
I will not repeat the entire blog post, but the author sums it up as four options for resolving the problem:
1. Introduce locking, transactions and database constraints for your Personnel Card
2. Use a hybrid locking field prior to issuing your command
3. Rely on the eventually consistent Query Model
4. Re-examine the domain model
To be fair, option 1 won't do if you're using the Event-Driven approach to updating your Command and Query Model.
Option 3 has been pushed back by you in the original question.
Option 4 is something I cannot deduce for you given that I am not a domain expert, but I am guessing that the PersonnelCardAggregate does not belong to a larger encapsulating Aggregate Root. Maybe the business constraint you've stated, and with it the option to reuse a personnelNumber, could be dropped or adjusted? Like I said though, I cannot state this as a factual answer for you, as I am not the domain expert.
That leaves option 2, which in my eyes would be the most pragmatic approach too.
I feel this would require a cache on the command dispatching side to deal with commands arriving in quick succession, which addresses the eventual consistency issue. To catch the cases where a duplicate still slips through, I'd introduce some form of Event Handler that (1) knows the entire set of "PersonnelCards" from a personnelNumber/archived point of view and (2) can react to a faulty introduction by dispatching a compensating action.
You'd thus introduce some business logic on the event handling side of your application, which I'd strongly recommend keeping separate from the part of the application that updates your query models (as the use cases are entirely different).
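To make option 2 a bit more concrete, here is a minimal sketch in plain Java. It deliberately uses no Axon APIs, and PersonnelNumberRegistry, DuplicateDetector and their methods are hypothetical names made up for illustration. It also assumes a single command-dispatching node; with several nodes the claim map would have to live in a shared store that offers an atomic "insert if absent".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical command-side guard: remembers which non-archived card currently
// holds each personnelNumber. Illustrative only, not part of Axon.
class PersonnelNumberRegistry {
    private final Map<String, UUID> activeOwners = new ConcurrentHashMap<>();

    // Atomically claims the number before the create command is dispatched.
    // Returns true if the claim succeeded or the same card already holds it.
    boolean tryClaim(String personnelNumber, UUID cardId) {
        UUID owner = activeOwners.putIfAbsent(personnelNumber, cardId);
        return owner == null || owner.equals(cardId);
    }

    // Archiving a card frees its number for reuse by a new active card.
    void release(String personnelNumber, UUID cardId) {
        activeOwners.remove(personnelNumber, cardId);
    }
}

// Hypothetical event-side safety net: sees every created/archived event and
// reports the card that should be compensated when a duplicate slips through.
class DuplicateDetector {
    private final Map<String, UUID> activeByNumber = new HashMap<>();

    // Returns the id of the offending card if the number is already active.
    Optional<UUID> onCardCreated(UUID cardId, String personnelNumber) {
        UUID existing = activeByNumber.putIfAbsent(personnelNumber, cardId);
        if (existing != null && !existing.equals(cardId)) {
            return Optional.of(cardId); // caller dispatches a compensating command
        }
        return Optional.empty();
    }

    void onCardArchived(UUID cardId, String personnelNumber) {
        activeByNumber.remove(personnelNumber, cardId);
    }
}
```

The registry would be consulted right before dispatching the create command, while the detector would sit in its own event handler, separate from the handlers that update the query models.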
Concluding though, this is a difficult topic with several ways around it.
It's not so much an Axon specific problem by the way, but more an occurrence of modeling your application through DDD and CQRS.

Related

Check for object ownership with Prisma

I'm new to working with Prisma. One aspect that is unclear to me is the right way to check if a user has permission on an object. Let's assume we have Book and Author models. Every book has an author (one-to-many). Only authors have permission to delete books.
An easy way to enforce this would be this:
prismaClient.book.deleteMany({
  where: {
    id: bookId, // id is known
    author: {
      id: userId // id is known
    }
  }
})
But this way it's very hard to show an UnauthorizedError to the user. Instead, the response will be a 500 status code since we can't know the exact reason why the query failed.
The other approach would be to query the book first and check the author of the book instance, which would result in one more query.
Is there a best practice for this in Prisma?
Assuming you are using PostgreSQL, the best approach would be to use row-level security (RLS), but unfortunately it is not yet officially supported by Prisma.
There is a discussion about this subject here:
https://github.com/prisma/prisma/issues/5128
As things stand, in my opinion it is better to use an additional query and give users informative feedback than to use the single-query method you suggested and not know why nothing was deleted.
Eventually, it is up to you to decide based on your use case - whether or not it is important for you to know the reason for failure.
So this question is more generic than Prisma: it is also true when running updates/deletes in raw SQL.
When you have extra where clauses to check for ownership, it's difficult to infer which clause prevented the update from happening without running further queries.
You can achieve this with row-level security in Postgres, but even that does not come out of the box and involves custom configuration to throw specific exceptions when rows are not found due to row-level security rules. See this answer for more detail.
I tend to think that doing customised stuff like this is rarely worth the tradeoff, unless you need specialised UX for an uncommon circumstance.
What I would suggest instead in this case is to keep it simple and just use extra queries to check for ownership, but optimise optimistically for the case where the user does own the entity, and keep that common and legitimate use case to a single query.
That is, catch the exception from Prisma (or the fact that the update returns 0 rows, or whatever it is in different cases), and only then run a specific select for ownership, to check if that was the reason the update failed.
This is a nice tradeoff because it keeps things simple, and only runs extra queries in the (usually) far less common failure case.
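Since the pattern is not specific to Prisma, here is a rough sketch of it in plain Java. BookStore is a hypothetical in-memory stand-in for the books table, and DeleteResult and BookService are likewise made-up names; in a real application deleteOwned and exists would each be one database query.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the books table: bookId -> authorId.
class BookStore {
    private final Map<String, String> authorByBook = new HashMap<>();

    void add(String bookId, String authorId) {
        authorByBook.put(bookId, authorId);
    }

    // Mirrors "DELETE ... WHERE id = ? AND author_id = ?": returns rows affected.
    int deleteOwned(String bookId, String userId) {
        return authorByBook.remove(bookId, userId) ? 1 : 0;
    }

    boolean exists(String bookId) {
        return authorByBook.containsKey(bookId);
    }
}

enum DeleteResult { DELETED, NOT_FOUND, FORBIDDEN }

class BookService {
    private final BookStore store;

    BookService(BookStore store) {
        this.store = store;
    }

    DeleteResult deleteBook(String bookId, String userId) {
        // Common case: the caller owns the book, so one statement is enough.
        if (store.deleteOwned(bookId, userId) == 1) {
            return DeleteResult.DELETED;
        }
        // Rare case: nothing was deleted, so spend one extra query to find out why.
        return store.exists(bookId) ? DeleteResult.FORBIDDEN : DeleteResult.NOT_FOUND;
    }
}
```

The caller can then map FORBIDDEN to an UnauthorizedError and NOT_FOUND to a 404 instead of a generic 500.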
Even having said all that, the broader caveat as always is that probably the extra queries simply won't matter regardless! It's, in 99% of cases, probably best to just always run the extra ownership query upfront as a pattern to keep things as simple as possible, which is what you really want to optimise for over performance until you're running at significant scale.

CQRS projections, joining data from different aggregates via probe commands

In CQRS, when we need to create custom-tailored projections for our read models, we usually prefer "denormalized" projections (assume we are talking about projecting onto a DB). It is not uncommon for the information needed by the application/UI to come from different aggregates (possibly from different BCs).
Imagine we need a projected table to contain a customer's information together with her full address, and that Customer and Address are different aggregates in our system (possibly in different BCs). This means that addresses are generated and maintained independently of customers. In other words, when a new customer is created, there is no guarantee that an AddressCreatedEvent will subsequently be produced by the system; this event may already have been processed prior to the creation of the customer. All we have at the time of CreateCustomerCommand is a UUID of an existing address.
We have several solutions here.
Enrich CreateCustomerCommand and the subsequent CustomerCreatedEvent to contain full address of the customer (looking up this information on the fly from the UI or the controller). This way the projection handler will just update the table directly upon receiving CustomerCreatedEvent.
Use the addrUuid provided in CustomerCreatedEvent to perform an ad-hoc query in the projection handler to get the missing part of the address information before updating the table.
These are the commonly discussed solutions to this problem. However, as noted by many others, there are problems with each approach. Enriching events can be difficult to justify, as well described by Enrico Massone in this question, for example. Querying other views/projections (a kind of JOIN) will work but introduces coupling (see the same link).
I would like to describe another method here which, I believe, nicely addresses these concerns. I apologize beforehand for not giving proper credit if this is a known technique. Sincerely, I have not seen it described elsewhere (at least not as explicitly).
"A picture speaks a thousand words", as they say:
The idea is that:
We keep CreateCustomerCommand and CustomerCreatedEvent simple, with only the addrUuid attribute (no enriching).
In the API controller we send two commands to the command handler (aggregates). The first one, as usual, is CreateCustomerCommand, which creates the customer and projects the customer information together with addrUuid to the table, leaving the other columns (full address, etc.) empty for the time being. (Warning: see the update below; we may have a concurrency issue here and need to issue the probe command from a Saga.)
Right after this, once we have obtained the custUuid of the newly created customer, we issue a special ProbeAddrressCommand to the Address aggregate, triggering an AddressProbedEvent which encapsulates the full state of the address together with a special attribute probeInitiatorUuid, which is, of course, our custUuid from the previous command.
The projection handler then acts upon AddressProbedEvent by simply filling in the missing pieces of information in the table, looking up the required row by matching the provided probeInitiatorUuid (i.e. custUuid) and addrUuid.
So we have two phases: create Customer and probe for the related Address. They are depicted in the diagram with (1) and (2) correspondingly.
Obviously, we can send as many such "probe" commands (in parallel) as needed by our projection: ProbeBillingCommand, ProbePreferencesCommand, etc. effectively populating or "filling in" the denormalized projection with missing data from each handled "probe" event.
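To illustrate the two phases from the projection's point of view, here is a minimal sketch in plain Java. The in-memory map stands in for the projected table, and CustomerAddressRow, CustomerAddressProjection and the handler method names are hypothetical; only the event and attribute names come from the description above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// One denormalized row of the hypothetical customer/address read table.
class CustomerAddressRow {
    final UUID custUuid;
    final String name;
    final UUID addrUuid;
    String fullAddress; // stays null until the probe event arrives

    CustomerAddressRow(UUID custUuid, String name, UUID addrUuid) {
        this.custUuid = custUuid;
        this.name = name;
        this.addrUuid = addrUuid;
    }
}

class CustomerAddressProjection {
    private final Map<UUID, CustomerAddressRow> rows = new HashMap<>();

    // Phase 1: CustomerCreatedEvent carries only the address UUID.
    void onCustomerCreated(UUID custUuid, String name, UUID addrUuid) {
        rows.put(custUuid, new CustomerAddressRow(custUuid, name, addrUuid));
    }

    // Phase 2: AddressProbedEvent fills in the missing columns of the row
    // identified by probeInitiatorUuid, provided the address UUID matches.
    boolean onAddressProbed(UUID probeInitiatorUuid, UUID addrUuid, String fullAddress) {
        CustomerAddressRow row = rows.get(probeInitiatorUuid);
        if (row == null || !row.addrUuid.equals(addrUuid)) {
            return false; // row not projected yet, or probe for a different address
        }
        row.fullAddress = fullAddress;
        return true;
    }

    CustomerAddressRow find(UUID custUuid) {
        return rows.get(custUuid);
    }
}
```

Note that onAddressProbed returns false when the customer row does not exist yet, which is exactly the ordering problem the probe command has to respect.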
The advantage of this method is that we keep the commands/events in the first phase simple (only UUIDs of other aggregates), all the while avoiding synchronous coupling (joining) of the projections. The whole approach has a nice EDA feeling about it.
My question is then: is this a known technique? Seems like I have not seen this... And what can go wrong with this approach?
I would be more than happy to update this question with any references to other sources which describe this method.
UPDATE 1:
There is one significant flaw with this approach that I can see already: ProbeAddrressCommand must not be issued before the projection handler has had a chance to process CustomerCreatedEvent. But this is impossible to know from the API gateway (or controller).
The solution would probably involve a Saga, say CustomerAddressJoinProjectionSaga, which will start upon receiving CustomerCreatedEvent and only then issue ProbeAddrressCommand. The Saga will end upon registering AddressProbedEvent or, if many other aggregates are involved in probing, when all such events have been received.
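As a rough, framework-free sketch of how such a Saga could behave (plain Java, no saga library involved): the list of dispatched commands stands in for a command bus, and the method names are hypothetical. It also assumes that the projection consumes events in the order they were published, so that AddressProbedEvent is always seen after CustomerCreatedEvent.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

// Hypothetical, framework-free sketch of CustomerAddressJoinProjectionSaga.
class CustomerAddressJoinProjectionSaga {
    private final UUID custUuid;
    private final Set<String> pendingProbes = new HashSet<>();
    // Stands in for a command bus: records what would have been sent.
    private final List<String> dispatchedCommands = new ArrayList<>();

    // Started by CustomerCreatedEvent, i.e. only after the customer exists.
    CustomerAddressJoinProjectionSaga(UUID custUuid, UUID addrUuid) {
        this.custUuid = custUuid;
        dispatch("ProbeAddrressCommand", addrUuid);
    }

    // Sends one probe command and remembers that its answer is outstanding.
    private void dispatch(String commandName, UUID targetUuid) {
        pendingProbes.add(commandName);
        dispatchedCommands.add(commandName + ":" + targetUuid + ":" + custUuid);
    }

    // Each "probed" event ticks off one pending probe, but only if it was
    // triggered on behalf of this saga's customer.
    void onProbed(String commandName, UUID probeInitiatorUuid) {
        if (custUuid.equals(probeInitiatorUuid)) {
            pendingProbes.remove(commandName);
        }
    }

    // The saga ends once every probe has been answered.
    boolean isEnded() {
        return pendingProbes.isEmpty();
    }

    List<String> dispatchedCommands() {
        return dispatchedCommands;
    }
}
```

Further probes (billing, preferences, etc.) would simply be additional dispatch calls in the constructor, and the Saga would end only when all of them have been answered.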
So here is the updated diagram.
UPDATE 2:
As noted by Levi Ramsey (see answer below) my example is rather convoluted with respect to the choice of aggregates. Indeed, Customer and Address are often conceptualized as belonging together (same Aggregate Root). So it is a better illustration of the problem to think of something like Student and Course instead, assuming for the sake of simplicity that there is a straightforward relation between the two: a student is taking a course. This way it is more obvious that Student and Course are independent aggregates (students and courses can be created and maintained at different times and different places in the system).
But the question still remains: how can we obtain a projection containing the full information about a student (full name, etc.) and the courses she is registered for (title, credits, the instructor's full name, prerequisites, etc.), all in the same table, if the UI requires it?
A couple of thoughts:
I question why address needs to be a separate aggregate, much less in a different bounded context, in view of the requirement that customers have an address. If customer addresses are meaningful in some other bounded context (e.g. you want to know "which addresses have more customers" etc.), then that context can subscribe to the events from the customer service.
As an alternative, if there's a particularly strong reason to model addresses separately from customers, why not have the read side prospectively listen for events from the address aggregate and store the latest address for a given address UUID, in case there's a customer who ends up with that address? The reliability per unit effort of that approach is likely to be somewhat greater, I would expect.
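A minimal sketch of that second alternative in plain Java, assuming the read side receives both address and customer events. CustomerView and its method names are hypothetical, and the two maps stand in for read-side tables.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical read side that prospectively stores every address it hears about.
class CustomerView {
    private final Map<UUID, String> latestAddress = new HashMap<>();   // addrUuid -> full address
    private final Map<UUID, UUID> addressOfCustomer = new HashMap<>(); // custUuid -> addrUuid

    // Address events are recorded whether or not any customer uses the address yet.
    void onAddressChanged(UUID addrUuid, String fullAddress) {
        latestAddress.put(addrUuid, fullAddress);
    }

    void onCustomerCreated(UUID custUuid, UUID addrUuid) {
        addressOfCustomer.put(custUuid, addrUuid);
    }

    // Resolved from local state only; null if the address event has not arrived yet.
    String fullAddressOf(UUID custUuid) {
        UUID addrUuid = addressOfCustomer.get(custUuid);
        return addrUuid == null ? null : latestAddress.get(addrUuid);
    }
}
```

Because both kinds of events are simply stored, the result is the same whichever of the two arrives first, and later address changes are picked up without any probing.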

DDD, Event Sourcing, and the shape of the Aggregate state

I'm having a hard time understanding the shape of the state that's derived by applying an entity's events vs a projection of that entity's data.
Is an Aggregate's state ONLY used for determining whether or not a command can successfully be applied? Or should that state be usable in other ways?
An example - I have a Post entity for a standard blog post. I might have events like postCreated, postPublished, postUnpublished, etc. For the projections that I'll be persisting in my read tables, I need a projection for the base posts (which will include all posts, regardless of status, with lots of detail) as well as a published_posts projection (which will only represent posts that are currently published, with only the information necessary for rendering).
In the situation above, is my aggregate state ONLY supposed to be used to determine, for example, if a post can be published or unpublished, etc? If this is the case, is the shape of my state within the aggregate purely defined by what's required for these validations? For example, in my base post projection, I want to have a list of all users that have made a change to the post. In terms of validation for the aggregate/commands, I couldn't care less about the list of users that have made changes. Does that mean that this list should not be a part of my state within my aggregate?
TL;DR: yes - limit the "state" in the aggregate to that data that you choose to cache in support of data change.
In my aggregates, I distinguish two different ideas:
the history, aka the sequence of events that describes the changes in the lifetime of the aggregate
the cache, aka the data values we tuck away because querying the event history every time kind of sucks.
There's not a lot of value in caching results that we are never going to use.
One of the underlying lessons of CQRS is that we don't need aggregates everywhere.
An AGGREGATE is a cluster of associated objects that we treat as a unit for the purpose of data changes. -- Evans, 2003
If we aren't changing the data, then we can safely work directly with immutable copies of the data.
The only essential purpose of the aggregate is to determine what events, if any, need to be applied to bring the aggregate's state in line with a command (if the aggregate can be brought so in line). All state that's not needed for that purpose can be offloaded to a read-side, which can be thought of as a remix of the event stream (with each read-side only maintaining the state it needs).
That said, there are, in practice, reasons to use the aggregate state directly, the primary one being a desire for stronger consistency for the aggregate: CQRS is inherently eventually consistent. As with all questions of consistent updates, it's important to recognize that consistency isn't free and very often isn't even cheap; I tend to think of a project as having a consistency budget and I'm pretty miserly about spending it.
In your case, there's probably no reason to include the list of users changing a post in the aggregate state, unless e.g. there's something like "no single user can modify a given post more than n times".
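As a small illustration in plain Java (hypothetical class names, no event sourcing framework): the aggregate below caches only the published flag, because that is all its commands need, while the list of users who touched the post is derived on the read side from the very same events.

```java
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical event shape: the event type plus the user who caused it.
class PostEvent {
    final String type;
    final String userId;

    PostEvent(String type, String userId) {
        this.type = type;
        this.userId = userId;
    }
}

class PostAggregate {
    // The only cached value: it is all the publish/unpublish rules need.
    private boolean published;

    // Replaying history and applying new events go through the same method.
    void apply(PostEvent event) {
        if (event.type.equals("postPublished")) {
            published = true;
        } else if (event.type.equals("postUnpublished")) {
            published = false;
        }
    }

    // Command handler: returns the resulting event, or empty if the command is rejected.
    Optional<PostEvent> publish(String userId) {
        if (published) {
            return Optional.empty();
        }
        PostEvent event = new PostEvent("postPublished", userId);
        apply(event);
        return Optional.of(event);
    }
}

// Read side: a separate "remix" of the same events that tracks who changed the post.
class PostEditorsProjection {
    private final Set<String> editors = new LinkedHashSet<>();

    void on(PostEvent event) {
        editors.add(event.userId);
    }

    Set<String> editors() {
        return editors;
    }
}
```

If a rule such as "no single user can modify a given post more than n times" were introduced, a per-user counter would move into PostAggregate, since it would then be needed to validate commands.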

Recreate a graph that changes over time

I have an entity in my domain that represents a city's electrical network. Currently my model is an entity with a List of "Connectable" items: breakers, transformers, lines.
The network changes every time a breaker is opened or closed, a user changes connections, etc.
In all the CQRS examples I have seen, the EventStore is queried by Version and aggregateId.
Do you think I have to implement events only for the "network" aggregate, or also for every "Connectable" item?
In this case, when I have to replay all events to get the "actual" status (based on a date), I may have nearly 10,000-20,000 events to process.
Should an event modify a single property, or do I need an event that modifies a whole object (containing all of the object's properties)?
There's always an exception to the rule, but I think you need to have an event for every command handled in your domain. You can get around the problem of processing so many events by making use of snapshots.
http://thinkbeforecoding.com/post/2010/02/25/Event-Sourcing-and-CQRS-Snapshots
I assume you mean that currently your "connectable items" are part of the "network" aggregate and you are asking if they should be their own aggregates? That really depends on the nature of your system and problem, and is more of a DDD issue than simply a CQRS one. However, if the nature of your changes is typically to operate on the items independently of one another, then they should probably be aggregate roots themselves. Regardless, in order to answer that question we would need to know much more about the system you are modeling.
As for the challenge of replaying thousands of events, you certainly do not have to replay all your events for each command. Sure snapshotting is an option, but even better is caching the aggregate root objects in memory after they are first loaded to ensure that you do not have to source from events with each command (unless the system crashes, in which case you can rely on snapshots for quicker recovery though you may not need them with caching since you only pay the penalty of loading once).
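To show what loading from a snapshot amounts to, here is a minimal sketch in plain Java. BreakerEvent, NetworkState and NetworkLoader are hypothetical names, the network is reduced to a map of breaker states, and events are assumed to carry a strictly increasing version and to be supplied in order.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event: breaker breakerId was opened or closed at the given version.
class BreakerEvent {
    final long version;
    final String breakerId;
    final boolean closed;

    BreakerEvent(long version, String breakerId, boolean closed) {
        this.version = version;
        this.breakerId = breakerId;
        this.closed = closed;
    }
}

class NetworkState {
    final Map<String, Boolean> closedByBreaker = new HashMap<>();
    long version; // version of the last event applied, 0 for an empty network

    void apply(BreakerEvent event) {
        closedByBreaker.put(event.breakerId, event.closed);
        version = event.version;
    }

    // A snapshot is just a copy of the state at some version.
    NetworkState snapshot() {
        NetworkState copy = new NetworkState();
        copy.closedByBreaker.putAll(closedByBreaker);
        copy.version = version;
        return copy;
    }
}

class NetworkLoader {
    // Rebuilds the state from a snapshot, applying only events newer than it.
    static NetworkState load(NetworkState snapshot, List<BreakerEvent> orderedEvents) {
        NetworkState state = snapshot.snapshot();
        for (BreakerEvent event : orderedEvents) {
            if (event.version > state.version) {
                state.apply(event);
            }
        }
        return state;
    }
}
```

In a real event store you would only fetch the events after the snapshot's version rather than filter them in memory, and the loaded object could then be kept in an in-memory cache so that subsequent commands skip the load entirely.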
Now if you are distributing this system across multiple hosts or threads there are some other issues to consider but I think that discussion is best left for another question or the forums.
Finally you asked (I think) can an event modify more than one property of the state of an object? Yes if that is what makes sense based on what that event represents. The idea of an event is simply that it represents a state change in the aggregate, however these events should also represent concepts that make sense to the business.
I hope that helps.

"Life Beyond Transactions" Entity-Message-Activity Model in Practice?

Over vacation I read Pat Helland's "Life Beyond Transactions" (yes, vacation was that good :). To sum it up briefly, it advocates limiting the scope of transactions to a single entity and then using groups of "activities" that have the ability to update the entity or cancel a task anytime a change takes place that would make that task invalid.
(E.g. Shipping Order A requires some amount of Item 1. The Shipping Orders and Items are stored as entities and have their own activities. Shipping Order B ships with the last of Item 1 before A finishes. The activity for Item 1 cancels Shipping Order A.)
I had thought I was printing out the Dynamo paper, so forgive me if I conflate the two here. I've seen quite a few "NoSQL" projects influenced by Dynamo and BigTable, particularly in how they address entities by keys and partition data. I was wondering if this Entity-Message-Activity model has influenced any of them?
Or, to put it in more concrete terms, if I have an operation in HBase, Cassandra, Riak, etc. that spans multiple entities, do I need to implement an Activity all by myself (as more of a design pattern in the application), or is there some kind of existing framework? Or do they do something else completely that renders this entire question moot?
Thanks!
I can add my 2 cents here just from a Cassandra point of view (I haven't used the other NoSQL engines available). Cassandra is primarily designed to be a fast read-write structure. Twitter is a great use case for Cassandra (check the Twitter clone Twissandra for this).
Assuming I have understood your question correctly: yes you will have to implement the activity yourself. To understand the modeling of Column/SuperColumnFamilies I would suggest reading this great article WTF is a SuperColumn?
Cheers!