CQRS/Event Sourcing: How to enforce data integrity?

If I implement CQRS and Event Sourcing, how do I maintain the integrity and consistency of the data, assuming the final (read) storage of the data is in an RDBMS?
What if an event is published but the RDBMS rejects the data derived from it, because of a check violation or a missing FK reference?

CQRS implies at least two models: write and read. Both can be stored in the same db or in different dbs. With ES, you're using an event store, which can itself be implemented on top of an RDBMS (in .NET there is NEventStore, which AFAIK works with many databases, relational or not).
You're saying you have the read model in an RDBMS, that's great. Nothing needs to be enforced there, because it is the read model: nobody outside the model updater touches it. The app clients can ONLY query that model, never modify it. That's why you have two models in the first place, so that the Domain works with the 'write' model while the rest of the app works with the 'read' model.
Also, the RDBMS shouldn't really reject anything. An event handler should be idempotent, so if, let's say, the handler inserts something with an id that should be unique, the second invocation should simply ignore the unique constraint violation. With CQRS you're using the RDBMS constraints to support idempotency, not to implement business rules.
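A minimal sketch of that idempotency (using SQLite, with a hypothetical table and event shape, not anything from the original question):

```python
import sqlite3

# Hypothetical projection handler: the PRIMARY KEY constraint exists only to
# support idempotency, not to enforce a business rule.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE read_items (id TEXT PRIMARY KEY, name TEXT)")

def handle_item_created(event):
    """Insert the read-model row; a redelivered event becomes a no-op."""
    try:
        conn.execute("INSERT INTO read_items (id, name) VALUES (?, ?)",
                     (event["id"], event["name"]))
        conn.commit()
    except sqlite3.IntegrityError:
        pass  # this event was already processed: ignore the violation

event = {"id": "item-1", "name": "Widget"}
handle_item_created(event)
handle_item_created(event)  # second delivery of the same event

count = conn.execute("SELECT COUNT(*) FROM read_items").fetchone()[0]
```

The second invocation hits the unique constraint and is silently swallowed, so the read model ends up with exactly one row either way.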
Also, think of the read model as the 'throw-away' model, one that can be changed or rebuilt at any time.

Our read models are very simple. We don't have foreign keys in them, so that isn't possible. Why would you need foreign keys? You may have foreign key values, but you don't need constraints, as that is enforced by your domain model. You are really only reading, and you can rebuild the whole read store if required.

Related

Lagom persistent read side and model evolution

To learn Lagom I created a simple application with some simple persistent entities and a persistent read side (as per the official documentation, using Cassandra).
The official doc contains a section about model evolution, describing how to change the model. However, there is no mention of evolution when it comes to the read side.
Assuming I have an entity called Item, with an ID and a name, the read side creates a table like CREATE TABLE IF NOT EXISTS items (id TEXT, name TEXT, PRIMARY KEY (id))
I now want to change the item to include a description. This is trivial for the persistent entity, but the read side has to be changed as well.
I can see several approaches to (maybe) achieve that:
use a model evolution tool like liquibase or play evolutions to change the read side tables.
somehow include update table statements in createTables that migrate the model
create additional tables containing the additional information, and keep the old tables without modifications
Which approach would be the most fitting? Is there something better?
Creating a new table and dropping the old table is an option too IMHO.
It is as simple as modifying your "create table" command ("create table mytable_v2 ..." and "drop table mytable ..."), changing the offset name, and modifying your event handlers.
override def buildHandler(): ReadSideProcessor.ReadSideHandler[MyEvent] = {
  readSide.builder[MyEvent]("myOffset") // change it to "myOffset_v2"
  ...
}
This results in all events being replayed and your read-side table being reconstructed from scratch. This may not be an option if the current table is really huge, as the reconstruction may take a very long time.
Regarding what @erip says, I see it as perfectly normal to add a new column to your read-side table. Suppose there are lots of records in this table, listing all entities, and you want to retrieve a list of entities based on some criteria, so you need some columns to be included in the WHERE clause. Retrieving the list of all entities and asking each of them whether it complies with the criteria is not an option at all - it would be very inefficient, as it needs more time, memory and network usage.
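The replay-based rebuild described above can be sketched as follows (all names are hypothetical, and a plain dict stands in for the new Cassandra table):

```python
# Events that were persisted before and after the model change.
events = [
    {"type": "ItemCreated", "id": "1", "name": "hammer"},
    {"type": "DescriptionChanged", "id": "1", "description": "a claw hammer"},
]

items_v2 = {}  # stands in for the new read-side table with the extra column

def project(event):
    """The new projection, aware of the description column."""
    if event["type"] == "ItemCreated":
        items_v2[event["id"]] = {"name": event["name"], "description": None}
    elif event["type"] == "DescriptionChanged":
        items_v2[event["id"]]["description"] = event["description"]

# A fresh offset name makes the read-side processor start from the beginning,
# so every historical event flows through the new projection.
for e in events:
    project(e)
```

After the replay, rows created before the change carry the new column too, populated from whatever the event stream recorded.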
The point of a read-side is to materialize views from the entity state changes in your service's event stream. In this respect, you as the service owner can decide what is important for your subscribers to know about. This is handled by creating read-sides with an anti-corruption layer (ACL).
Typically your subscribers will subscribe to API events, which should experience no evolution. Your internal events (or impl events) will likely need to evolve; because of this, there should be a transformation from the impl events to the API events.
This is why it's very important to consider your domain very carefully before design: you really need to nail down what subscribers will need to know about. In the case of a description, it strikes me as unlikely that subscribers will need (or want!) to know about that.

When to use an owned entity types vs just creating a foreign key or adding the columns directly to the table?

I was reading about owned entity types here https://learn.microsoft.com/en-us/ef/core/modeling/owned-entities#feedback and I was wondering when I would use that. Especially when using .ToTable(); although I am not sure if ToTable creates a relationship with keys.
I read the entire article, so I understand that it essentially forces you to access the data via navigation properties and prevents the owned table from being treated as an entity. They also say Include() is not needed and the data comes down with every query for the parent table, so it's not like you are reducing the amount of data that comes back.
So what's the point exactly? Also, what's the point of "table splitting"?
It takes the place of complex types, with the option to set it up like a 1-1 relationship with ToTable, while being automatically eager-loaded. This would use the same PK in both tables, same as a 1-1 relationship.
The point of table splitting would be that you want an object model that is normalized where the table structure is not. This fits scenarios where you have an existing table structure and want to split off related pieces of that data into sub-entities associated with the main entity. With the ToTable option it would be similar to a 1-1 relationship, but automatically eager-loaded. However, when considering the reasons to use a 1-1 relationship, I would consider this option a bad choice.
The common reasons for using it in normal 1-1 relationships would include:
Splitting off expensive-to-load, rarely used data (images, binary, memo fields).
Encapsulating data particular to a single application off of a common entity. i.e. if I have a "Customer" which is used by a billing system vs. a CRM I might have "CustomerBillingData" and "CustomerCRMData" owned by "Customer" rather than an inherited BillingCustomer / CRMCustomer. As there is a "single" customer that may serve one or both systems. Billing doesn't care about CRM data, CRM doesn't care about Billing. If all data is in "Customer" then both systems potentially need to be updated, and I cannot rely on constraints when the data is optional to the other system. By using composition I can enforce required data for a particular system.
In neither of these cases would I want to use table splitting or anything that automatically eager-loads, so Owned Types with ToTable would not replace 1-1 relationships by any stretch. It's essentially a stricter version of complex types; I'd say it's strictly for entity organization. Not something I'd admit to wanting to use very often.
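The Customer/CustomerBillingData composition described above could be sketched like this (a hypothetical illustration in Python rather than EF Core, since the idea itself is language-neutral):

```python
from dataclasses import dataclass
from typing import Optional

# Data required by one system is grouped into an owned component instead of
# living as nullable columns on Customer itself.
@dataclass
class CustomerBillingData:
    billing_address: str  # required whenever billing knows the customer
    tax_id: str

@dataclass
class CustomerCRMData:
    account_manager: str  # required whenever the CRM knows the customer

@dataclass
class Customer:
    name: str
    billing: Optional[CustomerBillingData] = None
    crm: Optional[CustomerCRMData] = None

# A CRM-only customer: there are no half-filled billing fields to validate,
# because either the whole billing component exists or none of it does.
customer = Customer(name="ACME", crm=CustomerCRMData(account_manager="Pat"))
```

Composition makes each system's required data enforceable as a unit, which is the constraint you lose when everything is flattened onto one Customer record.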

CQS and updating an existing entity

I'm just trying to get my head around how one goes about updating an entity using CQS. Say the UI allows a user to update several properties of a particular entity, and on submit, in the back-end, an update command is created and dispatched.
The part I'm not quite understanding is:
does the cmd handler receiving the message from the dispatcher then retrieve the existing entity from the DB to then map the received stock item properties to then save? Or
is the retrieval of the existing item done prior to the dispatching of the cmd msg, to which it is then attached (the retrieved entity attached to cmd that is then dispatched)?
My understanding is that CQS allows for an easier transition to CQRS later on (if necessary)? Is that correct?
If that is the case, the problem with 2 above is that queries could be retrieved from a schema looking very different to that from the command/write schema. Am I missing something?
does the cmd handler receiving the message from the dispatcher then retrieve the existing entity from the DB to then map the received stock item properties to then save
Yes.
If you want to understand cqrs, it will help a lot to read up on ddd -- not that they are necessarily coupled, but because a lot of the literature on CQRS assumes that you are familiar with the DDD vocabulary.
But a rough outline of the responsibility of the command handler is
Load the current state of the target of the command
Invoke the command on the target
Persist the changes to the book of record
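A minimal sketch of that outline (all names hypothetical, with an in-memory repository standing in for the book of record):

```python
class StockItem:
    def __init__(self, item_id, name):
        self.item_id, self.name = item_id, name

    def rename(self, new_name):  # the command behavior on the target
        if not new_name:
            raise ValueError("name must not be empty")
        self.name = new_name

class InMemoryRepository:
    """Stands in for the book of record."""
    def __init__(self):
        self._items = {}
    def load(self, item_id):
        return self._items[item_id]
    def save(self, item):
        self._items[item.item_id] = item

class RenameStockItemHandler:
    def __init__(self, repository):
        self.repository = repository

    def handle(self, command):
        item = self.repository.load(command["item_id"])  # 1. load current state
        item.rename(command["new_name"])                 # 2. invoke the command
        self.repository.save(item)                       # 3. persist the changes

repo = InMemoryRepository()
repo.save(StockItem("sku-1", "old name"))
RenameStockItemHandler(repo).handle({"item_id": "sku-1", "new_name": "new name"})
```

Note that the handler maps command data onto state it loaded itself (option 1 from the question), rather than receiving a pre-fetched entity.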
My understanding is that CQS allows for an easier transition to CQRS later on (if necessary)?
That's not quite right - understanding Meyer's distinction between commands and queries makes the CQRS pattern easier to think about, but I'm not convinced that it actually helps with the transition all that much.
If that is the case, the problem with 2 above is that queries could be retrieved from a schema looking very different to that from the command/write schema. Am I missing something?
Maybe - queries typically run off of a schema that is optimized for query; another way of thinking about it is that the queries are returning different representations of the same entities.
Where things can get tricky is when the command representation and the query representation are decoupled -- aka eventual consistency. In a sense, you are always querying state in the past, but dispatching commands to state in the present. So you will need to have some mechanism to deal with commands that incorrectly assume the target is still in some previous state.
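One common such mechanism (a sketch; the answer above does not prescribe it) is optimistic concurrency: each command carries the version of the state it was based on, and the handler rejects commands built on stale state.

```python
# Hypothetical Account aggregate guarded by a version number.
class ConcurrencyError(Exception):
    pass

class Account:
    def __init__(self):
        self.balance = 100
        self.version = 0

    def withdraw(self, amount, expected_version):
        if expected_version != self.version:
            # the client queried state in the past; the present has moved on
            raise ConcurrencyError("command issued against stale state")
        self.balance -= amount
        self.version += 1

acct = Account()
acct.withdraw(30, expected_version=0)        # client saw version 0: accepted
try:
    acct.withdraw(30, expected_version=0)    # still assumes version 0: rejected
    rejected = False
except ConcurrencyError:
    rejected = True
```

The rejected command can then be retried against fresh state, or surfaced to the user as a conflict.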

New entity ID in domain event

I'm building an application with a domain model using CQRS and domain-events concepts (but no event sourcing, just plain old SQL). There was no problem with events of the SomethingChanged kind. Then I got stuck implementing SomethingCreated events.
When I create some entity which is mapped to a table with an identity primary key, I don't know the Id until the entity is persisted. The entity is persistence-ignorant, so when publishing an event from inside the entity, the Id is just not known - it's magically set only after calling context.SaveChanges(). So how/where/when can I put the Id in the event data?
I was thinking of:
Including a reference to the entity in the event. That would work inside the domain, but not necessarily in a distributed environment with multiple autonomous systems communicating by events/messages.
Overriding SaveChanges() to somehow update events enqueued for publishing. But events are meant to be immutable, so this seems very dirty.
Getting rid of identity fields and using GUIDs generated in the entity constructor. This might be the easiest, but it could hurt performance and make other things harder, like debugging or querying (where id = 'B85E62C3-DC56-40C0-852A-49F759AC68FB', no MIN, MAX etc.). That's what I see in many sample applications.
Hybrid approach - leave the identity alone and use it mainly for foreign keys and faster joins, but use a GUID as the unique identifier by which I pull entities from the repository in the application.
Personally, I like GUIDs for unique identifiers, especially in multi-user, distributed environments where numeric ids cause problems. As such, I never use database-generated identity columns/properties, and this problem goes away.
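That approach can be sketched like this (Python used as a neutral illustration of the .NET scenario; all names are hypothetical):

```python
import uuid

class Something:
    """Hypothetical entity that owns its identifier from birth."""
    def __init__(self, name):
        # generated client-side, so it is known before any persistence happens
        self.id = str(uuid.uuid4())
        self.name = name
        # the creation event can therefore carry the id immediately
        self.events = [{"type": "SomethingCreated", "id": self.id, "name": name}]

something = Something("example")
```

Because the id never depends on a database round-trip, the event data is complete the moment the entity exists.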
Short of that, since you are following CQRS, you undoubtedly have a CreateSomethingCommand and a corresponding CreateSomethingCommandHandler that actually carries out the steps required to create the new instance and persist the new object using the repository (via context.SaveChanges). I would raise the SomethingCreated event there rather than in the domain object itself.
For one, this solves your problem, because the command handler can wait for the database operation to complete, pull out the identity value, update the object, and then pass the identity in the event. More importantly, it also addresses the tricky question of exactly when the object is 'created'.
Raising a domain event in the constructor is bad practice, as constructors should be lean and simply perform initialization. Plus, in your model, the object isn't really created until it has an ID assigned. This means there are additional initialization steps required after the constructor has executed. If you have more than one such step, do you enforce the order of execution (another anti-pattern), or put a check in each to recognize when they are all done (ooh, smelly)? Hopefully you can see how this can quickly get out of hand.
So, my recommendation is to raise the event from the command handler. (NOTE: Even if you switch to GUID identifiers, I'd follow this approach because you should never raise events from constructors.)
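That recommendation could look roughly like this (a hypothetical sketch, with a fake persistence layer standing in for context.SaveChanges and a list standing in for the event bus):

```python
published = []  # stands in for the event bus

class Something:
    """Hypothetical domain object; the Id is unknown until persistence."""
    def __init__(self, name):
        self.name = name
        self.id = None

class FakeContext:
    """Stands in for the ORM context: identity ids appear only on save."""
    def __init__(self):
        self._next_id = 1
    def save(self, obj):
        obj.id = self._next_id
        self._next_id += 1

class CreateSomethingCommandHandler:
    def __init__(self, context, publish):
        self.context, self.publish = context, publish

    def handle(self, command):
        obj = Something(command["name"])
        self.context.save(obj)                     # wait for persistence...
        self.publish({"type": "SomethingCreated",  # ...then raise the event,
                      "id": obj.id,                # identity now available
                      "name": obj.name})
        return obj

handler = CreateSomethingCommandHandler(FakeContext(), published.append)
created = handler.handle({"name": "example"})
```

The event is only published once the database operation has completed, so it always carries the real identity and marks the moment the object is genuinely 'created'.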

Polymorphic association foreign key constraints. Is this a good solution?

We're using polymorphic associations in our application. We've run into the classic problem: we encountered an invalid foreign key reference, and we can't create a foreign key constraint because it's a polymorphic association.
That said, I've done a lot of research on this. I know the downsides of using polymorphic associations, and the upsides. But I found what seems to be a decent solution:
http://blog.metaminded.com/2010/11/25/stable-polymorphic-foreign-key-relations-in-rails-with-postgresql/
This is nice, because you get the best of both worlds. My concern is the data duplication. I don't have a deep enough knowledge of postgresql to completely understand the cost of this solution.
What are your thoughts? Should this solution be completely avoided? Or is it a good solution?
The only alternative, in my opinion, is to create a foreign key for each association type. But then you run into validating that only one association exists. It's a "pick your poison" situation. Polymorphic associations clearly describe intent, and also make this scenario impossible. In my opinion that is the most important thing. The database foreign key constraint is a behind-the-scenes feature, and altering "intent" to work around database limitations feels wrong to me. This is why I'd like to use the above solution, assuming there is no glaring "avoid this" problem with it.
The biggest problem I have with PostgreSQL's INHERITS implementation is that you can't set a foreign key reference to the parent table. There are a lot of cases where you need to do that. See the examples at the end of my answer.
The decision to create tables, views, or triggers outside of Rails is the crucial one. Once you decide to do that, then I think you might as well use the very best structure you can find.
I have long used a base parent table, enforcing disjoint subtypes using foreign keys. This structure guarantees only one association can exist, and that the association resolves to the right subtype in the parent table. (In Bill Karwin's slideshow on SQL antipatterns, this approach starts on slide 46.) This doesn't require triggers in the simple cases, but I usually provide one updatable view per subtype, and require client code to use the views. In PostgreSQL, updatable views require writing either triggers or rules. (Versions before 9.1 require rules.)
In the most general case, the disjoint subtypes don't have the same number or kind of attributes. That's why I like updatable views.
Table inheritance isn't portable, but this kind of structure is. You can even implement it in MySQL. In MySQL, you have to replace the CHECK constraints with foreign key references to one-row tables. (MySQL parses and ignores CHECK constraints.)
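The disjoint-subtype structure described above can be sketched like this (a hypothetical "parties" schema, demonstrated through SQLite for brevity):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE parties (
    party_id   INTEGER PRIMARY KEY,
    party_type TEXT NOT NULL CHECK (party_type IN ('person', 'org')),
    UNIQUE (party_id, party_type)      -- the target of the subtype FKs
);
CREATE TABLE people (
    party_id   INTEGER PRIMARY KEY,
    party_type TEXT NOT NULL CHECK (party_type = 'person'),
    surname    TEXT NOT NULL,
    FOREIGN KEY (party_id, party_type) REFERENCES parties (party_id, party_type)
);
CREATE TABLE orgs (
    party_id   INTEGER PRIMARY KEY,
    party_type TEXT NOT NULL CHECK (party_type = 'org'),
    org_name   TEXT NOT NULL,
    FOREIGN KEY (party_id, party_type) REFERENCES parties (party_id, party_type)
);
""")
conn.execute("INSERT INTO parties VALUES (1, 'person')")
conn.execute("INSERT INTO people VALUES (1, 'person', 'Smith')")
try:
    # party 1 is a person, so it cannot simultaneously be an org:
    conn.execute("INSERT INTO orgs VALUES (1, 'org', 'ACME')")
    disjoint = False
except sqlite3.IntegrityError:
    disjoint = True
```

Any other table can now reference parties(party_id) with an ordinary foreign key, which is exactly what PostgreSQL's INHERITS cannot offer.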
I don't think you have to worry about data duplication. In the first place, I'm pretty sure data isn't duplicated between parent tables and inheriting tables; it just appears that way. In the second place, duplication of derived data whose integrity is completely controlled by the DBMS is not an especially bitter pill to swallow. (But uncontrolled duplication is.)
Give some thought to whether deletes should cascade.
A publications example with SQL code.
A "parties" example with SQL code.
You cannot enforce that in a database in an easy way, so this is a really bad idea. The best solution is usually the simple one: forget about the polymorphic associations - they have the taste of an antipattern.