Are birth and death modeled as events or as attributes of a profile in genealogy? - database-schema

Are births and deaths modeled as events for a person in a genealogy profile, or as attributes of the person? What are the pros and cons of each approach?

If you consider that every event has artifacts to go with it, they really should be events, so you can have all of the documents, etc. associated with them.
On the other hand, can you imagine a person record that doesn't have birth/death dates as attributes? You wouldn't want to have to do a join with the birth/death events just to sort by those dates.
So there are the pros and cons, but there's also the option of having both. If you are willing to live with a database that isn't completely normalized, you can model them as events and, for each person with birth/death events, copy those values into the person's attributes.
Keep in mind, of course, that a person could have multiple birth/death events on record, possibly in conflict with each other, in which case only the one the user has marked as preferred would be copied into the person's birth/death attribute.
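A rough sketch of that hybrid approach (the record and field names here are only illustrative, not from any particular schema or product) might look like this:

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LifeEvent:
    person_id: int
    kind: str                   # "birth", "death", "baptism", ...
    date: Optional[str]         # e.g. "1832-04-17", or a fuzzy value like "ABT 1832"
    place: Optional[str] = None
    preferred: bool = False     # the one the user picked when several events conflict

@dataclass
class Person:
    person_id: int
    name: str
    events: list = field(default_factory=list)
    # denormalized copies of the preferred birth/death events, kept for easy sorting
    birth_date: Optional[str] = None
    death_date: Optional[str] = None

def refresh_denormalized_dates(person: Person) -> None:
    # copy the user's preferred birth/death events back into the person attributes
    for ev in person.events:
        if ev.preferred and ev.kind == "birth":
            person.birth_date = ev.date
        elif ev.preferred and ev.kind == "death":
            person.death_date = ev.date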

"Events" in genealogy (and in genealogy software) are generally considered something that takes place at a given time and place. They can be events for an individual, e.g. Birth, death, baptism, naturalization, emigration, etc., or for a family (husband/wife), e.g. Marriage, Engagement, Divorce.
"Attributes" (or "facts") are generally considered to be something that is true, e.g. Scholastic achievement,Tribal Origin, Occupation, Religious Affiliation, Title.
That is how GEDCOM defines them and how it tries to get programmers to model them.
Personally, my concept of an "event" is a transition from one state to another, e.g. going from before someone was born to being alive. It need not be a short period of time, but may take a long time, e.g. World War II was an event. And events can contain other events (e.g. the specific battles in World War II).
One more example is hair color, which is considered an attribute. But someone can be born with blond hair, have it fall out and be replaced with brown hair, and then as they get older it turns grey before falling out again. Hair colors are attributes that are true over a certain time, and are "fuzzy" around the event that changes one into another.
My concept of an "attribute" is that it has a time period to it. The attribute is the state, which can be changed by events, e.g. "Occupation" changes with the "getting fired" event, and "Unemployed" takes over until the "getting hired" event occurs.
So attributes are between events, and events separate different attributes.
What I am basically saying is that in my genealogy program, I really don't make a distinction between events and attributes. I treat them the same. Either may include a date or time period; events usually include a place, and attributes usually don't.
Because of their similarities, I don't see any need to model them separately.
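A minimal sketch of that unified treatment (the field names are mine, purely illustrative): a single record type where either kind of entry may carry a date or a time period, and a place is simply optional.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Fact:
    # one record type for both "events" and "attributes"
    person_id: int
    tag: str                         # "BIRT", "DEAT", "OCCU", "RELI", ...
    value: Optional[str] = None      # e.g. "Carpenter" for an occupation
    date_from: Optional[str] = None  # a point in time, or the start of a period
    date_to: Optional[str] = None    # same as date_from for point-in-time events
    place: Optional[str] = None      # usually present for events, often absent for attributes

# a birth (a point in time, with a place) and an occupation (a period, no place)
birth = Fact(person_id=1, tag="BIRT", date_from="1832-04-17", date_to="1832-04-17", place="Oslo")
job = Fact(person_id=1, tag="OCCU", value="Carpenter", date_from="1850", date_to="1861")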

Related

How to handle static data in ES/CQRS?

After reading dozens of articles and watching hours of videos, I don't seem to get an answer to a simple question:
Should static data be included in the events of the write/read models?
Let's take the oh-so-common "orders" example.
In all examples you'll likely see something like:
class OrderCreated(Event):
    ...

class LineAdded(Event):
    itemID: str
    itemCount: int
    itemPrice: float
But in practice, you will also have lots of "static" data (products, locations, categories, vendors, etc).
For example, we have a STATIC products table, with their SKUs, description, etc. But in all examples, the STATIC data is never part of the event.
What I don't understand is this:
Command side: should the STATIC data be included in the event? If so, which data? Should the entire "product" record be included? But a product also has a category and a vendor. Should their data be in the event as well?
Query side: should the STATIC data be included in the model/view? Or can/should it be JOINED with the static table when an actual query is executed?
If static data is NOT part of the event, then the projector cannot add it to the read model, which implies that the query MUST use joins.
If static data IS part of the event, then let's say we change something in the products table (e.g. typo in the item description), this change will not be reflected in the read model.
So, what's the right approach to using static data with ES/CQRS?
Should static data be included in the events of the write/read models?
"It depends".
First thing to note is that ES/CQRS are a distraction from this question.
CQRS is simply the creation of two objects where there was previously only one. -- Greg Young
In other words, CQRS is a response to the idea that we want to make different trade offs when reading information out of a system than when writing information into the system.
Similarly, ES just means that the data model should be an append only sequence of immutable documents that describe changes of information.
Storing snapshots of your domain entities (be that a single document in a document store, or rows in a relational database, or whatever) has to solve the same problems with "static" data.
For data that is truly immutable (the ratio of a circle's circumference and diameter is the same today as it was a billion years ago), pretty much anything works.
When you are dealing with information that changes over time, you need to be aware of the fact that the answer changes depending on when you ask it.
Consider:
Monday: we accept an order from a customer
Tuesday: we update the prices in the product catalog
Wednesday: we invoice the customer
Thursday: we update the prices in the product catalog
Friday: we print a report for this order
What price should appear in the report? Does the answer change if the revised prices went down rather than up?
Recommended reading: Helland 2015
Roughly, if you are going to need now's information later, then you need to either (a) write the information down now or (b) write down the information you'll need later to look up now's information (ex: id + timestamp).
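A hedged sketch of those two options (the class and field names are invented for illustration): either the event carries the price that was in effect when the order was accepted, or it carries enough (an id plus a timestamp) to look that price up later in a catalog that keeps price history.

from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal

@dataclass(frozen=True)
class LineAddedWithPrice:
    # option (a): write the information down now
    itemID: str
    itemCount: int
    itemPrice: Decimal              # the price in effect when the order was accepted

@dataclass(frozen=True)
class LineAddedWithReference:
    # option (b): write down what you need to look the price up later
    itemID: str
    itemCount: int
    pricedAt: datetime              # joined later against a catalog that keeps price history

def price_for_report(event, catalog_price_as_of):
    # catalog_price_as_of(item_id, timestamp) is an assumed lookup against a temporal catalog
    if isinstance(event, LineAddedWithPrice):
        return event.itemPrice
    return catalog_price_as_of(event.itemID, event.pricedAt)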
Furthermore, in a distributed system, you'll need to think about the implications when part of the system is unavailable (ex: what happens if we are trying to invoice, but the product catalog is unavailable? can we cache the data ahead of time?)
Sometimes, this sort of thing can turn into a complete tangle until you discover that you are missing some domain concept (the invoice depends on a price from a quote, not the catalog price) or that you have your service boundaries drawn incorrectly (Udi Dahan talks about this often).
So the "easy" part of the answer is that you should expect time to be a concept you model in your solution. After that, it gets context sensitive very quickly, and discovering the "right" answer may involve investigating subtle questions.

Event Sourcing - stream design

I am sitting here looking into CQRS and event sourcing, really interesting topics. When it comes to stream design and aggregate roots, I feel a bit left in the dark. How do you do it?
Let's imagine that I have a UI where I can add stuff to a basket, generating lines in the basket.
Would I have:
a stream per basket (with basic info attached, like shipping details, name, email, etc.)
a stream per basket line
So I would have many streams:
streams/basket-[basketid]
streams/basketline-[basketid]
Basically, I only send the minimal data over the wire.
Or would I simply have one stream
stream/basket-[basketid]
And every time I add a line to my basket, I send the whole basket over the wire.
As I understand it, it is best to have many streams rather than one big streams/basket stream. Or am I mistaken here as well?
My focus here is streams. Any "best practices" on this kind of design (links, books, etc.) would be appreciated.
How do you do it?
Start by watching All Our Aggregates are Wrong (Mauro Servienti, 2019), which considers the question of how many different aggregates you might need to represent a digital shopping cart.
I tend to think of aggregates as graphs of information - if two pieces of information must change together (A changes, and therefore B must also change RIGHT NOW; or A can't change, because its range of allowed values is constrained by B), then they belong to the same aggregate. The boundary of the aggregate separates information that is tightly coupled together from everything else.
Because distributed transactions are hard, it follows that we want our aggregates stored in such a way that changing an aggregate only requires holding one single lock. For example, we won't normally spread a single instance of an aggregate across multiple databases, because ensuring that all of the databases change in exactly the right way at the "same" time is really hard.
We normally store all of the information that is tightly coupled together in a single event stream for exactly the same reason: there's only a single lock to manage.
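As a small sketch of what that looks like (the stream naming and event shapes are my own assumptions): the basket is one aggregate, so all of its events land in a single stream, and adding a line appends one small event rather than sending the whole basket over the wire.

from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class BasketCreated:
    basket_id: str
    email: str

@dataclass(frozen=True)
class LineAdded:
    basket_id: str
    sku: str
    quantity: int
    unit_price: Decimal

def stream_name(basket_id: str) -> str:
    # one stream per basket aggregate, e.g. "basket-42"
    return f"basket-{basket_id}"

# adding a line is a single small event appended to the basket's one stream
events = [
    BasketCreated(basket_id="42", email="jane@example.com"),
    LineAdded(basket_id="42", sku="SKU-1", quantity=2, unit_price=Decimal("9.95")),
]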

DDD, Event Sourcing, and the shape of the Aggregate state

I'm having a hard time understanding the shape of the state that's derived by applying an entity's events vs. a projection of that entity's data.
Is an Aggregate's state ONLY used for determining whether or not a command can successfully be applied? Or should that state be usable in other ways?
An example - I have a Post entity for a standard blog post. I might have events like postCreated, postPublished, postUnpublished, etc. For my projections that I'll be persisting in my read tables, I need a projection for the base posts (which will include all posts, regardless of status, with lots of detail) as well as a published_posts projection (which will only represent posts that are currently published, with only the information necessary for rendering).
In the situation above, is my aggregate state ONLY supposed to be used to determine, for example, if a post can be published or unpublished, etc? If this is the case, is the shape of my state within the aggregate purely defined by what's required for these validations? For example, in my base post projection, I want to have a list of all users that have made a change to the post. In terms of validation for the aggregate/commands, I couldn't care less about the list of users that have made changes. Does that mean that this list should not be a part of my state within my aggregate?
TL;DR: yes - limit the "state" in the aggregate to that data that you choose to cache in support of data change.
In my aggregates, I distinguish two different ideas:
the history, aka the sequence of events that describes the changes in the lifetime of the aggregate
the cache, aka the data values we tuck away because querying the event history every time kind of sucks.
There's not a lot of value in caching results that we are never going to use.
One of the underlying lessons of CQRS is that we don't need aggregates everywhere:
An AGGREGATE is a cluster of associated objects that we treat as a unit for the purpose of data changes. -- Evans, 2003
If we aren't changing the data, then we can safely work directly with immutable copies of the data.
The only essential purpose of the aggregate is to determine what events, if any, need to be applied to bring the aggregate's state in line with a command (if the aggregate can be brought so in line). All state that's not needed for that purpose can be offloaded to a read-side, which can be thought of as a remix of the event stream (with each read-side only maintaining the state it needs).
That said, there are, in practice, reasons to use the aggregate state directly, the primary one being a desire for stronger consistency for the aggregate: the CQRS read side is inherently eventually consistent. As with all questions of consistent updates, it's important to recognize that consistency isn't free and very often isn't even cheap; I tend to think of a project as having a consistency budget, and I'm pretty miserly about spending it.
In your case, there's probably no reason to include the list of users changing a post in the aggregate state, unless e.g. there's something like "no single user can modify a given post more than n times".
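As an illustrative sketch (the event names follow your example; the rest is my own assumption), the aggregate's cached state is just the booleans it needs to accept or reject commands; the list of editing users would live only in the read-side projection.

from dataclasses import dataclass

@dataclass
class PostAggregate:
    # only what command validation needs: no title, no body, no list of editors
    exists: bool = False
    published: bool = False

    def apply(self, event: dict) -> None:
        kind = event["type"]
        if kind == "postCreated":
            self.exists = True
        elif kind == "postPublished":
            self.published = True
        elif kind == "postUnpublished":
            self.published = False

    def publish(self) -> list:
        # handle the "publish" command: decide which events, if any, to emit
        if not self.exists or self.published:
            return []                # or raise, depending on your taste
        return [{"type": "postPublished"}]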

CQRS events do not contain details needed for updating read model

There is one thing about CQRS I do not get: How to update the read model when the raised event does not contain the details needed for updating the read model.
Unfortunately, this is a quite common scenario.
Example: I add a user to a group, so I send an addUserToGroup(userId, groupId) command. This is received and handled by the command handler, and the userAddedToGroup event is created, stored, and published.
Now, an event handler receives this event and both IDs. Now there shall be a view that lists all users with the names of the groups they're in. To update the read model for that view, we need the user id (which we have) and the group name (which we don't have; we only have its id).
So the question is: How do I handle this scenario?
Currently, four options come to my mind, all with their specific disadvantages:
The read model asks the domain. => Forbidden, and not even possible, as the domain only has behavior, no (public) state.
The read model reads the group name from another table in the read model. => Works, but what if there is no matching table?
Add the necessary data to the event. => Does not work, as this means I would have to update all previous events as well, and I cannot foresee which data I may need one day.
Do not handle the event via a "usual" event handler, but start an ETL process in the background that works against the event store, creates the necessary data, and writes the read model. => Works, but to me this seems like way too much overhead for such a simple scenario.
So, the question is: How do I deal with this scenario correctly?
There are two common solutions.
1) "Event Enrichment" is where you indeed put information on the event that reflects the data you are mentioning, e.g. the group name. Doing this is somewhere between modeling your domain properly and cheating. If you know, for instance, that group names change, emitting the name at the moment of the change is not a bad idea. Imagine that when you create a line item on a quote or invoice, you want to emit the price of the goods sold on the "invoice created" event. This is because you must honor that price, even if it changes later.
2) Project several streams at once. Write a projector which watches information from the various streams and joins them together. You might watch user and group events as well as your user added to group event. Depending on the ordering of events in your system, you may know that a user is in a group before you know the name of the group, but you should know the general properties of your event store before you get going.
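A rough sketch of such a projector (all identifiers here are illustrative): it maintains its own small lookup of group names from groupCreated/groupRenamed events, so that when userAddedToGroup arrives it can write a denormalized row containing the name.

class UserGroupsProjector:
    # joins the group stream and the membership stream into one read-model table

    def __init__(self):
        self.group_names = {}        # group_id -> current name
        self.rows = []               # the denormalized view: (user_id, group_name)

    def handle(self, event: dict) -> None:
        kind = event["type"]
        if kind in ("groupCreated", "groupRenamed"):
            self.group_names[event["groupId"]] = event["name"]
        elif kind == "userAddedToGroup":
            name = self.group_names.get(event["groupId"], "<unknown group>")
            self.rows.append((event["userId"], name))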
Events don't necessarily map one-to-one to the commands that initiated the process in the first place. For instance, if you have a command:
SubmitPurchaseOrder
Shopping Cart Id
Shipping Address
Billing Address
The resulting event might look like the following:
PurchaseOrderSubmitted
Items (Id, Name, Amount, Price)
Shipping Address
Shipping Provider
Our Shipping Cost
Shipping Cost billed to Customer
Billing Address
VAT %
VAT Amount
First Time Customer
...
Usually the information is available to the domain model (either by being provided by the command, or as known internal state of the aggregate concerned, or by being calculated as part of processing).
Additionally, the event can be enriched by querying the read model or even a different BC (e.g. to retrieve the actual VAT % depending on state) during processing.
You're correctly assuming that events can (and probably will) change over time. This basically doesn't matter at all if you employ versioning: add a new event version (e.g. PurchaseOrderSubmittedV2) and add an appropriate event handler to all the classes that are supposed to consume it. There is no need to change the old event; it can still be consumed, since you don't modify the interface, you extend it. This is a very good example of the Open/Closed Principle in practice.
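A minimal sketch of that versioning style (the class names are invented for illustration): the old event keeps its shape and its handler, the new version is a separate type with the extra fields and its own handler, so consumers are extended rather than modified.

from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class PurchaseOrderSubmitted:        # v1, never changed
    order_id: str
    shipping_address: str

@dataclass(frozen=True)
class PurchaseOrderSubmittedV2:      # new version carrying the additional fields
    order_id: str
    shipping_address: str
    vat_percent: Decimal
    first_time_customer: bool

class InvoiceProjector:
    def handle(self, event) -> None:
        # extend, don't modify: a new branch for the new event type
        if isinstance(event, PurchaseOrderSubmitted):
            self._project_v1(event)
        elif isinstance(event, PurchaseOrderSubmittedV2):
            self._project_v2(event)

    def _project_v1(self, event) -> None:
        ...

    def _project_v2(self, event) -> None:
        ...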
Option 2 would be fine; your concern about a mismatch in the groups' name read-model table wouldn't apply. No data should be deleted, only invalidated, when a later event (say, "group deleted") is emitted. In the end, the row in the groups table is effectively still there and you can read the group name without any problem at all. The only apparent problem could be a difference in processing speed, but that's another issue; events should be processed in order, no matter how fast they are being processed.

Recreate a graph that changes over time

I have an entity in my domain that represents a city electrical network. Currently my model is an entity with a List that contains breakers, transformers, and lines.
The network changes every time a breaker is opened/closed, users can change connections, etc.
In all examples of CQRS the EventStore is queried with Version and aggregateId.
Do you think I have to implement events only for the "network" aggregate or also for every "Connectable" item?
In this case, when I have to replay all events to get the "actual" status (as of a given date), I can have nearly 10,000-20,000 events to process.
Should an event modify one property, or do I need an event that modifies an object (containing all properties of the object)?
There's always an exception to the rule, but I think you need to have an event for every command handled in your domain. You can get around the problem of processing so many events by making use of snapshots.
http://thinkbeforecoding.com/post/2010/02/25/Event-Sourcing-and-CQRS-Snapshots
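A small sketch of snapshotting (the store APIs here are hypothetical): rebuild the aggregate from the latest snapshot plus only the events recorded after it, instead of replaying all 10,000-20,000 events.

from dataclasses import dataclass

@dataclass
class Snapshot:
    aggregate_id: str
    version: int        # sequence number of the last event folded into this state
    state: dict

def apply(state: dict, event: dict) -> dict:
    # fold one event into the state (domain-specific); left as a placeholder here
    return state

def load_network(aggregate_id, snapshot_store, event_store):
    # snapshot_store.latest(...) and event_store.read(...) are hypothetical APIs
    snap = snapshot_store.latest(aggregate_id)
    state = dict(snap.state) if snap else {}
    version = snap.version if snap else 0
    for event in event_store.read(aggregate_id, after_version=version):
        state = apply(state, event)
        version += 1
    return state, version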
I assume you mean that currently your "connectable items" are part of the "network" aggregate and you are asking if they should be their own aggregates? That really depends on the nature of your system and problem and is more of a DDD issue than simply a CQRS one. However, if the nature of your changes is typically to operate on the items independently of one another, then they should probably be aggregate roots themselves. Regardless, in order to answer that question we would need to know much more about the system you are modeling.
As for the challenge of replaying thousands of events, you certainly do not have to replay all your events for each command. Sure, snapshotting is an option, but even better is caching the aggregate root objects in memory after they are first loaded, so that you do not have to source from events with each command (unless the system crashes, in which case you can rely on snapshots for quicker recovery, though you may not need them with caching since you only pay the loading penalty once).
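A sketch of that in-memory caching (again, the store API is hypothetical): load the root from its events once, keep it in a dictionary, and only replay when the instance isn't already cached.

class AggregateCache:
    # keeps rehydrated aggregate roots in memory so each one pays the loading cost once

    def __init__(self, event_store, factory):
        self._event_store = event_store   # hypothetical: .read(aggregate_id) -> iterable of events
        self._factory = factory           # builds an empty aggregate exposing .apply(event)
        self._roots = {}

    def get(self, aggregate_id):
        root = self._roots.get(aggregate_id)
        if root is None:
            root = self._factory()
            for event in self._event_store.read(aggregate_id):
                root.apply(event)
            self._roots[aggregate_id] = root
        return root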
Now, if you are distributing this system across multiple hosts or threads, there are some other issues to consider, but I think that discussion is best left for another question or the forums.
Finally, you asked (I think) whether an event can modify more than one property of the state of an object. Yes, if that is what makes sense based on what that event represents. The idea of an event is simply that it represents a state change in the aggregate; however, these events should also represent concepts that make sense to the business.
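For example (the event name and state fields are only illustrative), a single BreakerOpened event can legitimately change several properties of the network state at once:

from dataclasses import dataclass

@dataclass(frozen=True)
class BreakerOpened:
    breaker_id: str
    opened_at: str

def apply_breaker_opened(state: dict, event: BreakerOpened) -> None:
    # one event, several properties changed: the breaker's status, the set of
    # energized lines, and the timestamp of the last switching operation
    state["breakers"][event.breaker_id] = "open"
    state["energized_lines"] -= state["lines_fed_by"][event.breaker_id]
    state["last_switching_at"] = event.opened_at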
I hope that helps.