Relational database schema for event sourcing

I am trying to store domain events in a Postgres database. I am unsure about many things, and I don't want to redesign this structure later, so I am seeking guidance from people who have experience with event sourcing. I currently have the following table:
domain events
version - or event id; an integer sequence that helps to maintain order during replays
type - event type, probably the class name with its namespace
aggregate - aggregate id, probably a random string for each aggregate
timestamp - when the event occurred
promoter - the promoter of the event, probably a user id
details - JSON-encoded data about the event's properties
What I am not sure about:
Should I store the promoter of the domain event?
It could help to find a compromised account after a security breach, but
I don't know what to store when, for example, a cron job raises the event.
In what format should I store the event type?
Should I add a table with event types, or are the class names enough?
Should I add event groups?
I got confused by the definition of bounded contexts. As far as I know, every aggregate can have multiple bounded contexts, so I can use different aspects of a single aggregate in multiple modules. That sounds nice, since for example accounts can be related to many things, including authentication, authorization, user profile, user posts, user contracts, and so on...
What I am unsure about is whether a domain event can belong to multiple bounded contexts or just a single one. Should I store event contexts as well (for cases where I want to replay events related to a single context)?
How should I implement so many properties in a single aggregate class? Should I use some kind of composition?

1. Should I store the promoter of the domain event?
I think it is more flexible to store the promoter as part of the event payload instead of as metadata. Security concerns should be handled outside the domain. Not every event is raised by a user, although you could create a fake user for those cases (for example, a SysAdmin user for a cron job).
For example:
ManualPaymentMadeEvent { // store this object as details in your schema
    amount,
    by_user // developers can decide case by case whether to store the promoter
}
2. In what format should I store the event type?
Should I add a table with event types, or are the class names enough?
Should I add event groups?
I think class names are enough. Adding another table complicates reading events (it requires a join), and I think it only adds value when a class is renamed (you update one row in the event type table). But renaming without the extra table is not much trouble either:
update domain_events
set type = 'new class name'
where type = 'original class name';
I'm not sure that I understand event groups. Could you add more explanation?
3. What I am unsure about is whether a domain event can have multiple bounded contexts or just a single one, so should I store event contexts as well?
Sometimes events are used to integrate multiple contexts, but each event is raised in only one context. For example, a ManualPaymentMadeEvent is raised in the ordering context, and an event listener in the shipping context also consumes it, treating it as the trigger to start shipping.
I prefer to use one database user (an Oracle term; the closest PostgreSQL equivalent is a schema) per context: shipping.domain_events for the shipping context and ordering.domain_events for the ordering context.
Here is the schema used by axon-framework, which might help:
create table DomainEventEntry (
    aggregateIdentifier varchar2(255) not null,
    sequenceNumber number(19,0) not null,
    type varchar2(255) not null, -- aggregate class name
    eventIdentifier varchar2(255) not null,
    metaData blob,
    payload blob not null, -- details
    payloadRevision varchar2(255),
    payloadType varchar2(255) not null, -- event class name
    timeStamp varchar2(255) not null
);
alter table DomainEventEntry
    add constraint PK_DomainEventEntry primary key (aggregateIdentifier, sequenceNumber, type);
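One effect of the composite primary key above is that two writers cannot both store the same sequence number for the same aggregate, which works as an optimistic concurrency check. The following in-memory Java sketch imitates that behaviour so the idea can be tried without a database. All class and method names (StoredEvent, InMemoryEventStore, append, load) are hypothetical and are not part of Axon's API; a real store would rely on the database constraint instead of a map.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical row type mirroring a few columns of DomainEventEntry.
final class StoredEvent {
    final String aggregateType;
    final String aggregateId;
    final long sequenceNumber;
    final String payloadType; // event class name
    final String payload;     // serialized details, e.g. JSON

    StoredEvent(String aggregateType, String aggregateId, long sequenceNumber,
                String payloadType, String payload) {
        this.aggregateType = aggregateType;
        this.aggregateId = aggregateId;
        this.sequenceNumber = sequenceNumber;
        this.payloadType = payloadType;
        this.payload = payload;
    }
}

// Hypothetical in-memory stand-in for the table and its primary key.
final class InMemoryEventStore {
    // The map key plays the role of (aggregateIdentifier, sequenceNumber, type).
    private final Map<String, StoredEvent> rows = new ConcurrentHashMap<>();

    private static String key(String type, String id, long sequenceNumber) {
        return type + "|" + id + "|" + sequenceNumber;
    }

    // Appends an event; fails if this aggregate already has that sequence number.
    void append(StoredEvent event) {
        String key = key(event.aggregateType, event.aggregateId, event.sequenceNumber);
        if (rows.putIfAbsent(key, event) != null) {
            throw new IllegalStateException("Sequence number already used: " + key);
        }
    }

    // Returns one aggregate's events ordered by sequence number, ready for replay.
    List<StoredEvent> load(String aggregateType, String aggregateId) {
        List<StoredEvent> events = new ArrayList<>();
        for (StoredEvent e : rows.values()) {
            if (e.aggregateType.equals(aggregateType) && e.aggregateId.equals(aggregateId)) {
                events.add(e);
            }
        }
        events.sort(Comparator.comparingLong((StoredEvent e) -> e.sequenceNumber));
        return events;
    }
}
```

A writer that loaded an aggregate at sequence number n would append its new event with n + 1; if another writer got there first, the append fails and the command can be retried against the fresh state.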

Related

REST Design Dynamo Table Design - pros/cons of giving a single child the same partition key as parent

I'd like to model parent/sole-child relationships in a RESTful manner.
I have a User table with partition key userId formatted as usr_############. This userId is readily available in my front-end at all times.
I need to associate to each User the following child resource tables: Profile and Settings. The relationships are 1-to-1. I could make a single table, but I prefer to keep them separate.
My question is how best to index and link the tables in a RESTful manner.
One option is to give each sole child the same partition key as its parent. I can then get the user's information as follows:
/users/usr_abcdefg123456
/profiles/usr_abcdefg123456
/settings/usr_abcdefg123456
The issue with this is my keys are formatted to have the object type as a prefix (e.g., usr_) and it is confusing to call /profiles/{profileId} with profileId="usr_###...". Also it violates the principle that each resource should have a distinct identifier. Imagine in the future I need an array of indexes for a mixed group of objects.
A second option is to make a separate partition key (e.g., profileId, settingsId) and have an ownership attribute/global index ownerId for each child. Since I would not know these new partition keys beforehand (I only have access to userId), my endpoints would have to be either:
/profiles/me
/settings/me
(not ideal because "me" is not a resource.)
/profiles?ownerId=usr_abcdefg123456
/settings?ownerId=usr_abcdefg123456
(not ideal because /profiles and /settings return collections (lists) and not a single object)
/users/usr_abcdefg123456/profile
/users/usr_abcdefg123456/settings
(not ideal because it is nested and I would have to create 2 additional REST endpoints)
Is there a better way to do this?
Your help is greatly appreciated.

PostgreSQL Array Contains vs JOIN (performance)

I have a model in which a person can receive a gift for attending one event, or receive multiple gifts for attending multiple events. In both cases, the gift or gifts given to the person are considered one transaction. I'm using PostgreSQL to implement this model.
For example,
if you attend a certain event, you will receive a gift (a single transaction of one gift to a person).
As another example, if you attend a set of events, you receive a set of gifts for these events (in a single transaction of gifts to a person).
So, in the majority of cases, only one gift to one person will be transacted, but there will be a few cases like the second example.
To handle these cases, I have two options:
the first is to use a Postgres array and query with array contains,
and the second is to create a new transaction_events table and join against it to query by event.
I want to know which option performs better and which option the community recommends, taking into account that most transactions will contain only one event and that I cannot change the transactions model.

The second option will perform better, and it has the added benefit that you can have foreign key constraints to enforce data integrity.
In most cases, it is a good idea to avoid composite types like arrays or JSON in the database.

Drools - Store Multi Stateful Sessions

We have implemented the Drools engine in our platform in order to evaluate rules against streams.
In our use case we have a change detection stream which contains the changes of multiple entities.
Rules need to be evaluated for each entity from the stream over a period of time, and each entity's state must evolve separately from that of the other entities (sessions). Those rules produce alerts based on the state of each entity. For this reason the entities should be isolated, so that the state of one entity does not interfere with the others.
To achieve this, we create a session as a Spring Bean for each entity id and store it in an in-memory HashMap. Every time an entity arrives, we try to find its session in the in-memory map by using its id. If the lookup returns null, we create the session.
This doesn't seem to be the right way to accomplish it, because it offers neither a disaster recovery strategy nor good memory management.
We could use some kind of in-memory database such as Redis or Memcached, but I don't think it would be able to recover a stateful session precisely.
Does someone know the right way to achieve disaster recovery and good memory management with embedded Drools and multiple sessions? Does the platform offer a solution?
Thanks very much for your attention and support

The answer is not to try to persist and reuse sessions, but rather to persist an object that models the current state of the entity.
Your current workflow is this:
1. Entity arrives at your application (from the change detection stream or elsewhere)
2. You do a lookup on a hashmap to get a Session which has the entity's state stored
3. You fire the rules, which update the session (and possibly the entity)
4. You persist the session in-memory.
What your workflow should be is this:
1. (same) Entity arrives at your application
2. You do a look-up on an external data source for the entity's state -- for example from a database or data store
3. You fire the rules, passing in the entity state. Instead of updating the session, you update the state instance.
4. You persist the state to your external data source.
If you add appropriate write-through caches, you can get both good performance and consistency. This also allows you to scale your application horizontally, provided you implement appropriate locking / transaction handling for your data source.
Here's a toy example.
Let's say we have an application modelling a Library where a user is allowed to check out books. A user is only allowed to check out a total of 3 books at a time.
The 'event' we receive models a book check-in or check-out event:
class BookBorrowEvent {
    int userId;
    int bookId;
    EventType eventType; // EventType.CHECK_IN or EventType.CHECK_OUT
    // getters omitted for brevity
}
In an external data source we maintain a UserState record -- maybe as a distinct record in a traditional RDBMS or an aggregate; how we store it isn't really relevant to the example. But let's say our UserState record as returned from the data source looks something like this:
class UserState {
    int userId;
    int[] borrowedBookIds;
}
When we receive the event, we'll first retrieve the user state from the external data store (or an internally-managed write-through cache), then add the UserState to the rule inputs. We should be appropriately handling our sessions (disposing of them after use, using session pools as needed), of course.
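public void handleBookBorrow(BookBorrowEvent event) {
    // Load the current state from the external store (or a write-through cache).
    UserState state = getUserStateFromStore(event.getUserId());
    KieSession kieSession = ...;
    try {
        kieSession.insert( event );
        kieSession.insert( state );
        kieSession.fireAllRules();
    } finally {
        kieSession.dispose(); // sessions are short-lived; release resources after use
    }
    // Persist whatever the rules changed on the state object.
    persistUserStateToStore(state);
}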
Your rules would then do their work against the UserState instance, instead of storing values in local variables.
Some example rules:
rule "User borrows a book"
when
    BookBorrowEvent( eventType == EventType.CHECK_OUT,
                     $bookId: bookId != null )
    $state: UserState( $checkedOutBooks: borrowedBookIds not contains $bookId )
    Integer( this < 3 ) from $checkedOutBooks.length
then
    modify( $state ) { ... }
end

rule "User returns a book"
when
    BookBorrowEvent( eventType == EventType.CHECK_IN,
                     $bookId: bookId != null )
    $state: UserState( $checkedOutBooks: borrowedBookIds contains $bookId )
then
    modify( $state ) { ... }
end
This is obviously a toy example, but you could easily add rules for other cases: the user attempts to check out a duplicate copy of a book, the user tries to return a book that they hadn't checked out, an error is returned if the user exceeds the limit of 3 borrowed books, time-based logic limits the length of a checkout, and so on.
Even if you were using stream-based processing so you can take advantage of the temporal operators, this workflow still works because you would be passing the state instance into the evaluation stream as you receive it. Of course, in this case it would be more important to properly implement a write-through cache for performance reasons (unless your temporal operators are permissive enough to allow for some data source transaction latency). The only change you need to make is to refocus your rules so that they persist data to the state object instead of to the session itself. Persisting data in the session isn't generally recommended anyway, since sessions are designed to be disposed of.
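The write-through cache mentioned above can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not a production cache: the class name WriteThroughCache and the loader/writer callbacks are hypothetical, there is no eviction or expiry, and the backing store is whatever the two callbacks talk to (a database, a key-value store, and so on).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical write-through cache: reads fall back to the store on a miss,
// writes always go to the store before the cached copy is updated.
final class WriteThroughCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;   // reads a value from the backing store
    private final BiConsumer<K, V> writer; // writes a value to the backing store

    WriteThroughCache(Function<K, V> loader, BiConsumer<K, V> writer) {
        this.loader = loader;
        this.writer = writer;
    }

    // Returns the cached value, loading it from the store on the first access.
    // If the store has no value, nothing is cached and null is returned.
    V get(K key) {
        return cache.computeIfAbsent(key, loader);
    }

    // Writes to the store first, so the cache never holds data the store lacks.
    void put(K key, V value) {
        writer.accept(key, value);
        cache.put(key, value);
    }
}
```

In the handler above, getUserStateFromStore and persistUserStateToStore could delegate to get and put on such a cache keyed by user id. With several application instances, the cached copy can go stale, which is where the locking / transaction handling mentioned earlier comes in.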

Why is a GUID used as the type for the Id fields in the EventSources and Events tables in an EventStore database?

Why is uniqueidentifier (which equates to a GUID in .NET) used as the type for the Id fields in the EventSources and Events tables?
Would it not be faster to use an integer type (like bigint in SQL Server) that functioned as an identity, so that the database could assign the Id as the inserts are performed?
I am a complete newb when it comes to Event Sourcing and CQRS, so I apologize if this has been asked and answered and my searching hasn't been correct enough to find the answer.

Note: Answers 2 and 4 assume that you are following a few basic principles of Domain-Driven Design.
1. IDs should be unique across different aggregate types and even across different bounded contexts.
2. Every aggregate must always be in a valid state, and having a unique ID is part of that. With a database-assigned ID, you couldn't do anything with the aggregate until the initial event had been stored and the database had generated and returned the respective ID.
3. A client that sends the initial command often needs some kind of reference to relate the created aggregate to. How would the client know which ID has been assigned? Remember, commands are void: they don't return anything beyond ack/nack, not even an ID.
4. A domain concern (identification of an entity) would heavily rely on technical implementation details. (This should actually be #1.)
Accessing a class that is related to two other classes

Given the following tables: User, Trial, UserTrial. A User has multiple Trials; a Trial does not internally map to any Users and contains details about the trial (name, description, settings); and a UserTrial contains information specific to an instance of a User's trial (expiration date, for example). What would be the proper way for the controller of an MVC application to access data about a UserTrial?
Additional Details
There is no ORM
Each class is dual-purposed to be useable to create new, or load existing Users, Trials, or UserTrials. The constructor loads data when passed an ID and persists it with the method ->save()
It would seem that there are 2 options:
1
User.SetTrial()
User.GetUserTrial()
2
UserTrial.SetUser()
UserTrial.SetTrial()
UserTrial.GetSomeData()
Which is the most appropriate usage?

I don't think your option 1 will work, because if each User can have multiple Trials, then you'd need something like User.AddTrial(Trial), User.RemoveTrial(Trial), User.GetUserTrials().
Which design option you choose depends on whether you want to make UserTrial objects "first class" or not. Do you want to consider Users and Trials to be the primary objects with UserTrial objects just glue to hold the relations, or do you want UserTrial objects to be primary as well? If the former, you'll want something like your option 1; if the latter, you'll want something like your option 2.
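As a rough illustration of the second choice, here is a minimal Java sketch in which UserTrial is a first-class object that carries its own data and behaviour. The field names and the isExpiredOn method are assumptions made for the example (the question only mentions an expiration date), and loading and saving are left out.

```java
import java.time.LocalDate;

// Hypothetical first-class UserTrial: it references the user and the trial by id
// and owns the data that is specific to this user's instance of the trial.
final class UserTrial {
    final int userId;
    final int trialId;
    final LocalDate expiresOn;

    UserTrial(int userId, int trialId, LocalDate expiresOn) {
        this.userId = userId;
        this.trialId = trialId;
        this.expiresOn = expiresOn;
    }

    // The trial is still valid on its expiration date and expired the day after.
    boolean isExpiredOn(LocalDate today) {
        return today.isAfter(expiresOn);
    }
}
```

With this shape the controller works with UserTrial directly, and User and Trial do not need to know about each other.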
