I am working for a school district, and we are planning to use Drools to implement the following types of rules for the student population of the district's constituent schools:
If a student has 3 absences during a year, their attendance metric moves to a WARN status.
If a student has 6 absences during a year, their attendance metric moves to a CRITICAL status.
If a student has 3 major behavior incidents during a year, their behavior metric moves to a WARN status.
If a student has 2 minor and 2 major behavior incidents during a year, their behavior metric moves to a CRITICAL status.
...these are just examples off the top of my head, but there are many more rules of a similar nature.
All of these rules can be expressed simply using Drools Expert. Also, the processing of the rules for a student does not need to be synchronous. I have a couple of questions about the best way to implement this.
From one standpoint this could be viewed as a monitoring system for a stream of events. This made me think of creating a stateful session into which each new event would be inserted. However, the events happen over the course of 9 months and are relatively infrequent. Also, we could build a session per school, or a session per student.
Would keeping a session in memory for that long be a problem?
If the server failed, would we need to rebuild the session state from scratch, or would it be advisable to take regular snapshots and just reapply the facts that occurred since the time of the snapshot?
Another option would be to persist a session for each student after an event is processed for that student. When the next event comes in we would retrieve their session from storage and insert the new fact. This way we wouldn't need to retrieve all the facts for each run of the engine to get the student's status. Would a configuration like this be supported? Are there any cons to doing this?
A third approach would be to respond to a new fact for a student by retrieving all the other facts the rules need, creating a new KnowledgeSession, and running the rules.
Any advice on what might be the best approach would be greatly appreciated.
Dave
I would go with solution number 2: one session per student. Given the fact that you are not going to be interacting too much with the session, I would keep it in a db and only restore it when needed: a new absence/incident arrives, the session for that student is restored from db, the facts are inserted, the rules are executed and the resulting status is retrieved.
The main disadvantage I see with this scenario is that creating rules about more than one student is not straightforward, and you have to feed your facts to more than one session. For example, suppose you want to raise an alert if you have more than 10 students with CRITICAL status in a single class. In this case, a session per class would be enough. So, as you can see, you have to decide what is better for you. But no matter the 'unit' you choose (school, class, student), I would still recommend the execution flow I mentioned earlier.
Drools already comes with support for database persistence using JPA. You can find more information about this feature here: http://docs.jboss.org/drools/release/5.5.0.Final/drools-expert-docs/html_single/#d0e3961
The basic idea is that instead of creating your ksessions using kbase.newStatefulKnowledgeSession(), you use the helper class called JPAKnowledgeService. This class will return a wrapper around a StatefulKnowledgeSession that persists its state after each method invocation. In this class you will find 2 important methods: newStatefulKnowledgeSession(), to create a new ksession, and loadStatefulKnowledgeSession(), to retrieve an existing session from the database.
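For illustration, here is a minimal sketch of that flow against the Drools 5.x API. It assumes a kbase has already been built and that a JPA persistence unit named "org.drools.persistence.jpa" plus a JTA transaction manager are configured; the Absence fact class and the way you store the session id per student are hypothetical.

// Environment pointing Drools at the JPA persistence unit
// (Drools 5.x classes; transaction manager setup omitted for brevity)
Environment env = KnowledgeBaseFactory.newEnvironment();
env.set(EnvironmentName.ENTITY_MANAGER_FACTORY,
        Persistence.createEntityManagerFactory("org.drools.persistence.jpa"));

// First event for a student: create a new persisted session and remember its id
StatefulKnowledgeSession ksession =
        JPAKnowledgeService.newStatefulKnowledgeSession(kbase, null, env);
int sessionId = ksession.getId();   // store this id against the student

// Later events: reload that student's session, insert the new fact, fire the rules
StatefulKnowledgeSession restored =
        JPAKnowledgeService.loadStatefulKnowledgeSession(sessionId, kbase, null, env);
restored.insert(new Absence(studentId, date));   // hypothetical fact class
restored.fireAllRules();
restored.dispose();   // the wrapper has already persisted the session state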
Hope it helps,
There is a fourth option that makes maintenance simpler. Build one single stateful knowledge session for the entire school district, for all the students. After each event is processed successfully, persist the session so that you can reconstruct the working memory in case of JVM failure. You will need a larger RAM and heap space allocation, but these days RAM is cheap. (We use 32 GB RAM and allocate 16 GB for Xms and Xmx.) Most likely, your JVM will never go down provided you have a 24x7 server.
Being lazy, I would go for the third approach. I would store all the events in a DB, then process all the students in batch once per day, week, or month (as you need). That allows you to create just one session with rules that cover multiple students, classes, etc.
If you don't have 3+ million students you will be fine, and it will be a performant app.
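If you went this route, the batch run itself stays simple. A rough Java sketch, where loadAllEventsSince and the StudentEvent type are hypothetical placeholders and the session is thrown away after each run:

// One throwaway session per batch run, covering every student at once
StatefulKnowledgeSession ksession = kbase.newStatefulKnowledgeSession();
try {
    for (StudentEvent event : loadAllEventsSince(startOfSchoolYear)) {
        ksession.insert(event);   // absences, incidents, grades, ...
    }
    ksession.fireAllRules();      // rules recompute WARN/CRITICAL statuses
} finally {
    ksession.dispose();           // nothing is kept between runs
}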
Thanks for the suggestions and advice. I'm leaning towards #2 for a couple of reasons:
I think that gathering up all the facts for a given student and rerunning them from scratch for every event that comes in will be a heavyweight process that I'd rather avoid.
The use case I'm dealing with looks like a very long-running monitoring process, which leads me to believe (after reading the use cases for Fusion) that inserting events into a persistent KnowledgeSession is the way to go.
I don't think the scope of the problem space (i.e. a student vs. a classroom, a school, or the whole district) is a problem. If we need classroom metrics then we will just have a new rulebase for classes that also consumes the relevant events (tests, absences, etc.).
The one caveat is that if the rules change, we need to re-evaluate all affected students against the new rulebase, which means gathering up all the facts and starting from the beginning of the school year. This shouldn't happen much at all, but if it becomes more frequent then I might move to the 3rd approach.
Thanks again for the assistance.
We developed a web app that relies on real-time interaction between our users. We use Angular for the frontend and Hasura with GraphQL on Postgres as our backend.
What we noticed is that when more than 300 users are active at the same time, we experience severe performance losses.
Therefore, we want to improve our subscriptions setup. We think that possible issues could be:
Too many subscriptions
Too large and complex subscriptions, with too many forks in the subscription
Concerning 1, each user has approximately 5-10 subscriptions active when using the web app. Concerning 2, we have subscriptions that are complex, as we join up to 6 tables together.
The solutions we are considering:
Use more queries and limit the use of subscriptions to fields that absolutely need to be real-time.
Split up complex queries/subscriptions into multiple smaller ones.
Are we missing another possible cause? What else can we use to improve the overall performance?
Thank you for your input!
Preface
The OP's question is quite broad and impossible to answer in the general case.
So what I describe here reflects my experience with optimizing subscriptions - it's for the OP to decide whether it reflects their situation.
Short description of the system
Users of the system upload documents, extract information, prepare new documents, and converse during the process (IM-like functionality). There are AI bots that try to reduce the burden of repetitive tasks, and services that exchange data with external systems.
There are a lot of entities and a lot of interaction between both human and robot participants. Plus quite complex authorization rules: visibility of data depends on the organization, department, and content of documents.
How it started
At first it was:
a programmer wrote a GraphQL query for all the data needed by the application
changed the query to a subscription
done
It was OK for the first 2-3 months, then:
queries became more complex, and then even more complex
the number of subscriptions grew
the UI became laggy
the DB instance was always near 100% load, even during nights and weekends, because somebody had not closed the application
First we optimized the queries themselves, but it did not suffice:
some things are rightfully costly: JOINs, existence predicates, and the data itself grew significantly
the network part: you can optimize the DB, but just transferring all the needed data has its cost
Optimization of subscriptions
Step I. Split subscriptions: subscribe for change date, query on change
Instead of one complex subscription for the whole data set, split it into parts:
A. A subscription for a single field that indicates that an entity was changed
E.g.
Instead of:
subscription {
  document {
    id
    title
    # other fields
    pages {   # array relation
      ...
    }
    tasks {   # array relation
      ...
    }
    # multiple other array/object relations
    # pagination and ordering
  }
}
that returns thousands of rows.
Create a function that:
accepts hasura_session - so that results are individual per user
returns just one field: max_change_date
So it became:
subscription {
  doc_change_date {
    max_change_date
  }
}
Always one row and always one field.
B. Change the application logic
Query the whole data set
Subscribe to doc_change_date
Memorize the value of max_change_date
If max_change_date changes, requery the data
Notes
It's absolutely OK if the subscription function sometimes returns false positives.
There is no need to replicate all the predicates from the source query in the subscription function.
E.g.
In our case, visibility of data depends on organizations and departments (and more).
So if a user in one department creates/modifies a document, the change is not visible to users in other departments.
But those changes happen maybe once or twice a minute per organization.
So for the subscription function we can ignore that granularity and calculate max_change_date for the whole organization.
It's beneficial to have a faster and cruder subscription function: it will trigger data refreshes more frequently, but the overall cost will be lower.
Step II. Multiplex subscriptions
The first step is the crucial one.
Hasura also has subscription multiplexing: https://hasura.io/docs/latest/graphql/core/databases/postgres/subscriptions/execution-and-performance.html#subscription-multiplexing
So in theory Hasura could be smart enough to solve your problems.
But if you think "explicit is better than implicit", there is another step you can take.
In our case, users:
upload documents
combine them into dossiers
create new document types
converse with each other
So the subscriptions became: doc_change_date, dossier_change_date, msg_change_date, and so on.
But it could actually be beneficial to have just one subscription: "hey! there are changes for you!"
So instead of multiple subscriptions, the application makes just one.
Note
We thought about 2 formats for the multiplexed subscription:
A. The subscription returns just one field, {max_change_date}, that is cumulative across all entities
B. The subscription returns a more granular result: {doc_change_date, dossier_change_date, msg_change_date}
Right now "A" works for us, but we may change to "B" in the future.
Step III. What we would do differently with Hasura 2.0
This is something we have not tried yet.
Hasura 2.0 allows registering VOLATILE functions for queries.
That allows creating functions with memoization in the DB:
you define a cache for the function call, presumably in a table
then on a function call you first look in the cache
if the value does not exist, add it to the cache
return the result from the cache
That allows further optimizations for both subscription functions and query functions.
Note
Actually, it's possible to do this without waiting for Hasura 2.0, but it requires trickery on the PostgreSQL side:
you create a VOLATILE function that does the real work
and another function, defined as STABLE, that calls the VOLATILE function; this function can be registered in Hasura
It works, but that trick is hard to recommend.
Who knows, maybe future PostgreSQL versions or updates will make it impossible.
Summary
That's everything I can say on the topic right now.
Actually, I would have been glad to read something like this a year ago.
If somebody sees any pitfalls, please comment; I would be glad to hear opinions and maybe alternative approaches.
I hope this explanation helps somebody, or at least provokes thought about other ways to deal with subscriptions.
I know that one can utilize multiple KieBases and multiple KieSessions, but I don't understand under what scenarios one would use one approach vs the other (I am having some trouble in general understanding the definitions and relationships between KieContainer, KieBase, KieModule, and KieSession). Can someone clarify this?
You use multiple KieBases when you have multiple sets of rules doing different things.
KieSessions are the actual session for rule execution -- that is, they hold your data and some metadata and are what actually executes the rules.
Let's say I have an application for a school. One part of my application monitors students' attendance. The other part of my application tracks their grades. I have a set of rules which decides if students are truant and we need to talk to their parents. I have a completely unrelated set of rules which determines whether a student is having trouble academically and needs to be put on probation/a performance plan.
These rules have nothing to do with one another. They have completely separate concerns, different rule inputs, and are triggered in different parts of the application. The part of the application that is tracking attendance doesn't need to trigger the rules that monitor student performance.
For this application, I would have two different KieBases: one for attendance, and one for academics. When I need to fire the rules, I fire one or the other -- there is no use case for firing both at the same time.
The KieSession is the runtime for when we fire those rules. We add to it the data we need to trigger the rules, and it also tracks some other metadata that's really not relevant to this discussion. When firing the academics rules, I would be adding to it the student's grades, their classes, and maybe some information about the student (e.g. their grade level, whether they're an "honors" student, etc.). For the attendance rules, we would need the student information, plus historical tardiness/absence records. Those distinct pieces of data get added to the sessions.
When we decide to fire rules, we first get the appropriate KieBase -- academics or attendance. Then we get a session for that rule set, populate the data, and fire it. We technically "execute" the session, not the rules (and definitely not the rule base). The rule base is just the collection of the rules; the session is how we actually execute it.
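As a rough sketch of what that looks like in code (the knowledge base and session names here are assumptions that would have to match your kmodule.xml, and the fact objects are hypothetical):

KieServices ks = KieServices.Factory.get();
KieContainer kContainer = ks.getKieClasspathContainer();

// Fire only the attendance rules: get a session bound to that rule base
KieSession attendanceSession = kContainer.newKieSession("attendanceSession");
try {
    attendanceSession.insert(student);            // hypothetical fact objects
    attendanceSession.insert(absenceHistory);
    attendanceSession.fireAllRules();
} finally {
    attendanceSession.dispose();
}
// The academics rules would be fired the same way, from their own KieBase/session.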
There are two kinds of sessions -- stateful and stateless. As their names imply, they differ with how data is stored and tracked. In most cases, people use stateful sessions because they want their rules to do iterative work on the inputs. You can read more about the specific differences in the documentation.
For low-volume applications, there's generally little need to reuse your KieSessions. Create, use, and dispose of them as needed. There is, however, some inherent overhead in this process, so there comes a point in which reuse does become something that you should consider. The documentation discusses the solution provided out-of-the box for Drools, which is session pooling.
(When trying to wrap your head around this, I like to use an analogy of databases. A session is like a JDBC connection: for small applications you can create them, use them, then close them as you need them. But as you scale you'll quickly find that you need to look into connection pooling to minimize this overhead. In this particular analogy, the rule base would be the database that the rules are executing against -- not the tables!)
I've been reading about CQRS + Event Sourcing patterns (which I wish to apply in the near future), and one point common to all the decks and presentations I found is to take snapshots of your model state in order to restore it, but none of them share patterns/strategies for doing that.
I wonder if you could share your thoughts and experience in this matter particularly in terms of:
When to snapshot
How to model a snapshot store
Application/cache cold start
TL;DR: How have you implemented Snapshotting in your CQRS+EventSourcing application? Pros and Cons?
Rule #1: Don't.
Rule #2: Don't.
Snapshotting an event sourced model is a performance optimization. The first rule of performance optimization? Don't.
Specifically, snapshotting reduces the amount of time you lose in your repository trying to reload the history of your model from your event store.
If your repository can keep the model in memory, then you aren't going to be reloading it very often. So the win from snapshotting will be small. Therefore: don't.
If you can decompose your model into aggregates, which is to say that you can decompose the history of your model into a number of entities that have non-overlapping histories, then your one long model history becomes many short histories that each describe the changes to a single entity. Each entity history that you need to load will be pretty short, so the win from a snapshot will be small. Therefore: don't.
The kind of systems I'm working on today require high performance but not 24x7 availability. So in a situation where I shut down my system for maintenance and restart it, I'd have to load and reprocess my entire event store, as my fresh system doesn't know which aggregate ids to process the events for. I need a better starting point for my systems to restart more efficiently.
You are worried about missing a write SLA when the repository memory caches are cold, and you have long model histories with lots of events to reload. Bolting on snapshotting might be a lot more reasonable than trying to refactor your model history into smaller streams. OK....
The snapshot store is a read model -- at any point in time, you should be able to blow away the model and rebuild it from the persisted history in the event store.
From the perspective of the repository, the snapshot store is a cache; if no snapshot is available, or if the store itself doesn't respond within the SLA, you want to fall back to reprocessing the entire event history, starting from the initial seed state.
The service provider interface is going to look something like
interface SnapshotClient {
SnapshotRecord getSnapshot(Identifier id)
}
SnapshotRecord is going to provide to the repository the information it needs to consume the snapshot. That's going to include at a minimum
a memento that allows the repository to rehydrate the snapshotted state
a description of the last event processed by the snapshot projector when building the snapshot.
The model will then re-hydrate the snapshotted state from the memento, load the history from the event store, scanning backwards (i.e., starting from the most recent event) looking for the event documented in the SnapshotRecord, and then apply the subsequent events in order.
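A sketch of that repository logic; the SnapshotRecord accessors, the event store calls, and the Model type are invented here purely for illustration:

// Illustration only: Model, EventStore and the SnapshotRecord accessors are hypothetical
Model load(Identifier id) {
    SnapshotRecord snapshot = snapshotClient.getSnapshot(id);
    if (snapshot == null) {
        // cache miss (or the snapshot store blew its SLA): replay the full history
        return replay(Model.seed(id), eventStore.readAll(id));
    }
    Model model = Model.fromMemento(snapshot.memento());                  // re-hydrate state
    List<Event> tail = eventStore.readSince(id, snapshot.lastEventId());  // events after the snapshot
    return replay(model, tail);
}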
The SnapshotRepository itself could be a key-value store (at most one record for any given id), but a relational database with blob support will work fine too
select *
from snapshots s
where id = ?
order by s.total_events desc
limit 1
The snapshot projector and the repository are tightly coupled -- they need to agree on what the state of the entity should be for all possible histories, they need to agree how to de/re-hydrate the memento, and they need to agree which id will be used to locate the snapshot.
The tight coupling also means that you don't need to worry particularly about the schema for the memento; a byte array will be fine.
They don't, however, need to agree with previous incarnations of themselves. Snapshot Projector 2.0 discards/ignores any snapshots left behind by Snapshot Projector 1.0 -- the snapshot store is just a cache after all.
I'm designing an application that will probably generate millions of events a day. What can we do if we need to rebuild a view 6 months later?
One of the more compelling answers here is to model time explicitly. Do you have one entity that lives for six months, or do you have 180+ entities that each live for one day? Accounting is a good domain to reference here: at the end of the fiscal year, the books are closed, and the next year's books are opened with the carryover.
Yves Reynhout frequently talks about modeling time and scheduling; Evolving a Model may be a good starting point.
There are few instances where you need to snapshot, for sure. But there are a couple - a common example is an account in a ledger. You'll have thousands, maybe millions, of credit/debit events producing the final BALANCE state of the account - it would be insane not to snapshot that every so often.
My approach to snapshotting when I designed Aggregates.NET was that it's off by default; to enable it, your aggregates or entities must inherit from AggregateWithMemento or EntityWithMemento, which in turn requires your entity to define a RestoreSnapshot, a TakeSnapshot, and a ShouldTakeSnapshot.
The decision whether to take a snapshot or not is left up to the entity itself. A common pattern is
bool ShouldTakeSnapshot() {
    return this.Version % 50 == 0;
}
Which of course would take a snapshot every 50 events.
When reading the entity stream, the first thing we do is check for a snapshot, then read the rest of the entity's stream from the moment the snapshot was taken. I.e., don't ask for the entire stream, just the part we have not snapshotted.
As for the store - you can use literally anything. VOU is right, though: a key-value store is best, because you only need to 1. check if a snapshot exists and 2. load the entire thing - which is ideal for KV.
For system restarts - I'm not really following what your described problem is. There's no reason for your domain server to be stateful in the sense that it's doing something different at different points in time. It should do just one thing - process the next command. In the process of handling a command it loads data from the event store, including a snapshot, and runs the command against the entity, which either produces a business exception or domain events that are recorded to the store.
I think you may be trying to optimize too much with this talk of caching and cold starts.
I am trying to figure out how to manage a user's game state using Akka.
The game state will be persisted to mysql and this cannot change because we have other services that require this.
Anything that happens in a game is considered an "event".
Then I have "Levels" which someone can achieve. A level is achieved when you complete all the "events" associated with it.
So you have:
Level
- event1 e.g. reach a point in the game
- event2 e.g. pickup a sword
- event3 e.g. defeat a monster
So in a game there are many levels, and 100's of events that are linked to levels.
So all "events" are sent via HTTP to my backend, and I save the event in the database.
I then have to load the user's game profile in memory, and then re-calculate the levels achieved, since a new event has happened.
Note: This calculation cannot be done at the database level because it is a little more complicated than what I am writing here.
The problem I see is that if I use akka, I can't have multiple actors processing the events for the same user, because the data can become stale.
Just to be clear, so when a new event arrives, I have to load the game profile in memory, loop through the levels and see if any of them have been achieved, if they have, update the database
e.g. update levels set achieved=true where level_id = 123 and user_id=234
e.g. actor1 loads the profile (all the levels and events for this user) and then processes the new event that just arrived in the inbox.
at the same time, actor2 loads the profile (same as actor1), and then processes the new event. When it persists the changes to MySQL, the data will be out of sync.
If I was using threads, I would have to lock during the game profile calculation and persisting to the db.
How can I do this using Akka and still be able to handle things in parallel, or does this scenario not allow for it?
Let's think about how you would manage it without actors. So, in a nutshell, you have the following problem scenario:
two (or more) update requests arrive at the same time, both going to modify the same data
both requests read some stable data state, then update it, each in its own manner, and persist it to the DB
the modifications from the request that checked in first are lost - more precisely, overridden by the later request.
This is a classical problem. There are at least two classical solutions of it:
Optimistic locking
Pessimistic locking: usually achieved by applying the Serializable isolation level to transactions.
It is worth reading this answer, which has a nice comparison of both worlds.
As you're using Akka, you most probably prefer better concurrency with occasional failures that are easy to recover from. That goes hand in hand with the Akka motto: let it crash.
So, you need to take the following steps:
Add a version column to your table(s). It can be numeric or a string (with a hash); numeric is the simplest.
When you insert a new record, initialize the version.
When you update the record, check that the version value has not changed. So, here's your update strategy (see the sketch after this list):
Read the record and its version.
Update the record in memory.
Execute the update query with the criteria where rec_id=$id and version=$version.
If the updated record count is 1, you're good. If it is 0, throw an OptimisticLockException or something like that.
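A minimal sketch of steps 3 and 4 with plain JDBC; the table and column names, and the surrounding variables, are assumptions:

// Optimistic update: only succeeds if nobody bumped the version in the meantime
try (PreparedStatement ps = connection.prepareStatement(
        "UPDATE game_profile SET state = ?, version = version + 1 " +
        "WHERE rec_id = ? AND version = ?")) {
    ps.setString(1, newStateJson);
    ps.setLong(2, recId);
    ps.setLong(3, expectedVersion);   // the version read before modifying in memory
    if (ps.executeUpdate() == 0) {
        // someone else updated the row first: fail fast and let the actor retry
        // (OptimisticLockException here can be your own type or the JPA one)
        throw new OptimisticLockException("stale version for rec_id=" + recId);
    }
}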
Finally, it's time for Akka to do its job: come up with an appropriate supervision strategy (I'd pick something like "try again in 1 second"). In the actor's preRestart method, return the update message to the actor's mailbox (see the Restart Hooks chapter in the Akka docs).
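With classic Akka actors in Java, the re-enqueue part could look roughly like this (a sketch only; UpdateProfile and the handler body are hypothetical):

import akka.actor.AbstractActor;
import java.util.Optional;

public class GameProfileActor extends AbstractActor {

    @Override
    public void preRestart(Throwable reason, Optional<Object> message) throws Exception {
        // Put the message that caused the failure back into our own mailbox
        message.ifPresent(m -> getSelf().tell(m, getSelf()));
        super.preRestart(reason, message);
    }

    @Override
    public Receive createReceive() {
        return receiveBuilder()
                .match(UpdateProfile.class, this::handleUpdate)   // hypothetical message type
                .build();
    }

    private void handleUpdate(UpdateProfile msg) {
        // load the profile, recalculate levels, persist with the version check;
        // an OptimisticLockException thrown here triggers the supervisor's restart
    }
}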
With this strategy, even if two requests try to update the same record at a time, one of them will fail and will be immediately processed again.
I have an entity in my domain that represents a city's electrical network. Currently my model is an entity with a List that contains breakers, transformers, and lines.
The network changes every time a breaker is opened/closed, users can change connections, etc.
In all examples of CQRS the EventStore is queried with Version and aggregateId.
Do you think I have to implement events only for the "network" aggregate or also for every "Connectable" item?
In this case, when I have to replay all events to get the "current" status (as of a date), I can have nearly 10,000-20,000 events to process.
Should an Event modify one property, or do I need an Event that modifies an object (containing all properties of the object)?
There's always an exception to the rule, but I think you need to have an event for every command handled in your domain. You can get around the problem of processing so many events by making use of Snapshots.
http://thinkbeforecoding.com/post/2010/02/25/Event-Sourcing-and-CQRS-Snapshots
I assume you mean that currently your "connectable items" are part of the "network" aggregate and you are asking if they should be their own aggregates? That really depends on the nature of your system and problem, and is more of a DDD issue than simply a CQRS one. However, if the nature of your changes is typically to operate on the items independently of one another, then they should probably be aggregate roots themselves. Regardless, in order to answer that question we would need to know much more about the system you are modeling.
As for the challenge of replaying thousands of events, you certainly do not have to replay all your events for each command. Sure, snapshotting is an option, but even better is caching the aggregate root objects in memory after they are first loaded, to ensure that you do not have to source from events with each command (unless the system crashes, in which case you can rely on snapshots for quicker recovery, though you may not need them with caching since you only pay the penalty of loading once).
Now if you are distributing this system across multiple hosts or threads there are some other issues to consider but I think that discussion is best left for another question or the forums.
Finally, you asked (I think) whether an event can modify more than one property of the state of an object. Yes, if that is what makes sense based on what that event represents. The idea of an event is simply that it represents a state change in the aggregate; however, these events should also represent concepts that make sense to the business.
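For example (a hypothetical event from your network domain, not a prescribed format), a single event can carry several related properties that the aggregate applies together:

// Hypothetical event: one state change touching more than one property
public final class BreakerOpened {
    public final String breakerId;
    public final java.time.Instant openedAt;
    public final String openedBy;

    public BreakerOpened(String breakerId, java.time.Instant openedAt, String openedBy) {
        this.breakerId = breakerId;
        this.openedAt = openedAt;
        this.openedBy = openedBy;
    }
}

// Inside the network aggregate: applying the event updates several fields at once
void apply(BreakerOpened e) {
    Breaker breaker = breakers.get(e.breakerId);   // breakers is the aggregate's map of items
    breaker.open = true;
    breaker.lastChangedAt = e.openedAt;
    breaker.lastChangedBy = e.openedBy;
}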
I hope that helps.