Expert/Rule Engine that updates facts atomically? - drools

Atomically might not be the right word. When modelling cellular automata or neural networks, you usually keep two copies of the system state: the current state, and the next-step state that you are updating. This ensures that the state of the system as a whole remains unchanged while all of the rules run to determine the next step. For example, if you run the rules for one cell/neuron to determine its next-step state, and then run the rules for the next cell, its neighbor, you want the input to those rules to be the current state of the neighbor cell, not its updated state.
This may seem inefficient, since each step requires copying all of the current states over to the next-step states before updating them. However, it is necessary in order to accurately simulate the system as if all cells/neurons were processed simultaneously, so that all inputs to the rules/firing functions are the current states.
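In code, the usual double-buffering loop looks something like this (a minimal sketch in Java; rule() is a stand-in for whatever transition function the automaton uses, and swapping the buffers avoids an actual per-step copy):

    // Every cell reads only from 'current' and writes only to 'next';
    // the buffers are swapped once per generation, so all cells appear
    // to have been updated simultaneously.
    boolean[] current = new boolean[size];
    boolean[] next = new boolean[size];
    for (int step = 0; step < generations; step++) {
        for (int i = 0; i < size; i++) {
            next[i] = rule(current, i);  // inputs come only from the unchanged current state
        }
        boolean[] tmp = current;         // swap instead of copying
        current = next;
        next = tmp;
    }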
Something that has bothered me when designing rules for expert systems is how one rule can run and update facts that should trigger other rules. You might have 100 rules queued up in response, with salience used as a fragile way to ensure the really important ones run first. As these rules run, the system changes further. The facts are constantly changing, so by the time you get to the 100th rule, the state of the system has changed significantly since the rule was queued in response to the first fact change. It might have changed so drastically that the rule never gets a chance to react to the original state of the system when it really should have. The usual workaround is to carefully adjust its salience, but that moves other rules down the list and you run into a chicken-or-egg problem. Other workarounds involve adding "processing flag" facts that serve as a locking mechanism to suppress certain rules until other rules have run. These all feel like hacks, and they force rules to include criteria beyond the core domain model.
If you built a really sophisticated system that modelled a problem accurately, you would want changes to the facts to be staged in a separate "updates" queue that doesn't affect the current facts until the rule queue is empty. So let's say you make a fact change that fills the queue with 100 rules to run. All of these rules would run, but none of them would update facts in the current fact list; any change they make gets queued to a change list, which ensures no other rules are activated while the current batch is processing. Once all rules have been processed, the fact changes are applied to the current fact list all at once, and that in turn activates more rules. Rinse, repeat. So it becomes much like how neural networks or cellular automata are processed: run all rules against an unchanging current state, queue the changes, and after all rules have run, apply the changes to the current state.
Is this mode of operation a concept that exists in the academic world of expert systems? I'm wondering if there is a term for it.
Does Drools have the capability to run in a way that allows all rules to fire without affecting the current facts, queuing fact changes separately until all rules have run? If so, how? I don't expect you to write the code for me, just some keywords for what it's called or keywords in the API, some starting point to help me search.
Do any other expert/rule engines have this capability?
Note that in such a mode, the order the rules run in no longer matters, because all of the queued rules see only the current state. As the queue of rules is run and cleared, none of the rules see any of the changes the other rules are making, because they are all run against the current set of facts. The order therefore becomes irrelevant, and the complexities of managing rule execution order go away. All fact changes are pending and are not applied to the current state until all rules have been cleared from the queue. Then all of those changes are applied at once, causing the relevant rules to queue again. So my goal is not to have more control over the order rules run in, but to avoid the issue of rule execution order entirely by using an engine that simulates simultaneous rule execution.

If I understand what you describe:
You have one fact that is managed by many rules.
Each rule should apply to the initial value of your fact and has no right to modify the fact value (so as not to affect other rules' executions).
You then batch all the updates made by the rules on your fact.
Other rules then apply to this new fact value in a similar manner, 'simultaneously'.
It seems to me that this is the Unit of Work design pattern, just as Hibernate (and many other ORMs, in fact) implements it: http://www.codeproject.com/Articles/581487/Unit-of-Work-Design-Pattern
Basically, you store all the changes in memory (in a 'technical' fact, for instance) and then execute a 'transaction' once all the rules based on the initial value have fired, updating the fact value, and so on. Hibernate does this with its session: you modify your attached object, and only when required does it execute the update query against the database; not every modification of the Java object produces a query against your database.
You will still have trouble if updates conflict (the same fact field modified with different values: which one do you choose? It's the same problem as a source control conflict). You will have to define a deterministic way to order updates, but it only needs to be defined once, is then available to all rules, and will work seamlessly for further changes.
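To make the idea concrete, here is a minimal, generic sketch of that Unit of Work idea (not Drools-specific; all names are hypothetical). Changes are recorded in memory and only take effect when commit() is called:

    import java.util.ArrayList;
    import java.util.List;

    final class UnitOfWork {
        private final List<Runnable> pending = new ArrayList<>();

        // Rules stage changes here instead of mutating the fact directly.
        void stage(Runnable change) {
            pending.add(change);
        }

        // Called once all rules based on the initial value have fired:
        // the whole batch is applied as one 'transaction'.
        void commit() {
            pending.forEach(Runnable::run);
            pending.clear();
        }
    }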

This workaround may or may not work given your rather vague description. If you really are concerned about rules triggering further activations, why not queue the intermediate state yourself, and once the current evaluation is complete, insert those new facts into the working memory?
You would have to invoke fireAllRules() after inserting each fact, though, which could be quite expensive. Then, in the rules, rather than inserting new facts directly, push them into a queue. Once the above call returns, walk through the queue doing the same (or do so after inserting all of the original facts...). See the sketch below.
I would imagine that this will be quite slow. To speed it up, you could have multiple parallel working memories with the same rules and evaluate multiple facts in one go into several queues, etc. But things get pretty hairy...
Anyway, just an idea that's too long for the comments...
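A rough sketch of the driver loop described above (assumptions: the DRL declares a global named "pendingFacts", and the rules add derived facts to it instead of calling insert() directly):

    import java.util.ArrayList;
    import java.util.LinkedList;
    import java.util.List;
    import java.util.Queue;
    import org.kie.api.runtime.KieSession;

    public class GenerationRunner {
        // Each loop iteration is one 'generation': rules see only the facts
        // inserted so far, and anything they derive waits in the queue until
        // the current evaluation is complete.
        public static void fireInGenerations(KieSession session, List<Object> initialFacts) {
            Queue<Object> pendingFacts = new LinkedList<>();
            session.setGlobal("pendingFacts", pendingFacts);

            initialFacts.forEach(session::insert);
            session.fireAllRules();

            while (!pendingFacts.isEmpty()) {
                List<Object> batch = new ArrayList<>(pendingFacts);
                pendingFacts.clear();
                batch.forEach(session::insert);
                session.fireAllRules();
            }
        }
    }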

Related

How do I know which drools rule is running now?

For example, I load a lot of Drools rules to run. How do I know which rule is currently running, so that I can identify it?
Assuming you're talking about the right-hand side of the rules, you'll want to use an AgendaEventListener. This is an interface defining a listener that you can create to watch the event lifecycle. For more information about the event model, please refer to the Drools documentation.
The easiest way to do this would be to extend either DefaultAgendaEventListener or DebugAgendaEventListener. Both of these classes implement all of the interface methods. The Default listener implements each method as a "no-op", so you can override just the methods you care about. The Debug listener implements each method with a logging statement, logging the toString() of the triggering event to INFO. If you're just learning about the Drools lifecycle, hooking up the various Debug listeners is a great way to watch and learn how rules and events are processed.
(Also the cool thing about listeners is that they allow you to put breakpoints in the "when" clause that trigger when specific conditions are met -- eg when a rule match is created. In general I find that listeners are a great debugging tool because they allow you to put breakpoints in methods that trigger when different parts of the Drools lifecycle occur.)
Anyway, what you'll want to do is create an event listener and then pay attention to one or more of these specific events:
BeforeMatchFired
AfterMatchFired
MatchCreated
Which events to pay attention to depends on where you think the issue is.
If you think the issue is in the "when" clause (left-hand side, LHS), the MatchCreated event is what is triggered when Drools evaluates the LHS and decides that this rule is valid for firing based on the input data. It is then put on, effectively, a priority queue based on salience. When the rule is the highest priority on the queue, it is picked up for firing -- at this point the BeforeMatchFired event is triggered; note that this is before the "then" clause (right-hand side, RHS) is evaluated. Then Drools will actually do the work on the RHS, and once it finishes, trigger the AfterMatchFired.
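For the question as asked ("which rule is running now?"), a minimal listener might look like this sketch, extending DefaultAgendaEventListener and overriding only the event we care about:

    import org.kie.api.event.rule.BeforeMatchFiredEvent;
    import org.kie.api.event.rule.DefaultAgendaEventListener;

    public class FiringLogger extends DefaultAgendaEventListener {
        @Override
        public void beforeMatchFired(BeforeMatchFiredEvent event) {
            // Fired just before the rule's "then" clause executes.
            System.out.println("Now firing: " + event.getMatch().getRule().getName());
        }
    }

Attach it to the session before firing: kieSession.addEventListener(new FiringLogger());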
Things get a little more complicated when your rules do things like updates/retracts/etc -- you'll start having to consider potential match cancellations when Drools re-evaluates the LHS and decides that a rule is no longer valid to be fired per the facts in working memory. But in general, these are the tools you'll want to start with.
The way I would traditionally identify long-running rules is to start timing in BeforeMatchFired and stop timing in AfterMatchFired, then log the resulting rule execution time. Note that you want to be careful here to log the execution of the current rule, tracking it by name; if your rule extends another rule, you might find that your execution flow goes BeforeMatchFired(Child) -> BeforeMatchFired(Parent) -> AfterMatchFired(Parent) -> AfterMatchFired(Child), so if you naively stop a shared timer you might start having issues. My preferred way of doing this is to track timers by rule name in a ThreadLocal or even a thread-safe map implementation, but you can go whichever route you'd like.
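As a sketch, that timing approach might look like the following, with timers keyed by rule name so that parent/child firings from "extends" don't interfere with each other:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import org.kie.api.event.rule.AfterMatchFiredEvent;
    import org.kie.api.event.rule.BeforeMatchFiredEvent;
    import org.kie.api.event.rule.DefaultAgendaEventListener;

    public class RuleTimingListener extends DefaultAgendaEventListener {
        private final Map<String, Long> startNanos = new ConcurrentHashMap<>();

        @Override
        public void beforeMatchFired(BeforeMatchFiredEvent event) {
            startNanos.put(event.getMatch().getRule().getName(), System.nanoTime());
        }

        @Override
        public void afterMatchFired(AfterMatchFiredEvent event) {
            String rule = event.getMatch().getRule().getName();
            Long start = startNanos.remove(rule);
            if (start != null) {
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                System.out.println("Rule '" + rule + "' executed in " + elapsedMs + " ms");
            }
        }
    }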
If you're using a very new version of Drools (7.41+), there is a new library called drools-metric which you can use to identify slow rules. I haven't personally used this library yet because the newest versions of Drools have started introducing non-backwards-compatible changes in minor releases, but this is an option as well.
You can read more about drools-metric in the official Drools documentation (you'll need to scroll down a bit). There's some tuning you'll need to do, because the module only logs instances where the thresholds are exceeded. The docs include the Maven dependency you'll need to import, along with information about configuration, and some examples of the output and how to understand what it's telling you.

When to use multiple KieBases vs multiple KieSessions?

I know that one can utilize multiple KieBases and multiple KieSessions, but I don't understand under what scenarios one would use one approach vs the other (I am having some trouble in general understanding the definitions and relationships between KieContainer, KieBase, KieModule, and KieSession). Can someone clarify this?
You use multiple KieBases when you have multiple sets of rules doing different things.
KieSessions are the actual session for rule execution -- that is, they hold your data and some metadata and are what actually executes the rules.
Let's say I have an application for a school. One part of my application monitors students' attendance. The other part of my application tracks their grades. I have a set of rules which decides if students are truant and we need to talk to their parents. I have a completely unrelated set of rules which determines whether a student is having trouble academically and needs to be put on probation/a performance plan.
These rules have nothing to do with one another. They have completely separate concerns, different rule inputs, and are triggered in different parts of the application. The part of the application that is tracking attendance doesn't need to trigger the rules that monitor student performance.
For this application, I would have two different KieBases: one for attendance, and one for academics. When I need to fire the rules, I fire one or the other -- there is no use case for firing both at the same time.
The KieSession is the runtime for when we fire those rules. We add to it the data we need to trigger the rules, and it also tracks some other metadata that's really not relevant to this discussion. When firing the academics rules, I would add the student's grades, their classes, and maybe some information about the student (e.g. their grade level, whether they're an "honors" student, etc.). For the attendance rules, we would need the student information plus historical tardiness/absence records. Those distinct pieces of data get added to the sessions.
When we decide to fire rules, we first get the appropriate KieBase -- academics or attendance. Then we get a session for that rule set, populate the data, and fire it. We technically "execute" the session, not the rules (and definitely not the rule base.) The rule base is just the collection of the rules; the session is how we actually execute it.
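In code, that flow might look like this (a sketch under assumed names: "academicsKB" would be a KieBase defined in kmodule.xml, and the student/grade objects are illustrative facts):

    import org.kie.api.KieBase;
    import org.kie.api.KieServices;
    import org.kie.api.runtime.KieContainer;
    import org.kie.api.runtime.KieSession;

    KieServices ks = KieServices.Factory.get();
    KieContainer container = ks.getKieClasspathContainer();
    KieBase academics = container.getKieBase("academicsKB");  // the rule set

    KieSession session = academics.newKieSession();           // the runtime
    try {
        session.insert(student);               // populate the data...
        grades.forEach(session::insert);
        session.fireAllRules();                // ...and "execute" the session
    } finally {
        session.dispose();                     // create, use, dispose
    }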
There are two kinds of sessions -- stateful and stateless. As their names imply, they differ in how data is stored and tracked. In most cases, people use stateful sessions because they want their rules to do iterative work on the inputs. You can read more about the specific differences in the documentation.
For low-volume applications, there's generally little need to reuse your KieSessions. Create, use, and dispose of them as needed. There is, however, some inherent overhead in this process, so there comes a point in which reuse does become something that you should consider. The documentation discusses the solution provided out-of-the box for Drools, which is session pooling.
(When trying to wrap your head around this, I like to use an analogy of databases. A session is like a JDBC connection: for small applications you can create them, use them, then close them as you need them. But as you scale you'll quickly find that you need to look into connection pooling to minimize this overhead. In this particular analogy, the rule base would be the database that the rules are executing against -- not the tables!)

In Business Rules Engines, is there a way to address a "stale" snapshot?

Specific environment I'm using, in case it's needed: Drools, Java.
Fairly new to BREs, but as I understand it, the engine generates an initial snapshot of the data and then applies reasoning to that snapshot to enforce business rules. What happens if I perform data updates between snapshot generation and rule execution?
Is there any way to work around this, other than limiting data updates altogether?
Drools does not generate a "snapshot of data". The Drools engine operates on a set of objects made available to it by passing object references to the insert method, so that they become "facts".
Any changes made to these facts affect what the engine is about to do, until the engine indicates that all has been done by returning from the call to fireAllRules. The idea is to be up to date during rule evaluation.
If you want to analyse a static set of facts, or make sure that you are looking at a consistent set of facts, you'll have to withhold updates or operate on cloned objects. But that is usually not what is useful and wanted.
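If you do need a stable view, one option is to insert defensive copies, for example (Order being a hypothetical bean with a copy constructor):

    // The engine evaluates stable copies; later changes to the live
    // objects cannot affect this evaluation.
    for (Order live : liveOrders) {
        session.insert(new Order(live));
    }
    session.fireAllRules();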

[Drools] Fact Objects Mistakenly Updated During Multi-Stage Rule Firing OR with Long Fact Object List

If the subject line is confusing, that is because the problem itself is very confusing to us. Here is the thing.
We have an application that leverages the Drools rule engine to help us evaluate Java beans -- fact objects, in Drools' terms -- based on their field values, and update a particular flag field within each bean to "true" or "false" according to the evaluation result. All the evaluations and update operations are defined in the template.
It invokes Drools like this: first, it creates a stateful session before the first use. When we have a list of beans, we insert them one by one into the session and call fireAllRules. After firing the rules, we keep the session for later use. Once we have another batch of beans, we do the same again, and again, and again...
This seems to make sense. But later, during testing, we found that although the rule engine worked fine for the first batch, it didn't for the following batches. Some beans were mistakenly updated; that is, even though none of their fields matched any rules, the flag was updated to true.
Then we thought maybe we should not reuse the session, so we put all the beans from all batches into one big list. But soon we found that the problematic beans still got the wrong update. What's weirder, running the test on different machines produced different problematic beans. Yet if we test any of the problematic beans by itself in a unit test, everything works fine.
Now I hope I have explained the problem. We are new to Drools; maybe we did something wrong somewhere that we don't know about. Could anyone give us some direction on this problem? It would be a very big favor!
It sounds to me as though you're not clearing out working memory after each 'fireAllRules'.
If you use a stateful session, then every fact which you insert will remain in working memory until you explicitly retract it. Therefore every time you fire rules, you are re-evaluating the original set of facts, plus the new ones.
It might be useful to add a little debugging to your code. Using session.getObjects(), you will be able to see what facts are in working memory before and after execution of the rules. This should indicate what is not being retracted between evaluations.
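A sketch of both the debugging step and the fix (clearing working memory between batches; session is a stateful KieSession):

    import org.kie.api.runtime.rule.FactHandle;

    System.out.println("Facts before firing: " + session.getObjects().size());
    session.fireAllRules();

    // Retract everything so the next batch is evaluated in isolation
    // rather than together with all previously inserted beans.
    for (FactHandle handle : session.getFactHandles()) {
        session.delete(handle);  // delete() was named retract() in older APIs
    }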

Undoable sets of changes

I'm looking for a way to manage consistent sets of changes across several data sources, including, but not limited to, a database, some network control tools, and probably other SOAP-based services.
If one change fails for some reason (e.g. real-world app says "no", or a database insert fails), I want the whole set to be undone. So that's like transactions, just not limited to a DB.
I came up with a module that stacks up "change" objects which in turn have their init, commit, and rollback methods. When the set is DESTROYed, it rolls uncommitted changes back. This kinda works.
Still, I can't shake the feeling that I'm reinventing the wheel. Is there a standard CPAN module, or a well-described common method, for performing such a task? (At least GoF's "command" pattern and the RAII principle come to mind...)
There are a couple of approaches to executing a Distributed transaction (which is what you're describing):
The standard pattern is called "Two-phase commit protocol".
At the moment I'm not aware of any Perl module which implements two-phase commit, which is kind of surprising and may well be due to a lapse in my Googling. The only thing I found was Env::Transaction, but I have no clue how stable/good/functional it is.
For certain cases, a solution involving rollback via "Compensating transactions" is possible.
This is basically a special case of general rollback where, when generating a task list A designed to change the target system state from S1 to S2, you at the same time generate a "compensating" task list A-neg designed to change the target system state from S2 back to S1. This is obviously only possible for certain systems, and moreover only a small subset of those are commutative (meaning that you can execute a transaction and its compensating transaction non-contiguously, e.g. the result of A + B + A-neg + B-neg is an invariant state).
Please note that the compensating transaction does NOT always have to be engineered as a "transaction" -- one clever approach (again, only possible in certain subject domains) involves storing your data with a special "finalized" flag, and then periodically harvesting and destroying data whose "finalized" flag is false and whose "originating transaction timestamp" is older than some threshold.
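For illustration, here is the question's stack-of-changes idea sketched in Java (the language used elsewhere on this page; all names are hypothetical). Each change knows how to apply itself and how to compensate, and a failure unwinds the stack in reverse order:

    import java.util.ArrayDeque;
    import java.util.Deque;

    interface Change {
        void apply() throws Exception;  // forward operation (S1 -> S2)
        void compensate();              // compensating operation (S2 -> S1)
    }

    final class ChangeSet {
        private final Deque<Change> applied = new ArrayDeque<>();

        // Apply each change in order; on any failure, compensate in
        // reverse (LIFO) order, like unwinding a stack.
        void execute(Iterable<Change> changes) throws Exception {
            try {
                for (Change c : changes) {
                    c.apply();
                    applied.push(c);
                }
            } catch (Exception e) {
                while (!applied.isEmpty()) {
                    applied.pop().compensate();
                }
                throw e;
            }
        }
    }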