What causes RiverWare rules to execute out of order? - simulation

Why do RiverWare rules sometimes execute out of order?
Normally, RiverWare rules should execute in order of priority, but sometimes they do not.

Check that the variables used in the rule are set prior to the rule's execution. Often alternative variables can be used that have already been set (e.g. those in data objects).
One scenario is that a rule has a dependency which has not been set, causing it to ineffectively execute. If a higher priority rule sets this dependency, it will execute. This is intended behavior.
Another scenario is that one of the dependencies have been changed by a higher priority rule, causing the rule to re-execute. If this behavior is not desired, one may use an execution constraint, such as: NOT HasRuleFiredSuccessfully("ThisRule")

Related

Does drools have any validate utility that will scan a knowledge base?

If any incorrect rule pushed to knowledgebase. Can we scan the knowledgeBase to filter out the bad rules, which may create problem later?
Not only does this not exist, but it's not possible.
Simple syntax errors
The really easy problems -- straight up syntax errors -- are caught at runtime when the Drools rules are compiled. The rules themselves won't be loaded at all, and the Kie classes will throw an exception.
So, for example, let's say you were in a rush and you forgot the "when" keyword:
rule "Example 1"
salience 9
$s: SomeObject( foo == "bar" )
then
$s.doSomething()
end
This will be caught quickly at the time your rules are loaded. Other syntax errors like missing imports will be caught as well.
If you're following unit testing best practices, this will be caught at the unit testing stage, which -- given proper CI practices -- should keep the bad rules from being merged into your main code line.
If you're loading your rules from a Maven repository as a kjar, you'll likely want to have some sort of testing harness before publishing that kjar so that you don't end up with a situation where your service pulls new rules and ends up with a nonfunctional artifact.
Logical errors
Unlike syntax errors which can be caught early, logical errors really can't. The problem boils down to the fact that Drools is a framework for business logic -- it doesn't actually understand your business logic. And it can't, and it shouldn't need to.
If you write bad rules that put yourself into a bad situation, Drools has no way of actually identifying this because it doesn't understand your business needs.
Consider the following rules:
rule "A"
when
$f: Foo( value == 0 )
then
$f.setValue(100);
update($f);
end
rule "B"
when
$f: Foo( value > 10 )
then
$f.setValue(0);
update($f);
end
Assuming your initial input is Foo(value: 50), rule B will fire, setting value to 0. Then rule A will fire, setting value to 100. Then rule B will fire, setting value to 0. And so on, forever, until you kill the process manually.
This simple infinite loop is poor rule design, but it's only a bug because of your data characteristics. The above example is obvious and contrived, but that's because I'm just using it as an example. In practice, I once saw a set of 27 rules that triggered in a particular order given a very specific input and looped like that -- it took me a whole week to track down while manually tracing transactions through a customer system.
Drools can't "know" that this will cause an infinite loop because it doesn't understand your business rules or data. In the above Foo example with rules A and B, you could say "well obviously Drools could see that it's setting value in a way that'll trigger rules in a loop given the 'update' call" -- but you're making assumptions. What if I told you that there are situations where these rules don't loop? I never showed you the Foo model, maybe setValue has side effects.
The way you'd catch these is by actually validating your business logic. I usually start with unit tests and treat it as if I'm testing an "if" condition -- check the boundaries, check the edge cases, check the good value, check an obviously bad value, etc. Then I add end-to-end tests passing in real (sanitized) inputs collected from production -- the most common inputs, the weird inputs, the ones that have shown up in previous bug reports, etc.
And because that's not enough and you really need an understanding of all rules, we have internal style guidelines around DRL design to keep something like the above looping examples from happening. (We don't permit update, for example. There are specific rules around salience and retract as well.)
But built-in validation for this? Not actually possible; you need to do your own due diligence just like you would for your code.

Guided Decision Table and Guided Rule has same ruleflow group. Which rule would get executed first?

I have a Guided Rule file that sets a configuration for some convergence factors. Later I have a Guided Decision Table which has the same rule flow group.
Even when I don't mention any salience in both these files, the Guided Rule gets executed first and sets the configuration value, and the same model is later imported in the Guided Decision table and these default values are set in Guided rule are used in the Guided Decision table.
Is there a specific reason "why guided rules get execute first and guided decision tables gets executed later, even though they have the same rule flow group"
Unless you're using saliences, execution order is non-deterministic. There are no guarantees that Rule A will always go before Rule B. It might always go in this order (A -> B) right now, but that's not guaranteed, and tomorrow or the next time you do a rules change or a version update, it might go in a different order. It usually has to do with the order that rules are loaded into memory, (which is why un-salience'd rules in a single DRL file tend to be executed top of file to bottom of file, because that's the order they're read.)
If your rules are such that you require them to be executed in a specific order, you should put saliences on them so that Rule X will always execute before Rule Y because of the saliences on them. Alternatively, you could rewrite your rules to not rely on execution order (this is considered good practice anyway.)
The only order guarantee that Drools provides is that rules of a given salience will execute at the same time, though the order of those rules within the salience is not guaranteed. Rules with no salience are all defaulted to salience 0, so this guarantee holds true. All of the rules execute, but no order is guaranteed and cannot necessarily be consistently determined ahead of time (hence "non-deterministic.")
So really, at the end of the day, when your question is "which rule gets executed first?" the answer is -- unless you have saliences, it shouldn't matter. And if it does matter, you need to fix your rules.

How Drools works?

I have a scenario wherein I need to add rules to a rule engine dynamically.
What if I add same rule twice/multiple times?
I am not able to get exact behavior of Drools by doing POC(I am a newbie to Drools).
Also, if a rule once inserted remain in knowledgeBase until I explicitly remove it?
You cannot add the save rule twice (which is sufficient to rule out "multiple times"). If it is the "same", it simply replaces the previous "same" rule. If only the titles differ, then (it isn't the same rule and) you have two rules with the same LHS and the same RHS. This may, or may not, produces the same reaction a second time: this depends on what the RHS does or does not.
You can and should clarify these things by reading the documentation and/or experimenting with a simple setup.

Expert/Rule Engine that updates facts atomically?

Atomically might not be the right word. When modelling cellular automata or neural networks, usually you have two copies of the system state. One is the current state, and one is the state of the next step that you are updating. This ensures consistency that the state of the system as a whole remains unchanged while running all of the rules to determine the next step. For example, if you run the rules for one cell/neuron to determine the state of it for the next step, you then run the rules for the next cell, it's neighbor, you want to use as the input for those rules the current state of the neighbor cell, not its updated state.
This may seem inefficient due to the fact that each step requires you copy all of the current step states to the next step states before updating them, however it is important to do this to accurately simulate the system as if all cells/neurons were actually being processed simultaneously, and thus all inputs for rules/firing functions were the current states.
Something that has bothered me when designing rules for expert systems is how one rule can run, update some facts that should trigger other rules to run, and you might have 100 rules queued up to run in response, but the salience is used as a fragile way to ensure the really important ones run first. As these rules run, the system changes more. The state of the facts are consistently changing, so by the time you get to processing the 100th rule, the state of the system has changed significantly since the time it was added to the queue when it was really responding to the first fact change. It might have changed so drastically that the rule doesn't have a chance to react to the original state of the system when it really should have. Usually as a workaround you carefully adjust its salience, but then that moves other rules down the list and you run into a chicken or egg problem. Other workarounds involve adding "processing flag" facts that serve as a locking mechanism to suppress certain rules until other rules process. These all feel like hacks and cause rules to include criteria beyond just the core domain model.
If you build a really sophisticated system that modeled a problem accurately, you would really want the changes to the facts to be staged to a separate "updates" queue that doesn't affect the current facts until the rules queue is empty. So lets say you make a fact change that fills the queue of rules to run with 100 rules. All of these rules would run, but none of them would update facts in the current fact list, any change they make gets queued to a change list, and that ensures no other rules get activated while the current batch is processing. Once all rules are processed, then the fact changes get applied to the current fact list, all at once, and then that triggers more rules to be activated. Rinse repeat. So it becomes much like how neural networks or cellular automata are processed. Run all rules against an unchanging current state, queue changes, after running all rules apply the changes to current state.
Is this mode of operation a concept that exist in the academic world of expert systems? I'm wondering if there is a term for it.
Does Drools have the capability to run in a way that allows all rules to run without affecting the current facts, and queue fact changes separately until all rules have run? If so, how? I don't expect you to write the code for me, but just some keywords of what it's called or keywords in the API, some starting point to help me search.
Do any other expert/rule engines have this capability?
Note that in such a case, the order rules run in no longer matters, because all of the rules queued to run will all be seeing only the current state. Thus as the queue of rules is run and cleared, none of the rules see any of the changes the other rules are making, because they are all being run against the current set of facts. Thus the order becomes irrelevant and the complexities of managing rule execution order go away. All fact changes are pending and not applied to the current state until all rules have been cleared from the queue. Then all of those changes are applied at once, and thus cause relevant rules to queue again. So my goal is not to have more control over the order that rules run in, but to avoid the issue of rule execution order entirely by using an engine that simulates simultaneous rule execution.
If I understand what you describe:
You have one fact that is managed by many rules
Each rule should apply on the initial value of your fact and has no right to modify the fact value (to not modify other rules'executions)
You then batch all the updates made by the rules on your fact
Other rules apply on this new fact value in a similar manner 'simutanously'
It seems to me that it is a Unit of Work design pattern just like Hibernate implements it (and many ORM in fact): http://www.codeproject.com/Articles/581487/Unit-of-Work-Design-Pattern
Basically you store in-memory all the changes (in a 'technical' fact for instance) and then execute a 'transaction' when all the rules based on the initial value have been fired that updates the fact value, and so on. Hibernate does that with its session (you modify your attached object, then when required it executes the update query on the database, not all modifications on the java object produce queries on your database).
Still you will have troubles if updates conflict (same fact field value modified, which value to choose? Same as a source version control conflict), you will have to define a determinist way to order updates, but it will be defined only once and available for all rules and for other changes it will work seamlessly.
This workaournd may/may not work based on your rather vague description. If you really are concerned about rules triggering further activations, why not queue the intermediate state yourself. And once the current evaluation is complete, insert those new facts into the working memory.
You would have to invoke fireAllRules() after inserting each fact though, this could be quite expensive. And then in the rules, rather than inserting the facts directly, push these into a queue. Once the above call returns, walk through the queue doing the same (or after inserting the original facts completely...)
I would imagine that this will be quite slow, to speed up, you could have multiple parallel working memories with the same rules, and evaluate multiple facts in one go into several queues etc. But things get pretty hairy..
Anyway, just an idea that's too long for the comments...

How to start working with a large decision table

Today I've been presented with a fun challenge and I want your input on how you would deal with this situation.
So the problem is the following (I've converted it to demo data as the real problem wouldn't make much sense without knowing the company dictionary by heart).
We have a decision table that has a minimum of 16 conditions. Because it is an impossible feat to manage all of them (2^16 possibilities) we've decided to only list the exceptions. Like this:
As an example I've only added 10 conditions but in reality there are (for now) 16. The basic idea is that we have one baseline (the default) which is valid for everyone and all the exceptions to this default.
Example:
You have a foreigner who is also a pirate.
If you go through all the exceptions one by one, and condition by condition you remove the exceptions that have at least one condition that fails. In the end you'll end up with the following two exceptions that are valid for our case. The match is on the IsPirate and the IsForeigner condition. But as you can see there are 2 results here, well 3 actually if you count the default.
Our solution
Now what we came up with on how to solve this is that in the GUI where you are adding these exceptions, there should run an algorithm which checks for such cases and force you to define the exception more specifically. This is only still a theory and hasn't been tested out but we think it could work this way.
My Question
I'm looking for alternative solutions that make the rules manageable and prevent the problem I've shown in the example.
Your problem seem to be resolution of conflicting rules. When multiple rules match your input, (your foreigner and pirate) and they end up recommending different things (your cangetjob and cangetevicted), you need a strategy for resolution of this conflict.
What you mentioned is one way of resolution -- which is to remove the conflict in the first place. However, this may not always be possible, and not always desirable because when a user adds a new rule that conflicts with a set of old rules (which he/she did not write), the user may not know how to revise it to remove the conflict.
Another possible resolution method is prioritization. Mark a priority on each rule (based on things like the user's own authority etc.), sort the matching rules according to priority, and apply in ascending sequence of priority. This usually works and is much simpler to manage (e.g. everybody knows that the top boss's rules are final!)
Prioritization may also be used to mark a certain rule as "global override". In your example, you may want to make "IsPirate" as an override rule -- which means that it overrides settings for normal people. In other words, once you're a pirate, you're treated differently. This make it very easy to design a system in which you have a bunch of normal business rules governing 90% of the cases, then a set of "exceptions" that are treated differently, automatically overriding certain things. In this case, you should also consider making "?" available in the output columns as well.
One other possible resolution method is to include attributes in each of your conditions. For example, certain conditions must have no "zeros" in order to pass (? doesn't matter). Some conditions must have at least one "one" in order to pass. In other words, mark each condition as either "AND", "OR", or "XOR". Some popular file-system security uses this model. For example, CanGetJob may be AND (you want to be stringent on rights-to-work). CanBeEvicted may be OR -- you may want to evict even a foreigner if he is also a pirate.
An enhancement on the AND/OR method is to provide a threshold that the total result must exceed before passing that condition. For example, putting CanGetJob at a threshold of 2 then it must get at least two 1's in order to return 1. This is sometimes useful on conditions that are not clearly black-and-white.
You can mix resolution methods: e.g. first prioritize, then use AND/OR to resolve rules with similar priorities.
The possibilities are limitless and really depends on what your actual needs are.
To me this problem reminds business rules engine where there is no known algorithm to define outputs from inputs (e.g. using boolean logic) but the user (typically some sort of administrator) has to define all or some the logic itself.
This might sound a bit of an overkill but OTOH this provides virtually limit-less extension capabilities: you don't have to code any new business logic, just define a new rule set.
As I understand your problem, you are looking for a nice way to visualise the editing for these rules. But this all depends on your programming language and the tool you select for this. Java, for example, has JBoss Drools. Quoting their page:
Drools Guvnor provides a (logically
centralized) repository to store you
business knowledge, and a web-based
environment that allows business users
to view and (within certain
constraints) possibly update the
business logic directly.
You could possibly use this generic tool or write your own.
Everything depends on what your actual rules will look like. Rules like 'IF has an even number of these properties THEN' would be painful to represent in this format, whereas rules like 'IF pirate and not geek THEN' are easy.
You can 'avoid the ambiguity' by stating that you'll always be taking the first actual match, in other words your rules have a priority. You'd then want to flag rules which have no effect because they are 'shadowed' by rules higher up. They're not hard to find, so it's something your program should do.
Your interface could also indicate groups of rules where rules within the group can be in any order without changing the outcomes. This will add clarity to what the rules are really saying.
If some of your outputs are relatively independent of the others, you will also get a more compact and much clearer table by allowing question marks in the output. In that design the scan for first matching rule is done once for each output. Consider for example if 'HasChildren' is the only factor relevant to 'Can Be Evicted'. With question marks in the outputs (= no effect) you could be halving the number of exception rules.
My background for this is circuit logic design, not business logic. What you're designing is similar to, but not the same as, a PLA. As long as your actual rules are close to sum of products then it can work well. If your rules aren't, for example the 'even number of these properties' rule, then the grid like presentation will break down in a combinatorial explosion of cases. Your best hope if your rules are arbitrary is to get a clearer more compact presentation with either equations or with diagrams like a circuit diagram. To be avoided, if you can.
If you are looking for a Decision Engine with a GUI, than you can try this one: http://gandalf.nebo15.com/
We just released it, it's open source and production ready.
You probably need some kind of inference engine. Think about doing it in prolog.