How to use Drools backward chaining to list what initial facts are needed to satisfy a goal? - drools

I'm trying to use Drools backward chaining to find out which facts are needed to get an object inserted in the working memory.
In the following example, I expect to get the fact "go2".
rule "ins a"
when
String( this == "go2" )
then
insert(new A());
end
rule "Run"
when
then
insert(new String("go1"));
end
rule "Test isThereAnyA"
when
String( this == "go1" )
isThereAnyA(a;)
then
System.out.println( "you can get " + a );
end
query isThereAnyA (A a)
a := A()
end
I've been looking at examples in the official documentation
http://docs.jboss.org/drools/release/6.1.0.Final/drools-docs/html_single/index.html#d0e21289
but they show a different situation (the rules in those examples don't create new facts).
From the chart
http://docs.jboss.org/drools/release/6.1.0.Final/drools-docs/html_single/index.html#d0e21240
I think it should work but I haven't found a way to specify a query that gives me the expected results.
Thank you in advance.

Short answer:
Unfortunately, backward chaining cannot be used for this purpose.
It will not give you "go2" in this case.
Long answer:
In Drools, Backward chaining (BC) is a way to query the WM in a goal-driven fashion, not a way to trace back the derivation graph of a normal forward chaining inference process.
BC allows rule "Test" to retrieve As through the query "isThereAnyA", and possibly invoke other queries, but it will not let you find the "production" link between "A" and "go2". The reason is that "when..then..insert.." does not create any link between the triggering facts and the asserted conclusion, and backward chaining does not change that.
What you could do with BC is this:
query isThereAnyA_InPresenceOfA_String( A a )
isThereAnyString( $s ; )
a := A()
end
query isThereAnyString( String $s )
$s := String( this == "go2" )
end
This will pick up As only if a String "go2" is (still) present. However, you'll notice that the connection between a particular instance of A and the particular String which led to its assertion is still missing.
To know exactly which objects led to the assertion of another object you may need a different approach. Options include:
make the connection explicit : new A( $s ) // $s bound to "go2"
use "insertLogical" to establish a dependency between "go2" and A, then query the TruthMaintenanceSystem
The TMS-based one would be my tentative choice, but it also depends on your exact requirements.
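The first option (making the connection explicit) can be sketched in plain Java, outside of any rule engine. This is only an illustration: `A` here is a hypothetical version of the fact class that keeps a reference to whatever triggered its creation (what `new A( $s )` would pass in), and `causesOf` is a made-up helper, not a Drools API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class ProvenanceSketch {

    /** Hypothetical fact type: remembers the fact that led to its insertion. */
    public static final class A {
        private final Object cause;

        public A(Object cause) {
            this.cause = cause;
        }

        public Object getCause() {
            return cause;
        }
    }

    /** Collects the recorded cause of every A found among the given facts. */
    public static List<Object> causesOf(Collection<?> facts) {
        List<Object> causes = new ArrayList<>();
        for (Object fact : facts) {
            if (fact instanceof A) {
                causes.add(((A) fact).getCause());
            }
        }
        return causes;
    }

    public static void main(String[] args) {
        // Stand-in for the working memory after rule "ins a" fired on "go2".
        List<Object> facts = List.of("go1", "go2", new A("go2"));
        System.out.println(causesOf(facts)); // prints [go2]
    }
}
```

The trade-off is that every rule inserting an A has to pass the cause along itself; nothing is tracked automatically.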
This use case is common, and there may be other options, including a few experimental ones that are being developed in 6.3, but I'd rather ask a few questions first.
That is: when exactly do you need to discover the facts, during the execution of the rules or "offline"? Is it purely for auditing purposes, or does it impact your business logic? Can you have multiple rules asserting the "same" object?
Hope this helps
Davide

Related

Accumulate/Collect vs single java loop

I have a use case in which, let's say, I have 100,000 POJOs of type A in the KieSession, and I want to apply some rules to them, two of which are:
Check whether a duplicate value of type A exists.
Check whether the sum of A.someInt is equal to the givenValue.
class A {
private int someInt;
private String someString;
}
For these two checks, should I create separate rules in Drools like this:
$sumSomeInt: Integer(this > 90) from accumulate(A( $SomeInt: someInt ), sum($SomeInt))
function boolean checkDuplicate(List input) {
int a = input.size();
int b = ((List) input.stream().distinct().collect(Collectors.toList())).size();
return a!=b;
}
dialect "java"
rule "ADuplicateRule"
when
$input : List( ) from collect(A())
eval(checkDuplicate($input))
then
throw new Exception("A list has duplicate values");
end
Or is it better to do this in Java, using a single loop that performs both checks? What I want to know is whether these two approaches will show a major performance difference on 100k records.
Just a reminder: I also have other rules, so I have to use Drools. I don't want to do some validation in Java and some in Drools unless there is a major performance boost.
Drools optimizes the "when" clause. The "then", which is pure Java, is not optimized and is executed as-is. eval statements are not optimized either; they are bad practice and should be avoided at all costs. Just about anything you can do in an eval, you can do in a Drools-native fashion.
Given the choice, in the "when", of doing something in Java via an eval, and doing something using Drools built-in structures and methods, always prefer doing it the Drools native way. 100k records isn't very many so it probably won't be noticeable at such a scale.
And you shouldn't be throwing exceptions; that's also not good practice. Retract your inputs, or do something similar, to force execution to terminate early.
I shared a better way of doing duplicate detection in your other question.
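For comparison only, the single-loop Java alternative the question asks about might look like the sketch below. It is not a recommendation over the Drools-native approach and makes no performance claim. The `equals`/`hashCode` implementation on `A` is an assumption (the question does not show one), and `SinglePassCheck`, `Result` and `check` are illustrative names.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class SinglePassCheck {

    /** Stand-in for the question's class A; value equality is assumed. */
    public static final class A {
        final int someInt;
        final String someString;

        public A(int someInt, String someString) {
            this.someInt = someInt;
            this.someString = someString;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof A)) {
                return false;
            }
            A other = (A) o;
            return someInt == other.someInt && Objects.equals(someString, other.someString);
        }

        @Override
        public int hashCode() {
            return Objects.hash(someInt, someString);
        }
    }

    /** Outcome of one pass: the sum of someInt and whether a duplicate was seen. */
    public static final class Result {
        public final long sum;
        public final boolean hasDuplicate;

        Result(long sum, boolean hasDuplicate) {
            this.sum = sum;
            this.hasDuplicate = hasDuplicate;
        }
    }

    /** Walks the input once, summing someInt and tracking already-seen values. */
    public static Result check(List<A> input) {
        Set<A> seen = new HashSet<>();
        long sum = 0;
        boolean hasDuplicate = false;
        for (A a : input) {
            sum += a.someInt;
            if (!seen.add(a)) { // add returns false if an equal element is already present
                hasDuplicate = true;
            }
        }
        return new Result(sum, hasDuplicate);
    }
}
```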

Drools RETE algorithm confusion

I am having trouble understanding the RETE algorithm's beta nodes, JoinNode and NotNode.
Documentation says :
There are two two-input nodes, JoinNode and NotNode, and both are
types of BetaNodes. BetaNodes are used to compare 2 objects, and their
fields, to each other. The objects may be the same or different types.
By convention, we refer to the two inputs as left and right. The left
input for a BetaNode is generally a list of objects; in Drools this is
a Tuple. The right input is a single object. Two Nodes can be used to
implement 'exists' checks. BetaNodes also have memory. The left input
is called the Beta Memory and remembers all incoming tuples. The right
input is called the Alpha Memory and remembers all incoming objects.
I understand alpha nodes (the various literal conditions of DRL rules), but the documentation above on BetaNodes confuses me a bit.
Say the DRL condition for the diagram above is:
$person : Person( favouriteCheese == $cheddar )
Questions: 1) What exactly are the left and right inputs to the two-input BetaNodes described in the documentation above? I believe it refers to facts and rules, where the tuples would be facts?
2) Would a NotNode basically be a DRL condition that matches a literal condition with not?
Updated question on 6Sep17:
3) I believe the diagram above represents a JoinNode. How would a NotNode be represented if the workflow above were altered to suit it?
The condition corresponding to the diagram would be
Cheese( $name: name == "Cheddar" )
Person( favouriteCheese == $name )
Once there is a match, a new tuple consisting of the matching Cheese and Person is composed and can act as a new tuple for further matches if there is a third pattern in the condition.
A not-node would be one that asserts the non-existence of some fact. It would fire only once.
You might find a much better description of "rete" on the web.
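As a rough illustration of the difference between the two node types, here is a batch-style Java sketch. It is not how Drools implements them (real Rete nodes work incrementally and keep left and right memories); `BetaNodeSketch`, `join`, `not` and `Person` are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

public class BetaNodeSketch {

    /** Hypothetical fact type mirroring the Person pattern above. */
    public static final class Person {
        public final String name;
        public final String favouriteCheese;

        public Person(String name, String favouriteCheese) {
            this.name = name;
            this.favouriteCheese = favouriteCheese;
        }
    }

    /** Join: for each left item, emit a (left, right) pair for every matching right fact. */
    public static <L, R> List<List<Object>> join(List<L> left, List<R> right, BiPredicate<L, R> test) {
        List<List<Object>> out = new ArrayList<>();
        for (L l : left) {
            for (R r : right) {
                if (test.test(l, r)) {
                    out.add(List.<Object>of(l, r));
                }
            }
        }
        return out;
    }

    /** Not: pass a left item on only if no right fact matches it. */
    public static <L, R> List<L> not(List<L> left, List<R> right, BiPredicate<L, R> test) {
        List<L> out = new ArrayList<>();
        for (L l : left) {
            boolean anyMatch = false;
            for (R r : right) {
                if (test.test(l, r)) {
                    anyMatch = true;
                    break;
                }
            }
            if (!anyMatch) {
                out.add(l);
            }
        }
        return out;
    }
}
```

With one cheese name "Cheddar" on the left and two persons on the right, `join` yields one pair per person whose favourite cheese is Cheddar. Calling `not` with the persons on the left and the cheese names on the right yields only the persons for whom no matching cheese exists.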

Why does the order of how we specify the variables in a '==' comparison matter?

What I noticed is that there is a big performance difference from just changing the order of the variables that are compared with the '==' operator. For example, $variable == variable is considerably slower than variable == $variable.
Why is this so, and are there similar cases like this one?
By the way, I am using a version of OptaPlanner downloaded from GitHub that uses the "7.0.0-SNAPSHOT" Drools version.
This is the case in all the rules that do a cross product where I try to match variables from one pattern in another. For example:
rule "Example"
when
Class1(... , $var : var)
Class2($var == var, ...)
then
end
So when I changed the expression $var == var to var == $var, I could spot the difference immediately.
As for benchmarking, at first I compared this in just the one rule I was focused on, so I only made this type of change in the expressions there (the other rules were deleted).
Afterwards I applied this to all the rules.
I think what happens is that
Class1(... , $var : var)
Class2(var == $var, ...)
produces a network where all Class1 facts are taken, and then the Cartesian product with all Class2 facts with identical var field is created.
In contrast,
Class1(... , $var : var)
Class2($var == var, ...)
is "rewritten" by the compiler as
Class1(... , $var : var)
$c2: Class2(...)
eval( $var == $c2.var )
creates the Cartesian product of all Class1 facts and all (!) Class2 facts and only thereafter filters all where the eval is false.
The traditional syntax (Drools 5 and earlier) forced you to have the field name on the left-hand side; only later on (late 5.x, 6.x), any logical expression was permitted.
After speaking to someone from the Drools team, a more accurate description might be this: it is likely that an optimization is triggered where an attribute is compared to something else. Someone from the Drools team will take a look and possibly improve it by also checking the reversed expression.
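To make the cost difference in the hypothesis above concrete, here is a small Java sketch that contrasts a filter over the full Cartesian product with a lookup in an index built once. It only illustrates the two join strategies in general; it says nothing about what Drools actually generates for either expression, and `JoinCostSketch` and its methods are made-up names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JoinCostSketch {

    /** Tests every (left, right) pair; returns {matches, comparisons}. */
    public static long[] nestedLoop(List<Integer> left, List<Integer> right) {
        long matches = 0;
        long comparisons = 0;
        for (int l : left) {
            for (int r : right) {
                comparisons++;
                if (l == r) {
                    matches++;
                }
            }
        }
        return new long[] {matches, comparisons};
    }

    /** Counts right values per key once, then looks each left value up; returns {matches, lookups}. */
    public static long[] indexed(List<Integer> left, List<Integer> right) {
        Map<Integer, Integer> countByValue = new HashMap<>();
        for (int r : right) {
            countByValue.merge(r, 1, Integer::sum);
        }
        long matches = 0;
        long lookups = 0;
        for (int l : left) {
            lookups++;
            matches += countByValue.getOrDefault(l, 0);
        }
        return new long[] {matches, lookups};
    }
}
```

Both methods report the same number of matches, but the nested loop performs left.size() * right.size() comparisons while the indexed version performs one lookup per left value after a single pass over the right side.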

Does drools support any form of "rule activation probability" (as in how close to firing is a rule)?

I'm wondering if there is anything in drools that can be used to determine how close a rule is (or has been) to being activated?
From all that I can tell, the standard drools doesn't support anything like it, I just wondered if I might have missed something.
I glanced at Drools Chance (https://github.com/droolsjbpm/drools-chance), but it doesn't seem to have been developed much recently and doesn't seem ready for Drools 6.x.
I know that AgendaEventListeners can be used to intercept when a rule has fired but it doesn't look like there is anything to intercept if a single condition of a rule has been evaluated to true.
Am I missing something or is this a current limitation of drools to not have any support for this kind of thing?
Thanks!
Perhaps you could use a workaround. Construct a set of extra rules that write metrics when they fire, depending on your definition of close: that could be 1..n parts of the LHS of the rule of interest and/or thresholds for nearness to any part of the LHS (say you want to know when a value comes to within 90% of another value). For complex conditionals in source code (not Drools related), I've used approaches like the one below to trace complicated and nested logic:
boolean a1 = property1 > property2;
boolean a2 = (!isHigh || isMedium);
boolean a3 = property4 == property5;
// log each partial result before combining them
System.out.println("rule2: " + a1 + " " + a2 + " " + a3);
if (a1 && a2 && a3) {
// ...do something
}
This is related to what I call "learning the reason for failure". Consider that you have to pass n qualifications. Rather than just being told that you have failed, you'd like to have a list of the "pass" (and "fail") criteria.
One rule evaluating all of this in a lump sum is no good. You have to write one rule for each of the n criteria and collect the positives with the fact holding the properties under survey. Finally, one low-priority rule can check whether you have all n ("hooray") and another one can tell you "sorry, no", but it can give you a list of what succeeded (and what did not).
Lots of effort, but good information is always costly.
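Outside of Drools, the "one check per criterion, then summarize" idea can be sketched in plain Java as follows. This is an analogy rather than rule code: each named predicate plays the role of one small rule, and `CriteriaReport`, `evaluate` and `allPassed` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

public class CriteriaReport {

    /** Evaluates each named criterion separately and records pass/fail per name. */
    public static <T> Map<String, Boolean> evaluate(T candidate, Map<String, Predicate<T>> criteria) {
        Map<String, Boolean> report = new LinkedHashMap<>();
        for (Map.Entry<String, Predicate<T>> entry : criteria.entrySet()) {
            report.put(entry.getKey(), entry.getValue().test(candidate));
        }
        return report;
    }

    /** The "hooray" check: true only if no criterion failed. */
    public static boolean allPassed(Map<String, Boolean> report) {
        return !report.containsValue(false);
    }
}
```

The report keeps one entry per criterion, so a failing candidate still tells you which parts succeeded and which did not.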

Methods of simplifying ugly nested if-else trees in C#

Sometimes I write ugly if-else statements in C# 3.5. I'm aware of several approaches to simplifying them: table-driven development, class hierarchies, anonymous methods, and some more.
The problem is that alternatives are still less wide-spread than writing traditional ugly if-else statements because there is no convention for that.
What depth of nested if-else is normal for C# 3.5? Which methods would you expect to see instead of nested if-else, as a first choice and as a second?
If I have ten input parameters with 3 states each, I should map functions to combinations of each state of each parameter (fewer in practice, because not all the states are valid, but sometimes still a lot). I can express these states as a hashtable key and a handler (lambda) which will be called if the key matches.
It is still a mix of table-driven and data-driven development ideas and pattern matching.
What I'm looking for is extending approaches such as this one for scripting to C# (C# 3.5 is rather like scripting):
http://blogs.msdn.com/ericlippert/archive/2004/02/24/79292.aspx
Good question. "Conditional Complexity" is a code smell. Polymorphism is your friend.
Conditional logic is innocent in its infancy, when it’s simple to understand and contained within a
few lines of code. Unfortunately, it rarely ages well. You implement several new features and
suddenly your conditional logic becomes complicated and expansive. [Joshua Kerievsky: Refactoring to Patterns]
One of the simplest things you can do to avoid nested if blocks is to learn to use Guard Clauses.
double getPayAmount() {
if (_isDead) return deadAmount();
if (_isSeparated) return separatedAmount();
if (_isRetired) return retiredAmount();
return normalPayAmount();
};
The other thing I have found simplifies things pretty well, and which makes your code self-documenting, is Consolidating conditionals.
double disabilityAmount() {
if (isNotEligableForDisability()) return 0;
// compute the disability amount
}
Other valuable refactoring techniques associated with conditional expressions include Decompose Conditional, Replace Conditional with Visitor, Specification Pattern, and Reverse Conditional.
There are very old "formalisms" for trying to encapsulate extremely complex expressions that evaluate many possibly independent variables, for example, "decision tables" :
http://en.wikipedia.org/wiki/Decision_table
But, I'll join in the choir here to second the ideas mentioned of judicious use of the ternary operator if possible, identifying the most unlikely conditions which if met allow you to terminate the rest of the evaluation by excluding them first, and add ... the reverse of that ... trying to factor out the most probable conditions and states that can allow you to proceed without testing of the "fringe" cases.
The suggestion by Miriam (above) is fascinating, even elegant, as "conceptual art;" and I am actually going to try it out, trying to "bracket" my suspicion that it will lead to code that is harder to maintain.
My pragmatic side says there is no "one size fits all" answer here in the absence of a pretty specific code example, and complete description of the conditions and their interactions.
I'm a fan of "flag setting" : meaning anytime my application goes into some less common "mode" or "state" I set a boolean flag (which might even be static for the class) : for me that simplifies writing complex if/then else evaluations later on.
best, Bill
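The decision-table idea mentioned above (and the hashtable of handlers from the question) can be sketched as a map from a combination of condition values to an action. The sketch is in Java rather than C#, and `DecisionTableSketch`, `when` and `decide` are illustrative names, not an existing library.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

public class DecisionTableSketch {

    // Each row maps one combination of condition values to an action.
    private final Map<List<Boolean>, Supplier<String>> rows = new HashMap<>();
    private final Supplier<String> fallback;

    /** The fallback action runs when no row matches the given combination. */
    public DecisionTableSketch(Supplier<String> fallback) {
        this.fallback = fallback;
    }

    /** Registers an action for one combination of condition values. */
    public DecisionTableSketch when(Supplier<String> action, Boolean... conditions) {
        rows.put(List.copyOf(Arrays.asList(conditions)), action);
        return this;
    }

    /** Looks up the row for the given combination and runs its action. */
    public String decide(Boolean... conditions) {
        return rows.getOrDefault(List.copyOf(Arrays.asList(conditions)), fallback).get();
    }
}
```

A table built with `new DecisionTableSketch(() -> "default").when(() -> "both", true, true)` returns "both" for `decide(true, true)` and "default" for any combination that has no row, which replaces a nested if-else over the same conditions.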
Simple. Take the body of the if and make a method out of it.
This works because most if statements are of the form:
if (condition):
action()
In other cases, more specifically :
if (condition1):
if (condition2):
action()
simplify to:
if (condition1 && condition2):
action()
I'm a big fan of the ternary operator, which gets overlooked by a lot of people. It's great for assigning values to variables based on conditions, like this:
foobarString = (foo == bar) ? "foo equals bar" : "foo does not equal bar";
Try this article for more info.
It won't solve all your problems, but it is very economical.
I know that this is not the answer you are looking for, but without context your question is very hard to answer. The problem is that the way to refactor such a thing really depends on your code, what it is doing, and what you are trying to accomplish. If you had said that you were checking the type of an object in these conditionals, we could throw out an answer like 'use polymorphism', but sometimes you actually do just need some if statements, and sometimes those statements can be refactored into something simpler. Without a code sample it is hard to say which category you are in.
I was told years ago by an instructor that 3 is a magic number. As he applied it to if-else statements, he suggested that if I needed more than 3 ifs then I should probably use a case statement instead.
switch (testValue)
{
case 1:
// do something
break;
case 2:
// do something else
break;
case 3:
// do something more
break;
case 4:
// do what?
break;
default:
throw new Exception("I didn't do anything");
}
If you're nesting if statements more than 3 deep then you should probably take that as a sign that there is a better way. Probably like Avirdlg suggested, separating the nested if statements into 1 or more methods. If you feel you are absolutely stuck with multiple if-else statements then I would wrap all the if-else statements into a single method so it didn't ugly up other code.
If the entire purpose is to assign a different value to some variable based upon the state of various conditionals, I use a ternary operator.
If the if-else clauses perform separate chunks of functionality and the conditions are complex, simplify by creating temporary boolean variables to hold the true/false value of the complex boolean expressions. These variables should be suitably named to represent the business sense of what the complex expression is calculating. Then use the boolean variables in the if-else syntax instead of the complex boolean expressions.
One thing I find myself doing at times is inverting the condition followed by return; several such tests in a row can help reduce nesting of if and else.
Not a C# answer, but you probably would like pattern matching. With pattern matching, you can take several inputs, and do simultaneous matches on all of them. For example (F#):
let x=
match cond1, cond2, name with
| _, _, "Bob" -> 9000 // Bob gets 9000, regardless of cond1 or 2
| false, false, _ -> 0
| true, false, _ -> 1
| false, true, _ -> 2
| true, true, "" -> 0 // Both conds but no name gets 0
| true, true, _ -> 3 // Cond1&2 give 3
You can express any combination to create a match (this just scratches the surface). However, C# doesn't support this, and I doubt it will any time soon. Meanwhile, there are some attempts to try this in C#, such as here: http://codebetter.com/blogs/matthew.podwysocki/archive/2008/09/16/functional-c-pattern-matching.aspx. Google can turn up many more; perhaps one will suit you.
Try to use patterns like Strategy or Command.
In simple cases you should be able to get around with basic functional decomposition. For more complex scenarios I used Specification Pattern with great success.
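A minimal sketch of the Specification pattern mentioned above, written in Java rather than C#. The `Employee` type and the three example specifications are hypothetical; the point is only that each condition becomes a small named object and complex conditions are composed from them instead of being nested.

```java
public class SpecificationSketch {

    /** A single named condition that can be combined with others. */
    public interface Specification<T> {
        boolean isSatisfiedBy(T candidate);

        default Specification<T> and(Specification<T> other) {
            return c -> isSatisfiedBy(c) && other.isSatisfiedBy(c);
        }

        default Specification<T> or(Specification<T> other) {
            return c -> isSatisfiedBy(c) || other.isSatisfiedBy(c);
        }

        default Specification<T> not() {
            return c -> !isSatisfiedBy(c);
        }
    }

    /** Hypothetical domain object used only for this example. */
    public static final class Employee {
        public final boolean retired;
        public final int yearsOfService;

        public Employee(boolean retired, int yearsOfService) {
            this.retired = retired;
            this.yearsOfService = yearsOfService;
        }
    }

    public static final Specification<Employee> RETIRED = e -> e.retired;
    public static final Specification<Employee> LONG_SERVING = e -> e.yearsOfService >= 10;

    // Composed rule: not retired and at least ten years of service.
    public static final Specification<Employee> ELIGIBLE = RETIRED.not().and(LONG_SERVING);
}
```

A caller then writes `if (ELIGIBLE.isSatisfiedBy(employee))` instead of repeating the nested conditions at every call site.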