How to obtain the minimum of an updated value in Drools? - drools

I want to use Drools to manage some questions in a form. Depending on the answer to a question, the question's score is updated with a value. Then I want to obtain the minimum value of all questions answered in a category.
One example:
Category "Replicant or not?":
You’re in a desert walking along in the sand when all of a sudden you look down, and you see a tortoise, crawling toward you. You reach down, you flip the tortoise over on its back. Why is that?
1. It was an error (5 points).
2. It is the funniest thing to do in a desert (3 points).
3. You want to kill the tortoise in the most painful way (1 point).
//More other questions.
At the end of the test, the minimum value for each category will be the one used.
I have then defined some Drools rules to set the score for each answer (they live in a spreadsheet, but I show the translated rules here):
rule "Question_1"
when
$qs: Questions(checked == false);
$q: Question(category == 'cat1', question == "q1", answer == "a") from $qs.getQuestions();
then
$q.setValue(5);
$qs.setChecked(true);
update($qs);
end
The checked value is used to avoid the rule firing again after the update. category, question and answer are used to classify the question.
And then, my rule to calculate the minimum is:
rule "Min value"
when
$qs: Questions(checked == true);
not q:Question($v: value; value < $v) from $qs.getQuestions();
then
$qs.setMinValue($v);
System.out.println("The smallest value is: "+ $v);
end
The error obtained is:
$v cannot be resolved to a variable
So the question is: how can I obtain the minimum of a value set by a previous rule?

The duplication of "from " can be avoided by using the accumulate CE, which is about 25% faster.
rule "getmin"
when
Questions( $qs: questions )
accumulate( Question( $v: value ) from $qs; $min: min( $v ) )
then
System.out.println( "minimum is " + $min );
end

You are trying to get $v from a fact that doesn't exist: not q:Question($v: value; value < $v) from $qs.getQuestions();
What is the expected value of $v if there is no Question that matches that pattern?
What the second pattern is basically saying is: "there is no Question from Questions.getQuestions() that has a value ($v) and whose value is less than that same value".
What you have to do is bind $v in a positive pattern. Something like this:
rule "Min value"
when
$qs: Questions(checked == true);
Question($v: value) from $qs.getQuestions()
not Question(value < $v) from $qs.getQuestions();
then
$qs.setMinValue($v);
System.out.println("The smallest value is: "+ $v);
end
Hope it helps,
---EDITED: added missing Question positive pattern
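To make the semantics of the corrected rule concrete, here is a minimal Python sketch (not Drools code; the function name and the sample scores are made up for illustration). It mimics "bind $v in a positive pattern, then require that no Question has a smaller value". One thing it makes visible: if several questions share the minimum, the positive pattern matches each of them, so the rule would fire once per tie, whereas an accumulate with min produces a single result.

```python
def minimal_values(values):
    """Return every value v for which no strictly smaller value exists.

    Mirrors: Question($v: value) followed by not Question(value < $v).
    """
    return [v for v in values if not any(other < v for other in values)]


scores = [5, 3, 1, 1]             # hypothetical question scores
matches = minimal_values(scores)  # one entry per question holding the minimum
single_minimum = min(scores)      # what an accumulate/min would report
```

Here matches holds two entries because two questions are tied for the lowest score. If the rule's consequence must run only once, the accumulate form is the simpler choice.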

Related

OptaPlanner: Drools rule on consecutive shift assignments

The context is Employee Shift Assignment with OptaPlanner using Drools rules for calculating scores.
My employees cannot work for more than, say, three consecutive days without a rest day.
I implement such a constraint very stupidly as:
rule "No more than three consecutive working days"
when
    ShiftAssignment(
        $id1 : id,
        $employee : employee != null,
        $shift1 : shift
    )
    ShiftAssignment(
        id > $id1,
        employee == $employee,
        shift.isConsecutiveDay($shift1),
        $id2 : id,
        $shift2 : shift
    )
    ShiftAssignment(
        id > $id2,
        employee == $employee,
        shift.isConsecutiveDay($shift2),
        $id3 : id,
        $shift3 : shift
    )
    ShiftAssignment(
        id > $id3,
        employee == $employee,
        shift.isConsecutiveDay($shift3)
    )
then
scoreHolder.penalize(kcontext);
end
I hope the names of the methods/variables clearly reveal what they do/mean.
Is there a more convenient and smart way to implement such a rule? Keep in mind that the three days above may need to change to a bigger number (I used three to avoid a more realistic ten and more lines of code in the rule). Thanks.
If we can assume an employee takes up to a single shift per day and the shift.isConsecutiveDay() may be replaced by something like shift.day == $shift1.day + 1, exists can be used:
when
ShiftAssignment($employee : employee != null, $shift1 : shift)
exists ShiftAssignment(employee == $employee, shift.day == $shift1.day + 1)
exists ShiftAssignment(employee == $employee, shift.day == $shift1.day + 2)
exists ShiftAssignment(employee == $employee, shift.day == $shift1.day + 3)
then
If such an assumption cannot be made, your solution should work, with one potential corner case to think about:
The rule tries to filter out combinations of the same shifts by the condition id > $id1. This condition works, but the IDs must be assigned in ascending order of shift time; otherwise it clashes with shift.isConsecutiveDay(...). If this property cannot be guaranteed, checking for ID inequality could be preferable.
I used a combination of rules to achieve this. First rule sets up the start of a consecutive work sequence, second one sets up the end, 3rd rule creates a "Work Sequence" to fit between the start and end. Finally the "Max Consecutive Days" rule actually checks your "Work Sequence" against a limit on number of consecutive days.
This paradigm is actually in the nurse rostering example:
https://github.com/kiegroup/optaplanner/blob/master/optaplanner-examples/src/main/resources/org/optaplanner/examples/nurserostering/solver/nurseRosteringConstraints.drl
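Outside of Drools, the constraint itself boils down to finding a run of consecutive working days that is longer than the limit. The sketch below is plain Python, not OptaPlanner code. It assumes each shift can be reduced to an integer day index and that several shifts on the same day count as one working day; the function name and parameters are hypothetical. It shows why a "work sequence" representation scales to a limit of ten or more without adding a pattern per day.

```python
def exceeds_consecutive_limit(work_days, limit):
    """Return True if work_days contains a run of more than `limit` consecutive days.

    work_days: iterable of integer day indices on which the employee works.
    """
    run = 0
    previous = None
    for day in sorted(set(work_days)):   # de-duplicate, then walk in day order
        if previous is not None and day == previous + 1:
            run += 1                     # the run continues
        else:
            run = 1                      # a rest day (or the start) resets the run
        if run > limit:
            return True
        previous = day
    return False
```

For example, exceeds_consecutive_limit([1, 2, 3, 4], 3) is True, while [1, 2, 4, 5, 6] stays within a limit of 3 because day 3 is a rest day.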

In Drools, what does it mean to compare IDs

I understand the basics of writing Drools rules now, but in the examples that I've seen (OptaPlanner) there are comparisons of IDs, and I can't seem to understand them. Is this necessary? Why is it there?
// RoomOccupancy: Two lectures in the same room at the same period.
// Any extra lecture in the same period and room counts as one more violation.
rule "roomOccupancy"
when
Lecture($leftId : id, period != null, $period : period, room != null, $room : room)
// $leftLecture has lowest id of the period+room combo
not Lecture(period == $period, room == $room, id < $leftId)
// rightLecture has the same period
Lecture(period == $period, room == $room, id > $leftId, $rightId : id)
then
scoreHolder.addHardConstraintMatch(kcontext, -1);
end
From my understanding deleting the line with not Lecture(.. and leaving Lecture(period == $period, room == $room) should do the trick. Is my understanding correct or am I missing some use cases here?
You should understand that a pattern such as
$a: Lecture()
$b: Lecture()
with two Lecture facts A and B in the system will produce the following matches and firings:
$a-A, $b-B (1)
$a-B, $b-A (2)
$a-A, $b-A
$a-B, $b-B
Therefore, to reduce the unwanted combinations, you need a way to make sure that $a and $b are not bound to the same fact:
$a: Lecture( $ida: id )
$b: Lecture( $idb: id != $ida )
However, using "not equal" still produces both combinations (1) and (2). Comparing the ids with < or > instead keeps only one of them.
Given 2 queens A and B, the id comparison in the "no 2 queens on the same horizontal row" constraint makes sure that we only match A-B and not B-A, A-A and B-B.
Same principle for lectures.
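The effect of the three kinds of constraint can be counted with a small Python sketch (an illustration of the join semantics only, not Drools code; the two lectures and their ids are made up):

```python
from itertools import product

lectures = [("A", 1), ("B", 2)]  # hypothetical (name, id) facts

# No constraint: every fact is joined with every fact, including itself.
all_pairs = [(a[0], b[0]) for a, b in product(lectures, repeat=2)]

# id != : removes A-A and B-B, but keeps both A-B and B-A.
distinct_pairs = [(a[0], b[0]) for a, b in product(lectures, repeat=2) if a[1] != b[1]]

# id < : keeps exactly one match per unordered pair.
ordered_pairs = [(a[0], b[0]) for a, b in product(lectures, repeat=2) if a[1] < b[1]]
```

With two facts this gives four, two and one matches respectively, which is why the score rules compare ids with < or > rather than !=: otherwise each conflict would be penalized twice.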

Latest n events, matching a certain pattern

Is there a built-in feature in Drools, selecting the latest n events, matching a certain pattern? I've read about sliding length windows in the documentation and the stock tick example seemed to be exactly what I wanted:
"For instance, if the user wants to consider only the last 10 RHT Stock Ticks, independent of how old they are, the pattern would look like this:"
StockTick( company == "RHT" ) over window:length( 10 )
When testing the example, it seems to me that it is evaluated more like
StockTick( company == "RHT" ) from StockTick() over window:length( 10 )
selecting the latest 10 StockTick events and afterwards filtering them by company == "RHT", resulting in 0 to 10 RHT ticks, even though the stream contains more than 10 RHT events.
A workaround is something like:
$tick : StockTick( company == "RHT" )
accumulate(
$other : StockTick(this after $tick, company == "RHT" );
$cnt : count($other);
$cnt < 10)
which has bad performance and readability.
Most likely you are seeing an initial phase in which the number of events that satisfy the constraints has not yet reached the length specified in window:length. For instance,
rule "Your First Rule"
when
accumulate( $st : Applicant($age: age > 5) over window:length(10)
from entry-point X,
$avg: average ( $age ), $cnt: count( $st ))
then
System.out.println("~~~~~avg~~~~~");
System.out.println($avg + ", count=" + $cnt);
System.out.println("~~~~~avg~~~~~");
end
displays an output even before there are 10 matching Applicants but later on, $cnt never falls below 10, even though $age ranges from 0 to 9, periodically.
If you do think you have found an example supporting your claim, please provide full code for reproduction and make sure to indicate the Drools version.
Your workaround is very bad indeed, as it accumulates for each StockTick. But a window:length(n) can be very efficiently implemented by using an auxiliary fact maintaining a list of n events. This may even be more advantageous than window:length.
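The auxiliary-fact idea can be sketched in plain Python as follows. This is not the Drools API; the class name, the tuple layout of a tick and the sample stream are assumptions made for illustration. The point is the order of operations: the constraint is applied first, and only matching events enter the fixed-length buffer, so the buffer holds the last n matching events regardless of how many non-matching events arrive in between.

```python
from collections import deque


class LastMatching:
    """Keeps the most recent `size` events that satisfy `predicate`."""

    def __init__(self, size, predicate):
        self.events = deque(maxlen=size)  # the oldest entry is dropped automatically
        self.predicate = predicate

    def offer(self, event):
        # Filter first, then window: non-matching events never use up a slot.
        if self.predicate(event):
            self.events.append(event)


# Hypothetical stream: every third tick belongs to RHT.
window = LastMatching(10, lambda tick: tick[0] == "RHT")
for i in range(60):
    window.offer(("RHT" if i % 3 == 0 else "XYZ", i))
```

After the loop, window.events contains the ten most recent RHT ticks, even though two thirds of the stream belonged to another company.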

Conceptual meaning of 'not' keyword; evaluating between objects

I am trying to find a BucketTotal object which has the smallest total in a Drools Planner project. I adapted this from example code.
rule "insertMinimumBucketTotal"
when
$b : BucketTotal($total : total)
not BucketTotal(total > $total) // CONFUSED HERE
then
insertLogical(new MinimumBucketTotal($total));
end
As far as my reasoning went, it meant "find a BucketTotal object $b such that there doesn't exist another BucketTotal object whose total is greater than the total of $b".
Turns out, it meant the opposite (and I corrected it).
Please explain how Drools reasons that statement to find $b.
Indeed, you are confusing things. "not" means "not exists". So if you want to find the minimum total you can do:
rule "minimum"
when
BucketTotal( $min : total )
not BucketTotal( total < $min )
then
// do something with $min
end
The above is usually the more performant way of doing it, but you can also use accumulate if you prefer:
rule "minimum"
when
accumulate( BucketTotal( $total : total ),
$min : min( $total ) )
then
// do something with $min
end
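A quick way to check which direction the comparison needs is to spell out "there exists no other fact such that ..." in ordinary code. The Python sketch below is only an illustration of that reading, with made-up totals:

```python
totals = [7, 2, 9]  # hypothetical BucketTotal.total values

# not BucketTotal(total > $total): nothing is greater, so $total is the maximum.
no_greater = [t for t in totals if not any(other > t for other in totals)]

# not BucketTotal(total < $min): nothing is smaller, so $min is the minimum.
no_smaller = [t for t in totals if not any(other < t for other in totals)]
```

The first list contains only 9 and the second only 2, which matches the behaviour observed with the original rule.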

Wanted: a quicker way to check all combinations within a very large hash

I have a hash with about 130,000 elements, and I am trying to check all combinations within that hash for something (130,000 x 130,000 combinations). My code looks like this:
foreach $key1 (keys %CNV)
{
foreach $key2 (keys %CNV)
{
if (blablabla){do something that doesn't take as long}
}
}
As you might expect, this takes ages to run. Does anyone know a quicker way to do this? Many thanks in advance!!
-Abdel
Edit: Update on the blablabla.
Hey guys, thanks for all the feedback! Really appreciate it. I changed the foreach statement to:
for ($j=1;$j<=24;++$j)
{
foreach $key1 (keys %{$CNV{$j}})
{
foreach $key2 (keys %{$CNV{$j}})
{
if (blablabla){do something}
}
}
}
The hash is now multidimensional:
$CNV{chromosome}{$start,$end}
I'll elaborate on what I'm exactly trying to do, as requested.
The blablabla is the following:
if ( (($CNVstart{$j}{$key1} >= $CNVstart{$j}{$key2}) && ($CNVstart{$j}{$key1} <= $CNVend{$j}{$key2})) ||
(($CNVend{$j}{$key1} >= $CNVstart{$j}{$key2}) && ($CNVend{$j}{$key1} <= $CNVend{$j}{$key2})) ||
(($CNVstart{$j}{$key2} >= $CNVstart{$j}{$key1}) && ($CNVstart{$j}{$key2} <= $CNVend{$j}{$key1})) ||
(($CNVend{$j}{$key2} >= $CNVstart{$j}{$key1}) && ($CNVend{$j}{$key2} <= $CNVend{$j}{$key1}))
)
In short: the hash elements represent a specific part of the DNA (a so-called "CNV"; think of it as a gene for now), each with a start and an end (integers giving the position on that particular chromosome, stored in hashes with the same keys: %CNVstart and %CNVend). I'm trying to check, for every combination of CNVs, whether they overlap. If two elements overlap within a family (a family of persons whose DNA I have read in; there is also a for-statement inside the foreach-statement that lets the program check this for every family, which makes it take even longer), I check whether they also have the same "copy number" (stored in another hash with the same keys) and print out the result.
Thank you guys for your time!
It sounds like Algorithm::Combinatorics may help you here. It's intended to provide "efficient generation of combinatorial sequences." From its docs:
Algorithm::Combinatorics is an
efficient generator of combinatorial
sequences. ... Iterators do not use
recursion, nor stacks, and are written
in C.
You could use its combinations sub-routine to provide all possible 2 key combos from your full set of keys.
On the other hand, Perl itself is written in C. So I honestly have no idea whether or not this would help at all.
Maybe by using concurrency? But you would have to be careful about what you do with a positive match so as not to run into problems.
E.g. take the keys for $key1 and split them into two halves, $key1A and $key1B. Then create two separate threads, each running "half of the loop".
I am not sure exactly how expensive it is to start new threads in Perl, but if your positive action doesn't have to be synchronized, I imagine that on suitable (multi-core) hardware you would be faster.
Worth a try imho.
define blah blah.
You could write it like this:
foreach $key1 (keys %CNV)
{
if (blah1)
{
foreach $key2 (keys %CNV)
{
if (blah2){do something that doesn't take as long}
}
}
}
If blah1 is true for only a few keys, this pass should be closer to O(N) than to O(N^2); if blah1 is true for most keys, it is still O(N^2).
The data structure in the question is not a good fit to the problem. Let's try it this way.
use strict;
use warnings;
use feature 'say';
use Set::IntSpan::Fast::XS;

my @CNV;
for ([3, 7], [4, 8], [9, 11]) {
    my $set = Set::IntSpan::Fast::XS->new;
    $set->add_range(@{$_});
    push @CNV, $set;
}
# The comparison is commutative, so we can cut the total number in half.
for my $index1 (0 .. $#CNV) {
    for my $index2 (0 .. $index1) {
        next if $index1 == $index2; # skip if it's the same CNV
        say sprintf(
            'overlap of CNV %s, %s at indices %d, %d',
            $CNV[$index1]->as_string, $CNV[$index2]->as_string, $index1, $index2
        ) unless $CNV[$index1]->intersection($CNV[$index2])->is_empty;
    }
}
Output:
overlap of CNV 4-8, 3-7 at indices 1, 0
We will not get the overlap of 3-7, 4-8 because it is a duplicate.
There's also Bio::Range, but it doesn't look so efficient to me. You should definitely get in touch with the bio.perl.org/open-bio people; chances are that what you're doing has been done a million times before and they already have the optimal algorithm all figured out.
I think I found the answer :-)
Couldn't have done it without you guys though. I found a way to skip most of the comparisons I make:
for ($j=1;$j<=24;++$j)
{
foreach $key1 (sort keys %{$CNV{$j}})
{
foreach $key2 (sort keys %{$CNV{$j}})
{
if (($CNVstart{$j}{$key2} < $CNVstart{$j}{$key1}) && ($CNVend{$j}{$key2} < $CNVstart{$j}{$key1}))
{
next;
}
if (($CNVstart{$j}{$key2} > $CNVend{$j}{$key1}) && ($CNVend{$j}{$key2} > $CNVend{$j}{$key1}))
{
last;
}
if ( (($CNVstart{$j}{$key1} >= $CNVstart{$j}{$key2}) && ($CNVstart{$j}{$key1} <= $CNVend{$j}{$key2})) ||
(($CNVend{$j}{$key1} >= $CNVstart{$j}{$key2}) && ($CNVend{$j}{$key1} <= $CNVend{$j}{$key2})) ||
(($CNVstart{$j}{$key2} >= $CNVstart{$j}{$key1}) && ($CNVstart{$j}{$key2} <= $CNVend{$j}{$key1})) ||
(($CNVend{$j}{$key2} >= $CNVstart{$j}{$key1}) && ($CNVend{$j}{$key2} <= $CNVend{$j}{$key1}))
) {print some stuff out}
}
}
}
What I did is:
sort the keys of the hash for each foreach loop
do "next" if the CNVs with $key2 still haven't reached the CNV with $key1 (i.e. start2 and end2 are both smaller than start1)
and probably the most time-saving: end the foreach loop if the CNV with $key2 has overtaken the CNV with $key1 (i.e. start2 and end2 are both larger than end1)
Thanks a lot for your time and feedback guys!
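One caveat about the shortcut above: Perl's sort without a comparator compares keys as strings, so the early "last" only behaves as intended if the key order agrees with the numeric start positions. The Python sketch below shows the same "sort, then stop early" idea with an explicit numeric sort. It is an illustration under assumptions, not a translation of the original program: intervals are taken to be closed (start, end) integer pairs, and the function name is made up.

```python
def overlapping_pairs(intervals):
    """Return all overlapping pairs of closed (start, end) intervals.

    Sorting by start lets the inner loop stop at the first interval that
    begins after the current one ends, since every later one starts later still.
    """
    ordered = sorted(intervals)          # tuples sort numerically by start, then end
    pairs = []
    for i, (start_i, end_i) in enumerate(ordered):
        for start_j, end_j in ordered[i + 1:]:
            if start_j > end_i:
                break                    # no later interval can overlap interval i
            pairs.append(((start_i, end_i), (start_j, end_j)))
    return pairs
```

Starting the inner loop at i + 1 also means each pair is reported once instead of twice. The run time is still quadratic in the worst case (when everything overlaps), but for sparse overlaps most of the inner loop is skipped.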
Your optimisation with taking out the j into the outer loop was good, but the solution is still far from optimal.
Your problem does have a simple O(N+M) solution, where N is the total number of CNVs and M is the number of overlaps (plus the cost of sorting the boundaries, which is O(N log N) with a comparison sort).
The idea is: you walk through the length of DNA while keeping track of all the "current" CNVs. If you see a new CNV start, you add it to the list and you know that it overlaps with all the other CNVs currently in the list. If you see a CNV end, you just remove it from the list.
I am not a very good Perl programmer, so treat the following as pseudo-code (it's more like a mix of Java and C# :)):
// input:
Map<CNV, int> starts;
Map<CNV, int> ends;
// temporary:
List<Tuple<int, bool, CNV>> boundaries;
foreach(CNV cnv in starts)
boundaries.add(starts[cnv], false, cnv);
foreach(CNV cnv in ends)
boundaries.add(ends[cnv], true, cnv);
// Sort first by position,
// then where position is equal we put "starts" first, "ends" last
boundaries = boundaries.OrderBy(t => t.first*2 + (t.second?1:0));
HashSet<CNV> current;
// main loop:
foreach((int position, bool isEnd, CNV cnv) in boundaries)
{
if(isEnd)
current.remove(cnv);
else
{
foreach(CNV otherCnv in current)
OVERLAP(cnv, otherCnv); // output of the algorithm
current.add(cnv);
}
}
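For anyone who wants to run the idea, here is a small Python rendering of the same sweep. It is a sketch under assumptions rather than the original author's code: CNVs are given as a dict from a name to a closed (start, end) pair, and the function name is hypothetical. As in the pseudo-code, a start boundary sorts before an end boundary at the same position, so intervals that merely touch are reported as overlapping.

```python
def sweep_overlaps(intervals):
    """intervals: dict mapping a CNV name to a closed (start, end) pair.

    Returns a list of (earlier, later) name pairs that overlap.
    """
    boundaries = []
    for name, (start, end) in intervals.items():
        boundaries.append((start, 0, name))  # 0 marks a start; sorts before an end
        boundaries.append((end, 1, name))    # 1 marks an end
    boundaries.sort()                        # by position, then starts before ends

    current = set()                          # CNVs that are open at this position
    overlaps = []
    for position, is_end, name in boundaries:
        if is_end:
            current.discard(name)
        else:
            # A newly opened CNV overlaps everything that is still open.
            overlaps.extend((other, name) for other in sorted(current))
            current.add(name)
    return overlaps
```

For the three ranges 3-7, 4-8 and 9-11 this reports a single overlap between the first two. In the chromosome setting from the question, the function would be called once per chromosome.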
Now I'm not a Perl warrior, but based on the information given, this is the same in any programming language: unless you sort the "hash" on the property you want to check and do a binary search, you won't improve lookup performance.
If possible, you could also precompute which entries in your hash have the properties you are interested in, but since you gave no information about such a possibility, this may not be a solution.