Is rule selection in Drools nondeterministic or random? - drools

Consider, in Drools, a set of rules that make up an activation group; all of them have the same salience and all of them are activated. Because they are in an activation group, only one of them can fire.
I want to know whether the Drools engine gives each of these rules approximately the same chance of firing (selection is random), or whether it is merely formally undefined which rule will fire (selection is nondeterministic). In the latter case, the rule that fires would depend on incidental engine state, and in practice, for example, the top rule might almost always fire.

Given n rules with equal salience, all of them activated: the engine does no randomizing when selecting the next rule to fire, i.e., you can't use the engine to pick a winner in a lottery.
There is something like a priority queue, and new activations are entered into it according to some efficient procedure. You could read that code or devise a test to determine exactly how, but you should not base your application on those findings.

Related

Guided Decision Table and Guided Rule have the same ruleflow group. Which rule gets executed first?

I have a Guided Rule file that sets the configuration for some convergence factors. I also have a Guided Decision Table with the same ruleflow group.
Even though I don't set any salience in either file, the Guided Rule executes first and sets the configuration values; the same model is then imported into the Guided Decision Table, where the default values set by the Guided Rule are used.
Is there a specific reason why guided rules execute first and guided decision tables execute later, even though they share the same ruleflow group?
Unless you're using saliences, execution order is non-deterministic. There are no guarantees that Rule A will always go before Rule B. It might always go in this order (A -> B) right now, but that's not guaranteed; tomorrow, or the next time you change the rules or do a version update, it might go in a different order. It usually has to do with the order in which rules are loaded into memory (which is why un-salience'd rules in a single DRL file tend to execute top of file to bottom of file, because that's the order they're read in).
If your rules are such that you require them to be executed in a specific order, you should put saliences on them so that Rule X will always execute before Rule Y because of the saliences on them. Alternatively, you could rewrite your rules to not rely on execution order (this is considered good practice anyway.)
The only ordering guarantee that Drools provides is that rules of a given salience will execute at the same time, though the order of those rules within the salience is not guaranteed. Rules with no salience all default to salience 0, so this guarantee still holds. All of the rules execute, but the order is not guaranteed and cannot necessarily be determined consistently ahead of time (hence "non-deterministic").
So really, at the end of the day, when your question is "which rule gets executed first?" the answer is -- unless you have saliences, it shouldn't matter. And if it does matter, you need to fix your rules.

How do I know which drools rule is running now?

For example, I load a lot of Drools rules to run. How can I tell which rule is currently running, so that I can find and inspect it?
Assuming you're talking about the right-hand side of the rules, you'll want to use an AgendaEventListener. This interface defines a listener that you can create to watch the event lifecycle. For more information about the event model, refer to the Drools documentation.
The easiest way to do this would be to extend either DefaultAgendaEventListener or DebugAgendaEventListener. Both of these classes implement all of the interface methods. The Default listener implements each method as a "no-op", so you can override just the methods you care about. The Debug listener implements each method with a logging statement, logging the toString() of the triggering event to INFO. If you're just learning about the Drools lifecycle, hooking up the various Debug listeners is a great way to watch and learn how rules and events process in rules.
(The cool thing about listeners is that they let you put breakpoints in methods that trigger when different parts of the Drools lifecycle occur, e.g. when a rule match is created. In general I find that listeners are a great debugging tool for exactly that reason.)
Anyway, what you'll want to do is create an event listener and then pay attention to one or more of these specific events:
BeforeMatchFired
AfterMatchFired
MatchCreated
Which events to pay attention to depend on where you think the issue is.
If you think the issue is in the "when" clause (left-hand side, LHS), the MatchCreated event is what is triggered when Drools evaluates the LHS and decides that this rule is valid for firing based on the input data. It is then put on, effectively, a priority queue based on salience. When the rule is the highest priority on the queue, it is picked up for firing -- at this point the BeforeMatchFired event is triggered; note that this is before the "then" clause (right-hand side, RHS) is evaluated. Then Drools will actually do the work on the RHS, and once it finishes, trigger the AfterMatchFired.
Things get a little more complicated when your rules do things like updates/retracts/etc -- you'll start having to consider potential match cancellations when Drools re-evaluates the LHS and decides that a rule is no longer valid to be fired per the facts in working memory. But in general, these are the tools you'll want to start with.
The way I would traditionally identify long-running rules would be to start timing within the BeforeMatchFired and to stop timing in the AfterMatchFired, and then log the resulting rule execution time. Note that you want to be careful here to log the execution of the current rule, tracking it by name; if your rule extends another rule you might find that your execution flow goes BeforeMatchFired(Child) -> BeforeMatchFired(Parent) -> AfterMatchFired(Parent) -> AfterMatchFired(Child), so if you're naively stopping a shared timer you might start having issues. My preferred way of doing this is by tracking timers by rule name in thread local or even a thread-safe map implementation, but you can go whichever route you'd like.
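The timing bookkeeping described above can be sketched in plain Java. The Drools-specific wiring is confined to comments: in a real listener you would extend DefaultAgendaEventListener and call these methods from beforeMatchFired/afterMatchFired with the rule name taken from the event. The class below is only the per-rule-name timer stack, which is what makes the nested BeforeMatchFired(Child) -> BeforeMatchFired(Parent) -> AfterMatchFired(Parent) -> AfterMatchFired(Child) ordering resolve correctly:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Timing bookkeeping for an AgendaEventListener. In Drools you would
// extend org.kie.api.event.rule.DefaultAgendaEventListener and call
// before(...)/after(...) from beforeMatchFired/afterMatchFired, passing
// event.getMatch().getRule().getName().
public class RuleTimer {
    // One stack of start times per rule name, so nested parent/child
    // Before/After pairs (e.g. when a rule extends another) pair up
    // correctly instead of clobbering a single shared timer.
    private final Map<String, Deque<Long>> starts = new ConcurrentHashMap<>();

    public void before(String ruleName) {
        starts.computeIfAbsent(ruleName, k -> new ArrayDeque<>())
              .push(System.nanoTime());
    }

    /** Returns elapsed nanoseconds for the matching before() call. */
    public long after(String ruleName) {
        Deque<Long> stack = starts.get(ruleName);
        if (stack == null || stack.isEmpty()) {
            throw new IllegalStateException("no start recorded for " + ruleName);
        }
        return System.nanoTime() - stack.pop();
    }
}
```

Note that ArrayDeque is not thread-safe; as mentioned above, a production version might track timers in a thread-local instead if a session is touched from multiple threads.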
If you're using a very new version of Drools (7.41+), there is a new library called drools-metric which you can use to identify slow rules. I haven't personally used this library yet because the newest versions of Drools have started introducing non-backwards-compatible changes in minor releases, but this is an option as well.
You can read more about drools-metric in the official documentation here (you'll need to scroll down a bit.) There's some tuning you'll need to do because the module only logs instances where the thresholds are exceeded. The docs that I've linked to include the Maven dependency you'll need to import, along with information about configuration, and some examples of the output and how to understand what it's telling you.

Is rule engine suitable for validating data against set of rules?

I am trying to design an application that allows users to create subscriptions based on different configurations - expressing their interest to receive alerts when those conditions are met.
While evaluating options for achieving this, I considered using a generic rule engine such as Drools, which seemed like a natural fit at a high level. But digging deeper and giving it more thought, I doubt a business rule engine is the right thing to use.
I see a rule engine as something that selects a Rule based on predefined conditions and applies that Rule to the data to produce an outcome. My requirement is the reverse: start with the data (the generated event) and, based on the Rules (subscriptions) users have configured, identify all the Rules (subscriptions) that the event satisfies, so that alerts can be generated for all those subscribers.
To give an example, a hypothetical subscription from a user could be: alert me when a product on Amazon drops below $10 in the next 7 days. Another user might create a subscription to be notified when a product on Amazon drops below $15 within the next 30 days and also offers free one-day shipping for Prime members.
After a bit of thought, I have settled on storing the Rules/Subscriptions in a relational DB and identifying which Subscriptions should fire an Alert for an Event by querying the DB.
My main reason for choosing this approach is volume: the number of Rules/Subscriptions I begin with will be about 1000 complex rules, and it will grow exponentially as more users are added to the system. With the query approach I can run a single query that validates all Rules in one go, versus the rule engine approach, which would require multiple evaluations depending on the number of Rules configured.
While I know my DB approach would work (it may not be efficient), I want to understand whether a rule engine can be used for such purposes and scale well as the number of rules increases. (Performance is of utmost importance, as about 1000+ events will need to be processed per minute.)
If rule engine is not the right way to approach it, what other options are there for me to explore rather than writing my own implementation.
You are getting it wrong. A standard rule engine selects which rules to execute based on the data. The rules' constraints are evaluated against the data you insert into the engine; if all the constraints of a rule match the data, the rule is executed. I would suggest you try Drools.

Drools rule firing

I have an event-driven architecture with about 1000 event types, and each event type can have multiple listeners, averaging around 2 per event, giving 2000 handlers.
For each event handler I have a rule to be evaluated, to decide whether handling that event is required or not.
handle(MyEvent xxx) {
    ksession.execute(xxx.getPayload());
    // Here I want only the rules named/identified against my Event to be fired
}
I could add MyEvent as part of the LHS of the specific rule.
But I want the matching to be preprocessed, to save processing time after the event is fired.
Is there a better way to fire only a specific rule, rather than letting the underlying engine evaluate all 2000 rules to figure out which ones apply to the Payload fact?
I can identify the rules for specific event handlers at design time, and I want to exploit that for better performance.
If you select which rule to fire from outside the rules engine, then there is absolutely no point in using a rules engine!
Evaluating which rules should activate is what Drools is designed to do. Fast. Drools does not need to evaluate 2000 rules every time you call fireAllRules, just because you have 2000 rules. When you create a knowledge base, the rules are compiled into a graph which lets the engine determine which rules might fire for certain matches. The graph is updated every time a fact is inserted, modified or retracted. It's a bit like having an indexed database table.
Of course, you can technically do this. Use the fireAllRules(AgendaFilter) method to filter the rules which may fire.
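The selection logic an AgendaFilter performs can be sketched in plain Java. The stream filter below stands in for org.kie.api.runtime.rule.AgendaFilter, whose single accept(Match) method you would implement (or pass as a lambda to fireAllRules). The "rule name starts with the event type" naming convention is an assumption for illustration, not anything Drools imposes:

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Stand-in for an AgendaFilter: given the names of matched rules,
// accept only those tied to the current event type by a (hypothetical)
// "EventType: description" naming convention. With Drools, the same
// predicate would be expressed as
//   ksession.fireAllRules(match -> match.getRule().getName().startsWith(eventType));
public class EventRuleFilter {
    public static List<String> accepted(List<String> matchedRuleNames, String eventType) {
        Predicate<String> accept = name -> name.startsWith(eventType + ":");
        return matchedRuleNames.stream().filter(accept).collect(Collectors.toList());
    }
}
```

Keep in mind the caveat above: rules that don't pass the filter are still matched by the engine; the filter only prevents them from firing.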

Expert/Rule Engine that updates facts atomically?

Atomically might not be the right word. When modelling cellular automata or neural networks, you usually keep two copies of the system state: the current state, and the next-step state that you are updating. This keeps the state of the system as a whole consistent while all of the rules that determine the next step run. For example, if you run the rules for one cell/neuron to determine its state for the next step, and you then run the rules for the next cell, its neighbor, you want the input for those rules to be the neighbor's current state, not its updated state.
This may seem inefficient, since each step requires copying all of the current states to the next-step states before updating them, but it is important for accurately simulating the system as if all cells/neurons were actually processed simultaneously, with all rule/firing-function inputs taken from the current states.
Something that has bothered me when designing rules for expert systems is how one rule can run and update facts that should trigger other rules. You might have 100 rules queued up to run in response, with salience used as a fragile way to ensure the really important ones run first. As these rules run, the system keeps changing. The state of the facts is constantly changing, so by the time the 100th rule is processed, the system state has changed significantly since the rule was queued in response to the first fact change. It might have changed so drastically that the rule never gets a chance to react to the original state of the system when it really should have. The usual workaround is to carefully adjust its salience, but that moves other rules down the list and you run into a chicken-or-egg problem. Other workarounds involve adding "processing flag" facts that act as a locking mechanism, suppressing certain rules until other rules have processed. These all feel like hacks, and they force rules to include criteria beyond the core domain model.
If you built a really sophisticated system that modeled a problem accurately, you would want changes to the facts to be staged in a separate "updates" queue that doesn't affect the current facts until the rules queue is empty. So let's say you make a fact change that fills the queue with 100 rules to run. All of those rules would run, but none of them would update facts in the current fact list; any change they make gets queued to a change list, which ensures no other rules get activated while the current batch is processing. Once all rules are processed, the fact changes are applied to the current fact list, all at once, and that triggers more rules to be activated. Rinse, repeat. It becomes much like how neural networks or cellular automata are processed: run all rules against an unchanging current state, queue the changes, and apply them to the current state after all rules have run.
Is this mode of operation a concept that exist in the academic world of expert systems? I'm wondering if there is a term for it.
Does Drools have the capability to run in a way that allows all rules to run without affecting the current facts, and queue fact changes separately until all rules have run? If so, how? I don't expect you to write the code for me, but just some keywords of what it's called or keywords in the API, some starting point to help me search.
Do any other expert/rule engines have this capability?
Note that in such a case, the order rules run in no longer matters, because all of the rules queued to run will all be seeing only the current state. Thus as the queue of rules is run and cleared, none of the rules see any of the changes the other rules are making, because they are all being run against the current set of facts. Thus the order becomes irrelevant and the complexities of managing rule execution order go away. All fact changes are pending and not applied to the current state until all rules have been cleared from the queue. Then all of those changes are applied at once, and thus cause relevant rules to queue again. So my goal is not to have more control over the order that rules run in, but to avoid the issue of rule execution order entirely by using an engine that simulates simultaneous rule execution.
If I understand what you describe:
You have one fact that is managed by many rules
Each rule should apply to the initial value of your fact and must not modify the fact value (so as not to affect other rules' executions)
You then batch all the updates the rules make to your fact
Other rules then apply to this new fact value in the same 'simultaneous' manner
It seems to me that this is the Unit of Work design pattern, just as Hibernate (and many ORMs, in fact) implements it: http://www.codeproject.com/Articles/581487/Unit-of-Work-Design-Pattern
Basically you store all the changes in memory (in a 'technical' fact, for instance), and then, once all the rules based on the initial value have fired, execute a 'transaction' that updates the fact value, and so on. Hibernate does this with its session: you modify your attached object, and the update query runs against the database only when required; not every modification of the Java object produces a query against your database.
Still, you will have trouble if updates conflict (the same fact field modified with different values: which one wins? It is like a source-control merge conflict). You will have to define a deterministic way to order updates, but it is defined only once, is available to all rules, and works seamlessly for other changes.
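The staged-update idea described above can be sketched in plain Java (this is not a Drools feature; the class and method names are illustrative): rules read only the current state, writes are staged, and staged writes are applied in one batch once the rule queue has drained, which would then trigger the next round of activations:

```java
import java.util.HashMap;
import java.util.Map;

// Two-phase fact state: reads always see the "current" snapshot,
// writes are staged, and commit() applies the whole batch at once.
public class TwoPhaseState {
    private final Map<String, Object> current = new HashMap<>();
    private final Map<String, Object> staged = new HashMap<>();

    // Rules read from the unchanging current snapshot.
    public Object get(String key) { return current.get(key); }

    // Rules write here instead of updating working memory directly.
    public void stage(String key, Object value) { staged.put(key, value); }

    /** Apply all staged changes in one batch; true if anything changed. */
    public boolean commit() {
        if (staged.isEmpty()) return false;   // no new "step" to trigger
        current.putAll(staged);               // all updates land together
        staged.clear();
        return true;
    }
}
```

In the workaround the answer describes, each commit() would correspond to one "step": re-fire the rules after a commit that reports a change, and stop once a full pass stages nothing.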
This workaround may or may not work, given your rather vague description. If you are really concerned about rules triggering further activations, why not queue the intermediate state yourself, and once the current evaluation is complete, insert those new facts into working memory?
You would have to invoke fireAllRules() after inserting each fact, though, which could be quite expensive. Then, in the rules, rather than inserting facts directly, push them into a queue. Once the above call returns, walk through the queue doing the same (or do it after inserting the original facts completely...).
I would imagine this will be quite slow. To speed it up, you could have multiple parallel working memories with the same rules and evaluate multiple facts in one go into several queues, etc. But things get pretty hairy...
Anyway, just an idea that's too long for the comments...