Drools global variable initialization and scaling for performance

Thanks in advance. We are trying to adopt drools as rules engine in our enterprise. After evaluating basic functionality in POC mode, we are exploring further. We have the following challenges and I am trying to validate some of the options we are considering. Any help is greatly appreciated.
Scenario-1: Say a fact has a field containing a US state code (TX, CA, CO, etc.). We want the rule to check whether the state value on the fact exists in a predetermined static list of state values (say the list contains three values: TX, TN, MN).
Possible solution to Scenario-1: the static list of state values can be set as a global variable, and the rule can access the global while performing the check.
Questions on Scenario-1:
Is the 'possible solution to Scenario-1' the standard practice? If so, is it possible to load the value of this global variable from a database during rule engine (KIE Server) startup? If yes, could you let me know which Drools feature enables us to load global variables from a database? Or should a client application (one that calls kie-server) initialize the global variables instead?
Scenario-2: We want to horizontally scale the rule execution server. Say we have one rule engine server (kie-server) exposing a REST API. Can we have multiple instances running behind a load balancer to scale it horizontally? Is there any other way of achieving scalability?

Q1: It depends. The usual solution for a small, rarely (if ever) changing set that is used in just a single rule is to put it into the rule, using the 'in' operator. If you think you might have to change it or use it frequently, a global would be one way of achieving that, but you must make sure that the global is initialized before any facts are inserted.
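For comparison, the inline form in DRL looks something like Address( state in ("TX", "TN", "MN") ). The global form declares a global collection and tests the fact's field against it, for example with memberOf. The plain-Java sketch below shows only the ordering constraint from the answer: set the global first, then evaluate facts. StateCheck and its methods are hypothetical. The Supplier is a placeholder for whatever database lookup your application does, because Drools itself does not provide one.

```java
import java.util.Objects;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical stand-in for a session with one global, allowedStates.
// Not the Drools API: it only models "set the global before inserting facts".
final class StateCheck {
    private Set<String> allowedStates; // plays the role of the DRL global

    // Load the global once, e.g. at startup. The Supplier stands in for an
    // application-specific DB lookup (Drools has nothing built in for this).
    void setAllowedStates(Supplier<Set<String>> loader) {
        this.allowedStates = Set.copyOf(Objects.requireNonNull(loader.get()));
    }

    // Models a rule constraint that tests the fact's state against the global.
    boolean matches(String factState) {
        if (allowedStates == null) {
            throw new IllegalStateException("allowedStates must be set before facts are inserted");
        }
        return allowedStates.contains(factState);
    }
}
```

Loading the set once and copying it into an immutable Set keeps every rule evaluation a cheap in-memory lookup. The database is not queried per fact.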
There is nothing out-of-the-box for accessing a DB.
Q2: A server running a Drools session is just another Java server program, so any load balancing applicable to this class of programs should apply to a Drools app as well. What are you worried about?
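As a generic illustration of "any load balancing applies", here is a minimal round-robin picker over identical rule-server instances. The instance names are hypothetical. The sketch assumes each request is self-contained, as with a stateless session per call. If a stateful session lives on one server across requests, those requests would need sticky routing to the same instance.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal round-robin selector over interchangeable rule-server instances.
// Assumes requests carry all their facts (no server-side session state).
final class RoundRobin {
    private final List<String> instances;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobin(List<String> instances) {
        if (instances.isEmpty()) throw new IllegalArgumentException("no instances");
        this.instances = List.copyOf(instances);
    }

    String pick() {
        // floorMod keeps the index non-negative even after the counter overflows
        int i = Math.floorMod(next.getAndIncrement(), instances.size());
        return instances.get(i);
    }
}
```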

Related

Stateful session memory management in Drools

I have a general question regarding memory management when using "Stateful sessions" in Drools. For context, I'm specifically looking to use a ksession in "Stream" mode, together with fireUntilHalt() to process an infinite stream of events. Each event is timestamped, however I'm mainly writing rules using length-based windows notation (i.e. window:length()) and the from accumulate syntax for decision making.
The docs are a little vague, though, about how memory management works in this case. They suggest that, when temporal operators are used, the engine can automatically remove any facts/events that can no longer match. However, does this also apply to rules that only use window:length()? Or would my system need to manually delete events that are no longer applicable, to avoid running out of memory?
window:time() calculates expiration, so works for automatic removal. However, window:length() doesn't calculate expiration, so events would be retained.
You can confirm the behaviour with my example:
https://github.com/tkobayas/kiegroup-examples/tree/master/Ex-cep-window-length-8.32
For reference, the relevant source code:
https://github.com/kiegroup/drools/blob/8.32.0.Final/drools-core/src/main/java/org/drools/core/rule/SlidingLengthWindow.java#L145
You would need to explicitly delete them, or set @expires (if your application can express expiration as a time), to avoid an OOME.
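The toy model below is plain Java and does not reproduce Drools internals. It shows the behaviour described above: a length window only limits what a rule sees, while the events themselves stay in working memory until something deletes them. Names such as LengthWindowModel and deleteEventsOutsideWindow are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model: a length window changes what an accumulate sees,
// but working memory keeps every event until it is deleted explicitly.
final class LengthWindowModel {
    private final int windowLength;
    private final List<Long> workingMemory = new ArrayList<>(); // every inserted event
    private final Deque<Long> window = new ArrayDeque<>();      // only the last N events

    LengthWindowModel(int windowLength) { this.windowLength = windowLength; }

    void insert(long event) {
        workingMemory.add(event);
        window.addLast(event);
        // The oldest event leaves the window here but remains in workingMemory.
        if (window.size() > windowLength) window.removeFirst();
    }

    // What an accumulate over the last N events would compute.
    long windowSum() {
        long sum = 0;
        for (long e : window) sum += e;
        return sum;
    }

    // The explicit clean-up the answer recommends: keep only events still in the window.
    void deleteEventsOutsideWindow() {
        workingMemory.clear();
        workingMemory.addAll(window);
    }

    int memorySize() { return workingMemory.size(); }
}
```

With a window of 3 and ten inserted events, the window sum covers only 8, 9 and 10, but memory still holds all ten events until the explicit delete runs.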
Thank you for pointing out that the documentation is not clear about this. I have filed a documentation JIRA to explain it.
https://issues.redhat.com/browse/DROOLS-7282

When to use multiple KieBases vs multiple KieSessions?

I know that one can utilize multiple KieBases and multiple KieSessions, but I don't understand under what scenarios one would use one approach vs the other (I am having some trouble in general understanding the definitions and relationships between KieContainer, KieBase, KieModule, and KieSession). Can someone clarify this?
You use multiple KieBases when you have multiple sets of rules doing different things.
KieSessions are the actual session for rule execution -- that is, they hold your data and some metadata and are what actually executes the rules.
Let's say I have an application for a school. One part of my application monitors students' attendance. The other part of my application tracks their grades. I have a set of rules which decides if students are truant and we need to talk to their parents. I have a completely unrelated set of rules which determines whether a student is having trouble academically and needs to be put on probation/a performance plan.
These rules have nothing to do with one another. They have completely separate concerns, different rule inputs, and are triggered in different parts of the application. The part of the application that is tracking attendance doesn't need to trigger the rules that monitor student performance.
For this application, I would have two different KieBases: one for attendance, and one for academics. When I need to fire the rules, I fire one or the other -- there is no use case for firing both at the same time.
The KieSession is the runtime for when we fire those rules. We add to it the data we need to trigger the rules, and it also tracks some other metadata that's really not relevant to this discussion. When firing the academics rules, I would be adding to it the student's grades, their classes, and maybe some information about the student (e.g. the grade level, whether they're an "honors" student, etc.). For the attendance rules, we would need the student information, plus historical tardiness/absence records. Those distinct pieces of data get added to the sessions.
When we decide to fire rules, we first get the appropriate KieBase -- academics or attendance. Then we get a session for that rule set, populate the data, and fire it. We technically "execute" the session, not the rules (and definitely not the rule base.) The rule base is just the collection of the rules; the session is how we actually execute it.
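To make the split concrete, here is a toy model in plain Java. It is not the KIE API and makes a single pass with no re-evaluation, so it is not how Drools actually matches rules. The rule base is an immutable collection of rules, and a session created from it holds the facts and does the firing. RuleBase, Session and Rule are hypothetical names chosen to mirror the answer's terminology.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// A rule is just a condition plus an action over one fact type.
record Rule<T>(String name, Predicate<T> when, Consumer<T> then) {}

// The "rule base": an immutable collection of rules; it holds no data.
final class RuleBase<T> {
    private final List<Rule<T>> rules;
    RuleBase(List<Rule<T>> rules) { this.rules = List.copyOf(rules); }
    Session<T> newSession() { return new Session<>(rules); }
}

// The "session": holds the facts for one execution and fires the rules.
final class Session<T> {
    private final List<Rule<T>> rules;
    private final List<T> facts = new ArrayList<>();
    Session(List<Rule<T>> rules) { this.rules = rules; }

    void insert(T fact) { facts.add(fact); }

    // Single pass: every rule is tested once against every fact.
    int fireAllRules() {
        int fired = 0;
        for (T fact : facts)
            for (Rule<T> r : rules)
                if (r.when().test(fact)) { r.then().accept(fact); fired++; }
        return fired;
    }
}
```

In the school example you would build one RuleBase for attendance and another for academics, then open a fresh Session from whichever one the current part of the application needs.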
There are two kinds of sessions -- stateful and stateless. As their names imply, they differ with how data is stored and tracked. In most cases, people use stateful sessions because they want their rules to do iterative work on the inputs. You can read more about the specific differences in the documentation.
For low-volume applications, there's generally little need to reuse your KieSessions. Create, use, and dispose of them as needed. There is, however, some inherent overhead in this process, so there comes a point at which reuse does become something you should consider. The documentation discusses the solution Drools provides out of the box, which is session pooling.
(When trying to wrap your head around this, I like to use an analogy of databases. A session is like a JDBC connection: for small applications you can create them, use them, then close them as you need them. But as you scale you'll quickly find that you need to look into connection pooling to minimize this overhead. In this particular analogy, the rule base would be the database that the rules are executing against -- not the tables!)
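Following the connection-pool analogy, here is a generic object pool in plain Java. It is not Drools' own pooling, which is described in the Drools documentation. A session is borrowed, reset before reuse, and handed back, so creation cost is paid only when no idle session exists. SessionPool and its reset hook are hypothetical. For a real session, the reset would need to clear facts and any other per-request state.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Minimal bounded pool, analogous to a JDBC connection pool.
final class SessionPool<S> {
    private final BlockingQueue<S> idle;
    private final Supplier<S> factory;
    private final Consumer<S> reset; // clears per-request state before reuse

    SessionPool(int capacity, Supplier<S> factory, Consumer<S> reset) {
        this.idle = new ArrayBlockingQueue<>(capacity);
        this.factory = factory;
        this.reset = reset;
    }

    S borrow() {
        S s = idle.poll();                    // reuse an idle session if available
        return s != null ? s : factory.get(); // otherwise pay the creation cost
    }

    void giveBack(S s) {
        reset.accept(s);
        idle.offer(s); // dropped (left for GC) if the pool is already full
    }
}
```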

enabling / disabling of rules

I am trying to come up with a mechanism in ODM to enable or disable a rule based on some input parameters like sales-zone, type of product and 6 or 7 other parameters. I don't want to put all these 7 parameters into the condition within the rule since that would reduce the reusability of the rules.
Are there any features available in ODM that can be used for this? Are there any techniques widely used in the BRMS community for such problems?
You can probably make use of rule selection using IRL at the rule task level. Write a function that determines whether the rule is effective or not for the given input parameters. We are using this strategy. See the screenshot below.
I hope this may help you out. Happy Rule Development. :)
You could extend the extension metadata model and add a property that can be set on the rule to indicate the sales zone associated with it. Then, on the rule task in the rule flow, use a dynamic select to include or exclude rules that have the property set.
However, be aware that with many rules, dynamic selects can potentially cause performance issues.
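The metadata approach can be pictured with the plain-Java sketch below. It is not ODM's IRL or API. Each rule carries a map of properties such as a sales zone, and a selector keeps only the rules whose declared properties match the current input. RuleMeta, RuleSelector and the property names are hypothetical. The selector tests every rule on every call, which is consistent with the warning that dynamic selects over many rules can get slow.

```java
import java.util.List;
import java.util.Map;

// A rule name plus the metadata properties it declares (e.g. salesZone, productType).
record RuleMeta(String name, Map<String, String> properties) {}

final class RuleSelector {
    // Selects a rule when every property it declares matches the input.
    // A rule that declares no value for a property applies to any value of it.
    static List<String> select(List<RuleMeta> rules, Map<String, String> input) {
        return rules.stream()
                .filter(r -> r.properties().entrySet().stream()
                        .allMatch(e -> e.getValue().equals(input.get(e.getKey()))))
                .map(RuleMeta::name)
                .toList();
    }
}
```

This keeps the 7 selection parameters out of the rule conditions themselves, which was the reusability concern in the question.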

Mapping to legacy MongoDB store

I'm attempting to write up a Yesod app as a replacement for a Ruby JSON service that uses MongoDB on the backend and I'm running into some snags.
1) The sql=foobar syntax in the models file does not seem to affect which collection Persistent.MongoDB uses. How can I change that?
2) Is there a way to easily configure MongoDB (preferably through the YAML file) to be explicitly read-only? I'd feel more comfortable deploying this knowing that there was no possible way the app could overwrite or damage production data.
3) Is there any way I can get Persistent.MongoDB to ignore fields it doesn't know about? This service only needs a fraction of the fields in the collection in question. To keep the code as simple as possible, I'd really like to map just the fields I care about and have Yesod ignore everything else. Instead, it complains that the fields don't match.
4) How does one go about defining instances for models, such as ToJSON? I'd like to customize how that JSON gets rendered, but I get the following error:
Handler/ProductStat.hs:8:10:
Illegal instance declaration for `ToJSON Product'
    (All instance types must be of the form (T t1 ... tn)
     where T is not a synonym.
     Use -XTypeSynonymInstances if you want to disable this.)
In the instance declaration for `ToJSON Product'
1) It seems that sql= is not hooked up for MongoDB. Since the SQL backend already does this, it shouldn't be difficult to add for MongoDB.
2) You can change the function that runs the queries: in persistent/persistent-mongoDB/Database/Persist there is a runPool function of PersistConfig, which is used in yesod-defaults. We should probably change the loadConfig function to check a readOnly setting.
3) I am OK with changing the reorder function to allow for ignoring fields, although in the future (if MongoDB returns everything in order) that may have performance implications, so ideally you would list the ignored columns.
4) This shouldn't require changes to Persistent. Did you try turning on TypeSynonymInstances ?
I have several other Yesod/Persistent priorities to attend to before these changes, so please roll up your sleeves and let me know what help you need making them. I can change 2 and 3 myself fairly soon if you are committed to testing them.

Rules Based Database Engine

I would like to design a rules based database engine within Oracle for PeopleSoft Time entry application. How do I do this?
A rules-based system needs several key components:
- A set of rules defined as data
- A set of uniform inputs on which to operate
- A rules executor
- Supervisor hierarchy
1. Write out a series of use-cases: what might someone be trying to accomplish using the system?
2. Decide what things your rules can take as inputs, and what as outputs.
3. Describe the rules from your use-cases as a series of data, and thus determine your rule format. Expand step 2 as necessary for this.
4. Create the basic rule executor, and test that it takes the rule data and processes it correctly.
5. Extend the above to deal with multiple rules with different priorities.
6. Learn enough rule engine theory and graph theory to understand common rule-based problems (circularity, conflicting rules, etc.) and how to use (node) graphs to find cases of them.
7. Write a supervisor hierarchy that is capable of managing the ruleset and taking decisions based on the possible problems above. This part is important, because it is your protection against foolishness on the part of the rule creators causing runtime failure of the entire system.
8. Profit!
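A compressed sketch of the early steps, in plain Java and under simplifying assumptions: rules are stored as data ("if field equals value then set target to result"), an executor applies them in priority order until nothing changes, and a pass limit stands in for the supervisor by stopping a rule set that never settles. The field names (dayType, payCode, approval) are hypothetical examples for a time-entry domain. In Oracle this logic would live in whatever layer you choose. The sketch does not do that integration.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A rule expressed purely as data: if facts[field] == value then facts[target] = result.
record DataRule(String name, int priority, String field, String value,
                String target, String result) {}

final class RuleExecutor {
    // Applies rules in descending priority order, pass after pass, until no rule changes
    // anything. maxPasses is a crude supervisor: it aborts rule sets that never settle.
    static Map<String, String> run(List<DataRule> rules, Map<String, String> input, int maxPasses) {
        List<DataRule> ordered = rules.stream()
                .sorted(Comparator.comparingInt(DataRule::priority).reversed())
                .toList();
        Map<String, String> facts = new HashMap<>(input);
        for (int pass = 0; pass < maxPasses; pass++) {
            boolean changed = false;
            for (DataRule r : ordered) {
                if (r.value().equals(facts.get(r.field()))
                        && !r.result().equals(facts.get(r.target()))) {
                    facts.put(r.target(), r.result());
                    changed = true;
                }
            }
            if (!changed) return facts; // fixed point reached
        }
        throw new IllegalStateException("no fixed point after " + maxPasses + " passes");
    }
}
```

A holiday entry first gets a holiday pay code and then, on the same pass, a supervisor-approval flag. A pair of rules that keep undoing each other trips the pass limit instead of running forever.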
Broadly, rules engines are an exercise in managing complexity. If you don't manage it, you can easily end up with rules that cascade from each other, causing circular loops, race conditions and other issues. It's very easy to construct these accidentally: consider an email program which you have told to move mail from folder A to B if it contains the magic word 'beta', and from B to A if it contains the word 'alpha'. An email containing both would be shuttled back and forth until something broke, preventing all other rules from being processed.
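The alpha/beta loop can also be caught before anything runs by treating the rules as a graph, which is the graph-theory check the steps allude to. In the simplified plain-Java sketch below, each rule is reduced to the condition that triggers it and the effect it produces. An edge X to Y means X's effect can trigger Y, and a depth-first search looks for a cycle. RuleEdgeSpec, CycleCheck and the condition strings are hypothetical, and rule names are assumed to be unique.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Each rule reduced to what triggers it and what it produces (plain strings here).
record RuleEdgeSpec(String name, String triggeredBy, String produces) {}

final class CycleCheck {
    static boolean hasCycle(List<RuleEdgeSpec> rules) {
        // Build the graph: X -> Y when X produces exactly what Y is triggered by.
        Map<String, List<String>> graph = new HashMap<>();
        for (RuleEdgeSpec x : rules) {
            List<String> next = new ArrayList<>();
            for (RuleEdgeSpec y : rules)
                if (x.produces().equals(y.triggeredBy())) next.add(y.name());
            graph.put(x.name(), next);
        }
        Map<String, Integer> state = new HashMap<>(); // absent = unvisited, 1 = on path, 2 = done
        for (String n : graph.keySet())
            if (dfs(n, graph, state)) return true;
        return false;
    }

    private static boolean dfs(String n, Map<String, List<String>> g, Map<String, Integer> state) {
        int s = state.getOrDefault(n, 0);
        if (s == 1) return true;   // back edge: n is already on the current path
        if (s == 2) return false;  // explored earlier without finding a cycle
        state.put(n, 1);
        for (String m : g.get(n))
            if (dfs(m, g, state)) return true;
        state.put(n, 2);
        return false;
    }
}
```

For the email example, the 'beta' rule (in folder A, produces in folder B) and the 'alpha' rule (in folder B, produces in folder A) form a two-node cycle. A supervisor could reject that rule set at load time.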
I have assumed here that you want to learn about the theory and build the engine yourself. alphazero raises the important suggestion of using an existing rules engine library, which is wise - this is the kind of subject that benefits from academic theory.
I haven't tried this myself, but an obvious approach is to use Java procedures in the Oracle database, and use a Java rules engine library in that code.
Try the Oracle material on Java in the database:
http://www.oracle.com/technology/tech/java/jsp/index.html
http://www.oracle.com/technology/tech/java/java_db/pdf/TWP_AppDev_Java_DB_Reduce_your_Costs_and%20_Extend_your_Database_10gR1_1113.PDF
together with a Java rules engine such as Drools or Jess:
http://www.jboss.org/drools/
http://www.jessrules.com/
--
Basically, you'll need to capture data events (inserts, updates, deletes), map them to your rule space's events, and apply rules.
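A minimal plain-Java sketch of that pipeline follows. A captured row change becomes an event object, and each rule decides whether it applies to that event. How the changes are captured is left out. In Oracle that might be triggers or Java stored procedures. The TIME_ENTRY table, the HOURS column and the rule name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// A captured row-level change, already mapped into the rule space.
enum ChangeType { INSERT, UPDATE, DELETE }

record DataEvent(ChangeType type, String table, Map<String, String> row) {}

// A rule over data events: a name and a condition.
record EventRule(String name, Predicate<DataEvent> when) {}

final class EventDispatcher {
    private final List<EventRule> rules;
    EventDispatcher(List<EventRule> rules) { this.rules = List.copyOf(rules); }

    // Applies every rule to the event and returns the names of those that matched.
    List<String> apply(DataEvent e) {
        List<String> fired = new ArrayList<>();
        for (EventRule r : rules) if (r.when().test(e)) fired.add(r.name());
        return fired;
    }
}
```

For example, an overtimeCheck rule could match INSERT events on TIME_ENTRY whose HOURS value exceeds 8, while a DELETE of the same row would not trigger it.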