How to start working with a large decision table - rule-engine

Today I've been presented with a fun challenge and I want your input on how you would deal with this situation.
So the problem is the following (I've converted it to demo data as the real problem wouldn't make much sense without knowing the company dictionary by heart).
We have a decision table that has a minimum of 16 conditions. Because it is an impossible feat to manage all of them (2^16 possibilities) we've decided to only list the exceptions. Like this:
As an example I've only added 10 conditions but in reality there are (for now) 16. The basic idea is that we have one baseline (the default) which is valid for everyone and all the exceptions to this default.
Example:
You have a foreigner who is also a pirate.
If you go through all the exceptions one by one, and condition by condition you remove the exceptions that have at least one condition that fails. In the end you'll end up with the following two exceptions that are valid for our case. The match is on the IsPirate and the IsForeigner condition. But as you can see there are 2 results here, well 3 actually if you count the default.
Our solution
Now what we came up with on how to solve this is that in the GUI where you are adding these exceptions, there should run an algorithm which checks for such cases and force you to define the exception more specifically. This is only still a theory and hasn't been tested out but we think it could work this way.
My Question
I'm looking for alternative solutions that make the rules manageable and prevent the problem I've shown in the example.

Your problem seem to be resolution of conflicting rules. When multiple rules match your input, (your foreigner and pirate) and they end up recommending different things (your cangetjob and cangetevicted), you need a strategy for resolution of this conflict.
What you mentioned is one way of resolution -- which is to remove the conflict in the first place. However, this may not always be possible, and not always desirable because when a user adds a new rule that conflicts with a set of old rules (which he/she did not write), the user may not know how to revise it to remove the conflict.
Another possible resolution method is prioritization. Mark a priority on each rule (based on things like the user's own authority etc.), sort the matching rules according to priority, and apply in ascending sequence of priority. This usually works and is much simpler to manage (e.g. everybody knows that the top boss's rules are final!)
Prioritization may also be used to mark a certain rule as "global override". In your example, you may want to make "IsPirate" as an override rule -- which means that it overrides settings for normal people. In other words, once you're a pirate, you're treated differently. This make it very easy to design a system in which you have a bunch of normal business rules governing 90% of the cases, then a set of "exceptions" that are treated differently, automatically overriding certain things. In this case, you should also consider making "?" available in the output columns as well.
One other possible resolution method is to include attributes in each of your conditions. For example, certain conditions must have no "zeros" in order to pass (? doesn't matter). Some conditions must have at least one "one" in order to pass. In other words, mark each condition as either "AND", "OR", or "XOR". Some popular file-system security uses this model. For example, CanGetJob may be AND (you want to be stringent on rights-to-work). CanBeEvicted may be OR -- you may want to evict even a foreigner if he is also a pirate.
An enhancement on the AND/OR method is to provide a threshold that the total result must exceed before passing that condition. For example, putting CanGetJob at a threshold of 2 then it must get at least two 1's in order to return 1. This is sometimes useful on conditions that are not clearly black-and-white.
You can mix resolution methods: e.g. first prioritize, then use AND/OR to resolve rules with similar priorities.
The possibilities are limitless and really depends on what your actual needs are.

To me this problem reminds business rules engine where there is no known algorithm to define outputs from inputs (e.g. using boolean logic) but the user (typically some sort of administrator) has to define all or some the logic itself.
This might sound a bit of an overkill but OTOH this provides virtually limit-less extension capabilities: you don't have to code any new business logic, just define a new rule set.
As I understand your problem, you are looking for a nice way to visualise the editing for these rules. But this all depends on your programming language and the tool you select for this. Java, for example, has JBoss Drools. Quoting their page:
Drools Guvnor provides a (logically
centralized) repository to store you
business knowledge, and a web-based
environment that allows business users
to view and (within certain
constraints) possibly update the
business logic directly.
You could possibly use this generic tool or write your own.

Everything depends on what your actual rules will look like. Rules like 'IF has an even number of these properties THEN' would be painful to represent in this format, whereas rules like 'IF pirate and not geek THEN' are easy.
You can 'avoid the ambiguity' by stating that you'll always be taking the first actual match, in other words your rules have a priority. You'd then want to flag rules which have no effect because they are 'shadowed' by rules higher up. They're not hard to find, so it's something your program should do.
Your interface could also indicate groups of rules where rules within the group can be in any order without changing the outcomes. This will add clarity to what the rules are really saying.
If some of your outputs are relatively independent of the others, you will also get a more compact and much clearer table by allowing question marks in the output. In that design the scan for first matching rule is done once for each output. Consider for example if 'HasChildren' is the only factor relevant to 'Can Be Evicted'. With question marks in the outputs (= no effect) you could be halving the number of exception rules.
My background for this is circuit logic design, not business logic. What you're designing is similar to, but not the same as, a PLA. As long as your actual rules are close to sum of products then it can work well. If your rules aren't, for example the 'even number of these properties' rule, then the grid like presentation will break down in a combinatorial explosion of cases. Your best hope if your rules are arbitrary is to get a clearer more compact presentation with either equations or with diagrams like a circuit diagram. To be avoided, if you can.

If you are looking for a Decision Engine with a GUI, than you can try this one: http://gandalf.nebo15.com/
We just released it, it's open source and production ready.

You probably need some kind of inference engine. Think about doing it in prolog.

Related

Drools conditions in the consequence: Benefits or Costs?

I am new to drools with a background in java. I have gained a basic undertanding of drools.
I have inherited a large drools project which works, but appears to be hacked together. Most of the rules have many and nested IF and ELSE statements in the "then" (consequence?). I believe this is bad practice. Can anyone confirm, references to materials on the internet would be useful.
Also what are the benefits in correcting this other than readability?
I'd have to dig for references but the general consent among rule programmers is that decisions should be made on the LHS/condition part of a rule. The simple reason is that the Engine is dedicated to the process of "many pattern/many object" pattern match problem. Even if there is one final condition where some action is necessary for both, true and false, Drools syntax provides a good solution, i.e., extending a rule twice, once with the positive and once with the negative condition.
That said, an occasional conditional statement may be tolerated or even fine, e.g., when it merely distinguishes between details in the works of the RHS/consequence part of a rule. But "many and nested" sounds rather bad - but maybe this "distinction of the details" does require such logic.
As for the benefits: nobody can tell without inspection, and then it'll need an experienced judge.
Since you've asked for references: http://www.redhat.com/rhecm/rest-rhecm/jcr/repository/collaboration/sites%20content/live/redhat/web-cabinet/home/resourcelibrary/whitepapers/brms-design-patterns/rh:pdfFile.pdf
One non-readability reasoning for putting as much evaluation in the LHS as possible is performance. For one thing it avoids unnecessary rule activations, but it also makes significant performance gains through caching the results of each of the matches, thereby avoiding re-evaluation.
This caching is not available to conditional logic on the RHS.
This is one reason why you invoke update/modify on a fact when you change it. This effectively instructs the engine that previously cached LHS evaluations relating to that fact can be discarded.
Nested "if/else" statements are generally not bad practice as logical operations are not stressful on the CPU, nor does it require copious amounts of memory.
I'm not a Drools programmer, but I imagine, since you alluded it to Java, it's an obj-oriented high-level prgramming language.
Howver, in the event that multiple if statements can be combined, it's recommended to do so for clear, clean coding, not necessarily performance.

hooks versus middleware in slim 2.0

Can anyone explain if there are any significant advantages or disadvantages when choosing to implement features such as authentication or caching etc using hooks as opposed to using middleware?
For instance - I can implement a translation feature by obtaining the request object through custom middleware and setting an app language variable that can be used to load the correct translation file when the app executes. Or I can add a hook before the routing and read the request variable and then load the correct file during the app execution.
Is there any obvious reason I am missing that makes one choice better than the other?
Super TL/DR; (The very short answer)
Use middleware when first starting some aspect of your application, i.e. routers, the boot process, during login confirmation, and use hooks everywhere else, i.e. in components or in microservices.
TL/DR; (The short answer)
Middleware is used when the order of execution matters. Because of this, middleware is often added to the execution stack in various aspects of code (middleware is often added during boot, while adding a logger, auth, etc. In most implementations, each middleware function subsequently decides if execution is continued or not.
However, using middleware when order of execution does not matter tends to lead to bugs in which middleware that gets added does not continue execution by mistake, or the intended order is shuffled, or someone simply forgets where or why a middleware was added, because it can be added almost anywhere. These bugs can be difficult to track down.
Hooks are generally not aware of the execution order; each hooked function is simply executed, and that is all that is guaranteed (i.e. adding a hook after another hook does not guarantee the 2nd hook is always executed second, only that it will simply be executed). The choice to perform it's task is left up to the function itself (to call out to state to halt execution). Most people feel this is much simpler and has fewer moving parts, so statistically yields less bugs. However, to detect if it should run or not, it can be important to include additional state in hooks, so that the hook does not reach out into the app and couple itself with things it's not inherently concerned with (this can take discipline to reason well, but is usually simpler). Also, because of their simplicity, hooks tend to be added at certain named points of code, yielding fewer areas where hooks can exist (often a single place).
Generally, hooks are easier to reason with and store because their order is not guaranteed or thought about. Because hooks can negate themselves, hooks are also computationally equivalent, making middleware only a form of coding style or shorthand for common issues.
Deep dive
Middleware is generally thought of today by architects as a poor choice. Middleware can lead to nightmares and the added effort in debugging is rarely outweighed by any shorthand achieved.
Middleware and Hooks (along with Mixins, Layered-config, Policy, Aspects and more) are all part of the "strategy" type of design pattern.
Strategy patterns, because they are invoked whenever code branching is involved, are probably one of if not the most often used software design patterns.
Knowledge and use of strategy patterns are probably the easiest way to detect the skill level of a developer.
A strategy pattern is used whenever you need to apply "if...then" type of logic (optional execution/branching).
The more computational thought experiments that are made on a piece of software, the more branches can mentally be reduced, and subsequently refactored away. This is essentially "aspect algebra"; constructing the "bones" of the issue, or thinking through what is happening over and over, reducing the procedure to it's fundamental concepts/first principles. When refactoring, these thought experiments are where an architect spends the most time; finding common aspects and reducing unnecessary complexity.
At the destination of complexity reduction is emergence (in systems theory vernacular, and specifically with software, applying configuration in special layers instead of writing software in the first place) and monads.
Monads tend to abstract away what is being done to a level that can lead to increased code execution time if a developer is not careful.
Both Monads and Emergence tend to abstract the problem away so that the parts can be universally applied using fundamental building blocks. Using Monads (for the small) and Emergence (for the large), any piece of complex software can be theoretically constructed from the least amount of parts possible.
After all, in refactoring: "the easiest code to maintain is code that no longer exists."
Functors and mapping functions
A great way to continually reduce complexity is applying functors and mapping functions. Functors are also usually the fastest possible way to implement a branch and let the compiler see into the problem deeply so it can optimize things in the best way possible. They are also extremely easy to reason with and maintain, so there is rarely harm in leaving your work for the day and committing your changes with a partially refactored application.
Functors get their name from mathematics (specifically category theory, in which they are referred to a function that maps between two sets). However, in computation, functors are generally just objects that map problem-space in one way or another.
There is great debate over what is or is not a functor in computer science, but in keeping with the definition, you only need to be concerned with the act of mapping out your problem, and using the "functor" as a temporary thought scaffold that allows you to abstract the issue away until it becomes configuration or a factor of implementation instead of code.
As far as I can say that middleware is perfect for each routing work. And hooks is best for doing anything application-wide. For your case I think it should be better to use hooks than middleware.

Are there exceptions to the rule that requirements should be atomic?

I'm reviewing a requirements spec where some of the requirements include the word "and" or sometimes even a list of required functionality.
Am mostly thinking these should be broken up but this does have the downside of making a long document even longer and even less readable - which in practice may mean its intended audience ends up skimming over it or only reading sections rather than absorbing the whole thing.
However, there are some requirements where it seems a bit silly to break them up. E.g: there are a lot of get/set operations, which always go together - it seems a bit overkill to always break them up into "The user shall be able to get...", "The user shall be able to set..." Other examples are enable/disable, validation lists, supported platforms/browsers etc.
Just wondering if anyone has had similar thoughts and whether it might sometimes be OK to break the rule of atomicity?
My opinion is that you do not have to break up the requirements, as long as you uniquely identify them. E.g. "[REQ1] The user should be able to [a] set ... and [b] get ..." In this way you keep the document readable and also keep the possibility of separately tracing the atomic parts.

Specification: Use cases for CRUD

I am writing a Product requirements specification. In this document I must describe the ways that the user can interact with the system in a very high level. Several of these operations are "Create-Read-Update-Delete" on some objects.
The question is, when writing use cases for these operations, what is the right way to do so? Can I write only one Use Case called "Manage Object x" and then have these operations as included Use Cases? Or do I have to create one use case per operation, per object? The problem I see with the last approach is that I would be writing quite a few pages that I feel do not really contribute to the understanding of the problem.
What is the best practice?
The original concept for use cases was that they, like actors, and class definitions, and -- frankly everything -- enjoy inheritance, as well as <<uses>> and <<extends>> relationships.
A Use Case superclass ("CRUD") makes sense. A lot of use cases are trivial extensions to "CRUD" with an entity type plugged into the use case.
A few use cases will be interesting extensions to "CRUD" with variant processing scenarios for -- maybe -- a fancy search as part of Retrieve, or a multi-step process for Create or Update, or a complex confirmation for Delete.
Feel free to use inheritance to simplify and normalize your use cases. If you use a UML tool, you'll notice that Use Cases have an "inheritance" arrow available to them.
The answer really depends on how complex the interactions are and how many variations are possible from object to object. There are two real reasons why I suggest that you develop specific use cases for each CRUD
(a) If you really are only doing a high-level summary of the interaction then the overhead is very small
(b) I've found it useful to specify a set of generic Use Cases for modifying 'Resources' and then extending / overriding particular steps for particular objects. Obviously the common behaviour is captured in the generic 'Resource' use cases.
As your understanding of the domain develops (i.e. as business users dump more requirements on you), you are more likely to add to the CRUD rather than remove it.
It makes sense to distinguish between workflow cases and resource/object lifecycles.
They interact but they are not the same; it makes sense to specify them both.
Use case scenarios or more extended workflow specifications typically describe how a case may proceed through the system's workflow. This will typically include interaction with various different resources. These interactions can often be characterized as C,R,U or D.
Resource lifecycles provide the process model of what may happen to a particular (type of) resource (object). They are often trivial "flower" models that say: any of C,R,U,D may happen to this resource in any order, so they are not very interesting by themselves.
The link between the two is that steps from the workflow and from the lifecycles coincide.
I feel representation - as long as it makes sense and is readable - does not matter. Conforming to the UML spec in all details is especially irrelevant.
What does matter, that you spec clearly states the operations and operation types the implementaton requires.
C: What form of insert operations exists. Can you insert rows not fully populated? Can you insert rows without an ID? Can you retrieve the ID last inserted? Can you cancel an insert selectively? What happens on duplicate keys or constraints failure? Is there a REPLACE INTO equivalent?
R: By what fields can you select? Can you do arbitrary grouping, orders? Can you create aggregate fields, aliases? How can you retrieve embedded (has many etc.) data? How do you specify depth of recursion, limits?
U, D: see R + C

Rules Based Database Engine

I would like to design a rules based database engine within Oracle for PeopleSoft Time entry application. How do I do this?
A rules-based system needs several key components:
- A set of rules defined as data
- A set of uniform inputs on which to operate
- A rules executor
- Supervisor hierarchy
Write out a series of use-cases - what might someone be trying to accomplish using the system?
Decide on what things your rules can take as inputs, and what as outputs
Describe the rules from your use-cases as a series of data, and thus determine your rule format. Expand 2 as necessary for this.
Create the basic rule executor, and test that it will take the rule data and process it correctly
Extend the above to deal with multiple rules with different priorities
Learn enough rule engine theory and graph theory to understand common rule-based problems - circularity, conflicting rules etc - and how to use (node) graphs to find cases of them
Write a supervisor hierarchy that is capable of managing the ruleset and taking decisions based on the possible problems above. This part is important, because it is your protection against foolishness on the part of the rule creators causing runtime failure of the entire system.
Profit!
Broadly, rules engines are an exercise in managing complexity. If you don't manage it, you can easily end up with rules that cascade from each other causing circular loops, race-conditions and other issues. It's very easy to construct these accidentally: consider an email program which you have told to move mail from folder A to B if it contains the magic word 'beta', and from B to A if it contains the word 'alpha'. An email with both would be shuttled back and forward until something broke, preventing all other rules from being processed.
I have assumed here that you want to learn about the theory and build the engine yourself. alphazero raises the important suggestion of using an existing rules engine library, which is wise - this is the kind of subject that benefits from academic theory.
I haven't tried this myself, but an obvious approach is to use Java procedures in the Oracle database, and use a Java rules engine library in that code.
Try:
http://www.oracle.com/technology/tech/java/jsp/index.html
http://www.oracle.com/technology/tech/java/java_db/pdf/TWP_AppDev_Java_DB_Reduce_your_Costs_and%20_Extend_your_Database_10gR1_1113.PDF
and
http://www.jboss.org/drools/
or
http://www.jessrules.com/
--
Basically you'll need to capture data events (inserts, updates, deletes), map to them to your rulespace's events, and apply rules.