Accumulate/Collect vs single java loop - drools

I have a use case in which let's say I have 100,000 pojos in the kiesession of type A and I want to apply some rules on it out of which two are -
Check is a duplicate value of type A exist.
sum of A.someInt is equal to the givenValue.
class A {
private int someInt;
private String someString;
}
For these two rules should I go with creating separate rules for them in drools like this
$sumSomeInt: Integer(this > 90) from accumulate(A( $SomeInt: someInt ), sum($SomeInt))
function boolean checkDuplicate(List input) {
int a = input.size();
int b = ((List) input.stream().distinct().collect(Collectors.toList())).size();
return a!=b;
}
dialect "java"
rule "ADuplicateRule"
when
$input : List( ) from collect(A())
eval(checkDuplicate($input))
then
throw new Exception("A list has duplicate values");
end
Or is it better to apply this in java using a single loop for this and doing both the things what I want to know is will applying these two approaches on 100k records give us a major performance difference
just a reminder I also have other rules so I have to use drools I don't want to apply some validation in java and some in drools unless there is a major performance boost

Drools optimizes the "when" clause. The "then", which is pure Java, is not optimized and is executed "as-is". eval statements are not optimized either and are bad practice and should be avoided at all costs. Just about anything you can do in an 'eval', you can do in a Drools native fashion.
Given the choice, in the "when", of doing something in Java via an eval, and doing something using Drools built-in structures and methods, always prefer doing it the Drools native way. 100k records isn't very many so it probably won't be noticeable at such a scale.
And you shouldn't be throwing exceptions -- that's also not a good practice. retract your inputs or something to force execution to terminate early.
I shared a better way of doing duplicate detection in your other question.

Related

convert Mono<Order> to Order object

I am trying to explore the reactive programming using drools. I am doing POC to apply the drools rules on the object.
public Mono<Order> findByOrderNo(String orderNo) {
Mono<Order> order = orderDAO.findByOrderNo ( orderNo );
KieSession kieSession = kieContainer.newKieSession("rulesSessionpart2");
kieSession.insert(order); // rules are not getting applied as it requires the object type
//as input
kieSession.fireAllRules();
kieSession.dispose();
return order;
}
This is my test rule:
import com.reactive.practice.springreactor.model.Order
rule "ReturnEligible for Order"
when
orderObject: Order(itemReturnEligible==true)
then
orderObject.setDescription("bdfgdfdfhdf");
end
Here the method kieSession.insert(order) requires an object as input, but in the above code, I am passing Publisher type of Mono.
I tried converting Mono to Order object using block(). As in many documentation suggests it is not recommendable to use as it is blocking the operation.
Is there any other way to convert the Mono to Order Object.
Any help is appreciable.
Thanks
Two answers
1) Use a dedicated testing library like reactor-test
https://projectreactor.io/docs/core/release/reference/#testing
StepVerifier.create(
appendBoomError(source))
.expectNext("thing1")
.expectNext("thing2")
.expectErrorMessage("boom")
.verify();
2) Calling block in a test isn't a problem.
If you unit test method is blocking and must complete before you return then block is arguably the right call to make. But there are still better ways to do this like StepVerifier above.
In reactive style programming, calling block in your production code is almost always a bug unless you are specifically the framework code in a blocking context (e.g. a synchronous servlet API that runs the process). Generally you return a Mono and transform other inputs without having a blocking call.

Does functional programming's deep stacks prevent garbage collection in the JVM?

Suppose I allocation some large object (e.g. a vector of size N, which might be very large) and perform a sequence of m operations on it:
fm( .. f3( f2( f1( vec ) ) ) )
with each returning a collection of size N.
For simplicity let's assume each f is quite simple
def f5(vec: Vector[Int]) = { gc(); f6( vec.map(_+1) ) }
So, vec no longer has future references at the point where each subsequent call is made. (f1's vec parameter is never used after f2 is entered, and so forth for each call)
However, because most JVMs don't decrement references until the stack unwinds (AFAIK), isn't my program required to consume NxM memory. By comparison in the following style only 2xM is required (and less in other implementations)
var vec:Vector[Int] = ...
for ( f <- F ) {
vec = f(vec)
gc()
}
Does the same issue exist for tail recursive methods?
This isn't just an academic exercise - in some types of big-data type problems, we might to choose N so that our program is fits fully into RAM. In this case, should I be concerned that one style of pipelining is preferable to another?
First of all, your question contains a serious misconception, and an example of disastrously bad coding.
However, because most JVMs don't decrement references until the stack unwinds (AFAIK) ...
Actually there are no mainstream JVMs that use reference counting on references at all. Instead, they all use mark-sweep, copying or generational collection algorithms of some kind that do not rely on reference counting.
Next this:
def f5(vec: Vector[Int]) = { gc(); f6( vec.map(_+1) ) }
I think you are trying to "force" a garbage collection with the gc() call. Don't do this: it is horribly inefficient. And even if you are only doing to investigate memory management behavior, you are most likely distorting that behavior to the extent that what you are seeing is NOT representative of normal Scala code.
Having said that, the answer is basically yes. If your Scala function cannot be tail-call optimized, then there is the potential for a deep recursion to cause garbage retention problems. The only "get out" would be if the JIT compiler was able to tell the GC that certain variables were "dead" at particular points in a method call. I don't know if HotSpot JITs / GCs can do that.
(I guess, another way to do that would be for the Scala compiler to explicitly assign null to dead reference variables. But that has potential performance issues when you don't have a garbage retention problem!)
To add to #StephenC's answer
I don't know if HotSpot JITs / GCs can do that.
The hotspot jit can do liveness analysis within a method and deem local variables as unreachable even while a frame is still on the stack. This is why JDK9 introduces Reference.reachabilityFence, under some conditions even this can become unreachable while executing a member method of that instance.
But that optimization only applies when there really nothing in the control flow that can still read that local variable, e.g. no finally blocks or monitor exits. So it would depend on the bytecode generated by scala.
The calls in your example are tail calls. They really shouldn't have a stack frame allocated at all. However, for various unfortunate reasons, the Scala Language Specification does not mandate Proper Tail Calls, and for similarly unfortunate reasons, the Scala-JVM implementation does not perform Tail Call Optimization.
However, some JVMs have TCO, e.g. the J9 JVM performs TCO, and thus there shouldn't be any additional stack frames allocated, making the intermediate objects unreachable as soon as the next tail call happens. Even JVMs that do not have TCO are able to perform various static (escape analysis, liveness analysis) or dynamic (escape detection, e.g. the Azul Zing JVM does this) analysis that may or may not help with this case.
There are also other implementations of Scala: Scala.js does not perform TCO, as far as I know, but it compiles to ECMAScript, and as of ECMAScript 2015, ECMAScript does have Proper Tail Calls, so as long as the encoding of Scala method calls ends up as ECMAScript function calls, an standards-conforming ECMAScript 2015 engine should eliminate Scala tail calls.
Scala-native currently does not perform TCO, but it will in the future.

Is there any advantage to avoiding while loops in Scala?

Reading Scala docs written by the experts one can get the impression that tail recursion is better than a while loop, even when the latter is more concise and clearer. This is one example
object Helpers {
implicit class IntWithTimes(val pip:Int) {
// Recursive
def times(f: => Unit):Unit = {
#tailrec
def loop(counter:Int):Unit = {
if (counter >0) { f; loop(counter-1) }
}
loop(pip)
}
// Explicit loop
def :#(f: => Unit) = {
var lc = pip
while (lc > 0) { f; lc -= 1 }
}
}
}
(To be clear, the expert was not addressing looping at all, but in the example they chose to write a loop in this fashion as if by instinct, which is what the raised the question for me: should I develop a similar instinct..)
The only aspect of the while loop that could be better is the iteration variable should be local to the body of the loop, and the mutation of the variable should be in a fixed place, but Scala chooses not to provide that syntax.
Clarity is subjective, but the question is does the (tail) recursive style offer improved performance?
I'm pretty sure that, due to the limitations of the JVM, not every potentially tail-recursive function will be optimised away by the Scala compiler as so, so the short (and sometimes wrong) answer to your question on performance is no.
The long answer to your more general question (having an advantage) is a little more contrived. Note that, by using while, you are in fact:
creating a new variable that holds a counter.
mutating that variable.
Off-by-one errors and the perils of mutability will ensure that, on the long run, you'll introduce bugs with a while pattern. In fact, your times function could easily be implemented as:
def times(f: => Unit) = (1 to pip) foreach f
Which not only is simpler and smaller, but also avoids any creation of transient variables and mutability. In fact, if the type of the function you are calling would be something to which the results matter, then the while construction would start to be even more difficult to read. Please attempt to implement the following using nothing but whiles:
def replicate(l: List[Int])(times: Int) = l.flatMap(x => List.fill(times)(x))
Then proceed to define a tail-recursive function that does the same.
UPDATE:
I hear you saying: "hey! that's cheating! foreach is neither a while nor a tail-rec call". Oh really? Take a look into Scala's definition of foreach for Lists:
def foreach[B](f: A => B) {
var these = this
while (!these.isEmpty) {
f(these.head)
these = these.tail
}
}
If you want to learn more about recursion in Scala, take a look at this blog post. Once you are into functional programming, go crazy and read Rúnar's blog post. Even more info here and here.
In general, a directly tail recursive function (i.e., one that always calls itself directly and cannot be overridden) will always be optimized into a while loop by the compiler. You can use the #tailrec annotation to verify that the compiler is able to do this for a particular function.
As a general rule, any tail recursive function can be rewritten (usually automatically by the compiler) as a while loop and vice versa.
The purpose of writing functions in a (tail) recursive style is not to maximize performance or even conciseness, but to make the intent of the code as clear as possible, while simultaneously minimizing the chance of introducing bugs (by eliminating mutable variables, which generally make it harder to keep track of what the "inputs" and "outputs" of the function are). A properly written recursive function consists of a series of checks for terminating conditions (using either cascading if-else or a pattern match) with the recursive call(s) (plural only if not tail recursive) made if none of the terminating conditions are met.
The benefit of using recursion is most dramatic when there are several different possible terminating conditions. A series of if conditionals or patterns is generally much easier to comprehend than a single while condition with a whole bunch of (potentially complex and inter-related) boolean expressions &&'d together, especially if the return value needs to be different depending on which terminating condition is met.
Did these experts say that performance was the reason? I'm betting their reasons are more to do with expressive code and functional programming. Could you cite examples of their arguments?
One interesting reason why recursive solutions can be more efficient than more imperative alternatives is that they very often operate on lists and in a way that uses only head and tail operations. These operations are actually faster than random-access operations on more complex collections.
Anther reason that while-based solutions may be less efficient is that they can become very ugly as the complexity of the problem increases...
(I have to say, at this point, that your example is not a good one, since neither of your loops do anything useful. Your recursive loop is particularly atypical since it returns nothing, which implies that you are missing a major point about recursive functions. The functional bit. A recursive function is much more than another way of repeating the same operation n times.)
While loops do not return a value and require side effects to achieve anything. It is a control structure which only works at all for very simple tasks. This is because each iteration of the loop has to examine all of the state to decide what to next. The loops boolean expression may also have to be come very complex if there are multiple potential exit paths (or that complexity has to be distributed throughout the code in the loop, which can be ugly and obfuscatory).
Recursive functions offer the possibility of a much cleaner implementation. A good recursive solution breaks a complex problem down in to simpler parts, then delegates each part on to another function which can deal with it - the trick being that that other function is itself (or possibly a mutually recursive function, though that is rarely seen in Scala - unlike the various Lisp dialects, where it is common - because of the poor tail recursion support). The recursively called function receives in its parameters only the simpler subset of data and only the relevant state; it returns only the solution to the simpler problem. So, in contrast to the while loop,
Each iteration of the function only has to deal with a simple subset of the problem
Each iteration only cares about its inputs, not the overall state
Sucess in each subtask is clearly defined by the return value of the call that handled it.
State from different subtasks cannot become entangled (since it is hidden within each recursive function call).
Multiple exit points, if they exist, are much easier to represent clearly.
Given these advantages, recursion can make it easier to achieve an efficient solution. Especially if you count maintainability as an important factor in long-term efficiency.
I'm going to go find some good examples of code to add. Meanwhile, at this point I always recommend The Little Schemer. I would go on about why but this is the second Scala recursion question on this site in two days, so look at my previous answer instead.

Everything's an object in Scala

I am new to Scala and heard a lot that everything is an object in Scala. What I don't get is what's the advantage of "everything's an object"? What are things that I cannot do if everything is not an object? Examples are welcome. Thanks
The advantage of having "everything" be an object is that you have far fewer cases where abstraction breaks.
For example, methods are not objects in Java. So if I have two strings, I can
String s1 = "one";
String s2 = "two";
static String caps(String s) { return s.toUpperCase(); }
caps(s1); // Works
caps(s2); // Also works
So we have abstracted away string identity in our operation of making something upper case. But what if we want to abstract away the identity of the operation--that is, we do something to a String that gives back another String but we want to abstract away what the details are? Now we're stuck, because methods aren't objects in Java.
In Scala, methods can be converted to functions, which are objects. For instance:
def stringop(s: String, f: String => String) = if (s.length > 0) f(s) else s
stringop(s1, _.toUpperCase)
stringop(s2, _.toLowerCase)
Now we have abstracted the idea of performing some string transformation on nonempty strings.
And we can make lists of the operations and such and pass them around, if that's what we need to do.
There are other less essential cases (object vs. class, primitive vs. not, value classes, etc.), but the big one is collapsing the distinction between method and object so that passing around and abstracting over functionality is just as easy as passing around and abstracting over data.
The advantage is that you don't have different operators that follow different rules within your language. For example, in Java to perform operations involving objects, you use the dot name technique of calling the code (static objects still use the dot name technique, but sometimes the this object or the static object is inferred) while built-in items (not objects) use a different method, that of built-in operator manipulation.
Number one = Integer.valueOf(1);
Number two = Integer.valueOf(2);
Number three = one.plus(two); // if only such methods existed.
int one = 1;
int two = 2;
int three = one + two;
the main differences is that the dot name technique is subject to polymorphisim, operator overloading, method hiding, and all the good stuff that you can do with Java objects. The + technique is predefined and completely not flexible.
Scala circumvents the inflexibility of the + method by basically handling it as a dot name operator, and defining a strong one-to-one mapping of such operators to object methods. Hence, in Scala everything is an object means that everything is an object, so the operation
5 + 7
results in two objects being created (a 5 object and a 7 object) the plus method of the 5 object being called with the parameter 7 (if my scala memory serves me correctly) and a "12" object being returned as the value of the 5 + 7 operation.
This everything is an object has a lot of benefits in a functional programming environment, for example, blocks of code now are object too, making it possible to pass back and forth blocks of code (without names) as parameters, yet still be bound to strict type checking (the block of code only returns Long or a subclass of String or whatever).
When it boils down to it, it makes some kinds of solutions very easy to implement, and often the inefficiencies are mitigated by the lack of need to handle "move into primitives, manipulate, move out of primitives" marshalling code.
One specific advantage that comes to my mind (since you asked for examples) is what in Java are primitive types (int, boolean ...) , in Scala are objects that you can add functionality to with implicit conversions. For example, if you want to add a toRoman method to ints, you could write an implicit class like:
implicit class RomanInt(i:Int){
def toRoman = //some algorithm to convert i to a Roman representation
}
Then, you could call this method from any Int literal like :
val romanFive = 5.toRoman // V
This way you can 'pimp' basic types to adapt them to your needs
In addition to the points made by others, I always emphasize that the uniform treatment of all values in Scala is in part an illusion. For the most part it is a very welcome illusion. And Scala is very smart to use real JVM primitives as much as possible and to perform automatic transformations (usually referred to as boxing and unboxing) only as much as necessary.
However, if the dynamic pattern of application of automatic boxing and unboxing is very high, there can be undesirable costs (both memory and CPU) associated with it. This can be partially mitigated with the use of specialization, which creates special versions of generic classes when particular type parameters are of (programmer-specified) primitive types. This avoids boxing and unboxing but comes at the cost of more .class files in your running application.
Not everything is an object in Scala, though more things are objects in Scala than their analogues in Java.
The advantage of objects is that they're bags of state which also have some behavior coupled with them. With the addition of polymorphism, objects give you ways of changing the implicit behavior and state. Enough with the poetry, let's go into some examples.
The if statement is not an object, in either scala or java. If it were, you could be able to subclass it, inject another dependency in its place, and use it to do stuff like logging to a file any time your code makes use of the if statement. Wouldn't that be magical? It would in some cases help you debug stuff, and in other cases it would make your hairs grow white before you found a bug caused by someone overwriting the behavior of if.
Visiting an objectless, statementful world: Imaging your favorite OOP programming language. Think of the standard library it provides. There's plenty of classes there, right? They offer ways for customization, right? They take parameters that are other objects, they create other objects. You can customize all of these. You have polymorphism. Now imagine that all the standard library was simply keywords. You wouldn't be able to customize nearly as much, because you can't overwrite keywords. You'd be stuck with whatever cases the language designers decided to implement, and you'd be helpless in customizing anything there. Such languages exist, you know them well, they're the sequel-like languages. You can barely create functions there, but in order to customize the behavior of the SELECT statement, new versions of the language had to appear which included the features most desired. This would be an extreme world, where you'd only be able to program by asking the language designers for new features (which you might not get, because someone else more important would require some feature incompatible with what you want)
In conclusion, NOT everything is an object in scala: Classes, expressions, keywords and packages surely aren't. More things however are, like functions.
What's IMHO a nice rule of thumb is that more objects equals more flexibility
P.S. in Python for example, even more things are objects (like the classes themselves, the analogous concept for packages (that is python modules and packages). You'd see how there, black magic is easier to do, and that brings both good and bad consequences.

Methods of simplifying ugly nested if-else trees in C#

Sometimes I'm writing ugly if-else statements in C# 3.5; I'm aware of some different approaches to simplifying that with table-driven development, class hierarchy, anonimous methods and some more.
The problem is that alternatives are still less wide-spread than writing traditional ugly if-else statements because there is no convention for that.
What depth of nested if-else is normal for C# 3.5? What methods do you expect to see instead of nested if-else the first? the second?
if i have ten input parameters with 3 states in each, i should map functions to combination of each state of each parameter (really less, because not all the states are valid, but sometimes still a lot). I can express these states as a hashtable key and a handler (lambda) which will be called if key matches.
It is still mix of table-driven, data-driven dev. ideas and pattern matching.
what i'm looking for is extending for C# such approaches as this for scripting (C# 3.5 is rather like scripting)
http://blogs.msdn.com/ericlippert/archive/2004/02/24/79292.aspx
Good question. "Conditional Complexity" is a code smell. Polymorphism is your friend.
Conditional logic is innocent in its infancy, when it’s simple to understand and contained within a
few lines of code. Unfortunately, it rarely ages well. You implement several new features and
suddenly your conditional logic becomes complicated and expansive. [Joshua Kerevsky: Refactoring to Patterns]
One of the simplest things you can do to avoid nested if blocks is to learn to use Guard Clauses.
double getPayAmount() {
if (_isDead) return deadAmount();
if (_isSeparated) return separatedAmount();
if (_isRetired) return retiredAmount();
return normalPayAmount();
};
The other thing I have found simplifies things pretty well, and which makes your code self-documenting, is Consolidating conditionals.
double disabilityAmount() {
if (isNotEligableForDisability()) return 0;
// compute the disability amount
Other valuable refactoring techniques associated with conditional expressions include Decompose Conditional, Replace Conditional with Visitor, Specification Pattern, and Reverse Conditional.
There are very old "formalisms" for trying to encapsulate extremely complex expressions that evaluate many possibly independent variables, for example, "decision tables" :
http://en.wikipedia.org/wiki/Decision_table
But, I'll join in the choir here to second the ideas mentioned of judicious use of the ternary operator if possible, identifying the most unlikely conditions which if met allow you to terminate the rest of the evaluation by excluding them first, and add ... the reverse of that ... trying to factor out the most probable conditions and states that can allow you to proceed without testing of the "fringe" cases.
The suggestion by Miriam (above) is fascinating, even elegant, as "conceptual art;" and I am actually going to try it out, trying to "bracket" my suspicion that it will lead to code that is harder to maintain.
My pragmatic side says there is no "one size fits all" answer here in the absence of a pretty specific code example, and complete description of the conditions and their interactions.
I'm a fan of "flag setting" : meaning anytime my application goes into some less common "mode" or "state" I set a boolean flag (which might even be static for the class) : for me that simplifies writing complex if/then else evaluations later on.
best, Bill
Simple. Take the body of the if and make a method out of it.
This works because most if statements are of the form:
if (condition):
action()
In other cases, more specifically :
if (condition1):
if (condition2):
action()
simplify to:
if (condition1 && condition2):
action()
I'm a big fan of the ternary operator which get's overlooked by a lot of people. It's great for assigning values to variables based on conditions. like this
foobarString = (foo == bar) ? "foo equals bar" : "foo does not equal bar";
Try this article for more info.
It wont solve all your problems, but it is very economical.
I know that this is not the answer you are looking for, but without context your questions is very hard to answer. The problem is that the way to refactor such a thing really depends on your code, what it is doing, and what you are trying to accomplish. If you had said that you were checking the type of an object in these conditionals we could throw out an answer like 'use polymorphism', but sometimes you actually do just need some if statements, and sometimes those statements can be refactored into something more simple. Without a code sample it is hard to say which category you are in.
I was told years ago by an instructor that 3 is a magic number. And as he applied it it-else statements he suggested that if I needed more that 3 if's then I should probably use a case statement instead.
switch (testValue)
{
case = 1:
// do something
break;
case = 2:
// do something else
break;
case = 3:
// do something more
break;
case = 4
// do what?
break;
default:
throw new Exception("I didn't do anything");
}
If you're nesting if statements more than 3 deep then you should probably take that as a sign that there is a better way. Probably like Avirdlg suggested, separating the nested if statements into 1 or more methods. If you feel you are absolutely stuck with multiple if-else statements then I would wrap all the if-else statements into a single method so it didn't ugly up other code.
If the entire purpose is to assign a different value to some variable based upon the state of various conditionals, I use a ternery operator.
If the If Else clauses are performing separate chunks of functionality. and the conditions are complex, simplify by creating temporary boolean variables to hold the true/false value of the complex boolean expressions. These variables should be suitably named to represent the business sense of what the complex expression is calculating. Then use the boolean variables in the If else synatx instead of the complex boolean expressions.
One thing I find myself doing at times is inverting the condition followed by return; several such tests in a row can help reduce nesting of if and else.
Not a C# answer, but you probably would like pattern matching. With pattern matching, you can take several inputs, and do simultaneous matches on all of them. For example (F#):
let x=
match cond1, cond2, name with
| _, _, "Bob" -> 9000 // Bob gets 9000, regardless of cond1 or 2
| false, false, _ -> 0
| true, false, _ -> 1
| false, true, _ -> 2
| true, true, "" -> 0 // Both conds but no name gets 0
| true, true, _ -> 3 // Cond1&2 give 3
You can express any combination to create a match (this just scratches the surface). However, C# doesn't support this, and I doubt it will any time soon. Meanwhile, there are some attempts to try this in C#, such as here: http://codebetter.com/blogs/matthew.podwysocki/archive/2008/09/16/functional-c-pattern-matching.aspx. Google can turn up many more; perhaps one will suit you.
try to use patterns like strategy or command
In simple cases you should be able to get around with basic functional decomposition. For more complex scenarios I used Specification Pattern with great success.