Drools: time-continuous Drools rules not working - Scala

I'm trying to test the following Drools rules:
declare Message
    @role(event)
    @timestamp(timestamp)
    @expires(10s)
end

rule "throttle state activated"
when
    $meta: CurrentState(throttleState == false)
    $lastMessage: Message($lastTimestamp: timestamp)
    not (Message(timestamp > $lastMessage.timestamp))
    $messages: ArrayList(size > 20) from collect(
        $message: Message() over window:time(10s)
    )
then
    $meta.setLastThrottled($lastMessage.getTimestamp());
    $meta.setThrottleState(true);
    update($meta);
end

rule "throttle state deactivated"
when
    $meta: CurrentState(throttleState == true)
    $lastMessage: Message($lastTimestamp: timestamp)
    not (Message(timestamp > $lastMessage.timestamp))
    $messages: ArrayList(size <= 20) from collect(
        $message: Message() over window:time(10s)
    )
then
    $meta.setThrottleState(false);
    update($meta);
end
I'm testing them with the following two tests, written in Scala with MustMatchers:

"rules" must {
  "successfully trigger (activate)" in {
    reset() // Resets the kie session and event listener
    // insertAndFire first advances the pseudo clock, then inserts the message, then fires all Drools rules
    (1 to 50).foreach(i => insertAndFire(mockMessage(offsetMSec = 200 * i), timeAdvanceMSec = 200))
    // sListener simply counts how often each Drools rule has fired
    sListener.getCount("throttle state activated") mustBe 1 // THIS TEST SUCCEEDS
  }

  "successfully trigger (deactivate)" in {
    reset()
    (1 to 50).foreach(i => insertAndFire(mockMessage(offsetMSec = 200 * i), timeAdvanceMSec = 200))
    (1 to 60).foreach { i =>
      // For the next 60 seconds, advance the clock and fire all rules, so that the deactivation
      // rule no longer sees more than 20 messages in the last 10 seconds.
      // In theory this should already trigger the rule after 10 loop iterations.
      clock.advanceTime(1, TimeUnit.SECONDS)
      kieSession.fireAllRules()
    }
    sListener.getCount("throttle state activated") mustBe 1   // THIS TEST SUCCEEDS
    sListener.getCount("throttle state deactivated") mustBe 1 // !!! THIS TEST FAILS !!!
  }
}
I've tried everything I can think of, and I've already verified in the debugger that the throttleState field on the CurrentState fact really does change to true, but for some reason the deactivation rule never fires.
When these rules run in live code rather than in tests they work fine, so I don't understand why they don't trigger properly here. I'm using a PseudoSessionClock to simulate time during these tests, and it works fine for every other rule. My suspicion is that it has something to do with the collect in the deactivation rule, but I cannot pinpoint it.
I've also tried actually waiting 10 seconds of real time with a timer, which of course slows the test down, but this also fails. It would not have been logical for that to work anyway, since the activation rule does trigger properly, but it was worth a try.
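For reference, this is roughly how the session and the test helpers are wired up. This is only a minimal sketch, assuming a classpath KieContainer, a session named "rulesSession", and a KieBase running in STREAM event-processing mode so that window:time works; the real reset() and insertAndFire() implementations live in the MCVE linked below:

import java.util.concurrent.TimeUnit

import org.kie.api.KieServices
import org.kie.api.runtime.KieSession
import org.kie.api.runtime.conf.ClockTypeOption
import org.kie.api.time.SessionPseudoClock

val kieServices = KieServices.Factory.get()
val kieContainer = kieServices.getKieClasspathContainer

// Create the session with the pseudo clock so the tests control time explicitly.
val sessionConfig = kieServices.newKieSessionConfiguration()
sessionConfig.setOption(ClockTypeOption.get("pseudo"))
val kieSession: KieSession = kieContainer.newKieSession("rulesSession", sessionConfig)
val clock: SessionPseudoClock = kieSession.getSessionClock()

// insertAndFire: advance the pseudo clock, insert the event, then fire all rules.
def insertAndFire(message: Message, timeAdvanceMSec: Long): Unit = {
  clock.advanceTime(timeAdvanceMSec, TimeUnit.MILLISECONDS)
  kieSession.insert(message)
  kieSession.fireAllRules()
}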
MCVE Project: https://github.com/ThijmenL98/DroolsMCVE

Related

How to repeat each test with a delay if a particular Exception happens (pytest)

I have a load of tests which I want to rerun if a particular exception occurs. The reason is that I am making real API calls to a server and sometimes I hit the API's rate limit, in which case I want to wait and try again.
However, I am also using a pytest fixture to make each test run several times, because I am sending requests to different servers (the actual use case is different cryptocurrency exchanges).
Using pytest-rerunfailures comes very close to what I need, except that I can't see how to inspect the exception of the last test run in the condition.
Below is some code which shows what I am trying to achieve, but obviously I don't want to write code like this for every test.
@pytest_asyncio.fixture(
    params=EXCHANGE_NAMES,
)
async def client(request):
    exchange_name = request.param
    exchange_client = get_exchange_client(exchange_name)
    return exchange_client

def test_something(client):
    test_something.count += 1
    ### This block is the code I want to avoid repeating in every test
    try:
        result = client.do_something()
    except RateLimitException:
        if test_something.count <= 3:
            sleep_duration = get_sleep_duration(client)
            time.sleep(sleep_duration)
            # run the same test again
            test_something(client)
        else:
            raise
    expected = [1, 2, 3]
    assert result == expected
You can use the retry library to wrap your actual test code:
@pytest_asyncio.fixture(
    params=EXCHANGE_NAMES,
    autouse=True,
)
async def client(request):
    exchange_name = request.param
    exchange_client = get_exchange_client(exchange_name)
    return exchange_client

def test_something(client):
    actual_test_something(client)

@retry(RateLimitException, tries=3, delay=2)
def actual_test_something(client):
    '''Retry on RateLimitException, raise after 3 attempts, sleep 2 seconds between attempts.'''
    result = client.do_something()
    expected = [1, 2, 3]
    assert result == expected
The code looks much cleaner this way.

Killing the spark.sql

I am new to both Scala and Spark.
I have Scala code which executes queries in a while loop, one after the other.
If a particular query takes more than a certain time, for example 10 minutes, we need to be able to stop that query's execution and move on to the next one.
For example:
do {
  val f = Future {
    spark.sql("some query")
  }

  f onSuccess {
    case suc => println("Query ran in 10 mins")
  }

  f onFailure {
    case fail => println("query took more than 10 mins")
  }

  val result = Await.ready(f, Duration(10, TimeUnit.MINUTES))
} while (someCondition)
I understand that when we call spark.sql, control passes to Spark, and that is what I need to kill/stop once the duration is exceeded so that I can get the resources back.
I have tried multiple things but I am not sure how to solve this.
Any help would be welcome as I am stuck on this.
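One possible approach (a sketch, not from the original thread): run each query in its own Spark job group inside a Future, and cancel that job group if the Await times out. The helper name runWithTimeout and the job-group id below are hypothetical:

import java.util.concurrent.{TimeUnit, TimeoutException}

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

import org.apache.spark.sql.SparkSession

def runWithTimeout(spark: SparkSession, query: String, groupId: String): Unit = {
  val f = Future {
    // Tag all jobs submitted from this thread with the group id so they can be cancelled together.
    spark.sparkContext.setJobGroup(groupId, s"timed query: $groupId", interruptOnCancel = true)
    spark.sql(query).collect() // force execution; replace collect() with your real sink
  }
  try {
    Await.result(f, Duration(10, TimeUnit.MINUTES))
    println("Query ran in 10 mins")
  } catch {
    case _: TimeoutException =>
      // Ask Spark to cancel every job in the group, freeing the resources for the next query.
      spark.sparkContext.cancelJobGroup(groupId)
      println("query took more than 10 mins, cancelled")
  }
}

cancelJobGroup asks Spark to cancel all running jobs tagged with that group id, so the executors become available for the next query in the loop.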

How to cancel other futures?

I am creating multiple futures and I am expecting only one to achieve the desired goal.
How can I cancel all other futures from within a future?
This is how I create futures:
jobs = days_to_scan.map { |day|
  Concurrent::Future.execute do
    sleep_time = day.to_f / days_to_scan.count.to_f * seconds_to_complete.to_f
    sleep(sleep_time)
    if GoogleAPI.new.api_call(@adwords, ad_seeder, visitor, day)
      # How to cancel other futures here?
    end
  end
}
I might be late to the party, but I'm going to reply anyway since other people might stumble upon this question.
What you probably want is to force-shutdown the thread pool as soon as one Future finishes:
class DailyJobs
  def call
    thread_pool = ::Concurrent::CachedThreadPool.new
    jobs = days_to_scan.map { |day|
      Concurrent::Future.execute(executor: thread_pool) do
        sleep_time = day.to_f / days_to_scan.count.to_f * seconds_to_complete.to_f
        sleep(sleep_time)
        if GoogleAPI.new.api_call(@adwords, ad_seeder, visitor, day)
          # Kill the whole pool as soon as one Future has succeeded
          thread_pool.kill
        end
      end
    }
  end
end
The thing is: killing a thread pool is not really recommended and might have unpredictable results.
A better approach is to track when a Future is done and let the other Futures do nothing:
class DailyJobs
  def call
    status = ::Concurrent::AtomicBoolean.new(false)
    days_to_scan.map { |day|
      Concurrent::Future.execute do
        next if status.true? # Early exit so this Future does nothing
        sleep_time = day.to_f / days_to_scan.count.to_f * seconds_to_complete.to_f
        sleep(sleep_time)
        if GoogleAPI.new.api_call(@adwords, ad_seeder, visitor, day)
          # Do your thing
          status.value = true # This lets you know that at least one Future completed
        end
      end
    }
  end
end
It is worth noting that if this is a Rails application, you probably want to wrap your Future in the Rails executor to avoid autoloading and deadlock issues; I wrote about that here.
Okay, I could implement it as:

# Wait until one job has achieved the goal
while jobs.select { |job| job.value == 'L' }.count == 0 &&
      jobs.select { |job| [:rejected, :fulfilled].include?(job.state) }.count != jobs.count
  sleep(0.1)
end

# Cancel the other jobs
jobs.each { |job| job.cancel unless (job.state == :fulfilled && job.value == success_value) }

Spark - how to handle with lazy evaluation in case of iterative (or recursive) function calls

I have a recursive function that needs to compare the results of the current call to the previous call to figure out whether it has reached convergence. My function does not contain any action - it only contains map, flatMap, and reduceByKey. Since Spark does not evaluate transformations until an action is called, my next iteration does not get the proper values to compare for convergence.
Here is a skeleton of the function:
def func1(sc: SparkContext, nodes: RDD[List[Long]], didConverge: Boolean, changeCount: Int): RDD[List[Long]] = {
  if (didConverge)
    nodes
  else {
    val currChangeCount = sc.accumulator(0, "xyz")
    val newNodes = performSomeOps(nodes, currChangeCount) // does a few map/flatMap/reduceByKey operations
    if (currChangeCount.value == changeCount) {
      func1(sc, newNodes, true, currChangeCount.value)
    } else {
      func1(sc, newNodes, false, currChangeCount.value)
    }
  }
}
performSomeOps only contains map, flatMap, and reduceByKey transformations. Since it does not trigger any action, the code in performSomeOps does not execute, so currChangeCount never gets the actual count. That means the convergence check (currChangeCount.value == changeCount) is meaningless. One way to overcome this is to force an action within each iteration by calling count, but that is unnecessary overhead.
I am wondering what I can do to force an action without much overhead, or whether there is another way to address this problem.
I believe there is a very important thing you're missing here:
For accumulator updates performed inside actions only, Spark guarantees that each task’s update to the accumulator will only be applied once, i.e. restarted tasks will not update the value. In transformations, users should be aware of that each task’s update may be applied more than once if tasks or job stages are re-executed.
Because of that, accumulators cannot be reliably used for managing control flow; they are better suited for job monitoring.
Moreover, executing an action is not unnecessary overhead. If you want to know the result of the computation, you have to perform it, unless of course the result is trivial. The cheapest possible action is:
rdd.foreach { case _ => }
but it won't address the problem you have here.
In general iterative computations in Spark can be structured as follows:
def func1(checkpointInterval: Int)(sc: SparkContext, nodes: RDD[List[Long]],
    didConverge: Boolean, changeCount: Int, iteration: Int): RDD[List[Long]] = {
  if (didConverge) nodes
  else {
    // Compute and cache new nodes
    val newNodes = performSomeOps(nodes).cache // no accumulator needed here

    // Periodically checkpoint to avoid stack overflow
    if (iteration % checkpointInterval == 0) newNodes.checkpoint

    /* Call a function which computes the values that determine control flow.
       This executes an action on newNodes. */
    val newChangeCount = computeChangeCount(newNodes)

    // Unpersist old nodes
    nodes.unpersist

    func1(checkpointInterval)(
      sc, newNodes, newChangeCount == changeCount,
      newChangeCount, iteration + 1
    )
  }
}
I see that these map/flatMap/reduceByKey transformations are updating an accumulator. Therefore the only way to perform all the updates is to execute all these transformations, and count is the easiest way to achieve that, with the lowest overhead compared to the alternatives (cache + count, first or collect).
Previous answers put me on the right track to solve a similar convergence detection problem.
foreach is presented in the docs as:
foreach(func) : Run a function func on each element of the dataset. This is usually done for side effects such as updating an Accumulator or interacting with external storage systems.
It seems like instead of using rdd.foreach() as a cheap action to trigger accumulator increments placed in various transformations, it should be used to do the incrementing itself.
I'm unable to produce a Scala example, but here's a basic Java version, in case it still helps:
// Convergence is reached when two iterations return the same number of results.
// accumulator is assumed to be a previously created LongAccumulator,
// e.g. spark.sparkContext().longAccumulator()
long previousCount = -1;
long currentCount = 0;

while (previousCount != currentCount) {
    rdd = doSomethingThatUpdatesRdd(rdd);

    // Count entries in the new rdd with foreach + accumulator
    rdd.foreach(tuple -> accumulator.add(1));

    // Update helper values
    previousCount = currentCount;
    currentCount = accumulator.sum();
    accumulator.reset();
}
// Convergence is reached

How to indicate a failure for a function with a void result

I have a function in Scala which has no return value (so Unit). This function can sometimes fail (if the user-provided parameters are not valid). If I were in Java, I would simply throw an exception. But in Scala (although the same thing is possible), it is suggested not to use exceptions.
I know perfectly well how to use Option or Try, but they only make sense if you have something valid to return.
For example, think of an (imaginary) addPrintJob(printJob: PrintJob): Unit command which adds a print job to a printer. The job definition could be invalid, and the user should be notified of this.
I see the following two alternatives:
Use exceptions anyway
Return something from the method (like a "print job identifier") and then return an Option/Either/Try of that type. But this means adding a return value just for the sake of error handling.
What are the best practices here?
You are too deep into FP :-)
You want to know whether the method is successful or not - return a Boolean!
According to Throwing exceptions in Scala, what is the "official rule", throwing exceptions in Scala is not advised because it breaks the control flow. In my opinion you should throw an exception in Scala only when something significant has gone wrong and normal flow should not continue.
For all other cases it is generally better to return the status/result of the operation that was performed. Scala's Option and Either serve this purpose. IMHO, a function which does not return any value is bad practice.
For the given example of addPrintJob I would return a job identifier (as suggested by @marstran in the comments); if this is not possible, return the status of addPrintJob.
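To make those alternatives concrete, here is a minimal sketch; PrintJob, JobId, and the validation rule are hypothetical placeholders, not part of the original question:

object PrintJobDemo extends App {
  final case class PrintJob(pages: Int)
  final case class JobId(value: Long)
  sealed trait PrintError
  case object InvalidJobDefinition extends PrintError

  // Alternative 1: keep the Unit result but make failure explicit.
  def addPrintJob(job: PrintJob): Either[PrintError, Unit] =
    if (job.pages > 0) Right(()) else Left(InvalidJobDefinition)

  // Alternative 2: return a job identifier, so success also carries useful data.
  def addPrintJobWithId(job: PrintJob): Either[PrintError, JobId] =
    if (job.pages > 0) Right(JobId(42L)) else Left(InvalidJobDefinition)

  // Callers pattern match instead of catching exceptions.
  addPrintJob(PrintJob(0)) match {
    case Right(_)    => println("job accepted")
    case Left(error) => println(s"job rejected: $error")
  }
}

Try[Unit] works the same way if the validation itself throws: wrap the call in Try(...) and pattern match on Success/Failure.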
The problem is that usually, when you have to model things for a specific method, it is not just about success or failure (true or false, or 0/1 exit-code style), but about returning status info plus a message. So the simplest technique I use (whenever code-review naysayers/besserwissers are not around) is this:
var msg = "unknown error has occurred during ..." // defined at the beginning of the method
var ret = 1                                       // 1 means "unknown error"

// ... action ...

ret = 0  // set once this method has FULLY succeeded in doing what it was supposed to do
msg = "" // you could say something like "ok", but usually end users are not interested in your ok messages, they want the stuff to work ...

At the end, always return a tuple:

return (ret, msg)

or, if you have data as well (say a Spark data frame):

return (ret, msg, Some(df))
Using return is more explicit, although not required (for the purists) ...
Now, because ret is just a plain Int, you can later turn it into more complex status codes, enums, objects or whatnot, but the point is that you should not introduce more complexity than is needed at the beginning; let the code grow organically ...
and of course the caller would call it like:

val (ret, msg, mayBeDf) = myFancyFunc(someParam, etc)
This way exceptions mean truly exceptional situations, and you avoid messy try/catch jungles ...
I know this answer WILL GET down-voted, because there are too many guys from universities with bright resumes writing brilliant algos that end up in the spaghetti code we are all sick of, rather than something as simple as possible (but not simpler) and, of course, something that WORKS.
BUT, if you only need ok/nok control flow and chaining, here is a somewhat more elaborate ok/nok example. It does throw an exception, which you would of course have to trap at an upper level, and it works with Spark:
/**
 * A not-so-fancy way of failing ASAP, on the first failing link in the control chain.
 * @return true if valid, false if not
 */
def isValid(): Boolean = {
  val lst = List(
    isValidForEmptyDF() _,
    isValidForFoo() _,
    isValidForBar() _
  )
  !lst.exists(!_()) // and fail asap ...
}
def isValidForEmptyDF()(): Boolean = {
  val specsAreMatched: Boolean = true
  try {
    if (df.rdd.isEmpty) {
      msg = "the file: " + uri + " is empty"
      !specsAreMatched
    } else {
      specsAreMatched
    }
  } catch {
    case jle: java.lang.UnsupportedOperationException =>
      msg = msg + jle.getMessage
      return false
    case e: Exception =>
      msg = msg + e.getMessage()
      return false
  }
}
Disclaimer: my colleague helped me with the fancy functions syntax ...