How to find which rules take a long time to finish - Drools

We have around 15,000 rules running and it takes 2 hours to complete. I would like to figure out which rules take a long time. It's not possible for me to go through each rule and log it, so I implemented AgendaEventListener and overrode the afterMatchFired() method. Now I know which rules fired, but how do I know which ones took a long time?

If you want the time taken to execute each match, update your listener to implement the beforeMatchFired method as well. When beforeMatchFired triggers, start the timer; when the same rule's afterMatchFired triggers, stop the timer and log it. You'll also want to track matchCancelled to discard unneeded timers.
Remember to track this on a per-rule basis -- you'll need to identify each rule uniquely (package + name should be specific enough). Note that this exercise gets a little more complicated if you have rules that extend other rules.
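Here is a minimal sketch of such a listener (the class name is mine, not from any library; it assumes a standard single-threaded session, where beforeMatchFired and afterMatchFired arrive strictly in pairs, so a single running timer is enough and cancelled matches never reach it):

import java.util.HashMap;
import java.util.Map;
import org.kie.api.definition.rule.Rule;
import org.kie.api.event.rule.AfterMatchFiredEvent;
import org.kie.api.event.rule.BeforeMatchFiredEvent;
import org.kie.api.event.rule.DefaultAgendaEventListener;

public class RuleTimingListener extends DefaultAgendaEventListener {

    private long start;                                           // set in beforeMatchFired
    private final Map<String, Long> totalNanos = new HashMap<>();

    private static String key(Rule rule) {
        return rule.getPackageName() + "." + rule.getName();      // unique rule id
    }

    @Override
    public void beforeMatchFired(BeforeMatchFiredEvent event) {
        start = System.nanoTime();
    }

    @Override
    public void afterMatchFired(AfterMatchFiredEvent event) {
        // accumulate the elapsed time against the rule that just fired
        totalNanos.merge(key(event.getMatch().getRule()),
                         System.nanoTime() - start, Long::sum);
    }

    public Map<String, Long> getTotalNanos() {
        return totalNanos;
    }
}

Register it with session.addEventListener(new RuleTimingListener()), and after fireAllRules() returns, dump getTotalNanos() sorted by value descending; the worst offenders will be at the top.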

Rather than doing your own profiling, you should take a look at the new drools-metric module, which solves exactly this problem.
From the manual:

Use the drools-metric module to identify the obstruction in your rules

You can use the drools-metric module to identify slow rules, especially when you process many rules. The drools-metric module can also assist in analyzing the Drools engine performance. Note that the drools-metric module is not for production environment use. However, you can perform the analysis in your test environment.

To analyze the Drools engine performance using drools-metric, add drools-metric to your project dependencies and enable trace logging for org.drools.metric.util.MetricLogUtils, as shown in the following example:
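The example referred to is a Maven dependency plus a Logback logger; the snippet below is a reconstruction of it (the groupId/artifactId are the real drools-metric coordinates; the version property is an assumption about your build):

<dependency>
  <groupId>org.drools</groupId>
  <artifactId>drools-metric</artifactId>
  <version>${drools.version}</version>
</dependency>

and in logback.xml:

<configuration>
  <logger name="org.drools.metric.util.MetricLogUtils" level="trace"/>
</configuration>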

Try RulesChronoAgendaEventListener and RulesChronoChartRecorder from da-utils.
Both give you real-time JMX monitoring, showing the average and maximum execution time for each rule. The second also gathers JFreeChart data to show you the values over time, which gives you a good sense of overall performance and of 'problematic' cases.
Register the session listener (declared here as the concrete type so that the getPerfStat() and chart getters below compile; this assumes the fluent withMaxThreshold()/schedule() calls return the recorder itself):
RulesChronoChartRecorder rulesChrono = new RulesChronoChartRecorder().withMaxThreshold(500).schedule();
session.addEventListener(rulesChrono);
Print per-rule statistics (assumes a static import of java.lang.String.format):
StringBuilder sb = new StringBuilder();
rulesChrono.getPerfStat().values()
    .forEach(s -> sb.append(format("%n%s - min: %.2f avg: %.2f max: %.2f activations: %d",
        s.getDomain(), s.getMinTimeMs(), s.getAvgTimeMs(), s.getMaxTimeMs(), s.getLeapsCount())));
System.out.println(sb.toString());
Draw the charts:
Map<String, TimeSeries> rulesMaxChart = rulesChrono.getRulesMaxChart();
Map<String, TimeSeries> rulesMinChart = rulesChrono.getRulesMinChart();
rulesChrono.getRulesAvgChart().entrySet()
    .forEach(e -> ChartUtils.pngChart(format("charts/%s.png", e.getKey()), 1024, 500,
        e.getValue(), Color.black,
        rulesMaxChart.get(e.getKey()), Color.lightGray,
        rulesMinChart.get(e.getKey()), Color.lightGray));
For the second one you'll need the JFreeChart dependency.
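Assuming Maven, that would be (the version shown is just a known-good one, not a requirement):

<dependency>
  <groupId>org.jfree</groupId>
  <artifactId>jfreechart</artifactId>
  <version>1.0.19</version>
</dependency>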

Related

Spark::KMeans calls takeSample() twice?

I have a lot of data and I have experimented with partition counts in the range [20k, 200k+].
I call it like this:
from pyspark.mllib.clustering import KMeans, KMeansModel
C0 = KMeans.train(first, 8192, initializationMode='random', maxIterations=10, seed=None)
C0 = KMeans.train(second, 8192, initializationMode='random', maxIterations=10, seed=None)
and I see that initRandom() calls takeSample() once.
Then the takeSample() implementation doesn't seem to call itself recursively or anything like that, so I would expect KMeans() to call takeSample() once. So why does the monitor show two takeSample()s per KMeans()?
Note: I execute more KMeans() runs and they all invoke two takeSample()s, regardless of whether the data is .cache()'d or not.
Moreover, the number of partitions doesn't affect how many times takeSample() is called; it's constant at 2.
I am using Spark 1.6.2 (and I cannot upgrade) and my application is in Python, if that matters!
I brought this to the Spark devs' mailing list, so I am updating:
Details of 1st takeSample():
Details of 2nd takeSample():
where one can see that the same code is executed.
As suggested by Shivaram Venkataraman on Spark's mailing list:

I think takeSample itself runs multiple jobs if the amount of samples collected in the first pass is not enough. The comment and code path at GitHub should explain when this happens. Also you can confirm this by checking if the logWarning shows up in your logs.
// If the first sample didn't turn out large enough, keep trying to take samples;
// this shouldn't happen often because we use a big multiplier for the initial size
var numIters = 0
while (samples.length < num) {
  logWarning(s"Needed to re-sample due to insufficient sample size. Repeat #$numIters")
  samples = this.sample(withReplacement, fraction, rand.nextInt()).collect()
  numIters += 1
}
However, as one can see, the second comment says this shouldn't happen often, yet it always happens for me, so if anyone has another idea, please let me know.
It was also suggested that this was a UI problem and that takeSample() was actually called only once, but that was just hot air.
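One way to check whether the re-sampling branch is being hit is to make sure the warning is actually visible. A small sketch (assumes the default log4j setup; the warning from RDD.takeSample is emitted on the driver):

from pyspark import SparkContext
from pyspark.mllib.clustering import KMeans

sc = SparkContext(appName='kmeans-takesample-check')
sc.setLogLevel('WARN')  # make sure WARN-level driver output is not suppressed
# 'first' is the RDD from the question above
model = KMeans.train(first, 8192, initializationMode='random', maxIterations=10)
# now grep the driver log for "Needed to re-sample due to insufficient sample size"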

How can I calculate business/SLA hours without iterating over each second?

Before I spend a lot of time writing the only solution I can think of, I was wondering if I'm going about it in an inefficient way.
Once a support ticket is closed, a script is triggered; the script is passed an array of 'status change' events that happened from ticket open to close. So you might have five changes: new, open, active, stalled, resolved. Each of these events has a timestamp associated with it.
What I need to do is calculate how much time the call was with us (new, open, active) and how much time it was with the customer (stalled). I also need to figure out how much of the 'us' time was within core hours 08:00 - 18:00 and how much was non-core; weekends and bank holidays count towards non-core hours.
My current idea is, for each status change, to iterate over every second that elapsed, check whether it was core or non-core, and log it.
Here's some pseudocode:
time_since_last = ticket->creation_date
foreach (events as event) {
    time_now = time_since_last
    while (time_now < event->event_date) {
        if (event->status == stalled) {
            customer_fault_stalled++
        } else {
            # work out whether it was our fault or not
            # and add to the appropriate counter
        }
        time_now++
    }
    time_since_last = event->event_date
}
Apologies if it's a little unclear; it's a fairly long-winded problem. Also, I'm aware this may be slightly outside the SO question guidelines, but I can't think of a better way of wording it, and I need some advice before I spend days writing it this way.
I think you have the right idea, but recalculating the status of every ticket for every second of elapsed time will take a lot of processing, and nothing will have changed for the vast majority of those one-second intervals.
The way event simulations work, and the way I think you should write your application, is to create a list of all events where the status might change. So you will want to include all of the status change events for every ticket, as well as the start and end of core time on all non-bank-holiday weekdays.
That list of events is sorted by timestamp, after which you can just process each event as if your per-second counter has reached that time. The difference is that you no longer have to count through the many intervening seconds where nothing changes, and you should end up with a much more efficient application. A sketch of this approach follows below.
I hope that's clear. You may find it easier to process each ticket separately, but the maximum gain will be achieved by processing all tickets simultaneously. You will still have a sorted sequence of events to process, but you will avoid having to reprocess the same core-time start and end events over and over again.
One more thing I noticed is that you can probably ignore any open status change events. I would guess that tickets either go from new to open and then active, or straight from new to resolved. So a switch between 'with your company' and 'with the customer' will never be made at an open event, and so they can be ignored. Please check this, as I am only speaking from my intuition and clearly know nothing about how your ticketing system has been designed.
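A minimal sketch of that approach (Java 16+, purely for illustration; all class, method, and field names here are invented for the example). The key point is that core-hours boundaries are themselves events, so between any two consecutive events both the status and the core/non-core classification are constant:

import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.Comparator;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SlaTally {

    enum Bucket { US_CORE, US_NON_CORE, CUSTOMER }

    /** A status change, or a core-hours boundary (newStatus == null). */
    record Event(Instant at, String newStatus) {}

    /** events must contain the ticket's status changes plus all core start/end boundaries. */
    static Map<Bucket, Duration> tally(List<Event> events, Set<LocalDate> bankHolidays) {
        Map<Bucket, Duration> buckets = new EnumMap<>(Bucket.class);
        if (events.isEmpty()) {
            return buckets;
        }
        events.sort(Comparator.comparing(Event::at));
        String status = "new";
        Instant prev = events.get(0).at();
        for (Event e : events) {
            // between prev and e.at() neither the status nor the core-ness can change
            Bucket b = "stalled".equals(status) ? Bucket.CUSTOMER
                     : isCore(prev, bankHolidays) ? Bucket.US_CORE
                     : Bucket.US_NON_CORE;
            buckets.merge(b, Duration.between(prev, e.at()), Duration::plus);
            if (e.newStatus() != null) {
                status = e.newStatus();
            }
            prev = e.at();
        }
        return buckets;
    }

    static boolean isCore(Instant t, Set<LocalDate> bankHolidays) {
        ZonedDateTime z = t.atZone(ZoneId.of("Europe/London"));
        boolean workday = z.getDayOfWeek().getValue() <= 5      // Mon-Fri
                       && !bankHolidays.contains(z.toLocalDate());
        return workday && z.getHour() >= 8 && z.getHour() < 18;
    }
}

The runtime is then proportional to the number of status changes plus the number of core boundaries in the ticket's lifetime, instead of to that lifetime in seconds.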
I would not iterate over the seconds. Depending on the cost of your calculations, that could be quite expensive. It would be better to calculate the boundaries between core and non-core time.
use strict;
use warnings;
use List::Util qw(min);

# $ticket is assumed to come from your ticketing system;
# is_core_hours(), end_of_core_hours() and start_of_core_hours() are
# assumed helpers that work on epoch timestamps.
my $customer_time    = 0;
my $our_time_outside = 0;
my $our_time_core    = 0;

foreach my $event ($ticket->events) {
    my $current_ts = $event->start_ts;
    while ($current_ts < $event->end_ts) {
        if ($event->status eq 'stalled') {
            $customer_time += $event->end_ts - $current_ts;
            $current_ts = $event->end_ts;
        }
        elsif (is_core_hours($current_ts)) {
            my $next_ts = min(end_of_core_hours($current_ts), $event->end_ts);
            $our_time_core += $next_ts - $current_ts;
            $current_ts = $next_ts;
        }
        else {
            my $next_ts = min(start_of_core_hours($current_ts), $event->end_ts);
            $our_time_outside += $next_ts - $current_ts;
            $current_ts = $next_ts;
        }
    }
}
I can't see why you'd want to iterate over every second. That seems very wasteful.
Get a list of all of the events for a given ticket.
Add to the list any boundaries between core and non-core times.
Sort this list into chronological order.
For each consecutive pair of events in the list, subtract the earlier from the later to get a duration.
Add that duration to the appropriate bucket.
And the usual caveats for dealing with dates and times apply here:
Use a library (I recommend DateTime together with DateTime::Duration)
Convert all of your timestamps to UTC as soon as you get them. Only convert back to local time just before displaying them to the user.

Moving from a file-based tracing session to a real-time session

I need to log trace events during boot, so I configure an AutoLogger with all the required providers. But when my service/process starts, I want to switch to real-time mode so that the file doesn't explode in size.
I'm using TraceEvent and I can't figure out how to do this move correctly and atomically.
The first thing I tried:
const int timeToWait = 5000;
using (var tes = new TraceEventSession("TEMPSESSIONNAME", @"c:\temp\TEMPSESSIONNAME.etl") { StopOnDispose = false })
{
    tes.EnableProvider(ProviderExtensions.ProviderName<MicrosoftWindowsKernelProcess>());
    Thread.Sleep(timeToWait);
}
using (var tes = new TraceEventSession("TEMPSESSIONNAME", TraceEventSessionOptions.Attach))
{
    Thread.Sleep(timeToWait);
    tes.SetFileName(null);
    Thread.Sleep(timeToWait);
    Console.WriteLine("Done");
}
Here I wanted to verify that I can transfer the session to real-time mode. But instead, the file I got contained events from a 15 s period instead of just 10 s.
The same happens if I use new TraceEventSession("TEMPSESSIONNAME", @"c:\temp\TEMPSESSIONNAME.etl", TraceEventSessionOptions.Create) instead.
It seems that the following will cause the file to stop being written to:
using (var tes = new TraceEventSession("TEMPSESSIONNAME"))
{
    tes.EnableProvider(ProviderExtensions.ProviderName<MicrosoftWindowsKernelProcess>());
    Thread.Sleep(timeToWait);
}
But here I must re-enable all the providers, and according to the documentation, "if the session already existed it is closed and reopened (thus orphans are cleaned up on next use)". I don't understand the last part about orphans. Obviously some events might occur in the time between closing, opening, and subscribing to the events. Does this mean I will lose those events, or will I get them later?
I also found the following in the documentation of the library:
In real time mode, events are buffered and there is at least a second or so delay (typically 3 sec) between the firing of the event and the reception by the session (to allow events to be delivered in efficient clumps of many events)
Does this make the above code all right (well, unless the improbable happens and for some reason my thread is delayed for more than a second between creating the real-time session and starting to process the events)?
I could close the session and create a new, different one, but then I think I'd miss some events. Or I could open a new session and then close the file-based one, but then I might get duplicate events.
I couldn't find online any examples of moving from a file-based trace to a real-time trace.
I managed to contact the author of TraceEvent and this is the answer I got:
Regarding the 'auto-closing and restarting' feature, these are really questions about the OS (TraceEvent simply calls the underlying OS API). Just FYI, the deal with orphans is that it is EASY for your process to exit but leave a session going. This MAY be what you want, but often it is not, and so to make the common case 'just work' if you do Create (which is the default), it will close a session if it already existed (since you asked for a new one).
Experimentation of course is the touchstone of 'truth', but frankly, expecting unusual combinations to just work is generally NOT true.
My recommendation is to keep it simple. You need to open a new session and close the original one. Yes, you will end up with duplicates, but you CAN filter them out (after all, they have IDENTICAL timestamps).
The other possibility is to use SetFileName in its intended way (from one file to another). This certainly solves your problem of file-size growth, and is often a good way to deal with other scenarios (after all, you can start up your processing and start deleting files even as new files are being generated).
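For completeness, a sketch of that second suggestion, rolling the file with SetFileName (the session and file names are the ones from the question; the "_2" suffix is just an example):

using (var tes = new TraceEventSession("TEMPSESSIONNAME", TraceEventSessionOptions.Attach))
{
    // Switch the session to a fresh file; the previous file can now be
    // processed and deleted while tracing continues uninterrupted.
    tes.SetFileName(@"c:\temp\TEMPSESSIONNAME_2.etl");
}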

webMethods: rotate log now by calling a special method

Is it possible to rotate a logfile immediately in webMethods, by calling a special method or similar? I do not want to use third-party software.
Further explanation: I need this rotation for both the default logfile(s) (e.g. server.log) and custom logfiles.
By default, webMethods logs (for all components, such as IS, MWS, Optimize, etc.) rotate every 24 hours at midnight. You can change that interval by modifying an extended property.
For IntegrationServer 9.6 and lower it is watt.server.logRotateInterval (in milliseconds).
Please note: the watt.server.logRotateInterval parameter was removed from Integration Server after 8.2 SP2. When it was reintroduced in the following fixes, the scope of the parameter changed so that it affected only the stats.log:
IS_9.0_SP1_Core_Fix6
IS_9.5_SP1_Core_Fix3
IS_9.6_Core_Fix2
Starting with Integration Server 9.7, this server configuration parameter has been renamed to watt.server.statsLogRotateInterval (now in minutes instead of milliseconds), but it likewise affects only the stats.log file.
So I think there is no way to change the log rotation interval (let alone trigger a rotation on demand). For compressing the old log files, I think the best solution would be to write a service that does that and use it to create a scheduled task (executed daily, just after midnight).
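A minimal sketch of the compression step in plain Java (to be wrapped in an IS Java service and run from the scheduler; the directory and file-name pattern are assumptions about your installation):

import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

public class LogCompressor {

    /** Gzip every rotated log (e.g. server.log.yyyyMMdd) in the given directory. */
    public static void compressOldLogs(Path logDir) throws IOException {
        try (DirectoryStream<Path> logs = Files.newDirectoryStream(logDir, "server.log.*")) {
            for (Path log : logs) {
                if (log.toString().endsWith(".gz")) {
                    continue;                       // already compressed on an earlier run
                }
                Path gz = log.resolveSibling(log.getFileName() + ".gz");
                try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(gz))) {
                    Files.copy(log, out);           // stream the log into the gzip file
                }
                Files.delete(log);                  // remove the uncompressed original
            }
        }
    }
}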

How can I make a 'trial version' of a MATLAB GUI?

My aim is to make a GUI and then use deploytool to build an exe file from it.
Since I don't want the user to be able to use it forever, I want to make it a trial version, meaning that it will work only for a certain time.
I thought maybe of somehow reading the user's computer clock and date and coding a time limit around that, but I found some problems with this logic.
Any ideas how it can be done?
Using the computer's clock seems a reasonable way to go. Sure, the user can thwart that by changing the clock, but this will most likely create enough inconvenience that they'd rather pay the reasonable price of the software.
Simply put the following inside the OpeningFcn of your GUI:
expiryDate = '2012-12-31';
if now > datenum(expiryDate)
    h = errordlg('please upgrade to a full license');
    uiwait(h)
    return % or throw an error
end