I'm migrating legacy Drools files from 4.0.3 to 7.x.
I have a performance issue with the new version. Of course, there are a bunch of changes between these versions, so I decided to make a dummy test and wrote JMH benchmarks for each version.
Here is the complete code of both projects: GitHub code
I made two classes:
public class Subscriber {
    private int chargingProfileId;
    private List<Account> accounts;
    // getters/setters omitted
}

public class Account {
    private int accountKindId;
    private long balance;
    // decrementBalance() and getters/setters omitted
}
and generated DRL rules with random criteria like this:
rule "general.test.rule.cpX.acc.Y"
activation-group "main"
salience Z
when
Subscriber(chargingProfileId == X)
$acc:Account(accountKindId == Y, balance>0)
then
$acc.decrementBalance();
end
All rules have a salience in their definition (sadly, we used this feature a lot in the legacy app). We only need StatelessSessions, so only one rule has to be activated per execute (hence activation-group "main").
I pregenerated dummy random data and saved it to a JSON file (subscribers.json), so the data is exactly the same for both versions' tests. The rules are also the same (there are 4K rules).
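For context, loading that file is presumably a one-time setup step in the benchmark; a minimal sketch with Jackson (the mapper usage is my assumption, not code from the linked project):

ObjectMapper mapper = new ObjectMapper();
List<Subscriber> subscribers = mapper.readValue(
        new File("subscribers.json"),
        new TypeReference<List<Subscriber>>() {});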
In Drools 4.x we are using Java 1.6. We are using a RuleBaseConfiguration with this configuration:
conf.setSequential(false);
conf.setShadowProxy(false);
If ShadowProxy is enabled, throughput decreases dramatically:
Benchmark (shadowProxy) Mode Cnt Score Error Units
DroolsBenchmark.send false thrpt 30 138888.339 ± 6603.057 ops/s
DroolsBenchmark.send true thrpt 30 1704.062 ± 178.104 ops/s
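For reference, the 4.0.3 side of the benchmark builds a RuleBase and fires a StatelessSession per invocation; a rough sketch of that wiring (drlReader and facts are placeholders of mine, not the exact code from the linked repo):

RuleBaseConfiguration conf = new RuleBaseConfiguration();
conf.setSequential(false);
conf.setShadowProxy(false);

PackageBuilder builder = new PackageBuilder();             // org.drools.compiler.PackageBuilder
builder.addPackageFromDrl(drlReader);                      // reader over the generated .drl
RuleBase ruleBase = RuleBaseFactory.newRuleBase(conf);
ruleBase.addPackage(builder.getPackage());

StatelessSession session = ruleBase.newStatelessSession();
session.execute(facts);                                    // one Subscriber plus its Accounts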
I know that newer versions of Drools no longer use the ShadowProxy parameter, so there is no way to configure this in 7.x. The 7.x configuration is:
RuleBaseConfiguration conf = (RuleBaseConfiguration) KnowledgeBaseFactory.newKnowledgeBaseConfiguration();
conf.setSequential(false);
and the JMH output is
DroolsBenchmark.send thrpt 30 67881.788 ± 941.384 ops/s
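For comparison, a minimal sketch of the 7.x wiring (using KieHelper from org.kie.internal.utils for brevity; drlSource and facts are placeholders, and the real project may assemble the KieBase differently):

KieBase kieBase = new KieHelper()
        .addContent(drlSource, ResourceType.DRL)   // the generated rules as a String
        .build();

StatelessKieSession session = kieBase.newStatelessKieSession();
session.execute(facts);                            // one Subscriber plus its Accounts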
In the real project, setting conf.setSessionPoolSize(X) gave better results, but sadly it did not help in these tests.
But when I set the session pool size (in the real tests), it seems that FireAllRulesCommand's AgendaFilter fires several times per execution, even though there is activation-group "main":
batch.add(new FireAllRulesCommand(match -> {
    RuleImpl rule = (RuleImpl) match.getRule();
    String activationGroup = rule.getActivationGroup();
    if (activationGroup != null && activationGroup.equals("main")) {
        RULE_FOUND_COUNT.incrementAndGet();
    }
    return true;
}));
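For context, that FireAllRulesCommand is the tail of a batch; the rest of the wiring around it is roughly this sketch (the insert commands, the getAccounts() getter and the session variable are my assumptions, not the project's exact code):

// before the batch.add(new FireAllRulesCommand(...)) shown above:
List<Command<?>> batch = new ArrayList<>();
batch.add(CommandFactory.newInsert(subscriber));                        // org.kie.internal.command.CommandFactory
batch.add(CommandFactory.newInsertElements(subscriber.getAccounts()));

// ... FireAllRulesCommand added here ...

// one execute per benchmark invocation; with activation-group "main"
// the filter should see at most one "main" rule fire per call
statelessKieSession.execute(CommandFactory.newBatchExecution(batch));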
Sadly, I couldn't reproduce this in a test case; I think it's because the real project's rules are more complicated, not as dummy as in this test.
I also tried making the KieBase (RuleBase in 4.x) a ThreadLocal, but this didn't help; I saw even more degradation...
In the real project I get ~15K ops/s with Drools 4.x and only ~2.5K ops/s with 7.x.
I tried several versions of Drools 7.x (7.52.0.Final, 7.59.0.Final, 7.52.0.Final-redhat-00008, ...), but got nearly the same results.
Of course, these are different versions, with unsupported Java and Drools releases, and the JMH versions differ too (1.6.3 for Java 1.6 and 1.32 for Java 8). I know Drools has improved greatly over this period, but the results show my problem.
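For reference, the benchmark method itself is just one stateless execute per call; it has roughly this shape (the state handling and field names are my assumptions, not the exact code from the repo):

@State(Scope.Benchmark)
public class DroolsBenchmark {

    private StatelessKieSession session;   // built once from the generated DRL
    private List<Object> facts;            // one Subscriber plus its Accounts

    @Setup
    public void setup() { /* build the KieBase, load subscribers.json */ }

    @Benchmark
    public void send() {
        session.execute(facts);
    }
}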
I was testing JMH with
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
I changed it to the following to see latencies:
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
but the results are the same. In my example, Drools 4.0.3 has nearly 2x the performance compared to 7.52.0.Final.
There is a "jmsoutput" file in each project where you can see the results.
drools 4.0.3 final results:
Result: 75.845 ±(99.9%) 2.668 us/op [Average]
Statistics: (min, avg, max) = (68.360, 75.845, 87.189), stdev = 3.994
Confidence interval (99.9%): [73.176, 78.513]
# Run complete. Total time: 00:05:51
Benchmark Mode Cnt Score Error Units
DroolsBenchmark.send avgt 30 75.845 ± 2.668 us/op
drools 7.52.0.Final results:
Result "com.drools.perf.test.DroolsBenchmark.send":
147.362 ±(99.9%) 2.510 us/op [Average]
(min, avg, max) = (140.884, 147.362, 157.948), stdev = 3.757
CI (99.9%): [144.852, 149.872] (assumes normal distribution)
# Run complete. Total time: 00:05:52
Benchmark Mode Cnt Score Error Units
DroolsBenchmark.send avgt 30 147.362 ± 2.510 us/op
What is the problem? I think it's some kind of configuration issue...
Related
I work in industrial automation, and the functions of the automation processors and software are locked down. I'm trying to sample and collect an analog signal at as fast a rate as I can, <=10ms.
I have tried VBA in Excel, using a DDERequest and incrementing a delayed loop.
Application.Wait is too slow (1s).
"Private Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)" had the most promise, but it is still too slow (100ms). It can be pushed faster, but this is on my computer, and then grabbing the float from the automation processor over Ethernet... 100ms is the fastest without distorting the "real-time sample."
I have tried a Python module that pulls the float from the IP traffic (still over Ethernet and too slow):
import time
import matplotlib.pyplot as plt

# parameters
sample = .001          # target sample period, in seconds
iterations = 1000

xpoints, ypoints1, ypoints2 = [], [], []

# collection
for i in range(iterations):
    # read the GPIO (SomeGPIOCommand and the pin numbers are placeholders)
    float1 = SomeGPIOCommand(pin1)
    float2 = SomeGPIOCommand(pin2)
    # add the result to our ypoints lists
    ypoints1.append(float(float1.Value))
    ypoints2.append(float(float2.Value))
    # x (timestamp in seconds)
    t = i * sample
    xpoints.append(float(t))
    # pause
    time.sleep(sample)

# plot
plt.plot(xpoints, ypoints1, 'c-', label='target')
plt.plot(xpoints, ypoints2, 'r--', label='actual')
OR is a sample rate this fast going to require code written in an IDE? The key here is matching the timestamp, in ms, exactly with the measured value.
I'd like to get there without an IDE; I just have no clue where to start, especially with the Pi.
I have yet to see any example with this performance level.
Appreciate any help!
OR is a sample rate this fast going to require code written in an IDE?
No. A fast sample rate doesn't require coding in an IDE. Whether or not the code is developed in an IDE has no bearing on the sample rate.
We have around 15,000 rules running and it takes 2 hours to complete. I would like to figure out which rules take a long time. It's not possible for me to go to each rule and log it, so I implemented an AgendaEventListener and overrode the afterMatchFired() method. Now I know which rules fired, but how do I know which rule took a long time?
If you want the time it takes to execute the match, update your listener to implement the beforeMatchFired method as well. When beforeMatchFired triggers, start the timer; when the same rule's afterMatchFired triggers, stop the timer and log it. You'll also want to track matchCancelled to discard unnecessary timers.
Remember to track this on a per-rule basis -- you'll need to identify each rule uniquely (package + name should be specific enough). Note that this exercise will get a little complicated if you have rules that extend other rules.
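A minimal sketch of such a listener, using the standard org.kie.api agenda events (the class name and the bookkeeping maps are mine, not part of any library):

public class RuleTimingListener extends DefaultAgendaEventListener {

    // start time per in-flight match, accumulated totals per "package.rule"
    private final Map<Match, Long> started = new ConcurrentHashMap<>();
    private final Map<String, AtomicLong> totalNanos = new ConcurrentHashMap<>();

    @Override
    public void beforeMatchFired(BeforeMatchFiredEvent event) {
        started.put(event.getMatch(), System.nanoTime());
    }

    @Override
    public void afterMatchFired(AfterMatchFiredEvent event) {
        Long start = started.remove(event.getMatch());
        if (start != null) {
            Rule rule = event.getMatch().getRule();
            String key = rule.getPackageName() + "." + rule.getName();
            totalNanos.computeIfAbsent(key, k -> new AtomicLong())
                      .addAndGet(System.nanoTime() - start);
        }
    }

    @Override
    public void matchCancelled(MatchCancelledEvent event) {
        started.remove(event.getMatch());  // discard timers for cancelled matches
    }

    public Map<String, AtomicLong> getTotalNanos() {
        return totalNanos;
    }
}

Register it with session.addEventListener(new RuleTimingListener()) before firing the rules and dump the totals afterwards.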
Rather than doing your own profiling, you should take a look at the new drools-metric module, which solves exactly this problem.
From the manual
Use the drools-metric module to identify the obstruction in your rules
You can use the drools-metric module to identify slow rules, especially when you process many rules. The drools-metric module can also assist in analyzing the Drools engine's performance. Note that the drools-metric module is not for production environment use; however, you can perform the analysis in your test environment.
To analyze the Drools engine performance using drools-metric, add drools-metric to your project dependencies and enable trace logging for org.drools.metric.util.MetricLogUtils.
Try RulesChronoAgendaEventListener and RulesChronoChartRecorder from da-utils
Both give you real-time JMX monitoring, showing average and max execution time for each rule. The second one also gathers JFreeChart data to show you values over time, which gives you a good sense of overall performance and of the 'problematic' cases.
Register the session listener:
AgendaEventListener rulesChrono = new RulesChronoChartRecorder().withMaxThreshold(500).schedule();
session.addEventListener(rulesChrono);
Print per-rule statistics:
StringBuilder sb = new StringBuilder();
rulesChrono.getPerfStat().values()
    .forEach(s -> sb.append(format("%n%s - min: %.2f avg: %.2f max: %.2f activations: %d",
            s.getDomain(), s.getMinTimeMs(), s.getAvgTimeMs(), s.getMaxTimeMs(), s.getLeapsCount())));
System.out.println(sb.toString());
Draw charts
Map<String, TimeSeries> rulesMaxChart = rulesChrono.getRulesMaxChart();
Map<String, TimeSeries> rulesMinChart = rulesChrono.getRulesMinChart();
rulesChrono.getRulesAvgChart().entrySet()
    .forEach(e -> ChartUtils.pngChart(format("charts/%s.png", e.getKey()), 1024, 500,
            e.getValue(), Color.black,
            rulesMaxChart.get(e.getKey()), Color.lightGray,
            rulesMinChart.get(e.getKey()), Color.lightGray));
For the second one you'll need the JFreeChart dependency.
I have an open source project with more than 40 files.
When I build the project with the Debug configuration, the compile time is 2m22s.
I also used BuildTimeAnalyzer; the longest individual time is 28ms.
But when I build the project with the Release configuration, it gets stuck in "Compile Swift source files" for more than one hour.
I have no idea what is going on; please help me.
In the DEBUG build, if you add up all the time spent on each function, you get about 7s. The numbers don't quite add up: you spent 142s to build the whole thing, but these functions take only about 7s to compile?
That's because this timing only accounts for type-checking each function body. The Swift frontend has three flags you could use:
-Xfrontend -debug-time-compilation
-Xfrontend -debug-time-function-bodies
-Xfrontend -debug-time-expression-type-checking
Let's use the first to see the whole picture. Pick one slow file, say Option.swift, and look:
===-------------------------------------------------------------------------===
Swift compilation
===-------------------------------------------------------------------------===
Total Execution Time: 30.5169 seconds (43.6413 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
23.5183 ( 80.1%) 0.7773 ( 67.6%) 24.2957 ( 79.6%) 34.4762 ( 79.0%) LLVM output
3.7312 ( 12.7%) 0.0437 ( 3.8%) 3.7749 ( 12.4%) 5.4192 ( 12.4%) LLVM optimization
1.8563 ( 6.3%) 0.2830 ( 24.6%) 2.1393 ( 7.0%) 3.1800 ( 7.3%) IRGen
0.2026 ( 0.7%) 0.0376 ( 3.3%) 0.2402 ( 0.8%) 0.4666 ( 1.1%) Type checking / Semantic analysis
... <snip> ...
29.3665 (100.0%) 1.1504 (100.0%) 30.5169 (100.0%) 43.6413 (100.0%) Total
It turns out it's not the Swift frontend that is slow, but LLVM! So there is no point looking at the type-checking time. We can further check why LLVM is slow using -Xllvm -time-passes, but it won't give us much useful information; it just says that the X86 Assembly / Object Emitter is taking most of the time.
Let's take a step back and check which files take most time to compile:
Option.swift 30.5169
Toolbox.swift 15.6143
PictorialBarSerie.swift 12.2670
LineSerie.swift 8.9690
ScatterSerie.swift 8.5959
FunnelSerie.swift 8.3299
GaugeSerie.swift 8.2945
...
Half a minute is spent on Option.swift alone. What's wrong with this file?
You have a huge struct, with 31 members. Compiling that struct alone takes 11 seconds.
You have a huge enum, with 80 variants. Compiling this enum alone takes 7 seconds.
The first problem is easy to fix: use a final class instead! The second problem doesn't have a simple fix (I don't see any time improvement with alternatives, e.g. replacing the enum with a class hierarchy). All the other slow files have similar problems: large structs, large enums.
Simply replacing all structs with final classes is enough to bring the compilation time from "over an hour and still compiling" down to 2.5 minutes.
See also Why Choose Struct Over Class?. Your "struct"s may not qualify as structs.
Note that changing from struct to class does change the semantics of users' code, since classes have reference semantics.
Try this:
Under Build Settings -> Swift Compiler - Code Generation, for your Release configuration choose SWIFT_OPTIMIZATION_LEVEL = -Owholemodule. Then under Other Swift Flags enter -Onone. Doing this carved big chunks of time off my project.
I'm a performance QC engineer; so far I've used Visual Studio Ultimate to run load tests, but now I'm switching to Gatling. So I'm a newbie with Gatling and Scala.
I'm defining a simulation with a step-load scenario:
Initial: 5 users
Maximum user count: 100 users
Step duration: 10 seconds
Step user count: 5 users
Duration: 10 minutes
Meaning: start with 5 users, after 10 seconds add 5 more users, repeat until the maximum of 100 users is reached, and run the test for 10 minutes.
I tried this code and some other injection profiles, but the result is not as expected:
splitUsers(100)
into(rampUsers(5)
over(10 seconds))
separatedBy(10 minutes)
Could you please help me simulate the step load in Gatling?
Define the user injection part in setUp something like this:
setUp(
scn.inject(
atOnceUsers(5), //Initial: 5 user
nothingFor(10 seconds), //A pause to uniform the step load
splitUsers(100) into atOnceUsers(5) separatedBy(10 seconds) //max user,split time,number of user
).protocols(httpConf))
You can define the duration just by using the during function on the scenario. Hope it helps.
Can you be more specific about the result not being as expected?
According to the documentation, your situation should be:
splitUsers(100) into(rampUsers(5) over(10 seconds)) separatedBy atOnceUsers(5)
If the test duration is the target, then have a look at Throttling in the Gatling documentation.
I need to create a 100fps animation that displays 3D data from a file that contains 100 frames per second. But the AnimationTimer in JavaFX only gives me 60fps. How can I get around this?
Removing the JavaFX Frame Rate Cap
You can remove the 60fps JavaFX frame rate cap by setting a system property, e.g.,
java -Djavafx.animation.fullspeed=true MyApp
This is an undocumented and unsupported setting.
Removing the JavaFX frame rate cap may make your application considerably less efficient in terms of resource usage (e.g. a JavaFX application without a frame rate cap will consume more CPU than an application with the frame rate cap in place).
Configuring the JavaFX Frame Rate Cap
Additionally, there is another undocumented system property you could try:
javafx.animation.framerate
I have not tried it.
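If it behaves like the fullspeed flag, it would presumably be passed on the command line in the same way (untested, the property is undocumented, and the value here is only an example of the syntax):
java -Djavafx.animation.framerate=100 MyApp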
Debugging JavaFX Frames (Pulses)
There are other settings like -Djavafx.pulseLogger=true which you could enable to help you debug the JavaFX architecture and validate that your application is actually running at the framerate you expect.
JavaFX 8 has a Pulse Logger (-Djavafx.pulseLogger=true system property) that "prints out a lot of crap" (in a good way) about the JavaFX engine's execution. There is a lot of provided information on a per-pulse basis including pulse number (auto-incremented integer), pulse duration, and time since last pulse. The information also includes thread details and events details. This data allows a developer to see what is taking most of the time.
Warning
Normal warnings for using undocumented features apply, as Richard Bair from the JavaFX team notes:
Just a word of caution, if we haven't documented the command line switches, they're fair game for removal / modification in subsequent releases :-)
Fullspeed=true will give you a high framerate without control (and thereby decrease the performance of your app, since it renders too often), and javafx.animation.framerate indeed doesn't work.
Use:
-Djavafx.animation.pulse=value
You can check your framerate with the following code. I verified that it actually works by setting the pulse rate to 2, 60 and 120 (I have a 240Hz monitor), and you can see a difference in how fast the random number changes.
private final long[] frameTimes = new long[100];
private int frameTimeIndex = 0 ;
private boolean arrayFilled = false ;
Label label = new Label();
root.getChildren().add(label);
AnimationTimer frameRateMeter = new AnimationTimer() {
    @Override
    public void handle(long now) {
        long oldFrameTime = frameTimes[frameTimeIndex];
        frameTimes[frameTimeIndex] = now;
        frameTimeIndex = (frameTimeIndex + 1) % frameTimes.length;
        if (frameTimeIndex == 0) {
            arrayFilled = true;
        }
        if (arrayFilled) {
            long elapsedNanos = now - oldFrameTime;
            long elapsedNanosPerFrame = elapsedNanos / frameTimes.length;
            double frameRate = 1_000_000_000.0 / elapsedNanosPerFrame;
            label.setText(String.format("Current frame rate: %.3f", frameRate)
                    + ", Random number: " + Math.random());
        }
    }
};
frameRateMeter.start();