Drools. Slow performance and needs restart. KieServicesClient should call method close or completeConversation? - drools

We use Drools for our business rules, but as time goes by Drools becomes slower and slower and we have to restart it. When it becomes slow we collect a dump, but we did not see memory increase dramatically. We use the Java REST client to communicate with Drools; specifically, we use the code below:
KieServicesClient kieServicesClient = KieServicesFactory.newKieServicesClient(conf);
RuleServicesClient ruleClient = kieServicesClient.getServicesClient(RuleServicesClient.class);
After communicating with Drools, should we call kieServicesClient.close() or kieServicesClient.completeConversation()? If we call neither, what will happen? Can Drools become slower?
Could you please give some hints on how we can investigate why Drools performance is deteriorating?
Rules are deployed in Business Central Workbench, v7.47.0.Final.

Related

Safe to use Single KieSession instance in multithreading environment?

I am creating a rule engine with the help of Drools. What I have done till now is that on server boot I create a singleton instance of KieSession from the DRL file. I want to use this same instance for multiple rule evaluations in a multithreaded environment.
So what I am doing is: after my fact is ready, I insert the fact into the KieSession, fire the rules, and after that delete the fact from the KieSession.
ExamplePojo examplePojo = new ExamplePojo();
FactHandle insertedFact = kieSession.insert(examplePojo);
kieSession.fireAllRules();
kieSession.delete(insertedFact);
As per the KieSession.delete JavaDoc -
Retracts the fact for which the given FactHandle was assigned regardless if it has been explicitly or logically inserted.
Things I know -
In a multithreaded scenario, kieSession.fireAllRules() can be called multiple times even while a fact from another thread is still in working memory
I have tested that calling kieSession.fireAllRules() multiple times does not execute rules again on already inserted facts if there are no explicit changes to the rules between the two fires
kieSession.delete(insertedFact) is compulsory from a memory-leak point of view
At any point in time there can be facts from multiple threads in working memory at the same time
Questions I have -
Can this scenario cause malfunctioning in rule execution through some execution interference?
Can this cause any sort of memory leak the way I am using it?
Experts, your advice is welcome. I am new to Drools and not aware of such use cases.
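One common remedy for the interference the question worries about is to funnel every insert/fire/delete cycle through a single worker thread, so calls from different request threads can never interleave inside the shared session. A minimal stdlib sketch of that pattern follows; `RuleEngine` and `SessionGate` are illustrative placeholders, not the Drools API.

```java
import java.util.concurrent.*;

// Hedged sketch: serialize all access to a shared, non-thread-safe engine
// (standing in for a singleton KieSession) through one worker thread.
public class SessionGate {
    // Placeholder for the real insert/fire/delete cycle against a KieSession.
    interface RuleEngine { int evaluate(int fact); }

    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final RuleEngine engine;

    public SessionGate(RuleEngine engine) { this.engine = engine; }

    // Each caller submits its whole cycle as one task; the single worker
    // thread runs submitted tasks strictly one after another.
    public Future<Integer> evaluate(int fact) {
        return worker.submit(() -> engine.evaluate(fact));
    }

    public void shutdown() { worker.shutdown(); }
}
```

Callers block on the returned Future only when they need the result, so throughput is limited to one evaluation at a time, which is exactly the guarantee a single shared session needs.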

"PSQLException: FATAL: sorry, too many clients already" error in integration tests with jOOQ & Spring Boot

There are already similar questions about this error and suggested solutions; e.g. increasing max_connections in postgresql.conf and / or adapting the max number of connections your app requests. However, my question is more specific to using jOOQ in a Spring Boot application.
I integrated jOOQ into my application as in the example on GitHub. Namely, I am using DataSourceConnectionProvider with TransactionAwareDataSourceProxy to handle database connections, and I inject the DSLContext in the classes that need it.
My application provides various web services to front-ends, and I've never encountered that PSQLException on dev or test environments so far. I only started getting the error when running all integration tests (around 1000) locally. I don't suspect a leak in connection handling, as Spring and jOOQ manage the resources; nevertheless, the error got me worried that it could also happen in production.
Long story short, is there a better alternative to using DataSourceConnectionProvider to manage connections? Note that I already tried using DefaultConnectionProvider as well, and tried to make spring.datasource.max-active less than max_connections allowed by Postgres. Neither fixed my problem so far.
Since your question seems not to be about the generally best way to work with PostgreSQL connections / data sources, I'll answer the part about jOOQ and using its DataSourceConnectionProvider:
Using DataSourceConnectionProvider
There is no better alternative in general. In order to understand DataSourceConnectionProvider (the implementation), you have to understand ConnectionProvider (its specification). It is an SPI that jOOQ uses for two things:
to acquire() a connection prior to running a statement or a transaction
to release() a connection after running a statement (and possibly, fetching results) or a transaction
The DataSourceConnectionProvider does so by acquiring a connection from your DataSource through DataSource.getConnection() and by releasing it through Connection.close(). This is the most common way to interact with data sources, in order to let the DataSource implementation handle transaction and/or pooling semantics.
Whether this is a good idea in your case may depend on individual configurations that you have made. It generally is a good idea because you usually don't want to manually manage connection lifecycles.
Using DefaultConnectionProvider
This can certainly be done instead, in which case jOOQ does not close() your connection for you; you'll do that yourself. I expect this to have no effect in your particular case, as you'd just be implementing the DataSourceConnectionProvider semantics manually, e.g.
try (Connection c = ds.getConnection()) {
    // Implicitly using a DefaultConnectionProvider
    DSL.using(c).select(...).fetch();
    // Implicit call to c.close() when the try block exits
}
In other words: this is likely not a problem related to jOOQ, but to your data source.

Managed Executor Service in TomEE

I have a server with 48 CPUs hosting a Java EE 7 REST API on TomEE+ 7.0.2.
Some of the APIs need to make use of as many CPUs as possible, as they run parallelized algorithms. The parallelized portions do not require any database or other resources, just some heavy lifting in a shared double[][] matrix.
I am usually working in EJB contexts, but for this particular instance it is not a requirement (and also preferred not to be).
So far I was using
ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
in order to instantiate an executor, but as this seems to spawn actual threads at the operating-system level I am not a big fan of it; after some JMeter load testing it even got to a point where the whole machine was blocked and I could not even SSH into the server any more until a hard reboot.
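For reference, even an unmanaged fixed pool stays well-behaved if it is bounded and explicitly shut down, which is often what's missing when a box becomes unresponsive under load. Here is a hedged stdlib sketch; the row-sum work over a double[][] is an illustrative stand-in for the actual algorithm, and the thread count would normally stay below the machine's core count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

// Illustrative sketch: split work on a shared double[][] across a bounded
// unmanaged pool, then shut the pool down so no OS threads linger.
public class MatrixWork {
    public static double parallelSum(double[][] m, int maxThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<Callable<Double>> tasks = new ArrayList<>();
            for (double[] row : m) {
                tasks.add(() -> {            // one task per matrix row
                    double s = 0;
                    for (double v : row) s += v;
                    return s;
                });
            }
            double total = 0;
            for (Future<Double> f : pool.invokeAll(tasks)) total += f.get();
            return total;
        } finally {
            pool.shutdown();                 // always release the threads
        }
    }
}
```

The finally-block shutdown is the important part: a pool that is never shut down keeps its non-daemon threads alive for the life of the JVM.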
I stumbled upon the concept of "Managed Executor Service", but I cannot find a tutorial / example online in how to make use of that (and also configure that).
Could someone please share thoughts on the following?
a) How do I configure a thread pool in TomEE (e.g. via server.xml, context.xml or tomee.xml)? Code examples would be appreciated.
b) Is there a way to just use some default thread pool (and is that clever enough not to need tuning; if not, where could I start tuning it)?
c) How can I then look up the thread pool in Java, preferably via JNDI lookup?
d) If I later decided to make that resource part of an EJB, what would the injection code look like?
My application context is specified as "myContext" in server.xml, so if you provide samples could you please indicate how the lookup strings would look like, exactly?
Other than that I have a very plain installation of TomEE+ 7.0.2, I did not touch any configuration so far.
Thank you so much for your help!
Daniel
Here's a good tutorial to get started: https://martinsdeveloperworld.wordpress.com/2014/02/25/using-java-ees-managedexecutorservice-to-asynchronously-execute-transactions/
If you inject a ManagedExecutorService, TomEE should give you a default service and pool. If it does not, that's probably a bug:
@Resource
private ManagedExecutorService mes;
You should be able to configure it in tomee.xml like this (I didn't test this):
<Resource id="myManagedExecutorService" type="javax.enterprise.concurrent.ManagedExecutorService">
Core = 5
Max = 25
KeepAlive = 5 s
Queue = 15
WaitAtShutdown = 30 seconds
</Resource>
And in your code:
@Resource(name = "myManagedExecutorService")
private ManagedExecutorService mes;
I figured this out by looking at service-jar.xml. You may also want to look at JMS and @Asynchronous, which in my opinion are somewhat better options than ManagedExecutorService.
You can find the documentation here: http://tomee.apache.org/admin/configuration/resources.html#_managedexecutorservice
The main advantages of these executors are:
they are configured in the container: no need for custom app configuration, yet still tunable without recompiling/changing the application
they are not limited like @Asynchronous, which doesn't define any particular pool, so its portability is not very high, whereas these managed pools are quite uniform
these pools are "enterprise" friendly, because you have listeners to add security and auditing
these pools propagate some context (security, JNDI, and typically the classloader)
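Because ManagedExecutorService extends java.util.concurrent.ExecutorService, code written against the plain interface works unchanged whether the container injects a managed pool or a test supplies a fixed pool. A hedged sketch, with illustrative class names:

```java
import java.util.concurrent.*;

// Sketch: depend only on the ExecutorService interface. In TomEE, pass in
// the @Resource-injected ManagedExecutorService; in a plain JVM or test,
// a fixed pool is a drop-in stand-in.
public class ReportService {
    private final ExecutorService executor;

    public ReportService(ExecutorService executor) { this.executor = executor; }

    // Submit work and hand the caller a Future, exactly as with any executor.
    public Future<String> buildReport(String name) {
        return executor.submit(() -> "report:" + name);
    }
}
```

In the container version, the only difference is that the pool's lifecycle and context propagation are handled by TomEE, so you must not shut the managed pool down yourself.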

What is the best practice in EF Core for using parallel async calls with an Injected DbContext?

I have a .NET Core 1.1 API with EF Core 1.1 and using Microsoft's vanilla setup of using Dependency Injection to provide the DbContext to my services. (Reference: https://learn.microsoft.com/en-us/aspnet/core/data/ef-mvc/intro#register-the-context-with-dependency-injection)
Now I am looking into parallelizing database reads as an optimization, using WhenAll.
So instead of:
var result1 = await _dbContext.TableModel1.FirstOrDefaultAsync(x => x.SomeId == AnId);
var result2 = await _dbContext.TableModel2.FirstOrDefaultAsync(x => x.SomeOtherProp == AProp);
I use:
var repositoryTask1 = _dbContext.TableModel1.FirstOrDefaultAsync(x => x.SomeId == AnId);
var repositoryTask2 = _dbContext.TableModel2.FirstOrDefaultAsync(x => x.SomeOtherProp == AProp);
(var result1, var result2) = await (repositoryTask1, repositoryTask2).WhenAll();
This is all well and good, until I use the same strategy outside of these DB Repository access classes and call these same methods with WhenAll in my controller across multiple services:
var serviceTask1 = _service1.GetSomethingsFromDb(Id);
var serviceTask2 = _service2.GetSomeMoreThingsFromDb(Id);
(var dataForController1, var dataForController2) = await (serviceTask1, serviceTask2).WhenAll();
Now when I call this from my controller, randomly I will get concurrency errors like:
System.InvalidOperationException: ExecuteReader requires an open and available Connection. The connection's current state is closed.
The reason, I believe, is that sometimes these threads try to access the same tables at the same time. I know this is by design in EF Core, and if I wanted to I could create a new DbContext every time, but I am trying to see if there is a workaround. That's when I found this good post by Mehdi El Gueddari: http://mehdi.me/ambient-dbcontext-in-ef6/
In which he acknowledges this limitation:
an injected DbContext prevents you from being able to introduce multi-threading or any sort of parallel execution flows in your services.
And offers a custom workaround with DbContextScope.
However, he presents a caveat even with DbContextScope in that it won't work in parallel (what I'm trying to do above):
if you attempt to start multiple parallel tasks within the context of a DbContextScope (e.g. by creating multiple threads or multiple TPL Tasks), you will get into big trouble. This is because the ambient DbContextScope will flow through all the threads your parallel tasks are using.
His final point here leads me to my question:
In general, parallelizing database access within a single business transaction has little to no benefits and only adds significant complexity. Any parallel operation performed within the context of a business transaction should not access the database.
Should I not be using WhenAll in this case in my Controllers and stick with using await one-by-one? Or is dependency-injection of the DbContext the more fundamental problem here, therefore a new one should instead be created/supplied every time by some kind of factory?
Using any context.XyzAsync() method is only useful if you either await the called method or return control to a calling thread that doesn't have the context in its scope.
A DbContext instance isn't thread-safe: you should never use it in parallel threads. To be safe, never use it in multiple threads at all, even if they don't run in parallel. Don't try to work around it.
If for some reason you want to run parallel database operations (and think you can avoid deadlocks, concurrency conflicts etc.), make sure each one has its own DbContext instance. Note however, that parallelization is mainly useful for CPU-bound processes, not IO-bound processes like database interaction. Maybe you can benefit from parallel independent read operations but I would certainly never execute parallel write processes. Apart from deadlocks etc. it also makes it much harder to run all operations in one transaction.
In ASP.Net core you'd generally use the context-per-request pattern (ServiceLifetime.Scoped, see here), but even that can't keep you from transferring the context to multiple threads. In the end it's only the programmer who can prevent that.
If you're worried about the performance costs of creating new contexts all the time: don't be. Creating a context is a light-weight operation, because the underlying model (store model, conceptual model + mappings between them) is created once and then stored in the application domain. Also, a new context doesn't create a physical connection to the database. All ASP.Net database operations run through the connection pool that manages a pool of physical connections.
If all this implies that you have to reconfigure your DI to align with best practices, so be it. If your current setup passes contexts to multiple threads there has been a poor design decision in the past. Resist the temptation to postpone inevitable refactoring by work-arounds. The only work-around is to de-parallelize your code, so in the end it may even be slower than if you redesign your DI and code to adhere to context per thread.
It came to the point where really the only way to answer the debate was to do a performance/load test to get comparable, empirical, statistical evidence so I could settle this once and for all.
Here is what I tested:
Cloud load test with VSTS at 200 users max for 4 minutes on a Standard Azure web app.
Test #1: 1 API call with Dependency Injection of the DbContext and async/await for each service.
Results for Test #1:
Test #2: 1 API call with new creation of the DbContext within each service method call and using parallel thread execution with WhenAll.
Results for Test #2:
Conclusion:
For those who doubt the results, I ran these tests several times with varying user loads, and the averages were basically the same every time.
The performance gain from parallel processing is, in my opinion, insignificant, and does not justify abandoning Dependency Injection, which would create development overhead and maintenance debt, potential for bugs if handled wrong, and a departure from Microsoft's official recommendations.
One more thing to note: as you can see, there were actually a few failed requests with the WhenAll strategy, even when ensuring a new context is created every time. I am not sure of the reason, but I would much rather have no 500 errors than a 10 ms performance gain.

quick loading of a drools knowledge base

I'm trying to use Drools as the rule engine for a grammar-relations-to-semantics mapping framework. The rule base is in excess of 5000 rules even now and will grow. Currently, reading the DRL file and creating the knowledge base takes a long time each time the program is run. Is there a way to create the knowledge base once and save it in some persistent format that can be loaded quickly, with the option to regenerate the knowledge base only when a change is made?
Yes, Drools can serialise a knowledge base out to external storage and then load the serialised knowledge base back in again.
So you need one cycle that loads from DRL, compiles, and serialises out, and a second cycle that uses the serialised version.
I've used this with some success, reducing a 1-minute-30 loading time to about 15-20 seconds. It also reduces your heap/PermGen requirements.
Check the API for the exact methods.
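The save/load cycle described above boils down to standard Java serialization, since a compiled knowledge base is Serializable (as the other answer notes for Drools 5.0). A hedged generic sketch follows; note that Drools also ships its own stream classes (e.g. DroolsObjectOutputStream), so check the API before relying on plain java.io serialization for the knowledge base itself.

```java
import java.io.*;

// Hedged sketch: persist any Serializable object to disk and load it back.
// The same pattern would apply to a compiled knowledge base, modulo the
// Drools-specific stream classes mentioned in the answer above.
public class KBaseCache {
    public static void save(Serializable obj, File file) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(obj);          // write the compiled artifact once
        }
    }

    public static Object load(File file)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new FileInputStream(file))) {
            return in.readObject();        // fast path on subsequent startups
        }
    }
}
```

A simple regeneration policy is to compare the DRL file's last-modified timestamp against the serialized file's and recompile only when the DRL is newer.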
My first thought is to keep the knowledge base around as long as possible. Unless you are creating multiple knowledge bases from different sets of rules, and there are too many possible combinations, hang onto those knowledge bases. In one application I work on, one knowledge base has all the rules so we treat it like a singleton.
However, if that's not possible or your application is not that long-running, I don't know that Drools itself provides any ways of speeding that up. Running a Drools 5.0 project through the debugger, I see that the KnowledgeBase Drools gives me is Serializable. I imagine it would be quicker to deserialize a KnowledgeBase than to re-parse the rules. But be careful designing your application around this! You use interfaces for a reason and the implementation could change without warning.