Managed Executor Service in TomEE - service

I have a server with 48 CPUs hosting a Java EE 7 REST API on TomEE+ 7.0.2.
Some of the APIs are in need of making use of as many CPUs as possible as they run parallelized algorithms. The parallelized portion do not require any database or other resources, just some heavy lifting in a shared double[][] matrix.
I am usually working in EJB contexts, but for this particular instance it is not a requirement (and also preferred not to be).
So far I was using
ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
in order to instantiate an executor, but as this seems to spawn actual threads on operating system level I am not a big fan of it - after some JMeter load testing it even led to a point, where the the whole bash was blocked and I could not even SSH the server any more until hard reboot.
I stumbled upon the concept of "Managed Executor Service", but I cannot find a tutorial / example online in how to make use of that (and also configure that).
Could someone please share thoughts on the following?
a) How to configure a thread pool in TomEE (e.g. via server.xml, context.xml or tomee.xml), code examples would be appreciated?
b) Is there a way to just make use of some default thread pool (and is that clever enough to not need tuning, if no, where could I start out tuning that)?
c) How can I look up the thread pool in Java then - preferred via JDNI lookup?
d) If I once decided to make that resource part of EJB, what would the code for Injection look like?
My application context is specified as "myContext" in server.xml, so if you provide samples could you please indicate how the lookup strings would look like, exactly?
Other than that I have a very plain installation of TomEE+ 7.0.2, I did not touch any configuration so far.
Thank you so much for your help!
Daniel

Here's a good tutorial to get started: https://martinsdeveloperworld.wordpress.com/2014/02/25/using-java-ees-managedexecutorservice-to-asynchronously-execute-transactions/
If you inject #ManagedExecutorService, TomEE should give you a default service and pool. If it does not, that's probably a bug:
#Resource
private ManagedExecutorService mes;
You should be able to configure it in TomEE.xml like this (I didn't test this):
<Resource id="myManagedExecutorService" type="javax.enterprise.concurrent.ManagedExecutorService">
Core = 5
Max = 25
KeepAlive = 5 s
Queue = 15
WaitAtShutdown = 30 seconds
</Resource>
And in your code:
#Resource("myManagedExecutorService")
private ManagedExecutorService mes;
I figured this out by looking at service-jar.xml. You may also want to JMS and #Asyncronous which are a bit better options than ManagedExecutorService in my opinion

you can find the documentation here http://tomee.apache.org/admin/configuration/resources.html#_managedexecutorservice
The main advantage of these executors are:
it is configured in the container - no need of a custom app configuration but it is still tunable without recompiling/changing the application
it is not limited like #Asynchronous which doesn't define any particular pool so portability is not very high whereas these managed pools are quite uniform
these pools are "enterprise" friendly because you have listeners to add security and auditing
these pools propagate some context (security and jndi/the classloader typically)
In tomee we ali

Related

Chaos engineering best practice

I studied the principles of chaos, and looks for some opensource project, such as chaosblade which is open sourced by Alibaba, and mangle, by vmware.
These tools are both fault injection tools, and do nothing to analysis on the tested system.
According to the principles of chaos, we should
1.Start by defining ‘steady state’ as some measurable output of a system that indicates normal behavior.
2.Hypothesize that this steady state will continue in both the control group and the experimental group.
3.Introduce variables that reflect real world events like servers that crash, hard drives that malfunction, network connections that are severed, etc.
4.Try to disprove the hypothesis by looking for a difference in steady state between the control group and the experimental group.
so how we do step 4? Should we use monitoring system to monitor some major metrics, to check the status of the system after fault injection.
Is there any good suggestions or best practice?
so how we do step 4? Should we use monitoring system to monitor some major metrics, to check the status of the system after fault injection.
As always the answer is it depends.... It depends how do you want to measure your hypothesis, it depends on the hypothesis itself and it depends on the system. But normally it makes totally sense to introduce metrics to improve/increase the observability.
If your hypothesis is like Our service can process 120 requests in a second, even if one node fails. Then you could do it via metrics to measure that yes, but you could also measure it via the requests you send and receive the responses back. It is up to you.
But if your Hypothesis is I get a response for an request which was send before a node goes down. Then it makes more sense to verify this directly with the requests and response.
At our project we use for example chaostoolkit, which lets you specify the hypothesis in json or yaml and related action to prove it.
So you can say I have a steady state X and if I do Y, then the steady state X should be still valid. The toolkit is also able to verify metrics if you want to.
The Principles of Chaos are a bit above the actual testing, they reflect the philosophy of designed vs actual system and system under injection vs baseline, but are a bit too abstract to apply in everyday testing, they are a way of reasoning, not a work process methodology.
I'm think the control group vs experiment wording is one especially doubtful part - you stage a test (injection) in a controlled environment and try to catch if there is a user-facing incident, SLA breach of any kind or a degradation. I do not see where there is a control group out there if you test on a stand or dedicated environment.
We use a very linear variety of chaos methodology which is:
find failure points in the system (based on architecture, critical user scenarios and history of incidents)
design choas test scenarios (may be a single attack or more elaborate sequence)
run tests, register results and reuse green for new releases
start tasks to fix red tests, verify the solutions when they are available
One may say we are actually using the Principles of Choas in 1 and 2, but we tend to think of choas testing as quite linear and simple process.
Mangle 3.0 released with an option for analysis using resiliency score. Detailed documentation available at https://github.com/vmware/mangle/blob/master/docs/sre-developers-and-users/resiliency-score.md

hexagonal architecture and transactions concept

I'm trying to get used to hexagonal architecture and can't get how to implement common practical problems, already realized with different approaches. I think my core problem is to understand level of responsibility extracted to adapter and ports.
Reading articles on the web it is ok with primitive examples like:
we have RepositoryInterface which can be implemented in
mysql/txt/s3/nosql storage
or
we have NotificationSendingInterface and have email/sms/web push realizations
but those are very refined examples and simply interface/realization details separation.
In practice, however, coding service in domain model we usually know interface+realization guarantees more deeply.
For illustration purpose example I decided to ask about storage+transaction pair.
How transaction conception for storage should be implemented in hex architecture?
Assume we have simple crud service interface inside domain level
StorageRepoInterface
save(...)
update(...)
delete(...)
get(...)
and we want some kind of transaction guarantee while working with those methods, e.g. delete+save in one transaction.
How it should be designed and implemented according to hex conception?
Is it should be implemented with some external coordination interface of TransactionalOperation? If yes, then in general, TransactionalOperation must know how to implement transaction guaranty working with all implementations of StorageRepoInterface(mb within additional transaction-oriented operation interface)
If no, then seems there should be explicit transaction guarantees from StorageRepoInterface in the domain level(inside hex) with additional methods?
Either way it is no look so "isolated and interfaced based" as stated.
Can someone point me how to change mindset correctly for such situations or where to read?
Thanks in advance.
In Hex Arch, driver ports are the API of the application, the use case boundary. Use cases are transactional. So you have to control the transactionality at the driver ports methods. You enclose every method in a transaction.
If you use Spring you could use declarative transaction (#Transactional annotation).
Another way is to explicity open a db transaction before the execution of the method, and to close (commit / rollback) it after the method.
A useful pattern for applying transactionality is the command bus, wrapping it with a decorator which enclose the command in a transaction.
Transactions are infraestructure, so you should have a driven port and an adapter implementing the port.
The implementation must use the same db context (entity manager) used by persistence adapters (repositories).
Vaughn Vernon talks about this topic in the "Managing transactions" section (pages 432-437) of his book "Implementing DDD".
Instead of using command bus pattern, you could simply inject a TransactionPort to your command handler (defined at domain level).
The TransactionPort would have two methods (start and commit).
The TransactionAdapter would be your custom implementation (defined at infrastructure level).
Then you could do somethig like:
this.transactionalPort.start();
# Do you stuff
this.transactionalPort.commit();

"PSQLException: FATAL: sorry, too many clients already" error in integration tests with jOOQ & Spring Boot

There are already similar questions about this error and suggested solutions; e.g. increasing max_connections in postgresql.conf and / or adapting the max number of connections your app requests. However, my question is more specific to using jOOQ in a Spring Boot application.
I integrated jOOQ into my application as in the example on GitHub. Namely, I am using DataSourceConnectionProvider with TransactionAwareDataSourceProxy to handle database connections, and I inject the DSLContext in the classes that need it.
My application provides various web services to front-ends and I've never encountered that PSQLException on dev or test environments so far. I only started getting that error when running all integration tests (around 1000) locally. I don't expect some leak in handling the connection as Spring and jOOQ manage the resources; nevertheless that error got me worried if that would also happen on production.
Long story short, is there a better alternative to using DataSourceConnectionProvider to manage connections? Note that I already tried using DefaultConnectionProvider as well, and tried to make spring.datasource.max-active less than max_connections allowed by Postgres. Neither fixed my problem so far.
Since your question seems not to be about the generally best way to work with PostgreSQL connections / data sources, I'll answer the part about jOOQ and using its DataSourceConnectionProvider:
Using DataSourceConnectionProvider
There is no better alternative in general. In order to understand DataSourceConnectionProvider (the implementation), you have to understand ConnectionProvider (its specification). It is an SPI that jOOQ uses for two things:
to acquire() a connection prior to running a statement or a transaction
to release() a connection after running a statement (and possibly, fetching results) or a transaction
The DataSourceConnectionProvider does so by acquiring a connection from your DataSource through DataSource.getConnection() and by releasing it through Connection.close(). This is the most common way to interact with data sources, in order to let the DataSource implementation handle transaction and/or pooling semantics.
Whether this is a good idea in your case may depend on individual configurations that you have made. It generally is a good idea because you usually don't want to manually manage connection lifecycles.
Using DefaultConnectionProvider
This can certainly be done instead, in case of which jOOQ does not close() your connection for you, you'll do that yourself. I'm expecting this to have no effect in your particular case, as you'll implement the DataSourceConnectionProvider semantics manually using e.g.
try (Connection c = ds.getConnection()) {
// Implicitly using a DefaultConnectionProvider
DSL.using(c).select(...).fetch();
// Implicit call to c.close()
}
In other words: this is likely not a problem related to jOOQ, but to your data source.

Should I override the default ExecutionContext?

When using futures in scala the default behaviour is to use the default Implicits.global execution context. It seems this defaults to making one thread available per processor. In a more traditional threaded web application this seems like a poor default when the futures are performing a task such as waiting on a database (as opposed to some cpu bound task).
I'd expect that overriding the default context would be fairly standard in production but I can find so little documentation about doing it that it seems that it might not be that common. Am I missing something?
Instead of thinking of it as overriding the default execution context, why not ask instead "Should I use multiple execution contexts for different things?" If that's the question, then my answer would be yes. Where I work, we use Akka. Within our app, we use the default Akka execution context for non blocking functionality. Then, because there is no good non blocking jdbc driver currently, all of our blocking SQL calls use a separate execution context, where we have a thread per connection approach. Keeping the main execution context (a fork join pool) free from blocking lead to a significant increase in throughput for us.
I think it's perfectly ok to use multiple different execution contexts for different types of work within your system. It's worked well for us.
The "correct" answer is that your methods that needs to use an ExecutionContext require an ExecutionContext in their signature, so you can supply ExecutionContext(s) from the "outside" to control execution at a higher level.
Yes, creating and using other execution contexts in you application is definitely a good idea.
Execution contexts will modularize your concurrency model and isolate the different parts of your application, so that if something goes wrong in a part of your app, the other parts will be less impacted by this. To consider your example, you would have a different execution context for DB-specific operations and another one for say, processing of web requests.
In this presentation by Jonas Boner this pattern is referred to as creating "Bulkheads" in your application for greater stability & fault tolerance.
I must admit I haven't heard much about execution context usage by itself. However, I do see this principle applied in some frameworks. For example, Play will use different execution contexts for different types of jobs and they encourage you to split your tasks into different pools if necessary: Play Thread Pools
The Akka middleware also suggests splitting your app into different contexts for the different concurrency zones in your application. They use the concept of Dispatcher which is an execution context on batteries.
Also, most operators in the scala concurrency library require an execution context. This is by design to give you the flexibility you need when in modularizing your application concurrency-wise.

quick loading of a drools knowledge base

I'm trying to use Drools as the rule engine for a grammar relations to semantics mapping framework. The rule base is in excess of 5000 rules even now and will get extended. In using Drools currently the reading of the drl file containing the rules and creating the knowledge base takes a lot of time each time the program is run. Is there a way create the knowledge base once and save it in some persistent format that can be quickly loaded with the option to regenerate the knowledge base only when a change is made?
Yes, drools can serialise a knowledgebase out to external storage and then load this serialised knowledgebase back in again.
So, you need a cycle that loads from drl, compiles, serialises out. Then a second cycle that uses the serialised version.
I've used this with some success, reducing a 1 minute 30 loading time down to about 15-20 seconds. Also, it reduces your heap/perm gen requirements as well.
Check the API for the exact methods.
My first thought is to keep the knowledge base around as long as possible. Unless you are creating multiple knowledge bases from different sets of rules, and there are too many possible combinations, hang onto those knowledge bases. In one application I work on, one knowledge base has all the rules so we treat it like a singleton.
However, if that's not possible or your application is not that long-running, I don't know that Drools itself provides any ways of speeding that up. Running a Drools 5.0 project through the debugger, I see that the KnowledgeBase Drools gives me is Serializable. I imagine it would be quicker to deserialize a KnowledgeBase than to re-parse the rules. But be careful designing your application around this! You use interfaces for a reason and the implementation could change without warning.