Infinite loop with thread sleep vs EJB 3.1 Timers in WebSphere 8: backpressure - java-ee-6

I have a requirement to:
- read a database table
- process the data (dataCleanser)
To increase throughput, I implemented 10 non-persistent EJB timers that wake up every 5 minutes and do the above work.
The problem is back pressure: the dataCleanser can sometimes take around 12 minutes (it makes an external API call), and when this happens WebSphere reports a hung thread.
In those cases, I would like to decrease the number of timers (say, from 10 to 5) programmatically.
I can do that only if the timer reports back its status (successful execution, exception, or timeout); that way I can control the back pressure.
Is there any way to do that in WebSphere 8?
To ask the question another way:
- Can EJB timers (with transaction_not_supported) invoke another EJB that has a transaction timeout?
- Can those timeouts be caught in the calling timer code?
If that is not possible, what are the downsides of using a plain old infinite loop with sleep that then invokes an EJB (which in turn calls dataCleanser) with a transaction timeout?
One downside I know of is that this becomes single-threaded, and I do not know how to run 10 parallel executions as I would with timers.

I had similar issues with scheduling and decided to reschedule callbacks programmatically each time my logic ended, based on the results of processing. You can have the timer service injected:
    @Resource
    private TimerService timerService;
This can be in a superclass if there are multiple EJBs that need scheduling, with methods like:
    protected void reschedule(long millis) {
        // Single-action timer: fires once after the given delay, with no info payload
        timerService.createTimer(millis, null);
    }
This way you can control beans individually. I would not try to make them control each other, since that would become difficult in a cluster with multiple JVMs.
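As a rough sketch of "based on the results of processing", the delay handed to reschedule(long millis) could be derived from how the previous run went. The class below is hypothetical: its name, the thresholds, and the doubling rule are assumptions for illustration and are not part of the EJB API or WebSphere.

```java
// Hypothetical helper: picks the delay before the next run from how the last run went.
// The name, the bounds, and the doubling rule are illustrative assumptions.
final class RescheduleDelayPolicy {
    private final long baseMillis;          // delay used after a fast, successful run
    private final long maxMillis;           // upper bound so the delay cannot grow forever
    private final long slowThresholdMillis; // runs longer than this count as "slow"
    private long currentMillis;

    RescheduleDelayPolicy(long baseMillis, long maxMillis, long slowThresholdMillis) {
        this.baseMillis = baseMillis;
        this.maxMillis = maxMillis;
        this.slowThresholdMillis = slowThresholdMillis;
        this.currentMillis = baseMillis;
    }

    // Call once per run, then pass the result to reschedule(long millis).
    long nextDelay(long lastRunMillis, boolean failed) {
        if (failed || lastRunMillis > slowThresholdMillis) {
            // Back off: double the delay, capped at maxMillis.
            currentMillis = Math.min(currentMillis * 2, maxMillis);
        } else {
            // Healthy run: return to the base delay.
            currentMillis = baseMillis;
        }
        return currentMillis;
    }
}
```

In the timeout callback, the bean would measure the elapsed time around the work, record whether an exception was thrown, and call reschedule(policy.nextDelay(elapsed, failed)) in a finally block. Where the policy instance lives (for example in a singleton bean) is left open here.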

Related

How to tune Play Framework application with proper threadpools?

I am working with Play Framework (Scala) version 2.3. From the docs:
You can’t magically turn synchronous IO into asynchronous by wrapping it in a Future. If you can’t change the application’s architecture to avoid blocking operations, at some point that operation will have to be executed, and that thread is going to block. So in addition to enclosing the operation in a Future, it’s necessary to configure it to run in a separate execution context that has been configured with enough threads to deal with the expected concurrency.
This has me a bit confused on how to tune my webapp. Specifically, since my app has a good amount of blocking calls: a mix of JDBC calls, and calls to 3rd party services using blocking SDKs, what is the strategy for configuring the execution context and determining the number of threads to provide? Do I need a separate execution context? Why can't I simply configure the default pool to have a sufficient amount of threads (and if I do this, why would I still need to wrap the calls in a Future?)?
I know this ultimately will depend on the specifics of my app, but I'm looking for some guidance on the strategy and approach. The play docs preach the use of non-blocking operations everywhere but in reality the typical web-app hitting a sql database has many blocking calls, and I got the impression from reading the docs that this type of app will perform far from optimally with the default configurations.
[...] what is the strategy for configuring the execution context and determining the number of threads to provide
Well, that's the tricky part which depends on your individual requirements.
First of all, you should probably choose a basic profile from the docs (pure asynchronous, highly synchronous, or many specific thread pools).
The second step is to fine-tune your setup by profiling and benchmarking your application.
Do I need a separate execution context?
Not necessarily. But it makes sense to use separate execution contexts if you want to trigger all your blocking IO-calls at once and not in a sequential way (so database call B does not have to wait until database call A is finished).
Why can't I simply configure the default pool to have a sufficient amount of threads (and if I do this, why would I still need to wrap the calls in a Future?)?
You can, check the docs:
    play {
      akka {
        akka.loggers = ["akka.event.slf4j.Slf4jLogger"]
        loglevel = WARNING
        actor {
          default-dispatcher = {
            fork-join-executor {
              parallelism-min = 300
              parallelism-max = 300
            }
          }
        }
      }
    }
With this approach, you basically are turning Play into a one-thread-per-request-model. This is not the idea behind Play, but if you're doing a lot of blocking IO calls, it's the simplest approach. In this case, you don't need to wrap your database calls in a Future.
To put it in a nutshell, you basically have three ways to go:
- Only use (IO-)technologies whose API calls are non-blocking and asynchronous. This allows you to use a small thread pool / default execution context, which suits the nature of Play.
- Turn Play into a one-thread-per-request framework by drastically increasing the default execution context. No futures needed, just call your blocking database as always.
- Create specific execution contexts for your blocking IO-calls and gain fine-grained control of what you are doing.
Firstly, before diving in and refactoring your app, you should determine whether this is actually a problem for you. Run some benchmarks (Gatling is superb) and do a few profiles with something like JProfiler. If you can live with the current performance, then happy days.
The ideal is to use a reactive driver that returns a future, which then gets passed all the way back to your controller. Unfortunately, async support is still an open ticket for Slick. Interacting with REST APIs can be made reactive using the PlayWS library, but if you have to go via a library that your 3rd party provides, then you're stuck.
So, assuming that none of these are feasible and that you do need to improve performance, the question is what benefit would Play's suggestion have? I think what they're getting at here is that it's useful to partition your threads into those that block and those that can make use of asynchronous techniques.
If, for instance, only some proportion of your requests are long and blocking then with a single thread pool you risk all threads being used for the blocking operations. Your controller would then not be able to handle any new requests, irrespective of whether that request needs to call a blocking service. If you can allocate enough threads that this never happens then no problem.
If, on the other hand, you are hitting your limit for threads then by using two pools you can keep your fast, non-blocking requests snappy. You would have one pool servicing requests in your controller and calling into services which return futures. Some of these futures would actually be performing work using a separate pool of threads, but only for the blocking operations. If there is any portion of your app which could be made reactive, then your controller could take advantage of this while isolating the controller from the blocking operations.
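The two-pool idea can be sketched in plain Java as follows. This uses only java.util.concurrent and is not Play's own execution-context API; the pool size of 20, the thread name, and the blockingLookup stand-in are assumptions for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: plain java.util.concurrent, not Play's execution-context API.
final class BlockingPoolSketch {
    // Dedicated pool for blocking calls; the size 20 is an arbitrary assumption.
    static final ExecutorService BLOCKING_POOL = Executors.newFixedThreadPool(20, r -> {
        Thread t = new Thread(r, "blocking-io");
        t.setDaemon(true); // do not keep the JVM alive just for this pool
        return t;
    });

    // Stand-in for a JDBC call or a blocking third-party SDK call.
    static String blockingLookup(String key) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return key + "@" + Thread.currentThread().getName();
    }

    // The caller gets a future immediately; only a "blocking-io" thread waits.
    static CompletableFuture<String> lookupAsync(String key) {
        return CompletableFuture.supplyAsync(() -> blockingLookup(key), BLOCKING_POOL);
    }

    // Two blocking calls started together, so call B does not wait for call A.
    static CompletableFuture<String> lookupBoth(String a, String b) {
        CompletableFuture<String> fa = lookupAsync(a);
        CompletableFuture<String> fb = lookupAsync(b);
        return fa.thenCombine(fb, (x, y) -> x + "," + y);
    }
}
```

The threads that serve requests only compose futures, so they stay free for fast requests even when every thread in the blocking pool is busy.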

Good design suggestion for multiple tasks scheduling in C#

I have a service that requires an operation to be performed on an entity status 10 minutes after a specific event has been invoked. During the 10 minute wait, it has to listen for other events that would stop this event altogether so the operation would not be executed.
At any point in time, there could be multiple instances of entities to be handled.
Are there any suggestions on possible ways to implement this service?
I have something similar in one of my projects that handles user presence management. The rough idea is like this:
1 You have a timer with the tick precision that you need (I use 15 seconds)
2 Some input event triggers monitoring of an entity and puts the entity into the execution queue along with the time it was added
3 If any cancelling event occurs, you remove the corresponding entities from the execution queue
4 On each timer tick, you check the time of adding for each item in the execution queue and execute the operation for entities that have been there for 10 minutes
PS. You might want to consider using multi-threading / TPL for executing the operations.
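Steps 2 to 4 can be sketched as a small data structure. The example is in Java rather than C#, and the class and method names are hypothetical; the caller supplies the current time, so a timer (step 1) would simply call tick on every interval.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the queue-plus-tick idea; names are not from the original answer.
final class DelayedOperationQueue {
    private final long delayMillis;
    // entity id -> time it was added; LinkedHashMap keeps iteration order deterministic
    private final Map<String, Long> addedAt = new LinkedHashMap<>();

    DelayedOperationQueue(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    // Step 2: an input event starts monitoring an entity.
    synchronized void track(String entityId, long nowMillis) {
        addedAt.put(entityId, nowMillis);
    }

    // Step 3: a cancelling event removes the entity before its operation runs.
    synchronized boolean cancel(String entityId) {
        return addedAt.remove(entityId) != null;
    }

    // Step 4: on every timer tick, remove and return the entities whose delay has elapsed.
    synchronized List<String> tick(long nowMillis) {
        List<String> due = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = addedAt.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (nowMillis - e.getValue() >= delayMillis) {
                due.add(e.getKey());
                it.remove();
            }
        }
        return due;
    }
}
```

With a 15-second tick, an operation runs at most 15 seconds later than its 10-minute mark. The returned ids can be handed to worker threads so that a slow operation does not delay the next tick.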

Akka: Adding a delay to a durable mailbox

I am wondering if there is some way to delay the processing of an Akka message.
My use case: for every request, I have a small amount of work to do immediately, and then I need to do additional work two hours later.
Is there any easy way to delay the processing of a message in Akka? I know I could probably set up an external distributed queue such as ActiveMQ or RabbitMQ, which probably has this feature, but I would rather not.
I know I would need to make the mailbox durable so it can survive restarts or crashes. We already have Mongo set up, so I would probably use the MongoBasedMailbox for durability.
Temporal Workflow is capable of supporting your use case with minimal effort. You can think of it as a durable actor platform, where actor state, including threads and local variables, is preserved across process restarts.
Temporal offers a lot of other features for task processing:
- Built-in exponential retries with unlimited expiration interval
- Failure handling. For example, it allows executing a task that notifies another service if both updates couldn't succeed during a configured interval.
- Support for long-running heartbeating operations
- Ability to implement complex task dependencies, for example chaining of calls or compensation logic in case of unrecoverable failures (SAGA)
- Complete visibility into the current state of the update. For example, when using queues, all you know is whether there are some messages in a queue, and you need an additional DB to track the overall progress. With Temporal, every event is recorded.
- Ability to cancel an update in flight
- Throttling of requests
See the presentation that goes over the Temporal programming model. It talks about Cadence, which is the predecessor of Temporal.
It's not ideal, but the Akka Camel Quartz scheduler would do the trick. It is more heavyweight than the built-in ActorSystem scheduler, and be aware that Quartz has its own issues.
You could still use the normal Akka scheduler; you would just have to keep state in actor persistence to avoid losing the job if the server restarts.
I have recently used PersistentFsmActor, which keeps the state of the actor persisted.
I'm not sure you need an FSM (finite state machine) in your case, so you could simply use a persistent actor to save the time the job was inserted and start a scheduler for that time. This way, even if you restart the server, the actor will start, create a new scheduled job, and use the persisted data to calculate the time left before running it.
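The "time left" calculation after a restart is small enough to sketch. The helper below is hypothetical and uses java.time rather than any Akka API; insertedAt stands for the timestamp recovered from the persisted state.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper; insertedAt would come from the actor's persisted state.
final class RemainingDelay {
    // Time still to wait after a restart; never negative, so an overdue job runs immediately.
    static Duration remaining(Instant insertedAt, Duration totalDelay, Instant now) {
        Duration left = totalDelay.minus(Duration.between(insertedAt, now));
        return left.isNegative() ? Duration.ZERO : left;
    }
}
```

On recovery, the actor would schedule its job with the returned duration instead of the full two hours.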

Implementing timer using select

I am planning to write a small timer library in C using timerfd_create.
A basic user of this library will have two threads:
- application thread
- timer thread
There will be a queue between these two threads, so whenever the application wants to start a timer, it pushes a message into the queue; the timer thread then reads it, creates an FD for it, and adds it to the select set.
The problem with this approach is that the timer thread, being a single thread, would be blocked in the select system call and would not know that a message has been posted to its receive queue to start a timer.
One way around this is to let select time out every "tick" and then check for messages in the queue. Is there a better way to do this?
I was also thinking of raising an interrupt every time the application puts a message in the queue, to interrupt the select. Does that work well with multi-threaded applications?
Platform: Unix
If you insist on having multiple threads post timers to a dedicated timer thread sitting in select(2), then why not use eventfd(2) or just the good old self-pipe trick to signal that new timers are available? Include the event file descriptor in the pollable set and wait on all of them.
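For readers who want to see the shape of the self-pipe trick, here is a Java analogue built on java.nio. It is not the C code the question asks about: Selector plays the role of select(2) and Pipe the role of the pipe, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Java analogue of the self-pipe trick; the original question is about C and select(2).
final class SelfPipeWakeup {
    private final Selector selector;
    private final Pipe pipe;

    SelfPipeWakeup() throws IOException {
        selector = Selector.open();
        pipe = Pipe.open();
        pipe.source().configureBlocking(false);
        // The read end of the pipe sits in the polled set next to any other channels.
        pipe.source().register(selector, SelectionKey.OP_READ);
    }

    // Called by the application thread after it has queued a new timer request.
    void signal() throws IOException {
        pipe.sink().write(ByteBuffer.wrap(new byte[] {1}));
    }

    // Called by the timer thread: blocks until signalled or until timeoutMillis passes.
    // Returns true if woken by a signal, in which case the request queue should be drained.
    boolean awaitSignal(long timeoutMillis) throws IOException {
        int ready = selector.select(timeoutMillis);
        selector.selectedKeys().clear();
        if (ready == 0) {
            return false;
        }
        // Drain the pipe so the next select blocks again.
        ByteBuffer buf = ByteBuffer.allocate(64);
        while (pipe.source().read(buf) > 0) {
            buf.clear();
        }
        return true;
    }
}
```

In Java itself, Selector.wakeup() would do the same job; the pipe is used here only to mirror the C technique, where the pipe's read descriptor is added to the fd set passed to select.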
Which platform(s) are you wanting to target? Under Windows, for instance, there are much better ways to handle this without using select(), such as PostThreadMessage() and WaitMessage().
If you are using timerfds, then there is no need for a dedicated timer thread; just write the application around an event loop using select, poll, epoll, etc.

High Throughput and Windows Workflow Foundation

Can WWF handle high throughput scenarios where several dozen records are 'actively' being processed in parallel at any one time?
We want to build a workflow process which handles a few thousand records per hour. Each record takes up to a minute to process, because it makes external web service calls.
We are testing Windows Workflow Foundation to do this. But our demo programs show that the records appear to be processed in sequence, not in parallel, when we use parallel activities to process several records at once within one workflow instance.
Should we use multiple workflow instances or parallel activities?
Are there any known patterns for high performance WWF processing?
You should definitely use a new workflow per record. Each workflow only gets one thread to run in, so even with a ParallelActivity they'll still be handled sequentially.
I'm not sure about the performance of Windows Workflow, but from what I heard about .NET 4 at Tech-Ed, its Workflow components will be dramatically faster than the ones from .NET 3.0 and 3.5. So if you really need a lot of performance, maybe you should consider waiting for .NET 4.0.
Another option could be to consider BizTalk. But it's pretty expensive.
I think the common pattern is to use one workflow instance per record. The workflow runtime runs multiple instances in parallel.
One workflow instance runs on one thread at a time. The parallel activity calls the Execute method of each child activity sequentially on this single thread. You may still get a performance improvement from the parallel activity, however, if the activities are asynchronous and spend most of their time waiting for an external process to finish its work. For example, if an activity calls an external web method and then waits for a reply, it returns from its Execute method and does not occupy the thread while waiting, so another activity in the Parallel group can start its job (e.g. also a call to a web service) at the same time.