Control execution of IgniteFuture - scala

I would like to execute an async method in Ignite cache and set a timeout for the execution. Moreover, I would like to specify the executor.
cache.getAsync comes very close to the desired functionality, but it does not accept timeout or executor arguments.
Currently, a sub-optimal solution can be found in the following Scala snippet:
val igniteFuture = cache.getAsync(key)
igniteFuture.listenAsync(
  (f: IgniteFuture[T]) => f.get(timeout.toMillis, TimeUnit.MILLISECONDS),
  executor)
How can the desired functionality be achieved with the current Ignite building blocks?

I think you are mixing up the concepts of futures and asynchronous operations. Futures are objects that can either be completed or waited on. So when you ask Ignite to perform an asynchronous operation, it gives you a future that will be completed at some later point. You can specify a timeout in the IgniteFuture.get() method, or subscribe to the completion of the future with the IgniteFuture.listen() method.
But the way the operations are performed is encapsulated from you. You can configure the sizes of the internal thread pools, though: https://apacheignite.readme.io/docs/thread-pools
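A timeout can also be layered on top of a future from the outside, without blocking a thread in get(). The following is a minimal sketch using only the Scala standard library; withTimeout and TimeoutSketch are hypothetical names, not part of Ignite. It assumes the Ignite result has already been bridged into a scala.concurrent.Future (for example by completing a Promise from inside IgniteFuture.listen()). The implicit ExecutionContext only decides where the completion callback runs; the cache operation itself still runs on Ignite's internal pools.

```scala
import java.util.concurrent.{ScheduledExecutorService, TimeUnit, TimeoutException}
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.concurrent.duration.FiniteDuration

object TimeoutSketch {
  // Hypothetical helper: returns a future that fails with TimeoutException
  // if `underlying` has not completed within `timeout`.
  def withTimeout[T](underlying: Future[T], timeout: FiniteDuration)(
      implicit ec: ExecutionContext, timer: ScheduledExecutorService): Future[T] = {
    val promise = Promise[T]()
    // Schedule the failure; tryFailure does nothing if the promise is already completed.
    val task = timer.schedule(new Runnable {
      def run(): Unit = {
        promise.tryFailure(new TimeoutException(s"no result after $timeout"))
        ()
      }
    }, timeout.toMillis, TimeUnit.MILLISECONDS)
    // Whichever side finishes first wins; the callback runs on `ec`.
    underlying.onComplete { result =>
      task.cancel(false)
      promise.tryComplete(result)
    }
    promise.future
  }
}
```

Note that the timeout only fails the returned future; it does not cancel the underlying operation.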

Related

How are Futures executed in an Akka actor

I'm trying to make a background task that performs a network call and stores the response in a database. According to the documentation, background tasks are supposed to use the scheduler within the Akka actor system. I need to run a Future inside of this actor:
actorSystem.scheduler.scheduleOnce(delay = new FiniteDuration(0, TimeUnit.SECONDS)) {
  val future = network.request()
  future.flatMap(saveToDatabase(_))
}
Therefore, I have two questions:
Is this future guaranteed to get executed (to completion)?
Is it possible for other requests to follow up on the status of this task (whether it has finished or not)?
The Future in the future value is returned by the network object, so this object is responsible for executing the code that triggers the Future, not Akka. So you need to look at the documentation for the request call to see what completion guarantees there are for this Future.
The Future returned by the flatMap call uses the implicit execution context that is in scope when this task is created. But saveToDatabase is not guaranteed to be called, because the first Future can fail and the function passed to flatMap is only invoked on success.
If you want to track the status of this task, send messages to a monitoring actor at various points in the execution. Other actors can then ask this monitoring actor about the progress of the task.
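The success-only behaviour of flatMap can be seen in a small sketch that uses plain Scala futures. Everything here is a hypothetical stand-in: request and saveToDatabase imitate the calls in the question, and the `saved` flag exists only to make the effect observable. The recover step is one possible way to turn a failure into a value that callers can inspect; it is not something the original code does.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.{ExecutionContext, Future}

object FlatMapSketch {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Records whether the (hypothetical) database write was ever attempted.
  val saved = new AtomicBoolean(false)

  // Hypothetical stand-in for network.request().
  def request(fail: Boolean): Future[String] =
    if (fail) Future.failed(new RuntimeException("network down"))
    else Future.successful("payload")

  // Hypothetical stand-in for saveToDatabase.
  def saveToDatabase(body: String): Future[Int] =
    Future { saved.set(true); body.length }

  // saveToDatabase runs only if request succeeds; recover maps a failure to -1.
  def task(fail: Boolean): Future[Int] =
    request(fail).flatMap(saveToDatabase).recover { case _: RuntimeException => -1 }
}
```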

Using futures in Spark-Streaming & Cassandra (Scala)

I am rather new to Spark, and I wonder what the best practice is when using Spark Streaming with Cassandra.
Usually, when performing IO, it is a good practice to execute it inside a Future (in Scala).
However, a lot of the spark-cassandra-connector seems to operate synchronously.
For example: saveToCassandra (com.datastax.spark.connector.RDDFunctions)
Is there a good reason why those functions are not async?
Should I wrap them in a Future?
While there are legitimate cases where you can benefit from asynchronous execution of the driver code, it is not a general rule. You have to remember that the driver itself is not the place where the actual work is performed, and that Spark execution is subject to different types of constraints, in particular:
scheduling constraints related to resource allocation and DAG topology
batch order in streaming applications
Moreover, thinking about actions like saveToCassandra as IO operations is a significant oversimplification. Spark actions are just entry points for Spark jobs, where IO activity is typically just the tip of the iceberg.
If you perform multiple actions per batch and have enough resources to do so without a negative impact on individual jobs, or you want to perform some type of IO in the driver thread itself, then async execution can be useful. Otherwise you are probably wasting your time.

How to tune Play Framework application with proper threadpools?

I am working with Play Framework (Scala) version 2.3. From the docs:
You can’t magically turn synchronous IO into asynchronous by wrapping it in a Future. If you can’t change the application’s architecture to avoid blocking operations, at some point that operation will have to be executed, and that thread is going to block. So in addition to enclosing the operation in a Future, it’s necessary to configure it to run in a separate execution context that has been configured with enough threads to deal with the expected concurrency.
This has me a bit confused on how to tune my webapp. Specifically, since my app has a good amount of blocking calls: a mix of JDBC calls, and calls to 3rd party services using blocking SDKs, what is the strategy for configuring the execution context and determining the number of threads to provide? Do I need a separate execution context? Why can't I simply configure the default pool to have a sufficient amount of threads (and if I do this, why would I still need to wrap the calls in a Future?)?
I know this ultimately will depend on the specifics of my app, but I'm looking for some guidance on the strategy and approach. The play docs preach the use of non-blocking operations everywhere but in reality the typical web-app hitting a sql database has many blocking calls, and I got the impression from reading the docs that this type of app will perform far from optimally with the default configurations.
"[...] what is the strategy for configuring the execution context and determining the number of threads to provide"
Well, that's the tricky part which depends on your individual requirements.
First of all, you should probably choose a basic profile from the docs (pure asynchronous, highly synchronous or many specific thread pools).
The second step is to fine-tune your setup by profiling and benchmarking your application.
Do I need a separate execution context?
Not necessarily. But it makes sense to use separate execution contexts if you want to trigger all your blocking IO-calls at once and not in a sequential way (so database call B does not have to wait until database call A is finished).
"Why can't I simply configure the default pool to have a sufficient amount of threads (and if I do this, why would I still need to wrap the calls in a Future?)?"
You can, check the docs:
play {
  akka {
    akka.loggers = ["akka.event.slf4j.Slf4jLogger"]
    loglevel = WARNING
    actor {
      default-dispatcher = {
        fork-join-executor {
          parallelism-min = 300
          parallelism-max = 300
        }
      }
    }
  }
}
With this approach, you basically are turning Play into a one-thread-per-request-model. This is not the idea behind Play, but if you're doing a lot of blocking IO calls, it's the simplest approach. In this case, you don't need to wrap your database calls in a Future.
To put it in a nutshell, you basically have three ways to go:
Only use (IO-)technologies whose API calls are non-blocking and asynchronous. This allows you to use a small threadpool / default execution context which suits the nature of Play
Turn Play into a one-thread-per-request Framework by drastically increasing the default execution context. No futures needed, just call your blocking database as always
Create specific execution contexts for your blocking IO-calls and gain fine-grained control of what you are doing
Firstly, before diving in and refactoring your app, you should determine whether this is actually a problem for you. Run some benchmarks (Gatling is superb) and do a few profiles with something like JProfiler. If you can live with the current performance then happy days.
The ideal is to use a reactive driver which would return you a future that then gets passed all the way back to your controller. Unfortunately async is still an open ticket for Slick. Interacting with REST APIs can be made reactive using the PlayWS library, but if you have to go via a library that your 3rd party provides then you're stuck.
So, assuming that none of these are feasible and that you do need to improve performance, the question is what benefit would Play's suggestion have? I think what they're getting at here is that it's useful to partition your threads into those that block and those that can make use of asynchronous techniques.
If, for instance, only some proportion of your requests are long and blocking then with a single thread pool you risk all threads being used for the blocking operations. Your controller would then not be able to handle any new requests, irrespective of whether that request needs to call a blocking service. If you can allocate enough threads that this never happens then no problem.
If, on the other hand, you are hitting your limit for threads then by using two pools you can keep your fast, non-blocking requests snappy. You would have one pool servicing requests in your controller and calling into services which return futures. Some of these futures would actually be performing work using a separate pool of threads, but only for the blocking operations. If there is any portion of your app which could be made reactive, then your controller could take advantage of this while isolating the controller from the blocking operations.
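The two-pool idea can be sketched with the Scala standard library alone. This is a minimal illustration under stated assumptions, not Play's own configuration mechanism: BlockingPoolSketch, blockingQuery and findRow are hypothetical names, the pool size of 8 is an arbitrary placeholder that should come from profiling, and Thread.sleep stands in for a JDBC call or a blocking SDK.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, ExecutionContextExecutorService, Future}

object BlockingPoolSketch {
  // Dedicated pool reserved for blocking calls (size is a placeholder).
  val blockingEc: ExecutionContextExecutorService =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(8))

  // Hypothetical blocking call, e.g. a JDBC query.
  def blockingQuery(id: Int): String = {
    Thread.sleep(50)
    s"row-$id"
  }

  // The blocking work runs on blockingEc; only the cheap map callback
  // runs on the caller's execution context.
  def findRow(id: Int)(implicit callerEc: ExecutionContext): Future[String] =
    Future(blockingQuery(id))(blockingEc).map(_.toUpperCase)
}
```

A controller would call findRow with its usual execution context in implicit scope, so a burst of slow queries can fill the eight blocking threads without occupying the threads that serve fast requests. The pool uses non-daemon threads, so it should be shut down when the application stops.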

Event based design - Futures, Promises vs Akka Persistence

I have multiple use cases which require predefined events to be fired based on certain user actions.
e.g. let's say when NewUser is created in the application, it'll have to call CreateUserInWorkflowSystem and FireEmailToTheUser asynchronously. There are many other business cases of this nature where events will be predefined based on a usecase. I can use Promises/Futures to model these events as below
if 'NewUser' then
call `CreateUserInWorkflowSystem` (which will be Future based API)
call `FireEmailToTheUser` (which will be Future based API)
if 'FileImport' then
call `API3` (which will be Future based call)
call `API4` (which will be Future based call)
All those Future calls will have to log failures somewhere so failed calls can be retried etc. Note that the NewUser call won't be waiting for those Futures (events, per se) to complete.
That was using plain Futures/Promises APIs. However, I am thinking Akka Persistence will be an appropriate fit here, and blocking calls can still run inside Futures. With Akka Persistence, handling failure will be easy as it provides it out of the box. I understand Akka Persistence is still in an experimental stage, but that doesn't seem to be a big concern, as Typesafe generally keeps these new frameworks in an experimental state before promoting them in a future release (the same was true of macros). Given these requirements, do you think Futures/Promises or Akka Persistence is a better fit here?
This is an opinion based question - not the best type to ask on SO. Anyway, trying to answer.
It really depends what you are more comfortable with and what your requirements are. Do you need to scale the system later beyond a single JVM - use Akka. Do you want to keep it more simple - use Futures.
If you use Futures you can store all state and actions to execute in a job queue/db. It's quite reasonable.
If you use Akka Persistence then obviously it will help you with persistence. Akka will make supervision, recovery and retries easier. If your CreateUserInWorkflowSystem action fails, the result is propagated to the supervising actor, which probably restarts the failed actor and makes it retry up to N times. If your supervising actor fails then its supervisor will do the right thing, or eventually the whole app will crash, which is good. With Futures you would have to implement this mechanism yourself and make sure that the application can crash when needed.
If you have completely independent actions then Futures and Actors sound about the same. If you have to chain actions and compose them, then using Futures will be a somewhat more natural thing to do: for comprehensions, etc. In Akka you would have to wait for a message and based on a type of a message perform next action.
Try to mock a simple implementation using both and compare what you like/dislike given your particular application requirements. Overall, both choices are good, but I'm slightly leaning towards actors in this case.
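To give a feel for what "implementing it yourself" means on the Futures side, here is a minimal retry sketch. RetrySketch and retry are hypothetical names, and the sketch makes simplifying assumptions: it retries immediately with no back-off, it does not persist or log failed calls, and it only handles failures reported through the returned Future (not exceptions thrown while calling `op` itself).

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.util.control.NonFatal

object RetrySketch {
  // Hypothetical helper: runs `op` up to `attempts` times in total and
  // returns the first success, or the last failure if every attempt fails.
  def retry[T](attempts: Int)(op: () => Future[T])(
      implicit ec: ExecutionContext): Future[T] =
    op().recoverWith {
      case NonFatal(_) if attempts > 1 => retry(attempts - 1)(op)
    }
}
```

An actor-based design gets comparable behaviour from its supervisor strategy, which is the trade-off described above.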

How does I/O work in Akka?

How does the actor model (in Akka) work when you need to perform I/O (ie. a database operation)?
It is my understanding that a blocking operation will throw an exception (and essentially ruin all concurrency due to the evented nature of Netty, which Akka uses). Hence I would have to use a Future or something similar - however I don't understand the concurrency model.
Can 1 actor be processing multiple messages simultaneously?
If an actor makes a blocking call in a future (ie. future.get()) does that block only the current actor's execution; or will it prevent execution on all actors until the blocking call has completed?
If it blocks all execution, how does using a future assist concurrency (ie. wouldn't invoking blocking calls in a future still amount to creating an actor and executing the blocking call)?
What is the best way to deal with a multi-staged process (ie. read from the database; call a blocking webservice; read from the database; write to the database) where each step is dependent on the last?
The basic context is this:
I'm using a Websocket server which will maintain thousands of sessions.
Each session has some state (ie. authentication details, etc);
The Javascript client will send a JSON-RPC message to the server, which will pass it to the appropriate session actor, which will execute it and return a result.
Execution of the RPC call will involve some I/O and blocking calls.
There will be a large number of concurrent requests (each user will be making a significant amount of requests over the WebSocket connection and there will be a lot of users).
Is there a better way to achieve this?
Blocking operations do not throw exceptions in Akka. You can do blocking calls from an Actor (which you probably want to minimize, but that's another story).
No, one actor instance cannot.
It will not block any other actors. You can influence this by using a specific Dispatcher. Futures use the default dispatcher (normally the global event-driven one), so they run on a thread in a pool. You can choose which dispatcher you want to use for your actors (per actor, or for all). I guess if you really wanted to create a problem you might be able to pass exactly the same (thread-based) dispatcher to futures and actors, but that would take some intent on your part. I guess if you have a huge number of futures blocking indefinitely and the ExecutorService has been configured with a fixed number of threads, you could exhaust the ExecutorService. So that is a lot of 'ifs'. f.get blocks only if the Future has not completed yet. It will block the 'current thread' of the Actor from which you call it (if you call it from an Actor, which is not necessary, by the way).
You do not necessarily have to block. You can use a callback instead of f.get. You can even compose Futures without blocking. Check out the talk by Viktor on 'the promising future of akka' for more details: http://skillsmatter.com/podcast/scala/talk-by-viktor-klang
I would use async communication between the steps (if the steps are meaningful processes on their own), so use an actor for every step, where every actor sends a one-way message to the next, and possibly also one-way messages to some other, non-blocking actor which can supervise the process. This way you could create chains of actors, of which you could make many, and in front of them you could put a load-balancing actor, so that if an actor blocks in one chain, another of the same type in another chain might not. That would also work for your 'context' question: pass off the workload to local actors and chain them up behind a load-balancing actor.
As for Netty (and I assume you mean Remote Actors, because this is the only thing that Netty is used for in Akka), pass off your work as soon as possible to a local actor or a future (with a callback) if you are worried about timing or about preventing Netty from doing its job in some way.
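Composing futures without blocking, as mentioned above for the multi-staged case, might look like the following sketch. It uses the current scala.concurrent API rather than the Akka futures of the time, and readUser, callService and writeResult are hypothetical stages standing in for "read from the database; call a webservice; write to the database".

```scala
import scala.concurrent.{ExecutionContext, Future}

object PipelineSketch {
  // Hypothetical stages; each returns a Future instead of blocking the caller.
  def readUser(id: Int)(implicit ec: ExecutionContext): Future[String] =
    Future(s"user-$id")
  def callService(user: String)(implicit ec: ExecutionContext): Future[Int] =
    Future(user.length)
  def writeResult(user: String, score: Int)(implicit ec: ExecutionContext): Future[String] =
    Future(s"$user:$score")

  // Each step starts only after the previous one has completed successfully;
  // a failure in any step short-circuits the rest.
  def handle(id: Int)(implicit ec: ExecutionContext): Future[String] =
    for {
      user   <- readUser(id)
      score  <- callService(user)
      stored <- writeResult(user, score)
    } yield stored
}
```

The caller can then register a callback with onComplete on the result of handle instead of calling a blocking get, so no thread waits while the steps run.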
Blocking operations will generally not throw exceptions, but waiting on a future (for example by using !! or !!! send methods) can throw a time out exception. That's why you should stick with fire-and-forget as much as possible, use a meaningful time-out value and prefer callbacks when possible.
An Akka actor cannot explicitly process several messages in a row, but you can play with the throughput value via the config file. The actor will then process several messages (i.e. its receive method will be called several times sequentially) if its message queue is not empty: http://akka.io/docs/akka/1.1.3/scala/dispatchers.html#id5
Blocking operations inside an actor will not "block" all actors, but if you share threads among actors (the recommended usage), one of the threads of the dispatcher will be blocked until operations resume. So try composing futures as much as possible, and beware of the time-out value.
3 and 4. I agree with Raymond's answers.
What Raymond and paradigmatic said, but also, if you want to avoid starving the thread pool, you should wrap any blocking operations in scala.concurrent.blocking.
It's of course best to avoid blocking operations, but sometimes you need to use a library that blocks. If you wrap said code in blocking, it will let the execution context know you may be blocking this thread so it can allocate another one if needed.
The problem is worse than paradigmatic describes since if you have several blocking operations you may end up blocking all threads in the thread pool and have no free threads. You could end up with deadlock if all your threads are blocked on something that won't happen until another actor/future gets scheduled.
Here's an example:
import scala.concurrent.blocking
...
Future {
  val image = blocking { load_image_from_potentially_slow_media() }
  val enhanced = image.enhance()
  blocking {
    if (oracle.queryBetter(image, enhanced)) {
      write_new_image(enhanced)
    }
  }
  enhanced
}
Documentation is here.