Best practice to cleanup resources acquired by an Akka actor - scala

My Akka actor acquires some resources, which should be released after the jobs are done, as shown by the code snippet below.
The issue is how to make sure the resources are released in case of exceptions.
The first idea that occurred to me is to assign the acquired resources to a var and release them in the def postStop() hook as a safeguard.
Since this is such a common requirement, I'm wondering if there is a best practice for it?
override def receive: Receive = {
  case FILEIO(path) => {
    val ios = fs.create(new Path(path))
    // ios.read/write ..
    // exception could occur
    ios.close()
  }
}

If you use Scala 2.13 you can try scala.util.Using. It will close the resource automatically, even if an exception occurs.
import scala.util.{Failure, Success, Using}

val resultTry = Using(fs.create(new Path(path))) { ios =>
  // ios.read/write ..
  // exception could occur
}
resultTry match {
  case Success(ioOperationsResult) => ...
  case Failure(ioException) => ...
}
Also, you should keep blocking IO operations off the default Akka dispatcher, for example by running them on a dedicated dispatcher as sketched below.
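For example, a dedicated dispatcher for blocking IO could be declared in application.conf and looked up from the actor. This is only a sketch; the dispatcher name and pool size are illustrative:
# application.conf (illustrative)
blocking-io-dispatcher {
  type = Dispatcher
  executor = "thread-pool-executor"
  thread-pool-executor {
    fixed-pool-size = 16
  }
}
Futures doing file IO can then be run with system.dispatchers.lookup("blocking-io-dispatcher") as their ExecutionContext, keeping the default dispatcher free for message processing.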

Use Akka Stream's File IO sinks and sources. The FileIO tools use actors under the covers, transparently close resources when errors occur, and run on a dedicated dispatcher. The latter is important because blocking IO operations should be isolated from the rest of the system. A simple example from the documentation:
import java.nio.file.Paths
import scala.concurrent.Future
import akka.stream.IOResult
import akka.stream.scaladsl._

// an implicit Materializer (or ActorSystem in newer Akka versions) must be in scope for run()
val file = Paths.get("example.csv")
val foreach: Future[IOResult] = FileIO.fromPath(file).to(Sink.ignore).run()

I think you should reconsider using actors for heavy IO access.
Given that you're using the default dispatcher, you might block it with IO operations, and this will slow down the whole actor system.
If you still wish to proceed, I recommend the following:
Create a new dispatcher and assign it to your actor. See the docs for reference.
Use a try/finally block to close resources (yes, plain old try/catch/finally):
override def receive: Receive = {
  case FILEIO(path) => {
    val ios = fs.create(new Path(path))
    try {
      // ios.read/write ..
      // exception could occur
    } finally {
      ios.close()
    }
  }
}
Alternatively, extract your IO work into an asynchronous context, such as a Future executed on a dedicated execution context, and use the pipeTo pattern to process the results with an actor, as sketched below.
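A sketch of that pipeTo approach, assuming Scala 2.13 (for scala.util.Using) and a dedicated "blocking-io-dispatcher" configured in application.conf; FILEIO, fs and Path come from the question, while FileResult and the dispatcher name are illustrative:
import akka.actor.Actor
import akka.pattern.pipe
import scala.concurrent.{ExecutionContext, Future}
import scala.util.Using

case class FileResult(bytesWritten: Long)

class FileIoActor extends Actor {
  // run the blocking work on a dedicated execution context, not the actor's default dispatcher
  implicit val blockingEc: ExecutionContext =
    context.system.dispatchers.lookup("blocking-io-dispatcher")

  override def receive: Receive = {
    case FILEIO(path) =>
      val replyTo = sender() // capture the sender before going asynchronous
      Future {
        Using.resource(fs.create(new Path(path))) { ios =>
          // ios.read/write ..
          FileResult(0L) // whatever the caller needs; the stream is closed even on failure
        }
      }.pipeTo(replyTo) // a failed Future arrives at replyTo as a Status.Failure
  }
}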

Related

scala ZIO foreachPar

I'm new to parallel programming and ZIO, and I'm trying to get data from an API with parallel requests.
import sttp.client._
import zio.{Task, ZIO}

ZIO.foreach(files) { file =>
  getData(file)
  Task(file.getName)
}

def getData(file: File) = {
  val data: String = readData(file)
  val request = basicRequest.body(data).post(uri"$url")
    .headers(content -> "text", char -> "utf-8")
    .response(asString)
  implicit val backend: SttpBackend[Identity, Nothing, NothingT] = HttpURLConnectionBackend()
  request.send().body
  request.Response match {
    case Success(value) => {
      val src = new PrintWriter(new File(filename))
      src.write(value.toString)
      src.close()
    }
    case Failure(exception) => log error
  }
}
When I execute the program sequentially, it works as expected.
If I try to run it in parallel by changing ZIO.foreach to ZIO.foreachPar, the program terminates prematurely.
I get that I'm missing something basic here; any help figuring out the issue is appreciated.
Generally speaking, I wouldn't recommend mixing synchronous blocking code, as you have here, with asynchronous non-blocking code, which is the primary role of ZIO. There are some great talks out there on how to effectively use ZIO with the "world", so to speak.
There are two key points I would make: one, ZIO lets you manage resources effectively by attaching allocation and finalization steps, and two, "effects", which we could say are "things which actually interact with the world", should be wrapped in the tightest scope possible*.
So let's go through this example a bit. First of all, I would not suggest using the default Identity-backed backend with ZIO; I would recommend using the AsyncHttpClientZioBackend instead.
import java.io.{File, IOException, PrintWriter}

import sttp.client._
import sttp.client.asynchttpclient.zio.AsyncHttpClientZioBackend
import zio.blocking.effectBlocking
import zio.{Managed, Task, UIO, ZIO, ZManaged}

// Extract the common elements of the request
val baseRequest = basicRequest.post(uri"$url")
  .headers(content -> "text", char -> "utf-8")
  .response(asString)

// Produces a writer wrapped in a `Managed`, so it can be properly
// closed after being used
def managedWriter(filename: String): Managed[IOException, PrintWriter] =
  ZManaged.fromAutoCloseable(UIO(new PrintWriter(new File(filename))))

// AsyncHttpClientZioBackend() returns an effect which produces an `SttpBackend`,
// so we flatMap over it to extract the backend.
val program = AsyncHttpClientZioBackend().flatMap { implicit backend =>
  ZIO.foreachPar(files) { file =>
    for {
      // Wrap the synchronous read in an effect that runs on the "blocking"
      // thread pool instead of blocking the main one.
      data <- effectBlocking(readData(file))
      // `send` returns a `Task` because it uses the implicit backend in scope
      resp <- baseRequest.body(data).send()
      // Build the managed writer, then `use` it; the writer is closed
      // automatically when `use` finishes.
      _ <- managedWriter("").use(w => Task(w.write(resp.body.toString)))
    } yield ()
  }
}
At this point you just have the program, which you will need to run using one of the unsafe methods or, if you are using a zio.App, through its main method.
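A minimal sketch of the zio.App route, assuming ZIO 1.x (where exitCode is available) and that program is the effect built above:
import zio.{App, ExitCode, URIO}

object Main extends App {
  def run(args: List[String]): URIO[zio.ZEnv, ExitCode] =
    program.exitCode // `program` is the effect built above
}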
* Not always possible or convenient, but it is useful because it prevents resource hogging by yielding tasks back to the runtime for scheduling.
When you use a purely functional IO library like ZIO, you must not call any side-effecting functions (like getData) directly; they should only be called inside factory methods like Task.effect or Task.apply.
ZIO.foreach(files) { file =>
  Task {
    getData(file)
    file.getName
  }
}

Possible to offload route completion DSL to actors in akka-http?

I have become interested in the akka-http implementation, but one thing that strikes me as kind of an anti-pattern is the creation of a DSL for all routing, parsing of parameters, error handling and so on. The examples given in the documentation are trivial in the extreme. However, I saw the route of a real product on the market and it was a massive 10k-line file with nesting many levels deep and a ton of business logic in the route. Real-world systems have to deal with users passing bad parameters, not having the right permissions and so on, so the simple DSL explodes fast in real life.
To me the optimal solution would be to hand off route completion to actors, each with the same API, which would then do whatever is needed to complete the route. This would spread out the logic and enable maintainable code, but after hours I have been unable to manage it. With the low-level API I can pass off the HttpRequest and handle it the old way, but that leaves me without most of the tools in the DSL. So is there a way I could pass something to an actor that would enable it to continue the DSL at that point, handling route-specific stuff? I.e. I am talking about something like this:
class MySlashHandler() extends Actor {
  def receive = {
    case ctxt: ContextOfSomeKind =>
      decodeRequest {
        // unmarshal with in-scope unmarshaller
        entity(as[Order]) { order =>
          sender ! "Order received"
        }
        context.stop(self)
      }
  }
}

val route =
  pathEndOrSingleSlash {
    get { ctxt =>
      val actor = actorSystem.actorOf(Props(classOf[MySlashHandler]))
      complete(actor ? ctxt)
    }
  }
Naturally that won't even compile. Despite my best efforts I haven't found a type for ContextOfSomeKind, or a way to re-enter the DSL once I am inside the actor. It could be this isn't possible. If not, I don't think I like the DSL, because it encourages what I would consider horrible programming methodology. Then the only problem with the low-level API is getting access to the entity marshallers, but I would rather do that than build a massive app in a single source file.
Offload Route Completion
Answering your question directly: a Route is nothing more than a function. The definition being:
type Route = (RequestContext) => Future[RouteResult]
Therefore you can simply write a function that does what you are looking for, e.g. sends the RequestContext to an Actor and gets back the result:
import akka.pattern.{ask, pipe}

class MySlashHandler extends Actor {
  import context.dispatcher

  // a Route is itself a function RequestContext => Future[RouteResult]
  val routeHandler: Route = complete("Actor Complete")

  override def receive: Receive = {
    case requestContext: RequestContext => routeHandler(requestContext) pipeTo sender()
  }
}

val actorRef: ActorRef = actorSystem actorOf (Props[MySlashHandler])

// an implicit Timeout is needed for the ask (?) below
val route: Route =
  (requestContext: RequestContext) => (actorRef ? requestContext).mapTo[RouteResult]
Actors Don't Solve Your Problem
The problem you are trying to deal with is the complexity of the real world and modelling that complexity in code. I agree this is an issue, but an Actor isn't your solution. There are several reasons to avoid Actors for the design solution you seek:
There are compelling arguments against putting business logic in Actors.
Akka-http uses akka-stream under the hood. Akka stream uses Actors under the hood. Therefore, you are trying to escape a DSL based on composable Actors by using Actors. Water isn't usually the solution for a drowning person...
The akka-http DSL provides a lot of compile time checks that are washed away once you revert to the non-typed receive method of an Actor. You'll get more run time errors, e.g. dead letters, by using Actors.
Route Organization
As stated previously: a Route is a simple function, the building block of scala. Just because you saw an example of a sloppy developer keeping all of the logic in 1 file doesn't mean that is the only solution.
I could write all of my application code in the main method, but that doesn't make it good functional programming design. Similarly, I could write all of my collection logic in a single for-loop, but I usually use filter/map/reduce.
The same organizing principles apply to Routes. Just from the most basic perspective you could break up the Route logic according to method type:
//GetLogic.scala
object GetLogic {
  val getRoute = get {
    complete("get received")
  }
}

//PutLogic.scala
object PutLogic {
  val putRoute = put {
    complete("put received")
  }
}
Another common organizing principle is to keep your business logic separate from your Route logic:
object BusinessLogic {
  type UserName = String
  type UserId = String

  //isolated business logic
  def dbLookup(userId: UserId): UserName = ???

  val businessRoute = get {
    entity(as[String]) { userId => complete(dbLookup(userId)) }
  }
}
These can then all be combined in your main method:
val finalRoute : Route =
GetLogic.getRoute ~ PutLogic.putRoute ~ BusinessLogic.businessRoute
The routing DSL can be misleading because it sort of looks like magic at times, but underneath it's just plain old functions, which Scala can organize and isolate just fine...
I encountered a problem like this last week as well. Eventually I ended up at this blog and decided to go the same way as described there.
I created a custom directive which makes it possible for me to pass request contexts to Actors.
def imperativelyComplete(inner: ImperativeRequestContext => Unit): Route = { ctx: RequestContext =>
  val p = Promise[RouteResult]()
  inner(new ImperativeRequestContext(ctx, p))
  p.future
}
Now I can use this in my Routes file like this:
val route =
  put {
    imperativelyComplete { ctx =>
      val actor = actorSystem.actorOf(Props(classOf[RequestHandler], ctx))
      actor ! HandleRequest
    }
  }
With my RequestHandler Actor looking like the following:
class RequestHandler(ctx: ImperativeRequestContext) extends Actor {
  def receive: Receive = {
    case handleRequest: HandleRequest =>
      someActor ! DoSomething() // will return SomethingDone to the sender
    case somethingDone: SomethingDone =>
      ctx.complete("Done handling request")
      context.stop(self)
  }
}
I hope this brings you into the direction of finding a better solution. I am not sure if this solution should be the way to go, but up until now it works out really well for me.
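For reference, the ImperativeRequestContext wrapper itself is not shown above. A minimal sketch of what it might look like, inferred from how the directive and the handler use it (the blog's actual class may differ):
import akka.http.scaladsl.marshalling.ToResponseMarshallable
import akka.http.scaladsl.server.{RequestContext, RouteResult}
import scala.concurrent.{ExecutionContext, Promise}

class ImperativeRequestContext(ctx: RequestContext, promise: Promise[RouteResult]) {
  private implicit val ec: ExecutionContext = ctx.executionContext

  // completing the wrapper fulfills the Promise, which completes the Route's Future[RouteResult]
  def complete(obj: ToResponseMarshallable): Unit =
    ctx.complete(obj).onComplete(promise.complete)

  def fail(error: Throwable): Unit =
    ctx.fail(error).onComplete(promise.complete)
}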

Asynchronous message handling with Akka's Actors

In my project I'm using Akka's Actors. By definition Actors are thread-safe, which means that in the Actor's receive method
def receive = {
  case msg =>
    // some logic here
}
only one thread at a time processes the commented piece of code. However, things are starting to get more complicated when this code is asynchronous:
def receive = {
  case msg =>
    Future {
      // some logic here
    }
}
If I understand this correctly, in this case only the Future construct will be synchronized, so to speak, and not the logic inside the Future.
Of course I may block the Future:
def receive = {
  case msg =>
    val future = Future {
      // some logic here
    }
    Await.result(future, 10.seconds)
}
which solves the problem, but I think we all should agree that this is hardly an acceptable solution.
So this is my question: how can I retain the thread-safe nature of actors in case of asynchronous computing without blocking Scala's Futures?
How can I retain the thread-safe nature of actors in case of asynchronous computing without blocking Scala's Futures?
This assumption only holds if you modify the internal state of the actor inside the Future, which seems to be a design smell in the first place. Use the Future for computation only, working on a copy of the data, and pipe the result of the computation back to the actor using pipeTo. Once the actor receives the result of the computation, it can safely operate on it:
import akka.pattern.pipe
import scala.concurrent.Future

case class ComputationResult(s: String)

def receive = {
  case ComputationResult(s) => // modify internal state here
  case msg =>
    Future {
      // Compute here, don't modify state
      ComputationResult("finished computing")
    }.pipeTo(self)
}
I think you need to "resolve" the db query first and then use the result to return a new Future. If the db query returns a Future[A], you can use flatMap to operate on the A and return a new Future. Something along the lines of:
def receive = {
  case msg =>
    val futureResult: Future[Result] = ...
    futureResult.flatMap { result: Result =>
      // ....
      // return a new Future
    }
}
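A slightly more concrete sketch of that shape; queryDb, enrich, Query and Record are hypothetical placeholders standing in for the elided parts:
import akka.pattern.pipe
import scala.concurrent.Future

def receive = {
  case msg: Query =>
    import context.dispatcher                              // or a dedicated execution context
    val futureResult: Future[Record] = queryDb(msg)        // non-blocking call returning a Future
    val finalResult: Future[String] = futureResult.flatMap { record =>
      enrich(record)                                       // returns another Future
    }
    finalResult.pipeTo(self)                               // hand the value back to the actor as a message
}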
The simplest solution here is to turn the actor into a state machine (use Akka FSM) and do the following:
dispatch a future for the MongoDB request;
use the reference to your own actor to communicate with it from the future;
tell the resulting message back to the actor from the future.
Depending on the context you might have to do some more work to get a proper response.
But this has the advantage that you process the message with the actor's state, and you can mutate that state as you please, because you own the thread. A sketch of this approach follows below.
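A hedged sketch of that state-machine idea using context.become rather than Akka FSM; Find, QueryFinished and blockingMongoFind are illustrative names, not an existing API:
import akka.actor.{Actor, ActorRef}
import akka.pattern.pipe
import scala.concurrent.Future

case class Find(id: String)
case class QueryFinished(result: String, replyTo: ActorRef)

class QueryActor extends Actor {
  import context.dispatcher // ideally a dedicated dispatcher for the blocking call

  override def receive: Receive = {
    case Find(id) =>
      val replyTo = sender() // capture before going asynchronous
      Future(blockingMongoFind(id)).map(QueryFinished(_, replyTo)).pipeTo(self)
      context.become(waiting)
  }

  def waiting: Receive = {
    case QueryFinished(result, replyTo) =>
      replyTo ! result        // safe: we are back on the actor's own thread
      context.become(receive) // ready for the next request
  }
}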

How to run futures on the current actor's dispatcher in Akka

Akka's documentation warns:
When using future callbacks, such as onComplete, onSuccess, and onFailure, inside actors you need to carefully avoid closing over the containing actor’s reference, i.e. do not call methods or access mutable state on the enclosing actor from within the callback
It seems to me that if I could get the Future which wants to access the mutable state to run on the same dispatcher that arranges for mutual exclusion of threads handling actor messages then this issue could be avoided. Is that possible? (Why not?)
The ExecutionContext provided by context.dispatcher is not tied to the actor messages dispatcher, but what if it were? i.e.
class MyActorWithSafeFutures extends Actor {
  implicit def safeDispatcher = context.dispatcherOnMessageThread // hypothetical API
  var successCount = 0
  var failureCount = 0

  override def receive: Receive = {
    case MakeExternalRequest(req) =>
      val response: Future[Response] = someClient.makeRequest(req)
      response.onComplete {
        case Success(_) => successCount += 1
        case Failure(_) => failureCount += 1
      }
      response pipeTo sender()
  }
}
Is there any way to do that in Akka?
(I know that I could convert the above example to do something like self ! IncrementSuccess, but this question is about mutating actor state from Futures, rather than via messages.)
It looks like I might be able to implement this myself, using code like the following:
class MyActorWithSafeFutures extends Actor {
  implicit val executionContext: ExecutionContextExecutor = new ExecutionContextExecutor {
    override def execute(runnable: Runnable): Unit = {
      self ! runnable
    }
    override def reportFailure(cause: Throwable): Unit = {
      throw new Error("Unhandled throwable", cause)
    }
  }

  override def receive: Receive = {
    case runnable: Runnable => runnable.run()
    // ... other cases here
  }
}
Would that work? Why doesn't Akka offer that - is there some huge drawback I'm not seeing?
(See https://github.com/jducoeur/Requester for a library which does just this in a limited way -- for Asks only, not for all Future callbacks.)
Your actor is executing its receive under one of the dispatcher's threads, and you want to spin off a Future that's firmly attached to this particular thread? In that case the system can't reuse this thread to run a different actor, because that would mean the thread was unavailable when you wanted to execute the Future. If it happened to use that same thread to execute someClient, you might deadlock with yourself. So this thread can no longer be used freely to run other actors - it has to belong to MySafeActor.
And no other threads can be allowed to freely run MySafeActor - if they were, two different threads might try to update successCount at the same time and you'd lose data (e.g. if the value is 0 and two threads both try to do successCount += 1, the value can end up as 1 rather than 2). So to do this safely, MySafeActor has to have a single Thread that's used for itself and its Future. So you end up with MySafeActor and that Future being tightly, but invisibly, coupled. The two can't run at the same time and could deadlock against each other. (It's still possible for a badly-written actor to deadlock against itself, but the fact that all the code using that actor's "imaginary mutex" is in a single place makes it easier to see potential problems.)
You could use traditional multithreading techniques - mutexes and the like - to allow the Future and MySafeActor to run concurrently. But what you really want is to encapsulate successCount in something that can be used concurrently but safely - some kind of... Actor?
TL;DR: the Future and the Actor either 1) may not run concurrently, in which case you may deadlock, 2) may run concurrently, in which case you will corrupt data, or 3) access state in a concurrency-safe way, in which case you're reimplementing Actors.
You could use a PinnedDispatcher for your MyActorWithSafeFutures actor class, which creates a thread pool with exactly one thread for each instance of the given class, and use context.dispatcher as the execution context for your Future.
To do this, put something like this in your application.conf:
akka {
  ...
}

my-pinned-dispatcher {
  executor = "thread-pool-executor"
  type = PinnedDispatcher
}
and to create your actor:
actorSystem.actorOf(
  Props(classOf[MyActorWithSafeFutures]).withDispatcher("my-pinned-dispatcher"),
  "myActorWithSafeFutures"
)
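Inside the actor, the pinned dispatcher can then double as the Future's execution context. A rough sketch, reusing the question's MakeExternalRequest and someClient (and keeping in mind the caveat quoted further below: callbacks can still interleave between messages, even though they no longer race on the thread):
import akka.actor.Actor
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success}

class MyActorWithSafeFutures extends Actor {
  // with the pinned dispatcher this is a single thread dedicated to this actor
  implicit val ec: ExecutionContext = context.dispatcher

  var successCount = 0
  var failureCount = 0

  override def receive: Receive = {
    case MakeExternalRequest(req) =>
      someClient.makeRequest(req).onComplete {
        case Success(_) => successCount += 1
        case Failure(_) => failureCount += 1
      }
  }
}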
Although, what you are trying to achieve completely breaks the purpose of the actor model: actor state should be encapsulated, and actor state changes should be driven by incoming messages.
This does not answer your question directly, but rather offers an alternative solution using Akka Agents:
import akka.agent.Agent
import akka.pattern.pipe
import scala.concurrent.Future
import scala.util.{Failure, Success}

class MyActorWithSafeFutures extends Actor {
  import context.dispatcher

  val successCount = Agent(0)
  val failureCount = Agent(0)

  def doSomethingWithPossiblyStaleCounts() = {
    val (s, f) = (successCount.get(), failureCount.get())
    statisticsCollector ! Ratio(f / (s + f))
  }

  def doSomethingWithCurrentCounts() = {
    val (successF, failureF) = (successCount.future(), failureCount.future())
    val ratio: Future[Ratio] = for {
      s <- successF
      f <- failureF
    } yield Ratio(f / (s + f))
    ratio pipeTo statisticsCollector
  }

  override def receive: Receive = {
    case MakeExternalRequest(req) =>
      val response: Future[Response] = someClient.makeRequest(req)
      response.onComplete {
        case Success(_) => successCount.send(_ + 1)
        case Failure(_) => failureCount.send(_ + 1)
      }
      response pipeTo sender()
  }
}
The catch is that if you want to operate on the counts that would result if you were using @volatile, then you need to operate inside a Future; see doSomethingWithCurrentCounts().
If you are fine with having values which are eventually consistent (there might be pending updates scheduled for the Agents), then something like doSomethingWithPossiblyStaleCounts() is fine.
@rkuhn explains why this would be a bad idea on the akka-user list:
My main consideration here is that such a dispatcher would make it very convenient to have multiple concurrent entry points into the Actor’s behavior, where with the current recommendation there is only one—the active behavior. While classical data races are excluded by the synchronization afforded by the proposed ExecutionContext, it would still allow higher-level races by suspending a logical thread and not controlling the intermediate execution of other messages. In a nutshell, I don’t think this would make the Actor easier to reason about, quite the opposite.

Parallel file processing in Scala

Suppose I need to process files in a given folder in parallel. In Java I would create a FolderReader thread to read file names from the folder and a pool of FileProcessor threads. FolderReader reads file names and submits the file processing function (Runnable) to the pool executor.
In Scala I see two options:
create a pool of FileProcessor actors and schedule a file processing function with Actors.Scheduler.
create an actor for each file name while reading the file names.
Does it make sense? What is the best option?
Depending on what you're doing, it may be as simple as
for (file <- files.par) {
  // process the file
}
I suggest with all my energy that you keep as far away from threads as you can. Luckily we have better abstractions which take care of what's happening underneath, and in your case it appears to me that you do not need to use actors (though you can); you can use a simpler abstraction called Futures. They are part of the Akka open-source library, and I think they will be part of the Scala standard library in the future as well.
A Future[T] is simply something that will return a T in the future.
All you need to run a future is an implicit ExecutionContext, which you can derive from a Java executor service. Then you will be able to enjoy the elegant API and the fact that a Future is a monad, i.e. you can transform collections into collections of futures, collect the results and so on. I suggest you take a look at http://doc.akka.io/docs/akka/2.0.1/scala/futures.html
import java.util.concurrent.Executors
import akka.dispatch.{Await, ExecutionContext, Future}
import akka.util.duration._

object TestingFutures {
  implicit val executorService = Executors.newFixedThreadPool(20)
  implicit val executorContext = ExecutionContext.fromExecutorService(executorService)

  def testFutures(myList: List[String]): List[String] = {
    val listOfFutures: Future[List[String]] = Future.traverse(myList) { aString =>
      Future {
        aString.reverse
      }
    }
    val result: List[String] = Await.result(listOfFutures, 1 minute)
    result
  }
}
There's a lot going on here:
I am using Future.traverse, which takes as its first parameter an M[T] <: Traversable[T] and as its second parameter a T => Future[T] (or, if you prefer, a Function1[T, Future[T]]), and returns a Future[M[T]].
I am using the Future.apply method to create an anonymous class of type Future[T]
There are many other reasons to look at Akka futures.
Futures can be mapped because they are monads, i.e. you can chain Future executions:
Future { 3 }.map { _ * 2 }.map { _.toString }
Futures have callbacks: future.onComplete, onSuccess, onFailure, andThen, etc.
Futures support not only traverse, but also for comprehensions, as sketched below.
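For example, a sketch using the standard scala.concurrent API (the Akka 2.0 futures linked above have the same shape); the file paths are illustrative:
import java.io.File
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

def fileSize(f: File): Future[Long] = Future(f.length())

// start the futures first so they run concurrently,
// then combine their results with a for comprehension
val fa = fileSize(new File("/tmp/a.txt"))
val fb = fileSize(new File("/tmp/b.txt"))
val total: Future[Long] = for {
  a <- fa
  b <- fb
} yield a + b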
Ideally you should use two actors. One for reading the list of files, and one for actually reading the file.
You start the process by simply sending a single "start" message to the first actor. The actor can then read the list of files, and send a message to the second actor. The second actor then reads the file and processes the contents.
Having multiple actors, which might seem complicated, is actually a good thing, in the sense that you have a bunch of objects communicating with each other, as in a theoretical OO system; a sketch of this two-actor layout follows the note below.
Edit: you REALLY shouldn't be doing concurrent reading of a single file.
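A minimal sketch of that two-actor layout; the names, messages and wiring are illustrative, not from the original answer:
import java.io.File
import akka.actor.{Actor, ActorRef, ActorSystem, Props}

case class ScanFolder(path: String)
case class ProcessFile(file: File)

class FileProcessor extends Actor {
  def receive = {
    case ProcessFile(file) =>
      // read and process a single file here
      println(s"processing ${file.getName}")
  }
}

class FolderReader(processor: ActorRef) extends Actor {
  def receive = {
    case ScanFolder(path) =>
      // fan the file names out; the processor handles them one message at a time
      new File(path).listFiles().foreach(f => processor ! ProcessFile(f))
  }
}

// wiring; a router in front of FileProcessor would give real parallelism
val system = ActorSystem("files")
val processor = system.actorOf(Props[FileProcessor], "processor")
val reader = system.actorOf(Props(classOf[FolderReader], processor), "reader")
reader ! ScanFolder("/tmp")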
I was going to write up exactly what @Edmondo1984 did, except he beat me to it. :) I second his suggestion in a big way. I'll also suggest that you read the documentation for Akka 2.0.2. As well, I'll give you a slightly more concrete example:
import akka.dispatch.{ExecutionContext, Future, Await}
import akka.util.duration._
import java.util.concurrent.Executors
import java.io.File

val execService = Executors.newCachedThreadPool()
implicit val execContext = ExecutionContext.fromExecutorService(execService)

val tmp = new File("/tmp/")
val files = tmp.listFiles()
val workers = files.map { f =>
  Future {
    f.getAbsolutePath()
  }
}.toSeq

val result = Future.sequence(workers)
result.onSuccess {
  case filenames =>
    filenames.foreach { fn =>
      println(fn)
    }
}

// Artificial just to make things work for the example
Thread.sleep(100)
execContext.shutdown()
Here I use sequence instead of traverse, but the difference is going to depend on your needs.
Go with the Future, my friend; the Actor is just a more painful approach in this instance.
But if we use actors, what's wrong with that?
If we have to read / write to some property file, here is my Java example, but still with Akka Actors.
Let's say we have an actor ActorFile that represents one file. Hm.. probably it can not represent one file, right? (It would be nice if it could.) So then it represents several files, like a PropertyFilesActor.
Why not use something like this:
public class PropertyFilesActor extends UntypedActor {
    Map<String, String> filesContent = new LinkedHashMap<String, String>();

    { // here we should use real files of course
        filesContent.put("file1.xml", "");
        filesContent.put("file2.xml", "");
    }

    @Override
    public void onReceive(Object message) throws Exception {
        if (message instanceof WriteMessage) {
            WriteMessage writeMessage = (WriteMessage) message;
            String content = filesContent.get(writeMessage.fileName);
            String newContent = content + writeMessage.stringToWrite;
            filesContent.put(writeMessage.fileName, newContent);
        }
        else if (message instanceof ReadMessage) {
            ReadMessage readMessage = (ReadMessage) message;
            String currentContent = filesContent.get(readMessage.fileName);
            // Send the current content back to the sender
            getSender().tell(new ReadMessage(readMessage.fileName, currentContent), getSelf());
        }
        else unhandled(message);
    }
}
...a message will go with a parameter (fileName).
It has its own inbox, accepting messages like:
WriteLine(fileName, string)
ReadLine(fileName, string)
Those messages are stored in the inbox in order, one after another. The actor does its work by receiving messages from the box, storing/reading, and meanwhile sending feedback (sender ! message) back.
Thus, say we write to a property file and then show its content on a web page. We can start rendering the page right after we send the message to store the data in the file, and as soon as we receive the feedback, update part of the page with the data from the just-updated file (via Ajax).
Well, grab your files and stick them in a parallel structure
scala> new java.io.File("/tmp").listFiles.par
res0: scala.collection.parallel.mutable.ParArray[java.io.File] = ParArray( ... )
Then...
scala> res0 map (_.length)
res1: scala.collection.parallel.mutable.ParArray[Long] = ParArray(4943, 1960, 4208, 103266, 363 ... )