Scala parallel execution

I am working on a requirement to get stats about files stored in Linux using Scala.
We will pass the root directory as input and our code will get the complete list of subdirectories under that root directory.
Then, for each directory in the list, I will get the list of files, and for each file I will get the owner, group, permissions, last-modified time, creation time and last-access time.
The problem is: how can I process the directories list in parallel to get the stats of the files stored in each directory?
In the production environment we have 100000+ folders inside the root folder, so my list holds 100000+ entries.
How can I parallelize my operation (file stats) over this list?
Since I am new to Scala, please help me with this requirement.
Sorry for posting without a code snippet.
Thanks.
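For context before the answer below: the simplest baseline for spreading per-directory stat collection across cores in Scala 2.12 is a parallel collection over the directory list. A minimal sketch (the path handling and attribute selection are purely illustrative; the answer below takes a different, actor-based approach):
import java.io.File
import java.nio.file.Files
import scala.util.Try

object ParStatsSketch {
  def main(args: Array[String]): Unit = {
    val root = new File(args(0))
    // Illustrative: only the immediate sub-directories of the root.
    val dirs = Option(root.listFiles()).toVector.flatten.filter(_.isDirectory)

    // .par fans the per-directory work out over the default fork-join pool.
    val stats = dirs.par.flatMap { dir =>
      Option(dir.listFiles()).toVector.flatten.map { f =>
        val owner = Try(Files.getOwner(f.toPath).toString).getOrElse("")
        val modified = Try(Files.getLastModifiedTime(f.toPath).toString).getOrElse("")
        (f.getPath, owner, modified)
      }
    }
    stats.seq.foreach(println)
  }
}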

I ended up using Akka actors.
I made assumptions about your desired output so that the program would be simple and fast. The assumptions I made are that the output is JSON, the hierarchy is not preserved, and that multiple output files are acceptable. If you don't like JSON, you can replace it with something else, but the other two assumptions are important for keeping the current speed and simplicity of the program.
There are some command line parameters you can set. If you don't set them, then defaults will be used. The defaults are contained in Main.scala.
The command line parameters are as follows:
(0) the root directory you are starting from; (no default)
(1) the timeout interval (in seconds) for all the timeouts in this program; (default is 60)
(2) the number of printer actors to use; this will be the number of log files created; (default is 50)
(3) the tick interval to use for the monitor actor, measured in writes per printer between heartbeats; (default is 500)
For the timeout, keep in mind this is also the interval the program waits for at the end of the run. So if you run a small job and wonder why it takes a minute to complete, that is because the program waits for the timeout interval to elapse before shutting down.
Because you are running such a large job, it is possible that the default timeout of 60 seconds is too small. If you are getting exceptions complaining about a timeout, increase the timeout value.
Please note that if your tick interval is set too high, there is a chance your program will close prematurely, because the printers will then send heartbeats to the monitor too rarely.
To run, just start sbt in the project folder and type
runMain Main <canonical path of root directory>
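For example, with a hypothetical root directory and all the defaults overridden (a timeout of 120 seconds, 25 printer actors, a tick interval of 500), the invocation would look like:
runMain Main /data/root 120 25 500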
I couldn't figure out how to get the group of a File in Java. You'll need to research that and add the relevant code to Entity.scala and TraverseActor.scala.
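For reference, java.nio.file does expose the group through the POSIX attribute view, so a minimal sketch (following the same Try-with-empty-default style used for the other attributes in TraverseActor.scala) might look like:
import java.nio.file.Files
import java.nio.file.attribute.PosixFileAttributes
import scala.util.Try

// Group of a file via the POSIX attribute view; falls back to "" on failure,
// mirroring how owner and permissions are handled in TraverseActor.scala.
def groupOf(f: java.io.File): String =
  Try(Files.readAttributes(f.toPath, classOf[PosixFileAttributes]).group.getName).toOption.getOrElse("")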
Also f.list() in TraverseActor.scala was sometimes coming back as null, which was why I wrapped it in an Option. You'll have to debug that issue to make sure you aren't failing silently on certain files.
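For what it's worth, File.list is documented to return null when the pathname is not a directory or an I/O error occurs. If the Option turns out to be hiding real failures, a variant that reports the reason instead of silently treating the listing as empty could look like this (the helper name is illustrative):
import java.io.File
import scala.util.{Failure, Success, Try}

// Lists the children of a directory, reporting why a listing failed
// instead of silently returning an empty result.
def listChildren(dir: File): IndexedSeq[File] =
  Try(Option(dir.list()).getOrElse(sys.error(s"list() returned null for $dir"))) match {
    case Success(names) => names.toIndexedSeq.map(n => new File(dir, n))
    case Failure(e) =>
      println(s"Could not list $dir: ${e.getMessage}")
      IndexedSeq.empty
  }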
Now, here are the contents of all the files.
build.sbt
name := "stackoverflow20191110"
version := "0.1"
scalaVersion := "2.12.1"
libraryDependencies ++= Seq(
"io.circe" %% "circe-core",
"io.circe" %% "circe-generic",
"io.circe" %% "circe-parser"
).map(_ % "0.12.2")
libraryDependencies += "com.typesafe.akka" %% "akka-actor" % "2.4.16"
Entity.scala
import io.circe.Encoder
import io.circe.generic.semiauto._
sealed trait Entity {
def path: String
def owner: String
def permissions: String
def lastModifiedTime: String
def creationTime: String
def lastAccessTime: String
def hashCode: Int
}
object Entity {
implicit val entityEncoder: Encoder[Entity] = deriveEncoder
}
case class FileEntity(path: String, owner: String, permissions: String, lastModifiedTime: String, creationTime: String, lastAccessTime: String) extends Entity
object FileEntity {
implicit val fileEntityEncoder: Encoder[FileEntity] = deriveEncoder
}
case class DirectoryEntity(path: String, owner: String, permissions: String, lastModifiedTime: String, creationTime: String, lastAccessTime: String) extends Entity
object DirectoryEntity {
implicit val directoryentityEncoder: Encoder[DirectoryEntity] = deriveEncoder
}
case class Contents(path: String, files: IndexedSeq[Entity])
object Contents {
implicit val contentsEncoder: Encoder[Contents] = deriveEncoder
}
Main.scala
import akka.actor.ActorSystem
import akka.pattern.ask
import akka.util.Timeout
import java.io.{BufferedWriter, File, FileWriter}
import ShutDownActor.ShutDownYet
import scala.concurrent.Await
import scala.concurrent.duration._
import scala.util.Try
object Main {
val defaultNumPrinters = 50
val defaultMonitorTickInterval = 500
val defaultTimeoutInS = 60
def main(args: Array[String]): Unit = {
val timeoutInS = Try(args(1).toInt).toOption.getOrElse(defaultTimeoutInS)
val system = ActorSystem("SearchHierarchy")
val shutdown = system.actorOf(ShutDownActor.props)
val monitor = system.actorOf(MonitorActor.props(shutdown, timeoutInS))
val refs = (0 until Try(args(2).toInt).toOption.getOrElse(defaultNumPrinters)).map{x =>
val name = "logfile" + x
(name, system.actorOf(PrintActor.props(name, Try(args(3).toInt).toOption.getOrElse(defaultMonitorTickInterval), monitor)))
}
val root = system.actorOf(TraverseActor.props(new File(args(0)), refs))
implicit val askTimeout = Timeout(timeoutInS seconds)
var isTimedOut = false
while(!isTimedOut){
Thread.sleep(30000)
val fut = (shutdown ? ShutDownYet).mapTo[Boolean]
isTimedOut = Await.result(fut, timeoutInS seconds)
}
refs.foreach{ x =>
val fw = new BufferedWriter(new FileWriter(new File(x._1), true))
fw.write("{}\n]")
fw.close()
}
system.terminate
}
}
MonitorActor.scala
import MonitorActor.ShutDown
import akka.actor.{Actor, ActorRef, Props, ReceiveTimeout, Stash}
import io.circe.syntax._
import scala.concurrent.duration._
class MonitorActor(shutdownActor: ActorRef, timeoutInS: Int) extends Actor with Stash {
context.setReceiveTimeout(timeoutInS seconds)
override def receive: Receive = {
// Any message (the printers' HeartBeat included) resets the receive timeout;
// only timeoutInS seconds of silence triggers the shutdown.
case ReceiveTimeout =>
shutdownActor ! ShutDown
}
}
object MonitorActor {
def props(shutdownActor: ActorRef, timeoutInS: Int) = Props(new MonitorActor(shutdownActor, timeoutInS))
case object ShutDown
}
PrintActor.scala
import java.io.{BufferedWriter, File, FileWriter, PrintWriter}
import akka.actor.{Actor, ActorRef, Props, Stash}
import PrintActor.{Count, HeartBeat}
class PrintActor(name: String, interval: Int, monitorActor: ActorRef) extends Actor with Stash {
val file = new File(name)
override def preStart = {
val fw = new BufferedWriter(new FileWriter(file, true))
fw.write("[\n")
fw.close()
self ! Count(0)
}
override def receive: Receive = {
case Count(c) =>
context.become(withCount(c))
unstashAll()
case _ =>
stash()
}
def withCount(c: Int): Receive = {
case s: String =>
val fw = new BufferedWriter(new FileWriter(file, true))
fw.write(s)
fw.write(",\n")
fw.close()
if (c == interval) {
monitorActor ! HeartBeat
context.become(withCount(0))
} else {
context.become(withCount(c+1))
}
}
}
object PrintActor {
def props(name: String, interval: Int, monitorActor: ActorRef) = Props(new PrintActor(name, interval, monitorActor))
case class Count(count: Int)
case object HeartBeat
}
ShutDownActor.scala
import MonitorActor.ShutDown
import ShutDownActor.ShutDownYet
import akka.actor.{Actor, Props, Stash}
class ShutDownActor() extends Actor with Stash {
override def receive: Receive = {
case ShutDownYet => sender ! false
case ShutDown => context.become(canShutDown())
}
def canShutDown(): Receive = {
case ShutDownYet => sender ! true
}
}
object ShutDownActor {
def props = Props(new ShutDownActor())
case object ShutDownYet
}
TraverseActor.scala
import java.io.File
import akka.actor.{Actor, ActorRef, PoisonPill, Props, ReceiveTimeout}
import io.circe.syntax._
import scala.collection.JavaConversions
import scala.concurrent.duration._
import scala.util.Try
class TraverseActor(start: File, printers: IndexedSeq[(String, ActorRef)]) extends Actor{
val hash = start.hashCode()
val mod = hash % printers.size
val idx = if (mod < 0) -mod else mod
val myPrinter = printers(idx)._2
override def preStart = {
self ! start
}
override def receive: Receive = {
case f: File =>
val path = f.getCanonicalPath
val files = Option(f.list()).map(_.toIndexedSeq.map(x =>new File(path + "/" + x)))
val directories = files.map(_.filter(_.isDirectory))
directories.foreach(ds => processDirectories(ds))
val entities = files.map{fs =>
fs.map{ f =>
val path = f.getCanonicalPath
val owner = Try(java.nio.file.Files.getOwner(f.toPath).toString).toOption.getOrElse("")
val permissions = Try(java.nio.file.Files.getPosixFilePermissions(f.toPath).toString).toOption.getOrElse("")
val attributes = Try(java.nio.file.Files.readAttributes(f.toPath, "lastModifiedTime,creationTime,lastAccessTime"))
val lastModifiedTime = attributes.flatMap(a => Try(a.get("lastModifiedTime").toString)).toOption.getOrElse("")
val creationTime = attributes.flatMap(a => Try(a.get("creationTime").toString)).toOption.getOrElse("")
val lastAccessTime = attributes.flatMap(a => Try(a.get("lastAccessTime").toString)).toOption.getOrElse("")
if (f.isDirectory) DirectoryEntity(path, owner, permissions, lastModifiedTime, creationTime, lastAccessTime)
else FileEntity(path, owner, permissions, lastModifiedTime, creationTime, lastAccessTime)
}
}
directories match {
case Some(seq) =>
seq match {
case _ +: _ => // there are sub-directories, so more work has already been queued and this actor stays alive
case IndexedSeq() => self ! PoisonPill
}
case None => self ! PoisonPill
}
entities.foreach(e => myPrinter ! Contents(f.getCanonicalPath, e).asJson.toString)
}
def processDirectories(directories: IndexedSeq[File]): Unit = {
def inner(fs: IndexedSeq[File]): Unit = {
fs match {
case x +: xs =>
context.actorOf(TraverseActor.props(x, printers))
inner(xs) // each remaining sub-directory gets its own TraverseActor
case IndexedSeq() =>
}
}
directories match {
case x +: xs =>
self ! x
inner(xs)
case IndexedSeq() =>
}
}
}
object TraverseActor {
def props(start: File, printers: IndexedSeq[(String, ActorRef)]) = Props(new TraverseActor(start, printers))
}
I only tested on a small example, so it is possible this program will run into problems when running your job. If that happens, feel free to ask questions.

Related

Akka HTTP - max-open-requests and substreams?

I'm writing an app using Scala 2.13 with Akka HTTP 10.2.4 and Akka Stream 2.6.15. I'm trying to query a web service in a parallel manner, like so:
package com.example
import akka.actor.typed.scaladsl.ActorContext
import akka.http.scaladsl.Http
import akka.http.scaladsl.client.RequestBuilding.Get
import akka.http.scaladsl.model.HttpResponse
import akka.http.scaladsl.unmarshalling.Unmarshal
import akka.stream.scaladsl.{Flow, JsonFraming, Sink, Source}
import spray.json.DefaultJsonProtocol
import spray.json.DefaultJsonProtocol.jsonFormat2
import scala.util.Try
case class ClientStockPortfolio(id: Long, symbol: String)
case class StockTicker(symbol: String, price: Double)
trait SprayFormat extends DefaultJsonProtocol {
implicit val stockTickerFormat = jsonFormat2(StockTicker)
}
class StockTrader(context: ActorContext[_]) extends SprayFormat {
implicit val system = context.system.classicSystem
val httpPool = Http().superPool[Seq[ClientStockPortfolio]]()
def collectPrices() = {
val src = Source(Seq(
ClientStockPortfolio(1, "GOOG"),
ClientStockPortfolio(2, "AMZN"),
ClientStockPortfolio(3, "MSFT")
)
)
val graph = src
.groupBy(8, _.id % 8)
.via(createPost)
.via(httpPool)
.via(decodeTicker)
.mergeSubstreamsWithParallelism(8)
.to(
Sink.fold(0.0) { (totalPrice, ticker) =>
insertIntoDatabase(ticker)
totalPrice + ticker.price
}
)
graph.run()
}
def createPost = Flow[ClientStockPortfolio]
.grouped(10)
.map { port =>
(
Get(uri = s"http://wherever/?symbols=${port.map(_.symbol).mkString(",")}"),
port
)
}
def decodeTicker = Flow[(Try[HttpResponse], Seq[ClientStockPortfolio])]
.flatMapConcat { x =>
x._1.get.entity.dataBytes
.via(JsonFraming.objectScanner(Int.MaxValue))
.mapAsync(4)(bytes => Unmarshal(bytes).to[StockTicker])
.mapConcat { ticker =>
lookupPreviousPrices(ticker)
}
}
def lookupPreviousPrices(ticker: StockTicker): List[StockTicker] = ???
def insertIntoDatabase(ticker: StockTicker) = ???
}
I have two questions. First, will the groupBy call that splits the stream into substreams run them in parallel like I want? And second, when I call this code, I run into the max-open-requests error, since I haven't increased the setting from the default. But even if I am running in parallel, I'm only running 8 threads - how is the Http().superPool() getting backed up with 32 requests?

Scala program using futures is not terminating

I am trying to learn concurrency in Scala and am using Scala futures to generate a dataset of random strings. I want to create an application that can generate a file with any number of records and that scales well.
Code:
import java.io.FileWriter
import java.util.concurrent.{ExecutorService, Executors}
import scala.concurrent.{ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.{Failure, Random, Success}
object datacreator {
implicit val ec: ExecutionContext = new ExecutionContext {
val threadPool: ExecutorService = Executors.newFixedThreadPool(4)
def execute(runnable: Runnable) {
threadPool.submit(runnable)
}
def reportFailure(t: Throwable) {}
}
def getRecord : String = {
"Random string"
}
def main(args: Array[String]): Unit = {
val filename = args(0)
val number_of_records = args(1)
val file_Object = new FileWriter(filename, true)
val data: Future[Iterable[String]] = Future {
for (i <- 1 to number_of_records.toInt)
yield getRecord
}
val result = data.map{
result => result.foreach(record => file_Object.write(record))
}
result.onComplete{
case Success(value) => {
println("Success")
file_Object.close()
}
case Failure(e) => e.printStackTrace()
}
}
}
I am facing the following issues:
When I run the program using sbt, it writes the results to the file but does not terminate; it just keeps running.
[info] Loading project definition from /Users/cw0155/PersonalProjects/datagen/project
[info] Loading settings for project datagen from build.sbt ...
[info] Set current project to datagenerator (in build file:/Users/cw0155/PersonalProjects/datagen/)
[info] running com.generator.DataGenerator xyz.csv 100
Success
| => datagen / Compile / runMain 255s
When I run the program from the jar as:
scala -cp target/scala-2.13/datagenerator_2.13-0.1.jar com.generator.DataGenerator "pqr.csv" "1000"
It waits indefinitely and does not write to the file.
Any help is much appreciated :)
Try this version
bar.scala
import scala.concurrent.{Await, Future, ExecutionContext}
import scala.concurrent.duration._
import scala.util.{Success, Failure}
import ExecutionContext.Implicits.global
import java.io.FileWriter
object bar {
def getRecord: String = "Random string\n"
def main(args: Array[String]): Unit = {
val filename = args(0)
val number_of_records = args(1)
val data: Future[Iterable[String]] = Future {
for (i <- 1 to number_of_records.toInt)
yield getRecord
}
val file_Object = new FileWriter(filename, true)
val result = data.map( r => r.foreach(record => file_Object.write(record)) )
result.onComplete {
case Success(value) =>
println("Success")
file_Object.close()
case Failure(e) =>
e.printStackTrace()
}
Await.result( result, 10.second )
}
}
Your original version gave me the expected output when I ran it like so
bash-3.2$ scala bar.scala /dev/fd/1 10
Success
Random string
Random string
Random string
Random string
Random string
Random string
Random string
Random string
Random string
Random string
However without the Await.result your program can exit before the future finishes.
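As an aside, one likely reason the original version never exits is the custom ExecutionContext: Executors.newFixedThreadPool creates non-daemon threads and nothing ever shuts the pool down, so the JVM can stay alive even after the future has completed. A minimal sketch of that idea (structure simplified, names illustrative):
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

object ShutdownSketch {
  def main(args: Array[String]): Unit = {
    // newFixedThreadPool uses non-daemon threads, so the JVM will not exit
    // until the pool is shut down explicitly.
    val pool = Executors.newFixedThreadPool(4)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)

    val work = Future { (1 to 10).map(_ => "Random string") }
    work.onComplete { _ =>
      println("Success")
      pool.shutdown() // lets the JVM terminate once the callback has run
    }
  }
}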

Akka Streams: State in a flow

I want to read multiple big files using Akka Streams to process each line. Imagine that each line consists of an (identifier -> value) pair. If a new identifier is found, I want to save it and its value in the database; otherwise, if the identifier has already been seen while processing the stream of lines, I want to save only the value. For that, I think I need some kind of recursive stateful flow that keeps the identifiers that have already been found in a Map. I think this flow would receive a pair of (newLine, contextWithIdentifiers).
I've just started to look into Akka Streams. I guess I can manage the stateless processing on my own, but I have no clue how to keep the contextWithIdentifiers. I'd appreciate any pointers in the right direction.
Maybe something like statefulMapConcat can help you:
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.util.Random._
import scala.math.abs
import scala.concurrent.ExecutionContext.Implicits.global
implicit val system = ActorSystem()
implicit val materializer = ActorMaterializer()
//encapsulating your input
case class IdentValue(id: Int, value: String)
//some random generated input
val identValues = List.fill(20)(IdentValue(abs(nextInt()) % 5, "valueHere"))
val stateFlow = Flow[IdentValue].statefulMapConcat{ () =>
//state with already processed ids
var ids = Set.empty[Int]
identValue => if (ids.contains(identValue.id)) {
//save value to DB
println(identValue.value)
List(identValue)
} else {
//save both to database
println(identValue)
ids = ids + identValue.id
List(identValue)
}
}
Source(identValues)
.via(stateFlow)
.runWith(Sink.seq)
.onSuccess { case identValue => println(identValue) }
A few years later, here is an implementation I wrote if you only need a 1-to-1 mapping (not 1-to-N):
import akka.stream.stage.{GraphStage, GraphStageLogic}
import akka.stream.{Attributes, FlowShape, Inlet, Outlet}
object StatefulMap {
def apply[T, O](converter: => T => O) = new StatefulMap[T, O](converter)
}
class StatefulMap[T, O](converter: => T => O) extends GraphStage[FlowShape[T, O]] {
val in = Inlet[T]("StatefulMap.in")
val out = Outlet[O]("StatefulMap.out")
val shape = FlowShape.of(in, out)
override def createLogic(inheritedAttributes: Attributes): GraphStageLogic = new GraphStageLogic(shape) {
val f = converter
setHandler(in, () => push(out, f(grab(in))))
setHandler(out, () => pull(in))
}
}
Test (and demo):
behavior of "StatefulMap"
class Counter extends (Any => Int) {
var count = 0
override def apply(x: Any): Int = {
count += 1
count
}
}
it should "not share state among substreams" in {
val result = await {
Source(0 until 10)
.groupBy(2, _ % 2)
.via(StatefulMap(new Counter()))
.fold(Seq.empty[Int])(_ :+ _)
.mergeSubstreams
.runWith(Sink.seq)
}
result.foreach(_ should be(1 to 5))
}

Akka-http process requests with Stream

I am trying to write a simple akka-http and akka-streams based application that handles HTTP requests, always with one precompiled stream, because I plan to use long-running processing with back-pressure in my requestProcessor stream.
My application code:
import akka.actor.{ActorSystem, Props}
import akka.http.scaladsl._
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server._
import akka.stream.ActorFlowMaterializer
import akka.stream.actor.ActorPublisher
import akka.stream.scaladsl.{Sink, Source}
import scala.annotation.tailrec
import scala.concurrent.Future
object UserRegisterSource {
def props: Props = Props[UserRegisterSource]
final case class RegisterUser(username: String)
}
class UserRegisterSource extends ActorPublisher[UserRegisterSource.RegisterUser] {
import UserRegisterSource._
import akka.stream.actor.ActorPublisherMessage._
val MaxBufferSize = 100
var buf = Vector.empty[RegisterUser]
override def receive: Receive = {
case request: RegisterUser =>
if (buf.isEmpty && totalDemand > 0)
onNext(request)
else {
buf :+= request
deliverBuf()
}
case Request(_) =>
deliverBuf()
case Cancel =>
context.stop(self)
}
@tailrec final def deliverBuf(): Unit =
if (totalDemand > 0) {
if (totalDemand <= Int.MaxValue) {
val (use, keep) = buf.splitAt(totalDemand.toInt)
buf = keep
use foreach onNext
} else {
val (use, keep) = buf.splitAt(Int.MaxValue)
buf = keep
use foreach onNext
deliverBuf()
}
}
}
object Main extends App {
val host = "127.0.0.1"
val port = 8094
implicit val system = ActorSystem("my-testing-system")
implicit val fm = ActorFlowMaterializer()
implicit val executionContext = system.dispatcher
val serverSource: Source[Http.IncomingConnection, Future[Http.ServerBinding]] = Http(system).bind(interface = host, port = port)
val mySource = Source.actorPublisher[UserRegisterSource.RegisterUser](UserRegisterSource.props)
val requestProcessor = mySource
.mapAsync(1)(fakeSaveUserAndReturnCreatedUserId)
.to(Sink.head[Int])
.run()
val route: Route =
get {
path("test") {
parameter('test) { case t: String =>
requestProcessor ! UserRegisterSource.RegisterUser(t)
???
}
}
}
def fakeSaveUserAndReturnCreatedUserId(param: UserRegisterSource.RegisterUser): Future[Int] =
Future.successful {
1
}
serverSource.to(Sink.foreach {
connection =>
connection handleWith Route.handlerFlow(route)
}).run()
}
I found a solution for creating a Source that can dynamically accept new items to process, but I can't find any solution for obtaining the result of the stream execution in my route.
The direct answer to your question is to materialize a new Stream for each HttpRequest and use Sink.head to get the value you're looking for. Modifying your code:
val requestStream =
mySource.mapAsync(1)(fakeSaveUserAndReturnCreatedUserId)
.toMat(Sink.head[Int])(Keep.both) //keep both the publisher ActorRef and the Future[Int]; needs akka.stream.scaladsl.Keep in scope
//.run() - don't materialize here
val route: Route =
get {
path("test") {
parameter('test) { case t: String =>
//materialize a new Stream here
val (requestProcessor, userIdFut) = requestStream.run()
requestProcessor ! UserRegisterSource.RegisterUser(t)
//get the result of the Stream
userIdFut onSuccess { case userId : Int => ...}
}
}
}
However, I think your question is ill posed. In your code example the only thing you're using an akka Stream for is to create a new UserId. Futures readily solve this problem without the need for a materialized Stream (and all the accompanying overhead):
val route: Route =
get {
path("test") {
parameter('test) { case t: String =>
val user = RegisterUser(t)
fakeSaveUserAndReturnCreatedUserId(user) onSuccess { case userId : Int =>
...
}
}
}
}
If you want to limit the number of concurrent calls to fakeSaveUserAndReturnCreatedUserId, then you can create an ExecutionContext with a defined thread pool size, as explained in the answer to this question, and use that ExecutionContext to create the Futures:
val ThreadCount = 10 //concurrent queries
val limitedExecutionContext =
ExecutionContext.fromExecutor(Executors.newFixedThreadPool(ThreadCount))
def fakeSaveUserAndReturnCreatedUserId(param: UserRegisterSource.RegisterUser): Future[Int] =
Future { 1 }(limitedExecutionContext)

Improve this actor calling futures

I have an actor (Worker) which basically asks 3 other actors (Filter1, Filter2 and Filter3) for a result. If any of them returns false, it's unnecessary to wait for the others, like an "and" operation over the results. When a false response is received, a cancel message is sent to the actors to cancel the queued work and make the execution more efficient.
Filters aren't children of Worker; rather, they form a common pool of actors used by all Worker actors. I use an Agent to maintain the collection of cancelled Works. Before a particular work is processed, I check in the cancel agent whether that work was cancelled, and if so I skip its execution. Cancel has a higher priority than Work, so it is always processed first.
The code is something like this
Proxy, which creates the actor tree:
import scala.collection.mutable.HashSet
import scala.concurrent.ExecutionContext.Implicits.global
import com.typesafe.config.Config
import akka.actor.Actor
import akka.actor.ActorLogging
import akka.actor.ActorSystem
import akka.actor.PoisonPill
import akka.actor.Props
import akka.agent.Agent
import akka.routing.RoundRobinRouter
class Proxy extends Actor with ActorLogging {
val agent1 = Agent(new HashSet[Work])
val agent2 = Agent(new HashSet[Work])
val agent3 = Agent(new HashSet[Work])
val filter1 = context.actorOf(Props(Filter1(agent1)).withDispatcher("priorityMailBox-dispatcher")
.withRouter(RoundRobinRouter(24)), "filter1")
val filter2 = context.actorOf(Props(Filter2(agent2)).withDispatcher("priorityMailBox-dispatcher")
.withRouter(RoundRobinRouter(24)), "filter2")
val filter3 = context.actorOf(Props(Filter3(agent3)).withDispatcher("priorityMailBox-dispatcher")
.withRouter(RoundRobinRouter(24)), "filter3")
//val workerRouter = context.actorOf(Props[SerialWorker].withRouter(RoundRobinRouter(24)), name = "workerRouter")
val workerRouter = context.actorOf(Props(new Worker(filter1, filter2, filter3)).withRouter(RoundRobinRouter(24)), name = "workerRouter")
def receive = {
case w: Work =>
workerRouter forward w
}
}
Worker:
import scala.concurrent.Await
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future
import scala.concurrent.duration.DurationInt
import akka.actor.Actor
import akka.actor.ActorLogging
import akka.actor.Props
import akka.actor.actorRef2Scala
import akka.pattern.ask
import akka.pattern.pipe
import akka.util.Timeout
import akka.actor.ActorRef
import akka.routing.RoundRobinRouter
import akka.agent.Agent
import scala.collection.mutable.HashSet
class Worker(filter1: ActorRef, filter2: ActorRef, filter3: ActorRef) extends Actor with ActorLogging {
implicit val timeout = Timeout(30.seconds)
def receive = {
case w:Work =>
val start = System.currentTimeMillis();
val futureF3 = (filter3 ? w).mapTo[Response]
val futureF2 = (filter2 ? w).mapTo[Response]
val futureF1 = (filter1 ? w).mapTo[Response]
val aggResult = Future.find(List(futureF3, futureF2, futureF1)) { res => !res.reponse }
Await.result(aggResult, timeout.duration) match {
case None =>
Nqueen.fact(10500000L)
log.info(s"[${w.message}] Procesado mensaje TRUE en ${System.currentTimeMillis() - start} ms");
sender ! WorkResponse(w, true)
case _ =>
filter1 ! Cancel(w)
filter2 ! Cancel(w)
filter3 ! Cancel(w)
log.info(s"[${w.message}] Procesado mensaje FALSE en ${System.currentTimeMillis() - start} ms");
sender ! WorkResponse(w, false)
}
}
}
and Filters:
import scala.collection.mutable.HashSet
import scala.util.Random
import akka.actor.Actor
import akka.actor.ActorLogging
import akka.actor.actorRef2Scala
import akka.agent.Agent
trait CancellableFilter { this: Actor with ActorLogging =>
//val canceledJobs = new HashSet[Int]
val agent: Agent[HashSet[Work]]
def cancelReceive: Receive = {
case Cancel(w) =>
agent.send(_ += w)
//log.info(s"[$t] El trabajo se cancelara (si llega...)")
}
def cancelled(w: Work): Boolean =
if (agent.get.contains(w)) {
agent.send(_ -= w)
true
} else {
false
}
}
abstract class Filter extends Actor with ActorLogging { this: CancellableFilter =>
val random = new Random(System.currentTimeMillis())
def response: Boolean
val timeToWait: Int
val timeToExecutor: Long
def receive = cancelReceive orElse {
case w:Work if !cancelled(w) =>
//log.info(s"[$t] Llego trabajo")
Thread.sleep(timeToWait)
Nqueen.fact(timeToExecutor)
val r = Response(response)
//log.info(s"[$t] Respondio ${r.reponse}")
sender ! r
}
}
object Filter1 {
def apply(agente: Agent[HashSet[Work]]) = new Filter with CancellableFilter {
val timeToWait = 74
val timeToExecutor = 42000000L
val agent = agente
def response = true //random.nextBoolean
}
}
object Filter2 {
def apply(agente: Agent[HashSet[Work]]) = new Filter with CancellableFilter {
val timeToWait = 47
val timeToExecutor = 21000000L
val agent = agente
def response = true //random.nextBoolean
}
}
object Filter3 {
def apply(agente: Agent[HashSet[Work]]) = new Filter with CancellableFilter {
val timeToWait = 47
val timeToExecutor = 21000000L
val agent = agente
def response = true //random.nextBoolean
}
}
Basically, I think the Worker code is ugly and I want to make it better. Could you help me improve it?
Another point I want to improve is the cancel message. As I don't know which of the filters have finished, I need to send Cancel to all of them, so at least one cancel is redundant (since that work is already completed).
It is minor, but why don't you store the filters as a sequence? filters.foreach(_ ! Cancel(w)) is nicer than
filter1 ! Cancel(w)
filter2 ! Cancel(w)
filter3 ! Cancel(w)
Same for other cases:
class Worker(filter1: ActorRef, filter2: ActorRef, filter3: ActorRef) extends Actor with ActorLogging {
private val filters = Seq(filter1, filter2, filter3)
implicit val timeout = Timeout(30.seconds)
def receive = {
case w:Work =>
val start = System.currentTimeMillis();
val futures = filters.map { f =>
(f ? w).mapTo[Response]
}
val aggResult = Future.find(futures) { res => !res.reponse }
Await.result(aggResult, timeout.duration) match {
case None =>
Nqueen.fact(10500000L)
log.info(s"[${w.message}] Procesado mensaje TRUE en ${System.currentTimeMillis() - start} ms");
sender ! WorkResponse(w, true)
case _ =>
filters.foreach(_ ! Cancel(w))
log.info(s"[${w.message}] Procesado mensaje FALSE en ${System.currentTimeMillis() - start} ms");
sender ! WorkResponse(w, false)
}
}
}
You may also consider writing the constructor as Worker(filters: ActorRef*) if you do not require exactly three filters. I think it is okay to send off one redundant cancel (the alternatives I see are overly complicated). I'm not sure, but if the filters are created very quickly, their Random instances may be initialized with the same seed value.
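On the seeding point: the no-argument Random constructor already derives its seed from an internal uniquifier combined with System.nanoTime, so filters constructed within the same millisecond will not share a seed. The change in the Filter objects would simply be:
// Instead of new Random(System.currentTimeMillis()), which can collide when
// several filters are constructed in the same millisecond:
val random = new scala.util.Random()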