Scala/akka discrete event simulation - scala

I am considering migrating a large code base to Scala and leveraging the Akka Actor model. I am running into the following conceptual issue almost immediately:
In the existing code base, business logic is tested using a discrete event simulation (DES) like SimPy with deterministic results. While the real-time program runs for hous, the DES takes a few minutes. The real time system can be asynchronous and need not follow the exact ordering of the testing setup. I'd like to leverage the same Akka code for both the testing setup and the real-time setup. Can this be achieved in Scala + Akka?
I have toyed with the idea of a central message queue Actor -- but feel this is not the correct approach.

A general approach to solving the problem you stated is to isolate your "business logic" from Akka code. This allows the business code to be unit tested, and event tested, independently of Akka which also allows you to write very lean Akka code.
As an example, say you're business logic is to process some Data:
object BusinessLogic {
type Data = ???
type Result = ???
def processData(data : Data) : Result = ???
}
This is a clean implementation that can be run in any concurrency environment, not just Akka (Scala Futures, Java threads, ...).
Historic Simulation
The core business logic can then be run in your discrete event simulation:
import BusinessLogic.processData
val someDate : Date = ???
val historicData : Iterable[Data] = querySomeDatabase(someDate)
//discrete event simulation
val historicResults : Iterable[Result] = historicData map processData
If concurrency is capable of making the event simulator faster then it is possible to use a non-Akka approach:
val concurrentHistoricResults : Future[Iterable[Result]] =
Future sequence {
historicData.map(data => Future(processData(data)))
}
Akka Realtime
At the same time the logic can be incorporated into Akka Actors. In general it is very helpful to make your receive method nothing more than a "data dispatcher", there shouldn't be any substantive code residing in the Actor definition:
class BusinessActor extends Actor {
override def receive = {
case data : Data => sender ! processData(data)
}
}
Similarly the business logic can be placed inside of akka streams for back-pressured stream processing:
val dataSource : Source[Data, _] = ???
val resultSource : Source[Result, _] =
dataSource via (Flow[Data] map processData)

Related

akka streaming file lines to actor router and writing with single actor. how to handle the backpressure

I want to stream a file from s3 to actor to be parsed and enriched and to write the output to other file.
The number of parserActors should be limited e.g
application.conf
akka{
actor{
deployment {
HereClient/router1 {
router = round-robin-pool
nr-of-instances = 28
}
}
}
}
code
val writerActor = actorSystem.actorOf(WriterActor.props())
val parser = actorSystem.actorOf(FromConfig.props(ParsingActor.props(writerActor)), "router1")
however the actor that is writing to a file should be limited to 1 (singleton)
I tried doing something like
val reader: ParquetReader[GenericRecord] = AvroParquetReader.builder[GenericRecord](file).withConf(conf).build()
val source: Source[GenericRecord, NotUsed] = AvroParquetSource(reader)
source.map (record => record ! parser)
but I am not sure that the backpressure is handled correctly. any advice ?
Indeed your solution is disregarding backpressure.
The correct way to have a stream interact with an actor while maintaining backpressure is to use the ask pattern support of akka-stream (reference).
From my understanding of your example you have 2 separate actor interaction points:
send records to the parsing actors (via a router)
send parsed records to the singleton write actor
What I would do is something similar to the following:
val writerActor = actorSystem.actorOf(WriterActor.props())
val parserActor = actorSystem.actorOf(FromConfig.props(ParsingActor.props(writerActor)), "router1")
val reader: ParquetReader[GenericRecord] = AvroParquetReader.builder[GenericRecord](file).withConf(conf).build()
val source: Source[GenericRecord, NotUsed] = AvroParquetSource(reader)
source.ask[ParsedRecord](28)(parserActor)
.ask[WriteAck](writerActor)
.runWith(Sink.ignore)
The idea is that you send all the GenericRecord elements to the parserActor which will reply with a ParsedRecord. Here as an example we specify a parallelism of 28 since that's the number of instances you have configured, however as long as you use a value higher than the actual number of actor instances no actor should suffer from work starvation.
Once the parseActor replies with the parsing result (here represented by the ParsedRecord) we apply the same pattern to interact with the singleton writer actor. Note that here we don't specify the parallelism as we have a single instance so it doesn't make sense the send more than 1 message at a time (in reality this happens anyway due to buffering at async boundaries, but this is just a built-in optimization). In this case we expect that the writer actor replies with a WriteAck to inform us that the writing has been successful and we can send the next element.
Using this method you are maintaining backpressure throughout your whole stream.
I think you should be using one of the "async" operations
Perhaps this other q/a gives you some insperation Processing an akka stream asynchronously and writing to a file sink

using streams vs actors for periodic tasks

Im working with akka/scala/play stack.
Usually, im using stream to perform certain tasks. for example, I have a stream that wakes every minute, picks up something from the DB, and call another service to enrich its data using an API and save the enrichment to the DB.
something like this:
class FetcherAndSaveStream #Inject()(fetcherAndSaveGraph: FetcherAndSaveGraph, dbElementsSource: DbElementsSource)
(implicit val mat: Materializer,
implicit val exec: ExecutionContext) extends LazyLogging {
def graph[M1, M2](source: Source[BDElement, M1],
sink: Sink[BDElement, M2],
switch: SharedKillSwitch): RunnableGraph[(M1, M2)] = {
val fetchAndSaveDataFromExternalService: Flow[BDElement, BDElement, NotUsed] =
fetcherAndSaveGraph.fetchEndSaveEnrichment
source.viaMat(switch.flow)(Keep.left)
.via(fetchAndSaveDataFromExternalService)
.toMat(sink)(Keep.both).withAttributes(supervisionStrategy(resumingDecider))
}
def runGraph(switchSharedKill: SharedKillSwitch): (NotUsed, Future[Done]) = {
logger.info("FetcherAndSaveStream is now running")
graph(dbElementsSource.dbElements(), Sink.ignore, switchSharedKill).run()
}
}
I wonder, is this better than just using an actor that ticks every minute and do something like that? what is the comparison between using actors for this and stream?
trying to figure out still when should I choose which method (streams/actors). thanks!!
You can use both, depending on the requirements you have for your solution which are not listed there. The general concern you need to take into consideration - actors more low-level stuff than streams, so they require more code and debug.
Basically, streams are good for tasks where you have a relatively big amount of data you need to process with low memory consumption. With streams, you won't need to start to stream each n seconds, you can set this stream to run along with the application. That could make your code more concise by omitting scheduler logic.
I will omit your DI and architecture stuff, write solution with pseudocode:
val yourConsumer: Sink[YourDBRecord] = ???
val recordsSource: Source[YourDBRecord] =
val runnableGraph = (Source repeat ())
.throttle(1, n seconds)
.mapAsync(yourParallelism){_ =>
fetchReasonableAmountOfRecordsFromDB
} mapConcat identity to yourConsumer
This stream will do your stuff. You even can enhance it with more sophisticated logic to adapt the polling rate according to workloads using feedback loop in graph api. Also, you can add the error-handling strategy you need to resume in place your stream has crashed.
Moreover, there's alpakka connectors for DBS capable of doing so, you can see if solutions there fit your purpose, or check for implementation details.
What you can get by doing so - backpressure, ability to work with streams, clean and concise code with no timed automata managed directly by you.
https://doc.akka.io/docs/akka/current/stream/stream-rate.html
You can also create an actor, but then you should do all the things akka streams do for you by hand, i.e. back-pressure in case you want to interop with streams, scheduler, chunking and memory management(to not to load 100000 or so entries in one batch to memory), etc.

Looking for something like a TestFlow analogous to TestSink and TestSource

I am writing a class that takes a Flow (representing a kind of socket) as a constructor argument and that allows to send messages and wait for the respective answers asynchronously by returning a Future. Example:
class SocketAdapter(underlyingSocket: Flow[String, String, _]) {
def sendMessage(msg: MessageType): Future[ResponseType]
}
This is not necessarily trivial because there may be other messages in the socket stream that are irrelevant, so some filtering is required.
In order to test the class I need to provide something like a "TestFlow" analogous to TestSink and TestSource. In fact I can create a flow by combining both. However, the problem is that I only obtain the actual probes upon materialization and materialization happens inside the class under test.
The problem is similar to the one I described in this question. My problem would be solved if I could materialize the flow first and then pass it to a client to connect to it. Again, I'm thinking about using MergeHub and BroadcastHub and again I see the problem that the resulting stream would behave differently because it is not linear anymore.
Maybe I misunderstood how a Flow is supposed to be used. In order to feed messages into the flow when sendMessage() is called, I need a certain kind of Source anyway. Maybe a Source.actorRef(...) or Source.queue(...), so I could pass in the ActorRef or SourceQueue directly. However, I'd prefer if this choice was up to the SocketAdapter class. Of course, this applies to the Sink as well.
It feels like this is a rather common case when working with streams and sockets. If it is not possible to create a "TestFlow" like I need it, I'm also happy with some advice on how to improve my design and make it better testable.
Update: I browsed through the documentation and found SourceRef and SinkRef. It looks like these could solve my problem but I'm not sure yet. Is it reasonable to use them in my case or are there any drawbacks, e.g. different behaviour in the test compared to production where there are no such refs?
Indirect Answer
The nature of your question suggests a design flaw which you are bumping into at testing time. The answer below does not address the issue in your question, but it demonstrates how to avoid the situation altogether.
Don't Mix Business Logic with Akka Code
Presumably you need to test your Flow because you have mixed a substantial amount of logic into the materialization. Lets assume you are using raw sockets for your IO. Your question suggests that your flow looks like:
val socketFlow : Flow[String, String, _] = {
val socket = new Socket(...)
//business logic for IO
}
You need a complicated test framework for your Flow because your Flow itself is also complicated.
Instead, you should separate out the logic into an independent function that has no akka dependencies:
type MessageProcessor = MessageType => ResponseType
object BusinessLogic {
val createMessageProcessor : (Socket) => MessageProcessor = {
//business logic for IO
}
}
Now your flow can be very simple:
val socket : Socket = new Socket(...)
val socketFlow = Flow.map(BusinessLogic.createMessageProcessor(socket))
As a result: your unit testing can exclusively work with createMessageProcessor, there's no need to test akka Flow because it is a simple veneer around the complicated logic that is tested independently.
Don't Use Streams For Concurrency Around 1 Element
The other big problem with your design is that SocketAdapter is using a stream to process just 1 message at a time. This is incredibly wasteful and unnecessary (you're trying to kill a mosquito with a tank).
Given the separated business logic your adapter becomes much simpler and independent of akka:
class SocketAdapter(messageProcessor : MessageProcessor) {
def sendMessage(msg: MessageType): Future[ResponseType] = Future {
messageProcessor(msg)
}
}
Note how easy it is to use Future in some instances and Flow in other scenarios depending on the need. This comes from the fact that the business logic is independent of any concurrency framework.
This is what I came up with using SinkRef and SourceRef:
object TestFlow {
def withProbes[In, Out](implicit actorSystem: ActorSystem,
actorMaterializer: ActorMaterializer)
:(Flow[In, Out, _], TestSubscriber.Probe[In], TestPublisher.Probe[Out]) = {
val f = Flow.fromSinkAndSourceMat(TestSink.probe[In], TestSource.probe[Out])
(Keep.both)
val ((sinkRefFuture, (inProbe, outProbe)), sourceRefFuture) =
StreamRefs.sinkRef[In]()
.viaMat(f)(Keep.both)
.toMat(StreamRefs.sourceRef[Out]())(Keep.both)
.run()
val sinkRef = Await.result(sinkRefFuture, 3.seconds)
val sourceRef = Await.result(sourceRefFuture, 3.seconds)
(Flow.fromSinkAndSource(sinkRef, sourceRef), inProbe, outProbe)
}
}
This gives me a flow I can completely control with the two probes but I can pass it to a client that connects source and sink later, so it seems to solve my problem.
The resulting Flow should only be used once, so it differs from a regular Flow that is rather a flow blueprint and can be materialized several times. However, this restriction applies to the web socket flow I am mocking anyway, as described here.
The only issue I still have is that some warnings are logged when the ActorSystem terminates after the test. This seems to be due to the indirection introduced by the SinkRef and SourceRef.
Update: I found a better solution without SinkRef and SourceRef by using mapMaterializedValue():
def withProbesFuture[In, Out](implicit actorSystem: ActorSystem,
ec: ExecutionContext)
: (Flow[In, Out, _],
Future[(TestSubscriber.Probe[In], TestPublisher.Probe[Out])]) = {
val (sinkPromise, sourcePromise) =
(Promise[TestSubscriber.Probe[In]], Promise[TestPublisher.Probe[Out]])
val flow =
Flow
.fromSinkAndSourceMat(TestSink.probe[In], TestSource.probe[Out])(Keep.both)
.mapMaterializedValue { case (inProbe, outProbe) =>
sinkPromise.success(inProbe)
sourcePromise.success(outProbe)
()
}
val probeTupleFuture = sinkPromise.future
.flatMap(sink => sourcePromise.future.map(source => (sink, source)))
(flow, probeTupleFuture)
}
When the class under test materializes the flow, the Future is completed and I receive the test probes.

Scala how to use akka actors to handle a timing out operation efficiently

I am currently evaluating javascript scripts using Rhino in a restful service. I wish for there to be an evaluation time out.
I have created a mock example actor (using scala 2.10 akka actors).
case class Evaluate(expression: String)
class RhinoActor extends Actor {
override def preStart() = { println("Start context'"); super.preStart()}
def receive = {
case Evaluate(expression) ⇒ {
Thread.sleep(100)
sender ! "complete"
}
}
override def postStop() = { println("Stop context'"); super.postStop()}
}
Now I run use this actor as follows:
def run {
val t = System.currentTimeMillis()
val system = ActorSystem("MySystem")
val actor = system.actorOf(Props[RhinoActor])
implicit val timeout = Timeout(50 milliseconds)
val future = (actor ? Evaluate("10 + 50")).mapTo[String]
val result = Try(Await.result(future, Duration.Inf))
println(System.currentTimeMillis() - t)
println(result)
actor ! PoisonPill
system.shutdown()
}
Is it wise to use the ActorSystem in a closure like this which may have simultaneous requests on it?
Should I make the ActorSystem global, and will that be ok in this context?
Is there a more appropriate alternative approach?
EDIT: I think I need to use futures directly, but I will need the preStart and postStop. Currently investigating.
EDIT: Seems you don't get those hooks with futures.
I'll try and answer some of your questions for you.
First, an ActorSystem is a very heavy weight construct. You should not create one per request that needs an actor. You should create one globally and then use that single instance to spawn your actors (and you won't need system.shutdown() anymore in run). I believe this covers your first two questions.
Your approach of using an actor to execute javascript here seems sound to me. But instead of spinning up an actor per request, you might want to pool a bunch of the RhinoActors behind a Router, with each instance having it's own rhino engine that will be setup during preStart. Doing this will eliminate per request rhino initialization costs, speeding up your js evaluations. Just make sure you size your pool appropriately. Also, you won't need to be sending PoisonPill messages per request if you adopt this approach.
You also might want to look into the non-blocking callbacks onComplete, onSuccess and onFailure as opposed to using the blocking Await. These callbacks also respect timeouts and are preferable to blocking for higher throughput. As long as whatever is way way upstream waiting for this response can handle the asynchronicity (i.e. an async capable web request), then I suggest going this route.
The last thing to keep in mind is that even though code will return to the caller after the timeout if the actor has yet to respond, the actor still goes on processing that message (performing the evaluation). It does not stop and move onto the next message just because a caller timed out. Just wanted to make that clear in case it wasn't.
EDIT
In response to your comment about stopping a long execution there are some things related to Akka to consider first. You can call stop the actor, send a Kill or a PosionPill, but none of these will stop if from processing the message that it's currently processing. They just prevent it from receiving new messages. In your case, with Rhino, if infinite script execution is a possibility, then I suggest handling this within Rhino itself. I would dig into the answers on this post (Stopping the Rhino Engine in middle of execution) and setup your Rhino engine in the actor in such a way that it will stop itself if it has been executing for too long. That failure will kick out to the supervisor (if pooled) and cause that pooled instance to be restarted which will init a new Rhino in preStart. This might be the best approach for dealing with the possibility of long running scripts.

Implementing long polling in scala and play 2.0 with akka

I'm implementing long polling in Play 2.0 in potentially a distributed environment. The way I understand it is that when Play gets a request, it should suspend pending notification of an update then go to the db to fetch new data and repeat. I started looking at the chat example that Play 2.0 offers but it's in websocket. Furthermore it doesn't look like it's capable of being distributed. So I thought I will use Akka's event bus. I took the eventstream implementation and replicated my own with LookupClassification. However I'm stumped as to how I'm gonna get a message back (or for that matter, what should be the subscriber instead of ActorRef)?
EventStream implementation:
https://github.com/akka/akka/blob/master/akka-actor/src/main/scala/akka/event/EventStream.scala
I am not sure that is what you are looking for, but there is quite a simple solution in the comet-clock sample, that you can adapt to use AKKA actors. It uses an infinite iframe instead of long polling. I have used an adapted version for a more complex application doing multiple DB calls and long computation in AKKA actors and it works fine.
def enum = Action {
//get your actor
val myActorRef = Akka.system.actorOf(Props[TestActor])
//do some query to your DB here. Promise.timeout is to simulate a blocking call
def getDatabaseItem(id: Int): Promise[String] = { Promise.timeout("test", 10 milliseconds) }
//test iterator, you will want something smarter here
val items1 = 1 to 10 toIterator
// this is a very simple enumerator that takes ints from an existing iterator (for an http request parameters for instance) and do some computations
def myEnum(it: Iterator[Int]): Enumerator[String] = Enumerator.fromCallback[String] { () =>
if (!items1.hasNext)
Promise.pure[Option[String]](None) //we are done with our computations
else {
// get the next int, query the database and compose the promise with a further query to the AKKA actor
getDatabaseItem(items1.next).flatMap { dbValue =>
implicit val timeout = new Timeout(10 milliseconds)
val future = (myActorRef ? dbValue) mapTo manifest[String]
// here we convert the AKKA actor to the right Promise[Option] output
future.map(v => Some(v)).asPromise
}
}
}
// finally we stream the result to the infinite iframe.
// console.log is the javascript callback, you will want something more interesting.
Ok.stream(myEnum(items1) &> Comet(callback = "console.log"))
}
Note that this fromCallback doesn't allow you to combine enumerators with "andThen", there is in the trunk version of play2 a generateM method that might be more appropriate if you want to use combinations.
It's not long polling, but it works fine.
I stumbled on your question while looking for the same thing.
I found the streaming solution unsatisfying as they caused "spinner of death" in webkit browser (i.e. shows it is loading all the time)
Anyhow, didn't have any luck finding good examples but I managed to create my own proof-of-concept using promises:
https://github.com/kallebertell/longpoll