Processing an akka stream asynchronously and writing to a file sink - scala

I am trying to write a piece of code that would consume a stream of tickers (stock exchange symbol of a company) and fetch company information from a REST API for each ticker.
I want to fetch information for multiple companies asynchronously.
I would like to save the results to a file in a continuous manner as the entire data set might not fit into memory.
Following the documentation of akka streams and resources that I was able to google on this subject I have come up with the following piece of code (some parts are omitted for brevity):
implicit val actorSystem: ActorSystem = ActorSystem("stock-fetcher-system")
implicit val materializer: ActorMaterializer = ActorMaterializer(None, Some("StockFetcher"))(actorSystem)
implicit val context = system.dispatcher
import CompanyJsonMarshaller._
val parallelism = 10
val connectionPool = Http().cachedHostConnectionPoolHttps[String](s"")
val listOfSymbols = symbols.toList
val outputPath = "out.txt"
.mapAsync(parallelism) {
stockSymbol => Future(HttpRequest(uri = s"${stockSymbol.symbol}/company"), stockSymbol.symbol)
.map {
case (Success(response), _) => Unmarshal(response.entity).to[Company]
case (Failure(ex), symbol) => println(s"Unable to fetch char data for $symbol") "x"
.runWith(FileIO.toPath(new File(outputPath).toPath, Set(StandardOpenOption.APPEND)))
.onComplete { _ =>
This is the problematic line:
runWith(FileIO.toPath(new File(outputPath).toPath, Set(StandardOpenOption.APPEND)))
which doesn't compile and the compiler gives me this mysterious-looking error:
Type mismatch, expected: Graph[SinkShape[Any], NotInferedMat2], actual: Sink[ByteString, Future[IOResult]]
If I change the sink to Sink.ignore or println(_) it works.
I'd appreciate some more detailed explanation.

As the compiler is indicating, the types don't match. In the call to .map...
.map {
  case (Success(response), _) =>
    Unmarshal(response.entity).to[Company]
  case (Failure(ex), symbol) =>
    println(s"Unable to fetch char data for $symbol")
    "x"
}
...you're returning either the result of unmarshalling to Company (a Future[Company]) or a String, so the compiler infers the closest common supertype (the "least upper bound"), which is Any. The Sink expects input elements of type ByteString, not Any.
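The inference described above can be reproduced without any stream at all. The following is a minimal sketch using plain Scala collections; `Company` is a hypothetical stand-in for the type in the question, and `String` plays the role that `ByteString` plays for the file sink.

```scala
object LubSketch {
  // Hypothetical stand-in for the Company type from the question.
  final case class Company(name: String)

  // Right = a fetched company, Left = the symbol whose lookup failed.
  val results: List[Either[String, Company]] =
    List(Right(Company("Acme")), Left("XYZ"))

  // One branch yields a Company, the other a String. The result can only be
  // typed as a common supertype, so it cannot feed a String-only consumer.
  val mixed: List[Any] = results.map {
    case Right(company) => company
    case Left(symbol)   => s"Unable to fetch $symbol"
  }

  // Both branches yield a String, so the element type stays String.
  val uniform: List[String] = results.map {
    case Right(company) => company.name
    case Left(symbol)   => s"Unable to fetch $symbol"
  }
}
```

The same idea applies to the stream: make every branch of the `map` produce the one type the sink accepts.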
One approach is to send the response to the file sink without unmarshalling the response:
.mapAsync(parallelism) {
.map(_.entity.dataBytes) // entity.dataBytes is a Source[ByteString, _]


Chaining context through akka streams

I'm converting some C# code to Scala and Akka Streams.
My C# code looks something like this:
Task<Result1> GetPartialResult1Async(Request request) ...
Task<Result2> GetPartialResult2Async(Request request) ...
async Task<Result> GetResultAsync(Request request)
{
    var result1 = await GetPartialResult1Async(request);
    var result2 = await GetPartialResult2Async(request);
    return new Result(request, result1, result2);
}
Now for the akka streams. Instead of having a function from Request to a Task of a result, I have flows from a Request to a Result.
So I already have the following two flows:
val partialResult1Flow: Flow[Request, Result1, NotUsed] = ...
val partialResult2Flow: Flow[Request, Result2, NotUsed] = ...
However I can't see how to combine them into a complete flow, since by calling via on the first flow we lose the original request, and by calling via on the second flow we lose the result of the first flow.
So I've created a WithState monad which looks something like this:
case class WithState[+TState, +TValue](value: TValue, state: TState) {
  def map[TResult](func: TValue => TResult): WithState[TState, TResult] = {
    WithState(func(value), state)
  }
  // ... bunch more helper functions go here
}
Then I'm rewriting my original flows to look like this:
def partialResult1Flow[TState]: Flow[WithState[TState, Request], WithState[TState, Result1], NotUsed] = ...
def partialResult2Flow[TState]: Flow[WithState[TState, Request], WithState[TState, Result2], NotUsed] = ...
and using them like this:
val flow = Flow[Request]
  .map(x => WithState(x, x))
  .via(partialResult1Flow)
  .map(x => WithState(x.state, (x.state, x.value)))
  .via(partialResult2Flow)
  .map(x => Result(x.state._1, x.state._2, x.value))
Now this works, but of course I can't guarantee how flow will be used. So I really ought to make it take a State parameter:
def flow[TState] = Flow[WithState[TState, Request]]
  .map(x => WithState(x.value, (x.state, x.value)))
  .via(partialResult1Flow)
  .map(x => WithState(x.state._2, (x.state, x.value)))
  .via(partialResult2Flow)
  .map(x => WithState(Result(x.state._1._2, x.state._2, x.value), x.state._1._1))
Now at this stage my code is getting extremely hard to read. I could clean it up by naming the functions, and using case classes instead of tuples etc. but fundamentally there's a lot of incidental complexity here, which is hard to avoid.
Am I missing something? Is this not a good use case for Akka streams? Is there some inbuilt way of doing this?
I don't have any fundamentally different way to do this than I described in the question.
However the current flow can be significantly improved:
Stage 1: FlowWithContext
Instead of using a custom WithState monad, it's possible to use the built in FlowWithContext.
The advantage of this is that you can use the standard operators on the flow, without needing to worry about transforming the WithState monad. Akka takes care of this for you.
So instead of
def partialResult1Flow[TState]: Flow[WithState[TState, Request], WithState[TState, Result1]] =
Flow[WithState[TState, Request]].mapAsync(_ mapAsync {doRequest(_)})
We can write:
def partialResult1Flow[TState]: FlowWithContext[Request, TState, Result1, TState, NotUsed] =
FlowWithContext[Request, TState].mapAsync(doRequest(_))
Unfortunately though, whilst FlowWithContext is quite easy to write when you don't need to change the context, it's a little fiddly to use when you need to go via a stream which requires you to move some of your current data into the context (as ours does). In order to do that you need to convert to a Flow (using asFlow), and then back to a FlowWithContext using asFlowWithContext.
I found it easiest to just write the whole thing as a Flow in such cases, and convert to a FlowWithContext at the end.
For example:
def flow[TState]: FlowWithContext[Request, TState, Result, TState, NotUsed] =
  Flow[(Request, TState)]
    .map(x => (x._1, (x._1, x._2)))
    .via(partialResult1Flow)
    .map(x => (x._2._1, (x._2._1, x._1, x._2._2)))
    .via(partialResult2Flow)
    // the context is now (Request, Result1, TState); keep only the TState
    .map(x => (Result(x._2._1, x._2._2, x._1), x._2._3))
    .asFlowWithContext((a: Request, b: TState) => (a,b))(_._2)
Is this any better?
In this particular case it's probably worse. In other cases, where you rarely need to change the context it would be better. However either way I would recommend using it as it's built in, rather than relying on a custom monad.
Stage 2: viaUsing
In order to make this a bit more user friendly I created a viaUsing extension method for Flow and FlowWithContext:
import akka.stream.{FlowShape, Graph}
import akka.stream.scaladsl.{Flow, FlowWithContext}
object FlowExtensions {
  implicit class FlowViaUsingOps[In, Out, Mat](val f: Flow[In, Out, Mat]) extends AnyVal {
    def viaUsing[Out2, Using, Mat2](func: Out => Using)(flow: Graph[FlowShape[(Using, Out), (Out2, Out)], Mat2]) : Flow[In, (Out2, Out), Mat] =
      f.map(x => (func(x), x)).via(flow)
  }
  implicit class FlowWithContextViaUsingOps[In, CtxIn, Out, CtxOut, Mat](val f: FlowWithContext[In, CtxIn, Out, CtxOut, Mat]) extends AnyVal {
    def viaUsing[Out2, Using, Mat2](func: Out => Using)(flow: Graph[FlowShape[(Using, (Out, CtxOut)), (Out2, (Out, CtxOut))], Mat2]):
    FlowWithContext[In, CtxIn, (Out2, Out), CtxOut, Mat] =
      f.asFlow
        .map(x => (func(x._1), (x._1, x._2)))
        .via(flow)
        .asFlowWithContext((a: In, b: CtxIn) => (a,b))(_._2._2)
        .map(x => (x._1, x._2._1))
  }
}
The purpose of viaUsing is to create the input for a FlowWithContext from the current output, whilst preserving your current output by passing it through the context. It results in a Flow whose output is a tuple of the output of the nested flow and the output of the original flow.
With viaUsing our example simplifies to:
def flow[TState]: FlowWithContext[Request, TState, Result, TState, NotUsed] =
FlowWithContext[Request, TState]
.viaUsing(x => x)(partialResult1Flow)
.viaUsing(x => x._2)(partialResult2Flow)
.map(x => Result(x._2._2, x._2._1, x._1))
I think this is a significant improvement. I've made a request to add viaUsing to Akka instead of relying on extension methods here.
I agree using Akka Streams for backpressure is useful. However, I'm not convinced that modelling the calculation of the partialResults as streams is useful here. Having the 'inner' logic based on Futures, and wrapping those in the mapAsync of your flow to apply backpressure to the whole operation as one unit, seems simpler and perhaps even better.
This is basically a boiled-down refactoring of Levi Ramsey's earlier excellent answer:
import scala.concurrent.{ ExecutionContext, Future }
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.Flow
case class Request()
case class Result1()
case class Result2()
case class Response(r: Request, r1: Result1, r2: Result2)
def partialResult1(req: Request): Future[Result1] = ???
def partialResult2(req: Request): Future[Result2] = ???
val system: ActorSystem = ??? // your ActorSystem
implicit val ec: ExecutionContext = system.dispatcher
val flow: Flow[Request, Response, NotUsed] =
  Flow[Request]
    .mapAsync(parallelism = 12) { req =>
      for {
        res1 <- partialResult1(req)
        res2 <- partialResult2(req)
      } yield (Response(req, res1, res2))
    }
I would start with this, and only if you know you have reason to split partialResult1 and partialResult2 into separate stages introduce an intermediate step in the Flow. Depending on your requirements mapAsyncUnordered might be more suitable.
Disclaimer, I'm not totally familiar with C#'s async/await.
From what I've been able to glean from a quick perusal of the C# docs, Task<T> is a strictly (i.e. eager, not lazy) evaluated computation which will if successful eventually contain a T. The Scala equivalent of this is Future[T], where the equivalent of the C# code would be:
import scala.concurrent.{ ExecutionContext, Future }
def getPartialResult1Async(req: Request): Future[Result1] = ???
def getPartialResult2Async(req: Request): Future[Result2] = ???
def getResultAsync(req: Request)(implicit ectx: ExecutionContext): Future[Result] = {
  val result1 = getPartialResult1Async(req)
  val result2 = getPartialResult2Async(req)
  // zipWith takes a two-argument function, not a function of a tuple
  result1.zipWith(result2) { (r1, r2) =>
    new Result(req, r1, r2)
  }
  /* Could also:
   * for {
   *   r1 <- result1
   *   r2 <- result2
   * } yield { new Result(req, r1, r2) }
   * Note that both the `result1.zipWith(result2)` and the above `for`
   * construction may compute the two partial results simultaneously. If you
   * want to ensure that the second partial result is computed after the first
   * partial result is successfully computed:
   * for {
   *   r1 <- getPartialResult1Async(req)
   *   r2 <- getPartialResult2Async(req)
   * } yield new Result(req, r1, r2)
   */
}
No Akka Streams required for this particular case, but if you have some other need to use Akka Streams, you could express this as
val actorSystem: ActorSystem = ??? // In Akka Streams 2.6, you'd probably have this as an implicit val
val parallelism: Int = ??? // Controls requests in flight
val flow = Flow[Request]
  .mapAsync(parallelism) { req =>
    import actorSystem.dispatcher
    getPartialResult1Async(req).map { r1 => (req, r1) }
  }
  .mapAsync(parallelism) { tup =>
    import actorSystem.dispatcher
    // tup is (request, result1): the second call needs the request again
    getPartialResult2Async(tup._1).map { r2 =>
      new Result(tup._1, tup._2, r2)
    }
  }
/* Given the `getResultAsync` function in the previous snippet, you could also:
 * val flow = Flow[Request].mapAsync(parallelism) { req =>
 *   getResultAsync(req)(actorSystem.dispatcher)
 * }
 */
One advantage of the Future-based implementation is that it's pretty easy to integrate with whatever Scala abstraction of concurrency/parallelism you want to use in a given context (e.g. cats, akka stream, akka). My general instinct to an Akka Streams integration would be in the direction of the three-liner in my comment in the second code block.
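The difference between starting both futures up front and starting the second one inside a `for` comprehension can be observed with the standard library alone. This is only a sketch: `step` is a hypothetical computation that records when it starts and ends, and the 20 ms sleep is an arbitrary stand-in for real work.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object FutureOrdering {
  implicit val ec: ExecutionContext = ExecutionContext.global

  type Log = ConcurrentLinkedQueue[String]

  // Hypothetical partial computation that logs its start and end.
  def step(name: String, log: Log): Future[String] = Future {
    log.add(s"start-$name")
    Thread.sleep(20)
    log.add(s"end-$name")
    name
  }

  // Both futures are created before either result is used, so they may overlap.
  def concurrent(log: Log): Future[(String, String)] = {
    val first  = step("1", log)
    val second = step("2", log)
    first.zipWith(second)((a, b) => (a, b))
  }

  // The second future is only created once the first has completed.
  def sequential(log: Log): Future[(String, String)] =
    for {
      a <- step("1", log)
      b <- step("2", log)
    } yield (a, b)

  def await[T](f: Future[T]): T = Await.result(f, 5.seconds)
  def entries(log: Log): List[String] = log.toArray(new Array[String](0)).toList
}
```

With `sequential` the log is always start-1, end-1, start-2, end-2; with `concurrent` the order of the four entries is not guaranteed.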

Akka Stream - Select Sink based on Element in Flow

I'm creating a simple message delivery service using Akka stream. The service is just like mail delivery, where elements from source include destination and content like:
case class Message(destination: String, content: String)
and the service should deliver the messages to appropriate sink based on the destination field. I created a DeliverySink class to let it have a name:
case class DeliverySink(name: String, sink: Sink[String, Future[Done]])
Now, I instantiated two DeliverySink, let me call them sinkX and sinkY, and created a map based on their name. In practice, I want to provide a list of sink names and the list should be configurable.
The challenge I'm facing is how to dynamically choose an appropriate sink based on the destination field.
Eventually, I want to map Flow[Message] to a sink. I tried:
val sinkNames: List[String] = List("sinkX", "sinkY")
val sinkMapping: Map[String, DeliverySink] = sinkNames.map { name => name -> DeliverySink(name, ???) }.toMap
Flow[Message].map { msg => msg.content }.to(sinks(msg.destination).sink)
but, obviously this doesn't work because we can't reference msg outside of map...
I guess this is not a right approach. I also thought about using filter with broadcast, but if the destination scales to 100, I cannot type every routing. What is a right way to achieve my goal?
Ideally, I would like to make destinations dynamic. So, I cannot statically type all destinations in filter or routing logic. If a destination sink has not been connected, it should create a new sink dynamically too.
If You Have To Use Multiple Sinks
Sink.combine would directly suit your existing requirements. If you attach an appropriate Flow.filter before each Sink then they'll only receive the appropriate messages.
Don't Use Multiple Sinks
In general I think it is bad design to have the structure, and content, of streams contain business logic. Your stream should be a thin veneer for back-pressured concurrency on top of business logic which is in ordinary scala/java code.
In this particular case, I think it would be best to wrap your destination routing inside of a single Sink and the logic should be implemented inside of a separate function. For example:
val routeMessage : (Message) => Unit =
(message) =>
if(message.destination equalsIgnoreCase "stdout")
System.out println message.content
else if(message.destination equalsIgnoreCase "stderr")
System.err println message.content
val routeSink : Sink[Message, _] = Sink foreach routeMessage
Note how much easier it is to now test my routeMessage since it isn't inside of the stream: I don't need any akka testkit "stuff" to test routeMessage. I can also move the function to a Future or a Thread if my concurrency design were to change.
Many Destinations
If you have many destinations you can use a Map. Suppose, for example, you are sending your messages to AmazonSQS. You could define a function to convert a Queue Name to Queue URL and use that function to maintain a Map of already created names:
import scala.collection.mutable
type QueueName = String
val nameToRequest : (QueueName) => CreateQueueRequest = ??? //implementation unimportant
type QueueURL = String
val nameToURL : (AmazonSQS) => (QueueName) => QueueURL = {
  // cache of queue URLs that have already been looked up
  val nameToURL = mutable.Map.empty[QueueName, QueueURL]
  (sqs) => (queueName) => nameToURL.get(queueName) match {
    case Some(url) => url
    case None => {
      val url = sqs.getQueueUrl(queueName).getQueueUrl()
      nameToURL put (queueName, url)
      url
    }
  }
}
Now you can use this non-stream function inside of a singular Sink:
val sendMessage : (AmazonSQS) => (Message) => Unit =
  (sqs) => (message) =>
    sqs sendMessage {
      (new SendMessageRequest())
        .withQueueUrl(nameToURL(sqs)(message.destination))
        .withMessageBody(message.content)
    }
val sqs : AmazonSQS = ???
val messageSink = Sink foreach sendMessage(sqs)
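The same look-up-or-create idea can be sketched without SQS. In the example below, `Destination` and `Router` are hypothetical names, and an in-memory buffer stands in for a real queue, file or socket. Unknown destination names get a new destination on first use, which is the dynamic behaviour asked for in the question.

```scala
import scala.collection.mutable

object DynamicRouting {
  final case class Message(destination: String, content: String)

  // Hypothetical destination: an in-memory buffer standing in for a real target.
  final class Destination(val name: String) {
    val received: mutable.ListBuffer[String] = mutable.ListBuffer.empty
    def deliver(content: String): Unit = received += content
  }

  // Keeps the destinations created so far and creates missing ones on demand.
  // Not thread-safe: this assumes it is called from one place at a time.
  final class Router {
    private val destinations = mutable.Map.empty[String, Destination]

    def route(message: Message): Unit =
      destinations
        .getOrElseUpdate(message.destination, new Destination(message.destination))
        .deliver(message.content)

    def snapshot: Map[String, List[String]] =
      destinations.map { case (name, dest) => name -> dest.received.toList }.toMap
  }
}
```

A function like `router.route` has the shape `Message => Unit`, so it could be handed to a single `Sink.foreach` in the same way as `routeMessage` above.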
Side Note
For destination you probably want to use something other than String. A coproduct is usually better because it can be used with case statements, and the compiler will warn you if you miss one of the possibilities:
sealed trait Destination
object Out extends Destination
object Err extends Destination
object SomethingElse extends Destination
case class Message(destination: Destination, content: String)
//The compiler flags this match as non-exhaustive because SomethingElse doesn't have a case
//(a warning by default, an error with -Xfatal-warnings)
val routeMessage : (Message) => Unit =
  (message) => message.destination match {
    case Out =>
      System.out println message.content
    case Err =>
      System.err println message.content
  }
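For comparison, here is a sketch of a match that covers every case. `describe` is a hypothetical helper that returns a label instead of printing, which makes it easy to check in a plain unit test.

```scala
object Destinations {
  sealed trait Destination
  case object Out extends Destination
  case object Err extends Destination
  case object SomethingElse extends Destination

  final case class Message(destination: Destination, content: String)

  // Every case of the sealed trait is handled here. Deleting one of the
  // cases makes the compiler report a non-exhaustive match.
  def describe(message: Message): String = message.destination match {
    case Out           => s"stdout: ${message.content}"
    case Err           => s"stderr: ${message.content}"
    case SomethingElse => s"other: ${message.content}"
  }
}
```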
Given your requirement, maybe you want to consider multiplexing your stream source into substreams using groupBy:
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, IOResult}
import akka.stream.scaladsl.{FileIO, Sink, Source}
import akka.util.ByteString
import akka.{NotUsed, Done}
import scala.concurrent.Future
import java.nio.file.Paths
import java.nio.file.StandardOpenOption._
implicit val system = ActorSystem("sys")
implicit val materializer = ActorMaterializer()
import system.dispatcher
case class Message(destination: String, content: String)
case class DeliverySink(name: String, sink: Sink[ByteString, Future[IOResult]])
val messageSource: Source[Message, NotUsed] = Source(List(
  Message("a", "uuu"), Message("a", "vvv"),
  Message("b", "xxx"), Message("b", "yyy"), Message("b", "zzz")
))
val sinkA = DeliverySink("sink-a", FileIO.toPath(
  Paths.get("/path/to/sink-a.txt"), options = Set(CREATE, WRITE)
))
val sinkB = DeliverySink("sink-b", FileIO.toPath(
  Paths.get("/path/to/sink-b.txt"), options = Set(CREATE, WRITE)
))
val sinkMapping: Map[String, DeliverySink] = Map("a" -> sinkA, "b" -> sinkB)
val totalDests = 2
messageSource.map(m => (m.destination, m)).
  groupBy(totalDests, _._1).
  fold(("", List.empty[Message])) {
    case ((_, list), (dest, msg)) => (dest, msg :: list)
  }.
mapAsync(parallelism = totalDests) {
case (dest: String, msgList: List[Message]) =>

Scala & Play Websockets: Storing messages exchanged

I started playing around with Scala and came across this particular boilerplate of a web socket chatroom in Scala.
They use MergeHub.source and BroadcastHub.sink as their Source and Sink for sending the messages to all connected clients.
The example is working fine for exchanging messages as it is.
private val (chatSink, chatSource) = {
  // Don't log MergeHub$ProducerFailed as error if the client disconnects.
  // recoverWithRetries -1 is essentially "recoverWith"
  val source = MergeHub.source[WSMessage]
    .recoverWithRetries(-1, { case _: Exception ⇒ Source.empty })
  val sink = BroadcastHub.sink[WSMessage]
  source.toMat(sink)(Keep.both).run()
}
private val userFlow: Flow[WSMessage, WSMessage, _] = {
  Flow.fromSinkAndSource(chatSink, chatSource)
}
def chat(): WebSocket = {
  WebSocket.acceptOrResult[WSMessage, WSMessage] {
    case rh if sameOriginCheck(rh) =>
      Future.successful(userFlow).map { flow =>
        Right(flow)
      }.recover {
        case e: Exception =>
          val msg = "Cannot create websocket"
          logger.error(msg, e)
          val result = InternalServerError(msg)
          Left(result)
      }
    case rejected =>
      logger.error(s"Request ${rejected} failed same origin check")
      Future.successful {
        Left(Forbidden("forbidden"))
      }
  }
}
I want to store the messages that are exchanged in the chatroom in a DB.
I tried adding map and fold functions to source and sink to get hold of the messages that are sent but I wasn't able to.
I tried adding a Flow stage between MergeHub and BroadcastHub like below
val flow = Flow[WSMessage].map(element => println(s"Message: $element"))
But it throws a compilation error that cannot reference toMat with such signature.
Can someone help, or point me to how I can get hold of the messages that are sent and store them in a DB?
Let's look at your flow:
val flow = Flow[WSMessage].map(element => println(s"Message: $element"))
It takes elements of type WSMessage and emits Unit, because println returns Unit. Here it is again with its type written out:
val flow: Flow[WSMessage, Unit, NotUsed] = Flow[WSMessage].map(element => println(s"Message: $element"))
This will clearly not work as the sink expects WSMessage and not Unit.
Here's how you can fix the above problem:
val flow = Flow[WSMessage].map { element =>
  println(s"Message: $element")
  element
}
Note that for persisting messages in the database, you will most likely want to use an async stage, roughly:
val flow = Flow[WSMessage].mapAsync(parallelism) { element =>
  println(s"Message: $element")
  // assuming DB.write() returns a Future[Unit]
  DB.write(element).map(_ => element)
}
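The function passed to mapAsync can be written and tested on its own, outside the stream. The sketch below assumes a hypothetical in-memory `DB` object and a simplified `WSMessage` case class; neither comes from the Play template. The point is only that the Future completes with the original element once the write has finished.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.concurrent.{ExecutionContext, Future}

object PersistThenForward {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Simplified, hypothetical message type.
  final case class WSMessage(text: String)

  // Hypothetical in-memory "database" with an asynchronous write.
  object DB {
    private val rows = new ConcurrentLinkedQueue[WSMessage]()
    def write(message: WSMessage): Future[Unit] = Future { rows.add(message); () }
    def all: List[WSMessage] = rows.toArray(new Array[WSMessage](0)).toList
  }

  // Persist first, then emit the same element unchanged, so that downstream
  // stages still see a WSMessage rather than Unit.
  def persist(element: WSMessage): Future[WSMessage] =
    DB.write(element).map(_ => element)
}
```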

How to add an error flow for Akka http websockets

I've been banging my head against the wall for quite some time as I can't figure out how to add an error flow for an akka http websocket flow. What I'm trying to achieve is:
Message comes in from WS client
It's parsed with circe from json
If the message was the right format send the parsed message to an actor
If the message was the wrong format return an error message to the client
The actor can additionally send messages to the client
Without the error handling this was quite easy, but I can't figure out how to add the errors. Here's what I have:
type GameDecodeResult =
Either[(String, io.circe.Error), GameLobby.LobbyRequest]
val errorFlow =
.mapConcat {
case Left(err) => err :: Nil
case Right(_) => Nil
.map { case (message, error) =>"failed to parse message $message", error)
val normalFlow = {
val normalFlowSink =
.mapConcat {
case Right(msg) => msg :: Nil
case Left(_) => Nil
.map(req => GameLobby.IncomingMessage(userId, req))
.to(Sink.actorRef[GameLobby.IncomingMessage](gameLobby, PoisonPill))
val normalFlowSource: Source[Message, NotUsed] =
.mapMaterializedValue { outActor =>
gameLobby ! GameLobby.UserConnected(userId, outActor)
.map(outMessage => TextMessage(Ok(outMessage.message).asJson.spaces2))
Flow.fromSinkAndSource(normalFlowSink, normalFlowSource)
val incomingMessageParser =
.flatMapConcat {
case tm: TextMessage =>
case bm: BinaryMessage =>
Source.empty }
.map { message =>
decode[GameLobby.LobbyRequest](message) => message -> err)
These are my flows, and I think this should be good enough, but I have no idea how to assemble them, and the complexity of the Akka Streams API doesn't help. Here's what I tried:
val x: Flow[Message, Message, NotUsed] =
GraphDSL.create(incomingMessageParser, normalFlow, errorFlow)((_, _, _)) { implicit builder =>
(incoming, normal, error) =>
import GraphDSL.Implicits._
val partitioner = builder.add(Partition[GameDecodeResult](2, {
case Right(_) => 0
case Left(_) => 1
val merge = builder.add(Merge[Message](2)) ~> partitioner ~> normal ~> merge
partitioner ~> error ~> merge
but admittedly I have absolutely no idea how GraphDSL.create works, where I can use the ~> arrow, or what I'm doing in general in the last part. It just won't type check and the error messages are not helping me one bit.
A few things needing to be fixed in the Flow you're building using the GraphDSL:
There is no need to pass the 3 subflows to the GraphDSL.create method, as this is only needed to customize the materialized value of your graph. You have already decided the materialized value of your graph is going to be NotUsed.
When connecting incoming using the ~> operator, you need to connect its outlet (.out) to the partition stage.
Every GraphDSL definition block needs to return the shape of your graph - i.e. its external ports. You do that by returning a FlowShape that has incoming.in as input and merge.out as output. These will define the blueprint of your custom flow.
Because in the end you want to obtain a Flow, you're missing a last call to create it from the graph you defined. This call is Flow.fromGraph(...).
Code example below:
val x: Flow[Message, Message, NotUsed] =
  Flow.fromGraph(GraphDSL.create() { implicit builder =>
    import GraphDSL.Implicits._
    val partitioner = builder.add(Partition[GameDecodeResult](2, {
      case Right(_) => 0
      case Left(_) => 1
    }))
    val merge = builder.add(Merge[Message](2))
    val incoming = builder.add(incomingMessageParser)
    incoming.out ~> partitioner
    partitioner ~> normalFlow ~> merge
    partitioner ~> errorFlow ~> merge
    FlowShape(incoming.in, merge.out)
  })
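The function given to Partition is an ordinary function from an element to an outlet index, so it can be checked without building a graph. The sketch below uses a simplified, hypothetical `DecodeResult` (an Int payload and a String error) instead of the circe-based GameDecodeResult, and a toy `decode`.

```scala
object PartitionSketch {
  // Simplified stand-in: Left carries (raw message, error text), Right the parsed value.
  type DecodeResult = Either[(String, String), Int]

  // Hypothetical decoder: succeeds only for integer input.
  def decode(raw: String): DecodeResult =
    raw.toIntOption.toRight((raw, s"not a number: $raw"))

  // Same shape as the function passed to Partition: the returned index
  // selects the outlet (0 = normal flow, 1 = error flow).
  val outletFor: DecodeResult => Int = {
    case Right(_) => 0
    case Left(_)  => 1
  }
}
```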

Stream Future in Play 2.5

Once again I am attempting to update some pre Play 2.5 code (based on this vid). For example the following used to be how to stream a Future:
Ok.chunked(Enumerator.generateM(Promise.timeout(Some("hello"), 500)))
I have created the following method for the work-around for Promise.timeout (deprecated) using Akka:
private def keepResponding(data: String, delay: FiniteDuration, interval: FiniteDuration): Future[Result] = {
  val promise: Promise[Result] = Promise[Result]()
  actorSystem.scheduler.schedule(delay, interval) { promise.success(Ok(data)) }
  promise.future
}
According to the Play Framework Migration Guide; Enumerators should be rewritten to a Source and Source.unfoldAsync is apparently the equivalent of Enumerator.generateM so I was hoping that this would work (where str is a Future[String]):
def inf = Action { request =>
val str = keepResponding("stream me", 1.second, 2.second)
Of course I'm getting a Type mismatch error and when looking at the case class signature of unfoldAsync:
final class UnfoldAsync[S, E](s: S, f: S ⇒ Future[Option[(S, E)]])
I can see that the parameters are not correct but I'm not fully understanding what/how I should pass this through.
unfoldAsync is even more generic than Play!'s own generateM, as it allows you to pass through a state (S) value. This lets the emitted value depend on the previously emitted value(s).
The example below will load values by an increasing id, until the loading fails:
val source: Source[String, NotUsed] = Source.unfoldAsync(0){ id ⇒
  loadFromId(id)
    .map(s ⇒ Some((id + 1, s)))
    .recover{case _ ⇒ None}
}
def loadFromId(id: Int): Future[String] = ???
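To make the state-passing concrete, here is a sketch that runs the same kind of loop with plain Futures and collects the emitted elements into a List. `unfoldAsyncToList` is a hypothetical helper written for this illustration, not Akka's implementation, and `loadFromId` is a toy loader in which only ids 0 to 2 exist.

```scala
import scala.concurrent.{ExecutionContext, Future}

object UnfoldSketch {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Repeatedly applies f to the current state. Some((next, element)) emits
  // the element and continues with the next state; None stops.
  def unfoldAsyncToList[S, E](state: S)(f: S => Future[Option[(S, E)]]): Future[List[E]] =
    f(state).flatMap {
      case Some((next, element)) => unfoldAsyncToList(next)(f).map(element :: _)
      case None                  => Future.successful(Nil)
    }

  // Toy loader: ids 0 to 2 exist, anything else fails.
  def loadFromId(id: Int): Future[String] =
    if (id < 3) Future.successful(s"item-$id")
    else Future.failed(new NoSuchElementException(s"no item $id"))

  // Same shape as the function passed to Source.unfoldAsync above.
  def items: Future[List[String]] =
    unfoldAsyncToList[Int, String](0) { id =>
      loadFromId(id)
        .map(s => Option((id + 1, s)))
        .recover { case _ => None }
    }
}
```

Here the first failed load ends the sequence, so `items` completes with the three elements loaded before it.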
In your case an internal state is not really needed, therefore you can just pass dummy values whenever required, e.g.
val source: Source[Result, NotUsed] = Source.unfoldAsync(NotUsed) { _ ⇒
  schedule("stream me", 2.seconds).map(x ⇒ Some(NotUsed → x))
}
def schedule(data: String, delay: FiniteDuration): Future[Result] = {
  akka.pattern.after(delay, system.scheduler){Future.successful(Ok(data))}
}
Note that your original implementation of keepResponding is incorrect, as you cannot complete a Promise more than once. Akka's after pattern offers a simpler way to achieve what you need.
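The single-completion rule is easy to check with the standard library alone. In this small sketch the second call to `success` throws an IllegalStateException, which is what a repeating scheduler would run into on its second tick, while `trySuccess` reports the same situation by returning false.

```scala
import scala.concurrent.Promise
import scala.util.Try

object PromiseOnce {
  val promise: Promise[String] = Promise[String]()

  // The first completion succeeds.
  promise.success("first")

  // A second `success` throws, captured here as a Failure.
  val second: Try[Promise[String]] = Try(promise.success("second"))

  // `trySuccess` returns false instead of throwing.
  val third: Boolean = promise.trySuccess("third")
}
```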
However, note that in your specific case, Akka Streams offers a more idiomatic solution with Source.tick:
val source: Source[String, Cancellable] = Source.tick(1.second, 2.seconds, NotUsed).mapAsync(1){ _ ⇒
  loadSomeFuture()
}
def loadSomeFuture(): Future[String] = ???
or even simpler in case you don't actually need asynchronous computation as in your example
val source: Source[String, Cancellable] = Source.tick(1.second, 2.seconds, "stream me")