How to make effectful computation referential transparent - scala

I'm learning FP by writing simple apps. And now I'm approaching effect monads (cats.effect.IO/scalaz.IO does not really matter). I have two functions:
def open(path: String): IO[InputStream] = IO {
new FileInputStream(new File(path))
}
def read(is: InputStream): IO[Option[Array[Byte]]] = IO {
val buffer = new Array[Byte](4096)
val bytesRead = is.read(buffer)
if (bytesRead != -1) {
val newBuffer = new Array[Byte](bytesRead)
System.arraycopy(buffer, 0, newBuffer, 0, bytesRead)
print(new String(buffer))
Some(newBuffer)
}
else
None
}
And I can combine them into a stream as follows
import cats.effect.IO
import fs2.Stream
object App {
def main(args: Array[String]): Unit = logic.unsafeRunSync()
def logic: IO[Unit] = for {
is <- open("/tmp/prompthooks.py")
_ <- fs2.Stream.eval(read(is)).repeat.unNoneTerminate.compile.drain
} yield ()
}
And it works fine. But the question is if it all implemented in pure FP. I have doubts about that in a sense that def read(is: InputStream): IO[Option[Array[Byte]] accepts stream and try to read from it. Yes it suspends side-effect but in val io = read(is) is sort of stateful (if we perform unsafeRunSync twice we got different result.

Related

Mixing pure and impure with Cats Effect

Suppose we have a two-stage pure method.
def filterPositivePure(seq: Seq[Int]): Seq[Int] =
if (seq.nonEmpty) {
val sorted = seq.sorted
sorted.filter(_ > 0)
}
else seq
We needed to log the intermediate result of the calculations or any other impure action.
def filterPositiveImpure(seq: Seq[Int]): Seq[Int] =
if (seq.nonEmpty) {
val sorted = seq.sorted
println(sorted)
sorted.filter(_ > 0)
}
else seq
Then to preserve purity, we wrapped in IO the output to the log, the rest of the calculations and the result of the entire alternative if branch.
def filterPositiveIO(seq: Seq[Int]): IO[Seq[Int]] =
if (seq.nonEmpty) {
val sorted = seq.sorted
IO(println(sorted)) *> IO(sorted.filter(_ > 0))
}
else IO(seq)
Is there a more coincise way to bring purity back?
I think better way is working on one action as a new effect and write separate function for logging:
import cats.syntax.flatMap._
// this three imports is just to make show is working
import cats.syntax.show._
import cats.instances.list._
import cats.instances.int._
// now function looks more structural and without multiple using sorted list
def filterPositiveIO(seq: Seq[Int]): IO[Seq[Int]] = {
if (seq.nonEmpty)
withLogging(seq.toList.sorted).map(_.filter(_ > 0))
else
IO(seq)
}
// this function is just for making some logging effect and return lifted value into IO
def withLogging[A: Show](value: A): IO[A] = logging(value.show) >> IO(value)
// here you should make logging effect
def logging(strExpr: => String): IO[Unit] = ???

Scala stream and ExecutionContext issue

I'm new in Scala and i'm facing a few problems in my assignment :
I want to build a stream class that can do 3 main tasks : filter,map,and forEach.
My streams data is an array of elements. Each of the 3 main tasks should run in 2 different threads on my streams array.
In addition, I need to divde the logic of the action and its actual run to two different parts. First declare all tasks in stream and only when I run stream.run() I want the actual actions to happen.
My code :
class LearningStream[A]() {
val es: ExecutorService = Executors.newFixedThreadPool(2)
val ec = ExecutionContext.fromExecutorService(es)
var streamValues: ArrayBuffer[A] = ArrayBuffer[A]()
var r: Runnable = () => "";
def setValues(streamv: ArrayBuffer[A]) = {
streamValues = streamv;
}
def filter(p: A => Boolean): LearningStream[A] = {
var ls_filtered: LearningStream[A] = new LearningStream[A]()
r = () => {
println("running real filter..")
val (l,r) = streamValues.splitAt(streamValues.length/2)
val a:ArrayBuffer[A]=es.submit(()=>l.filter(p)).get()
val b:ArrayBuffer[A]=es.submit(()=>r.filter(p)).get()
ms_filtered.setValues(a++b)
}
return ls_filtered
}
def map[B](f: A => B): LearningStream[B] = {
var ls_map: LearningStream[B] = new LearningStream[B]()
r = () => {
println("running real map..")
val (l,r) = streamValues.splitAt(streamValues.length/2)
val a:ArrayBuffer[B]=es.submit(()=>l.map(f)).get()
val b:ArrayBuffer[B]=es.submit(()=>r.map(f)).get()
ls_map.setValues(a++b)
}
return ls_map
}
def forEach(c: A => Unit): Unit = {
r=()=>{
println("running real forEach")
streamValues.foreach(c)}
}
def insert(a: A): Unit = {
streamValues += a
}
def start(): Unit = {
ec.submit(r)
}
def shutdown(): Unit = {
ec.shutdown()
}
}
my main :
def main(args: Array[String]): Unit = {
var factorial=0
val s = new LearningStream[String]
s.filter(str=>str.startsWith("-")).map(s=>s.toInt*(-1)).forEach(i=>factorial=factorial*i)
for(i <- -5 to 5){
s.insert(i.toString)
}
println(s.streamValues)
s.start()
println(factorial)
}
The main prints only the filter`s output and the factorial isnt changed (still 1).
What am I missing here ?
My solution: #Levi Ramsey left a few good hints in the comments if you want to get hints and not the real solution.
First problem: Only one command (filter) run and the other didn't. solution: insert to the runnable of each command a call for the next stream via:
ec.submit(ms_map.r)
In order to be able to close all sessions, we need to add another LearningStream data member to the class. However we can't add just a regular LearningStream object because it depends on parameter [A]. Therefore, I implemented a trait that has the close function and my data member was of that trait type.

Composing BodyParser in Play 2.5

Given a function with this signature:
def parser[A](otherParser: BodyParser[A]): BodyParser[A]
How can I write the function in such a way that the request body is examined and verified before it is passed to otherParser?
For simplicity let's say that I want to verify that a header ("Some-Header", perhaps) has a value that matches the body exactly. So if I have this action:
def post(): Action(parser(parse.tolerantText)) { request =>
Ok(request.body)
}
When I make a request like curl -H "Some-Header: hello" -d "hello" http://localhost:9000/post it should return "hello" in the response body with a status of 200. If my request is curl -H "Some-Header: hello" -d "hi" http://localhost:9000/post it should return a 400 with no body.
Here's what I've tried.
This one does not compile because otherParser(request).through(flow) expects flow to output a ByteString. The idea here was that the flow could notify the accumulator whether or not to continue processing via the Either output. I'm not sure how to let the accumulator know the status of the previous step.
def parser[A](otherParser: BodyParser[A]): BodyParser[A] = BodyParser { request =>
val flow: Flow[ByteString, Either[Result, ByteString], NotUsed] = Flow[ByteString].map { bytes =>
if (request.headers.get("Some-Header").contains(bytes.utf8String)) {
Right(bytes)
} else {
Left(BadRequest)
}
}
val acc: Accumulator[ByteString, Either[Result, A]] = otherParser(request)
// This fails to compile because flow needs to output a ByteString
acc.through(flow)
}
I also attempted to use filter. This one does compile and the response body that gets written is correct. However it always returns a 200 Ok response status.
def parser[A](otherParser: BodyParser[A]): BodyParser[A] = BodyParser { request =>
val flow: Flow[ByteString, ByteString, akka.NotUsed] = Flow[ByteString].filter { bytes =>
request.headers.get("Some-Header").contains(bytes.utf8String)
}
val acc: Accumulator[ByteString, Either[Result, A]] = otherParser(request)
acc.through(flow)
}
I came up with a solution using a GraphStageWithMaterializedValue. This concept was borrowed from Play's maxLength body parser. The key difference between my first attempt in my question (that doesn't compile) is that instead of attempting to mutate the stream I should use the materialized value to convey information about the state of processing. While I had created a Flow[ByteString, Either[Result, ByteString], NotUsed] it turns out what I needed was a Flow[ByteString, ByteString, Future[Boolean]].
So with that, my parser function ends up looking like this:
def parser[A](otherParser: BodyParser[A]): BodyParser[A] = BodyParser { request =>
val flow: Flow[ByteString, ByteString, Future[Boolean]] = Flow.fromGraph(new BodyValidator(request.headers.get("Some-Header")))
val parserSink: Sink[ByteString, Future[Either[Result, A]]] = otherParser.apply(request).toSink
Accumulator(flow.toMat(parserSink) { (statusFuture: Future[Boolean], resultFuture: Future[Either[Result, A]]) =>
statusFuture.flatMap { success =>
if (success) {
resultFuture.map {
case Left(result) => Left(result)
case Right(a) => Right(a)
}
} else {
Future.successful(Left(BadRequest))
}
}
})
}
The key line is this one:
val flow: Flow[ByteString, ByteString, Future[Boolean]] = Flow.fromGraph(new BodyValidator(request.headers.get("Some-Header")))
The rest kind of falls into place once you are able to create this flow. Unfortunately BodyValidator is pretty verbose and feels somewhat boiler-platey. In any case, it's mostly pretty easy to read. GraphStageWithMaterializedValue expects you to implement def shape: S (S is FlowShape[ByteString, ByteString] here) to specify the input type and output type of this graph. It also expects you to imlpement def createLogicAndMaterializedValue(inheritedAttributes: Attributes): (GraphStageLogic, M) (M is a Future[Boolean] here) to define what the graph should actually do. Here's the full code of BodyValidator (I'll explain in more detail below):
class BodyValidator(expected: Option[String]) extends GraphStageWithMaterializedValue[FlowShape[ByteString, ByteString], Future[Boolean]] {
val in = Inlet[ByteString]("BodyValidator.in")
val out = Outlet[ByteString]("BodyValidator.out")
override def shape: FlowShape[ByteString, ByteString] = FlowShape.of(in, out)
override def createLogicAndMaterializedValue(inheritedAttributes: Attributes): (GraphStageLogic, Future[Boolean]) = {
val status = Promise[Boolean]()
val bodyBuffer = new ByteStringBuilder()
val logic = new GraphStageLogic(shape) {
setHandler(out, new OutHandler {
override def onPull(): Unit = pull(in)
})
setHandler(in, new InHandler {
def onPush(): Unit = {
val chunk = grab(in)
bodyBuffer.append(chunk)
push(out, chunk)
}
override def onUpstreamFinish(): Unit = {
val fullBody = bodyBuffer.result()
status.success(expected.map(ByteString(_)).contains(fullBody))
completeStage()
}
override def onUpstreamFailure(e: Throwable): Unit = {
status.failure(e)
failStage(e)
}
})
}
(logic, status.future)
}
}
You first want to create an Inlet and Outlet to set up the inputs and outputs for your graph
val in = Inlet[ByteString]("BodyValidator.in")
val out = Outlet[ByteString]("BodyValidator.out")
Then you use these to define shape.
def shape: FlowShape[ByteString, ByteString] = FlowShape.of(in, out)
Inside createLogicAndMaterializedValue you need to initialize the value you intend to materialze. Here I've used a promise that can be resolved when I have the full data from the stream. I also create a ByteStringBuilder to track the data between iterations.
val status = Promise[Boolean]()
val bodyBuffer = new ByteStringBuilder()
Then I create a GraphStageLogic to actually set up what this graph does at each point of processing. Two handler are being set. One is an InHandler for dealing with data as it comes from the upstream source. The other is an OutHandler for dealing with data to send downstream. There's nothing really interesting in the OutHandler so I'll ignore it here besides to say that it is necessary boiler plate in order to avoid an IllegalStateException. Three methods are overridden in the InHandler: onPush, onUpstreamFinish, and onUpstreamFailure. onPush is called when new data is ready from upstream. In this method I simply grab the next chunk of data, write it to bodyBuffer and push the data downstream.
def onPush(): Unit = {
val chunk = grab(in)
bodyBuffer.append(chunk)
push(out, chunk)
}
onUpstreamFinish is called when the upstream finishes (surprise). This is where the business logic of comparing the body with the header happens.
override def onUpstreamFinish(): Unit = {
val fullBody = bodyBuffer.result()
status.success(expected.map(ByteString(_)).contains(fullBody))
completeStage()
}
onUpstreamFailure is implemented so that when something goes wrong, I can mark the materialized future as failed as well.
override def onUpstreamFailure(e: Throwable): Unit = {
status.failure(e)
failStage(e)
}
Then I just return the GraphStageLogic I've created and status.future as a tuple.

Asynchronous Iterable over remote data

There is some data that I have pulled from a remote API, for which I use a Future-style interface. The data is structured as a linked-list. A relevant example data container is shown below.
case class Data(information: Int) {
def hasNext: Boolean = ??? // Implemented
def next: Future[Data] = ??? // Implemented
}
Now I'm interested in adding some functionality to the data class, such as map, foreach, reduce, etc. To do so I want to implement some form of IterableLike such that it inherets these methods.
Given below is the trait Data may extend, such that it gets this property.
trait AsyncIterable[+T]
extends IterableLike[Future[T], AsyncIterable[T]]
{
def hasNext : Boolean
def next : Future[T]
// How to implement?
override def iterator: Iterator[Future[T]] = ???
override protected[this] def newBuilder: mutable.Builder[Future[T], AsyncIterable[T]] = ???
override def seq: TraversableOnce[Future[T]] = ???
}
It should be a non-blocking implementation, which when acted on, starts requesting the next data from the remote data source.
It is then possible to do cool stuff such as
case class Data(information: Int) extends AsyncIterable[Data]
val data = Data(1) // And more, of course
// Asynchronously print all the information.
data.foreach(data => println(data.information))
It is also acceptable for the interface to be different. But the result should in some way represent asynchronous iteration over the collection. Preferably in a way that is familiar to developers, as it will be part of an (open source) library.
In production I would use one of following:
Akka Streams
Reactive Extensions
For private tests I would implement something similar to following.
(Explanations are below)
I have modified a little bit your Data:
abstract class AsyncIterator[T] extends Iterator[Future[T]] {
def hasNext: Boolean
def next(): Future[T]
}
For it we can implement this Iterable:
class AsyncIterable[T](sourceIterator: AsyncIterator[T])
extends IterableLike[Future[T], AsyncIterable[T]]
{
private def stream(): Stream[Future[T]] =
if(sourceIterator.hasNext) {sourceIterator.next #:: stream()} else {Stream.empty}
val asStream = stream()
override def iterator = asStream.iterator
override def seq = asStream.seq
override protected[this] def newBuilder = throw new UnsupportedOperationException()
}
And if see it in action using following code:
object Example extends App {
val source = "Hello World!";
val iterator1 = new DelayedIterator[Char](100L, source.toCharArray)
new AsyncIterable(iterator1).foreach(_.foreach(print)) //prints 1 char per 100 ms
pause(2000L)
val iterator2 = new DelayedIterator[String](100L, source.toCharArray.map(_.toString))
new AsyncIterable(iterator2).reduceLeft((fl: Future[String], fr) =>
for(l <- fl; r <- fr) yield {println(s"$l+$r"); l + r}) //prints 1 line per 100 ms
pause(2000L)
def pause(duration: Long) = {println("->"); Thread.sleep(duration); println("\n<-")}
}
class DelayedIterator[T](delay: Long, data: Seq[T]) extends AsyncIterator[T] {
private val dataIterator = data.iterator
private var nextTime = System.currentTimeMillis() + delay
override def hasNext = dataIterator.hasNext
override def next = {
val thisTime = math.max(System.currentTimeMillis(), nextTime)
val thisValue = dataIterator.next()
nextTime = thisTime + delay
Future {
val now = System.currentTimeMillis()
if(thisTime > now) Thread.sleep(thisTime - now) //Your implementation will be better
thisValue
}
}
}
Explanation
AsyncIterable uses Stream because it's calculated lazily and it's simple.
Pros:
simplicity
multiple calls to iterator and seq methods return same iterable with all items.
Cons:
could lead to memory overflow because stream keeps all prevously obtained values.
first value is eagerly gotten during creation of AsyncIterable
DelayedIterator is very simplistic implementation of AsyncIterator, don't blame me for quick and dirty code here.
It's still strange for me to see synchronous hasNext and asynchronous next()
Using Twitter Spool I've implemented a working example.
To implement spool I modified the example in the documentation.
import com.twitter.concurrent.Spool
import com.twitter.util.{Await, Return, Promise}
import scala.concurrent.{ExecutionContext, Future}
trait AsyncIterable[+T <: AsyncIterable[T]] { self : T =>
def hasNext : Boolean
def next : Future[T]
def spool(implicit ec: ExecutionContext) : Spool[T] = {
def fill(currentPage: Future[T], rest: Promise[Spool[T]]) {
currentPage foreach { cPage =>
if(hasNext) {
val nextSpool = new Promise[Spool[T]]
rest() = Return(cPage *:: nextSpool)
fill(next, nextSpool)
} else {
val emptySpool = new Promise[Spool[T]]
emptySpool() = Return(Spool.empty[T])
rest() = Return(cPage *:: emptySpool)
}
}
}
val rest = new Promise[Spool[T]]
if(hasNext) {
fill(next, rest)
} else {
rest() = Return(Spool.empty[T])
}
self *:: rest
}
}
Data is the same as before, and now we can use it.
// Cool stuff
implicit val ec = scala.concurrent.ExecutionContext.global
val data = Data(1) // And others
// Print all the information asynchronously
val fut = data.spool.foreach(data => println(data.information))
Await.ready(fut)
It will trow an exception on the second element, because the implementation of next was not provided.

Sequencing Scala Futures with bounded parallelism (without messing around with ExecutorContexts)

Background: I have a function:
def doWork(symbol: String): Future[Unit]
which initiates some side-effects to fetch data and store it, and completes a Future when its done. However, the back-end infrastructure has usage limits, such that no more than 5 of these requests can be made in parallel. I have a list of N symbols that I need to get through:
var symbols = Array("MSFT",...)
but I want to sequence them such that no more than 5 are executing simultaneously. Given:
val allowableParallelism = 5
my current solution is (assuming I'm working with async/await):
val symbolChunks = symbols.toList.grouped(allowableParallelism).toList
def toThunk(x: List[String]) = () => Future.sequence(x.map(doWork))
val symbolThunks = symbolChunks.map(toThunk)
val done = Promise[Unit]()
def procThunks(x: List[() => Future[List[Unit]]]): Unit = x match {
case Nil => done.success()
case x::xs => x().onComplete(_ => procThunks(xs))
}
procThunks(symbolThunks)
await { done.future }
but, for obvious reasons, I'm not terribly happy with it. I feel like this should be possible with folds, but every time I try, I end up eagerly creating the Futures. I also tried out a version with RxScala Observables, using concatMap, but that also seemed like overkill.
Is there a better way to accomplish this?
I have example how to do it with scalaz-stream. It's quite a lot of code because it's required to convert scala Future to scalaz Task (abstraction for deferred computation). However it's required to add it to project once. Another option is to use Task for defining 'doWork'. I personally prefer task for building async programs.
import scala.concurrent.{Future => SFuture}
import scala.util.Random
import scala.concurrent.ExecutionContext.Implicits.global
import scalaz.stream._
import scalaz.concurrent._
val P = scalaz.stream.Process
val rnd = new Random()
def doWork(symbol: String): SFuture[Unit] = SFuture {
Thread.sleep(rnd.nextInt(1000))
println(s"Symbol: $symbol. Thread: ${Thread.currentThread().getName}")
}
val symbols = Seq("AAPL", "MSFT", "GOOGL", "CVX").
flatMap(s => Seq.fill(5)(s).zipWithIndex.map(t => s"${t._1}${t._2}"))
implicit class Transformer[+T](fut: => SFuture[T]) {
def toTask(implicit ec: scala.concurrent.ExecutionContext): Task[T] = {
import scala.util.{Failure, Success}
import scalaz.syntax.either._
Task.async {
register =>
fut.onComplete {
case Success(v) => register(v.right)
case Failure(ex) => register(ex.left)
}
}
}
}
implicit class ConcurrentProcess[O](val process: Process[Task, O]) {
def concurrently[O2](concurrencyLevel: Int)(f: Channel[Task, O, O2]): Process[Task, O2] = {
val actions =
process.
zipWith(f)((data, f) => f(data))
val nestedActions =
actions.map(P.eval)
merge.mergeN(concurrencyLevel)(nestedActions)
}
}
val workChannel = io.channel((s: String) => doWork(s).toTask)
val process = Process.emitAll(symbols).concurrently(5)(workChannel)
process.run.run
When you'll have all this transformation in scope, basically all you need is:
val workChannel = io.channel((s: String) => doWork(s).toTask)
val process = Process.emitAll(symbols).concurrently(5)(workChannel)
Quite short and self-decribing
Although you've already got an excellent answer, I thought I might still offer an opinion or two about these matters.
I remember seeing somewhere (on someone's blog) "use actors for state and use futures for concurrency".
So my first thought would be to utilize actors somehow. To be precise, I would have a master actor with a router launching multiple worker actors, with number of workers restrained according to allowableParallelism. So, assuming I have
def doWorkInternal (symbol: String): Unit
which does the work from yours doWork taken 'outside of future', I would have something along these lines (very rudimentary, not taking many details into consideration, and practically copying code from akka documentation):
import akka.actor._
case class WorkItem (symbol: String)
case class WorkItemCompleted (symbol: String)
case class WorkLoad (symbols: Array[String])
case class WorkLoadCompleted ()
class Worker extends Actor {
def receive = {
case WorkItem (symbol) =>
doWorkInternal (symbol)
sender () ! WorkItemCompleted (symbol)
}
}
class Master extends Actor {
var pending = Set[String] ()
var originator: Option[ActorRef] = None
var router = {
val routees = Vector.fill (allowableParallelism) {
val r = context.actorOf(Props[Worker])
context watch r
ActorRefRoutee(r)
}
Router (RoundRobinRoutingLogic(), routees)
}
def receive = {
case WorkLoad (symbols) =>
originator = Some (sender ())
context become processing
for (symbol <- symbols) {
router.route (WorkItem (symbol), self)
pending += symbol
}
}
def processing: Receive = {
case Terminated (a) =>
router = router.removeRoutee(a)
val r = context.actorOf(Props[Worker])
context watch r
router = router.addRoutee(r)
case WorkItemCompleted (symbol) =>
pending -= symbol
if (pending.size == 0) {
context become receive
originator.get ! WorkLoadCompleted
}
}
}
You could query the master actor with ask and receive a WorkLoadCompleted in a future.
But thinking more about 'state' (of number of simultaneous requests in processing) to be hidden somewhere, together with implementing necessary code for not exceeding it, here's something of the 'future gateway intermediary' sort, if you don't mind imperative style and mutable (used internally only though) structures:
object Guardian
{
private val incoming = new collection.mutable.HashMap[String, Promise[Unit]]()
private val outgoing = new collection.mutable.HashMap[String, Future[Unit]]()
private val pending = new collection.mutable.Queue[String]
def doWorkGuarded (symbol: String): Future[Unit] = {
synchronized {
val p = Promise[Unit] ()
incoming(symbol) = p
if (incoming.size <= allowableParallelism)
launchWork (symbol)
else
pending.enqueue (symbol)
p.future
}
}
private def completionHandler (t: Try[Unit]): Unit = {
synchronized {
for (symbol <- outgoing.keySet) {
val f = outgoing (symbol)
if (f.isCompleted) {
incoming (symbol).completeWith (f)
incoming.remove (symbol)
outgoing.remove (symbol)
}
}
for (i <- outgoing.size to allowableParallelism) {
if (pending.nonEmpty) {
val symbol = pending.dequeue()
launchWork (symbol)
}
}
}
}
private def launchWork (symbol: String): Unit = {
val f = doWork(symbol)
outgoing(symbol) = f
f.onComplete(completionHandler)
}
}
doWork now is exactly like yours, returning Future[Unit], with the idea that instead of using something like
val futures = symbols.map (doWork (_)).toSeq
val future = Future.sequence(futures)
which would launch futures not regarding allowableParallelism at all, I would instead use
val futures = symbols.map (Guardian.doWorkGuarded (_)).toSeq
val future = Future.sequence(futures)
Think about some hypothetical database access driver with non-blocking interface, i.e. returning futures on requests, which is limited in concurrency by being built over some connection pool for example - you wouldn't want it to return futures not taking parallelism level into account, and require you to juggle with them to keep parallelism under control.
This example is more illustrative than practical since I wouldn't normally expect that 'outgoing' interface would be utilizing futures like this (which is quote ok for 'incoming' interface).
First, obviously some purely functional wrapper around Scala's Future is needed, cause it's side-effective and runs as soon as it can. Let's call it Deferred:
import scala.concurrent.Future
import scala.util.control.Exception.nonFatalCatch
class Deferred[+T](f: () => Future[T]) {
def run(): Future[T] = f()
}
object Deferred {
def apply[T](future: => Future[T]): Deferred[T] =
new Deferred(() => nonFatalCatch.either(future).fold(Future.failed, identity))
}
And here is the routine:
import java.util.concurrent.CopyOnWriteArrayList
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.immutable.Seq
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.util.control.Exception.nonFatalCatch
import scala.util.{Failure, Success}
trait ConcurrencyUtils {
def runWithBoundedParallelism[T](parallelism: Int = Runtime.getRuntime.availableProcessors())
(operations: Seq[Deferred[T]])
(implicit ec: ExecutionContext): Deferred[Seq[T]] =
if (parallelism > 0) Deferred {
val indexedOps = operations.toIndexedSeq // index for faster access
val promise = Promise[Seq[T]]()
val acc = new CopyOnWriteArrayList[(Int, T)] // concurrent acc
val nextIndex = new AtomicInteger(parallelism) // keep track of the next index atomically
def run(operation: Deferred[T], index: Int): Unit = {
operation.run().onComplete {
case Success(value) =>
acc.add((index, value)) // accumulate result value
if (acc.size == indexedOps.size) { // we've done
import scala.collection.JavaConversions._
// in concurrent setting next line may be called multiple times, that's why trySuccess instead of success
promise.trySuccess(acc.view.sortBy(_._1).map(_._2).toList)
} else {
val next = nextIndex.getAndIncrement() // get and inc atomically
if (next < indexedOps.size) { // run next operation if exists
run(indexedOps(next), next)
}
}
case Failure(t) =>
promise.tryFailure(t) // same here (may be called multiple times, let's prevent stdout pollution)
}
}
if (operations.nonEmpty) {
indexedOps.view.take(parallelism).zipWithIndex.foreach((run _).tupled) // run as much as allowed
promise.future
} else {
Future.successful(Seq.empty)
}
} else {
throw new IllegalArgumentException("Parallelism must be positive")
}
}
In a nutshell, we run as much operations initially as allowed and then on each operation completion we run next operation available, if any. So the only difficulty here is to maintain next operation index and results accumulator in concurrent setting. I'm not an absolute concurrency expert, so make me know if there are some potential problems in the code above. Notice that returned value is also a deferred computation that should be run.
Usage and test:
import org.scalatest.{Matchers, FlatSpec}
import org.scalatest.concurrent.ScalaFutures
import org.scalatest.time.{Seconds, Span}
import scala.collection.immutable.Seq
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future
import scala.concurrent.duration._
class ConcurrencyUtilsSpec extends FlatSpec with Matchers with ScalaFutures with ConcurrencyUtils {
"runWithBoundedParallelism" should "return results in correct order" in {
val comp1 = mkDeferredComputation(1)
val comp2 = mkDeferredComputation(2)
val comp3 = mkDeferredComputation(3)
val comp4 = mkDeferredComputation(4)
val comp5 = mkDeferredComputation(5)
val compountComp = runWithBoundedParallelism(2)(Seq(comp1, comp2, comp3, comp4, comp5))
whenReady(compountComp.run()) { result =>
result should be (Seq(1, 2, 3, 4, 5))
}
}
// increase default ScalaTest patience
implicit val defaultPatience = PatienceConfig(timeout = Span(10, Seconds))
private def mkDeferredComputation[T](result: T, sleepDuration: FiniteDuration = 100.millis): Deferred[T] =
Deferred {
Future {
Thread.sleep(sleepDuration.toMillis)
result
}
}
}
Use Monix Task. An example from Monix document for parallelism=10
val items = 0 until 1000
// The list of all tasks needed for execution
val tasks = items.map(i => Task(i * 2))
// Building batches of 10 tasks to execute in parallel:
val batches = tasks.sliding(10,10).map(b => Task.gather(b))
// Sequencing batches, then flattening the final result
val aggregate = Task.sequence(batches).map(_.flatten.toList)
// Evaluation:
aggregate.foreach(println)
//=> List(0, 2, 4, 6, 8, 10, 12, 14, 16,...
Akka streams, allow you to do the following:
import akka.NotUsed
import akka.stream.Materializer
import akka.stream.scaladsl.Source
import scala.concurrent.Future
def sequence[A: Manifest, B](items: Seq[A], func: A => Future[B], parallelism: Int)(
implicit mat: Materializer
): Future[Seq[B]] = {
val futures: Source[B, NotUsed] =
Source[A](items.toList).mapAsync(parallelism)(x => func(x))
futures.runFold(Seq.empty[B])(_ :+ _)
}
sequence(symbols, doWork, allowableParallelism)