How to spawn an unknown number of Futures and combine the results even if one or more failed? - scala

I want to turn the following sequential code into concurrent code with Futures and need advice on how to structure it.
sequential:
import java.net.URL
val providers = List(
new URL("http://www.cnn.com"),
new URL("http://www.bbc.co.uk"),
new URL("http://www.othersite.com")
)
def download(urls: URL*) = urls.flatMap(url => io.Source.fromURL(url).getLines).distinct
val res = download(providers:_*)
I want to download all sources that come in via the varargs of the download method and combine the results into one Seq/List/Set. When one Future fails, say because the server is unreachable, it should still take all the others and return their combined result. firstCompletedOf won't work because I need the results of all of them, minus any that failed with an error. I thought about using Future.sequence like below, but I can't get it to work. Here is what I tried:
def download(urls: URL*) = Future.sequence {
urls.map { url =>
Future {
io.Source.fromURL(url).getLines
}
}
}
This produces a Seq[Future[Iterator[String]]], which is not compatible with the M[Future[A]] that Future.sequence expects.
A Future[Iterator[String]] is what I want. (I thought I'd return an Iterator because I need to reuse it later with the reset method on Iterator.)

You can use parallel collections:
import java.net.URL
import scala.util.{Failure, Success, Try}
val providers = List(
new URL("http://www.cnn.com"),
new URL("http://www.bbc.co.uk"),
new URL("http://www.othersite.com")
)
def download(urls: URL*) = urls.par.flatMap(url => {
Try {
io.Source.fromURL(url).getLines
} match {
case Success(e) => e
case Failure(_) => Seq()
}
}).toSeq
val res: Seq[String] = download(providers:_*)
Or if you want the non-blocking version with a Future (note that this also needs scala.concurrent.{Future, blocking} and an implicit ExecutionContext in scope):
def download(urls: URL*) = Future {
blocking {
urls.par.flatMap(url => {
Try {
io.Source.fromURL(url).getLines
} match {
case Success(e) => e
case Failure(_) => Seq()
}
})
}
}
val res: Future[Seq[String]] = download(providers:_*)
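For completeness, a rough, untested sketch of a purely Future-based variant: each URL gets its own Future, individual failures are absorbed with recover (contributing an empty list), and Future.sequence combines everything into a single Future of the merged results (assuming the global ExecutionContext):
import java.net.URL
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

def download(urls: URL*): Future[Seq[String]] =
  Future.sequence(
    urls.toList.map { url =>
      Future(io.Source.fromURL(url).getLines.toList)
        .recover { case _ => Nil } // a failed download contributes an empty list
    }
  ).map(_.flatten.distinct)

val res: Future[Seq[String]] = download(providers: _*)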

Related

Scala Futures for-comprehension with a list of values

I need to execute a Future method on some elements I have in a list simultaneously. My current implementation works sequentially, which is not optimal for saving time. I did this by mapping my list and calling the method on each element and processing the data this way.
My manager shared a link with me showing how to execute Futures simultaneously using a for-comprehension, but I cannot see how I can implement this with my List.
The link he shared with me is https://alvinalexander.com/scala/how-use-multiple-scala-futures-in-for-comprehension-loop/
Here is my current code:
private def method1(id: String): Tuple2[Boolean, List[MyObject]] = {
val workers = List.concat(idleWorkers, activeWorkers.keys.toList)
var ready = true;
val workerStatus = workers.map{ worker =>
val option = Await.result(method2(worker), 1 seconds)
var status = if (option.isDefined) {
if (option.get._2 == id) {
option.get._1.toString
} else {
"INVALID"
}
} else "FAILED"
val status = s"$worker: $status"
if (option.get._1) {
ready = false
}
MyObject(worker.toString, statusMessage)
}.toList.filterNot(s => s.status.contains("INVALID"))
(ready, workerStatus)
}
private def method2(worker: ActorRef): Future[Option[(Boolean, String)]] = Future{
implicit val timeout: Timeout = 1 seconds;
Try(Await.result(worker ? GetStatus, 1 seconds)) match {
case Success(extractedVal) => extractedVal match {
case res: (Boolean, String) => Some(res)
case _ => None
}
case Failure(_) => { None }
case _ => { None }
}
}
If someone could suggest how to implement a for-comprehension in this scenario, I would be grateful. Thanks
For method2 there is no need for the Future/Await mix. Just map the Future:
def method2(worker: ActorRef): Future[Option[(Boolean, String)]] =
(worker ? GetStatus).map{ // the ask (?) still needs an implicit akka.util.Timeout in scope, as in the original
case res: (Boolean, String) => Some(res)
case _ => None
}
For method1 you likewise need to map the result of method2 and do the processing inside the map. This will make workerStatus a List[Future[MyObject]] and means that everything runs in parallel.
Then use Future.sequence(workerStatus) to turn the List[Future[MyObject]] into a Future[List[MyObject]]. You can then use map again to do the filtering/checking on that List[MyObject]. This will happen when all the individual Futures have completed.
Ideally you would then return a Future from method1 to keep everything asynchronous. You could, if absolutely necessary, use Await.result at this point which would wait for all the asynchronous operations to complete (or fail).
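A rough, untested sketch of what that could look like, keeping the names from the question and returning a Future as suggested. It assumes an implicit ExecutionContext is in scope and that MyObject has a status: String field; the readiness logic is only approximated from the original code:
private def method1(id: String): Future[(Boolean, List[MyObject])] = {
  val workers = List.concat(idleWorkers, activeWorkers.keys.toList)

  // One Future per worker; nothing blocks, so the calls run concurrently.
  val workerStatus: List[Future[(Boolean, MyObject)]] = workers.map { worker =>
    method2(worker).map { option =>
      val status = option match {
        case Some((flag, workerId)) if workerId == id => flag.toString
        case Some(_)                                  => "INVALID"
        case None                                     => "FAILED"
      }
      val busy = option.exists(_._1)
      (busy, MyObject(worker.toString, s"$worker: $status"))
    }
  }

  // Future.sequence turns List[Future[...]] into Future[List[...]];
  // it completes once every individual Future has completed.
  Future.sequence(workerStatus).map { results =>
    val ready   = !results.exists(_._1)
    val objects = results.map(_._2).filterNot(_.status.contains("INVALID"))
    (ready, objects)
  }
}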

Recursive HTTP-requests in Scala

I need to do recursive requests and then collect all models into one List, but I don't understand how to do it. Please tell me, am I thinking about this the right way?
package kindSir.main
import dispatch.Defaults._
import dispatch._
import kindSir.models._
import org.json4s._
import org.json4s.jackson.JsonMethods._
object ApplicationMain extends App {
def fetchMergeRequests(startPage: Int = 1): Future[List[MergeRequest]] = {
val requestsUrl = url(s"https://gitlab.com/api/v4/projects/gitlab-org%2Fgitlab-ce/merge_requests?state=opened&per_page=3&page=${startPage}")
Http(requestsUrl).map { res =>
(parse(res.getResponseBody), res.getHeader("X-Next-Page").toInt) match {
case (list @ JArray(_), nextPage: Int) =>
val currentList: List[MergeRequest] = MergeRequest.parseList(list).get
val nextPageListFuture: Future[List[MergeRequest]] = fetchMergeRequests(nextPage)
// And how to merge these two lists?
case (list @ JArray(_), _) => MergeRequest.parseList(list).get
case _ => throw new RuntimeException(s"No merge requests for project found")
}
}
}
}
The main problem you're dealing with here is that you're trying to combine data you already have (List[MergeRequest]) with the data you'll retrieve in future (Future[List[MergeRequest]]). There are a few things you need to do to handle this scenario:
Use flatMap instead of map on result of the HTTP request. This allows you to make further HTTP requests inside the recursion but map them back to a single Future.
Call map on the result of the recursion fetchMergeRequests(nextPage) to combine the data you already have with the future data from the recursion.
Wrap the other list in Future.successful(), because flatMap requires every pattern-match branch to return a Future (except the branch that throws an exception).
I'm not familiar with the libraries you are using so I haven't tested it, but I think your code should work like this:
def fetchMergeRequests(startPage: Int = 1): Future[List[MergeRequest]] = {
val requestsUrl = url(s"https://gitlab.com/api/v4/projects/gitlab-org%2Fgitlab-ce/merge_requests?state=opened&per_page=3&page=${startPage}")
Http(requestsUrl).flatMap { res =>
(parse(res.getResponseBody), res.getHeader("X-Next-Page").toInt) match {
case (list @ JArray(_), nextPage: Int) =>
val currentList: List[MergeRequest] = MergeRequest.parseList(list).get
val nextPageListFuture: Future[List[MergeRequest]] = fetchMergeRequests(nextPage)
nextPageListFuture.map(nextPageList => currentList ++ nextPageList)
case (list @ JArray(_), _) =>
Future.successful(MergeRequest.parseList(list).get)
case _ => throw new RuntimeException(s"No merge requests for project found")
}
}
}

Scala Play: how to wait until future is complete before OK result is returned to frontend

In my Play Framework application I want to wait until my future is completed and then return it to the view.
my code looks like:
def getContentComponentUsageSearch: Action[AnyContent] = Action.async { implicit request =>
println(request.body.asJson)
request.body.asJson.map(_.validate[StepIds] match {
case JsSuccess(stepIds, _) =>
println("VALIDE SUCCESS -------------------------------")
val fList: List[Seq[Future[ProcessTemplatesModel]]] = List() :+ stepIds.s.map(s => {
processTemplateDTO.getProcessStepTemplate(s.processStep_id).flatMap(stepTemplate => {
processTemplateDTO.getProcessTemplate(stepTemplate.get.processTemplate_id.get).map(a => {
a.get
})
})
})
fList.map(u => {
val a: Seq[Future[ProcessTemplatesModel]] = u
Future.sequence(a).map(s => {
println(s)
})
})
Future.successful(Ok(Json.obj("id" -> "")))
case JsError(_) =>
println("NOT VALID -------------------------------")
Future.successful(BadRequest("Process Template not create client"))
case _ => Future.successful(BadRequest("Process Template create client"))
}).getOrElse(Future.successful(BadRequest("Process Template create client")))
}
The println(s) is printing the finished stuff. But how can I wait until it is complete and then return it to the view?
thanks in advance
UPDATE:
also tried this:
val process = for {
fList: List[Seq[Future[ProcessTemplatesModel]]] <- List() :+ stepIds.s.map(s => {
processTemplateDTO.getProcessStepTemplate(s.processStep_id).flatMap(stepTemplate => {
processTemplateDTO.getProcessTemplate(stepTemplate.get.processTemplate_id.get).map(a => {
a.get
})
})
})
} yield (fList)
process.map({ case (fList) =>
Ok(Json.obj(
"processTemplate" -> fList
))
})
but then I got this:
UPDATE:
My problem is that the futures in fList do not complete before an OK result is returned
The code in the question didn't seem compilable, so here is an untested, very rough sketch that hopefully provides enough inspiration for finding the correct solution:
def getContentComponentUsageSearch: Action[AnyContent] = Action.async { implicit req =>
req.body.asJson.map(_.validate[StepIds] match {
case JsSuccess(stepIds, _) => {
// Create list of futures
val listFuts: List[Future[ProcessTemplatesModel]] = (stepIds.s.map(s => {
processTemplateDTO.
getProcessStepTemplate(s.processStep_id).
flatMap{ stepTemplate =>
processTemplateDTO.
getProcessTemplate(stepTemplate.get.processTemplate_id.get).
map(_.get)
}
})).toList
// Sequence all the futures into a single future of list
val futList = Future.sequence(listFuts)
// Flat map this single future to the OK result
for {
listPTMs <- futList
} yield {
// Apparently some debug output?
listPTMs foreach println
Ok(Json.obj("id" -> ""))
}
}
case JsError(_) => {
println("NOT VALID -------------------------------")
Future.successful(BadRequest("Process Template not create client"))
}
case _ => Future.successful(BadRequest("Process Template create client"))
}).getOrElse(Future.successful(BadRequest("Process Template create client")))
}
If I understood your question correctly, what you wanted was to make sure that all futures in the list complete before you return the OK. Therefore I have first created a List[Future[...]]:
val listFuts: List[Future[ProcessTemplatesModel]] = // ...
Then I've combined all the futures into a single future of list, which completes only when every element has completed:
// Sequence all the futures into a single future of list
val futList = Future.sequence(listFuts)
Then I've used a for-comprehension to make sure that the listPTMs finishes computation before the OK is returned:
// Flat map this single future to the OK result
for {
listPTMs <- futList
} yield {
// Apparently some debug output?
listPTMs foreach println
Ok(Json.obj("id" -> ""))
}
The for-yield (equivalent to map here) is what establishes the finish-this-before-doing-that behavior, so that listPTMs is fully evaluated before OK is constructed.
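For reference, the same step written with an explicit map instead of the for-yield (using the futList from above):
futList.map { listPTMs =>
  listPTMs foreach println
  Ok(Json.obj("id" -> ""))
}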
In order to wait until a Future is complete, it is most common to do one of two things:
Use a for-comprehension, which does a bunch of mapping and flatmapping behind the scenes before doing anything in the yield section (see Andrey's comment for a more detailed explanation). A simplified example:
def index: Action[AnyContent] = Action.async {
val future1 = Future(1)
val future2 = Future(2)
for {
f1 <- future1
f2 <- future2
} yield {
println(s"$f1 + $f2 = ${f1 + f2}") // prints 3
Ok(views.html.index("Home"))
}
}
Map inside a Future:
def index: Action[AnyContent] = Action.async {
val future1 = Future(1)
future1.map{
f1 =>
println(s"$f1")
Ok(views.html.index("Home"))
}
}
If there are multiple Futures:
def index: Action[AnyContent] = Action.async {
val future1 = Future(1)
val future2 = Future(2)
future1.flatMap{
f1 =>
future2.map {
f2 =>
println(s"$f1 + $f2 = ${f1 + f2}")
Ok(views.html.index("Home"))
}
}
}
When you have multiple Futures though, the argument for for-yield comprehensions gets much stronger as it gets easier to read. Also, you are probably aware, but if you work with Futures you may need the following imports:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

Iterate data source asynchronously in batch and stop while remote return no data in Scala

Let's say we have a fake data source which returns the data it holds in batches:
import scala.concurrent.Future
import scala.util.{Random, Success}

class DataSource(size: Int) {
private var s = 0
implicit val g = scala.concurrent.ExecutionContext.global
def getData(): Future[List[Int]] = {
s = s + 1
Future {
Thread.sleep(Random.nextInt(s * 100))
if (s <= size) {
List.fill(100)(s)
} else {
List()
}
}
}
}
object Test extends App {
val source = new DataSource(100)
implicit val g = scala.concurrent.ExecutionContext.global
def process(v: List[Int]): Unit = {
println(v)
}
def next(f: (List[Int]) => Unit): Unit = {
val fut = source.getData()
fut.onComplete {
case Success(v) => {
f(v)
v match {
case h :: t => next(f)
}
}
}
}
next(process)
Thread.sleep(1000000000)
}
I have my version above; the problem is that parts of it are not pure. Ideally, I would like to wrap the Future for each batch into one big Future, with the wrapper Future succeeding when the last batch returns a zero-size list. My situation differs a little from this post: the next() there is a synchronous call, while mine is async.
Or is it even possible to do what I want? The next batch should only be fetched when the previous one has resolved, and whether to fetch another batch depends on the size of the list returned.
What's the best way to walk through this type of data source? Are there any existing Scala frameworks that provide the feature I am looking for? Are Play's Iteratee, Enumerator and Enumeratee the right tool? If so, can anyone provide an example of how to use those facilities to implement what I am looking for?
Edit----
With help from chunjef, I tried it out, and it actually worked for me. However, I made a small change based on his answer:
Source.fromIterator(() => Iterator.continually(source.getData())).mapAsync(1)(f => f.filter(_.size > 0))
.via(Flow[List[Int]].takeWhile(_.nonEmpty))
.runForeach(println)
However, can someone give a comparison between Akka Streams and Play's Iteratee? Is it worth also trying out Iteratee?
Code snip 1:
Source.fromIterator(() => Iterator.continually(ds.getData)) // line 1
.mapAsync(1)(identity) // line 2
.takeWhile(_.nonEmpty) // line 3
.runForeach(println) // line 4
Code snip 2: Assume getData depends on the output of another flow, and I would like to concat it with the flow below. However, it yields a "too many open files" error. I'm not sure what causes this error; mapAsync has its parallelism limited to 1, if I understood correctly.
Flow[Int].mapConcat[Future[List[Int]]](c => {
Iterator.continually(ds.getData(c)).to[collection.immutable.Iterable]
}).mapAsync(1)(identity).takeWhile(_.nonEmpty).runForeach(println)
The following is one way to achieve the same behavior with Akka Streams, using your DataSource class:
import scala.concurrent.Future
import scala.util.Random
import akka.actor.ActorSystem
import akka.stream._
import akka.stream.scaladsl._
object StreamsExample extends App {
implicit val system = ActorSystem("Sandbox")
implicit val materializer = ActorMaterializer()
val ds = new DataSource(100)
Source.fromIterator(() => Iterator.continually(ds.getData)) // line 1
.mapAsync(1)(identity) // line 2
.takeWhile(_.nonEmpty) // line 3
.runForeach(println) // line 4
}
class DataSource(size: Int) {
...
}
A simplified line-by-line overview:
line 1: Creates a stream source that continually calls ds.getData if there is downstream demand.
line 2: mapAsync is a way to deal with stream elements that are Futures. In this case, the stream elements are of type Future[List[Int]]. The argument 1 is the level of parallelism: we specify 1 here because DataSource internally uses a mutable variable, and a parallelism level greater than one could produce unexpected results. identity is shorthand for x => x, which basically means that for each Future, we pass its result downstream without transforming it.
line 3: Essentially, ds.getData is called as long as the result of the Future is a non-empty List[Int]. If an empty List is encountered, processing is terminated.
line 4: runForeach here takes a function List[Int] => Unit and invokes that function for each stream element.
Ideally, I would like to wrap the Future for each batch into one big Future, with the wrapper Future succeeding when the last batch returns a zero-size list.
I think you are looking for a Promise.
You would set up a Promise before you start the first iteration.
This gives you promise.future, a Future that you can then use to follow the completion of everything.
In your onComplete, you add a case _ => promise.success(()).
Something like
def loopUntilDone(f: (List[Int]) => Unit): Future[Unit] = {
val promise = Promise[Unit]
def next(): Unit = source.getData().onComplete {
case Success(v) =>
f(v)
v match {
case h :: t => next()
case _ => promise.success(())
}
case Failure(e) => promise.failure(e)
}
// get going
next()
// return the Future for everything
promise.future
}
// future for everything, this is a `Future[Unit]`
// its `onComplete` will be triggered when there is no more data
val everything = loopUntilDone(process)
You are probably looking for a reactive streams library. My personal favorite (and the one I'm most familiar with) is Monix. This is how it would work with DataSource unchanged:
import scala.concurrent.duration.Duration
import scala.concurrent.Await
import monix.reactive.Observable
import monix.execution.Scheduler.Implicits.global
object Test extends App {
val source = new DataSource(100)
val completed = // <- this is Future[Unit], completes when foreach is done
Observable.repeat(Observable.fromFuture(source.getData()))
.flatten // <- Here it's Observable[List[Int]], it has collection-like methods
.takeWhile(_.nonEmpty)
.foreach(println)
Await.result(completed, Duration.Inf)
}
I just figured out that using flatMapConcat can achieve what I wanted. There is no point in starting another question, as I already have the answer. I'm putting my sample code here in case someone is looking for a similar answer.
This type of API is very common for integration between traditional enterprise applications. The DataSource mocks the API, while the object extending App demonstrates how client code can use Akka Streams to consume it.
In my small project the API was provided over SOAP, and I used scalaxb to transform the SOAP interface into Scala async style. With the client calls demonstrated in the object extending App, we can consume the API with Akka Streams. Thanks to all for the help.
import scala.collection.mutable
import scala.concurrent.Future
import scala.util.Random

class DataSource(size: Int) {
private var transactionId: Long = 0
private val transactionCursorMap: mutable.HashMap[TransactionId, Set[ReadCursorId]] = mutable.HashMap.empty
private val cursorIteratorMap: mutable.HashMap[ReadCursorId, Iterator[List[Int]]] = mutable.HashMap.empty
implicit val g = scala.concurrent.ExecutionContext.global
case class TransactionId(id: Long)
case class ReadCursorId(id: Long)
def startTransaction(): Future[TransactionId] = {
Future {
synchronized {
transactionId += 1
}
val t = TransactionId(transactionId)
transactionCursorMap.update(t, Set(ReadCursorId(0)))
t
}
}
def createCursorId(t: TransactionId): ReadCursorId = {
synchronized {
val c = transactionCursorMap.getOrElseUpdate(t, Set(ReadCursorId(0)))
val currentId = c.foldLeft(0l) { (acc, a) => acc.max(a.id) }
val cId = ReadCursorId(currentId + 1)
transactionCursorMap.update(t, c + cId)
cursorIteratorMap.put(cId, createIterator)
cId
}
}
def createIterator(): Iterator[List[Int]] = {
(for {i <- 1 to 100} yield List.fill(100)(i)).toIterator
}
def startRead(t: TransactionId): Future[ReadCursorId] = {
Future {
createCursorId(t)
}
}
def getData(cursorId: ReadCursorId): Future[List[Int]] = {
synchronized {
Future {
Thread.sleep(Random.nextInt(100))
cursorIteratorMap.get(cursorId) match {
case Some(i) => i.next()
case _ => List()
}
}
}
}
}
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Source}

object Test extends App {
val source = new DataSource(10)
implicit val system = ActorSystem("Sandbox")
implicit val materializer = ActorMaterializer()
implicit val g = scala.concurrent.ExecutionContext.global
//
// def process(v: List[Int]): Unit = {
// println(v)
// }
//
// def next(f: (List[Int]) => Unit): Unit = {
// val fut = source.getData()
// fut.onComplete {
// case Success(v) => {
// f(v)
// v match {
//
// case h :: t => next(f)
//
// }
// }
//
// }
//
// }
//
// next(process)
//
// Thread.sleep(1000000000)
val s = Source.fromFuture(source.startTransaction())
.map { e =>
source.startRead(e)
}
.mapAsync(1)(identity)
.flatMapConcat(
e => {
Source.fromIterator(() => Iterator.continually(source.getData(e)))
})
.mapAsync(5)(identity)
.via(Flow[List[Int]].takeWhile(_.nonEmpty))
.runForeach(println)
/*
val done = Source.fromIterator(() => Iterator.continually(source.getData())).mapAsync(1)(identity)
.via(Flow[List[Int]].takeWhile(_.nonEmpty))
.runFold(List[List[Int]]()) { (acc, r) =>
// println("=======" + acc + r)
r :: acc
}
done.onSuccess {
case e => {
e.foreach(println)
}
}
done.onComplete(_ => system.terminate())
*/
}

Handling Option Inside For Comprehension of Futures

Consider the following code inside a Play Framework controller:
val firstFuture = function1(id)
val secondFuture = function2(id)
val resultFuture = for {
first <- firstFuture
second <- secondFuture.map(_.get)
result <- function3(first, second)
} yield Ok(s"Processed $id")
resultFuture.map(result => result).recover { case t => InternalServerError(s"Error organizing files: ${t.getMessage}")}
Here are some details about the functions:
function1 returns Future[List]
function2 returns Future[Option[Person]]
function1 and function2 can run in parallel, but function3 needs the results for both.
Given this information, I have some questions:
Although the application is such that this code is very unlikely to be called with an improper id, I would like to handle this possibility. Basically, I would like to return NotFound if function2 returns None, but I can't figure out how to do that.
Will the recover call handle an Exception thrown any step of the way?
Is there a more elegant or idiomatic way to write this code?
Perhaps use collect, and then you can recover the NoSuchElementException (which, yes, will recover a failure from any step of the way). resultFuture will either be successful with the mapped Result, or failed with the first exception that was thrown.
val firstFuture = function1(id)
val secondFuture = function2(id)
val resultFuture = for {
first <- firstFuture
second <- secondFuture.collect { case Some(x) => x }
result <- function3(first, second)
} yield Ok(s"Processed $id")
resultFuture.map(result => result)
.recover { case _: java.util.NoSuchElementException => NotFound }
.recover { case t => InternalServerError(s"Error organizing files: ${t.getMessage}")}
I would go with Scalaz OptionT. Maybe it's overkill when you have only one function returning Future[Option[T]], but once you start adding more such functions it becomes super useful.
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.control.NonFatal
import scalaz.OptionT
import scalaz.OptionT._
import scalaz.std.scalaFuture._
// Wrap 'some' result into OptionT
private def someOptionT[T](t: Future[T]): OptionT[Future, T] =
optionT[Future](t.map(Some.apply))
val firstFuture = function1(id)
val secondFuture = function2(id)
val action = for {
list <- someOptionT(firstFuture)
person <- optionT(secondFuture)
result = function3(list, person)
} yield result
action.run.map {
case None => NotFound
case Some(result) => Ok(s"Processed $id")
} recover {
case NonFatal(err) => InternalServerError(s"Error organizing files: ${err.getMessage}")
}