Error handling in list of scala futures - Apache Spark - scala

I am having trouble handling exceptions in a List of Scala futures. I call the getQC_report(qcArgsThread,spark) method from within the runner method; it processes an input file and saves the result in a Hive table. Code below:
import scala.util.{Failure, Success}
import scala.concurrent._
import scala.concurrent.duration._
val spark = SparkSession.builder.master("yarn").enableHiveSupport().getOrCreate()
var argsList: List[Array[String]] = List[Array[String]]()
for(ip_file <- INPUT_FILE.asScala.toList) {
var qcArgs:Array[String] = null
qcArgs = Array("input_file", ip_file,
"hiveDB",hiveDB,
"Outputhive_table",Outputhive_table)
argsList = qcArgs :: argsList
}
var pool = 0
def poolId = {
pool = pool + 1
pool
}
def runner(qcArgsThread: Array[String]) = Future {
sc.setLocalProperty("spark.scheduler.pool", poolId.toString)
getQC_report(qcArgsThread,spark)
}
val futures = argsList map(i => runner(i))
futures foreach(f => Await.ready(f, Duration.Inf))
futures.onComplete {
case Success(x) => {
println(s"\nresult = $x")
}
case Failure(e) => {
System.err.println("Failure happened!")
System.err.println(e.getMessage)
}
}
I am getting an error on the futures.onComplete line.
Error - Cannot resolve symbol onComplete.
Please help me in improving the code as I am new to using Scala Futures. Thanks!

The short answer is that because argsList is a List[Array[String]]
val futures = argsList map(i => runner(i))
will have the type List[Future[WhateverGetQC_ReportReturns]]. It specifically is not a Future, so has no onComplete method.
If you want to have a Future which completes when all the futures are completed, Future.sequence will convert a List[Future[T]] into a Future[List[T]]:
import scala.util.control.NonFatal
// replaces all code after val futures = argsList map ...
val allFutures = Future.sequence(futures)
val result: List[WhateverGetQC_ReportReturns] =
  try {
    Await.result(allFutures, Duration.Inf)
  } catch {
    case NonFatal(e) =>
      System.err.println("Failure happened!")
      System.err.println(e.getMessage)
      Nil // fall back to an empty list so the expression still has type List[...]
  }
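For anyone who wants to try the pattern without a Spark cluster, here is a minimal sketch. `runJob` is a hypothetical stand-in for getQC_report, and the object and file names are assumptions made for illustration. It shows Future.sequence turning a List[Future[T]] into a single Future[List[T]], which fails as a whole if any of the inputs fails.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Try

object SequenceSketch {
  // Hypothetical stand-in for getQC_report: fails for one particular input.
  def runJob(file: String): Future[Int] = Future {
    if (file == "bad.csv") throw new IllegalStateException(s"cannot process $file")
    file.length
  }

  // One Future for the whole batch; Try holds either all results or the failure.
  def runAll(files: List[String]): Try[List[Int]] = {
    val futures: List[Future[Int]] = files.map(runJob)
    val all: Future[List[Int]] = Future.sequence(futures)
    Try(Await.result(all, 10.seconds))
  }
}
```

Calling `SequenceSketch.runAll(List("a.csv", "bad.csv"))` gives a Failure for the whole batch, so if you need per-file outcomes you would have to recover each future individually before sequencing.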

Related

Scala program using futures is not terminating

I am trying to learn concurrency in Scala and am using Scala futures to generate a dataset of random strings. I want to create an application that generates a file with any number of records, and it should be scalable.
Code:
import java.io.FileWriter
import java.util.concurrent.{ExecutorService, Executors}
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Random, Success}
import scala.concurrent.duration._
object datacreator {
implicit val ec: ExecutionContext = new ExecutionContext {
val threadPool: ExecutorService = Executors.newFixedThreadPool(4)
def execute(runnable: Runnable) {
threadPool.submit(runnable)
}
def reportFailure(t: Throwable) {}
}
def getRecord : String = {
"Random string"
}
def main(args: Array[String]): Unit = {
val filename = args(0)
val number_of_records = args(1)
val file_Object = new FileWriter(filename, true)
val data: Future[Iterable[String]] = Future {
for (i <- 1 to number_of_records.toInt)
yield getRecord
}
val result = data.map{
result => result.foreach(record => file_Object.write(record))
}
result.onComplete{
case Success(value) => {
println("Success")
file_Object.close()
}
case Failure(e) => e.printStackTrace()
}
}
}
I am facing the following issues:
When I run the program using SBT, it writes the results to the file but never terminates:
[info] Loading project definition from /Users/cw0155/PersonalProjects/datagen/project
[info] Loading settings for project datagen from build.sbt ...
[info] Set current project to datagenerator (in build file:/Users/cw0155/PersonalProjects/datagen/)
[info] running com.generator.DataGenerator xyz.csv 100
Success
| => datagen / Compile / runMain 255s
When I run the program from the jar as:
scala -cp target/scala-2.13/datagenerator_2.13-0.1.jar com.generator.DataGenerator "pqr.csv" "1000"
it waits indefinitely and does not write to the file.
Any help is much appreciated :)
Try this version
bar.scala
import scala.concurrent.{Await, Future, ExecutionContext}
import scala.concurrent.duration._
import scala.util.{Success, Failure}
import ExecutionContext.Implicits.global
import java.io.FileWriter
object bar {
def getRecord: String = "Random string\n"
def main(args: Array[String]): Unit = {
val filename = args(0)
val number_of_records = args(1)
val data: Future[Iterable[String]] = Future {
for (i <- 1 to number_of_records.toInt)
yield getRecord
}
val file_Object = new FileWriter(filename, true)
val result = data.map( r => r.foreach(record => file_Object.write(record)) )
result.onComplete {
case Success(value) =>
println("Success")
file_Object.close()
case Failure(e) =>
e.printStackTrace()
}
Await.result( result, 10.second )
}
}
Your original version gave me the expected output when I ran it like so
bash-3.2$ scala bar.scala /dev/fd/1 10
Success
Random string
Random string
Random string
Random string
Random string
Random string
Random string
Random string
Random string
Random string
However, without the Await.result your program can exit before the future finishes.
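One likely reason the question's version never exits (my inference, not something the answer above states): Executors.newFixedThreadPool creates non-daemon threads, so the JVM stays alive until the pool is shut down, whereas the global execution context runs on daemon threads. Below is a minimal sketch, assuming you want to keep a custom pool. `writeRecords` and the record text are hypothetical.

```scala
import java.io.FileWriter
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, ExecutionContextExecutorService, Future}
import scala.concurrent.duration._

object PoolShutdownSketch {
  // Writes `n` records on a custom pool, waits for completion, then releases the pool.
  def writeRecords(filename: String, n: Int): Int = {
    val pool: ExecutionContextExecutorService =
      ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(4))
    implicit val ec: ExecutionContext = pool
    val writer = new FileWriter(filename, true)
    try {
      val written: Future[Int] = Future {
        (1 to n).foreach(_ => writer.write("Random string\n"))
        n
      }
      Await.result(written, 10.seconds)
    } finally {
      writer.close()  // flush and close even if the future failed
      pool.shutdown() // without this, the non-daemon worker threads keep the JVM alive
    }
  }
}
```

Closing the writer after Await.result, rather than inside onComplete, also avoids relying on the callback having run before main returns.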

Scala resolve multiple Futures and get a Map(String, AnyRef)

I am currently trying to resolve multiple futures at once, but as some of them may fail, I don't want the whole result to fail if one of them fails. Instead, I want to end up with a Map(String, AnyRef) (meaning a Map with the future name and the response converted to what I need).
Currently I have the following:
val fsResp = channelList.map {
channelRef => channelRef.ask(ReportStatus).mapTo[EventMessage]
}
Future.sequence(fsResp).onComplete{
case Success(resp: Seq[EventMessage]) =>
resp.foreach { event => Supervisor.foreach(_ ! event) }
val channels = loadConfiguredComponents()
.collect {
case ("processor" | "external", components) => components.map {
case (name, config: Channel) =>
(name, serializeDetails(config, resp.find(_.channel == ChannelName(name))))
}
}.flatten.toMap
val event = EventMessage(...)
Supervisor.foreach(_ ! event)
case Failure(exception) => originalSender ! replayError(exception.getMessage)
}
But this fails if any of those fails. So how can I end up with a Map(channelRef.path.name, event() | exception)?
Thanks!
You can use fallbackTo in order to avoid a Failure. In this example I change Future[T] to Future[Option[T]] in order to fallback to None, and then remove None elements.
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}
def method(value:Int) = { Thread.sleep(2000); println(value); value }
println("start")
val eventualNone = Future.successful(None)
val futures = List(Future(method(1)), Future(method(2)), Future(method(3)), Future(throw new RuntimeException))
val withoutFailures = futures.map(_.map(Option.apply).fallbackTo(eventualNone))
Future.sequence(withoutFailures).map(_.flatten).onComplete {
case Success(values) => println(values)
case Failure(ex:Throwable) => println("FAIL")
}
Thread.sleep(5000)
output
start
1
3
2
List(1, 2, 3)
This can be changed to Either[Throwable, T] instead of Option[T] if you want to know what failed.
The resulting Future will always be a Success, so you need to inspect the values to find out whether all the futures failed.
To capture successful/failed values from the list of Futures, you can first apply map/recover to each of them, then use Future.sequence to transform the result list into a Future of List[Either[Throwable,EventMessage]], as shown in the following trivialized example:
import scala.concurrent.{Future, Await}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
case class EventMessage(id: Int, msg: String)
val fsResp = List(
Future{EventMessage(1, "M1")}, Future{throw new Throwable()}, Future{EventMessage(3, "M3")}
)
val f = Future.sequence(
fsResp.map( _.map{ resp =>
// Do stuff with `resp`, add item to `Map()`, etc ...
Right(resp)
}.
recover{ case e: Throwable =>
// Log `exception` info, etc ...
Left(e)
} )
)
Await.result(f, Duration.Inf)
// f: scala.concurrent.Future[List[Product with Serializable with scala.util.
// Either[Throwable,EventMessage]]] = Future(Success(List(
// Right(EventMessage(1,M1)), Left(java.lang.Throwable), Right(EventMessage(3,M3))
// )))
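To get closer to the Map the question asks for, the same map/recover idea can carry a name along with each result. This is only a sketch under assumptions: `askChannel` is a hypothetical stand-in for channelRef.ask(ReportStatus), and plain Strings replace EventMessage.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object NamedResultsSketch {
  type Outcome = Either[Throwable, String]

  // Hypothetical stand-in for channelRef.ask(ReportStatus): one channel fails.
  def askChannel(name: String): Future[String] = Future {
    if (name == "broken") throw new RuntimeException(s"$name did not answer")
    s"status of $name"
  }

  // Pair each name with its outcome: Left holds the exception, Right the reply.
  def statusMap(names: List[String]): Future[Map[String, Outcome]] = {
    val perChannel: List[Future[(String, Outcome)]] = names.map { name =>
      askChannel(name)
        .map(reply => name -> (Right(reply): Outcome))
        .recover { case e => name -> (Left(e): Outcome) }
    }
    Future.sequence(perChannel).map(_.toMap)
  }

  // Blocking helper, for demonstration only.
  def demo(): Map[String, Outcome] =
    Await.result(statusMap(List("a", "broken")), 5.seconds)
}
```

Because every element future is recovered before Future.sequence sees it, the combined future cannot fail because of a single channel.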

Scala - Futures not starting

I want to start two workers on a Future method called extraire_phrases . I call them in my main, but it seems that the Promise is never fulfilled and I don't get anything at the end of my main, as if the workers don't start. Any ideas? Thanks a lot.
object Main {
val chemin_corpus:String = "src/corpus.txt"
val chemin_corpus_backup:String = "src/tartarinalpes.txt"
val chemin_dictionnaire:String = "src/dicorimes.dmp"
val chemin_dictionnaire_backup:String = "src/dicorimes2.dmp"
def main(args:Array[String]){
val quatrain = Promise[List[Phrase]]()
var grosPoeme = List[Phrase]()
Future {
val texte_1 = Phrases.extraire_phrases(chemin_corpus, chemin_dictionnaire)
val texte_2 = Phrases.extraire_phrases(chemin_corpus_backup, chemin_dictionnaire_backup)
texte_1.onComplete {
case Success(list) => {
val poeme = new DeuxVers(list)
poeme.ecrire :: grosPoeme
}
case Failure(ex) => {
quatrain.failure(LameExcuse("Error: " + ex.getMessage))
}
}
texte_2.onComplete {
case Success(lst) => {
val poeme2 = new DeuxVers(lst)
poeme2.ecrire :: grosPoeme
}
case Failure(ex) => {
quatrain.failure(LameExcuse("Error: " + ex.getMessage))
}
}
quatrain.success(grosPoeme)
}
println(quatrain.future)
println(grosPoeme)
}
}
Here is what I have in my console after execution:
Future(<not completed>)
List()
Even if I remove the Future { before val texte_1, it seems that neither of them fires properly: texte_1 sometimes starts and sometimes does not, and texte_2 never starts (it never runs to completion). There is no failure either.
// Edit: Alvaro Carrasco's answer is the correct one. Thank both of you however for the help
Futures are executed asynchronously and your code won't "wait" for them to finish. onComplete will schedule some code to run when the future completes, but it won't force your program to wait for the result.
You need to thread the inner futures using map/flatMap/sequence so you end up with a single future at the end and then wait for it using Await.result(...).
You don't really need a Promise here, as exceptions will be caught by the future.
Something like this:
object Main {
val chemin_corpus:String = "src/corpus.txt"
...
def main(args:Array[String]){
...
val f1 = texte_1
.map {list =>
val poeme = new DeuxVers(list)
poeme.ecrire :: grosPoeme
}
val f2 = texte_2
.map {lst =>
val poeme2 = new DeuxVers(lst)
poeme2.ecrire :: grosPoeme
}
// combine both futures
val all = for {
res1 <- f1
res2 <- f2
} yield {
println(...)
}
// wait for the combined future
Await.result(all, 1.hour)
}
}
Here is a solution using a for-comprehension on Future. You need to change f1 and f2 to do what you need. f1 and f2 will be executed in parallel. A for-comprehension gives an elegant way to get the result of a future (it is just syntactic sugar for compositions of operations such as flatMap, filter, etc.):
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
val f1: Future[Seq[Int]] = Future {
// Do something here
Seq(1, 2, 3)
}.recover { case ex =>
// If Future fails, let's log an exception and return default value
println(s"Unable to complete f1: $ex")
Seq.empty[Int]
}
val f2: Future[Seq[Int]] = Future {
// Do something here
Seq(4, 5, 6)
}.recover { case ex =>
// If Future fails, let's log an exception and return default value
println(s"Unable to complete f2: $ex")
Seq.empty[Int]
}
// f1 and f2 have started
// we use for-comprehension on Future to get the result
val f = for {
seq1 <- f1
seq2 <- f2
} yield seq1 ++ seq2
// Block the current thread and wait up to 1 second for the result
val r = Await.result(f, 1.seconds)
println(s"Result: $r")
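One detail worth spelling out: f1 and f2 run in parallel only because they are created before the for-comprehension. If the futures are created inside it, the second one is not even started until the first has finished. A small sketch of the difference follows; `task` and `EventLog` are hypothetical helpers made up for this illustration.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ForComprehensionSketch {
  // Tiny thread-safe event recorder, used only for this illustration.
  final class EventLog {
    private var events = Vector.empty[String]
    def add(e: String): Unit = synchronized { events :+= e }
    def snapshot: List[String] = synchronized { events.toList }
  }

  // Hypothetical task that records when it starts and when it ends.
  private def task(name: String, log: EventLog): Future[String] = Future {
    log.add(s"$name start")
    Thread.sleep(50)
    log.add(s"$name end")
    name
  }

  // Futures created inside the for: B is only created after A has completed.
  def sequential(): List[String] = {
    val log = new EventLog
    val both = for {
      a <- task("A", log)
      b <- task("B", log)
    } yield a + b
    Await.result(both, 5.seconds)
    log.snapshot
  }

  // Futures created first: both are already scheduled when the for combines them.
  def eager(): String = {
    val log = new EventLog
    val fa = task("A", log)
    val fb = task("B", log)
    val both = for { a <- fa; b <- fb } yield a + b
    Await.result(both, 5.seconds)
  }
}
```

In the sequential variant the log always reads A start, A end, B start, B end. In the eager variant the interleaving depends on the thread pool, so the sketch only returns the combined value.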

asynchronous processing using list of Scala futures with onComplete for exception handling

I'm trying to make a large number of external service calls, each followed up with exception handling and conditional further processing. I thought it would be easy to extend this nice (Asynchronous IO in Scala with futures) example using an .onComplete inside, but it appears that I don't understand something about scoping and/or Futures. Can anyone point me in the right direction please?
#!/bin/bash
scala -feature $0 $@
exit
!#
import scala.concurrent.{future, blocking, Future, Await}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Success, Failure}
import scala.language.postfixOps
val keylist = List("key1", "key2")
val myFuts: List[Future[String]] = keylist.map {
myid => future {
// this line simulates an external call which returns a future (retrieval from S3)
val myfut = future { Thread.sleep(1); "START " + myid}
var mystr = "This should have been overwritten"
myfut.onComplete {
case Failure(ex) => {
println (s"failed with error: $ex")
mystr = "FAILED"
}
case Success(myval) => {
mystr = s"SUCCESS $myid: $myval"
println (mystr)
}
}
mystr
}
}
val futset: Future[List[String]] = Future.sequence(myFuts)
println (Await.result(futset, 10 seconds))
on my computer (Scala 2.10.4), this prints:
SUCCESS key2: START key2
SUCCESS key1: START key1
List(This should have been overwritten, This should have been overwritten)
I want (order unimportant):
SUCCESS key2: START key2
SUCCESS key1: START key1
List(SUCCESS key2: START key2, SUCCESS key1: START key1)
I would avoid using onComplete and trying to use it to do side-effecting logic on a mutable variable. I would instead map the future and handle the fail case as returning a different value. Here's a slightly modified version of your code, using map on the Future (via a for comprehension) and then using recover to handle the failure case. Hopefully this is what you were looking for:
val keylist = List("key1", "key2")
val myFuts: List[Future[String]] = keylist.map {myid =>
// this line simulates an external call which returns a future (retrieval from S3)
val myfut = Future { Thread.sleep(1); "START " + myid}
val result = for (myval <- myfut) yield {
val res = s"SUCCESS $myid: $myval"
println(res)
res
}
result.recover{
case ex =>
println (s"failed with error: $ex")
"FAILED"
}
}
val futset: Future[List[String]] = Future.sequence(myFuts)
println (Await.result(futset, 10 seconds))
onComplete does not return a new future; it just lets you do something when that future is completed. So your outer Future block returns before the onComplete callback is executed, and thus you get back the original value of the string.
What we can do is use a promise, to return another future, and that future is completed by the result of the first future.
import scala.concurrent.Promise
val keylist = List("key1", "key2")
val myFuts: List[Future[String]] = keylist.map {
myid => {
// this line simulates an external call which returns a future (retrieval from S3)
val myfut = Future {
Thread.sleep(1); "START " + myid
}
var mystr = "This should have been overwritten"
val p = Promise[String]()
myfut.onComplete {
case Failure(ex) =>
println(s"failed with error: $ex")
mystr = "FAILED"
p failure ex
case Success(myval) =>
mystr = s"SUCCESS $myid: $myval"
println(mystr)
p success myval
}
p.future
}
}
val futset: Future[List[String]] = Future.sequence(myFuts)
println(Await.result(futset, 10 seconds))
What would be super handy would be a mapAll method as I asked about here:
Map a Future for both Success and Failure
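For readers on Scala 2.12 or later, Future.transform with a Try => Try function covers much of what such a mapAll would do. The sketch below is an assumption-laden illustration, not the code from either answer: `fetch` is a hypothetical external call, and "key2" is made to fail so that both branches are exercised.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Failure, Success}

object TransformSketch {
  // Hypothetical external call; "key2" fails to exercise the failure branch.
  def fetch(id: String): Future[String] = Future {
    if (id == "key2") throw new RuntimeException("not found")
    "START " + id
  }

  // transform receives the Try of the original future and returns the Try of the new one.
  def describe(id: String): Future[String] =
    fetch(id).transform {
      case Success(value) => Success(s"SUCCESS $id: $value")
      case Failure(_)     => Success("FAILED")
    }

  // Blocking helper, for demonstration only.
  def demo(): List[String] =
    Await.result(Future.sequence(List("key1", "key2").map(describe)), 10.seconds)
}
```

No mutable variable or Promise is needed, because the returned future itself carries the mapped value.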

Scala: List[Future] to Future[List] disregarding failed futures

I'm looking for a way to convert an arbitrary length list of Futures to a Future of List. I'm using Playframework, so ultimately, what I really want is a Future[Result], but to make things simpler, let's just say Future[List[Int]] The normal way to do this would be to use Future.sequence(...) but there's a twist... The list I'm given usually has around 10-20 futures in it, and it's not uncommon for one of those futures to fail (they are making external web service requests).
Instead of having to retry all of them in the event that one of them fails, I'd like to be able to get at the ones that succeeded and return those.
For example, doing the following doesn't work:
import scala.concurrent._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.Success
import scala.util.Failure
val listOfFutures = Future.successful(1) :: Future.failed(new Exception("Failure")) ::
Future.successful(3) :: Nil
val futureOfList = Future.sequence(listOfFutures)
futureOfList onComplete {
case Success(x) => println("Success!!! " + x)
case Failure(ex) => println("Failed !!! " + ex)
}
scala> Failed !!! java.lang.Exception: Failure
Instead of getting only the exception, I'd like to be able to pull the 1 and 3 out of there. I tried using Future.fold, but that apparently just calls Future.sequence behind the scenes.
The trick is to first make sure that none of the futures has failed. .recover is your friend here; you can combine it with map to convert all the Future[T] results to Future[Try[T]] instances, all of which are certain to be successful futures.
Note: You can use Option or Either as well here, but Try is the cleanest way if you specifically want to trap exceptions.
def futureToFutureTry[T](f: Future[T]): Future[Try[T]] =
f.map(Success(_)).recover { case x => Failure(x)}
val listOfFutures = ...
val listOfFutureTrys = listOfFutures.map(futureToFutureTry(_))
Then use Future.sequence as before, to give you a Future[List[Try[T]]]
val futureListOfTrys = Future.sequence(listOfFutureTrys)
Then filter:
val futureListOfSuccesses = futureListOfTrys.map(_.filter(_.isSuccess))
You can even pull out the specific failures, if you need them:
val futureListOfFailures = futureListOfTrys.map(_.filter(_.isFailure))
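Note that filter keeps the Try wrappers. If what you want is the plain Future[List[T]] from the question, collect can unwrap the successes in one step. Here is a self-contained sketch of the whole pipeline. The object name and the `successes` helper are made up for illustration, and the sample list mirrors the one in the question.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

object SuccessesOnlySketch {
  def futureToFutureTry[T](f: Future[T]): Future[Try[T]] =
    f.map(v => Success(v): Try[T]).recover { case e => Failure(e) }

  // Keep only the values of the futures that succeeded, dropping the Try wrapper.
  def successes[T](fs: List[Future[T]]): Future[List[T]] =
    Future.sequence(fs.map(futureToFutureTry(_))).map(_.collect { case Success(v) => v })

  // Blocking helper, for demonstration only.
  def demo(): List[Int] = {
    val listOfFutures = List(
      Future.successful(1),
      Future.failed[Int](new Exception("Failure")),
      Future.successful(3)
    )
    Await.result(successes(listOfFutures), 5.seconds)
  }
}
```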
Scala 2.12 added an improved Future.transform that lends itself to an answer with less code.
val futures = Seq(Future{1},Future{throw new Exception})
// instead of `map` and `recover`, use `transform`
val seq = Future.sequence(futures.map(_.transform(Success(_))))
val successes = seq.map(_.collect{case Success(x)=>x})
successes
//res1: Future[Seq[Int]] = Future(Success(List(1)))
val failures = seq.map(_.collect{case Failure(x)=>x})
failures
//res2: Future[Seq[Throwable]] = Future(Success(List(java.lang.Exception)))
I tried Kevin's answer and ran into a glitch on my version of Scala (2.11.5). I corrected that and wrote a few additional tests, in case anyone is interested. Here is my version:
implicit class FutureCompanionOps(val f: Future.type) extends AnyVal {
/** Given a list of futures `fs`, returns the future holding the list of Try's of the futures from `fs`.
* The returned future is completed only once all of the futures in `fs` have been completed.
*/
def allAsTrys[T](fItems: /* future items */ List[Future[T]]): Future[List[Try[T]]] = {
val listOfFutureTrys: List[Future[Try[T]]] = fItems.map(futureToFutureTry)
Future.sequence(listOfFutureTrys)
}
def futureToFutureTry[T](f: Future[T]): Future[Try[T]] = {
f.map(Success(_)) .recover({case x => Failure(x)})
}
def allFailedAsTrys[T](fItems: /* future items */ List[Future[T]]): Future[List[Try[T]]] = {
allAsTrys(fItems).map(_.filter(_.isFailure))
}
def allSucceededAsTrys[T](fItems: /* future items */ List[Future[T]]): Future[List[Try[T]]] = {
allAsTrys(fItems).map(_.filter(_.isSuccess))
}
}
// Tests...
// allAsTrys tests
//
test("futureToFutureTry returns Success if no exception") {
val future = Future.futureToFutureTry(Future{"mouse"})
Thread.sleep(0, 100)
val futureValue = future.value
assert(futureValue == Some(Success(Success("mouse"))))
}
test("futureToFutureTry returns Failure if exception thrown") {
val future = Future.futureToFutureTry(Future{throw new IllegalStateException("bad news")})
Thread.sleep(5) // need to sleep a LOT longer to get Exception from failure case... interesting.....
val futureValue = future.value
assertResult(true) {
futureValue match {
case Some(Success(Failure(error: IllegalStateException))) => true
case _ => false // avoid a MatchError when the future has not completed yet
}
}
}
test("Future.allAsTrys returns Nil given Nil list as input") {
val future = Future.allAsTrys(Nil)
assert ( Await.result(future, 100 nanosecond).isEmpty )
}
test("Future.allAsTrys returns successful item even if preceded by failing item") {
val future1 = Future{throw new IllegalStateException("bad news")}
var future2 = Future{"dog"}
val futureListOfTrys = Future.allAsTrys(List(future1,future2))
val listOfTrys = Await.result(futureListOfTrys, 10 milli)
System.out.println("successItem:" + listOfTrys);
assert(listOfTrys(0).failed.get.getMessage.contains("bad news"))
assert(listOfTrys(1) == Success("dog"))
}
test("Future.allAsTrys returns successful item even if followed by failing item") {
var future1 = Future{"dog"}
val future2 = Future{throw new IllegalStateException("bad news")}
val futureListOfTrys = Future.allAsTrys(List(future1,future2))
val listOfTrys = Await.result(futureListOfTrys, 10 milli)
System.out.println("successItem:" + listOfTrys);
assert(listOfTrys(1).failed.get.getMessage.contains("bad news"))
assert(listOfTrys(0) == Success("dog"))
}
test("Future.allFailedAsTrys returns the failed item and only that item") {
var future1 = Future{"dog"}
val future2 = Future{throw new IllegalStateException("bad news")}
val futureListOfTrys = Future.allFailedAsTrys(List(future1,future2))
val listOfTrys = Await.result(futureListOfTrys, 10 milli)
assert(listOfTrys(0).failed.get.getMessage.contains("bad news"))
assert(listOfTrys.size == 1)
}
test("Future.allSucceededAsTrys returns the succeeded item and only that item") {
var future1 = Future{"dog"}
val future2 = Future{throw new IllegalStateException("bad news")}
val futureListOfTrys = Future.allSucceededAsTrys(List(future1,future2))
val listOfTrys = Await.result(futureListOfTrys, 10 milli)
assert(listOfTrys(0) == Success("dog"))
assert(listOfTrys.size == 1)
}
I just came across this question and have another solution to offer:
def allSuccessful[A, M[X] <: TraversableOnce[X]](in: M[Future[A]])
(implicit cbf: CanBuildFrom[M[Future[A]], A, M[A]],
executor: ExecutionContext): Future[M[A]] = {
in.foldLeft(Future.successful(cbf(in))) {
(fr, fa) ⇒ (for (r ← fr; a ← fa) yield r += a) fallbackTo fr
} map (_.result())
}
The idea here is that within the fold you are waiting for the next element in the list to complete (using the for-comprehension syntax) and if the next one fails you just fallback to what you already have.
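The snippet above relies on CanBuildFrom, which is no longer available as of Scala 2.13. A simplified sketch of the same fold, restricted to List and using an immutable Vector as the accumulator (my simplification, not the original author's code), could look like this:

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object AllSuccessfulSketch {
  // Fold over the futures, appending each success and keeping the accumulator on failure.
  def allSuccessful[A](in: List[Future[A]])(implicit ec: ExecutionContext): Future[List[A]] =
    in.foldLeft(Future.successful(Vector.empty[A])) { (acc, fa) =>
      (for (r <- acc; a <- fa) yield r :+ a).fallbackTo(acc)
    }.map(_.toList)

  // Blocking helper, for demonstration only.
  def demo(): List[Int] = {
    import ExecutionContext.Implicits.global
    val fs = List(
      Future.successful(1),
      Future.failed[Int](new Exception("boom")),
      Future.successful(3)
    )
    Await.result(allSuccessful(fs), 5.seconds)
  }
}
```

The futures in the list are already running when the fold starts, so the fold only decides how their results are combined; it does not serialize the work.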
You can easily wrap each future's result in an Option and then flatten the list:
def futureToFutureOption[T](f: Future[T]): Future[Option[T]] =
f.map(Some(_)).recover {
case e => None
}
val listOfFutureOptions = listOfFutures.map(futureToFutureOption(_))
val futureListOfOptions = Future.sequence(listOfFutureOptions)
val futureListOfSuccesses = futureListOfOptions.map(_.flatten)
You can also collect successful and unsuccessful results in different lists:
def safeSequence[A](futures: List[Future[A]]): Future[(List[Throwable], List[A])] = {
futures.foldLeft(Future.successful((List.empty[Throwable], List.empty[A]))) { (flist, future) =>
flist.flatMap { case (elist, alist) =>
future
.map { success => (elist, alist :+ success) }
.recover { case error: Throwable => (elist :+ error, alist) }
}
}
}
If you need to keep failed futures for some reason, e.g., logging or conditional processing, this works with Scala 2.13+ (Future.transform with a Try function needs 2.12, and partitionMap was added in 2.13).
val f1 = Future(1)
val f2 = Future(2)
val ff = Future.failed(new Exception())
val futures: Seq[Future[Either[Throwable, Int]]] =
Seq(f1, f2, ff).map(_.transform(f => Success(f.toEither)))
val sum = Future
.sequence(futures)
.map { eithers =>
val (failures, successes) = eithers.partitionMap(identity)
val fsum = failures.map(_ => 100).sum
val ssum = successes.sum
fsum + ssum
}
assert(Await.result(sum, 1.second) == 103)