How to extract a value from a Scala Future

How do I perform a reduce/fold operation on the Seq and then get the final value?
I'm performing an operation (in this case a Redis call) that returns a Future. I'm processing the Future (results) using a map operation.
The map operation returns a Future[Seq[Any]] type.
res0: scala.concurrent.Future[Seq[Any]] = scala.concurrent.impl.Promise$DefaultPromise@269f8f79
Now I want to perform some operations (fold/reduce) on this Seq and then get a final value. How can I achieve this?
import org.json4s._
import org.json4s.native.JsonMethods._
import org.json4s.DefaultFormats
implicit val akkaSystem = akka.actor.ActorSystem()
val redisClient = RedisClient()
val sentimentZSetKey = "dummyzset"
val currentTimeStamp = System.currentTimeMillis()
val end = Limit(currentTimeStamp)
val start = Limit(currentTimeStamp - 60 * 100000)
val results = redisClient.zrangebyscoreWithscores(sentimentZSetKey, start, end)
implicit val formats = DefaultFormats
results.map {
seq => seq.map {
element => element match {
case (byteString, value) => {
val p = byteString.decodeString("UTF-8")
try {
val ph = parse(p).extract[MyClass]
ph
} catch {
case e: Exception => println(e.getMessage)
}
}
case _ =>
}
}
}

Blocking is discouraged when using futures in Scala, but it can be done with Await.result. Since you want to further transform the sequence, you are better off using functional composition: keep calling map (or flatMap) on the Future, do the fold/reduce inside it, and block only at the very edge of the program, if at all.
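A minimal sketch of that approach, assuming a hypothetical fetchRaw() that stands in for the Redis call and returns string payloads (parsed as Int here instead of with json4s, to keep it self-contained). It also shows why the question ends up with Seq[Any]: the catch branch returns Unit, whereas returning an Option from the parse keeps the element type precise.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Try

object FoldFutureSeq {
  // Hypothetical stand-in for the Redis call: (payload, score) pairs.
  def fetchRaw(): Future[Seq[(String, Double)]] =
    Future.successful(Seq("1" -> 1.0, "oops" -> 2.0, "3" -> 3.0))

  // Parse inside map. Returning an Option (rather than println in a catch,
  // which yields Unit) keeps the element type Seq[Int] instead of Seq[Any].
  val parsed: Future[Seq[Int]] =
    fetchRaw().map(_.flatMap { case (payload, _) => Try(payload.toInt).toOption })

  // The fold also stays inside the Future; nothing blocks here.
  val total: Future[Int] = parsed.map(_.foldLeft(0)(_ + _))

  // Block only at the outermost edge (e.g. main or a test), if at all.
  def parsedBlocking(): Seq[Int] = Await.result(parsed, 5.seconds)
  def totalBlocking(): Int = Await.result(total, 5.seconds)
}
```

Unparseable payloads ("oops") are dropped by the flatMap, so the fold only ever sees Ints.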

Related

How to persist the list which we made dynamically from dataFrame in scala spark

def getAnimalName(dataFrame: DataFrame): List[String] = {
dataFrame.select("animal").
filter(col("animal").isNotNull && col("animal").notEqual("")).
rdd.map(r => r.getString(0)).distinct().collect.toList
}
I am basically calling this function twice to get the list for different purposes. Is there a way to retain the list in memory, so that it is generated only once instead of calling the same function again and again, in Scala Spark?
Try something like the code below; you can also check the performance using the time function.
The code is explained in the inline comments.
import org.apache.spark.rdd
import org.apache.spark.sql.functions._
import org.apache.spark.sql.{DataFrame, functions}
object HandleCachedDF {
var cachedAnimalDF : rdd.RDD[String] = _
def main(args: Array[String]): Unit = {
val spark = Constant.getSparkSess
val df = spark.read.json("src/main/resources/hugeTest.json") // Load your Dataframe
val df1 = time[rdd.RDD[String]] {
getAnimalName(df)
}
val resultList = df1.collect().toList
val df2 = time{
getAnimalName(df)
}
val resultList1 = df2.collect().toList
println(resultList.equals(resultList1))
}
def getAnimalName(dataFrame: DataFrame): rdd.RDD[String] = {
if (cachedAnimalDF == null) { // Build the RDD only on the first call; later calls ignore `dataFrame`
cachedAnimalDF = dataFrame.select("animal").
filter(functions.col("animal").isNotNull && col("animal").notEqual("")).
rdd.map(r => r.getString(0)).distinct().cache() // Cache the RDD so later actions reuse it
}
cachedAnimalDF // Return the cached RDD
}
def time[R](block: => R): R = { // Measure the time taken by the block to execute
val t0 = System.nanoTime()
val result = block // call-by-name
val t1 = System.nanoTime()
println("Elapsed time: " + (t1 - t0) + "ns")
result
}
}
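You would have to persist or cache at this point, and keep a reference to the persisted RDD:
val animals = dataFrame.select("animal").
filter(col("animal").isNotNull && col("animal").notEqual("")).
rdd.map(r => r.getString(0)).distinct().persist()
and then call a function like the following on it
import org.apache.spark.rdd.RDD
def getAnimalName(animals: RDD[String]): List[String] = {
animals.collect().toList
}
as many times as you need; after the first call, each call reads the persisted data instead of recomputing the select/filter/distinct pipeline.
I hope it helps.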
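Both answers cache on the Spark side. If what you want to reuse is the collected List on the driver, a plain Scala lazy val gives the same compute-once behavior as the `if (cachedAnimalDF == null)` check, without a mutable var. A minimal, Spark-free sketch: expensiveAnimalNames is a hypothetical stand-in for the select/filter/distinct/collect pipeline, and, like the null check, it computes the value once regardless of input, so it only fits the case where the DataFrame never changes.

```scala
object ComputeOnce {
  // Counts how often the expensive step runs (for demonstration only).
  var computations = 0

  // Hypothetical stand-in for select/filter/distinct/collect on the DataFrame.
  private def expensiveAnimalNames(): List[String] = {
    computations += 1
    List("cat", "dog", "", "cat").filter(_.nonEmpty).distinct
  }

  // Evaluated on first access, then reused; initialization is thread-safe.
  lazy val animalNames: List[String] = expensiveAnimalNames()
}
```

Reading ComputeOnce.animalNames twice runs expensiveAnimalNames only once.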

Scala resolve multiple Futures and get a Map(String, AnyRef)

I am currently trying to resolve multiple futures at once, but as some of them may fail, I don't want the whole result to fail if one of them fails. Instead, I want to end up with a Map(String, AnyRef) (meaning a Map from the future's name to the response converted to what I need).
Currently I have the following:
val fsResp = channelList.map {
channelRef => channelRef.ask(ReportStatus).mapTo[EventMessage]
}
Future.sequence(fsResp).onComplete{
case Success(resp: Seq[EventMessage]) =>
resp.foreach { event => Supervisor.foreach(_ ! event) }
val channels = loadConfiguredComponents()
.collect {
case ("processor" | "external", components) => components.map {
case (name, config: Channel) =>
(name, serializeDetails(config, resp.find(_.channel == ChannelName(name))))
}
}.flatten.toMap
val event = EventMessage(...)
Supervisor.foreach(_ ! event)
case Failure(exception) => originalSender ! replayError(exception.getMessage)
}
But this fails if any of them fails. So how can I end up with a Map(channelRef.path.name, event() | exception)?
Thanks!
You can use fallbackTo in order to avoid a Failure. In this example I change Future[T] to Future[Option[T]] so that a failed future falls back to None, and then remove the None elements.
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}
def method(value:Int) = { Thread.sleep(2000); println(value); value }
println("start")
val eventualNone = Future.successful(None)
val futures = List(Future(method(1)), Future(method(2)), Future(method(3)), Future(throw new RuntimeException))
val withoutFailures = futures.map(_.map(Option.apply).fallbackTo(eventualNone))
Future.sequence(withoutFailures).map(_.flatten).onComplete {
case Success(values) => println(values)
case Failure(ex:Throwable) => println("FAIL")
}
Thread.sleep(5000)
output
start
1
3
2
List(1, 2, 3)
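This can be changed to Either[Throwable, T] instead of Option[T] if you want to know what failed; note that fallbackTo discards the original exception, so you would use recover to turn the failure into a Left instead.
The resulting Future always completes with a Success, so you need to inspect the values to know whether all the futures failed.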
To capture both successful and failed values from the list of Futures, you can first apply map/recover to each of them, then use Future.sequence to transform the result list into a Future of List[Either[Throwable,EventMessage]], as shown in the following simplified example:
import scala.concurrent.{Future, Await}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
case class EventMessage(id: Int, msg: String)
val fsResp = List(
Future{EventMessage(1, "M1")}, Future{throw new Throwable()}, Future{EventMessage(3, "M3")}
)
val f = Future.sequence(
fsResp.map( _.map{ resp =>
// Do stuff with `resp`, add item to `Map()`, etc ...
Right(resp)
}.
recover{ case e: Throwable =>
// Log `exception` info, etc ...
Left(e)
} )
)
Await.result(f, Duration.Inf)
// f: scala.concurrent.Future[List[Product with Serializable with scala.util.
// Either[Throwable,EventMessage]]] = Future(Success(List(
// Right(EventMessage(1,M1)), Left(java.lang.Throwable), Right(EventMessage(3,M3))
// )))
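Building on the same map/recover idea, the result can be keyed by name, which is what the question asks for (Map(channelRef.path.name, event | exception)). A sketch, assuming each future is paired with its name up front; toResultMap and the channel names are hypothetical. Since the outer Future always succeeds, checking that all values are Left tells you whether every call failed.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object NamedResults {
  // Turn each named future into a Future that never fails, then sequence them.
  def toResultMap[T](in: List[(String, Future[T])]): Future[Map[String, Either[Throwable, T]]] =
    Future
      .sequence(in.map { case (name, f) =>
        f.map(v => name -> (Right(v): Either[Throwable, T]))
          .recover { case e => name -> (Left(e): Either[Throwable, T]) }
      })
      .map(_.toMap)

  // Hypothetical names standing in for channelRef.path.name.
  def demo(): Map[String, Either[Throwable, String]] = {
    val named = List(
      "processor-1" -> Future("ok"),
      "external-1" -> Future.failed[String](new IllegalStateException("timeout")),
      "processor-2" -> Future("ok too")
    )
    Await.result(toResultMap(named), 5.seconds)
  }
}
```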

Spark doesn't conform to expected type TraversableOnce

val num_idf_pairs = rescaledData.select("item", "features")
.rdd.map(x => {(x(0), x(1))})
val itemRdd = rescaledData.select("item", "features").where("item = 1")
.rdd.map(x => {(x(0), x(1))})
val b_num_idf_pairs = sparkSession.sparkContext.broadcast(num_idf_pairs.collect())
val sims = num_idf_pairs.flatMap {
case (key, value) =>
val sv1 = value.asInstanceOf[SV]
import breeze.linalg._
val valuesVector = new SparseVector[Double](sv1.indices, sv1.values, sv1.size)
itemRdd.map {
case (id2, idf2) =>
val sv2 = idf2.asInstanceOf[SV]
val xVector = new SparseVector[Double](sv2.indices, sv2.values, sv2.size)
val sim = valuesVector.dot(xVector) / (norm(valuesVector) * norm(xVector))
(id2.toString, key.toString, sim)
}
}
The error is that the result "doesn't conform to expected type TraversableOnce".
When I modify it as follows:
val b_num_idf_pairs = sparkSession.sparkContext.broadcast(num_idf_pairs.collect())
val docSims = num_idf_pairs.flatMap {
case (id1, idf1) =>
val idfs = b_num_idf_pairs.value.filter(_._1 != id1)
val sv1 = idf1.asInstanceOf[SV]
import breeze.linalg._
val bsv1 = new SparseVector[Double](sv1.indices, sv1.values, sv1.size)
idfs.map {
case (id2, idf2) =>
val sv2 = idf2.asInstanceOf[SV]
val bsv2 = new SparseVector[Double](sv2.indices, sv2.values, sv2.size)
val cosSim = bsv1.dot(bsv2).asInstanceOf[Double] / (norm(bsv1) * norm(bsv2))
(id1.toString(), id2.toString(), cosSim)
}
}
It compiles, but it causes an OutOfMemoryError even though I set --executor-memory 4G.
The first snippet:
num_idf_pairs.flatMap {
...
itemRdd.map { ...}
}
is not valid Spark code (nested transformations are not allowed), and, as you already know, it won't type check either, because an RDD is not TraversableOnce.
The second snippet likely fails because the data you are trying to collect and broadcast is too large.
It looks like you are trying to compute the similarity between all pairs of items, so you'll need a Cartesian product; structure your code roughly like this:
num_idf_pairs
.cartesian(itemRdd)
.filter { case ((id1, idf1), (id2, idf2)) => id1 != id2 }
.map { case ((id1, idf1), (id2, idf2)) => {
val cosSim = ??? // Compute similarity
(id1.toString(), id2.toString(), cosSim)
}}
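The ??? can be filled with the Breeze computation from the question. For reference, a dependency-free sketch of cosine similarity for sparse vectors, assuming the indices are sorted in ascending order (the usual sparse-vector convention); the Sparse case class is a hypothetical stand-in for SV, not a Spark type.

```scala
object SparseCosine {
  // Parallel arrays of sorted indices and values, mirroring (indices, values, size).
  final case class Sparse(indices: Array[Int], values: Array[Double], size: Int)

  // Dot product by merging the two sorted index arrays.
  def dot(a: Sparse, b: Sparse): Double = {
    var i = 0
    var j = 0
    var acc = 0.0
    while (i < a.indices.length && j < b.indices.length) {
      if (a.indices(i) == b.indices(j)) {
        acc += a.values(i) * b.values(j); i += 1; j += 1
      } else if (a.indices(i) < b.indices(j)) i += 1
      else j += 1
    }
    acc
  }

  def norm(a: Sparse): Double = math.sqrt(a.values.map(v => v * v).sum)

  // Cosine similarity; returns 0.0 when either vector is all zeros.
  def cosine(a: Sparse, b: Sparse): Double = {
    val denom = norm(a) * norm(b)
    if (denom == 0.0) 0.0 else dot(a, b) / denom
  }
}
```

Inside the cartesian map, each pair would be converted to this representation and passed to cosine.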

asynchronous processing using list of Scala futures with onComplete for exception handling

I'm trying to make a large number of external service calls, each followed up with exception handling and conditional further processing. I thought it would be easy to extend this nice (Asynchronous IO in Scala with futures) example using an .onComplete inside, but it appears that I don't understand something about scoping and/or Futures. Can anyone point me in the right direction please?
#!/bin/bash
scala -feature $0 $#
exit
!#
import scala.concurrent.{future, blocking, Future, Await}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Success, Failure}
import scala.language.postfixOps
val keylist = List("key1", "key2")
val myFuts: List[Future[String]] = keylist.map {
myid => future {
// this line simulates an external call which returns a future (retrieval from S3)
val myfut = future { Thread.sleep(1); "START " + myid}
var mystr = "This should have been overwritten"
myfut.onComplete {
case Failure(ex) => {
println (s"failed with error: $ex")
mystr = "FAILED"
}
case Success(myval) => {
mystr = s"SUCCESS $myid: $myval"
println (mystr)
}
}
mystr
}
}
val futset: Future[List[String]] = Future.sequence(myFuts)
println (Await.result(futset, 10 seconds))
On my computer (Scala 2.10.4), this prints:
SUCCESS key2: START key2
SUCCESS key1: START key1
List(This should have been overwritten, This should have been overwritten)
I want (order unimportant):
SUCCESS key2: START key2
SUCCESS key1: START key1
List(SUCCESS key2: START key2, SUCCESS key1: START key1)
I would avoid using onComplete to do side-effecting logic on a mutable variable. Instead, I would map the future and handle the failure case by returning a different value. Here's a slightly modified version of your code, using map on the Future (via a for comprehension) and then recover to handle the failure case. Hopefully this is what you were looking for:
val keylist = List("key1", "key2")
val myFuts: List[Future[String]] = keylist.map {myid =>
// this line simulates an external call which returns a future (retrieval from S3)
val myfut = Future { Thread.sleep(1); "START " + myid} // `future` is deprecated since 2.11; use Future.apply
val result = for (myval <- myfut) yield {
val res = s"SUCCESS $myid: $myval"
println(res)
res
}
result.recover{
case ex =>
println (s"failed with error: $ex")
"FAILED"
}
}
val futset: Future[List[String]] = Future.sequence(myFuts)
println (Await.result(futset, 10 seconds))
onComplete does not return a new future; it only registers a callback that runs when the future completes. So your outer Future block returns before the callback has run, which is why you get back the original value of the string.
Instead, you can use a Promise to return another future, which the callback completes once the first future has finished.
import scala.concurrent.Promise // Promise is not among the question's imports
val keylist = List("key1", "key2")
val myFuts: List[Future[String]] = keylist.map { myid =>
  // this line simulates an external call which returns a future (retrieval from S3)
  val myfut = Future {
    Thread.sleep(1); "START " + myid
  }
  val p = Promise[String]()
  myfut.onComplete {
    case Failure(ex) =>
      println(s"failed with error: $ex")
      p.success("FAILED") // complete with a marker so Future.sequence still succeeds
    case Success(myval) =>
      val mystr = s"SUCCESS $myid: $myval"
      println(mystr)
      p.success(mystr) // complete with the formatted string, not the raw value
  }
  p.future
}
val futset: Future[List[String]] = Future.sequence(myFuts)
println(Await.result(futset, 10 seconds))
What would be super handy is a mapAll method, as I asked about here:
Map a Future for both Success and Failure
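A mapAll method is not in the standard library, but since Scala 2.12 it can be sketched on top of Future.transform. A minimal, hypothetical version (the names mapAll and FutureMapAllOps are assumptions), applied to the keys from the question:

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

object MapAll {
  // Hypothetical extension: handle both outcomes in one function and
  // always return a successful Future (unless g itself throws).
  implicit class FutureMapAllOps[T](private val f: Future[T]) extends AnyVal {
    def mapAll[U](g: Try[T] => U)(implicit ec: ExecutionContext): Future[U] =
      f.transform(t => Success(g(t))) // transform(Try[T] => Try[U]) exists since 2.12
  }

  // The pattern from the question: one string per key, "FAILED" on error.
  def describe(myid: String, fut: Future[String]): Future[String] =
    fut.mapAll {
      case Success(myval) => s"SUCCESS $myid: $myval"
      case Failure(_)     => "FAILED"
    }

  def demo(): List[String] = {
    val futs = List(
      describe("key1", Future.successful("START key1")),
      describe("key2", Future.failed(new RuntimeException("S3 error")))
    )
    Await.result(Future.sequence(futs), 5.seconds)
  }
}
```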

Scala: List[Future] to Future[List] disregarding failed futures

I'm looking for a way to convert an arbitrary-length list of Futures to a Future of List. I'm using Play Framework, so ultimately what I really want is a Future[Result], but to make things simpler, let's just say Future[List[Int]]. The normal way to do this would be to use Future.sequence(...), but there's a twist: the list I'm given usually has around 10-20 futures in it, and it's not uncommon for one of those futures to fail (they are making external web service requests).
Instead of having to retry all of them in the event that one of them fails, I'd like to be able to get at the ones that succeeded and return those.
For example, doing the following doesn't work:
import scala.concurrent._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.Success
import scala.util.Failure
val listOfFutures = Future.successful(1) :: Future.failed(new Exception("Failure")) ::
Future.successful(3) :: Nil
val futureOfList = Future.sequence(listOfFutures)
futureOfList onComplete {
case Success(x) => println("Success!!! " + x)
case Failure(ex) => println("Failed !!! " + ex)
}
scala> Failed !!! java.lang.Exception: Failure
Instead of getting only the exception, I'd like to be able to pull the 1 and 3 out of there. I tried using Future.fold, but that apparently just calls Future.sequence behind the scenes.
The trick is to first make sure that none of the futures has failed. .recover is your friend here: you can combine it with map to convert all the Future[T] results to Future[Try[T]] instances, all of which are certain to be successful futures.
Note: you can use Option or Either here as well, but Try is the cleanest way if you specifically want to trap exceptions.
import scala.util.{Failure, Success, Try}
def futureToFutureTry[T](f: Future[T]): Future[Try[T]] =
f.map(Success(_)).recover { case x => Failure(x)}
val listOfFutures = ...
val listOfFutureTrys = listOfFutures.map(futureToFutureTry(_))
Then use Future.sequence as before, to give you a Future[List[Try[T]]]
val futureListOfTrys = Future.sequence(listOfFutureTrys)
Then filter:
val futureListOfSuccesses = futureListOfTrys.map(_.filter(_.isSuccess))
You can even pull out the specific failures, if you need them:
val futureListOfFailures = futureListOfTrys.map(_.filter(_.isFailure))
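Putting the steps together on the list from the question gives a runnable sketch. It uses collect to unwrap the successful values, so the result is List(1, 3) rather than List(Success(1), Success(3)); the object name and the blocking helpers exist only for the demo.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

object KeepSuccesses {
  // Same helper as above; the ascription pins the element type to Try[T].
  def futureToFutureTry[T](f: Future[T]): Future[Try[T]] =
    f.map(v => Success(v): Try[T]).recover { case x => Failure(x) }

  // The list from the question: 1, a failure, 3.
  val listOfFutures: List[Future[Int]] =
    List(Future.successful(1), Future.failed(new Exception("Failure")), Future.successful(3))

  val futureListOfTrys: Future[List[Try[Int]]] =
    Future.sequence(listOfFutures.map(futureToFutureTry(_)))

  // collect unwraps the values instead of leaving them inside Success(...).
  val successes: Future[List[Int]] = futureListOfTrys.map(_.collect { case Success(v) => v })
  val failures: Future[List[Throwable]] = futureListOfTrys.map(_.collect { case Failure(e) => e })

  def successesBlocking(): List[Int] = Await.result(successes, 5.seconds)
  def failureMessagesBlocking(): List[String] = Await.result(failures, 5.seconds).map(_.getMessage)
}
```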
Scala 2.12 added an overload of Future.transform that takes a Try[T] => Try[S] function, which allows a shorter answer:
val futures = Seq(Future{1},Future{throw new Exception})
// instead of `map` and `recover`, use `transform`
val seq = Future.sequence(futures.map(_.transform(Success(_))))
val successes = seq.map(_.collect{case Success(x)=>x})
successes
//res1: Future[Seq[Int]] = Future(Success(List(1)))
val failures = seq.map(_.collect{case Failure(x)=>x})
failures
//res2: Future[Seq[Throwable]] = Future(Success(List(java.lang.Exception)))
I tried Kevin's answer and ran into a glitch on my version of Scala (2.11.5). I corrected that and wrote a few additional tests, if anyone is interested. Here is my version:
implicit class FutureCompanionOps(val f: Future.type) extends AnyVal {
/** Given a list of futures `fs`, returns the future holding the list of Try's of the futures from `fs`.
* The returned future is completed only once all of the futures in `fs` have been completed.
*/
def allAsTrys[T](fItems: /* future items */ List[Future[T]]): Future[List[Try[T]]] = {
val listOfFutureTrys: List[Future[Try[T]]] = fItems.map(futureToFutureTry)
Future.sequence(listOfFutureTrys)
}
def futureToFutureTry[T](f: Future[T]): Future[Try[T]] = {
f.map(Success(_)) .recover({case x => Failure(x)})
}
def allFailedAsTrys[T](fItems: /* future items */ List[Future[T]]): Future[List[Try[T]]] = {
allAsTrys(fItems).map(_.filter(_.isFailure))
}
def allSucceededAsTrys[T](fItems: /* future items */ List[Future[T]]): Future[List[Try[T]]] = {
allAsTrys(fItems).map(_.filter(_.isSuccess))
}
}
// Tests...
// allAsTrys tests
//
test("futureToFutureTry returns Success if no exception") {
  val future = Future.futureToFutureTry(Future{"mouse"})
  // Await instead of a fixed Thread.sleep, which races with the future
  assert(Await.result(future, 1.second) == Success("mouse"))
}
test("futureToFutureTry returns Failure if exception thrown") {
  val future = Future.futureToFutureTry(Future{throw new IllegalStateException("bad news")})
  Await.result(future, 1.second) match {
    case Failure(_: IllegalStateException) => // expected
    case other => fail(s"unexpected result: $other")
  }
}
test("Future.allAsTrys returns Nil given Nil list as input") {
  val future = Future.allAsTrys(Nil)
  assert(Await.result(future, 1.second).isEmpty)
}
test("Future.allAsTrys returns successful item even if preceded by failing item") {
  val future1 = Future{throw new IllegalStateException("bad news")}
  val future2 = Future{"dog"}
  val listOfTrys = Await.result(Future.allAsTrys(List(future1, future2)), 1.second)
  println("successItem:" + listOfTrys)
  assert(listOfTrys(0).failed.get.getMessage.contains("bad news"))
  assert(listOfTrys(1) == Success("dog"))
}
test("Future.allAsTrys returns successful item even if followed by failing item") {
  val future1 = Future{"dog"}
  val future2 = Future{throw new IllegalStateException("bad news")}
  val listOfTrys = Await.result(Future.allAsTrys(List(future1, future2)), 1.second)
  println("successItem:" + listOfTrys)
  assert(listOfTrys(1).failed.get.getMessage.contains("bad news"))
  assert(listOfTrys(0) == Success("dog"))
}
test("Future.allFailedAsTrys returns the failed item and only that item") {
  val future1 = Future{"dog"}
  val future2 = Future{throw new IllegalStateException("bad news")}
  val listOfTrys = Await.result(Future.allFailedAsTrys(List(future1, future2)), 1.second)
  assert(listOfTrys(0).failed.get.getMessage.contains("bad news"))
  assert(listOfTrys.size == 1)
}
test("Future.allSucceededAsTrys returns the succeeded item and only that item") {
  val future1 = Future{"dog"}
  val future2 = Future{throw new IllegalStateException("bad news")}
  val listOfTrys = Await.result(Future.allSucceededAsTrys(List(future1, future2)), 1.second)
  assert(listOfTrys(0) == Success("dog"))
  assert(listOfTrys.size == 1)
}
I just came across this question and have another solution to offer:
// Scala 2.12 and earlier: CanBuildFrom was removed in 2.13
import scala.collection.generic.CanBuildFrom
import scala.concurrent.{ExecutionContext, Future}
def allSuccessful[A, M[X] <: TraversableOnce[X]](in: M[Future[A]])
    (implicit cbf: CanBuildFrom[M[Future[A]], A, M[A]],
     executor: ExecutionContext): Future[M[A]] = {
  in.foldLeft(Future.successful(cbf(in))) {
    (fr, fa) => (for (r <- fr; a <- fa) yield r += a) fallbackTo fr
  } map (_.result())
}
The idea here is that within the fold you wait for the next element in the list to complete (using the for-comprehension syntax), and if it fails you just fall back to what you already have.
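Since CanBuildFrom no longer exists in Scala 2.13, the code above compiles only on 2.12 and earlier. A sketch of the same fold-and-fallback idea against the 2.13 collections, using BuildFrom (the type Future.sequence itself uses in 2.13); the AllSuccessful213 object and demo are illustrative only.

```scala
import scala.collection.BuildFrom
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object AllSuccessful213 {
  // BuildFrom replaces CanBuildFrom; IterableOnce replaces TraversableOnce.
  def allSuccessful[A, M[X] <: IterableOnce[X]](in: M[Future[A]])(implicit
      bf: BuildFrom[M[Future[A]], A, M[A]],
      ec: ExecutionContext
  ): Future[M[A]] =
    in.iterator
      .foldLeft(Future.successful(bf.newBuilder(in))) { (fr, fa) =>
        // Wait for the accumulated builder, then the next future; if the
        // next future fails, keep the builder as it was.
        (for (r <- fr; a <- fa) yield r += a).fallbackTo(fr)
      }
      .map(_.result())

  def demo(): List[Int] = {
    val futures = List(Future(1), Future.failed[Int](new Exception("boom")), Future(3))
    Await.result(allSuccessful(futures), 5.seconds)
  }
}
```

Because each step waits on the previous one, the results keep the input order; the futures themselves still run concurrently, since they were started before the fold.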
You can easily wrap each future result in an Option and then flatten the list:
def futureToFutureOption[T](f: Future[T]): Future[Option[T]] =
f.map(Some(_)).recover {
case e => None
}
val listOfFutureOptions = listOfFutures.map(futureToFutureOption(_))
val futureListOfOptions = Future.sequence(listOfFutureOptions)
val futureListOfSuccesses = futureListOfOptions.map(_.flatten) // flatten the List inside the Future
You can also collect successful and unsuccessful results in different lists:
def safeSequence[A](futures: List[Future[A]]): Future[(List[Throwable], List[A])] = {
futures.foldLeft(Future.successful((List.empty[Throwable], List.empty[A]))) { (flist, future) =>
flist.flatMap { case (elist, alist) =>
future
.map { success => (elist, alist :+ success) }
.recover { case error: Throwable => (elist :+ error, alist) }
}
}
}
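If you need to keep the failed futures for some reason, e.g., for logging or conditional processing, this works with Scala 2.13+ (transform with a Try function and Try.toEither exist since 2.12, but partitionMap was added in 2.13). You can find working code here.
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Success
val f1 = Future(1)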
val f2 = Future(2)
val ff = Future.failed(new Exception())
val futures: Seq[Future[Either[Throwable, Int]]] =
Seq(f1, f2, ff).map(_.transform(f => Success(f.toEither)))
val sum = Future
.sequence(futures)
.map { eithers =>
val (failures, successes) = eithers.partitionMap(identity)
val fsum = failures.map(_ => 100).sum
val ssum = successes.sum
fsum + ssum
}
assert(Await.result(sum, 1.second) == 103)