How to traverse a Set[Future[Option[User]]] and mutate a map - scala

I have a mutable map that contains users:
val userMap = mutable.Map.empty[Int, User] // Int is user.Id
Now I need to load the new users, and add them to the map. I have the following api methods:
def getNewUsers(): Seq[Int]
def getUser(userId: Int): Future[Option[User]]
So I first get all the users I need to load:
val newUserIds: Set[Int] = api.getNewUsers
I now need to load each user, but I'm not sure how, since getUser returns a Future[Option[User]].
I tried this:
api.getNewUsers().map( getUser(_) )
But that returns a Set[Future[Option[User]]].
I'm not sure how to use Set[Future[Option[User]]] to update my userMap now.

You'll have to wait for all of the Futures to finish. You can use Future.sequence to transform your Set[Future[A]] into a Future[Set[A]], which completes once all of them have finished:
val s: Set[scala.concurrent.Future[Some[User]]] = Set(Future(Some(User(1))), Future(Some(User(2))))
val f: Future[Set[Some[User]]] = Future.sequence(s)
f.map(users => users.foreach(u => println(u))) // replace println with your own code
However, using a mutable Map may be dangerous because it opens you up to race conditions. Futures are executed on other threads, and if you alter a mutable object's state from different threads without synchronization, bad things will happen.
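If you want to avoid the mutable map altogether, one option is to let the Future produce a new immutable Map and swap it in once the Future completes. This is a minimal sketch, assuming a simplified User case class and a stubbed getUser (both hypothetical stand-ins for the question's types and API):

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object LoadUsers {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Hypothetical stand-ins for the question's User type and api.getUser.
  final case class User(id: Int, name: String)
  def getUser(userId: Int): Future[Option[User]] =
    Future(if (userId % 2 == 0) Some(User(userId, s"user$userId")) else None)

  // Wait for all lookups, drop the Nones, and merge the found users into a new
  // immutable Map instead of mutating shared state from Future callbacks.
  def loadNewUsers(existing: Map[Int, User], newIds: Set[Int]): Future[Map[Int, User]] =
    Future.sequence(newIds.map(getUser)).map { results =>
      existing ++ results.flatten.map(u => u.id -> u)
    }
}
```

The caller then replaces its reference to the map in a single place when the Future completes, so no two threads ever write to the same collection.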

You can use Future.sequence, which, according to the Scala Future documentation, "transforms a TraversableOnce[Future[A]] into a Future[TraversableOnce[A]]. Useful for reducing many Futures into a single Future."
You can try:
// requires an implicit ExecutionContext and import scala.util.Success
val result: Future[Seq[Option[User]]] =
Future.sequence(
api.getNewUsers().map(getUser)
)
result.andThen {
case Success(users) =>
users.flatten.foreach(u => userMap += u.id -> u)
}

Related

Best way to get List[String] or Future[List[String]] from List[Future[List[String]]] Scala

I have a flow that returns List[Future[List[String]]] and I want to convert it to List[String].
Here's what I am doing currently to achieve it -
val functionReturnedValue: List[Future[List[String]]] = functionThatReturnsListOfFutureList()
val listBuffer = new ListBuffer[String]
functionReturnedValue.map{futureList =>
val list = Await.result(futureList, Duration(10, "seconds"))
list.map(string => listBuffer += string)
}
listBuffer.toList
Waiting inside a loop is not good, and I also need to avoid using ListBuffer.
Alternatively, is it possible to get a Future[List[String]] from a List[Future[List[String]]]?
Could someone please help with this?
There is no way to get a value from an asynchronous context into a synchronous context without blocking the synchronous context while it waits for the asynchronous one.
But you can delay that blocking as long as possible to get better results:
val listFutureList: List[Future[List[String]]] = ???
val listListFuture: Future[List[List[String]]] = Future.sequence(listFutureList)
val listFuture: Future[List[String]] = listListFuture.map(_.flatten)
val list: List[String] = Await.result(listFuture, Duration.Inf)
Using Await.result invokes a blocking operation, which you should avoid if you can.
Just as a side note, in your code you are using .map but as you are only interested in the (mutable) ListBuffer you can just use foreach which has Unit as a return type.
Instead of mapping and adding item by item, you can use .appendAll:
functionReturnedValue.foreach(fl =>
listBuffer.appendAll(Await.result(fl, Duration(10, "seconds")))
)
As you don't want to use ListBuffer, another way is to use Future.sequence in a for comprehension and then .flatten:
val fls: Future[List[String]] = for (
lls <- Future.sequence(functionReturnedValue)
) yield lls.flatten
You can transform a List[Future[In]] into a Future[List[In]] safely as follows:
import scala.util.{Failure, Success, Try}
import scala.util.control.NonFatal
def aggregateSafeSequence[In](futures: List[Future[In]])(implicit ec: ExecutionContext): Future[List[In]] = {
val futureTries = futures.map(_.map[Try[In]](Success(_)).recover { case NonFatal(ex) => Failure(ex) })
Future.sequence(futureTries).map {
_.foldRight(List[In]()) {
case (curr, acc) =>
curr match {
case Success(res) => res :: acc
case Failure(ex) =>
println(s"Failure occurred: $ex")
acc
}
}
}
}
Then you can use Await.result in order to wait if you like, but it's not recommended and you should avoid it if possible.
Note that with plain Future.sequence, if any one of the futures fails, the combined future fails as well, so I went with a slightly different approach.
You can use the same approach for List[Future[List[String]]] and similar types, then flatten the result.
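To see the difference the answer describes, here is a small sketch (the object and value names are hypothetical) that runs the same inputs through plain Future.sequence and through a failure-tolerant variant built the same way as aggregateSafeSequence, then flattens the List[List[String]]:

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}
import scala.util.control.NonFatal

object SafeSequenceDemo {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Same idea as aggregateSafeSequence: turn each Future into a Future[Try],
  // so one failure no longer fails the combined Future, then keep the successes.
  def successesOnly[In](futures: List[Future[In]]): Future[List[In]] =
    Future
      .sequence(futures.map(_.map[Try[In]](Success(_)).recover { case NonFatal(ex) => Failure(ex) }))
      .map(_.collect { case Success(value) => value })

  val inputs: List[Future[List[String]]] = List(
    Future.successful(List("a", "b")),
    Future.failed(new RuntimeException("boom")),
    Future.successful(List("c"))
  )

  // Plain Future.sequence: the whole result fails because one input failed.
  val strict: Future[List[String]] = Future.sequence(inputs).map(_.flatten)

  // Failure-tolerant version: failed inputs are dropped, the rest are flattened.
  val tolerant: Future[List[String]] = successesOnly(inputs).map(_.flatten)
}
```

Which behaviour is right depends on whether a partial result is acceptable; silently dropping failures can hide real errors, so logging them (as the answer does) is usually worthwhile.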

request timeout from flatMapping over cats.effect.IO

I am attempting to transform some data that is encapsulated in cats.effect.IO with a Map that also is in an IO monad. I'm using http4s with blaze server and when I use the following code the request times out:
def getScoresByUserId(userId: Int): IO[Response[IO]] = {
implicit val formats = DefaultFormats + ShiftJsonSerializer() + RawShiftSerializer()
implicit val shiftJsonReader = new Reader[ShiftJson] {
def read(value: JValue): ShiftJson = value.extract[ShiftJson]
}
implicit val shiftJsonDec = jsonOf[IO, ShiftJson]
// get the shifts
var getDbShifts: IO[List[Shift]] = shiftModel.findByUserId(userId)
// use the userRoleId to get the RoleId then get the tasks for this role
val taskMap : IO[Map[String, Double]] = taskModel.findByUserId(userId).flatMap {
case tskLst: List[Task] => IO(tskLst.map((task: Task) => (task.name -> task.standard)).toMap)
}
val traversed: IO[List[Shift]] = for {
shifts <- getDbShifts
traversed <- shifts.traverse((shift: Shift) => {
val lstShiftJson: IO[List[ShiftJson]] = read[List[ShiftJson]](shift.roleTasks)
.map((sj: ShiftJson) =>
taskMap.flatMap((tm: Map[String, Double]) =>
IO(ShiftJson(sj.name, sj.taskType, sj.label, sj.value.toString.toDouble / tm.get(sj.name).get)))
).sequence
//TODO: this flatMap is bricking my request
lstShiftJson.flatMap((sjLst: List[ShiftJson]) => {
IO(Shift(shift.id, shift.shiftDate, shift.shiftStart, shift.shiftEnd,
shift.lunchDuration, shift.shiftDuration, shift.breakOffProd, shift.systemDownOffProd,
shift.meetingOffProd, shift.trainingOffProd, shift.projectOffProd, shift.miscOffProd,
write[List[ShiftJson]](sjLst), shift.userRoleId, shift.isApproved, shift.score, shift.comments
))
})
})
} yield traversed
traversed.flatMap((sLst: List[Shift]) => Ok(write[List[Shift]](sLst)))
}
As you can see from the TODO comment, I've narrowed the problem down to the flatMap below it. If I remove that flatMap and merely return "IO(shift)" to the traversed variable, the request does not time out. However, that doesn't help me much, because I need to make use of the lstShiftJson variable, which has my transformed JSON.
My intuition tells me I'm abusing the IO monad somehow, but I'm not quite sure how.
Thank you for your time in reading this!
So, with the guidance of Luis's comment, I refactored my code to the following. I don't think it is optimal (the flatMap at the end seems unnecessary, but I couldn't figure out how to remove it), but it's the best I've got.
def getScoresByUserId(userId: Int): IO[Response[IO]] = {
implicit val formats = DefaultFormats + ShiftJsonSerializer() + RawShiftSerializer()
implicit val shiftJsonReader = new Reader[ShiftJson] {
def read(value: JValue): ShiftJson = value.extract[ShiftJson]
}
implicit val shiftJsonDec = jsonOf[IO, ShiftJson]
// FOR EACH SHIFT
// - read the shift.roleTasks into a ShiftJson object
// - divide each task value by the task.standard where task.name = shiftJson.name
// - write the list of shiftJson back to a string
val traversed = for {
taskMap <- taskModel.findByUserId(userId).map((tList: List[Task]) => tList.map((task: Task) => (task.name -> task.standard)).toMap)
shifts <- shiftModel.findByUserId(userId)
traversed <- shifts.traverse((shift: Shift) => {
val lstShiftJson: List[ShiftJson] = read[List[ShiftJson]](shift.roleTasks)
.map((sj: ShiftJson) => ShiftJson(sj.name, sj.taskType, sj.label, sj.value.toString.toDouble / taskMap.get(sj.name).get ))
shift.roleTasks = write[List[ShiftJson]](lstShiftJson)
IO(shift)
})
} yield traversed
traversed.flatMap((t: List[Shift]) => Ok(write[List[Shift]](t)))
}
Luis mentioned that mapping my List[Task] to a Map[String, Double] is a pure operation, so we want to use map instead of flatMap.
He also pointed out that I was wrapping every operation that comes from the database in IO, which was causing a great deal of recomputation (including repeated DB transactions).
To solve this, I moved all of the database operations into my for comprehension. Using the "<-" operator to flatMap each of the results makes the values available to the rest of the comprehension while staying inside the IO monad, which prevents the recomputation I was seeing before.
I still think there must be a better way of returning my result: flatMapping over the "traversed" variable to get back inside the IO monad seems like unnecessary recomputation, so please correct me if I'm wrong.
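To illustrate the recomputation Luis described, here is a minimal sketch, assuming cats-effect 3 on the classpath (the scala-cli directive at the top is one way to get it) and a hypothetical fetchTaskMap standing in for taskModel.findByUserId. An IO is a description rather than a running computation, so each time a composed program reaches a flatMap over the same IO value, its effect runs again. Flatmapping a database IO inside traverse therefore repeats the query per element, while binding it once with <- runs it once:

```scala
//> using dep org.typelevel::cats-effect:3.5.4
import cats.effect.IO
import cats.effect.unsafe.implicits.global
import cats.syntax.all._
import java.util.concurrent.atomic.AtomicInteger

object IoRecomputation {
  // Hypothetical stand-in for taskModel.findByUserId: counts how often it runs.
  val queryCount = new AtomicInteger(0)
  val fetchTaskMap: IO[Map[String, Double]] =
    IO { queryCount.incrementAndGet(); Map("a" -> 2.0, "b" -> 4.0) }

  val items = List("a", "b", "a")

  // Flatmapping the same IO value inside the per-item function re-runs the "query" for every item.
  def perItem: IO[List[Double]] =
    items.traverse(name => fetchTaskMap.map(tm => 10.0 / tm(name)))

  // Binding the IO once with <- runs the "query" a single time; the rest is a pure map.
  def once: IO[List[Double]] =
    for {
      tm <- fetchTaskMap
    } yield items.map(name => 10.0 / tm(name))
}
```

In this model, a trailing flatMap such as traversed.flatMap(t => Ok(...)) does not repeat the queries: when the resulting IO runs, traversed runs once and its result is passed on to build the response.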

How to retrieve value from the output of a scala Future?

I am trying to query a table, store values of the query in a Scala Map & return the same map.
To do that, I came up with the following code:
def getBounds(incLogIdMap:scala.collection.mutable.Map[String, String]): Future[scala.collection.mutable.Map[String, String]] = Future {
var boundsMap = scala.collection.mutable.Map[String, String]()
incLogIdMap.keys.foreach(table => if(!incLogIdMap(table).contains("INVALID")) {
val minMax = s"select max(cast(to_char(update_tms,'yyyyddmmhhmmss') as bigint)) maxTms, min(cast(to_char(update_tms,'yyyyddmmhhmmss') as bigint)) minTms from queue.${table} where key_ids in (${incLogIdMap(table)})"
val boundsDF = spark.read.format("jdbc").option("url", commonParams.getGpConUrl()).option("dbtable", s"(${minMax}) as ctids")
.option("user", commonParams.getGpUserName()).option("password", commonParams.getGpPwd()).load()
val maxTms = boundsDF.select("minTms").head.getLong(0).toString + "," + boundsDF.select("maxTms").head.getLong(0).toString
boundsMap += (table -> maxTms)
}
)
boundsMap
}
In order to receive the value from the method getBounds, I used the method onComplete as below:
val tmsobj = new MinMaxVals(spark, commonParams)
val boundsMap = tmsobj.getBounds(incLogIds)
boundsMap.onComplete({
case Success(value) =>
case Failure(value) =>
})
I have coded in Scala before, but I am new to Futures in Scala. Could anyone let me know how I can retrieve the value returned by getBounds into val boundsMap?
You can use Await (not the best approach):
val boundsMap = Await.result(tmsobj.getBounds(incLogIds), Duration.Inf)
Or use the value only when you need it:
val boundsMap = tmsobj.getBounds(incLogIds)
boundsMap.map(value => Smth_To_Do(value))
Blocking to access a value from a Future is not recommended, as it defeats the purpose of asynchronous computation. However, there may be cases where you are dealing with legacy code, or other situations where fetching the value from the Future is the way forward. To deal with such situations, there are two approaches:
Using Await, which will block the thread:
Await.result(tmsobj.getBounds(incLogIds), 10.seconds) // needs import scala.concurrent.duration._
Here, Await waits up to 10 seconds for the getBounds future to complete. If it completes within this time, you get the value; otherwise a TimeoutException is thrown. The biggest drawback of this method is that it blocks the current thread of execution.
Using a callback with onComplete, as you have done:
tmsobj.getBounds(incLogIds).onComplete {
case Success(bounds) => myMethod(bounds)
case Failure(t) => println(s"Error: $t")
}
onComplete registers a callback function that is executed whenever the future completes. This is comparatively safer than Await, but note that onComplete returns Unit, so the value is only available inside the callback (or in whatever you call from it).
You can refer to Accessing value returned by scala futures for further details.
I hope that this answers your question.
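Combining both suggestions, a common pattern is to keep transforming the Future with map and block at most once, at the edge of the program. A minimal sketch, where getBounds is a hypothetical stub returning a fixed "min,max" string per table instead of querying the database:

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BoundsDemo {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Hypothetical stand-in for getBounds: returns table -> "min,max" asynchronously.
  def getBounds(tables: Seq[String]): Future[Map[String, String]] =
    Future(tables.map(t => t -> "100,200").toMap)

  // Keep working inside the Future with map; nothing blocks here.
  def minBounds(tables: Seq[String]): Future[Map[String, Long]] =
    getBounds(tables).map(_.map { case (table, bounds) => table -> bounds.split(",")(0).toLong })

  // Block once, at the edge of the program, with an explicit timeout.
  // Await.result throws a TimeoutException if the Future is not done in time,
  // and rethrows the Future's exception if it failed.
  def minBoundsBlocking(tables: Seq[String]): Map[String, Long] =
    Await.result(minBounds(tables), 10.seconds)
}
```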

Wait for a list of futures with composing Option in Scala

I have to get a list of issues for each file of a given list from a REST API with Scala. I want to do the requests in parallel, and use the Dispatch library for this. My method is called from a Java framework and I have to wait at the end of this method for the result of all the futures to yield the overall result back to the framework. Here's my code:
def fetchResourceAsJson(filePath: String): dispatch.Future[json4s.JValue]
def extractLookupId(json: org.json4s.JValue): Option[String]
def findLookupId(filePath: String): Future[Option[String]] =
for (json <- fetchResourceAsJson(filePath))
yield extractLookupId(json)
def searchIssuesJson(lookupId: String): Future[json4s.JValue]
def extractIssues(json: org.json4s.JValue): Seq[Issue]
def findIssues(lookupId: String): Future[Seq[Issue]] =
for (json <- searchIssuesJson(lookupId))
yield extractIssues(json)
def getFilePathsToProcess: List[String]
def thisIsCalledByJavaFramework(): java.util.Map[String, java.util.List[Issue]] = {
val finalResultPromise = Promise[Map[String, Seq[Issue]]]()
// (1) inferred type of issuesByFile not as expected, cannot get
// the type system happy, would like to have Seq[Future[(String, Seq[Issue])]]
val issuesByFile = getFilePathsToProcess map { f =>
findLookupId(f).flatMap { lookupId =>
(f, findIssues(lookupId)) // I want to yield a tuple (String, Seq[Issue]) here
}
}
Future.sequence(issuesByFile) onComplete {
case Success(x) => finalResultPromise.success(x) // (2) how to return x here?
case Failure(x) => // (3) how to return null from here?
}
//TODO transform finalResultPromise to Java Map
}
This code snippet has several issues. First, I'm not getting the type I would expect for issuesByFile (1). I would like to just ignore the result of findLookupId if it is not able to find the lookup ID (i.e., None). I've read in various tutorials that Future[Option[X]] is not easy to handle in function compositions and for expressions in Scala. So I'm also curious what the best practices are to handle these properly.
Second, I somehow have to wait for all futures to finish, but don't know how to return the result to the calling Java framework (2). Can I use a promise here to achieve this? If yes, how can I do it?
And last but not least, in case of any errors, I would just like to return null from thisIsCalledByJavaFramework but don't know how (3).
Any help is much appreciated.
Thanks,
Michael
Several points:
The first problem at (1) is that you don't handle the case where findLookupId returns None. You need to decide what to do in this case. Fail the whole process? Exclude that file from the list?
The second problem at (1) is that findIssues itself returns a Future, which you need to map before you can build the result tuple.
There's a shortcut for map followed by Future.sequence: Future.traverse.
If you cannot change the result type of the method because the Java interface is fixed and cannot be changed to support Futures itself you must wait for the Future to be completed. Use Await.ready or Await.result to do that.
Taking all that into account and choosing to ignore files for which no id could be found results in this code:
// `None` in an entry for a file means that no id could be found
def entryForFile(file: String): Future[(String, Option[Seq[Issue]])] =
findLookupId(file).flatMap {
// the need for this kind of pattern match shows
// the difficulty of working with `Future[Option[T]]`
case Some(id) ⇒ findIssues(id).map(issues ⇒ file -> Some(issues))
case None ⇒ Future.successful(file -> None)
}
def thisIsCalledByJavaFramework(): java.util.Map[String, java.util.List[Issue]] = {
val issuesByFile: Future[Seq[(String, Option[Seq[Issue]])]] =
Future.traverse(getFilePathsToProcess)(entryForFile)
import scala.collection.JavaConverters._
import scala.concurrent.Await
import scala.concurrent.duration._
import scala.util.control.NonFatal
try
Await.result(issuesByFile, 10.seconds)
.collect {
// here we choose to ignore entries where no id could be found
case (f, Some(issues)) ⇒ f -> issues.asJava
}
.toMap.asJava
catch {
case NonFatal(_) ⇒ null
}
}
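If you prefer to drop files without an id inside the Future pipeline itself (rather than with collect after Await), one option is to have each file produce a list of zero or one entries and flatten them. A sketch with hypothetical stand-ins for Issue, findLookupId and findIssues:

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object IssuesDemo {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Hypothetical stand-ins for the question's types and REST calls.
  final case class Issue(title: String)
  def findLookupId(filePath: String): Future[Option[String]] =
    Future(if (filePath.endsWith(".scala")) Some(s"id-$filePath") else None)
  def findIssues(lookupId: String): Future[Seq[Issue]] =
    Future(Seq(Issue(s"issue for $lookupId")))

  // Each file yields zero or one entry, so files without an id simply disappear
  // when the per-file lists are flattened.
  def issuesByFile(files: List[String]): Future[Map[String, Seq[Issue]]] =
    Future
      .traverse(files) { file =>
        findLookupId(file).flatMap {
          case Some(id) => findIssues(id).map(issues => List(file -> issues))
          case None     => Future.successful(List.empty[(String, Seq[Issue])])
        }
      }
      .map(_.flatten.toMap)
}
```

The Java-facing method can then Await this single Future and convert the resulting Map, exactly as in the answer above.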

Create Future without starting it

This is a follow-up to my previous question
Suppose I want to create a future with my function but don't want to start it immediately (i.e. I do not want to call val f = Future { /* my function */ }).
Now I see it can be done as follows:
val p = Promise[Unit]()
val f = p.future map { _ => /* my function here */ }
Is it the only way to create a future with my function w/o executing it?
You can do something like this
val p = Promise[Unit]()
val f = p.future
//... some code run at a later time
p.success {
// your function
}
LATER EDIT:
I think the pattern you're looking for can be encapsulated like this:
class LatentComputation[T](f: => T) {
private val p = Promise[T]()
def trigger(): Unit = p.success(f)
def future: Future[T] = p.future
}
object LatentComputation {
def apply[T](f: => T) = new LatentComputation(f)
}
You would use it like this:
val comp = LatentComputation {
// your code to be executed later
}
val f = comp.future
// somewhere else in the code
comp.trigger()
You could always defer creation with a closure: you won't get the Future object right away, but you get a handle to call later.
type DeferredComputation[T,R] = T => Future[R]
def deferredCall[T,R](futureBody: T => R)(implicit ec: ExecutionContext): DeferredComputation[T,R] =
t => Future {futureBody(t)}
def deferredResult[R](futureBody: => R)(implicit ec: ExecutionContext): DeferredComputation[Unit,R] =
_ => Future {futureBody}
If you are getting too fancy with execution control, maybe you should be using actors instead?
Or, perhaps, you should be using a Promise instead of a Future: a Promise can be passed on to others, while you keep it to "fulfill" it at a later time.
It's also worth giving a plug to Promise.completeWith.
You already know how to use p.future onComplete mystuff.
You can trigger that from another future using p completeWith f.
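Combining LatentComputation with completeWith gives a variant in which trigger() starts the body asynchronously instead of running it on the caller's thread (as p.success(f) does). This is a sketch under that assumption; AsyncLatentComputation is a hypothetical name, and trigger() is meant to be called once:

```scala
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._

// Hypothetical variant of LatentComputation: on trigger() the body is started on
// the ExecutionContext and the promise is completed with that Future (completeWith),
// so trigger() returns immediately. Call trigger() only once.
final class AsyncLatentComputation[T](body: => T)(implicit ec: ExecutionContext) {
  private val p = Promise[T]()
  def future: Future[T] = p.future
  def trigger(): Unit = p.completeWith(Future(body))
}
```

Code holding comp.future can attach callbacks or transformations before anything has run; nothing is evaluated until trigger() is called.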
You can also define a function that creates and returns the Future, and then call it:
val double = (value: Int) => {
val f = Future { Thread.sleep(1000); value * 2 }
f.onComplete(x => println(s"Future return: $x"))
f
}
println("Before future.")
double(2)
println("After future is called, but as the future takes 1 sec to run, it will be printed before.")
I used this to execute futures in batches of n, something like:
// The functions that returns the future.
val double = (i: Int) => {
val future = Future ({
println(s"Start task $i")
Thread.sleep(1000)
i * 2
})
future.onComplete(_ => {
println(s"Task $i ended")
})
future
}
val numbers = 1 to 20
numbers
.map(i => (i, double))
.grouped(5)
.foreach(batch => {
val result = Await.result( Future.sequence(batch.map{ case (i, callback) => callback(i) }), 5.minutes )
println(result)
})
Or just use regular methods that return futures, and fire them in series using something like a for comprehension (sequential call-site evaluation)
This is a well-known problem with the standard library's Future: it is designed in such a way that it is not referentially transparent, since it evaluates eagerly and memoizes its result. In most use cases this is totally fine, and Scala developers rarely need to create a non-evaluated Future.
Take the following program:
val x = Future(...); f(x, x)
is not the same program as
f(Future(...), Future(...))
because in the first case the future is evaluated once, in the second case it is evaluated twice.
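The difference can be observed by counting evaluations. In this sketch, work and f are hypothetical helpers; f simply adds the results of its two Future arguments:

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import ExecutionContext.Implicits.global

object EagerFutureDemo {
  val evaluations = new AtomicInteger(0)
  def work(): Int = { evaluations.incrementAndGet(); 1 }

  // Hypothetical f: combines two futures by adding their results.
  def f(a: Future[Int], b: Future[Int]): Future[Int] = a.zip(b).map { case (x, y) => x + y }

  // val x = Future(...); f(x, x): the body runs once and its result is reused.
  def shared(): Int = { val x = Future(work()); Await.result(f(x, x), 5.seconds) }

  // f(Future(...), Future(...)): the body runs twice.
  def inlined(): Int = Await.result(f(Future(work()), Future(work())), 5.seconds)
}
```

Both calls return 2, but shared() evaluates work() once while inlined() evaluates it twice, which is exactly why the two programs are not interchangeable.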
There are libraries which provide the necessary abstractions to work with referentially transparent asynchronous tasks, whose evaluation is deferred and not memoized unless explicitly required by the developer.
Scalaz Task
Monix Task
fs2
If you are looking to use Cats, Cats Effect works nicely with both Monix and fs2.
This is a bit of a hack, since it has nothing to do with how Future works, but just adding lazy would suffice:
lazy val f = Future { /* my function */ }
But note that the Future starts the first time f is referenced, so any code that reads f (even just to pass it along) must itself be lazy or take it by name if you want to keep deferring it.
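A small sketch of that behaviour, with a hypothetical counter to show when the body starts and a hypothetical runLater helper that keeps the reference deferred by taking it as a by-name parameter:

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import ExecutionContext.Implicits.global

object LazyFutureDemo {
  val started = new AtomicInteger(0)

  // The Future is not created (and so not started) until `f` is first referenced.
  lazy val f: Future[Int] = Future { started.incrementAndGet(); 42 }

  // A by-name parameter keeps the reference deferred inside a helper as well.
  def runLater(fut: => Future[Int]): () => Future[Int] = () => fut
}
```

Once forced, the lazy val memoizes the same Future, so later references do not start the body again.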