What is the best way to merge two Future[Map[T1, T2]] in Scala

I have a set of fileNames and I want to load the corresponding pages in batches (not all at once). To do so, I'm using foldLeft with an aggregate function that accumulates a Future[Map[T1, T2]].
def loadPagesInBatches[T1, T2](fileNames: Set[FileName]): Future[Map[T1, T2]] = {
val fileNameToPageId: Map[FileName, PageId] = ... //invokes a function that returns the pageId correlated to the fileName.
val batches: Iterator[Set[FileName]] = fileNames.grouped(10) //batches of 10;
batches.foldLeft(Future(Map.empty[T1, T2]))(aggregate(fileNameToPageId))
}
And the signature of aggregate is as follows:
def aggregate(fileNameToPageId: Map[FileName, PageId]): (Future[Map[T1, T2]], Set[FileName]) => Future[Map[T1, T2]] = {..}
I'm trying to work out the best way to merge these Future[Map]s.
Thanks in advance!
P.S. FileName and PageId are just type aliases for String.

In case you have exactly 2 futures, zipWith would probably be the most idiomatic.
val future1 = ???
val future2 = ???
future1.zipWith(future2)(_ ++ _)
Which is a shorter way of writing a for comprehension:
for {
map1 <- future1
map2 <- future2
} yield map1 ++ map2
Although zipWith could potentially implement some kind of optimization.
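As a concrete illustration of the zipWith approach, here is a minimal sketch. The object name ZipWithMerge, the merge helper and the sample file names are made up for the example, and it assumes Scala 2.12 or later (where Future.zipWith is available) and the global ExecutionContext.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ZipWithMerge {
  // Hypothetical helper: merges the results of two futures once both complete.
  // With ++, entries from the right-hand map win when a key appears in both.
  def merge[K, V](left: Future[Map[K, V]], right: Future[Map[K, V]]): Future[Map[K, V]] =
    left.zipWith(right)((a, b) => a ++ b)

  def main(args: Array[String]): Unit = {
    val first  = Future(Map("a.txt" -> 1, "b.txt" -> 2))
    val second = Future(Map("b.txt" -> 20, "c.txt" -> 3))
    // Blocking here only so the demo prints something before the JVM exits.
    println(Await.result(merge(first, second), 5.seconds))
  }
}
```

Note that both futures are already running by the time zipWith is called, so this only combines results; it does not control when the work starts.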

My solution was putting the two maps into a list and using Future.reduceLeft.
def aggregate(fileNameToPageId: Map[FileName, PageId]): (Future[Map[T1, T2]], Set[FileName]) => Future[Map[T1, T2]] = {
case (all, filesBatch) =>
val mapOfPages: Future[Map[NodeId, T]] = for {
... //Some logic
} yield "TheBatchMap"
Future.reduceLeft(List(all, mapOfPages))(_ ++ _)
}
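One thing to watch with any of these merges: a Future starts as soon as it is created, so if every batch's Future is built up front, the batches all run at once. If the goal is for one batch to finish before the next starts, a sketch along these lines may help. loadBatch is a hypothetical stand-in for the real page loader, and the Int page value is only a placeholder.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object BatchLoading {
  type FileName = String

  // Hypothetical loader: pretends the "page" for a file is the length of its name.
  def loadBatch(batch: Set[FileName]): Future[Map[FileName, Int]] =
    Future(batch.map(name => name -> name.length).toMap)

  def loadInBatches(fileNames: Set[FileName], batchSize: Int): Future[Map[FileName, Int]] =
    fileNames
      .grouped(batchSize)
      .foldLeft(Future.successful(Map.empty[FileName, Int])) { (acc, batch) =>
        // flatMap means loadBatch is not even called until the
        // previously accumulated batches have completed.
        acc.flatMap(loaded => loadBatch(batch).map(pages => loaded ++ pages))
      }

  def main(args: Array[String]): Unit =
    println(Await.result(loadInBatches(Set("a", "bb", "ccc"), batchSize = 2), 5.seconds))
}
```

The trade-off is that a failed batch fails the whole fold, since later batches are chained onto it.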

Related

Best way to get List[String] or Future[List[String]] from List[Future[List[String]]] Scala

I have a flow that returns List[Future[List[String]]] and I want to convert it to List[String] .
Here's what I am doing currently to achieve it -
val functionReturnedValue: List[Future[List[String]]] = functionThatReturnsListOfFutureList()
val listBuffer = new ListBuffer[String]
functionReturnedValue.map{futureList =>
val list = Await.result(futureList, Duration(10, "seconds"))
list.map(string => listBuffer += string)
}
listBuffer.toList
Waiting inside loop is not good, also need to avoid use of ListBuffer.
Or, if it is possible to get Future[List[String]] from List[Future[List[String]]]
Could someone please help with this?
There is no way to get a value from an asynchronous context into a synchronous context without blocking the synchronous context while it waits for the asynchronous one.
But yes, you can delay that blocking for as long as possible to get better results.
val listFutureList: List[Future[List[String]]] = ???
val listListFuture: Future[List[List[String]]] = Future.sequence(listFutureList)
val listFuture: Future[List[String]] = listListFuture.map(_.flatten)
val list: List[String] = Await.result(listFuture, Duration.Inf)
Using Await.result invokes a blocking operation, which you should avoid if you can.
Just as a side note, your code uses .map, but since you are only interested in the (mutable) ListBuffer, you can use foreach instead, which has Unit as its return type.
Instead of mapping and adding the items one by one, you can use .appendAll:
functionReturnedValue.foreach(fl =>
listBuffer.appendAll(Await.result(fl, Duration(10, "seconds")))
)
As you don't want to use ListBuffer, another way is to use Future.sequence in a for comprehension and then .flatten:
val fls: Future[List[String]] = for (
lls <- Future.sequence(functionReturnedValue)
) yield lls.flatten
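To make the shape of this concrete, here is a small self-contained sketch of the same idea. The flattenAll name and the sample data are made up for the example. The combined result stays inside a Future, and the single Await in main is only there so the demo can print.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FlattenFutures {
  // List[Future[List[String]]] => Future[List[String]], with no blocking and no ListBuffer.
  def flattenAll(parts: List[Future[List[String]]]): Future[List[String]] =
    Future.sequence(parts).map(_.flatten)

  def main(args: Array[String]): Unit = {
    val parts = List(Future(List("a", "b")), Future(List("c")))
    // Block once, at the very edge of the program, if a plain List is really needed.
    println(Await.result(flattenAll(parts), 10.seconds))
  }
}
```

Future.sequence keeps the results in the order of the input list, regardless of which future finishes first.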
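You can transform List[Future[In]] to Future[List[In]] safely as follows:
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success}
import scala.util.control.NonFatal
def aggregateSafeSequence[In](futures: List[Future[In]])(implicit ec: ExecutionContext): Future[List[In]] = {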
val futureTries = futures.map(_.map(Success(_)).recover { case NonFatal(ex) => Failure(ex)})
Future.sequence(futureTries).map {
_.foldRight(List[In]()) {
case (curr, acc) =>
curr match {
case Success(res) => res :: acc
case Failure(ex) =>
println(s"Failure occurred: $ex")
acc
}
}
}
}
Then you can use Await.result to wait if you like, but it's not recommended and you should avoid it if possible.
Note that with a plain Future.sequence, if one of the futures fails, the combined future fails as well, which is why I took a slightly different approach.
You can apply the same approach to List[Future[List[String]]] and similar types.
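A shorter variant of the same idea, offered as a sketch rather than a drop-in replacement: Future.transform (Scala 2.12 and later) can lift each result into a Try, so that Future.sequence never sees a failed future. The successesOnly name is made up for the example, and unlike the version above it drops failures silently instead of logging them.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Success, Try}

object KeepSuccesses {
  def successesOnly[A](futures: List[Future[A]]): Future[List[A]] = {
    // Each future now always succeeds, carrying either Success(value) or Failure(error).
    val lifted: List[Future[Try[A]]] =
      futures.map(_.transform[Try[A]]((attempt: Try[A]) => Success(attempt)))
    // Keep only the values of the futures that originally succeeded.
    Future.sequence(lifted).map(_.collect { case Success(value) => value })
  }

  def main(args: Array[String]): Unit = {
    val inputs = List(
      Future.successful(1),
      Future.failed[Int](new RuntimeException("boom")),
      Future.successful(3)
    )
    println(Await.result(successesOnly(inputs), 5.seconds))
  }
}
```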

request timeout from flatMapping over cats.effect.IO

I am attempting to transform some data that is encapsulated in cats.effect.IO with a Map that also is in an IO monad. I'm using http4s with blaze server and when I use the following code the request times out:
def getScoresByUserId(userId: Int): IO[Response[IO]] = {
implicit val formats = DefaultFormats + ShiftJsonSerializer() + RawShiftSerializer()
implicit val shiftJsonReader = new Reader[ShiftJson] {
def read(value: JValue): ShiftJson = value.extract[ShiftJson]
}
implicit val shiftJsonDec = jsonOf[IO, ShiftJson]
// get the shifts
var getDbShifts: IO[List[Shift]] = shiftModel.findByUserId(userId)
// use the userRoleId to get the RoleId then get the tasks for this role
val taskMap : IO[Map[String, Double]] = taskModel.findByUserId(userId).flatMap {
case tskLst: List[Task] => IO(tskLst.map((task: Task) => (task.name -> task.standard)).toMap)
}
val traversed: IO[List[Shift]] = for {
shifts <- getDbShifts
traversed <- shifts.traverse((shift: Shift) => {
val lstShiftJson: IO[List[ShiftJson]] = read[List[ShiftJson]](shift.roleTasks)
.map((sj: ShiftJson) =>
taskMap.flatMap((tm: Map[String, Double]) =>
IO(ShiftJson(sj.name, sj.taskType, sj.label, sj.value.toString.toDouble / tm.get(sj.name).get)))
).sequence
//TODO: this flatMap is bricking my request
lstShiftJson.flatMap((sjLst: List[ShiftJson]) => {
IO(Shift(shift.id, shift.shiftDate, shift.shiftStart, shift.shiftEnd,
shift.lunchDuration, shift.shiftDuration, shift.breakOffProd, shift.systemDownOffProd,
shift.meetingOffProd, shift.trainingOffProd, shift.projectOffProd, shift.miscOffProd,
write[List[ShiftJson]](sjLst), shift.userRoleId, shift.isApproved, shift.score, shift.comments
))
})
})
} yield traversed
traversed.flatMap((sLst: List[Shift]) => Ok(write[List[Shift]](sLst)))
}
As the TODO comment shows, I've narrowed the problem down to the flatMap just below it. If I remove that flatMap and merely return "IO(shift)" to the traversed variable, the request does not time out. However, that doesn't help me much, because I need to make use of the lstShiftJson variable, which holds my transformed JSON.
My intuition tells me I'm abusing the IO monad somehow, but I'm not quite sure how.
Thank you for your time in reading this!
So, with the guidance of Luis's comment, I refactored my code to the following. I don't think it is optimal (the flatMap at the end seems unnecessary, but I couldn't figure out how to remove it), but it's the best I've got.
def getScoresByUserId(userId: Int): IO[Response[IO]] = {
implicit val formats = DefaultFormats + ShiftJsonSerializer() + RawShiftSerializer()
implicit val shiftJsonReader = new Reader[ShiftJson] {
def read(value: JValue): ShiftJson = value.extract[ShiftJson]
}
implicit val shiftJsonDec = jsonOf[IO, ShiftJson]
// FOR EACH SHIFT
// - read the shift.roleTasks into a ShiftJson object
// - divide each task value by the task.standard where task.name = shiftJson.name
// - write the list of shiftJson back to a string
val traversed = for {
taskMap <- taskModel.findByUserId(userId).map((tList: List[Task]) => tList.map((task: Task) => (task.name -> task.standard)).toMap)
shifts <- shiftModel.findByUserId(userId)
traversed <- shifts.traverse((shift: Shift) => {
val lstShiftJson: List[ShiftJson] = read[List[ShiftJson]](shift.roleTasks)
.map((sj: ShiftJson) => ShiftJson(sj.name, sj.taskType, sj.label, sj.value.toString.toDouble / taskMap.get(sj.name).get ))
shift.roleTasks = write[List[ShiftJson]](lstShiftJson)
IO(shift)
})
} yield traversed
traversed.flatMap((t: List[Shift]) => Ok(write[List[Shift]](t)))
}
Luis mentioned that mapping my List[Task] to a Map[String, Double] is a pure operation, so we want to use map instead of flatMap.
He also mentioned that I was wrapping every operation that comes from the database in IO, which caused a great deal of recomputation (including DB transactions).
To solve this, I moved all of the database operations inside my for comprehension. Using the "<-" operator to flatMap each return value lets the variables reside within the IO monad, which prevents the recomputation experienced before.
I do think there must be a better way of returning my result. flatMapping the "traversed" variable to get back inside the IO monad seems like unnecessary work, so please correct me if anyone knows better.

How to combine 2 future sequences of type Seq[Either[A,B]]?

Suppose there are 2 future sequences of type Future[Seq[Either[A, B]]]. How can I combine them into one?
You combine Futures using flatMap, like so:
futureA.flatMap(firstSequence =>
futureB.map(secondSequence => firstSequence ++ secondSequence))
For comprehensions are syntax sugar for this:
for {
firstSequence <- futureA
secondSequence <- futureB
} yield firstSequence ++ secondSequence
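This code runs your Futures sequentially if futureA and futureB are only created when they are referenced (for example, if they are defs or expressions that build a new Future). A Future starts running as soon as it is created, so to let them run in parallel, assign them to vals before the for comprehension: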
val executingFutureA = futureA
val executingFutureB = futureB
for {
firstSequence <- executingFutureA
secondSequence <- executingFutureB
} yield firstSequence ++ secondSequence
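The difference is easier to see with a self-contained sketch. Here slow is a hypothetical helper that simulates work with Thread.sleep, and the 200 ms delay is arbitrary.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelFutures {
  // Hypothetical slow computation: each call creates (and starts) a new Future.
  def slow[A](value: A): Future[A] = Future { Thread.sleep(200); value }

  // Sequential: the second Future is only created after the first has completed.
  def sequential(): Future[Seq[Int]] =
    for {
      first  <- slow(Seq(1))
      second <- slow(Seq(2))
    } yield first ++ second

  // Concurrent: both Futures are created, and therefore started, before the for comprehension.
  def concurrent(): Future[Seq[Int]] = {
    val runningA = slow(Seq(1))
    val runningB = slow(Seq(2))
    for {
      first  <- runningA
      second <- runningB
    } yield first ++ second
  }

  def main(args: Array[String]): Unit = {
    println(Await.result(sequential(), 5.seconds))
    println(Await.result(concurrent(), 5.seconds))
  }
}
```

Both versions produce the same sequence. Only the scheduling differs, and how much faster the second one is depends on the execution context having a free thread.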
You can use Future.sequence to convert a sequence of Future into a single Future containing a sequence of the results of each Future. So in your case you can do this:
val a: Future[Seq[Either[A,B]]] = ???
val b: Future[Seq[Either[A,B]]] = ???
Future.sequence(Seq(a, b)).map(_.flatten) // => Future[Seq[Either[A,B]]]
The flatten operation converts the Seq[Seq[Either[A,B]]] into a Seq[Either[A,B]], but the results could be combined in other ways if required.
This solution is very flexible, but for a fixed number of futures it is often better to use flatMap/for as explained in another answer.

Scala: Convert a vector of tuples containing a future to a future of a vector of tuples

I'm looking for a way to convert a Vector[(Future[TypeA], TypeB)] to a Future[Vector[(TypeA, TypeB)]].
I'm aware of the conversion of a collection of futures to a future of a collection using Future.sequence(...) but cannot find out a way to manage the step from the tuple with a future to a future of tuple.
So I'm looking for something that implements the desired functionality of the dummy extractFutureFromTuple in the following.
val vectorOfTuples: Vector[(Future[TypeA], TypeB)] = ...
val vectorOfFutures: Vector[Future[(TypeA, TypeB)]] = vectorOfTuples.map(_.extractFutureFromTuple)
val futureVector: Future[Vector[(TypeA, TypeB)]] = Future.sequence(vectorOfFutures)
Note that you can do this with a single call to Future.traverse:
val input: Vector[(Future[Int], Long)] = ???
val output: Future[Vector[(Int, Long)]] = Future.traverse(input) {
case (f, v) => f.map(_ -> v)
}
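For completeness, here is the same Future.traverse call as a runnable sketch with type parameters instead of Int and Long. The pullOut name and the sample data are made up for the example.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object TraverseTuples {
  // Moves the Future from inside each tuple to the outside of the whole Vector.
  def pullOut[A, B](pairs: Vector[(Future[A], B)]): Future[Vector[(A, B)]] =
    Future.traverse(pairs) { case (future, tag) =>
      future.map(value => (value, tag))
    }

  def main(args: Array[String]): Unit = {
    val input = Vector((Future(1), "x"), (Future(2), "y"))
    println(Await.result(pullOut(input), 5.seconds))
  }
}
```

As with Future.sequence, a single failed future fails the combined result.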

Scala - merging multiple iterators

I have multiple iterators which return items in a sorted manner according to some sorting criterion. Now, I would like to merge (multiplex) the iterators into one, combined iterator. I know how to do it in Java style, with e.g. tree-map, but I was wondering if there is a more functional approach? I want to preserve the laziness of the iterators as much as possible.
You can just do:
val it = iter1 ++ iter2
It creates another iterator and does not evaluate the elements, but wraps the two existing iterators.
It is fully lazy, so you are not supposed to use iter1 or iter2 once you do this.
In general, if you have more iterators to merge, you can use folding:
val iterators: Seq[Iterator[T]] = ???
val it = iterators.foldLeft(Iterator[T]())(_ ++ _)
If you have some ordering on the elements that you would like to maintain in the resulting iterator, but you still want laziness, you can convert them to streams:
def merge[T: Ordering](iter1: Iterator[T], iter2: Iterator[T]): Iterator[T] = {
val s1 = iter1.toStream
val s2 = iter2.toStream
def mergeStreams(s1: Stream[T], s2: Stream[T]): Stream[T] = {
if (s1.isEmpty) s2
else if (s2.isEmpty) s1
else if (Ordering[T].lt(s1.head, s2.head)) s1.head #:: mergeStreams(s1.tail, s2)
else s2.head #:: mergeStreams(s1, s2.tail)
}
mergeStreams(s1, s2).iterator
}
Not necessarily faster though, you should microbenchmark this.
A possible alternative is to use buffered iterators to achieve the same effect.
Like @axel22 mentioned, you can do this with BufferedIterators. Here's one Stream-free solution:
def combine[T](rawIterators: List[Iterator[T]])(implicit cmp: Ordering[T]): Iterator[T] = {
new Iterator[T] {
private val iterators: List[BufferedIterator[T]] = rawIterators.map(_.buffered)
def hasNext: Boolean = iterators.exists(_.hasNext)
def next(): T = if (hasNext) {
iterators.filter(_.hasNext).map(x => (x.head, x)).minBy(_._1)(cmp)._2.next()
} else {
throw new UnsupportedOperationException("Cannot call next on an exhausted iterator!")
}
}
}
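If there are many iterators, scanning all of them on every call to next gets expensive. A possible variation, sketched here under the assumption that every input iterator is already sorted by the same Ordering, keeps the buffered iterators in a priority queue keyed by their current head. The names SortedMerge and mergeSorted are made up for the example.

```scala
import scala.collection.mutable

object SortedMerge {
  def mergeSorted[T](sources: Seq[Iterator[T]])(implicit ord: Ordering[T]): Iterator[T] =
    new Iterator[T] {
      // mutable.PriorityQueue dequeues the largest element first, so the
      // ordering is reversed to get the iterator with the smallest head.
      private val heap = mutable.PriorityQueue.empty[BufferedIterator[T]](
        Ordering.by[BufferedIterator[T], T](_.head)(ord).reverse
      )
      // Only non-empty iterators go into the queue.
      sources.map(_.buffered).filter(_.hasNext).foreach(it => heap.enqueue(it))

      def hasNext: Boolean = heap.nonEmpty

      def next(): T = {
        if (heap.isEmpty) throw new NoSuchElementException("next on exhausted iterator")
        val source = heap.dequeue()
        val value = source.next()
        // Re-insert the iterator under its new head if it still has elements.
        if (source.hasNext) heap.enqueue(source)
        value
      }
    }

  def main(args: Array[String]): Unit = {
    val merged = mergeSorted(Seq(Iterator(1, 4, 7), Iterator(2, 5, 8), Iterator(3, 6, 9)))
    println(merged.toList)
  }
}
```

This stays lazy apart from looking one element ahead in each source, which is needed to compare the heads.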
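You could try the following, although sorted has to evaluate every element before it can return the first one, so the laziness is lost: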
(iterA ++ iterB).toStream.sorted.toIterator
For example:
val i1 = (1 to 100 by 3).toIterator
val i2 = (2 to 100 by 3).toIterator
val i3 = (3 to 100 by 3).toIterator
val merged = (i1 ++ i2 ++ i3).toStream.sorted.toIterator
merged.next // results in: 1
merged.next // results in: 2
merged.next // results in: 3