Scala: Convert a vector of tuples containing a future to a future of a vector of tuples

I'm looking for a way to convert a Vector[(Future[TypeA], TypeB)] to a Future[Vector[(TypeA, TypeB)]].
I'm aware of converting a collection of futures to a future of a collection using Future.sequence(...), but I can't work out how to handle the step from a tuple containing a future to a future of a tuple.
So I'm looking for something that implements the desired functionality of the placeholder extractFutureFromTuple in the following:
val vectorOfTuples: Vector[(Future[TypeA], TypeB)] = ...
val vectorOfFutures: Vector[Future[(TypeA, TypeB)]] = vectorOfTuples.map(_.extractFutureFromTuple)
val futureVector: Future[Vector[(TypeA, TypeB)]] = Future.sequence(vectorOfFutures)

Note that you can do this with a single call to Future.traverse:
val input: Vector[(Future[Int], Long)] = ???
val output: Future[Vector[(Int, Long)]] = Future.traverse(input) {
  case (f, v) => f.map(_ -> v)
}

Related

What is the best way to merge two Future[Map[T1, T2]] in Scala

I have a list of fileNames and I want to load the correlated pages in batches (rather than all at once). To do so, I'm using foldLeft and writing an aggregate function that aggregates into a Future[Map[T1, T2]].
def loadPagesInBatches[T1, T2](fileNames: Set[FileName]): Future[Map[T1, T2]] = {
  val fileNameToPageId: Map[FileName, PageId] = ... // invokes a function that returns the pageId correlated to the fileName
  val batches: Iterator[Set[FileName]] = fileNames.grouped(10) // batches of 10
  batches.foldLeft(Future(Map.empty[T1, T2]))(aggregate(fileNameToPageId))
}
And the signature of aggregate is as follows:
def aggregate(fileNameToPageId: Map[FileName, PageId]): (Future[Map[T1, T2]], Set[FileName]) => Future[Map[T1, T2]] = {..}
I'm trying to work out the best way to merge these Future[Map]s.
Thanks ahead!
P.S.: FileName and PageId are just type aliases for String.
If you have exactly 2 futures, zipWith is probably the most idiomatic option.
val future1 = ???
val future2 = ???
future1.zipWith(future2)(_ ++ _)
This is a shorter way of writing the equivalent for comprehension:
for {
  map1 <- future1
  map2 <- future2
} yield map1 ++ map2
zipWith could also potentially implement some kind of optimization internally.
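For concreteness, a minimal self-contained sketch (assuming Scala 2.12+, where Future.zipWith is available, the global ExecutionContext, and placeholder maps):

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

val future1: Future[Map[String, Int]] = Future(Map("a" -> 1))
val future2: Future[Map[String, Int]] = Future(Map("b" -> 2))

// combine the two results once both futures have completed
val merged: Future[Map[String, Int]] = future1.zipWith(future2)(_ ++ _)
// merged eventually holds Map("a" -> 1, "b" -> 2)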
My solution was putting the two maps into a list and using Future.reduceLeft.
def aggregate(fileNameToPageId: Map[FileName, PageId]): (Future[Map[T1, T2]], Set[FileName]) => Future[Map[T1, T2]] = {
  case (all, filesBatch) =>
    val mapOfPages: Future[Map[T1, T2]] = for {
      ... // some logic
    } yield theBatchMap // placeholder for this batch's Map[T1, T2]
    Future.reduceLeft(List(all, mapOfPages))(_ ++ _)
}
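Stripped of the surrounding logic, a minimal sketch of the Future.reduceLeft merge (Scala 2.12+, global ExecutionContext, placeholder maps):

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

val all: Future[Map[String, Long]] = Future(Map("a" -> 1L))
val batch: Future[Map[String, Long]] = Future(Map("b" -> 2L))

// fold the completed values left to right; with ++, later keys win on conflict
val merged: Future[Map[String, Long]] = Future.reduceLeft(List(all, batch))(_ ++ _)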

How to combine 2 future sequences of type Seq[Either[A,B]]?

Suppose there are 2 future sequences of type Future[Seq[Either[A,B]]]. How can I combine them into one?
You combine Futures using flatMap, like this:
futureA.flatMap(firstSequence =>
  futureB.map(secondSequence => firstSequence ++ secondSequence))
For comprehensions are syntax sugar for this:
for {
  firstSequence <- futureA
  secondSequence <- futureB
} yield firstSequence ++ secondSequence
This code will run your Futures sequentially if they haven't started executing yet (for example, if futureA and futureB are defs rather than vals). So you may wish to let them run in parallel by assigning them to vals before the for comprehension.
val executingFutureA = futureA
val executingFutureB = futureB
for {
  firstSequence <- executingFutureA
  secondSequence <- executingFutureB
} yield firstSequence ++ secondSequence
You can use Future.sequence to convert a sequence of Futures into a single Future containing a sequence of each Future's result. So in your case you can do this:
val a: Future[Seq[Either[A,B]]] = ???
val b: Future[Seq[Either[A,B]]] = ???
Future.sequence(Seq(a, b)).map(_.flatten) // => Future[Seq[Either[A,B]]]
The flatten operation converts the Seq[Seq[Either[A,B]]] into a Seq[Either[A,B]], but the results could be combined in other ways if required.
This solution is very flexible, but for a fixed number of futures it is often better to use flatMap/for comprehensions, as explained in the other answer.
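As a minimal runnable sketch of that flexibility, with String and Int standing in for A and B:

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

val a: Future[Seq[Either[String, Int]]] = Future(Seq(Right(1), Left("oops")))
val b: Future[Seq[Either[String, Int]]] = Future(Seq(Right(2)))

// works for any number of futures, not just two
val combined: Future[Seq[Either[String, Int]]] =
  Future.sequence(Seq(a, b)).map(_.flatten)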

Scala Future Sequence Mapping: finding length?

I want to return a Future[Seq[String]] from a method, along with the length of that Seq[String]. Currently I'm building the Future[Seq[String]] using a mapping function from another Future[T].
Is there any way to do this without awaiting the Future?
You can map over the current Future to create a new one with the new data added to the type.
val fss: Future[Seq[String]] = Future(Seq("a","b","c"))
val x: Future[(Seq[String],Int)] = fss.map(ss => (ss, ss.length))
If you somehow know what the length of the Seq will be without actually waiting for it, then you can do something like this:
val t: Future[T] = ???
def foo: (Int, Future[Seq[String]]) = {
  val length = 42 // ???
  val fut: Future[Seq[String]] = t map { v =>
    genSeqOfLength42(v)
  }
  (length, fut)
}
If you don't, then you will have to return Future[(Int, Seq[String])] as jwvh said, or you can easily get the length later in the calling function.
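For instance, a sketch of recovering the length in the calling code without blocking:

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

val fss: Future[Seq[String]] = Future(Seq("a", "b", "c"))

// derive the length as its own Future in the caller; no awaiting needed
val lengthF: Future[Int] = fss.map(_.length)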

Spark: Not able to use accumulator on a tuple/count using scala

I am trying to replace reduceByKey with accumulator logic for word count.
wc.txt
Hello how are are you
Here's what I've got so far:
val words = sc.textFile("wc.txt").flatMap(_.split(" "))
val accum = sc.accumulator(0, "myacc")
for (i <- 1 to words.count.toInt)
  foreach(x => accum += x)
.....
How should I proceed? Any thoughts or ideas are appreciated.
Indeed, using accumulators for this is cumbersome and not recommended - but for completeness, here's how it can be done (at least for Spark versions 1.6 <= V <= 2.1). Do note that this uses a deprecated API that will not be part of future versions.
You'll need a Map[String, Long] accumulator, which is not available by default, so you'll need to create your own AccumulableParam implementation and use it implicitly:
// some data:
val words = sc.parallelize(Seq("Hello how are are you")).flatMap(_.split(" "))
// aliasing the type, just for convenience
type AggMap = Map[String, Long]
// creating an implicit AccumulableParam that counts by String key
implicit val param: AccumulableParam[AggMap, String] = new AccumulableParam[AggMap, String] {
  // increase matching value by 1, or create it if missing
  override def addAccumulator(r: AggMap, t: String): AggMap =
    r.updated(t, r.getOrElse(t, 0L) + 1L)
  // merge two maps by summing matching values
  override def addInPlace(r1: AggMap, r2: AggMap): AggMap =
    r1 ++ r2.map { case (k, v) => k -> (v + r1.getOrElse(k, 0L)) }
  // start with an empty map
  override def zero(initialValue: AggMap): AggMap = Map.empty
}
// create the accumulator; this will use the above `param` implicitly
val acc = sc.accumulable[AggMap, String](Map.empty[String, Long])
// add each word to accumulator; the `count()` can be replaced by any Spark action -
// we just need to trigger the calculation of the mapped RDD
words.map(w => { acc.add(w); w }).count()
// after the action, we can read the value of the accumulator
val result: AggMap = acc.value
result.foreach(println)
// (Hello,1)
// (how,1)
// (are,2)
// (you,1)
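For Spark 2.0 and later, the same idea can be expressed on the non-deprecated AccumulatorV2 API; this is a hedged sketch, with class and variable names that are purely illustrative:

import org.apache.spark.util.AccumulatorV2
import scala.collection.mutable

// a per-word counter on the AccumulatorV2 API (Spark 2.0+)
class WordCountAcc extends AccumulatorV2[String, Map[String, Long]] {
  private val counts = mutable.Map.empty[String, Long]
  override def isZero: Boolean = counts.isEmpty
  override def copy(): WordCountAcc = {
    val acc = new WordCountAcc
    acc.counts ++= counts
    acc
  }
  override def reset(): Unit = counts.clear()
  override def add(w: String): Unit =
    counts(w) = counts.getOrElse(w, 0L) + 1L
  override def merge(other: AccumulatorV2[String, Map[String, Long]]): Unit =
    other.value.foreach { case (k, v) => counts(k) = counts.getOrElse(k, 0L) + v }
  override def value: Map[String, Long] = counts.toMap
}

val acc = new WordCountAcc
sc.register(acc, "wordCounts")
words.foreach(w => acc.add(w)) // foreach is itself an action, so this triggers the computation
acc.value.foreach(println)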
As I understand it, you want to count all the words in your text file using a Spark accumulator; in that case you can use:
words.foreach(_ => accum.add(1))
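After the foreach action runs, the total can be read on the driver:

println(accum.value) // 5 for the sample line "Hello how are are you"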

Transform scala inner map

Hey, I have a Map like this:
val valueParameters = Map("key1"->"value","anotherkey1"->"value","thirdkey1"->"value","key2"->"value","anotherkey2"->"value","thirdkey2"->"value")
and a pattern:
val pattern = """(?<=[a-zA-Z])\d{1,2}""".r
val result = valueParameters.groupBy(x => pattern.findAllIn(x._1).next().toInt).toSeq.sortBy(_._1).toMap
which gives a Map[Int,Map[String,String]], and I want to remove the number from the keys of the inner map, which I don't need anymore, so I can write result(1)("key") rather than result(1)("key1").
This should work
val result1 = result.map { case (k, v) =>
  k -> v.map { case (a, b) =>
    val a1 = a.takeWhile(!_.isDigit)
    a1 -> b
  }
}
Note that while using mapValues would result in shorter code, mapValues is a lazy operation that redoes the computation every time you access the map, whereas mapping the entries performs the computation once, which is usually what you expect in Scala.
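A quick check against the sample map above (hypothetical REPL output):

result1(1)("key")        // "value"
result1(1)("anotherkey") // "value"
result1(2)("thirdkey")   // "value"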