Composing futures - how to get another variable associated with the result of a list of futures - scala

I'm a bit new to future composition so I haven't figured out all the common patterns yet.
I have a list of futures but I need to associate a name with the futures when they are created so I can somehow reconcile the list.
E.g., if I create a list of futures like this, how can I get x to be associated with the future's result?
val requestsForMaster = shardNames.map { x ⇒
sentinel ? Request("SENTINEL", "get-master-addr-by-name", x)
}
I would do something like this to get the futures into a sequence
val mastersConfig = Future.sequence(requestsForMaster)
mastersConfig.onSuccess {
case x: List[Some[List[Some[ByteString]]]] ⇒
self ! x.map {
case Some(List(Some(host: ByteString), Some(port: ByteString))) ⇒
println("Host/port: " + host.utf8String + ":" + port.utf8String)
Shard("name", host.utf8String, port.utf8String.toInt, None)
}
}
But when I go to create the Shard object, the name (x) isn't available and I need it in there.
Any idea as to how I can compose these to get the name in there?
Edit:
Here is the solution I used:
val requestsForMaster = shardNames.map { x ⇒
(sentinel ? Request("SENTINEL", "get-master-addr-by-name", x)).map(y ⇒ (x, y))
}
val mastersConfig = Future.sequence(requestsForMaster)
mastersConfig.onSuccess {
case x: (List[(String, Some[List[Some[ByteString]]])]) ⇒
self ! x.map {
case (name, Some(List(Some(host: ByteString), Some(port: ByteString)))) ⇒
println("Name Host:port: " + name + " " + host.utf8String + ":" + port.utf8String)
Shard(name, host.utf8String, port.utf8String.toInt, None)
}
}

If I understand correctly, it sounds like you want to use map on the request futures to pair each response with the shard's name:
val requestsForMaster: List[Future[(String, Some[List[Some[ByteString]]])]] =
shardNames.map { x =>
val result = sentinel ? Request("SENTINEL", "get-master-addr-by-name", x)
result.map(x -> _)
}
val mastersConfig = Future.sequence(requestsForMaster)
mastersConfig.onSuccess {
case results: List[(String, Some[List[Some[ByteString]]])] =>
self ! results.map {
case (x, Some(List(Some(host: ByteString), Some(port: ByteString)))) =>
// Create the shard object.
}
}
More generally, if you have a Future[V] and a K, you can create a Future[(K, V)] by writing fv.map(k -> _).
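A minimal, self-contained sketch of that pattern with plain scala.concurrent futures. The `lookup` function and the shard names are hypothetical stand-ins for the `sentinel ? Request(...)` call; only the `map(name -> _)` step is the point.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object TagFutures {
  // Hypothetical stand-in for the ask to the sentinel: returns host and port.
  def lookup(name: String): Future[(String, Int)] =
    Future(("host-of-" + name, 6379))

  // Pair each future's result with the key that produced it,
  // then turn the list of futures into one future of a list.
  def lookupAll(names: List[String]): Future[List[(String, (String, Int))]] =
    Future.sequence(names.map(name => lookup(name).map(name -> _)))

  def main(args: Array[String]): Unit = {
    val tagged = Await.result(lookupAll(List("shard1", "shard2")), 5.seconds)
    tagged.foreach { case (name, (host, port)) =>
      println(name + " -> " + host + ":" + port)
    }
  }
}
```

Because the name travels inside the future's value, it is still available after Future.sequence, and the order of the results matches the order of the input names.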

Related

Is it faster to create a new Map or clear it and use again?

I need to use many Maps in my project so I wonder which way is more efficient:
val map: mutable.Map[Int, Int] = mutable.Map.empty
for (_ <- 0 until big_number)
{
// do something with map
map.clear()
}
or
for (_ <- 0 until big_number)
{
val map: mutable.Map[Int, Int] = mutable.Map.empty
// do something with map
}
to use in terms of time and memory?
Well, my formal answer is always "it depends": you need to benchmark your own scenario and see what fits it best. Here is an example of how you might benchmark your own code. Let's start by writing a measuring method:
def measure(name: String, f: () => Unit): Unit = {
val l = System.currentTimeMillis()
f()
// Print the elapsed time only after f has run.
println(name + ": " + (System.currentTimeMillis() - l))
}
Let's assume that in each iteration we need to insert into the map one key-value pair, and then to print it:
Await.result(Future.sequence(Seq(Future {
measure("inner", () => {
for (i <- 0 until 10) {
val map2 = mutable.Map.empty[Int, Int]
map2(i) = i
println(map2)
}
})
},
Future {
measure("outer", () => {
val map1 = mutable.Map.empty[Int, Int]
for (i <- 0 until 10) {
map1(i) = i
println(map1)
map1.clear()
}
})
})), 10.seconds)
The output in this case is almost always equal between the inner and the outer variants. Note that I run the two options in parallel; when I ran them one after the other, the first one always took significantly longer, no matter which of them went first.
Therefore we can conclude that, in this case, they are almost the same.
But, if for example I add an immutable option:
Future {
measure("immutable", () => {
for (i <- 0 until 10) {
val map1 = Map[Int, Int](i -> i)
println(map1)
}
})
}
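it always finishes first. One plausible explanation is that a one-entry immutable Map is a small specialized object that is cheap to allocate, but a micro-benchmark this small cannot show that immutable collections are faster than mutable ones in general.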
For more reliable performance tests you probably need a dedicated benchmarking tool such as ScalaMeter.
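Short of a dedicated tool, a slightly more careful hand-rolled harness might look like the sketch below. It is an illustration only: `avgNanos`, `reuse`, `fresh` and the iteration counts are hypothetical, and the numbers it prints still depend heavily on JIT warm-up and garbage collection.

```scala
import scala.collection.mutable

object MapBench {
  // Runs `body` `warmup` times untimed, then `runs` times timed;
  // returns the average time per timed run in nanoseconds.
  def avgNanos(warmup: Int, runs: Int)(body: => Any): Long = {
    var i = 0
    while (i < warmup) { body; i += 1 }
    val start = System.nanoTime()
    i = 0
    while (i < runs) { body; i += 1 }
    (System.nanoTime() - start) / runs
  }

  // Reuses one mutable map, clearing it on every iteration.
  def reuse(iterations: Int): Int = {
    val map = mutable.Map.empty[Int, Int]
    var sum = 0
    for (i <- 0 until iterations) {
      map(i) = i
      sum += map.size
      map.clear()
    }
    sum
  }

  // Allocates a fresh mutable map on every iteration.
  def fresh(iterations: Int): Int = {
    var sum = 0
    for (i <- 0 until iterations) {
      val map = mutable.Map.empty[Int, Int]
      map(i) = i
      sum += map.size
    }
    sum
  }

  def main(args: Array[String]): Unit = {
    println("reuse: " + avgNanos(20, 100)(reuse(10000)) + " ns")
    println("fresh: " + avgNanos(20, 100)(fresh(10000)) + " ns")
  }
}
```

Both variants return a value computed from the map so that the work is less likely to be optimized away, and println is kept out of the timed loop because console output would otherwise dominate the measurement.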

Can I use Action.async with multiple Futures?

In a previous SO question, I got advice on using Scala Futures with PlayFramework, thank you. Now things have gotten a bit more complicated. Let's say that before I just had to map where fruit could be found:
def getMapData(coll: MongoCollection[Document], s: String): Future[Seq[Document]] = ...
def mapFruit(collection: MongoCollection[Document]) = Action.async {
val fut = getMapData(collection, "fruit")
fut.map { docs: Seq[Document] =>
Ok(docs.toJson)
} recover {
case e => Console.err.println("FAIL: " + e.getMessage); BadRequest("FAIL")
}
}
It turns out that people care more about Apples than Bananas or Cherries, so if no more than 100 items should appear on the map, people want Apples to have priority over Bananas and Cherries, but not more than some percentage of items on a map should be Apples. Some function pickDocs determines the proper mix. I thought something like this might just work, but no:
def mapApplesBananasCherries(collection: MongoCollection[Document]) = Action.async {
val futA = getMapData(collection, "apples")
val futB = getMapData(collection, "bananas")
val futC = getMapData(collection, "cherries")
futA.map { docsA: Seq[Document] =>
futB.map { docsB: Seq[Document] =>
futC.map { docsC: Seq[Document] =>
val docsPicked = pickDocs(100, docsA, docsB, docsC)
Ok(docsPicked.toJson)
}
}
// won't compile without something here, e.g. Ok("whatever")
} recover {
case e => Console.err.println("FAIL: " + e.getMessage); BadRequest("FAIL")
}
}
Life was simple when I just had one Future, but now I have three. What can I do to make this (1) work and (2) simple again? I can't really construct a web response until all three Futures have values.
Basically, you should use flatMap:
futA.flatMap { docsA: Seq[Document] =>
futB.flatMap { docsB: Seq[Document] =>
futC.map { docsC: Seq[Document] =>
val docsPicked = pickDocs(100, docsA, docsB, docsC)
Ok(docsPicked.toJson)
}
}
}
Also, you can use for comprehension:
val res = for {
docsA <- futA
docsB <- futB
docsC <- futC
} yield Ok(pickDocs(100, docsA, docsB, docsC).toJson)
res.recover {
case e => Console.err.println("FAIL: " + e.getMessage); BadRequest("FAIL")
}
If I understand correctly that you want to evaluate apples, cherries and bananas in that priority, I would code it similar to this:
import scala.concurrent.{Await, Future}
import scala.util.Random
import scala.concurrent.duration._
object WaitingFutures extends App {
implicit val ec = scala.concurrent.ExecutionContext.Implicits.global
val apples = Future {50 + Random.nextInt(100)}
val cherries = Future {50 + Random.nextInt(100)}
val bananas = Future {50 + Random.nextInt(100)}
val mix = for {
app <- apples
cher <- if (app < 100) cherries else Future {0}
ban <- if (app + cher < 100) bananas else Future {0}
} yield (app,cher,ban)
mix.onComplete {m =>
println(s"mix ${m.get}")
}
Await.result(mix, 3.seconds)
}
If apples yields 100 or more when its future completes, the code doesn't wait for cherries or bananas to finish, but substitutes a dummy future holding 0. If that's not enough, it waits for cherries to complete, and so on.
NB: I didn't put much effort into how to signal the skipped branches, so I'm using a dummy future, which might not be the best approach.
This doesn't compile because your nested map calls produce a Future[Future[Future[Response]]]. If you use flatMap on the outer futures instead, your futures will not be nested.
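The difference in result types can be seen with plain futures. This is a minimal sketch; `a` and `b` are hypothetical stand-ins for the `getMapData` calls.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object Nesting {
  def a: Future[Int] = Future(1)
  def b: Future[Int] = Future(2)

  // map inside map nests the futures.
  def nested: Future[Future[Int]] = a.map(x => b.map(y => x + y))

  // flatMap on the outer future keeps a single layer.
  def flat: Future[Int] = a.flatMap(x => b.map(y => x + y))

  def main(args: Array[String]): Unit =
    println(Await.result(flat, 5.seconds)) // prints 3
}
```

Only the innermost call stays a map; every enclosing call becomes a flatMap.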
If you want this to be a little less repetitive, you can use Future.sequence instead to kick off futures simultaneously. You can either use pattern matching to re-extract the lists:
val futureCollections = List("apples", "bananas", "cherries").map{ getMapData(collection, _) }
Future.sequence(futureCollections) map { case docsA :: docsB :: docsC :: Nil =>
Ok(pickDocs(100, docsA, docsB, docsC).toJson)
} recover {
case e => Console.err.println("FAIL: " + e.getMessage); BadRequest("FAIL")
}
or you could just hand the pickDocs function a list of lists (sorted by priority) for it to pick from.
Future.sequence(futureCollections) map { docLists =>
Ok(pickDocs(docLists, 100, 0.75f).toJson)
} recover {
case e => Console.err.println("FAIL: " + e.getMessage); BadRequest("FAIL")
}
This pickDocs implementation takes a percentage of the slots from the head list, unless the remaining lists don't hold enough documents to fill the rest, in which case it takes more; it then recursively applies the same percentage to the remaining slots and lists.
def pickDocs[T](lists: List[List[T]], max: Int, dampPercentage: Float): List[T] = {
lists match {
case Nil => Nil
case head :: tail =>
val remainingLength = tail.flatten.length
val x = max - remainingLength
val y = math.ceil(max * dampPercentage).toInt
val fromHere = head.take(x max y)
fromHere ++ pickDocs(tail, max - fromHere.length, dampPercentage)
}
}
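A small worked example may help to see how the slots get distributed. The function is repeated so the snippet is self-contained; the input lists, `max = 8` and `dampPercentage = 0.5f` are made-up values for illustration.

```scala
object PickDocsExample {
  // Same algorithm as above, repeated here so the example runs on its own.
  def pickDocs[T](lists: List[List[T]], max: Int, dampPercentage: Float): List[T] =
    lists match {
      case Nil => Nil
      case head :: tail =>
        val remainingLength = tail.flatten.length
        val x = max - remainingLength                 // slots the tail cannot fill
        val y = math.ceil(max * dampPercentage).toInt // this list's share of the slots
        val fromHere = head.take(x max y)
        fromHere ++ pickDocs(tail, max - fromHere.length, dampPercentage)
    }

  val apples   = (1 to 10).toList
  val bananas  = List(11, 12, 13)
  val cherries = List(21, 22)

  def main(args: Array[String]): Unit =
    println(pickDocs(List(apples, bananas, cherries), 8, 0.5f))
    // List(1, 2, 3, 4, 11, 12, 21, 22)
}
```

With 8 slots, apples get ceil(8 * 0.5) = 4 of them; of the 4 remaining slots bananas get 2; cherries are the last list, so they fill the final 2 slots.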
This is a very common pattern for Futures and similar classes that "contain values" (e.g. Option, List).
To combine the results, use the flatMap method. The resulting code is:
def mapApplesBananasCherries(collection: MongoCollection[Document]) = Action.async {
val futA = getMapData(collection, "apples")
val futB = getMapData(collection, "bananas")
val futC = getMapData(collection, "cherries")
futA.flatMap { docsA =>
futB.flatMap { docsB =>
futC.map { docsC =>
val docsPicked = pickDocs(100, docsA, docsB, docsC)
Ok(docsPicked.toJson)
}
}
} recover {
case e => Console.err.println("FAIL: " + e.getMessage); BadRequest("FAIL")
}
}
In fact it's so common that a special syntax exists to make it more readable, called for-comprehension: the following code is equivalent to the previous snippet
def mapApplesBananasCherries(collection: MongoCollection[Document]) = Action.async {
val futA = getMapData(collection, "apples")
val futB = getMapData(collection, "bananas")
val futC = getMapData(collection, "cherries")
// Parentheses make recover apply to the whole for-expression, not just to the yield block.
(for {
apples <- futA
bananas <- futB
cherries <- futC
} yield {
val docsPicked = pickDocs(100, apples, bananas, cherries)
Ok(docsPicked.toJson)
}) recover {
case e => Console.err.println("FAIL: " + e.getMessage); BadRequest("FAIL")
}
}
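One detail worth checking when combining a for-comprehension with recover: the recover call has to apply to the whole for-expression, which requires parentheses around it. A minimal sketch with plain Int futures (the `total` helper and the fallback value -1 are hypothetical):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ForWithRecover {
  // Combine three futures; fall back to -1 if any of them fails.
  def total(fa: Future[Int], fb: Future[Int], fc: Future[Int]): Future[Int] =
    (for {
      a <- fa
      b <- fb
      c <- fc
    } yield a + b + c).recover { case _: Exception => -1 }

  def main(args: Array[String]): Unit = {
    val ok  = total(Future(1), Future(2), Future(3))
    val bad = total(Future(1), Future.failed(new RuntimeException("boom")), Future(3))
    println(Await.result(ok, 5.seconds))  // prints 6
    println(Await.result(bad, 5.seconds)) // prints -1
  }
}
```

The three futures are created before the for-expression, so they run concurrently; the comprehension only sequences the reading of their results.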

Scala Future, flatMap that works on Either

Is there really a way to transform an object of type Future[Either[Future[T1], Future[T2]]] into an object of type Either[Future[T1], Future[T2]]?
Maybe something like a flatMap that works on Either...
I'm trying to make this code work. (I have similar code that implements a wrapped chain of actions without futures; it works and is much simpler.) The code below is based on that, with the modifications needed for a situation that involves futures.
case class WebServResp(msg: String)
case class WebStatus(code: Int)
type InnerActionOutType = Either[Future[Option[WebServResp]], Future[WebStatus]]
type InnerActionSig = Future[Option[WebServResp]] => Either[Future[Option[WebServResp]], Future[WebStatus]]
val chainOfActions: Seq[InnerActionSig] = Seq(
{prevRespOptFut =>
println("in action 1: " + prevRespOptFut)
//dont care about prev result
Left(Future.successful(Some(WebServResp("result from 1"))))
},
{prevRespOptFut =>
println("in action 2: " + prevRespOptFut)
prevRespOptFut.map {prevRespOpt =>
//i know prevResp contains instance of WebServResp. so i skip the opt-matching
val prevWebServResp = prevRespOpt.get
Left(Some(prevWebServResp.msg + " & " + " additional result from 2"))
}
//But the outcome of the map above is: Future[Left(...)]
//What I want is Left(Future[...])
}
)
type WrappedActionSig = InnerActionOutType => InnerActionOutType
val wrappedChainOfActions = chainOfActions.map {innerAction =>
val wrappedAction: WrappedActionSig = {respFromPrevWrappedAction =>
respFromPrevWrappedAction match {
case Left(wsRespOptFut) => {
innerAction(wsRespOptFut)
}
case Right(wsStatusFut) => {
respFromPrevWrappedAction
}
}
}
wrappedAction
}
wrappedChainOfActions.fold(identity[InnerActionOutType] _) ((l, r) => l andThen r).apply(Left(None))
Update:
Based on Didier's comments below, here is code that works:
//API
case class WebRespString(str: String)
case class WebStatus(code: Int, str: String)
type InnerActionOutType = Either[Future[Option[WebRespString]], Future[WebStatus]]
type InnerActionSig = Future[Option[WebRespString]] => InnerActionOutType
type WrappedActionSig = InnerActionOutType => InnerActionOutType
def executeChainOfActions(chainOfActions: Seq[InnerActionSig]): Future[WebStatus] = {
val wrappedChainOfActions : Seq[WrappedActionSig] = chainOfActions.map {innerAction =>
val wrappedAction: WrappedActionSig = {respFromPrevWrappedAction =>
respFromPrevWrappedAction match {
case Left(wsRespOptFut) => {
innerAction(wsRespOptFut) }
case Right(wsStatusFut) => {
respFromPrevWrappedAction
}
}
}
wrappedAction
}
val finalResultPossibilities = wrappedChainOfActions.fold(identity[InnerActionOutType] _) ((l, r) => l andThen r).apply(Left(Future.successful(None)))
finalResultPossibilities match {
case Left(webRespStringOptFut) => webRespStringOptFut.map {webRespStringOpt => WebStatus(200, webRespStringOpt.get.str)}
case Right(webStatusFut) => webStatusFut
}
}
//API-USER
executeChainOfActions(Seq(
{prevRespOptFut =>
println("in action 1: " + prevRespOptFut)
//dont care about prev result
Left(Future.successful(Some(WebRespString("result from 1"))))
},
{prevRespOptFut =>
println("in action 2: " + prevRespOptFut)
Left(prevRespOptFut.map {prevRespOpt =>
val prevWebRespString = prevRespOpt.get
Some(WebRespString(prevWebRespString.str + " & " + " additional result from 2"))
})
}
)).map {webStatus =>
println(webStatus.code + ":" + webStatus.str)
}
executeChainOfActions(Seq(
{prevRespOptFut =>
println("in action 1: " + prevRespOptFut)
//Let's short-circuit here
Right(Future.successful(WebStatus(404, "resource non-existent")))
},
{prevRespOptFut =>
println("in action 2: " + prevRespOptFut)
Left(prevRespOptFut.map {prevRespOpt =>
val prevWebRespString = prevRespOpt.get
Some(WebRespString(prevWebRespString.str + " & " + " additional result from 2"))
})
}
)).map {webStatus =>
println(webStatus.code + ":" + webStatus.str)
}
Thanks,
Raka
The type Future[Either[Future[T1], Future[T2]]] means that some time later (that's the future) one gets an Either, so at that time one will know which way the calculation will go, and whether one will, still later, get a T1 or a T2.
So the knowledge of which branch will be chosen (Left or Right) comes later. The type Either[Future[T1], Future[T2]] means that one has that knowledge now (one doesn't know what the result will be, but already knows what type it will be). The only way to get out of the Future is to wait.
No magic here: the only way for later to become now is to wait, which is done by blocking on the Future with Await.result and is not recommended.
What you can do instead is say that you are not too interested in knowing which branch is taken as long as it has not completed, so Future[Either[T1, T2]] is good enough. That is easy. Say you have the Either and you would rather not look at it now but wait for the actual result:
def asFuture[T1, T2](
either: Either[Future[T1], Future[T2]])(
implicit ec: ExecutionContext)
: Future[Either[T1, T2]] = either match {
case Left(ft1) => ft1 map {t1 => Left(t1)}
case Right(ft2) => ft2 map {t2 => Right(t2)}
}
You don't have the Either yet, but a future on that, so just flatMap
f.flatMap(asFuture) : Future[Either[T1, T2]]
(will need an ExecutionContext implicitly available)
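Put together as a runnable sketch, with the same asFuture plus a hypothetical `flatten` helper that wraps the flatMap step:

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object FlattenEither {
  // Move the Future from inside each branch to the outside.
  def asFuture[T1, T2](either: Either[Future[T1], Future[T2]])(
      implicit ec: ExecutionContext): Future[Either[T1, T2]] =
    either match {
      case Left(ft1)  => ft1.map(t1 => Left(t1))
      case Right(ft2) => ft2.map(t2 => Right(t2))
    }

  // Hypothetical helper: collapse the outer and inner futures into one.
  def flatten[T1, T2](f: Future[Either[Future[T1], Future[T2]]])(
      implicit ec: ExecutionContext): Future[Either[T1, T2]] =
    f.flatMap(e => asFuture(e))

  def main(args: Array[String]): Unit = {
    import ExecutionContext.Implicits.global
    val f: Future[Either[Future[String], Future[Int]]] =
      Future(Right(Future(42)))
    println(Await.result(flatten(f), 5.seconds)) // prints Right(42)
  }
}
```

Await.result appears only in main to show the value; in application code the resulting Future[Either[T1, T2]] would normally be passed on and mapped over instead.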
It seems like you don't actually need the "failure" case of the Either to be a Future? In which case we can use scalaz (note that the "success" case of an either should be on the right):
import scalaz._
import scalaz.Scalaz._
def futureEitherFutureToFuture[A, B](f: Future[Either[A, Future[B]]])(
implicit ec: ExecutionContext): Future[Either[A, B]] =
f.flatMap(_.sequence)
But it's probably best to always keep the Future on the outside in your API, and to flatMap in your code rather than in the client. (Here it's part of the foldLeftM):
case class WebServResp(msg: String)
case class WebStatus(code: Int)
type OWSR = Option[WebServResp]
type InnerActionOutType = Future[Either[WebStatus, OWSR]]
type InnerActionSig = OWSR => InnerActionOutType
def executeChain(chain: List[InnerActionSig]): InnerActionOutType =
chain.foldLeftM(None: OWSR) {
(prevResp, action) => action(prevResp)
}
//if you want that same API
def executeChainOfActions(chainOfActions: Seq[InnerActionSig]) =
executeChain(chainOfActions.toList).map {
case Left(webStatus) => webStatus
case Right(_) => WebStatus(200) // WebStatus as defined above carries only a code
}
(If you need "recovery" type actions, so you really need OWSR to be an Either, then you should still make InnerActionOutType a Future[Either[...]], and you can use .traverse or .sequence in your actions as necessary. If you have an example of an "error-recovery" type action, I can put an example of that here)
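For readers without scalaz, the same "Future on the outside" chaining can be sketched with the standard library alone. Note that this swaps foldLeftM for a plain foldLeft plus flatMap; the `Step` alias and the sample steps are hypothetical.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ChainSketch {
  case class WebServResp(msg: String)
  case class WebStatus(code: Int)
  type Step = Option[WebServResp] => Future[Either[WebStatus, Option[WebServResp]]]

  // Run the steps in order; the first Left short-circuits the rest.
  def executeChain(chain: List[Step])(
      implicit ec: ExecutionContext): Future[Either[WebStatus, Option[WebServResp]]] = {
    val start: Future[Either[WebStatus, Option[WebServResp]]] =
      Future.successful(Right(None))
    chain.foldLeft(start) { (acc, step) =>
      acc.flatMap {
        case Right(prev) => step(prev)
        case left        => Future.successful(left) // keep the failure, skip the step
      }
    }
  }

  def main(args: Array[String]): Unit = {
    import ExecutionContext.Implicits.global
    val steps: List[Step] = List(
      _ => Future.successful(Right(Some(WebServResp("result from 1")))),
      prev => Future.successful(
        Right(prev.map(r => WebServResp(r.msg + " & result from 2"))))
    )
    println(Await.result(executeChain(steps), 5.seconds))
  }
}
```

Each step receives the plain Option[WebServResp] rather than a future of it, so steps never need to unwrap anything themselves.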

Pattern matching in conjunction with filter

Given the following code that I'd like to refactor: I'm only interested in lines matching the first pattern. Is there a way of shortening this, say by using it in conjunction with filter?
With best regards,
Stefan
def processsHybridLinks(it: Iterator[String]): Unit =
{
for (line <- it) {
val lineSplit = lineSplitAndFilter(line)
lineSplit match {
case Array(TaggedString(origin), TaggedString(linkName), TaggedString(target), ".") =>
{
println("trying to find pages " + origin + " and " + target)
val originPageOpt = Page.findOne(MongoDBObject("name" -> (decodeUrl(origin))))
val targetPageOpt = Page.findOne(MongoDBObject("name" -> (decodeUrl(target))))
(originPageOpt, targetPageOpt) match {
case (Some(origin), Some(target)) =>
createHybridLink(origin, linkName, target)
Logger.info(" creating Hybrid Link")
case _ => Logger.info(" couldnt create Hybrid LInk")
}
}
case _ =>
}
}
}
Have a look at the collect method. It allows you to use a PartialFunction[A,B] defined using an incomplete pattern match as a sort of combination of map and filter:
it.map(lineSplitAndFilter) collect {
case Array(TaggedString(o), TaggedString(n), TaggedString(t), ".") =>
(n, Page.findOne(...), Page.findOne(...))
} foreach {
case (n, Some(o), Some(t)) => ...
case _ =>
}
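A minimal runnable sketch of collect acting as filter and map in one pass. The three-field line format and the `links` helper are made up for illustration and are simpler than the TaggedString extractors in the question.

```scala
object CollectExample {
  // Keep only three-field lines that end with ".", and transform them in the same step.
  def links(lines: Iterator[String]): List[(String, String)] =
    lines.map(_.split(" ")).collect {
      case Array(origin, target, ".") => (origin, target)
    }.toList

  def main(args: Array[String]): Unit = {
    val input = Iterator("a b .", "malformed line", "c d .")
    println(links(input)) // List((a,b), (c,d))
  }
}
```

Lines that do not match the pattern are silently dropped, so no `case _ =>` branch is needed.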

Scala: how to traverse stream/iterator collecting results into several different collections

I'm going through a log file that is too big to fit into memory and collecting two types of expressions. What is a better functional alternative to my iterative snippet below?
def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String, String)]={
val lines : Iterator[String] = io.Source.fromFile(file).getLines()
val logins: mutable.Map[String, String] = new mutable.HashMap[String, String]()
val errors: mutable.ListBuffer[(String, String)] = mutable.ListBuffer.empty
for (line <- lines){
line match {
case errorPat(date,ip)=> errors.append((ip,date))
case loginPat(date,user,ip,id) =>logins.put(ip, id)
case _ => ""
}
}
errors.toList.map(line => (logins.getOrElse(line._1,"none") + " " + line._1,line._2))
}
Here is a possible solution:
def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String,String)] = {
val lines = Source.fromFile(file).getLines
val (err, log) = lines.collect {
case errorPat(inf, ip) => (Some((ip, inf)), None)
case loginPat(_, _, ip, id) => (None, Some((ip, id)))
}.toList.unzip
val ip2id = log.flatten.toMap
err.collect{ case Some((ip,inf)) => (ip2id.getOrElse(ip,"none") + " " + ip, inf) }
}
Corrections:
1) removed unnecessary type declarations
2) tuple deconstruction instead of ugly ._1
3) left fold instead of mutable accumulators
4) used the more convenient operator-like methods :+ and +
def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String, String)] = {
val lines = io.Source.fromFile(file).getLines()
val (logins, errors) =
((Map.empty[String, String], Seq.empty[(String, String)]) /: lines) {
case ((loginsAcc, errorsAcc), next) =>
next match {
case errorPat(date, ip) => (loginsAcc, errorsAcc :+ (ip -> date))
case loginPat(date, user, ip, id) => (loginsAcc + (ip -> id) , errorsAcc)
case _ => (loginsAcc, errorsAcc)
}
}
// more concise equivalent for
// errors.toList.map { case (ip, date) => (logins.getOrElse(ip, "none") + " " + ip) -> date }
for ((ip, date) <- errors.toList)
yield (logins.getOrElse(ip, "none") + " " + ip) -> date
}
I have a few suggestions:
Instead of a pair/tuple, it's often better to use your own class. It gives meaningful names to both the type and its fields, which makes the code much more readable.
Split the code into small parts. In particular, try to decouple pieces of code that don't need to be tied together. This makes your code easier to understand, more robust, less prone to errors and easier to test. In your case it'd be good to separate producing your input (lines of a log file) and consuming it to produce a result. For example, you'd be able to make automatic tests for your function without having to store sample data in a file.
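A minimal sketch of these two suggestions using only the standard library. The names `ErrorEntry`, `Report` and `summarize` are hypothetical; the patterns are the ones used in the test data further down.

```scala
import scala.util.matching.Regex

object LogSummary {
  // Named types instead of anonymous tuples.
  final case class ErrorEntry(ip: String, date: String)
  final case class Report(userAndIp: String, date: String)

  // Pure function over an iterator of lines; reading the file is the caller's job.
  def summarize(lines: Iterator[String], errorPat: Regex, loginPat: Regex): List[Report] = {
    val (logins, errors) =
      lines.foldLeft((Map.empty[String, String], Vector.empty[ErrorEntry])) {
        case ((ls, es), line) =>
          line match {
            case errorPat(date, ip)     => (ls, es :+ ErrorEntry(ip, date))
            case loginPat(_, _, ip, id) => (ls + (ip -> id), es)
            case _                      => (ls, es)
          }
      }
    errors.toList.map(e => Report(logins.getOrElse(e.ip, "none") + " " + e.ip, e.date))
  }

  def main(args: Array[String]): Unit = {
    val lines = Iterator(
      "Error: 2012/03 1.2.3.4",
      "Login: 2012/03 user 1.2.3.4 Joe",
      "Error: 2012/03 1.2.3.5")
    summarize(lines, "Error: (\\S+) (\\S+)".r, "Login: (\\S+) (\\S+) (\\S+) (\\S+)".r)
      .foreach(println)
  }
}
```

Because `summarize` takes an Iterator[String], a test can feed it an in-memory sequence, while production code passes `Source.fromFile(file).getLines()` without loading the whole file.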
As an example and exercise, I tried to make a solution based on Scalaz iteratees. It's a bit longer (includes some auxiliary code for IteratorEnumerator) and perhaps it's a bit overkill for the task, but perhaps someone will find it helpful.
import java.io._;
import scala.util.matching.Regex
import scalaz._
import scalaz.IterV._
object MyApp extends App {
// A type for the result. Having names keeps things
// clearer and shorter.
type LogResult = List[(String,String)]
// Represents a state of our computation. Not only it
// gives a name to the data, we can also put here
// functions that modify the state. This nicely
// separates what we're computing and how.
sealed case class State(
logins: Map[String,String],
errors: Seq[(String,String)]
) {
def this() = {
this(Map.empty[String,String], Seq.empty[(String,String)])
}
def addError(date: String, ip: String): State =
State(logins, errors :+ (ip -> date));
def addLogin(ip: String, id: String): State =
State(logins + (ip -> id), errors);
// Produce the final result from accumulated data.
def result: LogResult =
for ((ip, date) <- errors.toList)
yield (logins.getOrElse(ip, "none") + " " + ip) -> date
}
// An iteratee that consumes lines of our input. Based
// on the given regular expressions, it produces an
// iteratee that parses the input and uses State to
// compute the result.
def logIteratee(errorPat: Regex, loginPat: Regex):
IterV[String,List[(String,String)]] = {
// Consumes a single line.
def consume(line: String, state: State): State =
line match {
case errorPat(date, ip) => state.addError(date, ip);
case loginPat(date, user, ip, id) => state.addLogin(ip, id);
case _ => state
}
// The core of the iteratee. Every time we consume a
// line, we update our state. When done, compute the
// final result.
def step(state: State)(s: Input[String]): IterV[String, LogResult] =
s(el = line => Cont(step(consume(line, state))),
empty = Cont(step(state)),
eof = Done(state.result, EOF[String]))
// Return the iteratee waiting for its first input.
Cont(step(new State()));
}
// Converts an iterator into an enumerator. This
// would more properly live in Scalaz itself.
// Adapted from scalaz.ExampleIteratee
implicit val IteratorEnumerator = new Enumerator[Iterator] {
@annotation.tailrec def apply[E, A](e: Iterator[E], i: IterV[E, A]): IterV[E, A] = {
val next: Option[(Iterator[E], IterV[E, A])] =
if (e.hasNext) {
val x = e.next();
i.fold(done = (_, _) => None, cont = k => Some((e, k(El(x)))))
} else
None;
next match {
case None => i
case Some((es, is)) => apply(es, is)
}
}
}
// main ---------------------------------------------------
{
// Read a file as an iterator of lines:
// val lines: Iterator[String] =
// io.Source.fromFile("test.log").getLines();
// Create our testing iterator:
val lines: Iterator[String] = Seq(
"Error: 2012/03 1.2.3.4",
"Login: 2012/03 user 1.2.3.4 Joe",
"Error: 2012/03 1.2.3.5",
"Error: 2012/04 1.2.3.4"
).iterator;
// Create an iteratee.
val iter = logIteratee("Error: (\\S+) (\\S+)".r,
"Login: (\\S+) (\\S+) (\\S+) (\\S+)".r);
// Run the iteratee against the input
// (the enumerator is implicit)
println(iter(lines).run);
}
}