Is there a way to turn a Seq[Future[X]] into an Enumerator[X] ? The use case is that I want to get resources by crawling the web. This is going to return a Sequence of Futures, and I'd like to return an Enumerator that will push the futures in the order in which they are first finished on to the Iteratee.
It looks like Victor Klang's Future select gist could be used to do this - though it looks pretty inefficient.
Note: The Iteratees and Enumerator's in question are those given by the play framework version 2.x, ie with the following imports: import play.api.libs.iteratee._
Using Victor Klang's select method:
/**
* "Select" off the first future to be satisfied. Return this as a
* result, with the remainder of the Futures as a sequence.
*
* #param fs a scala.collection.Seq
*/
def select[A](fs: Seq[Future[A]])(implicit ec: ExecutionContext):
Future[(Try[A], Seq[Future[A]])] = {
#scala.annotation.tailrec
def stripe(p: Promise[(Try[A], Seq[Future[A]])],
heads: Seq[Future[A]],
elem: Future[A],
tail: Seq[Future[A]]): Future[(Try[A], Seq[Future[A]])] = {
elem onComplete { res => if (!p.isCompleted) p.trySuccess((res, heads ++ tail)) }
if (tail.isEmpty) p.future
else stripe(p, heads :+ elem, tail.head, tail.tail)
}
if (fs.isEmpty) Future.failed(new IllegalArgumentException("empty future list!"))
else stripe(Promise(), fs.genericBuilder[Future[A]].result, fs.head, fs.tail)
}
}
I can then get what I need with
Enumerator.unfoldM(initialSeqOfFutureAs){ seqOfFutureAs =>
if (seqOfFutureAs.isEmpty) {
Future(None)
} else {
FutureUtil.select(seqOfFutureAs).map {
case (t, seqFuture) => t.toOption.map {
a => (seqFuture, a)
}
}
}
}
A better, shorter and I think more efficient answer is:
def toEnumerator(seqFutureX: Seq[Future[X]]) = new Enumerator[X] {
def apply[A](i: Iteratee[X, A]): Future[Iteratee[X, A]] = {
Future.sequence(seqFutureX).flatMap { seqX: Seq[X] =>
seqX.foldLeft(Future.successful(i)) {
case (i, x) => i.flatMap(_.feed(Input.El(x)))
}
}
}
}
I do realise that the question is a bit old already, but based on Santhosh's answer and the built-in Enumterator.enumerate() implementation I came up with the following:
def enumerateM[E](traversable: TraversableOnce[Future[E]])(implicit ec: ExecutionContext): Enumerator[E] = {
val it = traversable.toIterator
Enumerator.generateM {
if (it.hasNext) {
val next: Future[E] = it.next()
next map {
e => Some(e)
}
} else {
Future.successful[Option[E]] {
None
}
}
}
}
Note that unlike the first Viktor-select-based-solution this one preserves the order, but you can still start off all computations asynchronously before. So, for example, you can do the following:
// For lack of a better name
def mapEachM[E, NE](eventuallyList: Future[List[E]])(f: E => Future[NE])(implicit ec: ExecutionContext): Enumerator[NE] =
Enumerator.flatten(
eventuallyList map { list =>
enumerateM(list map f)
}
)
This latter method was in fact what I was looking for when I stumbled on this thread. Hope it helps someone! :)
You could construct one using the Java Executor Completeion Service (JavaDoc). The idea is to use create a sequence of new futures, each using ExecutorCompletionService.take() to wait for the next result. Each future will start, when the previous future has its result.
But please b e aware, that this might be not that efficient, because a lot of synchronisation is happening behind the scenes. It might be more efficient, to use some parallel map reduce for calculation (e.g. using Scala's ParSeq) and let the Enumerator wait for the complete result.
WARNING: Not compiled before answering
What about something like this:
def toEnumerator(seqFutureX: Seq[Future[X]]) = new Enumerator[X] {
def apply[A](i: Iteratee[X, A]): Future[Iteratee[X, A]] =
Future.fold(seqFutureX)(i){ case (i, x) => i.flatMap(_.feed(Input.El(x)))) }
}
Here is something I found handy,
def unfold[A,B](xs:Seq[A])(proc:A => Future[B])(implicit errorHandler:Throwable => B):Enumerator[B] = {
Enumerator.unfoldM (xs) { xs =>
if (xs.isEmpty) Future(None)
else proc(xs.head) map (b => Some(xs.tail,b)) recover {
case e => Some((xs.tail,errorHandler(e)))
}
}
}
def unfold[A,B](fxs:Future[Seq[A]])(proc:A => Future[B]) (implicit errorHandler1:Throwable => Seq[A], errorHandler:Throwable => B) :Enumerator[B] = {
(unfold(Seq(fxs))(fxs => fxs)(errorHandler1)).flatMap(unfold(_)(proc)(errorHandler))
}
def unfoldFutures[A,B](xsfxs:Seq[Future[Seq[A]]])(proc:A => Future[B]) (implicit errorHandler1:Throwable => Seq[A], errorHandler:Throwable => B) :Enumerator[B] = {
xsfxs.map(unfold(_)(proc)).reduceLeft((a,b) => a.andThen(b))
}
I would like to propose the use of a Broadcast
def seqToEnumerator[A](futuresA: Seq[Future[A]])(defaultValue: A, errorHandler: Throwable => A): Enumerator[A] ={
val (enumerator, channel) = Concurrent.broadcast[A]
futuresA.foreach(f => f.onComplete({
case Success(Some(a: A)) => channel.push(a)
case Success(None) => channel.push(defaultValue)
case Failure(exception) => channel.push(errorHandler(exception))
}))
enumerator
}
I added errorHandling and defaultValues but you can skip those by using onSuccess or onFailure, instead of onComplete
Related
I am using a library that provides a Traversable[T] that pages through database results. I'd like to avoid loading the whole thing into memory, so I am trying to convert it to a Stream[T].
From what I can tell, the built in "asStream" method loads the whole Traversable into a Buffer, which defeats my purpose. My attempt (below) hits a StackOverflowException on large results, and I can't tell why. Can someone help me understand what is going on? Thanks!
def asStream[T](traversable: => Traversable[T]): Stream[T] = {
if (traversable.isEmpty) Empty
else {
lazy val head = traversable.head
lazy val tail = asStream(traversable.tail)
head #:: tail
}
}
Here's a complete example that reproduces this, based on a suggestion by #SCouto
import scala.collection.immutable.Stream.Empty
object StreamTest {
def main(args: Array[String]) = {
val bigVector = Vector.fill(90000)(1)
val optionStream = asStream(bigVector).map(v => Some(v))
val zipped = optionStream.zipAll(optionStream.tail, None, None)
}
def asStream[T](traversable: => Traversable[T]): Stream[T] = {
#annotation.tailrec
def loop(processed: => Stream[T], pending: => Traversable[T]): Stream[T] = {
if (pending.isEmpty) processed
else {
lazy val head = pending.head
lazy val tail = pending.tail
loop(processed :+ head, tail)
}
}
loop(Empty, traversable)
}
}
Edit: After some interesting ideas from #SCouto, I learned this could also be done with trampolines to keep the result as a Stream[T] that is in the original order
object StreamTest {
def main(args: Array[String]) = {
val bigVector = Range(1, 90000).toVector
val optionStream = asStream(bigVector).map(v => Some(v))
val zipped = optionStream.zipAll(optionStream.tail, None, None)
zipped.take(10).foreach(println)
}
def asStream[T](traversable: => Traversable[T]): Stream[T] = {
sealed trait Traversal[+R]
case class More[+R](result: R, next: () => Traversal[R]) extends Traversal[R]
case object Done extends Traversal[Nothing]
def next(currentTraversable: Traversable[T]): Traversal[T] = {
if (currentTraversable.isEmpty) Done
else More(currentTraversable.head, () => next(currentTraversable.tail))
}
def trampoline[R](body: => Traversal[R]): Stream[R] = {
def loop(thunk: () => Traversal[R]): Stream[R] = {
thunk.apply match {
case More(result, next) => Stream.cons(result, loop(next))
case Done => Stream.empty
}
}
loop(() => body)
}
trampoline(next(traversable))
}
}
Try this:
def asStream[T](traversable: => Traversable[T]): Stream[T] = {
#annotation.tailrec
def loop(processed: Stream[T], pending: Traversable[T]): Stream[T] = {
if (pending.isEmpty) processed
else {
lazy val head = pending.head
lazy val tail = pending.tail
loop(head #:: processed, tail)
}
}
loop(Empty, traversable)
}
The main point is to ensure that your recursive call is the last action of your recursive function.
To ensure this you can use both a nested method (called loop in the example) and the tailrec annotation which ensures your method is tail-safe.
You can find info about tail rec here and in this awesome answer here
EDIT
The problem was that we were adding the element at the end of the Stream. If you add it as head of the Stream as in your example it will work fine. I updated my code. Please test it and let us know the result.
My tests:
scala> val optionStream = asStream(Vector.fill(90000)(1)).map(v => Some(v))
optionStream: scala.collection.immutable.Stream[Some[Int]] = Stream(Some(1), ?)
scala> val zipped = optionStream.zipAll(optionStream.tail, None, None)
zipped: scala.collection.immutable.Stream[(Option[Int], Option[Int])] = Stream((Some(1),Some(1)), ?)
EDIT2:
According to your comments, and considering the fpinscala example as you said. I think this may help you. The point is creating a case class structure with lazy evaluation. Where the head is a single element, and the tail a traversable
sealed trait myStream[+T] {
def head: Option[T] = this match {
case MyEmpty => None
case MyCons(h, _) => Some(h())
}
def tail: myStream[T] = this match {
case MyEmpty => MyEmpty
case MyCons(_, t) => myStream.cons(t().head, t().tail)
}
}
case object MyEmpty extends myStream[Nothing]
case class MyCons[+T](h: () => T, t: () => Traversable[T]) extends myStream[T]
object myStream {
def cons[T](hd: => T, tl: => Traversable[T]): myStream[T] = {
lazy val head = hd
lazy val tail = tl
MyCons(() => head, () => tail)
}
def empty[T]: myStream[T] = MyEmpty
def apply[T](as: T*): myStream[T] = {
if (as.isEmpty) empty
else cons(as.head, as.tail)
}
}
Some Quick tests:
val bigVector = Vector.fill(90000)(1)
myStream.cons(bigVector.head, bigVector.tail)
res2: myStream[Int] = MyCons(<function0>,<function0>)
Retrieving head:
res2.head
res3: Option[Int] = Some(1)
And the tail:
res2.tail
res4: myStream[Int] = MyCons(<function0>,<function0>)
EDIT3
The trampoline solution by the op:
def asStream[T](traversable: => Traversable[T]): Stream[T] = {
sealed trait Traversal[+R]
case class More[+R](result: R, next: () => Traversal[R]) extends Traversal[R]
case object Done extends Traversal[Nothing]
def next(currentTraversable: Traversable[T]): Traversal[T] = {
if (currentTraversable.isEmpty) Done
else More(currentTraversable.head, () => next(currentTraversable.tail))
}
def trampoline[R](body: => Traversal[R]): Stream[R] = {
def loop(thunk: () => Traversal[R]): Stream[R] = {
thunk.apply match {
case More(result, next) => Stream.cons(result, loop(next))
case Done => Stream.empty
}
}
loop(() => body)
}
trampoline(next(traversable))
}
}
Stream doesn't keep the data in memory because you declare how to generate each item. It's very likely that your database data is not been procedurally generated so what you need is to fetch the data the first time you ask for it (something like def getData(index: Int): Future[Data]).
The biggest problem rise in, since you are fetching data from a database, you are probably using Futures so, even if you are able to achieve it, you would have a Future[Stream[Data]] object which is not that nice to use or, much worst, block it.
Wouldn't be much more worthy just to paginate your database data query?
I have a function in my controller like this:
def getPreviousVersions(id: Int): Action[AnyContent] = Action.async {
val result: Future[Option[ElementModel]] = dto.getPreviousVersion(id)
// while-loop or whatever is best practice
// val arrayOfElements: Future[Seq[ElementModel] = ...
val c = for {
previousVersions <- arrayOfElements
} yield previousVersions
//do something with the versions
//Return something in the end
}
My model looks like this:
case class ElementModel(
...
previousVersion: Option[Int],
...)
I store the id of the latest previousVersion in my model. Now, what I want to do is iterate either recursively or with a while-loop or whatever is best-practice to get the previousVersion of the previousVersion and so on.
The idea is to get all previous versions, store them in a sequence and pass this seq to another function.
Is there a smooth, proper way to do this? Thanks!
def getVersionSequence(id: Int): Future[List[ElementModel]] = {
def _getVersionSequence(id: Int, fList: Future[List[ElementModel]]): Future[List[ElementModel]] = {
dto.getPreviousVersion(id).flatMap({
case Some(elementModel) => elementModel.previousVersion match {
case Some(pVId) => _getVersionSequence(pVId, fList.map(list => elementModel +: list))
case None => fList.map(list => elementModel +: list)
}
case None => fList
})
}
val fInvertedList = _getVersionSequence(id, Future(List.empty[ElementModel]))
fInvertedList.map(list => list.reverse)
}
def getPreviousVersions(id: Int): Action[AnyContent] = Action.async {
val c: Future[List[ElementModel]] = getVersionSequence(id)
//do something with the versions
//Return something in the end
}
def getPreviousVersion(previousVersionId: Int): Future[ElementModel] = dto.getElementModelForId(previousVersionId)
/*This has been changed from your question, you shouldn't need a getPreviousVersion(id) function in your database connector, but merely a function to get an element by id*/
def getAllPreviousVersions(e: ElementModel): Future[Seq[ElementModel]] = {
e.previousVersion match {
case None => Future.successful(Seq(e))
case Some(id) => getPreviousVersion(id).flatMap {
previousVersionElement =>
getAllPreviousVersions(previousVersionElement).map {
//Properly preserves order !
seq => previousVersionElement +: seq
}
}
}
}
def getPreviousVersions(e: ElementModel) = {
getAllPreviousVersions(e).map {
//do something with the versions
//Return something in the end
???
}
}
I want to apply a function f to each element of a List and not stop at the first error but throw the last error (if any) only:
#annotation.tailrec
def tryAll[A](xs: List[A])(f: A => Unit): Unit = {
xs match {
case x :: xt =>
try {
f(x)
} finally {
tryAll(xt)(f)
}
case _ =>
}
}
But, the above code does not compile - it is complaining that this function is not tail recursive. Why not?
This solution iterates over all elements and produces (throws) the last error if any:
def tryAll[A](xs: List[A])(f: A => Unit): Unit = {
val res = xs.foldLeft(Option.empty[Throwable]) {
case (maybeThrowable, a) =>
Try(f(a)) match {
case Success(_) => maybeThrowable
case Failure(e) => Option(e)
}
}
res.foreach(throwable => throw throwable)
}
As mentioned by #HristoIliev, your method cannot be tail recursive because the finally call is not guaranteed to be the tail call. This means that any method using try in this way will not be tail recursive. See this answer, also.
Calling the method again is a weird way of trying something repeatedly until it succeeds, because at each stage it's throwing an exception you are presumably not handling. Instead, I'd argue using a functional approach with Try, taking failures from a view until the operation succeeds. The only disadvantage to this approach is that it doesn't throw any exceptions for you to handle along the way (which can also be an advantage!).
def tryAll[A](xs: List[A])(f: A => Unit): Unit =
xs.view.map(x => Try(f(x))).takeWhile(_.isFailure).force
scala> val list = List(0, 0, 0, 4, 5, 0)
scala> tryAll(list)(a => println(10 / a))
2
If you really want to handle the exceptions (or just the last exception), you can change the return type of tryAll to List[Try[Unit]] (or simply Try[Unit] if you modify the code to only take the last one). It's better for the return type of the method to describe part of what it's actually doing--potentially returning errors.
Not sure the intention of the method, but you can something like that:
final def tryAll[A](xs: List[A])(f: A => Unit): Unit = {
xs match {
case x :: xt =>
try {
f(x)
} catch {
case e => tryAll(xt)(f)
}
case _ => //do something else
}
}
I know this way to use #annotation.tailrec
From this:
def fac(n:Int):Int = if (n<=1) 1 else n*fac(n-1)
You should have this:
#scala.annotation.tailrec
def facIter(f:Int, n:Int):Int = if (n<2) f else facIter(n*f, n-1)
def fac(n:Int) = facIter(1,n)
There're map/flatMap methods, there're also recover/recoverWith methods in the Scala Future standard API.
Why there's no collectWith ?
The code of the collect method is pretty simple :
def collect[S](pf: PartialFunction[T, S])(implicit executor: ExecutionContext): Future[S] =
map {
r => pf.applyOrElse(r, (t: T) => throw new NoSuchElementException("Future.collect partial function is not defined at: " + t))
}
The code of the collectWith method is then easy to imagine :
def collectWith[S](pf: PartialFunction[T, Future[S]])(implicit executor: ExecutionContext): Future[S] =
flatMap {
r => pf.applyOrElse(r, (t: T) => throw new NoSuchElementException("Future.collect partial function is not defined at: " + t))
}
I know that I can implement it and "extend" the Future standard API easily thanks to this article : http://debasishg.blogspot.fr/2008/02/why-i-like-scalas-lexically-scoped-open.html
I done that in my project :
class RichFuture[T](future: Future[T]) {
def collectWith[S](pf: PartialFunction[T, Future[S]])(implicit executor: ExecutionContext): Future[S] =
future.flatMap {
r => pf.applyOrElse(r, (t: T) => throw new NoSuchElementException("Future.collect partial function is not defined at: " + t))
}
}
trait WithRichFuture {
implicit def enrichFuture[T](person: Future[T]): RichFuture[T] = new RichFuture(person)
}
Maybe my needs for that does not justify to implement it in the standard API ?
Here is why I need this method in my Play2 project :
def createCar(key: String, eligibleCars: List[Car]): Future[Car] = {
def handleResponse: PartialFunction[WSResponse, Future[Car]] = {
case response: WSResponse if response.status == Status.CREATED => Future.successful(response.json.as[Car])
case response: WSResponse
if response.status == Status.BAD_REQUEST && response.json.as[Error].error == "not_the_good_one" =>
createCar(key, eligibleCars.tail)
}
// The "carApiClient.createCar" method just returns the result of the WS API call.
carApiClient.createCar(key, eligibleCars.head).collectWith(handleResponse)
}
I don't know how to do that without my collectWith method.
Maybe it's not the right way to do something like this ?
Do you know a better way ?
EDIT:
I have maybe a better solution for the createCar method that does not requires the collectWith method :
def createCar(key: String, eligibleCars: List[Car]): Future[Car] = {
for {
mayCar: Option[Car] <- Future.successful(eligibleCars.headOption)
r: WSResponse <- carApiClient.createCar(key, mayCar.get) if mayCar.nonEmpty
createdCar: Car <- Future.successful(r.json.as[Car]) if r.status == Status.CREATED
createdCar: Car <- createCar(key, eligibleCars.tail) if r.status == Status.BAD_REQUEST && r.json.as[Error].error == "not_the_good_one"
} yield createdCar
}
What do you think about this second solution ?
Second edit:
Just for information, here is my final solution thanks to #Dylan answer :
def createCar(key: String, eligibleCars: List[Car]): Future[Car] = {
def doCall(head: Car, tail: List[Car]): Future[Car] = {
carApiClient
.createCar(key, head)
.flatMap( response =>
response.status match {
case Status.CREATED => Future.successful(response.json.as[Car])
case Status.BAD_REQUEST if response.json.as[Error].error == "not_the_good_one" =>
createCar(key, tail)
}
)
}
eligibleCars match {
case head :: tail => doCall(head, tail)
case Nil => Future.failed(new RuntimeException)
}
}
Jules
How about:
def createCar(key: String, eligibleCars: List[Car]): Future[Car] = {
def handleResponse(response: WSResponse): Future[Car] = response.status match {
case Status.Created =>
Future.successful(response.json.as[Car])
case Status.BAD_REQUEST if response.json.as[Error].error == "not_the_good_one" =>
createCar(key, eligibleCars.tail)
case _ =>
// just fallback to a failed future rather than having a `collectWith`
Future.failed(new NoSuchElementException("your error here"))
}
// using flatMap since `handleResponse` is now a regular function
carApiClient.createCar(key, eligibleCars.head).flatMap(handleResponse)
}
Two changes:
handleResponse is no longer a partial function. The case _ returns a failed future, which is essentially what you were doing in your custom collectWith implementation.
use flatMap instead of collectWith, since handleResponse now suits that method signature
edit for extra info
If you really need to support the PartialFunction approach, you could always convert a PartialFunction[A, Future[B]] to a Function[A, Future[B]] by calling orElse on the partial function, e.g.
val handleResponsePF: PartialFunction[WSResponse, Future[Car]] = {
case ....
}
val handleResponse: Function[WSResponse, Future[Car]] = handleResponsePF orElse {
case _ => Future.failed(new NoSucheElementException("default case"))
}
Doing so would allow you to adapt an existing partial function to fit into a flatMap call.
(okay, technically, it already does, but you'd be throwing MatchErrors rather than your own custom exceptions)
I have an Iterator[Record] which is ordered on record.id this way:
record.id=1
record.id=1
...
record.id=1
record.id=2
record.id=2
..
record.id=2
Records of a specific ID could occur a large number of times, so I want to write a function that takes this iterator as input, and returns an Iterator[Iterator[Record]] output in a lazy manner.
I was able to come up with the following, but it fails on StackOverflowError after 500K records or so:
def groupByIter[T, B](iterO: Iterator[T])(func: T => B): Iterator[Iterator[T]] = new Iterator[Iterator[T]] {
var iter = iterO
def hasNext = iter.hasNext
def next() = {
val first = iter.next()
val firstValue = func(first)
val (i1, i2) = iter.span(el => func(el) == firstValue)
iter = i2
Iterator(first) ++ i1
}
}
What am I doing wrong?
Trouble here is that each Iterator.span call makes another stacked closure for trailing iterator, and without any trampolining it's very easy to overflow.
Actually I dont think there is an implementation, which is not memoizing elements of prefix iterator, since followed iterator could be accessed earlier than prefix is drain out.
Even in .span implementation there is a Queue to memoize elements in the Leading definition.
So easiest implementation that I could imagine is the following via Stream.
implicit class StreamChopOps[T](xs: Stream[T]) {
def chopBy[U](f: T => U): Stream[Stream[T]] = xs match {
case x #:: _ =>
def eq(e: T) = f(e) == f(x)
xs.takeWhile(eq) #:: xs.dropWhile(eq).chopBy(f)
case _ => Stream.empty
}
}
Although it could be not the most performant as it memoize a lot. But with proper iterating of that, GC should handle problem of excess intermediate streams.
You could use it as myIterator.toStream.chopBy(f)
Simple check validates that following code can run without SO
Iterator.fill(10000000)(Iterator(1,1,2)).flatten //1,1,2,1,1,2,...
.toStream.chopBy(identity) //(1,1),(2),(1,1),(2),...
.map(xs => xs.sum * xs.size).sum //60000000
Inspired by chopBy implemented by #Odomontois here is a chopBy I implemented for Iterator. Of course each bulk should fit allocated memory. It doesn't looks very elegant but it seems to work :)
implicit class IteratorChopOps[A](toChopIter: Iterator[A]) {
def chopBy[U](f: A => U) = new Iterator[Traversable[A]] {
var next_el: Option[A] = None
#tailrec
private def accum(acc: List[A]): List[A] = {
next_el = None
val new_acc = hasNext match {
case true =>
val next = toChopIter.next()
acc match {
case Nil =>
acc :+ next
case _ MatchTail t if (f(t) == f(next)) =>
acc :+ next
case _ =>
next_el = Some(next)
acc
}
case false =>
next_el = None
return acc
}
next_el match{
case Some(_) =>
new_acc
case None => accum(new_acc)
}
}
def hasNext = {
toChopIter.hasNext || next_el.isDefined
}
def next: Traversable[A] = accum(next_el.toList)
}
}
And here is an extractor for matching tail:
object MatchTail {
def unapply[A] (l: Traversable[A]) = Some( (l.init, l.last) )
}