I am using a library that provides a Traversable[T] that pages through database results. I'd like to avoid loading the whole thing into memory, so I am trying to convert it to a Stream[T].
From what I can tell, the built in "asStream" method loads the whole Traversable into a Buffer, which defeats my purpose. My attempt (below) hits a StackOverflowException on large results, and I can't tell why. Can someone help me understand what is going on? Thanks!
def asStream[T](traversable: => Traversable[T]): Stream[T] = {
if (traversable.isEmpty) Empty
else {
lazy val head = traversable.head
lazy val tail = asStream(traversable.tail)
head #:: tail
}
}
Here's a complete example that reproduces this, based on a suggestion by #SCouto
import scala.collection.immutable.Stream.Empty
object StreamTest {
def main(args: Array[String]) = {
val bigVector = Vector.fill(90000)(1)
val optionStream = asStream(bigVector).map(v => Some(v))
val zipped = optionStream.zipAll(optionStream.tail, None, None)
}
def asStream[T](traversable: => Traversable[T]): Stream[T] = {
#annotation.tailrec
def loop(processed: => Stream[T], pending: => Traversable[T]): Stream[T] = {
if (pending.isEmpty) processed
else {
lazy val head = pending.head
lazy val tail = pending.tail
loop(processed :+ head, tail)
}
}
loop(Empty, traversable)
}
}
Edit: After some interesting ideas from #SCouto, I learned this could also be done with trampolines to keep the result as a Stream[T] that is in the original order
object StreamTest {
def main(args: Array[String]) = {
val bigVector = Range(1, 90000).toVector
val optionStream = asStream(bigVector).map(v => Some(v))
val zipped = optionStream.zipAll(optionStream.tail, None, None)
zipped.take(10).foreach(println)
}
def asStream[T](traversable: => Traversable[T]): Stream[T] = {
sealed trait Traversal[+R]
case class More[+R](result: R, next: () => Traversal[R]) extends Traversal[R]
case object Done extends Traversal[Nothing]
def next(currentTraversable: Traversable[T]): Traversal[T] = {
if (currentTraversable.isEmpty) Done
else More(currentTraversable.head, () => next(currentTraversable.tail))
}
def trampoline[R](body: => Traversal[R]): Stream[R] = {
def loop(thunk: () => Traversal[R]): Stream[R] = {
thunk.apply match {
case More(result, next) => Stream.cons(result, loop(next))
case Done => Stream.empty
}
}
loop(() => body)
}
trampoline(next(traversable))
}
}
Try this:
def asStream[T](traversable: => Traversable[T]): Stream[T] = {
#annotation.tailrec
def loop(processed: Stream[T], pending: Traversable[T]): Stream[T] = {
if (pending.isEmpty) processed
else {
lazy val head = pending.head
lazy val tail = pending.tail
loop(head #:: processed, tail)
}
}
loop(Empty, traversable)
}
The main point is to ensure that your recursive call is the last action of your recursive function.
To ensure this you can use both a nested method (called loop in the example) and the tailrec annotation which ensures your method is tail-safe.
You can find info about tail rec here and in this awesome answer here
EDIT
The problem was that we were adding the element at the end of the Stream. If you add it as head of the Stream as in your example it will work fine. I updated my code. Please test it and let us know the result.
My tests:
scala> val optionStream = asStream(Vector.fill(90000)(1)).map(v => Some(v))
optionStream: scala.collection.immutable.Stream[Some[Int]] = Stream(Some(1), ?)
scala> val zipped = optionStream.zipAll(optionStream.tail, None, None)
zipped: scala.collection.immutable.Stream[(Option[Int], Option[Int])] = Stream((Some(1),Some(1)), ?)
EDIT2:
According to your comments, and considering the fpinscala example as you said. I think this may help you. The point is creating a case class structure with lazy evaluation. Where the head is a single element, and the tail a traversable
sealed trait myStream[+T] {
def head: Option[T] = this match {
case MyEmpty => None
case MyCons(h, _) => Some(h())
}
def tail: myStream[T] = this match {
case MyEmpty => MyEmpty
case MyCons(_, t) => myStream.cons(t().head, t().tail)
}
}
case object MyEmpty extends myStream[Nothing]
case class MyCons[+T](h: () => T, t: () => Traversable[T]) extends myStream[T]
object myStream {
def cons[T](hd: => T, tl: => Traversable[T]): myStream[T] = {
lazy val head = hd
lazy val tail = tl
MyCons(() => head, () => tail)
}
def empty[T]: myStream[T] = MyEmpty
def apply[T](as: T*): myStream[T] = {
if (as.isEmpty) empty
else cons(as.head, as.tail)
}
}
Some Quick tests:
val bigVector = Vector.fill(90000)(1)
myStream.cons(bigVector.head, bigVector.tail)
res2: myStream[Int] = MyCons(<function0>,<function0>)
Retrieving head:
res2.head
res3: Option[Int] = Some(1)
And the tail:
res2.tail
res4: myStream[Int] = MyCons(<function0>,<function0>)
EDIT3
The trampoline solution by the op:
def asStream[T](traversable: => Traversable[T]): Stream[T] = {
sealed trait Traversal[+R]
case class More[+R](result: R, next: () => Traversal[R]) extends Traversal[R]
case object Done extends Traversal[Nothing]
def next(currentTraversable: Traversable[T]): Traversal[T] = {
if (currentTraversable.isEmpty) Done
else More(currentTraversable.head, () => next(currentTraversable.tail))
}
def trampoline[R](body: => Traversal[R]): Stream[R] = {
def loop(thunk: () => Traversal[R]): Stream[R] = {
thunk.apply match {
case More(result, next) => Stream.cons(result, loop(next))
case Done => Stream.empty
}
}
loop(() => body)
}
trampoline(next(traversable))
}
}
Stream doesn't keep the data in memory because you declare how to generate each item. It's very likely that your database data is not been procedurally generated so what you need is to fetch the data the first time you ask for it (something like def getData(index: Int): Future[Data]).
The biggest problem rise in, since you are fetching data from a database, you are probably using Futures so, even if you are able to achieve it, you would have a Future[Stream[Data]] object which is not that nice to use or, much worst, block it.
Wouldn't be much more worthy just to paginate your database data query?
Related
I'm not sure whether I chose the right title for my question..
I'm interested as to why the collection in the companion object is defined. Am I mistaken that this collection will have only one f in it? What I am seeing is a collection with exactly one element.
Here's the Future I'm dealing with:
trait Future[+T] { self =>
def onComplete(callback: Try[T] => Unit): Unit
def map[U](f: T => U) = new Future[U] {
def onComplete(callback: Try[U] => Unit) =
self onComplete (t => callback(t.map(f)))
}
def flatMap[U](f: T => Future[U]) = new Future[U] {
def onComplete(callback: Try[U] => Unit) =
self onComplete { _.map(f) match {
case Success(fu) => fu.onComplete(callback)
case Failure(e) => callback(Failure(e))
} }
}
def filter(p: T => Boolean) =
map { t => if (!p(t)) throw new NoSuchElementException; t }
}
Its companion object:
object Future {
def apply[T](f: => T) = {
val handlers = collection.mutable.Buffer.empty[Try[T] => Unit]
var result: Option[Try[T]] = None
val runnable = new Runnable {
def run = {
val r = Try(f)
handlers.synchronized {
result = Some(r)
handlers.foreach(_(r))
}
}
}
(new Thread(runnable)).start()
new Future[T] {
def onComplete(f: Try[T] => Unit) = handlers.synchronized {
result match {
case None => handlers += f
case Some(r) => f(r)
}
}
}
}
}
In my head I was imagining something like the following instead of the above companion object (notice how I replaced the above val handlers .. with var handler ..):
object Future {
def apply[T](f: => T) = {
var handler: Option[Try[T] => Unit] = None
var result: Option[Try[T]] = None
val runnable = new Runnable {
val execute_when_ready: Try[T] => Unit = r => handler match {
case None => execute_when_ready(r)
case Some(f) => f(r)
}
def run = {
val r = Try(f)
handler.synchronized {
result = Some(r)
execute_when_ready(r)
}
}
}
(new Thread(runnable)).start()
new Future[T] {
def onComplete(f: Try[T] => Unit) = handler.synchronized {
result match {
case None => handler = Some(f)
case Some(r) => f(r)
}
}
}
}
}
So why does the function execute_when_ready leads to stackoverflow, but that's not the case with handlers.foreach? what is the collection is offering me which I can't do without it? And is it possible to replace the collection with something else in the companion object?
The collection is not in the companion object, it is in the apply method, so there is a new instance for each Future. It is there because there can be multiple pending onComplete handlers on the same Future.
Your implementation only allows a single handler and silently removes any existing handler in onComplete which is a bad idea because the caller has no idea if a previous function has added an onComplete handler or not.
As noted in the comments, the stack overflow is because execute_when_ready calls itself if handler is None with no mechanism to stop the recursion.
I wrote simple traversal function:
case class Node(data: String, childs: Seq[Node] = Seq.empty)
def travers(root: Node, visit: String => Unit): Unit = {
def recur(n: Node): Unit = {
visit(n.data)
for (c <- n.childs) recur(c)
}
recur(root)
}
val root = Node("1",
Seq(Node("2",
Seq(Node("3"), Node("4")))))
travers(root, s => println(s))
How to implement it with tail recursion?
You could do this by "manually" managing a stack of nodes to visit, like this:
def travers(root: Node, visit: String => Unit): Unit = {
#scala.annotation.tailrec
def recur(stack: List[Node]): Unit = stack match {
case Node(d, children) :: rest => {
visit(d)
recur(children.toList ++ rest)
}
case Nil => {}
}
recur(List(root))
}
You can try changing the recursion to handle a sequence of children at each iteration - and get a breadth-first traversal (unlike the depth-first order #JoeK's answer and your implementation share):
def travers(root: Node, visit: String => Unit): Unit = {
#tailrec
def recur(ns: List[Node]): Unit = {
ns.foreach(n => visit(n.data))
ns.flatMap(_.childs) match {
case Nil => // done
case more => recur(more)
}
}
recur(List(root))
}
My old code looks something like below, where all db calls blocking.
I need help converting this over to using Futures.
def getUserPoints(username: String): Option[Long]
db.getUserPoints(username) match {
case Some(userPoints) => Some(userPoints.total)
case None => {
if (db.getSomething("abc").isEmpty) {
db.somethingElse("asdf") match {
case Some(pointId) => {
db.setPoints(pointId, username)
db.findPointsForUser(username)
}
case _ => None
}
} else {
db.findPointsForUser(username)
}
}
}
}
My new API is below where I am returning Futures.
db.getUserPoints(username: String): Future[Option[UserPoints]]
db.getSomething(s: String): Future[Option[Long]]
db.setPoints(pointId, username): Future[Unit]
db.findPointsForUser(username): Future[Option[Long]]
How can I go about converting the above to use my new API that uses futures.
I tried using a for-compr but started to get wierd errors like Future[Nothing].
var userPointsFut: Future[Long] = for {
userPointsOpt <- db.getUserPoints(username)
userPoints <- userPointsOpt
} yield userPoints.total
But it gets a bit tricky with all the branching and if clauses and trying to convert it over to futures.
I would argue that the first issue with this design is that the port of the blocking call to a Future should not wrap the Option type:
The blocking call:
def giveMeSomethingBlocking(for:Id): Option[T]
Should become:
def giveMeSomethingBlocking(for:Id): Future[T]
And not:
def giveMeSomethingBlocking(for:Id): Future[Option[T]]
The blocking call give either a value Some(value) or None, the non-blocking Future version gives either a Success(value) or Failure(exception) which fully preserves the Option semantics in a non-blocking fashion.
With that in mind, we can model the process in question using combinators on Future. Let's see how:
First, lets refactor the API to something we can work with:
type UserPoints = Long
object db {
def getUserPoints(username: String): Future[UserPoints] = ???
def getSomething(s: String): Future[UserPoints] = ???
def setPoints(pointId:UserPoints, username: String): Future[Unit] = ???
def findPointsForUser(username: String): Future[UserPoints] = ???
}
class PointsNotFound extends Exception("bonk")
class StuffNotFound extends Exception("sthing not found")
Then, the process would look like:
def getUserPoints(username:String): Future[UserPoints] = {
db.getUserPoints(username)
.map(userPoints => userPoints /*.total*/)
.recoverWith{
case ex:PointsNotFound =>
(for {
sthingElse <- db.getSomething("abc")
_ <- db.setPoints(sthingElse, username)
points <- db.findPointsForUser(username)
} yield (points))
.recoverWith{
case ex: StuffNotFound => db.findPointsForUser(username)
}
}
}
Which type-checks correctly.
Edit
Given that the API is set in stone, a way to deal with nested monadic types is to define a MonadTransformer. In simple words, let's make Future[Option[T]] a new monad, let's call it FutureO that can be composed with other of its kind. [1]
case class FutureO[+A](future: Future[Option[A]]) {
def flatMap[B](f: A => FutureO[B])(implicit ec: ExecutionContext): FutureO[B] = {
val newFuture = future.flatMap{
case Some(a) => f(a).future
case None => Future.successful(None)
}
FutureO(newFuture)
}
def map[B](f: A => B)(implicit ec: ExecutionContext): FutureO[B] = {
FutureO(future.map(option => option map f))
}
def recoverWith[U >: A](pf: PartialFunction[Throwable, FutureO[U]])(implicit executor: ExecutionContext): FutureO[U] = {
val futOtoFut: FutureO[U] => Future[Option[U]] = _.future
FutureO(future.recoverWith(pf andThen futOtoFut))
}
def orElse[U >: A](other: => FutureO[U])(implicit executor: ExecutionContext): FutureO[U] = {
FutureO(future.flatMap{
case None => other.future
case _ => this.future
})
}
}
And now we can re-write our process preserving the same structure as the future-based composition.
type UserPoints = Long
object db {
def getUserPoints(username: String): Future[Option[UserPoints]] = ???
def getSomething(s: String): Future[Option[Long]] = ???
def setPoints(pointId: UserPoints, username:String): Future[Unit] = ???
def findPointsForUser(username: String): Future[Option[Long]] = ???
}
class PointsNotFound extends Exception("bonk")
class StuffNotFound extends Exception("sthing not found")
def getUserPoints2(username:String): Future[Option[UserPoints]] = {
val futureOpt = FutureO(db.getUserPoints(username))
.map(userPoints => userPoints /*.total*/)
.orElse{
(for {
sthingElse <- FutureO(db.getSomething("abc"))
_ <- FutureO(db.setPoints(sthingElse, username).map(_ => Some(())))
points <- FutureO(db.findPointsForUser(username))
} yield (points))
.orElse{
FutureO(db.findPointsForUser(username))
}
}
futureOpt.future
}
[1] with acknowledgements to http://loicdescotte.github.io/posts/scala-compose-option-future/
I have an Iterator[Record] which is ordered on record.id this way:
record.id=1
record.id=1
...
record.id=1
record.id=2
record.id=2
..
record.id=2
Records of a specific ID could occur a large number of times, so I want to write a function that takes this iterator as input, and returns an Iterator[Iterator[Record]] output in a lazy manner.
I was able to come up with the following, but it fails on StackOverflowError after 500K records or so:
def groupByIter[T, B](iterO: Iterator[T])(func: T => B): Iterator[Iterator[T]] = new Iterator[Iterator[T]] {
var iter = iterO
def hasNext = iter.hasNext
def next() = {
val first = iter.next()
val firstValue = func(first)
val (i1, i2) = iter.span(el => func(el) == firstValue)
iter = i2
Iterator(first) ++ i1
}
}
What am I doing wrong?
Trouble here is that each Iterator.span call makes another stacked closure for trailing iterator, and without any trampolining it's very easy to overflow.
Actually I dont think there is an implementation, which is not memoizing elements of prefix iterator, since followed iterator could be accessed earlier than prefix is drain out.
Even in .span implementation there is a Queue to memoize elements in the Leading definition.
So easiest implementation that I could imagine is the following via Stream.
implicit class StreamChopOps[T](xs: Stream[T]) {
def chopBy[U](f: T => U): Stream[Stream[T]] = xs match {
case x #:: _ =>
def eq(e: T) = f(e) == f(x)
xs.takeWhile(eq) #:: xs.dropWhile(eq).chopBy(f)
case _ => Stream.empty
}
}
Although it could be not the most performant as it memoize a lot. But with proper iterating of that, GC should handle problem of excess intermediate streams.
You could use it as myIterator.toStream.chopBy(f)
Simple check validates that following code can run without SO
Iterator.fill(10000000)(Iterator(1,1,2)).flatten //1,1,2,1,1,2,...
.toStream.chopBy(identity) //(1,1),(2),(1,1),(2),...
.map(xs => xs.sum * xs.size).sum //60000000
Inspired by chopBy implemented by #Odomontois here is a chopBy I implemented for Iterator. Of course each bulk should fit allocated memory. It doesn't looks very elegant but it seems to work :)
implicit class IteratorChopOps[A](toChopIter: Iterator[A]) {
def chopBy[U](f: A => U) = new Iterator[Traversable[A]] {
var next_el: Option[A] = None
#tailrec
private def accum(acc: List[A]): List[A] = {
next_el = None
val new_acc = hasNext match {
case true =>
val next = toChopIter.next()
acc match {
case Nil =>
acc :+ next
case _ MatchTail t if (f(t) == f(next)) =>
acc :+ next
case _ =>
next_el = Some(next)
acc
}
case false =>
next_el = None
return acc
}
next_el match{
case Some(_) =>
new_acc
case None => accum(new_acc)
}
}
def hasNext = {
toChopIter.hasNext || next_el.isDefined
}
def next: Traversable[A] = accum(next_el.toList)
}
}
And here is an extractor for matching tail:
object MatchTail {
def unapply[A] (l: Traversable[A]) = Some( (l.init, l.last) )
}
I'm trying to implement a container for a match (like in sports) result so that I can create matches between the winners of other matches. This concept is close to what a future monads is as it contains a to be defined value, and also close to a state monad as it hides state change. Being mostly a begginer on the topic I have implemented an initial version in scala that is surely improvable. I added a get method that I'm not sure was a good idea, and so far the only way to create a value would be Unknown(null) which is not as elegant as I'd hoped. What do you think I could do to improve this design?
case class Unknown[T](t : T) {
private var value : Option[T] = Option(t)
private var applicatives: List[T => Unit] = Nil
def set(t: T) {
if (known) {
value = Option(t)
applicatives.foreach(f => f(t))
applicatives = Nil
} else {
throw new IllegalStateException
}
}
def get : T = value.get
def apply(f: T => Unit) = value match {
case Some(x) => f(x);
case None => applicatives ::= f
}
def known = value == None
}
UPDATE: a usage example of the current implementation follows
case class Match(val home: Unknown[Team], val visit: Unknown[Team], val result: Unknown[(Int, Int)]) {
val winner: Unknown[Team] = Unknown(null)
val loser: Unknown[Team] = Unknown(null)
result.apply(result => {
if (result._1 > result._2) {
home.apply(t => winner.set(t))
visit.apply(t => loser.set(t))
} else {
home.apply(t => loser.set(t))
visit.apply(t => winner.set(t))
}
})
}
And a test snippet:
val definedUnplayedMatch = Match(Unknown(Team("A")), Unknown(Team("B")), Unknown(null));
val definedPlayedMatch = Match(Unknown(Team("D")), Unknown(Team("E")), Unknown((1,0)));
val undefinedUnplayedMatch = Match(Unknown(null), Unknown(null), Unknown(null));
definedUnplayedMatch.winner.apply(undefinedUnplayedMatch.home.set(_))
definedPlayedMatch.winner.apply(undefinedUnplayedMatch.visit.set(_))
undefinedUnplayedMatch.result.set((3,1))
definedUnplayedMatch.result.set((2,4))
undefinedUnplayedMatch.winner.get must be equalTo(Team("B"));
undefinedUnplayedMatch.loser.get must be equalTo(Team("D"));
UPDATE - CURRENT IDEA : I haven't had much time to work on this because my laptop broke down, but I though it would be useful to write the monad I have so far for those who are interested:
sealed abstract class Determine[+A] {
def map[B](f: A => B): Determine[B]
def flatMap[B](f: A => Determine[B]): Determine[B]
def filter(p: A => Boolean): Determine[A]
def foreach(b: A => Unit): Unit
}
final case class Known[+A](value: A) extends Determine[A] {
def map[B](f: A => B): Determine[B] = Known(f(value))
def flatMap[B](f: A => Determine[B]): Determine[B] = f(value)
def filter(p: A => Boolean): Determine[A] = if (p(value)) this else Unknown
def foreach(b: A => Unit): Unit = b(value)
}
final case class TBD[A](definer: () => A) extends Determine[A] {
private var value: A = _
def map[B](f: A => B): Determine[B] = {
def newDefiner(): B = {
f(cachedDefiner())
}
TBD[B](newDefiner)
}
def flatMap[B](f: A => Determine[B]): Determine[B] = {
f(cachedDefiner())
}
def filter(p: A => Boolean): Determine[A] = {
if (p(cachedDefiner()))
this
else
Unknown
}
def foreach(b: A => Unit): Unit = {
b(cachedDefiner())
}
private def cachedDefiner(): A = {
if (value == null)
value = definer()
value
}
}
case object Unknown extends Determine[Nothing] {
def map[B](f: Nothing => B): Determine[B] = this
def flatMap[B](f: Nothing => Determine[B]): Determine[B] = this
def filter(p: Nothing => Boolean): Determine[Nothing] = this
def foreach(b: Nothing => Unit): Unit = {}
}
I got rid of the set & get and now the TBD class receives instead a function that will define provide the value or null if still undefined. This idea works great for the map method, but the rest of the methods have subtle bugs.
For a simple approach, you don't need monads, with partial application is enough:
//some utilities
type Score=(Int,Int)
case class MatchResult[Team](winner:Team,loser:Team)
//assume no ties
def playMatch[Team](home:Team,away:Team)(score:Score)=
if (score._1>score._2) MatchResult(home,away)
else MatchResult(away,home)
//defined played match
val dpm= playMatch("D","E")(1,0)
//defined unplayed match, we'll apply the score later
val dum= playMatch("A","B")_
// a function that takes the dum score and applies it
// to get a defined played match from an undefined one
// still is a partial application of match because we don't have the final result yet
val uumWinner= { score:Score => playMatch (dpm.winner,dum(score).winner) _ }
val uumLoser= { score:Score => playMatch (dpm.loser,dum(score).loser) _}
//apply the scores
uumWinner (2,4)(3,1)
uumLoser (2,4)(0,1)
//scala> uumWinner (2,4)(3,1)
//res6: MatchResult[java.lang.String] = MatchResult(D,B)
//scala> uumLoser (2,4)(0,1)
//res7: MatchResult[java.lang.String] = MatchResult(A,E)
This is a starting point, I'm pretty sure it can be further refined. Maybe there we'll find the elusive monad. But I think an applicative functor will be enough.
I'll give another pass later...