foldLeft needs only one element from the collection before operating. So why does it try to resolve two of them? Couldn't it be just a little bit lazier?
def stream(i: Int): Stream[Int] =
if (i < 100) {
println("taking")
i #:: stream(i + 1)
} else Stream.empty
scala> stream(97).foldLeft(0) { case (acc, i) =>
println("using")
acc + i
}
taking
taking
using
taking
using
using
res0: Int = 294
I ask this because I have built a stream around a mutable priority queue, where each iteration of the fold can inject new members into the stream. It starts off with one value, and during the first iteration it injects more values. But those other values are never seen, because the stream has already been resolved to empty at position 2 before the first iteration runs.
I can only explain why it is happening. Here is the source of Stream's #:: (Cons):
final class Cons[+A](hd: A, tl: => Stream[A]) extends Stream[A] {
override def isEmpty = false
override def head = hd
@volatile private[this] var tlVal: Stream[A] = _
@volatile private[this] var tlGen = tl _
def tailDefined: Boolean = tlGen eq null
override def tail: Stream[A] = {
if (!tailDefined)
synchronized {
if (!tailDefined) {
tlVal = tlGen()
tlGen = null
}
}
tlVal
}
}
So you can see that the head is always calculated (it isn't lazy). Here is foldLeft:
override final def foldLeft[B](z: B)(op: (B, A) => B): B = {
if (this.isEmpty) z
else tail.foldLeft(op(z, head))(op)
}
You can see that tail is called here, which means the "head of the tail" (the second element) gets calculated automatically, because producing the tail requires your stream function to be called again. So the better question isn't "why the second element" but rather why Stream always calculates its first element. I don't know the answer, but I believe the scala-library implementation could be improved just by making head lazy inside Cons, so you could pass someLazyCalculation #:: stream(i + 1).
Note that either way your stream function will be called twice, but the second approach gives you a way to avoid the automatic calculation of the second head by providing some lazy value as the head. Something like this could work then (right now it doesn't):
def stream(i: Int): Stream[Int] =
if (i < 100) {
lazy val ii = {
println("taking")
i
}
ii #:: stream(i + 1)
} else Stream.empty
P.S. It's probably not such a good idea to build an (eventually) immutable collection around a mutable one.
Related
How to write an early-return piece of code in scala with no returns/breaks?
For example
for i in 0..10000000
if expensive_operation(i)
return i
return -1
How about
input.find(expensiveOperation).getOrElse(-1)
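A minimal sketch of how that reads with the range from the question (input is just the 0-to-10000000 range here, and expensiveOperation is a stand-in predicate, not part of the original answer):
val input = 0 until 10000000
def expensiveOperation(i: Int): Boolean = i % 997 == 0 // hypothetical predicate for illustration
input.find(expensiveOperation).getOrElse(-1) // find stops at the first match, so nothing after it is evaluated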
You can use dropWhile
Here is an example:
Seq(2, 6, 8, 3, 5).dropWhile(_ % 2 == 0).headOption.getOrElse(default = -1) // -> 3
And here you can find more: scala-takewhile-example
With your example
(0 to 10000000).dropWhile(i => !expensive_operation(i)).headOption.getOrElse(default = -1)
Since you asked for intuition to solve this problem generically, let me start from the basics.
Scala is (among other things) a functional programming language, and as such there is a very important concept for us: we write programs by composing expressions rather than statements.
Thus, the concept of return value for us means the evaluation of an expression.
(Note this is related to the concept of referential transparency).
val a = expr // a is bound to the evaluation of expr,
val b = (a, a) // and they are interchangeable, thus b === (expr, expr)
How does this relate to your question? In the sense that we do not really have control structures, only compound expressions. For example, an if:
val a = if (expr) exprA else exprB // if is itself an expression that evaluates to one of its branches.
Thus instead of doing something like this:
def foo(a: Int): Int = {
  if (a != 0) {
    val b = a * a
    return b
  }
  return -1
}
We would do something like:
def foo(a: Int): Int =
if (a != 0)
a * a
else
-1
Because we can use the whole if expression itself as the body of foo.
Now, returning to your specific question: how can we return early from a loop?
The answer is, you can't, at least not without mutation. But you can use a higher-level concept: instead of iterating, you can traverse something, and you can do that using recursion.
Thus, let's implement the find proposed by @Thilo ourselves, as a tail-recursive function.
(It is very important that the function is tail-recursive, so the compiler optimizes it into something equivalent to a while loop; that way we will not blow up the stack.)
def find(start: Int, end: Int, step: Int = 1)(predicate: Int => Boolean): Option[Int] = {
@annotation.tailrec
def loop(current: Int): Option[Int] =
if (current == end)
None // Base case.
else if (predicate(current))
Some(current) // Early return.
else
loop(current + step) // Recursive step.
loop(current = start)
}
find(0, 10000)(_ == 10)
// res: Option[Int] = Some(10)
Or we may generalize this a little bit more; let's implement find for Lists of any kind of element.
def find[T](list: List[T])(predicate: T => Boolean): Option[T] = {
@annotation.tailrec
def loop(remaining: List[T]): Option[T] =
remaining match {
case Nil => None
case t :: _ if (predicate(t)) => Some(t)
case _ :: tail => loop(remaining = tail)
}
loop(remaining = list)
}
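For completeness, a hypothetical usage of this generalized version (my addition, not part of the original answer):
find(List(1, 3, 5, 10, 7))(_ % 2 == 0) // Some(10): traversal stops at the first even element
find(List(1, 3, 5))(_ % 2 == 0)        // None: the whole list was traversed without a match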
This is not necessarily the best solution from a practical perspective but I still wanted to add it for educational purposes:
import scala.annotation.tailrec
def expensiveOperation(i: Int): Boolean = ???
@tailrec
def findFirstBy[T](f: (T) => Boolean)(xs: Seq[T]): Option[T] = {
xs match {
case Seq() => None
case Seq(head, _*) if f(head) => Some(head)
case Seq(_, tail @ _*) => findFirstBy(f)(tail)
}
}
val result = findFirstBy(expensiveOperation)(Range(0, 10000000)).getOrElse(-1)
Please prefer collection methods (dropWhile, find, ...) in your production code.
There are a lot of better answers here, but I think a while loop could work just fine in this situation.
So, this code
for i in 0..10000000
if expensive_operation(i)
return i
return -1
could be rewritten as
var i = 0
var found = false
while (!found && i < 10000000) {
  found = expensive_operation(i)
  if (!found) i = i + 1
}
val result = if (found) i else -1
After the while loop, found tells you whether the search succeeded; if it did, i holds the matching index, and result falls back to -1 otherwise.
I know of at least two styles to writing tail recursive functions. Take a sum function for example:
def sum1(xs: List[Int]): Int = {
def loop(xs: List[Int], acc: Int): Int = xs match {
case Nil => acc
case x :: xs1 => loop(xs1, acc + x)
}
loop(xs, 0)
}
vs
def sum2(xs: List[Int], acc: Int = 0): Int = xs match {
case Nil => acc
case x :: xs1 => sum2(xs1, x + acc)
}
I've noticed the first style (internal loop function) much more commonly than the second. Is there any reason to prefer it or is the difference just a matter of style?
There are a couple of reasons to prefer the first notation.
Firstly, you clearly separate the internal implementation from the external interface for your reader.
Secondly, in your example the seed value is simple enough to put straight in as a default argument, but your seed value may be a complicated-to-compute object that needs more than a default argument can express. If, for example, that initialization has to happen asynchronously, you definitely want to keep it out of the default value and manage it with Futures or whatever fits.
Lastly, as Didier mentioned, the type of sum1 is List[Int] => Int (which makes sense), while the type of sum2 is (List[Int], Int) => Int, which is less meaningful. It also implies that it's easier to pass sum1 around than sum2. For example, if you have an object that encapsulates a list of Ints and you want to provide synthesizer functions over it, you can do (pseudocode, I don't have a REPL to write it properly right now):
class MyFancyList[T](val seed: List[T]) {
  type SyntFunction = List[T] => Any
  var functions = Set.empty[SyntFunction]
  def addFunction(f: SyntFunction) = functions += f
  def computeAll =
    for {
      f <- functions
    } yield {
      f(seed)
    }
}
And you can do:
def concatStrings(list:List[Int]) = {
val listOfStrings = for {
n <- list
}
yield {
n+""
}
listOfStrings.mkString
}
val x = new MyFancyList(List(1, 2, 3))
x.addFunction(sum1)
x.addFunction(concatStrings)
x.computeAll == Set(6, "123")
But you can't add sum2 (not as easily, at least).
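A minimal sketch of that last point, assuming the sum1 and sum2 definitions from the question are in scope (my addition, not from the original answer):
val f: List[Int] => Int = sum1        // eta-expands cleanly to the expected function type
// val g: List[Int] => Int = sum2     // does not compile: sum2 eta-expands to (List[Int], Int) => Int
val g: List[Int] => Int = sum2(_, 0)  // you have to fix the seed yourself before passing it around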
I've read that in Haskell, when sorting an iterator, it only evaluates as much of the qsort as needed to produce the values actually requested from the resulting iterator (i.e. it is lazy: once it has completed the left-hand side of the first pivot and can return one value, it can provide that value on a call to next on the iterator, and it does not continue pivoting unless next is called again).
For example, in Haskell, head (qsort list) is O(n). It just finds the minimum value in the list and doesn't sort the rest of the list unless the rest of the result of qsort list is accessed.
Is there a way to do this in Scala? I want to use sortWith on a collection but only sort as much as necessary, such that I can mySeq.sortWith(<).take(3) and have it not need to complete the sort operation.
I'd like to know if other sort functions (like sortBy) can be used in a lazy way, and how to ensure laziness, and how to find any other documentation about when sorts in Scala are or are not lazily evaluated.
UPDATE/EDIT: I'm ideally looking for ways to do this with standard sorting functions like sortWith. I'd rather not have to implement my own version of quicksort just to get lazy evaluation. Shouldn't this be built into the standard library, at least for collections like Stream that support laziness?
I've used Scala's priority queue implementation to solve this kind of partial sorting problem:
import scala.collection.mutable.PriorityQueue
val q = PriorityQueue(1289, 12, 123, 894, 1)(Ordering.Int.reverse)
Now we can call dequeue:
scala> q.dequeue
res0: Int = 1
scala> q.dequeue
res1: Int = 12
scala> q.dequeue
res2: Int = 123
It costs O(n) to build the queue and O(k log n) to take the first k elements.
Unfortunately PriorityQueue doesn't iterate in priority order, but it's not too hard to write an iterator that calls dequeue.
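A minimal sketch of such an iterator (my addition, not from the original answer), reusing the PriorityQueue import above:
def dequeueIterator[A](q: PriorityQueue[A]): Iterator[A] =
  Iterator.continually(q).takeWhile(_.nonEmpty).map(_.dequeue())
dequeueIterator(PriorityQueue(1289, 12, 123, 894, 1)(Ordering.Int.reverse)).take(3).toList // List(1, 12, 123)
Elements are pulled lazily, so only the first k dequeues are ever performed.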
As an example, I created an implementation of lazy quick-sort that creates a lazy tree structure (instead of producing a result list). This structure can be asked for any i-th element in O(n) time or for a slice of k elements. Asking for the same element again (or a nearby element) takes only O(log n), as the tree structure built in the previous step is reused. Traversing all elements takes O(n log n) time. (All assuming that we've chosen reasonable pivots.)
The key is that subtrees are not built right away, they're delayed in a lazy computation. So when asking only for a single element, the root node is computed in O(n), then one of its sub-nodes in O(n/2) etc. until the required element is found, taking O(n + n/2 + n/4 ...) = O(n). When the tree is fully evaluated, picking any element takes O(log n) as with any balanced tree.
Note that the implementation of build is quite inefficient. I wanted it to be simple and as easy to understand as possible. The important thing is that it has the proper asymptotic bounds.
import collection.immutable.Traversable
object LazyQSort {
/**
* Represents a value that is evaluated at most once.
*/
final protected class Thunk[+A](init: => A) extends Function0[A] {
override lazy val apply: A = init;
}
implicit protected def toThunk[A](v: => A): Thunk[A] = new Thunk(v);
implicit protected def fromThunk[A](t: Thunk[A]): A = t.apply;
// -----------------------------------------------------------------
/**
* A lazy binary tree that keeps a list of sorted elements.
* Subtrees are created lazily using `Thunk`s, so only
* the necessary part of the whole tree is created for
* each operation.
*
* Most notably, accessing any i-th element using `apply`
* takes O(n) time and traversing all the elements
* takes O(n * log n) time.
*/
sealed abstract class Tree[+A]
extends Function1[Int,A] with Traversable[A]
{
override def apply(i: Int) = findNth(this, i);
override def head: A = apply(0);
override def last: A = apply(size - 1);
def max: A = last;
def min: A = head;
override def slice(from: Int, until: Int): Traversable[A] =
LazyQSort.slice(this, from, until);
// We could implement more Traversable's methods here ...
}
final protected case class Node[+A](
pivot: A, leftSize: Int, override val size: Int,
left: Thunk[Tree[A]], right: Thunk[Tree[A]]
) extends Tree[A]
{
override def foreach[U](f: A => U): Unit = {
left.foreach(f);
f(pivot);
right.foreach(f);
}
override def isEmpty: Boolean = false;
}
final protected case object Leaf extends Tree[Nothing] {
override def foreach[U](f: Nothing => U): Unit = {}
override def size: Int = 0;
override def isEmpty: Boolean = true;
}
// -----------------------------------------------------------------
/**
* Finds i-th element of the tree.
*/
@annotation.tailrec
protected def findNth[A](tree: Tree[A], n: Int): A =
tree match {
case Leaf => throw new ArrayIndexOutOfBoundsException(n);
case Node(pivot, lsize, _, l, r)
=> if (n == lsize) pivot
else if (n < lsize) findNth(l, n)
else findNth(r, n - lsize - 1);
}
/**
* Cuts a given subinterval from the data.
*/
def slice[A](tree: Tree[A], from: Int, until: Int): Traversable[A] =
tree match {
case Leaf => Leaf
case Node(pivot, lsize, size, l, r) => {
lazy val sl = slice(l, from, until);
lazy val sr = slice(r, from - lsize - 1, until - lsize - 1);
if ((until <= 0) || (from >= size)) Leaf // empty
else if (until <= lsize) sl
else if (from > lsize) sr
else sl ++ Seq(pivot) ++ sr
}
}
// -----------------------------------------------------------------
/**
* Builds a tree from a given sequence of data.
*/
def build[A](data: Seq[A])(implicit ord: Ordering[A]): Tree[A] =
if (data.isEmpty) Leaf
else {
// selecting a pivot is traditionally a complex matter,
// for simplicity we take the middle element here
val pivotIdx = data.size / 2;
val pivot = data(pivotIdx);
// this is far from perfect, but still linear
val (l, r) = data.patch(pivotIdx, Seq.empty, 1).partition(ord.lteq(_, pivot));
Node(pivot, l.size, data.size, { build(l) }, { build(r) });
}
}
// ###################################################################
/**
* Tests some operations and prints results to stdout.
*/
object LazyQSortTest extends App {
import util.Random
import LazyQSort._
def trace[A](name: String, comp: => A): A = {
val start = System.currentTimeMillis();
val r: A = comp;
val end = System.currentTimeMillis();
println("-- " + name + " took " + (end - start) + "ms");
return r;
}
{
val n = 1000000;
val rnd = Random.shuffle(0 until n);
val tree = build(rnd);
trace("1st element", println(tree.head));
// Second element is much faster since most of the required
// structure is already built
trace("2nd element", println(tree(1)));
trace("Last element", println(tree.last));
trace("Median element", println(tree(n / 2)));
trace("Median + 1 element", println(tree(n / 2 + 1)));
trace("Some slice", for(i <- tree.slice(n/2, n/2+30)) println(i));
trace("Traversing all elements", for(i <- tree) i);
trace("Traversing all elements again", for(i <- tree) i);
}
}
The output will be something like
0
-- 1st element took 268ms
1
-- 2nd element took 0ms
999999
-- Last element took 39ms
500000
-- Median element took 122ms
500001
-- Median + 1 element took 0ms
500000
...
500029
-- Slice took 6ms
-- Traversing all elements took 7904ms
-- Traversing all elements again took 191ms
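To connect this back to the question's mySeq.sortWith(<).take(3), a hedged usage sketch (assuming the LazyQSort object above is in scope; mySeq is a made-up sample):
import LazyQSort._
val mySeq = Seq(9, 1, 7, 3, 5)
build(mySeq).slice(0, 3).toList // List(1, 3, 5), without ever fully sorting mySeq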
You could use a Stream to build something like that. Here is a simple example that can definitely be improved, but it works as an example, I guess.
def extractMin(xs: List[Int]) = {
def extractMin(xs: List[Int], min: Int, rest: List[Int]): (Int,List[Int]) = xs match {
case Nil => (min, rest)
case head :: tail if head > min => extractMin(tail, min, head :: rest)
case head :: tail => extractMin(tail, head, min :: rest)
}
if(xs.isEmpty) throw new NoSuchElementException("List is empty")
else extractMin(xs.tail, xs.head, Nil)
}
def lazySort(xs: List[Int]): Stream[Int] = xs match {
case Nil => Stream.empty
case _ =>
val (min, rest) = extractMin(xs)
min #:: lazySort(rest)
}
I'm trying to write a scala function which will recursively sum the values in a list. Here is what I have so far :
def sum(xs: List[Int]): Int = {
val num = List(xs.head)
if(!xs.isEmpty) {
sum(xs.tail)
}
0
}
I don't know how to sum the individual Int values as part of the function. I am considering defining a new function inside sum and using a local variable that accumulates the sum as the List is being iterated over. But this seems like an imperative approach. Is there an alternative method?
Also you can avoid using recursion directly and use some basic abstractions instead:
val l = List(1, 3, 5, 11, -1, -3, -5)
l.foldLeft(0)(_ + _) // same as l.foldLeft(0)((a,b) => a + b)
foldLeft is like reduce() in Python. There is also foldRight, which is also known as accumulate (e.g. in SICP).
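A quick illustration of the foldRight variant mentioned above, using the same l (my addition):
l.foldRight(0)(_ + _) // same sum, but associates to the right: 1 + (3 + (5 + ...))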
With recursion I often find it worthwhile to think about how you'd describe the process in English, as that often translates to code without too much complication. So...
"How do I calculate the sum of a list of integers recursively?"
"Well, what's the sum of a list, 3 :: restOfList?
"What's restOfList?
"It could be anything, you don't know. But remember, we're being recursive - and don't you have a function to calculate the sum of a list?"
"Oh right! Well then the sum would be 3 + sum(restOfList).
"That's right. But now your only problem is that every sum is defined in terms of another call to sum(), so you'll never be able to get an actual value out. You'll need some sort of base case that everything will actually reach, and that you can provide a value for."
"Hmm, you're right." Thinks...
"Well, since your lists are getting shorter and shorter, what's the shortest possible list?"
"The empty list?"
"Right! And what's the sum of an empty list of ints?"
"Zero - I get it now. So putting it together, the sum of an empty list is zero, and the sum of any other list is its first element added to the sum of the rest of it.
And indeed, the code could read almost exactly like that last sentence:
def sumList(xs: List[Int]) = {
if (xs.isEmpty) 0
else xs.head + sumList(xs.tail)
}
(The pattern matching versions, such as that proposed by Kim Stebel, are essentially identical to this, they just express the conditions in a more "functional" way.)
Here's the "standard" recursive approach:
def sum(xs: List[Int]): Int = {
xs match {
case x :: tail => x + sum(tail) // if there is an element, add it to the sum of the tail
case Nil => 0 // if there are no elements, then the sum is 0
}
}
And, here's a tail-recursive function. It will be more efficient than a non-tail-recursive function because the compiler turns it into a while loop that doesn't require pushing a new frame on the stack for every recursive call:
def sum(xs: List[Int]): Int = {
@annotation.tailrec
def inner(xs: List[Int], accum: Int): Int = {
xs match {
case x :: tail => inner(tail, accum + x)
case Nil => accum
}
}
inner(xs, 0)
}
You can't make it easier than this:
val list = List(3, 4, 12);
println(list.sum); // result will be 19
Hope it helps :)
Your code is good, but you don't need the temporary value num. In Scala, if is an expression and returns a value; that value will be returned as the result of the sum function. So your code can be refactored to:
def sum(xs: List[Int]): Int = {
if(xs.isEmpty) 0
else xs.head + sum(xs.tail)
}
If the list is empty, return 0; otherwise add the head to the sum of the rest of the list.
The canonical implementation with pattern matching:
def sum(xs:List[Int]) = xs match {
case Nil => 0
case x::xs => x + sum(xs)
}
This isn't tail recursive, but it's easy to understand.
Building heavily on @Kim's answer:
def sum(xs: List[Int]): Int = {
if (xs.isEmpty) throw new IllegalArgumentException("Empty list provided for sum operation")
def inner(xs: List[Int]): Int = {
xs match {
case Nil => 0
case x :: tail => x + inner(tail)
}
}
return inner(xs)
}
The inner function is recursive, and the outer function raises an appropriate exception when an empty list is provided.
If you are required to write a recursive function using isEmpty, head and tail, and also to throw an exception for an empty list argument:
def sum(xs: List[Int]): Int =
if (xs.isEmpty) throw new IllegalArgumentException("sum of empty list")
else if (xs.tail.isEmpty) xs.head
else xs.head + sum(xs.tail)
def sum(xs: List[Int]): Int = {
def loop(accum: Int, xs: List[Int]): Int = {
if (xs.isEmpty) accum
else loop(accum + xs.head, xs.tail)
}
loop(0,xs)
}
def sum(xs: List[Int]): Int = xs.sum
scala> sum(List(1,3,7,5))
res1: Int = 16
scala> sum(List())
res2: Int = 0
To add another possible answer to this, here is a solution I came up with that is a slight variation of @jgaw's answer and uses the @tailrec annotation:
def sum(xs: List[Int]): Int = {
if (xs.isEmpty) throw new Exception // May want to tailor this to either some sort of case class or do something else
@annotation.tailrec
def go(l: List[Int], acc: Int): Int = {
if (l.tail == Nil) l.head + acc // If the current 'list' (current element in xs) does not have a tail (no more elements after), then we reached the end of the list.
else go(l.tail, l.head + acc) // Iterate to the next, add on the current accumulation
}
go(xs, 0)
}
A quick note regarding the checks for an empty list being passed in: when programming functionally, it is preferred not to throw exceptions but to return something else (another value, a function, a case class, etc.) to handle errors elegantly and keep flowing through the path of execution rather than stopping it with an exception. I threw one in the example above since we're just looking at recursively summing items in a list.
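A minimal sketch of that idea, under the assumption that we signal the empty-list case with Option instead of an exception (sumOption is a hypothetical name):
import scala.annotation.tailrec
def sumOption(xs: List[Int]): Option[Int] = {
  @tailrec
  def go(l: List[Int], acc: Int): Int = l match {
    case Nil          => acc
    case head :: tail => go(tail, head + acc)
  }
  if (xs.isEmpty) None else Some(go(xs, 0))
}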
I tried the following method, without using the substitution approach:
def sum(xs: List[Int]) = {
val listSize = xs.size
def loop(a: Int, b: Int): Int = {
  if (a == 0 || xs.isEmpty) b
  else loop(a - 1, xs(a - 1) + b)
}
loop(listSize,0)
}
I want to use the IO monad, but this code does not run with a large file: I get a StackOverflowError.
I tried the -DXss option, but it throws the same error.
val main = for {
l <- getFileLines(file)(collect[String, List]).map(_.run)
_ <- l.traverse_(putStrLn)
} yield ()
How can I do this?
I wrote an Iteratee that outputs all the elements:
def putStrLn[E: Show]: IterV[E, IO[Unit]] = {
import IterV._
def step(i: IO[Unit])(input: Input[E]): IterV[E, IO[Unit]] =
input(el = e => Cont(step(i >|> effects.putStrLn(e.shows))),
empty = Cont(step(i)),
eof = Done(i, EOF[E]))
Cont(step(mzero[IO[Unit]]))
}
val main = for {
i <- getFileLines(file)(putStrLn).map(_.run)
} yield i.unsafePerformIO
This gives the same result.
I think it is caused by the IO implementation.
This is because scalac is not optimizing loop inside getReaderLines for tail calls. loop is tail-recursive, but I think the case anonymous function syntax gets in the way.
Edit: actually, it's not even tail-recursive; the wrapping in the IO monad causes at least one more call after the recursive call. When I was doing my testing yesterday, I was using similar code but had dropped the IO monad, and it was then possible to make the Iteratee tail-recursive. The text below assumes no IO monad...
I happened to find that out yesterday while experimenting with iteratees. I think changing the signature of loop to this will help (so for the time being you may have to reimplement getFileLines and getReaderLines):
@annotation.tailrec
def loop(it: IterV[String, A]): IO[IterV[String, A]] = it match {
// ...
}
We should probably report this to the scalaz folks (and maybe open an enhancement ticket for Scala).
This shows what happens (code vaguely similar to getReaderLines.loop):
@annotation.tailrec
def f(i: Int): Int = i match {
case 0 => 0
case x => f(x - 1)
}
// f: (i: Int)Int
@annotation.tailrec
def g: Int => Int = {
case 0 => 0
case x => g(x - 1)
}
/* error: could not optimize #tailrec annotated method g:
it contains a recursive call not in tail position
def g: Int => Int = {
^
*/
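For what it's worth, a hedged sketch of the usual workaround (my addition, not from the original answer): put the recursion in a named method, where the match is in tail position, and eta-expand it into a function value afterwards.
@annotation.tailrec
def gImpl(i: Int): Int = i match {
  case 0 => 0
  case x => gImpl(x - 1)
}
val g: Int => Int = gImpl // eta-expansion; the function value delegates to the optimized method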