foldLeft early termination in a Stream[Boolean]? - scala

I have a:
val a : Stream[Boolean] = ...
When I foldLeft it as follows
val b = a.foldLeft(false)(_||_)
Will it terminate when it finds the first true value in the stream? If not, how do I make it to?

It would not terminate on the first true. You can use exists instead:
val b = a.exists(identity)

No it won't terminate early. This is easy to illustrate:
val a : Stream[Boolean] = Stream.continually(true)
// won't terminate because the strea
val b = a.foldLeft(false)(_||_)
stew showed that a simple solution to terminate early, in your specific case, is
val b = a.exists(identity).
Even simpler, this is equivalent to:
val b = a.contains(true)
A more general solution which unlike the above is also applicable if you actually need a fold, is to use recursion (note that here I am assuming the stream is non-empty, for simplicity):
def myReduce( s: Stream[Boolean] ): Boolean = s.head || myReduce( s.tail )
val b = myReduce(a)
Now the interesting thing of the recursive solution is how it can be used in a more general use case where you actually need to accumulate the values in some way (which is what fold is for) and still terminate early. Say that you want to add the values of a stream of ints using an add method that will "terminate" early in a way similar to || (in this case, it does not evaluate its right hand side if the left hand side is > 100):
def add(x: Int, y: => Int) = if ( x >= 100 ) x else x + y
val a : Stream[Int] = Stream.range(0, Int.MaxValue)
val b = a.foldLeft(0)(add(_, _))
The last line won't terminate, much like in your example. But you can fix it like this:
def myReduce( s: Stream[Int] ): Int = add( s.head, myReduce( s.tail ) )
val b = myReduce(a)
WARNING: there is a significant downside to this approach though: myReduce here is not tail recursive, meaning that it will blow your stack if iterating over too many elements of the stream.
Yet another solution, which does nto blow the stack, is this:
val b = a.takeWhile(_ <= 100).foldLeft(0)(_ + _)
But I fear I have gone really too far on the off topic side, so I'd better stop now.

You could use takeWhile to extract the prefix of the Stream on which you want to operate and then apply foldLeft to that.

Related

How to write this function efficiently?

Suppose I am writing function foo: Seq[B] => Boolean like this:
case class B(xs: Seq[Int])
def foo(bs: Seq[B]): Boolean = bs.map(_.xs.size).sum > 0
The implementation above is suboptimal (since it's not necessary to loop over all bs elements to return true). How would you implement foo efficiently ?
Well, for 0 this is kind of trivial:
bs.exists(!_.xs.isEmpty)
does the job, because as soon as you find a non-empty xs, you are done.
Now, suppose that the threshold is not trivial, e.g. 42 instead of 0.
You can then take the iterator of bs, incrementally aggregate the values using scanLeft, and then check whether there exists an intermediate result that is greater than zero:
def foo(bs: Seq[Int]): Boolean = bs
.iterator
.scanLeft(0)(_ + _.xs.size)
.exists(_ > 42)

How to get a Long typed production of a Seq[Int] in Scala?

Suppose val s = Seq[Int] and I would like to get the production of all its elements. The value is guaranteed to be greater than Int.MaxValue but less than Long.MaxValue so I hope the value to be a Long type.
It seems I cannot use product/foldLeft/reduceLeft due to the fact Long and Int are different types without any relations; therefore I need to write a for-loop myself. Is there any decent way to achieve this goal?
Note: I'm just asking the possibility to use builtin libraries but still fine with "ugly" code below.
def product(a: Seq[Int]): Long = {
var p = 1L
for (e <- a) p = p * e
p
}
There's no need to mess about with asInstanceOf or your own loop. foldLeft works just fine
val xs = Seq(1,1000000000,1000000)
xs.foldLeft(1L)((a,e) => a*e)
//> res0: Long = 1000000000000000
How about
def product(s: Seq[Int]) = s.map(_.asInstanceOf[Long]).fold(1L)( _ * _ )
In fact, having re-read your question and learnt about the existence of product itself, you could just do:
def product(s: Seq[Int]) = s.map(_.asInstanceOf[Long]).product

lazy val v.s. val for recursive stream in Scala

I understand the basic of diff between val and lazy val .
but while I run across this example, I 'm confused.
The following code is right one. It is a recursion on stream type lazy value.
def recursive(): {
lazy val recurseValue: Stream[Int] = 1 #:: recurseValue.map(_+1)
recurseValue
}
If I change lazy val to val. It reports error.
def recursive(): {
//error forward reference failed.
val recurseValue: Stream[Int] = 1 #:: recurseValue.map(func)
recurseValue
}
My trace of thought in 2th example by substitution model/evaluation strategy is :
the right hand sight of #:: is call by name with that the value shall be of the form :
1 #:: ?,
and if 2th element being accessed afterward, it refer to current recurseValue value and rewriting it to :
1 :: ((1 #:: ?) map func) =
1 :: (func(1) #:: (? map func))
.... and so on and so on such that the compiler should success.
I don't see any error when I rewriting it ,is there somthing wrong?
EDIT:
CONCLUSION:I found it work fine if the val defined as a field. And I also noticed this post about implement of val. The conclusion is that the val has different implementation in method or field or REPL. That's confusing really.
That substitution model works for recursion if you are defining functions, but you can't define a variable in terms of itself unless it is lazy. All of the info needed to compute the right-hand side must be available for the assignment to take place, so a bit of laziness is required in order to recursively define a variable.
You probably don't really want to do this, but just to show that it works for functions:
scala> def r = { def x:Stream[Int] = 1#::( x map (_+1) ); x }
r: Stream[Int]
scala> r take 3 foreach println
1
2
3

Why stream fold operation throws Out of memory exception?

I have following simple code
def fib(i:Long,j:Long):Stream[Long] = i #:: fib(j, i+j)
(0l /: fib(1,1).take(10000000)) (_+_)
And it throws OutOfMemmoryError exception.
I can not understand why, because I think all the parts use constant memmory i.e. lazy evaluation streams and foldLeft...
Those code also don't work
fib(1,1).take(10000000).sum or max, min e.t.c.
How to correctly implement infinite streams and do iterative operations upon it?
Scala version: 2.9.0
Also scala javadoc said, that foldLeft operation is memmory safe for streams
/** Stream specialization of foldLeft which allows GC to collect
* along the way.
*/
#tailrec
override final def foldLeft[B](z: B)(op: (B, A) => B): B = {
if (this.isEmpty) z
else tail.foldLeft(op(z, head))(op)
}
EDIT:
Implementation with iterators still not useful, since it throws ${domainName} exception
def fib(i:Long,j:Long): Iterator[Long] = Iterator(i) ++ fib(j, i + j)
How to define correctly infinite stream/iterator in Scala?
EDIT2:
I don't care about int overflow, I just want to understand how to create infinite stream/iterator etc in scala without side effects .
The reason to use Stream instead of Iterator is so that you don't have to calculate all the small terms in the series over again. But this means that you need to store ten million stream nodes. These are pretty large, unfortunately, so that could be enough to overflow the default memory. The only realistic way to overcome this is to start with more memory (e.g. scala -J-Xmx2G). (Also, note that you're going to overflow Long by an enormous margin; the Fibonacci series increases pretty quickly.)
P.S. The iterator implementation I have in mind is completely different; you don't build it out of concatenated singleton Iterators:
def fib(i: Long, j: Long) = Iterator.iterate((i,j)){ case (a,b) => (b,a+b) }.map(_._1)
Now when you fold, past results can be discarded.
The OutOfMemoryError happens indenpendently from the fact that you use Stream. As Rex Kerr mentioned above, Stream -- unlike Iterator -- stores everything in memory. The difference with List is that the elements of Stream are calculated lazily, but once you reach 10000000, there will be 10000000 elements, just like List.
Try with new Array[Int](10000000), you will have the same problem.
To calculate the fibonacci number as above you may want to use different approach. You can take into account the fact that you only need to have two numbers, instead of the whole fibonacci numbers discovered so far.
For example:
scala> def fib(i:Long,j:Long): Iterator[Long] = Iterator(i) ++ fib(j, i + j)
fib: (i: Long,j: Long)Iterator[Long]
And to get, for example, the index of the first fibonacci number exceeding 1000000:
scala> fib(1, 1).indexWhere(_ > 1000000)
res12: Int = 30
Edit: I added the following lines to cope with the StackOverflow
If you really want to work with 1 millionth fibonacci number, the iterator definition above will not work either for StackOverflowError. The following is the best I have in mind at the moment:
class FibIterator extends Iterator[BigDecimal] {
var i: BigDecimal = 1
var j: BigDecimal = 1
def next = {val temp = i
i = i + j
j = temp
j }
def hasNext = true
}
scala> new FibIterator().take(1000000).foldLeft(0:BigDecimal)(_ + _)
res49: BigDecimal = 82742358764415552005488531917024390424162251704439978804028473661823057748584031
0652444660067860068576582339667553466723534958196114093963106431270812950808725232290398073106383520
9370070837993419439389400053162345760603732435980206131237515815087375786729469542122086546698588361
1918333940290120089979292470743729680266332315132001038214604422938050077278662240891771323175496710
6543809955073045938575199742538064756142664237279428808177636434609546136862690895665103636058513818
5599492335097606599062280930533577747023889877591518250849190138449610994983754112730003192861138966
1418736269315695488126272680440194742866966916767696600932919528743675517065891097024715258730309025
7920682881137637647091134870921415447854373518256370737719553266719856028732647721347048627996967...
#yura's problem:
def fib(i:Long,j:Long):Stream[Long] = i #:: fib(j, i+j)
(0l /: fib(1,1).take(10000000)) (_+_)
besides using a Long which can't possibly hold the Fibonacci of 10,000,000, it does work. That is, if the foldLeft is written as:
fib(1,1).take(10000000).foldLeft(0L)(_+_)
Looking at the Streams.scala source, foldLeft() is clearly designed for Garbage Collection, but /: is not def'd.
The other answers alluded to another problem. The Fibonacci of 10 million is a big number and if BigInt is used, instead of just overflowing like with a Long, absolutely enormous numbers are being added to each over and over again.
Since Stream.foldLeft is optimized for GC it does look like the way to solve for really big Fibonacci numbers, rather than using a zip or tail recursion.
// Fibonacci using BigInt
def fib(i:BigInt,j:BigInt):Stream[BigInt] = i #:: fib(j, i+j)
fib(1,0).take(10000000).foldLeft(BigInt("0"))(_+_)
Results of the above code: 10,000,000 is a 8-figure number. How many figures in fib(10000000)? 2,089,877
fib(1,1).take(10000000) is the "this" of the method /:, it is likely that the JVM will consider the reference alive as long as the method runs, even if in this case, it might get rid of it.
So you keep a reference on the head of the stream all along, hence on the whole stream as you build it to 10M elements.
You could just use recursion, which is about as simple:
def fibSum(terms: Int, i: Long = 1, j: Long = 1, total: Long = 2): Long = {
if (terms == 2) total
else fibSum(terms - 1, j, i + j, total + i + j)
}
With this, you can "fold" a billion elements in only a couple of seconds, but as Rex points out, summing the Fibbonaci sequence overflows Long very quickly.
If you really wanted to know the answer to your original problem and don't mind sacrificing some accuracy you could do this:
def fibSum(terms: Int, i: Double = 1, j: Double = 1, tot: Double = 2,
exp: Int = 0): String = {
if (terms == 2) "%.6f".format(tot) + " E+" + exp
else {
val (i1, j1, tot1, exp1) =
if (tot + i + j > 10) (i/10, j/10, tot/10, exp + 1)
else (i, j, tot, exp)
fibSum(terms - 1, j1, i1 + j1, tot1 + i1 + j1, exp1)
}
}
scala> fibSum(10000000)
res54: String = 2.957945 E+2089876

Most concise way to combine sequence elements

Say we have two sequences and we and we want to combine them using some method
val a = Vector(1,2,3)
val b = Vector(4,5,6)
for example addition could be
val c = a zip b map { i => i._1 + i._2 }
or
val c = a zip b map { case (i, j) => i + j }
The repetition in the second part makes me think this should be possible in a single operation. I can't see any built-in method for this. I suppose what I really want is a zip method that skips the creation and extraction of tuples.
Is there a prettier / more concise way in plain Scala, or maybe with Scalaz? If not, how would you write such a method and pimp it onto sequences so I could write something like
val c = a zipmap b (_+_)
There is
(a,b).zipped.map(_ + _)
which is probably close enough to what you want to not bother with an extension. (You can't use it point-free, unfortunately, since the implicits on zipped don't like that.)
Rex's answer is certainly the easier way out for most cases. However, zipped is more limited than zip, so you might stumble upon cases where it won't work.
For those cases, you might try this:
val c = a zip b map (Function tupled (_+_))
Or, alternatively, if you do have a function or method that does what you want, you have this option as well:
def sumFunction = (a: Int, b: Int) => a + b
def sumMethod(a: Int, b: Int) = a + b
val c1 = a zip b map sumFunction.tupled
val c2 = a zip b map (sumMethod _).tupled
Using .tupled won't work in the first case because Scala won't be able to infer the type of the function.