Merge two Streams (ordered) to get a final sorted Stream - scala

For example, how do you merge two sorted Streams of Integers? I thought it would be very basic, but I just found it's not trivial at all. The version below is not tail-recursive, and it will stack-overflow when the Streams are large.
def merge(as: Stream[Int], bs: Stream[Int]): Stream[Int] = {
  (as, bs) match {
    case (Stream.Empty, bss) => bss
    case (ass, Stream.Empty) => ass
    case (a #:: ass, b #:: bss) =>
      if (a < b) a #:: merge(ass, bs)
      else b #:: merge(as, bss)
  }
}
We may want to turn it into a tail-recursive one by introducing an accumulator. However, if we prepend to the accumulator, we only get a stream in reversed order; if we append the accumulator with concatenation (#:::), it is not lazy (it becomes strict) any more.
What could be the solution here? Thanks

Turning a comment into an answer: there's nothing wrong with your merge.
It's not recursive at all: any one call to merge returns a new Stream without making any further call to merge. a #:: merge(ass, bs) returns a stream with first element a, where merge(ass, bs) will only be called to evaluate the rest of the stream when it is required.
So
val m = merge(Stream.from(1,2), Stream.from(2, 2))
//> m : Stream[Int] = Stream(1, ?)
m.drop(10000000).take(1)
//> res0: scala.collection.immutable.Stream[Int] = Stream(10000001, ?)
works just fine. No stack overflow.

Related

Function takes two List[Int] arguments and produces a List[Int]. SCALA [duplicate]

The elements of the resulting list should alternate between the elements of the arguments. Assume that the two arguments have the same length.
USE RECURSION
My code is as follows:
val finalString = new ListBuffer[Int]
val buff2 = new ListBuffer[Int]

def alternate(xs: List[Int], ys: List[Int]): List[Int] = {
  while (xs.nonEmpty) {
    finalString += xs.head + ys.head
    alternate(xs.tail, ys.tail)
  }
  return finalString.toList
}
EXPECTED RESULT:
alternate(List(1, 3, 5), List(2, 4, 6)) = List(1, 2, 3, 4, 5, 6)
As for the output, I don't get any output: the program just keeps running and never terminates.
Are there any Scala experts?
There are a few problems with the recursive solutions suggested so far (including yours, which would actually work if you replaced while with if, and appended the two heads separately rather than their sum). Appending to the end of a list is a linear operation (as are taking the .length of a list and accessing elements by index), which makes the whole thing quadratic; don't do that. Also, if the lists are long, plain recursion may require a lot of space on the stack, so you should use tail recursion whenever possible.
Here is a solution that is free of both problems: it builds the output backwards, by prepending elements to the list (a constant-time operation) rather than appending them, and reverses the result at the end. It is also tail-recursive: the recursive call is the last operation in the function, which allows the compiler to convert it into a loop, so that it uses only a single stack frame for execution regardless of the size of the lists.
import scala.annotation.tailrec

@tailrec
def alternate(
  a: List[Int],
  b: List[Int],
  result: List[Int] = Nil
): List[Int] = (a, b) match {
  case (Nil, _) | (_, Nil) => result.reverse
  case (ah :: at, bh :: bt) => alternate(at, bt, bh :: ah :: result)
}
(If the lists are of different lengths, the whole thing stops when the shorter one ends, and whatever is left of the longer one is thrown away. You may want to modify the first case, splitting it into two perhaps, if you desire a different behavior.)
BTW, your own solution is actually better than most suggested here: it is actually tail-recursive (or rather, can be made so once you turn the while into an if), and appending to a ListBuffer isn't nearly as bad as appending to a List. But using mutable state is generally considered a "code smell" in Scala, and should be avoided (that's one of the main ideas behind using recursion instead of loops in the first place).
The condition xs.nonEmpty is always true (xs never changes inside the loop body), so you have an infinite while loop.
Maybe you meant if instead of while.
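Here is a minimal sketch of that fix applied to your code; it also appends the two heads separately rather than their sum, so the output actually alternates:
import scala.collection.mutable.ListBuffer

val finalString = new ListBuffer[Int]

def alternate(xs: List[Int], ys: List[Int]): List[Int] = {
  if (xs.nonEmpty) {            // if instead of while: the recursion does the looping
    finalString += xs.head      // append each head separately,
    finalString += ys.head      // not their sum
    alternate(xs.tail, ys.tail)
  }
  finalString.toList
}

alternate(List(1, 3, 5), List(2, 4, 6))  // List(1, 2, 3, 4, 5, 6)
Even with that fix, though, the buffer lives outside the method and is shared between calls, a problem other answers here point out.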
A more Scala-ish approach would be something like:
def alternate(xs: List[Int], ys: List[Int]): List[Int] = {
  xs.zip(ys).flatMap { case (x, y) => List(x, y) }
}
alternate(List(1,3,5), List(2,4,6))
// List(1, 2, 3, 4, 5, 6)
A recursive solution using match
def alternate[T](a: List[T], b: List[T]): List[T] =
  (a, b) match {
    case (h1 :: t1, h2 :: t2) =>
      h1 +: h2 +: alternate(t1, t2)
    case _ =>
      a ++ b
  }
This could be made more efficient, at the cost of some clarity.
Update
This is the more efficient solution:
def alternate[T](a: List[T], b: List[T]): List[T] = {
  @annotation.tailrec
  def loop(a: List[T], b: List[T], res: List[T]): List[T] =
    (a, b) match {
      case (h1 :: t1, h2 :: t2) =>
        loop(t1, t2, h2 +: h1 +: res)
      case _ =>
        res.reverse ++ a ++ b
    }
  loop(a, b, Nil)
}
This retains the original function signature but uses an inner function that is an efficient, tail-recursive implementation of the algorithm.
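A quick check (my examples), including the case where the second list is longer and its leftover ends up at the back, matching the first version:
alternate(List(1, 3, 5), List(2, 4, 6))  // List(1, 2, 3, 4, 5, 6)
alternate(List(1, 3), List(2, 4, 6, 8))  // List(1, 2, 3, 4, 6, 8)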
You're accessing variables from outside the method, which is bad. I would suggest something like the following:
object Main extends App {
  val l1 = List(1, 3, 5)
  val l2 = List(2, 4, 6)

  def alternate[A](l1: List[A], l2: List[A]): List[A] = {
    if (l1.isEmpty || l2.isEmpty) List()
    else List(l1.head, l2.head) ++ alternate(l1.tail, l2.tail)
  }

  println(alternate(l1, l2))
}
Still recursive but without accessing state from outside the method.
Assuming both lists are of the same length, you can use a ListBuffer to build up the alternating list. alternate is a pure function:
import scala.collection.mutable.ListBuffer

object Alternate extends App {
  def alternate[T](xs: List[T], ys: List[T]): List[T] = {
    val buffer = new ListBuffer[T]
    for ((x, y) <- xs.zip(ys)) {
      buffer += x
      buffer += y
    }
    buffer.toList
  }

  alternate(List(1, 3, 5), List(2, 4, 6)).foreach(println)
}

Scala - access collection members within map or flatMap

Suppose that I use a sequence of various maps and/or flatMaps to generate a sequence of collections. Is it possible to access information about the "current" collection from within any of those methods? For example, without knowing anything specific about the functions used in the previous maps or flatMaps, and without using any intermediate declarations, how can I get the maximum value (or length, or first element, etc.) of the collection upon which the last map acts?
List(1, 2, 3)
  .flatMap(x => f(x) /* some unknown function */)
  .map(x => x + ??? /* what is the max element of the collection? */)
Edit for clarification:
In the example, I'm not looking for the max (or whatever) of the initial List. I'm looking for the max of the collection after the flatMap has been applied.
By "without using any intermediate declarations" I mean that I do not want to use any temporary collections en route to the final result. So, the example by Steve Waldman below, while giving the desired result, is not what I am seeking. (I include this condition is mostly for aesthetic reasons.)
Edit for clarification, part 2:
The ideal solution would be some magic keyword or syntactic sugar that lets me reference the current collection:
List(1, 2, 3)
  .flatMap(x => f(x))
  .map(x => x + theCurrentList.max)
I'm prepared to accept the fact, however, that this simply is not possible.
Maybe just define the list as a val, so you can name it? I don't know of any facility built into map(...) or flatMap(...) that would help.
val myList = List(1, 2, 3)

myList
  .flatMap(x => f(x) /* some unknown function */)
  .map(x => x + myList.max /* what is the max element of the List? */)
Update: With this approach, at least, if you have multiple transformations and want to see a transformed version, you'd have to name that stage. You could get away with
val myList = List(1, 2, 3).flatMap(x => f(x) /* some unknown function */)
myList.map(x => x + myList.max /* what is the max element of the List? */)
Or, if there will be multiple transformations, get in the habit of naming the stages.
val rawList = List(1, 2, 3)
val smordified = rawList.flatMap(x => f(x) /* some unknown function */)
val maxified = smordified.map(x => x + smordified.max /* what is the max element of the List? */)
maxified
Update 2: Watch it work in the REPL even with heterogeneous types:
scala> def f( x : Int ) : Vector[Double] = Vector(x * math.random, x * math.random )
f: (x: Int)Vector[Double]
scala> val rawList = List(1, 2, 3)
rawList: List[Int] = List(1, 2, 3)
scala> val smordified = rawList.flatMap(x => f(x) /* some unknown function */)
smordified: List[Double] = List(0.40730853571901315, 0.15151641399798665, 1.5305929709857609, 0.35211231420067435, 0.644241939254793, 0.15530230501048903)
scala> val maxified = smordified.map(x => x + smordified.max /* what is the max element of the List? */)
maxified: List[Double] = List(1.937901506704774, 1.6821093849837476, 3.0611859419715217, 1.8827052851864352, 2.1748349102405538, 1.6858952759962498)
scala> maxified
res3: List[Double] = List(1.937901506704774, 1.6821093849837476, 3.0611859419715217, 1.8827052851864352, 2.1748349102405538, 1.6858952759962498)
It is possible, but not pretty, and not likely something you want if you are doing it for "aesthetic reasons."
import scala.math.max

def f(x: Int): Seq[Int] = ???

List(1, 2, 3).
  flatMap(x => f(x) /* some unknown function */).
  foldRight((List[Int](), List[Int]())) {
    case (x, (xs, Nil)) => ((x :: xs), List.fill(xs.size + 1)(x))
    case (x, (xs, xMax :: _)) => ((x :: xs), List.fill(xs.size + 1)(max(x, xMax)))
  }.
  zipped.
  map {
    case (x, xMax) => x + xMax
  }

// Or alternately, a slightly more efficient version using Streams.
List(1, 2, 3).
  flatMap(x => f(x) /* some unknown function */).
  foldRight((List[Int](), Stream[Int]())) {
    case (x, (xs, Stream())) =>
      ((x :: xs), Stream.continually(x))
    case (x, (xs, curXMax #:: _)) =>
      val newXMax = max(x, curXMax)
      ((x :: xs), Stream.continually(newXMax))
  }.
  zipped.
  map {
    case (x, xMax) => x + xMax
  }
Seriously though, I just took this on to see if I could do it. While the code didn't turn out as bad as I expected, I still don't think it's particularly readable. I'd discourage using this over something similar to Steve Waldman's answer. Sometimes, it's simply better to just introduce a val, rather than being dogmatic about it.
You could define a mapWithSelf (resp. flatMapWithSelf) operation along these lines and add it as an implicit enrichment to the collection. For List it might look like:
// Scala 2.13 APIs
object Enrichments {
  implicit class WithSelfOps[A](val lst: List[A]) extends AnyVal {
    def mapWithSelf[B](f: (A, List[A]) => B): List[B] =
      lst.map(f(_, lst))
    def flatMapWithSelf[B](f: (A, List[A]) => IterableOnce[B]): List[B] =
      lst.flatMap(f(_, lst))
  }
}
The enrichment basically fixes the value of the collection before the operation and threads it through. It should be possible to generify this (at least for the strict collections), though it would look a little different in 2.12 vs. 2.13+.
Usage would look like
import Enrichments._
val someF: Int => IterableOnce[Int] = ???
List(1, 2, 3)
.flatMap(someF)
.mapWithSelf { (x, lst) =>
x + lst.max
}
So at the usage site, it's aesthetically pleasant. Note that if you're computing something which traverses the list, you'll be traversing the list every time (leading to a quadratic runtime). You can get around that with some mutability or by just saving the intermediate list after the flatMap.
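One way around the repeated traversal, sketched here with names of my own invention (not a standard API), is an enrichment that computes the derived value once and threads it through:
object Enrichments2 {
  implicit class WithDerivedOps[A](val lst: List[A]) extends AnyVal {
    // derive runs once over the whole list; f then receives its result,
    // so something like lst.max is not recomputed for every element.
    def mapWithDerived[B, C](derive: List[A] => C)(f: (A, C) => B): List[B] = {
      val d = derive(lst)
      lst.map(f(_, d))
    }
  }
}

import Enrichments2._
List(1, 2, 3).mapWithDerived(_.max)((x, m) => x + m)  // List(4, 5, 6)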
One somewhat-simple way of referencing prior output within the current map/collect operation is to use a named reference outside the map, then reference it from within the map block:
var prevOutput = ... // starting value of whatever is referenced within the map
myValues.map { x =>
  prevOutput = ...   // expression that references the prior `prevOutput`
  prevOutput         // return the value computed above for the map to collect
}
This draws attention to the fact that we're referencing prior elements while building the new sequence.
This would be messier, though, if you wanted to reference values arbitrarily far back, not just the previous one.
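As a concrete illustration of this pattern (my example), here is a running maximum threaded through a map:
var runningMax = Int.MinValue
val out = List(3, 1, 4, 1, 5).map { x =>
  runningMax = math.max(runningMax, x)  // references the prior value
  x + runningMax                        // collected into the new list
}
// out == List(6, 4, 8, 5, 10)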

What's the diff between reduceLeft and reduceRight in Scala?

What's the diff between reduceLeft and reduceRight in Scala?
val list = List(1, 0, 0, 1, 1, 1)
val sum1 = list reduceLeft {_ + _}
val sum2 = list reduceRight {_ + _}
println { sum1 == sum2 }
In my snippet, sum1 = sum2 = 4, so the order does not matter here.
When do they produce the same result
As Lionel already pointed out, reduceLeft and reduceRight only produce the same result if the function you are using to combine the elements is associative (and even that isn't always enough; see my note at the bottom). For instance, when running reduceLeft and reduceRight on Seq(1, 2, 3) with the function (a: Int, b: Int) => a - b, you get different results.
scala> Seq(1,2,3)
res0: Seq[Int] = List(1, 2, 3)
scala> res0.reduceLeft(_ - _)
res5: Int = -4
scala> res0.reduceRight(_ - _)
res6: Int = 2
Why this happens can be made clear if we look at how each of the functions is applied over the list.
For reduceRight this is what the calls look like if we were to unwrap them.
(1 - (2 - 3))
(1 - (-1))
2
For reduceLeft the sequence is built up starting from the left,
((1 - 2) - 3)
((-1) - 3)
(-4)
Tail Recursion
Furthermore, because reduceLeft is implemented using tail recursion, it will not overflow the stack even when operating on very large collections (possibly even infinite ones). reduceRight is not tail-recursive, so given a collection of sufficient size, it will produce a stack overflow.
For instance, on my machine, if I run the following I get an out-of-memory error:
scala> (0 to 100000000).reduceRight(_ - _)
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.Integer.valueOf(Integer.java:832)
at scala.runtime.BoxesRunTime.boxToInteger(BoxesRunTime.java:65)
at scala.collection.immutable.Range.apply(Range.scala:61)
at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:65)
at scala.collection.Iterator$class.foreach(Iterator.scala:742)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
at scala.collection.TraversableOnce$class.reversed(TraversableOnce.scala:99)
at scala.collection.AbstractIterator.reversed(Iterator.scala:1194)
at scala.collection.TraversableOnce$class.reduceRight(TraversableOnce.scala:197)
at scala.collection.AbstractIterator.reduceRight(Iterator.scala:1194)
at scala.collection.IterableLike$class.reduceRight(IterableLike.scala:85)
at scala.collection.AbstractIterable.reduceRight(Iterable.scala:54)
... 20 elided
But if I compute with reduceLeft I don't get the OOM,
scala> (0 to 100000000).reduceLeft(_ - _)
res16: Int = -987459712
You might get slightly different results on your system, depending on your JVM's default memory settings.
Prefer left versions
So, because of tail recursion, if you know that reduceLeft and reduceRight will produce the same value, you should prefer the reduceLeft variant. This generally holds true of the other left/right functions, such as foldRight and foldLeft (which are just more general versions of reduceRight and reduceLeft).
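As a quick sketch of that preference with folds (my example; note that newer Scala versions reimplement List.foldRight to be stack-safe, so the overflow only bites on older libraries):
val xs = List.range(0, 1000000)

xs.foldLeft(0L)(_ + _)     // iterative and stack-safe
// xs.foldRight(0L)(_ + _) // naively recursive on older versions; may overflow the stack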
When do they really always produce the same result
A small note about reduceLeft, reduceRight, and the associative property of the function you are using. I said that reduceLeft and reduceRight produce the same results when the operator is associative, but this isn't sufficient for all collection types. That is somewhat of another topic, so consult the ScalaDoc, but in short: the function you are reducing with needs to be both commutative and associative in order to get the same results for all collection types.
reduceLeft doesn't always give the same result as reduceRight. Consider a non-associative operation, such as subtraction, over your collection.
Assuming the same result, performance is one obvious difference
See performance-characteristics
The data structure is built with constant-time access to the head and tail, so iterating backwards performs worse for large lists.
The best way to understand the difference is to read the source code in library/scala/collection/LinearSeqOptimized.scala:
def reduceLeft[B >: A](f: (B, A) => B): B =
  ......
  tail.foldLeft[B](head)(f)

def reduceRight[B >: A](op: (A, B) => B): B =
  ......
  op(head, tail.reduceRight(op))

def foldLeft[B](z: B)(f: (B, A) => B): B = {
  var acc = z
  var these = this
  while (!these.isEmpty) {
    acc = f(acc, these.head)
    these = these.tail
  }
  acc
}
The above shows the key parts of the code: reduceLeft is based on the iterative foldLeft, while reduceRight is implemented via (non-tail) recursion.
I would guess that reduceLeft performs better.
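If you do need right-to-left semantics over a long List, one common workaround (a sketch, not from the source shown above) is to reverse the list and flip the operator's arguments:
val xs = List(1, 2, 3)

xs.reduceRight(_ - _)                      // 1 - (2 - 3) = 2, recursive
xs.reverse.reduceLeft((acc, x) => x - acc) // also 2, but iterative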

Why does this function run out of memory?

It's a function to find the third largest of a collection of integers. I'm calling it like this:
val lineStream = Source.fromFile("10m.txt").getLines.toStream
val intStream = lineStream map { s => Integer.parseInt(s) }
thirdLargest(intStream)
The file 10m.txt contains 10 million lines with a random integer on each one. The thirdLargest function below should not be keeping any of the integers after it has tested them, and yet it causes the JVM to run out of memory (after about 90 seconds in my case).
import scala.annotation.tailrec

def thirdLargest(numbers: Iterable[Int]): Option[Int] = {
  def top3of4(top3: List[Int], fourth: Int) = top3 match {
    case List(a, b, c) =>
      if (fourth > c) List(b, c, fourth)
      else if (fourth > b) List(b, fourth, c)
      else if (fourth > a) List(fourth, b, c)
      else top3
  }

  @tailrec
  def find(top3: List[Int], rest: Iterable[Int]): Int = (top3, rest) match {
    case (List(a, b, c), Nil) => a
    case (top3, d #:: rest) => find(top3of4(top3, d), rest)
  }

  numbers match {
    case a #:: b #:: c #:: rest => Some(find(List[Int](a, b, c).sorted, rest))
    case _ => None
  }
}
The OOM error has nothing to do with the way you read the file. It is totally fine and even recommended to use Source.getLines here. The problem is elsewhere.
Many people are confused by the nature of Scala's Stream. In fact, this is not something you want to use just to iterate over things. It is indeed lazy, but it doesn't discard previous results; they are memoized, so there's no need to recalculate them on the next use (which never happens in your case, but that's where your memory goes). See also this answer.
Consider using foldLeft. Here's a working (but intentionally simplified) example for illustration purposes:
val lines = Source.fromFile("10m.txt").getLines()

print(lines.map(_.toInt).foldLeft(-1 :: -1 :: -1 :: Nil) { (best3, next) =>
  (next :: best3).sorted.reverse.take(3)
})
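If you want the same Option[Int] result as the original thirdLargest, one possibility (my own sketch along the same lines) is to fold from an empty list and check the size at the end:
import scala.io.Source

val best3 = Source.fromFile("10m.txt").getLines()
  .map(_.toInt)
  .foldLeft(List.empty[Int]) { (best, next) =>
    (next :: best).sorted.reverse.take(3)  // the three largest so far, descending
  }

val third: Option[Int] = if (best3.size == 3) Some(best3.last) else None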

Pattern matching and infinite streams

So, I'm working to teach myself Scala, and one of the things I've been playing with is the Stream class. I tried to use a naïve translation of the classic Haskell version of Dijkstra's solution to the Hamming number problem:
object LazyHammingBad {
  private def merge(a: Stream[BigInt], b: Stream[BigInt]): Stream[BigInt] =
    (a, b) match {
      case (x #:: xs, y #:: ys) =>
        if (x < y) x #:: merge(xs, b)
        else if (y < x) y #:: merge(a, ys)
        else x #:: merge(xs, ys)
    }

  val numbers: Stream[BigInt] =
    1 #:: merge(numbers map { _ * 2 },
      merge(numbers map { _ * 3 }, numbers map { _ * 5 }))
}
Taking this for a spin in the interpreter led quickly to disappointment:
scala> LazyHammingBad.numbers.take(10).toList
java.lang.StackOverflowError
I decided to look to see if other people had solved the problem in Scala using the Haskell approach, and adapted this solution from Rosetta Code:
object LazyHammingGood {
  private def merge(a: Stream[BigInt], b: Stream[BigInt]): Stream[BigInt] =
    if (a.head < b.head) a.head #:: merge(a.tail, b)
    else if (b.head < a.head) b.head #:: merge(a, b.tail)
    else a.head #:: merge(a.tail, b.tail)

  val numbers: Stream[BigInt] =
    1 #:: merge(numbers map { _ * 2 },
      merge(numbers map { _ * 3 }, numbers map { _ * 5 }))
}
This one worked nicely, but I still wonder how I went wrong in LazyHammingBad. Does using #:: to destructure x #:: xs force the evaluation of xs for some reason? Is there any way to use pattern matching safely with infinite streams, or do you just have to use head and tail if you don't want things to blow up?
a match { case x #:: xs => ... } is about the same as val (x, xs) = (a.head, a.tail). So the difference between the bad version and the good one is that the bad version calls a.tail and b.tail right at the start, instead of just using them to build the tail of the resulting stream. Furthermore, when you use them on the right of #:: (not pattern matching, but building the result, as in #:: merge(a, b.tail)), you are not actually calling merge; that will happen only later, when the tail of the returned Stream is accessed. So in the good version, a call to merge does not call tail at all. In the bad version, it calls it right at the start.
Now consider numbers, or even a simplified version, say 1 #:: merge(numbers, anotherStream). When you call tail on that (as take(10) will), merge has to be evaluated. It calls tail on numbers, which calls merge with numbers as a parameter, which calls tail on numbers, which calls merge, which calls tail...
By contrast, in super-lazy Haskell, pattern matching does barely any work. When you write case l of x:xs, it evaluates l just enough to know whether it is an empty list or a cons.
If it is indeed a cons, x and xs are available as two thunks, which will give access to the content later, when needed. The closest equivalent in Scala would be to just test isEmpty.
Note also that with Scala's Stream, while the tail is lazy, the head is not. When you have a (non-empty) Stream, its head has to be known. This means that when you take the tail of the stream, itself a stream, its head, that is, the second element of the original stream, has to be computed. This is sometimes problematic, but in your example you fail before even getting there.
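A small illustration of that strict head (my example, Scala 2 Streams):
def cell(n: Int): Int = { println(s"evaluating $n"); n }

val s = cell(1) #:: cell(2) #:: Stream.empty[Int]
// prints "evaluating 1" right away: the head of a Stream is strict
s.tail
// prints "evaluating 2": taking the tail computes the second element's head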
Note that you can do what you want by defining a better pattern matcher for Stream:
Here's a bit I just pulled together in a Scala Worksheet:
object HammingTest {
  // A convenience object for stream pattern matching
  object #:: {
    class TailWrapper[+A](s: Stream[A]) {
      def unwrap = s.tail
    }
    object TailWrapper {
      implicit def unwrap[A](wrapped: TailWrapper[A]) = wrapped.unwrap
    }
    def unapply[A](s: Stream[A]): Option[(A, TailWrapper[A])] = {
      if (s.isEmpty) None
      else {
        Some(s.head, new TailWrapper(s))
      }
    }
  }

  def merge(a: Stream[BigInt], b: Stream[BigInt]): Stream[BigInt] =
    (a, b) match {
      case (x #:: xs, y #:: ys) =>
        if (x < y) x #:: merge(xs, b)
        else if (y < x) y #:: merge(a, ys)
        else x #:: merge(xs, ys)
    }                              //> merge: (a: Stream[BigInt], b: Stream[BigInt])Stream[BigInt]

  lazy val numbers: Stream[BigInt] =
    1 #:: merge(numbers map { _ * 2 }, merge(numbers map { _ * 3 }, numbers map { _ * 5 }))
                                   //> numbers  : Stream[BigInt] = <lazy>

  numbers.take(10).toList          //> res0: List[BigInt] = List(1, 2, 3, 4, 5, 6, 8, 9, 10, 12)
}
Now you just need to make sure that Scala finds your object #:: instead of the one in Stream.class whenever it's doing pattern matching. To facilitate that, it might be best to use a different name like #>: or ##:: and then just remember to always use that name when pattern matching.
If you ever need to match the empty stream, use case Stream.Empty. Using case Stream() will attempt to evaluate your entire stream there in the pattern match, which will lead to sadness.
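For example, with the built-in extractor, this sketch stays safe even on an infinite stream:
def describe[A](s: Stream[A]): String = s match {
  case Stream.Empty => "empty"            // forces no elements
  case x #:: _      => s"starts with $x"  // forces the head and one step of the tail
}

describe(Stream.from(1))  // "starts with 1"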