I need to produce an java.lang.Iterable[T] where the supply of T is some long running operation. In addition, after T is supplied it is wrapped and further computation is made to prepare for the next iteration.
Initially I thought I could do this with Iterator.continually. However, calling toIterable on the result actually creates a Stream[T] - with the problem being that the head is eagerly evaluated, which I don't want.
How can I either:
Create an Iterable[T] from a supplying function or
Convert an Iterator[T] into an Iterable[T] without using Stream?
In Scala 2.13, you can use LazyList:
LazyList.continually(1)
Unlike Stream, LazyList is also lazy in its head.
Because java.lang.Iterable is a very simple API, it's trivial to go from a scala.collection.Iterator to it.
case class IterableFromIterator[T](override val iterator:java.util.Iterator[T]) extends java.lang.Iterable[T]
val iterable:java.lang.Iterable[T] = IterableFromIterator(Iterator.continually(...).asJava)
Note this contradicts the expectation that iterable.iterator() produces a fresh Iterator each time; instead, iterable.iterator() can only be called once.
Related
I am trying to learn Scala through a sample project that I have. In it there is a variable record defined as:
val records: Iterator[Product2[K, V]]
It is passed around in different methods. I explore its contents using :
records.foreach(println)
However, when I try to print the contents using this iterator again, even in successive lines of code, I get no results. It seems as if the iterator is consumed. How do prevent it from happening and be able to explore the contents of the iterator without rendering it useless for the rest of the code?
An Iterator extends TraversableOnce and hence can only be iterated over once, as it represents a mutating pointer into an Iterable. If you want something that can be traversable repeatedly and without affecting multiple, parallel accesses, you need to use the Iterable instead, which extends Traversable and on foreach creates a new Iterator for that specific context
I am trying to analyze Scala code written by someone else, and in doing so, I would like to be able to write Unit Tests (that were not written before the code was written, unfortunately).
Being a relative Newbie to Scala, especially in the Futures concept area, I am trying to understand the following line of code.
val niceAnalysis:Option[(niceReport) => Future[niceReport]] = None
Update:
The above line of code should be:
val niceAnalysis:Option[(NiceReport) => Future[NiceReport]] = None
- Where NiceReport is a case class
-----------Update ends here----------------
Since I am trying to mock up an Actor, I created this new Actor where I introduce my niceAnalysis val as a field.
The first problem I see with this "niceAnalysis" thing is that it looks like an anonymous function.
How do I "initialize" this val, or to give it an initial value.
My goal is to create a test in my test class, where I am going to pass in this initialized val value into my test actor's receive method.
My naive approach to accomplish this looked like:
val myActorUnderTestRef = TestActorRef(new MyActorUnderTest("None))
Neither does IntelliJ like it. My SBT compile and test fails.
So, I need to understand the "niceAnalyis" declaration first and then understand how to give it an initial value. Please advise.
You are correct that this is a value that might contain a function from type niceReport to Future[niceReport]. You can pass an anonymous function or just a function pointer. The easiest to understand might be the pointer, so I will provide that first, but the easiest in longer terms would be the anonymous function most likely, which I will show second:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
def strToFuture(x: String) = Future{ x } //merely wrap the string in a future
val foo = Option(strToFuture)
Conversely, the one liner is as follows:
val foo = Option((x:String)=>Future{x})
There are basically two options to evaluate a map in Scala.
Lazy evaluation computers the function that is passed as a parameter when the next value is needed. IF the function takes one hour to execute then it's one hour to wait when the value is needed. (e.g. Stream and Iterator)
Eager evaluation computes the function when the map is defined. It produces a new list (Vector or whatever) and stores the results, making the program to be busy in that time.
With Future we can obtain the list (Seq or whatever) in a separate thread, this means that our thread doesn't block, but the results have to be stored.
So I did something different, please check it here.
This was a while ago so I don't remember whether I tested it. The point is to have a map that applies concurrently (non-blocking) and kind of eagerly to a set of elements, filling a buffer (the size of the number of cores in the computer, and not more). This means that:
The invocation of the map doesn't block the current thread.
Obtaining an element doesn't block the current thread (in case there was time to calculate it before and store the result in the buffer).
Infinite lists can be handled because we only prefetch a few results (about 8, depending on the number of cores).
So this all sounds very good and you may be wondering what's the problem. The problem is that this solution is not particularly elegant IMHO. Assuming the code I shared works in Java and or Scala, to iterate over the elements in the iterable produced by the map I would only need to write:
new CFMap(whateverFunction).apply(whateverIterable)
However what I would like to write is something like:
whateverIterable.bmap(whateverFunction)
As it is usual in Scala (the 'b' is for buffered), or perhaps something like:
whateverThing.toBuffered.map(whateverFunction)
Either of them works for me. So the question is, how can I do this in an idiomatic way in Scala? Some options:
Monads: create a new collection "Buffered" so that I can use the toBuffered method (that should be added to the previous ones as an implicit) and implement map and everything else for this Buffered thing (sounds like quite some work).
Implicits: create an implicit method that transforms the usual collections or the superclass of them (I'm not sure about which one should it be, Iterable maybe?) to something else to which I can apply the .bmap method and obtain something from it, probably an iterable.
Other: there are probably many options I have not considered so far. It's possible that some library does already implement this (I'd be actually surprised of the opposite, I can't believe nobody thought of this before). Using something that has already been done is usually a good idea.
Please let me know if something is unclear.
What you're looking for is the "pimp-my-library" pattern. Check it out:
object CFMapExtensions {
import sanity.commons.functional.CFMap
import scala.collection.JavaConversions._
implicit class IterableExtensions[I](i: Iterable[I]) {
def bmap[O](f: Function1[I, O]): Iterable[O] = new CFMap(f).apply(asJavaIterable(i))
}
implicit class JavaIterableExtensions[I](i: java.lang.Iterable[I]) {
def bmap[O](f: Function1[I, O]): Iterable[O] = new CFMap(f).apply(i)
}
// Add an implicit conversion to a java function.
import java.util.function.{Function => JFunction}
implicit def toJFunction[I, O](f: Function1[I, O]): JFunction[I, O] = {
new JFunction[I, O]() {
def apply(t: I): O = f(t)
}
}
}
object Test extends App {
import CFMapExtensions._
List(1,2,3,4).bmap(_ + 5).foreach(println)
}
I know streams are supposed to be lazily evaluated sequences in Scala, but I think I am suffering from some sort of fundamental misunderstanding because they seem to be more eager than I would have expected.
In this example:
val initial = Stream(1)
lazy val bad = Stream(1/0)
println((initial ++ bad) take 1)
I get a java.lang.ArithmeticException, which seems to be cause by zero division. I would expect that bad would never get evaluated since I only asked for one element from the stream. What's wrong?
OK, so after commenting other answers, I figured I could as well turn my comments into a proper answer.
Streams are indeed lazy, and will only compute their elements on demand (and you can use #:: to construct a stream element by element, much like :: for List). By example, the following will not throw any exception:
(1/2) #:: (1/0) #:: Stream.empty
This is because when applying #::, the tail is passed by name so as to not evaluate it eagerly, but only when needed (see ConsWrapper.# ::, const.apply and class Cons in Stream.scala for more details).
On the other hand, the head is passed by value, which means that it will always be eagerly evaluated, no matter what (as mentioned by Senthil). This means that doing the following will actually throw a ArithmeticException:
(1/0) #:: Stream.empty
It is a gotcha worth knowing about streams. However, this is not the issue you are facing.
In your case, the arithmetic exception happens before even instantiating a single Stream. When calling Stream.apply in lazy val bad = Stream(1/0), the argument is eagerly executed because it is not declared as a by name parameter. Stream.apply actually takes a vararg parameter, and those are necessarily passed by value.
And even if it was passed by name, the ArithmeticException would be triggered shortly after, because as said earlier the head of a Stream is always early evaluated.
The fact that Streams are lazy doesn't change the fact that method arguments are evaluated eagerly.
Stream(1/0) expands to Stream.apply(1/0). The semantics of the language require that the arguments are evaluated before the method is called (since the Stream.apply method doesn't use call-by-name arguments), so it attempts to evaluate 1/0 to pass as the argument to the Stream.apply method, which causes your ArithmeticException.
There are a few ways you can get this working though. Since you've already declared bad as a lazy val, the easiest is probably to use the also-lazy #::: stream concatenation operator to avoid forcing evaluation:
val initial = Stream(1)
lazy val bad = Stream(1/0)
println((initial #::: bad) take 1)
// => Stream(1, ?)
The Stream will evaluate the head & remaining tail is evaluated lazily. In your example, both the streams are having only the head & hence giving an error.
I recently started playing with Scala and ran across the following. Below are 4 different ways to iterate through the lines of a file, do some stuff, and write the result to another file. Some of these methods work as I would think (though using a lot of memory to do so) and some eat memory to no end.
The idea was to wrap Scala's getLines Iterator as an Iterable. I don't care if it reads the file multiple times - that's what I expect it to do.
Here's my repro code:
class FileIterable(file: java.io.File) extends Iterable[String] {
override def iterator = io.Source.fromFile(file).getLines
}
// Iterator
// Option 1: Direct iterator - holds at 100MB
def lines = io.Source.fromFile(file).getLines
// Option 2: Get iterator via method - holds at 100MB
def lines = new FileIterable(file).iterator
// Iterable
// Option 3: TraversableOnce wrapper - holds at 2GB
def lines = io.Source.fromFile(file).getLines.toIterable
// Option 4: Iterable wrapper - leaks like a sieve
def lines = new FileIterable(file)
def values = lines
.drop(1)
//.map(l => l.split("\t")).map(l => l.reduceLeft(_ + "|" + _))
//.filter(l => l.startsWith("*"))
val writer = new java.io.PrintWriter(new File("out.tsv"))
values.foreach(v => writer.println(v))
writer.close()
The file it's reading is ~10GB with 1MB lines.
The first two options iterate the file using a constant amount of memory (~100MB). This is what I would expect. The downside here is that an iterator can only be used once and it's using Scala's call-by-name convention as a psuedo-iterable. (For reference, the equivalent c# code uses ~14MB)
The third method calls toIterable defined in TraverableOnce. This one works, but it uses about 2GB to do the same work. No idea where the memory is going because it can't cache the entire Iterable.
The fourth is the most alarming - it immediately uses all available memory and throws an OOM exception. Even weirder is that it does this for all of the operations I've tested: drop, map, and filter. Looking at the implementations, none of them seem to maintain much state (though the drop looks a little suspect - why does it not just count the items?). If I do no operations, it works fine.
My guess is that somewhere it's maintaining references to each of the lines read, though I can't imagine how. I've seen the same memory usage when passing Iterables around in Scala. For example if I take case 3 (.toIterable) and pass that to a method that writes an Iterable[String] to a file, I see the same explosion.
Any ideas?
Note how the ScalaDoc of Iterable says:
Implementations of this trait need to provide a concrete method with
signature:
def iterator: Iterator[A]
They also need to provide a method newBuilder which creates a builder
for collections of the same kind.
Since you don't provide an implementation for newBuilder, you get the default implementation, which uses a ListBuffer and thus tries to fit everything into memory.
You might want to implement Iterable.drop as
def drop(n: Int) = iterator.drop(n).toIterable
but that would break with the representation invariance of the collection library (i.e. iterator.toIterable returns a Stream, while you want List.drop to return a List etc - thus the need for the Builder concept).