Scala lazy elements in iterator

Does anyone know how to create a lazy iterator in Scala?
For example, I want each element to be instantiated only when the iteration reaches it. Once the iteration has moved past an element, I want that instance to become eligible for garbage collection.
If I declare an iterator like so:
val xs = Iterator(
(0 to 10000).toArray,
(0 to 10).toArray,
(0 to 10000000).toArray)
It creates the arrays when xs is declared. This can be proven like so:
def f(name: String) = {
val x = (0 to 10000).toArray
println("f: " + name)
x
}
val xs = Iterator(f("1"),f("2"),f("3"))
which prints:
scala> val xs = Iterator(f("1"),f("2"),f("3"))
f: 1
f: 2
f: 3
xs: Iterator[Array[Int]] = non-empty iterator
Anyone have any ideas?
Streams are not suitable because elements remain in memory.
Note: I am using an Array as an example, but I want it to work with any type.

Scala collections have a view method that produces a lazy equivalent of the collection. So instead of (0 to 10000).toArray, use (0 to 10000).view. This way, no array is allocated in memory up front; elements are computed when they are accessed. See also https://stackoverflow.com/a/6996166/90874, https://stackoverflow.com/a/4799832/90874, https://stackoverflow.com/a/4511365/90874 etc.

Use one of the Iterator factory methods that accept a call-by-name parameter.
For your first example you can use any of these:
val xs1 = Iterator.fill(3)((0 to 10000).toArray)
val xs2 = Iterator.tabulate(3)(_ => (0 to 10000).toArray)
val xs3 = Iterator.continually((0 to 10000).toArray).take(3)
The arrays won't be allocated until you request them.
If you need a different expression for each element, you can create separate single-element iterators and concatenate them:
val iter = Iterator.fill(1)(f("1")) ++
Iterator.fill(1)(f("2")) ++
Iterator.fill(1)(f("3"))
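An alternative to concatenating single-element iterators is to store each expression as a zero-argument function and call it only when the iterator reaches it. The following is a minimal sketch of that idea, not something taken from the answers above; the names LazyIteratorSketch, build, thunks and log are hypothetical and exist only for illustration.

```scala
import scala.collection.mutable.ArrayBuffer

object LazyIteratorSketch {
  // Records which elements have actually been built, and in what order.
  val log: ArrayBuffer[String] = ArrayBuffer.empty[String]

  // Stand-in for an expensive element (hypothetical helper).
  def build(name: String): Array[Int] = {
    log += name
    (0 to 10).toArray
  }

  // Each element is wrapped in a function, so nothing is built here.
  val thunks: Seq[() => Array[Int]] =
    Seq(() => build("1"), () => build("2"), () => build("3"))

  // Iterator.map is lazy: a thunk runs only when next() reaches it.
  def lazyElements: Iterator[Array[Int]] = thunks.iterator.map(thunk => thunk())
}
```

Because the iterator itself holds only the functions, an array that has been consumed is no longer referenced by the iterator and can be collected once the caller drops it.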

Related

Scala efficient set inclusion detection

Consider a collection of tuples where the first item is a set, for instance
val xs = Seq(
((1 to 5).toSet ++ Set(9), "apple"),
((15 to 17).toSet, "pear"),
((21 to 30).toSet, "grape"))
Given a value x: Int, how can the second item be identified efficiently? (The real use case includes thousands of sets.)
For val x = 22 the result would be Some("grape"), for val x = 19 the result would be None.
Note: Values in each set are not necessarily consecutive.
Note: The sets do not overlap (every pairwise intersection is empty).
It depends on your use case, but given that you're concerned with efficiency, I assume you're going to do a lot of lookups.
I also assume you use a single xs and look things up in it many times.
Preprocess xs into a Map from Int to String:
val xsMap = (xs flatMap { case (s, v) => s.map((_,v))}).toMap[Int, String]
Then it's trivial (and effectively O(1)) to look up elements:
xsMap.get(22) //> res0: Option[String] = Some(grape)
xsMap.get(19) //> res1: Option[String] = None
What about a linear search over the pairs? It needs no preprocessing, but every lookup scans the sequence:
xs.find(_._1.contains(x)).map(_._2)
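To compare the two suggestions side by side, here is a small self-contained sketch that builds the index once and checks that it agrees with the linear search. The object and method names (SetLookupSketch, lookupIndexed, lookupLinear) are made up for this illustration.

```scala
object SetLookupSketch {
  val xs: Seq[(Set[Int], String)] = Seq(
    ((1 to 5).toSet ++ Set(9), "apple"),
    ((15 to 17).toSet, "pear"),
    ((21 to 30).toSet, "grape"))

  // One-off preprocessing: every member of every set points at its label.
  // This relies on the sets being disjoint, as stated in the question.
  val index: Map[Int, String] =
    xs.flatMap { case (members, label) => members.map(_ -> label) }.toMap

  // Hash lookup in the prebuilt index.
  def lookupIndexed(x: Int): Option[String] = index.get(x)

  // Scans the pairs until one set contains x.
  def lookupLinear(x: Int): Option[String] =
    xs.find(_._1.contains(x)).map(_._2)
}
```

The index costs memory proportional to the total number of set members, so it pays off mainly when the same xs is queried many times.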

Populating an immutable List

Here I populate two lists, one mutable and one immutable:
var mutableList = scala.collection.mutable.MutableList[String]()
//> mutableList : scala.collection.mutable.MutableList[String] = MutableList()
//|
for (a <- 1 to 100) {
mutableList += a.toString
}
println(mutableList.size); //> 100
val immutableList = List[String]() //> immutableList : List[String] = List()
for (a <- 1 to 100) {
immutableList :+ a.toString
}
println(immutableList.size); //> 0
When I print the size of immutableList, the output is 0. Is this because within the for loop a new reference is created that does not point to immutableList? Is there a functional equivalent to populating an immutable List from within a loop?
As Gabor answered in a comment, you want to use fold, or even continue with the for and yield. What he did not explain is why you are getting a size of 0. The reason is that immutableList :+ a.toString returns a new list each time, and you never use that result. The immutableList is exactly that: immutable.
Keep in mind that everything in Scala is an expression and therefore returns something. So you can turn your regular for (which acts like a foreach) into a comprehension by adding yield, as below:
val immutableList = for (a <- 1 to 100) yield a.toString
This desugars into something like the following, which yields an IndexedSeq[String]; append .toList if you need a List:
(1 to 100).map(_.toString)
For completeness, the tabulate method creates and populates an immutable List in one step. Note that it passes indices starting at 0, so the following yields "0" to "99" rather than "1" to "100":
List.tabulate(100)(a => a.toString)
or equivalently
List.tabulate(100)(_.toString)
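The options discussed in this thread can be put next to each other in one sketch. It is only an illustration; the names ImmutableListSketch, withVar, withFold and withYield are hypothetical. All three produce the strings "1" to "100".

```scala
object ImmutableListSketch {
  // Closest to the original loop: the list stays immutable,
  // but the variable is rebound to each new list.
  def withVar: List[String] = {
    var acc = List.empty[String]
    for (a <- 1 to 100) acc = acc :+ a.toString
    acc
  }

  // The fold mentioned above: prepend (cheap on List), then reverse once.
  def withFold: List[String] =
    (1 to 100).foldLeft(List.empty[String])((acc, a) => a.toString :: acc).reverse

  // for/yield on a Range gives an IndexedSeq, hence the toList.
  def withYield: List[String] =
    (for (a <- 1 to 100) yield a.toString).toList
}
```

Appending with :+ copies the list on every step, which is why the fold version prepends and reverses instead.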

lazy val v.s. val for recursive stream in Scala

I understand the basic difference between val and lazy val,
but when I ran across this example, I got confused.
The following code is correct. It defines a Stream recursively through a lazy val.
def recursive(): Stream[Int] = {
lazy val recurseValue: Stream[Int] = 1 #:: recurseValue.map(_+1)
recurseValue
}
If I change lazy val to val, the compiler reports an error:
def recursive(): Stream[Int] = {
// error: forward reference extends over definition of value recurseValue
val recurseValue: Stream[Int] = 1 #:: recurseValue.map(_+1)
recurseValue
}
My train of thought for the second example, using the substitution model, is as follows (writing func for the mapped function):
The right-hand side of #:: is passed by name, so the value has the form:
1 #:: ?
If the second element is accessed afterwards, ? refers to the current recurseValue and is rewritten to:
1 #:: ((1 #:: ?) map func) =
1 #:: (func(1) #:: (? map func))
... and so on, so the compiler should accept it.
I don't see any error in this rewriting. Is there something wrong?
EDIT:
CONCLUSION: I found that it works fine if the val is defined as a field. I also noticed this post about the implementation of val. The conclusion is that val is implemented differently in a method, in a field, and in the REPL. That is really confusing.
That substitution model works for recursion if you are defining functions, but you can't define a variable in terms of itself unless it is lazy. All of the info needed to compute the right-hand side must be available for the assignment to take place, so a bit of laziness is required in order to recursively define a variable.
You probably don't really want to do this, but just to show that it works for functions:
scala> def r = { def x:Stream[Int] = 1#::( x map (_+1) ); x }
r: Stream[Int]
scala> r take 3 foreach println
1
2
3
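The difference noted in the question's conclusion (local val versus field) can be seen in a short sketch. It keeps Stream to match the question; from Scala 2.13 on, Stream is deprecated in favour of LazyList, so expect a deprecation warning there. The names RecursiveStreamSketch, naturals and field are hypothetical.

```scala
object RecursiveStreamSketch {
  // Inside a method, the self-reference compiles only with lazy val:
  // a plain local val cannot be referenced before its definition ends.
  def naturals(): Stream[Int] = {
    lazy val s: Stream[Int] = 1 #:: s.map(_ + 1)
    s
  }

  // As a member of an object or class, a plain val is accepted:
  // members may be referenced from their own initializer, and the
  // tail of #:: is passed by name, so `field` is not read during
  // initialization.
  val field: Stream[Int] = 1 #:: field.map(_ + 1)
}
```

In both cases the tail is evaluated only when an element beyond the first is requested, by which time the value has been initialized.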

immutable val vs mutable ArrayBuffer

mutable vs. immutable in Scala collections
Before I post this question, I have read the above article. Apparently if you store something in val, you can't modify it, but then if you store a mutable collection such as ArrayBuffer, you can modify it!
scala> val b = ArrayBuffer[Int](1,2,3)
b: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(1, 2, 3)
scala> b += 1
res50: b.type = ArrayBuffer(1, 2, 3, 1)
scala> b
res51: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(1, 2, 3, 1)
What is the use of using val to store a mutable ArrayBuffer? I assume the only reason b changes is because val b holds the memory address to that ArrayBuffer(1,2,3).
If you try var x = 1; val y = x; x = 5; y, the output will still be 1. In this case, y stores an actual value instead of the address to x.
Java doesn't have this confusion because it's clear an Object can't be assigned to an int variable.
How do I know when a variable in Scala is carrying a value and when it is carrying a memory address? What's the point of storing a mutable collection in an immutable variable?
A simple answer is that vals and vars are all references. At the language level Scala has no primitive types; every value is an object.
val x = 1
is a reference named x that points to an immutable integer object 1. You cannot do 1.changeTo(2) or something, so if you have
val value = 5
val x = value
var y = value
You can do y += 10. This changes y to reference a new object, (5 + 10) = 15. The original 5 remains 5.
On the other hand, you cannot do x += 10 because x is a val which means it must always point to 5. So, this doesn't compile.
You may wonder why you can do val b = ArrayBuffer(...) and then b += something even though b is a val. That's because += is actually a method, not an assignment. Calling b += something gets translated to b.+=(something). The method += just adds a new element (something) to its mutable self and returns itself for further assignment.
Let's see an example
scala> val xs = ArrayBuffer(1,2,3)
xs: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(1, 2, 3)
scala> val ys = ( xs += 999 )
ys: xs.type = ArrayBuffer(1, 2, 3, 999)
scala> xs
res0: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(1, 2, 3, 999)
scala> ys
res1: xs.type = ArrayBuffer(1, 2, 3, 999)
scala> xs eq ys
res2: Boolean = true
This confirms xs and ys point to the same (mutable) ArrayBuffer. The eq method is like Java's ==, which compares object identity. Immutable/mutable references (val/var) and mutable/immutable data structures (ArrayBuffer, List) are two different things. So, if you do another xs += 888, ys, which is an immutable reference pointing to a mutable data structure, also contains 888.
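The two independent choices (val or var for the reference, mutable or immutable for the collection) can be contrasted in one small sketch. The names ValVsVarSketch and demo are hypothetical and chosen only for this illustration.

```scala
import scala.collection.mutable.ArrayBuffer

object ValVsVarSketch {
  // Returns (buffer seen through alias, old list, new list, same object?).
  def demo(): (List[Int], List[Int], List[Int], Boolean) = {
    val buffer = ArrayBuffer(1, 2, 3) // fixed reference, mutable contents
    val alias = buffer                // second reference to the same object
    buffer += 4                       // method call buffer.+=(4), no reassignment

    var list = List(1, 2, 3)          // rebindable reference, immutable contents
    val before = list
    list = list :+ 4                  // builds a new list and rebinds the var

    (alias.toList, before, list, alias eq buffer)
  }
}
```

The change made through buffer is visible through alias because both names refer to one object, whereas before still refers to the original three-element list after list has been rebound.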
What's the point of storing a mutable collection in an immutable variable?
val a = ArrayBuffer(1)
a = new ArrayBuffer[Int]()
<console>:9: error: reassignment to val
It prevents the variable from being reassigned to a different object. In practice, though, Scala encourages you to avoid mutable state (to avoid locking, blocking, etc.), so I'm having trouble coming up with a real situation where the choice of var or val for mutable state matters.
An immutable object and a constant value are two different things.
Defining your collection as a val means that the referenced instance of the collection will always be the same. But this instance can be mutable or immutable: if it is immutable, you cannot add or remove items in that instance; if it is mutable, you can. With an immutable collection, adding or removing items always creates a new collection.
How do I know when is the variable in scala carrying a value, when is a memory address?
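Scala primarily runs on the JVM (.NET support was discontinued), so types that are primitive types on the JVM, such as Int and Double, will be treated as primitive types by Scala. Variables of those types hold the value itself; variables of all other types hold a reference.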
What is the use of using val to store a mutable ArrayBuffer?
The closest alternative would be to use a var to store an immutable Seq. If that Seq was very large, you wouldn't want to have to copy the whole Seq every time you made a change to it - but that's what you might have to do! That would be very slow!

Scala Stream Off By One

Can someone please explain the following output from the REPL?
I'm defining two (infinite) Streams that are identical in their definition except that map is preceded by a . (period) in one definition and by a space in the other.
I can see that this would cause map to bind differently, but what happens to the 1 in the output from the second definition?
Thanks.
scala> lazy val infinite: Stream[Int] = 1 #:: infinite.map(_+1)
infinite: Stream[Int] = <lazy>
scala> val l = infinite.take(10).toList.mkString(",")
l: String = 1,2,3,4,5,6,7,8,9,10
scala> lazy val infinite2: Stream[Int] = 1 #:: infinite2 map(_+1)
infinite2: Stream[Int] = <lazy>
scala> val l2 = infinite2.take(10).toList.mkString(",")
l2: String = 2,3,4,5,6,7,8,9,10,11
It's about operator precedence. This:
1 #:: infinite.map(_+1)
is quite straightforward while this:
1 #:: infinite2 map(_+1)
is interpreted by the compiler as:
(1 #:: infinite2) map(_+1)
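1 #:: infinite2 is your desired stream, but before it is returned, a lazy transformation adds one to every item. This explains why 1 never appears in the result: after the transformation it becomes 2.
For more details see: Operator precedence in Scala. In infix notation, precedence is determined by the first character of the method name. Names that start with a letter, such as map, have the lowest precedence, while #:: starts with the special character # and therefore binds more tightly. As a result, 1 #:: infinite2 is grouped first and map is applied to the whole stream.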
In the infinite2 case, what you've expressed is equivalent to the following:
lazy val infinite2: Stream[Int] = (1 #:: infinite2) map(_ + 1)
Since the stream starts with 1, the map will add 1 to the first element.
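Writing the parentheses explicitly makes the two parses easy to compare. The sketch below is only an illustration with hypothetical names (StreamPrecedenceSketch, dotted, grouped); it keeps Stream to match the question, which is deprecated in favour of LazyList from Scala 2.13 on.

```scala
object StreamPrecedenceSketch {
  // Dot call: map applies to the tail only, so the head 1 is kept.
  lazy val dotted: Stream[Int] = 1 #:: dotted.map(_ + 1)

  // What the space version parses to: map applies to the whole
  // stream, including the head, so the first element becomes 2.
  lazy val grouped: Stream[Int] = (1 #:: grouped).map(_ + 1)
}
```

Taking the first three elements of dotted gives 1, 2, 3, while grouped gives 2, 3, 4, matching the REPL output in the question.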