scala iterator hasNext random behaviour - scala

var intList = Iterator.range(1, 10)
println("data" + intList)
println(intList.hasNext)
The last line prints true, whereas for
var intList = Iterator.range(1, 10)
println("data" + intList.toList)
println(intList.hasNext)
the last line prints false.
Why does this happen? I thought intList was immutable, and I am not assigning it to any new variable.

You're right that lists are immutable in Scala. However, your intList is not a list; it's an Iterator, which uses next() to iterate and is mutable.
println("data " + intList)
This prints out a representation of the iterator. It probably says something like "non-empty iterator". All it needs to do is call hasNext, which changes nothing.
println("data " + intList.toList)
toList is a method (don't let the lack of parentheses fool you; everything you call on an object is a method in Scala), and exhausts the iterator, which means it calls next() until there's nothing left. Then your iterator is empty, so hasNext rightly tells you that there is no next value.
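To make the difference concrete, here is a minimal sketch. It assumes the iterator is built with Iterator.range(1, 10) (the constructor call in the question may differ), and the object and method names are made up for illustration.

```scala
object IteratorExhaustionSketch {
  // Returns (hasNext before toList, the collected elements, hasNext after toList).
  def demo(): (Boolean, List[Int], Boolean) = {
    val it = Iterator.range(1, 10) // yields 1 to 9; the end is exclusive
    val before = it.hasNext        // only inspects the iterator, consumes nothing
    val all = it.toList            // calls next() until the iterator is empty
    val after = it.hasNext         // nothing is left to hand out
    (before, all, after)
  }

  def main(args: Array[String]): Unit =
    println(demo())
}
```

If you need the elements more than once, one option is to convert the iterator to a List once and keep working with that List.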

Related

StringTokenizer to Scala Iterator

I am trying to convert java.util.StringTokenizer to Scala's Iterator and the following approach fails:
def toIterator(st: StringTokenizer): Iterator[String] =
  Iterator.continually(st.nextToken()).takeWhile(_ => st.hasMoreTokens())
But this works:
def toIterator(st: StringTokenizer): Iterator[String] =
Iterator.fill(st.countTokens())(st.nextToken())
You can see this in Scala console:
scala> Iterator("a b", "c d").map(new java.util.StringTokenizer(_)).flatMap(st => Iterator.continually(st.nextToken()).takeWhile(_ => st.hasMoreTokens())).toList
res1: List[String] = List(a, c)
scala> Iterator("a b", "c d").map(new java.util.StringTokenizer(_)).flatMap(st => Iterator.fill(st.countTokens())(st.nextToken())).toList
res2: List[String] = List(a, b, c, d)
As you can see, res1 is incorrect and res2 is correct. What am I doing wrong? The first version should work, and I prefer it because it is about 2x faster than the second approach: it does not scan the string twice.
takeWhile is not intended to be used statefully. It should take a pure function that determines, solely based on the input, whether or not to continue.
Specifically, the iterator must produce the value before the takeWhile predicate gets called. Even though your function ignores the takeWhile argument, it still gets evaluated. So nextToken gets called and then we check for more tokens.
To be perfectly precise, in your "a b" case:
1. First, we call nextToken, which is what Iterator.continually does. There's a next token, so it returns "a".
2. Now, to determine if we should include this token, we call your predicate with "a" as argument. Your predicate ignores "a" and calls hasMoreTokens. Our tokenizer has more tokens (namely, "b"), so it returns true. Continue.
3. Now we call nextToken again. This returns "b".
4. We need to determine if we should include this in our result, so our takeWhile predicate runs with "b" as argument. Our takeWhile predicate ignores its argument and calls hasMoreTokens. We have no more tokens, so this returns false. We should not include this element.
5. takeWhile has returned false, so we stop at the last element for which it returned true. Our resulting list is List("a").
Since we're abusing a pure functional technique like takeWhile to be stateful, we get unintuitive results.
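The order of calls described above can be observed directly. The following is only a sketch: the object name, the trace method and the log format are made up for illustration, and it simply records each call to the tokenizer in the order in which it happens.

```scala
import java.util.StringTokenizer
import scala.collection.mutable.ListBuffer

object TakeWhileTraceSketch {
  // Returns the tokens that survive takeWhile, plus a log of the tokenizer calls.
  def trace(input: String): (List[String], List[String]) = {
    val st = new StringTokenizer(input)
    val log = ListBuffer.empty[String]
    val tokens = Iterator
      .continually {
        val t = st.nextToken()          // the element is produced first
        log += s"nextToken -> $t"
        t
      }
      .takeWhile { _ =>
        val more = st.hasMoreTokens()   // the predicate runs after the element exists
        log += s"hasMoreTokens -> $more"
        more
      }
      .toList
    (tokens, log.toList)
  }

  def main(args: Array[String]): Unit = {
    val (tokens, log) = trace("a b")
    log.foreach(println)
    println(tokens)
  }
}
```

For the input "a b" the log shows that "b" is taken out of the tokenizer before the predicate rejects it, which is why it never reaches the result.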
As much as it looks snazzy and clever to have a one-line solution, what you have is a stateful, imperative object that you want to adapt to the Iterator interface. Hiding that statefulness in a bunch of pure function calls is not a good idea, so we should just write our own subclass of Iterator and do it properly.
import java.util.StringTokenizer

final class StringTokenizerIterator(
  private val tokenizer: StringTokenizer
) extends Iterator[String] {
  def hasNext: Boolean = tokenizer.hasMoreTokens
  def next(): String = tokenizer.nextToken()
}

object Example {
  def toIterator(st: StringTokenizer): Iterator[String] =
    new StringTokenizerIterator(st)

  def main(args: Array[String]): Unit = {
    println(Iterator("a b", "c d")
      .map(new java.util.StringTokenizer(_))
      .flatMap(toIterator(_))
      .toList)
  }
}
We're doing the same work you were doing calling the appropriate StringTokenizer functions, but we're doing it in a full class that encapsulates the state, rather than pretending the stateful part is not there. It's longer code, yes, but it should be longer. We don't want the messy parts of it to go unnoticed.

Can't understand Scala example

I am struggling to understand the basics of Scala, in particular this code:
trait RichIterator extends AbsIterator {
  def foreach(f: T => Unit): Unit = while (hasNext) f(next())
}
To me, foreach looks like a method that returns Unit. It takes a function f as a parameter:
f: T => Unit
which also returns Unit. The method loops while hasNext is true, and I have no idea what f(next()) does. Is this f the input parameter, and why use a function which seems to just return Unit? Can someone please help explain this to me?
The foreach function is typically used to achieve side-effects. What I mean is that when I call List(1, 2, 3).foreach(println) I want to print (which is a side-effect) all elements from the list.
I have no idea what f(next()) does.
The f(next()) means call the function f - which is the argument of the foreach function - with the next element - which is retrieved by calling next().
Why use a function that just returns Unit?
Generally in Scala you will notice that methods which have side-effects return Unit. I actually extracted a piece of scaladoc here which is from the original definition of foreach found in TraversableLike.
/**
 * @param f the function that is applied for its side-effect to every element. The result of function `f` is discarded.
 * @tparam U the type parameter describing the result of function `f`. This result will always be ignored. Typically `U` is `Unit`, but this is not necessary.
 */
def foreach[U](f: A => U): Unit
Since the result of f is always ignored, one could directly fix its return type to Unit instead of introducing another type parameter, which is what the author of your example did.
I hope this helps you :)
In this example, as you correctly noticed, the foreach function loops over all the elements (with the while and hasNext).
f is the function you got as a parameter.
You call this function with each element (which you get by calling next() in the loop).
So the function f is executed for every element.
Example:
You pass the function println.
Now you loop over every element of the Iterator. With next(), you get the next element. This is passed to your println function (that you provided as a parameter). Therefore, each element is printed.
foreach is a loop that performs some action on each element wrapped in the object. In this example the trait extends AbsIterator, so next() returns the next element of the sequence being iterated. You use foreach by passing it a function that does something with each element. Unlike map, it does not transform the elements and does not return a new collection, which is why its result type is Unit. An easy example would be a RichIterator that iterates over Strings.
Then you could have a sequence like this:
("1","2","3")
You could use foreach on it by passing f = (str => println(str))
Then it would go like this:
hasNext -> true
println("1")
hasNext -> true
println("2")
hasNext -> true
println("3")
hasNext -> false
And the output would simply be
1
2
3
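Here is a small runnable sketch of the walkthrough above. The question only shows RichIterator, so AbsIterator is assumed here to have an abstract type T plus hasNext and next(), and StringsIterator is a hypothetical concrete iterator invented for this example.

```scala
import scala.collection.mutable.ListBuffer

// Assumed shape of the base class; the question does not show its definition.
abstract class AbsIterator {
  type T
  def hasNext: Boolean
  def next(): T
}

trait RichIterator extends AbsIterator {
  // Calls f once per remaining element; nothing is returned, f runs for its side effect.
  def foreach(f: T => Unit): Unit = while (hasNext) f(next())
}

// Hypothetical iterator over a fixed sequence of strings.
class StringsIterator(items: Seq[String]) extends AbsIterator {
  type T = String
  private var i = 0
  def hasNext: Boolean = i < items.length
  def next(): String = { val s = items(i); i += 1; s }
}

object RichIteratorSketch {
  // Uses foreach with a side-effecting function that records each element.
  def collect(items: Seq[String]): List[String] = {
    val it = new StringsIterator(items) with RichIterator
    val seen = ListBuffer.empty[String]
    it.foreach(s => seen += s)
    seen.toList
  }

  def main(args: Array[String]): Unit = {
    val it = new StringsIterator(Seq("1", "2", "3")) with RichIterator
    it.foreach(println) // prints each element on its own line
  }
}
```

The function passed to foreach returns nothing useful; all it does is its side effect (printing, or appending to a buffer in collect).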

Scala: mutable HashMap does not update inside for loop

I have a var permutedTables = HashMap[List[Int], List[(String, String)]] defined globally. I first populate the HashMap with the keys in a method, which works.
print(permutedTables) :
Map(List(1,2,3,4) -> List(),
List(2,4,5,6) -> List(), etc...)
The problem occurs when I want to update the values (empty lists) of the HashMap inside a for loop (inside a second method). In other words, I want to add tuples (String, String) in the List() for each key.
for (pi_k <- permutedTables.keySet) {
  var mask = emptyMask
  mask = pi_k.foldLeft(mask)((s, i) => s.updated(i, '1'))
  val maskB = Integer.parseInt(mask, 2)
  val permutedFP = (intFP & maskB).toBinaryString
  // attempt 1 :
  // permutedTables(pi_k) :+ (url, permutedFP)
  // attempt 2 :
  // permutedTables.update(pi_k, permutedTables(pi_k):::List((url, permutedFP)))
}
The values do not update. I still have empty lists as values. I don't understand what is wrong with my code.
EDIT 1: When I call print(permutedTables) after either of the two attempts (inside the loop), the values seem updated, but when I call it outside of the loop, the lists are empty.
EDIT 2: The second attempt in my code seems to work now(!). But why does the first not work?
The second attempt in my code seems to work now(!). But why does the first not work?
Because what you do in the first case is get a list from permutedTables, add an element and throw away the result without storing it back. It would work if you mutated the value, but a List is immutable. With List, you need (the parentheses matter, because -> binds more tightly than :+)
permutedTables += pi_k -> (permutedTables(pi_k) :+ ((url, permutedFP)))
Or, as you saw, update.
You can use e.g. ArrayBuffer or ListBuffer as your value type instead (note that you need += instead of :+ to mutate them), and convert to your desired type at the end. This is going to be rather more efficient than appending to the end of the list, unless the lists are quite small!
Finally, note that you generally want either var or a mutable type, not both at the same time.
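The two options can be put side by side in a short sketch. Everything here (the object name, the method names, the Entry alias and the sample arguments) is invented for illustration and is not taken from the question's code.

```scala
import scala.collection.mutable

object AppendToMapValuesSketch {
  type Entry = (String, String)

  // Variant 1: immutable List values. `:+` builds a new list, so it has to be stored back.
  def withLists(keys: List[List[Int]], entry: Entry): Map[List[Int], List[Entry]] = {
    val tables = mutable.HashMap.empty[List[Int], List[Entry]]
    for (k <- keys) tables(k) = Nil
    for (k <- keys) {
      val discarded = tables(k) :+ entry // new list that is never stored: the map is unchanged
      tables(k) = tables(k) :+ entry     // sugar for tables.update(k, ...): the map changes
    }
    tables.toMap
  }

  // Variant 2: mutable ListBuffer values. `+=` appends in place, nothing to store back.
  def withBuffers(keys: List[List[Int]], entry: Entry): Map[List[Int], List[Entry]] = {
    val tables = mutable.HashMap.empty[List[Int], mutable.ListBuffer[Entry]]
    for (k <- keys) tables(k) = mutable.ListBuffer.empty[Entry]
    for (k <- keys) tables(k) += entry
    // Convert to the desired immutable types at the end.
    tables.map { case (k, v) => k -> v.toList }.toMap
  }
}
```

Note that in both variants the map itself is held in a val: the mutable HashMap is what changes, so no var is needed.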

What is the relation between Iterable and Iterator?

What is the difference between Iterator and Iterable in scala?
I thought that Iterable represents a set that I can iterate through, and Iterator is a "pointer" to one of the items in the iterable set.
However, Iterator has functions like foreach, map, foldLeft. It can be converted to Iterable via toIterable. And, for example, scala.io.Source.getLines returns Iterator, not Iterable.
But I cannot do groupBy on Iterator and I can do it on Iterable.
So, what's the relation between those two, Iterator and Iterable?
In short: An Iterator does have state, whereas an Iterable does not.
See the API docs for both.
Iterable:
A base trait for iterable collections.
This is a base trait for all Scala collections that define an iterator
method to step through one-by-one the collection's elements.
[...] This trait implements Iterable's foreach method by stepping
through all elements using iterator.
Iterator:
Iterators are data structures that allow to iterate over a sequence of
elements. They have a hasNext method for checking if there is a next
element available, and a next method which returns the next element
and discards it from the iterator.
An iterator is mutable: most operations on it change its state. While
it is often used to iterate through the elements of a collection, it
can also be used without being backed by any collection (see
constructors on the companion object).
With an Iterator you can stop an iteration and continue it later if you want. If you try to do this with an Iterable it will begin from the head again:
scala> val iterable: Iterable[Int] = 1 to 4
iterable: Iterable[Int] = Range(1, 2, 3, 4)
scala> iterable.take(2)
res8: Iterable[Int] = Range(1, 2)
scala> iterable.take(2)
res9: Iterable[Int] = Range(1, 2)
scala> val iterator = iterable.iterator
iterator: Iterator[Int] = non-empty iterator
scala> if (iterator.hasNext) iterator.next
res23: AnyVal = 1
scala> if (iterator.hasNext) iterator.next
res24: AnyVal = 2
scala> if (iterator.hasNext) iterator.next
res25: AnyVal = 3
scala> if (iterator.hasNext) iterator.next
res26: AnyVal = 4
scala> if (iterator.hasNext) iterator.next
res27: AnyVal = ()
Note that I didn't use take on Iterator. The reason for this is that it is tricky to use. hasNext and next are the only two methods that are guaranteed to work as expected on Iterator. See the Scaladoc again:
It is of particular importance to note that, unless stated otherwise,
one should never use an iterator after calling a method on it. The two
most important exceptions are also the sole abstract methods: next and
hasNext.
Both these methods can be called any number of times without having to
discard the iterator. Note that even hasNext may cause mutation --
such as when iterating from an input stream, where it will block until
the stream is closed or some input becomes available.
Consider this example for safe and unsafe use:
def f[A](it: Iterator[A]) = {
  if (it.hasNext) {            // Safe to reuse "it" after "hasNext"
    it.next()                  // Safe to reuse "it" after "next"
    val remainder = it.drop(2) // it is *not* safe to use "it" again after this line!
    remainder.take(2)          // it is *not* safe to use "remainder" after this line!
  } else it
}
Another explanation from Martin Odersky and Lex Spoon:
There's an important difference between the foreach method on
iterators and the same method on traversable collections: When called
to an iterator, foreach will leave the iterator at its end when it is
done. So calling next again on the same iterator will fail with a
NoSuchElementException. By contrast, when called on a collection,
foreach leaves the number of elements in the collection unchanged
(unless the passed function adds or removes elements, but this is
discouraged, because it may lead to surprising results).
Source: http://www.scala-lang.org/docu/files/collections-api/collections_43.html
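The behaviour described in that quote can be checked with a few lines. This is a minimal sketch; the object and method names are made up for illustration.

```scala
object ForeachExhaustionSketch {
  // Returns (iterator exhausted after foreach, collection size after foreach,
  //          whether next() on the exhausted iterator threw NoSuchElementException).
  def run(): (Boolean, Int, Boolean) = {
    val xs = List(1, 2, 3)
    val it = xs.iterator

    it.foreach(_ => ())          // walks the iterator to its end
    val exhausted = !it.hasNext

    xs.foreach(_ => ())          // the collection itself is left untouched
    val sizeAfter = xs.size

    val threw =
      try { it.next(); false }
      catch { case _: NoSuchElementException => true }

    (exhausted, sizeAfter, threw)
  }

  def main(args: Array[String]): Unit =
    println(run())
}
```

Calling xs.iterator again would hand out a fresh iterator that starts from the first element, which is the practical difference between the two.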
Note also (thanks to Wei-Ching Lin for this tip) that Iterator extends TraversableOnce but not Traversable, whereas Iterable extends Traversable (in the collections library before Scala 2.13).

Scala: What is the difference between Traversable and Iterable traits in Scala collections?

I have looked at this question but still don't understand the difference between Iterable and Traversable traits. Can someone explain ?
Think of it as the difference between blowing and sucking.
When you call a Traversable's foreach, or its derived methods, it will blow its values into your function one at a time - so it has control over the iteration.
With the Iterator returned by an Iterable though, you suck the values out of it, controlling when to move to the next one yourself.
To put it simply, iterators keep state, traversables don't.
A Traversable has one abstract method: foreach. When you call foreach, the collection will feed the passed function all the elements it keeps, one after the other.
On the other hand, an Iterable has as abstract method iterator, which returns an Iterator. You can call next on an Iterator to get the next element at the time of your choosing. Until you do, it has to keep track of where it was in the collection, and what's next.
tl;dr Iterables are Traversables that can produce stateful Iterators
First, know that Iterable is a subtrait of Traversable.
Second,
Traversable requires implementing the foreach method, which is used by everything else.
Iterable requires implementing the iterator method, which is used by everything else.
For example, the implementation of find for Traversable uses foreach (via a for comprehension) and throws a BreakControl exception to halt iteration once a satisfactory element has been found.
trait TraversableLike {
  def find(p: A => Boolean): Option[A] = {
    var result: Option[A] = None
    breakable {
      for (x <- this)
        if (p(x)) { result = Some(x); break }
    }
    result
  }
}
In contrast, the Iterable subtrait overrides this implementation and calls find on the Iterator, which simply stops iterating once the element is found:
trait Iterable {
  override /*TraversableLike*/ def find(p: A => Boolean): Option[A] =
    iterator.find(p)
}
trait Iterator {
  def find(p: A => Boolean): Option[A] = {
    var res: Option[A] = None
    while (res.isEmpty && hasNext) {
      val e = next()
      if (p(e)) res = Some(e)
    }
    res
  }
}
It'd be nice not to throw exceptions for Traversable iteration, but that's the only way to partially iterate when using just foreach.
From one perspective, Iterable is the more demanding/powerful trait, as you can easily implement foreach using iterator, but you can't really implement iterator using foreach.
In summary, Iterable provides a way to pause, resume, or stop iteration via a stateful Iterator. With Traversable, it's all or nothing (sans exceptions for flow control).
Most of the time it doesn't matter, and you'll want the more general interface. But if you ever need more customized control over iteration, you'll need an Iterator, which you can retrieve from an Iterable.
Daniel's answer sounds good. Let me see if I can put it in my own words.
So an Iterable can give you an iterator, that lets you traverse the elements one at a time (using next()), and stop and go as you please. To do that the iterator needs to keep an internal "pointer" to the element's position. But a Traversable gives you the method, foreach, to traverse all elements at once without stopping.
Something like Range(1, 10) needs to have only 2 integers as state as a Traversable. But Range(1, 10) as an Iterable gives you an iterator which needs to use 3 integers for state, one of which is an index.
Considering that Traversable also offers foldLeft, foldRight, its foreach needs to traverse the elements in a known and fixed order. Therefore it's possible to implement an iterator for a Traversable. E.g.
def iterator = toList.iterator
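The two directions discussed in this thread (foreach built on iterator, and iterator built on foreach via toList) can be sketched with two tiny stand-in traits. MiniTraversable, MiniIterable, CountTo, Pair and iteratorOf are all hypothetical names made up for this example; they are not the standard library traits, which have many more members.

```scala
import scala.collection.mutable.ListBuffer

// Stand-in for Traversable: the only abstract method is foreach.
trait MiniTraversable[A] {
  def foreach(f: A => Unit): Unit

  // Everything else is derived from foreach, here just toList.
  def toList: List[A] = {
    val buf = ListBuffer.empty[A]
    foreach(buf += _)
    buf.toList
  }
}

// Stand-in for Iterable: the only abstract method is iterator.
trait MiniIterable[A] extends MiniTraversable[A] {
  def iterator: Iterator[A]

  // foreach is easy to implement on top of iterator.
  def foreach(f: A => Unit): Unit = {
    val it = iterator
    while (it.hasNext) f(it.next())
  }
}

// An Iterable-style collection: only iterator has to be written.
final class CountTo(n: Int) extends MiniIterable[Int] {
  def iterator: Iterator[Int] = Iterator.range(1, n + 1)
}

// A Traversable-style collection: it can only push its elements into f.
final class Pair[A](a: A, b: A) extends MiniTraversable[A] {
  def foreach(f: A => Unit): Unit = { f(a); f(b) }
}

object MiniCollections {
  // The other direction: the whole traversal runs up front to build a List,
  // and only then can the elements be pulled out one at a time.
  def iteratorOf[A](t: MiniTraversable[A]): Iterator[A] = t.toList.iterator
}
```

The iteratorOf helper works, but it has to finish the whole traversal before the first call to next(), so it cannot pause a traversal that is still in progress. That is the sense in which iterator cannot really be implemented using foreach.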