Overriding methods map and flatMap in a class extending trait Iterator - Scala

As a Scala beginner I'm trying to count every item of an Iterator that is retrieved and processed in a for expression, and also to count how often a new iteration of one of the expression's "loops" (the outer loop and the nested loops) is started. The requirement is to accomplish this without simply scattering statements like counter = counter + 1 all over the for expression.
The following listing shows my proposed solution. I would like to know why next, which implements Iterator's abstract member, does get called (and the corresponding counter is incremented), whereas flatMap and map, which override their counterparts defined in trait Iterator (and call them via super), are not called at all.
import scala.collection.GenTraversableOnce

object ZebraPuzzle {
  var starts = 0
  var items = 0

  class InstrumentedIter[A](it: Iterator[A]) extends Iterator[A] {
    def hasNext = it.hasNext

    def next() = {
      items = items + 1
      it.next()
    }

    override def flatMap[B](f: (A) => GenTraversableOnce[B]): Iterator[B] = {
      starts = starts + 1
      super.flatMap(f)
    }

    override def map[B](f: (A) => B): Iterator[B] = {
      starts = starts + 1
      super.map(f)
    }
  } // inner class InstrumentedIter
The corresponding for expression looks like this:
  def solve = {
    val first = 1
    val middle = 3
    val houses = List(first, 2, middle, 4, 5)
    for {
      List(r, g, i, y, b) <- new InstrumentedIter(houses.permutations)
      if ...
      List(eng, span, ukr, jap, nor) <- new InstrumentedIter(houses.permutations)
      if ...
      if ...
      if ...
      List(of, tea, milk, oj, wat) <- new InstrumentedIter(houses.permutations)
      if ...
      ...
    } yield ...
    ...
  }
  ...
} // standalone singleton object ZebraPuzzle
I would be grateful if someone could give me a hint how to solve the given problem in a better way. But most of all I am interested to know why my solution of overriding Iterator's map and flatMap doesn't work as expected by my somewhat limited brain ;-)
Regards
Martin

In the meantime I managed to find the answer myself. The problem with my solution is that withFilter returns a reference to a newly created AbstractIterator, not an InstrumentedIter. As a possible solution, this reference can be passed to the constructor of a wrapper class like InstrumentedIter, which mixes in trait Iterator and overrides map and flatMap. These methods can then do the counting ...
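A minimal sketch of that idea, as a drop-in replacement for the inner class above (against the pre-2.13 Iterator API the question uses; the withFilter override is the piece that was missing):
  class InstrumentedIter[A](it: Iterator[A]) extends Iterator[A] {
    def hasNext = it.hasNext

    def next() = {
      items = items + 1
      it.next()
    }

    override def map[B](f: (A) => B): Iterator[B] = {
      starts = starts + 1
      new InstrumentedIter(super.map(f)) // re-wrap so the result stays instrumented
    }

    override def flatMap[B](f: (A) => GenTraversableOnce[B]): Iterator[B] = {
      starts = starts + 1
      new InstrumentedIter(super.flatMap(f))
    }

    // The crucial part: the `if` guards in the for expression go through
    // withFilter, whose default implementation returns a plain iterator on
    // which the overrides above would never be reached again.
    override def withFilter(p: (A) => Boolean): Iterator[A] =
      new InstrumentedIter(super.withFilter(p))
  }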
Regards
Martin

Your lines of the form
List(...) <- iterator
don't call map and flatMap directly. They call unapply in the List companion object, which deconstructs each List the iterator yields into its elements.
To call map or flatMap you need a plain generator like
item <- iterator
You need to either define a companion object for InstrumentedIter with an unapply method, or call map/flatMap explicitly instead of using the pattern syntax in your for comprehension.
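For illustration (my own sketch, not from the original thread): a generator whose left-hand side is a plain identifier does desugar straight to a map call on the instrumented iterator:
val it = new InstrumentedIter(List(1, 2, 3).iterator)
val doubled = for (p <- it) yield p * 2
// desugars to it.map(p => p * 2), so the overridden map runs and `starts` is bumped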

Related

How to program an iterator in scala without using a mutable variable?

I want to implement the Iterator trait in a functional way, i.e. without using a var. How can I do that?
Suppose I have an external library from which I get elements by calling a function getNextElements(numOfElements: Int): Array[String], and I want to implement an Iterator on top of that function without a variable holding the "current" array (in my case, the var buffer). How can I implement that in a functional way?
class MyIterator[T](fillBuffer: Int => Array[T]) extends Iterator[T] {
  // the mutable state I would like to avoid
  var buffer: List[T] = fillBuffer(10).toList

  override def hasNext: Boolean = {
    if (buffer.isEmpty) buffer = fillBuffer(10).toList
    buffer.nonEmpty
  }

  override def next(): T = {
    if (!hasNext) throw new NoSuchElementException()
    val elem: T = buffer.head
    buffer = buffer.tail
    elem
  }
}
object Main extends App {
  def getNextElements(num: Int): Array[String] = ???

  val iterator = new MyIterator[String](getNextElements)
  iterator.foreach(println)
}
Iterators are mutable, at least without an interface that also returns a state variable, so you can't in general implement the interface directly without some sort of mutation.
That being said, there are some very useful functions in the Iterator companion object that let you hide the mutation, and make the implementation cleaner. I would implement yours something like:
Iterator.continually(getNextElements(10)).flatten
This calls getNextElements(10) whenever it needs to fill the buffer. The flatten changes it from an Iterator[Array[A]] to an Iterator[A].
Note this returns an infinite iterator. Your question didn't say anything about detecting the end of your source elements, but I would usually implement that using takeWhile. For example, if getNextElements returns an empty array when there are no more elements, you can do:
Iterator.continually(getNextElements(10)).takeWhile(!_.isEmpty).flatten
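Putting it together (a sketch of mine, with a made-up stand-in for the external source that returns an empty array once exhausted):
object Demo extends App {
  // hypothetical stand-in for the external library: three batches, then empty
  private val batches = Iterator(Array("a", "b"), Array("c"), Array.empty[String])
  def getNextElements(num: Int): Array[String] =
    if (batches.hasNext) batches.next() else Array.empty

  val iterator: Iterator[String] =
    Iterator.continually(getNextElements(10)).takeWhile(_.nonEmpty).flatten

  iterator.foreach(println) // prints a, b, c
}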

Why does each new instance of a case class evaluate lazy vals again in Scala?

From what I have understood, Scala treats val definitions as values.
So any two instances of a case class with the same parameters should be equal.
But,
case class A(a: Int) {
  lazy val k = {
    println("k")
    1
  }
}
val a1 = A(5)
println(a1.k)
Output:
k
res1: Int = 1
println(a1.k)
Output:
res2: Int = 1
val a2 = A(5)
println(a2.k)
Output:
k
res3: Int = 1
I was expecting that println(a2.k) would not print k.
Since this is not the behavior I want: how should I implement this so that, across all instances of a case class with the same parameters, a lazy val definition is executed only once? Do I need some memoization technique, or can Scala handle this on its own?
I am very new to Scala and functional programming, so please excuse me if you find the question trivial.
Assuming you're not overriding equals or doing something ill-advised like making the constructor args vars, it is the case that two case class instantiations with the same constructor arguments will be equal. However, this does not mean that two case class instantiations with the same constructor arguments will point to the same object in memory:
case class A(a: Int)
A(5) == A(5) // true, same as `A(5).equals(A(5))`
A(5) eq A(5) // false
If you want the constructor to always return the same object in memory, then you'll need to handle this yourself. Maybe use some sort of factory:
case class A private (a: Int) {
  lazy val k = {
    println("k")
    1
  }
}

object A {
  private[this] val cache = collection.mutable.Map[Int, A]()

  def build(a: Int) = {
    cache.getOrElseUpdate(a, A(a))
  }
}
val x = A.build(5)
x.k // prints k
val y = A.build(5)
y.k // doesn't print anything
x == y // true
x eq y // true
If, instead, you don't care about the constructor returning the same object, but you just care about the re-evaluation of k, you can just cache that part:
case class A(a: Int) {
  lazy val k = A.kCache.getOrElseUpdate(a, {
    println("k")
    1
  })
}

object A {
  private[A] val kCache = collection.mutable.Map[Int, Int]()
}
A(5).k // prints k
A(5).k // doesn't print anything
The trivial answer is "this is what the language does according to the spec". That's correct, but not very satisfying. The more interesting question is why it does this.
It might be clearer that it has to do this with a different example:
case class A[B](b: B) {
  lazy val k = {
    println(b)
    1
  }
}
When you're constructing two A's, you can't know whether they are equal, because you haven't defined what it means for them to be equal (or what it means for two B's to be equal). And you can't statically initialize k either, as it depends on the passed-in B.
So this version has to print twice, and it would be quite unintuitive if printing once versus twice depended on whether k happens to reference b.
When you ask
how should I implement this so that for all instances of a case class with same parameters, it should only execute a lazy val definition only once
that's a trickier question than it sounds. You make "the same parameters" sound like something that can be known at compile time without further information. It isn't; you can only know it at runtime.
And if you only know that at runtime, it means you have to keep all previously constructed A[B] instances alive. That is a built-in memory leak; no wonder Scala has no built-in way to do this.
If you really want this - and think long and hard about the memory leak - construct a Map[B, A[B]], and try to get a cached instance from that map, and if it doesn't exist, construct one and put it in the map.
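A rough sketch of that map-based cache (my illustration, not the answerer's code; it assumes B has sensible equals/hashCode, and note that the cache holds every instance ever built):
import scala.collection.mutable

object ACache {
  private val cache = mutable.Map.empty[Any, A[_]]

  // return the cached instance for `b`, constructing it on first use
  def apply[B](b: B): A[B] =
    cache.getOrElseUpdate(b, A(b)).asInstanceOf[A[B]]
}

val x = ACache("hello")
val y = ACache("hello")
x eq y // true: one shared instance, so its lazy val k is evaluated only once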
I believe case classes only consider the arguments of their primary constructor (not those of any auxiliary constructor) to be part of their notion of equality. Consider that when you use a case class in a match statement, unapply only gives you access (by default) to the constructor parameters.
Consider anything in the body of a case class as "extra" or "side effect" stuff. I consider it a good tactic to make case classes as near-empty as possible and to put any custom logic in a companion object, e.g.:
case class Foo(a: Int)

object Foo {
  def apply(s: String) = Foo(s.toInt)
}
In addition to dhg's answer, I should say I'm not aware of any functional language that does full constructor memoization by default. You should understand that such memoization means every constructed instance sticks around in memory, which is not always desirable.
Manual caching is not that hard; consider this simple code:
import scala.collection.mutable

class Doubler private (a: Int) {
  lazy val double = {
    println("calculated")
    a * 2
  }
}

object Doubler {
  // WeakHashMap lets entries be garbage-collected, hence "most probably" below
  val cache = mutable.WeakHashMap.empty[Int, Doubler]
  def apply(a: Int): Doubler = cache.getOrElseUpdate(a, new Doubler(a))
}
Doubler(1).double //calculated
Doubler(5).double //calculated
Doubler(1).double //most probably not calculated

Scala: "map" vs "foreach" - is there any reason to use "foreach" in practice?

In Scala collections, if one wants to iterate over a collection (without returning results, i.e. performing a side effect on every element of the collection), it can be done either with
final def foreach(f: (A) ⇒ Unit): Unit
or
final def map[B](f: (A) ⇒ B): SomeCollectionClass[B]
With the exception of possible lazy mapping(*), from an end-user perspective, I see zero differences in these invocations:
myCollection.foreach { element =>
doStuffWithElement(element);
}
myCollection.map { element =>
doStuffWithElement(element);
}
given that I can just ignore what map outputs. I can't think of any specific reason why two different methods should exist and both be used, when map seems to include all the functionality of foreach; in fact, I would be pretty impressed if an intelligent compiler and VM didn't optimize away the creation of the result collection, given that it's never assigned, read, or used anywhere.
So, the question is: am I right, and there are no reasons to call foreach anywhere in one's code?
Notes:
(*) The lazy mapping concept, as thoroughly illustrated in this question, might change things a bit and justify the usage of foreach, but as far as I can see one specifically needs to stumble upon a LazyMap; normal (strict) collections map eagerly.
(**) If one's not using a collection but writing one, then one quickly stumbles upon the fact that the for-comprehension syntax is in fact syntactic sugar that generates a "foreach" call, i.e. these two lines generate fully equivalent code:
for (element <- myCollection) { doStuffWithElement(element); }
myCollection.foreach { element => doStuffWithElement(element); }
So if one cares about other people using that collection class with the for syntax, one might still want to implement a foreach method.
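To see (**) in action, here is a minimal sketch of my own: a class only needs a foreach method for the plain for syntax to work on it:
class Box[A](a: A) {
  def foreach(f: A => Unit): Unit = f(a) // `for (x <- box) body` becomes box.foreach(x => body)
}

for (x <- new Box(42)) println(x) // prints 42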
I can think of a couple motivations:
1. When the foreach is the last line of a method whose return type is Unit, the compiler will not warn, but with map it will (with -Ywarn-value-discard on). Sometimes you get "warning: a pure expression does nothing in statement position; you may be omitting necessary parentheses" with map but wouldn't with foreach.
2. General readability: a reader can tell at a glance that you're mutating some state without returning anything, whereas greater cognitive effort is required to understand the same operation written with map.
3. Further to 1, you also get type checking when passing named functions around into map and foreach.
4. Using foreach won't build a new collection, so it is more efficient (thanks @Vishnu).
scala> (1 to 5).iterator map println
res0: Iterator[Unit] = non-empty iterator
scala> (1 to 5).iterator foreach println
1
2
3
4
5
I'd be impressed if the builder machinery could be optimized away.
scala> :pa
// Entering paste mode (ctrl-D to finish)

implicit val cbf = new collection.generic.CanBuildFrom[List[Int], Int, List[Int]] {
  def apply() = new collection.mutable.Builder[Int, List[Int]] {
    val b = new collection.mutable.ListBuffer[Int]
    override def +=(i: Int) = { println(s"Adding $i"); b += i; this }
    override def clear() = ()
    override def result() = b.result()
  }
  def apply(from: List[Int]) = apply()
}

// Exiting paste mode, now interpreting.

cbf: scala.collection.generic.CanBuildFrom[List[Int],Int,List[Int]] = $anon$2@e3cee7b
scala> List(1,2,3) map (_ + 1)
Adding 2
Adding 3
Adding 4
res1: List[Int] = List(2, 3, 4)
scala> List(1,2,3) foreach (_ + 1)

Extending Scala collections: One based Array index exercise

As an exercise, I'd like to extend the Scala Array collection to my own OneBasedArray (does what you'd expect, indexing starts from 1). Since this is an immutable collection, I'd like to have it return the correct type when calling filter/map etc.
I've read the resources here, here and here, but am struggling to understand how to translate this to Arrays (or collections other than the ones in the examples). Am I on the right track with this sort of structure?
class OneBasedArray[T]
  extends Array[T]
  with GenericTraversableTemplate[T, OneBasedArray]
  with ArrayLike[T, OneBasedArray]
Are there any further resources that help explain extending collections?
For an in-depth overview of the new collections API: The Scala 2.8 Collections API
For a nice view of the relations between the main classes and traits, see this.
By the way I don't think Array is a collection in Scala.
Here is an example of pimping iterables with a method that always returns the expected runtime type of the iterable it operates on:
import scala.collection.generic.CanBuildFrom

trait MapOrElse[A] {
  val underlying: Iterable[A]

  def mapOrElse[B, To]
      (m: A => Unit)
      (pf: PartialFunction[A, B])
      (implicit cbf: CanBuildFrom[Iterable[A], B, To]): To = {
    val builder = cbf(underlying.repr)
    for (a <- underlying) if (pf.isDefinedAt(a)) builder += pf(a) else m(a)
    builder.result
  }
}

implicit def toMapOrElse[A](it: Iterable[A]): MapOrElse[A] =
  new MapOrElse[A] { val underlying = it }
The new function mapOrElse is similar to the collect function, but it additionally lets you pass a method m: A => Unit that is invoked whenever pf is undefined. m can, for example, be a logging method.
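A possible usage (my own example, assuming the pre-2.13 CanBuildFrom machinery above): double the even numbers and log the ones the partial function skips:
val doubledEvens = List(1, 2, 3, 4).mapOrElse(a => println(s"skipped $a")) {
  case a if a % 2 == 0 => a * 2
}
// prints "skipped 1" and "skipped 3"; the result contains 4 and 8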
An Array is not a Traversable -- trying to work with that as a base class will cause all sorts of problems. Also, it is not immutable either, which makes it completely unsuited to what you want. Finally, Array is an implementation -- try to inherit from traits or abstract classes.
Array isn't a typical Scala collection... It's simply a Java array that's pimped to look like a collection by way of implicit conversions.
Given the messed-up variance of Java Arrays, you really don't want to be using them without an extremely compelling reason, as they're a source of lurking bugs.
(see here: http://www.infoq.com/presentations/Java-Puzzlers)
Creating a 1-based collection like this isn't really a good idea either, as you have no way of knowing how many other collection methods rely on the assumption that sequences are 0-based. So to do it safely (if you really must), you'll want to add a new method that leaves the default one unchanged:
class OneBasedLookup[T](seq: Seq[T]) {
  def atIdx(i: Int): T = seq(i - 1)
}

implicit def seqHasOneBasedLookup[T](seq: Seq[T]): OneBasedLookup[T] =
  new OneBasedLookup(seq)

// now use `atIdx` on any sequence.
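For example (my illustration):
Seq("a", "b", "c").atIdx(1) // "a", via the implicit conversion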
Even safer still, you can create a Map[Int,T], with the indices being one-based
(Iterator.from(1) zip seq).toMap
This is arguably the most "correct" solution, although it will also carry the highest performance cost.
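For instance (my example):
val seq = Seq("a", "b", "c")
val oneBased = (Iterator.from(1) zip seq.iterator).toMap
oneBased(1) // "a"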
Not an array, but here's a one-based immutable IndexedSeq implementation that I recently put together. I followed the example given here where they implement an RNA class. Between that example, the ScalaDocs, and lots of "helpful" compiler errors, I managed to get it set up correctly. The fact that OneBasedSeq is genericized made it a little more complex than the RNA example. Also, in addition to the traits extended and methods overridden in the example, I had to extend IterableLike and override the iterator method, because various methods call that method behind the scenes, and the default iterator is zero-based.
Please pardon any stylistic or idiomatic oddities; I've been programming in Scala for less than 2 months.
import collection.{IndexedSeqLike, IterableLike}
import collection.generic.CanBuildFrom
import collection.mutable.{Builder, ArrayBuffer}

// OneBasedSeq class
final class OneBasedSeq[T] private (s: Seq[T]) extends IndexedSeq[T]
    with IterableLike[T, OneBasedSeq[T]] with IndexedSeqLike[T, OneBasedSeq[T]] {

  private val innerSeq = s.toIndexedSeq

  def apply(idx: Int): T = innerSeq(idx - 1)
  def length: Int = innerSeq.length

  override def iterator: Iterator[T] = new OneBasedSeqIterator(this)
  override def newBuilder: Builder[T, OneBasedSeq[T]] = OneBasedSeq.newBuilder
  override def toString = "OneBasedSeq" + super.toString
}
// OneBasedSeq companion object
object OneBasedSeq {
  private def fromSeq[T](s: Seq[T]) = new OneBasedSeq(s)

  def apply[T](vals: T*) = fromSeq(IndexedSeq(vals: _*))

  def newBuilder[T]: Builder[T, OneBasedSeq[T]] =
    (new ArrayBuffer[T]).mapResult(OneBasedSeq.fromSeq)

  implicit def canBuildFrom[T, U]: CanBuildFrom[OneBasedSeq[T], U, OneBasedSeq[U]] =
    new CanBuildFrom[OneBasedSeq[T], U, OneBasedSeq[U]] {
      def apply() = newBuilder[U]
      def apply(from: OneBasedSeq[T]): Builder[U, OneBasedSeq[U]] = newBuilder[U]
    }
}
// Iterator class for OneBasedSeq
class OneBasedSeqIterator[T](private val obs: OneBasedSeq[T]) extends Iterator[T] {
  private var index = 1

  def hasNext: Boolean = index <= obs.length

  def next(): T = {
    val ret = obs(index)
    index += 1
    ret
  }
}
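A quick check of the intended behavior (my example, against the pre-2.13 collections API used above):
val obs = OneBasedSeq("a", "b", "c")
obs(1)                 // "a": indexing starts at 1
obs.map(_.toUpperCase) // another OneBasedSeq: the implicit CanBuildFrom preserves the type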

Can I overload parentheses in Scala?

Trying to figure out how to overload parentheses on a class.
I have this code:
class App(values: Map[String, String]) {
  // do stuff
}
I would like to be able to access the values Map this way:
var a = new App(Map("1" -> "2"))
a("1") // same as a.values("1")
Is this possible?
You need to define an apply method.
class App(values: Map[String, String]) {
  def apply(x: String) = values(x)
  // ...
}
For completeness, it should be said that your apply can take multiple values, and that update works as the dual of apply, allowing "parentheses overloading" on the left-hand side of assignments:
import scala.collection.mutable

class PairMap[A, B, C] {
  val contents: mutable.Map[(A, B), C] = mutable.Map.empty[(A, B), C]

  def apply(a: A, b: B): C = contents((a, b))
  def update(a: A, b: B, c: C): Unit = contents.put((a, b), c)
}
val foo = new PairMap[String, Int, Int]()
foo("bar", 42) = 6
println(foo("bar", 42)) // prints 6
The primary value of all this is that it removes the need for extra syntax for things that had to be special-cased in earlier C-family languages (e.g. array element fetch and assignment). It's also handy for factory methods on companion objects. Other than that, care should be taken, as it's one of those features that can easily make your code too compact to actually be readable.
As others have already noted, you want to overload apply:
class App(values: Map[String, String]) {
  def apply(s: String) = values(s)
}
While you're at it, you might want to overload the companion object apply also:
object App {
  def apply(m: Map[String, String]) = new App(m)
}
Then you can:
scala> App(Map("1" -> "2")) // Didn't need to call new!
res0: App = App@5c66b06b

scala> res0("1")
res1: String = 2
though whether this is a benefit or a confusion will depend on what you're trying to do.
I think it works using apply: How does Scala's apply() method magic work?