Lazify this flatMap on a list of iterators

Given this function, which I can't modify:
def numbers(c: Char): Iterator[Int] =
if(Character.isDigit(c)) Iterator(Integer.parseInt(c.toString))
else Iterator.empty
// numbers: (c: Char)Iterator[Int]
And this input data:
val data = List('a','b','c','1','d','&','*','x','9')
// data: List[Char] = List(a, b, c, 1, d, &, *, x, 9)
How can I make this function lazy, such that data is only processed to the first occurrence of a number character?
def firstNumber(data: List[Char]) :Int = data.flatMap(numbers).take(1)

data.iterator.flatMap(numbers).take(1).toList
Don't use streams; you don't need the old data stored. Don't use views; they aren't being carefully maintained and are overkill anyway.
If you want an Int, you need some default behavior. Depending on what that is, you might choose
data.iterator.flatMap(numbers).take(1).toList.headOption.getOrElse(0)
or something like
{
val ns = data.iterator.flatMap(numbers)
if (ns.hasNext) ns.next
else throw new NoSuchElementException("Number missing")
}
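To check that the iterator version really stops at the first digit, here is a small sketch. The counter inspected, the wrapper object FirstNumberDemo and the Option-returning firstNumber are assumptions added for illustration; numbers behaves as in the question apart from the added counter.

```scala
object FirstNumberDemo {
  // Counts how many characters numbers() was asked to inspect (illustration only).
  var inspected = 0

  def numbers(c: Char): Iterator[Int] = {
    inspected += 1
    if (Character.isDigit(c)) Iterator(Integer.parseInt(c.toString))
    else Iterator.empty
  }

  val data = List('a', 'b', 'c', '1', 'd', '&', '*', 'x', '9')

  // Returns None instead of throwing when no digit is present.
  def firstNumber(data: List[Char]): Option[Int] = {
    val ns = data.iterator.flatMap(numbers)
    if (ns.hasNext) Some(ns.next()) else None
  }

  // Evaluated once, when the object is first accessed.
  val result = firstNumber(data)

  def main(args: Array[String]): Unit =
    println(s"result = $result, characters inspected = $inspected")
}
```

Only 'a', 'b', 'c' and '1' are passed to numbers; the rest of the list is never touched.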

Just calling .toStream on your data should do it:
firstNumber(data.toStream)

One possibility would be to use Scala's collection views:
http://www.scala-lang.org/docu/files/collections-api/collections_42.html
Calling .view on a collection allows you to call functions like map, flatMap etc on the collection without generating intermediate results.
So in your case you could write:
data.view.flatMap(numbers).take(1).force
which would give a List[Int] with at most one element and only process data to the first number.

you could use Streams:
http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.Stream
One way of stream creation:
1 #:: 2 #:: empty

Related

loop until a condition stands in scala

I'd like to write a generic loop until a given condition stands, in a functional way.
I've come up with the following code:
def loop[A](a: A, f: A => A, cond: A => Boolean) : A =
if (cond(a)) a else loop(f(a), f, cond)
What are the alternatives? Is there anything in scalaz?
[update] It may be possible to use cats and to convert A => A into Reader and afterwards use tailRecM. Any help would be appreciated.

I agree with @wheaties's comment, but since you asked for alternatives, here you go:
You could represent the loop's steps as an iterator, then navigate to the first step where cond is true using .find:
val result = Iterator.iterate(a)(f).find(cond).get
I had originally misread, and answered as if the cond was the "keep looping while true" condition, as with C-style loops. Here's my response as if that was what you asked.
val steps = Iterator.iterate(a)(f).takeWhile(cond)
If all you want is the last A value, you can use steps.toIterable.last (oddly, Iterator doesn't have .last defined). Or you could collect all of the values to a list using steps.toList.
Example:
val steps = Iterator.iterate(0)(_ + 1).takeWhile(_ < 10)
// remember that an iterator is read-once, so if you call .toList, you can't call .last
val result = steps.toIterable.last
// result == 9
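As a rough comparison, the recursive loop from the question and the Iterator.iterate(...).find(...) variant can be put side by side. The object name LoopDemo, the curried loopIter helper and the doubling example are assumptions made for illustration.

```scala
import scala.annotation.tailrec

object LoopDemo {
  // The question's loop, annotated so the compiler verifies it is tail-recursive.
  @tailrec
  def loop[A](a: A, f: A => A, cond: A => Boolean): A =
    if (cond(a)) a else loop(f(a), f, cond)

  // Iterator-based variant: stops at the first value for which cond holds.
  // The iterator is infinite, so find never returns None here (it either
  // finds a value or does not terminate), which is why .get is tolerable.
  def loopIter[A](a: A)(f: A => A)(cond: A => Boolean): A =
    Iterator.iterate(a)(f).find(cond).get

  // Keep doubling, starting from 1, until the value exceeds 100.
  val viaRecursion: Int = loop[Int](1, _ * 2, _ > 100)
  val viaIterator: Int = loopIter(1)(_ * 2)(_ > 100)

  def main(args: Array[String]): Unit =
    println(s"$viaRecursion $viaIterator") // prints "128 128"
}
```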

From your structure, I think what you are describing is closer to dropWhile than takeWhile. What follows is 100% educational and I don't suggest that this is useful or the proper way to solve this problem. Nevertheless, you might find it useful.
If you want to be generic over any container (List, Array, Option, etc.), you will need a method to access the first element of this container (a.k.a. the head):
trait HasHead[I[_]]{
def head[X](of: I[X]): X
}
object HasHead {
implicit val listHasHead = new HasHead[List] {
def head[X](of: List[X]) = of.head
}
implicit val arrayHasHead = new HasHead[Array] {
def head[X](of: Array[X]) = of.head
}
//...
}
Here is the generic loop adapted to work with any container:
def loop[I[_], A](
a: I[A],
f: I[A] => I[A],
cond: A => Boolean)(
implicit
hh: HasHead[I]): I[A] =
if(cond(hh.head(a))) a else loop(f(a), f, cond)
Example:
loop(List(1,2,3,4,5), (_: List[Int]).tail, (_: Int) > 2)
> List(3, 4, 5)

Scala: what is the interest in using Iterators?

I have used Iterators after working with Regexes in Scala, but I don't really understand what they are for.
I know that it has a state and if I call the next() method on it, it will output a different result every time, but I don't see anything I can do with it and that is not possible with an Iterable.
And it doesn't seem to work like Akka Streams (for example), since the following example prints all the numbers at once (without waiting one second between them, as I would expect):
lazy val a = Iterator({Thread.sleep(1000); 1}, {Thread.sleep(1000); 2}, {Thread.sleep(1000); 3})
while(a.hasNext){ println(a.next()) }
So what is the purpose of using Iterators?

Perhaps, the most useful property of iterators is that they are lazy.
Consider something like this:
(1 to 10000)
.map { x => x * x }
.map { _.toString }
.find { _ == "4" }
This snippet will square 10000 numbers, then generate 10000 strings, and then return the second one.
This on the other hand:
(1 to 10000)
.iterator
.map { x => x * x }
.map { _.toString }
.find { _ == "4" }
... only computes two squares, and generates two strings.
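The difference can be observed directly by counting how often the squaring function runs. This is only a sketch; the counters strictCalls and lazyCalls and the object name LazyMapDemo are made up for the demonstration.

```scala
object LazyMapDemo {
  // Counters recording how many squares each pipeline computes (illustration only).
  var strictCalls = 0
  var lazyCalls = 0

  // Strict: each map builds a full intermediate collection before find runs.
  val strictResult: Option[String] =
    (1 to 10000)
      .map { x => strictCalls += 1; x * x }
      .map { _.toString }
      .find { _ == "4" }

  // Lazy: elements flow through both maps one at a time, and find stops early.
  val lazyResult: Option[String] =
    (1 to 10000)
      .iterator
      .map { x => lazyCalls += 1; x * x }
      .map { _.toString }
      .find { _ == "4" }

  def main(args: Array[String]): Unit =
    println(s"strict: $strictCalls squares, lazy: $lazyCalls squares")
}
```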
Iterators are also often useful when you need to wrap around some poorly designed (java?) objects in order to be able to handle them in functional style:
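import java.sql.ResultSet
val rs: ResultSet = jdbcQuery.executeQuery()
// Caveat: hasNext advances the cursor, so call it exactly once per next()
new Iterator[ResultSet] {
  def hasNext = rs.next()
  def next() = rs
}.map { rs =>
  fetchData(rs)
}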
Streams are similar to iterators - they are also lazy, and also useful for wrapping:
Stream.continually(rs).takeWhile { _.next }.map(fetchData)
The main difference though is that streams remember the data that gets materialized, so that you can traverse them more than once. This is convenient, but may be costly if the original amount of data is very large, especially, if it gets filtered down to much smaller size:
Source
.fromFile("huge_file.txt")
.getLines
.filter(_ == "")
.toList
This only uses roughly (ignoring buffering, object overhead, and other implementation-specific details) the amount of memory necessary to keep one line in memory, plus however many empty lines there are in the file.
This on the other hand:
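// readLine lives on BufferedReader, not FileReader (both in java.io)
val reader = new java.io.BufferedReader(new java.io.FileReader("huge_file.txt"))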
Stream
.continually(reader.readLine)
.takeWhile(_ != null)
.filter(_ == "")
.toList
... will end up with the entire content of the huge_file.txt in memory.
Finally, if I understand the intent of your example correctly, here is how you could do it with iterators:
val iterator = Seq(1,2,3).iterator.map { n => Thread.sleep(1000); n }
iterator.foreach(println)
// Or while(iterator.hasNext) { println(iterator.next) } as you had it.

There is a good explanation of what an iterator is at http://www.scala-lang.org/docu/files/collections-api/collections_43.html
An iterator is not a collection, but rather a way to access the
elements of a collection one by one. The two basic operations on an
iterator it are next and hasNext. A call to it.next() will return the
next element of the iterator and advance the state of the iterator.
Calling next again on the same iterator will then yield the element
one beyond the one returned previously. If there are no more elements
to return, a call to next will throw a NoSuchElementException.

First of all you should understand what is wrong with your example:
lazy val a = Iterator({Thread.sleep(1); 1}, {Thread.sleep(1); 2}, {Thread.sleep(2); 3})
while(a.hasNext){ println(a.next()) }
If you look at the apply method of Iterator, you'll see that its parameters are not by-name, so every Thread.sleep call runs up front, when apply is called, rather than when the elements are consumed. Also, Thread.sleep takes the time to sleep in milliseconds, so to sleep for one second you should pass Thread.sleep(1000).
The companion object has additional methods which allow you to do the following:
val a = Iterator.iterate(1)(x => {Thread.sleep(1000); x+1})
Iterator is very useful when you need to work with large data. Also you can implement your own:
val it = new Iterator[Int] {
var i = -1
def hasNext = true
def next(): Int = { i += 1; i }
}
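The eager evaluation of Iterator(...) arguments can be made visible with a small log. This is a sketch, not part of the original answer: the names EagerArgsDemo, log and elem are hypothetical, and the deferred variant simply moves the work into map.

```scala
import scala.collection.mutable.ArrayBuffer

object EagerArgsDemo {
  // Records when each element is actually computed (illustration only).
  private val log = ArrayBuffer.empty[String]
  private def elem(n: Int): Int = { log += s"computed $n"; n }

  // Iterator.apply takes ordinary (strict) arguments: both are computed right here.
  val eager = Iterator(elem(1), elem(2))
  val logAfterEagerConstruction: List[String] = log.toList

  log.clear()

  // Deferring the work into map: nothing is computed until next() is called.
  val deferred = Iterator(1, 2).map(elem)
  val logAfterDeferredConstruction: List[String] = log.toList
  val first: Int = deferred.next()
  val logAfterFirstNext: List[String] = log.toList

  def main(args: Array[String]): Unit = {
    println(logAfterEagerConstruction)    // both elements already computed
    println(logAfterDeferredConstruction) // empty
    println(logAfterFirstNext)            // only the first element computed
  }
}
```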

I don't see anything I can do with it and that is not possible with an Iterable
In fact, most of what the other collections can do could also be done with an Array, but we don't do that because it would be much less convenient.
The same reasoning applies to iterators: if you want to model mutable state, an iterator makes more sense.
For example, Random is implemented in a way that resembles an iterator, because its use case fits an iterator more naturally than an iterable.

List of functions mapped on view - is the view traversed more than once?

I'm trying to wrap my head around how views are used in Scala. I have the following example code:
class Square extends (Seq[Int] => Int) {
def apply(x: Seq[Int]) = x.reduce(_ * _)
}
class Sum extends (Seq[Int] => Int) {
def apply(x: Seq[Int]) = x.reduce(_ + _)
}
val functionList = List(new Square, new Sum)
val list = List(1, 2, 3, 4, 5, 6)
val view = list.view
functionList.map(f => f(view))
My questions are:
Will the list be traversed once or twice in the above example when the map is applied?
If it is traversed more than once, is there any other design pattern that I can use that allows me to define a collection of functions and map over some other collection while only traversing the second collection once?

The list is being traversed twice. What you're essentially doing is first taking the Square function and applying a reduce on all the elements. Next the Sum function will also apply a reduce on all the elements.
A view turns the collection into a lazy one. Meaning if you have multiple transformations they will only be applied when they are needed. In this case, I don't think this has any impact on your solution, since you're applying functions that do calculations on whole collections (meaning they need to be evaluated to get the result). See this question for an answer on when to use views.
Traversing the second collection once isn't easy in this case. You have two functions that do a reduce on the whole list, meaning each one needs a separate accumulator and has a separate result. If you want to traverse your element list only once, you need to change your logic a bit. Instead of defining operations on the whole list, you need to define operations on how to combine two elements. Here's what I came up with:
val functionListWithInitialValues = List(
((a: Int, b: Int) => a * b, 1), //This is a Tuple that defines how to combine two calculations and what the initial value is.
((a: Int, b: Int) => a + b, 0)
)
val results = list.foldLeft(functionListWithInitialValues) {
case (accumulators, next) => // foldLeft gives us the previous results
// (which is essentially a Tuple(function, value)),
// and the next element, to combine with the previous result.
// Now let's go through our functions and apply the functions on the accumulator and the next element
accumulators.map {
case (function, previousResult) =>
(function, function(previousResult, next))
}
}
results.map { case (function, result) => println(result) }
This solution will only traverse your elements once, applying the combining function on the accumulator and the next element.
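When the set of reductions is small and fixed, the same single-pass idea can be sketched with a plain tuple accumulator instead of a list of functions. The object name SinglePassDemo and the product/sum pair are assumptions chosen to mirror the question's two reduce calls.

```scala
object SinglePassDemo {
  val list = List(1, 2, 3, 4, 5, 6)

  // One traversal: the accumulator carries (product so far, sum so far).
  // The initial values 1 and 0 are the identities for * and +.
  val (product, sum) = list.foldLeft((1, 0)) {
    case ((p, s), x) => (p * x, s + x)
  }

  def main(args: Array[String]): Unit =
    println(s"product = $product, sum = $sum") // product = 720, sum = 21
}
```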

Not sure what List you are talking about.
You are applying the map on an actual strict List so functionList will be traversed once, and applying 'f' to the view will reduce the List[Int] for each function in functionList.
If you want to apply your transformations lazily, then functionList should be a view, so that 'map' is not strict.

Scala: Why does SortedMap's mapValues returns a Map and not a SortedMap?

I'm new to Scala.
I'm using SortedMap in my code, and I wanted to use mapValues to create a new map with some transformation on the values.
Instead of returning a new SortedMap, the mapValues function returns a new Map, which I then have to convert to a SortedMap.
For example
val my_map = SortedMap(1 -> "one", 0 -> "zero", 2 -> "two")
val new_map = my_map.mapValues(name => name.toUpperCase)
// returns scala.collection.immutable.Map[Int,java.lang.String] = Map(0 -> ZERO, 1 -> ONE, 2 -> TWO)
val sorted_new_map = SortedMap(new_map.toArray:_ *)
This looks inefficient - the last conversion probably sorts the keys again, or at least verifies that they are sorted.
I could use the normal map function, which operates on both the keys and the values, and deliberately not change the keys in my transformation function. This looks inefficient too, since the implementation of Map probably assumes that the transformation may change the order of the keys (as in my_map.map(tup => (-tup._1, tup._2))), so it probably "re-sorts" them too.
Is anyone familiar with the internal implementations of Map and SortedMap, and could tell me if my assumptions are correct? Can the compiler recognize automatically that the keys have not been reordered? Is there an internal reason why mapValues should not return a SortedMap? Is there a better way to transform the map's values without losing the order of the keys?
Thanks

You've stumbled upon a tricky feature of Scala's Map implementation. The catch that you are missing is that mapValues does not actually return a new Map: it returns a view of a Map. In other words, it wraps your original map in such a way that whenever you access a value it will compute .toUpperCase before returning the value to you.
The upside to this behavior is that Scala won't compute the function for values that aren't accessed, and it won't spend time copying all the data into a new Map. The downside is that the function is re-computed every time that value is accessed. So you might end up doing extra computation if you access the same values many times.
So why does SortedMap not return a SortedMap? Because it's actually returning a Map-wrapper. The underlying Map, the one that is wrapped, is still a SortedMap, so if you were to iterate through it, it would still be in sorted order. You and I know that, but the type-checker doesn't. It certainly seems like they could have written it in such a way that it still maintains the SortedMap trait, but they didn't.
You can see in the code that it's not returning a SortedMap, but that the iteration behavior is still going to be sorted:
// from MapLike
override def mapValues[C](f: B => C): Map[A, C] = new DefaultMap[A, C] {
def iterator = for ((k, v) <- self.iterator) yield (k, f(v))
...
The solution to your problem is the same as the solution to getting around the view issue: use .map{ case (k,v) => (k,f(v)) }, as you mentioned in your question.
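A minimal sketch of that .map approach, reusing the data from the question; the object name MapValuesDemo and the value names are assumptions. The explicit type annotation on upper is there to show that the static type stays SortedMap.

```scala
import scala.collection.immutable.SortedMap

object MapValuesDemo {
  val myMap: SortedMap[Int, String] =
    SortedMap(1 -> "one", 0 -> "zero", 2 -> "two")

  // map over (key, value) pairs, leaving the keys alone; the result is
  // built eagerly and is still typed as a SortedMap.
  val upper: SortedMap[Int, String] =
    myMap.map { case (k, v) => (k, v.toUpperCase) }

  val keysInOrder: List[Int] = upper.keys.toList

  def main(args: Array[String]): Unit =
    println(upper.toList) // List((0,ZERO), (1,ONE), (2,TWO))
}
```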
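If you really want that convenience method though, you can do what I do, and write your own, better, version of mapValues:
// Scala 2.10-2.12 collections API; CanBuildFrom was removed in 2.13
import scala.collection.{GenTraversable, GenTraversableLike}
import scala.collection.generic.CanBuildFrom
class EnrichedWithMapVals[T, U, Repr <: GenTraversable[(T, U)]](self: GenTraversableLike[(T, U), Repr]) {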
/**
* In a collection of pairs, map a function over the second item of each
* pair. Ensures that the map is computed at call-time, and not returned
* as a view as 'Map.mapValues' would do.
*
* @param f function to map over the second item of each pair
* @return a collection of pairs
*/
def mapVals[R, That](f: U => R)(implicit bf: CanBuildFrom[Repr, (T, R), That]) = {
val b = bf(self.asInstanceOf[Repr])
b.sizeHint(self.size)
for ((k, v) <- self) b += k -> f(v)
b.result
}
}
implicit def enrichWithMapVals[T, U, Repr <: GenTraversable[(T, U)]](self: GenTraversableLike[(T, U), Repr]): EnrichedWithMapVals[T, U, Repr] =
new EnrichedWithMapVals(self)
Now when you call mapVals on a SortedMap you get back a non-view SortedMap:
scala> val m3 = m1.mapVals(_ + 1)
m3: SortedMap[String,Int] = Map(aardvark -> 2, cow -> 6, dog -> 10)
It actually works on any collection of pairs, not just Map implementations:
scala> List(('a,1),('b,2),('c,3)).mapVals(_+1)
res8: List[(Symbol, Int)] = List(('a,2), ('b,3), ('c,4))

Nearest keys in a SortedMap

Given a key k in a SortedMap, how can I efficiently find the largest key m that is less than or equal to k, and also the smallest key n that is greater than or equal to k. Thank you.

Looking at the source code for 2.9.0, the following code seems to be about the best you can do
def getLessOrEqual[A,B](sm: SortedMap[A,B], bound: A): B = {
val key = sm.to(bound).lastKey
sm(key)
}
I don't know exactly how the splitting of the RedBlack tree works, but I guess it's something like an O(log n) traversal of the tree/construction of new elements and then a balancing, presumably also O(log n). Then you need to go down the new tree again to get the last key. Unfortunately you can't retrieve the value in the same go. So you have to go down again to fetch the value.
In addition the lastKey might throw an exception and there is no similar method that returns an Option.
I'm waiting for corrections.
Edit and personal comment
The SortedMap area of the std lib seems to be a bit neglected. I'm also missing a mutable SortedMap. And looking through the sources, I noticed that there are some important methods missing (like the one the OP asks for or the ones pointed out in my answer) and also some have bad implementation, like 'last' which is defined by TraversableLike and goes through the complete tree from first to last to obtain the last element.
Edit 2
Now that the question has been reformulated, my answer is not valid anymore (well, it wasn't before anyway). I think you have to do the thing I'm describing twice, once for lessOrEqual and once for greaterOrEqual. You can take a shortcut if you find the equal element.

Scala's SortedSet trait has no method that will give you the closest element to some other element.
It is presently implemented with TreeSet, which is based on RedBlack. The RedBlack tree is not visible through the public methods of TreeSet; the tree method that exposes it is protected. Unfortunately, that access is basically useless. You'd have to override methods returning TreeSet to return your subclass, but most of them are based on newSet, which is private.
So, in the end, you'd have to duplicate most of TreeSet. On the other hand, it isn't all that much code.
Once you have access to RedBlack, you'd have to implement something similar to RedBlack.Tree's lookup, so you'd have O(log n) performance. That's actually the same complexity as range, though it would certainly do less work.
Alternatively, you'd make a zipper for the tree, so that you could actually navigate through the set in constant time. It would be a lot more work, of course.

Using Scala 2.11.7, the following will give what you want:
scala> val set = SortedSet('a', 'f', 'j', 'z')
set: scala.collection.SortedSet[Char] = TreeSet(a, f, j, z)
scala> val beforeH = set.to('h').last
beforeH: Char = f
scala> val afterH = set.from('h').head
afterH: Char = j
Generally you should use lastOption and headOption, as the specified elements may not exist. If you are looking to squeeze a little more efficiency out, you can try replacing from(...).head with keysIteratorFrom(...).next() (an Iterator has no head method).
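The Option-returning variant can be sketched as two small helpers. The names NearestDemo, floor and ceiling are hypothetical, and the sketch assumes the to/from range methods used above are available; as far as I know they still compile on Scala 2.13, but are deprecated there in favor of rangeTo/rangeFrom.

```scala
import scala.collection.immutable.SortedSet

object NearestDemo {
  val set = SortedSet('a', 'f', 'j', 'z')

  // Largest element <= x, or None if every element is greater than x.
  def floor(x: Char): Option[Char] = set.to(x).lastOption

  // Smallest element >= x, or None if every element is smaller than x.
  def ceiling(x: Char): Option[Char] = set.from(x).headOption

  def main(args: Array[String]): Unit =
    println(s"${floor('h')} ${ceiling('h')}") // Some(f) Some(j)
}
```

Both bounds are inclusive, so floor('f') and ceiling('f') both return Some('f').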

Sadly, the Scala library only allows this type of query to be made efficiently:
and also the smallest key n that is greater than or equal to k.
val n = TreeMap(...).keysIteratorFrom(k).next
You can hack this by keeping two structures, one with normal keys, and one with negated keys. Then you can use the other structure to make the second type of query.
val n = - TreeMap(...).keysIteratorFrom(-k).next

Looks like I should file a ticket to add 'fromIterator' and 'toIterator' methods to 'Sorted' trait.

Well, one option is certainly using java.util.TreeMap.
It has floorKey and ceilingKey methods, which do exactly what you want (lowerKey and higherKey are the strict variants, which exclude the key itself).
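A minimal sketch of calling these navigation methods from Scala. The object name JavaTreeMapDemo, the sample entries and the Option-wrapping helpers are assumptions; the keys are declared as java.lang.Integer so that a missing neighbor comes back as null and can be wrapped in Option.

```scala
import java.util.{TreeMap => JTreeMap}

object JavaTreeMapDemo {
  // java.lang.Integer keys: a missing key is returned as null instead of
  // being silently unboxed to 0, so Option(...) can detect it.
  val tm = new JTreeMap[Integer, String]()
  tm.put(10, "ten")
  tm.put(20, "twenty")
  tm.put(30, "thirty")

  // Largest key <= k (inclusive).
  def floor(k: Int): Option[Int] = Option(tm.floorKey(k)).map(_.intValue())
  // Smallest key >= k (inclusive).
  def ceiling(k: Int): Option[Int] = Option(tm.ceilingKey(k)).map(_.intValue())
  // Largest key strictly < k.
  def lower(k: Int): Option[Int] = Option(tm.lowerKey(k)).map(_.intValue())

  def main(args: Array[String]): Unit =
    println(s"${floor(25)} ${ceiling(25)} ${floor(20)} ${lower(20)}")
}
```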

I had a similar problem: I wanted to find the closest element to a given key in a SortedMap. I remember the answer to this question being, "You have to hack TreeSet," so when I had to implement it for a project, I found a way to wrap TreeSet without getting into its internals.
I didn't see jazmit's answer, which more closely answers the original poster's question with minimum fuss (two method calls). However, those method calls do more work than needed for this application (multiple tree traversals), and my solution provides lots of hooks where other users can modify it to their own needs.
Here it is:
import scala.collection.immutable.TreeSet
import scala.collection.SortedMap
// generalize the idea of an Ordering to metric sets
trait MetricOrdering[T] extends Ordering[T] {
def distance(x: T, y: T): Double
def compare(x: T, y: T) = {
val d = distance(x, y)
if (d > 0.0) 1
else if (d < 0.0) -1
else 0
}
}
class MetricSortedMap[A, B]
(elems: (A, B)*)
(implicit val ordering: MetricOrdering[A])
extends SortedMap[A, B] {
// while TreeSet searches for an element, keep track of the best it finds
// with *thread-safe* mutable state, of course
private val best = new java.lang.ThreadLocal[(Double, A, B)]
best.set((-1.0, null.asInstanceOf[A], null.asInstanceOf[B]))
private val ord = new MetricOrdering[(A, B)] {
def distance(x: (A, B), y: (A, B)) = {
val diff = ordering.distance(x._1, y._1)
val absdiff = Math.abs(diff)
// the "to" position is a key-null pair; the object of interest
// is the other one
if (absdiff < best.get._1)
(x, y) match {
// in practice, TreeSet always picks this first case, but that's
// insider knowledge
case ((to, null), (pos, obj)) =>
best.set((absdiff, pos, obj))
case ((pos, obj), (to, null)) =>
best.set((absdiff, pos, obj))
case _ =>
}
diff
}
}
// use a TreeSet as a backing (not TreeMap because we need to get
// the whole pair back when we query it)
private val treeSet = TreeSet[(A, B)](elems: _*)(ord)
// find the closest key and return:
// (distance to key, the key, its associated value)
def closest(to: A): (Double, A, B) = {
treeSet.headOption match {
case Some((pos, obj)) =>
best.set((ordering.distance(to, pos), pos, obj))
case None =>
throw new java.util.NoSuchElementException(
"SortedMap has no elements, and hence no closest element")
}
treeSet((to, null.asInstanceOf[B])) // called for side effects
best.get
}
// satisfy the contract (or throw UnsupportedOperationException)
def +[B1 >: B](kv: (A, B1)): SortedMap[A, B1] =
new MetricSortedMap[A, B](
elems :+ (kv._1, kv._2.asInstanceOf[B]): _*)
def -(key: A): SortedMap[A, B] =
new MetricSortedMap[A, B](elems.filter(_._1 != key): _*)
def get(key: A): Option[B] = treeSet.find(_._1 == key).map(_._2)
def iterator: Iterator[(A, B)] = treeSet.iterator
def rangeImpl(from: Option[A], until: Option[A]): SortedMap[A, B] =
new MetricSortedMap[A, B](treeSet.rangeImpl(
from.map((_, null.asInstanceOf[B])),
until.map((_, null.asInstanceOf[B]))).toSeq: _*)
}
// test it with A = Double
implicit val doubleOrdering =
new MetricOrdering[Double] {
def distance(x: Double, y: Double) = x - y
}
// and B = String
val stuff = new MetricSortedMap[Double, String](
3.3 -> "three",
1.1 -> "one",
5.5 -> "five",
4.4 -> "four",
2.2 -> "two")
println(stuff.iterator.toList)
println(stuff.closest(1.5))
println(stuff.closest(1000))
println(stuff.closest(-1000))
println(stuff.closest(3.3))
println(stuff.closest(3.4))
println(stuff.closest(3.2))

I've been doing:
val m = SortedMap(myMap.toSeq:_*)
val offsetMap = (m.toSeq zip m.keys.toSeq.drop(1)).map {
case ( (k,v),newKey) => (newKey,v)
}.toMap
When I want the results of my map off-set by one key. I'm also looking for a better way, preferably without storing an extra map.