While trying to understand the differences between streams, iterators, and views of collections, I stumbled upon the following strange behavior.
Here is the code (map and filter simply print their input and forward it unchanged):
object ArrayViewTest {
def main(args: Array[String]) {
val array = Array.range(1,10)
print("stream-map-head: ")
array.toStream.map(x => {print(x); x}).head
print("\nstream-filter-head: ")
array.toStream.filter(x => {print(x); true}).head
print("\niterator-map-head: ")
array.iterator.map(x => {print(x); x}).take(1).toArray
print("\niterator-filter-head: ")
array.iterator.filter(x => {print(x); true}).take(1).toArray
print("\nview-map-head: ")
array.view.map(x => {print(x); x}).head
print("\nview-filter-head: ")
array.view.filter(x => {print(x); true}).head
}
}
And its output:
stream-map-head: 1
stream-filter-head: 1
iterator-map-head: 1
iterator-filter-head: 1
view-map-head: 1
view-filter-head: 123456789 // <------ WHY ?
Why does filter called on a view process the whole array?
I expected head to drive the evaluation of filter only once, just as in all the other cases, and in particular just as with map on a view.
What insight am I missing?
(Mini-side-question for a comment, why is there no head on an iterator?)
edit:
The same strange behavior (as here for scala.Array.range(1,10)) also occurs with scala.collection.mutable.ArraySeq.range(1,10), scala.collection.mutable.ArrayBuffer.range(1,10), and scala.collection.mutable.StringBuilder.newBuilder.append("123456789").
However, for all other mutable collections, and all immutable collections, the filter on the view works as expected and outputs 1.
It seems the head uses isEmpty
trait IndexedSeqOptimized[+A, +Repr] extends Any with IndexedSeqLike[A, Repr] { self =>
...
override /*IterableLike*/
def head: A = if (isEmpty) super.head else this(0)
And isEmpty uses length
trait IndexedSeqOptimized[+A, +Repr] extends Any with IndexedSeqLike[A, Repr] { self =>
...
override /*IterableLike*/
def isEmpty: Boolean = { length == 0 }
The implementation of length is used from Filtered which loops through the whole array
trait Filtered extends super.Filtered with Transformed[A] {
protected[this] lazy val index = {
var len = 0
val arr = new Array[Int](self.length)
for (i <- 0 until self.length)
if (pred(self(i))) {
arr(len) = i
len += 1
}
arr take len
}
def length = index.length
def apply(idx: Int) = self(index(idx))
}
The Filtered trait is only used when calling filter
protected override def newFiltered(p: A => Boolean): Transformed[A] =
new { val pred = p } with AbstractTransformed[A] with Filtered
This is why it happens when using filter and not when using map.
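To make the mechanism concrete, here is a minimal sketch that mimics the index-building approach quoted above. FilteredIndexed is a hypothetical stand-in written for illustration, not the library's class; it only assumes that head goes through isEmpty and therefore through length. The second helper pulls one element from a filtered iterator for comparison.

```scala
object FilterHeadSketch {
  // Hypothetical stand-in for an index-based filtered view (not the library class).
  final class FilteredIndexed[A](self: IndexedSeq[A], pred: A => Boolean) {
    // Built on first use; computing it calls pred once per source element.
    private lazy val index: Vector[Int] =
      self.indices.filter(i => pred(self(i))).toVector

    def length: Int = index.length
    def isEmpty: Boolean = length == 0

    // head asks isEmpty first, which forces the whole index.
    def head: A =
      if (isEmpty) throw new NoSuchElementException("head of empty view")
      else self(index(0))
  }

  // Returns (head, number of predicate calls) for the index-based view.
  def viaIndex(xs: IndexedSeq[Int]): (Int, Int) = {
    var calls = 0
    val view = new FilteredIndexed[Int](xs, _ => { calls += 1; true })
    (view.head, calls)
  }

  // Returns (head, number of predicate calls) when pulling from an iterator.
  def viaIterator(xs: IndexedSeq[Int]): (Int, Int) = {
    var calls = 0
    val first = xs.iterator.filter(_ => { calls += 1; true }).next()
    (first, calls)
  }
}
```

With nine elements, the index-based version evaluates the predicate nine times before head can answer, while the iterator evaluates it once.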
I think it has to do with Array being a mutable indexed sequence, and its view is also a mutable collection :) So when it creates a view, it creates an index that maps between the original collection and the filtered collection. It doesn't really make sense to build this index lazily, element by element, because when someone requests the i-th element, the whole source array may have to be traversed anyway. It is still lazy in the sense that this index is not created until you call head. Still, this is not explicitly stated in the Scala documentation, and it looks like a bug at first sight.
For the mini side question, I think the problem with head on an iterator is that people expect head to be a pure function, namely that you can call it n times and get the same result each time. An iterator is an inherently mutable data structure which, by contract, is traversable only once. This could be overcome by caching the first element of the iterator, but I find that very confusing.
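For what it's worth, the caching idea is roughly what the standard library's buffered iterator does. A small sketch, assuming you are fine with wrapping the iterator first (peekTwiceThenTake is a name made up for this example):

```scala
object BufferedHeadSketch {
  // Peeks at the first element twice, then consumes it.
  def peekTwiceThenTake(xs: List[Int]): (Int, Int, Int) = {
    val it = xs.iterator.buffered // BufferedIterator caches the next element
    val first = it.head           // does not advance the iterator
    val again = it.head           // same element as before
    val taken = it.next()         // now the element is consumed
    (first, again, taken)
  }
}
```

Here head is repeatable only because the wrapper holds on to the element it has already pulled from the underlying iterator.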
Related
I am trying to find an elegant way to do:
val l = List(1,2,3)
val (item, idx) = l.zipWithIndex.find(predicate)
val updatedItem = updating(item)
l.update(idx, updatedItem)
Can I do it all in one operation? Find the item and, if it exists, replace it with the updated value, keeping it in place.
I could do:
l.map{ i =>
if (predicate(i)) {
updating(i)
} else {
i
}
}
but that's pretty ugly.
The other complication is that I want to update only the first element that matches the predicate.
Edit: Attempt:
implicit class UpdateList[A](l: List[A]) {
def filterMap(p: A => Boolean)(update: A => A): List[A] = {
l.map(a => if (p(a)) update(a) else a)
}
def updateFirst(p: A => Boolean)(update: A => A): List[A] = {
val found = l.zipWithIndex.find { case (item, _) => p(item) }
found match {
case Some((item, idx)) => l.updated(idx, update(item))
case None => l
}
}
}
I don't know any way to make this in one pass of the collection without using a mutable variable. With two passes you can do it using foldLeft as in:
def updateFirst[A](list:List[A])(predicate:A => Boolean, newValue:A):List[A] = {
list.foldLeft((List.empty[A], predicate))((acc, it) => {acc match {
case (nl,pr) => if (pr(it)) (newValue::nl, _ => false) else (it::nl, pr)
}})._1.reverse
}
The idea is that foldLeft allows passing additional data through iteration. In this particular implementation I change the predicate to the fixed one that always returns false. Unfortunately you can't build a List from the head in an efficient way so this requires another pass for reverse.
I believe it is obvious how to do it using a combination of map and a var.
Note: performance of the List.map is the same as of a single pass over the list only because internally the standard library is mutable. Particularly the cons class :: is declared as
final case class ::[B](override val head: B, private[scala] var tl: List[B]) extends List[B] {
so tl is actually a var, and this is exploited by the map implementation to build a list from the head in an efficient way. The field is private[scala], so you can't use the same trick from outside the standard library. Unfortunately, I don't see any other API call that would let you use this feature to reduce your problem to a single pass.
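For completeness, the map-plus-var combination mentioned above might look like the following sketch. It takes an update function rather than a fixed new value, and the object name is only there to keep the example self-contained.

```scala
object MapVarSketch {
  // Updates only the first element matching p, using a local mutable flag.
  def updateFirst[A](list: List[A])(p: A => Boolean)(update: A => A): List[A] = {
    var done = false // flips to true once the first match has been replaced
    list.map { a =>
      if (!done && p(a)) { done = true; update(a) }
      else a
    }
  }
}
```

The var never escapes the method, so callers still see a pure function, although map keeps walking the list after the match.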
You can avoid .zipWithIndex() by using .indexWhere().
To improve complexity, use Vector so that l(idx) becomes effectively constant time.
val l = Vector(1,2,3)
val idx = l.indexWhere(predicate)
val updatedItem = updating(l(idx))
l.updated(idx, updatedItem)
Reason for using scala.collection.immutable.Vector rather than List:
Scala's List is a linked list, which means data are accessed in O(n) time. Scala's Vector is indexed, meaning data can be read from any point in effectively constant time.
You may also consider mutable collections if you're modifying just one element in a very large collection.
https://docs.scala-lang.org/overviews/collections/performance-characteristics.html
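One caveat with the indexWhere approach: it returns -1 when nothing matches, so indexing with the result directly would throw. A hedged sketch of a guarded helper (updateFirst and its parameter names are made up for this example):

```scala
object IndexWhereSketch {
  // Replaces the first element matching p; returns the vector unchanged if none match.
  def updateFirst[A](v: Vector[A])(p: A => Boolean)(f: A => A): Vector[A] = {
    val idx = v.indexWhere(p) // -1 when no element satisfies p
    if (idx < 0) v
    else v.updated(idx, f(v(idx)))
  }
}
```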
Running the PrefixMap example from the book Programming in Scala, 3rd edition, from the chapter The Architecture of Scala Collections, I don't understand what updates the inherited Map of PrefixMap when calling update.
Here is the code:
import collection._
class PrefixMap[T]
extends mutable.Map[String, T]
with mutable.MapLike[String, T, PrefixMap[T]] {
val id: Long = PrefixMap.nextId
var suffixes: immutable.Map[Char, PrefixMap[T]] = Map.empty
var value: Option[T] = None
def get(s: String): Option[T] =
if (s.isEmpty) value
else suffixes get s(0) flatMap (_.get(s substring 1))
def withPrefix(s: String): PrefixMap[T] =
if (s.isEmpty) this
else {
val leading = s(0)
suffixes get leading match {
case None =>
suffixes = suffixes + (leading -> empty)
case _ =>
}
val ret = suffixes(leading) withPrefix (s substring 1)
println("withPrefix: ends with: id="+this.id+", size="+this.size+", this="+this)
ret
}
override def update(s: String, elem: T) = {
println("update: this before withPrefix: id="+this.id+", size="+this.size+", return="+this)
val pm = withPrefix(s)
println("update: withPrefix returned to update: id="+pm.id+", size="+pm.size+", return="+pm)
println("===> update: this after withPrefix and before assignment to pm.value : id="+this.id+", size="+this.size+", return="+this)
pm.value = Some(elem)
println("===> update: this after assinment to pm.value: id="+this.id+", size="+this.size+", return="+this)
}
override def remove(s: String): Option[T] =
if (s.isEmpty) { val prev = value; value = None; prev }
else suffixes get s(0) flatMap (_.remove(s substring 1))
def iterator: Iterator[(String, T)] =
(for (v <- value.iterator) yield ("", v)) ++
(for ((chr, m) <- suffixes.iterator;
(s, v) <- m.iterator) yield (chr +: s, v))
def += (kv: (String, T)): this.type = { update(kv._1, kv._2); this }
def -= (s: String): this.type = { remove(s); this }
override def empty = new PrefixMap[T]
}
object PrefixMap {
var ids: Long = 0
def nextId: Long = { PrefixMap.ids+=1; ids }
}
object MyApp extends App {
val pm = new PrefixMap[Int]
pm.update("a", 0)
println(pm)
}
The output is:
update: this before withPrefix: id=1, size=0, return=Map()
withPrefix: ends with: id=1, size=0, this=Map()
update: withPrefix returned to update: id=2, size=0, return=Map()
===> update: this after withPrefix and before assignment to pm.value : id=1, size=0, return=Map()
===> update: this after assinment to pm.value: id=1, size=1, return=Map(a -> 0)
Map(a -> 0)
So the question is: how is it possible that the line "pm.value = Some(elem)" in the update method causes the inherited Map of PrefixMap to be updated with (a -> 0)?
It is not clear what you mean by "inherited Map of PrefixMap". Map is a trait, which, if you are coming from the Java world, is similar to an interface. This means that Map on its own doesn't hold any values; it just specifies a contract and provides default implementations of various convenience methods in terms of the "core" methods (the ones you implement in your PrefixMap).
As to how this whole data structure works, you should imagine this PrefixMap implementation as a "tree". Logically, each edge carries a single char (of the prefix sequence), and each node potentially holds the value for the string formed by accumulating all the chars on the way from the root to that node.
So if you have a Map with "ab" -> 12 key-value, the tree will look something like this:
And if you add "ac" -> 123 to the tree, it will become
Finally if you add "a" -> 1 to the tree, it will become:
Important observation here is that if you take the "a" node as a root, what you'll be left with is a valid prefix tree with all strings shortened by that "a" prefix.
Physically the layout is a bit different:
There is the root node which is PrefixMap[T] which is Map[String,T] from the outside, and also a node for an empty string key.
Internal nodes which are value + suffixes i.e. optional value and merged list of children nodes with their corresponding characters on the edge into a Map[Char, PrefixMap[T]]
As you can see, the update implementation effectively finds a node with the withPrefix call and then assigns the value to it. So what does the withPrefix method do? Although it is implemented recursively, it might be easier to think about it iteratively. From this point of view, it iterates over the characters of the given String one by one and navigates through the tree, creating missing nodes along the way; see
case None =>
suffixes = suffixes + (leading -> empty)
and finally returns the node corresponding to the whole String (i.e. this in case the deepest recursive s.isEmpty)
Method get implementation is actually quite similar to the withPrefix: it recursively iterates over given string and navigates through the tree but it is simpler because it doesn't have to create missing nodes. Because children nodes are actually also stored in a Map its get method returns Option the same way PrefixMap should return Option. So you can just use flatMap and it will work OK if there is no such child node at some level.
Finally iterator creates its iterator as a union of
the value.iterator (luckily Option in Scala implements iterator that returns just 1 or 0 elements depending on whether there is a value or not)
all iterators of all the children nodes just adding its own character as a prefix to their keys.
So when you do
val pm = new PrefixMap[Int]
pm.update("a", 0)
println(pm)
update creates the node(s) in the tree and stores the value. And pm.toString actually uses iterator to build the string representation, so it iterates over the tree, collecting the values from the non-empty value Options in all the nodes.
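To see the same idea without the Map machinery, here is a stripped-down sketch of the tree. Node is a hypothetical class written for illustration; it keeps the same value and children fields but does not extend mutable.Map, so size and toString are not derived from iterator here.

```scala
object TrieSketch {
  // Hypothetical simplified node: an optional value plus children keyed by char.
  final class Node[T] {
    var value: Option[T] = None
    var children: Map[Char, Node[T]] = Map.empty

    // Walks down one character at a time, creating missing nodes on the way.
    def withPrefix(s: String): Node[T] =
      if (s.isEmpty) this
      else {
        val c = s.head
        val child = children.getOrElse(c, {
          val fresh = new Node[T]
          children = children + (c -> fresh)
          fresh
        })
        child.withPrefix(s.tail)
      }

    // Storing a value only touches the node found (or created) by withPrefix.
    def update(s: String, elem: T): Unit = withPrefix(s).value = Some(elem)

    def get(s: String): Option[T] =
      if (s.isEmpty) value
      else children.get(s.head).flatMap(_.get(s.tail))

    // Own value (key "") followed by all children's entries, prefixed by their char.
    def iterator: Iterator[(String, T)] =
      value.iterator.map(v => ("", v)) ++
        children.iterator.flatMap { case (c, node) =>
          node.iterator.map { case (k, v) => (s"$c$k", v) }
        }
  }
}
```

After update("a", 0) the root's own value is still None; the entry shows up only because iterator descends into the child node, which is the effect observed in the question's output.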
We can implement a queue in Java simply by using an ArrayList, but in Scala Lists are immutable, so how can I implement a queue using a List in Scala? Could somebody give me a hint?
This is from Scala's immutable Queue:
Queue is implemented as a pair of Lists, one containing the in elements and the other the out elements. Elements are added to the in list and removed from the out list. When the out list runs dry, the queue is pivoted by replacing the out list by in.reverse, and in by Nil.
So:
object Queue {
def empty[A]: Queue[A] = new Queue(Nil, Nil)
}
class Queue[A] private (in: List[A], out: List[A]) {
def isEmpty: Boolean = in.isEmpty && out.isEmpty
def push(elem: A): Queue[A] = new Queue(elem :: in, out)
def pop(): (A, Queue[A]) =
out match {
case head :: tail => (head, new Queue(in, tail))
case Nil =>
val head :: tail = in.reverse // throws exception if empty
(head, new Queue(Nil, tail))
}
}
var q = Queue.empty[Int]
(1 to 10).foreach(i => q = q.push(i))
while (!q.isEmpty) { val (i, r) = q.pop(); println(i); q = r }
With immutable Lists, you have to return a new List after any modifying operation. Once you've grasped that, it's straightforward. A minimal (but inefficient) implementation where the Queue is also immutable might be:
class Queue[T](content:List[T]) {
def pop() = new Queue(content.init)
def push(element:T) = new Queue(element::content)
def peek() = content.last
override def toString() = "Queue of:" + content.toString
}
val q= new Queue(List(1)) //> q : lists.queue.Queue[Int] = Queue of:List(1)
val r = q.push(2) //> r : lists.queue.Queue[Int] = Queue of:List(2, 1)
val s = r.peek() //> s : Int = 1
val t = r.pop() //> t : lists.queue.Queue[Int] = Queue of:List(2)
If we talk about mutable Lists, they wouldn't be an efficient structure for implementing a Queue for the following reason: Adding elements to the beginning of a list works very well (takes constant time), but popping elements off the end is not efficient at all (takes longer the more elements there are in the list).
You do, however, have Arrays in Scala. Accessing any element in an array takes constant time. Unfortunately, arrays are not dynamically sized, so they wouldn't make good queues: they cannot grow as your queue grows. However, ArrayBuffers do grow as your queue grows, so that would be a great place to start.
Also, note that Scala already has a Queue class: scala.collection.mutable.Queue.
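A short usage sketch of that built-in class, assuming plain FIFO behaviour is all you need (drain is just a helper name for this example):

```scala
import scala.collection.mutable

object MutableQueueSketch {
  // Enqueues every element, then dequeues them all, returning the dequeue order.
  def drain(xs: Seq[Int]): List[Int] = {
    val q = mutable.Queue.empty[Int]
    xs.foreach(x => q.enqueue(x))   // add at the back
    val out = List.newBuilder[Int]
    while (q.nonEmpty) out += q.dequeue() // remove from the front
    out.result()
  }
}
```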
The only way to implement a Queue with an immutable List would be to use a var. Good luck!
Much like this question:
Functional code for looping with early exit
Say the code is
def findFirst[T](objects: List[T]):T = {
for (obj <- objects) {
if (expensiveFunc(obj) != null) return /*???*/ Some(obj)
}
None
}
How to yield a single element from a for loop like this in scala?
I do not want to use find, as proposed in the original question, i am curious about if and how it could be implemented using the for loop.
* UPDATE *
First, thanks for all the comments, but i guess i was not clear in the question. I am shooting for something like this:
val seven = for {
x <- 1 to 10
if x == 7
} return x
And that does not compile. The two errors are:
- return outside method definition
- method main has return statement; needs result type
I know find() would be better in this case; I am just learning and exploring the language. And in a more complex case with several iterators, I think finding with for can actually be useful.
Thanks commenters, i'll start a bounty to make up for the bad posing of the question :)
If you want to use a for loop, which uses a nicer syntax than chained invocations of .find, .filter, etc., there is a neat trick. Instead of iterating over strict collections like list, iterate over lazy ones like iterators or streams. If you're starting with a strict collection, make it lazy with, e.g. .toIterator.
Let's see an example.
First let's define a "noisy" int, that will show us when it is invoked
def noisyInt(i : Int) = () => { println("Getting %d!".format(i)); i }
Now let's fill a list with some of these:
val l = List(1, 2, 3, 4).map(noisyInt)
We want to look for the first element which is even.
val r1 = for(e <- l; v = e() ; if v % 2 == 0) yield v
The above line results in:
Getting 1!
Getting 2!
Getting 3!
Getting 4!
r1: List[Int] = List(2, 4)
...meaning that all elements were accessed. That makes sense, given that the resulting list contains all even numbers. Let's iterate over an iterator this time:
val r2 = (for(e <- l.toIterator; v = e() ; if v % 2 == 0) yield v)
This results in:
Getting 1!
Getting 2!
r2: Iterator[Int] = non-empty iterator
Notice that the loop was executed only up to the point where it could figure out whether the result was an empty or non-empty iterator.
To get the first result, you can now simply call r2.next.
If you want a result of an Option type, use:
if(r2.hasNext) Some(r2.next) else None
Edit: Your second example in this encoding is just:
val seven = (for {
x <- (1 to 10).toIterator
if x == 7
} yield x).next
...of course, you should be sure that there is always at least one solution if you're going to use .next. Alternatively, use headOption, defined for all Traversables (for example after converting to a Stream, since Iterator itself does not provide it), to get an Option[Int].
You can turn your list into a stream, so that any filters the for-loop contains are only evaluated on demand. However, yielding from the stream will always return a stream, and what you want is, I suppose, an option. So, as a final step, you can check whether the resulting stream has at least one element and return its head as an option. The headOption function does exactly that.
def findFirst[T](objects: List[T], expensiveFunc: T => Boolean): Option[T] =
(for (obj <- objects.toStream if expensiveFunc(obj)) yield obj).headOption
Why not do exactly what you sketched above, that is, return from the loop early? If you are interested in what Scala actually does under the hood, run your code with -print. Scala desugars the loop into a foreach and then uses an exception to leave the foreach prematurely.
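A minimal sketch of that early return, assuming Scala 2 (I believe newer Scala 3 releases deprecate returning from inside a closure like this). The predicate p is a hypothetical stand-in for the expensiveFunc check in the question.

```scala
object EarlyReturnSketch {
  // Returns the first element satisfying p, leaving the loop as soon as it is found.
  def findFirst[T](objects: List[T])(p: T => Boolean): Option[T] = {
    for (obj <- objects) {
      // 'return' inside the loop body is a non-local return out of foreach
      if (p(obj)) return Some(obj)
    }
    None
  }
}
```

Note that the method needs an explicit result type for return to compile, which matches the second compiler error quoted in the question.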
So what you are trying to do is break out of a loop once your condition is satisfied. The answers to "How do I break out of a loop in Scala?" might be what you are looking for.
Overall, a for comprehension in Scala is translated into map, flatMap and filter operations (or foreach when there is no yield). So it is not possible to break out of these functions unless you throw an exception.
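As a small illustration of that translation, the two methods below should behave the same. This is a sketch of the general rule, written by hand rather than taken from compiler output; guards are translated to withFilter, a lazy variant of filter.

```scala
object DesugarSketch {
  // A for comprehension with a guard and a yield...
  def viaFor(xs: List[Int]): List[Int] =
    for (x <- xs if x % 2 == 0) yield x * 10

  // ...and the method calls it roughly corresponds to.
  def viaCalls(xs: List[Int]): List[Int] =
    xs.withFilter(x => x % 2 == 0).map(x => x * 10)
}
```

Since the loop body ends up inside a function passed to map or foreach, there is no ordinary control-flow construct that could jump out of it.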
If you are wondering, this is how find is implemented in LinearSeqOptimized.scala, which List inherits from:
override /*IterableLike*/
def find(p: A => Boolean): Option[A] = {
var these = this
while (!these.isEmpty) {
if (p(these.head)) return Some(these.head)
these = these.tail
}
None
}
This is a horrible hack. But it would get you the result you wished for.
Idiomatically you'd use a Stream or View and just compute the parts you need.
def findFirst[T](objects: List[T]): T = {
def expensiveFunc(o : T) = // unclear what should be returned here
case class MissusedException(val data: T) extends Exception
try {
(for (obj <- objects) {
if (expensiveFunc(obj) != null) throw new MissusedException(obj)
})
objects.head // T must be returned from loop, dummy
} catch {
case MissusedException(obj) => obj
}
}
Why not something like
object Main {
def main(args: Array[String]): Unit = {
val seven = (for (
x <- 1 to 10
if x == 7
) yield x).headOption
}
}
Variable seven will be an Option holding Some(value) if value satisfies condition
I hope to help you.
I think ... no 'return' impl.
object TakeWhileLoop extends App {
println("first non-null: " + func(Seq(null, null, "x", "y", "z")))
def func[T](seq: Seq[T]): T = if (seq.isEmpty) null.asInstanceOf[T] else
seq(seq.takeWhile(_ == null).size)
}
object OptionLoop extends App {
println("first non-null: " + func(Seq(null, null, "x", "y", "z")))
def func[T](seq: Seq[T], index: Int = 0): T = if (seq.isEmpty) null.asInstanceOf[T] else
Option(seq(index)) getOrElse func(seq, index + 1)
}
object WhileLoop extends App {
println("first non-null: " + func(Seq(null, null, "x", "y", "z")))
def func[T](seq: Seq[T]): T = if (seq.isEmpty) null.asInstanceOf[T] else {
var i = 0
def obj = seq(i)
while (obj == null)
i += 1
obj
}
}
objects.iterator.filter { obj => expensiveFunc(obj) != null }.next
The trick is to get a lazily evaluated view of the collection: either an iterator, a Stream, or objects.view. The filter will only execute as far as needed.
One way is this
list.distinct.size != list.size
Is there any better way? It would have been nice to have a containsDuplicates method
Assuming "better" means "faster", see the alternative approaches benchmarked in this question, which seems to show some quicker methods (although note that distinct uses a HashSet and is already O(n)). YMMV of course, depending on specific test case, scala version etc. Probably any significant improvement over the "distinct.size" approach would come from an early-out as soon as a duplicate is found, but how much of a speed-up is actually obtained would depend strongly on how common duplicates actually are in your use-case.
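A minimal sketch of such an early-out, assuming a mutable HashSet is acceptable inside the helper. It relies on add returning false when the element was already present; the object and method names are made up for this example.

```scala
import scala.collection.mutable

object EarlyOutSketch {
  // Stops at the first element that has been seen before.
  def containsDuplicates[A](xs: Iterable[A]): Boolean = {
    val seen = mutable.HashSet.empty[A]
    xs.exists(x => !seen.add(x)) // add returns false for an element already in the set
  }
}
```

Whether this beats distinct.size in practice depends on how early duplicates tend to appear in your data, as noted above.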
If you mean "better" in that you want to write list.containsDuplicates instead of containsDuplicates(list), use an implicit:
implicit def enhanceWithContainsDuplicates[T](s:List[T]) = new {
def containsDuplicates = (s.distinct.size != s.size)
}
assert(List(1,2,2,3).containsDuplicates)
assert(!List("a","b","c").containsDuplicates)
You can also write:
list.toSet.size != list.size
But the result will be the same because distinct is already implemented with a Set. In both cases the time complexity should be O(n): you must traverse the list, and Set insertion is O(1).
I think this would stop as soon as a duplicate was found and is probably more efficient than doing distinct.size - since I assume distinct keeps a set as well:
@annotation.tailrec
def containsDups[A](list: List[A], seen: Set[A] = Set[A]()): Boolean =
list match {
case x :: xs => if (seen.contains(x)) true else containsDups(xs, seen + x)
case _ => false
}
containsDups(List(1,1,2,3))
// Boolean = true
containsDups(List(1,2,3))
// Boolean = false
I realize you asked for easy, and I don't know that this version is, but finding a duplicate is also finding whether there is an element that has been seen before:
def containsDups[A](list: List[A]): Boolean = {
list.iterator.scanLeft(Set[A]())((set, a) => set + a) // incremental sets
.zip(list.iterator)
.exists{ case (set, a) => set contains a }
}
@annotation.tailrec
def containsDuplicates [T] (s: Seq[T]) : Boolean =
if (s.size < 2) false else
s.tail.contains (s.head) || containsDuplicates (s.tail)
I didn't measure this, and think it is similar to huynhjl's solution, but a bit more simple to understand.
It returns early, if a duplicate is found, so I looked into the source of Seq.contains, whether this returns early - it does.
In SeqLike, 'contains (e)' is defined as 'exists (_ == e)', and exists is defined in TraversableLike:
def exists (p: A => Boolean): Boolean = {
var result = false
breakable {
for (x <- this)
if (p (x)) { result = true; break }
}
result
}
I'm curious how to speed things up with parallel collections on multi cores, but I guess it is a general problem with early-returning, while another thread will keep running, because it doesn't know, that the solution is already found.
Summary:
I've written a very efficient function which returns both List.distinct and a List consisting of each element which appeared more than once and the index at which the element duplicate appeared.
Note: This answer is a straight copy of the answer on a related question.
Details:
If you need a bit more information about the duplicates themselves, like I did, I have written a more general function which iterates across a List (as ordering was significant) exactly once and returns a Tuple2 consisting of the original List deduped (all duplicates after the first are removed; i.e. the same as invoking distinct) and a second List showing each duplicate and an Int index at which it occurred within the original List.
Here's the function:
def filterDupes[A](items: List[A]): (List[A], List[(A, Int)]) = {
def recursive(remaining: List[A], index: Int, accumulator: (List[A], List[(A, Int)])): (List[A], List[(A, Int)]) =
if (remaining.isEmpty)
accumulator
else
recursive(
remaining.tail
, index + 1
, if (accumulator._1.contains(remaining.head))
(accumulator._1, (remaining.head, index) :: accumulator._2)
else
(remaining.head :: accumulator._1, accumulator._2)
)
val (distinct, dupes) = recursive(items, 0, (Nil, Nil))
(distinct.reverse, dupes.reverse)
}
And below is an example which might make it a bit more intuitive. Given this List of String values:
val withDupes =
List("a.b", "a.c", "b.a", "b.b", "a.c", "c.a", "a.c", "d.b", "a.b")
...and then performing the following:
val (deduped, dupeAndIndexs) =
filterDupes(withDupes)
...the results are:
deduped: List[String] = List(a.b, a.c, b.a, b.b, c.a, d.b)
dupeAndIndexs: List[(String, Int)] = List((a.c,4), (a.c,6), (a.b,8))
And if you just want the duplicates, you simply map across dupeAndIndexs and invoke distinct:
val dupesOnly =
dupeAndIndexs.map(_._1).distinct
...or all in a single call:
val dupesOnly =
filterDupes(withDupes)._2.map(_._1).distinct
...or if a Set is preferred, skip distinct and invoke toSet...
val dupesOnly2 =
dupeAndIndexs.map(_._1).toSet
...or all in a single call:
val dupesOnly2 =
filterDupes(withDupes)._2.map(_._1).toSet
This is a straight copy of the filterDupes function out of my open source Scala library, ScalaOlio. It's located at org.scalaolio.collection.immutable.List_._.
If you're trying to check for duplicates in a test then ScalaTest can be helpful.
import org.scalatest.Inspectors._
import org.scalatest.Matchers._
forEvery(list.distinct) { item =>
withClue(s"value $item, the number of occurences") {
list.count(_ == item) shouldBe 1
}
}
// example:
scala> val list = List(1,2,3,4,3,2)
list: List[Int] = List(1, 2, 3, 4, 3, 2)
scala> forEvery(list.distinct) { item => withClue(s"value $item, the number of occurences") { list.count(_ == item) shouldBe 1 } }
org.scalatest.exceptions.TestFailedException: forEvery failed, because:
at index 1, value 2, the number of occurences 2 was not equal to 1 (<console>:19),
at index 2, value 3, the number of occurences 2 was not equal to 1 (<console>:19)
in List(1, 2, 3, 4)