Will the foreach method of a Scala immutable Queue always be processed in the order one expects for a queue or is there a method that guarantees the order? Or do I have to use a loop + dequeue?
scala.collection.immutable.Queue is scala.collection.Seq. See Seq documentation:
Sequences are special cases of iterable collections of class Iterable. Unlike iterables, sequences always have a defined order of elements.
So yes, you'll get the same elements order with foreach and with loop + dequeue.
If you don't trust documentation you could take a look at implementation:
Queue#foreach is inherited from IterableLike and implemented like this:
def foreach[U](f: A => U): Unit = iterator.foreach(f)
Queue#iterator is implemented like this:
override def iterator: Iterator[A] = (out ::: in.reverse).iterator
And Queue#dequeue returns the first element of out, if any, or the last element of in. So you'll get the same elements order.
Related
So I am a bit perplexed. I have a piece of code in Scala (the exact code is not really all that important). I had all my methods written to take Seq[T]. These methods are mostly tail recursive and use the Seq[T] as an accumulator which is fed initially like Seq(). Interestingly enough, when I swap all the signature with the concrete List() implementation, I am observing three fold improvement in performance.
Isn't it the case that Seq's default implementation is in fact an immutable List ? So if that is the case, what is really going on ?
Calling Seq(1,2,3) and calling List(1,2,3) will both result in a 1 :: 2 :: 3 :: Nil. The Seq.apply method is just a very generic method that looks like this:
def apply[A](elems: A*): CC[A] = {
if (elems.isEmpty) empty[A]
else {
val b = newBuilder[A]
b ++= elems
b.result()
}
}
newBuilder is the thing that sort of matters here. That method delegates to scala.collection.immutable.Seq.newBuilder:
def newBuilder[A]: Builder[A, Seq[A]] = new mutable.ListBuffer
So the Builder for a Seq is a mutable.ListBuffer. A Seq gets constructed by appending the elements to the empty ListBuffer and then calling result on it, which is implemented like this:
def result: List[A] = toList
/** Converts this buffer to a list. Takes constant time. The buffer is
* copied lazily, the first time it is mutated.
*/
override def toList: List[A] = {
exported = !isEmpty
start
}
List also has a ListBuffer as Builder. It goes through a slightly different but similar building process. It is not going to make a big difference anyway, since I assume that most of your algorithm will consist of prepending things to a Seq, not calling Seq.apply(...) the whole time. Even if you did it shouldn't make much difference.
It's really not possible to say what is causing the behavior you're seeing without seeing the code that has that behavior.
For an immutable flavour, Iterator does the job.
val x = Iterator.fill(100000)(someFn)
Now I want to implement a mutable version of Iterator, with three guarantees:
thread-safe on all transformations(fold, foldLeft, ..) and append
lazy evaluated
traversable only once! Once used, an object from this Iterator should be destroyed.
Is there an existing implementation to give me these guarantees? Any library or framework example would be great.
Update
To illustrate the desired behaviour.
class SomeThing {}
class Test(val list: Iterator[SomeThing]) {
def add(thing: SomeThing): Test = {
new Test(list ++ Iterator(thing))
}
}
(new Test()).add(new SomeThing).add(new SomeThing);
In this example, SomeThing is an expensive construct, it needs to be lazy.
Re-iterating over list is never required, Iterator is a good fit.
This is supposed to asynchronously and lazily sequence 10 million SomeThing instances without depleting the executor(a cached thread pool executor) or running out of memory.
You don't need a mutable Iterator for this, just daisy-chain the immutable form:
class SomeThing {}
case class Test(val list: Iterator[SomeThing]) {
def add(thing: => SomeThing) = Test(list ++ Iterator(thing))
}
(new Test()).add(new SomeThing).add(new SomeThing)
Although you don't really need the extra boilerplate of Test here:
Iterator(new SomeThing) ++ Iterator(new SomeThing)
Note that Iterator.++ takes a by-name param, so the ++ operation is already lazy.
You might also want to try this, to avoid building intermediate Iterators:
Iterator.continually(new SomeThing) take 2
UPDATE
If you don't know the size in advance, then I'll often use a tactic like this:
def mkSomething = if(cond) Some(new Something) else None
Iterator.continually(mkSomething) takeWhile (_.isDefined) map { _.get }
The trick is to have your generator function wrap its output in an Option, which then gives you a way to flag that the iteration is finished by returning None
Of course... If you're really pushing out the boat, you can even use the dreaded null:
def mkSomething = if(cond) { new Something } else null
Iterator.continually(mkSomething) takeWhile (_ != null)
Seems like you need to hide the fact that the iterator is mutable but at the same time allow it to grow mutably. What I'm going to propose is the same sort of trick I've used to speed up ::: in the past:
abstract class AppendableIterator[A] extends Iterator[A]{
protected var inner: Iterator[A]
def hasNext = inner.hasNext
def next() = inner next ()
def append(that: Iterator[A]) = synchronized{
inner = new JoinedIterator(inner, that)
}
}
//You might need to add some more things, this is a skeleton
class JoinedIterator[A](first: Iterator[A], second: Iterator[A]) extends Iterator[A]{
def hasNext = first.hasNext || second.hasNext
def next() = if(first.hasNext) first next () else if(second.hasNext) second next () else Iterator.next()
}
So what you're really doing is leaving the Iterator at whatever place in its iteration you might have it while still preserving the thread safety of the append by "joining" another Iterator in non-destructively. You avoid the need to recompute the two together because you never actually force them through a CanBuildFrom.
This is also a generalization of just adding one item. You can always wrap some A in an Iterator[A] of one element if you so choose.
Have you looked at the mutable.ParIterable in the collection.parallel package?
To access an iterator over elements you can do something like
val x = ParIterable.fill(100000)(someFn).iterator
From the docs:
Parallel operations are implemented with divide and conquer style algorithms that parallelize well. The basic idea is to split the collection into smaller parts until they are small enough to be operated on sequentially.
...
The higher-order functions passed to certain operations may contain side-effects. Since implementations of bulk operations may not be sequential, this means that side-effects may not be predictable and may produce data-races, deadlocks or invalidation of state if care is not taken. It is up to the programmer to either avoid using side-effects or to use some form of synchronization when accessing mutable data.
I'm splitting a long string into an array of strings, then I want to insert all of them to the database. I can easily loop through the array and insert one by one, but it seems very inefficient. Then I think there is a insertAll() method. However, the insertAll() method is defined like this:
def insertAll(values: U*)
This only accepts multiple U, but not an Array or List.
/* Insert a new Tag */
def insert(insertTags: Array[String])(implicit s: Session) {
var insertSeq: List[Tag] = List()
for(tag <- insertTags) {
insertSeq ++= new Tag(None, tag)
}
Tag.insertAll(insertSeq)
}
*Tag is the table Object
This is the preliminary code I have written. It doesn't work because insertAll() doesn't take Seq. I hope there is a way to do this...so it won't generate an array length times SQL insertion clause.
When a function expect repeated parameters such as U* and you want to pass a sequence of U instead, it must be marked as a sequence argument, which is done with : _* as in
Tag.insertAll(insertSeq: _*)
This is described in the specification in section 6.6. It disambiguates some situations such as ;
def f(x: Any*)
f(Seq(1, "a"))
There, f could be called with a single argument, Seq(1, "a") or two, 1 and "a". It will be the former, the latter is done with f(Seq(1, "a"): _*). Python has a similar syntax.
Regarding the questions in your comment :
It works with Seq, collections that confoms to Seq, and values that can be implicitly converted to Seqs (which includes Arrays). This means a lot of collections, but not all of them. For instances, Set are not allowed (but they have a toSeq methods so it is still easy to call with a Set).
It is not a method, it it more like a type abscription. It simply tells the compiler that this the argument is the full expected Seq in itself, and not the only item in a sequence argument.
Streams can be used as class constructor arguments:
scala> ( 0 to 10).toStream.map(i =>{println("bla" + i); -i})
bla0
res0: scala.collection.immutable.Stream[Int] = Stream(0, ?)
scala> class B(val a:Seq[Int]){println(a.tail.head)}
defined class B
scala> new B(res0)
bla1
-1
res1: B = B#fdb84e
So, the Stream does not get fully evaluated although handed in as a Seq argument, and although being partly evaluated. Works as expected.
I have a class like this:
class HazelSimpleResultSet[T] (col: Seq[T], comparator:Comparator[T]) extends HGRandomAccessResult[T] with CountMe
{
val foo: Int = -1 // col of type Stream[T] already fully evaluated here.
def count = col.size
....
}
where HGRandomAccessResult and CountMe are simple interfaces.
I most cases I want to use Streams as col constructor arguments, to avoid costly operations. In the debugger I can follow that it works in some cases, since the value displayed for col remains Stream(xy, ?) and "tlVal = null", even after initialization of HazelSimpleResultSet.
Furthermore, for testing, I include println in the blocks that construct the Streams like this:
keyvalues.foldLeft(Stream.empty[KeyType]){ case (a, b) => ({ println("evaluating "+ b); unpack[KeyType](b)}) #:: a}
in order to follow in the Console when exactly the Stream is evaluted.
So, in some cases it works, but in some cases the Stream gets full evaluted during the very first moments of initialization of HazelSimpleResultSet. I cannot see no relevant difference in the Stream handed in, i'm just sure they are unevaluted Streams till that moment.
"Stepping into" with the debugger, I can see that it gets evaluted in the line of the class definition itself, before even reaching the class body, i.e. before initialization of any field.
EDIT:
I can define the class in a (suboptimal) way such that no field at all is referencing to the Stream, and still I get that behaviour.
The CountMe interface defines a "count" method, which calls col.size which would then evaluate all the Stream. I tried to define count in terms of a lazy val size, but that didn't make a difference.
I'm a bit at a loss why it doesn't work in some cases. Anybody has any hints about hidden caveats of Streams?
EDIT:
An important note: The Stream object wraps some serious state that it needs to evaluate, i.e. a reference to a NoSQL database (hazelcast).
Question: what are the caveats here? Is there something in particular I must take care of when my Stream carries stateful references necessary for evaluation?
If you create Stream like this:
Stream ({println("eval 1"); 1}, {println("eval 2"); 2})
then you are actually calling Stream.apply which is implemented like this:
/** A stream consisting of given elements */
override def apply[A](xs: A*): Stream[A] = xs.toStream
which means that what actually happens is:
All elements are evaluated!
A Seq containing these elements is created.
A Stream is created out of this Seq
So as you can see, if you create your Stream this way, all its elements are evaluated eagerly. This is not how you create lazily-evaluated Stream. What you want to do is probably use #:: and #::: operators that evaluate their operands lazily. Look up the docs for their usage.
I have looked at this question but still don't understand the difference between Iterable and Traversable traits. Can someone explain ?
Think of it as the difference between blowing and sucking.
When you have call a Traversables foreach, or its derived methods, it will blow its values into your function one at a time - so it has control over the iteration.
With the Iterator returned by an Iterable though, you suck the values out of it, controlling when to move to the next one yourself.
To put it simply, iterators keep state, traversables don't.
A Traversable has one abstract method: foreach. When you call foreach, the collection will feed the passed function all the elements it keeps, one after the other.
On the other hand, an Iterable has as abstract method iterator, which returns an Iterator. You can call next on an Iterator to get the next element at the time of your choosing. Until you do, it has to keep track of where it was in the collection, and what's next.
tl;dr Iterables are Traversables that can produce stateful Iterators
First, know that Iterable is subtrait of Traversable.
Second,
Traversable requires implementing the foreach method, which is used by everything else.
Iterable requires implementing the iterator method, which is used by everything else.
For example, the implemetation of find for Traversable uses foreach (via a for comprehension) and throws a BreakControl exception to halt iteration once a satisfactory element has been found.
trait TravserableLike {
def find(p: A => Boolean): Option[A] = {
var result: Option[A] = None
breakable {
for (x <- this)
if (p(x)) { result = Some(x); break }
}
result
}
}
In contrast, the Iterable subtract overrides this implementation and calls find on the Iterator, which simply stops iterating once the element is found:
trait Iterable {
override /*TraversableLike*/ def find(p: A => Boolean): Option[A] =
iterator.find(p)
}
trait Iterator {
def find(p: A => Boolean): Option[A] = {
var res: Option[A] = None
while (res.isEmpty && hasNext) {
val e = next()
if (p(e)) res = Some(e)
}
res
}
}
It'd be nice not to throw exceptions for Traversable iteration, but that's the only way to partially iterate when using just foreach.
From one perspective, Iterable is the more demanding/powerful trait, as you can easily implement foreach using iterator, but you can't really implement iterator using foreach.
In summary, Iterable provides a way to pause, resume, or stop iteration via a stateful Iterator. With Traversable, it's all or nothing (sans exceptions for flow control).
Most of the time it doesn't matter, and you'll want the more general interface. But if you ever need more customized control over iteration, you'll need an Iterator, which you can retrieve from an Iterable.
Daniel's answer sounds good. Let me see if I can to put it in my own words.
So an Iterable can give you an iterator, that lets you traverse the elements one at a time (using next()), and stop and go as you please. To do that the iterator needs to keep an internal "pointer" to the element's position. But a Traversable gives you the method, foreach, to traverse all elements at once without stopping.
Something like Range(1, 10) needs to have only 2 integers as state as a Traversable. But Range(1, 10) as an Iterable gives you an iterator which needs to use 3 integers for state, one of which is an index.
Considering that Traversable also offers foldLeft, foldRight, its foreach needs to traverse the elements in a known and fixed order. Therefore it's possible to implement an iterator for a Traversable. E.g.
def iterator = toList.iterator