Purpose of toIterable - scala

In Scala Exercises I found the following example:
val set = Set(4, 6, 7, 8, 9, 13, 14)
val result = set.toIterable
with the following description:
toIterable will convert any Traversable to an Iterable. This is a base trait for all Scala collections that define an iterator method to iterate through the collection's elements
But Set is already an Iterable, so what's the point of this method? If this isn't a valid use case, could you point me to one?

In Scala 2.13 there is no more Traversable:
Simpler type hierarchy
No more Traversable and TraversableOnce. They remain only as
deprecated aliases for Iterable and IterableOnce.
Calling toIterable on a Set is redundant, as it will simply return the same collection:
This collection as an Iterable[A]. No new collection will be built if
this is already an Iterable[A].
Examples where toIterable would have an effect would be
"Hello".toIterable
Array(1).toIterable
which are implicitly converted to
wrapString("Hello").toIterable
wrapIntArray(Array(1)).toIterable
and so turn these Java-like types into proper Scala collections.
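A small sketch of both cases, for what it's worth. The object and value names are made up for illustration, and the String and Array lines use a type ascription rather than toIterable so that the implicit wrapping is visible on its own (toIterable is also deprecated in later 2.13 releases):

```scala
object ToIterableDemo {
  val set: Set[Int] = Set(4, 6, 7, 8, 9, 13, 14)

  // For a Set, toIterable hands back the very same instance.
  val sameInstance: Boolean = set.toIterable eq set

  // String and Array are not Scala collections, so an implicit wrapper
  // from Predef (wrapString / wrapIntArray) is applied first.
  val chars: Iterable[Char] = "Hello"
  val ints: Iterable[Int] = Array(1)
}
```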

In addition to Mario Galic's answer, the other thing it does is change the static type. If you and the compiler knew it was a Set before the call, you don't know afterwards. Though the same can be achieved with a type ascription
val result: Iterable[Int] = set
(and this will work for strings and arrays as well), but then you need to write out the type parameter, which may be much more complex than Int.
Why would I use it? If I know it's a Set, why would I change the type to Iterable?
It can be in a method which can be overridden and doesn't have to return Set in subclasses:
class Super {
  def someValues = {
    val set = ... // you want to avoid duplicates
    set
  }
}
class Sub extends Super {
  override def someValues = {
    List(...) // happens to have duplicates this time
  }
}
doesn't compile, but would if Super#someValues returned set.toIterable (though it's generally good practice to have explicit return types).
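A compilable sketch of the explicit-return-type variant, with made-up element values standing in for the elided ones:

```scala
class Super {
  // Explicit, widened return type: subclasses may return any Iterable.
  def someValues: Iterable[Int] = Set(1, 2, 2, 3) // the duplicate 2 is dropped
}

class Sub extends Super {
  // A List is fine here because the declared type is Iterable[Int], not Set[Int].
  override def someValues: Iterable[Int] = List(1, 1, 2) // duplicates kept
}
```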
It can influence later inferred types:
val arr = Array(set)
arr(0) = List(0, 1, 2, 3)
doesn't compile, but would with Array(set.toIterable).
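The same widening can be sketched with a type ascription on the element, which avoids calling toIterable at all. The names here are illustrative only:

```scala
object ArrayWideningDemo {
  val set: Set[Int] = Set(4, 6, 7)

  // Array(set) would be inferred as Array[Set[Int]]; ascribing the element
  // makes the inferred type Array[Iterable[Int]] instead.
  val arr = Array(set: Iterable[Int])

  // Now any Iterable[Int] can be stored, including a List.
  arr(0) = List(0, 1, 2, 3)
}
```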

Related

Scala : Does variable type inference affect performance?

In Scala, you can declare a variable by specifying the type, like this: (method 1)
var x : String = "Hello World"
or you can let Scala automatically detect the variable type (method 2)
var x = "Hello World"
Why would you use method 1? Does it have a performance benefit?
And once the variable has been declared, will it behave exactly the same in all situations whether it has been declared by method 1 or method 2?
Type inference is done at compile time - it's essentially the compiler figuring out what you mean, filling in the blanks, and then compiling the resulting code.
What this means is that there can be no runtime cost to type inference. The compile time cost, however, can sometimes be prohibitive and require you to explicitly annotate some of your expressions.
You will not see any performance difference between these two variants.
They will both be compiled to the same code.
The other answers assume that the compiler inferred what you think it inferred.
It is easy to demonstrate that specifying the type in a definition will set the expected type for the RHS of the definition and guide type inference.
For example, in this method that builds a collection of something, A is inferred to be Nothing, which may not be what you wanted:
scala> def build[A, B, C <: Iterable[B]](bs: B*)(implicit cbf: CanBuildFrom[A, B, C]): C = {
| val b = cbf(); println(b.getClass); b ++= bs; b.result }
build: [A, B, C <: Iterable[B]](bs: B*)(implicit cbf: scala.collection.generic.CanBuildFrom[A,B,C])C
scala> val xs = build(1,2,3)
class scala.collection.immutable.VectorBuilder
xs: scala.collection.immutable.IndexedSeq[Int] = Vector(1, 2, 3)
scala> val xs: List[Int] = build(1,2,3)
class scala.collection.mutable.ListBuffer
xs: List[Int] = List(1, 2, 3)
scala> val xs: Seq[Int] = build(1,2,3)
class scala.collection.immutable.VectorBuilder
xs: Seq[Int] = Vector(1, 2, 3)
Obviously, it matters for runtime performance whether you get a List or a Vector.
This is a lame example, but in many expressions you wouldn't notice the type of an intermediate collection unless it caused a performance problem.
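The CanBuildFrom machinery above is specific to the pre-2.13 collections. A simpler sketch of the same effect (the annotation becoming the expected type of the right-hand side) that should not depend on the collections version, with hypothetical names:

```scala
import scala.collection.mutable.ListBuffer

object ExpectedTypeDemo {
  // No expected type: the element type is inferred as Nothing,
  // so `untyped += 1` would be rejected by the compiler.
  val untyped = ListBuffer.empty

  // The annotation is the expected type of the right-hand side,
  // so the type parameter of `empty` is inferred as Int.
  val typed: ListBuffer[Int] = ListBuffer.empty
  typed += 1
}
```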
Sample conversations:
https://groups.google.com/forum/#!msg/scala-language/mQ-bIXbC1zs/wgSD4Up5gYMJ
http://grokbase.com/p/gg/scala-user/137mgpjg98/another-funny-quirk
Why is Seq.newBuilder returning a ListBuffer?
https://groups.google.com/forum/#!topic/scala-user/1SjYq_qFuKk
In the simple example you gave, there is no difference in the generated byte code, and therefore no difference in performance. It would also make no noticeable difference in compilation speed.
In more complex code (likely involving implicits) you could run into cases where compile-time performance would be noticeably improved by specifying some types. However, I would completely ignore this until and unless you run into it -- specify types or not for other, better reasons.
More in line with your question, there is one very important case where it is a good idea to specify the type to ensure good run-time performance. Consider this code:
val x = new AnyRef { def sayHi() = println("Howdy!") }
x.sayHi
That code uses reflection to call sayHi, and that's a huge performance hit. Recent versions of Scala will warn you about this code for that reason, unless you have enabled the language feature for it:
warning: reflective access of structural type member method sayHi should be enabled
by making the implicit value scala.language.reflectiveCalls visible.
This can be achieved by adding the import clause 'import scala.language.reflectiveCalls'
or by setting the compiler option -language:reflectiveCalls.
See the Scala docs for value scala.language.reflectiveCalls for a discussion
why the feature should be explicitly enabled.
You might then change the code to this, which does not make use of reflection:
trait Talkative extends AnyRef { def sayHi(): Unit }
val x = new Talkative { def sayHi() = println("Howdy!") }
x.sayHi
For this reason you generally want to specify the type of the variable when you are defining classes this way; that way if you inadvertently add a method that would require reflection to call, you'll get a compilation error -- the method won't be defined for the variable's type. So while it is not the case that specifying the type makes the code run faster, it is the case that if the code would be slow, specifying the type makes it fail to compile.
val x: AnyRef = new AnyRef { def sayHi() = println("Howdy!") }
x.sayHi // ERROR: sayHi is not defined on AnyRef
There are of course other reasons why you might want to specify a type. They are required for the formal parameters of methods/functions, and for the return types of methods that are recursive or overloaded.
Also, you should always specify return types for methods in a public API (unless they are just trivially obvious), or you might end up with different method signatures than you intended, and then risk breaking existing clients of your API when you fix the signature.
You may of course want to deliberately widen a type so that you can assign other types of things to a variable later, e.g.
var shape: Shape = new Circle(1.0)
shape = new Square(1.0)
But in these cases there is no performance impact.
It is also possible that specifying a type will cause a conversion, and of course that will have whatever performance impact the conversion imposes.
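Two conversions of that kind, as a rough sketch (the value names are made up): a numeric widening and an implicit wrapper from Predef.

```scala
object ConversionDemo {
  val i = 42

  // Annotating Long triggers a numeric widening of the Int value.
  val widened: Long = i

  // Annotating Seq[Char] triggers the implicit wrapString conversion,
  // which allocates a wrapper object around the String.
  val wrapped: Seq[Char] = "abc"
}
```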

Varargs with different type parameters in scala

I'm new to Scala...
Anyway, I want to do something like:
val bar = new Foo("a" -> List[Int](1), "b" -> List[String]("2"), ...)
bar("a") // gives List[Int] containing 1
bar("b") // gives List[String] containing "2"
The problem when I do:
class Foo(pairs: (String, List[_])*) {
def apply(name: String): List[_] = pairs.toMap(name)
}
pairs is gonna be Array[(String, List[Any])] (or something like that) and apply() is wrong anyway since List[_] is one type instead of "different types". Even if the varargs * returned a tuple I'm still not sure how I'd go about getting bar("a") to return a List[OriginalTypePassedIn]. So is there actually a way of doing this? Scala seems pretty flexible so it feels like there should be some advanced way of doing this.
No.
That's just the nature of static type systems: a method has a fixed return type. It cannot depend on the values of the method's parameters, because the parameters are not known at compile time. Suppose you have bar, which is an instance of Foo, and you don't know anything about how it was instantiated. You call bar("a"). You will get back an instance of the correct type, but since that type isn't determined until runtime, there's no way for a compiler to know it.
Scala does, however, give you a convenient syntax for subtyping Foo:
object bar extends Foo {
val a = List[Int](1)
val b = List[String]("2")
}
This can't be done. Consider this:
val key = readStringFromUser();
val value = bar(key);
what would be the type of value? It would depend on what the user has input. But types are static, they're determined and used at compile time.
So you'll either have to use a fixed number of arguments for which you know their types at compile time, or use a generic vararg and do type casts during runtime.
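The "generic vararg plus runtime cast" option could look roughly like this. It is only a sketch: the type parameter on apply is my own addition, and the cast is unchecked, so it moves the responsibility for getting the type right onto the caller.

```scala
class Foo(pairs: (String, List[Any])*) {
  private val byName: Map[String, List[Any]] = pairs.toMap

  // The caller states the element type. Because of erasure the cast is
  // not verified here; a wrong A only fails later, when an element is used.
  def apply[A](name: String): List[A] = byName(name).asInstanceOf[List[A]]
}

object FooDemo {
  val bar = new Foo("a" -> List(1), "b" -> List("2"))

  // A is inferred from the annotated type of each val.
  val as: List[Int] = bar("a")
  val bs: List[String] = bar("b")
}
```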

differences between List(), Array() and new List(), new Array() in Scala

I know that when you type:
val list = List(2,3)
you are accessing the apply method of the List object, which returns a List. What I can't understand is why this is possible when the List class is abstract and therefore cannot be directly instantiated (new List() won't compile).
I'd also like to ask what is the difference between:
val arr = Array(4,5,6)
and
val arr = new Array(4, 5, 6)
The List class is sealed and abstract. It has two concrete implementations:
Nil, which represents an empty list
::[B], which represents a non-empty list with a head and a tail
When you call List.apply it will jump through some hoops and supply you with an instance of the ::[B] case class.
About array: new Array(4, 5, 6) will throw a compile error as the constructor of array is defined like this: new Array(_length: Int). The apply method of the Array companion object uses the arguments to create a new instance of an Array (with the help of ArrayBuilder).
I started writing that the easy way to determine this is to look at the sources for the methods you're calling, which are available from the ScalaDoc. However, the various levels of indirection that are gone through to actually build a list give the lie to the term 'easy'! It's worth having a look through if you want, starting from the apply method in the List object, which is defined as follows:
override def apply[A](xs: A*): List[A] = xs.toList
You may or may not know that a parameter of the form xs: A* is treated internally as a Seq, which means that we're calling the toList method on a Seq, which is defined in TraversableOnce. This then delegates to a generic to method, which looks for an implicit CanBuildFrom which actually constructs the list. So what you're getting back is some implementation of List which is chosen by the CanBuildFrom. What you actually get is a scala.collection.immutable.$colon$colon, which implements a singly-linked list.
Luckily, the behaviour of Array.apply is a little easier to look up:
def apply[T: ClassTag](xs: T*): Array[T] = {
val array = new Array[T](xs.length)
var i = 0
for (x <- xs.iterator) { array(i) = x; i += 1 }
array
}
So, Array.apply just delegates to new Array and then sets elements appropriately.
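To make the difference concrete, here is a short sketch (names are illustrative) contrasting the companion apply methods with the Array constructor, which takes a length rather than elements:

```scala
object ApplyVsNewDemo {
  // List.apply on the companion object; the runtime class is the cons cell ::
  val list = List(2, 3)
  val isCons: Boolean = list.isInstanceOf[::[_]]

  // Array.apply builds an array holding exactly these elements.
  val viaApply = Array(4, 5, 6)

  // The constructor takes a single length; elements start at the default 0.
  val viaNew = new Array[Int](3)
}
```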

Why does Scala maintain the type of collection not return Iterable (as in .Net)?

In Scala, you can do
val l = List(1, 2, 3)
l.filter(_ > 2) // returns a List[Int]
val s = Set("hello", "world")
s.map(_.length) // returns a Set[Int]
The question is: why is this useful?
Scala collections are probably the only existing collection framework that does this. The Scala community seems to agree that this functionality is needed. Yet no one seems to miss this functionality in other languages. Example in C# (naming modified to match Scala's):
var l = new List<int> { 1, 2, 3 }
l.filter(i => i > 2) // always returns Iterable[Int]
l.filter(i => i > 2).toList // if I want a List, no problem
l.filter(i => i > 2).toSet // or I want a Set
In .NET, I always get back an Iterable and it is up to me what I want to do with it. (This also makes .NET collections very simple.)
The Scala example with Set forces me to make a Set of lengths out of a Set of strings. But what if I just want to iterate over the lengths, or construct a List of lengths, or keep the Iterable to filter it later? Constructing a Set right away seems pointless. (EDIT: collection.view provides the simpler .NET functionality, nice)
I am sure you will show me examples where the .NET approach is absolutely wrong or kills performance, but I just can't see any (using .NET for years).
Not a full answer to your question, but Scala never forces you to use one collection type over another. You're free to write code like this:
import collection._
import immutable._
val s = Set("hello", "world")
val l: Vector[Int] = s.map(_.length)(breakOut)
Read more about breakOut in Daniel Sobral's detailed answer to another question.
If you want your map or filter to be evaluated lazily, use this:
s.view.map(_.length)
This whole behavior makes it easy to integrate your new collection classes and inherit all the powerful capabilities of the standard collections with no code duplication, all of this ensuring that YourSpecialCollection#filter returns an instance of YourSpecialCollection; that YourSpecialCollection#map returns an instance of YourSpecialCollection if it supports the type being mapped to, or a built-in fallback collection if it doesn't (like what happens if you call map on a BitSet). Surely, a C# iterator has no .toMySpecialCollection method.
See also: “Integrating new sets and maps” in The Architecture of Scala Collections.
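breakOut only exists in the pre-2.13 collections, so here is a sketch that should not depend on the collections version. It goes through an iterator to choose the target collection, and also shows why the choice matters for the Set example from the question (names are made up):

```scala
object SetMapDemo {
  val s = Set("hello", "world")

  // Mapping a Set yields a Set, so the two equal lengths collapse into one.
  val asSet: Set[Int] = s.map(_.length)

  // Mapping over the iterator and then picking the target collection
  // keeps both lengths.
  val asVector: Vector[Int] = s.iterator.map(_.length).toVector
}
```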
Scala follows the "uniform return type principle", ensuring that you always end up with the appropriate return type instead of losing that information as in C#.
The reason C# does it this way is that its type system is not good enough to provide these assurances without overriding the whole implementation of every method in every single subclass. Scala solves this with the use of higher-kinded types.
Why does Scala have the only collection framework doing this? Because it is harder than most people think it is, especially when things like Strings and Arrays, which are not "real" collections, should be integrated as well:
// This stays a String:
scala> "Foobar".map(identity)
res27: String = Foobar
// But this falls back to the "nearest" appropriate type:
scala> "Foobar".map(_.toInt)
res29: scala.collection.immutable.IndexedSeq[Int] = Vector(70, 111, 111, 98, 97, 114)
If you have a Set, and an operation on it returns an Iterable while its runtime type is still a Set, then you're losing important information about its behavior, and access to set-specific methods.
BTW: There are other languages that behave similarly, like Haskell, which influenced Scala a lot. The Haskell version of map would look like this translated to Scala (without implicit magic):
//the functor type class
trait Functor[C[_]] {
def fmap[A,B](f: A => B, coll: C[A]) : C[B]
}
//an instance
object ListFunctor extends Functor[List] {
def fmap[A,B](f: A => B, list: List[A]) : List[B] = list.map(f)
}
//usage
val list = ListFunctor.fmap((x:Int) => x*x, List(1,2,3))
And I think the Haskell community values this feature as well :-)
It is a matter of consistency. Things are what they are, and return things like them. You can depend on it.
The difference you make here is one of strictness. A strict method is immediately evaluated, while a non-strict method is only evaluated as needed. This has consequences. Take this simple example:
def print5(it: Iterable[Int]) = {
var flag = true
it.filter(_ => flag).foreach { i =>
flag = i < 5
println(i)
}
}
Test it with these two collections:
print5(List.range(1, 10))
print5(Stream.range(1, 10))
Here, List is strict, so its methods are strict. Conversely, Stream is non-strict, so its methods are non-strict.
So this isn't really related to Iterable at all -- after all, both List and Stream are Iterable. Changing the collection return type can cause all sort of problems -- at the very least, it would make the task of keeping a persistent data structure harder.
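To see the difference without reading console output, the sketch below repeats the print5 logic but records the visited elements. Note the assumption: an Iterator stands in for the non-strict collection here, since Stream is deprecated in newer Scala versions. The function names are made up.

```scala
import scala.collection.mutable.ListBuffer

object StrictnessDemo {
  def strictVisit(xs: List[Int]): List[Int] = {
    var flag = true
    val seen = ListBuffer.empty[Int]
    // filter runs over the whole list first, while flag is still true,
    // so every element survives and is visited.
    xs.filter(_ => flag).foreach { i => flag = i < 5; seen += i }
    seen.toList
  }

  def lazyVisit(xs: List[Int]): List[Int] = {
    var flag = true
    val seen = ListBuffer.empty[Int]
    // The iterator tests each element only when foreach asks for it,
    // so it observes the updated flag and stops passing elements after 5.
    xs.iterator.filter(_ => flag).foreach { i => flag = i < 5; seen += i }
    seen.toList
  }
}
```

With List.range(1, 10), the strict version visits 1 through 9 and the non-strict one visits 1 through 5.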
On the other hand, there are advantages to delaying certain operations, even on a strict collection. Here are some ways of doing it:
// Get an iterator explicitly, if it's going to be used only once
def print5I(it: Iterable[Int]) = {
var flag = true
it.iterator.filter(_ => flag).foreach { i =>
flag = i < 5
println(i)
}
}
// Get a Stream explicitly, if the result will be reused
def print5S(it: Iterable[Int]) = {
var flag = true
it.toStream.filter(_ => flag).foreach { i =>
flag = i < 5
println(i)
}
}
// Use a view, which provides non-strictness for some methods
def print5V(it: Iterable[Int]) = {
var flag = true
it.view.filter(_ => flag).foreach { i =>
flag = i < 5
println(i)
}
}
// Use withFilter, which is explicitly designed to be used as a non-strict filter
def print5W(it: Iterable[Int]) = {
var flag = true
it.withFilter(_ => flag).foreach { i =>
flag = i < 5
println(i)
}
}

scala implicit or explicit conversion from iterator to iterable

Does Scala provide a built-in class, utility, syntax, or other mechanism for converting (by wrapping) an Iterator with an Iterable?
For example, I have an Iterator[Foo] and I need an Iterable[Foo], so currently I am:
val foo1: Iterator[Foo] = ....
val foo2: Iterable[Foo] = new Iterable[Foo] {
def elements = foo1
}
This seems ugly and unnecessary. What's a better way?
Iterator has a toIterable method in Scala 2.8.0, but not in 2.7.7 or earlier. It's not implicit, but you could define your own implicit conversion if you need one.
You should be very careful about ever implicitly converting an Iterator into an Iterable (I normally use Iterator.toList - explicitly). The reason for this is that, by passing the result into a method (or function) which expects an Iterable, you lose control of it to the extent that your program might be broken. Here's one example:
def printTwice(itr : Iterable[String]) : Unit = {
itr.foreach(println(_))
itr.foreach(println(_))
}
If an Iterator were somehow implicitly convertible into an Iterable, what would the following print?
printTwice(Iterator.single("Hello"))
It will (of course) only print Hello once. Very recently, the trait TraversableOnce has been added to the collections library, which unifies Iterator and Iterable. To my mind, this is arguably a mistake.
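The hazard can be reproduced with a hand-written wrapper. This is only a sketch with hypothetical names; the wrapper returns the same underlying Iterator on every call, much like the wrapper in the question:

```scala
object OneShotDemo {
  private val source = Iterator.single("Hello")

  // Hypothetical wrapper: every call to iterator returns the same,
  // possibly already consumed, Iterator.
  val oneShot: Iterable[String] = new Iterable[String] {
    def iterator: Iterator[String] = source
  }

  val firstPass: List[String] = oneShot.toList  // consumes the iterator
  val secondPass: List[String] = oneShot.toList // nothing left to traverse

  // Materialising the Iterator explicitly gives a collection that can be
  // traversed as often as needed.
  val safe: List[String] = Iterator.single("Hello").toList
}
```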
My personal preference is to use Iterator explicitly wherever possible and then use List, Set or IndexedSeq directly. I have found that I can rarely write a method which is genuinely agnostic of the type it is passed. One example:
def foo(trades: Iterable[Trade]) {
log.info("Processing %d trades", trades.toList.length) //hmmm, converted to a List
val shorts = trades.filter(_.side.isSellShort)
log.info("Found %d sell-short", shorts.toList.length) //hmmm, converted to a List again
//etc