Coming from a Java background, I am learning Scala, and the following has me very confused. Why is the returned type different in these two (very similar but different) constructs, which vary only in how the source collection was built?
val seq1: IndexedSeq[Int] = for (i <- 1 to 10) yield i
vs.
val seq2: Array[Int] = for (i <- Array(1, 2, 3)) yield i
Please point me to the right literature so that I can understand the core fundamentals at play here.
There are, in general, two different styles of collection operation libraries:
type-preserving: that is what you are confused about in your question
generic (not in the "parametric polymorphism" sense but in the standard English sense of the word), or maybe "homogeneous"
Type-preserving collection operations try to preserve the type exactly for operations like filter, take, drop, etc. that only take existing elements unmodified. For operations like map, they try to find the closest supertype that can still hold the result. E.g. mapping over an IntSet with a function from Int to String can obviously not result in an IntSet, but only in a Set. Mapping an IntSet to Boolean could be represented in a BitSet, but I know of no collections framework that is clever enough to actually do that.
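You can watch this supertype-hunting happen in Scala itself. A quick REPL illustration with BitSet (output from a Scala 2.13 REPL; the exact rendering varies by version):
scala> import scala.collection.immutable.BitSet
scala> BitSet(1, 2, 3).map(_ * 2)       // Int => Int: the result can stay a BitSet
res0: scala.collection.immutable.BitSet = BitSet(2, 4, 6)
scala> BitSet(1, 2, 3).map(_.toString)  // Int => String: widened to the supertype SortedSet
res1: scala.collection.immutable.SortedSet[String] = TreeSet(1, 2, 3)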
Generic / homogeneous collection operations always return the same type. Usually, this type is chosen to be very general, to accommodate the widest range of use cases. For example, in .NET, collection operations return IEnumerable, in Java, they return Streams, in C++, they return iterators, and in Ruby, they return arrays.
Until recently, it was only possible to implement type-preserving collection operations by duplicating all operations for all types. For example, the Smalltalk collections framework is type-preserving, and it does this by having every single collections class re-implement every single collections operation. This results in a lot of duplicated code and is a maintenance nightmare. (It is no coincidence that many new object-oriented abstractions that get invented have their first paper written about how it can be applied to the Smalltalk collections framework. See Traits: Composable Units of Behaviour for an example.)
To my knowledge, the Scala 2.8 re-design of the collections framework (see also this answer on SO) was the first time someone managed to create type-preserving collection operations while minimizing (though not eliminating) duplication. However, the Scala 2.8 collections framework was widely criticized as being overly complex, and it has required constant work over the last decade. In fact, it actually led to a complete re-design of the Scala documentation system as well, just to be able to hide the very complex type signatures that the type-preserving operations require. But even this wasn't enough, so the collections framework was completely thrown out and re-designed yet again in Scala 2.13. (And this re-design took several years.)
So, the simple answer to your question is this: Scala tries as much as possible to preserve the type of the collection.
In your second case, the type of the collection is Array, and when you map over an Array, you get back an Array.
In your first case, the type of the collection is Range. Now, a Range doesn't actually have elements, though. It only has a beginning and an end and a step, and it produces the elements on demand while you are iterating over it. So, it is not that easy to produce a new Range with the new elements. The map function would basically need to be able to "reverse engineer" your mapping function to figure out what the new beginning and end and step should be. (Which is equivalent to solving the Halting Problem, or in other words impossible.) And what if you do something like this:
val seq1: IndexedSeq[Int] = for (i <- 1 to 10) yield scala.util.Random.nextInt(i)
Here, there isn't even a well-defined step, so it is actually impossible to build a Range that does this.
So, clearly, mapping over a Range cannot return a Range. So, it does the next best thing: It returns the most precise super type of Range that can contain the mapped values. In this case, that happens to be IndexedSeq.
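You can check this in the REPL (the static type is the supertype IndexedSeq, while the runtime value happens to be a Vector):
scala> (1 to 10).map(i => i)
res0: IndexedSeq[Int] = Vector(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)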
There is a wrinkle, in that type-preserving collections operations challenge what we consider to be part of the contract of certain operations. For example, most people would argue that the cardinality of a collection should be invariant under map, in other words, map should map each element to exactly one new element and thus map should never change the size of the collection. But, what about this code:
Set(1, 2, 3).map { _ % 2 == 0 }
//=> Set(true, false)
Here, you get back a collection with fewer elements from a map, which is only supposed to transform elements, not remove them. But, since we decided we want type-preserving collections, and a Set cannot have duplicate values, the two false values are actually the same value, so there is only one of them in the set.
[It could be argued that this actually only demonstrates that Sets aren't collections and shouldn't be treated as collections. Sets are predicates ("Is this element a member?") rather than collections ("Give me all your elements!")]
This happens because the construction:
for (x <- someSeq) yield x
is the same as:
someSeq.map(x => x)
for (...) yield ... is just syntactic sugar for the map/flatMap functions.
As we know, the map function doesn't change the type of the container; it changes the elements inside the container.
So, in your example, 1 to 10 has the type Range.Inclusive, which extends Range, and Range extends IndexedSeq. A mapped IndexedSeq has the same type, IndexedSeq.
for (i <- 1 to 10) yield i the same as (1 to 10).map(x => x)
In the second case, for (i <- Array(1, 2, 3)) yield i, you have an Array, and a mapped Array is also an Array.
for (i <- Array(1, 2, 3)) yield i the same as Array(1, 2, 3).map(x => x)
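The same desugaring scales up: with more than one generator and a guard, for ... yield turns into a chain of flatMap, withFilter and map. A small illustrative example of my own:
for {
  i <- 1 to 3
  j <- 1 to 3
  if i < j
} yield (i, j)
is the same as:
(1 to 3).flatMap(i => (1 to 3).withFilter(j => i < j).map(j => (i, j)))
Both evaluate to Vector((1,2), (1,3), (2,3)).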
I think the right literature would be Scala for the Impatient by Cay Horstmann. The first edition is a little outdated but it's held up pretty well. The book is a fairly easy read almost to the end (I admit I don't quite understand lexical parsers or actors, but that's probably on me not Horstmann).
One of the things that Horstmann's book explains early on is that although you can use for like in Java, it can actually do much more sophisticated things. As a toy example, consider this Java procedure:
public static void main(String[] args) {
    HashSet<Integer> squaresMod10 = new HashSet<>();
    for (int i = 1; i < 11; i++) {
        squaresMod10.add(i * i % 10);
    }
    for (Integer j : squaresMod10) {
        System.out.print(j + " ");
    }
}
If you're using Java 8 or later, your IDE's probably advising you that you "can use functional operators." NetBeans rewrote it thus for me:
squaresMod10.forEach((j) -> {
    System.out.print(j + " ");
});
In Scala, you can use "functional operations" for both the i and j loops in this example. Rather than fire up IntelliJ just for this, I'll use the local Scala REPL on my system.
scala> (1 to 10).map(i => i * i % 10)
res2: IndexedSeq[Int] = Vector(1, 4, 9, 6, 5, 6, 9, 4, 1, 0)
scala> (1 to 10).map(i => i * i % 10).toSet
res3: scala.collection.immutable.Set[Int] = HashSet(0, 5, 1, 6, 9, 4)
scala> for (j <- res3) System.out.print(j + " ")
^
warning: method + in class Int is deprecated (since 2.13.0):
Adding a number and a String is deprecated. Use the string interpolation `s"$num$str"`
0 5 1 6 9 4
In Java, what would you be going for with seq1 and seq2? With a standard for loop in Java, you're essentially guiding the computer through the process of looking at each element of an ad hoc collection one by one and performing a certain operation on it.
Lambdas like the one NetBeans wrote for me still become If and Goto constructs at the JVM level, just like regular for loops in Java, as do a lot of what Horstmann calls "for comprehensions" in Scala. Either way, you're delegating the nuts and bolts of how exactly that happens to the compiler. In other words, you're not micro-managing.
However, as your seq2 example shows, it's still possible to wrap collections unnecessarily.
scala> 1 to 10
res5: scala.collection.immutable.Range.Inclusive = Range 1 to 10
scala> res5 == seq1
res6: Boolean = true
scala> Array(1, 2, 3)
res7: Array[Int] = Array(1, 2, 3)
scala> res7 == seq2
res8: Boolean = false
Okay, that didn't quite go the way I thought it would, but my point stands: in this vacuum, it's unnecessary to wrap 1 to 10 into another IndexedSeq[Int] and it's unnecessary to wrap an Array[Int] into another Array[Int]. Your for "statements" are merely wrapping unnamed collections into named collections.
If I have a val x = List(2,3,5,8) and I want to append element 4 to the list, x::a or a::x work as expected. But is there an alternative to this notation?
If I understood your question correctly, we have:
val x = List(2,3,5,8)
val a = 4
and you wish to append (in immutable terms) a to x.
a::x works but will return a list with 4 prepended, so not what you asked for. x::a will not work at all because, well, you can't really prepend a list to an integer.
What you can do, for example, is use the :+ method:
x :+ a // Returns List(2, 3, 5, 8, 4)
Notice however that appending to a List requires linear time, so it may be a bad idea depending on your particular application. Consider using a different data structure if the performance of this operation is important.
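A quick sketch of the options side by side (Vector given as one alternative with effectively constant-time append):
val x = List(2, 3, 5, 8)
val a = 4

a :: x                     // List(4, 2, 3, 5, 8): prepend, constant time
x :+ a                     // List(2, 3, 5, 8, 4): append, linear time
Vector(2, 3, 5, 8) :+ a    // Vector(2, 3, 5, 8, 4): append, effectively constant time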
I'm looking for an equivalent to the subsequences function in Scala. I cannot find it in scala.collection.Seq, maybe it is defined somewhere else. But where?
I think the problem is well known. As an example, given the sequence "abc", the list of all subsequences is ["","a","b","ab","c","ac","bc","abc"].
A quick and dirty implementation in Scala would be as follows:
(for {ys <- xs.inits.toList; zs <- ys.tails} yield zs).distinct
But it'd be nice to use something already defined, and more efficient.
You could use combinations:
(0 to xs.length).toIterator.flatMap(i => xs.combinations(i))
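For example, for a three-element Seq (REPL output; note that combinations deduplicates equal elements, so for inputs with repeats this yields the distinct subsequences rather than all of them):
scala> val xs = Seq('a', 'b', 'c')
scala> (0 to xs.length).toIterator.flatMap(i => xs.combinations(i)).toList
res0: List[Seq[Char]] = List(List(), List(a), List(b), List(c), List(a, b), List(a, c), List(b, c), List(a, b, c))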
I'm trying to compile the following code, using Scala 2.11.7.
object LucasSeq {
  val fibo: Stream[Int] = 0 #:: 1 #:: fibo.zip(fibo.tail).map { pair =>
    pair._1 + pair._2
  }

  def firstKind(p: Int, q: Int): Stream[Int] = {
    val lucas: Stream[Int] = 0 #:: 1 #:: lucas.zip(lucas.tail).map { pair =>
      p * pair._2 - q * pair._1
    }
    lucas
  }
}
fibo is based on the Fibonacci sequence example in Scala's Stream documentation, and it works.
However, the firstKind function, which tries to generalize the sequence with parameters p and q (making Lucas sequences of the first kind), has the following error:
LucasSeq.scala:7: error: forward reference extends over definition of value lucas
val lucas: Stream[Int] = 0 #:: 1 #:: lucas.zip(lucas.tail).map { pair =>
^
one error found
It's basically the same code, so why does it work outside the function but not inside a function?
This error message has puzzled many programmers before me. I've considered…
So just don't put that code in a function — but I do want a function.
implicit val lucas — doesn't help.
Self-references can only be used in lazy expressions — but this is lazy, right?
Compile with -Xprint:typer diagnostics — not sure what to do with that information.
Is it a shadowing issue? — No, I'm using identifiers that don't clash.
Compiler bug? — I hope not. The referenced bug should be already fixed in 2.11.7.
I could probably go on reading for hours, but I think it would be best to ask for help at this point. I'm looking for both a solution and an explanation. (I'm familiar with functional programming, but new to Scala, so if the explanation involves terms like "synthetic" and "implicit", then I'll probably need an additional explanation of that as well.)
There are basically two options. You could make your val into a lazy val. Or you could define lucas: Stream[Int] as a field in a class, parameterizing the class with p and q in the constructor.
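The lazy val version is the smaller change; this sketch keeps your body and changes only the keyword:
def firstKind(p: Int, q: Int): Stream[Int] = {
  lazy val lucas: Stream[Int] = 0 #:: 1 #:: lucas.zip(lucas.tail).map { pair =>
    p * pair._2 - q * pair._1
  }
  lucas
}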
You are right that the original code is lazy. But it is not lazy in the way the Scala compiler needs in order to translate it.
For the sake of simplicity, think about what code val a = 1 + a will be translated into (I know the code does not make much sense). In Java, int a = 1 + a won't work: Java would try to use a in 1 + a, but a is not yet initialized. Even if it were Integer a = 1 + a, so that a were a reference, Java would still not be able to execute this, because the 1 + a expression runs at the moment a is allocated.
So that leaves us with two options. One: define a not as a local variable but as a field. Scala automatically resolves the problem there, because a field in Scala is compiled to two methods plus a variable anyway, so the self-reference goes through a method rather than a not-yet-initialized local. Two: tell Scala explicitly that it should resolve the laziness problem here by declaring your val as a lazy val. This makes Scala generate the hidden infrastructure necessary for it to be lazy.
You can check this behavior by running the compiler with the -print option. The output is rather complicated, though, especially in the lazy val case.
Also, please note that because your stream leaves the scope, and because you have two parameters for your stream, p and q, your stream will be recomputed on each call if you go with the lazy val option. If you instead create an additional class, you can control this by caching the instances of this class for each possible p and q.
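A minimal sketch of the class-based option (the class name is illustrative): as a field, the self-reference compiles, and each instance memoizes its own stream.
class LucasSeq(p: Int, q: Int) {
  val lucas: Stream[Int] = 0 #:: 1 #:: lucas.zip(lucas.tail).map { pair =>
    p * pair._2 - q * pair._1
  }
}

new LucasSeq(1, -1).lucas.take(8).toList  // Fibonacci: List(0, 1, 1, 2, 3, 5, 8, 13)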
P.S. By saying "Java" here I of course mean the JVM. It is just easier to think in terms of Java.
I'm new to the language and trying to figure out how to read some of the code in it. Here is the example code that I'm trying to figure out:
lazy val genHeap: Gen[H] = for {
  n <- arbitrary[A]
  h <- frequency((1, value(empty)), (9, genHeap))
} yield insert(n, h)
I don't quite understand what is going on:
The return type is Gen?
Does the <- act as an = operator?
Is the yield statement building a heap with each iteration by inserting a new element?
Hello fellow Coursera student! The Principles of Reactive Programming Course is not exactly the easiest place to start to learn Scala! It is an advanced Scala course.
The return type is Gen?
Yes, that's what the : means. (The Gen itself is an object, a random generator to be precise, which can produce a sequence of values, each having the same type as its type parameter - in this case, H.)
Does the <- act as an '=' operator?
Not exactly.
and the yield statement... as I understand, it is building a heap with each iteration by inserting a new element?
Actually it's a recursion, not an iteration... but essentially, yes.
A for..yield expression is a fancy way to write a series of map, flatMap and withFilter invocations. Let's desugar it down into ordinary Scala code:
lazy val genHeap: Gen[H] =
  arbitrary[A].flatMap(n =>
    frequency((1, value(empty)), (9, genHeap)).map(h =>
      insert(n, h)))
So a H generator (genHeap) is one that starts by generating an arbitrary A, then generating an arbitrary H (an empty H with probability 0.1, or the result of invoking genHeap itself again with probability 0.9), and then inserting the A into the H to get a new H.
These As and Hs are both abstract types, by the way.
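If it helps to see the same shape with concrete types, here is a self-contained analogue using ScalaCheck directly, with List[Int] standing in for the course's heap (genList is my own name, and Gen.const plays the role of the older Gen.value):
import org.scalacheck.Gen
import org.scalacheck.Arbitrary.arbitrary

lazy val genList: Gen[List[Int]] = for {
  n <- arbitrary[Int]
  // stop with probability 1/10, recurse with probability 9/10;
  // Gen.lzy guards the recursive reference
  l <- Gen.frequency((1, Gen.const(List.empty[Int])), (9, Gen.lzy(genList)))
} yield n :: l

genList.sample  // e.g. Some(List(5, -3, 42)); around ten elements on average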
Yes, I'd say this is pretty advanced stuff. If you don't even know what : means, you're definitely starting in the wrong place.