How to systematically avoid unsafe pattern matching in Scala? - scala

Consider the following broken function:
def sum (list : Seq[Int]) : Int = list match {
case Nil => 0
case head :: tail => head + sum(tail)
}
Here, the function was supposed to work with a List[Int], but was refactored to accept Seq[Int] instead, thus becoming broken without the compiler noticing.
This gaping hole in Scala's incomplete pattern match detection makes it next to useless.
I want to have a way of systematically detecting such problems. Specifically, I'd like the compiler to emit an error/warning on every instanceof-guided pattern match, i.e. I only want to allow pattern matches on sealed hierarchies and on custom matchers.
Are there existing compiler options/plugins for doing conservative (as opposed to arbitrary) checks of pattern matching safety?

Have a look at this answer by M. Odersky.
Summary
Checks on matches of non-sealed hierarchies are doable, not trivial and not (yet) implemented.

Nil and :: are clearly ways to construct a List, but not all Sequences happen to be Lists, so one would expect the Scala type checker to reject this program as ill-typed. Right?
Wrong. Try this and you'll see what I mean:
def sum (list : Seq[Int]) : Int = list match {
case Nil => 0
case head :: tail => head + sum(tail)
case _ => -1
}
> sum(Array(1,2,3).toSeq)
res1: Int = -1
> sum(List(1,2,3))
res2: Int = 6
So you see, some Sequences might be able to be deconstructed with Nil and ::, so those which can, will. Those which can't will fail the pattern match and move on, trying the next match. Nil and :: are sufficient to cover all the possibilities for List, but not for Seq. There is a tradeoff here between subtyping, convenience, and type safety. The solution for now is: be more careful when you refactor.

Related

cons operator in case statements in Scala

So I read about right-associative operators like the cons operator in Scala. I'm wondering why they work in case statements. It seems like you can pattern match using the cons statement here?
def findKth1[A](k:Int, l:List[A]):A = (k, l) match {
case (0, h::_) => h
case(k, _::tail) if k > 0 => findKth1(k - 1, tail)
case _ => throw new NoSuchElementException
}
findKth1(2, List(3,4,5,6))
res1: Int = 5
What is the placeholder doing here? I've only seen placeholders used in functions like this: <SomeList>.map(_.doThing). Is it the same concept?
The only difference with :: and ::: is that ::: is used for 2 Lists right?
tl;dr It isn't pattern matching the operator, it's pattern matching a case class called ::
There are a couple of things happening at the same time. First of all, there may be a bit of confusion, because :: is a method on List:
val x: List[Int] = 1 :: 2 :: 3 :: Nil
But there is also a case class :: that, according to the docs is:
A non empty list characterized by a head and a tail.
More info here
Scala's case classes automatically come with extractor methods(unapply), which allow pattern matching, as in case User(name, age) => ....
You are also allowed to use the case class name in an infix position (although you shouldn't do so, except when the case class is used like an operator, such as in this case). So case head :: tail => ... is the same as case ::(head, tail) => .... More info here
When pattern matching, you can use _ to mean that there will be a value there, but you don't care about it, so you aren't providing it a name.
So the three cases you provided are roughly:
A tuple where the first value is 0, and the second value is a list with a head to be called h, and a tail which we will ignore.
A tuple with some integer to be called k, and a list with a head that we don't care about, and a tail, which we will call tail. Also, k must be greater than 0
Anything else, then throw a NoSuchElementException

When does Scala force a stream value?

I am comfortable with streams, but I admit I am puzzled by this behavior:
import collection.immutable.Stream
object StreamForceTest extends App {
println("Computing fibs")
val fibs: Stream[BigInt] = BigInt(0) #:: BigInt(1) #::
fibs.zip(fibs.tail).map((x: (BigInt, BigInt)) => {
println("Adding " + x._1 + " and " + x._2);
x._1 + x._2
})
println("Taking first 5 elements")
val fibs5 = fibs.take(5)
println("Computing length of that prefix")
println("fibs5.length = " + fibs5.length)
}
with output
Computing fibs
Taking first 5 elements
Computing length of that prefix
Adding 0 and 1
Adding 1 and 1
Adding 1 and 2
fibs5.length = 5
Why should take(5) not force the stream's values to be computed,
while length does do so? Offhand neither one needs to actually
look at the values, but I would have thought that take was more
likely to do it than length. Inspecting the source code on github,
we find these definitions for take (including an illuminating
comment):
override def take(n: Int): Stream[A] = (
// Note that the n == 1 condition appears redundant but is not.
// It prevents "tail" from being referenced (and its head being evaluated)
// when obtaining the last element of the result. Such are the challenges
// of working with a lazy-but-not-really sequence.
if (n <= 0 || isEmpty) Stream.empty
else if (n == 1) cons(head, Stream.empty)
else cons(head, tail take n-1)
)
and length:
override def length: Int = {
var len = 0
var left = this
while (!left.isEmpty) {
len += 1
left = left.tail
}
len
}
The definition of head and tail is obtained from the specific
subclass (Empty and Cons). (Of course Empty is an object, not a
class, and its definitions of head and tail just throw
exceptions.) There are subtleties, but they seem to concern making
sure that the tail of a Cons is evaluated lazily; the head
definition is straight out of lecture 0 on Scala constructors.
Note that length doesn't go near head, but it's the one that
does the forcing.
All this is part of a general puzzlement about how close Scala streams
are to Haskell lists. I thought Haskell treated head and tail
symmetrically (I'm not a serious Haskell hacker), and Scala forced
head evaluation in more circumstances. I'm trying to figure out
exactly what those circumstances are.
Stream's head is strict and its tail is lazy, as you can see in cons.apply and in the Cons constructor:
def apply[A](hd: A, tl: => Stream[A]) = new Cons(hd, tl)
class Cons[+A](hd: A, tl: => Stream[A]) extends Stream[A]
Notice the context in which the take method refers to tail:
cons(head, tail take n-1)
Because the expression tail take n-1 is used as the second argument to cons, which is passed by name, it doesn't force evaluation of tail take n-1, thus doesn't force evaluation of tail.
Whereas in length, the statement
left = left.tail
, by assigning left.tail to a var, does force its evaluation.
Scala is "strict by default". In most situations, everything you reference will be evaluated. We only have lazy evaluation in cases where a method/constructor parameter declares an call-by-name argument with =>, and in the culture we don't typically use this unless there's a special reason.
Let me offer another answer to this, one that just looks from a high level, i.e. without actually considering the code.
If you want to know how long a Stream is, you must evaluate it all the way to the end. Otherwise, you can only guess at its length. Admittedly, you may not actually care about the values (since you only want to count them) but that's immaterial.
On the other hand, when you "take" a certain number of elements from a stream (or indeed any collection) you are simply saying that you want at most that number of elements. The result is still a stream even though it may have been truncated.

Scala lists and splitting

I have a homework assignment that's having us use a list and splitting it into two parts with the elements in the first part are no greater than p, and the elements in the second part are greater than p. so it's like a quick sort, except we can't use any sorting. I really need some tips on how to go about this. I know I'm using cases, but I'm not familiar with how the list class works in scala. below is what I have so far, but in not sure how to go about the splitting of the 2 lists.
using
def split(p:Int, xs:List[Int]): List[Int] = {
xs match {
case Nil => (Nil, Nil)
case head :: tail =>
}
First off, you want split to return a pair of lists, so the return type needs to be (List[Int], List[Int]). However, working with pairs and lists together can often mean decomposing return values frequently. You may want to have an auxiliary function do the heavy lifting for you.
For instance, your auxiliary function might be given two lists, initially empty, and build up the contents until the first list is empty. The result would then be the pair of lists.
The next thing you have to decide in your recursive function design is, "What is the key decision?" In your case, it is "value no greater than p". That leads to the following code:
def split(p:Int, xs: List[Int]): (List[Int], List[Int]) = {
def splitAux(r: List[Int], ngt: List[Int], gt: List[Int]): (List[Int], List[Int]) =
r match {
case Nil => (ngt, gt)
case head :: rest if head <= p =>
splitAux(rest, head :: ngt, gt)
case head :: rest if head > p =>
splitAux(rest, ngt, head :: gt)
}
val (ngt, gt) = splitAux(xs, List(), List())
(ngt.reverse, gt.reverse)
}
The reversing step isn't strictly necessary, but probably is least surprising. Similarly, the second guard predicate makes explicit the path being taken.
However, there is a much simpler way: use builtin functionality.
def split(p:Int, xs: List[Int]): (List[Int], List[Int]) = {
(xs.filter(_ <= p), xs.filter(_ > p))
}
filter extracts only those items meeting the criterion. This solution walks the list twice, but since you have the reverse step in the previous solution, you are doing that anyway.

What's the reasoning behind adding the "case" keyword to Scala?

Apart from:
case class A
... case which is quite useful?
Why do we need to use case in match? Wouldn't:
x match {
y if y > 0 => y * 2
_ => -1
}
... be much prettier and concise?
Or why do we need to use case when a function takes a tuple? Say, we have:
val z = List((1, -1), (2, -2), (3, -3)).zipWithIndex
Now, isn't:
z map { case ((a, b), i) => a + b + i }
... way uglier than just:
z map (((a, b), i) => a + b + i)
...?
First, as we know, it is possible to put several statements for the same case scenario without needing some separation notation, just a line jump, like :
x match {
case y if y > 0 => y * 2
println("test")
println("test2") // these 3 statements belong to the same "case"
}
If case was not needed, compiler would have to find a way to know when a line is concerned by the next case scenario.
For example:
x match {
y if y > 0 => y * 2
_ => -1
}
How compiler would know whether _ => -1 belongs to the first case scenario or represents the next case?
Moreover, how compiler would know that the => sign doesn't represent a literal function but the actual code for the current case?
Compiler would certainly need a kind of code like this allowing cases isolation:
(using curly braces, or anything else)
x match {
{y if y > 0 => y * 2}
{_ => -1} // confusing with literal function notation
}
And surely, solution (provided currently by scala) using case keyword is a lot more readable and understandable than putting some way of separation like curly braces in my example.
Adding to #Mik378's answer:
When you write this: (a, b) => something, you are defining an anonymous Function2 - a function that takes two parameters.
When you write this: case (a, b) => something, you are defining an anonymous PartialFunction that takes one parameter and matches it against a pair.
So you need the case keyword to differentiate between these two.
The second issue, anonymous functions that avoid the case, is a matter of debate:
https://groups.google.com/d/msg/scala-debate/Q0CTZNOekWk/z1eg3dTkCXoJ
Also: http://www.scala-lang.org/old/node/1260
For the first issue, the choice is whether you allow a block or an expression on the RHS of the arrow.
In practice, I find that shorter case bodies are usually preferable, so I can certainly imagine your alternative syntax resulting in crisper code.
Consider one-line methods. You write:
def f(x: Int) = 2 * x
then you need to add a statement. I don't know if the IDE is able to auto-add parens.
def f(x: Int) = { val res = 2*x ; res }
That seems no worse than requiring the same syntax for case bodies.
To review, a case clause is case Pattern Guard => body.
Currently, body is a block, or a sequence of statements and a result expression.
If body were an expression, you'd need braces for multiple statements, like a function.
I don't think => results in ambiguities since function literals don't qualify as patterns, unlike literals like 1 or "foo".
One snag might be: { case foo => ??? } is a "pattern matching anonymous function" (SLS 8.5). Obviously, if the case is optional or eliminated, then { foo => ??? } is ambiguous. You'd have to distinguish case clauses for anon funs (where case is required) and case clauses in a match.
One counter-argument for the current syntax is that, in an intuition deriving from C, you always secretly hope that your match will compile to a switch table. In that metaphor, the cases are labels to jump to, and a label is just the address of a sequence of statements.
The alternative syntax might encourage a more inlined approach:
x match {
C => c(x)
D => d(x)
_ => ???
}
#inline def c(x: X) = ???
//etc
In this form, it looks more like a dispatch table, and the match body recalls the Map syntax, Map(a -> 1, b -> 2), that is, a tidy simplification of the association.
One of the key aspects of code readability is the words that grab your attention. For example,
return grabs your attention when you see it because you know that it is such a decisive action (breaking out of the function and possible sending a value back to the caller).
Another example is break--not that I like break, but it gets your attention.
I would agree with #Mik378 that case in Scala is more readable than the alternatives. Besides the compiler confusion he mentions, it gets your attention.
I am all for concise code, but there is a line between concise and illegible. I will gladly make the trade of 4n characters (where n is the number of cases) for the substantial readability that I get in return.

Explain some scala code - beginner

I've encountered this scala code and I'm trying to work out what its doing except the fact it returns an int. I'm unsure of these three lines :
l match {
case h :: t =>
case _ => 0
Function :
def iterate(l: List[Int]): Int =
l match {
case h :: t =>
if (h > n) 0
case _ => 0
}
First, you define a function called iterate and you specified the return type as Int. It has arity 1, parameter l of type List[Int].
The List type is prominent throughout functional programming, and it's main characteristics being that it has efficient prepend and that it is easy to decompose any List into a head and tail. The head would be the first element of the list (if non-empty) and the tail would be the rest of the List(which itself is a List) - this becomes useful for recursive functions that operate on List.
The match is called pattern matching.. it's essentially a switch statement in the C-ish languages, but much more powerful - the switch restricts you to constants (at least in C it does), but there is no such restriction with match.
Now, your first case you have h :: t - the :: is called a "cons", another term from functional programming. When you create a new List from another List via a prepend, you can use the :: operator to do it.
Example:
val oldList = List(1, 2, 3)
val newList = 0 :: oldList // newList == List(0, 1, 2, 3)
In Scala, operators that end with a : are really a method of the right hand side, so 0 :: oldList is the equivalent of oldList.::(0) - the 0 :: oldList is syntactic sugar that makes it easier to read.
We could've defined oldList like
val oldList = 1 :: 2 :: 3 :: Nil
where Nil represents an empty List. Breaking this down into steps:
3 :: Nil is evaluated first, creating the equivalent of a List(3) which has head 3 and empty tail.
2 is prepended to the above list, creative a new list with head 2 and tail List(3).
1 is prepended, creating a new list with head 1 and tail List(2, 3).
The resulting List of List(1, 2, 3) is assigned to the val oldList.
Now when you use :: to pattern match you essentially decompose a List into a head and tail, like the reverse of how we created the List above. Here when you do
l match {
case h :: t => ...
}
you are saying decompose l into a head and tail if possible. If you decompose successfully, you can then use these h and t variables to do whatever you want.. typically you would do something like act on h and call the recursive function on t.
One thing to note here is that your code will not compile.. you do an if (h > n) 0 but there is no explicit else so what happens is your code looks like this to the compiler:
if (h > n) 0
else { }
which has type AnyVal (the common supertype of 0 and "nothing"), a violation of your Int guarentee - you're going to have to add an else branch with some failure value or something.
The second case _ => is like a default in the switch, it catches anything that failed the head/tail decomposition in your first case.
Your code essentially does this:
Take the l List parameter and see if it can be decomposed into a head and tail.
If it can be, compare the head against (what I assume to be) a variable in the outer scope called n. If it is greater than n, the function returns 0. (You need to add what happens if it's not greater)
If it cannot be decomposed, the function returns 0.
This is called pattern matching. It's like a switch statement, but more powerful.
Some useful resources:
http://www.scala-lang.org/node/120
http://www.codecommit.com/blog/scala/scala-for-java-refugees-part-4