Scala type erasure - matching case class

I was reading about type erasure and tried the following example:
case class Thing[T](value: T)
def processThing(thing: Thing[_]) = {
thing match {
case Thing(value: Int) => "Thing of int" //isn't thing Thing[_] anymore?
case Thing(value: String) => "Thing of string" //isn't thing Thing[_] anymore?
case Thing(value: Seq[Int]) => "Thing of Seq[Int]" //value=Seq[_] really
case Thing(value: Seq[String]) => "Thing of Seq[String]" //value=Seq[_] really
case _ => "Thing of something else"
}
}
println(processThing(Thing(Seq(1,2,3)))) //type erased, I get it
println(processThing(Thing(Seq("hello", "yo")))) //type erased, I get it
println(processThing(Thing(1))) //why is this working?
println(processThing(Thing("hello"))) //why is this working?
I understand why Seq[Int] and Seq[String] are not identified correctly: at runtime both are seen as Seq[Object].
However, I do not understand why the first two examples DO work.
Why don't Thing[Int] and Thing[String], both being Thing[T], have the same problem that Seq[T] has?
Why is Thing[Seq[T]] type erased while Thing[T] (T = Int, String) is not?
Can anybody explain what is going on here? Thanks

You are right that at runtime both Thing(1) and Thing("Hello") will erase to the same class Thing.
Thus, if you do something like this:
thing match {
case _: Thing[Int] => foo
case _: Thing[String] => bar
}
You would see the behavior you expected.
However, your pattern match is doing something different: it extracts the value inside thing and then performs a class check on that value.
The class information of the value itself is preserved, which is why you can distinguish between Int, String and Seq, but you can't see what the type parameter of the Seq was.
You could try checking the first element of the Seq, but that still won't be enough: the Seq may be a Seq[Animal] whose first element is a Dog and whose second is a Cat. That check is even more unsafe than the previous ones, since the Seq may also be empty.
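A minimal sketch of the point above, reusing Thing from the question. The object name ErasureDemo and the describe helper are made up for illustration. Since the element type of a Seq is not available at runtime, the only honest check is against Seq itself:

```scala
object ErasureDemo {
  case class Thing[T](value: T)

  // Tests only what the JVM can actually check: the runtime class of `value`.
  def describe(thing: Thing[_]): String = thing match {
    case Thing(_: Int)    => "Thing of Int"      // the value is a boxed java.lang.Integer
    case Thing(_: String) => "Thing of String"
    case Thing(_: Seq[_]) => "Thing of some Seq" // the element type is erased
    case _                => "Thing of something else"
  }

  def main(args: Array[String]): Unit = {
    println(describe(Thing(1)))
    println(describe(Thing("hello")))
    println(describe(Thing(Seq(1, 2, 3))))
    println(describe(Thing(Seq("hello", "yo"))))
  }
}
```

Both Seq calls take the same branch, which is all the runtime can tell you about them.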

Consider the following Java code:
class Foo {
<T> void bar(T whatIsIt) {
System.out.println("It is a: " + whatIsIt.getClass().getName());
}
public static void main() {
Foo f = new Foo();
f.bar(123);
f.bar("Hello world");
f.bar(f);
f.bar(new ArrayList<String>());
}
}
At runtime there exists only one method Foo#bar(Object). But the object that is passed in as whatIsIt still has a pointer to whatever class it actually is.
Scala has access to the same metadata that Java does. So when it builds the bytecode for processThing it can insert a value instanceof Integer check, and that's what it does:
scala> :javap processThing
// ... snip ...
public java.lang.String processThing($line3.$read$$iw$Thing<?>);
descriptor: (L$line3/$read$$iw$Thing;)Ljava/lang/String;
flags: (0x0001) ACC_PUBLIC
Code:
stack=1, locals=8, args_size=2
0: aload_1
1: astore_3
2: aload_3
3: ifnull 29
6: aload_3
7: invokevirtual #37 // Method $line3/$read$$iw$Thing.value:()Ljava/lang/Object;
10: astore 4
12: aload 4
14: instanceof #39 // class java/lang/Integer
17: ifeq 26
20: ldc #41 // String Thing of int
// ... snip ...
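Roughly speaking, the bytecode above corresponds to hand-written checks like the following. This is only a sketch of the idea, not the compiler's actual output, and the object name ManualChecks is hypothetical:

```scala
object ManualChecks {
  case class Thing[T](value: T)

  // Each test inspects the runtime class of the extracted value,
  // much like the instanceof instruction in the javap listing.
  def processThing(thing: Thing[_]): String = {
    val value: Any = thing.value
    if (value.isInstanceOf[Int]) "Thing of int" // checked as java.lang.Integer
    else if (value.isInstanceOf[String]) "Thing of string"
    else if (value.isInstanceOf[Seq[_]]) "Thing of Seq" // element type unknown
    else "Thing of something else"
  }
}
```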

Related

Pattern Matching - @ versus :?

Given:
case object A
What's the difference, if any, between the @ and : in:
def f(a: A.type): Int = a match {
case aaa @ A => 42
}
and
def f(a: A.type): Int = a match {
case aaa : A.type => 42
}
The first one @ uses an extractor to do the pattern matching while the second one : requires the type - that's why you need to pass in A.type there.
There's actually no difference between them in terms of matching. To better illustrate the difference between @ and : we can look at a simple class, which doesn't provide an extractor out of the box.
class A
def f(a: A) = a match {
case _ : A => // works fine
case _ @ A => // doesn't compile because no extractor is found
}
In this very specific case, almost nothing is different. They will both achieve the same results.
Semantically, case aaa @ A => 42 is usage of pattern binding where we're matching on the exact object A, and case aaa : A.type => 42 is a type pattern where we want a to have the type A.type. In short, type versus equality, which doesn't make a difference for a singleton.
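To see where "type versus equality" could matter, here is a sketch with a deliberately odd singleton. Greedy and the two helper methods are hypothetical and only serve as an illustration. Its equals is overridden to accept anything, so the equality-based pattern matches values that the type pattern rejects:

```scala
object BinderVsTypePattern {
  // Hypothetical singleton that claims to be equal to every value.
  object Greedy {
    override def equals(other: Any): Boolean = true
    override def hashCode(): Int = 0
  }

  // Stable identifier pattern with a binder: matches when Greedy == x.
  def byEquality(x: Any): Boolean = x match {
    case g @ Greedy => true
    case _          => false
  }

  // Type pattern on the singleton type: matches only the Greedy instance itself.
  def byType(x: Any): Boolean = x match {
    case g: Greedy.type => true
    case _              => false
  }
}
```

For an ordinary case object such as A, which does not override equals, the two forms accept exactly the same values.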
The generated code is actually slightly different. Consider this code compiled with -Xprint:patmat:
def f(a: A.type): Int = a match {
case aaa @ A => 42
case aaa : A.type => 42
}
The relevant code for f shows that the two cases are slightly different, but will not produce different results:
def f(a: A.type): Int = {
case <synthetic> val x1: A.type = a;
case6(){
if (A.==(x1)) // case aaa @ A
matchEnd5(42)
else
case7()
};
case7(){
if (x1.ne(null)) // case aaa: A.type
matchEnd5(42)
else
case8()
};
case8(){
matchEnd5(throw new MatchError(x1))
};
matchEnd5(x: Int){
x
}
}
The first case checks equality, where the second case only checks that the reference is not null (we already know the type matches since the method parameter is the singleton type).
Semantically, there is no difference in this case. We can have a look at the bytecode to see if there is a runtime difference:
> object A
defined object A
> object X { def f(a: A.type) = a match { case a @ A => 42 } }
defined object X
> :javap X
...
public int f($line4.$read$$iw$$iw$$iw$$iw$$iw$$iw$A$);
descriptor: (L$line4/$read$$iw$$iw$$iw$$iw$$iw$$iw$A$;)I
flags: ACC_PUBLIC
Code:
stack=3, locals=4, args_size=2
0: aload_1
1: astore_3
2: getstatic #51 // Field $line4/$read$$iw$$iw$$iw$$iw$$iw$$iw$A$.MODULE$:L$line4/$read$$iw$$iw$$iw$$iw$$iw$$iw$A$;
5: aload_3
6: invokevirtual #55 // Method java/lang/Object.equals:(Ljava/lang/Object;)Z
9: ifeq 18
12: bipush 42
14: istore_2
15: goto 30
18: goto 21
21: new #57 // class scala/MatchError
24: dup
25: aload_3
26: invokespecial #60 // Method scala/MatchError."<init>":(Ljava/lang/Object;)V
29: athrow
30: iload_2
31: ireturn
And the other case:
> object Y { def f(a: A.type) = a match { case a: A.type => 42 } }
defined object Y
> :javap Y
...
public int f($line4.$read$$iw$$iw$$iw$$iw$$iw$$iw$A$);
descriptor: (L$line4/$read$$iw$$iw$$iw$$iw$$iw$$iw$A$;)I
flags: ACC_PUBLIC
Code:
stack=3, locals=4, args_size=2
0: aload_1
1: astore_3
2: aload_3
3: ifnull 12
6: bipush 42
8: istore_2
9: goto 24
12: goto 15
15: new #50 // class scala/MatchError
18: dup
19: aload_3
20: invokespecial #53 // Method scala/MatchError."<init>":(Ljava/lang/Object;)V
23: athrow
24: iload_2
25: ireturn
Indeed, there is a small difference. In the second case the compiler can see that a parameter of type A.type has only two possible values: A and null. Therefore at runtime there is only a null check, because the rest is verified at compile time. In the first version of the code, the compiler doesn't do this optimization. Instead it calls the equals method.
If we change the type of the parameter slightly, we get a different result:
> object Z { def f(a: AnyRef) = a match { case a: A.type => 42 } }
defined object Z
> :javap Z
...
public int f(java.lang.Object);
descriptor: (Ljava/lang/Object;)I
flags: ACC_PUBLIC
Code:
stack=3, locals=4, args_size=2
0: aload_1
1: astore_3
2: aload_3
3: getstatic #51 // Field $line4/$read$$iw$$iw$$iw$$iw$$iw$$iw$A$.MODULE$:L$line4/$read$$iw$$iw$$iw$$iw$$iw$$iw$A$;
6: if_acmpne 15
9: bipush 42
11: istore_2
12: goto 27
15: goto 18
18: new #53 // class scala/MatchError
21: dup
22: aload_3
23: invokespecial #56 // Method scala/MatchError."<init>":(Ljava/lang/Object;)V
26: athrow
27: iload_2
28: ireturn
In this version the compiler no longer knows what the parameter is, so it compares the reference against the singleton instance at runtime (if_acmpne). We could now discuss whether the equals call in the first version or the reference comparison in the third version is more efficient, but I guess the JIT of the JVM optimizes away any overhead in both cases anyway. We would have to look at the machine code to tell which is more efficient, if there is a difference at all.
Semantically there is no difference in this particular example, but in general we use the @ binder if we want to do something with the matched object itself. This thread explains the use of these extractors with a simple example.

Extract same value type from pattern matching "or" case

This must be possible, yet I'm unable to find any examples or guidance online...
I am trying to extract a variable from an Either return where Left can have an Exception case class with the value I want OR a Right with the value I want.
Definitions:
def findInnerObj(innerObjId: String): Either[InnerObjNotFoundException, (OuterObj, InnerObj)] = ???
case class InnerObjNotFoundException(outer: OuterObj) extends Exception
Usage:
findInnerObj(innerObjId) match {
case Left(InnerObjNotFoundException(x)) | Right((x, _)) =>
// do something with x <-- ATM, compiler: "Cannot resolve symbol x"
}
Pattern alternatives with name bindings are not supported. You can handle each case separately instead:
val innerObj = findInnerObj(innerObjId) match {
case Left(InnerObjNotFoundException(x)) => x
case Right((x, _)) => x
}
// do something with innerObj
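A self-contained sketch of that workaround follows. OuterObj and InnerObj are not shown in the question, so the definitions here are placeholders, and the helper names outerOf and outerOfFold are made up. It also shows Either.fold as an alternative way to get the same value out of both sides:

```scala
object EitherExtraction {
  // Placeholder definitions; the real OuterObj and InnerObj are not shown in the question.
  case class OuterObj(name: String)
  case class InnerObj(id: String)
  case class InnerObjNotFoundException(outer: OuterObj) extends Exception

  type Result = Either[InnerObjNotFoundException, (OuterObj, InnerObj)]

  // One case per alternative, each binding its own x.
  def outerOf(result: Result): OuterObj = result match {
    case Left(InnerObjNotFoundException(x)) => x
    case Right((x, _))                      => x
  }

  // The same extraction written with Either.fold.
  def outerOfFold(result: Result): OuterObj =
    result.fold(_.outer, _._1)
}
```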

Why does the value of an expression depend on the variable it's assigned to?

I tried to imitate the default keyword of C#:
private class Default[T] {
private var default : T = _
def get = default
}
Then in the package object I define:
def default[T] = new Default[T].get
I expected default[Int] to be 0, but
println(default[String])
println(default[Int])
println(default[Double])
println(default[Boolean])
all print null. However
val x = default[Int]
println(x)
prints 0. If I add a type annotation : Any to x it prints null again.
I'm guessing because println expects an argument of type Any the same is happening there.
How is it possible that assigning an expression to a variable of a more general type changes the value of that expression? I find that really counter-intuitive.
Has it something to do with boxing, so that I'm actually calling two different default functions (once with primitive int, once with Integer)? If yes, is there a way to avoid that?
After studying the generated bytecode, I realised what's actually happening. default[T] always returns null, but assigning it to a primitive calls BoxesRunTime.unboxTo... which converts null to whatever the primitive default is.
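The effect can be reproduced in isolation with the sketch below. The default method here is a simplified stand-in for the one in the question (it uses null.asInstanceOf[T] instead of the private var), and the object name NullUnboxing is hypothetical:

```scala
object NullUnboxing {
  // Erased generic method: at runtime it simply returns null as an Object.
  def default[T]: T = null.asInstanceOf[T]

  def main(args: Array[String]): Unit = {
    // Assigning to a primitive-typed val makes the compiler insert an
    // unboxing call, and unboxing null yields the primitive's zero value.
    val i: Int = default[Int]
    val d: Double = default[Double]
    val b: Boolean = default[Boolean]
    println(s"$i $d $b")

    // The runtime helper mentioned above can also be called directly.
    println(scala.runtime.BoxesRunTime.unboxToInt(null))
  }
}
```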
There are not many primitive types, so you could handle all of them explicitly:
import scala.reflect.ClassTag
def default[T: ClassTag]: T = (implicitly[ClassTag[T]] match {
case ClassTag.Boolean => false
case ClassTag.Byte => 0: Byte
case ClassTag.Char => 0: Char
case ClassTag.Double => 0: Double
case ClassTag.Float => 0: Float
case ClassTag.Int => 0: Int
case ClassTag.Long => 0: Long
case ClassTag.Short => 0: Short
case ClassTag.Unit => ()
case _ => null.asInstanceOf[T]
}).asInstanceOf[T]
scala> println(default[Int])
0

Pattern matching vs if-else

I'm a novice in Scala. Recently I was writing a hobby app and caught myself trying to use pattern matching instead of if-else in many cases.
user.password == enteredPassword match {
case true => println("User is authenticated")
case false => println("Entered password is invalid")
}
instead of
if(user.password == enteredPassword)
println("User is authenticated")
else
println("Entered password is invalid")
Are these approaches equivalent? Is one of them preferable to the other for some reason?
class MatchVsIf {
def i(b: Boolean) = if (b) 5 else 4
def m(b: Boolean) = b match { case true => 5; case false => 4 }
}
I'm not sure why you'd want to use the longer and clunkier second version.
scala> :javap -cp MatchVsIf
Compiled from "<console>"
public class MatchVsIf extends java.lang.Object implements scala.ScalaObject{
public int i(boolean);
Code:
0: iload_1
1: ifeq 8
4: iconst_5
5: goto 9
8: iconst_4
9: ireturn
public int m(boolean);
Code:
0: iload_1
1: istore_2
2: iload_2
3: iconst_1
4: if_icmpne 11
7: iconst_5
8: goto 17
11: iload_2
12: iconst_0
13: if_icmpne 18
16: iconst_4
17: ireturn
18: new #14; //class scala/MatchError
21: dup
22: iload_2
23: invokestatic #20; //Method scala/runtime/BoxesRunTime.boxToBoolean:(Z)Ljava/lang/Boolean;
26: invokespecial #24; //Method scala/MatchError."<init>":(Ljava/lang/Object;)V
29: athrow
And that's a lot more bytecode for the match also. It's fairly efficient even so (there's no boxing unless the match throws an error, which can't happen here), but for compactness and performance one should favor if/else. If the clarity of your code is greatly improved by using match, however, go ahead (except in those rare cases where you know performance is critical, and then you might want to compare the difference).
Don't pattern match on a single boolean; use an if-else.
Incidentally, the code is better written without duplicating println.
println(
if(user.password == enteredPassword)
"User is authenticated"
else
"Entered password is invalid"
)
One arguably better way would be to pattern match on the string directly, not on the result of the comparison, as it avoids "boolean blindness". http://existentialtype.wordpress.com/2011/03/15/boolean-blindness/
One downside is the need to use backquotes to protect the enteredPassword variable from being shadowed.
Basically, you should tend to avoid dealing with booleans as much as possible, as they don't convey any information at the type level.
user.password match {
case `enteredPassword` => Right(user)
case _ => Left("passwords don't match")
}
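The backquote remark is easy to get wrong, so here is a small sketch of both variants. The method names check and checkShadowed are made up for illustration. Without backquotes, a lowercase name in a pattern introduces a new variable that matches anything:

```scala
object BackquoteDemo {
  // Backquotes turn the name into a stable identifier pattern,
  // so the stored value is compared against the existing parameter.
  def check(stored: String, enteredPassword: String): Either[String, String] =
    stored match {
      case `enteredPassword` => Right("authenticated")
      case _                 => Left("passwords don't match")
    }

  // Without backquotes this is a fresh variable pattern: it matches any
  // value and shadows the parameter of the same name.
  def checkShadowed(stored: String, enteredPassword: String): Either[String, String] =
    stored match {
      case enteredPassword => Right("authenticated")
    }
}
```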
Both statements are equivalent in terms of code semantics. However, the compiler may generate more complicated (and thus less efficient) code for the match.
Pattern matching is usually used to break apart more complicated constructs, like polymorphic expressions, or to deconstruct (unapply) objects into their components. I would not advise using it as a surrogate for a simple if-else statement - there's nothing wrong with if-else.
Note that if-else can be used as an expression in Scala. Thus you can write
val foo = if(bar.isEmpty) foobar else bar.foo
I apologize for the stupid example.
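As a sketch of the distinction drawn in this answer, with a made-up Shape hierarchy: a match earns its keep when it both dispatches on a subtype and pulls out fields, while a plain two-way decision reads fine as an if-else expression.

```scala
object ShapeDemo {
  sealed trait Shape
  case class Circle(radius: Double) extends Shape
  case class Rect(width: Double, height: Double) extends Shape

  // The match selects the subtype and extracts its fields in one step.
  def area(shape: Shape): Double = shape match {
    case Circle(r)  => math.Pi * r * r
    case Rect(w, h) => w * h
  }

  // A simple boolean decision written as an if-else expression.
  def label(area: Double): String = if (area > 10) "large" else "small"
}
```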
It's 2020, and the Scala compiler generates far more efficient bytecode for pattern matching than it used to. The performance comments in the accepted answer are misleading in 2020.
The bytecode generated for a pattern match competes closely with if-else, and at times pattern matching wins with better and more consistent results.
One can use pattern matching or if-else depending on the situation and on which is simpler.
But the conclusion that pattern matching has poor performance is no longer valid.
You can try the following snippet and see the results:
def testMatch(password: String, enteredPassword: String) = {
val entering = System.nanoTime()
password == enteredPassword match {
case true => {
println(s"User is authenticated. Time taken to evaluate True in match : ${System.nanoTime() - entering}"
)
}
case false => {
println(s"Entered password is invalid. Time taken to evaluate false in match : ${System.nanoTime() - entering}"
)
}
}
}
testMatch("abc", "abc")
testMatch("abc", "def")
Pattern Match Results :
User is authenticated. Time taken to evaluate True in match : 1798
Entered password is invalid. Time taken to evaluate false in match : 3878
If else :
def testIf(password: String, enteredPassword: String) = {
val entering = System.nanoTime()
if (password == enteredPassword) {
println(
s"User is authenticated. Time taken to evaluate if : ${System.nanoTime() - entering}"
)
} else {
println(
s"Entered password is invalid.Time taken to evaluate else : ${System.nanoTime() - entering}"
)
}
}
testIf("abc", "abc")
testIf("abc", "def")
If-else time results:
User is authenticated. Time taken to evaluate if : 65062652
Entered password is invalid.Time taken to evaluate else : 1809
PS: Since the timings are in nanoseconds, your results may not match these exact numbers, but the argument about performance still holds.
For the large majority of code that isn't performance-sensitive, there are a lot of great reasons why you'd want to use pattern matching over if/else:
it enforces a common return value and type for each of your branches
in languages with exhaustiveness checks (like Scala), it forces you to explicitly consider all cases (and noop the ones you don't need)
it prevents early returns, which become harder to reason about if they cascade, grow in number, or the branches grow longer than the height of your screen (at which point they become invisible). Having an extra level of indentation will warn you that you're inside a scope.
it can help you identify logic to pull out. In this case the code could have been rewritten and made more DRY, debuggable, and testable like this:
val errorMessage = user.password == enteredPassword match {
case true => "User is authenticated"
case false => "Entered password is invalid"
}
println(errorMessage)
Here's an equivalent if/else block implementation:
var errorMessage = ""
if(user.password == enteredPassword)
errorMessage = "User is authenticated"
else
errorMessage = "Entered password is invalid"
println(errorMessage)
Yes, you can argue that for something as simple as a boolean check you can use an if-expression. But that's not relevant here and doesn't scale well to conditions with more than 2 branches.
If your higher concern is maintainability or readability, pattern matching is awesome and you should use it for even minor things!
I am here to offer a different opinion:
For the specific example you offer, the second one (if...else...) style is actually better because it is much easier to read.
In fact, if you put your first example into IntelliJ, it will suggest changing it to the second (if...else...) style. Here is the IntelliJ style suggestion:
Trivial match can be simplified
Suggests to replace trivial pattern match on a boolean expression with a conditional statement.
Before:
bool match {
case true => ???
case false => ???
}
After:
if (bool) {
???
} else {
???
}
I came across the same question and wrote some tests:
def factorial(x: Int): Int = {
def loop(acc: Int, c: Int): Int = {
c match {
case 0 => acc
case _ => loop(acc * c, c - 1)
}
}
loop(1, x)
}
def factorialIf(x: Int): Int = {
def loop(acc: Int, c: Int): Int =
if (c == 0) acc else loop(acc * c, c - 1)
loop(1, x)
}
def measure(e: (Int) => Int, arg:Int, numIters: Int): Long = {
def loop(max: Int): Unit = {
if (max == 0)
return
else {
val x = e(arg)
loop(max-1)
}
}
val startMatch = System.currentTimeMillis()
loop(numIters)
System.currentTimeMillis() - startMatch
}
val timeIf = measure(factorialIf, 1000,1000000)
val timeMatch = measure(factorial, 1000,1000000)
timeIf : Long = 22
timeMatch : Long = 1092
In my environment (scala 2.12 and java 8) I get different results. Match performs consistently better in the code above:
timeIf: Long = 249
timeMatch: Long = 68

Warning about an unchecked type argument in this Scala pattern match?

This file:
object Test extends App {
val obj = List(1,2,3) : Object
val res = obj match {
case Seq(1,2,3) => "first"
case _ => "other"
}
println(res)
}
Gives this warning:
Test.scala:6: warning: non variable type-argument A in type pattern Seq[A]
is unchecked since it is eliminated by erasure
case Seq(1,2,3) => "first"
Scala version 2.9.0.1.
I don't see how an erased type parameter is needed to perform the match. That first case clause is meant to ask if obj is a Seq with 3 elements equal to 1, 2, and 3.
I would understand this warning if I had written something like:
case strings : Seq[String] => ...
Why do I get the warning, and what is a good way to make it go away?
By the way, I do want to match against something with a static type of Object. In the real code I'm parsing something like a Lisp datum - it might be a String, a sequence of datums, a Symbol, a Number, etc.
Here is some insight into what happens behind the scenes. Consider this code:
class Test {
new Object match { case x: Seq[Int] => true }
new Object match { case Seq(1) => true }
}
If you compile with scalac -Xprint:12 -unchecked, you'll see the code just before the erasure phase (id 13). For the first type pattern, you will see something like:
<synthetic> val temp1: java.lang.Object = new java.lang.Object();
if (temp1.isInstanceOf[Seq[Int]]())
For the Seq extractor pattern, you will see something like:
<synthetic> val temp3: java.lang.Object = new java.lang.Object();
if (temp3.isInstanceOf[Seq[A]]()) {
<synthetic> val temp4: Seq[A] = temp3.asInstanceOf[Seq[A]]();
<synthetic> val temp5: Some[Seq[A]] = collection.this.Seq.unapplySeq[A](temp4);
// ...
}
In both cases, there is a type test to see if the object is of type Seq (Seq[Int] and Seq[A]). Type parameters will be eliminated during the erasure phase, hence the warning. Even though the second may be unexpected, it does make sense to check the type: if the object is not of type Seq, that clause won't match and the JVM can proceed to the next clause. If the type does match, then the object can be cast to Seq and unapplySeq can be called.
RE: thoredge's comment on the type check. Maybe we are talking about different things. I was merely saying that:
(o: Object) match {
case Seq(i) => println("seq " + i)
case Array(i) => println("array " + i)
}
translates to something like:
if (o.isInstanceOf[Seq[_]]) { // type check
val temp1 = o.asInstanceOf[Seq[_]] // cast
// verify that temp1 is of length 1 and println("seq " + temp1(0))
} else if (o.isInstanceOf[Array[_]]) { // type check
val temp1 = o.asInstanceOf[Array[_]] // cast
// verify that temp1 is of length 1 and println("array " + temp1(0))
}
The type check is used so that when the cast is done there is no class cast exception.
Whether the warning non variable type-argument A in type pattern Seq[A] is unchecked since it is eliminated by erasure is justified and whether there would be cases where there could be class cast exception even with the type check, I don't know.
Edit: here is an example:
object SeqSumIs10 {
def unapply(seq: Seq[Int]) = if (seq.sum == 10) Some(seq) else None
}
(Seq("a"): Object) match {
case SeqSumIs10(seq) => println("seq.sum is 10 " + seq)
}
// ClassCastException: java.lang.String cannot be cast to java.lang.Integer
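One way to avoid that exception, sketched here under the assumption that you control the extractor: let unapply accept Seq[_] and test the elements explicitly instead of trusting the erased element type. SafeSeqSumIs10 and describe are hypothetical names.

```scala
object SafeSeqSumIs10 {
  // Takes Seq[_], so no element type is assumed; each element is tested at runtime.
  def unapply(seq: Seq[_]): Option[Seq[Int]] = {
    val ints = seq.collect { case i: Int => i }
    if (ints.length == seq.length && ints.sum == 10) Some(ints) else None
  }

  def describe(o: Any): String = o match {
    case SafeSeqSumIs10(seq) => "seq.sum is 10 " + seq
    case _                   => "no match"
  }
}
```

With this version a Seq of strings simply fails to match, at the cost of traversing the sequence once more.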
Declaring the match object outside at least makes it go away, but I'm not sure why:
class App
object Test extends App {
val obj = List(1,2,3) : Object
val MatchMe = Seq(1,2,3)
val res = obj match {
case MatchMe => "first"
case _ => "other"
}
println(res)
}
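A likely explanation, offered as a sketch rather than a definitive account of what that compiler version did: a capitalized name such as MatchMe is a stable identifier pattern, which is matched with == against the existing value. No Seq[A] type test is involved, so there is nothing unchecked to warn about. The names StableIdDemo and classify below are made up:

```scala
object StableIdDemo {
  val MatchMe = Seq(1, 2, 3)

  // `case MatchMe` compares with ==, it does not destructure a Seq.
  def classify(obj: Any): String = obj match {
    case MatchMe => "first"
    case _       => "other"
  }
}
```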