How unsafe is it to cast an arbitrary function X=>Y to X => Unit in scala? - scala

More explicitly, can this code produce any errors in any scenarios:
def foreach[U](f :Int=>U) = f.asInstanceOf[Int=>Unit](1)
I know it works, and I have a vague idea why: any function, as an instance of a generic type, must define an erased version of apply, and the JVM performs a type check only when the object is actually returned to code where it has a concrete type (often miles away). So, in theory, as long as I never look at the returned value, I should be safe. I don't have enough low-level understanding of Java bytecode, let alone scalac, to be certain about it.
Why would I want to do it? Look at the following example:
import scala.collection.mutable
val b = mutable.Buffer.empty[Int] // Buffer is a trait, so it cannot be created with `new`
val ints = Seq(1, 2, 3, 4)
ints foreach { b += _ }
It's a typical Scala construct, as far as imperative style can be typical. The function passed to foreach in this example takes an Int as an argument, and as scalac knows it to be an Int, it will create a closure with a specialized apply(x :Int). Unfortunately, its return type in this case is a mutable.Buffer[Int], which is an AnyRef. As far as I was able to see, scalac will never invoke a specialized apply providing an AnyVal argument if the result is an AnyRef (and vice versa). This means that even if the caller applies the function to an Int, underneath the function will box the argument and invoke the erased variant. Here of course it doesn't matter, as the elements are boxed within the List anyway, but I'm talking about the principle.
For this reason I prefer to define this type of method as foreach(f :X=>Unit), rather than foreach[O](f: X=>O) as it is in TraversableOnce. If foreach on the input sequence in the example had such a signature, everything would compile just fine, and the compiler would ignore the actual type of the expression and generate a function with Unit return type, which - when applied to an unboxed Int - would invoke directly void apply(Int x), without boxing.
The problem arises with interoperability - sometimes I need to call a method expecting a function with a Unit return type and all I have is a generic function returning Odin knows what. Of course, I could just write f(_) to wrap it in another function object instead of passing it directly, but that to a large extent makes the whole optimisation of small tight loops moot.
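As a rough illustration of the pattern being asked about, here is a minimal sketch. The names CastDemo, runUnit and run are made up for this example. It only demonstrates that the cast goes through when the result is discarded; it says nothing about whether boxing is actually avoided.

```scala
import scala.collection.mutable

object CastDemo {
  // Hypothetical consumer that insists on a function returning Unit.
  def runUnit(f: Int => Unit): Unit = { f(1); f(2); f(3) }

  // Passes a generic function through without allocating a wrapper closure.
  // The cast is unchecked: type arguments of Function1 are erased at runtime.
  def run[U](f: Int => U): Unit = runUnit(f.asInstanceOf[Int => Unit])

  def main(args: Array[String]): Unit = {
    val b = mutable.Buffer.empty[Int]
    run((x: Int) => b += x) // the function returns the buffer, not Unit
    println(b)              // ArrayBuffer(1, 2, 3)
  }
}
```

The result of each call is never inspected inside runUnit, which is the condition the question assumes for the cast to be harmless.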

Related

Disfunctionality of type parameter

I’m new to using Scala and am trying to see if a list contains any objects of a certain type.
When I make a method to do this, I get the following results:
var l = List("Some string", 3)
def containsType[T] = l.exists(_.isInstanceOf[T])
containsType[Boolean] // val res0: Boolean = true
l.exists(_.isInstanceOf[Boolean]) // val res1: Boolean = false
Could someone please help me understand why my method doesn’t return the same results as the expression on the last line?
Thank you,
Johan
Alin's answer details perfectly why the generic isn't available at runtime. You can get a bit closer to what you want with the magic of ClassTag, but you still have to be conscious of some issues with Java generics.
import scala.reflect.ClassTag
var l = List("Some string", 3)
def containsType[T](implicit cls: ClassTag[T]): Boolean = {
l.exists(cls.runtimeClass.isInstance(_))
}
Now, whenever you call containsType, a hidden extra argument of type ClassTag[T] gets passed to it. So when you write, for instance, println(containsType[String]), then this gets compiled to
scala.this.Predef.println($anon.this.containsType[String](ClassTag.apply[String](classOf[java.lang.String])))
An extra argument gets passed to containsType, namely ClassTag.apply[String](classOf[java.lang.String]). That's a really long winded way of explicitly passing a Class<String>, which is what you'd have to do in Java manually. And java.lang.Class has an isInstance function.
Now, this will mostly work, but there are still major caveats. First, generic type arguments are completely erased at runtime, so this won't help you distinguish between an Option[Int] and an Option[String] in your list, for instance. As far as the JVM is concerned, they're both Option.
Second, Java has an unfortunate history with primitive types, so containsType[Int] will actually be false in your case, despite the fact that the 3 in your list is actually an Int. This is because, in Java, generics can only be class types, not primitives, so a generic List can never contain int (note the lowercase 'i', this is considered a fundamentally different thing in Java than a class).
Scala paints over a lot of these low-level details, but the cracks show through in situations like this. Scala sees that you're constructing a list of Strings and Ints, so it wants to construct a list of the common supertype of the two, which is Any (strings and ints have no common supertype more specific than Any). At runtime, Scala Int can translate to either int (the primitive) or Integer (the object). Scala will favor the former for efficiency, but when storing in generic containers, it can't use a primitive type. So while Scala thinks that your list l contains a String and an Int, Java thinks that it contains a String and a java.lang.Integer. And to make things even crazier, both int and java.lang.Integer have distinct Class instances.
So summon[ClassTag[Int]].runtimeClass in Scala is java.lang.Integer.TYPE, which is a Class<Integer> instance representing the primitive type int (yes, the non-class type int has a Class instance representing it), while summon[ClassTag[java.lang.Integer]].runtimeClass is classOf[java.lang.Integer], a distinct Class<Integer> representing the non-primitive type Integer. And at runtime, your list contains an instance of the latter.
In summary, generics in Java are a hot mess. Scala does its best to work with what it has, but when you start playing with reflection (which ClassTag does), you have to start thinking about these problems.
println(containsType[Boolean]) // false
println(containsType[Double]) // false
println(containsType[Int]) // false (list can't contain primitive type)
println(containsType[Integer]) // true (3 is converted to an Integer)
println(containsType[String]) // true (class type so it works the way you expect)
println(containsType[Unit]) // false
println(containsType[Long]) // false
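If you do want containsType[Int] to find the boxed 3, one possible workaround is to translate primitive classes to their wrapper classes before calling isInstance. This is only a sketch and not part of the answer above: the boxed map and the object name are assumptions made for illustration, and only a few primitives are covered.

```scala
import scala.reflect.ClassTag

object BoxedContainsType {
  val l: List[Any] = List("Some string", 3)

  // Primitive Class objects mapped to their java.lang wrapper classes
  // (deliberately incomplete; extend as needed).
  private val boxed: Map[Class[_], Class[_]] = Map(
    classOf[Int]     -> classOf[java.lang.Integer],
    classOf[Long]    -> classOf[java.lang.Long],
    classOf[Double]  -> classOf[java.lang.Double],
    classOf[Boolean] -> classOf[java.lang.Boolean]
  )

  def containsType[T](implicit tag: ClassTag[T]): Boolean = {
    // Fall back to the tag's own class when it is not a known primitive.
    val cls = boxed.getOrElse(tag.runtimeClass, tag.runtimeClass)
    l.exists(x => cls.isInstance(x))
  }

  def main(args: Array[String]): Unit = {
    println(containsType[Int])     // true
    println(containsType[String])  // true
    println(containsType[Boolean]) // false
  }
}
```

This still cannot tell an Option[Int] from an Option[String], since that limitation comes from erasure rather than from boxing.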
Scala uses the type erasure model of generics. This means that no information about type arguments is kept at runtime, so there's no way to determine at runtime the specific type arguments of the given List object. All the system can do is determine that a value is a List of some arbitrary type parameters.
You can verify this behavior by trying any List concrete type:
val l = List("Some string", 3)
println(l.isInstanceOf[List[Int]]) // true
println(l.isInstanceOf[List[String]]) // true
println(l.isInstanceOf[List[Boolean]]) // also true
println(l.isInstanceOf[List[Unit]]) // also true
Now regarding your example:
def containsType[T] = l.exists(_.isInstanceOf[T])
println(containsType[Int]) // true
println(containsType[Boolean]) // also true
println(containsType[Unit]) // also true
println(containsType[Double]) // also true
isInstanceOf is a synthetic function (a function generated by the Scala compiler at compile-time, usually to work around the underlying JVM limitations) and does not work the way you would expect with generic type arguments like T, because after compilation, this would normally be equivalent in Java to instanceof T which, by the way - is illegal in Java.
Why is it illegal? Because of type erasure. Type erasure means all your generic code (generic classes, generic methods, etc.) is converted to non-generic code. This usually means 3 things:
all type parameters in generic types are replaced with their bounds or Object if they are unbounded;
wherever necessary the compiler inserts type casts to preserve type-safety;
bridge methods are generated if needed to preserve polymorphism of all generic methods.
However, in the case of instanceof T, the JVM cannot differentiate between types of T at execution time, so this makes no sense. The type used with instanceof has to be reifiable, meaning that all information about the type needs to be available at runtime. This property does not apply to generic types.
So if Java forbids this because it can't work, why does Scala even allow it? The Scala compiler is indeed more permissive here, but for one good reason: it treats it differently. Like the Java compiler, the Scala compiler also erases all generic code at compile-time, but since isInstanceOf is a synthetic function in Scala, calls to it using generic type arguments such as isInstanceOf[T] are replaced during compilation with instanceof Object.
Here's a sample of your code decompiled:
public <T> boolean containsType() {
return this.l().exists(x$1 -> BoxesRunTime.boxToBoolean(x$1 instanceof Object));
}
Main$.l = (List<Object>)package$.MODULE$.List().apply((Seq)ScalaRunTime$.MODULE$.wrapIntArray(new int[] { 1, 2, 3 }));
Predef$.MODULE$.println((Object)BoxesRunTime.boxToBoolean(this.containsType()));
Predef$.MODULE$.println((Object)BoxesRunTime.boxToBoolean(this.containsType()));
This is why no matter what type you give to the polymorphic function containsType, it will always result in true. Basically, containsType[T] is equivalent to containsType[_] from Scala's perspective - which actually makes sense because a generic type T, without any upper bounds, is just a placeholder for type Any in Scala. Because Scala cannot have raw types, you cannot for example, create a List without providing a type parameter, so every List must be a List of "something", and that "something" is at least an Any, if not given a more specific type.
Therefore, isInstanceOf can only be called meaningfully with specific (concrete) type arguments like Boolean, Double, String, etc. That is why this works as expected:
println(l.exists(_.isInstanceOf[Boolean])) // false
We said that Scala is more permissive, but that does not mean you get away without a warning.
To alert you to the possibly non-intuitive runtime behavior, the Scala compiler usually emits unchecked warnings. For example, if you had run your code in the Scala interpreter (or compiled it using scalac), you would have received one.

What is the difference between Any and Unit?

What is the difference between Any and Unit in Scala?
I know both are data types, but what is the difference?
Unit is kind of like Java's void, except it has an actual value (() is the only value of type Unit).
Any is the parent type of every other type. () is an instance of Any. 1 is an instance of Any. "Hello" is an instance of Any.
Any has two direct sub-types; AnyVal (which includes types that Java would consider "primitives" like Int and Boolean), and AnyRef (like java.lang.Object).
Any represents an object of any type, roughly the same as void * in C/C++.
Unit represents no object, roughly the same as void in C/C++.
Neither carries much semantic meaning in the following sense:
if we have a value of type Any we do not really know what kind of value it is unless we cast it at runtime
if we have a value of Unit we know a side-effect was executed, but we do not really know much about the kind of side-effect it was.
Ideally, we want to narrow down the meaning of the program as much as possible, thus we try to minimise the usage of Any and Unit, whilst maximising the usage of semantically richer types.
As a side-note, Any is the root of the Scala class hierarchy, which means even Unit is a subtype of Any, for example, the following is valid: val a: Any = ().
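To make the distinction concrete, here is a small sketch. The object name and the describe and log helpers are invented for this example; it shows that () is an ordinary value that can be stored as an Any, and that a method returning Unit still returns that value.

```scala
object AnyVsUnit {
  // Any tells us nothing about the value, so we inspect it at runtime.
  def describe(x: Any): String = x match {
    case ()        => "the Unit value"
    case i: Int    => s"an Int: $i"
    case s: String => s"a String: $s"
    case _         => "something else"
  }

  // A method whose only purpose is its side effect returns Unit.
  def log(msg: String): Unit = println(msg)

  def main(args: Array[String]): Unit = {
    val values: List[Any] = List((), 1, "Hello")
    values.map(describe).foreach(println)

    val u: Unit = log("side effect")
    println(describe(u)) // the Unit value
  }
}
```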

Scala type hierarchy

I looked at Scala Type Hierarchy
It's pretty clear that Unit is a subtype of AnyVal. So I tried this:
object Main extends App {
val u : Unit = Seq[String]()
}
and it compiled fine, but I expected an error.
Why? Unit is not a supertype of a Seq
This happens because Unit can also be inferred from statements (in contrast with expressions). This means that if you use the Unit type annotation, the compiler will ignore the last value in your expression and return Unit after your actual code.
You can simply test this by casting your u back to its original type. You will get a cast error, because u isn't actually bound to your Seq.
Here's what happens if you run this in the REPL:
scala> u.asInstanceOf[Seq[String]]
java.lang.ClassCastException: scala.runtime.BoxedUnit cannot be cast to scala.collection.Seq
As Jasper-M correctly pointed out, the compiler will rewrite this to
val u: Unit = { Seq[String](); () }
Why? Unit is not a supertype of a Seq
You are assuming that the Seq is assigned to u. But it isn't:
println(u)
# ()
As you can see, the value of u is () (i.e. the unique singleton instance of Unit), not an empty Seq.
This is due to Value Discarding, which is described in clause 5 of section 6.26.1 Value Conversions of the Scala Language Specification:
The following seven implicit conversions can be applied to an expression e which has some value type T and which is type-checked with some expected type pt.
[…]
If e has some value type and the expected type is Unit, e is converted to the expected type by embedding it in the term { e; () }.
In plain English as opposed to "Speclish", what this means is: if you say "this is of type Unit", but it actually isn't, the compiler will return () for you.
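The rewrite can be observed with a short sketch. The names ValueDiscardDemo, compute and evaluated are hypothetical; the counter is there only to show that the discarded expression is still evaluated.

```scala
object ValueDiscardDemo {
  var evaluated = 0

  // Hypothetical helper that records each call and returns a non-Unit value.
  def compute(): Seq[String] = { evaluated += 1; Seq("a") }

  def main(args: Array[String]): Unit = {
    val u: Unit = compute()         // value discarding: treated as { compute(); () }
    val v: Unit = { compute(); () } // the same thing written out by hand
    println(u)         // ()
    println(v)         // ()
    println(evaluated) // 2
  }
}
```

Depending on compiler version and flags, the first assignment may produce a value-discard warning, but it is not an error.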
The Unit type is sort-of the equivalent to a statement: statements have no value. Well, in Scala, everything has a value, but we have () which is a useless value, so what would in other languages be a statement which has no value, is in Scala an expression which has a useless value. And just like other languages which distinguish between expressions and statements have such things like statement expressions and expression statements (e.g. ECMAScript, Java, C, C++, C♯), Scala has Value Discarding, which is essentially its analog for an Expression Statement (i.e. a statement which contains an expression whose value is discarded).
For example, in C you are allowed to write
printf("Hello, World");
even though printf is a function which returns an int. But C will happily allow you to use printf as a statement and simply discard the return value. Scala does the same.
Whether or not that's a good thing is a different question. Would it be better if C forced you to write
int ignoreme = printf("Hello, World");
or if Scala forced you to write
val _ = processKeyBinding(…)
// or
{ processKeyBinding(…); () }
instead of just
processKeyBinding(…)
I don't know. However, one of Scala's design goals is to integrate well with the host platform and that does not just mean high-performance language interoperability with other languages, it also means easy onboarding of existing developers, so there are some features in Scala that are not actually needed but exist for familiarity with developers from the host platform (e.g. Java, C♯, ECMAScript) – Value Discarding is one of them, while loops are another.

def layout[A](x: A) = ... syntax in Scala

I'm a beginner of Scala who is struggling with Scala syntax.
I got the line of code from https://www.tutorialspoint.com/scala/higher_order_functions.htm.
I know (x: A) is an argument of layout function
( which means argument x of Type A)
But what is [A] between layout and (x: A)?
I've been googling scala function syntax, couldn't find it.
def layout[A](x: A) = "[" + x.toString() + "]"
It's a type parameter, meaning that the method is parameterised (some also say "generic"). Without it, the compiler would think that x: A denotes a variable of some concrete type A, and when it couldn't find any such type it would report a compile error.
This is a fairly common thing in statically typed languages; for example, Java has the same thing, only syntax is <A>.
Type parameters of classes and traits can additionally carry variance annotations, written [+A] (covariant) and [-A] (contravariant), which restrict the positions where the type may occur. Variance is definitely not in the scope of this question and is probably too much for you to handle right now, but it's an important concept, so I figured I'd mention it, at least to let you know what those plus and minus signs mean when you see them (and you will).
Also, type parameters can be upper or lower bounded, denoted as [A <: SomeType] and [A >: SomeType]. This means that the generic parameter needs to be a subtype/supertype of another type, in this case a made-up type SomeType.
There are even more constructs that contribute extra information about the type (e.g. context bounds, denoted as [A : Foo], used for typeclass mechanism), but you'll learn about those later.
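The constructs mentioned above can be sketched side by side. Apart from layout, which comes from the question, the method names below are invented for illustration.

```scala
object TypeParamDemo {
  // Plain type parameter: works for any A.
  def layout[A](x: A): String = "[" + x.toString + "]"

  // Upper bound: A must be a subtype of CharSequence, so length() is available.
  def longer[A <: CharSequence](a: A, b: A): A =
    if (a.length() >= b.length()) a else b

  // Context bound: an implicit Ordering[A] must exist for A.
  def largest[A: Ordering](xs: List[A]): A = xs.max

  def main(args: Array[String]): Unit = {
    println(layout(1))              // [1]
    println(longer("ab", "abc"))    // abc
    println(largest(List(3, 1, 2))) // 3
  }
}
```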
This means that the method is using a generic type as its parameter. Every type you pass that has the definition for .toString could be passed through layout.
For example, you could pass both int and string arguments to layout, since you could call .toString on both of them.
val i = 1
val s = "hi"
layout(i) // would give "[1]"
layout(s) // would give "[hi]"
Without the generic parameter, for this example you would have to write two definitions for layout: one that accepts integers as a parameter, and one that accepts strings. Even worse: every time you need another type you'd have to write another definition that accepts it.
Take a look at this example here and you'll understand it better.
I also recommend you take a look at generic classes here.
A is a type parameter. Rather than being a data type itself (Ex. case class A), it is generic to allow any data type to be accepted by the function. So both of these will work:
layout(123f) [Float datatype] will output: "[123.0]"
layout("hello world") [String datatype] will output: "[hello world]"
Hence, whichever datatype is passed, the function will allow. These type parameters can also specify rules. These are called contravariance and covariance. Read more about them here!

Scala: why doesn't List[=>Int] work?

I've been working on learning the ins and outs of scala, and recently I've come across something I'm curious about.
As I understand, if I want to pass a block of code that is effectively lazily evaluated to a function, (without evaluating it on the spot) I could type:
def run(a: =>Int):Int = {...}
In this sense, the function run receives a block of code, that is yet to be evaluated, which it evaluates and returns the computed Int of. I then tried to extend this idea to the List data structure. Typing:
def run(a: List[=>Int]) = {...}
This however, returns an error. I was wondering why this is disallowed. How, other than by this syntax can I pass a list of unevaluated blocks of code?
=>Int is the syntax for by name parameters. =>Int is not a type, so it can't be used as a parameter to List. However, ()=>Int is a type. It's the type of nullary functions that return Int. So this works:
def run(a: List[()=>Int]) = {...}
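A minimal sketch of how this might be used follows. The object name and the calls counter are assumptions added only to show that nothing is evaluated until run applies each function.

```scala
object ThunkDemo {
  // Evaluates every deferred block and sums the results.
  def run(thunks: List[() => Int]): Int = thunks.map(t => t()).sum

  def main(args: Array[String]): Unit = {
    var calls = 0
    val thunks: List[() => Int] = List(
      () => { calls += 1; 1 },
      () => { calls += 1; 2 }
    )
    println(calls)       // 0, nothing has been evaluated yet
    println(run(thunks)) // 3
    println(calls)       // 2
  }
}
```

Note that each call to run evaluates the blocks again; if results should be computed at most once, a lazy wrapper is needed instead.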
A by-name parameter is not a first-class type in Scala.
List[()=>Int] is one solution. Otherwise, you can use one of the following lazy data structures.
https://gist.github.com/1164885
https://github.com/scalaz/scalaz/blob/v6.0.4/core/src/main/scala/scalaz/Name.scala#L99-107
https://github.com/scalaz/scalaz/blob/v7.0.0-M7/core/src/main/scala/scalaz/Name.scala#L37-L60
https://github.com/scala/scala/blob/v2.10.0/src/reflect/scala/reflect/internal/transform/Transforms.scala#L9-23
https://github.com/harrah/xsbt/blob/v0.12.1/compile/api/SafeLazy.scala
https://github.com/okomok/ken/blob/0.1.0/src/main/scala/com/github/okomok/ken/Lazy.scala
https://github.com/playframework/Play20/blob/2.1-RC1/framework/src/play/src/main/scala/play/api/libs/functional/Util.scala#L3-L11