Why are there RichInt or RichX in Scala? - scala

This is a simple question.
Why don't the methods related to Int reside in Int itself?
Instead, Scala goes to the trouble of putting them into RichInt and relying on an implicit conversion so that they work like methods of Int.
Why bother?

Scala doesn't exist in a vacuum. It was specifically designed to be hosted in an ecosystem / on a platform which was mostly designed for another language: the Java platform, the .NET platform, the ECMAScript platform, Cocoa, etc.
This means that in some cases compromises had to be made, in order to make Scala operate seamlessly, efficiently, and with high performance with the ecosystem, libraries and language of the host platform. That's why it has null, why it has classes (it could get by with just traits, and allow traits to have constructors), why it has packages (because they can be cleanly mapped to Java packages or .NET namespaces), why it doesn't have proper tail calls, doesn't have reified generics, etc. It's even why it has curly braces, not to make it easier to integrate with Java, but to make it easier to integrate with the brains of Java developers.
scala.Int is a fake class: it represents a native platform integer (primitive int in Java, System.Int32 in .NET, etc.). Being fake, it can't really have any methods other than the operations provided by the host environment.
The alternative would be to have all operations in the Int class and have the compiler know the difference between which methods are native and which aren't. But that's a special case, it makes more sense to concentrate efforts on making "enrich-my-library" fast in general, so that all programmers can benefit from those optimizations instead of spending time, money and resources on optimizations that only apply to twelve or so classes.
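The "enrich-my-library" pattern mentioned above can be sketched with an implicit value class. This is a hypothetical example (`isEven` is not in the standard library), but it uses the same machinery as RichInt:

```scala
// A minimal sketch of the "enrich-my-library" pattern behind RichInt.
// `isEven` is a hypothetical extension method, not in the standard library.
object Enrichment {
  // Extending AnyVal makes this a value class, so the compiler can usually
  // avoid allocating the wrapper object, just as it does for RichInt.
  implicit class RicherInt(private val self: Int) extends AnyVal {
    def isEven: Boolean = self % 2 == 0
  }
}

object EnrichmentDemo extends App {
  import Enrichment._
  // Reads like a method defined on Int, but compiles to a call on the
  // wrapper's companion (compare RichInt$.isWhole$extension below).
  println(42.isEven) // prints: true
}
```

Because the wrapper is a value class, the enriched call site typically compiles down to a static-style `$extension` method taking the raw Int, which is exactly the optimization discussed here.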

The question is: why not model Int richly and then optimize, for example by giving it an unboxed representation and providing some operations natively?
The answer must surely be that the compiler is still not very good at these optimizations.
scala> 42.isWhole
res1: Boolean = true
scala> :javap -prv -
[snip]
9: getstatic #26 // Field scala/runtime/RichInt$.MODULE$:Lscala/runtime/RichInt$;
12: getstatic #31 // Field scala/Predef$.MODULE$:Lscala/Predef$;
15: bipush 42
17: invokevirtual #35 // Method scala/Predef$.intWrapper:(I)I
20: invokevirtual #39 // Method scala/runtime/RichInt$.isWhole$extension:(I)Z
23: putfield #17 // Field res1:Z
26: return
or under -optimize
9: getstatic #26 // Field scala/runtime/RichInt$.MODULE$:Lscala/runtime/RichInt$;
12: getstatic #31 // Field scala/Predef$.MODULE$:Lscala/Predef$;
15: astore_1
16: bipush 42
18: invokevirtual #35 // Method scala/runtime/RichInt$.isWhole$extension:(I)Z
21: putfield #17 // Field res0:Z
24: return

Related

Scala and JMM: number types performance

I'm new to Scala and don't understand some basic things.
Scala does not contain primitives. Hence int, short and other "simple" number types are objects. So, according to the JMM, they are not located on the stack and are subject to collection by the GC. GC may be too expensive in some cases.
So I don't clearly understand why Scala is considered faster than Java (in which primitives are located on the stack).
Scala does not contain primitives. Hence int, short and other "simple" number types are objects.
That is correct.
So, according to the JMM,
The Java Memory Model is for Java. It is completely irrelevant to Scala.
they are not located on the stack and are subject to collection by the GC. GC may be too expensive in some cases.
There is no such thing as a "stack" in Scala. The Scala Language Specification only mentions the term "stack" in very few places, and none of them have anything to do with Ints:
In section 1 Lexical Syntax, subsection 1.6 XML mode, it is said that because XML literals and Scala code can be arbitrarily nested, the parser has to use a stack data structure to keep track of the context.
In section 7 Implicits, subsection 7.2 Implicit parameters, it is said that to prevent an infinite recursion when searching for implicit, the compiler keeps a stack of "open types", which are types that it is currently searching an implicit for.
In section 6 Expressions, subsection 6.6 Function Applications, there is the following statement, specifying Proper Direct Tail Recursion:
A function application usually allocates a new frame on the program's run-time stack. However, if a local method or a final method calls itself as its last action, the call is executed using the stack-frame of the caller.
In section 6 Expressions, subsection 6.20 Return Expressions, there is the following statement about one possible implementation strategy for non-local returns from nested functions:
Returning from the method from within a nested function may be implemented by throwing and catching a scala.runtime.NonLocalReturnControl. Any exception catches between the point of return and the enclosing methods might see and catch that exception. A key comparison makes sure that this exception is only caught by the method instance which is terminated by the return.
If the return expression is itself part of an anonymous function, it is possible that the enclosing method m has already returned before the return expression is executed. In that case, the thrown scala.runtime.NonLocalReturnControl will not be caught, and will propagate up the call stack.
Of these 4 instances, the first 2 clearly do not refer to the concept of a call stack but rather to the generic computer science data structure. The 4th one is only an example of a possible implementation strategy ("Returning from the method from within a nested function may be implemented by […]"). Only the 3rd one is actually relevant, as it indeed talks about a call stack. However, it does not say anything about allocating Ints, and it explicitly leaves the door open to alternative implementations as well, by stating that "usually" function application leads to allocation of a stack frame, but doesn't have to.
So I don't clearly understand why Scala is considered faster than Java (in which primitives are located on the stack).
Actually, there is nothing in the Java Language Specification either that says that primitives are located on the stack. In fact, the Java Language Specification does not mandate the existence of a stack at all. It would be perfectly legal to implement Java without a stack.
There are exactly zero occurrences of the term "stack" in the JLS. There are a couple of mentions of the term "heap", but only in the compound term "heap pollution", which is simply a word describing a certain flaw in the type system, but does not necessarily require a heap, and does not mandate a heap.
And none of these mentions of "heap pollution" have anything to do with primitives.
Note that, when I say that the Scala Language Specification says nothing about stacks or heaps or how Ints are allocated, that is actually really important. Because the SLS doesn't say anything, implementors are allowed to do whatever they want, including making Ints primitive and allocating them on the stack.
And that is exactly what most Scala implementations do. The (now-defunct) Scala.NET implemented scala.Int as a .NET System.Int32. Scala-native implements scala.Int as a C int32_t. Scala.js implements scala.Int as an ECMAScript number. And Scala-JVM implements scala.Int as a JVM int.
If you check out the source code of scala.Int in the Scala-JVM repository (src/library/scala/Int.scala), you will find that it is actually empty! More precisely, it only contains documentation and declarations, but no definitions or implementations. Also, the class is marked final (meaning it can't be inherited from) and abstract (meaning it must be inherited from in order to provide overrides for the missing implementations), which is a contradiction.
How does this work? Well, the compiler knows what an Int is and how it works, and it simply generates the correct code for dealing with a JVM int. So, when it sees a call to scala.Int.+, it knows that instead it must generate an iadd bytecode instruction. Likewise, Scala-native will just generate the native integer addition instructions, and so on.
In other words, Ints are semantically defined as objects, but they are actually pragmatically implemented as primitives.
This is a general rule of how language specifications work: typically, they only describe what the result is that the programmer sees, but they leave it open to the implementor how to actually achieve that result. So, the SLS specifies that an Int must look as if it actually were an object, but there is nothing that says it actually has to be one.
They are handled the same way that Java handles those types, they're only boxed when strictly necessary. The details on how and when they are boxed may differ, but the compiler uses a primitive representation if it can do so. Here's what the docs say (this is just for Int, but it applies to other "primitive" types too):
Int, a 32-bit signed integer (equivalent to Java's int primitive type) is a subtype of scala.AnyVal. Instances of Int are not represented by an object in the underlying runtime system.
There is an implicit conversion from scala.Int => scala.runtime.RichInt which provides useful non-primitive operations.
https://www.scala-lang.org/api/2.13.6/scala/Int.html
The main difference, really, is that there aren't two separate types, like in Java, to represent the boxed and unboxed representations — both get the same Int type, whereas Java has int and Integer.
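The point that boxing happens only when strictly necessary can be observed directly. In this sketch, the same Int value is a primitive JVM int until it is upcast to Any, at which point the runtime class is java.lang.Integer:

```scala
object BoxingDemo extends App {
  val i: Int = 42                 // stored as a primitive JVM int at runtime
  val boxed: Any = i              // upcasting to Any forces boxing

  // The runtime class of the boxed value is Java's wrapper class, even
  // though at the Scala level both values have the single type Int.
  println(boxed.getClass.getName) // prints: java.lang.Integer
}
```

So the one Scala-level type Int maps to two JVM-level representations, and the compiler picks between them per use site.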

Understanding the Idea behind Tail Recursion in Scala

I come from an object-oriented background, where I primarily wrote applications in Java. I recently started to explore Scala more and have been reading some texts. I came across something called tail recursion, and I understood how to write tail-recursive methods.
For example, to add the elements in a List (of course this could be done using the reduce method), for the sake of understanding I wrote a tail-recursive method:
@scala.annotation.tailrec
def sum(l: List[Int], acc: Int): Int = l match {
  case Nil     => acc
  case x :: xs => sum(xs, acc + x)
}
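As a quick, self-contained check (repeating the method here so the snippet compiles on its own):

```scala
object SumCheck extends App {
  @scala.annotation.tailrec
  def sum(l: List[Int], acc: Int): Int = l match {
    case Nil     => acc
    case x :: xs => sum(xs, acc + x)
  }

  // The accumulator carries the running total, which is what makes the
  // recursive call the last action and allows the compiler to loop.
  println(sum(List(1, 2, 3, 4, 5), 0)) // prints: 15
}
```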
How is this recursion handled internally by the Scala run time?
How is this recursion handled internally by the Scala run time?
It isn't. It is handled by the compiler at compile time.
Tail-recursion is equivalent to a while loop. So, a tail-recursive method can be compiled to a while loop, or, more precisely, it can be compiled the same way a while loop is compiled. Of course, how exactly it is compiled depends on the compiler being used.
There are currently three major implementations of Scala, these are Scala-native (a compiler that targets native machine code with its own runtime), Scala.js (a compiler that targets the ECMAScript platform, sitting on top of the ECMAScript runtime), and the JVM implementation Scala which confusingly is also called "Scala" like the language (which targets the JVM platform and uses the JVM runtime). There used to be a Scala.NET, but that is no longer actively maintained.
I will focus on Scala-JVM in this answer.
I'll use a slightly different example than yours, because the encoding of pattern matching is actually fairly complex. Let's start with the simplest possible tail-recursive function there is:
def foo(): Unit = foo()
This gets compiled by Scala-JVM to the following JVM bytecode:
public void foo()
0: goto 0
Remember how I said above that tail-recursion is equivalent to looping? Well, the JVM doesn't have loops, it only has GOTO. This is exactly the same as a while loop:
def bar(): Unit = while (true) {}
Gets compiled to:
public void bar()
0: goto 0
And for a slightly more interesting example:
def baz(n: Int): Int = if (n <= 0) n else baz(n-1)
gets compiled to:
public int baz(int);
0: iload_1
1: iconst_0
2: if_icmpgt 9
5: iload_1
6: goto 16
9: iload_1
10: iconst_1
11: isub
12: istore_1
13: goto 0
16: ireturn
As you can see, it is just a while loop.
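In Scala source, the transformation the compiler performs on baz can be written out by hand: the recursive call becomes a store back into the parameter slot plus a jump. This is a sketch of the equivalence, not the literal output of scalac:

```scala
object LoopEquiv extends App {
  def baz(n: Int): Int = if (n <= 0) n else baz(n - 1)

  // Hand-written version of what the compiler effectively emits.
  def bazLoop(n0: Int): Int = {
    var n = n0          // the parameter becomes a mutable local
    while (n > 0) {     // the if_icmpgt test, inverted
      n -= 1            // iconst_1 / isub / istore_1
    }
    n                   // the n <= 0 branch: iload_1 / ireturn
  }

  println(baz(5) == bazLoop(5) && baz(-3) == bazLoop(-3)) // prints: true
}
```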

Why are FunctionN types in Scala created as a subtype of AnyRef, not AnyVal?

If I understand correctly, under the JVM, every time I use a lambda expression, an Object has to be created.
Why the overhead? Why did the Scala creators choose to extend AnyRef instead of AnyVal when designing FunctionN types? I mean, they don't have any real 'values' in them by themselves, so shouldn't it be possible for functions to be value objects with an underlying Unit representation (or a long containing some hash for equality checking or whatever)? I can imagine not allocating an object per every lambda can lead to performance boosts in some codebases.
One obvious disadvantage that comes to my mind of extending AnyVal is that it would prohibit subclassing function types. Maybe that alone would be sufficient to be not extending AnyVal, but what other reasons can there be?
--Edit
I understand that functions need to close over other variables, but I think it would be more natural to model them as arguments to the apply method, not as field members of FunctionN objects (thus removing the need for a java.lang.Object for this part). After all, aren't the variables that are closed over all known at compile time?
--Edit again
I found out about it; what I had in mind was 'lambda lifting'.
The only ways to call a method are the bytecode operations invokevirtual (virtual dispatch on a class), invokeinterface (the same, but on interfaces), invokespecial (invoke exactly the given method, ignoring virtual lookup; used for private, super, and new), and invokestatic (call a static method). invokespecial is out, because calling exactly some function is the antithesis of abstracting over a function. invokestatic is out, too, because it's essentially an invokespecial clone that doesn't need a this argument. invokevirtual and invokeinterface are similar enough for our purposes to be considered the same.
There's no way to pass a plain function pointer like you might see in C, and even if you could, you could never call it, as there is no instruction that can jump to an arbitrary point in code. (All jump targets inside a method are restricted to that method, and all references to the outside boil down to strings. (The JVM, of course, is free to optimize that representation once a file is loaded.))
In order to invoke a method with either instruction, the JVM must look up the method inside the target object's virtual dispatch table. If you tried to dummy out the object with () (AnyVal subclasses didn't exist until 2.10, but let's suspend our disbelief), then the JVM would get horribly confused when you tried to call a (presumably interesting) method on something that's as close to "featureless blob" as you can get.
Also remember that an object's methods are totally determined by its class. If a.getClass == b.getClass, then a and b have the exact same methods, code and all. In order to get around this, we need to create subclasses of the FunctionN traits, such that each subclass represents one function, and each instance of each class contains a reference to its class which contains a reference to the code associated with that function. Even with invokedynamic, this is still true, as the LambdaMetaFactory's current implementation creates an inner class at runtime.
Finally, the assumption that functions need no state is false, as @Oleg points out. Closures need to keep references to their environment, and that is only possible with an object.
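To make the closure-needs-state point concrete: a lambda that captures a variable is conceptually a FunctionN subclass whose field holds the captured value. This is a sketch of the idea, not the exact class scalac or the LambdaMetafactory produces:

```scala
object ClosureDemo extends App {
  // A closure capturing `offset` from its enclosing scope.
  def makeAdder(offset: Int): Int => Int =
    (x: Int) => x + offset

  // Conceptually, the compiler generates something like this: a Function1
  // subclass whose constructor stores the captured environment in a field.
  class AdderClosure(offset: Int) extends (Int => Int) {
    def apply(x: Int): Int = x + offset
  }

  val f = makeAdder(10)
  val g = new AdderClosure(10)
  println(f(5) == g(5)) // prints: true
}
```

The captured `offset` has to live somewhere between the closure's creation and its invocation, and a field on an object is where it goes.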

unrecognizable code in scala Predef object

Can someone please explain the following code in Predef object? Thanks.
scala.`package` // to force scala package object to be seen.
scala.collection.immutable.List // to force Nil, :: to be seen.
I can only guess. When you use a singleton object as an expression, this has the same effect as forcing the evaluation of a lazy val; in other words, it will run the object's body if it wasn't yet initialised.
For example:
object Foo {
  println("Foo is launching rockets...")
}
Now when you write just
Foo // prints `Foo is launching rockets...`
this enforces the evaluation of the contents of Foo.
So my guess is that in Predef this just makes sure that certain things in the scala package object and in List are initialised. It's very unlikely to be something you would need to bother with as a user.
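The initialise-on-first-reference behaviour can be made observable by logging from an object's body. This is a runnable sketch of the mechanism (the names here are made up for the demo):

```scala
object InitDemo extends App {
  val log = scala.collection.mutable.Buffer.empty[String]

  object Rockets {
    log += "Rockets initialised" // the object body runs once, lazily
  }

  log += "before first reference"
  Rockets                        // forces initialisation, as Predef's
                                 // `scala.`package`` reference does
  log += "after first reference"

  println(log.mkString(" | "))
  // prints: before first reference | Rockets initialised | after first reference
}
```

Note that the body runs at the point of the first reference, not at class-load time, which is exactly why Predef mentions an expression statement to "force" the package object to be seen.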
The accepted answer has to be wrong.
Update: The mailing list provided a couple of helpful links to issues, without saying whether the problem is only runtime dependencies between modules or also something compiler-internal or similar.
Predef is imported at compile time to make various definitions available when your source file is compiled. (Update: but still carries some value definitions, and not all its methods are inlined. And in any case, even an inlined usage, of assert for example, will incur a reference to Predef$.MODULE$, because of side-effects at initialization, see below.)
You can turn off that behavior with -Yno-predef. For example,
apm@mara:~/goof$ scala -Yno-predef
Welcome to Scala version 2.10.3 (OpenJDK 64-Bit Server VM, Java 1.7.0_25).
Type in expressions to have them evaluated.
Type :help for more information.
scala> "a" -> 1
<console>:8: error: value -> is not a member of String
"a" -> 1
^
It has nothing to do with runtime evaluation. Otherwise, everything in Predef and everything it referenced would get loaded before your program got a chance to run.
(Update: yeah, well, guess what.)
In Scala, import is not like #include.
Useful exercise:
scala> :javap -prv scala.Predef$
private scala.Predef$();
flags: ACC_PRIVATE
Code:
stack=3, locals=1, args_size=1
0: aload_0
1: invokespecial #520 // Method scala/LowPriorityImplicits."<init>":()V
4: aload_0
5: putstatic #522 // Field MODULE$:Lscala/Predef$;
8: aload_0
9: invokestatic #526 // Method scala/DeprecatedPredef$class.$init$:(Lscala/Predef$;)V
12: getstatic #531 // Field scala/package$.MODULE$:Lscala/package$;
15: pop
16: getstatic #536 // Field scala/collection/immutable/List$.MODULE$:Lscala/collection/immutable/List$;
19: pop
20: aload_0
21: getstatic #540 // Field scala/collection/immutable/Map$.MODULE$:Lscala/collection/immutable/Map$;
24: putfield #151 // Field Map:Lscala/collection/immutable/Map$;
// etc, etc
No doubt my understanding is incomplete.

Why does running javap on a compiled Scala class show weird entries in the constant pool?

When running javap -v on the compiled class resulting from this bit of Scala (version 2.8.1 final):
class Point(x : Int, y : Int)
I get the following output for the constant pool entries, along with several terminal beeps indicating non-printable chars?
#19 = Utf8 Lscala/reflect/ScalaSignature;
#20 = Utf8 bytes
#21 = Utf8 \t2\"\t!!>Lg9A(Z7qift4A\nqCA\r!BA
aM\4
-\tAA[1wC&Q\nTWm;=R\"\t
E\tQa]2bYL!a\tMr\1PE*,7\r\t+\t)A-\t/%:$
eDu\taP5oSRtDc!CAqA!)Qca-!)!da-
#22 = Utf8 RuntimeVisibleAnnotations
#23 = Utf8 Point
#24 = Class #23 // Point
Any idea what's going on and why? I've never seen binary garbage in CONSTANT_Utf8 entries before.
I'm using an OpenJDK 7 build on Mac 10.6, if that makes a difference - I will try to replicate tomorrow when I have other OSes to play with, and will update accordingly.
The ScalaSignature element is where the extra type information that Scala needs is stored. It's being stored (encoded, obviously) in annotations now so that it can be made available to reflection tools.