Why is String different than Int, Boolean, Byte... in Scala?

Because I know a little Java, I was trying to use Java types such as java.lang.Integer, java.lang.Character, java.lang.Boolean and so on in all my Scala code. Now people have told me: "No! Scala has its own type for everything. The Java types will work, but you should always prefer the Scala types and objects."
OK, now I see that Scala has a counterpart for everything in Java. I'm not sure why it is better to use, for example, Scala's Boolean instead of Java's Boolean, but fine. When I look at the types I see scala.Boolean, scala.Int, scala.Byte ... and then I look at String, but it's not scala.String (well, confusingly, it's not even java.lang.String), it's just String. But I thought I should use everything that comes directly from Scala. Maybe I do not understand Scala correctly; could somebody explain it, please?

First, the statement 'well its not even java.lang.String' is not quite correct. The plain name String comes from a type alias defined in the Predef object:
type String = java.lang.String
and the contents of Predef are imported into every Scala source file, hence you can write String instead of the full java.lang.String, but they are in fact the same type.
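A minimal sketch of how one might check the alias, assuming Scala on the JVM (the object and value names here are made up for illustration). The `=:=` evidence only compiles if the two types are identical:

```scala
object StringAliasDemo {
  // Compiles only if String and java.lang.String are the same type.
  val evidence: String =:= java.lang.String = implicitly[String =:= java.lang.String]

  val s: String = "hello"
  // Plain assignment: no conversion or wrapping is involved.
  val j: java.lang.String = s

  // On the JVM the runtime class is java.lang.String.
  val runtimeClassName: String = s.getClass.getName

  def main(args: Array[String]): Unit =
    println(runtimeClassName)
}
```

Because the alias is only a name, `s` and `j` refer to the very same object.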
java.lang.String is a very special class that the JVM treats in a special way. As @pagoda_5b said, it is declared final, so it is not possible to extend it (and this is in fact a good thing). The Scala library therefore provides a wrapper (StringOps in current versions, RichString in early ones) with additional operations, plus an implicit conversion from String to that wrapper which is available by default.
However, the situation is slightly different for Integer, Character, Boolean etc. You see, even though String is treated specially by the JVM, it is still a plain class whose instances are plain objects. Semantically it is no different from, say, the List class.
Primitive types are another matter. The Java types int, char and boolean are not classes, and values of these types are not objects. But Scala is a fully object-oriented language; it has no primitive types. It would be possible to use java.lang.{Integer,Boolean,...} everywhere you need the corresponding types, but this would be awfully inefficient because of boxing.
Because of this, Scala needed a way to present Java's primitive types in an object-oriented setting, and so the scala.{Int,Boolean,...} classes were introduced. These types are treated specially by the Scala compiler: scalac generates code that works with primitives when it encounters one of these classes. They also extend the AnyVal class, which prevents you from using null as a value of these types. This approach solves the efficiency problem, leaves the java.lang.{Integer,Boolean,...} classes available where you really need boxing, and also provides an elegant way to use the primitives of another host system (e.g. the .NET runtime).
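As a rough illustration of the points above, assuming Scala on the JVM (the object and value names are hypothetical), the following sketch shows that scala.Int maps to the JVM primitive and that boxing only happens when asked for:

```scala
object PrimitiveDemo {
  val i: Int = 42

  // Explicit boxing and unboxing via the Int companion object.
  val boxed: java.lang.Integer = Int.box(i)
  val unboxed: Int = Int.unbox(boxed)

  // val bad: Int = null   // does not compile: Int extends AnyVal

  // On the JVM, classOf[Int] is the primitive int class, not java.lang.Integer.
  val intIsPrimitive: Boolean = classOf[Int].isPrimitive
  val sameAsJavaInt: Boolean = classOf[Int] == java.lang.Integer.TYPE
}
```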

I'm just guessing here
If you look at the docs, you can see that the Scala versions of the primitives give you all the expected operators that work on numeric or boolean types, along with sensible conversions, without resorting to boxing and unboxing as the java.lang wrappers do.
I think this choice was made to give uniform and natural access to what was expected of primitive types, while at the same time making them Objects as any other scala type.
I suppose that java.lang.String required a different approach, being an Object already, and final in its implementation. So the "path of least pain" was to create an implicit Rich wrapper around it to get missing operations on String, while leaving the rest untouched.
To see it another way, java.lang.String was already good enough as-is, being immutable and what-else.
It's worth mentioning that the other "primitive" types in Scala have their own Rich wrappers that provide additional sensible operations.
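A small sketch of what those wrappers look like in use (names are made up for illustration). The methods below are not defined on java.lang.String or on the primitive int; they come from the standard library wrappers through implicit conversions:

```scala
object RichWrapperDemo {
  // Added by the String wrapper (StringOps in current Scala versions).
  val reversed: String = "scala".reverse
  val parsed: Int = "42".toInt
  val capitalized: String = "hello".capitalize

  // Added by RichInt.
  val bigger: Int = 3 max 7
  val range: List[Int] = (1 to 4).toList

  // Ordinary java.lang.String methods remain available unchanged.
  val upper: String = "abc".toUpperCase
}
```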

Related

Scala and JMM: number types performance

I'm new to Scala and don't understand some basic things.
Scala does not contain primitives. Hence int, short and other "simple" number types are objects. So, according to the JMM, they are not located on the stack and are subject to cleanup by the GC. Cleanup by the GC may be too expensive in some cases.
So I don't clearly understand why Scala is considered faster than Java (in which primitives are located on the stack).
Scala does not contain primitives. Hence int, short and other "simple" number types are objects.
That is correct.
So, according to the JMM,
The Java Memory Model is for Java. It is completely irrelevant to Scala.
they are not located on the stack and are subject to cleanup by the GC. Cleanup by the GC may be too expensive in some cases.
There is no such thing as a "stack" in Scala. The Scala Language Specification only mentions the term "stack" in very few places, and none of them have anything to do with Ints:
In section 1 Lexical Syntax, subsection 1.6 XML mode, it is said that because XML literals and Scala code can be arbitrarily nested, the parser has to use a stack data structure to keep track of the context.
In section 7 Implicits, subsection 7.2 Implicit parameters, it is said that to prevent infinite recursion when searching for an implicit, the compiler keeps a stack of "open types", which are the types it is currently searching an implicit for.
In section 6 Expressions, subsection 6.6 Function Applications, there is the following statement, specifying Proper Direct Tail Recursion:
A function application usually allocates a new frame on the program's run-time stack. However, if a local method or a final method calls itself as its last action, the call is executed using the stack-frame of the caller.
In section 6 Expressions, subsection 6.20 Return Expressions, there is the following statement about one possible implementation strategy for non-local returns from nested functions:
Returning from the method from within a nested function may be implemented by throwing and catching a scala.runtime.NonLocalReturnControl. Any exception catches between the point of return and the enclosing methods might see and catch that exception. A key comparison makes sure that this exception is only caught by the method instance which is terminated by the return.
If the return expression is itself part of an anonymous function, it is possible that the enclosing method m has already returned before the return expression is executed. In that case, the thrown scala.runtime.NonLocalReturnControl will not be caught, and will propagate up the call stack.
Of these 4 instances, the first 2 clearly do not refer to the concept of a call stack but rather to the generic computer science data structure. The 4th one is only an example of a possible implementation strategy ("Returning from the method from within a nested function may be implemented by […]"). Only the 3rd one is actually relevant, as it indeed talks about a call stack. However, it does not say anything about allocating Ints, and it explicitly leaves the door open to alternative implementations as well, by stating that "usually" function application leads to allocation of a stack frame, but doesn't have to.
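The tail-recursion statement quoted from subsection 6.6 can be illustrated with a small sketch (the object and method names are invented here). The `@tailrec` annotation asks the compiler to reject the method if the self-call cannot reuse the caller's frame:

```scala
import scala.annotation.tailrec

object TailCallDemo {
  // A method in an object cannot be overridden, so it qualifies as "final".
  // The self-call is the last action, so no new frame is needed per step.
  @tailrec
  def sumTo(n: Int, acc: Long = 0L): Long =
    if (n == 0) acc else sumTo(n - 1, acc + n)

  // Deep enough that a non-optimized recursion would likely overflow
  // a default-sized JVM stack.
  val big: Long = sumTo(1000000)
}
```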
So I don't clearly understand why Scala is considered faster than Java (in which primitives are located on the stack).
Actually, there is nothing in the Java Language Specification either that says that primitives are located on the stack. In fact, the Java Language Specification does not mandate the existence of a stack at all. It would be perfectly legal to implement Java without a stack.
There are exactly zero occurrences of the term "stack" in the JLS. There are a couple of mentions of the term "heap", but only in the compound term "heap pollution", which is simply a word describing a certain flaw in the type system, but does not necessarily require a heap, and does not mandate a heap.
And none of these mentions of "heap pollution" have anything to do with primitives.
Note that, when I say that the Scala Language Specification says nothing about stacks or heaps or how Ints are allocated, that is actually really important. Because the SLS doesn't say anything, implementors are allowed to do whatever they want, including making Ints primitive and allocating them on the stack.
And that is exactly what most Scala implementations do. The (now-defunct) Scala.NET implemented scala.Int as a .NET System.Int32. Scala-native implements scala.Int as a C int32_t. Scala.js implements scala.Int as an ECMAScript number. And Scala-JVM implements scala.Int as a JVM int.
If you check out the source code of scala.Int in the Scala-JVM repository (src/library/scala/Int.scala), you will find that it is actually empty! More precisely, it only contains documentation and declarations, but no definitions or implementations. Also, the class is marked final (meaning it can't be inherited from) and abstract (meaning it must be inherited from in order to provide overrides for the missing implementations), which is a contradiction.
How does this work? Well, the compiler knows what an Int is and how it works, and it simply generates the correct code for dealing with a JVM int. So, when it sees a call to scala.Int.+, it knows that instead it must generate an iadd bytecode instruction. Likewise, Scala-native will just generate the native integer addition instructions, and so on.
In other words, Ints are semantically defined as objects, but they are actually pragmatically implemented as primitives.
This is a general rule of how language specifications work: typically, they only describe what the result is that the programmer sees, but they leave it open to the implementor how to actually achieve that result. So, the SLS specifies that an Int must look as if it actually were an object, but there is nothing that says it actually has to be one.
They are handled the same way that Java handles those types: they're only boxed when strictly necessary. The details of how and when they are boxed may differ, but the compiler uses a primitive representation whenever it can. Here's what the docs say (this is just for Int, but it applies to the other "primitive" types too):
Int, a 32-bit signed integer (equivalent to Java's int primitive type) is a subtype of scala.AnyVal. Instances of Int are not represented by an object in the underlying runtime system.
There is an implicit conversion from scala.Int => scala.runtime.RichInt which provides useful non-primitive operations.
https://www.scala-lang.org/api/2.13.6/scala/Int.html
The main difference, really, is that there aren't two separate types, like in Java, to represent the boxed and unboxed representations — both get the same Int type, whereas Java has int and Integer.
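A hedged sketch of that single-type view, assuming Scala on the JVM (all names are illustrative). The same Int type is used for a method parameter, where the compiler can use a primitive, and for a generic collection element, where the JVM requires a reference and the value is boxed behind the scenes:

```scala
object BoxingDemo {
  // Parameters and result can be represented as JVM ints.
  def add(a: Int, b: Int): Int = a + b

  // Generic containers hold references, so elements are boxed on the JVM,
  // yet the static type is still just Int.
  val xs: List[Int] = List(1, 2, 3)

  // Viewing an element as Any exposes the boxed runtime class.
  val elementClassName: String = (xs.head: Any).getClass.getName

  val sum: Int = xs.foldLeft(0)(add)
}
```

On the JVM, `elementClassName` is expected to be java.lang.Integer, while the source code never mentions that class.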

What types are special to the Scala compiler?

Scala makes a big deal about how what seem to be language features are implemented as library features.
Is there a list of types that are treated specially by the language?
Either in the specification or as an implementation detail?
That would include, for example, optimizing away matches on tuples.
What about special conventions related to pattern matching, for comprehensions, try-catch blocks and other language constructs?
Is String somehow special to the compiler? I see that String enhancement is just a library implicit conversion, and that String concatenation is supported by Predef, but is that somehow special-cased by the language?
Similarly, I see questions about <:< and classOf and asInstanceOf, and it's not clear what is a magical intrinsic. Is there a way to tell the difference, either with a compiler option or by looking at byte code?
I would like to understand if a feature is supported uniformly by implementations such as Scala.JS and Scala-native, or if a feature might actually prove to be implementation-dependent, depending on the library implementation.
There is an incredible number of types that are "known" to the compiler and are special to varying degrees. You can find a complete list in scalac's Definitions.scala.
We can probably classify them according to the degree of specialness they bear.
Disclaimer: I have probably forgotten a few more.
Special for the type system
The following types are crucial to Scala's type system. They have an influence on how type checking itself is performed. All these types are mentioned in the specification (or at least, they definitely should be).
Any, AnyRef, AnyVal, Null, Nothing: the five types that sit at the top and bottom of Scala's type system.
scala.FunctionN, the (canonical) type given to anonymous functions (including eta-expansion). Even in 2.12, which has the SAM treatment of anonymous functions, FunctionN remains special in some cases (notably in overloading resolution).
scala.PartialFunction (has an impact on how type inference works)
Unit
All types with literal notation: Int, Long, Float, Double, Char, Boolean, String, Symbol, java.lang.Class
All numeric primitive types and Chars, for weak conformance (together, these two bullets cover all primitive types)
Option and tuples (for pattern matching and auto-tupling)
java.lang.Throwable
scala.Dynamic
scala.Singleton
Most of scala.reflect.*, in particular ClassTag, TypeTag, etc.
scala.annotation.{,ClassFile,Static}Annotation
Almost all of the annotations in scala.annotation.* (e.g., unchecked)
scala.language.*
scala.math.ScalaNumber (for obscure reasons)
Known to the compiler as the desugaring of some language features
The following types are not crucial to the type system. They do not have an influence on type checking. However, the Scala language does feature a number of constructs which desugar into expressions of those types.
These types would also be mentioned in the specification.
scala.collection.Seq, Nil and WrappedArray, which are used for varargs parameters.
TupleN types
Product and Serializable (for case classes)
MatchError, generated by pattern matching constructs
scala.xml.*
scala.DelayedInit
List (the compiler performs some trivial optimizations on those, such as rewriting List() as Nil)
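A few of the desugarings listed above can be observed directly. This is only a sketch with invented names, showing the visible results rather than the compiler's internal rewriting:

```scala
object DesugarDemo {
  // Varargs: inside the method, xs is an ordinary Seq[Int].
  def asSeq(xs: Int*): Seq[Int] = xs
  val collected: Seq[Int] = asSeq(1, 2, 3)

  // Tuple syntax is sugar for the TupleN classes.
  val pair: Tuple2[Int, String] = (1, "one")

  // Case classes get Product and Serializable mixed in.
  case class Point(x: Int, y: Int)
  val p: Point = Point(1, 2)
  val arity: Int = p.productArity
  val serializable: Boolean = (p: Any).isInstanceOf[java.io.Serializable]

  // A match with no applicable case throws scala.MatchError at run time.
  def describe(x: Any): String =
    try {
      x match { case s: String => "string: " + s }
    } catch {
      case _: MatchError => "no case matched"
    }
}
```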
Known to the implementation of the language
This is probably the list that you care most about, given that you said you were interested in knowing what could go differently on different back-ends. The previous categories are handled by early (front-end) phases of the compiler, and are therefore shared by Scala/JVM, Scala.js and Scala Native. The types in this category are typically known to the compiler back-end, and so may be treated differently on each platform. Note that both Scala.js and Scala Native do try to mimic the semantics of Scala/JVM to a reasonable degree.
Those types might not be mentioned in the language specification per se, at least not all of them.
Here are those where the back-ends agree (re Scala Native, to the best of my knowledge):
All primitive types: Boolean, Char, Byte, Short, Int, Long, Float, Double, Unit.
scala.Array.
Cloneable (currently not supported in Scala Native, see #334)
String and StringBuilder (mostly for string concatenation)
Object, for virtually all its methods
And here are those where they disagree:
Boxed versions of primitive types (such as java.lang.Integer)
Serializable
java.rmi.Remote and java.rmi.RemoteException
Some of the annotations in scala.annotation.* (e.g., strictfp)
Some stuff in java.lang.reflect.*, used by Scala/JVM to implement structural types
Also, although they are not types per se, a long list of primitive methods is handled specifically by the back-ends.
Platform-specific types
In addition to the types mentioned above, which are available on all platforms, non-JVM platforms add their own special types for interoperability purposes.
Scala.js-specific types
See JSDefinitions.scala
js.Any: conceptually a third subtype of Any, besides AnyVal and AnyRef. They have JavaScript semantics instead of Scala semantics.
String and the boxed versions of all primitive types (heavily rewritten--so-called "hijacked"--by the compiler)
js.ThisFunctionN: their apply methods behave differently than that of other JavaScript types (the first actual argument becomes the thisArgument of the called function)
js.UndefOr and js.| (they behave as JS types even though they do not extend js.Any)
js.Object (new js.Object() is special-cased as an empty JS object literal {})
js.JavaScriptException (behaves very specially in throw and catch)
js.WrappedArray (used by the desugaring of varargs)
js.ConstructorTag (similar to ClassTag)
The annotation js.native, and all annotations in js.annotation.*
Plus, a dozen more primitive methods.
Scala Native-specific types
See NirDefinitions.scala
Unsigned integers: UByte, UShort, UInt and ULong
Ptr, the pointer type
FunctionPtrN, the function pointer types
The annotations in native.*
A number of extra primitive methods in scala.scalanative.runtime

Scala type inference can't mix values and references?

I'm using some Java API from Scala, which looks like :
public Boolean isBoolValue();
public Boolean getBoolValue();
And the type inference engine seems to have troubles with it, like so :
val fetchedValue = if (isBoolValue()) getBoolValue() else false
The inferred type is Any. How come? Or am I holding it wrong?
I know that the compiler sometimes chooses unexpected types when the then and else branches have different types, but here we have booleans everywhere.
Of course my example works if I change getBoolValue to return boolean (which I obviously can't, being an API) or if I add type annotation to fetchedValue.
Is there also a nicer way than specifying the expected type for the if expression?
They say Scala doesn't have two booleans, two ints etc like Java but my example seems to prove that somehow there are two flavors of these primitives?
This works:
val fetchedValue = if (isBoolValue()) getBoolValue() else false:java.lang.Boolean
but your problem is that false (in Scala) isn't of type java.lang.Boolean, so Any is correct.
They say Scala doesn't have two booleans, two ints etc like Java but my example seems to prove that somehow there are two flavors of these primitives?
No, it's the other way about. Scala has one, but Java has two, and the single Scala type can't be the same type as both of the Java ones...
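A sketch of a few ways to get a scala.Boolean out of the expression. The two methods below are hypothetical stand-ins for the Java API from the question, written in Scala only so that the example is self-contained:

```scala
object BooleanInferenceDemo {
  // Hypothetical stand-ins for the Java API (both return java.lang.Boolean).
  def isBoolValue(): java.lang.Boolean = java.lang.Boolean.TRUE
  def getBoolValue(): java.lang.Boolean = java.lang.Boolean.FALSE

  // 1. Annotate the expected type: the implicit unboxing conversion
  //    from Predef is applied to the java.lang.Boolean branch.
  val annotated: Boolean = if (isBoolValue()) getBoolValue() else false

  // 2. Unbox explicitly with the Java method.
  val viaMethod = if (isBoolValue()) getBoolValue().booleanValue() else false

  // 3. Unbox explicitly with the Scala companion helper.
  val viaUnbox = if (isBoolValue()) Boolean.unbox(getBoolValue()) else false
}
```

In the second and third variants both branches already have type scala.Boolean, so no annotation is needed.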

Is it reasonable, and is there a benefit, to have a Scala Symbol class that extends AnyVal?

It seems that one issue with scala.Symbol is that it is two objects: the Symbol and the String it is based on.
Why can this extra object not be eliminated by defining Sym something like:
class Sym private (val name: String) extends AnyVal {
  override def toString = "'" + name
}
object Sym {
  def apply(name: String) = new Sym(name.intern)
}
Admittedly the performance implications of object allocation are likely tiny, but comments from those with a deeper understanding of Scala would be illuminating. In particular, does the above provide efficient maps via equality by reference?
Another advantage of the simple 'Sym' above is in a map centric application where there are lots of string keys, but where the strings are naming many entirely different kinds of things, type safe Sym classes can be defined so that Maps will definitively show to the programmer, the compiler and refactoring tools what the key really is.
(Neither Symbol nor Sym can be extended, the former apparently by choice, and the latter because it extends AnyVal, but Sym is trivial enough to just duplicate with an appropriate name.)
It is not possible to do Symbol as an AnyVal. The main benefit of Symbols over simple Strings is that Symbols are guaranteed to be interned, so you can test equality of symbols using a simple reference comparison instead of an expensive string comparison.
See the source code of Symbol. Equals is overridden and redefined to do a reference comparison using the eq method.
But unfortunately an AnyVal does not allow you to redefine equality. From the SIP-15 for user-defined value classes:
C may not define concrete equals or hashCode methods.
So while it would be extremely useful to have a way to redefine equality without incurring runtime overhead, it is unfortunately not possible.
Edit: never use string.intern in any program where performance is important. The performance of string.intern is horrible compared to even a trivial intern table. See this SO question and answer. See the source code of Symbol above for a simple intern table.
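For illustration only, a trivial intern table could look like the sketch below. This is not the implementation used by scala.Symbol (which, among other things, lets unused entries be garbage collected); the class name InternedSym and its table are assumptions made up for this example:

```scala
import scala.collection.mutable

// A reference class, so equals/hashCode keep the AnyRef defaults,
// which are reference-based.
final class InternedSym private (val name: String) {
  override def toString: String = "'" + name
}

object InternedSym {
  // Simplification: entries are never evicted from this table.
  private val table = mutable.HashMap.empty[String, InternedSym]

  // Returns the one shared instance per name.
  def apply(name: String): InternedSym = synchronized {
    table.getOrElseUpdate(name, new InternedSym(name))
  }
}
```

Since every name maps to exactly one instance, comparing two such symbols is a single reference comparison.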
Unfortunately, object allocation for an AnyVal is forced whenever it is put into a collection, like the Map in your example. This is because the value class has to be cast to the type parameter of the collection, and casting to a new type always forces allocation. This eliminates almost any advantage of declaring Sym as a value class. See Allocation Details in the Scala documentation page for value classes.
For AnyVal the class is actually the String. The magically added methods and type-safety are just compiler tricks. It's the String that gets transferred all around.
For pattern matching (Symbol's purpose as I suppose) Scala needs the class of an object. Thus — Symbol extends AnyRef.

What is the purpose of AnyVal?

I can't think of any situation where the type AnyVal would be useful, especially with the addition of the Numeric type for abstracting over Int, Long, etc. Are there any actual use cases for AnyVal, or is it just an artifact that makes the type hierarchy a bit prettier?
Just to clarify, I know what AnyVal is, I just can't think of any time that I would actually need it in Scala. When would I ever need a type that encompassed Int, Char and Double? It seems like it's just there to make the type hierarchy prettier (i.e. it looks nicer to have AnyVal and AnyRef as siblings rather than having Int, Char, etc. inherit directly from Any).
As om-nom-nom already said, AnyVal is the common supertype of all primitives in Scala. In Scala 2.10, however, there will be a new feature called value classes. Value classes are classes that can be inlined. With them you can, for example, reduce the overhead of the "extend my library" pattern: no instances of the wrapper classes that hold the extra methods are created, and the methods are called statically instead. You can read everything about value classes in SIP-15.
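A minimal sketch of both uses, with invented names (IntOps, Meters). The first is an implicit wrapper providing an extension method, the second a small unit-of-measure type:

```scala
object ValueClassDemo {
  // Extension method via an implicit value class: in the common case
  // no IntOps wrapper object needs to be allocated for the call.
  implicit class IntOps(val self: Int) extends AnyVal {
    def squared: Int = self * self
  }

  // A unit-of-measure wrapper around a Double.
  final class Meters(val value: Double) extends AnyVal {
    def +(other: Meters): Meters = new Meters(value + other.value)
  }

  val nine: Int = 3.squared
  val total: Meters = new Meters(1.5) + new Meters(2.5)
}
```

Meters and Double cannot be mixed up by accident, since the compiler treats them as distinct types.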
Let's go to the videotape, er, the spec 12.2:
Value classes are classes whose instances are not represented as
objects by the underlying host system. All value classes inherit from
class AnyVal.
So, maybe the question is, if everything is an object, why do I care if something is not represented i.e. implemented as an object? That's the implementation in implementation detail.
But let's not pretend, of course you care. Do you never specialize?
The spec goes on:
Scala implementations need to provide the value classes Unit, Boolean,
Double, Float, Long, Int, Char, Short, and Byte (but are free to
provide others as well).
Therefore a test for AnyVal is meaningful, over and above an enumeration of the required value classes.
That said, you must accept @drexin's answer because if you're not using value classes for extension methods, then you're not really living. (In the sense of, living it up.)
Motivation from the SIP:
...classes in Scala that can get completely inlined, so operations on
these classes have zero overhead compared to external methods. Some
use cases for inlined classes are:
Inlined implicit wrappers. Methods on those wrappers would be
translated to extension methods.
New numeric classes, such as unsigned ints. There would no longer
need to be a boxing overhead for such classes. So this is similar to
value classes in .NET.
Classes representing units of measure. Again, no boxing overhead
would be incurred for these classes.
You can mark the extension method itself as @inline and everything is inlined: no object wrapper and your little method is inlined.
I use this feature every day. Yesterday I hit a bug in it. The bug is already fixed. What that says is that it's such a cool feature, the Scala folks will take time out from Coursera to quash a little bug in it.
That reminds me, I forgot to ask, this isn't a Coursera quiz question, is it?