What is the purpose of AnyVal?

I can't think of any situation where the type AnyVal would be useful, especially with the addition of the Numeric type for abstracting over Int, Long, etc. Are there any actual use cases for AnyVal, or is it just an artifact that makes the type hierarchy a bit prettier?
Just to clarify, I know what AnyVal is, I just can't think of any time that I would actually need it in Scala. When would I ever need a type that encompassed Int, Character and Double? It seems like it's just there to make the type hierarchy prettier (i.e. it looks nicer to have AnyVal and AnyRef as siblings rather than having Int, Character, etc. inherit directly from Any).

As om-nom-nom already said, AnyVal is the common supertype of all primitives in Scala. In Scala 2.10, however, there will be a new feature called value classes. Value classes are classes that can be inlined; with them you can, for example, reduce the overhead of the "extend my library" pattern, because there will be no instances of the wrapper classes that provide these methods; instead, the methods will be called statically. You can read everything about value classes in SIP-15.
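For example, a minimal sketch of such an implicit value class (the IntOps name and the squared method are just illustrations):

object Enrichments {
  // An implicit value class: calls to squared compile to static method
  // calls, so no IntOps wrapper object is allocated at the call site.
  implicit class IntOps(val self: Int) extends AnyVal {
    def squared: Int = self * self
  }
}

import Enrichments._
3.squared // extension method call, no wrapper instance created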

Let's go to the videotape, er, the spec 12.2:
Value classes are classes whose instances are not represented as
objects by the underlying host system. All value classes inherit from
class AnyVal.
So, maybe the question is: if everything is an object, why do I care whether something is not represented, i.e. implemented, as an object? That's the implementation in implementation detail.
But let's not pretend: of course you care. Do you never specialize?
The spec goes on:
Scala implementations need to provide the value classes Unit, Boolean,
Double, Float, Long, Int, Char, Short, and Byte (but are free to
provide others as well).
Therefore a test for AnyVal is meaningful, over and above an enumeration of the required value classes.
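For instance, AnyVal works as a parameter type that accepts exactly the value types (describe is a made-up example):

def describe(v: AnyVal): String = v.toString

describe(42)   // Int is an AnyVal
describe('c')  // so is Char
describe(3.14) // and Double
// describe("hi") would not compile: String is an AnyRef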
That said, you must accept @drexin's answer, because if you're not using value classes for extension methods, then you're not really living. (In the sense of, living it up.)
Motivation from the SIP:
...classes in Scala that can get completely inlined, so operations on
these classes have zero overhead compared to external methods. Some
use cases for inlined classes are:
Inlined implicit wrappers. Methods on those wrappers would be
translated to extension methods.
New numeric classes, such as unsigned ints. There would no longer
need to be a boxing overhead for such classes. So this is similar to
value classes in .NET.
Classes representing units of measure. Again, no boxing overhead
would be incurred for these classes.
You can mark the extension method itself as @inline and then everything is inlined: no object wrapper, and your little method is inlined as well.
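To make the units-of-measure use case from the SIP concrete, here is a hedged sketch (Meters is hypothetical):

// A value class for a unit of measure: arithmetic stays unboxed in most
// contexts, and mixing up units becomes a compile-time error.
class Meters(val value: Double) extends AnyVal {
  def +(that: Meters): Meters = new Meters(this.value + that.value)
}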
I use this feature every day. Yesterday I hit a bug in it. The bug is already fixed. What that says is that it's such a cool feature, the Scala folks will take time out from Coursera to quash a little bug in it.
That reminds me, I forgot to ask, this isn't a Coursera quiz question, is it?


What types are special to the Scala compiler?

Scala makes a big deal about how what seem to be language features are implemented as library features.
Is there a list of types that are treated specially by the language?
Either in the specification or as an implementation detail?
That would include, for example, optimizing away matches on tuples.
What about special conventions related to pattern matching, for comprehensions, try-catch blocks and other language constructs?
Is String somehow special to the compiler? I see that String enhancement is just a library implicit conversion, and that String concatenation is supported by Predef, but is that somehow special-cased by the language?
Similarly, I see questions about <:< and classOf and asInstanceOf, and it's not clear what is a magical intrinsic. Is there a way to tell the difference, either with a compiler option or by looking at byte code?
I would like to understand if a feature is supported uniformly by implementations such as Scala.JS and Scala-native, or if a feature might actually prove to be implementation-dependent, depending on the library implementation.
There are an incredible number of types that are "known" to the compiler, and they are special to varying degrees. You can find a complete list in scalac's Definitions.scala.
We can probably classify them according to the degree of specialness they bear.
Disclaimer: I have probably forgotten a few more.
Special for the type system
The following types are crucial to Scala's type system. They have an influence on how type checking itself is performed. All these types are mentioned in the specification (or at least, they definitely should be).
Any, AnyRef, AnyVal, Null, Nothing: the five types that sit at the top and bottom of Scala's type system.
scala.FunctionN, the (canonical) type given to anonymous functions (including eta-expansion). Even in 2.12, which has the SAM treatment of anonymous functions, FunctionN remains special in some cases (notably in overloading resolution).
scala.PartialFunction (has an impact on how type inference works)
Unit
All types with literal notation: Int, Long, Float, Double, Char, Boolean, String, Symbol, java.lang.Class
All numeric primitive types and Char, for weak conformance (together, these two bullets cover all primitive types; see the snippet after this list)
Option and tuples (for pattern matching and auto-tupling)
java.lang.Throwable
scala.Dynamic
scala.Singleton
Most of scala.reflect.*, in particular ClassTag, TypeTag, etc.
scala.annotation.{,Classfile,Static}Annotation
Almost all of the annotations in scala.annotation.* (e.g., unchecked)
scala.language.*
scala.math.ScalaNumber (for obscure reasons)
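Two of these rules illustrated in a small sketch:

// Weak conformance: the Int and Double branches unify to Double.
def pick(b: Boolean) = if (b) 1 else 2.0 // result type is Double

// Anonymous functions are given FunctionN types.
val inc = (i: Int) => i + 1 // inc: Int => Int, i.e. Function1[Int, Int]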
Known to the compiler as the desugaring of some language features
The following types are not crucial to the type system. They do not have an influence on type checking. However, the Scala language does feature a number of constructs which desugar into expressions of those types.
These types would also be mentioned in the specification.
scala.collection.Seq, Nil and WrappedArray, which are used for varargs parameters (illustrated after this list).
TupleN types
Product and Serializable (for case classes)
MatchError, generated by pattern matching constructs
scala.xml.*
scala.DelayedInit
List (the compiler performs some trivial optimizations on those, such as rewriting List() as Nil)
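A sketch of a couple of these desugarings:

// Varargs parameters are packed into a Seq at the call site.
def sum(xs: Int*): Int = xs.sum // xs is a scala.collection.Seq[Int]
sum(1, 2, 3)

// Case classes automatically extend Product and Serializable.
case class Point(x: Int, y: Int)
val p: Product with Serializable = Point(1, 2)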
Known to the implementation of the language
This is probably the list that you care most about, given that you said you were interested in knowing what could go differently on different back-ends. The previous categories are handled by early (front-end) phases of the compiler, and are therefore shared by Scala/JVM, Scala.js and Scala Native. This category is typically known to the compiler back-end, and so can potentially receive different treatments. Note that both Scala.js and Scala Native do try to mimic the semantics of Scala/JVM to a reasonable degree.
Those types might not be mentioned in the language specification per se, at least not all of them.
Here are those where the back-ends agree (re Scala Native, to the best of my knowledge):
All primitive types: Boolean, Char, Byte, Short, Int, Long, Float, Double, Unit.
scala.Array.
Cloneable (currently not supported in Scala Native, see #334)
String and StringBuilder (mostly for string concatenation)
Object, for virtually all its methods
And here are those where they disagree:
Boxed versions of primitive types (such as java.lang.Integer)
Serializable
java.rmi.Remote and java.rmi.RemoteException
Some of the annotations in scala.annotation.* (e.g., strictfp)
Some stuff in java.lang.reflect.*, used by Scala/JVM to implement structural types
Also, although they are not types per se, a long list of primitive methods is handled specifically by the back-ends.
Platform-specific types
In addition to the types mentioned above, which are available on all platforms, non-JVM platforms add their own special types for interoperability purposes.
Scala.js-specific types
See JSDefinitions.scala
js.Any: conceptually a third subtype of Any, besides AnyVal and AnyRef. Subtypes of js.Any have JavaScript semantics instead of Scala semantics.
String and the boxed versions of all primitive types (heavily rewritten, so-called "hijacked", by the compiler)
js.ThisFunctionN: their apply methods behave differently from those of other JavaScript function types (the first actual argument becomes the this argument of the called function)
js.UndefOr and js.| (they behave as JS types even though they do not extend js.Any)
js.Object (new js.Object() is special-cased as an empty JS object literal {})
js.JavaScriptException (behaves very specially in throw and catch)
js.WrappedArray (used by the desugaring of varargs)
js.ConstructorTag (similar to ClassTag)
The annotation js.native, and all annotations in js.annotation.*
Plus, a dozen more primitive methods.
Scala Native-specific types
See NirDefinitions.scala
Unsigned integers: UByte, UShort, UInt and ULong
Ptr, the pointer type
FunctionPtrN, the function pointer types
The annotations in native.*
A number of extra primitive methods in scala.scalanative.runtime

Memory overhead of case classes in Scala

What is the memory overhead of a case class in Scala?
I've implemented some code to hold a lexicon with multiple types of interned tokens for NLP processing. I've got a case class for each token type.
For example, the canonical lemma/stem token is as follows:
sealed trait InternedLexAtom extends LexAtom {
  def id: Int
}

case class Lemma(id: Int) extends InternedLexAtom
I'm going to be returning document vectors of these interned tokens. The reason I wrap them in case classes is to be able to add methods to the tokens via implicit classes, and the reason I add behaviour to the lexemes this way is that I want the lexemes to have different methods in different contexts.
So I'm hoping the answer will be zero memory overhead due to type erasure. Is this the case ?
I have a suspicion that a single pointer might be packed with the parameters for some of the magic Scala can do :(
justification
To put things in perspective: the JVM uses 1.5–2 GB of memory with my lexicon loaded (the lexicon does not use case classes in its in-memory representation), while C++ does the same in 500–700 MB. If my codebase keeps scaling its memory requirements the way it does now, I'm not going to be able to do this stuff in memory on my laptop.
I'll sidestep the problem by structuring my code differently. For example, I can just strip away the case classes in the vector representations if I need to. It would be nice if I didn't have to.
Question Extension.
Robin and Pedro have addressed the use case, thank you. In this case I was missing value classes; with those there are no more downsides. Additionally: I tried my best not to mention C++'s POD concept, but now I must ask :D A C++ POD is just a struct with primitive values. If I wanted to pack more than one value into a value class, how would I achieve this? I am assuming the following is what I want to do?
class SuperTriple(val underlying: (Int, Int)) extends AnyVal {
  def sup: Int = underlying._1    // renamed: `super` is a reserved word
  def triple: Int = underlying._2
}
I do actually need the above construct, since a SuperTriple is what I am using as my vector model symbol :D
The original question still remains "what is the overhead of a case class".
In Scala 2.10 you can use value classes. (In older versions of Scala, to get zero overhead for a wrapper with just one member, you need to use unboxed tagged types.)
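A sketch of the value-class route for the Lemma example, assuming the trait hierarchy is reworked into universal traits (traits that extend Any and contain only defs):

// A universal trait can be mixed into a value class.
trait InternedLexAtom extends Any { def id: Int }

// No Lemma object is allocated while it is used as a plain Lemma; using
// it as an InternedLexAtom, or storing it in a generic collection,
// still boxes it.
case class Lemma(id: Int) extends AnyVal with InternedLexAtom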

Is it reasonable, and is there a benefit, to have a Scala Symbol class that extends AnyVal?

It seems that one issue with scala.Symbol is that it involves two objects: the Symbol and the String it is based on.
Why can this extra object not be eliminated by defining Sym something like:
class Sym private (val name: String) extends AnyVal {
  override def toString = "'" + name
}

object Sym {
  def apply(name: String) = new Sym(name.intern)
}
Admittedly the performance implications of object allocation are likely tiny, but comments from those with a deeper understanding of Scala would be illuminating. In particular, does the above provide efficient maps via equality by reference?
Another advantage of the simple Sym above shows up in a map-centric application with lots of string keys, where the strings name many entirely different kinds of things: type-safe Sym classes can be defined so that Maps show definitively, to the programmer, the compiler and refactoring tools, what the key really is.
(Neither Symbol nor Sym can be extended, the former apparently by choice, and the latter because it extends AnyVal, but Sym is trivial enough to just duplicate with an appropriate name.)
It is not possible to implement Symbol as an AnyVal. The main benefit of Symbols over plain Strings is that Symbols are guaranteed to be interned, so you can test equality of symbols using a simple reference comparison instead of an expensive string comparison.
See the source code of Symbol. Equals is overridden and redefined to do a reference comparison using the eq method.
But unfortunately an AnyVal does not allow you to redefine equality. From the SIP-15 for user-defined value classes:
C may not define concrete equals or hashCode methods.
So while it would be extremely useful to have a way to redefine equality without incurring runtime overhead, it is unfortunately not possible.
Edit: never use String.intern in any program where performance is important. The performance of String.intern is horrible compared to even a trivial intern table. See this SO question and answer, and see the source code of Symbol above for a simple intern table.
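For illustration, a minimal sketch of such an intern table (the Sym2 name is made up; this is not the actual Symbol implementation):

import scala.collection.mutable

final class Sym2 private (val name: String) {
  // equals and hashCode are inherited from AnyRef: reference equality.
  override def toString = "'" + name
}

object Sym2 {
  private val table = mutable.HashMap.empty[String, Sym2]
  // One instance per distinct name, so == degenerates to a cheap eq.
  def apply(name: String): Sym2 =
    synchronized { table.getOrElseUpdate(name, new Sym2(name)) }
}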
Unfortunately, object allocation for an AnyVal is forced whenever it is put into a collection, like the Map in your example. This is because the value class has to be cast to the type parameter of the collection, and casting to a new type always forces allocation. This eliminates almost any advantage of declaring Sym as a value class. See Allocation Details in the Scala documentation page for value classes.
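A small sketch of that allocation behavior (Meters is a made-up value class):

class Meters(val value: Double) extends AnyVal

val m = new Meters(1.0) // no allocation: m is just a double at runtime
val xs = List(m)        // allocates: the element is boxed as a Meters
val a: Any = m          // allocates: upcasting to Any also boxes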
For an AnyVal, the class is actually the String. The magically added methods and the type safety are just compiler tricks; it's the String that gets transferred all around.
For pattern matching (Symbol's purpose, I suppose) Scala needs the class of an object. Thus Symbol extends AnyRef.

Why is String different from Int, Boolean, Byte... in Scala?

Because I know a little Java, I was trying to use Java types like java.lang.Integer, java.lang.Character, java.lang.Boolean and so on in all my Scala code. Now people told me: "No! For everything in Scala there is its own type; the Java stuff will work, but you should always prefer the Scala types and objects."
Ok, now I see that everything in Java also exists in Scala. I'm not sure why it is better to use, for example, Scala's Boolean instead of Java's Boolean, but fine. If I look at the types I see scala.Boolean, scala.Int, scala.Byte... and then I look at String, but it's not scala.String (well, it's not even java.lang.String, confusing), it's just String. But I thought I should use everything that comes directly from scala. Maybe I do not understand Scala correctly; could somebody explain it, please?
First, the statement 'well, it's not even java.lang.String' is not quite correct. The plain String name comes from a type alias defined in the Predef object:
type String = java.lang.String
and the contents of Predef are imported into every Scala source file, hence you use String instead of the full java.lang.String, but in fact they are the same.
java.lang.String is a very special class, treated by the JVM in a special way. As @pagoda_5b said, it is declared final, so it is not possible to extend it (and this is good, in fact), so the Scala library provides a wrapper (RichString) with additional operations and an implicit conversion String -> RichString available by default.
However, the situation is slightly different with Integer, Character, Boolean, etc. You see, even though String is treated specially by the JVM, it is still a plain class whose instances are plain objects. Semantically it is no different from, say, the List class.
The situation with primitive types is different. Java's int, char and boolean types are not classes, and values of these types are not objects. But Scala is a fully object-oriented language; there are no primitive types. It would be possible to use java.lang.{Integer,Boolean,...} everywhere you need the corresponding types, but this would be awfully inefficient because of boxing.
Because of this, Scala needed a way to present Java's primitive types in an object-oriented setting, so the scala.{Int,Boolean,...} classes were introduced. These types are treated specially by the Scala compiler: scalac generates code that works on primitives when it encounters one of these classes. They also extend the AnyVal class, which prevents you from using null as a value for these types. This approach solves the efficiency problem, leaves the java.lang.{Integer,Boolean,...} classes available where you really need boxing, and also provides an elegant way to use the primitives of another host system (e.g. the .NET runtime).
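A sketch of how that plays out in practice:

val i: Int = 42                  // compiles to the JVM primitive int
val boxed: java.lang.Integer = i // boxed via Predef's implicit conversion
val xs: List[Int] = List(1, 2)   // elements are boxed because of erasure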
I'm just guessing here
If you look at the docs, you can see that the Scala versions of the primitives give you all the expected operators that work on numeric or boolean types, plus sensible conversions, without resorting to the boxing/unboxing of the java.lang wrappers.
I think this choice was made to give uniform and natural access to what was expected of primitive types, while at the same time making them objects like any other Scala type.
I suppose that java.lang.String required a different approach, being an object already, and final in its implementation. So the "path of least pain" was to create an implicit Rich wrapper around it to provide the missing operations on String, while leaving the rest untouched.
To see it another way, java.lang.String was already good enough as-is, being immutable and all.
It's worth mentioning that the other "primitive" types in Scala have their own Rich wrappers that provide additional sensible operations.
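For example, a few operations that come from those wrappers rather than from the primitives themselves:

1 to 10            // `to` comes from RichInt
(-3).abs           // so does `abs`
1.5.round          // `round` comes from RichDouble
"scala".capitalize // a String enhancement via the implicit wrapper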

Practical uses for Structural Types?

Structural types are one of those "wow, cool!" features of Scala. However, for every example I can think of where they might help, implicit conversions and dynamic mixin composition often seem like better matches. What are some common uses for them, and/or advice on when they are appropriate?
Aside from the rare case of classes which provide the same method but are not related and do not implement a common interface (for example, the close() method: Source, for one, does not extend Closeable), I find no use for structural types with their present restrictions. If they were more flexible, however, I could well write something like this:
def add[T: { def +(x: T): T }](a: T, b: T) = a + b
which would neatly handle numeric types. Every time I think structural types might help me with something, I hit that particular wall.
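Within the present restrictions the close() case does work, though; a sketch (the using helper and data.txt are made up):

// Any resource with a parameterless close() qualifies, no common
// interface required (needs import scala.language.reflectiveCalls in 2.10+).
def using[A <: { def close(): Unit }, B](resource: A)(body: A => B): B =
  try body(resource) finally resource.close()

using(scala.io.Source.fromFile("data.txt")) { src =>
  src.getLines().foreach(println)
}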
EDIT
However little use I find for structural types myself, the compiler does use them to handle anonymous classes. For example:
implicit def toTimes(count: Int) = new {
  def times(block: => Unit) = 1 to count foreach { _ => block }
}
5 times { println("This uses structural types!") }
The object resulting from (the implicit) toTimes(5) is of type { def times(block: => Unit) }, i.e., a structural type.
I don't know whether Scala does that for every anonymous class; perhaps it does. Alas, that is one reason why doing "pimp my library" that way is slow, as structural types use reflection to invoke the methods. Instead of an anonymous class, one should use a real class to avoid the performance issues of "pimp my library".
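A sketch of the named-class variant, which avoids the reflective calls:

// A real class instead of an anonymous one: calls to times are ordinary
// method calls, not reflective structural-type calls.
class Times(count: Int) {
  def times(block: => Unit): Unit = 1 to count foreach { _ => block }
}
implicit def toTimes(count: Int): Times = new Times(count)

5 times { println("No structural types involved!") }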
Structural types are very cool constructs in Scala. I've used them to represent multiple unrelated types that share an attribute upon which I want to perform a common operation without a new level of abstraction.
I have heard one argument against structural types from people who are strict about an application's architecture. They feel it is dangerous to apply a common operation across types without an associative trait or parent type, because you then leave the rule of what type the method should apply to open-ended. Daniel's close() example is spot on, but what if you have another type that requires different behavior? Someone who doesn't understand the architecture might use it and cause problems in the system.
I think structural types are one of these features that you don't need that often, but when you need it, it helps you a lot. One area where structural types really shine is "retrofitting", e.g. when you need to glue together several pieces of software you have no source code for and which were not intended for reuse. But if you find yourself using structural types a lot, you're probably doing it wrong.
[Edit]
Of course implicits are often the way to go, but there are cases when you can't use them: imagine you have a mutable object you can modify with methods, but which hides important parts of its state, a kind of "black box". Then you have to work with this object somehow.
Another use case for structural types is code that relies on naming conventions without a common interface, e.g. machine-generated code. In the JDK we can find such things as well, like the StringBuffer / StringBuilder pair (where the common interfaces Appendable and CharSequence are way too general).
Structural types give some benefits of dynamic languages to a statically linked language, specifically loose coupling. If you want a method foo() to call instance methods of class Bar, you don't need an interface or base class common to both foo() and Bar. You can define a structural type that foo() accepts and of whose existence Bar has no clue. As long as Bar contains methods that match the structural type's signatures, foo() will be able to call them.
It's great because you can put foo() and Bar in distinct, completely unrelated libraries, that is, with no commonly referenced contract. This reduces linkage requirements and thus further contributes to loose coupling.
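A sketch of that decoupling (foo and Bar here are illustrative):

// Bar knows nothing about foo or any shared interface.
class Bar {
  def greet(name: String): String = "hello, " + name
}

// foo only demands "something with a matching greet method".
def foo(greeter: { def greet(name: String): String }): Unit =
  println(greeter.greet("world"))

foo(new Bar) // Bar conforms structurally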
In some situations, a structural type can be used as an alternative to the Adapter pattern, because it offers the following advantages:
Object identity is preserved (there is no separate object for the adapter instance, at least at the semantic level).
You don't need to instantiate an adapter - just pass a Bar instance to foo().
You don't need to implement wrapper methods - just declare the required signatures in the structural type.
The structural type doesn't need to know the actual instance class or interface, while the adapter must know Bar so it can call its methods. This way, a single structural type can be used for many actual types, whereas with adapter it's necessary to code multiple classes - one for each actual type.
The only drawback of structural types compared to adapters is that a structural type can't be used to translate method signatures. So, when signatures don't match, you must use adapters that contain some translation logic. I particularly don't like to code "intelligent" adapters because many times they are more than just adapters and cause increased complexity. If a class client needs some additional method, I prefer simply to add such a method, since it usually doesn't affect the footprint.