Scala: is it possible to force immutability on an object?

I mean, is there some declarative way to prevent an object from changing any of its members?
In the following example
class student(var name:String)
val s = new student("John")
"s" has been declared as a val, so it will always point to the same student.
But is there some way to prevent s.name from being changed, just by declaring s as immutable?
Or is the only solution to declare everything as val and enforce immutability manually?

No, it's not possible to declare something immutable. You have to enforce immutability yourself by not allowing anyone to change it, that is, by removing every way of modifying the class.
Someone can still modify it using reflection, but that's another story.
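A minimal sketch of what "enforcing it yourself" could look like, assuming the class from the question may be rewritten. The names Student (capitalized) and ImmutableStudentDemo are illustrative only and do not come from the question.

```scala
// Hypothetical rewrite of the question's class: the only field is a val,
// so there is no setter anyone could call.
final case class Student(name: String)

object ImmutableStudentDemo {
  val s: Student = Student("John")

  // s.name = "Jane"   // would not compile: reassignment to val

  // "Changing" a field means building a new instance; s itself is untouched.
  val renamed: Student = s.copy(name = "Jane")

  def main(args: Array[String]): Unit = {
    println(s)       // Student(John)
    println(renamed) // Student(Jane)
  }
}
```

This only covers the fields declared on Student itself; anything a field refers to still has to be immutable in its own right.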

Scala doesn't enforce that, so there is no way to know. There is, however, an interesting compiler-plugin project named pusca (I guess it stands for Pure-Scala). Pure is defined there as not mutating a non-local variable and being side-effect free (e.g. not printing to the console)—so that calling a pure method repeatedly will always yield the same result (what is called referentially transparent).
I haven't tried out that plug-in myself, so I can't say whether it is stable or usable yet.
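To illustrate the definition of purity quoted above, here is a small sketch in plain Scala. It does not use the pusca plug-in, and the names PurityDemo, add and next are made up for the example.

```scala
object PurityDemo {
  // Pure: the result depends only on the arguments.
  def add(a: Int, b: Int): Int = a + b

  // Impure: reads and mutates a non-local variable,
  // so two consecutive calls return different values.
  private var counter = 0
  def next(): Int = { counter += 1; counter }

  def main(args: Array[String]): Unit = {
    println(add(1, 2) == add(1, 2)) // true
    println(next() == next())       // false
  }
}
```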

There is no way that Scala could do this generally.
Consider the following hypothetical example:
class Student(var name : String, var course : Course)
def stuff(course : Course) {
magically_pure_val s = new Student("Fredzilla", course)
someFunctionOfStudent(s)
genericHigherOrderFunction(s, someFunctionOfStudent)
course.someMethod()
}
The pitfalls for any attempt to actually implement that magically_pure_val keyword are:
someFunctionOfStudent takes an arbitrary student, and isn't implemented in this compilation unit. It was written/compiled knowing that Student consists of two mutable fields. How do we know it doesn't actually mutate them?
genericHigherOrderFunction is even worse; it's going to take our Student and a function of Student, but it's written polymorphically. Whether or not it actually mutates s depends on what its other arguments are; determining that at compile time with full generality requires solving the Halting Problem.
Let's assume we could get around that (maybe we could set some secret flags that mean exceptions get raised if the s object is actually mutated, though personally I wouldn't find that good enough). What about that course field? Does course.someMethod() mutate it? That method call isn't invoked from s directly.
Worse than that, we only know that we'll have passed in an instance of Course or some subclass of Course. So even if we are able to analyze a particular implementation of Course and Course.someMethod and conclude that this is safe, someone can always add a new subclass of Course whose implementation of someMethod mutates the Course.
There's simply no way for the compiler to check that a given object cannot be mutated. The pusca plugin mentioned by 0__ appears to detect purity the same way Mercury does; by ensuring that every method is known from its signature to be either pure or impure, and by raising a compiler error if the implementation of anything declared to be pure does anything that could cause impurity (unless the programmer promises that the method is pure anyway).[1]
This is quite different from simply declaring a value to be completely (and deeply) immutable and expecting the compiler to notice if any of the code that could touch it could mutate it. It's also not a perfect inference, just a conservative one.
[1]The pusca README claims that it can infer impurity of methods whose last expression is a call to an impure method. I'm not quite sure how it can do this, as checking if that last expression is an impure call requires checking if it's calling a not-declared-impure method that should be declared impure by this rule, and the implementation might not be available to the compiler at that point (and indeed could be changed later even if it is). But all I've done is look at the README and think about it for a few minutes, so I might be missing something.
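The point about the course field can be made concrete with a short sketch. It assumes a hypothetical mutable Course class with a rename method, and a Student whose own fields are all vals; neither definition comes from the answer above.

```scala
// Hypothetical types: Course is mutable, Student has only val fields.
class Course(var title: String) {
  def rename(newTitle: String): Unit = { title = newTitle }
}

final case class Student(name: String, course: Course)

object ShallowImmutabilityDemo {
  def main(args: Array[String]): Unit = {
    val course = new Course("Algebra")
    val s = Student("Fredzilla", course)

    println(s.course.title) // Algebra
    course.rename("Logic")  // the mutation never goes through s
    println(s.course.title) // Logic
  }
}
```

Declaring every field of Student as val gives shallow immutability only; what s observes still changes because Course is mutable.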

Related

Scala: when to use explicit type annotations

I've been reading a lot of other people's Scala code recently, and one of the things that I have difficulty with (coming from Java) is the lack of explicit type annotations.
It's certainly convenient when writing code to be able to leave out type annotations -- however when reading code I often find that explicit type annotations help me to understand at a glance what code is doing more easily.
The Scala style guide (http://docs.scala-lang.org/style/types.html) doesn't seem to provide any definitive guidance on this, stating:
Use type inference where possible, but put clarity first, and favour explicitness in public APIs.
To my mind, this is a bit contradictory. While it's obvious what type this variable is:
val tokens = new HashMap[String, Int]
It's not so obvious what type this one is:
val tokens = readTokens()
So, if I was putting clarity first I would probably annotate all variables where the type is not already declared on the same line.
Do any Scala practitioners have guidance on this? Am I crazy to be considering adding type annotations to my local variables? I'm particularly interested in hearing from folks who spend a lot of time reading scala code (for example, in code reviews), as well as writing it.
It's not so obvious what type this one is:
val tokens = readTokens()
Good names are important: the name is plural, ergo it returns some collection of some kind. The most general collection types in Scala are Traversable and Iterator, and they mostly share a common interface, so it's not really important which one of the two it is. The name also talks about "reading tokens", ergo it obviously should return Tokens in some fashion. And last but not least, the method call has parentheses, which according to the style guide means it has side-effects, so I wouldn't count on being able to traverse the collection more than once.
Ergo, the return type is something like
Traversable[Token]
or
Iterator[Token]
and which of the two it is doesn't really matter because their client interfaces are mostly identical.
Note also that the latter constraint (only traversing the collection once) isn't even captured in the type, even if you were providing an explicit type, you would still have to look at the name and the style!
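As a rough illustration of that last point, here is a sketch with a hypothetical readTokens that returns an Iterator[String]. The real readTokens from the question is not shown anywhere, so both its body and its element type are assumptions.

```scala
object TokenDemo {
  // Hypothetical reader; the explicit return type documents the public API.
  def readTokens(): Iterator[String] = "a b c".split(" ").iterator

  def main(args: Array[String]): Unit = {
    val tokens = readTokens() // inferred as Iterator[String]
    println(tokens.toList)    // List(a, b, c)
    println(tokens.toList)    // List(): the iterator is already exhausted
  }
}
```

Even with the annotation written out at the call site, nothing in Iterator[String] tells the reader that the second traversal yields nothing.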

Parentheses for non-pure functions

I know that I should use () by convention if a method has side effects
def method1(a: String): Unit = {
//.....
}
//or
def method2(): Unit = {
//.....
}
Do I have to do the same thing if a method doesn't have side effects but is not pure, doesn't take any parameters and, of course, returns a different result each time it is called?
def method3() = getRemoteSessionId("login", "password")
Edit: After reviewing Luigi Plinge's comment, I came to think that I should rewrite the answer. This is also not a clear yes/no answer, but some suggestions.
First: The case regarding var is an interesting one. Declaring a var foo gives you a getter foo without parentheses. Obviously it is an impure call, but it does not have a side effect (it does not change anything unobserved by the caller).
Second, regarding your question: I now would not argue that the problem with getRemoteSessionId is that it is impure, but that it actually makes the server maintain some session login for you, so clearly you interfere destructively with the environment. Then method3() should be written with parentheses because of this side-effect nature.
A third example: Getting the contents of a directory should thus be written file.children and not file.children(), because again it is an impure function but should not have side effects (other than perhaps a read-only access to your file system).
A fourth example: Given the above, you should write System.currentTimeMillis. I do tend to write System.currentTimeMillis() however...
Using this fourth case, my tentative answer would be: parentheses are preferable when the function either has a side effect, or is impure and depends on state not under the control of your program.
With this definition, it would not matter whether getRemoteSessionId has known side effects or not. On the other hand, it would mean going back to writing file.children()...
The Scala style guide recommends:
Methods which act as accessors of any sort (either encapsulating a field or a logical property) should be declared without parentheses except if they have side effects.
It doesn't mention any other use case besides accessors. So the question boils down to whether you regard this method as an accessor, which in turn depends on how the rest of the class is set up and perhaps also on the (intended) call sites.
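A small sketch of the accessor rule quoted from the style guide, using a made-up Counter class: the accessor is declared without parentheses, the mutating method with them.

```scala
final class Counter {
  private var count = 0

  // Accessor for a logical property: declared and called without parentheses.
  def value: Int = count

  // Changes state: declared and called with parentheses.
  def increment(): Unit = count += 1
}

object ParensDemo {
  def main(args: Array[String]): Unit = {
    val c = new Counter
    c.increment()
    c.increment()
    println(c.value) // 2
  }
}
```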

Everything's an object in Scala

I am new to Scala and heard a lot that everything is an object in Scala. What I don't get is what's the advantage of "everything's an object"? What are things that I cannot do if everything is not an object? Examples are welcome. Thanks
The advantage of having "everything" be an object is that you have far fewer cases where abstraction breaks.
For example, methods are not objects in Java. So if I have two strings, I can
String s1 = "one";
String s2 = "two";
static String caps(String s) { return s.toUpperCase(); }
caps(s1); // Works
caps(s2); // Also works
So we have abstracted away string identity in our operation of making something upper case. But what if we want to abstract away the identity of the operation--that is, we do something to a String that gives back another String but we want to abstract away what the details are? Now we're stuck, because methods aren't objects in Java.
In Scala, methods can be converted to functions, which are objects. For instance:
def stringop(s: String, f: String => String) = if (s.length > 0) f(s) else s
stringop(s1, _.toUpperCase)
stringop(s2, _.toLowerCase)
Now we have abstracted the idea of performing some string transformation on nonempty strings.
And we can make lists of the operations and such and pass them around, if that's what we need to do.
There are other less essential cases (object vs. class, primitive vs. not, value classes, etc.), but the big one is collapsing the distinction between method and object so that passing around and abstracting over functionality is just as easy as passing around and abstracting over data.
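To make the "lists of operations" remark concrete, here is a sketch that reuses stringop from above. The ops list and the applyAll helper are illustrative names, not part of the original answer.

```scala
object StringOpsDemo {
  // Same shape as the stringop shown above, with an explicit return type.
  def stringop(s: String, f: String => String): String =
    if (s.length > 0) f(s) else s

  // Functions are ordinary values, so they can be stored in a list.
  val ops: List[String => String] = List(_.toUpperCase, _.reverse, _ + "!")

  // Apply every operation in order, threading the result through.
  def applyAll(s: String): String =
    ops.foldLeft(s)((acc, f) => stringop(acc, f))

  def main(args: Array[String]): Unit = {
    println(applyAll("one")) // ENO!
  }
}
```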
The advantage is that you don't have different operators that follow different rules within your language. For example, in Java to perform operations involving objects, you use the dot name technique of calling the code (static objects still use the dot name technique, but sometimes the this object or the static object is inferred) while built-in items (not objects) use a different method, that of built-in operator manipulation.
Number one = Integer.valueOf(1);
Number two = Integer.valueOf(2);
Number three = one.plus(two); // if only such methods existed.
int one = 1;
int two = 2;
int three = one + two;
The main difference is that the dot name technique is subject to polymorphism, method overloading, method hiding, and all the good stuff that you can do with Java objects. The + technique is predefined and completely inflexible.
Scala circumvents the inflexibility of the + method by basically handling it as a dot name operator, and defining a strong one-to-one mapping of such operators to object methods. Hence, in Scala everything is an object means that everything is an object, so the operation
5 + 7
conceptually involves two objects (a 5 object and a 7 object): the + method of the 5 object is called with the parameter 7 (if my Scala memory serves me correctly), and a 12 object is returned as the value of the 5 + 7 operation. On the JVM the compiler still emits a primitive addition for this, so no objects are actually allocated.
This everything is an object has a lot of benefits in a functional programming environment, for example, blocks of code now are object too, making it possible to pass back and forth blocks of code (without names) as parameters, yet still be bound to strict type checking (the block of code only returns Long or a subclass of String or whatever).
When it boils down to it, it makes some kinds of solutions very easy to implement, and often the inefficiencies are mitigated by the lack of need to handle "move into primitives, manipulate, move out of primitives" marshalling code.
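A short sketch of the operator-as-method mapping described above. Vec is a hypothetical class added only to show that user-defined types get the same infix syntax as Int.

```scala
// Hypothetical 2D vector; `+` is just a method name.
final case class Vec(x: Int, y: Int) {
  def +(other: Vec): Vec = Vec(x + other.x, y + other.y)
}

object OperatorDemo {
  val viaOperator: Int = 5 + 7
  val viaMethod: Int = (5).+(7) // the same call written in dot notation

  val v: Vec = Vec(1, 2) + Vec(3, 4)

  def main(args: Array[String]): Unit = {
    println(viaOperator == viaMethod) // true
    println(v)                        // Vec(4,6)
  }
}
```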
One specific advantage that comes to my mind (since you asked for examples) is what in Java are primitive types (int, boolean ...) , in Scala are objects that you can add functionality to with implicit conversions. For example, if you want to add a toRoman method to ints, you could write an implicit class like:
implicit class RomanInt(i:Int){
def toRoman = //some algorithm to convert i to a Roman representation
}
Then, you could call this method from any Int literal like :
val romanFive = 5.toRoman // V
This way you can 'pimp' basic types to adapt them to your needs
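The conversion algorithm is left out above. One possible way to fill it in is sketched below; the enclosing RomanSyntax object, the numerals table and the supported range of 1 to 3999 are assumptions of this sketch, not part of the original answer.

```scala
object RomanSyntax {
  // Value/symbol pairs in descending order, including the subtractive forms.
  private val numerals: List[(Int, String)] = List(
    1000 -> "M", 900 -> "CM", 500 -> "D", 400 -> "CD",
    100 -> "C", 90 -> "XC", 50 -> "L", 40 -> "XL",
    10 -> "X", 9 -> "IX", 5 -> "V", 4 -> "IV", 1 -> "I"
  )

  implicit class RomanInt(i: Int) {
    def toRoman: String = {
      require(i > 0 && i < 4000, "only 1 to 3999 are supported in this sketch")
      // Greedily take as many copies of each symbol as still fit.
      numerals.foldLeft((i, "")) { case ((rest, acc), (value, symbol)) =>
        val n = rest / value
        (rest - n * value, acc + symbol * n)
      }._2
    }
  }
}

object RomanDemo {
  import RomanSyntax._

  def main(args: Array[String]): Unit = {
    println(5.toRoman)    // V
    println(1994.toRoman) // MCMXCIV
  }
}
```

The implicit class lives inside an object because Scala 2 does not allow implicit classes at the top level of a file.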
In addition to the points made by others, I always emphasize that the uniform treatment of all values in Scala is in part an illusion. For the most part it is a very welcome illusion. And Scala is very smart to use real JVM primitives as much as possible and to perform automatic transformations (usually referred to as boxing and unboxing) only as much as necessary.
However, if the dynamic pattern of application of automatic boxing and unboxing is very high, there can be undesirable costs (both memory and CPU) associated with it. This can be partially mitigated with the use of specialization, which creates special versions of generic classes when particular type parameters are of (programmer-specified) primitive types. This avoids boxing and unboxing but comes at the cost of more .class files in your running application.
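For reference, specialization is requested with the @specialized annotation on a type parameter. The Cell class below is a made-up example and assumes a Scala 2 compiler, where the annotation causes extra primitive variants of the class to be generated.

```scala
// Hypothetical container; with Scala 2 the compiler generates additional
// variants of Cell for Int and Double so those values need not be boxed.
class Cell[@specialized(Int, Double) A](val value: A)

object SpecializedDemo {
  val intCell: Cell[Int] = new Cell(42)
  val doubleCell: Cell[Double] = new Cell(1.5)
  val stringCell: Cell[String] = new Cell("boxed as usual")

  def main(args: Array[String]): Unit = {
    println(intCell.value + 1)      // 43
    println(doubleCell.value * 2.0) // 3.0
  }
}
```

Listing only the primitive types that actually occur keeps the number of generated .class files down.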
Not everything is an object in Scala, though more things are objects in Scala than their analogues in Java.
The advantage of objects is that they're bags of state which also have some behavior coupled with them. With the addition of polymorphism, objects give you ways of changing the implicit behavior and state. Enough with the poetry, let's go into some examples.
The if statement is not an object, in either Scala or Java. If it were, you would be able to subclass it, inject another dependency in its place, and use it to do stuff like logging to a file any time your code makes use of the if statement. Wouldn't that be magical? It would in some cases help you debug stuff, and in other cases it would make your hair grow white before you found a bug caused by someone overriding the behavior of if.
Visiting an objectless, statementful world: Imagine your favorite OOP programming language. Think of the standard library it provides. There are plenty of classes there, right? They offer ways for customization, right? They take parameters that are other objects, they create other objects. You can customize all of these. You have polymorphism. Now imagine that all the standard library was simply keywords. You wouldn't be able to customize nearly as much, because you can't override keywords. You'd be stuck with whatever cases the language designers decided to implement, and you'd be helpless in customizing anything there. Such languages exist, and you know them well: they're the SQL-like languages. You can barely create functions there, and in order to customize the behavior of the SELECT statement, new versions of the language had to appear which included the most desired features. This would be an extreme world, where you'd only be able to program by asking the language designers for new features (which you might not get, because someone else more important would require some feature incompatible with what you want).
In conclusion, NOT everything is an object in scala: Classes, expressions, keywords and packages surely aren't. More things however are, like functions.
What's IMHO a nice rule of thumb is that more objects equals more flexibility.
P.S. In Python, for example, even more things are objects (like the classes themselves, and the analogous concept for packages, that is, Python modules and packages). There, black magic is easier to do, and that brings both good and bad consequences.

Why making a difference between methods and functions in Scala?

I have been reading about methods and functions in Scala. Jim's post and Daniel's complement to it do a good job of explaining what the differences between these are. Here is what I took with me:
functions are objects, methods are not;
as a consequence functions can be passed as argument, but methods can not;
methods can be type-parametrised, functions can not;
methods are faster.
I also understand the difference between def, val and var.
Now I have actually two questions:
Why can't we parametrise the apply method of a function in order to parametrise the function? And
Why can't the function object simply call the method so that it runs faster? Or why can't the caller of the function be made to call the original method directly?
Looking forward to your answers and many thanks in advance!
1 - Parameterizing functions.
It is theoretically possible for a compiler to parameterize the type of a function; one could add that as a feature. It isn't entirely trivial, though, because functions are contravariant in their argument and covariant in their return value:
trait Function1[-T, +R] { ... }
which means that another function that accepts a wider range of arguments counts as a subtype (since it can process anything that the supertype can process), and if it produces a narrower set of results, that's okay (since it will also obey the supertype's contract that way). But how do you encode
def fn[A](a: A) = a
in that framework? The whole point is that the return type is equal to the type passed in, whatever that type has to be. You'd need
Function1[ ThisCanBeAnything, ThisHasToMatch ]
as your function type. "This can be anything" is well-represented by Any if you want a single type, but then you could return anything as the original type is lost. This isn't to say that there is no way to implement it, but it doesn't fit nicely into the existing framework.
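A brief sketch of both halves of this argument. The object name and the values idInt, idAny, describe and widened are illustrative; only fn comes from the answer.

```scala
object FunctionTypeDemo {
  // A method can be type-parameterised: the result type follows the argument type.
  def fn[A](a: A): A = a

  // A function value has to commit to concrete types.
  val idInt: Int => Int = x => fn(x)

  // Falling back to Any compiles, but the static link between input and output is lost.
  val idAny: Any => Any = x => fn(x)

  // Variance: a function that accepts more and returns less can stand in
  // for a function that accepts less and returns more.
  val describe: Any => String = x => "value: " + x
  val widened: String => Any = describe

  def main(args: Array[String]): Unit = {
    val n: Int = idInt(3)        // still known to be an Int
    val a: Any = idAny("text")   // only known to be Any
    println(n)                   // 3
    println(a)                   // text
    println(widened("x"))        // value: x
  }
}
```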
2 - Speed of functions.
This is really simple: a function is the apply method on another object. You have to have that object in order to call its method. This will always be slower (or at least no faster) than calling your own method, since you already have yourself.
As a practical matter, JVMs can do a very good job inlining functions these days; there is often no difference in performance as long as you're mostly using your method or function, not creating the function object over and over. If you're deeply nesting very short loops, you may find yourself creating way too many functions; moving them out into vals outside of the nested loops may save time. But don't bother until you've benchmarked and know that there's a bottleneck there; typically the JVM does the right thing.
Think about the type signature of a function. It explicitly says what types it takes. So then type-parameterizing apply() would be inconsistent.
A function is an object, which must be created, initialized, and then garbage-collected. When apply() is called, it has to grab the function object in addition to the parent.

Scala toString: parenthesize or not?

I'd like this thread to be some kind of summary of pros/cons for overriding and calling toString with or without empty parentheses, because this thing still confuses me sometimes, even though I've been into Scala for quite a while.
So which one is preferable over another? Comments from Scala geeks, officials and OCD paranoids are highly appreciated.
Pros to toString:
seems to be an obvious and natural choice at the first glance;
most cases are trivial and just construct Strings on the fly without ever modifying internal state;
another common case is to delegate method call to the wrapped abstraction:
override def toString = underlying.toString
Pros to toString():
definitely not "accessor-like" name (that's how IntelliJ IDEA inspector complains every once in a while);
might imply some CPU or I/O work (in cases where counting every System.arraycopy call is crucial to performance);
even might imply some mutable state changing (consider an example when first toString call is expensive, so it is cached internally to yield quicker calls in future).
So what's the best practice? Am I still missing something?
Update: this question is related specifically to toString which is defined on every JVM object, so I was hoping to find the best practice, if it ever exists.
Here's what Programming In Scala (section 10.3) has to say:
The recommended convention is to use a parameterless method whenever
there are no parameters and the method accesses mutable state only by
reading fields of the containing object (in particular, it does not
change mutable state). This convention supports the uniform access
principle, which says that client code should not be affected by a
decision to implement an attribute as a field or method.
Here's what the (unofficial) Scala Style Guide (page 18) has to say:
Scala allows the omission of parentheses on methods of arity-0 (no
arguments):
reply()
// is the same as
reply
However, this syntax
should only be used when the method in question has no side-effects
(purely-functional). In other words, it would be acceptable to omit
parentheses when calling queue.size, but not when calling println().
This convention mirrors the method declaration convention given above.
The latter does not mention the Uniform Access Principle.
If your toString method can be implemented as a val, it implies the field is immutable. If, however, your class is mutable, toString might not always yield the same result (e.g. for StringBuffer). So Programming In Scala implies that we should use toString() in two different situations:
1) When its value is mutable
2) When there are side-effects
Personally I think it's more common and more consistent to ignore the first of these. In practice toString will almost never have side-effects. So (unless it does), always use toString and ignore the Uniform Access Principle (following the Style Guide): keep parentheses to denote side-effects, rather than mutability.
Yes, you are missing something: Semantics.
If you have a method that simply gives back a value, you shouldn't use parens. The reason is that this blurs the line between vals and defs, satisfying the Uniform Access Principle. E.g. consider the size method for collections. For fixed-sized vectors or arrays this can be just a val, other collections may need to calculate it.
The usage of empty parens should be limited to methods which perform some kind of side effect, e.g. println(), or a method that increases an internal counter, or a method that resets a connection etc.
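The size example can be sketched as follows. Sized, FixedBuffer and Chain are hypothetical types used only to show a val and a def sitting behind the same parameterless member.

```scala
trait Sized {
  def size: Int // declared without parentheses
}

// Here size is a stored field: the val implements the abstract def.
final class FixedBuffer(val size: Int) extends Sized

// Here size is computed on every call.
final class Chain(items: List[Int]) extends Sized {
  def size: Int = items.length
}

object UniformAccessDemo {
  val all: List[Sized] = List(new FixedBuffer(16), new Chain(List(1, 2, 3)))

  // The call site looks the same either way.
  val sizes: List[Int] = all.map(_.size)

  def main(args: Array[String]): Unit = {
    println(sizes) // List(16, 3)
  }
}
```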
I would recommend always using toString. Regarding your third "pro" to toString():
Might imply some mutable state changing (consider an example when first toString call is expensive, so it is cached internally to yield quicker calls in future).
First of all, toString generally shouldn't be an expensive operation. But suppose it is expensive, and suppose you do choose to cache the result internally. Even in that case, I'd say use toString, as long as the result of toString is always the same for a given state of the object (disregarding the state of the toString cache).
The only reason I would not recommend using toString without parens is if you have a code profiler/analyzer that makes assumptions based on the presence or absence of parens. In that case, follow the conventions set forth by said profiler. Also, if your toString is that complicated, consider renaming it to something else, like expensiveToString. It is unofficially expected that toString be a straightforward, simple function in most cases.
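A sketch of the caching case discussed above, assuming an immutable Report class invented for the example. The cache is an implementation detail, and toString still returns the same text on every call, so it is declared without parentheses.

```scala
final class Report(lines: Vector[String]) {
  // Computed at most once, on first use.
  private lazy val rendered: String = lines.mkString("\n")

  override def toString: String = rendered
}

object ToStringDemo {
  def main(args: Array[String]): Unit = {
    val r = new Report(Vector("first", "second"))
    println(r.toString == r.toString) // true
    println(r)
  }
}
```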
Not much argumentation in this answer but GenTraversableOnce alone declares the following defs without parentheses:
toArray
toBuffer
toIndexedSeq
toIterable
toIterator
toList
toMap
toSeq
toSet
toStream
toTraversable