Why use Collection.empty[T] instead of new Collection[T]() - scala

I was wondering if there is a good reason to use Collection.empty[T] instead of new Collection[T]() (or the reverse). Or is it just a personal preference?
Thanks.

Calling new Collection[T]() will create a new instance every time. On the other hand, Collection.empty[T] will most likely always return the same singleton object, usually defined somewhere as
object Empty extends Collection[Nothing] ...
which saves an allocation on every call and is therefore faster. Edit: this is only possible for immutable collections; mutable collections have to return a new instance every time empty is called.
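A minimal sketch of that difference, assuming the Scala 2 standard library. The object name EmptyIdentityDemo is made up for illustration, and eq compares references rather than contents.

```scala
import scala.collection.mutable

// Hypothetical demo object, not part of any library.
object EmptyIdentityDemo {
  // Immutable companions hand back a shared instance for empty ...
  val listShared: Boolean = List.empty[Int] eq List.empty[String]
  val setShared: Boolean = Set.empty[Int] eq Set.empty[Int]

  // ... while a mutable collection has to be fresh on every call,
  // otherwise one caller's additions would be visible to everyone.
  val bufferShared: Boolean =
    mutable.ArrayBuffer.empty[Int] eq mutable.ArrayBuffer.empty[Int]
}
```

Here listShared and setShared come out true, while bufferShared is false.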

You should always prefer Collection.empty[Type].

In addition to Collection.empty[T] being clearer about the intent, you should favour it for the same reason that you should favour factory methods in general when instantiating a collection: those factories abstract away implementation details that you might not (or should not) care about.
For example, when you write Seq.empty[String] you actually get an instance of List[String]. You could instantiate a List[String] directly, but if all you care about is having some Seq, you would introduce a needless dependency on List (well OK, actually you cannot as it stands, because List is abstract, but let's pretend we can for the sake of the argument).
The whole point of factories is precisely to have some separation of concerns and not to bother with unnecessary instantiation details.
As another, more elaborate example, let's talk about collection.immutable.HashMap. This one is very much a concrete class, so you might think there is no need for a factory here. Except that, as an optimization, the factory in the companion object collection.immutable.HashMap will actually create different subclasses depending on the number of elements that you initialize the map with (see this question: Scala: how to make a Hash(Trie)Map from a Map (via Anorm in Play)). Obviously, if you instantiate collection.immutable.HashMap directly you lose this optimization.
Another common optimization is for empty to always return the same instance when the collection is immutable, which you would also lose by instantiating the collection directly.
So as a rule of thumb, use the factories provided by the various collection companion objects wherever you can. This shields you from unneeded dependencies while letting you benefit from potential optimizations provided by the collection framework.
empty is just a special case of factory, and so the same logic applies.
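As a small sketch of this separation (the object and value names are hypothetical; only standard-library calls are used), code written against Seq never has to name the class that the factory happens to pick:

```scala
// Hypothetical demo object.
object FactoryDemo {
  // Written purely against the Seq interface.
  def describe(xs: Seq[String]): String = s"${xs.length} element(s)"

  val noItems: Seq[String] = Seq.empty[String]
  val twoItems: Seq[String] = Seq("a", "b")

  // The factory currently chooses List behind the scenes,
  // but nothing above depends on that choice.
  val emptyIsNil: Boolean = noItems eq Nil
  val backedByList: Boolean = twoItems.isInstanceOf[List[_]]
}
```

If a future library version backed Seq with another class, describe and both values would still compile unchanged.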

Related

Why does class Set not exist in Scala?

There are many classes that inherit from trait Set:
HashSet, TreeSet, etc.
There is also an object Set alongside trait Set (can I call it the companion object of trait Set, even though there is no class Set?).
It seems to me that adding one more class, "Set", to this list would make the structure really easy to understand.
Is there any reason class Set should not exist?
If you just need a set, use Set.apply and you will have a valid set that supports all important operations. You don't need to worry how is it implemented. It is prepared to work well for most use cases.
On the other hand, if performance of certain operations matters for you, create a concrete class for concrete implementation of set, and you will know exactly what you have.
In Java you would write:
Set<String> strings = new HashSet<>(Arrays.asList("a", "b"));
In Scala you could spell out the same types:
val strings: Set[String] = HashSet("a", "b")
but you can also use a handy factory if you don't need to worry about the concrete type and simply write
val strings = Set("a", "b")
Nothing is wrong with this, and I don't see how adding another class would help at all. It is normal to have an interface/trait and concrete implementations; nothing in the middle is needed or helpful.
Set.apply is a factory for sets. You can check what is the actual class of resulting object using getClass. This factory creates special, optimized sets for sizes 0-4, for example
scala.collection.immutable.Set$EmptySet$
scala.collection.immutable.Set$Set1
scala.collection.immutable.Set$Set2
for bigger sets it is a hash set, namely scala.collection.immutable.HashSet$HashTrieSet.
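A quick way to see this for yourself is sketched below (the object and helper names are made up). The exact class used for larger sets depends on the library version, so the sketch only relies on it living under scala.collection.immutable.HashSet.

```scala
// Hypothetical demo object; Set here is the default immutable Set.
object SetFactoryDemo {
  // Name of the runtime class the factory actually returned.
  def implName(s: Set[Int]): String = s.getClass.getName

  val empty: String = implName(Set.empty[Int])
  val one: String = implName(Set(1))
  val five: String = implName(Set(1, 2, 3, 4, 5))
}
```

All three values have the static type Set[Int]; only getClass reveals which implementation was chosen.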
In Scala there is no overlap between classes and traits. Classes are implementations that can be instantiated, while traits are independently mixable interfaces. The use of Set.apply gives an object with interface Set and that is all you need to know to use it. I fully understand wanting a concrete type, but that would be unnecessary. The right thing to do here is save it to a val of type Set and use only the interface Set provides.
I know that may not be satisfying, but give it time and the Scala type system will make sense in terms of itself, even if that is different than what Java does.

Why are SessionVars in Lift implemented using singletons?

One typical way of managing state in Lift is to create a singleton object extending SessionVar, like in this example taken from the documentation:
object MySnippetCompanion {
object mySessionVar extends SessionVar[String]("hello")
}
The case for using SessionVars is clear and I've been using them in practice as needed. I also roughly understand how they work inside.
Still, I can't help but wonder why the mechanism for "session variables", which are clearly associated with the current session (usually just one out of many sessions in the system), was designed to be used via a singleton? This goes so against my intuition that at first glance I was tempted to believe that Lift was somehow able to override Scala's language features and to make object mean something different than in regular Scala.
Even though I now understand how it works, I can't grasp the rationale for such a design, which, at least for me, breaks the rule of least astonishment. Can someone point out any advantages or perhaps explain why such a design decision could have been made?
Session variables in Lift use Scala's DynamicVariable. Basically they allow you to statically reference a variable in a code-block and then later on call the code and substitute a value:
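import scala.util.DynamicVariable
// A dynamic variable with the default value 1
val x = new DynamicVariable(1)
def printIt(): Unit = {
  println(x.value)
}
printIt()
//> 1
// Inside withValue, x.value is rebound to 2 for the duration of the call
x.withValue(2)(printIt())
//> 2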
So each time a request is handled, these dynamic variables are rebound to the current session, which completely hides the session switch from you as a programmer.
The other option would be to pass around a "sessionID" object that you would have to use whenever you want to access session-specific data. Not really handy.
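To make the idea concrete, here is a minimal sketch of a "current session" built on DynamicVariable. Every name in it (SessionSketch, Session, withSession, userName) is hypothetical; it is not Lift's API, only an illustration of the mechanism described above.

```scala
import scala.util.DynamicVariable

// Hypothetical names throughout; this is not Lift code.
object SessionSketch {
  final case class Session(id: String, attributes: Map[String, String])

  // The session that is "current" for the running code, if any.
  private val current = new DynamicVariable[Option[Session]](None)

  // A request handler would wrap its work in this call.
  def withSession[A](session: Session)(body: => A): A =
    current.withValue(Some(session))(body)

  // Reads session state without taking a session parameter.
  def userName: String =
    current.value.flatMap(_.attributes.get("user")).getOrElse("anonymous")
}
```

Calling SessionSketch.userName inside withSession sees that session's data, and the previous binding is restored once the block finishes.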
The reason you have to use the object keyword is that object is unique in that it defines both a value and a class. This allows Lift to call getClass to get a name that uniquely identifies this SessionVar vs. any other one, which Lift needs in order to serialize and deserialize every piece of session state in the right place(s). Furthermore if the SessionVar is in a class that has two instances (for instance a snippet rendered in two tabs), they will both refer to the same piece of session state. (The flip side of the coin is that the same SessionVar instance can be referenced by two different sessions and mean the right thing to each.)
Actually, at times this is insufficient: for instance, if you define a SessionVar in a trait and have two different classes that inherit the trait, but you need them to have two different values. The solution in that case is to override the def for the "name salt", which is combined with getClass to identify the SessionVar.
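The getClass point can be illustrated without Lift at all. In the sketch below, NamedVar is a made-up stand-in for a SessionVar-like base class, and key is a hypothetical method; the only thing it shows is that each object declaration gets its own class and therefore its own name.

```scala
// Hypothetical stand-in for a SessionVar-like base class; not Lift code.
abstract class NamedVar[T](val default: T) {
  // Each `object` gets its own compiler-generated class,
  // so the class name can serve as an identifying key.
  def key: String = getClass.getName
}

object Snippets {
  object userName extends NamedVar[String]("guest")
  object theme extends NamedVar[String]("light")
}
```

Snippets.userName.key and Snippets.theme.key differ even though both extend the same base class with the same type parameter, which a plain val of type NamedVar[String] could not offer.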

Should Scala immutable case classes be defined to hold Seq[T], immutable.Seq[T], List[T] or Vector[T]?

If we want to define a case class that holds a single object, say a tuple, we can do it easily:
sealed case class A(x: (Int, Int))
In this case, retrieving the "x" value will take a small constant amount of time, and this class will only take a small constant amount of space, regardless of how it was created.
Now, let's assume we want to hold a sequence of values instead; we could do it like this:
final case class A(x: Seq[Int])
This might seem to work as before, except that now storage and time to read all of x is proportional to x.length.
However, this is not actually the case, because someone could do something like this:
val hugeList = (1 to 1000000000).toList
val a = A(hugeList.view.filter(_ == 500000000))
In this case, the a object looks like an innocent case class holding a single int in a sequence, but in fact it requires gigabytes of memory, and it will take on the order of seconds to access that single element every time.
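The re-evaluation cost can be observed on a much smaller list. The sketch below (object and method names are hypothetical) counts how often the filter predicate runs when reading the first element twice, once through a lazy view and once through a strict List.

```scala
// Hypothetical demo with a small list so that it runs quickly.
object LazyViewDemo {
  // Predicate calls caused by the first and by the second read of a lazy view.
  def viewCost(): (Int, Int) = {
    var calls = 0
    val lazyHits = (1 to 1000).toList.view.filter { n => calls += 1; n == 500 }
    lazyHits.head
    val first = calls
    lazyHits.head
    (first, calls - first)
  }

  // Predicate calls while building a strict List, and while reading it twice.
  def strictCost(): (Int, Int) = {
    var calls = 0
    val hits = (1 to 1000).toList.filter { n => calls += 1; n == 500 }
    val building = calls
    hits.head
    hits.head
    (building, calls - building)
  }
}
```

With the view, every read pays for the filtering again; with the strict list, the work is done once up front and later reads cost no predicate calls.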
This could be fixed by specifying something like List[T] as the type instead of Seq[T]; however, this seems ugly since it adds a reference to a specific implementation, while in fact other well behaved implementations, like Vector[T], would also do.
Another worrying issue is that one could pass a mutable Seq[T], so it seems that one should at least use immutable.Seq instead of scala.collection.Seq (although the compiler can't actually enforce the immutability at the moment).
Looking at most libraries it seems that the common pattern is to use scala.collection.Seq[T], but is this really a good idea?
Or perhaps Seq is being used just because it's the shortest to type, and in fact it would be best to use immutable.Seq[T], List[T], Vector[T] or something else?
New text added in edit
Looking at the class library, some of the most core functionality like scala.reflect.api.Trees does in fact use List[T], and in general using a concrete class seems a good idea.
But then, why use List and not Vector?
Vector has O(1)/O(log(n)) length, prepend, append and random access, is asymptotically smaller (List is ~3-4 times bigger due to vtable and next pointers), and supports cache efficient and parallelized computation, while List has none of those properties except O(1) prepend.
So, personally I'm leaning towards Vector[T] being the correct choice for something exposed in a library data structure, where one doesn't know what operations the library user will need, despite the fact that it seems less popular.
First of all, you talk both about space and time requirements. In terms of space, your object will always be as large as the collection. It doesn't matter whether you wrap a mutable or immutable collection, that collection for obvious reasons needs to be in memory, and the case class wrapping it doesn't take any additional space (except its own small object reference). So if your collection takes "gigabytes of memory", that's a problem of your collection, not whether you wrap it in a case class or not.
You then go on to argue that a problem arises when using views instead of eager collections. But again, the question is what the problem actually is. You use the example of lazily filtering a collection. In general, running a filter is an O(n) operation, just as if you were iterating over the original list. In that example, access would be O(1) on successive calls if the collection had been made strict. But that's a problem of the call site of your case class, not of the definition of your case class.
The only valid point I see is with respect to mutable collections. Given the defining semantics of case classes, you should really only use effectively immutable objects as arguments, so either pure immutable collections or collections to which no instance has any more write access.
There is a design error in Scala in that scala.Seq is not aliased to collection.immutable.Seq but to a general Seq, which can be either mutable or immutable. I advise against any use of unqualified Seq. It is really wrong and should be rectified in the Scala standard library. Use collection.immutable.Seq instead or, if the collection doesn't need to be ordered, collection.immutable.Traversable. (This describes the standard library up to Scala 2.12; since Scala 2.13, scala.Seq is an alias for collection.immutable.Seq.)
So I agree with your suspicion:
Looking at most libraries it seems that the common pattern is to use scala.collection.Seq[T], but is this really a good idea?
No! Not good. It might be convenient, because you can pass in an Array for example without explicit conversion, but I think a cleaner design is to require immutability.
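A short sketch of what is at stake (Loose, Strict and SeqChoiceDemo are hypothetical names): a case class typed with the general collection.Seq can have its contents changed from outside, while one typed with immutable.Seq forces the caller to hand over an immutable copy.

```scala
import scala.collection.immutable
import scala.collection.mutable.ArrayBuffer

// Hypothetical demo object.
object SeqChoiceDemo {
  final case class Loose(xs: scala.collection.Seq[Int])
  final case class Strict(xs: immutable.Seq[Int])

  // Returns the lengths seen through each wrapper after a later mutation.
  def run(): (Int, Int) = {
    val buffer = ArrayBuffer(1, 2, 3)
    val loose = Loose(buffer)           // accepted: ArrayBuffer is a collection.Seq
    // Strict(buffer) would not compile; an explicit copy is required.
    val strict = Strict(buffer.toList)
    buffer += 4                         // mutate after construction
    (loose.xs.length, strict.xs.length)
  }
}
```

The Loose instance silently grows to four elements, whereas the Strict instance still holds the three it was created with.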

Everything's an object in Scala

I am new to Scala and heard a lot that everything is an object in Scala. What I don't get is what's the advantage of "everything's an object"? What are things that I cannot do if everything is not an object? Examples are welcome. Thanks
The advantage of having "everything" be an object is that you have far fewer cases where abstraction breaks.
For example, methods are not objects in Java. So if I have two strings, I can
String s1 = "one";
String s2 = "two";
static String caps(String s) { return s.toUpperCase(); }
caps(s1); // Works
caps(s2); // Also works
So we have abstracted away string identity in our operation of making something upper case. But what if we want to abstract away the identity of the operation, that is, we do something to a String that gives back another String but we want to abstract away what the details are? Now we're stuck, because methods aren't objects in Java.
In Scala, methods can be converted to functions, which are objects. For instance:
def stringop(s: String, f: String => String) = if (s.length > 0) f(s) else s
stringop(s1, _.toUpperCase)
stringop(s2, _.toLowerCase)
Now we have abstracted the idea of performing some string transformation on nonempty strings.
And we can make lists of the operations and such and pass them around, if that's what we need to do.
There are other less essential cases (object vs. class, primitive vs. not, value classes, etc.), but the big one is collapsing the distinction between method and object so that passing around and abstracting over functionality is just as easy as passing around and abstracting over data.
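The remark about making lists of operations can be sketched as follows. stringop is the method from the answer above; StringOpDemo, pipeline and runAll are names invented for this illustration.

```scala
// Hypothetical demo object built around the stringop method shown earlier.
object StringOpDemo {
  def stringop(s: String, f: String => String): String =
    if (s.length > 0) f(s) else s

  // Functions are ordinary values, so they can be stored in a list ...
  val pipeline: List[String => String] = List(_.toUpperCase, _.reverse, _ + "!")

  // ... and applied one after another.
  def runAll(s: String): String =
    pipeline.foldLeft(s)((acc, f) => stringop(acc, f))
}
```

StringOpDemo.runAll("one") yields "ENO!", while an empty string passes through untouched because stringop skips every step.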
The advantage is that you don't have different operators that follow different rules within your language. For example, in Java to perform operations involving objects, you use the dot name technique of calling the code (static objects still use the dot name technique, but sometimes the this object or the static object is inferred) while built-in items (not objects) use a different method, that of built-in operator manipulation.
Number one = Integer.valueOf(1);
Number two = Integer.valueOf(2);
Number three = one.plus(two); // if only such methods existed.
int one = 1;
int two = 2;
int three = one + two;
The main difference is that the dot name technique is subject to polymorphism, method overloading, method hiding, and all the good stuff that you can do with Java objects. The + technique is predefined and completely inflexible.
Scala circumvents the inflexibility of the + operator by basically handling it as a dot name operator, and defining a strong one-to-one mapping of such operators to object methods. Hence, in Scala "everything is an object" means just that, so the operation
5 + 7
conceptually results in two objects being created (a 5 object and a 7 object), the + method of the 5 object being called with the parameter 7 (if my Scala memory serves me correctly), and a "12" object being returned as the value of the 5 + 7 operation.
This "everything is an object" approach has a lot of benefits in a functional programming environment. For example, blocks of code are now objects too, making it possible to pass blocks of code (without names) back and forth as parameters while still being bound to strict type checking (the block of code only returns Long or a subclass of String or whatever).
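In code, the mapping from operator to method looks like the sketch below. Money and OperatorDemo are hypothetical names, used only to show that + is an ordinary method name that user-defined types can provide as well.

```scala
// Hypothetical demo object.
object OperatorDemo {
  // Infix notation and method-call notation are the same call.
  val infix: Int = 5 + 7
  val dotted: Int = (5).+(7)

  // A user-defined type can define + like any other method.
  final case class Money(cents: Long) {
    def +(other: Money): Money = Money(cents + other.cents)
  }

  val total: Money = Money(150) + Money(275)
}
```

Both infix and dotted evaluate to 12, and total is Money(425).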
When it boils down to it, it makes some kinds of solutions very easy to implement, and often the inefficiencies are mitigated by the lack of need to handle "move into primitives, manipulate, move out of primitives" marshalling code.
One specific advantage that comes to mind (since you asked for examples) is that what are primitive types in Java (int, boolean, ...) are objects in Scala, and you can add functionality to them with implicit conversions. For example, if you want to add a toRoman method to Int, you could write an implicit class like:
implicit class RomanInt(i: Int) {
  def toRoman: String = ??? // some algorithm to convert i to a Roman numeral
}
Then you could call this method on any Int literal:
val romanFive = 5.toRoman // V
This way you can 'pimp' basic types to adapt them to your needs.
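One possible way to fill in the conversion is sketched here. The greedy algorithm, the 1 to 3999 range restriction and the enclosing RomanSyntax object are choices made for this illustration, not part of the original answer; in Scala 2 an implicit class has to live inside an object, class or trait.

```scala
// Hypothetical container object for the implicit class.
object RomanSyntax {
  // Value/symbol pairs in descending order, including subtractive forms.
  val numerals: List[(Int, String)] = List(
    1000 -> "M", 900 -> "CM", 500 -> "D", 400 -> "CD",
    100 -> "C", 90 -> "XC", 50 -> "L", 40 -> "XL",
    10 -> "X", 9 -> "IX", 5 -> "V", 4 -> "IV", 1 -> "I")

  implicit class RomanInt(i: Int) {
    // Greedy conversion: emit each symbol as often as it fits.
    def toRoman: String = {
      require(i >= 1 && i <= 3999, s"$i is outside 1..3999")
      val (_, out) = numerals.foldLeft((i, "")) {
        case ((rest, acc), (value, symbol)) =>
          (rest % value, acc + symbol * (rest / value))
      }
      out
    }
  }
}
```

After import RomanSyntax._, the expression 5.toRoman gives "V" and 1994.toRoman gives "MCMXCIV".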
In addition to the points made by others, I always emphasize that the uniform treatment of all values in Scala is in part an illusion. For the most part it is a very welcome illusion. And Scala is very smart to use real JVM primitives as much as possible and to perform automatic transformations (usually referred to as boxing and unboxing) only as much as necessary.
However, if the dynamic pattern of application of automatic boxing and unboxing is very high, there can be undesirable costs (both memory and CPU) associated with it. This can be partially mitigated with the use of specialization, which creates special versions of generic classes when particular type parameters are of (programmer-specified) primitive types. This avoids boxing and unboxing but comes at the cost of more .class files in your running application.
Not everything is an object in Scala, though more things are objects in Scala than their analogues in Java.
The advantage of objects is that they're bags of state which also have some behavior coupled with them. With the addition of polymorphism, objects give you ways of changing the implicit behavior and state. Enough with the poetry, let's go into some examples.
The if statement is not an object, in either Scala or Java. If it were, you would be able to subclass it, inject another dependency in its place, and use it to do things like logging to a file any time your code makes use of the if statement. Wouldn't that be magical? In some cases it would help you debug, and in other cases it would turn your hair white before you found a bug caused by someone overriding the behavior of if.
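Because functions are objects, you can at least approximate this by writing your own conditional as a method. In the sketch below, loggedIf, ControlDemo and the in-memory log are all invented for illustration; the built-in if itself stays untouched.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical demo object.
object ControlDemo {
  // In-memory log standing in for a log file.
  val log: ListBuffer[String] = ListBuffer.empty[String]

  // By-name parameters: only the chosen branch is evaluated.
  def loggedIf[A](label: String)(cond: Boolean)(onTrue: => A)(onFalse: => A): A = {
    log += s"$label -> $cond"
    if (cond) onTrue else onFalse
  }
}
```

A call such as ControlDemo.loggedIf("positive?")(3 > 0)("yes")("no") returns "yes" and leaves one entry in the log.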
Visiting an objectless, statementful world: imagine your favorite OOP language. Think of the standard library it provides. There are plenty of classes there, right? They offer ways to customize them, right? They take parameters that are other objects, and they create other objects. You can customize all of these. You have polymorphism. Now imagine that the whole standard library was simply keywords. You wouldn't be able to customize nearly as much, because you can't override keywords. You'd be stuck with whatever cases the language designers decided to implement, and you'd be helpless to customize anything there. Such languages exist and you know them well: they're the SQL-like languages. You can barely create functions there, and in order to customize the behavior of the SELECT statement, new versions of the language had to appear that included the most desired features. This would be an extreme world, where you'd only be able to program by asking the language designers for new features (which you might not get, because someone more important might require a feature incompatible with what you want).
In conclusion, NOT everything is an object in Scala: classes, expressions, keywords and packages surely aren't. More things are, however, such as functions.
What is IMHO a nice rule of thumb is that more objects mean more flexibility.
P.S. In Python, for example, even more things are objects, like the classes themselves and the analogous concept for packages (that is, Python modules and packages). You'd see how black magic is easier to do there, and that brings both good and bad consequences.

scala - is it possible to force immutability on an object?

I mean, is there some declarative way to prevent an object from changing any of its members?
In the following example
class student(var name:String)
val s = new student("John")
"s" has been declared as a val, so it will always point to the same student.
But is there some way to prevent s.name from being changed, just by declaring it immutable?
Or is the only solution to declare everything as val and enforce immutability manually?
No, it's not possible to declare something immutable. You have to enforce immutability yourself by not allowing anyone to change it, that is, by removing all ways of modifying the class.
Someone can still modify it using reflection, but that's another story.
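For the student example, enforcing it yourself could look like this sketch (ImmutableStudentDemo is a made-up wrapper name). Case-class parameters are vals by default, and a changed copy is created instead of mutating in place.

```scala
// Hypothetical demo object.
object ImmutableStudentDemo {
  // Parameters of a case class are vals; final prevents subclasses
  // from adding mutable state.
  final case class Student(name: String)

  val s: Student = Student("John")
  // s.name = "Jane" would not compile: reassignment to val.
  val renamed: Student = s.copy(name = "Jane")
}
```

This only covers the fields declared here; a field whose own type is mutable would still be changeable through that object.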
Scala doesn't enforce that, so there is no way to know. There is, however, an interesting compiler-plugin project named pusca (I guess it stands for Pure-Scala). Pure is defined there as not mutating a non-local variable and being side-effect free (e.g. not printing to the console), so that calling a pure method repeatedly will always yield the same result (what is called referentially transparent).
I haven't tried out that plug-in myself, so I can't say whether it is stable or usable yet.
There is no way that Scala could do this generally.
Consider the following hypothetical example:
class Student(var name : String, var course : Course)
def stuff(course : Course) {
magically_pure_val s = new Student("Fredzilla", course)
someFunctionOfStudent(s)
genericHigherOrderFunction(s, someFunctionOfStudent)
course.someMethod()
}
The pitfalls for any attempt to actually implement that magically_pure_val keyword are:
someFunctionOfStudent takes an arbitrary student, and isn't implemented in this compilation unit. It was written/compiled knowing that Student consists of two mutable fields. How do we know it doesn't actually mutate them?
genericHigherOrderFunction is even worse; it's going to take our Student and a function of Student, but it's written polymorphically. Whether or not it actually mutates s depends on what its other arguments are; determining that at compile time with full generality requires solving the Halting Problem.
Let's assume we could get around that (maybe we could set some secret flags that mean exceptions get raised if the s object is actually mutated, though personally I wouldn't find that good enough). What about that course field? Does course.someMethod() mutate it? That method call isn't invoked from s directly.
Worse than that, we only know that we'll have passed in an instance of Course or some subclass of Course. So even if we are able to analyze a particular implementation of Course and Course.someMethod and conclude that this is safe, someone can always add a new subclass of Course whose implementation of someMethod mutates the Course.
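The last pitfall can be made concrete with a small sketch. The classes mirror the hypothetical example above, with vals instead of vars, and CountingCourse is an invented subclass standing in for code that is added later.

```scala
// Hypothetical demo object mirroring the example above.
object SubclassPitfall {
  class Course(val title: String) {
    def someMethod(): Unit = ()   // harmless in the base class
  }
  class Student(val name: String, val course: Course)

  // Added later, possibly in another compilation unit.
  class CountingCourse(title: String) extends Course(title) {
    var calls = 0
    override def someMethod(): Unit = calls += 1   // mutates the course
  }

  // Compiles without knowing which Course subclass it will receive.
  def stuff(course: Course): Student = {
    val s = new Student("Fredzilla", course)
    course.someMethod()
    s
  }
}
```

Passing a CountingCourse to stuff changes state reachable from the returned Student, even though stuff and Student contain no var at all.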
There's simply no way for the compiler to check that a given object cannot be mutated. The pusca plugin mentioned by 0__ appears to detect purity the same way Mercury does: by ensuring that every method is known from its signature to be either pure or impure, and by raising a compiler error if the implementation of anything declared to be pure does anything that could cause impurity (unless the programmer promises that the method is pure anyway).[1]
This is quite different from simply declaring a value to be completely (and deeply) immutable and expecting the compiler to notice if any of the code that could touch it could mutate it. It's also not a perfect inference, just a conservative one.
[1]The pusca README claims that it can infer impurity of methods whose last expression is a call to an impure method. I'm not quite sure how it can do this, as checking if that last expression is an impure call requires checking if it's calling a not-declared-impure method that should be declared impure by this rule, and the implementation might not be available to the compiler at that point (and indeed could be changed later even if it is). But all I've done is look at the README and think about it for a few minutes, so I might be missing something.