Does it make sense to have properties like Option[List[...]] when the List container supports the property of empty? - scala

Does it make sense for class properties or function parameters that are like:
Option[List[String]]
Option[Map[String,String]]
Does wrapping the collection in an Option provide more composability, or is there no point, since containers such as List and Map already support being empty?
Is it more efficient to have an Option, since it can be None, than to have a List.empty or Map.empty?

It depends ;)
Option is also a collection - it just can't have more than 1 element (but it can have 0). So it usually doesn't make sense to wrap another collection in Option.
However, it depends on what it represents and what semantics it conveys. Are None and an empty List the same thing? Or does it matter that the List is there but empty, as opposed to the List not being there at all?

Depends on a use case. An empty list could actually mean something different than absence of a list.
Does Option[Int] make sense? Maybe not, if you just want to represent a number of candies (0 will do). But what if you are looking for a maximum element of a list, which happens to be empty? None would be much more appropriate in this case than 0.
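A minimal sketch of the maximum-element case. The helper name maxOf is hypothetical (it is not from the answer); reduceOption is a standard collection method that returns None for an empty collection.

```scala
object MaxExample {
  // Hypothetical helper: the largest element, or None when there is nothing to compare.
  def maxOf(xs: List[Int]): Option[Int] = xs.reduceOption(_ max _)

  val ofEmpty: Option[Int]    = maxOf(Nil)           // no maximum exists
  val ofNonEmpty: Option[Int] = maxOf(List(3, 7, 5)) // a maximum exists
}
```

Returning 0 for the empty list would be indistinguishable from a list whose maximum really is 0, which is the point made above.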
Same with lists and other things. If you are listing brands of candy in someone's pocket, sure, you can use Nil to cover the case of a poor shmuck who doesn't have any.
But what if we wanted to describe, say a store inventory?
def listBrandsInInventory(product: String = "candy"): List[String]
What if this is a bookstore that doesn't carry candy at all? Sure, you could argue that nobody needs a store that does not carry candy, or you could still just return Nil (and it wouldn't even be lying), but there are two different situations here: either all the candy is sold out, or we don't carry candy at all (don't bother to call tomorrow). If you'd like to distinguish between the two, Option[List] comes in handy.
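One way the three cases could look in code. This is only a sketch: the stock map, the brand names and the describe helper are made up for illustration, and the method is changed to return Option[List[String]].

```scala
object InventoryExample {
  // Hypothetical stock data: product -> brands currently on the shelf.
  private val stock: Map[String, List[String]] = Map(
    "candy" -> List("BrandA", "BrandB"),
    "gum"   -> Nil // carried, but sold out
  )

  // None:      the store does not carry the product at all.
  // Some(Nil): the product is carried, but nothing is in stock right now.
  def listBrandsInInventory(product: String = "candy"): Option[List[String]] =
    stock.get(product)

  def describe(product: String): String = listBrandsInInventory(product) match {
    case None         => s"we don't carry $product"
    case Some(Nil)    => s"$product is sold out"
    case Some(brands) => s"$product brands: ${brands.mkString(", ")}"
  }
}
```

If the caller never needs to tell the first two cases apart, a plain List[String] with Nil is the simpler signature.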

Related

Scala Method Argument: Option of Collection or Default Value

I have method that takes a Map of [Int, MyClass] as an argument. Something like this:
myMethod(someMap : Map[Int, MyClass])
However, someMap might not always be present (null in the Java world, None in Scala).
Which of the following is a better design of this method from an API point of view:
Wrapping it in an Option:
myMethod(someMap : Option[Map[Int, MyClass]] = None)
Defining a default empty map:
myMethod(someMap : Map[Int, MyClass] = Map.empty)
The first option looks elegant, but it has the added complexity that the caller has to wrap a Map (if not None) in Some(), and the implementor has to do a getOrElse to unwrap it.
The first option also makes it clear to the consumer of the api, that the map object might not actually exist (None)
In the second option, one does not have to do the wrapping (in Some) or unwrapping, but an empty container has to be instantiated every time there is no existing Map object.
Also, the argument against option 1: Option is itself a container of 0 or 1 items, and Map is also a container (collection). Is it good design to wrap a container in another container?
From an API design point of view, which one is the better approach?
The right question is: does it make sense for myMethod to work with an Option?
From the point of view of myMethod, maybe it only works with a Map, in which case it is the responsibility of the caller not to call myMethod if there is no map to work with.
On the other hand, maybe myMethod does something special if there is no Map or if it is empty, and then it is the responsibility of the method to handle this case.
So there is no single right answer; the correct signature is the one that respects the responsibilities of the methods. The aim is to have high cohesion and low coupling between your functions and classes.
Map.empty is a cheap operation, a result of it being immutable. So there is virtually no overhead in using it. Therefore, keep it simple, ask for a Map without any wrapping.
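The two signatures from the question can be compared side by side. In this sketch MyClass is a hypothetical stand-in, and the method bodies (returning the number of entries) are invented only so that there is something to call.

```scala
object DefaultMapExample {
  // Hypothetical stand-in for the question's MyClass.
  final case class MyClass(name: String)

  // Variant 1: absence is explicit, so callers wrap and the method unwraps.
  def withOption(someMap: Option[Map[Int, MyClass]] = None): Int =
    someMap.getOrElse(Map.empty[Int, MyClass]).size

  // Variant 2: an empty map stands in for "nothing to process".
  def withDefault(someMap: Map[Int, MyClass] = Map.empty): Int =
    someMap.size

  val data: Map[Int, MyClass] = Map(1 -> MyClass("a"))
}
```

Calling withDefault(data) needs no wrapping, while withOption(Some(data)) does; both can be called with no argument at all.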

Should Scala immutable case classes be defined to hold Seq[T], immutable.Seq[T], List[T] or Vector[T]?

If we want to define a case class that holds a single object, say a tuple, we can do it easily:
sealed case class A(x: (Int, Int))
In this case, retrieving the "x" value will take a small constant amount of time, and this class will only take a small constant amount of space, regardless of how it was created.
Now, let's assume we want to hold a sequence of values instead; we could do it like this:
final case class A(x: Seq[Int])
This might seem to work as before, except that now storage and time to read all of x is proportional to x.length.
However, this is not actually the case, because someone could do something like this:
val hugeList = (1 to 1000000000).toList
val a = A(hugeList.view.filter(_ == 500000000))
In this case, the a object looks like an innocent case class holding a single int in a sequence, but in fact it requires gigabytes of memory, and it will take on the order of seconds to access that single element every time.
This could be fixed by specifying something like List[T] as the type instead of Seq[T]; however, this seems ugly since it adds a reference to a specific implementation, while in fact other well behaved implementations, like Vector[T], would also do.
Another worrying issue is that one could pass a mutable Seq[T], so it seems that one should at least use immutable.Seq instead of scala.collection.Seq (although the compiler can't actually enforce the immutability at the moment).
Looking at most libraries it seems that the common pattern is to use scala.collection.Seq[T], but is this really a good idea?
Or perhaps Seq is being used just because it's the shortest to type, and in fact it would be best to use immutable.Seq[T], List[T], Vector[T] or something else?
New text added in edit
Looking at the class library, some of the most core functionality like scala.reflect.api.Trees does in fact use List[T], and in general using a concrete class seems a good idea.
But then, why use List and not Vector?
Vector has O(1)/O(log(n)) length, prepend, append and random access, is asymptotically smaller (List is ~3-4 times bigger due to vtable and next pointers), and supports cache efficient and parallelized computation, while List has none of those properties except O(1) prepend.
So, personally I'm leaning towards Vector[T] being the correct choice for something exposed in a library data structure, where one doesn't know what operations the library user will need, despite the fact that it seems less popular.
First of all, you talk both about space and time requirements. In terms of space, your object will always be as large as the collection. It doesn't matter whether you wrap a mutable or immutable collection, that collection for obvious reasons needs to be in memory, and the case class wrapping it doesn't take any additional space (except its own small object reference). So if your collection takes "gigabytes of memory", that's a problem of your collection, not whether you wrap it in a case class or not.
You then go on to argue that a problem arises when using views instead of eager collections. But again the question is what the problem actually is? You use the example of lazily filtering a collection. In general running a filter will be an O(n) operation just as if you were iterating over the original list. In that example it would be O(1) for successive calls if that collection was made strict. But that's a problem of the calling site of your case class, not the definition of your case class.
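A small sketch of the difference between a lazy view and a strict collection. The evaluations counter and the names below are made up for illustration; the exact number of predicate calls per traversal is not asserted because it can vary between Scala versions.

```scala
object ViewExample {
  var evaluations = 0 // counts predicate calls, for illustration only

  private val source: List[Int] = (1 to 10).toList

  // Lazy: nothing is filtered until the view is traversed.
  val lazyHits = source.view.filter { x => evaluations += 1; x == 5 }

  // Strict: the filter runs once, and the result is stored.
  val strictHits: List[Int] = source.filter(_ == 5)

  // Traverses the view and reports how many predicate calls have happened so far.
  def traverseView(): Int = { lazyHits.foreach(_ => ()); evaluations }
}
```

Each call to traverseView() runs the predicate over the source again, whereas strictHits is computed once. Which of the two gets passed into the case class is decided at the call site.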
The only valid point I see is with respect to mutable collections. Given the defining semantics of case classes, you should really only use effectively immutable objects as arguments, so either pure immutable collections or collections to which no instance has any more write access.
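The mutability concern can be sketched as follows. The class names Loose and Strict are hypothetical; the collection types are written fully qualified so the example does not depend on what the unqualified Seq alias points to.

```scala
import scala.collection.mutable.ArrayBuffer

object AliasingExample {
  // Accepts any Seq, mutable or immutable.
  final case class Loose(xs: scala.collection.Seq[Int])
  // Accepts only immutable sequences; passing an ArrayBuffer here would not compile.
  final case class Strict(xs: scala.collection.immutable.Seq[Int])

  val buffer = ArrayBuffer(1, 2, 3)
  val loose  = Loose(buffer)         // shares the caller's buffer
  val strict = Strict(buffer.toList) // takes an immutable snapshot

  buffer += 4 // the caller mutates the buffer afterwards
}
```

After the mutation, loose.xs has four elements even though the case class instance was never touched, while strict.xs still has three.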
There is a design error in Scala in that scala.Seq is not aliased to collection.immutable.Seq but to the general collection.Seq, which can be either mutable or immutable. I advise against any use of unqualified Seq. It is really wrong and should be rectified in the Scala standard library. (Scala 2.13 did change the alias to collection.immutable.Seq.) Use collection.immutable.Seq instead, or if the collection doesn't need to be ordered, collection.immutable.Traversable.
So I agree with your suspicion:
Looking at most libraries it seems that the common pattern is to use scala.collection.Seq[T], but is this really a good idea?
No! Not good. It might be convenient, because you can pass in an Array for example without explicit conversion, but I think a cleaner design is to require immutability.

Examples of using some Scala Option methods

I have read the blog post recommended to me here. Now I wonder what some of those methods are useful for. Can you show examples of using forall (as opposed to foreach) and toList of Option?
map: Allows you to transform a value "inside" an Option, as you probably already know for Lists. This operation makes Option a functor (you can say "endofunctor" if you want to scare your colleagues)
flatMap: Option is actually a monad, and flatMap makes it one (together with something like a constructor for a single value). This method can be used if you have a function which turns a value into an Option, but the value you have is already "wrapped" in an Option, so flatMap saves you the unwrapping before applying the function. E.g. if you have an Option[Map[K,V]], you can write mapOption.flatMap(_.get(key)). If you used a simple map here, you would get an Option[Option[V]], but with flatMap you get an Option[V]. This method is cooler than you might think, as it allows you to chain functions together in a very flexible way (which is one reason why Haskell loves monads).
flatten: If you have a value of type Option[Option[T]], flatten turns it into an Option[T]. It is the same as flatMap(identity(_)).
orElse: If you have several alternatives wrapped in Options, and you want the first one that actually holds a value, you can chain these alternatives with orElse: steakOption.orElse(hamburgerOption).orElse(saladOption)
getOrElse: Get the value out of the Option, but specify a default value if it is empty, e.g. nameOption.getOrElse("unknown").
foreach: Do something with the value inside, if it exists.
isDefined, isEmpty: Determine if this Option holds a value.
forall, exists: Tests if a given predicate holds for the value. forall is the same as option.map(test(_)).getOrElse(true), exists is the same, just with false as default.
toList: Surprise, it converts the Option to a List.
Many of the methods on Option may be there more for the sake of uniformity (with collections) rather than for their usefulness, as they are all very small functions and so do not spare much effort, yet they serve a purpose, and their meanings are clear once you are familiar with the collection framework (as is often said, Option is like a list which cannot have more than one element).
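A short sketch that exercises several of the methods listed above. The values (settings, missing, the "name" key) are invented for illustration.

```scala
object OptionMethodsExample {
  val settings: Option[Map[String, String]] = Some(Map("name" -> "Ann"))
  val missing: Option[Map[String, String]]  = None

  // map keeps the Option wrapper, so looking up a key nests it...
  val nested: Option[Option[String]] = settings.map(_.get("name"))
  // ...while flatMap yields a single layer (nested.flatten gives the same result).
  val name: Option[String] = settings.flatMap(_.get("name"))

  // getOrElse supplies a default; orElse falls through to the next alternative.
  val fallback: String             = missing.flatMap(_.get("name")).getOrElse("unknown")
  val firstDefined: Option[String] = missing.flatMap(_.get("name")).orElse(name)

  // forall is vacuously true on None; exists is false on None.
  val noneForall: Boolean = Option.empty[Int].forall(_ > 0)
  val noneExists: Boolean = Option.empty[Int].exists(_ > 0)

  val asList: List[String] = name.toList
}
```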
forall checks a property of the value inside an option. If there is no value, the check passes. For example, if in a car rental you are allowed one additionalDriver: Option[Person], you can do
additionalDriver.forall(_.hasDrivingLicense)
exactly the same thing that you would do if several additional drivers were allowed and you had a list.
toList may be a useful conversion. Suppose you have options: List[Option[T]], and you want to get a List[T] with the values of all the options that are Some. You can do
for (option <- options; value <- option.toList) yield value
(or better options.flatMap(_.toList))
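For comparison, here are the variants on a concrete list (the sample values are made up). The last line uses flatten, which is not mentioned above but is a standard method that works here because an Option can be treated as a collection of at most one element.

```scala
object FlattenOptionsExample {
  val options: List[Option[Int]] = List(Some(1), None, Some(3))

  // The for-comprehension form, with a <- generator for each level.
  val viaFor: List[Int] = for (option <- options; value <- option.toList) yield value
  // The flatMap form.
  val viaFlatMap: List[Int] = options.flatMap(_.toList)
  // flatten drops the None entries and unwraps the rest.
  val viaFlatten: List[Int] = options.flatten
}
```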
I have one practical example of toList method. You can find it in scaldi (my Scala dependency injection framework) in Module.scala at line 72:
https://github.com/OlegIlyenko/scaldi/blob/f3697ecaa5d6e96c5486db024efca2d3cdb04a65/src/main/scala/scaldi/Module.scala#L72
In this context the getBindings method can return either Nil or a List with only one element. I can retrieve it as an Option with discoverBinding. I find it convenient to be able to convert an Option to a List (that is either empty or has one element) with the toList method.

Scala: read and save all elements of an Iterable

I have an Iterable[T] that is really a stream of unknown length, and want to read it all and save it into something that is still an instance of Iterable. I really do have to read it and save it; I can't do it in a lazy way. The original Iterable can have a few thousand elements, at least. What's the most efficient/best/canonical way? Should I use an ArrayBuffer, a List, a Vector?
Suppose xs is my Iterable. I can think of doing these possibilities:
xs.toArray.toIterable // Ugh?
xs.toList // Fast?
xs.copyToBuffer(anArrayBuffer)
Vector(xs.toSeq: _*) // There's no toVector, sadly. Is this construct as efficient?
EDIT: I see by the questions I should be more specific. Here's a strawman example:
def f(xs: Iterable[SomeType]): Unit = { // xs might be a stream, though I can't be sure
val allOfXS = <xs all read in at once>
g(allOfXS)
h(allOfXS) // Both g() and h() take an Iterable[SomeType]
}
This is easy. A few thousand elements is nothing, so it hardly matters unless it's a really tight loop. So the flippant answer is: use whatever you feel is most elegant.
But, okay, let's suppose that this is actually in some tight loop, and you can predict or have benchmarked your code enough to know that this is performance-limiting.
Your best performance for an immutable solution will likely be a Vector, used like so:
Vector() ++ xs
In my hands, this can copy a 10k iterable about 4k-5k times per second. List is about half the speed.
If you're willing to try a mutable solution under the hood, xs.toArray.toIterable usually takes the cake with about 10k copies per second. ArrayBuffer is about the same speed as List.
If you actually know the size of the target (i.e. size is O(1) or you know it from somewhere else), you can shave another 20-30% off the execution time by allocating just the right size and writing a while loop.
If it's actually primitives, you can gain a factor of 10 by writing your own specialized Iterable-like-thing that acts on arrays and converts to regular collections via the underlying array.
Bottom line: for a great blend of power, speed, and flexibility, use Vector() ++ xs in most situations. xs.toIndexedSeq defaults to the same thing, with the benefit that if xs is already a Vector it will take no time at all (and it chains nicely without using parens), and the drawback that you are relying upon a convention, not a specification of behavior (and it takes 1-3 more characters to type).
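A sketch of the strawman f from the question using Vector() ++ xs. The source Iterable, the sourceReads counter and the bodies of g and h are hypothetical; they exist only to show that, once copied, the source is not traversed again.

```scala
object ForceExample {
  var sourceReads = 0 // how many times the source has been traversed

  // Hypothetical stand-in for "an Iterable that is really a stream".
  val xs: Iterable[Int] = new Iterable[Int] {
    def iterator: Iterator[Int] = { sourceReads += 1; Iterator.range(0, 5) }
  }

  def g(it: Iterable[Int]): Int = it.sum
  def h(it: Iterable[Int]): Int = it.size

  def f(xs: Iterable[Int]): (Int, Int) = {
    val allOfXs: Vector[Int] = Vector() ++ xs // strict copy of the source
    val readsAfterCopy = sourceReads
    val result = (g(allOfXs), h(allOfXs))
    assert(sourceReads == readsAfterCopy) // g and h only touch the copy
    result
  }
}
```

On Scala 2.10 and later, xs.toVector is an equivalent spelling of the copy.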
How about Stream.force?
Forces evaluation of the whole stream and returns it.
This is hard. An Iterable's methods are defined in terms of its iterator, but that gets overridden by subtraits. For instance, IndexedSeq methods are usually defined in terms of apply.
There is the question of why do you want to copy the Iterable, but I suppose you might be guarding against the possibility of it being mutable. If you do not want to copy it, then you need to rephrase your question.
If you are going to copy it, and you want to be sure all elements are copied in a strict manner, you could use .toList. That will not copy a List, but a List does not need to be copied. For anything else, it will produce a new copy.
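The no-copy behaviour of toList on a List can be checked with reference equality. This is a minimal sketch with made-up sample values.

```scala
object ToListExample {
  val list: List[Int] = List(1, 2, 3)
  // eq compares references: toList on a List returns the list itself.
  val sameInstance: Boolean = list.toList eq list

  val vector: Vector[Int] = Vector(1, 2, 3)
  // For another collection type, toList builds a new strict List.
  val copied: List[Int] = vector.toList
}
```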

Fill List with values from a for loop in Scala

I'm pretty new to scala and I am not able to solve this (pretty) trivial problem.
I know I can instantiate a List with predefined values like this:
val myList = List(1,2)
I want to fill a List with all integers from 1 to 100000. My goal is to avoid using a var for the List and a loop to fill it.
Is there any "functional" way of doing this?
Either of these will do the trick. (If you try them in the REPL, though, be advised that it's going to try to print all hundred thousand entries, which is generally not going to work.)
List.range(1,100001)
(1 to 100000).toList
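A quick check that the two forms agree, assigned to vals so the REPL printing issue does not arise. The List.tabulate line is an extra variant not mentioned in the answer; it can be handy when each element is computed from its index.

```scala
object FillListExample {
  val n = 100000
  val viaRange: List[Int] = List.range(1, n + 1) // the end bound is exclusive
  val viaTo: List[Int]    = (1 to n).toList      // the end bound is inclusive
  // Extra variant: build each element from its zero-based index.
  val viaTabulate: List[Int] = List.tabulate(n)(i => i + 1)
}
```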
I am also very new to Scala, it's pretty awesome isn't it.
Rex has the absolutely correct answer, but as food for thought: if you want a list that is not evaluated up front (perhaps the computations involved in evaluating the items in the list are expensive, or you just want to make things lazy), you can use a Stream.
Stream.from(1).takeWhile(_ <= 100000)
This can be used in most situations where you'd use a List.