Why don't Scala's parallel sequences have a contains method?

Why does
List.range(0, 100).contains(2)
work, while
List.range(0, 100).par.contains(2)
does not?
Is this planned for the future?

The non-teleological answer is that it's because contains is defined in SeqLike but not in ParSeqLike.
If that doesn't satisfy your curiosity, you can find that SeqLike's contains is defined thus:
def contains(elem: Any): Boolean = exists (_ == elem)
So for your example you can write
List.range(0,100).par.exists(_ == 2)
ParSeqLike is missing a few other methods as well, some of which would be hard to implement efficiently (e.g. indexOfSlice) and some for less obvious reasons (e.g. combinations - maybe because that's only useful on small datasets). But if you have a parallel collection you can also use .seq to get back to the linear version and get your methods back:
List.range(0,100).par.seq.contains(2)
As for why the library designers left it out... I'm totally guessing, but maybe they wanted to reduce the number of methods for simplicity's sake, and it's nearly as easy to use exists.
This also raises the question, why is contains defined on SeqLike rather than on the granddaddy of all collections, GenTraversableOnce, where you find exists? A possible reason is that contains for Map is semantically a different method to that on Set and Seq. A Map[A,B] is a Traversable[(A,B)], so if contains were defined for Traversable, contains would need to take a tuple (A,B) argument; however Map's contains takes just an A argument. Given this, I think contains should be defined in GenSeqLike - maybe this is an oversight that will be corrected.
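As a small, hedged illustration of that difference (the values are made up):
val m = Map(1 -> "a", 2 -> "b")
m.contains(1)              // true -- Map's contains takes just the key
m.exists(_ == (1 -> "a"))  // true -- the Traversable view works on (key, value) pairs
Seq(1, 2, 3).contains(2)   // true -- Seq's contains takes an element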
(I thought at first maybe parallel sequences don't have contains because searching where you intend to stop after finding your target on parallel collections is a lot less efficient than the linear version (the various threads do a lot of unnecessary work after the value is found: see this question), but that can't be right because exists is there.)

Related

Either monadic operations

I'm just starting to get used to dealing with monadic operations.
For the Option type, this cheat sheet by Tony Morris helped:
http://blog.tmorris.net/posts/scalaoption-cheat-sheet/
So in the end it seems easy to understand that:
map transforms the value inside an Option
flatten turns an Option[Option[X]] into an Option[X]
flatMap is essentially a map operation producing an Option[Option[X]], which is then flattened to Option[X]
At least that is what I understand so far.
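In code, my current understanding looks roughly like this (a small sketch with made-up values):
val o: Option[Int] = Some(2)
o.map(_ + 1)                                        // Some(3)
val nested: Option[Option[Int]] = Some(Some(2))
nested.flatten                                      // Some(2)
o.flatMap(n => if (n > 0) Some(n * 10) else None)   // Some(20)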
For Either, it seems a bit more difficult to understand, since Either itself is not right-biased and does not have map / flatMap operations... instead we use projections.
I can read the Scaladoc but it is not as clear as the Cheat Sheet on Options.
Can someone provide an Either cheat sheet describing the basic monadic operations?
It seems to me that Either.joinRight is a bit like RightProjection.flatMap and seems to be the equivalent of Option.flatten for Either.
It seems to me that if Either were right-biased, then Either.flatten would be Either.joinRight, no?
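For example, I would expect something like this (a small sketch, types made up):
val nested: Either[String, Either[String, Int]] = Right(Right(1))
nested.joinRight                 // Right(1): like Option.flatten, but for the right side
nested.right.flatMap(identity)   // Right(1): the same result via the right projection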
In this question: Either, Options and for comprehensions I ask about for comprehensions with Either, and one of the answers says that we can't mix monads because of the way the comprehension is desugared into map/flatMap/filter.
When using this kind of code:
def updateUserStats(user: User): Either[Error, User] = for {
  stampleCount <- stampleRepository.getStampleCount(user).right
  userUpdated  <- Right(copyUserWithStats(user, stampleCount)).right
  userSaved    <- userService.update(userUpdated).right
} yield userSaved
Does this mean that all three of my method calls must always return Either[Error, Something]?
I mean, if one of them returns Either[Throwable, Something], it won't work, right?
Edit:
Is Try[Something] exactly the same as a right-biased Either[Throwable, Something]?
Either was never really meant to be an exception-handling structure. It was meant to represent a situation where a function really could return one of two distinct types, but a convention arose where the left type is supposed to be the failure case and the right the success. If you want a biased type for some pass/fail business-check logic, then Validation from scalaz works well. If you have a function that could return a value or a Throwable, then Try would be a good choice. Either should be used for situations where you really might get one of two possible types; now that I am using Try and Validation (each for different kinds of situations), I never use Either any more.
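For what it's worth, here is a hedged sketch of how a call returning Either[Throwable, Something] could be adapted, and of moving between Try and Either. convertError and risky are hypothetical, and Error stands in for the question's error type:
import scala.util.{Failure, Success, Try}

def convertError(t: Throwable): Error = ???   // hypothetical helper
def risky(): Either[Throwable, Int] = ???     // hypothetical call

// Map over the left side so the call fits an Either[Error, ...] chain:
val adapted: Either[Error, Int] = risky().left.map(convertError)

// Try is roughly a success-biased Either[Throwable, A]; converting explicitly:
val asEither: Either[Throwable, Int] = Try("42".toInt) match {
  case Success(v) => Right(v)
  case Failure(e) => Left(e)
}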

Should Scala immutable case classes be defined to hold Seq[T], immutable.Seq[T], List[T] or Vector[T]?

If we want to define a case class that holds a single object, say a tuple, we can do it easily:
sealed case class A(x: (Int, Int))
In this case, retrieving the "x" value will take a small constant amount of time, and this class will only take a small constant amount of space, regardless of how it was created.
Now, let's assume we want to hold a sequence of values instead; we could do it like this:
sealed final case class A(x: Seq[Int])
This might seem to work as before, except that now storage and time to read all of x is proportional to x.length.
However, this is not actually the case, because someone could do something like this:
val hugeList = (1 to 1000000000).toList
val a = A(hugeList.view.filter(_ == 500000000))
In this case, the a object looks like an innocent case class holding a single int in a sequence, but in fact it requires gigabytes of memory, and it will take on the order of seconds to access that single element every time.
This could be fixed by specifying something like List[T] as the type instead of Seq[T]; however, this seems ugly since it adds a reference to a specific implementation, while in fact other well behaved implementations, like Vector[T], would also do.
Another worrying issue is that one could pass a mutable Seq[T], so it seems that one should at least use immutable.Seq instead of scala.collection.Seq (although the compiler can't actually enforce the immutability at the moment).
Looking at most libraries it seems that the common pattern is to use scala.collection.Seq[T], but is this really a good idea?
Or perhaps Seq is being used just because it's the shortest to type, and in fact it would be best to use immutable.Seq[T], List[T], Vector[T] or something else?
New text added in edit
Looking at the class library, some of the most central functionality, like scala.reflect.api.Trees, does in fact use List[T], and in general using a concrete class seems a good idea.
But then, why use List and not Vector?
Vector has O(1)/O(log(n)) length, prepend, append and random access, is asymptotically smaller (List is ~3-4 times bigger due to vtable and next pointers), and supports cache efficient and parallelized computation, while List has none of those properties except O(1) prepend.
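To make those claims concrete (a rough sketch, not a benchmark):
val v = Vector(1, 2, 3)
0 +: v    // effectively constant-time prepend
v :+ 4    // effectively constant-time append
v(2)      // effectively constant-time indexed access

val l = List(1, 2, 3)
0 :: l    // O(1) prepend
l :+ 4    // O(n) append
l(2)      // O(n) indexed access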
So, personally I'm leaning towards Vector[T] being the correct choice for something exposed in a library data structure, where one doesn't know what operations the library user will need, despite the fact that it seems less popular.
First of all, you talk both about space and time requirements. In terms of space, your object will always be as large as the collection. It doesn't matter whether you wrap a mutable or immutable collection, that collection for obvious reasons needs to be in memory, and the case class wrapping it doesn't take any additional space (except its own small object reference). So if your collection takes "gigabytes of memory", that's a problem of your collection, not whether you wrap it in a case class or not.
You then go on to argue that a problem arises when using views instead of eager collections. But again, what is the actual problem? You use the example of lazily filtering a collection. In general, running a filter will be an O(n) operation, just as if you were iterating over the original list. In that example, successive accesses would be O(1) if the collection were made strict. But that is a problem at the call site of your case class, not in the definition of your case class.
The only valid point I see is with respect to mutable collections. Given the defining semantics of case classes, you should really only use effectively immutable objects as arguments, so either pure immutable collections or collections to which no instance has any more write access.
There is a design error in Scala in that scala.Seq is not aliased to collection.immutable.Seq but to the general collection.Seq, which can be either mutable or immutable. I advise against any use of unqualified Seq. It is really wrong and should be rectified in the Scala standard library. Use collection.immutable.Seq instead, or, if the collection doesn't need to be ordered, collection.immutable.Traversable.
So I agree with your suspicion:
Looking at most libraries it seems that the common pattern is to use scala.collection.Seq[T], but is this really a good idea?
No! Not good. It might be convenient, because you can pass in an Array for example without explicit conversion, but I think a cleaner design is to require immutability.
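A minimal sketch of that recommendation (the conversions in the comments are just examples):
import scala.collection.immutable

final case class A(x: immutable.Seq[Int])

// Callers convert explicitly, which also forces lazy views to be evaluated once:
// A(hugeList.view.filter(_ == 500000000).toList)
// A(someArray.toVector)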

How do I deal with Scala collections generically?

I have realized that my typical way of passing Scala collections around could use some improvement.
def doSomethingCool(theFoos: List[Foo]) = { /* insert cool stuff here */ }
// if I happen to have a List
doSomethingCool(theFoos)
// but elsewhere I may have a Vector, Set, Option, ...
doSomethingCool(theFoos.toList)
I tend to write my library functions to take a List as the parameter type, but I'm certain that there's something more general I can put there to avoid all the occasional .toList calls I have in the application code. This is especially annoying since my doSomethingCool function typically only needs to call map, flatMap and filter, which are defined on all the collection types.
What are my options for that 'something more general'?
Here are more general traits, each of which extends the previous one:
GenTraversableOnce
GenTraversable
GenIterable
GenSeq
The traits above do not specify whether the collection is sequential or parallel. If your code requires that things be executed sequentially (typically, if your code has side effects of any kind), they are too general for it.
The following traits mandate sequential execution:
TraversableOnce
Traversable
Iterable
Seq
LinearSeq
The first one, TraversableOnce, only guarantees that you can traverse the collection once; after that, it may have been "used up". In exchange, it is general enough to accept iterators as well as collections.
Traversable is a pretty general collection that has most methods. There are some things it cannot do, however, in which case you need to go to Iterable.
All Iterables implement the iterator method, which lets you get an Iterator for the collection. This enables a few methods not present in Traversable.
A Seq[A] implements the function Int => A, which means you can access any element by its index. This is not guaranteed to be efficient, but it is a guarantee that each element has an index, and that you can make assertions about what that index is going to be. Contrast this with Map and Set, where you cannot tell what the index of an element is.
A LinearSeq is a Seq that provides fast head, tail, isEmpty and prepend. This is as close as you can get to a List without actually using a List explicitly.
Alternatively, you could have an IndexedSeq, which has fast indexed access (something List does not provide).
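For example, here is a hedged sketch (Foo and the body are placeholders) of a function written against Traversable, so the various collection types are accepted without .toList conversions:
case class Foo(n: Int)

def doSomethingCool(theFoos: Traversable[Foo]): Traversable[Int] =
  theFoos.filter(_.n > 0).map(_.n * 2)

doSomethingCool(List(Foo(1), Foo(-2)))   // List(2)
doSomethingCool(Vector(Foo(3)))          // Vector(6)
doSomethingCool(Set(Foo(4)))             // Set(8)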
See also this question and this FAQ based on it.
The most obvious one is to use Traversable as the most general trait which will have the goodies you want. However, I think you are generally better sticking to:
Seq
IndexedSeq
Set
Map
A Seq will cover List, Vector, etc.; IndexedSeq will cover Vector and friends. I found myself not using Iterable because I often need (or want) to know the size of the thing I have, and back before Scala 2.8 Iterable did not provide access to this, so I kept having to turn things into sequences anyway!
Looks like Traversable and Iterable now have size methods so maybe I should go back to using them! Of course you could start "going mad" with GenTraversableOnce but that is not likely to aid in readability.

Scala: selecting function returning Option versus PartialFunction

I'm a relative Scala beginner and would like some advice on how to proceed on an implementation that seems like it can be done either with a function returning Option or with PartialFunction. I've read all the related posts I could find (see bottom of question), but these seem to involve the technical details of using PartialFunction or converting one to the other; I am looking for an answer of the type "if the circumstances are X,Y,Z, then use A else B, but also consider C".
My example use case is a path search between locations using a library of path finders. Say the locations are of type L, a path is of type P, and the desired path search result would be an Iterable[P]. The path search result should be assembled by asking all the path finders (in something like Google Maps these might be Bicycle, Car, Walk, Subway, etc.) for their path suggestions, which may or may not be defined for a particular start/end location pair.
There seem to be two ways to go about this:
(a) define a path finder as f: (L,L) => Option[P] and then get the result via something like finders.map( _.apply(l1,l2) ).filter( _.isDefined ).map( _.get )
(b) define a path finder as f: PartialFunction[(L,L),P] and then get the result via something like finders.filter( _.isDefinedAt( (l1,l2) ) ).map( _.apply( (l1,l2) ) )
It seems like using a function returning Option[P] would avoid double evaluation of results, so for an expensive computation this may be preferable unless one caches the results. It also seems like using Option one can have an arbitrary input signature, whereas PartialFunction expects a single argument. But I am particularly interested in hearing from someone with practical experience about less immediate, more "bigger picture" considerations, such as the interaction with the Scala library. Would using a PartialFunction have significant benefits in making available certain methods of the collections API that might pay off in other ways? Would such code generally be more concise?
Related but different questions:
Inverse of PartialFunction's lift method
Is the PartialFunction design inefficient?
How to convert X => Option[R] to PartialFunction[X,R]
Is there a nicer way of lifting a PartialFunction in Scala?
costly computation occuring in both isDefined and Apply of a PartialFunction
It's not all that well known, but since 2.8 Scala has had a collect method defined on its collections. collect is similar to filter, but it takes a partial function and has the semantics you describe.
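A small example of collect (the data is made up):
val words = List("1", "two", "3")
val numbers = words.collect {
  case s if s.forall(_.isDigit) => s.toInt
}
// numbers == List(1, 3)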
It feels like Option might fit your use case better.
My interpretation is that partial functions combine well over input ranges. So if f is defined over (SanDiego, Irvine) and g is defined over (Paris, London), then you can get a function defined over the combined inputs (SanDiego, Irvine) and (Paris, London) by doing f orElse g, as sketched below.
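A hedged sketch of that combination (the types and routes are made up):
val f: PartialFunction[(String, String), String] = { case ("SanDiego", "Irvine") => "I-5" }
val g: PartialFunction[(String, String), String] = { case ("Paris", "London")   => "Eurostar" }

val combined = f orElse g
combined(("Paris", "London"))      // "Eurostar"
combined.isDefinedAt(("A", "B"))   // false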
But in your case, it seems things happen for a given (l1, l2) location tuple and then you do some work...
If you find yourself writing a lot of {case (L,M) => ... case (P,Q) => ...} then it may be the sign that partial functions are a better fit.
Otherwise options work well with the rest of the collections and can be used like this instead of your (a) proposal:
val processedPaths = for {
  f <- finders
  p <- f(l1, l2)
} yield process(p)
Within the for comprehension, p is lifted into a Traversable, so you don't even have to call filter, isDefined or get to skip the finders without results.

Examples of using some Scala Option methods

I have read the blog post recommended to me here. Now I wonder what some of those methods are useful for. Can you show examples of using forall (as opposed to foreach) and toList of Option?
map: Allows you to transform a value "inside" an Option, as you probably already know for Lists. This operation makes Option a functor (you can say "endofunctor" if you want to scare your colleagues)
flatMap: Option is actually a monad, and flatMap makes it one (together with something like a constructor for a single value). This method can be used if you have a function which turns a value into an Option, but the value you have is already "wrapped" in an Option, so flatMap saves you the unwrapping before applying the function. E.g. if you have an Option[Map[K,V]], you can write mapOption.flatMap(_.get(key)). If you used a simple map here, you would get an Option[Option[V]], but with flatMap you get an Option[V] (see the sketch after this list). This method is cooler than you might think, as it allows you to chain functions together in a very flexible way (which is one reason why Haskell loves monads).
flatten: If you have a value of type Option[Option[T]], flatten turns it into an Option[T]. It is the same as flatMap(identity(_)).
orElse: If you have several alternatives wrapped in Options, and you want the first one that actually holds a value, you can chain these alternatives with orElse: steakOption.orElse(hamburgerOption).orElse(saladOption)
getOrElse: Get the value out of the Option, but specify a default value if it is empty, e.g. nameOption.getOrElse("unknown").
foreach: Do something with the value inside, if it exists.
isDefined, isEmpty: Determine if this Option holds a value.
forall, exists: Tests if a given predicate holds for the value. forall is the same as option.map(test(_)).getOrElse(true), exists is the same, just with false as default.
toList: Surprise, it converts the Option to a List.
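A hedged sketch pulling a few of these together (the values are made up):
val mapOption: Option[Map[String, Int]] = Some(Map("a" -> 1))
mapOption.map(_.get("a"))       // Some(Some(1)): an Option[Option[Int]]
mapOption.flatMap(_.get("a"))   // Some(1): an Option[Int]

val nameOption: Option[String] = None
nameOption.orElse(Some("Ann")).getOrElse("unknown")   // "Ann"
nameOption.toList                                     // List()
Some("Ann").toList                                    // List("Ann")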
Many of the methods on Option may be there more for the sake of uniformity (with collections) rather than for their usefulness, as they are all very small functions and so do not spare much effort, yet they serve a purpose, and their meanings are clear once you are familiar with the collection framework (as is often said, Option is like a list which cannot have more than one element).
forall checks a property of the value inside an option. If there is no value, the check passes. For example, if a car rental allows you one additionalDriver: Option[Person], you can do
additionalDriver.forall(_.hasDrivingLicense)
exactly the same thing that you would do if several additional drivers were allowed and you had a list.
toList may be a useful conversion. Suppose you have options: List[Option[T]], and you want to get a List[T] with the values of all of the options that are Some. You can do
for (option <- options; value <- option.toList) yield value
(or better options.flatMap(_.toList))
I have one practical example of toList method. You can find it in scaldi (my Scala dependency injection framework) in Module.scala at line 72:
https://github.com/OlegIlyenko/scaldi/blob/f3697ecaa5d6e96c5486db024efca2d3cdb04a65/src/main/scala/scaldi/Module.scala#L72
In this context the getBindings method can return either Nil or a List with only one element. I can retrieve it as an Option with discoverBinding. I find it convenient to be able to convert the Option to a List (that is either empty or has one element) with the toList method.