I understand map and flatten operations can be combined into flatMap, and filter and map into collect in Scala.
Is there any way I can combine zip/zipWithIndex with a map operation?
There is no single operation in the standard library, as far as I know, but there is an extension method on various tuples, called zipped. This method returns an object which provides methods like map and flatMap, which would perform zipping in step with mapping:
(xs, ys).zipped.map((x, y) => x * y)
This object is also implicitly convertible to Traversable, so you can call more complex methods like mkString or foldLeft.
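Here is a version-neutral sketch of the same idea, assuming plain List inputs (`ZipMapDemo` and its method names are illustrative only). Zipping the iterators is lazy, so map consumes each pair as it is produced and no intermediate collection of pairs is built. In Scala 2.13 and later, `zipped` itself is deprecated in favour of `xs.lazyZip(ys)`, which plays the same role.

```scala
object ZipMapDemo {
  // Multiply corresponding elements without materialising a List of pairs:
  // Iterator.zip is lazy, so each pair goes straight into map.
  def pairwiseProduct(xs: List[Int], ys: List[Int]): List[Int] =
    xs.iterator.zip(ys.iterator).map { case (x, y) => x * y }.toList

  // The same idea for zipWithIndex: the index is paired lazily with each element.
  def withIndexSuffix(xs: List[String]): List[String] =
    xs.iterator.zipWithIndex.map { case (s, i) => s + i }.toList
}
```

As with zip, the result is as long as the shorter input.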
If, for some reason, you really wanted a combined version, you could write one yourself.
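import scala.collection.generic.CanBuildFrom // Scala 2.12 and earlier

// Like any implicit class, this must be defined inside an object, class or trait.
implicit class SeqOps[A](s: Seq[A]) {
  // Maps each element together with its index in a single pass,
  // without first building the intermediate Seq[(A, Int)] that zipWithIndex would.
  def zipWithIndex2[A1 >: A, B >: Int, That](f: (A, Int) => (A1, B))(implicit bf: CanBuildFrom[Seq[A], (A1, B), That]): That = {
    val b = bf(s)
    var i = 0
    for (x <- s) {
      b += f(x, i)
      i += 1
    }
    b.result()
  }
}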
Call it like:
s.zipWithIndex2 {
case (a, b) => (a + "2", b + 2)
}
I would really think twice about this, though, and would most likely go with one of the other approaches that have been suggested.
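If you only need this for one concrete collection type, a hand-rolled version does not need CanBuildFrom at all (the Scala 2.13 collections also replaced CanBuildFrom with BuildFrom and Factory). Here is a minimal sketch for List. `mapWithIndex` is a hypothetical name, not a standard library method.

```scala
import scala.collection.mutable.ListBuffer

object MapWithIndexDemo {
  // Hypothetical helper: maps over a List while tracking each element's index,
  // in one pass and without an intermediate List of (element, index) pairs.
  def mapWithIndex[A, B](xs: List[A])(f: (A, Int) => B): List[B] = {
    val buf = ListBuffer.empty[B]
    var i = 0
    for (x <- xs) {
      buf += f(x, i)
      i += 1
    }
    buf.toList
  }
}
```

For example, `MapWithIndexDemo.mapWithIndex(List("a", "b"))((s, i) => s + i)` gives `List("a0", "b1")`.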
I'm trying to write a generic invert method that takes a Map from keys of type A to values that are collections of type B and converts it to a Map with keys of type B and collections of A using the same original collection type.
My goal is to make this method a member of a MyMap[A,B] class that offers extensions of the basic library methods, where Maps are implicitly converted to MyMaps. I am able to do this implicit conversion for a generic map, but I want to further specify that the invert method should only work in the case where B is a collection.
I lack the understanding of Scala's collections framework to accomplish this. I've scoured the net for thorough introductory explanations of the signatures that look like a hodgepodge of Repr, CC, That, and CanBuildFrom, but I don't understand how all these pieces fit together well enough to construct the method signature on my own. Please don't just give me the working signature for this case: I want to understand how the signatures of methods that use generic collections work in a broader sense, so that I can do this independently going forward. Alternatively, feel free to reference an online resource that elaborates on this; I was unable to find one that was comprehensive and clear.
EDIT
I seem to have gotten it to work with the following code. If I did something wrong or you see something that can be improved, please comment or answer with a better alternative.
import scala.collection.generic.CanBuildFrom
import scala.collection.mutable.Builder

class MyMap[A, B](val _map: Map[A, B]) {
  def invert[E, CC[E]](
    implicit ev1: B =:= CC[E],
    ev2: CC[E] <:< TraversableOnce[E],
    cbf: CanBuildFrom[CC[A], A, CC[A]]
  ): Map[E, CC[A]] = {
    // One builder per distinct value, collecting the keys that map to it
    val inverted = scala.collection.mutable.Map.empty[E, Builder[A, CC[A]]]
    for {
      (key, values) <- _map
      value <- values.asInstanceOf[CC[E]]
    } {
      if (!inverted.contains(value)) {
        inverted += (value -> cbf())
      }
      inverted.get(value).foreach(_ += key)
    }
    return inverted.map({ case (k,v) => (k -> v.result) }).toMap
  }
}
I started from your code and ended up with this:
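import scala.collection.generic.CanBuildFrom
import scala.collection.mutable.Builder

// Like any implicit class, this must be defined inside an object, class or trait.
implicit class MyMap[A, B, C[B] <: Traversable[B]](val _map: Map[A, C[B]]) {
  def invert(implicit cbf: CanBuildFrom[C[A], A, C[A]]): Map[B, C[A]] = {
    // One builder per distinct value, accumulating every key whose collection contains it
    val inverted = scala.collection.mutable.Map.empty[B, Builder[A, C[A]]]
    for ((k, vs) <- _map; v <- vs) {
      inverted.getOrElseUpdate(v, cbf()) += k
    }
    inverted.map({ case (k, v) => (k -> v.result)}).toMap
  }
}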
val map = Map("a"-> List(1,2,3), "b" -> List(1,2))
println(map.invert) //Map(2 -> List(a, b), 1 -> List(a, b), 3 -> List(a))
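For comparison, if you are happy to fix the value type to List, you can skip the builder machinery entirely. This hedged sketch (the `InvertDemo` name is illustrative) gives up the generic collection type the question asked for. The order of keys inside each resulting list depends on the map's iteration order.

```scala
object InvertDemo {
  // Invert a Map[A, List[B]] into a Map[B, List[A]], keeping List as the value type.
  def invert[A, B](m: Map[A, List[B]]): Map[B, List[A]] =
    m.foldLeft(Map.empty[B, List[A]]) { case (acc, (k, vs)) =>
      vs.foldLeft(acc) { (inner, v) =>
        // Prepend k to whatever keys are already recorded for v.
        inner.updated(v, k :: inner.getOrElse(v, Nil))
      }
    }
}
```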
A couple of weeks ago Dragisa Krsmanovic asked a question here about how to use the free monad in Scalaz 7 to avoid stack overflows in this situation (I've adapted his code a bit):
import scalaz._, Scalaz._
def setS(i: Int): State[List[Int], Unit] = modify(i :: _)
val s = (1 to 100000).foldLeft(state[List[Int], Unit](())) {
case (st, i) => st.flatMap(_ => setS(i))
}
s(Nil)
I thought that just lifting a trampoline into StateT should work:
import Free.Trampoline
val s = (1 to 100000).foldLeft(state[List[Int], Unit](()).lift[Trampoline]) {
case (st, i) => st.flatMap(_ => setS(i).lift[Trampoline])
}
s(Nil).run
But it still blows the stack, so I just posted it as a comment.
Dave Stevens just pointed out that sequencing with the applicative *> instead of the monadic flatMap actually works just fine:
val s = (1 to 100000).foldLeft(state[List[Int], Unit](()).lift[Trampoline]) {
case (st, i) => st *> setS(i).lift[Trampoline]
}
s(Nil).run
(Well, it's super slow of course, because that's the price you pay for doing anything interesting like this in Scala, but at least there's no stack overflow.)
What's going on here? I don't think there could be a principled reason for this difference, but really I have no idea what could be going on in the implementation and don't have time to dig around at the moment. But I'm curious and it would be cool if someone else knows.
Mandubian is correct: the flatMap of StateT doesn't let you bypass stack accumulation, because it creates the new StateT immediately before calling the wrapped monad's bind (which would be a Free[Function0] in your case).
So Trampoline can't help, but the Free Monad over the functor for State is one way to ensure stack safety.
We want to go from State[List[Int], Unit] to Free[({ type l[a] = State[List[Int], a] })#l, Unit], and our flatMap call will be to Free's flatMap (which doesn't do anything other than build the Free data structure).
val s = (1 to 100000).foldLeft(
Free.liftF[({ type l[a] = State[List[Int],a]})#l,Unit](state[List[Int], Unit](()))) {
case (st, i) => st.flatMap(_ =>
Free.liftF[({ type l[a] = State[List[Int],a]})#l,Unit](setS(i)))
}
Now we have a Free data structure built, and we can easily thread a state through it as follows:
s.foldRun(List[Int]())( (a,b) => b(a) )
Calling liftF is fairly ugly, so I have a PR in to make it easier for the State and Kleisli monads; hopefully in the future there won't be a need for type lambdas.
Edit: the PR was accepted, so now we have:
val s = (1 to 100000).foldLeft(state[List[Int], Unit](()).liftF) {
case (st, i) => st.flatMap(_ => setS(i).liftF)
}
There is a principled intuition for this difference.
The applicative operator *> evaluates its left argument only for its side effects, and always ignores the result. This is similar (in some cases equivalent) to Haskell's >> function for monads. Here's the source for *>:
/** Combine `self` and `fb` according to `Apply[F]` with a function that discards the `A`s */
final def *>[B](fb: F[B]): F[B] = F.apply2(self,fb)((_,b) => b)
and Apply#apply2:
def apply2[A, B, C](fa: => F[A], fb: => F[B])(f: (A, B) => C): F[C] =
ap(fb)(map(fa)(f.curried))
In general, flatMap depends on the result of the left argument (it must, as it is the input for the function in the right argument). Even though in this specific case you are ignoring the left result, flatMap doesn't know that.
It seems likely, given your results, that the implementation of *> is optimized for the case where the result of the left argument is unneeded. However, flatMap cannot perform this optimization, so each call grows the stack by retaining the unused left result.
It's possible that this could be optimized at the compiler (scalac) or JIT (HotSpot) level (Haskell's GHC certainly performs this optimization), but for now this seems like a missed optimization opportunity.
Just to add to the discussion...
In StateT, you have:
def flatMap[S3, B](f: A => IndexedStateT[F, S2, S3, B])(implicit F: Bind[F]): IndexedStateT[F, S1, S3, B] =
IndexedStateT(s => F.bind(apply(s)) {
case (s1, a) => f(a)(s1)
})
The apply(s) fixes the current state reference in the next state.
The bind definition evaluates its parameters eagerly, capturing the reference, because it needs it:
def bind[A, B](fa: F[A])(f: A => F[B]): F[B]
This is unlike ap, whose by-name parameters mean it might not need to evaluate one of them:
def ap[A, B](fa: => F[A])(f: => F[A => B]): F[B]
Given this code, Trampoline can't help with StateT's flatMap (or map).
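Here is a hedged sketch of the eager-evaluation point without Scalaz. It is built on the standard library's `scala.util.control.TailCalls` rather than Scalaz's `Trampoline`, and `TState`, `unit`, `setS` and `finalState` are hypothetical names for illustration. The key difference from the StateT flatMap quoted above is that `run(s)` and the continuation are both wrapped in `tailcall`, so nothing runs until `.result` drives the trampoline. With that change, the same left-nested chain of 100,000 flatMaps completes without a stack overflow.

```scala
import scala.util.control.TailCalls._

// Hypothetical minimal State monad (not Scalaz's StateT): each transition returns
// a TailRec, so a long chain of flatMaps is driven by the TailCalls trampoline
// instead of by nested JVM stack frames.
final case class TState[S, A](run: S => TailRec[(S, A)]) {
  def flatMap[B](f: A => TState[S, B]): TState[S, B] =
    // run(s) is suspended with tailcall instead of being called eagerly,
    // and so is the continuation that feeds the new state into f(a).
    TState[S, B](s => tailcall(run(s)).flatMap { case (s1, a) => tailcall(f(a).run(s1)) })

  def map[B](f: A => B): TState[S, B] =
    flatMap(a => TState[S, B](s => done((s, f(a)))))
}

object TStateDemo {
  def unit[S]: TState[S, Unit] = TState[S, Unit](s => done((s, ())))

  // Same role as setS in the question: push i onto the state list.
  def setS(i: Int): TState[List[Int], Unit] = TState[List[Int], Unit](s => done((i :: s, ())))

  // Left-nested chain of n flatMaps, mirroring the foldLeft in the question.
  def build(n: Int): TState[List[Int], Unit] =
    (1 to n).foldLeft(unit[List[Int]]) { (st, i) => st.flatMap(_ => setS(i)) }

  // Run the whole chain from an empty list and return the final state.
  def finalState(n: Int): List[Int] = build(n).run(Nil).result._1
}
```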
The Either class seems useful and the ways of using it are pretty obvious. But then I look at the API documentation and I'm baffled:
def joinLeft[A1 >: A, B1 >: B, C](implicit ev: <:<[A1, Either[C, B1]]): Either[C, B1]
Joins an Either through Left.
def joinRight[A1 >: A, B1 >: B, C](implicit ev: <:<[B1, Either[A1, C]]): Either[A1, C]
Joins an Either through Right.
def left: LeftProjection[A, B]
Projects this Either as a Left.
def right: RightProjection[A, B]
Projects this Either as a Right.
What do I do with a projection and how do I even invoke the joins?
Google just points me to the API documentation.
This might just be a case of "paying no attention to the man behind the curtain", but I don't think so. I think this is important.
left and right are the important ones. Either is useful without projections (mostly you do pattern matching), but projections are quite worthy of attention, as they give a much richer API. You will use joins much less.
Either is often used to mean "a proper value or an error". In this respect, it is like an extended Option. When there is no data, instead of None, you have an error.
Option has a rich API. The same can be made available on Either, provided we know, in Either, which one is the result and which one is the error.
The left and right projections say just that: each is the Either, plus the added knowledge that the value is on the left or on the right, respectively, and that the other side is the error.
For instance, in Option you can map, so opt.map(f) returns an Option with f applied to the value of opt if it has one, and still None if opt was None. On a left projection, map applies f to the value on the left if it is a Left, and leaves it unchanged if it is a Right. Observe the signatures:
In LeftProjection[A,B], map[C](f: A => C): Either[C,B]
In RightProjection[A,B], map[C](f: B => C): Either[A,C].
left and right are simply the way to say which side is considered the value when you want to use one of the usual API routines.
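Here is a small sketch of the two sides, assuming Scala 2.12 or later. In those versions Either itself is right-biased, so `e.map` does what `e.right.map` used to do. The object and method names are illustrative.

```scala
object ProjectionDemo {
  // left.map transforms only a Left value (here, the error); a Right passes through unchanged.
  def errorLength(e: Either[String, Int]): Either[Int, Int] = e.left.map(_.length)

  // Since Scala 2.12 Either is right-biased, so e.map behaves like the old e.right.map:
  // it transforms only a Right value and leaves a Left unchanged.
  def doubled(e: Either[String, Int]): Either[String, Int] = e.map(_ * 2)
}
```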
Alternatives could have been:
set a convention, as in Haskell, where there were strong syntactical reasons to put the value on the right. When you want to apply a method to the other side (you may well want to change the error with a map, for instance), do a swap before and after.
suffix method names with Left or Right (maybe just L and R). That would prevent using for comprehensions. With for comprehensions (flatMap, in fact, but the for notation is quite convenient), Either is an alternative to (checked) exceptions.
Now the joins. Left and Right mean the same thing as for the projections, and the joins are closely related to flatMap. Consider joinLeft. The signature may be puzzling:
joinLeft[A1 >: A, B1 >: B, C](implicit ev: <:<[A1, Either[C, B1]]): Either[C, B1]
A1 and B1 are technically necessary but not critical to understanding, so let's simplify:
joinLeft[C](implicit ev: <:<[A, Either[C, B]]): Either[C, B]
What the implicit means is that the method can only be called if A is an Either[C, B]. The method is not available on an Either[A, B] in general, but only on an Either[Either[C, B], B]. As with the left projection, we consider that the value is on the left (it would be on the right for joinRight). What the join does is flatten this (think flatMap). When joining, we do not care whether the error (B) is inside or outside; we just want an Either[C, B]. So Left(Left(c)) yields Left(c), and both Left(Right(b)) and Right(b) yield Right(b). The relation with flatMap is as follows:
e.joinLeft = e.left.flatMap(identity)
e.left.flatMap(f) = e.left.map(f).joinLeft
The Option equivalent would work on an Option[Option[A]]: Some(Some(x)) would yield Some(x), and both Some(None) and None would yield None. It can be written o.flatMap(identity). Note that Option[A] is isomorphic to Either[A, Unit] (if you use left projections and joins) and also to Either[Unit, A] (using right projections).
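You can check the joins and their flatMap and Option equivalents side by side. Here is a minimal sketch, with illustrative names, assuming Scala 2.12 or later.

```scala
object JoinDemo {
  type Nested = Either[Either[String, Int], Int]

  // joinLeft flattens a Left that itself holds an Either.
  def viaJoin(e: Nested): Either[String, Int] = e.joinLeft

  // The same result through the left projection, i.e. e.left.flatMap(identity).
  def viaFlatMap(e: Nested): Either[String, Int] = e.left.flatMap(inner => inner)

  // The Option analogue: flatten an Option[Option[A]].
  def flattenOpt[A](o: Option[Option[A]]): Option[A] = o.flatMap(identity)
}
```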
Ignoring the joins for now, projections are a mechanism that allows you to use an Either as a monad. Think of it as extracting either the left or the right side into an Option, but without losing the other side.
As always, this probably makes more sense with an example. Imagine you have an Either[Exception, Int] and want to convert the Exception to a String (if present):
val result = opReturningEither
val better = result.left map {_.getMessage}
This will map over the left side of result, giving you an Either[String, Int].
joinLeft and joinRight enable you to "flatten" a nested Either:
scala> val e: Either[Either[String, Int], Int] = Left(Left("foo"))
e: Either[Either[String,Int],Int] = Left(Left(foo))
scala> e.joinLeft
res2: Either[String,Int] = Left(foo)
Edit: My answer to this question shows one example of how you can use the projections, in this case to fold together a sequence of Eithers without pattern matching or calling isLeft or isRight. If you're familiar with how to use Option without matching or calling isDefined, it's analogous.
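Here is a minimal sketch of folding a sequence of Eithers with map and flatMap alone. It assumes Scala 2.12 or later, where Either's own map and flatMap are right-biased. `sequence` is an illustrative name, not a standard library method, and this is not the code from the linked answer.

```scala
object SequenceDemo {
  // Combine a list of Eithers into one: the first Left wins, otherwise collect all Rights.
  // No pattern matching and no isLeft/isRight: the for comprehension desugars to flatMap/map.
  def sequence[E, A](es: List[Either[E, A]]): Either[E, List[A]] =
    es.foldRight[Either[E, List[A]]](Right(Nil)) { (e, acc) =>
      for (a <- e; rest <- acc) yield a :: rest
    }
}
```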
While looking at the current source of Either out of curiosity, I saw that joinLeft and joinRight are implemented with pattern matching. However, I stumbled across this older version of the source and saw that it used to implement the join methods using projections:
def joinLeft[A, B](es: Either[Either[A, B], B]) =
es.left.flatMap(x => x)
My suggestion is to add the following to your utility package:
implicit class EitherRichClass[A, B](thisEither: Either[A, B])
{
def map[C](f: B => C): Either[A, C] = thisEither match
{
case Left(l) => Left[A, C](l)
case Right(r) => Right[A, C](f(r))
}
def flatMap[C](f: B => Either[A, C]): Either[A, C] = thisEither match
{
case Left(l) => Left[A, C](l)
case Right(r) => (f(r))
}
}
In my experience, the only useful provided method is fold. You don't really use isLeft or isRight in functional code. joinLeft and joinRight might be useful as flatten functions, as explained by Didier Dupont, but I haven't had occasion to use them that way. The above uses Either as right-biased, which I suspect is how most people use it. It's like an Option with an error value instead of None.
Here's some of my own code. Apologies, it's not polished, but it's an example of using Either in a for comprehension. Adding the map and flatMap methods to Either allows us to use the special syntax of for comprehensions. It parses HTTP headers, returning either an HTTP and HTML error-page response or a parsed custom HTTP request object. Without the for comprehension, the code would be very difficult to follow.
object getReq
{
def LeftError[B](str: String) = Left[HResponse, B](HttpError(str))
def apply(line1: String, in: java.io.BufferedReader): Either[HResponse, HttpReq] =
{
def loop(acc: Seq[(String, String)]): Either[HResponse, Seq[(String, String)]] =
{
val ln = in.readLine
if (ln == "")
Right(acc)
else
ln.splitOut(':', s => LeftError("400 Bad Syntax in Header Field"), (a, b) => loop(acc :+ Tuple2(a.toLowerCase, b)))
}
val words: Seq[String] = line1.lowerWords
for
{
a3 <- words match
{
case Seq("get", b, c) => Right[HResponse, (ReqType.Value, String, String)]((ReqType.HGet, b, c))
case Seq("post", b, c) => Right[HResponse, (ReqType.Value, String, String)]((ReqType.HPost, b, c))
case Seq(methodName, b, c) => LeftError("405" -- methodName -- "method not Allowed")
case _ => LeftError("400 Bad Request: Bad Syntax in Status Line")
}
(reqType, target, version) = a3
fields <- loop(Nil)
optLen = fields.find(_._1 == "content-length")
pair <- optLen match
{
case None => Right((0, fields))
case Some(("content-length", second)) => second.filterNot(_.isWhitespace) match
{
case s if s.forall(_.isDigit) => Right((s.toInt, fields.filterNot(_._1 == "content-length")))
case s => LeftError("400 Bad Request: Bad Content-Length SyntaxLine")
}
}
(bodyLen, otherHeaderPairs) = pair
otherHeaderFields = otherHeaderPairs.map(pair => HeaderField(pair._1, pair._2))
body = if (bodyLen > 0) (for (i <- 1 to bodyLen) yield in.read.toChar).mkString else ""
}
yield (HttpReq(reqType, target, version, otherHeaderFields, bodyLen, body))
}
}
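The helpers used above (splitOut, lowerWords, HResponse and so on) are not shown. Here is a much smaller, self-contained sketch of the same pattern instead. It assumes Scala 2.12 or later, where Either has map and flatMap built in, so the EitherRichClass wrapper is no longer needed. `ReqLine`, `splitLine`, `checkMethod` and `parseLine` are hypothetical names.

```scala
object ParseDemo {
  // Hypothetical stand-in for the answer's HttpReq; only the status line is parsed here.
  final case class ReqLine(method: String, target: String, version: String)

  // Split "METHOD target version" into three words, or fail with an error message.
  def splitLine(line: String): Either[String, (String, String, String)] =
    line.trim.split("\\s+").toList match {
      case List(m, t, v) => Right((m, t, v))
      case _             => Left("400 Bad Request: bad status line")
    }

  // Accept only GET and POST, normalised to upper case.
  def checkMethod(m: String): Either[String, String] =
    m.toLowerCase match {
      case "get" | "post" => Right(m.toUpperCase)
      case other          => Left(s"405 $other method not allowed")
    }

  // Each step runs only if the previous one produced a Right; the first Left short-circuits.
  def parseLine(line: String): Either[String, ReqLine] =
    for {
      words  <- splitLine(line)
      method <- checkMethod(words._1)
    } yield ReqLine(method, words._2, words._3)
}
```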
I recently answered a question with an attempt at writing a quicksort function in Scala; I'd seen something like the code below written somewhere.
def qsort(l: List[Int]): List[Int] = {
l match {
case Nil => Nil
case pivot::tail => qsort(tail.filter(_ < pivot)) ::: pivot :: qsort(tail.filter(_ >= pivot))
}
}
My answer received some constructive criticism pointing out, first, that List was a poor choice of collection for quicksort and, second, that the above wasn't tail recursive.
I tried to rewrite the above in a tail-recursive manner but didn't have much luck. Is it possible to write a tail-recursive quicksort? Or, if not, how can it be done in a functional style? Also, what can be done to maximize the efficiency of the implementation?
A few years back, I spent some time trying to optimize functional quicksort as far as I could. The following is what I came up with for vanilla List[A]:
def qsort[A : Ordering](ls: List[A]) = {
import Ordered._
def sort(ls: List[A])(parent: List[A]): List[A] = {
if (ls.size <= 1) ls ::: parent else {
val pivot = ls.head
val (less, equal, greater) = ls.foldLeft((List[A](), List[A](), List[A]())) {
case ((less, equal, greater), e) => {
if (e < pivot)
(e :: less, equal, greater)
else if (e == pivot)
(less, e :: equal, greater)
else
(less, equal, e :: greater)
}
}
sort(less)(equal ::: sort(greater)(parent))
}
}
sort(ls)(Nil)
}
I was able to do even better with a custom List structure. This custom structure basically tracked the ideal (or nearly ideal) pivot point for the list. Thus, I could obtain a far better pivot point in constant time, simply by accessing this special list property. In practice, this did quite a bit better than the standard functional approach of choosing the head.
As it is, the above is still pretty snappy. It's "half" tail recursive (you can't do a tail-recursive quicksort without getting really ugly). More importantly, it rebuilds from the tail end first, so that results in substantially fewer intermediate lists than the conventional approach.
It's important to note that this is not the most elegant or most idiomatic way to do quicksort in Scala, it just happens to work very well. You will probably have more success writing merge sort, which is usually a much faster algorithm when implemented in functional languages (not to mention much cleaner).
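For comparison, here is one common functional formulation of merge sort on List. It is a hedged sketch, not code from this answer. The merge step carries an accumulator so that it is tail recursive, and the splitting recursion is only O(log n) deep.

```scala
object MergeSortDemo {
  // Merge two already-sorted lists; the accumulator keeps the recursion a tail call.
  @annotation.tailrec
  private def merge[A](xs: List[A], ys: List[A], acc: List[A])(implicit ord: Ordering[A]): List[A] =
    (xs, ys) match {
      case (Nil, _) => acc.reverse ::: ys
      case (_, Nil) => acc.reverse ::: xs
      case (x :: xt, y :: yt) =>
        // lteq takes from the left list on ties, which keeps the sort stable
        if (ord.lteq(x, y)) merge(xt, ys, x :: acc) else merge(xs, yt, y :: acc)
    }

  def mergeSort[A](xs: List[A])(implicit ord: Ordering[A]): List[A] =
    if (xs.lengthCompare(1) <= 0) xs
    else {
      val (left, right) = xs.splitAt(xs.length / 2)
      merge(mergeSort(left), mergeSort(right), Nil)
    }
}
```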
I guess it depends on what you mean by "idiomatic". The main advantage of quicksort is that it is a very fast in-place sorting algorithm. So, if you can't sort in place, you lose all its advantages, but you're still stuck with its disadvantages.
So, here's some code I wrote for Rosetta Code on this very subject. It still doesn't sort in-place, but, on the other hand, it sorts any of the new collections:
import scala.collection.TraversableLike
import scala.collection.generic.CanBuildFrom
def quicksort
[T, CC[X] <: Traversable[X] with TraversableLike[X, CC[X]]] // My type parameters -- which are expected to be inferred
(coll: CC[T]) // My explicit parameter -- the one users will actually see
(implicit ord: Ordering[T], cbf: CanBuildFrom[CC[T], T, CC[T]]) // My implicit parameters -- which will hopefully be implicitly available
: CC[T] = // My return type -- which is the very same type of the collection received
if (coll.isEmpty) {
coll
} else {
val (smaller, bigger) = coll.tail partition (ord.lt(_, coll.head))
quicksort(smaller) ++ coll.companion(coll.head) ++ quicksort(bigger)
}
As it happens, I tried to solve this exact same problem recently. I wanted to convert the classic algorithm (i.e. the one that does in-place sorting) to tail-recursive form.
If you are still interested you may see my recommended solution here:
Quicksort rewritten in tail-recursive form - An example in Scala
The article also contains the steps I followed to convert the initial implementation to tail recursive form.
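The article's code is not reproduced here. One common way to make a functional quicksort tail recursive is to replace the recursive calls with an explicit stack of pending work, and the sketch below illustrates that. The technique is an assumption, not necessarily the one the article uses, and `Work`, `Sort` and `Emit` are hypothetical names. The sketch does not sort in place, and it still degrades to quadratic time on already-sorted input because it uses the head as the pivot.

```scala
object TailRecQuicksort {
  // Pending work: a sublist still to be sorted, or a pivot already in its final place.
  private sealed trait Work[A]
  private final case class Sort[A](xs: List[A]) extends Work[A]
  private final case class Emit[A](x: A) extends Work[A]

  def quicksort[A](xs: List[A])(implicit ord: Ordering[A]): List[A] = {
    // The output is built by prepending, so the largest elements must be emitted first.
    @annotation.tailrec
    def loop(stack: List[Work[A]], out: List[A]): List[A] = stack match {
      case Nil                     => out
      case Emit(x) :: rest         => loop(rest, x :: out)
      case Sort(Nil) :: rest       => loop(rest, out)
      case Sort(p :: tail) :: rest =>
        val (smaller, larger) = tail.partition(ord.lt(_, p))
        // Handle the larger part first, then the pivot, then the smaller part.
        loop(Sort(larger) :: Emit(p) :: Sort(smaller) :: rest, out)
    }
    loop(List(Sort(xs)), Nil)
  }
}
```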
I did some experiments trying to write Quicksort in a purely functional style. Here is what I got (Quicksort.scala):
def quicksort[A <% Ordered[A]](list: List[A]): List[A] = {
def sort(t: (List[A], A, List[A])): List[A] = t match {
case (Nil, p, Nil) => List(p)
case (l, p, g) => partitionAndSort(l) ::: (p :: partitionAndSort(g))
}
def partition(as: List[A]): (List[A], A, List[A]) = {
def loop(p: A, as: List[A], l: List[A], g: List[A]): (List[A], A, List[A]) =
as match {
case h :: t => if (h < p) loop(p, t, h :: l, g) else loop(p, t, l, h :: g)
case Nil => (l, p, g)
}
loop(as.head, as.tail, Nil, Nil)
}
def partitionAndSort(as: List[A]): List[A] =
if (as.isEmpty) Nil
else sort(partition(as))
partitionAndSort(list)
}
My solution in Scala 3:
import scala.util.Random

val randomArray: Array[Int] = (for (_ <- 1 to 1000) yield Random.nextInt(1000)).toArray

def quickSort(inputArray: Array[Int]): Array[Int] =
  inputArray.length match
    case 0 | 1 => inputArray
    case _ =>
      // Use the middle element as the pivot
      val pivot = inputArray(inputArray.length / 2)
      Array.concat(
        quickSort(inputArray.filter(_ < pivot)), // elements smaller than the pivot
        inputArray.filter(_ == pivot),           // every copy of the pivot
        quickSort(inputArray.filter(_ > pivot))  // elements larger than the pivot
      )

print(quickSort(randomArray).mkString("Sorted array: (", ", ", ")"))