Using Scalaz Stream for parsing task (replacing Scalaz Iteratees) - scala

Introduction
I use Scalaz 7's iteratees in a number of projects, primarily for processing large-ish files. I'd like to start switching to Scalaz streams, which are designed to replace the iteratee package (which frankly is missing a lot of pieces and is kind of a pain to use).
Streams are based on machines (another variation on the iteratee idea), which have also been implemented in Haskell. I've used the Haskell machines library a bit, but the relationship between machines and streams isn't completely obvious (to me, at least), and the documentation for the streams library is still a little sparse.
This question is about a simple parsing task that I'd like to see implemented using streams instead of iteratees. I'll answer the question myself if nobody else beats me to it, but I'm sure I'm not the only one who's making (or at least considering) this transition, and since I need to work through this exercise anyway, I figured I might as well do it in public.
Task
Supposed I've got a file containing sentences that have been tokenized and tagged with parts of speech:
no UH
, ,
it PRP
was VBD
n't RB
monday NNP
. .
the DT
equity NN
market NN
was VBD
illiquid JJ
. .
There's one token per line, words and parts of speech are separated by a single space, and blank lines represent sentence boundaries. I want to parse this file and return a list of sentences, which we might as well represent as lists of tuples of strings:
List((no,UH), (,,,), (it,PRP), (was,VBD), (n't,RB), (monday,NNP), (.,.))
List((the,DT), (equity,NN), (market,NN), (was,VBD), (illiquid,JJ), (.,.)
As usual, we want to fail gracefully if we hit invalid input or file reading exceptions, we don't want to have to worry about closing resources manually, etc.
An iteratee solution
First for some general file reading stuff (that really ought to be part of the iteratee package, which currently doesn't provide anything remotely this high-level):
import java.io.{ BufferedReader, File, FileReader }
import scalaz._, Scalaz._, effect.IO
import iteratee.{ Iteratee => I, _ }
type ErrorOr[A] = EitherT[IO, Throwable, A]
def tryIO[A, B](action: IO[B]) = I.iterateeT[A, ErrorOr, B](
EitherT(action.catchLeft).map(I.sdone(_, I.emptyInput))
)
def enumBuffered(r: => BufferedReader) = new EnumeratorT[String, ErrorOr] {
lazy val reader = r
def apply[A] = (s: StepT[String, ErrorOr, A]) => s.mapCont(k =>
tryIO(IO(Option(reader.readLine))).flatMap {
case None => s.pointI
case Some(line) => k(I.elInput(line)) >>== apply[A]
}
)
}
def enumFile(f: File) = new EnumeratorT[String, ErrorOr] {
def apply[A] = (s: StepT[String, ErrorOr, A]) => tryIO(
IO(new BufferedReader(new FileReader(f)))
).flatMap(reader => I.iterateeT[String, ErrorOr, A](
EitherT(
enumBuffered(reader).apply(s).value.run.ensuring(IO(reader.close()))
)
))
}
And then our sentence reader:
def sentence: IterateeT[String, ErrorOr, List[(String, String)]] = {
import I._
def loop(acc: List[(String, String)])(s: Input[String]):
IterateeT[String, ErrorOr, List[(String, String)]] = s(
el = _.trim.split(" ") match {
case Array(form, pos) => cont(loop(acc :+ (form, pos)))
case Array("") => cont(done(acc, _))
case pieces =>
val throwable: Throwable = new Exception(
"Invalid line: %s!".format(pieces.mkString(" "))
)
val error: ErrorOr[List[(String, String)]] = EitherT.left(
throwable.point[IO]
)
IterateeT.IterateeTMonadTrans[String].liftM(error)
},
empty = cont(loop(acc)),
eof = done(acc, eofInput)
)
cont(loop(Nil))
}
And finally our parsing action:
val action =
I.consume[List[(String, String)], ErrorOr, List] %=
sentence.sequenceI &=
enumFile(new File("example.txt"))
We can demonstrate that it works:
scala> action.run.run.unsafePerformIO().foreach(_.foreach(println))
List((no,UH), (,,,), (it,PRP), (was,VBD), (n't,RB), (monday,NNP), (.,.))
List((the,DT), (equity,NN), (market,NN), (was,VBD), (illiquid,JJ), (.,.))
And we're done.
What I want
More or less the same program implemented using Scalaz streams instead of iteratees.

A scalaz-stream solution:
import scalaz.std.vector._
import scalaz.syntax.traverse._
import scalaz.std.string._
val action = linesR("example.txt").map(_.trim).
splitOn("").flatMap(_.traverseU { s => s.split(" ") match {
case Array(form, pos) => emit(form -> pos)
case _ => fail(new Exception(s"Invalid input $s"))
}})
We can demonstrate that it works:
scala> action.collect.attempt.run.foreach(_.foreach(println))
Vector((no,UH), (,,,), (it,PRP), (was,VBD), (n't,RB), (monday,NNP), (.,.))
Vector((the,DT), (equity,NN), (market,NN), (was,VBD), (illiquid,JJ), (.,.))
And we're done.
The traverseU function is a common Scalaz combinator. In this case it's being used to traverse, in the Process monad, the sentence Vector generated by splitOn. It's equivalent to map followed by sequence.

Related

Scala: Apply same function to 2 lists in one call

let say I have
val list: List[(Int, String)] = List((1,"test"),(2,"test2"),(3,"sample"))
I need to partition this list in two, based on (Int, String) value. So far, so good.
For example it can be
def isValid(elem: (Int, String)) = elem._1 < 3 && elem._2.startsWith("test")
val (good, bad) = list.partition(isValid)
So, now I had 2 lists with signatures List[(Int, String)], but I need only Int part(some id). Off course I can write some function
def ids(list:List(Int, String)) = list.map(_._1)
and call it on both lists
val (ok, wrong) = (ids(good), ids(bad))
it worked, but looks little bit boilerplate. I prefer something like
val (good, bad) = list.partition(isValid).map(ids)
But it obviously not possible. So is there "Nicer" way to do what I need?
I understand that it's not so bad, but feel that there exist some functional pattern or general solution for such cases and I want to know it:) Thanks!
P.S. Thanks for all! Finally it's transformed to
private def handleGames(games:List[String], lastId:Int) = {
val (ok, wrong) = games.foldLeft(
(List.empty[Int], List.empty[Int])){
(a, b) => b match {
case gameRegex(d,w,e) => {
if(filterGame((d, w, e), lastId)) (d.toInt :: a._1, a._2)
else (a._1, d.toInt :: a._2 )
}
case _ => log.debug(s"not handled game template is: $b"); a
}
}
log.debug(s"not handled game ids are: ${wrong.mkString(",")}")
ok
}
You're looking for a foldLeft on the List:
myList.foldLeft((List.empty[Int], List.empty[Int])){
case ((good, bad), (id, value)) if predicate(id, value) => (id :: good, bad)
case ((good, bad), (id, _)) => (good, id :: bad)
}
This way you're operating at every stage doing both a transform and an accumulate. The returned type will be (List[Int], List[Int]) assuming predicate is the function which chooses between "good" and "bad" states. The cast of the Nil is due to the aggressive nature of Scala for choosing the most restrictive type on a foldl.
An additional approach using Cats can be used with Tuple2K and Foldables foldMap. Note this requires help from the kind-projector compiler plugin
import cats.implicits._
import cats.Foldable
import cats.data.Tuple2K
val listTuple = Tuple2K(list, otherList)
val (good, bad) = Foldable[Tuple2K[List, List, ?]].foldMap(listTuple)(f =>
if (isValid(f)) (List(f), List.empty) else (List.empty, List(f)))

How to create an Iteratee that passes through values to an inner Iteratee unless a specific value is found

I've got an ADT that's essentially a cross between Option and Try:
sealed trait Result[+T]
case object Empty extends Result[Nothing]
case class Error(cause: Throwable) extends Result[Nothing]
case class Success[T](value: T) extends Result[T]
(assume common combinators like map, flatMap etc are defined on Result)
Given an Iteratee[A, Result[B] called inner, I want to create a new Iteratee[Result[A], Result[B]] with the following behavior:
If the input is a Success(a), feed a to inner
If the input is an Empty, no-op
If the input is an Error(err), I want inner to be completely ignored, instead returning a Done iteratee with the Error(err) as its result.
Example Behavior:
// inner: Iteratee[Int, Result[List[Int]]]
// inputs:
1
2
3
// output:
Success(List(1,2,3))
// wrapForResultInput(inner): Iteratee[Result[Int], Result[List[Int]]]
// inputs:
Success(1)
Success(2)
Error(Exception("uh oh"))
Success(3)
// output:
Error(Exception("uh oh"))
This sounds to me like the job for an Enumeratee, but I haven't been able to find anything in the docs that looks like it'll do what I want, and the internal implementations are still voodoo to me.
How can I implement wrapForResultInput to create the behavior described above?
Adding some more detail that won't really fit in a comment:
Yes it looks like I was mistaken in my question. I described it in terms of Iteratees but it seems I really am looking for Enumeratees.
At a certain point in the API I'm building, there's a Transformer[A] class that is essentially an Enumeratee[Event, Result[A]]. I'd like to allow clients to transform that object by providing an Enumeratee[Result[A], Result[B]], which would result in a Transformer[B] aka an Enumeratee[Event, Result[B]].
For a more complex example, suppose I have a Transformer[AorB] and want to turn that into a Transformer[(A, List[B])]:
// the Transformer[AorB] would give
a, b, a, b, b, b, a, a, b
// but the client wants to have
a -> List(b),
a -> List(b, b, b),
a -> Nil
a -> List(b)
The client could implement an Enumeratee[AorB, Result[(A, List[B])]] without too much trouble using Enumeratee.grouped, but they are required to provide an Enumeratee[Result[AorB], Result[(A, List[B])] which seems to introduce a lot of complication that I'd like to hide from them if possible.
val easyClientEnumeratee = Enumeratee.grouped[AorB]{
for {
_ <- Enumeratee.dropWhile(_ != a) ><> Iteratee.ignore
headResult <- Iteratee.head.map{ Result.fromOption }
bs <- Enumeratee.takeWhile(_ == b) ><> Iteratee.getChunks
} yield headResult.map{_ -> bs}
val harderEnumeratee = ??? ><> easyClientEnumeratee
val oldTransformer: Transformer[AorB] = ... // assume it already exists
val newTransformer: Transformer[(A, List[B])] = oldTransformer.andThen(harderEnumeratee)
So what I'm looking for is the ??? to define the harderEnumeratee in order to ease the burden on the user who already implemented easyClientEnumeratee.
I guess the ??? should be an Enumeratee[Result[AorB], AorB], but if I try something like
Enumeratee.collect[Result[AorB]] {
case Success(ab) => ab
case Error(err) => throw err
}
the error will actually be thrown; I actually want the error to come back out as an Error(err).
Simplest implementation of such would be Iteratee.fold2 method, that could collect elements until something is happened.
Since you return single result and can't really return anything until you verify there is no errors, Iteratee would be enough for such a task
def listResults[E] = Iteratee.fold2[Result[E], Either[Throwable, List[E]]](Right(Nil)) { (state, elem) =>
val Right(list) = state
val next = elem match {
case Empty => (Right(list), false)
case Success(x) => (Right(x :: list), false)
case Error(t) => (Left(t), true)
}
Future(next)
} map {
case Right(list) => Success(list.reverse)
case Left(th) => Error(th)
}
Now if we'll prepare little playground
import scala.concurrent.ExecutionContext.Implicits._
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
val good = Enumerator.enumerate[Result[Int]](
Seq(Success(1), Empty, Success(2), Success(3)))
val bad = Enumerator.enumerate[Result[Int]](
Seq(Success(1), Success(2), Error(new Exception("uh oh")), Success(3)))
def runRes[X](e: Enumerator[Result[X]]) : Result[List[X]] = Await.result(e.run(listResults), 3 seconds)
we can verify those results
runRes(good) //res0: Result[List[Int]] = Success(List(1, 2, 3))
runRes(bad) //res1: Result[List[Int]] = Error(java.lang.Exception: uh oh)

Why is headOption faster

I made a change to some code and it got 4.5x faster. I'm wondering why. It used to be essentially:
def doThing(queue: Queue[(String, String)]): Queue[(String, String)] = queue match {
case Queue((thing, stuff), _*) => doThing(queue.tail)
case _ => queue
}
and I changed it to this to get a huge speed boost:
def doThing(queue: Queue[(String, String)]): Queue[(String, String)] = queue.headOption match {
case Some((thing, stuff)) => doThing(queue.tail)
case _ => queue
}
What does _* do and why is it so expensive compared to headOption?
My guess after running scalac with -Xprint:all is that at the end of patmat in the queue match { case Queue((thing, stuff), _*) => doThing(queue.tail) } example I see the following methods being called (edited for brevity):
val o9 = scala.collection.immutable.Queue.unapplySeq[(String, String)](x1);
if (o9.isEmpty.unary_!)
if (o9.get.!=(null).&&(o9.get.lengthCompare(1).>=(0)))
{
val p2: (String, String) = o9.get.apply(0);
val p3: Seq[(String, String)] = o9.get.drop(1);
So lengthCompare compare the length of the collection in a possibly optimized way. For Queue, it creates an iterator and iterates one time. So that should be somewhat fast. On the other hand drop(1) also constructs an iterator, skips one element and adds the rest of the elements to the result queue, so that would be linear in the size of the collection.
The headOption example is more straightforward, it checks if the list is empty (two comparisons), and if not returns a Some(head), which then just has its _1 and _2 assigned to thing and stuff. So no iterators are created and nothing linear in the length of the collection.
There should be no significant difference between your code samples.
case Queue((thing, stuff), _*) is actually translated by compiler to call of head (apply(0)) method. You could use scalac -Xprint:patmat to investigate this:
<synthetic> val p2: (String, String) = o9.get.apply(0);
if (p2.ne(null))
matchEnd6(doThing(queue.tail))
The cost of head and cost of headOption are almost the same.
Methods head, tail and dequeue could cause reverce on internal List of Queue (with cost O(n)). In both you code samples there will be at most 2 reverce calls.
You should use dequeue like this to get at most a single reverce call:
def doThing(queue: Queue[(String, String)]): Queue[(String, String)] =
if (queue.isEmpty) queue
else queue.dequeue match {
case (e, q) => doThing(q)
}
You could also replace (thing, stuff) with _. In this case compiler will generate only call of lengthCompare without head or tail:
if (o9.get != null && o9.get.lengthCompare(1) >= 0)
_* is used to specify varargs arguments, so what you are doing in the first version is deconstructing the Queue into a pair of Strings, and an appropriate number of further pairs of Strings - ie you are deconstructing the whole Queue even though you only care about the first element.
If you just remove the asterisk, giving
def doThing(queue: Queue[(String, String)]): Queue[(String, String)] = queue match {
case Queue((thing, stuff), _) => doThing(queue.tail)
case _ => queue
}
then you are only deconstructing the Queue into a pair of Strings, and a remainder (which thus does not need to be fully deconstructed). This should run in comparable time to your second version (haven't timed it myself, though).

Threading extra state through a parser in Scala

I'll give you the tl;dr up front
I'm trying to use the state monad transformer in Scalaz 7 to thread extra state through a parser, and I'm having trouble doing anything useful without writing a lot of t m a -> t m b versions of m a -> m b methods.
An example parsing problem
Suppose I have a string containing nested parentheses with digits inside them:
val input = "((617)((0)(32)))"
I also have a stream of fresh variable names (characters, in this case):
val names = Stream('a' to 'z': _*)
I want to pull a name off the top of the stream and assign it to each parenthetical
expression as I parse it, and then map that name to a string representing the
contents of the parentheses, with the nested parenthetical expressions (if any) replaced by their
names.
To make this more concrete, here's what I'd want the output to look like for the example input above:
val target = Map(
'a' -> "617",
'b' -> "0",
'c' -> "32",
'd' -> "bc",
'e' -> "ad"
)
There may be either a string of digits or arbitrarily many sub-expressions at a given level, but these two kinds of content won't be mixed in a single parenthetical expression.
To keep things simple, we'll assume that the stream of names will never
contain either duplicates or digits, and that it will always contain enough
names for our input.
Using parser combinators with a bit of mutable state
The example above is a slightly simplified version of the parsing problem in
this Stack Overflow question.
I answered that question with
a solution that looked roughly like this:
import scala.util.parsing.combinator._
class ParenParser(names: Iterator[Char]) extends RegexParsers {
def paren: Parser[List[(Char, String)]] = "(" ~> contents <~ ")" ^^ {
case (s, m) => (names.next -> s) :: m
}
def contents: Parser[(String, List[(Char, String)])] =
"\\d+".r ^^ (_ -> Nil) | rep1(paren) ^^ (
ps => ps.map(_.head._1).mkString -> ps.flatten
)
def parse(s: String) = parseAll(paren, s).map(_.toMap)
}
It's not too bad, but I'd prefer to avoid the mutable state.
What I want
Haskell's Parsec library makes
adding user state to a parser trivially easy:
import Control.Applicative ((*>), (<$>), (<*))
import Data.Map (fromList)
import Text.Parsec
paren = do
(s, m) <- char '(' *> contents <* char ')'
h : t <- getState
putState t
return $ (h, s) : m
where
contents
= flip (,) []
<$> many1 digit
<|> (\ps -> (map (fst . head) ps, concat ps))
<$> many1 paren
main = print $
runParser (fromList <$> paren) ['a'..'z'] "example" "((617)((0)(32)))"
This is a fairly straightforward translation of my Scala parser above, but without mutable state.
What I've tried
I'm trying to get as close to the Parsec solution as I can using Scalaz's state monad transformer, so instead of Parser[A] I'm working with StateT[Parser, Stream[Char], A].
I have a "solution" that allows me to write the following:
import scala.util.parsing.combinator._
import scalaz._, Scalaz._
object ParenParser extends ExtraStateParsers[Stream[Char]] with RegexParsers {
protected implicit def monadInstance = parserMonad(this)
def paren: ESP[List[(Char, String)]] =
(lift("(" ) ~> contents <~ lift(")")).flatMap {
case (s, m) => get.flatMap(
names => put(names.tail).map(_ => (names.head -> s) :: m)
)
}
def contents: ESP[(String, List[(Char, String)])] =
lift("\\d+".r ^^ (_ -> Nil)) | rep1(paren).map(
ps => ps.map(_.head._1).mkString -> ps.flatten
)
def parse(s: String, names: Stream[Char]) =
parseAll(paren.eval(names), s).map(_.toMap)
}
This works, and it's not that much less concise than either the mutable state version or the Parsec version.
But my ExtraStateParsers is ugly as sin—I don't want to try your patience more than I already have, so I won't include it here (although here's a link, if you really want it). I've had to write new versions of every Parser and Parsers method I use above
for my ExtraStateParsers and ESP types (rep1, ~>, <~, and |, in case you're counting). If I had needed to use other combinators, I'd have had to write new state transformer-level versions of them as well.
Is there a cleaner way to do this? I'd love to see an example of a Scalaz 7's state monad transformer being used to thread state through a parser, but Scalaz 6 or Haskell examples would also be useful and appreciated.
Probably the most general solution would be to rewrite Scala's parser library to accommodate monadic computations while parsing (like you partly did), but that would be quite a laborious task.
I suggest a solution using ScalaZ's State where each of our result isn't a value of type Parse[X], but a value of type Parse[State[Stream[Char],X]] (aliased as ParserS[X]). So the overall parsed result isn't a value, but a monadic state value, which is then run on some Stream[Char]. This is almost a monad transformer, but we have to do lifting/unlifting manually. It makes the code a bit uglier, as we need to lift values sometimes or use map/flatMap on several places, but I believe it's still reasonable.
import scala.util.parsing.combinator._
import scalaz._
import Scalaz._
import Traverse._
object ParenParser extends RegexParsers with States {
type S[X] = State[Stream[Char],X];
type ParserS[X] = Parser[S[X]];
// Haskell's `return` for States
def toState[S,X](x: X): State[S,X] = gets(_ => x)
// Haskell's `mapM` for State
def mapM[S,X](l: List[State[S,X]]): State[S,List[X]] =
l.traverse[({type L[Y] = State[S,Y]})#L,X](identity _);
// .................................................
// Read the next character from the stream inside the state
// and update the state to the stream's tail.
def next: S[Char] = state(s => (s.tail, s.head));
def paren: ParserS[List[(Char, String)]] =
"(" ~> contents <~ ")" ^^ (_ flatMap {
case (s, m) => next map (v => (v -> s) :: m)
})
def contents: ParserS[(String, List[(Char, String)])] = digits | parens;
def digits: ParserS[(String, List[(Char, String)])] =
"\\d+".r ^^ (_ -> Nil) ^^ (toState _)
def parens: ParserS[(String, List[(Char, String)])] =
rep1(paren) ^^ (mapM _) ^^ (_.map(
ps => ps.map(_.head._1).mkString -> ps.flatten
))
def parse(s: String): ParseResult[S[Map[Char,String]]] =
parseAll(paren, s).map(_.map(_.toMap))
def parse(s: String, names: Stream[Char]): ParseResult[Map[Char,String]] =
parse(s).map(_ ! names);
}
object ParenParserTest extends App {
{
println(ParenParser.parse("((617)((0)(32)))", Stream('a' to 'z': _*)));
}
}
Note: I believe that your approach with StateT[Parser, Stream[Char], _] isn't conceptually correct. The type says that we're constructing a parser given some state (a stream of names). So it would be possible that given different streams we get different parsers. This is not what we want to do. We only want that the result of parsing depends on the names, not the whole parser. In this way Parser[State[Stream[Char],_]] seems to be more appropriate (Haskell's Parsec takes a similar approach, the state/monad is inside the parser).

Using Either to process failures in Scala code

Option monad is a great expressive way to deal with something-or-nothing things in Scala. But what if one needs to log a message when "nothing" occurs? According to the Scala API documentation,
The Either type is often used as an
alternative to scala.Option where Left
represents failure (by convention) and
Right is akin to Some.
However, I had no luck to find best practices using Either or good real-world examples involving Either for processing failures. Finally I've come up with the following code for my own project:
def logs: Array[String] = {
def props: Option[Map[String, Any]] = configAdmin.map{ ca =>
val config = ca.getConfiguration(PID, null)
config.properties getOrElse immutable.Map.empty
}
def checkType(any: Any): Option[Array[String]] = any match {
case a: Array[String] => Some(a)
case _ => None
}
def lookup: Either[(Symbol, String), Array[String]] =
for {val properties <- props.toRight('warning -> "ConfigurationAdmin service not bound").right
val logsParam <- properties.get("logs").toRight('debug -> "'logs' not defined in the configuration").right
val array <- checkType(logsParam).toRight('warning -> "unknown type of 'logs' confguration parameter").right}
yield array
lookup.fold(failure => { failure match {
case ('warning, msg) => log(LogService.WARNING, msg)
case ('debug, msg) => log(LogService.DEBUG, msg)
case _ =>
}; new Array[String](0) }, success => success)
}
(Please note this is a snippet from a real project, so it will not compile on its own)
I'd be grateful to know how you are using Either in your code and/or better ideas on refactoring the above code.
Either is used to return one of possible two meaningful results, unlike Option which is used to return a single meaningful result or nothing.
An easy to understand example is given below (circulated on the Scala mailing list a while back):
def throwableToLeft[T](block: => T): Either[java.lang.Throwable, T] =
try {
Right(block)
} catch {
case ex => Left(ex)
}
As the function name implies, if the execution of "block" is successful, it will return "Right(<result>)". Otherwise, if a Throwable is thrown, it will return "Left(<throwable>)". Use pattern matching to process the result:
var s = "hello"
throwableToLeft { s.toUpperCase } match {
case Right(s) => println(s)
case Left(e) => e.printStackTrace
}
// prints "HELLO"
s = null
throwableToLeft { s.toUpperCase } match {
case Right(s) => println(s)
case Left(e) => e.printStackTrace
}
// prints NullPointerException stack trace
Hope that helps.
Scalaz library has something alike Either named Validation. It is more idiomatic than Either for use as "get either a valid result or a failure".
Validation also allows to accumulate errors.
Edit: "alike" Either is complettly false, because Validation is an applicative functor, and scalaz Either, named \/ (pronounced "disjonction" or "either"), is a monad.
The fact that Validation can accumalate errors is because of that nature. On the other hand, / has a "stop early" nature, stopping at the first -\/ (read it "left", or "error") it encounters. There is a perfect explanation here: http://typelevel.org/blog/2014/02/21/error-handling.html
See: http://scalaz.googlecode.com/svn/continuous/latest/browse.sxr/scalaz/example/ExampleValidation.scala.html
As requested by the comment, copy/paste of the above link (some lines removed):
// Extracting success or failure values
val s: Validation[String, Int] = 1.success
val f: Validation[String, Int] = "error".fail
// It is recommended to use fold rather than pattern matching:
val result: String = s.fold(e => "got error: " + e, s => "got success: " + s.toString)
s match {
case Success(a) => "success"
case Failure(e) => "fail"
}
// Validation is a Monad, and can be used in for comprehensions.
val k1 = for {
i <- s
j <- s
} yield i + j
k1.toOption assert_≟ Some(2)
// The first failing sub-computation fails the entire computation.
val k2 = for {
i <- f
j <- f
} yield i + j
k2.fail.toOption assert_≟ Some("error")
// Validation is also an Applicative Functor, if the type of the error side of the validation is a Semigroup.
// A number of computations are tried. If the all success, a function can combine them into a Success. If any
// of them fails, the individual errors are accumulated.
// Use the NonEmptyList semigroup to accumulate errors using the Validation Applicative Functor.
val k4 = (fNel <**> fNel){ _ + _ }
k4.fail.toOption assert_≟ some(nel1("error", "error"))
The snippet you posted seems very contrived. You use Either in a situation where:
It's not enough to just know the data isn't available.
You need to return one of two distinct types.
Turning an exception into a Left is, indeed, a common use case. Over try/catch, it has the advantage of keeping the code together, which makes sense if the exception is an expected result. The most common way of handling Either is pattern matching:
result match {
case Right(res) => ...
case Left(res) => ...
}
Another interesting way of handling Either is when it appears in a collection. When doing a map over a collection, throwing an exception might not be viable, and you may want to return some information other than "not possible". Using an Either enables you to do that without overburdening the algorithm:
val list = (
library
\\ "books"
map (book =>
if (book \ "author" isEmpty)
Left(book)
else
Right((book \ "author" toList) map (_ text))
)
)
Here we get a list of all authors in the library, plus a list of books without an author. So we can then further process it accordingly:
val authorCount = (
(Map[String,Int]() /: (list filter (_ isRight) map (_.right.get)))
((map, author) => map + (author -> (map.getOrElse(author, 0) + 1)))
toList
)
val problemBooks = list flatMap (_.left.toSeq) // thanks to Azarov for this variation
So, basic Either usage goes like that. It's not a particularly useful class, but if it were you'd have seen it before. On the other hand, it's not useless either.
Cats has a nice way to create an Either from exception-throwing code:
val either: Either[NumberFormatException, Int] =
Either.catchOnly[NumberFormatException]("abc".toInt)
// either: Either[NumberFormatException,Int] = Left(java.lang.NumberFormatException: For input string: "abc")
in https://typelevel.org/cats/datatypes/either.html#working-with-exception-y-code