Idiomatic way to find a matching line in Scala

I have an Iterable[String] representing the lines in a file and I'd like to find the first line in that sequence that matches a regular expression and return a numerical value extracted by the regex. The file is big enough that it wouldn't make sense to load the whole thing into memory and then call toString() or something, so I'll need to go through it a line at a time.
Here's what I have (it works):
import scala.io.Source
import scala.util.matching.Regex

val RateRegex : Regex = ".....".r
def getRate(source : Source) : Option[Double] = {
  import java.lang.Double._
  for (line <- source.getLines()) {
    line match {
      case RateRegex(rawRate) => return Some(parseDouble(rawRate))
      case _ => ()
    }
  }
  return None
}
This seems ugly to me. It feels very imperative, and case _ => () might as well be replaced with a comment that says "you're doing it wrong."
I think I want something like def findFirstWhereNonNone[A, B](p: A => Option[B]): Option[B], where the collection's elements are of type A.
Are there built-in methods that would let me do this in a more functional way? Should I just write that method?
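Writing that method yourself takes only a couple of lines. Below is a minimal sketch, written against Scala 2.13. The name `findFirstWhereNonNone` comes from the question. The `Iterator` parameter and the `rate=` pattern are assumptions for illustration. Because `Iterator.map` is lazy, `p` stops being applied once the first `Some` is found.

```scala
import scala.util.matching.Regex

// Hypothetical helper named after the question: apply p lazily and return
// the first defined result. Iterator.map is lazy, so elements after the
// first Some are never passed to p.
def findFirstWhereNonNone[A, B](xs: Iterator[A])(p: A => Option[B]): Option[B] =
  xs.map(p).collectFirst { case Some(b) => b }

// Assumed pattern for illustration only; the question elides the real regex.
val SampleRate: Regex = """rate=(\d+(?:\.\d+)?)""".r

val firstRate: Option[Double] =
  findFirstWhereNonNone(Iterator("header", "rate=4.25", "rate=9.0")) {
    case SampleRate(raw) => Some(raw.toDouble)
    case _               => None
  }
```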
P.S. While I'm at it, is there an alternative to using java.lang.Double.parseDouble? Scala's Double class doesn't expose it.
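On the P.S.: Scala adds `toDouble` to every `String` through `StringOps`, so calling `java.lang.Double.parseDouble` directly is not necessary. The short sketch below covers both the throwing and the non-throwing forms. `toDoubleOption` exists only in Scala 2.13 and later. Wrapping `toDouble` in `scala.util.Try` also works on older versions.

```scala
import scala.util.Try

// StringOps.toDouble delegates to java.lang.Double.parseDouble
// and throws NumberFormatException on malformed input.
val strict: Double = "3.5".toDouble

// Non-throwing variants: Try works on any Scala version,
// toDoubleOption requires Scala 2.13+.
val viaTry: Option[Double] = Try("abc".toDouble).toOption
val viaOption: Option[Double] = "2.25".toDoubleOption
```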
P.P.S. I've seen a lot of posts on SO suggesting that the Source API shouldn't be used in production, but they're all from 2008 and 2009. Is that still the case? If so, what should I use for IO?
Update
I now have:
import scala.util.matching.Regex.Groups
for {
  line <- source.getLines()
  Groups(rawRate) <- RateRegex.findFirstMatchIn(line)
} {
  return Some(parseDouble(rawRate))
}
return None
which feels a lot better to me.

EDIT: This third alternative is quite neat:
source
.getLines()
.collectFirst{ case RateRegex(x) => x.toDouble}
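Here is a self-contained version of the `collectFirst` approach that you can experiment with. `Source.fromString` stands in for the real file. The pattern `rate=(\d+(?:\.\d+)?)` is an assumption, because the question elides the actual regex. The partial function is tried one line at a time, and reading stops at the first line that matches.

```scala
import scala.io.Source
import scala.util.matching.Regex

// Hypothetical pattern; the regex in the question is elided.
val RateRegex: Regex = """rate=(\d+(?:\.\d+)?)""".r

def getRate(source: Source): Option[Double] =
  source
    .getLines()
    // A regex extractor must match the whole line; lines that do not
    // match fall outside the partial function and are skipped.
    .collectFirst { case RateRegex(x) => x.toDouble }

val found = getRate(Source.fromString("name=foo\nrate=1.75\nrate=2.0"))
val missing = getRate(Source.fromString("nothing here"))
```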
Not sure if it's more functional, but you can use the behaviour of foreach/for-comprehensions on Options
def getRate(source : Source) : Option[Double] = {
  for {
    line <- source.getLines()
    rawRate <- RateRegex.findFirstMatchIn(line).map(_.group(1))
  } return Some(rawRate.toDouble)
  return None
}
This works too (quite similar to EasyAngel's answer):
source
.getLines()
.map{RateRegex.findFirstMatchIn(_)}
.filter{_.isDefined}
.map{_.get.group(1).toDouble}
.take(1)
.toList
.headOption
The last three are a little ugly. The take(1) is to ensure we only evaluate up to the first match. The toList is to force the evaluation, and the headOption to extract the first value as Some() or None if there is none. Is there a more idiomatic way of doing this?
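The `filter(_.isDefined)` / `_.get` pair can be collapsed into a single `flatMap`, because an `Option` behaves like a collection with zero or one elements. The sketch below assumes Scala 2.13 or later, since it uses `Iterator.nextOption()`. On 2.12, keep `take(1).toList.headOption` instead. It reuses the same hypothetical `rate=` pattern. The counter is there only to show that lines after the first match are never read. `group(1)` is the first capture group, whereas `group(0)` is the whole matched text.

```scala
import scala.util.matching.Regex

// Hypothetical pattern standing in for the elided one in the question.
val RateRegex: Regex = """rate=(\d+(?:\.\d+)?)""".r

var linesRead = 0 // demonstration only
val lines = Iterator("a", "rate=3", "b", "rate=4").map { l => linesRead += 1; l }

val rate: Option[Double] =
  lines
    .flatMap(line => RateRegex.findFirstMatchIn(line)) // None values disappear
    .map(_.group(1).toDouble)                          // first capture group
    .nextOption()                                      // Scala 2.13+: first element, if any
```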

Here is one of the possible solutions:
def getRates(source : Source) = source.getLines().map {
  case RateRegex(rate) => Some(rate.toDouble)
  case _ => None
}.filter(_.isDefined).toList
Note that this function now returns a List[Option[Double]] of all found rates. Also note that the Iterator remains lazy until toList is called.
Update
As requested in the comments, here is a solution that returns only the first occurrence:
def getRate(source : Source): Option[Double] = source.getLines().map {
  case RateRegex(rate) => Some(rate.toDouble)
  case _ => None
}.find(_.isDefined).getOrElse(None)
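The answer above can be simplified in two small ways, both sketched here with the same hypothetical `rate=` pattern used for illustration. First, `collect` returns plain `Double`s rather than a `List[Option[Double]]`, and it is also lazy on an `Iterator`. Second, `find(_.isDefined).flatten` can replace the final `getOrElse` call.

```scala
import scala.io.Source
import scala.util.matching.Regex

val RateRegex: Regex = """rate=(\d+(?:\.\d+)?)""".r // hypothetical pattern

// All rates as plain Doubles rather than List[Option[Double]].
def getRates(source: Source): List[Double] =
  source.getLines().collect { case RateRegex(rate) => rate.toDouble }.toList

// First rate only: find yields Option[Option[Double]], flatten removes one layer.
def getRate(source: Source): Option[Double] =
  source.getLines().map {
    case RateRegex(rate) => Some(rate.toDouble)
    case _               => None
  }.find(_.isDefined).flatten
```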

Related

Do something when exactly one option is non-empty

I want to compute something if exactly one of two options is non-empty. Obviously this could be done by a pattern match, but is there some better way?
(o1, o2) match {
case (Some(o), None) => Some(compute(o))
case (None, Some(o)) => Some(compute(o))
case _ => None
}
You could do something like this:
if (o1.isEmpty ^ o2.isEmpty)
List(o1,o2).flatMap(_.map(x=>Some(compute(x)))).head
else
None
But pattern matching is probably the better way to go.
Thanks to helpful comments from @Suma, I came up with another solution in addition to the current ones:
Since the inputs are always in the form of Option(x):
Iterator(Seq(o1,o2).filter(_!=None))
.takeWhile(_.length==1)
.map( x => compute(x.head.get))
.toSeq.headOption
Using an iterator also allows a sequence of values to be passed as input. The final mapping is done if and only if exactly one value in the sequence is defined.
Inspired by a now-deleted answer by pedrofurla, which attempted to use o1 orElse o2 map { compute }, one possibility is to define xorElse; the rest is easy with it:
implicit class XorElse[T](o1: Option[T]) {
  def xorElse[A >: T](o2: Option[A]): Option[A] = {
    if (o1.isDefined != o2.isDefined) o1 orElse o2
    else None
  }
}
(o1 xorElse o2).map(compute)
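Here is a runnable check of the `xorElse` approach. `compute` is a hypothetical stand-in that doubles an `Int`. An `implicit class` cannot be top-level in a Scala 2 source file, so the sketch assumes it lives in a script, object, or class.

```scala
implicit class XorElse[T](o1: Option[T]) {
  // Defined only when exactly one side is defined; otherwise None.
  def xorElse[A >: T](o2: Option[A]): Option[A] =
    if (o1.isDefined != o2.isDefined) o1 orElse o2
    else None
}

def compute(x: Int): Int = x * 2 // hypothetical computation

def exactlyOne(o1: Option[Int], o2: Option[Int]): Option[Int] =
  (o1 xorElse o2).map(compute)
```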
Another possibility I have found is a pattern match over a Seq concatenation, so that both cases are handled by the same code. The advantage of this approach is that it extends to any number of options: compute is called exactly when one of them is defined.
o1.toSeq ++ o2 match {
case Seq(one) => Some(compute(one))
case _ => None
}
Just initialize a sequence and then flatten it:
Seq(o1, o2).flatten match {
case Seq(o) => Some(compute(o))
case _ => None
}
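The `flatten` version extends directly to any number of options. Below is a sketch using a hypothetical varargs helper, `computeIfExactlyOne`. Both that name and its `compute` argument are illustrative only.

```scala
// Hypothetical helper: apply compute only when exactly one option is defined.
def computeIfExactlyOne[A, B](options: Option[A]*)(compute: A => B): Option[B] =
  options.flatten match {
    case Seq(only) => Some(compute(only)) // exactly one Some survived flatten
    case _         => None                // zero, or two or more, were defined
  }
```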

Avoiding deeply nested Option cascades in Scala

Say I have three database access functions foo, bar, and baz that can each return Option[A] where A is some model class, and the calls depend on each other.
I would like to call the functions sequentially and in each case, return an appropriate error message if the value is not found (None).
My current code looks like this:
Input is a URL: /x/:xID/y/:yID/z/:zID
foo(xID) match {
  case None => Left(s"$xID is not a valid id")
  case Some(x) =>
    bar(yID) match {
      case None => Left(s"$yID is not a valid id")
      case Some(y) =>
        baz(zID) match {
          case None => Left(s"$zID is not a valid id")
          case Some(z) => Right(process(x, y, z))
        }
    }
}
As can be seen, the code is badly nested.
If I use a for comprehension instead, I cannot give specific error messages, because I do not know which step failed:
(for {
x <- foo(xID)
y <- bar(yID)
z <- baz(zID)
} yield {
Right(process(x, y, z))
}).getOrElse(Left("One of the IDs was invalid, but we do not know which one"))
If I use map and getOrElse, I end up with code almost as nested as the first example.
Is there a better way to structure this that avoids the nesting while still allowing specific error messages?
You can get your for loop working by using right projections.
def ckErr[A](id: String, f: String => Option[A]) = (f(id) match {
case None => Left(s"$id is not a valid id")
case Some(a) => Right(a)
}).right
for {
x <- ckErr(xID, foo)
y <- ckErr(yID, bar)
z <- ckErr(zID, baz)
} yield process(x,y,z)
This is still a little clumsy, but it has the advantage of being part of the standard library.
Exceptions are another way to go, but they slow things down a lot if the failure cases are common. I'd only use that if failure was truly exceptional.
It's also possible to use non-local returns, but it's kind of awkward for this particular setup. I think right projections of Either are the way to go. If you really like working this way but dislike putting .right all over the place, there are various places you can find a "right-biased Either" which will act like the right projection by default (e.g. ScalaUtils, Scalaz, etc.).
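Since Scala 2.12, the standard `Either` is itself right-biased: it has `map` and `flatMap` directly. In 2.13, `.right` is deprecated. As a result, the for comprehension can work straight off `Option.toRight`. The minimal sketch below uses `foo`, `bar`, `baz` and `process` as hypothetical stubs standing in for the database calls in the question.

```scala
// Hypothetical stand-ins for the database lookups in the question.
def foo(id: String): Option[Int] = if (id == "1") Some(1) else None
def bar(id: String): Option[Int] = if (id == "2") Some(2) else None
def baz(id: String): Option[Int] = if (id == "3") Some(3) else None
def process(x: Int, y: Int, z: Int): Int = x + y + z

def handle(xID: String, yID: String, zID: String): Either[String, Int] =
  for {
    // toRight turns None into Left(message); the first Left short-circuits.
    x <- foo(xID).toRight(s"$xID is not a valid id")
    y <- bar(yID).toRight(s"$yID is not a valid id")
    z <- baz(zID).toRight(s"$zID is not a valid id")
  } yield process(x, y, z)
```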
Instead of an Option, I would use a Try. That way you get the monadic composition you'd like, combined with the ability to retain the error.
def myDBAccess(..args..) =
  thingThatDoesStuff(args) match {
    case Some(x) => Success(x)
    case None => Failure(new IdError(args))
  }
I'm assuming in the above that you don't actually control the functions and can't refactor them to give you a non-Option. If you did, then simply substitute Try.
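Here is a runnable version of the `Try` idea. `IdError`, `findId`, `lookup` and the stub table are all hypothetical. The point is that each `Failure` carries its own exception, so a for comprehension over `Try` still reports which step failed.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical exception type recording which id was missing.
final case class IdError(id: String) extends Exception(s"$id is not a valid id")

// Hypothetical Option-returning lookup, as in the question.
def findId(table: Map[String, Int], id: String): Option[Int] = table.get(id)

// Wrap the Option result: Some -> Success, None -> Failure(IdError).
def lookup(table: Map[String, Int], id: String): Try[Int] =
  findId(table, id) match {
    case Some(x) => Success(x)
    case None    => Failure(IdError(id))
  }

val table = Map("a" -> 1, "b" -> 2, "c" -> 3)

def sum(xID: String, yID: String, zID: String): Try[Int] =
  for {
    x <- lookup(table, xID)
    y <- lookup(table, yID)
    z <- lookup(table, zID)
  } yield x + y + z
```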
I know this question was answered some time back, but I wanted to give an alternative to the accepted answer.
Given that, in your example, the three Options are independent, you can treat them as Applicative Functors and use ValidatedNel from Cats to simplify and aggregate the handling of the unhappy path.
Given the code:
import cats.data.ValidatedNel
import cats.data.Validated.{invalidNel, valid}
def checkOption[B, T](t : Option[T])(ifNone : => B) : ValidatedNel[B, T] = t match {
  case None => invalidNel(ifNone)
  case Some(x) => valid(x)
}
def processUnwrappedData(a : Int, b : String, c : Boolean) : String = ???
val o1 : Option[Int] = ???
val o2 : Option[String] = ???
val o3 : Option[Boolean] = ???
You can then obtain what you want with:
import cats.syntax.cartesian._ // Cats 0.9-era syntax; later Cats versions deprecate |@| in favour of mapN
(
checkOption(o1)(s"First option is None") |@|
checkOption(o2)(s"Second option is None") |@|
checkOption(o3)(s"Third option is None")
) map (processUnwrappedData)
This approach lets you aggregate failures, which was not possible in your solution (a for comprehension enforces sequential evaluation, so it stops at the first failure).
Finally, this solution uses Cats Validated, but it could easily be translated to Scalaz Validation.
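If adding Cats is not an option, you can approximate the accumulating behaviour with a plain `Either[List[String], _]`. Note that this is a swapped-in sketch that uses only the standard library. It is not Cats' `Validated`. The helper names `check` and `map3` are hypothetical.

```scala
// Turn an Option into a Right value or a single-element error list.
def check[T](t: Option[T])(ifNone: => String): Either[List[String], T] =
  t.toRight(List(ifNone))

// Combine three independent results, collecting every error instead of
// stopping at the first one (the key difference from a for comprehension).
def map3[A, B, C, R](a: Either[List[String], A],
                     b: Either[List[String], B],
                     c: Either[List[String], C])(f: (A, B, C) => R): Either[List[String], R] =
  (a, b, c) match {
    case (Right(x), Right(y), Right(z)) => Right(f(x, y, z))
    case _ => Left(List(a, b, c).collect { case Left(errs) => errs }.flatten)
  }
```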
I came up with this solution (based on @Rex's solution and his comments):
import scala.util.Either.RightProjection
def ifTrue[A](boolean: Boolean)(isFalse: => A): RightProjection[A, Unit] =
  Either.cond(boolean, (), isFalse).right
def none[A](option: Option[_])(isSome: => A): RightProjection[A, Unit] =
  Either.cond(option.isEmpty, (), isSome).right
def some[A, B](option: Option[A])(ifNone: => B): RightProjection[B, A] =
  option.toRight(ifNone).right
They do the following:
ifTrue is used when a function returns a Boolean, with true being the "success" case (e.g.: isAllowed(userId)). It actually returns Unit so should be used as _ <- ifTrue(...) { error } in a for comprehension.
none is used when a function returns an Option with None being the "success" case (e.g.: findUser(email) for creating accounts with unique email addresses). It actually returns Unit so should be used as _ <- none(...) { error } in a for comprehension.
some is used when a function returns an Option with Some() being the "success" case (e.g.: findUser(userId) for a GET /users/userId). It returns the contents of the Some: user <- some(findUser(userId)) { s"user $userId not found" }.
They are used in a for comprehension:
for {
x <- some(foo(xID)) { s"$xID is not a valid id" }
y <- some(bar(yID)) { s"$yID is not a valid id" }
z <- some(baz(zID)) { s"$zID is not a valid id" }
} yield {
process(x, y, z)
}
This returns an Either[String, X] where the String is an error message and the X is the result of calling process.
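On Scala 2.12 and later, these helpers can drop `.right` and return `Either` directly, because `Either` is right-biased there. Below is a sketch under that assumption. `isAllowed`, `findUser`, `greet` and the user data are hypothetical.

```scala
def ifTrue[A](boolean: Boolean)(isFalse: => A): Either[A, Unit] =
  Either.cond(boolean, (), isFalse)

def none[A](option: Option[_])(isSome: => A): Either[A, Unit] =
  Either.cond(option.isEmpty, (), isSome)

def some[A, B](option: Option[A])(ifNone: => B): Either[B, A] =
  option.toRight(ifNone)

// Hypothetical data and checks for illustration.
val users = Map(1 -> "alice", 2 -> "bob")
def isAllowed(id: Int): Boolean = id != 2
def findUser(id: Int): Option[String] = users.get(id)

def greet(id: Int): Either[String, String] =
  for {
    user <- some(findUser(id)) { s"user $id not found" }
    _    <- ifTrue(isAllowed(id)) { s"user $id is not allowed" }
  } yield s"hello, $user"
```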

Handle Scala Option idiomatically

What is the more idiomatic way to handle an Option, map / getOrElse, or match?
val x = option map {
value => Math.cos(value) + Math.sin(value)
} getOrElse {
.5
}
or
val x = option match {
case Some(value) => Math.cos(value) + Math.sin(value)
case None => .5
}
You could always just look at the Scaladoc for Option:
The most idiomatic way to use an scala.Option instance is to treat it as a collection or monad and use map,flatMap, filter, or foreach:
val name: Option[String] = request getParameter "name"
val upper = name map { _.trim } filter { _.length != 0 } map { _.toUpperCase }
println(upper getOrElse "")
And a bit later:
A less-idiomatic way to use scala.Option values is via pattern matching:
val nameMaybe = request getParameter "name"
nameMaybe match {
case Some(name) =>
println(name.trim.toUpperCase)
case None =>
println("No name value")
}
Use fold for this kind of map-or-else-default thing:
val x = option.fold(0.5){ value => Math.cos(value) + Math.sin(value) }
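The quick check below confirms that the three spellings agree, assuming an `Option[Double]` input as in the question. There is one caveat with `fold` in Scala 2. Its parameters come in the order (default)(function), and the result type is inferred from the default alone. So on an `Option[Double]`, a default such as `0` (an `Int`) instead of `0.5` makes the function branch fail to type-check.

```scala
def viaMap(option: Option[Double]): Double =
  option.map(value => Math.cos(value) + Math.sin(value)).getOrElse(0.5)

def viaMatch(option: Option[Double]): Double = option match {
  case Some(value) => Math.cos(value) + Math.sin(value)
  case None        => 0.5
}

// fold takes the default first, then the function applied to the Some value.
def viaFold(option: Option[Double]): Double =
  option.fold(0.5)(value => Math.cos(value) + Math.sin(value))
```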
Obviously both are valid, and I don't think one is more idiomatic than the other. That said, using map takes advantage of the fact that Option is a monad. This is particularly useful when combining two Options. Say you have two Option[Int] values that you would like to add: instead of doing multiple matches, it is much cleaner to use map/flatMap or the equivalent for comprehension. So for your example both are valid, but for other examples map/flatMap is often much more succinct.
Some(6).flatMap(intValue => Some(5).map(intValue + _))
or
for {
i <- Some(6)
j <- Some(5)
} yield i + j
All of them have different semantics, so in your case none of them is more idiomatic than the others.
map applies a function to the value inside the Option, if it exists (Some, not None). This is basically how you work safely with Options: applying a function to a null value is dangerous because it can throw an NPE, but with an Option, map just returns None.
getOrElse simply returns either its value or the default (which you provide as an argument). It doesn't do anything with the value inside the Option; it just extracts it if you have a Some, or returns the default in the case of None.
And the match approach, I'd say, is a combination of the two, because you can apply a computation to the value and extract it from the Option.

Mapping many Eithers to one Either with many

Say I have a monadic function called processOne, defined like this:
def processOne(input: Input): Either[ErrorType, Output] = ...
Given a list of "Inputs", I would like to return a corresponding list of "Outputs" wrapped in an Either:
def processMany(inputs: Seq[Input]): Either[ErrorType, Seq[Output]] = ...
processMany will call processOne for each input it has, however, I would like it to terminate the first time (if any) that processOne returns a Left, and return that Left, otherwise return a Right with a list of the outputs.
My question: what is the best way to implement processMany? Is it possible to accomplish this behavior using a for expression, or is it going to be necessary for me to iterate the list myself recursively?
With Scalaz 7:
def processMany(inputs: Seq[Input]): Either[ErrorType, Seq[Output]] =
inputs.toStream traverseU processOne
Converting inputs to a Stream[Input] takes advantage of the non-strict traverse implementation for Stream, i.e. gives you the short-circuiting behaviour you want.
By the way, you tagged this "monads", but traversal requires only an applicative functor (which, as it happens, is probably defined in terms of the monad for Either). For further reference, see the paper The Essence of the Iterator Pattern, or, for a Scala-based interpretation, Eric Torreborre's blog post on the subject.
The easiest approach with standard Scala that doesn't evaluate more than necessary would probably be:
def processMany(inputs: Seq[Input]): Either[ErrorType, Seq[Output]] = {
  Right(inputs.map { x =>
    processOne(x) match {
      case Right(r) => r
      case Left(l) => return Left(l)
    }
  })
}
A fold would be more compact, but wouldn't short-circuit when it hit a left (it'd just keep carrying it along while you iterated through the entire input).
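The sketch below makes the fold remark concrete. It uses a hypothetical `processOne` that counts its calls, with `Int` inputs and `String` errors as assumptions. If the fold `flatMap`s over the accumulator, `processOne` is not called again after the first `Left`. However, the fold itself still steps through every remaining element.

```scala
var calls = 0 // number of processOne invocations
var steps = 0 // number of fold iterations

// Hypothetical processOne: negative numbers are errors.
def processOne(input: Int): Either[String, Int] = {
  calls += 1
  if (input < 0) Left(s"bad input $input") else Right(input * 10)
}

def processManyFold(inputs: Seq[Int]): Either[String, Seq[Int]] =
  inputs.foldLeft[Either[String, Vector[Int]]](Right(Vector.empty)) { (acc, in) =>
    steps += 1
    // Once acc is a Left, flatMap skips processOne, but the fold keeps going.
    acc.flatMap(outs => processOne(in).map(outs :+ _))
  }

val result = processManyFold(List(1, -2, 3, 4))
```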
For now, I've decided to just solve this using recursion, as I am reluctant to add a dependency to a library (Scalaz).
(Types and names in my application have been changed here in order to appear more generic)
def processMany(inputs: Seq[Input]): Either[ErrorType, Seq[Output]] = {
  import scala.annotation.tailrec
  @tailrec
  def traverse(acc: Vector[Output], inputs: List[Input]): Either[ErrorType, Seq[Output]] = {
    inputs match {
      case Nil => Right(acc)
      case input :: more =>
        processOne(input) match {
          case Right(output) => traverse(acc :+ output, more)
          case Left(e) => Left(e)
        }
    }
  }
  traverse(Vector[Output](), inputs.toList)
}
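The recursive version generalizes to arbitrary element types. Below is a sketch of a hypothetical generic `traverseEither`. Its name and signature are illustrative and do not come from any library. It stops calling `f` at the first `Left`.

```scala
import scala.annotation.tailrec

// Hypothetical generic helper: apply f to each element, stopping at the first Left.
def traverseEither[A, E, B](inputs: Seq[A])(f: A => Either[E, B]): Either[E, Vector[B]] = {
  @tailrec
  def loop(acc: Vector[B], rest: List[A]): Either[E, Vector[B]] = rest match {
    case Nil => Right(acc)
    case head :: tail =>
      f(head) match {
        case Right(b) => loop(acc :+ b, tail)
        case Left(e)  => Left(e) // short-circuit: tail is never visited
      }
  }
  loop(Vector.empty, inputs.toList)
}
```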

Processing Scala Option[T]

I have a Scala Option[T]. If the value is Some(x), I want to process it with a process that does not return a value (Unit), but if it is None, I want to print an error.
I can use the following code to do this, but I understand that the more idiomatic way is to treat the Option[T] as a sequence and use map, foreach, etc. How do I do this?
opt match {
case Some(x) => // process x with no return value, e.g. write x to a file
case None => // print error message
}
I think explicit pattern matching suits your use case best.
Scala's Option is, sadly, missing a method to do exactly this. I add one:
class OptionWrapper[A](o: Option[A]) {
def fold[Z](default: => Z)(action: A => Z) = o.map(action).getOrElse(default)
}
implicit def option_has_utility[A](o: Option[A]) = new OptionWrapper(o)
which has the slightly nicer (in my view) usage
op.fold{ println("Empty!") }{ x => doStuffWith(x) }
You can see from how it's defined that map/getOrElse can be used instead of pattern matching.
Alternatively, Either already has a fold method. So you can
op.toRight(()).fold{ _ => println("Empty!") }{ x => doStuffWith(x) }
but this is a little clumsy given that you have to provide the left value (here (), i.e. Unit) and then define a function on that, rather than just stating what you want to happen on None.
The pattern match isn't bad either, especially for longer blocks of code. For short ones, the overhead of the match starts getting in the way of the point. For example:
op.fold{ printError }{ saveUserInput }
has a lot less syntactic overhead than
op match {
case Some(x) => saveUserInput(x)
case None => printError
}
and therefore, once you expect it, is a lot easier to comprehend.
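Since Scala 2.10, the standard `Option` has had its own `fold(ifEmpty)(f)`, so the wrapper above is no longer needed. Below is a sketch in which `saveUserInput` and `printError` are hypothetical. Here they append to a log so that the side effects can be checked.

```scala
import scala.collection.mutable.ListBuffer

val log = ListBuffer.empty[String] // records side effects for demonstration

// Hypothetical side-effecting handlers from the answer.
def saveUserInput(x: String): Unit = log += s"saved $x"
def printError(): Unit = log += "error"

def handle(op: Option[String]): Unit =
  op.fold(printError())(saveUserInput) // default first, then the Some branch
```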
I'd recommend simply and safely using opt.get, which itself throws a NoSuchElementException if opt is None. Or, if you want to throw your own exception, you can do this:
val x = opt.getOrElse(throw new Exception("Your error message"))
// x is of type T
As @missingfaktor says, you are in exactly the scenario where pattern matching gives the most readable result.
If the Option has a value, you want to do one thing; if not, you want to do something else.
While there are various ways to use map and other functional constructs on Option types, they are generally useful when:
you want to use the Some case and ignore the None case, e.g. in your case:
opt.foreach(writeToFile(_)) //(...if None, just do nothing)
or you want to chain the operations on more than one option and give a result only when all of them are Some. For instance, one way of doing this is:
val concatThreeOptions =
for {
n1 <- opt1
n2 <- opt2
n3 <- opt3
} yield n1 + n2 + n3 // this will be None if any of the three is None
// we will either write them all to a file or none of them
but none of these seem to fit your case.
Pattern matching is the best choice here.
However, if you want to treat Option as a sequence and to map over it, you can do it, because Unit is a value:
opt map { v =>
println(v) // process v (result type is Unit)
} getOrElse {
println("error")
}
By the way, printing an error is some kind of "anti-pattern", so it's better to throw an exception anyway:
opt.getOrElse(throw new SomeException)