How to better partition valid or invalid inputs - scala

Given a list of inputs that could be valid or invalid, is there a nice way to transform the list but to fail given one or more invalid inputs and, if necessary, to return information about those invalid inputs? I have something like this, but it feels very inelegant.
def processInput(inputList: List[Input]): Try[List[Output]] = {
inputList map { input =>
if (isValid(input)) Left(Output(input))
else Right(input)
} partition { result =>
result.isLeft
} match {
case (valids, Nil) =>
val outputList = valids map { case Left(output) => output }
Success(outputList)
case (_, invalids) =>
val errList = invalids map { case Right(invalid) => invalid }
Failure(new Throwable(s"The following inputs were invalid: ${errList.mkString(",")}"))
}
}
Is there a better way to do this?

I think you can simplify your current solution quite a bit with standard scala:
def processInput(inputList: List[Input]): Try[List[Output]] =
inputList.partition(isValid) match {
case (valids, Nil) => Success(valids.map(Output))
case (_, invalids) => Failure(new Throwable(s"The following inputs were invalid: ${invalids.mkString(",")}"))
}
Or, you can have a quite elegant solution with scalactic's Or.
import org.scalactic._
def processInputs(inputList: List[Input]): List[Output] Or List[Input] =
inputList.partition(isValid) match {
case (valid, Nil) => Good(valid.map(Output))
case (_, invalid) => Bad(invalid)
}
The result is of type org.scalactic.Or, which you then have to match to Good or Bad. This approach is more useful if you want the list of invalid inputs, you can match it out of Bad.

scalaz's validation is designed exactly for this. Try reading the tale of three nightclubs for how this would work, but the body of your function would probably end up just consisting of something like:
def processInput(inputList: List[Input]): Validation[List[Output]] = {
inputList foldMap { input =>
if (isValid(input)) Failure(Output(input))
else Success(List(input))
}

Related

Validation JSON schema runtime

I want to avoid Runtime undefined behaivors as follows:
val jsonExample = Json.toJson(0)
jsonExample.asOpt[Instant]
yield Some(1970-01-01T00:00:00Z)
How can I verify this using partial function with a lift or some other way, to check thatits indeed Instant, or how you recommend to validate?
ex1:
val jsonExample = Json.toJson(Instant.now())
jsonExample match { ... }
ex2:
val jsonExample = Json.toJson(0)
jsonExample match { ... }
Examples for desired output:
validateInstant(Json.toJson(Instant.now())) -> return Some(...)
validateInstant(Json.toJson(0)) -> return None
I can do somthing as follows, maybe some other ideas?
Just wanted to add a note regarding parsing json, there some runtime undefined problems when we are trying to parse .asOpt[T]
for example:
Json.toJson("0").asOpt[BigDecimal] // yields Some(0)
Json.toJson(0).asOpt[Instant] // yields Some(1970-01-01T00:00:00Z)
We can validate it as follows or some other way:
Json.toJson("0") match {
case JsString(value) => Some(value)
case _ => None
}
Json.toJson(0) match {
case JsNumber(value) => Some(value)
case _ => None
}
Json.toJson(Instant.now()) match {
case o # JsString(_) => o.asOpt[Instant]
case _ => None
}
You can use Option:
def filterNumbers[T](value: T)(implicit tjs: Writes[T]): Option[Instant] = {
Option(Json.toJson(value)).filter(_.asOpt[JsNumber].isEmpty).flatMap(_.asOpt[Instant])
}
Then the following:
println(filterNumbers(Instant.now()))
println(filterNumbers(0))
will output:
Some(2021-02-22T10:35:13.777Z)
None

Parser combinator handling alternation with an optional parser

I have a parser p of type Parser[Option[X]] and another q of type Parser[Y]. (X and Y are concrete types but that's not important here).
I'd like to combine them in such a way that the resulting parser returns a Parser[Either[X, Y]]. This parser will succeed with Left(x) if p yields Some(x) or, failing that, it will succeed with Right(y) if q yields a y. Otherwise, it will fail. Input will be consumed in the successful cases but not in the unsuccessful case.
I'd appreciate any help with this as I can't quite figure out how to make it work.
A little more perseverance after taking a break and I was able to solve this. I don't think my solution is the most elegant and would appreciate feedback:
def compose[X, Y](p: Parser[Option[X]], q: Parser[Y]): Parser[Either[X, Y]] = Parser {
in =>
p(in) match {
case s#this.Success(Some(_), _) => s map (xo => Left(xo.get))
case _ => q(in) match {
case s#this.Success(_, _) => s map (x => Right(x))
case _ => this.Failure("combine: failed", in)
}
}
}
implicit class ParserOps[X](p: Parser[Option[X]]) {
def ?|[Y](q: => Parser[Y]): Parser[Either[X, Y]] = compose(p, q)
}
// Example of usage
def anadicTerm: Parser[AnadicTerm] = (maybeNumber ?| anadicOperator) ^^ {
case Left(x: Number) => debug("anadicTerm (Number)", AnadicTerm(Right(x)))
case Right(x: String) => debug("anadicTerm (String)", AnadicTerm(Left(x)))
}

Handling and throwing Exceptions in Scala

I have the following implementation:
val dateFormats = Seq("dd/MM/yyyy", "dd.MM.yyyy")
implicit def dateTimeCSVConverter: CsvFieldReader[DateTime] = (s: String) => Try {
val elem = dateFormats.map {
format =>
try {
Some(DateTimeFormat.forPattern(format).parseDateTime(s))
} catch {
case _: IllegalArgumentException =>
None
}
}.collectFirst {
case e if e.isDefined => e.get
}
if (elem.isDefined)
elem.get
else
throw new IllegalArgumentException(s"Unable to parse DateTime $s")
}
So basically what I'm doing is that, I'm running over my Seq and trying to parse the DateTime with different formats. I then collect the first one that succeeds and if not I throw the Exception back.
I'm not completely satisfied with the code. Is there a better way to make it simpler? I need the exception message passed on to the caller.
The one problem with your code is it tries all patterns no matter if date was already parsed. You could use lazy collection, like Stream to solve this problem:
def dateTimeCSVConverter(s: String) = Stream("dd/MM/yyyy", "dd.MM.yyyy")
.map(f => Try(DateTimeFormat.forPattern(format).parseDateTime(s))
.dropWhile(_.isFailure)
.headOption
Even better is the solution proposed by jwvh with find (you don't have to call headOption):
def dateTimeCSVConverter(s: String) = Stream("dd/MM/yyyy", "dd.MM.yyyy")
.map(f => Try(DateTimeFormat.forPattern(format).parseDateTime(s))
.find(_.isSuccess)
It returns None if none of patterns matched. If you want to throw exception on that case, you can uwrap option with getOrElse:
...
.dropWhile(_.isFailure)
.headOption
.getOrElse(throw new IllegalArgumentException(s"Unable to parse DateTime $s"))
The important thing is, that when any validation succeedes, it won't go further but will return parsed date right away.
This is a possible solution that iterates through all the options
val dateFormats = Seq("dd/MM/yyyy", "dd.MM.yyyy")
val dates = Vector("01/01/2019", "01.01.2019", "01-01-2019")
dates.foreach(s => {
val d: Option[Try[DateTime]] = dateFormats
.map(format => Try(DateTimeFormat.forPattern(format).parseDateTime(s)))
.filter(_.isSuccess)
.headOption
d match {
case Some(d) => println(d.toString)
case _ => throw new IllegalArgumentException("foo")
}
})
This is an alternative solution that returns the first successful conversion, if any
val dateFormats = Seq("dd/MM/yyyy", "dd.MM.yyyy")
val dates = Vector("01/01/2019", "01.01.2019", "01-01-2019")
dates.foreach(s => {
dateFormats.find(format => Try(DateTimeFormat.forPattern(format).parseDateTime(s)).isSuccess) match {
case Some(format) => println(DateTimeFormat.forPattern(format).parseDateTime(s))
case _ => throw new IllegalArgumentException("foo")
}
})
I made it sweet like this now! I like this a lot better! Use this if you want to collect all the successes and all the failures. Note that, this might be a bit in-efficient when you need to break out of the loop as soon as you find one success!
implicit def dateTimeCSVConverter: CsvFieldReader[DateTime] = (s: String) => Try {
val (successes, failures) = dateFormats.map {
case format => Try(DateTimeFormat.forPattern(format).parseDateTime(s))
}.partition(_.isSuccess)
if (successes.nonEmpty)
successes.head.get
else
failures.head.get
}

Scala return variable type after future is complete Add Comment Collapse

I've got a problem with returning a list after handling futures in scala. My code looks like this:
def getElements(arrayOfIds: Future[Seq[Int]]): Future[Seq[Element]] = {
var elementArray: Seq[Element] = Seq()
arrayOfIds.map {
ids => ids.map(id => dto.getElementById(id).map {
case Some(element) => elementArray = elementArray :+ element
case None => println("Element not found")
})
}
arrayOfIds.onComplete(_ => elementArray)
}
I'd like to do something like .onComplete, however the return type is
Unit and I'd like to return a Future[Seq[Whatever]]. Is there clean way to handle futures like this? Thanks!
Please provide the type of function dto.getElementById. If it is Int => Future[Option[Element]], then:
def getElements(arrayOfIds: Future[Seq[Int]]): Future[Seq[Element]] = {
val allElements: Future[Seq[Option[Element]]] = arrayOfIds.flatMap( ids =>
Future.sequence(ids.map(dto.getElementById))
)
allElements.map(_.flatMap{
case None => println();None
case some => some
})
}
Without logging, it would be:
arrayOfIds.flatMap( ids => Future.traverse(ids.map(dto.getElementById))(_.flatten))
Instead of assigning the result to a mutable variable, return it from the continuation of the Future. You can use flatMap to extract only the Element results which actually contain a value:
def getElements(arrayOfIds: Future[Seq[Int]]): Future[Seq[Element]] = {
arrayOfIds.flatMap(id => Future.fold(id.map(getElementById))(Seq.empty[Element])(_ ++ _))
}

Cleanest way in Scala to avoid nested ifs when transforming collections and checking for error conditions in each step

I have some code for validating ip addresses that looks like the following:
sealed abstract class Result
case object Valid extends Result
case class Malformatted(val invalid: Iterable[IpConfig]) extends Result
case class Duplicates(val dups: Iterable[Inet4Address]) extends Result
case class Unavailable(val taken: Iterable[Inet4Address]) extends Result
def result(ipConfigs: Iterable[IpConfig]): Result = {
val invalidIpConfigs: Iterable[IpConfig] =
ipConfigs.filterNot(ipConfig => {
(isValidIpv4(ipConfig.address)
&& isValidIpv4(ipConfig.gateway))
})
if (!invalidIpConfigs.isEmpty) {
Malformatted(invalidIpConfigs)
} else {
val ipv4it: Iterable[Inet4Address] = ipConfigs.map { ipConfig =>
InetAddress.getByName(ipConfig.address).asInstanceOf[Inet4Address]
}
val dups = ipv4it.groupBy(identity).filter(_._2.size != 1).keys
if (!dups.isEmpty) {
Duplicates(dups)
} else {
val ipAvailability: Map[Inet4Address, Boolean] =
ipv4it.map(ip => (ip, isIpAvailable(ip)))
val taken: Iterable[Inet4Address] = ipAvailability.filter(!_._2).keys
if (!taken.isEmpty) {
Unavailable(taken)
} else {
Valid
}
}
}
}
I don't like the nested ifs because it makes the code less readable. Is there a nice way to linearize this code? In java, I might use return statements, but this is discouraged in scala.
I personally advocate using a match everywhere you can, as it in my opinion usually makes code very readable
def result(ipConfigs: Iterable[IpConfig]): Result =
ipConfigs.filterNot(ipc => isValidIpv4(ipc.address) && isValidIpv4(ipc.gateway)) match {
case Nil =>
val ipv4it = ipConfigs.map { ipc =>
InetAddress.getByName(ipc.address).asInstanceOf[Inet4Address]
}
ipv4it.groupBy(identity).filter(_._2.size != 1).keys match {
case Nil =>
val taken = ipv4it.map(ip => (ip, isIpAvailable(ip))).filter(!_._2).keys
if (taken.nonEmpty) Unavailable(taken) else Valid
case dups => Duplicates(dups)
}
case invalid => Malformatted(invalid)
}
Note that I've chosen to match on the else part first, since you generally go from specific to generic in matches, since Nil is a subclass of Iterable I put that as the first case, eliminating the need for an i if i.nonEmpty in the other case, since it would be a given if it didn't match Nil
Also a thing to note here, all your vals don't need the type explicitly defined, it significantly declutters the code if you write something like
val ipAvailability: Map[Inet4Address, Boolean] =
ipv4it.map(ip => (ip, isIpAvailable(ip)))
as simply
val ipAvailability = ipv4it.map(ip => (ip, isIpAvailable(ip)))
I've also taken the liberty of removing many one-off variables I didn't find remotely necessary, as all they did was add more lines to the code
A thing to note here about using match over nested ifs, is that is that it's easier to add a new case than it is to add a new else if 99% of the time, thereby making it more modular, and modularity is always a good thing.
Alternatively, as suggested by Nathaniel Ford, you can break it up into several smaller methods, in which case the above code would look like so:
def result(ipConfigs: Iterable[IpConfig]): Result =
ipConfigs.filterNot(ipc => isValidIpv4(ipc.address) && isValidIpv4(ipc.gateway)) match {
case Nil => wellFormatted(ipConfigs)
case i => Malformatted(i)
}
def wellFormatted(ipConfigs: Iterable[IpConfig]): Result = {
val ipv4it = ipConfigs.map(ipc => InetAddress.getByName(ipc.address).asInstanceOf[Inet4Address])
ipv4it.groupBy(identity).filter(_._2.size != 1).keys match {
case Nil => noDuplicates(ipv4it)
case dups => Duplicates(dups)
}
}
def noDuplicates(ipv4it: Iterable[IpConfig]): Result =
ipv4it.map(ip => (ip, isIpAvailable(ip))).filter(!_._2).keys match {
case Nil => Valid
case taken => Unavailable(taken)
}
This has the benefit of splitting it up into smaller more manageable chunks, while keeping to the FP ideal of having functions that only do one thing, but do that one thing well, rather than having god-methods that do everything.
Which style you prefer, of course is up to you.
This has some time now but I will add my 2 cents. The proper way to handle this is with Either. You can create a method like:
def checkErrors[T](errorList: Iterable[T], onError: Result) : Either[Result, Unit] = if(errorList.isEmpty) Right() else Left(onError)
so you can use for comprehension syntax
val invalidIpConfigs = getFormatErrors(ipConfigs)
val result = for {
_ <- checkErrors(invalidIpConfigs, Malformatted(invalidIpConfigs))
dups = getDuplicates(ipConfigs)
_ <- checkErrors(dups, Duplicates(dups))
taken = getAvailability(ipConfigs)
_ <- checkErrors(taken, Unavailable(taken))
} yield Valid
If you don't want to return an Either use
result.fold(l => l, r => r)
In case of the check methods uses Futures (could be the case for getAvailability, for example), you can use cats library to be able of use it in a clean way: https://typelevel.org/cats/datatypes/eithert.html
I think it's pretty readable and I wouldn't try to improve it from there, except that !isEmpty equals to nonEmpty.