Error Handling on Monix parallel tasks (using parMap) - scala

I am trying to use Monix to parallelize certain operations and then perform error handling.
Let's say I am trying to parse and validate a couple of objects like this:
def parseAndValidateX(x: X): Task[X]
and
def parseAndValidateY(y: Y): Task[Y]
Here X and Y are some types I have defined.
Now each of these methods evaluates some criteria and returns a Task. If the evaluation fails, I have some code of the form:
Task.raiseError(new CustomException("X is invalid because certain reasons a,b,c"))
I have a similar raiseError for Y:
Task.raiseError(new CustomException("Y is invalid because certain reasons a',b',c'"))
Now I have this type
case class Combined(x: X, y: Y)
and I defined this
private def parseAndValidateCombined(x: X, y: Y): Task[Combined] = {
  val xTask = parseAndValidateX(x)
  val yTask = parseAndValidateY(y)
  Task.parMap2(xTask, yTask) {
    (xEval, yEval) => Combined(xEval, yEval)
  }
}
This should allow me to run the validation in parallel, and sure enough I get the response.
However I also want this behavior
In the event both tasks fail, I want to return an error like this
Task.raiseError(new CustomException("X is invalid because certain reasons a,b,c and Y is invalid because certain reasons a',b',c'"))
I cannot seem to do this. Depending on which of the two tasks fails, I am only able to get one of the two failures in the onRecover handler on the parMap2 output.
In case both fail, I only get the error for task X.
Is it possible for me to accomplish this with Monix in a fully async way (for instance, with some other method of composing tasks)? Or would I have to block the execution, get the errors individually, and recompose the values?

With parMap2 alone, it is not possible to accomplish what you want.
The documentation says:
In case one of the tasks fails, then all other tasks get
cancelled and the final result will be a failure.
However, it is possible to accomplish what you want by exposing the errors instead of hiding them behind the monadic error handling. This is possible via the materialize method.
So, for instance, you can implement your method as:
private def parseAndValidateCombined(x: X, y: Y): Task[Combined] = {
  val xTask = parseAndValidateX(x).materialize // turns it into Task[Try[X]]
  val yTask = parseAndValidateY(y).materialize // turns it into Task[Try[Y]]
  Task.parMap2(xTask, yTask) {
    case (Success(xEval), Success(yEval)) => Success(Combined(xEval, yEval))
    case (Failure(_), Failure(_)) => Failure[Combined](new CustomException(....))
    case (Failure(left), _) => Failure[Combined](left)
    case (_, Failure(right)) => Failure[Combined](right)
  }.dematerialize // turns it back into Task[Combined]
}
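The pairing logic inside that parMap2 call can be sketched on plain scala.util.Try values, including the branch that aggregates both messages. This is a minimal, self-contained sketch: CustomException here is a hypothetical stand-in for your exception type, and combineTries is the function you would pass to parMap2 over the two materialized tasks before calling dematerialize.

```scala
import scala.util.{Try, Success, Failure}

// Hypothetical stand-in for the question's CustomException
final case class CustomException(msg: String) extends Exception(msg)

// Combine two already-evaluated Try values; when both failed,
// aggregate the two messages into a single CustomException.
def combineTries[A, B](ta: Try[A], tb: Try[B]): Try[(A, B)] =
  (ta, tb) match {
    case (Success(a), Success(b)) => Success((a, b))
    case (Failure(l), Failure(r)) =>
      Failure(CustomException(s"${l.getMessage} and ${r.getMessage}"))
    case (Failure(l), _) => Failure(l)
    case (_, Failure(r)) => Failure(r)
  }
```

Note the both-failed case must come before the single-failure cases, otherwise one of them would shadow it.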

Related

Why does Finatra use flatMap and not just map?

This might be a really dumb question but I am trying to understand the logic behind using #flatMap and not just #map in this method definition in Finatra's HttpClient definition:
def executeJson[T: Manifest](request: Request, expectedStatus: Status = Status.Ok): Future[T] = {
  execute(request) flatMap { httpResponse =>
    if (httpResponse.status != expectedStatus) {
      Future.exception(new HttpClientException(httpResponse.status, httpResponse.contentString))
    } else {
      Future(parseMessageBody[T](httpResponse, mapper.reader[T]))
        .transformException { e =>
          new HttpClientException(httpResponse.status, s"${e.getClass.getName} - ${e.getMessage}")
        }
    }
  }
}
Why create a new Future when I can just use #map and instead have something like:
execute(request) map { httpResponse =>
  if (httpResponse.status != expectedStatus) {
    throw new HttpClientException(httpResponse.status, httpResponse.contentString)
  } else {
    try {
      FinatraObjectMapper.parseResponseBody[T](httpResponse, mapper.reader[T])
    } catch {
      case e: Throwable => throw new HttpClientException(httpResponse.status, s"${e.getClass.getName} - ${e.getMessage}")
    }
  }
}
Would this be purely a stylistic difference, with Future.exception simply being better style here, since throwing almost looks like a side effect (in reality it's not, as it doesn't exit the context of the Future)? Or is there something more behind it, such as order of execution?
Tl;dr:
What's the difference between throwing within a Future vs returning a Future.exception?
From a theoretical point of view, if we take away the exceptions part (they cannot be reasoned about using category theory anyway), then those two operations are completely identical as long as your construct of choice (in your case Twitter Future) forms a valid monad.
I don't want to go into length over these concepts, so I'm just going to present the laws directly (using Scala Future):
import scala.concurrent.ExecutionContext.Implicits.global
// Functor identity law
Future(42).map(x => x) == Future(42)
// Monad left-identity law
val f = (x: Int) => Future(x)
Future(42).flatMap(f) == f(42)
// combining those two, since every Monad is also a Functor, we get:
Future(42).map(x => x) == Future(42).flatMap(x => Future(x))
// and if we now generalise identity into any function:
Future(42).map(x => x + 20) == Future(42).flatMap(x => Future(x + 20))
So yes, as you already hinted, those two approaches are identical.
However, there are three comments that I have on this, given that we are including exceptions into the mix:
Be careful - when it comes to throwing exceptions, Scala Future (probably Twitter too) violates the left-identity law on purpose, in order to trade it off for some extra safety.
Example:
import scala.concurrent.ExecutionContext.Implicits.global
def sneakyFuture = {
  throw new Exception("boom!")
  Future(42)
}
val f1 = Future(42).flatMap(_ => sneakyFuture)
// Future(Failure(java.lang.Exception: boom!))
val f2 = sneakyFuture
// Exception in thread "main" java.lang.Exception: boom!
As @randbw mentioned, throwing exceptions is not idiomatic in FP, and it violates principles such as purity of functions and referential transparency of values.
Scala and Twitter Future make it easy for you to just throw an exception - as long as it happens in a Future context, exception will not bubble up, but instead cause that Future to fail. However, that doesn't mean that literally throwing them around in your code should be permitted, because it ruins the structure of your programs (similarly to how GOTO statements do it, or break statements in loops, etc.).
Preferred practice is to always evaluate every code path into a value instead of throwing bombs around, which is why it's better to flatMap into a (failed) Future than to map into some code that throws a bomb.
Keep in mind referential transparency.
If you use map instead of flatMap and someone takes the code from the map and extracts it out into a function, then you're safer if this function returns a Future, otherwise someone might run it outside of Future context.
Example:
import scala.concurrent.ExecutionContext.Implicits.global
Future(42).map(x => {
  // this should be done inside a Future
  x + 1
})
This is fine. But after a completely valid refactoring (which relies on referential transparency), your code becomes this:
def f(x: Int) = {
  // this should be done inside a Future
  x + 1
}
Future(42).map(x => f(x))
And you will run into problems if someone calls f directly. It's much safer to wrap the code into a Future and flatMap on it.
Of course, you could argue that even when using flatMap someone could rip the f out of .flatMap(x => Future(f(x))), but it's not that likely. On the other hand, simply extracting the response-processing logic into a separate function fits perfectly with functional programming's idea of composing small functions into bigger ones, and it's likely to happen.
From my understanding of FP, exceptions are not thrown. This would be, as you said, a side-effect. Exceptions are instead values that are handled at some point in the execution of the program.
Cats (and I'm sure other libraries, too) employs this technique as well (https://github.com/typelevel/cats/blob/master/core/src/main/scala/cats/ApplicativeError.scala).
Therefore, the flatMap call allows the exception to be contained within a satisfied Future here and handled at a later point in the program's execution where other exception value handling may also occur.
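The containment of both styles can be checked directly with scala.concurrent.Future, which behaves like Twitter's Future in this respect. A small sketch (IllegalStateException stands in for HttpClientException, which is hypothetical here):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// map + throw: the exception is caught and becomes a failed Future
val viaThrow: Future[Int] =
  Future(1).map(_ => throw new IllegalStateException("bad status"))

// flatMap + failed Future: the failure is returned as a value
val viaFailed: Future[Int] =
  Future(1).flatMap(_ => Future.failed(new IllegalStateException("bad status")))

// Neither line above throws at the call site; both futures simply
// fail with the same exception once awaited.
```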

Is it good practice to use a Try for flow control?

There's a function in our codebase with a signature like this:
def hasPermission(...): Try[Unit]
It basically checks if a user has permission to perform a certain action on a certain item. If the user has permission, it returns an empty Success, and if not, it returns a Failure with a specific exception type. This function is often used within comprehensions like this:
for {
  _ <- hasPermission(...)
  res <- doSomething()
} yield res
This seems like bad practice, but I can't quite articulate why I feel that way. To me, it seems like hasPermission should simply return a Boolean.
Is this an appropriate use of a Try?
edit: I think my question is different than the linked one because it's more specific. That one is asking a general question about returning Try[Unit], which I believe is acceptable in some cases.
If the method says hasPermission, then I'd say it should return a Boolean, or a Try[Boolean]. Try[Unit] is not as obvious as Try[Boolean], and the caller would have to inspect the exception to tell if it didn't have the permission, or whether it failed to retrieve the permission info.
Now that said, generally calling hasPermission and then acting depending on the result can cause race conditions (e.g. if the permission is revoked after hasPermission is called). Therefore it's often preferable to do def doSomething(...): Try[Unit] and then raise e.g. a NoPermissionException.
Generally, the use of exceptions for control flow is an anti-pattern
Try tries (no pun intended) to encapsulate that flow control, but if you don't actually need to use exceptions, there's no reason to. Scala 2.12's Either implementation seems close to what you probably want:
Either is right-biased, which means that Right is assumed to be the default case to operate on. If it is Left, operations like map and flatMap return the Left value unchanged:
Let's assume that you are implementing a web server, and this logic is in control of a particular path:
type Request = ...
type Response = String
type ResponseOr[+T] = Either[Response, T]

def checkPermission: ResponseOr[Unit] =
  if (hasPermission) Right(())
  else Left("insufficient permissions")

def doSomething(req: Request): ResponseOr[Something] =
  if (argumentsAreBad(req)) Left("arguments are bad!")
  else Right(new Something)

def makeResponse(result: Something): Response = ???

def handleIt(req: Request): Response = {
  val result = for {
    _ <- checkPermission
    result <- doSomething(req)
  } yield makeResponse(result)
  result.merge // special method for `Either[T, T]` that gives a `T`
}
You'll see similar behavior to Success vs Failure - think of Left as analogous to a Failure, where if at any point one of the flatMap/map steps returns a Left, that's the end result and the rest are "skipped". No exceptions required.
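A self-contained sketch of that short-circuiting, with a boolean standing in for the real permission check and an Int argument standing in for the request (both hypothetical simplifications):

```scala
def checkPermission(allowed: Boolean): Either[String, Unit] =
  if (allowed) Right(()) else Left("insufficient permissions")

def doSomething(arg: Int): Either[String, Int] =
  if (arg < 0) Left("arguments are bad!") else Right(arg * 2)

def handleIt(allowed: Boolean, arg: Int): String = {
  // The first Left encountered becomes the result; later steps are skipped.
  val result = for {
    _   <- checkPermission(allowed)
    res <- doSomething(arg)
  } yield s"result: $res"
  result.merge // Either[String, String] => String
}
```

Here handleIt(false, 21) never runs doSomething at all; the Left from checkPermission is the final value.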

Abstraction for asynchronous computation that cannot fail

Scala Futures are a fine abstraction for asynchronous computation that can fail. What abstraction should I use when I know that there is no failure possible?
Here is a concrete use case:
case class Service123Error(e: Throwable)
val f1: Future[User] =
  service123.call()

val f2: Future[Either[Service123Error, User]] =
  f1.map(Right.apply)
    .recover { case e => Left(Service123Error(e)) }
In this code, the types do not model the fact that f2 always successfully completes. Given several values similar to f2, I know I can Future.sequence over them (i.e. without the fail fast behavior), but the types are not carrying this information.
I would not stick an Either into a Future if it carries the error information. Simply use the failure side of the Future:
Future(Right(user))              -> a Future[User] that succeeds with user
Future(Left(Service123Error(e))) -> a Future[User] that fails with Service123Error(e)
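A sketch of that advice, collapsing the f2 shape back into the Future's own failure channel. Assumptions: Service123Error is made to extend Exception so it can occupy the failure side, and User is a hypothetical stand-in type.

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

final case class User(name: String)
final case class Service123Error(e: Throwable) extends Exception(e)

// Fold the Either back into the Future's failure channel.
def useFailureSide(f2: Future[Either[Service123Error, User]]): Future[User] =
  f2.flatMap {
    case Right(user) => Future.successful(user)
    case Left(err)   => Future.failed(err)
  }
```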

Scala: Throw Error vs Return Try?

Should a Scala API ideally throw exceptions or return a Try value? Is there an official guideline regarding this?
def doSomethingMayThrow(): A = ???
def doSomethingWontThrow(): Try[A] = ???
Never throw exceptions for recoverable errors.
Returning appropriate data structure representing a possible failure (a Future, a Try, an Either and so on) is always preferable than throwing exceptions in the wild. It will inform the caller about the possibility of a failure, and it will force them to manage it.
Exceptions should only be thrown for unrecoverable errors, such as hardware failures and similar.
Let's make an example:
def throwOnOdds(x: Int): Int =
  if (x % 2 == 0) x / 2 else throw new Exception("foo")
val x = throwOnOdds(41) + 2 // compiles and explodes at runtime
now, let's make it better
def throwOnOdds(x: Int): Try[Int] =
  if (x % 2 == 0) Success(x / 2) else Failure(new Exception("foo"))
val x = throwOnOdds(41) + 2 // doesn't compile
Not handling the failure leads to a compile-time error, which is way better than a runtime one. Here's how to handle it
throwOnOdds(41) match {
  case Success(n) => // do something with n
  case Failure(e) => // handle exception
}
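The same handling can also stay expression-oriented with combinators instead of a match; a small sketch reusing throwOnOdds:

```scala
import scala.util.{Try, Success, Failure}

def throwOnOdds(x: Int): Try[Int] =
  if (x % 2 == 0) Success(x / 2) else Failure(new Exception("foo"))

// map only runs on Success; getOrElse supplies the fallback,
// so every code path is still a value.
val ok  = throwOnOdds(42).map(_ + 2).getOrElse(-1) // 23
val bad = throwOnOdds(41).map(_ + 2).getOrElse(-1) // -1
```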
See monadic datatypes. Using monadic datatypes is much more expressive and clear than throwing exceptions and would be the preferred way of declaratively handling all cases without unclear side effects.
http://en.wikipedia.org/wiki/Monad_(functional_programming)
The advantage of using a failure vs success and using map and flatMap to express the 'happy path' is that exceptions/failures become explicit in the chain.
Whereas you cannot tell from the signature of doSomethingMayThrow that calling it might have a side effect (e.g. throwing an exception), this is very clear when you use a monadic datatype as the result.
It would help to look at a real-world case. I'll use this one, as it is something I have worked on recently:
Consider a monadic future in a caching scenario: if the cache returns a result, you can process it. If the cache does not return a result, we can go to the actual service we are trying to cache results from, and we can express all of this very explicitly, without any unclear implied side effects such as exceptions.
Here recoverWith is like flatMap for the error case (return a Future[Model] instead of the failure), and recover is like map for the error case (return a model instead of the failure). Then map takes whatever the resulting model is and processes it - all cases and conditions are clearly defined in a single expression.
(userCacheActor ? GetUserModel(id = "abc"))
  .recoverWith { case _ => userServiceActor ? GetUserModel(id = "abc") }
  .recover { case _ => new ErrorModel(errorCode = 100, body = "service error") }
  .map(x => createHttpResponseFromModel(x))

def createHttpResponseFromModel(x: Model) =
  x match {
    case model: ErrorModel => ???
    case model: UserModel => ???
  }
Again, everything is expressively notated - what to do with the failure of the cache hit, what to do if the service can't respond in that scenario, what to do with the result model at the end of all of the processing in any case.
Often flatMap is talked about as the 'plumber' of the monad or 'plumber' of the happy path. flatMap allows you to take another Monad and return it. So you can 'try' multiple scenarios and write the happy path code collecting any errors at the end.
scala> Option("Something").flatMap(x => Option( x + " SomethingElse"))
.flatMap(x => None).getOrElse("encountered None somewhere")
res1: String = encountered None somewhere
scala> scala.util.Try("tried")
.flatMap(x => scala.util.Try( x + " triedSomethingElse"))
.flatMap(x => scala.util.Try{throw new Exception("Oops")})
.getOrElse("encountered exception")
res2: String = encountered exception

Costly computation occurring in both isDefinedAt and apply of a PartialFunction

It is quite possible that, to know whether a function is defined at some point, a significant part of computing its value has to be done. In a PartialFunction, when implementing isDefinedAt and apply, both methods will have to do that work. What to do if this common job is costly?
There is the possibility of caching its result, hoping that apply will be called after isDefined. Definitely ugly.
I often wish that PartialFunction[A,B] were Function[A, Option[B]], which is clearly isomorphic. Or maybe there could be another method in PartialFunction, say applyOption(a: A): Option[B]. With some mixins, implementors would have a choice of implementing either isDefinedAt and apply, or applyOption. Or all of them, to be on the safe side performance-wise. Clients which test isDefinedAt just before calling apply would be encouraged to use applyOption instead.
However, this is not so. Some major methods in the library, among them collect on collections, require a PartialFunction. Is there a clean (or not so clean) way to avoid paying for computations repeated between isDefinedAt and apply?
Also, is the applyOption(a: A): Option[B] method reasonable? Does it sound feasible to add it in a future version? Would it be worth it?
Why is caching such a problem? In most cases, you have a local computation, so as long as you write a wrapper for the caching, you needn't worry about it. I have the following code in my utility library:
class DroppedFunction[-A,+B](f: A => Option[B]) extends PartialFunction[A,B] {
  private[this] var tested = false
  private[this] var arg: A = _
  private[this] var ans: Option[B] = None
  private[this] def cache(a: A) {
    if (!tested || a != arg) {
      tested = true
      arg = a
      ans = f(a)
    }
  }
  def isDefinedAt(a: A) = {
    cache(a)
    ans.isDefined
  }
  def apply(a: A) = {
    cache(a)
    ans.get
  }
}
class DroppableFunction[A,B](f: A => Option[B]) {
  def drop = new DroppedFunction(f)
}
implicit def function_is_droppable[A,B](f: A => Option[B]) = new DroppableFunction(f)
and then if I have an expensive computation, I write a function A => Option[B] and do something like (f _).drop to use it in collect or whatnot. (If you wanted to do it inline, you could create a method that takes A => Option[B] and returns a partial function.)
(The opposite transformation--from PartialFunction to A => Option[B]--is called lifting, hence the "drop"; "unlift" is, I think, a more widely used term for the opposite operation.)
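For the record, the standard library ships both directions of that isomorphism: PartialFunction#lift gives an A => Option[B], and Function.unlift goes back. A small sketch:

```scala
// A partial function defined only on even numbers
val halveEvens: PartialFunction[Int, Int] = { case x if x % 2 == 0 => x / 2 }

val lifted: Int => Option[Int] = halveEvens.lift          // total, Option-valued
val unlifted: PartialFunction[Int, Int] = Function.unlift(lifted)

// collect goes through applyOrElse, which the unlifted function
// overrides to evaluate `lifted` once per element, rather than once
// for isDefinedAt and once more for apply.
val halved = List(1, 2, 3, 4).collect(unlifted) // List(1, 2)
```

So for callers that use applyOrElse (as collect does), Function.unlift already avoids the duplicated work, without the manual caching above.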
Have a look at this thread, Rethinking PartialFunction. You're not the only one wondering about this.
This is an interesting question, and I'll give my 2 cents.
First off, resist the urge for premature optimization. Make sure the partial function really is the problem; I was amazed at how fast they are in some cases.
Now assuming there is a problem, where would it come from?
Could be a large number of case clauses
Complex pattern matching
Some complex computation in the if clauses (guards)
One option I'd try is to find ways to fail fast. Break the pattern matching into layers, then chain partial functions. This way you can fail the match early. Also extract repeated sub-matching. For example:
Let's assume OddEvenList is an extractor that breaks a list into an odd list and an even list:
val pf1: PartialFunction[List[Int], R] = {
  case OddEvenList(1 :: ors, 2 :: ers) => R1
  case OddEvenList(3 :: ors, 4 :: ers) => R2
}
Break it into two parts: one that matches the split, and one that tries to match the result (to avoid the repeated computation). However, this may require some re-engineering:
val pf2: PartialFunction[(List[Int], List[Int]), R] = {
  case (1 :: ors, 2 :: ers) => R1
  case (3 :: ors, 4 :: ers) => R2
}
val pf1: PartialFunction[List[Int], R] = {
  case OddEvenList(ors, ers) if pf2.isDefinedAt((ors, ers)) => pf2((ors, ers))
}
I have used this when progressively reading XML files that had a rather inconsistent format.
Another option is to compose partial functions using andThen, although a quick test here seemed to indicate that only the first one is actually tested.
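The layering above can be made runnable with a concrete OddEvenList extractor. This is a hypothetical implementation (the original answer doesn't define it): it splits a list into the elements at even positions and those at odd positions, so the expensive split happens in one place and pf2 does the cheap matching.

```scala
object OddEvenList {
  // Split into elements at even indices and elements at odd indices.
  def unapply(xs: List[Int]): Option[(List[Int], List[Int])] = {
    val (a, b) = xs.zipWithIndex.partition(_._2 % 2 == 0)
    Some((a.map(_._1), b.map(_._1)))
  }
}

// Layer 2: cheap matching on the already-split lists
val pf2: PartialFunction[(List[Int], List[Int]), String] = {
  case (1 :: _, 2 :: _) => "R1"
  case (3 :: _, 4 :: _) => "R2"
}

// Layer 1: compute the split once, then delegate to pf2
val pf1: PartialFunction[List[Int], String] = {
  case OddEvenList(ors, ers) if pf2.isDefinedAt((ors, ers)) => pf2((ors, ers))
}
```

With this layering, a non-matching input like List(5, 6) fails at the guard without ever attempting the per-case matches.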
There is absolutely nothing wrong with a caching mechanism inside the partial function, if:
the function always returns the same output when passed the same argument
it has no side effects
it is completely hidden from the rest of the world
Such a cached function is not distinguishable from a plain old pure partial function...