In Spark, I can handle exceptions this way:
val myRDD = sc.textFile(path)
.map(line => Try {
// do something dangerous
// if(condition)
// raise isFailure;
}).filter(_.isSuccess).map(_.get)
I'd like the element read by the first map function to raise failure as well, under some conditions. How do I do it?
If you're ignoring the actual errors, you can just filter for the condition (given that the condition depends entirely on the line):
def condition(line: String): Boolean = ???
val myRDD = sc.textFile(path)
  .filter(condition) // condition takes the raw line, so apply it before wrapping in Try
  .map(line => Try { /* do something dangerous */ })
  .filter(_.isSuccess)
  .map(_.get)
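A minimal sketch of the same pipeline on a plain List standing in for the RDD, so it runs without a Spark context. The condition used here (skip comment lines) and the parsing step are assumptions made up for illustration; the point is that the condition is applied to the raw line and the Try only guards the dangerous step.

```scala
import scala.util.Try

object FilterThenTry {
  // Hypothetical condition: keep only lines that are not comments.
  def condition(line: String): Boolean = !line.startsWith("#")

  // Stand-in for sc.textFile(path); a List offers the same map/filter calls.
  val lines: List[String] = List("1", "# skipped", "oops", "3")

  val parsed: List[Int] = lines
    .filter(condition)                 // drop lines rejected by the condition
    .map(line => Try(line.trim.toInt)) // the "dangerous" step, may throw
    .filter(_.isSuccess)               // drop the elements that threw
    .map(_.get)                        // safe: only Success values remain
}
```

Here "# skipped" is removed by the condition and "oops" is removed because parsing it throws.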
You could use Either
myRDD.map(line=> if (condition) Left("some Error") else Right(somevalue))
.collect { case Right(v) => v }
This keeps only the results of the successful mappings. With Either you would normally also do something with the failures (for example, split the two cases with a partition { }). If you don't need the failures, use a straightforward Option instead:
myRDD.map(line => if (condition) None else Some(somevalue))
  .flatMap(_.toSeq) // RDD has no flatten method; flatMap drops the Nones
If you wanted to combine last option with your Try, use something like
myRDD.map(line=> Try{
/*something dangerous*/
if (condition) None else Some(somevalue)
})
.collect{ case Success(r) => r } // successful results
.flatMap(_.toSeq) // drop the Nones (RDD has no flatten method)
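As a rough, self-contained illustration of combining Try with Option, here is the same idea on a plain List (a List does have flatten, unlike an RDD). The input values and the "negative numbers are rejected" rule are hypothetical.

```scala
import scala.util.{Success, Try}

object TryWithOption {
  val lines: List[String] = List("10", "-5", "oops", "7")

  val results: List[Int] = lines
    .map { line =>
      Try {
        val n = line.toInt           // may throw NumberFormatException
        if (n < 0) None else Some(n) // hypothetical rejection condition
      }
    }
    .collect { case Success(r) => r } // keep successful results: List[Option[Int]]
    .flatten                          // drop the Nones
}
```

The element "oops" is dropped because the Try failed, and "-5" is dropped because the condition turned it into None.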
IMHO (and in the opinion of many others), refrain from throwing exceptions yourself in Scala. I see them as fancy goto statements that don't jump through code within one dimension, but across dimensions (i.e., up the stack). They are part of the Java platform, so we have to deal with them when we call Java code (and you might have to throw some if you are building something that will be used by Java developers).
Related
I have a function f which returns a Future[Unit]. I want to apply the function f onto a sequence of Strings, such that I get a Seq[Future[Unit]].
To convert it into a single Future, we are using Future.sequence which converts it into a Future[Seq[Unit]]. Now, since the function f can either fail or pass, we also convert it into a Try (to better handle the failures) using FutureUtil.toTry which gives us a Future[Try[Seq[Unit]]].
Now, we don't need to know which Futures passed and which didn't; the main task is to find out whether all of them passed. If any of them fails, we stop the execution.
So, I was wondering if there was some "elegant" way to find this and we could simply remove the Seq from the final Future and have something like Future[Try[Unit]].
A code example (which should help understand the problem in a much better way)
def f(s: String): Future[Unit] = {
if(s.isEmpty)
Future.failed(new Throwable("lalala"))
else
Future.successful(())
}
val strings: Seq[String] = Seq[String]("abc", "xyz", "lol")
val stringsFuture: Seq[Future[Unit]] = strings.map({ s =>
f(s)
})
val futureStrings: Future[Seq[Unit]] = Future.sequence(stringsFuture)
val futureStringsTry: Future[Try[Seq[Unit]]] = FutureUtil.toTry(futureStrings)
Is there a way where we can convert futureStringsTry to a simple Future[Try[Unit]].
A naive solution would be to map over futureStringsTry, something like this:
val finalFuture: Future[Try[Unit]] = futureStringsTry.map({
  case Success(_) => Success(())
  case Failure(exception) => Failure(exception)
})
But, is there some other way where we can "elegantly" evaluate whether the whole Sequence passed or not?
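One possible approach, sketched under assumptions: since FutureUtil.toTry is not a standard-library helper and its definition isn't shown, this version uses only Future.traverse and Future.transform from the standard library (transform with a Try => Try function needs Scala 2.12 or later). The names runAll, runAllTry and demo are made up for the example.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Success, Try}

object AllSucceeded {
  def f(s: String): Future[Unit] =
    if (s.isEmpty) Future.failed(new IllegalArgumentException("empty string"))
    else Future.successful(())

  // Runs f on every string; the combined future fails if any single one fails.
  def runAll(strings: Seq[String]): Future[Unit] =
    Future.traverse(strings)(f).map(_ => ())

  // Lifts the outcome into a Try, so the returned future itself always succeeds.
  def runAllTry(strings: Seq[String]): Future[Try[Unit]] =
    runAll(strings).transform(t => Success(t))

  // Blocking with Await is for demonstration only.
  def demo(): (Try[Unit], Try[Unit]) = (
    Await.result(runAllTry(Seq("abc", "xyz")), 5.seconds),
    Await.result(runAllTry(Seq("abc", "")), 5.seconds)
  )
}
```

Discarding the Seq with map(_ => ()) is enough when only "did everything pass" matters; the Try wrapper is optional, because a plain Future[Unit] already carries the failure.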
I used traverse to execute a collection of futures like this:
val result: Future[List[Either[Error, Int]]] = Future.traverse(urls)(foo(_))
I end up with a Future[List[Either[Error, Int]]]. How can I check that one of these futures resulted in an Error?
I tried to do this but I think it is wrong because I am reading that you cannot substitute variables for futures?
val check: Future[Boolean] = result.map{
fut => fut.exists(c => c.isLeft)
}
check.map{
b => b match {
case true => // do something
case false => // do something
}
}
You can convert the result to a list of errors like this:
val errors: Future[List[Error]] = result.map(_.collect{ case Left(err) => err })
It is then possible to use Await.result to extract these error values, but that is nearly always a bad idea because it blocks the current thread.
It is better to ask "What do I want to do once the Future is complete but returns errors?". Then implement that behaviour in a map or foreach on the errors Future.
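A small sketch of that idea, assuming a stand-in foo that returns Either[String, Int] (the question's Error type and real foo are not shown, so both are hypothetical here). The decision about what to do with the errors lives inside map, so nothing blocks until the very end of the demo.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object CollectErrors {
  // Hypothetical stand-in for foo: rejects empty URLs, otherwise returns the length.
  def foo(url: String): Future[Either[String, Int]] =
    Future.successful(if (url.isEmpty) Left("empty url") else Right(url.length))

  // Builds a summary once all futures have completed, without blocking.
  def summary(urls: List[String]): Future[String] =
    Future.traverse(urls)(foo).map { results =>
      val errors = results.collect { case Left(err) => err }
      if (errors.isEmpty) s"all ${results.size} succeeded"
      else s"${errors.size} failed: ${errors.mkString(", ")}"
    }

  // Blocking with Await is for demonstration only.
  def demo(): String =
    Await.result(summary(List("a.example", "", "b.example")), 5.seconds)
}
```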
I have one stage of a Spark job failing due to a java.lang.NullPointerException thrown by a function in a map transformation.
My idea is to get the corrupted Sale object from inside the map with the help of a Try type.
So I intentionally assigned the function result to a saleOption variable in order to then do pattern matching.
Unfortunately, my current implementation does not work and I need advice on how to fix it. I will be grateful for any suggestions.
Here is the initial method:
def filterSales(rawSales: RDD[Sale]): RDD[(String, Sale)] = {
rawSales
.map(sale => sale.id -> sale) // throws NullPointerException
.reduceByKey((sale1, sale2) => if (sale1.timestamp > sale2.timestamp) sale1 else sale2)
}
Here is how I implemented my idea:
def filterSales(rawSales: RDD[Sale]): RDD[(String, Sale)] = {
rawSales
.map(sale => {
val saleOption: Option[(String, Sale)] = Try(sale.id -> sale).toOption
saleOption match {
case Success(successSale) => successSale
case Failure(e) => throw new IllegalArgumentException(s"Corrupted sale: $rawSale;", e)
}
})
.reduceByKey((sale1, sale2) => if (sale1.timestamp > sale2.timestamp) sale1 else sale2)
}
UPD: My intention is to implement the idea for debugging purposes and to improve my Scala knowledge. I'm not going to use Try and Exceptions for flow control.
If you just want to ignore null Sales, filter them out instead of throwing an exception. For example:
rawSales
.flatMap(Option(_))
.keyBy(_.id)
.reduceByKey(
(sale1, sale2) => if (sale1.timestamp > sale2.timestamp) sale1 else sale2
)
Try should not be used for flow control. Exceptions should be used only in exceptional cases. The best solution is to fix your NullPointerException. If there shouldn't be any nulls, then you have an error in your code that's generating the RDD. If you expect potential null values, such as from malformed input data, then you should really use an RDD[(String,Option[Sale])].
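For the debugging goal stated in the question, a sketch along these lines may help. Note that Try(...).toOption yields an Option, which cannot be matched against Success and Failure, and it also discards the exception; matching on the Try itself keeps it. The Sale class below is a hypothetical stand-in, and a plain List replaces the RDD so the example runs without Spark.

```scala
import scala.util.{Failure, Success, Try}

object CorruptedSales {
  // Hypothetical Sale; the real class is not shown in the question.
  final case class Sale(id: String, timestamp: Long)

  // Pairs a sale with its id, or reports which element could not be keyed.
  def keyed(sale: Sale): Either[String, (String, Sale)] =
    Try(sale.id -> sale) match {
      case Success(pair) => Right(pair)
      case Failure(e)    => Left(s"Corrupted sale: $sale (${e.getClass.getSimpleName})")
    }

  val rawSales: List[Sale] = List(Sale("a", 1L), null, Sale("a", 2L))
  val results: List[Either[String, (String, Sale)]] = rawSales.map(keyed)

  // Descriptions of the elements that failed.
  val corrupted: List[String] = results.collect { case Left(msg) => msg }

  // Latest sale per id, mirroring the reduceByKey step.
  val latest: Map[String, Sale] = results
    .collect { case Right(pair) => pair }
    .groupBy(_._1)
    .map { case (id, pairs) => id -> pairs.map(_._2).maxBy(_.timestamp) }
}
```

Collecting the Left values reports the bad records without aborting the whole job, which is usually more useful for debugging than rethrowing.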
I'm writing an authentication client that takes an Option[Credentials] as a parameter. This Credentials object has a .token method on it which I will then use to construct an HTTP request to post to an endpoint. This returns a Future[HttpResponse], which I then need to validate, unmarshal, and then convert back to my return type, which is an Option[String].
My first thought was to use a for comprehension like this:
val resp = for {
c <- creds
req <- buildRequest(c.token)
resp <- Http().singleRequest(req)
} yield resp
but then I found out that monads cannot be composed like that. My next thought is to do something like this:
val respFut = Http().singleRequest(buildRequest(token))
respFut.onComplete {
case Success(resp) => Some("john.doe")//do stuff
case Failure(_) => None
}
Unfortunately onComplete returns a unit, and map leaves me with a Future[Option[String]], and the only way I currently know to strip off the future wrapper is using the pipeTo methods in the akka framework. How can I convert this back to just an option string?
Once you've got a Future[T], it's usually good practice to not try to unbox it until you absolutely have to. Can you change your method to return a Future[Option[String]]? How far up the call stack can you deal with futures? Ideally it's all the way.
Something like this will give you a Future[Option[String]] as a result:
val futureResult: Future[Option[String]] = creds match {
  case Some(c) =>
    val req = buildRequest(c.token)
    val futureResponse = Http().singleRequest(req)
    futureResponse.map(res => Some(convertResponseToString(res)))
  case None => Future.successful(None)
}
If you really need to block and wait on the result, you can use Await.result.
And if you want to do it in a more monadic style (in a for-comprehension, like you tried), cats has an OptionT type that will help with that, and I think scalaz does as well. But whether you want to get into either of those libraries is up to you.
It's easy to "upgrade" an Option to a Future[Option[...]], so use Future as your main monad. And deal with the simpler case first:
val f: Future[Option[String]] =
// no credential? just wrap a `None` in a successful future
credsOpt.fold(Future.successful(Option.empty[String])) {creds =>
Http()
.singleRequest(buildRequest(creds.token))
.map(convertResponseToString)
.recover {case _ => Option.empty[String]}
}
The only way to turn that future into Option[String] is to wait for it with Await.result(...)... but it's better if that future can be passed along to the next caller (no blocking).
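A self-contained sketch of the fold approach, with the HTTP call replaced by a stub so that it runs without Akka. Credentials, fetchUser and the "valid" token are hypothetical; fetchUser stands in for the request plus unmarshalling and is assumed to return a plain String.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object OptionToFuture {
  final case class Credentials(token: String) // hypothetical

  // Hypothetical stand-in for the HTTP call plus unmarshalling.
  def fetchUser(token: String): Future[String] =
    if (token == "valid") Future.successful("john.doe")
    else Future.failed(new RuntimeException("rejected"))

  def authenticate(credsOpt: Option[Credentials]): Future[Option[String]] =
    // No credentials: wrap None in an already-completed future.
    credsOpt.fold(Future.successful(Option.empty[String])) { creds =>
      fetchUser(creds.token)
        .map(Option(_))                             // Future[Option[String]]
        .recover { case _ => Option.empty[String] } // a failed call becomes None
    }

  // Blocking with Await is for demonstration only.
  def demo(): List[Option[String]] = List(
    authenticate(None),
    authenticate(Some(Credentials("valid"))),
    authenticate(Some(Credentials("expired")))
  ).map(f => Await.result(f, 5.seconds))
}
```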
I'm not 100% certain about what all your types are, but it seems like you want a for comprehension that mixes option and futures. I've often been in that situation and I find I can just chain my for comprehensions as a way to make the code look a bit better.
val resp = for {
c <- creds
req <- buildRequest(c.token)
} yield for {
resp <- Http().singleRequest(req)
} yield resp
resp becomes an Option[Future[HttpResponse]] which you can match / partial func around with None meaning the code never got to execute because it failed its conditions. This is a dumb little trick I use to make comprehensions look better and I hope it gives you a hint towards your solution.
I have a Scala Option[T]. If the value is Some(x), I want to process it with a procedure that does not return a value (Unit), but if it is None, I want to print an error.
I can use the following code to do this, but I understand that the more idiomatic way is to treat the Option[T] as a sequence and use map, foreach, etc. How do I do this?
opt match {
case Some(x) => // process x with no return value, e.g. write x to a file
case None => // print error message
}
I think explicit pattern matching suits your use case best.
Scala's Option used to be missing a method to do exactly this (a built-in fold with this shape was added in Scala 2.10). On older versions I add one:
class OptionWrapper[A](o: Option[A]) {
def fold[Z](default: => Z)(action: A => Z) = o.map(action).getOrElse(default)
}
implicit def option_has_utility[A](o: Option[A]) = new OptionWrapper(o)
which has the slightly nicer (in my view) usage
op.fold{ println("Empty!") }{ x => doStuffWith(x) }
You can see from how it's defined that map/getOrElse can be used instead of pattern matching.
Alternatively, Either already has a fold method. So you can
op.toRight(()).fold(_ => println("Empty!"), x => doStuffWith(x))
but this is a little clumsy given that you have to provide the left value (here (), i.e. Unit) and then define a function on that, rather than just stating what you want to happen on None.
The pattern match isn't bad either, especially for longer blocks of code. For short ones, the overhead of the match starts getting in the way of the point. For example:
op.fold{ printError }{ saveUserInput }
has a lot less syntactic overhead than
op match {
case Some(x) => saveUserInput(x)
case None => printError
}
and therefore, once you expect it, is a lot easier to comprehend.
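On Scala 2.10 and later, Option has a built-in fold with the same two-argument-list shape as the wrapper above, so no implicit conversion is needed. A minimal sketch (describe is a made-up name for the example):

```scala
object OptionFold {
  // Built-in Option.fold: the first argument handles None, the second handles Some.
  def describe(op: Option[String]): String =
    op.fold("Empty!")(x => s"Got $x")
}
```

The same call works for side effects, since both branches can simply return Unit.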
I'd recommend simply using opt.get, which itself throws a NoSuchElementException if opt is None. Or, if you want to throw your own exception, you can do this:
val x = opt.getOrElse(throw new Exception("Your error message"))
// x is of type T
As @missingfaktor says, you are in the exact scenario where pattern matching gives the most readable results.
If Option has a value you want to do something, if not you want to do something else.
While there are various ways to use map and other functional constructs on Option types, they are generally useful when:
you want to use the Some case and ignore the None case e.g. in your case
opt.map(writeToFile(_)) //(...if None just do nothing)
or you want to chain the operations on more than one option and give a result only when all of them are Some. For instance, one way of doing this is:
val concatThreeOptions =
for {
n1 <- opt1
n2 <- opt2
n3 <- opt3
} yield n1 + n2 + n3 // this will be None if any of the three is None
// we will either write them all to a file or none of them
but none of these seem to be your case
Pattern matching is the best choice here.
However, if you want to treat Option as a sequence and to map over it, you can do it, because Unit is a value:
opt map { v =>
println(v) // process v (result type is Unit)
} getOrElse {
println("error")
}
By the way, printing an error is some kind of "anti-pattern", so it's better to throw an exception anyway:
opt.getOrElse(throw new SomeException)