Scala - how to implement Try inside a map function in Spark

I have one stage of a Spark job failing due to a java.lang.NullPointerException thrown by a function in a map transformation.
My idea is to get the corrupted Sale object from inside the map with the help of a Try type.
So I intentionally assigned the function result to a saleOption variable in order to then do pattern matching.
Unfortunately, my current implementation does not work and I need advice on how to fix it. I will be grateful for any suggestions.
Here is the initial method:
def filterSales(rawSales: RDD[Sale]): RDD[(String, Sale)] = {
rawSales
.map(sale => sale.id -> sale) // throws NullPointerException
.reduceByKey((sale1, sale2) => if (sale1.timestamp > sale2.timestamp) sale1 else sale2)
}
Here is how I implemented my idea:
def filterSales(rawSales: RDD[Sale]): RDD[(String, Sale)] = {
rawSales
.map(sale => {
val saleOption: Option[(String, Sale)] = Try(sale.id -> sale).toOption
saleOption match {
case Success(successSale) => successSale
case Failure(e) => throw new IllegalArgumentException(s"Corrupted sale: $rawSale;", e)
}
})
.reduceByKey((sale1, sale2) => if (sale1.timestamp > sale2.timestamp) sale1 else sale2)
}
UPD: My intention is to implement the idea for debugging purposes and to improve my Scala knowledge. I'm not going to use Try and Exceptions for flow control.

If you just want to ignore null Sales, filter them out instead of throwing an exception. For example:
rawSales
.flatMap(Option(_))
.keyBy(_.id)
.reduceByKey(
(sale1, sale2) => if (sale1.timestamp > sale2.timestamp) sale1 else sale2
)

Try should not be used for flow control. Exceptions should be used only in exceptional cases. The best solution is to fix your NullPointerException. If there shouldn't be any nulls, then you have an error in your code that's generating the RDD. If you expect potential null values, such as from malformed input data, then you should really use an RDD[(String,Option[Sale])].
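For the debugging goal stated in the question (finding which record is corrupted), the Try has to be matched as a Try: calling .toOption first discards the exception, and the Success/Failure patterns do not match an Option. A minimal sketch on plain values rather than an RDD, assuming a hypothetical Sale case class (the real one is not shown in the question):

```scala
import scala.util.{Failure, Success, Try}

object SaleDebug {
  // Hypothetical stand-in for the Sale class, which the question does not show.
  final case class Sale(id: String, timestamp: Long)

  // Key a sale by id; if that throws (for example because the sale is null),
  // rethrow with the offending element in the message and the cause attached.
  def keyById(sale: Sale): (String, Sale) =
    Try(sale.id -> sale) match {
      case Success(pair) => pair
      case Failure(e) =>
        throw new IllegalArgumentException(s"Corrupted sale: $sale", e)
    }

  // Non-throwing variant: keep the failure as a Left so it can be inspected.
  def keyByIdEither(sale: Sale): Either[Throwable, (String, Sale)] =
    Try(sale.id -> sale).toEither
}
```

The same function could be passed to the map step, as in rawSales.map(SaleDebug.keyById), to surface the bad record while debugging.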

Related

How to Check If Some Futures in a Collection Have Failed

I used traverse to execute a collection of futures like this:
val result: Future[List[Either[Error, Int]]] = Future.traverse(urls)(foo(_))
I end up with a Future[List[Either[Error, Int]]]. How can I check that one of these futures resulted in an Error?
I tried to do this but I think it is wrong because I am reading that you cannot substitute variables for futures?
val check: Future[Boolean] = result.map{
fut => fut.exists(c => c.isLeft)
}
check.map{
b => b match {
case true => // do something
case false => // do something
}
}
You can convert the result to a list of errors like this:
val errors: Future[List[Error]] = result.map(_.collect{ case Left(err) => err })
It is then possible to use Await.result to extract these error values, but that is nearly always a bad idea because it blocks the current thread.
It is better to ask "What do I want to do once the Future is complete but returns errors?". Then implement that behaviour in a map or foreach on the errors Future.
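As a rough sketch of that advice, the follow-up behaviour can live inside a map on the result, so nothing blocks. The error type is left generic here because the question's Error type is not shown, and the report function and its messages are illustrative assumptions:

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object FutureErrors {
  // Decide what to do once all results are in, without blocking the caller.
  def report[E](result: Future[List[Either[E, Int]]]): Future[String] =
    result.map { outcomes =>
      // Keep only the Left values, i.e. the calls that failed.
      val errors = outcomes.collect { case Left(err) => err }
      if (errors.isEmpty) s"all ${outcomes.size} calls succeeded"
      else s"${errors.size} of ${outcomes.size} calls failed: ${errors.mkString(", ")}"
    }
}
```

The returned Future[String] can then be chained further or consumed with foreach.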

Using Option with .map() and .getOrElse()

I am trying to read a value from a Map[String, String] given a key.
This key/value pair is optional, in that it might not be there.
So, I want to use Option and then map & getOrElse as below to write the value if it's there, or set it to some default in case it's not there.
val endpoint:String = Option(config.getString("endpoint"))
.map(_.value())
.getOrElse()
The code above fails with "Symbol value is inaccessible from this place"
config is a Map[String, Object]
getString is a method on config that takes in the key, and returns the value
public String getString(String key){
<...returns value...>
}
I could just drop the Option() and do the following, but then I have to deal with the exception that will be thrown by getString():
val endpoint:String = config.getString("endpoint")
Any ideas what's wrong with this, or how to fix this?
Better ways of writing this?
UPDATE: I need to mention that config is an object in an imported Java library. Not sure if that makes a difference or not.
If I understand your question correctly, config.getString will throw an exception when the key is not present. In this case, wrapping the call in Option() will not help catch that exception: you should wrap in Try instead and convert that to an Option.
Try[String] represents a computation that can either succeed and become a Success(String), or fail and give you a Failure(thrownException). If you're familiar with Option, this is very similar to the two possibilities of Some and None, except that Failure will wrap the exception so that you know what caused the problem. The Try(someComputation) method will just do something like this for you:
try {
Success(someComputation)
} catch {
case NonFatal(ex) => Failure(ex) // scala.util.control.NonFatal: fatal errors are not caught
}
The second thing to consider is what you actually want to happen when there is no value. One sensible idea is to provide a default configuration, and this is what getOrElse is for: you can't use it without giving it a default value!
Here is an example:
val endpoint = Try(config.getString("endpoint"))
.toOption
.getOrElse("your_default_value")
We can do even better: now that we're using Try to catch the exception, there is no need to convert to Option if we're going to access the value right away.
val endpoint = Try(config.getString("endpoint")).getOrElse("your_default_value")
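To make the difference between Option() and Try concrete, here is a small sketch. FakeConfig is a hypothetical stand-in for the Java config object, assumed to throw when the key is missing, as the question describes:

```scala
import scala.util.Try

object ConfigLookup {
  // Hypothetical stand-in for the imported Java config class.
  // Assumption: getString throws when the key is absent.
  final class FakeConfig(values: Map[String, String]) {
    def getString(key: String): String =
      values.getOrElse(key, throw new NoSuchElementException(s"missing key: $key"))
  }

  // Try captures the exception, and getOrElse supplies the fallback.
  def endpointOrDefault(config: FakeConfig, default: String): String =
    Try(config.getString("endpoint")).getOrElse(default)
}
```

With an empty FakeConfig, endpointOrDefault returns the default, whereas Option(config.getString("endpoint")) would still throw, because the exception is raised before Option ever sees a value.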
You can get a value from a map like this.
val m: Map[String, String] = Map("foo" -> "bar")
val res = m.get("foo").getOrElse("N.A")
val res2 = m.getOrElse("foo", "N.A") // same as above but cleaner
But perhaps if you want to use pattern matching:
val o: Option[String] = m.get("foo")
val res: String = o match {
case Some(value) => value
case None => "N.A"
}
Finally, a safe way to handle reading from config.
val endpoint: String = config.getString("endpoint") // this can return null
val endpointOpt: Option[String] = Option(config.getString("endpoint")) // None if getString returns null; exceptions are not caught
I suspect the config object might even have a method like
val endpoint: Option[String] = config.getStringOpt("endpoint")
Then you can use pattern matching to extract the value in the option, or one of the many combinators such as map, flatMap, or fold:
val endPoint = Option(config.getString("endpoint"))
def callEndPoint(endPoint: String): Future[Result] = ??? // calls endpoint
endPoint match {
case Some(ep) => callEndPoint(ep)
case None => Future.failed(new NoSuchElementException("End point not found"))
}
Or
val foo = endPoint.map(callEndPoint).getOrElse(Future.failed(new NoSuchElement...))

Getting lost in Scala Futures

I'm slowly wrapping my brain around Futures in Scala, and have a bit of a layer cake going on that I'm trying to unravel.
The specific use case is a DeferredResolver in sangria-graphql + akka. I've stolen their demo code, which looks like this
Future.fromTry(Try(
friendIds map (id => CharacterRepo.humans.find(_.id == id) orElse CharacterRepo.droids.find(_.id == id))))
and added my own modification to it. Theirs does an in-memory lookup, whereas mine asks something of another actor:
Future.fromTry(Try(
accountIds match {
case h :: _ =>
val f = sender ? TargetedMessage(h)
val resp = Await.result(f, timeout.duration).asInstanceOf[TargetedMessage]
marshallAccount(resp.body)
case _ => throw new Exception("Not found")
}
))
The pertinent piece here is that I pick the first element in the list, send it to an ActorRef that I got elsewhere and wait for the result. This works. What I'd like to do, however, is not have to wait for the result here, but return the whole thing as a Future
Future.fromTry(Try(
accountIds match {
case h :: _ =>
sender ? TargetedMessage(h) map {
case resp:TargetedMessage => marshallAccount(resp.body)
}
case _ => throw new Exception("Not found")
}
))
This doesn't work. When this is consumed, instead of being of type Account (the return type of the function marshallAccount), it's of type Promise. If I understand correctly, that's because instead of having a return type of Future[Account], this has a type of Future[Future[Account]].
How do I flatten this?
You are looking at the wrong API method. Future.fromTry is used to create an immediately resolved Future, meaning the call is not actually asynchronous. Dive into the implementation of Future.fromTry which will take you to:
def fromTry[T](result: Try[T]): Promise[T] = new impl.Promise.KeptPromise[T](result)
A promise kept is basically something that has already happened, so just like Future.successful this is just used to ensure the right return type or similar, it's not actually a way to make something async.
The reason why the return type is Future[Future[Something]] is because you are trying to wrap something that already returns a future into another future.
The ask pattern, namely sender ? TargetedMessage(h), is a way to ask something of an actor and wait for the result; it returns a Future.
The correct way to approach this:
val future: Future[Account] = accountIds match {
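case h :: _ => (sender ? TargetedMessage(h)).mapTo[TargetedMessage].map(resp => marshallAccount(resp.body)) // ask returns Future[Any], so narrow the type first
case _ => Future.failed(new Exception("Not found")) // pass the exception; throwing here would escape the Future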
}
Basically you need to use Future.failed to return a failed future from an exception if you want to keep the return type consistent. It's worth reviewing this tutorial to learn a bit more about Futures and how to write application logic with them.
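If a Future[Future[A]] has already been produced, it can be collapsed with flatMap. The sketch below reproduces the nesting without Akka; ask is a hypothetical stand-in for sender ? TargetedMessage(h), and this Account is a placeholder type:

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.Try

object FlattenFutures {
  // Placeholder for the real Account type.
  final case class Account(name: String)

  // Hypothetical stand-in for `sender ? TargetedMessage(h)`: returns Future[Any].
  def ask(id: String): Future[Any] = Future.successful(s"account-$id")

  // Wrapping a Future-returning expression in Future.fromTry nests the futures...
  def nested(id: String): Future[Future[Account]] =
    Future.fromTry(Try(ask(id).mapTo[String].map(name => Account(name))))

  // ...and flatMap on the outer future collapses the two layers into one.
  def flattened(id: String): Future[Account] =
    nested(id).flatMap(inner => inner)
}
```

Avoiding the outer wrapper in the first place, as in the answer above, is usually simpler than flattening afterwards.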

How do I raise failure with Scala Try

In Spark, I can handle exceptions this way:
val myRDD = sc.textFile(path)
.map(line => Try {
// do something dangerous
// if(condition)
// raise isFailure;
}).filter(_.isSuccess).map(_.get)
I'd like the element read by the first map function to raise failure as well, under some conditions. How do I do it?
If you're ignoring the actual errors, you can just filter for the condition (given that the condition depends entirely on the line):
def condition(line: String): Boolean = ???
val myRDD = sc.textFile(path)
.filter(condition) // the condition depends only on the line, so test it before the Try
.map(line => Try { /* do something dangerous */ })
.filter(_.isSuccess)
.map(_.get)
You could use Either
myRDD.map(line=> if (condition) Left("some Error") else Right(somevalue))
.collect { case Right(v) => v }
This gives a result of whatever was in the successful mappings. With this approach you probably want to do something with the failures as well (for example with a partition { }). If not, use a straightforward Option:
myRDD.map(line=> if (condition) None else Some(somevalue))
.flatMap(opt => opt) // RDD has no flatten; flatMap drops the Nones
If you wanted to combine last option with your Try, use something like
myRDD.map(line=> Try{
/*something dangerous*/
if (condition) None else Some(somevalue)
})
.collect{ case Success(r) => r } // successful results
.flatMap(opt => opt) // flatten out the Nones (RDD has no flatten)
IMHO (and many others), refrain from throwing exceptions in scala yourself. I see them as fancy goto-statements that don't jump through code within one dimension, but through different dimensions (i.e, the stack). They are part of the Java platform, so we need to deal with them when we access Java code (plus you might have to throw some if you are building something which will be used by Java developers)
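To raise a failure from inside the pipeline without throwing, a Try can also be turned into a Failure explicitly via flatMap. A minimal sketch on a plain List rather than an RDD; treating negative numbers as the failure condition is purely an illustrative assumption:

```scala
import scala.util.{Failure, Success, Try}

object RaiseFailure {
  // Parse a line; turn a successful parse into a Failure when the
  // (hypothetical) condition holds, here: the number is negative.
  def parse(line: String): Try[Int] =
    Try(line.trim.toInt).flatMap { n =>
      if (n < 0) Failure(new IllegalArgumentException(s"negative value: $n"))
      else Success(n)
    }

  // Keep only the successful results, as the question's filter/map does.
  def parseAll(lines: List[String]): List[Int] =
    lines.map(parse).collect { case Success(n) => n }
}
```

Both kinds of failure (the NumberFormatException from toInt and the explicit Failure) end up in the same Try, so the later filtering step does not need to tell them apart.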

How to eliminate val repetition in this Scala assign/test/throw sequence

This code throws an exception when properties.keySet contains keys that are not present in EXPECTED_IMPORT_KEYS. The val is referenced three times in the code:
val unexpectedKeys = properties.keySet -- EXPECTED_IMPORT_KEYS
if (unexpectedKeys.nonEmpty) {
throw new UnexpectedKeysException(unexpectedKeys)
}
Is there some more elegant way to achieve this in Scala? I am thinking in particular of the repeated val references. Can those repetitions be eliminated?
It might help to know that the unexpectedKeys val is not required after the code completes.
The improvement I am looking for is a reduction from three in the number of times the val occurs. It is not necessary to have a val, that's just my initial formulation.
If you need this often, just define a little helper method:
def emptyOption[C <: Iterable[_]](coll: C): Option[C] =
if (coll.isEmpty) None else Some(coll)
Then use it like this:
scala> emptyOption(Set[Int]()) foreach (coll => throw new RuntimeException(coll.toString))
scala> emptyOption(Set[Int](1)) foreach (coll => throw new RuntimeException(coll.toString))
java.lang.RuntimeException: Set(1)
You could try:
(properties.keySet -- EXPECTED_IMPORT_KEYS) match {
case residual if residual.nonEmpty => throw new UnexpectedKeysException(residual)
case _ =>
}
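Another possible variant, which avoids naming the intermediate set at all, wraps it in an Option and filters on non-emptiness. The exception class and the expected-key set below are hypothetical stand-ins, since the originals are not shown:

```scala
object KeyCheck {
  // Hypothetical stand-in for the question's UnexpectedKeysException.
  final class UnexpectedKeysException(val keys: Set[String])
      extends RuntimeException(s"Unexpected keys: ${keys.mkString(", ")}")

  // Hypothetical stand-in for EXPECTED_IMPORT_KEYS.
  val ExpectedImportKeys: Set[String] = Set("name", "path")

  // Throws only when the difference set is non-empty.
  def validate(properties: Map[String, String]): Unit =
    Some(properties.keySet -- ExpectedImportKeys)
      .filter(_.nonEmpty)
      .foreach(keys => throw new UnexpectedKeysException(keys))
}
```

Whether this reads better than the pattern match is a matter of taste; it trades the named val for a lambda parameter.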