How can I catch spark.read FileNotFoundException on Spark read? - scala

import scala.util.{Try, Success, Failure}
import org.apache.spark.sql.{DataFrame, Row}

def extractor: DataFrame = {
  Try {
    spark.read.schema(myschema).parquet(mypath)
  } match {
    case Success(df) =>
      log("EXTRACTION SUCCESSFUL")
      df
    case Failure(exception) =>
      log("EXTRACTION UNSUCCESSFUL")
      // build an empty DataFrame with the expected schema
      spark.createDataFrame(spark.sparkContext.emptyRDD[Row], myschema)
  }
}
I call this extractor function in my Spark job A. The issue is that mypath is refreshed every half hour by another job B. So when job A reads mypath it catalogues the file names; by the time the actual action is performed, the files have changed, the catalogue is stale, and job A throws a FileNotFoundException.
I want to be able to catch this exception and move on.
But this is what is currently happening:
The above function logs "EXTRACTION SUCCESSFUL",
but job A throws a "Job aborted" exception, which I can see in YARN.
How can I catch this exception and return an empty dataset from the extractor function?

Spark, and hence your function, is not reading the data in the files at this point; it is only analysing the path and schema. The data is read when an action is invoked, so you need to catch the exception at the action you mention.
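One way to apply that, reusing the question's spark, myschema, mypath and log, is to force a small action inside the Try so a stale file listing fails where it can be caught. This is only a hedged sketch: it relies on the result staying cached, so a later re-read of the path after eviction could still fail.

import scala.util.{Try, Success, Failure}
import org.apache.spark.sql.{DataFrame, Row}

def extractor: DataFrame = {
  Try {
    val df = spark.read.schema(myschema).parquet(mypath)
    df.cache()
    df.count() // action inside the Try: a stale file listing fails here, not later in job A
    df
  } match {
    case Success(df) =>
      log("EXTRACTION SUCCESSFUL")
      df
    case Failure(_) =>
      log("EXTRACTION UNSUCCESSFUL")
      spark.createDataFrame(spark.sparkContext.emptyRDD[Row], myschema) // empty DataFrame with the expected schema
  }
}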

Related

Can Spark ForEachPartitionAsync be async on worker nodes?

I am writing a custom Spark sink. In my addBatch method I use foreachPartitionAsync which, if I'm not wrong, only makes the driver work asynchronously, returning a future.
val work: FutureAction[Unit] = rdd.foreachPartitionAsync { rows =>
  val sourceInfo: StreamSourceInfo = serializeRowsAsInputStream(schema, rows)

  val ackIngestion = Future {
    ingestRows(sourceInfo)
  } andThen {
    case Success(ingestion) => ackIngestionDone(partitionId, ingestion)
  }

  Await.result(ackIngestion, timeOut) // I would like to remove this line..
}

work onSuccess {
  case _ => // move data from temporary table, report success of all workers
}
work onFailure {
  // delete tmp data
  case t => throw t.getCause
}
I can't find a way to run the worker nodes without blocking on the Await call: if I remove it, success is reported to the work future object even though the futures inside the partitions haven't really finished.
Is there a way to report to the driver that all the workers finished their asynchronous jobs?
Note: I looked at the foreachPartitionAsync function and it has only one implementation, which expects a function that returns Unit (I would have expected another one returning a Future, or maybe a CountDownLatch..)
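For reference, a toy sketch (not the asker's sink; sc, the numbers and the timeout are made up) of the behaviour being described: the FutureAction completes as soon as each partition's closure returns, so any Future started inside the closure has to be awaited there for work to reflect its outcome.

import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

val work = sc.parallelize(1 to 100, 4).foreachPartitionAsync { rows =>
  val asyncIngest = Future { rows.size } // stand-in for the real ingestRows call
  Await.result(asyncIngest, 30.seconds)  // drop this line and the closure returns immediately,
                                         // so work reports success before the ingestion finishes
}

work.onComplete(result => println(s"all partitions done: $result"))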

Spark: catch all exceptions and print to string

I have some Spark code and I need to catch all exceptions and store them to a file. I tried to catch the exception and print it, but it prints empty:
try {
  /* Some spark code */
} catch {
  case e: Exception => {
    println(" ************** " + e.printStackTrace())
  }
}
The output currently prints nothing useful, just: ************** ()
printStackTrace doesn't return a stack trace; it just prints it to stderr. If you want to store it in a file you can:
a) call e.getStackTrace and save each element manually, or
b) call e.printStackTrace(s) where s is a PrintStream or a PrintWriter pointing to your output file.
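A minimal sketch of option (b), assuming a hypothetical output path:

import java.io.{File, PrintWriter}

try {
  /* Some spark code */
} catch {
  case e: Exception =>
    val writer = new PrintWriter(new File("/tmp/spark-error.log")) // hypothetical path
    try e.printStackTrace(writer) // writes the full stack trace to the file
    finally writer.close()
}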
See also: Scala: Silently catch all exceptions

How to catch exception in future in play framework 2.4

I'm trying to figure out how to catch an exception from within a future in a function being called by an asynchronous action in Play Framework 2.4. However, the code I've got using recover never seems to get executed - I always get an Execution exception page rather than an Ok response.
The action code is:
def index = Action.async {
  cardRepo.getAll()
    .map {
      cards => Ok(views.html.cardlist(cards))
    }.recover {
      case e: Exception => Ok(e.getMessage)
    }
}
The code in cardRepo.getAll (into which I've hard-coded a throw new Exception for experimenting) is:
def getAll(): Future[Seq[Card]] = {
  implicit val cardFormat = Json.format[Card]
  val cards = collection.find(Json.obj())
    .cursor[Card]()
    .collect[Seq]()
  throw new Exception("OH DEAR")
  cards
}
I've seen similar questions on Stack Overflow but I can't see what I'm doing wrong.
Thanks Mon Calamari - I think I understand now. The future is coming from collection.find, so if the error were thrown inside that, my code would work; but because I've put it in the enclosing function, there is no Future at that point.
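In other words (a minimal sketch, reusing the question's names; the real ReactiveMongo query would replace the throw): wrapping the body in Future { ... } turns the thrown exception into a failed Future, which the action's .recover can then handle.

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

def getAll(): Future[Seq[Card]] = Future {
  // simulated failure: it now surfaces as a failed Future instead of a synchronous throw
  throw new Exception("OH DEAR")
}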

scheduler.properties file could not be found - Quartz scheduler

I have a problem while getting the instance of the Quartz Scheduler: not on the first call, but on subsequent calls.
This is my piece of code.
public void getClusteredSchedulerInstance() {
    try {
        cluteredScheduler = new StdSchedulerFactory("scheduler.properties").getScheduler();
        if (!cluteredScheduler.isStarted()) {
            cluteredScheduler.start();
        }
    } catch (SchedulerException e) {
        logger.error("Error while starting clustered scheduler", e);
    }
}
When I call the method for the first time it reads the properties file and returns the instance, but it fails to do so on subsequent calls.
Can I know why this happens?
Note: scheduler.properties is located in the current working directory.
Error message
org.quartz.SchedulerException: Properties file: 'scheduler.properties' could not be read. [See nested exception: java.io.FileNotFoundException: scheduler.properties (The system cannot find the file specified)]

How to make Play print all the errors

In our Scala, Play, ReactiveMongo app we have a big problem with exception handling: when there is an error in an Iteratee/Enumeratee or in the actor system, Play just swallows it without logging anything to the output. So we effectively need to guess where, and why, the error might have happened.
We overrode Global to always print the error, and specified logger.root=TRACE, but still saw no output from which we could analyse our problems.
How can I forcibly make Play print all the errors?
I didn't find a way to explicitly log everything, but there is a way to log exceptions locally.
I did this:
def recover[T] = Enumeratee.recover[T] {
  case (e, input) => Logger.error("Error happened on:" + input, e)
}
and then used it on all the enumeratees that can produce errors:
def transform(from: Enumerator[From]): Enumerator[String] = {
  heading >>> (from &> recover[From] ><> mapper) >>> tailing
}
Here, mapper throws exceptions, and they are all logged.
I think your problem is with how Future works in Scala. Let's take the following example:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

val f: Future[Int] = Future {
  throw new NullPointerException("NULL")
  1
}

f.map(s1 => { println(s" ==> $s1"); s" ==> $s1" })
This code will throw an exception, but the stack trace will not be printed because the Future captures the error.
If you want to get the error that happened, you can just call:
import scala.util.{Success, Failure}

f.onComplete {
  case Success(e) => {}
  case Failure(e) => e.printStackTrace()
}
e is a Throwable that you can use however you want to handle the error.
The solution I ended up using is overriding error handling in Play (https://www.playframework.com/documentation/2.4.2/ScalaErrorHandling), basically creating an HttpErrorHandler that logs all the errors with the needed level of detail.
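For reference, a hedged sketch of such a handler against the Play 2.4 ScalaErrorHandling API linked above (the log messages and response bodies are my own, not the poster's exact code):

import javax.inject.Singleton
import scala.concurrent.Future
import play.api.Logger
import play.api.http.HttpErrorHandler
import play.api.mvc.{RequestHeader, Result}
import play.api.mvc.Results._

@Singleton
class ErrorHandler extends HttpErrorHandler {

  // client errors (4xx): logged at warn level with the request that triggered them
  def onClientError(request: RequestHeader, statusCode: Int, message: String): Future[Result] = {
    Logger.warn(s"Client error $statusCode on ${request.method} ${request.uri}: $message")
    Future.successful(Status(statusCode)("A client error occurred: " + message))
  }

  // server errors (exceptions thrown while handling the request): logged with the full stack trace
  def onServerError(request: RequestHeader, exception: Throwable): Future[Result] = {
    Logger.error(s"Server error on ${request.method} ${request.uri}", exception)
    Future.successful(InternalServerError("A server error occurred: " + exception.getMessage))
  }
}

Play picks this up if the class is called ErrorHandler and lives in the root package, or if it is registered via play.http.errorHandler in application.conf.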