How to catch an exception that occurred on a spark worker? - scala

val HTF = new HashingTF(50000)
val Tf = Case.map(row => HTF.transform(row)).cache()
val Idf = new IDF().fit(Tf)

try {
  Idf.transform(Tf).map(x => LabeledPoint(1, x))
} catch {
  case ex: Throwable =>
    println(ex.getMessage)
}
Code like this isn't working.
HashingTF/IDF belong to org.apache.spark.mllib.feature.
I'm still getting an exception that says
org.apache.spark.SparkException: Failed to get broadcast_5_piece0 of broadcast_5
I cannot see any of my files in the error log. How do I debug this?

It seems that the worker ran out of memory.
Instant temporary fix: run the application without caching, i.e. just remove .cache().
How to debug:
The Spark UI probably has the complete exception details.
Check the stage details, and check the logs and thread dump in the Executor tab.
If you find multiple exceptions or errors, try to resolve them in sequence; most of the time resolving the first error also resolves the subsequent ones.
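One more thing worth knowing: map and transform are lazy transformations, so the try/catch in the question runs on the driver without executing anything, and worker-side exceptions only surface once an action (count, collect, saveAsTextFile, ...) runs, and then in the executor logs rather than in driver code. If you want to capture per-record failures on the workers themselves, a minimal sketch reusing the Case and HTF names from the question could wrap the closure body in scala.util.Try:

import scala.util.{Failure, Success, Try}

// Capture per-row failures on the workers themselves instead of waiting for
// them to bubble up as a whole-job failure.
val attempts = Case.map(row => Try(HTF.transform(row)))

// Nothing runs until an action; surface a few failure messages explicitly here.
attempts.flatMap {
  case Failure(e) => Some(e.getMessage)
  case Success(_) => None
}.take(10).foreach(println)

// Keep only the rows that transformed successfully for the rest of the pipeline.
val Tf = attempts.collect { case Success(v) => v }.cache()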

Related

Illegal access error when deleting Google Pub Sub subscription upon JVM shutdown

I'm trying to delete a Google Pub Sub subscription in a JVM shutdown hook, but I'm encountering an illegal access error with the Google Pub Sub subscription admin client when the shutdown hook runs. I've tried using both sys.addShutdownHook as well as Runtime.getRuntime().addShutdownHook, but I get the same error either way.
val deleteInstanceCacheSubscriptionThread = new Thread {
  override def run(): Unit = {
    cacheUpdateService.deleteInstanceCacheUpdateSubscription()
  }
}

sys.addShutdownHook(deleteInstanceCacheSubscriptionThread.run())
// Runtime.getRuntime().addShutdownHook(deleteInstanceCacheSubscriptionThread)
This is the stack trace:
Exception in thread "shutdownHook1" java.lang.IllegalStateException: Illegal access: this web application instance has been stopped already. Could not load [META-INF/services/com.google.auth.http.HttpTransportFactory]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading(WebappClassLoaderBase.java:1385)
at org.apache.catalina.loader.WebappClassLoaderBase.findResources(WebappClassLoaderBase.java:985)
at org.apache.catalina.loader.WebappClassLoaderBase.getResources(WebappClassLoaderBase.java:1086)
at java.util.ServiceLoader$LazyIterator.hasNextService(ServiceLoader.java:348)
at java.util.ServiceLoader$LazyIterator.hasNext(ServiceLoader.java:393)
at java.util.ServiceLoader$1.hasNext(ServiceLoader.java:474)
at com.google.common.collect.Iterators.getNext(Iterators.java:845)
at com.google.common.collect.Iterables.getFirst(Iterables.java:779)
at com.google.auth.oauth2.OAuth2Credentials.getFromServiceLoader(OAuth2Credentials.java:318)
at com.google.auth.oauth2.ServiceAccountCredentials.<init>(ServiceAccountCredentials.java:145)
at com.google.auth.oauth2.ServiceAccountCredentials.createScoped(ServiceAccountCredentials.java:505)
at com.google.api.gax.core.GoogleCredentialsProvider.getCredentials(GoogleCredentialsProvider.java:92)
at com.google.api.gax.rpc.ClientContext.create(ClientContext.java:142)
at com.google.cloud.pubsub.v1.stub.GrpcSubscriberStub.create(GrpcSubscriberStub.java:263)
at com.google.cloud.pubsub.v1.stub.SubscriberStubSettings.createStub(SubscriberStubSettings.java:242)
at com.google.cloud.pubsub.v1.SubscriptionAdminClient.<init>(SubscriptionAdminClient.java:178)
at com.google.cloud.pubsub.v1.SubscriptionAdminClient.create(SubscriptionAdminClient.java:159)
at com.google.cloud.pubsub.v1.SubscriptionAdminClient.create(SubscriptionAdminClient.java:150)
at com.company.pubsub.services.GooglePubSubService.$anonfun$deleteSubscription$2(GooglePubSubService.scala:384)
at com.company.utils.TryWithResources$.withResources(TryWithResources.scala:21)
at com.company.pubsub.services.GooglePubSubService.$anonfun$deleteSubscription$1(GooglePubSubService.scala:384)
at com.company.scalalogging.Logging.time(Logging.scala:43)
at com.company.scalalogging.Logging.time$(Logging.scala:35)
at com.company.pubsub.services.GooglePubSubService.time(GooglePubSubService.scala:30)
at com.company.pubsub.services.GooglePubSubService.deleteSubscription(GooglePubSubService.scala:382)
at com.company.cache.services.CacheUpdateService.deleteInstanceCacheUpdateSubscription(CacheUpdateService.scala:109)
at com.company.cache.services.CacheUpdateHandlerService$$anon$1.run(CacheUpdateHandlerService.scala:132)
at com.company.cache.services.CacheUpdateHandlerService$.$anonfun$addSubscriptionShutdownHook$1(CacheUpdateHandlerService.scala:135)
at scala.sys.ShutdownHookThread$$anon$1.run(ShutdownHookThread.scala:37)
It seems like by the time the shutdown hook runs, the Pub Sub library has already shut down, so we can't access the subscription admin client anymore. But I was wondering if there was any way to delete the subscription before this happens.
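Judging from the stack trace, the actual failure is the webapp class loader being stopped before the hook runs, so SubscriptionAdminClient.create() can no longer load the HttpTransportFactory service file. One possible direction, sketched below with placeholder project and subscription names (and no guarantee the gRPC stack won't still need the stopped class loader for other resources), is to construct the admin client eagerly at startup so the shutdown hook only touches objects that already exist:

import com.google.cloud.pubsub.v1.SubscriptionAdminClient
import com.google.pubsub.v1.ProjectSubscriptionName

// Placeholder names; both objects are created while the webapp class loader is still alive.
val subscriptionName = ProjectSubscriptionName.of("my-project", "instance-cache-subscription")
val subscriptionAdminClient = SubscriptionAdminClient.create()

sys.addShutdownHook {
  // Only pre-constructed objects are used here, so the ServiceLoader lookup that
  // failed in the stack trace above should not have to run at shutdown time.
  subscriptionAdminClient.deleteSubscription(subscriptionName)
  subscriptionAdminClient.close()
}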

In a scala spark job, running in yarn, how can I fail the job so that yarn shows a Failed status

I have a simple if statement in my Scala Spark job code; if it's false I want to stop the job and mark it as failed. I want the YARN UI to show the Spark job with a status of FAILED, but everything I've done so far stops the job yet shows it as successfully finished in the YARN UI.
if (someBoolean) {
  // context.clearAllJobs()
  // System.exit(-1)
  // etc. - nothing so far both stops the job and shows it as failed in the YARN UI
}
Any help would be great.
Throwing an exception (and not catching it) will cause the process to fail.
if (someBoolean) {
  throw new Exception("Job failed")
}

Error in using MLlib function ALS in Spark

I have read from a file like this:
val ratingText = sc.textFile("/home/cloudera/rec_data/processed_data/ratings/000000_0")
Used the following function to parse this data:
def parseRating(str: String): Rating = {
  val fields = str.split(",")
  Rating(fields(0).toInt, fields(1).trim.toInt, fields(2).trim.toDouble)
}
And created an RDD, which is then split into different RDDs:
val ratingsRDD = ratingText.map(x=>parseRating(x)).cache()
val splits = ratingsRDD.randomSplit(Array(0.8, 0.2), 0L)
val trainingRatingsRDD = splits(0).cache()
Used the training RDD to create a model as follows:
val model = new ALS().setRank(20).setIterations(10).run(trainingRatingsRDD)
I get the following error in the last command
16/10/28 01:03:44 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
16/10/28 01:03:44 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
16/10/28 01:03:46 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeSystemLAPACK
16/10/28 01:03:46 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeRefLAPACK
Edit: T. Gaweda's suggestion helped remove the errors, but I am still getting the following warning:
16/10/28 01:53:59 WARN Executor: 1 block locks were not released by TID = 60:
[rdd_420_0]
16/10/28 01:54:00 WARN Executor: 1 block locks were not released by TID = 61:
[rdd_421_0]
And I think this has resulted in an empty model, because the next step results in the following error:
val topRecsForUser = model.recommendProducts(4276736,3)
Error is:
java.util.NoSuchElementException: next on empty iterator at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
Please help!
It's only a warning. Spark uses BLAS to perform calculations. BLAS has native implementations and a JVM implementation; the native one is more optimized/faster, but you must install the native library separately.
Without this configuration the warning message will appear and Spark will use the JVM implementation of BLAS. The results should be the same, just possibly computed more slowly.
Here you can find a description of what BLAS is and how to configure it; on CentOS, for example, it should be just: yum install openblas lapack
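As for the NoSuchElementException from recommendProducts in the edit: that usually means the requested user id is simply not present in the training split, rather than the model being empty. A minimal check, sketched with the model from above, could look like this:

// Verify the user id exists in the trained factors before asking for recommendations.
val userId = 4276736
val userIsKnown = model.userFeatures.lookup(userId).nonEmpty

if (userIsKnown) {
  model.recommendProducts(userId, 3).foreach(println)
} else {
  println(s"User $userId is not in the training data, so no recommendations can be made.")
}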

Spark yarn return exit code not updating as failed in webUI - spark submit

I'm running Spark jobs on YARN via spark-submit. After my Spark job fails, the job still shows a status of SUCCEEDED instead of FAILED. How can I return a failed exit code from the code to YARN?
How can we send YARN a different application status from the code?
I do not think you can do that. I have experienced the same behavior with spark-1.6.2, but after analyzing the failures, I don't see any obvious way of sending a "bad" exit code from my application.
I tried sending a non-zero exit code within the code using
System.exit(1);
which did not work as expected, and #gsamaras mentioned the same in his answer. The following workaround, using try/catch, worked for me though.
try {
  // job logic goes here
} catch {
  case e: Exception =>
    log.error("Error " + e)
    throw e
}
I had the same issue and fixed it by changing the deploy mode from client to cluster. If the deploy mode is client, the state will always show as FINISHED. See https://issues.apache.org/jira/browse/SPARK-3627
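For example (a sketch only; the class and jar names below are placeholders), submitting in cluster mode so that an uncaught exception in the driver is reported back to YARN as FAILED:

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyJob \
  my-job-assembly.jar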

Spark with MongoDB error

I'm learning to use Spark with MongoDB, but I've encountered a problem that I think is related to the way I use Spark, because it doesn't make any sense to me.
My proof of concept is to filter a collection containing about 800K documents by a certain field.
My code is very simple. Connect to my MongoDB, apply a filter transformation and then count the elements:
JavaSparkContext sc = new JavaSparkContext("local[2]", "Spark Test");

Configuration config = new Configuration();
config.set("mongo.input.uri", "mongodb://127.0.0.1:27017/myDB.myCollection");

JavaPairRDD<Object, BSONObject> mongoRDD = sc.newAPIHadoopRDD(config,
        com.mongodb.hadoop.MongoInputFormat.class, Object.class, BSONObject.class);

long numberOfFilteredElements = mongoRDD
        .filter(myCollectionDocument -> myCollectionDocument._2().get("site").equals("marfeel.com"))
        .count();

System.out.format("Filtered collection size: %d%n", numberOfFilteredElements);
When I execute this code, the Mongo driver splits my collection into 2810 partitions, so an equal number of tasks start processing.
Around task number 1000, I get the following error message:
ERROR Executor: Exception in task 990.0 in stage 0.0 (TID 990) java.lang.OutOfMemoryError: unable to create new native thread
I've searched a lot about this error, but it doesn't make any sense to me. I've come to the conclusion that either I have a problem with my code, I have some library version incompatibilities, or my real problem is that I'm getting the whole Spark concept wrong and the code above doesn't make any sense at all.
I'm using the following library versions:
org.apache.spark.spark-core_2.11 -> 1.2.0
org.apache.hadoop.hadoop-client -> 2.4.1
org.mongodb.mongo-hadoop.mongo-hadoop-core -> 1.3.1
org.mongodb.mongo-java-driver -> 2.13.0-rc1