I have a simple Spark (version 1.4.1) application written in Scala that consumes data from a Kinesis stream. If I run the application using the spark-submit command with the master set to local[*], everything works fine. If I choose yarn-client as the master, I get the following exception:
15/11/24 14:22:09 ERROR ReceiverTracker: Deregistered receiver for stream 1: Error starting receiver 1 - java.lang.NoClassDefFoundError: org/joda/time/format/DateTimeFormat
at com.amazonaws.auth.AWS4Signer.<clinit>(AWS4Signer.java:44)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at java.lang.Class.newInstance(Class.java:442)
at com.amazonaws.auth.SignerFactory.createSigner(SignerFactory.java:119)
at com.amazonaws.auth.SignerFactory.lookupAndCreateSigner(SignerFactory.java:105)
at com.amazonaws.auth.SignerFactory.getSigner(SignerFactory.java:78)
at com.amazonaws.AmazonWebServiceClient.computeSignerByServiceRegion(AmazonWebServiceClient.java:307)
at com.amazonaws.AmazonWebServiceClient.computeSignerByURI(AmazonWebServiceClient.java:280)
at com.amazonaws.AmazonWebServiceClient.setEndpoint(AmazonWebServiceClient.java:160)
at com.amazonaws.services.kinesis.AmazonKinesisClient.setEndpoint(AmazonKinesisClient.java:2102)
at com.amazonaws.services.kinesis.AmazonKinesisClient.init(AmazonKinesisClient.java:216)
at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:202)
at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:175)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker.<init>(Worker.java:106)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker.<init>(Worker.java:92)
at org.apache.spark.streaming.kinesis.KinesisReceiver.onStart(KinesisReceiver.scala:133)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:125)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:109)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverLauncher$$anonfun$8.apply(ReceiverTracker.scala:308)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverLauncher$$anonfun$8.apply(ReceiverTracker.scala:300)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1767)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1767)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.joda.time.format.DateTimeFormat
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 31 more
Naturally, I have created a fat jar using the sbt assembly plugin that includes the spark-streaming-kinesis-asl_2.10 library, which has joda-time-2.9.1.jar as a dependency. I've listed the files contained in my fat jar and the class is present. To be sure of its presence, I also tried to use DateTimeFormat from the main class, and I had no problem.
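For reference, this is roughly how I checked the jar's contents (the assembly jar name here is illustrative):
# list the fat jar's contents and look for the class reported as missing
jar tf target/scala-2.10/my-app-assembly-1.0.jar | grep org/joda/time/format/DateTimeFormat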
I hope someone can help me solve this problem.
Thanks.
I suggest checking the classpath entries in the Spark "Application Detail UI" => "Environment" tab to see whether any joda-time entries show up there.
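If joda-time is missing from the executors' classpath, one option is to ship it explicitly with the job. A minimal sketch, assuming placeholder jar locations and main class name:
# ship joda-time to the YARN containers alongside the fat jar
spark-submit \
  --master yarn-client \
  --jars /path/to/joda-time-2.9.1.jar \
  --class com.example.MyKinesisApp \
  /path/to/my-app-assembly-1.0.jar
Alternatively, spark.executor.extraClassPath can point at a copy of the jar that already exists at the same path on every node.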
I usually work with PySpark, but I had to deal with a Spark Streaming job written in Scala. When I run spark-submit on EMR directly it works, but running the same job through Airflow throws the following error. I don't even know where to start debugging the issue. Any ideas would be greatly appreciated.
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: com.typesafe.config.ConfigException$IO: available_application.properties -Dlog4j.configuration=log4j-yarn.properties: java.io.FileNotFoundException: available_application.properties -Dlog4j.configuration=log4j-yarn.properties (No such file or directory)
at com.typesafe.config.impl.Parseable.parseValue(Parseable.java:183)
at com.typesafe.config.impl.Parseable.parseValue(Parseable.java:170)
at com.typesafe.config.impl.Parseable.parse(Parseable.java:227)
at com.typesafe.config.ConfigFactory.parseFile(ConfigFactory.java:595)
at com.typesafe.config.ConfigFactory.loadDefaultConfig(ConfigFactory.java:244)
at com.typesafe.config.ConfigFactory.access$000(ConfigFactory.java:38)
at com.typesafe.config.ConfigFactory$1.call(ConfigFactory.java:378)
at com.typesafe.config.ConfigFactory$1.call(ConfigFactory.java:375)
at com.typesafe.config.impl.ConfigImpl$LoaderCache.getOrElseUpdate(ConfigImpl.java:58)
at com.typesafe.config.impl.ConfigImpl.computeCachedConfig(ConfigImpl.java:86)
at com.typesafe.config.ConfigFactory.load(ConfigFactory.java:375)
at com.typesafe.config.ConfigFactory.load(ConfigFactory.java:299)
at com.typesafe.config.ConfigFactory.load(ConfigFactory.java:288)
at com.nike.tdp.AvailabilityKafkaEvents$.main(AvailabilityKafkaEvents.scala:101)
at com.nike.tdp.AvailabilityKafkaEvents.main(AvailabilityKafkaEvents.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:684)
Caused by: java.io.FileNotFoundException: available_application.properties -Dlog4j.configuration=log4j-yarn.properties (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at com.typesafe.config.impl.Parseable$ParseableFile.reader(Parseable.java:512)
at com.typesafe.config.impl.Parseable.rawParseValue(Parseable.java:193)
at com.typesafe.config.impl.Parseable.parseValue(Parseable.java:176)
... 19 more
22/10/26 19:14:02 INFO ShutdownHookManager: Shutdown hook called
Caused by: java.io.FileNotFoundException: available_application.properties -Dlog4j.configuration=log4j-yarn.properties
is the main piece of information in the error you've shown.
It looks like you've made a typo in the parameters for running the app, so that available_application.properties -Dlog4j.configuration=log4j-yarn.properties is interpreted as the configuration file name instead of only available_application.properties (I assume).
Check the parameters used to run your app: maybe quotes are in the wrong place or missing? Maybe extra whitespace? ...
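As a hedged illustration (the actual command line is not shown in the question), the intended form was probably two separate JVM options for the driver, along these lines; everything except the two property values from the error message is a placeholder:
# intended: -Dconfig.file and -Dlog4j.configuration arrive as two JVM options
spark-submit \
  --driver-java-options "-Dconfig.file=available_application.properties -Dlog4j.configuration=log4j-yarn.properties" \
  --class com.nike.tdp.AvailabilityKafkaEvents \
  app.jar
The stack trace shows Typesafe Config trying to open a file literally named available_application.properties -Dlog4j.configuration=log4j-yarn.properties, i.e. the two options reached the JVM as a single token, which points at quoting or templating in the Airflow task definition.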
I'm using Spark 2.0.2 and Scala 2.11.7.
When I run on a Spark standalone cluster, I can see the output for the completed application in the master web UI. On the other hand, I found this error:
SharedState: Warehouse path is 'file:/home/salma/Movielens/spark-warehouse'.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/HadoopFsRelationProvider
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
How can I solve it?
I am using the IBMSparkGPU/GPUenabler package. I use sbt assembly to package all the dependencies into one single jar file and submit it to the Spark standalone cluster manager. However, the following error message appears:
org.apache.spark.SparkException: Error sending message [message = CacheGPUDS(a99176e95cf37ba4e5e46b9b172369ac_-1728716590,false)]
at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:119)
at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
at com.ibm.gpuenabler.GPUMemoryManagerMasterEndPoint.com$ibm$gpuenabler$GPUMemoryManagerMasterEndPoint$$tell(GPUMemoryManager.scala:172)
at com.ibm.gpuenabler.GPUMemoryManagerMasterEndPoint$$anonfun$registerGPUMemoryManager$2.apply(GPUMemoryManager.scala:64)
at com.ibm.gpuenabler.GPUMemoryManagerMasterEndPoint$$anonfun$registerGPUMemoryManager$2.apply(GPUMemoryManager.scala:64)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:45)
at com.ibm.gpuenabler.GPUMemoryManagerMasterEndPoint.registerGPUMemoryManager(GPUMemoryManager.scala:64)
at com.ibm.gpuenabler.GPUMemoryManagerMasterEndPoint$$anonfun$receiveAndReply$1.applyOrElse(GPUMemoryManager.scala:143)
at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:105)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
... 16 more
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: com.ibm.gpuenabler.CacheGPUDS
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1866)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1749)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2040)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:108)
at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$deserialize$1$$anonfun$apply$1.apply(NettyRpcEnv.scala:259)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:308)
at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$deserialize$1.apply(NettyRpcEnv.scala:258)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:257)
at org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:577)
at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:562)
at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:159)
at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:107)
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:119)
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
... more
This error message does not appear and the program runs if I submit with local[*] as master. I can also eliminate the error by disabling spark.gpuenabler.autocache. However, is there another way to properly fix the issue?
I am using Ubuntu 17.04, JRE 1.8.0, Scala 2.11 and Spark 2.1.0.
It turned out that adding the option --conf "spark.executor.extraClassPath=file://path/to/jar" solves the problem. Another thing is that I need to copy the jar file to all the machines under the same path; otherwise the workers will not be able to find the jar file.
I am facing an error when I try to query a table in Hive when Hive uses Spark as its execution engine. For example, when I do:
select count(*) from ma_table;
I get this:
Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/Iterable
at org.apache.hadoop.hive.ql.parse.spark.GenSparkProcContext.<init>(GenSparkProcContext.java:163)
at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.generateTaskTree(SparkCompiler.java:195)
at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:267)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10947)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:246)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:477)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1242)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1384)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1171)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: java.lang.ClassNotFoundException: scala.collection.Iterable
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 23 more
After some research, I tried adding this to my .bashrc (and sourced it ;)):
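# append every Hive, Spark, and Scala jar to the classpath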
for f in ${HIVE_LIB}/*.jar; do
CLASSPATH=${CLASSPATH}:$f;
done
for f in ${SPARK_HOME}/jars/*.jar; do
CLASSPATH=${CLASSPATH}:$f;
done
for f in ${SCALA_HOME}/lib/*.jar; do
CLASSPATH=${CLASSPATH}:$f;
done
I checked and I have all these jars in the CLASSPATH and still get the error.
I am using Hive 2.1, Hadoop 2.8 and Spark 2.1.
Any ideas? Thanks a lot in advance!
After coming back to it, I found that the Scala jar files weren't in Hive's lib directory.
If you are facing something similar, have a look here:
https://www.linkedin.com/pulse/hive-spark-configuration-common-issues-mohamed-k
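In case the link goes stale, the gist of the fix was along these lines (a sketch assuming standard SCALA_HOME/HIVE_HOME layouts; the exact jar set may differ by version):
# make the Scala runtime visible to Hive by copying (or symlinking)
# the Scala library jars into Hive's lib directory
cp ${SCALA_HOME}/lib/scala-library.jar ${HIVE_HOME}/lib/
cp ${SCALA_HOME}/lib/scala-reflect.jar ${HIVE_HOME}/lib/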
I am running into some classpath issues when I run the command 'play run', but when I use 'play start' everything works like a charm.
Some background:
I am trying to create a Kafka producer that uses Avro serialization, packaged as a Play application.
I am passing the properties to the Kafka producer using the Typesafe Config library. A few of the properties specify the serializer class, the key serializer class, and the partitioner class.
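For illustration, the relevant part of the configuration might look roughly like this (a HOCON sketch; the property names follow the old Kafka 0.8 producer API, and the broker list and class names are placeholders):
# application.conf (sketch)
kafka {
  metadata.broker.list = "broker1:9092,broker2:9092"
  serializer.class     = "com.abc.qwer.classA"            # Avro message serializer
  key.serializer.class = "kafka.serializer.StringEncoder"
  partitioner.class    = "com.abc.qwer.classB"
}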
Folder structure
Base
- Controller
- KafkaProducer
- Util
- Serializer
- class A
- class B
The package structure for the classes is com.abc.xyz.KafkaProducer, com.abc.qwer.classA, etc.
I have verified that the respective classes are generated in the target folder.
I launch the application using 'play run'. When the producer tries to load the corresponding classes, it throws a ClassNotFoundException.
! Internal server error, for (POST) [/publish/topic1] ->
java.lang.ExceptionInInitializerError: null
at Routes$$anonfun$routes$1$$anonfun$applyOrElse$2$$anonfun$apply$4.apply(routes_routing.scala:61) ~[na:na]
at Routes$$anonfun$routes$1$$anonfun$applyOrElse$2$$anonfun$apply$4.apply(routes_routing.scala:61) ~[na:na]
at play.core.Router$HandlerInvoker$$anon$6.call(Router.scala:170) ~[play_2.10-2.2.4.jar:2.2.4]
at play.core.Router$Routes$class.invokeHandler(Router.scala:375) ~[play_2.10-2.2.4.jar:2.2.4]
at Routes$.invokeHandler(routes_routing.scala:15) ~[na:na]
at Routes$$anonfun$routes$1$$anonfun$applyOrElse$2.apply(routes_routing.scala:61) ~[na:na]
Caused by: java.lang.ClassNotFoundException: com.abc.qwer.classA
at java.net.URLClassLoader$1.run(URLClassLoader.java:202) ~[na:1.6.0_65]
at java.security.AccessController.doPrivileged(Native Method) ~[na:1.6.0_65]
at java.net.URLClassLoader.findClass(URLClassLoader.java:190) ~[na:1.6.0_65]
at java.lang.ClassLoader.loadClass(ClassLoader.java:306) ~[na:1.6.0_65]
at java.lang.ClassLoader.loadClass(ClassLoader.java:247) ~[na:1.6.0_65]
at java.lang.Class.forName0(Native Method) ~[na:1.6.0_65]
[error] application - Error while rendering default error page
scala.MatchError: java.lang.ExceptionInInitializerError (of class java.lang.ExceptionInInitializerError)
at play.api.GlobalSettings$class.onError(GlobalSettings.scala:131) ~[play_2.10-2.2.4.jar:2.2.4]
at play.api.DefaultGlobal$.onError(GlobalSettings.scala:189) [play_2.10-2.2.4.jar:2.2.4]
at play.core.server.Server$class.logExceptionAndGetResult$1(Server.scala:73) [play_2.10-2.2.4.jar:2.2.4]
at play.core.server.Server$$anonfun$getHandlerFor$4.apply(Server.scala:83) [play_2.10-2.2.4.jar:2.2.4]
at play.core.server.Server$$anonfun$getHandlerFor$4.apply(Server.scala:81) [play_2.10-2.2.4.jar:2.2.4]
at scala.util.Either$RightProjection.flatMap(Either.scala:523) [scala-library.jar:na]
When I use 'play start', everything works fine.
Any help would be appreciated. Thanks.