java.lang.NoSuchMethodError: scala.Predef$.refArrayOps(java.lang.Object[]) - Scala

I am running the following spark-submit command:
spark-submit --class lr --master yarn --deploy-mode client --driver-memory 4g --executor-memory --num-executors 4 --executor-cores 4 --queue xxxxx_p1 /dd/ff/ff/fff/dd/dd/ddd/ddd/Automation/full-1.0-SNAPSHOT.jar /dd/dd/dd/dd/dd/dd/dd/dd/dd/lr_srvc_dt.properties
When I run this, I get the following error:
22/06/29 04:50:41 ERROR SparkSubmit$$anon$2: 'scala.collection.mutable.ArrayOps scala.Predef$.refArrayOps(java.lang.Object[])'
java.lang.NoSuchMethodError: 'scala.collection.mutable.ArrayOps scala.Predef$.refArrayOps(java.lang.Object[])'
at labresult$.main(labresult.scala:30)
at labresult.main(labresult.scala)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1060)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1069)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Build definitions:
Scala version: 2.12.15
JVM: OpenJDK 64-Bit Server VM, 11.0.14.1
Spark version: 3.2.0.3-eep-810
JDK: 8
I can't figure out what causes this or how to resolve it. Any ideas?
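This refArrayOps signature, returning scala.collection.mutable.ArrayOps, only exists in Scala 2.12; a NoSuchMethodError on it typically means the application JAR was compiled against one Scala binary version while the cluster's Spark runtime ships another (2.12 vs 2.13), or that a mismatched scala-library was bundled into the fat JAR. A minimal build sketch, assuming sbt as the build tool (the question doesn't say), that pins the build to the versions listed above and keeps Spark out of the assembly:

// build.sbt -- a hedged sketch, not the original project's build file;
// versions mirror the ones stated above, adjust to what spark-submit --version reports on the cluster.
ThisBuild / scalaVersion := "2.12.15"  // must match the Scala binary version the cluster's Spark was built with

libraryDependencies ++= Seq(
  // "provided" keeps spark-core/spark-sql (and the scala-library they drag in)
  // out of the assembly, so the cluster's own Spark and Scala classes are used at runtime
  "org.apache.spark" %% "spark-core" % "3.2.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "3.2.0" % "provided"
)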

Related

Spark Submit failure on EMR - "java.lang.IllegalStateException: ... make sure Spark is built."

I can't submit a Spark job via spark-submit on EMR. My spark-submit command looks like this:
sudo spark-submit --class timeusage.TimeUsage \
--deploy-mode cluster --master yarn \
--num-executors 2 --conf spark.executor.cores=2 \
--conf spark.executor.memory=2g --conf spark.driver.memory=1g \
--conf spark.driver.cores=1 --conf spark.logConf=true \
--conf spark.yarn.appMasterEnv.SPARKMASTER=yarn \
--conf spark.yarn.appMasterEnv.WAREHOUSEDIR=s3a://whbucket/spark-warehouse \
--conf spark.yarn.appMasterEnv.S3AACCESSKEY=xxx \
--conf spark.yarn.appMasterEnv.S3ASECRETKEY=yyy \
--jars s3://bucket/week3-assembly-0.1.0-SNAPSHOT.jar \
s3:/bucket/week3-assembly-0.1.0-SNAPSHOT.jar \
s3a://sbucket/atussum.csv
The error looks like this:
19/06/04 07:36:59 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 2, ip-172-31-66-110.ec2.internal, executor 1): java.lang.ExceptionInInitializerError
at timeusage.TimeUsage$$anonfun$8.apply(TimeUsage.scala:70)
at timeusage.TimeUsage$$anonfun$8.apply(TimeUsage.scala:70)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:463)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithKeys_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$11$$anon$1.hasNext(WholeStageCodegenExec.scala:619)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Library directory '/mnt/yarn/usercache/root/appcache/application_1559614942233_0036/container_1559614942233_0036_02_000002/assembly/target/scala-2.11/jars' does not exist; make sure Spark is built.
at org.apache.spark.launcher.CommandBuilderUtils.checkState(CommandBuilderUtils.java:248)
at org.apache.spark.launcher.CommandBuilderUtils.findJarsDir(CommandBuilderUtils.java:342)
at org.apache.spark.launcher.YarnCommandBuilderUtils$.findJarsDir(YarnCommandBuilderUtils.scala:38)
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:543)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:863)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:177)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:178)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:501)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:936)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:927)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:927)
at timeusage.TimeUsage$.<init>(TimeUsage.scala:23)
at timeusage.TimeUsage$.<clinit>(TimeUsage.scala)
... 23 more
I've verified that my project's build dependencies are all correct, and the project works on local[*].
This is the first time I'm working with a multi-module SBT project, so I'm not sure if that has something to do with it.
I've added the assembly JAR to be executed to the --jars config, but it hasn't had any impact at all.
My build.sbt is here - https://github.com/kevvo83/scala-spark-ln/blob/master/build.sbt
The expected result is that the project runs to completion and creates Hive tables in S3.
I'm still investigating, and will post updates in here as soon as I have them.
Following Harsh's answer, I've added these two lines to my spark-submit command:
--files /usr/lib/spark/conf/hive-site.xml \
--jars s3://bucket/week3-assembly-0.1.0-SNAPSHOT.jar \
Now the stack trace error is:
19/06/06 10:37:55 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 2, ip-172-31-76-146.ec2.internal, executor 1): java.lang.NoClassDefFoundError: Could not initialize class *timeusage.TimeUsage*$
at timeusage.TimeUsage$$anonfun$8.apply(TimeUsage.scala:70)
at timeusage.TimeUsage$$anonfun$8.apply(TimeUsage.scala:70)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:463)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithKeys_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$11$$anon$1.hasNext(WholeStageCodegenExec.scala:619)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
(FYI, timeusage.TimeUsage is my class in the JAR.) Is there anything else I need to include to ensure my class definitions are getting picked up?
UPDATE: I've got this to work. I believe the last 3 confs in the snippet below are what did it (based on how the docs describe Spark loading JARs into a staging area on HDFS for the executors to access).
--conf spark.executorEnv.SPARK_HOME=/usr/lib/spark/
--conf spark.yarn.jars=/usr/lib/spark/jars/*.jar
--conf spark.network.timeout=600000
--files /usr/lib/spark/conf/spark-defaults.conf
Further, spark-submit now executes the JAR from local disk, not from the S3 bucket as I was wrongly doing earlier.
Marking the answer as correct, as it put me on the right track to resolving this.
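For reference, a hedged sketch of what the final working submit could look like, combining the extra confs above with a locally staged copy of the assembly JAR (the local path below is an assumption, not taken from the post; the appMasterEnv confs from the original command are omitted for brevity):

# stage the assembly JAR on local disk first (destination path is hypothetical)
aws s3 cp s3://bucket/week3-assembly-0.1.0-SNAPSHOT.jar /home/hadoop/week3-assembly-0.1.0-SNAPSHOT.jar

sudo spark-submit --class timeusage.TimeUsage \
  --deploy-mode cluster --master yarn \
  --files /usr/lib/spark/conf/hive-site.xml,/usr/lib/spark/conf/spark-defaults.conf \
  --conf spark.executorEnv.SPARK_HOME=/usr/lib/spark/ \
  --conf "spark.yarn.jars=/usr/lib/spark/jars/*.jar" \
  --conf spark.network.timeout=600000 \
  /home/hadoop/week3-assembly-0.1.0-SNAPSHOT.jar \
  s3a://sbucket/atussum.csv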
As you are running spark-submit on YARN in cluster mode, you will need to pass hive-site.xml via the --files argument in your command:
sudo spark-submit --class timeusage.TimeUsage \
--deploy-mode cluster --master yarn \
--num-executors 2 --conf spark.executor.cores=2 \
--conf spark.executor.memory=2g --conf spark.driver.memory=1g \
--conf spark.driver.cores=1 --conf spark.logConf=true \
--conf spark.yarn.appMasterEnv.SPARKMASTER=yarn \
--conf spark.yarn.appMasterEnv.WAREHOUSEDIR=s3a://whbucket/spark-warehouse \
--conf spark.yarn.appMasterEnv.S3AACCESSKEY=xxx \
--conf spark.yarn.appMasterEnv.S3ASECRETKEY=yyy \
--jars s3://bucket/week3-assembly-0.1.0-SNAPSHOT.jar \
--files /usr/lib/spark/conf/hive-site.xml \
s3:/bucket/week3-assembly-0.1.0-SNAPSHOT.jar \
s3a://sbucket/atussum.csv

Can not read avro in DataProc Spark with spark-avro

I have a cluster on Google Dataproc (image 1.4) and I want to read Avro files with Spark from Google Cloud Storage. I followed this guide: Spark read avro.
The command I ran is:
gcloud dataproc jobs submit pyspark test.py \
--cluster $CLUSTER_NAME \
--region $REGION \
--properties spark.jars.packages='org.apache.spark:spark-avro_2.12:2.4.1'
test.py is very simple, just:
from pyspark.sql import SparkSession
from pyspark.sql import SQLContext
spark = SparkSession.builder.appName('test').getOrCreate()
df = spark.read.format("avro").load("gs://mybucket/abc.avro")
df.show()
I got the following error:
Py4JJavaError: An error occurred while calling o196.load.
: java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.avro.AvroFileFormat could not be instantiated
at java.util.ServiceLoader.fail(ServiceLoader.java:232)
at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:630)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.FileFormat.$init$(Lorg/apache/spark/sql/execution/datasources/FileFormat;)V
at org.apache.spark.sql.avro.AvroFileFormat.<init>(AvroFileFormat.scala:44)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
... 24 more
Even if I SSH to the master node and start the shell there with spark-shell --packages org.apache.spark:spark-avro_2.12:2.4.1, running val usersDF = spark.read.format("avro").load("gs://mybucket/abc.avro") gives the same error.
Why does this happen? Thank you.
Dataproc 1.4 uses Spark 2.4.0, not Spark 2.4.1. That alone normally wouldn't be a problem, but Dataproc's Spark 2.4.0 is built against Scala 2.11, whereas the spark-avro_2.12:2.4.1 artifact you requested is built against Scala 2.12.
You can also see the avro artifact on a Dataproc cluster under /usr/lib/spark/external:
$ ls -l /usr/lib/spark/external
total 13656
-rw-r--r-- 1 root root 187385 Mar 6 23:25 spark-avro_2.11-2.4.0.jar
...
So you just need to use:
spark-shell --packages org.apache.spark:spark-avro_2.11:2.4.0
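Equivalently, the original gcloud submission should work once the package coordinate is switched to the Scala 2.11 / Spark 2.4.0 artifact (cluster and region variables as in the question):

gcloud dataproc jobs submit pyspark test.py \
  --cluster $CLUSTER_NAME \
  --region $REGION \
  --properties spark.jars.packages='org.apache.spark:spark-avro_2.11:2.4.0'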

How to set up a Spark environment?

I am new to both Scala and Spark, and I am using IntelliJ to run a Spark application. It is a hello-world Spark program written in Scala.
I got the code from GitHub, and I am getting these errors even after setting up Spark with Maven.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/11/02 22:31:22 INFO SparkContext: Running Spark version 1.6.0
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:38)
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:36)
at org.apache.hadoop.security.UserGroupInformation$UgiMetrics.create(UserGroupInformation.java:99)
at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:192)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2136)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2136)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2136)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:322)
at HelloWorld$.main(HelloWorld.scala:14)
at HelloWorld.main(HelloWorld.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 16 more
I know it is a very simple error, but I have tried almost every web link and am still not able to find a solution.
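For what it's worth, the missing class org.apache.commons.configuration.Configuration comes from the commons-configuration artifact, which Hadoop 2.x expects on the classpath. A hedged sketch of the Maven dependency that usually fills the gap (1.6 is the version Hadoop 2.x commonly pulls; adjust to match your dependency tree):

<!-- pom.xml: provides org.apache.commons.configuration.Configuration -->
<dependency>
  <groupId>commons-configuration</groupId>
  <artifactId>commons-configuration</artifactId>
  <version>1.6</version>
</dependency>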

Spark on Yarn: Max number of executor failures reached

When I run a Spark job in cluster mode, I face the following issue:
6/05/25 12:42:55 INFO Client: Application report for application_1464166348026_0025 (state: RUNNING)
16/05/25 12:42:56 INFO Client: Application report for application_1464166348026_0025 (state: FINISHED)
16/05/25 12:42:56 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.255.8.181
ApplicationMaster RPC port: 0
queue: root.pimuser
start time: 1464172925289
final status: FAILED
tracking URL: http://test-hadoop-001.localdomain:8088/proxy/application_1464166348026_0025/history/application_1464166348026_0025/2
user: pimuser
Exception in thread "main" org.apache.spark.SparkException: Application application_1464166348026_0025 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:927)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:973)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/05/25 12:42:56 INFO ShutdownHookManager: Shutdown hook called
I am using the following command to run the job:
spark-submit --driver-java-options -XX:MaxPermSize=2048m --driver-memory 4g --deploy-mode cluster --master yarn --files cluster.xls --class com.app.test.Matching target/test-0.0.1-SNAPSHOT-jar-with-dependencies.jar
I also tried --master yarn-cluster, but I got the same error.
I am using Cloudera 5.5, Hadoop 2.6.0-cdh5.5.1, and Spark 1.5.
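The client-side trace above only reports that the application failed; the actual cause (including the "Max number of executor failures reached" diagnostic from the title) normally shows up in the YARN container logs, which can be pulled with something like:

yarn logs -applicationId application_1464166348026_0025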

Spark submit error: java.lang.ClassNotFoundException: DirectKafkaWordCount

I want to run the Spark Streaming example DirectKafkaWordCount.
This is my directory structure:
root#sandbox:/usr/local/spark/test# find
.
./src
./src/main
./src/main/scala
./src/main/scala/DirectKafkaWordCount.scala
./simple.sbt
sbt package finishes successfully; everything looks OK:
........
[info] Done updating.
[info] Compiling 1 Scala source to /usr/local/spark-1.6.0-bin-hadoop2.6/test/target/scala-2.10.5/classes...
[info] Packaging /usr/local/spark-1.6.0-bin-hadoop2.6/test/target/scala-2.10.5/direct-kafka-word-count_2.10.5-1.0.jar ...
[info] Done packaging.
[success] Total time: 60 s, completed May 12, 2016 1:34:04 AM
but I get errors when I run spark-submit:
root#sandbox:/usr/local/spark# bin/spark-submit --class DirectKafkaWordCount --master local[4] test/target/scala-2.10.5/direct-kafka-word-count_2.10.5-1.0.jar
java.lang.ClassNotFoundException: DirectKafkaWordCount
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:278)
at org.apache.spark.util.Utils$.classForName(Utils.scala:174)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:689)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I am new to Spark; I hope someone can help me.
Actually, I think you should include the fully qualified name of your class (package prefix plus class name) when you run the spark-submit command, like:
spark-submit --class com.balabala.spark.DirectKafkaWordCount --master local[4] test/target/scala-2.10.5/direct-kafka-word-count_2.10.5-1.0.jar
Then Spark should be able to find your class.
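In other words, if the source file declares a package (a hypothetical example below, since the question doesn't show the file's contents), --class must be given the fully qualified name that matches it:

// src/main/scala/DirectKafkaWordCount.scala -- hypothetical header, package name assumed
package com.balabala.spark  // this prefix is what makes the class com.balabala.spark.DirectKafkaWordCount

object DirectKafkaWordCount {
  def main(args: Array[String]): Unit = {
    // ... Spark Streaming job body as in the upstream example ...
  }
}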