I'm trying to run some .jar file made in scala-Spark 2.0.2. in my Spark-Kafka cluster. The code's here:
import java.util.HashMap
import org.apache.kafka.clients.producer.{KafkaProducer,ProducerConfig}
import org.apache.spark.SparkConf
import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka._
object sparkKafka{
def main(args: Array[String]): Unit = {
if(args.length < 4){
System.err.println("Usage: sparkKafka <zkQuorum><group> <topics> <numThreads>")
System.exit(1)
}
val Array(zkQuorum, group, topics, numThreads) = args
val sparkConf = new SparkConf()
.setAppName("sparkKafka")
val ssc = new StreamingContext(sparkConf, Seconds(2))
ssc.checkpoint("E:/temp/")
val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2)
val words = lines.flatMap(_.split(" "))
val wordsCounts = words.map(x => (x, 1L))
.reduceByKeyAndWindow(_ + _, _ - _, Minutes(10), Seconds(2), 2)
wordsCounts.print()
ssc.start()
ssc.awaitTermination()
}
}
I built .jar file named: kafka-spark.jar and scp-it to my node in spark2 folder so it could read it.
Afterwards I went to start the script with:
bin/spark-submit --class "sparkKafka" --master local[4] kafka-spark.jar hdp2.local:2181 group1 Topic-example 1 -verbose
The error I'm getting is like it said in the head of topic, or ClassNotFoundException: sparkKafka
[root#hdp2 spark2]# bin/spark-submit --class "sparkKafka" --master local[4] kafka-spark.jar hdp2.local:2181 group1 Topic-example 1 -verbose
java.lang.ClassNotFoundException: sparkKafka
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:225)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:686)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Where am I making mistake? Also I tried with full path to my jar file, but eihter I get that .jar not found or this error above. Also I tried without -v but I think it doesn't make any difference.
It would be great if someone knows where is a problem. Thank you!
Did you try calling it like this
bin/spark-submit --class "sparkKafka" --master local[4] kafka-spark.jar hdp2.local:2181 group1 Topic-example 1 -verbose
If that does not work you might want to extract content of kafka-spark.jar and check if it actually contains sparkKafka class.
Hava you add package in your sparkKafka.scala ?
Related
I am trying to load in a table from SalesForce using spark. I invoked this code
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql._
object Sample {
def main(arg: Array[String]) {
val spark = SparkSession.builder().
appName("salesforce").
master("local[*]").
getOrCreate()
val tableName = "Opportunity"
val outputPath = "output/result" + tableName
val salesforceDf = spark.
read.
format("jdbc").
option("url", "jdbc:datadirect:sforce://login.salesforce.com;").
option("driver", "com.ddtek.jdbc.sforce.SForceDriver").
option("dbtable", tableName).
option("user", "").
option("password", "xxxxxxxxx").
option("securitytoken", "xxxxx")
.load()
salesforceDf.createOrReplaceTempView("Opportunity")
spark.sql("select * from Opportunity").collect.foreach(println)
//save the result
salesforceDf.write.save(outputPath)
}
}
And the docs I was referring to said to start a spark shell as:
spark-shell --jars /path_to_driver/sforce.jar
Which outputted a lot of lines in the terminal and this was the last line:
22/07/12 14:57:56 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-a12b060d-5c82-4283-b2b9-53f9b3863b53
And then to submit spark
spark-submit --jars sforce.jar --class <Your class name> your jar file
However I am not sure where this jar file is and if that was substantiated? and how to submit that. Any help is appreciated, thank you.
I would like to read a conf file in to my spark application. The conf file is located in Hadoop edge node directory.
omega.conf
username = "surrender"
location = "USA"
My Spark Code :
package com.test.spark
import org.apache.spark.{SparkConf, SparkContext}
import java.io.File
import com.typesafe.config.{ Config, ConfigFactory }
object DemoMain {
def main(args: Array[String]): Unit = {
println("Lets Get Started ")
val conf = new SparkConf().setAppName("SIMPLE")
val sc = new SparkContext(conf)
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
val conf_loc = "/home/cloudera/localinputfiles/omega.conf"
loadConfigFile(conf_loc)
}
def loadConfigFile(loc:String):Unit ={
val config = ConfigFactory.parseFile(new File(loc))
val username = config.getString("username")
println(username)
}
}
I am running this spark application using spark-submit
spark-submit --class com.test.spark.DemoMain --master local /home/cloudera/dev/jars/spark_examples.jar
Spark job is initiated ,but it throws me the below error .It says that No configuration setting found for key 'username'
17/03/29 12:57:37 INFO SparkContext: Created broadcast 0 from textFile at DemoMain.scala:25
Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'username'
at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:115)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:136)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:150)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:155)
at com.typesafe.config.impl.SimpleConfig.getString (SimpleConfig.java:197)
at com.test.spark.DemoMain$.loadConfigFile(DemoMain.scala:53)
at com.test.spark.DemoMain$.main(DemoMain.scala:27)
at com.test.spark.DemoMain.main(DemoMain.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Please help me on fixing this issue
I just tried its working fine i test this with below code
val config=ConfigFactory.parseFile(new File("/home/sandy/my.conf"))
println("::::::::::::::::::::"+config.getString("username"))
and conf file is
username = "surrender"
location = "USA"
Please check location of your file by printing it.
I have problem when write data to mongo after read and map data.
This is script I use to run the program.
I am using Spark 1.4.0, Scala 2.11.7 and mongo 2.6.10
#!/usr/bin/env bash
SPARK_PATH="/Users/username/spark-1.4.0-bin-hadoop2.6/bin/spark-submit"
CLASS_NAME="com.knx.conversion.ScalaWordCount"
CLUSTER='local[2]'
JARS="/Users/username/spark-1.4.0-bin-hadoop2.6/lib/mongo-hadoop-core-1.4.0.jar,/Users/username/spark-1.4.0-bin-hadoop2.6/lib/mongo-java-driver-3.0.3.jar"
JAR="/Users/username/AggragateConversionFunnel/target/scala-2.11/aggragateconversionfunnel_2.11-1.0.jar"
PROJECT_PATH="/Users/username/AggragateConversionFunnel"
cd ${PROJECT_PATH} && sbt package
${SPARK_PATH} --class ${CLASS_NAME} --master ${CLUSTER} --jars ${JARS} $JAR
and here is the main program here. Just copy from [here][1] and change the input output collection.
package com.knx.conversion
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.hadoop.conf.Configuration
import org.bson.BSONObject
import org.bson.BasicBSONObject
object ScalaWordCount {
def main(args: Array[String]) {
val sc = new SparkContext("local", "Scala Word Count")
val config = new Configuration()
config.set("mongo.input.uri", "mongodb://127.0.0.1:27017/first-week.interactions")
config.set("mongo.output.uri", "mongodb://127.0.0.1:27017/visit_06_2015.output")
val mongoRDD = sc.newAPIHadoopRDD(config, classOf[com.mongodb.hadoop.MongoInputFormat], classOf[Object], classOf[BSONObject])
// Input contains tuples of (ObjectId, BSONObject)
// Output contains tuples of (null, BSONObject) - ObjectId will be generated by Mongo driver if null
val countsRDD = mongoRDD.flatMap(arg => {
val str = arg._2.get("referer").toString
str.split("h")
})
.map(word => (word, 1))
.reduceByKey((a, b) => a + b)
countsRDD.foreach(println)
val saveRDD = countsRDD.map((tuple) => {
val bson = new BasicBSONObject()
bson.put("word", tuple._1)
bson.put("count", tuple._2.toString)
(null, bson)
})
// Only MongoOutputFormat and config are relevant
saveRDD.saveAsNewAPIHadoopFile("file:///bogus", classOf[Any], classOf[Any], classOf[com.mongodb.hadoop.MongoOutputFormat[Any, Any]], config)
}
}
When run I got error
5/07/24 15:53:03 INFO DAGScheduler: Job 0 finished: foreach at ScalaWordCount.scala:39, took 1.111442 s
Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.$conforms()Lscala/Predef$$less$colon$less;
at com.knx.conversion.ScalaWordCount$.main(ScalaWordCount.scala:48)
at com.knx.conversion.ScalaWordCount.main(ScalaWordCount.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
15/07/24 15:53:03 INFO SparkContext: Invoking stop() from shutdown hook
Just don't know why and how it happened.
[1]: https://github.com/plaa/mongo-spark/blob/master/src/main/scala/ScalaWordCount.scala
This issued is about Scala version that I am currently using is not matching with Spark Scala version.
I am using Scala 2.11.7 to compile and package the jar but Spark 1.4.1 is using Scala 2.10.4.
The answer I found out here.
Then this issue solve by switching version of Scala to 2.10.4.
I'm new to Spark and wrote a very simple Spark application in Scala as below:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
object test2object {
def main(args: Array[String]) {
val logFile = "src/data/sample.txt"
val sc = new SparkContext("local", "Simple App", "/path/to/spark-0.9.1-incubating",
List("target/scala-2.10/simple-project_2.10-1.0.jar"))
val logData = sc.textFile(logFile, 2).cache()
val numTHEs = logData.filter(line => line.contains("the")).count()
println("Lines with the: %s".format(numTHEs))
}
}
I'm coding in Scala IDE and included the spark-assembly.jar into my code. I generate a jar file from my project and submit that to my local spark cluster using this command spark-submit --class test2object --master local[2] ./file.jar but I get this error message:
Exception in thread "main" java.lang.NoSuchMethodException: test2object.main([Ljava.lang.String;)
at java.lang.Class.getMethod(Class.java:1665)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:649)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
What is wrong here?
p.s. my source code is under the project root directory (project/test2object.scala)
I didn't use spark 0.9.1 before, but I believed the problem came from this line of code:
val sc = new SparkContext("local", "Simple App", "/path/to/spark-0.9.1-incubating", List("target/scala-2.10/simple-project_2.10-1.0.jar"))
If you change to this:
val conf = new SparkConf().setAppName("Simple App")
val sc = new SparkContext(conf)
This will work.
I have written a sample Spark program in Scala to count the number of lines of a text file present in Amazon S3. Below is my sample program.
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import java.util.{Map => JMap}
import org.apache.hadoop.conf.Configuration
object CountLines {
def main(args: Array[String]) {
val sc = new SparkContext(new SparkConf().setAppName("CountLines").setMaster("local"))
sc.hadoopConfiguration.set("fs.s3.awsAccessKeyId","ABC");
sc.hadoopConfiguration.set("fs.s3.awsSecretAccessKey","XYZ");
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId","ABC");
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey","XYX");
sc.hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
sc.hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
val path ="s3:///my-bucket/test/test.txt";
println("num lines: " + countLines(sc, path));
}
def countLines(sc: SparkContext, path: String): Long = {
sc.textFile(path).count();
}
}
Unfortunately I am getting IllegalArgumentException which has something to do with credentials. Below is the stack trace.
Exception in thread "main" java.lang.IllegalArgumentException: Invalid hostname in URI s3:/my-bucket/test/test.txt
at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:45)
at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.initialize(Jets3tNativeFileSystemStore.java:76)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
I have given valid credentials. I package this as a JAR file and run on the cluster using spark-submit command. I am not sure if this is the right way to set the access key and secret key in spark. I have tried different approaches but nothing seems to work. Throwing some light on this issue would be highly appreciated.
Thanks,
J Joseph
You have an extra slash. You have to change s3:///my-bucket/test/test.txt to s3://my-bucket/test/test.txt.