I am trying to run Spark with Scala from inside IntelliJ IDEA:
import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "/home/kamil/Apps/spark-1.2.1-bin/README.md" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local[4]")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
Running it with spark-submit works fine. Running it from the IDE results in the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse
at org.apache.spark.HttpServer.org$apache$spark$HttpServer$$doStart(HttpServer.scala:74)
at org.apache.spark.HttpServer$$anonfun$1.apply(HttpServer.scala:61)
at org.apache.spark.HttpServer$$anonfun$1.apply(HttpServer.scala:61)
at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1765)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1756)
at org.apache.spark.HttpServer.start(HttpServer.scala:61)
at org.apache.spark.HttpFileServer.initialize(HttpFileServer.scala:46)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:320)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:159)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:240)
at SimpleApp$.main(SimpleApp.scala:8)
at SimpleApp.main(SimpleApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
Caused by: java.lang.ClassNotFoundException: javax.servlet.http.HttpServletResponse
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 18 more
SimpleApp.scala:8 is the line that instantiates the SparkContext. As someone suggested, I've already added:
libraryDependencies += "javax.servlet" % "javax.servlet-api" % "3.0.1"
but it didn't help. Do you have any ideas? Thanks in advance.
I've just solved this issue myself. You need to change the module settings:
Context Menu -> Open Module Settings -> Dependencies
Change the 'scope' of the missing jar from 'Provided' to 'Compile'.
I see that there is a pull request in the Spark project to solve the issue:
https://github.com/apache/spark/pull/4411
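For reference, here is a rough sbt-flavoured sketch of the same idea (illustrative only; the exact artifacts and versions in your own project may differ): dependencies the application needs at run time from the IDE should stay in the default Compile scope rather than Provided.

// build.sbt (sketch, not the asker's actual build file)
// Keep run-time dependencies in the default (Compile) scope so an IDE "Run" configuration sees them.
scalaVersion := "2.10.4"   // assumption: Spark 1.2.1's default Scala line

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"        % "1.2.1",   // not % "provided" when launching from IntelliJ
  "javax.servlet"    %  "javax.servlet-api" % "3.0.1"
)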
I have written a small program in Scala using Spark 1.6.0 in the IntelliJ IDE. The jar builds, but it throws an error while running. Please share your inputs to resolve this issue.
[cloudera@quickstart ~]$ spark-submit --master local --class com.sample.sample.sample /home/cloudera/Desktop/IdeaProjects.jar
Exception in thread "main" java.lang.SecurityException: Invalid signature file digest for Manifest main attributes
at sun.security.util.SignatureFileVerifier.processImpl(SignatureFileVerifier.java:286)
at sun.security.util.SignatureFileVerifier.process(SignatureFileVerifier.java:239)
at java.util.jar.JarVerifier.processEntry(JarVerifier.java:317)
at java.util.jar.JarVerifier.update(JarVerifier.java:228)
at java.util.jar.JarFile.initializeVerifier(JarFile.java:348)
at java.util.jar.JarFile.getInputStream(JarFile.java:415)
at sun.misc.JarIndex.getJarIndex(JarIndex.java:137)
at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:674)
at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:666)
at java.security.AccessController.doPrivileged(Native Method)
at sun.misc.URLClassPath$JarLoader.ensureOpen(URLClassPath.java:665)
at sun.misc.URLClassPath$JarLoader.<init>(URLClassPath.java:638)
at sun.misc.URLClassPath$3.run(URLClassPath.java:366)
at sun.misc.URLClassPath$3.run(URLClassPath.java:356)
at java.security.AccessController.doPrivileged(Native Method)
at sun.misc.URLClassPath.getLoader(URLClassPath.java:355)
at sun.misc.URLClassPath.getLoader(URLClassPath.java:332)
at sun.misc.URLClassPath.getResource(URLClassPath.java:198)
at java.net.URLClassLoader$1.run(URLClassLoader.java:358)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.spark.util.Utils$.classForName(Utils.scala:177)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:688)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I have tried the options provided in existing threads, like deleting the MANIFEST.MF file, etc., but no luck.
package com.sample.sample

import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

object sample {
  def main(args: Array[String]): Unit = {
    println("Hello World")
    val conf = new SparkConf().setAppName("JDBC App").setMaster("local")
    val sc = new SparkContext(conf)
    // val sc = new SparkConf()
    val sqlContext = new SQLContext(sc)
    val url = "jdbc:mysql://localhost:3306/retail_db?user=root&password=cloudera"
    val df = sqlContext.read.format("jdbc").option("url", url).option("dbtable", "products").load()
    df.printSchema()
  }
}
Following is the command I am running:
spark-submit --master local --class com.sample.sample.sample /home/cloudera/Desktop/IdeaProjects.jar
I am expecting this code to work.
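(For context on the MANIFEST workaround mentioned above: when the fat jar is built with sbt-assembly rather than from IntelliJ, the signature files that trigger this SecurityException are usually discarded with a merge strategy along these lines. This is only a sketch of that commonly suggested approach, not a confirmed fix for this project.)

// build.sbt (sketch; assumes the sbt-assembly plugin is enabled)
// Discard the META-INF signature files (*.SF, *.DSA, *.RSA) that cause
// "Invalid signature file digest for Manifest main attributes".
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}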
I am new to Spark, and I'm using Scala 2.12.8 with Spark 2.4.0. I'm trying to use the Random Forest classifier in Spark MLlib. I can build and train the classifier, and the classifier can predict if I use the first() function on the resulting RDD. However, if I try to use the take(n) function, I get a pretty big, ugly stack trace. Does anyone know what I'm doing wrong? The error occurs on the line ".take(3)". I am aware that this is the first effectful operation that I'm performing on the RDD, so if anyone can explain why it's failing and how to fix it, I would be really grateful.
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.tree.model.RandomForestModel
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object ItsABreeze {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession
      .builder()
      .appName("test")
      .getOrCreate()

    // Do stuff to file
    val data: RDD[LabeledPoint] = MLUtils.loadLibSVMFile(spark.sparkContext, "file.svm")

    // Split the data into training and test sets (30% held out for testing)
    val splits: Array[RDD[LabeledPoint]] = data.randomSplit(Array(0.7, 0.3))
    val (trainingData, testData) = (splits(0), splits(1))

    // Train a RandomForest model.
    // Empty categoricalFeaturesInfo indicates all features are continuous
    val numClasses = 4
    val categoricalFeaturesInfo = Map[Int, Int]()
    val numTrees = 3
    val featureSubsetStrategy = "auto"
    val impurity = "gini"
    val maxDepth = 5
    val maxBins = 32

    val model: RandomForestModel = RandomForest.trainClassifier(
      trainingData,
      numClasses,
      categoricalFeaturesInfo,
      numTrees,
      featureSubsetStrategy,
      impurity,
      maxDepth,
      maxBins
    )

    testData
      .map((point: LabeledPoint) => model.predict(point.features))
      .take(3)
      .foreach(println)

    spark.stop()
  }
}
The top portion of the stack trace follows:
java.io.IOException: unexpected exception type
at java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1736)
at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1266)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2078)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.lang.invoke.SerializedLambda.readResolve(SerializedLambda.java:230)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1260)
... 25 more
Caused by: java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: scala/runtime/LambdaDeserialize
at ItsABreeze$.$deserializeLambda$(ItsABreeze.scala)
... 35 more
Caused by: java.lang.NoClassDefFoundError: scala/runtime/LambdaDeserialize
... 36 more
Caused by: java.lang.ClassNotFoundException: scala.runtime.LambdaDeserialize
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
The code that I was trying to run was a slightly modified version of the classification example on this page (from the Spark Machine Learning Library documentation).
Both commenters on my original question were correct: I changed the version of Scala that I was using from 2.12.8 to 2.11.12, and I reverted Spark to 2.2.1, and the code ran just as it was.
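For concreteness, a minimal build.sbt sketch of that working combination (the artifact names are my assumption about how the MLlib classes above get pulled in; only the versions come from what I described):

// build.sbt (sketch of the version pairing that worked for me: Scala 2.11.12 + Spark 2.2.1)
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.2.1",
  "org.apache.spark" %% "spark-sql"   % "2.2.1",   // SparkSession
  "org.apache.spark" %% "spark-mllib" % "2.2.1"    // MLUtils, RandomForest, LabeledPoint
)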
For anyone watching this issue who is qualified to answer, here is a follow-up question: Spark 2.4.0 claims to have new, experimental support for Scala 2.12.x. Are there a lot of known issues with the 2.12.x support?
I am trying to build my first Spark & Cassandra app using sbt.
Here is the code from the .scala file:
/* SimpleApp.scala */
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.sql._
import org.apache.spark.sql.cassandra._
import org.apache.spark.sql.functions.udf
import com.datastax.driver.core.utils.UUIDs
import com.datastax.spark.connector._
import com.datastax.spark.connector.cql.CassandraConnectorConf
import com.datastax.spark.connector.rdd.ReadConf
object SimpleApp {
def main(args: Array[String]) {
//val logFile = "/home/goutham/derby.log" // Should be some file on your system
val conf = new SparkConf().setAppName("Simple Application")
val sc = new SparkContext(conf)
//val logData = sc.textFile(logFile, 2).cache()
//val numAs = logData.filter(line => line.contains("a")).count()
//val numBs = logData.filter(line => line.contains("b")).count()
//println(s"Lines with a: $numAs, Lines with b: $numBs")
val timeUUID = udf(() => UUIDs.timeBased().toString)
val sqlcontext = new org.apache.spark.sql.SQLContext(sc)
val df = sqlcontext.read.format("com.databricks.spark.csv").option("wholeFile", "true").option("header", "true").option("parserLib", "UNIVOCITY").option("quote","\"").option("inferSchema", "true").option("escape","\"").option("quoteMode","ALL").load("/home/goutham/Work/data/user.csv").withColumn("user_uuid", timeUUID())
df.createOrReplaceTempView("source_user")
val num = df.count()
println(s" Number of records to be proccessed in the file is $num")
sqlcontext.sql("""CREATE TEMPORARY VIEW Dest_user
|USING org.apache.spark.sql.cassandra
|OPTIONS (
| table "t_user",
| keyspace "ks_payu",
| cluster "Test Cluster",
| pushdown "true"
|)""".stripMargin)`
val df_oldrecordsUpdate = sqlcontext.sql("""Select dest.user_uuid,
dest.user_id,
dest.account_manager_id,
dest.address,
dest.address_city,
dest.address_line_2,
dest.address_line_3,
dest.affiliate,
dest.api_key,
dest.api_login,
dest.api_version,
dest.bcash_account,
dest.bcash_consumer_key,
dest.bcash_customer_id,
dest.bcash_email,
dest.bcash_token,
dest.valid_from_date,
current_timestamp() valid_to_date,
0 active_flag from source_user source inner join Dest_user dest on source.usuario_id=dest.user_id""")
Following is the .sbt file used:
name := "Simple Project"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.2"
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.0"
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.0.2"
**Error 1**
Number of records to be proccessed in the file is 10
17/04/12 16:24:08 INFO SparkSqlParser: Parsing command: CREATE TEMPORARY VIEW Dest_user
USING org.apache.spark.sql.cassandra
OPTIONS (
table "t_user",
keyspace "ks_payu",
cluster "Test Cluster",
pushdown "true")
Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: org.apache.spark.sql.cassandra. Please find packages at https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects
at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:148)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:79)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:79)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:340)
at org.apache.spark.sql.execution.datasources.CreateTempViewUsing.run(ddl.scala:82)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:186)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:167)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:65)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:682)
at SimpleApp$.main(simpleApp.scala:61)
at SimpleApp.main(simpleApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.cassandra.DefaultSource
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:132)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:132)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:132)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:132)
at scala.util.Try.orElse(Try.scala:84)
at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:132)
... 31 more
**Error 2**
java.lang.NoClassDefFoundError: scala/runtime/AbstractPartialFunction$mcJL$sp
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at com.datastax.spark.connector.rdd.CassandraLimit$.limitForIterator(CassandraLimit.scala:21)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.compute(CassandraTableScanRDD.scala:367)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: scala
You are providing the wrong Cassandra connector: you are using Scala 2.11 but a connector built for Scala 2.10. Try:
spark-submit --packages datastax:spark-cassandra-connector:2.0.0-s_2.11 --class "SimpleApp" --master local[4] target/scala-2.11/simple-project_2.11-1.0.jar
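The -s_2.11 suffix matters: it pulls the connector build for Scala 2.11, matching the scalaVersion := "2.11.8" in the build.sbt above, whereas a 2.10 connector build on a 2.11 classpath is exactly the kind of mismatch that produces missing scala/runtime classes like AbstractPartialFunction$mcJL$sp in the second error.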
So the problem I am having is that I don't seem to be able to create a SparkContext, and I have no idea why.
Here is my code:
import org.apache.spark.{SparkConf, SparkContext}
object spark_test {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Datasets Test").setMaster("local")
    val sc = new SparkContext(conf)
    println(sc)
  }
}
And here is the result that I am getting:
Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps;
at org.apache.spark.SparkConf.getAkkaConf(SparkConf.scala:203)
at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:68)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:126)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:139)
at spark_test$.main(test.scala:6)
at spark_test.main(test.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Any thoughts?
Your Scala version is too new and your spark-core version is too old. I am using Scala 2.11.8 with spark-core_2.11:2.0.1; you can try that!
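For example, a minimal build.sbt with that pairing (a sketch; only the versions come from the answer):

// build.sbt (sketch: the Scala/Spark pairing suggested in the answer)
scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.1"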
When I run this code
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.Vectors

val conf = new SparkConf().setMaster("local[2]").setAppName("MLlib")
val sc = new SparkContext(conf)

val data = sc.textFile("hdfs://192.168.1.20:8020/user/sparkMLlib/input/testMLlib.csv")
val parsedData = data.map(s => Vectors.dense(s.split(',').map(_.toDouble))).cache()

// Cluster the data into two classes using KMeans
val numClusters = 2
val numIterations = 20
val clusters = KMeans.train(parsedData, numClusters, numIterations)

// Evaluate clustering by computing Within Set Sum of Squared Errors
val WSSSE = clusters.computeCost(parsedData)
println("Within Set Sum of Squared Errors = " + WSSSE)

// Save and load model
clusters.save(sc, "hdfs://192.168.1.20:8020/user/sparkMLlib/output/")
val sameModel = KMeansModel.load(sc, "hdfs://192.168.1.20:8020/user/sparkMLlib/output/")
I get a runtime exception:
16/05/04 14:43:40 ERROR DefaultWriterContainer: Aborting task.
java.lang.NoClassDefFoundError: org/apache/spark/sql/types/GenericArrayData
at org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:207)
at org.apache.spark.mllib.linalg.VectorUDT.serialize(Vectors.scala:179)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$UDTConverter.toCatalystImpl(CatalystTypeConverters.scala:142)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
at org.apache.spark.sql.execution.RDDConversions$$anonfun$productToRowRdd$1$$anonfun$apply$1.apply(ExistingRDD.scala:39)
at org.apache.spark.sql.execution.RDDConversions$$anonfun$productToRowRdd$1$$anonfun$apply$1.apply(ExistingRDD.scala:36)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:263)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.types.GenericArrayData
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
I don't understand this at all. Is it an import or dependency problem? I looked everywhere but found nothing. It seems a really strange exception to get. Can anyone explain?