I am trying to run a Scala project with Spark. I have the following imports in my file:
package com.sparksql.count
import org.apache.log4j.Level
import org.apache.log4j.Logger
import org.apache.spark.SparkConf
import org.apache.spark._
However, it tells me that log4j and spark can't be found. I did a bit of research and figured it must be an sbt issue, so I went to my build.sbt and added the library dependency as below:
name := "SampleLearning"
version := "0.1"
scalaVersion := "2.12.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0"
However, it still doesn't work and says there is something wrong with libraryDependencies. Can anyone help?
Spark doesn't support Scala 2.12. To work with Spark, use Scala 2.11.
Also, Spark 1.2 hasn't been supported in years. Use either the 2.x branch (the latest is 2.3.0) or, for legacy applications, the latest 1.x release (1.6.3), though that one is quite outdated today.
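A minimal build.sbt sketch following that advice (the patch versions here are only examples; pick whatever current releases match):
name := "SampleLearning"
version := "0.1"
// Spark 2.3.0 is built for Scala 2.11, so the Scala version must match it
scalaVersion := "2.11.12"
// %% appends the _2.11 suffix automatically, resolving spark-core_2.11
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0"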
Related
I'm new to Scala and Spark and I'm trying to create an example project using IntelliJ. During project creation I chose Scala and sbt with Scala version 2.12, but when I tried adding spark-streaming version 2.3.2 it kept erroring out, so I Googled around and found the sbt config shown below on Apache's website, yet I'm still getting the same error.
Error: Could not find or load main class SparkStreamingExample
Caused by: java.lang.ClassNotFoundException: SparkStreamingExample
How can I determine which version of Scala works with which version of the Spark dependencies?
name := "SparkStreamExample"
version := "0.1"
scalaVersion := "2.11.8"
libraryDependencies ++= Seq(
"org.apache.spark" % "spark-streaming_2.11" % "2.3.2"
)
My object is very basic and doesn't have much to it:
import org.apache.spark.SparkConf
import org.apache.spark.streaming.StreamingContext
object SparkStreamingExample extends App {
println("SPARK Streaming Example")
}
You can see the version of Scala that is supported by Spark in the Spark documentation.
As of this writing, the documentation says:
Spark runs on Java 8+, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark 2.3.2 uses Scala 2.11. You will need to use a compatible Scala version (2.11.x).
Notice that only Scala 2.11.x is supported.
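As a concrete example (a sketch, assuming Spark 2.3.2), a build.sbt like the one below keeps the Scala version and the artifact suffix in sync by letting %% derive the suffix from scalaVersion instead of hard-coding _2.11:
name := "SparkStreamExample"
version := "0.1"
// Spark 2.3.2 is built against Scala 2.11.x
scalaVersion := "2.11.8"
// %% resolves spark-streaming_2.11 automatically from scalaVersion above
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.3.2"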
I am trying to run a simple Scala program in IntelliJ.
My build.sbt looks like this:
name := "UseThis"
version := "0.1"
scalaVersion := "2.12.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0"
And my import code looks like this:
package WorkWork.FriendsByAge
import com.sun.glass.ui.Window.Level
import com.sun.istack.internal.logging.Logger
import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.log4j._
I don't get why the imports fail. It tells me the dependency failed to load or wasn't found, but I put the line in build.sbt as required. Is there some other step I need to take? I've installed Spark. Is it the version number at the end of the dependency line? I don't even know how to check which version of Spark I have.
I'm trying to teach myself Scala (I'm not a complete beginner; I know Python, R, various flavors of SQL, and C#), but even setting it up is nigh on impossible, and apparently getting it to run is too. Any ideas?
Take a look at this page here: Maven Central (Apache Spark core)
Unless you have set up some other repositories, the dependencies that are going to be loaded by sbt usually come from there.
There is a version column with numbers like 2.2.1, and next to it a Scala column with numbers like 2.11 and 2.10. For Spark and Scala to work together, you have to pick a valid combination from this table.
As of 28 February 2018, there is no version of Spark that works with Scala 2.12.4. The latest Scala version for which spark-core 1.2.0 is published is 2.11, so you will probably want to set your Scala version to 2.11.
Also note that the %% operator in your sbt file in
"org.apache.spark" %% "spark-core" % "1.2.0"
automatically appends the suffix _2.12 (derived from your scalaVersion) to the artifact ID. Since there is no spark-core_2.12, the dependency cannot be resolved and you can't import anything in your code.
By the way: there was a big difference between Spark 1.2 and Spark 1.3, and again a big difference between 1.x and 2.x. Does it really have to be 1.2?
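For example (a sketch, picking 2.2.1 with Scala 2.11 as one valid combination from that table), the two declarations below are equivalent; the first derives the suffix from scalaVersion, the second writes it out by hand:
scalaVersion := "2.11.12"
// resolves spark-core_2.11 because of the %% operator
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.1"
// equivalent explicit form with the suffix written out:
// libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.2.1"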
It's because Spark 1.2 is not available for Scala 2.12:
https://mvnrepository.com/artifact/org.apache.spark/spark-core
I used to code Scala in the terminal, but now I'm trying the Scala IDE for Eclipse.
However, I've got one big problem:
error: not found: value sc
I've tried adding these imports:
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
But then it displays:
object apache is not a member of package org
So I don't know what to do....
In my IntelliJ project my build.sbt is pretty empty:
name := "test"
version := "1.0"
scalaVersion := "2.11.7"
Is it an sbt project? Make sure you have the Eclipse plugin for Scala/sbt installed, and import it as an sbt project.
Also, add the Spark dependency to your build.sbt (see the sketch below).
Nevertheless, I prefer IntelliJ :)
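A sketch of what the build.sbt could look like with the dependency added (I'm assuming Spark 1.6.3 here, which is published for Scala 2.11; substitute the Spark version you actually have installed):
name := "test"
version := "1.0"
scalaVersion := "2.11.7"
// spark-core provides SparkContext and SparkConf; %% resolves spark-core_2.11
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.3"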
I worked with the Spark implementation of Random Forest in the shell, and this import runs fine:
import org.apache.spark.mllib.tree.RandomForest
However, when I try to compile it as a standalone file, it fails. The exact error is:
5: object RandomForest is not a member of package org.apache.spark.mllib.tree
I have included MLlib in my sbt file too, so can someone please tell me where this error comes from? My code:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.mllib.tree.RandomForest
My sbt file:
name := "churn"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies ++= Seq(
"org.apache.spark" % "spark-core_2.10" % "1.5.2" % "provided",
"org.apache.spark" % "spark-mllib_2.10" % "1.5.2"
)
Edit:
My-MBP:Churn admin$ sbt 'show libraryDependencies'
[info] Set current project to churn (in build file:/Users/admin/Desktop/Churn/)
[info] List(org.scala-lang:scala-library:2.10.4, org.apache.spark:spark-core_2.10:1.1.0, org.apache.spark:spark-mllib_2.10:1.1.0)
My-MBP:Churn admin$ sbt scalaVersion
[info] Set current project to churn (in build file:/Users/admin/Desktop/Churn/)
[info] 2.10.4
tl;dr Use Spark 1.2.0 or later.
According to the history of org/apache/spark/mllib/tree/RandomForest.scala on GitHub, the first version that supports Random Forest is 1.2.0 (see the tags that include the file).
Even though you've shown that your build.sbt declares 1.5.2, the output of sbt 'show libraryDependencies' doesn't confirm it; it says:
org.apache.spark:spark-mllib_2.10:1.1.0
1.1.0 is the effective version of Spark MLlib you use in your project. That version has no support for Random Forest.
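If in doubt, you can also check which Spark version actually ends up on the classpath at runtime; a quick sketch, assuming a SparkContext is available (for example sc in spark-shell):
// prints the Spark version actually loaded, e.g. "1.1.0" instead of the expected "1.5.2"
println(sc.version)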
Spark has a dependency on json4s 3.2.10, but this version has several bugs and I need to use 3.2.11. I added the json4s-native 3.2.11 dependency to build.sbt and everything compiled fine. But when I spark-submit my JAR, it provides me with 3.2.10.
build.sbt
import sbt.Keys._
name := "sparkapp"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"
libraryDependencies += "org.json4s" %% "json4s-native" % "3.2.11"`
plugins.sbt
logLevel := Level.Warn
resolvers += Resolver.url("artifactory", url("http://scalasbt.artifactoryonline.com/scalasbt/sbt-plugin-releases"))(Resolver.ivyStylePatterns)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")
App1.scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.{Logging, SparkConf, SparkContext}
import org.apache.spark.SparkContext._
object App1 extends Logging {
def main(args: Array[String]) = {
val conf = new SparkConf().setAppName("App1")
val sc = new SparkContext(conf)
println(s"json4s version: ${org.json4s.BuildInfo.version.toString}")
}
}
sbt 0.13.7 + sbt-assembly 0.13.0
Scala 2.10.4
Is there a way to force the use of version 3.2.11?
We ran into a problem similar to the one Necro describes, but downgrading from 3.2.11 to 3.2.10 when building the assembly jar did not resolve it. We ended up solving it (using Spark 1.3.1) by shading the 3.2.11 version in the job assembly jar:
assemblyShadeRules in assembly := Seq(
ShadeRule.rename("org.json4s.**" -> "shaded.json4s.#1").inAll
)
I asked the same question on the Spark User mailing list and got two answers on how to make it work:
Use the spark.driver.userClassPathFirst=true and spark.executor.userClassPathFirst=true options, but this works only in Spark 1.3 and will probably require some other modifications, such as excluding Scala classes from your build (see the sketch below).
Rebuild Spark with json4s version 3.2.11 (you can change it in core/pom.xml).
Both work fine; I preferred the second one.
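For the first option, a minimal sketch of setting the executor-side flag programmatically (an illustration on my part, not from the mailing list thread; the driver-side flag normally has to be passed at submit time, e.g. spark-submit --conf spark.driver.userClassPathFirst=true, since it must be in effect before the driver starts):
import org.apache.spark.{SparkConf, SparkContext}
// ask executors to prefer classes from the application jar (json4s 3.2.11)
// over the versions bundled with Spark (json4s 3.2.10); experimental in 1.3
val conf = new SparkConf()
  .setAppName("App1")
  .set("spark.executor.userClassPathFirst", "true")
val sc = new SparkContext(conf)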
This is not an answer to your question, but it came up when I was searching for my problem. I was getting a NoSuchMethod exception in formats.emptyValueStrategy.replaceEmpty(value) in json4s's render. The reason was that I was building with 3.2.11 but Spark was linking 3.2.10. I downgraded to 3.2.10 and my problem went away. Your question helped me understand what was going on (that Spark was linking a conflicting version of json4s) and I was able to resolve the problem, so thanks.