Spark sbt scala build error - missing jars from ivy - scala

I am trying to execute spark project from eclipse tool.
In build.sbt I have added below
name := "simple-spark-scala"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "1.6.2"
When I am importing this project I am getting error -
project is missing library around 100 such error
Description Resource Path Location Type
Project 'simple-spark' is missing required library: '/root/.ivy2/cache/aopalliance/aopalliance/jars/aopalliance-1.0.jar' simple-spark Build path Build Path Problem
However I am able to see all the jars under the mentioned path in missing jar files
Any idea how to resolve ?

Add dependency resolvers as below in your built.sbt file
resolvers += "MavenRepository" at "https://mvnrepository.com/"

Because, you dont link Spark Packages Repo. You can see my built.sbt below:
name := "spark"
version := "1.0"
scalaVersion := "2.11.8"
resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"
libraryDependencies ++= Seq(
"org.apache.spark" % "spark-core_2.11" % "2.1.0",
"org.apache.spark" % "spark-sql_2.11" % "2.1.0",
"org.apache.spark" % "spark-graphx_2.11" % "2.1.0",
"org.apache.spark" % "spark-mllib_2.11" % "2.1.0",
"neo4j-contrib" % "neo4j-spark-connector" % "2.0.0-M2"
)

Related

Sbt assembly for multiple targets

I need to create fat jars for multiple version of scala using sbt assembly.
When I target a single version, I write in simple.sbt:
scalaVersion := "2.11.12"
And the fat jar is output to target/scala-2.11/Kernalytics-assembly-1.0.jar. Now I would like to also target Scala 2.12. I could edit the sbt file to change scalaVersion, but I would like the assembly process to be automated over a range of versions of Scala when I call sbt assembly.
If I use crossScalaVersions:
name := "Kernalytics"
version := "1.0"
crossScalaVersions := Seq("2.11.12", "2.12.4")
libraryDependencies ++= Seq(
"org.scalanlp" %% "breeze" % "0.13.2",
"org.scalanlp" %% "breeze-natives" % "0.13.2",
"org.scalanlp" %% "breeze-viz" % "0.13.2"
)
libraryDependencies += "commons-io" % "commons-io" % "2.6"
resolvers += "Sonatype Releases" at "https://oss.sonatype.org/content/repositories/releases/"
libraryDependencies += "org.scalactic" %% "scalactic" % "3.0.4"
libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.4" % "test"
The only output is target/scala-2.12/Kernalytics-assembly-1.0.jar
If you use crossScalaVersions I think you need to prefix the command with a '+' if you want to build for all versions.
From Cross-Building a Project:
To build against all versions listed in crossScalaVersions, prefix the action to run with +

not able to import spark mllib in IntelliJ

I am not able to import spark mllib libraries in Intellij for Spark scala project. I am getting a resolution exception.
Below is my sbt.build
name := "ML_Spark"
version := "0.1"
scalaVersion := "2.11.12"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.1"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.1"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.2.1" % "runtime"
I tried to copy/paste the same build.sbt file you provided and i got the following error :
[error] [/Users/pc/testsbt/build.sbt]:3: ';' expected but string literal found.
Actually, the build.sbt is invalid :
intellij error
Having the version and the Scala version in different lines solved the problem for me :
name := "ML_Spark"
version := "0.1"
scalaVersion := "2.11.12"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.1"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.1"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.2.1" % "runtime"
I am not sure that this is the problem you're facing (can you please share the exception you had ?), it might be a problem with the repositories you specified under the .sbt folder in your home directory.
I have met the same problem before. To solve it, I just used the compiled version of mllib instead of the runtime one. Here is my conf:
name := "SparkDemo"
version := "0.1"
scalaVersion := "2.11.12"
// https://mvnrepository.com/artifact/org.apache.spark/spark-core
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0"
// https://mvnrepository.com/artifact/org.apache.spark/spark-mllib
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.3.0"
I had a similar issue, but I found a workaround. Namely, you have to add the spark-mllib jar file to your project manually. Indeed, despite my build.sbt file was
name := "example_project"
version := "0.1"
scalaVersion := "2.12.10"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "3.0.0",
"org.apache.spark" %% "spark-sql" % "3.0.0",
"org.apache.spark" %% "spark-mllib" % "3.0.0" % "runtime"
)
I wasn't able to import the spark library with
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.ml._
The solution that worked for me was to add the jar file manually. Specifically,
Download the jar file of the ml library you need (e.g. for spark 3 use https://mvnrepository.com/artifact/org.apache.spark/spark-mllib_2.12/3.0.0 ).
Follow this link to add the jar file to your intelliJ project: Correct way to add external jars (lib/*.jar) to an IntelliJ IDEA project
Add also the mlib-local jar (https://mvnrepository.com/artifact/org.apache.spark/spark-mllib-local)
If, for some reason, you compile again the build.sbt you need to re-import the jar file again.

Trying to get Apache Spark working with IntelliJ

I am trying to get Apache Spark working with IntelliJ. I have created an SBT project in IntelliJ and done the following:
1. Gone to File -> Project Structure -> Libraries
2. Clicked the '+' in the middle section, clicked Maven, clicked Download Library from Maven Repository, typed text 'spark-core' and org.apache.spark:spark-core_2.11:2.2.0, which is the latest version of Spark available
I downloaded the jar files and the source code into ./lib in the project folder
3. The Spark library is now showing in the list of libraries
4. Then I right-clicked on org.apache.spark:spark-core_2.11:2.2.0 and clicked Add to Project and Add to Modules
Now when I click on Modules on the left, and then my main project folder, and then Dependencies tab on the right I can see the external library as a Maven library, but after clicking Apply, re-building the project and re-starting IntelliJ, it will not show as an external library in the project. Therefore I can't access the Spark API commands.
What am I doing wrong please? I've looked at all the documentation on IntelliJ and a hundred other sources but can't find the answer.
Also, do I also need to include the following text in the build.SBT file, as well as specifying Apache Spark as an external library dependency? I assume that I need to EITHER include the code in the build.SBT file, OR add Spark as an external dependency manually, but not both.
I included this code in my build.SBT file:
name := "Spark_example"
version := "1.0"
scalaVersion := "2.12.3"
val sparkVersion = "2.0.0"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-streaming" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion
)
I get an error: sbt.ResolveException: unresolved dependency: org.apache.spark#spark-core_2.12;2.2.0: not found
Please help! Thanks
Spark does not have builds for Scala version 2.12.x. So set the Scala version to 2.11.x
scalaVersion := "2.11.8"
val sparkVersion = "2.0.0"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-streaming" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion
)
name := "Test"
version := "0.1"
scalaVersion := "2.11.7"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0.2.6.4.0-91"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0.2.6.4.0-91"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.2.0.2.6.4.0-91" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.2.0.2.6.4.0-91" % "runtime"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.2.0.2.6.4.0-91" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-hive-thriftserver" % "2.2.0.2.6.4.0-91" % "provided"

IntelliJ Idea 2016.2.4 cannot resolve symbol spark_2.11

I made a following dependency in build.sbt file for apache-spark 2.11.
name := "Project1"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.0.1"
libraryDependencies ++= Seq(
"org.scala-lang" % "scala-compiler" % "2.11.8",
"org.scala-lang" % "scala-reflect" % "2.11.8",
"org.scala-lang.modules" % "scala-parser-combinators_2.11" % "1.0.4",
"org.scala-lang.modules" % "scala-xml_2.11" % "1.0.4"
)
However Intellij could not resolve spark-core_2.11 dependency . I tried multiple times but could not succeed. Thanks in Advance.
I had the same problem in IntelliJ 2016.3.2 with almost the same Scala/Spark versions:
name := "some-project"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.1.0"
To get it to work I had to manually add the spark-core jar to my project libraries, ie:
Right click on the project -> Open Module Settings
Under Project Settings -> Libraries click + and select the 'Java' option.
Browse for the jar. I found it in my Ivy cache - I assume it got there because I had run the 'update' task from the sbt console previously.

Libraries not resolving in sbt within intellij, but resolves and compiles from commandline

The below sbt file doesn't resolve the spark-xml databricks package from within intelliJ Idea where as works fine with command line
name := "dataframes"
version := "1.0"
scalaVersion := "2.11.8"
val sparkVersion="1.4.0"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion % "provided",
"org.apache.spark" %% "spark-sql" % sparkVersion,
"com.databricks" %% "spark-xml" % "0.3.3"
)
resolvers ++= Seq(
"Apache HBase" at "https://repository.apache.org/content/repositories/releases",
"Typesafe repository" at "http://repo.typesafe.com/typesafe/releases/"
)
resolvers += Resolver.mavenLocal
The sbt was set to both bundled and other pointing to the locally installed sbt, but did not work either way.
The below package resolves and works perfectly from commandline
import com.databricks.spark._