SBT: cannot resolve dependency that used to work before - scala

My build.sbt looks like this:
import sbt._
name := "spark-jobs"
version := "0.1"
scalaVersion := "2.11.8"
resolvers += "Spark Packages Repo" at "https://dl.bintray.com/spark-packages/maven"
// additional libraries
libraryDependencies ++= Seq(
"org.apache.spark" % "spark-core_2.11" % "2.2.0" % "provided",
"org.apache.spark" % "spark-streaming_2.11" % "2.2.0",
"org.apache.spark" % "spark-sql_2.11" % "2.2.0" % "provided",
"org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % "2.2.0"
)
assemblyMergeStrategy in assembly := {
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
case x => MergeStrategy.first
}
This worked until I tried adding another % "provided" to the end of the spark-streaming_2.11 line. Dependency resolution failed, so I reverted the change, but the error persists. My build.sbt is now exactly as it was when everything worked, yet I still get this exception:
[error] (*:update) sbt.librarymanagement.ResolveException: unresolved dependency: org.apache.spark#spark-streaming_2.11;2.2.0: org.apache.spark#spark-parent_2.11;2.2.0!spark-parent_2.11.pom(pom.original) origin location must be absolute: file:/home/aswin/.m2/repository/org/apache/spark/spark-parent_2.11/2.2.0/spark-parent_2.11-2.2.0.pom
SBT's behavior is a bit confusing to me. Could someone explain why this happens? Pointers to good blogs or other resources on how SBT works under the hood are also welcome.
Here is my project/assembly.sbt:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")
project/build.properties:
sbt.version = 1.0.4
project/plugins.sbt:
resolvers += Resolver.url("artifactory", url("http://scalasbt.artifactoryonline.com/scalasbt/sbt-plugin-releases"))(Resolver.ivyStylePatterns)
resolvers += "Typesafe Repository" at "http://repo.typesafe.com/typesafe/releases/"
Thank you!

If you are in the sbt shell, run the reload command and try again. After you change your dependencies or sbt plugins, you need to reload the project for the changes to take effect.
By the way, instead of hard-coding the Scala version in each artifact name, you can use the %% operator, which appends the Scala binary version derived from your scalaVersion setting.
// additional libraries
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.2.0" % "provided",
"org.apache.spark" %% "spark-streaming" % "2.2.0",
"org.apache.spark" %% "spark-sql" % "2.2.0" % "provided",
"org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.2.0"
)
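To make the %% shorthand concrete, here is a rough sketch in plain Scala of how the artifact name is derived. It is a simplification that assumes a stable Scala 2.x version string such as "2.11.8"; CrossNameSketch and its methods are hypothetical names for illustration, not sbt's actual API.

```scala
// Simplified illustration of what %% does to an artifact name.
// Not sbt's implementation: assumes a stable Scala 2.x version like "2.11.8".
object CrossNameSketch {
  // "2.11.8" -> "2.11" (the Scala binary version for stable 2.x releases)
  def binaryVersion(scalaVersion: String): String =
    scalaVersion.split('.').take(2).mkString(".")

  // "spark-core" + "2.11.8" -> "spark-core_2.11"
  def crossName(artifact: String, scalaVersion: String): String =
    s"${artifact}_${binaryVersion(scalaVersion)}"
}
```

With scalaVersion := "2.11.8", writing "org.apache.spark" %% "spark-core" % "2.2.0" therefore asks for the same artifact as "org.apache.spark" % "spark-core_2.11" % "2.2.0".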

Related

Question: libraries dependency Spark Scala in Intellij --Unresolved dependencies path: resolved

Checking the logs is usually helpful. When you use:
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.1.2"
you will find a repo link:
not found: https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.13/3.1.2/spark-core_2.13-3.1.2.pom
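The log shows that %% appended _2.13, the Scala binary version taken from the project's scalaVersion, and no spark-core_2.13 artifact exists for Spark 3.1.2, so resolution fails.
One fix is to use plain % and spell out the suffix yourself, for example _2.12 (the project's scalaVersion should then also be 2.12.x):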
libraryDependencies += "org.apache.spark" % "spark-sql_2.12" % "3.1.2" % "provided"
libraryDependencies += "org.apache.spark" % "spark-core_2.12" % "3.1.2" % "provided"
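An alternative, sketched here on the assumption that the project can build with Scala 2.12, is to keep %% and pin scalaVersion to a 2.12.x release so that the generated suffix matches an artifact that exists. The patch version 2.12.10 is only an example.

```scala
// build.sbt (sketch): with a 2.12.x scalaVersion, %% resolves spark-core_2.12 and spark-sql_2.12
scalaVersion := "2.12.10" // assumed; use whichever 2.12.x release the project supports

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.1.2" % "provided",
  "org.apache.spark" %% "spark-sql"  % "3.1.2" % "provided"
)
```

This keeps the Scala suffix in one place, so a later Scala upgrade only touches the scalaVersion line.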

not able to import spark mllib in IntelliJ

I am not able to import spark mllib libraries in Intellij for Spark scala project. I am getting a resolution exception.
Below is my build.sbt:
name := "ML_Spark"
version := "0.1"
scalaVersion := "2.11.12"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.1"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.1"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.2.1" % "runtime"
I tried to copy and paste the build.sbt you provided and got the following error:
[error] [/Users/pc/testsbt/build.sbt]:3: ';' expected but string literal found.
The build.sbt as posted is invalid. Putting version and scalaVersion on separate lines solved the problem for me:
name := "ML_Spark"
version := "0.1"
scalaVersion := "2.11.12"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.1"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.1"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.2.1" % "runtime"
I am not sure this is the problem you are facing (can you please share the exception you got?). It might instead be a problem with the repositories configured under the .sbt folder in your home directory.
I ran into the same problem before. To solve it, I declared spark-mllib in the default compile scope instead of the runtime scope. Here is my configuration:
name := "SparkDemo"
version := "0.1"
scalaVersion := "2.11.12"
// https://mvnrepository.com/artifact/org.apache.spark/spark-core
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0"
// https://mvnrepository.com/artifact/org.apache.spark/spark-mllib
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.3.0"
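As a rough sketch of why the scope matters, the fragment below annotates what each configuration means for the classpath. It reuses the versions from the configuration above; marking spark-core as "provided" is an assumption that only fits when a cluster supplies Spark at run time.

```scala
// build.sbt (sketch): how the trailing configuration affects the classpath
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  // No configuration: defaults to Compile, so `import org.apache.spark.ml._` compiles.
  "org.apache.spark" %% "spark-mllib" % "2.3.0",
  // "provided": available when compiling, but expected to be supplied by the
  // runtime environment (for example a Spark cluster) instead of being packaged.
  "org.apache.spark" %% "spark-core" % "2.3.0" % "provided"
  // "runtime" would keep a jar off the compile classpath, so imports from it fail to compile.
)
```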
I had a similar issue and found a workaround: add the spark-mllib jar file to your project manually. Even though my build.sbt was
name := "example_project"
version := "0.1"
scalaVersion := "2.12.10"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "3.0.0",
"org.apache.spark" %% "spark-sql" % "3.0.0",
"org.apache.spark" %% "spark-mllib" % "3.0.0" % "runtime"
)
I wasn't able to import the spark library with
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.ml._
The solution that worked for me was to add the jar files manually:
1. Download the jar file of the ML library you need (e.g. for Spark 3, use https://mvnrepository.com/artifact/org.apache.spark/spark-mllib_2.12/3.0.0 ).
2. Add the jar file to your IntelliJ project as described in: Correct way to add external jars (lib/*.jar) to an IntelliJ IDEA project
3. Also add the mllib-local jar (https://mvnrepository.com/artifact/org.apache.spark/spark-mllib-local).
If you re-import build.sbt for any reason, you need to add the jar files again.

Need some help in fixing the Spark streaming dependency (Scala sbt)

I am trying to run basic spark streaming example on my machine using IntelliJ, but I am unable to resolve the dependency issues.
Please help me in fixing it.
name := "demoSpark"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies ++= Seq(
"org.apache.spark" % "spark-core_2.11" % "2.1.0",
"org.apache.spark" % "spark-sql_2.10" % "2.1.0",
"org.apache.spark" % "spark-streaming_2.11" % "2.1.0",
"org.apache.spark" % "spark-mllib_2.10" % "2.1.0"
)
At the very least, all the dependencies must be built for the same Scala version, not a mix of 2.10 and 2.11. You can use the %% operator in sbt to ensure the right version is selected (the one you specified in scalaVersion).
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.1.0",
"org.apache.spark" %% "spark-sql" % "2.1.0",
"org.apache.spark" %% "spark-streaming" % "2.1.0",
"org.apache.spark" %% "spark-mllib" % "2.1.0"
)
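A mixed-suffix mistake like the one in the question can be spotted mechanically. The following is a minimal sketch in plain Scala, not an sbt feature; SuffixCheck and scalaSuffixes are hypothetical names, and the regular expression assumes suffixes of the form _2.10 or _2.11 at the end of the artifact name.

```scala
// Sketch: collect the Scala binary suffixes used across a list of artifact names.
// More than one distinct suffix means the build mixes Scala versions.
object SuffixCheck {
  // Captures the trailing "_<major>.<minor>" part, e.g. "2.11" in "spark-core_2.11".
  private val Suffix = """.*_(\d+\.\d+)$""".r

  def scalaSuffixes(artifacts: Seq[String]): Set[String] =
    artifacts.collect { case Suffix(v) => v }.toSet
}
```

Applied to the four artifact names from the question, the result contains both 2.10 and 2.11, which is exactly the inconsistency that %% avoids.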

Trying to get Apache Spark working with IntelliJ

I am trying to get Apache Spark working with IntelliJ. I have created an SBT project in IntelliJ and done the following:
1. Gone to File -> Project Structure -> Libraries
2. Clicked the '+' in the middle section, clicked Maven, clicked Download Library from Maven Repository, typed 'spark-core' and selected org.apache.spark:spark-core_2.11:2.2.0, which is the latest version of Spark available. I downloaded the jar files and the source code into ./lib in the project folder
3. The Spark library is now showing in the list of libraries
4. Then I right-clicked on org.apache.spark:spark-core_2.11:2.2.0 and clicked Add to Project and Add to Modules
Now when I click on Modules on the left, and then my main project folder, and then Dependencies tab on the right I can see the external library as a Maven library, but after clicking Apply, re-building the project and re-starting IntelliJ, it will not show as an external library in the project. Therefore I can't access the Spark API commands.
What am I doing wrong please? I've looked at all the documentation on IntelliJ and a hundred other sources but can't find the answer.
Also, do I need to include the following in build.sbt as well as adding Apache Spark as an external library? I assume I need to EITHER declare the dependencies in build.sbt OR add Spark as an external library manually, but not both.
I included this code in my build.sbt file:
name := "Spark_example"
version := "1.0"
scalaVersion := "2.12.3"
val sparkVersion = "2.0.0"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-streaming" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion
)
I get an error: sbt.ResolveException: unresolved dependency: org.apache.spark#spark-core_2.12;2.2.0: not found
Please help! Thanks
These Spark versions are not published for Scala 2.12.x (Scala 2.12 builds only arrived with Spark 2.4), so set the Scala version to 2.11.x:
scalaVersion := "2.11.8"
val sparkVersion = "2.0.0"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-streaming" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion
)
name := "Test"
version := "0.1"
scalaVersion := "2.11.7"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0.2.6.4.0-91"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0.2.6.4.0-91"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.2.0.2.6.4.0-91" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.2.0.2.6.4.0-91" % "runtime"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.2.0.2.6.4.0-91" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-hive-thriftserver" % "2.2.0.2.6.4.0-91" % "provided"

Resolving SBT dependencies

I am new to JVM development (I am using Scala and SBT) and am having trouble resolving dependencies. Yesterday, I had trouble resolving the org.restlet.2.1.1 dependency and today, I am having trouble with resolving the following:
[error] (*:update) sbt.ResolveException: unresolved dependency: com.mongodb.casbah#casbah_2.9.2;2.1.5-1: not found
[error] unresolved dependency: org.scalatra#scalatra_2.9.2;2.3.0: not found
[error] unresolved dependency: org.scalatra#scalatra-akka2_2.9.2;2.3.0: not found
[error] unresolved dependency: org.scalatra#scalatra-specs2_2.9.2;2.3.0: not found
I am using a giter8 scalatra-mongodb project template from GitHub. Since the project is a little old, it stands to reason that I am trying to obtain outdated versions that no longer exist or are no longer compatible. What does one do in this situation? I tried fiddling with the version numbers in my build.sbt file, but this did not work (and appears to have made things worse!).
The following is the contents of my build.sbt file:
scalaVersion := "2.9.2"
mainClass := Some("JettyLauncher")
seq(webSettings :_*)
port in container.Configuration := 8080
seq(assemblySettings: _*)
libraryDependencies ++= Seq(
"com.mongodb.casbah" %% "casbah" % "2.8.1-1",
"org.scalatra" %% "scalatra" % "2.2.0",
"org.scalatra" %% "scalatra-akka2" % "2.2.0",
"org.scalatra" %% "scalatra-specs2" % "2.2.0" % "test",
"org.mortbay.jetty" % "servlet-api" % "3.0.20100224" % "provided",
"org.eclipse.jetty" % "jetty-server" % "8.0.0.M3" % "container, compile",
"org.eclipse.jetty" % "jetty-util" % "8.0.0.M3" % "container, compile",
"org.eclipse.jetty" % "jetty-webapp" % "8.0.0.M3" % "container, compile"
)
resolvers ++= Seq(
"Sonatype OSS" at "http://oss.sonatype.org/content/repositories/releases/",
"Sonatype OSS Snapshots" at "http://oss.sonatype.org/content/repositories/snapshots/",
"Typesafe Repository" at "http://repo.typesafe.com/typesafe/releases/",
"Akka Repo" at "http://akka.io/repository/",
"Web plugin repo" at "http://siasia.github.com/maven2"
)
The following is my plugins.sbt file:
addSbtPlugin("com.earldouglas" %% "xsbt-web-plugin" % "0.9.0")
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.7.2")
Note that when I first generated the template, sbt reported missing dependencies for the first plugin. Fortunately, the GitHub page for this plugin gave updated instructions and I was able to get past that problem.
Anyway, which versions of these dependencies do I need to get everything working? In general, what is a strategy for resolving these dependencies? Right now I have no idea what to do other than visit the GitHub pages and fiddle with version numbers.
Thanks for all the help!
I have at least gotten sbt to resolve my dependencies, for whatever that is worth (it has classpath issues now...). Anyway, the following is my new and improved build.sbt file:
scalaVersion := "2.10.4"
mainClass := Some("JettyLauncher")
seq(webSettings :_*)
port in container.Configuration := 8080
seq(assemblySettings: _*)
libraryDependencies += "org.mongodb" %% "casbah-core" % "2.7.3"
libraryDependencies += "org.scalatra" %% "scalatra" % "2.2.0-RC3" cross CrossVersion.binary
libraryDependencies += "org.scalatra" %% "scalatra-akka" % "2.2.0-RC3"
libraryDependencies += "org.scalatra" %% "scalatra-specs2" % "2.2.0" % "test"
libraryDependencies += "org.mortbay.jetty" % "servlet-api" % "3.0.20100224" % "provided"
libraryDependencies += "org.eclipse.jetty" % "jetty-server" % "9.0.0.M5" % "container"
libraryDependencies += "org.eclipse.jetty" % "jetty-util" % "9.0.0.M5" % "container"
libraryDependencies += "org.eclipse.jetty" % "jetty-webapp" % "9.0.0.M5" % "container"
resolvers ++= Seq(
"Sonatype releases" at "http://oss.sonatype.org/content/repositories/releases/",
"Sonatype snapshots" at "http://oss.sonatype.org/content/repositories/snapshots/",
"Typesafe Repository" at "http://repo.typesafe.com/typesafe/releases/",
"Akka Repo" at "http://akka.io/repository/",
"Web plugin repo" at "http://siasia.github.com/maven2"
)
All I really did was spend a few hours googling and searching Maven repositories for versions compatible with the Scala version I am using (2.10.4). I learned that %% appends the Scala binary version to the dependency name (which seems like a nice convention, since Scala is always evolving). Once I got a few dependencies resolved, the rest caved!