Spark MLlib exception: LDA.class compiled against incompatible version - scala

When I try to run a Main class with sbt, I get this error. What am I missing?
Error:scalac: missing or invalid dependency detected while loading class file 'LDA.class'.
Could not access type Logging in package org.apache.spark,
because it (or its dependencies) are missing. Check your build definition for missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
A full rebuild may help if 'LDA.class' was compiled against an incompatible version of org.apache.spark.
My build.sbt looks like this:
"org.apache.spark" %% "spark-core" % "2.0.1" % Provided,
"org.apache.spark" % "spark-mllib_2.11" % "1.3.0"

You are trying to run an old Spark MLlib against a newer Spark Core; that is, your spark-mllib and spark-core versions are completely different.
Try using this:
"org.apache.spark" %% "spark-core_2.11" % "2.0.1",
"org.apache.spark" %% "spark-mllib_2.11" % "2.0.1"
Thant might solve your problem !
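For reference, a minimal sketch of the whole dependency block with both modules aligned (assuming Scala 2.11, which the _2.11 suffix in the question implies, and keeping the Provided scope from the question):
// build.sbt sketch: one Spark version, one Scala binary version
scalaVersion := "2.11.8"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.0.1" % Provided,
  "org.apache.spark" %% "spark-mllib" % "2.0.1"
)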

Related

upgrade from Scala 2.11.8 to 2.12.10 build fails at sbt due to conflicting cross-version suffixes

I am trying to upgrade my Scala version from 2.11.8 to 2.12.10. I made the following changes in my sbt file:
"org.apache.spark" %% "spark-core" % "2.4.7" % "provided",
"org.apache.spark" %% "spark-sql" % "2.4.7" % "provided",
"com.holdenkarau" %% "spark-testing-base" % "3.1.2_1.1.0" % "test"
When I build the project, I get the following error:
[error] Modules were resolved with conflicting cross-version suffixes in ProjectRef(uri("file:/Users/user/IdeaProjects/project/"), "root"):
[error] io.reactivex:rxscala _2.12, _2.11
I tried the following approaches, but with no luck:
1. ("io.reactivex" % "rxscala_2.12" % "0.27.0").force().exclude("io.reactivex","rxscala_2.11")
2. Removed Scala version 2.11.8 from File -> Project Structure -> Global Libraries.
Any help would be very useful.
It would be best if you could post your entire build.sbt file so it could be reproduced, but as a starting point, the dependency spark-testing-base is pointing to the wrong Spark version. From the documentation:
So you include com.holdenkarau.spark-testing-base [spark_version]_1.0.0 and extend
Based on the information you have provided you should be using:
"com.holdenkarau" %% "spark-testing-base" % "2.4.7_1.1.0" % "test"

finding spark scala packages

I'm sure this is simpler than it looks, but I'm willing to look dumb.
I'm working my way through some Scala/Spark examples, which occasionally call for adding library dependencies, e.g.,
libraryDependencies ++= Seq(
scalaTest % Test,
"org.apache.spark" %% "spark-core" % "2.2.0" % "provided",
"org.apache.spark" %% "spark-mllib" % "2.2.0"
)
The question is, how do you find the appropriate names and versions for the libraries? It seems the texts all give import statements; there has to be some kind of registry or something. But where?
The correct version of a library can always be found by searching mvnrepository. If you are trying to use a version from a proprietary distribution, you need to add that distribution's repository:
Cloudera repository
MapR repository
hdp_maven_artifacts
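For sbt, that means adding a resolver next to the dependency. A minimal sketch for the Cloudera case (the URL below is Cloudera's public Artifactory repository as far as I know; verify it against your distribution's documentation):
// build.sbt sketch: register the vendor repository so sbt can resolve
// artifacts that are only published there
resolvers += "cloudera" at "https://repository.cloudera.com/artifactory/cloudera-repos/"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0" % "provided"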

can't import kamon-play-26 using SBT

I updated my Play version to 2.6.0. I have a Kamon dependency, but sbt can't resolve it.
Did anyone encounter this problem too?
Below are my libraryDependencies in build.sbt:
libraryDependencies ++=
Seq(
ws,
"com.google.inject" % "guice" % "3.0",
"com.typesafe.play" %% "play-json" % "2.6.0",
"io.kamon" %% "kamon-play-26" % "0.6.7"
)
But I get the error below saying kamon-play-26 was not found...
Kamon for Play 2.6 is available for Scala 2.11 and 2.12 with:
"io.kamon" %% "kamon-play-2.6" % "0.6.8"
Note the period in 2.6.
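Assuming the rest of the block stays as posted, that gives roughly:
libraryDependencies ++= Seq(
  ws,
  "com.google.inject" % "guice" % "3.0",
  "com.typesafe.play" %% "play-json" % "2.6.0",
  // note the period in the artifact name: kamon-play-2.6, not kamon-play-26
  "io.kamon" %% "kamon-play-2.6" % "0.6.8"
)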
Searching through the Kamon repositories on Maven reveals that there is no kamon-play-26 package.
The GitHub page https://github.com/kamon-io/kamon-play indicates that it does exist, however. Perhaps it's been pulled because the build is failing. Compile your own package from source, perhaps?

Using s3n:// with spark-submit

I've written a Spark app that is to run on a cluster using spark-submit. Here's part of my build.sbt.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1" % "provided" exclude("asm", "asm")
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.6.1" % "provided"
asm is excluded because I'm using another library which depends on a different version of it. The asm dependency in Spark seems to come in through one of Hadoop's dependencies, and I'm not using that functionality.
The problem now is that with this setup, saveAsTextFile("s3n://my-bucket/dir/file") throws java.io.IOException: No FileSystem for scheme: s3n.
Why is this happening? Shouldn't spark-submit provide the Hadoop dependencies?
I've tried a few things:
Leaving out "provided"
Putting hadoop-aws on the classpath, via a jar and spark.executor.extraClassPath and spark.driver.extraClassPath. This requires doing the same for all of its transitive dependencies though, which can be painful.
Neither really works. Is there a better approach?
I'm using the pre-built spark-1.6.1-bin-hadoop2.6.
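For what it's worth, one variant of the second attempt is to let sbt pull in hadoop-aws and its transitive dependencies instead of shipping the jars by hand; the 2.6.x version below is only an assumption meant to line up with the pre-built hadoop2.6 distribution:
// sketch: hadoop-aws is a plain Java artifact, hence % rather than %%;
// the version should match the Hadoop build your Spark distribution uses
libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "2.6.0"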

IntelliJ: scalac bad symbolic reference

In my build.sbt file I have this for my project:
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.1"
libraryDependencies += "org.apache.spark" % "spark-hive_2.10" % "1.3.1"
libraryDependencies += "org.apache.spark" % "spark-graphx_2.10" % "1.3.1"
libraryDependencies += "org.apache.spark" % "spark-mllib_2.10" % "1.3.1"
I just let it download all the libraries automatically. I'm adding graphx, spark-core, and the Scala SDK to one of my project modules, but when I try to compile I'm getting:
Error:scalac: bad symbolic reference. A signature in RDD.class refers to term hadoop
in package org.apache which is not available.
It may be completely missing from the current classpath, or the version on
the classpath might be incompatible with the version used when compiling RDD.class.
Error:scalac: bad symbolic reference. A signature in RDD.class refers to term io
in value org.apache.hadoop which is not available.
It may be completely missing from the current classpath, or the version on
the classpath might be incompatible with the version used when compiling RDD.class.
Error:scalac: bad symbolic reference. A signature in RDD.class refers to term compress
in value org.apache.io which is not available.
It may be completely missing from the current classpath, or the version on
the classpath might be incompatible with the version used when compiling RDD.class.
The weird thing is if I download graphx/mllib directly from the maven repositories it seems to compile. Any ideas?
Another possible source of error is an incorrect scalac version setting in the project. Right-click the project -> Open Module Settings -> Global Libraries, and change/add the scala-sdk version appropriate to your project.
Please add the Hadoop dependencies. Something like:
libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.7.1"
libraryDependencies += "org.apache.hadoop" % "hadoop-hdfs" % "2.7.1"
Note the single %: the Hadoop artifacts are plain Java libraries and carry no Scala binary suffix, so %% would fail to resolve.
You may need to add other hadoop modules depending on your app.
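As a hedged alternative, the aggregate hadoop-client artifact pulls in hadoop-common, hadoop-hdfs and the MapReduce client modules transitively, so a single line may be enough:
// sketch: hadoop-client is a plain Java artifact, so use % rather than %%;
// 2.7.1 mirrors the versions above -- match it to the Hadoop build your
// Spark distribution was compiled against
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.7.1"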