build.sbt for Spark with JanusGraph and Gremlin-Scala

I was trying to set up an IntelliJ build for Spark with JanusGraph using Gremlin-Scala, but I am running into errors.
My build.sbt file is:
version := "1.0"
scalaVersion := "2.11.11"
libraryDependencies += "com.michaelpollmeier" % "gremlin-scala" % "2.3.0"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.1"
// https://mvnrepository.com/artifact/org.apache.spark/spark-sql
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.1"
// https://mvnrepository.com/artifact/org.apache.spark/spark-mllib
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.2.1"
// https://mvnrepository.com/artifact/org.apache.spark/spark-hive
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.2.1"
// https://mvnrepository.com/artifact/org.janusgraph/janusgraph-core
libraryDependencies += "org.janusgraph" % "janusgraph-core" % "0.2.0"
libraryDependencies ++= Seq(
"ch.qos.logback" % "logback-classic" % "1.2.3" % Test,
"org.scalatest" %% "scalatest" % "3.0.3" % Test
)
resolvers ++= Seq(
Resolver.mavenLocal,
"Sonatype OSS" at "https://oss.sonatype.org/content/repositories/public"
)
But I am getting errors when I try to compile code that uses the Gremlin-Scala library or io.Source. Can someone share their build file or tell me what I should modify to fix it?
Thanks in advance.
So, I was trying to compile this code:
import gremlin.scala._
import org.apache.commons.configuration.BaseConfiguration
import org.janusgraph.core.JanusGraphFactory
class Test1() {
  val conf = new BaseConfiguration()
  conf.setProperty("storage.backend", "inmemory")
  val gr = JanusGraphFactory.open(conf)
  val graph = gr.asScala()
  graph.close
}
object Test {
  def main(args: Array[String]) {
    val t = new Test1()
    println("in Main")
  }
}
The errors I get are:
Error:(1, 8) not found: object gremlin
import gremlin.scala._
Error:(10, 18) value asScala is not a member of org.janusgraph.core.JanusGraph
val graph = gr.asScala()

If you go to the Gremlin-Scala GitHub page you'll see that the current version is "3.3.1.1" and that
Typically you just need to add a dependency on "com.michaelpollmeier" %% "gremlin-scala" % "SOME_VERSION" and one for the graph db of your choice to your build.sbt (this readme assumes tinkergraph). The latest version is displayed at the top of this readme in the maven badge.
It is no surprise that the API has changed when the major version of the library is different. If I change your first dependency as follows:
//libraryDependencies += "com.michaelpollmeier" % "gremlin-scala" % "2.3.0" //old!
libraryDependencies += "com.michaelpollmeier" %% "gremlin-scala" % "3.3.1.1"
then your example code compiles for me.
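For reference, a trimmed build.sbt along these lines, keeping only the dependencies the example needs (versions taken from the question, with just the gremlin-scala line updated; a sketch, not a tested build):
version := "1.0"
scalaVersion := "2.11.11"
// gremlin-scala 3.x is cross-published against the Scala binary version, hence %%
libraryDependencies += "com.michaelpollmeier" %% "gremlin-scala" % "3.3.1.1"
libraryDependencies += "org.janusgraph" % "janusgraph-core" % "0.2.0"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.2.1",
  "org.apache.spark" %% "spark-sql" % "2.2.1"
)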

Related

Can you import a separate version of the same dependency into one build file for test?

I was thinking this would work but it did not for me.
libraryDependencies += "org.json4s" %% "json4s-core" % "3.6.7" % "test"
libraryDependencies += "org.json4s" %% "json4s-core" % "3.7.0" % "compile"
Any idea?
The Test classpath includes the Compile classpath.
So if you need different versions of the dependency, create separate subprojects:
lazy val forJson4s370 = project
  .settings(
    libraryDependencies += "org.json4s" %% "json4s-core" % "3.7.0" % "compile"
  )

lazy val forJson4s367 = project
  .settings(
    libraryDependencies += "org.json4s" %% "json4s-core" % "3.6.7" % "test"
  )
If you don't want to create different subprojects, you can try custom sbt configurations:
https://www.scala-sbt.org/1.x/docs/Advanced-Configurations-Example.html
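A rough, untested sketch of what such a configuration might look like in build.sbt (the configuration name Json4s367 is made up for illustration; see the linked docs for the full pattern):
// a separate ivy configuration/classpath that carries the old json4s version
lazy val Json4s367 = config("json4s367")

lazy val root = (project in file("."))
  .configs(Json4s367)
  .settings(
    inConfig(Json4s367)(Defaults.compileSettings), // give the configuration its own compile task and classpath
    libraryDependencies += "org.json4s" %% "json4s-core" % "3.7.0" % "compile",
    libraryDependencies += "org.json4s" %% "json4s-core" % "3.6.7" % Json4s367 // only on the custom classpath
  )
The idea is that code compiled under json4s367:compile sees 3.6.7 while the regular compile sees 3.7.0; the details are spelled out in the linked sbt documentation.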
An exotic solution is to manage dependencies and compile/run the code programmatically in the one class that needs the exceptional version. Then you can use dependencies/versions different from those specified in build.sbt.
import java.net.URLClassLoader
import coursier.{Dependency, Module, Organization, ModuleName, Fetch}
import scala.reflect.runtime.universe
import scala.reflect.runtime.universe.Quasiquote
import scala.tools.reflect.ToolBox
val files = Fetch()
  .addDependencies(
    Dependency(Module(Organization("org.json4s"), ModuleName("json4s-core_2.13")), "3.6.7"),
  )
  .run()
val depClassLoader = new URLClassLoader(
  files.map(_.toURI.toURL).toArray,
  /*getClass.getClassLoader*/ null // ignoring current classpath
)
val rm = universe.runtimeMirror(depClassLoader)
val tb = rm.mkToolBox()
tb.eval(q"""
  import org.json4s._
  // some exceptional json4s 3.6.7 code
  println("hi")
""")
// hi
build.sbt
libraryDependencies ++= Seq(
  scalaOrganization.value % "scala-compiler" % scalaVersion.value % "test",
  "io.get-coursier" %% "coursier" % "2.1.0-M7-39-gb8f3d7532" % "test",
  "org.json4s" %% "json4s-core" % "3.7.0" % "compile",
)

Apache Spark 3.1.2 can't read from S3 via documented spark-hadoop-cloud

The Spark documentation suggests using spark-hadoop-cloud to read/write from S3: https://spark.apache.org/docs/latest/cloud-integration.html
There is no Apache Spark published artifact for spark-hadoop-cloud, though, and when trying to use the Cloudera-published module, the following exception occurs:
Exception in thread "main" java.lang.NoSuchMethodError: 'void com.google.common.base.Preconditions.checkArgument(boolean, java.lang.String, java.lang.Object, java.lang.Object)'
at org.apache.hadoop.fs.s3a.S3AUtils.lookupPassword(S3AUtils.java:894)
at org.apache.hadoop.fs.s3a.S3AUtils.lookupPassword(S3AUtils.java:870)
at org.apache.hadoop.fs.s3a.S3AUtils.getEncryptionAlgorithm(S3AUtils.java:1605)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:363)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:46)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:377)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:519)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:428)
This seems like a classpath conflict, so it appears it's not possible to use spark-hadoop-cloud to read from S3 with the vanilla Apache Spark 3.1.2 jars.
version := "0.0.1"
scalaVersion := "2.12.12"
lazy val app = (project in file("app")).settings(
  assemblyPackageScala / assembleArtifact := false,
  assembly / assemblyJarName := "uber.jar",
  assembly / mainClass := Some("com.example.Main"),
  // more settings here ...
)
resolvers += "Cloudera" at "https://repository.cloudera.com/artifactory/cloudera-repos/"
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.1.1" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.1.1" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-hadoop-cloud" % "3.1.1.3.1.7270.0-253"
libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "3.1.1.7.2.7.0-184"
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "3.1.1.7.2.7.0-184"
libraryDependencies += "com.amazonaws" % "aws-java-sdk-bundle" % "1.11.901"
libraryDependencies += "com.github.mrpowers" %% "spark-daria" % "0.38.2"
libraryDependencies += "com.github.mrpowers" %% "spark-fast-tests" % "0.21.3" % "test"
libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.1" % "test"
import org.apache.spark.sql.SparkSession
object SparkApp {
  def main(args: Array[String]) {
    val spark = SparkSession.builder().master("local")
      //.config("spark.jars.repositories", "https://repository.cloudera.com/artifactory/cloudera-repos/")
      //.config("spark.jars.packages", "org.apache.spark:spark-hadoop-cloud_2.12:3.1.1.3.1.7270.0-253")
      .appName("spark session").getOrCreate
    val jsonDF = spark.read.json("s3a://path-to-bucket/compact.json")
    val csvDF = spark.read.format("csv").load("s3a://path-to-bucket/some.csv")
    jsonDF.show()
    csvDF.show()
  }
}
To read and write to S3 from Spark you only need these 2 dependencies:
"org.apache.hadoop" % "hadoop-aws" % hadoopVersion,
"org.apache.hadoop" % "hadoop-common" % hadoopVersion
Make sure the hadoopVersion is the same one used by your worker nodes, and make sure your worker nodes also have these dependencies available. The rest of your code looks correct.
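Spelled out in build.sbt, that might look like the sketch below (the Hadoop version is a placeholder; use the exact version your cluster runs):
val hadoopVersion = "3.2.0" // placeholder: must match the Hadoop version on the worker nodes

libraryDependencies ++= Seq(
  "org.apache.hadoop" % "hadoop-aws" % hadoopVersion,    // provides the S3A filesystem and pulls in the AWS SDK
  "org.apache.hadoop" % "hadoop-common" % hadoopVersion
)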

Why does Spark with Play fail with "NoClassDefFoundError: Could not initialize class org.apache.spark.SparkConf$"?

I am trying to use this project (https://github.com/alexmasselot/spark-play-activator) as an example of integrating Play and Spark so I can do the same in my project. So, I created an object that starts Spark and a controller that reads a JSON file using an RDD. Below is my object that starts Spark:
package bootstrap
import org.apache.spark.sql.SparkSession
object SparkCommons {
  val sparkSession = SparkSession
    .builder
    .master("local")
    .appName("ApplicationController")
    .getOrCreate()
}
and my build.sbt is like this:
import play.sbt.PlayImport._
name := """crypto-miners-demo"""
version := "1.0-SNAPSHOT"
lazy val root = (project in file(".")).enablePlugins(PlayScala)
scalaVersion := "2.12.4"
libraryDependencies += guice
libraryDependencies += evolutions
libraryDependencies += jdbc
libraryDependencies += filters
libraryDependencies += ws
libraryDependencies += "com.h2database" % "h2" % "1.4.194"
libraryDependencies += "com.typesafe.play" %% "anorm" % "2.5.3"
libraryDependencies += "org.scalatestplus.play" %% "scalatestplus-play" % "3.1.0" % Test
libraryDependencies += "com.typesafe.play" %% "play-slick" % "3.0.0"
libraryDependencies += "com.typesafe.play" %% "play-slick-evolutions" % "3.0.0"
libraryDependencies += "org.xerial" % "sqlite-jdbc" % "3.19.3"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.2.0"
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.2.0"
dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-databind" % "2.8.7"
But when I try to call a controller that uses the RDD, I get this error in the Play framework:
java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.SparkConf$
I am using the RDD like this: val rdd = SparkCommons.sparkSession.read.json("downloads/tweet-json").
The application whose configuration I am trying to copy works well. I could only import the jackson-databind lib into my build.sbt; I get an error when I copy libraryDependencies ++= Dependencies.sparkAkkaHadoop and ivyScala := ivyScala.value map { _.copy(overrideScalaVersion = true) } into my build.sbt.
I will write it 100000 times on the blackboard and never forget: Spark 2.2.0 still doesn't work with Scala 2.12. I also edited the Jackson lib version. Below is my build.sbt.
import play.sbt.PlayImport._
name := """crypto-miners-demo"""
version := "1.0-SNAPSHOT"
lazy val root = (project in file(".")).enablePlugins(PlayScala)
scalaVersion := "2.11.8"
libraryDependencies += guice
libraryDependencies += evolutions
libraryDependencies += jdbc
libraryDependencies += filters
libraryDependencies += ws
libraryDependencies += "com.h2database" % "h2" % "1.4.194"
libraryDependencies += "com.typesafe.play" %% "anorm" % "2.5.3"
libraryDependencies += "org.scalatestplus.play" %% "scalatestplus-play" % "3.1.0" % Test
libraryDependencies += "com.typesafe.play" %% "play-slick" % "3.0.0"
libraryDependencies += "com.typesafe.play" %% "play-slick-evolutions" % "3.0.0"
libraryDependencies += "org.xerial" % "sqlite-jdbc" % "3.19.3"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0"
dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-databind" % "2.6.5"

Unable to run Unit tests (scalatest) on Spark-2.2.0 - Scala-2.11.8

Unable to run scalatest with context on Spark 2.2.0.
StackTrace:
An exception or error caused a run to abort: org.apache.spark.sql.test.SharedSQLContext.eventually(Lorg/scalatest/concurrent/PatienceConfiguration$Timeout;Lscala/Function0;Lorg/scalatest/concurrent/AbstractPatienceConfiguration$PatienceConfig;)Ljava/lang/Object;
java.lang.NoSuchMethodError: org.apache.spark.sql.test.SharedSQLContext.eventually(Lorg/scalatest/concurrent/PatienceConfiguration$Timeout;Lscala/Function0;Lorg/scalatest/concurrent/AbstractPatienceConfiguration$PatienceConfig;)Ljava/lang/Object;
at org.apache.spark.sql.test.SharedSQLContext$class.afterEach(SharedSQLContext.scala:92)
at testUtils.ScalaTestWithContext1.afterEach(ScalaTestWithContext1.scala:7)
at org.scalatest.BeforeAndAfterEach$$anonfun$1.apply$mcV$sp(BeforeAndAfterEach.scala:234)
sample code:
import org.apache.spark.sql.SparkSession
import testUtils.ScalaTestWithContext1
class SampLeTest extends ScalaTestWithContext1 {
  override protected def spark: SparkSession = ???
  test("test") {
    1 == 1 shouldBe true
  }
}
ScalaTestWithContext1.scala
import org.apache.spark.sql.QueryTest
import org.apache.spark.sql.test.SharedSQLContext
import org.scalatest.{BeforeAndAfterAll, Matchers}
abstract class ScalaTestWithContext extends QueryTest with SharedSQLContext with Matchers with BeforeAndAfterAll{}
build.sbt :
name := "test"
version := "1.0"
scalaVersion := "2.11.11"
parallelExecution in Test := false
libraryDependencies ++= Seq(
  "org.scala-lang" % "scala-library" % "2.11.11" % "provided",
  "org.apache.spark" %% "spark-core" % "2.2.0",
  "org.apache.spark" %% "spark-sql" % "2.2.0",
  "org.apache.spark" %% "spark-catalyst" % "2.2.0",
  "org.apache.spark" %% "spark-core" % "2.2.0" % "test" classifier "tests",
  "org.apache.spark" %% "spark-sql" % "2.2.0" % "test" classifier "tests",
  "org.apache.spark" %% "spark-catalyst" % "2.2.0" % "test" classifier "tests",
  "org.scalatest" %% "scalatest" % "3.0.1" % "test"
)
The class ScalaTestWithContext1 extends SharedSQLContext and all required traits.
Thanks in Advance.
I faced a similar issue. The solution that worked for me was to use version 2.2.6 of Scalatest instead of the 3.x version. The Maven repository also shows the correct dependency in the section "Test Dependencies".
Similarly, as already pointed out, check the pom.xml file in Spark's GitHub repository to ensure you're using the same scalatest version.
There's probably a better solution, like having sbt merge or override your preferred version of scalatest over Spark's, but as of Dec 2019 Spark 2.4.4 is using Scalatest 3.0.3, which is fairly recent.
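For example, pinning the test dependency to the version mentioned above for Spark 2.2.0 (verify it against the pom.xml of the exact Spark release you build against) would look like:
// matches the scalatest version suggested above for Spark 2.2.0; confirm in Spark's pom.xml
libraryDependencies += "org.scalatest" %% "scalatest" % "2.2.6" % "test"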

java.lang.NoSuchMethodError in Scalatra using Scalate with Markdown

So I have a Scalatra app (using Scalatra 2.2.1). I'm building views using Scalate; I've decided to go with the Jade/Markdown one-two. Only one problem: if I try to use markdown in a jade template (started with the :markdown tag), I get this:
scala.Predef$.any2ArrowAssoc(Ljava/lang/Object;)Lscala/Predef$ArrowAssoc;
java.lang.NoSuchMethodError: scala.Predef$.any2ArrowAssoc(Ljava/lang/Object;)Lscala/Predef$ArrowAssoc;
at org.fusesource.scalamd.Markdown$.<init>(md.scala:119)
at org.fusesource.scalamd.Markdown$.<clinit>(md.scala:-1)
at org.fusesource.scalate.filter.ScalaMarkdownFilter$.filter(ScalaMarkdownFilter.scala:32)
at org.fusesource.scalate.RenderContext$class.filter(RenderContext.scala:276)
at org.fusesource.scalate.DefaultRenderContext.filter(DefaultRenderContext.scala:30)
at org.fusesource.scalate.RenderContext$class.value(RenderContext.scala:235)
at org.fusesource.scalate.DefaultRenderContext.value(DefaultRenderContext.scala:30)
at templates.views.$_scalate_$about_jade$.$_scalate_$render(about_jade.scala:37)
at templates.views.$_scalate_$about_jade.render(about_jade.scala:48)
at org.fusesource.scalate.DefaultRenderContext.capture(DefaultRenderContext.scala:92)
at org.fusesource.scalate.layout.DefaultLayoutStrategy.layout(DefaultLayoutStrategy.scala:45)
at org.fusesource.scalate.TemplateEngine$$anonfun$layout$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(TemplateEngine.scala:559)
at org.fusesource.scalate.TemplateEngine$$anonfun$layout$1$$anonfun$apply$mcV$sp$1.apply(TemplateEngine.scala:559)
at org.fusesource.scalate.TemplateEngine$$anonfun$layout$1$$anonfun$apply$mcV$sp$1.apply(TemplateEngine.scala:559)
So that's pretty cool. The error vanishes as soon as I remove the :markdown flag, and beyond that everything compiles (beyond markdown not getting rendered correctly).
Things I know and have found so far:
There's some thought that this error is the byproduct of incompatible Scala versions somewhere in the build. My build.scala defines the Scala version as 2.10.0, which Scalatra is explicitly compatible with.
...that said, I have no idea which version of Scalate Scalatra pulls in, and my attempts to override it have not worked so far. I know the current stable release of Scalate (1.6.1) is only compatible up to Scala 2.10.0, but that's what I'm using.
I am, however, sure that my classpath is clean. I have no conflicting Scala versions. Everything is 2.10.0 in the dependencies.
Has anybody worked with this one before? Any ideas?
EDIT
Per request, here are my build definitions:
//build.sbt
libraryDependencies += "org.scalatest" %% "scalatest" % "2.0.M5b" % "test"
libraryDependencies += "org.twitter4j" % "twitter4j-core" % "3.0.3"
libraryDependencies += "org.fusesource.scalamd" % "scalamd" % "1.5"
//build.properties
sbt.version=0.12.3
//build.scala
import sbt._
import Keys._
import org.scalatra.sbt._
import org.scalatra.sbt.PluginKeys._
import com.mojolly.scalate.ScalatePlugin._
import ScalateKeys._
object TheRangeBuild extends Build {
  val Organization = "com.gastove"
  val Name = "The Range"
  val Version = "0.1.0-SNAPSHOT"
  val ScalaVersion = "2.10.0"
  val ScalatraVersion = "2.2.1"

  lazy val project = Project(
    "the-range",
    file("."),
    settings = Defaults.defaultSettings ++ ScalatraPlugin.scalatraWithJRebel ++ scalateSettings ++ Seq(
      organization := Organization,
      name := Name,
      version := Version,
      scalaVersion := ScalaVersion,
      resolvers += Classpaths.typesafeReleases,
      libraryDependencies ++= Seq(
        "org.scalatra" %% "scalatra" % ScalatraVersion,
        "org.scalatra" %% "scalatra-scalate" % ScalatraVersion,
        "org.scalatra" %% "scalatra-specs2" % ScalatraVersion % "test",
        "ch.qos.logback" % "logback-classic" % "1.0.6" % "runtime",
        "org.eclipse.jetty" % "jetty-webapp" % "8.1.8.v20121106" % "compile;container",
        "org.eclipse.jetty.orbit" % "javax.servlet" % "3.0.0.v201112011016" % "compile;container;provided;test" artifacts (Artifact("javax.servlet", "jar", "jar"))
      ),
      scalateTemplateConfig in Compile <<= (sourceDirectory in Compile) { base =>
        Seq(
          TemplateConfig(
            base / "webapp" / "WEB-INF" / "templates",
            Seq.empty, /* default imports should be added here */
            Seq(
              Binding("context", "_root_.org.scalatra.scalate.ScalatraRenderContext", importMembers = true, isImplicit = true)
            ), /* add extra bindings here */
            Some("templates")
          )
        )
      }
    ) ++ seq(com.typesafe.startscript.StartScriptPlugin.startScriptForClassesSettings: _*)
  )
}
When using the markdown filter, you need to add the scalamd library as a runtime dependency:
"org.fusesource.scalamd" %% "scalamd" % "1.6"
The most recent version can be found on Maven Central.
Also, you can delete the build.sbt file and put the dependencies into the build.scala file, which makes things a bit simpler:
import sbt._
import Keys._
import org.scalatra.sbt._
import org.scalatra.sbt.PluginKeys._
import com.mojolly.scalate.ScalatePlugin._
import ScalateKeys._
object TheRangeBuild extends Build {
  val Organization = "com.gastove"
  val Name = "The Range"
  val Version = "0.1.0-SNAPSHOT"
  val ScalaVersion = "2.10.0"
  val ScalatraVersion = "2.2.1"

  lazy val project = Project(
    "the-range",
    file("."),
    settings = Defaults.defaultSettings ++ ScalatraPlugin.scalatraWithJRebel ++ scalateSettings ++ Seq(
      organization := Organization,
      name := Name,
      version := Version,
      scalaVersion := ScalaVersion,
      resolvers += Classpaths.typesafeReleases,
      libraryDependencies ++= Seq( // the build.sbt dependencies are added to this Seq
        "org.scalatra" %% "scalatra" % ScalatraVersion,
        "org.scalatra" %% "scalatra-scalate" % ScalatraVersion,
        "org.scalatra" %% "scalatra-specs2" % ScalatraVersion % "test",
        "ch.qos.logback" % "logback-classic" % "1.0.6" % "runtime",
        "org.eclipse.jetty" % "jetty-webapp" % "8.1.8.v20121106" % "compile;container",
        "org.scalatest" %% "scalatest" % "2.0.M5b" % "test",
        "org.twitter4j" % "twitter4j-core" % "3.0.3",
        "org.fusesource.scalamd" %% "scalamd" % "1.6", // %% so the Scala-suffixed artifact is resolved, as above
        "org.eclipse.jetty.orbit" % "javax.servlet" % "3.0.0.v201112011016" % "compile;container;provided;test" artifacts (Artifact("javax.servlet", "jar", "jar"))
      ),
      scalateTemplateConfig in Compile <<= (sourceDirectory in Compile) { base =>
        Seq(
          TemplateConfig(
            base / "webapp" / "WEB-INF" / "templates",
            Seq.empty, /* default imports should be added here */
            Seq(
              Binding("context", "_root_.org.scalatra.scalate.ScalatraRenderContext", importMembers = true, isImplicit = true)
            ), /* add extra bindings here */
            Some("templates")
          )
        )
      }
    ) ++ seq(com.typesafe.startscript.StartScriptPlugin.startScriptForClassesSettings: _*)
  )
}
This comes from:
libraryDependencies += "org.fusesource.scalamd" % "scalamd" % "1.5"
Looking at the scalamd-1.5 pom.xml, it is built against Scala 2.8.1, which is not compatible with 2.10.
Dependency resolution keeps Scala 2.10 and discards the 2.8.1 dependency, and you end up with this classpath issue.
The only solution you have is to try and build a new scalamd version against Scala 2.10, potentially fix a few things to get it there, and then publish it (at least locally).
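A rough sketch of that workflow, assuming you clone scalamd, cross-build it for Scala 2.10, and run sbt publishLocal (the version string below is purely illustrative):
// in your own project, after `sbt publishLocal` in the patched scalamd clone
// (publishLocal puts the artifact in the local Ivy repository, which sbt resolves by default)
libraryDependencies += "org.fusesource.scalamd" %% "scalamd" % "1.6-SNAPSHOT" // hypothetical locally published version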