SBT gives java.lang.NullPointerException when trying to run spark - scala

I'm trying to compile Spark with sbt 1.7.2 on a Linux machine running CentOS 6.
When I try to run the clean command:
./build/sbt clean
I get the following output:
java.lang.NullPointerException
at sun.net.util.URLUtil.urlNoFragString(URLUtil.java:50)
at sun.misc.URLClassPath.getLoader(URLClassPath.java:526)
at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:498)
at sun.misc.URLClassPath.getResource(URLClassPath.java:252)
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
at java.lang.ClassLoader.loadClass(ClassLoader.java:406)
at java.lang.ClassLoader.loadClass(ClassLoader.java:406)
at java.lang.ClassLoader.loadClass(ClassLoader.java:406)
at java.lang.ClassLoader.loadClass(ClassLoader.java:406)
at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
at sbt.internal.XMainConfiguration.run(XMainConfiguration.java:51)
at sbt.xMain.run(Main.scala:46)
at xsbt.boot.Launch$.$anonfun$run$1(Launch.scala:149)
at xsbt.boot.Launch$.withContextLoader(Launch.scala:176)
at xsbt.boot.Launch$.run(Launch.scala:149)
at xsbt.boot.Launch$.$anonfun$apply$1(Launch.scala:44)
at xsbt.boot.Launch$.launch(Launch.scala:159)
at xsbt.boot.Launch$.apply(Launch.scala:44)
at xsbt.boot.Launch$.apply(Launch.scala:21)
at xsbt.boot.Boot$.runImpl(Boot.scala:78)
at xsbt.boot.Boot$.run(Boot.scala:73)
at xsbt.boot.Boot$.main(Boot.scala:21)
at xsbt.boot.Boot.main(Boot.scala)
[error] [launcher] error during sbt launcher: java.lang.NullPointerException
It also happens with sbt 1.7.3, but clean and compile succeed when I use sbt 1.6.2.
What should I check first? I'd really appreciate any advice anyone can offer.

Some advice on how to debug Spark and sbt.
How to build Spark in IntelliJ.
Clone https://github.com/apache/spark , open it in IntelliJ as sbt project.
I had to execute sbt compile and re-open the project before I could run my code in IntelliJ (before that I had the error object SqlBaseParser is not a member of package org.apache.spark.sql.catalyst.parser). For example, I can put the following object in sql/core/src/main/scala and run/debug it in IntelliJ:
// scalastyle:off
import org.apache.spark.sql.{Dataset, SparkSession}
object MyMain extends App {
  val spark = SparkSession.builder()
    .master("local")
    .appName("SparkTestApp")
    .getOrCreate()

  case class Person(id: Long, name: String)
  import spark.implicits._

  val df: Dataset[Person] = spark.range(10).map(i => Person(i, i.toString))
  df.show()
//+---+----+
//| id|name|
//+---+----+
//| 0| 0|
//| 1| 1|
//| 2| 2|
//| 3| 3|
//| 4| 4|
//| 5| 5|
//| 6| 6|
//| 7| 7|
//| 8| 8|
//| 9| 9|
//+---+----+
}
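Outside IntelliJ, the same object should also be runnable from the Spark source root with the sbt invocation that is reused later in this answer for the thin client and the launcher:
./build/sbt "sql/runMain MyMain"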
I also pressed Run npm install and Load Maven project when these pop-up windows appeared, but I didn't notice any difference.
Also, at one point I had to keep only one source root in Project Structure under sql/catalyst/target/scala-2.12/src_managed, namely sql/catalyst/target/scala-2.12/src_managed/main (and not sql/catalyst/target/scala-2.12/src_managed/main/antlr4). Before that I had errors like SqlBaseLexer is already defined as class SqlBaseLexer.
Build Apache Spark Source Code with IntelliJ IDEA: https://yujheli-wordpress-com.translate.goog/2020/03/26/build-apache-spark-source-code-with-intellij-idea/?_x_tr_sl=auto&_x_tr_tl=en&_x_tr_hl=uk&_x_tr_pto=wapp (original in Chinese: https://yujheli.wordpress.com/2020/03/26/build-apache-spark-source-code-with-intellij-idea/ )
Why does building Spark sources give "object sbt is not a member of package com.typesafe"?
How to build sbt in IntelliJ.
sbt itself is tricky https://www.lihaoyi.com/post/SowhatswrongwithSBT.html and building it is a little tricky too.
Clone https://github.com/sbt/sbt , open it in IntelliJ. Let's try to run the previous Spark code using this cloned sbt.
sbt doesn't seem to be designed to run against an arbitrary working directory. I put the following object in client/src/main/scala:
object MyClient extends App {
  System.setProperty("user.dir", "../spark")
  sbt.client.Client.main(Array("sql/runMain MyMain"))
}
(Generally, mutating the system property user.dir is not recommended: How to use "cd" command using Java runtime?)
I had to execute sbt compile first (this includes the command sbt generateContrabands: sbt uses the sbt plugin sbt-contraband (ContrabandPlugin, JsonCodecPlugin), formerly sbt-datatype, for code generation: https://github.com/sbt/contraband https://www.scala-sbt.org/contraband/ https://www.scala-sbt.org/1.x/docs/Datatype.html https://github.com/eed3si9n/gigahorse/tree/develop/core/src/main/contraband). Before that I had the error not found: value ScalaKeywords.
The next error is type ExcludeItem is not a member of package sbt.internal.bsp. You can just remove the files ExcludeItemFormats.scala, ExcludesItemFormats.scala, ExcludesParamsFormats.scala, ExcludesResultFormats.scala from protocol/src/main/contraband-scala/sbt/internal/bsp/codec. They are outdated auto-generated files. You can check this: if you remove the contents of the directory protocol/src/main/contraband-scala (this is a root for auto-generated sources) and execute sbt generateContrabands, all the files except these four will be restored. For some reason these files didn't confuse sbt but do confuse IntelliJ.
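For convenience, the removal can be done from the root of the sbt checkout with one shell command (a sketch; the four file names are exactly the ones listed above):
rm protocol/src/main/contraband-scala/sbt/internal/bsp/codec/{ExcludeItemFormats,ExcludesItemFormats,ExcludesParamsFormats,ExcludesResultFormats}.scala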
Now, while running, MyClient produces
//[info] +---+----+
//[info] | id|name|
//[info] +---+----+
//[info] | 0| 0|
//[info] | 1| 1|
//[info] | 2| 2|
//[info] | 3| 3|
//[info] | 4| 4|
//[info] | 5| 5|
//[info] | 6| 6|
//[info] | 7| 7|
//[info] | 8| 8|
//[info] | 9| 9|
//[info] +---+----+
sbt.client.Client is called the thin client. Alternatively, you can publish it locally and use it as a dependency:
build.sbt (https://github.com/sbt/sbt/blob/v1.8.0/build.sbt#L1160)
lazy val sbtClientProj = (project in file("client"))
.enablePlugins(NativeImagePlugin)
.dependsOn(commandProj)
.settings(
commonBaseSettings,
scalaVersion := "2.12.11",
publish / skip := false, // change true to false
name := "sbt-client",
.......
sbt publishLocal
A new project:
build.sbt
scalaVersion := "2.12.17"
// ~/.ivy2/local/org.scala-sbt/sbt-client/1.8.1-SNAPSHOT/jars/sbt-client.jar
libraryDependencies += "org.scala-sbt" % "sbt-client" % "1.8.1-SNAPSHOT"
src/main/scala/Main.scala
object Main extends App {
System.setProperty("user.dir", "../spark")
sbt.client.Client.main(Array("sql/runMain MyMain"))
//[info] +---+----+
//[info] | id|name|
//[info] +---+----+
//[info] | 0| 0|
//[info] | 1| 1|
//[info] | 2| 2|
//[info] | 3| 3|
//[info] | 4| 4|
//[info] | 5| 5|
//[info] | 6| 6|
//[info] | 7| 7|
//[info] | 8| 8|
//[info] | 9| 9|
//[info] +---+----+
}
But the thin client is not how sbt normally runs. sbt.xMain from your stack trace is from https://github.com/sbt/sbt . It's here: https://github.com/sbt/sbt/blob/1.8.x/main/src/main/scala/sbt/Main.scala#L44 But xsbt.boot.Boot from the stack trace is not from this repo, it's from https://github.com/sbt/launcher , namely https://github.com/sbt/launcher/blob/1.x/launcher-implementation/src/main/scala/xsbt/boot/Boot.scala
The thing is that sbt runs in two steps. The sbt executable (usually downloaded from https://www.scala-sbt.org/download.html#universal-packages) is a shell script: first it runs sbt-launch.jar (the object xsbt.boot.Boot)
https://github.com/sbt/sbt/blob/v1.8.0/sbt#L507-L512
execRunner "$java_cmd" \
"${java_args[#]}" \
"${sbt_options[#]}" \
-jar "$sbt_jar" \
"${sbt_commands[#]}" \
"${residual_args[#]}"
and secondly the latter reflectively calls sbt (the class sbt.xMain)
https://github.com/sbt/launcher/blob/v1.4.1/launcher-implementation/src/main/scala/xsbt/boot/Launch.scala#L147-L149
val main = appProvider.newMain()
try {
withContextLoader(appProvider.loader)(main.run(appConfig))
https://github.com/sbt/launcher/blob/v1.4.1/launcher-implementation/src/main/scala/xsbt/boot/Launch.scala#L496
// implementation of the above appProvider.newMain()
else if (AppMainClass.isAssignableFrom(entryPoint)) mainClass.newInstance
https://github.com/sbt/launcher/blob/v1.4.1/launcher-implementation/src/main/scala/xsbt/boot/PlainApplication.scala#L13
// implementation of the above main.run(appConfig)
mainMethod.invoke(null, configuration.arguments).asInstanceOf[xsbti.Exit]
Then xMain#run, via XMainConfiguration#run, reflectively calls the run method of the module sbt.xMain$ (loaded in an updated class loader):
https://github.com/sbt/sbt/blob/v1.8.0/main/src/main/scala/sbt/Main.scala#L44-L47
class xMain extends xsbti.AppMain {
  def run(configuration: xsbti.AppConfiguration): xsbti.MainResult =
    new XMainConfiguration().run("xMain", configuration)
}
https://github.com/sbt/sbt/blob/v1.8.0/main/src/main/java/sbt/internal/XMainConfiguration.java#L51-L57
Class<?> clazz = loader.loadClass("sbt." + moduleName + "$");
Object instance = clazz.getField("MODULE$").get(null);
Method runMethod = clazz.getMethod("run", xsbti.AppConfiguration.class);
try {
.....
return (xsbti.MainResult) runMethod.invoke(instance, updatedConfiguration);
Then it downloads and runs the necessary version of Scala (specified in build.sbt) and the necessary version of the rest of sbt (specified in project/build.properties).
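For reference, project/build.properties normally pins just the sbt version the launcher should load, e.g. (an illustrative file, not taken from a particular repo):
sbt.version=1.8.0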
What is the launcher.
Let's consider a helloworld for the launcher.
The launcher consists of a library (interfaces)
https://mvnrepository.com/artifact/org.scala-sbt/launcher-interface
https://github.com/sbt/launcher/tree/1.x/launcher-interface
and the launcher runnable jar
https://mvnrepository.com/artifact/org.scala-sbt/launcher
https://github.com/sbt/launcher/tree/1.x/launcher-implementation/src
Create a project (depending on the launcher interfaces at compile time):
build.sbt
lazy val root = (project in file("."))
.settings(
name := "scalademo",
organization := "com.example",
version := "0.1.0-SNAPSHOT",
scalaVersion := "2.13.10",
libraryDependencies ++= Seq(
"org.scala-sbt" % "launcher-interface" % "1.4.1" % Provided,
),
)
src/main/scala/mypackage/Main.scala (this class will be an entry point while working with the launcher)
package mypackage
import xsbti.{AppConfiguration, AppMain, Exit, MainResult}
class Main extends AppMain {
  def run(configuration: AppConfiguration): MainResult = {
    val scalaVersion = configuration.provider.scalaProvider.version
    println(s"Hello, World! Running Scala $scalaVersion")
    configuration.arguments.foreach(println)
    new Exit {
      override val code: Int = 0
    }
  }
}
Do sbt publishLocal. The project jar will be published at ~/.ivy2/local/com.example/scalademo_2.13/0.1.0-SNAPSHOT/jars/scalademo_2.13.jar
Download launcher runnable jar https://repo1.maven.org/maven2/org/scala-sbt/launcher/1.4.1/launcher-1.4.1.jar
Create launcher configuration
my.app.configuration
[scala]
version: 2.13.10
[app]
org: com.example
name: scalademo
version: 0.1.0-SNAPSHOT
class: mypackage.Main
cross-versioned: binary
[repositories]
local
maven-central
[boot]
directory: ${user.home}/.myapp/boot
Then the command java -jar launcher-1.4.1.jar @my.app.configuration a b c produces
//Hello, World! Running Scala 2.13.10
//a
//b
//c
The following files appeared:
~/.myapp/boot/scala-2.13.10/com.example/scalademo/0.1.0-SNAPSHOT
scalademo_2.13.jar
scala-library-2.13.10.jar
~/.myapp/boot/scala-2.13.10/lib
java-diff-utils-4.12.jar
jna-5.9.0.jar
jline-3.21.0.jar
scala-library.jar
scala-compiler.jar
scala-reflect.jar
So the launcher helps run an application in an environment where only Java is installed (Scala is not necessary); Ivy is used for dependency resolution. There are features to handle return codes, reboot the application with a different Scala version, launch servers, etc.
Alternatively, any of the following commands can be used
java -Dsbt.boot.properties=my.app.configuration -jar launcher-1.4.1.jar
java -jar launcher-repacked.jar # i.e. put my.app.configuration into the jar at sbt/sbt.boot.properties and repack it
https://www.scala-sbt.org/1.x/docs/Launcher-Getting-Started.html
How to run sbt with the launcher.
sbt https://github.com/sbt/sbt uses the sbt plugin SbtLauncherPlugin https://github.com/sbt/sbt/blob/v1.8.0/project/SbtLauncherPlugin.scala so that from the raw launcher
https://github.com/sbt/launcher/tree/1.x/launcher-implementation/src
https://mvnrepository.com/artifact/org.scala-sbt/launcher
it builds sbt-launch
https://github.com/sbt/sbt/tree/v1.8.0/launch
https://mvnrepository.com/artifact/org.scala-sbt/sbt-launch
Basically, sbt-launch differs from the raw launcher in having the default config sbt.boot.properties injected.
If we'd like to run sbt with the launcher, we should find a way to specify a working directory for sbt (similarly to how we did this while working with the thin client).
The working directory can be set either 1) in sbt.xMain (sbt) or 2) in xsbt.boot.Boot (the sbt launcher).
1) Make sbt.xMain non-final so that it can be extended
/*final*/ class xMain extends xsbti.AppMain {
...........
https://github.com/sbt/sbt/blob/v1.8.0/main/src/main/scala/sbt/Main.scala#L44
Put a new class to main/src/main/scala (a launcher-style entry point)
import sbt.xMain
import xsbti.{ AppConfiguration, AppProvider, MainResult }
import java.io.File
class MyXMain extends xMain {
  override def run(configuration: AppConfiguration): MainResult = {
    val args = configuration.arguments
    val (dir, rest) =
      if (args.length >= 1 && args(0).startsWith("dir=")) {
        (
          Some(args(0).stripPrefix("dir=")),
          args.drop(1)
        )
      } else {
        (None, args)
      }
    dir.foreach { dir =>
      System.setProperty("user.dir", dir)
    }
    // xMain.run(new AppConfiguration { // not ok
    // new xMain().run(new AppConfiguration { // not ok
    super[xMain].run(new AppConfiguration { // ok
      override val arguments: Array[String] = rest
      override val baseDirectory: File =
        dir.map(new File(_)).getOrElse(configuration.baseDirectory)
      override val provider: AppProvider = configuration.provider
    })
  }
}
sbt publishLocal
my.sbt.configuration
[scala]
version: auto
#version: 2.12.17
[app]
org: org.scala-sbt
name: sbt
#name: main # not ok
version: 1.8.1-SNAPSHOT
class: MyXMain
#class: sbt.xMain
components: xsbti,extra
cross-versioned: false
#cross-versioned: binary
[repositories]
local
maven-central
[boot]
directory: ${user.home}/.mysbt/boot
[ivy]
ivy-home: ${user.home}/.ivy2
A command:
java -jar launcher-1.4.1.jar @my.sbt.configuration dir=/path_to_spark/spark "sql/runMain MyMain"
or
java -jar sbt-launch.jar @my.sbt.configuration dir=/path_to_spark/spark "sql/runMain MyMain"
//[info] +---+----+
//[info] | id|name|
//[info] +---+----+
//[info] | 0| 0|
//[info] | 1| 1|
//[info] | 2| 2|
//[info] | 3| 3|
//[info] | 4| 4|
//[info] | 5| 5|
//[info] | 6| 6|
//[info] | 7| 7|
//[info] | 8| 8|
//[info] | 9| 9|
//[info] +---+----+
(sbt-launch.jar is taken from ~/.ivy2/local/org.scala-sbt/sbt-launch/1.8.1-SNAPSHOT/jars or just https://mvnrepository.com/artifact/org.scala-sbt/sbt-launch since we haven't modified launcher yet)
I had to copy scalastyle-config.xml from Spark, otherwise it wasn't found.
I still get warnings: fatal: Not a git repository (or any parent up to mount point ...) Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
2)
project/Dependencies.scala (https://github.com/sbt/sbt/blob/v1.8.0/project/Dependencies.scala#L25)
val launcherVersion = "1.4.2-SNAPSHOT" // modified
Clone https://github.com/sbt/launcher and make the following changes
build.sbt (https://github.com/sbt/launcher/blob/v1.4.1/build.sbt#L11)
ThisBuild / version := {
  val orig = (ThisBuild / version).value
  if (orig.endsWith("-SNAPSHOT")) "1.4.2-SNAPSHOT" // modified
  else orig
}
launcher-implementation/src/main/scala/xsbt/boot/Launch.scala
(https://github.com/sbt/launcher/blob/v1.4.1/launcher-implementation/src/main/scala/xsbt/boot/Launch.scala#L17 and #L21)
class LauncherArguments(
    val args: List[String],
    val isLocate: Boolean,
    val isExportRt: Boolean,
    val dir: Option[String] = None // added
)

object Launch {
  def apply(arguments: LauncherArguments): Option[Int] =
    apply((new File(arguments.dir.getOrElse(""))).getAbsoluteFile, arguments) // modified
  .............
launcher-implementation/src/main/scala/xsbt/boot/Boot.scala (https://github.com/sbt/launcher/blob/v1.4.1/launcher-implementation/src/main/scala/xsbt/boot/Boot.scala#L41-L67)
def parseArgs(args: Array[String]): LauncherArguments = {
  @annotation.tailrec
  def parse(
      args: List[String],
      isLocate: Boolean,
      isExportRt: Boolean,
      remaining: List[String],
      dir: Option[String] // added
  ): LauncherArguments =
    args match {
      ...................
      case "--locate" :: rest => parse(rest, true, isExportRt, remaining, dir) // modified
      case "--export-rt" :: rest => parse(rest, isLocate, true, remaining, dir) // modified
      // added
      case "--mydir" :: next :: rest => parse(rest, isLocate, isExportRt, remaining, Some(next))
      case next :: rest => parse(rest, isLocate, isExportRt, next :: remaining, dir) // modified
      case Nil => new LauncherArguments(remaining.reverse, isLocate, isExportRt, dir) // modified
    }
  parse(args.toList, false, false, Nil, None)
}
sbt-launcher: sbt publishLocal
sbt: sbt publishLocal
my.sbt.configuration
[scala]
version: auto
[app]
org: org.scala-sbt
name: sbt
version: 1.8.1-SNAPSHOT
#class: MyXMain
class: sbt.xMain
components: xsbti,extra
cross-versioned: false
[repositories]
local
maven-central
[boot]
directory: ${user.home}/.mysbt/boot
[ivy]
ivy-home: ${user.home}/.ivy2
A command:
java -jar launcher-1.4.2-SNAPSHOT.jar @my.sbt.configuration --mydir /path_to_spark/spark "sql/runMain MyMain"
or
java -jar sbt-launch.jar @my.sbt.configuration --mydir /path_to_spark/spark "sql/runMain MyMain"
or
java -jar sbt-launch.jar --mydir /path_to_spark/spark "sql/runMain MyMain" (using default sbt.boot.properties rather than my.sbt.configuration)
(we're using modified launcher or new sbt-launch using this modified launcher).
Alternatively, we can specify "program arguments" in "Run configuration" for xsbt.boot.Boot in IntelliJ
@/path_to_sbt_config/my.sbt.configuration --mydir /path_to_spark/spark "sql/runMain MyMain"
Also it's possible to specify the working directory /path_to_spark/spark in "Run configuration" in IntelliJ. Then the remaining "program arguments" are
@/path_to_sbt_config/my.sbt.configuration "sql/runMain MyMain"
I tried to use "org.scala-sbt" % "launcher" % "1.4.2-SNAPSHOT" or "org.scala-sbt" % "sbt-launch" % "1.8.1-SNAPSHOT" as a dependency but got No RuntimeVisibleAnnotations in classfile with ScalaSignature attribute: class Boot.
Your setup.
So we can run/debug the sbt launcher code in IntelliJ and/or with printlns, and run/debug the sbt code with printlns (because there is no runnable object there).
From your stack trace I suspect that one of the classloader URLs is null:
https://github.com/openjdk/jdk/blob/jdk8-b120/jdk/src/share/classes/sun/misc/URLClassPath.java#L82
Maybe you can add something like the following to sbt.xMain#run or MyXMain#run
import java.net.URLClassLoader

var cl = getClass.getClassLoader
while (cl != null) {
  println(s"classloader: ${cl.getClass.getName}")
  cl match {
    case cl: URLClassLoader =>
      println("classloader urls:")
      cl.getURLs.foreach(println)
    case _ =>
      println("not URLClassLoader")
  }
  cl = cl.getParent
}
in order to see which URL is null.
https://www.scala-sbt.org/1.x/docs/Developers-Guide.html
https://github.com/sbt/sbt/blob/1.8.x/DEVELOPING.md

Related

initialCommands in consoleProject is not executing one or more imports

Given these two inconspicuous lines of code…
import scala.sys.process._
def less(s: String) = ("code -" #< new java.io.ByteArrayInputStream(s.getBytes)).!!
…defined like this…
lazy val consoleSupportSettings: Seq[Setting[_]] = Seq(
initialCommands in consoleProject := """import scala.sys.process._
|def less(s: String) = ("code -" #< new java.io.ByteArrayInputStream(s.getBytes)).!!""".stripMargin
)
…which eventually is added to the root project…
lazy val root =
Project(id = "root", base = file("."))
.settings(consoleSupportSettings)
scala seems to not really(*) execute the line import scala.sys.process._: When I fire up sbt and hop into consoleProject I am greeted by…
> consoleProject
[info] Starting scala interpreter...
[info]
<console>:19: error: value #< is not a member of String
def less(s: String) = ("code -" #< new java.io.ByteArrayInputStream(s.getBytes)).!!
^
[success] Total time: 1 s, completed Apr 29, 2021 10:58:09 AM
success indeed. NOT.
However, if I remove the setting…
> set initialCommands in consoleProject := ""
[info] Defining root/*:consoleProject::initialCommands
[info] The new value will be used by root/*:consoleProject
[info] Reapplying settings...
…
> consoleProject
…and then manually enter above code it works:
…
Welcome to Scala version 2.10.4 (OpenJDK 64-Bit Server VM, Java 1.8.0_272).
Type in expressions to have them evaluated.
Type :help for more information.
scala> import scala.sys.process._
import scala.sys.process._
scala> def less(s: String) = ("code -" #< new java.io.ByteArrayInputStream(s.getBytes)).!!
less: (s: String)String
Why is that and how do I fix this?
Worthy of note is that the sbt version is 0.13.8, so not exactly hot off the press.
(*) fwiw I replaced import scala.sys.process._ with import scala.sys.processasdf._ and it, as expected, would complain that processasdf was not a member of scala.sys, so it's not as if the import was actually ignored.
Whilst I still don't know why, it turns out that having the import inside the def rather than globally does the job:
initialCommands in consoleProject := """
|def less(s: String) = {
| import scala.sys.process._
| ("code -" #< new java.io.ByteArrayInputStream(s.getBytes)).!!
|}
|""".stripMargin

SBT: generate code for submodule before compilation

I have the following issue with build.sbt configuration.
I need to generate some code before compilation.
That's how it works now.
lazy val rootProject = project.in(file("."))
lazy val rootSourceGenerator = Def.task {
val f: File = (sourceManaged in Compile).value / "com" / "myproject" / "Version.scala"
IO.write(
f,
s"""package com.myproject
|
|object Version {
| some code ...
|}
|""".stripMargin
)
Seq(f)
}
inConfig(Compile)(
Seq(
sourceGenerators += rootSourceGenerator
))
And for now I need to make the same thing for a new submodule.
lazy val rootProject = project.in(file(".")).dependsOn(submodule)
lazy val submodule = project.in(file("submodule"))
lazy val submoduleSourceGenerator = Def.task {
val f: File = (sourceManaged in (submodule, Compile)).value / "com" / "myproject" / "SubmoduleVersion.scala"
IO.write(
f,
s"""package com.myproject
|
|object SubmoduleVersion {
| some code ...
|}
|""".stripMargin
)
Seq(f)
}
inConfig(submodule / Compile)(
Seq(
sourceGenerators += submoduleSourceGenerator
))
And inConfig(submodule / Compile) doesn't work; the error is about unknown syntax for /.
Any suggestions on how to fix this?
There are multiple solutions, but the following is the cleanest in my opinion.
Create an AutoPlugin in project/GenerateVersion.scala with the following content:
import sbt.Keys._
import sbt._

object GenerateVersion extends AutoPlugin {
  override def trigger = noTrigger

  override def projectSettings: Seq[Def.Setting[_]] = {
    Seq(
      sourceGenerators in Compile += Def.task {
        val f: File =
          (sourceManaged in Compile).value / "com" / "myproject" / "Version.scala"
        IO.write(
          f,
          s"""package com.myproject
             |
             |object Version {
             |}
             |""".stripMargin
        )
        Seq(f)
      }.taskValue
    )
  }
}
Enable the newly created plugin GenerateVersion for all projects/submodules that need Version.scala generated. It can be done as follows in build.sbt:
lazy val sub = project
  .in(file("sub"))
  .enablePlugins(GenerateVersion)

lazy val root = project
  .in(file("."))
  .enablePlugins(GenerateVersion)
  .aggregate(sub)
aggregate(sub) is added so that tasks in the sub module run when root tasks are triggered. For example, sbt compile will run both root/compile and sub/compile (equivalent to sbt "root/compile" "sub/compile").
This solution is easier to share across multiple SBT projects in the form of an sbt plugin.
Also, you might be interested in the sbt-buildinfo plugin.
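For completeness, a minimal sketch of wiring up sbt-buildinfo (the plugin version and the chosen keys below are assumptions; adjust them to your build):
project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-buildinfo" % "0.11.0")
build.sbt
lazy val sub = project
  .in(file("sub"))
  .enablePlugins(BuildInfoPlugin)
  .settings(
    // generates com.myproject.BuildInfo with the listed keys at compile time
    buildInfoKeys := Seq[BuildInfoKey](name, version, scalaVersion, sbtVersion),
    buildInfoPackage := "com.myproject"
  )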
Thanks, Ivan Stanislavciuc! But I've found another solution.
Just add all of the following content to the subproject's build.sbt (i.e. submodule/build.sbt):
lazy val submoduleSourceGenerator = Def.task {
val f: File = (sourceManaged in Compile).value / "com" / "myproject" / "SubmoduleVersion.scala"
IO.write(
f,
s"""package com.myproject
|
|object SubmoduleVersion {
| some code ...
|}
|""".stripMargin
)
Seq(f)
}
inConfig(Compile)(
Seq(
sourceGenerators += submoduleSourceGenerator
))

Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: jdbc

Here is my code from IntelliJ:
package com.dmngaya
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession
object ReadVertexPage {
def main(args: Array[String]): Unit = {
val conf: SparkConf = new SparkConf().setAppName("ReadVertexPage").setMaster("local")
val sc: SparkContext = new SparkContext(conf)
val spark = SparkSession
.builder()
.appName("Spark SQL basic example")
.getOrCreate()
val jdbcDF1 = spark.read.format("jdbc").options(
Map(
"driver" -> "com.tigergraph.jdbc.Driver",
"url" -> "jdbc:tg:http://127.0.0.1:14240",
"username" -> "tigergraph",
"password" -> "tigergraph",
"graph" -> "gsql_demo", // graph name
"dbtable" -> "vertex Page", // vertex type
"limit" -> "10", // number of vertices to retrieve
"debug" -> "0")).load()
jdbcDF1.show
}
}
When I run it in spark-shell, it runs fine:
/opt/spark/bin/spark-shell --jars /home/tigergraph/ecosys/tools/etl/tg-jdbc-driver/tg-jdbc-driver/target/tg-jdbc-driver-1.2.jar
scala> val jdbcDF1 = spark.read.format("jdbc").options(
| Map(
| "driver" -> "com.tigergraph.jdbc.Driver",
| "url" -> "jdbc:tg:http://127.0.0.1:14240",
| "username" -> "tigergraph",
| "password" -> "tigergraph",
| "graph" -> "gsql_demo", // graph name
| "dbtable" -> "vertex Page", // vertex type
| "limit" -> "10", // number of vertices to retrieve
| "debug" -> "0")).load()
jdbcDF1: org.apache.spark.sql.DataFrame = [v_id: string, page_id: string]
scala> jdbcDF1.show
result:
+----+--------+
|v_id| page_id|
+----+--------+
| 7| 7|
| 5| 5|
| 10| 10|
|1002| 1002|
| 3| 3|
|1000|new page|
|1003| 1003|
| 1| 1|
| 6| 6|
|1001| |
From IntelliJ, I have the following error:
20/11/23 10:43:43 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/home/tigergraph/fiverr-2/spark-warehouse').
20/11/23 10:43:43 INFO SharedState: Warehouse path is 'file:/home/tigergraph/fiverr-2/spark-warehouse'.
Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: jdbc. Please find packages at http://spark.apache.org/third-party-projects.html
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:679)
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:733)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:248)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:221)
  at com.dmngaya.ReadVertexPage$.main(ReadVertexPage.scala:25)
  at com.dmngaya.ReadVertexPage.main(ReadVertexPage.scala)
Caused by: java.lang.ClassNotFoundException: jdbc.DefaultSource
  at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$5(DataSource.scala:653)
  at scala.util.Try$.apply(Try.scala:213)
  at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$4(DataSource.scala:653)
  at scala.util.Failure.orElse(Try.scala:224)
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:653)
  ... 5 more
20/11/23 10:43:46 INFO SparkContext: Invoking stop() from shutdown hook
20/11/23 10:43:46 INFO SparkUI: Stopped Spark web UI at http://tigergraph-01:4040
20/11/23 10:43:46 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/11/23 10:43:46 INFO MemoryStore: MemoryStore cleared
20/11/23 10:43:46 INFO BlockManager: BlockManager stopped
20/11/23 10:43:47 INFO BlockManagerMaster: BlockManagerMaster stopped
20/11/23 10:43:47 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/11/23 10:43:47 INFO SparkContext: Successfully stopped SparkContext
20/11/23 10:43:47 INFO ShutdownHookManager: Shutdown hook called
20/11/23 10:43:47 INFO ShutdownHookManager: Deleting directory /tmp/spark-66dd4dc4-c70b-4836-805b-d68b3183ccbf
Process finished with exit code 1
How can I fix that?
You should add the tg-jdbc-driver-1.2 dependency in your pom.xml/build.sbt.
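For example, with sbt, one option (a sketch that keeps using the locally built jar from the spark-shell command above instead of a published artifact) is to add it as an unmanaged jar:
// build.sbt
Compile / unmanagedJars ++= {
  // the path is the one passed to --jars in the spark-shell example above
  val driverJars = file("/home/tigergraph/ecosys/tools/etl/tg-jdbc-driver/tg-jdbc-driver/target") ** "tg-jdbc-driver-*.jar"
  driverJars.classpath
}
Alternatively, keep passing the jar to spark-submit via --jars, exactly as in the spark-shell invocation.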

Could not initialize class com.datastax.spark.connector.types.TypeConverter$

I'm trying to query local Cassandra tables using Apache Spark; however, I'm running into this error when running any select/show statement:
Could not initialize class com.datastax.spark.connector.types.TypeConverter$
Versions:
Cassandra:version 3.11.2 | cqlsh version 5.0.1
Apache-Spark: version 2.3.1
Scala version 2.12.6
Cassandra Keyspace -> Table
CREATE KEYSPACE test_users
... WITH REPLICATION = {
... 'class' : 'SimpleStrategy',
... 'replication_factor' : 1
... };
CREATE TABLE member (
member_id bigint PRIMARY KEY,
member_name varchar,
member_age int
);
cqlsh> select * from member;
+---------+----------+-----------------+
|member_id|member_age| member_name|
+---------+----------+-----------------+
| 5| 53| Walter White|
| 6| 29|Henry Derplestick|
| 1| 67| Larry David|
| 4| 31| Joe Schmoe|
| 2| 19| Karen Dinglebop|
| 3| 49| Kenny Logins|
+---------+----------+-----------------+
QueryMembers.Scala
spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.11:2.0.0-M3 --conf spark.cassandra.connection.host="10.0.0.233"
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.Encoder
import org.apache.spark.sql.expressions.Window
import spark.implicits._
import com.datastax.spark.connector._
import com.datastax.spark.connector.cql.CassandraConnector
import org.joda.time.LocalDate
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.functions.{lit, row_number}
import org.apache.spark.sql.cassandra._
import org.apache.spark.sql.SQLContext

val conf = new SparkConf(true).
  set("spark.cassandra.connection.host", "10.0.0.233").
  set("spark.cassandra.connection.port", "9042")

val sc = new SparkContext("spark://10.0.0.233:9042", "test", conf)

val members = spark.
  read.
  format("org.apache.spark.sql.cassandra").
  options(Map( "table" -> "member", "keyspace" -> "test_users" )).
  load()

members.printSchema()

val older_members = members.select("member_id", "member_age", "member_name").
  where("member_age > 50")
// older_members: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [member_id: bigint, member_age: int ... 1 more field]

older_members.show() // breaks here
Errors
Caused by: java.io.IOException: Exception during preparation of SELECT "member_id", "member_age", "member_name" FROM "test_users"."member" WHERE token("member_id") > ? AND token("member_id") <= ? ALLOW FILTERING: Could not initialize class com.datastax.spark.connector.types.TypeConverter$
2018-07-29 18:57:09 ERROR Executor:91 - Exception in task 0.0 in stage 10.0 (TID 29)
java.io.IOException: Exception during preparation of SELECT "member_id", "member_age", "member_name" FROM "test_users"."member" WHERE token("member_id") > ? AND token("member_id") <= ? ALLOW FILTERING: Could not initialize class com.datastax.spark.connector.types.TypeConverter$
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.createStatement(CassandraTableScanRDD.scala:293)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.com$datastax$spark$connector$rdd$CassandraTableScanRDD$$fetchTokenRange(CassandraTableScanRDD.scala:307)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$19.apply(CassandraTableScanRDD.scala:335)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$19.apply(CassandraTableScanRDD.scala:335)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at com.datastax.spark.connector.util.CountingIterator.hasNext(CountingIterator.scala:12)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class com.datastax.spark.connector.types.TypeConverter$
at com.datastax.spark.connector.types.BigIntType$.converterToCassandra(PrimitiveColumnType.scala:50)
at com.datastax.spark.connector.types.BigIntType$.converterToCassandra(PrimitiveColumnType.scala:46)
at com.datastax.spark.connector.types.ColumnType$.converterToCassandra(ColumnType.scala:229)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$13.apply(CassandraTableScanRDD.scala:282)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$13.apply(CassandraTableScanRDD.scala:282)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:
Any insight would be greatly appreciated.
For Spark 2.3.0 you need to use the latest version of spark-cassandra-connector: 2.3.1...
Also, you must not use version 2.0.0-M3 - it's a pre-release version. The latest version in the 2.0.x series is 2.0.9. You should always check the versions on Maven Central.
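Concretely, that means launching spark-shell (or building your application) with the matching connector artifact, e.g. the same command as in the question with only the version changed (the Scala binary suffix must also match your Spark build):
spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.11:2.3.1 --conf spark.cassandra.connection.host="10.0.0.233"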
After researching further I decided to try rolling back the Apache Spark version from 2.3.1 to 2.2.0, which resolved the issue.
I'm now able to run the query and get its results:
val older_members = members.select("member_id", "member_age", "member_name").
where("member_age > 50")
older_members.show
I had the same issue with this connector and spent almost a day figuring it out. Don't use spark-cassandra-connector_2.10-2.0.0-M3; use a newer version of spark-cassandra-connector.
Thanks @Alex Ott

SBT: Create a task say runAll to run main class of all sub modules

I have a multi-project build like this:
proj
|--Module-1
| |--src/main/scala
| |-- MainClass1.scala
|--Module-2
| |--src/main/scala
| |-- MainClass2.scala
|--build.sbt
I want to implement a runAll sbt task so that I can run all the main classes in different JVMs.
Note: there can be any number of submodules.
A bit late, but it may help others who have the same question. Here is my solution.
build.sbt:
lazy val runAll = taskKey[Unit]("run-all, for running all main classes")

def runAllIn(config: Configuration) = Def.task {
  val s = streams.value
  val cp = (fullClasspath in config).value
  val r = (runner in run).value
  (discoveredMainClasses in config).value.foreach(c =>
    r.run(c, cp.files, Seq(), s.log))
}

// For the target project that you intend to run "runAll"
lazy val example = (project in file("example"))
  .settings(runAll := runAllIn(Compile).value)
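With that in place, the task can be invoked on the target project from the command line (or the sbt shell), e.g.:
sbt "example/runAll"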
The best idea would be to call a shell script from a task in your build.sbt, like this:
val searchPackage = taskKey[Unit]("SearchPackage")
searchPackage := {
Seq("./run_all_main_class.sh" !)
}
You can add all the main classes to that shell script. run_all_main_class.sh would look something like this; you can customize it according to your needs.
#!/usr/bin/env bash
nohup sbt "runMain com.something.MainClass1" &
nohup sbt "runMain com.something.MainClass2" &