Spark not working with pureconfig - scala

I'm trying to use pureconfig and Typesafe Config's ConfigFactory for my Spark application configuration. Here is my code:
import com.typesafe.config.Config
import pureconfig.loadConfigOrThrow

object Source {
  def apply(keyName: String, configArguments: Config): Source = {
    keyName.toLowerCase match {
      case "mysql" =>
        val properties = loadConfigOrThrow[DBConnectionProperties](configArguments)
        new MysqlSource(None, properties)
      case "files" =>
        val properties = loadConfigOrThrow[FilesSourceProperties](configArguments)
        new Files(properties)
      case _ => throw new NoSuchElementException(s"Unknown Source ${keyName.toLowerCase}")
    }
  }
}
// elsewhere, where the Source factory is used:
import com.typesafe.config.ConfigFactory

val config = ConfigFactory.parseString(result.mkString("\n"))
val source = Source("mysql", config.getConfig("source.mysql"))
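For context, loadConfigOrThrow derives a reader that maps the HOCON keys under source.mysql onto same-named fields of the target case class. The question does not show DBConnectionProperties, so the fields below are purely hypothetical:

case class DBConnectionProperties(host: String, port: Int, user: String, password: String) // hypothetical fields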
When I run it from the IDE (IntelliJ) or directly with java (i.e. java -jar ...) it works fine.
But when I run it with spark-submit it fails with the following error:
Exception in thread "main" java.lang.NoSuchMethodError: shapeless.Witness$.mkWitness(Ljava/lang/Object;)Lshapeless/Witness;
From a quick search I found a question similar to this one, which suggests the reason is that both Spark and pureconfig depend on shapeless, but with different versions.
I tried to shade it as suggested in the answer
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("shapeless.**" -> "shadeshapless.@1")
    .inLibrary("com.github.pureconfig" %% "pureconfig" % "0.7.0").inProject
)
but it didn't work either.
Can it be for a different reason? Any idea what may work?
Thanks

You also have to shade shapeless inside its own JAR, in addition to pureconfig:
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("shapeless.**" -> "shadeshapless.@1")
    .inLibrary("com.chuusai" % "shapeless_2.11" % "2.3.2")
    .inLibrary("com.github.pureconfig" %% "pureconfig" % "0.7.0")
    .inProject
)
Make sure to add the correct shapeless version.
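If shading still fails, double-check which shapeless version actually wins dependency resolution, since the rename rule must target that same version. A minimal sketch of pinning it explicitly (my addition, not part of the original answer; run sbt evicted to see the resolved version):

// Pin shapeless so the version named in the shade rule matches what sbt resolves.
dependencyOverrides += "com.chuusai" %% "shapeless" % "2.3.2"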

Related

How to exclude logging (like logback-classic) from jar published by sbt

My Scala project has a libraryDependency on slf4j because I use the API for logging. I also want to see the logging output while running from sbt or IntelliJ, both for the apps run with runMain and the unit tests run with testOnly from sbt. Therefore there is also a libraryDependency on logback-classic. However, I do not want that second dependency published, because of the convention stated below. When someone uses my published library, the transitive dependency should not be brought in automatically. How should that be done? I don't want to explain to the user how to manually exclude the transitive dependency, because they might be using any number of different tools. The logback-classic should continue to be included in an assembled jar, however, if at all possible. It doesn't seem like exclude() is the answer.
"Embedded components such as libraries or frameworks should not declare a dependency on any SLF4J binding/provider [like logback-classic] but only depend on slf4j-api. When a library declares a transitive dependency on a specific binding, that binding is imposed on the end-user negating the purpose of SLF4J. Note that declaring a non-transitive dependency on a binding, for example for testing, does not affect the end-user."
Publish the jar with slf4j-api but use the sbt Test configuration for logback. Unit tests will then have a concrete implementation but it won't be packaged in your artifact.
libraryDependencies ++= Seq(
  "org.slf4j" % "slf4j-api" % "1.7.36",
  "ch.qos.logback" % "logback-classic" % "1.2.11" % Test
)
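To double-check what will actually be published, sbt makePom writes the POM under target/; the logback-classic entry should appear there with test scope (or not at all), so it is not imposed on downstream users.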
An alternative is a build with sub-projects. Your sample app uses a concrete implementation, but the library itself does not. Anyone using the library would provide their own.
lazy val root = (project in file("."))
  .settings(
    publish / skip := true,
  )
  .aggregate(sampleApp, theLibrary)

lazy val sampleApp = project
  .settings(
    publish / skip := true,
    libraryDependencies ++= Seq(
      "ch.qos.logback" % "logback-classic" % "1.2.11"
    )
  )
  .dependsOn(theLibrary % "test->test;compile->compile")

lazy val theLibrary = project
  .settings(
    libraryDependencies ++= Seq(
      "org.slf4j" % "slf4j-api" % "1.7.36",
      "ch.qos.logback" % "logback-classic" % "1.2.11" % Test
    )
  )
My tentative solution is to add this code to an sbt file
import scala.xml.Node
import scala.xml.transform.RuleTransformer
import org.clulab.sbt.{DependencyFilter, DependencyId}

ThisBuild / pomPostProcess := {
  val logback = DependencyId("ch.qos.logback", "logback-classic")
  val rule = DependencyFilter { dependencyId =>
    dependencyId != logback
  }

  (node: Node) => new RuleTransformer(rule).transform(node).head
}
and back it up with this Scala code in the project directory
package org.clulab.sbt

import scala.xml.Node
import scala.xml.NodeSeq
import scala.xml.transform.RewriteRule

case class DependencyId(groupId: String, artifactId: String)

abstract class DependencyTransformer extends RewriteRule {
  override def transform(node: Node): NodeSeq = {
    val name = node.nameToString(new StringBuilder()).toString()

    name match {
      case "dependency" =>
        val groupId = (node \ "groupId").text.trim
        val artifactId = (node \ "artifactId").text.trim

        transform(node, DependencyId(groupId, artifactId))
      case _ => node
    }
  }

  def transform(node: Node, dependencyId: DependencyId): NodeSeq
}

class DependencyFilter(filter: DependencyId => Boolean) extends DependencyTransformer {
  def transform(node: Node, dependencyId: DependencyId): NodeSeq =
    if (filter(dependencyId)) node
    else Nil
}

object DependencyFilter {
  def apply(filter: DependencyId => Boolean): DependencyFilter = new DependencyFilter(filter)
}
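For reference, the same transformer can drop several artifacts at once. A small sketch reusing the classes above (the second exclusion is hypothetical):

import org.clulab.sbt.{DependencyFilter, DependencyId}

val unwanted = Set(
  DependencyId("ch.qos.logback", "logback-classic"),
  DependencyId("org.slf4j", "slf4j-simple") // hypothetical second exclusion
)
// Keep a dependency only if it is not in the unwanted set.
val rule = DependencyFilter { id => !unwanted.contains(id) }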
I'm still hoping to find a similar solution for editing ivy.xml.

Is it possible to parse Json in build.sbt?

I want to validate a JSON file downloaded from a server at build time and fail the build if there are any errors.
Is it possible to parse/validate JSON in build.sbt?
Your build.sbt is Scala code, so it can do everything you can do with other Scala code.
You should be able to add dependencies for your build.sbt code (e.g. a JSON parsing library) in project/build.sbt, since sbt is recursive.
Here is an example to supplement Jasper-M's answer.
For example, add lihaoyi's requests-scala HTTP client library and upickle JSON deserialisation library to project/build.sbt:
libraryDependencies ++= List(
  "com.lihaoyi" %% "requests" % "0.6.0",
  "com.lihaoyi" %% "upickle" % "1.1.0"
)
Then under project/Preconditions.scala add the following object, which will contain the assertions you want to check before running the build:
object Preconditions {
  import scala.util.Try
  import requests._
  import upickle.default._

  case class User(login: String, id: Int)
  implicit val userRW: ReadWriter[User] = macroRW

  def validateUserJson() = {
    val result = Try(read[User](get("https://api.github.com/users/lihaoyi").text)).isSuccess
    assert(result, "User JSON should be valid")
  }
}
Now these facilities will be available to build.sbt under the root project. Let's create a task in build.sbt to run the assertions:
lazy val checkPreconditions = taskKey[Unit]("Validate pre-conditions before building")

checkPreconditions := {
  Preconditions.validateUserJson()
  println("All preconditions passed!")
}
and finally let's make the compile task depend on the checkPreconditions task using dependsOn, like so:
Compile / compile := (Compile / compile).dependsOn(checkPreconditions).value
Now executing sbt compile should check pre-conditions before proceeding with compilation.
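If the JSON file has already been downloaded into the workspace, the same idea works without the HTTP client: upickle's ujson module can parse the file directly. A sketch under that assumption (the path is whatever location the build downloads to; the object name is mine):

object LocalPreconditions {
  import java.nio.file.{Files, Paths}
  import scala.util.Try

  // Fail the build if the given file does not contain well-formed JSON.
  def validateJsonFile(path: String): Unit = {
    val text = new String(Files.readAllBytes(Paths.get(path)), "UTF-8")
    assert(Try(ujson.read(text)).isSuccess, s"$path should contain valid JSON")
  }
}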

(run-main-0) scala.ScalaReflectionException: class java.sql.Date in JavaMirror with ClasspathFilter

Hi, I have a file given to me by my teacher. It is about Scala and Spark.
When I run the code it gives me this exception:
(run-main-0) scala.ScalaReflectionException: class java.sql.Date in JavaMirror with ClasspathFilter
The file itself looks like this:
import org.apache.spark.ml.feature.Tokenizer
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object Main {

  type Embedding = (String, List[Double])
  type ParsedReview = (Integer, String, Double)

  org.apache.log4j.Logger getLogger "org" setLevel (org.apache.log4j.Level.WARN)
  org.apache.log4j.Logger getLogger "akka" setLevel (org.apache.log4j.Level.WARN)

  val spark = SparkSession.builder
    .appName("Sentiment")
    .master("local[9]")
    .getOrCreate

  import spark.implicits._

  val reviewSchema = StructType(Array(
    StructField("reviewText", StringType, nullable = false),
    StructField("overall", DoubleType, nullable = false),
    StructField("summary", StringType, nullable = false)))

  // Read the file and merge the text and summary into a single text column
  def loadReviews(path: String): Dataset[ParsedReview] =
    spark
      .read
      .schema(reviewSchema)
      .json(path)
      .rdd
      .zipWithUniqueId
      .map[(Integer, String, Double)] { case (row, id) =>
        (id.toInt, s"${row getString 2} ${row getString 0}", row getDouble 1) }
      .toDS
      .withColumnRenamed("_1", "id")
      .withColumnRenamed("_2", "text")
      .withColumnRenamed("_3", "overall")
      .as[ParsedReview]

  // Load the GLoVe embeddings file
  def loadGlove(path: String): Dataset[Embedding] =
    spark
      .read
      .text(path)
      .map { _ getString 0 split " " }
      .map(r => (r.head, r.tail.toList.map(_.toDouble))) // yuck!
      .withColumnRenamed("_1", "word")
      .withColumnRenamed("_2", "vec")
      .as[Embedding]

  def main(args: Array[String]) = {
    val glove = loadGlove("Data/glove.6B.50d.txt") // take glove
    val reviews = loadReviews("Data/Electronics_5.json") // FIXME

    // replace the following with the project code
    glove.show
    reviews.show

    spark.stop
  }
}
I need to keep the line
import org.apache.spark.sql.Dataset
because some code depends on it, but it is exactly because of it that the exception is thrown.
My build.sbt file looks like this:
name := "Sentiment Analysis Project"

version := "1.1"

scalaVersion := "2.11.12"

scalacOptions ++= Seq("-unchecked", "-deprecation")

initialCommands in console :=
  """
  import Main._
  """

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.3.0"
libraryDependencies += "org.scalactic" %% "scalactic" % "3.0.5"
libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.5" % "test"
The Scala guide recommends compiling with Java 8:
We recommend using Java 8 for compiling Scala code. Since the JVM is backward compatible, it is usually safe to use a newer JVM to run your code compiled by the Scala compiler for older JVM versions.
Although it's only a recommendation, I found it to fix the issue you mention.
In order to install Java 8 using Homebrew, it's best to use jenv, which will help you handle multiple Java versions should you need to.
brew install jenv
Then run the following to add a tap (repository) of alternative versions of casks, since Java 8 is not in the default tap anymore:
brew tap homebrew/cask-versions
To install Java 8:
brew cask install homebrew/cask-versions/adoptopenjdk8
Run the following to add the previously installed Java version to jenv's list of versions:
jenv add /Library/Java/JavaVirtualMachines/<installed_java_version>/Contents/Home
Finally run
jenv global 1.8
or
jenv local 1.8
to use Java 1.8 globally or locally (in the current folder).
For more information, follow the instructions at jenv's website.
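If you want the build itself to catch a wrong JVM early, a guard in build.sbt is one option. This is my own sketch, not part of the original answer:

// Abort sbt loading when not on Java 8, rather than failing later with a
// confusing reflection error.
initialize := {
  val _ = initialize.value // run any previous initialization first
  val v = sys.props("java.specification.version")
  assert(v == "1.8", s"This project requires Java 1.8; sbt is running on Java $v")
}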

How to do Slick configuration via application.conf from within custom sbt task?

I want to create an sbt task which creates a database schema with Slick. For that, I have a task object like the following in my project:
object CreateSchema {
  val instance = Database.forConfig("localDb")

  def main(args: Array[String]) {
    val createFuture = instance.run(createActions)
    ...
    Await.ready(createFuture, Duration.Inf)
  }
}
and in my build.sbt I define a task:
lazy val createSchema = taskKey[Unit]("CREATE database schema")
fullRunTask(createSchema, Runtime, "sbt.CreateSchema")
which gets executed as expected when I run sbt createSchema from the command line.
However, the problem is that application.conf doesn't seem to get taken into account (I've also tried different scopes like Compile or Test). As a result, the task fails due to com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'localDb'.
How can I fix this so the configuration is available?
I found a lot of questions here that deal with using the application.conf inside the build.sbt itself, but that is not what I need.
I have set up a little demo using SBT 0.13.8 and Slick 3.0.0, which is working as expected (and even without modifying "-Dconfig.resource").
Files
./build.sbt
name := "SO_20150915"

version := "1.0"

scalaVersion := "2.11.7"

libraryDependencies ++= Seq(
  "com.typesafe" % "config" % "1.3.0" withSources() withJavadoc(),
  "com.typesafe.slick" %% "slick" % "3.0.0",
  "org.slf4j" % "slf4j-nop" % "1.6.4",
  "com.h2database" % "h2" % "1.3.175"
)

lazy val createSchema = taskKey[Unit]("CREATE database schema")
fullRunTask(createSchema, Runtime, "somefun.CallMe")
./project/build.properties
sbt.version = 0.13.8
./src/main/resources/reference.conf
hello {
  world = "buuh."
}

h2mem1 = {
  url = "jdbc:h2:mem:test1"
  driver = org.h2.Driver
  connectionPool = disabled
  keepAliveConnection = true
}
./src/main/scala/somefun/CallMe.scala
package somefun

import com.typesafe.config.Config
import com.typesafe.config.ConfigFactory
import slick.driver.H2Driver.api._

/**
 * SO_20150915
 * Created by martin on 15.09.15.
 */
object CallMe {
  def main(args: Array[String]): Unit = {
    println("Hello")
    val settings = new Settings()
    println(s"Settings read from hello.world: ${settings.hw}")

    val db = Database.forConfig("h2mem1")
    try {
      // ...
      println("Do something with your database.")
    } finally db.close
  }
}

class Settings(val config: Config) {
  // This verifies that the Config is sane and has our
  // reference config. Importantly, we specify the "hello"
  // path so we only validate settings that belong to this
  // library. Otherwise, we might throw mistaken errors about
  // settings we know nothing about.
  config.checkValid(ConfigFactory.defaultReference(), "hello")

  // This uses the standard default Config, if none is provided,
  // which simplifies apps willing to use the defaults
  def this() {
    this(ConfigFactory.load())
  }

  val hw = config.getString("hello.world")
}
Result
If running sbt createSchema from the console, I obtain the output:
[info] Loading project definition from /home/.../SO_20150915/project
[info] Set current project to SO_20150915 (in build file:/home/.../SO_20150915/)
[info] Running somefun.CallMe
Hello
Settings read from hello.world: buuh.
Do something with your database.
[success] Total time: 1 s, completed 15.09.2015 10:42:20
Ideas
Please verify that this unmodified demo project also works for you.
Then try changing the SBT version in the demo project and see if that changes anything.
Alternatively, recheck your project setup and try to use a higher version of SBT.
Answer
So, even though your code resides in your src folder, it is called from within SBT. That means you are trying to load your application.conf from within the classpath context of SBT.
Slick uses Typesafe Config internally, so the approach described under "Background information" below is not applicable, as you cannot modify the Config loading mechanism itself.
Instead, try to set the path to your application.conf explicitly via config.resource; see the Typesafe Config documentation (search for config.resource).
Option 1
Either set config.resource (via -Dconfig.resource=...) before starting sbt
Option 2
Or from within build.sbt as Scala code
sys.props("config.resource") = "./src/main/resources/application.conf"
Option 3
Or create a Task in SBT via
lazy val configPath = TaskKey[Unit]("configPath", "Set path for application.conf")
and add
configPath := {
  sys.props("config.resource") = "./src/main/resources/application.conf"
}
to your sequence of settings.
Please let me know if that worked.
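Another variant (my own sketch, not from the original answer): fork the JVM that runs the task and pass config.resource as a JVM option. Note that javaOptions is only honoured when fork := true:

// Run the task in a forked JVM so the system property reaches the application.
fork := true
javaOptions += "-Dconfig.resource=application.conf"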
Background information
Recently, I was writing a custom plugin for SBT, where I also tried to access a reference.conf. Unfortunately, I was not able to access any .conf file placed within the project subfolder using the default ClassLoader.
In the end I created a testenvironment.conf in the project folder and used the following code to load the (Typesafe) config:
def getConfig: Config = {
  val classLoader = new java.net.URLClassLoader(Array(new File("./project/").toURI.toURL))
  ConfigFactory.load(classLoader, "testenvironment")
}
or, for loading a general application.conf from ./src/main/resources:
def getConfig: Config = {
  val classLoader = new java.net.URLClassLoader(Array(new File("./src/main/resources/").toURI.toURL))

  // no .conf basename given, so look for reference.conf and application.conf
  // use the specific classLoader
  ConfigFactory.load(classLoader)
}

scala.tools.nsc.IMain within Play 2.1

I googled a lot and am totally stuck now. I know that there are similar questions, but please read to the end; I have tried all the proposed solutions and none of them worked.
I am trying to use the IMain class from scala.tools.nsc within a Play 2.1 project (using Scala 2.10.0).
Controller Code
This is the code where I try to use IMain in a WebSocket. This is only for testing.
import scala.tools.nsc.interpreter.{IMain, Results}

object Scala extends Controller {

  def session = WebSocket.using[String] { request =>

    val interpreter = new IMain()

    val (out, channel) = Concurrent.broadcast[String]

    val in = Iteratee.foreach[String] { code =>
      interpreter.interpret(code) match {
        case Results.Error => channel.push("error")
        case Results.Incomplete => channel.push("incomplete")
        case Results.Success => channel.push("success")
      }
    }

    (in, out)
  }
}
As soon as something gets sent over the WebSocket, the following error gets logged by Play:
Failed to initialize compiler: object scala.runtime in compiler mirror not found.
** Note that as of 2.8 scala does not assume use of the java classpath.
** For the old behavior pass -usejavacp to scala, or if using a Settings
** object programatically, settings.usejavacp.value = true.
Build.scala
object ApplicationBuild extends Build {

  val appName = "escalator"
  val appVersion = "1.0-SNAPSHOT"

  val appDependencies = Seq(
    "org.scala-lang" % "scala-compiler" % "2.10.0"
  )

  val main = play.Project(appName, appVersion, appDependencies).settings(
  )
}
What I have tried so far
All of this didn't work:
I have included fork := true in the Build.scala
A Settings object with:
embeddedDefaults[MyType]
usejavacp.value = true
The solution proposed as an answer to the question Embedded Scala REPL inherits parent classpath
I don't know what to do now.
The problem here is that sbt doesn't add scala-library to the class path.
The following workaround works.
First create a folder lib in the top project directory (the parent of app, conf, etc.) and copy scala-library.jar there.
Then you can use the following code to host an interpreter:
val settings = new Settings

settings.bootclasspath.value += scala.tools.util.PathResolver.Environment.javaBootClassPath + File.pathSeparator + "lib/scala-library.jar"

val in = new IMain(settings) {
  override protected def parentClassLoader = settings.getClass.getClassLoader()
}

val res = in.interpret("val x = 1")
The above builds the boot classpath by appending the Scala library to the Java boot classpath. It's not a problem with the Play framework; it comes from sbt. The same problem occurs for any Scala project run with sbt (tested with a simple project). When run from Eclipse it works fine.
EDIT: Link to sample project demonstrating the above.
I wonder if the reflect jar is missing. Try adding this too to appDependencies:
"org.scala-lang" % "scala-reflect" % "2.10.0"