sbt to download a list of specified JARs - Scala

I have a list of JARs and I want sbt to download them into a specified destination directory. Is there a way/command to do this?
What I am trying to do is assemble a list of JARs for the classpath of an external system like Spark.
By default Spark adds some JARs to the classpath, and my app also depends on some JARs in addition to Spark's classpath JARs.
I don't want to build a fat JAR; instead I need to package the dependent JARs along with my own JAR in a tarball.
My build.sbt:
name := "app-jar"
scalaVersion := "2.10.5"
dependencyOverrides += "org.scala-lang" % "scala-library" % scalaVersion.value
scalacOptions ++= Seq("-unchecked", "-deprecation")
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.4.1"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.1"
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka" % "1.4.1"
// I want these jars from here
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.4.0-M3"
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector-java" % "1.4.0-M3"
libraryDependencies += "com.google.protobuf" % "protobuf-java" % "2.6.1"
...
// To here in my tar ball
So far I have achieved this using a shell script.
I want to know if there is a way to do the same with sbt.

Add sbt-pack to your project/plugins.sbt (or create it):
addSbtPlugin("org.xerial.sbt" % "sbt-pack" % "0.7.9")
Add packAutoSettings to your build.sbt and then run:
sbt pack
In target/pack/lib you will find all jars (with dependencies).
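For reference, here is a minimal sketch of what the question's build.sbt could look like with the plugin's settings enabled (sbt-pack 0.7.x style; an untested assumption, shortened to a single dependency):

name := "app-jar"

scalaVersion := "2.10.5"

// pull in sbt-pack's default settings so `sbt pack` copies all runtime jars into target/pack/lib
packAutoSettings

libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.4.0-M3"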
Update
Add new task to sbt:
val libraries = Seq(
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.4.0-M3",
  "com.datastax.spark" %% "spark-cassandra-connector-java" % "1.4.0-M3",
  "com.google.protobuf" % "protobuf-java" % "2.6.1"
)

libraryDependencies ++= libraries

lazy val removeNotNeeded = taskKey[Unit]("Remove not needed jars")

removeNotNeeded := {
  // jar file names to keep, e.g. "spark-cassandra-connector-1.4.0-M3.jar"
  val fileSet = libraries.map(l => s"${l.name}-${l.revision}.jar").toSet
  println(s"$fileSet")
  // binary Scala version suffix used by cross-built artifacts, e.g. "2.10"
  val ver = scalaVersion.value.split("\\.").take(2).mkString(".")
  println(s"$ver")
  file("target/pack/lib").listFiles.foreach { file =>
    // strip the "_2.10" cross-version suffix before comparing against the keep set
    val without = file.getName.replace(s"_$ver", "")
    println(s"$without")
    if (!fileSet.contains(without)) {
      println(s"${file.getName} removed")
      sbt.IO.delete(file)
    }
  }
}
After calling sbt pack, call sbt removeNotNeeded. You will be left with only the needed JAR files.

Related

SCALA and Elastic Search: Add symbol to classpath (Databricks)

I'm getting an error that I have no idea how to fix. I could not really find good documentation for this SchemaRDD type or how to use it.
build.sbt contains:
scalaVersion := "2.11.12"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.0"
libraryDependencies += "org.scalaj" %% "scalaj-http" % "2.4.1"
libraryDependencies += "io.spray" %% "spray-json" % "1.3.5"
libraryDependencies += "com.amazonaws" % "aws-java-sdk-core" % "1.11.534"
libraryDependencies += "com.amazonaws" % "aws-encryption-sdk-java" % "1.3.6"
libraryDependencies += "com.amazonaws" % "aws-java-sdk" % "1.11.550"
libraryDependencies += "com.typesafe" % "config" % "1.3.4"
libraryDependencies += "org.elasticsearch" %% "elasticsearch-spark-1.2" % "2.4.4"
Error:
Symbol 'type org.apache.spark.sql.SchemaRDD' is missing from the classpath.
[error] This symbol is required by 'value org.elasticsearch.spark.sql.package.rdd'.
[error] Make sure that type SchemaRDD is in your classpath and check for conflicting dependencies with `-Ylog-classpath`.
[error] A full rebuild may help if 'package.class' was compiled against an incompatible version of org.apache.spark.sql.
Thank you a lot for any kind of support! :)
The dependency elasticsearch-spark-1.2 is for Spark 1.x; you need to use elasticsearch-spark-20 instead. The latest version is built for Spark 2.3:
libraryDependencies += "org.elasticsearch" %% "elasticsearch-spark-20" % "7.1.1"

How to set up a spark build.sbt file?

I have been trying all day and cannot figure out how to make it work.
I have a common library that will be my core library for Spark.
My build.sbt file is not working:
name := "CommonLib"
version := "0.1"
scalaVersion := "2.12.5"
// addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")
// resolvers += "bintray-spark-packages" at "https://dl.bintray.com/spark-packages/maven/"
// resolvers += Resolver.sonatypeRepo("public")
libraryDependencies ++= Seq(
"org.apache.spark" % "spark-core_2.10" % "1.6.0" exclude("org.apache.hadoop", "hadoop-yarn-server-web-proxy"),
"org.apache.spark" % "spark-sql_2.10" % "1.6.0" exclude("org.apache.hadoop", "hadoop-yarn-server-web-proxy"),
"org.apache.hadoop" % "hadoop-common" % "2.7.0" exclude("org.apache.hadoop", "hadoop-yarn-server-web-proxy"),
// "org.apache.spark" % "spark-sql_2.10" % "1.6.0" exclude("org.apache.hadoop", "hadoop-yarn-server-web-proxy"),
"org.apache.spark" % "spark-hive_2.10" % "1.6.0" exclude("org.apache.hadoop", "hadoop-yarn-server-web-proxy"),
"org.apache.spark" % "spark-yarn_2.10" % "1.6.0" exclude("org.apache.hadoop", "hadoop-yarn-server-web-proxy"),
"com.github.scopt" %% "scopt" % "3.7.0"
)
//addSbtPlugin("org.spark-packages" % "sbt-spark-package" % "0.2.6")
//libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.0"
//libraryDependencies ++= {
// val sparkVer = "2.1.0"
// Seq(
// "org.apache.spark" %% "spark-core" % sparkVer % "provided" withSources()
// )
//}
The commented-out lines are all the tests I've done, and I don't know what to do anymore.
My goal is to get Spark 2.3 to work and to have scopt available too.
As for my sbt version, I have 1.1.1 installed.
Thank you.
I think I had two main issues.
Spark is not compatible with Scala 2.12 yet, so moving to 2.11.12 solved one issue.
The second issue is that for the IntelliJ sbt console to pick up changes to build.sbt you either need to kill and restart the console or use the reload command, which I didn't know, so I was not actually using the latest build.sbt file.
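Putting both fixes together, a build.sbt sketch for Spark 2.3 on Scala 2.11 with scopt might look like this (the versions and provided scopes are my assumptions, not a tested build):

name := "CommonLib"

version := "0.1"

scalaVersion := "2.11.12"   // Spark 2.3 is built against Scala 2.11

val sparkVersion = "2.3.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql"  % sparkVersion % "provided",
  "org.apache.spark" %% "spark-hive" % sparkVersion % "provided",
  "com.github.scopt" %% "scopt"      % "3.7.0"
)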
There's a Giter8 template that should work nicely:
https://github.com/holdenk/sparkProjectTemplate.g8
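With sbt 0.13.13 or newer you can instantiate the template directly from the command line:

sbt new holdenk/sparkProjectTemplate.g8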

Build sbt for spark with janusgraph and gremlin scala

I was trying to set up an IntelliJ build for Spark with JanusGraph using gremlin-scala, but I am running into errors.
My build.sbt file is:
version := "1.0"
scalaVersion := "2.11.11"
libraryDependencies += "com.michaelpollmeier" % "gremlin-scala" % "2.3.0"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.1"
// https://mvnrepository.com/artifact/org.apache.spark/spark-sql
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.1"
// https://mvnrepository.com/artifact/org.apache.spark/spark-mllib
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.2.1"
// https://mvnrepository.com/artifact/org.apache.spark/spark-hive
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.2.1"
// https://mvnrepository.com/artifact/org.janusgraph/janusgraph-core
libraryDependencies += "org.janusgraph" % "janusgraph-core" % "0.2.0"
libraryDependencies ++= Seq(
"ch.qos.logback" % "logback-classic" % "1.2.3" % Test,
"org.scalatest" %% "scalatest" % "3.0.3" % Test
)
resolvers ++= Seq(
Resolver.mavenLocal,
"Sonatype OSS" at "https://oss.sonatype.org/content/repositories/public"
)
But I am getting errors when I try to compile code that uses the gremlin-scala library or io.Source. Can someone share their build file or tell me what I should modify to fix it?
Thanks in advance.
So, I was trying to compile this code:
import gremlin.scala._
import org.apache.commons.configuration.BaseConfiguration
import org.janusgraph.core.JanusGraphFactory

class Test1() {
  val conf = new BaseConfiguration()
  conf.setProperty("storage.backend", "inmemory")
  val gr = JanusGraphFactory.open(conf)
  val graph = gr.asScala()
  graph.close
}

object Test {
  def main(args: Array[String]) {
    val t = new Test1()
    println("in Main")
  }
}
The errors I get are:
Error:(1, 8) not found: object gremlin
import gremlin.scala._
Error:(10, 18) value asScala is not a member of org.janusgraph.core.JanusGraph
val graph = gr.asScala()
If you go to the Gremlin-Scala GitHub page you'll see that the current version is "3.3.1.1" and that
Typically you just need to add a dependency on "com.michaelpollmeier" %% "gremlin-scala" % "SOME_VERSION" and one for the graph db of your choice to your build.sbt (this readme assumes tinkergraph). The latest version is displayed at the top of this readme in the maven badge.
It is not a surprise that the API has changed when the major version of the library is different. If I change your first dependency as follows:
//libraryDependencies += "com.michaelpollmeier" % "gremlin-scala" % "2.3.0" // old!
libraryDependencies += "com.michaelpollmeier" %% "gremlin-scala" % "3.3.1.1"
then your example code compiles for me.
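For completeness, the relevant slice of the questioner's build.sbt with that change applied (a sketch; the remaining dependencies stay as they were):

scalaVersion := "2.11.11"

// %% picks up the _2.11 cross-built artifact
libraryDependencies += "com.michaelpollmeier" %% "gremlin-scala" % "3.3.1.1"

libraryDependencies += "org.janusgraph" % "janusgraph-core" % "0.2.0"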

Spark application throws javax.servlet.FilterRegistration

I'm using Scala to create and run a Spark application locally.
My build.sbt:
name : "SparkDemo"
version : "1.0"
scalaVersion : "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0" exclude("org.apache.hadoop", "hadoop-client")
libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "1.2.0"
libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.6.0" excludeAll(
ExclusionRule(organization = "org.eclipse.jetty"))
libraryDependencies += "org.apache.hadoop" % "hadoop-mapreduce-client-core" % "2.6.0"
libraryDependencies += "org.apache.hbase" % "hbase-client" % "0.98.4-hadoop2"
libraryDependencies += "org.apache.hbase" % "hbase-server" % "0.98.4-hadoop2"
libraryDependencies += "org.apache.hbase" % "hbase-common" % "0.98.4-hadoop2"
mainClass in Compile := Some("demo.TruckEvents")
During runtime I get the exception:
Exception in thread "main" java.lang.ExceptionInInitializerError
during calling of... Caused by: java.lang.SecurityException: class
"javax.servlet.FilterRegistration"'s signer information does not match
signer information of other classes in the same package
The exception is triggered here:
val sc = new SparkContext("local", "HBaseTest")
I am using the IntelliJ Scala/sbt plugin.
I've seen that other people have had this problem and a solution was suggested, but that is a Maven build... Is my sbt wrong here? Or do you have any other suggestion for how I can solve this problem?
See my answer to a similar question here. The class conflict comes about because HBase depends on org.mortbay.jetty, and Spark depends on org.eclipse.jetty. I was able to resolve the issue by excluding org.mortbay.jetty dependencies from HBase.
If you're pulling in hadoop-common, then you may also need to exclude javax.servlet from hadoop-common. I have a working HBase/Spark setup with my sbt dependencies set up as follows:
val clouderaVersion = "cdh5.2.0"
val hadoopVersion = s"2.5.0-$clouderaVersion"
val hbaseVersion = s"0.98.6-$clouderaVersion"
val sparkVersion = s"1.1.0-$clouderaVersion"
val hadoopCommon = "org.apache.hadoop" % "hadoop-common" % hadoopVersion % "provided" excludeAll ExclusionRule(organization = "javax.servlet")
val hbaseCommon = "org.apache.hbase" % "hbase-common" % hbaseVersion % "provided"
val hbaseClient = "org.apache.hbase" % "hbase-client" % hbaseVersion % "provided"
val hbaseProtocol = "org.apache.hbase" % "hbase-protocol" % hbaseVersion % "provided"
val hbaseHadoop2Compat = "org.apache.hbase" % "hbase-hadoop2-compat" % hbaseVersion % "provided"
val hbaseServer = "org.apache.hbase" % "hbase-server" % hbaseVersion % "provided" excludeAll ExclusionRule(organization = "org.mortbay.jetty")
val sparkCore = "org.apache.spark" %% "spark-core" % sparkVersion % "provided"
val sparkStreaming = "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided"
val sparkStreamingKafka = "org.apache.spark" %% "spark-streaming-kafka" % sparkVersion exclude("org.apache.spark", "spark-streaming_2.10")
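These vals presumably get wired into the build in one final step; a sketch of how that might look:

libraryDependencies ++= Seq(
  hadoopCommon, hbaseCommon, hbaseClient, hbaseProtocol,
  hbaseHadoop2Compat, hbaseServer, sparkCore, sparkStreaming, sparkStreamingKafka
)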
If you are using IntelliJ IDEA, try this:
Right-click the project root folder and choose Open Module Settings.
In the new window, choose Modules in the left navigation column.
In the rightmost column, select the Dependencies tab and find Maven: javax.servlet:servlet-api:2.5.
Finally, move this item to the bottom by pressing ALT+Down.
That should solve the problem.
This method came from http://wpcertification.blogspot.ru/2016/01/spark-error-class-javaxservletfilterreg.html
If it is happening in IntelliJ IDEA, go to the project settings, find the JAR in the modules, and remove it. Then run your code with sbt from the shell; it will fetch the JAR files itself. After that, go back to IntelliJ and re-run the code through IntelliJ. It somehow works for me and fixes the error. I am not sure what the problem was, since it doesn't show up anymore.
I also removed the JAR file and added "javax.servlet:javax.servlet-api:3.1.0" through Maven by hand, and now the error is gone.
When you use sbt, the FilterRegistration class is present in servlet-api 3.0, but if you use Jetty or Java 8 the servlet-api 2.5 JAR is automatically added as a dependency.
Fix: the servlet-api 2.5 JAR was the mess there; I resolved this issue by adding the servlet-api 3.0 JAR to the dependencies.
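In sbt terms, that fix might look like the following line (the exact coordinates of the servlet-api 3.x artifact are my assumption):

libraryDependencies += "javax.servlet" % "javax.servlet-api" % "3.0.1"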
The following works for me:
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion.value % "provided",
  "org.apache.spark" %% "spark-sql" % sparkVersion.value % "provided",
  ....................................................................
).map(_.excludeAll(ExclusionRule(organization = "javax.servlet")))
If you are running inside IntelliJ, check in the project settings whether you have two active modules (one for the project and another for sbt).
That is probably a problem introduced while importing an existing project.
Try running a simple program without the Hadoop and HBase dependencies (a minimal example follows at the end of this answer), i.e. without these:
libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.6.0" excludeAll(ExclusionRule(organization = "org.eclipse.jetty"))
libraryDependencies += "org.apache.hadoop" % "hadoop-mapreduce-client-core" % "2.6.0"
libraryDependencies += "org.apache.hbase" % "hbase-client" % "0.98.4-hadoop2"
libraryDependencies += "org.apache.hbase" % "hbase-server" % "0.98.4-hadoop2"
libraryDependencies += "org.apache.hbase" % "hbase-common" % "0.98.4-hadoop2"
There is probably a mismatch in the dependencies. Also make sure you use the same versions of the JARs when you compile and when you run.
Is it also possible to reproduce the issue in the Spark shell? Then I would be able to help better.
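As a sanity check, a minimal program that needs only spark-core could look like this (a sketch, using a local master):

import org.apache.spark.{SparkConf, SparkContext}

object MinimalSparkTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MinimalSparkTest").setMaster("local[*]")
    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 10).sum())   // prints 55.0 if Spark itself is healthy
    sc.stop()
  }
}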

use sbt for generate a webapp with icefaces

I'm new to sbt and I want to generate a web application with JSF 2.0 (Mojarra) and ICEfaces, but I don't know how to write the build.sbt. I tried things like this:
libraryDependencies += "org.icefaces" % "icefaces" % "2.0.2"
libraryDependencies += "net.java" % "jsf-api" % "2.1.2"
libraryDependencies += "net.java" % "jsf-impl" % "2.1.2"
Maybe this is horribly wrong and sbt can't find the module:
module not found: com.sun.faces#jsf-impl:2.1.1-b04/ivys/ivy.xml
resolvers += "java.net maven 2 repo" at "http://download.java.net/maven/2"
libraryDependencies += "org.icefaces" % "icefaces" % "2.0.2"
libraryDependencies += "com.sun.faces" % "jsf-api" % "2.1.2"
libraryDependencies += "com.sun.faces" % "jsf-impl" % "2.1.2"
This will only work with sbt 0.10+. Make sure you keep the blank lines between expressions.