Spark application throws javax.servlet.FilterRegistration - scala

I'm using Scala to create and run a Spark application locally.
My build.sbt:
name := "SparkDemo"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0" exclude("org.apache.hadoop", "hadoop-client")
libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "1.2.0"
libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.6.0" excludeAll(
ExclusionRule(organization = "org.eclipse.jetty"))
libraryDependencies += "org.apache.hadoop" % "hadoop-mapreduce-client-core" % "2.6.0"
libraryDependencies += "org.apache.hbase" % "hbase-client" % "0.98.4-hadoop2"
libraryDependencies += "org.apache.hbase" % "hbase-server" % "0.98.4-hadoop2"
libraryDependencies += "org.apache.hbase" % "hbase-common" % "0.98.4-hadoop2"
mainClass in Compile := Some("demo.TruckEvents")
During runtime I get the exception:
Exception in thread "main" java.lang.ExceptionInInitializerError
during calling of... Caused by: java.lang.SecurityException: class
"javax.servlet.FilterRegistration"'s signer information does not match
signer information of other classes in the same package
The exception is triggered here:
val sc = new SparkContext("local", "HBaseTest")
I am using the IntelliJ Scala/SBT plugin.
I've seen that other people have had this problem and a solution was suggested, but that was for a Maven build. Is my sbt wrong here? Or is there any other suggestion for how I can solve this problem?

See my answer to a similar question here. The class conflict comes about because HBase depends on org.mortbay.jetty, and Spark depends on org.eclipse.jetty. I was able to resolve the issue by excluding org.mortbay.jetty dependencies from HBase.
If you're pulling in hadoop-common, then you may also need to exclude javax.servlet from hadoop-common. I have a working HBase/Spark setup with my sbt dependencies set up as follows:
val clouderaVersion = "cdh5.2.0"
val hadoopVersion = s"2.5.0-$clouderaVersion"
val hbaseVersion = s"0.98.6-$clouderaVersion"
val sparkVersion = s"1.1.0-$clouderaVersion"
val hadoopCommon = "org.apache.hadoop" % "hadoop-common" % hadoopVersion % "provided" excludeAll ExclusionRule(organization = "javax.servlet")
val hbaseCommon = "org.apache.hbase" % "hbase-common" % hbaseVersion % "provided"
val hbaseClient = "org.apache.hbase" % "hbase-client" % hbaseVersion % "provided"
val hbaseProtocol = "org.apache.hbase" % "hbase-protocol" % hbaseVersion % "provided"
val hbaseHadoop2Compat = "org.apache.hbase" % "hbase-hadoop2-compat" % hbaseVersion % "provided"
val hbaseServer = "org.apache.hbase" % "hbase-server" % hbaseVersion % "provided" excludeAll ExclusionRule(organization = "org.mortbay.jetty")
val sparkCore = "org.apache.spark" %% "spark-core" % sparkVersion % "provided"
val sparkStreaming = "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided"
val sparkStreamingKafka = "org.apache.spark" %% "spark-streaming-kafka" % sparkVersion exclude("org.apache.spark", "spark-streaming_2.10")
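These vals still need to be added to libraryDependencies; the answer leaves that step implicit, so the wiring below is only a minimal sketch:
libraryDependencies ++= Seq(
  hadoopCommon, hbaseCommon, hbaseClient, hbaseProtocol,
  hbaseHadoop2Compat, hbaseServer, sparkCore, sparkStreaming, sparkStreamingKafka
)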

If you are using IntelliJ IDEA, try this:
Right-click the project root folder and choose Open Module Settings.
In the new window, choose Modules in the left navigation column.
In the rightmost column, select the Dependencies tab and find Maven: javax.servlet:servlet-api:2.5.
Finally, move this item to the bottom by pressing ALT+Down.
This should solve the problem.
This method came from http://wpcertification.blogspot.ru/2016/01/spark-error-class-javaxservletfilterreg.html

If this is happening in IntelliJ IDEA, go to the project settings, find the jar in the modules, and remove it. Then run your code with sbt through the shell; it will fetch the jar files itself. Afterwards, go back to IntelliJ and re-run the code through IntelliJ. Somehow this worked for me and fixed the error. I am not sure what the problem was, since it doesn't show up anymore.
Oh, I also removed the jar file and added "javax.servlet:javax.servlet-api:3.1.0" through Maven by hand, and now the error is gone.
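For an sbt build, the equivalent of that hand-added Maven coordinate would be roughly the line below (a sketch; whether 3.1.0 is the right version depends on what the rest of your classpath expects):
libraryDependencies += "javax.servlet" % "javax.servlet-api" % "3.1.0"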

When you use SBT, the FilterRegistration class comes from the 3.0 servlet API, but if you use Jetty or Java 8, the 2.5 version of this JAR is automatically pulled in as a transitive dependency.
Fix: the servlet-api 2.5 JAR was the culprit; I resolved this issue by adding the servlet-api 3.0 jar to the dependencies.
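In sbt, one way to express "replace the 2.5 jar with the 3.0 API" could look like the sketch below, using the hbase-server dependency from the question as an example. Note that the exclusion rule is an extra step not spelled out in this answer, and the 3.0.1 version is an assumption:
libraryDependencies += "org.apache.hbase" % "hbase-server" % "0.98.4-hadoop2" excludeAll ExclusionRule(organization = "javax.servlet", name = "servlet-api")
libraryDependencies += "javax.servlet" % "javax.servlet-api" % "3.0.1"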

The following works for me:
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion.value % "provided",
"org.apache.spark" %% "spark-sql" % sparkVersion.value % "provided",
....................................................................
).map(_.excludeAll(ExclusionRule(organization = "javax.servlet")))
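Note that this snippet assumes sparkVersion is defined as an sbt setting elsewhere in the build (for instance by a plugin); if it is not, a hand-rolled definition could look like this, with the version value being a placeholder:
lazy val sparkVersion = settingKey[String]("Spark version")
sparkVersion := "1.2.0"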

If you are running inside IntelliJ, check in the project settings whether you have two active modules (one for the project and another for sbt).
This is probably a problem that arises while importing an existing project.

Try running a simple program without the Hadoop and HBase dependencies, i.e. without these lines:
libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.6.0" excludeAll(ExclusionRule(organization = "org.eclipse.jetty"))
libraryDependencies += "org.apache.hadoop" % "hadoop-mapreduce-client-core" % "2.6.0"
libraryDependencies += "org.apache.hbase" % "hbase-client" % "0.98.4-hadoop2"
libraryDependencies += "org.apache.hbase" % "hbase-server" % "0.98.4-hadoop2"
libraryDependencies += "org.apache.hbase" % "hbase-common" % "0.98.4-hadoop2"
There may be a mismatch in the dependencies. Also make sure you have the same version of the jars at compile time and at run time.
Is it also possible to reproduce this in the Spark shell? Then I would be able to help better.
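As a concrete illustration of that first suggestion, a stripped-down check could look like the sketch below; the object name is made up, and the build would contain only the spark-core dependency from the question (no Hadoop or HBase entries):
import org.apache.spark.SparkContext

object MinimalSparkCheck {
  def main(args: Array[String]): Unit = {
    // If this runs cleanly, the SecurityException comes from the Hadoop/HBase dependencies.
    val sc = new SparkContext("local", "MinimalSparkCheck")
    println(sc.parallelize(1 to 10).count()) // should print 10
    sc.stop()
  }
}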

Related

Scala module 2.10.0 requires Jackson Databind version >= 2.10.0 and < 2.11.0

I have an sbt project and I want to write a test with ScalaTest and a shared Spark session. Several weeks ago my project started to throw an error.
java.lang.ExceptionInInitializerError
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
.....
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.10.0 requires Jackson Databind version >= 2.10.0 and < 2.11.0
at com.fasterxml.jackson.module.scala.JacksonModule.setupModule(JacksonModule.scala:61)
at com.fasterxml.jackson.module.scala.JacksonModule.setupModule$(JacksonModule.scala:46)
Here is a very simple test:
import org.apache.spark.sql.QueryTest.checkAnswer
import org.apache.spark.sql.Row
import org.apache.spark.sql.test.SharedSparkSession
class SparkTestSpec extends SharedSparkSession {
  import testImplicits._

  test("join - join using") {
    val df = Seq(1, 2, 3).toDF("int")
    checkAnswer(df, Row(1) :: Row(2) :: Row(3) :: Nil)
  }
}
And the sbt config:
ThisBuild / scalaVersion := "2.12.10"
val sparkVersion = "3.1.0"
val scalaTestVersion = "3.2.1"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-sql" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion % Test,
"org.apache.spark" %% "spark-sql" % sparkVersion % Test classifier "tests",
"org.apache.spark" %% "spark-catalyst" % sparkVersion % Test,
"org.apache.spark" %% "spark-catalyst" % sparkVersion % Test classifier "tests",
"org.apache.spark" %% "spark-hive" % sparkVersion % Test,
"org.apache.spark" %% "spark-hive" % sparkVersion % Test classifier "tests",
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-core" % sparkVersion % Test classifier "tests",
"log4j" % "log4j" % "1.2.17",
"org.slf4j" % "slf4j-log4j12" % "1.7.30",
"org.scalatest" %% "scalatest" % scalaTestVersion % Test,
"org.scalatestplus" %% "scalacheck-1-14" % "3.2.2.0",
)
This is a very classic issue with Jackson. The error tells you that you need a single version of Jackson across all your dependencies, and that this is not the case.
Usually you have both Spark and another library transitively pulling in Jackson in different versions.
What you need to do is:
run sbt dependencyTree to identify which library is pulling in Jackson, and in which version
define dependencyOverrides to force the same Jackson version for all Jackson libraries (which version is up to you, depending on compatibility with the other libraries that need it); see the sketch after this list
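A sketch of such an override block follows; the 2.10.0 versions are assumptions taken from the error message, so adjust them to whatever your dependency tree actually requires:
dependencyOverrides ++= Seq(
  "com.fasterxml.jackson.core" % "jackson-core" % "2.10.0",
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.10.0",
  "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.10.0"
)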

How to set up a spark build.sbt file?

I have been trying all day and cannot figure out how to make it work.
I have a common library that will be my core lib for Spark.
My build.sbt file is not working:
name := "CommonLib"
version := "0.1"
scalaVersion := "2.12.5"
// addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")
// resolvers += "bintray-spark-packages" at "https://dl.bintray.com/spark-packages/maven/"
// resolvers += Resolver.sonatypeRepo("public")
libraryDependencies ++= Seq(
"org.apache.spark" % "spark-core_2.10" % "1.6.0" exclude("org.apache.hadoop", "hadoop-yarn-server-web-proxy"),
"org.apache.spark" % "spark-sql_2.10" % "1.6.0" exclude("org.apache.hadoop", "hadoop-yarn-server-web-proxy"),
"org.apache.hadoop" % "hadoop-common" % "2.7.0" exclude("org.apache.hadoop", "hadoop-yarn-server-web-proxy"),
// "org.apache.spark" % "spark-sql_2.10" % "1.6.0" exclude("org.apache.hadoop", "hadoop-yarn-server-web-proxy"),
"org.apache.spark" % "spark-hive_2.10" % "1.6.0" exclude("org.apache.hadoop", "hadoop-yarn-server-web-proxy"),
"org.apache.spark" % "spark-yarn_2.10" % "1.6.0" exclude("org.apache.hadoop", "hadoop-yarn-server-web-proxy"),
"com.github.scopt" %% "scopt" % "3.7.0"
)
//addSbtPlugin("org.spark-packages" % "sbt-spark-package" % "0.2.6")
//libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.0"
//libraryDependencies ++= {
// val sparkVer = "2.1.0"
// Seq(
// "org.apache.spark" %% "spark-core" % sparkVer % "provided" withSources()
// )
//}
All the commented-out lines are tests I've done, and I don't know what to do anymore.
My goal is to get Spark 2.3 to work and to have scopt available too.
For my sbt version, I have 1.1.1 installed.
Thank you.
I think I had two main issues.
Spark is not compatible with Scala 2.12 yet, so moving to 2.11.12 solved one issue.
The second issue is that for the IntelliJ sbt console to reload build.sbt, you either need to kill and restart the console or use the reload command, which I didn't know, so I was not actually using the latest build.sbt file.
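Putting those two fixes together, a minimal build.sbt for the stated goal (Spark 2.3 plus scopt) might look like the sketch below; this is not the asker's final file, and marking Spark as "provided" is only appropriate if the jars are supplied by the cluster at run time:
name := "CommonLib"
version := "0.1"
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.3.0" % "provided",
  "org.apache.spark" %% "spark-sql" % "2.3.0" % "provided",
  "com.github.scopt" %% "scopt" % "3.7.0"
)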
There's a Giter8 template that should work nicely:
https://github.com/holdenk/sparkProjectTemplate.g8

SBT - Class not found, continuing with a stub

I am currently migrating my Play 2 Scala API project and I encounter 10 warnings during compilation indicating:
[warn] Class play.core.enhancers.PropertiesEnhancer$GeneratedAccessor not found - continuing with a stub.
All of them are the same, and I don't have any other indications. I've searched a bit for other similar cases; it's often because of the JDK version, but I'm already on 1.8.
Here's what I have in plugins.sbt:
addSbtPlugin("com.typesafe.play" % "sbt-plugin" % "2.5.3")
addSbtPlugin("org.scalastyle" %% "scalastyle-sbt-plugin" % "0.8.0")
addSbtPlugin("com.sksamuel.scapegoat" %% "sbt-scapegoat" % "1.0.4")
and in build.sbt:
libraryDependencies ++= Seq(
cache,
ws,
"org.reactivemongo" %% "play2-reactivemongo" % "0.10.5.0.akka23",
"org.reactivemongo" %% "reactivemongo" % "0.10.5.0.akka23",
"org.mockito" % "mockito-core" % "1.10.5" % "test",
"org.scalatestplus" %% "play" % "1.2.0" % "test",
"com.amazonaws" % "aws-java-sdk" % "1.8.3",
"org.cvogt" %% "play-json-extensions" % "0.8.0",
javaCore,
"com.clever-age" % "play2-elasticsearch" % "1.1.0" excludeAll(
ExclusionRule(organization = "org.scala-lang"),
ExclusionRule(organization = "com.typesafe.play"),
ExclusionRule(organization = "org.apache.commons", artifact = "commons-lang3")
)
)
Don't hesitate to ask if you need anything else :)
It's not something that blocks me, but I'd prefer to avoid these 10 warnings every time I recompile my application.
Thank you! :)
It seems something in your code is trying to use the Play enhancer and is failing to find it. Are you using Ebean or something that may require the enhancer?
You can try adding the plugin to your plugins.sbt:
addSbtPlugin("com.typesafe.sbt" % "sbt-play-enhancer" % "1.1.0")
This should make the warning go away. You can then disable it if you like:
// In build.sbt
playEnhancerEnabled := false

Sbt to download list of jars specified

I have a list of JARs and I want to download them via SBT into a specified destination directory. Is there a way/command to do this?
What I am trying to do is build a list of jars for the classpath of an external system like Spark.
By default Spark adds some jars to the classpath, and
I also have some jars that my app depends on in addition to the Spark classpath jars.
I don't want to build a fat jar.
And I need to package the dependent jars along with my jar in a tar ball.
My build.sbt
name := "app-jar"
scalaVersion := "2.10.5"
dependencyOverrides += "org.scala-lang" % "scala-library" % scalaVersion.value
scalacOptions ++= Seq("-unchecked", "-deprecation")
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.4.1"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.1"
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka" % "1.4.1"
// I want these jars from here
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.4.0-M3"
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector-java" % "1.4.0-M3"
libraryDependencies += "com.google.protobuf" % "protobuf-java" % "2.6.1"
...
// To here in my tar ball
So far I have achieved this using a shell script.
I want to know if there is a way to do the same with sbt.
Add sbt-pack to your project/plugins.sbt (or create it):
addSbtPlugin("org.xerial.sbt" % "sbt-pack" % "0.7.9")
Add packAutoSettings to your build.sbt and then run:
sbt pack
In target/pack/lib you will find all jars (with dependencies).
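Putting the two steps together, a minimal layout might look like the sketch below, reusing the dependencies from the question:
// project/plugins.sbt
addSbtPlugin("org.xerial.sbt" % "sbt-pack" % "0.7.9")

// build.sbt
packAutoSettings

libraryDependencies ++= Seq(
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.4.0-M3",
  "com.datastax.spark" %% "spark-cassandra-connector-java" % "1.4.0-M3",
  "com.google.protobuf" % "protobuf-java" % "2.6.1"
)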
Update
Add a new task to sbt:
val libraries = Seq(
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.4.0-M3",
  "com.datastax.spark" %% "spark-cassandra-connector-java" % "1.4.0-M3",
  "com.google.protobuf" % "protobuf-java" % "2.6.1"
)

libraryDependencies ++= libraries

lazy val removeNotNeeded = taskKey[Unit]("Remove not needed jars")

removeNotNeeded := {
  val fileSet = libraries.map(l => s"${l.name}-${l.revision}.jar").toSet
  println(s"$fileSet")
  val ver = scalaVersion.value.split("\\.").take(2).mkString(".")
  println(s"$ver")
  file("target/pack/lib").listFiles.foreach { file =>
    val without = file.getName.replace(s"_$ver", "")
    println(s"$without")
    if (!fileSet.contains(without)) {
      println(s"${file.getName} removed")
      sbt.IO.delete(file)
    }
  }
}
After calling sbt pack, call sbt removeNotNeeded. You will be left with only the needed jar files.

Azure SDK in scala

I have imported several of the SDK components into my sbt file. In IntelliJ everything compiles fine, but when I try to compile in sbt I get the error:
object microsoft is not a member of package com
libraryDependencies ++= Seq(
"com.microsoft.azure" % "azure-mgmt-resources" % "0.8.3",
"com.microsoft.azure" % "azure-management" % "0.8.0",
"com.microsoft.azure" % "azure-storage" % "4.0.0",
"com.microsoft.azure" % "azure-mgmt-utility" % "0.8.3" exclude ("commons-codec", "commons-codec"),
"com.sun.jersey" % "jersey-core" % "1.19"
)
I had to do sbt reload in order to refresh the dependencies.