"failed to find data source: json" when running scala assembly fat jar

I have a Scala Spark project and I have created a fat jar using sbt assembly. When I tried to run the jar, this error was raised: "Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: json. Please find packages at http://spark.apache.org/third-party-projects.html". I added this code block to build.sbt:
assemblyMergeStrategy in assembly := {
case "META-INF/services/org.apache.spark.sql.sources.DataSourceRegister" => MergeStrategy.concat
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
case x => MergeStrategy.first
}
It did not work. I would be very grateful for any help.
Best regards
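The reason the DataSourceRegister file has to be merged rather than picked can be sketched in plain Scala. Spark resolves short names such as "json" through java.util.ServiceLoader, which reads every provider line in META-INF/services/org.apache.spark.sql.sources.DataSourceRegister. When several jars (for example spark-sql and spark-mllib) each ship that file and only one copy survives, as with MergeStrategy.first, the providers listed in the other copies disappear. The object and method names below are hypothetical, and the two provider lines are illustrative rather than the full files:

```scala
object ServiceFileMerge {
  // Like MergeStrategy.first: only the first jar's copy survives.
  def first(copies: Seq[Seq[String]]): Seq[String] =
    copies.headOption.getOrElse(Seq.empty)

  // In the spirit of MergeStrategy.filterDistinctLines: keep every line
  // from every copy, dropping exact repeats.
  def distinctLines(copies: Seq[Seq[String]]): Seq[String] =
    copies.flatten.distinct

  def main(args: Array[String]): Unit = {
    // Illustrative provider lines, one per jar; the real files list more.
    val fromMllib = Seq("org.apache.spark.ml.source.libsvm.LibSVMFileFormat")
    val fromSql = Seq("org.apache.spark.sql.execution.datasources.json.JsonFileFormat")
    val copies = Seq(fromMllib, fromSql)
    println(s"first:         ${first(copies)}")         // the json provider is lost
    println(s"distinctLines: ${distinctLines(copies)}") // both providers are kept
  }
}
```

In sbt-assembly terms, MergeStrategy.concat and MergeStrategy.filterDistinctLines both keep every provider line, while MergeStrategy.first behaves like first above.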

Related

sbt-assembly is not including scala libraries

The code I'm writing will run on AWS Lambda, which only has the Java 8 runtime installed, so I need the Scala libraries to be included in my jar. When I give it the jar I built with sbt-assembly, I get java.lang.NoClassDefFoundError: scala/Function3.
This is all I have in build.sbt for the assembly plugin:
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = true)
assemblyMergeStrategy in assembly := {
case PathList("META-INF", _ @ _*) => MergeStrategy.discard
case _ => MergeStrategy.first
}
This still happens whether or not I have the line that discards the META-INF files.
I used brew to install Scala, and I have tried setting $SCALA_HOME both to /usr/local/opt/scala/idea (from the caveats section of brew info scala) and to /usr/local/bin/scala (the output of which scala).
:: EDIT ::
I unpacked the jar and found that the class in question was actually included in the jar, as scala/Function3_scala-library-2.12.7_scala-library-2.12.7_scala-library-2.12.7.class.
I added scala-library.jar to my classpath in IntelliJ, removed the target folder, ran sbt assembly again, and that worked.
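To check what actually ended up in the fat jar without unpacking it by hand, a small helper can list the entries under a given prefix. This is a sketch: JarInspect and entriesWithPrefix are hypothetical names, and only java.util.jar.JarFile from the JDK is used.

```scala
import java.util.jar.JarFile
import scala.collection.mutable.ListBuffer

object JarInspect {
  // Returns the names of all entries in the jar at `jarPath` that start
  // with `prefix`, sorted alphabetically.
  def entriesWithPrefix(jarPath: String, prefix: String): List[String] = {
    val jar = new JarFile(jarPath)
    try {
      val names = ListBuffer.empty[String]
      val entries = jar.entries()
      while (entries.hasMoreElements) {
        val name = entries.nextElement().getName
        if (name.startsWith(prefix)) names += name
      }
      names.toList.sorted
    } finally jar.close()
  }

  // Usage: JarInspect <path-to-fat.jar> <prefix>
  def main(args: Array[String]): Unit =
    entriesWithPrefix(args(0), args(1)).foreach(println)
}
```

Passing the jar path and scala/Function3 lists both scala/Function3.class and any renamed variants. The class loader only looks for an entry named exactly scala/Function3.class, so a renamed copy on its own would still produce the NoClassDefFoundError.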

SBT prepare WAR file, duplicate entry: META-INF/MANIFEST.MF

I am trying to package one of the modules of my application as a WAR.
I chose xsbt-web-plugin to help me out.
I have set up the sbt build, correctly I think:
lazy val `my-project` = (project in file("my-project"))
...
.enablePlugins(TomcatPlugin)
But during sbt package I get this error:
[info] Packaging /home/siatkowskim/Documents/....target/scala-2.11/my-project_2.11-1.2-SNAPSHOT.war ...
[error] java.util.zip.ZipException: duplicate entry: META-INF/MANIFEST.MF
I am familiar with sbt-assembly, but I see no way to deduplicate here.
How can I even find out where it is duplicated from? Or how can I solve this duplication?
It turned out I had a MANIFEST.MF file on my classpath.
I do not know what it was for, but removing it solved the problem.
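When the duplicate comes from dependency jars rather than a loose file in a resource directory, one way to see where it originates is to scan those jars and report every entry that appears in more than one of them. A sketch under that assumption: DuplicateEntries is a hypothetical name, and collecting the jar paths on your classpath is left to you.

```scala
import java.util.jar.JarFile

object DuplicateEntries {
  // Maps each entry name to the jars that contain it, keeping only names
  // found in more than one jar. Directory entries are ignored.
  def duplicates(jarPaths: Seq[String]): Map[String, Seq[String]] = {
    val pairs = for {
      path <- jarPaths
      name <- entryNames(path)
    } yield name -> path
    pairs
      .groupBy(_._1)
      .toSeq
      .map { case (name, ps) => name -> ps.map(_._2).distinct }
      .filter { case (_, jars) => jars.size > 1 }
      .toMap
  }

  private def entryNames(path: String): Seq[String] = {
    val jar = new JarFile(path)
    try {
      val names = Seq.newBuilder[String]
      val entries = jar.entries()
      while (entries.hasMoreElements) {
        val entry = entries.nextElement()
        if (!entry.isDirectory) names += entry.getName
      }
      names.result()
    } finally jar.close()
  }

  // Usage: DuplicateEntries <jar1> <jar2> ...
  def main(args: Array[String]): Unit =
    duplicates(args.toSeq).toSeq.sortBy(_._1).foreach { case (name, jars) =>
      println(s"$name <- ${jars.mkString(", ")}")
    }
}
```

A loose MANIFEST.MF in a resource directory, the cause in the answer above, will not show up here, so it is worth checking the project's own resources as well.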
I had the same issue, yet there was no obvious MANIFEST.MF file on my classpath. I can only suppose that it came from the multitude of .jar files included.
The following solved the problem:
assemblyMergeStrategy in assembly := {
case PathList("META-INF", xs @ _*) =>
(xs map {_.toLowerCase}) match {
case ("manifest.mf" :: Nil) | ("index.list" :: Nil) | ("dependencies" :: Nil) => MergeStrategy.discard
case _ => MergeStrategy.last
}
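case x => MergeStrategy.defaultMergeStrategy(x) // all other paths; without this, they raise a MatchError
}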
The double colon (::) is the List "cons" pattern: ("manifest.mf" :: Nil) matches a list containing exactly one element, "manifest.mf".
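The nested match in that answer can be exercised outside sbt with a stand-in for sbt-assembly's PathList extractor. Everything here is hypothetical scaffolding: the stand-in splits on '/', and the Strategy objects only mimic the MergeStrategy names. The final case plays the role of the default strategy. Without some catch-all, any path outside META-INF would throw a MatchError.

```scala
object MergeModel {
  sealed trait Strategy
  case object Discard extends Strategy
  case object Last extends Strategy
  case object First extends Strategy

  // Hypothetical stand-in for sbt-assembly's PathList: splits a path on '/'.
  object PathList {
    def unapplySeq(path: String): Option[Seq[String]] =
      Some(path.split('/').toList)
  }

  def strategyFor(path: String): Strategy = path match {
    case PathList("META-INF", xs @ _*) =>
      xs.map(_.toLowerCase).toList match {
        // Each alternative matches a one-element list, i.e. a file directly under META-INF.
        case ("manifest.mf" :: Nil) | ("index.list" :: Nil) | ("dependencies" :: Nil) => Discard
        case _ => Last
      }
    case _ => First // stands in for the default strategy
  }
}
```

With this, META-INF/MANIFEST.MF is discarded, META-INF/services/... falls through to Last, and com/example/Foo.class reaches the fallback.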

sbt+spring assembly deduplicate: different file contents found in the following

When I run sbt assembly on my Scala project, I get an error like this: "deduplicate: different file contents found in the following:"
My environment:
sbt version: 0.13.15
scala version: 2.8.11
jdk: 1.8
You need to set the assembly merge strategy to either take one of the files, concatenate them, or remove them completely:
assemblyMergeStrategy in assembly := {
case PathList("org", "springframework", xs @ _*) => MergeStrategy.last
case x => MergeStrategy.defaultMergeStrategy(x)
}
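When no rule matches an ordinary file, the default strategy falls back to MergeStrategy.deduplicate, which is where the "different file contents found" message comes from. Its behavior can be modeled like this: identical copies collapse to one, and differing copies become an error that names each source. This is a sketch with hypothetical names. The message layout only imitates the logs quoted on this page, and the real plugin's wording may differ.

```scala
object DeduplicateModel {
  // `copies` pairs each source jar with the bytes it holds for `path`.
  // Identical copies collapse to one; differing copies become an error.
  def deduplicate(path: String, copies: Seq[(String, Array[Byte])]): Either[String, Array[Byte]] = {
    require(copies.nonEmpty, "expected at least one copy")
    // Compare contents element-wise (arrays themselves use reference equality).
    val distinctContents = copies.map { case (_, bytes) => bytes.toSeq }.distinct
    if (distinctContents.size == 1) Right(copies.head._2)
    else Left(
      "deduplicate: different file contents found in the following:\n" +
        copies.map { case (source, _) => s"$source:$path" }.mkString("\n")
    )
  }
}
```

Choosing first, last, concat, or discard for a path replaces this check, which is why an explicit rule makes the error go away.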
I am using JDK 8 (also tried 11), sbt 1.4.7, Scala 2.13.3, and the following two plugins in my rootprojectdir/project/plugins.sbt:
addSbtPlugin("com.typesafe.sbt" % "sbt-native-packager" % "1.8.0")
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.15.0")
assemblyMergeStrategy in assembly := {
case PathList(ps @ _*) if ps.contains("module-info.class") => MergeStrategy.discard
case x => MergeStrategy.defaultMergeStrategy(x)
}
I have also added the above strategy to my build.sbt, but I still get the same error. The corresponding sbt-assembly ticket is still open on GitHub.
Any workaround would be much appreciated.
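The guard in that strategy can be checked against a few sample paths, including the multi-release location META-INF/versions/9/module-info.class. This sketch uses a hypothetical stand-in for PathList that splits on '/'. It tests only the pattern, not the plugin or how the setting is wired into the build.

```scala
object ModuleInfoGuard {
  // Hypothetical stand-in for sbt-assembly's PathList: splits a path on '/'.
  object PathList {
    def unapplySeq(path: String): Option[Seq[String]] =
      Some(path.split('/').toList)
  }

  // Same guard as in the strategy above: true if any path segment is module-info.class.
  def isModuleInfo(path: String): Boolean = path match {
    case PathList(ps @ _*) if ps.contains("module-info.class") => true
    case _ => false
  }
}
```

If the guard matches the paths named in the error, as it does for these samples, the pattern itself is probably not the problem.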

Deduplicate: Different File Contents Found Error for SBT and Scala

I am trying to build a JAR file through SBT, with my driver script written in Scala. However, when I run 'sbt assembly', I receive a long list of:
deduplicate: different file contents found in the following:
The entire list of directories displaying that message is too large to post here, so I attached a screenshot which you can view below. My build file contains the following:
libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" % "2.0.0" % "provided", "org.apache.spark" %% "spark-mllib" % "2.0.0")
I attempted the solution offered by Onilton Maciel to this question, but to no avail. I am also confused as to how to implement the solution offered by Martin Senne, as the setup instructions on the 'Setup' section for the Assembly plugin are not clear on how to implement his code snippet. Any help would be appreciated. Thanks
Regarding "as the setup instructions on the 'Setup' section for the Assembly plugin are not clear on how to implement his code snippet": it means you should add the merge strategy to build.sbt, or to another .sbt file in the project's root directory.
Here is an example based on the official sbt-assembly documentation, updated with your conflicts:
import sbtassembly.MergeStrategy
assemblyMergeStrategy in assembly := {
case PathList("org", "apache", "hadoop", "yarn", "factories", "package-info.class") => MergeStrategy.discard
case PathList("org", "apache", "hadoop", "yarn", "provider", "package-info.class") => MergeStrategy.discard
case PathList("org", "apache", "hadoop", "util", "provider", "package-info.class") => MergeStrategy.discard
case PathList("org", "apache", "spark", "unused", "UnusedStubClass.class") => MergeStrategy.first
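case x =>
val oldStrategy = (assemblyMergeStrategy in assembly).value
oldStrategy(x) // all other paths keep the previous (default) strategy
}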
I hope this helps.
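As for where each piece goes: the plugin is added in project/plugins.sbt, and the merge strategy is an ordinary setting in build.sbt, where PathList and MergeStrategy are available without an explicit import. Below is a sketch of the two files. It assumes sbt-assembly 0.15.0 (a version quoted elsewhere on this page; choose one compatible with your sbt) and the dependencies from the question.

```scala
// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.15.0")

// build.sbt
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.0.0" % "provided",
  "org.apache.spark" %% "spark-mllib" % "2.0.0"
)

assemblyMergeStrategy in assembly := {
  case PathList("org", "apache", "spark", "unused", "UnusedStubClass.class") => MergeStrategy.first
  case x =>
    // Everything else goes to the strategy that was in place before this setting.
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
```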

DeDuplication error with SBT assembly plugin

I am trying to create an executable jar using the SBT assembly plugin.
I end up with the error below:
[error] (app/*:assembly) deduplicate: different file contents found in the following:
[error] /Users/rajeevprasanna/.ivy2/cache/org.eclipse.jetty.orbit/javax.servlet/orbits/javax.servlet-3.0.0.v201112011016.jar:about.html
[error] /Users/rajeevprasanna/.ivy2/cache/org.eclipse.jetty/jetty-continuation/jars/jetty-continuation-8.1.8.v20121106.jar:about.html
[error] /Users/rajeevprasanna/.ivy2/cache/org.eclipse.jetty/jetty-http/jars/jetty-http-8.1.8.v20121106.jar:about.html
[error] /Users/rajeevprasanna/.ivy2/cache/org.eclipse.jetty/jetty-io/jars/jetty-io-8.1.8.v20121106.jar:about.html
[error] /Users/rajeevprasanna/.ivy2/cache/org.eclipse.jetty/jetty-security/jars/jetty-security-8.1.8.v20121106.jar:about.html
[error] /Users/rajeevprasanna/.ivy2/cache/org.eclipse.jetty/jetty-server/jars/jetty-server-8.1.8.v20121106.jar:about.html
[error] /Users/rajeevprasanna/.ivy2/cache/org.eclipse.jetty/jetty-servlet/jars/jetty-servlet-8.1.8.v20121106.jar:about.html
[error] /Users/rajeevprasanna/.ivy2/cache/org.eclipse.jetty/jetty-util/jars/jetty-util-8.1.8.v20121106.jar:about.html
[error] /Users/rajeevprasanna/.ivy2/cache/org.eclipse.jetty/jetty-webapp/jars/jetty-webapp-8.1.8.v20121106.jar:about.html
[error] /Users/rajeevprasanna/.ivy2/cache/org.eclipse.jetty/jetty-xml/jars/jetty-xml-8.1.8.v20121106.jar:about.html
[error] Total time: 2562 s, completed Dec 5, 2013 12:03:25 PM
After reading the assembly plugin's wiki, I added a merge strategy to my build.scala file, but it does not seem to work. I am not sure whether it is the right fix. Can someone suggest the right strategy?
Below is the code I have in my build.scala file:
mergeStrategy in assembly <<= (mergeStrategy in assembly) {
(old) => {
case "about.html" => MergeStrategy.discard
case "logback.xml" => MergeStrategy.first //case PathList("logback.xml") => MergeStrategy.discard
case x => old(x)
}
}
I integrated the plugin with my app following this doc: Standalone deployment of Scalatra servlet.
I tried different strategies, such as MergeStrategy.rename and MergeStrategy.deduplicate, but nothing works.
Looking for help...
Your MergeStrategy looks correct. The only unhandled conflicts are "about.html" in the jetty jars, so case "about.html" => MergeStrategy.discard should just do it.
If you're still getting the error, I suspect that re-wiring of the mergeStrategy in assembly setting is either not going in, or going in the wrong order. The only way to know for sure is to see your Build.scala. @Stefan Ollinger's answer to your linked question, for example, sets up the project as follows:
lazy val project = Project("myProj", file(".")).
settings(mySettings: _*).
settings(myAssemblySettings:_*)
Could you post your Build.scala on gist if possible?
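Newer sbt-assembly releases renamed the mergeStrategy key to assemblyMergeStrategy, and sbt 1.x no longer supports the <<= operator. Here is a sketch of the same rules as a build.sbt setting, assuming a recent sbt and sbt-assembly:

```scala
// build.sbt: same rules as the Build.scala above, using := instead of <<=.
assemblyMergeStrategy in assembly := {
  case "about.html"  => MergeStrategy.discard
  case "logback.xml" => MergeStrategy.first
  case x =>
    // Delegate all other paths to the previously defined strategy (the old(x) call above).
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
```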