In a Scala project: How to add dependency jar details into the Manifest file - scala

I am new to Scala and I am writing a simple file read & write program for AWS S3 using Scala. The configuration below is what Maven uses to add jar details to the Manifest file. I need the equivalent command/configuration for a Scala (sbt) project to add the jar details to the Manifest file.
<manifest>
<addClasspath>true</addClasspath>
<classpathPrefix>lib/</classpathPrefix>
...
build.sbt
enablePlugins(PackPlugin)
name := "FileReadAWS"
version := "0.1"
scalaVersion := "2.11.8"
libraryDependencies += "org.scala-lang" % "scala-library" % scalaVersion.value
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.1"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.1"
libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.7.4"
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.7.4"
libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "2.7.4"
/project/plugin.sbt
addSbtPlugin("org.xerial.sbt" % "sbt-pack" % "0.13")
When I execute the sbt pack command, it creates the project jar, and the dependency jars are downloaded into <project location>\target\pack\lib.
The lib folder contains the project jar as well, and that jar has a Manifest file with the Main-Class entry, but it does not have the dependency jar details:
Manifest-Version: 1.0
Implementation-Title: FileReadAWS
Implementation-Version: 0.1
Specification-Vendor: default
Specification-Title: FileReadAWS
Implementation-Vendor-Id: default
Specification-Version: 0.1
Implementation-Vendor: default
Main-Class: spark.file.io.FileReader
Please help me.
Thanks in advance.

If your goal is to create a fat jar, have a look at https://github.com/sbt/sbt-assembly
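If instead you want to keep the dependency jars separate and only reference them from the project jar's manifest (the closest equivalent of Maven's addClasspath/classpathPrefix), sbt can add arbitrary manifest attributes through packageOptions. A minimal sketch in build.sbt, assuming the dependency jars sit under lib/ next to the project jar as sbt-pack lays them out; the jar names listed are only illustrative:
// add a Class-Path entry to the MANIFEST.MF of the jar built by sbt package
packageOptions in (Compile, packageBin) += Package.ManifestAttributes(
  "Class-Path" -> "lib/spark-core_2.11-2.2.1.jar lib/hadoop-aws-2.7.4.jar"
)
Class-Path entries are space-separated and are resolved relative to the jar that carries the manifest.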

Related

not able to import spark mllib in IntelliJ

I am not able to import the Spark MLlib libraries in IntelliJ for a Spark Scala project; I am getting a resolution exception.
Below is my build.sbt:
name := "ML_Spark"
version := "0.1"
scalaVersion := "2.11.12"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.1"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.1"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.2.1" % "runtime"
I tried to copy/paste the same build.sbt file you provided and I got the following error:
[error] [/Users/pc/testsbt/build.sbt]:3: ';' expected but string literal found.
So the build.sbt as pasted is actually invalid (IntelliJ flags the same error). Having the version and the Scala version on different lines solved the problem for me:
name := "ML_Spark"
version := "0.1"
scalaVersion := "2.11.12"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.1"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.1"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.2.1" % "runtime"
I am not sure this is the problem you're facing (can you please share the exception you got?); it might also be a problem with the repositories you specified under the .sbt folder in your home directory.
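For reference, overridden repositories usually live in ~/.sbt/repositories (picked up when sbt is told to override the build's repositories). A minimal sketch of a sane file; the last entry is a hypothetical internal mirror, included only to show the name: url syntax:
[repositories]
  local
  maven-central
  internal-mirror: https://repo.example.com/maven2/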
I have met the same problem before. To solve it, I just used the default compile scope for mllib instead of the runtime one. Here is my conf:
name := "SparkDemo"
version := "0.1"
scalaVersion := "2.11.12"
// https://mvnrepository.com/artifact/org.apache.spark/spark-core
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0"
// https://mvnrepository.com/artifact/org.apache.spark/spark-mllib
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.3.0"
I had a similar issue, but I found a workaround: you have to add the spark-mllib jar file to your project manually. Indeed, even though my build.sbt file was
name := "example_project"
version := "0.1"
scalaVersion := "2.12.10"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "3.0.0",
"org.apache.spark" %% "spark-sql" % "3.0.0",
"org.apache.spark" %% "spark-mllib" % "3.0.0" % "runtime"
)
I wasn't able to import the spark library with
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.ml._
The solution that worked for me was to add the jar file manually. Specifically:
Download the jar file of the ML library you need (e.g. for Spark 3 use https://mvnrepository.com/artifact/org.apache.spark/spark-mllib_2.12/3.0.0).
Follow this link to add the jar file to your IntelliJ project: Correct way to add external jars (lib/*.jar) to an IntelliJ IDEA project
Also add the mllib-local jar (https://mvnrepository.com/artifact/org.apache.spark/spark-mllib-local).
If, for some reason, you recompile the build.sbt, you need to re-import the jar file again.

Unresolved dependency generating jar with SBT

I'm developing a Spark process in Scala (Eclipse IDE) and it runs fine on my local cluster, but when I try to compile it with the SBT I installed on my PC, I get an error (shown below).
My first doubt is why SBT tries to compile with Scala 2.12 when I explicitly set scalaVersion to 2.11.11 in my build.sbt. I tried installing other SBT versions, and tried on other PCs, with the same results, but it does not work. I need help to fix it.
scala_version (Spark): 2.11.11
sbt_version: 1.0.2
spark: 2.2
build.sbt
name := "Comple"
version := "1.0"
organization := "com.antonio.spark"
scalaVersion := "2.11.11"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.2.0" % "provided",
"org.apache.spark" %% "spark-sql" % "2.2.0" % "provided"
)
assembly.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.0.2")
Error:
ResolveException: unresolved dependency: sbt_assembly;1.0.2: not found
Try changing your assembly.sbt file to:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")
as stated in the documentation here: https://github.com/sbt/sbt-assembly
Note that 1.0.2 is your sbt version; there was no sbt-assembly 1.0.2 release at the time, which is why the plugin cannot be resolved. I recently used 0.14.5 with spark-core_2.11 version 2.2.0 and it worked.
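For reference, once the plugin resolves you build the fat jar with sbt assembly; a sketch of where the output lands for the build above, assuming sbt-assembly's default <name>-assembly-<version>.jar naming:
sbt clean assembly
// default output for name := "Comple", version := "1.0", scalaVersion := "2.11.11":
//   target/scala-2.11/Comple-assembly-1.0.jar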

Trying to get Apache Spark working with IntelliJ

I am trying to get Apache Spark working with IntelliJ. I have created an SBT project in IntelliJ and done the following:
1. Gone to File -> Project Structure -> Libraries
2. Clicked the '+' in the middle section, clicked Maven, clicked Download Library from Maven Repository, typed 'spark-core' and selected org.apache.spark:spark-core_2.11:2.2.0, which is the latest version of Spark available
I downloaded the jar files and the source code into ./lib in the project folder
3. The Spark library is now showing in the list of libraries
4. Then I right-clicked on org.apache.spark:spark-core_2.11:2.2.0 and clicked Add to Project and Add to Modules
Now when I click on Modules on the left, then my main project folder, and then the Dependencies tab on the right, I can see the external library as a Maven library; but after clicking Apply, re-building the project and re-starting IntelliJ, it will not show as an external library in the project. Therefore I can't access the Spark API commands.
What am I doing wrong please? I've looked at all the documentation on IntelliJ and a hundred other sources but can't find the answer.
Also, do I need to include the following text in the build.sbt file, as well as specifying Apache Spark as an external library dependency? I assume that I need to EITHER include the code in the build.sbt file OR add Spark as an external dependency manually, but not both.
I included this code in my build.sbt file:
name := "Spark_example"
version := "1.0"
scalaVersion := "2.12.3"
val sparkVersion = "2.0.0"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-streaming" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion
)
I get an error: sbt.ResolveException: unresolved dependency: org.apache.spark#spark-core_2.12;2.2.0: not found
Please help! Thanks
Spark does not have builds for Scala 2.12.x, so set the Scala version to 2.11.x:
scalaVersion := "2.11.8"
val sparkVersion = "2.0.0"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-streaming" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion
)
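The error message shows why: %% appends the Scala binary version taken from scalaVersion to the artifact name, so with scalaVersion 2.12.3 sbt looks for spark-core_2.12, which does not exist for these Spark versions. With a 2.11.x scalaVersion, the two forms below resolve to the same artifact (version number illustrative):
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0"
// with scalaVersion := "2.11.8" the line above is equivalent to:
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.0.0"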
name := "Test"
version := "0.1"
scalaVersion := "2.11.7"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0.2.6.4.0-91"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0.2.6.4.0-91"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.2.0.2.6.4.0-91" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.2.0.2.6.4.0-91" % "runtime"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.2.0.2.6.4.0-91" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-hive-thriftserver" % "2.2.0.2.6.4.0-91" % "provided"

Spark sbt scala build error - missing jars from ivy

I am trying to execute a Spark project from the Eclipse IDE.
In build.sbt I have added the following:
name := "simple-spark-scala"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "1.6.2"
When I import this project I get around 100 errors like the one below ("project is missing required library"):
Description Resource Path Location Type
Project 'simple-spark' is missing required library: '/root/.ivy2/cache/aopalliance/aopalliance/jars/aopalliance-1.0.jar' simple-spark Build path Build Path Problem
However, I am able to see all the jars at the paths mentioned in those errors.
Any idea how to resolve this?
Add dependency resolvers as below in your build.sbt file:
resolvers += "MavenRepository" at "https://mvnrepository.com/"
That's because you don't link the Spark Packages repo. You can see my build.sbt below:
name := "spark"
version := "1.0"
scalaVersion := "2.11.8"
resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"
libraryDependencies ++= Seq(
"org.apache.spark" % "spark-core_2.11" % "2.1.0",
"org.apache.spark" % "spark-sql_2.11" % "2.1.0",
"org.apache.spark" % "spark-graphx_2.11" % "2.1.0",
"org.apache.spark" % "spark-mllib_2.11" % "2.1.0",
"neo4j-contrib" % "neo4j-spark-connector" % "2.0.0-M2"
)

Cannot run jar file created from Scala file

This is the code that I have written in Scala.
object Main extends App {
println("Hello World from Scala!")
}
This is my build.sbt.
name := "hello-world"
version := "1.0"
scalaVersion := "2.11.5"
mainClass := Some("Main")
This is the command that I have run to create the jar file.
sbt package
My problem is that a jar file named hello-world_2.11-1.0.jar has been created at target/scala-2.11. But I cannot run the file. It is giving me an error saying NoClassDefFoundError.
What am I doing wrong?
It also says what class is not found. Most likely you aren't including scala-library.jar. You can run scala target/scala-2.11/hello-world_2.11-1.0.jar if you have Scala 2.11 available from the command line or java -cp "<path to scala-library.jar>:target/scala-2.11/hello-world_2.11-1.0.jar" Main (use ; instead of : on Windows).
The procedure you describe is valid up to the point where the jar file is executed. From target/scala-2.11, try running it with
scala hello-world_2.11-1.0.jar
Also check whether it is runnable from the project root folder with sbt run.
To run a jar file (containing Scala code) with multiple main classes, use the following approach:
scala -cp "<jar-file>.jar;<other-dependencies>.jar" com.xyz.abc.TestApp
This command takes care of putting scala-library.jar on the classpath and will run TestApp as the main class, provided it has a def main(args: Array[String]) method. Note that multiple jar files on the classpath are separated by a semicolon (";") on Windows and a colon (":") elsewhere.
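As an illustration (the package and object names are hypothetical, matching the command above), a jar holding two entry points lets you choose which one to run from the command line:
package com.xyz.abc

// run with: scala -cp app.jar com.xyz.abc.TestApp
object TestApp {
  def main(args: Array[String]): Unit = println("running TestApp")
}

// run with: scala -cp app.jar com.xyz.abc.OtherApp
object OtherApp {
  def main(args: Array[String]): Unit = println("running OtherApp")
}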
We can use sbt-assembly to package and run the application.
First, create or add the plugin to project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.9")
A sample build.sbt looks like this:
name := "coursera"
version := "0.1"
scalaVersion := "2.12.10"
mainClass := Some("Main")
val sparkVersion = "3.0.0-preview2"
val playVersion="2.8.1"
val jacksonVersion="2.10.1"
libraryDependencies ++= Seq(
"org.scala-lang" % "scala-library" % scalaVersion.toString(),
"org.apache.spark" %% "spark-streaming" % sparkVersion,
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion,
"com.typesafe.play" %% "play-json" % playVersion,
// https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-10
"org.apache.spark" %% "spark-streaming-kafka-0-10" % sparkVersion,
// https://mvnrepository.com/artifact/org.mongodb/casbah
"org.mongodb" %% "casbah" % "3.1.1" pomOnly(),
// https://mvnrepository.com/artifact/com.typesafe/config
"com.typesafe" % "config" % "1.2.1"
)
assemblyMergeStrategy in assembly := {
case PathList("META-INF", xs # _*) => MergeStrategy.discard
case x => MergeStrategy.first
}
From the console, we can run sbt assembly and the jar file gets created under the target/scala-2.12/ path.
sbt assembly will create a fat jar. Here is an excerpt from the documentation:
sbt-assembly is a sbt plugin originally ported from codahale's assembly-sbt, which I'm guessing was inspired by Maven's assembly plugin. The goal is simple: Create a fat JAR of your project with all of its dependencies.
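Since mainClass := Some("Main") is set, the assembled jar carries a Main-Class entry and can be run directly with java -jar for the simple hello-world case in the question (Spark applications are usually launched with spark-submit instead). The jar name below assumes sbt-assembly's default <name>-assembly-<version>.jar pattern for the sample build above:
java -jar target/scala-2.12/coursera-assembly-0.1.jar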