I am trying to get my build.sbt working, but when I press "Reload All sbt Projects" in IntelliJ, it fails:
Extracting structure failed: Build status: Error sbt task failed, see log for details
My build.sbt:
scalaVersion := "2.12.17"
name := "hello-world"
libraryDependencies ++= Seq(
"org.scala-lang.modules" %% "scala-parser-combinators" % "2.1.1",
"com.microsoft.azure" % "spark-mssql-connector_2.12" % "1.2.0",
"org.apache.spark" % "spark-core_2.12" % "3.1.3",
"org.apache.spark" % "spark-sql_2.12" % "3.1.3"
)
I am trying this configuration because I want to try the Apache Spark Connector for SQL Server and Azure SQL (https://github.com/microsoft/sql-spark-connector).
Does anyone know why it fails? I don't get any details.
When I comment out "org.apache.spark" % "spark-sql_2.12" % "3.1.3", the build is successful.
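One way to see the details that the IntelliJ import hides is to load the same build from a terminal: running sbt in the project directory prints the full resolution or compilation error to the console (a sketch, assuming sbt itself is installed locally):
sbt clean compile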
I'm new to Scala/Spark, so please go easy on me :)
I'm trying to run an EMR cluster on AWS, running the jar file I packaged with sbt package.
When I run the code locally it works perfectly fine, but when I run it in the AWS EMR cluster I get an error:
ERROR Client: Application diagnostics message: User class threw exception: java.lang.NoClassDefFoundError: upickle/core/Types$Writer
From what I understand, this error comes from a mismatch between the Scala/Spark versions and the dependencies.
So I'm using Scala 2.12 with Spark 3.0.1, and in AWS I'm using emr-6.2.0.
Here's my build.sbt:
scalaVersion := "2.12.14"
libraryDependencies += "com.amazonaws" % "aws-java-sdk" % "1.11.792"
libraryDependencies += "com.amazonaws" % "aws-java-sdk-core" % "1.11.792"
libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "3.3.0"
libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "3.3.0"
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "3.3.0"
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.0.1"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.0.1"
libraryDependencies += "com.lihaoyi" %% "upickle" % "1.4.1"
libraryDependencies += "com.lihaoyi" %% "ujson" % "1.4.1"
What am I missing?
Thanks!
If you use sbt package, the generated jar will contain only your project's code, not its dependencies. You need to use sbt assembly to generate a so-called uberjar (fat jar) that includes the dependencies as well.
But in your case, it's recommended to mark the Spark and Hadoop (and maybe AWS) dependencies as Provided, since they are already included in the EMR runtime. Use something like this:
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.0.1" % Provided
I just started using Spark 2.2 on HDP 2.6, and I am facing issues when trying to run sbt compile.
Error
[info] Updated file /home/maria_dev/structuredstreaming/project/build.properties: set sbt.version to 1.3.0
[info] Loading project definition from /home/maria_dev/structuredstreaming/project
[info] Fetching artifacts of
[info] Fetched artifacts of
[error] lmcoursier.internal.shaded.coursier.error.FetchError$DownloadingArtifacts: Error fetching artifacts:
[error] https://repo1.maven.org/maven2/com/squareup/okhttp3/okhttp-urlconnection/3.7.0/okhttp-urlconnection-3.7.0.jar: download error: Caught java.net.UnknownHostException: repo1.maven.org (repo1.maven.org) while downloading https://repo1.maven.org/maven2/com/squareup/okhttp3/okhttp-urlconnection/3.7.0/okhttp-urlconnection-3.7.0.jar
My build.sbt file is below:
scalaVersion := "2.11.8"
resolvers ++= Seq(
"Conjars" at "http://conjars.org/repo",
"Hortonworks Releases" at "http://repo.hortonworks.com/content/groups/public"
)
publishMavenStyle := true
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.2.0.2.6.3.0-235",
"org.apache.spark" %% "spark-sql" % "2.2.0.2.6.3.0-235",
"org.apache.phoenix" % "phoenix-spark2" % "4.7.0.2.6.3.0-235",
"org.apache.phoenix" % "phoenix-core" % "4.7.0.2.6.3.0-235",
"org.apache.kafka" % "kafka-clients" % "0.10.1.2.6.3.0-235",
"org.apache.spark" %% "spark-streaming" % "2.0.2" % "provided",
"org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.0.2",
"org.apache.spark" %% "spark-sql-kafka-0-10" % "2.0.2" % "provided",
"com.typesafe" % "config" % "1.3.1",
"com.typesafe.play" %% "play-json" % "2.7.2",
"com.solarmosaic.client" %% "mail-client" % "0.1.0",
"org.json4s" %% "json4s-jackson" % "3.2.10",
"org.apache.logging.log4j" % "log4j-api-scala_2.11" % "11.0",
"com.databricks" %% "spark-avro" % "3.2.0",
"org.elasticsearch" %% "elasticsearch-spark-20" % "5.0.0-alpha5",
"io.spray" %% "spray-json" % "1.3.3"
)
retrieveManaged := true
fork in run := true
It looks like Coursier is attempting to fetch dependencies from repo1.maven.org, which is being blocked. The Scala Metals people have an explanation here. Basically, you have to set a global Coursier config pointing to your corporate proxy server by setting up a mirror.properties file that looks like this:
central.from=https://repo1.maven.org/maven2
central.to=http://mycorporaterepo.com:8080/nexus/content/groups/public
Based on your OS, it will be:
Windows: C:\Users\<username>\AppData\Roaming\Coursier\config\mirror.properties
Linux: ~/.config/coursier/mirror.properties
MacOS: ~/Library/Preferences/Coursier/mirror.properties
You also might need to set up sbt to use a proxy for downloading dependencies. For that, you will need to edit this file:
~/.sbt/repositories
Set it to the following:
[repositories]
local
maven-central: http://mycorporaterepo.com:8080/nexus/content/groups/public
The combination of those two settings should take care of everything you need to do to point SBT to the correct places.
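If the build machine also has to go through an HTTP proxy for any remaining outbound traffic, the standard JVM proxy properties can be passed on the sbt command line as well (the host and port below are placeholders for your proxy):
sbt -Dhttp.proxyHost=mycorporaterepo.com -Dhttp.proxyPort=8080 -Dhttps.proxyHost=mycorporaterepo.com -Dhttps.proxyPort=8080 update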
I am using Spark Streaming with the Kafka integration. When I run the streaming application from my IDE in local mode, everything works like a charm. However, as soon as I submit it to the cluster, I keep getting the following error:
java.lang.ClassNotFoundException:
org.apache.kafka.common.serialization.StringDeserializer
I am using sbt assembly to build my project.
My build.sbt is as follows:
libraryDependencies ++= Seq(
"org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % "2.2.0" % Provided,
"org.apache.spark" % "spark-core_2.11" % "2.2.0" % Provided,
"org.apache.spark" % "spark-streaming_2.11" % "2.2.0" % Provided,
"org.marc4j" % "marc4j" % "2.8.2",
"net.sf.saxon" % "Saxon-HE" % "9.7.0-20"
)
run in Compile := Defaults.runTask(fullClasspath in Compile, mainClass in (Compile, run), runner in (Compile, run)).evaluated
mainClass in assembly := Some("EstimatorStreamingApp")
I also tried to use the --packages option:
attempt 1
--packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.2.0
attempt 2
--packages org.apache.spark:spark-streaming-kafka-0-10-assembly_2.11:2.2.0
All with no success. Does anyone have anything to suggest?
You need to remove the Provided flag from the Kafka dependency, as it is not provided out of the box with Spark:
libraryDependencies ++= Seq(
"org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % "2.2.0",
"org.apache.spark" % "spark-core_2.11" % "2.2.0" % Provided,
"org.apache.spark" % "spark-streaming_2.11" % "2.2.0" % Provided,
"org.marc4j" % "marc4j" % "2.8.2",
"net.sf.saxon" % "Saxon-HE" % "9.7.0-20"
)
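With that change, kafka-clients is pulled in transitively through spark-streaming-kafka-0-10 and bundled into the uberjar by sbt assembly. A usage sketch for submitting it (the jar path is a placeholder that depends on your project name and version; the main class is the one set in mainClass above):
spark-submit --class EstimatorStreamingApp target/scala-2.11/<project-name>-assembly-<version>.jar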
I am trying to get Apache Spark working with IntelliJ. I have created an SBT project in IntelliJ and done the following:
1. Gone to File -> Project Structure -> Libraries
2. Clicked the '+' in the middle section, clicked Maven, clicked Download Library from Maven Repository, typed 'spark-core', and selected org.apache.spark:spark-core_2.11:2.2.0, which is the latest version of Spark available
I downloaded the jar files and the source code into ./lib in the project folder
3. The Spark library is now showing in the list of libraries
4. Then I right-clicked on org.apache.spark:spark-core_2.11:2.2.0 and clicked Add to Project and Add to Modules
Now when I click on Modules on the left, then my main project folder, and then the Dependencies tab on the right, I can see the external library listed as a Maven library. But after clicking Apply, rebuilding the project, and restarting IntelliJ, it does not show as an external library in the project, so I can't access the Spark API.
What am I doing wrong please? I've looked at all the documentation on IntelliJ and a hundred other sources but can't find the answer.
Also, do I need to include the following text in the build.sbt file, as well as specifying Apache Spark as an external library dependency? I assume that I need to EITHER include the code in the build.sbt file OR add Spark as an external dependency manually, but not both.
I included this code in my build.sbt file:
name := "Spark_example"
version := "1.0"
scalaVersion := "2.12.3"
val sparkVersion = "2.0.0"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-streaming" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion
)
I get an error: sbt.ResolveException: unresolved dependency: org.apache.spark#spark-core_2.12;2.2.0: not found
Please help! Thanks
Spark does not have builds for Scala 2.12.x, so set the Scala version to 2.11.x:
scalaVersion := "2.11.8"
val sparkVersion = "2.0.0"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-streaming" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion
)
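For reference, %% appends the project's Scala binary version to the artifact name, which is why the error shows the _2.12 suffix when scalaVersion is 2.12.3. With scalaVersion set to 2.11.x, the two forms below resolve to the same artifact:
// equivalent once scalaVersion is 2.11.x
libraryDependencies += "org.apache.spark" %% "spark-core" % sparkVersion
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % sparkVersion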
name := "Test"
version := "0.1"
scalaVersion := "2.11.7"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0.2.6.4.0-91"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0.2.6.4.0-91"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.2.0.2.6.4.0-91" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.2.0.2.6.4.0-91" % "runtime"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.2.0.2.6.4.0-91" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-hive-thriftserver" % "2.2.0.2.6.4.0-91" % "provided"
I am trying to execute a Spark project from Eclipse.
In build.sbt I have added the following:
name := "simple-spark-scala"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "1.6.2"
When I import this project, I get around 100 errors like "project is missing required library". For example:
Description Resource Path Location Type
Project 'simple-spark' is missing required library: '/root/.ivy2/cache/aopalliance/aopalliance/jars/aopalliance-1.0.jar' simple-spark Build path Build Path Problem
However, I am able to see all the jars under the paths mentioned in the missing-library errors.
Any idea how to resolve this?
Add dependency resolvers as below in your build.sbt file:
resolvers += "MavenRepository" at "https://mvnrepository.com/"
That's because you don't link the Spark Packages repo. You can see my build.sbt below:
name := "spark"
version := "1.0"
scalaVersion := "2.11.8"
resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"
libraryDependencies ++= Seq(
"org.apache.spark" % "spark-core_2.11" % "2.1.0",
"org.apache.spark" % "spark-sql_2.11" % "2.1.0",
"org.apache.spark" % "spark-graphx_2.11" % "2.1.0",
"org.apache.spark" % "spark-mllib_2.11" % "2.1.0",
"neo4j-contrib" % "neo4j-spark-connector" % "2.0.0-M2"
)
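If more than one external repository is needed, the resolvers can be combined in a single setting and sbt will try them in order; a sketch using repositories already mentioned above:
resolvers ++= Seq(
  "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven",
  "Hortonworks Releases" at "http://repo.hortonworks.com/content/groups/public"
)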