Build a universal sbt file in apache spark - scala

Hello i have a sbt file like this:
name := "Myexample"
version := "1.0"
scalaVersion := "2.11.12"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-sql" % "2.4.4",
"org.apache.spark" %% "spark-mllib" % "2.4.4"
)
I want to fix a universal sbt file. If other computer has other scala version or spark version can i make a sbt file dynamic, not declaring my own scalaVersion at all, but declaring in every computer their own scalaversion or apache spark version??

Related

Spark Scala SBT Task Failed

When I tray using spark with Scala in SBT build system to read a Json file a have the error:
and my SBT file is:
ThisBuild / version := "0.1.0-SNAPSHOT"
ThisBuild / scalaVersion := "2.13.10"
// https://mvnrepository.com/artifact/org.apache.spark/spark-core
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.3.1"
// https://mvnrepository.com/artifact/org.apache.spark/spark-sql
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.3.1"
// https://mvnrepository.com/artifact/org.mongodb.spark/mongo-spark-connector
libraryDependencies += "org.mongodb.spark" % "mongo-spark-connector" % "10.0.5"
I tray to change all "3.3.1" and "10.0.5" to "3.0.1" and it still the same problem
You should use a 2.12 version of Scala, since those dependencies do not all have 2.13 built.
ThisBuild / scalaVersion := "2.12.11"
Look at the screenshot from Maven to see that 2.13 isn't listed for those versions (or at all).

not able to import spark mllib in IntelliJ

I am not able to import spark mllib libraries in Intellij for Spark scala project. I am getting a resolution exception.
Below is my sbt.build
name := "ML_Spark"
version := "0.1"
scalaVersion := "2.11.12"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.1"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.1"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.2.1" % "runtime"
I tried to copy/paste the same build.sbt file you provided and i got the following error :
[error] [/Users/pc/testsbt/build.sbt]:3: ';' expected but string literal found.
Actually, the build.sbt is invalid :
intellij error
Having the version and the Scala version in different lines solved the problem for me :
name := "ML_Spark"
version := "0.1"
scalaVersion := "2.11.12"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.1"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.1"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.2.1" % "runtime"
I am not sure that this is the problem you're facing (can you please share the exception you had ?), it might be a problem with the repositories you specified under the .sbt folder in your home directory.
I have met the same problem before. To solve it, I just used the compiled version of mllib instead of the runtime one. Here is my conf:
name := "SparkDemo"
version := "0.1"
scalaVersion := "2.11.12"
// https://mvnrepository.com/artifact/org.apache.spark/spark-core
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0"
// https://mvnrepository.com/artifact/org.apache.spark/spark-mllib
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.3.0"
I had a similar issue, but I found a workaround. Namely, you have to add the spark-mllib jar file to your project manually. Indeed, despite my build.sbt file was
name := "example_project"
version := "0.1"
scalaVersion := "2.12.10"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "3.0.0",
"org.apache.spark" %% "spark-sql" % "3.0.0",
"org.apache.spark" %% "spark-mllib" % "3.0.0" % "runtime"
)
I wasn't able to import the spark library with
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.ml._
The solution that worked for me was to add the jar file manually. Specifically,
Download the jar file of the ml library you need (e.g. for spark 3 use https://mvnrepository.com/artifact/org.apache.spark/spark-mllib_2.12/3.0.0 ).
Follow this link to add the jar file to your intelliJ project: Correct way to add external jars (lib/*.jar) to an IntelliJ IDEA project
Add also the mlib-local jar (https://mvnrepository.com/artifact/org.apache.spark/spark-mllib-local)
If, for some reason, you compile again the build.sbt you need to re-import the jar file again.

Building Spark scala project using Bazel

I want to build a Spark with Scala project using Bazel which has been built using SBT and executed successfully. This is how my build.sbt looks like
name := "ScalaWithSpark" version := "0.1" scalaVersion := "2.11.11"
//libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0"
Basically I have gone through all the concepts of Bazel but still its like dots scattered for me I need help in connecting the dots Tq

Libraries not resolving in sbt within intellij, but resolves and compiles from commandline

The below sbt file doesn't resolve the spark-xml databricks package from within intelliJ Idea where as works fine with command line
name := "dataframes"
version := "1.0"
scalaVersion := "2.11.8"
val sparkVersion="1.4.0"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion % "provided",
"org.apache.spark" %% "spark-sql" % sparkVersion,
"com.databricks" %% "spark-xml" % "0.3.3"
)
resolvers ++= Seq(
"Apache HBase" at "https://repository.apache.org/content/repositories/releases",
"Typesafe repository" at "http://repo.typesafe.com/typesafe/releases/"
)
resolvers += Resolver.mavenLocal
The sbt was set to both bundled and other pointing to the locally installed sbt, but did not work either way.
The below package resolves and works perfectly from commandline
import com.databricks.spark._

IntelliJ Idea 14: cannot resolve symbol spark

I made a dependency of Spark which worked in my first project. But when I try to make a new project with Spark, my SBT does not import the external jars of org.apache.spark. Therefore IntelliJ Idea gives the error that it "cannot resolve symbol".
I already tried to make a new project from scratch and use auto-import but none works. When I try to compile I get the messages that "object apache is not a member of package org". My build.sbt looks like this:
name := "hello"
version := "1.0"
scalaVersion := "2.11.7"
libraryDependencies += "org.apache.spark" % "spark-parent_2.10" % "1.4.1"
I have the impression that there might be something wrong with my SBT settings, although it already worked one time. And except for the external libraries everything is the same...
I also tried to import the pom.xml file of my spark dependency but that also doesn't work.
Thank you in advance!
This worked for me->
name := "ProjectName"
version := "0.1"
scalaVersion := "2.11.11"
libraryDependencies ++= Seq(
"org.apache.spark" % "spark-core_2.11" % "2.2.0",
"org.apache.spark" % "spark-sql_2.11" % "2.2.0",
"org.apache.spark" % "spark-mllib_2.10" % "1.1.0"
)
I use
scalaVersion := "2.11.7"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.1"
in my build.sbt and it works for me.
I had a similar problem. It seems the reason was that the build.sbt file was specifying the wrong version of scala.
If you run spark-shell it'll say at some point the scala version used by Spark, e.g.
Using Scala version 2.11.8
Then I edited the line in the build.sbt file to point to that version and it worked.
Currently spark-cassandra-connector compatible with Scala 2.10 and 2.11.
It worked for me when I updated the scala version of my project like below:
ThisBuild / scalaVersion := "2.11.12"
and I updated my dependency like:
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "2.4.0",
If you use "%%", sbt will add your project’s binary Scala version to the artifact name.
From sbt run:
sbt> reload
sbt> compile
Your library dependecy conflicts with with the scala version you're using, you need to use 2.11 for it to work. The correct dependency would be:
scalaVersion := "2.11.7"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "1.4.1"
note that you need to change spark_parent to spark_core
name := "SparkLearning"
version := "0.1"
scalaVersion := "2.12.3"
// additional libraries
libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.4.1"