Intellisense in Intellij with spark libraries - scala

I've created a Spark project in IntelliJ and Intellsense (code completation) doesnt't work. It works for standard Scala libraries, but not for Spark. Any ideas what can be wron? (I've already done Invalidate Caches and restart)
built.sbt
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.1.0",
"org.apache.spark" %% "spark-sql" % "2.1.0"

Your build.sbt needs to be under the root folder of the project .

Related

Running scala spark 1.6 job on spark 2.1 fails [duplicate]

This question already has answers here:
Resolving dependency problems in Apache Spark
(7 answers)
Closed 4 years ago.
I have a spark job that needs to run nightly. However, I had to update to spark 2.1 from 1.6. Now I am receiving an error:
java.lang.NoSuchMethodError: org/apache/spark/sql/DataFrameReader.load()Lorg/apache/spark/sql/DataFrame; (loaded from file:/usr/local/src/spark21master/spark-2.1.2-bin-2.7.3/jars/spark-sql_2.11-2.1.2.jar by sun.misc.Launcher$AppClassLoader#305de464) called from class com.ibm.cit.tennis.ServiceStat$ (loaded from file:/tmp/spark-21-ego-master/work/spark-driver-8073f84b-6c09-4d7d-83f5-2c99527eaa1c/spark-service-stat_2.11-1.0.jar by org.apache.spark.util.MutableURLClassLoader#ee80a89b).
In my SBT my build file, I have the following configs:
scalaVersion := "2.11.8"
val sparkVersion = "2.1.2"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion % sparkDependencyScope,
"org.apache.spark" %% "spark-sql" % sparkVersion % sparkDependencyScope,
"org.apache.spark" %% "spark-mllib" % sparkVersion % sparkDependencyScope,
"org.apache.spark" %% "spark-streaming" % sparkVersion % sparkDependencyScope,
"org.apache.spark" %% "spark-hive" % sparkVersion % sparkDependencyScope,
"org.apache.spark" %% "spark-repl" % sparkVersion % sparkDependencyScope
"org.apache.spark" %% "spark-graphx" % sparkVersion % sparkDependencyScope
)
I am building with Scala 2.11.8 and Java 1.8.0.
Any help would be appreciated,
Aaron.
NoSuchMethodError exception is a sign of a version mismatch. I suspect you're still trying to use Spark 1.6 to launch your app. It is not clear what is the value of sparkDependencyScope in your example. Normally Spark dependencies are specified with "provided" scope, i.e. the installed version of the Spark runtime.
"org.apache.spark" %% "spark-core" % sparkVersion % "provided"
Try running
spark-submit --version to figure out which Spark launcher version is used. If it is not what you expect, make sure Spark 2.1.2 is installed and in PATH.
The issue has been solved. The libraries were cached in an environment. After the creation of a new environment, SBT was able to pull the latest sources.
Also, I had to add:
conf.set("spark.sql.crossJoin.enabled", "true")

Trying to get Apache Spark working with IntelliJ

I am trying to get Apache Spark working with IntelliJ. I have created an SBT project in IntelliJ and done the following:
1. Gone to File -> Project Structure -> Libraries
2. Clicked the '+' in the middle section, clicked Maven, clicked Download Library from Maven Repository, typed text 'spark-core' and org.apache.spark:spark-core_2.11:2.2.0, which is the latest version of Spark available
I downloaded the jar files and the source code into ./lib in the project folder
3. The Spark library is now showing in the list of libraries
4. Then I right-clicked on org.apache.spark:spark-core_2.11:2.2.0 and clicked Add to Project and Add to Modules
Now when I click on Modules on the left, and then my main project folder, and then Dependencies tab on the right I can see the external library as a Maven library, but after clicking Apply, re-building the project and re-starting IntelliJ, it will not show as an external library in the project. Therefore I can't access the Spark API commands.
What am I doing wrong please? I've looked at all the documentation on IntelliJ and a hundred other sources but can't find the answer.
Also, do I also need to include the following text in the build.SBT file, as well as specifying Apache Spark as an external library dependency? I assume that I need to EITHER include the code in the build.SBT file, OR add Spark as an external dependency manually, but not both.
I included this code in my build.SBT file:
name := "Spark_example"
version := "1.0"
scalaVersion := "2.12.3"
val sparkVersion = "2.0.0"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-streaming" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion
)
I get an error: sbt.ResolveException: unresolved dependency: org.apache.spark#spark-core_2.12;2.2.0: not found
Please help! Thanks
Spark does not have builds for Scala version 2.12.x. So set the Scala version to 2.11.x
scalaVersion := "2.11.8"
val sparkVersion = "2.0.0"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-streaming" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion
)
name := "Test"
version := "0.1"
scalaVersion := "2.11.7"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0.2.6.4.0-91"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0.2.6.4.0-91"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.2.0.2.6.4.0-91" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.2.0.2.6.4.0-91" % "runtime"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.2.0.2.6.4.0-91" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-hive-thriftserver" % "2.2.0.2.6.4.0-91" % "provided"

Check Spark packages version [duplicate]

This question already has an answer here:
sbt unresolved dependency for spark-cassandra-connector 2.0.2
(1 answer)
Closed 5 years ago.
I am trying to set up my first Scala project with IntelliJ Idea on Ubuntu 16.04. I need the Spark library and I think I have installed correctly in my computer, however I am not able to refer it in the project dependencies. In particular, I have added the following code in my build.sbt:
libraryDependencies ++= Seq(
"org.apache.spark" % "spark-core" % "2.1.1",
"org.apache.spark" % "spark-sql" % "2.1.1")
However sbt complains about not finding the correct packages (Unresolved Dependencies error, org.apache.spark#spark-core;2.1.1: not found and org.apache.spark#spark-sql;2.1.1: not found):
I think that the versions of the packages are incorrect (I copied the previous code from the web, just to try).
How can I determine the correct packages versions?
If you use % you have to define the exact version as
libraryDependencies ++= Seq(
"org.apache.spark" % "spark-core_2.10" % "2.1.1",
"org.apache.spark" % "spark-sql_2.10" % "2.1.1")
And if you don't want to define the version and let sbt take the correct version then you need to define %% as
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.1.1",
"org.apache.spark" %% "spark-sql" % "2.1.1")
you can check of installed version by doing
spark-submit --version
And by going to maven dependency

override guava dependency version of spark

spark depends on an old version of guava.
i build my spark project with sbt assembly, excluding spark using provided, and including the latest version of guava.
However, when running sbt-assembly, the guava dependency is excluded also from the jar.
my build.sbt:
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion % "provided",
"org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
"org.apache.spark" %% "spark-mllib" % sparkVersion % "provided",
"com.google.guava" % "guava" % "11.0"
)
if i remove the % "provided", then both spark and guava is included.
so, how can i exclude spark and include guava?
You are looking for shading options. See here but basically you need to add shading instructions. Something like this:
assemblyShadeRules in assembly := Seq(
ShadeRule.rename("com.google.guava.**" -> "my_conf.#1")
.inLibrary("com.google.guava" % "config" % "11.0")
.inProject
)
There is also the corresponding maven-shade-plugin for those who prefer maven.

Mllib dependency error

I'm trying to build a very simple scala standalone app using the Mllib, but I get the following error when trying to bulid the program:
Object Mllib is not a member of package org.apache.spark
Then, I realized that I have to add Mllib as dependency as follow :
version := "1"
scalaVersion :="2.10.4"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.1.0",
"org.apache.spark" %% "spark-mllib" % "1.1.0"
)
But, here I got an error that says :
unresolved dependency spark-core_2.10.4;1.1.1 : not found
so I had to modify it to
"org.apache.spark" % "spark-core_2.10" % "1.1.1",
But there is still an error that says :
unresolved dependency spark-mllib;1.1.1 : not found
Anyone knows how to add dependency of Mllib in .sbt file?
As #lmm pointed out, you can instead include the libraries as:
libraryDependencies ++= Seq(
"org.apache.spark" % "spark-core_2.10" % "1.1.0",
"org.apache.spark" % "spark-mllib_2.10" % "1.1.0"
)
In sbt %% includes the scala version, and you are building with scala version 2.10.4 whereas the Spark artifacts are published against 2.10 in general.
It should be noted that if you are going to make an assembly jar to deploy your application you may wish to mark spark-core as provided e.g.
libraryDependencies ++= Seq(
"org.apache.spark" % "spark-core_2.10" % "1.1.0" % "provided",
"org.apache.spark" % "spark-mllib_2.10" % "1.1.0"
)
Since the spark-core package will be in the path on executor anyways.
Here is another way to add the dependency to your build.sbt file if you're using the Databricks sbt-spark-package plugin:
sparkComponents ++= Seq("sql","hive", "mllib")