Maven setup for a Spark project in Eclipse Scala IDE

I'm using Eclipse Scala IDE to develop a Spark application.
It is a Maven project, but when I try to import SparkConf like this:
import org.apache.spark.SparkConf
I get the error:
object apache is not a member of package org
Can you help me set up the Spark dependencies?

You need to add the Spark dependency to your pom.xml file. At the time of writing, the latest version of Spark can be pulled in by putting:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>2.2.0</version>
</dependency>
inside the <dependencies> section of your pom.xml. Note that the _2.10 suffix is the Scala version the artifact was built against; pick the artifact that matches the Scala version your project uses.
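
Once the dependency resolves, a minimal sketch like the following should compile and run from the IDE (the app name and local master are placeholders for illustration):

import org.apache.spark.{SparkConf, SparkContext}

object SparkConfCheck {
  def main(args: Array[String]): Unit = {
    // "local[*]" runs Spark inside the IDE without a cluster
    val conf = new SparkConf()
      .setAppName("SparkConfCheck")
      .setMaster("local[*]")
    val sc = new SparkContext(conf)

    // quick sanity check that the Spark classes are on the classpath
    val sum = sc.parallelize(1 to 100).sum()
    println(s"Sum of 1..100 = $sum")

    sc.stop()
  }
}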

Please check this tutorial on setting up a Spark project with Maven:
https://sparktutorials.github.io/2015/04/02/setting-up-a-spark-project-with-maven.html
There is also an online video tutorial:
https://www.youtube.com/watch?v=4sO-VgqHLp4

Related

Implementing Isolation Forest in Spark Scala

I am trying to implement the Isolation Forest algorithm in a Spark Scala Maven project. It is explained at this link: iforest example.
My question is: when I try to run the suggested code, I get this error:
object iforest is not a member of package org.apache.spark.ml
I tried importing org.apache.spark.ml and also changed the spark-core dependency to version 2.2.0:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.2.0</version>
</dependency>
Any suggestions please?
You can try this Spark/Scala implementation of the isolation forest algorithm, which has artifacts available in the public Maven Central repository.
You can declare the dependency in your project's pom.xml as:
<dependency>
<groupId>com.linkedin.isolation-forest</groupId>
<artifactId>isolation-forest_3.2.0_2.12</artifactId>
<version>2.0.8</version>
</dependency>
Other available artifact versions are listed here.
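Once the artifact is on the classpath, usage follows the standard Spark ML Estimator/Transformer pattern. The sketch below follows the library's README; treat the exact parameter names and values as assumptions to verify against the version you pull in:

import com.linkedin.relevance.isolationforest.IsolationForest
import org.apache.spark.sql.DataFrame

// `data` is assumed to be a DataFrame with a Vector column named "features"
def scoreOutliers(data: DataFrame): DataFrame = {
  val isolationForest = new IsolationForest()
    .setNumEstimators(100)     // number of trees in the forest
    .setMaxSamples(256)        // samples drawn to build each tree
    .setContamination(0.02)    // assumed fraction of outliers in the data
    .setFeaturesCol("features")
    .setPredictionCol("predictedLabel")
    .setScoreCol("outlierScore")

  // fit() trains the forest; transform() appends score and prediction columns
  val model = isolationForest.fit(data)
  model.transform(data)
}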
The spark-iforest artifact used in that example is not part of the official Spark distribution, nor is it published to any central artifact repository, so to use it you need to build it yourself, either as a separate library or inside your project.
Arguably, that library should not have used another project's package name (org.apache.spark.ml) in the first place, because it falsely suggests the class is available within Spark itself.

How to import the TwitterUtils library in Scala

I am using DSE 4.8. I am trying to import the TwitterUtils library using:
import org.apache.spark.streaming.twitter.TwitterUtils
It shows the error:
object twitter is not a member of package org.apache.spark.streaming
Please let me know how to add the package so that I can stream Twitter data.
I think you have missed adding the spark-streaming-twitter dependency:
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-twitter_2.10 -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-twitter_2.10</artifactId>
<version>1.6.1</version>
</dependency>
You can alternatively use the --packages option to have Spark download the jar for you automatically, provided you have an internet connection:
spark-shell --packages org.apache.bahir:spark-streaming-twitter_2.11:2.1.0
Note: change the version of the library to match your Spark (and Scala) version.
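
With the dependency in place, a minimal streaming sketch looks like this; the OAuth values are placeholders you must replace with credentials from your own Twitter app (Twitter4J reads them from system properties):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

object TwitterStreamCheck {
  def main(args: Array[String]): Unit = {
    // placeholder credentials: Twitter4J picks these up from system properties
    System.setProperty("twitter4j.oauth.consumerKey", "YOUR_CONSUMER_KEY")
    System.setProperty("twitter4j.oauth.consumerSecret", "YOUR_CONSUMER_SECRET")
    System.setProperty("twitter4j.oauth.accessToken", "YOUR_ACCESS_TOKEN")
    System.setProperty("twitter4j.oauth.accessTokenSecret", "YOUR_ACCESS_TOKEN_SECRET")

    val conf = new SparkConf().setAppName("TwitterStreamCheck").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    // None = build the OAuth authorization from the system properties above
    val stream = TwitterUtils.createStream(ssc, None)
    stream.map(_.getText).print()

    ssc.start()
    ssc.awaitTermination()
  }
}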

How to convince Scala IDE to recognize org.apache.spark.graphx._ package?

When I try to include GraphX in my project, it seems I'm doing something wrong: Scala IDE doesn't recognize org.apache.spark.graphx._ or anything related to graphs (!)
Did I miss something when I first created my project? Do I need to include additional libraries (and how)?
You need to add the GraphX dependency to your pom file.
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-graphx_2.10</artifactId>
<version>1.6.1</version>
</dependency>
There is now also a --packages option for the Spark shell, which takes the artifact's Maven coordinates:
spark-shell --packages [maven:coordinates:here]
For the GraphX dependency above, for example: spark-shell --packages org.apache.spark:spark-graphx_2.10:1.6.1
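
As a quick check that the dependency resolved, you can build and query a tiny graph; the vertex and edge data below are made up for illustration:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

object GraphXCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("GraphXCheck").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // two vertices and one edge, just to confirm org.apache.spark.graphx resolves
    val vertices = sc.parallelize(Seq((1L, "alice"), (2L, "bob")))
    val edges = sc.parallelize(Seq(Edge(1L, 2L, "follows")))
    val graph = Graph(vertices, edges)

    println(s"vertices = ${graph.numVertices}, edges = ${graph.numEdges}")
    sc.stop()
  }
}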

Eclipse cannot resolve org.springframework.extensions.*

I need to develop a Java-backed web script for Alfresco, but some libraries are missing from the classpath, so most of the classes I would like to use cannot be resolved. I can build the project, but Eclipse cannot find the classes:
import org.springframework.extensions.webscripts.AbstractWebScript;
import org.springframework.extensions.webscripts.WebScriptException;
import org.springframework.extensions.webscripts.WebScriptRequest;
import org.springframework.extensions.webscripts.WebScriptResponse;
Where can I download the jar to add it to the classpath? I have already searched and googled, but I cannot find a download.
If you are using Maven, add this to your pom.xml:
<dependency>
<groupId>org.springframework.extensions.surf</groupId>
<artifactId>spring-webscripts</artifactId>
<version>1.0.0.M3</version>
</dependency>
<dependency>
<groupId>org.springframework.extensions.surf</groupId>
<artifactId>spring-webscripts-api</artifactId>
<version>1.0.0.M3</version>
</dependency>

java.lang.NoClassDefFoundError: scala/reflect/ClassManifest

I am getting an error when trying to run an example on Spark. Can anybody please let me know what changes I need to make to my pom.xml to run programs with Spark?
Currently Spark only works with Scala 2.9.3. It does not work with later versions of Scala. I saw the error you describe when I tried to run the SparkPi example with SCALA_HOME pointing to a 2.10.2 installation. When I pointed SCALA_HOME at a 2.9.3 installation instead, things worked for me. Details here.
You should add a dependency on scala-reflect to your Maven build:
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-reflect</artifactId>
<version>2.10.2</version>
</dependency>
I ran into the same issue using the Scala-Redis 2.9 client (incompatible with Scala 2.10), and adding a dependency on scala-reflect does not help. Indeed, scala-reflect is packaged as its own jar, but it does not include the missing class, which has been deprecated since Scala 2.10.0 (see this thread).
The correct answer is to point to a Scala installation that includes this class. In my case (using the Scala-Redis client), McNeill's answer helped: I pointed to Scala 2.9.3 using SBT and everything worked as expected.
In my case, the error was raised by Kafka's API. Changing the dependency from
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.9.2</artifactId>
<version>0.8.1.1</version>
</dependency>
to
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka_2.10</artifactId>
<version>1.6.1</version>
</dependency>
fixed the problem.