CollectionAccumulator dependency is not resolving in IntelliJ (Scala)

I am new to Scala and Spark. I am writing a sample program that uses CollectionAccumulator, but IntelliJ cannot resolve the CollectionAccumulator class.
val slist : CollectionAccumulator[String] = new CollectionAccumulator()
sc.register(slist,"Myslist")
That is the code I am using. When I replace CollectionAccumulator[String] with Accumulator[String], Accumulator resolves fine.
I have imported the following:
import org.apache.log4j._
import org.apache.spark.{Accumulator, SparkContext}
import org.apache.spark.util._
Dependencies in pom.xml:
<dependencies>
  <!-- Scala and Spark dependencies -->
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>${scala.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.2.0-cdh5.3.1</version>
  </dependency>
</dependencies>
Please help.

CollectionAccumulator is only available in Spark 2.0 and later; you are on Spark 1.2.0 (a CDH build).
Reference: https://spark.apache.org/docs/2.0.0/api/scala/index.html#org.apache.spark.util.CollectionAccumulator
Replace your Spark dependency with:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.1.0.cloudera1</version>
</dependency>
Also make sure that ${scala.version} resolves to a Scala 2.11.x release, because this artifact is built for Scala 2.11 (the _2.11 suffix).

CollectionAccumulator was introduced in Spark 2.0.0, so simply update your Spark version to 2.0 or later.
Example build.sbt:
name := "smartad-spark-songplaycount"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "2.0.0"
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.2.0"
libraryDependencies += "org.apache.hadoop" % "hadoop-hdfs" % "2.2.0"
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
Example sbt console session with the build.sbt above:
sbt console
scala> import org.apache.spark.util.CollectionAccumulator
import org.apache.spark.util.CollectionAccumulator
scala> val slist : CollectionAccumulator[String] = new CollectionAccumulator()
slist: org.apache.spark.util.CollectionAccumulator[String] = Un-registered Accumulator: CollectionAccumulator
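Once a Spark 2.x spark-core dependency is on the classpath (for example from the build.sbt above), the original code compiles. The sketch below, assuming Spark 2.x with Scala 2.11/2.12 and a local master, shows the accumulator actually collecting values from tasks; the object and method names are hypothetical, only the Spark calls (new CollectionAccumulator, SparkContext.register, add, value) come from the Spark 2.x API.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.util.CollectionAccumulator
import scala.collection.JavaConverters._

object CollectionAccumulatorExample {
  // Hypothetical helper: collects the even numbers in 1..n seen by the tasks.
  def collectEvens(sc: SparkContext, n: Int): List[Int] = {
    // Same style as the question: create, then register with a name.
    // sc.collectionAccumulator[String]("Myslist") does both steps in one call.
    val slist: CollectionAccumulator[String] = new CollectionAccumulator[String]()
    sc.register(slist, "Myslist")

    // foreach is an action, so each task's updates are applied only once.
    sc.parallelize(1 to n).foreach { i =>
      if (i % 2 == 0) slist.add(i.toString)
    }

    // value is a java.util.List[String]; its order across partitions is not
    // guaranteed, so convert and sort numerically before returning.
    slist.value.asScala.map(_.toInt).sorted.toList
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("CollectionAccumulatorExample").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try println(collectEvens(sc, 10)) // List(2, 4, 6, 8, 10)
    finally sc.stop()
  }
}
```

Reading slist.value inside a task is not supported; read it on the driver after the action has finished, as main does here.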

Related

RDD issues with SparkSession

I am new to Spark and Scala and am practicing on my own. Can you please help me resolve this issue?
I get "could not resolve symbol SparkSession" when I import org.apache.spark.sql.SparkSession to practice RDDs and transformations.
It seems you are missing the dependencies: SparkSession lives in the spark-sql module, not in spark-core. If you use Maven, you can add the following to your pom.xml:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.project.lib</groupId>
  <artifactId>PROJECT</artifactId>
  <version>1.0-SNAPSHOT</version>
  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>2.1.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.11</artifactId>
      <version>2.1.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_2.11</artifactId>
      <version>2.1.1</version>
    </dependency>
  </dependencies>
</project>
If you use sbt, you can use the following sample in your build.sbt:
name := "SparkTest"
version := "0.1"
scalaVersion := "2.11.8"
val sparkVersion = "2.3.0"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion,
"org.apache.spark" %% "spark-hive" % sparkVersion
)
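To confirm that the dependencies resolve, a minimal program like the following should compile and run. It is a sketch assuming spark-sql 2.x (which provides SparkSession) is on the classpath and a local master; the object and method names are illustrative only.

```scala
import org.apache.spark.sql.SparkSession

object SparkSessionRddExample {
  // Doubles 1..5 with an RDD map transformation and collects the result to the driver.
  def doubled(spark: SparkSession): Array[Int] =
    spark.sparkContext.parallelize(1 to 5).map(_ * 2).collect()

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark in-process using all available cores; no cluster needed.
    val spark = SparkSession.builder()
      .appName("SparkSessionRddExample")
      .master("local[*]")
      .getOrCreate()
    try println(doubled(spark).mkString(", ")) // 2, 4, 6, 8, 10
    finally spark.stop()
  }
}
```

SparkSession wraps a SparkContext, so the RDD API is reached through spark.sparkContext.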

sbt resolver for Confluent Platform

I am unable to add the Confluent repository to my sbt build. I looked at the pom example and found how the repository is added in Maven:
<repositories>
<repository>
<id>confluent</id>
<url>https://packages.confluent.io/maven/</url>
</repository>
<!-- further repository entries here -->
</repositories>
and dependencies
<dependencies>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.11</artifactId>
<version>2.0.0-cp1</version>
</dependency>
<!-- further dependency entries here -->
</dependencies>
I used the following in build.sbt:
resolvers += Resolver.url("confluent", url("http://packages.confluent.io/maven/"))
and declared the dependencies as:
libraryDependencies += "org.apache.kafka" % "kafka-clients" % "2.0.0-cp1"
libraryDependencies += "org.apache.kafka" %% "kafka" % "2.0.0-cp1"
I still get
::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: UNRESOLVED DEPENDENCIES ::
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: org.apache.kafka#kafka-clients;2.0.0-cp1: not found
[warn] :: org.apache.kafka#kafka_2.12;2.0.0-cp1: not found
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
What is the correct way to do this?
My build.sbt:
name := "kafka-Test"
version := "1.0"
scalaVersion := "2.12.3"
resolvers += Resolver.url("confluent", url("https://packages.confluent.io/maven/"))
libraryDependencies += "org.apache.kafka" % "kafka-clients" % "2.0.0-cp1"
libraryDependencies += "org.apache.kafka" %% "kafka" % "2.0.0-cp1"
The problem is in your resolver definition. It should be:
resolvers += "confluent" at "https://packages.confluent.io/maven/"
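I just tried this and it works. The "name" at "url" form declares a Maven repository (an sbt MavenRepository), which matches how Confluent publishes its artifacts, whereas Resolver.url creates a pattern-based URLRepository, which is mainly intended for Ivy-style repositories.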
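A Maven repository is searched by a fixed path convention: the groupId with dots replaced by slashes, then the artifactId, the version, and finally artifactId-version.jar (with a matching .pom next to it). The sketch below (MavenLayout and its methods are hypothetical helpers, not sbt code) computes that URL, which is handy for checking in a browser whether an artifact really exists in a repository. With scalaVersion 2.12.3, %% turns "kafka" into kafka_2.12, which is the artifactId the warning above reports.

```scala
object MavenLayout {
  // Relative path of an artifact in a Maven-layout repository:
  // group/with/slashes/artifactId/version/artifactId-version.ext
  def artifactPath(groupId: String, artifactId: String, version: String, ext: String = "jar"): String =
    s"${groupId.replace('.', '/')}/$artifactId/$version/$artifactId-$version.$ext"

  // Full URL under a repository root; a trailing slash on the root is optional.
  def artifactUrl(repoRoot: String, groupId: String, artifactId: String, version: String,
                  ext: String = "jar"): String =
    repoRoot.stripSuffix("/") + "/" + artifactPath(groupId, artifactId, version, ext)

  def main(args: Array[String]): Unit = {
    val root = "https://packages.confluent.io/maven/"
    // Jar and POM locations for the kafka-clients dependency from the question.
    println(artifactUrl(root, "org.apache.kafka", "kafka-clients", "2.0.0-cp1"))
    println(artifactUrl(root, "org.apache.kafka", "kafka-clients", "2.0.0-cp1", "pom"))
  }
}
```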

Correct mvn version for dispatch.databinder.net

The last version of the library dispatch.databinder.net is 0.9.5, according to the website.
What is the correct Maven dependency?
<dependency>
<groupId>net.databinder.dispatch</groupId>
<artifactId>core_2.10</artifactId>
<version>0.9.1</version>
</dependency>
or
<dependency>
<groupId>net.databinder.dispatch</groupId>
<artifactId>dispatch-core_2.10</artifactId>
<version>0.9.5</version>
</dependency>
Or something else?
And how do I find this out in general?
Since the website says
libraryDependencies += "net.databinder.dispatch" %% "dispatch-core" % "0.9.5"
The corresponding maven notation for Scala 2.10.x should be:
<dependency>
<groupId>net.databinder.dispatch</groupId>
<artifactId>dispatch-core_2.10</artifactId>
<version>0.9.5</version>
</dependency>
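Starting with sbt 0.12.0, the Scala version suffix was shortened to the binary version, e.g. _2.10 for every Scala 2.10.x release (and likewise for later series), to take advantage of binary compatibility between minor releases within a Scala series. In general, %% appends this suffix to the artifact name, so "dispatch-core" declared with %% becomes the Maven artifactId dispatch-core_2.10.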
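The naming rule can be sketched as a small function. This is a simplified illustration, not sbt's implementation (CrossVersionSketch is a hypothetical name): it only handles final Scala 2.x version strings, keeps the full version for releases before 2.10 as described above, and does not cover milestones, release candidates, or Scala 3.

```scala
object CrossVersionSketch {
  // Binary Scala version used as the artifact suffix (simplified, Scala 2.x finals only).
  def binaryVersion(scalaVersion: String): String = scalaVersion.split('.') match {
    case Array("2", minor, _) if minor.toInt >= 10 => s"2.$minor" // 2.10.4 -> 2.10
    case _                                         => scalaVersion // e.g. 2.9.2 kept whole
  }

  // Maven artifactId that "group" %% "name" % "version" resolves to.
  def crossArtifactId(name: String, scalaVersion: String): String =
    s"${name}_${binaryVersion(scalaVersion)}"
}
```

For example, crossArtifactId("dispatch-core", "2.10.4") gives dispatch-core_2.10, the artifactId used in the Maven dependency above.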

How to define maven test-jar dependency in sbt

I have the following maven dependency
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase</artifactId>
<version>0.90.4</version>
<type>test-jar</type>
<scope>test</scope>
</dependency>
I know how to specify groupId, artifactId, version and scope:
"org.apache.hbase" % "hbase" % "0.90.4" % "test"
but how do I specify the type (test-jar) so that I'd get hbase-0.90.4-tests.jar from the repo?
"org.apache.hbase" % "hbase" % "0.90.4" % "test" classifier "tests"

Setting target JVM in SBT

How can I set target JVM version in SBT?
In Maven (with maven-scala-plugin) it can be done as follows:
<plugin>
...
<configuration>
<scalaVersion>${scala.version}</scalaVersion>
<args>
<arg>-target:jvm-1.5</arg>
</args>
</configuration>
</plugin>
You can specify Java compiler options in the project definition:
javacOptions ++= Seq("-source", "1.8", "-target", "1.8")
javacOptions only affect Java sources. For Scala sources you also have to add (in your build.sbt file):
scalacOptions += "-target:jvm-1.8"
otherwise scalac keeps its default target.
As suggested by others in the comments, the current sbt versions (1.0, 0.13.15) use the following notation for setting the source and target JVM versions:
javacOptions ++= Seq("-source", "1.8", "-target", "1.8")
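To check that the setting took effect, you can inspect the compiled .class files: bytes 6 and 7 hold the class-file major version (52 for Java 8, 49 for Java 5), which javap -v also prints. The sketch below (ClassFileTarget is a hypothetical name) reads that field; run it with the path of a class file from target/ as its only argument.

```scala
import java.nio.file.{Files, Paths}

object ClassFileTarget {
  // Reads the big-endian major version at bytes 6-7 after checking the 0xCAFEBABE magic.
  def majorVersion(bytes: Array[Byte]): Int = {
    require(bytes.length >= 8, "not a class file: too short")
    val magic = ((bytes(0) & 0xff) << 24) | ((bytes(1) & 0xff) << 16) |
      ((bytes(2) & 0xff) << 8) | (bytes(3) & 0xff)
    require(magic == 0xCAFEBABE, "not a class file: bad magic number")
    ((bytes(6) & 0xff) << 8) | (bytes(7) & 0xff)
  }

  // Java release for a major version; valid for Java 5 and later (major >= 49).
  def javaRelease(major: Int): Int = {
    require(major >= 49, s"major version $major predates Java 5")
    major - 44
  }

  def main(args: Array[String]): Unit = {
    val major = majorVersion(Files.readAllBytes(Paths.get(args(0))))
    println(s"major version $major (Java ${javaRelease(major)})")
  }
}
```

With -target 1.8 (javac) or -target:jvm-1.8 (scalac) in effect, the reported major version should be 52.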