sbt resolver for confluent platform - scala

I am unable to add the Confluent repository to my sbt build. I looked at a POM example and found how the repository is declared in Maven:
<repositories>
<repository>
<id>confluent</id>
<url>https://packages.confluent.io/maven/</url>
</repository>
<!-- further repository entries here -->
</repositories>
and the dependencies:
<dependencies>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.11</artifactId>
<version>2.0.0-cp1</version>
</dependency>
<!-- further dependency entries here -->
</dependencies>
I used the following in build.sbt:
resolvers += Resolver.url("confluent", url("http://packages.confluent.io/maven/"))
and declared the dependencies as
libraryDependencies += "org.apache.kafka" % "kafka-clients" % "2.0.0-cp1"
libraryDependencies += "org.apache.kafka" %% "kafka" % "2.0.0-cp1"
I still get
::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: UNRESOLVED DEPENDENCIES ::
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: org.apache.kafka#kafka-clients;2.0.0-cp1: not found
[warn] :: org.apache.kafka#kafka_2.12;2.0.0-cp1: not found
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
What is the correct way to do this?
My build.sbt:
name := "kafka-Test"
version := "1.0"
scalaVersion := "2.12.3"
resolvers += Resolver.url("confluent", url("https://packages.confluent.io/maven/"))
libraryDependencies += "org.apache.kafka" % "kafka-clients" % "2.0.0-cp1"
libraryDependencies += "org.apache.kafka" %% "kafka" % "2.0.0-cp1"

The problem is in your resolver definition. It should be:
resolvers += "confluent" at "https://packages.confluent.io/maven/"
I just tried this and it works.
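For reference, a complete build.sbt with the corrected resolver might look like the sketch below. It simply reuses the project settings and versions from the question; nothing here has been checked against other Confluent releases.

```scala
name := "kafka-Test"
version := "1.0"
scalaVersion := "2.12.3"

// "name" at "url" declares a Maven-layout repository,
// matching the <repository> entry from the POM example
resolvers += "confluent" at "https://packages.confluent.io/maven/"

// kafka-clients is a plain Java artifact, so it uses a single %
libraryDependencies += "org.apache.kafka" % "kafka-clients" % "2.0.0-cp1"
// %% appends the Scala binary version, so this resolves to kafka_2.12
libraryDependencies += "org.apache.kafka" %% "kafka" % "2.0.0-cp1"
```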

Related

RDD issues with SparkSession

I am new to Spark and Scala and am practicing on my own. Can you please help me resolve the following issue:
could not resolve symbol SparkSession
It appears when I import org.apache.spark.sql.SparkSession in Scala to practice RDDs and transformations.
It seems you are missing the Spark dependencies. If you use Maven, you can add the following to your pom.xml:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.project.lib</groupId>
<artifactId>PROJECT</artifactId>
<version>1.0-SNAPSHOT</version>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.1.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.1.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.11</artifactId>
<version>2.1.1</version>
</dependency>
</dependencies>
</project>
If you use sbt instead, you can use the following sample in your build.sbt:
name := "SparkTest"
version := "0.1"
scalaVersion := "2.11.8"
val sparkVersion = "2.3.0"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion,
"org.apache.spark" %% "spark-hive" % sparkVersion
)
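Once the spark-sql dependency is on the classpath, the import should resolve. A minimal sketch to check this is shown below; the object name, the app name, and the local[*] master are assumptions chosen for a practice run on a single machine.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this illustration
object SparkSessionCheck {
  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM; an assumption for local practice
    val spark = SparkSession.builder()
      .appName("SparkSessionCheck")
      .master("local[*]")
      .getOrCreate()

    // The SparkContext behind the session is the entry point for RDDs
    val numbers = spark.sparkContext.parallelize(1 to 5)
    val doubled = numbers.map(_ * 2)
    println(doubled.collect().mkString(", "))

    spark.stop()
  }
}
```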

CollectionAccumulator dependency is not resolving in IntelliJ

I am new to Scala and Spark. I am writing a sample program that uses CollectionAccumulator, but the dependency for CollectionAccumulator does not resolve in IntelliJ.
val slist : CollectionAccumulator[String] = new CollectionAccumulator()
sc.register(slist,"Myslist")
The code above is the piece I am using. I tried replacing CollectionAccumulator[String] with Accumulator[String], and Accumulator resolves correctly.
I have imported the following:
import org.apache.log4j._
import org.apache.spark.{Accumulator, SparkContext}
import org.apache.spark.util._
Dependencies in pom.xml:
<dependencies>
<!-- Scala and Spark dependencies -->
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.2.0-cdh5.3.1</version>
</dependency>
Please help..
CollectionAccumulator is supported only in Spark 2.0 and later. You are on the Spark 1.2.0 CDH version.
Reference: https://spark.apache.org/docs/2.0.0/api/scala/index.html#org.apache.spark.util.CollectionAccumulator
Replace your spark dependency with
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.1.0.cloudera1</version>
</dependency>
Also make sure that "${scala.version}" resolves to scala 2.11
CollectionAccumulator was introduced in Spark 2.0.0, so simply update your Spark version to 2.0 or later.
Example build.sbt:
name := "smartad-spark-songplaycount"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "2.0.0"
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.2.0"
libraryDependencies += "org.apache.hadoop" % "hadoop-hdfs" % "2.2.0"
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
Example sbt console session using the above build.sbt:
sbt console
scala> import org.apache.spark.util.CollectionAccumulator
import org.apache.spark.util.CollectionAccumulator
scala> val slist : CollectionAccumulator[String] = new CollectionAccumulator()
slist: org.apache.spark.util.CollectionAccumulator[String] = Un-registered Accumulator: CollectionAccumulator
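With Spark 2.0 or later on the classpath, the accumulator can also be created and registered in one step through SparkContext.collectionAccumulator. The sketch below assumes a local master and uses a hypothetical object name and sample data; only the accumulator name "Myslist" comes from the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.util.CollectionAccumulator

// Hypothetical object name, used only for this illustration
object AccumulatorSketch {
  def main(args: Array[String]): Unit = {
    // local[*] is an assumption for running on a single machine
    val conf = new SparkConf().setAppName("AccumulatorSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Creates the accumulator and registers it under the given name (Spark 2.0+)
    val slist: CollectionAccumulator[String] = sc.collectionAccumulator[String]("Myslist")

    // Each task adds its elements; the driver sees the merged result
    sc.parallelize(Seq("a", "b", "c")).foreach(s => slist.add(s))

    // value is a java.util.List[String]; element order is not guaranteed
    println(slist.value)

    sc.stop()
  }
}
```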

NoClassDefFoundError while using scopt OptionParser with Spark

I am using Apache Spark version 1.2.1 and Scala version 2.10.4. I am trying to get the example MovieLensALS working. However, I am running into errors with scopt library which is a requirement in the code. Any help would be appreciated.
My build.sbt is as follows:
name := "Movie Recommender System"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.1"
libraryDependencies += "org.apache.spark" %% "spark-graphx" % "1.2.1"
libraryDependencies += "org.apache.spark" % "spark-mllib_2.10" % "1.2.1"
libraryDependencies += "com.github.scopt" %% "scopt" % "3.2.0"
resolvers += Resolver.sonatypeRepo("public")
and the errors I am getting are the following:
Exception in thread "main" java.lang.NoClassDefFoundError: scopt/OptionParser
at MovieLensALS.main(MovieLensALS.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: scopt.OptionParser
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
... 8 more
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
On running sbt assembly to build the jar, I receive the following errors:
[error] Not a valid command: assembly
[error] Not a valid project ID: assembly
[error] Expected ':' (if selecting a configuration)
[error] Not a valid key: assembly
[error] assembly
[error] ^
Edit: As per Justin Piphony's suggestion, the solution listed on sbt's GitHub page fixed this error: create a file assembly.sbt in the project/ directory and add the line
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")
Note that the plugin version should be chosen to match the sbt version in use.
You need to package scopt in your jar. sbt doesn't do this by default. To create this fat jar, you need to use sbt-assembly.
If you are using Maven to package your Spark project, you need to add the maven-assembly-plugin, which packages the dependencies into the jar:
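With the sbt-assembly plugin enabled in project/assembly.sbt, a build.sbt along the following lines is one possible setup. Marking the Spark modules as "provided" is an assumption that is not in the original question: it relies on spark-submit supplying Spark at runtime, so only scopt ends up bundled in the fat jar.

```scala
name := "Movie Recommender System"
version := "1.0"
scalaVersion := "2.10.4"

// Assumption: spark-submit puts Spark on the classpath at runtime,
// so these modules are excluded from the assembled jar via "provided"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.1" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-graphx" % "1.2.1" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "1.2.1" % "provided"

// scopt is not shipped with Spark, so it must be bundled into the fat jar
libraryDependencies += "com.github.scopt" %% "scopt" % "3.2.0"

resolvers += Resolver.sonatypeRepo("public")
```

Running sbt assembly should then produce a single jar under target/ that can be passed to spark-submit.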
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.5</version>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<!-- this is used for inheritance merges -->
<phase>package</phase>
<!-- bind to the packaging phase -->
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>

How to define maven test-jar dependency in sbt

I have the following maven dependency
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase</artifactId>
<version>0.90.4</version>
<type>test-jar</type>
<scope>test</scope>
</dependency>
I know how to specify the groupId, artifactId, version and scope:
"org.apache.hbase" % "hbase" % "0.90.4" % "test"
but how do I specify the type (test-jar) so that I'd get hbase-0.90.4-tests.jar from the repo?
"org.apache.hbase" % "hbase" % "0.90.4" % "test" classifier "tests"
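In a build.sbt this would typically sit inside libraryDependencies. The sketch below assumes the regular hbase jar is also wanted alongside the tests jar, which the question does not state.

```scala
libraryDependencies ++= Seq(
  // Assumption: the main hbase artifact is needed as well
  "org.apache.hbase" % "hbase" % "0.90.4",
  // classifier "tests" selects hbase-0.90.4-tests.jar, scoped to test only
  "org.apache.hbase" % "hbase" % "0.90.4" % "test" classifier "tests"
)
```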

Setting target JVM in SBT

How can I set target JVM version in SBT?
In Maven (with maven-scala-plugin) it can be done as follows:
<plugin>
...
<configuration>
<scalaVersion>${scala.version}</scalaVersion>
<args>
<arg>-target:jvm-1.5</arg>
</args>
</configuration>
</plugin>
You can specify compiler options in the project definition:
javacOptions ++= Seq("-source", "1.8", "-target", "1.8")
You also have to add the following to your build.sbt file:
scalacOptions += "-target:jvm-1.8"
Otherwise it won't work.
As suggested by others in the comments, current sbt versions (1.0, 0.13.15) use the following notation for setting the source and target JVMs:
javacOptions ++= Seq("-source", "1.8", "-target", "1.8")
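For a project that mixes Java and Scala sources, the two settings from the answers above can be combined in one build.sbt. This is a sketch that assumes a Scala compiler version that still accepts the -target:jvm-1.8 flag.

```scala
// Java sources: accept Java 8 syntax and emit Java 8 bytecode
javacOptions ++= Seq("-source", "1.8", "-target", "1.8")

// Scala sources: emit bytecode targeting the Java 8 JVM
scalacOptions += "-target:jvm-1.8"
```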