I am new to Spark and Scala and am practicing on my own. Can you please help me resolve this issue:
could not resolve symbol SparkSession in scala
It occurs when I import org.apache.spark.sql.SparkSession in Scala to practice RDDs and transformations.
It seems you are missing the dependencies. If you use Maven, you can add the following to your pom.xml:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.project.lib</groupId>
<artifactId>PROJECT</artifactId>
<version>1.0-SNAPSHOT</version>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.1.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.1.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.11</artifactId>
<version>2.1.1</version>
</dependency>
</dependencies>
</project>
If you use sbt instead, you can use the sample below in your build.sbt:
name := "SparkTest"
version := "0.1"
scalaVersion := "2.11.8"
val sparkVersion = "2.3.0"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion,
"org.apache.spark" %% "spark-hive" % sparkVersion
)
Related
I have a .proto file which imports google/protobuf/wrappers.proto.
When I run ScalaPBC to generate the relevant Scala code from it, I get an "Import google/protobuf/wrappers.proto not found" error.
As a workaround, I have placed the wrappers.proto file on the file system inside --proto_path.
But I need a proper fix where I add the relevant dependencies to build.sbt / pom.xml so that the jar containing the default proto files (such as wrappers.proto) is unpacked before ScalaPBC is called.
All the required dependencies are provided by the scalapb runtime:
import sbtprotoc.ProtocPlugin.ProtobufConfig
import scalapb.compiler.Version.scalapbVersion
libraryDependencies ++= Seq(
"com.thesamet.scalapb" %% "scalapb-runtime" % scalapbVersion,
"com.thesamet.scalapb" %% "scalapb-runtime" % scalapbVersion % ProtobufConfig
)
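For context, the two imports above only resolve when sbt-protoc and the ScalaPB compiler plugin are on the build's classpath. Below is a minimal sketch of that setup, shown as one listing with the target file named in comments. The version numbers are assumptions for illustration and should be replaced with the ones your build uses, and the slash syntax assumes a reasonably recent sbt 1.x. Declaring scalapb-runtime in ProtobufConfig is what asks sbt-protoc to extract the .proto files shipped with that dependency onto the import path, which, per the answer above, is what makes imports such as google/protobuf/wrappers.proto resolvable.

```scala
// project/plugins.sbt
// (versions are assumptions; pick the ones that match your build)
addSbtPlugin("com.thesamet" % "sbt-protoc" % "1.0.6")
libraryDependencies += "com.thesamet.scalapb" %% "compilerplugin" % "0.11.11"

// build.sbt
import sbtprotoc.ProtocPlugin.ProtobufConfig
import scalapb.compiler.Version.scalapbVersion

// Generate Scala sources from the .proto files under src/main/protobuf
Compile / PB.targets := Seq(
  scalapb.gen() -> (Compile / sourceManaged).value / "scalapb"
)

libraryDependencies ++= Seq(
  // runtime classes needed by the generated code
  "com.thesamet.scalapb" %% "scalapb-runtime" % scalapbVersion,
  // same artifact in the protobuf configuration, so its .proto files are extracted
  "com.thesamet.scalapb" %% "scalapb-runtime" % scalapbVersion % ProtobufConfig
)
```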
I use the AkkaGrpcPlugin for sbt, which seems to handle all the dependencies.
In plugins.sbt I have
addSbtPlugin("com.lightbend.akka.grpc" % "sbt-akka-grpc" % "1.1.1")
In build.sbt I have
enablePlugins(AkkaGrpcPlugin)
It automatically picks up the files in src/main/protobuf for the project and generates the appropriate stub files. I can import standard files, e.g.
import "google/protobuf/timestamp.proto";
For multi-project builds I use something like this:
lazy val allProjects = (project in file("."))
.aggregate(util, grpc)
lazy val grpc =
project
.in(file("grpc"))
.settings(
???
)
.enablePlugins(AkkaGrpcPlugin)
lazy val util =
project
.in(file("util"))
.settings(
???
)
.dependsOn(grpc)
Thanks everyone for your answers. I really appreciate it.
I was able to solve the dependency issue by adding the following to my pom.xml:
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<executions>
<execution>
<id>unpack</id>
<phase>prepare-package</phase>
<goals>
<goal>unpack</goal>
</goals>
<configuration>
<artifactItems>
<artifactItem>
<groupId>com.google.protobuf</groupId>
<artifactId>protobuf-java</artifactId>
<version>3.10.0</version>
<type>jar</type>
<includes>path/to/Files.whatsoever</includes>
<outputDirectory>${project.build.directory}/foldername</outputDirectory>
</artifactItem>
</artifactItems>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
This unpacks the required proto files into the target folder.
I am unable to add the Confluent repository to my sbt build. I looked at a pom example and found how the repository is added in Maven:
<repositories>
<repository>
<id>confluent</id>
<url>https://packages.confluent.io/maven/</url>
</repository>
<!-- further repository entries here -->
</repositories>
and dependencies
<dependencies>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.11</artifactId>
<version>2.0.0-cp1</version>
</dependency>
<!-- further dependency entries here -->
</dependencies>
I used the following in build.sbt
resolvers += Resolver.url("confluent", url("http://packages.confluent.io/maven/"))
and declared dependencies as
libraryDependencies += "org.apache.kafka" % "kafka-clients" % "2.0.0-cp1"
libraryDependencies += "org.apache.kafka" %% "kafka" % "2.0.0-cp1"
I still get
::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: UNRESOLVED DEPENDENCIES ::
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: org.apache.kafka#kafka-clients;2.0.0-cp1: not found
[warn] :: org.apache.kafka#kafka_2.12;2.0.0-cp1: not found
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
What is the correct way of doing it?
My build.sbt
name := "kafka-Test"
version := "1.0"
scalaVersion := "2.12.3"
resolvers += Resolver.url("confluent", url("https://packages.confluent.io/maven/"))
libraryDependencies += "org.apache.kafka" % "kafka-clients" % "2.0.0-cp1"
libraryDependencies += "org.apache.kafka" %% "kafka" % "2.0.0-cp1"
The problem is in your resolver definition. It should be:
resolvers += "confluent" at "https://packages.confluent.io/maven/"
I just tried this and it works.
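A likely reason the original definition fails, as far as I know: Resolver.url builds a URL resolver that defaults to an Ivy-style directory layout, whereas the `"name" at "url"` form declares a Maven-layout repository, which is how packages.confluent.io is organised. As a sketch, the build.sbt from the question with only the resolver line changed would look like this (names and versions are copied from the question, nothing else is assumed):

```scala
// build.sbt
name := "kafka-Test"
version := "1.0"
scalaVersion := "2.12.3"

// `at` declares a Maven-layout repository
resolvers += "confluent" at "https://packages.confluent.io/maven/"

libraryDependencies ++= Seq(
  // plain Java artifact, so a single % and no Scala suffix
  "org.apache.kafka" %  "kafka-clients" % "2.0.0-cp1",
  // %% appends the Scala binary version, giving kafka_2.12 here
  "org.apache.kafka" %% "kafka"         % "2.0.0-cp1"
)
```

After changing the resolver, run reload and then update in the sbt shell so the dependencies are resolved again.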
I am new to Scala and Spark. I am writing a sample program using CollectionAccumulator, but the CollectionAccumulator dependency does not resolve in IntelliJ.
val slist : CollectionAccumulator[String] = new CollectionAccumulator()
sc.register(slist,"Myslist")
This is the piece of code I used. I also tried replacing CollectionAccumulator[String] with Accumulator[String], and Accumulator does resolve.
I have imported the following:
import org.apache.log4j._
import org.apache.spark.{Accumulator, SparkContext}
import org.apache.spark.util._
Dependencies in pom.xml:
<dependencies>
<!-- Scala and Spark dependencies -->
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.2.0-cdh5.3.1</version>
</dependency>
</dependencies>
Please help.
CollectionAccumulator is supported in Spark 2.0 and later. You are on the Spark 1.2.0 CDH version.
Reference: https://spark.apache.org/docs/2.0.0/api/scala/index.html#org.apache.spark.util.CollectionAccumulator
Replace your spark dependency with
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.1.0.cloudera1</version>
</dependency>
Also make sure that "${scala.version}" resolves to Scala 2.11.
CollectionAccumulator is only available from Spark 2.0.0 onward, so simply update your Spark version to 2.0+.
Example build.sbt:
name := "smartad-spark-songplaycount"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "2.0.0"
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.2.0"
libraryDependencies += "org.apache.hadoop" % "hadoop-hdfs" % "2.2.0"
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
Example sbt console session with the above build.sbt:
sbt console
scala> import org.apache.spark.util.CollectionAccumulator
import org.apache.spark.util.CollectionAccumulator
scala> val slist : CollectionAccumulator[String] = new CollectionAccumulator()
slist: org.apache.spark.util.CollectionAccumulator[String] = Un-registered Accumulator: CollectionAccumulator
The latest version of the library at dispatch.databinder.net is 0.9.5, according to the website.
What is the correct Maven dependency?
<dependency>
<groupId>net.databinder.dispatch</groupId>
<artifactId>core_2.10</artifactId>
<version>0.9.1</version>
</dependency>
or
<dependency>
<groupId>net.databinder.dispatch</groupId>
<artifactId>dispatch-core_2.10</artifactId>
<version>0.9.5</version>
</dependency>
or something else?
And how can I find this out in general?
Since the website says
libraryDependencies += "net.databinder.dispatch" %% "dispatch-core" % "0.9.5"
the corresponding Maven notation for Scala 2.10.x should be:
<dependency>
<groupId>net.databinder.dispatch</groupId>
<artifactId>dispatch-core_2.10</artifactId>
<version>0.9.5</version>
</dependency>
Starting with sbt 0.12.0, the Scala version suffix is shortened to _2.10 for Scala 2.10.x and above, to take advantage of binary compatibility between minor releases of Scala.
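As for the general rule: `%%` in sbt appends the Scala binary version to the artifact name, so the Maven artifactId is the sbt name plus that suffix. A small sketch of the equivalence, assuming a project on Scala 2.10.x (the scalaVersion value is an assumption, not something stated on the website):

```scala
// build.sbt
scalaVersion := "2.10.4" // assumed for illustration

// With %%, sbt derives the suffix from scalaVersion ...
libraryDependencies += "net.databinder.dispatch" %% "dispatch-core" % "0.9.5"

// ... which for Scala 2.10.x is the same as writing the suffix by hand
// (left commented out so the dependency is not declared twice):
// libraryDependencies += "net.databinder.dispatch" % "dispatch-core_2.10" % "0.9.5"
```

So to translate any sbt `%%` dependency to Maven, keep groupId and version as they are and append `_` plus your Scala binary version to the artifactId.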
I have the following Maven dependency:
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase</artifactId>
<version>0.90.4</version>
<type>test-jar</type>
<scope>test</scope>
</dependency>
I know how to specify groupId, artifactId, version and scope:
"org.apache.hbase" % "hbase" % "0.90.4" % "test"
But how do I specify the type (test-jar) so that I get hbase-0.90.4-tests.jar from the repo?
"org.apache.hbase" % "hbase" % "0.90.4" % "test" classifier "tests"
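To spell out the line above: an artifact of Maven type test-jar is published under the tests classifier, which is why adding the classifier selects hbase-0.90.4-tests.jar. A sketch of how this might sit in build.sbt, using only the coordinates from the question:

```scala
// build.sbt
// "test" is the configuration (Maven scope); classifier "tests" picks the -tests.jar
libraryDependencies +=
  "org.apache.hbase" % "hbase" % "0.90.4" % "test" classifier "tests"
```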