Error while running sbt package: object apache is not a member of package org - scala

When I run sbt package on the code below, I get the following errors:
object apache is not a member of package org
not found: value SparkSession
My Spark version: 2.4.4
My Scala version: 2.11.12
My build.sbt:
name := "simpleApp"
version := "1.0"
scalaVersion := "2.11.12"
//libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.4"
libraryDependencies ++= {
  val sparkVersion = "2.4.4"
  Seq("org.apache.spark" %% "spark-core" % sparkVersion)
}
My Scala code:
import org.apache.spark.sql.SparkSession
object demoapp {
  def main(args: Array[String]) {
    val logfile = "C:/SUPPLENTA_INFORMATICS/demo/hello.txt"
    val spark = SparkSession.builder.appName("Simple App in Scala").getOrCreate()
    val logData = spark.read.textFile(logfile).cache()
    val numAs = logData.filter(line => line.contains("Washington")).count()
    println(s"Lines are: $numAs")
    spark.stop()
  }
}

If you want to use Spark SQL, you also have to add the spark-sql module to the dependencies:
// https://mvnrepository.com/artifact/org.apache.spark/spark-sql
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.4"
Also, note that after changing the build definition you have to reload the project in sbt and re-import the changes in IntelliJ.
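For reference, a minimal sketch of the combined dependency block (using the versions from the question) could look like this:
libraryDependencies ++= {
  val sparkVersion = "2.4.4"
  Seq(
    "org.apache.spark" %% "spark-core" % sparkVersion,
    // spark-sql provides org.apache.spark.sql.SparkSession used in demoapp
    "org.apache.spark" %% "spark-sql" % sparkVersion
  )
}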

Related

Scala Flink get java.lang.NoClassDefFoundError: scala/Product$class after using case class for customized DeserializationSchema

It works fine when using a regular (non-case) class, but I get a java.lang.NoClassDefFoundError: scala/Product$class error after changing the class to a case class.
I'm not sure whether it is an sbt packaging problem or a code problem.
I'm using:
sbt
Scala: 2.11.12
Java: 8
sbt assembly to package
package example
import java.util.Properties
import java.nio.charset.StandardCharsets
import org.apache.flink.api.scala._
import org.apache.flink.streaming.util.serialization.{DeserializationSchema, SerializationSchema}
import org.apache.flink.streaming.api.scala.{DataStream, StreamExecutionEnvironment}
import org.apache.flink.streaming.connectors.kafka.{FlinkKafkaConsumer, FlinkKafkaProducer}
import org.apache.flink.streaming.api.watermark.Watermark
import org.apache.flink.streaming.api.functions.AssignerWithPunctuatedWatermarks
import org.apache.flink.api.common.typeinfo.TypeInformation
import Config._
case class Record(
  id: String,
  startTime: Long
)

class RecordDeSerializer extends DeserializationSchema[Record] with SerializationSchema[Record] {
  override def serialize(record: Record): Array[Byte] = {
    "123".getBytes(StandardCharsets.UTF_8)
  }
  override def deserialize(b: Array[Byte]): Record = {
    Record("1", 123)
  }
  override def isEndOfStream(record: Record): Boolean = false
  override def getProducedType: TypeInformation[Record] = {
    createTypeInformation[Record]
  }
}

object RecordConsumer {
  def main(args: Array[String]): Unit = {
    val config: Properties = {
      var p = new Properties()
      p.setProperty("zookeeper.connect", Config.KafkaZookeeperServers)
      p.setProperty("bootstrap.servers", Config.KafkaBootstrapServers)
      p.setProperty("group.id", Config.KafkaGroupID)
      p
    }
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.enableCheckpointing(1000)
    var consumer = new FlinkKafkaConsumer[Record](
      Config.KafkaTopic,
      new RecordDeSerializer(),
      config
    )
    consumer.setStartFromEarliest()
    val stream = env.addSource(consumer).print
    env.execute("record consumer")
  }
}
Error
2020-08-05 04:07:33,963 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Discarding checkpoint 1670 of job 4de8831901fa72790d0a9a973cc17dde.
java.lang.NoClassDefFoundError: scala/Product$class
...
build.sbt
My first idea was that maybe a version is not right, but everything works fine if I use a normal class.
Here is build.sbt:
ThisBuild / resolvers ++= Seq(
  "Apache Development Snapshot Repository" at "https://repository.apache.org/content/repositories/snapshots/",
  Resolver.mavenLocal
)
name := "deedee"
version := "0.1-SNAPSHOT"
organization := "dexterlab"
ThisBuild / scalaVersion := "2.11.8"
val flinkVersion = "1.8.2"
val flinkDependencies = Seq(
  "org.apache.flink" %% "flink-scala" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-streaming-java" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-connector-kafka" % flinkVersion
)
val thirdPartyDependencies = Seq(
  "com.github.nscala-time" %% "nscala-time" % "2.24.0",
  "com.typesafe.play" %% "play-json" % "2.6.14"
)
lazy val root = (project in file(".")).
  settings(
    libraryDependencies ++= flinkDependencies,
    libraryDependencies ++= thirdPartyDependencies,
    libraryDependencies += "org.scala-lang" % "scala-compiler" % scalaVersion.value
  )
assembly / mainClass := Some("dexterlab.TelecoDataConsumer")
// make run command include the provided dependencies
Compile / run := Defaults.runTask(Compile / fullClasspath,
Compile / run / mainClass,
Compile / run / runner
).evaluated
// stays inside the sbt console when we press "ctrl-c" while a Flink programme executes with "run" or "runMain"
Compile / run / fork := true
Global / cancelable := true
// exclude Scala library from assembly
assembly / assemblyOption := (assembly / assemblyOption).value.copy(includeScala = false)
autoCompilerPlugins := true
It finally succeeded after I added this line to build.sbt:
assembly / assemblyOption := (assembly / assemblyOption).value.copy(includeScala = true)
This includes the Scala library in the jar when running sbt assembly.
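A likely reason (my reading, not stated in the original post): scala/Product$class is a trait implementation class that exists in the Scala 2.11 library but was removed in 2.12, so this error usually means code compiled for 2.11 is running without a matching Scala 2.11 library on the classpath; bundling the Scala library into the fat jar, as above, supplies it. If in doubt, a small hypothetical helper like this can confirm which Scala library the job actually runs against (a debugging sketch only, not part of the original code):
object ScalaVersionCheck {
  def main(args: Array[String]): Unit = {
    // scala.util.Properties reports the version of the scala-library jar loaded at runtime
    println(s"Running against Scala ${scala.util.Properties.versionNumberString}")
  }
}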

Understanding build.sbt with sbt-spark-package plugin

I am new to Scala and sbt build files. From the introductory tutorials, adding Spark dependencies to a Scala project should be straightforward via the sbt-spark-package plugin, but I am getting the following error:
[error] (run-main-0) java.lang.NoClassDefFoundError: org/apache/spark/SparkContext
Please provide resources to learn more about what could be driving this error, as I want to understand the process more thoroughly.
CODE:
import org.apache.spark.sql.SparkSession

trait SparkSessionWrapper {
  lazy val spark: SparkSession = {
    SparkSession
      .builder()
      .master("local")
      .appName("spark citation graph")
      .getOrCreate()
  }
  val sc = spark.sparkContext
}

import org.apache.spark.graphx.GraphLoader

object Test extends SparkSessionWrapper {
  def main(args: Array[String]) {
    println("Testing, testing, testing, testing...")
    var filePath = "Desktop/citations.txt"
    val citeGraph = GraphLoader.edgeListFile(sc, filePath)
    println(citeGraph.vertices.take(1))
  }
}
plugins.sbt
resolvers += "bintray-spark-packages" at "https://dl.bintray.com/spark-packages/maven/"
addSbtPlugin("org.spark-packages" % "sbt-spark-package" % "0.2.6")
build.sbt -- WORKING. Why does adding libraryDependencies make this run/work?
spName := "yewno/citation_graph"
version := "0.1"
scalaVersion := "2.11.12"
sparkVersion := "2.2.0"
sparkComponents ++= Seq("core", "sql", "graphx")
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.2.0",
  "org.apache.spark" %% "spark-sql" % "2.2.0",
  "org.apache.spark" %% "spark-graphx" % "2.2.0"
)
build.sbt -- NOT WORKING. Would expect this to compile & run correctly
spName := "yewno/citation_graph"
version := "0.1"
scalaVersion := "2.11.12"
sparkVersion := "2.2.0"
sparkComponents ++= Seq("core", "sql", "graphx")
Bonus for explanation + links to resources to learn more about SBT build process, jar files, and anything else that can help me get up to speed!
The sbt-spark-package plugin provides the Spark dependencies in provided scope:
sparkComponentSet.map { component =>
  "org.apache.spark" %% s"spark-$component" % sparkVersion.value % "provided"
}.toSeq
We can confirm this by running show libraryDependencies from sbt:
[info] * org.scala-lang:scala-library:2.11.12
[info] * org.apache.spark:spark-core:2.2.0:provided
[info] * org.apache.spark:spark-sql:2.2.0:provided
[info] * org.apache.spark:spark-graphx:2.2.0:provided
provided scope means:
The dependency will be part of compilation and test, but excluded from
the runtime.
Thus sbt run throws java.lang.NoClassDefFoundError: org/apache/spark/SparkContext.
If we really want to include the provided dependencies on the run classpath, then, as douglaz suggests:
run in Compile := Defaults.runTask(fullClasspath in Compile, mainClass in (Compile, run), runner in (Compile, run)).evaluated
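On newer sbt versions the same setting is usually written with slash syntax, as in the Flink build.sbt shown earlier (a sketch; adjust to your sbt version):
// Put provided-scope dependencies back on the run classpath so `sbt run` can see them
Compile / run := Defaults.runTask(
  Compile / fullClasspath,
  Compile / run / mainClass,
  Compile / run / runner
).evaluated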

Scala Akka Streams project fails to compile, "value ~> is not a member of akka.streams.Outlet[In]"

I am using the Akka Streams API in a Scala project, working in Intellij IDEA with the SBT plugin. I have a worker pool Flow as described here: https://doc.akka.io/docs/akka/current/scala/stream/stream-cookbook.html. Here is my code:
package streams
import akka.NotUsed
import akka.stream.scaladsl.{Balance, Flow, GraphDSL, Merge}
import akka.stream.{FlowShape, Graph}
object WorkerPoolFlow {
  def apply[In, Out](
      worker: Flow[In, Out, Any],
      workerCount: Int):
      Graph[FlowShape[In, Out], NotUsed] = {
    GraphDSL.create() { implicit b =>
      val balance = b.add(Balance[In](workerCount, waitForAllDownstreams = true))
      val merge = b.add(Merge[Out](workerCount))
      for (i <- 0 until workerCount)
        balance.out(i) ~> worker.async ~> merge.in(i)
      FlowShape(
        balance.in,
        merge.out)
    }
  }
}
For some reason the project is now failing to compile, giving this error: value ~> is not a member of akka.stream.Outlet[In].
It compiled fine until today. The only change I am aware of making is installing a Scala linter plugin scalafmt, and importing a few new libraries in build.sbt. Here is my build.sbt:
name := "myProject"
version := "0.1"
scalaVersion := "2.11.11"
unmanagedJars in Compile += file("localDep1.jar")
unmanagedJars in Compile += file("localDep2.jar")
libraryDependencies += "io.spray" %% "spray-json" % "1.3.3"
libraryDependencies += "com.typesafe.akka" %% "akka-actor" % "2.5.6"
libraryDependencies += "com.typesafe.akka" %% "akka-stream" % "2.5.6"
libraryDependencies += "com.typesafe.akka" %% "akka-testkit" % "2.5.6" % Test
libraryDependencies += "com.typesafe.akka" %% "akka-stream-testkit" % "2.5.6" % Test
libraryDependencies += "com.47deg" %% "fetch" % "0.6.0"
libraryDependencies += "ch.qos.logback" % "logback-classic" % "1.2.3"
I have tried reloading SBT, building from SBT outside of IDEA, removing and re-adding dependencies, and cleaning the project, with no luck.
Import GraphDSL.Implicits._:
object WorkerPoolFlow {
  def apply[In, Out](
      worker: Flow[In, Out, Any],
      workerCount: Int):
      Graph[FlowShape[In, Out], NotUsed] = {
    import GraphDSL.Implicits._
    GraphDSL.create() { implicit b =>
      ...
    }
  }
}
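For completeness, here is a sketch of the worker pool flow from the question with only the import added; nothing else changes:
package streams
import akka.NotUsed
import akka.stream.scaladsl.{Balance, Flow, GraphDSL, Merge}
import akka.stream.{FlowShape, Graph}
object WorkerPoolFlow {
  def apply[In, Out](
      worker: Flow[In, Out, Any],
      workerCount: Int): Graph[FlowShape[In, Out], NotUsed] = {
    // ~> is provided by GraphDSL.Implicits._, so it must be in scope where the graph is wired
    import GraphDSL.Implicits._
    GraphDSL.create() { implicit b =>
      val balance = b.add(Balance[In](workerCount, waitForAllDownstreams = true))
      val merge = b.add(Merge[Out](workerCount))
      for (i <- 0 until workerCount)
        balance.out(i) ~> worker.async ~> merge.in(i)
      FlowShape(balance.in, merge.out)
    }
  }
}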

java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterUtils$

I was building this small demo code for Spark Streaming using Twitter. I have added the required dependencies as shown by http://bahir.apache.org/docs/spark/2.0.0/spark-streaming-twitter/ and I am using sbt to build the jar. The project builds successfully; the only problem seems to be that it is not able to find the TwitterUtils class.
The Scala code is given below.
build.sbt
name := "twitterexample"
version := "1.0"
scalaVersion := "2.11.8"
val sparkVersion = "1.6.1"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-streaming" % sparkVersion,
  "org.apache.bahir" %% "spark-streaming-twitter" % "2.1.0",
  "org.twitter4j" % "twitter4j-core" % "4.0.4",
  "org.twitter4j" % "twitter4j-stream" % "4.0.4"
)
The main Scala file is
TwitterCount.scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming._
import org.apache.spark.streaming.twitter._
import twitter4j.Status

object TwitterCount {
  def main(args: Array[String]): Unit = {
    val consumerKey = "abc"
    val consumerSecret = "abc"
    val accessToken = "abc"
    val accessTokenSecret = "abc"
    val lang = "english"
    System.setProperty("twitter4j.oauth.consumerKey", consumerKey)
    System.setProperty("twitter4j.oauth.consumerSecret", consumerSecret)
    System.setProperty("twitter4j.oauth.accessToken", accessToken)
    System.setProperty("twitter4j.oauth.accessTokenSecret", accessTokenSecret)
    val conf = new SparkConf().setAppName("TwitterHashTags")
    val ssc = new StreamingContext(conf, Seconds(5))
    val tweets = TwitterUtils.createStream(ssc, None)
    val tweetsFilteredByLang = tweets.filter { tweet => tweet.getLang() == lang }
    val statuses = tweetsFilteredByLang.map { tweet => tweet.getText() }
    val words = statuses.map { status => status.split("""\s+""") }
    val hashTags = words.filter { word => word.startsWith("#StarWarsDay") }
    val hashcounts = hashTags.count()
    hashcounts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
Then I am building the project using
sbt package
and I am submitting the generated jar using
spark-submit --class "TwitterCount" --master local[*] target/scala-2.11/twitterexample_2.11-1.0.jar
Please help me with this.
Thanks
--class: The entry point for your application (e.g. org.apache.spark.examples.SparkPi)
You are missing the package name in your code. Your spark-submit command should be like this:
--class com.spark.examples.TwitterCount
I found the solution at last:
java.lang.NoClassDefFoundError: org/apache/spark/streaming/twitter/TwitterUtils$ while running TwitterPopularTags
I have to build the jar using
sbt assembly
But I'm still wondering: what is different about the jar that I make using
sbt package
Does anyone know? Please share.
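For what it's worth: sbt package builds a thin jar containing only your project's own classes, while sbt assembly (from the sbt-assembly plugin) builds a fat jar that also bundles the library dependencies, such as spark-streaming-twitter, which is why TwitterUtils is found at runtime only with the assembly jar. A minimal sketch of the plugin setup (the version shown is an assumption; pick the release matching your sbt):
// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")
Then pass the assembly jar from target/scala-2.11/ to spark-submit instead of the package jar.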

Compilation errors with spark cassandra connector and SBT

I'm trying to get the DataStax spark cassandra connector working. I've created a new SBT project in IntelliJ and added a single class. The class and my sbt file are given below. Creating Spark contexts seems to work; however, the moment I uncomment the line where I try to create a cassandraTable, I get the following compilation error:
Error:scalac: bad symbolic reference. A signature in CassandraRow.class refers to term catalyst
in package org.apache.spark.sql which is not available.
It may be completely missing from the current classpath, or the version on
the classpath might be incompatible with the version used when compiling CassandraRow.class.
Sbt is kind of new to me, and I would appreciate any help in understanding what this error means (and of course, how to resolve it).
name := "cassySpark1"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.1.0"
libraryDependencies += "com.datastax.spark" % "spark-cassandra-connector" % "1.1.0" withSources() withJavadoc()
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector-java" % "1.1.0-alpha2" withSources() withJavadoc()
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
And my class:
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._
object HelloWorld {
  def main(args: Array[String]): Unit = {
    System.setProperty("spark.cassandra.query.retry.count", "1")
    val conf = new SparkConf(true)
      .set("spark.cassandra.connection.host", "cassandra-hostname")
      .set("spark.cassandra.username", "cassandra")
      .set("spark.cassandra.password", "cassandra")
    val sc = new SparkContext("local", "testingCassy", conf)
    //val foo = sc.cassandraTable("keyspace name", "table name")
    val rdd = sc.parallelize(1 to 100)
    val sum = rdd.reduce(_ + _)
    println(sum)
  }
}
You need to add spark-sql to the dependencies list:
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.1.0"
Add the library dependency in your project's pom.xml file. It seems they have changed the location of the Vector.class dependency in the new refactoring.