I'm trying out the Word Count problem with Kafka Streams. I am using Kafka 1.1.0 with Scala 2.11.12 and sbt 1.1.4. I am getting the following error:
Exception in thread "wordcount-application-d81ee069-9307-46f1-8e71-c9f777d2db64-StreamThread-1" java.lang.UnsatisfiedLinkError: C:\Users\user\AppData\Local\Temp\librocksdbjni5439068356048679315.dll: À¦¥Y
at java.lang.ClassLoader$NativeLibrary.load(Native Method)
at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1941)
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1824)
at java.lang.Runtime.load0(Runtime.java:809)
at java.lang.System.load(System.java:1086)
at org.rocksdb.NativeLibraryLoader.loadLibraryFromJar(NativeLibraryLoader.java:78)
at org.rocksdb.NativeLibraryLoader.loadLibrary(NativeLibraryLoader.java:56)
at org.rocksdb.RocksDB.loadLibrary(RocksDB.java:64)
at org.rocksdb.RocksDB.<clinit>(RocksDB.java:35)
at org.rocksdb.Options.<clinit>(Options.java:25)
at org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:116)
at org.apache.kafka.streams.state.internals.RocksDBStore.init(RocksDBStore.java:167)
at org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStore.init(ChangeLoggingKeyValueBytesStore.java:40)
at org.apache.kafka.streams.state.internals.CachingKeyValueStore.init(CachingKeyValueStore.java:63)
at org.apache.kafka.streams.state.internals.InnerMeteredKeyValueStore.init(InnerMeteredKeyValueStore.java:160)
at org.apache.kafka.streams.state.internals.MeteredKeyValueBytesStore.init(MeteredKeyValueBytesStore.java:102)
at org.apache.kafka.streams.processor.internals.AbstractTask.registerStateStores(AbstractTask.java:225)
at org.apache.kafka.streams.processor.internals.StreamTask.initializeStateStores(StreamTask.java:162)
at org.apache.kafka.streams.processor.internals.AssignedTasks.initializeNewTasks(AssignedTasks.java:88)
at org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:316)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:789)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:750)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:720)
I already tried the solution given here: UnsatisfiedLinkError on Lib rocks DB dll when developing with Kafka Streams.
Here is the code that I am trying out in Scala.
import java.lang
import java.util.Properties
import java.util.concurrent.TimeUnit

import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.common.utils.Bytes
import org.apache.kafka.streams.kstream._
import org.apache.kafka.streams.state.KeyValueStore
import org.apache.kafka.streams.{KafkaStreams, StreamsBuilder, StreamsConfig}

import scala.collection.JavaConverters._

object WordCountApplication {
  def main(args: Array[String]): Unit = {
    val config: Properties = {
      val p = new Properties()
      p.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-application")
      p.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
      p.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass)
      p.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass)
      p
    }
    val builder: StreamsBuilder = new StreamsBuilder()
    val textLines: KStream[String, String] = builder.stream("streams-plaintext-input")
    val afterFlatMap: KStream[String, String] = textLines.flatMapValues(new ValueMapper[String, java.lang.Iterable[String]] {
      override def apply(value: String): lang.Iterable[String] = value.split("\\W+").toIterable.asJava
    })
    val afterGroupBy: KGroupedStream[String, String] = afterFlatMap.groupBy(new KeyValueMapper[String, String, String] {
      override def apply(key: String, value: String): String = value
    })
    val wordCounts: KTable[String, Long] = afterGroupBy
      .count(Materialized.as("counts-store").asInstanceOf[Materialized[String, Long, KeyValueStore[Bytes, Array[Byte]]]])
    wordCounts.toStream().to("streams-wordcount-output", Produced.`with`(Serdes.String(), Serdes.Long()))
    val streams: KafkaStreams = new KafkaStreams(builder.build(), config)
    streams.start()
    Runtime.getRuntime.addShutdownHook(new Thread(new Runnable {
      override def run(): Unit = streams.close(10, TimeUnit.SECONDS)
    }))
  }
}
build.sbt
name := "KafkaStreamDemo"
version := "0.1"
scalaVersion := "2.11.12"
libraryDependencies ++= Seq(
  "org.apache.kafka" %% "kafka" % "1.1.0",
  "org.apache.kafka" % "kafka-clients" % "1.1.0",
  "org.apache.kafka" % "kafka-streams" % "1.1.0",
  "ch.qos.logback" % "logback-classic" % "1.2.3"
)
If anyone has faced this problem, kindly help.
Finally, I found an answer that works. I followed Unable to load rocksdbjni.
I did two things that worked for me:
1) I installed Visual C++ Redistributable for Visual Studio 2015.
2) Earlier, I was using RocksDB 5.7.3 with kafka-streams 1.1.0 (rocksdb 5.7.3 comes by default with kafka-streams 1.1.0). I excluded the rocksdb dependency from the kafka-streams dependency and added rocksdbjni 5.3.6 instead. For reference, below is my build.sbt right now.
name := "KafkaStreamDemo"
version := "0.1"
scalaVersion := "2.12.5"
libraryDependencies ++= Seq(
  "org.apache.kafka" %% "kafka" % "1.1.0",
  "org.apache.kafka" % "kafka-clients" % "1.1.0",
  "org.apache.kafka" % "kafka-streams" % "1.1.0" exclude("org.rocksdb", "rocksdbjni"),
  "ch.qos.logback" % "logback-classic" % "1.2.3",
  "org.rocksdb" % "rocksdbjni" % "5.3.6"
)
Hope it helps someone. Thanks.
In my case I was using kafka-streams 1.0.2. Changing the base Docker image from alpine-jdk8:latest to openjdk:8-jre worked, presumably because the RocksDB native library expects glibc, which Alpine-based images do not ship.
This link helped me arrive at the solution: https://github.com/docker-flink/docker-flink/pull/22
Related
It works fine when using a generic class.
But I get a java.lang.NoClassDefFoundError: scala/Product$class error after changing the class to a case class.
I'm not sure whether it is an sbt packaging problem or a code problem.
What I'm using:
sbt
Scala 2.11.12
Java 8
sbt assembly to package
package example

import java.util.Properties
import java.nio.charset.StandardCharsets

import org.apache.flink.api.scala._
import org.apache.flink.streaming.util.serialization.{DeserializationSchema, SerializationSchema}
import org.apache.flink.streaming.api.scala.{DataStream, StreamExecutionEnvironment}
import org.apache.flink.streaming.connectors.kafka.{FlinkKafkaConsumer, FlinkKafkaProducer}
import org.apache.flink.streaming.api.watermark.Watermark
import org.apache.flink.streaming.api.functions.AssignerWithPunctuatedWatermarks
import org.apache.flink.api.common.typeinfo.TypeInformation

import Config._

case class Record(
  id: String,
  startTime: Long
)

class RecordDeSerializer extends DeserializationSchema[Record] with SerializationSchema[Record] {
  override def serialize(record: Record): Array[Byte] = {
    "123".getBytes(StandardCharsets.UTF_8)
  }

  override def deserialize(b: Array[Byte]): Record = {
    Record("1", 123)
  }

  override def isEndOfStream(record: Record): Boolean = false

  override def getProducedType: TypeInformation[Record] = {
    createTypeInformation[Record]
  }
}

object RecordConsumer {
  def main(args: Array[String]): Unit = {
    val config: Properties = {
      val p = new Properties()
      p.setProperty("zookeeper.connect", Config.KafkaZookeeperServers)
      p.setProperty("bootstrap.servers", Config.KafkaBootstrapServers)
      p.setProperty("group.id", Config.KafkaGroupID)
      p
    }

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.enableCheckpointing(1000)

    val consumer = new FlinkKafkaConsumer[Record](
      Config.KafkaTopic,
      new RecordDeSerializer(),
      config
    )
    consumer.setStartFromEarliest()

    val stream = env.addSource(consumer).print
    env.execute("record consumer")
  }
}
Error
2020-08-05 04:07:33,963 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Discarding checkpoint 1670 of job 4de8831901fa72790d0a9a973cc17dde.
java.lang.NoClassDefFoundError: scala/Product$class
...
build.sbt
My first idea was that maybe a version was not right, but everything works fine if I use a normal class.
Here is my build.sbt:
ThisBuild / resolvers ++= Seq(
  "Apache Development Snapshot Repository" at "https://repository.apache.org/content/repositories/snapshots/",
  Resolver.mavenLocal
)
name := "deedee"
version := "0.1-SNAPSHOT"
organization := "dexterlab"
ThisBuild / scalaVersion := "2.11.8"
val flinkVersion = "1.8.2"
val flinkDependencies = Seq(
  "org.apache.flink" %% "flink-scala" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-streaming-java" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-connector-kafka" % flinkVersion
)

val thirdPartyDependencies = Seq(
  "com.github.nscala-time" %% "nscala-time" % "2.24.0",
  "com.typesafe.play" %% "play-json" % "2.6.14"
)

lazy val root = (project in file(".")).settings(
  libraryDependencies ++= flinkDependencies,
  libraryDependencies ++= thirdPartyDependencies,
  libraryDependencies += "org.scala-lang" % "scala-compiler" % scalaVersion.value
)
assembly / mainClass := Some("dexterlab.TelecoDataConsumer")
// make run command include the provided dependencies
Compile / run := Defaults.runTask(Compile / fullClasspath,
Compile / run / mainClass,
Compile / run / runner
).evaluated
// stays inside the sbt console when we press "ctrl-c" while a Flink programme executes with "run" or "runMain"
Compile / run / fork := true
Global / cancelable := true
// exclude Scala library from assembly
assembly / assemblyOption := (assembly / assemblyOption).value.copy(includeScala = false)
autoCompilerPlugins := true
I finally succeeded after I added this line to build.sbt:
assembly / assemblyOption := (assembly / assemblyOption).value.copy(includeScala = true)
It includes the Scala library when running sbt assembly.
I am using a case class in a Scala (2.12.8) Apache Flink (1.9.1) application. I get the following exception when I run the code below: Caused by: java.lang.NoSuchMethodError: scala.Product.$init$(Lscala/Product;)V.
NOTE: I have added a default constructor as per the suggestion in java.lang.NoSuchMethodException for init method in Scala case class, but that does not work in my case.
Here is the complete code:
package com.zignallabs

import org.apache.flink.api.scala._

/**
 * Implements a program that reads from an element list, transforms each element into a case class, and outputs to the TaskManager.
 */
case class AddCount(firstP: String, count: Int) {
  def this() = this("default", 1) // No help when added default constructor as per https://stackoverflow.com/questions/51129809/java-lang-nosuchmethodexception-for-init-method-in-scala-case-class
}

object WordCount {
  def main(args: Array[String]): Unit = {
    // set up the execution environment
    val env = ExecutionEnvironment.getExecutionEnvironment

    // get input data
    val input = env.fromElements(" one", "two", "three", "four", "five", "end of test")

    // ***** Line 31 throws the exception
    // Caused by: java.lang.NoSuchMethodError: scala.Product.$init$(Lscala/Product;)V
    //   at com.zignallabs.AddCount.<init>(WordCount.scala:7)
    //   at com.zignallabs.WordCount$.$anonfun$main$1(WordCount.scala:31)
    //   at org.apache.flink.api.scala.DataSet$$anon$1.map(DataSet.scala:490)
    //   at org.apache.flink.runtime.operators.chaining.ChainedMapDriver.collect(ChainedMapDriver.java:79)
    //   at org.apache.flink.runtime.operators.util.metrics.CountingCollector.collect(CountingCollector.java:35)
    //   at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:196)
    //   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
    //   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
    //   at java.lang.Thread.run(Thread.java:748)
    val transform = input.map { w => AddCount(w, 1) } // <- throws the exception

    // execute and print result
    println(transform)
    transform.print()
    transform.printOnTaskManager(" Word")
    env.execute()
  }
}
The runtime exception is:
at com.zignallabs.AddCount.<init>(WordCount.scala:7)
at com.zignallabs.WordCount$.$anonfun$main$1(WordCount.scala:31)
at org.apache.flink.api.scala.DataSet$$anon$1.map(DataSet.scala:490)
at org.apache.flink.runtime.operators.chaining.ChainedMapDriver.collect(ChainedMapDriver.java:79)
at org.apache.flink.runtime.operators.util.metrics.CountingCollector.collect(CountingCollector.java:35)
at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:196)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:748)
I am building and running Flink locally, using a local Flink cluster with Flink version 1.9.1.
Here is the build.sbt file:
name := "flink191KafkaScala"
version := "0.1-SNAPSHOT"
organization := "com.zignallabs"
scalaVersion := "2.12.8"
val flinkVersion = "1.9.1"
//javacOptions ++= Seq("-source", "1.7", "-target", "1.7")
val http4sVersion = "0.16.6"
resolvers ++= Seq(
  "Local Ivy" at "file:///" + Path.userHome + "/.ivy2/local",
  "Local Ivy Cache" at "file:///" + Path.userHome + "/.ivy2/cache",
  "Local Maven Repository" at "file:///" + Path.userHome + "/.m2/repository",
  "Artifactory Cache" at "https://zignal.artifactoryonline.com/zignal/zignal-repos"
)
val excludeCommonsLogging = ExclusionRule(organization = "commons-logging")
libraryDependencies ++= Seq(
  "org.apache.flink" %% "flink-scala" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-clients" % "1.9.1",
  // Upgrade to flink-connector-kafka_2.11
  "org.apache.flink" %% "flink-connector-kafka-0.11" % "1.9.1",
  //"org.scalaj" %% "scalaj-http" % "2.4.2",
  "com.squareup.okhttp3" % "okhttp" % "4.2.2"
)
publishTo := Some("Artifactory Realm" at "https://zignal.artifactoryonline.com/zignal/zignal")
credentials += Credentials("Artifactory Realm", "zignal.artifactoryonline.com", "buildserver", "buildserver")
//mainClass in Compile := Some("com.zignallabs.StoryCounterTopology")
mainClass in Compile := Some("com.zignallabs.WordCount")
scalacOptions ++= Seq(
  "-feature",
  "-unchecked",
  "-deprecation",
  "-language:implicitConversions",
  "-Yresolve-term-conflict:package",
  "-language:postfixOps",
  "-target:jvm-1.8"
)
lazy val root = project.in(file(".")).configs(IntegrationTest)
If you're using default args for the constructors of a case class, it's much more idiomatic Scala to define them like this:
case class AddCount(firstP: String = "default", count: Int = 1)
This is syntactic sugar that basically gives you the following for free:
case class AddCount(firstP: String, count: Int) {
  def this() = this("default", 1)
  def this(firstP: String) = this(firstP, 1)
  def this(count: Int) = this("default", count)
}
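As a quick illustration of what the defaults buy you at call sites, here is a small sketch (the value names are made up for illustration):
val a = AddCount()            // AddCount("default", 1)
val b = AddCount("hello")     // AddCount("hello", 1)
val c = AddCount(count = 5)   // AddCount("default", 5), via a named argument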
I am now able to run this application using Scala 2.12. The issue was in the environment: I needed to make sure there were no conflicting binaries, especially a mix of Scala 2.11 and Scala 2.12 ones.
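For reference, a minimal build.sbt sketch of keeping everything on one Scala line, assuming the local Flink 1.9.1 installation is its Scala 2.12 build:
scalaVersion := "2.12.8"
val flinkVersion = "1.9.1"
// `%%` appends the Scala binary suffix (_2.12 here), so every Flink module
// resolves against the same Scala version as the application code.
libraryDependencies ++= Seq(
  "org.apache.flink" %% "flink-scala" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % "provided"
)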
When I try sbt package on the code below, I get the following errors:
object apache is not a member of package org
not found: value SparkSession
My Spark version: 2.4.4
My Scala version: 2.11.12
My build.sbt
name := "simpleApp"
version := "1.0"
scalaVersion := "2.11.12"
//libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.4"
libraryDependencies ++= {
  val sparkVersion = "2.4.4"
  Seq("org.apache.spark" %% "spark-core" % sparkVersion)
}
My Scala project:
import org.apache.spark.sql.SparkSession

object demoapp {
  def main(args: Array[String]): Unit = {
    val logfile = "C:/SUPPLENTA_INFORMATICS/demo/hello.txt"
    val spark = SparkSession.builder.appName("Simple App in Scala").getOrCreate()
    val logData = spark.read.textFile(logfile).cache()
    val numAs = logData.filter(line => line.contains("Washington")).count()
    println(s"Lines are: $numAs")
    spark.stop()
  }
}
If you want to use Spark SQL, you also have to add the spark-sql module to the dependencies:
// https://mvnrepository.com/artifact/org.apache.spark/spark-sql
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.4"
Also, note that you have to reload your project in sbt after changing the build definition, and re-import the changes in IntelliJ.
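For reference, the resulting build.sbt might look like this (a sketch based on the versions in the question):
name := "simpleApp"
version := "1.0"
scalaVersion := "2.11.12"
libraryDependencies ++= {
  val sparkVersion = "2.4.4"
  Seq(
    "org.apache.spark" %% "spark-core" % sparkVersion,
    "org.apache.spark" %% "spark-sql" % sparkVersion // provides SparkSession
  )
}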
I am using the Akka Streams API in a Scala project, working in IntelliJ IDEA with the SBT plugin. I have a worker-pool Flow as described here: https://doc.akka.io/docs/akka/current/scala/stream/stream-cookbook.html. Here is my code:
package streams

import akka.NotUsed
import akka.stream.scaladsl.{Balance, Flow, GraphDSL, Merge}
import akka.stream.{FlowShape, Graph}

object WorkerPoolFlow {
  def apply[In, Out](
      worker: Flow[In, Out, Any],
      workerCount: Int): Graph[FlowShape[In, Out], NotUsed] = {
    GraphDSL.create() { implicit b =>
      val balance = b.add(Balance[In](workerCount, waitForAllDownstreams = true))
      val merge = b.add(Merge[Out](workerCount))

      for (i <- 0 until workerCount)
        balance.out(i) ~> worker.async ~> merge.in(i)

      FlowShape(
        balance.in,
        merge.out)
    }
  }
}
For some reason the project is now failing to compile, giving this error: value ~> is not a member of akka.stream.Outlet[In].
It compiled fine until today. The only changes I am aware of making are installing the scalafmt formatting plugin and adding a few new libraries in build.sbt. Here is my build.sbt:
name := "myProject"
version := "0.1"
scalaVersion := "2.11.11"
unmanagedJars in Compile += file("localDep1.jar")
unmanagedJars in Compile += file("localDep2.jar")
libraryDependencies += "io.spray" %% "spray-json" % "1.3.3"
libraryDependencies += "com.typesafe.akka" %% "akka-actor" % "2.5.6"
libraryDependencies += "com.typesafe.akka" %% "akka-stream" % "2.5.6"
libraryDependencies += "com.typesafe.akka" %% "akka-testkit" % "2.5.6" % Test
libraryDependencies += "com.typesafe.akka" %% "akka-stream-testkit" % "2.5.6" % Test
libraryDependencies += "com.47deg" %% "fetch" % "0.6.0"
libraryDependencies += "ch.qos.logback" % "logback-classic" % "1.2.3"
I have tried reloading SBT, building from SBT outside of IDEA, removing and re-adding dependencies, and cleaning the project, with no luck.
Import GraphDSL.Implicits._:
object WorkerPoolFlow {
  def apply[In, Out](
      worker: Flow[In, Out, Any],
      workerCount: Int): Graph[FlowShape[In, Out], NotUsed] = {
    import GraphDSL.Implicits._
    GraphDSL.create() { implicit b =>
      ...
    }
  }
}
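With the import in place, a minimal usage sketch could look like the following (doubler, the actor-system name, and the element range are illustrative assumptions, not from the question; it assumes WorkerPoolFlow above is on the classpath):
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}

object WorkerPoolDemo extends App {
  implicit val system: ActorSystem = ActorSystem("worker-pool-demo")
  implicit val materializer: ActorMaterializer = ActorMaterializer()

  // A trivial worker flow, just to have something to run in parallel.
  val doubler: Flow[Int, Int, NotUsed] = Flow[Int].map(_ * 2)

  // Wrap the graph returned by WorkerPoolFlow into a reusable Flow.
  val pool: Flow[Int, Int, NotUsed] = Flow.fromGraph(WorkerPoolFlow(doubler, workerCount = 4))

  Source(1 to 10).via(pool).runWith(Sink.foreach(println))
}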
I have a problem using the updateStateByKey() function. I have the following simple code (written based on the book "Learning Spark: Lightning-Fast Big Data Analysis"):
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object hello {
  def updateStateFunction(newValues: Seq[Int], runningCount: Option[Int]): Option[Int] = {
    Some(runningCount.getOrElse(0) + newValues.size)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[5]").setAppName("AndrzejApp")
    val ssc = new StreamingContext(conf, Seconds(4))
    ssc.checkpoint("/")
    val lines7 = ssc.socketTextStream("localhost", 9997)
    val keyValueLine7 = lines7.map(line => (line.split(" ")(0), line.split(" ")(1).toInt))
    val statefullStream = keyValueLine7.updateStateByKey(updateStateFunction _)
    ssc.start()
    ssc.awaitTermination()
  }
}
My build.sbt is:
name := "stream-correlator-spark"
version := "1.0"
scalaVersion := "2.11.4"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.3.1" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.3.1" % "provided"
)
When I build it with the sbt assembly command, everything goes fine. When I run this on a Spark cluster in local mode, I get an error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/dstream/DStream$
at hello$.main(helo.scala:25)
...
line 25 is:
val statefullStream = keyValueLine7.updateStateByKey(updateStateFunction _)
I feel this might be a version-compatibility problem, but I don't know what the reason might be or how to resolve it.
I would be really grateful for help!
When you write "provided" in the sbt build, it means exactly that: the library is provided by the environment and does not need to be included in the package.
Try removing the "provided" mark from the "spark-streaming" library.
You can add "provided" back when you need to submit your app to a Spark cluster to run. The benefit of having "provided" is that the resulting fat jar will not include classes from the provided dependencies, which yields a much smaller fat jar compared to not having "provided". In my case, the resulting jar is around 90 MB without "provided" and shrinks to 30+ MB with "provided".
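If you want to keep "provided" for cluster submissions and still run locally with sbt run, one option is the same runTask trick used in the Flink build.sbt shown earlier on this page. A sketch (sbt 1.x slash syntax; older sbt spells it run in Compile := ...):
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.3.1" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.3.1" % "provided"
)

// `sbt run` still sees the provided dependencies because the compile
// classpath (fullClasspath) includes them.
Compile / run := Defaults.runTask(
  Compile / fullClasspath,
  Compile / run / mainClass,
  Compile / run / runner
).evaluated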