How to dockerize a Scala Kafka producer and write a Dockerfile for it - scala

I am new to Docker and have been trying to understand how to write a Dockerfile to create my custom image. My Scala class produces messages to a topic continuously, and I want to reproduce the same functionality with Docker. Can someone help me with the Dockerfile?
I have tried sbt docker:publishLocal; it creates the image, but when I try to run the image it says it is unable to find the class. I am specifically looking to run it using a Dockerfile.
Here is the code, which works in IntelliJ:
import java.util.Properties
import org.apache.kafka.clients.producer._

object Scala_producer extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

  val producer = new KafkaProducer[String, String](props)
  val TOPIC = "tt"
  println(producer.partitionsFor(TOPIC))

  while (true) {
    val record = new ProducerRecord(TOPIC, "key", "hello ")
    producer.send(record)
    println("producing")
  }

  producer.close()
}
I expect to run the Docker image and see it produce messages indefinitely.

If this is the only issue, then add the main class to your build.sbt:
mainClass in Compile := Some("com.example.Scala_producer")
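For context, here is a minimal build.sbt sketch of how this typically fits together with sbt-native-packager (the plugin, the version numbers, the image name and the com.example package are assumptions, not taken from the question):

// build.sbt sketch -- assumes sbt-native-packager is declared in project/plugins.sbt,
// e.g. addSbtPlugin("com.typesafe.sbt" % "sbt-native-packager" % "1.3.25"),
// and that Scala_producer lives in the com.example package
enablePlugins(JavaAppPackaging, DockerPlugin)

name := "scala-producer"
scalaVersion := "2.12.8" // adjust to your project

libraryDependencies += "org.apache.kafka" % "kafka-clients" % "2.2.0" // adjust to your broker

// the entry point that docker:publishLocal bakes into the image's start script;
// without it the generated start script cannot find the class to run
mainClass in Compile := Some("com.example.Scala_producer")

dockerBaseImage := "openjdk:8-jre" // optional: base image for the generated Dockerfile

If you want a Dockerfile specifically, sbt docker:stage writes the generated Dockerfile and staged app under target/docker/stage, so you can inspect or hand-edit it and run docker build from there. Also note that localhost:9092 inside a container refers to the container itself, so bootstrap.servers usually has to point at an address the container can actually reach.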

Related

Kafka TimeoutException: Topic not present in metadata after 60000 ms

I'm trying out some Kafka basics, following the examples at https://kafka.apache.org/quickstart. After starting ZooKeeper and Kafka, I tried producing and consuming with the included Kafka shell scripts and it all worked without issue.
When I try to produce a message from a simple Scala application, I get the following error: org.apache.kafka.common.errors.TimeoutException: Topic quickstart-events not present in metadata after 60000 ms.
I ensured the topic has been created and I can telnet to localhost:9092 as well.
Here's the code I'm using for the producer:
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

val props = new Properties()
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
props.put(ProducerConfig.CLIENT_ID_CONFIG, "test")
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)

val producer = new KafkaProducer[String, String](props)
producer.send(new ProducerRecord[String, String]("quickstart-events", "1", "some event")).get()
I'm running this on a Mac, and the above code is part of a test case executed in IntelliJ.
Solved. I was using kafka-clients library version 2.6.0 while running Kafka server version 3.2.0. Matching the library version to the broker fixed the issue.
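In build terms the fix is just pinning the client library to a version compatible with the broker; a one-line sbt sketch, assuming an sbt build and the 3.2.0 broker mentioned above:

// match the kafka-clients dependency to the broker actually running
libraryDependencies += "org.apache.kafka" % "kafka-clients" % "3.2.0"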
I got this problem as well, and the version was correct for me.
I figured out it was due to missing SASL authentication.
Try:
import org.apache.kafka.clients.CommonClientConfigs
import org.apache.kafka.common.config.SaslConfigs

// set SASL configuration here
props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT")
props.put(SaslConfigs.SASL_MECHANISM, "PLAIN")
props.put("sasl.jaas.config",
  "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"alice\" password=\"123456\";")

Why does the kafka consumer code freeze when I start spark stream?

I am new to Kafka and am trying to implement Kafka consumer logic in Spark 2. When I run all my code in the shell and start streaming, it shows nothing.
I have viewed many posts on Stack Overflow, but nothing helped. I have even downloaded all the dependency jars from Maven and tried running with them, but I still face the same issue.
Spark version: 2.2.0
Scala version: 2.11.8
Jars I downloaded: kafka-clients-2.2.0.jar and spark-streaming-kafka-0-10_2.11-2.2.0.jar
Please find the code snippet below:
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

val brokers = "host1:port, host2:port"
val groupid = "default"
val topics = "kafka_sample"
val topicset = topics.split(",").toSet

val ssc = new StreamingContext(sc, Seconds(2))

val kafkaParams = Map[String, Object](
  ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> brokers,
  ConsumerConfig.GROUP_ID_CONFIG -> groupid,
  ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
  ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer]
)

val msg = KafkaUtils.createDirectStream[String, String](
  ssc, LocationStrategies.PreferConsistent, ConsumerStrategies.Subscribe[String, String](topicset, kafkaParams)
)

msg.foreachRDD { rdd =>
  rdd.collect().foreach(println)
}

ssc.start()
I am expecting Spark Streaming to start, but it doesn't do anything. What mistake have I made here? Or is this a known issue?
The driver will sit idle unless you call ssc.awaitTermination() at the end. If you're using spark-shell, it's not a good tool for streaming jobs.
Please use interactive tools like Zeppelin or a Spark notebook for interacting with streaming, or try building your app as a jar file and then deploying it.
Also, if you're trying out Spark streaming, Structured Streaming would be better, as it is quite easy to work with:
http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
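To make the Structured Streaming suggestion concrete, here is a minimal sketch (the broker and topic names are copied from the question; it assumes the spark-sql-kafka-0-10 package is on the classpath):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("StructuredKafkaRead").getOrCreate()

// read the topic as a streaming DataFrame; key and value arrive as binary columns
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port,host2:port")
  .option("subscribe", "kafka_sample")
  .load()

// cast to strings and dump each micro-batch to the console
val query = df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .writeStream
  .format("console")
  .start()

query.awaitTermination()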
After ssc.start(), use ssc.awaitTermination() in your code.
For testing, write your code in a main object and run it in an IDE like IntelliJ.
You can then use the command shell to publish messages with the Kafka console producer.
I have written all these steps in a simple example in a blog post with working code in GitHub. Please refer to: http://softwaredevelopercentral.blogspot.com/2018/10/spark-streaming-and-kafka-integration.html
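Putting both suggestions together, a minimal sketch of the main-object structure (the object name and master URL are illustrative; the Kafka wiring is exactly the one from the question):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object KafkaStreamApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaStreamApp").setMaster("local[*]")
    val ssc = new StreamingContext(conf, Seconds(2))

    // ... create the direct stream and msg.foreachRDD { ... } exactly as in the question ...

    ssc.start()
    ssc.awaitTermination() // keeps the driver alive so the micro-batches keep running
  }
}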

I am using the Kafka Producer API to write messages from a file into a Kafka topic, but the topic log shows empty?

I am using the Producer API code below to write messages into a Kafka topic, but it's unable to write messages into the topic:
import java.util.Properties

import com.typesafe.config.ConfigFactory
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}

import scala.io.Source

object KafkaProducerDemo {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
    props.put(ProducerConfig.CLIENT_ID_CONFIG, "KafkaProducerDemo")
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer")
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[Nothing, String](props)

    val logMessages = Source.
      fromFile("/opt/gen_logs/logs/access.log").
      getLines.
      toList

    logMessages.foreach { message =>
      val record = new ProducerRecord("retail-multi", message)
      producer.send(record)
    }
  }
}
Based on the error you mentioned in comments (java.lang.ArrayIndexOutOfBoundsException: 18), I'd say you've got a mismatch between your client library version and your broker version. Client lib should be < broker (unless client lib supports dynamic api-version checking).
So double check the broker version you are connecting to, and then double check your client library version. Once they match or are compatible, you should be good to go!
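As a quick sanity check, you can also print the kafka-clients version that is actually on the classpath at runtime (a small sketch, not required for the fix):

import org.apache.kafka.common.utils.AppInfoParser

// prints the version of the kafka-clients jar actually loaded, which helps catch
// a transitive dependency overriding the version you think you declared
println(s"kafka-clients version: ${AppInfoParser.getVersion}")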
Hi, this may be because of a Kafka version mismatch. I reinstalled Kafka and sbt, and it started working fine.

Flink: read data from Kafka

I wrote a simple example:
import java.util.Properties

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer082
import org.apache.flink.streaming.util.serialization.SimpleStringSchema

val env = StreamExecutionEnvironment.getExecutionEnvironment

val properties = new Properties()
properties.setProperty("bootstrap.servers", "xxxxxx")
properties.setProperty("zookeeper.connect", "xxxxxx")
properties.setProperty("group.id", "caffrey")

val stream = env
  .addSource(new FlinkKafkaConsumer082[String]("topic", new SimpleStringSchema(), properties))
  .print()

env.execute("Flink Kafka Example")
When I run this code I get an error like this:
[error] Class org.apache.flink.streaming.api.checkpoint.CheckpointNotifier not found - continuing with a stub.
I googled this error and found that CheckpointNotifier is an interface.
I really don't understand what I did wrong.
Since CheckpointNotifier is a class from an older Flink version, I suspect that you are mixing different Flink dependencies in your pom file.
Make sure all Flink dependencies have the same version.
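For example, an sbt-style sketch of keeping every Flink artifact on one version (the artifact names and the version number are illustrative; a Maven pom would use a single version property the same way):

// declare the Flink version once and reuse it for every Flink artifact
val flinkVersion = "1.3.2"

libraryDependencies ++= Seq(
  "org.apache.flink" %% "flink-scala" % flinkVersion,
  "org.apache.flink" %% "flink-streaming-scala" % flinkVersion,
  "org.apache.flink" %% "flink-connector-kafka-0.8" % flinkVersion
)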

Flink with Kafka Consumer doesn't work

I want to benchmark Spark vs. Flink, and for this purpose I am running several tests. However, Flink doesn't work with Kafka, while Spark works perfectly.
The code is very simple:
import java.util.Properties

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09
import org.apache.flink.streaming.util.serialization.SimpleStringSchema

val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment

val properties = new Properties()
properties.setProperty("bootstrap.servers", "localhost:9092")
properties.setProperty("group.id", "myGroup")
println("topic: " + args(0))

val stream = env.addSource(new FlinkKafkaConsumer09[String](args(0), new SimpleStringSchema(), properties))
stream.print
env.execute()
I use Kafka 0.9.0.0 with the same topic (in the consumer [Flink] and the producer [Kafka console]), but when I submit my jar to the cluster, nothing happens:
(screenshot: Flink cluster web interface)
What could be happening?
Your stream.print will not print to the console on Flink. It will write to flink0.9/logs/recentlog. Otherwise you can add your own logger to confirm the output.
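A small sketch of the "add your own logger" idea, building on the question's stream value (slf4j is assumed to be available, since Flink ships with it; the logger name is illustrative):

import org.slf4j.LoggerFactory

// log each record as it flows through; the messages end up in the taskmanager log
val logged = stream.map { value =>
  LoggerFactory.getLogger("kafka-consumer-check").info(s"received: $value")
  value
}
logged.print()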
For this particular case (a Source chained into a Sink) the web interface will never report Bytes/Records sent/received. Note that this will change in the somewhat near future.
Please also check whether the job-/taskmanager logs contain any output.