Kafka 2.3.0 producer and consumer - scala

I'm new to kafka,and want to use Kafka 2.3 to implement a producer/consumer app.
I had download and install the Kafka 2.3 on my ubuntu server.
I found some code online and build it on my laptop in IDEA, But the consumer can't get any info.
I had checked the topic info on my server which has the topic.
I had use kafka-console-consumer to check this topic, got the topic's value successfuly, but not with my consumer.
So what's wrong with my consumer?
Producer
package com.phitrellis.tool
import java.util.Properties
import java.util.concurrent.{Future, TimeUnit}
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.clients.producer._
object MyKafkaProducer extends App {
def createKafkaProducer(): Producer[String, String] = {
val props = new Properties()
props.put("bootstrap.servers", "*:9092")
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("producer.type", "async")
props.put("acks", "all")
new KafkaProducer[String, String](props)
}
def writeToKafka(topic: String): Unit = {
val producer = createKafkaProducer()
val record = new ProducerRecord[String, String](topic, "key", "value22222222222")
println("start")
producer.send(record)
producer.close()
println("end")
}
writeToKafka("phitrellis")
}
Consumer
package com.phitrellis.tool
import java.util
import java.util.Properties
import java.time.Duration
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer
object MyKafkaConsumer extends App {
def createKafkaConsumer(): KafkaConsumer[String, String] = {
val props = new Properties()
props.put("bootstrap.servers", "*:9092")
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
// props.put("auto.offset.reset", "latest")
props.put("enable.auto.commit", "true")
props.put("auto.commit.interval.ms", "1000")
props.put("group.id", "test")
new KafkaConsumer[String, String](props)
}
def consumeFromKafka(topic: String) = {
val consumer: KafkaConsumer[String, String] = createKafkaConsumer()
consumer.subscribe(util.Arrays.asList(topic))
while (true) {
val records = consumer.poll(Duration.ofSeconds(2)).asScala.iterator
println("true")
for (record <- records){
print(record.value())
}
}
}
consumeFromKafka("phitrellis")
}

Two line in your Consumer code are crucial:
props.put("auto.offset.reset", "latest")
props.put("group.id", "test")
To read from beginning of the topic you have to set auto.offset.reset to earliest (latest cause that you skip messages produced before your Consumer started).
group.id is responsible for group management. If you start processing data with some group.id and than restart your application or start new with same group.id only new messages will be read.
For your tests I would suggest to add auto.offset.reset -> earliest and change group.id
props.put("auto.offset.reset", "earliest")
props.put("group.id", "test123")
Additionally:
You have to remember that KafkaProducer::send returns Future<RecordMetadata> and messages are sent asynchronously and if you progam finished before Future will finished messages might not be sent.

There's two parts here. The producing side, and the consumer.
You don't say anything about the producer, so we're assuming it did work. However, did you check on the servers? You could check the kafka log files to see if there's any data on those particular topic/partitions.
On the consumer side, to validate, you should try to consume using the command-line from that same topic, to make sure the data is in there. Look for "Kafka Consumer Console" at the following link, and follow those steps.
http://cloudurable.com/blog/kafka-tutorial-kafka-from-command-line/index.html
If there is data on the topic, then running that command should get you data. If it's not, then it will just "hang" because it's waiting for data to be written to the topic.
In addition, you can try producing to the same topic using those command line tools, to make sure your cluster is configured correctly, you have the right addresses and ports, that the ports are not blocked, etc.

Related

How to consume only latest offset in Kafka topic

I am working on a scala application in which I am using kafka. My kafka consumer code is as follows.
def getValues(topic: String): String = {
val props = new Properties()
props.put("group.id", "test")
props.put("bootstrap.servers", "localhost:9092")
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("auto.offset.reset", "earliest")
val consumer: KafkaConsumer[String, String] = new KafkaConsumer[String, String](props)
val topicPartition = new TopicPartition(topic, 0)
consumer.assign(util.Collections.singletonList(topicPartition))
val offset = consumer.position(topicPartition) - 1
val record = consumer.poll(Duration.ofMillis(500)).asScala
for (data <- record)
if(data.offset() == offset) val value = data.value()
return value
}
In this I just want to return latest value. When I run my application I get following log:
Resetting offset for partition topic-0 to offset 0
Because of which val offset = consumer.position(topicPartition) - 1 becomes -1 and data.offset() gives list of all offsets. And as a result I don't get the latest value. Why it is automatically resetting offset to 0? How can I correct it? What is the mistake in my code? or anyother way I can get the value from the latest offset?
You are looking for the seek method which - according to the JavaDocs - "overrides the fetch offsets that the consumer will use on the next poll(timeout)".
Also make sure that you are setting
props.put("auto.offset.reset", "latest")
Making those two amendments to your code, the following worked for me to only fetch the value of the latest offset of the partion 0 in the selected topic:
import java.time.Duration
import java.util.Properties
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition
import collection.JavaConverters._
def getValues(topic: String): String = {
val props = new Properties()
props.put("group.id", "test")
props.put("bootstrap.servers", "localhost:9092")
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("auto.offset.reset", "latest")
val consumer: KafkaConsumer[String, String] = new KafkaConsumer[String, String](props)
val topicPartition = new TopicPartition(topic, 0)
consumer.assign(java.util.Collections.singletonList(topicPartition))
val offset = consumer.position(topicPartition) - 1
consumer.seek(topicPartition, offset)
val record = consumer.poll(Duration.ofMillis(500)).asScala
for (data <- record) {
val value: String = data.value() // you are only reading one message if no new messages flow into the Kafka topic
}
value
}
In this line, props.put("auto.offset.reset", "earliest"), you set the parameter auto.offset.reset of your Kafka consumer to earliest, which will reset the offset to earliest. If you want the latest value, you should use latest instead.
You can find the documentation here.

How to resolve Kafka Consumer poll timeout error

I am trying to use Apache Kafka through a vagrant machine to run a simple Kafka Consumer program. The program get's stuck before the for loop when it tries to call the .poll(100) method.
Lot's of digging into deeper classes for debugging but not much has been found.
val TOPIC="testTopic"
val props = new Properties()
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.56.10:9092")
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
props.put(ConsumerConfig.GROUP_ID_CONFIG, UUID.randomUUID().toString());
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
val consumer = new KafkaConsumer[String, String](props)
consumer.subscribe(util.Collections.singletonList(TOPIC))
while(true) {
println("Test")
val records = consumer.poll(100)
for (record <- records.asScala) {
println(record)
}
println("Test2")
}
}
Currently outputs Test and then get's stuck with no error message. It's expected that it will output the contents of the Kafka topic.
You need to upgrade your kafka-clients version to 2.0.0 or above. When the kafka server is down, for example, using the poll method from KafkaConsumer class you will get stuck in the internal loop waiting for the broker to become available again.
According to KIP-266:
ConsumerRecords
poll​(long timeout)
Deprecated. Since 2.0. Use poll(Duration), which does not block
beyond the timeout awaiting partition assignment. See KIP-266 for more
information.
In your case:
import org.apache.kafka.clients.consumer.KafkaConsumer;
import scala.concurrent.duration._
// ...
val timeout = Duration(100, MILLISECONDS)
while(true) {
println("Test")
val records = consumer.poll(timeout)
for (record <- records.asScala) {
println(record)
}
println("Test2")
}
//...
In conclusion, you just need to import the new version of the KafkaConsumer class and pass the timeout parameter to the new poll method as an instance of the Duration object.

Apache Kafka: How to receive latest message from Kafka?

I am consuming and processing messages in the Kafka consumer application using Spark in Scala. Sometimes it takes little more time than usual to process messages from Kafka message queue. At that time I need to consume latest message, ignoring the earlier ones which have been published by the producer and yet to be consumed.
Here is my consumer code:
object KafkaSparkConsumer extends MessageProcessor {
def main(args: scala.Array[String]): Unit = {
val properties = readProperties()
val streamConf = new SparkConf().setMaster("local[*]").setAppName("Kafka-Stream")
val ssc = new StreamingContext(streamConf, Seconds(1))
val group_id = Random.alphanumeric.take(4).mkString("dfhSfv")
val kafkaParams = Map("metadata.broker.list" -> properties.getProperty("broker_connection_str"),
"zookeeper.connect" -> properties.getProperty("zookeeper_connection_str"),
"group.id" -> group_id,
"auto.offset.reset" -> properties.getProperty("offset_reset"),
"zookeeper.session.timeout" -> properties.getProperty("zookeeper_timeout"))
val msgStream = KafkaUtils.createStream[scala.Array[Byte], String, DefaultDecoder, StringDecoder](
ssc,
kafkaParams,
Map("moved_object" -> 1),
StorageLevel.MEMORY_ONLY_SER
).map(_._2)
msgStream.foreachRDD { x =>
x.foreach {
msg => println("Message: "+msg)
processMessage(msg)
}
}
ssc.start()
ssc.awaitTermination()
}
}
Is there any way to make sure the consumer always gets the most recent message in the consumer application? Or do I need to set any property in Kafka configuration to achieve the same?
Any help on this would be greatly appreciated. Thank you
Kafka consumer api include method
void seekToEnd(Collection<TopicPartition> partitions)
So, you can get assigned partitions from consumer and seek for all of them to the end. There is similar method to seekToBeginning.
You can leverage two KafkaConsumer APIs to get the very last message from a partition (assuming log compaction won't be an issue):
public Map<TopicPartition, Long> endOffsets(Collection<TopicPartition> partitions): This gives you the end offset of the given partitions. Note that the end offset is the offset of the next message to be delivered.
public void seek(TopicPartition partition, long offset): Run this for each partition and provide its end offset from above call minus 1 (assuming it's greater than 0).
You can always generate a new (random) group id when connecting to Kafka - that way you will start consuming new messages when you connect.
Yes, you can set staringOffset to latest to consume latest messages.
val spark = SparkSession
.builder
.appName("kafka-reading")
.getOrCreate()
import spark.implicits._
val df = spark
.readStream
.format("kafka")
.option("kafka.bootstrap.servers", "localhost:9092")
.option("startingOffsets", "latest")
.option("subscribe", topicName)
.load()

Kafka producer creates topic but is not able to send messages

I'm new to Scala and Kafka and I've run into some trouble.
I'm trying to connect a scala kafka producer to a kafka server that is installed on a cloudera express server.
I have done this already once in VMs with these instructions and didn't have any problems.
When I run the producer the desired topic is created but none of the messages is sent, or so I think.
Here follows some of the code:
Kafka producer
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
class KafkaProducerManager {
val props = new Properties()
props.put("bootstrap.servers", KafkaServer.KAFKA_ADDRESS)
props.put("acks", "all")
props.put("retries", "2")
props.put("auto.commit.interval.ms", "1000")
props.put("linger.ms", "1")
props.put("block.on.buffer.full", "true")
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("auto.create.topics.enable", "true")
val producer = new KafkaProducer[String, String](props)
def startCounter() {
println("Start Producer Counter")
for (i <- 1 to 100) {
producer.send(new ProducerRecord("test-counter", i.toString, "Package " + i))
println("Producer - Send: " + i)
}
println("Closing producer")
producer.close()
}
}
When I execute the run method, I see "Producer - Send: #" as output of this and I get no errors.
So I assume that this piece of code has sent the messages to Kafka.
I started the following on the kafka server before I ran the producer:
kafka-console-consumer --zookeeper zk:2181 --topic test-counter
But here I see nothing happens.
When I check for the topic, that the producer is supposed to create, is in the list.
kafka-topics -zookeeper zk:2181 --list
I also have a similar problem with the consumer:
import java.util.{Arrays, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
class KafkaConsumerManager {
val props = new Properties()
props.put("bootstrap.servers", KafkaServer.KAFKA_ADDRESS)
props.put("group.id", "testGroup")
props.put("enable.auto.commit", "true")
props.put("auto.commit.interval.ms", "1000")
props.put("linger.ms", "1")
props.put("session.timeout.ms", "3000")
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("zookeeper.connect", KafkaServer.ZOOKEEPER_ADDRESS)
val consumer = new KafkaConsumer[String, String](props)
def start() {
println("Start Consumer")
consumer.subscribe(Arrays.asList("test-counter"))
while (true) {
val records = consumer.poll(100)
val iterator = records.iterator()
while (iterator.hasNext) {
val record = iterator.next()
printf("Consumer: offset = %d, key = %s, value = %s \n", record.offset(), record.key(), record.value())
}
}
}
}
When I create messages on the server via kafka-console-producer I see them appear in the kafka-console-consumer on the server, but not in the consumer I wrote.
kafka-console-producer --broker-list ks:9092 --topic test-counter
The KafkaServer.ZOOKEEPER_ADDRESS is the same as the argument zk:2181 with kafka-console-consumer and the KafkaServer.KAFKA_ADDRESS is the same as the argument ks:9092 with the kafka-console-producer.
I tried the code and found that:
one should specify key and value deserializers in consumer
properties:
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
there is a problem with session.timeout.ms property. From here:
heartbeat.interval.ms - ... The value must be set lower than session.timeout.ms ... default: 3000
It means that you should either increase your session.timeout.ms
value or simply remove the line because default value for the
property is 30000 which is greater than default
heartbeat.interval.ms.
After performing the corrections the code works.
if you are running in windows machine? and following quickstart guide, I had the same issue producer/consumer not giving any error but running either, you need to set kafka_home in your envrionment variabel KAFKA_HOME=C:\kafka_2.13-2.6.0
and then for zooker/server/topic/consumer/producer everything run under your kafka/windows
example: for consumer
%KAFKA_HOME%/bin/windows/kafka-console-consumer.bat --topic quickstart-events --from-beginning --bootstrap-server localhost:9092

Can anyone share a Flink Kafka example in Scala?

Can anyone share a working example of Flink Kafka (mainly receiving messages from Kafka) in Scala? I know there is a KafkaWordCount example in Spark. I just need to print out Kafka message in Flink. It would be really helpful.
The following code shows how to read from a Kafka topic using Flink's Scala DataStream API:
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer082
import org.apache.flink.streaming.util.serialization.SimpleStringSchema
object Main {
def main(args: Array[String]) {
val env = StreamExecutionEnvironment.getExecutionEnvironment
val properties = new Properties()
properties.setProperty("bootstrap.servers", "localhost:9092")
properties.setProperty("zookeeper.connect", "localhost:2181")
properties.setProperty("group.id", "test")
val stream = env
.addSource(new FlinkKafkaConsumer082[String]("topic", new SimpleStringSchema(), properties))
.print
env.execute("Flink Kafka Example")
}
}
In contrast to what Robert added, below is a piece of application code for sending messages to the Kafka topic.
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
object KafkaProducer {
def main(args: Array[String]): Unit = {
KafkaProducer.sendMessageToKafkaTopic("localhost:9092", "topic_name")
}
def sendMessageToKafkaTopic(server: String, topic:String): Unit = {
val props = new Properties()
props.put("bootstrap.servers", servers)
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
val producer = new KafkaProducer[String,String](props)
val record = new ProducerRecord[String,String](topic, "HELLO WORLD!")
producer.send(record)
producer.close()
}
}