I'm new to Scala and Kafka and I've run into some trouble.
I'm trying to connect a Scala Kafka producer to a Kafka server that is installed on a Cloudera Express server.
I have done this already once in VMs with these instructions and didn't have any problems.
When I run the producer the desired topic is created but none of the messages is sent, or so I think.
Here follows some of the code:
Kafka producer
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
class KafkaProducerManager {
val props = new Properties()
props.put("bootstrap.servers", KafkaServer.KAFKA_ADDRESS)
props.put("acks", "all")
props.put("retries", "2")
props.put("auto.commit.interval.ms", "1000")
props.put("linger.ms", "1")
props.put("block.on.buffer.full", "true")
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("auto.create.topics.enable", "true")
val producer = new KafkaProducer[String, String](props)
def startCounter() {
println("Start Producer Counter")
for (i <- 1 to 100) {
producer.send(new ProducerRecord("test-counter", i.toString, "Package " + i))
println("Producer - Send: " + i)
}
println("Closing producer")
producer.close()
}
}
When I call startCounter, I see "Producer - Send: #" for each message in the output and I get no errors.
So I assume that this piece of code has sent the messages to Kafka.
I started the following on the kafka server before I ran the producer:
kafka-console-consumer --zookeeper zk:2181 --topic test-counter
But nothing appears there.
When I list the topics, the topic that the producer is supposed to create is in the list:
kafka-topics --zookeeper zk:2181 --list
I also have a similar problem with the consumer:
import java.util.{Arrays, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
class KafkaConsumerManager {
val props = new Properties()
props.put("bootstrap.servers", KafkaServer.KAFKA_ADDRESS)
props.put("group.id", "testGroup")
props.put("enable.auto.commit", "true")
props.put("auto.commit.interval.ms", "1000")
props.put("linger.ms", "1")
props.put("session.timeout.ms", "3000")
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("zookeeper.connect", KafkaServer.ZOOKEEPER_ADDRESS)
val consumer = new KafkaConsumer[String, String](props)
def start() {
println("Start Consumer")
consumer.subscribe(Arrays.asList("test-counter"))
while (true) {
val records = consumer.poll(100)
val iterator = records.iterator()
while (iterator.hasNext) {
val record = iterator.next()
printf("Consumer: offset = %d, key = %s, value = %s \n", record.offset(), record.key(), record.value())
}
}
}
}
When I create messages on the server via kafka-console-producer I see them appear in the kafka-console-consumer on the server, but not in the consumer I wrote.
kafka-console-producer --broker-list ks:9092 --topic test-counter
The KafkaServer.ZOOKEEPER_ADDRESS is the same as the argument zk:2181 with kafka-console-consumer and the KafkaServer.KAFKA_ADDRESS is the same as the argument ks:9092 with the kafka-console-producer.
I tried the code and found two problems.
First, you have to specify key and value deserializers (not serializers) in the consumer properties:
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
Second, there is a problem with the session.timeout.ms property. From here:
heartbeat.interval.ms - ... The value must be set lower than session.timeout.ms ... default: 3000
This means that you should either increase your session.timeout.ms value or simply remove the line, because the default value for the property is 30000, which is greater than the default heartbeat.interval.ms.
After performing the corrections the code works.
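The two corrections can be sketched as a small properties builder. The names `ConsumerProps` and `build`, the default values, and the explicit `require` check are illustrative assumptions and not part of the original code; the check only mirrors the rule quoted above that heartbeat.interval.ms must be lower than session.timeout.ms.

```scala
import java.util.Properties

object ConsumerProps {
  // Hypothetical helper: builds consumer properties with String deserializers
  // and rejects a session timeout that is not above the heartbeat interval.
  def build(bootstrapServers: String,
            groupId: String,
            sessionTimeoutMs: Int = 30000,
            heartbeatIntervalMs: Int = 3000): Properties = {
    require(
      heartbeatIntervalMs < sessionTimeoutMs,
      s"heartbeat.interval.ms ($heartbeatIntervalMs) must be lower than " +
        s"session.timeout.ms ($sessionTimeoutMs)"
    )
    val props = new Properties()
    props.put("bootstrap.servers", bootstrapServers)
    props.put("group.id", groupId)
    // A consumer needs deserializers; serializers belong to the producer.
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("session.timeout.ms", sessionTimeoutMs.toString)
    props.put("heartbeat.interval.ms", heartbeatIntervalMs.toString)
    props
  }
}
```

With the value from the question (session.timeout.ms = 3000) and the default heartbeat interval of 3000, the `require` call fails, which is the misconfiguration described above.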
Are you running on a Windows machine and following the quickstart guide? I had the same issue: the producer and consumer gave no errors but did not run either. You need to set KAFKA_HOME in your environment variables, for example KAFKA_HOME=C:\kafka_2.13-2.6.0,
and then run everything (ZooKeeper, server, topic, consumer, producer) from the scripts under bin\windows in your Kafka directory.
Example, for the consumer:
%KAFKA_HOME%/bin/windows/kafka-console-consumer.bat --topic quickstart-events --from-beginning --bootstrap-server localhost:9092
Related
I am working on a scala application in which I am using kafka. My kafka consumer code is as follows.
def getValues(topic: String): String = {
val props = new Properties()
props.put("group.id", "test")
props.put("bootstrap.servers", "localhost:9092")
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("auto.offset.reset", "earliest")
val consumer: KafkaConsumer[String, String] = new KafkaConsumer[String, String](props)
val topicPartition = new TopicPartition(topic, 0)
consumer.assign(util.Collections.singletonList(topicPartition))
val offset = consumer.position(topicPartition) - 1
val record = consumer.poll(Duration.ofMillis(500)).asScala
for (data <- record)
if(data.offset() == offset) val value = data.value()
return value
}
Here I just want to return the latest value. When I run my application I get the following log:
Resetting offset for partition topic-0 to offset 0
Because of this, val offset = consumer.position(topicPartition) - 1 becomes -1, and data.offset() yields all the offsets in the partition. As a result I don't get the latest value. Why is it automatically resetting the offset to 0? How can I correct it? What is the mistake in my code? Or is there any other way I can get the value from the latest offset?
You are looking for the seek method which - according to the JavaDocs - "overrides the fetch offsets that the consumer will use on the next poll(timeout)".
Also make sure that you are setting
props.put("auto.offset.reset", "latest")
Making those two amendments to your code, the following worked for me to fetch only the value at the latest offset of partition 0 in the selected topic:
import java.time.Duration
import java.util.Properties
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition
import collection.JavaConverters._
def getValues(topic: String): String = {
val props = new Properties()
props.put("group.id", "test")
props.put("bootstrap.servers", "localhost:9092")
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("auto.offset.reset", "latest")
val consumer: KafkaConsumer[String, String] = new KafkaConsumer[String, String](props)
val topicPartition = new TopicPartition(topic, 0)
consumer.assign(java.util.Collections.singletonList(topicPartition))
val offset = consumer.position(topicPartition) - 1
consumer.seek(topicPartition, offset)
val record = consumer.poll(Duration.ofMillis(500)).asScala
// You are only reading one message if no new messages flow into the Kafka topic.
// Take the value of the last record fetched; an empty string is returned if the poll fetched nothing.
record.lastOption.map(_.value()).getOrElse("")
}
In this line, props.put("auto.offset.reset", "earliest"), you set the parameter auto.offset.reset of your Kafka consumer to earliest, which will reset the offset to earliest. If you want the latest value, you should use latest instead.
You can find the documentation here.
I'm new to Kafka and want to use Kafka 2.3 to implement a producer/consumer app.
I have downloaded and installed Kafka 2.3 on my Ubuntu server.
I found some code online and built it on my laptop in IDEA, but the consumer doesn't receive anything.
I checked on my server that the topic exists.
I used kafka-console-consumer to check this topic and got its messages successfully, but my own consumer gets nothing.
So what's wrong with my consumer?
Producer
package com.phitrellis.tool
import java.util.Properties
import java.util.concurrent.{Future, TimeUnit}
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.clients.producer._
object MyKafkaProducer extends App {
def createKafkaProducer(): Producer[String, String] = {
val props = new Properties()
props.put("bootstrap.servers", "*:9092")
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("producer.type", "async")
props.put("acks", "all")
new KafkaProducer[String, String](props)
}
def writeToKafka(topic: String): Unit = {
val producer = createKafkaProducer()
val record = new ProducerRecord[String, String](topic, "key", "value22222222222")
println("start")
producer.send(record)
producer.close()
println("end")
}
writeToKafka("phitrellis")
}
Consumer
package com.phitrellis.tool
import java.util
import java.util.Properties
import java.time.Duration
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer
object MyKafkaConsumer extends App {
def createKafkaConsumer(): KafkaConsumer[String, String] = {
val props = new Properties()
props.put("bootstrap.servers", "*:9092")
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
// props.put("auto.offset.reset", "latest")
props.put("enable.auto.commit", "true")
props.put("auto.commit.interval.ms", "1000")
props.put("group.id", "test")
new KafkaConsumer[String, String](props)
}
def consumeFromKafka(topic: String) = {
val consumer: KafkaConsumer[String, String] = createKafkaConsumer()
consumer.subscribe(util.Arrays.asList(topic))
while (true) {
val records = consumer.poll(Duration.ofSeconds(2)).asScala.iterator
println("true")
for (record <- records){
print(record.value())
}
}
}
consumeFromKafka("phitrellis")
}
Two lines in your consumer code are crucial:
props.put("auto.offset.reset", "latest")
props.put("group.id", "test")
To read from the beginning of the topic you have to set auto.offset.reset to earliest (latest causes you to skip messages produced before your consumer started).
group.id is responsible for group management. If you start processing data with some group.id and then restart your application, or start a new one with the same group.id, it resumes from the group's committed offsets, so only messages that have not been consumed yet will be read.
For your tests I would suggest setting auto.offset.reset to earliest and changing group.id:
props.put("auto.offset.reset", "earliest")
props.put("group.id", "test123")
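For repeated test runs, one way to avoid editing group.id by hand is to generate a fresh one each time. A minimal sketch, assuming a hypothetical helper `testConsumerProps` and a `test-` prefix, both chosen only for illustration:

```scala
import java.util.{Properties, UUID}

object TestConsumerProps {
  // Hypothetical helper: every call yields properties with a brand-new group.id.
  def testConsumerProps(bootstrapServers: String): Properties = {
    val props = new Properties()
    props.put("bootstrap.servers", bootstrapServers)
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    // A new group has no committed offsets, so this reset policy decides
    // where reading starts: at the beginning of each partition.
    props.put("auto.offset.reset", "earliest")
    props.put("group.id", s"test-${UUID.randomUUID()}")
    props
  }
}
```

This is only meant for experiments; a long-running application would normally keep a stable group.id so that it can resume from its committed offsets.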
Additionally:
Remember that KafkaProducer::send returns a Future<RecordMetadata> and that messages are sent asynchronously. If your program finishes before the Future completes, the messages might not be sent.
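One way to rule out the asynchronous send as the cause is to block on the returned Future. A minimal sketch, assuming the kafka-clients library is on the classpath; the object and method name `sendAndWait` and the 10 second timeout are illustrative choices, not part of the original code:

```scala
import java.util.concurrent.TimeUnit
import org.apache.kafka.clients.producer.{Producer, ProducerRecord, RecordMetadata}

object SendAndWait {
  // Hypothetical helper: sends one record and blocks until the send completes
  // or the timeout expires. A failed send surfaces as an exception from get().
  def sendAndWait(producer: Producer[String, String],
                  topic: String,
                  key: String,
                  value: String): RecordMetadata = {
    val future = producer.send(new ProducerRecord[String, String](topic, key, value))
    future.get(10, TimeUnit.SECONDS)
  }
}
```

Blocking on every record is slow, so this is better suited to debugging than to production code. Calling producer.flush() before the program exits is another way to wait for buffered records.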
There are two parts here: the producing side and the consuming side.
You don't say anything about the producer, so we're assuming it did work. However, did you check on the servers? You could check the kafka log files to see if there's any data on those particular topic/partitions.
On the consumer side, to validate, you should try to consume using the command-line from that same topic, to make sure the data is in there. Look for "Kafka Consumer Console" at the following link, and follow those steps.
http://cloudurable.com/blog/kafka-tutorial-kafka-from-command-line/index.html
If there is data on the topic, then running that command should get you data. If there is not, it will just "hang" because it's waiting for data to be written to the topic.
In addition, you can try producing to the same topic using those command line tools, to make sure your cluster is configured correctly, you have the right addresses and ports, that the ports are not blocked, etc.
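The address and port check can also be done from the client machine with a plain TCP connection attempt. A minimal sketch using only the JDK; the names `PortCheck` and `canConnect` are hypothetical:

```scala
import java.net.{InetSocketAddress, Socket}
import scala.util.Try

object PortCheck {
  // Returns true if a TCP connection to host:port can be opened within timeoutMs.
  def canConnect(host: String, port: Int, timeoutMs: Int): Boolean = {
    val socket = new Socket()
    try {
      Try(socket.connect(new InetSocketAddress(host, port), timeoutMs)).isSuccess
    } finally {
      socket.close()
    }
  }
}
```

A successful connection only shows that the port is open. The broker may still advertise a different address to clients, so a passing check does not prove that producing and consuming will work.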
I am trying to use Apache Kafka through a Vagrant machine to run a simple Kafka consumer program. The program gets stuck before the for loop, when it calls the .poll(100) method.
I did a lot of digging into the deeper classes while debugging, but have not found much.
val TOPIC="testTopic"
val props = new Properties()
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.56.10:9092")
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
props.put(ConsumerConfig.GROUP_ID_CONFIG, UUID.randomUUID().toString());
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
val consumer = new KafkaConsumer[String, String](props)
consumer.subscribe(util.Collections.singletonList(TOPIC))
while(true) {
println("Test")
val records = consumer.poll(100)
for (record <- records.asScala) {
println(record)
}
println("Test2")
}
}
Currently it outputs Test and then gets stuck with no error message. I expected it to output the contents of the Kafka topic.
You need to upgrade your kafka-clients version to 2.0.0 or above. With older clients, if the Kafka server is down (for example), a call to the poll method of the KafkaConsumer class gets stuck in an internal loop waiting for the broker to become available again.
According to KIP-266, as quoted in the KafkaConsumer Javadoc:
ConsumerRecords<K,V> poll(long timeout)
Deprecated. Since 2.0. Use poll(Duration), which does not block beyond the timeout awaiting partition assignment. See KIP-266 for more information.
In your case:
import org.apache.kafka.clients.consumer.KafkaConsumer
import java.time.Duration
// ...
// poll(Duration) takes a java.time.Duration, not a scala.concurrent.duration.Duration
val timeout = Duration.ofMillis(100)
while(true) {
println("Test")
val records = consumer.poll(timeout)
for (record <- records.asScala) {
println(record)
}
println("Test2")
}
//...
In conclusion, you just need to use the new version of the KafkaConsumer class and pass the timeout to the new poll method as a java.time.Duration instance.
I want to test a kafka example, the producer:
object ProducerApp extends App {
val topic = "topicTest"
val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
props.put(ConsumerConfig.GROUP_ID_CONFIG, "consumer")
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
val producer = new KafkaProducer[String, String](props)
for(i <- 0 to 125000)
{
val record = new ProducerRecord(topic, "key "+i,new PMessage())
producer.send(record)
}
}
The consumer:
object ConsumerApp extends App {
val topic = "topicTest"
val properties = new Properties
properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
properties.put(ConsumerConfig.GROUP_ID_CONFIG, "consumer")
properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false")
properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
properties.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
properties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
val consumer = new KafkaConsumer[String, String](properties)
consumer.subscribe(scala.List(topic).asJava)
while (true) {
consumer.seekToBeginning(consumer.assignment())
val records:ConsumerRecords[String,String] = consumer.poll(20000)
println("records size "+records.count())
}
}
The topic "topicTest" is created with 1 partition.
The expected result is:
...
records size 125000
records size 125000
records size 125000
records size 125000
...
but the obtained result is:
...
records size 778
records size 778
records size 778
records size 778
...
The consumer does not read all the records from the topic, and I want to understand why. If the number of records is smaller (20, for example), it works fine and the consumer reads all of them. Is the size of the topic limited?
Is there a Kafka configuration change that allows processing a large number of records?
The consumer has a max.poll.records parameter, which defaults to 500 in Kafka 1.0.0, so a single poll cannot return all 125000 records.
That is why it works with 20 records, although the 778 you are getting is strange.
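Because a single poll returns a bounded batch, the total has to be accumulated over several polls. A minimal sketch of that loop in plain Scala: `pollBatch` is a hypothetical stand-in for something like `() => consumer.poll(Duration.ofSeconds(1)).asScala.toSeq`, and stopping at the first empty batch is a simplifying assumption (an empty poll does not prove that the topic is exhausted).

```scala
import scala.annotation.tailrec

object DrainTopic {
  // Calls pollBatch repeatedly and sums the batch sizes
  // until a batch comes back empty.
  def countAll[A](pollBatch: () => Seq[A]): Int = {
    @tailrec
    def loop(total: Int): Int = {
      val batch = pollBatch()
      if (batch.isEmpty) total else loop(total + batch.size)
    }
    loop(0)
  }
}
```

Note that the loop in the question calls seekToBeginning before every poll, so each poll starts over from the first record and the count never grows. With an accumulating loop like this one, the seek would happen once, before the loop.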
I am playing with Spark Streaming and Kafka (with the Scala API), and would like to read messages from a set of Kafka topics with Spark Streaming.
The following method:
val kafkaParams = Map("metadata.broker.list" -> configuration.getKafkaBrokersList(), "auto.offset.reset" -> "smallest")
KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)
reads from Kafka up to the latest available offset, but doesn't give me the metadata that I need: since I am reading from a set of topics, I need to know the topic of every message I read. This other method, KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder, Tuple2[String, String]](ssc, kafkaParams, currentOffsets, messageHandler), explicitly wants offsets that I don't have.
I know that there is this shell command that gives you the last offset:
kafka-run-class.sh kafka.tools.GetOffsetShell
--broker-list <broker>:<port>
--topic <topic-name> --time -1 --offsets 1
KafkaCluster.scala is a developer API that used to be public and gives exactly what I would like.
Any hints?
You can use the code from GetOffsetShell.scala (see the Kafka API documentation):
val consumer = new SimpleConsumer(leader.host, leader.port, 10000, 100000, clientId)
val topicAndPartition = TopicAndPartition(topic, partitionId)
val request = OffsetRequest(Map(topicAndPartition -> PartitionOffsetRequestInfo(time, nOffsets)))
val offsets = consumer.getOffsetsBefore(request).partitionErrorAndOffsets(topicAndPartition).offsets
Or you can create a new consumer with a unique groupId and use it to get the first offset:
val consumer=new KafkaConsumer[String, String](createConsumerConfig(config.brokerList))
consumer.partitionsFor(config.topic).foreach(pi => {
val topicPartition = new TopicPartition(pi.topic(), pi.partition())
consumer.assign(List(topicPartition))
consumer.seekToBeginning()
val firstOffset = consumer.position(topicPartition)
...