Kafka Streams not writing expected result after countByKey - apache-kafka

Using Kafka Streams (version 0.10.0.1) and a Kafka broker (0.10.0.1), I'm trying to generate counts based on message keys. I produce my messages with the following command:
./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic kafka-streams-topic --property parse.key=true --property key.separator=,
When I run the above command I can send a key and value like this:
1,{"value":10}
This will send a message to kafka that has a key = 1 and a value = {"value":10}.
My goal is to then count how many messages have the key=1. Given the above commands the count would be 1.
Here is the code that I am using:
import java.util.Properties;

import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class StreamProcessor {

    public static void main(String[] args) {
        KStreamBuilder builder = new KStreamBuilder();

        final Serde<Long> longSerde = Serdes.Long();
        final Serde<String> stringSerde = Serdes.String();

        KStream<String, String> values = builder.stream(stringSerde, stringSerde, "kafka-streams-topic");

        KStream<String, Long> counts = values
                .countByKey(stringSerde, "valueCounts")
                .toStream();

        counts.print(stringSerde, longSerde);
        counts.to(stringSerde, longSerde, "message-counts-topic");

        KafkaStreams streams = new KafkaStreams(builder, properties());
        streams.start();

        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }

    private static Properties properties() {
        final Properties streamsConfiguration = new Properties();
        streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "kafka-streams-poc");
        streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        streamsConfiguration.put(StreamsConfig.ZOOKEEPER_CONNECT_CONFIG, "localhost:2181");
        streamsConfiguration.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        streamsConfiguration.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        return streamsConfiguration;
    }
}
When I run counts.print(stringSerde, longSerde) I get:
1 , 1
Meaning that I have a key=1 and there is 1 message with that key. That is what I expect.
However, when the following line runs:
counts.to(stringSerde, longSerde, "message-counts-topic");
The topic called message-counts-topic gets a message sent to it but when I try to read the message using this command:
./bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic message-counts-topic --property print.key=true --property key.separator=, --from-beginning
I get the following output:
1 ,
Where the 1 is the key and nothing is displayed for the value. I expect to see the message 1 , 1, but for some reason the count value is lost, even though it is displayed when calling the print method.

You need to specify a different value deserializer for bin/kafka-console-consumer.sh. Add the following:
--property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer
The default String deserializer cannot correctly decode the value, which is a serialized 8-byte long.
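If you consume message-counts-topic from Java instead, the same idea applies: pair a String key deserializer with a Long value deserializer. Below is a minimal sketch (the group id is an assumption; the topic name and broker address come from the question):
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class CountReader {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "count-reader"); // hypothetical group id
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // the counts are serialized as 8-byte longs, so a LongDeserializer is required
        props.put("value.deserializer", "org.apache.kafka.common.serialization.LongDeserializer");

        KafkaConsumer<String, Long> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("message-counts-topic"));
        try {
            while (true) {
                ConsumerRecords<String, Long> records = consumer.poll(1000);
                for (ConsumerRecord<String, Long> record : records)
                    System.out.println(record.key() + " , " + record.value());
            }
        } finally {
            consumer.close();
        }
    }
}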

Related

TopicCommand.alterTopic in Kafka 2.4

I have an old project (it's not mine) and I'm trying to update it from Kafka 2.1 to 2.4.
I have the following piece of code:
public synchronized void increasePartitions(String topic, int partitions) throws InvalidPartitionsException, IllegalArgumentException {
    StringBuilder commandString = new StringBuilder();
    commandString.append("--alter");
    commandString.append(" --topic ").append(topic);
    commandString.append(" --zookeeper ").append(config.getOrDefault("zookeeper.connect", "localhost:2181"));
    commandString.append(" --partitions ").append(partitions);
    String[] command = commandString.toString().split(" ");
    TopicCommand.alterTopic(kafkaZkClient, new TopicCommand.TopicCommandOptions(command));
}
The compiler says that the alterTopic method of TopicCommand doesn't exist. I'm looking at the documentation and I don't know how to solve it.
I need this method to do the exact same thing but with Kafka version 2.4.
You should use the Admin API to perform tasks like this.
In order to add partitions, there's the createPartitions() method.
For example, to increase the number of partitions for my-topic to 10:
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
Admin admin = Admin.create(props);
Map<String, NewPartitions> newPartitions = new HashMap<>();
newPartitions.put("my-topic", NewPartitions.increaseTo(10));
CreatePartitionsResult createPartitions = admin.createPartitions(newPartitions);
createPartitions.all().get();
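If you want the same behavior wrapped in a method like the original increasePartitions, a self-contained sketch could look like this (the class and method names are illustrative, and the broker address is an assumption, since the original code only knew a ZooKeeper address):
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewPartitions;

public class PartitionIncreaser {

    // Illustrative helper; blocks until the broker confirms the change.
    public static void increasePartitions(String bootstrapServers, String topic, int partitions)
            throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // try-with-resources closes the Admin client even if the call fails
        try (Admin admin = Admin.create(props)) {
            admin.createPartitions(Collections.singletonMap(topic, NewPartitions.increaseTo(partitions)))
                 .all()
                 .get();
        }
    }

    public static void main(String[] args) throws Exception {
        increasePartitions("localhost:9092", "my-topic", 10);
    }
}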

Why is my simple Kafka Consumer example not working

I am facing issues in getting a very basic Kafka consumer to work. I am using kafka-clients-1.1.0.jar.
Here is all that I have done.
Started ZooKeeper on the command line (all commands are run from the Kafka bin\windows directory, as shown in the prompt below):
zookeeper-server-start.bat ../../config/zookeeper.properties
Started Kafka server
kafka-server-start.bat ../../config/server.properties
Created a new topic 'hellotopic' and verified it by listing the topics
kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic hellotopic
Created topic "hellotopic".
Verify by listing the topics
D:\RC\Softwares\kafka_2.12-1.1.0\kafka_2.12-1.1.0\bin\windows>kafka-topics.bat --list --zookeeper localhost:2181
hellotopic
Posted a message to the topic and verified the same on the console consumer
kafka-console-producer.bat --broker-list localhost:9092 --topic hellotopic --property "parse.key=true" --property "key.separator=:"
Message key and value entered as below
key1:value1
On the console consumer we are able to see the message in topic 'hellotopic':
kafka-console-consumer.bat --zookeeper localhost:2181 --topic hellotopic --from-beginning
The output for the above command is shown below. We can see the message value 'value1' that was posted:
Using the ConsoleConsumer with old consumer is deprecated and will be removed in a future major release. Consider using the new consumer by passing [bootstrap-server] instead of [zookeeper].
value1
Now that we have a topic with a message in it, I run my simple Java Kafka consumer code to fetch all messages in the topic 'hellotopic'. Below is the code:
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SampleConsumer {
    public static void main(String[] args) {
        System.out.println("Start consumer code");

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "test-consumer-group");
        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "1000");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("hellotopic"));

        //while (true) {
        ConsumerRecords<String, String> records = consumer.poll(100);
        for (ConsumerRecord<String, String> record : records)
            System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
        //}

        System.out.println("End consumer code");
    }
}
When we run the above class, this is the output we see:
Start consumer code
End consumer code
I have tried a lot to find the issue, but no luck yet. I would much appreciate help on this simple example.
I see two issues with the code:
You are missing a particular config that makes the consumer start from the earliest offset: props.put("auto.offset.reset", "earliest");
The --from-beginning flag in your command-line consumer actually translates to this config. It tells the consumer to start from the earliest offset if there is no committed offset found for the corresponding topic and partition within the group.
The actual poll should be in a loop. One poll may not give the consumer enough time to complete the subscription and also fetch data. One common way to do the poll is this:
try {
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(100);
        for (ConsumerRecord<String, String> record : records)
            System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
    }
} finally {
    consumer.close();
}
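Putting both fixes together, the consumer from the question would look roughly like this (a sketch that keeps the topic, group id, and other settings from the question):
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SampleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "test-consumer-group");
        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "1000");
        // fix 1: start from the earliest offset when the group has no committed offset
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("hellotopic"));
        try {
            // fix 2: poll in a loop so the consumer has time to join the group and fetch
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records)
                    System.out.printf("offset = %d, key = %s, value = %s%n",
                            record.offset(), record.key(), record.value());
            }
        } finally {
            consumer.close();
        }
    }
}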

kafka consumer is not able to produce output

I have written a Kafka consumer in Scala. When I run the consumer, nothing shows up on the console.
I have used the code below:
val topicProducer = "testOutput"

val props = new Properties()
props.put("bootstrap.servers", "host:9092,host:9092")
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("group.id", "test")

val kafkaConsumer = new KafkaConsumer[String, String](props)
val topic = Array("test").toList
kafkaConsumer.subscribe(topic)

val results = kafkaConsumer.poll(2000)
for (record <- results) {
  producer.send(new ProducerRecord(topicProducer, "key", "Value=" + record.key() + " Record Key=" + record.value() + "append"))
}
You also need to specify the auto.offset.reset property so that your consumer is able to consume the messages from the beginning (equivalent to --from-beginning in the command-line consumer):
props.put("auto.offset.reset", "earliest");
According to the Kafka docs:
auto.offset.reset: What to do when there is no initial offset in ZooKeeper or if an offset is out of range:
smallest: automatically reset the offset to the smallest offset
largest: automatically reset the offset to the largest offset
anything else: throw an exception to the consumer
(Those are the old-consumer values; for the new consumer API used in your code, the equivalent values are earliest and latest.)
EDIT:
Alternatively, if you are using the old consumer API, then instead of --bootstrap-server host:9092 use the ZooKeeper parameter --zookeeper host:2181.
If this does not solve the issue, try deleting /brokers in ZooKeeper and then restarting the Kafka nodes:
bin/zookeeper-shell <zk-host>:2181
rmr /brokers

Kafka cannot see topic information when launching kafka-consumer-groups.sh tools script

I'm using Kafka 0.9 and I would like to use the utility scripts provided in the bin folder of the Kafka installation to check some information about my group, like partitions, lags, etc.
I have clients belonging to the group "my-group" which are correctly producing/consuming to/from 2 topics:
"topic-1" and "topic-2".
Simplified, the consumer code is the following; it's really basic, with properties having more or less default values.
public void run() {
    consumer = new KafkaConsumer<>(getConsumerProperties());
    consumer.subscribe(topics);
    while (true) {
        ConsumerRecords<String, Message> records = consumer.poll(Long.MAX_VALUE);
        ...
    }
}

private Properties getConsumerProperties() {
    Properties properties = new Properties();
    properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServer);
    properties.put(ConsumerConfig.CLIENT_ID_CONFIG, clientId);
    properties.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
    properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, autoCommit);
    properties.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, autoCommitInterval.intValue());
    properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, autoOffsetReset);
    properties.put(ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG, requestTimeout.intValue());
    properties.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, sessionTimeout.intValue());
    properties.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, heartbeatInterval.intValue());
    properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, keyDeserializer);
    properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, valueDeserializer);
    return properties;
}
If I run the following script
./kafka-consumer-groups.sh --bootstrap-server localhost:1881 --new-consumer --describe --group my-group
I get partition information only for "topic-1", but no data about the other topic.
Has anyone already tried to use this script and experienced this behavior of partial results being shown?
Any help would be very much appreciated. Thanks!

How to pass Integer value to kafka producer and read it back on kafka consumer console using IntegerSerializer in Kafka

I am trying to send an Integer value through a Kafka producer using the Kafka-provided IntegerSerializer, but the integer value is not parsed correctly and is displayed as random unknown symbols on the Kafka consumer console.
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Note: the original snippet omitted the class declaration; the name below is assumed.
public class IntegerProducer {
    public static void main(String[] args) throws Exception {
        // Check arguments length value
        if (args.length == 0) {
            System.out.println("Enter topic name");
            return;
        }

        // Assign topicName to string variable
        String topicName = args[0];

        // create instance for properties to access producer configs
        Properties props = new Properties();
        // Assign localhost id
        props.put("bootstrap.servers", "localhost:9092");
        // Set acknowledgements for producer requests.
        props.put("acks", "all");
        // If the request fails, the producer can automatically retry
        props.put("retries", "0");
        // Specify batch size in config
        props.put("batch.size", "16384");
        // Wait up to 1 ms so the producer can batch records into fewer requests
        props.put("linger.ms", "1");
        // buffer.memory controls the total amount of memory available to the producer for buffering
        props.put("buffer.memory", "33554432");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.IntegerSerializer");

        KafkaProducer<String, Integer> producerRcrd = new KafkaProducer<>(props);
        producerRcrd.send(new ProducerRecord<>(topicName, "Key1", 100));
        System.out.println("Message sent successfully");
        producerRcrd.flush();
        producerRcrd.close();
    }
}
However, 100 is not shown on the Kafka consumer console.
Append
--property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer --property value.deserializer=org.apache.kafka.common.serialization.IntegerDeserializer
to your kafka-console-consumer.sh command, so that the console message formatter knows how to deserialize your message body.
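For example, assuming the topic passed as args[0] was my-topic (the real name is whatever you supplied on the command line), the full consumer invocation might look like:
./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my-topic --from-beginning --property print.key=true --property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer --property value.deserializer=org.apache.kafka.common.serialization.IntegerDeserializer
This should print Key1 as the key followed by 100 as the value.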