Kafka Producer Java API is not distributing messages to all topic partitions - apache-kafka

I am very new to Kafka and today I tried creating Java Producer for producing messages on Kafka topics on different Partitions.
First I created a package raggieKafka under which I created 2 classes: TestProducer and SimplePartitioner.
TestProducer class has following code:
package raggieKafka;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.*;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;
public class TestProducer{
public static void main(String args[]) throws Exception
{
long events = 0;
BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
events = Integer.parseInt(reader.readLine());
Random rnd = new Random();
Properties props = new Properties();
props.put("metadata.broker.list", "localhost:9092");
props.put("topic.metadata.refresh.interval.ms", "1");
props.put("serializer.class", "kafka.serializer.StringEncoder");
props.put("partitioner.class", "raggieKafka.SimplePartitioner");
props.put("request.required.acks", "1");
ProducerConfig config = new ProducerConfig(props);
Producer<String, String> prod = new Producer<String, String>(config);
for(long i = 0; i < events; i++)
{
long runtime = new Date().getTime();
String ip = "192.168.2." + rnd.nextInt(255);
String msg = runtime + ",www.example.com, " + ip;
KeyedMessage<String,String> data = new KeyedMessage<String, String>("page_visits", ip, msg);
prod.send(data);
}
prod.close();
}
}
and SimplePartitioner class has following code:
package raggieKafka;
import kafka.producer.Partitioner;
import kafka.utils.VerifiableProperties;
public class SimplePartitioner implements Partitioner{
public SimplePartitioner(VerifiableProperties props)
{
}
public int partition(Object Key, int a_numPartitions)
{
int partition = 0;
String stringKey = (String) Key;
int offset = stringKey.indexOf(stringKey);
if(offset > 0)
{
partition = Integer.parseInt(stringKey.substring(offset+1)) % a_numPartitions;
}
return partition;
}
}
Before compiling these Java program I created topic on Kafka Broker:
C:\kafka_2.11-0.9.0.1>.\bin\windows\kafka-topics.bat --create --topic page_visit
s --zookeeper localhost:2181 --partitions 5 --replication-factor 1
WARNING: Due to limitations in metric names, topics with a period ('.') or under
score ('_') could collide. To avoid issues it is best to use either, but not bot
h.
Created topic "page_visits".
Now when I compile the java program it puts all the messages to only 1 partition i.e. page_visits-0 under which all the messages are posted however rest all other partitions remain empty.
Can someone tell me why my Java producer is NOT distributing all my messaged to other partitions ?
Infact, I looked on google and then added one more property:
props.put("topic.metadata.refresh.interval.ms", "1");
but still Producer isn't producing messages to all the topics.
PLEASE HELP.

Your SimplePartitioner code has bug in the following line
int offset = stringKey.indexOf(stringKey);
It always returns 0 so your offset always equals to 0 and as it never greater than 0 your if block will not get executed. And finally it always returns you partition 0.
Solution: As your key is ip address,the following change could work as expected.
int offset = stringKey.lastIndexOf('.');
Hope this helps!

Related

Kafka Java Producer API unable to serialize key as Long or Int

Here is the Java code for producing data in Kafka:
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.LongSerializer;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;
public class ExampleClass {
private final static String TOPIC = "my-example-topic";
private final static String BOOTSTRAP_SERVERS = "confbroker:9092";
private static Producer<Long, String> createProducer() {
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_SERVERS);
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, LongSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
return new KafkaProducer<>(props);
}
private static void runProducer() throws Exception {
final Producer<Long, String> producer = createProducer();
long sensorId = 1001L;
try {
for (long index = sensorId; index < sensorId + 5; index++) {
final ProducerRecord<Long, String> record = new ProducerRecord<>(TOPIC, index, "This is sensor no: " + index);
RecordMetadata metadata = producer.send(record).get();
System.out.printf("sent record(key=%s value=%s) " + "meta(partition=%d, offset=%d)\n", record.key(),
record.value(), metadata.partition(), metadata.offset());
}
} finally {
producer.flush();
producer.close();
}
}
public static void main(String... args) throws Exception {
runProducer();
}
}
When running console consumer in Confluent 5.4.0, I am getting result as:
The key is gibberish.
How can I produce Key of either Int or Long type.
PS:
=> Same result in Confluent 5.5 also.
=> Same result with IntegerSerializer.
The console consumer uses StringDeserialisers as default for the key and the value. If you want to deserialise the key as Long you have to explicitly mention that in your console-consumer command:
--property key.deserializer org.apache.kafka.common.serialization.LongDeserializer

How to read all the records in a Kafka topic

I am using kafka : kafka_2.12-2.1.0, spring kafka on client side and have got stuck with an issue.
I need to load an in-memory map by reading all the existing messages within a kafka topic. I did this by starting a new consumer (with a unique consumer group id and setting the offset to earliest). Then I iterate over the consumer (poll method) to get all messages and stop when the consumer records become empty.
But I noticed that, when I start polling, the first few iterations return consumer records as empty and then it starts returning the actual records. Now this breaks my logic as our code thinks there are no records in the topic.
I have tried few other ways (like using offsets number) but haven't been able to come up with any solution, apart from keeping another record somewhere which tells me how many messages there are in the topic which needs to be read before I stop.
Any idea's please ?
To my understanding, what you are trying to achieve is to have a map constructed in your application based on the values that are already in a specific Topic.
For this task, instead of manually polling the topic, you can use Ktable in Kafka Streams DSL which will automatically construct a readable key-value store which is fault tolerant, replication enabled and automatically filled with new values.
You can do this simply by calling groupByKey on a stream and then using the aggregate.
KStreamBuilder builder = new KStreamBuilder();
KStream<String, Long> myKStream = builder.stream(Serdes.String(), Serdes.Long(), "topic_name");
KTable<String, Long> totalCount = myKStream.groupByKey().aggregate(this::initializer, this::aggregator);
(The actual code may vary depending on the kafka version, your configurations, etc..)
Read more about Kafka Stream concepts here
Then I iterate over the consumer (poll method) to get all messages and stop when the consumer records become empty
Kafka is a message streaming platform. Any data you stream is being updated continuously and you probably should not use it in a way that you expect the consuming to stop after a certain number of messages. How will you handle if a new message comes in after you stop the consumer?
Also the reason you are getting null records maybe probably related to records being in different partitions, etc..
What is your specific use case here?, There might be a good way to do it with the Kafka semantics itself.
You have to use 2 consumers one to load the offsets and another one to read all the records.
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import java.time.Duration;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Properties;
import java.util.Set;
import java.util.stream.Collectors;
public class KafkaRecordReader {
static final Map<String, Object> props = new HashMap<>();
static {
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class);
props.put(ConsumerConfig.CLIENT_ID_CONFIG, "sample-client");
}
public static void main(String[] args) {
final Map<TopicPartition, OffsetInfo> partitionOffsetInfos = getOffsets(Arrays.asList("world, sample"));
final List<ConsumerRecord<byte[], byte[]>> records = readRecords(partitionOffsetInfos);
System.out.println(partitionOffsetInfos);
System.out.println("Read : " + records.size() + " records");
}
private static List<ConsumerRecord<byte[], byte[]>> readRecords(final Map<TopicPartition, OffsetInfo> offsetInfos) {
final Properties readerProps = new Properties();
readerProps.putAll(props);
readerProps.put(ConsumerConfig.CLIENT_ID_CONFIG, "record-reader");
final Map<TopicPartition, Boolean> partitionToReadStatusMap = new HashMap<>();
offsetInfos.forEach((tp, offsetInfo) -> {
partitionToReadStatusMap.put(tp, offsetInfo.beginOffset == offsetInfo.endOffset);
});
final List<ConsumerRecord<byte[], byte[]>> cachedRecords = new ArrayList<>();
try (final KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(readerProps)) {
consumer.assign(offsetInfos.keySet());
for (final Map.Entry<TopicPartition, OffsetInfo> entry : offsetInfos.entrySet()) {
consumer.seek(entry.getKey(), entry.getValue().beginOffset);
}
boolean close = false;
while (!close) {
final ConsumerRecords<byte[], byte[]> consumerRecords = consumer.poll(Duration.ofMillis(100));
for (final ConsumerRecord<byte[], byte[]> record : consumerRecords) {
cachedRecords.add(record);
final TopicPartition currentTp = new TopicPartition(record.topic(), record.partition());
if (record.offset() + 1 == offsetInfos.get(currentTp).endOffset) {
partitionToReadStatusMap.put(currentTp, true);
}
}
boolean done = true;
for (final Map.Entry<TopicPartition, Boolean> entry : partitionToReadStatusMap.entrySet()) {
done &= entry.getValue();
}
close = done;
}
}
return cachedRecords;
}
private static Map<TopicPartition, OffsetInfo> getOffsets(final List<String> topics) {
final Properties offsetReaderProps = new Properties();
offsetReaderProps.putAll(props);
offsetReaderProps.put(ConsumerConfig.CLIENT_ID_CONFIG, "offset-reader");
final Map<TopicPartition, OffsetInfo> partitionOffsetInfo = new HashMap<>();
try (final KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(offsetReaderProps)) {
final List<PartitionInfo> partitionInfos = new ArrayList<>();
topics.forEach(topic -> partitionInfos.addAll(consumer.partitionsFor("sample")));
final Set<TopicPartition> topicPartitions = partitionInfos
.stream()
.map(x -> new TopicPartition(x.topic(), x.partition()))
.collect(Collectors.toSet());
consumer.assign(topicPartitions);
final Map<TopicPartition, Long> beginningOffsets = consumer.beginningOffsets(topicPartitions);
final Map<TopicPartition, Long> endOffsets = consumer.endOffsets(topicPartitions);
for (final TopicPartition tp : topicPartitions) {
partitionOffsetInfo.put(tp, new OffsetInfo(beginningOffsets.get(tp), endOffsets.get(tp)));
}
}
return partitionOffsetInfo;
}
private static class OffsetInfo {
private final long beginOffset;
private final long endOffset;
private OffsetInfo(long beginOffset, long endOffset) {
this.beginOffset = beginOffset;
this.endOffset = endOffset;
}
#Override
public String toString() {
return "OffsetInfo{" +
"beginOffset=" + beginOffset +
", endOffset=" + endOffset +
'}';
}
#Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
OffsetInfo that = (OffsetInfo) o;
return beginOffset == that.beginOffset &&
endOffset == that.endOffset;
}
#Override
public int hashCode() {
return Objects.hash(beginOffset, endOffset);
}
}
}
Adding to the above answer from #arshad, the reason you are not getting the records is because you have already read them. See this answer here using earliest or latest does not matter on the consumer after you have a committed offset for the partition
I would use a seek to the beginning or the particular offset if you knew the starting offset.

How to send and get back data from Kafka in a single API call

I would like to send and get back data from Kafka in a single API call (see diagram below).
Is this possible? I already know how to make the data go in one direction (e.g., Spark Streaming reads data using the Kafka consumer API). I also know how to sort of 'fake it' by doing two one-way approaches (e.g., the web app is both a producer and consumer). However, when the web app makes an API call, I only want it to have to deal with its own record, not all of the records in the topic, so this seems like the wrong approach.
Other sub-optimal approaches I've thought of:
Save the Spark Streaming result in a database so that the web app can constantly poll the database until the result shows up. I'm worried that this could consume a lot of resources and delay response time.
Create short-lived / temporary consumers each time I call the Kafka producer. The temporary consumer would filter out all records except for the one it's looking for. When it finds the record it's looking for, the temporary consumer shuts down. I don't think this would work because the record the API caller cares about might go to a different partition, and so it would never be found.
Make a temporary topic for each of the web app's consumer API calls. I'm not sure if Kafka will complain about too many topics though.
Any advice?
What I did is....
Create a synProducer who send the data with a key and creates a consumer for a topic that has the name as the key of the sent message.
Then a synConsumer handle the message and reply to a topic, where the consumer at step 1 is waiting.
Delete the temporary topic
The downside of this approach is that the issues are not deleted immediately .
I will suggest you the third one, but with 2 topics: 1 for the request and 1 for the response. This is an example:
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;
public class ConsumerGroupExample extends Thread {
private final ConsumerConnector consumer;
private final String topic;
private ConsumerIterator<byte[], byte[]> it;
private String mensaje="";
public ConsumerGroupExample(Properties props, String a_topic)
{
consumer = kafka.consumer.Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
this.topic = a_topic;
Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
topicCountMap.put(topic, 1);
Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumer.createMessageStreams(topicCountMap);
List<KafkaStream<byte[], byte[]>> streams = consumerMap.get(topic);
KafkaStream stream = streams.get(0);
it = stream.iterator();
}
public void shutdown()
{
if (consumer != null) consumer.shutdown();
}
public void run()
{
if (it.hasNext())
{
mensaje = new String(it.next().message());
}
System.out.println( mensaje );
}
public String getMensaje()
{
return this.mensaje;
}
public static void main(String[] args) {
Properties props = new Properties();
props.put("zookeeper.connect", "localhost:2181");
props.put("group.id", "Group");
props.put("zookeeper.session.timeout.ms", "400");
props.put("zookeeper.sync.time.ms", "200");
props.put("auto.commit.interval.ms", "1000");
props.put("consumer.timeout.ms", "10000");
ConsumerGroupExample example = new ConsumerGroupExample( props, "topicFoRResponse");
props = new Properties();
props.put("metadata.broker.list", "localhost:9092");
props.put("serializer.class", "kafka.serializer.StringEncoder");
props.put("request.required.acks", "1");
ProducerConfig config = new ProducerConfig(props);
example.start();
try {
Producer<String, String> colaParaEscritura;
KeyedMessage<String, String> data = new KeyedMessage<String, String>("topicForRequest", " message ");
colaParaEscritura = new kafka.javaapi.producer.Producer<String, String>(config);
colaParaEscritura.send(data);
System.out.println("enviado");
colaParaEscritura.close();
example.join();
System.out.println( "final"+ example.getMensaje() );
}
catch (InterruptedException ie) {
}
example.shutdown();
}
}

apache kafka throwing an exception for scala

I am trying to compile and run a simple kafka code that is a sample from Aapche.When compiling I am getting the following exception, even after adding all the lib files for scala (i guess).
Exception in thread "main" java.lang.NullPointerException
at scala.Predef$.Integer2int(Predef.scala:303)
at kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:103)
at kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:102)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:194)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:194)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:44)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:194)
at scala.collection.mutable.ArrayBuffer.map(ArrayBuffer.scala:44)
at kafka.client.ClientUtils$.parseBrokerList(ClientUtils.scala:102)
at kafka.producer.BrokerPartitionInfo.<init>(BrokerPartitionInfo.scala:32)
at kafka.producer.async.DefaultEventHandler.<init>(DefaultEventHandler.scala:41)
at kafka.producer.Producer.<init>(Producer.scala:60)
at kafka.javaapi.producer.Producer.<init>(Producer.scala:26)
at kafkaTest.TestProducer.main(TestProducer.java:23)
This is my program:
package kafkaTest;
import java.util.*;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;
public class TestProducer {
public static void main(String[] args) {
// long events = Long.parseLong(args[0]);
long events = 10l;
Random rnd = new Random();
Properties props = new Properties();
props.put("metadata.broker.list", "broker1:9092,broker2:9092 ");
props.put("serializer.class", "kafka.serializer.StringEncoder");
***![props.put("partitioner.class", "kafkaTest.SimplePartitioner");][1]***//this is line no 23
props.put("request.required.acks", "1");
ProducerConfig config = new ProducerConfig(props);
Producer<String, String> producer = new Producer<String, String>(config);
for (long nEvents = 0; nEvents < events; nEvents++) { long runtime =
new Date().getTime(); String ip = "192.168.2.1" + rnd.nextInt(255);
String msg = runtime + ",www.example.com," + ip; KeyedMessage<String,
String> data = new KeyedMessage<String, String>("page_visits", ip,
msg); producer.send(data); }
producer.close();
}
}
The attached is the screen shot of library files.
Please let me know the cause of error/exception.
Edit: this is SimplePartitioner.java
package kafkaTest;
import kafka.producer.Partitioner;
import kafka.utils.VerifiableProperties;
public class SimplePartitioner implements Partitioner {
public SimplePartitioner(VerifiableProperties props) {
}
public int partition(Object key, int a_numPartitions) {
int partition = 0;
String stringKey = (String) key;
int offset = stringKey.lastIndexOf('.');
if (offset > 0) {
partition = Integer.parseInt(stringKey.substring(offset + 1))
% a_numPartitions;
}
return partition;
}
}
There's a space at the end of your broker list :
props.put("metadata.broker.list", "broker1:9092,broker2:9092 ");
Remove it and it should work fine then :
props.put("metadata.broker.list", "broker1:9092,broker2:9092");
I also got this error when metadata.broker.list has a broker with no port number.

Apache Kafka Default Encoder Not Working

I am using Kafka 0.8 beta, and I am just trying to mess around with sending different objects, serializing them using my own encoder, and sending them to an existing broker configuration. For now I am trying to get just DefaultEncoder working.
I have the broker and everything setup and working for StringEncoder, but I am not able to get any other data type, including just pure byte[], to be sent and received by the broker.
My code for the Producer is:
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;
import java.util.Date;
import java.util.Properties;
import java.util.Random;
public class ProducerTest {
public static void main(String[] args) {
long events = 5;
Random rnd = new Random();
rnd.setSeed(new Date().getTime());
Properties props = new Properties();
props.setProperty("metadata.broker.list", "localhost:9093,localhost:9094");
props.setProperty("serializer.class", "kafka.serializer.DefaultEncoder");
props.setProperty("partitioner.class", "example.producer.SimplePartitioner");
props.setProperty("request.required.acks", "1");
props.setProperty("producer.type", "async");
props.setProperty("batch.num.messages", "4");
ProducerConfig config = new ProducerConfig(props);
Producer<byte[], byte[]> producer = new Producer<byte[], byte[]>(config);
for (long nEvents = 0; nEvents < events; nEvents++) {
byte[] a = "Hello".getBytes();
byte[] b = "There".getBytes();
KeyedMessage<byte[], byte[]> data = new KeyedMessage<byte[], byte[]>("page_visits", a, b);
producer.send(data);
}
try {
Thread.sleep(5000);
} catch (InterruptedException e) {
e.printStackTrace();
}
producer.close();
}
}
I used the same SimplePartitioner as in the example given here, and replacing all the byte arrays by Strings and changing the serializer to kafka.serializer.StringEncoder works perfectly.
For reference, SimplePartitioner:
import kafka.producer.Partitioner;
import kafka.utils.VerifiableProperties;
public class SimplePartitioner implements Partitioner<String> {
public SimplePartitioner (VerifiableProperties props) {
}
public int partition(String key, int a_numPartitions) {
int partition = 0;
int offset = key.lastIndexOf('.');
if (offset > 0) {
partition = Integer.parseInt( key.substring(offset+1)) % a_numPartitions;
}
return partition;
}
}
What am I doing wrong?
The answer is that the partitioning class SimplePartitioner is applicable only for Strings. When I try to run the Producer asynchronously, it creates a separate thread that handles the encoding and partitioning before sending to the broker. This thread hits a roadblock when it realizes that SimplePartitioner works only for Strings, but because it's a separate thread, no Exceptions are thrown, and so the thread just exits without any indication of wrongdoing.
If we change the SimplePartitioner to accept byte[], for instance:
import kafka.producer.Partitioner;
import kafka.utils.VerifiableProperties;
public class SimplePartitioner implements Partitioner<byte[]> {
public SimplePartitioner (VerifiableProperties props) {
}
public int partition(byte[] key, int a_numPartitions) {
int partition = 0;
return partition;
}
}
This works perfectly now.