Why are all producer messages sent to one partition? - apache-kafka

I have a topic first_topic with 3 partitions.
Here is my code. I send 55 messages to a consumer (which is running in cmd); the code below logs the partition each message was sent to. Every time I launch the code, all the messages go to one partition only (which is picked seemingly at random); it may be partition 0, 1 or 2.
Why doesn't round-robin work here? I do not specify a key, so I expect it should apply.
Logger logger = LoggerFactory.getLogger(Producer.class);

Properties properties = new Properties();
properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092");
properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

KafkaProducer<String, String> producer = new KafkaProducer<>(properties);

for (int i = 0; i < 55; i++) {
    producer.send(new ProducerRecord<>("first_topic", "Hello world partitionCheck " + i), (recordMetadata, e) -> {
        // executes every time a record is sent or an exception is thrown
        if (e == null) {
            // record was successfully sent
            logger.info("metaData: " +
                    "topic " + recordMetadata.topic() +
                    " Offset " + recordMetadata.offset() +
                    " TimeStamp " + recordMetadata.timestamp() +
                    " Partition " + recordMetadata.partition());
        } else {
            logger.error(e.toString());
        }
    });
}
producer.flush();
producer.close();

As per theory, if you are not specifying any custom partitioner, it will use the default partitioner as per the rules below:
If a partition is specified in the record, use that partition to publish.
If no partition is specified but a key is present, choose a partition based on a hash of the key.
If neither a partition nor a key is present, choose a partition in a round-robin fashion.
Can you confirm how you are checking this? "Every time I launch the code all the messages go to one partition only (that is picked randomly), it may be partition 0, 1 or 2."
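Worth noting: with kafka-clients 2.4 and later, records without a key are assigned by a "sticky" partitioner that keeps a whole batch on a single partition, which can produce exactly the behavior described. If strict round-robin is needed, the partitioner can be configured explicitly; a minimal sketch (configuration key and class from the standard producer API, kafka-clients 2.4+ assumed):
Properties properties = new Properties();
properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092");
properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
// Distribute keyless records across partitions one record at a time instead of per batch.
properties.setProperty(ProducerConfig.PARTITIONER_CLASS_CONFIG,
        "org.apache.kafka.clients.producer.RoundRobinPartitioner");

KafkaProducer<String, String> producer = new KafkaProducer<>(properties);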

Related

Kafka streams join: How to wait a duration of time before emitting records?

We currently have 2 Kafka stream topics that have records coming in continuously. We're looking into joining the 2 streams based on a key after waiting for a window of 5 minutes but with my current code, I see records being emitted immediately without "waiting" to see if a matching record arrives in the other stream. My current implementation:
KStream<String, String> streamA = builder.stream(topicA, Consumed.with(Serdes.String(), Serdes.String()))
        .peek((key, value) -> System.out.println("Stream A incoming record key " + key + " value " + value));

KStream<String, String> streamB = builder.stream(topicB, Consumed.with(Serdes.String(), Serdes.String()))
        .peek((key, value) -> System.out.println("Stream B incoming record key " + key + " value " + value));

ValueJoiner<String, String, String> recordJoiner = (recordA, recordB) -> {
    if (recordA != null) {
        return recordA;
    } else {
        return recordB;
    }
};

KStream<String, String> combinedStream = streamA.join(
        streamB,
        recordJoiner,
        JoinWindows.of(Duration.ofMinutes(5)),
        StreamJoined.with(
                Serdes.String(),
                Serdes.String(),
                Serdes.String()))
        .peek((key, value) -> System.out.println("Stream-Stream Join record key " + key + " value " + value));

combinedStream.to("test-topic",
        Produced.with(
                Serdes.String(),
                Serdes.String()));

KafkaStreams kafkaStreams = new KafkaStreams(builder.build(), streamsConfiguration);
kafkaStreams.start();
Although I have the JoinWindows.of(Duration.ofMinutes(5)), I see some records being emitted immediately. How do I ensure they are not?
Additionally, is this the most efficient way of joining 2 Kafka streams or is it better to come up with our own consumer implementation that reads from 2 streams etc?

Kafka Consumer Poll runs indefinitely and doesn't return anything

I am facing difficulty with KafkaConsumer.poll(Duration timeout): it runs indefinitely and never comes out of the method. I understand this could be related to the connection, and I have seen it behave a bit inconsistently at times. How do I handle this if poll stops responding? Given below is the snippet from KafkaConsumer.poll():
public ConsumerRecords<K, V> poll(final Duration timeout) {
return poll(time.timer(timeout), true);
}
and I am calling the above from here :
Duration timeout = Duration.ofSeconds(30);
while (true) {
    final ConsumerRecords<recordID, topicName> records = consumer.poll(timeout);
    System.out.println("record count is " + records.count());
}
I am getting the below error:
org.apache.kafka.common.errors.SerializationException: Error
deserializing key/value for partition at offset 2. If
needed, please seek past the record to continue consumption.
I stumbled upon some useful information while trying to fix the problem I was facing above. I will provide the piece of code that should be able to handle this, but before that it is important to know what causes it.
When producing or consuming messages with Apache Kafka, we need a schema for the message or data; in my case an Avro schema. If a message is produced to Kafka that conflicts with that schema, consumption will fail.
Add the code below in your consumer, in the method where it consumes records.
Do remember to import the following packages:
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.SerializationException;

try {
    while (true) {
        ConsumerRecords<String, GenericRecord> records = null;
        try {
            records = consumer.poll(10000);
        } catch (SerializationException e) {
            // Message format: "Error deserializing key/value for partition <topic>-<partition> at offset <n>. If needed, please seek past the record to continue consumption."
            String s = e.getMessage()
                    .split("Error deserializing key/value for partition ")[1]
                    .split(". If needed, please seek past the record to continue consumption.")[0];
            String topics = s.split("-")[0];
            int offset = Integer.valueOf(s.split("offset ")[1]);
            int partition = Integer.valueOf(s.split("-")[1].split(" at")[0]);
            TopicPartition topicPartition = new TopicPartition(topics, partition);
            // log.info("Skipping " + topics + "-" + partition + " offset " + offset);
            consumer.seek(topicPartition, offset + 1);
            continue; // nothing was fetched in this iteration, poll again
        }
        for (ConsumerRecord<String, GenericRecord> record : records) {
            System.out.printf("value = %s \n", record.value());
        }
    }
} finally {
    consumer.close();
}
I ran into this while setting up a test environment.
Running the following command on the broker printed out the stored records as one would expect:
bin/kafka-console-consumer.sh --bootstrap-server="localhost:9092" --topic="foo" --from-beginning
It turned out that the Kafka server was misconfigured. To connect from an external IP address, listeners must have a valid value in kafka/config/server.properties, e.g.
# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
# FORMAT:
# listeners = listener_name://host_name:port
# EXAMPLE:
# listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://:9092

Kafka single consumer to read from topic having multiple partitions from given offset

I have a Kafka topic with 2 partitions. I want to create a consumer that reads the topic from a given offset; below is the sample code I am using to read from offset 9.
Properties configProperties = new Properties();
configProperties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
configProperties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
configProperties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
configProperties.put(ConsumerConfig.GROUP_ID_CONFIG, "test_consumer_group");
configProperties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(configProperties);
kafkaConsumer.subscribe(Arrays.asList(topicName), new ConsumerRebalanceListener() {
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
    }

    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        for (TopicPartition topicPartition : partitions) {
            kafkaConsumer.seek(topicPartition, 9);
        }
    }
});

try {
    while (true) {
        ConsumerRecords<String, String> records = kafkaConsumer.poll(100);
        for (ConsumerRecord<String, String> record : records) {
            System.out.println(record.topic() + ", " + record.partition() + ", " + record.offset() + ", " + record.value());
        }
    }
} catch (WakeupException ex) {
    System.out.println("Exception caught " + ex.getMessage());
} finally {
    kafkaConsumer.close();
}
But I am seeing the following error:
org.apache.kafka.clients.consumer.internals.Fetcher - Fetch offset 9 is out of range for partition test-topic_partitions-0, resetting offset
org.apache.kafka.clients.consumer.internals.Fetcher - Resetting offset for partition test-topic_partitions-0 to offset
I am using below maven dependency
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>2.1.0</version>
</dependency>
Also worth mentioning: delete.retention.ms for this topic is configured as 86400000 (which is 1 day) and retention.ms as 172800000 (which is 2 days).
Can someone help me resolve this error?
This error means that the partition does not have a record at offset 9.
So either:
you currently have fewer than 10 records in the partition (so offset 9 does not exist yet), or
records at least up to offset 9 have been deleted by the retention policy.
You can use beginningOffsets() and endOffsets() to find the smallest and largest offsets in your partition. If you try to seek to an offset outside this range, the reset policy auto.offset.reset triggers to find a valid offset.
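For example, a small check along these lines could be added in onPartitionsAssigned before seeking (a sketch reusing kafkaConsumer and the target offset 9 from the question):
// Inside onPartitionsAssigned(Collection<TopicPartition> partitions):
Map<TopicPartition, Long> beginningOffsets = kafkaConsumer.beginningOffsets(partitions);
Map<TopicPartition, Long> endOffsets = kafkaConsumer.endOffsets(partitions);
for (TopicPartition topicPartition : partitions) {
    long target = 9;
    long smallest = beginningOffsets.get(topicPartition);
    long largest = endOffsets.get(topicPartition);
    // Only seek if offset 9 actually exists in this partition.
    if (target >= smallest && target < largest) {
        kafkaConsumer.seek(topicPartition, target);
    } else {
        System.out.println(topicPartition + ": offset " + target
                + " is outside the valid range [" + smallest + ", " + largest + ")");
    }
}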

Getting the last message sent to a kafka topic

I'm new to Kafka and working on a prototype to connect a proprietary streaming service to Kafka.
I'm looking to get the key of the last message sent on a topic, as our in-house stream consumer needs to log on with the ID of the last message it received when connecting.
Is it possible, using either the KafkaProducer or a KafkaConsumer to do this?
I've attempted to do the following using a Consumer, but when also running the console consumer I see messages replayed.
// Poll so we know we're connected
consumer.poll(100);

// Get the assigned partitions
Set<TopicPartition> assignedPartitions = consumer.assignment();

// Seek to the end of those partitions
consumer.seekToEnd(assignedPartitions);
for (TopicPartition partition : assignedPartitions) {
    final long offset = consumer.committed(partition).offset();
    // Seek to the previous message
    consumer.seek(partition, offset - 1);
}

// Now get the last message
ConsumerRecords<String, String> records = consumer.poll(100);
for (ConsumerRecord<String, String> record : records) {
    lastKey = record.key();
}
consumer.close();
Is this expected behaviour or am I on the wrong path?
The problem is on the line final long offset = consumer.committed(partition).offset(). As the API documentation describes, the committed method returns the last committed offset for the given partition, i.e. the last offset your consumer told the Kafka server it had already read.
So you will definitely get messages replayed, because you always read from that specific offset.
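One way around this (a sketch reusing the consumer from the question) is to use position() after seekToEnd(), which returns the next offset to be fetched rather than the last committed one:
// Poll so we know we're connected and have an assignment
consumer.poll(100);
Set<TopicPartition> assignedPartitions = consumer.assignment();

// Jump to the log end, then step back one record per partition
consumer.seekToEnd(assignedPartitions);
for (TopicPartition partition : assignedPartitions) {
    long endOffset = consumer.position(partition); // offset of the next record that would be fetched
    if (endOffset > 0) {
        consumer.seek(partition, endOffset - 1);
    }
}

// The next poll returns at most one record per partition: the last one written at the time of the seek
ConsumerRecords<String, String> records = consumer.poll(100);
for (ConsumerRecord<String, String> record : records) {
    System.out.println("last key on partition " + record.partition() + " = " + record.key());
}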
I think I only have to remove the first for block. Check the record count and get the last message:
// Poll so we know we're connected
consumer.poll(100);

// Get the assigned partitions
Set<TopicPartition> assignedPartitions = consumer.assignment();

// Seek to the end of those partitions
consumer.seekToEnd(assignedPartitions);
for (TopicPartition partition : assignedPartitions) {
    final long offset = consumer.committed(partition).offset();
    // Seek to the previous message
    consumer.seek(partition, offset - 1);
}

// Now get the last message
ConsumerRecords<String, String> records = consumer.poll(100);
int size = records.count();
int index = 0;
for (ConsumerRecord<String, String> record : records) {
    index = index + 1;
    if (index == size) {
        String value = record.value();
        System.out.println("Last Message = " + value);
    }
}
consumer.close();

Consume only specific partition message

Here is my kafka message producer:
ProducerRecord producerRecord = new ProducerRecord(topic, "k1", message);
producer.send(producerRecord);
Here is my consumer:
TopicPartition partition0 = new TopicPartition(topic, 0);
consumer.assign(Arrays.asList(partition0));

final int minBatchSize = 200;
List<ConsumerRecord<String, byte[]>> buffer = new ArrayList<>();
while (true) {
    ConsumerRecords<String, byte[]> records = consumer.poll(100);
    for (ConsumerRecord<String, byte[]> record : records) {
        buffer.add(record);
        System.out.println(record.key() + " KEY: " + record.value());
    }
}
How is it possible to consume only the topic messages that have k1 as the partition key?
The only way I see to implement such behavior is to have the number of partitions equal to the number of possible keys, and a custom partitioner that maintains key uniqueness per partition (the default hash partitioner would probably work). But this solution is very far from optimal and I can't recommend it. Besides that, you can't use any built-in mechanism to achieve similar behavior: you'll have to filter messages on the client side.
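A minimal sketch of that client-side filtering, reusing the consumer loop and the "k1" key from the question:
while (true) {
    ConsumerRecords<String, byte[]> records = consumer.poll(100);
    for (ConsumerRecord<String, byte[]> record : records) {
        // Drop everything that was not produced with key "k1".
        if (!"k1".equals(record.key())) {
            continue;
        }
        buffer.add(record);
        System.out.println(record.key() + " KEY: " + record.value());
    }
}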
One proposal is to remember the partition and offset of your specific message, and to use assign and seek, then poll, on the consumer side (also set the consumer's max.poll.records=1, which fetches one message at a time).
assign: assign the specific partition to the consumer;
seek: seek to the specific offset, then the next poll will get your expected message K1.
Note: it works like a "random" seek, but it will reduce message consumption performance.
The 0.10 new consumer and the new config max.poll.records are required.
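A rough sketch of that assign-and-seek approach (the partition number and the offset 42 are placeholders for the values you remembered for the K1 message; the config keys are from the standard consumer API):
Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
// Fetch a single record per poll, as suggested above.
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "1");

KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props);

// Partition and offset previously remembered for the K1 message (placeholders).
TopicPartition partition = new TopicPartition(topic, 0);
long rememberedOffset = 42L;

consumer.assign(Arrays.asList(partition)); // manual assignment, no consumer group rebalancing
consumer.seek(partition, rememberedOffset); // the next poll starts exactly at that offset

ConsumerRecords<String, byte[]> records = consumer.poll(100);
for (ConsumerRecord<String, byte[]> record : records) {
    System.out.println("key = " + record.key() + ", offset = " + record.offset());
}
consumer.close();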