Consume only specific partition message - apache-kafka

Here is my Kafka message producer:
ProducerRecord<String, byte[]> producerRecord = new ProducerRecord<>(topic, "k1", message);
producer.send(producerRecord);
Here is my consumer:
TopicPartition partition0 = new TopicPartition(topic, 0);
consumer.assign(Arrays.asList(partition0));
final int minBatchSize = 200;
List<ConsumerRecord<String, byte[]>> buffer = new ArrayList<>();
while (true) {
    ConsumerRecords<String, byte[]> records = consumer.poll(100);
    for (ConsumerRecord<String, byte[]> record : records) {
        buffer.add(record);
        System.out.println(record.key() + "KEY: " + record.value());
    }
}
How can I consume only the messages in the topic that have k1 as their partition key?

The only way I see to implement this is to have as many partitions as there are possible keys, plus a custom partitioner that maps each key to its own partition. The default hash partitioner always sends a given key to the same partition, but it can also send several different keys to that partition, so on its own it does not give you one key per partition. This solution is far from optimal and I can't recommend it. Beyond that, there is no built-in mechanism for this: you'll have to filter messages on the client side.
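The client-side filtering mentioned above can be sketched as follows. This is a minimal plain-Java model with no Kafka dependency: the KeyedMessage type and the filterByKey helper are hypothetical stand-ins for ConsumerRecord and for a check on record.key() inside the poll loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a polled record. In a real consumer the key and
// value would come from ConsumerRecord.key() and ConsumerRecord.value().
final class KeyedMessage {
    final String key;
    final String value;

    KeyedMessage(String key, String value) {
        this.key = key;
        this.value = value;
    }
}

final class KeyFilterSketch {
    // Keeps only the messages whose key equals wantedKey.
    // Objects.equals makes the comparison safe for records without a key.
    static List<KeyedMessage> filterByKey(List<KeyedMessage> polled, String wantedKey) {
        List<KeyedMessage> matching = new ArrayList<>();
        for (KeyedMessage message : polled) {
            if (Objects.equals(message.key, wantedKey)) {
                matching.add(message);
            }
        }
        return matching;
    }

    public static void main(String[] args) {
        List<KeyedMessage> polled = new ArrayList<>();
        polled.add(new KeyedMessage("k1", "first"));
        polled.add(new KeyedMessage("k2", "second"));
        polled.add(new KeyedMessage("k1", "third"));

        // Only the two k1 messages are printed.
        for (KeyedMessage message : filterByKey(polled, "k1")) {
            System.out.println(message.key + ": " + message.value);
        }
    }
}
```

Note that the consumer still fetches every message of the partition; the filter only decides which ones the application processes.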

One proposal is to remember the partition and offset of your specific message, then use assign, seek and poll on the consumer side. Also set the consumer config max.poll.records=1 so that each poll fetches a single message.
assign: assigns a specific partition to the consumer;
seek: seeks to a specific offset, so the next poll returns your expected message k1.
Note: this works like a "random" seek, but it reduces message consumption performance.
The 0.10 new consumer and its new max.poll.records config are required.

Related

Kafka consumer is reading last committed offset on re-start (Java)

I have a Kafka consumer for which enable.auto.commit is set to false. Whenever I restart my consumer application, it always reads the last committed offset again and then the following offsets.
For example, if the last committed offset is 50, then when I restart the consumer it reads offset 50 again first and then the following offsets.
I am calling commitSync as shown below.
Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
offsets.put(new TopicPartition("sometopic", partition), new OffsetAndMetadata(offset));
kafkaconsumer.commitSync(offsets);
I tried setting auto.offset.reset to both earliest and latest, but that does not change the behavior.
Am I missing something in the consumer configuration?
config.put(ConsumerConfig.CLIENT_ID_CONFIG, "CLIENT_ID");
config.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092");
config.put(ConsumerConfig.GROUP_ID_CONFIG, "GROUP_ID");
config.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
config.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
config.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, CustomDeserializer.class.getName());
config.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
If you want to use commitSync(offsets), you have to be careful and read its Javadoc:
The committed offset should be the next message your application will consume, i.e. lastProcessedMessageOffset + 1.
If you don't add 1 to the offset, then on the next restart the consumer is expected to consume the last message again. As mentioned in the other answer, if you use commitSync() without any argument, you don't have to worry about that.
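The arithmetic behind that rule can be modeled in a few lines. This is only a plain-Java sketch with hypothetical helper names, not Kafka code; in the question's code the equivalent change would be to pass offset + 1 to the OffsetAndMetadata constructor.

```java
// Plain-Java model of the commit rule; no Kafka classes are used.
final class CommitOffsetSketch {
    // Offset to commit after processing the record at lastProcessedOffset.
    static long offsetToCommit(long lastProcessedOffset) {
        return lastProcessedOffset + 1;
    }

    // Model assumption: a restarted consumer resumes at exactly the committed offset.
    static long firstOffsetReadAfterRestart(long committedOffset) {
        return committedOffset;
    }

    public static void main(String[] args) {
        long lastProcessed = 50L;

        // Committing the processed offset itself: offset 50 is delivered again.
        System.out.println(firstOffsetReadAfterRestart(lastProcessed)); // prints 50

        // Committing lastProcessed + 1: consumption resumes at 51.
        System.out.println(firstOffsetReadAfterRestart(offsetToCommit(lastProcessed))); // prints 51
    }
}
```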
It looks like you're trying to commit using new OffsetAndMetadata(offset). That's not the typical usage.
Here's an example from the documentation, under Manual Offset Control:
final int minBatchSize = 200;
List<ConsumerRecord<String, String>> buffer = new ArrayList<>();
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        buffer.add(record);
    }
    if (buffer.size() >= minBatchSize) {
        insertIntoDb(buffer);
        consumer.commitSync();
        buffer.clear();
    }
}
https://kafka.apache.org/21/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html
Notice that consumer.commitSync() is called without any parameters. It commits the offsets returned by the last poll(), so after a restart the consumer resumes after whatever was consumed up to that point.

java KafkaConsumer never get results

I'm new to kafka, I have the following sample code :
KafkaConsumer<String, String> kc = new KafkaConsumer<String, String>(props);
while (true) {
    List<String> topicNames = Arrays.asList(topics.split(","));
    if (!kc.assignment().isEmpty()) {
        kc.unsubscribe();
    }
    kc.subscribe(topicNames);
    ConsumerRecords<String, String> recv = kc.poll(1000L);
    if (!recv.isEmpty()) {
        System.out.println("NOT EMPTY");
    }
}
recv is always empty, but records are returned if I increase the poll timeout, and also if I cut out the unsubscribe part.
I've taken this piece of code from a proprietary integration product and I cannot modify it.
So my question is: is this only a timing problem, or is there more to it?
There is a lot that happens when a consumer (re)subscribes to a topic.
Very roughly and as far as I remember the consumer will:
request cluster information
request consumer group metadata
make a JOIN_GROUP request
be assigned certain partitions
The underlying mechanisms are even more complicated if there are more consumers within the same group. That's because the partitions should be reassigned between all the consumers within the group.
That is why:
1000 ms might not be enough for all of this, so poll returned before anything was fetched
you polled something when you increased the timeout because Kafka managed to perform all of these bootstrapping operations
you polled something when you removed the unsubscription because most likely your consumer was already subscribed
So there is a timing issue. And I think there is something more: un/subscribing to a topic within an infinite loop makes no sense to me (see the other answer), since every unsubscribe discards the current assignment and forces the steps above to run again.
You should subscribe to your topics only once at the beginning. Like this:
final KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("foo", "bar"));
while (true) {
    final ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
    }
}

Getting the last message sent to a kafka topic

I'm new to Kafka and working on a prototype to connect a proprietary streaming service into Kafka.
I'm looking to get the key of the last message sent on a topic as our in-house stream consumer needs to logon with the ID of the last message it received when connecting.
Is it possible, using either the KafkaProducer or a KafkaConsumer to do this?
I've attempted to do the following using a Consumer, but when also running the console consumer I see messages replayed.
// Poll so we know we're connected
consumer.poll(100);
// Get the assigned partitions
Set<TopicPartition> assignedPartitions = consumer.assignment();
// Seek to the end of those partitions
consumer.seekToEnd(assignedPartitions);
for (TopicPartition partition : assignedPartitions) {
    final long offset = consumer.committed(partition).offset();
    // Seek to the previous message
    consumer.seek(partition, offset - 1);
}
// Now get the last message
ConsumerRecords<String, String> records = consumer.poll(100);
for (ConsumerRecord<String, String> record : records) {
    lastKey = record.key();
}
consumer.close();
Is this expected behaviour or am I on the wrong path?
The problem is the line final long offset = consumer.committed(partition).offset(). As the API documentation says, the committed method returns the last committed offset for the given partition, i.e. the offset your consumer last told the Kafka server it had read up to.
So you will definitely get messages replayed, because you always start reading from that committed offset rather than from the end of the log.
Instead, take the position after seekToEnd, which is the offset of the next record to be written, and seek one record back from it.
Then check the record count and get the last message:
// Poll so we know we're connected
consumer.poll(100);
// Get the assigned partitions
Set<TopicPartition> assignedPartitions = consumer.assignment();
// Seek to the end of those partitions
consumer.seekToEnd(assignedPartitions);
for (TopicPartition partition : assignedPartitions) {
    // After seekToEnd, position() is one past the last record in the partition
    final long endOffset = consumer.position(partition);
    if (endOffset > 0) {
        // Seek to the previous message
        consumer.seek(partition, endOffset - 1);
    }
}
// Now get the last message
ConsumerRecords<String, String> records = consumer.poll(100);
int size = records.count();
int index = 0;
for (ConsumerRecord<String, String> record : records) {
    index = index + 1;
    // With several partitions this is the last record of the batch, not necessarily the newest overall
    if (index == size) {
        String value = record.value();
        System.out.println("Last Message = " + value);
    }
}
consumer.close();

Kafka - Spring : kafka consumer read a message from topic based on offset

Is there a way to consume a message from a Kafka topic based on its offset?
I mean, I have the offset of a message that I previously published to a topic. Now I need to get that message from the topic based on the offset I'm passing.
You can do this with the Java Kafka consumer library; however, you also have to know the partition number.
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);
long desiredOffset = 10000;
TopicPartition partition = new TopicPartition("some-topic", 0);
consumer.assign(Arrays.asList(partition));
consumer.seek(partition, desiredOffset);
boolean found = false;
while (!found) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        if (record.offset() == desiredOffset) {
            System.out.println(record);
            found = true;
            break;
        }
    }
}
consumer.close();
Things to consider: the record with your desired offset may have been deleted, depending on the cleanup policy configured for your Kafka topic. Remember that Kafka is a streaming platform. Read a message by offset only if you are debugging.
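To illustrate the deleted-record caveat, here is a small plain-Java model. It is a sketch with hypothetical names and no Kafka classes. It assumes a compacted topic, where a seek to a removed offset behaves like a seek to the next higher offset still in the log, so the first record read after the seek tells you whether the desired one survived.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

final class OffsetLookupSketch {
    // log maps the surviving offsets to their values; gaps model removed records.
    static Optional<String> readAt(TreeMap<Long, String> log, long desiredOffset) {
        // First surviving record at or after the requested offset.
        Map.Entry<Long, String> first = log.ceilingEntry(desiredOffset);
        if (first == null || first.getKey() != desiredOffset) {
            // Nothing there, or a later record came back: the desired one is gone.
            return Optional.empty();
        }
        return Optional.of(first.getValue());
    }

    public static void main(String[] args) {
        TreeMap<Long, String> log = new TreeMap<>();
        log.put(10L, "a");
        log.put(12L, "c"); // offset 11 was removed

        System.out.println(readAt(log, 10L).isPresent()); // prints true
        System.out.println(readAt(log, 11L).isPresent()); // prints false
    }
}
```

In a loop like the one above, the same idea means stopping as soon as a record with an offset greater than desiredOffset is returned, rather than polling forever.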
Alternatively, simply use the Kafka console consumer with the required parameters:
bootstrap-server : (comma separated server names : port no.)
topic : (topic name)
partition : (partition number)
offset : (offset value)
max-messages : (No. of maximum messages to consume)
sh kafka-console-consumer.sh --bootstrap-server server1:9092,server2:9092,server3:9092 --topic test_topic --partition 0 --offset 43212345 --max-messages 1

Can single Kafka producer produce messages to multiple topics and how?

I am just exploring Kafka. Currently I am using one producer and one topic to produce messages, which are consumed by one consumer. Very simple.
I read on the Kafka page that the new Producer API is thread-safe and that sharing a single instance will improve performance.
Does that mean I can use a single producer to publish messages to multiple topics?
Never tried it myself, but I guess you can, since the code for creating the producer and sending a record is (from https://kafka.apache.org/090/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html):
Producer<String, String> producer = new KafkaProducer<>(props);
for (int i = 0; i < 100; i++)
    producer.send(new ProducerRecord<String, String>("my-topic", Integer.toString(i), Integer.toString(i)));
So, I guess, if you just put different topics in the ProducerRecord, then it should be possible.
Also, here http://kafka.apache.org/081/documentation.html#producerapi it explicitly says that you can use a method send(List<KeyedMessage<K,V>> messages) to write into multiple topics.
If I understand you correctly, you are looking to use the same producer instance to send the same or multiple messages to multiple topics.
Not sure about Java, but here is how you can do it in C# (.NET) using the Kafka .NET client's DependentProducerBuilder:
using (var producer = new ProducerBuilder<string, string>(config).Build())
using (var producer2 = new DependentProducerBuilder<Null, int>(producer.Handle).Build())
{
    producer.ProduceAsync("first-topic", new Message<string, string> { Key = "my-key-value", Value = "my-value" });
    producer2.ProduceAsync("second-topic", new Message<Null, int> { Value = 42 });
    producer2.ProduceAsync("first-topic", new Message<Null, int> { Value = 107 });
    producer.Flush(TimeSpan.FromSeconds(10));
}