Is there a way to consume a message from a Kafka topic based on an offset?
I mean, I have an offset ID that I previously published to a topic. Now I need to get a message from the topic based on the offset ID that I'm passing.
You can do this with the Java Kafka consumer library; however, you also have to know the partition number.
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);
long desiredOffset = 10000;
TopicPartition partition = new TopicPartition("some-topic", 0);
consumer.assign(Arrays.asList(partition));
consumer.seek(partition, desiredOffset);

boolean found = false;
while (!found) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        if (record.offset() == desiredOffset) {
            System.out.println(record);
            found = true;
            break;
        }
    }
}
consumer.close();
Things to consider: the record with your desired offset may have been deleted, depending on the cleanup policy configured for your Kafka topic. Remember that Kafka is a streaming platform; read messages by offset only if you are debugging.
Simply use the Kafka console consumer with the required parameters:
bootstrap-server : comma-separated list of server names and port numbers
topic : topic name
partition : partition number
offset : offset value
max-messages : maximum number of messages to consume
sh kafka-console-consumer.sh --bootstrap-server server1:9092,server2:9092,server3:9092 --topic test_topic --partition 0 --offset 43212345 --max-messages 1
I have a Kafka consumer for which enable.auto.commit is set to false. Whenever I restart my consumer application, it always reads the last committed offset again and then the next offsets.
For example, the last committed offset is 50. When I restart the consumer, it reads offset 50 again first and then the next offsets.
I am performing commitSync as shown below.
Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
offsets.put(new TopicPartition("sometopic", partition), new OffsetAndMetadata(offset));
kafkaconsumer.commitSync(offsets);
I tried setting auto.offset.reset to both earliest and latest, but it does not change the behavior.
Am I missing something here in the consumer configuration?
config.put(ConsumerConfig.CLIENT_ID_CONFIG, "CLIENT_ID");
config.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092");
config.put(ConsumerConfig.GROUP_ID_CONFIG, "GROUP_ID");
config.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
config.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,StringDeserializer.class.getName());
config.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,CustomDeserializer.class.getName());
config.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
If you want to use commitSync(offsets) you have to be careful and read its Javadoc:
The committed offset should be the next message your application will consume, i.e. lastProcessedMessageOffset + 1.
If you don't add +1 to the offset, it is expected that on the next restart the consumer will consume the last message again. As mentioned in the other answer, if you use commitSync() without any arguments, you don't have to worry about that.
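For example, here is a minimal sketch of committing lastProcessedMessageOffset + 1 after each record; process() is just a placeholder for your own logic, and the consumer is the one built from your configuration:
consumer.subscribe(Collections.singletonList("sometopic"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        process(record); // placeholder for your own processing
        // Commit the offset of the NEXT record to read, not the one just processed
        Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
        offsets.put(new TopicPartition(record.topic(), record.partition()),
                new OffsetAndMetadata(record.offset() + 1));
        consumer.commitSync(offsets);
    }
}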
It looks like you're trying to commit using new OffsetAndMetadata(offset). That's not the typical usage.
Here's an example from the documentation, under Manual Offset Control:
List<ConsumerRecord<String, String>> buffer = new ArrayList<>();
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        buffer.add(record);
    }
    if (buffer.size() >= minBatchSize) {
        insertIntoDb(buffer);
        consumer.commitSync();
        buffer.clear();
    }
}
https://kafka.apache.org/21/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html
Notice how the consumer.commitSync() call is performed without any parameters: it simply consumes, and commits whatever has been consumed up to that point.
I am facing issues getting a very basic Kafka consumer to work. I am using kafka-clients-1.1.0.jar.
Here is all that I have done.
Started zookeeper on command line (All commands are run from )
zookeeper-server-start.bat ../../config/zookeeper.properties
Started Kafka server
kafka-server-start.bat ../../config/server.properties
Created a new topic 'hellotopic' and verified it by listing the topics
kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic hellotopic
Created topic "hellotopic".
Verify by listing the topics
D:\RC\Softwares\kafka_2.12-1.1.0\kafka_2.12-1.1.0\bin\windows>kafka-topics.bat --list --zookeeper localhost:2181
hellotopic
Posted a message to the topic and verified it on the console consumer
kafka-console-producer.bat --broker-list localhost:9092 --topic hellotopic --property "parse.key=true" --property "key.separator=:"
Message key and value entered as below
key1:value1
You can see that on the console consumer we are able to see the message in topic 'hellotopic'
kafka-console-consumer.bat --zookeeper localhost:2181 --topic hellotopic --from-beginning
The output of the above command is shown below. We can see the message value 'value1' that was posted.
Using the ConsoleConsumer with old consumer is deprecated and will be removed in a future major release. Consider using the new consumer by passing [bootstrap-server] instead of [zookeeper].
value1
Now that we have a topic with a message in it, I run my simple Java kafka consumer code to fetch all messages in the topic 'hellotopic'. Below is the code
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SampleConsumer {
    public static void main(String[] args) {
        System.out.println("Start consumer code");
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "test-consumer-group");
        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "1000");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("hellotopic"));
        //while (true) {
        ConsumerRecords<String, String> records = consumer.poll(100);
        for (ConsumerRecord<String, String> record : records)
            System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
        //}
        System.out.println("End consumer code");
    }
}
When we run the above class, here is the output seen
Start consumer code
End consumer code
I have tried a lot to find the issue, but no luck yet. I would much appreciate help with this simple example.
I see two issues with the code:
You are missing a particular config that makes the consumer start from the earliest offset: props.put("auto.offset.reset", "earliest");
The --from-beginning flag in your command-line consumer actually translates to this config. It tells the consumer to start from the earliest offset if there is no committed offset found for the corresponding topic and partition within the group.
The actual poll should be in a loop. One poll may not give the consumer enough time to do the subscription and also fetch data. One common way to do the poll is this:
try {
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(100);
        for (ConsumerRecord<String, String> record : records)
            System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
    }
} finally {
    consumer.close();
}
I have written a Kafka consumer in Scala. When I run the consumer, nothing shows on the console.
I have used the code below:
val topicProducer = "testOutput"
val props = new Properties()
props.put("bootstrap.servers", "host:9092,host:9092")
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("group.id", "test")
val kafkaConsumer = new KafkaConsumer[String, String](props)
val topic = Array("test").toList
kafkaConsumer.subscribe(topic)
val results = kafkaConsumer.poll(2000)
for (record <- results) {
  producer.send(new ProducerRecord(topicProducer, "key", "Value=" + record.key() + " Record Key=" + record.value() + "append"))
}
You also need to specify the auto.offset.reset property so that your consumer is able to consume messages from the beginning (equivalent to --from-beginning on the command line):
props.put("auto.offset.reset", "earliest");
According to Kafka docs:
auto.offset.reset
What to do when there is no initial offset in ZooKeeper or if an offset is out of range:
smallest : automatically reset the offset to the smallest offset
largest : automatically reset the offset to the largest offset
anything else: throw exception to the consumer
EDIT:
Alternatively, if you are using the old consumer API, then instead of --bootstrap-server host:9092 use the ZooKeeper parameter --zookeeper host:2181.
If this does not solve the issue, then try to delete /brokers in ZooKeeper and restart the Kafka nodes:
bin/zookeeper-shell <zk-host>:2181
rmr /brokers
I'm new to Kafka and working on a prototype to connect a proprietary streaming service to Kafka.
I'm looking to get the key of the last message sent on a topic, as our in-house stream consumer needs to log on with the ID of the last message it received when connecting.
Is it possible, using either the KafkaProducer or a KafkaConsumer, to do this?
I've attempted to do the following using a Consumer, but when also running the console consumer I see messages replayed.
// Poll so we know we're connected
consumer.poll(100);
// Get the assigned partitions
Set<TopicPartition> assignedPartitions = consumer.assignment();
// Seek to the end of those partitions
consumer.seekToEnd(assignedPartitions);
for (TopicPartition partition : assignedPartitions) {
    final long offset = consumer.committed(partition).offset();
    // Seek to the previous message
    consumer.seek(partition, offset - 1);
}
// Now get the last message
ConsumerRecords<String, String> records = consumer.poll(100);
for (ConsumerRecord<String, String> record : records) {
    lastKey = record.key();
}
consumer.close();
Is this expected behaviour or am I on the wrong path?
The problem is on the line final long offset = consumer.committed(partition).offset(): as the API documentation states, committed() returns the last committed offset for the given partition, i.e. the last offset your consumer told the Kafka server it had already read.
So you will definitely get messages replayed, because you always read from that committed offset.
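For example, here is a sketch that reads the last record by using the partition's end offset instead of the committed offset; the topic name and partition number are placeholders:
TopicPartition partition = new TopicPartition("some-topic", 0);
consumer.assign(Collections.singletonList(partition));
// endOffsets() returns the offset of the next record that would be written to the partition
long endOffset = consumer.endOffsets(Collections.singletonList(partition)).get(partition);
if (endOffset > 0) {
    // Step back one record and read it
    consumer.seek(partition, endOffset - 1);
    for (ConsumerRecord<String, String> record : consumer.poll(100)) {
        System.out.println("last key = " + record.key());
    }
}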
I think you only have to change the first for block: after seekToEnd, seek back from the consumer's current position rather than from the committed offset. Then check the record count and get the last message:
// Poll so we know we're connected
consumer.poll(100);
// Get the assigned partitions
Set<TopicPartition> assignedPartitions = consumer.assignment();
// Seek to the end of those partitions
consumer.seekToEnd(assignedPartitions);
for (TopicPartition partition : assignedPartitions) {
    // position() reflects the end of the partition after seekToEnd
    final long offset = consumer.position(partition);
    // Seek to the previous message
    consumer.seek(partition, offset - 1);
}
// Now get the last message
ConsumerRecords<String, String> records = consumer.poll(100);
int size = records.count();
int index = 0;
for (ConsumerRecord<String, String> record : records) {
    index = index + 1;
    if (index == size) {
        String value = record.value();
        System.out.println("Last Message = " + value);
    }
}
consumer.close();
Here is my Kafka message producer:
ProducerRecord producerRecord = new ProducerRecord(topic, "k1", message);
producer.send(producerRecord);
Here is my consumer:
TopicPartition partition0 = new TopicPartition(topic, 0);
consumer.assign(Arrays.asList(partition0));
final int minBatchSize = 200;
List<ConsumerRecord<String, byte[]>> buffer = new ArrayList<>();
while (true) {
    ConsumerRecords<String, byte[]> records = consumer.poll(100);
    for (ConsumerRecord<String, byte[]> record : records) {
        buffer.add(record);
        System.out.println(record.key() + "KEY: " + record.value());
    }
}
How is it possible to consume only the messages in the topic that have k1 as the key?
The only way I see to implement such behavior is to have the number of partitions equal to the number of possible keys, and a custom partitioner to maintain key uniqueness per partition (the default hash partitioner would work, I think). But this solution is far from optimal and I can't recommend it. Besides that, you can't use any built-in mechanism to achieve similar behavior; you'll have to filter messages on the client side, as sketched below.
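A minimal sketch of that client-side filtering, reusing the consumer and buffer from the question and assuming the key you want is "k1":
while (true) {
    ConsumerRecords<String, byte[]> records = consumer.poll(100);
    for (ConsumerRecord<String, byte[]> record : records) {
        // Keep only records whose key matches; everything else is dropped on the client side
        if ("k1".equals(record.key())) {
            buffer.add(record);
        }
    }
}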
One proposal is to remember the partition and offset of your specific message, and use assign and seek, then poll on the consumer side (also set the consumer's max.poll.records=1, which fetches one message per poll).
assign: assign the specific partition to the consumer;
seek: seek to the specific offset, so the next poll will get your expected message K1.
Note: this works like a "random" seek, but it will reduce message consumption performance.
The 0.10 new consumer and the new config max.poll.records are required.
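A rough sketch of that approach; knownPartition and knownOffset are placeholders for the partition and offset you stored when the message was produced:
props.put("max.poll.records", "1"); // fetch one message per poll
KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props);
TopicPartition tp = new TopicPartition(topic, knownPartition);
consumer.assign(Arrays.asList(tp));
consumer.seek(tp, knownOffset);
for (ConsumerRecord<String, byte[]> record : consumer.poll(100)) {
    System.out.println("key = " + record.key() + ", value = " + record.value());
}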