How can I know that Kafka has processed all the messages?
Is there any command or log file that shows the offset currently being processed and the last offset of the topic?
You could use the command-line tool kafka-consumer-groups.sh to check the consumer lag of your ConsumerGroup. It shows the log-end offset of each partition the ConsumerGroup is consuming and the last offset the ConsumerGroup has committed:
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group mygroup
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG OWNER
mygroup test-topic 0 5 15 10 xxx
mygroup test-topic 1 10 15 5 xxx
If you want to do it programmatically, e.g. from a Spring application:
@Bean
public ApplicationRunner runner(KafkaAdmin admin, ConsumerFactory<String, String> cf) {
    return args -> {
        try (AdminClient client = AdminClient.create(admin.getConfig());
             Consumer<String, String> consumer = cf.createConsumer("dummyGroup", "clientId", "")) {
            Collection<ConsumerGroupListing> groups = client.listConsumerGroups()
                    .all()
                    .get(10, TimeUnit.SECONDS);
            groups.forEach(group -> {
                Map<TopicPartition, OffsetAndMetadata> map;
                try {
                    map = client.listConsumerGroupOffsets(group.groupId())
                            .partitionsToOffsetAndMetadata()
                            .get(10, TimeUnit.SECONDS);
                }
                catch (InterruptedException | ExecutionException | TimeoutException e) {
                    e.printStackTrace();
                    return; // skip this group rather than hit an NPE on a null map below
                }
                // endOffsets() gives the current log-end offset per partition; the
                // difference to the committed offset is the group's remaining lag.
                Map<TopicPartition, Long> endOffsets = consumer.endOffsets(map.keySet());
                map.forEach((tp, off) -> System.out.println("group: " + group + " tp: " + tp
                        + " current offset: " + off.offset()
                        + " end offset: " + endOffsets.get(tp)));
            });
        }
    };
}
Related
I have 10 consumers and 10 partitions.
I fetch the number of partitions:
int partitionCount = getPartitionCount(kafkaUrl);
and I create the same number of consumers with the same group.id.
public void listen() {
try {
String kafkaUrl = getKafkaUrl();
int partitionCount = getPartitionCount(kafkaUrl);
Stream.iterate(0, i -> i + 1)
.limit(partitionCount)
.forEach(index -> executorService.execute(() ->
consumerTask.invokeKafkaConsumerTask(prepareConsumerConfig(index, kafkaUrl), INPUT_TOPIC)));
} catch (Exception exception) {
logger.error("Cannot receive event from kafka ", exception);
}
}
public void invokeKafkaConsumerTask(Properties properties, String topicName) {
try(KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties)) {
consumer.subscribe(Collections.singletonList(topicName));
logger.info("[KAFKA] consumer created");
invokeKafkaConsumer(consumer);
} catch (IllegalArgumentException exception) {
logger.error("Cannot create kafka consumer ", exception);
}
}
private void invokeKafkaConsumer(KafkaConsumer<String, String> consumer) {
try {
while (true) {
ConsumerRecords<String, String> consumerRecords = consumer.poll(Duration.ofSeconds(4));
if (consumerRecords.count() > 0) {
consumeRecords(consumerRecords);
consumer.commitSync();
}
}
} catch (Exception e) {
logger.error("Error while receiving records ", e);
}
}
The getPartitionCount method returns 10 partitions, so that part works correctly. The config looks like this:
Properties properties = new Properties();
properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaUrl);
properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
properties.put(ConsumerConfig.GROUP_ID_CONFIG, CONSUMER_CLIENT_ID);
properties.put(ConsumerConfig.CLIENT_ID_CONFIG, CONSUMER_CLIENT_ID + index);
properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
properties.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "300000");
properties.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "10000");
properties.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, String.valueOf(Integer.MAX_VALUE));
properties.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, "org.apache.kafka.clients.consumer.RoundRobinAssignor");
This is what I see after the consumers have been assigned to partitions:
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CLIENT-ID
topicName 1 89391 89391 0 consumer0
topicName 3 88777 88777 0 consumer1
topicName 5 89280 89280 0 consumer2
topicName 4 88776 88776 0 consumer2
topicName 0 4670991 4670991 0 consumer0
topicName 9 23307 89343 66036 consumer4
topicName 7 89610 89610 0 consumer3
topicName 8 88167 88167 0 consumer4
topicName 2 89138 89138 0 consumer1
topicName 6 88967 88967 0 consumer3
Only half of the consumers have been assigned partitions. Why did this happen? There should be one consumer per partition according to the documentation. Am I doing something wrong? Kafka version is 2.1.1.
I also found a few of these log lines:
Setting newly assigned partitions:[empty]
[solution] Interesting case: I changed group.id and partition.assignment.strategy, added auto.offset.reset=earliest, and now it looks like it works.
Are you subscribing with a collection of topic names or with a java Pattern?
If you are subscribing with a Pattern, change partition.assignment.strategy to RoundRobinAssignor or StickyAssignor.
I am running confluent-oss-5.0.0-2.11 kafka server with default server properties (from etc/kafka) on my local PC and created topics test1 and test2 with the below command
kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic
Below are the environment properties I have
props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 10);
props.put(BATCH_SIZE_CONFIG, 16384);
props.put(LINGER_MS_CONFIG, 10);
props.put(BUFFER_MEMORY_CONFIG, 33554432);
props.put(METADATA_MAX_AGE_CONFIG, "10");
Here is what the producer does
String padded = RandomStringUtils.random(2000, true, true);
for(int i=0;i<1000000;i++) {
kafkaProducer.send("a" + i, "aa" + padded + i, "test1");
kafkaProducer.send("a" + i, "bb" + padded + i, "test2");
}
kafkaProducer.flush();
Here is what the consumer does
KTable<String, String> a = builder.table("test1");
KTable<String, String> b = builder.table("test2");
a.join(b, new ValueJoiner<String, String, String>() {
@Override
public String apply(String value1, String value2) {
return "a" + value1;
}
}).toStream().to("finalTopic");
And below is how I observe the performance of "finalTopic" population
AtomicInteger counter = new AtomicInteger();
builder.<String, String>stream("finalTopic").peek((key, value) -> {
if(counter.incrementAndGet()%1000 == 0) {
logger.info("date {}, final join key {}, value size {}, joins performed {}", System.currentTimeMillis(), key, value.length(), counter.get());
}
});
Using the producer, I managed to write the above messages into the two Kafka topics at around 55,000 messages per second.
However, on the consumer side, messages are populated into "finalTopic" at only around 110 messages per second.
Any pointer is appreciated!
I am fairly new to Kafka. I have created a sample producer and consumer in Java. Using the producer, I was able to send data to a Kafka topic, but I am not able to get the number of records in the topic using the following consumer code.
public class ConsumerTests {
public static void main(String[] args) throws Exception {
BasicConfigurator.configure();
String topicName = "MobileData";
String groupId = "TestGroup";
Properties properties = new Properties();
properties.put("bootstrap.servers", "localhost:9092");
properties.put("group.id", groupId);
properties.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
properties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(properties);
kafkaConsumer.subscribe(Arrays.asList(topicName));
try {
while (true) {
ConsumerRecords<String, String> consumerRecords = kafkaConsumer.poll(100);
System.out.println("Record count is " + consumerRecords.count());
}
} catch (WakeupException e) {
// ignore for shutdown
} finally {
kafkaConsumer.close();
}
}
}
I don't get any exception in the console, but consumerRecords.count() always returns 0, even if there are messages in the topic. Please let me know if I am missing something to get the record details.
The poll(...) call should normally be in a loop. It's always possible for the initial poll(...) to return no data (depending on the timeout) while the partition assignment is in progress. Here's an example:
try {
while (true) {
ConsumerRecords<String, String> records = consumer.poll(100);
System.out.println("Record count is " + records.count());
}
} catch (WakeupException e) {
// ignore for shutdown
} finally {
consumer.close();
}
For more info see this relevant article:
I am using Kafka 0.10.2 KafkaProducer to produce data in topic.
Below is my code.
ProducerRecord record = new ProducerRecord(topic, (Integer)null, Long.valueOf(System.currentTimeMillis()), partitionKey.getBytes(), message.getBytes());
Future<RecordMetadata> future = this.producer.send(record, new Callback() {
public void onCompletion(RecordMetadata recordMetadata, Exception e) {
if(e != null) {
LOGGER.error("Error producing to topic " + partitionKey, e.getCause());
} else {
LOGGER.info(" Successfully produced topic " + recordMetadata.topic() + " on partition " + recordMetadata.partition() + "at " + recordMetadata.timestamp());
}
}
});
This message is replicated through MirrorMaker into a different cluster. I wrote a MirrorMaker handler to capture the delay per message.
However, when I check the timestamp at the MirrorMaker consumer, I cannot see the timestamp. Below is the code for the MirrorMaker handler.
@Override
public List<ProducerRecord<byte[], byte[]>> handle(BaseConsumerRecord record) {
LOGGER.debug("Timestamp from dal producer "+record.timestamp());
return Collections.singletonList(new ProducerRecord<byte[], byte[]>(topicToSend,partitionToSend,timeStampAtMM, record.key(), record.value()));
}
I am using Kafka 0.10.0 and ZooKeeper 3.4.6 on my production server. I have 20 topics, each with approximately 50 partitions, and a total of 100 consumers, each subscribed to different topics and partitions. All the consumers have the same groupId. If a consumer is added or removed for a specific topic, will the consumers attached to a different topic also undergo rebalancing?
My consumer code is:
public static void main(String[] args) {
String groupId = "prod"
String topicRegex = args[0]
String consumerTimeOut = "10000"
int n_threads = 1
if (args && args.size() > 1) {
ConfigLoader.init(args[1])
}
else {
ConfigLoader.init('development')
}
if(args && args.size() > 2 && args[2].isInteger()){
n_threads = (args[2]).toInteger()
}
ExecutorService executor = Executors.newFixedThreadPool(n_threads)
addShutdownHook(executor)
String zooKeeper = ConfigLoader.conf.zookeeper.hostName
List<Runnable> taskList = []
for(int i = 0; i < n_threads; i++){
KafkaConsumer example = new KafkaConsumer(zooKeeper, groupId, topicRegex, consumerTimeOut)
taskList.add(example)
}
taskList.each{ task ->
executor.submit(task)
}
executor.shutdown()
executor.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS)
}
private static ConsumerConfig createConsumerConfig(String a_zookeeper, String a_groupId, String consumerTimeOut) {
Properties props = new Properties()
props.put("zookeeper.connect", a_zookeeper)
props.put("group.id", a_groupId)
props.put("zookeeper.session.timeout.ms", "10000")
props.put("rebalance.backoff.ms","10000")
props.put("zookeeper.sync.time.ms","200")
props.put("rebalance.max.retries","10")
props.put("enable.auto.commit", "false")
props.put("consumer.timeout.ms", consumerTimeOut)
props.put("auto.offset.reset", "smallest")
return new ConsumerConfig(props)
}
public void run(String topicRegex) {
String threadName = Thread.currentThread().getName()
logger.info("{} [{}] main Starting", TAG, threadName)
Map<String, Integer> topicCountMap = new HashMap<String, Integer>()
List<KafkaStream<byte[], byte[]>> streams = consumer.createMessageStreamsByFilter(new Whitelist(topicRegex),1)
ConsumerConnector consumerConnector = consumer
for (final KafkaStream stream : streams) {
ConsumerIterator<byte[], byte[]> consumerIterator = stream.iterator()
List<Object> batchTypeObjList = []
String topic
String topicObjectType
String method
String className
String deserializer
Integer batchSize = 200
while (true){
boolean hasNext = false
try {
hasNext = consumerIterator.hasNext()
} catch (InterruptedException interruptedException) {
logger.error("{} [{}] Interrupted Exception: {}", TAG, threadName, interruptedException.getMessage())
throw interruptedException
} catch(ConsumerTimeoutException timeoutException){
logger.error("{} [{}] Timeout Exception: {}", TAG, threadName, timeoutException.getMessage())
topicListMap.each{ eachTopic, value ->
batchTypeObjList = topicListMap.get(eachTopic)
if(batchTypeObjList != null && !batchTypeObjList.isEmpty()) {
def dbObject = topicConfigMap.get(eachTopic)
logger.debug("{} [{}] Timeout Happened.. Indexing remaining objects in list for topic: {}", TAG, threadName, eachTopic)
className = dbObject.get(KafkaTopicConfigEntity.CLASS_NAME_KEY)
method = dbObject.get(KafkaTopicConfigEntity.METHOD_NAME_KEY)
int sleepTime = 0
if(dbObject.get(KafkaTopicConfigEntity.CONUSMER_SLEEP_IN_MS) != null)
sleepTime = dbObject.get(KafkaTopicConfigEntity.CONUSMER_SLEEP_IN_MS)?.toInteger()
executeMethod(className, method, batchTypeObjList)
batchTypeObjList.clear()
topicListMap.put(eachTopic,batchTypeObjList)
sleep(sleepTime)
}
}
consumer.commitOffsets()
continue
} catch(Exception exception){
logger.error("{} [{}]Exception: {}", TAG, threadName, exception.getMessage())
throw exception
}
if(hasNext) {
def consumerObj = consumerIterator.next()
logger.debug("{} [{}] partition name: {}", TAG, threadName, consumerObj.partition())
topic = consumerObj.topic()
DBObject dbObject = topicConfigMap.get(topic)
logger.debug("{} [{}] topic name: {}", TAG, threadName, topic)
topicObjectType = dbObject.get(KafkaTopicConfigEntity.TOPIC_OBJECT_TYPE_KEY)
deserializer = KafkaConfig.DEFAULT_DESERIALIZER
if(KafkaConfig.DESERIALIZER_MAP.containsKey(topicObjectType)){
deserializer = KafkaConfig.DESERIALIZER_MAP.get(topicObjectType)
}
className = dbObject.get(KafkaTopicConfigEntity.CLASS_NAME_KEY)
method = dbObject.get(KafkaTopicConfigEntity.METHOD_NAME_KEY)
boolean isBatchJob = dbObject.get(KafkaTopicConfigEntity.IS_BATCH_JOB_KEY)
if(dbObject.get(KafkaTopicConfigEntity.BATCH_SIZE_KEY) != null)
batchSize = dbObject.get(KafkaTopicConfigEntity.BATCH_SIZE_KEY)
else
batchSize = 1
Object queueObj = (Class.forName(deserializer)).deserialize(consumerObj.message())
int sleepTime = 0
if(dbObject.get(KafkaTopicConfigEntity.CONUSMER_SLEEP_IN_MS) != null)
sleepTime = dbObject.get(KafkaTopicConfigEntity.CONUSMER_SLEEP_IN_MS)?.toInteger()
if(isBatchJob == true){
batchTypeObjList = topicListMap.get(topic)
batchTypeObjList.add(queueObj)
if(batchTypeObjList.size() == batchSize) {
executeMethod(className, method, batchTypeObjList)
batchTypeObjList.clear()
sleep(sleepTime)
}
topicListMap.put(topic,batchTypeObjList)
} else {
executeMethod(className, method, queueObj)
sleep(sleepTime)
}
consumer.commitOffsets()
}
}
logger.debug("{} [{}] Shutting Down Process ", TAG, threadName)
}
}
Any help will be appreciated.
Whenever a consumer leaves or joins a consumer group, the entire group undergoes a rebalance. Since the group tracks all partitions across all topics that its members are subscribed to, you are right in thinking that this can lead to rebalancing of consumers that are not subscribed to the topic in question.
Please see below for a small test illustrating this point. I have a broker with two topics, test1 (2 partitions) and test2 (9 partitions), and I start two consumers in the same consumer group, each subscribing to only one of the two topics. As you can see, when consumer2 joins the group, consumer1 gets all its partitions revoked and reassigned, because the entire group rebalances.
Subscribing consumer1 to topic test1
Starting thread for consumer1
Polling consumer1
consumer1 got 0 partitions revoked!
consumer1 got 2 partitions assigned!
Polling consumer1
Polling consumer1
Polling consumer1
Subscribing consumer2 to topic test2
Starting thread for consumer2
Polling consumer2
Polling consumer1
consumer2 got 0 partitions revoked!
Polling consumer1
Polling consumer1
consumer1 got 2 partitions revoked!
consumer2 got 9 partitions assigned!
consumer1 got 2 partitions assigned!
Polling consumer2
Polling consumer1
Polling consumer2
Polling consumer1
Polling consumer2