Unable to get number of messages in kafka topic - apache-kafka

I am fairly new to Kafka. I have created a sample producer and consumer in Java. Using the producer, I was able to send data to a Kafka topic, but I am not able to get the number of records in the topic using the following consumer code.
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;
import org.apache.log4j.BasicConfigurator;
public class ConsumerTests {
public static void main(String[] args) throws Exception {
BasicConfigurator.configure();
String topicName = "MobileData";
String groupId = "TestGroup";
Properties properties = new Properties();
properties.put("bootstrap.servers", "localhost:9092");
properties.put("group.id", groupId);
properties.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
properties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(properties);
kafkaConsumer.subscribe(Arrays.asList(topicName));
try {
while (true) {
ConsumerRecords<String, String> consumerRecords = kafkaConsumer.poll(100);
System.out.println("Record count is " + consumerRecords.count());
}
} catch (WakeupException e) {
// ignore for shutdown
} finally {
kafkaConsumer.close();
}
}
}
I don't get any exception in the console, but consumerRecords.count() always returns 0, even though there are messages in the topic. Please let me know if I am missing something needed to get the record details.

The poll(...) call should normally be in a loop. It's always possible for the initial poll(...) to return no data (depending on the timeout) while the partition assignment is in progress. Here's an example:
try {
while (true) {
ConsumerRecords<String, String> records = consumer.poll(100);
System.out.println("Record count is " + records.count());
}
} catch (WakeupException e) {
// ignore for shutdown
} finally {
consumer.close();
}
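If the goal is specifically to know how many messages are currently in the topic, rather than to consume them, a different approach is to compare each partition's beginning and end offsets. This is only a sketch and not part of the original answer; it assumes a client version that has beginningOffsets/endOffsets (0.10.1 or newer) and reuses topicName and properties from the question:
// Sketch: count messages by diffing beginning and end offsets per partition.
// Needs imports for TopicPartition, Map, List and java.util.stream.Collectors.
KafkaConsumer<String, String> offsetConsumer = new KafkaConsumer<>(properties);
List<TopicPartition> parts = offsetConsumer.partitionsFor(topicName).stream()
        .map(info -> new TopicPartition(topicName, info.partition()))
        .collect(Collectors.toList());
Map<TopicPartition, Long> beginning = offsetConsumer.beginningOffsets(parts);
Map<TopicPartition, Long> end = offsetConsumer.endOffsets(parts);
long total = parts.stream().mapToLong(tp -> end.get(tp) - beginning.get(tp)).sum();
System.out.println("Messages currently in topic: " + total);
offsetConsumer.close();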

Related

Kafka consumer getting stuck

We are using Kafka streams to insert into PostgreSQL; since the throughput is too high, a direct insert is being avoided. The consumer seems to be working well but occasionally gets stuck, and I can't find the root cause.
The consumer has been running for about 6 months and has already consumed billions of records. I don't understand why it has started getting stuck lately, and I don't even know where to start debugging.
Below is the code for processing the records:
private static void readFromTopic(DataSource datasource, ConsumerOptions options) {
KafkaConsumer<String, String> consumer = KafkaConsumerConfig.createConsumerGroup(options);
Producer<Long, String> producer = KafkaProducerConfig.createKafkaProducer(options);
if (options.isReadFromAnOffset()) {
// if want to assign particular offsets to consume from
// will work for only a single partition for a consumer
List<TopicPartition> tpartition = new ArrayList<TopicPartition>();
tpartition.add(new TopicPartition(options.getTopicName(), options.getPartition()));
consumer.assign(tpartition);
consumer.seek(tpartition.get(0), options.getOffset());
} else {
// use auto assign partition & offsets
consumer.subscribe(Arrays.asList(options.getTopicName()));
log.debug("subscribed to topic {}", options.getTopicName());
}
List<Payload> payloads = new ArrayList<>();
while (true) {
// timer is the time to wait for messages to be received in the broker
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(50));
if(records.count() != 0 )
log.debug("poll size is {}", records.count());
Set<TopicPartition> partitions = records.partitions();
// reading normally as per round robin and the last committed offset
for (ConsumerRecord<String, String> r : records) {
log.debug(" Parition : {} Offset : {}", r.partition(), r.offset());
try {
JSONArray arr = new JSONArray(r.value());
for (Object o : arr) {
Payload p = JsonIterator.deserialize(((JSONObject) o).toString(), Payload.class);
payloads.add(p);
}
List<Payload> steplist = new ArrayList<>();
steplist.addAll(payloads);
// Run a task specified by a Runnable Object asynchronously.
CompletableFuture<Void> future = CompletableFuture.runAsync(new Runnable() {
@Override
public void run() {
try {
Connection conn = datasource.getConnection();
PgInsert.insertIntoPg(steplist, conn, consumer, r, options.getTopicName(),
options.getErrorTopic(), producer);
} catch (Exception e) {
log.error("error in processing future {}", e);
}
}
}, executorService);
// used to combine all futures
allfutures.add(future);
payloads.clear();
} catch (Exception e) {
// pushing into new topic for records which have failed
log.debug("error in kafka consumer {}", e);
ProducerRecord<Long, String> record = new ProducerRecord<Long, String>(options.getErrorTopic(),
r.offset(), r.value());
producer.send(record);
}
}
// committing after every poll
consumer.commitSync();
if (records.count() != 0) {
Map<TopicPartition, OffsetAndMetadata> metadata = consumer.committed(partitions);
// reading the committed offsets for each partition after polling
for (TopicPartition tpartition : partitions) {
OffsetAndMetadata offsetdata = metadata.get(tpartition);
if (offsetdata != null && tpartition != null)
log.debug("committed offset is " + offsetdata.offset() + " for topic partition "
+ tpartition.partition());
}
}
// waiting for all threads to complete after each poll
try {
waitForFuturesToEnd();
allfutures.clear();
} catch (InterruptedException e) {
e.printStackTrace();
} catch (ExecutionException e) {
e.printStackTrace();
}
}
}
Earlier I thought the reason for it getting stuck was the size of the records being consumed, so I reduced MAX_POLL_RECORDS_CONFIG to 10. This ensures the records fetched in a single poll won't exceed about 200 KB, since each record can have a maximum size of 20 KB.
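For reference, a minimal sketch of how that cap is applied (ConsumerConfig.MAX_POLL_RECORDS_CONFIG is the constant behind "max.poll.records"; the rest of the consumer configuration is assumed to stay the same):
// Sketch: limit how many records a single poll can return.
Properties props = new Properties();
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 10); // at most 10 records per poll
// ... bootstrap.servers, group.id, deserializers, etc. as before ...
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);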
I am thinking of using the Spring framework to resolve this issue, but before that I would like to know why exactly the consumer gets stuck. Any insights on this will be helpful.

Kafka consumer API is not working properly

I am new to Kafka. I have started working with Kafka and am facing the issue below; please help me solve it, thanks in advance.
First I wrote the producer API and it is working fine, but with the consumer API the messages are not displayed.
My code is like this :
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.ConsumerRecord;
public class ConsumerGroup {
public static void main(String[] args) throws Exception {
String topic = "Hello-Kafka";
String group = "myGroup";
Properties props = new Properties();
props.put("bootstrap.servers", "XXX.XX.XX.XX:9092");
props.put("group.id", group);
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
try {
consumer.subscribe(Arrays.asList(topic));
System.out.println("Subscribed to topic " + topic);
ConsumerRecords<String, String> records = consumer.poll(100);
System.out.println("records ::" + records);
System.out.println(records.toString());
for (ConsumerRecord<String, String> record : records) {
System.out.println("Record::" + record.offset());
System.out.println(record.key());
System.out.println(record.value());
}
consumer.commitSync();
} catch (Exception e) {
e.printStackTrace();
} finally {
consumer.commitSync();
consumer.close();
}
}
}
Response ::
Subscribed to topic Hello-Kafka
records ::org.apache.kafka.clients.consumer.ConsumerRecords@76b0bfab
org.apache.kafka.clients.consumer.ConsumerRecords@76b0bfab
Here the offset, key, and value are not printed; control never reaches the for (ConsumerRecord record : records) loop itself. Please help me.
You are trying to print empty records, hence only records.toString() is printed in your code, which is essentially just the class name and hash code.
I made some changes to your code and got it working. Have a look to see if this helps.
public class ConsumerGroup {
public static void main(String[] args) throws Exception {
String topic = "Hello-Kafka";
String group = "myGroup";
Properties props = new Properties();
props.put("bootstrap.servers", "xx.xx.xx.xx:9092");
props.put("group.id", group);
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
try {
consumer.subscribe(Arrays.asList(topic));
System.out.println("Subscribed to topic " + topic);
while(true){
ConsumerRecords<String, String> records = consumer.poll(1000);
if (!records.isEmpty()) {
System.out.println("records ::" + records);
System.out.println(records.toString());
for (ConsumerRecord<String, String> record : records) {
System.out.println("Record::" + record.offset());
System.out.println(record.key());
System.out.println(record.value());
}
consumer.commitSync();
}
}
} catch (Exception e) {
e.printStackTrace();
} finally {
consumer.commitSync();
consumer.close();
}
}
}

Kafka Producer Consumer API Issue

I am using Kafka v0.10.0.0 and created producer and consumer Java code. But the code is stuck on producer.send without any exception in the logs.
Can anyone please help? Thanks in advance.
I am using/modifying the "MapR - Kafka sample program". You can look at the full code here:
https://github.com/panwars87/kafka-sample-programs
Important: I changed the kafka-client version to 0.10.0.0 in the Maven dependencies and am running Kafka 0.10.0.0 locally.
public class Producer {
public static void main(String[] args) throws IOException {
// set up the producer
KafkaProducer<String, String> producer;
System.out.println("Starting Producers....");
try (InputStream props = Resources.getResource("producer.props").openStream()) {
Properties properties = new Properties();
properties.load(props);
producer = new KafkaProducer<>(properties);
System.out.println("Property loaded successfully ....");
}
try {
for (int i = 0; i < 20; i++) {
// send lots of messages
System.out.println("Sending record one by one....");
producer.send(new ProducerRecord<String, String>("fast-messages","sending message - "+i+" to fast-message."));
System.out.println(i+" message sent....");
// every so often send to a different topic
if (i % 2 == 0) {
producer.send(new ProducerRecord<String, String>("fast-messages","sending message - "+i+" to fast-message."));
producer.send(new ProducerRecord<String, String>("summary-markers","sending message - "+i+" to summary-markers."));
producer.flush();
System.out.println("Sent msg number " + i);
}
}
} catch (Throwable throwable) {
System.out.printf("%s", throwable.getStackTrace());
throwable.printStackTrace();
} finally {
producer.close();
}
}
}
public class Consumer {
public static void main(String[] args) throws IOException {
// and the consumer
KafkaConsumer<String, String> consumer;
try (InputStream props = Resources.getResource("consumer.props").openStream()) {
Properties properties = new Properties();
properties.load(props);
if (properties.getProperty("group.id") == null) {
properties.setProperty("group.id", "group-" + new Random().nextInt(100000));
}
consumer = new KafkaConsumer<>(properties);
}
consumer.subscribe(Arrays.asList("fast-messages", "summary-markers"));
int timeouts = 0;
//noinspection InfiniteLoopStatement
while (true) {
// read records with a short timeout. If we time out, we don't really care.
ConsumerRecords<String, String> records = consumer.poll(200);
if (records.count() == 0) {
timeouts++;
} else {
System.out.printf("Got %d records after %d timeouts\n", records.count(), timeouts);
timeouts = 0;
}
for (ConsumerRecord<String, String> record : records) {
switch (record.topic()) {
case "fast-messages":
System.out.println("Record value for fast-messages is :"+ record.value());
break;
case "summary-markers":
System.out.println("Record value for summary-markers is :"+ record.value());
break;
default:
throw new IllegalStateException("Shouldn't be possible to get message on topic ");
}
}
}
}
}
The code you're running is for a demo of MapR, which is not Kafka. MapR claims API compatibility with Kafka 0.9, but even then MapR treats message offsets differently than Kafka does (offsets are byte offsets of messages rather than incremental offsets), etc. The MapR implementation is also very, very different, to say the least. This means that if you're lucky, a Kafka 0.9 app might just happen to run on MapR and vice versa. There is no such guarantee for other releases.
Thank you everyone for all your inputs. I resolved this by tweaking the MapR code and referring to a few other posts. Link to the solution API:
https://github.com/panwars87/hadoopwork/tree/master/kafka/kafka-api

Apache Kafka: Producer Not Producing All Data

I am new to Kafka. My requirement is: I have two tables in a database, source and destination. I want to fetch data from the source table and store it into the destination table; in between, Kafka will work as producer and consumer. I have written the code, but the problem is that when the producer produces the data, some records are missed. For example, if I have 100 records in the source table, it does not produce all 100 records. I am using Kafka 0.10.
MyProducer Config-
bootstrap.servers=192.168.1.XXX:9092,192.168.1.XXX:9093,192.168.1.XXX:9094
acks=all
retries=2
batch.size=16384
linger.ms=2
buffer.memory=33554432
key.serializer=org.apache.kafka.common.serialization.IntegerSerializer
value.serializer=org.apache.kafka.common.serialization.StringSerializer
My Producer Code:-
public void run() {
SourceDAO sourceDAO = new SourceDAO();
Source source;
int id;
try {
logger.debug("INSIDE RUN");
List<Source> listOfEmployee = sourceDAO.getAllSource();
Iterator<Source> sourceIterator = listOfEmployee.iterator();
String sourceJson;
Gson gson = new Gson();
while(sourceIterator.hasNext()) {
source = sourceIterator.next();
sourceJson = gson.toJson(source);
id = source.getId();
producerRecord = new ProducerRecord<Integer, String>(TOPIC, id, sourceJson);
producerRecords.add(producerRecord);
}
for(ProducerRecord<Integer, String> record : producerRecords) {
logger.debug("Producer Record: " + record.value());
producer.send(record, new Callback() {
@Override
public void onCompletion(RecordMetadata metadata, Exception exception) {
logger.debug("Exception: " + exception);
if (exception != null)
throw new RuntimeException(exception.getMessage());
logger.info("The offset of the record we just sent is: " + metadata.offset()
+ " In Partition : " + metadata.partition());
}
});
}
producer.close();
producer.flush();
logger.info("Size of Record: " + producerRecords.size());
} catch (SourceServiceException e) {
logger.error("Unable to Produce data...", e);
throw new RuntimeException("Unable to Produce data...", e);
}
}
My Consumer Config:-
bootstrap.servers=192.168.1.XXX:9092,192.168.1.231:XXX,192.168.1.232:XXX
group.id=consume
client.id=C1
enable.auto.commit=true
auto.commit.interval.ms=1000
max.partition.fetch.bytes=10485760
session.timeout.ms=35000
consumer.timeout.ms=35000
auto.offset.reset=earliest
message.max.bytes=10000000
key.deserializer=org.apache.kafka.common.serialization.IntegerDeserializer
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
Consumer Code:-
public void doWork() {
logger.debug("Inside doWork of DestinationConsumer");
DestinationDAO destinationDAO = new DestinationDAO();
consumer.subscribe(Collections.singletonList(this.TOPIC));
while(true) {
ConsumerRecords<String, String> consumerRecords = consumer.poll(1000);
int minBatchSize = 1;
for(ConsumerRecord<String, String> rec : consumerRecords) {
logger.debug("Consumer Recieved Record: " + rec);
consumerRecordsList.add(rec);
}
logger.debug("Record Size: " + consumerRecordsList.size());
if(consumerRecordsList.size() >= minBatchSize) {
try {
destinationDAO.insertSourceDataIntoDestination(consumerRecordsList);
} catch (DestinationServiceException e) {
logger.error("Unable to update destination table");
}
}
}
}
From what can be seen here, I would guess that you did not flush or close the producer properly. Note that send() runs asynchronously and just prepares a batch which is sent later on (depending on the configuration of your producer):
From the kafka documentation
The send() method is asynchronous. When called it adds the record to a buffer of pending record sends and immediately returns. This allows the producer to batch together individual records for efficiency.
What you should try is to call producer.close() only after you have iterated over all producerRecords (BTW: why are you caching the entire producerRecords list? That might cause problems when you have too many records).
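A minimal sketch of that ordering, reusing the producer and producerRecords from the question; collecting the send futures is only an assumption for illustration, to surface failed sends:
// Sketch: send everything first, then flush; close only at the very end.
List<Future<RecordMetadata>> futures = new ArrayList<>();
for (ProducerRecord<Integer, String> record : producerRecords) {
    futures.add(producer.send(record));
}
producer.flush(); // push any buffered records to the brokers
try {
    for (Future<RecordMetadata> f : futures) {
        RecordMetadata md = f.get(); // blocks; throws if that send failed
        logger.info("Stored at partition " + md.partition() + ", offset " + md.offset());
    }
} catch (InterruptedException | ExecutionException e) {
    logger.error("A send failed", e);
}
producer.close(); // close only after all sends are accounted for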
If that does not help, you should try using e.g. the console consumer to figure out what is missing. Please share some more code: how is the producer configured? What does your consumer look like? What is the type of producerRecords?
Hope that helps.

kafka consumer group is rebalancing

I am using Kafka 0.9 and the new Java consumer. I am polling inside a loop. I am getting a CommitFailedException because of a group rebalance when the code tries to execute consumer.commitSync(). Please note, I am setting session.timeout.ms to 30000 and heartbeat.interval.ms to 10000 on the consumer, and the polling definitely happens within 30000 ms. Can anyone help me out? Please let me know if any other information is needed.
Here is the code :-
Properties props = new Properties();
props.put("bootstrap.servers", {allthreeservers});
props.put("group.id", groupId);
props.put("key.deserializer", StringDeserializer.class.getName());
props.put("value.deserializer", ObjectSerializer.class.getName());
props.put("auto.offset.reset", erlierst);
props.put("enable.auto.commit", false);
props.put("session.timeout.ms", 30000);
props.put("heartbeat.interval.ms", 10000);
props.put("request.timeout.ms", 31000);
props.put("kafka.consumer.topic.name", topic);
props.put("max.partition.fetch.bytes", 1000);
while (true) {
Boolean isPassed = true;
try {
ConsumerRecords<Object, Object> records = consumer.poll(1000);
if (records.count() > 0) {
ConsumeEventInThread consumerEventInThread = new ConsumeEventInThread(records, consumerService);
FutureTask<Boolean> futureTask = new FutureTask<>(consumerEventInThread);
executorServiceForAsyncKafkaEventProcessing.execute(futureTask);
try {
isPassed = (Boolean) futureTask.get(Long.parseLong(props.getProperty("session.timeout.ms")) - Long.parseLong("5000"), TimeUnit.MILLISECONDS);
} catch (Exception e) {
logger.warn("Time out after waiting till session time out");
}
consumer.commitSync();
logger.info("Successfully committed offset for topic " + Arrays.asList(props.getProperty("kafka.consumer.topic.name")));
}else{
logger.info("Failed to process consumed messages, will not Commit and consume again");
}
} catch (Exception e) {
logger.error("Unhandled exception in while consuming messages " + Arrays.asList(props.getProperty("kafka.consumer.topic.name")), e);
}
}
The CommitFailedException is thrown when the commit cannot be completed because the group has been rebalanced. This is the main thing we have to be careful of when using the Java client. Since all network IO (including heartbeating) and message processing is done in the foreground, it is possible for the session timeout to expire while a batch of messages is being processed. To handle this, you have two choices.
First you can adjust the session.timeout.ms setting to ensure that the handler has enough time to finish processing messages. You can then tune max.partition.fetch.bytes to limit the amount of data returned in a single batch, though you will have to consider how many partitions are in the subscribed topics.
The second option is to do message processing in a separate thread, but you will have to manage flow control to ensure that the threads can keep up.
You can set session.timeout.ms large enough that commit failures from rebalances are rare. The only drawback is a longer delay before partitions can be re-assigned in the event of a hard failure.
For more info please see the docs.
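A rough sketch of the first option (the values are illustrative, not recommendations):
// Sketch of option 1: more time per batch, smaller batches (Kafka 0.9-era settings).
Properties props = new Properties();
props.put("session.timeout.ms", 60000);        // more headroom before the coordinator evicts the member
props.put("request.timeout.ms", 61000);        // keep above session.timeout.ms, as in the question
props.put("max.partition.fetch.bytes", 65536); // less data, and so less work, returned per partition per fetch
// ... bootstrap.servers, group.id, deserializers, enable.auto.commit=false as in the question ...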
This is a working example.
----Worker code-----
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
public class Worker implements Callable<Boolean> {
ConsumerRecord record;
public Worker(ConsumerRecord record) {
this.record = record;
}
public Boolean call() {
Map<String, Object> data = new HashMap<>();
try {
data.put("partition", record.partition());
data.put("offset", record.offset());
data.put("value", record.value());
Thread.sleep(10000);
System.out.println("Processing Thread---" + Thread.currentThread().getName() + " data: " + data);
return Boolean.TRUE;
} catch (Exception e) {
e.printStackTrace();
return Boolean.FALSE;
}
}
}
---------Execution code------------------
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.util.*;
import java.util.concurrent.*;
public class AsyncConsumer {
public static void main(String[] args) {
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test-group");
props.put("key.deserializer", StringDeserializer.class.getName());
props.put("value.deserializer", StringDeserializer.class.getName());
props.put("enable.auto.commit", false);
props.put("session.timeout.ms", 30000);
props.put("heartbeat.interval.ms", 10000);
props.put("request.timeout.ms", 31000);
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("Test1", "Test2"));
int poolSize=10;
ExecutorService es= Executors.newFixedThreadPool(poolSize);
CompletionService<Boolean> completionService=new ExecutorCompletionService<Boolean>(es);
try {
while (true) {
System.out.println("Polling................");
ConsumerRecords<String, String> records = consumer.poll(1000);
List<ConsumerRecord> recordList = new ArrayList();
for (ConsumerRecord<String, String> record : records) {
recordList.add(record);
if(recordList.size() ==poolSize){
int taskCount=poolSize;
//process it
recordList.forEach( recordTobeProcess -> completionService.submit(new Worker(recordTobeProcess)));
while(taskCount >0){
try {
Future<Boolean> futureResult = completionService.poll(1, TimeUnit.SECONDS);
if (futureResult != null) {
boolean result = futureResult.get().booleanValue();
taskCount = taskCount - 1;
}
}catch (Exception e) {
e.printStackTrace();
}
}
recordList.clear();
Map<TopicPartition,OffsetAndMetadata> commitOffset= Collections.singletonMap(new TopicPartition(record.topic(),record.partition()),
new OffsetAndMetadata(record.offset() + 1));
consumer.commitSync(commitOffset);
}
}
}
} finally {
consumer.close();
}
}
}
You need to follow some rules, like:
1) Pass a fixed number of records (for example 10) to ConsumeEventInThread.
2) Create more threads for processing instead of one thread, and submit all tasks to the CompletionService.
3) Poll all submitted tasks and verify them.
4) Then commit (you should use the parametric commitSync method instead of the non-parametric one); see the sketch below.
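For rule 4, a small sketch of the parametric commitSync; the helper name commitRecord is made up for illustration, but the pattern is the same one used in the example above:
// Sketch: commit an explicit offset for one partition; offset + 1 marks the next record to read.
static void commitRecord(KafkaConsumer<String, String> consumer, ConsumerRecord<String, String> record) {
    Map<TopicPartition, OffsetAndMetadata> offsets = Collections.singletonMap(
            new TopicPartition(record.topic(), record.partition()),
            new OffsetAndMetadata(record.offset() + 1));
    consumer.commitSync(offsets);
}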