Kafka Producer Consumer API Issue - apache-kafka

I am using Kafka v0.10.0.0 and have written producer and consumer Java code, but the code gets stuck on producer.send() without any exception in the logs.
Can anyone please help? Thanks in advance.
I am using/modifying the "mapr - kafka sample program". You can look at the full code here:
https://github.com/panwars87/kafka-sample-programs
**Important:** I changed the kafka-clients version to 0.10.0.0 in the Maven dependencies, and I am running Kafka 0.10.0.0 locally.
import com.google.common.io.Resources;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class Producer {
    public static void main(String[] args) throws IOException {
        // set up the producer from producer.props on the classpath
        KafkaProducer<String, String> producer;
        System.out.println("Starting Producers....");
        try (InputStream props = Resources.getResource("producer.props").openStream()) {
            Properties properties = new Properties();
            properties.load(props);
            producer = new KafkaProducer<>(properties);
            System.out.println("Property loaded successfully ....");
        }
        try {
            for (int i = 0; i < 20; i++) {
                // send lots of messages
                System.out.println("Sending record one by one....");
                producer.send(new ProducerRecord<String, String>("fast-messages", "sending message - " + i + " to fast-message."));
                System.out.println(i + " message sent....");
                // every so often send to a different topic
                if (i % 2 == 0) {
                    producer.send(new ProducerRecord<String, String>("fast-messages", "sending message - " + i + " to fast-message."));
                    producer.send(new ProducerRecord<String, String>("summary-markers", "sending message - " + i + " to summary-markers."));
                    producer.flush();
                    System.out.println("Sent msg number " + i);
                }
            }
        } catch (Throwable throwable) {
            // print the full stack trace of whatever went wrong
            throwable.printStackTrace();
        } finally {
            producer.close();
        }
    }
}
import com.google.common.io.Resources;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Properties;
import java.util.Random;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class Consumer {
    public static void main(String[] args) throws IOException {
        // set up the consumer from consumer.props on the classpath
        KafkaConsumer<String, String> consumer;
        try (InputStream props = Resources.getResource("consumer.props").openStream()) {
            Properties properties = new Properties();
            properties.load(props);
            if (properties.getProperty("group.id") == null) {
                properties.setProperty("group.id", "group-" + new Random().nextInt(100000));
            }
            consumer = new KafkaConsumer<>(properties);
        }
        consumer.subscribe(Arrays.asList("fast-messages", "summary-markers"));
        int timeouts = 0;
        //noinspection InfiniteLoopStatement
        while (true) {
            // read records with a short timeout. If we time out, we don't really care.
            ConsumerRecords<String, String> records = consumer.poll(200);
            if (records.count() == 0) {
                timeouts++;
            } else {
                System.out.printf("Got %d records after %d timeouts\n", records.count(), timeouts);
                timeouts = 0;
            }
            for (ConsumerRecord<String, String> record : records) {
                switch (record.topic()) {
                    case "fast-messages":
                        System.out.println("Record value for fast-messages is :" + record.value());
                        break;
                    case "summary-markers":
                        System.out.println("Record value for summary-markers is :" + record.value());
                        break;
                    default:
                        throw new IllegalStateException("Shouldn't be possible to get message on topic " + record.topic());
                }
            }
        }
    }
}

The code you're running is for a demo of MapR, which is not Kafka. MapR claims API compatibility with Kafka 0.9, but even then MapR treats message offsets differently than Kafka does (offsets are byte offsets of messages rather than incremental offsets), etc. The MapR implementation is also very, very different, to say the least. This means that if you're lucky, a Kafka 0.9 app might just happen to run on MapR and vice versa. There is no such guarantee for other releases.
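To see why offset semantics matter, here is a toy comparison (not MapR's actual format) between Kafka's one-per-message offsets and hypothetical byte-position offsets. Code written for Kafka often computes "messages remaining" as the end offset minus the current position; that arithmetic is only correct when offsets advance by exactly one per message. All class and method names below are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: the byte-position scheme is an assumption, not MapR's real encoding.
final class OffsetModels {
    // Kafka-style offsets: the i-th message in a partition gets offset i.
    static List<Long> sequentialOffsets(List<String> messages) {
        List<Long> offsets = new ArrayList<>();
        for (int i = 0; i < messages.size(); i++) {
            offsets.add((long) i);
        }
        return offsets;
    }

    // Hypothetical byte-position offsets: a message's offset is the number of
    // payload bytes (UTF-8, no framing) written before it.
    static List<Long> bytePositionOffsets(List<String> messages) {
        List<Long> offsets = new ArrayList<>();
        long position = 0;
        for (String m : messages) {
            offsets.add(position);
            position += m.getBytes(StandardCharsets.UTF_8).length;
        }
        return offsets;
    }

    // Offset the next appended message would receive under each model.
    static long nextSequentialOffset(List<String> messages) {
        return messages.size();
    }

    static long nextBytePositionOffset(List<String> messages) {
        long total = 0;
        for (String m : messages) {
            total += m.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    // "Messages left to read", computed the way code that assumes
    // one-per-message offsets typically does it.
    static long naiveRemaining(long nextToRead, long logEnd) {
        return logEnd - nextToRead;
    }

    public static void main(String[] args) {
        List<String> msgs = Arrays.asList("a", "bb", "ccc");
        // After consuming the first message, the next position to read is offsets.get(1).
        long seq = naiveRemaining(sequentialOffsets(msgs).get(1), nextSequentialOffset(msgs));
        long bytes = naiveRemaining(bytePositionOffsets(msgs).get(1), nextBytePositionOffset(msgs));
        System.out.println(seq + " vs " + bytes); // prints "2 vs 5"; two messages actually remain
    }
}
```

The same arithmetic gives the right answer under one model and a wrong one under the other, which is the kind of silent incompatibility the answer warns about.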

Thank you everyone for your input. I resolved this by tweaking the MapR code and referring to a few other posts. Link to the solution:
https://github.com/panwars87/hadoopwork/tree/master/kafka/kafka-api

Related

Kafka consumer getting stuck

We are using Kafka streams to insert into PostgreSQL; since the flow is too high, a direct insert is avoided. The consumer seems to be working well, but it gets stuck occasionally and we can't find the root cause.
The consumer has been running for about 6 months and has already consumed billions of records. I don't understand why it's getting stuck lately. I don't even know where to start debugging.
Below is the code for processing the records:
private static void readFromTopic(DataSource datasource, ConsumerOptions options) {
    KafkaConsumer<String, String> consumer = KafkaConsumerConfig.createConsumerGroup(options);
    Producer<Long, String> producer = KafkaProducerConfig.createKafkaProducer(options);
    if (options.isReadFromAnOffset()) {
        // if want to assign particular offsets to consume from
        // will work for only a single partition for a consumer
        List<TopicPartition> tpartition = new ArrayList<TopicPartition>();
        tpartition.add(new TopicPartition(options.getTopicName(), options.getPartition()));
        consumer.assign(tpartition);
        consumer.seek(tpartition.get(0), options.getOffset());
    } else {
        // use auto assign partition & offsets
        consumer.subscribe(Arrays.asList(options.getTopicName()));
        log.debug("subscribed to topic {}", options.getTopicName());
    }
    List<Payload> payloads = new ArrayList<>();
    while (true) {
        // poll timeout: how long poll() blocks waiting for records from the broker
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(50));
        if (records.count() != 0)
            log.debug("poll size is {}", records.count());
        Set<TopicPartition> partitions = records.partitions();
        // reading normally as per round robin and the last committed offset
        for (ConsumerRecord<String, String> r : records) {
            log.debug(" Partition : {} Offset : {}", r.partition(), r.offset());
            try {
                JSONArray arr = new JSONArray(r.value());
                for (Object o : arr) {
                    Payload p = JsonIterator.deserialize(((JSONObject) o).toString(), Payload.class);
                    payloads.add(p);
                }
                List<Payload> steplist = new ArrayList<>();
                steplist.addAll(payloads);
                // Run a task specified by a Runnable Object asynchronously.
                CompletableFuture<Void> future = CompletableFuture.runAsync(new Runnable() {
                    @Override
                    public void run() {
                        try {
                            Connection conn = datasource.getConnection();
                            PgInsert.insertIntoPg(steplist, conn, consumer, r, options.getTopicName(),
                                    options.getErrorTopic(), producer);
                        } catch (Exception e) {
                            log.error("error in processing future {}", e);
                        }
                    }
                }, executorService);
                // used to combine all futures
                allfutures.add(future);
                payloads.clear();
            } catch (Exception e) {
                // pushing into new topic for records which have failed
                log.debug("error in kafka consumer {}", e);
                ProducerRecord<Long, String> record = new ProducerRecord<Long, String>(options.getErrorTopic(),
                        r.offset(), r.value());
                producer.send(record);
            }
        }
        // committing after every poll
        consumer.commitSync();
        if (records.count() != 0) {
            Map<TopicPartition, OffsetAndMetadata> metadata = consumer.committed(partitions);
            // reading the committed offsets for each partition after polling
            for (TopicPartition tpartition : partitions) {
                OffsetAndMetadata offsetdata = metadata.get(tpartition);
                if (offsetdata != null && tpartition != null)
                    log.debug("committed offset is " + offsetdata.offset() + " for topic partition "
                            + tpartition.partition());
            }
        }
        // waiting for all threads to complete after each poll
        try {
            waitForFuturesToEnd();
            allfutures.clear();
        } catch (InterruptedException e) {
            e.printStackTrace();
        } catch (ExecutionException e) {
            e.printStackTrace();
        }
    }
}
Earlier I thought the consumer was getting stuck because of the size of the records being consumed, so I reduced MAX_POLL_RECORDS_CONFIG to 10. This ensures that the records returned by a single poll won't exceed 200 KB, since each record can be at most 20 KB.
I am thinking of using the Spring framework to resolve this issue, but before that I would like to know why exactly the consumer gets stuck. Any insights on this will be helpful.
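The sizing argument in the question can be checked with a small sketch that uses plain java.util.Properties, so it runs without a broker. The key string is the one behind ConsumerConfig.MAX_POLL_RECORDS_CONFIG; the class and method names are hypothetical, and treating 20 KB as 20 * 1024 bytes is an assumption. Note that this bounds only the payload of one poll(). It does not bound the time spent processing between polls: kafka-clients 0.10.1 and later also enforce max.poll.interval.ms (default 300000 ms), and a consumer that does not call poll() again within that window is considered failed and its partitions are rebalanced.

```java
import java.util.Properties;

final class PollSizing {
    // Consumer override from the question; "max.poll.records" is the value of
    // ConsumerConfig.MAX_POLL_RECORDS_CONFIG.
    static Properties pollOverrides(int maxPollRecords) {
        Properties props = new Properties();
        props.setProperty("max.poll.records", Integer.toString(maxPollRecords));
        return props;
    }

    // Upper bound on the record payload one poll() can hand back:
    // at most max.poll.records records, each at most maxRecordBytes.
    static long worstCaseBytesPerPoll(Properties props, long maxRecordBytes) {
        int maxPollRecords = Integer.parseInt(props.getProperty("max.poll.records"));
        return maxPollRecords * maxRecordBytes;
    }

    public static void main(String[] args) {
        Properties props = pollOverrides(10);
        // 20 KB per record, taken here as 20 * 1024 bytes (assumption)
        System.out.println(worstCaseBytesPerPoll(props, 20 * 1024L)); // 204800
    }
}
```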

How do I configure a Kafka consumer not to resume from where it left off?

I have a Kafka consumer in Golang. I don't want to resume from where I left off last time, but rather start from the current message. How can I do that?
You can set enable.auto.commit to false and auto.offset.reset to latest for your consumer group id. This means Kafka will not automatically commit your offsets.
With auto commit disabled, your consumer group's progress is not saved (unless you commit manually). So whenever the consumer is restarted for whatever reason, it finds no saved progress and resets to the latest offset. This only holds if the group has no offsets committed from earlier runs, because auto.offset.reset applies only when no committed offset exists.
Set a new group.id for your consumer.
Then use auto.offset.reset to define the behavior of this new consumer group; in your case: latest.
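The two configuration-only answers above can be sketched as consumer properties. Java is used to match the rest of this page; the keys are the standard Kafka consumer settings, and Go clients built on librdkafka (for example confluent-kafka-go's kafka.ConfigMap) should accept the same names. The class name, method names and group prefix are placeholders.

```java
import java.util.Properties;
import java.util.UUID;

final class StartFromLatest {
    // Option 1: keep the group id but never commit offsets. auto.offset.reset
    // only applies when the group has no committed offset for a partition,
    // so this works only if nothing was committed for this group before.
    static Properties noCommitLatest(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        props.setProperty("enable.auto.commit", "false");
        props.setProperty("auto.offset.reset", "latest");
        return props;
    }

    // Option 2: a fresh group id on every run, so no committed offset can exist
    // and auto.offset.reset=latest always applies.
    static Properties freshGroupLatest(String bootstrapServers, String groupPrefix) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupPrefix + "-" + UUID.randomUUID());
        props.setProperty("auto.offset.reset", "latest");
        return props;
    }
}
```

Either Properties object would then be passed to the consumer constructor together with the key and value deserializers.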
The Apache Kafka consumer API provides kafkaConsumer.seekToEnd(), which can be used to skip the existing messages and consume only messages published after the consumer has started, without changing the consumer's group ID.
Below is an implementation. The program takes 3 arguments: topic name, group ID and starting offset. Use 0 to start from the beginning, -1 to receive only messages published after the consumer has started, -2 to leave the committed position alone (offsets are committed only in this mode), and any other value to consume from that offset.
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.WakeupException;
import java.util.*;
public class Consumer {
private static Scanner in;
public static void main(String[] argv)throws Exception{
if (argv.length != 3) {
System.err.printf("Usage: %s <topicName> <groupId> <startingOffset>\n",
Consumer.class.getSimpleName());
System.exit(-1);
}
in = new Scanner(System.in);
String topicName = argv[0];
String groupId = argv[1];
final long startingOffset = Long.parseLong(argv[2]);
ConsumerThread consumerThread = new ConsumerThread(topicName,groupId,startingOffset);
consumerThread.start();
String line = "";
while (!line.equals("exit")) {
line = in.next();
}
consumerThread.getKafkaConsumer().wakeup();
System.out.println("Stopping consumer .....");
consumerThread.join();
}
private static class ConsumerThread extends Thread{
private String topicName;
private String groupId;
private long startingOffset;
private volatile KafkaConsumer<String,String> kafkaConsumer; // volatile: main() reads it to call wakeup()
public ConsumerThread(String topicName, String groupId, long startingOffset){
this.topicName = topicName;
this.groupId = groupId;
this.startingOffset=startingOffset;
}
public void run() {
Properties configProperties = new Properties();
configProperties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
configProperties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
configProperties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
configProperties.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
configProperties.put(ConsumerConfig.CLIENT_ID_CONFIG, "offset123");
configProperties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG,false);
configProperties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG,"earliest");
//Figure out where to start processing messages from
kafkaConsumer = new KafkaConsumer<String, String>(configProperties);
kafkaConsumer.subscribe(Arrays.asList(topicName), new ConsumerRebalanceListener() {
public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
System.out.printf("%s topic-partitions are revoked from this consumer\n", Arrays.toString(partitions.toArray()));
}
public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
System.out.printf("%s topic-partitions are assigned to this consumer\n", Arrays.toString(partitions.toArray()));
Iterator<TopicPartition> topicPartitionIterator = partitions.iterator();
while(topicPartitionIterator.hasNext()){
TopicPartition topicPartition = topicPartitionIterator.next();
System.out.println("Current offset is " + kafkaConsumer.position(topicPartition) + " committed offset is ->" + kafkaConsumer.committed(topicPartition) );
if(startingOffset == -2) {
System.out.println("Leaving it alone");
}else if(startingOffset ==0){
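System.out.println("Setting offset to beginning");
// kafka-clients 0.10+ takes a Collection here (0.9 took varargs)
kafkaConsumer.seekToBeginning(Collections.singletonList(topicPartition));
}else if(startingOffset == -1){
System.out.println("Setting it to the end ");
kafkaConsumer.seekToEnd(Collections.singletonList(topicPartition));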
}else {
System.out.println("Resetting offset to " + startingOffset);
kafkaConsumer.seek(topicPartition, startingOffset);
}
}
}
});
//Start processing messages
try {
while (true) {
ConsumerRecords<String, String> records = kafkaConsumer.poll(100);
for (ConsumerRecord<String, String> record : records) {
System.out.println(record.value());
}
if(startingOffset == -2)
kafkaConsumer.commitSync();
}
}catch(WakeupException ex){
System.out.println("Exception caught " + ex.getMessage());
}finally{
kafkaConsumer.close();
System.out.println("After closing KafkaConsumer");
}
}
public KafkaConsumer<String,String> getKafkaConsumer(){
return this.kafkaConsumer;
}
}
}

Unable to get number of messages in kafka topic

I am fairly new to Kafka. I have created a sample producer and consumer in Java. Using the producer, I was able to send data to a Kafka topic, but I am not able to get the number of records in the topic using the following consumer code.
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;
import org.apache.log4j.BasicConfigurator;

public class ConsumerTests {
    public static void main(String[] args) throws Exception {
        BasicConfigurator.configure();
        String topicName = "MobileData";
        String groupId = "TestGroup";
        Properties properties = new Properties();
        properties.put("bootstrap.servers", "localhost:9092");
        properties.put("group.id", groupId);
        properties.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        properties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(properties);
        kafkaConsumer.subscribe(Arrays.asList(topicName));
        try {
            while (true) {
                ConsumerRecords<String, String> consumerRecords = kafkaConsumer.poll(100);
                System.out.println("Record count is " + consumerRecords.count());
            }
        } catch (WakeupException e) {
            // ignore for shutdown
        } finally {
            kafkaConsumer.close();
        }
    }
}
I don't get any exception in the console, but consumerRecords.count() always returns 0, even though there are messages in the topic. Please let me know if I am missing something to get the record details.
The poll(...) call should normally be in a loop. It's always possible for the initial poll(...) to return no data (depending on the timeout) while the partition assignment is in progress. Here's an example:
try {
while (true) {
ConsumerRecords<String, String> records = consumer.poll(100);
System.out.println("Record count is " + records.count());
}
} catch (WakeupException e) {
// ignore for shutdown
} finally {
consumer.close();
}
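Since poll() returns only what lies after the consumer's position, it is not a counting API: a brand-new group with the Java consumer's default auto.offset.reset=latest starts at the end of each partition and will not see records that were already in the topic. To estimate how many records a topic currently holds, one option (a sketch, assuming kafka-clients 0.10.1 or later) is to subtract each partition's beginning offset from its end offset. The helper below is hypothetical and generic over the map key so it runs without a broker; with a real consumer, both maps can be the Map<TopicPartition, Long> results of consumer.beginningOffsets(partitions) and consumer.endOffsets(partitions).

```java
import java.util.Map;

final class TopicSize {
    // Sums (end offset - beginning offset) over all partitions in endOffsets.
    // K is TopicPartition in real consumer code; any key type works here.
    static <K> long estimateRecordCount(Map<K, Long> beginningOffsets, Map<K, Long> endOffsets) {
        long total = 0;
        for (Map.Entry<K, Long> entry : endOffsets.entrySet()) {
            Long begin = beginningOffsets.get(entry.getKey());
            if (begin == null) {
                throw new IllegalArgumentException("no beginning offset for " + entry.getKey());
            }
            total += entry.getValue() - begin;
        }
        return total;
    }
}
```

The partitions list can be built from consumer.partitionsFor(topicName) by mapping each PartitionInfo to new TopicPartition(info.topic(), info.partition()). Treat the result as an upper bound: compacted topics leave gaps in the offset sequence, and transaction markers (Kafka 0.11+) also occupy offsets.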
For more info see this relevant article:

Apache Kafka: Producer Not Producing All Data

I am new to Kafka. My requirement: I have two tables in a database, source and destination. I want to fetch data from the source table and store it in the destination table, with Kafka in between acting as producer and consumer. I have written the code, but the problem is that some records are never produced. For example, if I have 100 records in the source table, not all 100 records are produced. I am using Kafka 0.10.
My producer config:
bootstrap.servers=192.168.1.XXX:9092,192.168.1.XXX:9093,192.168.1.XXX:9094
acks=all
retries=2
batch.size=16384
linger.ms=2
buffer.memory=33554432
key.serializer=org.apache.kafka.common.serialization.IntegerSerializer
value.serializer=org.apache.kafka.common.serialization.StringSerializer
My producer code:
public void run() {
SourceDAO sourceDAO = new SourceDAO();
Source source;
int id;
try {
logger.debug("INSIDE RUN");
List<Source> listOfEmployee = sourceDAO.getAllSource();
Iterator<Source> sourceIterator = listOfEmployee.iterator();
String sourceJson;
Gson gson = new Gson();
while(sourceIterator.hasNext()) {
source = sourceIterator.next();
sourceJson = gson.toJson(source);
id = source.getId();
producerRecord = new ProducerRecord<Integer, String>(TOPIC, id, sourceJson);
producerRecords.add(producerRecord);
}
for(ProducerRecord<Integer, String> record : producerRecords) {
logger.debug("Producer Record: " + record.value());
producer.send(record, new Callback() {
@Override
public void onCompletion(RecordMetadata metadata, Exception exception) {
logger.debug("Exception: " + exception);
if (exception != null)
throw new RuntimeException(exception.getMessage());
logger.info("The offset of the record we just sent is: " + metadata.offset()
+ " In Partition : " + metadata.partition());
}
});
}
producer.close();
producer.flush();
logger.info("Size of Record: " + producerRecords.size());
} catch (SourceServiceException e) {
logger.error("Unable to Produce data...", e);
throw new RuntimeException("Unable to Produce data...", e);
}
}
My consumer config:
bootstrap.servers=192.168.1.XXX:9092,192.168.1.231:XXX,192.168.1.232:XXX
group.id=consume
client.id=C1
enable.auto.commit=true
auto.commit.interval.ms=1000
max.partition.fetch.bytes=10485760
session.timeout.ms=35000
consumer.timeout.ms=35000
auto.offset.reset=earliest
message.max.bytes=10000000
key.deserializer=org.apache.kafka.common.serialization.IntegerDeserializer
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
Consumer code:
public void doWork() {
logger.debug("Inside doWork of DestinationConsumer");
DestinationDAO destinationDAO = new DestinationDAO();
consumer.subscribe(Collections.singletonList(this.TOPIC));
while(true) {
ConsumerRecords<String, String> consumerRecords = consumer.poll(1000);
int minBatchSize = 1;
for(ConsumerRecord<String, String> rec : consumerRecords) {
logger.debug("Consumer Recieved Record: " + rec);
consumerRecordsList.add(rec);
}
logger.debug("Record Size: " + consumerRecordsList.size());
if(consumerRecordsList.size() >= minBatchSize) {
try {
destinationDAO.insertSourceDataIntoDestination(consumerRecordsList);
} catch (DestinationServiceException e) {
logger.error("Unable to update destination table");
}
}
}
}
From what I can see here, I would guess that you did not flush or close the producer. Note that send() runs asynchronously: it only adds the record to a batch, which is sent later (depending on the configuration of your producer).
From the Kafka documentation:
The send() method is asynchronous. When called it adds the record to a buffer of pending record sends and immediately returns. This allows the producer to batch together individual records for efficiency.
What you should try is to call producer.close() after you have iterated over all producerRecords. (BTW: why are you caching all of producerRecords? That might cause problems when you have too many records.)
If that does not help, try using e.g. a console consumer to figure out what is missing. Please post some more code: How is the producer configured? What does your consumer look like? What is the type of producerRecords?
Hope that helps.
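One reason records can go missing without any visible error in the question's producer: send() only buffers the record, and the callback runs later on the producer's background I/O thread, so the RuntimeException thrown inside onCompletion never reaches run(); the client logs it instead. A way to surface failures in the calling thread, sketched below under the assumption that you keep the Future<RecordMetadata> returned by each producer.send(record) call, is to wait on those futures after the loop. failedSends is a hypothetical helper written against java.util.concurrent only, so it can be exercised with CompletableFuture stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class SendChecker {
    // Waits on each pending send and returns the indexes of sends that failed
    // or were not confirmed within timeoutMs. With a real producer, each element
    // is the Future<RecordMetadata> returned by producer.send(record).
    static List<Integer> failedSends(List<? extends Future<?>> pending, long timeoutMs) {
        List<Integer> failed = new ArrayList<>();
        for (int i = 0; i < pending.size(); i++) {
            try {
                pending.get(i).get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (ExecutionException | TimeoutException e) {
                // the send failed, or no acknowledgement arrived in time
                failed.add(i);
            } catch (InterruptedException e) {
                // keep the interrupt visible to the caller; this send stays unconfirmed
                Thread.currentThread().interrupt();
                failed.add(i);
            }
        }
        return failed;
    }
}
```

Because Future.get() already waits for each individual send, an explicit producer.flush() before the check is optional; producer.close() should still come afterwards.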

Kafka consumer returns empty iterator

In my sample program I try to publish a file and consume it immediately, but my consumer iterator comes back empty.
Any idea what I'm doing wrong?
Test
main() {
KafkaMessageProducer producer = new KafkaMessageProducer(topic, file);
producer.generateMessgaes();
MessageListener listener = new MessageListener(topic);
listener.start();
}
MessageListener
public void start() {
Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
topicCountMap.put(topic, new Integer(CoreConstants.THREAD_SIZE));
Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumerConnector
.createMessageStreams(topicCountMap);
List<KafkaStream<byte[], byte[]>> streams = consumerMap.get(topic);
executor = Executors.newFixedThreadPool(CoreConstants.THREAD_SIZE);
for (KafkaStream<byte[], byte[]> stream : streams) {
System.out.println("The stream is --"+ stream.iterator().makeNext().topic());
executor.submit(new ListenerThread(stream));
}
try { // without this wait the subsequent shutdown happens immediately before any messages are delivered
Thread.sleep(10000);
} catch (InterruptedException ie) {
}
if (consumerConnector != null) {
consumerConnector.shutdown();
}
if (executor != null) {
executor.shutdown();
}
}
ListenerThread
public class ListenerThread implements Runnable {
private KafkaStream<byte[], byte[]> stream;
public ListenerThread(KafkaStream<byte[], byte[]> msgStream) {
this.stream = msgStream;
System.out.println("----------" + stream.iterator().makeNext().topic());
}
public void run() {
try {
ConsumerIterator<byte[], byte[]> it = stream.iterator();
while (it.hasNext()) {
// MessageAndMetadata<byte[], byte[]> messageAndMetadata =
// it.makeNext();
// String topic = messageAndMetadata.topic();
// byte[] message = messageAndMetadata.message();
System.out.println("111111111111111111111111111");
FileProcessor processor = new FileProcessor();
processor.processFile("LOB_TOPIC", it.next().message());
}
In the code above, execution never goes inside the while loop, since the iterator appears to be empty. But I'm sure I'm publishing a single message to the same topic, and the consumer listens to that topic.
Any help would be appreciated
I was having the same issue yesterday. After working on it for a while, I still couldn't get the consumer to read from my current topic, so I took the following steps:
a. Stopped my consumer,
b. stopped the producer,
c. stopped the kafka server
bin/kafka-server-stop.sh
d. stopped the zookeeper
bin/zookeeper-server-stop.sh config/zookeeper.properties
After that I deleted my topic.
bin/kafka-topics.sh --delete --zookeeper localhost:2181 --topic test
I also deleted the files that were created by following the "Setting up a multi-broker cluster" steps, but I don't think they caused the issue.
a. Started the Zookeeper
b. started kafka
c. started the producer and sent some messages to Kafka
It started to work again. I am not sure whether this will help you, but it seems that somehow my producer must have got disconnected from the consumer. Hope this helps.