In Kafka, I need to consume a topic with two partitions from two consumers (partition 1 to consumer 1 and partition 2 to consumer 2) using Java.
This is my producer code:
public class KafkaClientOperationProducer {
    KafkaClientOperationConsumer kac = new KafkaClientOperationConsumer();

    public void initiateProducer(ClientOperation clientOperation,
            ClientOperationManager activityManager, Logger logger) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092,localhost:9093,localhost:9094");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        Producer<String, ClientOperation> producer = new KafkaProducer<>(props);
        try {
            ProducerRecord<String, ClientOperation> record = new ProducerRecord<String, ClientOperation>(
                    topicName, key, clientOperation);
            producer.send(record);
        } finally {
            producer.flush();
            producer.close();
            kac.initiateConsumer(activityManager); // Calling Consumer
        }
    }
}
This is my consumer code:
public class KafkaClientOperationConsumer {
    String topicName = "CA_Topic";
    String groupName = "CA_TopicGroup";

    public void initiateConsumer(ClientOperationManager activityManager) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092,localhost:9093,localhost:9094");
        props.put("group.id", groupName);
        props.put("enable.auto.commit", "true");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // Note: a consumer is configured with "value.deserializer" (here, a deserializer
        // that produces ClientOperation); "value.serializer" is a producer setting.
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        KafkaConsumer<String, ClientOperation> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList(topicName));
        ConsumerRecords<String, ClientOperation> records = consumer.poll(100);
        try {
            for (ConsumerRecord<String, ClientOperation> record : records) {
                activityManager.save(record.value()); // saves data in database
            }
        } finally {
            consumer.close();
        }
    }
}
The above code works fine with a single consumer, but not with multiple consumers.
clientOperation is an object that holds data about a client operation.
The number of partitions is three (which you can see from the code). When I tried to call initiateConsumer from threads (i.e. an ExecutorService executor), I got duplicate values in the database.
Please change my code so that I can consume CA_Topic using two consumers. I can't use two JVMs due to memory constraints. Thanks in advance.
I think you need to use the KafkaConsumer.assign method. Here is a small example:
Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "ip:port");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "group_id");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.ByteArrayDeserializer");
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.ByteArrayDeserializer");
final Consumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
// Topic name and partition id to assign to this consumer.
// Each of the other consumers must be given a different partition id.
TopicPartition topicPartition = new TopicPartition("topic", 0);
List<TopicPartition> partitionList = new ArrayList<TopicPartition>();
partitionList.add(topicPartition);
consumer.assign(partitionList); // assigns partition 0 to this consumer
See the Kafka documentation for details: https://kafka.apache.org/0110/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#assign(java.util.Collection)
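To run both consumers inside one JVM, each consumer can live on its own thread and own a disjoint set of partitions, so that no record is handled twice. The sketch below shows only that threading structure and runs without a broker. Everything in it is an assumption made for illustration: the in-memory TOPIC map stands in for the topic, and each task stands in for a thread that would create its own KafkaConsumer and call assign with the one partition it owns. With the real client, keep in mind that KafkaConsumer is not thread-safe, so every thread needs its own instance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class StaticAssignmentSketch {

    // Stand-in for a topic: partition id -> records stored in that partition.
    static final Map<Integer, List<String>> TOPIC = new HashMap<>();
    static {
        TOPIC.put(0, Arrays.asList("a0", "a1", "a2"));
        TOPIC.put(1, Arrays.asList("b0", "b1"));
    }

    // Starts one worker per partition and returns everything the workers "saved".
    static List<String> consumeAll() {
        ConcurrentLinkedQueue<String> saved = new ConcurrentLinkedQueue<>();
        ExecutorService executor = Executors.newFixedThreadPool(TOPIC.size());
        for (Integer partition : TOPIC.keySet()) {
            // Hypothetical worker: real code would build its own KafkaConsumer
            // here and call consumer.assign(...) with only this partition.
            executor.execute(() -> {
                for (String value : TOPIC.get(partition)) {
                    saved.add(partition + ":" + value); // stand-in for the database save
                }
            });
        }
        executor.shutdown();
        try {
            executor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
        }
        return new ArrayList<>(saved);
    }

    public static void main(String[] args) {
        // Order across partitions is not deterministic; each record appears once.
        System.out.println(consumeAll());
    }
}
```

Because every partition has exactly one owner, the two workers never see the same record, which is the property the duplicate rows in the question were missing.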
Related
My problem is that my Kafka consumer always hangs after receiving about 10,000 messages from Kafka. When I restart the consumer, it starts reading again and then hangs again at 10,000 messages. Whether I consume from all partitions or from just one partition, the consumer stops reading after 10,000 messages.
P.S.: If I use KafkaSpout to read messages from Kafka, KafkaSpout also stops emitting, after about 30,000 messages.
Here is my code:
Properties props = new Properties();
props.put("group.id", "Tornado");
props.put("zookeeper.connect", TwitterPropertiesLoader.getInstance().getZookeeperServer());
props.put("zookeeper.connection.timeout.ms", "200000");
props.put("auto.offset.reset", "smallest");
props.put("auto.commit.enable", "true");
props.put("auto.commit.interval.ms", "1000");
ConsumerConfig consumerConfig = new ConsumerConfig(props);
final ConsumerConnector consumer = Consumer.createJavaConsumerConnector(consumerConfig);
Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
topicCountMap.put(TwitterConstant.Kafka.TWITTER_STREAMING_TOPIC, 1);
Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumer.createMessageStreams(topicCountMap);
List<KafkaStream<byte[], byte[]>> streams = consumerMap.get(TwitterConstant.Kafka.TWITTER_STREAMING_TOPIC);
final KafkaStream<byte[], byte[]> stream0 = streams.get(0);
logger.info("Client ID=" + stream0.clientId());
for (MessageAndMetadata<byte[], byte[]> message : stream0) {
    try {
        String messageReceived = new String(message.message(), "UTF-8");
        logger.info("partition = " + message.partition() + ", offset=" + message.offset() + " => " + messageReceived);
        // consumer.commitOffsets(true);
        writeMessageToDatabase(messageReceived);
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
}
Edit: this is the log file. After 10,000 messages, something like a consumer rebalance happens (I'm not sure), but KafkaStream cannot continue to read messages.
The Kafka consumer poll API does not return records when the timeout is low.
If I increase the timeout value passed to poll, the records arrive.
I don't understand this logic. Please help; the code follows:
public ConsumerRecords<String, Map<String, String>> subscribeToQueue(String topic, QueueListener q) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "com.intuit.eventcollection.queue.KafkaJsonDeserializer");
    props.put("group.id", "test");
    props.put("enable.auto.commit", "true");
    props.put("auto.commit.interval.ms", "1000");
    props.put("session.timeout.ms", "30000");
    props.put("auto.offset.reset", "earliest");
    // Figure out where to start processing messages from
    KafkaConsumer<String, Map<String, String>> kafkaConsumer = new KafkaConsumer<String, Map<String, String>>(
            props);
    kafkaConsumer.subscribe(Arrays.asList(topic));
    ConsumerRecords<String, Map<String, String>> records = null;
    // Start processing messages
    try {
        records = kafkaConsumer.poll(100);
poll(timeout) returns an empty result if no new unconsumed messages become available within the time period passed as the timeout. In practice, poll is called repeatedly in a loop, so an empty result from a single call is normal.
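A common pattern is to call poll repeatedly until records arrive or an overall deadline passes, rather than relying on one short call. The sketch below shows that loop without a broker. It rests on stated assumptions: java.util.concurrent.BlockingQueue stands in for the consumer (its timed poll also comes back empty-handed when nothing arrives in time), and pollUntilDeadline is a hypothetical helper name. With the real client, the inner call would be kafkaConsumer.poll(...) and the emptiness check would be records.isEmpty().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class PollLoopSketch {

    // Polls with a short per-call timeout until something arrives or the
    // overall deadline passes. Returns an empty list if nothing arrived.
    static List<String> pollUntilDeadline(BlockingQueue<String> source,
                                          long pollTimeoutMs,
                                          long deadlineMs) {
        List<String> received = new ArrayList<>();
        long stopAt = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(deadlineMs);
        try {
            while (received.isEmpty() && System.nanoTime() < stopAt) {
                // A single short poll may come back empty; that is not an error.
                String first = source.poll(pollTimeoutMs, TimeUnit.MILLISECONDS);
                if (first != null) {
                    received.add(first);
                    source.drainTo(received); // take whatever else is already queued
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
        }
        return received;
    }

    public static void main(String[] args) {
        BlockingQueue<String> source = new LinkedBlockingQueue<>();
        // Simulate a message that shows up later than one short poll would wait.
        Thread producer = new Thread(() -> {
            try {
                Thread.sleep(300);
                source.offer("late message");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        System.out.println(pollUntilDeadline(source, 100, 5000));
    }
}
```

The per-call timeout only controls how long one call may block; the loop decides how long the caller is willing to wait overall.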
I have been using Kafka 0.8 and am trying to move to Kafka 0.10.
We have a topic with 10 partitions and used to create a consumer group with 10 consumers, as shown below.
public void run(int a_numThreads) {
    Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
    topicCountMap.put(topic, new Integer(a_numThreads));
    Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumer.createMessageStreams(topicCountMap);
    List<KafkaStream<byte[], byte[]>> streams = consumerMap.get(topic);
    // now launch all the threads
    //
    executor = Executors.newFixedThreadPool(a_numThreads);
    // now create an object to consume the messages
    //
    int threadNumber = 0;
    for (final KafkaStream stream : streams) {
        executor.execute(new ConsumerTest(stream, threadNumber));
        threadNumber++;
    }
}
Here, we passed the number of threads based on the number of partitions.
But with the Kafka 0.10 consumer, I'm not sure whether there is anything like that. It doesn't return streams based on partitions.
public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "192.168.33.10:9092");
    props.put("group.id", "group-1");
    props.put("enable.auto.commit", "true");
    props.put("auto.commit.interval.ms", "1000");
    props.put("auto.offset.reset", "earliest");
    props.put("session.timeout.ms", "30000");
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(props);
    kafkaConsumer.subscribe(Arrays.asList("HelloKafkaTopic"));
    while (true) {
        ConsumerRecords<String, String> records = kafkaConsumer.poll(100);
        for (ConsumerRecord<String, String> record : records) {
            System.out.printf("offset = %d, value = %s", record.offset(), record.value());
            System.out.println();
        }
    }
}
Thanks in advance.
The new consumer enables a simple and efficient implementation that can handle all I/O from a single thread. That is quite different from the old consumer. See this blog post for further details:
https://www.confluent.io/blog/tutorial-getting-started-with-the-new-apache-kafka-0-9-consumer-client/
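To get parallelism similar to the old topicCountMap setup, the usual approach with the new consumer is to start several KafkaConsumer instances, one per thread, that subscribe with the same group.id; the group then divides the topic's partitions among them. The sketch below only illustrates how such a split can look. It is modeled on a range-style assignment, simplified to a single topic, and assignRange is a hypothetical helper for illustration, not a Kafka API.

```java
import java.util.ArrayList;
import java.util.List;

class RangeAssignmentSketch {

    // Splits partitions 0..partitionCount-1 into contiguous ranges, one per
    // consumer. The first (partitionCount % consumerCount) consumers get one
    // extra partition. Returns the partition ids owned by each consumer index.
    static List<List<Integer>> assignRange(int partitionCount, int consumerCount) {
        int perConsumer = partitionCount / consumerCount;
        int withExtra = partitionCount % consumerCount;
        List<List<Integer>> assignment = new ArrayList<>();
        for (int i = 0; i < consumerCount; i++) {
            int start = perConsumer * i + Math.min(i, withExtra);
            int length = perConsumer + (i < withExtra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int p = start; p < start + length; p++) {
                owned.add(p);
            }
            assignment.add(owned);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // 10 partitions and 10 consumers: one partition each.
        System.out.println(assignRange(10, 10));
        // 10 partitions and 3 consumers: the first consumer gets the extra one.
        System.out.println(assignRange(10, 3));
    }
}
```

If there are more consumers than partitions, the surplus consumers end up with no partitions, so starting more consumer threads than the topic has partitions adds no throughput.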
I have implemented a high-level consumer per the example page: https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
When the code runs, it only consumes half of the messages produced. I have a basic 3-node ZooKeeper cluster and 2 Kafka brokers. When I run the simple consumer code (not the high-level consumer), all the messages are consumed. Any ideas would be appreciated.
Consumer code:
public void run() {
    Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
    topicCountMap.put("test", 2);
    Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumer.createMessageStreams(topicCountMap);
    List<KafkaStream<byte[], byte[]>> streams = consumerMap.get("test");
    executor = Executors.newFixedThreadPool(2);
    int threadNumber = 0;
    for (final KafkaStream stream : streams) {
        executor.submit(new Consumer(stream, threadNumber));
        threadNumber++;
    }
}

private static ConsumerConfig createConsumerConfig() {
    Properties props = new Properties();
    props.put("zookeeper.connect", "zookeeper01:2181,zookeeper02:2181,zookeeper03:2181");
    props.put("group.id", "Consumers");
    props.put("zookeeper.session.timeout.ms", "10000");
    props.put("enable.auto.commit", "true");
    props.put("zookeeper.sync.time.ms", "1000");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("auto.commit.interval.ms", "500");
    return new ConsumerConfig(props);
}
I am trying to run this Kafka consumer code in Java for a particular topic, but it is not receiving any messages from that topic. The server is running on a different Windows machine. Please help me out.
{
    Properties props = new Properties();
    props.put("bootstrap.servers", "10.100.144.157:2181");
    props.put("group.id", "test");
    props.put("enable.auto.commit", "false");
    props.put("auto.commit.interval.ms", "1000");
    props.put("session.timeout.ms", "30000");
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
    consumer.subscribe(Arrays.asList("test"));
    final int minBatchSize = 200;
    List<ConsumerRecord<String, String>> buffer = new ArrayList<>();
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(100);
        for (ConsumerRecord<String, String> record : records) {
            buffer.add(record);
        }
        if (buffer.size() >= minBatchSize) {
            insertIntoDb(buffer);
            consumer.commitSync();
            buffer.clear();
        }
    }
}