I have implemented a high-level consumer per the example page: https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
When the code runs, it consumes only half of the messages produced. I have a basic 3-node ZooKeeper cluster and 2 Kafka brokers. When I run the simple consumer code (not the high-level consumer), all the messages are consumed. Any ideas would be appreciated.
Consumer code
    public void run() {
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put("test", 2);
        Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumer.createMessageStreams(topicCountMap);
        List<KafkaStream<byte[], byte[]>> streams = consumerMap.get("test");
        executor = Executors.newFixedThreadPool(2);
        int threadNumber = 0;
        for (final KafkaStream<byte[], byte[]> stream : streams) {
            executor.submit(new Consumer(stream, threadNumber));
            threadNumber++;
        }
    }

    private static ConsumerConfig createConsumerConfig() {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zookeeper01:2181,zookeeper02:2181,zookeeper03:2181");
        props.put("group.id", "Consumers");
        props.put("zookeeper.session.timeout.ms", "10000");
        props.put("auto.commit.enable", "true");
        props.put("zookeeper.sync.time.ms", "1000");
        props.put("value.deserializer","org.apache.kafka.common.serialization.StringDeserializer");
        props.put("auto.commit.interval.ms", "500");
        return new ConsumerConfig(props);
    }
Related
In Kafka, I need to consume a topic with two partitions using two consumers (partition 1 to consumer 1 and partition 2 to consumer 2) in Java.
This is my Producer Code
    public class KafkaClientOperationProducer {
        KafkaClientOperationConsumer kac = new KafkaClientOperationConsumer();

        public void initiateProducer(ClientOperation clientOperation,
                ClientOperationManager activityManager, Logger logger) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092,localhost:9093,localhost:9094");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            Producer<String, ClientOperation> producer = new KafkaProducer<>(props);
            try {
                ProducerRecord<String, ClientOperation> record = new ProducerRecord<String, ClientOperation>(
                        topicName, key, clientOperation);
                producer.send(record);
            } finally {
                producer.flush();
                producer.close();
                kac.initiateConsumer(activityManager); // Calling Consumer
            }
        }
    }
This is my Consumer code
    public class KafkaClientOperationConsumer {
        String topicName = "CA_Topic";
        String groupName = "CA_TopicGroup";

        public void initiateConsumer(ClientOperationManager activityManager) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092,localhost:9093,localhost:9094");
            props.put("group.id", groupName);
            props.put("enable.auto.commit", "true");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            KafkaConsumer<String, ClientOperation> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Arrays.asList(topicName));
            ConsumerRecords<String, ClientOperation> records = consumer.poll(100);
            try {
                for (ConsumerRecord<String, ClientOperation> record : records) {
                    activityManager.save(record.value()); // saves data in database
                }
            } finally {
                consumer.close();
            }
        }
    }
The above code works fine for a single consumer, but not for multiple consumers.
The clientOperation is an object that holds data about a client operation.
The number of partitions is three (as you can see from the code). When I tried to call initiateConsumer from threads (i.e. via an ExecutorService executor), I got duplicate values in the database.
Please change my code so that I can consume CA_Topic using two consumers. I can't use two JVMs because of memory constraints. Thanks in advance.
I think you need to use the KafkaConsumer.assign method. Here is a small example:
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.Consumer;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "ip:port");
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "group_id");
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.ByteArrayDeserializer");
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.ByteArrayDeserializer");
    final Consumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
    // Topic name and partition id to assign to this consumer.
    // Every other consumer must be given a partition id other than 0.
    TopicPartition topicPartition = new TopicPartition("topic", 0);
    List<TopicPartition> partitionList = new ArrayList<TopicPartition>();
    partitionList.add(topicPartition);
    consumer.assign(partitionList); // assigns partition 0 to this consumer
See the Kafka documentation for details: https://kafka.apache.org/0110/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#assign(java.util.Collection)
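When several consumers live in one JVM and each calls assign itself, every partition has to be claimed by exactly one of them. Below is a minimal sketch of one way to split partition ids between consumer threads. PartitionSplit and partitionsFor are hypothetical names introduced for illustration (they are not part of the Kafka client), and the round-robin split is an assumption, not something assign requires.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the Kafka client): decides which partition
// ids a given consumer should pass to KafkaConsumer.assign.
class PartitionSplit {

    // Returns the partition ids for consumer number consumerIndex (0-based)
    // when partitionCount partitions are dealt out round-robin to
    // consumerCount consumers.
    static List<Integer> partitionsFor(int consumerIndex, int consumerCount, int partitionCount) {
        if (consumerCount <= 0 || consumerIndex < 0 || consumerIndex >= consumerCount) {
            throw new IllegalArgumentException("consumerIndex must be in [0, consumerCount)");
        }
        List<Integer> ids = new ArrayList<>();
        for (int p = consumerIndex; p < partitionCount; p += consumerCount) {
            ids.add(p);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Example: three partitions shared by two consumers.
        for (int c = 0; c < 2; c++) {
            System.out.println("consumer " + c + " -> " + partitionsFor(c, 2, 3));
        }
    }
}
```

Each returned id would then be wrapped in a TopicPartition and passed to assign by the thread that owns that consumer. KafkaConsumer is not safe for multi-threaded access, so each thread needs its own instance.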
I have a Java Kafka Streams application that reads from a topic, does some filtering and transformations, and writes the data back to Kafka on a different topic.
I print the stream object at every step.
I noticed that if I send more than a few dozen records to the input topic, some records are not consumed by my Kafka Streams application.
When I use kafka-console-consumer.sh to consume from the input topic, I do receive all records.
I'm running Kafka 1.0.0 with one broker and a single-partition topic.
Any idea why?
    public static void main(String[] args) {
        final String bootstrapServers = System.getenv("KAFKA");
        final String inputTopic = System.getenv("INPUT_TOPIC");
        final String outputTopic = System.getenv("OUTPUT_TOPIC");
        final String gatewayTopic = System.getenv("GATEWAY_TOPIC");

        final Properties streamsConfiguration = new Properties();
        streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "PreProcess");
        streamsConfiguration.put(StreamsConfig.CLIENT_ID_CONFIG, "PreProcess-client");
        streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        streamsConfiguration.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        streamsConfiguration.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        streamsConfiguration.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        streamsConfiguration.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 300L);

        final StreamsBuilder builder = new StreamsBuilder();
        final KStream<String, String> textLines = builder.stream(inputTopic);
        textLines.print();

        StreamsTransformation streamsTransformation = new StreamsTransformation(builder);
        KTable<String, Gateway> gatewayKTable = builder.table(gatewayTopic, Consumed.with(Serdes.String(), SerdesUtils.getGatewaySerde()));
        KStream<String, Message> gatewayIdMessageKStream = streamsTransformation.getStringMessageKStream(textLines, gatewayKTable);
        gatewayIdMessageKStream.print();

        KStream<String, FlatSensor> keyFlatSensorKStream = streamsTransformation.transformToKeyFlatSensorKStream(gatewayIdMessageKStream);
        keyFlatSensorKStream.to(outputTopic, Produced.with(Serdes.String(), SerdesUtils.getFlatSensorSerde()));
        keyFlatSensorKStream.print();

        KafkaStreams streams = new KafkaStreams(builder.build(), streamsConfiguration);
        streams.cleanUp();
        streams.start();

        // Add shutdown hook to respond to SIGTERM and gracefully close Kafka Streams
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            streams.close();
        }));
    }
My problem is that the Kafka consumer always hangs after receiving about 10,000 messages from Kafka. When I restart the consumer, it starts reading again and hangs again at about 10,000 messages. Whether I consume from all partitions or from just 1 partition, the consumer does not read past 10,000 messages.
P.S.: If I use KafkaSpout to read messages from Kafka, KafkaSpout also stops emitting, after about 30,000 messages.
Here is my code:
    Properties props = new Properties();
    props.put("group.id", "Tornado");
    props.put("zookeeper.connect", TwitterPropertiesLoader.getInstance().getZookeeperServer());
    props.put("zookeeper.connection.timeout.ms", "200000");
    props.put("auto.offset.reset", "smallest");
    props.put("auto.commit.enable", "true");
    props.put("auto.commit.interval.ms", "1000");
    ConsumerConfig consumerConfig = new ConsumerConfig(props);
    final ConsumerConnector consumer = Consumer.createJavaConsumerConnector(consumerConfig);

    Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
    topicCountMap.put(TwitterConstant.Kafka.TWITTER_STREAMING_TOPIC, 1);
    Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumer.createMessageStreams(topicCountMap);
    List<KafkaStream<byte[], byte[]>> streams = consumerMap.get(TwitterConstant.Kafka.TWITTER_STREAMING_TOPIC);
    final KafkaStream<byte[], byte[]> stream0 = streams.get(0);
    logger.info("Client ID=" + stream0.clientId());

    for (MessageAndMetadata<byte[], byte[]> message : stream0) {
        try {
            String messageReceived = new String(message.message(), "UTF-8");
            logger.info("partition = " + message.partition() + ", offset=" + message.offset() + " => " + messageReceived);
            //consumer.commitOffsets(true);
            writeMessageToDatabase(messageReceived);
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }
Edit: this is the log file. After 10,000 messages, something like a consumer rebalance seems to happen (not sure), but KafkaStream cannot continue to read messages.
I have been using Kafka 0.8 and am trying to move to Kafka 0.10.
We have a topic with 10 partitions and used to create a consumer group with 10 consumers, as shown below.
    public void run(int a_numThreads) {
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put(topic, new Integer(a_numThreads));
        Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumer.createMessageStreams(topicCountMap);
        List<KafkaStream<byte[], byte[]>> streams = consumerMap.get(topic);

        // now launch all the threads
        //
        executor = Executors.newFixedThreadPool(a_numThreads);

        // now create an object to consume the messages
        //
        int threadNumber = 0;
        for (final KafkaStream stream : streams) {
            executor.execute(new ConsumerTest(stream, threadNumber));
            threadNumber++;
        }
    }
Here, we used to pass the number of threads based on the number of partitions.
But with the Kafka 0.10 consumer, I'm not sure whether there is anything like that. It doesn't return streams based on partitions.
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "192.168.33.10:9092");
        props.put("group.id", "group-1");
        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "1000");
        props.put("auto.offset.reset", "earliest");
        props.put("session.timeout.ms", "30000");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(props);
        kafkaConsumer.subscribe(Arrays.asList("HelloKafkaTopic"));
        while (true) {
            ConsumerRecords<String, String> records = kafkaConsumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset = %d, value = %s", record.offset(), record.value());
                System.out.println();
            }
        }
    }
Thanks in Advance
The new consumer enables a simple and efficient implementation that can handle all I/O from a single thread. That's quite different from the old consumer. See this blog post for further details:
https://www.confluent.io/blog/tutorial-getting-started-with-the-new-apache-kafka-0-9-consumer-client/
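With the new consumer, the thread count is no longer passed to the client; each KafkaConsumer that subscribes with the same group.id is handed a subset of the topic's partitions. As a rough illustration of how 10 partitions might be spread over a varying number of group members, here is a minimal sketch of a contiguous, range-style split. RangeSplit and rangeFor are hypothetical names, and the arithmetic only approximates what a range-based assignor does; the actual assignment depends on the configured partition.assignment.strategy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration (not Kafka code): a contiguous, range-style split
// of one topic's partitions over the members of a consumer group.
class RangeSplit {

    // Returns the partition ids that member number memberIndex (0-based)
    // would own when partitionCount partitions are divided into contiguous
    // ranges over memberCount members; the first (partitionCount % memberCount)
    // members receive one extra partition.
    static List<Integer> rangeFor(int memberIndex, int memberCount, int partitionCount) {
        if (memberCount <= 0 || memberIndex < 0 || memberIndex >= memberCount) {
            throw new IllegalArgumentException("memberIndex must be in [0, memberCount)");
        }
        int base = partitionCount / memberCount;
        int extra = partitionCount % memberCount;
        int start = base * memberIndex + Math.min(memberIndex, extra);
        int length = base + (memberIndex < extra ? 1 : 0);
        List<Integer> ids = new ArrayList<>();
        for (int p = start; p < start + length; p++) {
            ids.add(p);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Ten partitions, as in the question, spread over three members.
        for (int m = 0; m < 3; m++) {
            System.out.println("member " + m + " -> " + rangeFor(m, 3, 10));
        }
    }
}
```

With 10 members each one ends up with a single partition, which mirrors the old 10-thread setup. Members beyond the partition count receive nothing, so running more consumers than partitions in one group adds no parallelism.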
I am trying to run this Kafka consumer code in Java for a particular topic, but it's not receiving any messages from that topic. The server is running on a different Windows machine. Please help me out.
    {
        Properties props = new Properties();
        props.put("bootstrap.servers", "10.100.144.157:2181");
        props.put("group.id", "test");
        props.put("enable.auto.commit", "false");
        props.put("auto.commit.interval.ms", "1000");
        props.put("session.timeout.ms", "30000");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("test"));
        final int minBatchSize = 200;
        List<ConsumerRecord<String, String>> buffer = new ArrayList<>();
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                buffer.add(record);
            }
            if (buffer.size() >= minBatchSize) {
                insertIntoDb(buffer);
                consumer.commitSync();
                buffer.clear();
            }
        }
    }