Alternative to polling the Kafka server - apache-kafka

Is there any alternative to polling the Kafka server from a consumer/client (in Kafka 0.10.0.0)?
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("foo", "bar"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records)
        System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
}

No. Kafka brokers are passive: clients have to pull data from them, and a push model is not supported.
The poll loop in your example is the recommended pattern. See also http://docs.confluent.io/3.0.0/clients/consumer.html#java-client

How to get partitionId and TopicName in KafkaStream application

How do we get the topic name and partition ID in a Kafka Streams application? With any other Kafka consumer we can get them like this:
ConsumerRecords<String, String> records = consumer.poll(100);
for (ConsumerRecord<String, String> record : records) {
    System.out.printf("consumed: key = %s, value = %s, partition id = %s, offset = %s%n",
            record.key(), record.value(), record.partition(), record.offset());
}
I'm not sure how to get a reference to the record in Kafka Streams.
You can get the metadata of an input record via the ProcessorContext that is exposed in the Processor API. You can embed the Processor API in the DSL via transform() and similar methods.
Check out the docs for details: https://docs.confluent.io/current/streams/developer-guide/processor-api.html#accessing-processor-context

Kafka Transactional read committed Consumer

I have a transactional and a normal producer in my application, both writing to the topic kafka-topic, as shown below.
Configuration for the transactional Kafka producer:
@Bean
public Map<String, Object> producerConfigs() {
    Map<String, Object> props = new HashMap<>();
    // list of host:port pairs used for establishing the initial connections to the Kafka cluster
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    props.put(ProducerConfig.RETRIES_CONFIG, 5);
    /* The amount of time to wait before attempting to retry a failed request to a given topic partition.
     * This avoids repeatedly sending requests in a tight loop under some failure scenarios. */
    props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 3);
    /* The configuration controls the maximum amount of time the client will wait
     * for the response of a request. If the response is not received before the timeout
     * elapses the client will resend the request if necessary or fail the request if
     * retries are exhausted. */
    props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 1);
    /* To avoid duplicate messages */
    props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
    /* Wait for acks from the broker on all replicas */
    props.put(ProducerConfig.ACKS_CONFIG, "all");
    /* Kafka transactional properties */
    props.put(ProducerConfig.CLIENT_ID_CONFIG, "transactional-producer");
    props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "test-transactional-id"); // set transaction id
    return props;
}
@Bean
public KafkaProducer<String, String> kafkaProducer() {
    return new KafkaProducer<>(producerConfigs());
}
The normal producer config is the same, except that ProducerConfig.CLIENT_ID_CONFIG and ProducerConfig.TRANSACTIONAL_ID_CONFIG are not set.
The consumer config is as follows:
@Bean
public Map<String, Object> consumerConfigs() {
    Map<String, Object> props = new HashMap<>();
    // list of host:port pairs used for establishing the initial connections to the Kafka cluster
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    // allows a pool of processes to divide the work of consuming and processing records
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "kafka_group");
    // automatically reset the offset to the earliest offset
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    // auto commit is set to false; commits are done manually
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
    /* Kafka transactional property -> controls how to read messages written transactionally
     * read_committed - poll only transactional messages which have been committed
     * read_uncommitted - will return all messages, even transactional messages
     * default is read_uncommitted
     */
    props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
    return props;
}
@Bean
public ConsumerFactory<String, String> consumerFactory() {
    return new DefaultKafkaConsumerFactory<>(consumerConfigs());
}
Since I am setting isolation.level to read_committed, I expected the consumer to receive only transactional messages from the subscribed topic.
But it is consuming both transactional and non-transactional messages from the topic.
Am I missing some configuration that would make the consumer receive only transactional messages from the subscribed topic?
Thanks in advance :-)
It doesn't work that way. isolation.level only pertains to records written by transactional producers: with read_committed, the consumer skips records from transactions that were aborted or are not yet committed. All consumers see records published by non-transactional producers.
You need to use two different topics to get the behavior you desire.
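To make the visibility rule concrete, here is a small in-memory model in plain Java (no Kafka client involved) of what a consumer would be handed under each isolation level. The class name, the Kind enum, and the sample log are all hypothetical. The only behaviour the sketch tries to mirror is that read_committed skips aborted transactional records and returns nothing at or after the first record of a still-open transaction, while non-transactional records are returned under both levels.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory model of isolation.level; it does not use the Kafka client.
final class IsolationLevelSketch {

    // How a record in the modelled partition log was written.
    enum Kind { NON_TRANSACTIONAL, COMMITTED, ABORTED, OPEN }

    static final class LogEntry {
        final String value;
        final Kind kind;

        LogEntry(String value, Kind kind) {
            this.value = value;
            this.kind = kind;
        }
    }

    // Returns the values a consumer would be handed, in log order.
    static List<String> visible(List<LogEntry> log, boolean readCommitted) {
        List<String> out = new ArrayList<>();
        for (LogEntry entry : log) {
            if (readCommitted) {
                if (entry.kind == Kind.OPEN) {
                    break;      // nothing at or after the first open transaction is returned yet
                }
                if (entry.kind == Kind.ABORTED) {
                    continue;   // aborted transactional records are filtered out
                }
            }
            out.add(entry.value); // non-transactional records pass under both levels
        }
        return out;
    }

    // Made-up sample data: two plain records around one committed and one aborted record.
    static List<LogEntry> sampleLog() {
        return Arrays.asList(
                new LogEntry("plain-1", Kind.NON_TRANSACTIONAL),
                new LogEntry("tx-committed", Kind.COMMITTED),
                new LogEntry("tx-aborted", Kind.ABORTED),
                new LogEntry("plain-2", Kind.NON_TRANSACTIONAL));
    }

    public static void main(String[] args) {
        System.out.println("read_committed:   " + visible(sampleLog(), true));
        System.out.println("read_uncommitted: " + visible(sampleLog(), false));
    }
}
```

In this model both plain records survive read_committed, which matches what the consumer in the question observes: output from the non-transactional producer is never filtered by the isolation level.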

java KafkaConsumer never get results

I'm new to Kafka and have the following sample code:
KafkaConsumer<String,String> kc = new KafkaConsumer<String, String>(props);
while(true) {
    List<String> topicNames = Arrays.asList(topics.split(","));
    if (!kc.assignment().isEmpty()) {
        kc.unsubscribe();
    }
    kc.subscribe(topicNames);
    ConsumerRecords<String, String> recv = kc.poll(1000L);
    if (!recv.isEmpty()) {
        System.out.println("NOT EMPTY");
    }
}
recv is always empty, but if I increase the poll timeout the records are returned; the same happens if I remove the unsubscribe part.
I've taken this piece of code from a proprietary integration product and I cannot modify it.
So my question is: is this only a timing problem, or is there more to it?
A lot happens when a consumer (re)subscribes to a topic.
Very roughly, and as far as I remember, the consumer will:
1. request cluster information
2. request consumer group metadata
3. make a JOIN_GROUP request
4. be assigned certain partitions
The underlying mechanisms are even more complicated if there are more consumers in the same group, because the partitions have to be redistributed among all the consumers in the group.
That is why:
- 1000 ms might not be enough for all of this, so nothing was polled in time
- you polled something when you increased the timeout, because Kafka managed to complete all of these bootstrapping operations
- you polled something when you removed the unsubscription, because most likely your consumer was already subscribed
So there is a timing issue. And I think there is something more: un/subscribing to a topic inside an infinite loop makes no sense to me (see the other answer).
You should subscribe to your topics only once at the beginning. Like this:
final KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("foo", "bar"));
while (true) {
    final ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records)
        System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
}

Getting the last message sent to a kafka topic

I'm new to Kafka and working on a prototype to connect a proprietary streaming service into Kafka.
I'm looking to get the key of the last message sent on a topic as our in-house stream consumer needs to logon with the ID of the last message it received when connecting.
Is it possible, using either the KafkaProducer or a KafkaConsumer to do this?
I've attempted to do the following using a Consumer, but when also running the console consumer I see messages replayed.
// Poll so we know we're connected
consumer.poll(100);
// Get the assigned partitions
Set<TopicPartition> assignedPartitions = consumer.assignment();
// Seek to the end of those partitions
consumer.seekToEnd(assignedPartitions);
for (TopicPartition partition : assignedPartitions) {
    final long offset = consumer.committed(partition).offset();
    // Seek to the previous message
    consumer.seek(partition, offset - 1);
}
// Now get the last message
ConsumerRecords<String, String> records = consumer.poll(100);
for (ConsumerRecord<String, String> record : records) {
    lastKey = record.key();
}
consumer.close();
Is this expected behaviour or am I on the wrong path?
The problem is the line final long offset = consumer.committed(partition).offset(). As the API documentation says, committed() returns the last committed offset for the given partition, i.e. the last offset your consumer told the Kafka server it had already read.
So you will definitely see messages replayed, because you always start reading from that committed offset rather than from the end of the partition.
I think you only need to change the first for block: after seekToEnd(), use consumer.position(partition) to get the end offset instead of calling committed().
Then check the record count and take the last message:
// Poll so we know we're connected
consumer.poll(100);
// Get the assigned partitions
Set<TopicPartition> assignedPartitions = consumer.assignment();
// Seek to the end of those partitions
consumer.seekToEnd(assignedPartitions);
for (TopicPartition partition : assignedPartitions) {
    // position() after seekToEnd() is the end offset, i.e. the offset the next record will get
    final long endOffset = consumer.position(partition);
    // Seek to the previous message, if the partition has one
    if (endOffset > 0) {
        consumer.seek(partition, endOffset - 1);
    }
}
// Now get the last message
ConsumerRecords<String, String> records = consumer.poll(100);
int size = records.count();
int index = 0;
for (ConsumerRecord<String, String> record : records) {
    index = index + 1;
    if (index == size) {
        String value = record.value();
        System.out.println("Last Message = " + value);
    }
}
consumer.close();
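The offset arithmetic behind "seek to the previous message" can be sketched on its own. The end offset of a partition is the offset the next record will receive, so the last existing record sits at end - 1, and a partition whose end offset equals its beginning offset has no last record at all. The helper below is a minimal sketch in plain Java: the class and method names are made up, partitions are represented by their numbers, and the two maps stand in for what a real consumer would obtain from calls such as beginningOffsets() and endOffsets() in client versions that provide them.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; partitions are identified by number and offsets are passed in as plain maps.
final class LastOffsetSketch {

    // For each partition, returns the offset of its last record.
    // Partitions with no records (end <= beginning) are left out of the result.
    static Map<Integer, Long> lastRecordOffsets(Map<Integer, Long> beginningOffsets,
                                                Map<Integer, Long> endOffsets) {
        Map<Integer, Long> result = new TreeMap<>();
        for (Map.Entry<Integer, Long> e : endOffsets.entrySet()) {
            long end = e.getValue();
            long beginning = beginningOffsets.getOrDefault(e.getKey(), 0L);
            if (end > beginning) {
                result.put(e.getKey(), end - 1);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Long> beginning = new HashMap<>();
        Map<Integer, Long> end = new HashMap<>();
        beginning.put(0, 0L);
        end.put(0, 42L);       // 42 records, so the last one is at offset 41
        beginning.put(1, 7L);
        end.put(1, 7L);        // empty partition: older records were already deleted
        System.out.println(lastRecordOffsets(beginning, end));
    }
}
```

With more than one partition this yields one candidate per partition. Kafka only orders records within a partition, so picking a single "last" message across partitions needs an extra criterion, such as the record timestamp.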

How to read data using key in Kafka Consumer API?

I'm constructing messages using below code...
Producer<String, String> producer = new kafka.javaapi.producer.Producer<String, String>(producerConfig);
KeyedMessage<String, String> keyedMsg = new KeyedMessage<String, String>(topic, "device-420", "{message:'hello world'}");
producer.send(keyedMsg);
And Consuming using following code block...
//Key = topic name, Value = No. of threads for topic
Map<String, Integer> topicCount = new HashMap<String, Integer>();
topicCount.put(topic, 1);
//ConsumerConnector creates the message stream for each topic
Map<String, List<KafkaStream<byte[], byte[]>>> consumerStreams = consumerConnector.createMessageStreams(topicCount);
// Get Kafka stream for topic
List<KafkaStream<byte[], byte[]>> kStreamList = consumerStreams.get(topic);
// Iterate stream using ConsumerIterator
for (final KafkaStream<byte[], byte[]> kStreams : kStreamList) {
    ConsumerIterator<byte[], byte[]> consumerIte = kStreams.iterator();
    while (consumerIte.hasNext()) {
        MessageAndMetadata<byte[], byte[]> msg = consumerIte.next();
        System.out.println(topic.toUpperCase() + ">"
                + " Partition:" + msg.partition()
                + " | Key:" + new String(msg.key())
                + " | Offset:" + msg.offset()
                + " | Message:" + new String(msg.message()));
    }
}
Everything works fine, because I'm reading the data topic-wise. What I want to know is: is there any way to consume data by message key, i.e. device-420 in this example?
Short answer: no.
The smallest granularity in Kafka is a partition. You can write a client that reads only from a single partition. However, a partition can hold records for many keys, so you still have to consume every record in that partition, whatever its key.
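In practice that means filtering on the client: read every record from the topic (or from a single partition, if the producer is known to route the key there) and discard those whose key does not match. A minimal sketch in plain Java, with records modelled as key/value pairs instead of Kafka's own record types; the class name, method name, and sample data are hypothetical:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical client-side filter; each Map.Entry stands in for one consumed record (key, value).
final class KeyFilterSketch {

    // Keeps only the values whose record key equals wantedKey; records without a key are skipped.
    static List<String> valuesForKey(List<Map.Entry<String, String>> records, String wantedKey) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> record : records) {
            if (wantedKey.equals(record.getKey())) {
                matches.add(record.getValue());
            }
        }
        return matches;
    }

    // Made-up batch, as it might come back from one read of a partition.
    static List<Map.Entry<String, String>> sampleBatch() {
        List<Map.Entry<String, String>> batch = new ArrayList<>();
        batch.add(new SimpleEntry<String, String>("device-420", "{message:'hello world'}"));
        batch.add(new SimpleEntry<String, String>("device-421", "{message:'other device'}"));
        batch.add(new SimpleEntry<String, String>(null, "{message:'no key'}"));
        batch.add(new SimpleEntry<String, String>("device-420", "{message:'hello again'}"));
        return batch;
    }

    public static void main(String[] args) {
        System.out.println(valuesForKey(sampleBatch(), "device-420"));
    }
}
```

The filter costs one comparison per record, but every record still travels over the network. If per-key reads are frequent, a dedicated topic per device group may be the cheaper design.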