Using Kafka through Observable(RxJava)

I have a producer (using Kafka) and more than one consumer. I publish a message to a topic, and my consumers receive and process it.
I need the producer to receive a response from at least one consumer (ideally the first to respond). I'm trying to do this with RxJava observables.
Is it possible to do it that way? Does anyone have an example?

Here is how I am using rxjava '2.2.6' without any additional dependencies to process Kafka events:
import io.reactivex.Observable;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
// Load consumer props (the actual settings were not shown in the original)
Properties props = new Properties();
// Create a consumer
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
// Subscribe to topics ("my-topic" is a placeholder; the original call was not shown)
consumer.subscribe(Arrays.asList("my-topic"));
// Create an Observable for topic events
Observable<ConsumerRecords<String, String>> observable = Observable.fromCallable(() -> {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(10));
    return records;
});
// Process Observable events
observable.subscribe(records -> {
    if ((records != null) && (!records.isEmpty())) {
        for (ConsumerRecord<String, String> record : records) {
            System.out.println(record.offset() + ": " + record.value());
        }
    }
});

You can use it as follows:
val consumer = new RxConsumer("zookeeper:2181", "consumer-group")
.take(42 seconds)
For more information see:

It would be better if you shared your own solution first.
Since Spring Cloud Stream is a streaming solution rather than a request/reply one, there is no example to share with you.
You can consider making your consumers producers as well, and giving the original producer a consumer that reads from the reply topic. Finally, you will have to correlate each reply with its request.
RxJava, or any other implementation detail, is unrelated to this.
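The correlation step can be sketched without any Kafka code at all. The following is only a minimal in-memory illustration, assuming each request carries a unique correlation ID (for example in the message key or a header) and that consumers copy that ID into their reply. The class name ReplyCorrelator and its methods are hypothetical, not part of any Kafka or RxJava API.

```python
import uuid


class ReplyCorrelator:
    """Matches replies to pending requests by correlation ID (first reply wins)."""

    def __init__(self):
        self._pending = {}  # correlation_id -> original request payload
        self._replies = {}  # correlation_id -> first reply received

    def new_request(self, payload):
        """Register a request and return the ID to send along with it."""
        correlation_id = str(uuid.uuid4())
        self._pending[correlation_id] = payload
        return correlation_id

    def on_reply(self, correlation_id, reply):
        """Record a reply; return True only for the first reply to a known request."""
        if correlation_id not in self._pending or correlation_id in self._replies:
            return False  # unknown request, or a later reply from another consumer
        self._replies[correlation_id] = reply
        return True

    def reply_for(self, correlation_id):
        """Return the first reply recorded for this ID, or None if none arrived yet."""
        return self._replies.get(correlation_id)
```

The producer side would call new_request() before publishing and on_reply() for every record read from the reply topic. Replies from slower consumers are simply ignored, which gives the "first consumer wins" behaviour asked for in the question.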


Writing <String,Long> key,value pair from kafkaTable to Kafkatopic causes corrupt values on topic

I am trying to run the Kafka Streams example that counts the favourite colours produced to a topic. When writing from the KTable to the final topic, I cannot figure out why the Long value is not written correctly to the output topic. Below is my code; please help me figure out what I am doing wrong.
package org.example;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.TopicCollection;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.*;
import org.apache.kafka.streams.kstream.*;
import java.util.Arrays;
import java.util.Collections;
import java.util.Properties;
public class FavouriteColorStreamsExample {
public static void main(String[] args) {
Properties config = new Properties();
config.setProperty(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
config.setProperty(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
// we disable the cache to demonstrate all the "steps" involved in the transformation - not recommended in prod
config.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, "0");
StreamsBuilder streamsBuilder = new StreamsBuilder();
//stream from kafka topic
//produce the input messages to the fav-color-input topic
KStream<String,String> textLines = streamsBuilder.stream("fav-color-input");
//lets filter out only blue,green,red colors inputs
KStream<String,String> userAndColours = textLines
//we filter all input and make sure correct color and
.filter((key, val) -> val.contains(","))
//we get the key as stephane, john, alice
.selectKey((key, val) -> val.split(",")[0].toLowerCase())
.mapValues(val -> val.split(",")[1].toLowerCase())
//filter with colors
.filter((user,val) -> (Arrays.asList("blue","red","green").contains(val)));
userAndColours.to("fav-color-output-temp");
//read from above intermetary topic "fav-color-output-temp" into a ktable
KTable<String, String> usersAndColoursTable = streamsBuilder.table("fav-color-output-temp");
KTable<String, Long> favouriteColour = usersAndColoursTable
.groupBy((user, color) -> new KeyValue<>(color, color))
.peek((key,value) -> System.out.println("Key : "+key+" => value : "+value));
to("fav-color-output", Produced.with(Serdes.String(),Serdes.Long()));"fav-color-output")
.peek((key,value) -> System.out.println("Key : "+key+" => value : "+value));
Topology topology = streamsBuilder.build();
KafkaStreams kafkaStreams = new KafkaStreams(topology,config);
//only on dev - not in prod
//shutdown to correctly close the streams application
Runtime.getRuntime().addShutdownHook(new Thread(kafkaStreams::close));
I suspect the line that writes from the KTable back to the Kafka topic is the one that does not work:
to("fav-color-output", Produced.with(Serdes.String(),Serdes.Long()));
The console consumer output for the topic is just blank lines, and IntelliJ shows corrupt values.

Aug 2019 - Kafka Consumer Lag programmatically

Is there any way to find the lag of a Kafka consumer programmatically?
I don't want to install an external Kafka Manager tool and check a dashboard.
We could list all the consumer groups and check the lag for each group.
Currently we have a command to check the lag, but it requires the relative path where Kafka resides.
Spring-Kafka, kafka-python, the Kafka Admin client, or JMX: is there any way we can find the lag in code?
We were careless and didn't monitor the process; the consumer was in a zombie state and the lag went to 50,000, which caused a lot of chaos.
We only think of these cases once an issue arises: we were monitoring the script but didn't know it would end up as a zombie process.
Any thoughts are very welcome!
You can get this using kafka-python. Run this on each broker, or loop through the list of brokers; it reports the consumer lag for all topic partitions.
BOOTSTRAP_SERVERS = '{}'.format(socket.gethostbyname(socket.gethostname()))
client = BrokerConnection(BOOTSTRAP_SERVERS, 9092, socket.AF_INET)
list_groups_request = ListGroupsRequest_v1()
future = client.send(list_groups_request)
while not future.is_done:
    for resp, f in client.recv():
        f.success(resp)  # loop body missing in the original; assumed usual kafka-python pattern
for group in future.value.groups:
    if group[1] == 'consumer':
        list_members_in_groups = DescribeGroupsRequest_v1(groups=[(group[0])])
        future = client.send(list_members_in_groups)
        while not future.is_done:
            for resp, f in client.recv():
                f.success(resp)  # assumed, as above
        (error_code, group_id, state, protocol_type, protocol, members) = future.value.groups[0]
        if len(members) != 0:
            for member in members:
                (member_id, client_id, client_host, member_metadata, member_assignment) = member
                member_topics_assignment = []
                for (topic, partitions) in MemberAssignment.decode(member_assignment).assignment:
                    member_topics_assignment.append(topic)  # assumed; body missing in the original
                for topic in member_topics_assignment:
                    consumer = KafkaConsumer(
                        # arguments not shown in the original
                    )
                    for p in consumer.partitions_for_topic(topic):
                        tp = TopicPartition(topic, p)
                        committed = consumer.committed(tp)
                        last_offset = consumer.position(tp)
                        if last_offset is not None and committed is not None:
                            lag = last_offset - committed
                            print("group: {} topic:{} partition: {} lag: {}".format(group[0], topic, p, lag))
Yes, we can get consumer lag with kafka-python. I'm not sure this is the best way to do it, but it works.
Currently we supply our consumer manually. You can also get the consumers from kafka-python, but it returns only the active ones, so a consumer that is down may not show up in the list.
First, establish the client connection:
from kafka import BrokerConnection
from kafka.protocol.commit import *
import socket
# This takes only one broker at a time. To use multiple brokers, loop through each one, passing the broker IP and port.
def establish_broker_connection(server, port, group):
    """Client connection to each broker for getting consumer offset info."""
    bc = BrokerConnection(server, port, socket.AF_INET)
    fetch_offset_request = OffsetFetchRequest_v3(group, None)
    future = bc.send(fetch_offset_request)
    return future, bc  # assumed: the next step takes both as arguments
Next we need to get the current offset for each topic the consumer is subscribed to. Pass the above future and bc here.
from kafka import SimpleClient
from kafka.protocol.offset import OffsetRequest, OffsetResetStrategy
from kafka.common import OffsetRequestPayload
def _get_client_connection():
    """Client connection to the cluster for getting topic info."""
    # Give comma-separated Kafka broker info: "broker1:port1,broker2:port2"
    client = SimpleClient(BOOTSTRAP_SERVERS)
    return client
def get_latest_offset_for_topic(topic):
    """Get the latest offset for a topic."""
    client = _get_client_connection()
    partitions = client.topic_partitions[topic]
    offset_requests = [OffsetRequestPayload(topic, p, -1, 1) for p in partitions.keys()]
    offsets_responses = client.send_offset_request(offset_requests)
    latest_offset = offsets_responses[0].offsets[0]
    return latest_offset  # Gives latest offset for topic
def get_current_offset_for_consumer_group(future, bc):
    """Get current offset info for a consumer group."""
    while not future.is_done:
        for resp, f in bc.recv():
            f.success(resp)  # loop body missing in the original; assumed usual kafka-python pattern
    # future.value.topics -- This will give all the topics in the form of a list.
    for topic in future.value.topics:
        latest_offset = get_latest_offset_for_topic(topic[0])
        for partition in topic[1]:
            offset_difference = latest_offset - partition[1]
offset_difference gives the difference between the last offset produced in the topic and the last offset (or message) consumed by your consumer.
If you are not getting current offset for a consumer for a topic then it means your consumer is probably down.
So you can raise alerts or send mail if the offset difference is above a threshold you want or if you get empty offsets for your consumer.
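The alerting rule described above can be sketched as a small pure function. This is only an illustration: the name check_lag, the message wording, and the input shape (a dict from (topic, partition) to the offset difference, with None when no committed offset came back) are assumptions, not part of kafka-python.

```python
def check_lag(offset_differences, threshold):
    """Return alert messages for partitions whose lag is missing or above threshold.

    offset_differences maps (topic, partition) -> offset difference, or None
    when no committed offset could be fetched for the consumer group.
    """
    alerts = []
    for (topic, partition), lag in sorted(offset_differences.items()):
        if lag is None:
            # An empty offset suggests the consumer is probably down.
            alerts.append("{}[{}]: no committed offset, consumer may be down".format(topic, partition))
        elif lag > threshold:
            alerts.append("{}[{}]: lag {} exceeds {}".format(topic, partition, lag, threshold))
    return alerts
```

The returned list can then be sent by mail or passed to whatever alerting channel you already use; an empty list means nothing crossed the threshold.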
The java client exposes the lag for its consumers over JMX; in this example we have 5 partitions...
Spring Boot can publish these to micrometer.
I'm writing the code in Scala but use only the native Java API from KafkaConsumer and KafkaProducer.
You only need to know the names of the consumer group and the topics.
It's possible to avoid pre-defined topics, but then you will get lag only for consumer groups that exist and whose state is stable (not rebalancing), which can be a problem for alerting.
So all that you really need to know and use are:
KafkaConsumer.committed - returns the latest committed offset for a TopicPartition
KafkaConsumer.assign - do not use subscribe, because it causes a consumer group rebalance. You definitely do not want your monitoring process to influence the thing it monitors.
KafkaConsumer.endOffsets - returns the latest produced offset
Consumer group lag - the difference between the latest committed offset and the latest produced offset
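The arithmetic behind that last definition, as a rough sketch in plain Python with no Kafka client involved. The function name and the dict-based inputs are made up for illustration: end_offsets stands for what endOffsets returns and committed_offsets for what committed returns, both keyed by (topic, partition). Treating a missing commit as offset 0 is an assumption.

```python
def consumer_group_lag(end_offsets, committed_offsets):
    """Per-partition lag = latest produced offset - latest committed offset."""
    lag = {}
    for tp, end_offset in end_offsets.items():
        committed = committed_offsets.get(tp)
        if committed is None:
            # Assumption: no commit means the group has consumed nothing yet
            # (or does not exist), so the whole partition counts as lag.
            committed = 0
        lag[tp] = end_offset - committed
    return lag
```

Summing the values gives the total lag of the group across the given partitions.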
import java.util.{Properties, UUID}
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.clients.producer.KafkaProducer
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.serialization.{StringDeserializer, StringSerializer}
import scala.collection.JavaConverters._
import scala.util.Try
case class TopicPartitionInfo(topic: String, partition: Long, currentPosition: Long, endOffset: Long) {
val lag: Long = endOffset - currentPosition
override def toString: String = s"topic=$topic,partition=$partition,currentPosition=$currentPosition,endOffset=$endOffset,lag=$lag"
case class ConsumerGroupInfo(consumerGroup: String, topicPartitionInfo: List[TopicPartitionInfo]) {
override def toString: String = s"ConsumerGroup=$consumerGroup:\n${topicPartitionInfo.mkString("\n")}"
object ConsumerLag {
def consumerGroupInfo(bootStrapServers: String, consumerGroup: String, topics: List[String]) = {
val properties = new Properties()
properties.put("bootstrap.servers", bootStrapServers)
properties.put("auto.offset.reset", "latest")
properties.put("", consumerGroup)
properties.put("key.deserializer", classOf[StringDeserializer])
properties.put("value.deserializer", classOf[StringDeserializer])
properties.put("key.serializer", classOf[StringSerializer])
properties.put("value.serializer", classOf[StringSerializer])
properties.put("", UUID.randomUUID().toString)
val kafkaProducer = new KafkaProducer[String, String](properties)
val kafkaConsumer = new KafkaConsumer[String, String](properties)
val assignment = topics
.map(topic => kafkaProducer.partitionsFor(topic).asScala)
.flatMap(partitions => => new TopicPartition(p.topic, p.partition)))
.map { case (tp, latestOffset) =>
Try(kafkaConsumer.committed(tp)).map(_.offset).getOrElse(0), // TODO Warn if Null, Null mean Consumer Group not exist
def main(args: Array[String]): Unit = {
bootStrapServers = "kafka-prod:9092",
consumerGroup = "not-exist",
topics = List("events", "anotherevents")
bootStrapServers = "kafka:9092",
consumerGroup = "consumerGroup1",
topics = List("events", "anotehr events")
If anyone is looking for consumer lag in Confluent Cloud, here is a simple script:
CCLOUD_API_KEY = "{{ ccloud_apikey }}"
CCLOUD_API_SECRET = "{{ ccloud_apisecret }}"
CACERT = "/usr/local/lib/python{{ python3_version }}/site-packages/certifi/cacert.pem"
def main():
    client = KafkaAdminClient(bootstrap_servers=BOOTSTRAP_SERVERS,
                              # remaining arguments not shown in the original
                              )
    for group in client.list_consumer_groups():
        if group[1] == 'consumer':
            consumer = KafkaConsumer(
                # arguments not shown in the original
            )
            list_members_in_groups = client.list_consumer_group_offsets(group[0])
            for (topic, partition) in list_members_in_groups:
                tp = TopicPartition(topic, partition)
                committed = consumer.committed(tp)
                last_offset = consumer.position(tp)
                if last_offset is not None and committed is not None:
                    lag = last_offset - committed
                    print("group: {} topic:{} partition: {} lag: {}".format(group[0], topic, partition, lag))

Kafka: KafkaConsumer not able to pull all records

I'm pretty new to Kafka.
For the purpose of stress testing my cluster and building operational experience, I created two simple Java applications: one that repeatedly publishes messages to a topic (a sequence of integers) and another that loads the entire topic (all records) and verifies that the sequence is complete. The expectation is that no messages get lost due to operations on the cluster (restarting a node, replacing a node, topic partition reconfigurations, etc.).
The topic "sequence" has two partitions and a replication factor of 3. The cluster is made of 3 virtual nodes (it's for testing purposes, hence they are running on the same machine). The topic is configured to retain all messages (set to -1).
I currently have two issues that I have difficulty figuring out:
1. If I use bin/ --bootstrap-server kafka-test-server:9090,kafka-test-server:9091,kafka-test-server:9092 --topic sequence --from-beginning, I see ALL messages on the console (not ordered, as expected). On the other hand, if I use the consumer application that I wrote, I see different results loaded at each cycle. In the console output, the first line after the divider is the result of records.partitions(), so records are only sometimes pulled from both partitions. Why, and why does the Java app not behave like bin/?
2. When the topic gets too big, bin/ is still able to show all messages, while the application is able to load only about 18,000 messages. I have tried playing around with the various consumer-side configurations, with no progress. Again, the question is: why is there a difference?
Thank you in advance for any hint!
Here, for reference, are the two applications discussed:
import java.io.FileInputStream;
import java.util.Properties;
import java.util.concurrent.Future;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
public class SequenceProducer {
    public static void main(String[] args) throws Exception {
        Properties properties = new Properties();
        properties.load(new FileInputStream(""));
        properties.put("key.serializer", "org.apache.kafka.common.serialization.IntegerSerializer");
        properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        properties.put("acks", "1");
        properties.put("retries", "3");
        properties.put("compression.type", "snappy");
        properties.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, 1);
        for (Integer sequence_i = 0; true; sequence_i++) {
            try (Producer<Integer, String> producer = new KafkaProducer<>(properties)) {
                ProducerRecord<Integer, String> record = new ProducerRecord<>("sequence", sequence_i, "Sequence number: " + String.valueOf(sequence_i));
                Future<RecordMetadata> sendFuture = producer.send(record, (metadata, exception) -> {
                    System.out.println("Adding " + record.key() + " to partition " + metadata.partition());
                    if (exception != null) {
                        // exception handling not shown in the original
                    }
                });
            }
        }
    }
}
import java.time.Duration;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
public class CarthusianConsumer {
private static Properties getProperties() throws Exception {
Properties properties = new Properties();
properties.load(new FileInputStream(""));
properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, org.apache.kafka.common.serialization.IntegerDeserializer.class);
properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, org.apache.kafka.common.serialization.StringDeserializer.class);
properties.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, Integer.MAX_VALUE);
properties.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 60 * 1000);
properties.put(ConsumerConfig.GROUP_ID_CONFIG, "carthusian-consumer");
properties.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 60 * 1000);
properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
properties.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, 1024 * 1024 * 1024);
return properties;
private static boolean checkConsistency(List<Integer> sequence) {
    Iterator<Integer> iterator = sequence.iterator();
    int control = 0;
    while (iterator.hasNext()) {
        int value = iterator.next();
        if (value != control) {
            System.out.println("Gap found:");
            System.out.println("\tSequence: " + value);
            System.out.println("\tControl: " + control);
            return false;
        }
        control++; // advance the expected value (assumed; not present in the original listing)
    }
    return true;
}
public static void main(String[] args) throws Exception {
// Step 1: create a base consumer object
Consumer<Integer, String> consumer = new KafkaConsumer<>(getProperties());
// Step 2: load topic configuration and build list of TopicPartitons
List<TopicPartition> topicPartitions = consumer
.map(partitionInfo -> new TopicPartition(partitionInfo.topic(), partitionInfo.partition()))
while (true) {
List<Integer> sequence = new ArrayList<>();
for (TopicPartition topicPartition : topicPartitions) {
// Step 3. specify the topic-partition to "read" from
// System.out.println("Partition specified: " + topicPartition);
// Step 4. set offset at the beginning
// Step 5. get all records from topic-partition
ConsumerRecords<Integer, String> records = consumer.poll(Duration.ofMillis(Long.MAX_VALUE));
// System.out.println("\tCount: " + records.count());
// System.out.println("\tPartitions: " + records.partitions());
records.forEach(record -> { sequence.add(record.key()); });
Thank you Mickael-Maison, here is my answer:
On the producer: thank you for the comment. I admit I took the example from the book and modified it directly, without performance considerations.
On the consumer: as mentioned in the comments above, subscription was the first approach I attempted, and it unfortunately yielded the same result described in my question: results from individual partitions, and only rarely from both partitions in the same call. I'd also love to understand the reasons for this apparently random behavior!
More on the consumer: I rewind to the beginning of the topic at every cycle because the purpose is to continuously verify that the sequence did not break (hence no messages were lost). At every cycle I load all the messages and check them.
Because the single call based on topic subscription behaved in an apparently random way (it is unclear when the full content of the topic is returned), I had to read from each individual partition and join the lists of records manually before checking them, which is not what I wanted to do initially!
Are my approaches wrong?
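For what it's worth, the "join the lists and check" step does not depend on how the records arrive. Kafka only guarantees ordering within a partition, so keys from different partitions have to be merged and sorted before looking for a gap. A minimal sketch of that check, with a made-up function name and plain lists standing in for the records of each partition:

```python
def find_first_gap(per_partition_keys):
    """Merge keys from all partitions and return the first missing integer, or None.

    per_partition_keys is a list of lists, one list of integer keys per partition.
    Duplicates (for example from re-reading a partition) are ignored.
    """
    merged = sorted(set(key for keys in per_partition_keys for key in keys))
    for expected, actual in enumerate(merged):
        if actual != expected:
            return expected  # the sequence 0, 1, 2, ... breaks here
    return None
```

The same function works whether the keys were collected with one poll per partition or accumulated over many polls on a subscription, as long as everything has been read before the check runs.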
There are a few things you should change in your clients' logic.
You are creating a new producer for each record you send. This is terrible for performance, as each producer has to bootstrap before sending a record. Also, since each producer sends a single record, no batching happens. Finally, compression on a single record is practically nonexistent.
You should first create a Producer and use it to send all records, i.e. move the creation out of the loop, something like:
try (Producer<Integer, String> producer = new KafkaProducer<>(properties)) {
    for (int sequence_i = 18310; true; sequence_i++) {
        ProducerRecord<Integer, String> record = new ProducerRecord<>("sequence", sequence_i, "Sequence number: " + String.valueOf(sequence_i));
        producer.send(record, (metadata, exception) -> {
            System.out.println("Adding " + record.key() + " to partition " + metadata.partition());
            if (exception != null) {
                // exception handling not shown in the original
            }
        });
    }
}
At every iteration of the for loop, you change the assignment and seek back to the beginning of the partition, so at best you will reconsume the same messages every time!
To begin, you should probably use the subscribe() API, so you don't have to fiddle with partitions. For example:
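try (Consumer<Integer, String> consumer = new KafkaConsumer<>(properties)) {
    consumer.subscribe(Collections.singletonList("sequence")); // assumed; the call was missing from the listing
    while (true) {
        List<Integer> sequence = new ArrayList<>();
        ConsumerRecords<Integer, String> records = consumer.poll(Duration.ofSeconds(1L));
        records.forEach(record -> {
            sequence.add(record.key()); // body taken from the question's code
        });
    }
}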

Creating kafka stream API for JSON's

I am trying to write Kafka Streams code that converts a JSON array into individual JSON elements. Since I am new to Kafka Streams, can anyone help me write the code, for example what should go in the KStream and KTable?
My input stream will be in the following format:
and my output must be in the form:
Can anyone help me write the code?
If you want to use Kafka Streams, you can use a flatMap(). Something like
// using new 1.0 API
StreamsBuilder builder = new StreamsBuilder();
builder.stream("topic").flatMap(...).to("output-topic");
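What the function passed to flatMap() has to do is independent of Kafka: take one message holding a JSON array and return one output message per element. A rough sketch of just that transformation in Python, assuming each message value is a JSON array encoded as a string (the function name is made up for illustration):

```python
import json


def flat_map_json_array(value):
    """Split one JSON-array message into one JSON string per element."""
    return [json.dumps(element) for element in json.loads(value)]
```

For example, flat_map_json_array('[{"id": 1}, {"id": 2}]') yields two strings, one per object, each of which would become its own record on the output topic.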
Check out the examples and docs for more details:
In Python:
from kafka import KafkaConsumer
consumer = KafkaConsumer('topicName')
for message in consumer:
    print(message)  # loop body missing in the original; printing is a placeholder
Specify the bootstrap_servers parameter in KafkaConsumer.
For Java, look at CloudKarafka; it is really good:
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records)
        System.out.printf("msg = %s\n", record.value());
}

Kafka Consumer not getting invoked when the kafka Producer is set to Sync

I have a requirement to maintain 2 topics: one with a synchronous approach and the other with an asynchronous one.
The asynchronous one works as expected and the consumer is invoked; with the synchronous approach, however, the consumer code is not invoked.
Below is the code declared in the config file:
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9093");
props.put(ProducerConfig.RETRIES_CONFIG, 3);
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.LINGER_MS_CONFIG, 1);
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 33554432);
I have enabled autoFlush (true) here:
@Bean(name="KafkaPayloadSyncTemplate")
public KafkaTemplate<String, KafkaPayload> KafkaPayloadSyncTemplate() {
    return new KafkaTemplate<String,KafkaPayload>(producerFactory(),true);
}
Control stops after the recordMetadataResults object is returned; no calls are made to the consumer:
private List<RecordMetadata> sendPayloadToKafkaTopicInSync() throws InterruptedException, ExecutionException {
    final List<RecordMetadata> recordMetadataResults = new ArrayList<RecordMetadata>();
    KafkaPayload kafkaPayload = constructKafkaPayload();
    future = KafkaPayloadSyncTemplate.send(TestTopic, kafkaPayload);
    SendResult<String, KafkaPayload> results;
    results = future.get();
    return recordMetadataResults;
}
Consumer code:
public class KafkaTestListener {
    TestServiceImpl TestServiceImpl;
    public final CountDownLatch countDownLatch = new CountDownLatch(1);
    @KafkaListener(id="POC", topics = "TestTopic", group = "TestGroup")
    public void listen(ConsumerRecord<String,KafkaPayload> record, Acknowledgment acknowledgment) {
        System.out.println("Acknowledgment : " + acknowledgment);
    }
}
Based on this issue, I have 2 questions:
Should we manually call listen() inside the listener class when it's a sync producer? If yes, how do we do that?
If the listener (@KafkaListener) gets called automatically, what other setup or configuration do I need to add to make this work?
Thanks in advance for your input.
Make sure that you use consumerProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); in the consumer properties.
I'm not sure what you mean by sync/async, but producing and consuming are completely separate operations. You can't affect the consumer from the producer side, because the Kafka broker sits in between.