Finish Flink Program After end of all Kafka Consumers - apache-kafka

I am setting up a minimal example here, where I have N streams (100 in the example below) from N Kafka topics.
I want to finish each stream when it sees an "EndofStream" message.
When all streams are finished, I expect the Flink program to finish gracefully.
This is true when parallelism is set to 1, but does not happen in general.
From another question, it seems that not all threads of the Kafka consumer group end.
Others have suggested throwing an exception. However, the program would then terminate at the first exception and would not wait for all the streams to finish.
I am also adding a minimal Python program that adds messages to the Kafka topics, for reproducibility. Please fill in the <IP>:<PORT> in each program.
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

String outputPath = "file://" + System.getProperty("user.dir") + "/out/output";

Properties kafkaProps = new Properties();
String brokers = "<IP>:<PORT>";
kafkaProps.setProperty("bootstrap.servers", brokers);
kafkaProps.setProperty("auto.offset.reset", "earliest");

ArrayList<FlinkKafkaConsumer<String>> consumersList = new ArrayList<FlinkKafkaConsumer<String>>();
ArrayList<DataStream<String>> streamList = new ArrayList<DataStream<String>>();

for (int i = 0; i < 100; i++) {
    consumersList.add(new FlinkKafkaConsumer<String>(Integer.toString(i),
            new SimpleStringSchema() {
                @Override
                public boolean isEndOfStream(String nextElement) {
                    if (nextElement.contains("EndofStream")) {
                        // throw new RuntimeException("End of Stream");
                        return true;
                    } else {
                        return false;
                    }
                }
            }, kafkaProps));
    consumersList.get(i).setStartFromEarliest();

    streamList.add(env.addSource(consumersList.get(i)));
    streamList.get(i).writeAsText(outputPath + Integer.toString(i), WriteMode.OVERWRITE);
}
// execute program
env.execute("Flink Streaming Java API Skeleton");
Python 3 Program
from kafka import KafkaProducer
producer = KafkaProducer(bootstrap_servers='<IP>:<PORT>')
for i in range(100):  # Channel Number
    for j in range(100):  # Message Number
        message = "Message: " + str(j) + " going on channel: " + str(i)
        producer.send(str(i), str.encode(message))
    message = "EndofStream on channel: " + str(i)
    producer.send(str(i), str.encode(message))
producer.flush()
Changing the line streamList.add(env.addSource(consumersList.get(i))); to streamList.add(env.addSource(consumersList.get(i)).setParallelism(1)); also does the job, but then Flink places all the consumers on the same physical machine.
I would like the consumers to be distributed as well.
flink-conf.yaml
parallelism.default: 2
cluster.evenly-spread-out-slots: true
As a last resort, I could write each topic into a separate file and use the files as sources instead of the Kafka consumers.
The end goal is to measure how much time Flink takes to process certain workloads for certain programs.

Use the cancel method from FlinkKafkaConsumerBase, which is the parent class of FlinkKafkaConsumer.
public void cancel()
Description copied from interface: SourceFunction
Cancels the source. Most sources will have a while loop inside the SourceFunction.run(SourceContext) method. The implementation needs to ensure that the source will break out of that loop after this method is called. A typical pattern is to have a "volatile boolean isRunning" flag that is set to false in this method. That flag is checked in the loop condition.
When a source is canceled, the executing thread will also be interrupted (via Thread.interrupt()). The interruption happens strictly after this method has been called, so any interruption handler can rely on the fact that this method has completed. It is good practice to make any flags altered by this method "volatile", in order to guarantee the visibility of the effects of this method to any interruption handler.
Specified by: cancel in interface SourceFunction
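For reference, the pattern that javadoc describes looks roughly like this as a minimal sketch with a hypothetical custom source (the class name, element values and termination condition are illustrative, not the Kafka consumer itself):

import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class BoundedSource implements SourceFunction<String> {
    private static final long serialVersionUID = 1L;

    // volatile so the change made by cancel() (called from another thread)
    // is visible to the thread executing run()
    private volatile boolean isRunning = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        long counter = 0;
        while (isRunning) {
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect("element-" + counter++);
            }
            // the source also finishes when run() returns, e.g. on an "EndofStream" marker
            if (counter >= 1_000) {
                isRunning = false;
            }
        }
    }

    @Override
    public void cancel() {
        isRunning = false;
    }
}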
You are right, it is necessary to use the SimpleStringSchema. This was based on this answer https://stackoverflow.com/a/44247452/2096986. Take a look at this example. First I sent the string "Flink code we saw also works in a cluster" and the Kafka consumer consumed the message. Then I sent "SHUTDOWNDDDDDDD", which also has no effect on finishing the stream. Finally, I sent "SHUTDOWN" and the stream job was finalized. See the logs below the program.
package org.sense.flink.examples.stream.kafka;

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class KafkaConsumerQuery {

    public KafkaConsumerQuery() throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "localhost:9092");
        properties.setProperty("group.id", "test");

        FlinkKafkaConsumer<String> myConsumer = new FlinkKafkaConsumer<>(java.util.regex.Pattern.compile("test"),
                new MySimpleStringSchema(), properties);

        DataStream<String> stream = env.addSource(myConsumer);
        stream.print();

        System.out.println("Execution plan >>>\n" + env.getExecutionPlan());
        env.execute(KafkaConsumerQuery.class.getSimpleName());
    }

    private static class MySimpleStringSchema extends SimpleStringSchema {
        private static final long serialVersionUID = 1L;
        private final String SHUTDOWN = "SHUTDOWN";

        @Override
        public String deserialize(byte[] message) {
            return super.deserialize(message);
        }

        @Override
        public boolean isEndOfStream(String nextElement) {
            if (SHUTDOWN.equalsIgnoreCase(nextElement)) {
                return true;
            }
            return super.isEndOfStream(nextElement);
        }
    }

    public static void main(String[] args) throws Exception {
        new KafkaConsumerQuery();
    }
}
Logs:
2020-07-02 16:39:59,025 INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-8, groupId=test] Discovered group coordinator localhost:9092 (id: 2147483647 rack: null)
3> Flink code we saw also works in a cluster. To run this code in a cluster
3> SHUTDOWNDDDDDDD
2020-07-02 16:40:27,973 INFO org.apache.flink.runtime.taskmanager.Task - Source: Custom Source -> Sink: Print to Std. Out (3/4) (5f47c2b3f55c5eb558484d49fb1fcf0e) switched from RUNNING to FINISHED.
2020-07-02 16:40:27,973 INFO org.apache.flink.runtime.taskmanager.Task - Freeing task resources for Source: Custom Source -> Sink: Print to Std. Out (3/4) (5f47c2b3f55c5eb558484d49fb1fcf0e).
2020-07-02 16:40:27,974 INFO org.apache.flink.runtime.taskmanager.Task - Ensuring all FileSystem streams are closed for task Source: Custom Source -> Sink: Print to Std. Out (3/4) (5f47c2b3f55c5eb558484d49fb1fcf0e) [FINISHED]
2020-07-02 16:40:27,975 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and sending final execution state FINISHED to JobManager for task Source: Custom Source -> Sink: Print to Std. Out (3/4) 5f47c2b3f55c5eb558484d49fb1fcf0e.
2020-07-02 16:40:27,979 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source -> Sink: Print to Std. Out (3/4) (5f47c2b3f55c5eb558484d49fb1fcf0e) switched from RUNNING to FINISHED.

Related

Kafka Consumer Poll runs indefinitely and doesn't return anything

I am facing difficulty with KafkaConsumer.poll(Duration timeout): it runs indefinitely and never comes out of the method. I understand this could be related to the connection, and I have seen it behave a bit inconsistently at times. How do I handle the case where poll stops responding? Given below is the snippet from KafkaConsumer.poll()
public ConsumerRecords<K, V> poll(final Duration timeout) {
return poll(time.timer(timeout), true);
}
and I am calling the above from here :
Duration timeout = Duration.ofSeconds(30);
while (true) {
final ConsumerRecords<recordID, topicName> records = consumer.poll(timeout);
System.out.println("record count is" + records.count());
}
I am getting the below error:
org.apache.kafka.common.errors.SerializationException: Error
deserializing key/value for partition at offset 2. If
needed, please seek past the record to continue consumption.
I stumbled upon some useful information while trying to fix the problem I was facing above. I will provide the piece of code that should be able to handle this, but before that it is important to know what causes it.
When producing or consuming messages to or from Apache Kafka, we need a schema structure for that message or data; in my case, an Avro schema. If a message produced to Kafka conflicts with that schema, consumption will be affected.
Add the code below to your consumer, in the method where it consumes records.
Remember to import the following packages:
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.SerializationException;
try {
    while (true) {
        ConsumerRecords<String, GenericRecord> records = null;
        try {
            records = consumer.poll(10000);
        } catch (SerializationException e) {
            String s = e.getMessage().split("Error deserializing key/value for partition ")[1]
                    .split(". If needed, please seek past the record to continue consumption.")[0];
            String topics = s.split("-")[0];
            int offset = Integer.valueOf(s.split("offset ")[1]);
            int partition = Integer.valueOf(s.split("-")[1].split(" at")[0]);

            TopicPartition topicPartition = new TopicPartition(topics, partition);
            // log.info("Skipping " + topics + "-" + partition + " offset " + offset);
            consumer.seek(topicPartition, offset + 1);
            continue; // skip to the next poll after seeking past the bad record
        }

        for (ConsumerRecord<String, GenericRecord> record : records) {
            System.out.printf("value = %s \n", record.value());
        }
    }
} finally {
    consumer.close();
}
I ran into this while setting up a test environment.
Running the following command on the broker printed out the stored records as one would expect:
bin/kafka-console-consumer.sh --bootstrap-server="localhost:9092" --topic="foo" --from-beginning
It turned out that the Kafka server was misconfigured. To connect from an external
IP address, listeners must have a valid value in kafka/config/server.properties, e.g.
# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
# FORMAT:
# listeners = listener_name://host_name:port
# EXAMPLE:
# listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://:9092
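For completeness, a common setup when clients connect from outside the broker's host is to bind on all interfaces and advertise the externally reachable address (a sketch only; <EXTERNAL-IP> is a placeholder for your broker's reachable address):

# bind on all interfaces, but advertise the address that external clients can actually reach
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://<EXTERNAL-IP>:9092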

Python librdkafka producer performance against the native Apache Kafka producer

I am testing the Apache Kafka producer with the native Java implementation against Python's confluent-kafka to see which has the maximum throughput.
I am deploying a Kafka cluster with 3 Kafka brokers and 3 ZooKeeper instances using docker-compose. My docker-compose file: https://paste.fedoraproject.org/paste/bn7rr2~YRuIihZ06O3Q6vw/raw
It's a very simple piece of code with mostly default options for Python confluent-kafka and some config changes in the Java producer to match those of confluent-kafka.
Python Code:
from confluent_kafka import Producer
producer = Producer({'bootstrap.servers': 'kafka-1:19092,kafka-2:29092,kafka-3:39092', 'linger.ms': 300, "max.in.flight.requests.per.connection": 1000000, "queue.buffering.max.kbytes": 1048576, "message.max.bytes": 1000000,
'default.topic.config': {'acks': "all"}})
ss = '0123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357'
def f():
    import time
    start = time.time()
    for i in xrange(1000000):
        try:
            producer.produce('test-topic', ss)
        except Exception:
            producer.poll(1)
            try:
                producer.produce('test-topic', ss)
            except Exception:
                producer.flush(30)
                producer.produce('test-topic', ss)
        producer.poll(0)
    producer.flush(30)
    print(time.time() - start)


if __name__ == '__main__':
    f()
Java implementation. The configuration is the same as the config in librdkafka. I changed linger.ms and the callback as suggested by Edenhill.
package com.amit.kafka;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.LongSerializer;
import org.apache.kafka.common.serialization.StringSerializer;
import java.nio.charset.Charset;
import java.util.Properties;
import java.util.concurrent.TimeUnit;
public class KafkaProducerExampleAsync {
private final static String TOPIC = "test-topic";
private final static String BOOTSTRAP_SERVERS = "kafka-1:19092,kafka-2:29092,kafka-3:39092";
private static Producer<String, String> createProducer() {
int bufferMemory = 67108864;
int batchSizeBytes = 1000000;
String acks = "all";
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_SERVERS);
props.put(ProducerConfig.CLIENT_ID_CONFIG, "KafkaExampleProducer");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, LongSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.BATCH_SIZE_CONFIG, batchSizeBytes);
props.put(ProducerConfig.LINGER_MS_CONFIG, 100);
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, bufferMemory);
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1000000);
props.put(ProducerConfig.ACKS_CONFIG, acks);
return new KafkaProducer<>(props);
}
static void runProducer(final int sendMessageCount) throws InterruptedException {
final Producer<String, String> producer = createProducer();
final String msg = "0123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357";
final ProducerRecord<String, String> record = new ProducerRecord<>(TOPIC, msg);
final long[] new_time = new long[1];
try {
for (long index = 0; index < sendMessageCount; index++) {
producer.send(record, new Callback() {
public void onCompletion(RecordMetadata metadata, Exception e) {
// This if-else is to only start timing this when first message reach kafka
if(e != null) {
e.printStackTrace();
} else {
if (new_time[0] == 0) {
new_time[0] = System.currentTimeMillis();
}
}
}
});
}
} finally {
// producer.flush();
producer.close();
System.out.printf("Total time %d ms\n", System.currentTimeMillis() - new_time[0]);
}
}
public static void main(String... args) throws Exception {
if (args.length == 0) {
runProducer(1000000);
} else {
runProducer(Integer.parseInt(args[0]));
}
}
}
Benchmark results (edited after making the changes recommended by Edenhill)
Acks = 0, Messages: 1000000
Java: 12.066 seconds
Python: 9.608 seconds
Acks = all, Messages: 1000000
Java: 11.917 seconds (45.763 seconds before the changes)
Python: 14.3029 seconds
The Java implementation performs about the same as the Python implementation, even after making all the changes I could think of and the ones suggested by Edenhill in the comments below.
There are various benchmarks of Kafka's performance in Python, but I couldn't find any comparing librdkafka or python-kafka against the Apache Kafka Java client.
I have two questions:
Is this test enough to conclude that, with default configs and a message size of 1 KB, librdkafka is faster?
Does anyone have experience or a source (blog, doc, etc.) benchmarking librdkafka against confluent-kafka?
The Python client uses librdkafka, which overrides some of Kafka's default configuration.
Parameter = Kafka default
batch.size = 16384
max.in.flight.requests.per.connection = 5 (librdkafka's default is 1000000)
message.max.bytes in librdkafka may be equivalent to max.request.size.
I think there is no equivalent of librdkafka's queue.buffering.max.messages in Kafka's producer API. If you find something, correct me.
Also, remove the buffer.memory parameter from the Java program.
https://kafka.apache.org/documentation/#producerconfigs
https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md
The next thing is that Java takes some time to load classes, so you need to increase the number of messages your producer produces. It would be great if it took at least 20-30 minutes to produce all the messages; then you can compare the Java client with the Python client.
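As a rough illustration of that warm-up idea (a sketch only, reusing createProducer() and TOPIC from the question's code; warmupCount and measuredCount are illustrative parameters), you could exclude an initial batch from the timing:

static void runWarmedUpProducer(int warmupCount, int measuredCount) {
    final Producer<String, String> producer = createProducer();
    final ProducerRecord<String, String> record = new ProducerRecord<>(TOPIC, "payload");

    // warm-up phase: class loading and JIT compilation happen here, not in the measurement
    for (int i = 0; i < warmupCount; i++) {
        producer.send(record);
    }
    producer.flush();

    // measured phase
    long start = System.currentTimeMillis();
    for (int i = 0; i < measuredCount; i++) {
        producer.send(record);
    }
    producer.flush();
    System.out.printf("Measured time: %d ms%n", System.currentTimeMillis() - start);

    producer.close();
}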
I like the idea of a comparison between the Python and Java clients. Keep posting your results on Stack Overflow.

Kafka Streams exactly-once delivery

My goal is to consume from topic A, do some processing and produce to topic B, as a single atomic action. To achieve this I see two options:
Use a spring-kafka @KafkaListener and a KafkaTemplate as described here.
Use Streams eos (exactly-once) functionality.
I have successfully verified option 1. By successfully, I mean that if my processing fails (an IllegalArgumentException is thrown), the consumed message from topic A keeps being re-consumed by the KafkaListener. This is what I expect, as the offset is not committed and the DefaultAfterRollbackProcessor is used.
I expect to see the same behaviour if, instead of a KafkaListener, I use a stream for consuming from topic A, processing, and sending to topic B (option 2). But even though an IllegalArgumentException is thrown during processing, the message is only consumed once by the stream. Is this the expected behaviour?
In the Streams case the only configuration I have is the following:
@Configuration
@EnableKafkaStreams
public class KafkaStreamsConfiguration {

    @Bean(name = KafkaStreamsDefaultConfiguration.DEFAULT_STREAMS_CONFIG_BEAN_NAME)
    public StreamsConfig kStreamsConfigs() {
        Map<String, Object> props = new HashMap<>();
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "http://localhost:9092");
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "calculate-tax-sender-invoice-stream");
        props.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "http://localhost:8082");
        // this should be enough to enable transactions
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
        return new StreamsConfig(props);
    }
}

// required to create and start a new KafkaStreams, as when an exception is thrown the stream dies
// see here: https://docs.spring.io/spring-kafka/reference/html/_reference.html#after-rollback
@Bean(name = KafkaStreamsDefaultConfiguration.DEFAULT_STREAMS_BUILDER_BEAN_NAME)
public StreamsBuilderFactoryBean myKStreamBuilder(StreamsConfig streamsConfig) {
    StreamsBuilderFactoryBean streamsBuilderFactoryBean = new StreamsBuilderFactoryBean(streamsConfig);
    streamsBuilderFactoryBean.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
        @Override
        public void uncaughtException(Thread t, Throwable e) {
            log.debug("StopStartStreamsUncaughtExceptionHandler caught exception {}, stopping StreamsThread ..", e);
            streamsBuilderFactoryBean.stop();
            log.debug("creating and starting a new StreamsThread ..");
            streamsBuilderFactoryBean.start();
        }
    });
    return streamsBuilderFactoryBean;
}
My Stream is like this:
@Autowired
public SpecificAvroSerde<InvoiceEvents> eventSerde;

@Autowired
private TaxService taxService;

@Bean
public KStream<String, InvoiceEvents> kStream(StreamsBuilder builder) {
    KStream<String, InvoiceEvents> kStream = builder.stream("A",
            Consumed.with(Serdes.String(), eventSerde));

    kStream
        .mapValues(v -> {
            // get tax from possibly remote service
            // an IllegalArgumentException("Tax calculation failed") is thrown by getTaxForInvoice()
            int tax = taxService.getTaxForInvoice(v);
            // create a TaxCalculated event
            InvoiceEvents taxCalculatedEvent = InvoiceEvents.newBuilder().setType(InvoiceEvent.TaxCalculated).setTax(tax).build();
            log.debug("Generating TaxCalculated event: {}", taxCalculatedEvent);
            return taxCalculatedEvent;
        })
        .to("B", Produced.with(Serdes.String(), eventSerde));

    return kStream;
}
The happy-path streams scenario works: if no exception is thrown during processing, the message appears properly in topic B.
My unit test:
@Test
public void calculateTaxForInvoiceTaxCalculationFailed() throws Exception {
    log.debug("running test calculateTaxForInvoiceTaxCalculationFailed..");
    Mockito.when(taxService.getTaxForInvoice(any(InvoiceEvents.class)))
            .thenThrow(new IllegalArgumentException("Tax calculation failed"));

    InvoiceEvents invoiceCreatedEvent = createInvoiceCreatedEvent();
    List<KeyValue<String, InvoiceEvents>> inputEvents = Arrays.asList(
            new KeyValue<String, InvoiceEvents>("A", invoiceCreatedEvent));

    Properties producerConfig = new Properties();
    producerConfig.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "http://localhost:9092");
    producerConfig.put(ProducerConfig.ACKS_CONFIG, "all");
    producerConfig.put(ProducerConfig.RETRIES_CONFIG, 1);
    producerConfig.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    producerConfig.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class.getName());
    producerConfig.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "http://localhost:8082");
    producerConfig.put(ProducerConfig.CLIENT_ID_CONFIG, "unit-test-producer");

    // produce with key
    IntegrationTestUtils.produceKeyValuesSynchronously("A", inputEvents, producerConfig);

    // wait for 30 seconds - I should observe re-consumptions of invoiceCreatedEvent, but I do not
    Thread.sleep(30000);
    // ...
}
Update:
In my unit test I sent 50 invoiceEvents (orderId=1,...,50), processed them, and sent them to a destination topic.
In my logs the behaviour I see is as follows:
invoiceEvent.orderId = 43 → consumed and successfully processed
invoiceEvent.orderId = 44 → consumed and IllegalArgumentException thrown
..new stream starts..
invoiceEvent.orderId = 44 → consumed and successfully processed
invoiceEvent.orderId = 45 → consumed and successfully processed
invoiceEvent.orderId = 46 → consumed and successfully processed
invoiceEvent.orderId = 47 → consumed and successfully processed
invoiceEvent.orderId = 48 → consumed and successfully processed
invoiceEvent.orderId = 49 → consumed and successfully processed
invoiceEvent.orderId = 50 → consumed and IllegalArgumentException thrown
...
[29-0_0-producer] task [0_0] Error sending record (key A value {"type": ..., "payload": {"id": "46", ... }}} timestamp 1529583666036) to topic invoice-with-tax.t due to {}; No more records will be sent and no more offsets will be recorded for this task.
..new stream starts..
invoiceEvent.orderId = 46 → consumed and successfully processed
invoiceEvent.orderId = 47 → consumed and successfully processed
invoiceEvent.orderId = 48 → consumed and successfully processed
invoiceEvent.orderId = 49 → consumed and successfully processed
invoiceEvent.orderId = 50 → consumed and successfully processed
Why after the 2nd failure, it re-consumes from invoiceEvent.orderId = 46?
The key points to get option 2 (Streams transactions) working are:
Assign a Thread.UncaughtExceptionHandler() so that you start a new StreamThread in case of any uncaught exception (by default the StreamThread dies - see the code snippet that follows). This can even happen if production to the Kafka broker fails; it does not have to be related to your business logic code in the stream.
Consider setting a policy for handling de-serialization of messages (when you consume). Check DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG (javadoc). For example, should you ignore the record and consume the next message, or stop consuming from the relevant Kafka partition? A small sketch follows after the configuration code below.
In the case of Streams, even if you set MAX_POLL_RECORDS_CONFIG=1 (one record per poll/batch), consumed offsets and produced messages are still not committed per message. That leads to cases like the one described in the question (see "Why after the 2nd failure, it re-consumes from invoiceEvent.orderId = 46?").
Kafka transactions simply do not work on Windows yet. The fix will be delivered in Kafka 1.1.1 (https://issues.apache.org/jira/browse/KAFKA-6052).
Consider checking how you handle serialisation exceptions (or, in general, exceptions during production) (here and here)
@Configuration
@EnableKafkaStreams
public class KafkaStreamsConfiguration {

    @Bean(name = KafkaStreamsDefaultConfiguration.DEFAULT_STREAMS_CONFIG_BEAN_NAME)
    public StreamsConfig kStreamsConfigs() {
        Map<String, Object> props = new HashMap<>();
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "http://localhost:9092");
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "blabla");
        props.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "http://localhost:8082");
        // this should be enough to enable transactions
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
        return new StreamsConfig(props);
    }
}

@Bean(name = KafkaStreamsDefaultConfiguration.DEFAULT_STREAMS_BUILDER_BEAN_NAME)
public StreamsBuilderFactoryBean myKStreamBuilder(StreamsConfig streamsConfig) {
    StreamsBuilderFactoryBean streamsBuilderFactoryBean = new StreamsBuilderFactoryBean(streamsConfig);
    streamsBuilderFactoryBean.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
        @Override
        public void uncaughtException(Thread t, Throwable e) {
            log.debug("StopStartStreamsUncaughtExceptionHandler caught exception {}, stopping StreamsThread ..", e);
            streamsBuilderFactoryBean.stop();
            log.debug("creating and starting a new StreamsThread ..");
            streamsBuilderFactoryBean.start();
        }
    });
    return streamsBuilderFactoryBean;
}
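For the second point above, a minimal sketch: the handler class is configurable through the same props map built in kStreamsConfigs(). LogAndContinueExceptionHandler skips records that cannot be deserialized, while the default LogAndFailExceptionHandler stops processing; which one is appropriate depends on your use case.

import org.apache.kafka.streams.errors.LogAndContinueExceptionHandler;

// added to the props map in kStreamsConfigs() above
props.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,
        LogAndContinueExceptionHandler.class);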

How to check if Kafka Consumer is ready

I have the Kafka commit policy set to latest and I am missing the first few messages. If I add a sleep of 20 seconds before starting to send messages to the input topic, everything works as desired. I am not sure if the problem is the consumer taking a long time for partition rebalancing. Is there a way to know if the consumer is ready before starting to poll?
You can use consumer.assignment(): it returns the set of assigned partitions, so you can verify that all of the partitions available for that topic have been assigned.
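A rough sketch of that idea with the plain consumer API (topic name and poll timeout are placeholders): assignment() only becomes non-empty after poll() has joined the group and received partitions, so poll in a loop until it does.

import java.time.Duration;
import java.util.Collections;
import org.apache.kafka.clients.consumer.KafkaConsumer;

static void waitUntilAssigned(KafkaConsumer<String, String> consumer, String topic) {
    consumer.subscribe(Collections.singletonList(topic));
    // assignment() stays empty until poll() has actually joined the group
    while (consumer.assignment().isEmpty()) {
        consumer.poll(Duration.ofMillis(100));
    }
    // the consumer now has partitions assigned and is ready to receive records
}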
If you are using the spring-kafka project, you can include the spring-kafka-test dependency and use the method below to wait for topic assignment, but you need to have the container.
ContainerTestUtils.waitForAssignment(Object container, int partitions);
You can do the following:
I have a test that reads data from a Kafka topic.
You can't use a KafkaConsumer in a multithreaded environment, but you can pass an "AtomicReference assignment" parameter, update it in the consumer thread, and read it in another thread.
For example, here is a snippet of working code from a project, used for testing:
private void readAvro(String readFromKafka,
AtomicBoolean needStop,
List<Event> events,
String bootstrapServers,
int readTimeout) {
// print the topic name
AtomicReference<Set<TopicPartition>> assignment = new AtomicReference<>();
new Thread(() -> readAvro(bootstrapServers, readFromKafka, needStop, events, readTimeout, assignment)).start();
long startTime = System.currentTimeMillis();
long maxWaitingTime = 30_000;
for (long time = System.currentTimeMillis(); System.currentTimeMillis() - time < maxWaitingTime;) {
Set<TopicPartition> assignments = Optional.ofNullable(assignment.get()).orElse(new HashSet<>());
System.out.println("[!kafka-consumer!] Assignments [" + assignments.size() + "]: "
+ assignments.stream().map(v -> String.valueOf(v.partition())).collect(Collectors.joining(",")));
if (assignments.size() > 0) {
break;
}
try {
Thread.sleep(1_000);
} catch (InterruptedException e) {
e.printStackTrace();
needStop.set(true);
break;
}
}
System.out.println("Subscribed! Wait summary: " + (System.currentTimeMillis() - startTime));
}
private void readAvro(String bootstrapServers,
String readFromKafka,
AtomicBoolean needStop,
List<Event> events,
int readTimeout,
AtomicReference<Set<TopicPartition>> assignment) {
KafkaConsumer<String, byte[]> consumer = (KafkaConsumer<String, byte[]>) queueKafkaConsumer(bootstrapServers, "latest");
System.out.println("Subscribed to topic: " + readFromKafka);
consumer.subscribe(Collections.singletonList(readFromKafka));
long started = System.currentTimeMillis();
while (!needStop.get()) {
assignment.set(consumer.assignment());
ConsumerRecords<String, byte[]> records = consumer.poll(1_000);
events.addAll(CommonUtils4Tst.readEvents(records));
if (readTimeout == -1) {
if (events.size() > 0) {
break;
}
} else if (System.currentTimeMillis() - started > readTimeout) {
break;
}
}
needStop.set(true);
synchronized (MainTest.class) {
MainTest.class.notifyAll();
}
consumer.close();
}
P.S.
needStop - a global flag to stop all running threads, if any, in case of failure or success
events - the list of objects that I want to check
readTimeout - how long we will wait to read all the data; if readTimeout == -1, stop as soon as we have read anything
Thanks to Alexey (I have also upvoted his answer); I seem to have resolved my issue by essentially following the same idea.
I just want to share my experience. In our case we use Kafka in a request/response way, somewhat like RPC: a request is sent on one topic and then we wait for the response on another topic. We ran into a similar issue, i.e. missing the first response.
I tried calling KafkaConsumer.assignment(); repeatedly (with Thread.sleep(100);), but it did not seem to help. Adding a KafkaConsumer.poll(50); seems to have primed the consumer (group), and the first response is received as well. Tested a few times and it is consistently working now.
BTW, testing requires stopping the application and deleting the Kafka topics and, for good measure, restarting Kafka too.
PS: Just calling poll(50); without the assignment() fetching logic, as Alexey mentioned, may not guarantee that the consumer (group) is ready.
You can modify an AlwaysSeekToEndListener (which listens only to new messages) to include a callback:
public class AlwaysSeekToEndListener<K, V> implements ConsumerRebalanceListener {

    private final Consumer<K, V> consumer;
    private Runnable callback;

    public AlwaysSeekToEndListener(Consumer<K, V> consumer) {
        this.consumer = consumer;
    }

    public AlwaysSeekToEndListener(Consumer<K, V> consumer, Runnable callback) {
        this.consumer = consumer;
        this.callback = callback;
    }

    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        consumer.seekToEnd(partitions);
        if (callback != null) {
            callback.run();
        }
    }
}
and subscribe with a latch callback:
CountDownLatch initLatch = new CountDownLatch(1);
consumer.subscribe(singletonList(topic), new AlwaysSeekToEndListener<>(consumer, () -> initLatch.countDown()));
initLatch.await(); // blocks until consumer is ready and listening
then proceed to start your producer.
If your policy is set to latest (which only takes effect when there are no previously committed offsets) and you indeed have no previously committed offsets, then you should not worry about 'missing' messages, because you are telling Kafka not to care about messages that were sent 'previously' to your consumers being ready.
If you care about 'previous' messages, you should set the policy to earliest.
In any case, whatever the policy, the behaviour you are seeing is transient, i.e. once committed offsets are saved in Kafka, on every restart the consumers will pick up where they previously left off.
I needed to know if a Kafka consumer was ready before doing some testing, so I tried consumer.assignment(), but it only returned the set of partitions assigned. There was a problem: with this I could not see whether the partitions assigned to the group had offsets set, so later, when I tried to use the consumer, the offsets were not set correctly.
The solution was to use committed(); it gives you the last committed offsets of the partitions that you pass as arguments.
So you can do something like: consumer.committed(consumer.assignment())
If no partitions are assigned yet, it will return:
{}
If partitions are assigned but have no offsets yet:
{name.of.topic-0=null, name.of.topic-1=null}
But if there are partitions and offsets:
{name.of.topic-0=OffsetAndMetadata{offset=5197881, leaderEpoch=null, metadata=''}, name.of.topic-1=OffsetAndMetadata{offset=5198832, leaderEpoch=null, metadata=''}}
With this information you can use something like:
consumer.committed(consumer.assignment()).isEmpty();
consumer.committed(consumer.assignment()).containsValue(null);
And with this information you can be sure that the Kafka consumer is ready.
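Putting that together, a small readiness check might look like this (just a sketch; "ready" here means every assigned partition has a committed offset, matching the outputs above):

import java.util.Map;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

static boolean isReady(KafkaConsumer<?, ?> consumer) {
    Map<TopicPartition, OffsetAndMetadata> committed = consumer.committed(consumer.assignment());
    if (committed.isEmpty()) {
        return false;                           // no partitions assigned yet
    }
    return !committed.containsValue(null);      // null means a partition has no committed offset yet
}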

Kafka Consumer not getting invoked when the Kafka Producer is set to Sync

I have a requirement where 2 topics have to be maintained: one with a synchronous approach and the other in an asynchronous way.
The asynchronous one works as expected, invoking the consumer record; however, in the synchronous approach the consumer code is not getting invoked.
Below is the code declared in the config file
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9093");
props.put(ProducerConfig.RETRIES_CONFIG, 3);
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.LINGER_MS_CONFIG, 1);
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 33554432);
I have enabled autoFlush (true) here:
@Bean(name = "KafkaPayloadSyncTemplate")
public KafkaTemplate<String, KafkaPayload> KafkaPayloadSyncTemplate() {
    return new KafkaTemplate<String, KafkaPayload>(producerFactory(), true);
}
The control stops there, not making any calls to the consumer after returning the recordMetadataResults object:
private List<RecordMetadata> sendPayloadToKafkaTopicInSync() throws InterruptedException, ExecutionException {
final List<RecordMetadata> recordMetadataResults = new ArrayList<RecordMetadata>();
KafkaPayload kafkaPayload = constructKafkaPayload();
ListenableFuture<SendResult<String,KafkaPayload>>
future = KafkaPayloadSyncTemplate.send(TestTopic, kafkaPayload);
SendResult<String, KafkaPayload> results;
results = future.get();
recordMetadataResults.add(results.getRecordMetadata());
return recordMetadataResults;
}
Consumer Code
public class KafkaTestListener {

    @Autowired
    TestServiceImpl TestServiceImpl;

    public final CountDownLatch countDownLatch = new CountDownLatch(1);

    @KafkaListener(id = "POC", topics = "TestTopic", group = "TestGroup")
    public void listen(ConsumerRecord<String, KafkaPayload> record, Acknowledgment acknowledgment) {
        countDownLatch.countDown();
        TestServiceImpl.consumeKafkaMessage(record);
        System.out.println("Acknowledgment : " + acknowledgment);
        acknowledgment.acknowledge();
    }
}
Based on the issue, I have 2 questions:
Should we manually call listen() inside the listener class when it is a sync producer? If yes, how do we do that?
If the listener (@KafkaListener) gets called automatically, what other setup/configuration do I need to add to make this work?
Thanks for the inputs in advance
-Srikant
You should be sure that you use consumerProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); in the consumer properties.
I am not sure what you mean by sync/async, but produce and consume are fully decoupled operations, and you can't affect the consumer from the producer side, because the Kafka broker sits in between.
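For example, a minimal sketch of a spring-kafka consumer factory with that property (bootstrap servers, group id and deserializers are placeholders; with a KafkaPayload value you would plug in your own value deserializer):

import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;

public ConsumerFactory<String, String> consumerFactory() {
    Map<String, Object> consumerProps = new HashMap<>();
    consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9093");
    consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "TestGroup");
    consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    // read from the beginning of the topic when the group has no committed offset yet
    consumerProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    return new DefaultKafkaConsumerFactory<>(consumerProps);
}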