Error while connecting to kafka with Apache flink Batch mode - apache-kafka

I am trying to consume messages from a Kafka topic in Apache flink running in Batch mode.
Below is the code snippet using "KafkaSource"
env.setRuntimeMode(RuntimeExecutionMode.BATCH);
KafkaSource<String> source = KafkaSource.<String>builder()
.setBounded(OffsetsInitializer.committedOffsets(OffsetResetStrategy.EARLIEST))
.setBootstrapServers(<kafka server address>)
.setTopics("abcd.a1")
.setGroupId("my-group")
.setStartingOffsets(OffsetsInitializer.earliest())
.setValueOnlyDeserializer(new SimpleStringSchema())
.build();
DataStreamSource<String> dataStream = env.fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka Source");
dataStream.flatMap(new FlatMapFunction<String, Tuple3<String, String, String>>() {
#Override
public void flatMap(String value, Collector<Tuple3<String, String, String>> out) throws Exception {
System.out.println("Consumed message "+value);
}
});
When I try above, getting below error always,
Caused by: java.lang.NullPointerException: Partition abcd.a1-2 should stop at committed offset. But there is no committed offset of this partition for group my-group
at org.apache.flink.util.Preconditions.checkNotNull(Preconditions.java:76)
at org.apache.flink.connector.kafka.source.reader.KafkaPartitionSplitReader.lambda$acquireAndSetStoppingOffsets$3(KafkaPartitionSplitReader.java:360)
at java.util.HashMap.forEach(HashMap.java:1288)
at org.apache.flink.connector.kafka.source.reader.KafkaPartitionSplitReader.acquireAndSetStoppingOffsets(KafkaPartitionSplitReader.java:358)
at org.apache.flink.connector.kafka.source.reader.KafkaPartitionSplitReader.handleSplitsChanges(KafkaPartitionSplitReader.java:257)
at org.apache.flink.connector.base.source.reader.fetcher.AddSplitsTask.run(AddSplitsTask.java:51)
at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:142)
I have tried with both "OffsetsInitializer.earliest()" and "OffsetsInitializer.latest()".
Please help me on that. Let me know if its duplicate post, will close it.
Thank you

Related

error handling when consume as a batch in kafka

i have a kafka consumer written in java spring boot (spirng kafka). My consumer is like below.
#RetryableTopic(
attempts = "4",
backoff = #Backoff(delay = 1000, multiplier = 2.0),
autoCreateTopics = "false",
topicSuffixingStrategy = TopicSuffixingStrategy.SUFFIX_WITH_INDEX_VALUE,
include = {ResourceAccessException.class, MyCustomRetryableException.class})
#KafkaListener(topics = "myTopic", groupId = "myGroup", autoStartup = "true", concurrency = "3")
public void consume(#Header(KafkaHeaders.RECEIVED_TOPIC) String topic,
#Header("custom_header_1") String customHeader1,
#Header("custom_header_2") String customHeader2,
#Header("custom_header_3") String customHeader3,
#Header(required = false, name = KafkaHeaders.RECEIVED_MESSAGE_KEY) String key,
#Payload(required = false) String message) {
log.info("-------------------------");
log.info(key);
log.info(message);
log.info("-------------------------");
}
I have used #RetryableTopic annotation to handle errors. I have written a custom exception class and whatever method that throw my custom exception class (MyCustomRetryableException.class), it will retry according to the backoff with number of attempts defined in the retryable annotation. So in here i dont have to do anything. Kafka will simple publish failing messages to the correct dlt topic. All i have to do is create dlt related topic since i have used autoCreateTopics = "false".
Now i'm trying to consume messages in batch wise. I changed my kafka config like below in order to consume in batch wise.
#Bean
public ConsumerFactory<String, Object> consumerFactory() {
Map<String, Object> config = new HashMap<>();
// default configs like bootstrap servers, key and value deserializers are here
config.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "5");
return new DefaultKafkaConsumerFactory<>(config);
}
#Bean
public ConcurrentKafkaListenerContainerFactory<String, Object> kafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory<String, Object> factory =
new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory());
factory.getContainerProperties().setCommitLogLevel(LogIfLevelEnabled.Level.DEBUG);
factory.setBatchListener(true);
return factory;
}
Now that i have added batch listeners, #RetryableTopic is not supported with it. So how can i achieve the publishing failed messages to DLT task which was previously handled by #RetryableTopic ?
If anyone can answer with an example it would be great. Thank you in advance.
See the documentation.
Use a DefaultErrorHandler with a DeadLetterPublishingRecoverer.
Non blocking retries are not supported; the retries will use the configured BackOff.
Throw a BatchListenerFailedException to indicate which record in the batch failed and just that one will be sent to the DLT.
With any other exception, the whole batch will be retried (and sent to the DLT if retries are exhausted).
https://docs.spring.io/spring-kafka/docs/current/reference/html/#retrying-batch-eh

Kafka Consumer Poll runs indefinitely and doesn't return anything

I am facing difficulty with KafkaConsumer.poll(duration timeout), wherein it runs indefinitely and never come out of the method. Understand that this could be related to connection and I have seen it a bit inconsistent sometimes. How do I handle this should poll stops responding? Given below is the snippet from KafkaConsumer.poll()
public ConsumerRecords<K, V> poll(final Duration timeout) {
return poll(time.timer(timeout), true);
}
and I am calling the above from here :
Duration timeout = Duration.ofSeconds(30);
while (true) {
final ConsumerRecords<recordID, topicName> records = consumer.poll(timeout);
System.out.println("record count is" + records.count());
}
I am getting the below error:
org.apache.kafka.common.errors.SerializationException: Error
deserializing key/value for partition at offset 2. If
needed, please seek past the record to continue consumption.
I stumbled upon some useful information while trying to fix the problem I was facing above. I will provide the piece of code which should be able to handle this, but before that it is important to know what causes this.
While producing or consuming message or data to Apache Kafka, we need schema structure to that message or data, in my case Avro schema. If there is a conflict of message being produced to Kafka that conflict with that message schema, it will have an effect on consumption.
Add below code in your consumer topic in the method where it consume records --
do remember to import below packages:
import org.apache.kafka.common.TopicPartition;
import org.jsoup.SerializationException;
try {
while (true) {
ConsumerRecords<String, GenericRecord> records = null;
try {
records = consumer.poll(10000);
} catch (SerializationException e) {
String s = e.getMessage().split("Error deserializing key/value
for partition ")[1].split(". If needed, please seek past the record to
continue consumption.")[0];
String topics = s.split("-")[0];
int offset = Integer.valueOf(s.split("offset ")[1]);
int partition = Integer.valueOf(s.split("-")[1].split(" at") .
[0]);
TopicPartition topicPartition = new TopicPartition(topics,
partition);
//log.info("Skipping " + topic + "-" + partition + " offset "
+ offset);
consumer.seek(topicPartition, offset + 1);
}
for (ConsumerRecord<String, GenericRecord> record : records) {
System.out.printf("value = %s \n", record.value());
}
}
} finally {
consumer.close();
}
I ran into this while setting up a test environment.
Running the following command on the broker printed out the stored records as one would expect:
bin/kafka-console-consumer.sh --bootstrap-server="localhost:9092" --topic="foo" --from-beginning
It turned out that the Kafka server was misconfigured. To connect from an external
IP address listeners must have a valid value in kafka/config/server.properties, e.g.
# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
# FORMAT:
# listeners = listener_name://host_name:port
# EXAMPLE:
# listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://:9092

flink kafkaproducer send duplicate message in exactly once mode when checkpoint restore

I am writing a case to test flink two step commit, below is overview.
sink kafka is exactly once kafka producer. sink step is mysql sink extend two step commit. sink compare is mysql sink extend two step commit, and this sink will occasionally throw a exeption to simulate checkpoint failed.
When checkpoint is failed and restore, I find mysql two step commit will work fine, but kafka consumer will read offset from last success and kafka producer produce messages even he was done it before this checkpoint failed.
How to avoid duplicate message in this case?
Thanks for help.
env:
flink 1.9.1
java 1.8
kafka 2.11
kafka producer code:
dataStreamReduce.addSink(new FlinkKafkaProducer<>(
"flink_output",
new KafkaSerializationSchema<Tuple4<String, String, String, Long>>() {
#Override
public ProducerRecord<byte[], byte[]> serialize(Tuple4<String, String, String, Long> element, #Nullable Long timestamp) {
UUID uuid = UUID.randomUUID();
JSONObject jsonObject = new JSONObject();
jsonObject.put("uuid", uuid.toString());
jsonObject.put("key1", element.f0);
jsonObject.put("key2", element.f1);
jsonObject.put("key3", element.f2);
jsonObject.put("indicate", element.f3);
return new ProducerRecord<>("flink_output", jsonObject.toJSONString().getBytes(StandardCharsets.UTF_8));
}
},
kafkaProps,
FlinkKafkaProducer.Semantic.EXACTLY_ONCE
)).name("sink kafka");
checkpoint settings:
StreamExecutionEnvironment executionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment();
executionEnvironment.enableCheckpointing(10000);
executionEnvironment.getCheckpointConfig().setTolerableCheckpointFailureNumber(0);
executionEnvironment.getCheckpointConfig().setPreferCheckpointForRecovery(true);
mysql sink:
dataStreamReduce.addSink(
new TwoPhaseCommitSinkFunction<Tuple4<String, String, String, Long>,
Connection, Void>
(new KryoSerializer<>(Connection.class, new ExecutionConfig()), VoidSerializer.INSTANCE) {
int count = 0;
Connection connection;
#Override
protected void invoke(Connection transaction, Tuple4<String, String, String, Long> value, Context context) throws Exception {
if (count > 10) {
throw new Exception("compare test exception.");
}
PreparedStatement ps = transaction.prepareStatement(
" insert into test_two_step_compare(slot_time, key1, key2, key3, indicate) " +
" values(?, ?, ?, ?, ?) " +
" ON DUPLICATE KEY UPDATE indicate = indicate + values(indicate) "
);
ps.setString(1, context.timestamp().toString());
ps.setString(2, value.f0);
ps.setString(3, value.f1);
ps.setString(4, value.f1);
ps.setLong(5, value.f3);
ps.execute();
ps.close();
count += 1;
}
#Override
protected Connection beginTransaction() throws Exception {
LOGGER.error("compare in begin transaction");
try {
if (connection.isClosed()) {
throw new Exception("mysql connection closed");
}
}catch (Exception e) {
LOGGER.error("mysql connection is error: " + e.toString());
LOGGER.error("reconnect mysql connection");
String jdbcURI = "jdbc:mysql://";
Class.forName("com.mysql.jdbc.Driver");
Connection connection = DriverManager.getConnection(jdbcURI);
connection.setAutoCommit(false);
this.connection = connection;
}
return this.connection;
}
#Override
protected void preCommit(Connection transaction) throws Exception {
LOGGER.error("compare in pre Commit");
}
#Override
protected void commit(Connection transaction) {
LOGGER.error("compare in commit");
try {
transaction.commit();
} catch (Exception e) {
LOGGER.error("compare Commit error: " + e.toString());
}
}
#Override
protected void abort(Connection transaction) {
LOGGER.error("compare in abort");
try {
transaction.rollback();
} catch (Exception e) {
LOGGER.error("compare abort error." + e.toString());
}
}
#Override
protected void recoverAndCommit(Connection transaction) {
super.recoverAndCommit(transaction);
LOGGER.error("compare in recover And Commit");
}
#Override
protected void recoverAndAbort(Connection transaction) {
super.recoverAndAbort(transaction);
LOGGER.error("compare in recover And Abort");
}
})
.setParallelism(1).name("sink compare");
I'm not quite sure I understand the question correctly:
When checkpoint is failed and restore, I find mysql two step commit will work fine, but kafka producer will read offset from last success and produce message even he was done it before this checkpoint failed.
Kafka producer is not reading any data. So, I'm assuming your whole pipeline rereads old offsets and produces duplicates. If so, you need to understand how Flink ensures exactly once.
Periodic checkpoints are created to have a consistent state in case of failure.
These checkpoints contain the offset of the last successfully read record at the time of the checkpoint.
Upon recovery Flink will reread all records from the offset stored in the last successful checkpoint. Thus, the same records will be replayed as have been generated in between last checkpoint and failure.
The replayed records will restore the state right before the failure.
It will produce duplicate outputs originating from the replayed input records.
It is the responsibility of the sinks to ensure that no duplicates are effectively written to the target system.
For the last point, there are two options:
only output data, when a checkpoint has been written, such that no effective duplicates can ever appear in the target. This naive approach is very universal (independent of the sink) but adds the checkpointing interval to the latency.
let the sink deduplicate the output.
The latter option is used for the Kafka sink. It uses Kafka transactions for letting it deduplicate data. To avoid duplicates on consumer side, you need to ensure it's not reading uncommitted data as mentioned in the documentation. Also make sure your transaction timeout is large enough that it doesn't discard data between failure and recovery.

How to check if Kafka Consumer is ready

I have Kafka commit policy set to latest and missing first few messages. If I give a sleep of 20 seconds before starting to send the messages to the input topic, everything is working as desired. I am not sure if the problem is with consumer taking long time for partition rebalancing. Is there a way to know if the consumer is ready before starting to poll ?
You can use consumer.assignment(), it will return set of partitions and verify whether all of the partitions are assigned which are available for that topic.
If you are using spring-kafka project, you can include spring-kafka-test dependancy and use below method to wait for topic assignment , but you need to have container.
ContainerTestUtils.waitForAssignment(Object container, int partitions);
You can do the following:
I have a test that reads data from kafka topic.
So you can't use KafkaConsumer in multithread environment, but you can pass parameter "AtomicReference assignment", update it in consumer-thread, and read it in another thread.
For example, snipped of working code in project for testing:
private void readAvro(String readFromKafka,
AtomicBoolean needStop,
List<Event> events,
String bootstrapServers,
int readTimeout) {
// print the topic name
AtomicReference<Set<TopicPartition>> assignment = new AtomicReference<>();
new Thread(() -> readAvro(bootstrapServers, readFromKafka, needStop, events, readTimeout, assignment)).start();
long startTime = System.currentTimeMillis();
long maxWaitingTime = 30_000;
for (long time = System.currentTimeMillis(); System.currentTimeMillis() - time < maxWaitingTime;) {
Set<TopicPartition> assignments = Optional.ofNullable(assignment.get()).orElse(new HashSet<>());
System.out.println("[!kafka-consumer!] Assignments [" + assignments.size() + "]: "
+ assignments.stream().map(v -> String.valueOf(v.partition())).collect(Collectors.joining(",")));
if (assignments.size() > 0) {
break;
}
try {
Thread.sleep(1_000);
} catch (InterruptedException e) {
e.printStackTrace();
needStop.set(true);
break;
}
}
System.out.println("Subscribed! Wait summary: " + (System.currentTimeMillis() - startTime));
}
private void readAvro(String bootstrapServers,
String readFromKafka,
AtomicBoolean needStop,
List<Event> events,
int readTimeout,
AtomicReference<Set<TopicPartition>> assignment) {
KafkaConsumer<String, byte[]> consumer = (KafkaConsumer<String, byte[]>) queueKafkaConsumer(bootstrapServers, "latest");
System.out.println("Subscribed to topic: " + readFromKafka);
consumer.subscribe(Collections.singletonList(readFromKafka));
long started = System.currentTimeMillis();
while (!needStop.get()) {
assignment.set(consumer.assignment());
ConsumerRecords<String, byte[]> records = consumer.poll(1_000);
events.addAll(CommonUtils4Tst.readEvents(records));
if (readTimeout == -1) {
if (events.size() > 0) {
break;
}
} else if (System.currentTimeMillis() - started > readTimeout) {
break;
}
}
needStop.set(true);
synchronized (MainTest.class) {
MainTest.class.notifyAll();
}
consumer.close();
}
P.S.
needStop - global flag, to stop all running thread if any in case of failure of success
events - list of object, that i want to check
readTimeout - how much time we will wait until read all data, if readTimeout == -1, then stop when we read anything
Thanks to Alexey (I have also voted up), I seemed to have resolved my issue essentially following the same idea.
Just want to share my experience... in our case we using Kafka in request & response way, somewhat like RPC. Request is being sent on one topic and then waiting for response on another topic. Running into a similar issue i.e. missing out first response.
I have tried ... KafkaConsumer.assignment(); repeatedly (with Thread.sleep(100);) but doesn't seem to help. Adding a KafkaConsumer.poll(50); seems to have primed the consumer (group) and receiving the first response too. Tested few times and it consistently working now.
BTW, testing requires stopping application & deleting Kafka topics and, for a good measure, restarted Kafka too.
PS: Just calling poll(50); without assignment(); fetching logic, like Alexey mentioned, may not guarantee that consumer (group) is ready.
You can modify an AlwaysSeekToEndListener (listens only to new messages) to include a callback:
public class AlwaysSeekToEndListener<K, V> implements ConsumerRebalanceListener {
private final Consumer<K, V> consumer;
private Runnable callback;
public AlwaysSeekToEndListener(Consumer<K, V> consumer) {
this.consumer = consumer;
}
public AlwaysSeekToEndListener(Consumer<K, V> consumer, Runnable callback) {
this.consumer = consumer;
this.callback = callback;
}
#Override
public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
}
#Override
public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
consumer.seekToEnd(partitions);
if (callback != null) {
callback.run();
}
}
}
and subscribe with a latch callback:
CountDownLatch initLatch = new CountDownLatch(1);
consumer.subscribe(singletonList(topic), new AlwaysSeekToEndListener<>(consumer, () -> initLatch.countDown()));
initLatch.await(); // blocks until consumer is ready and listening
then proceed to start your producer.
If your policy is set to latest - which takes effect if there are no previously committed offsets - but you have no previously committed offsets, then you should not worry about 'missing' messages, because you're telling Kafka not to care about messages that were sent 'previously' to your consumers being ready.
If you care about 'previous' messages, you should set the policy to earliest.
In any case, whatever the policy, the behaviour you are seeing is transient, i.e. once committed offsets are saved in Kafka, on every restart the consumers will pick up where they left previoulsy
I needed to know if a kafka consumer was ready before doing some testing, so i tried with consumer.assignment(), but it only returned the set of partitions assigned, but there was a problem, with this i cannot see if this partitions assigned to the group had offset setted, so later when i tried to use the consumer it didnĀ“t have offset setted correctly.
The solutions was to use committed(), this will give you the last commited offsets of the given partitions that you put in the arguments.
So you can do something like: consumer.committed(consumer.assignment())
If there is no partitions assigned yet it will return:
{}
If there is partitions assigned, but no offset yet:
{name.of.topic-0=null, name.of.topic-1=null}
But if there is partitions and offset:
{name.of.topic-0=OffsetAndMetadata{offset=5197881, leaderEpoch=null, metadata=''}, name.of.topic-1=OffsetAndMetadata{offset=5198832, leaderEpoch=null, metadata=''}}
With this information you can use something like:
consumer.committed(consumer.assignment()).isEmpty();
consumer.committed(consumer.assignment()).containsValue(null);
And with this information you can be sure that the kafka consumer is ready.

HornetQ Consumer Stops Receiving Messages After N hours

I'm using a HornetQ(v2.2.13) consumer, running standalone, to read off a Persistent topic published by a JBOSS server(7.1.1 final). All goes well for several hours (between 2-6) and then the consumer just stops receiving messages from the topic. From the log file on the server, I see data keeps getting pumped down the pipe but the consumer log file indicates the client stopped reading the data. I deduced that from the client saying the last time it read a message off the topic was 12:00:00 and the server log saying the last time it pushed a message to the topic was 14:00:00.
I've tried adjusting the HornetQ configs, but it doesn't seem to be working for a sustainable duration.
The code I use to communicate with the topic is as follows.
private TransportConfiguration getTC(String hostname) {
Map<String,Object> params = new HashMap<String, Object>();
params.put(TransportConstants.HOST_PROP_NAME, hostname);
params.put(TransportConstants.PORT_PROP_NAME, 5445);
TransportConfiguration tc = new TransportConfiguration(NettyConnectorFactory.class.getName(), params);
return tc;
}
private Topic createDestination(String destinationName) {
Topic topic = new HornetQTopic(destinationName);
return topic;
}
private HornetQConnectionFactory createCF(TransportConfiguration tc) {
HornetQConnectionFactory cf = HornetQJMSClient.createConnectionFactoryWithoutHA(JMSFactoryType .CF, tc);
return cf == null ? null : cf;
}
Code snippet that creates the session and starts it:
TransportConfiguration tc = this.getTC(this.hostname);
HornetQConnectionFactory cf = this.createCF(tc);
cf.setRetryInterval(4000);
cf.setReconnectAttempts(10);
cf.setConfirmationWindowSize(1000000);
Destination destination = this.createDestination(this.topicName);
logger.info("Starting Topic Connection");
try {
this.connection = cf.createConnection();
connection.start();
this.session = connection.createSession(transactional, ackMode);
MessageConsumer consumer = session.createConsumer(destination);
consumer.setMessageListener(this);
logger.info("Started topic connection");
} catch (Exception ex) {
ex.printStackTrace();
logger.error("EXCEPTION!");
}
You don't get any log about server getting disconnected on the server's side.
Did you try playing with client-failure-check-period and other ping parameters?
What about VM Settings?
How are you acknowledging the messages? I see that you created it as transaction. Are you sure you are committing the TX as you recieve messages?