Apache Kafka with High Level Consumer: Skip corrupted messages

I'm facing an issue with the high-level Kafka consumer (0.8.2.0): after consuming some amount of data, one of our consumers stops. After a restart it consumes some messages and stops again, with no error, exception, or warning.
After some investigation I found that the consumer was failing with this exception:
ERROR c.u.u.e.impl.kafka.KafkaConsumer - Error consuming message stream:
kafka.message.InvalidMessageException: Message is corrupt (stored crc = 3801080313, computed crc = 2728178222)
Any ideas on how I can simply skip such messages?

So, answering my own question. After some debugging of the Kafka consumer, I found one possible solution:
Create a subclass of kafka.consumer.ConsumerIterator.
Override the makeNext method. In this method, catch InvalidMessageException and return a dummy placeholder.
In your while loop you have to convert the kafka.consumer.ConsumerIterator to your implementation. Unfortunately, all fields of kafka.consumer.ConsumerIterator are private, so you have to use reflection.
Here is the code example:
val skipIt = createKafkaSkippingIterator(ks.iterator())
while (skipIt.hasNext()) {
  val messageAndTopic = skipIt.next()
  if (messageNotCorrupt(messageAndTopic)) {
    consumeFn(messageAndTopic)
  }
}
The messageNotCorrupt method simply checks whether the argument is equal to the dummy message.

Another solution, possibly easier, using the Kafka 0.8.2 client:
import kafka.consumer.ConsumerIterator
import kafka.utils.IteratorTemplate

try {
  val m = it.next()
  //...
} catch {
  case e: kafka.message.InvalidMessageException ⇒
    log.warn("Corrupted message. Skipping.", e)
    resetIteratorState(it)
}
//...

// IteratorTemplate.resetState() is protected, hence the reflection.
def resetIteratorState(it: ConsumerIterator[Array[Byte], Array[Byte]]): Unit = {
  val method = classOf[IteratorTemplate[_]].getDeclaredMethod("resetState")
  method.setAccessible(true)
  method.invoke(it)
}

Related

Return from Kafka consumer when there is no message

I want to process a topic at application startup using the Confluent dotnet client. Assume the following example:
while (true)
{
    try
    {
        var cr = c.Consume();
        Console.WriteLine($"Consumed message '{cr.Value}' at: '{cr.TopicPartitionOffset}'.");
    }
    catch (ConsumeException e)
    {
        Console.WriteLine($"Error occurred: {e.Error.Reason}");
    }
}
When there is no new message in Kafka, c.Consume blocks. Because I want to use it at application startup (like cache warm-up), I want my code to proceed once I find there is no new message.
I know there is an overload for setting a timeout, c.Consume(timeout), but the problem with this approach is that if there is a message in the topic and reading it takes longer than the timeout, you receive a null result, which is not desirable.
Consumers are not supposed to be aware of producers.
Now, if you want to know that you have read everything in the topic from the moment you started consuming, you can:
Load the newest offset before starting to consume.
Then start consuming messages.
If a message's offset is the same as the newest offset you loaded before, stop consuming.
I'm not a C# developer, but from what I read in the Confluent dotnet docs you can call QueryWatermarkOffsets on the consumer to get the oldest and newest offsets:
https://docs.confluent.io/current/clients/confluent-kafka-dotnet/api/Confluent.Kafka.Consumer.html#Confluent_Kafka_Consumer_QueryWatermarkOffsets_Confluent_Kafka_TopicPartition_
Then, on the Message class, you have an Offset accessor, so the whole thing should not be too hard to achieve:
https://docs.confluent.io/current/clients/confluent-kafka-dotnet/api/Confluent.Kafka.Message.html#Confluent_Kafka_Message_Offset
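For reference, here is a rough sketch of the same idea with the Java consumer, written in Scala (the warmUp helper, the partition list, and the handle callback are hypothetical names; endOffsets plays the role of QueryWatermarkOffsets, and the .NET calls differ only in naming):
import java.time.Duration
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition
import scala.collection.JavaConverters._

// Consume an assigned set of partitions until every position reaches the end
// offset captured *before* consuming started; records appended afterwards are
// left for the regular consumption loop. Assumes the consumer starts from its
// committed offsets (or the beginning), i.e. before the captured end offsets.
def warmUp(consumer: KafkaConsumer[String, String],
           partitions: Seq[TopicPartition])(handle: String => Unit): Unit = {
  consumer.assign(partitions.asJava)
  val endOffsets = consumer.endOffsets(partitions.asJava).asScala.toMap
  def caughtUp = partitions.forall(tp => consumer.position(tp) >= endOffsets(tp))
  while (!caughtUp) {
    consumer.poll(Duration.ofMillis(500)).asScala.foreach(r => handle(r.value()))
  }
}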
You can use the OnPartitionEOF event, which indicates you have reached the end of a partition.
CancellationTokenSource source = new CancellationTokenSource();
bool isContinue = true;
c.OnPartitionEOF += (o, e) =>
{
    Console.WriteLine($"You have reached end of partition");
    isContinue = false;
    source.Cancel();
};
while (isContinue)
{
    try
    {
        var cr = c.Consume(source.Token);
        Console.WriteLine($"Consumed message '{cr.Value}' at: '{cr.TopicPartitionOffset}'.");
    }
    catch (ConsumeException e)
    {
        Console.WriteLine($"Error occurred: {e.Error.Reason}");
    }
}
I found Consumer.IsPartitionEOF useful.

Kafka Consumer with limited number of retries when processing messages

I'm having a hard time figuring out simple patterns for handling exceptions in the consumer of a Kafka topic.
The scenario is as follows: in the consumer I call an external service. If the service is unavailable, I want to retry a few times and then stop consuming.
The simplest pattern seems to be a blocking, synchronous way of dealing with it, something like this in Java:
ConsumerRecords<String, String> records = consumer.poll(100);
for (ConsumerRecord<String, String> record : records) {
    boolean processed = false;
    int count = 0;
    while (!processed) {
        try {
            callService(..);
            processed = true; // exit the retry loop on success
        } catch (Exception e) {
            if (count++ < 3) {
                Thread.sleep(5000);
                continue;
            } else throw new RuntimeException();
        }
    }
}
However, I have the feeling there must be a simpler approach (without using third-party libraries), one that avoids blocking the thread.
It seems like a common thing to want, yet I could not find a simple example of this pattern.
There is no such retry mechanism provided by Kafka out of the box. I base this on my experience with RabbitMQ, where the broker provides a retry exchange; such exchanges are called Dead Letter Exchanges in RabbitMQ.
https://www.rabbitmq.com/dlx.html
You can apply the same pattern in the case of Kafka.
On message processing failure we can publish a copy of the message to another topic and wait for the next message. Let's call the new topic 'retry_topic'. The consumer of 'retry_topic' will receive the message from Kafka and then wait some predefined time, for example one hour, before starting to process it. This way we can postpone the next attempts at processing the message without any impact on the 'main_topic' consumer. If processing in the 'retry_topic' consumer fails as well, we just have to give up and store the message in 'failed_topic' for further manual handling of the problem. The 'main_topic' consumer code may look like this:
Pushing a message to retry_topic on failure/exception:
void consumeMainTopicWithPostponedRetry() {
    while (true) {
        Message message = takeNextMessage("main_topic");
        try {
            process(message);
        } catch (Exception ex) {
            publishTo("retry_topic");
            LOGGER.warn("Message processing failure. Will try once again in the future.", ex);
        }
    }
}
Consumer of the retry topic
void consumeRetryTopic() {
    while (true) {
        Message message = takeNextMessage("retry_topic");
        try {
            process(message);
            waitSomeLongerTime();
        } catch (Exception ex) {
            publishTo("failed_topic");
            LOGGER.warn("Message processing failure. Will skip it.", ex);
        }
    }
}
The above strategy and examples are taken from the link below; full credit goes to the author of the blog post.
https://blog.pragmatists.com/retrying-consumer-architecture-in-the-apache-kafka-939ac4cb851a
A non-blocking way of doing the above is also described there. Hope this helps.
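For completeness, here is a rough sketch of what the publishTo helper above might look like with the plain Java producer, written in Scala (the broker address, the String serializers, and the method signature are assumptions):
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// A single shared producer used to forward failed records to the retry/failed topic.
val retryProducerProps = new Properties()
retryProducerProps.put("bootstrap.servers", "localhost:9092")
retryProducerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
retryProducerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
val retryProducer = new KafkaProducer[String, String](retryProducerProps)

// Publish a copy of the failed record to the given topic (fire-and-forget).
def publishTo(topic: String, key: String, value: String): Unit =
  retryProducer.send(new ProducerRecord[String, String](topic, key, value))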

Kafka async Commit Offset Replication

We occasionally suffer from high latency between the replica leader and the rest of the ISR nodes, which leads to the consumer getting the following error:
org.apache.kafka.clients.consumer.RetriableCommitFailedException: Commit offsets failed with retriable exception. You should retry committing offsets.
Caused by: org.apache.kafka.common.errors.TimeoutException: The request timed out.
I can increase offsets.commit.timeout.ms, but I don't want to, as it may have additional side effects.
On a broader view, I don't want the broker to wait for the commit offset to be synced to all the other replicas; I'd rather it commit locally and update the rest asynchronously.
Going over the broker configuration I found offsets.commit.required.acks, which looks like it configures exactly that, BUT the doc also cryptically states that the default (-1) should not be overridden.
Why? I even tried going over the broker source code but found little additional information.
Any idea why this isn't recommended? Is there a different way of achieving the same result?
I recommend actually retrying the offset commits.
Let your consumer commit offsets asynchronously and implement a retry mechanism. However, retrying an asynchronous commit could lead to committing a smaller offset after a larger offset has already been committed, which should be avoided by all means.
In the book "Kafka: The Definitive Guide" there is a hint on how to mitigate this problem:
Retrying Async Commits: A simple pattern to get commit order right for asynchronous retries is to use a monotonically increasing sequence number. Increase the sequence number every time you commit and add the sequence number at the time of the commit to the commitAsync callback. When you’re getting ready to send a retry, check if the commit sequence number the callback got is equal to the instance variable; if it is, there was no newer commit and it is safe to retry. If the instance sequence number is higher, don’t retry because a newer commit was already sent.
As an example, you can see an implementation of this idea in Scala below:
import java.util.Properties
import java.util.concurrent.atomic.AtomicLong
import java.time.Duration
import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord, KafkaConsumer, OffsetAndMetadata, OffsetCommitCallback}
import org.apache.kafka.common.{KafkaException, TopicPartition}
import scala.collection.JavaConverters._

object AsyncCommitWithCallback extends App {

  // define topic
  val topic = "myOutputTopic"

  // set properties
  val props = new Properties()
  props.put(ConsumerConfig.GROUP_ID_CONFIG, "AsyncCommitter5")
  props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
  // [set more properties...]

  // create KafkaConsumer and subscribe
  val consumer = new KafkaConsumer[String, String](props)
  consumer.subscribe(List(topic).asJavaCollection)

  // initialize global counter
  val atomicLong = new AtomicLong(0)

  // consume messages
  try {
    while (true) {
      val records = consumer.poll(Duration.ofMillis(1)).asScala
      if (records.nonEmpty) {
        for (data <- records) {
          // do something with the records
        }
        consumer.commitAsync(new KeepOrderAsyncCommit)
      }
    }
  } catch {
    case ex: KafkaException => ex.printStackTrace()
  } finally {
    consumer.commitSync()
    consumer.close()
  }

  class KeepOrderAsyncCommit extends OffsetCommitCallback {
    // keep the position of this callback instance
    val position = atomicLong.incrementAndGet()

    override def onComplete(offsets: java.util.Map[TopicPartition, OffsetAndMetadata], exception: Exception): Unit = {
      // retry only if no other commit has incremented the global counter since
      if (exception != null) {
        if (position == atomicLong.get) {
          consumer.commitAsync(this)
        }
      }
    }
  }
}

How to use Flink streaming to process Data stream of Complex Protocols

I'm using Flink streaming to handle data traffic logs in a 3G network (GPRS Tunnelling Protocol), and I'm having trouble synthesizing the information belonging to a single user session.
For example: how do I map the start and end of one session? I don't know whether Flink streaming is suited to handling complex protocols like that.
P.S.:
We capture the data exchanged between the SGSN and GGSN in a 3G network (GTP protocol with GTP-C/U messages). A session is started when the SGSN sends a CreateReq (TEID, Seq, IMSI, TEID_dl, TEID_data_dl) message and the GGSN responds with a CreateRsp (TEID_dl, Seq, TEID_ul, TEID_data_ul) message.
After the session is established, other GTP-C messages (e.g. UpdateReq, DeleteReq) sent from the SGSN to the GGSN use TEID_ul and the response messages use TEID_dl; GTP-U messages use TEID_data_ul (SGSN -> GGSN) and TEID_data_dl (GGSN -> SGSN). GTP-U messages contain information such as AppID (facebook, twitter, web), URL, etc.
Finally, I want to process the continuous log data stream and correlate the GTP-C and GTP-U messages of the same user (IMSI) to produce a report.
I've tried this:
val sessions = createReqs.connect(createRsps).flatMap(new CoFlatMapFunction[CreateReq, CreateRsp, Session] {

  // holds CreateReqs indexed by (teid_dl, seq)
  private val createReqs = mutable.HashMap.empty[(String, String), CreateReq]
  // holds CreateRsps indexed by (teid, seq)
  private val createRsps = mutable.HashMap.empty[(String, String), CreateRsp]

  override def flatMap1(req: CreateReq, out: Collector[Session]): Unit = {
    val key = (req.teid_dl, req.header.seqNum)
    val oRsp = createRsps.get(key)
    if (!oRsp.isEmpty) {
      val rsp = oRsp.get
      println("OK")
      out.collect(new Session(rsp.header.time, req.imsi, req.teid_dl, req.teid_ddl, rsp.teid_upl, rsp.teid_dupl, req.rat, req.apn))
      createRsps.remove(key)
    } else {
      createReqs.put(key, req)
    }
  }

  override def flatMap2(rsp: CreateRsp, out: Collector[Session]): Unit = {
    val key = (rsp.header.teid, rsp.header.seqNum)
    val oReq = createReqs.get(key)
    if (!oReq.isEmpty) {
      val req = oReq.get
      out.collect(new Session(rsp.header.time, req.imsi, req.teid_dl, req.teid_ddl, rsp.teid_upl, rsp.teid_dupl, req.rat, req.apn))
      createReqs.remove(key)
    } else {
      createRsps.put(key, rsp)
    }
  }
}).print()
This code always returns an empty result, even though the input stream contains CreateReq and CreateRsp messages of the same session and they appear very close together (within one second). When I debug, oReq.isEmpty == true every time.
What am I doing wrong?
To be honest, it is a bit difficult to see through the telco specifics here, but if I understand correctly you have at least three streams, the first two being the CreateReq and CreateRsp streams.
To detect the establishment of a session I would use the ConnectedDataStream abstraction to share state between the two aforementioned streams. Check out this example for usage, or the related Flink docs; a minimal sketch follows below.
Is this what you are trying to achieve?
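Here is a minimal sketch of that idea, reusing the names from the question's code; the keying step and the SessionMatcher class (the question's CoFlatMapFunction extracted into a named class) are assumptions, and the exact API differs between Flink versions:
import org.apache.flink.streaming.api.scala._

// Key both streams on the correlation key (TEID, sequence number) before
// connecting them, so that a CreateReq and its matching CreateRsp are handled
// by the same parallel instance of the CoFlatMapFunction.
val sessions = createReqs
  .keyBy(req => (req.teid_dl, req.header.seqNum))
  .connect(createRsps.keyBy(rsp => (rsp.header.teid, rsp.header.seqNum)))
  .flatMap(new SessionMatcher)  // SessionMatcher: the CoFlatMapFunction from the question

sessions.print()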

Scala Queue and NoSuchElementException

I am getting an infrequent NoSuchElementException error when operating on my Scala 2.9.2 Queue. I don't understand the exception because the Queue has elements in it. I've tried switching to a SynchronizedQueue, thinking it was a concurrency issue (my queue is written to and read from by different threads), but that didn't solve it.
The reduced code looks like this:
val window = new scala.collection.mutable.Queue[Packet]
...
(thread 1)
window += packet
...
(thread 2)
window.dequeueAll(someFunction)
println(window.size)
window.foreach(println(_))
Which results in
32
java.util.NoSuchElementException
at scala.collection.mutable.LinkedListLike$class.head(LinkedListLike.scala:76)
at scala.collection.mutable.LinkedList.head(LinkedList.scala:78)
at scala.collection.mutable.MutableList.head(MutableList.scala:53)
at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
at scala.collection.mutable.MutableList.foreach(MutableList.scala:30)
The docs for LinkedListLike.head() say
Exceptions thrown
`NoSuchElementException`
if the linked list is empty.
but how can this exception be thrown if the Queue is not empty?
You should have window (a mutable data structure) accessed from only a single thread; other threads should send messages to that one.
Akka makes this kind of message-passing concurrency relatively easy.
import akka.actor.{Actor, ActorRef}

class MySource(windowHolderRef: ActorRef) extends Actor {
  def receive = {
    case MyEvent(packet: Packet) =>
      windowHolderRef ! packet
  }
}

case object CheckMessages

class WindowHolder extends Actor {
  // The queue is only ever touched from this actor, so no synchronization is needed.
  private val window = new scala.collection.mutable.Queue[Packet]

  def receive = {
    case packet: Packet =>
      window += packet
    case CheckMessages =>
      window.dequeueAll(someFunction)
      println(window.size)
      window.foreach(println(_))
  }
}
To check messages periodically you can schedule a recurring CheckMessages message to the WindowHolder actor, as sketched below.
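A rough wiring sketch, assuming Akka 2.x (the actor system name, the Main object, and the one-second interval are arbitrary choices):
import akka.actor.{ActorSystem, Props}
import scala.concurrent.duration._

object Main extends App {
  val system = ActorSystem("windows")
  val windowHolder = system.actorOf(Props[WindowHolder], "windowHolder")

  // The scheduler needs an ExecutionContext; the system dispatcher will do.
  import system.dispatcher

  // Deliver CheckMessages to the WindowHolder actor every second.
  system.scheduler.schedule(1.second, 1.second, windowHolder, CheckMessages)
}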