KafkaConsumer: consume all available messages once, and exit

I want to create a KafkaConsumer, with Kafka 2.0.0, that consumes all available messages once, and exits immediately. This differs slightly from the standard console consumer utility because that utility waits for a specified timeout for new messages, and only exits once that timeout has expired.
This seemingly simple task turns out to be surprisingly hard with the KafkaConsumer. My gut reaction was the following pseudo-code:
consumer.assign(all partitions)
consumer.seekToBeginning(all partitions)
do
result = consumer.poll(Duration.ofMillis(0))
// onResult(result)
while result is not empty
However this does not work, as poll always returns an empty collection even though there are many messages on the topic.
Researching this, it looks like one reason may have been that assign/subscribe are considered lazy, and partitions are not assigned until a poll loop has completed (although I cannot find any support for this assertion in the docs). However, the following pseudo-code also returns an empty collection on each call to poll:
consumer.assign(all partitions)
consumer.seekToBeginning(all partitions)
// returns nothing
result = consumer.poll(Duration.ofMillis(0))
// returns nothing
result = consumer.poll(Duration.ofMillis(0))
// returns nothing
result = consumer.poll(Duration.ofMillis(0))
// deprecated poll also returns nothing
result = consumer.poll(0)
// returns nothing
result = consumer.poll(0)
// returns nothing
result = consumer.poll(0)
...
So clearly "laziness" is not the issue.
The javadoc states:
This method returns immediately if there are records available.
which seems to imply that the first pseudo-code above should work. However, it does not.
The only thing that seems to work is to specify a non-zero timeout on poll, and not just any non-zero value; for example, 1 doesn't work. This indicates that there is some non-deterministic behavior happening inside poll, which assumes that poll will always be called in an infinite loop and that it therefore doesn't matter if it occasionally returns an empty collection despite the availability of messages. The code seems to confirm this: checks for timeout expiry are sprinkled throughout the poll implementation and its callees.
So with the naive approach, a longer timeout is obviously required (ideally Long.MAX_VALUE, to avoid the non-deterministic behavior of a shorter poll interval), but unfortunately this causes the consumer to block on the last poll, which isn't desired in this situation. The naive approach thus forces a trade-off between how deterministic the behavior is and how long we wait for no reason on the last poll. How do we avoid this?

The only way to accomplish this seems to be with some additional logic that self-manages the offsets. Here is the pseudo-code:
consumer.assign(all partitions)
consumer.seekToBeginning(all partitions)
// record the current ending offsets and poll until we get there
endOffsets = consumer.endOffsets(all partitions)
do
result = consumer.poll(NONTRIVIAL_TIMEOUT)
// onResult(result)
while any partition p has consumer.position(p) < endOffsets[p]
and an implementation in Kotlin:
val topicPartitions = consumer.partitionsFor(topic).map { TopicPartition(it.topic(), it.partition()) }
consumer.assign(topicPartitions)
consumer.seekToBeginning(consumer.assignment())
val endOffsets = consumer.endOffsets(consumer.assignment())
fun pendingMessages() = endOffsets.any { consumer.position(it.key) < it.value }
do {
    val records = consumer.poll(Duration.ofMillis(1000))
    onResult(records)
} while (pendingMessages())
The poll duration can now be set to a reasonable value (such as 1s) without concern of missing messages, since the loop continues until the consumer reaches the end offsets identified at the beginning of the loop.
There is one other corner case this deals with correctly: if the end offset is ahead of the current position but there are actually no messages in between (for example, because those messages were deleted, or because the topic was deleted and recreated), the poll will block until it times out. So it is important that the timeout not be set too low (otherwise the consumer will time out before retrieving messages that are available) and not too high (otherwise the consumer will take too long to time out when the messages are not available).

If no one is producing concurrently, you can also use endOffsets to get the position of the last message and consume up to it.
So, in pseudocode:
long currentOffset = -1
long endOffset = consumer.endOffset(partition) // one past the offset of the last message
while (currentOffset < endOffset - 1) {
    records = consumer.poll(NONTRIVIAL_TIMEOUT) // discussed in your answer
    currentOffset = records.offsets().max()
}
This way we avoid the final blocking poll, as we are always sure there is something left to receive.
You might need to add a safeguard for when your consumer's position is already equal to the end offset (as you'd get no messages there).
Also, you might want to set max.poll.records to 1 so that, if someone is producing in parallel, you don't consume messages positioned after the end offset.
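For what it's worth, here is a minimal runnable Java sketch of this approach for a single partition (the class and method names are mine; note that endOffsets returns the offset one past the last record, hence the endOffset - 1 guard and the empty-partition check):

import java.time.Duration;
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class DrainOnce {
    // Read a single partition from the beginning and return once the last
    // record that existed at start-up has been seen.
    static void consumeToEnd(KafkaConsumer<String, String> consumer, TopicPartition tp) {
        consumer.assign(Collections.singletonList(tp));
        consumer.seekToBeginning(consumer.assignment());
        long endOffset = consumer.endOffsets(consumer.assignment()).get(tp);
        if (endOffset == 0) {
            return; // empty partition: polling would only block until the timeout
        }
        long lastSeen = -1;
        while (lastSeen < endOffset - 1) { // endOffset is one past the last record
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                lastSeen = record.offset();
                // handle the record here
            }
        }
    }
}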

Related

Kafka Streams / How to get the partition an iterator is iterating over?

In my Kafka Streams application, I have a task that sets up a punctuator scheduled by wall-clock time. The punctuator iterates over the entries of a store and does something with them. Like this:
var store = context().getStateStore("MyStore");
var iter = store.all();
while (iter.hasNext()) {
    var entry = iter.next();
    // ... do something with the entry
}
iter.close(); // store iterators must be closed to avoid leaking resources
// Print a summary (now): N entries processed
// Print a summary (wish): N entries processed in partition P
Since I'm working with a single store here (which might be partitioned), I assume that every single execution of the punctuator is bound to a single partition of that store.
Is it possible to find out which partition the punctuator operates on? The Javadoc for ProcessorContext.partition() states that this method returns -1 within punctuators.
I've read Kafka Streams: Punctuate vs Process and the answers there. I can understand that a task is, in general, not tied to a particular partition. But an iterator should be tied IMO.
How can I find out the partition?
Or is my assumption that a particular instance of a store iterator is tied to a partition wrong?
What I need it for: I'd like to include the partition number in some log messages. For now, I have several nearly identical log messages stating that the punctuator does this and that. In order to make those messages "unique" I'd like to include the partition number into them.
Just to post here the answer that was provided in https://issues.apache.org/jira/browse/KAFKA-12328:
I just used context.taskId(). It contains the partition number at the end of the value, after the underscore. This was sufficient for me.
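In code, that looks roughly like the sketch below; partitionFromTaskId is a hypothetical helper, and it relies on the task id string having the form "<subtopology>_<partition>" (e.g. "0_2"). Newer Streams releases also expose TaskId#partition() directly:

import org.apache.kafka.streams.processor.TaskId;

public class PartitionFromTask {
    // Hypothetical helper: recover the partition from the task id string,
    // which has the form "<subtopology>_<partition>" (e.g. "0_2").
    static int partitionFromTaskId(TaskId taskId) {
        String id = taskId.toString();
        return Integer.parseInt(id.substring(id.lastIndexOf('_') + 1));
    }
}

Inside the punctuator this would be called as partitionFromTaskId(context.taskId()) and the result included in the log message.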

Distinguish how to handle exceptions in async Kafka producer

When producing messages to Kafka you can get two kinds of errors: retriable and non-retriable. How should you differentiate between them when handling them?
I want to produce records asynchronously, saving to another topic (or to HBase) those for which the callback object receives a non-retriable exception, and letting the producer handle for me all those that receive a retriable exception (up to a maximum number of attempts; once the maximum is reached, they are treated like the non-retriable ones).
My question is: will the producer still handle the retriable exceptions by itself despite the callback object?
Because the Javadoc of the Callback interface says:
Retriable exceptions (transient, may be covered by increasing retries)
Could the code be something like this?
producer.send(record, callback)

def callback: Callback = new Callback {
  override def onCompletion(recordMetadata: RecordMetadata, e: Exception): Unit = {
    if (e != null) {
      e match {
        // non-retriable: divert the record somewhere else
        case _: RecordTooLargeException | _: UnknownServerException => // ... other non-retriables
          log.error("Winter is coming")
          writeDiscardRecordsToSomewhereElse()
        case _ =>
          // retriable: will the producer keep trying by itself?
          log.warn("It's not that cold")
      }
    } else {
      log.debug("It's summer. Everything is fine")
    }
  }
}
Kafka version: 0.10.0
Any light will be appreciated! :)
As the Kafka bible (aka Kafka: The Definitive Guide) says:
The drawback is that while commitSync() will retry the commit until it
either succeeds or encounters a nonretriable failure, commitAsync()
will not retry.
The reason it does not retry is that by the time commitAsync() receives a response from the server, there may have been a later commit that was already successful.
Imagine that we sent a request to commit offset 2000. There is a temporary communication problem, so the broker never gets the request and therefore never responds. Meanwhile, we processed another batch and successfully committed offset 3000. If commitAsync() now retries the previously failed commit, it might succeed in committing offset 2000 after offset 3000 was already processed and committed. In the case of a rebalance, this will cause more duplicates.
Besides that, you can still maintain an increasing sequence number, which you increment every time you commit, and attach the current number to the Callback object. When the time to retry comes, just check whether the current value of the sequence is equal to the number you gave to the Callback. If it is, no newer commit has happened and it is safe to retry. Otherwise, there has been a newer commit and you should not retry the commit of this offset.
It seems like a lot of trouble, and that is because if you are considering this, you should probably change your strategy.
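For illustration, a minimal Java sketch of that sequence-number guard (GuardedCommitter, commitWithGuard, and commitSeq are my own names; this relies on the fact that commitAsync callbacks run on the consumer's polling thread, so no extra synchronization is needed):

import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class GuardedCommitter {
    // Monotonically increasing sequence, bumped on every commit attempt.
    private final AtomicLong commitSeq = new AtomicLong();

    public void commitWithGuard(KafkaConsumer<?, ?> consumer,
                                Map<TopicPartition, OffsetAndMetadata> offsets) {
        final long seqAtSend = commitSeq.incrementAndGet();
        consumer.commitAsync(offsets, (committed, exception) -> {
            // Retry only if no newer commit has been attempted since this one
            // was sent; otherwise a retry could move offsets backwards.
            if (exception != null && commitSeq.get() == seqAtSend) {
                commitWithGuard(consumer, offsets);
            }
        });
    }
}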

Kafka producer future metadata in callback

In my application, when I send messages, I use the metadata in the callback to save the offset of the record for future usage. However, sometimes metadata.offset() returns -1, which makes things hard later.
Why does this happen, and is there a way to get the offset without consuming the topic to find it?
Edit: I am on acks=0 currently. When I switch to acks=1 I don't get these errors anymore, but my performance drops drastically: from 100k messages in 10 seconds to 1 minute.
acks=0 If set to zero then the producer will not wait for any
acknowledgment from the server at all. The record will be immediately
added to the socket buffer and considered sent. No guarantee can be
made that the server has received the record in this case, and the
retries configuration will not take effect (as the client won't
generally know of any failures). The offset given back for each
record will always be set to -1.
This is not exactly true, as out of 100k messages I got 95k with offsets, but I guess that's normal.
I will still need to find another solution to get the offset with acks=0.
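One option for keeping real offsets without losing all of the throughput is acks=1 combined with more aggressive batching; a sketch (the broker address, topic, and the saveOffset helper are placeholders, not a fixed recipe):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class Acks1Example {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        // acks=1: the partition leader acknowledges the write, so the callback
        // receives a real offset, without waiting for the full ISR (acks=all).
        props.put(ProducerConfig.ACKS_CONFIG, "1");
        // Batching can win back much of the throughput lost by leaving acks=0.
        props.put(ProducerConfig.LINGER_MS_CONFIG, "5");
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, "65536");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "value"), (metadata, e) -> {
                if (e == null) {
                    saveOffset(metadata.offset()); // real offset under acks=1
                }
            });
        }
    }

    static void saveOffset(long offset) {
        // hypothetical application hook
    }
}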

How can a kafka consumer doing infinite retries recover from a bad incoming message?

I am a Kafka newbie, and as I was reading the docs I had this design question about the Kafka consumer.
A kafka consumer reads messages from the kafka stream which is made up
of one or more partitions from one or more servers.
Let's say one of the incoming messages is corrupt, and as a result the consumer fails to process it. But when processing event logs you don't want to drop any events, so you retry infinitely to cover transient errors during processing. In such cases of infinite retries, how can the consumer move forward? Is there a way to blacklist this message for the next retry?
I'd think it needs manual intervention, where we log some message metadata (don't know what exactly yet) to identify which message is failing, and have logic in place where each consumer checks Redis (or someplace else?) after n retries to see if this message needs to be skipped. The blacklist doesn't have to be stored forever in Redis either, only until the consumer can skip it. Here's pseudocode of what I just described:
while (errorState) {
    if (msg in blacklist) {
        // skip
        commitOffset()
    } else {
        errorState = processMessage(msg)
        if (!errorState) {
            commitOffset()
        } else {
            // log this msg so that we can add it to the blacklist
            logger.info(msg)
        }
    }
}
I'd like to hear from more experienced folks to see if there are better ways to do this.
We had a requirement in our project where processing an incoming message to update a record depended on the record being present. Due to a race condition, the update sometimes arrived before the insert. In such cases, we implemented a couple of approaches.
A. Manual retry with a predefined delay. The code checks if the insert has arrived. If so, processing goes as normal. Otherwise, it sleeps for 500 ms and tries again, repeating up to 10 times. If the message is still not processed at the end, the code logs the message, commits the offset, and moves forward. Processing is always done in a thread from a pool, so it doesn't block the main thread either. However, in the worst case each message costs 5 seconds of application time.
B. Recently, we refined the above solution to use a Kafka-based message scheduler. So now, if the insert has not arrived before the update, the system sends the message to a separate scheduler, which replays it after some time. After 3 retries, we again log the message and stop scheduling retries. This gives us the benefit of not blocking the application threads and of controlling when we would like to replay the message. A condensed sketch of approach A follows.
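A condensed Java sketch of approach A, assuming hypothetical recordIsPresent and process hooks and a shared worker pool:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class DelayedRetryHandler {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    // Process the update on a worker thread, retrying with a fixed delay
    // while the matching insert has not arrived yet.
    public void handleUpdate(String msg) {
        pool.submit(() -> {
            for (int attempt = 0; attempt < 10; attempt++) {
                if (recordIsPresent(msg)) {
                    process(msg);
                    return;
                }
                try {
                    TimeUnit.MILLISECONDS.sleep(500);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
            // Worst case: 10 * 500 ms = 5 s before giving up; the caller then
            // logs the message, commits the offset, and moves forward.
            System.err.println("Giving up after 10 attempts: " + msg);
        });
    }

    boolean recordIsPresent(String msg) { return false; } // hypothetical lookup
    void process(String msg) { }                          // hypothetical processing
}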

Max number of tuple replays on Storm Kafka Spout

We’re using Storm with the Kafka Spout. When we fail messages, we’d like to replay them, but in some cases bad data or code errors will cause messages to always fail a Bolt, so we’ll get into an infinite replay cycle. Obviously we’re fixing errors when we find them, but would like our topology to be generally fault tolerant. How can we ack() a tuple after it’s been replayed more than N times?
Looking through the code for the Kafka Spout, I see that it was designed to retry with an exponential backoff timer and the comments on the PR state:
"The spout does not terminate the retry cycle (it is my conviction that it should not do so, because it cannot report context about the failure that happened to abort the reqeust), it only handles delaying the retries. A bolt in the topology is still expected to eventually call ack() instead of fail() to stop the cycle."
I've seen StackOverflow responses that recommend writing a custom spout, but I'd rather not be stuck maintaining a custom patch of the internals of the Kafka Spout if there's a recommended way to do this in a Bolt.
What’s the right way to do this in a Bolt? I don’t see any state in the tuple that exposes how many times it’s been replayed.
Storm itself does not provide any support for your problem, so a customized solution is the only way to go. Even if you do not want to patch KafkaSpout, I think introducing a counter into it and breaking the replay cycle there would be the best approach. As an alternative, you could inherit from KafkaSpout and put the counter in your subclass. This is of course somewhat similar to a patch, but might be less intrusive and easier to implement.
If you want to use a Bolt, you could do the following (which also requires some changes to the KafkaSpout or a subclass of it).
Assign a unique ID as an additional attribute to each tuple (maybe there is already a unique ID available; otherwise, you could introduce a "counter-ID", or just use the whole tuple, i.e., all attributes, to identify each tuple).
Insert a bolt after KafkaSpout via fieldsGrouping on the ID (to ensure that a tuple that is replayed is streamed to the same bolt instance).
Within your bolt, use a HashMap<ID, Counter> that buffers all tuples and counts the number of (re)tries (see the sketch at the end of this answer). If the counter is smaller than your threshold value, forward the input tuple so it gets processed by the actual topology that follows (of course, you need to anchor the tuple appropriately). If the count is larger than the threshold, ack the tuple to break the cycle and remove its entry from the HashMap (you might also want to log all failed tuples).
In order to remove successfully processed tuples from the HashMap, each time a tuple is acked in KafkaSpout you need to forward the tuple ID to the bolt so it can remove the tuple from the HashMap. Just declare a second output stream for your KafkaSpout subclass and override Spout.ack(...) (of course, you need to call super.ack(...) to ensure KafkaSpout gets the ack, too).
This approach might consume a lot of memory, though. As an alternative to keeping an entry for each tuple in the HashMap, you could also use a third stream (connected to the bolt like the other two) and forward a tuple ID when a tuple fails (i.e., in Spout.fail(...)). Each time the bolt receives a "fail" message from this third stream, the counter is increased. As long as no entry is in the HashMap (or the threshold is not reached), the bolt simply forwards the tuple for processing. This should reduce the memory used, but requires more logic to be implemented in your spout and bolt.
Both approaches have the disadvantage that each acked tuple results in an additional message to your newly introduced bolt (thus increasing network traffic). For the second approach, it might seem that you only need to send an "ack" message to the bolt for tuples that failed before; however, you do not know which tuples failed and which did not. If you want to get rid of this network overhead, you could introduce a second HashMap in KafkaSpout that buffers the IDs of failed messages. Then an "ack" message needs to be sent only when a previously failed tuple is replayed successfully. Of course, this third approach makes the logic to be implemented even more complex.
Without modifying KafkaSpout to some extent, I see no solution for your problem. I personally would patch KafkaSpout, or would use the third approach with a HashMap in a KafkaSpout subclass and the bolt (because it consumes little memory and does not put a lot of additional load on the network compared to the first two solutions).
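A bare-bones sketch of the counting bolt from step 3 (imports assume Storm 1.x+; the spout-driven clean-up of the map from step 4 is omitted, and the field names and threshold are illustrative):

import java.util.HashMap;
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;

public class ReplayCounterBolt extends BaseRichBolt {
    private static final int MAX_RETRIES = 5;
    private Map<Object, Integer> tries;
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.tries = new HashMap<>();
    }

    @Override
    public void execute(Tuple input) {
        Object id = input.getValueByField("id");
        int count = tries.merge(id, 1, Integer::sum);
        if (count <= MAX_RETRIES) {
            // Forward anchored so a downstream fail() replays the tuple
            // through the spout and back into this bolt.
            collector.emit(input, input.getValues());
        } else {
            tries.remove(id); // threshold reached: swallow to break the cycle
        }
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("id", "payload")); // mirror the input schema
    }
}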
Basically it works like this:
If you deploy topologies, they should be production grade (that is, a certain level of quality is expected, and the number of failing tuples should be low).
If a tuple fails, check whether the tuple is actually valid.
If a tuple is valid (for example, it failed to be inserted because it's not possible to connect to an external database, or something like this), replay it.
If a tuple is malformed and can never be handled (for example, a database ID that is text when the database expects an integer), it should be acked; you will never be able to fix such a thing or insert it into the database.
New kinds of exceptions should be logged (as well as the tuple contents themselves). You should check these logs and generate rules to validate tuples in the future, and eventually add code to correctly process them (ETL).
Don't log everything, otherwise your log files will be huge; be very selective about what you log. The contents of the log files should be useful and not a pile of rubbish.
Keep doing this, and eventually you will cover all cases.
We also faced a similar issue where bad incoming data caused a bolt to fail infinitely.
In order to resolve this at runtime, we introduced one more bolt, naming it "DebugBolt" for reference. The spout sends the message to this bolt first, and this bolt fixes the data in the bad messages and then emits them to the required bolt. This way one can fix data errors on the fly.
Also, if you need to delete some messages, you can actually pass an ignoreFlag from your DebugBolt to your original bolt, and your original bolt should just send an ack to the spout without processing if the ignoreFlag is true.
We simply had our bolt emit the bad tuple on an error stream and ack it. Another bolt handled the error by writing it back to a Kafka topic dedicated to errors. This allows us to easily direct normal vs. error data flow through the topology.
The only case where we fail a tuple is when some required resource is offline, such as a network connection, DB, etc. These are retriable errors. Anything else is directed to the error stream to be fixed or handled as appropriate.
This all assumes, of course, that you don't want to incur any data loss. If you only want to attempt a best effort and ignore a message after a few retries, then I would look at other options. A sketch of the error-stream wiring follows.
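For reference, the error-stream wiring inside such a bolt might look roughly like this (the stream name and process() are illustrative, not the poster's actual code):

import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;

public class ErrorStreamBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        try {
            process(input); // hypothetical business logic; may throw
            collector.emit(input, input.getValues());
        } catch (Exception e) {
            // Divert instead of fail(): a bolt subscribed to "errors" writes
            // these tuples back to a dedicated Kafka error topic.
            collector.emit("errors", input, input.getValues());
        }
        collector.ack(input); // ack either way; reserve fail() for retriable outages
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("payload"));                 // default stream
        declarer.declareStream("errors", new Fields("payload")); // error stream
    }

    private void process(Tuple input) { /* hypothetical */ }
}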
As far as I know, Storm doesn't provide built-in support for this.
I have applied the implementation below:
import java.util.HashMap;
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

public class AuditMessageWriter extends BaseRichBolt {

    private static final long serialVersionUID = 1L;
    private static final int MAX_RETRIES = 3; // message reprocess limit

    private OutputCollector collector;
    private Map<Object, Integer> failedTuple = new HashMap<>();

    public AuditMessageWriter() {
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        // any initialization if you want
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public void execute(Tuple input) {
        try {
            // Write your processing logic
            collector.ack(input);
        } catch (Exception e2) {
            // On failure, count the retries for this tuple and fail it (so the
            // spout replays it) until the reprocess limit is reached; then log
            // it, remove it from the map, and ack to break the cycle.
            Object key = input.getValue(0); // assumes the first field identifies the message
            int count = failedTuple.merge(key, 1, Integer::sum);
            if (count < MAX_RETRIES) {
                collector.fail(input);
            } else {
                failedTuple.remove(key);
                log(input);
                collector.ack(input);
            }
            ExceptionHandler.LogError(e2, "Message IO Exception");
        }
    }

    void log(Tuple input) {
        try {
            // Here you can pass the tuple to a dead-letter queue or log it,
            // then ack the tuple
        } catch (Exception e) {
            ExceptionHandler.LogError(e, "Exception while logging");
        }
    }

    @Override
    public void cleanup() {
        // Nothing to clean up in this bolt.
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // No output fields required for this bolt.
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}