Max number of tuple replays on Storm Kafka Spout - apache-kafka

We’re using Storm with the Kafka Spout. When we fail messages, we’d like to replay them, but in some cases bad data or code errors will cause messages to always fail a Bolt, so we’ll get into an infinite replay cycle. Obviously we’re fixing errors when we find them, but would like our topology to be generally fault tolerant. How can we ack() a tuple after it’s been replayed more than N times?
Looking through the code for the Kafka Spout, I see that it was designed to retry with an exponential backoff timer and the comments on the PR state:
"The spout does not terminate the retry cycle (it is my conviction that it should not do so, because it cannot report context about the failure that happened to abort the reqeust), it only handles delaying the retries. A bolt in the topology is still expected to eventually call ack() instead of fail() to stop the cycle."
I've seen StackOverflow responses that recommend writing a custom spout, but I'd rather not be stuck maintaining a custom patch of the internals of the Kafka Spout if there's a recommended way to do this in a Bolt.
What’s the right way to do this in a Bolt? I don’t see any state in the tuple that exposes how many times it’s been replayed.

Storm itself does not provide any support for your problem, so a customized solution is the only way to go. Even if you do not want to patch KafkaSpout, I think introducing a counter and breaking the replay cycle inside it would be the best approach. As an alternative, you could inherit from KafkaSpout and put the counter in your subclass. This is of course somewhat similar to a patch, but might be less intrusive and easier to implement.
If you want to use a Bolt, you could do the following (which also requires some changes to the KafkaSpout or a subclass of it).
Assign a unique ID as an additional attribute to each tuple (there may already be a unique ID available; otherwise, you could introduce a "counter-ID" or use the whole tuple, i.e., all attributes, to identify each tuple).
Insert a bolt after KafkaSpout via fieldsGrouping on the ID (to ensure that a tuple that is replayed is streamed to the same bolt instance).
Within your bolt, use a HashMap<ID,Counter> that buffers all tuples and counts the number of (re-)tries. As long as the counter has not exceeded your threshold value, forward the input tuple so it gets processed by the actual topology that follows (of course, you need to anchor the tuple appropriately). If the count is larger than your threshold, ack the tuple to break the cycle and remove its entry from the HashMap (you might also want to log all failed tuples).
In order to remove successfully processed tuples from the HashMap, each time a tuple is acked in KafkaSpout you need to forward the tuple ID to the bolt so that it can remove the tuple from the HashMap. Just declare a second output stream for your KafkaSpout subclass and override Spout.ack(...) (of course, you need to call super.ack(...) to ensure KafkaSpout gets the ack, too).
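The counting logic of this first approach can be sketched in plain Java, independent of Storm. The class name `ReplayCounter`, its methods, and the use of String IDs are hypothetical. In a real bolt, `shouldForward` would be called from `execute(...)` and `onAck` whenever the spout's second stream delivers an acked ID.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: counts how often each tuple ID has been seen by the bolt.
class ReplayCounter {
    private final int maxAttempts;
    private final Map<String, Integer> attempts = new HashMap<>();

    ReplayCounter(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    // Called for every incoming tuple. Returns true if the tuple should be
    // forwarded downstream, false if it should be acked and dropped.
    boolean shouldForward(String tupleId) {
        int seen = attempts.merge(tupleId, 1, Integer::sum);
        if (seen > maxAttempts) {
            attempts.remove(tupleId); // break the cycle and free the entry
            return false;
        }
        return true;
    }

    // Called when the spout reports a successful ack for this ID.
    void onAck(String tupleId) {
        attempts.remove(tupleId);
    }

    // Number of IDs currently held in memory.
    int trackedIds() {
        return attempts.size();
    }

    public static void main(String[] args) {
        ReplayCounter counter = new ReplayCounter(2);
        System.out.println(counter.shouldForward("msg-1")); // first attempt: true
        System.out.println(counter.shouldForward("msg-1")); // first replay: true
        System.out.println(counter.shouldForward("msg-1")); // over the limit: false
    }
}
```

Note that every tuple in flight occupies one map entry until its ack arrives, which is where the memory cost of this approach comes from.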
This approach might consume a lot of memory, though. Instead of keeping an entry for each tuple in the HashMap, you could use a third stream (connected to the bolt like the other two) and forward a tuple ID whenever a tuple fails (i.e., in Spout.fail(...)). Each time the bolt receives a "fail" message from this third stream, the counter is increased. As long as there is no entry in the HashMap (or the threshold is not reached), the bolt simply forwards the tuple for processing. This should reduce the memory used, but requires some more logic to be implemented in your spout and bolt.
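A minimal sketch of this second variant, again in plain Java with hypothetical names (`FailureCounter`, `onFail`, `onAck`, `shouldForward`). It assumes the "fail" message for an ID reaches the bolt before the replayed tuple does, which a real topology would have to ensure or tolerate.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: only tuples that failed at least once occupy memory.
class FailureCounter {
    private final int maxFailures;
    private final Map<String, Integer> failures = new HashMap<>();

    FailureCounter(int maxFailures) {
        this.maxFailures = maxFailures;
    }

    // "fail" stream: the spout reported that this ID failed.
    void onFail(String tupleId) {
        failures.merge(tupleId, 1, Integer::sum);
    }

    // "ack" stream: the tuple was eventually processed, so forget it.
    void onAck(String tupleId) {
        failures.remove(tupleId);
    }

    // Data stream: forward unless the ID has reached the failure threshold.
    boolean shouldForward(String tupleId) {
        Integer count = failures.get(tupleId);
        if (count == null || count < maxFailures) {
            return true;
        }
        failures.remove(tupleId); // caller acks the tuple to break the cycle
        return false;
    }

    // Number of IDs currently held in memory.
    int trackedIds() {
        return failures.size();
    }
}
```

Tuples that never fail are never stored, so the map only grows with the number of currently failing IDs.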
Both approaches have the disadvantage that each acked tuple results in an additional message to your newly introduced bolt (thus increasing network traffic). For the second approach, it might seem that you only need to send an "ack" message to the bolt for tuples that failed before. However, you do not know which tuples failed and which did not. If you want to get rid of this network overhead, you could introduce a second HashMap in KafkaSpout that buffers the IDs of failed messages. That way, you only need to send an "ack" message when a previously failed tuple is replayed successfully. Of course, this third approach makes the logic to be implemented even more complex.
Without modifying KafkaSpout to some extent, I see no solution for your problem. I personally would patch KafkaSpout or use the third approach with a HashMap in both the KafkaSpout subclass and the bolt (because it consumes little memory and does not put a lot of additional load on the network compared to the first two solutions).

Basically it works like this:
If you deploy topologies, they should be production grade (that is, a certain level of quality is expected, and the number of failing tuples is low).
If a tuple fails, check if the tuple is actually valid.
If a tuple is valid (for example, it failed to be inserted because it was not possible to connect to an external database, or something like this), replay it.
If a tuple is malformed and can never be handled (for example, a database ID that is text while the database expects an integer), it should be acked; you will never be able to fix such a thing or insert it into the database.
New kinds of exceptions should be logged (as well as the tuple contents themselves). You should check these logs and derive rules to validate tuples in the future, and eventually add code to process them correctly (ETL).
Don't log everything, otherwise your log files will be huge; be very selective about what you log. The contents of the log files should be useful and not a pile of rubbish.
Keep doing this, and eventually you will cover all cases.
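The replay-or-ack decision described above could look roughly like the following plain Java sketch. The class `FailureClassifier`, the `Action` enum, and the choice of which exception types count as transient are assumptions for illustration; your own rules would grow out of the logs mentioned above.

```java
import java.io.IOException;
import java.util.concurrent.TimeoutException;

// Hypothetical helper that decides what a bolt should do with a failed tuple.
class FailureClassifier {
    enum Action { REPLAY, ACK_AND_LOG }

    // Assumed rule: I/O problems and timeouts are transient (the tuple itself is
    // valid), everything else is treated as a permanent data or code error.
    static Action classify(Exception e) {
        if (e instanceof IOException || e instanceof TimeoutException) {
            return Action.REPLAY;
        }
        return Action.ACK_AND_LOG;
    }

    public static void main(String[] args) {
        // Database unreachable: the tuple is fine, so fail it and let it replay.
        System.out.println(classify(new IOException("database unreachable")));
        // Text where an integer ID is expected: replaying can never succeed.
        System.out.println(classify(new NumberFormatException("id is text")));
    }
}
```

In a bolt, `REPLAY` would map to `collector.fail(tuple)` and `ACK_AND_LOG` to logging the tuple followed by `collector.ack(tuple)`.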

We face a similar problem, where bad incoming data causes the bolt to fail indefinitely.
To resolve this at runtime, we introduced one more bolt, called "DebugBolt" for reference. The spout sends each message to this bolt first; the bolt applies the required data fix to bad messages and then emits them to the required bolt. This way, one can fix data errors on the fly.
Also, if you need to delete some messages, you can pass an ignoreFlag from your DebugBolt to your original bolt, and the original bolt should just send an ack to the spout without processing if the ignoreFlag is true.
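As a rough illustration of that flow in plain Java (not our actual code): the `DebugStep` class, the `Result` type, and the two repair rules are made up for this sketch. Only the idea of passing an ignoreFlag along with the repaired payload comes from the description above.

```java
// Hypothetical sketch of the repair step a "DebugBolt" could perform.
class DebugStep {
    // The (possibly repaired) payload plus the flag telling the next bolt to skip it.
    static final class Result {
        final String payload;
        final boolean ignoreFlag;

        Result(String payload, boolean ignoreFlag) {
            this.payload = payload;
            this.ignoreFlag = ignoreFlag;
        }
    }

    // Assumed example rules: blank messages are marked to be ignored,
    // surrounding whitespace is stripped from everything else.
    static Result repair(String payload) {
        if (payload == null || payload.trim().isEmpty()) {
            return new Result("", true);
        }
        return new Result(payload.trim(), false);
    }

    // What the original bolt checks: process the message only if it is not
    // flagged; flagged messages are acked without processing.
    static boolean shouldProcess(Result result) {
        return !result.ignoreFlag;
    }
}
```

The original bolt acks the tuple in both cases; the flag only decides whether any processing happens first.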

We simply had our bolt emit the bad tuple on an error stream and acked it. Another bolt handled the error by writing it back to a Kafka topic specifically for errors. This allows us to easily direct normal vs. error data flow through the topology.
The only case where we fail a tuple is because some required resource is offline, such as a network connection, DB, ... These are retriable errors. Anything else is directed to the error stream to be fixed or handled as is appropriate.
This all assumes of course, that you don't want to incur any data loss. If you only want to attempt a best effort and ignore after a few retries, then I would look at other options.

As far as I know, Storm doesn't provide built-in support for this.
I used the following implementation:
import java.util.HashMap;
import java.util.Map;

public class AuditMessageWriter extends BaseBolt {
    private static final long serialVersionUID = 1L;
    // Message reprocess limit (assumed value; tune it for your topology)
    private static final int MAX_RETRIES = 3;
    // Failure count per message, keyed by the tuple's values so that a replay maps to the same entry
    private final Map<Object, Integer> failedTuple = new HashMap<>();
    private OutputCollector collector;

    public AuditMessageWriter() {
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        // Any initialization you need
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public void execute(Tuple input) {
        Object key = input.getValues();
        try {
            // Write your processing logic
            failedTuple.remove(key);
            collector.ack(input);
        } catch (Exception e2) {
            ExceptionHandler.LogError(e2, "Message IO Exception");
            // Save the tuple in the failedTuple map with a count of 1, or increase its existing count
            int failures = failedTuple.merge(key, 1, Integer::sum);
            if (failures < MAX_RETRIES) {
                // Below the limit: fail the tuple so that the spout replays it
                collector.fail(input);
            } else {
                // Limit reached: log it, remove it from the map and acknowledge the tuple
                failedTuple.remove(key);
                log(input);
            }
        }
    }

    void log(Tuple input) {
        try {
            // Here you can pass the tuple to a dead-letter queue or log it
        } catch (Exception e) {
            ExceptionHandler.LogError(e, "Exception while logging");
        } finally {
            // Ack the tuple to stop the replay cycle
            collector.ack(input);
        }
    }

    @Override
    public void cleanup() {
        // Nothing to clean up.
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // To declare output fields. Not required in this bolt.
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}

Related

Maintaining cold observable semantics with a hot observable

I have a requirement to read items from an external queue, and persist them to a JDBC store. The items must be processed one-by-one, and the next item must only be read from the external queue once the previous item has been successfully persisted. At any given time there may or may not be an item available to read, and if not the application must block until the next item is available.
In order to enforce the one-by-one semantics, I decided to use a cold Observable using the generate method:
return Observable.generate(emitter -> {
    final Future<Message> receivedFuture = ...;
    final Message message = receivedFuture.get();
    emitter.onNext(message);
});
This seems to work as expected for the receiving side.
In order to persist the data to the database, I decided to make use of the Vertx JDBCPool library.
messageObservable
    .flatMapSingle(message ->
        jdbcPool.prepareQuery("...")
            .rxExecute(Tuple.of(...)) // produces a hot observable
    )
According to the Vertx docs, the JDBCPool RX methods all produce hot observables.
The problem here seems to be that the flatmap to the JDBCPool method causes the entire chain to become hot. This has the undesirable consequence that messages are read from the queue before the previous message was persisted.
In other words, instead of
Read message 1
Write message 1
Read message 2
Write message 2
I now get
Read message 1
Read message 2
Read message 3
Write message 1
Read message 4
Write message 2
The only solution I have at the moment is to do a very undesirable thing and put the JDBCPool query in its own chain:
messageObservable
    .flatMapSingle(message ->
        Single.just(
            jdbcPool.prepareQuery("...")
                .rxExecute(Tuple.of(...))
                .blockingGet() // block until the query has completed
        )
    )
I want to know whether there is a way to combine the one-by-one semantics of a cold observable stream with a hot observable operation, while keeping the chain intact.

Kafka Streams and Spring Cloud Stream - Processor Efficiency

I would like to confirm my understanding of the efficiency of having multiple processors reading from one Kafka Stream source. I believe the following in Example 1 is the most efficient if I want 2 different processes performed depending on Predicate logic. The Predicate looks at the content of the Value (the Notification object here). If you have a breakpoint in each of the following processors in Example 1, it shows each Function is called for each incoming Notification. Whereas in Example 2, you only call the process2 Function if the predicate logic is met.
Example 1
@Bean
public Function<KStream<String, Notification>, KStream<String, Notification>[]> process1() {
    return input -> input
        .branch(PREDICATE_FOR_OUT_0, PREDICATE_FOR_OUT_1);
}
@Bean
public Function<KStream<String, Notification>, KStream<String, EnrichedNotification>> process2() {
    return input -> input
        .filter(PREDICATE_FOR_OUT_2)
        .map((key, value) ->.........; //different additional processing to map to EnrichedNotification type
}
Is there then no need for the following, i.e., attempting to route the output of one processor into another? (I am not sure that it is even possible.)
Example 2 (conceptual)
I am probably thinking this way because I am coming from using pure Kafka. Here process1 has a 3 way branch. Two of the branches go to their respective stream and then topic, but the third requires further processing before it can be routed to a topic.
@Bean
public Function<KStream<String, Notification>, KStream<String, Notification>[]> process1() {
    return input -> input
        .branch(PREDICATE_FOR_OUT_0, PREDICATE_FOR_OUT_1, PREDICATE_FOR_OUT_2);
}
Could we potentially route the branch for PREDICATE_FOR_OUT_2 into process2? This would mean process2 would only be called if PREDICATE_FOR_OUT_2 was met.
@Bean
public Function<KStream<String, Notification>, KStream<String, EnrichedNotification>> process2() {
    return input -> input
        .map((key, value) ->.........; //different additional processing to map to EnrichedNotification type
}
My thinking is example 2 is redundant (and not actually possible anyway) due to the abstraction and functionality that Kafka Streams gives
I think both cases of your examples can get the job done, but there are some differences. In the first example, you have two functions, both receiving data from the same Kafka topic and the second function performs some additional logic before getting routed to the output topic. In the second example, you again have two functions. In the first function, you have 3 branches, each of them sending data to a Kafka topic (I assume they are 3 different topics). Then in the second function, you receive data from the 3rd output topic from the first function. After performing the logic in that second function of example 2, you send it to the final destination for this branch. You are introducing an extra topic for this second example. I think your first example is more readable and clean.

kafka asynchronous send not really asynchronous?

I am using KafkaProducer from the kafka-client 1.0.0 library, and as per the documentation, the method Future<RecordMetadata> send(ProducerRecord<K, V> record) should return immediately, but it does not appear to. This method calls another method in the same class, doSend (see the snippet below), which waits for the metadata of the topic. I think that is necessary, as it is related to partitions etc.
/**
* Implementation of asynchronously send a record to a topic.
*/
private Future<RecordMetadata> doSend(ProducerRecord<K, V> record, Callback callback) {
TopicPartition tp = null;
try {
// first make sure the metadata for the topic is available
ClusterAndWaitTime clusterAndWaitTime = waitOnMetadata(record.topic(), record.partition(), maxBlockTimeMs);
long remainingWaitMs = Math.max(0, maxBlockTimeMs - clusterAndWaitTime.waitedOnMetadataMs);
Cluster cluster = clusterAndWaitTime.cluster;
Is there any other option that is fully asynchronous? The reason I want it to be fully asynchronous is that if some of the servers in bootstrap.servers are not responding, it will wait for up to max.block.ms, but I don't want it to wait; I just want it to return.
The documentation where I saw that it returns immediately:
KafkaProducer java doc
The send is asynchronous and this method will return immediately once
the record has been stored in the buffer of records waiting to be
sent. This allows sending many records in parallel without blocking to
wait for the response after each one.
Your analysis is correct: Kafka has a (sometimes) blocking "non-blocking" API.
This has been brought up before (https://cwiki.apache.org/confluence/display/KAFKA/KIP-286%3A+producer.send%28%29+should+not+block+on+metadata+update) but never prioritized.
It's as asynchronous as it can be. Kafka maintains a cache of metadata that gets updated occasionally to keep it current and in your scenario you only wait if that cache is stale or not initialized. Once the cache is initialized there's no wait.
If your code has a single upcoming send() that must be executed as quickly as possible, you might try making a preparatory partitionsFor() call on the producer to force a cache update if one is needed.
Aside from that, there will always be the potential, occasional wait for the metadata cache to be refreshed.
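One knob the question already mentions is max.block.ms, which bounds how long send() and partitionsFor() may block. Below is a minimal configuration sketch using only java.util.Properties. The helper class `ProducerSettings`, the broker address, and the 100 ms bound in the usage note are assumptions; the property keys and serializer class names are the standard producer ones. With a small bound, a send that cannot get metadata in time fails quickly with a timeout error instead of waiting, so the caller has to handle that failure.

```java
import java.util.Properties;

// Hypothetical helper that builds producer settings with a short blocking bound.
class ProducerSettings {
    static Properties fastFailing(String bootstrapServers, long maxBlockMs) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Upper bound on how long send() may block, e.g. while waiting for metadata.
        props.setProperty("max.block.ms", Long.toString(maxBlockMs));
        return props;
    }
}
```

The result of, for example, `ProducerSettings.fastFailing("localhost:9092", 100)` would be passed to the KafkaProducer constructor. This trades waiting for errors; it does not make the metadata fetch itself asynchronous.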

How does Storm KafkaSpout know all bolts are executed

For example, my topology code looks like this:
builder.setSpout("spout", new KafkaSpout);
builder.setBolt("bolt1", new Bolt1).shuffleGrouping("spout");
builder.setBolt("bolt2", new Bolt2).shuffleGrouping("bolt1");
builder.setBolt("bolt3", new Bolt3).shuffleGrouping("bolt2");
When bolt1 emits, the message is acked automatically. But when an exception occurs in bolt2 or bolt3, the message is not resent. How can I retrieve the failed message?
Storm has the concept of tuple trees at the helm of it. Let me try to explain using your example provided in the question.
When your spout calls the collector.emit method, the newly emitted tuple, let's call it tuple1, is added to the tuple tree. This tuple reaches bolt1, which subscribes to the spout's output. Once bolt1 receives tuple1 as input in its execute method and processes it, it emits a new value as tuple2, which is added to the tuple tree after tuple1. Before exiting the execute method, the tuple is acknowledged by an implicit call to collector.ack, which tells Storm that tuple1 has been processed and can be removed from the tuple tree. What remains is tuple2, which is passed on to bolt2 for processing.
Now the question arises: what happens if bolt1 is unable to acknowledge for some reason? After a certain period of time, the topology timeout (which defaults to 30s), Storm will see that the tuple tree has not been exhausted, so it will replay the tuple from the start and the same process as above will follow.
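As a rough mental model of the above, here is a plain Java sketch that does not use Storm at all. The class `TupleTreeModel` and its method names are made up for illustration, and tuples are plain strings. Storm's real acker tracks each tree with an XOR checksum of tuple IDs rather than a set, but the observable behaviour is the same idea.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified, hypothetical model of one tuple tree rooted at a spout tuple.
class TupleTreeModel {
    private final Set<String> pending = new HashSet<>();
    private final long startedAtMs;
    private final long timeoutMs;

    TupleTreeModel(String rootTuple, long nowMs, long timeoutMs) {
        this.pending.add(rootTuple);
        this.startedAtMs = nowMs;
        this.timeoutMs = timeoutMs;
    }

    // A bolt emits a new tuple anchored to this tree.
    void emit(String childTuple) {
        pending.add(childTuple);
    }

    // A bolt acks a tuple it has finished processing.
    void ack(String tuple) {
        pending.remove(tuple);
    }

    // True once every tuple in the tree has been acked: the spout gets ack().
    boolean isFullyProcessed() {
        return pending.isEmpty();
    }

    // True if tuples are still pending after the timeout: the spout gets fail()
    // and can replay the root tuple.
    boolean isTimedOut(long nowMs) {
        return !pending.isEmpty() && nowMs - startedAtMs >= timeoutMs;
    }
}
```

For the topology in the question: the tree starts with tuple1, bolt1 emits tuple2 and acks tuple1, and the tree only completes once bolt2 and bolt3 have acked their inputs as well. If any of them never acks, `isTimedOut` becomes true after the timeout and the message is replayed.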
I hope I was able to explain what happens on failure. For more detail please read this or watch this

Too many tuple failures- Storm Topology

I have a Storm application with 1 spout and 5 bolts. The topology works fine, but it gives a "Too many tuple failures" error after 30 minutes. Between the 1st and 2nd bolt, only 20% of the data is processed due to an analytics condition; the other 80% is discarded. I think this error occurs because 80% of the data is discarded, or because of something else. I don't know the reason or how to solve it.
If you use fault-tolerance in Storm (i.e., assign message IDs to tuples in your spout), you need to ack all tuples in the bolt that consumes the spout's output, even those you discard due to a filter condition. "Discarding a tuple" still means that the tuple is fully processed, so you need to tell Storm about it. Otherwise, Storm thinks something went wrong (due to the timeout) and fails the tuple.
KafkaSpouts assign message IDs automatically. You just need to ack all incoming tuples:
void execute(Tuple input) {
    // shouldForward(...) stands in for your own filter condition
    if (shouldForward(input)) {
        collector.emit(input, new Values(/* generate output tuple */));
    }
    // ack tuple (regardless of whether it was forwarded or discarded)
    collector.ack(input);
}