Error handling in Spring Cloud Kafka Streams - apache-kafka

I'm using Spring Cloud Stream with Kafka Streams. Let's say I have a processor which is a Function that converts a KStream of Strings to a KStream of CityProgrammes. It invokes an API to find the City by name, followed by another transformation that finds any events near that city.
Now the problem is that when any error happens during the transformation, the whole application stops. I want to send that one particular message to a DLQ and move along. I've been reading for days and everyone suggests handling errors within the called services, but that is nonsense in my opinion; plus I still need to return a KStream: how do I do that within a catch?
I also looked at UncaughtExceptionHandler, but it is not aware of the message and can only restart the processing, which won't skip this invalid message.
This might sound like an A-B problem, so the question rephrased: how do I maintain the flow in a KStream when an exception occurs and send the invalid item to the DLQ?

When it comes to the application-level errors you have, it is up to the application itself how the error is handled. Kafka Streams and the Spring Cloud Stream binder mainly support deserialization and serialization errors at the framework level. Although that is the case, I think your scenario can be handled. If you are using Kafka Client prior to 2.8, here is an SO answer I gave before on something similar: https://stackoverflow.com/a/66749750/2070861
If you are using Kafka/Streams 2.8, here is an idea that you can use. However, the code below should only be used as a starting point. Adjust it according to your use case. Read more on how branching works in Kafka Streams 2.8. The branching API is significantly refactored in 2.8 from the prior versions.
public Function<KStream<?, String>, KStream<?, Foo>> convert() {
    // Single-element holder so the lambdas below can share the converted value.
    Foo[] foo = new Foo[1];
    return input -> {
        final Map<String, ? extends KStream<?, String>> branches =
                input.split(Named.as("foo-")).branch((key, value) -> {
                    try {
                        foo[0] = new Foo(); // your API call for CityProgramme conversion here, possibly.
                        return true;
                    }
                    catch (Exception e) {
                        Message<?> message = MessageBuilder.withPayload(value).build();
                        streamBridge.send("to-my-dlt", message);
                        return false;
                    }
                }, Branched.as("bar"))
                .defaultBranch();
        final KStream<?, String> kStream = branches.get("foo-bar");
        return kStream.map((key, value) -> new KeyValue<>("", foo[0]));
    };
}
The default branch is ignored in this code because that only contains the records that threw exceptions. Those were handled by the catch statement above in which we send the records to a DLT programmatically. Finally, we get the good records and map them to a new KStream and send it through the outbound.
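The holder array in the snippet shares one mutable slot between the branch predicate and the later map step. An alternative shape for the same idea is to do the conversion once, inside a try/catch, and emit either one converted value or nothing, diverting the failed input to a dead-letter sink. The sketch below shows that pattern in plain Java with no Kafka dependency. `DeadLetterRouting`, `convertOrDivert`, `processAll`, and the `deadLetter` consumer are hypothetical names; in a real topology the same try/catch body would presumably sit inside a `flatMapValues` call, with `streamBridge.send(...)` acting as the sink.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper: not part of Kafka Streams or Spring Cloud Stream.
final class DeadLetterRouting {

    private DeadLetterRouting() {
    }

    // Converts one value. On failure the raw value goes to the dead-letter
    // sink and an empty list is returned, so the caller simply skips it.
    static <I, O> List<O> convertOrDivert(I value,
                                          Function<I, O> converter,
                                          Consumer<I> deadLetter) {
        try {
            return List.of(converter.apply(value));
        } catch (RuntimeException e) {
            deadLetter.accept(value);
            return List.of();
        }
    }

    // Applies convertOrDivert to every input and keeps only the successes.
    static <I, O> List<O> processAll(List<I> inputs,
                                     Function<I, O> converter,
                                     Consumer<I> deadLetter) {
        return inputs.stream()
                .flatMap(v -> convertOrDivert(v, converter, deadLetter).stream())
                .collect(Collectors.toList());
    }
}
```

Because the converted value travels with the record instead of living in a shared variable, this shape does not depend on the predicate and the mapper running back to back for the same record.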

Related

Quarkus/Smallrye reactive kafka - Endpoint success/failure response from Message

I'm looking to respond to a REST endpoint with a Success/Failure response that dynamically accepts a topic as a query param. In Quarkus with smallrye reactive messaging the code would look something like below wrapping the payload with OutgoingKafkaRecordMetadata
i.e. https://myendpoint/publishToKafka?topic=myDynamicTopic
@Channel("test")
Emitter<byte[]> kafkaEmitter;
@POST
@Path("/publishToKafka")
public CompletionStage<Void> publishRecord(@QueryParam("topic") String topic, byte[] payload) {
    kafkaEmitter.send(Message.of(payload).addMetadata(OutgoingKafkaRecordMetadata.<String>builder()
            .withKey("my-key")
            .withTopic("myDynamicTopic")
            .build()));
}
From the Quarkus docs: "If the endpoint does not return a CompletionStage, the HTTP response may be written before the message is sent to Kafka, and so failures won’t be reported to the user." The example there describes this process when you send a payload directly (emitter.send(payload) returns a CompletionStage, whereas emitter.send(message) returns void), but this requires configuring the topic in advance. Is it possible to specify metadata with a Message and still respond to the calling client with a success/failure response? (I don't mind if it's with Emitter and CompletionStage or MutinyEmitter and Uni.)
Any advice or suggestions would be appreciated.
Because you use a Message (as you need to specify the topic), you need something a bit more convoluted:
@Channel("test")
Emitter<byte[]> kafkaEmitter;
@POST
@Path("/publishToKafka")
public CompletionStage<Void> publishRecord(@QueryParam("topic") String topic, byte[] payload) {
    CompletableFuture<Void> future = new CompletableFuture<>();
    Message<byte[]> message = Message.of(payload).addMetadata(OutgoingKafkaRecordMetadata.<String>builder()
            .withKey("my-key")
            .withTopic("myDynamicTopic")
            .build());
    message = message.withAck(() -> {
        future.complete(null);
        return CompletableFuture.completedFuture(null);
    }).withNack(t -> {
        future.completeExceptionally(t);
        return CompletableFuture.completedFuture(null);
    });
    kafkaEmitter.send(message);
    return future;
}
In this snippet, I also attach the ack and nack handlers called when the message is either acknowledged (accepted by the broker) or rejected (something wrong happened).
These callbacks report to future, a CompletableFuture created in the method. This is the object to return, as it will do what you want: indicate the outcome.
I know the callbacks are slightly complicated. This is mainly due to the spec: we have to return CompletableFuture.completedFuture(...) to acknowledge that the nack process was successful. If we were to return future instead (which we have just failed with future.completeExceptionally(t)), this would be interpreted as a failure during the nack process. This would basically be the equivalent of a throw within a catch block in the imperative world.
Fortunately, an easier version will be available soonish (no worries, we won't break).
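The ack/nack reasoning can be isolated from the messaging library. The sketch below is plain Java and uses a hypothetical `AckBridge` class that only mimics the shape of the two handlers; it is not the SmallRye `Message` API. It shows the outcome future failing while the stage returned by the nack handler still completes normally.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical stand-in for a message's ack/nack handlers; it mimics only
// the callback shapes and is not the SmallRye Message type.
final class AckBridge {

    // The future handed back to the HTTP caller.
    final CompletableFuture<Void> outcome = new CompletableFuture<>();
    final Supplier<CompletionStage<Void>> ack;
    final Function<Throwable, CompletionStage<Void>> nack;

    AckBridge() {
        // Report success to the outcome future, then return an already
        // completed stage: "the ack handling itself went fine".
        this.ack = () -> {
            outcome.complete(null);
            return CompletableFuture.completedFuture(null);
        };
        // Same on failure: the outcome future carries the error, while the
        // returned stage still completes normally.
        this.nack = t -> {
            outcome.completeExceptionally(t);
            return CompletableFuture.completedFuture(null);
        };
    }
}
```

Returning `outcome` from the nack handler instead would hand back a failed stage, which is the "throw within a catch block" situation described above.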

Spring boot Kafka request-reply scenario

I am implementing a POC of a request/reply scenario in order to move an event-based microservice stack onto Kafka.
There are two options in Spring, ReplyingKafkaTemplate and Spring Cloud Stream, and I wonder which one is better to use.
The first is ReplyingKafkaTemplate, which can easily be configured to have a dedicated reply topic for each instance.
record.headers().add(new RecordHeader(KafkaHeaders.REPLY_TOPIC, provider.getReplyChannelName().getBytes()));
The consumer should not need to know the reply topic name; it just listens to a topic and replies to the given reply topic.
@KafkaListener(topics = "${kafka.topic.concat-request}")
@SendTo
public ConcatReply listen(ConcatModel request) {
    .....
}
The second option is to use a combination of StreamListener, spring-integration and IntegrationFlows. A gateway has to be configured and the reply topics have to be filtered.
@MessagingGateway
public interface StreamGateway {
    @Gateway(requestChannel = START, replyChannel = FILTER, replyTimeout = 5000, requestTimeout = 2000)
    String process(String payload);
}
@Bean
public IntegrationFlow headerEnricherFlow() {
    return IntegrationFlows.from(START)
            .enrichHeaders(HeaderEnricherSpec::headerChannelsToString)
            .enrichHeaders(headerEnricherSpec -> headerEnricherSpec.header(Channels.INSTANCE_ID, instanceUUID))
            .channel(Channels.REQUEST)
            .get();
}
@Bean
public IntegrationFlow replyFiltererFlow() {
    return IntegrationFlows.from(GatewayChannels.REPLY)
            .filter(Message.class, message -> Channels.INSTANCE_ID.equals(message.getHeaders().get("instanceId")))
            .channel(FILTER)
            .get();
}
Building reply
@StreamListener(Channels.REQUEST)
@SendTo(Channels.REPLY)
public Message<?> process(Message<String> request) {
Specifying the reply channel is mandatory, so received reply topics are filtered according to instanceID, which is a kind of workaround (it might bloat the network). On the other hand, the DLQ scenario is enabled by adding
consumer:
enableDlq: true
Using Spring Cloud Stream looks promising in terms of interoperability with RabbitMQ and other features, but it does not officially support the request/reply scenario out of the box. The issue is still open and has not been rejected either. (https://github.com/spring-cloud/spring-cloud-stream/issues/1800)
Any suggestions are welcomed.
Spring Cloud Stream is not designed for request/reply; it can be done, but it is not straightforward and you have to write code.
With @KafkaListener the framework takes care of everything for you.
If you want it to work with RabbitMQ too, you can annotate it with @RabbitListener as well.

Kafka Producer : Handle Exception in Async Send with Callback

I need to catch the exceptions in case of an async send to Kafka. The Kafka producer API comes with a function send(ProducerRecord record, Callback callback). But when I tested this against the following two scenarios:
Kafka Broker Down
Topic not pre created
The callbacks are not getting called. Rather I am getting warning in the code for unsuccessful send (as shown below).
Questions :
So are the callbacks called only for specific exceptions ?
When does Kafka Client try to connect to Kafka broker while async send : on every batch send or periodically ?
Kafka Warning Image
Note : I am also using linger.ms setting of 25 sec to batch send my records.
public class ProducerDemo {
    static KafkaProducer<String, String> producer;
    public static void main(String[] args) throws IOException {
        final Logger logger = LoggerFactory.getLogger(ProducerDemo.class);
        Properties properties = new Properties();
        properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092");
        properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.setProperty(ProducerConfig.ACKS_CONFIG, "1");
        properties.setProperty(ProducerConfig.LINGER_MS_CONFIG, "30000");
        producer = new KafkaProducer<String, String>(properties);
        String topic = "first_topic";
        for (int i = 0; i < 5; i++) {
            String value = "hello world " + Integer.toString(i);
            String key = "id_" + Integer.toString(i);
            ProducerRecord<String, String> record = new ProducerRecord<String, String>(topic, key, value);
            producer.send(record, new Callback() {
                public void onCompletion(RecordMetadata recordMetadata, Exception e) {
                    // executed every time a record is successfully sent or an exception is thrown
                    if (e == null) {
                        // No exception
                    } else {
                        // Exception handling
                    }
                }
            });
        }
        producer.close();
    }
}
You will get those warnings for a non-existing topic as a resilience mechanism provided by KafkaProducer. If you wait a bit longer (it should be 60 seconds by default), the callback will be called eventually:
Here's my snippet:
So, when something goes wrong and an async send is not successful, it will eventually fail with a failed future and/or a callback with an exception.
If you are not running it transactionally, it can still mean that some messages from the batch have found their way to the broker, while others haven't.
It will most certainly be a problem if you need a blocking-style acknowledgement to the upstream system (like an HTTP ingestion interface, etc.) for every message that is sent to Kafka. The only way to do that is by blocking on every message with the future's get, as described in the documentation:
In general, I've noticed a lot of questions related to KafkaProducer delivery semantics and guarantees. It can definitely be documented better.
One more thing, since you mentioned linger.ms:
Note that records that arrive close together in time will generally
batch together even with linger.ms=0 so under heavy load batching will
occur regardless of the linger configuration
For the first question, here is the answer.
As per the Apache Kafka documentation, you can capture the exceptions listed at the link below in the onCompletion method when you implement the Callback interface:
https://kafka.apache.org/25/javadoc/org/apache/kafka/clients/producer/Callback.html
For the second question, the combination of the properties below controls when the records are sent and, as far as I understand, it is the same for synchronous and asynchronous calls.
linger.ms
max.block.ms
https://kafka.apache.org/documentation/#linger.ms
So are the callbacks called only for specific exceptions ?
Yes, that's how it works. From documentation (2.5.0):
* Fully non-blocking usage can make use of the {@link Callback} parameter to provide a callback that
* will be invoked when the request is complete.
Notice the important part: when the request is complete, which means that the producer must have accepted the record and sent the ProduceRequest to the Kafka broker. Without digging too deep into the internals, this means that broker metadata must be present and the partition must exist.
When it comes to a formal specification, you'd need to take a good look at send()'s Javadoc and possibly at KafkaProducer's implementation of the doSend method. There you will see that multiple exceptions can be thrown by the submitting call itself (instead of returning a future and invoking the callback), e.g.:
if broker metadata is not available in timeout given,
if data could not be serialized,
if serialized form was too large, etc.
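In practice this means calling code has two places where a failure can surface: an exception thrown by the send call itself, and an exception passed to the callback later. The sketch below models that with a hypothetical `ToySender` (not the Kafka client; its validation rules and names are invented for illustration, and its callback fires synchronously for simplicity) and funnels both paths into one `CompletableFuture`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.BiConsumer;

// Hypothetical stand-in for an async producer; not the Kafka client API.
final class ToySender {

    private ToySender() {
    }

    // Throws synchronously for records it cannot even accept; otherwise it
    // reports the outcome through the callback (metadata, or an exception).
    static void send(String record, BiConsumer<String, Exception> callback) {
        if (record == null) {
            throw new IllegalArgumentException("record could not be serialized");
        }
        if (record.isEmpty()) {
            callback.accept(null, new IllegalStateException("delivery failed"));
        } else {
            callback.accept("offset-for-" + record, null);
        }
    }

    // Funnels both failure paths into a single future for the caller.
    static CompletableFuture<String> sendTracked(String record) {
        CompletableFuture<String> result = new CompletableFuture<>();
        try {
            send(record, (metadata, error) -> {
                if (error == null) {
                    result.complete(metadata);
                } else {
                    result.completeExceptionally(error);
                }
            });
        } catch (RuntimeException e) {
            // Failure raised by the submitting call itself; no callback fires.
            result.completeExceptionally(e);
        }
        return result;
    }
}
```

With a wrapper of this shape, the caller inspects one future regardless of whether the failure happened before or after the record was accepted.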

Writing Verticles that performs CRUD Operations on a file

I'm new to Vert.x and I am trying to implement a small REST API that stores its data in JSON files on the local file system.
So far I have managed to implement the REST API, since Vert.x is very well documented on that part.
What I'm currently looking for are examples of how to build data access objects in Vert.x. How can I implement a Verticle that can perform CRUD operations on a text file containing JSON?
Can you provide me any examples? Any hints?
UPDATE 1:
By CRUD operations on a file I'm thinking of the following. Imagine there is a REST resource called Records exposed on the path /api/v1/user/:userid/records/.
In my verticle that starts my HTTP server I have the following routes.
router.get('/api/user/:userid/records').handler(this.&handleGetRecords)
router.post('/api/user/:userid/records').handler(this.&handleNewRecord)
The handler methods handleGetRecords and handleNewRecord are sending a message using the Vertx event bus.
request.bodyHandler({ b ->
    def userid = request.getParam('userid')
    logger.info "Reading record for user {}", userid
    vertx.eventBus().send(GET_TIME_ENTRIES.name(), "read time records", [headers: [userId: userid]], { reply ->
        // This handler will be called for every request
        def response = routingContext.response()
        if (reply.succeeded()) {
            response.putHeader("content-type", "text/json")
            // Write to the response and end it
            response.end(reply.result().body())
        } else {
            logger.warn("Reply failed {}", reply.failed())
            response.statusCode = 500
            response.putHeader("content-type", "text/plain")
            response.end('That did not work out well')
        }
    })
})
Then there is another verticle that consumes these messages GET_TIME_ENTRIES or CREATE_TIME_ENTRY. I think of this consumer verticle as a Data Access Object for Records. This verticle can read a file of the given :userid that contains all user records. The verticle is able to
add a record
read all records
read a specific record
update a record
delete one or all records
Here is the example of reading all records.
vertx.eventBus().consumer(GET_TIME_ENTRIES.name(), { message ->
    String userId = message.headers().get('userId')
    String absPath = "${this.source}/${userId}.json" as String
    vertx.fileSystem().readFile(absPath, { result ->
        if (result.succeeded()) {
            logger.info("About to read from user file {}", absPath)
            def jsonObject = new JsonObject(result.result().toString())
            message.reply(jsonObject.getJsonArray('records').toString())
        } else {
            logger.warn("User file {} does not exist", absPath)
            message.fail(404, "user ${userId} does not exist")
        }
    })
})
What I am trying to achieve is to read the file like I did above and deserialise the JSON into a POJO (e.g. a List<Records>). This seems much more convenient than working with Vert.x's JsonObject. I don't want to manipulate the JsonObject instance.
First of all, your approach using EventBus is fine, in my opinion. It may be a bit slower, because EventBus will serialize/deserialize your objects, but it gives you a very good decoupling.
An example of another approach can be seen here:
https://github.com/aesteve/vertx-feeds/blob/master/src/main/java/io/vertx/examples/feeds/dao/RedisDAO.java
Note how every method receives handler as its last argument:
public void getMaxDate(String feedHash, Handler<Date> handler) {
More coupled, but also more efficient.
And for a more classic and straightforward approach, you can see the official examples:
https://github.com/aokolnychyi/vertx-example/blob/master/src/main/java/com/aokolnychyi/vertx/example/dao/MongoDbTodoDaoImpl.java
You can see that here DAO is pretty much synchronous, but since the handlers are still async, it's fine anyway.
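The "handler as the last argument" style mentioned above can be sketched without any framework. The following is a plain-Java illustration, not Vert.x code: `RecordFileDao`, `readRecords`, and the `<userId>.json` layout (borrowed from the question) are assumptions, and `Files.readString` requires Java 11 or later. A Vert.x implementation would presumably use `vertx.fileSystem().readFile` instead of a thread-pool read.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;
import java.util.function.BiConsumer;

// Hypothetical DAO sketch in plain Java; it does not use the Vert.x API.
final class RecordFileDao {

    private final Path baseDir;

    RecordFileDao(Path baseDir) {
        this.baseDir = baseDir;
    }

    // Reads <baseDir>/<userId>.json off the calling thread and passes either
    // the file content or the error (possibly wrapped) to the handler, which
    // is the last argument. The returned future completes once the handler
    // has been invoked.
    CompletableFuture<Void> readRecords(String userId, BiConsumer<String, Throwable> handler) {
        return CompletableFuture
                .supplyAsync(() -> {
                    try {
                        return Files.readString(baseDir.resolve(userId + ".json"));
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                })
                .<Void>handle((content, error) -> {
                    handler.accept(content, error);
                    return null;
                });
    }
}
```

The handler receives the raw JSON text; turning it into a list of POJOs is left to whichever JSON mapper the application already uses.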
I guess the following link will help you out and this is a good example of Vertx crud operations.
Vertx student crud operations using hikari

vertX eventBus consumer listens to all addresses

I'd like to write a catch all eventBus consumer. Is this possible?
eB = vertx.eventBus();
MessageConsumer<JsonObject> consumer = eB.consumer("*"); // What is catch all address ???
consumer.handler(message -> {
Log.info("Received: " + message.body().toString());
});
A solution to your problem might be an interceptor.
vertx.eventBus().addInterceptor( message -> {
System.out.println("LOG: " + message.message().body().toString());
});
This handler will write every message that comes to the event-bus in vertx.
Reference is here:
http://vertx.io/docs/apidocs/io/vertx/rxjava/core/eventbus/EventBus.html#addInterceptor-io.vertx.core.Handler-
Also, version of vertx-core that I'm using is 3.3.2, I think interceptor functionality is not available in older versions (e.g. 3.0.0).
Having looked through the Java code, I don't think this is possible.
Vert.x stores event bus consumers in a MultiMap looking like:
AsyncMultiMap<String, ServerID>
where the String key is the consumer address.
And as you'd guess, Vert.x just does a map.get(address) to find out the relevant consumers.
Update after OP comment
While I think your use case is valid, I think you're going to have to roll something yourself.
As far as I can see, Vert.x doesn't store consumers of send and publish separately. It's all in one MultiMap. So it would be inadvisable to try to register consumers for all events.
If someone does an eventBus.send(), and Vert.x selects your auditing consumer, it will be the only consumer receiving the event, and I'm going to guess that's not what you want.
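To make the point about send concrete, here is a toy model of an address-keyed bus in plain Java. It is not Vert.x code: `ToyBus`, its round-robin choice, and its interceptor list are simplifications invented for illustration. It shows why an extra consumer registered on an address would take some point-to-point messages away from the real consumer, while an interceptor observes every message without consuming any.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy model only; names and behaviour are simplified assumptions, not Vert.x.
final class ToyBus {

    private final Map<String, List<Consumer<String>>> consumers = new HashMap<>();
    private final Map<String, Integer> sendCount = new HashMap<>();
    private final List<Consumer<String>> interceptors = new ArrayList<>();

    void consumer(String address, Consumer<String> handler) {
        consumers.computeIfAbsent(address, a -> new ArrayList<>()).add(handler);
    }

    void addInterceptor(Consumer<String> interceptor) {
        interceptors.add(interceptor);
    }

    // Point-to-point: exactly one consumer on the address gets the message,
    // chosen round-robin by an exact-match lookup on the address.
    void send(String address, String body) {
        interceptors.forEach(i -> i.accept(body));
        List<Consumer<String>> handlers = consumers.getOrDefault(address, List.of());
        if (handlers.isEmpty()) {
            return;
        }
        int index = sendCount.merge(address, 1, Integer::sum) - 1;
        handlers.get(index % handlers.size()).accept(body);
    }

    // Publish: every consumer registered on the exact address receives it.
    void publish(String address, String body) {
        interceptors.forEach(i -> i.accept(body));
        consumers.getOrDefault(address, List.of()).forEach(h -> h.accept(body));
    }
}
```

Registering a worker and an auditing consumer on the same address and calling send twice delivers one message to each, so the worker misses half of its traffic; the interceptor sees both.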
I don't know if that's possible, but according to the documentation you can attach a listener to the bridge events to know when a publish, send, open_socket, or close_socket is invoked:
sockJSHandler.bridge(options, be -> {
if (be.type() == BridgeEvent.Type.PUBLISH || be.type() == BridgeEvent.Type.RECEIVE) {
Log.info("Received: " + message.body().toString());
}
be.complete(true);
});