Kafka transactions in the case of multi-threading - apache-kafka

I am trying to create a Kafka producer that writes within a transaction, i.e. I want to write a group of messages and, if any one of them fails, roll back all of them.
kafkaProducer.beginTransaction();
try
{
    // code to produce to kafka topic
}
catch(Exception e)
{
    kafkaProducer.abortTransaction();
}
kafkaProducer.commitTransaction();
The problem is that the above works just fine for a single thread, but when multiple threads write it throws the exception
Invalid transaction attempted from state IN_TRANSITION to IN_TRANSITION
While debugging I found that if thread1's transaction is in progress and thread2 also calls beginTransaction, it throws this exception. What I can't figure out is how to solve this issue. One possible solution I could find is creating a pool of producers.
Is there an already available API for a Kafka producer pool, or will I have to create my own?
Below is the improvement JIRA already reported for this:
https://issues.apache.org/jira/browse/KAFKA-6278
Any other suggestions would be really helpful.

You can only have a single transaction in progress at a time with a producer instance.
If you have multiple threads doing separate processing and they all need exactly once semantics, you should have a producer instance per thread.
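For illustration, here is a minimal sketch of the producer-per-thread idea using a ThreadLocal; the transactional.id scheme, broker address and serializers are assumptions, not part of the original answer:

import java.util.Properties;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PerThreadTransactionalProducer {

    private static final AtomicInteger COUNTER = new AtomicInteger();

    // Each thread lazily creates its own producer with a unique transactional.id,
    // so transactions running on different threads never share producer state.
    private static final ThreadLocal<KafkaProducer<String, String>> PRODUCER =
            ThreadLocal.withInitial(() -> {
                Properties props = new Properties();
                props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
                props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                        "org.apache.kafka.common.serialization.StringSerializer");
                props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                        "org.apache.kafka.common.serialization.StringSerializer");
                props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG,
                        "tx-" + COUNTER.getAndIncrement()); // unique per producer instance
                KafkaProducer<String, String> producer = new KafkaProducer<>(props);
                producer.initTransactions();
                return producer;
            });

    public static void sendInTransaction(Iterable<ProducerRecord<String, String>> records) {
        KafkaProducer<String, String> producer = PRODUCER.get();
        producer.beginTransaction();
        try {
            for (ProducerRecord<String, String> record : records) {
                producer.send(record);
            }
            producer.commitTransaction();
        } catch (Exception e) {
            producer.abortTransaction();
            throw new RuntimeException("Transaction aborted", e);
        }
    }
}

If the threads come from a long-lived pool, this keeps the number of producers bounded by the pool size; each producer should be closed when its thread is retired.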

Not sure if this was resolved.
You can use Apache Commons Pool2 to create a producer instance pool.
In the create() method of the factory implementation you can generate and assign a unique transactional.id to avoid a conflict (ProducerFencedException).
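For illustration, a rough sketch of such a factory built on commons-pool2's BasePooledObjectFactory; the transactional.id prefix and broker address are assumptions:

import java.util.Properties;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.commons.pool2.BasePooledObjectFactory;
import org.apache.commons.pool2.PooledObject;
import org.apache.commons.pool2.impl.DefaultPooledObject;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;

public class TransactionalProducerFactory extends BasePooledObjectFactory<KafkaProducer<String, String>> {

    private final AtomicInteger counter = new AtomicInteger();

    @Override
    public KafkaProducer<String, String> create() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        // A unique transactional.id per pooled instance avoids the ProducerFencedException conflict.
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "pooled-tx-" + counter.getAndIncrement());
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.initTransactions();
        return producer;
    }

    @Override
    public PooledObject<KafkaProducer<String, String>> wrap(KafkaProducer<String, String> producer) {
        return new DefaultPooledObject<>(producer);
    }

    @Override
    public void destroyObject(PooledObject<KafkaProducer<String, String>> p) {
        p.getObject().close(); // close the producer when the pool evicts it
    }
}

Callers would then borrowObject() from a GenericObjectPool<KafkaProducer<String, String>> built on this factory, run beginTransaction()/commitTransaction() on the borrowed instance, and returnObject() it when done.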

Related

InvalidPidMappingException causes kafka stream application to shut down

I have this application written in Kafka Streams. Every now and then it throws an InvalidPidMappingException.
Caused by: org.apache.kafka.common.KafkaException: org.apache.kafka.common.errors.InvalidPidMappingException: The producer attempted to use a producer id which is not currently assigned to its transactional id.
And I have this snippet of code which sets the Uncaught Exception Handler for the app
streams.setUncaughtExceptionHandler(
    (Thread thread, Throwable exp) -> {
        log.error("Unhandled exception in thread with name ", exp);
        SpringApplication.exit(applicationContext, () -> 1);
    }
);
I understand that this exception occurs when the coordinator expires the producer's transaction id after it has not received any transaction status updates from it.
I have a few questions regarding this exception:
I thought that after this exception, the producer would retry to sync its transaction id with the coordinator and resume without causing the Kafka Streams thread to be killed. Even if I change the above code snippet to not exit the Spring application on InvalidPidMappingException, it still kills the stream thread. Is there a way to avoid the death of the streams thread on InvalidPidMappingException? I have seen the desired behaviour when there is an UnknownProducerIdException. Or am I missing something here?
Other than transactional id expiration, can there be any other reason for this exception to occur?
Why is InvalidPidMappingException handled differently than UnknownProducerIdException? The former kills the stream thread and the latter recovers just fine.
I am using the following versions of the libraries:
spring-kafka-version = '2.5.5.RELEASE'
apache-kafka-clientVersion = '2.5.1'
confluent-version = '5.4.2'
A few months late, but the discussion on this Apache Kafka JIRA is helpful.
To summarise, with v2.8.0 and above of the Apache Kafka Streams library you can use the new setUncaughtExceptionHandler() method in org.apache.kafka.streams.KafkaStreams to handle any uncaught exceptions and keep the stream running. Returning org.apache.kafka.streams.errors.StreamsUncaughtExceptionHandler.StreamThreadExceptionResponse.REPLACE_THREAD terminates the current thread and creates a new one for future processing.
e.g.
kafkaStreams.setUncaughtExceptionHandler(e -> {
    return StreamsUncaughtExceptionHandler.StreamThreadExceptionResponse.REPLACE_THREAD;
});
Javadocs for KafkaStreams::setUncaughtExceptionHandler method added in 2.8.0

Access kafka producer factory from spring-cloud-stream

I have a project using spring-boot-cloud and apache-kafka, and I have a list of integration tests covering the topology logic, thanks to the EmbeddedBroker.
I recently discovered that there is a lot of noise in the log when running these tests.
e.g. [Producer clientId=producer-2] Connection to node 0 (localhost/127.0.0.1:63267) could not be established. Broker may not be available.
After some trial and error it appears that these were the producers created by the spring-cloud-stream bindings. Somehow, with @DirtiesContext(classMode = DirtiesContext.ClassMode.AFTER_EACH_TEST_METHOD) at the class level, they do not appear to be cleaned up after each test.
Thus I figured that if I can get access to the producer factory I can then manually clean them up inside the @AfterEach clause of my test class. I've tried to autowire KafkaTemplate but it didn't help. I don't know how I can access the producer factory since it's created implicitly by the framework.
Please note that these do not appear to affect the test results, since they only show up at the end of the test phase.
Thanks in advance!
You can add a ProducerMessageHandlerCustomizer bean and get a reference to the producer factory that way.
@Bean
ProducerMessageHandlerCustomizer<KafkaProducerMessageHandler> cust() {
    return (handler, dest) -> {
        this.pfMap.put(dest, handler.getKafkaTemplate().getProducerFactory());
    };
}
Store the PF in a map in the test case, then reset() it when you want to close the producer(s).
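A rough sketch of how the test class might hold that map and clean up after each test; the field name and wiring here are assumptions, not from the original answer:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.junit.jupiter.api.AfterEach;
import org.springframework.kafka.core.ProducerFactory;

// inside the test class, populated by the customizer bean shown above
private final Map<String, ProducerFactory<?, ?>> pfMap = new ConcurrentHashMap<>();

@AfterEach
void closeProducers() {
    // reset() closes the producer(s) the factory has created, so the bindings' producers
    // stop trying to reconnect once the embedded broker from the dirtied context is gone.
    this.pfMap.values().forEach(ProducerFactory::reset);
}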

Kafka - Different KafkaProducers in same transaction

For migration purposes, we need to produce records using two different serialisers, and thus two different KafkaProducers (one String and one Avro), in the same transaction.
But all the transaction stuff is done through one KafkaProducer instance, as follows:
kafkaProducer.beginTransaction();
...
kafkaProducer.send(record);
...
kafkaProducer.commitTransaction();
Can I use a second kafkaProducer (with the second serializer), use the same transactional.id, and do it like this:
kafkaProducer.beginTransaction();
...
kafkaProducer.send(record);
kafkaProducer2.send(record);
...
kafkaProducer.commitTransaction();
Will all records be part of the same transaction, all consistent?
EDIT 1:
According to what I saw in the Java implementation, there is some mechanism when calling commitTransaction(), like calling flush() on the producer itself, so I think the model above won't work.
Any chance of achieving this without instantiating a full new instance of everything in parallel ?
You can only have a single producer active in a transaction at a time.
If you start 2 producers with the same transactional.id, one of them will be fenced and won't be able to commit its records, so all records won't be part of the same transaction.
You need to use a single producer. One possible workaround is to configure it to use the BytesSerializer and handle the conversion of your objects to bytes explicitly in your logic.
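For illustration, a minimal sketch of that workaround; topic names, broker address, transactional.id and the Avro serializer wiring are assumptions:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.BytesSerializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.common.utils.Bytes;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "migration-tx-1");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, BytesSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, BytesSerializer.class.getName());

KafkaProducer<Bytes, Bytes> producer = new KafkaProducer<>(props);
producer.initTransactions();

// Serialize each record explicitly with the serializer it needs,
// then send the raw bytes through the single transactional producer.
StringSerializer stringSerializer = new StringSerializer();
// Serializer<MyAvroRecord> avroSerializer = ...; // e.g. Confluent's KafkaAvroSerializer (assumption)

producer.beginTransaction();
try {
    byte[] stringValue = stringSerializer.serialize("string-topic", "some value");
    producer.send(new ProducerRecord<>("string-topic", null, Bytes.wrap(stringValue)));

    // byte[] avroValue = avroSerializer.serialize("avro-topic", avroRecord);
    // producer.send(new ProducerRecord<>("avro-topic", null, Bytes.wrap(avroValue)));

    producer.commitTransaction();
} catch (Exception e) {
    producer.abortTransaction();
    throw new RuntimeException(e);
}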

How to implement a microservice Event Driven architecture with Spring Cloud Stream Kafka and Database per service

I am trying to implement an event driven architecture to handle distributed transactions. Each service has its own database and uses Kafka to send messages to inform other microservices about the operations.
An example:
Order service -------> | Kafka | -------> Payment Service
      |                                         |
Orders MariaDB DB                    Payment MariaDB Database
The Order Service receives an order request. It has to store the new order in its DB and publish a message so that the Payment Service realizes it has to charge for the item:
private OrderBusiness orderBusiness;

@PostMapping
public Order createOrder(@RequestBody Order order){
    logger.debug("createOrder()");
    //a.- Save the order in the DB
    orderBusiness.createOrder(order);
    //b.- Publish in the topic so that Payment Service charges for the item.
    try{
        orderSource.output().send(MessageBuilder.withPayload(order).build());
    }catch(Exception e){
        logger.error("{}", e);
    }
    return order;
}
These are my doubts:
Steps a.- (save in Order DB) and b.- (publish the message) should be performed in a transaction, atomically. How can I achieve that?
This is related to the previous one: I send the message with orderSource.output().send(MessageBuilder.withPayload(order).build()). This operation is asynchronous and ALWAYS returns true, no matter if the Kafka broker is down. How can I know that the message has reached the Kafka broker?
Steps a.- (save in Order DB) and b.- (publish the message) should be performed in a transaction, atomically. How can I achieve that?
Kafka currently does not support transactions (and thus also no rollback or commit), which you'd need to synchronize something like this. So in short: you can't do what you want to do. This will change in the near-ish future, when KIP-98 is merged, but that might take some time yet. Also, even with transactions in Kafka, an atomic transaction across two systems is a very hard thing to do; everything that follows will only be improved upon by transactional support in Kafka, and it will still not entirely solve your issue. For that you would need to look into implementing some form of two-phase commit across your systems.
You can get somewhat close by configuring producer properties, but in the end you will have to choose between at least once or at most once for one of your systems (MariaDB or Kafka).
Let's start with what you can do in Kafka to ensure delivery of a message, and further down we'll dive into your options for the overall process flow and what the consequences are.
Guaranteed delivery
You can configure how many brokers have to confirm receipt of your messages before the request is returned to you with the parameter acks: by setting this to all you tell the broker to wait until all replicas have acknowledged your message before returning an answer to you. This is still no 100% guarantee that your message will not be lost, since it has only been written to the page cache, and there are theoretical scenarios, with a broker failing before it is persisted to disk, where the message might still be lost. But this is as good a guarantee as you are going to get.
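For illustration, a producer configured along those lines might look like the following (broker address, retries value and serializers are assumptions):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.ACKS_CONFIG, "all");                // wait for all in-sync replicas to acknowledge
props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE); // keep retrying transient send failures
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
Producer<String, String> producer = new KafkaProducer<>(props);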
You can further reduce the risk of data loss by lowering the interval at which brokers force an fsync to disk (flush.messages and/or flush.ms), but please be aware that these values can bring with them heavy performance penalties.
In addition to these settings you will need to wait for your Kafka producer to return the response for your request to you and check whether an exception occurred. This sort of ties into the second part of your question, so I will go into that further down.
If the response is clean, you can be as sure as possible that your data got to Kafka and start worrying about MariaDB.
Everything we have covered so far only addresses how to ensure that Kafka got your messages, but you also need to write data into MariaDB, and this can fail as well, which would make it necessary to recall a message you potentially already sent to Kafka - and this you can't do.
So basically you need to choose one system in which you are better able to deal with duplicates/missing values (depending on whether or not you resend partial failures) and that will influence the order you do things in.
Option 1
In this option you initialize a transaction in MariaDB, then send the message to Kafka, wait for a response and if the send was successful you commit the transaction in MariaDB. Should sending to Kafka fail, you can rollback your transaction in MariaDB and everything is dandy.
If however, sending to Kafka is successful and your commit to MariaDB fails for some reason, then there is no way of getting back the message from Kafka. So you will either be missing a message in MariaDB or have a duplicate message in Kafka, if you resend everything later on.
Option 2
This is pretty much just the other way around, but you are probably better able to delete a message that was written in MariaDB, depending on your data model.
Of course you can mitigate both approaches by keeping track of failed sends and retrying just these later on, but all of that is more of a bandaid on the bigger issue.
Personally I'd go with approach 1, since the chance of a commit failing should be somewhat smaller than that of the send itself, and implement some sort of dupe check on the other side of Kafka.
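To make Option 1 concrete, here is a rough sketch using plain JDBC and a synchronous Kafka send; the DAO helper, JSON conversion, topic name and timeout are illustrative assumptions:

import java.sql.Connection;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.producer.ProducerRecord;

// inside a method that declares "throws Exception" (assumption)
Connection conn = dataSource.getConnection();
try {
    conn.setAutoCommit(false);
    insertOrder(conn, order); // hypothetical DAO call writing the order inside the open DB transaction

    // Block on the send so a broker failure surfaces as an exception before MariaDB is committed.
    producer.send(new ProducerRecord<>("orders", order.getId(), toJson(order)))
            .get(10, TimeUnit.SECONDS);

    conn.commit();            // only commit MariaDB once Kafka has confirmed the write
} catch (Exception e) {
    conn.rollback();          // the send (or the DB write) failed: undo the MariaDB work
    throw e;
} finally {
    conn.close();
}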
This is related to the previous one: I send the message with: orderSource.output().send(MessageBuilder.withPayload(order).build()); This operation is asynchronous and ALWAYS returns true, no matter if the Kafka broker is down. How can I know that the message has reached the Kafka broker?
Now first off, I'll admit I am unfamiliar with Spring, so this may not be of use to you, but the following code snippet illustrates one way of checking produce responses for exceptions.
By calling flush you block until all sends have finished (and either failed or succeeded) and then check the results.
Producer<String, String> producer = new KafkaProducer<>(myConfig);
final ArrayList<Exception> exceptionList = new ArrayList<>();

for (MessageType message : messages) {
    producer.send(new ProducerRecord<String, String>("myTopic", message.getKey(), message.getValue()), new Callback() {
        @Override
        public void onCompletion(RecordMetadata metadata, Exception exception) {
            if (exception != null) {
                exceptionList.add(exception);
            }
        }
    });
}

producer.flush();

if (!exceptionList.isEmpty()) {
    // do stuff
}
I think the proper way of implementing Event Sourcing is to have Kafka filled directly with events pushed by a plugin that reads from the RDBMS binlog, e.g. using Confluent Bottled Water (https://www.confluent.io/blog/bottled-water-real-time-integration-of-postgresql-and-kafka/) or the more active Debezium (http://debezium.io/). Then the consuming microservices can listen to those events, consume them and act on their respective databases, being eventually consistent with the RDBMS database.
Have a look at my full answer here for a guideline:
https://stackoverflow.com/a/43607887/986160

How can a kafka consumer doing infinite retries recover from a bad incoming message?

I am a Kafka newbie and, as I was reading the docs, I had this design question related to the Kafka consumer.
A kafka consumer reads messages from the kafka stream which is made up
of one or more partitions from one or more servers.
Let's say one of the incoming messages is corrupt and as a result the consumer fails to process it. But when processing event logs you don't want to drop any events, so you do infinite retries to avoid transient errors during processing. In such cases of infinite retries, how can the consumer move forward? Is there a way to blacklist this message for the next retry?
I'd think it needs manual intervention, where we log some message metadata (don't know what exactly yet) to look at which message is failing, and have logic in place where each consumer checks Redis (or someplace else?) after n retries to see if this message needs to be skipped. The blacklist doesn't have to be stored in Redis forever either, only until the consumer can skip it. Here's pseudocode for what I just described:
while (errorState) {
    if (msg in blacklist) {
        // skip
        commitOffset()
    } else {
        errorState = processMessage(msg);
        if (!errorState) {
            commitOffset();
        } else {
            // log this msg so that we can add it to the blacklist
            logger.info(msg)
        }
    }
}
I'd like to hear from more experienced folks to see if there are better ways to do this.
We had a requirement in our project where the processing of an incoming message to update a record was dependent on the record being present. Due to a race condition, sometimes the update arrived before the insert. In such cases, we implemented a couple of approaches.
A. Manual retry with a predefined delay. The code checks if the insert has arrived. If so, processing goes as normal. Otherwise, it would sleep for 500ms, then try again. This would repeat 10 times. At the end, if the message is still not processed, the code logs the message, commits the offset and moves forward. The processing of message is always done in a thread from a pool, so it doesn't block the main thread either. However, in the worst case each message would take 5 seconds of application time.
B. Recently, we refined the above solution to use a message scheduler based on Kafka. So now, if the insert has not arrived before the update, the system sends it to a separate scheduler which operates on Kafka. This scheduler replays the message after some time. After 3 retries, we again log the message and stop scheduling or retrying. This gives us the benefit of not blocking the application threads and of managing when we would like to replay the message.
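For illustration, a rough sketch of approach A run on a worker thread from a pool; the retry budget, delay and helper methods (recordExists, applyUpdate) are assumptions rather than the project's actual code:

// submitted to an ExecutorService so the consumer's polling thread is never blocked
Runnable retryTask = () -> {
    int attempts = 0;
    while (attempts < 10) {                        // predefined retry budget
        if (recordExists(message.getKey())) {      // hypothetical check that the insert has arrived
            applyUpdate(message);                  // hypothetical processing of the update
            return;
        }
        attempts++;
        try {
            Thread.sleep(500);                     // predefined delay between attempts
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
    }
    // After the retries are exhausted, log the message and move on; the offset is committed
    // elsewhere so the consumer is not stuck on this record forever.
    logger.warn("Giving up on message {}", message);
};
executor.submit(retryTask);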