Transactions in Spring Cloud Stream with database operations - spring-data-jpa

I use Spring Cloud Stream with Kafka Streams to do some processing on a stream, save some intermediate data to the DB in the middle of the topology, then continue processing in the topology and finally send the result to another topic.
My Function bean looks something like this:
#Component("handler")
public Class Handler implements Function<KStrean<Key1,Value1>,KStream<Key2,Valu2>> {
KStream<Key2,Value2> apply(KStream<Key1,Value1> input){
//start process topology and make few transformation
input.mapValues(...).filter...(...)
//now save the the intermediate result to DB via Spring data JPA and Hibernate
someEntityJpaRepo.save()
//continue with the remaning topology
Where should I use the @Transactional annotation if I want all-or-nothing behavior like in an RDBMS, and which PlatformTransactionManager should I use? JPA? Kafka?
For example, in the case of an exception in the DB query or an exception in the stream processing, I want the DB transaction to roll back as well as the Kafka transaction.

KafkaTransactionManager is NOT for KStreams - Kafka Streams manages its own transactions.
Since Spring is not involved with Kafka Streams at runtime (only for topology setup), there is no concept of transaction synchronization between the two.
The JPA transaction will be independent of the Kafka Streams transaction, and you must deal with the possibility of duplicate deliveries (make the DB update idempotent, or use some other technique to detect that this Kafka record has already been stored in the DB).
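Since the two transactions cannot be synchronized, one common approach is the idempotent DB write the answer suggests. Below is a minimal sketch of that idea, assuming a hypothetical entity, repository and a business key derived deterministically from the record; none of these names come from the question.

import javax.persistence.Entity;
import javax.persistence.Id;

import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Entity
class IntermediateResult {

    @Id
    private String businessKey;   // deterministic key derived from the Kafka record
    private String payload;

    protected IntermediateResult() { }   // required by JPA

    IntermediateResult(String businessKey, String payload) {
        this.businessKey = businessKey;
        this.payload = payload;
    }
}

interface IntermediateResultRepo extends JpaRepository<IntermediateResult, String> {
}

@Service
class IntermediateResultWriter {

    private final IntermediateResultRepo repo;

    IntermediateResultWriter(IntermediateResultRepo repo) {
        this.repo = repo;
    }

    // This JPA transaction commits independently of the Kafka Streams transaction,
    // so it must tolerate replays: if the row already exists, nothing is written.
    @Transactional
    public void saveIfAbsent(String businessKey, String payload) {
        if (!repo.existsById(businessKey)) {
            repo.save(new IntermediateResult(businessKey, payload));
        }
    }
}

The topology can then call saveIfAbsent() from a peek() or mapValues() step; deriving the business key deterministically from the record is what makes a redelivery harmless.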

Related

SAGA and local transactions with Kafka and Postgres in Spring Boot

I haven't worked with SAGAs and spring-kafka (and spring-cloud-stream-kafka-binder) for a while.
Context: there are several (3+) Spring Boot microservices that have to span a business transaction in order to keep data in an eventually consistent state. They use the Database-per-Service approach (each service stores data in Postgres) and collaborate via Kafka as an event store.
I'm going to apply the SAGA pattern (either the choreography or the orchestration approach; let's stick to the first one) to manage transactions across multiple services.
The question is: how do I support local transactions when using an RDBMS (Postgres) as the data store along with Kafka as the event store/messaging middleware?
Nowadays, does spring-kafka actually support JTA transactions, and would it be enough to wrap the RDBMS and Kafka producer operations in @Transactional methods? Or do we still have to apply one of the transactional microservices patterns (like Transactional Outbox, Transaction Log Tailing or Polling Publisher)?
Thanks in advance
Kafka does not support JTA/XA. The best you can do is "Best Effort 1PC" - see Dave Syer's JavaWorld article; you have to handle possible duplicates.
Spring for Apache Kafka provides the ChainedKafkaTransactionManager; for consumer-initiated transactions, the CKTM should be injected into the listener container.
The CKTM should have the KTM first, followed by the RDBMS, so the RDBMS transaction will be committed first; if it fails, the Kafka tx will roll back and the record redelivered. If the DB succeeds but Kafka fails, the record will be redelivered (with default configuration).
For producer-only transactions, you can use @Transactional. In that case, the TM can just be the RDBMS TM and Spring Kafka will synchronize a local Kafka transaction, committing last.
See here for more information.
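To make the consumer-initiated case concrete, here is a minimal sketch of that wiring, assuming Spring Boot's auto-configured JpaTransactionManager and illustrative bean names and generic types (note that ChainedKafkaTransactionManager has since been deprecated in newer spring-kafka releases):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.ProducerFactory;
import org.springframework.kafka.transaction.ChainedKafkaTransactionManager;
import org.springframework.kafka.transaction.KafkaTransactionManager;
import org.springframework.orm.jpa.JpaTransactionManager;

@Configuration
public class KafkaTxConfig {

    @Bean
    public KafkaTransactionManager<String, String> kafkaTransactionManager(
            ProducerFactory<String, String> producerFactory) {
        return new KafkaTransactionManager<>(producerFactory);
    }

    // Kafka TM first, DB TM second: commits happen in reverse order, so the DB
    // commits first; if it fails, the Kafka transaction rolls back and the
    // record is redelivered.
    @Bean
    public ChainedKafkaTransactionManager<String, String> chainedTm(
            KafkaTransactionManager<String, String> kafkaTm,
            JpaTransactionManager jpaTm) {
        return new ChainedKafkaTransactionManager<>(kafkaTm, jpaTm);
    }

    // Inject the chained TM into the listener container so that consumer-initiated
    // transactions span both Kafka and the RDBMS.
    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
            ConsumerFactory<String, String> consumerFactory,
            ChainedKafkaTransactionManager<String, String> chainedTm) {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        factory.getContainerProperties().setTransactionManager(chainedTm);
        return factory;
    }
}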

Using Apache Kafka to maintain data integrity across databases in microservices architecture

Has anyone used Apache Kafka to maintain data integrity across a microservice architecture in which each service has its own database? I have been searching around and there were some posts that mentioned using Kafka, but I'm looking for more details, such as how Kafka was used. Do you have to write code for the producer and consumer (say, with the Customer database as the producer and the Orders database as the consumer, so that if a Customer is deleted in the Customer database, the Orders database somehow needs to know that and delete all Orders for that Customer as well)?
Yes, you'll need to write that processing code.
For example, one database would be connected to a CDC reader that emits all changes to a stream (the producer), which could be fed into a KTable or a custom consumer that writes upserts/deletes into a local cache of another service. I say it ought to be a cache rather than a database because when the service restarts you potentially miss some events, or duplicate others, so the source of the materialized view should ideally be Kafka itself (via a compacted topic).
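A minimal sketch of the "custom consumer" variant, assuming a hypothetical compacted customer-changes topic fed by the CDC reader; the topic name, group id and value types are illustrative:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class CustomerCache {

    // Local in-memory cache acting as the materialized view of the other service's data.
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    @KafkaListener(topics = "customer-changes", groupId = "orders-service")
    public void onChange(ConsumerRecord<String, String> record) {
        if (record.value() == null) {
            // tombstone: the customer was deleted upstream, so remove it from the local view
            cache.remove(record.key());
        } else {
            // upsert the latest version of the customer
            cache.put(record.key(), record.value());
        }
    }

    public String find(String customerId) {
        return cache.get(customerId);
    }
}

Because the cache is in memory, a restart would need to re-read the compacted topic from the beginning (or use a KTable with a state store) to rebuild the view, which is exactly why Kafka itself should remain the source of truth.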

Streaming database data to Kafka topic without using a connector

I have a use case where I have to push all my MySQL database data to a Kafka topic. Now, I know I can get this up and running using a Kafka connector, but I want to understand how it all works internally without using a connector. In my Spring Boot project I have already created a Kafka producer class where I set all my configuration, create a ProducerRecord, and so on.
Has anyone tried this approach before? Can anyone throw some light on this?
Create entities with Spring Data JPA for the tables and send the data to the topic using findAll. Use a scheduler to fetch the data and send it to the topic. You can add your own logic for fetching from the DB and separate logic for sending to the Kafka topic, for example fetching by auto-increment id, by last-updated timestamp, or in bulk. The same logic as the JDBC connector can be implemented.
Kafka Connect will do it in an optimized way.
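A minimal sketch of the scheduler-based, last-updated-timestamp approach described above; the Customer entity, repository, topic name and polling interval are all illustrative, and @EnableScheduling plus a suitable value serializer for the KafkaTemplate are assumed to be configured elsewhere:

import java.time.Instant;
import java.util.List;

import javax.persistence.Entity;
import javax.persistence.Id;

import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Entity
class Customer {

    @Id
    private Long id;
    private String name;
    private Instant updatedAt;

    protected Customer() { }   // required by JPA

    public Long getId() { return id; }
    public Instant getUpdatedAt() { return updatedAt; }
}

interface CustomerRepository extends JpaRepository<Customer, Long> {
    List<Customer> findByUpdatedAtAfter(Instant since);
}

@Component
class CustomerPollingPublisher {

    private final CustomerRepository repository;
    private final KafkaTemplate<String, Customer> kafkaTemplate;

    // In-memory watermark; a production version would persist it so a restart
    // does not re-send or skip rows.
    private Instant lastPolled = Instant.EPOCH;

    CustomerPollingPublisher(CustomerRepository repository,
                             KafkaTemplate<String, Customer> kafkaTemplate) {
        this.repository = repository;
        this.kafkaTemplate = kafkaTemplate;
    }

    @Scheduled(fixedDelay = 5000)
    public void publishChanges() {
        Instant now = Instant.now();
        List<Customer> changed = repository.findByUpdatedAtAfter(lastPolled);
        for (Customer customer : changed) {
            // key by primary key so all versions of a row land in the same partition
            kafkaTemplate.send("customers", customer.getId().toString(), customer);
        }
        lastPolled = now;
    }
}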

How does Spring Kafka/Spring Cloud Stream guarantee the transactionality / atomicity involving a Database and Kafka?

Spring Kafka, and thus Spring Cloud Stream, allows us to create transactional producers and processors. We can see that functionality in action in one of the sample projects: https://github.com/spring-cloud/spring-cloud-stream-samples/tree/master/transaction-kafka-samples:
@Transactional
@StreamListener(Processor.INPUT)
@SendTo(Processor.OUTPUT)
public PersonEvent process(PersonEvent data) {
    logger.info("Received event={}", data);
    Person person = new Person();
    person.setName(data.getName());
    if (shouldFail.get()) {
        shouldFail.set(false);
        throw new RuntimeException("Simulated network error");
    } else {
        //We fail every other request as a test
        shouldFail.set(true);
    }
    logger.info("Saving person={}", person);
    Person savedPerson = repository.save(person);
    PersonEvent event = new PersonEvent();
    event.setName(savedPerson.getName());
    event.setType("PersonSaved");
    logger.info("Sent event={}", event);
    return event;
}
In this excerpt, there's a read from a Kafka topic, a write to a database and another write to another Kafka topic, all of this transactionally.
What I wonder, and would like to have answered, is how that is technically achieved and implemented.
Since the datasource and Kafka don't participate in an XA transaction (two-phase commit), how does the implementation guarantee that a local transaction can read from Kafka, commit to a database and write to Kafka, all of this transactionally?
There is no guarantee, only within Kafka itself.
Spring provides transaction synchronization so the commits are close together, but it is possible for the DB to commit and Kafka not to. So you have to deal with the possibility of duplicates.
The correct way to do this, when using spring-kafka directly, is NOT with @Transactional but to use a ChainedKafkaTransactionManager in the listener container.
See Transaction Synchronization.
Also see Distributed transactions in Spring, with and without XA and the "Best Efforts 1PC pattern" for background.
However, with Stream, there is no support for the chained transaction manager, so @Transactional is required (with the DB transaction manager). This will provide similar results to the chained tx manager, with the DB committing first, just before Kafka.
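For reference, the sample linked above enables producer transactions in the Kafka binder via configuration; a minimal sketch of that property (the prefix value is arbitrary), with @Transactional then using Boot's default DB transaction manager:

# application.properties (sketch)
# Enables transactional producers in the Kafka binder so the listener's send
# is synchronized with the @Transactional DB transaction and commits last.
spring.cloud.stream.kafka.binder.transaction.transaction-id-prefix=tx-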

How to query the event repository in a microservice Event Sourcing architecture with Spring Cloud Stream Kafka

CLARIFICATION: Note that this question is different from this one: How to implement a microservice Event Driven architecture with Spring Cloud Stream Kafka and Database per service
This one is about using Kafka as the only repository (of events), no DB needed. The other one is about using a database (MariaDB) per service + Kafka.
I would like to implement an Event Sourcing architecture to handle distributed transactions:
OrdersService <------------> | Kafka Event Store | <------------> PaymentsService
               subscribe/find                       subscribe/find
OrdersService receives an order request and stores the new Order in the broker.
private OrderBusiness orderBusiness;

@PostMapping
public Order createOrder(@RequestBody Order order) {
    logger.debug("createOrder()");
    //do whatever
    //Publish the new Order with state = pending
    order.setState(PENDING);
    try {
        orderSource.output().send(MessageBuilder.withPayload(order).build());
    } catch (Exception e) {
        logger.error("{}", e);
    }
    return order;
}
This is my main doubt: how can I query the Kafka broker? Imagine I want to search for orders by user, date, state, etc.
Short answer: you cannot query the broker, but you can exploit Kafka's Streams API and "Interactive Queries".
Long answer: the access patterns for reading Kafka topics are linear scans, not random lookups. Of course, you can also reposition at any time via seek(), but only by offset or time. Also, topics are sharded into partitions, and data is (by default) hash-partitioned by key (the data model is key-value pairs), so there is a notion of a key.
However, you can use Kafka's Streams API, which allows you to build an app that holds the current state -- based on the Kafka topics that are the ground truth -- as a materialized view (basically a cache). "Interactive Queries" then allows you to query this materialized view.
For more details, see these two blog posts:
https://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/
https://www.confluent.io/blog/data-dichotomy-rethinking-the-way-we-treat-data-and-services/
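Here is a minimal sketch of the Interactive Queries approach, assuming an "orders" topic, an "orders-store" store name and String serdes, none of which come from the question; in a multi-instance deployment you would additionally route the query to the instance hosting the key (e.g. via KafkaStreams#queryMetadataForKey).

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class OrdersView {

    // Materialize the orders topic (the ground truth) into a local state store.
    public static void defineTopology(StreamsBuilder builder) {
        builder.table(
                "orders",
                Consumed.with(Serdes.String(), Serdes.String()),
                Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("orders-store"));
    }

    // Query the materialized view by order id once the Streams application is RUNNING.
    public static String findOrder(KafkaStreams streams, String orderId) {
        ReadOnlyKeyValueStore<String, String> store = streams.store(
                StoreQueryParameters.fromNameAndType("orders-store",
                        QueryableStoreTypes.keyValueStore()));
        return store.get(orderId);
    }
}

Searching by user, date or state rather than by order id would mean materializing additional stores keyed by those attributes, e.g. by re-keying the stream before materializing it.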