Event sourcing - why a dedicated event store? - apache-kafka

I am trying to implement event sourcing/CQRS/DDD for the first time, mostly for learning purposes, where there is the idea of an event store and a message queue such as Apache Kafka, and you have events flowing from event store => Kafka Connect JDBC/Debezium CDC => Kafka.
I am wondering why there needs to be a separate event store when it sounds like its purpose can be fulfilled by Kafka itself with its main features and log compaction or configuring log retention for permanent storage. Should I store my events in a dedicated store like RDBMS to feed into Kafka or should I feed them straight into Kafka?

Much of the literature on event-sourcing and cqrs comes from the [domain driven design] community; in its earliest form, CQRS was called DDDD... Distributed domain driven design.
One of the common patterns in domain driven design is to have a domain model ensuring the integrity of the data in your durable storage, which is to say, ensuring that there are no internal contradictions...
I am wondering why there needs to be a separate event store when it sounds like its purpose can be fulfilled by Kafka itself with its main features and log compaction or configuring log retention for permanent storage.
So if we want an event stream with no internal contradictions, how do we achieve that? One way is to ensure that only a single process has permission to modify the stream. Unfortunately, that leaves you with a single point of failure -- the process dies, and everything comes to an end.
On the other hand, if you have multiple processes updating the same stream, then you have risk of concurrent writes, and data races, and contradictions being introduced because one writer couldn't yet see what the other one did.
With an RDBMS or an Event Store, we can solve this problem by using transactions, or compare and swap semantics; and attempt to extend the stream with new events is rejected if there has been a concurrent modification.
Furthermore, because of its DDD heritage, it is common for the durable store to be divided into many very fine grained partitions (aka "aggregates"). One single shopping cart might reasonably have four streams dedicated to it.
If Kafka lacks those capabilities, then it is going to be a lousy replacement for an event store. KAFKA-2260 has been open for more than four years now, so we seem to be lacking the first. From what I've been able to discern from the Kakfa literature, it isn't happy about fine grained streams either (although its been a while since I checked, perhaps things have changed).
See also: Jesper Hammarbäck writing about this 18 months ago, and reaching similar conclusions to those expressed here.

Kafka can be used as a DDD event store, but there are some complications if you do so due to the features it is missing.
Two key features that people use with event sourcing of aggregates are:
Load an aggregate, by reading the events for just that aggregate
When concurrently writing new events for an aggregate, ensure only one writer succeeds, to avoid corrupting the aggregate and breaking its invariants.
Kafka can't do either of these currently, since 1 fails since you generally need to have one stream per aggregate type (it doesn't scale to one stream per aggregate, and this wouldn't necessarily be desirable anyway), so there's no way to load just the events for one aggregate, and 2 fails since https://issues.apache.org/jira/browse/KAFKA-2260 has not been implemented.
So you have to write the system in such as way that capabilities 1 and 2 aren't needed. This can be done as follows:
Rather than invoking command handlers directly, write them to
streams. Have a command stream per aggregate type, sharded by
aggregate id (these don't need permanent retention). This ensures that you only ever process a single
command for a particular aggregate at a time.
Write snapshotting code for all your aggregate types
When processing a command message, do the following:
Load the aggregate snapshot
Validate the command against it
Write the new events (or return failure)
Apply the events to the aggregate
Save a new aggregate snapshot, including the current stream offset for the event stream
Return success to the client (via a reply message perhaps)
The only other problem is handling failures (such as the snapshotting failing). This can be handled during startup of a particular command processing partition - it simply needs to replay any events since the last snapshot succeeded, and update the corresponding snapshots before resuming command processing.
Kafka Streams appears to have the features to make this very simple - you have a KStream of commands that you transform into a KTable (containing snapshots, keyed by aggregate id) and a KStream of events (and possibly another stream containing responses). Kafka allows all this to work transactionally, so there is no risk of failing to update the snapshot. It will also handle migrating partitions to new servers, etc. (automatically loading the snapshot KTable into a local RocksDB when this happens).

there is the idea of an event store and a message queue such as Apache Kafka, and you have events flowing from event store => Kafka Connect JDBC/Debezium CDC => Kafka
In the essence of DDD-flavoured event sourcing, there's no place for message queues as such. One of the DDD tactical patterns is the aggregate pattern, which serves as a transactional boundary. DDD doesn't care how the aggregate state is persisted, and usually, people use state-based persistence with relational or document databases. When applying events-based persistence, we need to store new events as one transaction to the event store in a way that we can retrieve those events later in order to reconstruct the aggregate state. Thus, to support DDD-style event sourcing, the store needs to be able to index events by the aggregate id and we usually refer to the concept of the event stream, where such a stream is uniquely identified by the aggregate identifier, and where all events are stored in order, so the stream represents a single aggregate.
Because we rarely can live with a database that only allows us to retrieve a single entity by its id, we need to have some place where we can project those events into, so we can have a queryable store. That is what your diagram shows on the right side, as materialised views. More often, it is called the read side and models there are called read-models. That kind of store doesn't have to keep snapshots of aggregates. Quite the opposite, read-models serve the purpose to represent the system state in a way that can be directly consumed by the UI/API and often it doesn't match with the domain model as such.
As mentioned in one of the answers here, the typical command handler flow is:
Load one aggregate state by id, by reading all events for that aggregate. It already requires for the event store to support that kind of load, which Kafka cannot do.
Call the domain model (aggregate root method) to perform some action.
Store new events to the aggregate stream, all or none.
If you now start to write events to the store and publish them somewhere else, you get a two-phase commit issue, which is hard to solve. So, we usually prefer using products like EventStore, which has the ability to create a catch-up subscription for all written events. Kafka supports that too. It is also beneficial to have the ability to create new event indexes in the store, linking to existing events, especially if you have several systems using one store. In EventStore it can be done using internal projections, you can also do it with Kafka streams.
I would argue that indeed you don't need any messaging system between write and read sides. The write side should allow you to subscribe to the event feed, starting from any position in the event log, so you can build your read-models.
However, Kafka only works in systems that don't use the aggregate pattern, because it is essential to be able to use events, not a snapshot, as the source of truth, although it is of course discussable. I would look at the possibility to change the way how events are changing the entity state (fixing a bug, for example) and when you use events to reconstruct the entity state, you will be just fine, snapshots will stay the same and you'll need to apply correction events to fix all the snapshots.
I personally also prefer not to be tightly coupled to any infrastructure in my domain model. In fact, my domain models have zero dependencies on the infrastructure. By bringing the snapshotting logic to Kafka streams builder, I would be immediately coupled and from my point of view it is not the best solution.

Theoretically you can use Kafka for Event Store but as many people mentioned above that you will have several restrictions, biggest of those, only able to read event with the offset in the Kafka but no other criteria.
For this reason they are Frameworks there dealing with the Event Sourcing and CQRS part of the problem.
Kafka is only part of the toolchain which provides you the capability of replaying events and back pressure mechanism that are protecting you from overload.
If you want to see how all fits together, I have a blog about it

Related

How to replay Event Sourcing events reliably?

One of great promises of Event Sourcing is the ability to replay events. When there's no relationship between entities (e.g. blob storage, user profiles) it works great, but how to do replay quckly when there are important relationships to check?
For example: Product(id, name, quantity) and Order(id, list of productIds). If we have CreateProduct and then CreateOrder events, then it will succeed (product is available in warehouse), it's easy to implement e.g. with Kafka (one topic with n1 partitions for products, another with n2 partitions for orders).
During replay everything happens more quickly, and Kafka may reorder the events (e.g. CreateOrder and then CreateProduct), which will give us different behavior than originally (CreateOrder will now fail because product doesn't exist yet). It's because Kafka guarantees ordering only within one topic within one partition. The easy solution would be putting everything into one huge topic with one partition, but this would be completely unscalable, as single-threaded replay of bigger databases could take days at least.
Is there any existing, better solution for quick replaying of related entities? Or should we forget about event sourcing and replaying of events when we need to check relationships in our databases, and replaying is good only for unrelated data?
As a practical necessity when event sourcing, you need the ability to conjure up a stream of events for a particular entity so that you can apply your event handler to build up the state. For Kafka, outside of the case where you have so few entities that you can assign an entire topic partition to just the events for a single entity, this entails a linear scan and filter through a partition. So for this reason, while Kafka is very likely to be a critical part of any event-driven/event-based system in relaying events published by a service for consumption by other services (at which point, if we consider the event vs. command dichotomy, we're talking about commands from the perspective of the consuming service), it's not well suited to the role of an event store, which are defined by their ability to quickly give you an ordered stream of the events for a particular entity.
The most popular purpose-built event store is, probably, the imaginatively named Event Store (at least partly due to the involvement of a few prominent advocates of event sourcing in its design and implementation). Alternatively, there are libraries/frameworks like Akka Persistence (JVM with a .Net port) which use existing DBs (e.g. relational SQL DBs, Cassandra, Mongo, Azure Cosmos, etc.) in a way which facilitates their use as an event store.
Event sourcing also as a practical necessity tends to lead to CQRS (they go together very well: event sourcing is arguably the simplest possible persistence model capable of being a write model, while its nearly useless as a read model). The typical pattern seen is that the command processing component of the system enforces constraints like "product exists before being added to the cart" (how those constraints are enforced is generally a question of whatever concurrency model is in use: the actor model has a high level of mechanical sympathy with this approach, but other models are possible) before writing events to the event store and then the events read back from the event store can be assumed to have been valid as of the time they were written (it's possible to later decide a compensating event needs to be recorded). The events from within the event store can be projected to a Kafka topic for communication to another service (the command processing component is the single source of truth for events).
From the perspective of that other service, as noted, the projected events in the topic are commands (the implicit command for an event is "update your model to account for this event"). Semantically, their provenance as events means that they've been validated and are undeniable (they can be ignored, however). If there's some model validation that needs to occur, that generally entails either a conscious decision to ignore that command or to wait until another command is received which allows that command to be accepted.
Ok, you are still thinking how did we developed applications in last 20 years instead of how we should develop applications in the future. There are frameworks that actually fits the paradigms of future perfectly, one of those, which mentioned above, is Akka but more importantly a sub component of it Akka FSM Finite State Machine, which is some concept we ignored in software development for years, but future seems to be more and more event based and we can't ignore anymore.
So how these will help you, Akka is a framework based on Actor concept, every Actor is an unique entity with a message box, so lets say you have Order Actor with id: 123456789, every Event for Order Id: 123456789 will be processed with this Actor and its messages will be ordered in its message box with first in first out principle, so you don't need a synchronisation logic anymore. But you could have millions of Order Actors in your system, so they can work in parallel, when Order Actor: 123456789 processing its events, an Order Actor: 987654321 can process its own, so there is the parallelism and scalability. While your Kafka guaranteeing the order of every message for Key 123456789 and 987654321, everything is green.
Now you can ask, where Finite State Machine comes into play, as you mentioned the problem arise, when addProduct Event arrives before createOrder Event arrives (while being on different Kafka Topics), at that point, State Machine will behave differently when Order Actor is in CREATED state or INITIALISING state, in CREATED state, it will just add the Product, in INITIALISING state probably it will just stash it, until createOrder Event arrives.
These concepts are explained really good in this video and if you want to see a practical example I have a blog for it and this one for a more direct dive.
I think I found the solution for scalable (multi-partition) event sourcing:
create in Kafka (or in a similar system) topic named messages
assign users to partitions (e.g by murmurHash(login) % partitionCount)
if a piece of data is mutable (e.g. Product, Order), every partition should contain own copy of the data
if we have e.g. 256 pieces of a product in our warehouse and 64 partitions, we can initially 'give' every partition 8 pieces, so most CreateOrder events will be processed quickly without leaving user's partition
if a user (a partition) sometimes needs to mutate data in other partition, it should send a message there:
for example for Product / Order domain, partitions could work similarly to Walmart/Tesco stores around a country, and the messages sent between partitions ('stores') could be like CreateProduct, UpdateProduct, CreateOrder, SendProductToMyPartition, ProductSentToYourPartition
the message will become an 'event' as if it was generated by an user
the message shouldn't be sent during replay (already sent, no need to do it twice)
This way even when Kafka (or any other event sourcing system) chooses to reorder messages between partitions, we'll still be ok, because we don't ever read any data outside our single-threaded 'island'.
EDIT: As #LeviRamsey noted, this 'single-threaded island' is basically actor model, and frameworks like Akka can make it a bit easier.

Category projections using kafka and cassandra for event-sourcing

I'm using Cassandra and Kafka for event-sourcing, and it works quite well. But I've just recently discovered a potentially major flaw in the design/set-up. A brief intro to how it is done:
The aggregate command handler is basically a kafka consumer, which consumes messages of interest on a topic:
1.1 When it receives a command, it loads all events for the aggregate, and replays the aggregate event handler for each event to get the aggregate up to current state.
1.2 Based on the command and businiss logic it then applies one or more events to the event store. This involves inserting the new event(s) to the event store table in cassandra. The events are stamped with a version number for the aggregate - starting at version 0 for a new aggregate, making projections possible. In addition it sends the event to another topic (for projection purposes).
1.3 A kafka consumer will listen on the topic upon these events are published. This consumer will act as a projector. When it receives an event of interest, it loads the current read model for the aggregate. It checks that the version of the event it has received is the expected version, and then updates the read model.
This seems to work very well. The problem is when I want to have what EventStore calls category projections. Let's take Order aggregate as an example. I can easily project one or more read models pr Order. But if I want to for example have a projection which contains a customers 30 last orders, then I would need a category projection.
I'm just scratching my head how to accomplish this. I'm curious to know if any other are using Cassandra and Kafka for event sourcing. I've read a couple of places that some people discourage it. Maybe this is the reason.
I know EventStore has support for this built in. Maybe using Kafka as event store would be a better solution.
With this kind of architecture, you have to choose between:
Global event stream per type - simple
Partitioned event stream per type - scalable
Unless your system is fairly high throughput (say at least 10s or 100s of events per second for sustained periods to the stream type in question), the global stream is the simpler approach. Some systems (such as Event Store) give you the best of both worlds, by having very fine-grained streams (such as per aggregate instance) but with the ability to combine them into larger streams (per stream type/category/partition, per multiple stream types, etc.) in a performant and predictable way out of the box, while still being simple by only requiring you to keep track of a single global event position.
If you go partitioned with Kafka:
Your projection code will need to handle concurrent consumer groups accessing the same read models when processing events for different partitions that need to go into the same models. Depending on your target store for the projection, there are lots of ways to handle this (transactions, optimistic concurrency, atomic operations, etc.) but it would be a problem for some target stores
Your projection code will need to keep track of the stream position of each partition, not just a single position. If your projection reads from multiple streams, it has to keep track of lots of positions.
Using a global stream removes both of those concerns - performance is usually likely to be good enough.
In either case, you'll likely also want to get the stream position into the long term event storage (i.e. Cassandra) - you could do this by having a dedicated process reading from the event stream (partitioned or global) and just updating the events in Cassandra with the global or partition position of each event. (I have a similar thing with MongoDB - I have a process reading the 'oplog' and copying oplog timestamps into events, since oplog timestamps are totally ordered).
Another option is to drop Cassandra from the initial command processing and use Kafka Streams instead:
Partitioned command stream is processed by joining with a partitioned KTable of aggregates
Command result and events are computed
Atomically, KTable is updated with changed aggregate, events are written to event stream and command response is written to command response stream.
You would then have a downstream event processor that copies the events into Cassandra for easier querying etc. (and which can add the Kafka stream position to each event as it does it to give the category ordering). This can help with catch up subscriptions, etc. if you don't want to use Kafka for long term event storage. (To catch up, you'd just read as far as you can from Cassandra and then switch to streaming from Kafka from the position of the last Cassandra event). On the other hand, Kafka itself can store events for ever, so this isn't always necessary.
I hope this helps a bit with understanding the tradeoffs and problems you might encounter.

How do you ensure that events are applied in order to read model?

This is easy for projections that subscribe to all events from the stream, you just keep version of the last event applied on your read model. But what do you do when projection is composite of multiple streams? Do you keep version of each stream that is partaking in the projection. But then what about the gaps, if you are not subscribing to all events? At most you can assert that version is greater than the last one. How do others deal with this? Do you respond to every event and bump up version(s)?
For the EventStore, I would suggest using the $all stream as the default stream for any read-model subscription.
I have used the category stream that essentially produces the snapshot of a given entity type but I stopped doing so since read-models serve a different purpose.
It might be not desirable to use the $all stream as it might also get events, which aren't domain events. Integration events could be an example. In this case, adding some attributes either to event contracts or to the metadata might help to create an internal (JS) projection that will create a special all stream for domain events, or any event category in that regard, where you can subscribe to. You can also use a negative condition, for example, filter out all system events and those that have the original stream name starting with Integration.
As well as processing messages in the correct order, you also have the problem of resuming a projection after it is restarted - how do you ensure you start from the right place when you restart?
The simplest option is to use an event store or message broker that both guarantees order and provides some kind of global stream position field (such as a global event number or an ordered timestamp with a disambiguating component such as MongoDB's Timestamp type). Event stores where you pull the events directly from the store (such as eventstore.org or homegrown ones built on a database) tend to guarantee this. Also, some message brokers like Apache Kafka guarantee ordering (again, this is pull-based). You want at-least-once ordered delivery, ideally.
This approach limits write scalability (reads scale fine, using read replicas) - you can shard your streams across multiple event store instances in various ways, then you have to track the position on a per-shard basis, which adds some complexity.
If you don't have these ordering, delivery and position guarantees, your life is much harder, and it may be hard to make the system completely reliable. You can:
Hold onto messages for a while after receiving them, before processing them, to allow other ones to arrive
Have code to detect missing or out-of-order messages. As you mention, this only works if you receive all events with a global sequence number or if you track all stream version numbers, and even then it isn't reliable in all cases.
For each individual stream, you keep things in order by fetching them from a data store that knows the correct order. A way of thinking of this is that your query the data store, and you get a Document Message back.
It may help to review Greg Young's Polyglot Data talk.
As for synchronization of events in multiple streams; a thing that you need to recognize is that events in different streams are inherently concurrent.
You can get some loose coordination between different streams if you have happens-before data encoded into your messages. "Event B happened in response to Event A, therefore A happened-before B". That gets you a partial ordering.
If you really do need a total ordering of everything everywhere, then you'll need to be looking into patterns like Lamport Clocks.

How do I implement Event Sourcing using Kafka?

I would like to implement the event-sourcing pattern using kafka as an event store.
I want to keep it as simple as possible.
The idea:
My app contains a list of customers. Customers an be created and deleted. Very simple.
When a request to create a customer comes in, I am creating the event CUSTOMER_CREATED including the customer data and storing this in a kafka topic using a KafkaProducer. The same when a customer is deleted with the event CUSTOMER_DELETED.
Now when i want to list all customers, i have to replay all events that happened so far and then get the current state meaning a list of all customers.
I would create a temporary customer list, and then processing all the events one by one (create customer, create customer, delete customer, create customer etc). (Consuming these events with a KafkaConsumer). In the end I return the temporary list.
I want to keep it as simple as possible and it's just about giving me an understanding on how event-sourcing works in practice. Is this event-sourcing? And also: how do I create snapshots when implementing it this way?
when i want to list all customers, i have to replay all events that happened so far
You actually don't, or at least not after your app starts fresh and is actively collecting / tombstoning the data. I encourage you to lookup the "Stream Table Duality", which basically states that your table is the current state of the world in your system, and a snapshot in time of all the streamed events thus far, which would be ((customers added + customers modified) - customers deleted).
The way you implement this in Kafka would be to use a compacted Kafka topic for your customers, which can be read into a Kafka Streams KTable, and persisted in memory or spill to disk (backed by RocksDB). The message key would be some UUID for the customer, or some other identifiable record that cannot change (e.g. not name, email, phone, etc. as all this can change)
With that, you can implement Interactive Queries on it to scan or lookup a certain customer's details.
Theoretically you can do Event Sourcing with Kafka as you mentioned, replaying all Events in the application start but as you mentioned, if you have 100 000 Events to reach a State it is not practical.
As it is mentioned in the previous answers, you can use Kafka Streaming KTable for sense of Event Sourcing but while KTable is hosted in Key/Value database RockDB, querying the data will be quite limited (You can ask what is the State of the Customer Id: 123456789 but you can't ask give me all Customers with State CUSTOMER_DELETED).
To achieve that flexibility, we need help from another pattern Command Query Responsibility Segregation (CQRS), personally I advice you to use Kafka reliable, extremely performant Broker and give the responsibility for Event Sourcing dedicated framework like Akka (which Kafka synergies naturally) with Apache Cassandra persistence and Akka Finite State Machine for the Command part and Akka Projection for the Query part.
If you want to see a sample how all these technology stacks plays together, I have a blog for it. I hope it can help you.

How do I keep the RDMS and Kafka in sync?

We want to introduce a Kafka Event Bus which will contain some events like EntityCreated or EntityModified into our application so other parts of our system can consume from it. The main application uses an RDMS (i.e. postgres) under the hood to store the entities and their relationship.
Now the issue is how you make sure that you only send out EntityCreated events on Kafka if you successfully saved to the RDMS. If you don't make sure that this is the case, you end up with inconsistencies on the consumers.
I saw three solutions, of which none is convincing:
Don't care: Very dangerous, there can be something going wrong when inserting into an RDMS.
When saving the entity, also save the message which should be sent into a own table. Then have a separate process which consumes from this table and publishes to Kafka and after a success deleted from this table. This is quiet complex to implement and also looks like an anti-pattern.
Insert into the RDMS, keep the (SQL-) Transaction open until you wrote successfully to Kafka and only then commit. The problem is that you potentially keep the RDMS transaction open for some time. Don't know how big the problem is.
Do real CQRS which means that you don't save at all to the RDMS but construct the RDMS out of the Kafka queue. That seems like the ideal way but is difficult to retrofit to a service. Also there are problems with inconsistencies due to latencies.
I had difficulties finding good solutions on the internet.
Maybe this question is to broad, feel free to point me somewhere it fits better.
When saving the entity, also save the message which should be sent into a own table. Then have a separate process which consumes from this table and publishes to Kafka and after a success deleted from this table. This is quiet complex to implement and also looks like an anti-pattern.
This is, in fact, the solution described by Udi Dahan in his talk: Reliable Messaging without Distributed Transactions. It's actually pretty close to a "best practice"; so it may be worth exploring why you think it is an anti-pattern.
Do real CQRS which means that you don't save at all to the RDMS but construct the RDMS out of the Kafka queue.
Noooo! That's where the monster is hiding! (see below).
If you were doing "real CQRS", your primary use case would be that your writers make events durable in your book of record, and the consumers would periodically poll for updates. Think "Atom Feed", with the additional constraint that the entries, and the order of entries, is immutable; you can share events, and pages of events; cache invalidation isn't a concern because, since the state doesn't change, the event representations are valid "forever".
This also has the benefit that your consumers don't need to worry about message ordering; the consumers are reading documents of well ordered events with pointers to the prior and subsequent documents.
Furthermore, you've additionally gotten a solution to a versioning story: rather than broadcasting N different representations of the same event, you send out one representation, and then negotiate the content when the consumer polls you.
Now, polling does have latency issues; you can reduce the latency by broadcasting an announcement of the update, and notifying the consumers that new events are available.
If you want to reduce the rate of false polling (waking up a consumer for an event that they don't care about), then you can start adding more information into the notification, so that the consumer can judge whether to pull an update.
Notice that "wake up and maybe poll" is a process that is triggered by a single event in isolation. "Wake up and poll just this message" is another variation on the same idea. We broadcast a thin version of EmailDeliveryScheduled; and the service responsible for that calls back to ask for the email/an enhanced version of the event with the details needed to construct the email.
These are specializations of "wake up and consume the notification". If you have a use case where you can't afford the additional latency required to poll, you can use the state in the representation of the isolated event.
But trying to reproduce an ordered sequence of events when that information is already exposed as a sharable, cacheable document... That's a pretty unusual use case right there. I wouldn't worry about it as a general problem to solve -- my guess is that these cases are rare, and not easily generalized.
Note that all of the above is about messaging, not about Kafka. Notice that messaging and event sourcing are documented as different use cases. Jay Kreps wrote (2013)
I use the term "log" here instead of "messaging system" or "pub sub" because it is a lot more specific about semantics and a much closer description of what you need in a practical implementation to support data replication.
You can think of the log as acting as a kind of messaging system with durability guarantees and strong ordering semantics
The book of record should be the sole authority for the order of event messages. Any consumer that cares about order should be reading ordered documents from the book of record, rather than reading unordered documents and reconstructing the order.
In your current design....
Now the issue is how you make sure that you only send out EntityCreated events on Kafka if you successfully saved to the RDMS.
If the RDBMS is the book of record (the source of "truth"), then the Kafka log isn't (yet).
You can get there from here, over a number of gentle steps; roughly, you add events into the existing database, you read from the existing database to write into kafka's log; you use kafka's log as a (time delayed) source of truth to build a replica of the existing RDBMS, you migrate your read use cases to the replica, you migrate your write use cases to kafka, and you decommission the legacy database.
Kafka's log may or may not be the book of record you want. Greg Young has been developing Get Event Store for quite some time, and has enumerated some of the tradeoffs (2016). Horses for courses - I wouldn't expect it to be too difficult to switch the log from one of these to the other with a well written code base, but I can't speak at all to the additional coupling that might occur.
There is no perfect way to do this if your requirement is look SQL & kafka as a single node. So the question should be: "What bad things(power failure, hardware failure) I can afford if it happen? What the changes(programming, architecture) I can take if it must apply to my applications?"
For those points you mentioned:
What if the node fail after insert to kafka before delete from sql?
What if the node fail after insert to kafka before commit the sql transaction?
What if the node fail after insert to sql before commit the kafka offset?
All of them will facing the risk of data inconsistency(4 is slightly better if the data insert to sql can not success more than once such as they has a non database generated pk).
From the viewpoint of changes, 3 is smallest, however, it will decrease sql throughput. 4 is biggest due to your business logic model will facing two kinds of database when you coding(write to kafka by a data encoder, read from sql by sql sentence), it has more coupling than others.
So the choice is depend on what your business is. There is no generic way.