After a "when needed" snapshot, will Debezium pick up outbox events where it left them? - debezium

I'm using Debezium as an outbox event router with MySQL.
I'm considering setting snapshot.mode to when_needed, but I'd like to better understand the implications if a snapshot happens.
After the snapshot is performed, will Debezium send to Kafka all events found in the outbox table, even those that it had already sent when operating normally before the snapshot? Or will it somehow pick up where it left off?
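For reference, a rough sketch of the kind of configuration in question, written as Java properties; hostnames, credentials, and table names below are placeholders, not real values:

    // Rough sketch only: hostnames, credentials, and table names are placeholders.
    import java.util.Properties;

    public class OutboxConnectorConfigSketch {
        public static Properties config() {
            Properties p = new Properties();
            p.setProperty("connector.class", "io.debezium.connector.mysql.MySqlConnector");
            p.setProperty("database.hostname", "mysql.example.internal"); // placeholder
            p.setProperty("database.port", "3306");
            p.setProperty("database.user", "debezium");                   // placeholder
            p.setProperty("database.password", "secret");                 // placeholder
            p.setProperty("database.server.id", "5400");                  // placeholder, unique per connector
            p.setProperty("topic.prefix", "app");                         // "database.server.name" on older Debezium versions
            p.setProperty("table.include.list", "appdb.outbox");          // placeholder outbox table
            // The setting this question is about:
            p.setProperty("snapshot.mode", "when_needed");
            // Debezium's outbox event router SMT:
            p.setProperty("transforms", "outbox");
            p.setProperty("transforms.outbox.type", "io.debezium.transforms.outbox.EventRouter");
            return p;
        }
    }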

Related

debezium connector failover mechanism

I'm learning about Debezium connectors and I'm using Debezium for PostgreSQL. I have a small question to clarify.
Imagine a situation like this: I have a Debezium connector for a table called tableA, and changes happening on that table are published to a topic called topicA. The connector works without any issue and changes are published to the topic. Now suppose that for some reason I need to delete my connector and start a new connector with the same configuration, for the same table, publishing to the same topic. So there is a time gap between when I stop my connector and when I start the new one with the same config. What happens to the data that changes on tableA during that time?
Will the new connector start from where the old one stopped, or what will happen?
Dushan, the answer depends on how the connector stops. The various scenarios are described here:
https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-kafka-connect-process-stops-gracefully
In the ideal case, the Log Sequence Number (LSN) the connector has reached is recorded in the Kafka Connect offsets topic. Unless that topic is re-created or its messages expire, the LSN offsets are kept, and on restart the connector will resume from that position.
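To illustrate, these are the Kafka Connect distributed-worker settings that determine where those offsets are kept (values below are placeholders). Connect keys source offsets by connector name, so a new connector registered under the same name in the same cluster can pick up the stored position:

    // Sketch of the distributed-worker settings controlling offset storage.
    // Values are placeholders.
    import java.util.Properties;

    public class ConnectWorkerOffsetSketch {
        public static Properties workerConfig() {
            Properties p = new Properties();
            p.setProperty("bootstrap.servers", "kafka:9092");          // placeholder
            p.setProperty("group.id", "connect-cluster");              // placeholder
            p.setProperty("offset.storage.topic", "connect-offsets");  // the LSN offsets live here
            p.setProperty("config.storage.topic", "connect-configs");
            p.setProperty("status.storage.topic", "connect-status");
            return p;
        }
    }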

Delay Kafka event sending, using Outbox pattern and CDC

I would like to delay sending Kafka messages of a given topic by 5 minutes. In order to ensure they will be sent, I need to persist the events in a database before sending them (outbox pattern). Now, is there any CDC solution that provides a delayed reaction to changes in the database? Does Debezium allow such a thing?

Kafka transaction management at producer end

I am looking into how Kafka behaves when the producer is running in a transaction.
I have Oracle database insert operations running in the same transaction, and the changes are rolled back if the transaction is rolled back.
How does the Kafka producer behave in case of a transaction rollback?
Will the message be rolled back, or does Kafka not support rollback?
I know JMS messages are committed to the queue only when the transaction is committed. I'm looking for a similar solution if it is supported.
Note: the producer code is written using Spring Boot.
You are trying to update two systems:
update a record in your Oracle database
send an event to Apache Kafka
This is a challenge because you would like it to be atomic: either everything gets executed or nothing, otherwise you will end up with inconsistencies between your database and Kafka.
You might send a Kafka message even though the database transaction was rolled back.
Or the other way around (if you send the message just after the commit): you might commit the database transaction and crash (for some reason) just before sending the Kafka event.
One of the simplest solutions is to use the outbox pattern:
Let's say you want to update an order table and send an orderEvent to Kafka.
Instead of sending the event to Kafka in the same transaction,
you save it to a database table (the outbox) using the same transaction as the order update (see the sketch at the end of this answer).
A separate process then reads the data from the outbox table and makes sure it is sent to Kafka (with at-least-once semantics).
Your consumer needs to be idempotent.
In this post, I explain in more detail how to implement this solution:
https://mirakl.tech/sending-kafka-message-in-a-transactional-way-34d6d19bb7b2
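To make the outbox write concrete, here is a minimal sketch using plain JDBC; the orders and outbox tables and their columns are made up for the example:

    // Minimal sketch: the order update and the outbox insert happen in the
    // same database transaction. Table and column names are made up.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class OutboxWriteSketch {
        public static void saveOrderWithEvent(String jdbcUrl, long orderId, String orderEventJson) throws SQLException {
            try (Connection conn = DriverManager.getConnection(jdbcUrl)) {
                conn.setAutoCommit(false);
                try (PreparedStatement updateOrder = conn.prepareStatement(
                             "UPDATE orders SET status = 'CONFIRMED' WHERE id = ?");
                     PreparedStatement insertOutbox = conn.prepareStatement(
                             "INSERT INTO outbox (aggregate_id, type, payload) VALUES (?, 'orderEvent', ?)")) {
                    updateOrder.setLong(1, orderId);
                    updateOrder.executeUpdate();
                    insertOutbox.setLong(1, orderId);
                    insertOutbox.setString(2, orderEventJson);
                    insertOutbox.executeUpdate();
                    conn.commit();   // both rows become visible atomically
                } catch (SQLException e) {
                    conn.rollback(); // neither the order update nor the event is persisted
                    throw e;
                }
            }
        }
    }

A separate relay (a poller or a CDC connector such as Debezium) then reads the outbox table and publishes its rows to Kafka with at-least-once delivery.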

In Kafka, how to handle deleted rows from source table that are already reflected in Kafka topic?

I am using a JDBC source connector with mode timestamp+incrementing to fetch a table from Postgres, using Kafka Connect. Updates to the data are reflected in the Kafka topic, but deleting records has no effect. So, my questions are:
Is there some way to handle deleted records?
How do I handle records that were deleted but are still present in the Kafka topic?
The recommendation is to either 1) adjust your source database to be append/update only as well, via a boolean or timestamp column that is filtered out when Kafka Connect queries the table.
If your database is running out of space, you can then delete old records, which should already have been processed by Kafka.
Or 2) use CDC tools to capture delete events immediately rather than missing them in a periodic table scan. Debezium is a popular option for Postgres.
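As an illustration of option 1, a rough sketch of a JDBC source connector configuration that exposes only rows not flagged as deleted; the table, columns, and connection details are made up, and the property names should be checked against the connector version you are using:

    // Sketch of option 1: a soft-delete column filtered out in the source query.
    // Table, column, and connection details are placeholders.
    import java.util.Properties;

    public class SoftDeleteJdbcSourceSketch {
        public static Properties config() {
            Properties p = new Properties();
            p.setProperty("connector.class", "io.confluent.connect.jdbc.JdbcSourceConnector");
            p.setProperty("connection.url", "jdbc:postgresql://db:5432/app");   // placeholder
            p.setProperty("mode", "timestamp+incrementing");
            p.setProperty("timestamp.column.name", "updated_at");               // placeholder
            p.setProperty("incrementing.column.name", "id");                    // placeholder
            // Rows are never physically deleted; they are flagged and excluded
            // here, so flagged rows can be purged later without being missed.
            p.setProperty("query", "SELECT * FROM my_table WHERE deleted = false");
            p.setProperty("topic.prefix", "my_table_");                         // placeholder
            return p;
        }
    }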
A Kafka topic can be seen as an append-only log. It keeps all messages for as long as you like, but Kafka is not built to delete individual messages from a topic.
In the scenario you are describing, it is common for the downstream application (consuming the topic) to handle the information about a deleted record.
As an alternative, you could set the cleanup.policy of your topic to compact, which means it will eventually keep only the latest value for each key. If you define the key of each message as the primary key of the Postgres table, the topic will eventually drop the record once you produce a message with the same key and a null value into the topic. However:
I am not sure if your connector is flexible enough to do this.
Depending on what you do with the data in the Kafka topic, this could still not solve your problem, as the downstream application will still read both records: the original one and the null message marking the deletion.
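To make the compaction idea concrete, here is a sketch of producing a tombstone (same key, null value) for a deleted row; the topic name and key are placeholders, and the topic is assumed to have been created with cleanup.policy=compact:

    // Sketch: produce a tombstone (null value) so a compacted topic eventually
    // drops the record. Topic name, key, and broker address are placeholders.
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class TombstoneSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "kafka:9092");                  // placeholder
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Key = primary key of the deleted Postgres row; value = null.
                producer.send(new ProducerRecord<>("my_table_topic", "42", null));
            }
        }
    }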

Can Debezium ensure all events from the same transaction are published at the same time?

I'm starting to explore the use of change data capture to convert the database changes from a legacy and commercial application (which I cannot modify) into events that could be consumed by other systems. Simplifying my real case, let's say that there will be two tables involved, order with the order header details and order_line with the details of each of the products requested.
My current understanding is that events from the two tables will be published to two different Kafka topics and that I should aggregate them using kafka-streams or ksql. I've seen there are different options to define the window that will be used to select all the related events, but it is not clear to me how I can be sure that all the events coming from the same database transaction are already in the topic, so that I do not miss any of them.
Is Debezium able to ensure this (that all events from the same transaction are published), or could it happen that, for example, Debezium crashes while publishing the events and only part of the ones generated by the same transaction end up in Kafka?
If so, what's the recommended approach to handle this?
Thanks
Debezium stores the positions of the transaction log that it has read completely in Kafka, and it uses these positions to resume its work after a crash or a similar interruption. In other situations that can sometimes happen, where Debezium loses its position, it recovers by taking a snapshot of the database again.