Debezium MongoDB connector delete event doesn’t give the before event in payload - mongodb

Adding a question to this thread:
I am using the Debezium MongoDB connector and everything is going smoothly except for a few things. The delete operation doesn’t give any after, patch, or before event; it just returns the id in the key field of the Confluent Kafka record. Is there any property I could add to my connector configuration? Also, for the update event I am expecting before and after states, whereas the connector gives me only the patch event with the updated values, and here too I have to check the key field of the Confluent Kafka topic to get the id of the changed record. Any help? The MongoDB version I am using is 4.4.
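For reference, a minimal sketch of the kind of MongoDB connector configuration being discussed, assuming Debezium 1.5+ style property names (the connector name, hosts and collection are placeholders). With capture.mode set to change_streams_update_full the update event carries the full document in after rather than only a patch string; on MongoDB 4.4 the delete event still has no before image, since change-stream pre-images only arrived in MongoDB 6.0.

    # Sketch only -- names and connection details are placeholders.
    name=mongo-source
    connector.class=io.debezium.connector.mongodb.MongoDbConnector
    mongodb.hosts=rs0/mongodb:27017
    mongodb.name=dbserver1
    collection.include.list=inventory.customers
    # Debezium 1.5+: emit the full document in "after" on updates
    # instead of only a patch string.
    capture.mode=change_streams_update_full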

Related

debezium connector failover mechanism

I'm learning about Debezium connectors and I'm using Debezium for PostgreSQL. I have a small question to clarify.
Imagine a situation like this: I have a Debezium connector for a table called tableA, and changes happening on that table are published to a topic called topicA. The connector works without any issue and changes are published to the topic. Now suppose that for some reason I need to delete my connector and start a new connector with the same configuration, for the same table, publishing to the same topic. So there is a time gap between when I stop my connector and when I start the new one with the same config. What happens to the data that changes on tableA during that time?
Will it resume from where it stopped, or what will happen?
Dushan, the answer depends on how the connector stops. The various scenarios are described here:
https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-kafka-connect-process-stops-gracefully
In the ideal case, the Log Sequence Number (LSN) is recorded as a source offset in the Kafka Connect offsets topic. Unless that topic is re-created or its messages expire, the offsets are retained and on restart the connector will resume from that position.
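To make that concrete, these are the relevant worker-level settings in a distributed Kafka Connect cluster; the topic name shown is the conventional default, not something prescribed by Debezium, and this is a sketch rather than a complete worker configuration.

    # Where Kafka Connect persists source offsets, including the Postgres LSN.
    # As long as this topic is intact and the connector is re-registered under
    # the same name, it resumes from the stored position; the replication slot
    # retains the WAL produced while no connector was running (provided the
    # slot itself was not dropped).
    offset.storage.topic=connect-offsets
    offset.storage.replication.factor=3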

Is there a way to configure Debezium to store in Kafka not all the changes from the database but only certain ones?

I have MongoDB and I need to send the changes matching a certain query to a Kafka broker. I heard that Debezium tracks changes from the database and stores them in Kafka. But is there a way to configure that process so that it stores not all the changes that happen in the database, but only certain ones?
You can perform some filtering using their single message transform (SMT) Kafka Connect plugin. You can check its documentation to see if it has the features that you need: https://debezium.io/documentation/reference/stable/transformations/filtering.html
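As an illustration, here is a minimal sketch of that filter SMT applied to a source connector; the Groovy condition is a made-up example, and the Debezium scripting module plus a JSR-223 language such as Groovy must be on the Connect classpath.

    # Keep only events whose value matches the condition; everything else is dropped.
    transforms=filter
    transforms.filter.type=io.debezium.transforms.Filter
    transforms.filter.language=jsr223.groovy
    # Hypothetical condition: only pass create ("c") operations.
    transforms.filter.condition=value.op == 'c'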
Depending on the source technology, you can.
When using PostgreSQL as a source, for example, you can define which operations to include in the PG publication that is read by Debezium.
More info in the Debezium docs.

In Kafka, how to handle deleted rows from source table that are already reflected in Kafka topic?

I am using a JDBC source connector with mode timestamp+incrementing to fetch a table from Postgres, using Kafka Connect. Updates to the data are reflected in the Kafka topic, but deleting records has no effect. So, my questions are:
Is there some way to handle deleted records?
How do I handle records that are deleted from the table but still present in the Kafka topic?
The recommendation is to either 1) make your source database append/update-only as well, for example via a boolean flag or a timestamp column that is filtered out when Kafka Connect queries the table.
If your database is running out of space, you can then delete old records, which should already have been processed by Kafka.
Option 2) Use CDC tools to capture delete events immediately rather than missing them in a periodic table scan. Debezium is a popular option for Postgres.
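For option 2, a rough sketch of what a Debezium Postgres source connector might look like in place of the JDBC source; property names follow the Debezium 1.x docs, and the connection details and table list are placeholders. Unlike the polling JDBC connector, it emits an event for every INSERT, UPDATE and DELETE, and a delete event carries the old row key in its before block (the full old row if the table's REPLICA IDENTITY is FULL).

    # Sketch only -- connection details and table list are placeholders.
    name=pg-cdc-source
    connector.class=io.debezium.connector.postgresql.PostgresConnector
    database.hostname=postgres
    database.port=5432
    database.user=debezium
    database.password=secret
    database.dbname=mydb
    database.server.name=pgserver1
    table.include.list=public.orders
    plugin.name=pgoutput
    # Emits an extra null-value tombstone after each delete (default: true),
    # which is what log compaction needs to eventually drop the key.
    tombstones.on.delete=true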
A Kafka topic can be seen as an "append-only" log. It keeps all messages for as long as you like, but Kafka is not built to delete individual messages out of a topic.
In the scenario you are describing, it is common for the downstream application (consuming the topic) to handle the information about a deleted record.
As an alternative you could set the cleanup.policy of your topic to compact, which means it will eventually keep only the latest value for each key. If you define the key of a message as the primary key of the Postgres table, the topic will eventually drop the record once you produce a message with the same key and a null value. However, I am not sure whether your connector is flexible enough to do this.
Depending on what you do with the data in the Kafka topic, this could still not solve your problem, as the downstream application will still read both records: the original one and the null message marking the deletion.
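If you go down the compaction route, keying the messages by the table's primary key can be done with the standard Kafka Connect SMTs on the JDBC source connector. A sketch, assuming the primary-key column is called id; note that cleanup.policy=compact is set on the topic itself, and producing the null-value tombstone for a deleted row is still up to you or a CDC connector.

    # Copy the "id" column (assumed primary-key name) into the record key, then
    # unwrap it so the key is the bare id rather than a one-field struct.
    transforms=createKey,extractKey
    transforms.createKey.type=org.apache.kafka.connect.transforms.ValueToKey
    transforms.createKey.fields=id
    transforms.extractKey.type=org.apache.kafka.connect.transforms.ExtractField$Key
    transforms.extractKey.field=id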

how to configure debezium fields sent on update events (mongo connector)

I want to use the Debezium Mongo connector to:
-> get events from Mongo
-> get them into my Kafka
-> read from Kafka
My issue is that when Debezium gets update events from Mongo, it only sends the updated fields:
The value of an update change event on this collection will actually
have the exact same schema, and its payload will be structured the
same but will hold different values. Specifically, an update event
will not have an after value and will instead have a patch string
containing the JSON representation of the idempotent update operation.
and I was wondering if I can configure it somehow, because there are some fields I would like to receive with the update events.
You should be able to solve the problem using Kafka Streams, in a similar (simplified) way to this example: https://debezium.io/blog/2018/03/08/creating-ddd-aggregates-with-debezium-and-kafka-streams/

How to update KSQL stream definition dynamically based on schema-registry

I created a KSQL stream based on the Schema Registry by following this post. The Kafka JDBC connector registers a new latest schema in the Schema Registry. A newly created stream picks up the latest schema, but the existing stream is still on the old schema.
I don't know when the schema of the data source changes. In this case, I am expecting KSQL to dynamically refresh the stream definition with the newest schema available in the Schema Registry.
Any idea how to achieve this?
At the moment you have to manually drop and recreate a stream to pick up the new schema.
I've logged #2215 if you want to upvote/discuss desired behaviour there.