Debezium SQL Server Connector Kafka Initial Snapshot

According to the Debezium SQL Server connector documentation, the initial snapshot only fires on the connector's first run.
However, if I delete the connector and create a new one with the same name, the initial snapshot does not run either.
Is this by design or a known issue?
Any help appreciated.

Kafka Connect stores details about connectors, such as their snapshot status and ingest progress, even after they've been deleted. If you recreate a connector with the same name, Kafka Connect assumes it's the same connector and will try to continue from where the previous one got to.
If you want a connector to start from scratch (i.e. run the snapshot etc.), you need to give the connector a new name. (Technically, you could also go into Kafka Connect's internal topics and remove the stored data for the connector of the same name, but that's probably a bad idea.)
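For illustration, a minimal sketch of that delete-and-rename flow against the Connect REST API. The worker URL, connector names, and config values here are placeholders, not taken from the question:

# Assumption: Connect REST API on localhost:8083; all names/values are placeholders.
# Deleting the old connector does NOT delete its stored offsets:
curl -X DELETE http://localhost:8083/connectors/sqlserver-connector-v1

# Recreating under a NEW name makes Connect treat it as a brand-new connector,
# so the initial snapshot runs again:
curl -X POST -H "Content-Type: application/json" http://localhost:8083/connectors -d '{
  "name": "sqlserver-connector-v2",
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
    "database.hostname": "sqlserver",
    "database.port": "1433",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.dbname": "testDB",
    "database.server.name": "server1",
    "table.include.list": "dbo.customers",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.testDB"
  }
}'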

Give your connector a new database.server.name value, or use a new topic. The snapshot doesn't fire again because the offsets stored for your connector already record that the snapshot phase has completed.
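A sketch of that config change (the value is a placeholder); in Debezium 1.x the logical server name prefixes every change-event topic:

# Changing the logical server name routes change events to a new topic
# namespace, e.g. dbserver2.dbo.customers instead of dbserver1.dbo.customers:
database.server.name=dbserver2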

Related

Kafka connect - completely removing a connector

My question is in two parts. I've read Kafka Connect - Delete Connector with configs?. I'd like to completely remove a connector, offsets and all, so I can recreate it with the same name later. Is this possible? To my understanding, a tombstone message will kill this connector indefinitely.
The second part is: is there a way to have the kafka-connect container automatically delete all the connectors it created when bringing it down?
Thanks
There is no such command to completely clean up connector state. For sink connectors, you can use kafka-consumer-groups to reset its offsets (sketched below). For source connectors, it's not as straightforward, as you'll need to manually produce data into the Connect-managed offsets topic.
The config and status topics also persist historical data, but that shouldn't prevent you from recreating the connector with the same name/details.
The Connect containers published by Confluent and Debezium always use distributed mode. You'll need to override the entrypoint of the container to use standalone mode so that connector metadata isn't persisted in Kafka topics (this won't be fault tolerant, but it'll be fine for testing).
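As a sketch of the sink-connector reset (broker address, group, and topic names are placeholders): Connect commits a sink connector's progress under the consumer group connect-<connector name>, which kafka-consumer-groups can rewind while the connector is stopped.

# Inspect the sink connector's current committed positions:
kafka-consumer-groups --bootstrap-server localhost:9092 \
  --group connect-my-sink --describe

# With the connector stopped, rewind it to the start of its topic:
kafka-consumer-groups --bootstrap-server localhost:9092 \
  --group connect-my-sink --topic my-topic \
  --reset-offsets --to-earliest --execute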

Can I initiate an ad-hoc Debezium snapshot without a signaling table?

I am running a Debezium connector to PostgreSQL. The snapshot.mode I use is initial, since I don't want to re-snapshot just because the connector has been restarted. However, during development I want to restart the process, as the messages expire from Kafka before they have been read.
If I delete and recreate the connector via the Kafka Connect REST API, this doesn't do anything, as the information in the offset/status/config topics is preserved. I have to delete and recreate those topics and restart the whole Connect cluster to trigger another snapshot.
Am I missing a more convenient way of doing this?
You will also need a new name for the connector, as well as a new database.server.name in the connector config, since that is what the offset information is stored under. It should be almost like deploying a connector for the first time again.
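A rough sketch of that redeploy (worker URL, names, and config values are placeholders): delete the old connector, then register it under a new name with a new logical server name, so the stored offsets no longer apply and a fresh snapshot runs.

# Assumption: Connect REST API on localhost:8083; all names are placeholders.
curl -X DELETE http://localhost:8083/connectors/pg-connector-v1

curl -X POST -H "Content-Type: application/json" http://localhost:8083/connectors -d '{
  "name": "pg-connector-v2",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.dbname": "inventory",
    "database.server.name": "pgserver2",
    "snapshot.mode": "initial"
  }
}'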

Debezium: Check if snapshot is complete for postgres initial_only

I'm using Debezium postgres connector v1.4.2.Final.
I'm using snapshot.mode=initial_only, where I only want to take the table(s) snapshot and not stream the incremental changes. Once the snapshot is completed, I want to stop/kill the connector. How can I find out whether the snapshotting is complete and it's safe to kill the connector?
I'm using this to be able to add new tables to an existing connector. To do that, I'm trying this:
1. Kill the original connector (snapshot.mode=initial)
2. Start a new connector with snapshot.mode=initial_only for the new tables
3. Stop the new connector once snapshotting is complete
4. Start the original connector after adding the new tables to table.whitelist
Please check the JMX metrics. Verify whether this one, https://debezium.io/documentation/reference/1.5/connectors/postgresql.html#connectors-snaps-metric-snapshotcompleted_postgresql, would suit your needs.
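For instance, a hedged sketch of polling that metric with Kafka's bundled JmxTool, assuming the Connect worker JVM exposes JMX on port 9999 and the connector's database.server.name is dbserver1 (both assumptions):

# SnapshotCompleted flips to true once the initial snapshot has finished;
# poll it every 5 seconds:
kafka-run-class kafka.tools.JmxTool \
  --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi \
  --object-name 'debezium.postgres:type=connector-metrics,context=snapshot,server=dbserver1' \
  --attributes SnapshotCompleted \
  --reporting-interval 5000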

Couchbase to Kafka source connector

I have used the Kafka source connector to get documents from Couchbase into Kafka. These documents are then replicated to MongoDB.
Couchbase --> Source Connector --> Kafka --> Sink Connector --> Mongo
If the source connector goes down, how can I sync all the documents to Kafka again?
Is there any get-and-touch functionality that can again event out to the Kafka topic all the changes made during the down period?
If you're asking about processing the document changes that occurred while the source connector was down, then you don't need to do anything. Kafka Connect stores the state (offsets) of the source connector and will restore the StreamTask state and continue from where it left off. The Couchbase source connector supports this, as we can see in the code here, which is then used here to initialize the DCP stream with the saved offsets.
If you're asking how to reset the connector and re-stream the entire bucket from the beginning, that's actually not as easy. As far as I know, there is no built-in way in Kafka to reset a connector's offsets; there is a KIP under review related to that: KIP-199. Barring official support, the best ways I know of to reset the connector state are either to change the config to use a different topic for saving the offsets (which is hacky and leaves the old offsets behind as a potential problem), or to actually edit the saved offsets as described here. I would never advocate doing either of those on a production system, so use your own judgement.
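To illustrate the offset-editing route (very much a sketch, and every value below is a hypothetical placeholder): you would produce a tombstone, i.e. the connector's existing offset key with a null value, into the worker's offsets topic while the connector is deleted. Inspect the topic first to copy the exact key, and note that the tombstone must land on the same partition as the original record.

# kcat's -Z flag sends the empty value after the -K delimiter as a real null.
# The key shown is hypothetical -- use the exact key you saw in the topic,
# and add -p <n> if the default partitioner would pick a different partition:
echo '["couchbase-source",{"bucket":"travel-sample"}]|' | \
  kcat -P -b localhost:9092 -t connect-offsets -K '|' -Z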

Kafka Connect - Delete Connector with configs?

I know how to delete a Kafka connector, as mentioned here: Kafka Connect - How to delete a connector.
But I am not sure whether it also deletes/erases the connector-specific configs, offsets, and status from the *.storage.topic topics for that worker.
For example:
Let's say I delete a connector with the connector name "connector-abc-1.0.0", and the Kafka Connect worker was started with the following config:
offset.storage.topic=<topic.name>.internal.offsets
config.storage.topic=<topic.name>.internal.configs
status.storage.topic=<topic.name>.internal.status
Now, after the DELETE call for that connector, will all records for that specific connector be erased from the above internal topics?
So that I can create a new connector with the same name on the same worker, but with a different config (different offset.start or connector.class)?
When you delete a connector, the offsets are retained in the offsets topic.
If you recreate the connector with the same name, it will re-use the offsets from the previous execution (even if the connector was deleted in between).
Since Kafka topics are append-only, the only way records in those compacted Connect topics are removed is if a record is published with the same key (which includes the connector name) and a null value.
You can inspect those topics using the console consumer to see what data is in them, including with --property print.key=true, and keep the consumer running while you delete a connector.
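For example (using the offsets topic name from the question; the broker address is a placeholder):

# Print keys alongside values so you can see which connector each record
# belongs to; leave this running while you delete/recreate the connector:
kafka-console-consumer --bootstrap-server localhost:9092 \
  --topic <topic.name>.internal.offsets \
  --from-beginning \
  --property print.key=true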
You can PUT a new config at /connectors/{name}/config, but which offsets are used depends on the connector type (sink or source); for example, sink connectors use the internal Kafka __consumer_offsets topic, while source connectors use the offset.storage.topic.
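For a sink connector, that consumer-group state is visible with kafka-consumer-groups. The group name below assumes the default connect-<connector name> convention and reuses the connector name from the question; the broker address is a placeholder:

# Shows the topics, partitions, and committed offsets the sink is tracking:
kafka-consumer-groups --bootstrap-server localhost:9092 \
  --group connect-connector-abc-1.0.0 --describe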
"same name" on same worker but different config(different offset.start or connector.class)?
I'm not sure changing connector.class would be a good idea with the above in mind since it'd change the connector behavior completely. offset.start isn't a property I'm aware of, so you'll need to see the documentation of that specific connector class to know what it does.