CDC replication not working after a few days of inactivity - Debezium

I am struggling with an issue where Debezium is unable to continue CDC replication after a few days of inactivity on the source DB (SQL Server Always On). Debezium version: 1.9.5
Debezium is reading from a listener that is installed over the SQL Server Always On setup.
When I check the connector/task status, it shows RUNNING, but no CDC data is transferred to the topics.
I verified this by running a Kafka console consumer and checking whether any data is sent to the topics. I am using snapshot.mode=initial, which worked fine for the full load and continued with CDC for a couple of days, but then stopped without any errors.
Can someone please point me to how to fix this? Thank you for your time and effort.
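For reference, the status check could be scripted roughly like this against the Kafka Connect REST API (the endpoint and connector name below are placeholders, not my actual values) - a connector can report RUNNING while an individual task has failed silently, so the task-level state and trace are worth checking too:

# Sketch only: inspect connector *and* task state via the Kafka Connect REST API.
# The endpoint and connector name are placeholders - adjust to your deployment.
import requests

CONNECT_URL = "http://localhost:8083"    # assumed Kafka Connect REST endpoint
CONNECTOR = "sqlserver-cdc-connector"    # assumed connector name

status = requests.get(f"{CONNECT_URL}/connectors/{CONNECTOR}/status").json()
print("connector state:", status["connector"]["state"])

for task in status["tasks"]:
    # A task can be FAILED (with a stack trace) even when the connector shows RUNNING.
    print(f"task {task['id']}: {task['state']}")
    if task["state"] == "FAILED":
        print(task.get("trace", "no trace available"))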

Related

Can I initiate an ad-hoc Debezium snapshot without a signaling table?

I am running a Debezium connector to PostgreSQL. The snapshot.mode I use is initial, since I don't want to resnapshot just because the connector has been restarted. However, during development I want to restart the process, as the messages expire from Kafka before they have been read.
If I delete and recreate the connector via the Kafka Connect REST API, this doesn't do anything, as the information in the offset/status/config topics is preserved. I have to delete and recreate those topics while restarting the whole Connect cluster to trigger another snapshot.
Am I missing a more convenient way of doing this?
You will also need a new name for the connector, as well as a new database.server.name in the connector config, since that is what the offset information is stored against. It should be almost like deploying the connector for the first time again.
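As a rough illustration of that approach (every value below is a placeholder; the connector class matches the Postgres setup from the question, but names and credentials are made up): register a fresh connector with a new name and a new database.server.name, and Debezium will behave as if it were deployed for the first time, including the initial snapshot:

# Sketch only: re-register under a new connector name and a new
# database.server.name so Debezium (1.x config naming) treats it as a
# first-time deployment and takes the initial snapshot again.
# Every value below is a placeholder.
import requests

CONNECT_URL = "http://localhost:8083"    # assumed Kafka Connect REST endpoint

new_connector = {
    "name": "inventory-connector-v2",    # new connector name
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "postgres",
        "database.port": "5432",
        "database.user": "debezium",
        "database.password": "secret",
        "database.dbname": "inventory",
        "database.server.name": "inventory-v2",    # new logical server name
        "snapshot.mode": "initial",
    },
}

# Optionally remove the old connector first, then create the new one.
requests.delete(f"{CONNECT_URL}/connectors/inventory-connector")
requests.post(f"{CONNECT_URL}/connectors", json=new_connector).raise_for_status()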

Kafka connect-distributed mode fault tolerance not working

I have created a Kafka Connect cluster with 3 EC2 machines and started 3 connectors (Debezium Postgres source), one on each machine, each reading a different set of tables from the Postgres source. On one of the machines, I started the S3 sink connector as well. So the changed data from Postgres is being moved to the Kafka broker via the source connectors (3), and the S3 sink connector consumes these messages and pushes them to an S3 bucket.
The cluster is working fine, and so are the connectors. When I pause one of the connectors running on one of the EC2 machines, I expected its task to be taken over by another connector (postgres-debezium) running on another machine, but that's not happening.
I installed Kafdrop as well to monitor the brokers. I can see the 3 internal topics connect-offsets, connect-status and connect-configs getting populated with the necessary offsets, configs, and status (when I pause, a paused status message appears).
But somehow the connectors are not taking over the task when I pause one.
In what scenario does a connector take over the tasks of a failed one? Is pausing the right way to test this, or should we produce some error on one of the connectors so another one takes over?
Please guide.
Sounds like it's working as expected.
Pausing has nothing to do with the fault tolerance settings and it'll completely stop the tasks. There's nothing to rebalance until unpaused.
The fault tolerance settings for dead letter queue, skip+log, or halt are for when there are actual runtime exceptions in the connector that you cannot control through the API - for example, a database or S3 network/authentication exception, or a serialization error in the Kafka client.
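For context, those settings live in the connector configuration itself - a sketch of what that could look like for the S3 sink mentioned in the question (endpoint, connector name and topic name are placeholders; the dead letter queue options only apply to sink connectors):

# Sketch: Kafka Connect error-handling options applied to a sink connector
# config through the REST API. Endpoint and names are placeholders.
import requests

CONNECT_URL = "http://localhost:8083"    # assumed Kafka Connect REST endpoint
SINK_NAME = "s3-sink"                    # assumed sink connector name

error_handling = {
    # "none" (the default) fails the task on error; "all" skips bad records.
    "errors.tolerance": "all",
    # Log each error, optionally including the failed record itself.
    "errors.log.enable": "true",
    "errors.log.include.messages": "true",
    # Dead letter queue - sink connectors only.
    "errors.deadletterqueue.topic.name": "dlq-s3-sink",
    "errors.deadletterqueue.topic.replication.factor": "3",
}

# Merge with the existing config and write it back.
current = requests.get(f"{CONNECT_URL}/connectors/{SINK_NAME}/config").json()
requests.put(f"{CONNECT_URL}/connectors/{SINK_NAME}/config",
             json={**current, **error_handling}).raise_for_status()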

How can I know if I'm suffering data loss during Kafka Connect intermittent read-from-source issues?

We are running Kafka Connect using the Confluent JDBC Source Connector to read from a DB2 database. Periodically, we see issues like this in our Kafka Connect logs:
kafkaconnect-deploy-prod-967ddfffb-5l4cm 2021-04-23 10:39:43.770 ERROR Failed to run query for table TimestampIncrementingTableQuerier{table="PRODSCHEMA"."VW_PRODVIEW", query='null', topicPrefix='some-topic-prefix-', incrementingColumn='', timestampColumns=[UPDATEDATETIME]}: {} (io.confluent.connect.jdbc.source.JdbcSourceTask:404)
com.ibm.db2.jcc.am.SqlException: DB2 SQL Error: SQLCODE=-668, SQLSTATE=57007, SQLERRMC=1;PRODSCHEMA.SOURCE_TABLE, DRIVER=4.28.11
at com.ibm.db2.jcc.am.b7.a(b7.java:815)
...
at com.ibm.db2.jcc.am.k7.bd(k7.java:785)
at com.ibm.db2.jcc.am.k7.executeQuery(k7.java:750)
at io.confluent.connect.jdbc.source.TimestampIncrementingTableQuerier.executeQuery(TimestampIncrementingTableQuerier.java:200)
at io.confluent.connect.jdbc.source.TimestampIncrementingTableQuerier.maybeStartQuery(TimestampIncrementingTableQuerier.java:159)
at io.confluent.connect.jdbc.source.JdbcSourceTask.poll(JdbcSourceTask.java:371)
This appears to be an intermittent issue connecting to DB2, and is semi-expected; for reasons outside the scope of this question, we know that the network between the two is unreliable.
However, what we are trying to establish is whether in this circumstance data loss is likely to have occurred. I've found this article which talks about error handling in Kafka Connect, but it only refers to errors due to broken messages, not the actual connectivity between Kafka Connect and the data source.
In this case, how would we know if the failure to connect had caused data loss (i.e. records in our data source that were never processed into the target topic)? Would there be errors in the Kafka Connect log? Will Kafka Connect always retry indefinitely when it has a connectivity issue? Are there any controls over its retry behaviour?
(If it matters, Kafka Connect is version 2.5; it is deployed in a Kubernetes cluster, in distributed mode, but with only one actual running worker/container.)
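For readers following along, the failing query in the log implies a JDBC source configuration roughly like the one sketched below; it is reconstructed from the log for illustration only, so every value should be treated as an assumption rather than our actual production config:

# Rough reconstruction of the JDBC source settings implied by the log above
# (timestamp mode on UPDATEDATETIME against PRODSCHEMA.VW_PRODVIEW).
# All values are assumptions for illustration.
jdbc_source_config = {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:db2://db2-host:50000/PRODDB",    # placeholder
    "table.whitelist": "PRODSCHEMA.VW_PRODVIEW",
    "mode": "timestamp",
    "timestamp.column.name": "UPDATEDATETIME",
    "topic.prefix": "some-topic-prefix-",
    "poll.interval.ms": "5000",    # placeholder polling interval
}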

Spring Boot Kafka Streams application fails if one Kafka broker goes down

We are using a Spring Boot application to develop a Kafka Streams application. Until now we were using a single broker only, so we did not face any issues.
But a week ago we set up a cluster with 3 ZooKeepers and 3 Kafka brokers for higher availability.
We configured our application as follows:
spring.kafka.bootstrap-servers=x.x.x.x:9093,x.x.x.x:9093,x.x.x.x:9093
(the three brokers are referred to below as leader-1, leader-2 and leader-3)
So we tested the broker-down behaviour; below are the results.
Expected behavior: the application keeps running without issue, consuming and producing data.
Actual behavior: if we bring down any one broker, it throws a "broker not available" exception and after some time the application stops.
While analysing the cause, we found that the topic we consume from has leader-1 as its leader and the topic we produce to has leader-2. When I stop leader-1, we expected leadership to move to the next broker, but it does not.
Is this the default behaviour, or are we doing something wrong?
Can anyone please suggest how to overcome this issue?
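For what it's worth, this is roughly how the partition leadership can be inspected (using the confluent-kafka Python client; broker addresses and topic names below are placeholders) - if a topic was created with replication.factor=1 there is no other replica for leadership to move to when its broker goes down:

# Sketch: dump partition leaders/replicas for the topics involved, using the
# confluent-kafka Python client. Broker list and topic names are placeholders.
from confluent_kafka.admin import AdminClient

admin = AdminClient({"bootstrap.servers": "x.x.x.x:9093,x.x.x.x:9093,x.x.x.x:9093"})
metadata = admin.list_topics(timeout=10)

for topic in ("input-topic", "output-topic"):    # assumed topic names
    for pid, p in sorted(metadata.topics[topic].partitions.items()):
        # With a single replica, leadership cannot move when that broker is down.
        print(f"{topic} partition {pid}: leader={p.leader} "
              f"replicas={p.replicas} isrs={p.isrs}")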

Running Source Connector on Demand and Not Based on poll.interval.ms

I have a table that is updated once or twice a day, but I want the data to be pushed to Kafka immediately after the table is updated. Is it possible to avoid running the connector every poll.interval.ms and instead run it only after the table is updated (sync on demand, or trigger the sync in some other way after the table update)?
I apologize if this question is stupid... Can a sink connector run on one Kafka cluster but pull messages from another Kafka cluster and insert them into Postgres? I'm not talking about replicating messages from Cluster A to Cluster B and then inserting messages from Cluster B into Postgres. I'm talking about a connector running on Cluster B but pulling messages from Cluster A and writing them to Postgres.
Thanks!
If you use log-based change data capture (Debezium, etc.) then you capture changes as soon as they occur, without needing to re-query the database. If you use query-based CDC then you do have to query the database on a polling interval. For query-based vs log-based CDC, see this blog or talk.
One option would be to use the Kafka Connect REST API to control the connector - but you're kind of going against the streaming paradigm here and will start to find awkward edges in doing this. For example, when do you decide to pause the connector? How do you determine that it's ingested all the changes? etc.
Using log-based CDC is low-impact on the source system and commonly the route that people go.
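If you do go the REST API route, a sketch of what it could look like (endpoint and connector name are placeholders) - with the caveat above that deciding when to pause and resume is the awkward part:

# Sketch: pause/resume a connector through the Kafka Connect REST API,
# e.g. triggered by whatever job updates the table. Names are placeholders.
import requests

CONNECT_URL = "http://localhost:8083"    # assumed Kafka Connect REST endpoint
CONNECTOR = "jdbc-source-daily-table"    # assumed connector name

def resume_connector():
    # Start polling again right after the table has been updated.
    requests.put(f"{CONNECT_URL}/connectors/{CONNECTOR}/resume").raise_for_status()

def pause_connector():
    # Stop polling once you believe all changes have been ingested -
    # which is exactly the judgement call that is hard to automate.
    requests.put(f"{CONNECT_URL}/connectors/{CONNECTOR}/pause").raise_for_status()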
Kafka Connect does not run on your Kafka cluster. Kafka Connect runs as its own cluster. Physically, it can be co-located with the brokers for dev/sandbox purposes (this reference architecture is useful for production). See also this talk "Running Kafka Connect".
So in your example, "Cluster B" is actually a Kafka Connect cluster - and it would be configured to read from Kafka cluster "A", and that is fine.
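A sketch of that layout, with every hostname and name below a placeholder: the Connect workers ("Cluster B") have bootstrap.servers in their worker properties pointed at Kafka cluster "A", and a JDBC sink connector registered on them reads the topic from A and writes to Postgres:

# Sketch of the layout described above: a Kafka Connect cluster whose workers
# are configured (in their worker properties, not shown here) with
# bootstrap.servers pointing at Kafka cluster "A", running a JDBC sink that
# writes to Postgres. All hostnames, topic and connector names are placeholders.
import requests

CONNECT_URL = "http://connect-cluster-b:8083"    # the Connect cluster ("Cluster B")

sink = {
    "name": "postgres-sink",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "topics": "orders",    # topic that lives on Kafka cluster "A"
        "connection.url": "jdbc:postgresql://pg-host:5432/appdb",
        "connection.user": "app",
        "connection.password": "secret",
        "auto.create": "true",
    },
}

requests.post(f"{CONNECT_URL}/connectors", json=sink).raise_for_status()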