In kafka connect SinkTask implementation, If there is an exception which is unavoidable, if I invoke stop() method from my code, does the connector stop altogether ?
Only that task that encountered the exception will stop
The connect cluster can have multiple connectors which won't stop, and there can be other tasks for a single connect depending on your configurations that, for example, would be assigned different, clean, data that could be processed
Related
Some weeks ago my project has been updated to use Kafka 3.2.1 instead of using the one coming with Spring Boot 2.7.3 (3.1.1). We made this upgrade to avoid an issue in Kafka streams – Illegal state and argument exceptions were not ending in the uncaught exception handler.
On the consumer side, we also moved to the cooperative sticky assignator.
In parallel, we started some resiliency tests and we started to have issues with Kafka records that are not consumed anymore on some partitions when using a Kafka batch listener. The issue occurred after several rebalances caused by the test (deployment is done in Kubernetes and we stopped some pods, micro services and broker instances). The issue not present on every listeners. Kafka brokers and micro-services are up and running.
During our investigations,
we enabled Kafka events and we can clearly see that the consumer is started
we can see in the logs that the partitions that are not consuming events are assigned.
debug has been enabled on the KafkaMessageListenerContainer. We see a lot of occurrences of Receive: 0 records and Commit list: {}
Is there any blocking points to use Kafka 3.2.1 with Spring Boot/Kafka 2.7.3/2.8.8?
Any help or other advices are more than welcome to progress our investigations.
Multiple listeners are defined, the retry seems to be fired from another listener (shared err handler?).
This is a known bug, fixed in the next release:
https://github.com/spring-projects/spring-kafka/issues/2382
https://github.com/spring-projects/spring-kafka/commit/3de1e89ba697ead04de171cfa35273bb0daddbe6
Temporary work around is to give each container its own error handler.
I have created kafka connect cluster with 3 EC2 machines and started 3 connectors ( debezium-postgres source) on each machine reading a different set of tables from postgres source. In one of the machines, I started the s3 sink connector as well. So the changed data from postgres is being moved to kafka broker via source connectors (3) and S3 sink connector consumes these messages and pushes them to S3 bucket.
The cluster is working fine and so are the connectors too. When I pause any of the connectors running on one of EC2 machine, I was expecting that its task should be taken by another connector (postgres-debezium) running on another machine. But that's not happening.
I installed kafdrop as well to monitor the brokers. I see 3 internal topics connect-offsets, connect-status and connect-configs are getting populated with necessary offsets, configs, and status too ( when I pause, status paus message appears).
But somehow connectors are not taking the task when I paused.
Let me know in what scenario connector takes the task of other failed one? Is pause is the right way? or we should produce some error on one of the connectors then it takes.
Please guide.
Sounds like it's working as expected.
Pausing has nothing to do with the fault tolerance settings and it'll completely stop the tasks. There's nothing to rebalance until unpaused.
The fault tolerance settings for dead letter queue, skip+log, or halt are for when there are actual runtime exception in the connector that you cannot control through the API. For example, a database or S3 network / authentication exception, or serialization error in the Kafka client
I am running a sinkTask using connect-standalone.sh and connect-standalone.properties. I am doing this in a shell script and I am not sure how to stop the sinkTask once the data is consumed by the consumer.
I tried various settings in the properties file like connections.max.idle.ms=5000. But nothing is stopping the sink.
I don't want to try the distributed mode as it requires REST API calls. Any suggest to stop the sinkTask once the messages in the producer are empty?
When running in standalone, the only way to stop a connector is to stop the connect process you started with connect-standalone.sh.
If you want to often start and stop connectors, I'd recommend you to reconsider distributed mode as it makes controlling the life cycle of connectors easy to manage via the REST API.
I have a critical Kafka application that needs to be up and running all the time. The source topics are created by debezium kafka connect for mysql binlog. Unfortunately, many things can go wrong with this setup. A lot of times debezium connectors fail and need to be restarted, so does my apps then (because without throwing any exception it just hangs up and stops consuming). My manual way of testing and discovering the failure is checking kibana log, then consume the suspicious topic through terminal. I can mimic this in code but obviously no way the best practice. I wonder if there is the ability in KafkaStream api that allows me to do such health check, and check other parts of kafka cluster?
Another point that bothers me is if I can keep the stream alive and rejoin the topics when connectors are up again.
You can check the Kafka Streams State to see if it is rebalancing/running, which would indicate healthy operations. Although, if no data is getting into the Topology, I would assume there would be no errors happening, so you need to then lookup the health of your upstream dependencies.
Overall, sounds like you might want to invest some time into using monitoring tools like Consul or Sensu which can run local service health checks and send out alerts when services go down. Or at the very least Elasticseach alerting
As far as Kafka health checking goes, you can do that in several ways
Is the broker and zookeeper process running? (SSH to the node, check processes)
Is the broker and zookeeper ports open? (use Socket connection)
Are there important JMX metrics you can track? (Metricbeat)
Can you find an active Controller broker (use AdminClient#describeCluster)
Are there a required minimum number of brokers you would like to respond as part of the Controller metadata (which can be obtained from AdminClient)
Are the topics that you use having the proper configuration? (retention, min-isr, replication-factor, partition count, etc)? (again, use AdminClient)
We are using Kafka streams (0.11.0.1) api to consume events from topic. But whenever there is a Kafka Broker outage/failover, we need to restart all Kafka streamers to recover from following error:
"Connection to node 39366 could not be established. Broker may not be available."
Just wondering if it is really required for streamers to do streams close and restart? Why streamers are not able recover from this issue automatically? Or are we missing any configuration in client/Broker?
Now we are planning to introduce code changes to handle all stream exception and trigger an automated restart of streams. But am really worried if that is the right way to handle this scenario.
If you think in a real world use case where hundreds of clients connected to brokers and restarting each of them, its not making any sense.