Kafka Connect Hangs when Kafka Node Goes Down - apache-kafka

we are testing out kafka connect and in our testing noticed that when one of our kafka nodes goes down or is unavailable, kafka connect goes down (hangs).
In our rest and distributed properties, our broker configuration looks like: dp-kafka-01:9092, dp-kafka-02:9092, dp-kafka-03:9092.
We are looking at possibly using a load balancer to maintain uptime but I would be interested in seeing 1) if others have had this problem 2) their solution to it.
Many Thanks.

Putting a load balancer in front of the Connect REST API will not stop them from crashing upon broker failure.
You need to administer Kafka better to prevent the whole system from going down.

Related

Kafka has multiple bootstrap server but we will connect only one server. Can we still publish data to it?

Kafka has multiple bootstrap server like b1.com, b2.com, b3.com. While Producer Configuration, we are passing only b1.com as bootstrap server. What will happen once we will publish data to kafka?
As of my knowledge, it should not allow to publish the data if b1.com is not leader as kafka allow publishing data through leader only. Please guide me.
Even if b1.com is not the leader, you would still be able to publish data successfully. The reason being once you connect to a server, you can get the complete metadata of your topic (partitions, their respective leaders etc).
That being said, it is still recommended to provide all servers. Reason for this is the scenario where b1.com goes down. Now since you provided only one server to your producer, it will not be able to connect to kafka and your system effectively goes down.
On the other hand, if you had provided all the servers and assuming your topic was replicated - the system would still be functional even if b1.com had gone down.

Kafka Cluster cotinues to run without zookeeper

I have a five node kafka cluster(confluent 5.5 community edition) with 3 zookeeper nodeseach on different aws instances.
While doing failover testing , noticed that the kafka cluster works fine even if all zookeeper nodes are down.
I was able to produce , consume and also create new consumers.
why does the kafka cluster not stop if it cannot connect to any zookeeper nodes ?
What would be the possible issues if we are unaware of such a failure scenario in production and kafka cluster continues to run without zookeeper connectivity ?
how do we handle such a scenario ?
Broker leader election, topic creation, simple ACLs (if you use them) still depend on Zookeeper. For other basic functions relying on the Kafka bootstrap protocols, they might still work, sure. There should definitely be broker logs indicating connection was lost
Ideally you'd have basic process healthchecking and incident management software that you shouldn't miss critical services going down in prod
How to handle? Restart Zookeeper...

How to handle failure senario for kafka and zookeeper in kubernetes

What I have zookeeper setup which is running on server1, server2 and server3 and similarly kafka also running in server1, server2 and server3.
Setup are running in kubernetes.
Problem statement:
In case one zookeeper setup get down entire setup will get down, because kafka is depended to zookeeper. am i right?
If Q1 correct - Is there any way to make setup like if one zookeeper server will get down then kafka should run as it is?
How to expose kafka port in kubernetes setup ?
what is the recommended way to persist data in kubernetes for production server ?
I fail to see how Zookeeper questions are related to k8s... But you definitely should set affinity rules such that Zookeeper and Kafka are not on the same physical servers or sharing same disks
If one Zookeeper out of three goes down, you'll end up with a split brain event in that no single Zookeeper knows which should be responsible for leadership. This effectively can crash or corrupt Kafka, yes.
To mitigate that risk, you can choose to run 5 Zookeepers, in which case you can lose up to 3 servers to reach the same state. The Definitive Guide book covers these concepts in the first few chapters
Regarding the other questions - NodePorts and PVCs, generally speaking.
Use one of the popular Kafka Operators on Github and you'll not need to think too hard about setting those properties
You still must manually perform Kafka admin tasks in any installation... You can use extra services like Cruise Control if you want to reduce that workload, though

How to add health check for topics in KafkaStreams api

I have a critical Kafka application that needs to be up and running all the time. The source topics are created by debezium kafka connect for mysql binlog. Unfortunately, many things can go wrong with this setup. A lot of times debezium connectors fail and need to be restarted, so does my apps then (because without throwing any exception it just hangs up and stops consuming). My manual way of testing and discovering the failure is checking kibana log, then consume the suspicious topic through terminal. I can mimic this in code but obviously no way the best practice. I wonder if there is the ability in KafkaStream api that allows me to do such health check, and check other parts of kafka cluster?
Another point that bothers me is if I can keep the stream alive and rejoin the topics when connectors are up again.
You can check the Kafka Streams State to see if it is rebalancing/running, which would indicate healthy operations. Although, if no data is getting into the Topology, I would assume there would be no errors happening, so you need to then lookup the health of your upstream dependencies.
Overall, sounds like you might want to invest some time into using monitoring tools like Consul or Sensu which can run local service health checks and send out alerts when services go down. Or at the very least Elasticseach alerting
As far as Kafka health checking goes, you can do that in several ways
Is the broker and zookeeper process running? (SSH to the node, check processes)
Is the broker and zookeeper ports open? (use Socket connection)
Are there important JMX metrics you can track? (Metricbeat)
Can you find an active Controller broker (use AdminClient#describeCluster)
Are there a required minimum number of brokers you would like to respond as part of the Controller metadata (which can be obtained from AdminClient)
Are the topics that you use having the proper configuration? (retention, min-isr, replication-factor, partition count, etc)? (again, use AdminClient)

ActiveMQ Deployment model

I have gone through (not fully) ActiveMQ and tried to figure out the deployment model for my application.
I am bit confused on that.
I want to make the system High Availability and decided to use the following. Please correct me if anything is wrong or disadvantage of the model.
Deployment Modle:
Will deploy Brokers in M1 and M2 respectivley.
Use Hardware load balancer (Either F5 or Zeus) to connect either one of the broker (M1 or M2) based on the load.
Want to publish a message using Load balancer URL.
I have gone through network of brokers and we need to mintain some topology. I fell which makes the system more complicated if system grows horizontally. So it is better to have one load balancer to distribute the load.
Questions
Is this above model will send message to any one of the Broker?
Consumer Will be deployed in Tomcat (Think i need to use embeded brokers to configure either M1 or M2). Is it possible to use Load balancer URL instaed of M1 or M2?
Is it possible to have single Web Console Admin to monitor both M1 and M2.
Do we have any performance issue using Spring's feature to consume message.
Sorry to shoot out so many questions. Please help me to correct the deployment model.
I think the best way to get some load balancing with some activemq servers is having a : network of brokers and your consumers/producers (in your webapps) should use some failover
So if a producer p1 send a message on a queue on broker 1, the consumer c1 can read the message on broker 2.
[Edit] I have never tried to add some hardware balancer instead of the activemq protocole failover. It should work : just try it and tell us.
3- I do not think it is possible to have only one Web Console to monitor both of your brokers.
4- As far as I am concerned I do not have any performance issue with my Spring configuration.
There are a lot of questions there.
The first thing is to do is start simple. If your application's load is being handled with just one broker, consider setting up high availability through a master-slave setup. For this you do not need a load balancer - the ActiveMQ client library has a failover mechanism where you can define the URLs to a set of brokers that the client should attempt to connect to.
If you are looking at setting up an infrastructure where one broker will not be able to deal with the message load (you can test the maximum throughput of your broker using the performance module), I would advise you to read up on how networks of brokers work. If you do go down this path, you really need to understand ActiveMQ.
On monitoring, a web console can only show you the internals of a single broker. To get insight around what is going on around a set of brokers you will need a monitoring tool such as FuseHQ/Hyperic that is able to aggregate JMX information from a number of boxes.
Performance with Spring is not a problem as long as you configure it correctly (see the section on PooledConnectionFactory).
I see that you are a new user, so if this answers your question, please tick it.