Kafka has multiple bootstrap servers but we connect to only one of them. Can we still publish data? - apache-kafka

Kafka has multiple bootstrap servers, e.g. b1.com, b2.com, b3.com. In the producer configuration, we are passing only b1.com as the bootstrap server. What will happen once we publish data to Kafka?
To my knowledge, publishing should fail if b1.com is not the leader, since Kafka only allows publishing data through the partition leader. Please guide me.

Even if b1.com is not the leader, you would still be able to publish data successfully. The reason is that once you connect to any one server, you get the complete metadata for your topic (partitions, their respective leaders, etc.).
That said, it is still recommended to provide all servers, because of the scenario where b1.com goes down. Since you provided only one server to your producer, it would then be unable to connect to Kafka at all, and your system effectively goes down.
On the other hand, if you had provided all the servers and your topic was replicated, the system would still be functional even with b1.com gone.
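As a sketch, a producer configuration along these lines lists every broker (the hostnames and port are taken from the question and are assumptions about your setup):

```properties
# List every broker so the producer can bootstrap even if b1.com is down.
# Any one reachable broker is enough to fetch the full cluster metadata,
# after which the producer talks to each partition leader directly.
bootstrap.servers=b1.com:9092,b2.com:9092,b3.com:9092
acks=all
```

Note that bootstrap.servers is only used for the initial metadata fetch; the full broker list is always discovered from the cluster itself.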

Related

Kafka Connect Hangs when Kafka Node Goes Down

We are testing out Kafka Connect, and in our testing noticed that when one of our Kafka nodes goes down or is unavailable, Kafka Connect goes down (hangs).
In our REST and distributed worker properties, our broker configuration looks like: dp-kafka-01:9092, dp-kafka-02:9092, dp-kafka-03:9092.
We are looking at possibly using a load balancer to maintain uptime, but I would be interested in seeing 1) if others have had this problem and 2) their solution to it.
Many Thanks.
Putting a load balancer in front of the Connect REST API will not stop the workers from hanging when a broker fails, because the workers connect to the brokers directly, not through the REST API.
You need to keep the Kafka cluster itself healthy (multiple brokers, replicated topics) to prevent the whole system from going down.

How to add health check for topics in KafkaStreams api

I have a critical Kafka application that needs to be up and running all the time. The source topics are created by the Debezium Kafka Connect connector for the MySQL binlog. Unfortunately, many things can go wrong with this setup: the Debezium connectors often fail and need to be restarted, and so does my app (without throwing any exception, it just hangs and stops consuming). My manual way of discovering a failure is to check the Kibana logs and then consume the suspicious topic through a terminal. I could mimic this in code, but that is obviously far from best practice. I wonder whether the Kafka Streams API offers a way to do such a health check, and to check other parts of the Kafka cluster?
Another point that bothers me is if I can keep the stream alive and rejoin the topics when connectors are up again.
You can check the Kafka Streams state to see whether it is rebalancing/running, which would indicate healthy operation. However, if no data is reaching the topology, no errors will surface there either, so you also need to check the health of your upstream dependencies.
Overall, it sounds like you should invest some time in monitoring tools like Consul or Sensu, which can run local service health checks and send out alerts when services go down, or at the very least Elasticsearch alerting.
As far as Kafka health checking goes, you can do it in several ways:
Are the broker and ZooKeeper processes running? (SSH to the node and check the processes)
Are the broker and ZooKeeper ports open? (use a socket connection)
Are there important JMX metrics you can track? (e.g. with Metricbeat)
Can you find an active controller broker? (use AdminClient#describeCluster)
Does a required minimum number of brokers show up in the cluster metadata? (also obtainable from AdminClient)
Do the topics you use have the proper configuration (retention, min-ISR, replication factor, partition count, etc.)? (again, use AdminClient)
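The port check from the list above can be sketched with nothing but the standard library; the hostnames and ports in the commented examples are hypothetical placeholders for your own cluster:

```python
import socket

def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeout.
        return False

# Example probes against hypothetical endpoints:
# is_port_open("b1.com", 9092)   # Kafka broker
# is_port_open("b1.com", 2181)   # ZooKeeper
```

A successful TCP connect only proves the process is listening, not that the broker is fully healthy, so combine this with the AdminClient checks above.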

if Schema-Registry is down, does that mean Kafka will have downtime?

So we have a Kafka cluster with a Schema Registry on top of it to validate schemas for topics. If the Schema Registry goes down for maintenance, will Kafka have downtime for that duration and stop accepting new incoming data?
Kafka consumers and producers cache the schemas they retrieve from the schema registry internally. The Schema Registry is only contacted when a record is sent/received for which no schema was previously seen.
So as long as you don't start any new consumers or producers, and don't send records with schemas that haven't been sent before, you should be fine.
Take this with a grain of salt, though: I've looked through the code and run a quick test with the console consumer and producer, and could still produce and consume after killing the Schema Registry, but there may be cases where it still fails.
Update:
It occurred to me today that I probably answered your question too literally, instead of trying to understand what you are trying to do :)
If you want to enable maintenance windows on your Schema Registry, it might be worthwhile looking into running two or more Schema Registry instances in parallel and configuring all of them in your producers and consumers.
One of them will be elected master, and write requests for schemas will be forwarded to that instance. That way you can perform rolling restarts when you need maintenance windows.
The KafkaAvroSerializer and deserializer maintain a schema ID cache.
So as long as no new producers and consumers come online, you would see no errors.
If the registry is down, Kafka itself will have no downtime, but you will start to see network exceptions in the clients, since they use HTTP to talk to the registry, whereas Kafka uses its own TCP protocol.
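The caching behaviour described above can be illustrated with a toy model (this mimics the idea of the KafkaAvroSerializer's schema ID cache, not its actual implementation): the registry is only consulted the first time a given schema is seen.

```python
class SchemaIdCache:
    """Toy model of a client-side schema ID cache.

    `fetch_id` stands in for the HTTP call to the Schema Registry; it is
    only invoked the first time a given schema string is encountered.
    """

    def __init__(self, fetch_id):
        self._fetch_id = fetch_id   # hypothetical registry lookup callable
        self._cache = {}

    def id_for(self, schema: str) -> int:
        if schema not in self._cache:
            # Only here does the (possibly down) registry get contacted.
            self._cache[schema] = self._fetch_id(schema)
        return self._cache[schema]


calls = []

def fake_registry_lookup(schema):
    calls.append(schema)   # record each simulated HTTP round-trip
    return len(calls)      # pretend the registry assigned this ID

cache = SchemaIdCache(fake_registry_lookup)
cache.id_for('{"type": "string"}')   # first use: registry contacted
cache.id_for('{"type": "string"}')   # cached: registry NOT contacted
print(len(calls))                    # -> 1
```

If `fake_registry_lookup` raised an exception (the registry being down), only the first, uncached lookup would fail, which matches the behaviour described in the answer.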

Kafka: What happens when the entire Kafka Cluster is down?

We're testing out the Producer and Consumer using Kafka. A few questions:
What happens when all the brokers are down and they're not responding at all?
Does the Producer need to keep pinging the Kafka brokers to know when it is back up online? Or is there a more elegant way for the Producer application to know?
How does Zookeeper help in all this? What if the ZK is down as well?
If one or more brokers are down, the producer will retry for a certain period of time (based on its settings), and during this time consumers of partitions led by those brokers will not be able to read anything until the brokers are back up.
But if the cluster is down for longer than your total retry period, you will probably need a way to resend those failed messages.
This is one scenario where Kafka mirroring (the MirrorMaker tool) comes into the picture.
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27846330
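The retry window mentioned above is governed by a few producer settings; a minimal sketch, where the values shown are the defaults in recent Java clients (verify against your client version):

```properties
# Total time the producer will keep retrying a send, including backoff,
# before reporting the record as failed to the application.
delivery.timeout.ms=120000
# Individual retries are effectively unlimited within that window.
retries=2147483647
retry.backoff.ms=100
```

Once delivery.timeout.ms is exhausted, the send callback receives an error and it is up to your application to buffer or resend the record.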
The producer will fail because the cluster is unavailable: sends will eventually time out with an error from the Kafka client, and depending on how your application handles that, messages will pile up in its local send queue.
As for ZooKeeper: I'm sure that if ZooKeeper is down, your system will not work anymore. This is one of the weaknesses of Kafka: it needs ZooKeeper to work.

Basic kafka topic availability checker

I need a simple health checker for Apache Kafka. I don't want something large and complex like Yahoo Kafka Manager; basically I want to check whether a topic and a consumer are healthy.
My first idea was to create a separate heart-beat topic and periodically send and read messages to/from it in order to check availability and latency.
The second idea is to read all the data from Apache ZooKeeper. I can get all brokers, partitions, topics, etc. from ZK, but I don't know whether ZK can provide anything like failure-detection info.
As I said, I need something simple that I can use in my app health checker.
Some existing tools you can try out, if you haven't yet:
Burrow: LinkedIn's Kafka consumer lag checker
Exhibitor: Netflix's ZooKeeper co-process for instance monitoring, backup/recovery, cleanup, and visualization
Kafka System Tools: Kafka's built-in command line tools
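The heartbeat-topic idea from the question can also be sketched independently of any Kafka client: publish a timestamped probe, read it back, and report the round-trip latency. Here `send` and `receive` are placeholder callables you would wire to a real producer and consumer; the in-memory queue below only demonstrates the flow.

```python
import time

def heartbeat_latency(send, receive, timeout: float = 5.0):
    """Publish a timestamped probe via `send`, poll `receive` until it
    comes back, and return the round-trip latency in seconds.
    Returns None if the probe is not seen within `timeout` (unhealthy)."""
    probe = f"hb-{time.time()}"
    sent_at = time.monotonic()
    send(probe)
    deadline = sent_at + timeout
    while time.monotonic() < deadline:
        msg = receive()  # e.g. poll the heartbeat topic in a real setup
        if msg == probe:
            return time.monotonic() - sent_at
        time.sleep(0.01)
    return None

# In-memory stand-in for the heartbeat topic, just to show the flow:
queue = []
latency = heartbeat_latency(queue.append,
                            lambda: queue.pop(0) if queue else None)
print(latency is not None)   # -> True
```

In a real deployment, `send` would be a producer publishing to a dedicated heartbeat topic and `receive` a consumer polling it; a None result or a high latency is then your unhealthy signal.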