What is bootstrap-server in Kafka config?

I have just started learning Kafka and keep coming across the term bootstrap-server.
Which server does it represent in my Kafka cluster?

It is the address (host:port) of one of the Kafka brokers, which you provide so that the client can fetch the initial metadata about your Kafka cluster.
The metadata consists of the topics, their partitions, the leader brokers for those partitions, etc.
Based on this metadata, your producer or consumer knows which brokers to produce to or consume from.
You can list multiple bootstrap servers in your producer or consumer configuration, so that if one of the brokers is not accessible, the client falls back to another.
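To illustrate the fallback idea, here is a minimal sketch in plain Java. It is only a simplified model, not the Kafka client's actual selection logic: the class BootstrapFallbackSketch, the method firstReachable and the reachable predicate are all hypothetical names made up for this example, and the host names are placeholders.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical helper, not part of the Kafka client API.
class BootstrapFallbackSketch {

    // Returns the first bootstrap address for which `reachable` is true,
    // or an empty Optional if none of the listed brokers can be reached.
    static Optional<String> firstReachable(List<String> bootstrapServers,
                                           Predicate<String> reachable) {
        return bootstrapServers.stream().filter(reachable).findFirst();
    }

    public static void main(String[] args) {
        List<String> bootstrap = List.of("kafka001:9092", "kafka002:9092");
        // Assumption for the demo: pretend kafka001 is down.
        Predicate<String> reachable = addr -> !addr.startsWith("kafka001");
        // Prints "kafka002:9092" because the first entry is skipped.
        System.out.println(firstReachable(bootstrap, reachable).orElse("none"));
    }
}
```

With only one address in the list, the same call would return an empty result as soon as that one broker is down, which is why listing more than one is usually recommended.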

We know that a Kafka cluster can have hundreds or thousands of brokers (Kafka servers). But how do we tell clients (producers or consumers) which ones to connect to? Should we specify all of those brokers in the client configuration? No, that would be troublesome and the list would be very long. Instead, we can pick two or three brokers and treat them as bootstrap servers that a client connects to initially. Whichever of those brokers is alive then returns the cluster metadata, which points the client to the brokers it actually needs.
So bootstrap.servers is a configuration we place within clients, which is a comma-separated list of host and port pairs that are the addresses of the Kafka brokers in a "bootstrap" Kafka cluster that a Kafka client connects to initially to bootstrap itself.
A host and port pair uses : as the separator.
localhost:9092
localhost:9092,another.host:9092
So as mentioned, bootstrap.servers provides the initial hosts that act as the starting point for a Kafka client to discover the full set of alive servers in the cluster.
Special Notes:
Since these servers are just used for the initial connection to discover the full cluster membership (which may change dynamically), this list does not have to contain the full set of servers (you may want more than one, though, in case a server is down).
Clients (producers or consumers) make use of all servers irrespective of which servers are specified in bootstrap.servers for bootstrapping.
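The list format described above can be sketched in plain Java as follows. This is not how the Kafka client itself parses the setting; BootstrapListParserSketch and parse are hypothetical names used only for illustration, and the sketch assumes simple host:port entries (no IPv6 literals).

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Kafka client API.
class BootstrapListParserSketch {

    // Splits "host1:port1,host2:port2" into unresolved socket addresses.
    static List<InetSocketAddress> parse(String bootstrapServers) {
        List<InetSocketAddress> addresses = new ArrayList<>();
        for (String entry : bootstrapServers.split(",")) {
            String pair = entry.trim();
            if (pair.isEmpty()) {
                continue; // tolerate a trailing comma
            }
            int colon = pair.lastIndexOf(':');
            if (colon <= 0 || colon == pair.length() - 1) {
                throw new IllegalArgumentException("Expected host:port but got: " + pair);
            }
            String host = pair.substring(0, colon);
            int port = Integer.parseInt(pair.substring(colon + 1));
            // createUnresolved avoids a DNS lookup while parsing.
            addresses.add(InetSocketAddress.createUnresolved(host, port));
        }
        return addresses;
    }

    public static void main(String[] args) {
        for (InetSocketAddress a : parse("localhost:9092,another.host:9092")) {
            System.out.println(a.getHostString() + " -> " + a.getPort());
        }
    }
}
```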

bootstrap.servers is a comma-separated list of host and port pairs that are the addresses of the Kafka brokers in a "bootstrap" Kafka cluster that a Kafka client connects to initially to bootstrap itself.
Kafka broker
A Kafka cluster is made up of multiple Kafka brokers. Each Kafka broker has a unique ID (number). Kafka brokers contain topic log partitions. Connecting to one broker bootstraps a client to the entire Kafka cluster. For failover, you want to start with at least three to five brokers. A Kafka cluster can have 10, 100, or 1,000 brokers if needed.

Related

Impact of publishing messages to one Kafka broker within a cluster

I have a Kafka cluster with three brokers and ZooKeeper instances, with a replication factor of 2 for each partition.
I want to understand the impact of publishing messages to a single node in the cluster by giving one broker address. Will this broker send messages to other brokers if the messages belong to partitions held by other brokers?
Can someone explain how the internal sync works, or point me to resources?
"giving one broker address"
Even if you give only one address, the bootstrap protocol returns all brokers to the client.
The partitioner logic determines which partition, and therefore which broker, the data is sent to. In the client, you target partitions, not brokers.
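A rough sketch of that routing idea in plain Java: a key is mapped to a partition, and the partition is mapped to its leader broker. Note the assumptions: the Java client's default partitioner hashes the serialized key with murmur2, whereas this sketch substitutes String.hashCode for brevity, and PartitionRoutingSketch, partitionFor, leaderFor and the leader map are hypothetical names and data.

```java
import java.util.Map;

// Hypothetical model of "target partitions, not brokers".
class PartitionRoutingSketch {

    // Simplified stand-in for a key-based partitioner (not murmur2).
    static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit so the result is never negative.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    // Looks up the broker ID that leads the partition chosen for `key`.
    // Assumes partitions are numbered 0..size-1 in the map.
    static int leaderFor(String key, Map<Integer, Integer> leaderByPartition) {
        int partition = partitionFor(key, leaderByPartition.size());
        return leaderByPartition.get(partition);
    }

    public static void main(String[] args) {
        // Assumed metadata: partition -> leader broker ID.
        Map<Integer, Integer> leaders = Map.of(0, 1, 1, 2, 2, 3);
        String key = "order-42";
        System.out.println("partition " + partitionFor(key, leaders.size())
                + " is led by broker " + leaderFor(key, leaders));
    }
}
```

The broker address you gave for bootstrapping plays no part in this calculation; only the metadata (which broker leads which partition) does.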

What is the difference between Kafka Cluster and Kafka Broker?

Do Kafka cluster and Kafka broker have the same meaning?
I know a cluster has multiple brokers (is this wrong?).
But when I write code to produce messages, I find one option awkward.
props.put("bootstrap.servers", "kafka001:9092, kafka002:9092, kafka003:9092");
Is this a broker address or a cluster address? If it is a broker address, I think it is not good, because we have to modify the address above whenever the broker count changes.
(But it seems to be a broker address.)
Additionally, I saw that in Amazon MSK we can add a broker to each AZ.
That means we cannot have many brokers (three or four at most?).
And they advise writing these broker addresses into the bootstrap.servers option as a comma-separated list.
Why don't they advise us to use a cluster address or ARN?
A Kafka cluster is a group of Kafka brokers.
When using the Producer API, it is not required to list all brokers within the cluster in the bootstrap.servers property. The Producer configuration documentation on bootstrap.servers gives the full details:
A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. The client will make use of all servers irrespective of which servers are specified here for bootstrapping—this list only impacts the initial hosts used to discover the full set of servers. This list should be in the form host1:port1,host2:port2,.... Since these servers are just used for the initial connection to discover the full cluster membership (which may change dynamically), this list need not contain the full set of servers (you may want more than one, though, in case a server is down).
All brokers within a cluster share metadata about the other brokers in the same cluster. Therefore, it is sufficient to list even just one broker in the bootstrap.servers property. However, you should still list more than one in case that broker is unavailable for whatever reason.
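As a small sketch, the property can be built from a subset of the brokers using only java.util.Properties. The helper class ProducerPropsSketch and its method withBootstrap are hypothetical; in a real producer you would add serializer settings and pass the resulting Properties to the KafkaProducer constructor, which is left out here so the example needs no Kafka dependency.

```java
import java.util.Properties;

// Hypothetical helper that only builds the configuration object.
class ProducerPropsSketch {

    // Joins the given host:port pairs into the bootstrap.servers value.
    static Properties withBootstrap(String... hostPorts) {
        Properties props = new Properties();
        props.put("bootstrap.servers", String.join(",", hostPorts));
        return props;
    }

    public static void main(String[] args) {
        // Two of the brokers are listed; the client discovers the rest.
        Properties props = withBootstrap("kafka001:9092", "kafka002:9092");
        // Prints "kafka001:9092,kafka002:9092".
        System.out.println(props.getProperty("bootstrap.servers"));
    }
}
```

Adding or removing brokers later does not require changing this value, as long as at least one listed broker stays reachable.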

Is it possible to produce to a Kafka topic when only 1 of the brokers is reachable?

Is it possible to produce to a Kafka topic when only 1 of the brokers is reachable from the producer, none of the zookeeper nodes are reachable from the producer, but all of the brokers are healthy and are reachable from each other?
For example, this would be required if I were to produce messages via an SSH tunnel. If this were for a temporary push I could possibly create the topic with replication factor 1 and have all partitions assigned to the broker in question, and reassign the partitions after the fact, but I'm hoping there is a more flexible setup.
This is all using the java client.
Producers don't interact with ZooKeeper, so that is not an issue.
The only requirement for producers is to be able to connect to the brokers that are leaders for the partitions they want to use.
If the broker you can connect to is the leader for the partitions you want to use, then yes, you can produce to it.
Otherwise it's not going to work. Creating a topic may not help either, as its partitions could be assigned to any broker. Also, in order to create a topic, a client has to connect to the controller, which may not be the broker you can reach.
If you can only connect to 1 "thing", you may want to consider using something like a REST Proxy. Your "isolated" environment could send REST requests to the proxy which is able to connect to all brokers in the cluster.
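The leader requirement can be modelled in a few lines of plain Java. This is only an illustration under assumed data: ReachableLeadersSketch, writablePartitions, the partition-to-leader map and the broker IDs are all hypothetical, and nothing here talks to a real cluster.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model: which partitions can a producer write to
// when only some brokers are reachable from it?
class ReachableLeadersSketch {

    static Set<Integer> writablePartitions(Map<Integer, Integer> leaderByPartition,
                                           Set<Integer> reachableBrokers) {
        Set<Integer> writable = new TreeSet<>();
        for (Map.Entry<Integer, Integer> e : leaderByPartition.entrySet()) {
            // A partition is writable only if its leader can be reached.
            if (reachableBrokers.contains(e.getValue())) {
                writable.add(e.getKey());
            }
        }
        return writable;
    }

    public static void main(String[] args) {
        // Assumed metadata: partition -> leader broker ID.
        Map<Integer, Integer> leaders = Map.of(0, 1, 1, 2, 2, 3);
        // Only broker 1 is reachable, e.g. through an SSH tunnel.
        System.out.println(writablePartitions(leaders, Set.of(1))); // prints [0]
    }
}
```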

How many connections are made by producer and consumer to a Kafka cluster?

Can anyone shed some light on the number and type of connections made by Kafka Java producers and Kafka Java consumers to a Kafka cluster?
Is the number of connections based on the number of topics, partitions, or brokers in the cluster?
Each consumer/producer needs to be connected to the broker that is the leader for the partition it wants to read from or write to.
This means that a client doesn't need to be connected to all brokers in a cluster, just to the brokers needed for reading/sending messages.
During the initial configuration, we provide a list of brokers to connect to (which could even be only one). Using those broker(s), the client gets metadata about the topics/partitions it wants to use and where they are placed (possibly on other brokers in the cluster). Connections to those brokers need to be in place for the client to work with the desired topics/partitions.
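As a back-of-the-envelope model of this, the brokers a client needs for data traffic are the distinct leaders of the partitions it uses. The sketch below is plain Java with hypothetical names (ConnectionCountSketch, brokersNeeded) and assumed metadata; it ignores any other connections a real client may open, such as the one used for bootstrapping.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model: distinct leader brokers for the partitions in use.
class ConnectionCountSketch {

    static Set<Integer> brokersNeeded(Map<Integer, Integer> leaderByPartition,
                                      List<Integer> partitionsInUse) {
        Set<Integer> brokers = new TreeSet<>();
        for (Integer partition : partitionsInUse) {
            Integer leader = leaderByPartition.get(partition);
            if (leader != null) { // skip partitions missing from the metadata
                brokers.add(leader);
            }
        }
        return brokers;
    }

    public static void main(String[] args) {
        // Assumed metadata: partition -> leader broker ID.
        Map<Integer, Integer> leaders = Map.of(0, 1, 1, 2, 2, 1, 3, 3);
        // The client uses partitions 0 and 2, both led by broker 1.
        System.out.println(brokersNeeded(leaders, List.of(0, 2)).size()); // prints 1
    }
}
```

So in this model the count follows the leaders of the partitions in use, not the total number of topics or brokers.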

How to set up/connect distributed Kafka brokers, producers and consumers on a local network?

I have set up Apache Kafka and confirmed that producers and consumers work on localhost.
How to set up Kafka so that:
multiple producers feed messages into a broker on a network computer
many consumers on the network can consume the messages from the broker
I noticed the following line in server.properties, which is used to start the Kafka server: zookeeper.connect=localhost:2181. Does this setting specify which addresses the server listens on, or does it specify the server's own address/port on the network?
ZooKeeper is used internally by Kafka to coordinate the cluster (leader election). In versions of Kafka before 0.8, ZK was the exclusive store for consumer offsets (what has been consumed so far), but from 0.8.1, I think, you can choose whether to store offsets in ZK or in a special Kafka topic called __consumer_offsets.
What you're interested in are the advertised.host.name and advertised.port settings, which are what Kafka exposes to clients (or "what addresses it listens to", as you put it).
zookeeper.connect is the address of the ZooKeeper server that Kafka connects to. The documentation for the broker configuration can be found at http://kafka.apache.org/documentation.html#brokerconfigs