Kafka bootstrap.servers config value as subset of brokers


For the Kafka bootstrap.servers config, should one provide the details of all brokers?
If not, how will the Kafka producer/consumer discover all the available brokers?
If yes, do producers/consumers need a restart after a node is added?

The bootstrap.servers config is mandatory for producers and consumers, and it must contain at least one broker host:port so that the producer/consumer can connect to the cluster and then fetch all the other metadata (where topics are, which brokers are the leaders, and so on).
If you add a node, you don't need to restart producers/consumers, but of course the current configuration will be stale. I mean...
Suppose you have a cluster with broker1 and broker2, and you put both of them in the bootstrap.servers config; the producer/consumer then connects to the cluster.
Now imagine that you add broker3 and some new topics/partitions are deployed there. Your producer/consumer will be able to write to/read from these new topics/partitions, because the information about them (including the fact that they are on broker3) is available by requesting metadata from broker1 or broker2.
Of course, if you shut down broker1 and broker2 so that your cluster is made up of broker3 alone, you'll have to update the bootstrap.servers config to list broker3, because a producer/consumer (re)started with the old config has no reachable broker to bootstrap from.
The bootstrap.servers config is just (as the name implies) a configuration used at startup; after that, each client connects directly to whichever broker it needs (for reading/writing topic partitions there), even if that broker isn't in the bootstrap.servers config.
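The bootstrap behaviour described above can be modelled in a few lines. This is a hypothetical, in-memory toy rather than the Kafka client API: ToyCluster, ToyClient, metadata_from() and leader_for() are invented names. It shows a client bootstrapped with only broker1 and broker2 learning from metadata that a partition lives on broker3, and a new client whose bootstrap list contains only brokers that are gone failing to start.

```python
from typing import Dict, List


class ToyCluster:
    """Hypothetical cluster state: which brokers are up, and who leads each partition."""

    def __init__(self) -> None:
        self.up: Dict[str, bool] = {}
        self.leaders: Dict[str, str] = {}  # partition -> leader broker address

    def metadata_from(self, address: str) -> dict:
        # Any live broker can describe the *whole* cluster, not just itself.
        if not self.up.get(address, False):
            raise ConnectionError(f"{address} unreachable")
        return {"brokers": [a for a, ok in self.up.items() if ok],
                "leaders": dict(self.leaders)}


class ToyClient:
    """Hypothetical client: bootstrap_servers is consulted only at startup."""

    def __init__(self, cluster: ToyCluster, bootstrap_servers: List[str]) -> None:
        self.cluster = cluster
        self.known_brokers: List[str] = []
        self.leaders: Dict[str, str] = {}
        self._refresh(bootstrap_servers)

    def _refresh(self, candidates: List[str]) -> None:
        for address in candidates:
            try:
                meta = self.cluster.metadata_from(address)
            except ConnectionError:
                continue  # try the next candidate
            self.known_brokers = meta["brokers"]
            self.leaders = meta["leaders"]
            return
        raise ConnectionError("no broker reachable")

    def leader_for(self, partition: str) -> str:
        if partition not in self.leaders:
            # Refresh via brokers learned from metadata, not via bootstrap_servers.
            self._refresh(self.known_brokers)
        return self.leaders[partition]


cluster = ToyCluster()
cluster.up.update({"broker1:9092": True, "broker2:9092": True})
cluster.leaders["orders-0"] = "broker1:9092"
client = ToyClient(cluster, ["broker1:9092", "broker2:9092"])

# broker3 joins later and becomes leader of a new partition.
cluster.up["broker3:9092"] = True
cluster.leaders["payments-0"] = "broker3:9092"
print(client.leader_for("payments-0"))  # broker3:9092

# With broker1 and broker2 gone, a client started with the old list cannot bootstrap.
cluster.up["broker1:9092"] = cluster.up["broker2:9092"] = False
try:
    ToyClient(cluster, ["broker1:9092", "broker2:9092"])
except ConnectionError as exc:
    print(exc)  # no broker reachable
```

In this toy, the point to notice is leader_for(): metadata refreshes go to brokers learned from earlier metadata, which is why broker3 never has to appear in the bootstrap list.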

Related

Kafka broker setup

To connect to a Kafka cluster, I've been provided with a set of bootstrap servers with name and port:
s1:9092
s2:9092
s3:9092
Kafka and Zookeeper are running on the instance s4. The page https://jaceklaskowski.gitbooks.io/apache-kafka/content/kafka-properties-bootstrap-servers.html states:
bootstrap server is a comma-separated list of host and port pairs that
are the addresses of the Kafka brokers in a "bootstrap" Kafka cluster
that a Kafka client connects to initially to bootstrap itself.
I quote the above bootstrap server definition because I'm trying to understand the relationship between the Kafka brokers s1, s2, s3 and the Kafka and Zookeeper instances running on s4.
To connect to the Kafka cluster, I set the brokers to the comma-separated list 's1,s2,s3'. After sending messages to that list of brokers, I verify that they were added to the topic by SSHing onto the s4 box and viewing the messages on the topic.
What is the link between the Kafka brokers s1, s2, s3 and s4? I cannot SSH onto any of the brokers s1, s2, s3, as they do not seem to be accessible over SSH. Should s1, s2, s3 be accessible?
The person responsible for setting up the Kafka box is no longer available, and I'm confused about how this configuration works. I've searched s4 for configuration that references the brokers s1, s2, s3, but there does not appear to be any.
When Kafka is set up and configured, what links the brokers (in this case s1, s2, s3) to s4?
I start Kafka and Zookeeper on the same server, s4.
Should Kafka and Zookeeper also be running on s1,s2,s3?

What is the link between the Kafka brokers s1,s2,s3 and s4?
As per the Kafka documentation about adding nodes to a cluster, each server must share the same zookeeper.connect string and have a unique broker.id to be part of the cluster.
You can check which nodes are in the cluster via zookeeper-shell with ls /brokers/ids, via the Kafka AdminClient API, or with kafkacat -L.
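The two membership rules above (a shared zookeeper.connect string and a unique broker.id) are easy to check mechanically. A minimal sketch, assuming you have collected each broker's server.properties text into a dict keyed by host name; parse_properties and check_membership are hypothetical helpers, and the parser only understands simple key=value lines (no line continuations or escapes from the full .properties format). The s4:2181 value below is likewise an illustrative assumption.

```python
from typing import Dict, List


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and #/! comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_membership(configs: Dict[str, str]) -> List[str]:
    """Return problems that would keep brokers from forming one cluster."""
    problems: List[str] = []
    parsed = {host: parse_properties(text) for host, text in configs.items()}

    # Every broker must point at the same ZooKeeper connect string.
    connect_strings = {p.get("zookeeper.connect") for p in parsed.values()}
    if len(connect_strings) != 1 or None in connect_strings:
        problems.append(f"zookeeper.connect differs or is missing: {sorted(map(str, connect_strings))}")

    # Every explicitly set broker.id must be unique.
    owner_of: Dict[str, str] = {}
    for host, props in parsed.items():
        broker_id = props.get("broker.id")
        if broker_id is None:
            continue  # unset: Kafka can auto-generate an id, so nothing to compare
        if broker_id in owner_of:
            problems.append(f"{host}: broker.id {broker_id} already used by {owner_of[broker_id]}")
        else:
            owner_of[broker_id] = host
    return problems


zk = "zookeeper.connect=s4:2181"  # hypothetical shared connect string
configs = {
    "s1": f"broker.id=1\n{zk}",
    "s2": f"broker.id=2\n{zk}",
    "s3": f"broker.id=2\n{zk}",  # deliberate duplicate id
}
print(check_membership(configs))  # ['s3: broker.id 2 already used by s2']
```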
Should s1, s2, s3 be accessible?
Via SSH? They don't have to be.
They should, however, accept TCP connections from your Kafka client machines on their Kafka server ports.
Should Kafka and Zookeeper also be running on s1,s2,s3?
You should not have 4 Zookeeper servers in an ensemble (use odd numbers only).
Otherwise, you've at least been given some Kafka ports on those machines, so Kafka should be running there.
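The odd-number advice follows from majority arithmetic: a ZooKeeper ensemble keeps working only while a strict majority (a quorum) of its servers is up. A small sketch of that arithmetic (quorum_size and tolerated_failures are illustrative names) shows that a fourth server raises the quorum size without letting the ensemble survive any more failures than three servers do.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a quorum is still available."""
    return ensemble_size - quorum_size(ensemble_size)


for n in range(1, 8):
    print(f"{n} servers: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```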

What is bootstrap-server in Kafka config?

I have just started learning Kafka and I keep coming across the term bootstrap-server.
Which server does it represent in my kafka cluster?

It is the address (host:port) of one of the Kafka brokers, which the client uses to fetch the initial metadata about your Kafka cluster.
The metadata consists of the topics, their partitions, the leader brokers for those partitions, etc.
Based on this metadata, your producer or consumer produces or consumes the data.
You can list multiple bootstrap servers in your producer or consumer configuration, so that if one of the brokers is not accessible, the client falls back to another.
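That fallback amounts to trying each configured address in turn until one answers. The sketch below only approximates it with plain TCP connects from Python's standard library; it is not how the Kafka client is implemented (a real client speaks the Kafka protocol, retries with backoff, and then switches to the brokers named in the metadata). first_reachable is a hypothetical helper name.

```python
import socket
from typing import Optional


def first_reachable(bootstrap_servers: str, timeout: float = 2.0) -> Optional[str]:
    """Return the first host:port in a comma-separated list that accepts a TCP connection."""
    for entry in bootstrap_servers.split(","):
        entry = entry.strip()
        host, _, port = entry.rpartition(":")
        try:
            with socket.create_connection((host, int(port)), timeout=timeout):
                # A real client would now send a metadata request over this connection.
                return entry
        except (OSError, ValueError):
            continue  # unreachable or malformed: fall back to the next entry
    return None
```

For example, with hypothetical hosts, first_reachable("broker1:9092,broker2:9092") returns "broker2:9092" when broker1 is down but broker2 accepts connections, and None when neither does.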

We know that a Kafka cluster can have hundreds or thousands of brokers (Kafka servers). But how do we tell clients (producers or consumers) which ones to connect to? Should we specify all of those brokers in the client configuration? No: that would be troublesome, and the list would be very long. Instead, we can take two or three brokers and treat them as bootstrap servers, which a client connects to initially. Those brokers then return metadata describing the live brokers in the cluster, and the client connects to the appropriate broker from there.
So bootstrap.servers is a configuration we place within clients, which is a comma-separated list of host and port pairs that are the addresses of the Kafka brokers in a "bootstrap" Kafka cluster that a Kafka client connects to initially to bootstrap itself.
A host and port pair uses : as the separator.
localhost:9092
localhost:9092,another.host:9092
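To make the format concrete, here is a hypothetical parser for a bootstrap.servers value (parse_bootstrap_servers is an invented helper, not Kafka's own parsing code, which also validates and resolves the host names). It splits on commas, then splits each entry on its last ":" so that a bracketed IPv6 host such as [::1]:9092 also works, and it rejects ports outside 1-65535.

```python
from typing import List, Tuple


def parse_bootstrap_servers(value: str) -> List[Tuple[str, int]]:
    """Split a comma-separated bootstrap.servers value into (host, port) pairs."""
    pairs: List[Tuple[str, int]] = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas
        host, sep, port = entry.rpartition(":")  # last ':' separates host and port
        if not sep or not host:
            raise ValueError(f"expected host:port, got {entry!r}")
        port_number = int(port)  # raises ValueError for non-numeric ports
        if not 1 <= port_number <= 65535:
            raise ValueError(f"port out of range in {entry!r}")
        pairs.append((host.strip("[]"), port_number))  # [::1] -> ::1
    return pairs


print(parse_bootstrap_servers("localhost:9092"))
# [('localhost', 9092)]
print(parse_bootstrap_servers("localhost:9092,another.host:9092"))
# [('localhost', 9092), ('another.host', 9092)]
```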
So as mentioned, bootstrap.servers provides the initial hosts that act as the starting point for a Kafka client to discover the full set of alive servers in the cluster.
Special Notes:
Since these servers are just used for the initial connection to discover the full cluster membership (which may change dynamically), this list does not have to contain the full set of servers (you may want more than one, though, in case a server is down).
Clients (producers or consumers) make use of all the brokers in the cluster, regardless of which brokers are listed in bootstrap.servers for bootstrapping.

bootstrap.servers is a comma-separated list of host and port pairs that are the addresses of the Kafka brokers in a "bootstrap" Kafka cluster that a Kafka client connects to initially to bootstrap itself.
Kafka broker
A Kafka cluster is made up of multiple Kafka brokers. Each Kafka broker has a unique ID (number). Kafka brokers contain topic log partitions. Connecting to one broker bootstraps a client to the entire Kafka cluster. For failover, you want to start with at least three to five brokers. A Kafka cluster can have 10, 100, or 1,000 brokers if needed.
For more information, see the official documentation.

Kafka - consumers / producers work with all Zookeeper instances down

I've configured a cluster of Kafka brokers and a cluster of Zk instances using the kafka_2.11-1.1.0 distribution archive.
For the Kafka brokers, I've configured config/server.properties:
broker.id=1,2,3
zookeeper.connect=box1:2181,box2:2181,box3:2181
For Zk instances I've configured config/zookeeper.properties:
server.1=box1:2888:3888
server.2=box2:2888:3888
server.3=box3:2888:3888
I've created a basic producer and a basic consumer, and I don't understand why I am able to write/read messages even when I shut down all the Zookeeper instances while all the Kafka brokers are up and running.
Even starting new consumers and producers works without any issue.
I thought having a quorum of Zk instances is a vital point for a Kafka cluster.
For both consumer and producer, I've used following configuration:
bootstrap.servers=box1:9092,box2:9092,box3:9092

I thought having a quorum of Zk instances is a vital point for a Kafka cluster.
A Zookeeper quorum is vital for managing partition lists, leaders, etc. In general, ZK is necessary for the management work done by the cluster controller.
Basically, right now (with ZK down), you cannot modify topics (as the partition metadata is stored in ZK), start up or shut down brokers (as they use ZK for discovery), or perform other similar operations.
Even starting new consumers and producers works without any issue.
Producer/consumer operations reach out to brokers only. A broker can still append to its log and can still communicate with other brokers for replication. So it is possible to send a message, have the broker receive it and save it to disk, and have other brokers replicate it: the followers continuously send fetch requests to the leader, and they know which broker is the partition's leader because they saved that information while ZK was still running.

Do Kafka brokers store metadata?

Do Kafka brokers store the metadata that the producer API uses (e.g. which broker is the leader for each partition of a topic)? As I understand it, this metadata is stored in Zookeeper; is that correct? If so, how does Zookeeper keep the brokers updated with the latest information?

All Kafka brokers can answer a metadata request that describes the current state of the cluster: what topics there are, which partitions those topics have, which broker is the leader for those partitions etc.
ZooKeeper is responsible for:
Electing a controller broker - and making sure there is only one
Cluster membership - allowing brokers to join a cluster
Topic configuration - which topics exist, how many partitions each has, where the replicas are, who the preferred leader is, and what configuration overrides are set for each topic
Quotas - how much data is each client allowed to read and write
ACLs - who is allowed to read and write to which topic
There is regular communication between Kafka and ZooKeeper such that ZooKeeper knows a Kafka broker is still alive (ZooKeeper heartbeat mechanism) and also in response to events such as a topic being created or a replica falling out of sync for a topic-partition.

Kafka is a distributed system and is built to use Zookeeper, which is responsible for controller election, topic configuration, clustering, etc.
More precisely, Zookeeper initiates controller election. The controller is a single broker in the Kafka cluster that manages the leader and followers of every partition. When a particular broker is taken down, the controller informs the other replicas (in order to reassign partition leaders, etc.). Moreover, when the controller fails, Zookeeper initiates a new election to choose the broker that will act as the new controller.
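The election described above rests on a ZooKeeper primitive: brokers race to create an ephemeral /controller znode, the one that succeeds becomes the controller, and the others watch the node so they can race again when it disappears along with the controller's session. The sketch below is a hypothetical, single-process toy of that idea (ToyZooKeeper and ToyBroker are invented names; there is no networking, and the "race" is simply list order). It also bumps an epoch counter on each election, loosely mirroring Kafka's controller epoch.

```python
from typing import List, Optional


class ToyZooKeeper:
    """Hypothetical stand-in for ZooKeeper holding an ephemeral /controller node."""

    def __init__(self) -> None:
        self.controller: Optional[int] = None  # broker id owning the node, if any
        self.epoch = 0                         # incremented on every successful election
        self.watchers: List["ToyBroker"] = []

    def try_create_controller(self, broker_id: int) -> bool:
        # Creation fails if the node already exists, so at most one controller exists.
        if self.controller is None:
            self.controller = broker_id
            self.epoch += 1
            return True
        return False

    def session_expired(self, broker_id: int) -> None:
        # An ephemeral node is deleted with its owner's session; watchers are notified.
        if self.controller == broker_id:
            self.controller = None
            for broker in self.watchers:
                if broker.alive:
                    broker.on_controller_gone()


class ToyBroker:
    def __init__(self, broker_id: int, zk: ToyZooKeeper) -> None:
        self.broker_id = broker_id
        self.zk = zk
        self.alive = True
        zk.watchers.append(self)  # watch the controller node

    def on_controller_gone(self) -> None:
        self.zk.try_create_controller(self.broker_id)

    def crash(self) -> None:
        self.alive = False
        self.zk.session_expired(self.broker_id)


zk = ToyZooKeeper()
brokers = [ToyBroker(i, zk) for i in (1, 2, 3)]
for broker in brokers:
    broker.on_controller_gone()  # at startup every broker tries to become controller
print(zk.controller, zk.epoch)  # 1 1
brokers[0].crash()
print(zk.controller, zk.epoch)  # 2 2
```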
Furthermore, Zookeeper knows which brokers are part of the Kafka cluster and which are still alive. Similarly, it is also aware of topic-specific information, such as which topics exist, how many partitions each has, where the replicas are and so on.
Zookeeper also stores information regarding quotas and ACLs, i.e. what volume of data each client is allowed to consume/produce and also, who is allowed to consume or produce from a particular topic.

How to setup/connect distributed Kafka brokers, producers and consumers on a local network?

I have setup Apache Kafka and confirmed that producers and consumers work on the localhost.
How to set up Kafka so that:
multiple producers feed messages into a broker on a network computer
many consumers on the network can consume the messages from the broker
I noticed the line zookeeper.connect=localhost:2181 in server.properties, which is used to start the Kafka server. Does this setting control the addresses Kafka listens on, or does it specify the server's address/port on the network?

ZooKeeper is used internally by Kafka to coordinate the cluster (e.g. leader election). In versions of Kafka before 0.8.2, ZK was the exclusive store for consumer offsets (what has been consumed so far), but from 0.8.2 you can choose whether to store offsets in ZK or in a special Kafka topic called __consumer_offsets.
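What you're interested in are the advertised.host.name and advertised.port settings, which control the address Kafka exposes to clients ("what addresses it listens to", as you put it). In newer Kafka versions, these are superseded by listeners and advertised.listeners.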

It is the address of the Zookeeper server that Kafka connects to. The documentation for the broker configuration can be found here: http://kafka.apache.org/documentation.html#brokerconfigs
