Start multiple brokers in kafka

I'm a beginner with Kafka and the Confluent package. I want to start multiple brokers so I can consume a topic.
On the client side, this can be done via this setting:
{'bootstrap.servers': 'host1:port1,host2:port2,...'}
This setting can be defined in the client config file or in the script as well.
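For example, with the confluent-kafka Python client, a consumer pointed at multiple brokers might look like this (a minimal sketch; the broker addresses, group id, and topic name are placeholders):

from confluent_kafka import Consumer

# list every broker the client may bootstrap from
consumer = Consumer({
    'bootstrap.servers': 'localhost:9092,localhost:9093',
    'group.id': 'demo-group',          # placeholder group id
    'auto.offset.reset': 'earliest',
})
consumer.subscribe(['my-topic'])       # placeholder topic name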
But how shall I run those brokers? If I just add multiple endpoints to the bootstrap servers, I get this error:
java.lang.IllegalArgumentException: requirement failed: Each listener must have a different name, listeners: PLAINTEXT://:9092, PLAINTEXT://:9093

cp config/server.properties config/server-1.properties
cp config/server.properties config/server-2.properties
config/server-1.properties:
broker.id=1
listeners=PLAINTEXT://:9093
log.dirs=/tmp/kafka-logs-1
config/server-2.properties:
broker.id=2
listeners=PLAINTEXT://:9094
log.dirs=/tmp/kafka-logs-2
Reference: kafka_quickstart_multibroker
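Each broker is then started with its own properties file, as in the quickstart (run each in its own terminal, or background them; paths assume you are in the Kafka installation directory):

bin/kafka-server-start.sh config/server.properties &
bin/kafka-server-start.sh config/server-1.properties &
bin/kafka-server-start.sh config/server-2.properties &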

Done.
I had actually used the same port for the producer and consumer, which was the issue.
With the brokers set up on different ports, it works fine even if one broker goes down.

Related

Trouble with Apache Kafka to Allow External Connections

I'm just having a difficult time with Kafka right now, but I feel like I'm close.
I have two VMs on FreeNAS running locally. Both Running Ubuntu 18.04 LTS.
VM Graylog: 192.168.1.25. Running Graylog Server. It works well retrieving rsyslog and Apache logs from itself.
VM Kafka: 192.168.1.16. Running Kafka.
My goal is to have VM Graylog pull logs from VM Kafka, via a Graylog Kafka UDP input. The secondary goal is to replicate this, except that the Kafka instance will sit on my VPS server, feeding Apache logs from a website. Of course, I want to test this in a dev environment first.
I am able to have my VM Kafka server successfully listen through this command:
/opt/kafka_2.13-2.6.0/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic rsyslog_kafka --from-beginning
This is my 60-kafka.conf file:
module(load="omkafka")
template(name="json"
         type="list"
         option.json="on") {
    constant(value="{")
    constant(value="\"#timestamp\":\"") property(name="timereported" dateFormat="rfc3339")
    constant(value="\",\"#version\":\"1")
    constant(value="\",\"message\":\"") property(name="msg")
    constant(value="\",\"host\":\"") property(name="hostname")
    constant(value="\",\"severity\":\"") property(name="syslogseverity-text")
    constant(value="\",\"facility\":\"") property(name="syslogfacility-text")
    constant(value="\",\"programname\":\"") property(name="programname")
    constant(value="\",\"procid\":\"") property(name="procid")
    constant(value="\"}\n")
}
action(
    broker=["192.168.1.16:9092"]
    type="omkafka"
    topic="rsyslog_kafka"
    template="json"
)
I'm using the default server.properties file which doesn't contain any listeners, just the defaults. I do understand I need to set the listeners and advertised.listeners.
I've attempted the following settings to no avail:
Attempt 1:
listeners = PLAINTEXT://localhost:9092
advertised.listeners=PLAINTEXT://192.168.1.16:9092
Attempt 2:
listeners = PLAINTEXT://127.0.0.1:9092
advertised.listeners=PLAINTEXT://192.168.1.16:9092
This was after reloading both Kafka and rsyslog and confirming their statuses were active.
Example errors when attempting to read messages.
A bunch of these:
[2020-08-20 00:52:42,248] WARN [Consumer clientId=consumer-console-consumer-70205-1, groupId=console-consumer-70205] Connection to node -1 (localhost/127.0.0.1:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
Followed by an infinite amount of these:
[2020-08-20 00:48:50,598] WARN [Consumer clientId=consumer-console-consumer-11975-1, groupId=console-consumer-11975] Error while fetching metadata with correlation id 254 : {rsyslog_kafka=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
I feel like I'm close. Perhaps there is something I'm just not understanding. I've read lots of similar articles where they say to just replace the IP addresses with your server's. I feel like I've done that, with no success.
You need to set listeners to PLAINTEXT://0.0.0.0:9092 in order to bind externally.
The advertised listener ought to be set to an address that your consumers will be able to use to discover the cluster.
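In server.properties on the Kafka VM, that would be (using the LAN address from the question):

listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://192.168.1.16:9092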
Note: Docker Compose might be easier than VMs

Kafka Topic Creation with --bootstrap-server gives timeout Exception (kafka version 2.5)

When trying to create a topic using --bootstrap-server, I am getting the exception "Error while executing Kafka topic command: Timed out waiting for a node":
kafka-topics --bootstrap-server localhost:9092 --topic boottopic --replication-factor 3 --partitions
However, the following works fine, using --zookeeper:
kafka-topics --zookeeper localhost:2181 --topic boottopic --replication-factor 3 --partitions
I am using Kafka version 2.5, and as per my knowledge, since version >2.2 all the offsets and metadata are stored on the broker itself, so while creating a topic there should be no need to connect to Zookeeper.
Please help me understand this behaviour.
Note - I have set up a Zookeeper quorum and a Kafka broker cluster, each containing 3 instances, on a single machine (for dev purposes).
Old question, but I'll answer anyway for the sake of internet wisdom.
You probably have auth set; when using --bootstrap-server you also need to specify your credentials with --command-config.
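For example, assuming SASL/SSL auth (a sketch; the mechanism, credentials, and file name are assumptions about your setup):

# client.properties
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="myuser" password="mypassword";

kafka-topics --bootstrap-server localhost:9092 --command-config client.properties \
  --create --topic boottopic --replication-factor 3 --partitions 3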
since version >2.2, all the ... metadata are stored on the broker itself
False. Topic metadata is still stored on Zookeeper until KIP-500 is completed.
However, the AdminClient.createTopics() method that is used internally will delegate to Zookeeper via the Controller broker node in the cluster.
Hard to say what the error is, but the most common issues are that Kafka is not running, you have SSL enabled and the certs are wrong, or the listeners are misconfigured.
For example, in the listeners, the default broker port on a Cloudera Kafka installation would be 6667, not 9092
each containing 3 instances on a single machine
Running 3 instances on one machine does not improve resiliency or performance unless you have 3 CPUs and 3 separate HDDs on that one motherboard.
"Error while executing Kafka topic command: Timed out waiting for a
node"
This seems like your broker is down or inaccessible from where you are running those commands, or it hasn't finished starting yet.
Sometimes the broker startup takes long because it performs some cleaning operations. You may want to check your Kafka broker startup logs and see if it is ready and then try creating the topics by giving in the bootstrap servers.
There could also be errors during your Kafka broker startup, such as "Too many open files", a wrong Zookeeper URL, or Zookeeper not being accessible by your broker, to name a few.
Being able to create topics by passing in your Zookeeper URL means that Zookeeper is up, but it does not necessarily mean that your Kafka broker(s) are also up and running, since Zookeeper can run without a broker but not vice-versa.
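A quick way to verify that before retrying (ports and log path are assumptions based on a default single-machine setup):

# is Zookeeper reachable?
nc -vz localhost 2181
# is the broker reachable?
nc -vz localhost 9092
# watch broker startup; wait for "started (kafka.server.KafkaServer)"
tail -f $KAFKA_HOME/logs/server.log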

How to add two more kafka brokers in the local machine if my current running kafka broker already has the data

I have one broker running on my local machine with Windows OS, which has 2-3 topics with messages stored. I want to scale up my machine by adding two more broker instances. I have followed all the steps to configure 3 brokers on the same machine by creating different properties files.
My broker.id=0 instance gets shut down when I start the broker.id=1 server, with the error below.
[2019-07-11 13:56:33,580] INFO Stopping serving logs in dir C:\kafka_2.12-2.2.1\data\kafka (kafka.log.LogManager)
[2019-07-11 13:56:33,585] ERROR Shutdown broker because all log dirs in C:\kafka_2.12-2.2.1\data\kafka have failed (kafka.log.LogManager)
Is it possible to add more brokers if my existing broker instance already has data?
Or do I need to delete the data directory and start broker 0 afresh? Is there any way to preserve the data without deleting it from the Kafka server?
Yes, you can add brokers to your cluster and migrate/spread data across all your brokers.
The Expanding your cluster section in the documentation details the steps to achieve this.
After starting the new brokers, you basically need to use the bin/kafka-reassign-partitions.sh tool (other 3rd party tools also exists) to move data onto them.
Please note, however, that adding brokers on the same machine does not provide much resiliency, since if the machine goes down, all brokers would be affected. But if you just want to play around and learn about Kafka, that may be fine.
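A sketch of that reassignment flow (the topic name and Zookeeper address are placeholders; on Kafka releases of that era the tool takes --zookeeper):

# topics-to-move.json
{"version": 1, "topics": [{"topic": "my-topic"}]}

# generate a candidate assignment across brokers 0, 1 and 2
bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
  --topics-to-move-json-file topics-to-move.json --broker-list "0,1,2" --generate

# save the proposed assignment to reassignment.json, then execute it
bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
  --reassignment-json-file reassignment.json --execute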
To run multiple brokers on the same physical machine, each broker's config must specify a unique broker.id, a different log.dirs, and a different port in listeners.
For example,
config/server{1,2,3}.properties
In each config, set different values:
broker.id=<id>
log.dirs=/data/kafka<id>
listeners=PLAINTEXT://localhost:909<id>
When all three brokers start, new topics will be created evenly throughout the cluster, but old ones need to be rebalanced.
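For instance, once all three brokers are up, a new topic can span all of them (a sketch; the topic name is a placeholder and the port follows the listener pattern above):

bin/kafka-topics.sh --bootstrap-server localhost:9091 --create \
  --topic test-topic --replication-factor 3 --partitions 3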

kafka with multiple zookeeper config

A bit confused about clustering setup:
Zookeeper can be set up as a cluster by configuring myid (1, 2, 3, ...) in the myid file and having, for example, zookeeper1:2888:3888 and zookeeper2:2889:3889 in the zoo.cfg file.
For Kafka, in the server.properties file, is it a must to specify the full list of Zookeeper servers for the parameter zookeeper.connect, or is just one enough? Are there any differences?
I've seen the practice of specifying the full list of Zookeeper servers even when creating a topic, e.g. /opt/kafka/bin/kafka-topics.sh --create --zookeeper x.x.x.x:2181,x.x.x.x:2181,x.x.x.x:2181 --replication-factor 1 --partitions 1 --topic sample_test
---Production and DR setup (large latency is expected between production and dr)---
Let's say we have 1 Kafka broker (kafka1) and 1 Zookeeper server (zookeeper1) in production, and 1 Kafka broker (kafka2) and 1 Zookeeper server (zookeeper2) in DR, and we form those 2 Zookeepers into a cluster.
We run uReplicator to replicate data from production to DR. From the uReplicator example, the configuration seems to be that kafka1 (in production) connects to "zookeeper1:2181/cluster1" and kafka2 (in DR) connects to "zookeeper1:2181/cluster2". What is the meaning of "/cluster1" and "/cluster2"? What is the right config for this scenario? What is the idea of having kafka2 in DR connect to zookeeper1 in prod?
is it a must to specify the full list of Zookeeper servers for the parameter zookeeper.connect
It is good practice to put at least 3 or 5. If you only put one, and that goes down, Kafka will likely not work as expected, or fail out.
in DR, and form those 2 zookeepers into a cluster
It's not generally encouraged to share Zookeepers clusters between Kafka clusters, as Kafka puts a reasonable amount of load on Zookeeper for high volume Kafka clusters.
Though, as you point out
connecting to "zookeeper1:2181/cluster1", and kafka2 (in DR) is connecting to "zookeeper1:2181/cluster2", what's the meaning of "/cluster1", "/cluster2"?
This is called a Chroot in Zookeeper. Think of it like a directory, or namespace for each unique Kafka cluster within the Zookeeper cluster.
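In server.properties that looks like this (a sketch; note the chroot suffix goes after the last host in the list and applies to the whole ensemble):

# production cluster
zookeeper.connect=zookeeper1:2181,zookeeper2:2181/cluster1
# DR cluster
zookeeper.connect=zookeeper1:2181,zookeeper2:2181/cluster2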
what's the idea of having kafka2 in DR connect to zookeeper1 in prod?
Well, you wouldn't. If kafka2 has its own unique topic data that is not being replicated to kafka1, then pointing it at Zookeeper data that says those topics exist on kafka2 but not kafka1 will only result in confusion and errors.
Also, I am unaware of how uReplicator works beyond what MirrorMaker does, but you'll also want to prepare a DR strategy for Zookeeper, not only Kafka.
You have two questions in there. I'll try to tackle the first one at least:
Specifying only one zookeeper server:port is usually enough, but in production instances/properties you always want to configure all of them. If one of the servers is down but the cluster is still up and running (say, 2 out of 3 Zookeeper servers are up), Kafka will try the next server in the config until it finds one it can talk to. However, if the only one you chose to put in happens to be down at that exact time, the server won't be able to talk to Zookeeper at all. It's best to always include the entire list of Zookeeper servers in the configuration.
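Concretely, in server.properties (placeholder hostnames):

zookeeper.connect=zookeeper1:2181,zookeeper2:2181,zookeeper3:2181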

Kafka consumer Error : ERROR Unknown error when running consumer: (kafka.tools.ConsoleConsumer)

In my case, I have the Kafka binary kafka_2.11-1.0.0 installed on both the server and client side, but after creating the topic my consumer does not work when I use --bootstrap-server instead of --zookeeper.
I changed the command as per the warning shown. Could you please explain why the consumer does not work with the expected new option, but works with the old way of invoking the consumer?
As mentioned in the comments as well:
2181 is a common zookeeper port number.
It seems you tried to update the command but not the URL. Make sure to use the actual broker URL and port, rather than pointing the new command at the Zookeeper URL and port.
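In other words, the two invocations talk to different services, so each needs the matching port (the topic name is a placeholder):

# old style: talks to Zookeeper on 2181
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic my-topic --from-beginning
# new style: talks to the broker itself, so use the broker port
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my-topic --from-beginning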