How to migrate kafka details to other server? - apache-kafka

How do I got all details of kafka server like kafka topic names, partitions, groups etc before shutting kafka server. And use this information to ready new kafka server ?
Is there any option for this type of backup ?

Kafka uses zookeeper to store metadata.
If you want to get overview of all topics, partitions or consumer groups, you can collect from zookeeper shell.
Example : to collect the consumer groups, use ls /consumers as following:
kafka % bin/zookeeper-shell.sh localhost:2181 <<< "ls /consumers"
Connecting to localhost:2181
Welcome to ZooKeeper!
JLine support is disabled
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[console-consumer-66605, console-consumer-84350, console-consumer-9354, console-consumer-28182, console-consumer-61085, console-consumer-67016, console-consumer-81504, console-consumer-47711, console-consumer-87328, console-consumer-27998, console-consumer-73330, console-consumer-73529, console-consumer-17369, console-consumer-75626, console-consumer-6886, console-consumer-11693]
Similarly for collecting topic names, use :
ls /brokers/topics
For collecting number of partitions :
ls /brokers/topics/<topic name>/partitions
You can export these details for some file and use for next server

Related

Replication factor: 3 larger than available brokers: 1 when starting the kafka

I have a kafka installed in my mac last year, which has many topics within the system. Now I upgrade the zookeeper and kafka to the latest version.
by running zookeeper, it is successful
zookeeper-server-start /usr/local/etc/kafka/zookeeper.properties
Then a broker:
kafka-server-start /usr/local/etc/kafka/server.properties
however it comes up with the error
INFO [Admin Manager on Broker 0]: Error processing create topic request CreatableTopic(name='_confluent-license', numPartitions=1, replicationFactor=3, assignments=[], configs=[CreateableTopicConfig(name='cleanup.policy', value='compact'), CreateableTopicConfig(name='min.insync.replicas', value='2')]) (kafka.server.AdminManager)
org.apache.kafka.common.errors.InvalidReplicationFactorException: Replication factor: 3 larger than available brokers: 1.
How would I solve it?
A Confluent enterprise license is stored in the _confluent-command topic. This topic is created by default and contains the license that corresponds to the license key supplied through the confluent.license property. So when you're starting the Kafka server it tries to create it with replication-factor of 3 but there is only 1 broker available so it failed.
Set confluent.topic.replication.factor property to 1 in /usr/local/etc/kafka/server.properties file.
#Pardeep 's answer worked for me, but for me there were more replication factors to set (I'm using Confluent 6.2.1):
confluent.balancer.topic.replication.factor=1
confluent.durability.topic.replication.factor=1
confluent.license.topic.replication.factor=1
confluent.tier.metadata.replication.factor=1
transaction.state.log.replication.factor=1
offsets.topic.replication.factor=1
You can use findstr (on Windows) or grep (on Unix-based OS) to extract them all from the console output:
kafka-server-start /usr/local/etc/kafka/server.properties | findstr "replication.factor"

Missing required argument "[zookeeper]"

i'm trying to start a consumer using Apache Kafka, it used to work well, but i had to format my pc and reinstall everything again, and now when trying to run this:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
this is what i'm getting:
Missing required argument "[zookeeper]"
Option Description
------ -----------
--blacklist <blacklist> Blacklist of topics to exclude from
consumption.
--bootstrap-server <server to connect
to>
--consumer.config <config file> Consumer config properties file.
--csv-reporter-enabled If set, the CSV metrics reporter will
be enabled
--delete-consumer-offsets If specified, the consumer path in
zookeeper is deleted when starting up
--formatter <class> The name of a class to use for
formatting kafka messages for
display. (default: kafka.tools.
DefaultMessageFormatter)
--from-beginning If the consumer does not already have
an established offset to consume
from, start with the earliest
message present in the log rather
than the latest message.
--key-deserializer <deserializer for
key>
--max-messages <Integer: num_messages> The maximum number of messages to
consume before exiting. If not set,
consumption is continual.
--metrics-dir <metrics directory> If csv-reporter-enable is set, and
this parameter isset, the csv
metrics will be outputed here
--new-consumer Use the new consumer implementation.
--property <prop>
--skip-message-on-error If there is an error when processing a
message, skip it instead of halt.
--timeout-ms <Integer: timeout_ms> If specified, exit if no message is
available for consumption for the
specified interval.
--topic <topic> The topic id to consume on.
--value-deserializer <deserializer for
values>
--whitelist <whitelist> Whitelist of topics to include for
consumption.
--zookeeper <urls> REQUIRED: The connection string for
the zookeeper connection in the form
host:port. Multiple URLS can be
given to allow fail-over.
my guess is that there's some kind of problem with the zookeeper connection port, because it's telling me to specify the port which zookeeper has to use to get connected to kafka. I'm not sure of this though, and don't know how to figure out the port to specify if this was the problem. Any suggestions??
Thanks in advance for the help
It looks like you are using an old version of the Kafka tools that requires to set --new-consumer if you want to directly connect to the brokers.
I'd recommend picking a recent version of Kafka so you only need to specify --bootstrap-server like in your example: http://kafka.apache.org/downloads

Kafka Connect FileStreamSource ignores appended lines

I'm working on an application to process logs with Spark and I thought to use Kafka as a way to stream the data from the log file. Basically I have a single log file (on the local file system) which is continuously updated with new logs, and Kafka Connect seems to be the perfect solution to get the data from the file along with the new appended lines.
I'm starting the servers with their default configurations with the following commands:
Zookeeper server:
zookeeper-server-start.sh config/zookeeper.properties
zookeeper.properties
dataDir=/tmp/zookeeper
clientPort=2181
maxClientCnxns=0
Kafka server:
kafka-server-start.sh config/server.properties
server.properties
broker.id=0
log.dirs=/tmp/kafka-logs
zookeeper.connect=localhost:2181
[...]
Then I created the topic 'connect-test':
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic connect-test
And finally I run the Kafka Connector:
connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties
connect-standalone.properties
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
connect-file-source.properties
name=my-file-connector
connector.class=FileStreamSource
tasks.max=1
file=/data/users/zamara/suivi_prod/app/data/logs.txt
topic=connect-test
At first I tested the connector by running a simple console consumer:
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic connect-test --from-beginning
Everything was working perfectly, the consumer was receiving the logs from the file and as I was adding logs the consumer kept updating with the new ones.
(Then I tried Spark as a "consumer" following this guide: https://spark.apache.org/docs/2.2.0/streaming-kafka-0-8-integration.html#approach-2-direct-approach-no-receivers and it was still fine)
After this, I removed some of the logs from the log file and changed the topic (I deleted the 'connect-test' topic, created another one and edited the connect-file-source.properties with the new topic).
But now the connector doesn't work the same way anymore. When using the console consumer, I only get the logs that were already in the file and every new line I add is ignored. Maybe changing the topic (and/or modifying the data from the log file) without changing the connector name broke something in Kafka...
This is what Kafka Connect does with my topic 'new-topic' and connector 'new-file-connector', :
[2018-05-16 15:06:42,454] INFO Created connector new-file-connector (org.apache.kafka.connect.cli.ConnectStandalone:104)
[2018-05-16 15:06:42,487] INFO Cluster ID: qjm74WJOSomos3pakXb2hA (org.apache.kafka.clients.Metadata:265)
[2018-05-16 15:06:42,522] INFO Updated PartitionLeaderEpoch. New: {epoch:0, offset:0}, Current: {epoch:-1, offset:-1} for Partition: new-topic-0. Cache now contains 0 entries. (kafka.server.epoch.LeaderEpochFileCache)
[2018-05-16 15:06:52,453] INFO WorkerSourceTask{id=new-file-connector-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:328)
[2018-05-16 15:06:52,453] INFO WorkerSourceTask{id=new-file-connector-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:345)
[2018-05-16 15:06:52,458] INFO WorkerSourceTask{id=new-file-connector-0} Finished commitOffsets successfully in 5 ms (org.apache.kafka.connect.runtime.WorkerSourceTask:427)
[2018-05-16 15:07:02,459] INFO WorkerSourceTask{id=new-file-connector-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:328)
[2018-05-16 15:07:02,459] INFO WorkerSourceTask{id=new-file-connector-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:345)
[2018-05-16 15:07:12,459] INFO WorkerSourceTask{id=new-file-connector-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:328)
[2018-05-16 15:07:12,460] INFO WorkerSourceTask{id=new-file-connector-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:345)
(it keeps flushing 0 outstanding messages even after appending new lines to the file)
So I tried to start over: I deleted the /tmp/kafka-logs directory, the /tmp/connect.offset file, and used a brand new topic name, connector name and log file, just in case. But still, the connector ignores new logs... I even tried to delete my kafka, re-extract it from the archive and run the whole process again (in case something changed in Kafka), but no success.
I'm confused as to where the problem is, any help would be appreciated !
Per docs:
The FileStream Connector examples are intended to show how a simple connector runs for those first getting started with Kafka Connect as either a user or developer. It is not recommended for production use.
I would use something like Filebeat (with its Kafka output) instead for ingesting logs into Kafka. Or kafka-connect-spooldir if your logs are not appended to directly but are standalone files placed in a folder for ingest.
Kafka Connect does not "watch" or "tail" a file. I don't believe it is documented anywhere that it does do that.
I would say it is even less useful for reading active logs than using Spark Streaming to watch a folder. Spark will "recognize" newly created files. Kafka Connect FileStreamSource must point at a single pre-existing, immutable file.
To get Spark to work with active logs, you would need something that does "log rotation" - that is, when the file reaches a max size or a condition such as the end of a time period (say a day), then this process would move the active log to the directory Spark is watching, then it handles starting a new log file for your application to continue to write to.
If you want files to be actively watched and ingested into Kafka then Filebeat, Fluentd, or Apache Flume can be used.

Kafka Remote Producer - advertised.listeners

I am running Kafka 0.10.0 on CDH 5.9, cluster is kerborized.
What I am trying to do is to write messages from a remote machine to my Kafka broker.
The cluster (where Kafka is installed) has internal as well as external IP addresses.
The machines' hostnames within the cluster get resolved to the private IPs, the remote machine resolves the same hostnames to the public IP addreses.
I opened the necessary port 9092 (I am using SASL_PLAINTEXT protocol) from remote machine to Kafka Broker, verified that using telnet.
First Step - in addition to the standard properties for the Kafka Broker, I configured the following:
listeners=SASL_PLAINTEXT://0.0.0.0:9092
advertised.listeners=SASL_PLAINTEXT://<hostname>:9092
I am able to start the console consumer with
kafka-console-consumer --new consumer --topic <topicname> --from-beginning --bootstrap-server <hostname>:9092 --consumer.config consumer.properties
I am able to use my custom producer from another machine within the cluster.
Relevant excerpt of producer properties:
security.protocol=SASL_PLAINTEXT
bootstrap.servers=<hostname>:9092
I am not able to use my custom producer from the remote machine:
Exception org.apache.kafka.common.errors.TimeoutException: Batch containing 1 record(s) expired due to timeout while requesting metadata from brokers for <topicname>-<partition>
using the same producer properties. I am able to telnet the Kafka Broker from the machine and /etc/hosts includes hostnames and public IPs.
Second Step - I modified server.properties:
listeners=SASL_PLAINTEXT://0.0.0.0:9092
advertised.listeners=SASL_PLAINTEXT://<kafkaBrokerInternalIP>:9092
consumer & producer within the same cluster still run fine (bootstrap
servers are now the internal IP with port 9092)
as expected remote producer fails (but that is obvious given that it
is not aware of the internal IP addresses)
Third Step - where it gets hairy :(
listeners=SASL_PLAINTEXT://0.0.0.0:9092
advertised.listeners=SASL_PLAINTEXT://<kafkaBrokerPublicIP>:9092
starting my consumer with
kafka-console-consumer --new-consumer --topic <topicname> --from-beginning --bootstrap-server <hostname>:9092 --consumer.config consumer.properties
gives me a warning, but I don't think this is right...
WARN clients.NetworkClient: Error while fetching metadata with correlation id 1 : {<topicname>=LEADER_NOT_AVAILABLE}
starting my consumer with
kafka-console-consumer --new-consumer --topic <topicname> --from-beginning --bootstrap-server <KafkaBrokerPublicIP>:9092 --consumer.config consumer.properties
just hangs after those log messages:
INFO utils.AppInfoParser: Kafka version : 0.10.0-kafka-2.1.0
INFO utils.AppInfoParser: Kafka commitId : unknown
seems like it cannot find a coordinator as in the normal flow this would be the next log:
INFO internals.AbstractCoordinator: Discovered coordinator <hostname>:9092 (id: <someNumber> rack: null) for group console-consumer-<someNumber>.
starting the producer on a cluster node with bootstrap.servers=:9092
I observe the same as with the producer:
WARN NetworkClient:600 - Error while fetching metadata with correlation id 0 : {<topicname>=LEADER_NOT_AVAILABLE}
starting the producer on a cluster node with bootstrap.servers=:9092 I get
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
starting the producer on my remote machine with either bootstrap.servers=:9092 or bootstrap.servers=:9092 I get
NetworkClient:600 - Error while fetching metadata with correlation id 0 : {<topicname>=LEADER_NOT_AVAILABLE}
I have been struggling for the past three days to get this to work, however I am out of ideas :/ My understanding is that advertised.hostnames serves for exactly this purpose, however either I am doing something wrong, or there is something wrong in the machine setup.
Any hints are very much appreciated!
I met this issue recently.
In my case , I enabled Kafka ACL, and after disable it by comment this 2 configuration, the problem worked around.
authorizer.class.name=kafka.security.auth.SimpleAclAuthorizer
super.users=User:kafka
And an thread may help you I think:
https://gist.github.com/jorisdevrede/a7933a99251452bb1867
What mentioned in it at the end:
If you only use a SASL_PLAINTEXT listener on the Kafka Broker, you
have to make sure that you have set the
security.inter.broker.protocol=SASL_PLAINTEXT too, otherwise you will
get a LEADER_NOT_AVAILABLE error in the client.

Kafka QuickStart, advertised.host.name gives kafka.common.LeaderNotAvailableException

I am able to get a simple one-node Kafka (kafka_2.11-0.8.2.1) working locally on one linux machine, but when I try to run a producer remotely I'm getting some confusing errors.
I'm following the quickstart guide at http://kafka.apache.org/documentation.html#quickstart. I stopped the kafka processes and deleted all the zookeeper & karma files in /tmp. I am on a local 10.0.0.0/24 network NAT-ed with an external IP address, so I modified server.properties to tell zookeeper how to broadcast my external address, as per https://medium.com/#thedude_rog/running-kafka-in-a-hybrid-cloud-environment-17a8f3cfc284:
advertised.host.name=MY.EXTERNAL.IP
Then I'm running this:
$ bin/zookeeper-server-start.sh config/zookeeper.properties
--> ...
$ export KAFKA_HEAP_OPTS="-Xmx256M -Xms128M" # small test server!
$ bin/kafka-server-start.sh config/server.properties
--> ...
I opened up the firewall for my producer on the remote machine, and created a new topic and verified it:
$ bin/kafka-topics.sh --create --zookeeper MY.EXTERNAL.IP:2181 --replication-factor 1 --partitions 1 --topic test123
--> Created topic "test123".
$ bin/kafka-topics.sh --list --zookeeper MY.EXTERNAL.IP:2181
--> test123
However, the producer I'm running remotely gives me errors:
$ bin/kafka-console-producer.sh --broker-list MY.EXTERNAL.IP:9092 --topic test123
--> [2015-06-16 14:41:19,757] WARN Property topic is not valid (kafka.utils.VerifiableProperties)
My Test Message
--> [2015-06-16 14:42:43,347] WARN Error while fetching metadata [{TopicMetadata for topic test123 ->
No partition metadata for topic test123 due to kafka.common.LeaderNotAvailableException}] for topic [test123]: class kafka.common.LeaderNotAvailableException (kafka.producer.BrokerPartitionInfo)
--> (repeated several times)
(I disabled the whole firewall to make sure that wasn't the problem.)
The stdout errors in the karma-startup are repeated: [2015-06-16 20:42:42,768] INFO Closing socket connection to /MY.EXTERNAL.IP. (kafka.network.Processor)
And the controller.log gives me this, several times:
java.nio.channels.ClosedChannelException
at kafka.network.BlockingChannel.send(BlockingChannel.scala:100)
at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:132)
at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
[2015-06-16 20:44:08,128] INFO [Controller-0-to-broker-0-send-thread], Controller 0 connected to id:0,host:MY.EXTERNAL.IP,port:9092 for sending state change requests (kafka.controller.RequestSendThread)
[2015-06-16 20:44:08,428] WARN [Controller-0-to-broker-0-send-thread], Controller 0 epoch 1 fails to send request Name:LeaderAndIsrRequest;Version:0;Controller:0;ControllerEpoch:1;CorrelationId:7;ClientId:id_0-host_null-port_9092;Leaders:id:0,host:MY.EXTERNAL.IP,port:9092;PartitionState:(test123,0) -> (LeaderAndIsrInfo:(Leader:0,ISR:0,LeaderEpoch:0,ControllerEpoch:1),ReplicationFactor:1),AllReplicas:0) to broker id:0,host:MY.EXTERNAL.IP,port:9092. Reconnecting to broker. (kafka.controller.RequestSendThread)
Running this seems to indicate that there is a leader at 0:
$ ./bin/kafka-topics.sh --zookeeper MY.EXTERNAL.IP:2181 --describe --topic test123
--> Topic:test123 PartitionCount:1 ReplicationFactor:1 Configs:
Topic: test123 Partition: 0 Leader: 0 Replicas: 0 Isr: 0
I reran this test and my server.log indicates that there is a leader at 0:
...
[2015-06-16 21:58:04,498] INFO 0 successfully elected as leader (kafka.server.ZookeeperLeaderElector)
[2015-06-16 21:58:04,642] INFO Registered broker 0 at path /brokers/ids/0 with address MY.EXTERNAL.IP:9092. (kafka.utils.ZkUtils$)
[2015-06-16 21:58:04,670] INFO [Kafka Server 0], started (kafka.server.KafkaServer)
[2015-06-16 21:58:04,736] INFO New leader is 0 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
I see this error in the logs when I send a message from the producer:
[2015-06-16 22:18:24,584] ERROR [KafkaApi-0] error when handling request Name: TopicMetadataRequest; Version: 0; CorrelationId: 7; ClientId: console-producer; Topics: test123 (kafka.server.KafkaApis)
kafka.admin.AdminOperationException: replication factor: 1 larger than available brokers: 0
at kafka.admin.AdminUtils$.assignReplicasToBrokers(AdminUtils.scala:70)
I assume this means that the broker can't be found for some reason? I'm confused what this means...
For the recent versions of Kafka (0.10.0 as of this writing), you don't want to use advertised.host.name at all. In fact, even the [documentation] states that advertised.host.name is already deprecated. Moreover, Kafka will use this not only as the "advertised" host name for the producers/consumers, but for other brokers as well (in a multi-broker environment)...which is kind of a pain if you're using using a different (perhaps internal) DNS for the brokers...and you really don't want to get into the business of adding entries to the individual /etc/hosts of the brokers (ew!)
So, basically, you would want the brokers to use the internal name, but use the external FQDNs for the producers and consumers only. To do this, you will update advertised.listeners instead.
Set advertised.host.name to a host name, not an IP address. The default is to return a FQDN using getCanonicalHostName(), but this is only best effort and falls back to an IP. See the java docs for getCanonicalHostName().
The trick is to get that host name to always resolve to the correct IP. For small environments I usually setup all of the hosts with all of their internal IPs in /etc/hosts. This way all machines know how to talk to each other over the internal network, by name. In fact, configure your Kafka clients by name now too, not by IP. If managing all the /etc/hosts files is a burden then setup an internal DNS server to centralize it, but internal DNS should return internal IPs. Either of these options should be less work than having IP addresses scattered throughout various configuration files on various machines.
Once everything is communicating by name all that's left is to configure external DNS with the external IPs and everything just works. This includes configuring Kafka clients with the server names, not IPs.
So to summarize, the solution to this was to add a route via NAT so that the machine can access its own external IP address.
Zookeeper uses the address it finds in advertised.host.name both to tell clients where to find the broker as well as to communicate with the broker itself. The error that gets reported doesn't make this very clear, and it's confusing because a client has no problem opening a TCP connection.
Taking cue from above: for my single node (while still learning) I modified server.properties file having text "advertised.host.name" to value=127.0.01. So finally it looks something like this
advertised.host.name=127.0.0.1
While starting producer it still shows warning, but now it is atleast working while I can see messages on consumer terminal perfectly comming
On your machine where Kafka is installed, check if it is up and running. The error states, 0 brokers are available that means Kafka is not up and running.
On linux machine you can use the netstat command to check if the service is running.
netstat -an|grep port_kafka_is_Listening ( default is 9092)
conf/server.properties:
host.name
DEPRECATED: only used when listeners is not set. Use listeners instead. hostname of broker. If this is set, it will only bind to this address. If this is not set, it will bind to all interfaces