Unable to set up Apache ZooKeeper and Kafka on a Windows machine - apache-kafka

I am very new to Apache ZooKeeper and Kafka. I've downloaded the following on a Windows machine.
Apache Zookeeper - http://zookeeper.apache.org/releases.html
Kafka - https://kafka.apache.org/downloads.html
I am not very clear on what to execute next or where to make the necessary changes.
I went to C:\apache-zookeeper-3.6.0\bin and executed the zkServer.bat file:
C:\apache-zookeeper-3.6.0\bin>call "C:\Program Files\Java\jdk1.8.0_151"\bin\java "-Dzookeeper.log.dir=C:\apache-zookeeper-3.6.0\bin\..\logs" "-Dzookeeper.root.logger=INFO,CONSOLE" "-Dzookeeper.log.file=zookeeper-pc-server-DESKTOP-NQ639DU.log" "-XX:+HeapDumpOnOutOfMemoryError" "-XX:OnOutOfMemoryError=cmd /c taskkill /pid %%p /t /f" -cp "C:\apache-zookeeper-3.6.0\bin\..\build\classes;C:\apache-zookeeper-3.6.0\bin\..\build\lib\*;C:\apache-zookeeper-3.6.0\bin\..\*;C:\apache-zookeeper-3.6.0\bin\..\lib\*;C:\apache-zookeeper-3.6.0\bin\..\conf" org.apache.zookeeper.server.quorum.QuorumPeerMain "C:\apache-zookeeper-3.6.0\bin\..\conf\zoo.cfg"
Error: Could not find or load main class org.apache.zookeeper.server.quorum.QuorumPeerMain
C:\apache-zookeeper-3.6.0\bin>endlocal
And I tried Kafka from the location C:\kafka_2.11-2.3.1\bin\windows:
C:\kafka_2.11-2.3.1\bin\windows>kafka-server-start.bat
USAGE: kafka-server-start.bat server.properties
C:\kafka_2.11-2.3.1\bin\windows>
I've set ZOOKEEPER_HOME=C:\apache-zookeeper-3.6.0 and added C:\apache-zookeeper-3.6.0\bin to PATH.

I went through this video https://www.youtube.com/watch?v=TTsOoQ6_QB0 and it solved my issue.
Apache ZooKeeper:
Create a zookeeper_data folder (you can choose any name) inside C:\kafka_2.11-2.3.1, where my Kafka distribution is kept.
Then go to C:\kafka_2.11-2.3.1\config, edit the zookeeper.properties file, and set dataDir=C:\kafka_2.11-2.3.1\zookeeper_data
Step to start ZooKeeper:
zookeeper-server-start.bat C:\kafka_2.11-2.3.1\config\zookeeper.properties
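To confirm ZooKeeper is actually up before moving on, a quick check is to look for a listener on the client port (a sketch; 2181 is the default clientPort in zookeeper.properties):
netstat -an | findstr "2181"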
Apache Kafka:
Create a kafka-logs folder under C:\kafka_2.11-2.3.1, and then
go to C:\kafka_2.11-2.3.1\config, edit the server.properties file, and use the settings below:
# A comma separated list of directories under which to store log files
log.dirs=C:\kafka_2.11-2.3.1\kafka-logs
############################# Internal Topic Settings #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 (such as 3) is recommended to ensure availability.
offsets.topic.num.partitions=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
min.insync.replicas=1
default.replication.factor=1
Step to start Apache Kafka:
kafka-server-start.bat C:\kafka_2.11-2.3.1\config\server.properties
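To sanity-check the whole setup, you can create a test topic and exchange a message with the console producer and consumer from C:\kafka_2.11-2.3.1\bin\windows (a minimal sketch; the topic name test-topic is just an example):
kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test-topic
kafka-console-producer.bat --broker-list localhost:9092 --topic test-topic
kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic test-topic --from-beginning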

Related

Replication factor: 3 larger than available brokers: 1 when starting Kafka

I installed Kafka on my Mac last year, and it has many topics in the system. Now I have upgraded ZooKeeper and Kafka to the latest versions.
Running ZooKeeper is successful:
zookeeper-server-start /usr/local/etc/kafka/zookeeper.properties
Then a broker:
kafka-server-start /usr/local/etc/kafka/server.properties
However, it comes up with this error:
INFO [Admin Manager on Broker 0]: Error processing create topic request CreatableTopic(name='_confluent-license', numPartitions=1, replicationFactor=3, assignments=[], configs=[CreateableTopicConfig(name='cleanup.policy', value='compact'), CreateableTopicConfig(name='min.insync.replicas', value='2')]) (kafka.server.AdminManager)
org.apache.kafka.common.errors.InvalidReplicationFactorException: Replication factor: 3 larger than available brokers: 1.
How would I solve it?
A Confluent enterprise license is stored in the _confluent-command topic. This topic is created by default and contains the license that corresponds to the license key supplied through the confluent.license property. So when you start the Kafka server, it tries to create this topic with a replication factor of 3, but there is only 1 broker available, so it fails.
Set the confluent.topic.replication.factor property to 1 in the /usr/local/etc/kafka/server.properties file.
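In other words, server.properties ends up with a line like this (a minimal sketch):
# allow Confluent's internal topics to be created on a single-broker dev setup
confluent.topic.replication.factor=1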
@Pardeep's answer worked for me, but there were more replication factors to set (I'm using Confluent 6.2.1):
confluent.balancer.topic.replication.factor=1
confluent.durability.topic.replication.factor=1
confluent.license.topic.replication.factor=1
confluent.tier.metadata.replication.factor=1
transaction.state.log.replication.factor=1
offsets.topic.replication.factor=1
You can use findstr (on Windows) or grep (on Unix-based OS) to extract them all from the console output:
kafka-server-start /usr/local/etc/kafka/server.properties | findstr "replication.factor"
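On a Unix-based OS the equivalent with grep would be something like this (a sketch):
kafka-server-start /usr/local/etc/kafka/server.properties 2>&1 | grep "replication.factor"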

ZooKeeper Timeout - Cannot find myid file. Windows Server 2016

I have a very simple setup. I am trying to start ZooKeeper (apache-zookeeper-3.6.1-bin) on two machines. I get the following error when I check the ZooKeeper status:
/cygdrive/c/ZooKeeper/apache-zookeeper-3.6.1-bin/apache-zookeeper-3.6.1-bin
$ bin/zkServer.sh restart
ZooKeeper JMX enabled by default
Using config: C:\ZooKeeper\apache-zookeeper-3.6.1-bin\apache-zookeeper-3.6.1-bin\conf\zoo.cfg
ZooKeeper JMX enabled by default
Using config: C:\ZooKeeper\apache-zookeeper-3.6.1-bin\apache-zookeeper-3.6.1-bin\conf\zoo.cfg
Stopping zookeeper ... STOPPED
ZooKeeper JMX enabled by default
Using config: C:\ZooKeeper\apache-zookeeper-3.6.1-bin\apache-zookeeper-3.6.1-bin\conf\zoo.cfg
Starting zookeeper ... STARTED
kalsa#CO01EAP00000027 /cygdrive/c/ZooKeeper/apache-zookeeper-3.6.1-bin/apache-zookeeper-3.6.1-bin
$ bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: C:\ZooKeeper\apache-zookeeper-3.6.1-bin\apache-zookeeper-3.6.1-bin\conf\zoo.cfg
cat: '/tmp/zookeeper/'$'\r''/myid': No such file or directory
clientPort not found and myid could not be determined. Terminating.
My zoo.cfg:
tickTime=5000
dataDir=/tmp/zookeeper/
clientPort=2181
initLimit=5
syncLimit=2
server.1=XYZ:2888:3888
server.2=ABC:2888:3888
I have proper IPs in place of XYZ and ABC.
I have created the myid file as well. Can someone let me know if I am missing anything obvious?
The owner of dataDir should be the zookeeper user. If it is not, change the owner.
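For example, on a Unix-like system (or under Cygwin) you could check and fix it roughly like this (a sketch, assuming dataDir is /tmp/zookeeper and the service runs as the zookeeper user):
ls -ld /tmp/zookeeper                              # check the current owner
sudo chown -R zookeeper:zookeeper /tmp/zookeeper   # make the zookeeper user the owner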

Testing "fail-over" on Kafka

Set-up 1:
OS: Windows 10
ZooKeeper
3 ZooKeeper instances downloaded from Apache (tested with v3.5.6 and v3.4.14):
(1) apache-zookeeper-3.5.6-bin_1
(2) apache-zookeeper-3.5.6-bin_2 (Copy of 1)
(3) apache-zookeeper-3.5.6-bin_3 (Copy of 1)
zoo.cfg:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/tmp/zookeeper_3.4.14_1
clientPort=2181
admin.serverPort=10081
server.1=localhost:2881:3881
server.2=localhost:2882:3882
server.3=localhost:2883:3883
4lw.commands.whitelist=*
zoo.cfg:
...
dataDir=/tmp/zookeeper_3.4.14_2
clientPort=2182
admin.serverPort=10082
...
zoo.cfg:
...
dataDir=/tmp/zookeeper_3.4.14_3
clientPort=2183
admin.serverPort=10083
...
myid files in each dataDir with values 1, 2 and 3 respectively
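For reference, the myid files can be created like this (a sketch using the dataDir paths above):
echo 1 > /tmp/zookeeper_3.4.14_1/myid
echo 2 > /tmp/zookeeper_3.4.14_2/myid
echo 3 > /tmp/zookeeper_3.4.14_3/myid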
Kafka
2 Kafka instances:
(1) kafka_2.12-2.3.0_1
(2) kafka_2.12-2.3.0_2 (Copy of 1)
server.properties:
...
broker.id=1
listeners=PLAINTEXT://:9091
log.dirs=/tmp/kafka-logs-1
zookeeper.connect=localhost:2181,localhost:2182,localhost:2183
...
server.properties:
...
broker.id=2
listeners=PLAINTEXT://:9092
log.dirs=/tmp/kafka-logs-2
zookeeper.connect=localhost:2181,localhost:2182,localhost:2183
...
Spring
spring-boot-starter-* 2.2.0.RELEASE
spring-kafka-2.3.1.RELEASE
=====================================================================
Set-up 2:
Same as set-up 1, the only difference being that instead of using the ZooKeeper downloaded from Apache, I am using the ZooKeeper that comes with Kafka.
=====================================================================
Issue
The issue is that when I bring one Kafka broker down:
=> Set-up 1 will not fail over, meaning that when I produce a message, the Kafka that is up does not receive the message
=> Set-up 2 will fail over, meaning that when I produce a message, the Kafka that is up does receive the message
Do you guys see anything wrong with Set-up 1?
P.S. If you need more details, I am happy to provide them.
In case it helps someone:
I had 2 Kafka instances, but my replication factor for any topic created was 1 (due to my misunderstanding of its meaning).
What this means is that, at topic creation time, each topic is created on either Kafka-1 or Kafka-2, not both. As such, when I tried to fail over, it would fail depending on which topic I was writing to and which Kafka broker had been brought down.
In short, if you have X Kafka instances, the topic replication factor needs to be X (the same goes for offsets.topic.replication.factor).
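For example, with the two-broker setup above a topic would need to be created with a replication factor of 2 (a sketch; my-topic is a placeholder name):
kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 2 --partitions 3 --topic my-topic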

Kafka Connect FileStreamSource ignores appended lines

I'm working on an application to process logs with Spark and I thought to use Kafka as a way to stream the data from the log file. Basically I have a single log file (on the local file system) which is continuously updated with new logs, and Kafka Connect seems to be the perfect solution to get the data from the file along with the new appended lines.
I'm starting the servers with their default configurations with the following commands:
Zookeeper server:
zookeeper-server-start.sh config/zookeeper.properties
zookeeper.properties
dataDir=/tmp/zookeeper
clientPort=2181
maxClientCnxns=0
Kafka server:
kafka-server-start.sh config/server.properties
server.properties
broker.id=0
log.dirs=/tmp/kafka-logs
zookeeper.connect=localhost:2181
[...]
Then I created the topic 'connect-test':
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic connect-test
And finally I run the Kafka Connector:
connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties
connect-standalone.properties
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
connect-file-source.properties
name=my-file-connector
connector.class=FileStreamSource
tasks.max=1
file=/data/users/zamara/suivi_prod/app/data/logs.txt
topic=connect-test
At first I tested the connector by running a simple console consumer:
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic connect-test --from-beginning
Everything was working perfectly, the consumer was receiving the logs from the file and as I was adding logs the consumer kept updating with the new ones.
(Then I tried Spark as a "consumer" following this guide: https://spark.apache.org/docs/2.2.0/streaming-kafka-0-8-integration.html#approach-2-direct-approach-no-receivers and it was still fine)
After this, I removed some of the logs from the log file and changed the topic (I deleted the 'connect-test' topic, created another one and edited the connect-file-source.properties with the new topic).
But now the connector doesn't work the same way anymore. When using the console consumer, I only get the logs that were already in the file and every new line I add is ignored. Maybe changing the topic (and/or modifying the data from the log file) without changing the connector name broke something in Kafka...
This is what Kafka Connect does with my topic 'new-topic' and connector 'new-file-connector':
[2018-05-16 15:06:42,454] INFO Created connector new-file-connector (org.apache.kafka.connect.cli.ConnectStandalone:104)
[2018-05-16 15:06:42,487] INFO Cluster ID: qjm74WJOSomos3pakXb2hA (org.apache.kafka.clients.Metadata:265)
[2018-05-16 15:06:42,522] INFO Updated PartitionLeaderEpoch. New: {epoch:0, offset:0}, Current: {epoch:-1, offset:-1} for Partition: new-topic-0. Cache now contains 0 entries. (kafka.server.epoch.LeaderEpochFileCache)
[2018-05-16 15:06:52,453] INFO WorkerSourceTask{id=new-file-connector-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:328)
[2018-05-16 15:06:52,453] INFO WorkerSourceTask{id=new-file-connector-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:345)
[2018-05-16 15:06:52,458] INFO WorkerSourceTask{id=new-file-connector-0} Finished commitOffsets successfully in 5 ms (org.apache.kafka.connect.runtime.WorkerSourceTask:427)
[2018-05-16 15:07:02,459] INFO WorkerSourceTask{id=new-file-connector-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:328)
[2018-05-16 15:07:02,459] INFO WorkerSourceTask{id=new-file-connector-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:345)
[2018-05-16 15:07:12,459] INFO WorkerSourceTask{id=new-file-connector-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:328)
[2018-05-16 15:07:12,460] INFO WorkerSourceTask{id=new-file-connector-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:345)
(it keeps flushing 0 outstanding messages even after appending new lines to the file)
So I tried to start over: I deleted the /tmp/kafka-logs directory, the /tmp/connect.offsets file, and used a brand new topic name, connector name and log file, just in case. But still, the connector ignores new logs... I even tried to delete my Kafka installation, re-extract it from the archive and run the whole process again (in case something had changed in Kafka), but with no success.
I'm confused as to where the problem is; any help would be appreciated!
Per docs:
The FileStream Connector examples are intended to show how a simple connector runs for those first getting started with Kafka Connect as either a user or developer. It is not recommended for production use.
I would use something like Filebeat (with its Kafka output) instead for ingesting logs into Kafka. Or kafka-connect-spooldir if your logs are not appended to directly but are standalone files placed in a folder for ingest.
Kafka Connect does not "watch" or "tail" a file. I don't believe it is documented anywhere that it does do that.
I would say it is even less useful for reading active logs than using Spark Streaming to watch a folder. Spark will "recognize" newly created files. Kafka Connect FileStreamSource must point at a single pre-existing, immutable file.
To get Spark to work with active logs, you would need something that does "log rotation": when the file reaches a maximum size, or a condition such as the end of a time period (say, a day) is met, the rotation process moves the active log into the directory Spark is watching and then starts a new log file for your application to continue writing to.
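As a rough illustration of that rotation idea, a logrotate rule could move completed logs into the directory Spark watches (a sketch only; the schedule and the olddir path are assumptions):
# /etc/logrotate.d/myapp -- hypothetical rule
# rotated files land in the directory Spark Streaming monitors
/data/users/zamara/suivi_prod/app/data/logs.txt {
    daily
    rotate 7
    olddir /data/spark-watched
    missingok
    notifempty
}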
If you want files to be actively watched and ingested into Kafka then Filebeat, Fluentd, or Apache Flume can be used.
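For the Filebeat route, a minimal configuration might look like this (a sketch, assuming Filebeat 6.x or later; the path and topic are taken from the question and would need adjusting):
filebeat.inputs:
  - type: log
    paths:
      - /data/users/zamara/suivi_prod/app/data/logs.txt
output.kafka:
  hosts: ["localhost:9092"]
  topic: "connect-test"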

How to enable remote JMX on Kafka brokers (for JmxTool)?

I enabled JMX on Kafka brokers by adding
KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote=true
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Djava.rmi.server.hostname=<server_IP>
-Djava.net.preferIPv4Stack=true"
However, when I use kafka.tools.JmxTool to get the JMX metrics, it outputs Unix timestamps only. Why?
./bin/kafka-run-class.sh kafka.tools.JmxTool \
--object-name 'kafka.server:type=BrokerTopicMetrics,name=AllTopicsMessagesInPerSec' \
--jmx-url "service:jmx:rmi:///jndi/rmi://<server_IP>:9111/jmxrmi"
How can I have it print out the metrics?
Edit bin/kafka-run-class.sh and set the KAFKA_JMX_OPTS variable:
KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=your.kafka.broker.hostname -Djava.net.preferIPv4Stack=true"
Update bin/kafka-server-start.sh and add the line below:
export JMX_PORT=PORT
You must set the 'JMX_PORT' variable, or add the following line to bin/kafka-server-start.sh:
export JMX_PORT=${JMX_PORT:-9999}
Then you will be able to connect to the Kafka JMX metrics. I use the jconsole tool and the 'localhost:9999' address.
Setting JMX_PORT inside bin/kafka-run-class.sh will clash with ZooKeeper if you are running ZooKeeper on the same node.
It is best to set the JMX port individually inside the corresponding server-start scripts:
Insert the line “export JMX_PORT=${JMX_PORT:-9998}” before the last line in the $KAFKA_HOME/bin/zookeeper-server-start.sh file.
Restart the ZooKeeper server.
Repeat steps 1 and 2 for all ZooKeeper nodes in the cluster.
Insert the line “export JMX_PORT=${JMX_PORT:-9999}” before the last line in the $KAFKA_HOME/bin/kafka-server-start.sh file.
Restart the Kafka Broker.
Repeat steps 4 and 5 for all brokers in the cluster.
If you're running via systemd:
edit /etc/systemd/system/multi-user.target.wants/kafka.service
in the "[Service]" section add a line:
Environment=JMX_PORT=9989
reload: systemctl daemon-reload
restart: systemctl restart kafka
enjoy the beans: echo 'beans' | java -jar jmxterm-1.0-alpha-4-uber.jar -l localhost:9989 -n 2>&1
This is Kafka 2.3.0.
Using jconsole For Available MBeans
You should use jconsole first to know the names of the MBeans available.
The proper name of the MBean you want to query metrics of is kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec (the AllTopics prefix was used in older versions). Thanks AndyTheEntity.
Enabling Remote JMX (with no authentication or SSL)
As described in Monitoring and Management Using JMX Technology you should set certain system properties when you start the Java VM of a Kafka broker.
Kafka's bin/kafka-run-class.sh shell script makes the configuration painless as it does the basics for you and sets KAFKA_JMX_OPTS.
# JMX settings
if [ -z "$KAFKA_JMX_OPTS" ]; then
KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false "
fi
For remote JMX you should set com.sun.management.jmxremote.port, which Kafka's bin/kafka-run-class.sh shell script sets using the JMX_PORT environment variable.
# JMX port to use
if [ $JMX_PORT ]; then
KAFKA_JMX_OPTS="$KAFKA_JMX_OPTS -Dcom.sun.management.jmxremote.port=$JMX_PORT "
fi
With that, enabling remote JMX is as simple as the following command:
JMX_PORT=9999 ./bin/kafka-server-start.sh config/server.properties
Using JmxTool
With the above, run the JmxTool:
$ ./bin/kafka-run-class.sh kafka.tools.JmxTool \
--object-name 'kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec'
Trying to connect to JMX url: service:jmx:rmi:///jndi/rmi://:9999/jmxrmi.
"time","kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec:Count","kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec:EventType","kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec:FifteenMinuteRate","kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec:FiveMinuteRate","kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec:MeanRate","kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec:OneMinuteRate","kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec:RateUnit"
1567586728595,0,messages,0.0,0.0,0.0,0.0,SECONDS
1567586730597,0,messages,0.0,0.0,0.0,0.0,SECONDS
...
You can use the --one-time option to print the JMX metrics just once.
$ ./bin/kafka-run-class.sh kafka.tools.JmxTool \
--object-name 'kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec' \
--one-time true
Trying to connect to JMX url: service:jmx:rmi:///jndi/rmi://:9999/jmxrmi.
"time","kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec:Count","kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec:EventType","kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec:FifteenMinuteRate","kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec:FiveMinuteRate","kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec:MeanRate","kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec:OneMinuteRate","kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec:RateUnit"
1567586898459,0,messages,0.0,0.0,0.0,0.0,SECONDS
vim kafka_2.11-0.10.1.1/bin/kafka-run-class.sh
Then add the first two lines below and comment out the others, as I have done. (Note: after doing this, the Kafka scripts cannot be used for client operations such as listing topics; for those you need a separate copy of the scripts, downloaded to a different location.)
export JMX_PORT=9096
KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=<ipaddress> -Dcom.sun.management.jmxremote.port=$JMX_PORT -Dcom.sun.management.jmxremote.rmi.port=$JMX_PORT"
# JMX settings
#if [ -z "$KAFKA_JMX_OPTS" ]; then
# KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false "
#fi
# JMX port to use
#if [ $JMX_PORT ]; then
# KAFKA_JMX_OPTS="$KAFKA_JMX_OPTS -Dcom.sun.management.jmxremote.port=$JMX_PORT "
#fi
Use kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec
The AllTopics prefix was used in older versions. You can specify a topic using kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=<topic-name>
src: http://grokbase.com/t/kafka/users/164ksnhff0/enable-jmx-on-kafka-brokers
Kafka provides all you need. When you start your server, activate the KAFKA_JMX_OPTS argument by using this command:
$KAFKA_JMX_OPTS JMX_PORT=[your_port_number] ./kafka-server-start.sh -daemon ../config/server.properties
Using this command, you activate remote JMX on the related port. Then you can connect with JConsole or another monitoring tool.
Just before calling kafka-server-start.sh, add the following exports. It worked like a charm in my case. You can set the desired port for JMX_PORT, and you should put your broker's IP in the $BROKER_IP part.
export JMX_PORT=9900
export KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=$BROKER_IP -Djava.net.preferIPv4Stack=true"
This is the standard Kafka start procedure:
bin/kafka-server-start.sh config/server.properties
This is the Kafka start procedure with JMX:
JMX_PORT=8004 bin/kafka-server-start.sh config/server.properties