How to find the Kafka version in Linux?
Is there a way to find the installed Kafka version, other than noting the version when downloading it?
Not sure if there's a convenient way, but you can just inspect your kafka/libs folder. You should see files like kafka_2.10-0.8.2-beta.jar, where 2.10 is the Scala version and 0.8.2-beta is the Kafka version.
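The naming convention described above can be read mechanically. The following is a minimal sketch, assuming the broker jar is always named kafka_<scala>-<kafka>.jar (optionally with a trailing .asc); the function names parse_broker_jar and versions_in_libs are made up for illustration and are not part of Kafka.

```python
import re
from pathlib import Path

# Matches broker jar names of the form kafka_<scala>-<kafka>.jar, optionally
# followed by .asc (signature files), e.g. kafka_2.10-0.8.2-beta.jar.
# Assumption: the Scala version is purely numeric with dots.
BROKER_JAR = re.compile(
    r"^kafka_(?P<scala>\d+(?:\.\d+)+)-(?P<kafka>.+?)\.jar(?:\.asc)?$"
)


def parse_broker_jar(filename):
    """Return (scala_version, kafka_version), or None if the name does not match."""
    match = BROKER_JAR.match(filename)
    if match is None:
        return None
    return match.group("scala"), match.group("kafka")


def versions_in_libs(libs_dir):
    """Collect the distinct (scala, kafka) pairs found in a libs directory."""
    found = set()
    for entry in Path(libs_dir).iterdir():
        parsed = parse_broker_jar(entry.name)
        if parsed is not None:
            found.add(parsed)
    return sorted(found)
```

Calling versions_in_libs("kafka/libs") on a real installation should give a single pair; more than one pair would suggest mixed jars on the classpath.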
Kafka 2.0 has the fix (KIP-278) for this:
kafka-topics.sh --version
Or
kafka-topics --version
Using the confluent utility:
The Kafka version can be checked with the confluent utility, which ships by default with the Confluent Platform (the confluent utility can also be added to a cluster separately; credits cricket_007).
${confluent.home}/bin/confluent version kafka
Checking the version of other Confluent Platform components such as ksql, schema-registry and connect:
[confluent-4.1.0]$ ./bin/confluent version kafka
1.1.0-cp1
[confluent-4.1.0]$ ./bin/confluent version connect
4.1.0
[confluent-4.1.0]$ ./bin/confluent version schema-registry
4.1.0
[confluent-4.1.0]$ ./bin/confluent version ksql-server
4.1.0
There is nothing like kafka --version at this point. So you should either check the version in your kafka/libs/ folder, or you can run
find ./libs/ -name \*kafka_\* | head -1 | grep -o 'kafka_.*'
from your kafka folder (which does the same thing for you). It returns something like kafka_2.9.2-0.8.1.1.jar.asc, where 0.8.1.1 is your Kafka version.
There are several ways to find the Kafka version.
Method 1 (simple):
ps -ef|grep kafka
This displays all running Kafka processes in the console, including the jar paths on their command line.
Ex:- /usr/hdp/current/kafka-broker/bin/../libs/kafka-clients-0.10.0.2.5.3.0-37.jar
Here we are using version 0.10.0.2.5.3.0-37 of Kafka.
Method 2:
Go to the Kafka libs directory and list the jars:
cd /usr/hdp/current/kafka-broker/libs
ll |grep kafka
Ex:- kafka_2.10-0.10.0.2.5.3.0-37.jar
kafka-clients-0.10.0.2.5.3.0-37.jar
This gives the same result as method 1: the version is visible in the names of the jars in the Kafka libs directory.
You can grep the logs to see the version. Let's say kafka is installed under /usr/local/kafka, then:
$ grep "Kafka version" /usr/local/kafka/logs/*
/usr/local/kafka/logs/kafkaServer.out: INFO Kafka version : 0.9.0.1 (org.apache.kafka.common.utils.AppInfoParser)
will reveal the version
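The same log search can be done programmatically. This is a rough sketch, assuming the broker log contains a line with "Kafka version" followed by a colon and the version, as in the sample output above; the helper name versions_in_log is hypothetical.

```python
import re

# Matches both "Kafka version : 0.9.0.1" and "Kafka version: 2.4.0".
# Assumption: the version is the first whitespace-free token after the colon.
VERSION_LINE = re.compile(r"Kafka version\s*:\s*(\S+)")


def versions_in_log(lines):
    """Return the distinct versions mentioned in the given log lines, in order."""
    found = []
    for line in lines:
        match = VERSION_LINE.search(line)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found


def versions_in_log_file(path):
    """Convenience wrapper that reads a log file from disk."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return versions_in_log(handle)
```

If a log file covers several broker restarts across an upgrade, more than one version may be returned; the last entry is the most recent.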
If you want to check the version of a specific Kafka broker, run this CLI on the broker*
kafka-broker-api-versions.sh --bootstrap-server localhost:9092 --version
where localhost:9092 is the reachable <hostname|IP address>:<port> of the broker to check (localhost works if you run the command on the broker host itself). Example output:
2.4.0 (Commit:77a89fcf8d7fa018)
* Apache Kafka comes with a variety of console tools in the ./bin sub-directory of your Kafka download; e.g. ~/kafka/bin/
A simple way on macOS, e.g. when Kafka was installed via Homebrew:
$ ls -l $(which kafka-topics)
/usr/local/bin/kafka-topics -> ../Cellar/kafka/0.11.0.1/bin/kafka-topics
On Debian/Ubuntu you can use:
dpkg -l|grep kafka
The expected result looks like:
ii confluent-kafka-2.11 0.11.0.1-1 all publish-subscribe messaging rethought as a distributed commit log
ii confluent-kafka-connect-elasticsearch 3.3.1-1 all Kafka Connect connector for copying data between Kafka and Elasticsearch
ii confluent-kafka-connect-hdfs 3.3.1-1 all Kafka Connect connector for copying data between Kafka and Hadoop HDFS
ii confluent-kafka-connect-jdbc 3.3.1-1 all Kafka Connect connector for JDBC-compatible databases
ii confluent-kafka-connect-replicator 3.3.1-1 all Kafka Connect connector for replicating topics between Kafka clusters
ii confluent-kafka-connect-s3 3.3.1-1 all Kafka Connect S3 connector for copying data between Kafka and
ii confluent-kafka-connect-storage-common 3.3.1-1 all Kafka Connect Storage Common contains packages used by storage
ii confluent-kafka-rest 3.3.1-1 all A REST proxy for Kafka
Go to the kafka/libs folder.
There are multiple jars; look for something like kafka_2.11-0.10.1.1.jar.asc. In this case the Kafka version is 0.10.1.1.
I found an easy way to do this without searching directories or log files:
kafka-dump-log --version
Output looks like this:
5.3.0-ccs (Commit:6481debc2be778ee)
cd kafka
./bin/kafka-topics.sh --version
When you install Kafka on CentOS 7 with Confluent:
yum install confluent-platform-oss-2.11
you can see the version of Kafka with:
yum deplist confluent-platform-oss-2.11
In the output you can read: confluent-kafka-2.11 >= 0.10.2.1
To find the Kafka version, we can use the jps command, which shows all the Java processes running on the machine.
Step 1: Let's say you are running Kafka as the root user. Log in to your machine as root and run jps -m. It shows a result like
4979 Jps -m
9434 Kafka config/server.properties
Step 2: From the above result, take the PID of the Kafka process and run pwdx 9434, which reports the current working directory of the process. The result looks like
9434: /apps/kafka_2.12-2.4.0
Here 2.12 is the Scala version and 2.4.0 is the Kafka version (this relies on Kafka having been started from its versioned installation directory).
cd confluent-7.2.0/share/java/kafka
then
$ ls -lha | grep kafka
-rw-r--r-- 1 root root 5.3M Jul 5 09:45 kafka_2.13-7.2.0-ccs.jar
-rw-r--r-- 1 root root 4.8M Jul 5 09:45 kafka-clients-7.2.0-ccs.jar
lrwxrwxrwx 1 root root 26 Jul 23 10:10 kafka.jar -> ./kafka_2.13-7.2.0-ccs.jar
-rw-r--r-- 1 root root 9.4K Jul 5 09:45 kafka-log4j-appender-7.2.0-ccs.jar
-rw-r--r-- 1 root root 458K Jul 5 09:45 kafka-metadata-7.2.0-ccs.jar
-rw-r--r-- 1 root root 182K Jul 5 09:45 kafka-raft-7.2.0-ccs.jar
-rw-r--r-- 1 root root 36K Jul 5 09:45 kafka-server-common-7.2.0-ccs.jar
-rw-r--r-- 1 root root 84K Jul 5 09:45 kafka-shell-7.2.0-ccs.jar
-rw-r--r-- 1 root root 151K Jul 5 09:45 kafka-storage-7.2.0-ccs.jar
-rw-r--r-- 1 root root 23K Jul 5 09:45 kafka-storage-api-7.2.0-ccs.jar
-rw-r--r-- 1 root root 1.6M Jul 5 09:45 kafka-streams-7.2.0-ccs.jar
-rw-r--r-- 1 root root 41K Jul 5 09:45 kafka-streams-examples-7.2.0-ccs.jar
-rw-r--r-- 1 root root 161K Jul 5 09:45 kafka-streams-scala_2.13-7.2.0-ccs.jar
-rw-r--r-- 1 root root 52K Jul 5 09:45 kafka-streams-test-utils-7.2.0-ccs.jar
-rw-r--r-- 1 root root 127K Jul 5 09:45 kafka-tools-7.2.0-ccs.jar
If you are running the landoop/fast-data-dev Docker image, you can also type
cat /build.info
This will give you an output like this
BUILD_BRANCH=master
BUILD_COMMIT=434160726dacc4a1a592fe6036891d6e646a3a4a
BUILD_TIME=2017-05-12T16:02:04Z
DOCKER_REPO=index.docker.io/landoop/fast-data-dev
KAFKA_VERSION=0.10.2.1
CP_VERSION=3.2.1
To check the Kafka version:
cd /usr/hdp/current/kafka-broker/libs
ls kafka_*.jar
Usually, after installing a Kafka cluster from scratch, I see these files under /data/kafka-logs (the Kafka broker log directory, where all topics are stored):
ls -ltr
-rw-r--r-- 1 kafka hadoop 0 Jan 9 10:07 cleaner-offset-checkpoint
-rw-r--r-- 1 kafka hadoop 57 Jan 9 10:07 meta.properties
drwxr-xr-x 2 kafka hadoop 4096 Jan 9 10:51 _schemas-0
-rw-r--r-- 1 kafka hadoop 17 Jan 10 07:39 recovery-point-offset-checkpoint
-rw-r--r-- 1 kafka hadoop 17 Jan 10 07:39 replication-offset-checkpoint
But on some other from-scratch Kafka installations, the /data/kafka-logs folder is empty.
Does this indicate a problem?
Note: we have not created any topics yet.
I'm not sure when each checkpoint file is created (they track log cleaner and replication offsets), but I assume that meta.properties is created at broker startup.
Beyond those, you would see one folder per topic-partition; for example, it looks like you had one topic created, _schemas.
If you only see one partition folder across multiple brokers, then the replication factor for that topic is set to 1.
I have a Kafka topic called retentions, and below is the server configuration related to retention:
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=3600000 (~ 1 hour)
log.cleaner.enable=true
And below is the topic specific config:
retention.ms=2592000000,retention.bytes=3298534883328
where retention.ms = 30 days and retention.bytes = 3 TiB (about 3.3 TB)
I configured retention.ms and retention.bytes recently (on 14th Jan 2019) using commands like the following:
./bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic retentions --config retention.bytes=219902325555
Here the configuration for retention.bytes seems to be working, while retention.ms does not seem to be working. Here is the evidence that I could collect:
cd log_dir/retentions-0/
ls -lrt 00000000000000000000.*
-rw-r--r-- 1 root root 294387381 Nov 26 22:37 00000000000000000000.log
-rw-r--r-- 1 root root 3912 Jan 14 18:06 00000000000000000000.index
-rw-r--r-- 1 root root 5868 Jan 14 18:06 00000000000000000000.timeindex
If we look at the older segments, these are nearly 2 months old.
Can anybody tell me which of these two configurations takes priority? Or do both apply, with whichever threshold is crossed first triggering deletion?
My assumption is that both configurations work in conjunction. Please let me know if this is not the case.
Both work in conjunction.
From Kafka: The Definitive Guide book
If you have specified a value for both log.retention.bytes and log.retention.ms ... messages may be removed when either criteria is met.
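To make the "either criterion" behaviour concrete, here is a simplified model, not the broker's actual code. It assumes segments are listed oldest first, that the last one is the active segment and is never deleted, and that a negative limit disables the corresponding check. The Segment class and the segments_to_delete function are invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    base_offset: int
    size_bytes: int
    last_timestamp_ms: int  # largest record timestamp in the segment (assumed)


def segments_to_delete(segments, now_ms, retention_ms, retention_bytes):
    """Return segments eligible for deletion by time OR by size (simplified model)."""
    # Time check: drop leading closed segments older than retention_ms.
    expired = []
    for segment in segments[:-1]:
        if retention_ms >= 0 and now_ms - segment.last_timestamp_ms > retention_ms:
            expired.append(segment)
        else:
            break

    # Size check: on what is left, drop the oldest closed segments while the
    # partition would still hold at least retention_bytes afterwards.
    remaining = segments[len(expired):]
    excess = 0
    if retention_bytes >= 0:
        excess = sum(s.size_bytes for s in remaining) - retention_bytes
    oversize = []
    for segment in remaining[:-1]:
        if excess - segment.size_bytes >= 0:
            oversize.append(segment)
            excess -= segment.size_bytes
        else:
            break
    return expired + oversize
```

In this model a segment goes away as soon as one of the two limits says so, which matches the quoted statement; neither setting overrides the other.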
I am trying to understand the Kafka data logs. I can see the logs under the directory set in log.dirs as "Topicname-partitionnumber". However, I would like to know what the different files under it are. Below is the screenshot for a sample log.
In Kafka, each partition has its own directory under the configured log directory. Each partition is split into segments.
A segment is just a collection of messages. Instead of writing all messages into a single file, Kafka splits them into chunks called segments.
Whenever Kafka writes to a partition, it writes to the active segment. Each segment has a defined size limit. When the segment size limit is reached, Kafka closes the segment and opens a new one, which becomes the active segment. One partition can have one or more segments, depending on the configuration.
Each segment consists of three files: segment.log, segment.index and segment.timeindex
There are three types of file for each Kafka topic partition:
-rw-r--r-- 1 kafka hadoop 10485760 Dec 3 23:57 00000000000000000000.index
-rw-r--r-- 1 kafka hadoop 148814230 Oct 11 06:50 00000000000000000000.log
-rw-r--r-- 1 kafka hadoop 10485756 Dec 3 23:57 00000000000000000000.timeindex
The 00000000000000000000 in front of the log and index files is the name of the segment. It represents the offset of the first record written in that segment. For example, with two segments, Segment 1 containing message offsets 0 and 1 and Segment 2 containing message offsets 2 and 3, the directory would contain:
-rw-r--r-- 1 kafka hadoop 10485760 Dec 3 23:57 00000000000000000000.index
-rw-r--r-- 1 kafka hadoop 148814230 Oct 11 06:50 00000000000000000000.log
-rw-r--r-- 1 kafka hadoop 10485756 Dec 3 23:57 00000000000000000000.timeindex
-rw-r--r-- 1 kafka hadoop 10485760 Dec 3 23:57 00000000000000000002.index
-rw-r--r-- 1 kafka hadoop 148814230 Oct 11 06:50 00000000000000000002.log
-rw-r--r-- 1 kafka hadoop 10485756 Dec 3 23:57 00000000000000000002.timeindex
The .log file stores the offset, the physical position of the message and the timestamp, along with the message content. When reading messages from Kafka at a particular offset, finding that offset in a huge log file becomes an expensive task.
That's where the .index file becomes useful. It stores the offsets and the physical positions of the messages in the log file.
The .timeindex file is based on the timestamps of the messages.
The files without a suffix are the segment files, i.e. the files the data is actually written to, named by earliest contained message offset. The latest of those is the active segment, meaning the one that messages are currently appended to.
The .index files are the corresponding mappings from offset to position in the segment file. The .timeindex files are mappings from timestamp to offset.
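Because each segment is named after the offset of its first record, the segment that should hold a given offset can be found from the file names alone. The sketch below illustrates that idea with a binary search; the function names are hypothetical and it only looks at directory listings, not at file contents.

```python
import bisect


def base_offset(filename):
    """Base offset encoded in a segment file name such as 00000000000000000002.log."""
    return int(filename.split(".", 1)[0])


def segment_for_offset(filenames, offset):
    """Return the .log file whose base offset is the largest one <= offset.

    Returns None if the offset is below the first segment. Note that an offset
    past the end of the partition still maps to the last (active) segment,
    because file names alone do not tell us where the log ends.
    """
    logs = {base_offset(name): name for name in filenames if name.endswith(".log")}
    bases = sorted(logs)
    index = bisect.bisect_right(bases, offset) - 1
    if index < 0:
        return None
    return logs[bases[index]]
```

With the two-segment listing shown earlier, offsets 0 and 1 resolve to the segment named with base offset 0, and offsets 2 and 3 resolve to the one named with base offset 2.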
Below is the screenshot for a sample log
You should add your screenshot and sample log; then we could give you a specific answer.
Until then, I can only give you some general knowledge:
E.g. on my CentOS machine, for the folder
/root/logs/kafka/kafka.log/storybook_add-0
storybook_add is the topic name
(in code, the real topic name is storybook-add)
It contains:
[root@xxx storybook_add-0]# ll
total 8
-rw-r--r-- 1 root root 10485760 Aug 28 16:44 00000000000000000023.index
-rw-r--r-- 1 root root 700 Aug 28 16:45 00000000000000000023.log
-rw-r--r-- 1 root root 10485756 Aug 28 16:44 00000000000000000023.timeindex
-rw-r--r-- 1 root root 9 Aug 28 16:44 leader-epoch-checkpoint
00000000000000000023.log: the log file
it stores the real data, i.e. the Kafka messages
00000000000000000023.index: the index file
00000000000000000023.timeindex: the timeindex file
->
00000000000000000023 is called the segment name
Why 23?
The first message stored in 00000000000000000023.log has offset 23,
i.e. Kafka had previously received 23 messages in total (offsets 0 to 22) for this partition.
What does the message data look like?
We can see it by looking at the file's content:
For the basic concepts and logic of Kafka storage, I recommend reading this article:
A Practical Introduction to Kafka Storage Internals
Zookeeper's rapidly pooping its internal binary files all over our production environment.
According to: http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html
and
http://dougchang333.blogspot.com/2013/02/zookeeper-cleaning-logs-snapshots.html
this is expected behavior and you must call org.apache.zookeeper.server.PurgeTxnLog
regularly to rotate its poop.
So:
% ls -l1rt /tmp/zookeeper/version-2/
total 314432
-rw-r--r-- 1 root root 67108880 Jun 26 18:00 log.1
-rw-r--r-- 1 root root 947092 Jun 26 18:00 snapshot.e99b
-rw-r--r-- 1 root root 67108880 Jun 27 05:00 log.e99d
-rw-r--r-- 1 root root 1620918 Jun 27 05:00 snapshot.1e266
... many more
% sudo java -cp zookeeper-3.4.6.jar::lib/jline-0.9.94.jar:lib/log4j-1.2.16.jar:lib/netty-3.7.0.Final.jar:lib/slf4j-api-1.6.1.jar:lib/slf4j-log4j12-1.6.1.jar:conf \
org.apache.zookeeper.server.PurgeTxnLog \
/tmp/zookeeper/version-2 /tmp/zookeeper/version-2 -n 3
but I get:
% ls -l1rt /tmp/zookeeper/version-2/
... all the existing logs plus a new directory
/tmp/zookeeper/version-2/version-2
Am I doing something wrong?
zookeeper-3.4.6/
ZooKeeper now has an Autopurge feature as of 3.4.0. Take a look at https://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html
It says you can use autopurge.snapRetainCount and autopurge.purgeInterval
autopurge.snapRetainCount
New in 3.4.0: When enabled, ZooKeeper auto purge feature retains the autopurge.snapRetainCount most recent snapshots and the corresponding transaction logs in the dataDir and dataLogDir respectively and deletes the rest. Defaults to 3. Minimum value is 3.
autopurge.purgeInterval
New in 3.4.0: The time interval in hours for which the purge task has to be triggered. Set to a positive integer (1 and above) to enable the auto purging. Defaults to 0.
Since I'm not hearing a fix via Zookeeper, this was an easy workaround:
COUNT=6
DATADIR=/tmp/zookeeper/version-2/
ls -1drt ${DATADIR}/* | head --lines=-${COUNT} | xargs sudo rm -f
Should run once a day from a cron job or jenkins to prevent zookeeper from exploding.
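For anyone who prefers to review what such a cleanup would remove before deleting anything, here is a dry-run sketch of the same "keep the newest COUNT entries" rule. It only selects candidates by modification time and deletes nothing; the function names are made up for illustration, and on ZooKeeper 3.4.0 or later the autopurge settings mentioned above are the supported route.

```python
import os


def select_for_purge(entries, keep):
    """Given (path, mtime) pairs, return the paths to delete, oldest first.

    The `keep` most recently modified entries are retained.
    """
    ordered = [path for path, _ in sorted(entries, key=lambda entry: entry[1])]
    return ordered[:max(len(ordered) - keep, 0)]


def purge_candidates(data_dir, keep):
    """List what would be deleted from data_dir without deleting anything."""
    with os.scandir(data_dir) as listing:
        entries = [(entry.path, entry.stat().st_mtime) for entry in listing]
    return select_for_purge(entries, keep)
```

Printing purge_candidates("/tmp/zookeeper/version-2", 6) shows the files the one-liner above would pass to rm, which makes it easier to sanity-check the retention count first.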
You need to pass both the dataDir and snapDir parameters, using the value that is configured as dataDir in your ZooKeeper .properties file.
Suppose your configuration looks like the following:
dataDir=/data/zookeeper
You then need to call PurgeTxnLog (version 3.5.9) like the following if you want to keep the last 10 logs/snapshots:
java -cp zookeeper.jar:lib/slf4j-api-1.7.5.jar:lib/slf4j-log4j12-1.7.5.jar:lib/log4j-1.2.17.jar:conf org.apache.zookeeper.server.PurgeTxnLog /data/zookeeper /data/zookeeper -n 10
I am new to ZooKeeper and it has been a real struggle to install and run it. I am not sure what is wrong here, but I will explain what I've been doing to make it clearer:
1.- I've followed the installation guide provided by Apache. That means I downloaded the ZooKeeper distribution (stable release), extracted the archive and moved it into my home directory.
2.- As I am using Ubuntu 12.04, I've modified the .bashrc file to include this:
export ZOOKEEPER_INSTALL=/home/myusername/zookeeper-3.4.5
export PATH=$PATH:$ZOOKEEPER_INSTALL/bin
3.- Create a config file on conf/zoo.cfg
tickTime=2000
dataDir=/var/zookeeper
clientPort=2181
and also tried with:
dataDir=/var/log/zookeeper
and
dataDir=/var/bin/zookeeper
4.- When I run the start command,
zkServer.sh start or bin/zkServer.sh start, nothing happens and it always returns this:
JMX enabled by default
Using config: /home/sasuke/zookeeper-3.4.5/bin/../conf/zoo.cfg
mkdir: cannot create directory `/var/zookeeper': Permission denied
Starting zookeeper ... /home/sasuke/zookeeper-3.4.5/bin/zkServer.sh: line 113: /var/zookeeper/zookeeper_server.pid: No such file or directory
FAILED TO WRITE PID
I have Java installed, and inside the zookeeper directory there is a zookeeper.jar file that I think is not running.
Searching here on Stack Overflow, I found someone who said he could run ZooKeeper after typing
ssh localhost
But when I try to do it I get this error
ssh: connect to host localhost port 22: Connection refused
Please help. I've been trying to solve this for too long.
ZooKeeper getting started guide:
http://zookeeper.apache.org/doc/r3.1.2/zookeeperStarted.html
Previous case solved with ssh localhost:
Zookeeper: FAILED TO WRITE PID
UPDATE:
The permissions for log are:
drwxr-xr-x 19 root root 4096 Oct 10 07:52 log
and for zookeeper:
drwxr-xr-x 2 zookeeper zookeeper 4096 Mar 23 2012 zookeeper
Should I change any of these?
I have had the same problem. In my case it helped to start ZooKeeper with the configuration file specified explicitly:
/bin/zkServer.sh start conf/zoo.conf
It seems you do not have the required permissions. The owner of /var/log is going to be root. ZooKeeper stores the process ID and data snapshots in the directory configured as dataDir. The process ID of the spawned ZooKeeper server is stored in the file zookeeper_server.pid (as of 3.3.6).
If you have root privileges, you could start ZooKeeper with sudo; it should work, but it is definitely not recommended. Make sure you start ZooKeeper with the same (or higher) permissions as the owner of the directory.
Create a new directory in your home folder like /home/username/zookeeper-data.
Let dataDir point to that directory and it should work.
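A quick way to check this before starting the server is to read dataDir from zoo.cfg and test whether the current user could create files there. The following is a minimal sketch under the assumption that zoo.cfg is a plain key=value file; parse_zoo_cfg and can_write_data_dir are hypothetical helper names, not ZooKeeper tooling.

```python
import os


def parse_zoo_cfg(text):
    """Parse key=value lines from zoo.cfg content, skipping blanks and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def can_write_data_dir(data_dir):
    """Check whether the current user could create files under data_dir.

    If data_dir does not exist yet, the nearest existing ancestor is checked,
    since the start script tries to create the directory itself.
    """
    path = os.path.abspath(data_dir)
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:
            return False
        path = parent
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)
```

For the configuration in the question, can_write_data_dir("/var/zookeeper") would be expected to return False for a regular user, whereas a directory under the user's home should return True.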
The default zookeeper installation (tar extract) comes with the conf file named conf/zoo_sample.cfg while the same extract's bin/zkServer.sh expects the conf file to be called zoo.cfg thereby resulting in a "No such file or dir" and the "failed to write pid" error. So before running zkServer.sh to start or stop zookeeper instance, either:
rename the zoo_sample.cfg in the conf dir to zoo.cfg, or
give the name (and path) to the conf file (as suggested by Ilya Lapitan), or, of course
edit zkServer.sh ;-)
When you create the directory for dataDir, make sure to use the -p option. This creates any missing parent directories as well, so the application can place its files there.
mkdir -p /var/log/zookeeperData
Then set:
dataDir=/var/log/zookeeperData
Seems there's all kinds of reasons this can happen. So many helpful answers here!
For me, I had improper line endings in my zoo.cfg file, and possibly invisible characters, so zookeeper was trying to create directories like /var/zookeeper? and /var/zookeeper\r. Reworking my zoo.cfg a bit fixed it for me, along with deleting zoo_sample.conf.
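Stray carriage returns and invisible characters like these are hard to spot in an editor. The sketch below flags suspicious lines in a config file by scanning the raw bytes; it assumes the file is meant to be plain ASCII with Unix line endings, and suspicious_lines is a made-up helper name.

```python
def suspicious_lines(raw_bytes):
    """Return (line_number, line_bytes) for lines containing a carriage return,
    another control character (tabs are allowed), or a non-ASCII byte."""
    problems = []
    for number, line in enumerate(raw_bytes.split(b"\n"), start=1):
        if any((byte < 0x20 and byte != 0x09) or byte > 0x7E for byte in line):
            problems.append((number, line))
    return problems


def check_config(path):
    """Read a config file as bytes and report suspicious lines."""
    with open(path, "rb") as handle:
        return suspicious_lines(handle.read())
```

Running check_config("conf/zoo.cfg") on a file saved with Windows line endings would list every line, each ending in \r, which is the pattern that leads to directory names like /var/zookeeper\r.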
This happened to me due to low disk space, because ZooKeeper can't create the PID file inside the ZooKeeper data folder.
I faced the same issue while starting ZooKeeper with this command:
hadoop@ubuntu:~/hadoop/zookeeper/zookeeper-3.4.8$ bin/zkServer.sh start
ERROR [main] client.ConnectionManager$HConnectionImplementation:
The node /hbase is not in ZooKeeper.
It should have been written by the master. Check the value configured in zookeeper.znode.parent. There could be a mismatch with the one configured in the master.
But running the script with sudo rectified the issue:
hadoop@ubuntu:~/hadoop/zookeeper/zookeeper-3.4.8$ sudo bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/hadoop/hadoop/zookeeper/zookeeper-3.4.8/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
Go to /usr/local/etc/.
You will find a zookeeper directory there.
Delete that directory
and restart the server: zkServer start
Change the path to dataDir=/tmp/zookeeper. If that works, then it is clearly an access (permissions) issue.
But it is generally not advisable to use the tmp directory.
This seems to be an ownership issue; running the following solved this for me.
$ sudo chown -R $USER /var/lib/zookeeper
N.B.
I've outlined my steps below which show the error I was getting (the same as the error in this SO question) and the attempt at trying the solution proposed by a user above, which advised to provide zoo.cfg as an argument.
13:01:29 ✔ ~ :: $ZK/bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/Cellar/zookeeper/3.4.14/libexec/bin/../conf/zoo.cfg
Starting zookeeper ... /usr/local/Cellar/zookeeper/3.4.14/libexec/bin/zkServer.sh: line 149: /var/lib/zookeeper/zookeeper_server.pid: Permission denied
FAILED TO WRITE PID
13:01:32 ✘ ~ :: $ZK/bin/zkServer.sh start $ZK/conf/zoo.cfg
ZooKeeper JMX enabled by default
Using config: /usr/local/Cellar/zookeeper/3.4.14/libexec/conf/zoo.cfg
Starting zookeeper ... /usr/local/Cellar/zookeeper/3.4.14/libexec/bin/zkServer.sh: line 149: /var/lib/zookeeper/zookeeper_server.pid: Permission denied
FAILED TO WRITE PID
13:04:45 ✔ /var/lib :: ls -la
total 0
drwxr-xr-x 4 root wheel 128 Apr 19 18:55 .
drwxr-xr-x 27 root wheel 864 Apr 19 18:55 ..
drwxr--r-- 3 root wheel 96 Mar 24 15:07 zookeeper
13:04:48 ✔ /var/lib :: echo $USER
tallamjr
13:06:03 ✔ /var/lib :: sudo chown -R $USER zookeeper
Password:
13:06:44 ✔ /var/lib :: ls -la
total 0
drwxr-xr-x 4 root wheel 128 Apr 19 18:55 .
drwxr-xr-x 27 root wheel 864 Apr 19 18:55 ..
drwxr--r-- 3 tallamjr wheel 96 Mar 24 15:07 zookeeper
13:06:48 ✔ ~ :: $ZK/bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/Cellar/zookeeper/3.4.14/libexec/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
REF:
- https://askubuntu.com/questions/6723/change-folder-permissions-and-ownership
For me this solution worked:
I granted read, write and execute permissions to everyone on the zookeeper directory inside /var (/var/zookeeper), using the command sudo chmod 777 foldername.
After executing this command, try running ZooKeeper. It ran in my case.
Try using sudo -E bin/zkServer.sh start