I need to see the offset values that storm is using to read via its KafkaSpout. Here is the config I pass in:
SpoutConfig kafkaConfig = new SpoutConfig(brokerHosts, "some_values",
"/storm/env_values", "storm_DEBUG");
I've tried searching about with some of the kafka tools but haven't found anything useful yet:
kafka.tools.ExportZkOffsets
kafka.tools.ConsumerOffsetChecker
Are there better tools to use to find my offset?
Take a look at https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem
Under Management Consoles there is a list of tools for monitoring your topics and offsets. I remember using https://github.com/otoolep/stormkafkamon, a Python monitoring tool.
But what exactly do you want? If you only want to see your offset, you can read it from Zookeeper: just connect to the Zookeeper instance where your offsets are stored.
Example:
bin/zookeeper-shell.sh localhost:2000
Connecting to localhost:2000
Welcome to ZooKeeper! JLine support is disabled
ls /
[storm, brokers, zookeeper]
ls /brokers
[kafka-spout]
ls /brokers/kafka-spout
[partition_0]
get /brokers/kafka-spout/partition_0
{"topology":{"id":"a9be1962-6b4e-4ed4-ae68-155a1948a1f6","name":"consolidate_reports"},"offset":4426029,"partition":0,"broker":{"host":"localhost","port":9092},"topic":"bid_history"}
cZxid = 0x50
ctime = Thu May 21 11:00:48 BRT 2015
mZxid = 0x50
mtime = Thu May 21 11:00:48 BRT 2015
pZxid = 0x50
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 182
numChildren = 0
KafkaSpout maintains the offset information on Storm's zookeeper by default under {root path}/{id}/{partition-id}. Read this for more information.
You can use the bundled zookeeper shell in Kafka (bin/zookeeper-shell.sh zookeeperHost:port) and browse to the location using the CLI. Issuing a get against the offset path will give you the stored value.
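As a rough illustration of the {root path}/{id}/{partition-id} layout, here is a minimal Python sketch that builds the znode path from the SpoutConfig values in the question and pulls the offset out of the JSON stored there. The helper names are hypothetical, and the "partition_<n>" node naming is an assumption based on the shell listing in the other answer; fetching the data from Zookeeper is left out.

```python
import json


def spout_offset_path(zk_root, spout_id, partition):
    """Build {root path}/{id}/{partition-id} for one partition."""
    # Assumption: partition nodes are named "partition_<n>", as in the
    # zookeeper-shell listing shown in this thread.
    return "{}/{}/partition_{}".format(zk_root.rstrip("/"), spout_id, partition)


def read_offset(znode_data):
    """Extract the offset from the JSON document stored in the znode."""
    return json.loads(znode_data)["offset"]


# zkRoot and id taken from the SpoutConfig in the question.
path = spout_offset_path("/storm/env_values", "storm_DEBUG", 0)
print(path)

# Payload copied from the example `get` output in this thread.
sample = (
    '{"topology":{"id":"a9be1962-6b4e-4ed4-ae68-155a1948a1f6",'
    '"name":"consolidate_reports"},"offset":4426029,"partition":0,'
    '"broker":{"host":"localhost","port":9092},"topic":"bid_history"}'
)
print(read_offset(sample))
```

The printed path is what you would pass to get in the Zookeeper shell connected to Storm's Zookeeper.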
Turns out that I was looking at the wrong zookeeper. According to this doc:
The Kafka spout stores its offsets in the same instance of Zookeeper used by Apache Storm.
So looking at the Kafka Zookeeper isn't going to be very helpful.
Is there a command to show the details of Kafka server or the status of Kafka server? (I am not trying to find out if the kafka server is running.)
I can only find information on topic, partition, producer, and consumer CLI commands.
If you are looking for the Kafka cluster broker status, you can use zookeeper cli to find the details for each broker as given below:
ls /brokers/ids returns the list of active brokers IDs on the cluster.
get /brokers/ids/<id> returns the details of the broker with the given ID.
Example :
kafka_2.12-1.1.1 % ./bin/zookeeper-shell.sh localhost:2181 ls /brokers/ids
Connecting to localhost:2181
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[0]
kafka_2.12-1.1.1 % ./bin/zookeeper-shell.sh localhost:2181 get /brokers/ids/0
Connecting to localhost:2181
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
{"listener_security_protocol_map":{"PLAINTEXT":"PLAINTEXT"},"endpoints":["PLAINTEXT://localhost:9092"],"jmx_port":-1,"host":"localhost","timestamp":"1558428038778","port":9092,"version":4}
cZxid = 0x116
ctime = Tue May 21 08:40:38 UTC 2019
mZxid = 0x116
mtime = Tue May 21 08:40:38 UTC 2019
pZxid = 0x116
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x16ad9830f16000b
dataLength = 188
numChildren = 0
You can put these steps in some shell script to get the details for all brokers.
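If you prefer to post-process the output in a script, a minimal Python sketch could look like the following. It only parses text captured from the get command shown above; how the output is collected (for example by invoking zookeeper-shell.sh per broker ID) is left out, and the function names are hypothetical.

```python
import json


def parse_broker_registration(get_output):
    """Return the JSON registration found in zookeeper-shell `get` output."""
    for line in get_output.splitlines():
        line = line.strip()
        # The registration is the only line that starts with a JSON object.
        if line.startswith("{"):
            return json.loads(line)
    raise ValueError("no JSON registration found in output")


def summarize(broker_id, registration):
    """Format one broker's host, port and endpoints on a single line."""
    return "broker {}: {}:{} endpoints={}".format(
        broker_id,
        registration["host"],
        registration["port"],
        ",".join(registration["endpoints"]),
    )


# Output copied (and shortened) from the example above.
sample_output = """Connecting to localhost:2181
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
{"listener_security_protocol_map":{"PLAINTEXT":"PLAINTEXT"},"endpoints":["PLAINTEXT://localhost:9092"],"jmx_port":-1,"host":"localhost","timestamp":"1558428038778","port":9092,"version":4}
cZxid = 0x116
numChildren = 0
"""

registration = parse_broker_registration(sample_output)
print(summarize(0, registration))
```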
You can activate JMX metrics by setting environment variable JMX_PORT.
$ export JMX_PORT=9010
$ ./bin/kafka-server-start.sh ./config/server.properties
Then, you can use jconsole or Java Mission Control to display cluster metrics.
Or at least one of them? I don't get it when I use kafka-topics.sh --list or --describe, perhaps I'm missing the option for verbosity, although I don't see them in the attribute list for topic configuration at all. Is it not sensible information with Kafka?
You can see the Kafka topic creation time (ctime) and last modified time (mtime) in the Zookeeper stat output.
First, run stat against the topic's znode from the Zookeeper shell:
kafka % bin/zookeeper-shell.sh localhost:2181 stat /brokers/topics/test-events
It returns the following details:
Connecting to localhost:2181
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
cZxid = 0x1007ac74c
ctime = Thu Nov 01 10:38:39 UTC 2018
mZxid = 0x4000f6e26
mtime = Mon Jan 07 05:22:25 UTC 2019
pZxid = 0x1007ac74d
cversion = 1
dataVersion = 8
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 112
numChildren = 1
Refer to this page to understand the attributes: https://zookeeper.apache.org/doc/current/zookeeperProgrammers.html#sc_zkStatStructure
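If you need these timestamps in a script, here is a minimal Python sketch that parses the stat output shown above. The function names are hypothetical, and the parser assumes the "Day Mon DD HH:MM:SS ZONE YYYY" layout seen in this output; the zone is returned as a plain string rather than converted.

```python
from datetime import datetime


def parse_stat(text):
    """Turn `key = value` lines from zookeeper-shell stat output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:  # skip lines such as "WATCHER::" that are not key/value pairs
            fields[key.strip()] = value.strip()
    return fields


def parse_zk_time(value):
    """Parse e.g. 'Thu Nov 01 10:38:39 UTC 2018' into (naive datetime, zone name)."""
    parts = value.split()  # [weekday, month, day, time, zone, year]
    zone = parts[4]
    stamp = " ".join(parts[1:4] + parts[5:])
    return datetime.strptime(stamp, "%b %d %H:%M:%S %Y"), zone


sample = """cZxid = 0x1007ac74c
ctime = Thu Nov 01 10:38:39 UTC 2018
mZxid = 0x4000f6e26
mtime = Mon Jan 07 05:22:25 UTC 2019
"""

fields = parse_stat(sample)
created, zone = parse_zk_time(fields["ctime"])
modified, _ = parse_zk_time(fields["mtime"])
print("created:", created, zone)
print("modified:", modified, zone)
```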
Kafka does not publicly state the date of topic creation/alteration.
Kafka does not need the timing data itself in order to work. The current topic config values are kept by the Zookeeper ensemble that the whole Kafka cluster requires to function, so they are kept in sync by the underlying Zookeeper process. For the part that Kafka itself has to synchronize, only the offsets within the topic are needed to partially order the messages as they arrive; the timestamp is not required information.
If you want to keep topic modifications actionable, maybe your best bet is to have a Kafka topic to save such modifications so that you can later read it.
I read the following from confluence wiki for kafka and I am quoting it below:
Why do I see error "Should not set log end offset on partition" in the
broker log?
Typically, you will see errors like the following.
kafka.common.KafkaException: Should not set log end offset on
partition [test,22]'s local replica 4 ERROR
[ReplicaFetcherThread-0-6], Error for partition [test,22] to broker
6:class
kafka.common.UnknownException(kafka.server.ReplicaFetcherThread)
A common problem is that more than one broker registered the same
host/port in Zookeeper. As a result, the replica fetcher is confused
when fetching data from the leader. To verify that, you can use a
Zookeeper client shell to list the registration info of each broker.
The Zookeeper path and the format of the broker registration is
described in Kafka data structures in Zookeeper. You want to make sure
that all the registered brokers have unique host/port.
According to the official documentation, PLAINTEXT://0.0.0.0:9092 binds the listener to all interfaces on port 9092, while PLAINTEXT://:9092 (empty hostname) binds it to the default interface.
If this is true, then I don't see how a 0.0.0.0:9092 broker registration could ever avoid confusing Zookeeper. I think that if I don't explicitly specify the hostname or IP address along with the port, Zookeeper will always get confused, since all brokers will register with the same interface and port number. I have confirmed this using zookeeper-shell.bat and running the get /brokers/ids/{id} command.
The following is from Zookeeper Client Shell enquiry on /brokers/ids
get /brokers/ids/1
{"listener_security_protocol_map":{"PLAINTEXT":"PLAINTEXT"},"endpoints":["PLAINTEXT://0.0.0.0:9092"],"jmx_port":-1,"host":"0.0.0.0","timestamp":"1500646657734","port":9092,"version":4}
cZxid = 0xe0000000f
ctime = Fri Jul 21 14:17:37 UTC 2017
mZxid = 0xe0000000f
mtime = Fri Jul 21 14:17:37 UTC 2017
pZxid = 0xe0000000f
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x15d6582c70b0001
dataLength = 184
numChildren = 0
get /brokers/ids/2
{"listener_security_protocol_map":{"PLAINTEXT":"PLAINTEXT"},"endpoints":["PLAINTEXT://0.0.0.0:9092"],"jmx_port":-1,"host":"0.0.0.0","timestamp":"1500646657006","port":9092,"version":4}
cZxid = 0xe0000000b
ctime = Fri Jul 21 14:17:37 UTC 2017
mZxid = 0xe0000000b
mtime = Fri Jul 21 14:17:37 UTC 2017
pZxid = 0xe0000000b
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x15d6582c70b0000
dataLength = 184
numChildren = 0
get /brokers/ids/3
{"listener_security_protocol_map":{"PLAINTEXT":"PLAINTEXT"},"endpoints":["PLAINTEXT://0.0.0.0:9092"],"jmx_port":-1,"host":"0.0.0.0","timestamp":"1500646656895","port":9092,"version":4}
cZxid = 0xe00000008
ctime = Fri Jul 21 14:17:36 UTC 2017
mZxid = 0xe00000008
mtime = Fri Jul 21 14:17:36 UTC 2017
pZxid = 0xe00000008
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x35d6582c7800000
dataLength = 184
numChildren = 0
Has anyone got a better idea?
In Kafka's server.properties, there are two relevant property keys:
listeners
The address the socket server listens on. It will get the value returned from
java.net.InetAddress.getCanonicalHostName() if not configured.
FORMAT:
listeners = listener_name://host_name:port
EXAMPLE:
listeners = PLAINTEXT://your.host.name:9092
advertised.listeners
Hostname and port the broker will advertise to producers and
consumers. If not set, it uses the value for "listeners" if
configured. Otherwise, it will use the value returned from
java.net.InetAddress.getCanonicalHostName().
Pay attention to the details of advertised.listeners. If you don't configure this property, it falls back to the value of listeners. When you set listeners to 0.0.0.0:9092, the broker listens on all network interfaces of your Kafka server. But if advertised.listeners also ends up as 0.0.0.0, then others (consumers, producers and Zookeeper) will not know how to connect to your Kafka server; all of them will fail to find it.
In short, advertised.listeners should be set to an address, such as your public IP, that other machines can use to connect to your server.
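To check what the brokers actually registered, here is a minimal Python sketch that flags duplicate and wildcard endpoints. It assumes you have already collected the JSON stored under /brokers/ids/<id> for each broker (for example by copying it from the Zookeeper shell); the function name is hypothetical, and the sample payloads are shortened versions of the ones in the question.

```python
import json
from collections import defaultdict


def find_endpoint_problems(registrations):
    """registrations maps broker id -> JSON string stored in /brokers/ids/<id>."""
    by_endpoint = defaultdict(list)
    for broker_id, raw in registrations.items():
        for endpoint in json.loads(raw)["endpoints"]:
            by_endpoint[endpoint].append(broker_id)

    problems = []
    for endpoint, ids in sorted(by_endpoint.items()):
        if len(ids) > 1:
            problems.append(
                "{} is registered by brokers {}".format(endpoint, sorted(ids)))
        if "://0.0.0.0:" in endpoint:
            # Per the answer above, clients cannot use 0.0.0.0 to reach a broker.
            problems.append("{} advertises the wildcard address".format(endpoint))
    return problems


# Shortened versions of the three registrations shown in the question.
payload = '{"endpoints":["PLAINTEXT://0.0.0.0:9092"],"host":"0.0.0.0","port":9092}'
problems = find_endpoint_problems({1: payload, 2: payload, 3: payload})
for problem in problems:
    print(problem)
```

An empty result would mean every broker registered a distinct, non-wildcard endpoint.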
How do you know when a topic was created in Kafka?
It seems that a few of the topics were created with a wrong number of partitions. Is there a way to know the date the topic was created? Supposedly, a topic with the name "test" was created with n number of partitions. How can I find the date and time when this "test" topic was created on Kafka?
You can see the Kafka topic creation time (ctime) and last modified time (mtime) in the Zookeeper stat output.
First, run the stat command against the topic's znode from the Zookeeper shell:
kafka % bin/zookeeper-shell.sh localhost:2181 stat /brokers/topics/test-events
It returns the following details:
Connecting to localhost:2181
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
cZxid = 0x1007ac74c
ctime = Thu Nov 01 10:38:39 UTC 2018
mZxid = 0x4000f6e26
mtime = Mon Jan 07 05:22:25 UTC 2019
pZxid = 0x1007ac74d
cversion = 1
dataVersion = 8
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 112
numChildren = 1
Refer to this page to understand the attributes: https://zookeeper.apache.org/doc/current/zookeeperProgrammers.html#sc_zkStatStructure
You can tell the topic creation time by checking the zookeeper node creation time for the topic. Given that "zookeeper001:2181/foo" is the Kafka zookeeper connection string, and "test_topic" is the topic name, you can check the stat of znode to get the topic creation time:
/foo/brokers/topics/test_topic
I don't think there is a way to check the number of partitions the topic had at creation time. You can always increase the topic's partition count by using:
kafka-topics.sh --alter ...
We are running Zookeeper 3.3 and Kafka 0.8 on separate servers. We are using high-level (HL) consumers, and they read the data in the Kafka queue as expected; on a restart they pick up from where they left off. So the consumers behave as expected.
The problem is that we can't see the offsets in Zookeeper when we use zkCli.sh. For now the consumer is set up to run on only one partition of a topic.
The command "ls /consumers/mygrpid/offsets/mytopic/0" returns [].
The same goes for "ls /consumers/mygrpid/owners/mytopic": it returns [].
Because the consumer picks up from the offset where it left off when it is stopped and restarted (we can tell this from the log, which gives the offset it starts with and each offset it commits), we know that Zookeeper must be saving the committed offsets for the consumer somewhere. My understanding is that Zookeeper, not the Kafka broker, keeps track of offsets for the HL consumer. Yet the "ls" commands that are supposed to show the offsets return an empty list instead.
Should I be looking at a different place for accessing the offsets? (ultimately, I need to have a script that reports on the offsets for all the consumers.)
Very much appreciate any help or suggestion.
You should use get instead of ls. ls gets child nodes and in your case /consumers/mygrpid/offsets/mytopic/0 does not have children. But it has a value, so running get /consumers/mygrpid/offsets/mytopic/0 should show you something like this:
47
cZxid = 0x568
ctime = Tue Feb 03 19:08:10 EET 2015
mZxid = 0x568
mtime = Tue Feb 03 19:08:10 EET 2015
pZxid = 0x568
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 2
numChildren = 0
where 47 is the offset value.
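For the reporting script mentioned in the question, here is a minimal Python sketch of the two pieces involved: building the znode path and reading the offset from the get output. The function names are hypothetical, and running the get command itself (via zkCli.sh or a Zookeeper client library) is left out.

```python
def consumer_offset_path(group, topic, partition):
    """Path of the znode whose value is the committed offset."""
    return "/consumers/{}/offsets/{}/{}".format(group, topic, partition)


def parse_offset(get_output):
    """Return the offset from `get` output: the first line that is a bare integer."""
    for line in get_output.splitlines():
        line = line.strip()
        if line.isdigit():
            return int(line)
    raise ValueError("no offset value found in output")


# Output copied (and shortened) from the example above.
sample_output = """47
cZxid = 0x568
ctime = Tue Feb 03 19:08:10 EET 2015
dataLength = 2
numChildren = 0
"""

path = consumer_offset_path("mygrpid", "mytopic", 0)
print(path, parse_offset(sample_output))
```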