ZooKeeper node counter? - apache-zookeeper

I have a ZooKeeper cluster with just 2 nodes; each zoo.cfg contains the following:
# Servers
server.1=10.138.0.8:2888:3888
server.2=10.138.0.9:2888:3888
The same two lines are present in both configs.
[root@zk1-prod supervisor.d]# echo mntr | nc 10.138.0.8 2181
zk_version 3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT
zk_avg_latency 0
zk_max_latency 0
zk_min_latency 0
zk_packets_received 5
zk_packets_sent 4
zk_num_alive_connections 1
zk_outstanding_requests 0
zk_server_state follower
zk_znode_count 4
zk_watch_count 0
zk_ephemerals_count 0
zk_approximate_data_size 27
zk_open_file_descriptor_count 28
zk_max_file_descriptor_count 4096
[root@zk1-prod supervisor.d]# echo mntr | nc 10.138.0.9 2181
zk_version 3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT
zk_avg_latency 0
zk_max_latency 0
zk_min_latency 0
zk_packets_received 3
zk_packets_sent 2
zk_num_alive_connections 1
zk_outstanding_requests 0
zk_server_state leader
zk_znode_count 4
zk_watch_count 0
zk_ephemerals_count 0
zk_approximate_data_size 27
zk_open_file_descriptor_count 29
zk_max_file_descriptor_count 4096
zk_followers 1
zk_synced_followers 1
zk_pending_syncs 0
So why is zk_znode_count == 4?

Znodes are not ZooKeeper servers.
From Hadoop: The Definitive Guide:
Zookeeper doesn’t have files and directories, but a unified concept of
a node, called a znode, which acts both as a container of data (like a
file) and a container of other znodes (like a directory).
zk_znode_count refers to the number of znodes stored on that ZooKeeper server. In your ZK ensemble, each server holds four znodes.
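The quote can be made concrete with a small sketch. Modeling znodes as a tree (the tree below is hypothetical, not the ZooKeeper API) shows how a count of 4 tallies every znode in the hierarchy, including the root and built-in housekeeping nodes, rather than the number of servers:

```python
# Minimal sketch: model znodes as a nested tree and count them,
# illustrating how zk_znode_count counts every node in the hierarchy
# (root, internal housekeeping znodes, and any znodes you create).
# The tree below is hypothetical; a fresh ensemble ships with a few
# built-in znodes such as / and the /zookeeper subtree.

def count_znodes(tree):
    """Each key is a znode; its value is a dict of child znodes."""
    return sum(1 + count_znodes(children) for children in tree.values())

# "/" containing "/zookeeper" (which contains "/zookeeper/quota"),
# plus one hypothetical application znode "/app": 4 znodes in total.
znodes = {"/": {"zookeeper": {"quota": {}}, "app": {}}}
print(count_znodes(znodes))  # -> 4
```

Every znode counts as one, whether it holds data, children, or both, so the total is independent of how many servers replicate the tree.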

Related

pyspark conf and yarn top memory discrepancies

An EMR cluster reads (from the main node, after running yarn top):
YARN top - 13:27:57, up 0d, 1:34, 1 active users, queue(s): root
NodeManager(s): 6 total, 6 active, 0 unhealthy, 2 decommissioned, 0 lost, 0 rebooted
Queue(s) Applications: 3 running, 8 submitted, 0 pending, 5 completed, 0 killed, 0 failed
Queue(s) Mem(GB): 18 available, 189 allocated, 1555 pending, 0 reserved
Queue(s) VCores: 44 available, 20 allocated, 132 pending, 0 reserved
Queue(s) Containers: 20 allocated, 132 pending, 0 reserved
APPLICATIONID USER TYPE QUEUE PRIOR #CONT #RCONT VCORES RVCORES MEM RMEM VCORESECS MEMSECS %PROGR TIME NAME
application_1663674823778_0002 hadoop spark default 0 10 0 10 0 99G 0G 18754 187254 10.00 00:00:33 PyS
application_1663674823778_0003 hadoop spark default 0 9 0 9 0 88G 0G 9446 84580 10.00 00:00:32 PyS
application_1663674823778_0008 hadoop spark default 0 1 0 1 0 0G 0G 382 334 10.00 00:00:06 PyS
Note that the PySpark apps application_1663674823778_0002 and application_1663674823778_0003 were launched from the main node command line by simply running pyspark (with no explicit config changes).
However, application_1663674823778_0008 was launched with: pyspark --conf spark.executor.memory=11g --conf spark.driver.memory=12g. Despite this (test) PySpark config customization, yarn top shows nothing but 0 for that app's memory values (regular or reserved).
Why is this?

Kafka: which files should be created under kafka-logs?

Usually, after a Kafka cluster scratch installation, I see these files under /data/kafka-logs (the Kafka broker log directory, where all topics should be located):
ls -ltr
-rw-r--r-- 1 kafka hadoop 0 Jan 9 10:07 cleaner-offset-checkpoint
-rw-r--r-- 1 kafka hadoop 57 Jan 9 10:07 meta.properties
drwxr-xr-x 2 kafka hadoop 4096 Jan 9 10:51 _schemas-0
-rw-r--r-- 1 kafka hadoop 17 Jan 10 07:39 recovery-point-offset-checkpoint
-rw-r--r-- 1 kafka hadoop 17 Jan 10 07:39 replication-offset-checkpoint
But on some other Kafka scratch installations, we saw that the folder /data/kafka-logs is empty.
Does this indicate a problem?
Note: we have not created the topics yet.
I'm not sure exactly when each checkpoint file is created (they track the log cleaner and replication offsets), but I assume meta.properties is created at broker startup.
Beyond those files, you would see one folder per topic-partition; for example, it looks like you had one topic created, _schemas.
If you only see one partition folder across multiple brokers, then the replication factor for that topic is set to 1.
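For reference, meta.properties is a plain Java properties file (key=value lines plus # comments) written by the broker at startup. A minimal parsing sketch, with hypothetical sample content:

```python
# Minimal sketch: parse a Kafka meta.properties file, a plain Java
# properties file the broker writes at startup. The sample content
# below is hypothetical; real files carry the broker.id assigned to
# that broker plus a version field.
SAMPLE = """\
#Tue Jan 09 10:07:00 UTC 2018
version=0
broker.id=1
"""

def parse_properties(text):
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

print(parse_properties(SAMPLE))  # {'version': '0', 'broker.id': '1'}
```

Since the broker writes this file on first startup, its presence (even in an otherwise empty log dir) is a sign the broker has run, not that topics exist.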

Upgrading consumer from Kafka 8 to 10 with no code changes fails in ZookeeperConsumerConnector.RebalanceListener

I changed my Maven pom.xml to use the 0.10.1.0 client jar, and without changing any of the client code I ran both a producer and consumer.
The producer added messages to the Kafka 10 cluster fine (verified by kafka-consumer-offset-checker.sh), but the consumers that should have covered the 10 partitions in the topic did not seem to register at all. All partitions are unowned.
The consumer offset and owner output:
kafka-consumer-offset-checker.sh --zookeeper localhost:2181 --topic eddude-default-topic --group optimizer-group
[2017-06-28 12:56:06,493] WARN WARNING: ConsumerOffsetChecker is deprecated and will be dropped in releases following 0.9.0. Use ConsumerGroupCommand instead. (kafka.tools.ConsumerOffsetChecker$)
Group Topic Pid Offset logSize Lag Owner
optimizer-group eddude-default-topic 0 28 28 0 none
optimizer-group eddude-default-topic 1 2 2 0 none
optimizer-group eddude-default-topic 2 87 87 0 none
optimizer-group eddude-default-topic 3 0 0 0 none
optimizer-group eddude-default-topic 4 0 0 0 none
optimizer-group eddude-default-topic 5 2 5 3 none
optimizer-group eddude-default-topic 6 80 80 0 none
optimizer-group eddude-default-topic 7 29 29 0 none
optimizer-group eddude-default-topic 8 15 15 0 none
optimizer-group eddude-default-topic 9 0 0 0 none
And here is the relevant consumer client error from my app log:
2017-06-28 12:55:24,702 ERROR [ConnectorManagerEventPool 1] An error occurred starting KafkaTopicSet 4:eddude-default-topic
kafka.common.ConsumerRebalanceFailedException: optimizer-group_L-SEA-10002721-1498679709599-7154a218 can't rebalance after 4 retries
at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:670) ~[kafka_2.10-0.10.1.0.jar:na]
at kafka.consumer.ZookeeperConsumerConnector.kafka$consumer$ZookeeperConsumerConnector$$reinitializeConsumer(ZookeeperConsumerConnector.scala:977) ~[kafka_2.10-0.10.1.0.jar:na]
at kafka.consumer.ZookeeperConsumerConnector.consume(ZookeeperConsumerConnector.scala:264) ~[kafka_2.10-0.10.1.0.jar:na]
at kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:85) ~[kafka_2.10-0.10.1.0.jar:na]
at kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:97) ~[kafka_2.10-0.10.1.0.jar:na]
at com.ebay.traffic.messaging.optimizer.impl.kafka.KafkaTopicSet.start(KafkaTopicSet.java:160) ~[classes/:na]
I am just using the same Kafka 8 client code I already had, ignoring the deprecation warnings for now. Shouldn't it work as-is?
I could also post details like the configuration properties and code establishing the actual producer and consumer, but I thought I'd first simply ask in case it is an obvious answer.

sar command outputs strange socket information for totsck and tcpsck

My nginx web server sits on a Linux virtual machine running CentOS 6.4.
When using the sar command to check socket info, it outputs a suspicious socket count like this:
05:00:01 PM totsck tcpsck udpsck rawsck ip-frag tcp-tw
05:10:01 PM 16436 16944 9 0 0 4625
05:20:01 PM 16457 16844 9 0 0 2881
05:30:01 PM 16501 16835 9 0 0 2917
05:40:01 PM 16486 16842 9 0 0 3083
05:50:02 PM 16436 16885 9 0 0 2962
Pay attention to totsck and tcpsck: the latter is greater than the former, but it is supposed to be less. Why?
I suppose that's due to tcp-tw, which corresponds to the number of TCP sockets in the TIME_WAIT state; those appear to be counted in tcpsck but not in totsck.
http://linux.die.net/man/1/sar
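sar takes these counters from /proc/net/sockstat. As a rough illustration (the sample values below are hypothetical, chosen to loosely mirror the question's numbers), a small parser shows that the TIME_WAIT count lives on the TCP line, separate from the overall "sockets: used" total, which is consistent with TIME_WAIT sockets inflating the TCP count but not totsck:

```python
# Minimal sketch: parse /proc/net/sockstat-style text, the file sar
# reads these socket counters from. The sample values are hypothetical,
# mirroring the numbers in the question: the TIME_WAIT count ("tw")
# is reported on the TCP line, separately from "sockets: used".
SAMPLE = """\
sockets: used 16436
TCP: inuse 12319 orphan 0 tw 4625 alloc 12400 mem 150
UDP: inuse 9
"""

def parse_sockstat(text):
    stats = {}
    for line in text.splitlines():
        proto, _, rest = line.partition(":")
        # rest alternates field name / value, e.g. "used 16436"
        fields = rest.split()
        stats[proto] = dict(zip(fields[::2], map(int, fields[1::2])))
    return stats

s = parse_sockstat(SAMPLE)
print(s["sockets"]["used"])  # totsck -> 16436
print(s["TCP"]["tw"])        # TIME_WAIT sockets -> 4625
```

Whether TIME_WAIT sockets are folded into the TCP in-use figure varies by kernel version, so treat this as a sketch of where the numbers come from, not a definitive accounting.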

Explain replication-offset-checkpoint AND recovery-point-offset in Kafka

Can someone explain what these files mean? They are present inside the Kafka broker log directory.
root@a2md23297l:/tmp/kafka-logs-1# cat recovery-point-offset-checkpoint
0
5
my-topic 0 0
kafkatopic_R2P1_1 0 0
my-topic 1 0
kafkatopic_R2P1 0 0
test 0 0
root@a2md23297l:/tmp/kafka-logs-1# cat replication-offset-checkpoint
0
5
my-topic 0 0
kafkatopic_R2P1_1 0 2
my-topic 1 0
kafkatopic_R2P1 0 2
test 0 57
FYI: my-topic, kafkatopic_R2P1_1, kafkatopic_R2P1, and test are the topics created.
Thanks in advance.
AFAIK, recovery-point-offset-checkpoint is the internal broker file where Kafka tracks which messages (from-to offset) were successfully checkpointed to disk.
replication-offset-checkpoint is the internal broker file where Kafka tracks which messages (from-to offset) were successfully replicated to other brokers.
For more details you can take a deeper look at: kafka/core/src/main/scala/kafka/server/LogOffsetMetadata.scala and ReplicaManager.scala. The code is commented pretty well.
Marko is spot on.
Of the first two numbers, 0 is likely the checkpoint file-format version, and 5 is the number of topic-partition entries present on that particular disk.
The number next to each topic name is the partition number of the topic.
The last number is the offset that was flushed to disk (recovery-point-offset-checkpoint) or, in replication-offset-checkpoint, the last offset that the replicas successfully replicated.
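Both checkpoint files above share one layout: a version line, an entry-count line, then one "topic partition offset" triple per line. A minimal parser sketch (assuming that layout, using the replication-offset-checkpoint sample from the question):

```python
# Minimal sketch: parse a Kafka offset-checkpoint file.
# Assumed layout: line 1 = format version, line 2 = number of entries,
# then one "topic partition offset" triple per line.
def parse_checkpoint(text):
    lines = text.strip().splitlines()
    version = int(lines[0])
    count = int(lines[1])
    entries = {}
    for line in lines[2 : 2 + count]:
        # rsplit from the right so topic names containing spaces-free
        # separators stay intact; last two fields are partition, offset
        topic, partition, offset = line.rsplit(None, 2)
        entries[(topic, int(partition))] = int(offset)
    return version, entries

# Sample from the question's replication-offset-checkpoint
sample = """\
0
5
my-topic 0 0
kafkatopic_R2P1_1 0 2
my-topic 1 0
kafkatopic_R2P1 0 2
test 0 57
"""
version, offsets = parse_checkpoint(sample)
print(version)               # -> 0
print(offsets[("test", 0)])  # -> 57
```

Keying entries by (topic, partition) makes it easy to diff the two files and see which partitions have been flushed but not yet fully replicated, or vice versa.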