Different Kafka brokers in the same cluster produce different JMX metrics - apache-kafka

I want to use JMX metrics to collect stats on topic size from my 6 broker Kafka cluster.
I have created a test topic and sent 1000 test messages to it.
Now I am using JMX to look at the topic.
Broker 6 shows all the partitions and the 1000 messages, but broker 4 only shows partition 2!
$ kubectl exec -it kafka-broker4-0 -c kafka-broker -- bash -c "/kafka_2.13-2.8.1/bin/kafka-run-class.sh kafka.tools.JmxTool --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9581/jmxrmi --one-time true --report-format tsv" | grep greg-test | grep Offset
kafka.cluster:type=Partition,name=LastStableOffsetLag,topic=greg-test,partition=2:Value 0
kafka.log:type=Log,name=LogEndOffset,topic=greg-test,partition=2:Value 341
kafka.log:type=Log,name=LogStartOffset,topic=greg-test,partition=2:Value 0
$ kubectl exec -it kafka-broker6-0 -c kafka-broker -- bash -c "/kafka_2.13-2.8.1/bin/kafka-run-class.sh kafka.tools.JmxTool --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9581/jmxrmi --one-time true --report-format tsv" | grep greg-test | grep Offset
kafka.cluster:type=Partition,name=LastStableOffsetLag,topic=greg-test,partition=0:Value 0
kafka.cluster:type=Partition,name=LastStableOffsetLag,topic=greg-test,partition=1:Value 0
kafka.cluster:type=Partition,name=LastStableOffsetLag,topic=greg-test,partition=2:Value 0
kafka.log:type=Log,name=LogEndOffset,topic=greg-test,partition=0:Value 348
kafka.log:type=Log,name=LogEndOffset,topic=greg-test,partition=1:Value 311
kafka.log:type=Log,name=LogEndOffset,topic=greg-test,partition=2:Value 341
kafka.log:type=Log,name=LogStartOffset,topic=greg-test,partition=0:Value 0
kafka.log:type=Log,name=LogStartOffset,topic=greg-test,partition=1:Value 0
kafka.log:type=Log,name=LogStartOffset,topic=greg-test,partition=2:Value 0
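For reference, the partition assignment of the topic can be cross-checked against the JMX output with kafka-topics.sh --describe (a sketch; the bootstrap address inside the pod is an assumption):
# assumes the broker listens on localhost:9092 inside the pod
$ kubectl exec -it kafka-broker4-0 -c kafka-broker -- /kafka_2.13-2.8.1/bin/kafka-topics.sh --describe --topic greg-test --bootstrap-server localhost:9092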

Related

ZooKeeper server cluster: Mode: follower and Mode: leader change every couple of minutes

We have a ZooKeeper cluster with 3 nodes.
When we run the following commands:
echo stat | nc zookeeper_server01 2181 | grep Mode
echo stat | nc zookeeper_server02 2181 | grep Mode
echo stat | nc zookeeper_server03 2181 | grep Mode
we saw that zookeeper_server03 was the leader and the others were in Mode: follower.
But we noticed that the roles change every couple of minutes: after 4 minutes zookeeper_server01 became the leader and the others became followers,
and again after 6 minutes zookeeper_server02 became the leader, and so on.
My question is: is this strange behavior normal?
Our production Kafka cluster uses these ZooKeeper servers, so we are worried about this.
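To pin down how often the role changes, a timestamped loop over the same commands can be left running (hostnames as in the commands above; the interval is arbitrary):
while true; do
  for host in zookeeper_server01 zookeeper_server02 zookeeper_server03; do
    # print a timestamp, the host, and the Mode it currently reports
    printf '%s %s ' "$(date '+%H:%M:%S')" "$host"
    echo stat | nc "$host" 2181 | grep Mode
  done
  sleep 30   # 30-second interval chosen arbitrarily
done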

The TDengine image (2.4.0.3) cannot be shut down normally when run with Kubernetes

I created a three-node cluster through a k8s deployment using the TDengine 2.4.0.3 image. Viewing the pod information:
kubectl get pods -n mytdengine
NAME READY STATUS RESTART AGE
tdengine-01 1/1 Running 0 1m45s
tdengine-02 1/1 Running 0 1m45s
tdengine-03 1/1 Running 0 1m45s
Everything was going well.
However, when I tried to stop a pod with a delete operation:
kubectl delete pod tdengine-03 -n mytdengine
The target pod was not deleted as expected. Its status turned to:
NAME READY STATUS RESTART AGE
tdengine-01 1/1 Running 0 2m35s
tdengine-02 1/1 Running 0 2m35s
tdengine-03 1/1 Terminating 0 2m35s
After several tests, the pod was only deleted after about 3 minutes, which is abnormal. I wasn't actually using the TDengine instance, so there was no heavy load or storage usage, and I could not find a reason why it should take 3 minutes to shut down.
After testing, I ruled out a problem with the Kubernetes configuration. I also found this parameter configured in the Pod's YAML file:
terminationGracePeriodSeconds: 180
This means that the pod was not shut down gracefully, but was forcibly removed after the timeout.
Generally speaking, stopping a pod sends a SIGTERM signal; the container handles the signal and shuts down cleanly. However, if it does not stop, or the container does not respond to the signal within the timeout set by terminationGracePeriodSeconds, the container receives SIGKILL and is killed forcibly. Ref: https://tasdikrahman.me/2019/04/24/handling-singals-for-applications-in-kubernetes-docker/
The reason is that in the TDengine 2.4.0.3 image, the startup script starts taosadapter first and then taosd, but it does not override the handling of SIGTERM. Because of the special role of PID 1 on Linux, only PID 1 (the startup script, as shown in the process listing below) receives the SIGTERM that k8s sends to the pod's containers, and it does not pass the signal on to taosadapter and taosd, which keep running until the grace period expires.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
9 root 20 0 2873404 80144 2676 S 2.3 0.5 112:30.81 taosadapter
8 root 20 0 2439240 41364 2996 S 1.0 0.3 130:53.67 taosd
1 root 20 0 20044 1648 1368 S 0.0 0.0 0:00.01 run_taosd.sh
7 root 20 0 20044 476 200 S 0.0 0.0 0:00.00 run_taosd.sh
135 root 20 0 20176 2052 1632 S 0.0 0.0 0:00.00 bash
146 root 20 0 38244 1788 1356 R 0.0 0.0 0:00.00 top
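Having the entrypoint forward the signal itself would avoid this. A minimal sketch of such a wrapper (an assumption about how one could write it, not the image's actual run_taosd.sh):
#!/bin/bash
# Sketch only: assumes taosadapter and taosd are on PATH, as in the listing above.
# Start both daemons in the background and remember their PIDs.
taosadapter &
adapter_pid=$!
taosd &
taosd_pid=$!

# On SIGTERM/SIGINT, pass the signal on to the children and wait for them to exit.
term_handler() {
  kill -TERM "$taosd_pid" "$adapter_pid" 2>/dev/null
  wait "$taosd_pid" "$adapter_pid"
  exit 0
}
trap term_handler TERM INT

# Keep PID 1 alive until the children exit (or a signal arrives).
wait "$taosd_pid" "$adapter_pid"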
I personally chose to add a preStop hook in the k8s YAML file so that taosd is stopped promptly when the pod is deleted:
lifecycle:
  preStop:
    exec:
      command:
        - /bin/bash
        - -c
        - >
          procnum=$(ps aux | grep taosd | grep -v -e grep -e entrypoint -e run_taosd | awk '{print $2}');
          kill -15 $procnum;
          if [ "$?" -eq 0 ]; then echo "kill taosd"; fi
Of course, once we know the cause of the problem, there are other solutions which are not discussed here.

Getting an error while setting up a Kafka producer and consumer using kafkacat

I used these commands to set up a Kafka server and then build a basic producer-consumer model for sending "hello" from the producer to the consumer:
brew services start zookeeper
brew services start kafka
kcat -P -b localhost:9092 -t topic1
kcat -C -b localhost:9092 -t topic1 -o beginning
I get this error when running the consumer in a separate terminal:
ERROR: Topic topic1 error: Broker: Unknown topic or partition
You have not actually produced any messages, so the topic was not created. kcat -P reads input from the console and sends it to Kafka, and in this case you are not sending anything.
For instance, these commands:
echo "My Message" | kcat -P -b localhost:9092 -c 1 -t some-topic
kcat -C -b localhost:9092 -t some-topic
will generate this output:
My Message
% Reached end of topic some-topic [0] at offset 1
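The same works with input read from a file instead of a pipe; -l tells kcat to send the file line by line, and -e makes the consumer exit once it reaches the end of the topic (the file name here is just an example):
# messages.txt is an example file created for the demonstration
printf 'first\nsecond\nthird\n' > messages.txt
kcat -P -b localhost:9092 -t some-topic -l messages.txt
kcat -C -b localhost:9092 -t some-topic -o beginning -e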

How to get more information about the Zookeeper status in Confluent

I am setting up a ZooKeeper cluster, and I start ZooKeeper separately in Confluent using:
./bin/zookeeper-server-start etc/kafka/zookeeper.properties
and I want to get the status of ZooKeeper.
Everything I found online uses:
./zkServer.sh status
but I can't find zkServer.sh in Confluent.
I know that I can use ./bin/confluent status to get the status, but I want more information about ZooKeeper, like the following:
./zkServer.sh status
JMX enabled by default
Using config: /opt/../conf/zoo.cfg
Mode: follower
How can I do that?
You can use the Four Letter Words to get the same information, or better. The output from stat:
$ echo "stat" | nc <ZOOKEEPER-IP-ADDRESS> 2181
Zookeeper version: 3.4.10
Clients:
/192.168.1.2:49618[1](queued=0,recved=1304,sent=1304)
/192.168.1.3:53484[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0/15
Received: 1330
Sent: 1329
Connections: 2
Outstanding: 0
Zxid: 0x1000001ee
Mode: leader
Node count: 435
The output from conf:
$ echo "conf" | nc <ZOOKEEPER-IP-ADDRESS> 2181
clientPort=2181
dataDir=/var/zookeeper/data
dataLogDir=/var/log/zookeeper
tickTime=2000
maxClientCnxns=0
minSessionTimeout=4000
maxSessionTimeout=40000
serverId=3
initLimit=20
syncLimit=5
electionAlg=3
electionPort=3888
quorumPort=2888
peerType=0
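If all you need is the rough equivalent of zkServer.sh status for a single node, the srvr word prints the version, latency, zxid and Mode without the per-client detail of stat:
$ echo "srvr" | nc <ZOOKEEPER-IP-ADDRESS> 2181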

Why do I have 4 celery processes instead of the 2 I expected?

I have configured celery to run 2 workers, each with a concurrency of 1. My /etc/default/celeryd file contains (amongst other settings):
CELERYD_NODES="worker1 worker2"
CELERYD_OPTS="-Q:worker1 central -c:worker1 1 -Q:worker2 RetailSpider -c:worker2 1"
In other words, I expect 2 workers and, since concurrency is 1, one process per worker; one worker consumes from the queue 'central' and the other from a queue called 'RetailSpider'.
Also sudo service celeryd status shows:
celery init v10.1.
Using config script: /etc/default/celeryd
celeryd (node worker1) (pid 46610) is up...
celeryd (node worker2) (pid 46621) is up...
However, what is puzzling me is the output of ps aux | grep 'celery worker', which is:
scraper 34384 0.0 1.0 348780 77780 ? S 13:07 0:00 /opt/scraper/evo-scrape/venv/bin/python -m celery worker --app=evofrontend --loglevel=INFO -Q central -c 1 --logfile=/opt/scraper/evo-scrape/evofrontend/logs/celery/worker1.log --pidfile=/opt/scraper/evo-scrape/evofrontend/run/celery/worker1.pid --hostname=worker1#scraping0-evo
scraper 34388 0.0 1.0 348828 77884 ? S 13:07 0:00 /opt/scraper/evo-scrape/venv/bin/python -m celery worker --app=evofrontend --loglevel=INFO -Q RetailSpider -c 1 --logfile=/opt/scraper/evo-scrape/evofrontend/logs/celery/worker2.log --pidfile=/opt/scraper/evo-scrape/evofrontend/run/celery/worker2.pid --hostname=worker2#scraping0-evo
scraper 46610 0.1 1.2 348780 87552 ? Sl Apr26 1:55 /opt/scraper/evo-scrape/venv/bin/python -m celery worker --app=evofrontend --loglevel=INFO -Q central -c 1 --logfile=/opt/scraper/evo-scrape/evofrontend/logs/celery/worker1.log --pidfile=/opt/scraper/evo-scrape/evofrontend/run/celery/worker1.pid --hostname=worker1#scraping0-evo
scraper 46621 0.1 1.2 348828 87920 ? Sl Apr26 1:53 /opt/scraper/evo-scrape/venv/bin/python -m celery worker --app=evofrontend --loglevel=INFO -Q RetailSpider -c 1 --logfile=/opt/scraper/evo-scrape/evofrontend/logs/celery/worker2.log --pidfile=/opt/scraper/evo-scrape/evofrontend/run/celery/worker2.pid --hostname=worker2#scraping0-evo
What are the additional 2 processes, the ones with PIDs 34384 and 34388?
(This is a Django project)
EDIT:
I wonder if this is somehow related to the fact that Celery by default launches as many concurrent worker processes as there are CPUs/cores available. This machine has 2 cores, hence 2 per worker. However, I would have expected the -c:worker1 1 and -c:worker2 1 options to override that.
I added --concurrency=1 to CELERYD_OPTS and also CELERYD_CONCURRENCY = 1 to settings.py. I then killed all processes and restarted celeryd, yet I still saw 4 processes (2 per worker).
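For what it's worth, a process-tree view makes the parent/child relationship between those PIDs easy to check (a diagnostic sketch; the PID passed to pstree is one reported by the init script above):
# show celery worker processes with their parent PIDs, drawn as a tree
ps axfo pid,ppid,cmd | grep '[c]elery worker'
# or inspect one of the PIDs reported by the init script directly
pstree -p 46610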