file_descriptor_count, avg_latency and outstanding_request metrics are missing from a zookeeper instance after we enabled the prometheus exporter - apache-zookeeper

After I enabled the Prometheus metrics exporter in ZooKeeper 3.7.1, one instance of ZooKeeper, running in a cluster of 3, has stopped sending metrics for file_descriptor_count, avg_latency and outstanding_request. When I deploy it as a single instance, it seems to work fine and sends all the metric data.
This is the content of the zoo.cfg file (attached as a screenshot titled "zoo.cfg"), along with screenshots of the missing metrics ("metrics missing from one instance of zookeeper", "Missing Metrics").
I am currently stuck at this; it would be great if someone could help me. Thanks in advance!
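Since the actual zoo.cfg was only attached as an image, here is a minimal sketch of how the Prometheus metrics provider is typically enabled in zoo.cfg on ZooKeeper 3.6+ (the paths, ports and hostnames below are assumptions, not the poster's actual values):

tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=10
syncLimit=5
server.1=zk-1:2888:3888
server.2=zk-2:2888:3888
server.3=zk-3:2888:3888
# built-in Prometheus exporter; serves /metrics on httpPort
metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
metricsProvider.httpPort=7000
metricsProvider.exportJvmInfo=true

If the provider is enabled identically on all three nodes, each node should expose avg_latency, outstanding_requests and the file descriptor gauges on its own /metrics endpoint, so a node that lacks them is worth comparing against the others' zoo.cfg and startup logs.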

Related

kafka on kubernetes does not receive messages from external producer

Hello (I am using Google Translate).
I have the following problem: I have a Kafka service in Kubernetes, everything is managed by Rancher, and the deployment of Kafka is done through the catalogs that Rancher provides (I attach an image of the service).
Everything works correctly within Kubernetes, but now I need a producer external to Kubernetes to connect to Kafka and send messages so that they are received internally in Kubernetes.
I have not been able to accomplish this task and I have already tried another kafka deployment following this guide:
https://www.weave.works/blog/kafka-on-kubernetes-and-deploying-best-practice
But neither in the Rancher catalog version nor in the version installed through YAML files can I understand where and what I should configure so that a producer outside of Kubernetes can connect. I also tried setting the service as NodePort, but this didn't work. Any help is welcome, thank you.
NodePort is one option. A LoadBalancer or Ingress is another.
Rather than use Rancher catalogs, I'd recommend that you read through the Strimzi operator documentation, which covers all the options for external client communication.
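For a rough sense of what that looks like with Strimzi (the cluster name and ports are assumptions, and the resource is abbreviated; the Strimzi docs define the full schema), an external listener of type nodeport is declared on the Kafka custom resource:

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 3
    listeners:
      # internal listener for clients running inside the cluster
      - name: plain
        port: 9092
        type: internal
        tls: false
      # external listener exposed through a NodePort service per broker
      - name: external
        port: 9094
        type: nodeport
        tls: false

An external producer then bootstraps against any node IP plus the assigned node port; Strimzi sets the advertised listeners so each broker stays individually reachable from outside the cluster.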

Can you run a Zookeeper cluster without using statefulsets in openshift?

I have a single instance of ZooKeeper running without issues; however, when I add two more nodes it crashes during leader election with "we got a connection request from the server with own id".
Appreciate any help here.
In short, you should use a StatefulSet.
If you would like the community to help you, please provide the logs and the errors from the crashes.
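For context, the "connection request from the server with own id" warning often means two nodes ended up with the same myid (for example, when the same data directory or config was cloned to every node). With a StatefulSet and a headless service, the usual layout looks roughly like this (hostnames and namespace are assumptions):

# zoo.cfg, identical on all three nodes
server.1=zk-0.zk-headless.default.svc.cluster.local:2888:3888
server.2=zk-1.zk-headless.default.svc.cluster.local:2888:3888
server.3=zk-2.zk-headless.default.svc.cluster.local:2888:3888

# each node's dataDir/myid contains only its own id:
#   zk-0 -> 1, zk-1 -> 2, zk-2 -> 3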

Failed to send the transaction successfully to the order status: SERVICE UNAVAILABLE

I am using the Kafka orderer service for Hyperledger Fabric 1.4. While updating chaincode or making any putState call, I get an error message stating "Failed to send the transaction successfully to the order status: SERVICE UNAVAILABLE". While checking the ZooKeeper and Kafka nodes, it seems the Kafka nodes are not able to talk to each other.
kafka & zookeeper logs (attached)
Could you provide more info about the topology of the zookeeper-kafka cluster?
Did you use docker to deploy the zk cluster? If so, you can refer to this file.
https://github.com/whchengaa/hyperledger-fabric-technical-tutorial/blob/cross-machine-hosts-file/balance-transfer/artifacts/docker-compose-kafka.yaml
Remember to specify the IP address of the other zookeeper nodes in the hosts file which is mounted to /etc/hosts of that zookeeper node.
Make sure the port numbers of the zookeeper nodes listed in the ZOO_SERVERS environment variable are correct.
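As a rough sketch of the docker-compose side of that (hostnames and IPs are made up for illustration; the linked file is the authoritative reference):

zookeeper0:
  image: hyperledger/fabric-zookeeper
  environment:
    - ZOO_MY_ID=1
    # id=host:quorum-port:leader-election-port for every node in the ensemble
    - ZOO_SERVERS=server.1=zookeeper0:2888:3888 server.2=zookeeper1:2888:3888 server.3=zookeeper2:2888:3888
  extra_hosts:
    # entries added to /etc/hosts so the other zookeeper hostnames resolve
    - "zookeeper1:192.168.1.12"
    - "zookeeper2:192.168.1.13"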

Flink HA JobManager cluster cannot elect a leader

I'm trying to deploy Apache Flink 1.6 on Kubernetes, following the tutorial on the job manager high availability page. I already have a working Zookeeper 3.10 cluster; from its logs I can see that it's healthy and not configured for Kerberos or SASL. All ACL rules let every client read and write znodes. When I start the cluster everything works as expected: every JobManager and TaskManager pod successfully gets into the Running state, and I can see the connected TaskManager instances from the master JobManager's web UI. But when I delete the master JobManager's pod, the other JobManager pods cannot elect a leader, and the following error message appears on every JobManager UI in the cluster:
{
  "errors": [
    "Service temporarily unavailable due to an ongoing leader election. Please refresh."
  ]
}
Even if I reload this page nothing changes; it stays stuck on this error message.
My suspicion is that the problem is related to the high-availability.storageDir option. I already have a working (tested with CloudExplorer) MinIO S3 deployment in my k8s cluster, but Flink cannot write anything to the S3 server. Here you can find every config in the GitHub gist.
According to the logs it looks as if the TaskManager cannot connect to the new leader. I assume that this is the same for the web ui. The logs say that it tries to connect to flink-job-manager-0.flink-job-svc.flink.svc.cluster.local/10.244.3.166:44013. I cannot say from the logs whether flink-job-manager-1 binds to this IP. But my suspicion is that the headless service might return multiple IPs and Flink picks the wrong/old one. Could you log into the flink-job-manager-1 pod and check what its IP address is?
I think you should be able to resolve this problem by defining a dedicated service for each JobManager, or by using the pod hostname instead.
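A minimal sketch of the dedicated-service-per-JobManager idea (names, namespace and ports are assumptions; StatefulSet pods automatically carry the statefulset.kubernetes.io/pod-name label, which lets a Service select a single pod):

apiVersion: v1
kind: Service
metadata:
  name: flink-job-manager-0
  namespace: flink
spec:
  selector:
    # matches exactly one pod of the JobManager StatefulSet
    statefulset.kubernetes.io/pod-name: flink-job-manager-0
  ports:
    - name: rpc
      port: 6123
    - name: ui
      port: 8081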

Prometheus zookeeper monitoring

I want to move ZooKeeper to being monitored by Prometheus.
I deployed the jmx-exporter (sscaling/jmx-prometheus-exporter:0.1.0)
and got most of the metrics, but some are missing, for example zookeeper.approximate_data_size and the ParNew metrics of the GarbageCollector.
For example, I get this ParNew metric from Logstash with the same jmx exporter:
java_lang_GarbageCollector_LastGcInfo_memoryUsageAfterGc_used{name="ParNew",key="Par Survivor Space",}
but from ZooKeeper I only get Copy collector metrics:
java_lang_GarbageCollector_LastGcInfo_memoryUsageAfterGc_used{name="Copy",key="Metaspace",} 1.4809288E7
The most likely reason you are getting different GC metrics is that you are running the two JVMs with different memory settings/garbage collectors, and thus the metrics are different.
If Zookeeper is exposing a number via JMX, the JMX exporter should be returning it.
To collect ZooKeeper metrics with Prometheus:
Create a conf/java.env file (my ZooKeeper configuration directory is /etc/zookeeper/conf, so the file is /etc/zookeeper/conf/java.env) with the contents below:
export SERVER_JVMFLAGS="-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=10701 -javaagent:/opt/jmx_prometheus/jmx_prometheus_javaagent-0.3.0.jar=10801:/opt/jmx_prometheus/config.yml $SERVER_JVMFLAGS"
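The javaagent argument above references /opt/jmx_prometheus/config.yml; a minimal catch-all config for the agent (its contents are an assumption here, not the poster's actual file) could be:

# /opt/jmx_prometheus/config.yml - export every MBean attribute unchanged
lowercaseOutputName: true
lowercaseOutputLabelNames: true
rules:
  - pattern: ".*"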
Restart the ZooKeeper service; the ZooKeeper JMX metrics can then be scraped from port 10801 (the port configured for the javaagent above).
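On the Prometheus side, a matching scrape job would then look something like this (hostnames are placeholders):

scrape_configs:
  - job_name: zookeeper
    static_configs:
      - targets:
          # the jmx_prometheus_javaagent HTTP port configured above
          - zk-1.example.com:10801
          - zk-2.example.com:10801
          - zk-3.example.com:10801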