ZooKeeper always fails when restarting - apache-zookeeper

I have a 3-node master setup for Marathon and Mesos masters; everything is fine until I reboot one of the nodes or try to restart ZooKeeper.
The issue seems to be related to the internal DB ZooKeeper uses: when I delete /var/lib/zookeeper/version2/*, ZooKeeper comes up fine and re-syncs with the other nodes.
I am using the current Marathon and Mesos packages with ZooKeeper from the Mesosphere repo on RHEL 7.
Does anybody know how to fix this? The underlying filesystem makes no difference; I tried both XFS and ext4.
Another question is how to back up the Marathon apps somehow - if I reboot, say, all nodes in the quorum and delete the ZooKeeper DB, everything is lost.
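For reference, the workaround described above written out as commands - a rough sketch only; the service name and exact data path are assumptions and may differ per installation:

    # Sketch of the workaround: wipe ZooKeeper's local snapshot/txn-log directory
    # and let the node re-sync from the rest of the ensemble on startup.
    # Service name and data path are assumptions -- adjust for your install.
    sudo systemctl stop zookeeper
    sudo rm -rf /var/lib/zookeeper/version2/*   # note: the myid file lives one level up and must stay
    sudo systemctl start zookeeper

    # Verify the node rejoined the ensemble:
    echo srvr | nc localhost 2181 | grep Mode   # should report "leader" or "follower"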

Related

Start/Stop local dev kubernetes cluster created by kubeadm (like microk8s or minikube)

A 3-node Kubernetes cluster created with kubeadm v1.19.9. The nodes are VMs (using the KVM hypervisor on Ubuntu 20.04).
This cluster is used for development and Kubernetes exercises. I'd like to stop the cluster and later restart it where it left off, in the same fashion as the stop and start commands available with minikube or microk8s.
EDIT: clarifying the question to avoid suggested duplicate posts. I am looking for an elegant solution to stop and restart the same cluster, NOT to destroy / reset / uninstall it.
I couldn't find a simple solution in various web searches. There are solutions that suggest tearing down the cluster, which is not my use case here. An answer from 3 years ago, proper shutdown of a kubernetes cluster, is closer to what I want but sounds quite complicated. Another solution, How to Setup & Recover a Self-hosted Kubeadm Kubernetes Cluster After Reboot, doesn't explain the underlying principle well enough.
I hope there is a simpler solution now.
EDIT (2021-04-11): Kubernetes 1.21 release notes:
The Kubelet Graceful Node Shutdown feature graduates to Beta and is enabled by default.
kubernetes/enhancements Graceful node shutdown #2000
Enhancement target (which target equals to which milestone):
Alpha release target (1.20)
Beta release target (1.21)
Stable release target (1.23)
To summarize:
Kubernetes itself should be able to handle shutdowns. What may not handle them are the applications/containers that you run - just make sure your containers start on their own and don't require manual intervention, and you should be fine.
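If you want to stop a node deliberately (rather than just rebooting it), a common pattern is to cordon and drain it first and uncordon it after the reboot; a rough sketch, where node-1 is a placeholder node name:

    # Gracefully stop one node and bring it back (node-1 is a placeholder).
    kubectl cordon node-1                                         # stop scheduling new pods on it
    kubectl drain node-1 --ignore-daemonsets --delete-local-data  # evict running pods (newer kubectl: --delete-emptydir-data)
    sudo shutdown -h now                                          # run this on the node itself

    # After the node has booted again:
    kubectl uncordon node-1                                       # allow scheduling again
    kubectl get nodes                                             # verify the node is Ready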
I mentioned in the comments flushing etcd data to disk, but (after some research) this should not be necessary, since etcd does it itself and implements a strong consistency model to make sure it doesn't lose data. That doesn't mean you should skip backups, though - it's better to have a backup and never use it than to not have one when you need it.
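A minimal sketch of such a backup, along the lines of the "Backing up an etcd cluster" doc linked below; the endpoint and certificate paths are kubeadm defaults and are assumptions here:

    # Take an etcd snapshot on a control-plane node (kubeadm default paths assumed).
    ETCDCTL_API=3 etcdctl snapshot save /var/backups/etcd-snapshot.db \
      --endpoints=https://127.0.0.1:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt \
      --cert=/etc/kubernetes/pki/etcd/server.crt \
      --key=/etc/kubernetes/pki/etcd/server.key

    # Inspect the snapshot afterwards:
    ETCDCTL_API=3 etcdctl snapshot status /var/backups/etcd-snapshot.db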
The solution mentioned in How to Setup & Recover a Self-hosted Kubeadm Kubernetes Cluster After Reboot is relevant only if you use SelfHosting.
Also (for convenience) make sure that all config changes persist between reboots. For example, swap should stay disabled: if you only run swapoff -a it won't persist after a reboot, so it's much better to make the change in /etc/fstab so that you don't have to disable anything manually again after rebooting.
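A sketch of making that persistent (the sed pattern assumes the swap entries in /etc/fstab contain the word "swap"; review the file before editing it in place):

    # Disable swap now and keep it disabled across reboots.
    sudo swapoff -a
    # Comment out swap entries in /etc/fstab (check with `grep swap /etc/fstab` first;
    # the pattern below is an assumption about how the entries look).
    sudo sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab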
Here are some links:
Backing up an etcd cluster
etcd disaster recovery
Permanently Disable Swap for Kubernetes Cluster

Restarting NiFi Node Joins Cluster as New Node

I am currently running Apache NiFi as a StatefulSet on Kubernetes. I'm testing to see how the cluster recovers if I kill a pod but am experiencing a problem when the pod (NiFi node) rejoins the cluster.
The node will rejoin as an additional node instead of reappearing under its original identity. For example, if I have a 3-node NiFi cluster and kill and restart one pod/NiFi node, I end up with a 4-node cluster with one node disconnected.
(Before/after screenshots of the NiFi cluster node list omitted.)
I believe the NiFi node is identified somehow in a config file which isn't persisted when the pod is killed. So far I am using persistent volumes to persist the following config files:
state-management.xml
authorizers.xml
I haven't persisted nifi.properties (it is dynamically generated on startup and I can't see anything in there that could uniquely identify the node).
So I guess the question is: how is the node uniquely identified to the cluster, and where is that identity stored?
EDIT: I'm using an external Zookeeper.
Thank you in advance,
Harry
Each node stores its view of the cluster state in the local state manager, which by default is written to a write-ahead log under nifi-home/state/local. Most likely you are losing the state/local directory on the node being restarted.
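One way to check and persist that directory, as a rough sketch (the NiFi home path and the persistent-volume mount path below are placeholders):

    # Make sure NiFi's local state survives a pod restart.
    # /opt/nifi/nifi-current and /opt/nifi/persistent are placeholder paths.

    # 1. See where the local state provider currently writes its write-ahead log:
    grep -A3 'local-provider' /opt/nifi/nifi-current/conf/state-management.xml

    # 2. Point the local-provider "Directory" property at a path backed by the
    #    persistent volume, e.g.
    #      <property name="Directory">/opt/nifi/persistent/state/local</property>
    #    or simply mount the persistent volume over the default ./state directory.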

Flink HA JobManager cluster cannot elect a leader

I'm trying to deploy Apache Flink 1.6 on Kubernetes, following the tutorial on the JobManager high availability page.
I already have a working ZooKeeper 3.10 cluster; from its logs I can see that it's healthy and isn't configured for Kerberos or SASL, and the ACL rules let every client read and write znodes. When I start the cluster everything works as expected: all JobManager and TaskManager pods get into the Running state, and I can see the connected TaskManager instances from the master JobManager's web UI. But when I delete the master JobManager's pod, the other JobManager pods cannot elect a leader, and the following error message appears on every JobManager UI in the cluster:
{
  "errors": [
    "Service temporarily unavailable due to an ongoing leader election. Please refresh."
  ]
}
Even if I refresh the page nothing changes; it stays stuck at this error message.
My suspicion is that the problem is related to the high-availability.storageDir option. I already have a working MinIO S3 deployment in my k8s cluster (tested with CloudExplorer), but Flink cannot write anything to the S3 server. You can find all the configs in the GitHub gist.
According to the logs it looks as if the TaskManager cannot connect to the new leader. I assume that this is the same for the web ui. The logs say that it tries to connect to flink-job-manager-0.flink-job-svc.flink.svc.cluster.local/10.244.3.166:44013. I cannot say from the logs whether flink-job-manager-1 binds to this IP. But my suspicion is that the headless service might return multiple IPs and Flink picks the wrong/old one. Could you log into the flink-job-manager-1 pod and check what its IP address is?
I think you should be able to resolve this problem by defining a dedicated service for each JobManager, or by using the pod hostname instead.
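A quick sketch of that check (namespace, pod, and service names are taken from the log line above and may differ in your setup):

    # Compare the pod's actual IP with what the headless service resolves to.
    kubectl -n flink get pod flink-job-manager-1 -o wide        # node/IP of the pod
    kubectl -n flink exec flink-job-manager-1 -- hostname -i    # IP as seen inside the pod

    # A headless service returns one A record per ready pod:
    kubectl -n flink run -it --rm dns-test --image=busybox --restart=Never -- \
      nslookup flink-job-svc.flink.svc.cluster.local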

Flapping metrics in DC/OS dashboard after changing master nodes

After changing two of three master nodes in a DC/OS 1.8 cluster to a newer CoreOS version (one with a kernel patched against the Dirty COW vulnerability), the masters stopped working. The dashboard showed an empty datacenter.
We synchronized /var/lib/dcos from the old master to the two new master nodes, and the dashboard started working again, but it still shows flapping metrics.
We have a mesos.leader and a zookeeper leader.
How can we stabilize the cluster?
Last time this happened to us we had to reinstall the cluster. I just finished stopping our master nodes one at a time to increase the disk size. We are now back in the flapping state. I think a reinstall is in our future. I'm searching for answers now to help avoid that.
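One thing that might help narrow down the flapping is to confirm which node each service currently considers the leader; a rough sketch, with placeholder hostnames and default ports:

    # Check ZooKeeper and Mesos leadership on each master (hostnames are placeholders).
    for host in master-1 master-2 master-3; do
      echo "== $host =="
      echo srvr | nc "$host" 2181 | grep Mode                   # ZooKeeper: leader or follower
      curl -s "http://$host:5050/master/state" | grep -o '"leader":"[^"]*"'
    done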

Kubernetes on Mesos Cluster

Hi, I am setting up Kubernetes on top of Mesos by following http://kubernetes.io/v1.1/docs/getting-started-guides/mesos.html, and this is what my current test lab looks like:
2 Mesos masters with ZooKeeper
2 Mesos slaves with Docker and flannel installed
An additional Mesos slave running Kubernetes-Mesos and the Kubernetes services
A server with the etcd service, which supports both flannel and Kubernetes
Can you please let me know if this is enough?
Below are the two questions I have:
Do we really need the Kubernetes master server here to be configured as a Mesos slave?
Do we need to install the Kubernetes package on the Mesos slaves as well? The URL talks about package installation and configuration only on the Kubernetes master. Without Kubernetes running on the slaves, can the master create pods/services etc. on the slaves through the Mesos scheduler?
Regarding Mesos masters and ZooKeeper instances, having an even number of nodes is not really a good idea because of the quorum mechanisms involved: a two-node ensemble needs both nodes to form a quorum, so it tolerates no failures, while three nodes can tolerate the loss of one. My suggestion would be to run three nodes of both services.
I assume you want to run this locally? If so, it would make sense to use a preconfigured Vagrant project such as https://github.com/tobilg/coreos-mesos-cluster. This launches a three-node CoreOS cluster with all the Mesos/ZooKeeper services already installed, and etcd and flanneld are also already installed on CoreOS itself.
This means you would only have to do the following steps once the cluster is launched:
http://kubernetes.io/v1.1/docs/getting-started-guides/mesos.html#deploy-kubernetes-mesos respectively https://coreos.com/kubernetes/docs/latest/getting-started.html
http://kubernetes.io/v1.1/docs/getting-started-guides/mesos.html#start-kubernetes-mesos-services
1) The Kubernetes master doesn't need to be a Mesos slave.
2) You don't need Kubernetes installed on the minions (Mesos slaves).
All you need is the following:
1) A Mesos setup (Mesos masters and slaves along with ZooKeeper; Docker running on all Mesos slaves)
2) An etcd cluster, which provides the overlay network (flannel) and also does the service discovery for the Kubernetes setup (see the sketch after this list)
3) The Kubernetes master
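A minimal sketch of how flannel is typically wired to etcd in a setup like this (the CIDR, etcd endpoint, and key are defaults/placeholders, using the etcd v2 API that flannel read from at the time):

    # Publish flannel's network config to etcd (v2 API); endpoint and CIDR are placeholders.
    etcdctl --endpoints=http://etcd-server:2379 \
      set /coreos.com/network/config '{ "Network": "10.1.0.0/16" }'

    # On each Mesos slave, flanneld reads that key:
    flanneld --etcd-endpoints=http://etcd-server:2379 --etcd-prefix=/coreos.com/network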
The blogs below helped a lot in setting it up:
http://manfrix.blogspot.in/2015/11/mesoskubernetes-how-to-install-and-run.html
https://github.com/ruo91/docker-kubernetes-mesos