I am currently running Apache NiFi as a StatefulSet on Kubernetes. I'm testing how the cluster recovers if I kill a pod, but I'm experiencing a problem when the pod (NiFi node) rejoins the cluster.
The node rejoins as an additional node instead of reappearing under its original identity. For example, if I have a 3-node NiFi cluster and kill and restart one pod/NiFi node, I end up with a 4-node cluster with one node disconnected.
Before: a 3-node cluster with all nodes connected.
After: a 4-node cluster with one node disconnected.
I believe that the NiFi node is identified somehow in a config file which isn't persisting when it is killed. So far I am using persistent volumes to persist the following config files:
state-management.xml
authorizers.xml
I haven't persisted nifi.properties (it is dynamically generated on startup and I can't see anything in there that could uniquely identify the node).
So I guess the question is: how is the node uniquely identified to the cluster, and where is that stored?
EDIT: I'm using an external Zookeeper.
Thank you in advance,
Harry
Each node stores the state of the cluster in its local state manager, which by default is written to a write-ahead log under nifi-home/state/local. Most likely you are losing the state/local directory when the node is restarted.
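A minimal sketch of persisting that directory with a volumeClaimTemplate, assuming the apache/nifi image layout where the NiFi home is /opt/nifi/nifi-current (the volume/claim names and storage size are hypothetical; adjust the mountPath to your image):

# Minimal sketch; names and storage size are hypothetical.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nifi
spec:
  serviceName: nifi
  replicas: 3
  selector:
    matchLabels:
      app: nifi
  template:
    metadata:
      labels:
        app: nifi
    spec:
      containers:
      - name: nifi
        image: apache/nifi
        volumeMounts:
        - name: nifi-state
          mountPath: /opt/nifi/nifi-current/state   # keeps state/local across pod restarts
  volumeClaimTemplates:
  - metadata:
      name: nifi-state
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi

Because it is a volumeClaimTemplate, each pod (nifi-0, nifi-1, ...) gets its own PVC, so a restarted pod comes back with the same local state and should reconnect under its original identity.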
we have an application deployed on AWS EKS, with these components:
Apache Artemis JMS
PostgreSQL
Kafka
and some stateless application pods written in Node.js
What is the best approach to move the entire application from one node group to another?
We were thinking of using the kubectl drain command and moving the EBS volumes manually to the new nodes.
Is there any better option?
The reason behind this request is that we started with 2 xlarge nodes and want to move to 4 large nodes, spread across all 3 AWS availability zones. We are worried that if a node dies, AWS may start the replacement in a different zone and the EBS disks will not be mountable there.
Thanks for any advice.
I would just add nodeSelectors or nodeAffinity and then delete the running pods (so they get rescheduled onto the correct nodes).
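A minimal sketch of that approach, assuming the new node group carries a label you can select on (the label key/value below are hypothetical; on EKS, managed node groups also get an eks.amazonaws.com/nodegroup label you could use instead):

# Deployment template snippet: pin the pods to the new node group via a nodeSelector.
spec:
  template:
    spec:
      nodeSelector:
        nodegroup: new-large-nodes        # hypothetical label on the new nodes
      # equivalent nodeAffinity form:
      # affinity:
      #   nodeAffinity:
      #     requiredDuringSchedulingIgnoredDuringExecution:
      #       nodeSelectorTerms:
      #       - matchExpressions:
      #         - key: nodegroup
      #           operator: In
      #           values: ["new-large-nodes"]

Once the selector is in place, deleting the running pods (or triggering a rolling update) reschedules them onto the labelled nodes.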
The best way to achieve this is to use node taints and tolerations and redeploy your application to the desired nodes.
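A sketch of the taint/toleration variant, assuming a hypothetical taint dedicated=app:NoSchedule has been applied to the new nodes (e.g. kubectl taint nodes <node-name> dedicated=app:NoSchedule):

# Pod template snippet: tolerate the (hypothetical) taint placed on the new nodes.
spec:
  template:
    spec:
      tolerations:
      - key: "dedicated"
        operator: "Equal"
        value: "app"
        effect: "NoSchedule"

Note that a toleration only allows scheduling onto the tainted nodes; combine it with a nodeSelector or nodeAffinity (as in the previous answer) if you want to actively steer the pods there.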
I have a node mistakenly registered on cluster B while it is actually serving cluster A.
Here 'registered on cluster B' means I can see the node in kubectl get node from cluster B.
I want to deregister this node from cluster B, but keep the node intact.
I know the regular process to delete a node is:
kubectl drain xxx
kubectl delete node xxx
# on node
kubeadm reset
But I do not want pods on the node from cluster A to be deleted or transferred. And I want to make sure the node will not self-register with cluster B afterwards.
To be clear: let's say cluster A has Pod A on the node and cluster B has Pod B on the node as well. I want to delete the node from cluster B but keep Pod A intact. (By the way, can I see Pod A from cluster B?)
Thank you in advance!
To deregister the node without removing any pods, run the command below:
kubectl delete node nodename
After this is done, the node will no longer appear in kubectl get nodes.
To prevent the node from registering itself again, stop the kubelet process on that node by logging into it and running the command below:
systemctl stop kubelet
As this case has already been clarified, I decided to publish a Community Wiki answer based on the following comment:
#mario nvm, I thought different clusters in one node affect each
other, actually they do not, they just share container runtime which
is more like 'read-only', and they have different kubelets of
themselves listening on different port. – Li Ziyan Aug 17 at 5:29
to make it clear to other users what the actual issue was here and how it has been solved or simply clarified.
So if you design your infrastructure in such a way that one physical (or virtual) machine serves as a Node for more than one Kubernetes cluster (which I believe is not a very common case), the infrastructure looks as follows:
Components that are shared:
physical (or virtual) node
common container runtime environment (e.g. docker)
Components that are separate:
two separate kubelets. Although they run on the same physical/virtual node, they are configured to listen on different ports and are registered with two different masters (more specifically, with two different kube-apiservers that are part of two different Kubernetes control planes); see the sketch after this list
two logically separate, independent Kubernetes Nodes which, although backed by the same physical host, are part of two completely different Kubernetes clusters that don't interfere with each other in any way.
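As a rough illustration of the "two kubelets on different ports" point, here is a hypothetical sketch of two kubelet configuration files on the same host (field names follow kubelet.config.k8s.io/v1beta1; in practice each kubelet would also need its own --kubeconfig pointing at the respective cluster's kube-apiserver and its own data directories):

# /etc/kubernetes/kubelet-cluster-a.yaml (hypothetical path)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
port: 10250            # kubelet API port used by cluster A's control plane
---
# /etc/kubernetes/kubelet-cluster-b.yaml (hypothetical path)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
port: 10260            # a different port so the second kubelet does not clash with the first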
I hope it helps to clarify possible confusion about this question and maybe helps someone who has similar doubts.
I want to understand the possible impact of a master node failure in a k8s cluster with only one master node and an internal etcd store.
As per my understanding, all deployed workload containers (including stateless workloads and StatefulSets with persistent volume claims) running on worker nodes keep running until a container needs to be recreated, since they don't have a direct functional dependency on the master node and etcd store for their core functions. The unavailability of the master node only affects control plane operations for the cluster.
Is my understanding correct? If not, could you please explain the impact of the master node failure on my workload running on that cluster?
I understand that the best way to achieve HA for a k8s cluster is to set up a multi-master cluster, possibly with externalized etcd stores for decoupling. This question is to understand the exact impact of a master node failure so that I can take an informed call before configuring a multi-master cluster.
Etcd operates on a quorum system, so as long as the cluster sees a majority it will continue operating. If the failed node was the current leader, the others will trigger an election after the heartbeat timeout.
For kube-apiserver, it's a horizontally scaled service, so losing a node is not interesting, just like any other web app. Some (most) controllers are singletons, but they run on every control plane node and use kube-apiserver for leader election, so as with etcd, if the leader dies then a few seconds later another copy gets the leader lock and takes over.
I want to know: when the master nodes connect to the etcd cluster, which etcd node is selected? Does a master node always connect to the same etcd node until it becomes unavailable? Does each node in the master cluster connect to the same node in the etcd cluster?
The scheduler and controller-manager talk to the API server present on the same node. In an HA setup you'll have only one of each active at a time (based on a lease), and whichever is currently active talks to the local API server. If for some reason it fails to connect to the local API server, it doesn't renew the lease and another leader is elected.
As described, only one instance will be the leader at any given moment, so its local API server is effectively the only one that needs to worry about reaching the etcd cluster. As for the etcd cluster itself, when you configure the Kubernetes API server you pass it the --etcd-servers flag, which is a list of etcd nodes like:
--etcd-servers=https://10.240.0.10:2379,https://10.240.0.11:2379,https://10.240.0.12:2379
This is then passed to the Go etcd/client library which, looking at its README, states:
etcd/client does round-robin rotation on other available endpoints if the preferred endpoint isn't functioning properly. For example, if the member that etcd/client connects to is hard killed, etcd/client will fail on the first attempt with the killed member, and succeed on the second attempt with another member. If it fails to talk to all available endpoints, it will return all errors happened.
This means it will try each of the available nodes until it succeeds in connecting to one.
What should I do with pods after adding a node to the Kubernetes cluster?
I mean, ideally I want some of them to be stopped and started on the newly added node. Do I have to manually pick some for stopping and hope that they'll be scheduled for restarting on the newly added node?
I don't care about affinity, just semi-even distribution.
Maybe there's a way to always have the number of pods be equal to the number of nodes?
For the sake of having an example:
I'm using Juju to provision a small Kubernetes cluster on AWS: one master and two workers. This is just a playground.
My application is Apache serving PHP and static files, so I have a Deployment, a Service of type NodePort, and an Ingress using nginx-ingress-controller.
I've turned off one of the worker instances and my application pods were recreated on the one that remained working.
I then started the instance back up; the master picked it up and started the nginx ingress controller there. But when I deleted my application pods, they were recreated on the instance that had kept running, not on the one that was restarted.
Not sure if it's important, but I don't have any DNS set up. I just added the IP of one of the instances to /etc/hosts with the host value from my Ingress.
Descheduler, a Kubernetes incubator project, could be helpful here; a sample policy sketch follows the quoted introduction below.
As Kubernetes clusters are very dynamic and their state changes over time, it may be desirable to move already-running pods to other nodes for various reasons:
Some nodes are under or over utilized.
The original scheduling decision does not hold true any more, as taints or labels are added to or removed from nodes, pod/node affinity requirements are not satisfied any more.
Some nodes failed and their pods moved to other nodes.
New nodes are added to clusters.
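A minimal sketch of a descheduler policy, assuming the older descheduler/v1alpha1 policy format (the thresholds are illustrative and the exact schema depends on the descheduler version you deploy):

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemoveDuplicates":
    enabled: true
  "LowNodeUtilization":
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        thresholds:        # nodes below all of these are considered underutilized
          cpu: 20
          memory: 20
          pods: 20
        targetThresholds:  # pods are evicted from nodes above these, to fill the underutilized ones
          cpu: 50
          memory: 50
          pods: 50

Run as a Job or CronJob in the cluster, the descheduler evicts pods according to this policy and lets the normal scheduler place them again, which tends to pull work onto newly added nodes.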
There is no automatic redistribution of already-running pods in Kubernetes when you add a new node. You can force a redistribution of single pods by deleting them and having a host-based anti-affinity policy in place (see the sketch below). Otherwise Kubernetes will simply prefer the new node when scheduling new pods and thus achieve a redistribution over time.
What are your reasons for a manually triggered redistribution?
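A sketch of the host-based anti-affinity mentioned above, assuming your Deployment's pods carry a hypothetical app: my-app label; with this in place, a deleted pod is preferentially rescheduled onto a host that does not already run a replica, such as a newly added node:

# Deployment template snippet: prefer spreading replicas across hosts.
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: my-app            # hypothetical label; match your pod labels
              topologyKey: kubernetes.io/hostname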