I am running my services on an OpenShift cluster with all the nodes in Ready status.
I found that a few microservice pods have networking issues on certain nodes, even though the pods themselves are up and running.
When the same pods run on other nodes, they are fine.
Also, what could be the reason behind the pod showing stickiness? Even after a restart, the pod is deployed on the same node again and again, and there is no taint/toleration scenario involved.
Related
I have set up EKS in AWS with 2 worker nodes and configured autoscaling on those nodes with a desired capacity of 3.
Sometimes a worker node goes down due to "an EC2 health check indicating it has been terminated or stopped.", which results in my pod getting restarted. I have not enabled any replicas for the pod; there is only one right now.
I just wanted to know: how can my services (pods) be highly available despite any worker node going down or restarting?
If you have only one pod for your service, then your service is NOT highly available. It is a single point of failure. If that pod dies or is restarted, as has happened here, then during the time the pod is being restarted, your service is dead.
You need, at a bare minimum, TWO pods for a service to be highly available, and they should be on different nodes (you can force Kubernetes to schedule the pods on different nodes using pod anti-affinity: https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/), so that if one node goes down as in your example, it takes out only one pod, leaving the other pod(s) to handle requests until the lost pod can be rescheduled.
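For illustration, here is a minimal sketch of a Deployment that uses pod anti-affinity to keep replicas on different nodes. The name, labels, and image (web, nginx) are placeholders, not taken from the question:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web                     # hypothetical service name
    spec:
      replicas: 2                   # at least two pods for basic availability
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchLabels:
                    app: web
                topologyKey: kubernetes.io/hostname   # no two pods on the same node
          containers:
          - name: web
            image: nginx:1.25       # placeholder image

With the required rule, a replica simply stays Pending if no other node is available; preferredDuringSchedulingIgnoredDuringExecution is the softer alternative.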
I am trying to use Helm charts to deploy Kafka and ZooKeeper in a local k8s cluster, but when checking the status of the respective pods, they show Pending for a long time and are not being assigned to any node, even though I have 2 healthy worker nodes running.
I tried deleting the pods and redeploying, but I ended up in the same situation and still cannot get the pods to run. I need help on how I can get these pods running.
I deployed my frontend and backend applications in my kops cluster on AWS EC2 with a master size of t2.medium. When I increase the load on my applications, both worker nodes go into NotReady state and the pods change to Pending state.
How can I resolve this issue? My cluster is in production at the moment.
You should first run kubectl get events -n default to see why the nodes go into NotReady.
Usually your cluster is overloaded. Try using cluster autoscaler to dynamically manage your cluster capacity. Also ensure you have proper resource requests on your Pods.
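As a hedged example, resource requests and limits on a container might look like the following; the pod name, image, and the CPU/memory values are illustrative placeholders that you would tune for your own workloads:

    apiVersion: v1
    kind: Pod
    metadata:
      name: backend                 # hypothetical pod name
    spec:
      containers:
      - name: app
        image: registry.example.com/backend:1.0   # placeholder image
        resources:
          requests:                 # what the scheduler reserves on a node
            cpu: "250m"
            memory: "256Mi"
          limits:                   # hard ceiling enforced at runtime
            cpu: "500m"
            memory: "512Mi"

Requests are what the scheduler uses to decide whether a node has room for the pod, so setting them realistically is what keeps nodes from being oversubscribed under load.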
The cluster consists of one master and one worker node. If the master is down and the worker is restarted, no workloads (deployments) are started on boot. Is it possible to make the worker resume its last state without the master, and if so, how?
Kubernetes 1.18.3
On worker node are installed: kubelet, kubectl, kubeadm
Ideally you should have more than one node (typically an odd number like 3 or 5) serving as master, accessible from worker nodes via a LoadBalancer.
The state is stored in etcd, which worker nodes access via the API server. So without master nodes running, there is no way for workers to know the desired state.
Although it's not recommended, you can use static Pods as a potential solution here. Static Pods are managed directly by the kubelet daemon on a specific node, without the API server observing them. Unlike Pods that are managed by the control plane (for example, via a Deployment), the kubelet watches each static Pod and restarts it if it crashes.
The caveat of using static Pods is that, since they do not depend on the API server, they cannot be managed with kubectl or other Kubernetes API clients.
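As a rough sketch, a static Pod is just a manifest dropped into the kubelet's staticPodPath (with kubeadm this is typically /etc/kubernetes/manifests); the file name and image below are illustrative:

    # saved on the worker node, e.g. /etc/kubernetes/manifests/web-static.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: web-static              # hypothetical name
      labels:
        app: web-static
    spec:
      containers:
      - name: web
        image: nginx:1.25           # placeholder image
        ports:
        - containerPort: 80

The kubelet starts this pod on boot and restarts it if it crashes, even when the API server is unreachable, which is why it can survive a worker reboot without the master.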
I have an AKS cluster with two node pools. Node pool 1 has 3 nodes and node pool 2 has 1 node, all Linux VMs. I noticed that after stopping the VMs and then running kubectl get pods, the pod status still shows "Running" even though the VMs are not actually running. How is this possible?
This is the command I tried: kubectl get pods -n development -o=wide
The screenshot is given below. Though VMs are not running, the Pod status shows "running". However, trying to access the app using the Public IP of the service resulted in
ERR_CONNECTION_TIMED_OUT
Here is a full thread (https://github.com/kubernetes/kubernetes/issues/55713) on this issue. The problem is that, by default, a pod waits 5 minutes before being evicted to another node when its current node becomes NotReady; but in this case none of the worker nodes are Ready, so the pods are not getting evicted. Refer to the GitHub issue; there are some suggestions and solutions provided there.
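For context, a hedged sketch of how that 5-minute window can be shortened per pod: on recent Kubernetes versions the delay corresponds to the default tolerationSeconds (300s) for the not-ready/unreachable taints, which you can override in the pod spec. The pod name, image, and the 30-second value are only examples:

    apiVersion: v1
    kind: Pod
    metadata:
      name: web                     # hypothetical pod name
    spec:
      containers:
      - name: web
        image: nginx:1.25           # placeholder image
      tolerations:                  # override the default 300s eviction delay
      - key: node.kubernetes.io/not-ready
        operator: Exists
        effect: NoExecute
        tolerationSeconds: 30       # evict after 30s instead of 5 minutes
      - key: node.kubernetes.io/unreachable
        operator: Exists
        effect: NoExecute
        tolerationSeconds: 30

Of course, faster eviction only helps if at least one other node is Ready to receive the pods.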
What is actually going on is that the kubelet processes running on the nodes cannot report their status to the Kubernetes API server. Kubernetes will keep assuming that your Pods are running while the nodes associated with them are offline. The fact that all nodes are offline does, in fact, mean your Pods are not running and therefore not accessible, which causes the ERR_CONNECTION_TIMED_OUT.
You can run kubectl get nodes to get the status of the nodes, they should show NotReady. Please check and let me know.
Also, could you please provide the output of kubectl get pods -A?