Hello all, I'm pretty new to Kubernetes.
I have a couple of questions regarding Kubernetes.
I set the memory request and limit to 300Mi and 600Mi respectively for a Deployment. I have verified that if any Pod reaches 600Mi, Kubernetes terminates the Pod and creates a new one. My question is: while the Pod is terminating, does it drop incoming requests, or does the cluster load balancer handle this situation and route incoming requests to the other available Pods in the Deployment?
What happens if all available Pods reach the memory limit at the same time? Generally, it takes a few minutes to create new Pods. In this case, how does the Kubernetes load balancer behave? Does it drop incoming requests?
Is there any way to set a lifetime for a Pod?
Thanks
Do you have a Service in front of your Deployment's Pods? If so, all requests coming through that Service are forwarded to its available Endpoints. When a Pod is terminating, its IP is removed from the Service's Endpoints list, so incoming requests are forwarded to the remaining Pods.
The set of Pods targeted by a Service is (usually) determined by a Label Selector
Read more about services-networking
If, for any reason, all Pods are terminating and the new Pods are not ready yet, then yes, some requests will be lost.
For rolling updates, the Deployment documentation describes the behavior like this: it first creates a new Pod, then deletes some old Pods and creates new ones. It does not kill old Pods until a sufficient number of new Pods have come up, and does not create new Pods until a sufficient number of old Pods have been killed.
Read more about deployment behavior
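To make the "sufficient number" wording concrete, here is a small Python sketch of how the rolling-update bounds work out. It assumes the default RollingUpdate strategy (maxSurge and maxUnavailable both 25%, with maxSurge rounded up and maxUnavailable rounded down); the function name is made up for illustration. Note that this covers rolling updates only, not Pods killed for exceeding their memory limit.

```python
def rolling_update_bounds(replicas, max_surge="25%", max_unavailable="25%"):
    """Return (min_available, max_total) Pods during a rolling update.

    Assumes the documented defaults: percentages are resolved against the
    desired replica count, maxSurge rounds up, maxUnavailable rounds down.
    """
    def resolve(value, round_up):
        if isinstance(value, str) and value.endswith("%"):
            percent = int(value[:-1])
            if round_up:
                # Integer ceiling division avoids float rounding surprises.
                return -(-replicas * percent // 100)
            return replicas * percent // 100
        return int(value)

    surge = resolve(max_surge, round_up=True)
    unavailable = resolve(max_unavailable, round_up=False)
    return replicas - unavailable, replicas + surge


print(rolling_update_bounds(3))   # 3 replicas with the default strategy
print(rolling_update_bounds(10))  # 10 replicas with the default strategy
```

With 3 replicas this gives at least 3 available Pods and at most 4 in total, which is why a new Pod is created before an old one is removed.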
Related
I have set up an EKS cluster in AWS with 2 worker nodes, and configured autoscaling on those nodes with a desired capacity of 3.
Sometimes a worker node goes down due to "an EC2 health check indicating it has been terminated or stopped.", which causes my pod to be restarted. I have not configured any replicas for the pods; there is only one at the moment.
I would like to know how my services (pods) can remain highly available when a worker node goes down or restarts.
If you have only one pod for your service, then your service is NOT highly available. It is a single point of failure. If that pod dies or is restarted, as has happened here, then during the time the pod is being restarted, your service is dead.
You need, at a bare minimum, TWO pods for a service to be highly available, and they should be on different nodes. You can force Kubernetes to schedule the pods on different nodes using pod anti-affinity (https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/), so that if one node goes down, as in your example, it takes out only one pod, leaving the other pod(s) to handle requests until the lost pod can be rescheduled.
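As a rough illustration of the anti-affinity setup, the Python sketch below builds a Deployment manifest with two replicas and a required anti-affinity rule, then prints it as JSON (which kubectl accepts as well as YAML). The name my-service, the app label, and the image are placeholders I made up, not anything from the question.

```python
import json

# Hypothetical label shared by the Deployment selector, the Pod template
# and the anti-affinity rule.
APP_LABELS = {"app": "my-service"}

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "my-service"},
    "spec": {
        "replicas": 2,  # the bare minimum for high availability
        "selector": {"matchLabels": APP_LABELS},
        "template": {
            "metadata": {"labels": APP_LABELS},
            "spec": {
                "affinity": {
                    "podAntiAffinity": {
                        # Hard rule: never co-locate two Pods carrying
                        # APP_LABELS on the same node (hostname).
                        "requiredDuringSchedulingIgnoredDuringExecution": [
                            {
                                "labelSelector": {"matchLabels": APP_LABELS},
                                "topologyKey": "kubernetes.io/hostname",
                            }
                        ]
                    }
                },
                "containers": [
                    # Placeholder image, replace with your own.
                    {"name": "my-service", "image": "example.com/my-service:1.0"}
                ],
            },
        },
    },
}

print(json.dumps(deployment, indent=2))
```

You could save the output to a file and apply it with kubectl apply -f. Keep in mind that with a required rule, a replica stays Pending if no other node is available; preferredDuringSchedulingIgnoredDuringExecution is the softer variant.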
I am running multiple pods with python/gunicorn serving web requests. At times, requests get really slow (up to 60s), which blocks all workers and makes the livenessProbe fail.
In some instances, all pods are blocked in this state and are restarted at the same time (graceful shutdown takes up to 60s). This means that no pod is available to take new requests.
Is there a way of telling k8s to cover for pods that it is restarting? For example starting a new pod when other pods are unhealthy.
You can put an ingress or an L7 load balancer in front, routing traffic to a Kubernetes Service that has multiple backend pods (selected by matching the pods' labels against the Service's label selector). Those pods can be spread across different Deployments running on different nodes. The ingress controller or load balancer can health-check the backends and stop routing traffic to unhealthy pods. Overall, this topology increases the availability and resiliency of the application.
I have a Kubernetes deployment that has 3 replicas. It starts 3 pods which are distributed across a given cluster. I would like to know how to reliably get one pod to contact another pod within the same ReplicaSet.
The deployment above is already wrapped in a Kubernetes Service, but Services do not cover my use case. I need each instance of my container (each Pod) to start up a local in-memory cache and have these caches communicate/sync with the cache instances running on other Pods. This is how I see a simple distributed cache working for my service. Pod-to-pod communication within the same cluster is allowed as per the Kubernetes Network Model, but I cannot see a reliable way to address a pod from another pod.
I believe I can use a StatefulSet, however, I don't want to lose the ClusterIP assigned to the service which is required by Ingress for load balancing.
Of course you can use a StatefulSet. The Ingress doesn't need the ClusterIP assigned to the Service, since it uses the Endpoints, so a headless Service is fine.
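For the pod-to-pod addressing part, a StatefulSet governed by a headless Service gives each Pod a stable DNS name of the form <statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster domain>. A minimal Python sketch of building the peer list follows; the names cache and cache-peers are hypothetical, and cluster.local is assumed as the cluster domain (the usual default, but it is configurable).

```python
def peer_hostnames(statefulset, service, namespace, replicas,
                   cluster_domain="cluster.local"):
    """Return the stable DNS names of all Pods in a StatefulSet.

    Relies on the StatefulSet naming scheme: Pods are named
    <statefulset>-<ordinal>, with ordinals running from 0 to replicas - 1.
    """
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


# Hypothetical names: StatefulSet "cache", headless Service "cache-peers".
for host in peer_hostnames("cache", "cache-peers", "default", replicas=3):
    print(host)
```

Each cache instance could use this list (minus its own hostname) to find its peers, while a regular Service or the Ingress keeps handling external traffic.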
We have deployed our services as Kubernetes Deployments with multiple replicas. When a server crashes, Kubernetes migrates its containers to another available server, which takes about 3~5 minutes.
While they are migrating, clients can still access the Deployment's service because we have other running replicas. But sometimes requests fail because the load balancer routes them to the dead or migrating containers.
It would be great if Kubernetes could automatically kick the dead replicas out of the load balancer and add them back once they are running on other servers. Otherwise, we need to set up a load balancer like haproxy to do the same job across multiple Deployment instances.
You need to configure health checking to have properly working load balancing for a Service. Please have a read of:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
The kubelet uses readiness probes to know when a Container is ready to start accepting traffic. A Pod is considered ready when all of its Containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.
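The rule quoted above can be pictured with a toy model in Python. This is not the Kubernetes API, only an illustration of the selection logic; the Pod class and the IP addresses are invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pod:
    """Toy stand-in for a Pod: an IP plus one readiness flag per container."""
    ip: str
    containers_ready: List[bool]


def service_backends(pods):
    """Return IPs of Pods that would receive Service traffic.

    A Pod counts as ready only when all of its containers are ready.
    """
    return [pod.ip for pod in pods if all(pod.containers_ready)]


pods = [
    Pod("10.0.0.1", [True]),          # ready
    Pod("10.0.0.2", [True, False]),   # one container failing its probe
    Pod("10.0.0.3", [True, True]),    # ready
]
print(service_backends(pods))
```

In a real cluster the kubelet updates the readiness condition from the probe results, and the Service's endpoints follow from that, so no extra load balancer logic is needed.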
1. kubelet
--node-status-update-frequency duration
Specifies how often kubelet posts node status to master. Note: be cautious when changing the constant, it must work with nodeMonitorGracePeriod in nodecontroller. (default 10s)
2. controller-manager
--node-monitor-grace-period duration
Amount of time which we allow running Node to be unresponsive before marking it unhealthy. Must be N times more than kubelet's nodeStatusUpdateFrequency, where N means number of retries allowed for kubelet to post node status. (default 40s)
--pod-eviction-timeout duration
The grace period for deleting pods on failed nodes. (default 5m0s)
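As a back-of-the-envelope check, the defaults listed above roughly explain a delay of several minutes after a node crash. The Python sketch below simply adds them up; it is an estimate only and ignores controller sync intervals, scheduling time and container start-up on the new node.

```python
# Default values as quoted in the flag descriptions above (in seconds).
NODE_STATUS_UPDATE_FREQUENCY_S = 10   # kubelet --node-status-update-frequency
NODE_MONITOR_GRACE_PERIOD_S = 40      # controller-manager --node-monitor-grace-period
POD_EVICTION_TIMEOUT_S = 5 * 60       # controller-manager --pod-eviction-timeout


def seconds_until_eviction(grace_period=NODE_MONITOR_GRACE_PERIOD_S,
                           eviction_timeout=POD_EVICTION_TIMEOUT_S):
    """Rough time from node failure until its Pods are deleted for rescheduling."""
    return grace_period + eviction_timeout


def missed_status_updates(grace_period=NODE_MONITOR_GRACE_PERIOD_S,
                          update_frequency=NODE_STATUS_UPDATE_FREQUENCY_S):
    """Number of status updates a node can miss before it is marked unhealthy."""
    return grace_period // update_frequency


total = seconds_until_eviction()
print(f"{total} s = {total // 60} min {total % 60} s until eviction")
print(f"{missed_status_updates()} missed status updates before the node is marked unhealthy")
```

Lowering these values shortens the failover window, at the cost of more false positives on nodes that are only briefly unreachable.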
I have a "homogeneous" Kubernetes setup. By this I mean that I am only running instances of a single type of pod (an http server) with a load balancer service distributing traffic to them.
By my reasoning, to get the most out of my cluster (edit: to be concrete -- getting the best average response times to http requests) I should have:
At least one pod running on every node: Not having a pod running on a node, means that I am paying for the node and not having it ready to serve a request.
At most one pod running on every node: The pods are threaded http servers so they can maximize utilization of a node, so running multiple pods on a node does not net me anything.
This means that I should have exactly one pod per node. I achieve this using a DaemonSet.
The alternative way is to configure a Deployment and apply a HorizontalPodAutoscaler to it and have Kubernetes handle the number of pods and pod to node mapping. Is there any disadvantage of my approach in comparison to this?
My evaluation is that the HorizontalPodAutoscaler is relevant mainly in heterogeneous situations, where one HorizontalPodAutoscaler can scale up a Deployment at the expense of another Deployment. But since I have only one type of pod, I would have only one Deployment and I would be scaling up that deployment at the expense of itself, which does not make sense.
HorizontalPodAutoscaler is actually a valid solution for your needs. To address your two concerns:
1. At least one pod running on every node
This isn't your real concern. The concern is underutilizing your cluster. However, you can be underutilizing your cluster even if you have a pod running on every node. Consider a three-node cluster:
Scenario A: pod running on each node, 10% CPU usage per node
Scenario B: pod running on only one node, 70% CPU usage
Even though Scenario A has a pod on each node, the cluster is actually less utilized than in Scenario B, where only one node has a pod.
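The arithmetic behind the two scenarios, as a quick Python sketch. It assumes three equally sized nodes and treats the nodes without a pod in Scenario B as sitting at 0% CPU.

```python
def cluster_utilization(per_node_cpu_percent):
    """Average CPU utilization across equally sized nodes, in percent."""
    return sum(per_node_cpu_percent) / len(per_node_cpu_percent)


scenario_a = [10, 10, 10]  # a pod on each node, 10% CPU per node
scenario_b = [70, 0, 0]    # a pod on one node only, 70% CPU on that node

print(f"Scenario A: {cluster_utilization(scenario_a):.1f}%")
print(f"Scenario B: {cluster_utilization(scenario_b):.1f}%")
```

So Scenario B uses more than twice as much of the cluster as Scenario A, despite having pods on only one node.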
2. At most one pod running on every node
The Kubernetes scheduler tries to spread pods around so that you don't end up with multiple pods of the same type on a single node. Since in your case the other nodes should be empty, the scheduler should have no problems starting the pods on the other nodes. Additionally, if you have the pod request resources equivalent to the node's resources, that will prevent the scheduler from scheduling a new pod on a node that already has one.
Now, you can achieve the same effect whether you go with DaemonSet or HPA, but I personally would go with HPA since I think it fits your semantics better, and would also work much better if you eventually decide to add other types of pods to your cluster.
Using a DaemonSet means that the pod has to run on every node (or some subset). This is a great fit for something like a logger or a metrics collector which is per-node. But you really just want to use available cluster resources to power your pod as needed, which matches up better with the intent of HPA.
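To show why HPA still makes sense with a single Deployment, here is a Python sketch of the core scaling rule from the HPA documentation: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). It is a simplification that leaves out the tolerance band, stabilization windows and min/max replica limits; the function name and the numbers are just examples.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric):
    """Core HPA rule: scale the replica count by the metric-to-target ratio."""
    return math.ceil(current_replicas * current_metric / target_metric)


# 3 replicas averaging 90% CPU against a 50% target: scale up.
print(desired_replicas(3, current_metric=90, target_metric=50))
# 3 replicas averaging 10% CPU against a 50% target: scale down.
print(desired_replicas(3, current_metric=10, target_metric=50))
```

The Deployment is not scaled "at the expense of itself": it grows when load is high and shrinks when load is low, and a cluster autoscaler can then add or remove nodes to match.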
As an aside, I believe GKE supports cluster autoscaling, so you should never be paying for nodes that aren't needed.