I have a service set up in Kubernetes which seems fairly normal: a deployment, a service, and an HPA. However, it does something I'd like to fix. The sequence of events goes like this:
We change the deployed image, which creates new pods.
The pods become healthy and enter the service through the label selector.
The HPA enters an unhealthy state because it cannot read the new pod metrics.
I get notified through Argo Rollouts that the HPA is unhealthy.
I'd like to somehow delay pods entering service until their metrics are ready so we don't get this false alarm on every deploy.
Right now, we solve this by waiting 60 seconds before changing the labels in our blue/green rollout script, but that's pretty unsatisfying!
I think I could also do this by creating a readiness probe that asked for the pod's metrics, but it seems like a lot of hassle for something that should be easy. (For example, it doesn't look like I have the current namespace in the environment by default. I guess I could get it with the downward API, but even then I'd have to bundle curl or kubectl in my container images, which I'd prefer not to do.)
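For concreteness, here's a rough sketch of what that probe might look like if I did bundle curl. It uses the downward API for the namespace and pod name, and assumes the pod's service account has RBAC access to read metrics.k8s.io; all names are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: my-app                      # placeholder name
spec:
  containers:
  - name: app
    image: my-app:latest            # placeholder image (must contain curl)
    env:
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
    - name: POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    readinessProbe:
      exec:
        command:
        - sh
        - -c
        # Succeed only once metrics-server can report this pod's metrics.
        - >-
          curl -sf
          --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)"
          https://kubernetes.default.svc/apis/metrics.k8s.io/v1beta1/namespaces/${POD_NAMESPACE}/pods/${POD_NAME}
      initialDelaySeconds: 15
      periodSeconds: 10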
Anyway, are other people even seeing this? If so, how are you solving it?
Editing to add information requested in a comment: we use Kubernetes 1.21 on an Amazon EKS cluster.
We have a bunch of pods running in dev environment. The pods are auto-provisioned by an application on every business action. The problem is that across various namespaces they are accumulating and eating available resources in EKS.
Is there a way, without Jenkins or Kubernetes Jobs, to simply put some parameter in the pod manifest to tell it to self-destruct in, say, 24 hours?
Add to your pod.spec:
activeDeadlineSeconds: 86400
After the deadline, your pod will be stopped for good, with the status DeadlineExceeded.
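For example, a minimal pod manifest with the deadline set (the name, image, and command are just placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: short-lived-pod          # placeholder name
spec:
  activeDeadlineSeconds: 86400   # kill the pod 24 hours after it starts
  containers:
  - name: app
    image: busybox
    command: ["sleep", "864000"]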
If I understood your situation properly, you would like to scale your cluster down in order to save resources.
Kubernetes has the built-in ability to autoscale your application in a cluster. In practice, this means that Kubernetes can start additional pods when the load is increasing and terminate excess pods when the load is decreasing.
It is possible to downscale the application to zero pods, but, in this case, you will have a delay serving the first request while the pod is starting.
This functionality relies on performance metrics. From the practical side, it means that autoscaling doesn't happen instantly, because it takes some time for the performance metrics to reach the configured threshold.
The Kubernetes feature in question, the HPA (Horizontal Pod Autoscaler), is described in this document.
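For illustration, a minimal HPA manifest might look like this (the names and thresholds are placeholders to adapt to your workload):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa               # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                 # placeholder: the deployment to scale
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # add pods above ~70% average CPU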
In case you are running your cluster on GCP or GKE, you are able to go further and automatically start additional nodes for your cluster when you need more computing capacity and shut down nodes when they are not running application pods anymore.
More information about this functionality can be found by following the link.
Last, but not least, you can use a tool like Ansible to manage all your Kubernetes assets (it can create and manage deployments via playbooks).
If you decide to give it a try, you might find this information useful:
Creating a Container cluster in GKE
70% cheaper Kubernetes cluster on AWS
How to build a Kubernetes Horizontal Pod Autoscaler using custom metrics
I'm trying to create a deployment along with a service and then access the service immediately as soon as the rollout is complete:
> kubectl create -f my-deployment.yaml
> kubectl create -f my-service.yaml
> kubectl rollout status deployment/my-deployment --watch --timeout 10m # This usually takes ~30 seconds
deployment "my-deployment" successfully rolled out
> curl "my-service" # This happens inside a pod, so the service DNS name should be available
Sometimes this works, but there seems to be a race condition -- if the curl command happens too quickly, it seems the socket fails to connect and I get a connection timeout.
This seems like the behavior I would get if there were no ready pods, as per this question: What happens when a service receives a request but has no ready pods?
I expected that the completion of the rollout meant that the service was guaranteed to be ready to go. Is this not the case? Is there some Kubernetes command to "wait" for the service to be available? (I notice that services don't have conditions, so you can't do kubectl wait...)
To know whether a service is ready, you can check whether an Endpoints object exists with the same name as the service and whether that Endpoints object has IPs. If IPs are there, the service is ready. Even then, there is no guarantee a request won't fail, because there can be a network issue elsewhere in your infrastructure.
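For example, a crude wait loop along these lines (my-service is a placeholder) would block until the Endpoints object lists at least one IP:

# Poll until the service's Endpoints object has at least one ready address.
until kubectl get endpoints my-service \
    -o jsonpath='{.subsets[*].addresses[*].ip}' | grep -q .; do
  sleep 1
done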
K8s primitives that manage pods, such as Deployment, only take pod status into account for decision making, such as advancement during a rolling update.
For example, during a deployment rolling update, a new pod becomes ready. On the other hand, the service, network policy, and load balancer are not yet ready for the new pod for whatever reason (e.g. slowness in the API machinery, endpoints controller, kube-proxy, iptables, or infrastructure programming). This may cause service disruption or loss of backend capacity. In extreme cases, if a rolling update completes before any new replacement pod actually starts serving traffic, this will cause a service outage.
Here is the proposal to improve pod readiness, which was motivated by the problem above.
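That proposal turned into pod readiness gates: the pod spec can declare an extra condition that some external controller must set to True before the pod counts as Ready. A sketch, where the condition type is made up:

apiVersion: v1
kind: Pod
metadata:
  name: gated-pod                            # placeholder name
spec:
  readinessGates:
  # Pod is Ready only after some controller sets this condition to True.
  - conditionType: "example.com/lb-registered"
  containers:
  - name: app
    image: my-app:latest                     # placeholder image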
My answer is slightly different from what you're asking, but it will probably be useful to you.
I propose using Helm to improve your deployment experience.
How can it help you?
Helm has several flags, like --wait, that can be applied during an update. It will make sure all the resources were successfully created and ready before moving forward.
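For example (the release and chart names are placeholders):

# Block until all resources in the release are ready, up to 10 minutes.
helm upgrade --install my-release ./my-chart --wait --timeout 10m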
I'm learning Kubernetes concepts, how they work, and how to deploy with them. I have a couple of cases I don't know how to achieve, and I am looking for advice or guidelines.
I am using the Google Cloud Platform. The current running flow is described below. A push to the google source repository triggers Cloud Build which creates a docker image and pushes the image to the running cluster nodes.
Case 1: When new pods are up and running, I want traffic to be routed to them, and old pods to be killed only after they've finished serving their in-flight requests. Zero downtime is what I'm looking to achieve.
Case 2: What will happen if a running pod's disk usage reaches 100%, or (in the Debian case) the inode count reaches full capacity? Will Kubernetes create new pods to cope?
Case 3: How to manage pod to database connection limits?
Like the other answer says, use liveness and readiness probes. Basically, a new pod is added to the service pool only once its readiness probe has passed; the old pod is removed from the service pool, drained, and then terminated. This happens in a rolling fashion, one pod at a time.
This really depends on the capacity of your cluster and its ability to schedule pods given the limits set for the containers in them. For more about setting up limits for containers, refer to here. As for the inode limit: if you reach it on a node, the kubelet won't be able to run any more pods on that node. The kubelet's eviction manager also has a mechanism whereby it evicts the pods using the most inodes. You can also configure your eviction thresholds on the kubelet.
This would be more of a limitation at the OS level combined with your stateful application's configuration. You can keep this configuration in a ConfigMap. For example, for MySQL, the option would be max_connections.
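For example, a sketch of keeping that limit in a ConfigMap that you'd mount into the MySQL pod (the names and the value are placeholders):

apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql-config             # placeholder name
data:
  my.cnf: |
    [mysqld]
    max_connections = 250        # cap on concurrent client connections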
I can answer case 1 since I've done it myself.
Use Deployments with readinessProbes & livenessProbes, as in the sketch below.
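A minimal sketch of what that looks like for case 1 (the image, port, and path are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0               # never drop below desired capacity
      maxSurge: 1
  template:
    metadata:
      labels:
        app: my-app
    spec:
      terminationGracePeriodSeconds: 60   # time to drain in-flight requests
      containers:
      - name: app
        image: my-app:latest            # placeholder image
        ports:
        - containerPort: 8080
        readinessProbe:                 # gates traffic: pod joins the service only when this passes
          httpGet:
            path: /healthz
            port: 8080
          periodSeconds: 5
        livenessProbe:                  # restarts the container if it stops responding
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 10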
I am trying to create a system health check. I would like to be able to determine how often the pods are restarting. Since the pods have liveness probes, they may continuously restart if something is wrong. Is there any way to check the time a pod restarted at, instead of just the number of restarts and the total time the pod has been alive?
You may want to use a monitoring tool, like cAdvisor/Heapster/Grafana/Prometheus.
Another option you have is to deploy the kube-state-metrics container.
Good articles to read: Monitoring Kubernetes and Get Kubernetes Cluster Metrics with Prometheus.
Also, read a similar question on Stack Overflow.
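If you just need a one-off check rather than a full monitoring stack, kubectl can also show when a container last terminated (the pod name is a placeholder):

# Prints when the container's previous run finished, i.e. the last restart time.
kubectl get pod my-pod \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.finishedAt}'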
Is there a way to monitor the pod status and restart count of pods running in a GKE cluster with Stackdriver?
While I can see CPU, memory and disk usage metrics for all pods in Stackdriver there seems to be no way of getting metrics about crashing pods or pods in a replica set being restarted due to crashes.
I'm using a Kubernetes replica set to manage the pods, hence they are respawned and created with a new name when they crash. As far as I can tell the metrics in Stackdriver appear by pod-name (which is unique for the lifetime of the pod) which doesn't sound really sensible.
Alerting upon pod failures sounds like such a natural thing that it sounds hard to believe that this is not supported at the moment. The monitoring and alerting capabilities that I get from Stackdriver for Google Container Engine as they stand seem to be rather useless as they are all bound to pods whose lifetime can be very short.
So if this doesn't work out of the box are there known workarounds or best practices on how to monitor for continuously crashing pods?
You can achieve this manually as follows:
In the Logs Viewer, create the following filter:
resource.labels.project_id="<PROJECT_ID>"
resource.labels.cluster_name="<CLUSTER_NAME>"
resource.labels.namespace_name="<NAMESPACE, or default>"
jsonPayload.message:"failed liveness probe"
Create a metric by clicking on the Create Metric button above the filter input and filling in the details.
You may now track this metric in Stackdriver.
I'd be happy to be informed of a built-in metric instead of this.
There is a built-in metric now, so it's easy to dashboard and/or alert on it without setting up custom metrics:
Metric: kubernetes.io/container/restart_count
Resource type: k8s_container
In my cluster (a bare-metal Kubernetes cluster), I use kube-state-metrics (https://github.com/kubernetes/kube-state-metrics) to do what you want. The project lives in the kubernetes GitHub organization and is quite easy to use. Once it's deployed, you can use the kube_pod_container_status_restarts_total metric to know whether a container has restarted.
Others have commented on how to do this with metrics, which is the right solution if you have a very large number of crashing pods.
An alternative approach is to treat crashing pods as discrete events, or even log lines. You can do this with Robusta (disclaimer: I wrote it) with YAML like this:
triggers:
  - on_pod_update: {}
actions:
  - restart_loop_reporter:
      restart_reason: CrashLoopBackOff
  - image_pull_backoff_reporter:
      rate_limit: 3600
sinks:
  - slack
Here we're triggering an action named restart_loop_reporter whenever a pod updates. The data stream comes from the APIServer.
The restart_loop_reporter is an action which filters out non-crashing pods. Above it's configured to report only on CrashLoopBackOffs but you could remove that to report all crashes.
A benefit of doing it this way is that you can gather extra data about the crash automatically. For example, the above will fetch the pod's logs and forward them along with the crash report.
I'm sending the result here to Slack, but you could just as well send it to a structured output like Kafka (already builtin) or Stackdriver (not yet supported, but I can fix that if you like).
Remember that you can always raise a feature request if the available options are not enough.