Kubernetes (K8s) - Can you check what time a pod restarts at?

I am trying to create a system health check. I would like to be able to determine how often the pods are restarting. Since the pods have liveness probes, they may continuously restart if something is wrong. Is there any way to check the time a pod restarted at, instead of just the number of restarts and the total time the pod has been alive?

You may want to use a monitoring tool like cAdvisor, Prometheus, or Grafana (Heapster, which used to fill this role, has since been deprecated).
Another option you have is to deploy the kube-state-metrics container.
Good articles to read: Monitoring Kubernetes and Get Kubernetes Cluster Metrics with Prometheus.
Also, read a similar question on Stack Overflow.
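If you just need the timestamps ad hoc, the pod status itself records when the previous container run ended and when the current one started. A minimal sketch of the relevant fields (the pod name, container name, and timestamps below are placeholders):

```yaml
# Abridged output of: kubectl get pod my-pod -o yaml
status:
  containerStatuses:
  - name: app
    restartCount: 4
    lastState:
      terminated:
        reason: Error
        startedAt: "2018-06-01T10:15:00Z"   # when the previous run began
        finishedAt: "2018-06-01T10:20:30Z"  # when it died, i.e. just before the latest restart
    state:
      running:
        startedAt: "2018-06-01T10:20:32Z"   # when the current run began
```

For longer-term history, kube-state-metrics exposes the restart count as kube_pod_container_status_restarts_total, so Prometheus can record exactly when it increments.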

Related

wait for metrics to become available in metrics-server

I have a service set up in Kubernetes which seems fairly normal: a Deployment, a Service, and an HPA. However, it does something which I'd like to fix. The sequence of events goes like this:
We change the deployed image, which creates new pods.
The pods become healthy and enter the service through the label selector.
The HPA enters an unhealthy state because it cannot read the new pod metrics.
I get notified through Argo rollouts that the HPA is unhealthy.
I'd like to somehow delay pods entering service until their metrics are ready so we don't get this false alarm on every deploy.
Right now, we solve this by waiting 60 seconds before changing the labels in our blue/green rollout script, but that's pretty unsatisfying!
I think I could also do this by creating a liveness probe that asked for the pod's metrics, but it seems like a lot of hassle for something that seems like it should be easy. (for example, it doesn't look like I have the current namespace in the environment by default. I guess I could get it with the downward API, but I'd also have to bundle curl or kubectl in my container images even if I had it, which I'd prefer not to do.)
Anyway, are other people even seeing this? If so, how are you solving it?
Editing to add information requested in a comment: we use Kubernetes 1.21 on an Amazon EKS cluster.
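For what it's worth, one declarative way to replicate that 60-second wait is a readiness probe with an initial delay on the container: a pod only joins the Service's endpoints once it reports Ready, so holding readiness back delays traffic without touching the rollout script. A minimal sketch (the path, port, and delay are assumptions):

```yaml
# Readiness (not liveness) is what gates Service membership;
# a failing liveness probe would restart the pod instead.
readinessProbe:
  httpGet:
    path: /healthz         # placeholder health endpoint
    port: 8080             # placeholder port
  initialDelaySeconds: 60  # mirrors the manual 60-second wait
  periodSeconds: 10
```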

How to get information about scaled up pods in a Kubernetes cluster

I have a Kubernetes cluster, and I am trying to figure out how many pods have been scaled up, using the kubectl command.
What is a possible way to get the details of all the pods scaled up and scaled down within the past month?
That is not information Kubernetes records. The Events system keeps some debugging messages that include details about pod startup and sometimes shutdown, but those are only kept for a few hours. For long-term metrics, look at something like Prometheus + kube-state-metrics.
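If you go the Prometheus + kube-state-metrics route, replica counts are exposed as time series you can query over a month-long range. A sketch of a recording rule over the kube-state-metrics series kube_deployment_status_replicas (the group and rule names are placeholders):

```yaml
# Prometheus rule file; querying the recorded series over a time
# range shows exactly when replica counts went up or down.
groups:
- name: scaling-history
  rules:
  - record: deployment:replicas:current
    expr: kube_deployment_status_replicas
```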
If you have a Kubernetes audit policy in place, you can find the events that pass through the Kubernetes API by filtering them in Cloud Audit Logs or Elasticsearch; it depends on your current setup.
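For the audit-log approach, a minimal audit Policy that records pod creations and deletions (which is what scale-ups and scale-downs produce) might look like this sketch:

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# Record metadata only for pod create/delete requests,
# to keep the audit log small.
- level: Metadata
  resources:
  - group: ""            # core API group
    resources: ["pods"]
  verbs: ["create", "delete"]
```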

Is it possible to schedule a pod to run for, say, 24 hours and then remove the deployment/statefulset? Or do I need to use jobs?

We have a bunch of pods running in a dev environment. The pods are auto-provisioned by an application on every business action. The problem is that, across various namespaces, they accumulate and eat up the available resources in EKS.
Is there a way, without Jenkins or Kubernetes Jobs, to simply put some parameter in the pod manifest to tell it to self-destruct in, say, 24 hours?
Add this to your pod.spec:
activeDeadlineSeconds: 86400
After the deadline, your Pod will be stopped for good with the status DeadlineExceeded.
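In manifest form, that looks like the sketch below (the pod name and image are placeholders). Note that this fits bare pods and Jobs best; a Deployment or StatefulSet would simply recreate the pod after it is killed.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: short-lived       # placeholder name
spec:
  activeDeadlineSeconds: 86400  # 24 hours, then the pod fails with DeadlineExceeded
  containers:
  - name: app
    image: busybox        # placeholder image
    command: ["sleep", "1000000"]
```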
If I understood your situation properly, you would like to scale your cluster down in order to save resources.
Kubernetes has the built-in ability to autoscale your application in a cluster: it can start additional pods when the load increases and terminate excess pods when the load decreases.
It is possible to scale the application down to zero pods, but in that case there will be a delay serving the first request while a pod starts. (Note that a standard HPA only scales down to one replica; going to zero requires the HPAScaleToZero feature gate or a serverless layer.)
This functionality relies on performance metrics. From the practical side, it means that autoscaling doesn't happen instantly, because it takes some time for the performance metrics to reach the configured threshold.
The Kubernetes feature mentioned here, the HPA (Horizontal Pod Autoscaler), is described in this document.
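A minimal HPA manifest for reference (the deployment name and the 70% CPU target are placeholders; older clusters may need the autoscaling/v2beta2 or autoscaling/v1 API instead):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app          # placeholder target deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # scale out when average CPU exceeds 70%
```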
If you are running your cluster on GCP/GKE, you can go further and automatically start additional nodes for your cluster when you need more computing capacity, and shut nodes down when they no longer run application pods.
More information about this functionality can be found following the link.
Last, but not least, you can use a tool like Ansible to manage all your Kubernetes assets (it can create and manage deployments via playbooks).
If you decide to give it a try, you might find this information useful:
Creating a Container cluster in GKE
70% cheaper Kubernetes cluster on AWS
How to build a Kubernetes Horizontal Pod Autoscaler using custom metrics

Kubernetes Deployment with Zero Downtime

As a learner of Kubernetes concepts, how they work, and how to deploy with them, I have a couple of cases that I don't know how to achieve. I am looking for advice or some guidelines on achieving them.
I am using the Google Cloud Platform. The current flow is described below. A push to the Google Cloud Source Repository triggers Cloud Build, which creates a Docker image and pushes the image to the running cluster nodes.
Case 1: I want traffic to be routed to new pods once they are up and running, and the old pods to be killed only after each has completed its in-flight requests. Zero downtime is what I'm looking to achieve.
Case 2: What will happen if a running pod's disk usage reaches 100% or, on Debian, the inode count reaches full capacity? Will Kubernetes create new pods to compensate?
Case 3: How do I manage pod-to-database connection limits?
As in the other answer, use liveness and readiness probes. Basically, a new pod is added to the Service pool but only serves traffic after its readiness probe has passed. The old pod is removed from the Service pool, drained, and then terminated. This happens in a rolling fashion, one pod at a time.
This really depends on the capacity of your cluster and its ability to schedule pods given the limits set for the containers in them. For more about setting up limits for containers, refer to here. In terms of the inode limit: if you reach it on a node, the kubelet won't be able to run any more pods on that node. The kubelet eviction manager also has a mechanism whereby it evicts the pods using the most inodes. You can also configure the eviction thresholds on the kubelet.
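The inode eviction thresholds live in the kubelet configuration; a sketch (the percentages are placeholders):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  nodefs.inodesFree: "5%"    # evict pods when node filesystem inodes run low
  imagefs.inodesFree: "5%"   # same threshold for the image filesystem
```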
This would be more of a limitation at the OS level, combined with your stateful application's configuration. You can keep this configuration in a ConfigMap; for example, in something like MySQL the option would be max_connections, as sketched below.
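For instance, the connection limit could live in a ConfigMap mounted into the database pod, so all pods share one source of truth (a sketch; the names and value are placeholders):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql-config      # placeholder name
data:
  my.cnf: |
    [mysqld]
    max_connections=500   # placeholder limit; size it to your pod count
```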
I can answer case 1, since I've done it myself:
Use Deployments with readinessProbes and livenessProbes.
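For reference, a sketch of a Deployment that combines both probes with a rolling-update strategy (the names, port, image, and /healthz path are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never remove an old pod before a new one is Ready
      maxSurge: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      terminationGracePeriodSeconds: 30  # time to finish in-flight requests
      containers:
      - name: app
        image: gcr.io/my-project/my-app:latest  # placeholder image
        ports:
        - containerPort: 8080
        readinessProbe:                  # gates traffic to the pod
          httpGet:
            path: /healthz
            port: 8080
        livenessProbe:                   # restarts the pod if it hangs
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 15
```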

Configure Kubernetes StatefulSet to start pods first, restart failed containers after start?

Basic info
Hi, I'm encountering a problem with Kubernetes StatefulSets. I'm trying to spin up a set with 3 replicas.
These replicas/pods each have a container which pings a container in the other pods based on their network-id.
The container requires a response from all the pods. If it does not get a response, the container will fail. In my situation I need 3 pods/replicas for my setup to work.
Problem description
What happens is the following. Kubernetes starts 2 pods rather fast. However, since I need 3 pods for a fully functional cluster, the first 2 pods keep crashing because the 3rd is not up yet.
For some reason, Kubernetes opts to keep restarting both pods instead of adding the 3rd pod so that my cluster can function.
I've seen my setup run properly after about 15 minutes, because Kubernetes had added the 3rd pod by then.
Question
So, my question.
Does anyone know a way to delay restarting failed containers until the desired amount of pods/replicas have been booted?
I've since found out the cause of this.
StatefulSets launch pods in a specific order. If one of the pods fails to launch, it does not launch the next one.
You can add podManagementPolicy: "Parallel" to launch the pods without waiting for previous pods to be Running.
See this documentation.
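In context, the setting looks like this sketch (the names, replica count, and image are placeholders):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-cluster
spec:
  podManagementPolicy: Parallel  # launch all pods at once instead of one by one
  replicas: 3
  serviceName: my-cluster        # placeholder headless Service
  selector:
    matchLabels:
      app: my-cluster
  template:
    metadata:
      labels:
        app: my-cluster
    spec:
      containers:
      - name: node
        image: my-cluster-image:latest  # placeholder image
```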
I think a better way to deal with your problem is to leverage a liveness probe, as described in the documentation, rather than delaying the restart time (which is not configurable in the YAML).
Your pods respond to the liveness probe right after they start, to let Kubernetes know they are alive, which prevents them from being restarted. Meanwhile, your pods keep pinging the others until they are all up. Only when all your pods have started will they serve external requests. This is similar to creating a ZooKeeper ensemble.
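One way to express that split in manifest form is a permissive liveness probe plus a stricter readiness probe on the container (a sketch; the paths and port are assumptions):

```yaml
livenessProbe:
  httpGet:
    path: /alive    # returns 200 as soon as the process is up,
    port: 8080      # so Kubernetes stops restarting the pod
readinessProbe:
  httpGet:
    path: /ready    # returns 200 only once all peers respond,
    port: 8080      # so the pod serves traffic only when the cluster is formed
```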