We had a system outage: the service was unresponsive, so I restarted it with kubectl rollout restart sts myservice, and that worked. However, I want to look at the logs to find the cause of the problem. When I try kubectl logs --previous myservice-0 it says 'previous terminated container "mycontainer" in pod "myservice-0" not found'. Is there a way to find the logs from before the restart? I tried looking at the dead Docker containers (docker ps -a); there are containers that exited 6 months ago, but no recently exited containers of my service. Why is that?
I suggest the following reading: The Complete Guide to Kubernetes Logging:
In Kubernetes, when pods are evicted, crashed, deleted, or scheduled
on a different node, the logs from the containers are gone. The system
cleans up after itself. Therefore you lose any information about why
the anomaly occurred.
Also, as per Logging Architecture:
If you want to access the application's logs if a container crashes; a
pod gets evicted; or a node dies, [...] you need a separate backend to
store, analyze, and query logs. Kubernetes does not provide a native
storage solution for log data. Instead, there are many logging
solutions that integrate with Kubernetes.
Some examples of those log aggregation solutions are:
The ELK Stack (Elasticsearch, Logstash, Kibana)
The EFK Stack (Elasticsearch, Fluentd, Kibana)
When I retrieve logs from my pods, I notice that K8s does not print all of them: the messages about microservice initialization are missing from the head of the logs.
Considering that my pods print a lot of logs over a long observation period, does anyone know whether K8s has a limit on how much log output it shows?
I also tried setting the --since parameter in the kubectl logs command to get all logs in a specific time range, but it seems to have no effect.
Thanks.
The container runtime engine typically manages container (pod) logs. Do check the settings on the runtime engine in use.
There also seems to have been a logging issue reported earlier; see https://github.com/kubernetes/kubernetes/pull/78071
There are already some answers; I'll add more details and sources.
The short answer: there is no limit other than free disk space. By default, Kubernetes is not responsible for log rotation:
An important consideration in node-level logging is implementing log
rotation, so that logs don't consume all available storage on the
node. Kubernetes is not responsible for rotating logs, but rather a
deployment tool should set up a solution to address that. For example,
in Kubernetes clusters, deployed by the kube-up.sh script, there is a
logrotate tool configured to run each hour. You can also set up a
container runtime to rotate an application's logs automatically.
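To see why the first lines of a long-running pod's output can go missing once any rotation is in place, here is a toy model in Python. It is only a sketch: the class name RotatingLog and its parameters are made up for illustration, it counts lines rather than bytes, and it is not how the kubelet, Docker, or logrotate actually implement rotation.

```python
from collections import deque


class RotatingLog:
    """Toy model of size-based log rotation (not real kubelet/Docker code)."""

    def __init__(self, max_lines_per_file, max_files):
        self.max_lines_per_file = max_lines_per_file
        # deque(maxlen=...) silently drops the oldest file once the limit is hit.
        self.files = deque([[]], maxlen=max_files)

    def write(self, line):
        if len(self.files[-1]) >= self.max_lines_per_file:
            self.files.append([])  # rotate: start a new file
        self.files[-1].append(line)

    def read_all(self):
        """Return every line that still exists, oldest first."""
        return [line for f in self.files for line in f]


log = RotatingLog(max_lines_per_file=3, max_files=2)
log.write("init: service starting")
for i in range(8):
    log.write(f"request {i}")

surviving = log.read_all()
print(surviving)  # the "init" line is no longer present
```

With two files of three lines each, only the six most recent lines survive, so the initialization message is gone even though the container never restarted.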
As William stated, Kubernetes itself doesn’t provide log aggregation of its own and relies on the container runtime by default.
When a container running on Kubernetes writes its logs to stdout or
stderr streams, they are picked up by the kubelet service running on
that node, and are delegated to the container engine for handling
based on the logging driver configured in Kubernetes.
In most cases, Docker container logs will end up in the
/var/log/containers directory on your host. Docker supports multiple
logging drivers but, unfortunately, Kubernetes API does not support
driver configuration.
Once a container terminates or restarts, kubelet keeps its logs on the
node. To prevent these files from consuming all of the host’s storage,
a log rotation mechanism should be set on the node.
Kubernetes doesn’t provide built-in log rotation, but this
functionality is available in many tools, such as Docker’s log-opt, or
standard file shippers or even a simple custom cron job. When a
container is evicted from the node, so are its corresponding log files
That means you can try to find the full logs in /var/log/containers and /var/log/pods. The official documentation is more precise on this point:
By default, if a container restarts, the kubelet keeps one terminated
container with its logs. If a pod is evicted from the node, all
corresponding containers are also evicted, along with their logs.
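When searching /var/log/containers on a node, it can help to split the file names into their parts. The sketch below assumes the commonly seen naming scheme <pod>_<namespace>_<container>-<container ID>.log with a 64-character hexadecimal ID; that scheme is an assumption about your runtime setup, so check it against the files on your own node. The helper name parse_log_name is hypothetical.

```python
import re

# Assumed naming scheme for the entries in /var/log/containers:
#   <pod>_<namespace>_<container>-<64-hex container ID>.log
LOG_NAME = re.compile(
    r"^(?P<pod>[^_]+)_(?P<namespace>[^_]+)_(?P<container>.+)"
    r"-(?P<container_id>[0-9a-f]{64})\.log$"
)


def parse_log_name(filename):
    """Split a /var/log/containers file name into its parts, or return None."""
    match = LOG_NAME.match(filename)
    return match.groupdict() if match else None


# Made-up example; the container ID is a placeholder, not a real one.
example = "myservice-0_default_mycontainer-" + "ab" * 32 + ".log"
print(parse_log_name(example))
```

Filtering the directory listing by the pod field is then a quick way to see whether any log file from before the restart is still on the node.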
For good visibility and accessibility of logs, consider a dedicated log storage solution, e.g. a node-level logging agent or streaming to a sidecar container.
Please find articles and official kubernetes documentation with concepts and examples:
Kubernetes logging architecture
Practical guide to kubernetes
I have an issue that, at face value, appears to indicate that I have two deployments running in parallel within my kube cluster, but 'kubectl get pods' only shows one deployment.
My deployment is composed of a pod with two containers. One of the containers runs a golang application that creates an HTTP API endpoint, and the other runs Telegraf to read metrics from the API endpoint and push them to InfluxDB. When writing the data to Influx I tag the data with the source host set to the name of the pod. I use Grafana to plot the metrics and I can clearly see streaming data coming in from two hosts (e.g. I can set a "WHERE host=" query clause to either "application-pod-name-231620957-7n32f" or "application-pod-name-1931165991-x154c").
Based on the above, I'm fairly certain that two deployments of the pod are running, each with the two containers (one providing application metrics and the other with telegraf sending metrics to InfluxDB).
However, kube seems to think that one of the deployments doesn't exist. As mentioned, "kubectl get pods" doesn't display the 2nd pod name in any way shape or form. Only one of them.
Has anyone seen this? Any ideas on further troubleshooting? I've attempted to use the pod name (that I have within telegraf) to query more information using kubectl but always get the response that the pod doesn't exist... but it must exist! It's sending live data!
We had been experiencing issues with a node within the cluster. Specifically, the node was experiencing GC failures and communication into the cluster from that node was broken. Due to these failures, someone on our team performed a 'kubectl delete' on the node from within the cluster. After that, the node itself kept running, but its kubelet remained in a broken state, so the node couldn't automatically re-register itself with the cluster. This node happened to be running the 2nd pod, and the pods on the node continued running without issue even though the cluster no longer knew about them. In our case the node was running on AWS, where the way to avoid this situation is to reboot the node from either the AWS console or the AWS API.
Basic info
Hi, I'm encountering a problem with Kubernetes StatefulSets. I'm trying to spin up a set with 3 replicas.
These replicas/pods each have a container which pings a container in the other pods based on their network-id.
The container requires a response from all the pods. If it does not get a response the container will fail. In my situation I need 3 pods/replicas for my setup to work.
Problem description
What happens is the following. Kubernetes starts 2 pods rather fast. However since I need 3 pods for a fully functional cluster the first 2 pods keep crashing as the 3rd is not up yet.
For some reason Kubernetes opts to keep restarting both pods instead of adding the 3rd pod so my cluster will function.
I've seen my setup run properly after about 15 minutes because Kubernetes added the 3rd pod by then.
Question
So, my question.
Does anyone know a way to delay restarting failed containers until the desired number of pods/replicas have been booted?
I've since found out the cause of this.
StatefulSets launch pods in a specific order. If one of the pods fails to launch it does not launch the next one.
You can add a podManagementPolicy: "Parallel" to launch the pods without waiting for previous pods to be Running.
See this documentation
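As a rough illustration of where that field goes, the Python sketch below builds a minimal StatefulSet manifest as a dictionary and prints it as JSON. Only podManagementPolicy comes from the answer above; the names "myset" and "myapp" and the image "example/myapp:latest" are hypothetical placeholders, and a real manifest would need your own containers, ports, and volumes.

```python
import json

# Hypothetical names and image; only podManagementPolicy is taken from the
# answer above. Everything else is a placeholder to make the manifest complete.
statefulset = {
    "apiVersion": "apps/v1",
    "kind": "StatefulSet",
    "metadata": {"name": "myset"},
    "spec": {
        "serviceName": "myset",
        "replicas": 3,
        # Launch all pods at once instead of waiting for each to be Ready.
        "podManagementPolicy": "Parallel",
        "selector": {"matchLabels": {"app": "myapp"}},
        "template": {
            "metadata": {"labels": {"app": "myapp"}},
            "spec": {
                "containers": [
                    {"name": "myapp", "image": "example/myapp:latest"}
                ]
            },
        },
    },
}

print(json.dumps(statefulset, indent=2))
```

kubectl accepts JSON as well as YAML, so the printed output can be saved to a file and passed to kubectl apply -f.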
I think a better way to deal with your problem is to use a liveness probe, as described in the documentation, rather than to delay the restart (the restart delay is not configurable in the YAML).
Your pods respond to the liveness probe right after they start, which tells Kubernetes they are alive and prevents them from being restarted. Meanwhile, they keep pinging the others until all of them are up, and only then do they start serving external requests. This is similar to creating a ZooKeeper ensemble.
We have an application with 4 pods running with a load balancer! We want to try the rolling update, but we are not sure what happens when a pod goes down! The documentation is unclear! Particularly this quote from Termination Of Pods:
Pod is removed from endpoints list for service, and are no longer considered part of the set of running pods for replication controllers. Pods that shutdown slowly can continue to serve traffic as load balancers (like the service proxy) remove them from their rotations.
So, if someone can guide us on the following questions :
1.) When a pod is shutting down, can it still serve new requests? Or does the load balancer not consider it?
2.) Does it complete the requests it is processing till the grace-period is exhausted? and then kills the container even if any process is still running?
3.) Also, this mentions replication controllers, what we have is a Deployment and Deployment has replica sets, so will there be any difference?
We went through this question but the answers are conflicting without any source : Does a Kubernetes rolling-update gracefully remove pods from a service load balancer
1) When a Pod is shutting down, its state is changed to Terminating and it is no longer considered by the LoadBalancer - as described in the Pod termination docs
2) Yes - you might want to look at the pod.Spec.TerminationGracePeriodSeconds configuration to gain some control. You'll find details in the API documentation
3) No - the ReplicaSet and the Deployment take care of scheduling Pods, there's no difference when it comes to the shutdown behaviour of the Pods
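On the application side, finishing in-flight work during the grace period usually means reacting to the TERM signal that the container receives when the Pod starts terminating. The Python sketch below shows one possible shape for that; the function names (handle_sigterm, install_handler, serve) are hypothetical, and the loop stands in for whatever request handling your application really does.

```python
import signal
import threading

# Set once shutdown has been requested; the serving loop checks it.
shutting_down = threading.Event()


def handle_sigterm(signum, frame):
    """Record that shutdown was requested instead of exiting immediately."""
    shutting_down.set()


def install_handler():
    """Register the handler; call once from the main thread at startup."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def serve(requests, handle):
    """Handle requests one by one, refusing new ones once shutdown begins.

    A request that is already being handled always runs to completion.
    """
    results = []
    for request in requests:
        if shutting_down.is_set():
            break  # stop accepting new work, then return and exit cleanly
        results.append(handle(request))
    return results
```

If the process has not exited by the time terminationGracePeriodSeconds runs out, it is killed, so the remaining work has to fit inside that window.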
How long does a pod persist without a replication controller?
I have run some pods that have a very simple purpose: they execute and then terminate. Other pods, like a database server pod, persist for quite a bit longer. However, after a day or so, the pod terminates. I know Docker containers exit once their process has finished running, but why would my database pods continue running for a while and then randomly exit?
What controls the termination of a pod?
The easiest way for you to find a definitive answer to that question would be to kubectl describe pod <podName>, or kubectl get events. Any pod termination would have an associated event that you can use to diagnose the reason.
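The same information is available in machine-readable form from kubectl get pod <podName> -o json, where each entry in status.containerStatuses may carry a lastState.terminated record. Here is a small Python sketch that pulls out the reason and exit code; the helper name last_terminations is hypothetical and the sample data is made up to match the shape of a Pod object, not taken from a real cluster.

```python
import json


def last_terminations(pod):
    """List (container, reason, exit code) for each container's last termination.

    `pod` is the parsed output of `kubectl get pod <podName> -o json`.
    """
    found = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated")
        if terminated:
            found.append(
                (status["name"], terminated.get("reason"), terminated.get("exitCode"))
            )
    return found


# Made-up sample shaped like a Pod object; not output from a real cluster.
sample = json.loads("""
{
  "status": {
    "containerStatuses": [
      {"name": "db", "restartCount": 1,
       "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}},
      {"name": "sidecar", "restartCount": 0, "lastState": {}}
    ]
  }
}
""")
print(last_terminations(sample))
```

Containers that have never been restarted have an empty lastState and are skipped.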
Pods may die for several reasons, ranging from errors within the container to a node going down for maintenance. You can usually set the appropriate RestartPolicy, which will restart the pod if it fails (except in case of node failure). If you have multiple nodes and would like the pod to be restarted on a different node, you should use a higher-level controller like a ReplicaSet or Deployment.
For pods that are expected to terminate, a Job is better suited.