deployment fails to recreate a successful replicaset - deployment

We are using Kubernetes 1.8 to deploy our software in a cloud provider. Frequently, when deploying a specific pod template, the deployment fails to create a working ReplicaSet and no instance is created. I am not able to find a better explanation than the output of kubectl describe deploy:
Type Status Reason
---- ------ ------
Available False MinimumReplicasUnavailable
Progressing False ProgressDeadlineExceeded
OldReplicaSets: <none>
NewReplicaSet: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 21m (x3 over 2d) deployment-controller Scaled up replica set cbase-d-6bbfbdb5dc to 1
Normal ScalingReplicaSet 19m (x3 over 2d) deployment-controller Scaled down replica set cbase-d-6bbfbdb5dc to 0

You can also check the status of the ReplicaSet:
kubectl describe replicaset cbase-d-6bbfbdb5dc
Hopefully the conditions there will show the reason why the ReplicaSet could not be scaled up.

While this might not always be the case, a likely reason is a lack of resources. Try increasing the resources (CPU and memory) allocated to the cluster.
This was exactly the error I got, and increasing the allocated resources fixed the issue (on GKE).
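If that is the cause, what the scheduler actually compares against free node capacity are the pod's resource requests. A minimal sketch of a Deployment with explicit requests and limits (all names and values below are only illustrative, not taken from the question):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cbase-d                    # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cbase
  template:
    metadata:
      labels:
        app: cbase
    spec:
      containers:
      - name: cbase
        image: cbase:latest        # placeholder image
        resources:
          requests:                # the scheduler checks these against node capacity
            cpu: "250m"
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
If the requests are larger than what any node can offer, either lower them or add bigger nodes to the cluster.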

I got a similar error yesterday and finally figured out that I could get the error message from the pod that corresponds to the deployment by using the command kubectl get pod YOUR_POD_NAME -o yaml. You can check the status and the error message there.
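For illustration, the interesting parts of that YAML output are usually status.conditions and status.containerStatuses. A sketch of what a failing pod's status might look like (the reason and message are only examples, yours will differ):
status:
  conditions:
  - type: Ready
    status: "False"
    reason: ContainersNotReady
  containerStatuses:
  - name: app
    ready: false
    state:
      waiting:
        reason: ImagePullBackOff       # example reason
        message: Back-off pulling image ...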

Related

Pod stuck in pending status

Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 18m (x145 over 3h19m) default-scheduler 0/2 nodes are available: 1 node(s) didn't match Pod's node affinity, 1 node(s) had taint {node-role.kubernetes.io/controlplane: true}, that the pod didn't tolerate.
Please suggest a solution so the pods get deployed on the worker node.
It seems that your pod is not getting scheduled to a node. Can you try running the command below? The taint key has to match the one shown in the event, node-role.kubernetes.io/controlplane in your case:
kubectl taint nodes <name-node-master> node-role.kubernetes.io/controlplane:NoSchedule-
To find the name of your node please use
kubectl get nodes
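Also note that the second node in your event was rejected because of node affinity, so removing the control-plane taint alone may not place the pod on the worker. Check that the worker actually carries the label your nodeSelector/affinity expects (the label key and value below are hypothetical examples):
# inspect the labels on your nodes
kubectl get nodes --show-labels
# add the expected label to the worker if it is missing
kubectl label nodes <worker-node-name> node-role.kubernetes.io/worker=true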

Does apply work according to the rolling update policy?

I know about several ways to perform a rolling update of a Deployment. But does either kubectl apply -f deployment.yaml or kubectl apply -k ... update the Deployment according to the rolling update policy, and if the policy itself changed, does the new version of the policy apply or the old one?
Yes, it will, with one note:
Note: A Deployment's rollout is triggered if and only if the
Deployment's Pod template (that is, .spec.template) is changed, for
example if the labels or container images of the template are updated.
Other updates, such as scaling the Deployment, do not trigger a
rollout.
Reference : https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#updating-a-deployment
For example, here is the events section of a Deployment after updating the nginx image and running kubectl apply -f nginx-deploy.yml:
...
NewReplicaSet: nginx-deployment-559d658b74 (3/3 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 112s deployment-controller Scaled up replica set nginx-deployment-66b6c48dd5 to 3
Normal ScalingReplicaSet 44s deployment-controller Scaled up replica set nginx-deployment-559d658b74 to 1
Normal ScalingReplicaSet 20s deployment-controller Scaled down replica set nginx-deployment-66b6c48dd5 to 2
Normal ScalingReplicaSet 20s deployment-controller Scaled up replica set nginx-deployment-559d658b74 to 2
Normal ScalingReplicaSet 19s deployment-controller Scaled down replica set nginx-deployment-66b6c48dd5 to 1
Normal ScalingReplicaSet 19s deployment-controller Scaled up replica set nginx-deployment-559d658b74 to 3
Normal ScalingReplicaSet 18s deployment-controller Scaled down replica set nginx-deployment-66b6c48dd5 to 0
$ kubectl get deploy
NAME READY UP-TO-DATE AVAILABLE AGE
nginx-deployment 3/3 3 3 114s
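For reference, a nginx-deploy.yml like the one used above might look roughly as follows; the strategy block is where the rolling update policy that kubectl apply honours is defined (the maxSurge/maxUnavailable values and image tag are just examples):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1              # at most one extra pod above the desired count
      maxUnavailable: 1        # at most one pod may be unavailable during the update
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.21      # changing this (part of .spec.template) triggers the rollout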

Can I get events from other resources in addition to the pod in Kubernetes?

When running this command for resources (Deployment, ReplicaSet, ...) other than a Pod
$ kubectl describe deployment xxx-deployment
---- ------ ------
Events: <none>
I have deployed several resources, but I haven't seen any events for them except for Pods.
What kind of events occur for resources other than Pods?
Could you recommend any materials to refer to?
A good explanation of what an event is in Kubernetes can be found in the article Types of Kubernetes Events. The author also describes the different types of events.
Kubernetes events are a resource type in Kubernetes that are automatically created when other resources have state changes, errors, or other messages that should be broadcast to the system. While there is not a lot of documentation available for events, they are an invaluable resource when debugging issues in your Kubernetes cluster.
You can describe not only a Pod, Deployment, or ReplicaSet but almost any resource in Kubernetes.
Examples:
kubectl describe job pi -n test
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 12s job-controller Created pod: pi-5rgbz
kubectl describe node ubuntu
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning MissingClusterDNS 22h (x98 over 23h) kubelet, ubuntu-18 kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
Normal Starting 22h kubelet, ubuntu-18 Starting kubelet.
Warning InvalidDiskCapacity 22h kubelet, ubuntu-18 invalid capacity 0 on image filesystem
Normal NodeHasSufficientMemory 22h kubelet, ubuntu-18 Node ubuntu-18 status is now: NodeHasSufficientMemory
Normal NodeHasSufficientPID 22h
To list events from all resources you can use
$ kubectl get events --all-namespaces
$ kubectl get events --all-namespaces
NAMESPACE LAST SEEN TYPE REASON OBJECT MESSAGE
default 50m Normal Starting node/gke-cluster-1-default-pool-XXXXXXXXXXXXX Starting kubelet.
default 50m Normal NodeHasSufficientMemory node/gke-cluster-1-default-pool-XXXXXXXXXXXXX Node gke-cluster-1-default-pool-XXXXXXXXXXXXX status is now: NodeHasSufficientMemory
default 2m47s Normal SuccessfulCreate job/pi Created pod: pi-5rgbz
kube-system 50m Normal ScalingReplicaSet deployment/fluentd-gcp-scaler Scaled up replica set fluentd-gcp-scaler-6855f55bcc to 1
The Object column shows the resource type.
If you would like more detailed information, you can use the -o wide flag: $ kubectl get events --all-namespaces -o wide
$ kubectl get events -o wide
LAST SEEN TYPE REASON OBJECT SUBOBJECT SOURCE MESSAGE FIRST SEEN COUNT NAME
20m Normal Scheduled pod/hello-world-86d6c6f84d-8qz9d default-scheduler Successfully assigned default/hello-world-86d6c6f84d-8qz9d to ubuntu-18
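If you are only interested in the events of one particular object, you can also filter with a field selector instead of grepping through everything (the deployment name here is just an example):
$ kubectl get events --field-selector involvedObject.kind=Deployment,involvedObject.name=xxx-deployment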
A possible root cause:
I wasn't able to reproduce a Deployment without any events, so my first guess would be that you have set --event-ttl, which is described in the kube-apiserver docs.
--event-ttl duration Default: 1h0m0s
Amount of time to retain events.
It was also mentioned in a GitHub thread.
In short, all events disappear after 1 hour with the default setting.
To check whether this flag is set on your kube-apiserver, see this StackOverflow thread.
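For example, on a kubeadm-style cluster the apiserver runs as a static Pod, so you can inspect its flags directly (a sketch; adjust the commands to your setup):
# the kubeadm static Pod carries the label component=kube-apiserver
$ kubectl -n kube-system get pod -l component=kube-apiserver -o yaml | grep event-ttl
# or look at the manifest on the control-plane node itself
$ grep event-ttl /etc/kubernetes/manifests/kube-apiserver.yaml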
If this didn't help, please edit your question with information such as your configuration YAMLs, the Kubernetes version you are using, steps to reproduce, etc.
Well, yes, Deployments do have events, but keep in mind that events are only retained for around 1 hour.
You can also filter by labels with the -l (--selector) flag when describing all matching resources.

Can kubernetes provide a verbose description of its scheduling decisions?

When scheduling a kubernetes Job and Pod, if the Pod can't be placed the explanation available from kubectl describe pods PODNAME looks like:
Warning FailedScheduling <unknown> default-scheduler 0/172 nodes are available:
1 Insufficient pods, 1 node(s) were unschedulable, 11 Insufficient memory,
30 Insufficient cpu, 32 node(s) didn't match node selector, 97 Insufficient nvidia.com/gpu.
That's useful but a little too vague. I'd like more detail than that.
Specifically, can I list all nodes together with the reason the pod wasn't scheduled on each particular node?
I was recently changing labels and the node selector, and I want to determine whether I made a mistake somewhere in that process or whether the nodes I need really are just busy.
You can find more details about problems with scheduling a particular Pod in the kube-scheduler logs. If you set up your cluster with the kubeadm tool, kube-scheduler, like the other key components of the cluster, is deployed as a Pod in the kube-system namespace. You can list such Pods with the following command:
kubectl get pods -n kube-system
which will show you among others your kube-scheduler Pod:
NAME READY STATUS RESTARTS AGE
kube-scheduler-master-ubuntu-18-04 1/1 Running 0 2m37s
Then you can check its logs. In my example the command will look as follows:
kubectl logs kube-scheduler-master-ubuntu-18-04 -n kube-system
You should find there the information you need.
One more thing...
If you've already verified it, just ignore this tip
Let's start from the beginning...
I've just created a simple job from the example you can find here:
kubectl apply -f https://k8s.io/examples/controllers/job.yaml
job.batch/pi created
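At the time of writing, that example manifest looks roughly like this (reproduced from memory, so treat the exact image tag as an assumption):
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl                # the upstream example may pin a specific tag
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4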
If I run:
kubectl get jobs
it shows me:
NAME COMPLETIONS DURATION AGE
pi 0/1 17m 17m
Hmm... completions 0/1 ? Something definitely went wrong. Let's check it.
kubectl describe job pi
tells me basically nothing. In its events I can see only:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 18m job-controller Created pod: pi-zxp4p
as if everything went well... but we already know it didn't. So let's investigate further. As you probably know, the job-controller creates Pods that run to completion to perform a certain task. From the perspective of the job-controller everything went well (we've just seen it in its events):
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 23m job-controller Created pod: pi-zxp4p
It did its part of the task and reported that everything went fine. But that is only part of the whole task. Scheduling the newly created Pod is handled by the kube-scheduler; the job-controller isn't responsible for (and doesn't even have the privileges for) placing the Pod on a particular node. If we run:
kubectl get pods
we can see one Pod in a Pending state:
NAME READY STATUS RESTARTS AGE
pi-zxp4p 0/1 Pending 0 30m
Let's describe it:
kubectl describe pod pi-zxp4p
In events we can see some very important and specific info:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 20s (x24 over 33m) default-scheduler 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
so now we know the actual reason why our Pod couldn't be scheduled.
Pay attention to different fields of the event:
From: default-scheduler - it means that the message originated from our kube-scheduler.
Type: Warning, which isn't as important as Critical or Error, so chances are it may not appear in the kube-scheduler logs if the scheduler was started with the default log verbosity.
You can read here that:
As per the comments, the practical default level is V(2). Developers
and QE environments may wish to run at V(3) or V(4). If you wish to
change the log level, you can pass in -v=X where X is the desired
maximum level to log.
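If you do need the scheduler to log at a higher verbosity, on a kubeadm cluster you can edit its static Pod manifest on the control-plane node; the kubelet recreates the Pod automatically once the file is saved (an excerpt only, as a sketch; keep the other flags your manifest already has):
# /etc/kubernetes/manifests/kube-scheduler.yaml (excerpt)
spec:
  containers:
  - command:
    - kube-scheduler
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --v=4                      # add or raise this flag for more detailed logs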

Is there a way in Kubernetes to check when hpa happened?

I have hpa configured for one of my deployment in Kubernetes.
Is there any way to check whether HPA scaling happened for the deployment, and when it happened?
I don't have prometheus or any monitoring solutions deployed.
If you created an HPA, you can check its current status using the command
$ kubectl get hpa
You can also use the -w (watch) flag to watch the status for changes
$ kubectl get hpa -w
To check whether the HPA actually scaled the deployment, describe it
$ kubectl describe hpa <yourHpaName>
The information will be in the Events: section.
Your Deployment will also contain some information about scaling:
$ kubectl describe deploy <yourDeploymentName>
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 11m deployment-controller Scaled up replica set php-apache-b5f58cc5f to 1
Normal ScalingReplicaSet 9m45s deployment-controller Scaled up replica set php-apache-b5f58cc5f to 4
Normal ScalingReplicaSet 9m30s deployment-controller Scaled up replica set php-apache-b5f58cc5f to 8
Normal ScalingReplicaSet 9m15s deployment-controller Scaled up replica set php-apache-b5f58cc5f to 10
Another way is to use events:
$ kubectl get events | grep HorizontalPodAutoscaler
5m20s Normal SuccessfulRescale HorizontalPodAutoscaler New size: 4; reason: cpu resource utilization (percentage of request) above target
5m5s Normal SuccessfulRescale HorizontalPodAutoscaler New size: 8; reason: cpu resource utilization (percentage of request) above target
4m50s Normal SuccessfulRescale HorizontalPodAutoscaler New size: 10; reason:
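If all you need is the timestamp of the most recent scale operation, the HPA also records it in its status, which survives even after the events above have expired (the HPA name below is just an example):
$ kubectl get hpa php-apache -o jsonpath='{.status.lastScaleTime}'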