My old Windows pods are dead and don't respond to HTTP requests / exec fails - kubernetes

I have an AKS cluster with a mix of Windows and Linux nodes and an nginx-ingress.
This all worked great, but a few days ago all my Windows pods became unresponsive.
Everything is still green on the K8s dashboard, but they don't respond to HTTP requests and kubectl exec fails.
All the Linux pods still work.
I created a new deployment with the exact same image and other properties, and this new pod works, responds to HTTP and kubectl exec works.
Q: How can I find out why my old pods died? How can I prevent this from occurring again in the future?
Note that this is a test cluster, so I have the luxury of being able to investigate; if this were prod I would have burned and recreated the cluster already.
Details:
https://aks-test.progress-cloud.com/eboswebApi/ is one of the old pods; https://aks-test.progress-cloud.com/eboswebApi2/ is the new pod.
When I look at the nginx log, I see a lot of connect() failed (111: Connection refused) while connecting to upstream.
When I try kubectl exec -it <podname> --namespace <namespace> -- cmd I get one of two behaviors:
Either the command immediately returns without printing anything, or I get an error:
container 1dfffa08d834953c29acb8839ea2d4c6b78b7a530371d98c16b15132d49f5c52 encountered an error during CreateProcess: failure in a Windows system call: The remote procedure call failed and did not execute. (0x6bf) extra info: {"CommandLine":"cmd","WorkingDirectory":"C:\\inetpub\\wwwroot","Environment":{...},"EmulateConsole":true,"CreateStdInPipe":true,"CreateStdOutPipe":true,"ConsoleSize":[0,0]}
command terminated with exit code 126
kubectl describe pod works on both.
The only difference I could find was that on the old pod, I don't get any events:
Events: <none>
whereas on the new pod I get a bunch of them for pulling the image etc:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 39m default-scheduler Successfully assigned ingress-basic/ebos-webapi-test-2-78786968f4-xmvfw to aksnpwin000000
Warning Failed 38m kubelet, aksnpwin000000 Error: failed to start container "ebos-webapi-test-2": Error response from daemon: hcsshim::CreateComputeSystem ebos-webapi-test-2: The binding handle is invalid.
(extra info: {"SystemType":"Container","Name":"ebos-webapi-test-2","Owner":"docker","VolumePath":"\\\\?\\Volume{dac026db-26ab-11ea-bb33-e3730ff9432d}","IgnoreFlushesDuringBoot":true,"LayerFolderPath":"C:\\ProgramData\\docker\\windowsfilter\\ebos-webapi-test-2","Layers":[{"ID":"8c160b6e-685a-58fc-8c4b-beb407ad09b4","Path":"C:\\ProgramData\\docker\\windowsfilter\\12061f29088664dc41c0836c911ed7ced1f6d7ed38b1c932c25cd8ca85a3a88e"},{"ID":"6a230a46-a97c-5e30-ac4a-636e62cd9253","Path":"C:\\ProgramData\\docker\\windowsfilter\\8c0ce5a9990bc433c4d937aa148a4251ef55c1aa7caccf1b2025fd64b4feee97"},{"ID":"240d5705-d8fe-555b-a966-1fc304552b64","Path":"C:\\ProgramData\\docker\\windowsfilter\\2b334b769fe19d0edbe1ad8d1ae464c8d0103a7225b0c9e30fdad52e4b454b35"},{"ID":"5f5d8837-5f62-5a76-a706-9afb789e45e4","Path":"C:\\ProgramData\\docker\\windowsfilter\\3d1767755b0897aaae21e3fb7b71e2d880de22473f0071b0dca6301bb6110077"},{"ID":"978503cb-b816-5f66-ba41-ed154db333d5","Path":"C:\\ProgramData\\docker\\windowsfilter\\53d2e85a90d2b8743b0502013355df5c5e75448858f0c1f5b435281750653520"},{"ID":"d7d0d14e-b097-5104-a492-da3f9396bb06","Path":"C:\\ProgramData\\docker\\windowsfilter\\38830351b46e7a0598daf62d914eb2bf01e6eefde7ac560e8213f118d2bd648c"},{"ID":"90b1c608-be4c-55a1-a787-db3a97670149","Path":"C:\\ProgramData\\docker\\windowsfilter\\84b71fda82ea0eacae7b9382eae2a26f3c71bf118f5c80e7556496f21e754126"},{"ID":"700711b2-d578-5d7c-a17f-14165a5b3507","Path":"C:\\ProgramData\\docker\\windowsfilter\\08dd6f93c96c1ac6acd3d2e8b60697340c90efe651f805809dbe87b6bd26a853"},{"ID":"270de12a-461c-5b0c-8976-a48ae0de2063","Path":"C:\\ProgramData\\docker\\windowsfilter\\115de87074fadbc3c44fc33813257c566753843f8f4dd7656faa111620f71f11"},{"ID":"521250bb-4f30-5ac4-8fcd-b4cf45866627","Path":"C:\\ProgramData\\docker\\windowsfilter\\291e51f5f030d2a895740fae3f61e1333b7fae50a060788040c8d926d46dbe1c"},{"ID":"6dded7bf-8c1e-53bb-920e-631e78728316","Path":"C:\\ProgramData\\docker\\windowsfilter\\938e721c29d2f2d23a00bf83e5bc60d92f9534da409d0417f479bd5f06faa080"},{"ID":"90dec4e9-89fe-56ce-a3c2-2770e6ec362c","Path":"C:\\ProgramData\\docker\\windowsfilter\\d723ebeafd1791f80949f62cfc91a532cc5ed40acfec8e0f236afdbcd00bbff2"},{"ID":"94ac6066-b6f3-5038-9e1b-d5982fcefa00","Path":"C:\\ProgramData\\docker\\windowsfilter\\00d1bb6fc8abb630f921d3651b1222352510d5821779d8a53d994173a4ba1126"},{"ID":"037c6d16-5785-5bea-bab4-bc3f69362e0c","Path":"C:\\ProgramData\\docker\\windowsfilter\\c107cf79e8805e9ce6d81ec2a798bf4f1e3b9c60836a40025272374f719f2270"}],"ProcessorWeight":5000,"HostName":"ebos-webapi-test-2-78786968f4-xmvfw","MappedDirectories":[{"HostPath":"c:\\var\\lib\\kubelet\\pods\\c44f445c-272b-11ea-b9bc-ae0ece5532e1\\volumes\\kubernetes.io~secret\\default-token-n5tnc","ContainerPath":"c:\\var\\run\\secrets\\kubernetes.io\\serviceaccount","ReadOnly":true,"BandwidthMaximum":0,"IOPSMaximum":0,"CreateInUtilityVM":false}],"HvPartition":false,"NetworkSharedContainerName":"4c9bede623553673fde0da6e8dc92f9a55de1ff823a168a35623ad8128f83ecb"})
Normal Pulling 38m (x2 over 38m) kubelet, aksnpwin000000 Pulling image "progress.azurecr.io/eboswebapi:release-2019-11-11_16-41"
Normal Pulled 38m (x2 over 38m) kubelet, aksnpwin000000 Successfully pulled image "progress.azurecr.io/eboswebapi:release-2019-11-11_16-41"
Normal Created 38m (x2 over 38m) kubelet, aksnpwin000000 Created container ebos-webapi-test-2
Normal Started 38m kubelet, aksnpwin000000 Started container ebos-webapi-test-2

Related

istio bookinfo sample application deployment failed for "ImagePullBackOff"

I am trying to deploy Istio's sample bookinfo application using the command below:
kubectl apply -f samples/bookinfo/platform/kube/bookinfo.yaml
from here
but each time I am getting an ImagePullBackOff error like this:
NAME READY STATUS RESTARTS AGE
details-v1-c74755ddf-m878f 2/2 Running 0 6m32s
productpage-v1-778ddd95c6-pdqsk 2/2 Running 0 6m32s
ratings-v1-5564969465-956bq 2/2 Running 0 6m32s
reviews-v1-56f6655686-j7lb6 1/2 ImagePullBackOff 0 6m32s
reviews-v2-6b977f8ff5-55tgm 1/2 ImagePullBackOff 0 6m32s
reviews-v3-776b979464-9v7x5 1/2 ImagePullBackOff 0 6m32s
For error details, I have run:
kubectl describe pod reviews-v1-56f6655686-j7lb6
Which returns the following:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m41s default-scheduler Successfully assigned default/reviews-v1-56f6655686-j7lb6 to minikube
Normal Pulled 7m39s kubelet Container image "docker.io/istio/proxyv2:1.15.3" already present on machine
Normal Created 7m39s kubelet Created container istio-init
Normal Started 7m39s kubelet Started container istio-init
Warning Failed 5m39s kubelet Failed to pull image "docker.io/istio/examples-bookinfo-reviews-v1:1.17.0": rpc error: code = Unknown desc = context deadline exceeded
Warning Failed 5m39s kubelet Error: ErrImagePull
Normal Pulled 5m39s kubelet Container image "docker.io/istio/proxyv2:1.15.3" already present on machine
Normal Created 5m39s kubelet Created container istio-proxy
Normal Started 5m39s kubelet Started container istio-proxy
Normal BackOff 5m36s (x3 over 5m38s) kubelet Back-off pulling image "docker.io/istio/examples-bookinfo-reviews-v1:1.17.0"
Warning Failed 5m36s (x3 over 5m38s) kubelet Error: ImagePullBackOff
Normal Pulling 5m25s (x2 over 7m38s) kubelet Pulling image "docker.io/istio/examples-bookinfo-reviews-v1:1.17.0"
Do I need to build the Dockerfile first and push it to a local repository? There are no clear instructions there, or I failed to find any.
Can anybody help?
If you check on Docker Hub, the image is there:
https://hub.docker.com/r/istio/examples-bookinfo-reviews-v1/tags
So the error that you need to deal with is context deadline exceeded while trying to pull the image from Docker Hub. This is likely a networking error (it's a generic Go error saying the operation took too long). Depending on where your cluster is running, you can manually do a docker pull from the nodes, and that should work.
EDIT: for minikube, do a minikube ssh and then a docker pull.
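For reference, a minimal sketch of that manual pull, assuming the default single-node minikube profile and the image tag shown in the events above:
# open a shell on the minikube node
minikube ssh
# pull the image manually inside the node's Docker daemon
docker pull docker.io/istio/examples-bookinfo-reviews-v1:1.17.0
exit
# delete the stuck pod so the Deployment recreates it and finds the image already present
kubectl delete pod reviews-v1-56f6655686-j7lb6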
Solved the problem with the command below:
minikube ssh docker pull istio/examples-bookinfo-reviews-v1:1.17.0
from this GitHub issue here
Also see: How to use local docker images with Minikube?
Hope this helps somebody.

Kubernetes readiness probe warning has no details

I've made a deployment in GKE with a readiness probe. My container is coming up but it seems the readiness probe is having some difficulty. When I try to describe the pod I see that there are many probe warnings but it's not clear what the warning is.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 43m default-scheduler Successfully assigned default/tripvector-7996675758-7vjxf to gke-tripvector2-default-pool-78cf58d9-5dgs
Normal LoadBalancerNegNotReady 43m neg-readiness-reflector Waiting for pod to become healthy in at least one of the NEG(s): [k8s1-07274a01-default-tripvector-np-60000-a912870e]
Normal Pulling 43m kubelet Pulling image "us-west1-docker.pkg.dev/triptastic-1542412229773/tripvector/tripvector"
Normal Pulled 43m kubelet Successfully pulled image "us-west1-docker.pkg.dev/triptastic-1542412229773/tripvector/tripvector" in 888.583654ms
Normal Created 43m kubelet Created container tripvector
Normal Started 43m kubelet Started container tripvector
Normal LoadBalancerNegTimeout 32m neg-readiness-reflector Timeout waiting for pod to become healthy in at least one of the NEG(s): [k8s1-07274a01-default-tripvector-np-60000-a912870e]. Marking condition "cloud.google.com/load-balancer-neg-ready" to True.
Warning ProbeWarning 3m1s (x238 over 42m) kubelet Readiness probe warning:
I've tried examining events with kubectl get events but that also doesn't provide extra details on the probe warning:
❯❯❯ k get events
LAST SEEN TYPE REASON OBJECT MESSAGE
43m Normal LoadBalancerNegNotReady pod/tripvector-7996675758-7vjxf Waiting for pod to become healthy in at least one of the NEG(s): [k8s1-07274a01-default-tripvector-np-60000-a912870e]
43m Normal Scheduled pod/tripvector-7996675758-7vjxf Successfully assigned default/tripvector-7996675758-7vjxf to gke-tripvector2-default-pool-78cf58d9-5dgs
43m Normal Pulling pod/tripvector-7996675758-7vjxf Pulling image "us-west1-docker.pkg.dev/triptastic-1542412229773/tripvector/tripvector"
43m Normal Pulled pod/tripvector-7996675758-7vjxf Successfully pulled image "us-west1-docker.pkg.dev/triptastic-1542412229773/tripvector/tripvector" in 888.583654ms
43m Normal Created pod/tripvector-7996675758-7vjxf Created container tripvector
43m Normal Started pod/tripvector-7996675758-7vjxf Started container tripvector
3m38s Warning ProbeWarning pod/tripvector-7996675758-7vjxf Readiness probe warning:
How can I get more details on how/why this readiness probe is giving off warnings?
EDIT: the logs of the pod are mostly empty as well (klf is an alias I have for kubectl logs):
❯❯❯ klf tripvector-6f4d4c86c5-dn55c
(node:1) [DEP0131] DeprecationWarning: The legacy HTTP parser is deprecated.
{"line":"87","file":"percolate_synced-cron.js","message":"SyncedCron: Scheduled \"Refetch expired Google Places\" next run #Tue Mar 22 2022 17:47:53 GMT+0000 (Coordinated Universal Time)","time":{"$date":1647971273653},"level":"info"}
Regarding the warning in the logs, “DeprecationWarning: The legacy HTTP parser is deprecated.”: it is due to the legacy HTTP parser being deprecated with the pending end-of-life of Node.js 10.x. It now warns on use, but otherwise continues to function, and it may be removed in a future Node.js 12.x release. See Node v12.22.0 (LTS) for more reference.
On the other hand, about the kubelet’s “ProbeWarning” reason in the warning events on your container: health check (liveness and readiness) probes using an HTTPGetAction no longer follow redirects to different hostnames from the original probe request. Instead, these non-local redirects are treated as a Success (the documented behavior), and an event with reason "ProbeWarning" is generated, indicating that the redirect was ignored. If you were previously relying on the redirect to run health checks against different endpoints, you will need to perform the health-check logic outside the kubelet, for instance by proxying the external endpoint rather than redirecting to it. You can verify the detailed root cause of this in the Kubernetes 1.14 release notes. There is no way to see more detailed information about the “ProbeWarning” in the events table. Use the following as references too: Kubernetes emitting ProbeWarning, Confusing/incomplete readiness probe warning, and Add probe warning message body.
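To check whether the probe endpoint is actually answering with a redirect to a different hostname (the redirect case described above), one option is to call the readiness path directly and inspect the status code and Location header. A minimal sketch; the port and path here are assumptions, substitute the values from your readinessProbe spec:
# forward the pod's probe port to the local machine (port 3000 and /healthz are placeholders)
kubectl port-forward pod/tripvector-7996675758-7vjxf 3000:3000
# in another terminal: fetch the probe path with response headers, without following redirects
curl -sSi http://localhost:3000/healthz
# a 301/302 response whose Location points at a different hostname matches the ProbeWarning behaviour described above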

minikube intermittent ErrImageNeverPull

When I run minikube, I intermittently get ErrImageNeverPull and I am not sure why. First of all, I set imagePullPolicy: Never on these deployments (they use locally built images), and I verified that everything works fine. However, sometimes phpmyadmin gets ErrImageNeverPull, sometimes wordpress gets ErrImageNeverPull, and so on. The environment is a Mac laptop running Catalina.
I don't know the exact reason; what could be causing this?
kubectl logs wordpress-deployment-5545dcd6f5-h6mfx
Error from server (BadRequest): container "wordpress" in pod "wordpress-deployment-5545dcd6f5-h6mfx" is waiting to start: ErrImageNeverPull
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 9m14s (x2 over 9m14s) default-scheduler persistentvolumeclaim "wordpress-pv-claim" not found
Normal Scheduled 9m11s default-scheduler Successfully assigned default/wordpress-deployment-5545dcd6f5-h6mfx to minikube
Warning Failed 6m55s (x13 over 9m8s) kubelet, minikube Error: ErrImageNeverPull
Warning ErrImageNeverPull 4m8s (x26 over 9m8s) kubelet, minikube Container image "wordpress-dockerfile" is not present with pull policy of Never
Oh, of course, I also applied the following command.
# eval $(minikube docker-env)
eval $(minikube -p minikube docker-env)
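For context, this is roughly the flow that imagePullPolicy: Never relies on; a minimal sketch where the image tag matches the events above and the build context path is a placeholder:
# point the local docker CLI at minikube's Docker daemon
eval $(minikube -p minikube docker-env)
# build the image directly into that daemon (build context "." is a placeholder, adjust to your Dockerfile)
docker build -t wordpress-dockerfile .
# the tag must show up here, otherwise the pod fails with ErrImageNeverPull
docker images | grep wordpress-dockerfile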
Again, the strange thing is that I have confirmed that all of these work correctly, and the error only happens intermittently.
I just fixed the problem. The reason is that I ran it on a personal laptop, and the number of Pods created was probably more than the laptop could handle. When I ran it on an actual desktop, all 10 out of 10 pods ran fine without any errors. When running minikube start, I did not pass separate CPU or memory options, so it seems the cause of the error was that the total resource usage was not taken into account.
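If you have to stay on the laptop, a workaround in line with this answer is to give minikube an explicit resource budget instead of the defaults. A minimal sketch; the figures are examples, size them to your machine:
# remove the old VM so the new limits take effect
minikube delete
# start with an explicit CPU and memory budget (example values)
minikube start --cpus=4 --memory=8192
# re-point the docker CLI at the new daemon and rebuild the local images
eval $(minikube -p minikube docker-env)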

More detailed monitoring of Pod states

Our Pods usually spend at least a minute, and up to several minutes, in the Pending state; the events via kubectl describe pod x yield:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned testing/runner-2zyekyp-project-47-concurrent-0tqwl4 to host
Normal Pulled 55s kubelet, host Container image "registry.com/image:c1d98da0c17f9b1d4ca81713c138ee2e" already present on machine
Normal Created 55s kubelet, host Created container build
Normal Started 54s kubelet, host Started container build
Normal Pulled 54s kubelet, host Container image "gitlab/gitlab-runner-helper:x86_64-6214287e" already present on machine
Normal Created 54s kubelet, host Created container helper
Normal Started 54s kubelet, host Started container helper
The provided information is not detailed enough to figure out exactly what is happening.
Question:
How can we gather more detailed metrics of what exactly happens, and when, while getting a Pod running, in order to troubleshoot which step takes how much time?
Of special interest would be how long it takes to mount a volume.
Check the kubelet and kube-scheduler logs, because the kube-scheduler schedules the pod to a node and the kubelet starts the pod on that node and reports its status as ready.
journalctl -u kubelet # after logging into the kubernetes node
kubectl logs kube-scheduler -n kube-system
Describe the pod, deployment, and replicaset to get more details
kubectl describe pod podname -n namespacename
kubectl describe deploy deploymentname -n namespacename
kubectl describe rs replicasetname -n namespacename
Check events
kubectl get events -n namespacename
Describe the nodes and check the available resources and the status, which should be Ready.
kubectl describe node nodename
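To see when each step happened (including volume attach/mount steps, if events such as SuccessfulAttachVolume or FailedMount show up), one option is to pull the absolute event timestamps for the pod and sort them, rather than relying on the relative ages in kubectl describe. A minimal sketch using only standard kubectl flags; podname and namespacename are placeholders as above:
# list the events for a single pod, oldest first, with absolute timestamps
kubectl get events -n namespacename \
  --field-selector involvedObject.name=podname \
  --sort-by=.lastTimestamp \
  -o custom-columns=TIME:.lastTimestamp,REASON:.reason,MESSAGE:.message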

Kubernetes pod deployment error FailedSync | Error syncing pod

Env:
VirtualBox on a Windows 10 desktop machine
Two Ubuntu VMs; one VM is the master and the other one is the k8s (1.7) worker.
I can see both nodes are "Ready" when I run kubectl get nodes. But even when deploying a very simple nginx pod, I get these error messages from the pod describe output:
"norm | SandboxChanged | Pod sandbox changed, it will be killed and re-created." and "warning | FailedSync | Error syncing pod".
But if I run the docker container directly on the worker, the container comes up and runs fine. Does anyone have a suggestion for what I can check?
k8s-master#k8smaster-VirtualBox:~$ kubectl get pods
NAME                            READY   STATUS             RESTARTS   AGE
movie-server-1517284798-lbb01   0/1     CrashLoopBackOff   6          16m

k8s-master#k8smaster-VirtualBox:~$ kubectl describe pod movie-server-1517284798-lbb01
--- clip ---
kubelet, master-virtualbox  spec.containers{movie-server}  Warning  Failed          Error: failed to start container "movie-server": Error response from daemon: {"message":"cannot join network of a non running container: 3f59947dbd404ecf2f6dd0b65dd9dad8b25bf0c418aceb8cf666ad0761402b53"}
kubelet, master-virtualbox  spec.containers{movie-server}  Warning  BackOff         Back-off restarting failed container
kubelet, master-virtualbox                                 Normal   SandboxChanged  Pod sandbox changed, it will be killed and re-created.
kubelet, master-virtualbox  spec.containers{movie-server}  Normal   Pulled          Container image "nancyfeng/movie-server:0.1.0" already present on machine
kubelet, master-virtualbox  spec.containers{movie-server}  Normal   Created         Created container
kubelet, master-virtualbox                                 Warning  FailedSync      Error syncing pod
kubelet, master-virtualbox  spec.containers{movie-server}  Warning  Failed          Error: failed to start container "movie-server": Error response from daemon: {"message":"cannot join network of a non running container: 72ba77b25b6a3969e8921214f0ca73ffaab4c82d8a2852e3d1b1f3ac5dde6ce1"}
--- clip ---