Ingress-nginx is in CrashLoopBackOff after K8s upgrade - kubernetes

After upgrading my Kubernetes node pool from 1.21 to 1.22, the ingress-nginx-controller pods started crashing. The same deployment has been working fine on EKS; I'm only seeing this issue on GKE. Does anyone have any ideas about the root cause?
$ kubectl logs ingress-nginx-controller-5744fc449d-8t2rq -c controller
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: v1.3.1
Build: 92534fa2ae799b502882c8684db13a25cde68155
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.19.10
-------------------------------------------------------------------------------
W0219 21:23:08.194770 8 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0219 21:23:08.194995 8 main.go:209] "Creating API client" host="https://10.1.48.1:443"
Ingress pod events:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 27m default-scheduler Successfully assigned infra/ingress-nginx-controller-5744fc449d-8t2rq to gke-infra-nodep-ffe54a41-s7qx
Normal Pulling 27m kubelet Pulling image "registry.k8s.io/ingress-nginx/controller:v1.3.1@sha256:54f7fe2c6c5a9db9a0ebf1131797109bb7a4d91f56b9b362bde2abd237dd1974"
Normal Started 27m kubelet Started container controller
Normal Pulled 27m kubelet Successfully pulled image "registry.k8s.io/ingress-nginx/controller:v1.3.1@sha256:54f7fe2c6c5a9db9a0ebf1131797109bb7a4d91f56b9b362bde2abd237dd1974" in 6.443361484s
Warning Unhealthy 26m (x6 over 26m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 502
Normal Killing 26m kubelet Container controller failed liveness probe, will be restarted
Normal Created 26m (x2 over 27m) kubelet Created container controller
Warning FailedPreStopHook 26m kubelet Exec lifecycle hook ([/wait-shutdown]) for Container "controller" in Pod "ingress-nginx-controller-5744fc449d-8t2rq_infra(c4c166ff-1d86-4385-a22c-227084d569d6)" failed - error: command '/wait-shutdown' exited with 137: , message: ""
Normal Pulled 26m kubelet Container image "registry.k8s.io/ingress-nginx/controller:v1.3.1@sha256:54f7fe2c6c5a9db9a0ebf1131797109bb7a4d91f56b9b362bde2abd237dd1974" already present on machine
Warning BackOff 7m7s (x52 over 21m) kubelet Back-off restarting failed container
Warning Unhealthy 2m9s (x55 over 26m) kubelet Liveness probe failed: HTTP probe failed with statuscode: 502

The beta API versions of Ingress (extensions/v1beta1 and networking.k8s.io/v1beta1) are no longer served for GKE clusters created on version 1.22 and later. Refer to the official GKE Ingress documentation for the changes in the GA API version.
Also refer to the official Kubernetes documentation on API removals in Kubernetes v1.22 for more information.
Before upgrading your Ingress resources to the v1 API, make sure that every ingress controller you use is compatible with it. See Ingress Prerequisites for more context about Ingress and ingress controllers.
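As an illustration, a hypothetical Ingress written against the GA API looks like this (the name, host, and service are placeholders, not values from your cluster):

```yaml
# Hypothetical Ingress using the GA networking.k8s.io/v1 API.
# The apiVersion, the required pathType field, and the nested
# backend.service structure are the main changes from the
# removed v1beta1 versions.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress          # placeholder name
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
  - host: example.com            # placeholder host
    http:
      paths:
      - path: /
        pathType: Prefix         # required in v1; absent in v1beta1
        backend:
          service:               # v1beta1 used serviceName/servicePort here
            name: example-service
            port:
              number: 80
```

You can run kubectl get ingress -A -o yaml on a pre-1.22 cluster to see which resources still use the old serviceName/servicePort shape.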
Also check the following possible causes of CrashLoopBackOff:
Increasing the initialDelaySeconds value of the livenessProbe may help, as it gives the container more time to start up and finish its initial work before the kubelet checks its health.
Check the container restart policy: the Pod spec has a restartPolicy field with possible values Always, OnFailure, and Never; the default is Always.
Out of memory or resources: try increasing the VM size. Containers may crash when they hit memory limits; while replacements spin up, health checks fail and the Ingress serves 502s.
Check whether externalTrafficPolicy=Local is set on the NodePort Service; this prevents nodes from forwarding traffic to other nodes.
Refer to the GitHub issue "Document how to avoid 502s #34" for more information.
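For the probe-timing cause above, a sketch of probe settings for the controller container with a longer initialDelaySeconds (the timing values here are examples to adapt, not recommendations; 10254 is the controller's default healthz port):

```yaml
# Example probe tuning for the ingress-nginx controller container.
# initialDelaySeconds is raised so the probes do not fire before
# nginx has finished loading its initial configuration.
livenessProbe:
  httpGet:
    path: /healthz
    port: 10254
    scheme: HTTP
  initialDelaySeconds: 60   # example value, raised from the chart default
  periodSeconds: 10
  failureThreshold: 5
readinessProbe:
  httpGet:
    path: /healthz
    port: 10254
    scheme: HTTP
  initialDelaySeconds: 30   # example value
  periodSeconds: 10
```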

Related

K8s cluster deployment error: nc: bad address 'xx'

I have deployed a k8s cluster. When the application pods were installed, they stayed in the Init state. I tried to find out what went wrong, but only got the error below.
pml@pml:~/bfn-mon/k8s$ kubectl get pods
NAME READY STATUS RESTARTS AGE
broker-59f66ff494-lwtxq 0/1 Init:0/2 0 41m
coordinator-9998c64b8-ql7xz 0/1 Init:0/2 0 41m
kafka-0 0/1 Init:0/1 0 41m
host@host:~$ kubectl logs kafka-0 -c init-zookeeper
nc: bad address 'zookeeper-0.zookeeper-headless-service.default.svc.cluster.local'
Can someone tell me what's going wrong, and how I can fix it?
I'm hoping someone who has had the same problem, or knows what's going wrong, can give some debugging instructions.
During Pod startup, the kubelet delays running init containers until the networking and storage are ready. Then the kubelet runs the Pod's init containers in the order they appear in the Pod's spec.
For pods stuck in the Init state with a bad-address error, the PVC may not have been recycled correctly, so the storage is not ready; the pod will stay in the Init state until this is cleared.
You can try the following solutions:
Check if PVs are created and bound to all expected PVCs.
Run /opt/kubernetes/bin/kube-restart.sh to restart the cluster.
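A rough debugging sequence for this situation might look like the following (resource names are taken from the question; adjust them to your cluster):

```shell
# Check that every PVC is Bound and that a PV backs each one
kubectl get pvc,pv --all-namespaces

# Inspect the init container's definition to see what it waits for
kubectl get pod kafka-0 -o jsonpath='{.spec.initContainers[*].command}'

# Since "nc: bad address" is a DNS resolution failure, verify the
# headless service exists and that its name resolves inside the cluster
kubectl get svc zookeeper-headless-service
kubectl run dns-test --rm -it --image=busybox --restart=Never -- \
  nslookup zookeeper-0.zookeeper-headless-service.default.svc.cluster.local
```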

Kubernetes readiness probe warning has no details

I've made a deployment in GKE with a readiness probe. My container is coming up but it seems the readiness probe is having some difficulty. When I try to describe the pod I see that there are many probe warnings but it's not clear what the warning is.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 43m default-scheduler Successfully assigned default/tripvector-7996675758-7vjxf to gke-tripvector2-default-pool-78cf58d9-5dgs
Normal LoadBalancerNegNotReady 43m neg-readiness-reflector Waiting for pod to become healthy in at least one of the NEG(s): [k8s1-07274a01-default-tripvector-np-60000-a912870e]
Normal Pulling 43m kubelet Pulling image "us-west1-docker.pkg.dev/triptastic-1542412229773/tripvector/tripvector"
Normal Pulled 43m kubelet Successfully pulled image "us-west1-docker.pkg.dev/triptastic-1542412229773/tripvector/tripvector" in 888.583654ms
Normal Created 43m kubelet Created container tripvector
Normal Started 43m kubelet Started container tripvector
Normal LoadBalancerNegTimeout 32m neg-readiness-reflector Timeout waiting for pod to become healthy in at least one of the NEG(s): [k8s1-07274a01-default-tripvector-np-60000-a912870e]. Marking condition "cloud.google.com/load-balancer-neg-ready" to True.
Warning ProbeWarning 3m1s (x238 over 42m) kubelet Readiness probe warning:
I've tried examining events with kubectl get events but that also doesn't provide extra details on the probe warning:
❯❯❯ k get events
LAST SEEN TYPE REASON OBJECT MESSAGE
43m Normal LoadBalancerNegNotReady pod/tripvector-7996675758-7vjxf Waiting for pod to become healthy in at least one of the NEG(s): [k8s1-07274a01-default-tripvector-np-60000-a912870e]
43m Normal Scheduled pod/tripvector-7996675758-7vjxf Successfully assigned default/tripvector-7996675758-7vjxf to gke-tripvector2-default-pool-78cf58d9-5dgs
43m Normal Pulling pod/tripvector-7996675758-7vjxf Pulling image "us-west1-docker.pkg.dev/triptastic-1542412229773/tripvector/tripvector"
43m Normal Pulled pod/tripvector-7996675758-7vjxf Successfully pulled image "us-west1-docker.pkg.dev/triptastic-1542412229773/tripvector/tripvector" in 888.583654ms
43m Normal Created pod/tripvector-7996675758-7vjxf Created container tripvector
43m Normal Started pod/tripvector-7996675758-7vjxf Started container tripvector
3m38s Warning ProbeWarning pod/tripvector-7996675758-7vjxf Readiness probe warning:
How can I get more details on how/why this readiness probe is giving off warnings?
EDIT: the logs of the pod are mostly empty as well (klf is an alias I have for kubectl logs):
❯❯❯ klf tripvector-6f4d4c86c5-dn55c
(node:1) [DEP0131] DeprecationWarning: The legacy HTTP parser is deprecated.
{"line":"87","file":"percolate_synced-cron.js","message":"SyncedCron: Scheduled \"Refetch expired Google Places\" next run #Tue Mar 22 2022 17:47:53 GMT+0000 (Coordinated Universal Time)","time":{"$date":1647971273653},"level":"info"}
Regarding the error in the logs, “DeprecationWarning: The legacy HTTP parser is deprecated.”: the legacy HTTP parser was deprecated with the pending end-of-life of Node.js 10.x. It now warns on use but otherwise continues to function, and may be removed in a future Node.js 12.x release. See the Node v12.22.0 (LTS) release notes for more details.
On the other hand, about the kubelet’s ProbeWarning reason in the warning events on your container: health check (liveness and readiness) probes using an HTTPGetAction no longer follow redirects to different hostnames from the original probe request. Instead, these non-local redirects are treated as a Success (the documented behavior), and an event with reason ProbeWarning is generated to indicate that the redirect was ignored. If you were previously relying on the redirect to run health checks against a different endpoint, you will need to perform the health-check logic outside the kubelet, for instance by proxying the external endpoint rather than redirecting to it. You can verify the detailed root cause of this in the Kubernetes 1.14 release notes. There is no way to see more detailed information about the ProbeWarning in the events table. See also: Kubernetes emitting ProbeWarning, Confusing/incomplete readiness probe warning, and Add probe warning message body.
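If your probe path answers with a redirect to another host, one way to silence the warning is to point the probe directly at the final endpoint. A hypothetical sketch (the path, port, and host below are placeholders, not values from your deployment):

```yaml
# Probe a path that returns 200 directly instead of one that redirects
# to a different host; cross-host redirects count as Success but emit
# a ProbeWarning event on every check.
readinessProbe:
  httpGet:
    path: /healthz        # placeholder: an endpoint that answers directly
    port: 3000            # placeholder container port
    httpHeaders:
    - name: Host          # set a Host header if the app routes on it
      value: example.com  # placeholder
  initialDelaySeconds: 10
  periodSeconds: 10
```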

Error while starting POD in a newly created kubernetes cluster (ContainerCreating)

I am new to Kubernetes. I have created a Kubernetes cluster with one master node and two worker nodes, and installed Helm for deploying apps. I am getting the following error while starting the tiller pod:
tiller-deploy-5b4685ffbf-znbdc 0/1 ContainerCreating 0 23h
After describing the pod I got the following result
[root@master-node flannel]# kubectl --namespace kube-system describe
pod tiller-deploy-5b4685ffbf-znbdc
Events:
Type Reason Age From Message
Warning FailedCreatePodSandBox 10m (x34020 over 22h) kubelet,
worker-node1 (combined from similar events): Failed to create pod
sandbox: rpc error: code = Unknown desc = failed to set up sandbox
container
"cdda0a8ae9200668a2256e8c7b41904dce604f73f0282b0443d972f5e2846059"
network for pod "tiller-deploy-5b4685ffbf-znbdc": networkPlugin cni
failed to set up pod "tiller-deploy-5b4685ffbf-znbdc_kube-system"
network: open /run/flannel/subnet.env: no such file or directory
Normal SandboxChanged 25s (x34556 over 22h) kubelet, worker-node1 Pod
sandbox changed, it will be killed and re-created.
Any hint on how I can get past this error?
You need to set up a CNI plugin such as Flannel. Verify whether all the pods in the kube-system namespace are running.
To apply Flannel in your cluster, run the following command:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml
For Flannel to work correctly, pod-network-cidr should be 10.244.0.0/16; if you have a different CIDR, you can customize the Flannel manifest (kube-flannel.yml) according to your needs.
Example:
net-conf.json: |
  {
    "Network": "10.10.0.0/16",
    "Backend": {
      "Type": "vxlan"
    }
  }
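After applying the manifest, you can check that Flannel actually wrote the subnet file whose absence is reported in the error; a rough verification sketch (the cat command assumes shell access on the affected worker node):

```shell
# The error above means this file is missing on worker-node1.
# After Flannel starts successfully it should exist; its contents
# look roughly like the commented example below.
cat /run/flannel/subnet.env
# FLANNEL_NETWORK=10.244.0.0/16
# FLANNEL_SUBNET=10.244.1.1/24
# FLANNEL_MTU=1450
# FLANNEL_IPMASQ=true

# Also confirm the flannel DaemonSet pods are Running on every node
kubectl get pods -n kube-system -l app=flannel -o wide
```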

My old windows pods are dead and don't respond to http requests / exec fails

I have an AKS cluster with a mix of Windows and Linux nodes and an nginx-ingress.
This all worked great, but a few days ago all my windows pods have become unresponsive.
Everything is still green on the K8s dashboard, but they don't respond to HTTP requests and kubectl exec fails.
All the linux pods still work.
I created a new deployment with the exact same image and other properties, and this new pod works, responds to HTTP and kubectl exec works.
Q: How can I find out why my old pods died? How can I prevent this from occurring again in the future?
Note that this is a test cluster, so I have the luxury of being able to investigate, if this was prod I would have burned and recreated the cluster already.
Details:
https://aks-test.progress-cloud.com/eboswebApi/ is one of the old pods, https://aks-test.progress-cloud.com/eboswebApi2/ is the new pod.
When I look at the nginx log, I see a lot of connect() failed (111: Connection refused) while connecting to upstream.
When I try kubectl exec -it <podname> --namespace <namespace> -- cmd I get one of two behaviors:
Either the command immediately returns without printing anything, or I get an error:
container 1dfffa08d834953c29acb8839ea2d4c6b78b7a530371d98c16b15132d49f5c52 encountered an error during CreateProcess: failure in a Windows system call: The remote procedure call failed and did not execute. (0x6bf) extra info: {"CommandLine":"cmd","WorkingDirectory":"C:\\inetpub\\wwwroot","Environment":{...},"EmulateConsole":true,"CreateStdInPipe":true,"CreateStdOutPipe":true,"ConsoleSize":[0,0]}
command terminated with exit code 126
kubectl describe pod works on both.
The only difference I could find was that on the old pod, I don't get any events:
Events: <none>
whereas on the new pod I get a bunch of them for pulling the image etc:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 39m default-scheduler Successfully assigned ingress-basic/ebos-webapi-test-2-78786968f4-xmvfw to aksnpwin000000
Warning Failed 38m kubelet, aksnpwin000000 Error: failed to start container "ebos-webapi-test-2": Error response from daemon: hcsshim::CreateComputeSystem ebos-webapi-test-2: The binding handle is invalid.
(extra info: {"SystemType":"Container","Name":"ebos-webapi-test-2","Owner":"docker","VolumePath":"\\\\?\\Volume{dac026db-26ab-11ea-bb33-e3730ff9432d}","IgnoreFlushesDuringBoot":true,"LayerFolderPath":"C:\\ProgramData\\docker\\windowsfilter\\ebos-webapi-test-2","Layers":[{"ID":"8c160b6e-685a-58fc-8c4b-beb407ad09b4","Path":"C:\\ProgramData\\docker\\windowsfilter\\12061f29088664dc41c0836c911ed7ced1f6d7ed38b1c932c25cd8ca85a3a88e"},{"ID":"6a230a46-a97c-5e30-ac4a-636e62cd9253","Path":"C:\\ProgramData\\docker\\windowsfilter\\8c0ce5a9990bc433c4d937aa148a4251ef55c1aa7caccf1b2025fd64b4feee97"},{"ID":"240d5705-d8fe-555b-a966-1fc304552b64","Path":"C:\\ProgramData\\docker\\windowsfilter\\2b334b769fe19d0edbe1ad8d1ae464c8d0103a7225b0c9e30fdad52e4b454b35"},{"ID":"5f5d8837-5f62-5a76-a706-9afb789e45e4","Path":"C:\\ProgramData\\docker\\windowsfilter\\3d1767755b0897aaae21e3fb7b71e2d880de22473f0071b0dca6301bb6110077"},{"ID":"978503cb-b816-5f66-ba41-ed154db333d5","Path":"C:\\ProgramData\\docker\\windowsfilter\\53d2e85a90d2b8743b0502013355df5c5e75448858f0c1f5b435281750653520"},{"ID":"d7d0d14e-b097-5104-a492-da3f9396bb06","Path":"C:\\ProgramData\\docker\\windowsfilter\\38830351b46e7a0598daf62d914eb2bf01e6eefde7ac560e8213f118d2bd648c"},{"ID":"90b1c608-be4c-55a1-a787-db3a97670149","Path":"C:\\ProgramData\\docker\\windowsfilter\\84b71fda82ea0eacae7b9382eae2a26f3c71bf118f5c80e7556496f21e754126"},{"ID":"700711b2-d578-5d7c-a17f-14165a5b3507","Path":"C:\\ProgramData\\docker\\windowsfilter\\08dd6f93c96c1ac6acd3d2e8b60697340c90efe651f805809dbe87b6bd26a853"},{"ID":"270de12a-461c-5b0c-8976-a48ae0de2063","Path":"C:\\ProgramData\\docker\\windowsfilter\\115de87074fadbc3c44fc33813257c566753843f8f4dd7656faa111620f71f11"},{"ID":"521250bb-4f30-5ac4-8fcd-b4cf45866627","Path":"C:\\ProgramData\\docker\\windowsfilter\\291e51f5f030d2a895740fae3f61e1333b7fae50a060788040c8d926d46dbe1c"},{"ID":"6dded7bf-8c1e-53bb-920e-631e78728316","Path":"C:\\ProgramData\\docker\\windowsfilter\\938e721c29d2f2d23a00bf83e5bc60d92f95
34da409d0417f479bd5f06faa080"},{"ID":"90dec4e9-89fe-56ce-a3c2-2770e6ec362c","Path":"C:\\ProgramData\\docker\\windowsfilter\\d723ebeafd1791f80949f62cfc91a532cc5ed40acfec8e0f236afdbcd00bbff2"},{"ID":"94ac6066-b6f3-5038-9e1b-d5982fcefa00","Path":"C:\\ProgramData\\docker\\windowsfilter\\00d1bb6fc8abb630f921d3651b1222352510d5821779d8a53d994173a4ba1126"},{"ID":"037c6d16-5785-5bea-bab4-bc3f69362e0c","Path":"C:\\ProgramData\\docker\\windowsfilter\\c107cf79e8805e9ce6d81ec2a798bf4f1e3b9c60836a40025272374f719f2270"}],"ProcessorWeight":5000,"HostName":"ebos-webapi-test-2-78786968f4-xmvfw","MappedDirectories":[{"HostPath":"c:\\var\\lib\\kubelet\\pods\\c44f445c-272b-11ea-b9bc-ae0ece5532e1\\volumes\\kubernetes.io~secret\\default-token-n5tnc","ContainerPath":"c:\\var\\run\\secrets\\kubernetes.io\\serviceaccount","ReadOnly":true,"BandwidthMaximum":0,"IOPSMaximum":0,"CreateInUtilityVM":false}],"HvPartition":false,"NetworkSharedContainerName":"4c9bede623553673fde0da6e8dc92f9a55de1ff823a168a35623ad8128f83ecb"})
Normal Pulling 38m (x2 over 38m) kubelet, aksnpwin000000 Pulling image "progress.azurecr.io/eboswebapi:release-2019-11-11_16-41"
Normal Pulled 38m (x2 over 38m) kubelet, aksnpwin000000 Successfully pulled image "progress.azurecr.io/eboswebapi:release-2019-11-11_16-41"
Normal Created 38m (x2 over 38m) kubelet, aksnpwin000000 Created container ebos-webapi-test-2
Normal Started 38m kubelet, aksnpwin000000 Started container ebos-webapi-test-2

istio-pilot on minikube is always in pending state

The istio-pilot pod on my minikube Kubernetes cluster is always in the Pending state. I increased CPU to 4 and memory to 8 GB, but the istio-pilot pod still stays Pending.
Is any specific change required to run Istio on minikube, other than the ones mentioned in the documentation?
Resolved the issue. I'm running minikube with VirtualBox, and starting minikube with higher memory and CPU does not take effect until minikube is deleted and started again with the new parameters. Without this, scheduling was failing with Insufficient memory.
I saw istio-pilot in 1.1 rc3 consume a lot of CPU and stay in the Pending state due to the following message in kubectl describe <istio-pilot pod name> -n=istio-system:
Warning FailedScheduling 1m (x25 over 3m) default-scheduler 0/2 nodes are available:
1 Insufficient cpu, 1 node(s) had taints that the pod didn't tolerate.
I was able to reduce it by doing --set pilot.resources.requests.cpu=30m when installing istio using helm.
https://github.com/istio/istio/blob/1.1.0-rc.3/install/kubernetes/helm/istio/charts/pilot/values.yaml#L16
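The two fixes above can be combined into something like the following sketch (flag values are examples for a small test cluster, not production settings; the helm invocation uses the Helm 2 style that the Istio 1.1 install docs were written for):

```shell
# Recreate minikube so the new resource settings actually take effect
minikube delete
minikube start --cpus=4 --memory=8192

# Install Istio with a lower CPU request for pilot so it can schedule
# on a small node (30m is the example value from the answer above)
helm install install/kubernetes/helm/istio --name istio \
  --namespace istio-system \
  --set pilot.resources.requests.cpu=30m
```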