I discovered a strange behavior in K8s networking that can completely break some application designs.
I have two pods and one Service:
Pod 1 is a simple reverse proxy (I don't know its implementation)
Pod 2 is a Webserver
The Service belongs to Pod 2, the webserver
After the initial start of my stack I discovered that Pod 1, the reverse proxy, is not able to reach the webserver on the first attempt. Ping works fine, and so does curl.
I then ran wget mywebserver inside Pod 1, the reverse proxy, and got the following:
wget mywebserver
--2020-11-16 20:07:37-- http://mywebserver/
Resolving mywebserver (mywebserver)... 10.244.0.34, 10.244.0.152, 10.244.1.125, ...
Connecting to mywebserver (mywebserver)|10.244.0.34|:80... failed: Connection refused.
Connecting to mywebserver (mywebserver)|10.244.0.152|:80... failed: Connection refused.
Connecting to mywebserver (mywebserver)|10.244.1.125|:80... failed: Connection refused.
Connecting to mywebserver (mywebserver)|10.244.2.177|:80... connected.
Where 10.244.2.177 is the Pod IP of the Webserver.
It seems to me that the reverse proxy does not retry forwarding the packet: it tries only once, fails like the first wget attempts above, and drops the request because the backend is not reachable, apparently due to K8s iptables rules.
If I configure the reverse proxy to use the Pod IP (10.244.2.177) instead of the Service DNS name as its upstream, everything works as expected.
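What wget does in the output above can be sketched in a few lines of Python. This is only an illustration, not wget's actual implementation: it resolves the name to every address DNS returns and tries them in order until one accepts the connection, which is the step a proxy that only tries the first address skips. The function names resolve_all and connect_first_reachable are made up for this sketch.

```python
import socket
from typing import Any, Callable, Iterable, List, Tuple

Address = Tuple[str, int]


def resolve_all(host: str, port: int) -> List[Address]:
    """Return every distinct IPv4 (ip, port) pair the name resolves to."""
    infos = socket.getaddrinfo(host, port, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    addresses: List[Address] = []
    for info in infos:
        # info is (family, type, proto, canonname, sockaddr);
        # for AF_INET, sockaddr is (ip, port).
        address = (info[4][0], info[4][1])
        if address not in addresses:
            addresses.append(address)
    return addresses


def connect_first_reachable(
    addresses: Iterable[Address],
    connect: Callable[[Address, float], Any] = socket.create_connection,
    timeout: float = 2.0,
) -> Tuple[Address, Any]:
    """Try each address in order and return the first one that connects.

    Raises ConnectionError if every address fails.
    """
    errors = []
    for address in addresses:
        try:
            return address, connect(address, timeout)
        except OSError as exc:  # includes "connection refused" and timeouts
            errors.append((address, exc))
    raise ConnectionError(f"all addresses failed: {errors}")
```

Inside Pod 1 this could be called as connect_first_reachable(resolve_all("mywebserver", 80)). A client that only ever uses the first returned address would fail whenever that address belongs to a pod that does not listen on port 80.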
I already tried this with a variety of CNI providers: Flannel, Calico, Canal, Weave, and also Cilium (as kube-proxy is not used with Cilium). All of them failed, and all of them do routing that is hard to understand out of the box. So my question is: how can I make K8s routing work on the first attempt? I have already reimplemented my whole stack on Docker Swarm just to see if it works there, and it does, flawlessly. So this issue seems to be related to the K8s routing scheme.
Just to rule out misconfiguration on my side, I also tried this with different ready-to-use K8s solutions, such as managed K8s from DigitalOcean and self-hosted RKE. All show the same behavior.
Does somebody have an idea what the problem might be and how to fix this behavior of K8s?
It would also be very useful to know what actually happens during the wget request, as this remains a mystery to me.
Many thanks in advance!
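It turned out that I had several misconfigurations in my K8s Deployment.
First, I removed clusterIP: None, as this causes the behavior wget shows in my question: with clusterIP: None the Service is headless, so its DNS name resolves directly to the IPs of all pods matching the selector instead of to a single virtual IP. Besides that, I had set app: and tier: incorrectly in my Deployment. Now everything works fine and wget gets a proper connection.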
Thanks again
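To illustrate why wrong app: and tier: values matter, here is a minimal Python sketch of how an equality-based Service selector picks pods. It is not Kubernetes code; the pod names and label values used with it are hypothetical, and the real matching is done by the control plane.

```python
from typing import Dict, List

Labels = Dict[str, str]


def selector_matches(selector: Labels, labels: Labels) -> bool:
    """True if every key/value pair in the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def pods_behind_service(selector: Labels, pods: Dict[str, Labels]) -> List[str]:
    """Return the names of the pods a Service with this selector would target.

    A Service without a selector gets no endpoints created for it
    automatically, so an empty selector selects nothing here.
    """
    if not selector:
        return []
    return [name for name, labels in pods.items()
            if selector_matches(selector, labels)]
```

With hypothetical pods labelled app=mystack and tier=proxy, tier=web and tier=db, a selector of only app=mystack selects all three, while app=mystack plus tier=web selects just the webserver. On a headless Service the first case would put every one of those pod IPs into the DNS answer.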
We have a medium-sized Kubernetes cluster. Imagine a situation where approximately 70 pods connect to a single socket server. It works fine most of the time; however, from time to time one or two pods fail to resolve K8s DNS and time out with the following error:
Error: dial tcp: lookup thishost.production.svc.cluster.local on 10.32.0.10:53: read udp 100.65.63.202:36638->100.64.209.61:53: i/o timeout at
What we noticed is that this is not the only service failing intermittently; other services experience it from time to time as well. We used to ignore it, since it was very random and rare, but in the above case it is very noticeable. The only solution is to actually kill the faulty pod (restarting doesn't help).
Has anyone experienced this? Do you have any tips on how to debug or fix it?
It almost feels as if it's beyond our expertise and is fully related to the internals of the DNS resolver.
Kubernetes version: 1.23.4
Container Network: cilium
This issue is most probably related to the CNI.
I would suggest following this guide to debug the issue:
https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/
To be able to help you, we need more information:
Is this cluster on-premise or in the cloud?
What are you using for CNI?
How many nodes are running, and are they all in the same subnet? If yes, do they have other interfaces?
Share the result of the command below.
kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o wide
When you restart the pod to temporarily solve the issue, does it stay on the same node or does it move?
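To collect data for the questions above, it can help to measure how often lookups fail from inside an affected pod. The following Python sketch is one possible probe, not an official tool: probe_dns is a made-up name, and the attempt count and interval are arbitrary defaults.

```python
import socket
import time
from typing import Callable, List, Tuple


def probe_dns(
    name: str,
    attempts: int = 20,
    interval: float = 0.5,
    resolve: Callable[[str], str] = socket.gethostbyname,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[int, float, str]]:
    """Resolve `name` repeatedly and record every failed attempt.

    Returns a list of (attempt_index, seconds_until_failure, error) tuples.
    """
    failures = []
    for index in range(attempts):
        started = time.monotonic()
        try:
            resolve(name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            failures.append((index, time.monotonic() - started, repr(exc)))
        if index < attempts - 1:
            sleep(interval)
    return failures
```

Running probe_dns("thishost.production.svc.cluster.local") in a healthy pod and in a faulty one, and noting which node each pod is scheduled on, would show whether the failures follow the pod or the node.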
I have a cluster with 4 nodes (3 raspi, 1 NUC) and have setup several different workloads.
The cluster itself worked perfectly fine, so I doubt that it is a general problem with the configuration.
After a reboot of all nodes the cluster came back up well and all pods are running without issues.
Unfortunately, pods that are running on one of my nodes (NUC) are not reachable via ingress anymore.
If I access them through kube-proxy, I can see that the pods themselves run fine and the HTTP services behave as expected.
I upgraded the NUC node from Ubuntu 20.10 to 21.04, which may be related to the issues, but this is not confirmed.
When the same pods are scheduled to the other nodes everything works as expected.
For pods on the NUC node, I see the following in the ingress-controller logs:
2021/08/09 09:17:28 [error] 1497#1497: *1027899 upstream timed out (110: Operation timed out) while connecting to upstream, client: 10.244.1.1, server: gitea.fritz.box, request: "GET / HTTP/2.0", upstream: "http://10.244.3.50:3000/", host: "gitea.fritz.box"
I can only assume that the problem is related to the cluster internal network and have compared iptables rules and the like, but have not found differences that seem relevant.
The NUC node runs Ubuntu 21.04 with kube v1.21.1; the raspis run Ubuntu 20.04.2 LTS. The master node still runs v1.21.1, while the two worker nodes already run v1.22.0, which works fine.
I have found a thread that points out incompatibility between metallb and nftables (https://github.com/metallb/metallb/issues/451) and though it's a bit older, I already changed to xtables as suggested (update-alternatives --set iptables /usr/sbin/iptables-legacy ...) without success.
Currently I'm running out of ideas on where to look.
Can anyone suggest possible issues?
Thanks in advance!
Updating flannel from 13.1-rc2 to 14.0 seems to have done the trick.
Maybe some of the iptables rules were broken and got recreated, or maybe 14.0 is necessary to work with 21.04? Who knows...
I'm back up running fine and happy :)
I have a script that initializes our service.
The script fails when it runs in the container because the first outbound request in the script (to an external service) gets a connection refused error.
We tried adding a loop that runs curl and retries if it fails; otherwise the script continues.
Sometimes it succeeds on the first try, sometimes it fails 10-15 times in a row.
We recently started using Istio.
What may be the reason?
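The retry loop described in the question might look roughly like this in Python. This is a sketch under assumptions, not the asker's actual script: wait_for_outbound and http_head are hypothetical helpers, and the attempt limit and backoff values are arbitrary.

```python
import time
import urllib.request
from typing import Callable


def http_head(url: str, timeout: float = 3.0) -> int:
    """Send a HEAD request and return the HTTP status code."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def wait_for_outbound(
    check: Callable[[], object],
    max_attempts: int = 30,
    delay: float = 1.0,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `check` until it stops raising OSError; return the attempt number.

    The delay doubles after each failure, capped at `max_delay`.
    The last error is re-raised once `max_attempts` is reached.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            check()
            return attempt
        except OSError:  # connection refused, timeouts, urllib's URLError
            if attempt == max_attempts:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)
    raise ValueError("max_attempts must be at least 1")
```

The script would call something like wait_for_outbound(lambda: http_head("https://external.example/")) before doing its real work, where the URL is a placeholder. Bounding the number of attempts makes the container fail visibly instead of looping forever if the network never comes up.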
This is a well-known Istio issue: https://github.com/istio/istio/issues/11130 (App container unable to connect to network before Istio's sidecar is fully running). It seems the Istio proxy does not start in parallel with the app container; it waits for the app container to be ready, i.e. startup is sequential, as one blogger points out (https://medium.com/#marko.luksa/delaying-application-start-until-sidecar-is-ready-2ec2d21a7b74). Quote: most Kubernetes users assume that after a pod’s init containers have finished, the pod’s regular containers are started in parallel. It turns out that’s not the case.
Containers start in the order defined in the Deployment spec YAML.
So the biggest question is whether the Istio proxy (Envoy) will start while the first container is stuck in a curl loop (a chicken-and-egg problem).
App container script performs:
until curl --head localhost:15000 ; do echo "Waiting for Istio Proxy to start" ; sleep 3 ; done
As far as I saw, that script doesn't help a bit: the proxy is up, but connections to the external hostname still return "connection refused".
Istio 1.7 comes with a new feature that configures the pod to start the sidecar first and hold every other container until the sidecar has started.
Just set values.global.proxy.holdApplicationUntilProxyStarts to true.
Please note that the feature is still experimental.
Every request made to my Kubernetes node results in an Ingress Service Unavailable (503) response.
What are some different steps I should take to troubleshoot this issue?
If you are asking for ingress debugging steps, mine usually go along these lines:
Check whether the Service is reachable internally, e.g. by running a busybox container inside the cluster and sending requests to the endpoint (busybox ships wget rather than curl)
Make sure the Ingress backend references the Service you intended (name and port) and that the Service's selector matches your pod labels
Make sure that the Pods are up and running (check the pod logs etc.)
Make sure that the ingress controller is not throwing errors (check the ingress controller logs)
It is a bit of a vague question, as any number of things could be wrong. Give us more info and we can better understand your problem (e.g. show us the YAML you use to configure the ingress).
I have a Kubernetes cluster on AWS, set up with kops.
I set up a Deployment that runs an Apache container and a Service for the Deployment (type: LoadBalancer).
When I update the deployment by running kubectl set image ..., as soon as the first pod of the new ReplicaSet becomes ready, the first couple of requests to the service time out.
Things I have tried:
I set up a readinessProbe on the pod, works.
I ran curl localhost on a pod, works.
I performed a DNS lookup for the service, works.
If I curl the IP returned by that DNS lookup from inside a pod, the first request times out. This tells me it's not an ELB issue.
It's really frustrating since otherwise our Kubernetes stack is working great, but every time we deploy our application we run the risk of a user timing out on a request.
After a lot of debugging, I think I've solved this issue.
TL;DR: Apache has to exit gracefully.
I found a couple of related issues:
https://github.com/kubernetes/kubernetes/issues/47725
https://github.com/kubernetes/ingress-nginx/issues/69
504 Gateway Timeout - Two EC2 instances with load balancer
Some more things I tried:
Increase the KeepAliveTimeout on Apache, didn't help.
Ran curl on the pod IP and node IPs, worked normally.
Set up an ExternalName (selector-less) Service for a couple of external dependencies, thinking it might have something to do with DNS lookups; didn't help.
The solution:
I set up a preStop lifecycle hook on the pod that runs apachectl -k graceful-stop to terminate Apache gracefully.
The issue (at least from what I can tell) is that when pods are taken down during a deployment, they receive a TERM signal, which causes Apache to immediately kill all of its children. This can cause a race condition where kube-proxy still sends some traffic to pods that have received a TERM signal but have not terminated completely.
Also got some help from this blog post on how to set up the hook.
I also recommend increasing the terminationGracePeriodSeconds in the PodSpec so apache has enough time to exit gracefully.
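As a rough illustration of the two changes above, the following Python sketch patches a Deployment manifest (loaded as a dict) with a preStop hook and a longer grace period. The helper name add_graceful_stop, the container name and the 60-second value are assumptions for the example; lifecycle.preStop.exec.command and terminationGracePeriodSeconds are the standard pod spec fields.

```python
import copy
import json
from typing import Any, Dict, Sequence


def add_graceful_stop(
    deployment: Dict[str, Any],
    container_name: str,
    stop_command: Sequence[str],
    grace_seconds: int,
) -> Dict[str, Any]:
    """Return a copy of the Deployment with a preStop hook on one container.

    The input manifest is left untouched.
    """
    patched = copy.deepcopy(deployment)
    pod_spec = patched["spec"]["template"]["spec"]
    # Give the hook and the server enough time before the pod is killed.
    pod_spec["terminationGracePeriodSeconds"] = grace_seconds
    for container in pod_spec["containers"]:
        if container["name"] == container_name:
            hook = {"exec": {"command": list(stop_command)}}
            container.setdefault("lifecycle", {})["preStop"] = hook
            return patched
    raise ValueError(f"no container named {container_name!r} in the pod spec")


def to_manifest(deployment: Dict[str, Any]) -> str:
    """Serialize the manifest as JSON, which kubectl apply also accepts."""
    return json.dumps(deployment, indent=2)
```

For an Apache container named, say, "apache", the call would be add_graceful_stop(deployment, "apache", ["apachectl", "-k", "graceful-stop"], 60). The grace period should be longer than the time Apache needs to finish in-flight requests.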