Intermittent failure of K8S DNS resolver / dial udp / Operation cancelled - kubernetes

We're having a medium sized Kubernetes cluster. So imagine a situation where approximately 70 pods are being connecting to a one socket server. It works fine most of the time, however, from time to time one or two pods just fail to resolve k8s DNS, and it times out with the following error:
Error: dial tcp: lookup thishost.production.svc.cluster.local on 10.32.0.10:53: read udp 100.65.63.202:36638->100.64.209.61:53: i/o timeout at
What we noticed is that this is not the only service that's failing intermittently. Other services experience that from time to time. We used to ignore it, since it was very random and rate, however in the above case that is very noticeable. The only solution is to actually kill the faulty pod. (Restarting doesn't help)
Has anyone experienced this? Do you have any tips on how to debug it/ fix?
It almost feels as if it's beyond our expertise and is fully related to the internals of the DNS resolver.
Kubernetes version: 1.23.4
Container Network: cilium

this issue most probably will be related to the CNI.
I would suggest following the link to debug the issue:
https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/
and to be able to help you we need more information:
is this cluster on-premise or cloud?
what are you using for CNI?
how many nodes are running and are they all in the same subnet? if yes, dose they have other interfaces?
share the below command result.
kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o wide
when you restart the pod to solve the issue temp does it stay on the same node or does it change?

Related

(Kubernetes HPA) Scaling up pods got connection refused

I have some amount of traffic that can boost the cpu usage up to 180%. I tried using a single pod which works fine but the response was extremely slow. When I configured my HPA to cpu=80%, min=1 and max={2 or more} I hit connection refused when HPA was creating more pods. I tried put a large value to min (ie. min = 3) the connection refused relief but there will be too many idle pods when traffic is low. Is there any way to stop putting pod online until it is completely started?
I hit connection refused when HPA was creating more pods
Kubernetes uses the readinessProbe, to determine whether to redirect clients to some pods. If the readinessProbe for a Pod is not successful, then Service whose selectors could have matched that Pod would not take it under consideration.
If there is no readinessProbe defined, or if it was misconfigured, Pods that are still starting up may end up serving client requests. Connection refused could suggest there was no process listening yet for incoming connections.
Please share your deployment/statefulset/..., if you need further assistance setting this up.

K8s does strange networking stuff that breaks application designs

I discovered a strange behavior with K8s networking that can break some applications designs completely.
I have two pods and one Service
Pod 1 is a stupid Reverse Proxy (I don't know the implementation)
Pod 2 is a Webserver
The mentioned Service belongs to pod 2, the webserver
After the initial start of my stack I discovered that Pod 1 - the Reverse Proxy is not able to reach the webserver on the first attempt for some reason, ping is working fine and curl also.
Now I tried wget mywebserver inside of Pod 1 - Reverse Proxy and got back the following:
wget mywebserver
--2020-11-16 20:07:37-- http://mywebserver/
Resolving mywebserver (mywebserver)... 10.244.0.34, 10.244.0.152, 10.244.1.125, ...
Connecting to mywebserver (mywebserver)|10.244.0.34|:80... failed: Connection refused.
Connecting to mywebserver (mywebserver)|10.244.0.152|:80... failed: Connection refused.
Connecting to mywebserver (mywebserver)|10.244.1.125|:80... failed: Connection refused.
Connecting to mywebserver (mywebserver)|10.244.2.177|:80... connected.
Where 10.244.2.177 is the Pod IP of the Webserver.
The problem to me it seems is that the Reverse-Proxy does not try to trigger the attempt to forward the package twice, instead it only tries once where it fails like in the wget cmd above and the request gets dropped as the backed is not reachable due to fancy K8s IPtables stuff it seems...
If I configure the reverse-proxy not to use the Service DNS-name for load-off and instead use the Pod IP (10.244.2.177) everything is working fine and as expected.
I already tried this with a variety of CNI Providers like: Flannel, Calico, Canal, Weave and also Cilium as Kube-Proxy is not used with Cilium but all of them failed and all of them doing fancy routing nobody clearly understands out-of-the-box. So my question is how can I make K8s routing work immediately at this point? I already have reimplemented my whole stack to docker-swarm just to see if it works, and it does, flawlessly! So this issue has to do something with K8s routing scheme it seems.
Just to exclude misconfiguration from my side I also tried this with different ready-to-use K8s solutions like managed K8s from Digital-Ocean and or self-hosted RKE. All have the same behavior.
Does somebody maybe have a Idea what the problem might be and how to fix this behavior of K8s?
I might also be very useful to know what actually happens at the wget request, as this remains a mystery to me.
Many thanks in advance!
It turned out that I had several misconfigurations at my K8s Deployment.
I first removed ClusterIP: None as this leads to the behavior wget shows above at my question. Beside I've set app: and tier: wrong at my deployment. Anyways now everything is working fine and wget has a proper connection.
Thanks again

kubernetes, pods can not communicate via domain name

Recently our domain was down for some reason, but it was just the domain name the kubernetes cluster wasnt changed at all.
Now the pods can not communicate via domains and sub-domains, on ip's they work like curl ip-to-any-pod is ok but curl sub-domain.domain.com wont work. It says curl: (6) Could not resolve host: sub-domain.domain.com
Whats crazy is, it works sometimes and sometimes it doesn't work.
I have gone through every related issue on the internet but can not find anything specific, neither does the logs, events etc tell me anything.
I restarted my pods, the calico network pods but still nothing has changed.
I got this message once while restarting one of my pod
Warning FailedCreatePodSandBox 45s kubelet, ip-xxx-xx-xx-xx.ap-south-1.compute.internal Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "db2249c98d0b8b4bbef79ac5cd7e5c36c957f3929637093268670e7002c2467f" network for pod "web-6576f9fcdc-kt9xw": NetworkPlugin cni failed to set up pod "web-6576f9fcdc-kt9xw_hc" network: dial tcp: lookup etcd-a.internal.cluster.xxxx.xx on xxx.xx.x.x:53: no such host, failed to clean up sandbox container "db2249c98d0b8b4bbef79ac5cd7e5c36c957f3929637093268670e7002c2467f" network for pod "web-6576f9fcdc-kt9xw": NetworkPlugin cni failed to teardown pod "web-6576f9fcdc-kt9xw_hc" network: dial tcp: lookup etcd-a.internal.cluster.xx.xx on xxx.xx.x.x:53: no such host]
Often when setting up a domain it takes time for it to propagate, and propagates non-uniformly. It's common to see that immediately after creating the record you will not be able to resolve it at all, then a little later it'll be flaky, and eventually it will stabilize. Sometimes DNS takes tens of hours to propagate.
There are various articles online you can find from an Internet search which explain why DNS propagation can take so much time. There are also neat tools like DNS Checker which can give you a sense for how well your DNS records have propagated globally.
As you confirmed in the comments, your problems went away by the next day.
In my opinion your question it's quite complex and it cant' be answered so simply.
Please refer to:
CoreDNS
Cluster specific issues f.e. kubernetes 1.15.3 (you can verify this settings in your environment)
Default TTL for DNS records in kubernetes zone has been changed from 5s to 30s to keep consistent with old dnsmasq based kube-dns. The TTL can be customized with command kubectl edit -n kube-system configmap/coredns
Reverted the CoreDNS version to 1.3.1 for kubeadm cluster-dns
CoreDNS configuration equivalent to kube-dns
Firstly please start debugging your cluster and verify if your problem is related to your domain settings or it is cluster internal issue.
Debugging DNS Resolution
Please verify the local dns configuration in /etc/resolv.conf inside your pod.
Please verify Errors in the in DNS,Coredns PODS.
To obtain more information about dns resolution you can use different tools like: nslkookup, dig, traceroute
example:
nslookup -type=a [domain.com]
using against specific domain server
nslookup -type=a [domain.com] [ns server]
Using those tools you can get also information about Non-authoritative or Authoritative answers.
An authoritative name server is a name server that has the original source files of a domain zone files.
Because it's very important in production environment try to recreate the issue in order to keep your services healthy in the future.
Hope this help.

First request to a new ReplicaSet times out

I have a Kubernetes cluster on AWS, set up with kops.
I set up a Deployment that runs an Apache container and a Service for the Deployment (type: LoadBalancer).
When I update the deployment by running kubectl set image ..., as soon as the first pod of the new ReplicaSet becomes ready, the first couple of requests to the service time out.
Things I have tried:
I set up a readinessProbe on the pod, works.
I ran curl localhost on a pod, works.
I performed a DNS lookup for the service, works.
If I curl the IP returned by that DNS lookup inside a pod, the first request will timeout. This tells me it's not an ELB issue.
It's really frustrating since otherwise our Kubernetes stack is working great, but every time we deploy our application we run the risk of a user timing out on a request.
After a lot of debugging, I think I've solved this issue.
TL;DR; Apache has to exit gracefully.
I found a couple of related issues:
https://github.com/kubernetes/kubernetes/issues/47725
https://github.com/kubernetes/ingress-nginx/issues/69
504 Gateway Timeout - Two EC2 instances with load balancer
Some more things I tried:
Increase the KeepAliveTimeout on Apache, didn't help.
Ran curl on the pod IP and node IPs, worked normally.
Set up an externalName selector-less service for a couple of external dependencies, thinking it might have something to do with DNS lookups, didn't help.
The solution:
I set up a preStop lifecycle hook on the pod to gracefully terminate Apache to run apachectl -k graceful-stop
The issue (at least from what I can tell), is that when pods are taken down on a deployment, they receive a TERM signal, which causes apache to immediately kill all of its children. This might cause a race condition where kube-proxy still sends some traffic to pods that have received a TERM signal but not terminated completely.
Also got some help from this blog post on how to set up the hook.
I also recommend increasing the terminationGracePeriodSeconds in the PodSpec so apache has enough time to exit gracefully.

Kube-dns - Intermittent name resolution errors

We are running kubernetes 1.5.7 on CoreOS in AWS. Our kube-dns image versions are
gcr.io/google_containers/kubedns-amd64:1.9
gcr.io/google_containers/kube-dnsmasq-amd64:1.4.1
The args that we pass to dnsmasq are
--cache-size=1000
--no-resolv
--server=/in-addr.arpa/ip6.arpa/cluster.local/ec2.internal/127.0.0.1#10053
--server=169.254.169.253
--server=8.8.8.8
--log-facility=-
--log-async
--address=/com.cluster.local/com.svc.cluster.local/com.kube-system.svc.cluster.local/<ourdomain>.com.cluster.local/<ourdomain>.com.svc.cluster.local/<ourdomain>.com.kube-system.svc.cluster.local/com.ec2.internal/ec2.internal.kube-system.svc.cluster.local/ec2.internal.svc.cluster.local/ec2.internal.cluster.local/
We run 1 kube-dns pod per node in 20 node clusters. For the last few months we have been experiencing DNS failures that range from a 5 - 10 minute event that renders our services mostly unusable because name resolution is failing for most name lookups. During these events we were running 3 - 6 kube-dns pods. Since then we have drastically over provisioned our kube-dns pods to 1 per node and have not seen any of the long 5 - 10 minute DNS failure events. However now we are still seeing smaller DNS failure events that range from 1 - 30 seconds. During the investigation of these issues we noticed in our logs the following errors from the dnsmasq-metrics container
ERROR: logging before flag.Parse: W0517 03:19:50.139060 1 server.go:53] Error getting metrics from dnsmasq: read udp 127.0.0.1:36181->127.0.0.1:53: i/o timeout
When ever we have one of our smaller DNS events lasting 1 - 30 seconds we find the these logs from the kube-dns pods. For awhile we were suspecting that we were experiencing an iptables/conntrack problem wrt to pods hitting the kube-dns service. But based off these dnsmasq related errors we believe dnsmasq is refusing connections for some period of time causing the DNS failures we have been experiencing. For people who are not familiar with the dnsmasq-metrics container it is performing DNS lookups against the dnsmasq container in the same pod to get dnsmasq stats. If the dnsmasq stats cannot be retrieved via a DNS lookup it seems logical to think that services performing a DNS lookup could experience the same problem.
It's worth noting that during these issues we do NOT see the following logs from dnsmasq which makes me believe we are not hitting this threshold.
dnsmasq: Maximum number of concurrent DNS queries reached (max: 150)
I feel pretty confident that our current DNS errors are related to dnsmasq refusing connections intermittently. I'm curious if other users are seeing the same problems where the kube-dns pod logs the error from dnsmasq-metrics and during that same time frame DNS errors are logged from applications in the cluster.
Additionally if anyone has any ideas on what to do next to find out exactly what is happening wrt dnsmasq refusing connections. I'm pondering if it would be useful to run dnsmasq in debug mode but also worried that will introduce other problems related to running in debug mode. Other options we are considering is slowly rolling out CoreDNS (https://github.com/coredns/coredns).
You provide a lot of cluster domains. Each cluster domain will be inserted into the local /etc/resolv.conf and be used. For every domain in the resolv.conf there will be seperate dns request. In your case there would be 10+ dns queries for every dns query.
--address=/com.cluster.local/com.svc.cluster.local/com.kube-system.svc.cluster.local/<ourdomain>.com.cluster.local/<ourdomain>.com.svc.cluster.local/<ourdomain>.com.kube-system.svc.cluster.local/com.ec2.internal/ec2.internal.kube-system.svc.cluster.local/ec2.internal.svc.cluster.local/ec2.internal.cluster.local/
My suggestion would be to reduce the number of cluster domain to just cluster.local.
What is your reason for providing multiple cluster domains?