Kube-dns - Intermittent name resolution errors - kubernetes

We are running Kubernetes 1.5.7 on CoreOS in AWS. Our kube-dns image versions are:
gcr.io/google_containers/kubedns-amd64:1.9
gcr.io/google_containers/kube-dnsmasq-amd64:1.4.1
The args that we pass to dnsmasq are:
--cache-size=1000
--no-resolv
--server=/in-addr.arpa/ip6.arpa/cluster.local/ec2.internal/127.0.0.1#10053
--server=169.254.169.253
--server=8.8.8.8
--log-facility=-
--log-async
--address=/com.cluster.local/com.svc.cluster.local/com.kube-system.svc.cluster.local/<ourdomain>.com.cluster.local/<ourdomain>.com.svc.cluster.local/<ourdomain>.com.kube-system.svc.cluster.local/com.ec2.internal/ec2.internal.kube-system.svc.cluster.local/ec2.internal.svc.cluster.local/ec2.internal.cluster.local/
We run 1 kube-dns pod per node in 20-node clusters. For the last few months we have been experiencing DNS failures. The worst were 5 to 10 minute events that rendered our services mostly unusable because most name lookups were failing. During those events we were running 3 to 6 kube-dns pods. Since then we have drastically over-provisioned kube-dns to 1 pod per node and have not seen any more of the long 5 to 10 minute events. However, we are still seeing shorter DNS failure events that last 1 to 30 seconds. While investigating these, we noticed the following errors from the dnsmasq-metrics container in our logs:
ERROR: logging before flag.Parse: W0517 03:19:50.139060 1 server.go:53] Error getting metrics from dnsmasq: read udp 127.0.0.1:36181->127.0.0.1:53: i/o timeout
Whenever we have one of our shorter DNS events lasting 1 to 30 seconds, we find these logs from the kube-dns pods. For a while we suspected an iptables/conntrack problem with pods reaching the kube-dns service. But based on these dnsmasq-related errors, we believe dnsmasq is refusing connections for some period of time, causing the DNS failures we have been experiencing. For people who are not familiar with the dnsmasq-metrics container: it performs DNS lookups against the dnsmasq container in the same pod to get dnsmasq stats. If the dnsmasq stats cannot be retrieved via a DNS lookup, it seems logical that services performing DNS lookups could experience the same problem.
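To reproduce what dnsmasq-metrics sees, you can send the same kind of stats query by hand. dnsmasq publishes its cache statistics as CHAOS-class TXT records (cachesize.bind, hits.bind, misses.bind and similar); we assume, without having checked the sidecar's source, that these are the lookups dnsmasq-metrics makes. The sketch below builds such a query with only the Python standard library and reports a timeout the same way the log line above does. The server address, port, query ID and 2-second timeout are illustrative defaults, not values taken from kube-dns.

```python
import socket
import struct

QTYPE_TXT = 16     # TXT record type
QCLASS_CHAOS = 3   # CHAOS class, used by dnsmasq for its *.bind statistics


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed DNS labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name: str, qtype: int, qclass: int, query_id: int = 0x1234) -> bytes:
    """Build a single-question DNS query (header + question section)."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", qtype, qclass)
    return header + question


def probe(server: str = "127.0.0.1", port: int = 53,
          name: str = "hits.bind", timeout: float = 2.0):
    """Send one CHAOS TXT query over UDP; return the raw reply, or None on timeout."""
    query = build_query(name, QTYPE_TXT, QCLASS_CHAOS)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, port))
        try:
            reply, _ = sock.recvfrom(512)
        except socket.timeout:
            # Same condition dnsmasq-metrics reports as "i/o timeout".
            return None
    return reply
```

Running probe() in a loop from inside the kube-dns pod while applications are logging DNS errors would show whether dnsmasq itself stops answering during those windows.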
It's worth noting that during these events we do NOT see the following log line from dnsmasq, which makes me believe we are not hitting its concurrent-query threshold:
dnsmasq: Maximum number of concurrent DNS queries reached (max: 150)
I feel pretty confident that our current DNS errors are related to dnsmasq refusing connections intermittently. I'm curious whether other users are seeing the same pattern: the kube-dns pod logs the error from dnsmasq-metrics, and during the same time frame applications in the cluster log DNS errors.
Additionally, if anyone has ideas on what to do next to find out exactly why dnsmasq is refusing connections, I'd appreciate them. I'm considering running dnsmasq in debug mode, but I'm also worried that debug mode will introduce other problems. Another option we are considering is slowly rolling out CoreDNS (https://github.com/coredns/coredns).

You provide a lot of cluster domains. Each cluster domain will be inserted into the local /etc/resolv.conf and used. For every domain in resolv.conf there will be a separate DNS request, so in your case every lookup could turn into 10+ DNS queries.
--address=/com.cluster.local/com.svc.cluster.local/com.kube-system.svc.cluster.local/<ourdomain>.com.cluster.local/<ourdomain>.com.svc.cluster.local/<ourdomain>.com.kube-system.svc.cluster.local/com.ec2.internal/ec2.internal.kube-system.svc.cluster.local/ec2.internal.svc.cluster.local/ec2.internal.cluster.local/
My suggestion would be to reduce the number of cluster domains to just cluster.local.
What is your reason for providing multiple cluster domains?
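The expansion described above follows the resolver's search-list rules from resolv.conf(5): a name with fewer dots than ndots is tried with each search domain appended before being tried as-is, a name with at least ndots dots is tried as-is first, and a name ending in a dot is never expanded. A minimal sketch of that ordering, assuming ndots:5 (the usual value in pod resolv.conf files) and a hypothetical search list:

```python
def candidate_queries(name: str, search: list[str], ndots: int = 5) -> list[str]:
    """Return the fully qualified names a glibc-style stub resolver would try, in order."""
    if name.endswith("."):
        return [name]  # absolute name: no search-list expansion
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = f"{name}."
    if name.count(".") >= ndots:
        return [absolute] + expanded  # enough dots: try the name as-is first
    return expanded + [absolute]      # otherwise search domains first, as-is last


# Hypothetical pod search list (namespace "default" on an EC2-hosted node).
SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local", "ec2.internal"]

for fqdn in candidate_queries("api.example.com", SEARCH):
    print(fqdn)
```

For api.example.com this yields five names, and many resolvers send A and AAAA lookups separately, so each candidate may be queried twice.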

Related

(Kubernetes HPA) Scaling up pods got connection refused

I have traffic that can push CPU usage up to 180%. I tried using a single pod, which works, but the response was extremely slow. When I configured my HPA with cpu=80%, min=1 and max={2 or more}, I hit connection refused errors while the HPA was creating more pods. Setting min to a larger value (i.e. min=3) relieved the connection refused errors, but then there are too many idle pods when traffic is low. Is there any way to stop a pod from being put online until it has completely started?
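For context on why new pods appear exactly when load is highest: the Kubernetes HPA documentation gives the core scaling rule as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with no action taken when the ratio is within a tolerance (0.1 by default). A simplified sketch of that rule; it ignores the readiness handling, missing-metric handling and stabilization windows that the real controller also applies:

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, min_replicas: int,
                     max_replicas: int, tolerance: float = 0.1) -> int:
    """Simplified HPA rule: scale by the metric ratio, then clamp to [min, max]."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no scaling
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(1, 180, 80, min_replicas=1, max_replicas=5))
```

With the numbers from the question (1 pod at 180% CPU, 80% target) this returns 3, so two new pods join while traffic is at its peak; those are exactly the pods that must not receive requests before they are ready.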
I hit connection refused when HPA was creating more pods
Kubernetes uses the readinessProbe to determine whether to route client traffic to a Pod. If a Pod's readinessProbe is not successful, a Service whose selector matches that Pod will not send traffic to it.
If no readinessProbe is defined, or if it is misconfigured, Pods that are still starting up may end up receiving client requests. Connection refused suggests that no process was listening for incoming connections yet.
Please share your deployment/statefulset/..., if you need further assistance setting this up.
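As an illustration of "ready only when actually started", here is a minimal sketch of an application-side readiness endpoint using Python's standard http.server. The /ready path, and the idea of pointing an httpGet readinessProbe at it, are assumptions for the example rather than details from the question; the point is that the endpoint returns 503 until the app calls mark_ready() after its warm-up.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Readiness:
    """Tracks whether the application has finished starting up."""

    def __init__(self):
        self._ready = threading.Event()

    def mark_ready(self):
        self._ready.set()

    def status(self):
        return (200, b"ready") if self._ready.is_set() else (503, b"starting")


def make_handler(readiness):
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Hypothetical probe path; anything else is a 404 in this sketch.
            code, body = readiness.status() if self.path == "/ready" else (404, b"not found")
            self.send_response(code)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep probe traffic out of the logs

    return Handler
```

In the container you would start ThreadingHTTPServer(("0.0.0.0", 8080), make_handler(readiness)) in a background thread, run the warm-up, then call readiness.mark_ready(); port 8080 is hypothetical.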

Intermittent failure of K8S DNS resolver / dial udp / Operation cancelled

We have a medium-sized Kubernetes cluster. Imagine a situation where approximately 70 pods are connecting to a single socket server. It works fine most of the time; however, from time to time one or two pods just fail to resolve the Kubernetes DNS name, and the lookup times out with the following error:
Error: dial tcp: lookup thishost.production.svc.cluster.local on 10.32.0.10:53: read udp 100.65.63.202:36638->100.64.209.61:53: i/o timeout at
We noticed that this is not the only service that fails intermittently; other services experience it from time to time. We used to ignore it, since it was very random and rare, but in the case above it is very noticeable. The only fix is to actually kill the faulty pod (restarting it doesn't help).
Has anyone experienced this? Do you have any tips on how to debug it/ fix?
It almost feels as if it's beyond our expertise and is fully related to the internals of the DNS resolver.
Kubernetes version: 1.23.4
Container Network: cilium
This issue is most probably related to the CNI.
I suggest following this guide to debug the issue:
https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/
To help you further, we need more information:
Is this cluster on-premises or in the cloud?
What are you using for CNI?
How many nodes are running, and are they all in the same subnet? If yes, do they have other interfaces?
Share the output of the following command:
kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o wide
When you restart the pod to temporarily fix the issue, does it stay on the same node or does it move to another one?
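To quantify how often a given pod fails, and to compare pods on different nodes as asked above, a small loop run inside the pod can help. This sketch resolves a name repeatedly through socket.getaddrinfo, which uses the pod's normal resolver path and /etc/resolv.conf, and tallies failures by error code. The attempt count and interval are arbitrary; getaddrinfo has no timeout parameter, so the per-lookup timeout comes from the resolver configuration.

```python
import socket
import time
from collections import Counter


def probe_resolution(name, attempts=100, interval=0.1, resolve=socket.getaddrinfo):
    """Resolve `name` repeatedly; return (tally of outcomes, slowest lookup in seconds)."""
    tally = Counter()
    slowest = 0.0
    for _ in range(attempts):
        start = time.monotonic()
        try:
            resolve(name, None)  # goes through the normal libc resolver
            tally["ok"] += 1
        except socket.gaierror as exc:
            # errno carries the EAI_* code, so different failure kinds are counted separately.
            tally[f"gaierror {exc.errno}"] += 1
        slowest = max(slowest, time.monotonic() - start)
        if interval:
            time.sleep(interval)
    return tally, slowest
```

For example, running probe_resolution("thishost.production.svc.cluster.local") in a faulty pod and in a healthy pod on another node should show whether the failures follow the pod, the node, or neither.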

kubernetes, pods can not communicate via domain name

Recently our domain was down for some reason, but only the domain name was affected; the Kubernetes cluster wasn't changed at all.
Now the pods cannot communicate via domains and sub-domains. By IP they work (curl ip-to-any-pod is OK), but curl sub-domain.domain.com doesn't. It says curl: (6) Could not resolve host: sub-domain.domain.com
What's strange is that it works sometimes and sometimes it doesn't.
I have gone through every related issue on the internet but cannot find anything specific, and the logs, events, etc. don't tell me anything either.
I restarted my pods and the Calico network pods, but nothing has changed.
I got this message once while restarting one of my pods:
Warning FailedCreatePodSandBox 45s kubelet, ip-xxx-xx-xx-xx.ap-south-1.compute.internal Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "db2249c98d0b8b4bbef79ac5cd7e5c36c957f3929637093268670e7002c2467f" network for pod "web-6576f9fcdc-kt9xw": NetworkPlugin cni failed to set up pod "web-6576f9fcdc-kt9xw_hc" network: dial tcp: lookup etcd-a.internal.cluster.xxxx.xx on xxx.xx.x.x:53: no such host, failed to clean up sandbox container "db2249c98d0b8b4bbef79ac5cd7e5c36c957f3929637093268670e7002c2467f" network for pod "web-6576f9fcdc-kt9xw": NetworkPlugin cni failed to teardown pod "web-6576f9fcdc-kt9xw_hc" network: dial tcp: lookup etcd-a.internal.cluster.xx.xx on xxx.xx.x.x:53: no such host]
When you set up a domain, it often takes time for its records to propagate, and they propagate non-uniformly. It's common that immediately after creating a record you cannot resolve it at all, a little later resolution is flaky, and eventually it stabilizes. Sometimes DNS takes tens of hours to propagate.
There are various articles online you can find from an Internet search which explain why DNS propagation can take so much time. There are also neat tools like DNS Checker which can give you a sense for how well your DNS records have propagated globally.
As you confirmed in the comments, your problems went away by the next day.
In my opinion your question is quite complex and can't be answered simply.
Please refer to:
CoreDNS
Cluster-specific issues, e.g. in Kubernetes 1.15.3 (you can verify these settings in your environment):
Default TTL for DNS records in kubernetes zone has been changed from 5s to 30s to keep consistent with old dnsmasq based kube-dns. The TTL can be customized with the command kubectl edit -n kube-system configmap/coredns
Reverted the CoreDNS version to 1.3.1 for kubeadm cluster-dns
CoreDNS configuration equivalent to kube-dns
First, start debugging your cluster and verify whether your problem is related to your domain settings or is an issue internal to the cluster.
Debugging DNS Resolution
Check the local DNS configuration in /etc/resolv.conf inside your pod.
Check for errors in the DNS (CoreDNS) pods.
To obtain more information about DNS resolution you can use tools such as nslookup, dig and traceroute.
Example:
nslookup -type=a [domain.com]
Querying a specific name server:
nslookup -type=a [domain.com] [ns server]
These tools also tell you whether an answer is authoritative or non-authoritative.
An authoritative name server is one that holds the original zone files for a domain.
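As far as we know, the "Non-authoritative answer" line that nslookup prints is derived from the AA (Authoritative Answer) bit in the reply header. If you capture raw replies (for example with a packet capture), this sketch decodes the 12-byte header defined in RFC 1035, section 4.1.1, including the AA bit and the response code:

```python
import struct


def parse_header(message: bytes) -> dict:
    """Decode the fixed 12-byte DNS message header (RFC 1035, section 4.1.1)."""
    if len(message) < 12:
        raise ValueError("DNS message shorter than its 12-byte header")
    msg_id, flags, qdcount, ancount, nscount, arcount = struct.unpack("!HHHHHH", message[:12])
    return {
        "id": msg_id,
        "is_response": bool(flags & 0x8000),          # QR bit
        "authoritative": bool(flags & 0x0400),        # AA bit
        "truncated": bool(flags & 0x0200),            # TC bit
        "recursion_available": bool(flags & 0x0080),  # RA bit
        "rcode": flags & 0x000F,                      # 0 = NOERROR, 3 = NXDOMAIN
        "answers": ancount,
    }
```

A reply from a recursive resolver such as CoreDNS forwarding upstream will normally have authoritative set to False, while querying the zone's own name server directly should return True.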
Because this is very important in a production environment, try to reproduce the issue so you can keep your services healthy in the future.
Hope this helps.

DNS problem on AWS EKS when running in private subnets

I have an EKS cluster set up in a VPC. The worker nodes are launched in private subnets. I can successfully deploy pods and services.
However, I'm not able to perform DNS resolution from within the pods. (It works fine on the worker nodes, outside the container.)
Troubleshooting using https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/ results in the following from nslookup (timeout after a minute or so):
Server: 172.20.0.10
Address 1: 172.20.0.10
nslookup: can't resolve 'kubernetes.default'
When I launch the cluster in an all-public VPC, I don't have this problem. Am I missing any necessary steps for DNS resolution from within a private subnet?
Many thanks,
Daniel
I feel like I have to give this a proper answer, because coming upon this question ended 10 straight hours of debugging for me. As @Daniel said in his comment, the issue I found was my network ACL blocking outbound traffic on UDP port 53, which Kubernetes uses to resolve DNS records.
The process was especially confusing for me because one of my pods actually worked the entire time, since (I think?) it happened to be in the same zone as the Kubernetes DNS resolver.
To elaborate on the comment from @Daniel, you need:
1. an ingress rule for UDP port 53
2. an ingress rule for UDP on ephemeral ports (e.g. 1025–65535)
I hadn't added (2) and was seeing CoreDNS receive requests and try to respond, but the response wasn't getting back to the requester.
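Network ACLs are stateless, so the reply is evaluated against the rules as a separate packet, addressed to the client's ephemeral port rather than to port 53. This toy model (no rule numbers or explicit denies) assumes the same NACL sits on both the client's and CoreDNS's subnets and that outbound traffic is already allowed; it reproduces the symptom described above, where the query gets in but the answer does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A simplified NACL inbound allow rule: protocol plus inclusive destination-port range."""
    protocol: str
    port_from: int
    port_to: int


def admits(inbound_rules, protocol, dest_port):
    """True if any inbound allow rule matches the packet's protocol and destination port."""
    return any(r.protocol == protocol and r.port_from <= dest_port <= r.port_to
               for r in inbound_rules)


def dns_exchange(inbound_rules, client_port, dns_port=53):
    """Return (query_admitted, reply_admitted) for one UDP DNS round trip."""
    query_admitted = admits(inbound_rules, "udp", dns_port)     # client -> CoreDNS, to port 53
    reply_admitted = admits(inbound_rules, "udp", client_port)  # CoreDNS -> client, to ephemeral port
    return query_admitted, reply_admitted


only_dns = [Rule("udp", 53, 53)]
print(dns_exchange(only_dns, 45000))
print(dns_exchange(only_dns + [Rule("udp", 1025, 65535)], 45000))
```

The client port 45000 is just an example inside the Linux default ephemeral range (32768-60999).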
A tip for others dealing with these kinds of issues: turn on CoreDNS logging by adding the log plugin to the Corefile in the coredns ConfigMap, which I was able to do with kubectl edit configmap -n kube-system coredns. See the CoreDNS docs on this (https://github.com/coredns/coredns/blob/master/README.md#examples). Logging can help you figure out whether the problem is CoreDNS not receiving queries or the response not getting back.
I ran into this as well. I have multiple node groups, and each one was created from a CloudFormation template. The CloudFormation template created a security group for each node group that allowed the nodes in that group to communicate with each other.
The DNS error resulted from Pods running in separate node groups from the CoreDNS Pods, so those Pods were unable to reach CoreDNS (network communication was only permitted within node groups). I will make a new CloudFormation template for the node security group so that all the node groups in my cluster can share the same security group.
I resolved the issue for now by allowing inbound UDP traffic on port 53 for each of my node group security groups.
I struggled with this issue for a couple of hours as well (I lost track of time).
I am using the default VPC, but with the worker nodes inside private subnets, and DNS wasn't working.
I went through the amazon-vpc-cni-k8s repository and found the solution.
We had to set the environment variable AWS_VPC_K8S_CNI_EXTERNALSNAT=true on the aws-node DaemonSet.
You can either get the new YAML and apply it, or just change it through the dashboard. However, for it to take effect you have to restart the worker node instances so the IP route tables are refreshed.
Issue link is here
Thanks
Re: AWS EKS Kube Cluster and Route53 internal/private Route53 queries from pods
Just wanted to post a note on what we needed to do to resolve our issues. Note that YMMV: everyone has different environments, resolutions, etc.
Disclaimer:
We're using the community Terraform EKS module to deploy and manage VPCs and the EKS clusters. We didn't need to modify any security groups. We are working with multiple clusters, regions, and VPCs.
ref:
Terraform EKS module
CoreDNS Changes:
We have a DNS relay for private internal zones, so we needed to modify the coredns ConfigMap and add the DNS relay's IP address:
...
ec2.internal:53 {
    errors
    cache 30
    forward . 10.1.1.245
}
foo.dev.com:53 {
    errors
    cache 30
    forward . 10.1.1.245
}
foo.stage.com:53 {
    errors
    cache 30
    forward . 10.1.1.245
}
...
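As far as we understand CoreDNS's routing, each query goes to the server block whose zone is the most specific (longest, label-aligned) suffix match for the query name, so the stanzas above only capture their own zones while everything else falls through to the default "." block, which we assume lives in the elided part of the ConfigMap. A sketch of that selection:

```python
from typing import List, Optional


def labels(name: str) -> List[str]:
    """Lower-cased labels of a domain name; the root zone "." has none."""
    name = name.lower().rstrip(".")
    return name.split(".") if name else []


def select_zone(qname: str, zones: List[str]) -> Optional[str]:
    """Return the most specific zone whose labels are a suffix of qname's labels."""
    q = labels(qname)
    best, best_len = None, -1
    for zone in zones:
        z = labels(zone)
        is_suffix = len(z) <= len(q) and q[len(q) - len(z):] == z
        if is_suffix and len(z) > best_len:
            best, best_len = zone, len(z)
    return best


# Zones from the snippet above, plus an assumed default "." block.
ZONES = [".", "ec2.internal", "foo.dev.com", "foo.stage.com"]
print(select_zone("db.foo.dev.com", ZONES))
```

Matching on whole labels matters: barfoo.dev.com is not inside foo.dev.com, so it goes to the default block.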
VPC DHCP option sets:
Update the option set with the IP of the relay server above, if applicable. This requires creating a new option set, because DHCP option sets cannot be modified after creation.
Our DHCP options set looks like this:
["AmazonProvidedDNS", "10.1.1.245", "169.254.169.253"]
ref: AWS DHCP Option Sets
Route-53 Updates:
Associate every Route 53 zone with the VPC ID it needs to be associated with (the VPC where our Kubernetes cluster resides and from which the pods make queries).
There is also a Terraform resource for that:
https://www.terraform.io/docs/providers/aws/r/route53_zone_association.html
We ran into a similar issue where DNS resolution timed out on some pods, but re-creating the pod a couple of times resolved the problem. Also, it was not every pod on a given node showing issues, only some of them.
It turned out to be due to a bug in version 1.5.4 of the Amazon VPC CNI; more details here: https://github.com/aws/amazon-vpc-cni-k8s/issues/641.
The quick solution is to revert to the recommended version 1.5.3: https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html
Like many others, I struggled with this bug for a few hours.
In my case the issue was this bug https://github.com/awslabs/amazon-eks-ami/issues/636 that basically sets up an incorrect DNS when you specify endpoint and certificate but not certificate.
To confirm, check:
That you have connectivity (NACLs and security groups) allowing DNS over both TCP and UDP. For me, the best way was to SSH into a worker node and see whether names resolve (with nslookup). If they don't, the cause is most likely the NACL or a security group, but also check that the DNS nameserver on the node is configured correctly.
If you can get name resolution in the node, but not inside the pod, check that the nameserver in /etc/resolv.conf points to an IP in your service network (if you see 172.20.0.10, your service network should be 172.20.0.0/24 or so)
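A small sketch of that check: it parses resolv.conf as described in resolv.conf(5) (comment lines start with # or ;, and the last search or domain line wins) and tests each nameserver against a service CIDR with the standard ipaddress module. The CIDR you pass in is an assumption you must take from your own cluster configuration.

```python
import ipaddress


def parse_resolv_conf(text):
    """Parse the nameserver, search/domain and options lines of a resolv.conf file."""
    conf = {"nameservers": [], "search": [], "options": {}}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment line
        key, *values = line.split()
        if key == "nameserver" and values:
            conf["nameservers"].append(values[0])
        elif key in ("search", "domain"):
            conf["search"] = values  # the last search/domain line wins
        elif key == "options":
            for opt in values:
                name, _, value = opt.partition(":")
                conf["options"][name] = value or True  # e.g. {"ndots": "5"}
    return conf


def nameservers_outside(conf, service_cidr):
    """List nameservers that do not fall inside the given service CIDR."""
    network = ipaddress.ip_network(service_cidr)
    return [ns for ns in conf["nameservers"] if ipaddress.ip_address(ns) not in network]
```

Inside a pod: nameservers_outside(parse_resolv_conf(open("/etc/resolv.conf").read()), "172.20.0.0/16"). A non-empty list means the pod is pointed at a DNS server outside the service network you expected.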

First request to a new ReplicaSet times out

I have a Kubernetes cluster on AWS, set up with kops.
I set up a Deployment that runs an Apache container and a Service for the Deployment (type: LoadBalancer).
When I update the deployment by running kubectl set image ..., as soon as the first pod of the new ReplicaSet becomes ready, the first couple of requests to the service time out.
Things I have tried:
I set up a readinessProbe on the pod, works.
I ran curl localhost on a pod, works.
I performed a DNS lookup for the service, works.
If I curl the IP returned by that DNS lookup inside a pod, the first request will timeout. This tells me it's not an ELB issue.
It's really frustrating since otherwise our Kubernetes stack is working great, but every time we deploy our application we run the risk of a user timing out on a request.
After a lot of debugging, I think I've solved this issue.
TL;DR: Apache has to exit gracefully.
I found a couple of related issues:
https://github.com/kubernetes/kubernetes/issues/47725
https://github.com/kubernetes/ingress-nginx/issues/69
504 Gateway Timeout - Two EC2 instances with load balancer
Some more things I tried:
Increase the KeepAliveTimeout on Apache, didn't help.
Ran curl on the pod IP and node IPs, worked normally.
Set up a selector-less ExternalName service for a couple of external dependencies, thinking it might have something to do with DNS lookups; didn't help.
The solution:
I set up a preStop lifecycle hook on the pod that runs apachectl -k graceful-stop, so Apache terminates gracefully.
The issue (at least from what I can tell) is that when pods are taken down during a deployment, they receive a TERM signal, which causes Apache to immediately kill all of its children. This can cause a race condition where kube-proxy still sends some traffic to pods that have received a TERM signal but have not terminated completely.
Also got some help from this blog post on how to set up the hook.
I also recommend increasing terminationGracePeriodSeconds in the PodSpec so Apache has enough time to exit gracefully.
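The fix above is Apache-specific. For an application you write yourself, a comparable approach (a swapped-in technique, not what this answer used) is to treat SIGTERM as "start draining" rather than "exit now": keep serving for a few seconds while kube-proxy stops routing to the pod, then wait for in-flight requests to finish. A minimal Python sketch; the 5-second drain delay is an arbitrary assumption, and the delay plus drain time must fit inside terminationGracePeriodSeconds.

```python
import signal
import threading
import time


class Drainer:
    """Delay shutdown after SIGTERM so late-routed and in-flight requests can finish."""

    def __init__(self, drain_seconds=5.0):
        self.drain_seconds = drain_seconds  # assumed; tune to your endpoint propagation delay
        self.terminating = threading.Event()
        self._inflight = 0
        self._lock = threading.Lock()
        self._idle = threading.Condition(self._lock)

    def install(self):
        """Register the SIGTERM handler (must be called from the main thread)."""
        signal.signal(signal.SIGTERM, lambda signum, frame: self.terminating.set())

    def request_started(self):
        with self._lock:
            self._inflight += 1

    def request_finished(self):
        with self._lock:
            self._inflight -= 1
            if self._inflight == 0:
                self._idle.notify_all()

    def wait_until_drained(self, timeout=None):
        """Block until SIGTERM, keep serving for drain_seconds, then wait for in-flight work.

        Returns True if all requests finished, False if `timeout` expired first.
        """
        self.terminating.wait()
        time.sleep(self.drain_seconds)  # kube-proxy may still route new requests here briefly
        with self._lock:
            return self._idle.wait_for(lambda: self._inflight == 0, timeout=timeout)
```

Call install() in the main thread at startup, wrap each request handler in request_started()/request_finished(), and call wait_until_drained() before shutting the server down.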