Alpine container not able to resolve hostname - kubernetes

I have a Kubernetes StatefulSet deployment with 2 worker nodes. I was able to bring the containers up on those 2 nodes using a headless service, so the pods have the hostnames
abc-0.abc.default.svc.cluster.local
abc-1.abc.default.svc.cluster.local
The problem is that I am not able to ping abc-0 from abc-1 (or the other way around) using the hostname. If I use IP addresses then everything works fine.
There were issues related to DNS resolution with Alpine Linux at some point (https://github.com/gliderlabs/docker-alpine/issues/8#issuecomment-172594887), but it seems that was fixed in Alpine Linux 3.4.
What would be a good way to verify that the problem I am facing is not caused by Alpine Linux but by some misconfiguration in kube-dns or something else?

I assume you mean ping abc-0; did you try ping abc-0.abc?
The resolv.conf is not enriched with search domains for each StatefulSet's headless service, only the namespace one, so when you try to access just abc-0 it actually resolves to abc-0.default.svc.cluster.local.
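For reference, a pod's /etc/resolv.conf typically looks something like this (the nameserver IP depends on your cluster; this is just a typical example):

nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

That is why abc-0 expands to abc-0.default.svc.cluster.local, while abc-0.abc expands to abc-0.abc.default.svc.cluster.local, which is the record the headless service actually creates.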

The problem was that the service I was running didn't get any endpoints assigned to it. I am using v1.7.3, so I resolved the problem by adding a selector to the headless service. Also, it looks like this is no longer a problem in v1.8.3.
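For anyone hitting the same thing, a minimal headless Service with a selector looks roughly like this (the name, label, and port are placeholders matching the example above, and the selector must match the labels on the StatefulSet's pod template):

apiVersion: v1
kind: Service
metadata:
  name: abc
spec:
  clusterIP: None   # headless: DNS returns the individual pod records
  selector:
    app: abc        # must match the StatefulSet pod labels, or no endpoints are created
  ports:
  - name: web
    port: 80

Without the selector the service gets no endpoints, and the per-pod abc-0.abc... records are never published.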

Related

K8s does strange networking stuff that breaks application designs

I discovered strange behavior with K8s networking that can break some application designs completely.
I have two pods and one Service
Pod 1 is a stupid Reverse Proxy (I don't know the implementation)
Pod 2 is a Webserver
The mentioned Service belongs to pod 2, the webserver
After the initial start of my stack, I discovered that Pod 1 (the reverse proxy) is not able to reach the webserver on the first attempt for some reason, although ping and curl work fine.
Then I tried wget mywebserver inside of Pod 1 (the reverse proxy) and got back the following:
wget mywebserver
--2020-11-16 20:07:37-- http://mywebserver/
Resolving mywebserver (mywebserver)... 10.244.0.34, 10.244.0.152, 10.244.1.125, ...
Connecting to mywebserver (mywebserver)|10.244.0.34|:80... failed: Connection refused.
Connecting to mywebserver (mywebserver)|10.244.0.152|:80... failed: Connection refused.
Connecting to mywebserver (mywebserver)|10.244.1.125|:80... failed: Connection refused.
Connecting to mywebserver (mywebserver)|10.244.2.177|:80... connected.
Where 10.244.2.177 is the Pod IP of the Webserver.
The problem, it seems to me, is that the reverse proxy does not retry forwarding the packet; it only tries once, fails like in the wget output above, and the request gets dropped because the backend is not reachable, apparently due to the fancy K8s iptables handling...
If I configure the reverse proxy not to use the Service DNS name and instead use the Pod IP (10.244.2.177) directly, everything works fine and as expected.
I already tried this with a variety of CNI providers (Flannel, Calico, Canal, Weave, and also Cilium, where kube-proxy is not used), but all of them showed the same failure, and all of them do fancy routing that is hard to understand out of the box. So my question is: how can I make K8s routing work immediately at this point? I already reimplemented my whole stack on docker-swarm just to see if it works, and it does, flawlessly! So this issue seems to have something to do with the K8s routing scheme.
Just to exclude misconfiguration on my side, I also tried this with different ready-to-use K8s solutions like managed K8s from DigitalOcean and self-hosted RKE. All show the same behavior.
Does somebody maybe have an idea what the problem might be and how to fix this behavior of K8s?
It might also be very useful to know what actually happens during the wget request, as this remains a mystery to me.
Many thanks in advance!
It turned out that I had several misconfigurations in my K8s deployment.
I first removed clusterIP: None, as this leads to the behavior wget shows in my question above. Besides that, I had set the app: and tier: labels wrong in my deployment. Anyway, now everything is working fine and wget gets a proper connection.
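For illustration, the crucial part is that the Service selector matches the pod template labels exactly; a hedged sketch (mywebserver is the name from the wget output above, the label values are placeholders):

apiVersion: v1
kind: Service
metadata:
  name: mywebserver
spec:
  # no clusterIP: None here, so the Service gets a stable virtual IP
  selector:
    app: mywebserver   # must match the Deployment's pod template labels
    tier: backend
  ports:
  - port: 80
    targetPort: 80

With a headless service (clusterIP: None) the name resolves directly to all matching pod IPs, which is exactly the multi-IP behavior the wget output shows.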
Thanks again

DNS problem on AWS EKS when running in private subnets

I have an EKS cluster setup in a VPC. The worker nodes are launched in private subnets. I can successfully deploy pods and services.
However, I'm not able to perform DNS resolution from within the pods. (It works fine on the worker nodes, outside the container.)
Troubleshooting using https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/ results in the following from nslookup (timeout after a minute or so):
Server: 172.20.0.10
Address 1: 172.20.0.10
nslookup: can't resolve 'kubernetes.default'
When I launch the cluster in an all-public VPC, I don't have this problem. Am I missing any necessary steps for DNS resolution from within a private subnet?
Many thanks,
Daniel
I feel like I have to give this a proper answer because coming upon this question was the answer to 10 straight hours of debugging for me. As @Daniel said in his comment, the issue I found was my ACL blocking outbound traffic on UDP port 53, which Kubernetes uses to resolve DNS records.
The process was especially confusing for me because one of my pods actually worked the entire time, since (I think?) it happened to be in the same zone as the Kubernetes DNS resolver.
To elaborate on the comment from @Daniel, you need:
1. an ingress rule for UDP port 53
2. an ingress rule for UDP on ephemeral ports (e.g. 1025–65535)
I hadn't added (2) and was seeing CoreDNS receiving requests and trying to respond, but the response wasn't getting back to the requester.
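As a hedged sketch (the ACL ID, rule numbers, and CIDR are placeholders), the two NACL entries could be added with the AWS CLI roughly like this:

# 1. allow DNS queries in (UDP 53); protocol 17 is UDP
aws ec2 create-network-acl-entry --network-acl-id acl-0123456789abcdef0 \
  --ingress --rule-number 100 --protocol 17 --port-range From=53,To=53 \
  --cidr-block 10.0.0.0/16 --rule-action allow
# 2. allow the UDP responses back in on ephemeral ports
aws ec2 create-network-acl-entry --network-acl-id acl-0123456789abcdef0 \
  --ingress --rule-number 110 --protocol 17 --port-range From=1025,To=65535 \
  --cidr-block 10.0.0.0/16 --rule-action allow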
A tip for others dealing with these kinds of issues: turn on CoreDNS logging by adding the log configuration to the configmap, which I was able to do with kubectl edit configmap -n kube-system coredns. See the CoreDNS docs on this: https://github.com/coredns/coredns/blob/master/README.md#examples. This can help you figure out whether the issue is with CoreDNS receiving the queries or with sending the responses back.
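For instance, the Corefile inside that configmap just gains a single log line; everything else stays as it is (abbreviated here, this is only a sketch of the edit, not the full default config):

.:53 {
    log        # the added line; logs every query to stdout
    errors
    health
    ...        # rest of the existing Corefile unchanged
}

After that, kubectl logs on the coredns pods in kube-system shows every query CoreDNS receives.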
I ran into this as well. I have multiple node groups, and each one was created from a CloudFormation template. The CloudFormation template created a security group for each node group that allowed the nodes in that group to communicate with each other.
The DNS error resulted from Pods running in separate node groups from the CoreDNS Pods, so the Pods were unable to reach CoreDNS (network communication was only permitted within a node group). I will make a new CloudFormation template for the node security group so that all the node groups in my cluster can share the same security group.
I resolved the issue for now by allowing inbound UDP traffic on port 53 for each of my node group security groups.
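A hedged sketch of that security group change with the AWS CLI (the group IDs are placeholders; one rule per node group security group, allowing UDP 53 in from the other node group's security group):

aws ec2 authorize-security-group-ingress \
  --group-id sg-0aaaaaaaaaaaaaaaa \
  --protocol udp --port 53 \
  --source-group sg-0bbbbbbbbbbbbbbbb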
So I have been struggling with this issue as well for a couple of hours, I think; I lost track of time.
Since I am using the default VPC but with the worker nodes inside a private subnet, it wasn't working.
I went through amazon-vpc-cni-k8s and found the solution.
We have to set the environment variable AWS_VPC_K8S_CNI_EXTERNALSNAT=true on the aws-node daemonset.
You can either get the new yaml and apply it, or just fix it through the dashboard. However, for it to work you have to restart the worker node instances so the IP route tables are refreshed.
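If you prefer the command line, a hedged one-liner that should achieve the same edit:

kubectl set env daemonset aws-node -n kube-system AWS_VPC_K8S_CNI_EXTERNALSNAT=true

Then restart (or replace) the worker node instances, as noted above.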
The issue link is here.
Thanks!
Re: AWS EKS Kube Cluster and Route53 internal/private Route53 queries from pods
Just wanted to post a note on what we needed to do to resolve our issues. Noting that YMMV and everyone has different environments and resolutions, etc.
Disclaimer:
We're using the community Terraform EKS module to deploy/manage the VPCs and the EKS clusters. We didn't need to modify any security groups. We are working with multiple clusters, regions, and VPCs.
ref:
Terraform EKS module
CoreDNS Changes:
We have a DNS relay for private internal DNS, so we needed to modify the coredns configmap and add in the DNS relay IP address:
...
ec2.internal:53 {
errors
cache 30
forward . 10.1.1.245
}
foo.dev.com:53 {
errors
cache 30
forward . 10.1.1.245
}
foo.stage.com:53 {
errors
cache 30
forward . 10.1.1.245
}
...
VPC DHCP option sets:
Update these with the IP of the above relay server if applicable. This requires creating a new option set, as existing ones cannot be modified.
Our DHCP options set looks like this:
["AmazonProvidedDNS", "10.1.1.245", "169.254.169.253"]
ref: AWS DHCP Option Sets
Route-53 Updates:
Associate every Route 53 zone with the VPC ID you need it to be resolvable from (the VPC where our kube cluster resides and from which the pods make their queries).
There is also a Terraform resource for that:
https://www.terraform.io/docs/providers/aws/r/route53_zone_association.html
We had run into a similar issue where DNS resolution times out on some of the pods, but re-creating the pod a couple of times resolves the problem. Also, it's not every pod on a given node showing issues, only some pods.
It turned out to be due to a bug in version 1.5.4 of the Amazon VPC CNI; more details here: https://github.com/aws/amazon-vpc-cni-k8s/issues/641.
The quick solution is to revert to the recommended version 1.5.3: https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html
Like many others, I've been struggling with this bug for a few hours.
In my case the issue was this bug, https://github.com/awslabs/amazon-eks-ami/issues/636, which basically sets up an incorrect DNS address when you specify the endpoint and certificate but not the cluster DNS IP.
To confirm, check
That you have connectivity (NACLs and security groups) allowing DNS on TCP and UDP. For me the easiest way was to SSH into a node and see if it resolves (nslookup). If it doesn't resolve, it is most likely either the NACL or the SG; also check that the DNS nameserver on the node is correctly configured.
If you can get name resolution in the node, but not inside the pod, check that the nameserver in /etc/resolv.conf points to an IP in your service network (if you see 172.20.0.10, your service network should be 172.20.0.0/24 or so)
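A quick, hedged way to run those in-pod checks (busybox 1.28 is used because the nslookup in later busybox images is known to be unreliable; the pod name is a placeholder):

# does cluster DNS answer from inside a pod?
kubectl run -it --rm --restart=Never dnstest --image=busybox:1.28 -- nslookup kubernetes.default
# which nameserver do pods actually get?
kubectl exec -it <some-pod> -- cat /etc/resolv.conf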

Can't get to GCE instance from k8s pods on the same subnet

I have a cluster with container range 10.101.64.0/19 on a network A and subnet SA with range 10.101.0.0/18. On the same subnet there is a VM in GCE with IP 10.101.0.4, and it can be pinged just fine from within the cluster, e.g. from a node with 10.101.0.3. However, if I go to a pod on this node, which got the address 10.101.67.191 (expected, since this node assigns addresses from 10.101.67.0/24 or so), I don't get a meaningful answer from the VM I want to reach from this pod. Using tcpdump on ICMP, I can see that when I ping that VM from the pod, the ping gets there, but the reply never arrives back in the pod. It seems like the VM is just throwing it away.
Any idea how to resolve it? Some routes or firewalls? I am using the same topology in the default subnet created by Kubernetes, where this works, but I cannot find anything relevant that could explain this (there are some routes and firewall rules that could influence it, but I wasn't successful when trying to mimic them in my subnet).
I think it is a firewall issue.
I've already provided the solution on Stack Overflow here; it may help to solve your case.
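If it is indeed the firewall, a hedged sketch of a GCE rule that lets the pod range reach VMs on the network (the rule name is a placeholder; network A and the 10.101.64.0/19 container range are taken from the question):

gcloud compute firewall-rules create allow-pod-range \
  --network=A \
  --allow=tcp,udp,icmp \
  --source-ranges=10.101.64.0/19

The idea is simply to allow traffic whose source IP is a pod IP, since pod traffic can reach the VM with the pod address rather than the node address.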

Pass IP of kube-dns as env to container

Is there a way to pass the kube-dns server IP to the container so that services inside the container can resolve the names properly?
I am trying to run nginx and it needs a resolver directive to be specified to resolve names against a DNS server.
I do not want to use public DNS servers; only the one provided by kube-dns.
Also, I need a dynamic way to pass the IP as the DNS server IP can change across various cloud platforms or bare-metal configurations. So, I cannot use a hardcoded 10.0.0.10 IP.
Alright, it seems quite simple.
A few points I had missed.
kube-dns runs as a Kubernetes Service in the kube-system namespace.
The DNS name for the service is kube-dns.kube-system.svc.cluster.local
We can pass this to the container using env.
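A minimal sketch of what that looks like in the pod spec (the env variable name DNS_SERVER is just a placeholder; the value is the service DNS name mentioned above):

containers:
- name: nginx
  image: nginx
  env:
  - name: DNS_SERVER
    value: kube-dns.kube-system.svc.cluster.local

The entrypoint or a config template can then read DNS_SERVER instead of a hardcoded 10.0.0.10.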
EDIT:
It seems I was looking in the wrong place. It indeed uses the local resolver from resolv.conf. The problem is that I hit a 'feature' in NGINX which caches the lookups for 300 secs and causes name resolution failures, while I was busy investigating k8s.
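For anyone who hits the same NGINX behavior, the usual workaround (a hedged sketch; the resolver address and service name are placeholders, and the address could be templated in from an env variable as above) is an explicit resolver with a short validity plus a variable in proxy_pass, so the name is re-resolved at runtime instead of being cached:

resolver 10.96.0.10 valid=10s;
set $backend http://my-service.default.svc.cluster.local;
proxy_pass $backend;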

Kubernetes: Connect to the outside world from pod

I have a local Kubernetes cluster on a single machine, and a container that needs to receive a url (like https://www.wikipedia.org/) and extract the text content from it. Essentially I need my pod to connect to the outside world. Since I am using v1.2.5, I need some DNS add-on like SkyDNS, but I cannot find any working example or tutorial on how to set it up. Tutorials like this usually only tell me how to make pods within the cluster talk to each other by DNS look-up.
Therefore, could anyone give me some advice on how to set up and configure an add-on of Kubernetes so that pods can access the public Internet? Thank you very much!
You can simply create your pods with dnsPolicy: Default. This will give them a resolv.conf just like the host's, and they will be able to resolve wikipedia.org. They will not be able to resolve cluster-local services. If you're looking to actually deploy kube-dns so you can also resolve cluster-local services, this is probably the best starting point: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
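A minimal hedged example of such a pod (the name, image, and command are placeholders; busybox 1.28 is used for its working nslookup):

apiVersion: v1
kind: Pod
metadata:
  name: url-fetcher
spec:
  dnsPolicy: Default   # inherit the node's resolv.conf, so public names resolve
  containers:
  - name: fetcher
    image: busybox:1.28
    command: ["sh", "-c", "nslookup www.wikipedia.org && sleep 3600"]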