kops kubernetes cluster with multiple DNS - kubernetes

There are two parts to this.
I am using kops v1.17.0 to stand up a Kubernetes cluster on EC2 instances. I am following these docs: https://kubernetes.io/docs/setup/production-environment/tools/kops/
One of the points goes as follows:
kops has a strong opinion on the cluster name: it should be a valid
DNS name.
This got me confused. Can my cluster serve requests for only one domain and its subdomains?
I tried this with the domain example.com: I created a hosted zone for it and created a cluster named example.com.k8s.local.
I pointed this domain at my cluster's load balancer, and I can access example.com. All good till now.
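For reference, I created the cluster with roughly the following commands (the state bucket and zone here are placeholders, not my exact values):

kops create cluster \
  --name=example.com.k8s.local \
  --cloud=aws \
  --zones=us-east-1a \
  --state=s3://my-kops-state-bucket
kops update cluster example.com.k8s.local --state=s3://my-kops-state-bucket --yes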
Now, I want one of the services in my cluster to be served on abc.com. I created another hosted zone and a new record set within it that points to the same load balancer. I expected to visit abc.com and see this service, but all I see is an nginx 404 Not Found.
Is this happening because of the first point I mentioned, or is it a totally separate issue? If it is because of the first point, is there a way around it, or is one cluster always tied to one domain in the kops world?

As far as the first part is concerned, yes, I can serve multiple domains from the same Kubernetes cluster with this setup. Up to a certain kops version there was a hard requirement that the domain name match the cluster name; that's not the case anymore.
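As a rough sketch (hostnames and Service names are placeholders, using the networking.k8s.io/v1beta1 API that matches this cluster version), a single Ingress can route several domains to different Services in the same cluster:

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: multi-domain
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: example-svc
          servicePort: 80
  - host: abc.com
    http:
      paths:
      - path: /
        backend:
          serviceName: abc-svc
          servicePort: 80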
A couple of things you need to consider. While issuing a certificate from ACM, make sure all of your domains are listed, for example:
example.com
*.example.com
bar.com
*.bar.com
Make sure that all of the domains are validated and are not in a pending or any other non-issued state.
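If you request the certificate from the CLI, it looks roughly like this (the domains are placeholders; every extra name goes in as a subject alternative name):

aws acm request-certificate \
  --domain-name example.com \
  --subject-alternative-names "*.example.com" bar.com "*.bar.com" \
  --validation-method DNS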
I think the reason for the second issue was that one of the domains in my ACM-generated certificate was in an invalid state and thus stuck in pending.

Related

How to automatically update the Service `spec.externalIPs` when a Kubernetes worker is drained/down?

I'm hosting a Kubernetes cluster on VMs/VPS from a cloud provider that doesn't offer anything Kubernetes-related at all. Each worker has a dedicated public IP address, and to allow traffic to reach the worker nodes I'm defining my Service with spec.externalIPs set to that fixed list of IP addresses.
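For reference, the Service I'm describing looks roughly like this (the IPs are placeholders for my workers' public addresses):

apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080
  externalIPs:          # fixed list I currently maintain by hand
  - 203.0.113.10
  - 203.0.113.11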
I'm looking for a way to get that list updated automatically when a node is drained or goes down.
I had a look at the existing operators from https://operatorhub.io/ but I haven't found any that seem to cover my use case.
The idea would be that when the event of a node going NotReady is emitted, the Service is updated to list only the nodes that are Ready.
Is there any operator that could allow doing that?
After some time working on this, I finally figured out that this is not possible, at least today; there's no known operator or anything else that could update the field with the IP addresses.
And even if it was the case, there would be delays to update the DNS records.
What I've done instead is buy another VPS and install HAProxy on it to proxy the Kubernetes API traffic to the master nodes and the web traffic (both 80 and 443) to the Kubernetes worker nodes.
HAProxy monitors the nodes and adds/removes them automagically and very quickly.
With this, you just need one DNS record pointing to the load balancer (or to the VIP of the load balancers, to avoid a SPOF), and HAProxy will do the rest!
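As a rough sketch of the HAProxy side (IPs and names are placeholders, not my exact config; the 443 frontend/backend pair is analogous to the 80 one):

defaults
    mode tcp
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend k8s_api
    bind *:6443
    default_backend k8s_masters

backend k8s_masters
    balance roundrobin
    server master1 10.0.0.11:6443 check
    server master2 10.0.0.12:6443 check

frontend web_http
    bind *:80
    default_backend k8s_workers_http

backend k8s_workers_http
    balance roundrobin
    server worker1 10.0.0.21:80 check
    server worker2 10.0.0.22:80 check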

How to point my domain to my EKS cluster?

I have followed the AWS getting started guide to provision an EKS cluster (3 public subnets and 3 private subnets). After creating it, I get the following API server endpoint: https://XXXXXXXXXXXXXXXXXXXX.gr7.us-east-2.eks.amazonaws.com (URL replaced with X's for privacy reasons).
Accessing the URL in the browser, I get the expected output from the cluster endpoint.
Question: How do I point my registered domain in Route 53 to my cluster endpoint?
I can't use a CNAME record because my domain is a root domain, so I would receive an apex domain error.
I don't have access to a static IP, and I don't believe my EKS cluster has a public IP address I can use directly. This means I can't use an A record (as I would need an IP address).
Can I please get help/instructions as to how I can point my domain straight to my cluster?
Below is my AWS VPC architecture: [diagram omitted]
Don't try to assign a pretty name to the API endpoint. Your cluster endpoint is the address that's used to talk to the control plane; when you configure your kubectl tool, the API endpoint is what kubectl talks to.
Once you've got an application running on your EKS cluster, and have a load balancer, or Ingress, or something for incoming connections, that's when you worry about creating pretty names.
And yes, if you're dealing with AWS load balancers, you don't get the option of plain A records, so you can't use the apex of the domain. Unless, that is, you're hosting DNS in Route 53, in which case you can use "alias" records to point the apex of a domain at a load balancer.
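For example, an apex alias record can be created with something like this (the zone ID, load balancer DNS name, and the load balancer's hosted zone ID are placeholders):

aws route53 change-resource-record-sets \
  --hosted-zone-id Z0MYZONEID \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "example.com",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "Z0ELBZONEID",
          "DNSName": "my-alb-123456.us-east-2.elb.amazonaws.com",
          "EvaluateTargetHealth": false
        }
      }
    }]
  }'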
Kubernetes is a massively complex thing to try to understand and get running. Given that this is the type of question you're asking, it sounds like you don't have the full picture yet. I recommend (1) joining the Kubernetes Slack channel, which will be a much faster way to get help than SO, and (2) taking Jeff Geerling's excellent Kubernetes 101 course on YouTube.

DNS problem on AWS EKS when running in private subnets

I have an EKS cluster setup in a VPC. The worker nodes are launched in private subnets. I can successfully deploy pods and services.
However, I'm not able to perform DNS resolution from within the pods. (It works fine on the worker nodes, outside the container.)
Troubleshooting using https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/ results in the following from nslookup (timeout after a minute or so):
Server: 172.20.0.10
Address 1: 172.20.0.10
nslookup: can't resolve 'kubernetes.default'
When I launch the cluster in an all-public VPC, I don't have this problem. Am I missing any necessary steps for DNS resolution from within a private subnet?
Many thanks,
Daniel
I feel like I have to give this a proper answer because coming upon this question was the answer to 10 straight hours of debugging for me. As @Daniel said in his comment, the issue I found was my ACL blocking outbound traffic on UDP port 53, which Kubernetes uses to resolve DNS records.
The process was especially confusing for me because one of my pods actually worked the entire time, since (I think?) it happened to be in the same zone as the Kubernetes DNS resolver.
To elaborate on the comment from @Daniel, you need:
an ingress rule for UDP port 53
an ingress rule for UDP on ephemeral ports (e.g. 1025–65535)
I hadn't added (2) and was seeing CoreDNS receiving requests and trying to respond, but the response wasn't getting back to the requester.
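In NACL terms, those two rules look roughly like this (the ACL ID, rule numbers, and CIDR are placeholders; 17 is the protocol number for UDP):

# (1) incoming DNS queries on UDP 53
aws ec2 create-network-acl-entry --network-acl-id acl-0abc123 --no-egress \
  --rule-number 100 --protocol 17 --port-range From=53,To=53 \
  --cidr-block 0.0.0.0/0 --rule-action allow
# (2) DNS responses coming back in on ephemeral ports
aws ec2 create-network-acl-entry --network-acl-id acl-0abc123 --no-egress \
  --rule-number 110 --protocol 17 --port-range From=1025,To=65535 \
  --cidr-block 0.0.0.0/0 --rule-action allow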
A tip for others dealing with these kinds of issues: turn on CoreDNS logging by adding the log plugin to the ConfigMap, which I was able to do with kubectl edit configmap -n kube-system coredns. See the CoreDNS docs on this: https://github.com/coredns/coredns/blob/master/README.md#examples. This can help you figure out whether the issue is CoreDNS receiving queries or sending the response back.
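With logging enabled, the relevant block of the Corefile ends up looking roughly like this (your default set of plugins may differ slightly):

.:53 {
    errors
    log                 # added: logs every query to stdout
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    forward . /etc/resolv.conf
    cache 30
}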
I ran into this as well. I have multiple node groups, and each one was created from a CloudFormation template. The CloudFormation template created a security group for each node group that allowed the nodes in that group to communicate with each other.
The DNS error resulted from Pods running in node groups separate from the CoreDNS Pods, so the Pods were unable to reach CoreDNS (network communication was only permitted within node groups). I will make a new CloudFormation template for the node security group so that all the node groups in my cluster can share the same security group.
I resolved the issue for now by allowing inbound UDP traffic on port 53 for each of my node group security groups.
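That workaround in CLI form is roughly the following (security group IDs are placeholders; repeat for each node group security group):

# allow DNS queries from the other node group's SG to reach CoreDNS in this one
aws ec2 authorize-security-group-ingress \
  --group-id sg-0nodegroupA \
  --protocol udp --port 53 \
  --source-group sg-0nodegroupB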
So I've been struggling with this issue as well for a couple of hours, I think; I lost track of time.
Since I am using the default VPC but with the worker nodes inside a private subnet, it wasn't working.
I went through the amazon-vpc-cni-k8s repo and found the solution.
We have to set the environment variable AWS_VPC_K8S_CNI_EXTERNALSNAT=true on the aws-node DaemonSet.
You can either get the new YAML and apply it, or just fix it through the dashboard. However, for it to work you have to restart the worker node instances so the IP route tables are refreshed.
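Setting the variable through kubectl looks roughly like this:

kubectl set env daemonset/aws-node -n kube-system AWS_VPC_K8S_CNI_EXTERNALSNAT=true
# verify it took effect
kubectl describe daemonset/aws-node -n kube-system | grep EXTERNALSNAT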
The issue link is here.
Thanks!
Re: AWS EKS Kube Cluster and Route53 internal/private Route53 queries from pods
Just wanted to post a note on what we needed to do to resolve our issues. Noting that YMMV and everyone has different environments and resolutions, etc.
Disclaimer:
We're using the community Terraform EKS module to deploy/manage the VPCs and the EKS clusters. We didn't need to modify any security groups. We are working with multiple clusters, regions, and VPCs.
ref:
Terraform EKS module
CoreDNS Changes:
We have a DNS relay for private internal zones, so we needed to modify the CoreDNS ConfigMap and add in the DNS relay IP address:
...
ec2.internal:53 {
    errors
    cache 30
    forward . 10.1.1.245
}
foo.dev.com:53 {
    errors
    cache 30
    forward . 10.1.1.245
}
foo.stage.com:53 {
    errors
    cache 30
    forward . 10.1.1.245
}
...
VPC DHCP option sets:
Update with the IP of the above relay server if applicable; this requires creating a new option set, as existing ones cannot be modified.
Our DHCP options set looks like this:
["AmazonProvidedDNS", "10.1.1.245", "169.254.169.253"]
ref: AWS DHCP Option Sets
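Recreating the option set from the CLI looks roughly like this (the relay IP matches the CoreDNS change above; the option set and VPC IDs are placeholders):

aws ec2 create-dhcp-options --dhcp-configurations \
  'Key=domain-name-servers,Values=AmazonProvidedDNS,10.1.1.245,169.254.169.253'
aws ec2 associate-dhcp-options --dhcp-options-id dopt-0abc123 --vpc-id vpc-0abc123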
Route-53 Updates:
Associate every Route 53 zone with the VPC ID it needs to be associated with (the VPC where our kube cluster resides and from which the pods will make queries).
There is also a Terraform module for that:
https://www.terraform.io/docs/providers/aws/r/route53_zone_association.html
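Or from the CLI, per zone (the zone ID, region, and VPC ID are placeholders):

aws route53 associate-vpc-with-hosted-zone \
  --hosted-zone-id Z0PRIVATEZONE \
  --vpc VPCRegion=us-east-1,VPCId=vpc-0abc123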
We had run into a similar issue where DNS resolution timed out on some of the pods, but re-creating the pod a couple of times resolved the problem. Also, it's not every pod on a given node showing issues, only some pods.
It turned out to be due to a bug in version 1.5.4 of the Amazon VPC CNI; more details here: https://github.com/aws/amazon-vpc-cni-k8s/issues/641.
The quick solution is to revert to the recommended version 1.5.3: https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html
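To check which CNI version a cluster is currently running, something like:

kubectl describe daemonset aws-node -n kube-system | grep Image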
Like many others, I've been struggling with this bug for a few hours.
In my case the issue was this bug, https://github.com/awslabs/amazon-eks-ami/issues/636, which basically sets up an incorrect DNS cluster IP when you specify the endpoint and certificate but not the DNS cluster IP.
To confirm, check:
That you have connectivity (NACLs and security groups) allowing DNS on TCP and UDP. For me, the best way was to SSH into a node and see if it resolves (nslookup). If it doesn't resolve, it is most likely either a NACL or an SG; also check that the DNS nameserver on the node is configured correctly.
If you can get name resolution on the node but not inside the pod, check that the nameserver in the pod's /etc/resolv.conf points to an IP in your service network (if you see 172.20.0.10, your service network should be 172.20.0.0/24 or so).
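A quick way to run both checks from inside a throwaway pod (busybox:1.28 is used here because newer busybox images ship a broken nslookup):

# does the pod resolve the in-cluster name?
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- nslookup kubernetes.default
# which nameserver is the pod actually using?
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- cat /etc/resolv.conf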

Kubernetes StatefulSets: External DNS

Kubernetes StatefulSets create internal DNS entries with stable network IDs. The docs describe this here:
Each Pod in a StatefulSet derives its hostname from the name of the
StatefulSet and the ordinal of the Pod. The pattern for the
constructed hostname is $(statefulset name)-$(ordinal). The example
above will create three Pods named web-0,web-1,web-2. A StatefulSet
can use a Headless Service to control the domain of its Pods. The
domain managed by this Service takes the form: $(service
name).$(namespace).svc.cluster.local, where “cluster.local” is the
cluster domain. As each Pod is created, it gets a matching DNS
subdomain, taking the form: $(podname).$(governing service domain),
where the governing service is defined by the serviceName field on the
StatefulSet.
I am experimenting with headless Services, and this works great for communication between the individual Pods, i.e. web-0.web.default.svc.cluster.local can connect and communicate with web-1.web.default.svc.cluster.local just fine.
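For context, the setup that produces those per-Pod names is roughly the following (names match the quoted example):

apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  clusterIP: None       # headless: each Pod gets its own DNS record
  selector:
    app: web
  ports:
  - port: 80
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web      # governing service for the Pod DNS subdomains
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80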
Is there any way that I can configure this to work outside of the cluster network as well, where "cluster.local" is replaced with something like "clustera.com"?
I would like to give another Kubernetes cluster, let's call it clusterb.com, access to the individual services of the original cluster (clustera.com); I'm hoping it would look something like clusterb simply hitting endpoints like web-1.web.default.svc.clustera.com and web-0.web.default.svc.clustera.com.
Is this possible? I would like access to the individual services, not a load balanced endpoint.
I would suggest you test the following solutions and check whether they help you achieve your goal in your particular scenario:
The first one is certainly the easiest, and I believe you didn't implement it for some reason, but you did not report why in the question.
I am talking about headless Services without selectors, i.e. CNAME records for ExternalName-type Services.
ExternalName: Maps the service to the contents of the externalName field (e.g. foo.bar.example.com), by returning a CNAME record with its value. No proxying of any kind is set up. This requires version 1.7 or higher of kube-dns
Therefore, if you need to point to a service in the other cluster, you will need to register a domain name pointing to the relevant IP of clusterb.
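A hedged sketch of what such an ExternalName Service could look like (the external name assumes a DNS record for the other cluster's Pods has been registered, as described above):

apiVersion: v1
kind: Service
metadata:
  name: web-0-clustera
spec:
  type: ExternalName
  externalName: web-0.web.default.svc.clustera.com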
The second solution, which I have never tested but believe can apply to your case, is to make use of a Federated Cluster; one reason to use it, according to the documentation, is:
Cross cluster discovery: Federation provides the ability to auto-configure DNS servers and load balancers with backends from all clusters. For example, you can ensure that a global VIP or DNS record can be used to access backends from multiple clusters.

kube2sky in kubernetes with multiple api servers

In a Kubernetes cluster where everything is highly available, DNS is a key piece of the system; everything relies on DNS.
The kube2sky pod has a parameter "-kube_master_url" where, AFAIK, you can only specify one API server node.
You might have multiple API servers for redundancy behind a service, but if the one that kube2sky is using goes down, the whole DNS system goes down too; hence, the high availability of the cluster is gone.
For other pods, you can use the internal DNS name of the API server service, but in this case you can't, since this is the actual DNS service.
Any idea how to solve this issue?
In its standard configuration, kube2sky doesn't actually rely on having a single apiserver IP address to use. Instead, it uses the virtual IP of the kubernetes service that gets auto-created in every cluster, and which the kube-proxy sets up iptables rules for. It's briefly explained in the docs on github.
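That virtual IP can be seen in any cluster with:

kubectl get service kubernetes --namespace default   # the CLUSTER-IP column is the address kube2sky relies on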
Also, it's recommended that replicated masters are put behind a load balancer in such high-availability configurations to avoid problems like this with client tools.