kube-apiserver is not using docker-dns - kubernetes

I'm using kubespray 2.14 to deploy a k8s cluster. Most of the configuration is default. I've configured OIDC authentication for kubectl. I'm using Keycloak as a locally deployed auth server. Traffic is secured by a self-signed certificate, and the domain keycloak.example.com is resolved by the local DNS server. I've added a CoreDNS external zone for the example.com domain.
kube-apiserver uses the host network, so names are not resolved by CoreDNS by default.
Everything works fine as long as I use resolvconf_mode: host_resolvconf. Then the coredns address is added to the host's /etc/resolv.conf file and kube-apiserver uses CoreDNS to resolve the custom domain. But this mode makes my cluster highly unstable. I don't want to go deep into that problem because I've spent too much time on it already.
To fix the stability issue, I went back to the default resolvconf_mode: docker_dns, but then I have an OIDC problem:
oidc authenticator: initializing plugin: Get https://keycloak.example.com/auth/realms/test/.well-known/openid-configuration: dial tcp: lookup keycloak.example.com on 8.8.8.8:53: no such host
kube-apiserver can't resolve the keycloak.example.com domain because it queries the nameservers from the host (8.8.8.8). I thought it should query docker_dns, as stated in the kubespray documentation:
https://github.com/kubernetes-sigs/kubespray/blob/master/docs/dns-stack.md
For hostNetwork: true PODs however, k8s will let docker setup DNS settings. Docker containers which are not started/managed by k8s will also use these docker options.
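To see which DNS servers docker actually injected into the kube-apiserver container, a check along these lines can be used (a sketch assuming the container runtime is docker; the name filter may need adjusting):
# print the --dns servers docker configured for the apiserver container
docker inspect --format '{{ .HostConfig.Dns }}' \
  $(docker ps -q --filter name=k8s_kube-apiserver)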
Is there a way to configure kubespray inventory to fix this problem without manually adding a nameserver to each master node? Here is my part of the inventory config:
kube_oidc_url: https://keycloak.example.com/auth/realms/test
kube_oidc_client_id: cluster
resolvconf_mode: docker_dns
upstream_dns_servers:
  - 192.168.30.47
coredns_external_zones: &external_zones
  - zones:
      - example.com:53
    nameservers:
      - 192.168.30.47
    cache: 5
nodelocaldns_external_zones: *external_zones
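For reference, with this inventory kubespray should render an extra CoreDNS stanza roughly like the one below; the exact plugin list varies by kubespray version, so treat this as a sketch:
example.com:53 {
    errors
    cache 5
    forward . 192.168.30.47
}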
docker-dns.conf
[Service]
Environment="DOCKER_DNS_OPTIONS=\
--dns 10.233.0.3 --dns 192.168.30.47 --dns 8.8.8.8 \
--dns-search default.svc.cluster.local --dns-search svc.cluster.local \
--dns-opt ndots:2 --dns-opt timeout:2 --dns-opt attempts:2 \
"

Related

How to set DNS entries & network configuration for kubernetes cluster at home (noob here)

I am currently running a Kubernetes cluster on my own home server (in Proxmox CTs; it was kinda difficult to get working because I am using ZFS too, but it runs now), and the setup is as follows:
lb01: haproxy & keepalived
lb02: haproxy & keepalived
etcd01: etcd node 1
etcd02: etcd node 2
etcd03: etcd node 3
master-01: k3s in server mode with a taint for not accepting any jobs
master-02: same as above, just joining with the token from master-01
master-03: same as master-02
worker-01 - worker-03: k3s agents
If I understand it correctly, k3s ships with flannel pre-installed as a CNI, as well as traefik as an Ingress Controller.
I've set up rancher on my cluster as well as longhorn; the volumes are just ZFS volumes mounted inside the agents though, and as they aren't on different HDDs I've set the replicas to 1. I have a friend running the same setup (we set them up together, just yesterday) and we are planning on joining our networks through VPN tunnels and then providing storage nodes for each other as an offsite backup.
So far I've hopefully got everything correct.
Now to my question: I've got both a static IP at home and a domain, and I've pointed that domain at my static IP.
Something like this (I don't know how DNS entries are actually written, this is just from the top of my head for your reference; the entries are working well):
A example.com. [[my-ip]]
CNAME *.example.com. example.com
I've currently made a port-forward to one of my master nodes for ports 80 & 443, but I am not quite sure how you would actually configure that with HA in mind. Also, my rancher is throwing a 503 after visiting global settings, but I have not changed anything.
So now my question: how would one actually configure the port-forward? As far as I know k3s has a load balancer pre-installed, but how would one configure those port-forwards for HA? The one master node they point to could, theoretically, just stop working, and then all services would not be reachable from outside anymore.
Assuming your apps are running on port 80 and port 443, your ingress should give you a service with an external IP, and you would point your DNS at that. Read below for more info.
Seems like you are not a noob! You've got a lot going on with your cluster setup. What you are asking is a bit complicated to answer; I will have to make some assumptions about your setup, but I will do my best to give you at least some initial info.
This tutorial has a ton of great info and may help you with what you are doing. They use kubeadm instead of k3s, but you can skip that section if you want and still use k3s.
https://www.debontonline.com/p/kubernetes.html
If you are setting up and installing etcd on your own, you don't need to do that: k3s will create an etcd cluster for you that runs inside pods on your cluster.
Load Balancing your master nodes
The haproxy + keepalived nodes would be configured to point to the IPs of your master nodes at port 6443 (TCP). keepalived will give you a virtual IP, and you would configure your kubeconfig (that you get from k3s) to talk to that IP. On your router you will want to reserve an IP for this (make sure not to assign it to any computers).
This is a good video that explains how to do it with a nodejs server but concepts are the same for your master nodes:
https://www.youtube.com/watch?v=NizRDkTvxZo
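As a rough sketch (all IPs and names here are made up, adjust them to your nodes), the haproxy backend and the keepalived virtual IP would look something like this:
# /etc/haproxy/haproxy.cfg (excerpt)
frontend k8s-api
    bind *:6443
    mode tcp
    default_backend k8s-masters

backend k8s-masters
    mode tcp
    balance roundrobin
    server master-01 192.168.1.11:6443 check
    server master-02 192.168.1.12:6443 check
    server master-03 192.168.1.13:6443 check

# /etc/keepalived/keepalived.conf (excerpt)
vrrp_instance VI_1 {
    state MASTER              # BACKUP on lb02
    interface eth0
    virtual_router_id 51
    priority 150              # lower priority on lb02
    virtual_ipaddress {
        192.168.1.10          # the reserved virtual IP your kubeconfig talks to
    }
}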
Load Balancing your applications running in the cluster
Use a K8s Service; read more about it here: https://kubernetes.io/docs/concepts/services-networking/service/
Essentially you need an external IP, and I prefer to do this with MetalLB.
MetalLB gives you a Service of type LoadBalancer with an external IP.
Add this flag to k3s when creating the initial master node:
https://metallb.universe.tf/configuration/k3s/
Configure MetalLB:
https://metallb.universe.tf/configuration/#layer-2-configuration
You will want to reserve more IPs on your router and put them under the addresses section in the YAML below. In this example there are 11 IPs in the range 192.168.1.240 to 192.168.1.250.
Create this as a file, for example metallb-cm.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 192.168.1.240-192.168.1.250
kubectl apply -f metallb-cm.yaml
Install with these yaml files:
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.12.1/manifests/namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.12.1/manifests/metallb.yaml
source - https://metallb.universe.tf/installation/#installation-by-manifest
Ingress
Your ingress will need a Service of type LoadBalancer; use its external IP as the IP you point your DNS at.
Run kubectl get service -A, look for your ingress service, and check that it has an external IP and does not say pending.
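If your ingress controller doesn't already come with one, a minimal Service of type LoadBalancer for it looks roughly like this (the name and selector are illustrative for traefik and must match your actual controller; MetalLB fills in the external IP from its address pool):
apiVersion: v1
kind: Service
metadata:
  name: traefik              # hypothetical; match your ingress controller
  namespace: kube-system
spec:
  type: LoadBalancer         # MetalLB assigns an IP from the address pool
  selector:
    app: traefik             # must match your ingress controller's pod labels
  ports:
  - name: http
    port: 80
    targetPort: 80
  - name: https
    port: 443
    targetPort: 443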
I will do my best to answer any of your follow up questions. Good Luck!

kubernetes pod cannot resolve local hostnames but can resolve external ones like google.com

I am trying kubernetes and seem to have hit a bit of a hurdle. The problem is that from within my pod I can't curl local hostnames such as wrkr1 or wrkr2 (machine hostnames on my network), but I can successfully resolve hostnames such as google.com or stackoverflow.com.
My cluster is a basic setup with one master and 2 worker nodes.
What works from within the pod:
curl to google.com from the pod -- works
curl to another service (kubernetes) from the pod -- works
curl to another machine on the same LAN via its IP address, such as 192.168.x.x -- works
curl to another machine on the same LAN via its hostname, such as wrkr1 -- does not work
What works from the node hosting the pod:
curl to google.com -- works
curl to another machine on the same LAN via its IP address, such as 192.168.x.x -- works
curl to another machine on the same LAN via its hostname, such as wrkr1 -- works
Note: the pod CIDR is completely different from the IP range used on the LAN.
The node has a hosts file entry for wrkr1's IP address (I've checked that the node can resolve the hostname without it as well, but I read somewhere that a pod inherits its node's DNS resolution, so I've kept the entry).
Kubernetes Version: 1.19.14
Ubuntu Version: 18.04 LTS
I need help understanding whether this is normal behavior, and what can be done if I want the pod to be able to resolve hostnames on the local LAN as well.
What happens
Need help as to whether this is normal behavior
This is normal behaviour. There's no DNS server in the network where your virtual machines are hosted, and kubernetes has its own DNS server inside the cluster; it simply doesn't know about what happens on your host, especially in /etc/hosts, because pods don't have access to that file.
I read somewhere that a pod inherits its node's DNS resolution so I've kept the entry
This is where the tricky part happens. There are four available DNS policies, which are applied per pod. We will take a look at the two of them that are usually used:
"Default": The Pod inherits the name resolution configuration from the node that the pods run on. See related discussion for more details.
"ClusterFirst": Any DNS query that does not match the configured cluster domain suffix, such as "www.kubernetes.io", is forwarded to the upstream nameserver inherited from the node. Cluster administrators may have extra stub-domain and upstream DNS servers configured
The trickiest part is this (from the same link above):
Note: "Default" is not the default DNS policy. If dnsPolicy is not explicitly specified, then "ClusterFirst" is used.
That means that all pods that do not have a DNS policy set will run with ClusterFirst, and they won't be able to see /etc/resolv.conf on the host. I tried changing this to Default and indeed, the pod can then resolve everything the host can; however, internal resolving stops working, so it's not an option.
For example, the coredns deployment itself runs with the Default dnsPolicy, which allows coredns to resolve hosts.
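For reference, this is how the policy is set on a pod spec; everything here is a minimal made-up example, dnsPolicy being the only field that matters:
apiVersion: v1
kind: Pod
metadata:
  name: dns-test             # hypothetical name
spec:
  dnsPolicy: Default         # inherit the node's resolv.conf instead of ClusterFirst
  containers:
  - name: shell
    image: busybox:1.36
    command: ["sleep", "3600"]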
How this can be resolved
1. Add local domain to coreDNS
This requires adding A records per host. Here's a part of the edited coredns ConfigMap.
This should be within the .:53 { block:
file /etc/coredns/local.record local
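For orientation, here is roughly how the whole block looks with that line added (the surrounding plugins are the stock defaults and may differ in your cluster):
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    file /etc/coredns/local.record local
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}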
This part goes right after the block above ends (the SOA information was taken from the example; it doesn't make any difference here):
local.record: |
  local. IN SOA sns.dns.icann.org. noc.dns.icann.org. 2015082541 7200 3600 1209600 3600
  wrkr1. IN A 172.10.10.10
  wrkr2. IN A 172.11.11.11
Then the coreDNS deployment should be edited to include this file:
$ kubectl edit deploy coredns -n kube-system
volumes:
- configMap:
    defaultMode: 420
    items:
    - key: Corefile
      path: Corefile
    - key: local.record # 1st line to add
      path: local.record # 2nd line to add
    name: coredns
And restart coreDNS deployment:
$ kubectl rollout restart deploy coredns -n kube-system
Just in case check if coredns pods are running and ready:
$ kubectl get pods -A | grep coredns
kube-system coredns-6ddbbfd76-mk2wv 1/1 Running 0 4h46m
kube-system coredns-6ddbbfd76-ngrmq 1/1 Running 0 4h46m
If everything's done correctly, newly created pods will now be able to resolve hosts by their names. Please find an example in the coredns documentation.
2. Set up a DNS server in the network
While avahi looks similar to a DNS server, it does not act like one, and it's not possible to set up request forwarding from coredns to avahi. It is possible, however, to run a proper DNS server in the network, and that way everything will be resolved.
3. Deploy avahi to kubernetes cluster
There's a ready image with avahi here. If it's deployed into the cluster with dnsPolicy set to ClusterFirstWithHostNet and, most importantly, hostNetwork: true, it will be able to use the host adapter to discover all available hosts within the network.
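A minimal sketch of such a pod spec, with the image left as a placeholder for the one linked above:
apiVersion: v1
kind: Pod
metadata:
  name: avahi                # hypothetical name
spec:
  hostNetwork: true          # use the node's network adapter
  dnsPolicy: ClusterFirstWithHostNet
  containers:
  - name: avahi
    image: AVAHI_IMAGE       # placeholder for the avahi image linked above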
Useful links:
Pods DNS policy
Custom DNS entries for kubernetes

Failed calling webhook "namespace.sidecar-injector.istio.io"

I had made my deployment work with the istio ingressgateway before, and I am not aware of any changes made on the istio or k8s side.
When I try to deploy, I see an error on the replicaset side, which is why it cannot create a new pod:
Error creating: Internal error occurred: failed calling webhook "namespace.sidecar-injector.istio.io": Post "https://istiod.istio-system.svc:443/inject?timeout=10s": dial tcp 10.104.136.116:443: connect: no route to host
When I try to go inside the api-server and ping 10.104.136.116 (the istiod service IP), it just hangs.
What I have tried so far:
Deleted all coredns pods
Deleted all istiod pods
Deleted all weave pods
Reinstalling istio via istioctl x uninstall --purge
turning off all of the VMs' firewalls:
sudo iptables -P INPUT ACCEPT
sudo iptables -P FORWARD ACCEPT
sudo iptables -P OUTPUT ACCEPT
sudo iptables -F
restarted all of the nodes
manual istio pod injection
Setup
k8s version: 1.21.2
istio: 1.10.3
HA setup
CNI: weave
CRI: containerd
In my case this was related to the firewall. More info can be found here.
The gist of it is that, on GKE at least, you need to open another port, 15017, in addition to 10250 and 443. This is to allow communication from your master node(s) to your VPC.
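On GKE that typically translates into a firewall rule along these lines (the network name, master CIDR, and node tag are placeholders you'd need to look up for your cluster):
gcloud compute firewall-rules create allow-master-to-istiod \
  --network CLUSTER_NETWORK \
  --source-ranges MASTER_CIDR \
  --target-tags NODE_TAG \
  --allow tcp:15017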
I don't have a definite answer as to why this is happening, but kube-apiserver cannot access istiod via the service IP, whereas it can connect when I use the istiod pod IP.
Since I don't have control over the VMs and the lower networking layers, I'm not sure whether something was changed there (because it was working before).
I made this work by changing my CNI from weave to flannel.
In my case it was due to the firewall. Following this Istio debug guide, I identified that the kubectl get --raw /api/v1/namespaces/istio-system/services/https:istiod:https-webhook/proxy/inject -v4 command was timing out while all other cluster-internal calls were fine.
The best way to diagnose this is to temporarily open the involved AWS Security Groups to 0.0.0.0/0 for port 15017 and then try again.
If the error doesn't show up again, you know that's the part that needs fixing.
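For a quick test, such a temporary rule can be added with something like the following (the security group ID is a placeholder; remember to revoke the rule afterwards):
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 15017 --cidr 0.0.0.0/0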
I am using EKS with Amazon VPC CNI v1.12.2-eksbuild.1

Redis Cluster Client doesn't work with Redis cluster on GKE

My setup has a K8S Redis cluster with 8 nodes and 32 pods across them and a load balancer service on top.
I am using a Redis cluster client to access this cluster via the load balancer's external IP. However, when handling queries, as part of Redis cluster redirection (MOVED / ASK) the cluster client receives the internal IP addresses of the 32 pods, and connections to those fail within the client.
For example, I provide the IP address of the load balancer (35.245.51.198:6379), but the Redis cluster client throws errors like:
Caused by: redis.clients.jedis.exceptions.JedisConnectionException: Failed connecting to host 10.32.7.2:6379, which is an internal Pod IP.
Any ideas about how to deal with this situation will be much appreciated.
Thanks in advance.
If you're running on GKE, you can NAT the Pod IP using the IP masquerade agent:
Using IP masquerading in your clusters can increase their security by preventing individual Pod IP addresses from being exposed to traffic outside link-local range (169.254.0.0/16) and additional arbitrary IP ranges
Your issue specifically is that the pod range is on 10.0.0.0/8, which is by default a non-masquerade CIDR.
You can change this using a ConfigMap that treats that range as masquerade, so that pod traffic picks the node's external IP as its source address.
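A sketch of that ConfigMap, following the documented ip-masq-agent format (the non-masquerade list below is an example; keep only the ranges that should not be masqueraded, and note that leaving 10.0.0.0/8 out is what makes pod traffic SNAT to the node IP):
apiVersion: v1
kind: ConfigMap
metadata:
  name: ip-masq-agent
  namespace: kube-system
data:
  config: |
    nonMasqueradeCIDRs:
    - 192.168.0.0/16
    masqLinkLocal: false
    resyncInterval: 60s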
Alternatively, you can change the pod range in your cluster to anything that is masqueraded.
I have been struggling with the same problem when installing bitnami/redis-cluster on GKE.
In order to have the right networking settings, you should create the cluster as a public cluster.
The equivalent command line for creating the cluster in MYPROJECT is:
gcloud beta container --project "MYPROJECT" clusters create "redis-cluster" --zone "us-central1-c" --no-enable-basic-auth --cluster-version "1.21.5-gke.1802" --release-channel "regular" --machine-type "e2-medium" --image-type "COS_CONTAINERD" --disk-type "pd-standard" --disk-size "100" --metadata disable-legacy-endpoints=true --scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" --num-nodes "3" --logging=SYSTEM,WORKLOAD --monitoring=SYSTEM --no-enable-ip-alias --network "projects/MYPROJECT/global/networks/default" --subnetwork "projects/oddsjam/regions/us-central1/subnetworks/default" --no-enable-intra-node-visibility --no-enable-master-authorized-networks --addons HorizontalPodAutoscaling,HttpLoadBalancing,GcePersistentDiskCsiDriver --enable-autoupgrade --enable-autorepair --max-surge-upgrade 1 --max-unavailable-upgrade 0 --workload-pool "myproject.svc.id.goog" --enable-shielded-nodes --node-locations "us-central1-c"
Then you need to create external IP addresses in the VPC network product, one per Redis node. Those IP addresses will be picked up by the Redis nodes automatically.
Then you are ready to take the values.yaml of the Bitnami Redis Cluster Helm chart and change the configuration according to your use case. Add the list of external IPs you created to the cluster.externalAccess.loadBalancerIP value.
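The relevant part of values.yaml would look roughly like this; the exact key layout can differ between chart versions, so double-check it against the chart's own documentation:
cluster:
  externalAccess:
    enabled: true
    service:
      type: LoadBalancer
      loadBalancerIP:
      - 35.245.1.1           # placeholders: the external IPs reserved above,
      - 35.245.1.2           # one per Redis node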
Finally, you can install the Redis cluster on GKE by running:
helm install cluster-name -f values.yaml bitnami/redis-cluster
This command will give you the password of the cluster. You can then use redis-cli to connect to the new cluster with:
redis-cli -c -h EXTERNAL_IP -p 6379 -a PASSWORD

Kubernetes Endpoints IPs not in range

I have a K8s cluster installed in several RHEL 7.2 VMs.
It seems that the installation from the yum repository comes without addons.
Currently I am facing the following problem with almost any service I try to deploy (Jenkins, kube-ui, influxdb-grafana):
the Endpoints IPs are not in the range defined for Flannel, and obviously the services are not available.
Any ideas on how to debug/resolve the problem?
System details:
# lsb_release -i -r
Distributor ID: RedHatEnterpriseServer
Release: 7.2
Packages installed:
kubernetes.x86_64 1.2.0-0.9.alpha1.gitb57e8bd.el7
etcd.x86_64 2.2.5-1.el7
flannel.x86_64 0.5.3-9.el7
docker.x86_64 1.9.1-25.el7.centos
ETCD network configuration
# etcdctl get /atomic.io/network/config
{"Network":"10.0.0.0/16"}
Service gets proper IP but wrong Endpoints
# kubectl describe svc jenkinsmaster
Name: jenkinsmaster
Namespace: default
Labels: kubernetes.io/cluster-service=true,kubernetes.io/name=JenkinsMaster
Selector: name=jenkinsmaster
Type: NodePort
IP: 10.254.113.89
Port: http 8080/TCP
NodePort: http 30996/TCP
Endpoints: 172.17.0.2:8080
Port: slave 50000/TCP
NodePort: slave 31412/TCP
Endpoints: 172.17.0.2:50000
Session Affinity: None
No events.
Thank you.
I think the flannel network subnet and the kubernetes internal network subnet are conflicting here.
With the amount of information I see now, all I can say is that there is a conflict. To verify that flannel is working, just start containers on two different machines connected with flannel and see whether they can talk and what IP addresses they get. If they are assigned IPs in the 10.0.0.0/16 range and they can talk, then flannel is doing fine and something is wrong with the kubernetes integration.
If they get IP addresses in some other range, flannel is not working.
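A quick way to run that check (busybox is just a convenient image here):
# on node A: start a container and note its IP
docker run -d --name nettest busybox sleep 3600
docker inspect -f '{{ .NetworkSettings.IPAddress }}' nettest

# on node B: ping that IP from another container
docker run --rm busybox ping -c 3 IP_FROM_NODE_A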
kubernetes 1.2... docker 1.9... Those are ancient versions now, so you don't have CNI or kubeadm. I can barely remember how to set up a kubernetes cluster with flannel from that era.
Anyway, you need to know that an Endpoint IP is the same as the target Pod IP, which is the IP of the docker container. So your docker container IPs are not in the same range as your flannel IPs, and 172.17.0.x is the default docker IP range. I think you need to change the docker start parameters to something like --bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU}; you can use 10.0.0.0/16 as FLANNEL_SUBNET if you want a basic setup.
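A minimal sketch of wiring that up with a systemd drop-in, assuming flanneld writes its subnet file to /run/flannel/subnet.env (the path and the docker invocation may differ on your version):
# /etc/systemd/system/docker.service.d/flannel.conf
[Service]
EnvironmentFile=/run/flannel/subnet.env
ExecStart=
ExecStart=/usr/bin/docker daemon --bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU}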