kube service domain name not working, but clusterIP does work with Jupyter Enterprise Gateway - kubernetes

I have a Jupyter notebook setup in the jupyter namespace on a kubernetes cluster, and Jupyter Enterprise Gateway setup in the enterprise-gateway namespace as a Service in the same cluster.
If I configure the notebook to connect to the enterprise-gateway service using the clusterIP it works fine.
--gateway-url=http://172.20.186.249:8888
but if I switch to using the service domain name the notebook receives a 503 Connection Refused error
--gateway-url=http://enterprise-gateway.enterprise-gateway.svc.cluster.local:8888
When I use busybox check to check the kubernetes dns, the domain resolves as expected.
kubectl -n default exec -ti busybox nslookup enterprise-gateway.enterprise-gateway
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
Server: 172.20.0.10
Address 1: 172.20.0.10 kube-dns.kube-system.svc.cluster.local
Name: enterprise-gateway.enterprise-gateway
Address 1: 172.20.186.249 enterprise-gateway.enterprise-gateway.svc.cluster.local
How do I get the domain name to work?
The Service config for the JEG looks like this...
kubectl describe svc enterprise-gateway --namespace enterprise-gateway
Name: enterprise-gateway
Namespace: enterprise-gateway
Labels: app=enterprise-gateway
app.kubernetes.io/managed-by=Helm
chart=enterprise-gateway-2.6.0
component=enterprise-gateway
heritage=Helm
release=enterprise-gateway
Annotations: meta.helm.sh/release-name: enterprise-gateway
meta.helm.sh/release-namespace: enterprise-gateway
Selector: app=enterprise-gateway
Type: NodePort
IP Family Policy: SingleStack
IP Families: IPv4
IP: 172.20.250.15
IPs: 172.20.250.15
Port: http 8888/TCP
TargetPort: 8888/TCP
NodePort: http 31366/TCP
Endpoints: 10.1.16.136:8888,10.1.2.228:8888,10.1.30.90:8888
Port: response 8877/TCP
TargetPort: 8877/TCP
NodePort: response 31201/TCP
Endpoints: 10.1.16.136:8877,10.1.2.228:8877,10.1.30.90:8877
Session Affinity: ClientIP
External Traffic Policy: Cluster
Events: <none>

Ok, i dont know where to start i have a bunch of findings. I will start with the eye catcher one, i have a working test project i can share later on and i have to elaborate more in this answer if needed.
Step1
1- I see a mismatch on your IPs. The DNS lookup did not resolved the service DNS to the correct IP.
Address 1: 172.20.186.249 is different than IP: 172.20.250.15
To debug DNS:
kubectl exec "YOURPODNAME" cat /etc/resolv.conf
Verify that a search path and a name server are set up correctly
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
check if the kubedns/coredns pods are running
kubectl get pods --namespace=kube-system
NAME READY STATUS RESTARTS AGE
....
kube-dns-86f4d74b45-2qkfd 3/3 Running 232 133d
kube-proxy-b2frq 1/1 Running 0 15m
...
If the pod is running, there might be something wrong with the global DNS service
kubectl get svc --namespace=kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP
You might also need to check whether DNS endpoints are exposed:
kubectl get ep kube-dns --namespace=kube-system
NAME ENDPOINTS AGE
kube-dns 172.17.0.5:53,172.17.0.5:53 133d
These debugging actions will usually indicate the problem with your DNS configuration, or it will simply show you that a DNS add-on should be enabled in your cluster configuration.
Step 2
When using busybox to check the kubernetes dns
This seems incorrect when looking Address 1: 172.20.186.249 im expecting to get an IP 10.X.X.X
Install dnsutils on the pod as pointed below
kubectl exec --stdin --tty "YOURPODNAME" -- apt update && sudo
apt-get -y install dnsutils
kubectl exec -it "YOURPODNAME" -- /bin/bash
Inside the pod and again (weird) run apt-get install dnsutils
Stay inside the pod and run nslookup "YOURSERVICENAME" you will get
an IP and a Name(DNS).
Check this IP since it needs to match with the IP of the service description.
kubectl describe svc "YOURSERVICENAME", the IP should be the same as #4
What you must see:
Step 3
Once you have Step #2 solved you will be able to use the service
name(FQDN) returned in Step 2 item #4
To be continued...

Related

ClusterIP not reachable within the Cluster

I'm struggling with kubernates configurations. What I want to get it's just to reach a deployment within the cluster. The cluster is on my dedicated server and I'm deploying it by using Kubeadm.
My nodes:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master Ready master 9d v1.19.3
k8s-worker1 Ready <none> 9d v1.19.3
I've a deployment running (nginx basic example)
$ kubectl get deployment
NAME READY UP-TO-DATE AVAILABLE AGE
nginx-deployment 2/2 2 2 29m
I've created a service
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 9d
my-service ClusterIP 10.106.109.94 <none> 80/TCP 20m
The YAML file for my service is the following:
apiVersion: v1
kind: Service
metadata:
name: my-service
spec:
selector:
app: nginx-deployment
ports:
- protocol: TCP
port: 80
Now I should expect, if I run curl 10.106.109.94:80 on my k8s-master to get the http answer.. but what I got is:
curl: (7) Failed to connect to 10.106.109.94 port 80: Connection refused
I've tried with NodePort as well and with targetPort and nodePort but the result is the same.
The cluster ip can not be reachable from outside of your cluster that means you will not get any response from the host machine that host your k8s cluster as this ip is not a part of your machine or any other machine rather than its a cluster ip which is used by your cluster CNI network like flunnel,weave.
So to get your services accessible from the outside or atleast from the host machine you have to change the type of your service like NodePort,LoadBalancer,K8s port-forward.
If you can change the service type NodePort then you will get response with any of your host machine ip and the allocated nodeport.
For example,if your k8s-master is 192.168.x.x and nodePort is 33303 then you can get response by
curl http://192.168.x.x:33303
or
curl http://worker_node_ip:33303
if your cluster is in locally installed, then you can install metalLB to get the privilege of load balancer.
You can also use port-forward to get your service accessible from the host that has kubectl client with k8s cluster access.
kubectl port-forward svc/my-service 80:80
kubectl -n namespace port-forward svc/service_name Port:Port

Kubespray : Netchecker connectivity check fails

I deployed a Kubernetes (v1.17.5) cluster on OpenStack instances using Kubespray. Those instances are CentOS 7.6.1811 qcow2 images imported in Glance.
The install was successful, and I can see my nodes and pods with kubectl commands.
I used the deploy_netchecker option to deploy NetChecker and test the network within my cluster, and set network_plugin="flannel".
I also tried kube_proxy_mode="iptables", but it doesn't seem to affect the result.
That's pretty much all the changes I did in the k8s-cluster.yml file.
All the pods are running, services too :
[centos#cl1-master-0 ~]$ kubectl get svc --all-namespaces
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP 10.233.0.1 <none> 443/TCP 46h
default netchecker-service NodePort 10.233.13.213 <none> 8081:31081/TCP 46h
kube-system coredns ClusterIP 10.233.0.3 <none> 53/UDP,53/TCP,9153/TCP 46h
kube-system dashboard-metrics-scraper ClusterIP 10.233.59.12 <none> 8000/TCP 46h
kube-system kubernetes-dashboard ClusterIP 10.233.63.20 <none> 443/TCP 46h
But netchecker API gives the following answer :
[root#localhost ~]# curl http://X.X.X.X:31081/api/v1/connectivity_check
{"Message":"Connectivity check fails. Reason: there are absent or outdated pods; look up the payload","Absent":["netchecker-agent-hostnet-kk56x","netchecker-agent-hostnet-klldn","netchecker-agent-hostnet-r2vqs","netchecker-agent-hostnet-wqhjs"],"Outdated":["netchecker-agent-4jsgf","netchecker-agent-c9pcf","netchecker-agent-hostnet-jzbfv","netchecker-agent-vxgpf"]}
For an unknown reason, I cannot access the API from a cluster node with localhost, so I used a floating IP with OpenStack.
Here are some logs from the agent :
[centos#cl1-master-0 ~]$ sudo vi /var/log/pods/default_netchecker-agent-vjnwl_d8290268-3ea4-4e3c-acb4-295ab162a735/netchecker-agent/0.log
{"log":"I0701 13:04:01.814246 1 agent.go:135] Response status code: 200\n","stream":"stderr","time":"2020-07-01T13:04:01.81437579Z"}
{"log":"I0701 13:04:01.814272 1 agent.go:128] Sleep for 15 second(s)\n","stream":"stderr","time":"2020-07-01T13:04:01.814393199Z"}
{"log":"I0701 13:04:16.817398 1 agent.go:55] Send payload via URL: http://netchecker-service:8081/api/v1/agents/netchecker-agent-vjnwl\n","stream":"stderr","time":"2020-07-01T13:04:16.817786735Z"}
[centos#cl1-master-0 ~]$ sudo vi /var/log/pods/default_netchecker-agent-hostnet-klldn_d5fa6e72-885f-44e1-97a6-880a25e6d6d6/netchecker-agent/0.log
{"log":"E0701 13:05:22.804428 1 agent.go:133] Error while sending info. Details: Post http://netchecker-service:8081/api/v1/agents/netchecker-agent-hostnet-klldn: dial tcp 10.233.13.213:8081: i/o timeout\n","stream":"stderr","time":"2020-07-01T13:05:22.805138032Z"}
{"log":"I0701 13:05:22.804474 1 agent.go:128] Sleep for 15 second(s)\n","stream":"stderr","time":"2020-07-01T13:05:22.805190295Z"}
{"log":"I0701 13:05:37.807140 1 agent.go:55] Send payload via URL: http://netchecker-service:8081/api/v1/agents/netchecker-agent-hostnet-klldn\n","stream":"stderr","time":"2020-07-01T13:05:37.807309111Z"}
Logs from the server do not indicate any error.
I tried to check DNS resolve with the following :
[centos#cl1-master-0 ~]$ kubectl exec -it netchecker-agent-4jsgf -- /bin/sh
/ $ nslookup kubernetes.default
Server: 169.254.25.10
Address 1: 169.254.25.10
nslookup: can't resolve 'kubernetes.default'
[centos#cl1-master-0 ~]$ kubectl exec -it netchecker-agent-4jsgf -- cat /etc/resolv.conf
nameserver 169.254.25.10
search default.svc.cluster.local svc.cluster.local cluster.local openstacklocal
options ndots:5
169.254.25.10 is the IP of the nodelocaldns, but it doesn't seem to query the coredns service deployed.
When I use nslookup netchecker-service.default.svc.cluster.local 10.233.0.3, with the coredns IP, I get a correct answer.
What can be wrong with my configuration ?
Thanks in advance
UPDATE : The plugin Flannel has an issue and contains a fix to apply on all nodes of the cluster. Once done, the pods successfully report back to the netchecker server.
UPDATE : The plugin Flannel has an issue and contains a fix to apply on all nodes of the cluster. Once done, the pods successfully report back to the netchecker server.

k8s: Get access to pods

I newbie question related with k8s. I've just installed a k3d cluster.
I've deployed an this helm chart:
$ helm install stable/docker-registry
It's been installed and pod is running correctly.
Nevertheless, I don't quite figure out how to get access to this just deployed service.
According to documentation, it's listening on 5000 port, and is using a ClusterIP. A service is also deployed.
$ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.43.0.1 <none> 443/TCP 42h
docker-registry-1580212517 ClusterIP 10.43.80.185 <none> 5000/TCP 19m
EDIT
I've been able to say to chard creates an ingress:
$ kubectl get ingresses.networking.k8s.io -n default
NAME HOSTS ADDRESS PORTS AGE
docker-registry-1580214408 chart-example.local 172.20.0.4 80 10m
Nevertheless, I'm still without being able tp push images to registry:
$ docker push 172.20.0.4/feedly:v1
The push refers to repository [172.20.0.4/feedly]
Get https://172.20.0.4/v2/: x509: certificate has expired or is not yet valid
Since the service type is ClusterIP, you can't access the service from host system. You can run below command to access the service from your host system.
kubectl port-forward --address 0.0.0.0 svc/docker-registry-1580212517 5000:5000 &
curl <host IP/name>:5000

coredns do not resolve service name correctly

i use Kubernetes v1.11.3 ,it use coredns to resolve host or service name,but i find in pod ,the resolve not work correctly,
# kubectl get services --all-namespaces -o wide
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 50d <none>
kube-system calico-etcd ClusterIP 10.96.232.136 <none> 6666/TCP 50d k8s-app=calico-etcd
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 50d k8s-app=kube-dns
kube-system kubelet ClusterIP None <none> 10250/TCP 32d <none>
testalex grafana NodePort 10.96.51.173 <none> 3000:30002/TCP 2d app=grafana
testalex k8s-alert NodePort 10.108.150.47 <none> 9093:30093/TCP 13m app=alertmanager
testalex prometheus NodePort 10.96.182.108 <none> 9090:30090/TCP 16m app=prometheus
following command no response
# kubectl exec -it k8s-monitor-7ddcb74b87-n6jsd -n testalex /bin/bash
[root#k8s-monitor-7ddcb74b87-n6jsd /]# ping k8s-alert
PING k8s-alert.testalex.svc.cluster.local (10.108.150.47) 56(84) bytes of data.
and no cordons output log
# kubectl logs coredns-78fcdf6894-h78sd -n kube-system
i think maybe something is wrong,but i can not locate the problem,another question is why the two coredns pods on the master node,it suppose to one on each node
UPDATE
it seems coredns work fine ,but i do not understand the ping command no return
[root#k8s-monitor-7ddcb74b87-n6jsd yum.repos.d]# nslookup kubernetes.default
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
[root#k8s-monitor-7ddcb74b87-n6jsd yum.repos.d]# cat /etc/resolv.conf
nameserver 10.96.0.10
search testalex.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
# kubectl get ep kube-dns --namespace=kube-system
NAME ENDPOINTS AGE
kube-dns 192.168.121.3:53,192.168.121.4:53,192.168.121.3:53 + 1 more... 50d
also dns server can not be reached
# kubectl exec -it k8s-monitor-7ddcb74b87-n6jsd -n testalex /bin/bash
[root#k8s-monitor-7ddcb74b87-n6jsd /]# cat /etc/resolv.conf
nameserver 10.96.0.10
search testalex.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
[root#k8s-monitor-7ddcb74b87-n6jsd /]# ping 10.96.0.10
PING 10.96.0.10 (10.96.0.10) 56(84) bytes of data.
^C
--- 10.96.0.10 ping statistics ---
9 packets transmitted, 0 received, 100% packet loss, time 8000ms
i think maybe i misconfig the network
this is my cluster init command
kubeadm init --kubernetes-version=v1.11.3 --apiserver-advertise-address=10.100.1.20 --pod-network-cidr=172.16.0.0/16
and this is calico ip pool set
# kubectl exec -it calico-node-77m9l -n kube-system /bin/sh
Defaulting container name to calico-node.
Use 'kubectl describe pod/calico-node-77m9l -n kube-system' to see all of the containers in this pod.
/ # cd /tmp
/tmp # ls
calicoctl tunl-ip
/tmp # ./calicoctl get ipPool
CIDR
192.168.0.0/16
You can start by checking if the dns is working
Run the nslookup on kubernetes.default from inside the pod k8s-monitor-7ddcb74b87-n6jsd, check if it is working.
[root#k8s-monitor-7ddcb74b87-n6jsd /]# nslookup kubernetes.default
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
If this returns output that means everything is working from the coredns. If output is not okay, then look into the the resolve.conf inside the pod k8s-monitor-7ddcb74b87-n6jsd, it should return output something like this:
[root#metrics-master-2 /]# cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local ec2.internal
options ndots:5
At last check the coredns endpoints are exposed using:
kubectl get ep kube-dns --namespace=kube-system
NAME ENDPOINTS AGE
kube-dns 10.180.3.17:53,10.180.3.17:53 1h
You can verify if queries are being received by CoreDNS by adding the log plugin to the CoreDNS configuration (aka Corefile). The CoreDNS Corefile is held in a ConfigMap named coredns
Hope this helps.
EDIT:
You might be having this issue, Please have a look:
https://github.com/kubernetes/kubeadm/issues/1056
You cannot ping ipaddress or hostname of service cluster always,since it is virtual ip
service’s cluster IP is a virtual IP, and only has meaning when combined with the service port.You can try the same via srv recored(combination of virtual ip and port)(refer kubernetes in action by mark luksa)
Thanks for the answer. This is the output. IP-s certainly not real.
[root#master ~]# nslookup kubernetes.default
Server: 203.150.92.12
Address: 203.150.92.12#53
** server can't find kubernetes.default: NXDOMAIN
[root#master ~]# kubectl cluster-info
Kubernetes master is running at https://203.150.72.81:6443
coredns is running at https://203.150.72.81:6443/api/v1/namespaces/kube-system/services/coredns:dns/proxy
kubernetes-dashboard is running at https://203.150.72.81:6443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy
metrics-server is running at https://203.150.72.81:6443/api/v1/namespaces/kube-system/services/https:metrics-server:/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
[root#master ~]# cat /etc/resolv.conf
search invalid
nameserver 203.150.92.12
nameserver 203.150.92.10
nameserver 1111:c207::2:55
[root#master ~]# kubectl get ep kube-dns --namespace=kube-system
Error from server (NotFound): endpoints "kube-dns" not found
[root#master ~]#
I think the reason why you cannot get ping working is because you are using iptables to redirect the request to service cluster IP to the correct pods. The iptables rule will only redirect the traffic to the service cluster IP with the exported ports. The icmp request is never been redirected to the real endpoints.

Cannot access Kubernetes pods exposed external ip on google cloud

I have created a sample node.js app and other required files (deployment.yml, service.yml) but I am not able to access the external IP of the service.
#kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.7.240.1 <none> 443/TCP 23h
node-api LoadBalancer 10.7.254.32 35.193.227.250 8000:30164/TCP 4m37s
#kubectl get pods
NAME READY STATUS RESTARTS AGE
node-api-6b9c8b4479-nclgl 1/1 Running 0 5m55s
#kubectl describe svc node-api
Name: node-api
Namespace: default
Labels: <none>
Annotations: <none>
Selector: app=node-api
Type: LoadBalancer
IP: 10.7.254.32
LoadBalancer Ingress: 35.193.227.250
Port: <unset> 8000/TCP
TargetPort: 8000/TCP
NodePort: <unset> 30164/TCP
Endpoints: 10.4.0.12:8000
Session Affinity: None
External Traffic Policy: Cluster
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal EnsuringLoadBalancer 6m19s service-controller Ensuring load balancer
Normal EnsuredLoadBalancer 5m25s service-controller Ensured load balancer
When I try to do a curl on external ip it gives connection refused
curl 35.193.227.250:8000
curl: (7) Failed to connect to 35.193.227.250 port 8000: Connection refused
I have exposed port 8000 in Dockerfile also. Let me know if I am missing anything.
Looking at your description on this thread it seems everything is fine.
Here is what you can try:
SSH to the GKE node where the pod is running. You can get the node name by running the same command you used with "-o wide" flag.
$ kubectl get pods -o wide
After that doing the SSH, try to curl Cluster as well as Service IP to see if you get response or not.
Try to SSH to the pod
$ kubectl exec -it -- /bin/bash
After that, run local host to see if you get response or not
$ curl localhost
So if you get response upon trying above troubleshooting steps then it could be an issue underlying at the GKE. You can file a defect report here.
If you do not get any response while trying the above steps, it is possible that you have misconfigured the cluster somewhere.
This seems to me a good starting point for troubleshooting your use case.