DNS entries for pods in not-ready state - MongoDB

I'm trying to build a simple Mongo replica set cluster in Kubernetes.
I have a StatefulSet of mongod instances, with
livenessProbe:
  initialDelaySeconds: 60
  exec:
    command:
      - mongo
      - --eval
      - "db.adminCommand('ping')"
readinessProbe:
  initialDelaySeconds: 60
  exec:
    command:
      - /usr/bin/mongo --quiet --eval 'rs.status()' | grep ok | cut -d ':' -f 2 | tr -dc '0-9' | awk '{ if($0=="0"){ exit 127 }else{ exit 0 } }'
As you can see, my readinessProbe checks whether the mongo replica set is working correctly.
However, I get a circular dependency, with the (existing) cluster reporting:
"lastHeartbeatMessage" : "Error connecting to mongo-2.mongo:27017 :: caused by :: Could not find address for mongo-2.mongo:27017: SocketException: Host not found (authoritative)",
(where mongo-2 was undergoing a rolling update).
Looking further:
$ kubectl run --generator=run-pod/v1 tmp-shell --rm -i --tty --image nicolaka/netshoot -- /bin/bash
bash-5.0# nslookup mongo-2.mongo
Server: 10.96.0.10
Address: 10.96.0.10#53
** server can't find mongo-2.mongo: NXDOMAIN
bash-5.0# nslookup mongo-0.mongo
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: mongo-0.mongo.cryoem-logbook-dev.svc.cluster.local
Address: 10.27.137.6
So the question is: is there a way to get Kubernetes to keep the DNS entries for the mongo pods present at all times? It appears that I have a chicken-and-egg situation: until a pod has passed its readiness and liveness checks, its DNS entry is not created, and hence the other mongod instances are not able to reach it.

I ended up just putting in a ClusterIP Service for each of the StatefulSet instances, with a selector for the specific instance, i.e.
apiVersion: v1
kind: Service
metadata:
  name: mongo-0
spec:
  clusterIP: 10.101.41.87
  ports:
  - port: 27017
    protocol: TCP
    targetPort: 27017
  selector:
    role: mongo
    statefulset.kubernetes.io/pod-name: mongo-0
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
and repeat for the other StatefulSet pods. The key here is the selector:
statefulset.kubernetes.io/pod-name: mongo-0
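For completeness, a minimal sketch of how the replica set could then be initiated against those per-pod Service names (the replica set name rs0 and the port are assumptions; adjust for your deployment):

rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo-0:27017" },  // resolves via the per-pod ClusterIP Service
    { _id: 1, host: "mongo-1:27017" },
    { _id: 2, host: "mongo-2:27017" }
  ]
})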

I believe you are misinterpreting the error.
Could not find address for mongo-2.mongo:27017: SocketException: Host not found (authoritative)
The pod is created with an IP attached. Then it's registered into DNS:
Pod-0 has the IP 10.0.0.10 and now its FQDN is Pod-0.servicename.namespace.svc.cluster.local
Pod-1 has the IP 10.0.0.11 and now its FQDN is Pod-1.servicename.namespace.svc.cluster.local
Pod-2 has the IP 10.0.0.12 and now its FQDN is Pod-2.servicename.namespace.svc.cluster.local
But DNS is a live service, IPs are dynamically assigned and can't be duplicated.
So whenever it receives a request:
"Connect me with Pod-A.servicename.namespace.svc.cluster.local"
It tries to reach the registered IP, and if the Pod is offline due to a rolling update, it will consider the pod unavailable and will return "Could not find the address (IP) for Pod-0.servicename" until the pod is online again, or until the IP reservation expires; only then will the DNS record be recycled.
DNS is not discarding the registered name; it's only answering that it is currently offline.
You can either ignore the errors during the rolling update, or rethink your script and try using the internal JS environment, as mentioned in the comments, for continuous monitoring of the mongo status.
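For illustration, a hedged sketch of a simpler probe along those lines: run the check through a shell (so the pipe actually works) and test rs.status().ok directly. The exact mongo invocation is an assumption; adjust for your image and any authentication:

readinessProbe:
  initialDelaySeconds: 60
  exec:
    command:
      - /bin/sh
      - -c
      # rs.status().ok prints 1 on a healthy member; grep -q turns that into the exit code
      - "mongo --quiet --eval 'rs.status().ok' | grep -q 1"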
EDIT:
When Pods from a StatefulSet with N replicas are being deployed, they are created sequentially, in order from {0..N-1}.
When Pods are being deleted, they are terminated in reverse order, from {N-1..0}.
This is the expected/desired default behavior.
So the error is expected, since the rollingUpdate makes the pod temporarily unavailable.
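For reference, a sketch of the StatefulSet defaults that produce this ordered behavior (both values shown are the defaults, so you normally don't set them explicitly):

spec:
  podManagementPolicy: OrderedReady   # pods are created 0..N-1 and deleted N-1..0
  updateStrategy:
    type: RollingUpdate               # pods are replaced one at a time, in reverse ordinal order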

Related

Why sessionAffinity doesn't work on a headless service

I have the following headless Service in my Kubernetes cluster:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: foobar
  name: foobar
spec:
  clusterIP: None
  clusterIPs:
  - None
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: foobar
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800
  type: ClusterIP
Behind it are a couple of pods managed by a StatefulSet.
Let's try to reach my pods individually.
Running an alpine pod to contact my pods:
> kubectl run alpine -it --tty --image=alpine -- sh
Adding curl to fetch the webpage:
alpine#> apk add curl
I can curl each of my pods:
alpine#> curl -s pod-1.foobar
hello from pod 1
alpine#> curl -s pod-2.foobar
hello from pod 2
It works just as expected.
Now I want to have a service that will load-balance between my pods.
Let's try to use that same foobar service:
alpine#> curl -s foobar
hello from pod 1
alpine#> curl -s foobar
hello from pod 2
It works just as well. Or almost: in my headless service I have specified sessionAffinity, so as soon as I curl a pod, I should stick to it.
I've tried the exact same test with a normal service (not headless) and this time it works as expected: it load-balances between pods on the first runs, BUT then sticks to the same pod afterwards.
Why doesn't sessionAffinity work on a headless service?
The affinity capability is provided by kube-proxy; only connections established through the proxy can have the client IP "stick" to a particular pod for a period of time. With a headless Service, your client is given a list of pod IPs and it is up to your client application to select which IP to connect to. Because the order of IPs in the list is not always the same, a typical application that always picks the first IP will end up connecting to a backend pod more or less at random.
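For comparison, a sketch of a non-headless Service over the same pods, where sessionAffinity does take effect because connections go through the virtual IP programmed by kube-proxy (the name foobar-sticky is made up; selector and ports are taken from the question):

apiVersion: v1
kind: Service
metadata:
  name: foobar-sticky          # hypothetical name, so it can coexist with the headless foobar
spec:
  selector:
    app: foobar
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
  sessionAffinity: ClientIP    # honored here because kube-proxy handles the connection
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800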

port forwarding on microk8s on mac m1

I'm brand new to microk8s, and I'm trying things out by deploying a simple apache2 to see things working on my Mac M1:
◼ ~ $ microk8s kubectl run apache --image=ubuntu/apache2:2.4-22.04_beta --port=80
pod/apache created
◼ ~ $ microk8s kubectl get pods
NAME READY STATUS RESTARTS AGE
apache 1/1 Running 0 5m37s
◼ ~ $ microk8s kubectl port-forward pod/apache 3000:80
Forwarding from 127.0.0.1:3000 -> 80
but:
◼ ~ $ curl http://localhost:3000
curl: (7) Failed to connect to localhost port 3000 after 5 ms: Connection refused
I've also tried to use a service:
◼ ~ $ microk8s kubectl expose pod apache --type=NodePort --port=4000 --target-port=80
service/apache exposed
◼ ~ $ curl http://localhost:4000
curl: (7) Failed to connect to localhost port 4000 after 3 ms: Connection refused
I guess I'm doing something wrong?
For some reason I haven't figured out, if I port-forward from right within the VM (by opening a shell via multipass), it does work. Next, you simply have to point to the VM's IP.
Within the VM's shell:
ubuntu@microk8s-vm:~$ sudo microk8s kubectl port-forward service/hellopg 8080:8080 --address="0.0.0.0"
Forwarding from 0.0.0.0:8080 -> 8080
Handling connection for 8080
ubuntu@microk8s-vm:~$ ifconfig enp0s1
enp0s1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.64.2 netmask 255.255.255.0 broadcast 192.168.64.255
inet6 fde3:1a04:ba31:1209:5054:ff:fea9:9cf4 prefixlen 64 scopeid 0x0<global>
from the host:
curl http://192.168.64.2:8080/hello
{"status": "how you doing?", "env_var":"¡hola mundo!"}
It works. I guess the command issued via microk8s is not executed properly within the machine? If anybody can explain this I'll update the question.
MicroK8s acts the same as Kubernetes, so it's better to create a Service of type NodePort. This would expose your apache.
apiVersion: v1
kind: Service
metadata:
  name: my-apache
spec:
  type: NodePort
  selector:
    app: apache
  ports:
  - port: 80
    targetPort: 80
    nodePort: 30004
Change the selector as per your requirement. For more detailed information on creating a NodePort Service, refer to the official documentation.
You can use an Ingress as well, but in your case, just for testing, you can go with NodePort.
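Tying this to the multipass observation above: on a Mac the cluster runs inside the multipass VM, so the NodePort has to be reached via the VM's IP rather than localhost. A sketch, using the IP from the ifconfig output earlier and the nodePort from the manifest above:

# note: a pod created with `kubectl run apache` gets the label run=apache,
# so the Service selector may need to be `run: apache` rather than `app: apache`
curl http://192.168.64.2:30004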
I think the easiest way for you to test it would be adding externalIPs to your Service.
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 80
  externalIPs:
  - 192.168.56.100 # your cluster IP
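If that route is taken, the app would then be reachable on the listed external IP, assuming that address is actually routable from your host (on a Mac M1 multipass setup the VM's 192.168.64.x address is the more likely candidate):

curl http://192.168.56.100:80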
Happy coding!

Connection Refused between Kubernetes pods in the same cluster

I am new to Kubernetes and I'm working on deploying an application within a new Kubernetes cluster.
Currently, the service has multiple pods that need to communicate with each other. I'm looking for a general approach to debugging the issue, rather than getting into the specifics of the service, as the question would become much too specific.
The pods within the cluster are throwing an error:
err="Get \"http://testpod.mynamespace.svc.cluster.local:8080/": dial tcp 10.10.80.100:8080: connect: connection refused"
Both pods are in the same cluster.
What are the best steps to take to debug this?
I have tried running:
kubectl exec -it testpod --namespace mynamespace -- cat /etc/resolv.conf
And this returns:
search mynamespace.svc.cluster.local svc.cluster.local cluster.local us-east-2.compute.internal
Which I found here: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
First of all, the following pattern:
my-svc.my-namespace.svc.cluster-domain.example
is applicable only to FQDNs of Services, not Pods which have the following form:
pod-ip-address.my-namespace.pod.cluster-domain.example
e.g.:
172-17-0-3.default.pod.cluster.local
So in fact you're querying cluster dns about FQDN of the Service named testpod and not about FQDN of the Pod. Judging by the fact that it's being resolved successfully, such Service already exists in your cluster but most probably is misconfigured. The fact that you're getting the error message connection refused can mean the following:
your Service FQDN testpod.mynamespace.svc.cluster.local has been successfully resolved
(otherwise you would receive something like curl: (6) Could not resolve host: testpod.default.svc.cluster.local)
you've reached successfully your testpod Service
(otherwise, i.e. if it existed but wasn't listening on the 8080 port you're trying to connect to, you would receive a timeout, e.g. curl: (7) Failed to connect to testpod.default.svc.cluster.local port 8080: Connection timed out)
you've reached the Pod exposed by the testpod Service (you've been successfully redirected to it by the testpod Service)
but once the Pod was reached, you tried to connect to an incorrect port, and that's why the connection is being refused by the server
My best guess is that your Pod in fact listens on a different port, like 80, and you exposed it via the ClusterIP Service by specifying only the --port value, e.g. by:
kubectl expose pod testpod --port=8080
In such a case both --port (the port of the Service) and --targetPort (the port of the Pod) will have the same value. In other words, you've created a Service like the one below:
apiVersion: v1
kind: Service
metadata:
  name: testpod
spec:
  ports:
  - protocol: TCP
    port: 8080
    targetPort: 8080
And you probably should've exposed it either this way:
kubectl expose pod testpod --port=8080 --targetPort=80
or with the following yaml manifest:
apiVersion: v1
kind: Service
metadata:
  name: testpod
spec:
  ports:
  - protocol: TCP
    port: 8080
    targetPort: 80
Of course your targetPort may be different from 80, but connection refused in such a case can mean only one thing: the target HTTP server (running in the Pod) refuses the connection on port 8080, most probably because it isn't listening on it. You didn't specify what image you're using, whether it's a standard nginx webserver or something based on your custom image; but if it's nginx and wasn't configured differently, it listens on port 80.
For further debug, you can attach to your Pod:
kubectl exec -it testpod --namespace mynamespace -- /bin/sh
and if the netstat command is not present (the most likely scenario), run:
apt update && apt install net-tools
and then check with netstat -ntlp which port your container listens on.
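Alternatively, a quick check from outside the Pod (a sketch; names and namespace are taken from the question):

# show the Service's targetPort, i.e. the port the Pod is expected to listen on
kubectl get svc testpod -n mynamespace -o jsonpath='{.spec.ports[0].targetPort}'
# confirm the Service actually has endpoints (pod IP:port pairs)
kubectl get endpoints testpod -n mynamespace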
I hope this helps you solve your issue. In case of any doubts, don't hesitate to ask.

redis-cluster on kubernetes: connection timed out

I have combined/followed the following manuals to create a redis cluster on kubernetes (GCP):
https://github.com/sanderploegsma/redis-cluster
https://rancher.com/blog/2019/deploying-redis-cluster
I have created 3 nodes with each 2 pods on it. The problem is: I get a connection timeout when I connect from outside of the kubernetes cluster (through a load balancer external ip) to the redis-cluster.
$ redis-cli -h external_ip_lb -p 6379 -c
external_ip_lb:6379> set foo bar
-> Redirected to slot [12182] located at internal_ip_node:6379
Could not connect to Redis at internal_ip_node:6379: Operation timed out
When I get into the shell of a running container and do the redis-cli commands there, it works.
$ kubectl exec -it redis-cluster-0 -- redis-cli -c
127.0.0.1:6379> set foo bar
-> Redirected to slot [12182] located at internal_ip_node:6379
OK
internal_ip_node:6379> get foo
"bar"
I also tried to set up a ClusterIP Service and port-forward to port 7000 on my local machine; this gives me the same error as with the external IP method.
$ kubectl port-forward pods/redis-cluster-0 7000:6379
Does anyone have an idea what could be wrong? Clearly it has something to do with my local machine not being part of the Kubernetes cluster, so the connections to the internal IPs of the other nodes fail.
Edit: output of kubectl describe svc redis-cluster-lb
Name: redis-cluster-lb
Namespace: default
Labels: <none>
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"redis-cluster-lb","namespace":"default"},"spec":{"ports":[{"port"...
Selector: app=redis-cluster
Type: LoadBalancer
IP: internal_ip_lb
LoadBalancer Ingress: external_ip_lb
Port: <unset> 6379/TCP
TargetPort: 6379/TCP
NodePort: <unset> 30631/TCP
Endpoints: internal_ip_node_1:6379,internal_ip_node_2:6379,internal_ip_node_3:6379 + 3 more...
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
I'm able to ping the external load balancer's IP.
I am not a Redis expert, but in the Redis documentation you can read:
Since cluster nodes are not able to proxy requests, clients may be redirected to other nodes using redirection errors
This is why you are having these issues with a Redis cluster behind an LB, and this is also the reason why it is (most probably) not going to work.
You probably need to use some proxy (e.g. the official redis-cluster-proxy) that runs inside the k8s cluster, can reach all internal IPs of the Redis cluster, and handles the redirects.
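Until such a proxy is in place, one workaround is to run redis-cli from a throwaway pod inside the cluster, where the internal node IPs used in the redirects are reachable. A sketch; the image tag and the redis-cluster Service name are assumptions:

kubectl run redis-client --rm -it --image=redis:6 -- \
  redis-cli -c -h redis-cluster -p 6379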

DNS lookup not working properly in Kubernetes cluster

I spin up a cluster with minikube and then apply this dummy deployment/service:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      run: nginx-label
  replicas: 2
  template:
    metadata:
      labels:
        run: nginx-label
    spec:
      containers:
      - name: nginx-container
        image: nginx
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  labels:
    run: nginx-label
spec:
  ports:
  - port: 1234
    targetPort: 80
    protocol: TCP
  selector:
    run: nginx-label
Then I create a dummy curl pod to test the internal network with the following:
kubectl run curl --image=radial/busyboxplus:curl -i --tty
Inside that curl instance, I'm able to access the nginx with $NGINX_SERVICE_SERVICE_HOST:$NGINX_SERVICE_SERVICE_PORT or nginx-service.default:1234, but not nginx-service:1234, even though those pods belong to the same namespace.
ubuntu:~$ kubectl get pods --namespace=default
NAME READY STATUS RESTARTS AGE
curl-69c656fd45-d7w8t 1/1 Running 1 29m
nginx-deployment-58595d65fc-9ln25 1/1 Running 0 29m
nginx-deployment-58595d65fc-znkqp 1/1 Running 0 29m
Any idea what could cause this? Following is the nslookup result
[ root@curl-69c656fd45-d7w8t:/ ]$ nslookup nginx-service
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: nginx-service
Address 1: 23.202.231.169 a23-202-231-169.deploy.static.akamaitechnologies.com
Address 2: 23.217.138.110 a23-217-138-110.deploy.static.akamaitechnologies.com
[ root@curl-69c656fd45-d7w8t:/ ]$ nslookup nginx-service.default
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: nginx-service.default
Address 1: 10.103.69.73 nginx-service.default.svc.cluster.local
[ root@curl-69c656fd45-d7w8t:/ ]$
Update: here's the content of /etc/resolv.conf
[ root@curl-69c656fd45-d7w8t:/ ]$ cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local attlocal.net
options ndots:5
[ root@curl-69c656fd45-d7w8t:/ ]$
Answering your question from the comments.
Check the lines below the Name field in the nslookup output; for the host nginx-service they are:
Name: nginx-service
Address 1: 23.202.231.169 a23-202-231-169.deploy.static.akamaitechnologies.com
Address 2: 23.217.138.110 a23-217-138-110.deploy.static.akamaitechnologies.com
for nginx-service.default:
Name: nginx-service.default
Address 1: 10.103.69.73 nginx-service.default.svc.cluster.local
The flow is:
Execute the nslookup command.
Check the entries (the lines below the name, with the address and hostname).
Compare those entries and their domains to the domains listed in the /etc/resolv.conf file; if they don't match, it means we cannot reach that specific host by the name we used.
Your nginx-service "hit" a23-202-231-169.deploy.static.akamaitechnologies.com, NOT nginx-service.default.svc.cluster.local.
nginx-service.default "hit" nginx-service.default.svc.cluster.local; as you said, that is why curl is working.
10.96.0.10 is the address of your cluster's Domain Name Server, named kube-dns.kube-system.svc.cluster.local. This is the server your system is configured to use to translate domain names into IP addresses.
Speaking about the domain deploy.static.akamaitechnologies.com:
Akamai is a Content Delivery Network used by Symantec (and many other companies). Such services provide the servers needed to propagate large amounts of data to various regions of the world quickly while balancing traffic so that individual server locations are not overloaded, so it is normal to see connections to these hosts when you visit certain websites or have certain products (like Norton) installed. In your case, the bare name nginx-service was most likely resolved outside the cluster (note the attlocal.net entry in your search list) and answered with these Akamai addresses instead of the in-cluster Service IP.
Hmm, sorry, not really...My question is that, based on the search
directive in /etc/resolv.conf, if nginx-service.default is resolved
to nginx-service.default.svc.cluster.local, so should nginx-service.
Did I miss anything?
Answer:
Keep in mind the order:
Look first at the nslookup output,
and then find the match in /etc/resolv.conf.
Note: if you do these steps the other way round, it won't work.
There are many possible reasons why a domain resolves incorrectly in nslookup - see: dns-debugging. Try executing the command dig nginx-service and interpret its output to find the real problem. It is clear why you cannot curl nginx-service (I have explained it above), but why nslookup shows different records is a completely different question.
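A sketch of that dig-based debugging, run from inside the curl pod (output will differ per cluster):

dig nginx-service                              # bare name; dig does not apply the search list by default
dig +search +short nginx-service               # apply the resolv.conf search domains, as a pod's resolver would
dig nginx-service.default.svc.cluster.local    # fully qualified name, bypassing the search list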
You can find more information here: nslookup, akamai.