haproxy cannot detect services after reboot - kubernetes

I have the following Redis resources:
NAME READY STATUS RESTARTS AGE
pod/redis-haproxy-deployment-65497cd78d-659tq 1/1 Running 0 31m
pod/redis-sentinel-node-0 3/3 Running 0 81m
pod/redis-sentinel-node-1 3/3 Running 0 80m
pod/redis-sentinel-node-2 3/3 Running 0 80m
pod/ubuntu 1/1 Running 0 85m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/redis-haproxy-balancer ClusterIP 10.43.92.106 <none> 6379/TCP 31m
service/redis-sentinel-headless ClusterIP None <none> 6379/TCP,26379/TCP 99m
service/redis-sentinel-metrics ClusterIP 10.43.72.97 <none> 9121/TCP 99m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/redis-haproxy-deployment 1/1 1 1 31m
NAME DESIRED CURRENT READY AGE
replicaset.apps/redis-haproxy-deployment-65497cd78d 1 1 1 31m
NAME READY AGE
statefulset.apps/redis-sentinel-node 3/3 99m
I connect to the master redis using the following command:
redis-cli -h redis-haproxy-balancer
redis-haproxy-balancer:6379> keys *
1) "sdf"
2) "sdf12"
3) "s4df12"
4) "s4df1"
5) "fsafsdf"
6) "!s4d!1"
7) "s4d!1"
Here is my configuration file haproxy.cfg:
global
    daemon
    maxconn 256

defaults REDIS
    mode tcp
    timeout connect 3s
    timeout server 3s
    timeout client 3s

frontend front_redis
    bind 0.0.0.0:6379
    use_backend redis_cluster

backend redis_cluster
    mode tcp
    option tcp-check
    tcp-check comment PING\ phase
    tcp-check send PING\r\n
    tcp-check expect string +PONG
    tcp-check comment role\ check
    tcp-check send info\ replication\r\n
    tcp-check expect string role:master
    tcp-check comment QUIT\ phase
    tcp-check send QUIT\r\n
    tcp-check expect string +OK
    server redis-0 redis-sentinel-node-0.redis-sentinel-headless:6379 maxconn 1024 check inter 1s
    server redis-1 redis-sentinel-node-1.redis-sentinel-headless:6379 maxconn 1024 check inter 1s
    server redis-2 redis-sentinel-node-2.redis-sentinel-headless:6379 maxconn 1024 check inter 1s
Here is the Service I connect to in order to reach the Redis master - haproxy-service.yaml:
apiVersion: v1
kind: Service
metadata:
  name: redis-haproxy-balancer
spec:
  type: ClusterIP
  selector:
    app: redis-haproxy
  ports:
  - protocol: TCP
    port: 6379
    targetPort: 6379
Here is the Deployment that refers to the configuration file - redis-haproxy-deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-haproxy-deployment
  labels:
    app: redis-haproxy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-haproxy
  template:
    metadata:
      labels:
        app: redis-haproxy
    spec:
      containers:
      - name: redis-haproxy
        image: haproxy:lts-alpine
        volumeMounts:
        - name: redis-haproxy-config-volume
          mountPath: /usr/local/etc/haproxy/haproxy.cfg
          subPath: haproxy.cfg
        ports:
        - containerPort: 6379
      volumes:
      - name: redis-haproxy-config-volume
        configMap:
          name: redis-haproxy-config
          items:
          - key: haproxy.cfg
            path: haproxy.cfg
After restarting Redis I can no longer connect to it through redis-haproxy-balancer...
[NOTICE] (1) : New worker (8) forked
[NOTICE] (1) : Loading success.
[WARNING] (8) : Server redis_cluster/redis-0 is DOWN, reason: Layer7 timeout, info: " at step 6 of tcp-check (expect string 'role:master')", check duration: 1000ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] (8) : Server redis_cluster/redis-1 is DOWN, reason: Layer7 timeout, info: " at step 6 of tcp-check (expect string 'role:master')", check duration: 1005ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] (8) : Server redis_cluster/redis-2 is DOWN, reason: Layer7 timeout, info: " at step 6 of tcp-check (expect string 'role:master')", check duration: 1001ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] (8) : backend 'redis_cluster' has no server available!
It only works when connecting directly to redis-sentinel-node-0.redis-sentinel-headless.
What is wrong with my haproxy?

You will need to add a resolvers section and point it to the Kubernetes DNS.
Kubernetes: DNS for Services and Pods
HAProxy: Server IP address resolution using DNS
resolvers mydns
    nameserver dns1 Kubernetes-DNS-Service-ip:53
    resolve_retries 3
    timeout resolve 1s
    timeout retry 1s
    hold other 30s
    hold refused 30s
    hold nx 30s
    hold timeout 30s
    hold valid 10s
    hold obsolete 30s

backend redis_cluster
    mode tcp
    option tcp-check
    ... # your other settings
    server redis-0 redis-sentinel-node-0.redis-sentinel-headless:6379 resolvers mydns maxconn 1024 check inter 1s
    server redis-1 redis-sentinel-node-1.redis-sentinel-headless:6379 resolvers mydns maxconn 1024 check inter 1s
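If you don't want to hard-code the cluster DNS IP, you can look it up first (in most clusters it is the kube-dns or coredns Service in kube-system); on recent HAProxy versions you can also let HAProxy reuse the pod's /etc/resolv.conf. A rough sketch under those assumptions:

# find the cluster DNS Service IP (the Service may be named kube-dns or coredns)
kubectl -n kube-system get svc kube-dns -o jsonpath='{.spec.clusterIP}'

# alternative resolvers section that reads the container's /etc/resolv.conf
resolvers mydns
    parse-resolv-conf
    resolve_retries 3
    timeout resolve 1s
    timeout retry 1s
    hold valid 10s

Adding init-addr none to the server lines also lets HAProxy start cleanly when a pod name does not resolve yet, for example right after a reboot while the sentinel pods are still coming up:

server redis-0 redis-sentinel-node-0.redis-sentinel-headless:6379 resolvers mydns init-addr none maxconn 1024 check inter 1s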

Related

Nginx Ingress Controller on Bare Metal expose problem

I am trying to deploy nginx-ingress-controller on bare metal. I have:
4 Node
10.0.76.201 - Node 1
10.0.76.202 - Node 2
10.0.76.203 - Node 3
10.0.76.204 - Node 4
4 Worker
10.0.76.205 - Worker 1
10.0.76.206 - Worker 2
10.0.76.207 - Worker 3
10.0.76.214 - Worker 4
2 LB
10.0.76.208 - LB 1
10.0.76.209 - Virtual IP (keepalived)
10.0.76.210 - LB 2
Everything is on bare metal, and the load balancer is located outside the cluster.
This is a simple haproxy config that just checks port 80 on the worker IPs:
frontend kubernetes-frontends
    bind *:80
    mode tcp
    option tcplog
    default_backend kube

backend kube
    mode http
    balance roundrobin
    cookie lsn insert indirect nocache
    option http-server-close
    option forwardfor
    server node-1 10.0.76.205:80 maxconn 1000 check
    server node-2 10.0.76.206:80 maxconn 1000 check
    server node-3 10.0.76.207:80 maxconn 1000 check
    server node-4 10.0.76.214:80 maxconn 1000 check
I installed nginx-ingress-controller using Helm and everything works fine:
NAME READY STATUS RESTARTS AGE
pod/ingress-nginx-admission-create-xb5rw 0/1 Completed 0 18m
pod/ingress-nginx-admission-patch-skt7t 0/1 Completed 2 18m
pod/ingress-nginx-controller-6dc865cd86-htrhs 1/1 Running 0 18m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/ingress-nginx-controller NodePort 10.106.233.186 <none> 80:30659/TCP,443:32160/TCP 18m
service/ingress-nginx-controller-admission ClusterIP 10.102.132.131 <none> 443/TCP 18m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/ingress-nginx-controller 1/1 1 1 18m
NAME DESIRED CURRENT READY AGE
replicaset.apps/ingress-nginx-controller-6dc865cd86 1 1 1 18m
NAME COMPLETIONS DURATION AGE
job.batch/ingress-nginx-admission-create 1/1 24s 18m
job.batch/ingress-nginx-admission-patch 1/1 34s 18m
I deployed nginx the simple way and it works fine:
kubectl create deploy nginx --image=nginx:1.18
kubectl scale deploy/nginx --replicas=6
kubectl expose deploy/nginx --type=NodePort --port=80
After that, I decided to create ingress.yaml:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: tektutor-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: "tektutor.training.org"
    http:
      paths:
      - pathType: Prefix
        path: "/nginx"
        backend:
          service:
            name: nginx
            port:
              number: 80
It works fine:
kubectl describe ingress tektutor-ingress
Name: tektutor-ingress
Labels: <none>
Namespace: default
Address: 10.0.76.214
Ingress Class: <none>
Default backend: <default>
Rules:
Host Path Backends
---- ---- --------
tektutor.training.org
/nginx nginx:80 (192.168.133.241:80,192.168.226.104:80,192.168.226.105:80 + 3 more...)
Annotations: kubernetes.io/ingress.class: nginx
nginx.ingress.kubernetes.io/rewrite-target: /
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal AddedOrUpdated 18m nginx-ingress-controller Configuration for default/tektutor-ingress was added or updated
Normal Sync 18m (x2 over 18m) nginx-ingress-controller Scheduled for sync
Everything works fine; when I curl any of the endpoint IPs directly it works (192.168.133.241:80, 192.168.226.104:80, 192.168.226.105:80 + 3 more...).
Now I try to add a hosts entry:
10.0.76.201 tektutor.training.org
This is my master node's IP. Is it correct to add the master IP here? When I try curl tektutor.training.org it does not work.
Can you please explain what my problem is with this last step?
Did I set the wrong IP, or something else? Thanks!
I hope I have written everything exhaustively.
I followed this tutorial: Medium - Install nginx Ingress Controller.
TL;DR
Put the values shown below in your haproxy backend config instead of the ones you've provided:
30659 instead of 80
32160 instead of 443 (if needed)
More explanation:
NodePort works on a certain range of ports (default: 30000-32767), and in this scenario it allocated:
30659 for your ingress-nginx-controller port 80.
32160 for your ingress-nginx-controller port 443.
This means that every request trying to reach your cluster from outside will need to contact these ports (30...).
You can read more about it by following official documentation:
Kubernetes.io: Docs: Concepts: Services
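For example, keeping everything else from the backend in the question and only swapping the ports for the allocated NodePort, a sketch of the corrected HTTP backend would be:

backend kube
    mode http
    balance roundrobin
    cookie lsn insert indirect nocache
    option http-server-close
    option forwardfor
    server node-1 10.0.76.205:30659 maxconn 1000 check
    server node-2 10.0.76.206:30659 maxconn 1000 check
    server node-3 10.0.76.207:30659 maxconn 1000 check
    server node-4 10.0.76.214:30659 maxconn 1000 check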
A funny story that took 2 days :) In the Ingress I had used the path /nginx but I was not hitting it, while I should have been requesting something like:
http://tektutor.training.org/nginx
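For instance, once an /etc/hosts entry points tektutor.training.org at the load balancer entry point (I'm assuming the keepalived VIP 10.0.76.209 from the setup above), the request looks like:

curl http://tektutor.training.org/nginx
# or, without editing /etc/hosts:
curl -H "Host: tektutor.training.org" http://10.0.76.209/nginx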
Thanks @Dawid Kruk, who tried to help me :)!

Not able to connect to kafka brokers

I've deployed https://github.com/confluentinc/cp-helm-charts/tree/master/charts/cp-kafka on my on prem k8s cluster.
I'm trying to expose it by using a TCP controller with nginx.
My TCP nginx ConfigMap looks like:
data:
  "<zookeper-tcp-port>": <namespace>/cp-zookeeper:2181
  "<kafka-tcp-port>": <namespace>/cp-kafka:9092
And I've made the corresponding entries in my nginx ingress controller:
- name: <zookeper-tcp-port>-tcp
  port: <zookeper-tcp-port>
  protocol: TCP
  targetPort: <zookeper-tcp-port>-tcp
- name: <kafka-tcp-port>-tcp
  port: <kafka-tcp-port>
  protocol: TCP
  targetPort: <kafka-tcp-port>-tcp
Now I'm trying to connect to my kafka instance.
When I just try to connect to the IP and port using Kafka tools, I get the error message:
Unable to determine broker endpoints from Zookeeper.
One or more brokers have multiple endpoints for protocol PLAIN...
Please proved bootstrap.servers value in advanced settings
[<cp-broker-address-0>.cp-kafka-headless.<namespace>:<port>][<ip>]
When I enter what I assume are the correct broker addresses (I've tried them all...), I get a timeout. There are no logs coming from the nginx controller except:
[08/Apr/2020:15:51:12 +0000]TCP200000.000
[08/Apr/2020:15:51:12 +0000]TCP200000.000
[08/Apr/2020:15:51:14 +0000]TCP200000.001
From the pod kafka-zookeeper-0 I'm getting loads of:
[2020-04-08 15:52:02,415] INFO Accepted socket connection from /<ip:port> (org.apache.zookeeper.server.NIOServerCnxnFactory)
[2020-04-08 15:52:02,415] WARN Unable to read additional data from client sessionid 0x0, likely client has closed socket (org.apache.zookeeper.server.NIOServerCnxn)
[2020-04-08 15:52:02,415] INFO Closed socket connection for client /<ip:port> (no session established for client) (org.apache.zookeeper.server.NIOServerCnxn)
Though I'm not sure these have anything to do with it?
Any ideas on what I'm doing wrong?
Thanks in advance.
TL;DR:
Change the value nodeport.enabled to true inside cp-kafka/values.yaml before deploying.
Change the service name and ports in your TCP NGINX ConfigMap and Ingress object.
Set bootstrap-server on your kafka tools to <Cluster_External_IP>:31090
Explanation:
The Headless Service was created alongside the StatefulSet. The created service will not be given a clusterIP, but will instead simply include a list of Endpoints.
These Endpoints are then used to generate instance-specific DNS records in the form of:
<StatefulSet>-<Ordinal>.<Service>.<Namespace>.svc.cluster.local
It creates a DNS name for each pod, e.g:
[ root@curl:/ ]$ nslookup my-confluent-cp-kafka-headless
Server: 10.0.0.10
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local
Name: my-confluent-cp-kafka-headless
Address 1: 10.8.0.23 my-confluent-cp-kafka-1.my-confluent-cp-kafka-headless.default.svc.cluster.local
Address 2: 10.8.1.21 my-confluent-cp-kafka-0.my-confluent-cp-kafka-headless.default.svc.cluster.local
Address 3: 10.8.3.7 my-confluent-cp-kafka-2.my-confluent-cp-kafka-headless.default.svc.cluster.local
This is what lets these services connect to each other inside the cluster.
I went through a lot of trial and error until I realized how it was supposed to work. Based on your TCP Nginx ConfigMap, I believe you faced the same issue.
The Nginx ConfigMap asks for: <PortToExpose>: "<Namespace>/<Service>:<InternallyExposedPort>".
I realized that you don't need to expose Zookeeper, since it's an internal service handled by the kafka brokers.
I also realized that you are trying to expose cp-kafka:9092, which is the headless service, also only used internally, as I explained above.
In order to get outside access, you have to set the parameter nodeport.enabled to true, as stated here: External Access Parameters.
It adds one service to each kafka-N pod during chart deployment.
Then you change your configmap to map to one of them:
data:
  "31090": default/demo-cp-kafka-0-nodeport:31090
Note that the created service has the selector statefulset.kubernetes.io/pod-name: demo-cp-kafka-0; this is how the service identifies the pod it is intended to connect to.
Edit the nginx-ingress-controller:
- containerPort: 31090
  hostPort: 31090
  protocol: TCP
Set your kafka tools to <Cluster_External_IP>:31090
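If you are not sure what to use for <Cluster_External_IP>, remember that a NodePort is opened on every node, so any node address reachable from outside the cluster works; a quick way to check (not from the original answer):

kubectl get nodes -o wide
# or, if your provider assigns ExternalIPs, just the first node's address:
kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="ExternalIP")].address}'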
Reproduction:
- Snippet edited in cp-kafka/values.yaml:
nodeport:
  enabled: true
  servicePort: 19092
  firstListenerPort: 31090
Deploy the chart:
$ helm install demo cp-helm-charts
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
demo-cp-control-center-6d79ddd776-ktggw 1/1 Running 3 113s
demo-cp-kafka-0 2/2 Running 1 113s
demo-cp-kafka-1 2/2 Running 0 94s
demo-cp-kafka-2 2/2 Running 0 84s
demo-cp-kafka-connect-79689c5c6c-947c4 2/2 Running 2 113s
demo-cp-kafka-rest-56dfdd8d94-79kpx 2/2 Running 1 113s
demo-cp-ksql-server-c498c9755-jc6bt 2/2 Running 2 113s
demo-cp-schema-registry-5f45c498c4-dh965 2/2 Running 3 113s
demo-cp-zookeeper-0 2/2 Running 0 112s
demo-cp-zookeeper-1 2/2 Running 0 93s
demo-cp-zookeeper-2 2/2 Running 0 74s
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
demo-cp-control-center ClusterIP 10.0.13.134 <none> 9021/TCP 50m
demo-cp-kafka ClusterIP 10.0.15.71 <none> 9092/TCP 50m
demo-cp-kafka-0-nodeport NodePort 10.0.7.101 <none> 19092:31090/TCP 50m
demo-cp-kafka-1-nodeport NodePort 10.0.4.234 <none> 19092:31091/TCP 50m
demo-cp-kafka-2-nodeport NodePort 10.0.3.194 <none> 19092:31092/TCP 50m
demo-cp-kafka-connect ClusterIP 10.0.3.217 <none> 8083/TCP 50m
demo-cp-kafka-headless ClusterIP None <none> 9092/TCP 50m
demo-cp-kafka-rest ClusterIP 10.0.14.27 <none> 8082/TCP 50m
demo-cp-ksql-server ClusterIP 10.0.7.150 <none> 8088/TCP 50m
demo-cp-schema-registry ClusterIP 10.0.7.84 <none> 8081/TCP 50m
demo-cp-zookeeper ClusterIP 10.0.9.119 <none> 2181/TCP 50m
demo-cp-zookeeper-headless ClusterIP None <none> 2888/TCP,3888/TCP 50m
Create the TCP configmap:
$ cat nginx-tcp-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcp-services
  namespace: kube-system
data:
  "31090": "default/demo-cp-kafka-0-nodeport:31090"
$ kubectl apply -f nginx-tcp-configmap.yaml
configmap/tcp-services created
Edit the Nginx Ingress Controller:
$ kubectl edit deploy nginx-ingress-controller -n kube-system
$ kubectl get deploy nginx-ingress-controller -n kube-system -o yaml
{{{suppressed output}}}
ports:
- containerPort: 31090
  hostPort: 31090
  protocol: TCP
- containerPort: 80
  name: http
  protocol: TCP
- containerPort: 443
  name: https
  protocol: TCP
My ingress is on IP 35.226.189.123, now let's try to connect from outside the cluster. For that I'll connect to another VM where I have a minikube, so I can use kafka-client pod to test:
user@minikube:~$ kubectl get pods
NAME READY STATUS RESTARTS AGE
kafka-client 1/1 Running 0 17h
user@minikube:~$ kubectl exec kafka-client -it -- bin/bash
root@kafka-client:/# kafka-console-consumer --bootstrap-server 35.226.189.123:31090 --topic demo-topic --from-beginning --timeout-ms 8000 --max-messages 1
Wed Apr 15 18:19:48 UTC 2020
Processed a total of 1 messages
root@kafka-client:/#
As you can see, I was able to access the kafka from outside.
If you need external access to Zookeeper as well I'll leave a service model for you:
zookeeper-external-0.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: cp-zookeeper
    pod: demo-cp-zookeeper-0
  name: demo-cp-zookeeper-0-nodeport
  namespace: default
spec:
  externalTrafficPolicy: Cluster
  ports:
  - name: external-broker
    nodePort: 31181
    port: 12181
    protocol: TCP
    targetPort: 31181
  selector:
    app: cp-zookeeper
    statefulset.kubernetes.io/pod-name: demo-cp-zookeeper-0
  sessionAffinity: None
  type: NodePort
It will create a service for it:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
demo-cp-zookeeper-0-nodeport NodePort 10.0.5.67 <none> 12181:31181/TCP 2s
Patch your configmap:
data:
  "31090": default/demo-cp-kafka-0-nodeport:31090
  "31181": default/demo-cp-zookeeper-0-nodeport:31181
Add the Ingress rule:
ports:
- containerPort: 31181
  hostPort: 31181
  protocol: TCP
Test it with your external IP:
pod/zookeeper-client created
user@minikube:~$ kubectl exec -it zookeeper-client -- /bin/bash
root@zookeeper-client:/# zookeeper-shell 35.226.189.123:31181
Connecting to 35.226.189.123:31181
Welcome to ZooKeeper!
JLine support is disabled
If you have any doubts, let me know in the comments!

Why can't my service pass traffic to a pod with a named port on minikube?

I'm having trouble with the examples in section 5.1.1 Using Named Ports of Kubernetes In Action by Marko Luksa. The example goes like this:
First - Create
I'm creating a pod with a named port that runs a Node.js container that responds with You've hit <hostname> when it's hit:
apiVersion: v1
kind: Pod
metadata:
  name: named-port-pod
  labels:
    app: named-port
spec:
  containers:
  - name: kubia
    image: michaellundquist/kubia
    ports:
    - name: http
      containerPort: 8080
And a service like this (note, this is a simplified version of the original example, which also doesn't work):
apiVersion: v1
kind: Service
metadata:
  name: named-port-service
spec:
  ports:
  - name: http
    port: 80
    targetPort: http
  selector:
    app: named-port
Second - Verify
$ kubectl get po -o wide --show-labels
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES LABELS
named-port-pod 1/1 Running 0 45m 172.17.0.7 minikube <none> <none> app=named-port
$ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 53m
named-port-service ClusterIP 10.96.115.108 <none> 80/TCP 19m
$ kubectl describe service named-port-service
Name: named-port-service
Namespace: default
Labels: <none>
Annotations: <none>
Selector: app=named-port
Type: ClusterIP
IP: 10.96.115.108
Port: http 80/TCP
TargetPort: http/TCP
Endpoints: 172.17.0.7:8080
Session Affinity: None
Events: <none>
Third - Test (Failing)
$ kubectl exec named-port-pod -- curl named-port-pod:8080
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 26 0 26 0 0 5494 0 --:--:-- --:--:-- --:--:-- 6500
You've hit named-port-pod
$ kubectl exec named-port-pod -- curl --max-time 20 named-port-service
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:19 --:--:-- 0curl: (28) Connection timed out after 20001 milliseconds
command terminated with exit code 28
As you can see, everything works when I hit named-port-pod:8080, but it fails when I hit named-port-service. I'm pretty sure the mapping is correct, because kubectl describe service named-port-service shows the correct endpoint. I think minikube can use named ports, but my service can't pass connections to my pod. Why?
P.S. Here's my minikube version:
$ minikube version
minikube version: v1.6.2
commit: 54f28ac5d3a815d1196cd5d57d707439ee4bb392
This is a known issue with minikube: a pod cannot reach itself via its service IP. You can try accessing your service from a different pod, or use the following workaround to fix this:
minikube ssh
sudo ip link set docker0 promisc on
Open issue: https://github.com/kubernetes/minikube/issues/1568
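For example, a quick check from a throwaway pod (the busybox image and pod name here are only illustrative):

kubectl run tmp --rm -it --restart=Never --image=busybox:1.36 -- wget -qO- http://named-port-service
# should print something like: You've hit named-port-pod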

Why there is downtime while rolling update a deployment or even scaling down a replicaset

According to the official Kubernetes documentation:
Rolling updates allow Deployments' update to take place with zero downtime by incrementally updating Pods instances with new ones
I was trying to perform a zero-downtime update using the RollingUpdate strategy, which is the recommended way to update an application in a Kubernetes cluster.
Official reference:
https://kubernetes.io/docs/tutorials/kubernetes-basics/update/update-intro/
But I was a little bit confused about the definition while performing it: downtime of the application still happens. Here is my cluster info at the beginning, as shown below:
liguuudeiMac:~ liguuu$ kubectl get all
NAME READY STATUS RESTARTS AGE
pod/ubuntu-b7d6cb9c6-6bkxz 1/1 Running 0 3h16m
pod/webapp-deployment-6dcf7b88c7-4kpgc 1/1 Running 0 3m52s
pod/webapp-deployment-6dcf7b88c7-4vsch 1/1 Running 0 3m52s
pod/webapp-deployment-6dcf7b88c7-7xzsk 1/1 Running 0 3m52s
pod/webapp-deployment-6dcf7b88c7-jj8vx 1/1 Running 0 3m52s
pod/webapp-deployment-6dcf7b88c7-qz2xq 1/1 Running 0 3m52s
pod/webapp-deployment-6dcf7b88c7-s7rtt 1/1 Running 0 3m52s
pod/webapp-deployment-6dcf7b88c7-s88tb 1/1 Running 0 3m52s
pod/webapp-deployment-6dcf7b88c7-snmw5 1/1 Running 0 3m52s
pod/webapp-deployment-6dcf7b88c7-v287f 1/1 Running 0 3m52s
pod/webapp-deployment-6dcf7b88c7-vd4kb 1/1 Running 0 3m52s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 3h16m
service/tc-webapp-service NodePort 10.104.32.134 <none> 1234:31234/TCP 3m52s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/ubuntu 1/1 1 1 3h16m
deployment.apps/webapp-deployment 10/10 10 10 3m52s
NAME DESIRED CURRENT READY AGE
replicaset.apps/ubuntu-b7d6cb9c6 1 1 1 3h16m
replicaset.apps/webapp-deployment-6dcf7b88c7 10 10 10 3m52s
deployment.apps/webapp-deployment is a Tomcat-based webapp, and the Service tc-webapp-service maps to the Pods containing the Tomcat containers (the full deployment config files are at the end of the article). deployment.apps/ubuntu is just a standalone app in the cluster that continuously sends HTTP requests to tc-webapp-service, so that I can trace the status of the so-called rolling update of webapp-deployment. The command running in the ubuntu container is shown below (an infinite loop of curl, one request roughly every 0.01 seconds):
for ((;;)); do curl -sS -D - http://tc-webapp-service:1234 -o /dev/null | grep HTTP; date +"%Y-%m-%d %H:%M:%S"; echo ; sleep 0.01 ; done;
And the output of the ubuntu app (everything is fine):
...
HTTP/1.1 200
2019-08-30 07:27:15
...
HTTP/1.1 200
2019-08-30 07:27:16
...
Then I try to change the tag of the tomcat image, from 8-jdk8 to 8-jdk11. Note that the rolling update strategy of deployment.apps/webapp-deployment has been configured with maxSurge 0 and maxUnavailable 9 (the result is the same if these two attributes are left at their defaults):
...
spec:
  containers:
  - name: tc-part
    image: tomcat:8-jdk8 -> tomcat:8-jdk11
...
Then, the output of ubuntu app:
HTTP/1.1 200
2019-08-30 07:47:43
curl: (56) Recv failure: Connection reset by peer
2019-08-30 07:47:43
HTTP/1.1 200
2019-08-30 07:47:44
As shown above, some HTTP requests failed, which is clearly an interruption of the application while performing a rolling update of apps in the Kubernetes cluster.
However, I can also reproduce the situation mentioned above (the interruption) when scaling down, with the command shown below (from 10 replicas to 2):
kubectl scale deployment.apps/tc-webapp-service --replicas=2
After performing the above tests, I was wondering what so-called zero downtime actually means. Although the way of mocking HTTP requests is a little bit tricky, the situation is completely normal for applications designed to handle thousands or millions of requests per second.
env:
liguuudeiMac:cacheee liguuu$ minikube version
minikube version: v1.3.1
commit: ca60a424ce69a4d79f502650199ca2b52f29e631
liguuudeiMac:cacheee liguuu$ kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-06T01:44:30Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"clean", BuildDate:"2019-08-05T09:15:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Deployment & Service Config:
# Service
apiVersion: v1
kind: Service
metadata:
  name: tc-webapp-service
spec:
  type: NodePort
  selector:
    appName: tc-webapp
  ports:
  - name: tc-svc
    protocol: TCP
    port: 1234
    targetPort: 8080
    nodePort: 31234
---
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp-deployment
spec:
  replicas: 10
  selector:
    matchLabels:
      appName: tc-webapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 9
  # Pod Templates
  template:
    metadata:
      labels:
        appName: tc-webapp
    spec:
      containers:
      - name: tc-part
        image: tomcat:8-jdk8
        ports:
        - containerPort: 8080
        livenessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 10
        readinessProbe:
          httpGet:
            scheme: HTTP
            port: 8080
            path: /
          initialDelaySeconds: 5
          periodSeconds: 1
To deploy an application that will really update with zero downtime, the application should meet some requirements. To mention a few of them:
application should handle graceful shutdown
application should implement readiness and liveness probes correctly
For example, if a shutdown signal is received, the application should no longer respond with 200 to new readiness probes, but it should still respond with 200 to liveness probes until all old requests are processed.
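A rough sketch of how the Pod template from the question could be adjusted for a graceful shutdown; the preStop sleep and terminationGracePeriodSeconds values are my assumptions, not part of the original configs. The idea is that the terminating Pod keeps serving for a few seconds after it is removed from the Service endpoints, so requests already routed to it still succeed:

spec:
  terminationGracePeriodSeconds: 30
  containers:
  - name: tc-part
    image: tomcat:8-jdk11
    ports:
    - containerPort: 8080
    lifecycle:
      preStop:
        exec:
          # keep serving while endpoints/kube-proxy stop sending new traffic here
          command: ["sh", "-c", "sleep 10"]
    readinessProbe:
      httpGet:
        scheme: HTTP
        port: 8080
        path: /
      periodSeconds: 1

With the maxUnavailable: 9 setting from the question, nine of the ten replicas can also be taken down at once, so lowering that value further narrows the window in which requests can fail.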

Pods are unable to connect to internal Kubernetes service

I have issues with CoreDNS: on some nodes the CoreDNS pods are in CrashLoopBackOff state due to an error trying to reach the kubernetes internal service.
This is a new K8s cluster deployed using Kubespray; the network layer is Weave, with Kubernetes version 1.12.5 on OpenStack.
I've already tested the connection to the endpoints and have no issue reaching 10.2.70.14:6443, for example.
But telnet from the pods to 10.233.0.1:443 is failing.
Thanks in advance for the help
kubectl describe svc kubernetes
Name: kubernetes
Namespace: default
Labels: component=apiserver
provider=kubernetes
Annotations: <none>
Selector: <none>
Type: ClusterIP
IP: 10.233.0.1
Port: https 443/TCP
TargetPort: 6443/TCP
Endpoints: 10.2.70.14:6443,10.2.70.18:6443,10.2.70.27:6443 + 2 more...
Session Affinity: None
Events: <none>
And from CoreDNS logs:
E0415 17:47:05.453762 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:311: Failed to list *v1.Service: Get https://10.233.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.233.0.1:443: connect: connection refused
E0415 17:47:05.456909 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:313: Failed to list *v1.Endpoints: Get https://10.233.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.233.0.1:443: connect: connection refused
E0415 17:47:06.453258 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:318: Failed to list *v1.Namespace: Get https://10.233.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.233.0.1:443: connect: connection refused
Also, checking out the logs of kube-proxy from one of the problematic nodes revealed the following errors:
I0415 19:14:32.162909 1 graceful_termination.go:160] Trying to delete rs: 10.233.0.1:443/TCP/10.2.70.36:6443
I0415 19:14:32.162979 1 graceful_termination.go:171] Not deleting, RS 10.233.0.1:443/TCP/10.2.70.36:6443: 1 ActiveConn, 0 InactiveConn
I0415 19:14:32.162989 1 graceful_termination.go:160] Trying to delete rs: 10.233.0.1:443/TCP/10.2.70.18:6443
I0415 19:14:32.163017 1 graceful_termination.go:171] Not deleting, RS 10.233.0.1:443/TCP/10.2.70.18:6443: 1 ActiveConn, 0 InactiveConn
E0415 19:14:32.215707 1 proxier.go:430] Failed to execute iptables-restore for nat: exit status 1 (iptables-restore: line 7 failed
)
I had exactly the same problem, and it turned out that my kubespray config was wrong, especially the nginx ingress setting ingress_nginx_host_network.
As it turns out, you have to set ingress_nginx_host_network: true (it defaults to false).
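In a typical kubespray inventory this is a one-line change in the addons vars file (the exact path and file name depend on your kubespray version, so treat this as a sketch):

# e.g. inventory/mycluster/group_vars/k8s_cluster/addons.yml
ingress_nginx_enabled: true
ingress_nginx_host_network: true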
If you do not want to rerun the whole kubespray playbook, edit the nginx ingress daemon set:
$ kubectl -n ingress-nginx edit ds ingress-nginx-controller
Add --report-node-internal-ip-address to the command line:
spec:
  container:
    args:
    - /nginx-ingress-controller
    - --configmap=$(POD_NAMESPACE)/ingress-nginx
    - --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
    - --udp-services-configmap=$(POD_NAMESPACE)/udp-services
    - --annotations-prefix=nginx.ingress.kubernetes.io
    - --report-node-internal-ip-address # <- new
Set the following two properties on the same level as, e.g., serviceAccountName: ingress-nginx:
serviceAccountName: ingress-nginx
hostNetwork: true # <- new
dnsPolicy: ClusterFirstWithHostNet # <- new
Then save and quit with :wq, and check the pod status with kubectl get pods --all-namespaces.
Source:
https://github.com/kubernetes-sigs/kubespray/issues/4357