Readiness probe based on service - kubernetes

I have 2 pods and my application is cluster-based, i.e. each pod synchronizes with the other pod to bring the cluster up. In my example the pods are apppod1 and apppod2, and the synchronization port is 8080.
I want DNS to resolve for these pod hostnames through the service, but I want to block traffic from outside apppod1 and apppod2.
I can use a readiness probe, but then the service has no endpoints and I can't resolve the IP of the second pod. If I can't resolve the IP of the second pod from pod 1, I can't complete the configuration of these pods.
E.g.
App Statefulset definition
app1_sts.yaml
===
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    cluster: appcluster
  name: app1
  namespace: app
spec:
  selector:
    matchLabels:
      cluster: appcluster
  serviceName: app1cluster
  template:
    metadata:
      labels:
        cluster: appcluster
    spec:
      containers:
      - name: app1-0
        image: localhost/linux:8
        imagePullPolicy: Always
        securityContext:
          privileged: false
        command: [/usr/sbin/init]
        ports:
        - containerPort: 8080
          name: appport
        readinessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 120
          periodSeconds: 30
          failureThreshold: 20
        env:
        - name: container
          value: "true"
        - name: applist
          value: "app2-0"
app2_sts.yaml
====
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    cluster: appcluster
  name: app2
  namespace: app
spec:
  selector:
    matchLabels:
      cluster: appcluster
  serviceName: app2cluster
  template:
    metadata:
      labels:
        cluster: appcluster
    spec:
      containers:
      - name: app2-0
        image: localhost/linux:8
        imagePullPolicy: Always
        securityContext:
          privileged: false
        command: [/usr/sbin/init]
        ports:
        - containerPort: 8080
          name: appport
        readinessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 120
          periodSeconds: 30
          failureThreshold: 20
        env:
        - name: container
          value: "true"
        - name: applist
          value: "app1-0"
Create Statefulsets and check name resolution
[root@oper01 onprem]# kubectl get all -n app
NAME READY STATUS RESTARTS AGE
pod/app1-0 0/1 Running 0 8s
pod/app2-0 0/1 Running 0 22s
NAME READY AGE
statefulset.apps/app1 0/1 49s
statefulset.apps/app2 0/1 22s
kubectl exec -i -t app1-0 /bin/bash -n app
[root@app1-0 ~]# nslookup app2-0
Server: 10.96.0.10
Address: 10.96.0.10#53
** server can't find app2-0: NXDOMAIN
[root@app1-0 ~]# nslookup app1-0
Server: 10.96.0.10
Address: 10.96.0.10#53
** server can't find app1-0: NXDOMAIN
[root@app1-0 ~]#
I understand the behavior of the readiness probe, and I am using it because it ensures the service does not resolve to the app pods while port 8080 is down. However, I can't work out how to complete the configuration: the app pods need to resolve each other, and they need their hostnames and IPs to configure themselves, but DNS resolution only happens once the service has endpoints. Is there a better way to handle this situation?
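One option worth trying (not in the original post, so treat it as a sketch): a StatefulSet's per-pod DNS records are created by its governing headless Service, and a headless Service can publish those records even while the pods are not Ready by setting publishNotReadyAddresses: true. A minimal example for app1, assuming the app1cluster service name from the StatefulSet above:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: app1cluster    # must match serviceName in the app1 StatefulSet
  namespace: app
spec:
  clusterIP: None                 # headless: one DNS record per pod
  publishNotReadyAddresses: true  # resolve pod hostnames even before Ready
  selector:
    cluster: appcluster
  ports:
  - port: 8080
    name: appport
```

With this, app1-0.app1cluster.app.svc resolves during startup, while the readiness probe still keeps the pods out of any ordinary (non-headless) Service used for client traffic.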

Related

Kubernetes pod never ready and service endpoint empty

I want to deploy an ASP.NET server with Kubernetes:
Deployment with my docker image (ASP.NET server)
Service to expose the pods
Nginx ingress controller
Ingress to access my pods from "outside"
For this issue we can ignore the Ingress YAML.
When I deploy my Deployment and my Service I have a problem: my pod is never ready and my service's endpoints field is empty.
NAME READY STATUS RESTARTS AGE
pod/server-deployment-bd4977bf5-n7gmx 0/1 Running 36 (41s ago) 147m
When I run "kubectl logs pod/server-deployment-bd4977bf5-n7gmx", nothing in the logs looks related to the issue:
Microsoft.Hosting.Lifetime[14]
Now listening on: http://[::]:80
Microsoft.Hosting.Lifetime[14]
Now listening on: https://[::]:403
Microsoft.Hosting.Lifetime[0]
Application started. Press Ctrl+C to shut down.
Microsoft.Hosting.Lifetime[0]
Hosting environment: Production
Microsoft.Hosting.Lifetime[0]
Content root path: /app/
Microsoft.Hosting.Lifetime[0]
Application is shutting down...
When I run "kubectl describe service/server-svc" I see that the "Endpoints:" field is empty.
After some research on Stack Overflow and other sites, I didn't find any solution or explanation for my problem. From what I read, I know that a service's endpoints field shouldn't be empty and that my pods might have a problem with the readinessProbe.
Below is the .yaml of my Deployment and Service
Deployment :
apiVersion: apps/v1
kind: Deployment
metadata:
  name: server-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: server-app
  strategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: server-app
    spec:
      imagePullSecrets:
      - name: regcred
      containers:
      - name: server-container
        image: server:0.0.2
        imagePullPolicy: Always
        command: ["dotnet", "server.dll"]
        envFrom:
        - configMapRef:
            name: server-configmap
            optional: false
        - secretRef:
            name: server-secret
            optional: false
        ports:
        - name: http
          containerPort: 443
          hostPort: 443
        livenessProbe:
          httpGet:
            path: /api/health/live
            port: http
          initialDelaySeconds: 10
          periodSeconds: 20
          timeoutSeconds: 1
          failureThreshold: 6
          successThreshold: 1
        readinessProbe:
          httpGet:
            path: /api/health/ready
            port: http
          initialDelaySeconds: 10
          periodSeconds: 20
          timeoutSeconds: 1
          failureThreshold: 6
          successThreshold: 1
        volumeMounts:
        - name: server-pfx-volume
          mountPath: "/https"
          readOnly: true
      volumes:
      - name: server-pfx-volume
        secret:
          secretName: server-pfx
Service:
apiVersion: v1
kind: Service
metadata:
  name: server-svc
spec:
  type: ClusterIP
  selector:
    app: server-app
  ports:
  - name: http
    protocol: TCP
    port: 443
    targetPort: 443
When I run "kubectl get pods --show-labels" I get the pod with the correct label:
NAME READY STATUS RESTARTS AGE LABELS
server-deployment-bd4977bf5-n7gmx 0/1 CrashLoopBackOff 38 (74s ago) 158m app=server-app,pod-template-hash=bd4977bf5
So I'm here looking for help to figure out why my pod is never ready and why my service's endpoints field is empty.
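One thing to check (an observation from the logs above, not a confirmed fix): the container's named port http maps to 443, but the application logs show HTTP on port 80 and HTTPS on port 403, and httpGet probes use the HTTP scheme by default. If the server really serves TLS on the probed port, the probe needs scheme: HTTPS; otherwise probing the plain-HTTP listener is simpler. A hedged sketch, with paths taken from the manifest above:

```yaml
readinessProbe:
  httpGet:
    path: /api/health/ready
    port: 80          # the plain-HTTP listener from the logs
livenessProbe:
  httpGet:
    path: /api/health/live
    port: 443
    scheme: HTTPS     # httpGet defaults to HTTP; set HTTPS for a TLS port
```

A failing livenessProbe would also explain the restarts and CrashLoopBackOff: the kubelet kills the container after failureThreshold consecutive failures.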

nginx-ingress tcp services - connection refused

I'm currently deploying a new kubernetes cluster and I want to expose a mongodb service from outside the cluster using an nginx-ingress.
I know that nginx-ingress is usually for layer 7 applications, but it is also capable of working on layer 4 (TCP/UDP) according to the official documentation.
https://kubernetes.github.io/ingress-nginx/user-guide/exposing-tcp-udp-services/
My mongodb service is a ClusterIP service which is accessible on port 11717 (internal namespace):
kubectl get svc -n internal
mongodb ClusterIP 10.97.63.154 <none> 11717/TCP 3d20h
telnet 10.97.63.154 11717
Trying 10.97.63.154...
Connected to 10.97.63.154.
I literally tried every possible combination to achieve this goal but with no success.
I'm using the nginx-ingress helm chart (daemonset type).
My nginx-ingress/templates/controller-daemonset.yaml file:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nginx-ingress-nginx-ingress
  namespace: default
  labels:
    app.kubernetes.io/name: nginx-ingress-nginx-ingress
    helm.sh/chart: nginx-ingress-0.13.0
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/instance: nginx-ingress
spec:
  selector:
    matchLabels:
      app: nginx-ingress-nginx-ingress
  template:
    metadata:
      labels:
        app: nginx-ingress-nginx-ingress
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9113"
        prometheus.io/scheme: "http"
    spec:
      serviceAccountName: nginx-ingress-nginx-ingress
      terminationGracePeriodSeconds: 30
      hostNetwork: false
      containers:
      - name: nginx-ingress-nginx-ingress
        image: "nginx/nginx-ingress:2.2.0"
        imagePullPolicy: "IfNotPresent"
        ports:
        - name: http
          containerPort: 80
          hostPort: 80
        - name: https
          containerPort: 443
          hostPort: 443
        - name: mongodb
          containerPort: 11717
          hostPort: 11717
        - name: prometheus
          containerPort: 9113
        - name: readiness-port
          containerPort: 8081
        readinessProbe:
          httpGet:
            path: /nginx-ready
            port: readiness-port
          periodSeconds: 1
        securityContext:
          allowPrivilegeEscalation: true
          runAsUser: 101 #nginx
          capabilities:
            drop:
            - ALL
            add:
            - NET_BIND_SERVICE
        env:
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        resources: {}
        args:
        - /nginx-ingress-controller
        - -nginx-plus=false
        - -nginx-reload-timeout=60000
        - -enable-app-protect=false
        - -tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
        - -publish-service=$(POD_NAMESPACE)/ingress-nginx
        - -annotations-prefix=nginx.ingress.kubernetes.io
        - -enable-app-protect-dos=false
        - -nginx-configmaps=$(POD_NAMESPACE)/nginx-ingress-nginx-ingress
        - -default-server-tls-secret=$(POD_NAMESPACE)/nginx-ingress-nginx-ingress-default-server-tls
        - -ingress-class=nginx
        - -health-status=false
        - -health-status-uri=/nginx-health
        - -nginx-debug=false
        - -v=1
        - -nginx-status=true
        - -nginx-status-port=8080
        - -nginx-status-allow-cidrs=127.0.0.1
        - -report-ingress-status
        - -external-service=nginx-ingress-nginx-ingress
        - -enable-leader-election=true
        - -leader-election-lock-name=nginx-ingress-nginx-ingress-leader-election
        - -enable-prometheus-metrics=true
        - -prometheus-metrics-listen-port=9113
        - -prometheus-tls-secret=
        - -enable-custom-resources=true
        - -enable-snippets=false
        - -enable-tls-passthrough=false
        - -enable-preview-policies=false
        - -enable-cert-manager=false
        - -enable-oidc=false
        - -ready-status=true
        - -ready-status-port=8081
        - -enable-latency-metrics=false
My nginx-ingress/templates/controller-service.yaml file:
apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress-nginx-ingress
  namespace: default
  labels:
    app.kubernetes.io/name: nginx-ingress-nginx-ingress
    helm.sh/chart: nginx-ingress-0.13.0
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/instance: nginx-ingress
spec:
  externalTrafficPolicy: Local
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
    name: http
  - port: 443
    targetPort: 443
    protocol: TCP
    name: https
  - name: mongodb
    port: 11717
    targetPort: 11717
    protocol: TCP
  selector:
    app: nginx-ingress-nginx-ingress
My nginx-ingress/templates/tcp-services.yaml file:
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcp-services
  namespace: default
data:
  "11717": internal/mongodb:11717
kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-ingress-nginx-ingress-d5vms 1/1 Running 0 61m
nginx-ingress-nginx-ingress-kcs4p 1/1 Running 0 61m
nginx-ingress-nginx-ingress-mnnn2 1/1 Running 0 61m
kubectl get svc -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 4d1h <none>
nginx-ingress-nginx-ingress LoadBalancer 10.99.176.220 <pending> 80:31700/TCP,443:31339/TCP,11717:31048/TCP 61m app=nginx-ingress-nginx-ingress
telnet 10.99.176.220 80
Trying 10.99.176.220...
Connected to 10.99.176.220.
Escape character is '^]'.
telnet 10.99.176.220 11717
Trying 10.99.176.220...
telnet: Unable to connect to remote host: Connection refused
I can't understand why the connection is getting refused on port 11717.
How can I achieve this scenario:
mongo.myExternalDomain:11717 --> nginx-ingress service --> nginx-ingress pod --> mongodb service --> mongodb pod
Thanks in advance!
I would appreciate any kind of help!
I had a similar issue; maybe this will help you. In my case the problem was in the tcp-services ConfigMap.
In short, instead of this:
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcp-services
  namespace: default
data:
  "11717": internal/mongodb:11717
please change to:
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcp-services
  namespace: default
data:
  "11717": internal/mongodb:11717:PROXY
Details:
1. Edit the 'tcp-services' configmap to add a tcp service: 8000: namespace/service:8000.
2. Edit the nginx-controller service to add a port (port: 8000 --> targetPort: 8000) for the tcp service from step 1.
3. Check /etc/nginx/nginx.conf in the nginx controller pod and confirm it contains a 'server' block with the correct listen 8000; directive for the tcp/8000 service.
4. Edit the 'tcp-services' configmap again to add the proxy-protocol decode directive; the key/value for the tcp/8000 service now becomes 8000: namespace/service:8000:PROXY.
5. Check /etc/nginx/nginx.conf in the nginx controller pod: there is no change compared to step 3, it is still listen 8000;.
6. Edit some ingress rule (make some change, like updating the host).
7. Check /etc/nginx/nginx.conf in the nginx controller pod again: the listen directive for the tcp/8000 service now becomes listen 8000 proxy_protocol;, which is correct.

GCP health check failing for kubernetes pod

I'm trying to launch an application on GKE and the health checks made by the Ingress always fail.
Here's my full k8s yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tripvector
  labels:
    app: tripvector
spec:
  replicas: 1
  minReadySeconds: 60
  selector:
    matchLabels:
      app: tripvector
  template:
    metadata:
      labels:
        app: tripvector
    spec:
      containers:
      - name: tripvector
        readinessProbe:
          httpGet:
            port: 3000
            path: /healthz
          initialDelaySeconds: 30
          timeoutSeconds: 10
          periodSeconds: 11
        image: us-west1-docker.pkg.dev/triptastic-1542412229773/tripvector/tripvector:healthz2
        env:
        - name: ROOT_URL
          value: https://paymahn.tripvector.io/
        - name: MAIL_URL
          valueFrom:
            secretKeyRef:
              key: MAIL_URL
              name: startup
        - name: METEOR_SETTINGS
          valueFrom:
            secretKeyRef:
              key: METEOR_SETTINGS
              name: startup
        - name: MONGO_URL
          valueFrom:
            secretKeyRef:
              key: MONGO_URL
              name: startup
        ports:
        - containerPort: 3000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: tripvector
spec:
  defaultBackend:
    service:
      name: tripvector-np
      port:
        number: 60000
---
apiVersion: v1
kind: Service
metadata:
  name: tripvector-np
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
spec:
  type: ClusterIP
  selector:
    app: tripvector
  ports:
  - protocol: TCP
    port: 60000
    targetPort: 3000
This yaml should do the following:
make a deployment with my healthz2 image along with a readiness check at /healthz on port 3000 which is exposed by the image
launch a cluster IP service
launch an ingress
When I check the status of the backend service I see it's unhealthy:
❯❯❯ gcloud compute backend-services get-health k8s1-07274a01-default-tripvector-np-60000-a912870e --global
---
backend: https://www.googleapis.com/compute/v1/projects/triptastic-1542412229773/zones/us-central1-a/networkEndpointGroups/k8s1-07274a01-default-tripvector-np-60000-a912870e
status:
  healthStatus:
  - healthState: UNHEALTHY
    instance: https://www.googleapis.com/compute/v1/projects/triptastic-1542412229773/zones/us-central1-a/instances/gke-tripvector2-default-pool-78cf58d9-5dgs
    ipAddress: 10.12.0.29
    port: 3000
  kind: compute#backendServiceGroupHealth
It seems that the health check is hitting the right port, but this output doesn't confirm whether it's hitting the right path. If I look up the health check object in the console, it confirms the GKE health check is hitting the /healthz path.
I've verified in the following ways that the health check endpoint I'm using for the readiness probe works but something still isn't working properly:
exec into the pod and run wget
port forward the pod and check /healthz in my browser
port forward the service and check /healthz in my browser
In all three instances above, I can see the /healthz endpoint working. I'll outline each one below.
Here's the result of running wget from within the pod:
❯❯❯ k exec -it tripvector-65ff4c4dbb-vwvtr /bin/sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
/tripvector # ls
bundle
/tripvector # wget localhost:3000/healthz
Connecting to localhost:3000 (127.0.0.1:3000)
saving to 'healthz'
healthz 100% |************************************************************************************************************************************************************| 25 0:00:00 ETA
'healthz' saved
/tripvector # cat healthz
[200] Healthcheck passed./tripvector #
Here's what happens when I perform a port forward from the pod to my local machine:
❯❯❯ k port-forward tripvector-65ff4c4dbb-vwvtr 8081:3000
Forwarding from 127.0.0.1:8081 -> 3000
Forwarding from [::1]:8081 -> 3000
Handling connection for 8081
Handling connection for 8081
Handling connection for 8081
Handling connection for 8081
Handling connection for 8081
Handling connection for 8081
Handling connection for 8081
Handling connection for 8081
Handling connection for 8081
And here's what happens when I port forward from the Service object:
❯❯❯ k port-forward svc/tripvector-np 8082:60000
Forwarding from 127.0.0.1:8082 -> 3000
Forwarding from [::1]:8082 -> 3000
Handling connection for 8082
How can I get the healthcheck for the ingress and network endpoint group to succeed so that I can access my pod from the internet?
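Not in the original post, but a common fix on GKE (hedged: the resource name tripvector-hc is illustrative): with container-native load balancing, the load balancer's health check is configured separately from the pod's readinessProbe, and it can be pinned to the right path and port with a BackendConfig attached to the Service:

```yaml
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: tripvector-hc
spec:
  healthCheck:
    type: HTTP
    requestPath: /healthz
    port: 3000
---
apiVersion: v1
kind: Service
metadata:
  name: tripvector-np
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
    cloud.google.com/backend-config: '{"default": "tripvector-hc"}'
spec:
  type: ClusterIP
  selector:
    app: tripvector
  ports:
  - protocol: TCP
    port: 60000
    targetPort: 3000
```

Without a BackendConfig, GKE infers the health check from the readinessProbe at Ingress creation time, and later probe edits are not propagated, which may be why the check and the probe disagree.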

unknown host when lookup pod by name, resolved with pod restart

I have an installer that spins up two pods in my CI flow, let's call them web and activemq. When the web pod starts, it tries to communicate with the activemq pod using the k8s-assigned amq-deployment-0.activemq pod name.
Randomly, the web pod will get an unknown host exception when trying to access amq-deployment-0.activemq. If I restart the web pod in this situation, the web pod has no problem communicating with the activemq pod.
I've logged into the web pod when this happens and the /etc/resolv.conf and /etc/hosts files look fine. The host machine's /etc/resolv.conf and /etc/hosts are sparse, with nothing that looks questionable.
Information:
There is only 1 worker node.
kubectl --version
Kubernetes v1.8.3+icp+ee
Any ideas on how to go about debugging this issue? I can't think of a good reason for it to happen randomly, nor to resolve itself on a pod restart.
If there is other useful information needed, I can get it. Thanks in advance.
For activeMQ we do have this service file
apiVersion: v1
kind: Service
metadata:
  name: activemq
  labels:
    app: myapp
    env: dev
spec:
  ports:
  - port: 8161
    protocol: TCP
    targetPort: 8161
    name: http
  - port: 61616
    protocol: TCP
    targetPort: 61616
    name: amq
  selector:
    component: analytics-amq
    app: myapp
    environment: dev
    type: fa-core
  clusterIP: None
And this ActiveMQ stateful set (this is the template)
kind: StatefulSet
apiVersion: apps/v1beta1
metadata:
  name: pa-amq-deployment
spec:
  replicas: {{ activemqs }}
  updateStrategy:
    type: RollingUpdate
  serviceName: "activemq"
  template:
    metadata:
      labels:
        component: analytics-amq
        app: myapp
        environment: dev
        type: fa-core
    spec:
      containers:
      - name: pa-amq
        image: default/myco/activemq:latest
        imagePullPolicy: Always
        resources:
          limits:
            cpu: 150m
            memory: 1Gi
        livenessProbe:
          exec:
            command:
            - /etc/init.d/activemq
            - status
          initialDelaySeconds: 10
          periodSeconds: 15
          failureThreshold: 16
        ports:
        - containerPort: 8161
          protocol: TCP
          name: http
        - containerPort: 61616
          protocol: TCP
          name: amq
        envFrom:
        - configMapRef:
            name: pa-activemq-conf-all
        - secretRef:
            name: pa-activemq-secret
        volumeMounts:
        - name: timezone
          mountPath: /etc/localtime
      volumes:
      - name: timezone
        hostPath:
          path: /usr/share/zoneinfo/UTC
The Web stateful set:
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: pa-web-deployment
spec:
  replicas: 1
  updateStrategy:
    type: RollingUpdate
  serviceName: "pa-web"
  template:
    metadata:
      labels:
        component: analytics-web
        app: myapp
        environment: dev
        type: fa-core
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: component
                  operator: In
                  values:
                  - analytics-web
              topologyKey: kubernetes.io/hostname
      containers:
      - name: pa-web
        image: default/myco/web:latest
        imagePullPolicy: Always
        resources:
          limits:
            cpu: 1
            memory: 2Gi
        readinessProbe:
          httpGet:
            path: /versions
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 15
          failureThreshold: 76
        livenessProbe:
          httpGet:
            path: /versions
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 15
          failureThreshold: 80
        securityContext:
          privileged: true
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        envFrom:
        - configMapRef:
            name: pa-web-conf-all
        - secretRef:
            name: pa-web-secret
        volumeMounts:
        - name: shared-volume
          mountPath: /MySharedPath
        - name: timezone
          mountPath: /etc/localtime
      volumes:
      - nfs:
          server: 10.100.10.23
          path: /MySharedPath
        name: shared-volume
      - name: timezone
        hostPath:
          path: /usr/share/zoneinfo/UTC
This web pod also has a similar "unknown host" problem finding an external database we have configured. The issue being resolved similarly by restarting the pod. Here is the configuration of that external service. Maybe it is easier to tackle the problem from this angle? ActiveMQ has no problem using the database service name to find the DB and startup.
apiVersion: v1
kind: Service
metadata:
  name: dbhost
  labels:
    app: myapp
    env: dev
spec:
  type: ExternalName
  externalName: mydb.host.com
Is it possible that it is a question of which pod, and the app in its container, is started up first and which second?
In any case, connecting using a Service and not the pod name would be recommended as the pod's name assigned by Kubernetes changes between pod restarts.
A way to test connectivity, is to use telnet (or curl for the protocols it supports), if found in the image:
telnet <host/pod/Service> <port>
Not able to find a solution, I created a workaround. I set up the entrypoint.sh in my image to look up the domain I need to access and write the result to the log, exiting on error:
#!/bin/bash
#disable echo and exit on error
set +ex
#####################################
# verify that the db service can be found or exit the container
#####################################
# we do not want to install nslookup to determine if the db_host_name is a valid name,
# but we do have ping available
# 0 - success, 1 - error pinging but lookup worked (services cannot be pinged), 2 - unreachable host
ping -W 2 -c 1 ${db_host_name} &> /dev/null
if [ $? -le 1 ]
then
echo "service ${db_host_name} is known"
else
echo "${db_host_name} service is NOT recognized. Exiting container..."
exit 1
fi
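As a variant of the check above (illustrative, not part of the original installer): a small retry loop around getent avoids depending on ping exit codes, and gives a transient DNS failure a chance to recover before the container gives up:

```shell
#!/bin/bash
# retry_lookup HOST [ATTEMPTS] [DELAY]: return 0 as soon as HOST resolves,
# 1 if it still does not resolve after ATTEMPTS tries.
retry_lookup() {
  host="$1"; attempts="${2:-5}"; delay="${3:-2}"; i=1
  while [ "$i" -le "$attempts" ]; do
    # getent consults the same NSS resolution path the application uses
    if getent hosts "$host" > /dev/null 2>&1; then
      echo "service ${host} is known (attempt ${i})"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "${host} service is NOT recognized. Exiting container..." >&2
  return 1
}
```

Calling retry_lookup "${db_host_name}" 10 3 || exit 1 could then replace the ping block, while keeping the same failure message in the log.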
Next, since only a pod restart fixed the issue: in my Ansible deploy I do a rollout check, querying the log to see if I need to do a pod restart. For example:
rollout-check.yml
- name: "Rollout status for {{rollout_item.statefulset}}"
  shell: timeout 4m kubectl rollout status -n {{fa_namespace}} -f {{ rollout_item.statefulset }}
  ignore_errors: yes

# assuming that the first pod will be the one that would have an issue
- name: "Get {{rollout_item.pod_name}} log to check for issue with dns lookup"
  shell: kubectl logs {{rollout_item.pod_name}} --tail=1 -n {{fa_namespace}}
  register: log_line

# the entrypoint will write "dbhost service is NOT recognized. Exiting container..." to the log
# if there is a problem getting to the dbhost
- name: "Try removing {{rollout_item.component}} pod if unable to deploy"
  shell: kubectl delete pods -l component={{rollout_item.component}} --force --grace-period=0 --ignore-not-found=true -n {{fa_namespace}}
  when: log_line.stdout.find('service is NOT recognized') > 0
I repeat this rollout check 6 times, as sometimes even after a pod restart the service cannot be found. The additional checks are instant once the pod is successfully up.
- name: "Web rollout"
  include_tasks: rollout-check.yml
  loop:
    - { c: 1, statefulset: "{{ dest_deploy }}/web.statefulset.yml", pod_name: "pa-web-deployment-0", component: "analytics-web" }
    - { c: 2, statefulset: "{{ dest_deploy }}/web.statefulset.yml", pod_name: "pa-web-deployment-0", component: "analytics-web" }
    - { c: 3, statefulset: "{{ dest_deploy }}/web.statefulset.yml", pod_name: "pa-web-deployment-0", component: "analytics-web" }
    - { c: 4, statefulset: "{{ dest_deploy }}/web.statefulset.yml", pod_name: "pa-web-deployment-0", component: "analytics-web" }
    - { c: 5, statefulset: "{{ dest_deploy }}/web.statefulset.yml", pod_name: "pa-web-deployment-0", component: "analytics-web" }
    - { c: 6, statefulset: "{{ dest_deploy }}/web.statefulset.yml", pod_name: "pa-web-deployment-0", component: "analytics-web" }
  loop_control:
    loop_var: rollout_item

kubernetes connection refused during deployment

I'm trying to achieve a zero downtime deployment using kubernetes and during my test the service doesn't load balance well.
My kubernetes manifest is:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myapp-deployment
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  template:
    metadata:
      labels:
        app: myapp
        version: "0.2"
    spec:
      containers:
      - name: myapp-container
        image: gcr.io/google-samples/hello-app:1.0
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
          protocol: TCP
        readinessProbe:
          httpGet:
            path: /
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
          successThreshold: 1
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-lb
  labels:
    app: myapp
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: myapp
If I loop over the service with the external IP, let's say:
$ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.35.240.1 <none> 443/TCP 1h
myapp-lb LoadBalancer 10.35.252.91 35.205.100.174 80:30549/TCP 22m
using the bash script:
while true
do
  curl 35.205.100.174
  sleep 0.2
done
I receive some connection refused during the deployment:
curl: (7) Failed to connect to 35.205.100.174 port 80: Connection refused
The application is the default helloapp provided by Google Cloud Platform and running on 8080.
Cluster information:
Kubernetes version: 1.8.8
Google cloud platform
Machine type: g1-small
It looks like your requests go to a pod that has not started yet. I avoided this by adding a few parameters:
A liveness probe, to be sure the app has already started
maxUnavailable: 1, to deploy pods one by one
I still have some errors, but they are acceptable because they rarely happen. During the deployment an error may occur once or twice, so with increasing load you will have a negligible number of errors. I mean one or two errors per 2000 requests during the deployment.
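Applied to the manifest from the question, those two suggestions would look roughly like this (hedged: the livenessProbe values are illustrative, and maxUnavailable: 1 relaxes the original 0):

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # replace pods one by one
      maxSurge: 1
  template:
    spec:
      containers:
      - name: myapp-container
        livenessProbe:    # in addition to the existing readinessProbe
          httpGet:
            path: /
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
```

A preStop sleep of a few seconds is another commonly used knob here (not from the answer above): it gives the load balancer time to remove a terminating pod from rotation before the container exits.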