Internal mesh communication is ignoring settings from the virtual service - kubernetes

I'm trying to inject an HTTP status 500 fault in the bookinfo example.
I managed to inject a 500 error status when the traffic is coming from the Gateway with:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: bookinfo
  namespace: default
spec:
  gateways:
  - bookinfo-gateway
  hosts:
  - '*'
  http:
  - fault:
      abort:
        httpStatus: 500
        percent: 100
    match:
    - uri:
        prefix: /api/v1/products
    route:
    - destination:
        host: productpage
        port:
          number: 9080
Example:
$ curl $(minikube ip):30890/api/v1/products
fault filter abort
But I fail to achieve this for traffic coming from other pods:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: bookinfo
  namespace: default
spec:
  gateways:
  - mesh
  hosts:
  - productpage
  http:
  - fault:
      abort:
        httpStatus: 500
        percent: 100
    match:
    - uri:
        prefix: /api/v1/products
    route:
    - destination:
        host: productpage
        port:
          number: 9080
Example:
# jump into a random pod
$ kubectl exec -ti details-v1-dasa231 -- bash
root@details $ curl productpage:9080/api/v1/products
[{"descriptionHtml": ... <- actual product list, where I expect an HTTP 500
I tried using the FQDN for the host (productpage.default.svc.cluster.local) but I get the same behavior.
I checked the proxy status with istioctl proxy-status; everything is synced.
I checked that the istio-proxy sidecar is injected into the pods, and it is:
Pods:
NAME READY STATUS RESTARTS AGE
details-v1-6764bbc7f7-bm9zq 2/2 Running 0 4h
productpage-v1-54b8b9f55-72hfb 2/2 Running 0 4h
ratings-v1-7bc85949-cfpj2 2/2 Running 0 4h
reviews-v1-fdbf674bb-5sk5x 2/2 Running 0 4h
reviews-v2-5bdc5877d6-cb86k 2/2 Running 0 4h
reviews-v3-dd846cc78-lzb5t 2/2 Running 0 4h
I'm completely stuck and not sure what to check next. I feel like I am missing something very obvious.
I would really appreciate any help on this topic.

This should work, and it did when I tried it. My guess is that you have other conflicting route rules for the productpage service defined.
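If you want to rule that out, something along these lines lists the candidates and shows what Envoy actually received (a sketch; <details-pod> is a placeholder for one of your pod names):
# look for other VirtualServices / DestinationRules touching productpage
kubectl get virtualservices,destinationrules --all-namespaces
# dump the routes the calling pod's sidecar holds for port 9080
istioctl proxy-config routes <details-pod> --name 9080 -o json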

The root cause of my issue was an improperly set up includeIPRanges in my minikube cluster. I had set up the 10.0.0.1/24 CIDR, but some services were listening on 10.35.x.x.
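In case it saves someone else time: with includeIPRanges set, the sidecar only intercepts outbound traffic for the listed CIDRs, so a call to a ClusterIP outside that range bypasses Envoy entirely and no mesh VirtualService rules apply. A sketch of how to check the real service range and override the setting per workload (the annotation is Istio's standard one; the CIDR below is only an example):
# find the service CIDR the cluster actually uses
kubectl cluster-info dump | grep -m 1 service-cluster-ip-range
# pod template annotation, if you cannot change the mesh-wide value
metadata:
  annotations:
    traffic.sidecar.istio.io/includeOutboundIPRanges: "10.35.0.0/16"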

linkerd Top feature only shows /healthz requests

I am doing Lab 7.2 (Service Mesh and Ingress Controller) from the Kubernetes for Developers course from the Linux Foundation, and there is a problem I am facing - the Top feature only shows the /healthz requests.
It is supposed to show / requests too, but it does not. I would really like to troubleshoot it, but I have no idea how to even approach it.
More details
Following the course instructions I have:
A k8s cluster deployed on two GCE VMs
linkerd
nginx ingress controller
A simple LoadBalancer service off the httpd image. In effect, this is a NodePort service, since the LoadBalancer is never provisioned. The name is secondapp
A simple ingress object routing to the secondapp service.
I have no idea what information is useful to troubleshoot the issue. Here is some that I can think of:
Setup
Linkerd version
student@master:~$ linkerd version
Client version: stable-2.11.1
Server version: stable-2.11.1
student@master:~$
nginx ingress controller version
student@master:~$ helm list
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
myingress default 1 2022-09-28 02:09:35.031108611 +0000 UTC deployed ingress-nginx-4.2.5 1.3.1
student@master:~$
The service list
student@master:~$ k get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 7d4h
myingress-ingress-nginx-controller LoadBalancer 10.106.67.139 <pending> 80:32144/TCP,443:32610/TCP 62m
myingress-ingress-nginx-controller-admission ClusterIP 10.107.109.117 <none> 443/TCP 62m
nginx ClusterIP 10.105.88.244 <none> 443/TCP 3h42m
registry ClusterIP 10.110.129.139 <none> 5000/TCP 3h42m
secondapp LoadBalancer 10.105.64.242 <pending> 80:32000/TCP 111m
student@master:~$
Verifying that the ingress controller is known to linkerd
student@master:~$ k get ds myingress-ingress-nginx-controller -o json | jq .spec.template.metadata.annotations
{
  "linkerd.io/inject": "ingress"
}
student@master:~$
The secondapp pod
apiVersion: v1
kind: Pod
metadata:
  name: secondapp
  labels:
    example: second
spec:
  containers:
  - name: webserver
    image: httpd
  - name: busy
    image: busybox
    command:
    - sleep
    - "3600"
The secondapp service
student@master:~$ k get svc secondapp -o yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2022-09-28T01:21:00Z"
  name: secondapp
  namespace: default
  resourceVersion: "433221"
  uid: 9266f000-5582-4796-ba73-02375f56ce2b
spec:
  allocateLoadBalancerNodePorts: true
  clusterIP: 10.105.64.242
  clusterIPs:
  - 10.105.64.242
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - nodePort: 32000
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    example: second
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer: {}
student@master:~$
The ingress object
student@master:~$ k get ingress
NAME CLASS HOSTS ADDRESS PORTS AGE
ingress-test <none> www.example.com 80 65m
student@master:~$ k get ingress ingress-test -o yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
  creationTimestamp: "2022-09-28T02:20:03Z"
  generation: 1
  name: ingress-test
  namespace: default
  resourceVersion: "438934"
  uid: 1952a816-a3f3-42a4-b842-deb56053b168
spec:
  rules:
  - host: www.example.com
    http:
      paths:
      - backend:
          service:
            name: secondapp
            port:
              number: 80
        path: /
        pathType: ImplementationSpecific
status:
  loadBalancer: {}
student@master:~$
Testing
secondapp
student@master:~$ curl "$(curl ifconfig.io):$(k get svc secondapp '--template={{(index .spec.ports 0).nodePort}}')"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 15 100 15 0 0 340 0 --:--:-- --:--:-- --:--:-- 348
<html><body><h1>It works!</h1></body></html>
student@master:~$
through the ingress controller
student@master:~$ url="$(curl ifconfig.io):$(k get svc myingress-ingress-nginx-controller '--template={{(index .spec.ports 0).nodePort}}')"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 15 100 15 0 0 319 0 --:--:-- --:--:-- --:--:-- 319
student@master:~$ curl -H "Host: www.example.com" $url
<html><body><h1>It works!</h1></body></html>
student@master:~$
And without the Host header:
student@master:~$ curl $url
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>
student@master:~$
And finally the linkerd dashboard Top snapshot:
Where are the GET / requests?
EDIT 1
So on the linkerd slack someone suggested having a look at https://linkerd.io/2.12/tasks/using-ingress/#nginx, and that made me examine my pods more carefully. It turns out one of the nginx-ingress pods could not start, and it is clearly due to the linkerd injection. Please observe:
Before linkerd
student@master:~$ k get pod
NAME READY STATUS RESTARTS AGE
myingress-ingress-nginx-controller-gbmbg 1/1 Running 0 19m
myingress-ingress-nginx-controller-qtdhw 1/1 Running 0 3m6s
secondapp 2/2 Running 4 (13m ago) 12h
student@master:~$
After linkerd
student@master:~$ k get ds myingress-ingress-nginx-controller -o yaml | linkerd inject --ingress - | k apply -f -
daemonset "myingress-ingress-nginx-controller" injected
daemonset.apps/myingress-ingress-nginx-controller configured
student@master:~$
And checking the pods:
student@master:~$ k get pod
NAME READY STATUS RESTARTS AGE
myingress-ingress-nginx-controller-gbmbg 1/1 Running 0 40m
myingress-ingress-nginx-controller-xhj5m 1/2 Running 8 (5m59s ago) 17m
secondapp 2/2 Running 4 (34m ago) 12h
student@master:~$
student@master:~$ k describe pod myingress-ingress-nginx-controller-xhj5m | tail
Normal Created 19m kubelet Created container linkerd-proxy
Normal Started 19m kubelet Started container linkerd-proxy
Normal Pulled 18m (x2 over 19m) kubelet Container image "registry.k8s.io/ingress-nginx/controller:v1.3.1@sha256:54f7fe2c6c5a9db9a0ebf1131797109bb7a4d91f56b9b362bde2abd237dd1974" already present on machine
Normal Created 18m (x2 over 19m) kubelet Created container controller
Normal Started 18m (x2 over 19m) kubelet Started container controller
Warning FailedPreStopHook 18m kubelet Exec lifecycle hook ([/wait-shutdown]) for Container "controller" in Pod "myingress-ingress-nginx-controller-xhj5m_default(93dd0189-091f-4c56-a197-33991932d66d)" failed - error: command '/wait-shutdown' exited with 137: , message: ""
Warning Unhealthy 18m (x6 over 19m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 502
Normal Killing 18m kubelet Container controller failed liveness probe, will be restarted
Warning Unhealthy 14m (x30 over 19m) kubelet Liveness probe failed: HTTP probe failed with statuscode: 502
Warning BackOff 4m29s (x41 over 14m) kubelet Back-off restarting failed container
student@master:~$
I will process the link I was given on the linkerd slack and update this post with any new findings.
The solution was provided by the user Axenow on the linkerd2 slack forum. The problem is that ingress-nginx cannot share a namespace with the services it provides the ingress functionality for. In my case all of them were in the default namespace.
To quote Axenow:
When you deploy nginx, by default it sends traffic to the pod directly.
To fix it you have to apply this configuration:
https://linkerd.io/2.12/tasks/using-ingress/#nginx
To elaborate, one has to update the values.yaml file of the downloaded ingress-nginx helm chart to make sure the following is true:
controller:
  replicaCount: 2
  service:
    externalTrafficPolicy: Cluster
  podAnnotations:
    linkerd.io/inject: enabled
And install the controller in a dedicated namespace:
helm upgrade --install --create-namespace --namespace ingress-nginx -f values.yaml ingress-nginx ingress-nginx/ingress-nginx
(Having uninstalled the previous installation, of course)
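For completeness, once the controller is reinstalled in its own namespace, both the injection and the data-plane health can be verified. A minimal check, assuming the ingress-nginx namespace used in the helm command above:
kubectl get pods -n ingress-nginx   # controller pods should now show 2/2 (controller + linkerd-proxy)
linkerd check --proxy -n ingress-nginx   # data-plane checks for the meshed proxies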

How to scale my app on nginx metrics without prometheus?

I want to scale my application based on custom metrics (RPS or active connections in this case), without having to set up Prometheus or use any external service. I can expose this API from my web app. What are my options?
Monitoring different types of metrics (e.g. custom metrics) on most Kubernetes clusters is the foundation that leads to more stable and reliable systems/applications/workloads. As discussed in the comments section, to monitor custom metrics, it is recommended to use tools designed for this purpose rather than inventing a workaround. I'm glad that in this case the final decision was to use Prometheus and KEDA to properly scale the web application.
I would like to briefly show other community members who are struggling with similar considerations how KEDA works.
To use Prometheus as a scaler for Keda, we need to install and configure Prometheus.
There are many different ways to install Prometheus and you should choose the one that suits your needs.
I've installed the kube-prometheus stack with Helm:
NOTE: I allowed Prometheus to discover all PodMonitors/ServiceMonitors within its namespace, without applying label filtering by setting the prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues and prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues values to false.
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
$ helm install prom-1 prometheus-community/kube-prometheus-stack --set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
alertmanager-prom-1-kube-prometheus-sta-alertmanager-0 2/2 Running 0 2m29s
prom-1-grafana-865d4c8876-8zdhm 3/3 Running 0 2m34s
prom-1-kube-prometheus-sta-operator-6b5d5d8df5-scdjb 1/1 Running 0 2m34s
prom-1-kube-state-metrics-74b4bb7857-grbw9 1/1 Running 0 2m34s
prom-1-prometheus-node-exporter-2v2s6 1/1 Running 0 2m34s
prom-1-prometheus-node-exporter-4vc9k 1/1 Running 0 2m34s
prom-1-prometheus-node-exporter-7jchl 1/1 Running 0 2m35s
prometheus-prom-1-kube-prometheus-sta-prometheus-0 2/2 Running 0 2m28s
Then we can deploy an application that will be monitored by Prometheus. I've created a simple application that exposes some metrics (such as nginx_vts_server_requests_total) on the /status/format/prometheus path:
$ cat app-1.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-1
spec:
  selector:
    matchLabels:
      app: app-1
  template:
    metadata:
      labels:
        app: app-1
    spec:
      containers:
      - name: app-1
        image: mattjcontainerregistry/nginx-vts:v1.0
        resources:
          limits:
            cpu: 50m
          requests:
            cpu: 50m
        ports:
        - containerPort: 80
          name: http
---
apiVersion: v1
kind: Service
metadata:
  name: app-1
  labels:
    app: app-1
spec:
  ports:
  - port: 80
    targetPort: 80
    name: http
  selector:
    app: app-1
  type: LoadBalancer
Next, create a ServiceMonitor that describes how to monitor our app-1 application:
$ cat servicemonitor.yaml
kind: ServiceMonitor
apiVersion: monitoring.coreos.com/v1
metadata:
  name: app-1
  labels:
    app: app-1
spec:
  selector:
    matchLabels:
      app: app-1
  endpoints:
  - interval: 15s
    path: "/status/format/prometheus"
    port: http
After waiting some time, let's check the app-1 logs to make sure that it is being scraped correctly:
$ kubectl get pods | grep app-1
app-1-5986d56f7f-2plj5 1/1 Running 0 35s
$ kubectl logs -f app-1-5986d56f7f-2plj5
10.44.1.6 - - [07/Feb/2022:16:31:11 +0000] "GET /status/format/prometheus HTTP/1.1" 200 2742 "-" "Prometheus/2.33.1" "-"
10.44.1.6 - - [07/Feb/2022:16:31:26 +0000] "GET /status/format/prometheus HTTP/1.1" 200 3762 "-" "Prometheus/2.33.1" "-"
10.44.1.6 - - [07/Feb/2022:16:31:41 +0000] "GET /status/format/prometheus HTTP/1.1" 200 3762 "-" "Prometheus/2.33.1" "-"
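Before handing the query to KEDA, it may also be worth confirming that Prometheus itself returns a sensible value for it. A minimal sketch, assuming the kube-prometheus service name from this setup, running in the default namespace:
$ kubectl port-forward svc/prom-1-kube-prometheus-sta-prometheus 9090:9090 &
$ curl -s http://localhost:9090/api/v1/query \
    --data-urlencode 'query=sum(rate(nginx_vts_server_requests_total{code="2xx", service="app-1"}[2m]))'
The query should return a single-element vector; if it comes back empty, the ServiceMonitor labels or the service name in the query are the first things to re-check.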
Now it's time to deploy KEDA. There are a few approaches to deploy KEDA runtime as described in the KEDA documentation.
I chose to install KEDA with Helm because it's very simple :-)
$ helm repo add kedacore https://kedacore.github.io/charts
$ helm repo update
$ kubectl create namespace keda
$ helm install keda kedacore/keda --namespace keda
The last thing we need to create is a ScaledObject which is used to define how KEDA should scale our application and what the triggers are. In the example below, I used the nginx_vts_server_requests_total metric.
NOTE: For more information on the prometheus trigger, see the Trigger Specification documentation.
$ cat scaled-object.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: scaled-app-1
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-1
  pollingInterval: 30
  cooldownPeriod: 120
  minReplicaCount: 1
  maxReplicaCount: 5
  advanced:
    restoreToOriginalReplicaCount: false
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
          - type: Percent
            value: 100
            periodSeconds: 15
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prom-1-kube-prometheus-sta-prometheus.default.svc:9090
      metricName: nginx_vts_server_requests_total
      query: sum(rate(nginx_vts_server_requests_total{code="2xx", service="app-1"}[2m])) # Note: query must return a vector/scalar single element response
      threshold: '10'
$ kubectl apply -f scaled-object.yaml
scaledobject.keda.sh/scaled-app-1 created
Finally, we can check if the app-1 application scales correctly based on the number of requests:
$ for a in $(seq 1 10000); do curl <PUBLIC_IP_APP_1> 1>/dev/null 2>&1; done
$ kubectl get hpa -w
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS
keda-hpa-scaled-app-1 Deployment/app-1 0/10 (avg) 1 5 1
keda-hpa-scaled-app-1 Deployment/app-1 15/10 (avg) 1 5 2
keda-hpa-scaled-app-1 Deployment/app-1 12334m/10 (avg) 1 5 3
keda-hpa-scaled-app-1 Deployment/app-1 13250m/10 (avg) 1 5 4
keda-hpa-scaled-app-1 Deployment/app-1 12600m/10 (avg) 1 5 5
$ kubectl get pods | grep app-1
app-1-5986d56f7f-2plj5 1/1 Running 0 36m
app-1-5986d56f7f-5nrqd 1/1 Running 0 77s
app-1-5986d56f7f-78jw8 1/1 Running 0 94s
app-1-5986d56f7f-bl859 1/1 Running 0 62s
app-1-5986d56f7f-xlfp6 1/1 Running 0 45s
As you can see above, our application has been correctly scaled to 5 replicas.

Why can't my service pass traffic to a pod with a named port on minikube?

I'm having trouble with the examples in section 5.1.1 Using Named Ports of Kubernetes In Action by Marko Luksa. The example goes like this:
First - Create
I'm creating a pod with a named port that runs a Node.js container that responds with You've hit <hostname> when it's hit:
apiVersion: v1
kind: Pod
metadata:
  name: named-port-pod
  labels:
    app: named-port
spec:
  containers:
  - name: kubia
    image: michaellundquist/kubia
    ports:
    - name: http
      containerPort: 8080
And a service like this (note: this is a simplified version of the original example, which also doesn't work):
apiVersion: v1
kind: Service
metadata:
  name: named-port-service
spec:
  ports:
  - name: http
    port: 80
    targetPort: http
  selector:
    app: named-port
Second - Verify
$ kubectl get po -o wide --show-labels
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES LABELS
named-port-pod 1/1 Running 0 45m 172.17.0.7 minikube <none> <none> app=named-port
$ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 53m
named-port-service ClusterIP 10.96.115.108 <none> 80/TCP 19m
$ kubectl describe service named-port-service
Name: named-port-service
Namespace: default
Labels: <none>
Annotations: <none>
Selector: app=named-port
Type: ClusterIP
IP: 10.96.115.108
Port: http 80/TCP
TargetPort: http/TCP
Endpoints: 172.17.0.7:8080
Session Affinity: None
Events: <none>
Third - Test (Failing)
$ kubectl exec named-port-pod -- curl named-port-pod:8080
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 26 0 26 0 0 5494 0 --:--:-- --:--:-- --:--:-- 6500
You've hit named-port-pod
$ kubectl exec named-port-pod -- curl --max-time 20 named-port-service
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:19 --:--:-- 0curl: (28) Connection timed out after 20001 milliseconds
command terminated with exit code 28
As you can see, everything works when I hit named-port-pod:8080, but it fails when I hit named-port-service. I'm pretty sure I have the mapping correct, because kubectl describe service named-port-service shows the correct endpoint. I think minikube can use named ports, but my service can't pass connections to my pod. Why?
P.S. here's my minikube version:
$ minikube version
minikube version: v1.6.2
commit: 54f28ac5d3a815d1196cd5d57d707439ee4bb392
This is a known issue with minikube: a pod cannot reach itself via its service IP. You can try accessing your service from a different pod, or use the following workaround to fix this.
minikube ssh
sudo ip link set docker0 promisc on
Open issue: https://github.com/kubernetes/minikube/issues/1568
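For the first option, a quick way to test from a different pod is a throwaway busybox pod (a sketch; the pod name tmp is arbitrary):
$ kubectl run tmp --rm -ti --image=busybox --restart=Never -- wget -qO- http://named-port-service
Since the request then originates from another pod rather than from named-port-pod itself, it should come back with "You've hit named-port-pod".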

Why there is downtime while rolling update a deployment or even scaling down a replicaset

According to the official Kubernetes documentation:
Rolling updates allow Deployments' update to take place with zero downtime by incrementally updating Pods instances with new ones
I was trying to perform a zero-downtime update using the RollingUpdate strategy, which is the recommended way to update an application in a kube cluster.
Official reference:
https://kubernetes.io/docs/tutorials/kubernetes-basics/update/update-intro/
But I was a little bit confused about the definition while performing it: downtime of the application still happens. Here is my cluster info at the beginning, as shown below:
liguuudeiMac:~ liguuu$ kubectl get all
NAME READY STATUS RESTARTS AGE
pod/ubuntu-b7d6cb9c6-6bkxz 1/1 Running 0 3h16m
pod/webapp-deployment-6dcf7b88c7-4kpgc 1/1 Running 0 3m52s
pod/webapp-deployment-6dcf7b88c7-4vsch 1/1 Running 0 3m52s
pod/webapp-deployment-6dcf7b88c7-7xzsk 1/1 Running 0 3m52s
pod/webapp-deployment-6dcf7b88c7-jj8vx 1/1 Running 0 3m52s
pod/webapp-deployment-6dcf7b88c7-qz2xq 1/1 Running 0 3m52s
pod/webapp-deployment-6dcf7b88c7-s7rtt 1/1 Running 0 3m52s
pod/webapp-deployment-6dcf7b88c7-s88tb 1/1 Running 0 3m52s
pod/webapp-deployment-6dcf7b88c7-snmw5 1/1 Running 0 3m52s
pod/webapp-deployment-6dcf7b88c7-v287f 1/1 Running 0 3m52s
pod/webapp-deployment-6dcf7b88c7-vd4kb 1/1 Running 0 3m52s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 3h16m
service/tc-webapp-service NodePort 10.104.32.134 <none> 1234:31234/TCP 3m52s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/ubuntu 1/1 1 1 3h16m
deployment.apps/webapp-deployment 10/10 10 10 3m52s
NAME DESIRED CURRENT READY AGE
replicaset.apps/ubuntu-b7d6cb9c6 1 1 1 3h16m
replicaset.apps/webapp-deployment-6dcf7b88c7 10 10 10 3m52s
deployment.apps/webapp-deployment is a Tomcat-based webapp, and the Service tc-webapp-service maps to the Pods containing the Tomcat containers (the full deployment config files are at the end of the article). deployment.apps/ubuntu is just a standalone app in the cluster, which performs an HTTP request to tc-webapp-service every 0.01 seconds so that I can trace the status of the so-called rolling update of webapp-deployment. The command run in the ubuntu container is shown below (an infinite loop of curl, one request every 0.01 seconds):
for ((;;)); do curl -sS -D - http://tc-webapp-service:1234 -o /dev/null | grep HTTP; date +"%Y-%m-%d %H:%M:%S"; echo ; sleep 0.01 ; done;
And the output of the ubuntu app (everything is fine):
...
HTTP/1.1 200
2019-08-30 07:27:15
...
HTTP/1.1 200
2019-08-30 07:27:16
...
Then I tried to change the tag of the Tomcat image from 8-jdk8 to 8-jdk11. Note that the rolling update strategy of deployment.apps/webapp-deployment has been configured correctly, with maxSurge 0 and maxUnavailable 9 (the result is the same if these two attributes are left at their defaults):
...
spec:
  containers:
  - name: tc-part
    image: tomcat:8-jdk8 -> tomcat:8-jdk11
...
Then, the output of the ubuntu app:
HTTP/1.1 200
2019-08-30 07:47:43
curl: (56) Recv failure: Connection reset by peer
2019-08-30 07:47:43
HTTP/1.1 200
2019-08-30 07:47:44
As shown above, some HTTP requests failed, and this is without doubt an interruption of the application while performing a rolling update in the kube cluster.
However, I can also reproduce the situation mentioned above (the interruption) when scaling down, with the command shown below (from 10 replicas to 2):
kubectl scale deployment.apps/tc-webapp-service --replicas=2
After performing the above tests, I was wondering what so-called zero downtime actually means. Although the way of mocking HTTP requests is a little bit tricky, the situation is completely normal for applications that are designed to handle thousands or even millions of requests per second.
env:
liguuudeiMac:cacheee liguuu$ minikube version
minikube version: v1.3.1
commit: ca60a424ce69a4d79f502650199ca2b52f29e631
liguuudeiMac:cacheee liguuu$ kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-06T01:44:30Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"clean", BuildDate:"2019-08-05T09:15:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Deployment & Service Config:
# Service
apiVersion: v1
kind: Service
metadata:
  name: tc-webapp-service
spec:
  type: NodePort
  selector:
    appName: tc-webapp
  ports:
  - name: tc-svc
    protocol: TCP
    port: 1234
    targetPort: 8080
    nodePort: 31234
---
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp-deployment
spec:
  replicas: 10
  selector:
    matchLabels:
      appName: tc-webapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 9
  # Pod Templates
  template:
    metadata:
      labels:
        appName: tc-webapp
    spec:
      containers:
      - name: tc-part
        image: tomcat:8-jdk8
        ports:
        - containerPort: 8080
        livenessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 10
        readinessProbe:
          httpGet:
            scheme: HTTP
            port: 8080
            path: /
          initialDelaySeconds: 5
          periodSeconds: 1
To deploy an application that will really update with zero downtime, the application should meet some requirements. To mention a few of them:
the application should handle graceful shutdown
the application should implement readiness and liveness probes correctly
For example, if a shutdown signal is received, the application should not respond with 200 to new readiness probes, but it should still respond with 200 to liveness probes until all old requests have been processed.
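As a rough illustration of the graceful-shutdown point on the pod side (a sketch, not a drop-in replacement for the deployment above): a preStop hook delays SIGTERM so that endpoint removal can propagate and in-flight requests can drain, and terminationGracePeriodSeconds bounds the whole shutdown:
spec:
  terminationGracePeriodSeconds: 60
  containers:
  - name: tc-part
    image: tomcat:8-jdk8
    lifecycle:
      preStop:
        exec:
          # keep the old pod serving briefly while endpoints update
          command: ["sh", "-c", "sleep 10"]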

GKE with Ingress setup always gives status UNHEALTHY

To start off, I have tested the tutorial at https://cloud.google.com/kubernetes-engine/docs/tutorials/http-balancer
which works fine. I also tested the same tutorial but added a TLS secret as well to test HTTPS, which also worked fine.
My problems arise when I create my own image. Here are the steps I take:
The Dockerfile:
# We label our stage as "builder"
FROM node:9.4.0-alpine as builder
COPY package.json package-lock.json ./
## Storing node modules on a separate layer will prevent unnecessary npm installs at each build
RUN npm i && mkdir /srv/cs-ui && cp -R ./node_modules ./srv/cs-ui
WORKDIR /srv/cs-ui
COPY . .
## Build the angular app in production mode and store the artifacts in dist folder
RUN $(npm bin)/ng build --environment "prod"
FROM nginx
## Copy our default nginx config
COPY nginx/default.conf /etc/nginx/conf.d/
## Remove default nginx website
RUN rm -rf /usr/share/nginx/html/*
## From "builder" stage copy over the artifacts in dist folder to default nginx nginx public folder
COPY --from=builder /srv/cs-ui/dist /usr/share/nginx/html/
The Dockerfile is run with docker-compose file that looks like this:
version: '2'
services:
  cs-ui:
    image: "gcr.io/cs-micro/cs-ui:v1"
    container_name: "cs-ui"
    tty: true
    build: .
    ports:
    - "80:80"
Locally this works without any issues. The next thing I do is to push it to the Container Registry.
gcloud docker -- push gcr.io/cs-micro/cs-ui:v1
After that I create a container:
kubectl run cs-ui --image=gcr.io/cs-micro/cs-ui:v1 --port=80
Then I expose it:
kubectl expose deployment cs-ui --target-port=80 --type=NodePort
Then I run the following ingress file:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: basic-ingress
spec:
  tls:
  - secretName: tls-certificate
  backend:
    serviceName: cs-ui
    servicePort: 80
with command:
kubectl apply -f test.yaml
kubectl describe service
Name: cs-ui
Namespace: default
Labels: run=cs-ui
Annotations:
Selector: run=cs-ui
Type: NodePort
IP: 10.35.244.124
Port: 80/TCP
TargetPort: 80/TCP
NodePort: 30272/TCP
Endpoints: 10.32.0.32:80
Session Affinity: None
External Traffic Policy: Cluster
Events:
Name: kubernetes
Namespace: default
Labels: component=apiserver
provider=kubernetes
Annotations:
Selector:
Type: ClusterIP
IP: 10.35.240.1
Port: https 443/TCP
TargetPort: 443/TCP
Endpoints: 35.195.192.28:443
Session Affinity: ClientIP
Events:
kubectl describe deployment
Name: cs-ui
Namespace: default
CreationTimestamp: Thu, 25 Jan 2018 12:27:59 +0100
Labels: run=cs-ui
Annotations: deployment.kubernetes.io/revision=1
Selector: run=cs-ui
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 1 max unavailable, 1 max surge
Pod Template:
Labels: run=cs-ui
Containers:
cs-ui:
Image: gcr.io/cs-micro/cs-ui:v1
Port: 80/TCP
Environment:
Mounts:
Volumes:
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
OldReplicaSets:
NewReplicaSet: cs-ui-2929390783 (1/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 9m deployment-controller Scaled up replica set cs-ui-2929390783 to 1
kubectl describe ing
Name: basic-ingress
Namespace: default
Address: 35.227.220.186
Default backend: cs-ui:80 (10.32.0.32:80)
TLS:
tls-certificate terminates
Rules:
Host Path Backends
---- ---- --------
* * cs-ui:80 (10.32.0.32:80)
Annotations:
https-forwarding-rule: k8s-fws-default-basic-ingress--f5fde3efbfa51336
https-target-proxy: k8s-tps-default-basic-ingress--f5fde3efbfa51336
ssl-cert: k8s-ssl-default-basic-ingress--f5fde3efbfa51336
target-proxy: k8s-tp-default-basic-ingress--f5fde3efbfa51336
url-map: k8s-um-default-basic-ingress--f5fde3efbfa51336
backends: {"k8s-be-30272--f5fde3efbfa51336":"UNHEALTHY"}
forwarding-rule: k8s-fw-default-basic-ingress--f5fde3efbfa51336
static-ip: k8s-fw-default-basic-ingress--f5fde3efbfa51336
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ADD 12m loadbalancer-controller default/basic-ingress
Normal CREATE 11m loadbalancer-controller ip: 35.227.220.186
Normal Service 6m (x4 over 11m) loadbalancer-controller default backend set to cs-ui:30272
After 3-5 minutes I get UNHEALTHY, and I have no clue why, because the setup is almost exactly the same as theirs.
I have read countless threads on what to do when you get a backend status of UNHEALTHY, but none of them have helped. One mentioned adding a firewall rule, as described in this tutorial: https://cloud.google.com/compute/docs/load-balancing/health-checks, which I have added, but it did not help.
If you have any suggestions, I will gladly test them.
It turned out our Angular application had a redirect on '/', which gave a 302 response. This response makes the health check fail and results in an UNHEALTHY state.
As soon as we set up a custom health check, it worked.
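For anyone hitting the same thing: the GCE load balancer health check expects an HTTP 200 on the serving path (/ by default), and the GKE ingress controller derives that check from the pod's readinessProbe. A sketch of one way to satisfy it without touching the app's redirect, serving an arbitrary /healthz path directly from nginx (inside the existing server block of nginx/default.conf):
location /healthz {
  return 200 'ok';
}
and pointing the deployment's readinessProbe at it, so the load balancer health check follows:
readinessProbe:
  httpGet:
    path: /healthz
    port: 80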