Istio route changes apply too slowly and cause deployment failures - kubernetes

I am working on a DevOps solution and trying to automate a blue-green deployment on Kubernetes. However, we are facing the issue that Istio applies route rules too slowly: when we remove VirtualServices, it takes a long time for the change to become effective. We currently wait 60s for the rules to update before destroying the old pods, but we have no idea whether 60s is enough to finish the route change, and we will have downtime if it takes longer than that to take effect. I would like some advice on how to check that the route is updated properly (to the green one only), and on how to make Istio apply the change faster. Thanks.
Here is the yaml file to apply the virtualservice:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  namespace: xxx-d
  name: xxx-virtualservice
  labels:
    microservice: xxx-new
spec:
  hosts:
  - xxx.com
  gateways:
  - mesh
  - http-gateway.istio-system.svc.cluster.local
  - https-gateway.istio-system.svc.cluster.local
  http:
  - headers:
      request:
        set:
          x-forwarded-port: '443'
          x-forwarded-proto: https
    route:
    - destination:
        host: xxx-service.svc.cluster.local
        port:
          number: 8080
    retries:
      attempts: 3
      retryOn: gateway-error,connect-failure,refused-stream
    timeout: 3s
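One way to check that the route change has actually propagated before destroying the old pods (a sketch, assuming istioctl is installed and the gateways run in istio-system; the pod name is a placeholder) is to poll the Envoy sync status and inspect the ingress gateway's routes instead of sleeping a fixed 60s:
istioctl proxy-status
istioctl proxy-config routes <istio-ingressgateway-pod>.istio-system -o json
When proxy-status reports the gateway and the sidecars as SYNCED and the dumped routes only contain the green destination, it should be safe to remove the blue pods.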

Related

Adding Custom httpStatus to Kubernetes Virtual Service

I am deploying a virtual service on Kubernetes with which I want to expose a host. This host will target the Elasticsearch DB and allow only read operations on it. Every other CRUD operation should be blocked.
After blocking, I get a 404 Not Found HTTP response, but I want to customize it to HTTP 405: Method Not Allowed.
Any suggestions on achieving this? Attached is the virtual service YAML file used in Kubernetes.
Tried
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: read-es-vs
spec:
  hosts:
  - "readonlyelastic.com"
  gateways:
  - istio-system/default-gateway
  http:
  - match:
    - uri:
        prefix: /
      method:
        exact: GET
    route:
    - destination:
        port:
          number: 8080
        host: elasticsearch-master-data
    fault:
      abort:
        httpStatus: 405
But it did not give the expected result.
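One common pattern (a hedged sketch, not a verified fix; the host, gateway, and service names are reused from the attempt above, and the catch-all second rule is an assumption) is to keep the fault out of the GET route and add a fallback rule that aborts every non-GET request with 405:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: read-es-vs
spec:
  hosts:
  - "readonlyelastic.com"
  gateways:
  - istio-system/default-gateway
  http:
  - match:                  # first rule: forward GET requests untouched
    - method:
        exact: GET
    route:
    - destination:
        host: elasticsearch-master-data
        port:
          number: 8080
  - fault:                  # fallback rule: everything else is aborted with 405
      abort:
        httpStatus: 405
        percentage:
          value: 100
    route:                  # a destination is still required even when aborting
    - destination:
        host: elasticsearch-master-data
        port:
          number: 8080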

How to expose a service to outside Kubernetes cluster via ingress?

I'm struggling to expose a service in an AWS cluster to the outside and access it via a browser. Since my previous question hasn't drawn any answers, I decided to simplify the issue in several aspects.
First, I've created a deployment which should work without any configuration. Based on this article, I did:
kubectl create namespace tests
created the file probe-service.yaml based on paulbouwer/hello-kubernetes:1.8 and deployed it with kubectl create -f probe-service.yaml -n tests:
apiVersion: v1
kind: Service
metadata:
  name: hello-kubernetes-first
spec:
  type: ClusterIP
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: hello-kubernetes-first
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-kubernetes-first
spec:
  replicas: 3
  selector:
    matchLabels:
      app: hello-kubernetes-first
  template:
    metadata:
      labels:
        app: hello-kubernetes-first
    spec:
      containers:
      - name: hello-kubernetes
        image: paulbouwer/hello-kubernetes:1.8
        ports:
        - containerPort: 8080
        env:
        - name: MESSAGE
          value: Hello from the first deployment!
created ingress.yaml and applied it (kubectl apply -f .\probes\ingress.yaml -n tests)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hello-kubernetes-ingress
spec:
  rules:
  - host: test.projectname.org
    http:
      paths:
      - pathType: Prefix
        path: "/test"
        backend:
          service:
            name: hello-kubernetes-first
            port:
              number: 80
  - host: test2.projectname.org
    http:
      paths:
      - pathType: Prefix
        path: "/test2"
        backend:
          service:
            name: hello-kubernetes-first
            port:
              number: 80
  ingressClassName: nginx
Second, I can see that DNS actually points to the cluster and the ingress rules are applied:
if I open http://test.projectname.org/test or any irrelevant path (http://test.projectname.org/test3), I'm shown NET::ERR_CERT_AUTHORITY_INVALID, but
if I use "open anyway" in the browser, irrelevant paths give ERR_TOO_MANY_REDIRECTS while http://test.projectname.org/test gives Cannot GET /test
Now, TLS issues aside (those deserve a separate question), why do I get Cannot GET /test? It looks like the ingress controller (ingress-nginx) got the rules (otherwise it wouldn't discriminate paths; that's why I don't show the DNS settings, although they are described in the previous question), but instead of showing the simple hello-kubernetes page at /test it returns this simple 404 message. Why is that? What could possibly go wrong? How do I debug this?
Some debug info:
kubectl version --short says the Kubernetes Client Version is v1.21.5 and the Server Version is v1.20.7-eks-d88609
kubectl get ingress -n tests shows that hello-kubernetes-ingress does exist, with the nginx class, the 2 expected hosts, and an address equal to the one shown for the load balancer in the AWS console
kubectl get all -n tests shows
NAME                                          READY   STATUS    RESTARTS   AGE
pod/hello-kubernetes-first-6f77d8ff99-gjw5d   1/1     Running   0          5h4m
pod/hello-kubernetes-first-6f77d8ff99-ptwsn   1/1     Running   0          5h4m
pod/hello-kubernetes-first-6f77d8ff99-x8w87   1/1     Running   0          5h4m

NAME                             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/hello-kubernetes-first   ClusterIP   10.100.18.189   <none>        80/TCP    5h4m

NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/hello-kubernetes-first   3/3     3            3           5h4m

NAME                                                DESIRED   CURRENT   READY   AGE
replicaset.apps/hello-kubernetes-first-6f77d8ff99   3         3         3       5h4m
ingress-nginx was installed before me via the following chart:
apiVersion: v2
name: nginx
description: A Helm chart for Kubernetes
type: application
version: 4.0.6
appVersion: "1.0.4"
dependencies:
- name: ingress-nginx
  version: 4.0.6
  repository: https://kubernetes.github.io/ingress-nginx
and the values overrides applied with the chart differ from the original ones mostly (well, those got updated since the installation) in that, under extraArgs, default-ssl-certificate: "nginx-ingress/dragon-family-com" is uncommented
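For reference, a minimal sketch of what that override likely looks like in the values file (an assumption based on the standard ingress-nginx chart layout, with the chart wrapped as the ingress-nginx dependency shown above; the secret name is copied from this question):
ingress-nginx:
  controller:
    extraArgs:
      # serves this certificate for hosts whose ingress has no tls block of its own
      default-ssl-certificate: "nginx-ingress/dragon-family-com"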
PS To answer Andrew, I indeed tried to set up HTTPS, but it seemingly didn't help, so I hadn't included what I tried in the initial question. Yet, here's what I did:
installed cert-manager, currently without a custom chart: kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.5.4/cert-manager.yaml
based on cert-manager's tutorial and an SO question, created a ClusterIssuer with the following config:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-backoffice
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    # use https://acme-v02.api.letsencrypt.org/directory after everything is fixed and works
    privateKeySecretRef: # this secret will be created in the namespace of cert-manager
      name: letsencrypt-backoffice-private-key
    # email: <will be used for urgent alerts about expiration etc>
    solvers:
    # TODO: add for each domain/second-level domain/*.projectname.org
    - selector:
        dnsZones:
        - test.projectname.org
        - test2.projectname.org
      # haven't made it to work yet, so switched to the simpler to configure http01 challenge
      # dns01:
      #   route53:
      #     region: ... # that of load balancer (but we also have ...)
      #     accessKeyID: <of IAM user with access to Route53>
      #     secretAccessKeySecretRef: # created that
      #       name: route53-credentials-secret
      #       key: secret-access-key
      #     role: arn:aws:iam::645730347045:role/cert-manager
      http01:
        ingress:
          class: nginx
and applied it via kubectl apply -f issuer.yaml
created 2 certificates in the same file and applied it again:
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: letsencrypt-certificate
spec:
  secretName: tls-secret
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt-backoffice
  commonName: test.projectname.org
  dnsNames:
  - test.projectname.org
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: letsencrypt-certificate-2
spec:
  secretName: tls-secret-2
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt-backoffice
  commonName: test2.projectname.org
  dnsNames:
  - test2.projectname.org
made sure that the certificates are issued correctly (skipping the pain part, the result is: kubectl get certificates shows that both certificates have READY = true and both tls secrets are created)
figured that my ingress is in another namespace and that TLS secrets referenced in the ingress spec can only live in the same namespace (haven't tried the wildcard certificate and the --default-ssl-certificate option yet), so I copied each of them to the tests namespace:
opened existing secret, like kubectl edit secret tls-secret-2, copied data and annotations
created an empty (Opaque) secret in tests: kubectl create secret generic tls-secret-2-copy -n tests
opened it (kubectl edit secret tls-secret-2-copy -n tests) and inserted the data and annotations
in ingress spec, added the tls bit:
tls:
- hosts:
  - test.projectname.org
  secretName: tls-secret-copy
- hosts:
  - test2.projectname.org
  secretName: tls-secret-2-copy
I hoped that this would help, but actually it made no difference (I get ERR_TOO_MANY_REDIRECTS for irrelevant paths, a redirect from http to https, NET::ERR_CERT_AUTHORITY_INVALID at https, and Cannot GET /test if I insist on getting to the page).
Since you've used your own answer to complement the question, I'll kind of answer all the things you asked, while providing a divide-and-conquer strategy for troubleshooting Kubernetes networking.
At the end I'll give you some nginx and IP answers.
This is correct:
- host: test3.projectname.org
  http:
    paths:
    - pathType: Prefix
      path: "/"
      backend:
        service:
          name: hello-kubernetes-first
          port:
            number: 80
Breaking down troubleshooting with Ingress
DNS
Ingress
Service
Pod
Certificate
1. DNS
You can use the dig command to query DNS:
dig google.com
2. Ingress
The ingress controller doesn't look at the IP; it just looks at the headers.
You can force a host using any tool that lets you change the headers, like curl:
curl --header 'Host: test3.projectname.com' http://123.123.123.123 (your public IP)
3. Service
You can make sure that your service is working by creating an ubuntu/centos pod, using kubectl exec -it podname -- bash, and trying to curl your service from within the cluster, as in the example below.
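For example (a sketch; the image and the service DNS name are assumptions based on the manifests in the question):
kubectl run -it --rm debug --image=ubuntu -n tests -- bash
# inside the pod:
apt-get update && apt-get install -y curl
curl http://hello-kubernetes-first.tests.svc.cluster.local
If this returns the hello-kubernetes page, the Service and the pods are fine and the problem is further up the chain (ingress or DNS).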
4. Pod
You're getting this
192.168.14.57 - - [14/Nov/2021:12:02:58 +0000] "GET /test2 HTTP/2.0" 404 144
"-" "<browser's user-agent header value>" 448 0.002
This part, GET /test2, means that the request got the address from DNS, went all the way from the internet, found your cluster, found your ingress controller, got through the service and reached your pod. Congratz! Your ingress is working!
But why is it returning 404?
The path that was passed to the service, and from the service to the pod, is /test2.
Do you have a file called test2 that nginx can serve? Do you have an upstream config in nginx that has a test2 prefix?
That's why you're getting a 404 from nginx, not from the ingress controller.
Those IPs are internal. Remember, the internet traffic ended at the cluster border; now you're in an internal network. Here's a rough sketch of what's happening.
Let's say that you're accessing it from your laptop. Your laptop has the IP 192.168.123.123, but your home has the address 7.8.9.1, so when your request hits the cluster, the cluster sees 7.8.9.1 requesting test3.projectname.com.
The cluster looks for the ingress controller, which finds a suitable configuration and passes the request down to the service, which passes the request down to the pod.
So,
Your router can see your private IP (192.168.123.123)
Your cluster(ingress) can see your router's IP (7.8.9.1)
Your service can see the ingress's IP (192.168.?.?)
Your pod can see the service's IP (192.168.14.57)
It's a game of pass around.
If you want to see the public IP in your nginx logs, you need to customize them to log the X-Real-IP header, which is usually where load balancers/ingresses/ambassadors/proxies put the actual requester's public IP.
Well, I haven't figured this out for ArgoCD yet (edit: figured it out, but the solution is ArgoCD-specific), but for this test service it seems that path resolution is the source of the issue. It may not be the only source (to be retested on the test2 subdomain), but when I created a new subdomain in the hosted zone (test3, not used anywhere before), pointed it via an A record to the load balancer (as an "alias" in the AWS console), and then added to the ingress a new rule with the / path, like this:
- host: test3.projectname.org
  http:
    paths:
    - pathType: Prefix
      path: "/"
      backend:
        service:
          name: hello-kubernetes-first
          port:
            number: 80
I've finally got the hello-kubernetes page at http://test3.projectname.org. I have since succeeded with TLS after a number of attempts and some research, plus some help in a separate question.
But I haven't succeeded with actual debugging: looking at kubectl logs -n nginx <pod name, see kubectl get pod -n nginx> doesn't really help me understand what path was passed to the service, and the output is rather difficult to read (I can't even find where those IPs come from: they are not mine, the LB's, or the cluster IP of the service; nor do I understand what tests-hello-kubernetes-first-80 stands for – it's just a concatenation of namespace, service name and port; no object has such a name, including the ingress):
192.168.14.57 - - [14/Nov/2021:12:02:58 +0000] "GET /test2 HTTP/2.0" 404 144
"-" "<browser's user-agent header value>" 448 0.002
[tests-hello-kubernetes-first-80] [] 192.168.49.95:8080 144 0.000 404 <some hash>
Any more pointers on debugging will be helpful; also, suggestions regarding correct path rewriting for nginx-ingress are welcome.
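On the path-rewriting point, here is a minimal sketch (hedged; it relies on the standard nginx-ingress rewrite-target annotation, with the host and service names reused from the question) that strips the /test prefix before the request reaches the pod, so an app that only serves / no longer answers 404 for /test:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hello-kubernetes-ingress
  annotations:
    nginx.ingress.kubernetes.io/use-regex: "true"
    # $2 captures whatever follows /test and becomes the path the pod sees
    nginx.ingress.kubernetes.io/rewrite-target: /$2
spec:
  ingressClassName: nginx
  rules:
  - host: test.projectname.org
    http:
      paths:
      - path: /test(/|$)(.*)
        pathType: ImplementationSpecific
        backend:
          service:
            name: hello-kubernetes-first
            port:
              number: 80
With this, a request for /test/foo is forwarded to the pod as /foo, and /test itself becomes /.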

Why isn't Istio's circuit breaking working?

With Istio 1.4.6,
I configured Kubernetes using resources such as Service and Deployment.
I also configured a Gateway, a VirtualService, and a DestinationRule to implement circuit breaking.
The composition diagram is as follows. (The number of pod replicas is two, and I operate only one version of the app.)
I wrote the VirtualService and DestinationRule to use circuit breaking:
VirtualService
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews-virtual-service
spec:
  gateways:
  - reviews-istio-gateway
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews-service
        port:
          number: 80
DestinationRules
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews-destination-rule
spec:
  host: reviews
  trafficPolicy:
    loadBalancer:
      simple: ROUND_ROBIN
    outlierDetection:
      baseEjectionTime: 1m
      consecutiveErrors: 1
      interval: 1s
      maxEjectionPercent: 100
Here, I expect that if more than one error occurs in the reviews app, all pods will be excluded from the load-balancing list for a minute.
Therefore, I expected the circuit breaking to work as below.
However, contrary to expectations, the circuit breaker did not work, and error logs were continuously being recorded in the reviews app.
Why isn't the circuit breaker working?
I guess the problem is not about circuit breaking, but about the usage of the VirtualService and DestinationRule.
For example, when a VirtualService is used with a Gateway, its hosts should probably be a public host, like acme.io.
The host of the DestinationRule should probably be that of the Kubernetes Service, as in the sketch below.
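A sketch of what that would mean for the manifests above (names reused from the question; this illustrates the suggestion, it is not a verified fix): the DestinationRule's host should match the host used in the VirtualService destination, i.e. the Kubernetes Service, otherwise the outlierDetection policy never attaches to the traffic that actually flows:
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews-destination-rule
spec:
  host: reviews-service      # same host as the VirtualService destination, not "reviews"
  trafficPolicy:
    loadBalancer:
      simple: ROUND_ROBIN
    outlierDetection:
      consecutiveErrors: 1   # eject a pod after a single error
      interval: 1s
      baseEjectionTime: 1m
      maxEjectionPercent: 100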

Does Istio allow to configure a maximum response timeout for a circuit breaker to open? How?

I'm checking the documentation for the DestinationRule, where there are several examples of a circuit-breaking configuration, e.g.:
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: bookinfo-app
spec:
  host: bookinfoappsvc.prod.svc.cluster.local
  trafficPolicy:
    connectionPool:
      tcp:
        connectTimeout: 30ms
  ...
The connectionPool.tcp element offers a connectTimeout. However, what I need to configure is a maximum response timeout. Imagine I want to open the circuit if the service takes longer than 5 seconds to answer. Is it possible to configure this in Istio? How?
Take a look at Tasks --> Traffic Management --> Setting Request Timeouts:
A timeout for http requests can be specified using the timeout field
of the route rule. By default, the timeout is 15 seconds [...]
So, you must set the http.timeout in the VirtualService configuration.
Take a look at this example from the Virtual Service / Destination official docs:
The following VirtualService sets a timeout of 5s for all calls to
productpage.prod.svc.cluster.local service in Kubernetes.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-productpage-rule
  namespace: istio-system
spec:
  hosts:
  - productpage.prod.svc.cluster.local # ignores rule namespace
  http:
  - timeout: 5s
    route:
    - destination:
        host: productpage.prod.svc.cluster.local
http.timeout: Timeout for HTTP requests.

How to configure Ingress request timeouts on GKE

I currently have an Ingress configured on GKE (k8s 1.2) to forward requests to my application's pods. I have a request which can take a long time (30 seconds) and times out in my application (504). I observe that when this happens, the response I receive is not my own 504 but a 502 from what looks like the Google load balancer, after 60 seconds.
I have played around with different status codes and durations; exactly after 30 seconds I start receiving this weird behaviour regardless of the status code emitted.
Does anybody have a clue how I can fix this? Is there a way to reconfigure this behaviour?
Beginning with 1.11.3-gke.18, it is possible to configure timeout settings in kubernetes directly.
First add a backendConfig:
apiVersion: cloud.google.com/v1beta1
kind: BackendConfig
metadata:
  name: my-bsc-backendconfig
spec:
  timeoutSec: 40
Then add an annotation to the Service so it uses this BackendConfig:
apiVersion: v1
kind: Service
metadata:
  name: my-bsc-service
  labels:
    purpose: bsc-config-demo
  annotations:
    beta.cloud.google.com/backend-config: '{"ports": {"80":"my-bsc-backendconfig"}}'
spec:
  type: NodePort
  selector:
    purpose: bsc-config-demo
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
And voilà, your ingress load balancer now has a timeout of 40 seconds instead of the default 30 seconds.
See https://cloud.google.com/kubernetes-engine/docs/how-to/configure-backend-service#creating_a_backendconfig
When creating an ingress on GKE, the default setup is that a GLBC HTTP load balancer will be created with the backends that you supplied. By default it is configured with a 30-second timeout for your application to handle the request.
If you need a longer timeout, you have to edit this manually after setup in the backends of your HTTP load balancer in the Google Cloud console, for example with the gcloud commands below.
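The same edit can be scripted instead of clicking through the console (a sketch; the backend service name is a placeholder you would look up first):
# list the backend services the ingress created, then raise the timeout on the relevant one
gcloud compute backend-services list
gcloud compute backend-services update <backend-service-name> --global --timeout=40s
As with the console approach, this manual change can be overwritten when the ingress controller re-syncs the load balancer, which is why the BackendConfig approach above is preferred on newer GKE versions.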