We have an HTTP(S) load balancer created by a Kubernetes ingress, which points to a backend formed by a set of pods running nginx and Ruby on Rails.
Looking at the load balancer logs, we have detected an increasing number of requests with a response code of 0 and statusDetails = client_disconnected_before_any_response.
We're trying to understand why this is happening, but we haven't found anything relevant. There is nothing in the nginx access or error logs.
This is happening for multiple kinds of requests, from GET to POST.
We also suspect that sometimes, despite the request being logged with that error, the request is actually passed to the backend. For instance, we're seeing PG::UniqueViolation errors due to identical sign-up requests being sent twice to our sign-up endpoint.
Any kind of help would be appreciated. Thanks!
UPDATE 1
As requested, here is the YAML file for the ingress resource:
UPDATE 2
I've created a log-based Stackdriver metric to count the number of requests that present this behavior. Here is the chart:
The big peaks approximately match the timestamp for these kubernetes events:
Full error: Readiness probe failed: Get http://10.48.1.28:80/health_check: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
So it seems the readiness probe for the pods behind the backend sometimes fails, but not always.
Here is the definition of the readinessProbe:
readinessProbe:
  failureThreshold: 3
  httpGet:
    httpHeaders:
    - name: X-Forwarded-Proto
      value: https
    - name: Host
      value: [redacted]
    path: /health_check
    port: 80
    scheme: HTTP
  initialDelaySeconds: 1
  periodSeconds: 30
  successThreshold: 1
  timeoutSeconds: 5
A response code of 0 with statusDetails = client_disconnected_before_any_response means the client closed the connection before the load balancer was able to provide a response, as per this GCP documentation.
Investigating why it did not respond in time, one possible reason is a mismatch between the keepalive timeouts of nginx and the GCP load balancer, although that would more likely produce backend_connection_closed_before_data_sent_to_client, caused by a 502 Bad Gateway race condition.
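If the keepalive mismatch is the culprit, nginx's keepalive timeout should be longer than the load balancer's, so the load balancer, not nginx, is the side that closes idle connections. A minimal sketch of the nginx side (the 650s value is an assumption based on GCP's documented 600-second backend keepalive; adjust to your setup):
http {
    # Keep idle client (load balancer) connections open longer than the
    # LB's 600 s keepalive so nginx never closes them first.
    keepalive_timeout 650s;
}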
To make sure the backend responds to the request, and to see how long it takes, you can repeat this process a few times (since you still get some valid responses):
curl response time
$ curl -w "@curl.txt" -o /dev/null -s IP_HERE
curl.txt content (create and save this file first):
time_namelookup: %{time_namelookup}\n
time_connect: %{time_connect}\n
time_appconnect: %{time_appconnect}\n
time_pretransfer: %{time_pretransfer}\n
time_redirect: %{time_redirect}\n
time_starttransfer: %{time_starttransfer}\n
----------\n
time_total: %{time_total}\n
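To repeat the measurement a few times in a row, something like this loop can be used (IP_HERE stays a placeholder for the load balancer address):
$ for i in $(seq 1 10); do curl -w "@curl.txt" -o /dev/null -s IP_HERE; done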
If this is the case, please review the sign-up endpoint code for any kind of loop, such as the one behind the PG::UniqueViolation errors that you mentioned.
Related
My readiness probe specifies HTTPS and port 8200 to check my HashiCorp Vault pod.
readinessProbe:
  httpGet:
    scheme: HTTPS
    path: '/v1/sys/health?activecode=200'
    port: 8200
Once the pod is running, kubectl describe pod shows this:
Readiness probe failed: Error checking seal status: Error making API request.
URL: GET http://127.0.0.1:8200/v1/sys/seal-status
Code: 400. Raw Message:
Client sent an HTTP request to an HTTPS server.
I have found this similar problem. See the whole answer here.
According to this documentation:
The http/https scheme is controlled by the tlsDisable value.
When set to true, changes URLs from https to http (such as the VAULT_ADDR=http://127.0.0.1:8200 environment variable set on the Vault pods).
To turn it off:
global:
  tlsDisable: false
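Assuming the official hashicorp/vault chart is in use, the value could be applied with something like this (the release and chart names are assumptions):
# Re-apply the release with TLS left enabled (tlsDisable=false)
$ helm upgrade vault hashicorp/vault --set global.tlsDisable=false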
NOTE:
Vault should always be used with TLS in production to provide secure communication between clients and the Vault server. It requires a certificate file and key file on each host where Vault is running.
See also this documentation. There one can find many examples of using readiness and liveness probes, e.g.:
readinessProbe:
  exec:
    command:
    - /bin/sh
    - -ec
    - vault status
  initialDelaySeconds: 5
  periodSeconds: 5
I am facing a very weird problem.
I have a deployment with an nginx container (six pods) running on GKE. I use liveness/readiness probes on the nginx container, and all the pods work fine.
One day, all the pods died because the readiness/liveness probes failed, and kubectl describe pod showed:
Liveness probe failed: Get http://10.28.6.19:80/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Readiness probe failed: Get http://10.28.6.19:80/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
If I delete the pods so that they are recreated, the new pods are perfectly normal.
Some posts said this happens when pods hit their resource limits, but I swear my pods' resource usage is very low, and I don't set limits for my pods.
I have encountered this problem three times now, and I still don't know the root cause or a solution for this kind of problem.
The deployment spec is below. nginx returns a 200 status code for /healthz, so the nginx config has to add a matching location block (a sketch of that block follows the probe definitions).
readinessProbe:
  httpGet:
    path: /healthz
    port: 80
  initialDelaySeconds: 10
  timeoutSeconds: 2
livenessProbe:
  httpGet:
    path: /healthz
    port: 80
  initialDelaySeconds: 10
  timeoutSeconds: 2
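The /healthz location mentioned above might look roughly like this (a sketch, not the asker's actual config):
location = /healthz {
    # Answer the probe directly from nginx without hitting the upstream.
    access_log off;
    return 200 'ok';
}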
When your app starts hitting its resource limits, Kubernetes starts throttling your container, resulting in probe failures. You should always set resource requests and limits to avoid such events:
containers:
- name: app
  image: images.my-company.example/app:v4
  resources:
    requests:
      memory: "64Mi"
      cpu: "250m"
    limits:
      memory: "128Mi"
      cpu: "500m"
Liveness probe failed: Get http://10.28.6.19:80/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Readiness probe failed: Get http://10.28.6.19:80/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
The error message says that your HTTP request wasn't successful.
The readiness probe needs to succeed for the pod to be added as an endpoint for the service exposing it.
kubectl get po -o wide: run this command to get the pod's cluster IP.
kubectl exec -t [another_pod] -- curl -I [pod's cluster IP]: if this command returns a 200 response, the path is configured properly and the readiness probe should pass. If you get anything other than a 200 response, that is why the readiness probe fails and you need to check your image.
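For example, using the pod IP and path from the error messages above (this assumes the other pod has curl installed):
$ kubectl get po -o wide
$ kubectl exec -t [another_pod] -- curl -I http://10.28.6.19/healthz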
You can try out any of these solutions.
1. Increase timeoutSeconds to at least 10 seconds; during peak load nginx may not respond within 2 seconds. Also configure the successThreshold and failureThreshold values.
2. Replace httpGet with an exec probe that calls the same endpoint using curl (see the sketch after this list). This resolved my issue.
3. Isolate the nginx containers in separate node pools to ensure no other deployments create resource pressure on the nginx pods.
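A sketch of option 2, replacing httpGet with an exec probe that curls the same endpoint (it assumes curl is available inside the nginx image):
readinessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    # -f makes curl exit non-zero on HTTP errors, which fails the probe
    - curl -fsS http://localhost/healthz
  initialDelaySeconds: 10
  timeoutSeconds: 10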
I think the title is pretty much self-explanatory. I have done many experiments, and the sad truth is that CoreDNS adds a 20 ms overhead to all requests inside the cluster. At first we thought that by adding more replicas and balancing the resolving requests between more instances we could improve the response time, but it did not help at all (we scaled up from 2 pods to 4 pods).
There was some improvement in the fluctuations of the resolving time after scaling up to 4 instances, but it wasn't what we were expecting, and the 20 ms overhead was still there.
We have some web services whose actual response time is < 30 ms, and with CoreDNS we are doubling the response time, and it is not cool!
After coming to a conclusion about this matter, we did an experiment to double-check that this is not an OS-level overhead. The results were no different from what we expected.
We thought maybe we could implement/deploy a solution based on putting a list of the needed hostname mappings inside /etc/hosts of each pod. So my final questions are as follows:
Has anyone else experienced something similar with coredns?
Can you please suggest alternative solutions to coredns that work in k8s environment?
Any thoughts or insights are appreciated. Thanks in advance.
There are several things to look at when running CoreDNS in your Kubernetes cluster:
Memory
AutoPath
Number of Replicas
Autoscaler
Other Plugins
Prometheus metrics
Separate Server blocks
Memory
The recommended amount of memory for CoreDNS replicas is:
MB required (default settings) = (Pods + Services) / 1000 + 54
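For example (illustrative numbers, not from the cluster above), 3000 pods and 500 services would need roughly (3000 + 500) / 1000 + 54 ≈ 57.5 MB per replica.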
Autopath
Autopath is a feature in CoreDNS that helps improve the response time for external queries by following the search path on the server side.
Normally a DNS query for an external name goes through the cluster search path first:
<query>.<namespace>.svc.cluster.local
<query>.svc.cluster.local
<query>.cluster.local
and only then through the configured forward, usually the host search path (/etc/resolv.conf):
Trying "example.com.default.svc.cluster.local"
Trying "example.com.svc.cluster.local"
Trying "example.com.cluster.local"
Trying "example.com"
Trying "example.com"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55265
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;example.com. IN A
;; ANSWER SECTION:
example.com. 30 IN A 93.184.216.34
This requires more memory, so the calculation becomes:
MB required (w/ autopath) = (Pods + Services) / 250 + 56
Number of replicas
Defaults to 2 but enabling the Autoscaler should help with load issues.
Autoscaler
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: coredns
  namespace: default
spec:
  maxReplicas: 20
  minReplicas: 2
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: coredns
  targetCPUUtilizationPercentage: 50
Node local cache
Beta in Kubernetes 1.15
NodeLocal DNSCache improves Cluster DNS performance by running a dns caching agent on cluster nodes as a DaemonSet. In today’s architecture, Pods in ClusterFirst DNS mode reach out to a kube-dns serviceIP for DNS queries. This is translated to a kube-dns/CoreDNS endpoint via iptables rules added by kube-proxy. With this new architecture, Pods will reach out to the dns caching agent running on the same node, thereby avoiding iptables DNAT rules and connection tracking. The local caching agent will query kube-dns service for cache misses of cluster hostnames(cluster.local suffix by default).
https://kubernetes.io/docs/tasks/administer-cluster/nodelocaldns/
Other Plugins
These will also help see what is going on inside CoreDNS
Error - Any errors encountered during the query processing will be printed to standard output.
Trace - enable OpenTracing of how a request flows through CoreDNS
Log - query logging
health - when CoreDNS is up and running, this endpoint returns a 200 OK HTTP status code
ready - By enabling ready an HTTP endpoint on port 8181 will return 200 OK when all plugins that are able to signal readiness have done so.
Ready and Health should be used in the deployment
livenessProbe:
  httpGet:
    path: /health
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 60
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 5
readinessProbe:
  httpGet:
    path: /ready
    port: 8181
    scheme: HTTP
Prometheus Metrics
Prometheus Plugin
coredns_health_request_duration_seconds{} - duration to process a HTTP query to the local /health endpoint. As this a local operation, it should be fast. A (large) increase in this duration indicates the CoreDNS process is having trouble keeping up with its query load.
https://github.com/coredns/deployment/blob/master/kubernetes/Scaling_CoreDNS.md
Separate Server blocks
One last bit of advice is to separate the cluster DNS server block from the external block:
CLUSTER_DOMAIN REVERSE_CIDRS {
    errors
    health
    kubernetes
    ready
    prometheus :9153
    loop
    reload
    loadbalance
}

. {
    errors
    autopath @kubernetes
    forward . UPSTREAMNAMESERVER
    cache
    loop
}
More information about the kubernetes plugin and other options can be found here:
https://github.com/coredns/coredns/blob/master/plugin/kubernetes/README.md
I would like to ask about the mechanism for stopping pods in Kubernetes.
I read https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods before asking the question.
Suppose we have an application with graceful-shutdown support
(for example, we use a simple HTTP server in Go: https://play.golang.org/p/5tmkPPMiSSt).
The server has two endpoints:
/fast, which always sends a 200 HTTP status code.
/slow, which waits 10 seconds and then sends a 200 HTTP status code.
There is a deployment/service resource with this configuration:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app/name: test
  template:
    metadata:
      labels:
        app/name: test
    spec:
      terminationGracePeriodSeconds: 120
      containers:
      - name: service
        image: host.org/images/grace:v0.1
        livenessProbe:
          httpGet:
            path: /health
            port: 10002
          failureThreshold: 1
          initialDelaySeconds: 1
        readinessProbe:
          httpGet:
            path: /health
            port: 10002
          failureThreshold: 1
          initialDelaySeconds: 1
---
apiVersion: v1
kind: Service
metadata:
  name: test
spec:
  type: NodePort
  ports:
  - name: http
    port: 10002
    targetPort: 10002
  selector:
    app/name: test
To make sure the pods are deleted gracefully, I conducted two test scenarios.
First scenario (slow endpoint) flow:
Create a deployment with replicas equal to 1.
Wait for pod readiness.
Send a request to the /slow endpoint (curl http://ip-of-some-node:nodePort/slow) and delete the pod (simultaneously, about 1 second apart).
Expected:
The pod must not terminate before the HTTP server has completed my request.
Got:
Yes, the HTTP server processes the request in 10 seconds and returns the response.
(If we pass --grace-period=1 to kubectl, then curl writes: curl: (52) Empty reply from server.)
Everything works as expected.
Second scenario (fast endpoint) flow:
Create a deployment with replicas equal to 10.
Wait for pod readiness.
Start wrk with a "Connection: close" header.
Randomly delete one or two pods (kubectl delete pod/xxx).
Expected:
No socket errors.
Got:
$ wrk -d 2m --header "Connection: Close" http://ip-of-some-node:nodePort/fast
Running 2m test @ http://ip-of-some-node:nodePort/fast
Thread Stats Avg Stdev Max +/- Stdev
Latency 122.35ms 177.30ms 1.98s 91.33%
Req/Sec 66.98 33.93 160.00 65.83%
15890 requests in 2.00m, 1.83MB read
Socket errors: connect 0, read 15, write 0, timeout 0
Requests/sec: 132.34
Transfer/sec: 15.64KB
15 socket errors on read; that is, some pods were (possibly) disconnected from the service before all requests were processed.
The problem also appears when a new deployment version is applied, on scale down, and on rollout undo.
Questions:
What's the reason for that behavior?
How can I fix it?
Kubernetes version: v1.16.2
Edit 1.
The number of errors changes each time, but stays in the range of 10-20 when removing 2-5 pods over two minutes.
P.S. If we don't delete a pod, we don't get errors.
Does Kubernetes support blue-green deployments?
Yes, it does. You can read about it on Zero-downtime Deployment in Kubernetes with Jenkins,
A blue/green deployment is a change management strategy for releasing software code. Blue/green deployments, which may also be referred to as A/B deployments require two identical hardware environments that are configured exactly the same way. While one environment is active and serving end users, the other environment remains idle.
Container technology offers a stand-alone environment to run the desired service, which makes it super easy to create identical environments as required in the blue/green deployment. The loosely coupled Services - ReplicaSets, and the label/selector-based service routing in Kubernetes make it easy to switch between different backend environments.
I would also recommend reading Kubernetes Infrastructure Blue/Green deployments.
Here is a repository with examples from codefresh.io about blue green deployment.
This repository holds a bash script that allows you to perform blue/green deployments on a Kubernetes cluster. See also the respective blog post
Prerequisites
As a convention the script expects
The name of your deployment to be $APP_NAME-$VERSION
Your deployment should have a label that shows its version
Your service should point to the deployment by using a version selector, pointing to the corresponding label in the deployment
Notice that the new color deployment created by the script will follow the same conventions. This way each subsequent pipeline you run will work in the same manner.
You can see examples of the tags with the sample application:
service
deployment
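As a rough illustration of those conventions (the names, labels, and image below are assumptions, not taken from the sample application), the service selects a specific version label and each deployment is named $APP_NAME-$VERSION:
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
    version: v1        # switching this selector flips traffic between colors
  ports:
  - port: 80
    targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-v1       # $APP_NAME-$VERSION
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
      version: v1
  template:
    metadata:
      labels:
        app: myapp
        version: v1
    spec:
      containers:
      - name: myapp
        image: example.org/myapp:v1
        ports:
        - containerPort: 8080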
You might also be interested in Canary deployments:
Another deployment strategy is using Canaries (a.k.a. incremental rollouts). With canaries, the new version of the application is gradually deployed to the Kubernetes cluster while getting a very small amount of live traffic (i.e. a subset of live users are connecting to the new version while the rest are still using the previous version).
...
The small subset of live traffic to the new version acts as an early warning for potential problems that might be present in the new code. As our confidence increases, more canaries are created and more users are now connecting to the updated version. In the end, all live traffic goes to canaries, and thus the canary version becomes the new “production version”.
EDIT
Questions:
What's the reason for that behavior?
When a new deployment is applied, old pods are removed and new ones are scheduled.
This is done by the Control Plane.
For example, when you use the Kubernetes API to create a Deployment, you provide a new desired state for the system. The Kubernetes Control Plane records that object creation, and carries out your instructions by starting the required applications and scheduling them to cluster nodes–thus making the cluster’s actual state match the desired state.
You have only set up a readinessProbe, which tells your service whether or not it should send traffic to the pod. This is not a good solution because, as you can see in your example, if you have 10 pods and remove one or two, there is a gap and you receive socket errors.
How to fix it?
You have to understand that this is not broken, so it doesn't need a fix.
This might be mitigated by implementing a check in your application to make sure it's sending requests to a working address, or by using other features such as load balancing through an ingress.
Also, when you are updating a deployment, you can check whether a pod has any incoming/outgoing traffic before deleting it, and roll the update only to pods that are not in use.
I am creating a Helm chart that should install 2 services.
It has a dependency that the postgresql service must be installed first.
Then the other service should use the database user, password, hostname, and port of that postgresql service.
I need to get these details at runtime, i.e. as soon as the postgresql service is installed; the user details I will of course pass as env variables, and the hostname and port are to be used once postgresql is deployed.
I tried using some template functions and subchart concepts that I found on different sites, but nothing solves the requirement.
Are there any examples I can follow that match the above requirement?
There are a couple of ways you could do this, for example using an initContainer to check whether the DB is up (a sketch of that approach appears at the end of this answer), but I will show you a sample from an existing chart. I am using the WordPress chart as an example:
livenessProbe:
  httpGet:
    path: /wp-login.php
    port: http
  initialDelaySeconds: 120
  timeoutSeconds: 5
  failureThreshold: 6
readinessProbe:
  httpGet:
    path: /wp-login.php
    port: http
  initialDelaySeconds: 30
  timeoutSeconds: 3
  periodSeconds: 5
I have removed some lines for brevity.
The readiness probe will start acting after an initialDelaySeconds of 30 seconds and will check every periodSeconds, i.e. every 5 seconds, whether the page responds. Unless the readiness probe succeeds, no traffic will be sent to this pod. If the probe succeeds, then we are good.
The second check, the liveness probe, does something more. It starts 120 seconds after the pod is deployed, and if the check fails failureThreshold times, i.e. 6 times, the pod will be restarted.
Coming back to your question and how to solve this:
Use liveness and readiness probes in the applications that depend on the database
Use some defaults based on your experience and optimize them as you go.
More information about the readiness and liveness probes can be found here
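A sketch of the initContainer approach mentioned at the start of this answer, blocking the dependent pod until postgresql accepts connections (the service name, port, and image are assumptions to adapt to your chart's values):
spec:
  initContainers:
  - name: wait-for-postgresql
    image: busybox:1.36
    command:
    - sh
    - -c
    # Loop until the postgresql service answers on its port
    - until nc -z my-release-postgresql 5432; do echo waiting for postgresql; sleep 2; done
  containers:
  - name: app
    image: example.org/app:v1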