Does Kubernetes support green-blue deployment? - sockets

I would like to ask about the mechanism for stopping pods in Kubernetes.
I read https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods before asking this question.
Suppose we have an application with graceful shutdown support
(for example, a simple HTTP server in Go: https://play.golang.org/p/5tmkPPMiSSt).
The server has two endpoints:
/fast: always sends HTTP status 200.
/slow: waits 10 seconds, then sends HTTP status 200.
The Deployment and Service resources have the following configuration:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app/name: test
  template:
    metadata:
      labels:
        app/name: test
    spec:
      terminationGracePeriodSeconds: 120
      containers:
        - name: service
          image: host.org/images/grace:v0.1
          livenessProbe:
            httpGet:
              path: /health
              port: 10002
            failureThreshold: 1
            initialDelaySeconds: 1
          readinessProbe:
            httpGet:
              path: /health
              port: 10002
            failureThreshold: 1
            initialDelaySeconds: 1
---
apiVersion: v1
kind: Service
metadata:
  name: test
spec:
  type: NodePort
  ports:
    - name: http
      port: 10002
      targetPort: 10002
  selector:
    app/name: test
To make sure the pods are deleted gracefully, I ran two tests.
First test (slow endpoint):
Create the deployment with replicas set to 1.
Wait for the pod to become ready.
Send a request to the /slow endpoint (curl http://ip-of-some-node:nodePort/slow) and delete the pod at nearly the same time (about 1 second apart).
Expected:
The pod must not terminate before the HTTP server has completed my request.
Got:
Yes, the HTTP server processed the request in 10 seconds and returned the response.
(If we pass the --grace-period=1 option to kubectl, curl instead reports: curl: (52) Empty reply from server)
Everything works as expected.
Second test (fast endpoint):
Create the deployment with replicas set to 10.
Wait for the pods to become ready.
Start wrk with the "Connection: close" header.
Randomly delete one or two pods (kubectl delete pod/xxx).
Expected:
No socket errors.
Got:
$ wrk -d 2m --header "Connection: Close" http://ip-of-some-node:nodePort/fast
Running 2m test @ http://ip-of-some-node:nodePort/fast
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   122.35ms  177.30ms   1.98s    91.33%
    Req/Sec    66.98     33.93    160.00    65.83%
  15890 requests in 2.00m, 1.83MB read
  Socket errors: connect 0, read 15, write 0, timeout 0
Requests/sec:    132.34
Transfer/sec:     15.64KB
There were 15 socket errors on read; that is, some pods were (maybe) disconnected from the service before all requests were processed.
The same problem appears when a new deployment version is applied, on scale-down, and on rollout undo.
Questions:
What is the reason for this behavior?
How do I fix it?
Kubernetes version: v1.16.2
Edit 1.
The number of errors varies from run to run, but stays in the range of 10-20 when removing 2-5 pods over two minutes.
P.S. If we do not delete any pods, we get no errors.

Does Kubernetes support green-blue deployment?
Yes, it does. You can read about it in Zero-downtime Deployment in Kubernetes with Jenkins:
A blue/green deployment is a change management strategy for releasing software code. Blue/green deployments, which may also be referred to as A/B deployments, require two identical hardware environments that are configured exactly the same way. While one environment is active and serving end users, the other environment remains idle.
Container technology offers a stand-alone environment to run the desired service, which makes it super easy to create identical environments as required in the blue/green deployment. The loosely coupled Services - ReplicaSets, and the label/selector-based service routing in Kubernetes make it easy to switch between different backend environments.
I would also recommend reading Kubernetes Infrastructure Blue/Green deployments.
Here is a repository with examples from codefresh.io about blue/green deployment:
This repository holds a bash script that allows you to perform blue/green deployments on a Kubernetes cluster. See also the respective blog post.
Prerequisites
As a convention the script expects
The name of your deployment to be $APP_NAME-$VERSION
Your deployment should have a label that shows its version
Your service should point to the deployment by using a version selector, pointing to the corresponding label in the deployment
Notice that the new color deployment created by the script will follow the same conventions. This way each subsequent pipeline you run will work in the same manner.
You can see examples of the tags with the sample application:
service
deployment
You might also be interested in canary deployments:
Another deployment strategy is using Canaries (a.k.a. incremental rollouts). With canaries, the new version of the application is gradually deployed to the Kubernetes cluster while getting a very small amount of live traffic (i.e. a subset of live users are connecting to the new version while the rest are still using the previous version).
...
The small subset of live traffic to the new version acts as an early warning for potential problems that might be present in the new code. As our confidence increases, more canaries are created and more users are now connecting to the updated version. In the end, all live traffic goes to canaries, and thus the canary version becomes the new “production version”.
EDIT
Questions:
What is the reason for this behavior?
When a new deployment is applied, old pods are removed and new ones are scheduled.
This is done by the Control Plane:
For example, when you use the Kubernetes API to create a Deployment, you provide a new desired state for the system. The Kubernetes Control Plane records that object creation, and carries out your instructions by starting the required applications and scheduling them to cluster nodes–thus making the cluster’s actual state match the desired state.
You rely on the readinessProbe, which tells your Service whether it should send traffic to the pod. This alone is not a good solution: as your example shows, with 10 pods, removing one or two leaves a gap and you receive socket errors.
How do I fix it?
You have to understand that this is not broken, so it doesn't need a fix.
It can be mitigated by implementing a check in your application to make sure it sends requests to a working address, or by using other load-balancing features such as an Ingress.
Also, when updating a deployment, you can check before deleting a pod whether it has any incoming or outgoing traffic, and roll the update only to pods that are not in use.

Related

How to scale up all OpenShift pods before scaling down old ones

I have a basic OpenShift deployment configuration:
kind: DeploymentConfig
spec:
  replicas: 3
  strategy:
    type: Rolling
Additionally, I've set:
maxSurge: 3
maxUnavailable: 0%
because I want to scale up all new pods first and only then scale down the old pods (so there will be 6 pods running during the deployment; that's why I decided to set maxSurge).
I want to keep all old pods running until all new pods are up, but with this set of parameters something is wrong. During deployment:
all 3 new pods are initialized at once and try to start, while the old pods keep running (as expected)
if the first new pod starts successfully, one old pod is terminated
if the second new pod becomes ready, another old pod is terminated
I want to terminate the old pods ONLY when all new pods are ready to handle requests; otherwise, all the old pods should handle requests.
What did I miss in this configuration?
The behavior you describe is expected for a deployment rollout (OpenShift will shut down each old pod as a new pod becomes ready). It will also start routing traffic to the new pods as they become available, which you say you don't want either.
A service is pretty much by definition going to route to pods as they are available. And a deployment pretty much handles pods independently, so I don't believe that anything will really give you the behavior you are looking for there either.
If you want a blue/green style deployment like you describe, you essentially have to deploy the new pods as a separate deployment. Once the new deployment is completely up, you can change the corresponding service to point at the new pods, and then shut down the old deployment.
Service Mesh can help with some of that. So could an operator. Or you could do it manually.
You can combine the rollout strategy with readiness checks with an initial delay to ensure that all the new pods have time to start up before the old ones are all shut down at the same time.
In the case below, the 3 new pods will be spun up (for a total of 6 pods), and after 60 seconds the readiness check will occur and the old pods will be shut down. You just need to set the readiness delay to a timeframe large enough to give all of your new pods time to start up.
apiVersion: v1
kind: DeploymentConfig
spec:
  replicas: 3
  strategy:
    rollingParams:
      maxSurge: 3
      maxUnavailable: 0
    type: Rolling
  template:
    spec:
      containers:
        - readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8099
            initialDelaySeconds: 60

Is it possible to specify a delay for pod restart when Kubernetes liveness probe fails?

Got a simple REST API server built with python gunicorn, which runs multiple threads to accept requests. After running for some time, some of these threads crash. Got a script to detect the number of dead threads (using log files). Once this number crosses some threshold, we want to restart gunicorn. This script is configured to be used as liveness probe.
The script works fine and restarts the pod as expected. But there are a few live threads that are still processing requests. Also, gunicorn keeps a backlog queue of accepted requests that it cannot process yet, since other requests are processing. Is there a way to specify a delay for the pod restart so the other running threads and the backlog requests have some time to finish processing?
You can use a preStop hook. Official docs here.
How to use it is documented here.
You can also use terminationGracePeriodSeconds to allow graceful termination of the pod.
Best practices here.
You can configure graceful pod termination with terminationGracePeriodSeconds
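apiVersion: apps/v1  # extensions/v1beta1 Deployments were removed in Kubernetes 1.16
kind: Deployment
metadata:
  name: test
spec:
  replicas: 1
  selector:          # required by apps/v1; the label name is illustrative
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
        - name: test
          image: ...
      terminationGracePeriodSeconds: 60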

How to avoid coredns resolving overhead in kubernetes

I think the title is pretty much self-explanatory. I have done many experiments, and the sad truth is that CoreDNS adds a 20 ms overhead to all requests inside the cluster. At first we thought that by adding more replicas and balancing the resolution requests across more instances we could improve the response time, but it did not help at all (we scaled up from 2 pods to 4 pods).
After scaling up to 4 instances, the resolution time fluctuated somewhat less, but that wasn't what we were expecting, and the 20 ms overhead was still there.
We have some web services whose actual response time is < 30 ms, so CoreDNS doubles their response time, and that is not cool!
After reaching this conclusion, we ran an experiment to double-check that this is not an OS-level overhead, and the results were no different from what we were expecting.
We thought we might implement/deploy a solution that puts the list of needed hostname mappings for each pod inside that pod's /etc/hosts. So my final questions are as follows:
Has anyone else experienced something similar with coredns?
Can you please suggest alternative solutions to coredns that work in k8s environment?
Any thoughts or insights are appreciated. Thanks in advance.
There are several things to look at when running CoreDNS in your Kubernetes cluster:
Memory
AutoPath
Number of Replicas
Autoscaler
Other Plugins
Prometheus metrics
Separate Server blocks
Memory
The recommended amount of memory for CoreDNS replicas is:
MB required (default settings) = (Pods + Services) / 1000 + 54
Autopath
Autopath is a CoreDNS feature that helps reduce the response time for external queries.
Normally a DNS query goes through:
<name>.<namespace>.svc.cluster.local
<name>.svc.cluster.local
<name>.cluster.local
Then the configured forward, usually the host search path (/etc/resolv.conf).
Trying "example.com.default.svc.cluster.local"
Trying "example.com.svc.cluster.local"
Trying "example.com.cluster.local"
Trying "example.com"
Trying "example.com"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55265
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;example.com. IN A
;; ANSWER SECTION:
example.com. 30 IN A 93.184.216.34
Autopath requires more memory, so the calculation becomes:
MB required (w/ autopath) = (Number of Pods + Services) / 250 + 56
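To put the two memory formulas side by side, here is a small Go sketch for a hypothetical cluster with 5,000 pods and 1,000 services (illustrative numbers, not from the original):

```go
package main

import "fmt"

// memDefaultMB: MB required (default settings) = (Pods + Services) / 1000 + 54
func memDefaultMB(pods, services int) float64 {
	return float64(pods+services)/1000 + 54
}

// memAutopathMB: MB required (w/ autopath) = (Pods + Services) / 250 + 56
func memAutopathMB(pods, services int) float64 {
	return float64(pods+services)/250 + 56
}

func main() {
	pods, services := 5000, 1000
	fmt.Printf("default:  %.0f MB\n", memDefaultMB(pods, services))
	fmt.Printf("autopath: %.0f MB\n", memAutopathMB(pods, services))
}
```

For that cluster it prints 60 MB with default settings and 80 MB with autopath.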
Number of replicas
The default is 2 replicas, but enabling the autoscaler should help with load issues.
Autoscaler
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: coredns
  namespace: kube-system  # must match the coredns Deployment's namespace (kube-system in most clusters)
spec:
  maxReplicas: 20
  minReplicas: 2
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: coredns
  targetCPUUtilizationPercentage: 50
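The HorizontalPodAutoscaler documentation gives the core scaling rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). A simplified Go sketch of it for the CPU target above; it ignores the controller's default 10% tolerance, stabilization windows and the special handling of unready pods, and the function name is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies ceil(current * currentUtil / targetUtil) and clamps
// the result to [minReplicas, maxReplicas]. Utilization is a percentage of
// the pods' CPU requests, as with targetCPUUtilizationPercentage.
func desiredReplicas(current int, currentUtil, targetUtil float64, minReplicas, maxReplicas int) int {
	desired := int(math.Ceil(float64(current) * currentUtil / targetUtil))
	if desired < minReplicas {
		return minReplicas
	}
	if desired > maxReplicas {
		return maxReplicas
	}
	return desired
}

func main() {
	// 2 CoreDNS pods averaging 90% CPU against the 50% target.
	fmt.Println(desiredReplicas(2, 90, 50, 2, 20))
}
```

With two replicas averaging 90% CPU against the 50% target, the rule asks for 4 replicas; the result always stays between minReplicas and maxReplicas.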
Node local cache
Beta in Kubernetes 1.15
NodeLocal DNSCache improves Cluster DNS performance by running a dns caching agent on cluster nodes as a DaemonSet. In today’s architecture, Pods in ClusterFirst DNS mode reach out to a kube-dns serviceIP for DNS queries. This is translated to a kube-dns/CoreDNS endpoint via iptables rules added by kube-proxy. With this new architecture, Pods will reach out to the dns caching agent running on the same node, thereby avoiding iptables DNAT rules and connection tracking. The local caching agent will query kube-dns service for cache misses of cluster hostnames(cluster.local suffix by default).
https://kubernetes.io/docs/tasks/administer-cluster/nodelocaldns/
Other Plugins
These will also help you see what is going on inside CoreDNS:
errors - any errors encountered during query processing are printed to standard output.
trace - enables OpenTracing of how a request flows through CoreDNS.
log - query logging.
health - returns a 200 OK HTTP status code when CoreDNS is up and running.
ready - when enabled, an HTTP endpoint on port 8181 returns 200 OK once all plugins that are able to signal readiness have done so.
The ready and health endpoints should be used in the Deployment's probes:
livenessProbe:
  httpGet:
    path: /health
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 60
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 5
readinessProbe:
  httpGet:
    path: /ready
    port: 8181
    scheme: HTTP
Prometheus Metrics
Prometheus Plugin
coredns_health_request_duration_seconds{} - duration to process an HTTP query to the local /health endpoint. As this is a local operation, it should be fast. A (large) increase in this duration indicates that the CoreDNS process is having trouble keeping up with its query load.
https://github.com/coredns/deployment/blob/master/kubernetes/Scaling_CoreDNS.md
Separate Server blocks
One last bit of advice is to separate the cluster DNS server block from the external block:
CLUSTER_DOMAIN REVERSE_CIDRS {
    errors
    health
    kubernetes
    ready
    prometheus :9153
    loop
    reload
    loadbalance
}
. {
    errors
    autopath @kubernetes
    forward . UPSTREAMNAMESERVER
    cache
    loop
}
More information about the kubernetes plugin and other options here:
https://github.com/coredns/coredns/blob/master/plugin/kubernetes/README.md

kubernetes Blue green deployment

In a Kubernetes blue/green deployment, I am patching the Kubernetes application Service to redirect traffic from app-v1 to app-v2 (behind the load balancer). If a connection is ongoing during that patching, will it be disconnected? And if not, how can I test this?
What, in your view, is the best approach for deploying a new version with a warm handover (without any connection loss) from app-v1 to app-v2?
The question seems to be about supporting two versions at the same time. That is a kind of canary deployment, in which production traffic shifts gradually from app-v1 to app-v2.
This could be achieved with:
Give the deployments an HPA with a custom metric based on the number of connections, so that they scale up or down when a certain number of connections is reached.
Allow two deployments at the same time, app-v1 and app-v2.
Route new traffic to the new deployment via an Ingress annotation, while still keeping access to the old version, so that no existing connection is dropped.
Now all new requests are routed to the new version, and the HPA eventually scales down the pods of the old version. (You can even allow a deployment to have zero replicas.)
In addition, regarding your question about blue-green deployments:
A blue-green deployment is about having two identical environments, only one of which is active at a time; let's say blue is currently active in production. Once you have a new version ready for deployment, say green, it is deployed and tested separately. Finally, when you are happy with the test results on the green environment, you switch the traffic to it, so green becomes active while blue becomes idle or is terminated later.
(Adapted from Martin Fowler's article.)
In Kubernetes, this can be achieved with having two identical deployments. Here is a good reference.
Basically, you can have two identical deployments. Assume your current deployment, my-deployment-blue, is in production. Once you are ready with the new version, you can deploy it as a completely new deployment, let's say my-deployment-green, and use a separate test service to test the green environment. Finally, switch the traffic to my-deployment-green when all tests pass.
If you are trying to achieve Blue/Green in Kubernetes then my answer might help you.
Do a rolling update with the following configuration:
maxUnavailable = 0
maxSurge = 100%
How?
The deployment controller first scales the latest version up to 100% of the obsolete version. Once the latest version is healthy, it immediately scales the obsolete version down to 0%.
Example Code:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.14.2
          ports:
            - containerPort: 80
  strategy:
    rollingUpdate:
      maxSurge: 100%
      maxUnavailable: 0
    type: RollingUpdate
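The maxSurge and maxUnavailable values resolve to absolute pod counts. A small Go sketch of that arithmetic, assuming the documented rounding (a percentage maxSurge is rounded up, a percentage maxUnavailable is rounded down); the function names are hypothetical and this is not the deployment controller's own code.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// resolve turns an int-or-percent value ("3" or "25%") into a pod count,
// taking percentages of replicas and rounding up or down as requested.
func resolve(value string, replicas int, roundUp bool) (int, error) {
	if !strings.HasSuffix(value, "%") {
		return strconv.Atoi(value)
	}
	pct, err := strconv.Atoi(strings.TrimSuffix(value, "%"))
	if err != nil {
		return 0, err
	}
	exact := float64(pct) * float64(replicas) / 100
	if roundUp {
		return int(math.Ceil(exact)), nil
	}
	return int(math.Floor(exact)), nil
}

// rolloutBounds returns the maximum number of pods that may exist and the
// minimum number that must stay available during a RollingUpdate.
func rolloutBounds(replicas int, maxSurge, maxUnavailable string) (maxPods, minAvailable int, err error) {
	surge, err := resolve(maxSurge, replicas, true) // maxSurge rounds up
	if err != nil {
		return 0, 0, err
	}
	unavail, err := resolve(maxUnavailable, replicas, false) // maxUnavailable rounds down
	if err != nil {
		return 0, 0, err
	}
	return replicas + surge, replicas - unavail, nil
}

func main() {
	maxPods, minAvail, err := rolloutBounds(3, "100%", "0")
	if err != nil {
		panic(err)
	}
	fmt.Printf("up to %d pods, at least %d available\n", maxPods, minAvail)
}
```

For the Deployment above it prints "up to 6 pods, at least 3 available"; the first question's manifest (replicas: 1, maxSurge: 1, maxUnavailable: 0) resolves to at most 2 pods with at least 1 available.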

Gitlab Autodevops How to always keep one pod alive

I'm using Gitlab Autodevops to deploy app on my kubernetes cluster. That app should always have only one instance running.
The problem is that during the update process, Helm kills the currently running pod before the new pod is ready. This causes a period of downtime when the old version has already been killed and the new one isn't ready yet. To make it worse, the app needs significant time to start (2+ minutes).
I have tried to set minAvailable: 1 in a PodDisruptionBudget, but it didn't help.
Any idea how I can tell Helm to wait for the updated pod to be ready before killing the old one? (Having 2 instances running simultaneously for several seconds is not a problem for me.)
You can release a new application version in a few ways; you need to choose the one that fits your needs.
I would recommend one of the following:
Ramped - slow rollout
A ramped deployment updates pods in a rolling update fashion, a secondary ReplicaSet is created with the new version of the application, then the number of replicas of the old version is decreased and the new version is increased until the correct number of replicas is reached.
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2        # how many pods we can add at a time
      maxUnavailable: 0  # maxUnavailable defines how many pods can be unavailable
                         # during the rolling update
Full example and steps can be found here.
Blue/Green - best to avoid API versioning issues
A blue/green deployment differs from a ramped deployment because the “green” version of the application is deployed alongside the “blue” version. After testing that the new version meets the requirements, we update the Kubernetes Service object that plays the role of load balancer to send traffic to the new version by replacing the version label in the selector field.
apiVersion: v1
kind: Service
metadata:
  name: my-app
  labels:
    app: my-app
spec:
  type: NodePort
  ports:
    - name: http
      port: 8080
      targetPort: 8080
  # Note here that we match both the app and the version.
  # When switching traffic, we update the label “version” with
  # the appropriate value, ie: v2.0.0
  selector:
    app: my-app
    version: v1.0.0
Full example and steps can be found here.
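Switching traffic as described above amounts to changing one label in the Service selector. As an illustration only, the Go sketch below builds that change as a JSON merge patch; selectorPatch is a hypothetical helper, and applying the result with kubectl patch service my-app --type merge -p '<patch>' is one option among several.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// selectorPatch builds a JSON merge patch that points the Service selector
// at the pods labelled with the given app and version.
func selectorPatch(app, version string) (string, error) {
	patch := map[string]any{
		"spec": map[string]any{
			"selector": map[string]string{
				"app":     app,
				"version": version,
			},
		},
	}
	b, err := json.Marshal(patch) // map keys are emitted in sorted order
	return string(b), err
}

func main() {
	p, err := selectorPatch("my-app", "v2.0.0")
	if err != nil {
		panic(err)
	}
	fmt.Println(p)
}
```

It prints {"spec":{"selector":{"app":"my-app","version":"v2.0.0"}}}. Rolling back is the same patch with the previous version value.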
Canary - for testing
A canary deployment consists of routing a subset of users to a new functionality. In Kubernetes, a canary deployment can be done using two Deployments with common pod labels. One replica of the new version is released alongside the old version. Then after some time and if no error is detected, scale up the number of replicas of the new version and delete the old deployment.
Using this ReplicaSet technique requires spinning up as many pods as necessary to get the right percentage of traffic. For example, if you want to send 1% of traffic to version B, you need one pod running version B and 99 pods running version A. This can be pretty inconvenient to manage, so if you are looking for better-managed traffic distribution, look at load balancers such as HAProxy or service meshes like Linkerd, which provide greater control over traffic.
Manifest for version A:
spec:
  replicas: 3
Manifest for version B:
spec:
  replicas: 1
Full example and steps can be found here.
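The replica arithmetic behind the 1%/99-pod example can be sketched as follows. It assumes the Service spreads requests evenly across all ready pods of both versions, which is only approximately true (kube-proxy picks an endpoint per connection, so long-lived connections skew the split); the helper names are hypothetical.

```go
package main

import "fmt"

// canaryShare returns the expected fraction of requests that reach version B
// when traffic is spread evenly across all ready pods of A and B.
func canaryShare(replicasA, replicasB int) float64 {
	total := replicasA + replicasB
	if total == 0 {
		return 0
	}
	return float64(replicasB) / float64(total)
}

// replicasAFor returns how many version A pods are needed next to replicasB
// pods of version B so that B gets percentB percent of the traffic.
// percentB must be in 1..100; integer division rounds inexact ratios down.
func replicasAFor(percentB, replicasB int) int {
	return replicasB * (100 - percentB) / percentB
}

func main() {
	fmt.Println(canaryShare(3, 1))  // manifests above: 3 A pods, 1 B pod
	fmt.Println(replicasAFor(1, 1)) // A pods needed for a 1% canary
}
```

With the manifests above (3 pods of A, 1 pod of B), roughly a quarter of the traffic reaches version B; a 1% canary with a single B pod needs 99 pods of A.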
You can also play with Interactive Tutorial - Updating Your App on Kubernetes.
I recommend reading Deploy, Scale And Upgrade An Application On Kubernetes With Helm.