Traffic still distributed to disabled pods in a one-service, multiple-DC scenario - OpenShift - Kubernetes

In our environment we usually use a canary mechanism for our microservices: one service points to the pods of two different deployment configs (DCs). We found that after we remove one of the DCs from the service, we still see traffic on the removed DC's pods. This persists indefinitely; the traffic is not redistributed to the enabled DC even after we wait a couple of hours.
The traffic is distributed to the enabled DC's pods only after we restart the pods that call the service. We use OpenShift 3.6 in our environment.
I have also attached the flow of the service for this case.
I badly need help with this issue.

Related

How to avoid downtime during scheduled maintenance window

I'm experiencing downtime whenever the GKE cluster gets upgraded during the maintenance window. My services (APIs) become unreachable for about 5 minutes.
The cluster Location type is set to "Zonal", and all my pods have 2 replicas. The only affected pods seem to be the ones using nginx ingress controller.
Is there anything I can do to prevent this? I read that using Regional clusters should prevent downtimes in the control plane, but I'm not sure if it's related to my case. Any hints would be appreciated!
You mention "downtime", but is this downtime for you when using the control plane (i.e. kubectl stops working), or is it downtime in the sense that the end users of the services stop seeing them work?
A GKE upgrade upgrades two parts of the cluster: the control plane or master nodes, and the worker nodes. These are two separate upgrades although they can happen at the same time depending on your configuration of the cluster.
Regional clusters can help with that. They cost more because you run more nodes, but the upside is that the cluster is more resilient.
Going back to the earlier point about control plane vs. node upgrades: the control plane upgrade does NOT affect the end-user/customer perspective. The services will remain running.
The node upgrade WILL affect the customer so you should consider various techniques to ensure high availability and resiliency on your services.
A common technique is to increase replicas and also to add pod anti-affinity. This ensures the pods are scheduled on different nodes, so a node upgrade cannot take out the entire service just because the cluster scheduled all the replicas on the same node.
You mention the nginx ingress controller in your question. If you are using Helm to install it into your cluster, then out of the box it is not set up to use anti-affinity, so it is liable to be taken out of service if all of its replicas get scheduled onto the same node and that node then gets marked for upgrade or similar.
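As a rough illustration of the anti-affinity idea above, here is a minimal Python sketch that builds the `affinity` stanza a pod template would carry (under `spec.template.spec` in a Deployment) and prints it as JSON, which kubectl accepts alongside YAML. The label `app: nginx-ingress` is an assumption for illustration; use whatever labels your ingress controller pods actually have, and check how your Helm chart lets you pass an affinity value.

```python
import json


def anti_affinity(label_key: str, label_value: str) -> dict:
    """Build an affinity stanza that forbids two matching pods on one node."""
    return {
        "podAntiAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    # Pods carrying this label repel each other.
                    "labelSelector": {"matchLabels": {label_key: label_value}},
                    # "Same place" means same node (same hostname label value).
                    "topologyKey": "kubernetes.io/hostname",
                }
            ]
        }
    }


# Hypothetical label, chosen only for this example.
affinity = anti_affinity("app", "nginx-ingress")
print(json.dumps({"affinity": affinity}, indent=2))
```

Note that the "required" form is a hard rule: if there are more replicas than nodes, the extra replicas stay unscheduled. The softer `preferredDuringSchedulingIgnoredDuringExecution` form exists for cases where that is not acceptable.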

Forward Traffic to POD in Kubernetes Cluster

I installed and configured a 3-node K8s cluster. The worker nodes are Windows nodes. We have a .NET application that we want to containerize. The application internally uses Apache Ignite for its distributed cache.
We built a Docker image for this application, wrote a deployment file, and deployed it to the K8s cluster. The deployment also creates a service of type “LoadBalancer”. Using this service, we connect to the application from the outside world. All is good so far.
Now to the issue: because we use Apache Ignite for the distributed cache, one of the pods will be the master. We want to always forward traffic to the pod that is acting as the master node in the Apache Ignite cluster. Identification of the Apache Ignite master node must be dynamic.
I went through the link below, but there the pod configuration is static. We want to identify the master pod dynamically and forward traffic to it. What do we have to do on the service side?
https://appscode.com/products/voyager/7.4.0/guides/ingress/http/statefulset-pod/
Any help on how to forward the traffic to the POD is greatly appreciated.
Given that you have a leader/follower topology, the request to direct traffic to a specific node (the master node) is flawed for a couple of reasons:
What happens when the current leader fails and an election selects a new leader?
Because pods are ephemeral, they should not play major roles in production; work with deployments and their replicas instead. What you are trying to achieve is an anti-pattern.
In any case, if this is what you want, you may want to read about gateways in Istio.

Kubernetes - Load balancing Web App access per connections

It's been a long time since I was last here, and I hope you're all fine :)
For now, I have the pleasure of working with Kubernetes! So let's start! :)
[THE EXISTING]
I have an operational Kubernetes cluster that I work with every day. It consists of several applications, one of which is of particular interest to us: the web management interface.
I currently have one master and four nodes in my cluster.
For my web application, each pod contains 3 containers (web / mongo / filebeat), and for technical reasons we decided to allow at most 5 users per web pod.
[WHAT I WANT]
I want to deploy a web pod on each node (web0, web1, web2, web3), which I can already do, and have each session (1 session = 1 user) distributed as follows:
For now, all HTTP requests are processed by web0.
[QUESTIONS]
Am I forced to go through an external loadbalancer (haproxy)?
Can I use an internal loadbalancer, configuring a service?
Does anyone have experience on the implementation described above?
I thank in advance those who can help me in this process :)
This generally depends on how and where you've deployed your Kubernetes infrastructure, but you can do this natively with a few options.
Firstly, you'll need to scale your web deployment. This is very simple to do:
kubectl scale --current-replicas=2 --replicas=3 deployment/web
If you're deployed into a cloud provider (such as AWS using kops, or GKE) you can use a service. Just specify the type as LoadBalancer. Services will spread the sessions for your users.
Another option is to use an Ingress. In order to do this, you'll need to use an Ingress Controller, such as the nginx-ingress-controller which is the most featureful and widely deployed.
Both of these options will automatically load-balance your incoming application sessions, but they may not necessarily do it in the order you've described in your image; it will be random across the available web pods.
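To make the Service option a bit more concrete, here is a minimal Python sketch that builds a Service manifest of type LoadBalancer and prints it as JSON, which kubectl accepts alongside YAML. The name `web`, the `app: web` label and port 80 are assumptions for illustration. The `sessionAffinity: ClientIP` setting is an optional addition, not something the answer above prescribes: it asks Kubernetes to keep requests from one client IP on the same pod.

```python
import json


def web_service(name: str, app_label: str, port: int = 80) -> dict:
    """Build a Service manifest that load-balances across pods labelled app=<app_label>."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "LoadBalancer",
            # Every ready pod matching this selector becomes a backend.
            "selector": {"app": app_label},
            # Optional: pin each client IP to one backend pod.
            "sessionAffinity": "ClientIP",
            "ports": [{"port": port, "targetPort": port}],
        },
    }


# Hypothetical names, chosen only for this example.
service = web_service("web", "web")
print(json.dumps(service, indent=2))
```

This only keeps a given client on one pod; it does not enforce the limit of 5 users per pod from the question, which would need logic outside a plain Service.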

Using spinnaker's Red/Black deployments strategy and still having both versions serving traffic

I'm currently setting up a POC Spinnaker pipeline to deploy to a Kubernetes cluster.
Experimenting with Spinnaker's red/black strategy, I've noticed that it does not behave as I expect. I expect it to guarantee that only one version gets traffic, with the following steps:
1. Deploy the black server group (a Kubernetes ReplicaSet) and ensure it's healthy.
2. Reroute the service's traffic to the black server group by updating the load balancer's targets.
3. Disable the red server group.
But in reality, at least when using it with Kubernetes, step 2 seems to map to several steps:
- Add black targets to the load balancer.
- Remove red targets from the load balancer.
Therefore, I get two versions serving traffic for about a minute.
To my understanding, blue/green can be achieved in Kubernetes by updating the pod selector of the service (load balancer), so I'm confused as to why Spinnaker's Kubernetes driver does not seem to leverage this.
Can anybody help me see what I'm missing here?
Thanks
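The selector-based switch described in the question can be modelled in a few lines of Python. This is only a sketch of how a Service picks its backends by label matching, not Spinnaker's or Kubernetes' actual code; the pod names and the `app`/`version` labels are made up for the example.

```python
from typing import Dict, List

Labels = Dict[str, str]


def endpoints(selector: Labels, pods: Dict[str, Labels]) -> List[str]:
    """Return the names of pods whose labels contain every selector pair."""
    return sorted(
        name
        for name, labels in pods.items()
        if all(labels.get(key) == value for key, value in selector.items())
    )


# Hypothetical pods from two server groups behind one service.
pods = {
    "red-1": {"app": "api", "version": "red"},
    "red-2": {"app": "api", "version": "red"},
    "black-1": {"app": "api", "version": "black"},
    "black-2": {"app": "api", "version": "black"},
}

# Selector pinned to one version: only that group receives traffic.
before = endpoints({"app": "api", "version": "red"}, pods)
# A single selector update moves the service to the other group.
after = endpoints({"app": "api", "version": "black"}, pods)
# A selector without the version label matches both groups at once.
both = endpoints({"app": "api"}, pods)

print(before)
print(after)
print(both)
```

With a version label in the selector, the backend set changes in one update, whereas a selector that matches both groups serves both versions. The sketch ignores connections that are already established when the selector changes.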
Can you verify whether the deployment is still rolling out? It may be that your Spinnaker setup just spins up a new version of the current deployment. If so, your deployment will do a rolling upgrade with the max surge you provided (or the default one), which is why you have two versions running at the same time.
If I'm not mistaken, most people who do blue/green deploys have two separate networks (for example with flannel) and just spin up a new deployment that gets switched over, either gradually or instantly, via their ingress controllers.

What happens when the Kubernetes master fails?

I've been trying to figure out what happens when the Kubernetes master fails in a cluster that only has one master. Do web requests still get routed to pods if this happens, or does the entire system just shut down?
According to the documentation for OpenShift 3, which is built on top of Kubernetes (https://docs.openshift.com/enterprise/3.2/architecture/infrastructure_components/kubernetes_infrastructure.html), if a master fails, nodes continue to function properly, but the system loses its ability to manage pods. Is this the same for vanilla Kubernetes?
In typical setups, the master nodes run both the API and etcd and are either largely or fully responsible for managing the underlying cloud infrastructure. When they are offline or degraded, the API will be offline or degraded.
In the event that they, etcd, or the API are fully offline, the cluster ceases to be a cluster and is instead a bunch of ad-hoc nodes for this period. The cluster will not be able to respond to node failures, create new resources, move pods to new nodes, etc., until both:
Enough etcd instances are back online to form a quorum and make progress (for a visual explanation of how this works and what these terms mean, see this page).
At least one API server can service requests.
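The quorum condition above comes down to simple arithmetic: a majority of etcd members must be reachable. A minimal Python sketch (the function names are made up for this example):

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority is still reachable."""
    return members - quorum(members)


for n in (1, 3, 5):
    print(f"{n} members: quorum={quorum(n)}, tolerated failures={tolerated_failures(n)}")
```

A single etcd member tolerates no failures, which is why a single-master cluster stops making progress as soon as that master is down, while three members tolerate one failure and five tolerate two.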
In a partially degraded state, the API server may be able to respond to requests that only read data.
However, in any case, life for applications will continue as normal unless nodes are rebooted or there is a dramatic failure of some sort during this time, because TCP/UDP services, load balancers, DNS, the dashboard, etc. should all continue to function for at least some time. Eventually, these things will all fail on different timescales. In single-master setups or complete API failure, DNS failure will probably happen first as caches expire (on the order of minutes, though the exact timing is configurable; see the coredns cache plugin documentation). This is a good reason to consider a multi-master setup: DNS and service routing can continue to function indefinitely in a degraded state, even if etcd can no longer make progress.
There are actions that you could take as an operator which would accelerate failures, especially in a fully degraded state. For instance, rebooting a node would cause DNS queries, and in fact probably all pod and service networking functionality, to fail until at least one master comes back online. Restarting DNS pods or kube-proxy would also be bad.
If you'd like to test this out yourself, I recommend kubeadm-dind-cluster, kind or, for more exotic setups, kubeadm on VMs or bare metal. Note: kubectl proxy will not work during API failure, as that routes traffic through the master(s).
A Kubernetes cluster without a master is like a company running without a manager.
No one other than the manager (the master node) can instruct the workers (the k8s components); even you, the owner of the cluster, can only instruct the manager.
Everything works as usual until the work is finished or something stops the workers (because the master node died after assigning the work).
As there is no manager to reassign any work to them, the workers will wait and wait until the manager comes back.
The best practice is to assign multiple managers (masters) to your cluster.
Although your data plane and running applications do not immediately start breaking, there are several scenarios where cluster admins will wish they had a multi-master setup. The key to understanding the impact is understanding which components talk to the master, for what, how, and, more importantly, when they will fail if the master fails.
Although application pods running on the data plane are not immediately affected, imagine a very plausible scenario: your traffic suddenly surges and your Horizontal Pod Autoscaler needs to kick in. Autoscaling will not work, because Metrics Server collects resource metrics from the kubelets and exposes them through the Metrics API in the Kubernetes API server for use by the Horizontal Pod Autoscaler and Vertical Pod Autoscaler, but your API server is already dead. If a pod's memory usage shoots up under high load, the pod will eventually be OOM-killed. If any pod dies, the controller manager and scheduler, which watch the current state of pods through the API server, cannot act either. In short, no new pod will be scheduled and your application may stop responding.
One thing to highlight is that Kubernetes system components communicate only with the API server. They don't talk to each other directly, so their own functionality could presumably fail as well. An unavailable control plane can mean several things: failure of any or all of these components (API server, etcd, kube-scheduler, controller manager) or, worse, a crash of the entire node.
If the API server is unavailable, no one can use kubectl, as generally all commands talk to the API server (meaning you cannot connect to the cluster or log in to any pod to check anything on the container file system, and you cannot see application logs unless you have an additional centralized log management system).
If the etcd database fails or gets corrupted, your entire cluster state data is gone, and the admins will want to restore it from backups as soon as possible.
In short, a failed single-master control plane may not immediately impact traffic-serving capability, but it cannot be relied on for serving your traffic.