How do you replace a Kubernetes Master node on AWS?

How do I replace a downed master node? In particular, how do you replace a node on AWS if you are using the kubernetes tools?
If I restart the node, it doesn't start correctly; if I clone the node, it definitely doesn't.

You'll have to connect the new master to the current etcd cluster, or create a new etcd cluster from a snapshot of the old one, in order to preserve the state of the cluster. Aside from that, you have to ensure that the nodes point to the new master's IP address. I also suggest looking at HA masters if you are running a version greater than 1.0.X: http://kubernetes.io/v1.1/docs/admin/high-availability.html.
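For the etcd part, the usual approach is a snapshot and restore with etcdctl. A minimal sketch, assuming etcd v3 and that etcdctl plus the cluster's certificates are available on the node (paths and endpoint are placeholders):

# On a healthy etcd member (or from a backup taken before the master died), save a snapshot
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/etcd/ca.crt --cert=/etc/etcd/server.crt --key=/etc/etcd/server.key \
  snapshot save /var/backups/etcd-snapshot.db

# On the replacement master, restore the snapshot into a fresh data directory
ETCDCTL_API=3 etcdctl snapshot restore /var/backups/etcd-snapshot.db \
  --data-dir=/var/lib/etcd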

If the replacement master comes up with a new IP, you'll have to update all your nodes, since they know how to reach it either by IP or by internal DNS (set via flags).

Related

HA in k8s cluster

Let's imagine a situation - I have an HA cluster with 3 control plane nodes and a floating IP address as the control plane endpoint. The first node goes down - OK, no problem, switch the IP destination and carry on. The second node goes down, and the cluster becomes unavailable. So sad.
Question - is it possible to return the cluster to an available state after the failed nodes come back up?
Because my previous experience says no.
Thanks
Available cluster after nodes are up
Yes.
It is possible to recover from 1, 2 or all 3 masters down.
Boot them.
Make sure etcd cluster gets back up, or fix whatever issue there could be (disk full, expired certs, ...)
Then make sure kube-apiserver gets back up. Next kube-controller-manager & kube-scheduler.
At which point, your kubelets should already be re-registering and workloads starting back up.
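A rough checklist for that recovery, as a sketch only - it assumes a kubeadm-style control plane with static pod manifests and default certificate paths, which may not match your setup:

# 1. Check etcd health on each control plane node
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key endpoint health

# 2. Check for expired certificates, a common reason the API server stays down
#    (recent kubeadm; older versions used "kubeadm alpha certs check-expiration")
kubeadm certs check-expiration

# 3. Once kube-apiserver answers, verify the rest of the control plane and the kubelets
kubectl get pods -n kube-system
kubectl get nodes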
If you use a managed kubernetes cluster you don't have to worry about this, but if you're running your own masters you don't even need to worry about the floating IP. You just bring up new masters and join them to the existing master(s) and you're back up to fighting strength.

What is minikube config specifying?

According to the minikube handbook the configuration commands are used to "Configure your cluster". But what does that mean?
If I set cpus and memory then are these the max values the cluster as a whole can ever consume?
Are these the values it will reserve on the host machine in preparation for use?
Are these the values that are handed to the control plane container/VM and now I have to specify more resources when making a worker node?
What if I want to add another machine (VM or bare metal) and add its resources in the form of a worker node to the cluster? From the looks of it I would have to delete that cluster, change the configuration, then start a new cluster with the new configuration. That doesn't seem scalable.
Thanks for the help in advance.
Answering the question:
If I set cpus and memory then are these the max values the cluster as a whole can ever consume?
In short: it will be a limit for the whole resource (either a VM, a container, etc., depending on the --driver used). It will be used for the underlying OS, the Kubernetes components, and the workload that you are trying to run on it.
Are these the values it will reserve on the host machine in preparation for use?
I'd reckon this is related to the --driver you are using and how it handles resources. I personally doubt it reserves 100% of the CPU and memory you've passed to $ minikube start; I'm more inclined to think it uses only as much as it needs during specific operations.
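For reference, a small sketch of how those values are typically set (flags as documented by minikube; the numbers are just examples, and the exact reservation behaviour still depends on the driver):

# Persist defaults that future "minikube start" runs will use
minikube config set cpus 4
minikube config set memory 8192

# Or pass them only for this start
minikube start --cpus=4 --memory=8192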
Are these the values that are handed to the control plane container/VM and now I have to specify more resources when making a worker node?
By default, when you create a minikube instance with $ minikube start ... you will create a single-node cluster that acts as a control-plane node and a worker node simultaneously. You will be able to run your workloads (like an nginx Deployment) without adding an additional node.
You can add a node to your minikube ecosystem with just: $ minikube node add. This will make another node marked as a worker (with no control-plane components). You can read more about it here:
Minikube.sigs.k8s.io: Docs: Tutorials: Multi node
What if I want to add another machine (VM or bare metal) and add its resources in the form of a worker node to the cluster? From the looks of it I would have to delete that cluster, change the configuration, then start a new cluster with the new configuration. That doesn't seem scalable.
As said previously, you don't need to delete the minikube cluster to add another node. You can run $ minikube node add to add a node on a minikube host. There are also options to delete/stop/start nodes.
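As a quick sketch of that node workflow (commands as documented by minikube; resource values and node names are just examples):

# Start a two-node cluster up front...
minikube start --nodes=2 --cpus=2 --memory=4096

# ...or grow an existing one
minikube node add        # adds a worker node
minikube node list       # shows all minikube nodes
kubectl get nodes        # the new node should appear here too

# Nodes can also be stopped or removed individually
minikube node stop <node-name>
minikube node delete <node-name>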
Personally speaking, if the workload that you are trying to run requires multiple nodes, I would consider other ways to build a Kubernetes cluster, for example with:
Kubeadm
Kubespray
Microk8s
This would give you more flexibility over where you create your Kubernetes cluster (as far as I know, minikube works within a single host, like your laptop for example).
A side note!
There is an answer (written more than 2 years ago) which shows the way to add a Kubernetes cluster node to minikube here:
Stackoverflow.com: Answer: How do I get the minikube nodes in a local cluster
Additional resources:
Kubernetes.io: Docs: Setup: Production environment: Tools: Kubeadm: Create cluster kubeadm
Github.com: Kubernetes sigs: Kubespray
Microk8s.io

How to add remote vm instance as worker node in kubernetes cluster

I'm new to Kubernetes and trying to explore new things in it. So, my question is:
Suppose I have an existing Kubernetes cluster with 1 master node and 1 worker node. Consider this setup is on AWS; now I have 1 more VM instance available on Oracle Cloud Platform and I want to configure that VM as a worker node and attach it to the existing cluster.
So, is it possible to do so? Does anybody have any suggestions regarding this?
I would instead divide your clusters up based on region (unless you have a good VPN between your Oracle and AWS infrastructure).
You can then run applications across clusters. If you absolutely must have one cluster that is geographically separated, I would create a master (etcd host) in each region that you have a worker node in.
Worker node to master node communication is very critical for a Kubernetes cluster. Adding nodes from on-prem to a cloud provider, or from a different cloud provider, will cause lots of issues from a network perspective.
A VPN connection between AWS and Oracle Cloud is needed, and every time the worker node talks to the master node the traffic (probably) has to cross an ocean.
EDIT: From the Kubernetes docs: clusters cannot span clouds or regions (this functionality will require full federation support).
https://kubernetes.io/docs/setup/best-practices/multiple-zones/
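If you do decide to try it anyway (over a working VPN or peering between the two clouds, and assuming the cluster was built with kubeadm), joining the remote VM is the standard kubeadm flow; a minimal sketch with placeholder values:

# On the existing master (AWS): print a join command with a fresh token
kubeadm token create --print-join-command

# On the Oracle Cloud VM: install kubeadm/kubelet, then run the printed command, e.g.
kubeadm join <master-or-vpn-ip>:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>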

In Kubernetes 1.7 with multiple master nodes and an etcd cluster, how does the master node connect to the etcd cluster by default?

I want to know, when the master nodes want to connect to the etcd cluster, which etcd node will be selected? Does a master node always connect to the same etcd node until it becomes unavailable? Does each node in the master cluster connect to the same node in the etcd cluster?
The scheduler and controller-manager talk to the API server present on the same node. In a HA setup you'll have only one of them running at a time (based on a lease) and whoever is the current active will be talking to the local API server. If for some reason it fails to connect to the local API server, it doesn't renew the lease and another leader will be elected.
As described, only one controller-manager/scheduler is active at any given moment and it talks to its local API server; the API servers themselves are the components that need to reach the etcd cluster. When you configure the Kubernetes API server you pass it the etcd-servers flag, which is a list of etcd nodes like:
--etcd-servers=https://10.240.0.10:2379,https://10.240.0.11:2379,https://10.240.0.12:2379
This is then passed to the Go etcd/client library which, looking at its README, states:
etcd/client does round-robin rotation on other available endpoints if the preferred endpoint isn't functioning properly. For example, if the member that etcd/client connects to is hard killed, etcd/client will fail on the first attempt with the killed member, and succeed on the second attempt with another member. If it fails to talk to all available endpoints, it will return all errors happened.
Which means that it'll try each of the available nodes until it succeeds connecting to one.
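If you want to see which etcd endpoints your own API servers were given, you can check the flag on a control plane node; a sketch, assuming a kubeadm-style setup where kube-apiserver runs as a static pod:

# Static pod manifest (kubeadm default location)
grep etcd-servers /etc/kubernetes/manifests/kube-apiserver.yaml

# Or inspect the flags of the running process
ps aux | grep '[k]ube-apiserver' | tr ' ' '\n' | grep -- '--etcd-servers'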

How to re-connect to Amazon kubernetes cluster after stopping & starting instances?

I created a cluster for trying out Kubernetes using cluster/kube-up.sh on Amazon EC2. Then I stop it to save money when not using it. The next time I start the master & minion instances in Amazon, ~/.kube/config has the old IPs for the cluster master, as EC2 assigns new public IPs to the instances.
Currently I haven't found a way to provide Elastic IPs to cluster/kube-up.sh so that consistent IPs would stay in place between stopping & starting the instances. Also, the certificate in ~/.kube/config is for the old IP, so manually changing the IP doesn't work either:
Running: ./cluster/../cluster/aws/../../cluster/../_output/dockerized/bin/darwin/amd64/kubectl get pods --context=aws_kubernetes
Error: Get https://52.24.72.124/api/v1beta1/pods?namespace=default: x509: certificate is valid for 54.149.120.248, not 52.24.72.124
How can I make kubectl run queries against the same Kubernetes master when it is running on a different IP after its restart?
If the only thing that has changed about your cluster is the IP address of the master, you can manually modify the master location by editing the file ~/.kube/config (look for the line that says "server" with an IP address).
This use case (pausing/resuming a cluster) isn't something that we commonly test for so you may encounter other issues once your cluster is back up and running. If you do, please file an issue on the GitHub repository.
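As a side note, the same edit can be done with kubectl config instead of editing the file by hand; a sketch with a placeholder IP, assuming the cluster entry in your ~/.kube/config is named aws_kubernetes like the context shown in the question (the certificate will still be valid only for the old IP):

# Point the existing cluster entry at the master's new IP
kubectl config set-cluster aws_kubernetes --server=https://<new-master-ip>

# Confirm what the current context now points at
kubectl config view --minify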
I'm not sure which version of Kubernetes you were using, but in v1.0.6 you can pass the MASTER_RESERVED_IP environment variable to kube-up.sh to assign a given Elastic IP to the Kubernetes master node.
You can check all the available options for kube-up.sh in the config-default.sh file for AWS in the Kubernetes repository.
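A sketch of that, assuming the AWS provider scripts from that era (the Elastic IP value is a placeholder and must already be allocated in your account):

export KUBERNETES_PROVIDER=aws
export MASTER_RESERVED_IP=52.1.2.3   # an Elastic IP you have already allocated
./cluster/kube-up.sh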