How to re-connect to Amazon kubernetes cluster after stopping & starting instances? - kubernetes

I created a cluster for trying out Kubernetes using cluster/kube-up.sh on Amazon EC2. I then stop it to save money when I'm not using it. The next time I start the master & minion instances in Amazon, ~/.kube/config has the old IPs for the cluster master, because EC2 assigns new public IPs to the instances.
So far I haven't found a way to provide Elastic IPs to cluster/kube-up.sh so that the IPs would stay consistent across stopping & starting the instances. Also, the certificate in ~/.kube/config is issued for the old IP, so manually changing the IP doesn't work either:
Running: ./cluster/../cluster/aws/../../cluster/../_output/dockerized/bin/darwin/amd64/kubectl get pods --context=aws_kubernetes
Error: Get https://52.24.72.124/api/v1beta1/pods?namespace=default: x509: certificate is valid for 54.149.120.248, not 52.24.72.124
How can I make kubectl send queries to the same Kubernetes master when it is running on a different IP after a restart?

If the only thing that has changed about your cluster is the IP address of the master, you can manually modify the master location by editing the file ~/.kube/config (look for the line that says "server" with an IP address).
This use case (pausing/resuming a cluster) isn't something that we commonly test for so you may encounter other issues once your cluster is back up and running. If you do, please file an issue on the GitHub repository.
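A minimal sketch of the edit described above, done with kubectl itself rather than a text editor (the cluster entry name aws_kubernetes is taken from the --context flag in the question; adjust it to match your own kubeconfig):

```shell
# Point the existing kubeconfig cluster entry at the master's new IP.
kubectl config set-cluster aws_kubernetes --server=https://NEW_MASTER_IP

# The serving certificate is still issued for the old IP, so until you
# regenerate it you may need to skip verification (testing only):
kubectl config set-cluster aws_kubernetes --insecure-skip-tls-verify=true
```

Note that skipping TLS verification works around the x509 error in the question but should not be left on permanently.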

I'm not sure which version of Kubernetes you were using, but in v1.0.6 you can pass the MASTER_RESERVED_IP environment variable to kube-up.sh to assign a given Elastic IP to the Kubernetes master node.
You can check all the available options for kube-up.sh in the config-default.sh file for AWS in the Kubernetes repository.
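A sketch of how that might look, assuming you have already allocated an Elastic IP in the same region (the address below is a placeholder):

```shell
# Bring up the cluster with a stable master address, so restarting the
# instances later does not invalidate ~/.kube/config or the certificate.
export KUBERNETES_PROVIDER=aws
export MASTER_RESERVED_IP=52.24.72.124   # your pre-allocated Elastic IP
./cluster/kube-up.sh
```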

Related

Unable to connect to k8s cluster using master/worker IP

I am trying to install a Kubernetes cluster with one master node and two worker nodes.
I acquired 3 VMs for this purpose running Ubuntu 21.10. On the master node, I installed kubeadm:1.21.4, kubectl:1.21.4, kubelet:1.21.4 and docker-ce:20.4.
I followed this guide to install the cluster. The only difference was in my init command, where I did not mention the --control-plane-endpoint. I used Calico CNI v3.19.1 and Docker as the CRI runtime.
After I installed the cluster, I deployed a minio pod and exposed it as a NodePort.
The pod got deployed on the worker node (10.72.12.52); my master node IP is 10.72.12.51.
For the first two hours, I was able to access the login page via all three IPs (10.72.12.51:30981, 10.72.12.52:30981, 10.72.13.53:30981). However, after two hours, I lost access to the service via 10.72.12.51:30981 and 10.72.13.53:30981. Now I am only able to access the service from the node on which it is running (10.72.12.52).
I have disabled the firewall and added calico.conf file inside /etc/NetworkManager/conf.d with the following content:
[keyfile]
unmanaged-devices=interface-name:cali*;interface-name:tunl*;interface-name:vxlan.calico
What am I missing in the setup that might cause this issue?
This is a community wiki answer posted for better visibility. Feel free to expand it.
As mentioned by @AbhinavSharma, the problem was solved by switching from Calico to Flannel CNI.
More information regarding Flannel itself can be found here.
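A rough sketch of the switch described above — the manifest URLs are the ones published by the Calico and Flannel projects for roughly these versions, so verify them against the current docs before running:

```shell
# Remove the Calico manifests that were applied during cluster setup.
kubectl delete -f https://docs.projectcalico.org/v3.19/manifests/calico.yaml

# Install Flannel. Note: Flannel's default config expects the pod CIDR
# 10.244.0.0/16, i.e. a cluster created with
# kubeadm init --pod-network-cidr=10.244.0.0/16.
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
```

In practice, switching CNIs on a live cluster often also requires deleting leftover cali*/tunl* interfaces and restarting the nodes' pods.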

kubeadm init on CentOS 7 using AWS as cloud provider enters a deadlock state

I am trying to install Kubernetes 1.4 on a CentOS 7 cluster on AWS (the same happens with Ubuntu 16.04, though) using the new kubeadm tool.
Here's the output of the command kubeadm init --cloud-provider aws on the master node:
# kubeadm init --cloud-provider aws
<cmd/init> cloud provider "aws" initialized for the control plane. Remember to set the same cloud provider flag on the kubelet.
<master/tokens> generated token: "980532.888de26b1ef9caa3"
<master/pki> created keys and certificates in "/etc/kubernetes/pki"
<util/kubeconfig> created "/etc/kubernetes/kubelet.conf"
<util/kubeconfig> created "/etc/kubernetes/admin.conf"
<master/apiclient> created API client configuration
<master/apiclient> created API client, waiting for the control plane to become ready
The issue is that the control plane does not become ready and the command seems to enter a deadlock state. I also noticed that if the --cloud-provider flag is not provided, pulling images from Amazon EC2 Container Registry does not work, and when creating a service with type LoadBalancer an Elastic Load Balancer is not created.
Has anyone run kubeadm using aws as cloud provider?
Let me know if any further information is needed.
Thanks!
I launched a cluster with kubeadm on AWS recently (Kubernetes 1.5.1), and it got stuck on the same step as yours. To solve it I had to add "--api-advertise-addresses=LOCAL-EC2-IP"; it didn't work with the external IP (which kubeadm probably fetches itself when no other IP is specified). So it's either a network connectivity issue (try a temporary 0.0.0.0/0 security group rule on that master instance), or something else... In my case it was a network issue: the instance wasn't able to connect to itself using its own external IP :)
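A sketch of that workaround for the kubeadm versions of this era (the flag was later renamed, so check `kubeadm init --help` on your version): fetch the instance's private IP from the EC2 metadata service and advertise that instead of the external one.

```shell
# 169.254.169.254 is the standard EC2 instance metadata endpoint.
LOCAL_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)

# Advertise the private IP so the control plane can reach itself.
kubeadm init --cloud-provider aws --api-advertise-addresses="$LOCAL_IP"
```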
Regarding the PV and ELB integrations, I actually did launch a PersistentVolumeClaim with my MongoDB cluster and it works (it created the volume and attached it to one of the slave nodes).
Here it is, for example:
(screenshot: PV created and attached to a slave node)
So the latest version of kubeadm, which ships with Kubernetes 1.5.1, should work for you too!
One thing to note: you must have the proper IAM role permissions to create resources (assign your master node an IAM role with something like "EC2 full access" during testing; you can tune it later to allow only the few needed actions).
Hope it helps.
The documentation (as of now) clearly states the following in the limitations:
The cluster created here doesn’t have cloud-provider integrations, so for example won’t work with (for example) Load Balancers (LBs) or Persistent Volumes (PVs). To easily obtain a cluster which works with LBs and PVs Kubernetes, try the “hello world” GKE tutorial or one of the other cloud-specific installation tutorials.
http://kubernetes.io/docs/getting-started-guides/kubeadm/
There are a couple of possibilities I am aware of here:
1) In older kubeadm versions SELinux blocks access at this point.
2) If you are behind a proxy you will need to add the usual variables to the kubeadm environment:
HTTP_PROXY
HTTPS_PROXY
NO_PROXY
Plus, which I have not seen documented anywhere:
KUBERNETES_HTTP_PROXY
KUBERNETES_HTTPS_PROXY
KUBERNETES_NO_PROXY
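A sketch of setting the variables listed above before running kubeadm — the proxy host and CIDRs are placeholders; NO_PROXY must cover your node, service, and pod networks so in-cluster traffic bypasses the proxy:

```shell
# Placeholder proxy; replace with your corporate proxy address.
export HTTP_PROXY=http://proxy.example.com:3128
export HTTPS_PROXY=$HTTP_PROXY
# Loopback plus cluster CIDRs (assumed defaults; adjust to your setup).
export NO_PROXY=127.0.0.1,localhost,10.96.0.0/12,10.244.0.0/16

# The undocumented KUBERNETES_* variants mentioned above:
export KUBERNETES_HTTP_PROXY=$HTTP_PROXY
export KUBERNETES_HTTPS_PROXY=$HTTPS_PROXY
export KUBERNETES_NO_PROXY=$NO_PROXY

kubeadm init --cloud-provider aws
```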

How to set KUBE_ENABLE_INSECURE_REGISTRY=true on a running Kubernetes cluster?

I forgot to set export KUBE_ENABLE_INSECURE_REGISTRY=true when running kube-up.sh (AWS provider). I was wondering if there is any way to retroactively apply that change to a running cluster. It is only a 3-node cluster, so doing it manually is an option. Or is the only way to tear down the cluster and start from scratch?
I haven't tested it, but in theory you just need to add --insecure-registry 10.0.0.0/8 (if you are running your insecure registry on the kube network 10.0.0.0/8) to the Docker daemon options (DOCKER_OPTS).
You can also specify the url instead of the network.
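A sketch of doing that manually on each of the 3 nodes — this assumes the Debian/Ubuntu-style /etc/default/docker file used by the kube-up images; the file location and restart command vary by distro:

```shell
# Append the insecure-registry flag to the Docker daemon options,
# then restart Docker so it takes effect.
echo 'DOCKER_OPTS="$DOCKER_OPTS --insecure-registry 10.0.0.0/8"' \
  | sudo tee -a /etc/default/docker
sudo service docker restart
```

Restarting Docker will briefly restart the containers on that node, so do the nodes one at a time.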

Kubernetes UI on Google Container Engine

Is there any way to access the UI on the GKE service?
I tried following the information on https://github.com/kubernetes/kubernetes/blob/v1.0.6/docs/user-guide/ui.md
And got this
Error: 'empty tunnel list.'
Trying to reach: 'http://10.64.xx.xx:8080/'
Is this feature turned on?
That error means that the master can't communicate with the nodes in your cluster. Have you deleted instances from your cluster, or messed with the firewalls? There should be a firewall rule allowing SSH access to the nodes in the cluster from the master's IP address, and an entry in your project-wide metadata with the master's public SSH key.
Something to check: make sure you haven't added SSH keys to the cluster nodes' metadata. I did this a few weeks back, opened a support case, and found that I should have added the keys to the project metadata instead.
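A quick sketch of checking both conditions from the answers above with the gcloud CLI (run against the project that hosts the cluster):

```shell
# Look for the firewall rule that allows SSH from the master to the nodes.
gcloud compute firewall-rules list

# Inspect project-wide metadata; the master's public SSH key should be
# here, not in the individual nodes' instance metadata.
gcloud compute project-info describe --format='value(commonInstanceMetadata.items)'
```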

How do you replace a Kubernetes Master node on AWS?

How do I replace a downed master node? In particular, how do you replace a node on AWS if you are using the kubernetes tools?
If I restart the node, it doesn't start correctly, if I clone the node, it definitely does not.
You'll have to connect the new master to the current etcd cluster, or create a new etcd cluster from a snapshot of the old one, in order to preserve the state of the cluster. Aside from that, you have to ensure that the nodes point to the new master's IP address. I also suggest looking at HA masters if you are running a version greater than 1.0.x: http://kubernetes.io/v1.1/docs/admin/high-availability.html
If the replacement comes up with a new IP, you'll have to update all your nodes, which know how to reach the master by IP or by internal DNS (set in flags).
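A sketch of the snapshot approach mentioned above, using the etcd2-era tooling that shipped with Kubernetes ~1.0-1.1 (the data directory is the conventional default and may differ on your install):

```shell
# On the old master (or from its EBS volume): back up the etcd data.
etcdctl backup --data-dir /var/lib/etcd --backup-dir /var/lib/etcd-backup

# Copy /var/lib/etcd-backup to the new master, then start etcd with
# --data-dir pointing at it; for a single-node etcd, add
# --force-new-cluster so it rewrites the member list for the new host.
```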