Kubernetes setup for Docker containers - kubectl get minions failing

I am setting up Kubernetes with flannel following the instructions from
https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/getting-started-guides/centos/centos_manual_config.md
http://www.severalnines.com/blog/installing-kubernetes-cluster-minions-centos7-manage-pods-services
I am blocked at the following two steps and have not been able to find troubleshooting guidance. I am running these commands on the master node.
kubectl get minions
Error: Get http://localhost:8080/api/v1beta3/minions: dial tcp 127.0.0.1:8080: connection refused
Is this related to the flannel network, or should it return the minion information on the master node?
etcdctl mk /coreos.com/network/config '{"Network":"172.17.0.0/16"}'
Error: cannot sync with the cluster using endpoints http://127.0.0.1:4001, http://127.0.0.1:2379
Where is port 2379 specified, and how do I troubleshoot the sync step so that it works?

It looks like you have an issue with etcd. Are you sure your etcd cluster is up and running? How did you configure your etcd cluster?
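If you want to verify that quickly: the 4001/2379 in the etcdctl error are simply etcdctl's default client endpoints, not something the guides ask you to configure. A few checks (the file path assumes the CentOS etcd package; adjust if yours differs):
systemctl status etcd                               # is the service running at all?
grep ETCD_LISTEN_CLIENT_URLS /etc/etcd/etcd.conf    # the client URLs etcd is configured to serve
curl http://127.0.0.1:2379/health                   # should return {"health": "true"} when etcd is healthy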

Some time ago Minions were renamed to Nodes, so you should use kubectl get nodes instead.

Both errors went away when I restarted the etcd service:
sudo systemctl start etcd
systemctl status etcd
Active: active (running)
I am no longer getting the errors.
However, the command kubectl get minions is not giving any output. I am looking for a way to debug this, as I expect it to list the two other nodes.
I followed the steps with a clean machine, and got it working.
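For anyone who hits the same empty node list and cannot simply rebuild, these are reasonable places to look on each minion (file paths assume the CentOS guide's layout; adjust to your setup):
systemctl status kubelet kube-proxy       # both services must be active on every minion
journalctl -u kubelet | tail -n 50        # look for errors registering with the master
grep -i api /etc/kubernetes/kubelet       # the kubelet must point at http://<master-ip>:8080, not localhost
kubectl get nodes                         # run on the master; minions were renamed to nodes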

Related

k8s ClusterIP:Port accessible only from the node running the pod

I created 3 Ubuntu VMs in AWS and used kubeadm to set up the cluster on the master node, opening port 6443. Then I applied the flannel network via the command below:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Then I joined the other two nodes to the cluster via the join command:
kubeadm join 172.31.5.223:6443
Then I applied the two YAML files below to create my Deployment and Service.
Here comes the issue. I listed all the resources on the k8s master:
I can only access clusterip:port from inside node/ip-172-31-36-90, as it is the node running the pod.
Which results in the following:
I can only access <node IP>:NodePort using the IP of node/ip-172-31-36-90, as it is the node running the pod.
I can use curl <external/internal IP of node/ip-172-31-36-90>:nodeport from the other nodes, but the IP can only be that of ip-172-31-36-90.
If I try the above two using the IP of the master node or of node/ip-172-31-41-66, I get a timeout. Note: NodePort 30000 is open on all nodes via the AWS security group.
Can anyone help me with this network issue? I am really bad at debugging network stuff.
2. Second question: if I try curl <external/internal IP of node/ip-172-31-36-90>:nodeport from my local machine, it gives this error:
curl: (56) Recv failure: Connection reset by peer
It really bothers me. k8s expert please save me!!!
----------------Update---------------------------
After days of debugging, I noticed it is related to the IPs of docker0 and flannel.1: they are not in the same subnet.
But I still don't know where I went wrong or how to sync them. Any experts here, please!
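Two things worth checking, based on how flannel's VXLAN backend normally behaves (treat this as a sketch, not a confirmed diagnosis for your cluster):
# 1. Flannel's VXLAN traffic uses UDP port 8472 by default. If the AWS security
#    group only opens 22/6443/30000, pod-to-pod traffic between nodes silently
#    times out, which matches the "only the node running the pod works" symptom.
#    Allow UDP 8472 between all cluster nodes.
# 2. With the kube-flannel manifest, pods attach to the cni0 bridge, not docker0,
#    so docker0 being in a different subnet is often a red herring. Compare the
#    bridges on the node that runs the pod:
ip -4 addr show flannel.1
ip -4 addr show cni0
cat /run/flannel/subnet.env     # the subnet flannel allocated to this node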

EKS kubectl logs <podname> suddenly stopped working

I have pods running on EKS, and pulling container logs worked fine a couple of days ago, but today when I tried to run kubectl logs podname I get a TLS error.
Error from server: Get "https://host:10250/containerLogs/dev/pod-748b649458-bczdq/server": remote error: tls: internal error
Does anyone know how to fix this? The other answers on Stack Overflow seem to suggest deleting the Kubernetes cluster and rebuilding it... is there no better solution?
This could probably be due to firewall rules or security settings that were recently introduced. I would encourage you to check those, along with the following troubleshooting steps:
Ensure all EKS nodes are in the Running state.
Restart nodes as required.
Check the networking configuration and see if other kubectl commands work.
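A few concrete commands covering those steps (nothing is assumed here beyond what is in your error message):
kubectl get nodes -o wide            # every node should be Ready
kubectl get pods -A                  # confirms other API calls still work
kubectl describe node <node-name>    # check recent conditions/events on the node hosting the pod
# kubectl logs is proxied by the API server to the kubelet on port 10250 (the port
# in your error), so security groups must allow the control plane to reach the
# worker nodes on 10250.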

Kubectl connection refused existing cluster

Hope someone can help me.
To describe the situation in short, I have a self managed k8s cluster, running on 3 machines (1 master, 2 worker nodes). In order to make it HA, I attempted to add a second master to the cluster.
After some failed attempts, I found out that I needed to add controlPlaneEndpoint configuration to kubeadm-config config map. So I did, with masternodeHostname:6443.
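For reference, the change was roughly this (the hostname below stands in for my first master's name):
kubectl -n kube-system edit cm kubeadm-config
# under the ClusterConfiguration section, add:
#   controlPlaneEndpoint: "masternodeHostname:6443"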
I generated the certificate and join command for the second master, and after running it on the second master machine, it failed with
error execution phase control-plane-join/etcd: error creating local etcd static pod manifest file: timeout waiting for etcd cluster to be available
Checking the first master now, I get connection refused for the IP on port 6443. So I cannot run any kubectl commands.
I tried recreating the .kube folder, with all the config copied there - no luck.
I restarted kubelet and docker.
The containers running on the cluster seem OK, but I am locked out of any cluster configuration (the dashboard is down and kubectl commands are not working).
Is there any way to make it work again, without losing any of the configuration or the deployments already present?
Thanks! Sorry if it’s a noob question.
Cluster information:
Kubernetes version: 1.15.3
Cloud being used: bare-metal
Installation method: kubeadm
Host OS: RHEL 7
CNI and version: weave 0.3.0
CRI and version: containerd 1.2.6
This is an old, known problem with Kubernetes 1.15 [1,2].
It is caused by a short etcd timeout period. As far as I'm aware it is a hard-coded value in the source and cannot be changed (a feature request to make it configurable is open for version 1.22).
Your best bet would be to upgrade to a newer version, and recreate your cluster.
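If you go that route, the rough shape of the rebuild is the standard kubeadm flow (a sketch only, not a full runbook; back up whatever you can still export first, and adjust package versions to your distro):
sudo kubeadm reset    # on every node
# install newer kubeadm/kubelet/kubectl packages on all nodes, then on the first master:
sudo kubeadm init --control-plane-endpoint "<dns-or-vip>:6443" --upload-certs
# join the second master with the 'kubeadm join ... --control-plane' command that init prints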

Replacing dead master in Kubernetes 1.15 cluster with stacked control plane

I have a Kubernetes cluster with a 3-master stacked control plane - so each master also has its own etcd instance running locally. The problem I am trying to solve is this:
"If one master dies such that it cannot be restarted, how do I replace it?"
Currently, when I try to add the replacement master into the cluster, I get the following error while running kubeadm join:
[check-etcd] Checking that the etcd cluster is healthy
I0302 22:43:41.968068 9158 local.go:66] [etcd] Checking etcd cluster health
I0302 22:43:41.968089 9158 local.go:69] creating etcd client that connects to etcd pods
I0302 22:43:41.986715 9158 etcd.go:106] etcd endpoints read from pods: https://10.0.2.49:2379,https://10.0.225.90:2379,https://10.0.247.138:2379
error execution phase check-etcd: error syncing endpoints with etc: dial tcp 10.0.2.49:2379: connect: no route to host
The 10.0.2.49 node is the one that died. These nodes are all running in an AWS AutoScaling group, so I don't have control over the addresses.
I have drained and deleted the dead master node using kubectl drain and kubectl delete; and I have used etcdctl to make sure the dead node was not in the member list.
Why is it still trying to connect to that node's etcd?
It is still trying to connect to the member because etcd maintains a list of members in its store -- that's how it knows to vote on quorum decisions. I don't believe etcd is unique in that way -- most distributed key-value stores know their member list.
The fine manual shows how to remove a dead member, but it also warns to add a new member before removing unhealthy ones.
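For completeness, the removal itself is a short etcdctl session run from a surviving master (the cert paths assume kubeadm's stacked-etcd defaults under /etc/kubernetes/pki/etcd; you may need to exec into an etcd pod if etcdctl is not installed on the host):
ETCDCTL_API=3 etcdctl member list \
  --endpoints https://127.0.0.1:2379 \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/peer.crt \
  --key /etc/kubernetes/pki/etcd/peer.key
# note the hex ID of the dead member, then:
ETCDCTL_API=3 etcdctl member remove <member-id> \
  --endpoints https://127.0.0.1:2379 \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/peer.crt \
  --key /etc/kubernetes/pki/etcd/peer.key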
There is also a project, etcdadm, that is designed to smooth over some of the rough edges of etcd cluster management, but I haven't used it enough to say what it is good at versus not.
The problem turned out to be that the failed node was still listed in the ConfigMap. Further investigation led me to the following thread, which discusses the same problem:
https://github.com/kubernetes/kubeadm/issues/1300
The solution that worked for me was to edit the ConfigMap manually.
kubectl -n kube-system get cm kubeadm-config -o yaml > tmp-kubeadm-config.yaml
manually edit tmp-kubeadm-config.yaml to remove the old server
kubectl -n kube-system apply -f tmp-kubeadm-config.yaml
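If you are unsure what "the old server" looks like in that ConfigMap: in kubeadm versions from this era the stale entry sits under the ClusterStatus key, in its apiEndpoints map (this may differ in newer releases). You can inspect it before editing with:
kubectl -n kube-system get cm kubeadm-config -o jsonpath='{.data.ClusterStatus}'
# prints the apiEndpoints map; the block keyed by the dead node's name
# (advertiseAddress 10.0.2.49 in my case) is the one to delete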
I believe updating the etcd member list is still necessary to ensure cluster stability, but it wasn't the full solution.

Problem getting pods stats from kubelet and cri-o

We are running Kubernetes with the following configuration:
On-premise Kubernetes 1.11.3, cri-o 1.11.6 and CentOS7 with UEK-4.14.35
I can't make crictl stats return pod information; it only returns an empty list. Has anyone run into the same problem?
Another issue we have is that when I query the kubelet's stats/summary endpoint, it returns an empty pods list.
I think that these two issues are related, although I am not sure which one of them is the problem.
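For reference, these are the kinds of calls involved (the read-only kubelet port 10255 is an assumption here; if it is disabled, the secure port 10250 needs a token or client certificate):
crictl pods                                 # does cri-o itself list the pod sandboxes?
crictl stats                                # currently returns an empty list
curl http://localhost:10255/stats/summary   # the kubelet summary API, via the read-only port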
I would recommend checking the kubelet service to verify its health status and to debug any suspicious events within the cluster. I assume the kubelet is the main provider of Pod information here, given its role in managing the Pod lifecycle on top of the CRI-O runtime.
systemctl status kubelet -l
journalctl -u kubelet
If you find any errors or dubious events, share them in a comment below this answer.
Alternatively, you can use metrics-server, which collects Pod metrics in the cluster, and enable the kube-apiserver flags for the Aggregation Layer. Here is a good article about Horizontal Pod Autoscaling and monitoring resources via Prometheus.
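A rough sketch of what that involves (the flag names are the standard Aggregation Layer ones; the certificate paths are placeholders you must generate or adapt to your cluster):
# kube-apiserver flags for the Aggregation Layer:
--requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
--requestheader-allowed-names=front-proxy-client
--requestheader-extra-headers-prefix=X-Remote-Extra-
--requestheader-group-headers=X-Remote-Group
--requestheader-username-headers=X-Remote-User
--proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
--proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
# after deploying metrics-server from its release manifests, verify with:
kubectl top nodes
kubectl top pods --all-namespaces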