etcdserver: request timed out - kubernetes

I backed up my etcd, and after restoring it I can't create, update, or delete anything in my cluster!
I followed the docs exactly.
Here are my steps:
Backing up etcd
Save the snapshot
sudo ETCDCTL_API=3 etcdctl snapshot save /tmp/etcd-backup-new.db \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/server.crt \
--key /etc/kubernetes/pki/etcd/server.key
Check the status
$ sudo ETCDCTL_API=3 etcdctl snapshot status /tmp/etcd-backup-new.db --write-out=table
+----------+----------+------------+------------+
| HASH | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| d8d0da24 | 7220348 | 874 | 1.9 MB |
+----------+----------+------------+------------+
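If the backup is scripted, it may be worth refusing to continue when a snapshot looks empty. The following is a minimal sketch that assumes the table layout shown above (HASH | REVISION | TOTAL KEYS | TOTAL SIZE). snapshot_total_keys is a hypothetical helper name, not an etcdctl feature.

```shell
# Hypothetical helper: print the TOTAL KEYS cell from the table printed by
#   etcdctl snapshot status <file> --write-out=table
# Assumes the column order HASH | REVISION | TOTAL KEYS | TOTAL SIZE.
snapshot_total_keys() {
  awk -F'|' '
    NF >= 5 && $2 !~ /HASH/ {      # border lines contain no pipes; skip the header row
      gsub(/[[:space:]]/, "", $4)  # strip the padding around the cell
      print $4
    }'
}

# Example guard before restoring (path taken from the steps above):
# keys=$(sudo ETCDCTL_API=3 etcdctl snapshot status /tmp/etcd-backup-new.db --write-out=table | snapshot_total_keys)
# [ "${keys:-0}" -gt 0 ] || { echo "snapshot looks empty" >&2; exit 1; }
```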
Restoring etcd
Create Restore Point from backup
sudo ETCDCTL_API=3 etcdctl snapshot restore /tmp/etcd-backup-new.db \
--data-dir /var/lib/etcd-backup
Tell etcd to use the new location
sudo vim /etc/kubernetes/manifests/etcd.yaml
- hostPath:
    path: /var/lib/etcd-backup # Changed this ONLY!
    type: DirectoryOrCreate
  name: etcd-data
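The same manifest edit can be scripted. This is a minimal sketch that assumes the stock kubeadm manifest, where the etcd-data volume's hostPath line reads path: /var/lib/etcd. set_etcd_data_hostpath is a hypothetical name. It changes only the hostPath line, matching the manual edit above, and keeps a copy of the original outside /etc/kubernetes/manifests, because the kubelet may try to run any manifest-like file it finds in that directory.

```shell
# Hypothetical helper: point the etcd-data hostPath of a kubeadm-style static Pod
# manifest at a restored data directory. Only a line reading exactly
# "path: /var/lib/etcd" (after indentation) is changed; mountPath and --data-dir
# stay as they are, matching the manual edit above.
set_etcd_data_hostpath() {
  local manifest=$1 new_dir=$2
  # Keep a copy outside the manifests directory so the kubelet does not pick it up.
  cp "$manifest" "${TMPDIR:-/tmp}/$(basename "$manifest").orig"
  sed -i "s#^\([[:space:]]*\)path: /var/lib/etcd\$#\1path: ${new_dir}#" "$manifest"
}

# Usage (as root):
# set_etcd_data_hostpath /etc/kubernetes/manifests/etcd.yaml /var/lib/etcd-backup
```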
As far as I know, the kubelet restarts static Pods automatically when their manifests change. So, after a while, everything seems good!
$ k get all -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/coredns-6d4b75cb6d-6cmtm 1/1 Running 1 (7d23h ago) 72d
kube-system pod/coredns-6d4b75cb6d-wchss 1/1 Running 1 (7d23h ago) 72d
kube-system pod/etcd-master 1/1 Running 2 (7d23h ago) 72d
kube-system pod/kube-apiserver-master 1/1 Running 1 (7d23h ago) 39d
kube-system pod/kube-controller-manager-master 1/1 Running 4 (7d23h ago) 72d
kube-system pod/kube-proxy-mqzbd 1/1 Running 1 (7d23h ago) 72d
kube-system pod/kube-scheduler-master 1/1 Running 4 (7d23h ago) 72d
kube-system pod/weave-net-4xtwz 2/2 Running 3 (7d23h ago) 49d
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 72d
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 72d
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system daemonset.apps/kube-proxy 1 1 1 1 1 kubernetes.io/os=linux 72d
kube-system daemonset.apps/weave-net 1 1 1 1 1 <none> 49d
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/coredns 2/2 2 2 72d
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system replicaset.apps/coredns-6d4b75cb6d 2 2 2 72d
The Problem
So everything seems fine, but it's not! For example:
$ k run test --image nginx
Error from server: etcdserver: request timed out
or
$ k rollout restart daemonset.apps/kube-proxy -n kube-system
error: failed to patch: etcdserver: request timed out
What is my mistake?
P.S.: Kubernetes version: v1.24.1

Related

calico-kube-controller stays in pending state

I have a new install of Kubernetes 1.24.3 with Calico on Ubuntu 18. The calico-kube-controllers Pod will not start:
$ sudo kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-555bc4b957-z4q2p 0/1 Pending 0 5m14s
kube-system calico-node-jz2j7 1/1 Running 0 5m15s
kube-system coredns-6d4b75cb6d-hwfx9 1/1 Running 0 5m14s
kube-system coredns-6d4b75cb6d-wdh55 1/1 Running 0 5m14s
kube-system etcd-ubuntu-18-extssd 1/1 Running 1 5m27s
kube-system kube-apiserver-ubuntu-18-extssd 1/1 Running 1 5m28s
kube-system kube-controller-manager-ubuntu-18-extssd 1/1 Running 1 5m26s
kube-system kube-proxy-t5z2r 1/1 Running 0 5m15s
kube-system kube-scheduler-ubuntu-18-extssd 1/1 Running 1 5m27s
Someone suggested setting a couple of Calico timeouts to 60 seconds, but that didn't work either.
What could be causing the calico-controller to fail to start, especially since the calico-node is running?
Also, is there a more trouble-free CNI implementation to use? Calico seems very error-prone.
I solved this by installing Weave:
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
with this pod network CIDR:
sudo kubeadm init --pod-network-cidr=192.168.0.0/16
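Calico's default pod pool is 192.168.0.0/16, which is also a very common home and office LAN range, so with --pod-network-cidr=192.168.0.0/16 it is worth confirming that no node address falls inside the pod CIDR. The following is a minimal sketch in pure bash (IPv4 only, no input validation). ip_to_int and ip_in_cidr are hypothetical helpers.

```shell
# Hypothetical helpers (bash): IPv4-only CIDR membership test, no input validation.
ip_to_int() {
  local IFS=.
  local -a o=($1)   # split the dotted quad on "."
  echo $(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))
}

# Succeeds (exit 0) if IPv4 address $1 lies inside CIDR $2, e.g. 192.168.0.0/16.
ip_in_cidr() {
  local ip=$1 net=${2%/*} bits=${2#*/}
  local mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$ip") & mask) == ($(ip_to_int "$net") & mask) ))
}
```

For example, ip_in_cidr 192.168.1.91 192.168.0.0/16 succeeds, which would mean that node address overlaps the pod range.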

Kubernetes API container dies constantly

I just installed, from scratch, a small Kubernetes test cluster on four Armbian/Odroid_MC1 (Debian 10) nodes. The install process is nothing fancy or special: add the k8s apt repo and install with apt.
The problem is that the API server dies constantly, roughly every 5 to 10 minutes, shortly after the controller-manager and the scheduler stop, which seem to die simultaneously. The API then becomes unusable for about a minute. All three services do restart, and things run fine for the next four to nine minutes, until the loop repeats. This is an excerpt:
$ kubectl get pods -o wide --all-namespaces
The connection to the server 192.168.1.91:6443 was refused - did you specify the right host or port?
(a minute later)
$ kubectl get pods -o wide --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system coredns-74ff55c5b-8pm9r 1/1 Running 2 88m 10.244.0.7 mc1 <none> <none>
kube-system coredns-74ff55c5b-pxdqz 1/1 Running 2 88m 10.244.0.6 mc1 <none> <none>
kube-system etcd-mc1 1/1 Running 2 88m 192.168.1.91 mc1 <none> <none>
kube-system kube-apiserver-mc1 0/1 Running 12 88m 192.168.1.91 mc1 <none> <none>
kube-system kube-controller-manager-mc1 1/1 Running 5 31m 192.168.1.91 mc1 <none> <none>
kube-system kube-flannel-ds-fxg2s 1/1 Running 5 45m 192.168.1.94 mc4 <none> <none>
kube-system kube-flannel-ds-jvvmp 1/1 Running 5 48m 192.168.1.92 mc2 <none> <none>
kube-system kube-flannel-ds-qlvbc 1/1 Running 6 45m 192.168.1.93 mc3 <none> <none>
kube-system kube-flannel-ds-ssb9t 1/1 Running 3 77m 192.168.1.91 mc1 <none> <none>
kube-system kube-proxy-7t9ff 1/1 Running 2 45m 192.168.1.93 mc3 <none> <none>
kube-system kube-proxy-8jhc7 1/1 Running 2 88m 192.168.1.91 mc1 <none> <none>
kube-system kube-proxy-cg75m 1/1 Running 2 45m 192.168.1.94 mc4 <none> <none>
kube-system kube-proxy-mq8j7 1/1 Running 2 48m 192.168.1.92 mc2 <none> <none>
kube-system kube-scheduler-mc1 1/1 Running 5 31m 192.168.1.91 mc1 <none> <none>
$ docker ps -a # (check the exited and restarted services)
CONTAINER ID NAMES STATUS IMAGE NETWORKS PORTS
0e179c6495db k8s_kube-apiserver_kube-apiserver-mc1_kube-system_c55114bd57b1bf357c8f4c0d749ae105_13 Up About a minute 66eaad223e2c
2ccb014beb73 k8s_kube-scheduler_kube-scheduler-mc1_kube-system_fe362b2b6b08ca576b7416df7f2e7845_6 Up 3 minutes 21e17680ca2d
3322f6ec1546 k8s_kube-controller-manager_kube-controller-manager-mc1_kube-system_17cf17caf36ba27e3d2ec4f113a0cf6f_6 Up 3 minutes a1ab72ce4ba2
583129da455f k8s_kube-apiserver_kube-apiserver-mc1_kube-system_c55114bd57b1bf357c8f4c0d749ae105_12 Exited (137) About a minute ago 66eaad223e2c
72268d8e1503 k8s_install-cni_kube-flannel-ds-ssb9t_kube-system_dbf3513d-dad2-462d-9107-4813acf9c23a_0 Exited (0) 5 minutes ago 263b01b3ca1f
fe013d07f186 k8s_kube-controller-manager_kube-controller-manager-mc1_kube-system_17cf17caf36ba27e3d2ec4f113a0cf6f_5 Exited (255) 3 minutes ago a1ab72ce4ba2
34ef8757b63d k8s_kube-scheduler_kube-scheduler-mc1_kube-system_fe362b2b6b08ca576b7416df7f2e7845_5 Exited (255) 3 minutes ago 21e17680ca2d
fd8e0c0ba27f k8s_coredns_coredns-74ff55c5b-8pm9r_kube-system_3b813dc9-827d-4cf6-88cc-027491b350f1_2 Up 32 minutes 15c1a66b013b
f44e2c45ed87 k8s_coredns_coredns-74ff55c5b-pxdqz_kube-system_c3b7fbf2-2064-4f3f-b1b2-dec5dad904b7_2 Up 32 minutes 15c1a66b013b
04fa4eca1240 k8s_POD_coredns-74ff55c5b-8pm9r_kube-system_3b813dc9-827d-4cf6-88cc-027491b350f1_42 Up 32 minutes k8s.gcr.io/pause:3.2 none
f00c36d6de75 k8s_POD_coredns-74ff55c5b-pxdqz_kube-system_c3b7fbf2-2064-4f3f-b1b2-dec5dad904b7_42 Up 32 minutes k8s.gcr.io/pause:3.2 none
a1d6814e1b04 k8s_kube-flannel_kube-flannel-ds-ssb9t_kube-system_dbf3513d-dad2-462d-9107-4813acf9c23a_3 Up 32 minutes 263b01b3ca1f
94b231456ed7 k8s_kube-proxy_kube-proxy-8jhc7_kube-system_cc637e27-3b14-41bd-9f04-c1779e500a3a_2 Up 33 minutes 377de0f45e5c
df91856450bd k8s_POD_kube-flannel-ds-ssb9t_kube-system_dbf3513d-dad2-462d-9107-4813acf9c23a_2 Up 34 minutes k8s.gcr.io/pause:3.2 host
b480b844671a k8s_POD_kube-proxy-8jhc7_kube-system_cc637e27-3b14-41bd-9f04-c1779e500a3a_2 Up 34 minutes k8s.gcr.io/pause:3.2 host
1d4a7bcaad38 k8s_etcd_etcd-mc1_kube-system_14b7b6d6446e21cc57f0b40571ae3958_2 Up 35 minutes 2e91dde7e952
e5d517a9c29d k8s_POD_kube-controller-manager-mc1_kube-system_17cf17caf36ba27e3d2ec4f113a0cf6f_1 Up 35 minutes k8s.gcr.io/pause:3.2 host
3a3da7dbf3ad k8s_POD_kube-apiserver-mc1_kube-system_c55114bd57b1bf357c8f4c0d749ae105_2 Up 35 minutes k8s.gcr.io/pause:3.2 host
eef29cdebf5f k8s_POD_etcd-mc1_kube-system_14b7b6d6446e21cc57f0b40571ae3958_2 Up 35 minutes k8s.gcr.io/pause:3.2 host
3631d43757bc k8s_POD_kube-scheduler-mc1_kube-system_fe362b2b6b08ca576b7416df7f2e7845_1 Up 35 minutes k8s.gcr.io/pause:3.2 host
I see nothing unusual in the logs (I'm a k8s beginner). This was working until a month ago, when I reinstalled it for practice. This is probably my tenth install attempt; I've tried different options and versions and googled a lot, but can't find a solution.
What could be the reason? What else can I try? How can I get to the root of the problem?
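One cheap place to start is the exit codes in the docker ps -a output above: Exited (137) for the API server and Exited (255) for the controller-manager and scheduler. By shell convention, a status above 128 usually means the process was terminated by signal (status - 128), so 137 is SIGKILL (9), whereas 255 is an ordinary exit status. A small decoding helper, hypothetical and bash-specific:

```shell
# Hypothetical helper (bash): explain an exit code as shown by "docker ps -a".
# Codes above 128 are read as "terminated by signal (code - 128)" when that
# signal number is plausible (1-64 on Linux); anything else is a plain status.
explain_exit_code() {
  local code=$1
  local sig=$(( code - 128 ))
  if (( sig >= 1 && sig <= 64 )); then
    echo "$code: killed by SIG$(kill -l "$sig")"
  else
    echo "$code: exited with status $code"
  fi
}

# Codes from the listing above:
# explain_exit_code 137   # -> 137: killed by SIGKILL
# explain_exit_code 255   # -> 255: exited with status 255
```

A SIGKILL can come from the kernel OOM killer or from a forced stop; docker inspect --format '{{.State.OOMKilled}}' <container-id> shows whether Docker recorded an OOM kill for that container.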
UPDATE 2021/02/06
The problem no longer occurs. Apparently, in this specific case, the issue was the Armbian version. I didn't file an issue because I couldn't find any clues about what specific issue to report.
The installation procedure in all cases was this:
# swapoff -a
# curl -sL get.docker.com|sh
# usermod -aG docker rodolfoap
# curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
# echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" > /etc/apt/sources.list.d/kubernetes.list
# apt-get update
# apt-get install -y kubeadm kubectl kubectx # Master
# kubeadm config images pull
# kubeadm init --apiserver-advertise-address=0.0.0.0 --pod-network-cidr=10.244.0.0/16
Armbian-20.08.1 worked fine. My installation procedure has not changed since.
Armbian-20.11.3 had the issue: the API, scheduler and coredns restarted every 5 minutes, blocking access to the API for 5 out of every 8 minutes, on average.
Armbian-21.02.1 works fine. It worked on the first install, with the same procedure.
All versions were updated to the latest kernel available at install time; the current one is 5.10.12-odroidxu4.
As you can see, after around two hours there are no API restarts:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE LABELS
kube-system coredns-74ff55c5b-gnvf2 1/1 Running 0 173m 10.244.0.2 mc1 k8s-app=kube-dns,pod-template-hash=74ff55c5b
kube-system coredns-74ff55c5b-wvnnz 1/1 Running 0 173m 10.244.0.3 mc1 k8s-app=kube-dns,pod-template-hash=74ff55c5b
kube-system etcd-mc1 1/1 Running 0 173m 192.168.1.91 mc1 component=etcd,tier=control-plane
kube-system kube-apiserver-mc1 1/1 Running 0 173m 192.168.1.91 mc1 component=kube-apiserver,tier=control-plane
kube-system kube-controller-manager-mc1 1/1 Running 0 173m 192.168.1.91 mc1 component=kube-controller-manager,tier=control-plane
kube-system kube-flannel-ds-c4jgv 1/1 Running 0 123m 192.168.1.93 mc3 app=flannel,controller-revision-hash=64465d999,pod-template-generation=1,tier=node
kube-system kube-flannel-ds-cl6n5 1/1 Running 0 75m 192.168.1.94 mc4 app=flannel,controller-revision-hash=64465d999,pod-template-generation=1,tier=node
kube-system kube-flannel-ds-z2nmw 1/1 Running 0 75m 192.168.1.92 mc2 app=flannel,controller-revision-hash=64465d999,pod-template-generation=1,tier=node
kube-system kube-flannel-ds-zqxh7 1/1 Running 0 150m 192.168.1.91 mc1 app=flannel,controller-revision-hash=64465d999,pod-template-generation=1,tier=node
kube-system kube-proxy-bd596 1/1 Running 0 75m 192.168.1.94 mc4 controller-revision-hash=b89db7f56,k8s-app=kube-proxy,pod-template-generation=1
kube-system kube-proxy-n6djp 1/1 Running 0 75m 192.168.1.92 mc2 controller-revision-hash=b89db7f56,k8s-app=kube-proxy,pod-template-generation=1
kube-system kube-proxy-rf4cr 1/1 Running 0 173m 192.168.1.91 mc1 controller-revision-hash=b89db7f56,k8s-app=kube-proxy,pod-template-generation=1
kube-system kube-proxy-xhl95 1/1 Running 0 123m 192.168.1.93 mc3 controller-revision-hash=b89db7f56,k8s-app=kube-proxy,pod-template-generation=1
kube-system kube-scheduler-mc1 1/1 Running 0 173m 192.168.1.91 mc1 component=kube-scheduler,tier=control-plane
Cluster is fully functional :)
I have the same problem, but with Ubuntu:
PRETTY_NAME="Ubuntu 22.04 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04 (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
The cluster works fine on:
Ubuntu 20.04 LTS
Ubuntu 18.04 LTS
I thought this might help someone else who is running Ubuntu instead of Armbian.
The solution for Ubuntu (possibly for Armbian too) is here: Issues with "stability" with Kubernetes cluster before adding networking
Apparently, it is a problem with the containerd configuration on those versions.
UPDATE:
The problem is that sudo apt install containerd installs version v1.5.9, which has the option SystemdCgroup = false. In my case that worked on Ubuntu 20.04, but it doesn't work on Ubuntu 22.04. If you change it to SystemdCgroup = true, it works (this setting was updated in containerd v1.6.2 so that it is set to true). This will hopefully fix your problem too.
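The change described above can be applied with sed. This is a minimal sketch that assumes /etc/containerd/config.toml was generated by containerd config default and therefore already contains a SystemdCgroup = false line under the runc options. enable_systemd_cgroup is a hypothetical helper name.

```shell
# Hypothetical helper: switch containerd's runc runtime to the systemd cgroup
# driver by rewriting "SystemdCgroup = false" to "SystemdCgroup = true".
# Assumes the key already exists in the file (as in "containerd config default").
enable_systemd_cgroup() {
  local config=${1:-/etc/containerd/config.toml}
  sed -i 's/^\([[:space:]]*\)SystemdCgroup = false/\1SystemdCgroup = true/' "$config"
  grep -q 'SystemdCgroup = true' "$config"   # non-zero exit if the key was not found
}

# As root:
# containerd config default > /etc/containerd/config.toml
# enable_systemd_cgroup && systemctl restart containerd
```

Restarting containerd afterwards is what makes the new setting take effect.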

How to remove a master node from a HA cluster and also from etcd cluster

I am new to k8s and I have run into a problem that I cannot resolve.
I am building an HA cluster of master nodes and running some tests (removing one node and adding it again). Through this process I noticed that the etcd cluster does not update its member list.
Here is a sample of the problem:
$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
cri-o-metrics-exporter cri-o-metrics-exporter-77c9cf9746-qlp4d 0/1 Pending 0 16h
haproxy-controller haproxy-ingress-769d858699-b8r8q 0/1 Pending 0 16h
haproxy-controller ingress-default-backend-5fd4986454-kvbw8 0/1 Pending 0 16h
kube-system calico-kube-controllers-574d679d8c-tkcjj 1/1 Running 3 16h
kube-system calico-node-95t6l 1/1 Running 2 16h
kube-system calico-node-m5txs 1/1 Running 2 16h
kube-system coredns-7588b55795-gkfjq 1/1 Running 2 16h
kube-system coredns-7588b55795-lxpmj 1/1 Running 2 16h
kube-system etcd-masterNode1 1/1 Running 2 16h
kube-system etcd-masterNode2 1/1 Running 2 16h
kube-system kube-apiserver-masterNode1 1/1 Running 3 16h
kube-system kube-apiserver-masterNode2 1/1 Running 3 16h
kube-system kube-controller-manager-masterNode1 1/1 Running 4 16h
kube-system kube-controller-manager-masterNode2 1/1 Running 4 16h
kube-system kube-proxy-5q6xs 1/1 Running 2 16h
kube-system kube-proxy-k8p6h 1/1 Running 2 16h
kube-system kube-scheduler-masterNode1 1/1 Running 3 16h
kube-system kube-scheduler-masterNode2 1/1 Running 6 16h
kube-system metrics-server-575bd7f776-jtfsh 0/1 Pending 0 16h
kubernetes-dashboard dashboard-metrics-scraper-6f78bc588b-khjjr 1/1 Running 2 16h
kubernetes-dashboard kubernetes-dashboard-978555c5b-9jsxb 1/1 Running 2 16h
$ kubectl exec etcd-masterNode2 -n kube-system -it -- sh
sh-5.0# etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key member list -w table
+------------------+---------+----------------------------+---------------------------+---------------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+---------+----------------------------+---------------------------+---------------------------+------------+
| 4c209e5bc1ca9593 | started | masterNode1 | https://IP1:2380 | https://IP1:2379 | false |
| 676d4bfab319fa22 | started | masterNode2 | https://IP2:2380 | https://IP2:2379 | false |
| a9af4b00e33f87d4 | started | masterNode3 | https://IP3:2380 | https://IP3:2379 | false |
+------------------+---------+----------------------------+---------------------------+---------------------------+------------+
sh-5.0# exit
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
masterNode1 Ready master 16h v1.19.0
masterNode2 Ready master 16h v1.19.0
I assume that I am removing the node from the cluster correctly. This is the procedure I am following:
kubectl drain <node-name> --ignore-daemonsets --delete-local-data
kubectl delete node <node-name>
kubeadm reset
rm -f /etc/cni/net.d/* # Removing CNI configuration
rm -rf /var/lib/kubelet # Removing /var/lib/kubelet dir
rm -rf /var/lib/etcd # Removing /var/lib/etcd
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X && iptables -t filter -F && iptables -t filter -X # Removing iptables
ipvsadm --clear
rm -rf /etc/kubernetes # Removing /etc/kubernetes (in case of character change)
I am running Kubernetes version 1.19.0 and the etcd image etcd:3.4.9-1.
The cluster is running on bare metal nodes.
Is this a bug, or am I not removing the node from the etcd cluster correctly?
Thanks to Mariusz K., I found the answer to my problem. In case someone else has the same problem, here is how I solved it.
First, query the (HA) cluster for the etcd members (sample code):
$ kubectl exec etcd-< nodeNameMasterNode > -n kube-system -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key member list
1863b58e85c8a808, started, nodeNameMaster1, https://IP1:2380, https://IP1:2379, false
676d4bfab319fa22, started, nodeNameMaster2, https://IP2:2380, https://IP2:2379, false
b0c50c50d563ed51, started, nodeNameMaster3, https://IP3:2380, https://IP3:2379, false
Once you have the list of members, you can remove any member you want by its ID. Sample code:
kubectl exec etcd-nodeNameMaster1 -n kube-system -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key member remove b0c50c50d563ed51
Member b0c50c50d563ed51 removed from cluster d1e1de99e3d19634
I wanted to be able to remove a member from the etcd cluster without having to open a shell in the Pod and run a second command. This way, I run the command in the Pod directly through kubectl exec.
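To avoid copying the member ID by hand, the ID can be looked up by member name from the member list output shown above. This is a minimal sketch. etcd_member_id is a hypothetical helper, and it assumes etcdctl's default comma-separated member list format (ID, STATUS, NAME, PEER ADDRS, CLIENT ADDRS, IS LEARNER).

```shell
# Hypothetical helper: read "etcdctl member list" output (default simple format:
# ID, STATUS, NAME, PEER ADDRS, CLIENT ADDRS, IS LEARNER) on stdin and print
# the ID of the member whose NAME equals $1. Prints nothing if there is no match.
etcd_member_id() {
  awk -F', ' -v name="$1" '$3 == name { print $1 }'
}

# Usage sketch (Pod and member names are the placeholders from the answer above):
# certs=(--cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key)
# id=$(kubectl exec etcd-nodeNameMaster1 -n kube-system -- etcdctl "${certs[@]}" member list | etcd_member_id nodeNameMaster3)
# [ -n "$id" ] && kubectl exec etcd-nodeNameMaster1 -n kube-system -- etcdctl "${certs[@]}" member remove "$id"
```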

How to resolve Kubernetes DNS issues when trying to install Weave Cloud Agents for Minikube

I was trying to install the Weave Cloud Agents on my minikube cluster. I used the provided command
curl -Ls https://get.weave.works |sh -s -- --token=xxx
but I keep getting the following error:
There was an error while performing a DNS check: checking DNS failed, the DNS in the Kubernetes cluster is not working correctly. Please check that your cluster can download images and run pods.
I have the following DNS Pods:
kube-system coredns-6955765f44-7zt4x 1/1 Running 0 38m
kube-system coredns-6955765f44-xdnd9 1/1 Running 0 38m
I tried different suggestions, such as https://www.jeffgeerling.com/blog/2019/debugging-networking-issues-multi-node-kubernetes-on-virtualbox or https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/. However, none of them resolved my issue.
It seems to be an issue that has happened before: https://github.com/weaveworks/launcher/issues/285.
My Kubernetes version is v1.17.3.
I reproduced your issue and got the same error.
minikube v1.7.2 on Centos 7.7.1908
Docker 19.03.5
vm-driver=virtualbox
Connecting cluster to "Old Tree 34" (id: old-tree-34) on Weave Cloud
Installing Weave Cloud agents on minikube at https://192.168.99.100:8443
Performing a check of the Kubernetes installation setup.
There was an error while performing a DNS check: checking DNS failed, the DNS in the Kubernetes cluster is not working correctly. Please check that your cluster can download images and run pods.
I wasn't able to fix this problem, but I found a workaround: use Helm. There is a second tab, 'Helm', under 'Install the Weave Cloud Agents', with a provided command like:
helm repo update && helm upgrade --install --wait weave-cloud \
--set token=xxx \
--namespace weave \
stable/weave-cloud
Let's install Helm and use it.
curl https://raw.githubusercontent.com/helm/helm/master/scripts/get | bash
kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
helm init --service-account tiller
.....
Tiller (the Helm server-side component) has been installed into your Kubernetes Cluster.
helm repo update
helm upgrade --install --wait weave-cloud \
> --set token=xxx \
> --namespace weave \
> stable/weave-cloud
Release "weave-cloud" does not exist. Installing it now.
NAME: weave-cloud
LAST DEPLOYED: Thu Feb 13 14:52:45 2020
NAMESPACE: weave
STATUS: DEPLOYED
RESOURCES:
==> v1/Deployment
NAME AGE
weave-agent 35s
==> v1/Pod(related)
NAME AGE
weave-agent-69fbf74889-dw77c 35s
==> v1/Secret
NAME AGE
weave-cloud 35s
==> v1/ServiceAccount
NAME AGE
weave-cloud 35s
==> v1beta1/ClusterRole
NAME AGE
weave-cloud 35s
==> v1beta1/ClusterRoleBinding
NAME AGE
weave-cloud 35s
NOTES:
Weave Cloud agents had been installed!
First, verify all Pods are running:
kubectl get pods -n weave
Next, login to Weave Cloud (https://cloud.weave.works) and verify the agents are connect to your instance.
If you need help or have any question, join our Slack to chat to us – https://slack.weave.works.
Happy hacking!
Check (wait around 10 minutes for everything to deploy):
kubectl get pods -n weave
NAME READY STATUS RESTARTS AGE
kube-state-metrics-64599b7996-d8pnw 1/1 Running 0 29m
prom-node-exporter-2lwbn 1/1 Running 0 29m
prometheus-5586cdd667-dtdqq 2/2 Running 0 29m
weave-agent-6c77dbc569-xc9qx 1/1 Running 0 29m
weave-flux-agent-65cb4694d8-sllks 1/1 Running 0 29m
weave-flux-memcached-676f88fcf7-ktwnp 1/1 Running 0 29m
weave-scope-agent-7lgll 1/1 Running 0 29m
weave-scope-cluster-agent-8fb596b6b-mddv8 1/1 Running 0 29m
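Instead of checking by hand after roughly 10 minutes, the wait can be scripted. This is a minimal sketch that assumes the default kubectl get pods column order shown above (NAME READY STATUS RESTARTS AGE). all_pods_ready is a hypothetical helper, and Pods in Completed state would be counted as not ready.

```shell
# Hypothetical helper: read "kubectl get pods" output on stdin and succeed only if
# there is at least one Pod and every Pod is Running with all containers ready.
all_pods_ready() {
  awk 'NR == 1 { next }                 # skip the header row
       {
         n++
         split($2, r, "/")              # READY column, e.g. "2/2"
         if (r[1] != r[2] || $3 != "Running") bad++
       }
       END { if (n == 0 || bad) exit 1; exit 0 }'
}

# Poll until the Weave Cloud Pods are up (10-second interval is arbitrary):
# until kubectl get pods -n weave | all_pods_ready; do sleep 10; done
```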
[vkryvoruchko@nested-vm-image1 bin]$ kubectl get all -n weave
NAME READY STATUS RESTARTS AGE
pod/kube-state-metrics-64599b7996-d8pnw 1/1 Running 0 30m
pod/prom-node-exporter-2lwbn 1/1 Running 0 30m
pod/prometheus-5586cdd667-dtdqq 2/2 Running 0 30m
pod/weave-agent-6c77dbc569-xc9qx 1/1 Running 0 30m
pod/weave-flux-agent-65cb4694d8-sllks 1/1 Running 0 30m
pod/weave-flux-memcached-676f88fcf7-ktwnp 1/1 Running 0 30m
pod/weave-scope-agent-7lgll 1/1 Running 0 30m
pod/weave-scope-cluster-agent-8fb596b6b-mddv8 1/1 Running 0 30m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/prometheus ClusterIP 10.108.197.29 <none> 80/TCP 30m
service/weave-flux-memcached ClusterIP None <none> 11211/TCP 30m
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/prom-node-exporter 1 1 1 1 1 <none> 30m
daemonset.apps/weave-scope-agent 1 1 1 1 1 <none> 30m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/kube-state-metrics 1/1 1 1 30m
deployment.apps/prometheus 1/1 1 1 30m
deployment.apps/weave-agent 1/1 1 1 31m
deployment.apps/weave-flux-agent 1/1 1 1 30m
deployment.apps/weave-flux-memcached 1/1 1 1 30m
deployment.apps/weave-scope-cluster-agent 1/1 1 1 30m
NAME DESIRED CURRENT READY AGE
replicaset.apps/kube-state-metrics-64599b7996 1 1 1 30m
replicaset.apps/prometheus-5586cdd667 1 1 1 30m
replicaset.apps/weave-agent-69fbf74889 0 0 0 31m
replicaset.apps/weave-agent-6c77dbc569 1 1 1 30m
replicaset.apps/weave-flux-agent-65cb4694d8 1 1 1 30m
replicaset.apps/weave-flux-memcached-676f88fcf7 1 1 1 30m
replicaset.apps/weave-scope-cluster-agent-8fb596b6b 1 1 1 30m
Log in to https://cloud.weave.works/ and check the same there:
Started installing agents on Kubernetes cluster v1.17.2
All Weave Cloud agents are connected!

Kubernetes services sometime no response

My cluster contains 1 master and 2 worker nodes, with 1 Deployment running 2 replicas and 1 Service. When I try to access the Service with curl <ClusterIP>:<port> from either worker node, sometimes I get the Nginx welcome page, but sometimes the request hangs and ends with connection refused or a timeout.
I checked that the Service, Pods and endpoints are fine, but I have no clue what is going on. Please advise.
vagrant@k8s-master:~/_projects/tmp1$ sudo kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-master Ready master 23d v1.12.2 192.168.205.10 <none> Ubuntu 16.04.4 LTS 4.4.0-139-generic docker://17.3.2
k8s-worker1 Ready <none> 23d v1.12.2 192.168.205.11 <none> Ubuntu 16.04.4 LTS 4.4.0-139-generic docker://17.3.2
k8s-worker2 Ready <none> 23d v1.12.2 192.168.205.12 <none> Ubuntu 16.04.4 LTS 4.4.0-139-generic docker://17.3.2
vagrant@k8s-master:~/_projects/tmp1$ sudo kubectl get pod -o wide --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
default my-nginx-756f645cd7-pfdck 1/1 Running 0 5m23s 10.244.2.39 k8s-worker2 <none>
default my-nginx-756f645cd7-xpbnp 1/1 Running 0 5m23s 10.244.1.40 k8s-worker1 <none>
kube-system coredns-576cbf47c7-ljx68 1/1 Running 18 23d 10.244.0.38 k8s-master <none>
kube-system coredns-576cbf47c7-nwlph 1/1 Running 18 23d 10.244.0.39 k8s-master <none>
kube-system etcd-k8s-master 1/1 Running 18 23d 192.168.205.10 k8s-master <none>
kube-system kube-apiserver-k8s-master 1/1 Running 18 23d 192.168.205.10 k8s-master <none>
kube-system kube-controller-manager-k8s-master 1/1 Running 18 23d 192.168.205.10 k8s-master <none>
kube-system kube-flannel-ds-54xnb 1/1 Running 2 2d5h 192.168.205.12 k8s-worker2 <none>
kube-system kube-flannel-ds-9q295 1/1 Running 2 2d5h 192.168.205.11 k8s-worker1 <none>
kube-system kube-flannel-ds-q25xw 1/1 Running 2 2d5h 192.168.205.10 k8s-master <none>
kube-system kube-proxy-gkpwp 1/1 Running 15 23d 192.168.205.11 k8s-worker1 <none>
kube-system kube-proxy-gncjh 1/1 Running 18 23d 192.168.205.10 k8s-master <none>
kube-system kube-proxy-m4jfm 1/1 Running 15 23d 192.168.205.12 k8s-worker2 <none>
kube-system kube-scheduler-k8s-master 1/1 Running 18 23d 192.168.205.10 k8s-master <none>
kube-system kubernetes-dashboard-77fd78f978-4r62r 1/1 Running 15 23d 10.244.1.38 k8s-worker1 <none>
vagrant@k8s-master:~/_projects/tmp1$ sudo kubectl get svc -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 23d <none>
my-nginx ClusterIP 10.98.9.75 <none> 80/TCP 75s run=my-nginx
vagrant@k8s-master:~/_projects/tmp1$ sudo kubectl get endpoints
NAME ENDPOINTS AGE
kubernetes 192.168.205.10:6443 23d
my-nginx 10.244.1.40:80,10.244.2.39:80 101s
This sounds odd, but it could be that one of your Pods is serving traffic and the other is not. You can try shelling into the Pods:
$ kubectl exec -it my-nginx-756f645cd7-pfdck -- sh
$ kubectl exec -it my-nginx-756f645cd7-xpbnp -- sh
Inside each Pod, check whether it is listening on port 80:
$ curl localhost:80
You can also check whether your Service has the two endpoints 10.244.1.40:80 and 10.244.2.39:80:
$ kubectl get ep my-nginx
$ kubectl get ep my-nginx -o=yaml
Also, try to connect to each one of your endpoints from a node:
$ curl 10.244.1.40:80
$ curl 10.244.2.39:80
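Probing each endpoint separately, with a short timeout, makes a single bad backend easy to spot. This is a minimal sketch: split_endpoints and probe_endpoints are hypothetical helpers, the ENDPOINTS value is passed in the ip:port,ip:port form printed by kubectl get endpoints, and the 3-second timeout is an arbitrary choice.

```shell
# Hypothetical helper (bash): turn "ip:port,ip:port" into one endpoint per line.
split_endpoints() {
  tr ',' '\n' <<<"$1"
}

# Hypothetical helper: curl each endpoint once with a short timeout and report
# OK/FAILED per endpoint, so one bad backend stands out from the other.
probe_endpoints() {
  local ep
  split_endpoints "$1" | while read -r ep; do
    if curl -s -o /dev/null --max-time 3 "http://$ep/"; then
      echo "$ep OK"
    else
      echo "$ep FAILED"
    fi
  done
}

# From a node, with the endpoints from the question:
# probe_endpoints 10.244.1.40:80,10.244.2.39:80
```

If one endpoint fails consistently while the other succeeds, that would point at one Pod, or at the pod network path to the node it runs on, rather than at the Service itself.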