I have created a 3-node Azure Kubernetes Service (AKS) cluster using the following commands:
az group create --name ResourceGroup --location canadacentral
az provider register -n Microsoft.ContainerService
az provider register -n Microsoft.Compute
az provider register -n Microsoft.Network
az aks create --resource-group ResourceGroup --name ReplicaSet --node-count 3 --kubernetes-version 1.8.7 --node-vm-size Standard_A0 --generate-ssh-keys
kubectl create -f https://raw.githubusercontent.com/openebs/openebs/master/k8s/openebs-operator.yaml
kubectl create -f https://raw.githubusercontent.com/openebs/openebs/master/k8s/openebs-storageclasses.yaml
Subsequently I created a Postgres StatefulSet as well, which does not start because open-iscsi is not installed on the kubelet.
Kubelet logs from Node-1 (where the pgset pod is scheduled)
I0313 05:42:41.910525 7845 reconciler.go:257] operationExecutor.MountVolume started for volume "pvc-a980c1e4-2674-11e8-a384-0a58ac1f03e3" (UniqueName: "kubernetes.io/iscsi/10.0.20.229:3260:iqn.2016-09.com.openebs.jiva:pvc-a980c1e4-2674-11e8-a384-0a58ac1f03e3:0") pod "pgset-0" (UID: "a9826973-2674-11e8-a384-0a58ac1f03e3")
I0313 05:42:41.910605 7845 operation_generator.go:416] MountVolume.WaitForAttach entering for volume "pvc-a980c1e4-2674-11e8-a384-0a58ac1f03e3" (UniqueName: "kubernetes.io/iscsi/10.0.20.229:3260:iqn.2016-09.com.openebs.jiva:pvc-a980c1e4-2674-11e8-a384-0a58ac1f03e3:0") pod "pgset-0" (UID: "a9826973-2674-11e8-a384-0a58ac1f03e3") DevicePath ""
E0313 05:42:41.910744 7845 iscsi_util.go:207] iscsi: could not read iface default error:
E0313 05:42:41.910815 7845 nestedpendingoperations.go:264] Operation for "\"kubernetes.io/iscsi/10.0.20.229:3260:iqn.2016-09.com.openebs.jiva:pvc-a980c1e4-2674-11e8-a384-0a58ac1f03e3:0\"" failed. No retries permitted until 2018-03-13 05:44:43.910784094 +0000 UTC (durationBeforeRetry 2m2s). Error: MountVolume.WaitForAttach failed for volume "pvc-a980c1e4-2674-11e8-a384-0a58ac1f03e3" (UniqueName: "kubernetes.io/iscsi/10.0.20.229:3260:iqn.2016-09.com.openebs.jiva:pvc-a980c1e4-2674-11e8-a384-0a58ac1f03e3:0") pod "pgset-0" (UID: "a9826973-2674-11e8-a384-0a58ac1f03e3") : executable file not found in $PATH
E0313 05:43:12.080406 7845 kubelet.go:1628] Unable to mount volumes for pod "pgset-0_default(a9826973-2674-11e8-a384-0a58ac1f03e3)": timeout expired waiting for volumes to attach/mount for pod "default"/"pgset-0". list of unattached/unmounted volumes=[pgdata]; skipping pod
E0313 05:43:12.081262 7845 pod_workers.go:182] Error syncing pod a9826973-2674-11e8-a384-0a58ac1f03e3 ("pgset-0_default(a9826973-2674-11e8-a384-0a58ac1f03e3)"), skipping: timeout expired waiting for volumes to attach/mount for pod "default"/"pgset-0". list of unattached/unmounted volumes=[pgdata]
My question is whether there is a way to configure the cluster so that the kubelet comes up by default with the open-iscsi initiator utils installed and running.
The following steps were followed to manually install the iSCSI initiator in the kubelet container (a consolidated sketch follows the steps):
SSH into the Kubernetes Nodes
Identify the docker container running the kubelet using sudo docker ps.
Enter the kubelet container shell
sudo docker exec -it kubelet_container_id bash
Install open-iscsi.
apt-get update
apt install -y open-iscsi
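Put together, a minimal sketch of the workaround for each node follows; the name=kubelet filter is an assumption and should be adjusted to whatever sudo docker ps actually shows for the kubelet container on your AKS node image.
# Locate the kubelet container and install open-iscsi inside it
KUBELET_ID=$(sudo docker ps --filter "name=kubelet" --format "{{.ID}}" | head -n 1)
sudo docker exec "$KUBELET_ID" apt-get update
sudo docker exec "$KUBELET_ID" apt-get install -y open-iscsi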
Related
I am moving from Docker Desktop to Minikube and have been having some trouble getting MetalLB to work properly. I am starting Minikube on macOS Monterey.
I've started a Minikube profile using the command below:
minikube start -p myprofile --cpus=4 --memory='32g' --disk-size='100000mb' \
  --driver=hyperkit --kubernetes-version=v1.21.8 --addons=metallb
When I check the pods for MetalLB, they are in an ImagePullBackOff status. The pods are trying to pull images docker.io/metallb/controller:v0.9.6 and docker.io/metallb/speaker:v0.9.6 respectively.
NAME READY STATUS RESTARTS AGE
controller-5fd6788656-jvj4m 0/1 ImagePullBackOff 0 26m
speaker-ctdmw 0/1 ImagePullBackOff 0 37m
After running eval $(minikube -p myprofile docker-env) and manually pulling with docker pull docker.io/metallb/speaker:v0.9.6, I get the error:
Error response from daemon: Get "https://registry-1.docker.io/v2/": dial tcp: lookup registry-1.docker.io on <ip-address>:53: read udp <ip-address>:49978-><ip-address>:53: i/o timeout
I'm not certain whether it's useful, but after SSHing into the Minikube node, I've also verified that ping google.com does not return a result.
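As a rough diagnostic sketch (not part of the original setup, and assuming nslookup is available in the VM image), you could compare DNS behaviour from inside the VM:
minikube ssh -p myprofile          # open a shell inside the minikube VM
cat /etc/resolv.conf               # see which nameserver the VM is configured to use
nslookup registry-1.docker.io      # check whether the registry name resolves at all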
When starting my Minikube profile, I had the following output:
😄 [myprofile] minikube v1.28.0 on Darwin 12.3.1
🆕 Kubernetes 1.25.3 is now available. If you would like to upgrade, specify: --kubernetes-version=v1.25.3
✨ Using the hyperkit driver based on existing profile
👍 Starting control plane node myprofile in cluster myprofile
🔄 Restarting existing hyperkit VM for "myprofile" ...
❗ This VM is having trouble accessing https://k8s.gcr.io
💡 To pull new external images, you may need to configure a proxy: https://minikube.sigs.k8s.io/docs/reference/networking/proxy/
🐳 Preparing Kubernetes v1.21.8 on Docker 20.10.20 ...
🔎 Verifying Kubernetes components...
▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
▪ Using image metallb/speaker:v0.9.6
▪ Using image metallb/controller:v0.9.6
🌟 Enabled addons: storage-provisioner, metallb, default-storageclass
❗ /usr/local/bin/kubectl is version 1.25.4, which may have incompatibilities with Kubernetes 1.21.8.
▪ Want kubectl v1.21.8? Try 'minikube kubectl -- get pods -A'
🏄 Done! kubectl is now configured to use "myprofile" cluster and "default" namespace by default
When I reboot the master and worker node, the coredns pod shows the error message below; it seems the kubelet cannot recreate the pod sandbox after the server restarts.
Normal SandboxChanged 12s kubelet, izbp1dyjigsfwmw0dtl85gz Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 11s kubelet, izbp1dyjigsfwmw0dtl85gz Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "5e850ee3e8bf86688fec2badd9b0272127a0d775620a5783e7c30b4e0d412b01" network for pod "coredns-6955765f44-4xnhj": networkPlugin cni failed to set up pod "coredns-6955765f44-4xnhj_kube-system" network: open /run/flannel/subnet.env: no such file or directory
You can try cleaning up flannel and reinstalling it.
kubectl delete -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml
rm -rf /var/lib/cni/
rm -rf /run/flannel
rm -rf /etc/cni/
Remove the network interfaces related to flannel:
ip link
For each flannel interface listed, do the following (an example follows below):
ifconfig <name of interface from ip link> down
ip link delete <name of interface from ip link>
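For example, on a typical flannel setup the interface is usually named flannel.1 (the name here is an assumption and may differ on your nodes):
ifconfig flannel.1 down
ip link delete flannel.1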
After this, reinstall flannel:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml
For flannel to work correctly, you must pass --pod-network-cidr=10.244.0.0/16 to kubeadm init
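For example, a minimal (re)initialization with the flannel-compatible CIDR would look like:
kubeadm init --pod-network-cidr=10.244.0.0/16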
I am trying to set up my first cluster using Kubernetes 1.13.1. The master initialized okay, but both of my worker nodes are NotReady. kubectl describe node shows that the Kubelet stopped posting node status on both worker nodes. On one of the worker nodes I get log output like:
> kubelet[3680]: E0107 20:37:21.196128 3680 kubelet.go:2266] node "xyz" not found.
Here is the full details:
I am using Centos 7 & Kubernetes 1.13.1.
Initializing was done as follows:
[root@master ~]# kubeadm init --apiserver-advertise-address=10.142.0.4 --pod-network-cidr=10.142.0.0/24
Successfully initialized the cluster:
You can now join any number of machines by running the following on each node
as root:
`kubeadm join 10.142.0.4:6443 --token y0epoc.zan7yp35sow5rorw --discovery-token-ca-cert-hash sha256:f02d43311c2696e1a73e157bda583247b9faac4ffb368f737ee9345412c9dea4`
Deployed the flannel CNI:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
The join command worked fine.
[kubelet-start] Activating the kubelet service
[tlsbootstrap] Waiting for the kubelet to perform the TLS Bootstrap...
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "node01" as an annotation
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the master to see this node join the cluster.
Result of kubectl get nodes:
[root@master ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready master 9h v1.13.1
node01 NotReady <none> 9h v1.13.1
node02 NotReady <none> 9h v1.13.1
On both nodes:
[root@node01 ~]# service kubelet status
Redirecting to /bin/systemctl status kubelet.service
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Tue 2019-01-08 04:49:20 UTC; 32s ago
Docs: https://kubernetes.io/docs/
Main PID: 4224 (kubelet)
Memory: 31.3M
CGroup: /system.slice/kubelet.service
└─4224 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfi
`Jan 08 04:54:10 node01 kubelet[4224]: E0108 04:54:10.957115 4224 kubelet.go:2266] node "node01" not found`
I would appreciate your advice on how to troubleshoot this.
The previous answer sounds correct. You can verify that by running
kubectl describe node node01 on the master, or wherever kubectl is correctly configured.
It seems the reason for this error is an incorrect subnet. The Flannel documentation states that you should use /16, not /24, for the pod network.
NOTE: If kubeadm is used, then pass --pod-network-cidr=10.244.0.0/16
to kubeadm init to ensure that the podCIDR is set.
I tried to run kubeadm with /24, and although the nodes were in the Ready state, the flannel pods did not run properly, which resulted in some issues.
You can check whether your flannel pods are running properly with kubectl get pods -n kube-system. If the status is anything other than Running, something is wrong; in that case you can check the details by running kubectl describe pod PODNAME -n kube-system. Try changing the subnet and let us know whether that fixes the problem.
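If you do change the subnet, a minimal sketch (assuming you are able to reset and re-initialize the cluster, reusing the apiserver address and flannel manifest from the question) would be:
kubeadm reset
kubeadm init --apiserver-advertise-address=10.142.0.4 --pod-network-cidr=10.244.0.0/16
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml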
I ran into almost the same problem, and in the end I found that the reason was that the firewall had not been turned off. You can try the following commands:
sudo ufw disable
or
systemctl disable firewalld
or
setenforce 0 (note: this sets SELinux to permissive rather than disabling the firewall)
I am following this tutorial with two VMs running CentOS 7. Everything looks fine (no errors during installation/setup), but I can't see my nodes.
NOTE:
I am running this on VMware VMs
kub1 is my master and kub2 my worker node
kubectl cluster-info output:
[root@kub1 ~]# kubectl cluster-info
Kubernetes master is running at http://kub1:8080
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
[root@kub2 ~]# kubectl cluster-info
Kubernetes master is running at http://kub1:8080
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
nodes:
[root@kub1 ~]# kubectl get nodes
[root@kub1 ~]# kubectl get nodes -a
[root@kub1 ~]#
[root@kub2 ~]# kubectl get nodes -a
[root@kub2 ~]# kubectl get no
[root@kub2 ~]#
cluster events:
[root@kub1 ~]# kubectl get events -a
LASTSEEN FIRSTSEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
1h 1h 1 kub2.local Node Normal Starting {kube-proxy kub2.local} Starting kube-proxy.
1h 1h 1 kub2.local Node Normal Starting {kube-proxy kub2.local} Starting kube-proxy.
1h 1h 1 kub2.local Node Normal Starting {kubelet kub2.local} Starting kubelet.
1h 1h 1 node-kub2 Node Normal Starting {kubelet node-kub2} Starting kubelet.
1h 1h 1 node-kub2 Node Normal Starting {kubelet node-kub2} Starting kubelet.
/var/log/messages:
kubelet.go:1194] Unable to construct api.Node object for kubelet: can't get ip address of node node-kub2: lookup node-kub2: no such host
QUESTION: Any idea why my nodes are not shown by "kubectl get nodes"?
My issue was that the KUBELET_HOSTNAME value in /etc/kubernetes/kubelet didn't match the hostname.
I commented out that line, then restarted the services, and I could see my worker after that.
Hope that helps.
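For reference, a rough sketch of the change, assuming the package-based config file at /etc/kubernetes/kubelet (both the path and the placeholder value here are assumptions):
# In /etc/kubernetes/kubelet, comment out the hostname override so the
# kubelet registers under the node's real hostname:
#   KUBELET_HOSTNAME="--hostname-override=<old-name>"
# then restart the services:
systemctl restart kubelet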
Not sure about your scenario, but I solved it after 3-4 hours of effort.
Solved
I was facing this issue because my Docker cgroup driver was different from the Kubernetes cgroup driver.
I updated it to cgroupfs using the following commands, as mentioned in the docs.
cat << EOF > /etc/docker/daemon.json
{
"exec-opts": ["native.cgroupdriver=cgroupfs"]
}
EOF
Restart the Docker service: service docker restart.
Reset Kubernetes on the worker node: kubeadm reset
Join the master again: kubeadm join <><>
The node was then visible on the master using kubectl get nodes.
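To confirm the drivers now match, a quick check (sketch):
docker info | grep -i "cgroup driver"    # should report cgroupfs after the change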
I had a similar problem after installing k8s using kubespray on Fedora 31. To debug the issue, I tried to run a random container directly with docker run, which failed with:
docker: Error response from daemon: cgroups: cgroup mountpoint does not exist: unknown.
This is a known problem caused by Fedora 31 defaulting to cgroups v2; the fix is to update GRUB to use the legacy cgroup hierarchy (and reboot for the change to take effect):
sudo dnf install grubby
sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"
I created the Kubernetes cluster using kubeadm (kubeadm init).
I am getting error messages in /var/log/messages.
Oct 20 10:09:52 aws08 kubelet: I1020 10:09:52.015921 7116 docker_manager.go:1787] DNS ResolvConfPath exists: /var/lib/docker/containers/717adf7a8481637ac20a9ba103d8f97635a88bf05f18bd4299f0d164e48f2920/resolv.conf. Will attempt to add ndots option: options ndots:5
Oct 20 10:09:52 aws08 kubelet: I1020 10:09:52.015963 7116 docker_manager.go:2121] Calling network plugin cni to setup pod for kube-dns-2247936740-cjij4_kube-system(3b296413-96aa-11e6-8c40-02fff663a168)
Oct 20 10:09:52 aws08 kubelet: E1020 10:09:52.015982 7116 docker_manager.go:2127] Failed to setup network for pod "kube-dns-2247936740-cjij4_kube-system(3b296413-96aa-11e6-8c40-02fff663a168)" using network plugins "cni": cni config unintialized; Skipping pod
Oct 20 10:09:52 aws08 kubelet: I1020 10:09:52.018824 7116 docker_manager.go:1492] Killing container "717adf7a8481637ac20a9ba103d8f97635a88bf05f18bd4299f0d164e48f2920 kube-system/kube-dns-2247936740-cjij4" with 30 second grace period
The DNS pod is failing:
kube-system kube-dns-2247936740-j5rtc 0/3 ContainerCreating 21 1h
If I disable CNI, the DNS pod runs, but the DNS issue persists.
The method to disable CNI is to comment out the KUBELET_NETWORK_ARGS line in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf and restart the kubelet service:
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--kubeconfig=/etc/kubernetes/kubelet.conf --require-kubeconfig=true"
Environment="KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true"
# Environment="KUBELET_NETWORK_ARGS=--network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin"
Environment="KUBELET_DNS_ARGS=--cluster-dns=100.64.0.10 --cluster-domain=cluster.local"
Environment="KUBELET_EXTRA_ARGS=--v=4"
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_NETWORK_ARGS $KUBELET_DNS_ARGS $KUBELET_EXTRA_ARGS
followed by:
sudo systemctl restart kubelet
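Because a systemd drop-in was edited, a daemon reload is usually needed before the restart picks up the change; a minimal sketch of the full sequence:
sudo systemctl daemon-reload
sudo systemctl restart kubelet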
I'm guessing that you forgot to set up the pod network.
From the documentation:
It is necessary to do this before you try to deploy any applications to your cluster, and before kube-dns will start up. Note also that kubeadm only supports CNI based networks and therefore kubenet based networks will not work.
You can install a pod network add-on with the following command:
kubectl apply -f <add-on.yaml>
For example, to install the Weave Net add-on:
kubectl create -f https://git.io/weave-kube
After you have done this, you might need to recreate the kube-dns pod.
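A minimal sketch of recreating it, assuming the standard k8s-app=kube-dns label (the deployment will spin up a replacement pod):
kubectl delete pod -n kube-system -l k8s-app=kube-dns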
The CNI initialization should be completed during kubelet initialization, so try restarting the kubelet service and make sure that the CNI configuration can be parsed correctly.
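A rough sketch of that check:
sudo systemctl restart kubelet
journalctl -u kubelet --no-pager | grep -i cni    # look for CNI config parse errors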