Description
I am relatively new to Kubernetes. I can run my cluster using the default socket (/var/run/dockershim.sock), but when I tried the CRI-O socket to pull images from my private repo, the pull speed was not even comparable.
I am trying to configure all my nodes to use the crio.sock socket, but I am failing to launch the master node with it.
I followed both the Kubernetes documentation (Configuring each kubelet in your cluster using kubeadm) and the CRI-O documentation on GitHub.
Unfortunately I am not able to get it working, as it seems to ignore the private repo flag.
Steps to reproduce the issue:
Launch a master node (primary) with the following init command (using a private repo):
kubeadm init \
--upload-certs \
--cri-socket=/var/run/crio/crio.sock \
--node-name=my_node_name \
--image-repository=my.private.repo \
--pod-network-cidr=10.96.0.0/16 \
--kubernetes-version=v1.18.2 \
--control-plane-endpoint=ip:6443 \
--apiserver-cert-extra-sans=ip \
--apiserver-advertise-address=ip
Run as root or with sudo: journalctl -xeu crio -f
Observe the log sample below (with CRI-O logging at debug or info level)
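If CRI-O is not already logging at debug level, one way to raise the verbosity (a sketch assuming the stock /etc/crio/crio.conf layout; check the section name against your own file) is:
# /etc/crio/crio.conf (excerpt)
[crio.runtime]
log_level = "debug"
followed by sudo systemctl restart crio before re-running the journalctl command above.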
Describe the results you received:
Sample of logs from crio in debug mode:
Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.043499089+02:00" level=debug msg="Trying to access \"k8s.gcr.io/pause:3.2\"" file="docker/docker_image_src.go:68"
Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.043547722+02:00" level=debug msg="Credentials not found" file="config/config.go:123"
Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.043576124+02:00" level=debug msg="Using registries.d directory /etc/containers/registries.d for sigstore configuration" file="docker/lookaside.go:51"
Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.043706369+02:00" level=debug msg=" Using \"default-docker\" configuration" file="docker/lookaside.go:169"
Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.043736378+02:00" level=debug msg=" No signature storage configuration found for k8s.gcr.io/pause:3.2" file="docker/lookaside.go:174"
Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.043769424+02:00" level=debug msg="Looking for TLS certificates and private keys in /etc/docker/certs.d/k8s.gcr.io" file="tlsclientconfig/tlsclientconfig.go:21"
Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.043858410+02:00" level=debug msg="GET https://k8s.gcr.io/v2/" file="docker/docker_client.go:516"
Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.046154250+02:00" level=debug msg="Ping https://k8s.gcr.io/v2/ err Get \"https://k8s.gcr.io/v2/\": dial tcp 10.254.3.15:443: connect: connection refused (&url.Error{Op:\"Get\", URL:\"https://k8s.gcr.io/v2/\", Err:(*net.OpError)(0xc00084d5e0)})" file="docker/docker_client.go:708"
Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.046239456+02:00" level=debug msg="GET https://k8s.gcr.io/v1/_ping" file="docker/docker_client.go:516"
Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.048653448+02:00" level=debug msg="Ping https://k8s.gcr.io/v1/_ping err Get \"https://k8s.gcr.io/v1/_ping\": dial tcp 10.254.3.15:443: connect: connection refused (&url.Error{Op:\"Get\", URL:\"https://k8s.gcr.io/v1/_ping\", Err:(*net.OpError)(0xc0006b0690)})" file="docker/docker_client.go:735"
Describe the results you expected:
The node launches successfully using the CRI-O socket.
Additional information you deem important (e.g. issue happens only occasionally):
If I launch the node using the default socket e.g.:
# kubeadm init \
--upload-certs \
--cri-socket=/var/run/dockershim.sock \
--node-name=my_node_name \
--image-repository=my.private.repo \
--pod-network-cidr=10.96.0.0/16 \
--kubernetes-version=v1.18.2 \
--control-plane-endpoint=ip:6443 \
--apiserver-cert-extra-sans=ip \
--apiserver-advertise-address=ip
W0630 20:24:33.223266 29033 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.18.2
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Using existing ca certificate authority
[certs] Using existing apiserver certificate and key on disk
[certs] Using existing apiserver-kubelet-client certificate and key on disk
[certs] Using existing front-proxy-ca certificate authority
[certs] Using existing front-proxy-client certificate and key on disk
[certs] Using existing etcd/ca certificate authority
[certs] Using existing etcd/server certificate and key on disk
[certs] Using existing etcd/peer certificate and key on disk
[certs] Using existing etcd/healthcheck-client certificate and key on disk
[certs] Using existing apiserver-etcd-client certificate and key on disk
[certs] Using the existing "sa" key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/admin.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/controller-manager.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/scheduler.conf"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
W0630 20:24:35.839949 29033 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[control-plane] Creating static Pod manifest for "kube-scheduler"
W0630 20:24:35.841420 29033 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 11.003647 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.18" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
key
[mark-control-plane] Marking the node hostname as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node hostname as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: token
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of the control-plane node running the following command on each as root:
kubeadm join ip:6443 --token token \
--discovery-token-ca-cert-hash sha256:hash \
--control-plane --certificate-key key
Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join ip:6443 --token token \
--discovery-token-ca-cert-hash sha256:hash
If I launch the node with the CRI-O socket:
# kubeadm init \
--upload-certs \
--cri-socket=/var/run/crio/crio.sock \
--node-name=my_node_name \
--image-repository=my.private.repo \
--pod-network-cidr=10.96.0.0/16 \
--kubernetes-version=v1.18.2 \
--control-plane-endpoint=ip:6443 \
--apiserver-cert-extra-sans=ip \
--apiserver-advertise-address=ip
W0630 20:32:33.827957 2916 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.18.2
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [hostname kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.96.134.57 10.96.134.57 10.96.134.57]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [hostname localhost] and IPs [10.96.134.57 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [hostname localhost] and IPs [10.96.134.57 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
W0630 20:32:37.829806 2916 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[control-plane] Creating static Pod manifest for "kube-scheduler"
W0630 20:32:37.830826 2916 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all Kubernetes containers running in cri-o/containerd using crictl:
- 'crictl --runtime-endpoint /var/run/crio/crio.sock ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'crictl --runtime-endpoint /var/run/crio/crio.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
I can see that localhost is listening on port 10248:
# curl -sSL http://localhost:10248/healthz
ok
Sample query against the CRI-O socket (as described in the documentation):
# curl -v --unix-socket /var/run/crio/crio.sock http://localhost/info | jq
* About to connect() to localhost port 80 (#0)
* Trying /var/run/crio/crio.sock...
* Failed to set TCP_KEEPIDLE on fd 3
* Failed to set TCP_KEEPINTVL on fd 3
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Connected to localhost (/var/run/crio/crio.sock) port 80 (#0)
> GET /info HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< Date: Tue, 30 Jun 2020 18:36:35 GMT
< Content-Length: 240
<
{ [data not shown]
100 240 100 240 0 0 144k 0 --:--:-- --:--:-- --:--:-- 234k
* Connection #0 to host localhost left intact
{
"storage_driver": "overlay2",
"storage_root": "/var/lib/containers/storage",
"cgroup_driver": "systemd",
"default_id_mappings": {
"uids": [
{
"container_id": 0,
"host_id": 0,
"size": 4294967295
}
],
"gids": [
{
"container_id": 0,
"host_id": 0,
"size": 4294967295
}
]
}
}
Output of kubelet status
# systemctl status kubelet -l
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Tue 2020-06-30 20:39:49 CEST; 6s ago
Docs: https://kubernetes.io/docs/
Main PID: 8502 (kubelet)
Tasks: 15
Memory: 20.1M
CGroup: /system.slice/kubelet.service
└─8502 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --hostname-override=hostname
Jun 30 20:39:55 hostname kubelet[8502]: I0630 20:39:55.369441 8502 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Jun 30 20:39:55 hostname kubelet[8502]: I0630 20:39:55.399015 8502 kubelet_node_status.go:70] Attempting to register node hostname
Jun 30 20:39:55 hostname kubelet[8502]: E0630 20:39:55.403707 8502 kubelet.go:2267] node "hostname" not found
Jun 30 20:39:55 hostname kubelet[8502]: E0630 20:39:55.503871 8502 kubelet.go:2267] node "hostname" not found
Jun 30 20:39:55 hostname kubelet[8502]: E0630 20:39:55.604115 8502 kubelet.go:2267] node "hostname" not found
Jun 30 20:39:55 hostname kubelet[8502]: E0630 20:39:55.704324 8502 kubelet.go:2267] node "hostname" not found
Jun 30 20:39:55 hostname kubelet[8502]: E0630 20:39:55.769448 8502 kubelet_node_status.go:92] Unable to register node "hostname" with API server: Post https://ip:6443/api/v1/nodes: dial tcp ip:6443: connect: connection refused
Jun 30 20:39:55 hostname kubelet[8502]: E0630 20:39:55.805779 8502 kubelet.go:2267] node "hostname" not found
Jun 30 20:39:55 hostname kubelet[8502]: E0630 20:39:55.906014 8502 kubelet.go:2267] node "hostname" not found
Jun 30 20:39:56 hostname kubelet[8502]: E0630 20:39:56.007272 8502 kubelet.go:2267] node "hostname" not found
From the little that I know, the network errors are not relevant, as I have not yet launched the network container, so these errors are expected at this point.
Output of crio --version:
# crio --version
crio version 1.18.2
Version: 1.18.2
GitCommit: 7f261aeebffed079b4475dde8b9d602b01973d33
GitTreeState: clean
BuildDate: 2020-06-18T21:05:27Z
GoVersion: go1.14
Compiler: gc
Platform: linux/amd64
Linkmode: static
Output of kubelet --version:
# kubelet --version
Kubernetes v1.18.2
Output of the Linux OS version:
# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.8 (Maipo)
Additional environment details (AWS, VirtualBox, physical, etc.):
The installation is on a bare-metal node.
kubelet file sample
# cat /etc/default/kubelet
KUBELET_EXTRA_ARGS=--feature-gates="AllAlpha=false,RunAsGroup=true" --container-runtime=remote --cgroup-driver=systemd --container-runtime-endpoint='unix:///var/run/crio/crio.sock' --runtime-request-timeout=5m
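After editing this file the kubelet has to be restarted to pick up the new flags; a minimal sequence, assuming the systemd-managed kubelet shown in the status output above:
sudo systemctl daemon-reload    # only strictly needed if unit/drop-in files changed
sudo systemctl restart kubelet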
Update: I have raised a ticket on GitHub: Kubernetes v1.18.2 with crio version 1.18.2 failing to sync with kubelet on RH7 #3915. It looks like there is a bug, as CRI-O is not able to process the remote repository and tries to pull from the default repo k8s.gcr.io. I will update the ticket as soon as I have more information.
So the problem is not exactly a bug in CRI-O, as we (and the CRI-O dev team) initially thought; rather, a number of configurations need to be applied if you want to use CRI-O as the CRI for Kubernetes together with a private repo.
I will not repeat all the CRI-O configuration here, as it is already documented in the ticket I raised with the team:
Kubernetes v1.18.2 with crio version 1.18.2 failing to sync with kubelet on RH7
#3915.
The first thing to configure is the container registries from which the images will be pulled:
$ cat /etc/containers/registries.conf
[[registry]]
prefix = "k8s.gcr.io"
insecure = false
blocked = false
location = "k8s.gcr.io"
[[registry.mirror]]
location = "my.private.repo"
CRI-O recommends passing the registries configuration as a flag to the kubelet (haircommander/cri-o-kubeadm), but for me it was not working with that configuration alone.
I went back to the Kubernetes manual, which recommends not passing the flag to the kubelet but setting it in the file /var/lib/kubelet/config.yaml at runtime. For me this is not possible, as the node needs to start with the CRI-O socket and not any other socket (ref Configure cgroup driver used by kubelet on control-plane node).
So I managed to get it up and running by passing this setting in my kubeadm config file, sample below:
$ cat /tmp/config.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
advertiseAddress: 1.2.3.4
bindPort: 6443
nodeRegistration:
criSocket: unix:///var/run/crio/crio.sock
name: node.name
taints:
- effect: NoSchedule
key: node-role.kubernetes.io/master
---
apiServer:
timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
type: CoreDNS
etcd:
local:
dataDir: /var/lib/etcd
controlPlaneEndpoint: 1.2.3.4:6443
imageRepository: my.private.repo
kind: ClusterConfiguration
kubernetesVersion: v1.18.2
networking:
dnsDomain: cluster.local
podSubnet: 10.85.0.0/16
serviceSubnet: 10.96.0.0/12
scheduler: {}
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
Then the user can simply start the master / worker node with the flag --config <file.yml> (as shown below) and the node launches successfully.
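For reference, the init call then reduces to something like (same config file path as in the sample above):
sudo kubeadm init --config /tmp/config.yaml --upload-certs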
Hope all the information here will help someone else.
I have set up a small cluster with kubeadm; it was working fine and port 6443 was up. But after rebooting my system, the cluster does not come up anymore.
What should I do?
Here is some information:
systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Sun 2020-04-05 14:16:44 UTC; 6s ago
Docs: https://kubernetes.io/docs/home/
Main PID: 31079 (kubelet)
Tasks: 20 (limit: 4915)
CGroup: /system.slice/kubelet.service
└─31079 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet
k8s.io/kubernetes/pkg/kubelet/kubelet.go:458: Failed to list *v1.Node: Get https://infra01.mydomainname.com:6443/api/v1/nodes?fieldSelector=metadata.name%3Dtest-infra01&limit=500&resourceVersion=0: dial tcp 116.66.187.210:6443: connect: connection refused
kubectl get nodes
The connection to the server infra01.mydomainname.com:6443 was refused - did you specify the right host or port?
kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-11T18:12:12Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}
journalctl -xeu kubelet
6 18167 reflector.go:153] k8s.io/kubernetes/pkg/kubelet/kubelet.go:458:
Failed to list *v1.Node: Get https://infra01.mydomainname.com
1 18167 reflector.go:153]
k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46: Failed to list *v1.Pod: Get https://huawei-infra01.s
4 18167 aws_credentials.go:77] while getting AWS credentials
NoCredentialProviders: no valid providers in chain. Deprecated.
messaging see aws.Config.CredentialsChainVerboseErrors
6 18167 kuberuntime_manager.go:211] Container runtime docker initialized,
version: 19.03.7, apiVersion: 1.40.0
6 18167 server.go:1113] Started kubelet
1 18167 kubelet.go:1302] Image garbage collection failed once. Stats
initialization may not have completed yet: failed to get imageF
8 18167 server.go:144] Starting to listen on 0.0.0.0:10250
4 18167 server.go:778] Starting healthz server failed: listen tcp
127.0.0.1:10248: bind: address already in use
5 18167 fs_resource_analyzer.go:64] Starting FS ResourceAnalyzer
4 18167 volume_manager.go:265] Starting Kubelet Volume Manager
1 18167 desired_state_of_world_populator.go:138] Desired state populator
starts to run
3 18167 server.go:384] Adding debug handlers to kubelet server.
4 18167 server.go:158] listen tcp 0.0.0.0:10250: bind: address already in
use
Docker
docker run hello-world
Hello from Docker!
ubuntu
lsb_release -a
Ubuntu 18.04.2 LTS
swap && kubeconfig
swap is turned off and kubeconfig was correctly exported
Note
Things can be fixed by resetting the cluster, but that should be the last resort.
The kubelet is not starting because the port it needs is already in use, so it cannot create the pod for the API server.
Use the following command to find out which process is holding port 10250:
[root@master admin]# ss -lntp | grep 10250
LISTEN 0 128 :::10250 :::* users:(("kubelet",pid=23373,fd=20))
This gives you the PID and the name of the process holding the port. If it is an unwanted process, you can kill it, and the port becomes available for the kubelet.
After killing the process, run the command above again; it should return nothing.
To be on the safe side, run kubeadm reset and then kubeadm init again, and it should go through.
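Put together, the sequence looks roughly like this (the PID comes from the ss output above; only kill it if it really is a stale or unwanted listener):
sudo kill 23373
ss -lntp | grep 10250      # should now print nothing
sudo kubeadm reset         # add -f to skip the confirmation prompt
sudo kubeadm init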
Edit:
Using snap stop kubelet did the trick of stopping kubelet on the node.
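For completeness, how you stop the kubelet depends on how it was installed; both of these are standard commands (pick whichever applies to your node):
sudo snap stop kubelet        # kubelet installed as a snap, as in the edit above
sudo systemctl stop kubelet   # kubelet managed by systemd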
I'm running Debian 9.6 with U-Boot on a BeagleBone Black, a 32-bit ARM system.
Image downloaded from:
https://rcn-ee.com/rootfs/bb.org/testing/2019-01-06/stretch-console/bone-debian-9.6-console-armhf-2019-01-06-1gb.img.xz
Kernel command-line parameters:
Jan 17 14:19:19 bbb-test kernel: Kernel command line: console=ttyO0,115200n8 bone_capemgr.enable_partno=BB-UART1,BB-UART4,BB-UART5 bone_capemgr.uboot_capemgr_enabled=1 root=/dev/mmcblk1p1 ro rootfstype=ext4 rootwait coherent_pool=1M net.ifnames=0 quiet cape_universal=enable cgroup_enable=cpuset cgroup_enable=memory cgroup_memory=1 swapaccount=1
debian@bbb-test:~$ uname -a
Linux bbb-test 4.14.79-ti-r84 #1 SMP PREEMPT Tue Nov 13 20:11:18 UTC 2018 armv7l GNU/Linux
Docker is installed and working on this machine. Kubelet log:
debian@bbb-test:~$ sudo kubelet
I0117 14:31:25.972837 2841 server.go:407] Version: v1.13.2
I0117 14:31:25.982041 2841 plugins.go:103] No cloud provider specified.
W0117 14:31:25.995051 2841 server.go:552] standalone mode, no API client
E0117 14:31:26.621471 2841 machine.go:194] failed to get cache information for node 0: open /sys/devices/system/cpu/cpu0/cache: no such file or directory
W0117 14:31:26.655651 2841 server.go:464] No api server defined - no events will be sent to API server.
I0117 14:31:26.657953 2841 server.go:666] --cgroups-per-qos enabled, but --cgroup-root was not specified. defaulting to /
I0117 14:31:26.664398 2841 container_manager_linux.go:248] container manager verified user specified cgroup-root exists: []
I0117 14:31:26.666142 2841 container_manager_linux.go:253] Creating Container Manager object based on Node Config: {RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:docker CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:cgroupfs KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:memory.available Operator:LessThan Value:{Quantity:100Mi Percentage:0} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.1} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.inodesFree Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>} {Signal:imagefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.15} GracePeriod:0s MinReclaim:<nil>}]} QOSReserved:map[] ExperimentalCPUManagerPolicy:none ExperimentalCPUManagerReconcilePeriod:10s ExperimentalPodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms}
I0117 14:31:26.675196 2841 container_manager_linux.go:272] Creating device plugin manager: true
I0117 14:31:26.676961 2841 state_mem.go:36] [cpumanager] initializing new in-memory state store
I0117 14:31:26.680692 2841 state_mem.go:84] [cpumanager] updated default cpuset: ""
I0117 14:31:26.682789 2841 state_mem.go:92] [cpumanager] updated cpuset assignments: "map[]"
I0117 14:31:26.760365 2841 client.go:75] Connecting to docker on unix:///var/run/docker.sock
I0117 14:31:26.762342 2841 client.go:104] Start docker client with request timeout=2m0s
W0117 14:31:26.820114 2841 docker_service.go:540] Hairpin mode set to "promiscuous-bridge" but kubenet is not enabled, falling back to "hairpin-veth"
I0117 14:31:26.821450 2841 docker_service.go:236] Hairpin mode set to "hairpin-veth"
W0117 14:31:26.832217 2841 cni.go:203] Unable to update cni config: No networks found in /etc/cni/net.d
W0117 14:31:26.932660 2841 hostport_manager.go:68] The binary conntrack is not installed, this can cause failures in network connection cleanup.
I0117 14:31:26.980052 2841 docker_service.go:251] Docker cri networking managed by kubernetes.io/no-op
I0117 14:31:27.241438 2841 docker_service.go:256] Docker Info: &{ID:YHXC:3CJD:MPM6:SQ6I:37PA:E7CY:YG32:EQZH:XKFS:5FLV:OHRN:UYQS Containers:0 ContainersRunning:0 ContainersPaused:0 ContainersStopped:0 Images:0 Driver:overlay2 DriverStatus:[[Backing Filesystem extfs] [Supports d_type true] [Native Overlay Diff true]] SystemStatus:[] Plugins:{Volume:[local] Network:[bridge host macvlan null overlay] Authorization:[] Log:[awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog]} MemoryLimit:true SwapLimit:true KernelMemory:true CPUCfsPeriod:true CPUCfsQuota:true CPUShares:true CPUSet:false IPv4Forwarding:true BridgeNfIptables:true BridgeNfIP6tables:true Debug:false NFd:24 OomKillDisable:true NGoroutines:44 SystemTime:2019-01-17T14:31:27.033459842Z LoggingDriver:json-file CgroupDriver:cgroupfs NEventsListener:0 KernelVersion:4.14.79-ti-r84 OperatingSystem:Debian GNU/Linux 9 (stretch) OSType:linux Architecture:armv7l IndexServerAddress:https://index.docker.io/v1/ RegistryConfig:0x65be080 NCPU:1 MemTotal:506748928 GenericResources:[] DockerRootDir:/var/lib/docker HTTPProxy: HTTPSProxy: NoProxy: Name:bbb-test Labels:[] ExperimentalBuild:false ServerVersion:18.06.1-ce ClusterStore: ClusterAdvertise: Runtimes:map[runc:{Path:docker-runc Args:[]}] DefaultRuntime:runc Swarm:{NodeID: NodeAddr: LocalNodeState:inactive ControlAvailable:false Error: RemoteManagers:[] Nodes:0 Managers:0 Cluster:<nil>} LiveRestoreEnabled:false Isolation: InitBinary:docker-init ContainerdCommit:{ID:468a545b9edcd5932818eb9de8e72413e616e86e Expected:468a545b9edcd5932818eb9de8e72413e616e86e} RuncCommit:{ID:69663f0bd4b60df09991c08812a60108003fa340 Expected:69663f0bd4b60df09991c08812a60108003fa340} InitCommit:{ID:fec3683 Expected:fec3683} SecurityOptions:[name=seccomp,profile=default]}
I0117 14:31:27.248202 2841 docker_service.go:269] Setting cgroupDriver to cgroupfs
I0117 14:31:27.558049 2841 kuberuntime_manager.go:198] Container runtime docker initialized, version: 18.06.1-ce, apiVersion: 1.38.0
I0117 14:31:27.611546 2841 server.go:999] Started kubelet
E0117 14:31:27.623092 2841 kubelet.go:1308] Image garbage collection failed once. Stats initialization may not have completed yet: failed to get imageFs info: unable to find data in memory cache
W0117 14:31:27.623253 2841 kubelet.go:1412] No api server defined - no node status update will be sent.
I0117 14:31:27.644045 2841 fs_resource_analyzer.go:66] Starting FS ResourceAnalyzer
I0117 14:31:27.650728 2841 status_manager.go:148] Kubernetes client is nil, not starting status manager.
I0117 14:31:27.651717 2841 kubelet.go:1829] Starting kubelet main sync loop.
I0117 14:31:27.652815 2841 kubelet.go:1846] skipping pod synchronization - [container runtime status check may not have completed yet PLEG is not healthy: pleg has yet to be successful]
I0117 14:31:27.623347 2841 server.go:137] Starting to listen on 0.0.0.0:10250
I0117 14:31:27.671780 2841 server.go:333] Adding debug handlers to kubelet server.
I0117 14:31:27.684102 2841 volume_manager.go:248] Starting Kubelet Volume Manager
I0117 14:31:27.713834 2841 desired_state_of_world_populator.go:130] Desired state populator starts to run
I0117 14:31:27.808299 2841 kubelet.go:1846] skipping pod synchronization - [container runtime status check may not have completed yet PLEG is not healthy: pleg has yet to be successful]
I0117 14:31:27.943782 2841 reconciler.go:154] Reconciler: start to sync state
I0117 14:31:28.051598 2841 kubelet.go:1846] skipping pod synchronization - [container runtime status check may not have completed yet]
W0117 14:31:28.415337 2841 nvidia.go:66] Error reading "/sys/bus/pci/devices/": open /sys/bus/pci/devices/: no such file or directory
I0117 14:31:28.496333 2841 kubelet.go:1846] skipping pod synchronization - [container runtime status check may not have completed yet]
I0117 14:31:29.362011 2841 kubelet.go:1846] skipping pod synchronization - [container runtime status check may not have completed yet]
W0117 14:31:29.477526 2841 container.go:409] Failed to create summary reader for "/system.slice/system-openvpn.slice/openvpn@bbb.service": none of the resources are being tracked.
I0117 14:31:30.999869 2841 kubelet.go:1846] skipping pod synchronization - [container runtime status check may not have completed yet]
I0117 14:31:31.131276 2841 kubelet_node_status.go:278] Setting node annotation to enable volume controller attach/detach
I0117 14:31:31.184490 2841 cpu_manager.go:155] [cpumanager] starting with none policy
I0117 14:31:31.195065 2841 cpu_manager.go:156] [cpumanager] reconciling every 10s
I0117 14:31:31.196076 2841 policy_none.go:42] [cpumanager] none policy: Start
F0117 14:31:31.199910 2841 kubelet.go:1384] Failed to start ContainerManager system validation failed - Following Cgroup subsystem not mounted: [cpuset]
The issue persists with valid config and proper kubeconfig.
EDIT: additional data
cgroups are mostly mounted (note that cpuset does not appear in the list below):
debian@bbb-test:~$ mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=220140k,nr_inodes=55035,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=49492k,mode=755)
/dev/mmcblk1p1 on / type ext4 (rw,noatime,errors=remount-ro,data=ordered)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=34,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=15992)
mqueue on /dev/mqueue type mqueue (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
configfs on /sys/kernel/config type configfs (rw,relatime)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,size=49488k,mode=700,uid=1000,gid=1000)
debian@bbb-test:~$ systemctl status 'cg*'
debian@bbb-test:~$
I updated to the 4.19 kernel and did a full system package upgrade, and that resolved the problem. That's good enough for me.
I was facing the same error on Ubuntu 20.04 (kernel 5.14), Docker 20.10.17, minikube 1.26.1:
Failed to start ContainerManager system validation failed - Following Cgroup subsystem not mounted: [cpuset]
None of the solutions above (or this, or this) worked for me.
Eventually, I figured out it had to do with Docker being configured to use a rootless context. docker context use default fixed it.
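A quick way to confirm which context the CLI is using before switching (assuming a Docker CLI recent enough to support contexts):
docker context ls           # the active context is marked with an asterisk
docker context use default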
I suggest looking at https://unix.stackexchange.com/questions/427327/how-can-i-check-if-cgroups-are-available-on-my-linux-host to check if your cgroup activation works as expected.
cgroup_enable=cpuset as a kernel parameter should be enough, unless something else is missing (e.g. maybe it's in the wrong place?).
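As a concrete check (these are standard kernel interfaces, nothing specific to the posts above), you can verify whether the cpuset controller is actually enabled and whether the boot parameter reached the kernel:
grep cpuset /proc/cgroups   # the 'enabled' column should be 1
cat /proc/cmdline           # cgroup_enable=cpuset should appear here
mount | grep cpuset         # a cpuset cgroup mount should be listed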
kubeadm init seems to hang since I started using the vSphere cloud provider. I followed the instructions from here - has anybody got it working with 1.9?
root@master-0:~# kubeadm init --config /tmp/kube.yaml
[init] Using Kubernetes version: v1.9.1
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks.
[WARNING Hostname]: hostname "master-0" could not be reached
[WARNING Hostname]: hostname "master-0" lookup master-0 on 8.8.8.8:53: no such host
[WARNING FileExisting-crictl]: crictl not found in system path
[preflight] Starting the kubelet service
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [master-0 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.11.0.101]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests".
[init] This might take a minute or longer if the control plane images have to be pulled.
Master OS details
root@master-0:~# uname -r
4.4.0-21-generic
root@master-0:~# docker version
Client:
Version: 17.03.2-ce
API version: 1.27
Go version: go1.7.5
Git commit: f5ec1e2
Built: Tue Jun 27 03:35:14 2017
OS/Arch: linux/amd64
Server:
Version: 17.03.2-ce
API version: 1.27 (minimum version 1.12)
Go version: go1.7.5
Git commit: f5ec1e2
Built: Tue Jun 27 03:35:14 2017
OS/Arch: linux/amd64
Experimental: false
root@master-0:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04 LTS (Xenial Xerus)"
ID=ubuntu
kubelet service seems to be running fine
root@master-0:~# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf, 90-local-extras.conf
Active: active (running) since Mon 2018-01-22 11:25:00 UTC; 13min ago
Docs: http://kubernetes.io/docs/
Main PID: 4270 (kubelet)
Tasks: 13 (limit: 512)
Memory: 37.6M
CPU: 11.626s
CGroup: /system.slice/kubelet.service
└─4270 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true --network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin --cluster-dns=10.96.0.10 --cluster-domain=cluster.local --authorization-mode=Webhook --client-ca-file=/etc/kubernetes/pki/ca.crt --cadvisor-port=0 --rotate-certificates=true --cert-dir=/var/lib/kubelet/pki
journalctl -f -u kubelet shows some connection-refused errors, probably because the networking add-on is missing. Those errors should go away once networking is installed after kubeadm init.
Jan 22 11:17:45 localhost kubelet[1184]: I0122 11:17:45.759764 1184 feature_gate.go:220] feature gates: &{{} map[]}
Jan 22 11:17:45 localhost kubelet[1184]: I0122 11:17:45.761350 1184 controller.go:114] kubelet config controller: starting controller
Jan 22 11:17:45 localhost kubelet[1184]: I0122 11:17:45.762632 1184 controller.go:118] kubelet config controller: validating combination of defaults and flags
Jan 22 11:17:46 localhost systemd[1]: Started Kubernetes systemd probe.
Jan 22 11:17:46 localhost kubelet[1184]: W0122 11:17:46.070619 1184 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Jan 22 11:17:46 localhost kubelet[1184]: I0122 11:17:46.081384 1184 server.go:182] Version: v1.9.1
Jan 22 11:17:46 localhost kubelet[1184]: I0122 11:17:46.081417 1184 feature_gate.go:220] feature gates: &{{} map[]}
Jan 22 11:17:46 localhost kubelet[1184]: I0122 11:17:46.082271 1184 plugins.go:101] No cloud provider specified.
Jan 22 11:17:46 localhost kubelet[1184]: error: failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory
Jan 22 11:17:46 localhost systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Jan 22 11:17:46 localhost systemd[1]: kubelet.service: Unit entered failed state.
Jan 22 11:17:46 localhost systemd[1]: kubelet.service: Failed with result 'exit-code'.
Jan 22 11:17:56 localhost systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Jan 22 11:17:56 localhost systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Jan 22 11:17:56 localhost systemd[1]: Started kubelet: The Kubernetes Node Agent.
Jan 22 11:17:56 localhost kubelet[1229]: I0122 11:17:56.410883 1229 feature_gate.go:220] feature gates: &{{} map[]}
Jan 22 11:17:56 localhost kubelet[1229]: I0122 11:17:56.411198 1229 controller.go:114] kubelet config controller: starting controller
Jan 22 11:17:56 localhost kubelet[1229]: I0122 11:17:56.411353 1229 controller.go:118] kubelet config controller: validating combination of defaults and flags
Jan 22 11:17:56 localhost systemd[1]: Started Kubernetes systemd probe.
Jan 22 11:17:56 localhost kubelet[1229]: W0122 11:17:56.424264 1229 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Jan 22 11:17:56 localhost kubelet[1229]: I0122 11:17:56.429102 1229 server.go:182] Version: v1.9.1
Jan 22 11:17:56 localhost kubelet[1229]: I0122 11:17:56.429156 1229 feature_gate.go:220] feature gates: &{{} map[]}
Jan 22 11:17:56 localhost kubelet[1229]: I0122 11:17:56.429247 1229 plugins.go:101] No cloud provider specified.
Jan 22 11:17:56 localhost kubelet[1229]: E0122 11:17:56.461608 1229 certificate_manager.go:314] Failed while requesting a signed certificate from the master: cannot create certificate signing request: Post https://10.11.0.101:6443/apis/certificates.k8s.io/v1beta1/certificatesigningrequests: dial tcp 10.11.0.101:6443: getsockopt: connection refused
Jan 22 11:17:56 localhost kubelet[1229]: I0122 11:17:56.491374 1229 server.go:428] --cgroups-per-qos enabled, but --cgroup-root was not specified. defaulting to /
Jan 22 11:17:56 localhost kubelet[1229]: I0122 11:17:56.492069 1229 container_manager_linux.go:242] container manager verified user specified cgroup-root exists: /
Jan 22 11:17:56 localhost kubelet[1229]: I0122 11:17:56.492102 1229 container_manager_linux.go:247] Creating Container Manager object based on Node Config: {RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:docker CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:cgroupfs KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:memory.available Operator:LessThan Value:{Quantity:100Mi Percentage:0} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.1} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.inodesFree Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>} {Signal:imagefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.15} GracePeriod:0s MinReclaim:<nil>}]} ExperimentalQOSReserved:map[] ExperimentalCPUManagerPolicy:none ExperimentalCPUManagerReconcilePeriod:10s}
docker ps, controller & scheduler logs
root@master-0:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
6db549891439 677911f7ae8f "kube-scheduler --..." About an hour ago Up About an hour k8s_kube-scheduler_kube-scheduler-master-0_kube-system_df32e281019039e73be77e3f53d09596_0
4f7ddefbd86e 4978f9a64966 "kube-controller-m..." About an hour ago Up About an hour k8s_kube-controller-manager_kube-controller-manager-master-0_kube-system_34bad395be69e74db6304d6c4218c536_0
18604db89db6 gcr.io/google_containers/pause-amd64:3.0 "/pause" About an hour ago Up About an hour k8s_POD_kube-scheduler-master-0_kube-system_df32e281019039e73be77e3f53d09596_0
252b86ea4b5e gcr.io/google_containers/pause-amd64:3.0 "/pause" About an hour ago Up About an hour k8s_POD_kube-controller-manager-master-0_kube-system_34bad395be69e74db6304d6c4218c536_0
4021061bf8a6 gcr.io/google_containers/pause-amd64:3.0 "/pause" About an hour ago Up About an hour k8s_POD_kube-apiserver-master-0_kube-system_7a3ae9279d0ca7b4ada8333fbe7442ed_0
4f94163d313b gcr.io/google_containers/etcd-amd64:3.1.10 "etcd --name=etcd0..." About an hour ago Up About an hour 0.0.0.0:2379-2380->2379-2380/tcp, 0.0.0.0:4001->4001/tcp, 7001/tcp etcd
root@master-0:~# docker logs -f 4f7ddefbd86e
I0122 11:25:06.253706 1 controllermanager.go:108] Version: v1.9.1
I0122 11:25:06.258712 1 leaderelection.go:174] attempting to acquire leader lease...
E0122 11:25:06.259448 1 leaderelection.go:224] error retrieving resource lock kube-system/kube-controller-manager: Get https://10.11.0.101:6443/api/v1/namespaces/kube-system/endpoints/kube-controller-manager: dial tcp 10.11.0.101:6443: getsockopt: connection refused
E0122 11:25:09.711377 1 leaderelection.go:224] error retrieving resource lock kube-system/kube-controller-manager: Get https://10.11.0.101:6443/api/v1/namespaces/kube-system/endpoints/kube-controller-manager: dial tcp 10.11.0.101:6443: getsockopt: connection refused
E0122 11:25:13.969270 1 leaderelection.go:224] error retrieving resource lock kube-system/kube-controller-manager: Get https://10.11.0.101:6443/api/v1/namespaces/kube-system/endpoints/kube-controller-manager: dial tcp 10.11.0.101:6443: getsockopt: connection refused
E0122 11:25:17.564964 1 leaderelection.go:224] error retrieving resource lock kube-system/kube-controller-manager: Get https://10.11.0.101:6443/api/v1/namespaces/kube-system/endpoints/kube-controller-manager: dial tcp 10.11.0.101:6443: getsockopt: connection refused
E0122 11:25:20.616174 1 leaderelection.go:224] error retrieving resource lock kube-system/kube-controller-manager: Get https://10.11.0.101:6443/api/v1/namespaces/kube-system/endpoints/kube-controller-manager: dial tcp 10.11.0.101:6443: getsockopt: connection refused
root@master-0:~# docker logs -f 6db549891439
W0122 11:25:06.285765 1 server.go:159] WARNING: all flags than --config are deprecated. Please begin using a config file ASAP.
I0122 11:25:06.292865 1 server.go:551] Version: v1.9.1
I0122 11:25:06.295776 1 server.go:570] starting healthz server on 127.0.0.1:10251
E0122 11:25:06.295947 1 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1beta1.ReplicaSet: Get https://10.11.0.101:6443/apis/extensions/v1beta1/replicasets?limit=500&resourceVersion=0: dial tcp 10.11.0.101:6443: getsockopt: connection refused
E0122 11:25:06.296027 1 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.ReplicationController: Get https://10.11.0.101:6443/api/v1/replicationcontrollers?limit=500&resourceVersion=0: dial tcp 10.11.0.101:6443: getsockopt: connection refused
E0122 11:25:06.296092 1 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.Service: Get https://10.11.0.101:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.11.0.101:6443: getsockopt: connection refused
E0122 11:25:06.296160 1 reflector.go:205] k8s.io/kubernetes/plugin/cmd/kube-scheduler/app/server.go:590: Failed to list *v1.Pod: Get https://10.11.0.101:6443/api/v1/pods?fieldSelector=spec.schedulerName%3Ddefault-scheduler%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded&limit=500&resourceVersion=0: dial tcp 10.11.0.101:6443: getsockopt: connection refused
E0122 11:25:06.296218 1 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1beta1.StatefulSet: Get https://10.11.0.101:6443/apis/apps/v1beta1/statefulsets?limit=500&resourceVersion=0: dial tcp 10.11.0.101:6443: getsockopt: connection refused
E0122 11:25:06.297374 1 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.PersistentVolume: Get https://10.11.0.101:6443/api/v1/persistentvolumes?limit=500&resourceVersion=0: dial tcp 10.11.0.101:6443: getsockopt: connection refused
There was a bug in the controller manager when starting with the vsphere cloud provider. See https://github.com/kubernetes/kubernetes/issues/57279, fixed in 1.9.2
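If you are stuck on this, upgrading the control-plane packages past 1.9.1 is the remedy; on the Ubuntu 16.04 host above that would look roughly like the following (version strings assume the standard apt.kubernetes.io packaging scheme, so adjust as needed):
sudo apt-get update
sudo apt-get install -y kubeadm=1.9.2-00 kubelet=1.9.2-00 kubectl=1.9.2-00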
I followed this guide to install Kubernetes:
https://www.linuxtechi.com/install-kubernetes-1-7-centos7-rhel7/
When I got to the kubeadm init step, I got this error:
$ kubeadm init --skip-preflight-checks
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.8.3
[init] Using Authorization modes: [Node RBAC]
[preflight] Skipping pre-flight checks
[kubeadm] WARNING: starting in 1.8, tokens expire after 24 hours by default (if you require a non-expiring token use --token-ttl 0)
[certificates] Using the existing ca certificate and key.
[certificates] Using the existing apiserver certificate and key.
[certificates] Using the existing apiserver-kubelet-client certificate and key.
[certificates] Using the existing sa key.
[certificates] Using the existing front-proxy-ca certificate and key.
[certificates] Using the existing front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Using existing up-to-date KubeConfig file: "admin.conf"
[kubeconfig] Using existing up-to-date KubeConfig file: "kubelet.conf"
[kubeconfig] Using existing up-to-date KubeConfig file: "controller-manager.conf"
[kubeconfig] Using existing up-to-date KubeConfig file: "scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests"
[init] This often takes around a minute; or longer if the control plane images have to be pulled.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by that:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
- There is no internet connection; so the kubelet can't pull the following control plane images:
- gcr.io/google_containers/kube-apiserver-amd64:v1.8.3
- gcr.io/google_containers/kube-controller-manager-amd64:v1.8.3
- gcr.io/google_containers/kube-scheduler-amd64:v1.8.3
You can troubleshoot this for example with the following commands if you're on a systemd-powered system:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
couldn't initialize a Kubernetes cluster
When I check systemctl status kubelet:
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: activating (auto-restart) (Result: exit-code) since Fri 2017-11-10 05:34:12 UTC; 6s ago
Docs: http://kubernetes.io/docs/
Process: 29927 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_NETWORK_ARGS $KUBELET_DNS_ARGS $KUBELET_AUTHZ_ARGS $KUBELET_CADVISOR_ARGS $KUBELET_CGROUP_ARGS $KUBELET_CERTIFICATE_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
Main PID: 29927 (code=exited, status=1/FAILURE)
Nov 10 05:34:12 master systemd[1]: kubelet.service: main process exited, code=exited, status=1/FAILURE
Nov 10 05:34:12 master systemd[1]: Unit kubelet.service entered failed state.
Nov 10 05:34:12 master systemd[1]: kubelet.service failed.
When I check journalctl -xeu kubelet:
Nov 10 05:35:15 master systemd[1]: kubelet.service holdoff time over, scheduling restart.
Nov 10 05:35:15 master systemd[1]: Started kubelet: The Kubernetes Node Agent.
-- Subject: Unit kubelet.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit kubelet.service has finished starting up.
--
-- The start-up result is done.
Nov 10 05:35:15 master systemd[1]: Starting kubelet: The Kubernetes Node Agent...
-- Subject: Unit kubelet.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit kubelet.service has begun starting up.
Nov 10 05:35:15 master kubelet[30174]: I1110 05:35:15.364837 30174 feature_gate.go:156] feature gates: map[]
Nov 10 05:35:15 master kubelet[30174]: I1110 05:35:15.364917 30174 controller.go:114] kubelet config controller: starting controller
Nov 10 05:35:15 master kubelet[30174]: I1110 05:35:15.364921 30174 controller.go:118] kubelet config controller: validating combination of defaults and flags
Nov 10 05:35:15 master kubelet[30174]: I1110 05:35:15.375149 30174 client.go:75] Connecting to docker on unix:///var/run/docker.sock
Nov 10 05:35:15 master kubelet[30174]: I1110 05:35:15.375226 30174 client.go:95] Start docker client with request timeout=2m0s
Nov 10 05:35:15 master kubelet[30174]: W1110 05:35:15.377200 30174 cni.go:196] Unable to update cni config: No networks found in /etc/cni/net.d
Nov 10 05:35:15 master kubelet[30174]: I1110 05:35:15.382890 30174 feature_gate.go:156] feature gates: map[]
Nov 10 05:35:15 master kubelet[30174]: W1110 05:35:15.383011 30174 server.go:289] --cloud-provider=auto-detect is deprecated. The desired cloud provider should be set explicitly
Nov 10 05:35:15 master kubelet[30174]: I1110 05:35:15.408678 30174 certificate_manager.go:361] Requesting new certificate.
Nov 10 05:35:15 master kubelet[30174]: E1110 05:35:15.409287 30174 certificate_manager.go:284] Failed while requesting a signed certificate from the master: cannot create certificate signing request: Post https://10.0.2.15:6443/apis/certificates.k8s.io/v1beta1/certifica
Nov 10 05:35:15 master kubelet[30174]: I1110 05:35:15.411480 30174 manager.go:149] cAdvisor running in container: "/sys/fs/cgroup/cpu,cpuacct/system.slice/kubelet.service"
Nov 10 05:35:15 master kubelet[30174]: W1110 05:35:15.425796 30174 manager.go:157] unable to connect to Rkt api service: rkt: cannot tcp Dial rkt api service: dial tcp [::1]:15441: getsockopt: connection refused
Nov 10 05:35:15 master kubelet[30174]: W1110 05:35:15.426006 30174 manager.go:166] unable to connect to CRI-O api service: Get http://%2Fvar%2Frun%2Fcrio.sock/info: dial unix /var/run/crio.sock: connect: no such file or directory
Nov 10 05:35:15 master kubelet[30174]: I1110 05:35:15.440364 30174 fs.go:139] Filesystem UUIDs: map[4537d533-47ff-463c-bffc-7ce294d9c93a:/dev/dm-1 598bbfb9-027e-4f52-a5b3-c4d3d1fbc2b8:/dev/dm-0 8ffa0ee9-e1a8-4c03-acce-b65b342c6935:/dev/sda2]
Nov 10 05:35:15 master kubelet[30174]: I1110 05:35:15.440395 30174 fs.go:140] Filesystem partitions: map[tmpfs:{mountpoint:/dev/shm major:0 minor:17 fsType:tmpfs blockSize:0} /dev/mapper/VolGroup00-LogVol00:{mountpoint:/var/lib/docker/overlay major:253 minor:0 fsType:xf
Nov 10 05:35:15 master kubelet[30174]: I1110 05:35:15.441589 30174 manager.go:216] Machine: {NumCores:1 CpuFrequency:3100000 MemoryCapacity:1040621568 HugePages:[{PageSize:2048 NumPages:0}] MachineID:a0b78b0170c248288e172d5196d59063 SystemUUID:A0B78B01-70C2-4828-8E17-2D
Nov 10 05:35:15 master kubelet[30174]: I1110 05:35:15.446544 30174 manager.go:222] Version: {KernelVersion:3.10.0-693.5.2.el7.x86_64 ContainerOsVersion:CentOS Linux 7 (Core) DockerVersion:17.09.0-ce DockerAPIVersion:1.32 CadvisorVersion: CadvisorRevision:}
Nov 10 05:35:15 master kubelet[30174]: I1110 05:35:15.447201 30174 server.go:422] --cgroups-per-qos enabled, but --cgroup-root was not specified. defaulting to /
Nov 10 05:35:15 master kubelet[30174]: I1110 05:35:15.451260 30174 container_manager_linux.go:252] container manager verified user specified cgroup-root exists: /
Nov 10 05:35:15 master kubelet[30174]: I1110 05:35:15.451293 30174 container_manager_linux.go:257] Creating Container Manager object based on Node Config: {RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:docker CgroupsPerQOS:true CgroupRoot:/
Nov 10 05:35:15 master kubelet[30174]: I1110 05:35:15.451403 30174 container_manager_linux.go:288] Creating device plugin handler: false
Nov 10 05:35:15 master kubelet[30174]: I1110 05:35:15.451616 30174 kubelet.go:273] Adding manifest file: /etc/kubernetes/manifests
Nov 10 05:35:15 master kubelet[30174]: I1110 05:35:15.451710 30174 kubelet.go:283] Watching apiserver
Nov 10 05:35:15 master kubelet[30174]: E1110 05:35:15.480061 30174 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:422: Failed to list *v1.Node: Get https://10.0.2.15:6443/api/v1/nodes?fieldSelector=metadata.name%3Dmaster&resourceVersion=0: dial tcp 10.0.2.15
Nov 10 05:35:15 master kubelet[30174]: E1110 05:35:15.500829 30174 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:413: Failed to list *v1.Service: Get https://10.0.2.15:6443/api/v1/services?resourceVersion=0: dial tcp 10.0.2.15:6443: getsockopt: connection r
Nov 10 05:35:15 master kubelet[30174]: E1110 05:35:15.500917 30174 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.0.2.15:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dmaster&resourceVersion=0: dial tcp 10.
Nov 10 05:35:15 master kubelet[30174]: W1110 05:35:15.541334 30174 kubelet_network.go:69] Hairpin mode set to "promiscuous-bridge" but kubenet is not enabled, falling back to "hairpin-veth"
Nov 10 05:35:15 master kubelet[30174]: I1110 05:35:15.541369 30174 kubelet.go:517] Hairpin mode set to "hairpin-veth"
Nov 10 05:35:15 master kubelet[30174]: W1110 05:35:15.541616 30174 cni.go:196] Unable to update cni config: No networks found in /etc/cni/net.d
Nov 10 05:35:15 master kubelet[30174]: W1110 05:35:15.548689 30174 cni.go:196] Unable to update cni config: No networks found in /etc/cni/net.d
Nov 10 05:35:15 master kubelet[30174]: W1110 05:35:15.553143 30174 cni.go:196] Unable to update cni config: No networks found in /etc/cni/net.d
Nov 10 05:35:15 master kubelet[30174]: I1110 05:35:15.553164 30174 docker_service.go:207] Docker cri networking managed by cni
Nov 10 05:35:15 master kubelet[30174]: error: failed to run Kubelet: failed to create kubelet: misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs"
Nov 10 05:35:15 master systemd[1]: kubelet.service: main process exited, code=exited, status=1/FAILURE
Nov 10 05:35:15 master systemd[1]: Unit kubelet.service entered failed state.
Nov 10 05:35:15 master systemd[1]: kubelet.service failed.
The key point in the logs: misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs"
Make sure that the cgroup driver used by the kubelet is the same as the one used by Docker.
To ensure compatibility you can either update Docker, or ensure the --cgroup-driver kubelet flag is set to the same value as Docker (e.g. cgroupfs)
-- Installing kubeadm
Either update Docker to use systemd
cat << EOF > /etc/docker/daemon.json
{
"exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
And restart the Docker service.
Or update kubelet to use cgroupfs
sed -i -E 's/--cgroup-driver=systemd/--cgroup-driver=cgroupfs/' /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
And restart the kubelet with systemctl restart kubelet.service.
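Either way, it is worth verifying that both sides now agree before re-running kubeadm init (commands assume a reasonably recent Docker CLI and the systemd-managed kubelet drop-in shown above):
docker info --format '{{.CgroupDriver}}'                       # prints systemd or cgroupfs
grep -r cgroup-driver /etc/systemd/system/kubelet.service.d/   # what the kubelet is passed
sudo systemctl daemon-reload && sudo systemctl restart kubelet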