I have tried setting the max pods per node using the following upon install:
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--max-pods 250" sh -s -
However, the K3s server then fails to start. It appears that the --max-pods flag has been deprecated per the Kubernetes docs:
--max-pods int32 Default: 110
(DEPRECATED: This parameter should be set via the config file
specified by the Kubelet's --config flag. See
https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/
for more information.)
So with K3s, where is that kubelet config file and can/should it be set using something like the above method?
To update your existing installation with an increased max-pods value, add a kubelet config file in a K3s-associated location such as /etc/rancher/k3s/kubelet.config:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 250
Edit /etc/systemd/system/k3s.service to change the k3s server args:
ExecStart=/usr/local/bin/k3s \
server \
'--disable' \
'servicelb' \
'--disable' \
'traefik' \
'--kubelet-arg=config=/etc/rancher/k3s/kubelet.config'
Reload systemd to pick up the service change:
sudo systemctl daemon-reload
Restart k3s:
sudo systemctl restart k3s
Check the output of kubectl describe node <node> and look for allocatable resources:
Allocatable:
cpu: 32
ephemeral-storage: 199789251223
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 131811756Ki
pods: 250
and a message noting that the allocatable node limit has been updated in Events:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 20m kube-proxy Starting kube-proxy.
Normal Starting 20m kubelet Starting kubelet.
...
Normal NodeNotReady 7m52s kubelet Node <node> status is now: NodeNotReady
Normal NodeAllocatableEnforced 7m50s kubelet Updated Node Allocatable limit across pods
Normal NodeReady 7m50s kubelet Node <node> status is now: NodeReady
As described in the documentation, it is possible to set the Kubelet's configuration parameters via an on-disk config file.
NOTE: Using an on-disk config file, we can override only the subset of the Kubelet's configuration parameters that we want to change; all other Kubelet configuration values are left at their built-in defaults, unless overridden by flags.
I've created a simple config file to override the maxPods value (default 110):
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 250
Then we have to pass this config file as an argument during K3s installation (I recommend specifying an absolute path):
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--kubelet-arg=config=<KUBELET_CONFIG_FILE_LOCATION>" sh -
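For example, assuming the config file was saved at /etc/rancher/k3s/kubelet.config (the path here is only an assumption; use wherever you saved the file):
# example invocation; the config path below is assumed, not required
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--kubelet-arg=config=/etc/rancher/k3s/kubelet.config" sh -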
Finally, we can check that maxPods is now equal to 250:
# kubectl describe nodes <NODE_NAME> | grep -i pod
pods: 250
pods: 250
PodCIDR: 10.42.0.0/24
PodCIDRs: 10.42.0.0/24
In addition, you can find an interesting discussion here; I believe it describes another way to solve your issue.
I am using MicroK8s (1.19, on Ubuntu 20.04.1 LTS) and am trying to use OpenEBS (cStor engine) for storage.
Since I'm running this locally before pushing to prod, I created virtual block devices with:
blockdevicedisk='/k8storage/diskimage'
blockdevicesize=10000
# create a 10000 MiB image file to back the virtual disk
sudo dd if=/dev/zero of=$blockdevicedisk bs=1M count=$blockdevicesize
# format the image and attach it as a loop device
sudo mkfs.ext4 $blockdevicedisk
sudo losetup /dev/loop0 /k8storage/diskimage
$lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 9.8G 0 loop
loop1 7:1 0 7.8G 0 loop
sda 8:0 0 256G 0 disk
sdb 8:16 0 256G 0 disk /
I installed OpenEBS with Helm, then removed 'loop' from openebs-ndm-config -> filterconfigs -> path-filter -> exclude, so that NDM would display these as block devices.
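For reference, a rough sketch of that edit (the default exclude values vary by OpenEBS version, so treat the comments as an assumption):
# open the NDM ConfigMap for editing
kubectl -n openebs edit configmap openebs-ndm-config
# then, under filterconfigs -> key: path-filter, remove "loop" from the exclude list
# so that loop devices such as /dev/loop0 and /dev/loop1 are no longer filtered out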
$kubectl get blockdevices -n openebs
NAME NODENAME SIZE CLAIMSTATE STATUS AGE
blockdevice-87ca7d6819eab3ea3af2884f2f6e9f8e v 274877906944 Unclaimed Active 19h
blockdevice-0a6c8d26081660a37f0a87dbb316c7ae v 10485760000 Unclaimed Active 19h
blockdevice-cd43d37664edd1c880e11f5b8e9cbe60 v 8388608000 Unclaimed Active 19h
^ the last 2 are the ones I made.
I then wrote the following config to create a cStor StoragePoolClaim:
apiVersion: openebs.io/v1alpha1
kind: StoragePoolClaim
metadata:
  name: cstor-pool-claim
spec:
  name: cstor-pool-claim
  type: disk
  poolSpec:
    poolType: striped
  blockDevices:
    blockDeviceList:
    - blockdevice-0a6c8d26081660a37f0a87dbb316c7ae
    - blockdevice-cd43d37664edd1c880e11f5b8e9cbe60
When I apply it, both the blockdevices are claimed
$kubectl get blockdevices -n openebs
NAME NODENAME SIZE CLAIMSTATE STATUS AGE
blockdevice-87ca7d6819eab3ea3af2884f2f6e9f8e v 274877906944 Unclaimed Active 19h
blockdevice-0a6c8d26081660a37f0a87dbb316c7ae v 10485760000 Claimed Active 19h
blockdevice-cd43d37664edd1c880e11f5b8e9cbe60 v 8388608000 Claimed Active 19h
which is expected.
$kubectl get spc
NAME AGE
cstor-pool-claim 18h
However, there is a problem!
$kubectl get csp
NAME ALLOCATED FREE CAPACITY STATUS READONLY TYPE AGE
cstor-pool-claim-nf0g Init false striped 19h
It never changes from Init status. There is a pod created for the pool claim:
$kubectl describe pod cstor-pool-claim-nf0g-6cb75f8f49-sw6q2 -n openebs
This spews a lot of text that I could show if it helps. The key part is the error message:
Error: failed to create containerd task: OCI runtime create failed:
container_linux.go:370: starting container process caused:
process_linux.go:459: container init caused: rootfs_linux.go:59:
mounting
"/var/snap/microk8s/common/run/containerd/io.containerd.grpc.v1.cri/sandboxes/a9b84df9076c91b83982f157e9bacdc5a10f80846d32034dd15cdae1c1d4c4c1/shm"
to rootfs at "/dev/shm" caused: secure join: too many levels of
symbolic links: unknown
I've tried resetting my setup and re-entering the commands one by one to ensure that I was following the documentation and other examples correctly; however, I keep encountering this error.
Is this a limitation of MicroK8s? A fault of OpenEBS? Something weird about my setup? Or did I do something wrong?
More importantly: Is there a way to get this to work correctly?
I have a 3-node Kubernetes 1.11 cluster deployed with kubeadm and Weave (CNI) version 2.5.1. I am providing Weave a CIDR range of 128 IPs. After two reboots of the nodes, some of the pods are stuck in the ContainerCreating state.
Once you run kubectl describe pod <pod_name>, you will see the following errors:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SandboxChanged 20m (x20 over 1h) kubelet, 10.0.1.63 Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 30s (x25 over 1h) kubelet, 10.0.1.63 Failed create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded
If I check how many containers are running and how many IP addresses are allocated to them, I can see 24 containers:
[root@ip-10-0-1-63 centos]# weave ps | wc -l
26
The total number of IPs assigned to Weave on that node is 42.
[root@ip-10-0-1-212 centos]# kubectl exec -n kube-system -it weave-net-6x4cp -- /home/weave/weave --local status ipam
Defaulting container name to weave.
Use 'kubectl describe pod/weave-net-6x4cp -n kube-system' to see all of the containers in this pod.
6e:0d:f3:d7:f5:49(10.0.1.63) 42 IPs (32.8% of total) (42 active)
7a:24:6f:3c:1b:be(10.0.1.212) 40 IPs (31.2% of total)
ee:00:d4:9f:9d:79(10.0.1.43) 46 IPs (35.9% of total)
You can see all 42 IPs are active, so no more IPs are available to allocate to new containers. But out of 42, only 26 are actually allocated to containers; I am not sure where the remaining IPs are. This is happening on all three nodes.
Here is the output of weave status for your reference:
[root@ip-10-0-1-212 centos]# weave status
Version: 2.5.1 (version 2.5.2 available - please upgrade!)
Service: router
Protocol: weave 1..2
Name: 7a:24:6f:3c:1b:be(10.0.1.212)
Encryption: disabled
PeerDiscovery: enabled
Targets: 3
Connections: 3 (2 established, 1 failed)
Peers: 3 (with 6 established connections)
TrustedSubnets: none
Service: ipam
Status: waiting for IP(s) to become available
Range: 192.168.13.0/25
DefaultSubnet: 192.168.13.0/25
If you need any more information, I would be happy to provide it. Any clue?
Not sure if we have the same problem, but before I reboot a node I need to drain it first, so that all pods on that node are evicted and it is safe to reboot. After the node is back up, you need to uncordon it again; the node will then be available for scheduling pods again.
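A minimal sketch of that sequence (the node name is a placeholder):
kubectl drain <node-name> --ignore-daemonsets   # evict the pods before rebooting
# ... reboot the node ...
kubectl uncordon <node-name>                    # make the node schedulable again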
My reference https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/
I guess that 16 IPs have been reserved for pod-reuse purposes. These are the maximum pods per node based on the CIDR range:
Maximum Pods per Node    CIDR Range per Node
8                        /28
9 to 16                  /27
17 to 32                 /26
33 to 64                 /25
65 to 110                /24
In case your Weave IPs are exhausted and some of them are not released after a reboot, you can delete the file /var/lib/weave/weave-netdata.db and restart the Weave pods.
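Roughly, that recovery looks like this (a sketch; the weave-net pod label is assumed from the standard weave-kube manifest):
# on the affected node, remove the persisted Weave IPAM state
sudo rm -f /var/lib/weave/weave-netdata.db
# restart the Weave pods so they rebuild their IPAM state
kubectl -n kube-system delete pod -l name=weave-net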
In my case, I added a systemd unit which removes the /var/lib/weave/weave-netdata.db file on every reboot or shutdown of the system. Once the system comes up, Weave allocates fresh IPs to all the pods, and the IP exhaustion was never seen again.
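A minimal sketch of such a unit, created from the shell (the unit name and layout are my own assumptions; adapt to your system):
# create a oneshot unit whose ExecStop runs when the unit is stopped at shutdown/reboot
# and removes the persisted Weave IPAM data
sudo tee /etc/systemd/system/weave-ipam-cleanup.service >/dev/null <<'EOF'
[Unit]
Description=Remove Weave IPAM state on shutdown/reboot

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
ExecStop=/bin/rm -f /var/lib/weave/weave-netdata.db

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now weave-ipam-cleanup.service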
Posting this here in the hope that someone else will find it useful for their use case.
Is it possible to specify a CPU ID list to the Kubernetes CPU manager? The goal is to make sure pods get CPUs from a single socket (socket 0). I brought all the CPUs on the peer socket offline as mentioned here, for example:
$ echo 0 > /sys/devices/system/cpu/cpu5/online
After doing this, the Kubernetes master indeed sees only the remaining online CPUs:
kubectl describe node foo
Capacity:
cpu: 56 <<< socket 0 CPU count
ephemeral-storage: 958774760Ki
hugepages-1Gi: 120Gi
memory: 197524872Ki
pods: 110
Allocatable:
cpu: 54 <<< 2 system reserved CPUs
ephemeral-storage: 958774760Ki
hugepages-1Gi: 120Gi
memory: 71490952Ki
pods: 110
System Info:
Machine ID: 1155420082478559980231ba5bc0f6f2
System UUID: 4C4C4544-0044-4210-8031-C8C04F584B32
Boot ID: 7fa18227-748f-496c-968c-9fc82e21ecd5
Kernel Version: 4.4.13
OS Image: Ubuntu 16.04.4 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://17.3.3
Kubelet Version: v1.11.1
Kube-Proxy Version: v1.11.1
However, the CPU manager still seems to think there are 112 CPUs (socket 0 + socket 1):
cat /var/lib/kubelet/cpu_manager_state
{"policyName":"static","defaultCpuSet":"0-111"}
As a result, the kubelet system pods are throwing the following error:
kube-system kube-proxy-nk7gc 0/1 rpc error: code = Unknown desc = failed to update container "eb455f81a61b877eccda0d35eea7834e30f59615346140180f08077f64896760": Error response from daemon: Requested CPUs are not available - requested 0-111, available: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110 762 36d <IP address> foo <none>
I was able to get this working. Posting this as an answer so that someone in need might benefit.
It appears the CPU set is read from the /var/lib/kubelet/cpu_manager_state file and is not updated across kubelet restarts, so this file needs to be removed before restarting the kubelet.
The following worked for me:
# On a running worker node, bring desired CPUs offline. (run as root)
$ cpu_list=`lscpu | grep "NUMA node1 CPU(s)" | awk '{print $4}'`
$ chcpu -d $cpu_list
$ rm -f /var/lib/kubelet/cpu_manager_state
$ systemctl restart kubelet.service
# Check the CPU set seen by the CPU manager
$ cat /var/lib/kubelet/cpu_manager_state
# Try creating pods and check the syslog:
Dec 3 14:36:05 k8-2-w1 kubelet[8070]: I1203 14:36:05.122466 8070 state_mem.go:84] [cpumanager] updated default cpuset: "0,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110"
Dec 3 14:36:05 k8-2-w1 kubelet[8070]: I1203 14:36:05.122643 8070 policy_static.go:198] [cpumanager] allocateCPUs: returning "2,4,6,8,58,60,62,64"
Dec 3 14:36:05 k8-2-w1 kubelet[8070]: I1203 14:36:05.122660 8070 state_mem.go:76] [cpumanager] updated desired cpuset (container id: 356939cdf32d0f719e83b0029a018a2ca2c349fc0bdc1004da5d842e357c503a, cpuset: "2,4,6,8,58,60,62,64")
I have reported a bug here as I think the CPU set should be updated after kubelet restarts.
I have installed a two-node Kubernetes 1.12.1 cluster on cloud VMs, both behind an internet proxy. Each VM has a floating IP associated to connect over SSH; kube-01 is the master and kube-02 is a node. Before running kubeadm init, I executed the following export:
no_proxy=127.0.0.1,localhost,10.157.255.185,192.168.0.153,kube-02,192.168.0.25,kube-01
However, I am getting the following status from kubectl get nodes:
NAME STATUS ROLES AGE VERSION
kube-01 NotReady master 89m v1.12.1
kube-02 NotReady <none> 29s v1.12.2
Am I missing any configuration? Do I need to add 192.168.0.153 and 192.168.0.25 to the respective VMs' /etc/hosts?
It looks like the pod network is not installed yet on your cluster. You can install Weave, for example, with the command below:
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
After a few seconds, a Weave Net pod should be running on each Node and any further pods you create will be automatically attached to the Weave network.
You can install the pod network of your choice. Here is a list.
After this, check:
$ kubectl describe nodes
and check that all is fine, like below:
Conditions:
Type Status
---- ------
OutOfDisk False
MemoryPressure False
DiskPressure False
Ready True
Capacity:
cpu: 2
memory: 2052588Ki
pods: 110
Allocatable:
cpu: 2
memory: 1950188Ki
pods: 110
Next, SSH to the node which is not ready and observe the kubelet logs. The most likely errors relate to certificates and authentication.
You can also use journalctl on systemd-based hosts to check for kubelet errors:
$ journalctl -u kubelet
Try this: your CoreDNS is in the Pending state, so check the networking plugin you have used and verify that the proper add-ons are installed.
Check the Kubernetes troubleshooting guide:
https://kubernetes.io/docs/setup/independent/troubleshooting-kubeadm/#coredns-or-kube-dns-is-stuck-in-the-pending-state
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Install the needed add-ons from those, and then check:
kubectl get pods -n kube-system
On the off chance it might be the same for someone else, in my case, I was using the wrong AMI image to create the nodegroup.
Run
journalctl -u kubelet
Then check the node logs. If you get the error below, disable swap using swapoff -a:
"Failed to run kubelet" err="failed to run Kubelet: running with swap on is not supported, please disable swap! or set --fa
Main process exited, code=exited, status=1/FAILURE
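If swap is the cause, a common follow-up is to also keep it disabled across reboots (a sketch assuming GNU sed; back up /etc/fstab before editing):
sudo swapoff -a                                # turn swap off immediately
sudo sed -ri '/\sswap\s/ s/^#?/#/' /etc/fstab  # comment out swap entries so swap stays off after reboot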