Why does kubeadm not start even after disabling swap? - kubernetes

I am trying to install kubernetes with kubeadm in my laptop which has Ubuntu 16.04. I have disabled swap, since kubelet does not work with swap on. The command I used is :
swapoff -a
I also commented out the reference to swap in /etc/fstab.
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point> <type> <options> <dump> <pass>
# / was on /dev/sda1 during installation
UUID=1d343a19-bd75-47a6-899d-7c8bc93e28ff / ext4 errors=remount-ro 0 1
# swap was on /dev/sda5 during installation
#UUID=d0200036-b211-4e6e-a194-ac2e51dfb27d none swap sw 0 0
I confirmed swap is turned off by running the following:
free -m
total used free shared buff/cache available
Mem: 15936 2108 9433 954 4394 12465
Swap: 0 0 0
When I start kubeadm, I get the following error:
kubeadm init --pod-network-cidr=10.244.0.0/16
[init] Using Kubernetes version: v1.14.2
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR Swap]: running with swap on is not supported. Please disable swap
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
I also tried restarting my laptop, but I get the same error. What could the reason be?

below was the root cause.
detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd".
you need to update the docker cgroup driver.
follow the below fix
cat > /etc/docker/daemon.json <<EOF
{
"exec-opts": ["native.cgroupdriver=systemd"],
"log-driver": "json-file",
"log-opts": {
"max-size": "100m"
},
"storage-driver": "overlay2",
"storage-opts": [
"overlay2.override_kernel_check=true"
]
}
EOF
mkdir -p /etc/systemd/system/docker.service.d
# Restart Docker
systemctl daemon-reload
systemctl restart docker

you could try kubeadm reset , then kubeadm init --ignore-preflight-errors Swap .

first try with sudo
sudo swapoff -a
then check if there's anything swapped
cat /proc/swaps
and
free -h

Related

Ceph (cepadm) quincy: can't add osd from remote nodes (command hanging)

I stuck with a problem, while trying to create cluster of 3 nodes (AWS EC2 instancies):
fa11.something.com ~ # ceph orch host ls
HOST ADDR LABELS STATUS
fa11.something.com 172.16.24.67 _admin
fa12.something.com 172.16.23.159 _admin
fa13.something.com 172.16.25.119 _admin
3 hosts in cluster
Each of them have 2 disks (all accepted by CEPH):
fa11.something.com ~ # ceph orch device ls
HOST PATH TYPE DEVICE ID SIZE AVAILABLE REFRESHED REJECT REASONS
fa11.something.com /dev/nvme1n1 ssd Amazon_Elastic_Block_Store_vol016651cf7f3b9c9dd 8589M Yes 7m ago
fa11.something.com /dev/nvme2n1 ssd Amazon_Elastic_Block_Store_vol034082d7d364dfbdb 5368M Yes 7m ago
fa12.something.com /dev/nvme1n1 ssd Amazon_Elastic_Block_Store_vol0ec193fa3f77fee66 8589M Yes 3m ago
fa12.something.com /dev/nvme2n1 ssd Amazon_Elastic_Block_Store_vol018736f7eeab725f5 5368M Yes 3m ago
fa13.something.com /dev/nvme1n1 ssd Amazon_Elastic_Block_Store_vol0443a031550be1024 8589M Yes 84s ago
fa13.something.com /dev/nvme2n1 ssd Amazon_Elastic_Block_Store_vol0870412d37717dc2c 5368M Yes 84s ago
fa11.something.com is first host, where from I manage cluster.
Adding OSD from fa11.something.com itself works fine:
fa11.something.com ~ # ceph orch daemon add osd fa11.something.com:/dev/nvme1n1
Created osd(s) 0 on host 'fa11.something.com'
But it doesn't work for other 2 hosts (it hangs forever):
fa11.something.com ~ # ceph orch daemon add osd fa12.something.com:/dev/nvme1n1
^CInterrupted
Logs on fa12.something.com shows that it hangs at following step:
fa12.something.com ~ # tail /var/log/ceph/a9ef6c26-ac38-11ed-9429-06e6bc29c1db/ceph-volume.log
...
[2023-02-14 07:38:20,942][ceph_volume.process][INFO ] Running command: /usr/bin/ceph-authtool --gen-print-key
[2023-02-14 07:38:20,964][ceph_volume.process][INFO ] Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new a51506c2-e910-4763-9a0c-f6c2194944e2
I'm not sure what might be the reason for this hanging?
*Additional details:
cephadm installed, using curl (https://docs.ceph.com/en/quincy/cephadm/install/#curl-based-installation)
I use user ceph, instead of root and port 2222 instead of 22. First node was bootstrapped, using below command:
cephadm bootstrap --mon-ip 172.16.24.67 --allow-fqdn-hostname --ssh-user ceph --ssh-config /home/anton/ceph/ssh_config --cluster-network 172.16.16.0/20 --skip-monitoring-stack
Content of /home/anton/ceph/ssh_config:
fa11.something.com ~ # cat /home/anton/ceph/ssh_config
Host *
User ceph
Port 2222
IdentityFile /home/ceph/.ssh/id_rsa
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
Hosts fa12.something.com and fa13.something.com were added, using commands:
ceph orch host add fa12.something.com 172.16.23.159 --labels _admin
ceph orch host add fa13.something.com 172.16.25.119 --labels _admin
Not sure if I have to check some specific ports are not blocked? Just I expected to get some error at early stages in case if ceph can't access some port...
Thanks in advance for any suggestion!
It should be that problem caused by some port(s) blocked. I've tried to drop all restrictions on iptables level: iptables -F; iptables -P INPUT ACCEPT and it started to work. Will check later which ports exactly needed. I guess it should be described somewhere here: https://docs.ceph.com/en/latest/rados/configuration/network-config-ref/

Fail to start Minikube on Debian

I installed Minikube on my Debian 10, but when I try to start it, I
get these errors:
$ minikube start
* minikube v1.25.2 on Debian 10.1
* Unable to pick a default driver. Here is what was considered, in preference order:
- docker: Not healthy: "docker version --format {{.Server.Os}}-{{.Server.Version}}" exit status 1: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.40/version: dial unix /var/run/docker.sock: connect: permission denied
- docker: Suggestion: Add your user to the 'docker' group: 'sudo usermod -aG docker $USER && newgrp docker' <https://docs.docker.com/engine/install/linux-postinstall/>
- kvm2: Not healthy: /usr/bin/virsh domcapabilities --virttype kvm failed:
error: failed to get emulator capabilities
error: invalid argument: KVM is not supported by '/usr/bin/qemu-system-x86_64' on this host
exit status 1
- kvm2: Suggestion: Follow your Linux distribution instructions for configuring KVM <https://minikube.sigs.k8s.io/docs/reference/drivers/kvm2/>
* Alternatively you could install one of these drivers:
- podman: Not installed: exec: "podman": executable file not found in $PATH
- vmware: Not installed: exec: "docker-machine-driver-vmware": executable file not found in $PATH
- virtualbox: Not installed: unable to find VBoxManage in $PATH
I added my user to the docker group using:
sudo usermod -aG docker $USER
and I insalled kvm without any apparent problems as far as I understand:
kvm --version
QEMU emulator version 3.1.0 (Debian 1:3.1+dfsg-8~deb10u1)
Copyright (c) 2003-2018 Fabrice Bellard and the QEMU Project developers
$ lsmod | grep kvm
kvm 729088 0
irqbypass 16384 1 kvm
$ sudo virsh list --all
Id Name State
-----------------------------
1 debian10-MK running
What could be the problem and solution then?
Thanks,
Tamar

Kubernetes cluster deployment in packstack

I am trying to deploy a k8s cluster in openstack rocky but after long time it fails. I've checked orchestration stack and see that kube_minions resources never completes. Checking the logs output for all the instances created:
[ 196.817505] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 215.082433] random: crng init done
Fedora 27 (Atomic Host)
Kernel 4.14.18-300.fc27.x86_64 on an x86_64 (ttyS0)
host-10-0-0-3 login: [ 691.438618] bridge: filtering via
arp/ip/ip6tables is no longer available by default. Update your scripts
to load br_netfilter if you need this.
[ 691.516277] Bridge firewalling registered
[ 692.149217] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[ 701.932912] IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready
Checking deeply in the instances I've found that in master node can not start heat-agent-service...
_prefix=docker.io/openstackmagnum/
atomic install --storage ostree --system --system-package no --set
REQUESTS_CA_BUNDLE=/etc/pki/tls/certs/ca-bundle.crt --name heat-container-
agent docker.io/openstackmagnum/heat-container-agent:rocky-stable
systemctl start heat-container-agent
Failed to start heat-container-agent.service: Unit heat-container-
agent.service not found.
2019-04-04 14:57:40,238 - util.py[WARNING]: Failed running
/var/lib/cloud/instance/scripts/part-013 [5]´

Ran out of Docker disk space

I have this Docker command:
docker run -d mongo
this will build and run a mongodb server running in a docker container
However, I get an error:
no space left on device
I am on MacOS, and using the newer versions of Docker which use hyper-v instead of VirtualBox (I think that's correct).
Here is the exact error message from the mongo container:
$ docker logs efee16702c5756659d563b98d4ae0f58ecf1f1bba8a54f63443c0ae4b520ab4e
about to fork child process, waiting until server is ready for connections.
forked process: 21
2017-05-04T20:23:51.412+0000 I CONTROL [main] ***** SERVER RESTARTED *****
2017-05-04T20:23:51.430+0000 I CONTROL [main] ERROR: Cannot write pid file to /tmp/tmp.Lo035QkbfL: No space left on device
ERROR: child process failed, exited with error number 1
Any idea how to fix this and prevent it from happening in future?
As suggested, the output of df -h is:
Filesystem Size Used Avail Capacity iused ifree %iused Mounted on
/dev/disk1 465Gi 116Gi 349Gi 25% 1963838 4293003441 0% /
devfs 183Ki 183Ki 0Bi 100% 634 0 100% /dev
map -hosts 0Bi 0Bi 0Bi 100% 0 0 100% /net
map auto_home 0Bi 0Bi 0Bi 100% 0 0 100% /home
Output of docker info is:
$ docker info
Containers: 5
Running: 0
Paused: 0
Stopped: 5
Images: 741
Server Version: 17.03.1-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 4ab9917febca54791c5f071a9d1f404867857fcc
runc version: 54296cf40ad8143b62dbcaa1d90e520a2136ddfe
init version: N/A (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
seccomp
Profile: default
Kernel Version: 4.9.13-moby
Operating System: Alpine Linux v3.5
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 1.952 GiB
Name: moby
ID: OR4L:WYWW:FFAP:IDX3:B6UK:O2AN:UVTO:EPH6:GYSV:4GV4:L5WP:BQTH
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 17
Goroutines: 30
System Time: 2017-05-04T20:45:27.056157913Z
EventsListeners: 1
No Proxy: *.local, 169.254/16
Registry: https://index.docker.io/v1/
Experimental: true
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
As you state in the comments to the question, ls -altrh ~/Library/Containers/com.docker.docker/Data/com.docker.driver.‌​amd64-linux/Docker.q‌​cow2 returns the following:
-rw-r--r--# 1 alexamil staff 53G
This is a known bug on MacOS (actually, not only) and an official dev comment could be found here. Except for one thing: I read, that different people get different size limit. In the comment it is 64Gb, but for another person it was 20Gb.
There are a couple walkarounds, but no definite solution that I could find.
The manual one
Run docker ps -a and manually remove all unused containers. Then run docker images and remove manually all the intermediate and unused images.
The simplest one
Delete the Docker.qcow2 file entirely. But you will lose all images and containers. Completely.
The less simple
Another way is to run docker volume prune, which will remove all unused volumes
The resizing one (keeps the data)
Another idea that comes to me is to expand the disk image size with QEMU or something like it:
$ brew install qemu
$ /Applications/Docker.app/Contents/MacOS/qemu-img resize ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2 +5G
After you expanded the image, you will need to run a VM in which you should run GParted against Docker.qcow2 and expand the partition to use added space. You could use GParted Live ISO for that:
$ qemu-system-x86_64 -drive file=Docker.qcow2 -m 512 -cdrom ~/Downloads/gparted-live.iso -boot d -device usb-mouse -usb
Some people report this either doesn't work or doesn't help.
Yet another resizing one (wipes the data)
Create a substitute image with desired size (120G):
$ qemu-img create -f qcow2 ~/data.qcow2 120G
$ cp ~/data.qcow2 /Application/Docker.app/Contents/Resources/moby/data.qcow2
$ rm ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2
data.qcow2 is copied to ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2 when you restart docker.
This walkaround comes from this comment.
Hope this helps. Good luck!

Failed to start puppetserver Service

While trying to run a puppet update form a node:
sudo /opt/puppetlabs/bin/puppet agent -t
I get an error:
Error: Could not retrieve catalog; skipping run
Error: Could not send report: Connection refused - connect(2) for "puppet" port 8140`
Elsewhere indicates this is likely a problem with the puppetserver service, and suggests to reboot the server. Restarting didn't help, and when I try to restart the service I get failure:
~$ sudo service puppetserver restart
Job for puppetserver.service failed because the control process exited with error code. See "systemctl status puppetserver.service" and "journalctl -xe" for details.
I've looked at these logs, and as a puppet/linux noob, I'm not sure what to do next.
systemctl status puppetserver.service
● puppetserver.service - puppetserver Service
Loaded: loaded (/lib/systemd/system/puppetserver.service; enabled; vendor preset: enabled)
Active: activating (start-post) since Fri 2016-09-02 15:54:26 PDT; 2s ago
Process: 22301 ExecStartPre=/usr/bin/install --directory --owner=puppet --group=puppet --mode=775 /var/run/puppetlabs/puppetserver (code=exited
Main PID: 22306 (java); : 22307 (bash)
Tasks: 17
Memory: 335.7M
CPU: 5.535s
CGroup: /system.slice/puppetserver.service
├─22306 /usr/bin/java -Xms6g -Xmx6g -XX:MaxPermSize=256m -XX:OnOutOfMemoryError=kill -9 %p -Djava.security.egd=/dev/urandom -cp /opt/p
└─control
├─22307 /bin/bash /opt/puppetlabs/server/apps/puppetserver/ezbake-functions.sh wait_for_app
└─22331 sleep 1
Sep 02 15:54:26 puppet systemd[1]: Starting puppetserver Service...
Sep 02 15:54:26 puppet java[22306]: OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
puppet version 4.6.1
The puppet master communicates with the other node using port number 8140.
I don't think a restart will help, since this looks like a connection issue between the server and the node.
please try the following -
first make sure that the puppet master is actually listening on port 8140. run the following command on the puppetmaster -
netstat -ntlp | grep 8140
this command should return something like this -
tcp 0 0 0.0.0.0:8140 0.0.0.0:* LISTEN 1783/puppetmaster
If you don't get the same output, your puppetmaster is not listening, and therefore can not compile catalogs for the node.
Try checking the puppet master log at /var/log/puppetmaster.log
check that the node can communicate with the puppetmaster on the relevant port. you can check this quickly with the telnet command. run this on your node -
telnet < puppetmaster ip address \ dns name> 8140
you should get something like -
Connected to <puppet-master-IP/DNS-name>
Escape character is '^]'.
if you don't get this output, this means that something is blocking you from accessing the puppetmaster. try opening the port in your firewall to access the puppetmaster.
if you're still stuck try using the --debug flag for verbose output and edit your question.
Could be 2 things: (1) in puppet.conf you have configured more memory than you have on your machine. Or (2) You installed both apt-get install puppetserver and apt-get install puppet.
If you get failed to start puppet.service: unit not found. error on slave machine while connecting to puppet.
Close the putty and then again open and connect it.The issue wont come while starting putty on slave.
The error occurs because there is not enough RAM and to fix the error, open the Puppet server configuration file:
sudo nano /etc/sysconfig/puppetserver
And reduce the amount of allocated RAM for the Puppet server (for example, I specified 512m instead of 2g):
JAVA_ARGS="-Xms512m -Xmx512m"
Now let’s start the Puppet server:
sudo systemctl start puppetserver