Ran out of Docker disk space - mongodb

I have this Docker command:
docker run -d mongo
This should pull the image and run a MongoDB server in a Docker container.
However, I get an error:
no space left on device
I am on macOS, using a newer version of Docker that runs on HyperKit rather than VirtualBox (I think that's correct).
Here is the exact error message from the mongo container:
$ docker logs efee16702c5756659d563b98d4ae0f58ecf1f1bba8a54f63443c0ae4b520ab4e
about to fork child process, waiting until server is ready for connections.
forked process: 21
2017-05-04T20:23:51.412+0000 I CONTROL [main] ***** SERVER RESTARTED *****
2017-05-04T20:23:51.430+0000 I CONTROL [main] ERROR: Cannot write pid file to /tmp/tmp.Lo035QkbfL: No space left on device
ERROR: child process failed, exited with error number 1
Any idea how to fix this and prevent it from happening in future?
As suggested, the output of df -h is:
Filesystem Size Used Avail Capacity iused ifree %iused Mounted on
/dev/disk1 465Gi 116Gi 349Gi 25% 1963838 4293003441 0% /
devfs 183Ki 183Ki 0Bi 100% 634 0 100% /dev
map -hosts 0Bi 0Bi 0Bi 100% 0 0 100% /net
map auto_home 0Bi 0Bi 0Bi 100% 0 0 100% /home
Output of docker info is:
$ docker info
Containers: 5
Running: 0
Paused: 0
Stopped: 5
Images: 741
Server Version: 17.03.1-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 4ab9917febca54791c5f071a9d1f404867857fcc
runc version: 54296cf40ad8143b62dbcaa1d90e520a2136ddfe
init version: N/A (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
seccomp
Profile: default
Kernel Version: 4.9.13-moby
Operating System: Alpine Linux v3.5
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 1.952 GiB
Name: moby
ID: OR4L:WYWW:FFAP:IDX3:B6UK:O2AN:UVTO:EPH6:GYSV:4GV4:L5WP:BQTH
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 17
Goroutines: 30
System Time: 2017-05-04T20:45:27.056157913Z
EventsListeners: 1
No Proxy: *.local, 169.254/16
Registry: https://index.docker.io/v1/
Experimental: true
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false

As you state in the comments to the question, ls -altrh ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2 returns the following:
-rw-r--r--@ 1 alexamil staff 53G
This is a known bug on macOS (and not only there), and an official developer comment can be found here. One caveat: I have read that different people hit different size limits. In that comment it is 64 GB, but for another person it was 20 GB.
There are a couple of workarounds, but no definitive solution that I could find.
The manual one
Run docker ps -a and manually remove all unused containers. Then run docker images and remove manually all the intermediate and unused images.
The simplest one
Delete the Docker.qcow2 file entirely. But you will lose all images and containers. Completely.
The less simple
Another way is to run docker volume prune, which will remove all unused volumes
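The manual cleanup and the prune approach can be combined into one small helper. This is only a minimal sketch, assuming Docker 1.13 or later (where docker system df and the prune subcommands exist); the function name docker_reclaim is made up for illustration.

```shell
# Hypothetical helper: reclaim space inside the Docker VM.
# Assumes Docker 1.13+ for "docker system df" and the prune subcommands.
docker_reclaim() {
    docker system df            # summarize space used by images, containers, volumes
    docker container prune -f   # remove all stopped containers
    docker image prune -f       # remove dangling (untagged) images only
    docker volume prune -f      # remove volumes not referenced by any container
}
```

Call docker_reclaim from a shell. Adding -a to docker image prune also removes unused tagged images, which frees more space but means they must be pulled again later.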
The resizing one (keeps the data)
Another idea that comes to me is to expand the disk image size with QEMU or something like it:
$ brew install qemu
$ /Applications/Docker.app/Contents/MacOS/qemu-img resize ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2 +5G
After you have expanded the image, you will need to boot a VM and run GParted against Docker.qcow2 to grow the partition into the added space. You could use the GParted Live ISO for that:
$ qemu-system-x86_64 -drive file=Docker.qcow2 -m 512 -cdrom ~/Downloads/gparted-live.iso -boot d -device usb-mouse -usb
Some people report this either doesn't work or doesn't help.
Yet another resizing one (wipes the data)
Create a substitute image with the desired size (120G):
$ qemu-img create -f qcow2 ~/data.qcow2 120G
$ cp ~/data.qcow2 /Applications/Docker.app/Contents/Resources/moby/data.qcow2
$ rm ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2
data.qcow2 is copied to ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2 when you restart Docker.
This workaround comes from this comment.
Hope this helps. Good luck!

Debug Alpine Image in K8s: No `netstat`, no `ip`, no `apk`

There is a container in my Kubernetes cluster which I want to debug.
But there is no netstat, no ip and no apk.
Is there a way to upgrade this image, so that the common tools are installed?
In this case it is the nginx container image in a K8s 1.23 cluster.
Alpine-based images are stripped down to reduce their footprint, so the absence of those tools is expected. However, since Kubernetes 1.23 you can use the kubectl debug command to attach an ephemeral debug container to the pod in question.
Syntax:
kubectl debug -it <POD_TO_DEBUG> --image=ubuntu --target=<CONTAINER_TO_DEBUG> --share-processes
Example:
In the example below, an ubuntu container is attached to the nginx (Alpine) pod that needs debugging. Note that the ps -eaf output shows the nginx processes running while cat /etc/os-release shows Ubuntu, indicating that processes are shared and visible between the two containers.
ps@kube-master:~$ kubectl debug -it nginx --image=ubuntu --target=nginx --share-processes
Targeting container "nginx". If you don't see processes from this container, the container runtime doesn't support this feature.
Defaulting debug container name to debugger-2pgtt.
If you don't see a command prompt, try pressing enter.
root@nginx:/# ps -eaf
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 19:50 ? 00:00:00 nginx: master process nginx -g daemon off;
101 33 1 0 19:50 ? 00:00:00 nginx: worker process
101 34 1 0 19:50 ? 00:00:00 nginx: worker process
101 35 1 0 19:50 ? 00:00:00 nginx: worker process
101 36 1 0 19:50 ? 00:00:00 nginx: worker process
root 248 0 1 20:00 pts/0 00:00:00 bash
root 258 248 0 20:00 pts/0 00:00:00 ps -eaf
root@nginx:/#
Debugging from the ubuntu container, as seen here, gives us access to all sorts of tools:
root@nginx:/# cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.3 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.3 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
root@nginx:/#
If ephemeral containers are not yet enabled in your cluster, you can enable them via feature gates as described here.
The whole point of using containers is to optimize resource utilization in your cluster. The images used should include only the packages that are needed to run your app.
Unneeded packages should be removed from your images (especially in prod) to reduce compute utilization and to shrink the attack surface.
This appears to be a stripped-down image that has only the libraries needed to run that application.
In order to debug, you will have to create a new container in the same PID and network namespaces as the container you are trying to debug.
Build container first
Dockerfile
FROM alpine
RUN apk update && apk add strace
CMD ["strace", "-p", "1"]
Build
$ docker build -t strace .
Run
docker run -t --pid=container:<targetContainer> \
--net=container:<targetContainer> \
--cap-add sys_admin \
--cap-add sys_ptrace \
strace
strace: Process 1 attached
futex(0xd72e90, FUTEX_WAIT, 0, NULL
https://rothgar.medium.com/how-to-debug-a-running-docker-container-from-a-separate-container-983f11740dc6

"Error getting stats for file: /usr/share/metricbeat/modules.d/system.yml" when running metricbeat on docker

I'm trying to run metricbeat in a docker container to monitor a server's CPU/RAM usage and load on Kibana, but when I try to run the command sudo docker-compose up I get the following error:
metricbeat | 2021-07-28T05:02:22.033Z ERROR cfgfile/glob_watcher.go:66 Error getting stats for file: /usr/share/metricbeat/modules.d/system.yml
Also, Kibana doesn't seem to receive any of the metrics, although the rest of the container's log in the terminal looks fine.
These configurations are running on other servers and work just fine, but I can't figure out the problem here. I have also run sudo chown -R 1000:1000 configs/ and sudo chmod -R go-w configs/ in my directory.
This is the system.yml file:
- module: system
  metricsets:
    - cpu             # CPU usage
    - load            # CPU load averages
    - memory          # Memory usage
    - network         # Network IO
    - process         # Per process metrics
    - process_summary # Process summary
    - uptime          # System Uptime
    #- socket_summary # Socket summary
    - core            # Per CPU core usage
    - diskio          # Disk IO
    - filesystem      # File system usage for each mountpoint
    - fsstat          # File system summary metrics
    #- raid           # Raid
    #- socket         # Sockets and connection info (linux only)
    #- service        # systemd service information
  enabled: true
  period: 10s
  processes: ['.*']
  # Configure the mount point of the host’s filesystem for use in monitoring a host from within a container
  system.hostfs: "/hostfs"
  # Configure the metric types that are included by these metricsets.
  cpu.metrics: ["percentages","normalized_percentages"] # The other available option is ticks.
  core.metrics: ["percentages"] # The other available option is ticks.
And this is the docker-compose.yml:
services:
  metricbeat:
    image: ${METRICBEAT_IMAGE}
    container_name: metricbeat
    network_mode: host
    environment:
      - ELASTICSEARCH_HOSTS=${ELASTICSEARCH_HOSTS}
      - ELASTICSEARCH_USERNAME=${ELASTICSEARCH_USERNAME}
      - ELASTICSEARCH_PASSWORD=${ELASTICSEARCH_PASSWORD}
    volumes:
      - ./configs/metricbeat.docker.yml:/usr/share/metricbeat/metricbeat.yml:ro
      - ./configs/modules.d:/usr/share/metricbeat/modules.d:ro
      # system module
      - /proc:/hostfs/proc:ro
      - /sys/fs/cgroup:/hostfs/sys/fs/cgroup:ro
      - /:/hostfs:ro
I'd appreciate any help, as this has been bugging me for a while. Thanks in advance.
I had the same error.
I found that the permissions on modules.d were:
drw-r--r-x 2 root root 4096 Dec 2 15:17 modules.d
So I executed:
chmod g+X -R modules.d
and restarted filebeat. Bingo.
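To see which directories that chmod would touch before running it, something like the following may help. It is a minimal sketch: the function name dirs_missing_group_x is invented here, and whether a missing group search bit is really the cause depends on which user and group the Beat runs as inside the container.

```shell
# Hypothetical helper: list directories under $1 that lack the group execute
# (search) bit, which is the bit "chmod -R g+X" adds to directories.
dirs_missing_group_x() {
    # -perm -010 matches when the group execute bit is set; "!" negates it
    find "$1" -type d ! -perm -010
}
```

For example, dirs_missing_group_x ./configs prints every directory under configs that a group member could not traverse.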

Why does kubeadm not start even after disabling swap?

I am trying to install Kubernetes with kubeadm on my laptop, which runs Ubuntu 16.04. I have disabled swap, since kubelet does not work with swap on. The command I used is:
swapoff -a
I also commented out the reference to swap in /etc/fstab.
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point> <type> <options> <dump> <pass>
# / was on /dev/sda1 during installation
UUID=1d343a19-bd75-47a6-899d-7c8bc93e28ff / ext4 errors=remount-ro 0 1
# swap was on /dev/sda5 during installation
#UUID=d0200036-b211-4e6e-a194-ac2e51dfb27d none swap sw 0 0
I confirmed swap is turned off by running the following:
free -m
total used free shared buff/cache available
Mem: 15936 2108 9433 954 4394 12465
Swap: 0 0 0
When I start kubeadm, I get the following error:
kubeadm init --pod-network-cidr=10.244.0.0/16
[init] Using Kubernetes version: v1.14.2
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR Swap]: running with swap on is not supported. Please disable swap
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
I also tried restarting my laptop, but I get the same error. What could the reason be?
The warning below was the root cause:
detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd".
You need to update the Docker cgroup driver.
Apply the fix below:
cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ]
}
EOF
mkdir -p /etc/systemd/system/docker.service.d
# Restart Docker
systemctl daemon-reload
systemctl restart docker
You could try kubeadm reset, then kubeadm init --ignore-preflight-errors Swap.
First try with sudo:
sudo swapoff -a
Then check whether any swap is still active:
cat /proc/swaps
and
free -h
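The /proc/swaps check can be scripted so that it prints only the devices that are still swapping. This is a minimal sketch; the function name active_swaps is made up, and it takes an optional file argument only so it can be tried against a saved copy of /proc/swaps.

```shell
# Hypothetical helper: print the active swap areas from a /proc/swaps-style file.
active_swaps() {
    # Line 1 is the header; every following line describes one active swap area
    awk 'NR > 1 { print $1 }' "${1:-/proc/swaps}"
}
```

If active_swaps prints nothing, no swap area is active. Any device it does print is still in use and would need swapoff (and a check of /etc/fstab or systemd swap units) before retrying kubeadm init.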

Ceph luminous rbd map hangs forever

I am running a one-node Ceph cluster and using the ceph-client from another node. QEMU works fine with RBD mounting. When I try to map an RBD block device on the ceph-client, it hangs indefinitely with no output. How can I diagnose what's wrong?
The system is Ubuntu 16.04 server with Ceph Luminous.
sudo ceph tell osd.* version
{
"version": "ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)"
}
ceph -s
cluster:
id: 4bfcc109-e432-4ac0-ba9d-bf81243aea
health: HEALTH_OK
services:
mon: 1 daemons, quorum gcmaster
mgr: gcmaster(active)
osd: 1 osds: 1 up, 1 in
data:
pools: 1 pools, 128 pgs
objects: 1512 objects, 5879 MB
usage: 7356 MB used, 216 GB / 223 GB avail
pgs: 128 active+clean
rbd info gcbase
rbd image 'gcbase':
size 512 MB in 128 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.376974b0dc51
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:
create_timestamp: Fri Dec 29 17:58:02 2017
This hangs forever
rbd map gcbase --pool rbd
As does this
rbd map typo_gcbase --pool rbd
dmesg shows
Dec 29 13:27:32 cephclient1 kernel: [85798.195468] libceph: mon0 192.168.1.55:6789 feature set mismatch, my 106b84a842a42 < server's 40106b84a842a42, missing 400000000000000
Dec 29 13:27:32 cephclient1 kernel: [85798.222070] libceph: mon0 192.168.1.55:6789 missing required protocol features
The dmesg output tells what's going on: The cluster requires a feature bit that is not supported by the libceph kernel module.
The feature bit in question is either CEPH_FEATURE_CRUSH_TUNABLES5, CEPH_FEATURE_NEW_OSDOPREPLY_ENCODING or CEPH_FEATURE_FS_FILE_LAYOUT_V2 (they are overlapping because they were introduced at the same time) which only became available on kernel 4.5, whereas Ubuntu 16.04 uses a 4.4 kernel.
A similar question (although related to CephFS) came up on the mailing list with a possible solution:
Yes, you should be able to set your CRUSH tunables profile to hammer
with "ceph osd crush tunables hammer".
This will disable some features, but should make the older kernel compatible with the cluster.
Alternatively you could upgrade to a mainline kernel or to a newer OS release.
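The "missing" value in the dmesg line is simply the server's feature mask with the client's bits removed, and the bit index can be recovered with shell arithmetic. The sketch below assumes a shell with 64-bit arithmetic and masks below 2^63; the function names are invented for illustration.

```shell
# Hypothetical helpers for decoding a libceph "feature set mismatch" line.
# Assumes 64-bit shell arithmetic and masks that fit in 63 bits.

# $1 = client mask, $2 = server mask, both hex without the 0x prefix
missing_features() {
    printf '%x\n' $(( 0x$2 & ~0x$1 ))
}

# $1 = hex mask without 0x; prints the index of every set bit, lowest first
feature_bits() {
    mask=$(( 0x$1 ))
    bit=0
    while [ "$mask" -ne 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then echo "$bit"; fi
        mask=$(( mask >> 1 ))
        bit=$(( bit + 1 ))
    done
}
```

With the values from the log, missing_features 106b84a842a42 40106b84a842a42 prints 400000000000000, and feature_bits 400000000000000 prints 58, the bit shared by the three features named above.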

osm2pgsql import fails with "Failed to read from node cache: Input/output error"

I'm attempting a whole-planet OSM data import on an AWS EC2 instance. During or possibly after the "Ways" processing I receive the following message:
"Failed to read from node cache: Input/output error"
The EC2 has the following specs:
type: i3.xlarge
memory: 30.5 GiB
vCPUs: 4
Postgresql: v9.5.6
PostGIS: 2.2
In addition to the root volume, I have mounted a 900GB SSD and a 2TB HDD (high throughput). The PostgreSQL data directory is on the HDD. I have told osm2pgsql to write the flat-nodes file to the SSD.
Here is my osm2pgsql command:
osm2pgsql -c -d gis --number-processes 4 --slim -C 20000 --flat-nodes /data-cache/flat-node-cache/flat.nodes /data-postgres/planet-latest.osm.pbf
I run the above command as user renderaccount, which is a member of the following groups: renderaccount, ubuntu, postgres. The flat-nodes file appears to be successfully created at /data-cache/flat-node-cache/flat.nodes and looks like this:
ubuntu@ip-172-31-25-230:/data-cache/flat-node-cache$ ls -l
total 37281800
-rw------- 1 renderaccount renderaccount 38176555024 Apr 13 05:45 flat.nodes
Has anyone run into or resolved this? I suspected a permissions issue, but I now notice that since this last osm2pgsql failure, the mounted SSD that holds the flat-nodes file has been switched to a read-only file system, which sounds like something that may happen when there are I/O errors on the mounted volume.
Also, does osm2pgsql write to a log from which I could get additional info?
UPDATE: dmesg output:
[ 6206.884412] blk_update_request: I/O error, dev nvme0n1, sector 66250752
[ 6206.890813] EXT4-fs warning (device nvme0n1): ext4_end_bio:329: I/O error -5 writing to inode 14024706 (offset 10871640064 size 8388608 starting block 8281600)
[ 6206.890817] Buffer I/O error on device nvme0n1, logical block 8281344
After researching the above output, it appears it might be a bug in Ubuntu 16.04. https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129?comments=all
This was an error with Ubuntu 16.04 writing to the volume nvme0n1. Solved by this https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129/comments/29
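To confirm that the kernel has remounted a volume read-only after I/O errors like these, the mount table can be filtered for the ro option. This is a minimal sketch; the function name readonly_mounts is made up, and the optional file argument exists only so it can be run against a saved copy of /proc/mounts.

```shell
# Hypothetical helper: print mount points whose options include "ro"
# in a /proc/mounts-style file (fields: device, mount point, type, options, ...).
readonly_mounts() {
    # Match "ro" only as a whole option, so "errors=remount-ro" is not a hit
    awk '$4 ~ /(^|,)ro(,|$)/ { print $2 }' "${1:-/proc/mounts}"
}
```

If the flat-nodes volume shows up in the output, osm2pgsql can no longer write to it, and the underlying I/O errors in dmesg are the thing to fix.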