CentOS 7.2 error: Input/output error

Our server runs CentOS 7.2, and today it seems to have failed.
When I connect to it using ssh, I get
-bash: /share/home/MGI/.bash_profile: Input/output error
and am logged in with a prompt like this:
-bash-4.2$
ls fails:
ls: cannot open directory .: Input/output error
Trying to edit a file with vi gives "Permission denied",
but cd and pwd still work.
I googled it and found the disk might be damaged, so I tried some suggestions. mount gives me this:
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,size=32831312k,nr_inodes=8207828,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
configfs on /sys/kernel/config type configfs (rw,relatime)
/dev/md126p2 on / type ext4 (rw,relatime,data=ordered)
mqueue on /dev/mqueue type mqueue (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=35,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
nfsd on /proc/fs/nfsd type nfsd (rw,relatime)
/dev/mapper/mpatha1 on /share type xfs (rw,relatime,attr2,inode64,noquota)
/dev/md126p1 on /boot type ext4 (rw,relatime,data=ordered)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
tmpfs on /run/user/1038 type tmpfs (rw,nosuid,nodev,relatime,size=6569364k,mode=700,uid=1038,gid=1039)
tmpfs on /run/user/1016 type tmpfs (rw,nosuid,nodev,relatime,size=6569364k,mode=700,uid=1016,gid=1016)
tmpfs on /run/user/1008 type tmpfs (rw,nosuid,nodev,relatime,size=6569364k,mode=700,uid=1008,gid=1008)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=6569364k,mode=700)
gvfsd-fuse on /run/user/0/gvfs type fuse.gvfsd-fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0)
tmpfs on /run/user/1019 type tmpfs (rw,nosuid,nodev,relatime,size=6569364k,mode=700,uid=1019,gid=1020)
dmesg gives a long output, in which these two lines appear frequently:
XFS (dm-1): xfs_log_force: error -5 returned.
scsi 11:0:0:0: alua: rtpg failed with 8000002
What should I do now to find out what is actually going wrong, and how can I fix it? Many thanks.
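Error -5 is EIO, a generic low-level I/O error, and the alua/rtpg message suggests a failed path to the multipath device backing /share. A minimal diagnostic sketch, assuming /share is the affected XFS filesystem on /dev/mapper/mpatha1 as the mount output above shows:
# Check the state of the multipath paths
multipath -ll
# Look for I/O errors against the underlying SCSI devices
dmesg | grep -iE 'I/O error|xfs|alua' | tail -50
# Read-only consistency check; xfs_repair requires the filesystem to be unmounted
umount /share
xfs_repair -n /dev/mapper/mpatha1
If xfs_repair -n reports problems, repair with xfs_repair (without -n) only after the underlying path or hardware fault is fixed; repairing on top of a failing device can make things worse.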

Related

Why does it show I have no disk space left while I still have a lot of space available?

I am on a CentOS system, and df shows that I have a lot of disk space available:
See this command:
$ git pull
fatal: write error: No space left on device
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 30G 4.2G 24G 15% /
devtmpfs 63G 0 63G 0% /dev
tmpfs 63G 0 63G 0% /dev/shm
tmpfs 63G 435M 63G 1% /run
tmpfs 63G 0 63G 0% /sys/fs/cgroup
/dev/sda2 30G 28G 0 100% /usr
/dev/sda7 148G 24G 118G 17% /data0
/dev/sda6 30G 1.3G 27G 5% /var
/dev/sda5 30G 45M 28G 1% /tmp
/dev/sdc1 3.9T 462G 3.3T 13% /data1
/dev/sdb1 274G 107G 154G 42% /data2
tmpfs 13G 0 13G 0% /run/user/60422
And I am currently running the git pull command under /data1, which still has 87% of its space free.
Why is that?
EDIT:
df -ih
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sda1 1.9M 14K 1.9M 1% /
devtmpfs 16M 610 16M 1% /dev
tmpfs 16M 1 16M 1% /dev/shm
tmpfs 16M 1022 16M 1% /run
tmpfs 16M 16 16M 1% /sys/fs/cgroup
/dev/sda2 1.9M 344K 1.6M 18% /usr
/dev/sda7 9.5M 58K 9.4M 1% /data0
/dev/sda6 1.9M 14K 1.9M 1% /var
/dev/sda5 1.9M 35 1.9M 1% /tmp
/dev/sdc1 251M 160K 251M 1% /data1
/dev/sdb1 18M 1.2K 18M 1% /data2
tmpfs 16M 1 16M 1% /run/user/60422
Maybe you are running out of inodes? Check with df -ih.
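Since the inode counts above look healthy, another thing worth checking (an illustrative sketch, not from the original thread; the repository path is a placeholder) is which filesystem the failing write actually lands on — git pull writes into the repository's .git directory, and temporary files may go to $TMPDIR:
# Which mounts back the repository and the temp directory?
df -h /data1/myrepo/.git /tmp
# Inode usage for the same paths
df -ih /data1/myrepo/.git /tmp
Also note that /dev/sda2 (/usr) is already at 100% in the listing above, so any write landing there would fail with exactly this error even though other mounts have room.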

How to delete files on a read-only filesystem not found in the output of `mount`?

I want to rm -rf /var/run/secrets/kubernetes.io/serviceaccount/ to delete the default Kubernetes service account credentials, for testing anonymous API access.
However, running the above command shows that many of the files are on a read-only filesystem, so I want to temporarily remount that filesystem read-write (mount -o remount) to delete the files.
Now, how can I tell which filesystem in the output of mount is the one to remount? None of the filesystems below are mounted on a /var/run/ path. The closest match is /var/lib/ in the options of the overlay filesystem mounted on /.
How can I safely delete the files under /var/run/secrets/kubernetes.io/serviceaccount/?
root@ctf1-deploy1-6fd44cbcd6-vrckg:~# rm -rf /var/run/secrets/kubernetes.io/serviceaccount/
rm: cannot remove '/var/run/secrets/kubernetes.io/serviceaccount/..data': Read-only file system
rm: cannot remove '/var/run/secrets/kubernetes.io/serviceaccount/token': Read-only file system
rm: cannot remove '/var/run/secrets/kubernetes.io/serviceaccount/namespace': Read-only file system
rm: cannot remove '/var/run/secrets/kubernetes.io/serviceaccount/ca.crt': Read-only file system
rm: cannot remove '/var/run/secrets/kubernetes.io/serviceaccount/..2020_03_25_08_59_25.631059710/namespace': Read-only file system
rm: cannot remove '/var/run/secrets/kubernetes.io/serviceaccount/..2020_03_25_08_59_25.631059710/ca.crt': Read-only file system
rm: cannot remove '/var/run/secrets/kubernetes.io/serviceaccount/..2020_03_25_08_59_25.631059710/token': Read-only file system
root@ctf1-deploy1-6fd44cbcd6-vrckg:~# mount
overlay on / type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay2/l/ZLOIVO6AXDQFZ3NU3O4VYG5QJC:/var/lib/docker/overlay2/l/MZY5Y3MJC6IVDUFSNOQEU55JZJ:/var/lib/docker/overlay2/l/E5HAR5VEWTG6MCFYN22KDJNMK3:/var/lib/docker/overlay2/l/5Z2WGKVJRNGPXV5QIFR5CWHJSE:/var/lib/docker/overlay2/l/U5HNVHXGGWRIGBX3XJV5CW5VIZ:/var/lib/docker/overlay2/l/TI5WPYBQSWQJXBMOY7DT5DN26Z:/var/lib/docker/overlay2/l/XOZNIEPFIZTLP2HEHFKL66H4EO,upperdir=/var/lib/docker/overlay2/4ea51426b9eb47af0faf53e60a25b6cffae64c3338130663efd1068c7d2ffb20/diff,workdir=/var/lib/docker/overlay2/4ea51426b9eb47af0faf53e60a25b6cffae64c3338130663efd1068c7d2ffb20/work)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev type tmpfs (rw,nosuid,size=65536k,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666)
sysfs on /sys type sysfs (ro,nosuid,nodev,noexec,relatime)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,relatime,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (ro,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (ro,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/blkio type cgroup (ro,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (ro,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/perf_event type cgroup (ro,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/devices type cgroup (ro,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/memory type cgroup (ro,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/freezer type cgroup (ro,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/cpuset type cgroup (ro,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/pids type cgroup (ro,nosuid,nodev,noexec,relatime,pids)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
/dev/nvme0n1p2 on /dev/termination-log type ext4 (rw,relatime,data=ordered)
/dev/nvme0n1p2 on /etc/resolv.conf type ext4 (rw,relatime,data=ordered)
/dev/nvme0n1p2 on /etc/hostname type ext4 (rw,relatime,data=ordered)
/dev/nvme0n1p2 on /etc/hosts type ext4 (rw,relatime,data=ordered)
shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)
tmpfs on /run/secrets/kubernetes.io/serviceaccount type tmpfs (ro,relatime)
proc on /proc/bus type proc (ro,relatime)
proc on /proc/fs type proc (ro,relatime)
proc on /proc/irq type proc (ro,relatime)
proc on /proc/sys type proc (ro,relatime)
proc on /proc/sysrq-trigger type proc (ro,relatime)
tmpfs on /proc/acpi type tmpfs (ro,relatime)
tmpfs on /proc/kcore type tmpfs (rw,nosuid,size=65536k,mode=755)
tmpfs on /proc/keys type tmpfs (rw,nosuid,size=65536k,mode=755)
tmpfs on /proc/timer_list type tmpfs (rw,nosuid,size=65536k,mode=755)
tmpfs on /proc/sched_debug type tmpfs (rw,nosuid,size=65536k,mode=755)
tmpfs on /sys/firmware type tmpfs (ro,relatime)
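One detail worth knowing here: on most modern distributions /var/run is a symlink to /run, which is why the tmpfs shows up in the mount output as /run/secrets/kubernetes.io/serviceaccount rather than under /var/run/. A minimal sketch of confirming the symlink and remounting that tmpfs read-write (assuming the container is privileged enough to remount; in an unprivileged container the remount fails with "permission denied"):
# Resolve the symlink: /var/run -> /run
readlink -f /var/run/secrets/kubernetes.io/serviceaccount
# Remount the tmpfs read-write, then delete the files
mount -o remount,rw /run/secrets/kubernetes.io/serviceaccount
rm -rf /run/secrets/kubernetes.io/serviceaccount/*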

Disk usage in a Kubernetes pod

I am trying to debug the storage usage in my Kubernetes pod. I have seen the pod get evicted because of Disk Pressure. When I log in to the running pod, I see the following:
Filesystem Size Used Avail Use% Mounted on
overlay 30G 21G 8.8G 70% /
tmpfs 64M 0 64M 0% /dev
tmpfs 14G 0 14G 0% /sys/fs/cgroup
/dev/sda1 30G 21G 8.8G 70% /etc/hosts
shm 64M 0 64M 0% /dev/shm
tmpfs 14G 12K 14G 1% /run/secrets/kubernetes.io/serviceaccount
tmpfs 14G 0 14G 0% /proc/acpi
tmpfs 14G 0 14G 0% /proc/scsi
tmpfs 14G 0 14G 0% /sys/firmware
root@deploy-9f45856c7-wx9hj:/# du -sh /
du: cannot access '/proc/1142/task/1142/fd/3': No such file or directory
du: cannot access '/proc/1142/task/1142/fdinfo/3': No such file or directory
du: cannot access '/proc/1142/fd/4': No such file or directory
du: cannot access '/proc/1142/fdinfo/4': No such file or directory
227M /
root@deploy-9f45856c7-wx9hj:/# du -sh /tmp
11M /tmp
root@deploy-9f45856c7-wx9hj:/# du -sh /dev
0 /dev
root@deploy-9f45856c7-wx9hj:/# du -sh /sys
0 /sys
root@deploy-9f45856c7-wx9hj:/# du -sh /etc
1.5M /etc
root@deploy-9f45856c7-wx9hj:/#
As we can see, 21G is consumed, but when I run du -sh it returns just 227M. I would like to find out who (which directory) is consuming the space.
According to the Node Conditions docs, DiskPressure has to do with conditions on the node causing the kubelet to evict the pod. It doesn't necessarily mean it's the pod that caused those conditions.
DiskPressure
Available disk space and inodes on either the node’s root filesystem
or image filesystem has satisfied an eviction threshold
You may want to investigate what's happening on the node instead.
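A minimal sketch of that node-side investigation (illustrative commands, not from the original answer; the node name is a placeholder):
# From outside: check the node's reported conditions and allocated resources
kubectl describe node <node-name>
# On the node itself: find what is filling the root or image filesystem
df -h /
du -sh /var/lib/docker /var/lib/kubelet /var/log 2>/dev/null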
It looks like process 1142 is still running and holding file descriptors, and perhaps some disk space (you may have other processes with unreleased file descriptors too). Is it the kubelet? To alleviate the problem you can verify that it's running and then kill it:
$ ps -Af | grep 1142
$ kill -9 1142
P.S. You would need to provide more information about the processes and what's running on that node.
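On the du-versus-df mismatch specifically: a common cause of df reporting far more usage than du can find is files that were deleted while a process still holds them open; the space is only released when the last file descriptor closes. A quick check, assuming lsof is available in the pod:
# List open files whose on-disk link count is zero (deleted but still held open)
lsof +L1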

How to attach an extra volume on a CentOS 7 server

I have created an additional volume on my server.
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 19G 3.4G 15G 19% /
devtmpfs 874M 0 874M 0% /dev
tmpfs 896M 0 896M 0% /dev/shm
tmpfs 896M 17M 879M 2% /run
tmpfs 896M 0 896M 0% /sys/fs/cgroup
tmpfs 180M 0 180M 0% /run/user/0
/dev/sdb 25G 44M 24G 1% /mnt/HC_Volume_1788024
How can I attach /dev/sdb either to the whole server (I mean, merge it with /dev/sda1) or assign it to a specific directory on the server, such as /var/lib, without overwriting the current contents of /var/lib?
You will not be able to "merge" them, as you are using standard sdX devices rather than something like LVM for your filesystems.
As root, you can manually run:
mount /dev/sdb /var/lib/
The original content of /var/lib will still be there, hidden under the new mount (and still taking up space on your / filesystem).
To make it permanent, (carefully) edit your /etc/fstab and add a line like:
/dev/sdb /var/lib FILESYSTEM_OF_YOUR_SDB_DISK defaults 0 0
You will need to replace "FILESYSTEM_OF_YOUR_SDB_DISK" with the correct filesystem type ("df -T", "blkid" or "lsblk -f" will show the type)
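For example, assuming the disk carries an ext4 filesystem, a slightly more robust variant references the filesystem by UUID (taken from blkid) so the entry survives device renaming; the UUID below is a placeholder:
# blkid /dev/sdb
/dev/sdb: UUID="0f3a4b1c-..." TYPE="ext4"
# corresponding /etc/fstab line
UUID=0f3a4b1c-... /var/lib ext4 defaults 0 0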
You should test the correctness of your /etc/fstab by first unmounting (if you previously mounted):
umount /var/lib
Then run:
mount -a
df -T
and you should see the mount point listed, with mount -a producing no errors.
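One caveat the steps above do not cover (an illustrative sketch, not part of the original answer): anything already stored under /var/lib becomes invisible to services the moment the new filesystem is mounted over it, so you would normally copy the existing data onto the new disk first:
# Stop services that write under /var/lib before copying (which ones is site-specific)
rsync -aHAX /var/lib/ /mnt/HC_Volume_1788024/
umount /mnt/HC_Volume_1788024
mount /dev/sdb /var/lib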

Which device is the Docker container writing to?

I am trying to throttle the disk I/O of a Docker container using the blkio controller (without destroying the container), but I am unsure how to find out which device to run the throttling on.
The Docker container is running Mongo. Running df -h inside a bash shell in the container gives the following:
root@82e7bdc56db0:/# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/docker-202:1-524400-916a3c171357c4f0349e0145e34e7faf60720c66f9a68badcc09d05397190c64 10G 379M 9.7G 4% /
tmpfs 1.9G 0 1.9G 0% /dev
tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
/dev/xvda1 32G 3.2G 27G 11% /data/db
shm 64M 0 64M 0% /dev/shm
Is there a way to find out which device to limit on the host machine? Thanks!
$ docker info
Containers: 9
Running: 9
Paused: 0
Stopped: 0
Images: 6
Server Version: 1.12.1
Storage Driver: devicemapper
Pool Name: docker-202:1-524400-pool
Pool Blocksize: 65.54 kB
Base Device Size: 10.74 GB
Backing Filesystem: xfs
Data file: /dev/loop0
Metadata file: /dev/loop1
Data Space Used: 1.694 GB
Data Space Total: 107.4 GB
Data Space Available: 30.31 GB
Metadata Space Used: 3.994 MB
Metadata Space Total: 2.147 GB
Metadata Space Available: 2.143 GB
Thin Pool Minimum Free Space: 10.74 GB
Udev Sync Supported: true
Deferred Removal Enabled: false
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Data loop file: /var/lib/docker/devicemapper/devicemapper/data
WARNING: Usage of loopback devices is strongly discouraged for production use. Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
Library Version: 1.02.110 (2015-10-30)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: null host overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.4.0-38-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.675 GiB
Name: ip-172-31-6-72
ID: 4RCS:IMKM:A5ZT:H5IA:6B4B:M3IG:XGWK:2223:UAZX:GHNA:FUST:E5XC
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
127.0.0.0/8
df -h on the host machine:
Filesystem Size Used Avail Use% Mounted on
udev 1.9G 0 1.9G 0% /dev
tmpfs 377M 6.1M 371M 2% /run
/dev/xvda1 32G 3.2G 27G 11% /
tmpfs 1.9G 0 1.9G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
tmpfs 377M 0 377M 0% /run/user/1001
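A sketch of how the device could be identified and throttled here (illustrative, not from the original question; cgroup v1 is assumed, matching the cgroupfs driver shown above). The pool name docker-202:1-524400-pool embeds the major:minor of its backing device, 202:1, and the container's /data/db is a bind mount of /dev/xvda1, which is that same 202:1 device:
# Confirm the major:minor of the device (prints them in hex: ca 1 == 202:1)
stat -c '%t %T' /dev/xvda1
# Throttle the container's writes to that device to ~10 MB/s
# (the full container ID is a placeholder)
echo "202:1 10485760" > /sys/fs/cgroup/blkio/docker/<full-container-id>/blkio.throttle.write_bps_device
Note that cgroup v1 blkio throttling applies reliably only to direct or synchronous I/O; page-cache writeback largely bypasses it.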