Debugging emacs consuming 100%, but still responsive - emacs

I notice that emacs is consuming 100% of cpu. An strace of the emacs process shows that it is writting the following text:
Wrong type argument: window-live-p, #<window 3>
Wrong type argument: window-live-p, #<window 3>
Wrong type argument: window-live-p, #<window 3>
Wrong type argument: window-live-p, #<window 3>
Wrong type argument: window-live-p, #<window 3>
and so on.
The text is being written to a file descriptor which according to /proc/.../fd/ is a pipe.
Emacs is still responsive, so this must be code that is running on the "background".
How can identify which code is responsible for this and debug it?
Edit:
I checked the pipe number to which this text is being written, and did an lsof. I see that the text is possibly interacting with dconf, and gdbus.
$lsof |grep 6967
sh 7147 myuser 1w FIFO 0,8 0t0 6967 pipe
sh 7147 myuser 2w FIFO 0,8 0t0 6967 pipe
emacs 7148 myuser 1w FIFO 0,8 0t0 6967 pipe
emacs 7148 myuser 2w FIFO 0,8 0t0 6967 pipe
dconf 7148 7149 myuser 1w FIFO 0,8 0t0 6967 pipe
dconf 7148 7149 myuser 2w FIFO 0,8 0t0 6967 pipe
gdbus 7148 7151 myuser 1w FIFO 0,8 0t0 6967 pipe
gdbus 7148 7151 myuser 2w FIFO 0,8 0t0 6967 pipe
$
To be honest, the actual strace output is the following:
write(2, "\n", 1) = -1 EPIPE (Broken pipe)
write(2, "W", 1) = -1 EPIPE (Broken pipe)
write(2, "r", 1) = -1 EPIPE (Broken pipe)
write(2, "o", 1) = -1 EPIPE (Broken pipe)
write(2, "n", 1) = -1 EPIPE (Broken pipe)
write(2, "g", 1) = -1 EPIPE (Broken pipe)
write(2, " ", 1) = -1 EPIPE (Broken pipe)
write(2, "t", 1) = -1 EPIPE (Broken pipe)
write(2, "y", 1) = -1 EPIPE (Broken pipe)
write(2, "p", 1) = -1 EPIPE (Broken pipe)
write(2, "e", 1) = -1 EPIPE (Broken pipe)
write(2, " ", 1) = -1 EPIPE (Broken pipe)
write(2, "a", 1) = -1 EPIPE (Broken pipe)
write(2, "r", 1) = -1 EPIPE (Broken pipe)
write(2, "g", 1) = -1 EPIPE (Broken pipe)
write(2, "u", 1) = -1 EPIPE (Broken pipe)
write(2, "m", 1) = -1 EPIPE (Broken pipe)
write(2, "e", 1) = -1 EPIPE (Broken pipe)
write(2, "n", 1) = -1 EPIPE (Broken pipe)
write(2, "t", 1) = -1 EPIPE (Broken pipe)
write(2, ":", 1) = -1 EPIPE (Broken pipe)
write(2, " ", 1) = -1 EPIPE (Broken pipe)
write(2, "w", 1) = -1 EPIPE (Broken pipe)
write(2, "i", 1) = -1 EPIPE (Broken pipe)
write(2, "n", 1) = -1 EPIPE (Broken pipe)
write(2, "d", 1) = -1 EPIPE (Broken pipe)
write(2, "o", 1) = -1 EPIPE (Broken pipe)
write(2, "w", 1) = -1 EPIPE (Broken pipe)
write(2, "-", 1) = -1 EPIPE (Broken pipe)
write(2, "l", 1) = -1 EPIPE (Broken pipe)
write(2, "i", 1) = -1 EPIPE (Broken pipe)
write(2, "v", 1) = -1 EPIPE (Broken pipe)
write(2, "e", 1) = -1 EPIPE (Broken pipe)
write(2, "-", 1) = -1 EPIPE (Broken pipe)
write(2, "p", 1) = -1 EPIPE (Broken pipe)
write(2, ",", 1) = -1 EPIPE (Broken pipe)
write(2, " ", 1) = -1 EPIPE (Broken pipe)
write(2, "#", 1) = -1 EPIPE (Broken pipe)
write(2, "<", 1) = -1 EPIPE (Broken pipe)
write(2, "w", 1) = -1 EPIPE (Broken pipe)
write(2, "i", 1) = -1 EPIPE (Broken pipe)
write(2, "n", 1) = -1 EPIPE (Broken pipe)
write(2, "d", 1) = -1 EPIPE (Broken pipe)
write(2, "o", 1) = -1 EPIPE (Broken pipe)
write(2, "w", 1) = -1 EPIPE (Broken pipe)
write(2, " ", 1) = -1 EPIPE (Broken pipe)
write(2, "3", 1) = -1 EPIPE (Broken pipe)
write(2, ">", 1) = -1 EPIPE (Broken pipe)
write(2, "\n", 1) = -1 EPIPE (Broken pipe)
So it is writing what I originally wrote, but one character at a time. However, it seems to be writing this to a broken pipe.

Related

How to add the lustre file system client to a BlueData container?

I'm trying to set up a lustre client (docs) inside a docker container running on BlueData.
As per this post, I've modified the BlueData config on each worker and the controller node:
$ vi /opt/bluedata/common-install/bd_mgmt/releases/1/sys.config
I added the SYS_ADMIN capability:
{allowed_docker_caps, ["SETPCAP",
"SYS_ADMIN",
...
And rebooted the host.
Next, I provisioned a Centos 7.x cluster in BlueData:
CentOS 7.x with no pre-packaged apps or software
Image Version: 2.2
Distro ID: bluedata/centos7
Then I ssh'd into the Centos container:
$ ssh -o StrictHostKeyChecking=no -i /Users/me/.ssh/id_rsa centos#x.x.x.x
Inside the container, I install the lustre client:
sudo yum install \
kernel \
kernel-devel \
kernel-headers \
kernel-abi-whitelists \
kernel-tools \
kernel-tools-libs \
kernel-tools-libs-devel
cat >/tmp/lustre-repo.conf <<\__EOF
[lustre-server]
name=lustre-server
baseurl=https://downloads.whamcloud.com/public/lustre/latest-release/el7/server
gpgcheck=0
[lustre-client]
name=lustre-client
baseurl=https://downloads.whamcloud.com/public/lustre/latest-release/el7/client
gpgcheck=0
[e2fsprogs-wc]
name=e2fsprogs-wc
baseurl=https://downloads.whamcloud.com/public/e2fsprogs/latest/el7
gpgcheck=0
__EOF
sudo mv /tmp/lustre-repo.conf /etc/yum.repos.d/lustre.repo
sudo reboot
sudo yum install epel-release
sudo yum --nogpgcheck --enablerepo=lustre-client install lustre-client-dkms lustre-client
sudo reboot
However, I receive an error when I try to load the lustre module:
$ sudo modprobe -v lustre
insmod /lib/modules/3.10.0-957.21.3.el7.x86_64/kernel/crypto/crct10dif_generic.ko.xz
insmod /lib/modules/3.10.0-957.21.3.el7.x86_64/kernel/lib/crc-t10dif.ko.xz
modprobe: ERROR: could not insert 'lustre': Operation not permitted
I have checked the kernel version:
[bluedata#bluedata-2 ~]$ uname -a
Linux bluedata-2.bdlocal 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
The lustre version I installed is 2.12:
kmod-lustre-client.x86_64 2.12.2-1.el7 #lustre-client
lustre-client.x86_64 2.12.2-1.el7 #lustre-client
Update 1
No errors are shown with dmesg:
[bluedata#bluedata-3 ~]$ dmesg -c
[bluedata#bluedata-3 ~]$ sudo modprobe -v lustre
insmod /lib/modules/3.10.0-957.21.3.el7.x86_64/kernel/crypto/crct10dif_generic.ko.xz
insmod /lib/modules/3.10.0-957.21.3.el7.x86_64/kernel/lib/crc-t10dif.ko.xz
modprobe: ERROR: could not insert 'lustre': Operation not permitted
[bluedata#bluedata-3 ~]$ dmesg
Update 2
$ sudo strace modprobe lustre
Outputs:
execve("/sbin/modprobe", ["modprobe", "lustre"], [/* 16 vars */]) = 0
brk(NULL) = 0x1648000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4458ff2000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=22387, ...}) = 0
mmap(NULL, 22387, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f4458fec000
close(3) = 0
open("/lib64/liblzma.so.5", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\2000\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=157424, ...}) = 0
mmap(NULL, 2249352, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f4458bac000
mprotect(0x7f4458bd1000, 2093056, PROT_NONE) = 0
mmap(0x7f4458dd0000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x24000) = 0x7f4458dd0000
close(3) = 0
open("/lib64/libz.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20!\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=90248, ...}) = 0
mmap(NULL, 2183272, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f4458996000
mprotect(0x7f44589ab000, 2093056, PROT_NONE) = 0
mmap(0x7f4458baa000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14000) = 0x7f4458baa000
close(3) = 0
open("/lib64/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220*\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=88776, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4458feb000
mmap(NULL, 2184192, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f4458780000
mprotect(0x7f4458795000, 2093056, PROT_NONE) = 0
mmap(0x7f4458994000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14000) = 0x7f4458994000
close(3) = 0
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240%\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2151672, ...}) = 0
mmap(NULL, 3981792, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f44583b3000
mprotect(0x7f4458575000, 2097152, PROT_NONE) = 0
mmap(0x7f4458775000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1c2000) = 0x7f4458775000
mmap(0x7f445877b000, 16864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f445877b000
close(3) = 0
open("/lib64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260l\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=141968, ...}) = 0
mmap(NULL, 2208904, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f4458197000
mprotect(0x7f44581ae000, 2093056, PROT_NONE) = 0
mmap(0x7f44583ad000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x16000) = 0x7f44583ad000
mmap(0x7f44583af000, 13448, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f44583af000
close(3) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4458fea000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4458fe8000
arch_prctl(ARCH_SET_FS, 0x7f4458fe8740) = 0
mprotect(0x7f4458775000, 16384, PROT_READ) = 0
mprotect(0x7f44583ad000, 4096, PROT_READ) = 0
mprotect(0x7f4458994000, 4096, PROT_READ) = 0
mprotect(0x7f4458baa000, 4096, PROT_READ) = 0
mprotect(0x7f4458dd0000, 4096, PROT_READ) = 0
mprotect(0x621000, 4096, PROT_READ) = 0
mprotect(0x7f4458ff3000, 4096, PROT_READ) = 0
munmap(0x7f4458fec000, 22387) = 0
set_tid_address(0x7f4458fe8a10) = 1264
set_robust_list(0x7f4458fe8a20, 24) = 0
rt_sigaction(SIGRTMIN, {0x7f445819d790, [], SA_RESTORER|SA_SIGINFO, 0x7f44581a65d0}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0x7f445819d820, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x7f44581a65d0}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
brk(NULL) = 0x1648000
brk(0x1669000) = 0x1669000
brk(NULL) = 0x1669000
uname({sysname="Linux", nodename="bluedata-3.bdlocal", ...}) = 0
stat("/etc/modprobe.d", {st_mode=S_IFDIR|0755, st_size=54, ...}) = 0
openat(AT_FDCWD, "/etc/modprobe.d", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
getdents(3, /* 4 entries */, 32768) = 128
newfstatat(3, "dccp-blacklist.conf", {st_mode=S_IFREG|0644, st_size=215, ...}, 0) = 0
newfstatat(3, "ko2iblnd.conf", {st_mode=S_IFREG|0644, st_size=999, ...}, 0) = 0
getdents(3, /* 0 entries */, 32768) = 0
close(3) = 0
stat("/run/modprobe.d", 0x7ffcc1e0a640) = -1 ENOENT (No such file or directory)
stat("/lib/modprobe.d", {st_mode=S_IFDIR|0755, st_size=6, ...}) = 0
openat(AT_FDCWD, "/lib/modprobe.d", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
getdents(3, /* 2 entries */, 32768) = 48
getdents(3, /* 0 entries */, 32768) = 0
close(3) = 0
open("/etc/modprobe.d/dccp-blacklist.conf", O_RDONLY|O_CLOEXEC) = 3
fcntl(3, F_GETFL) = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fstat(3, {st_mode=S_IFREG|0644, st_size=215, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4458ff1000
read(3, "# DCCP is considered a potential"..., 4096) = 215
read(3, "", 4096) = 0
close(3) = 0
munmap(0x7f4458ff1000, 4096) = 0
open("/etc/modprobe.d/ko2iblnd.conf", O_RDONLY|O_CLOEXEC) = 3
fcntl(3, F_GETFL) = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fstat(3, {st_mode=S_IFREG|0644, st_size=999, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4458ff1000
read(3, "# Currently it isn't possible to"..., 4096) = 999
read(3, "", 4096) = 0
close(3) = 0
munmap(0x7f4458ff1000, 4096) = 0
open("/lib/modules/3.10.0-957.21.3.el7.x86_64/modules.softdep", O_RDONLY|O_CLOEXEC) = 3
fcntl(3, F_GETFL) = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fstat(3, {st_mode=S_IFREG|0644, st_size=518, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4458ff1000
read(3, "# Soft dependencies extracted fr"..., 4096) = 518
read(3, "", 4096) = 0
close(3) = 0
munmap(0x7f4458ff1000, 4096) = 0
open("/proc/cmdline", O_RDONLY|O_CLOEXEC) = 3
read(3, "BOOT_IMAGE=/boot/vmlinuz-3.10.0-"..., 4095) = 193
read(3, "", 3902) = 0
close(3) = 0
open("/lib/modules/3.10.0-957.21.3.el7.x86_64/modules.dep.bin", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=382199, ...}) = 0
mmap(NULL, 382199, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f4458f8a000
close(3) = 0
open("/lib/modules/3.10.0-957.21.3.el7.x86_64/modules.alias.bin", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=802187, ...}) = 0
mmap(NULL, 802187, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f4458ec6000
close(3) = 0
open("/lib/modules/3.10.0-957.21.3.el7.x86_64/modules.symbols.bin", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=537967, ...}) = 0
mmap(NULL, 537967, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f4458e42000
close(3) = 0
open("/lib/modules/3.10.0-957.21.3.el7.x86_64/modules.builtin.bin", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=9332, ...}) = 0
mmap(NULL, 9332, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f4458fef000
close(3) = 0
stat("/lib/modules/3.10.0-957.21.3.el7.x86_64/extra/lmv.ko.xz", {st_mode=S_IFREG|0644, st_size=58688, ...}) = 0
stat("/lib/modules/3.10.0-957.21.3.el7.x86_64/extra/mdc.ko.xz", {st_mode=S_IFREG|0644, st_size=81772, ...}) = 0
stat("/lib/modules/3.10.0-957.21.3.el7.x86_64/extra/fid.ko.xz", {st_mode=S_IFREG|0644, st_size=11592, ...}) = 0
stat("/lib/modules/3.10.0-957.21.3.el7.x86_64/extra/osc.ko.xz", {st_mode=S_IFREG|0644, st_size=133688, ...}) = 0
stat("/lib/modules/3.10.0-957.21.3.el7.x86_64/extra/lov.ko.xz", {st_mode=S_IFREG|0644, st_size=101472, ...}) = 0
stat("/lib/modules/3.10.0-957.21.3.el7.x86_64/extra/fld.ko.xz", {st_mode=S_IFREG|0644, st_size=14600, ...}) = 0
stat("/lib/modules/3.10.0-957.21.3.el7.x86_64/extra/ptlrpc.ko.xz", {st_mode=S_IFREG|0644, st_size=369448, ...}) = 0
stat("/lib/modules/3.10.0-957.21.3.el7.x86_64/extra/obdclass.ko.xz", {st_mode=S_IFREG|0644, st_size=270652, ...}) = 0
stat("/lib/modules/3.10.0-957.21.3.el7.x86_64/extra/lnet.ko.xz", {st_mode=S_IFREG|0644, st_size=174800, ...}) = 0
stat("/lib/modules/3.10.0-957.21.3.el7.x86_64/extra/libcfs.ko.xz", {st_mode=S_IFREG|0644, st_size=88252, ...}) = 0
stat("/lib/modules/3.10.0-957.21.3.el7.x86_64/kernel/lib/crc-t10dif.ko.xz", {st_mode=S_IFREG|0644, st_size=2028, ...}) = 0
stat("/lib/modules/3.10.0-957.21.3.el7.x86_64/kernel/crypto/crct10dif_common.ko.xz", {st_mode=S_IFREG|0644, st_size=2004, ...}) = 0
open("/sys/module/lustre/initstate", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/sys/module/lustre", 0x7ffcc1e0a5c0) = -1 ENOENT (No such file or directory)
stat("/lib/modules/3.10.0-957.21.3.el7.x86_64/kernel/crypto/crct10dif_common.ko.xz", {st_mode=S_IFREG|0644, st_size=2004, ...}) = 0
stat("/lib/modules/3.10.0-957.21.3.el7.x86_64/kernel/crypto/crct10dif_common.ko.xz", {st_mode=S_IFREG|0644, st_size=2004, ...}) = 0
open("/sys/module/crct10dif_common/initstate", O_RDONLY|O_CLOEXEC) = 3
read(3, "live\n", 31) = 5
read(3, "", 26) = 0
close(3) = 0
open("/sys/module/crct10dif_common/initstate", O_RDONLY|O_CLOEXEC) = 3
read(3, "live\n", 31) = 5
read(3, "", 26) = 0
close(3) = 0
open("/sys/module/crct10dif_pclmul/initstate", O_RDONLY|O_CLOEXEC) = 3
read(3, "live\n", 31) = 5
read(3, "", 26) = 0
close(3) = 0
open("/sys/module/crct10dif_common/initstate", O_RDONLY|O_CLOEXEC) = 3
read(3, "live\n", 31) = 5
read(3, "", 26) = 0
close(3) = 0
open("/sys/module/crct10dif_generic/initstate", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/sys/module/crct10dif_generic", 0x7ffcc1e0a5c0) = -1 ENOENT (No such file or directory)
open("/lib/modules/3.10.0-957.21.3.el7.x86_64/kernel/crypto/crct10dif_generic.ko.xz", O_RDONLY|O_CLOEXEC) = 3
read(3, "\3757zXZ\0", 6) = 6
lseek(3, 0, SEEK_SET) = 0
read(3, "\3757zXZ\0\0\4\346\326\264F\2\0!\1\26\0\0\0t/\345\243\340\30l\6\267]\0?"..., 8192) = 1784
mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4457996000
read(3, "", 8192) = 0
munmap(0x7f4457996000, 8392704) = 0
init_module(0x1653f40, 6253, "") = -1 ENOSYS (Function not implemented)
open("/sys/module/crc_t10dif/initstate", O_RDONLY|O_CLOEXEC) = -1 ENOSYS (Function not implemented)
stat("/sys/module/crc_t10dif", 0x7ffcc1e0a5c0) = -1 ENOSYS (Function not implemented)
open("/lib/modules/3.10.0-957.21.3.el7.x86_64/kernel/lib/crc-t10dif.ko.xz", O_RDONLY|O_CLOEXEC) = -1 ENOSYS (Function not implemented)
read(4, 0x7ffcc1e0b5f0, 6) = -1 ENOSYS (Function not implemented)
lseek(4, 0, SEEK_SET) = -1 ENOSYS (Function not implemented)
read(4, 0x7ffcc1e074e0, 8192) = -1 ENOSYS (Function not implemented)
brk(NULL) = -1 ENOSYS (Function not implemented)
brk(0x1e7d000) = -1 ENOSYS (Function not implemented)
read(4, 0x7ffcc1e074e0, 8192) = -1 EPERM (Operation not permitted)
close(3) = 0
write(2, "modprobe: ERROR: could not inser"..., 68modprobe: ERROR: could not insert 'lustre': Operation not permitted
) = 68
close(4) = 0
munmap(0x7f4458f8a000, 382199) = 0
munmap(0x7f4458ec6000, 802187) = 0
munmap(0x7f4458e42000, 537967) = 0
munmap(0x7f4458fef000, 9332) = 0
exit_group(1) = ?
+++ exited with 1 +++
Update 3
I tried installing the kmod package instead of dkms:
Running transaction
Installing : kmod-lustre-client-2.12.2-1.el7.x86_64 1/1
mknod: '/var/tmp/dracut.cG1SKj/initramfs/dev/null': Operation not permitted
mknod: '/var/tmp/dracut.cG1SKj/initramfs/dev/kmsg': Operation not permitted
mknod: '/var/tmp/dracut.cG1SKj/initramfs/dev/console': Operation not permitted
Verifying : kmod-lustre-client-2.12.2-1.el7.x86_64 1/1
Installed:
kmod-lustre-client.x86_64 0:2.12.2-1.el7
Complete!
I then tried again sudo strace modprobe lustre:
...
open("/lib/modules/3.10.0-957.21.3.el7.x86_64/kernel/lib/crc-t10dif.ko.xz", O_RDONLY|O_CLOEXEC) = -1 ENOSYS (Function not implemented)
read(4, 0x7fff450be5f0, 6) = -1 ENOSYS (Function not implemented)
lseek(4, 0, SEEK_SET) = -1 ENOSYS (Function not implemented)
read(4, 0x7fff450ba4e0, 8192) = -1 ENOSYS (Function not implemented)
brk(NULL) = -1 ENOSYS (Function not implemented)
brk(0x1410000) = -1 ENOSYS (Function not implemented)
read(4, 0x7fff450ba4e0, 8192) = -1 EPERM (Operation not permitted)
close(3) = 0
write(2, "modprobe: ERROR: could not inser"..., 68modprobe: ERROR: could not insert 'lustre': Operation not permitted
) = 68
close(4) = 0
munmap(0x7f04da388000, 383873) = 0
munmap(0x7f04da2c4000, 802187) = 0
munmap(0x7f04da240000, 537967) = 0
munmap(0x7f04da3ed000, 9332) = 0
exit_group(1) = ?
+++ exited with 1 +++
Update 4
Running the container as --privileged has resolved the original error, but I now hit a new error:
[bluedata#bluedata-5 ~]$ sudo dmesg -c
[bluedata#bluedata-5 ~]$ sudo modprobe -v lustre
insmod /lib/modules/3.10.0-957.21.3.el7.x86_64/extra/ptlrpc.ko.xz
modprobe: ERROR: could not insert 'lustre': Invalid argument
[bluedata#bluedata-5 ~]$ dmesg
[ 2072.258326] LNetError: 56638:0:(api-ni.c:2233:lnet_startup_lndnet()) Can't load LND tcp, module ksocklnd, rc=256
[ 2072.264113] LustreError: 56638:0:(events.c:625:ptlrpc_init_portals()) network initialisation failed
Update 5
The error message suggested I needed to configure the network, so I tried:
[bluedata#bluedata-5 ~]$ sudo modprobe lnet
[bluedata#bluedata-5 ~]$ sudo lnetctl lnet configure
lustre now loads without error:
[bluedata#bluedata-5 ~]$ sudo modprobe -v lustre
insmod /lib/modules/3.10.0-957.21.3.el7.x86_64/extra/ptlrpc.ko.xz
insmod /lib/modules/3.10.0-957.21.3.el7.x86_64/extra/fld.ko.xz
insmod /lib/modules/3.10.0-957.21.3.el7.x86_64/extra/lov.ko.xz
insmod /lib/modules/3.10.0-957.21.3.el7.x86_64/extra/osc.ko.xz
insmod /lib/modules/3.10.0-957.21.3.el7.x86_64/extra/fid.ko.xz
insmod /lib/modules/3.10.0-957.21.3.el7.x86_64/extra/mdc.ko.xz
insmod /lib/modules/3.10.0-957.21.3.el7.x86_64/extra/lmv.ko.xz
insmod /lib/modules/3.10.0-957.21.3.el7.x86_64/extra/lustre.ko.xz
I followed the original steps in the question and run the container as --privileged. Then loading and configuring lnet allowed loading the lustre module without error:
[bluedata#bluedata-5 ~]$ sudo modprobe lnet
[bluedata#bluedata-5 ~]$ sudo lnetctl lnet configure
[bluedata#bluedata-5 ~]$ sudo modprobe -v lustre
insmod /lib/modules/3.10.0-957.21.3.el7.x86_64/extra/ptlrpc.ko.xz
insmod /lib/modules/3.10.0-957.21.3.el7.x86_64/extra/fld.ko.xz
insmod /lib/modules/3.10.0-957.21.3.el7.x86_64/extra/lov.ko.xz
insmod /lib/modules/3.10.0-957.21.3.el7.x86_64/extra/osc.ko.xz
insmod /lib/modules/3.10.0-957.21.3.el7.x86_64/extra/fid.ko.xz
insmod /lib/modules/3.10.0-957.21.3.el7.x86_64/extra/mdc.ko.xz
insmod /lib/modules/3.10.0-957.21.3.el7.x86_64/extra/lmv.ko.xz
insmod /lib/modules/3.10.0-957.21.3.el7.x86_64/extra/lustre.ko.xz
[bluedata#bluedata-5 ~]$
IMPORTANT NOTE: Running with the privileged flag is not recommended. There are other options - reach out to your local BlueData team to learn more.

bind error when doing traceroute

I'm not very knowledgeable in named and bind, but after setting up my domain and playing with named (trying to set up my private email server) I ended up probably messing up something. Taceroute does not work anymore, I get a bind error (ping works). Relevant part of "strare traceroute xxx.xxx.xxx.xxx" is below. Can anybody please guide me to a solution - thx jankom
socket(PF_INET, SOCK_RAW, IPPROTO_ICMP) = 10
socket(PF_INET, SOCK_RAW, IPPROTO_RAW) = 11
getuid32() = 0
setuid32(0) = 0
getpid() = 3212
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 12
bind(12, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = -1 EINVAL (Invalid argument)
dup(2) = 13
fcntl64(13, F_GETFL) = 0x2 (flags O_RDWR)
fstat64(13, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 4), ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb77c3000
_llseek(13, 0, 0xbfc4ebc0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
write(13, "bind: Invalid argument\n", 23bind: Invalid argument) = 23
) 23
close(13) = 0
munmap(0xb77c3000, 4096) = 0
exit_group(1) = ?

Hung processes resume if attached to strace

I have a network program written in C using TCP sockets. Sometimes the client program hangs forever expecting input from server. Specifically, the client hangs on select() call set on an fd intended to read characters sent by server.
I am using strace to know where the process got stuck. However, sometimes when I attach the hung client process to strace, it immediately resumes it's execution and properly exits. Not all hung processes exhibit this behavior, some processes stuck in the select() even if I attach them to strace. But most of the processes resume their execution when attached to strace.
I am curious what causing the processes resume when attached to strace. It might give me clues to know why client processes are getting hung.
Any ideas? what causes a hung process to resume it's execution when attached to strace?
Update:
Here's the output of strace on hung processes.
> sudo strace -p 25645
Process 25645 attached - interrupt to quit
--- SIGSTOP (Stopped (signal)) # 0 (0) ---
--- SIGSTOP (Stopped (signal)) # 0 (0) ---
[ Process PID=25645 runs in 32 bit mode. ]
select(6, [3 5], NULL, NULL, NULL) = 2 (in [3 5])
read(5, "\0", 8192) = 1
write(2, "", 0) = 0
read(3, "====Setup set_oldtempbehaio"..., 8192) = 555
write(1, "====Setup set_oldtempbehaio"..., 555) = 555
select(6, [3 5], NULL, NULL, NULL) = 2 (in [3 5])
read(5, "", 8192) = 0
read(3, "", 8192) = 0
close(5) = 0
kill(25652, SIGKILL) = 0
exit_group(0) = ?
Process 25645 detached
_
> sudo strace -p 14462
Process 14462 attached - interrupt to quit
[ Process PID=14462 runs in 32 bit mode. ]
read(0, 0xff85fdbc, 8192) = -1 EIO (Input/output error)
shutdown(3, 1 /* send */) = 0
exit_group(0) = ?
_
> sudo strace -p 7517
Process 7517 attached - interrupt to quit
--- SIGSTOP (Stopped (signal)) # 0 (0) ---
--- SIGSTOP (Stopped (signal)) # 0 (0) ---
[ Process PID=7517 runs in 32 bit mode. ]
connect(3, {sa_family=AF_INET, sin_port=htons(300), sin_addr=inet_addr("100.64.220.98")}, 16) = -1 ETIMEDOUT (Connection timed out)
close(3) = 0
dup(2) = 3
fcntl64(3, F_GETFL) = 0x1 (flags O_WRONLY)
close(3) = 0
write(2, "dsd13: Connection timed out\n", 30) = 30
write(2, "Error code : 110\n", 17) = 17
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
exit_group(1) = ?
Process 7517 detached
Not just select(), but the processes(of same program) are stuck in various system calls before I attach them to strace. They suddenly resume after attaching to strace. If I don't attach them to strace, they just hang there forever.
Update 2:
I learned that strace could start a process which was previously stopped (process in T sate). Now I am trying to understand why did these processes go to 'T' state, what's the cause. Here's the /proc//status information:
> cat /proc/12554/status
Name: someone
State: T (stopped)
SleepAVG: 88%
Tgid: 12554
Pid: 12554
PPid: 9754
TracerPid: 0
Uid: 5000 5000 5000 5000
Gid: 48986 48986 48986 48986
FDSize: 256
Groups: 9149 48986
VmPeak: 1992 kB
VmSize: 1964 kB
VmLck: 0 kB
VmHWM: 608 kB
VmRSS: 608 kB
VmData: 156 kB
VmStk: 20 kB
VmExe: 16 kB
VmLib: 1744 kB
VmPTE: 20 kB
Threads: 1
SigQ: 54/73728
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000006
SigCgt: 0000000000004000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
Cpus_allowed: 00000000,00000000,00000000,0000000f
Mems_allowed: 00000000,00000001
strace uses ptrace. The ptrace man page has this:
Since attaching sends SIGSTOP and the tracer usually suppresses it,
this may cause a stray EINTR return from the currently executing system
call in the tracee, as described in the "Signal injection and
suppression" section.
Are you seeing select return EINTR?

nrpe unable to run custom perl script: Return Code: 1, Output: NRPE: Unable to read output

I'm trying to implement a custom perl nagios script to check for rogue dhcp servers remotely with nrpe. On the central server when i run:
/usr/local/nagios/libexec/check_nrpe -H 10.9.0.25 -c check_roguedhcp
In my debugging logs i'm seeing this:
Host is asking for command 'check_roguedhcp' to be run...
Running command: sudo /usr/lib64/nagios/plugins/check_roguedhcp.pl
Command completed with return code 1 and output:
Return Code: 1, Output: NRPE: Unable to read output
Locally if i run the script (even as the nrpe user) I get the expected output.
On the local server my /etc/nagios/nrpe.cfg has the following settings:
command[check_roguedhcp]=sudo /usr/lib64/nagios/plugins/check_roguedhcp.pl
command[check_dhcp]=sudo /usr/lib64/nagios/plugins/check_dhcp -v
nrpe_user=nrpe
nrpe_group=nagios
ps aux shows nrpe is running as user nrpe (nrpe is in group nagios)
nrpe 5941 0.0 0.1 52804 2384 ? Ss 08:25 0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
I've added the command to /etc/sudoers
%nagios ALL=(ALL) NOPASSWD: /usr/lib/nagios64/plugins/check_dhcp, /usr/lib64/nagios/plugins/check_roguedhcp.pl
on my central server that does the nrpe calls, i have the following service groups and configurations:
define servicegroup{
servicegroup_name rogue_dhcp
alias All dhcp monitors
}
define service{
name security-service
servicegroups rogue_dhcp
register 0
max_check_attempts 1
}
Nagios can run any other check_users etc script via nrpe on this server.
Here's the perl script itself, though we know that the file executes locally just fine.
1 #!/usr/bin/perl -w
2 # nagios: -epn
3 # the above makes nagios run the script separately.
4 use POSIX;
5 use lib "/usr/lib64/nagios/plugins";
6 use utils qw(%ERRORS);
7
8 sub fail_usage {
9 if (scalar #_) {
10 print "$0: error: \n";
11 map { print " $_\n"; } #_;
12 }
13 print "$0: Usage: \n";
14 print "$0 [-v [-v [-v]]] [ []] \n";
15 print "$0 [-v [-v [-v]]] [-s] [[-s] [[-s] ]] \n";
16 print " \n";
17 exit 3 ;
18 }
19
20 my $verbose = 0;
21 my %servers=(
22 "x", "10.x.x.x",
23 "x", "10.x.x.x",
24 "x", "10.x.x.x",
25 "x", "10.x.x.x"
26 );
27
28 # examine commandline args
29 while ($ARGV=$ARGV[0]) {
30 my $myarg = $ARGV;
31 if ($ARGV eq '-s') {
32 shift #ARGV;
33 if (!($ARGV = $ARGV[0])) { fail_usage ("$myarg needs an argument"); }
34 if ($ARGV =~ /^-/) { fail_usage ("$myarg must be followed by an argument"); }
35 if (!defined($servers{$ARGV})) { $servers{$ARGV}=1; }
36 }
37 elsif ($ARGV eq '-v' ) { $verbose++; }
38 elsif ($ARGV eq '-h' or $ARGV eq '--help' ) { fail_usage ; }
39 elsif ($ARGV =~ /^-/ ) { fail_usage " invalid option ($ARGV)"; }
40 elsif ($ARGV =~ /^\d+\.\d+\.\d+\.\d+$/)
41 # servers should be ip addresses. I'm not doing detailed checks for this.
42 { if (!defined($servers{$ARGV})) { $servers{$ARGV}=1; } }
43 else { last; }
44 shift #ARGV;
45 }
46 # for some reason I can't test for empty ARGs in the while loop
47 #ARGV = grep {!/^\s*$/} #ARGV;
48 if (scalar #ARGV) { fail_usage "didn't understand arguments: (".join (" ",#ARGV).")"; }
49
50 my $serversn = scalar keys %servers;
51
52 if ($verbose > 2) {
53 print "verbosity=($verbose)\n";
54 print "servers = ($serversn)\n";
55 if ($serversn) { for my $i (keys %servers) { print "server ($i)\n"; } }
56 }
57
58 if (!$serversn) { fail_usage "no servers"; }
59 my $responses=0;
60 my $responders="";
61 my #check_dhcp = qx{/usr/lib64/nagios/plugins/check_dhcp -v};
62 foreach my $value (#check_dhcp) {
63 if ($value =~ /Added offer from server \# /i){
64 $value =~ m/(\d+\.\d+\.\d+\.\d+)/i;
65 my $host = $1;
66 # we find a server in our list
67 if (defined($servers{$host})) { $responses++; $responders.="$host "; }
68 # we find a rogue DHCP server. Danger Will Robinson!
69 else {
70 print "DHCP:CRITICAL: DHCP service running on $host";
71 exit $ERRORS{'OK'}
72 }
73 }
74 }
75 # we saw all the servers in our list. All is good.
76 if ($responses == $serversn) {
77 print "DHCP:OK: $responses of $serversn Expected Responses to DHCP Broadcast";
78 exit $ERRORS{'OK'};
79 }
80 # we found no DHCP responses.
81 if ($responses == 0) {
82 print "DHCP:OK: no rogue servers detected!!!!#!##";
83 exit $ERRORS{'OK'}
84 }
85 # we found less DHCP servers than we should have. Oh Nos!
86 $responders =~ s/ $//;
87 print "DHCP:OK: $responses of $serversn Responses to DHCP Broadcast. ($responders) responded. ";
88 exit $ERRORS{'OK'};
Here's what I am seeing (of relevance) when I do an strace of the nrpe process.
955 6950 stat("/usr/lib64/nagios/plugins/check_roguedhcp.pl", {st_mode=S_IFREG|S_ISUID|S_ISGID|0755, st_size=2799, ...}) = 0
956 6950 setresuid(4294967295, 4294967295, 4294967295) = 0
957 6950 setresgid(4294967295, 536347864, 4294967295) = 0
958 6950 setgroups(3, [536347864, 536347137, 536353632]) = 0
959 6950 open("/dev/tty", O_RDWR|O_NOCTTY) = -1 ENXIO (No such device or address)
960 6950 socket(PF_NETLINK, SOCK_RAW, 9) = 3
961 6950 fcntl(3, F_SETFD, FD_CLOEXEC) = 0
962 6950 fcntl(3, F_SETFD, FD_CLOEXEC) = 0
963 6950 ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff3de81ac0) = -1 ENOTTY (Inappropriate ioctl for device)
964 6950 ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff3de81ac0) = -1 EINVAL (Invalid argument)
965 6950 ioctl(2, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff3de81ac0) = -1 ENOTTY (Inappropriate ioctl for device)
966 6950 getcwd("/", 4096) = 2
967 6950 sendto(3, "d\0\0\0c\4\5\0\1\0\0\0\0\0\0\0cwd=\"/\" cmd=\"/us"..., 100, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 100
968 6950 poll([{fd=3, events=POLLIN}], 1, 500) = 1 ([{fd=3, revents=POLLIN}])
969 6950 recvfrom(3, "$\0\0\0\2\0\0\0\1\0\0\0&\33\0\0\0\0\0\0d\0\0\0c\4\5\0\1\0\0\0"..., 8988, MSG_PEEK|MSG_DONTWAIT, {sa_family=AF_NE TLINK, pid=0, groups=00000000}, [12]) = 36
970 6950 recvfrom(3, "$\0\0\0\2\0\0\0\1\0\0\0&\33\0\0\0\0\0\0d\0\0\0c\4\5\0\1\0\0\0"..., 8988, MSG_DONTWAIT, {sa_family=AF_NETLINK, pi d=0, groups=00000000}, [12]) = 36
971 6950 write(2, "sudo", 4) = 4
972 6950 write(2, ": ", 2) = 2
973 6950 write(2, "sorry, you must have a tty to ru"..., 38) = 38
974 6950 write(2, "\n", 1) = 1
975 6950 setresuid(4294967295, 4294967295, 4294967295) = 0
976 6950 setresgid(4294967295, 4294967295, 4294967295) = 0
977 6950 exit_group(1) = ?
978 6949 <... read resumed> "", 4096) = 0
979 6949 --- SIGCHLD (Child exited) # 0 (0) ---
980 6949 close(5) = 0
981 6949 wait4(6950, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 6950
970 6950 recvfrom(3, "$\0\0\0\2\0\0\0\1\0\0\0&\33\0\0\0\0\0\0d\0\0\0c\4\5\0\1\0\0\0"..., 8988, MSG_DONTWAIT, {sa_family=AF_NETLINK, pi d=0, groups=00000000}, [12]) = 36
971 6950 write(2, "sudo", 4) = 4
972 6950 write(2, ": ", 2) = 2
973 6950 write(2, "sorry, you must have a tty to ru"..., 38) = 38
974 6950 write(2, "\n", 1) = 1
975 6950 setresuid(4294967295, 4294967295, 4294967295) = 0
976 6950 setresgid(4294967295, 4294967295, 4294967295) = 0
977 6950 exit_group(1) = ?
This was solved by adding the following to /etc/sudoers
Defaults:nagios !requiretty
in my case i have resolved changing permissions of scripts file under /nagios/libexec/
do not work with root:root and WORK with nagios:nagios user permission!
I changed permission of my specific script on libexec folder to allow the "Other" (non-root users) to execute it chmod 755 myfile.pl, and it worked well.

Why does IPC::SysV->shmget respond with EINVAL?

I am currently running perl 5.8.8 on a server and I'm trying to install 5.14.
I configured it to usethreads and use64bitint and otherwise the defaults it suggested.
make ran without problems, but make test is failing, on
../cpan/IPC-SysV/t/ipcsysv.t
../cpan/IPC-SysV/t/shm.t
thus:
# ./perl harness ../cpan/IPC-SysV/t/shm.t ../cpan/IPC-SysV/t/ipcsysv.t
../cpan/IPC-SysV/t/shm.t ...... IPC::SharedMem->new failed: Invalid argument at t/shm.t line 54.
../cpan/IPC-SysV/t/shm.t ...... Dubious, test returned 22 (wstat 5632, 0x1600)
No subtests run
../cpan/IPC-SysV/t/ipcsysv.t .. 1/38 shmget failed: Invalid argument at t/ipcsysv.t line 100.
# Looks like you planned 38 tests but ran 17.
# Looks like your test exited with 22 just after 17.
../cpan/IPC-SysV/t/ipcsysv.t .. Dubious, test returned 22 (wstat 5632, 0x1600)
Failed 21/38 subtests
Test Summary Report
-------------------
../cpan/IPC-SysV/t/shm.t (Wstat: 5632 Tests: 0 Failed: 0)
Non-zero exit status: 22
Parse errors: No plan found in TAP output
../cpan/IPC-SysV/t/ipcsysv.t (Wstat: 5632 Tests: 17 Failed: 0)
Non-zero exit status: 22
Parse errors: Bad plan. You planned 38 tests but ran 17.
Files=2, Tests=17, 0 wallclock secs ( 0.01 usr 0.00 sys + 0.13 cusr 0.00 csys = 0.14 CPU)
Result: FAIL
Both of these tests are reporting 'Invalid argument', but when I look at the source, I can't see anything that looks invalid. I'm not really sure how to proceed... any pointers?
UPDATE
I ran
strace perl -MIPC::SysV=IPC_PRIVATE,S_IRWXU -e 'shmget(IPC_PRIVATE, 8, S_IRWXU) or die $!'
on two servers: one which is having these problems and one which is not.
There was a lot of output, but what appears interesting is this:
GOOD:
.
.
.
stat64("/usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi/auto/IPC/SysV", 0x9d7f0c8) = -1 ENOENT (No such file or directory)
stat64("/usr/lib/perl5/site_perl/5.8.8/auto/IPC/SysV", 0x9d7f0c8) = -1 ENOENT (No such file or directory)
stat64("/usr/lib/perl5/site_perl/auto/IPC/SysV", 0x9d7f0c8) = -1 ENOENT (No such file or directory)
stat64("/usr/lib/perl5/vendor_perl/5.8.8/i386-linux-thread-multi/auto/IPC/SysV", 0x9d7f0c8) = -1 ENOENT (No such file or directory)
stat64("/usr/lib/perl5/vendor_perl/5.8.8/auto/IPC/SysV", 0x9d7f0c8) = -1 ENOENT (No such file or directory)
stat64("/usr/lib/perl5/vendor_perl/auto/IPC/SysV", 0x9d7f0c8) = -1 ENOENT (No such file or directory)
stat64("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/auto/IPC/SysV", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat64("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/auto/IPC/SysV/SysV.so", {st_mode=S_IFREG|0755, st_size=15072, ...}) = 0
stat64("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/auto/IPC/SysV/SysV.bs", 0x9d7f0c8) = -1 ENOENT (No such file or directory)
futex(0x4d106c, FUTEX_WAKE_PRIVATE, 2147483647) = 0
open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/auto/IPC/SysV/SysV.so", O_RDONLY) = 4
read(4, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0 \v\0\0004\0\0\0"..., 512) = 512
fstat64(4, {st_mode=S_IFREG|0755, st_size=15072, ...}) = 0
mmap2(NULL, 17948, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 4, 0) = 0x588000
mmap2(0x58c000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 4, 0x3) = 0x58c000
close(4) = 0
close(3) = 0
shmget(IPC_PRIVATE, 8, 0700) = 7438344
exit_group(0)
BAD:
.
.
.
stat64("/usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi/auto/IPC/SysV", 0x8d290c8) = -1 ENOENT (No such file or directory)
stat64("/usr/lib/perl5/site_perl/5.8.8/auto/IPC/SysV", 0x8d290c8) = -1 ENOENT (No such file or directory)
stat64("/usr/lib/perl5/site_perl/auto/IPC/SysV", 0x8d290c8) = -1 ENOENT (No such file or directory)
stat64("/usr/lib/perl5/vendor_perl/5.8.8/i386-linux-thread-multi/auto/IPC/SysV", 0x8d290c8) = -1 ENOENT (No such file or directory)
stat64("/usr/lib/perl5/vendor_perl/5.8.8/auto/IPC/SysV", 0x8d290c8) = -1 ENOENT (No such file or directory)
stat64("/usr/lib/perl5/vendor_perl/auto/IPC/SysV", 0x8d290c8) = -1 ENOENT (No such file or directory)
stat64("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/auto/IPC/SysV", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat64("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/auto/IPC/SysV/SysV.so", {st_mode=S_IFREG|0755, st_size=15072, ...}) = 0
stat64("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/auto/IPC/SysV/SysV.bs", 0x8d290c8) = -1 ENOENT (No such file or directory)
futex(0x94306c, FUTEX_WAKE_PRIVATE, 2147483647) = 0
open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/auto/IPC/SysV/SysV.so", O_RDONLY) = 4
read(4, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0 \v\0\0004\0\0\0"..., 512) = 512
fstat64(4, {st_mode=S_IFREG|0755, st_size=15072, ...}) = 0
mmap2(NULL, 17948, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 4, 0) = 0x6a4000
mmap2(0x6a8000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 4, 0x3) = 0x6a8000
close(4) = 0
close(3) = 0
shmget(IPC_PRIVATE, 8, 0700) = -1 EINVAL (Invalid argument)
open("/usr/share/locale/locale.alias", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=2528, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7dbe000
read(3, "# Locale name alias data base.\n#"..., 4096) = 2528
read(3, "", 4096) = 0
close(3) = 0
munmap(0xb7dbe000, 4096) = 0
open("/usr/share/locale/en_US.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en_US.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
write(2, "Invalid argument at -e line 1.\n", 31Invalid argument at -e line 1.
) = 31
exit_group(22) = ?
So, it appears that the same thing is happening on both servers, it's just that on one, I see
shmget(IPC_PRIVATE, 8, 0700) = 7438344
and the other, I see
shmget(IPC_PRIVATE, 8, 0700) = -1 EINVAL (Invalid argument)
The versions of IPC::SysV are the same on both servers... but it looks to me that this isn't relevant, and that the problem is the the code making the system call... right?
What next?
** UPDATE 2 **
After some googling, I ran the following:
GOOD:
# cat /proc/sys/kernel/shmmax
4294967295
BAD:
# cat /proc/sys/kernel/shmmax
0
So, that explains the EINVAL, since (from the man pages)
EINVAL
A new segment was to be created and size < SHMMIN or size > SHMMAX, or no new segment
was to be created, a segment with given key existed, but size is greater than the size
of that segment.
Now, my question is, is there a good reason why this might be set at zero?
Problem solved.
The /etc/sysctl.conf file contained the following:
kernel.shmmax = 137438953472
This is a 64 bit value, but the system is a 32 bit system.
As a result, the SHMMAX value was being set to 0, making all calls to shmget fail.
Changing it to
kernel.shmmax = 4294967295
And using
echo 4294967295 >/proc/sys/kernel/shmmax
I changed the value of SHMMAX and the test completed successfully.