Segmentation fault when starting G-WAN 3.12.26 32-bit on Linux FC14 - virtualization

I have an FC14 32-bit system with a custom-compiled 2.6.35.13 kernel.
When I try to start G-WAN I get a "Segmentation fault". I've made no changes; I just downloaded and unpacked the files from the G-WAN site.
In the log file I have:
"[Wed Dec 26 16:39:04 2012 GMT] Available network interfaces (16)"
which is not true; the machine has around 1k interfaces, mostly PPP interfaces.
I think the crash has something to do with detecting interfaces/IP addresses, because in the log, after the line above, there are 16 lines with IPs belonging to the FC14 machine, followed by about 1k lines with "0.0.0.0" or seemingly random IP addresses.
I ran G-WAN 3.3.7 64-bit on an FC16 machine with about the same number of interfaces and had no problem. It still reported the wrong number of interfaces (16), but it did not crash, and the log file contained only 16 lines with the IP addresses belonging to the FC16 machine.
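For reference, the interface count can be confirmed with iproute2:

# Count the network interfaces the kernel actually exposes
ip -o link show | wc -l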
Any ideas?
Thanks

I have around 1k interfaces, mostly PPP interfaces
Only the first 16 will be listed as this information becomes irrelevant with more interfaces (the intent was to let users find why a listen attempt failed).
The crash is probably due to the long 1K list; many things have changed internally since the allocator was redesigned from scratch. Thank you for reporting the bug.
I also confirm the comment saying that the maintenance script crashes. Thanks for that.
Note that bandwidth shaping will be modified to avoid the newer Linux syscalls, so the GLIBC 2.7 requirement will be waived.
...with a custom compiled kernel
As a general rule, check again on a standard system like Debian 6.x before asking a question: there is enough room for trouble with a known system; there's no need to add custom system components.
Thank you all for the tons(!) of emails received these last two days about the new release!

I had a similar "Segmentation fault" error; mine happens any time I go to 9+ GB of RAM. The exact same machine works fine at 8 GB, and at 10 GB it doesn't even report an error; it just returns to the prompt.
Interesting behavior... Have you tried adjusting the amount of RAM to see what happens?
(running G-WAN 4.1.25 on Debian 6.x)

Related

vncserver has wrong hostname

I had to change the name of the Windows 7 system. Unaccountably, vncserver is still using the old computer name. This was the RealVNC free version. I re-installed, but it is still using the old computer name.
I had a Z400 motherboard go bad, and it took the disk drive with it. I replaced the motherboard ($39 was cheap) and cloned one of my other Z400 workstation C drives using Acronis. I booted the replacement motherboard with the cloned copy, changed its name to the old defective one, and activated Windows. When it rebooted, vncserver still had the old computer name. I cannot get rid of it, and it is conflicting with the vncserver on the other Z400 since they both use the same name. There is no option in the server to use a different name that I can find anywhere.
The IPs are different, and all systems behave fine. I can ping and even access shares using their names. The problem system clearly shows the correct name but, unaccountably, vncserver is using the wrong name.
This system will be upgraded to 10 in a few days; maybe the problem will go away when that happens.
Solved! First I had to log in and reduce the number of clients to under 5, as that was my limit. Then I had to remove the problem system's leftover name. This was all done at the RealVNC website. Once under 5 systems, I could add the problem one, and once it connected to RealVNC's cloud it got the correct name. This was an artifact of having more than 5 systems when I was only licensed for 5. The "6th" one worked locally since its setup was still valid, but it was refused connection to the cloud, so it never got to change its name. It "worked" until I did a flushdns and its old setup was no longer valid.

Is WinDbg's vertarget command always accurate?

I wonder because running it on a client's minidump reports a different Windows version than the client repeatedly told me she had, and the version being reported happens to be exactly the same version I'm running WinDbg on.
So I wonder: can vertarget always be trusted (and clients not)? Or may the information it relies on be absent with some dump-generation options, in which case it reports the version WinDbg is currently running on, or perhaps just some default that happens to coincide with my OS version?
I'm using WinDbg 6.12.
In all my cases so far, vertarget has been correct and the customer/client made a mistake; vertarget is one of the commands I use for every dump, exactly for the purpose of checking whether the dump contains what I need.
But perhaps things can potentially go wrong here as well, so let's evaluate some options:
vertarget also reports debug session time and system uptime. Do those also match your system? Reboot your system in order to get a low system uptime and check again. Is it still your PC's uptime?
vertarget also reports the number of CPUs. Does that number match your machine's?
Get a virtual machine which does not have your OS, e.g. one from Modern.IE (Microsoft). Copy WinDbg and the dump to the VM and check the output of vertarget again.
WinDbg 6.12 is a bit old. Do newer versions (6.2.9200 / 6.3.9600 or even 10.0) provide the same information or was there a bug fixed already?
And even check some other information:
Is it a dump of the correct application? Use | (pipe)
Is it a dump of the version you are expecting? Use lm vm <exename>
Does it have the flags which can be expected for the method used for taking the dump? Use .dumpdebug.
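If you want to run all of these sanity checks in one non-interactive pass, cdb (the console sibling of WinDbg, shipped in the same Debugging Tools package) can script them against the dump. A sketch; the dump path and the module name "app" are placeholders:

rem Hypothetical dump path; replace app with the module you expect
cdb -z C:\dumps\client.dmp -c "vertarget; |; lm vm app; .dumpdebug; q"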
Other than that, I observe (though this is not representative) that many client-OS dumps (Windows 7, 8, 8.1) have all the latest service packs installed, while administrators seem to follow the "never change a running system" approach for server OSes (Windows Server 2012, R2). So it might just be a coincidence.

Tibco dataloss remote daemon did not satisfy our retransmission request

I am having some real problems now with TIBCO RV: at the exact same time, two machines reported dataloss on INBOUND. The third machine, the remote daemon, did not report any unusual errors. This is happening too often, even though I've manually changed the rx/tx buffer sizes on the network cards. I am not using CDM. Any ideas? Thanks
Machine 1
2013-01-08 12:58:21 /usr/tibco/tibrv/bin/realrvd64: TIB/Rendezvous Error: {ADV_CLASS="ERROR" ADV_SOURCE="SYSTEM" ADV_NAME="DATALOSS.INBOUND.BCAST" ADV_DESC="dataloss: remote daemon did not satisfy our retransmission request(s)" host="Machine 3" lost=3 scid=13001}
Machine 2
2013-01-08 12:58:21 /usr/tibco/tibrv/bin/realrvd64: TIB/Rendezvous Error: {ADV_CLASS="ERROR" ADV_SOURCE="SYSTEM" ADV_NAME="DATALOSS.INBOUND.BCAST" ADV_DESC="dataloss: remote daemon did not satisfy our retransmission request(s)" host="Machine 3" lost=3 scid=13001}
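For reference, here is how the ring buffers were inspected and changed (a sketch; eth0 stands in for the actual interface name):

# Show current vs. hardware-maximum RX/TX ring sizes
ethtool -g eth0
# Raise both rings toward the reported maximum
ethtool -G eth0 rx 4096 tx 4096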

Attempt to access remote folder mounted with CIFS hangs when disconnected

This question is an extension of that question.
Yet again: I'm working under CentOS 6.0 and I have a remote Windows 7 folder, mounted with:
mount -t cifs //PC128/mnt /media/net -o "username=WORKGROUP\user,password=pwd,rw,noexec,soft,uid=user,gid=user"
When the remote folder is not available (e.g. the network cable is pulled out), an attempt to access it locks up the application I'm working on. At first I found that QDir::exists() caused locking for 20-90 seconds (I still can't figure out why the duration varies); later I found that any call to the stat() function leads to an application lock.
I followed the advice provided in the topic above and moved the QDir::exists() call (and later the call to the stat() function) to another thread, but this didn't solve the problem. The application still hangs when the connection is suddenly lost. The Qt trace shows that the lock is somewhere in the kernel:
0 __kernel_vsyscall
1 __xstat64@GLIBC_2.1 /lib/libc.so.6
2 QFSFileEnginePrivate::doStat stat.h
I also tried to check whether the remote share is still mounted before accessing the folder itself, but it didn't help. Approaches such as:
mount | grep /media/net
show that the shared folder is still mounted even if there is no active connection to the network.
Checking folder status differences such as:
stat -fc%t:%T /media/net/ != stat -fc%t:%T /media/net/..
also hangs for ~20 seconds.
So I have several questions:
Is there any way to change the CIFS timeouts? I tried to find out, but it seems there are no appropriate parameters and no CIFS config file.
How can I check whether the remote folder is still mounted without getting locked?
How can I check whether the folder exists, likewise without getting locked?
Your problem, "an unreachable network filesystem", is a very well-known trigger of Linux hung tasks, which are not at all the same as zombie processes (killing the parent PID won't do anything).
A hung task is a task that triggered a system call which caused a problem in the kernel, such that the system call never returns.
The key peculiarity is that the scheduler declares the task to be in the "D" state, meaning the program is in an uninterruptible state. This means there is nothing you can do to stop your program: you can send every signal to the task and it will not respond. Launching hundreds of SIGTERM/SIGKILL signals does nothing!
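You can see this state for yourself with ps (<pid> is the stuck process):

# STAT "D" means uninterruptible sleep; WCHAN shows where in the kernel it sleeps
ps -o pid,stat,wchan:30,cmd -p <pid>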
This is the case with my old kernel: when my NFS server crashes, I need to reboot the client to kill the tasks using the filesystem. I compiled that kernel a long time ago (I still have the build tree on my HDD), and during configuration I saw this in lib/Kconfig.debug:
config DETECT_HUNG_TASK
    bool "Detect Hung Tasks"
    depends on DEBUG_KERNEL
    default LOCKUP_DETECTOR
    help
      Say Y here to enable the kernel to detect "hung tasks",
      which are bugs that cause the task to be stuck in
      uninterruptible "D" state indefinitely.
      When a hung task is detected, the kernel will print the
      current stack trace (which you should report), but the
      task will stay in uninterruptible state. If lockdep is
      enabled then all held locks will also be reported. This
      feature has negligible overhead.
It only proposed to detect such tasks, or to panic on detection: I haven't checked whether recent kernels can actually solve the problem (it seems to be the case judging from your question), but I didn't think it was worth enabling.
There is a second problem: normally, detection occurs after 120 seconds, but I also saw a Kconfig option for this:
config DEFAULT_HUNG_TASK_TIMEOUT
    int "Default timeout for hung task detection (in seconds)"
    depends on DETECT_HUNG_TASK
    default 120
    help
      This option controls the default timeout (in seconds) used
      to determine when a task has become non-responsive and should
      be considered hung.
      It can be adjusted at runtime via the kernel.hung_task_timeout_secs
      sysctl or by writing a value to
      /proc/sys/kernel/hung_task_timeout_secs.
      A timeout of 0 disables the check. The default is two minutes.
      Keeping the default should be fine in most cases.
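At runtime the same knob can be read and changed without rebuilding the kernel (as root):

# Read the current hung-task timeout in seconds (0 disables the check)
cat /proc/sys/kernel/hung_task_timeout_secs
# Report hung tasks after 30 seconds instead of the default 120
sysctl -w kernel.hung_task_timeout_secs=30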
This also happens with kernel threads. Example: create a loop device backed by a file on a FUSE filesystem, then crash the userspace program controlling the FUSE filesystem!
You should get a kernel thread whose name has the form loopX (X normally corresponds to your loop device number) hanging!
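A sketch of that reproduction (paths are placeholders; /mnt/fuse is any FUSE-backed mount point):

# Create a backing file on the FUSE mount and attach it to a loop device
dd if=/dev/zero of=/mnt/fuse/backing.img bs=1M count=16
losetup /dev/loop0 /mnt/fuse/backing.img
# Now kill -9 the FUSE daemon serving /mnt/fuse and watch ps ax
# for a loopX kernel thread stuck in the "D" state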
Web links:
https://unix.stackexchange.com/questions/5642/what-if-kill-9-does-not-work (look at the answer written by ultrasawblade)
http://www.linuxquestions.org/questions/linux-general-1/kill-a-hung-task-when-kill-9-doesn't-help-697305/
http://forums-web2.gentoo.org/viewtopic-t-811557-start-0.html
http://comments.gmane.org/gmane.linux.kernel/1189978
http://comments.gmane.org/gmane.linux.kernel.cifs/7674 (This is a case similar to yours)
As for your three questions, you have the answer: this is probably due to what appears to be a well-known bug in the Linux VFS kernel layer! (There are no CIFS timeouts.)
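As a practical workaround, you can run the check from a throwaway child process so that the caller itself never blocks. A bash sketch (/media/net is your mount point; note the probe itself may stay stuck in the "D" state):

#!/bin/bash
# Probe the share in the background and never wait on the probe directly.
stat /media/net >/dev/null 2>&1 &
pid=$!
reachable=no
for _ in 1 2 3 4 5; do
    sleep 1
    # Once the probe exits (bash reaps it promptly), /proc/<pid> disappears.
    if [ ! -d "/proc/$pid" ]; then
        reachable=yes
        break
    fi
done
if [ "$reachable" = yes ]; then
    echo "share responded"
else
    echo "share unreachable: probe still blocked (likely in D state)"
fi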
After much trial & error I found a solution that persists.
# vim /etc/fstab
//192.168.1.122/myshare /mnt/share cifs username=user,password=password,_netdev 0 0
The _netdev option is important since we are mounting a network device. Clients may hang during the boot process if the system encounters any difficulties with the network.
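The new entry can be tested without a reboot:

# Mount any fstab entries that are not yet mounted, then confirm
mount -a
mount | grep /mnt/share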
https://www.redhat.com/sysadmin/samba-windows-linux

Why does Apache complain that CGI.pm has panicked at line 4001 due to a memory wrap?

According to the logs, this error is caused by a 5-year-old Perl script that merely grabs data from MySQL via a simple SQL SELECT and displays it.
It's running on my dev machine, a MacBook Pro with 8 GB of RAM running the stock Apache.
Once in a while, once or twice a month, I get the following error for no apparent reason:
panic: memory wrap at /System/Library/Perl/5.10.0/CGI.pm line 4001.
Apache refuses to run the script again, and only a reboot of the OS will make Apache relent. The OS says there are 3+ GB of free memory when it happens, so it's not a low-memory issue. Luckily this doesn't happen on the production Debian 5 server.
What's a memory wrap? And what causes it?
I hit this bug as well in a slightly different circumstance. PerlMonks, as ever, has just saved me probably days of work:
http://www.perlmonks.org/?node_id=823389
The problem lies in the way OS X ties up other resources. A simple sleep will give the OS time to close and open. restart or graceful will conflict.
apachectl stop
sleep 2
apachectl start
This is late, but the Perl distributed by MacPorts does not have this problem, if that is an option.
mu is too short's answer, which was unfortunately posted as a comment:
perldiag says that "panic: memory wrap" means "Something tried to allocate more memory than possible". A bit of googling suggests that this isn't a CGI.pm problem but an occasional problem with Perl 5.10 and OS X.