Attempt to access remote folder mounted with CIFS hangs when disconnected

Attempt to access remote folder mounted with CIFS hangs when disconnected - centos

This question is an extension for that question.
Yet again: I'm working under CentOS 6.0 and I have a remote win7 folder, mounted with:
mount -t cifs //PC128/mnt /media/net -o "username=WORKGROUP\user,password=pwd,rw,noexec,soft,uid=user,gid=user"
When remote folder is not available (e.g. network cable is pulled out) an attempt to access the remote folder locks an application I'm working on. At first I detected that QDir::exists() caused locking for 20-90 seconds (I still can't find out why such difference), further I detected that any call to stat() function leads to application lock.
I followed an advice provided in topic above, I moved QDir::exists() call (and later - call to the stat() function) to another thread and this didn't solve the problem. The application still hangs when connection is suddenly lost. Qt trace shows that lock is somewhere in the kernel:
0 __kernel_vsyscall
1 __xstat64#GLIBC_2.1 /lib/libc.so.6
2 QFSFileEnginePrivate::doStat stat.h
I did also tried to check if remote share is still mounted before trying to access folder itself, but it didn't help. Approaches such as:
mount | grep /media/net
show that shared folder is still mounted even is there is no active connection to the network.
Checking folder status differences such as:
stat -fc%t:%T /media/net/ != stat -fc%t:%T /media/net/..
also hangs for ~20 seconds.
So I have several questions:
Is there any way to change CIFS timeouts? I did try to find out but it seems that there is no appropriate parameters and no CIFS config.
How can I check if remote folder is still mounted and not get locked?
How can I check is folder exists and also not get locked?

Your problem: "An unreachable network filesystem" is a very well known example which trigger linux hung task which isn't the same of zombies process at all(killing the parent PID won't do anything)
An hung task, is task which triggered a system call that cause problem in the kernel, so that the system call never return.
The major particularity is that the task is declared in the "D" state by the scheduler which mean the program is in an uninterruptible state. This mean that you can do nothing to stop you program: You can trigger all signal to the task, it would not respond. Launching hundreds of SIGTERM/SIGKILL does nothing!
This the case whith my old kernel: when my nfs server crash, I need to reboot the client to kill the tasks using the filesystem. I compiled it a long time ago (I have still the build tree on my hdd) and during the configuration I saw this in lib/Kconfig.debug:
config DETECT_HUNG_TASK
bool "Detect Hung Tasks"
depends on DEBUG_KERNEL
default LOCKUP_DETECTOR
help
Say Y here to enable the kernel to detect "hung tasks",
which are bugs that cause the task to be stuck in
uninterruptible "D" state indefinitiley.
When a hung task is detected, the kernel will print the
current stack trace (which you should report), but the
task will stay in uninterruptible state. If lockdep is
enabled then all held locks will also be reported. This
feature has negligible overhead.
It was only proposing to detect such tash or panic on detection: I don't checked if recent kernel actually can solve the problem (It seems to be the case with your question), but I think it didn't worth enabling it.
There is second problem : normally, the detection occur after 120 seconds, but I saw also a Konfig option for this:
config DEFAULT_HUNG_TASK_TIMEOUT
int "Default timeout for hung task detection (in seconds)"
depends on DETECT_HUNG_TASK
default 120
help
This option controls the default timeout (in seconds) used
to determine when a task has become non-responsive and should
be considered hung.
It can be adjusted at runtime via the kernel.hung_task_timeout_secs
sysctl or by writing a value to
/proc/sys/kernel/hung_task_timeout_secs.
A timeout of 0 disables the check. The default is two minutes.
Keeping the default should be fine in most cases.
This also works with kernel threads: example: make a loop device to a file on a fuse filesystem. Then crash the userspace program controlling the fuse filesystem!
You should a get a Ktread which name is in the form loopX (X correspond normally to your loopback device number) HUNGing!
weblinks:
https://unix.stackexchange.com/questions/5642/what-if-kill-9-does-not-work (look at the answer written by ultrasawblade)
http://www.linuxquestions.org/questions/linux-general-1/kill-a-hung-task-when-kill-9-doesn't-help-697305/
http://forums-web2.gentoo.org/viewtopic-t-811557-start-0.html
http://comments.gmane.org/gmane.linux.kernel/1189978
http://comments.gmane.org/gmane.linux.kernel.cifs/7674 (This is a case similar to yours)
In your case of the three question: you have the answer: This probably due to what is probably a well known bug in the vfs linux kernel layer! (There is no CIFS timeouts)

After much trial & error I found a solution that persists.
# vim /etc/fstab
//192.168.1.122/myshare /mnt/share cifs username=user,password=password,_netdev 0 0
The _netdev option is important since we are mounting a network device. Clients may hang during the boot process if the system encounters any difficulties with the network.
https://www.redhat.com/sysadmin/samba-windows-linux

Related

Keep getting VisaIOErrors after crash, unless device and ipython are rebooted

We are controlling a Keithley DMM6500 using the pyVisa library. In our setup, we are keeping an iPython kernel running (through Spyder).
The problem we're running into is the following: whenever a function that interacts with the DMM encounters an unhandled exception (like a KeyboardInterrupt), any subsequent calls to the DMM result in the error VI_ERROR_SYSTEM_ERROR (-1073807360): Unknown system error (miscellaneous error).
In order to fix this, we have tried to call device.clear() and device.close() / device.open(), but this doesn't seem to work. Even rebooting the device does not work. The only thing that fixes the issue, it seems, is to completely restart our iPython kernel.
Is there any way to programmatically restore communication with the device, such that we can avoid having to reboot the ipython kernel?

Some of your question is unclear so my answer might not help, however, it sounds like the terminal is locking the connection and you're loosing the reference.
The two way I have done this in the past:
Open the connection when talking to the device and close the connection when finished. This is useful if your connection is unstable but takes a fraction longer to open and close the connection a lot.
2)In your program you should have a try/except to handle the connection to the insturument and when the program errors you need to close the connection so that it doesn't become locked.
example:
try:
run_program()
except:
close_connection_to_all devices() # build a function to clear connection to all devices
dump_any_unsaved_data() # maybe you want to dump some of the variable to see what the data was when it errored for debug

Skip host while nmap is running

Is there a way to skip a host while it is being scanned. I am providing a list of hosts to nmap and while it is scanning from that list, I would like to skip one host because the scripts keep running on that host hence delaying my scan. Please suggest.
Thanks

There is not a way during runtime to stop scanning a host. However, you can impose time limits on how long Nmap spends on a particular host. The --host-timeout option will cause Nmap to drop all results and stop scanning a target when the timeout expires. Unfortunately, this means all that work is lost. But there is a better way, if NSE scripts are slowing you down.
Nmap 7.30 added the --script-timeout option, which puts a time limit on each NSE script that runs against a target. Any script that exceeds the time limit will be terminated and will produce no output, but any other scripts will be allowed to run. No port scan, OS detection, or traceroute data will be lost.
Your last option if NSE is taking too long is to find out which script is causing the problem. Most NSE scripts are designed to run quickly; even most of the brute-force password guessing scripts enforce a 10-minute time limit. But sometimes there are bugs, and other times you may select a script with an intentionally long run time. In debug mode (-d or press d during runtime), printing a status line (by pressing any key during execution) will show a list of running scripts when there are 5 or fewer running. At debug level 2 (-dd or press d twice), a full stack trace of each running script thread is produced, which can help Nmap developers debug delays. If you suspect a misbehaving script, you can file a bug report on Github or send it to dev#nmap.org.

nmap has a host timeout option which will give up on any host that takes longer than the provided value. So, the below option would give up on any host that takes longer than 10 minutes. You can read more about the various timing related options here.
nmap --host-timeout 10m

What is the opposite of `mknod`?

I am learning to write character device drivers from the Kernel Module Programming Guide, and used mknod to create a node in /dev to talk to my driver.
However, I cannot find any obvious way to remove it, after checking the manpage and observing that rmnod is a non-existent command.
What is the correct way to reverse the effect of mknod, and safely remove the node created in /dev?

The correct command is just rm :)
A device node created by mknod is just a file that contains a device major and minor number. When you access that file the first time, Linux looks for a driver that advertises that major/minor and loads it. Your driver then handles all I/O with that file.
When you delete a device node, the usual Un*x file behavior aplies: Linux will wait until there are no more references to the file and then it will be deleted from disk.
Your driver doesn't really notice anything of this. Linux does not automatically unload modules. Your driver wil simply no longer receive requests to do anything. But it will be ready in case anybody recreates the device node.

You are probably looking for a function rather than a command. unlink() is the answer. unlink() will remove the file/special file if no process has the file open. If any processes have the file open, then the file will remain until the last file descriptor referring to it is closed. Read more here: http://man7.org/linux/man-pages/man2/unlink.2.html

Change site configuration without restarting G-WAN

I'm looking at hosting a number of small, static websites and have been looking at a few alternatives including G-WAN. At the moment I'm just trying to get a feel for how well each server suits my needs before picking one.
G-WAN seems to do exactly what I want, though I'm running into problems with updating the configuration (by adding new folders) after the server's started. I can't find anything in the documentation or online about this, so I don't know if I'm doing anything dumb, running an unsupported configuration, or whether it's a feature that doesn't exist in G-WAN.
Here's my setup:
G-WAN 3.3.28 64-bit on Ubuntu 12.04.1 LTS.
I have what I think is the required minimal folder structure:
0.0.0.0_80
#0.0.0.0
www
$site.com
www
$othersite.com
www
I startup gwan via (I'm still messing around, so hopefully ):
sudo .\gwan -d
Everything works brilliantly. I add $thirdsite.com/, $thirdsite.com/www/, and $thirdsite.com/www/index.html; then when I try to visit thirdsite.com it gives me the root host (ie it doesn't seem to pick up the changes).
To reload the modified configuration, I have to either do:
sudo .\gwan -k; sudo .\gwan -d
or kill the non-angel process (kill -s 15) to restart the child process.
Can G-WAN reload the host definitions another way? If so, is it something that works out of the box or is there a command that can cycle the server without dropping requests made to other hosts (/is it safe to kill -s 15 on the non-angel process + if so, is there a reliable way to identify the process)? Thanks in advance!

G-WAN loads the host definitions at startup and does not check them as time goes to reload them dynamically.
To force a reload, you have to stop the child process (when in daemon mode) and v3.9+ keeps the old child alive the time to process any pending request while the new child accepts new connections.
Since stopping the child can also be done from the maintenance script or from a handler or from a servlet by just running exit(0) there is not need for a dedicated command.
Note that when you use kill you can pick the pid file from the gwan directory:
the parent process starts with a capital letter: Gwan_xxxx.pid
the child process starts with a lowercase letter: gwan_xxxx.pid
That will make your life easier.

STM32 GDB/OpenOCD Commands and Initialization for Flash and Ram Debugging

I am looking for assistance with the proper GDB / OpenOCD initializion and running commands (external tools) to use within Eclipse for flash and RAM debugging, as well as the proper modifications or additions that need to be incorporated in a makefile for flash vs RAM building for this MCU, if this matters of course.
MCU: STM32F103VET6
I am using Eclipse Helios with Zylin Embedded CDT, Yagarto Tools and Bins, OpenOCD 0.4, and have an Olimex ARM-USB-OCD JTAG adapter.
I have already configured the ARM-USB-OCD and added it as an external tool in Eclipse. For initializing OpenOCD I used the following command in Eclipse. The board config file references the stm32 MCU:
openocd -f interface/olimex-arm-usb-ocd-h.cfg -f board/stm32f10x_128k_eval.cfg
When I run this within Eclipse everything appears to be working (GDB Interface, OpenOCD finds the MCU, etc). I can also telnet into OpenOCD and run commands.
So, I am stuck on the next part; initialization and commands for flash and RAM debugging, as well as erasing flash.
I read through several tutorials, and scoured the net, but have not been able to find anything particular to this processor. I am new to this, so I might not be recognizing an equivalent product for an example.

I'm working with the same tool chain to program and debug a STM32F107 board. Following are my observations to get an STM32Fxxx chip programmed and debugged under this toolchain.
Initial Starting Point
So at this point you've got a working OpenOCD to ARM-USB-OCD connection and so you should be all set on that end. Now the work is on getting Eclipse/Zylin/Yagarto GDB combination to properly talk to the STM32Fxxx through the OpenOCD/Olimex connection. One thing to keep in mind is that all the OpenOCD commands to issue are the run mode commands. The configuration scripts and command-line options to invoke the OpenOCD server are configuration mode commands. Once you issue the init command then the server enters run mode which opens up the set of commands you'll need next. You've probably done it somewhere else but I tack on a '-c "init"' option when I call the OpenOCD server like so:
openocd -f /path to scripts/olimex-arm-usb-ocd-h.cfg -f /path to targets/stm32f107.cfg -c "init"
The following commands I issue next are done by the Eclipse Debug Configurations dialogue. Under the Zylin Embedded debug (Native) section, I create a new configuration, give it a name, Project (optional), and absolute path to the binary that I want to program. Under the Debugger tab I set the debugger to Embedded GDB, point to the Yagarto GDB binary path, don't set a GDB command file, set GDB command set to Standard, and the protocol to mi.
The Commands Tab - Connect GDB to OpenOCD
So the next tab is the Commands tab and that's where the meat of the issue lies. You have two spaces Initialize and Run. Not sure exactly what the difference is except to guess that they occur pre- and post-invocation of GDB. Either way I haven't noticed a difference in how my commands are run.
But anyway, following the examples I found on the net, I filled the Initialize box with the following commands:
set remote hardware-breakpoint limit 6
set remote hardware-watchoint-limit 4
target remote localhost:3333
monitor halt
monitor poll
First two lines tell GDB how many breakpoints and watchpoints you have. Open OCD Manual Section 20.3 says GDB can't query for that information so I tell it myself. Next line commands GDB to connect to the remote target at the localhost over port 3333. The last line is a monitor command which tells GDB to pass the command on to the target without taking any action itself. In this case the target is OpenOCD and I'm giving it the command halt. After that I tell OpenOCD to switch to asynchronous mode of operation. As some of the following operations take a while, it's useful not to have OpenOCD block and wait for every operation.
Sidenote #1: If you're ever in doubt about the state of GDB or OpenOCD then you can use the Eclipse debug console to send commands to GDB or OpenOCD (via GDB monitor commands) after invoking this debug configuration.
The Commands Tab - Setting up the User Flash
Next are commands I give in the Run commands section:
monitor flash probe 0
monitor flash protect 0 0 127 off
monitor reset halt
monitor stm32x mass_erase 0
monitor flash write_image STM3210CTest/test_rom.elf
monitor flash protect 0 0 127 on
disconnect
target remote localhost:3333
monitor soft_reset_halt
to be explained in the following sections...
Setting up Access to User Flash Memory
First I issue an OpenOCD query to see if it can find the flash module and report the proper address. If it responds that it found the flash at address 0x08000000 then we're good. The 0 at the end specifies to get information about flash bank 0.
Sidenote #2: The STM32Fxxx part-specific data sheets have a memory map in section 4. Very useful to keep on hand as you work with the chip. Also as everything is accessed as a memory address, you'll come to know this layout like the back of your hand after a little programming time!
So after confirming that the flash has been properly configured we invoke the command to turn off write protection to the flash bank. PM0075 describes everything you need to know about programming the flash memory. What you need to know for this command is the flash bank, starting sector, ending sector, and whether to enable or disable write protection. The flash bank is defined in the configuration files you passed to OpenOCD and was confirmed by the previous command. Since I want to disable protection for the entire flash space I specify sectors 0 to 127. PM0075 explains how I got that number as it refers to how the flash memory is organized into 2KB pages for my (and your) device. My device has 256KB of flash so that means I have 128 pages. Your device has 512KB of flash so you'll have 256 pages. To confirm that your device's write-protection has been disabled properly, you can check the FLASH_WRPR register at address 0x40022020 using the OpenOCD command:
monitor mdw 0x40022020
The resulting word that it prints will be 0xffffffff which means all pages have their write protection disabled. 0x00000000 means all pages have write protection enabled.
Sidenote #3: On the subject of the memory commands, I bricked my chip twice as I was messing with the option bytes at the block starting at address 0x1ffff800. First time I set the read protection on the flash (kind of hard to figure out what your doing if you do that), second time I set the hardware watchdog which prevented me from doing anything afterwards since the watchdog kept firing off! Fixed it by using the OpenOCD memory access commands. Moral of the story is: With great power comes great responsibility.... Or another take is that if I shoot myself in the foot I can still fix things via JTAG.
Sidenote #4: One thing that'll happen if you try to write to protected flash memory is the FLASH_SR:WRPRTERR bit will be set. OpenOCD will report a more user-friendly error message.
Erasing the Flash
So after disabling the write protection, we need to erase the memory that you want to program. I do a mass erase which erases everything, you also have the option to erase by sector or address (I think). Either way you need to erase first before programming as the hardware checks for erasure first before allowing a write to occur. If the FLASH_SR:PGERR bit (0x4002200c) ever gets set during programming then you know you haven't erased that chunk of memory yet.
Sidenote #5: Erasing a bit in flash memory means setting it to 1.
Programming Your Binary
The next two lines after erasure writes the binary image to the flash and reenables the write protection. There isn't much more to say that isn't covered by PM0075. Basically any error that occurs when you issue flash write_image is probably related to the flash protection not being disabled. It's probably NOT OpenOCD though if you're curious you can take enable the debug output and follow what it does.
GDB Debugging
So finally after programming I disconnect GDB from the remote connection and then reconnect it to the target, do a soft-reset, and my GDB is now ready to debug. This last part I just figured out last night as I was trying to figure out why, after programming, GDB wouldn't properly stop at main() after reset. It kept going off into the weeds and blowing up.
My current thinking and from what I read in the OpenOCD and GDB manuals is that the remote connection is, first and foremost, meant to be used between GDB and a target that has already been configured and running. Well I'm using GDB to configure before I run so I think the symbol table or some other important info gets messed up during the programming. The OpenOCD manual says that the server automatically reports the memory and symbols when GDB connects but all that info probably becomes invalid when the chip gets programmed. Disconnecting and reconnecting I think refreshes the info GDB needs to debug properly. So that has led me to create another Debug Configuration, this one just connects and resets the target since I don't necessarily need to program the chip every time I want to use GDB.
Whew! Done! Kind of long but this took me 3 weekends to figure out so isn't too terribly bad I think...
Final sidenote: During my time debugging I found that OpenOCD debug output to be invaluable to me understanding what OpenOCD was doing under the covers. To program a STM32x chip you need to unlock the flash registers, flip the right bits, and can only write a half-word at a time. For a while I was questioning whether OpenOCD was doing this properly but after looking through the OpenOCD debug output and comparing it against what the PM0075 instructions were, I was able to confirm that it did indeed follow the proper steps to do each operation. I also found I was duplicating steps that OpenOCD was already doing so I was able to cut out instructions that weren't helping! So moral of the story: Debug output is your friend!

I struggled getting JLink to work with a STM3240XX and found a statement in the JLink GDB server documentation saying that after loading flash you must issue a "target reset":
"When debugging in flash the stack pointer and the PC are set automatically when the target is reset after the flash download. Without reset after download, the stack pointer and the PC need to be initialized correctly, typically in the .gdbinit file."
When I added a "target reset" in the Run box of the debugger Setup of Eclipse, suddenly everything worked. I did not have this problem with a Kinetis K60.
The document also explains how to manually set the stack pointer and pc directly if you don't want to issue a reset. It may not be the disconnect/connect that solves the problem but the reset.

What i use after the last sentence in the Comannd Tab - 'Run' Commands, is:
symbol-file STM3210CTest/test_rom.elf
thbreak main
continue
The thbreak main sentence is what makes gdb stop at main.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse