Keep getting VisaIOErrors after a crash unless the device and IPython are restarted

We are controlling a Keithley DMM6500 using the PyVISA library. In our setup, we keep an IPython kernel running (through Spyder).
The problem we're running into is the following: whenever a function that interacts with the DMM hits an unhandled exception (like a KeyboardInterrupt), any subsequent call to the DMM results in the error VI_ERROR_SYSTEM_ERROR (-1073807360): Unknown system error (miscellaneous error).
To fix this, we have tried calling device.clear() and device.close() / device.open(), but neither seems to work. Even rebooting the device does not help. The only thing that fixes the issue, it seems, is to completely restart our IPython kernel.
Is there any way to programmatically restore communication with the device, so that we can avoid having to restart the IPython kernel?

Some of your question is unclear, so my answer might not help; however, it sounds like the terminal is locking the connection and you're losing the reference.
The two ways I have handled this in the past:
1) Open the connection when talking to the device and close the connection when finished. This is useful if your connection is unstable, but it takes a fraction longer since you open and close the connection a lot (see the sketch after the example below).
2) In your program, use a try/except to handle the connection to the instrument, and when the program errors, close the connection so that it doesn't become locked.
Example:
try:
    run_program()
except BaseException:                   # also catches KeyboardInterrupt
    close_connection_to_all_devices()   # build a function that closes every open device connection
    dump_any_unsaved_data()             # maybe dump some variables to see what the data was when it errored, for debugging
    raise
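For approach 1, here is a minimal PyVISA sketch; the resource address below is a placeholder, so substitute your DMM6500's actual VISA address. Because the session is opened and closed around every exchange, an interrupted call cannot leave a stale session behind in the kernel:
import pyvisa

rm = pyvisa.ResourceManager()

def query_dmm(command, address="USB0::0x05E6::0x6500::XXXXXXXX::INSTR"):
    # Open the instrument only for the duration of one exchange.
    inst = rm.open_resource(address)
    try:
        return inst.query(command)
    finally:
        inst.close()   # runs even on KeyboardInterrupt

print(query_dmm("*IDN?"))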

Is there a way to stop VS Code Remote SSH from failing after it is disconnected?

I find that if I leave my VS Code Remote SSH connection open it disconnects automatically after a certain amount of time. Following automatic disconnection I find the Remote SSH then fails: when I try to log in again I get repeated requests for my remote password and every time I enter my password I just get another password prompt.
My current workaround is to go to the Command Palette and do "Remote-SSH: Kill VS Code Server on Host". Sometimes I need to do this multiple times for it to take effect. Then when I next log in there is a lengthy VS Code installation script that needs to run before I can start coding again.
Is there a way of setting up VS Code Remote SSH that avoids this issue? I have tried some of the suggestions on this page - https://code.visualstudio.com/docs/remote/troubleshooting. However I feel like I am completely in the dark regarding what the underlying issue is. I do not even know how I could go about generating informative diagnostics / a log.
Maybe the problem is that the remote machine allows only a limited number of processes to run at the same time. When an automatic VS Code disconnection happens, that session is still running on the remote, but you cannot create a new one because you are over the process limit.
In my case, asking for my processes on the remote machine to be killed (done manually by the technician administering that machine) worked.
A better solution would be to close the stale VS Code sessions from your own machine, so that you are able to start a new one again.
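As a rough sketch of that last idea (assuming the remote is a Linux host you can reach over SSH with key-based auth, that pkill is available there, and using placeholder hostname/username), you could kill any leftover VS Code server processes from your own machine:
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("remote.example.com", username="me")   # placeholders; assumes key-based auth
# VS Code server processes live under ~/.vscode-server, so match on that path
stdin, stdout, stderr = client.exec_command("pkill -f .vscode-server || true")
print(stdout.read().decode(), stderr.read().decode())
client.close()
This is essentially a scripted version of "Remote-SSH: Kill VS Code Server on Host", run without needing the Command Palette to cooperate.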

failed using cuda-gdb to launch program with CUPTI calls

I'm having this weird issue: I have a program that uses CUPTI callbackAPI to monitor the kernels in the program. It runs well when it's directly launched; but when I put it under cuda-gdb and run, it failed with the following error:
error: function cuptiSubscribe(&subscriber, (CUpti_CallbackFunc)my_callback, NULL) failed with error CUPTI_ERROR_NOT_INITIALIZED
I've tried all the examples in CUPTI/samples and concluded that programs which use the callback API or the activity API fail under cuda-gdb (they are all well-behaved without cuda-gdb), but the failure mode differs:
If I have calls from activityAPI, then once run it under cuda-gdb, it'll hang for a minute then exit with error:
The CUDA driver has hit an internal error. Error code: 0x100ff00000001c Further execution or debugging is unreliable. Please ensure that your temporary directory is mounted with write and exec permissions.
If I have calls from the callback API, as in my own program, then it fails much sooner, with the same error as above:
CUPTI_ERROR_NOT_INITIALIZED
Does anyone have experience with this kind of issue? I'd really appreciate any help!
According to an NVIDIA forum posting here (also referred to here), the CUDA "tools" must be used one at a time. These tools include:
CUPTI
any profiler
cuda-memcheck
a debugger
Only one of these can be "in use" on a program at a time. It should be fairly easy for developers to use a profiler, or cuda-memcheck, or a debugger on its own, but a possible takeaway for those using CUPTI who also wish to use another CUDA "tool" on the same code is to provide a way to disable CUPTI in their application (e.g. a build option or runtime switch) when they wish to use another tool.

Attempt to access remote folder mounted with CIFS hangs when disconnected

This question is an extension of that question.
Yet again: I'm working under CentOS 6.0 and I have a remote win7 folder, mounted with:
mount -t cifs //PC128/mnt /media/net -o "username=WORKGROUP\user,password=pwd,rw,noexec,soft,uid=user,gid=user"
When the remote folder is not available (e.g. the network cable is pulled out), an attempt to access it locks up the application I'm working on. At first I found that QDir::exists() blocked for 20-90 seconds (I still can't work out why the duration varies so much); later I found that any call to the stat() function leads to an application lock-up.
I followed the advice provided in the topic above and moved the QDir::exists() call (and later the call to the stat() function) to another thread, but this didn't solve the problem. The application still hangs when the connection is suddenly lost. The Qt trace shows that the lock is somewhere in the kernel:
0 __kernel_vsyscall
1 __xstat64@GLIBC_2.1 /lib/libc.so.6
2 QFSFileEnginePrivate::doStat stat.h
I also tried to check whether the remote share is still mounted before accessing the folder itself, but it didn't help. Approaches such as:
mount | grep /media/net
show that the shared folder is still mounted even when there is no active network connection.
Checking for differences in folder status, such as:
stat -fc%t:%T /media/net/ != stat -fc%t:%T /media/net/..
also hangs for ~20 seconds.
So I have several questions:
Is there any way to change the CIFS timeouts? I tried to find out, but it seems there are no appropriate parameters and no CIFS config file.
How can I check whether the remote folder is still mounted, without getting locked up?
How can I check whether a folder exists, also without getting locked up?
Your problem, "an unreachable network filesystem", is a very well known trigger of Linux hung tasks, which are not at all the same as zombie processes (killing the parent PID won't do anything).
A hung task is a task that triggered a system call which ran into a problem in the kernel, so that the system call never returns.
The main particularity is that the scheduler declares the task to be in the "D" state, which means the program is in an uninterruptible state. This means you can do nothing to stop your program: you can send any signal to the task and it will not respond; firing off hundreds of SIGTERM/SIGKILL does nothing!
This is the case with my old kernel: when my NFS server crashes, I need to reboot the client to kill the tasks using the filesystem. I compiled that kernel a long time ago (I still have the build tree on my HDD), and during configuration I saw this in lib/Kconfig.debug:
config DETECT_HUNG_TASK
    bool "Detect Hung Tasks"
    depends on DEBUG_KERNEL
    default LOCKUP_DETECTOR
    help
      Say Y here to enable the kernel to detect "hung tasks",
      which are bugs that cause the task to be stuck in
      uninterruptible "D" state indefinitely.
      When a hung task is detected, the kernel will print the
      current stack trace (which you should report), but the
      task will stay in uninterruptible state. If lockdep is
      enabled then all held locks will also be reported. This
      feature has negligible overhead.
It only offers to detect such tasks (or to panic on detection): I haven't checked whether recent kernels can actually resolve the problem (your question suggests it is still an issue), but I didn't think it was worth enabling.
There is a second point: normally the detection occurs after 120 seconds, but I also saw a Kconfig option for this:
config DEFAULT_HUNG_TASK_TIMEOUT
    int "Default timeout for hung task detection (in seconds)"
    depends on DETECT_HUNG_TASK
    default 120
    help
      This option controls the default timeout (in seconds) used
      to determine when a task has become non-responsive and should
      be considered hung.
      It can be adjusted at runtime via the kernel.hung_task_timeout_secs
      sysctl or by writing a value to
      /proc/sys/kernel/hung_task_timeout_secs.
      A timeout of 0 disables the check. The default is two minutes.
      Keeping the default should be fine in most cases.
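For completeness, the runtime knob mentioned in that help text can be inspected or changed without rebuilding anything; note that it only tunes when the "hung task" warning fires, it does not unhang anything. A small sketch (writing requires root):
PATH = "/proc/sys/kernel/hung_task_timeout_secs"

with open(PATH) as f:
    print("current hung-task timeout:", f.read().strip(), "seconds")

# To change it (as root); 0 disables the check entirely:
# with open(PATH, "w") as f:
#     f.write("30\n")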
This also happens with kernel threads. Example: create a loop device backed by a file on a FUSE filesystem, then crash the userspace program controlling the FUSE filesystem!
You should get a kernel thread, named loopX (where X normally corresponds to your loop device number), hanging!
weblinks:
https://unix.stackexchange.com/questions/5642/what-if-kill-9-does-not-work (look at the answer written by ultrasawblade)
http://www.linuxquestions.org/questions/linux-general-1/kill-a-hung-task-when-kill-9-doesn't-help-697305/
http://forums-web2.gentoo.org/viewtopic-t-811557-start-0.html
http://comments.gmane.org/gmane.linux.kernel/1189978
http://comments.gmane.org/gmane.linux.kernel.cifs/7674 (This is a case similar to yours)
As for your three questions: you have your answer. This is probably due to a well-known bug in the Linux VFS kernel layer! (There are no CIFS timeouts.)
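As a practical workaround on the application side, you can keep the potentially hanging stat() out of your own process entirely. A minimal sketch (in Python for brevity; the same pattern applies to the Qt/C++ application): probe the mount from a disposable child process and give up after a deadline, instead of calling stat() in-process. A child stuck in the "D" state cannot be killed, only abandoned, which is exactly why the probe must not run in your main process:
import subprocess
import time

def mount_is_responsive(path, timeout=5.0):
    # Probe `path` in a child process so a hung CIFS mount cannot freeze the caller.
    child = subprocess.Popen(["stat", "--", path],
                             stdout=subprocess.DEVNULL,
                             stderr=subprocess.DEVNULL)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if child.poll() is not None:      # stat finished in time
            return child.returncode == 0
        time.sleep(0.1)
    child.kill()    # best effort; a D-state child will simply ignore this
    return False    # treat a non-answering mount as unavailable

if not mount_is_responsive("/media/net"):
    print("share looks dead, skipping access")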
After much trial & error I found a solution that persists.
# vim /etc/fstab
//192.168.1.122/myshare /mnt/share cifs username=user,password=password,_netdev 0 0
The _netdev option is important since we are mounting a network device: without it, clients may hang during the boot process if the system encounters any difficulties with the network.
https://www.redhat.com/sysadmin/samba-windows-linux

STM32 GDB/OpenOCD Commands and Initialization for Flash and Ram Debugging

I am looking for assistance with the proper GDB/OpenOCD initialization and run commands (external tools) to use within Eclipse for flash and RAM debugging, as well as the proper modifications or additions that need to be made to a makefile for flash vs. RAM builds for this MCU, if that matters.
MCU: STM32F103VET6
I am using Eclipse Helios with Zylin Embedded CDT, Yagarto Tools and Bins, OpenOCD 0.4, and have an Olimex ARM-USB-OCD JTAG adapter.
I have already configured the ARM-USB-OCD and added it as an external tool in Eclipse. For initializing OpenOCD I used the following command in Eclipse. The board config file references the stm32 MCU:
openocd -f interface/olimex-arm-usb-ocd-h.cfg -f board/stm32f10x_128k_eval.cfg
When I run this within Eclipse everything appears to be working (GDB Interface, OpenOCD finds the MCU, etc). I can also telnet into OpenOCD and run commands.
So, I am stuck on the next part; initialization and commands for flash and RAM debugging, as well as erasing flash.
I read through several tutorials, and scoured the net, but have not been able to find anything particular to this processor. I am new to this, so I might not be recognizing an equivalent product for an example.
I'm working with the same toolchain to program and debug an STM32F107 board. The following are my observations on getting an STM32Fxxx chip programmed and debugged under this toolchain.
Initial Starting Point
So at this point you've got a working OpenOCD to ARM-USB-OCD connection and so you should be all set on that end. Now the work is on getting Eclipse/Zylin/Yagarto GDB combination to properly talk to the STM32Fxxx through the OpenOCD/Olimex connection. One thing to keep in mind is that all the OpenOCD commands to issue are the run mode commands. The configuration scripts and command-line options to invoke the OpenOCD server are configuration mode commands. Once you issue the init command then the server enters run mode which opens up the set of commands you'll need next. You've probably done it somewhere else but I tack on a '-c "init"' option when I call the OpenOCD server like so:
openocd -f /path to scripts/olimex-arm-usb-ocd-h.cfg -f /path to targets/stm32f107.cfg -c "init"
The following commands I issue next are done by the Eclipse Debug Configurations dialogue. Under the Zylin Embedded debug (Native) section, I create a new configuration, give it a name, Project (optional), and absolute path to the binary that I want to program. Under the Debugger tab I set the debugger to Embedded GDB, point to the Yagarto GDB binary path, don't set a GDB command file, set GDB command set to Standard, and the protocol to mi.
The Commands Tab - Connect GDB to OpenOCD
So the next tab is the Commands tab, and that's where the meat of the issue lies. You have two boxes, Initialize and Run. I'm not sure exactly what the difference is, except to guess that they run pre- and post-invocation of GDB. Either way, I haven't noticed a difference in how my commands are run.
But anyway, following the examples I found on the net, I filled the Initialize box with the following commands:
set remote hardware-breakpoint-limit 6
set remote hardware-watchpoint-limit 4
target remote localhost:3333
monitor halt
monitor poll
First two lines tell GDB how many breakpoints and watchpoints you have. Open OCD Manual Section 20.3 says GDB can't query for that information so I tell it myself. Next line commands GDB to connect to the remote target at the localhost over port 3333. The last line is a monitor command which tells GDB to pass the command on to the target without taking any action itself. In this case the target is OpenOCD and I'm giving it the command halt. After that I tell OpenOCD to switch to asynchronous mode of operation. As some of the following operations take a while, it's useful not to have OpenOCD block and wait for every operation.
Sidenote #1: If you're ever in doubt about the state of GDB or OpenOCD then you can use the Eclipse debug console to send commands to GDB or OpenOCD (via GDB monitor commands) after invoking this debug configuration.
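In the same spirit as Sidenote #1: since the OpenOCD server also accepts telnet connections (port 4444 by default), the run-mode commands can be scripted outside Eclipse/GDB as well. A rough sketch, assuming a running server and that a plain TCP client is acceptable (as it is with netcat):
import socket

def openocd(*commands, host="localhost", port=4444):
    # Send run-mode commands to a running OpenOCD server over its telnet port.
    replies = []
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.settimeout(2)
        try:
            sock.recv(4096)                     # discard banner/prompt
        except socket.timeout:
            pass
        for cmd in commands:
            sock.sendall(cmd.encode() + b"\n")
            try:
                replies.append(sock.recv(4096).decode(errors="replace"))
            except socket.timeout:
                replies.append("")
    return replies

# The same commands that are passed to GDB via "monitor" elsewhere in this answer:
for reply in openocd("halt", "flash probe 0"):
    print(reply)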
The Commands Tab - Setting up the User Flash
Next are commands I give in the Run commands section:
monitor flash probe 0
monitor flash protect 0 0 127 off
monitor reset halt
monitor stm32x mass_erase 0
monitor flash write_image STM3210CTest/test_rom.elf
monitor flash protect 0 0 127 on
disconnect
target remote localhost:3333
monitor soft_reset_halt
to be explained in the following sections...
Setting up Access to User Flash Memory
First I issue an OpenOCD query to see if it can find the flash module and report the proper address. If it responds that it found the flash at address 0x08000000 then we're good. The 0 at the end specifies to get information about flash bank 0.
Sidenote #2: The STM32Fxxx part-specific data sheets have a memory map in section 4. Very useful to keep on hand as you work with the chip. Also as everything is accessed as a memory address, you'll come to know this layout like the back of your hand after a little programming time!
So after confirming that the flash has been properly configured we invoke the command to turn off write protection to the flash bank. PM0075 describes everything you need to know about programming the flash memory. What you need to know for this command is the flash bank, starting sector, ending sector, and whether to enable or disable write protection. The flash bank is defined in the configuration files you passed to OpenOCD and was confirmed by the previous command. Since I want to disable protection for the entire flash space I specify sectors 0 to 127. PM0075 explains how I got that number as it refers to how the flash memory is organized into 2KB pages for my (and your) device. My device has 256KB of flash so that means I have 128 pages. Your device has 512KB of flash so you'll have 256 pages. To confirm that your device's write-protection has been disabled properly, you can check the FLASH_WRPR register at address 0x40022020 using the OpenOCD command:
monitor mdw 0x40022020
The resulting word that it prints will be 0xffffffff which means all pages have their write protection disabled. 0x00000000 means all pages have write protection enabled.
Sidenote #3: On the subject of the memory commands, I bricked my chip twice while messing with the option bytes at the block starting at address 0x1ffff800. The first time I set the read protection on the flash (kind of hard to figure out what you're doing if you do that); the second time I set the hardware watchdog, which prevented me from doing anything afterwards since the watchdog kept firing! I fixed it by using the OpenOCD memory access commands. The moral of the story is: with great power comes great responsibility... Or, another take: if I shoot myself in the foot, I can still fix things via JTAG.
Sidenote #4: One thing that'll happen if you try to write to protected flash memory is the FLASH_SR:WRPRTERR bit will be set. OpenOCD will report a more user-friendly error message.
Erasing the Flash
So after disabling the write protection, we need to erase the memory that you want to program. I do a mass erase which erases everything, you also have the option to erase by sector or address (I think). Either way you need to erase first before programming as the hardware checks for erasure first before allowing a write to occur. If the FLASH_SR:PGERR bit (0x4002200c) ever gets set during programming then you know you haven't erased that chunk of memory yet.
Sidenote #5: Erasing a bit in flash memory means setting it to 1.
Programming Your Binary
The next two lines after the erasure write the binary image to the flash and re-enable the write protection. There isn't much more to say that isn't covered by PM0075. Basically, any error that occurs when you issue flash write_image is probably related to the flash protection not being disabled. It's probably NOT OpenOCD; though if you're curious, you can enable the debug output and follow what it does.
GDB Debugging
So finally after programming I disconnect GDB from the remote connection and then reconnect it to the target, do a soft-reset, and my GDB is now ready to debug. This last part I just figured out last night as I was trying to figure out why, after programming, GDB wouldn't properly stop at main() after reset. It kept going off into the weeds and blowing up.
My current thinking and from what I read in the OpenOCD and GDB manuals is that the remote connection is, first and foremost, meant to be used between GDB and a target that has already been configured and running. Well I'm using GDB to configure before I run so I think the symbol table or some other important info gets messed up during the programming. The OpenOCD manual says that the server automatically reports the memory and symbols when GDB connects but all that info probably becomes invalid when the chip gets programmed. Disconnecting and reconnecting I think refreshes the info GDB needs to debug properly. So that has led me to create another Debug Configuration, this one just connects and resets the target since I don't necessarily need to program the chip every time I want to use GDB.
Whew! Done! Kind of long but this took me 3 weekends to figure out so isn't too terribly bad I think...
Final sidenote: During my time debugging I found that OpenOCD debug output to be invaluable to me understanding what OpenOCD was doing under the covers. To program a STM32x chip you need to unlock the flash registers, flip the right bits, and can only write a half-word at a time. For a while I was questioning whether OpenOCD was doing this properly but after looking through the OpenOCD debug output and comparing it against what the PM0075 instructions were, I was able to confirm that it did indeed follow the proper steps to do each operation. I also found I was duplicating steps that OpenOCD was already doing so I was able to cut out instructions that weren't helping! So moral of the story: Debug output is your friend!
I struggled getting JLink to work with an STM3240XX and found a statement in the JLink GDB server documentation saying that after loading flash you must issue a "target reset":
"When debugging in flash the stack pointer and the PC are set automatically when the target is reset after the flash download. Without reset after download, the stack pointer and the PC need to be initialized correctly, typically in the .gdbinit file."
When I added a "target reset" in the Run box of the debugger Setup of Eclipse, suddenly everything worked. I did not have this problem with a Kinetis K60.
The document also explains how to manually set the stack pointer and pc directly if you don't want to issue a reset. It may not be the disconnect/connect that solves the problem but the reset.
What I use after the last line of the Commands tab 'Run' commands is:
symbol-file STM3210CTest/test_rom.elf
thbreak main
continue
The thbreak main line is what makes GDB stop at main.

SNMP traps not caught in Windows XP

I have an SNMP client, sending my PC SNMP traps with destination port 162.
I run a sniffer (Wireshark) from my PC, and see that the traps are indeed received.
The SNMP.exe and SNMPTRAP.exe system processes are up and running (I've even restarted them), and SNMPTRAP.exe is listening on port 162. I have no active firewall (neither Windows nor 3rd party).
The problem: on my PC I have three different applications, all registered with SNMPTRAP.exe. These are all off-the-shelf software, not something I wrote; MG-SOFT Trap Ringer, for example, is one of them. NONE OF THEM CATCHES ANY OF THE TRAPS, and I have no idea where exactly the failure is.
Do you have any idea what may be causing this? Or how, perhaps, I can debug the SNMPTRAP.exe process?
Thanks!
You may disable the Microsoft SNMP Trap service in the Services panel; then MG-SOFT Trap Ringer will work.
The reason is simply that while Microsoft's SnmpTrap.exe is listening on port 162, it prevents other applications, such as MG-SOFT's, from listening on that port.
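A quick way to confirm this exclusivity (and, once the Windows trap service is stopped, to see raw traps arriving at the application layer) is a bare-bones UDP listener. A small sketch; port 162 is the standard trap port from the question:
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
try:
    sock.bind(("", 162))
except OSError as exc:
    print("port 162 is already taken (e.g. by SNMPTRAP.EXE):", exc)
else:
    print("listening for traps on UDP port 162 ...")
    while True:
        data, addr = sock.recvfrom(65535)
        print(len(data), "bytes from", addr[0])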
Here is how you can narrow down and corner the problem; try the steps below.
1) How are you sure the traps are not being caught (since you do see them in Wireshark)? I mean, should they be displayed on some GUI screen, and if so, are the fields empty? Or are they logged to some file, and if so, are you sure you are checking the correct file, and have you configured (through conf files or command-line options) which file the traps should be logged to?
2) If step 1 does not help, try installing and configuring the whole setup on a different PC/laptop and see whether it works there.
3) If step 2 does not help, try a different trap daemon program in place of SNMPTRAP.EXE; plenty of open-source ones exist. If you can capture traps with a different daemon, then there is some issue with the SNMPTRAP.EXE you are using.
I'm sure one of these steps will work for you; if not, get back to me :)