My client has an old version of uClinux, kernel 2.6.22, running on a Blackfin STAMP board. The main application is divided into 14 processes, plus there's a webserver running on the board.
The bug we're seeing: the webserver keeps running happily while the VoIP application seems to run out of file handles and can't create new sockets. I've tried every debugging technique I know of. I have a JTAG debugger, but the memory is too small for debug symbols. I can't compile with Valgrind or anything like that. Any guesses?
Thanks,
Mike
It's likely you've got a file descriptor leak. Valgrind isn't the best tool for tracking that down anyway.
Start by doing 'ls -lah /proc/pid/fd', replacing pid with the process ID of the application. That'll show you a list of file descriptors opened (and not yet closed) by the process.
If you've really got a file descriptor leak you should see a lot of entries there. It should also be immediately obvious which type of file descriptor you're leaking (file, socket, ...).
Once you know that you'll have a better idea of where in the code to look for the leak.
The fact that your file system is full may be another hint. If your application is creating a file and removing it, but not closing the file descriptor, you might have a bunch of files that no longer appear in the directory tree but still take up space. In that case you'll see the file names, typically marked "(deleted)", as the targets of the symlinks in /proc/pid/fd.
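If the listing is long, counting the entries by type can make the leak easier to spot. The following is only a rough sketch, not something from the original answer: the function names are made up for illustration, and it assumes a Python interpreter is available, which may well not be true on the board itself (the same logic could be done with ls and grep in a shell). It relies on the Linux convention that the symlinks in /proc/pid/fd point at names such as "socket:[1234]", "pipe:[5678]", or a path ending in " (deleted)".

```python
import os
from collections import Counter


def classify_fd_target(target):
    """Map a /proc/<pid>/fd symlink target to a coarse descriptor type."""
    if target.startswith("socket:"):
        return "socket"
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("anon_inode:"):
        return "anon_inode"
    if target.endswith(" (deleted)"):
        # File was unlinked but is still held open by the process.
        return "deleted file"
    return "file"


def summarize_fds(fd_dir):
    """Count open descriptors by type for a directory such as /proc/1234/fd."""
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        counts[classify_fd_target(target)] += 1
    return counts
```

Calling summarize_fds("/proc/1234/fd") a few times, some seconds apart, should show which category keeps growing (1234 is a placeholder pid).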
I've written a shared library in C/C++ for MATLAB to create an API for a Monochrome camera.
The code works, but I have some odd issues with memory management (basically, the MATLAB functions for dynamically allocating and freeing memory aren't too reliable). Additionally, there are some really low-level things I'd like to debug, such as looking at the values of the register holding the raw camera buffer.
I can write standalone C code and launch it with GDB; however, a child process will crash the software, as only one thread is allowed to open a connection to the camera at a time. If I don't set breakpoints within the code that interacts with the device, all is fine. But when I try to stop the program, say, after acquiring the image buffer but before copying the data into the MATLAB output, the child process spawned by the debugger causes everything to lock up.
Anyone know how I might address this?
Edit: "Unreliable" is not a good wording. Basically, I retrieve an image buffer from the camera (which is dynamically allocated because the image bit depth is variable). This array is created/destroyed with mxMalloc and mxDestroyArray, which works okay if I have MATLAB_MEM_MGR enabled. This is part of what I would like to debug inside the MEX. The other part is comparing the raw byte values of the image buffer before returning to MATLAB.
Additional Clarification:
The error I get in GDB is actually a GenICam error for RESOURCE_IN_USE. My intuition is that, because the parent process hasn't released the camera resources, the child thread that gets started causes the issue, if that makes sense.
On Windows, it seems somewhat dangerous to switch from VC++ to MinGW-w64 for a MEX, as you can easily wind up with the classic Windows bug of having multiple copies of libc. See the warning here, "Do not link to library files compiled with non-MinGW compilers". A MEX built with 'gcc' will pull in malloc() and free() from msvcrt.dll, whereas any DLLs you link against that were built with 'cl' will pull in ucrtbase.dll instead. That could easily lead to crashes, if for instance gcc-compiled code calls free() on a block allocated using malloc() in cl-compiled code.
The gdb.exe installed by MATLAB does seem to behave somewhat strangely, especially when I press Control-C. It doesn't like Cygwin terminals either. It does spawn a weird 'gdborig' helper process. Finally, I kept seeing a Microsoft A/V tool, mpcmdrun, firing off as I was playing with it. You might try the newer gdb installed by MSYS2 (although I may have seen A/V there as well).
Indeed, some random child process probably inherited the device connection handle, causing the lock-up. But that child process might be something else, and not the extra one created by that silly old version of gdb (7.11.1).
An executable is loaded and run in WinDbg
It loads modules it needs at certain addresses
Breakpoints set/traces retrieved in this session depend on these addresses
When another session is started for the same executable (either because the code execution path changed the DLL dependency order, or because of some nondeterministic loader behavior?), the modules are now loaded at different addresses.
It would have been helpful if there was a way to instruct WinDbg/the loader to load the not-yet-loaded modules at given addresses. This would make certain scripts/text-comparisons much easier.
Yes, I do realize that, for example, setting breakpoints relative to symbol names should be preferred over using fixed addresses, but being able to "reproduce" a reference debugging environment definitely has certain advantages.
Assuming we're dealing with 3rd party DLLs (that I cannot recompile with predefined loading addresses), is there a way to do this?
I was so happy to see that the .reload command has an address parameter, which looked like it would do exactly what I'm asking. However, even though that command would load the modules, when the program is continued (and the actual DLL load is needed), it would still go ahead and load another copy(?) of the same module, and give a warning like:
WARNING: moduleX_1be0000 overlaps moduleX
So it didn't really work like I expected, thus this question!
WinDbg does not load modules (DLLs). The modules are loaded by the executable.
The ld and .reload commands of WinDbg do not load modules, they load symbol information (PDB files).
The process of changing the address of a module is called rebasing. It happens if the base address is not available any more, e.g. in use by a heap already. In that case, you cannot prevent rebasing at all.
One thing that might help is disabling ASLR (address space layout randomization). You can change that setting in a DLL or EXE. It's a flag in the image's PE/COFF headers: the DYNAMIC_BASE bit in the DllCharacteristics field of the optional header.
On Windows 7, there were ways to disable ASLR completely, but it's not recommended to change that setting on a per-system basis just to help you debug a single process.
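To see whether a given DLL or EXE opts in to ASLR at all, the flag can be read straight from the file. The sketch below is an illustration rather than an official tool; the function names are hypothetical. It assumes the standard PE layout: the offset of the "PE\0\0" signature is stored at 0x3C, the 20-byte COFF file header follows the signature, and DllCharacteristics sits 70 bytes into the optional header for both PE32 and PE32+. The DYNAMIC_BASE bit is 0x0040.

```python
import struct

IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE = 0x0040


def dll_characteristics(image):
    """Return the DllCharacteristics field from the raw bytes of a PE image."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew: file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # Skip the 4-byte signature and the 20-byte COFF file header.
    optional_header = pe_offset + 4 + 20
    (magic,) = struct.unpack_from("<H", image, optional_header)
    if magic not in (0x10B, 0x20B):  # PE32 or PE32+
        raise ValueError("unexpected optional header magic")
    (flags,) = struct.unpack_from("<H", image, optional_header + 70)
    return flags


def aslr_enabled(image):
    """True if the image is marked as relocatable at load time (ASLR opt-in)."""
    return bool(dll_characteristics(image) & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE)
```

For a real file, something like aslr_enabled(open("moduleX.dll", "rb").read()) would report the current setting (the file name is a placeholder). This only reads the flag; clearing it in a third-party binary also invalidates any signature on the file.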
Another option would be to use rebase.exe of the Windows SDK and change the base address to a virtual address that you think is more likely to be free at the time the DLL is loaded. I never did that myself, but the rebase help says:
If you want to rebase to a fixed address (ala QFE)
use the ##files.txt format where files.txt contains
address/size combos in addition to the filename
so, it sounds possible to define your own address.
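If the modules still end up at different addresses, one possible workaround for the text-comparison use case is to rewrite absolute addresses as module+offset before comparing traces. This is just a sketch under assumptions, not a WinDbg feature: the module table (name, base address, size) is assumed to have been collected separately for each session, for example from the output of lm, and make_normalizer is a made-up helper name.

```python
import bisect


def make_normalizer(modules):
    """Build a function that rewrites addresses as 'module+0xoffset'.

    modules: iterable of (name, base, size) tuples for one debugging session.
    """
    ordered = sorted(modules, key=lambda m: m[1])
    bases = [m[1] for m in ordered]

    def normalize(address):
        # Find the last module whose base is <= address.
        i = bisect.bisect_right(bases, address) - 1
        if i >= 0:
            name, base, size = ordered[i]
            if address < base + size:
                return f"{name}+0x{address - base:x}"
        # Not inside any known module: leave the address as-is.
        return f"0x{address:x}"

    return normalize
```

With one normalizer per session, the same code location yields the same string in both sessions even when the module was rebased, so the traces can be diffed as text.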
I am working through the tutorial files included with the ACT-R Standalone Windows distribution. This isn't part of any academic assignment; I'm working on this to learn cognitive modeling and writing production systems. I am using Lispbox, an EMACS-SLIME-LISP bundle, to write my cognitive models. The distro and Lispbox reside on my flash drive. Finally, the distro uses Clozure Common Lisp.
The problem is that whenever I try to reload a model after making changes, ACT-R gives me this error:
Error Reloading:
#|warning: no load file recorded |#
#|warning: cannot use reload |#
It only does this for my unit 2 assignment model. Not any other model, including the one I have written in unit 1.
Now this is a big issue for me - instead of simply pressing "reload" on ACT-R's GUI, I'm forced to close ACT-R entirely and open it again every time I want to reload the model.
I'm thinking this is a problem with EMACS. I have tried reinstalling ACT-R, and deleting any .lisp~ files or anything else that Emacs has saved in addition to the file I wrote. I still get this error.
Could you please help me understand what's going on and how I can fix this if it ever arises again in the future? I would like to get back to working on my assignment as soon as possible.
I have emailed the creator of ACT-R; he told me that I must include the statement
(clear-all)
at the beginning of every file, so the software uses the most up-to-date file when reloading.
As background: today, my GWT hosted mode runs mysteriously slowed down to the point of being virtually unusable. Whenever I pause the application, the relevant threads (the main thread, the code server, etc.) are waiting on some native file I/O method. After scratching my head for a while, I tried to clean up my hard disk a bit. Then I discovered a 4-gigabyte file named gwt7155307955598297091byte-cache in my user's Temp folder. I wonder what this file is good for, and what will happen if I delete it completely. Will I have a performance issue the next time I start dev mode, waiting for the "byte-cache" to be recreated?
Looking at the gwt source code, it says it's "A global shared Disk cache", used by a linker (com.google.gwt.core.ext.linker package) and compiler (com.google.gwt.dev.javac package).
I am just curious about this.
I had a network folder open on one computer, viewing the files in the folder. From another computer I opened the same folder on the network and deleted a file. On the first computer the deleted file immediately disappeared from the list.
The only way that I can think of how it knows that is that it is constantly checking the contents of the open folder. But that sounds like it would waste a lot of resources to do, but I cannot think of any other way it could do that. So I was just wondering...how does that work?
Thanks.
It's probably a push notification. Rather than the client computer constantly checking, the server sends a message to the client when a change is made.
You never specified what platform you're interested in. In general, the only thing that is portable is polling to see when a file or directory has been updated. Polling once a second or so is generally not too expensive, though over a network file system it may be too much.
Various platforms offer a variety of solutions for being notified when filesystems change. Modern versions of Linux provide inotify. Mac OS X provides the FSEvents system. On Windows there is a directory change notification system.
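For the portable fallback, polling boils down to taking a snapshot of the directory and comparing it with the previous one. Here is a minimal sketch of that idea; the function names are made up for illustration, and it uses modification time and size as a cheap (not foolproof) change indicator.

```python
import os


def snapshot(directory):
    """Map each entry name in the directory to (mtime in ns, size)."""
    entries = {}
    with os.scandir(directory) as it:
        for entry in it:
            try:
                st = entry.stat(follow_symlinks=False)
            except FileNotFoundError:
                # Entry vanished between listing and stat; treat as absent.
                continue
            entries[entry.name] = (st.st_mtime_ns, st.st_size)
    return entries


def diff_snapshots(old, new):
    """Return (added, removed, changed) name lists between two snapshots."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(n for n in old.keys() & new.keys() if old[n] != new[n])
    return added, removed, changed
```

A poller would call snapshot() in a loop with a sleep of about a second between iterations and report whatever diff_snapshots() returns. Each poll costs one directory listing plus one stat per entry, which is why this gets expensive over a network file system.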