How to pinpoint where in the program a crash happened using .dmp and WinDbg? - windbg

I have a huge application (made in PowerBuilder) that crashes every once in a while, so it is hard to reproduce this error. We have it set up so that when a crash like this occurs, we receive a .dmp file.
I used WinDbg to analyze my .dmp file with the command !analyze -v. From this I can deduce that the error that occurred was an Access Violation C0000005. Based on the [0] and [1] parameters, it attempted to dereference a null pointer.
WinDbg also showed me STACK_TEXT consisting of around 30 lines, but I am not sure how to read it. From what I have seen I need to use some sort of symbols.
First line of my STACK_TEXT is this:
00000000`00efca10 00000000`75d7fa46 : 00000000`10df1ae0 00000000`0dd62828 00000000`04970000 00000000`10e00388 : pbvm!ob_get_runtime_class+0xad
From this, my goal is to analyze this file to figure out where exactly in the program this error happened or which function it was in. Is this something I will be able to find after further analyzing the stack trace?
How can I pinpoint where in the program a crash happened using .dmp and WinDbg so I can fix my code?

If you analyze a crash dump with !analyze -v, the lines after STACK_TEXT are the stack trace. The output is equivalent to kb, provided you have set the correct thread and context.
The columns of the kb output are:
Child EBP
Return address
First 4 values on the stack
Symbol
The backticks ` tell you that you are running in 64 bit and they split the 64 bit values in the middle.
On 32 bit, the first 4 parameters on the stack were often equivalent to the first 4 parameters to the function, depending on the calling convention.
On 64 bit, the stack is not so relevant any more, because with the 64 bit calling convention, parameters are passed via registers. Therefore you can probably ignore those values.
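As a rough illustration of the column layout described above, here is a minimal Python sketch that splits the STACK_TEXT line from the question into its parts. The function name parse_stack_line and the dictionary keys are made up for this example, and it assumes the 64-bit layout in which " : " separates the column groups.

```python
def parse_stack_line(line):
    """Split one 64-bit STACK_TEXT line into its columns (illustrative only)."""
    # The symbol is everything after the last " : " separator.
    left, _, symbol = line.rpartition(" : ")
    # The first group holds the frame pointer column and the return address.
    addresses, _, values = left.partition(" : ")

    def to_int(text):
        # The backtick only splits the 64-bit value for readability.
        return int(text.replace("`", ""), 16)

    child_frame, return_address = (to_int(t) for t in addresses.split())
    return {
        "child_frame": child_frame,
        "return_address": return_address,
        "stack_values": [to_int(t) for t in values.split()],
        "symbol": symbol.strip(),
    }


line = ("00000000`00efca10 00000000`75d7fa46 : 00000000`10df1ae0 "
        "00000000`0dd62828 00000000`04970000 00000000`10e00388 : "
        "pbvm!ob_get_runtime_class+0xad")
frame = parse_stack_line(line)
print(frame["symbol"])
```

For crash analysis, usually only the symbol and perhaps the return address matter.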
The interesting part is the symbol like pbvm!ob_get_runtime_class+0xad.
In front of ! is the module name, typically a DLL or EXE name. Look for something that you built. After the ! and before the + is a method name. After the + is the offset in bytes from the beginning of the function.
As long as you don't have functions with thousands of lines of code, that number should be small, like < 0x200. If the number is larger than that, it typically means that you don't have correct symbols. In that case, the method name is no longer reliable, since it's probably just the last known (the last exported) method name and a faaaar way from there, so don't trust it.
In case of pbvm!ob_get_runtime_class+0xad, pbvm is the DLL name, ob_get_runtime_class is the method name and +0xad is the offset within the method where the instruction pointer is.
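The module!function+offset convention and the "large offset means bad symbols" rule of thumb can be sketched in a few lines of Python. This is only an illustration: split_symbol and its max_plausible_offset parameter are hypothetical names, and the 0x200 threshold is just the rough figure mentioned above, not a hard limit.

```python
import re

# module ! function [+0xOFFSET]
SYMBOL_RE = re.compile(
    r"^(?P<module>[^!]+)!(?P<function>.+?)(?:\+0x(?P<offset>[0-9a-fA-F]+))?$"
)


def split_symbol(symbol, max_plausible_offset=0x200):
    """Split 'module!function+0xNN' and flag offsets that look too large."""
    match = SYMBOL_RE.match(symbol)
    if match is None:
        return None  # no module!function part, e.g. a bare address
    offset = int(match.group("offset") or "0", 16)
    return {
        "module": match.group("module"),
        "function": match.group("function"),
        "offset": offset,
        # Heuristic only: a big offset hints at missing or wrong symbols.
        "suspicious": offset >= max_plausible_offset,
    }


info = split_symbol("pbvm!ob_get_runtime_class+0xad")
print(info)
```

Here 0xad is well below the threshold, so the function name in this frame is plausibly correct.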
To me (not knowing anything about PowerBuilder), PBVM sounds like the PowerBuilder virtual machine DLL. So that's not your code, it's the code compiled by Sybase. You'd need to look further down the call stack to find the culprit code in your DLL.
After reading Wikipedia, it seems that PowerBuilder does not necessarily compile to native code, but to intermediate P-Code instead. In this case you're probably out of luck, since your code is never really on the call stack and you need a special debugger or a WinDbg extension (which might not exist, like for Java). Run it with the -pbdebug command line switch or compile it to native code and let it crash again.

Related

Fortran and Eclipse: Displaying text in console

I'm having a small difficulty with Fortran 90 and Eclipse. I installed the "Photran" plugin for Eclipse, and have managed to compile everything perfectly, and overall the program does what it has to do. The problem comes when displaying text in the Eclipse console. The code itself is not that important, since it does what it has to do; the issue is the output generation.
The piece of the code I'm having trouble with is the following:
subroutine main_program
write(*,*) "Program begins!"
<Program that takes ~5mins to run>
write(*,*) "Program ends!"
end subroutine main_program
Specifically, the problem is that in the console, the first message, "Program begins!", should be shown immediately, and after ~5 minutes it should show "Program ends!". Instead, both messages get displayed only after the program is done running, not while the program is executing.
I have used:
subroutine main_program
print*, "Program begins!"
<Program that takes ~5mins to run>
print*, "Program ends!"
end subroutine main_program
but it keeps on doing the same thing. I saw a "similar" post earlier (can't find the link though, sorry about that) but it was not really what I was looking for.
OK, here's the answer. Insert the statement
flush 6
after the first write statement to have its output sent immediately to the console. Insert it anywhere else you wish once you understand what it is doing.
It is obvious (to me) from the situation OP describes that the output is being buffered, that is the program issues a write statement and passes the output off to the operating system which does as it damn well pleases -- here it waits until the program ends before writing anything to the console. I guess that its buffering capabilities have some limits and if the program exceeded them the o/s would empty its buffers prior to program end.
Fortran now (since 2003 I think) provides a standard way of telling the o/s to actually flush the buffer to the output device -- the flush statement. In its simplest form flush takes only one argument, the unit number of the output channel to be flushed. I guessed that OP had unit 6 connected to stdout (aka *), since this is a near-universal default configuration, though not one guaranteed by the Fortran language standard.
I don't think that flush * is correct.
If you have a pre-2003 compiler then (a) for Backus' sake update and (b) it is likely that it supports a non-standard way to flush buffers; if memory serves gfortran used to provide a subroutine which would be called something like call flush(6).
There are other ways, outside Fortran, to tell the o/s to write to disk (or console or what have you) immediately. Look at the documentation for your o/s if you are interested in them.
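The buffering effect itself is not specific to Fortran. As an analogy only (this is Python, not the Fortran flush statement), the sketch below puts a buffer in front of a stand-in "console" and shows that nothing arrives until flush is called. RecordingSink is a hypothetical helper class invented for this illustration.

```python
import io


class RecordingSink(io.RawIOBase):
    """Hypothetical stand-in for the console: records what actually arrives."""

    def __init__(self):
        super().__init__()
        self.received = b""

    def writable(self):
        return True

    def write(self, data):
        self.received += bytes(data)
        return len(data)


sink = RecordingSink()
out = io.BufferedWriter(sink, buffer_size=4096)

out.write(b"Program begins!\n")   # goes into the buffer only
before_flush = sink.received      # nothing has reached the "console" yet

out.flush()                       # push the buffered bytes through
after_flush = sink.received

print(before_flush, after_flush)
```

The long-running computation in the question sits where the flush call is here: without the flush, the first message stays in the buffer until the program ends.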

BFX field too large for a data item increase -S

I am getting the above error when trying to run a script to produce a report. It is a pre-existing script that has been run successfully many times before. Research has told me that it is something to do with the stack size? I'm running 10.2B02 in WRQ Reflections. Can anyone tell me what this statement means and how I look up the value of my -S?
Thanks,
Paul
-s is a client startup parameter. You mention "Reflections" so you are probably using a character terminal session. The -s parameter is on the command line used to start Progress (which might be inside a script). If there is a -pf somefile.pf on the command line then it is inside that "parameter file". If it is not specified the default value is 40. The maximum value is limited by available memory but setting it in the hundreds or even in the thousands is not unheard of.
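To make the lookup concrete, here is a small Python sketch that scans the text of a command line or parameter file for -s and falls back to the default of 40 mentioned above. The function name and the sample parameter text are hypothetical, and the sketch assumes that # starts a comment in a .pf file and that parameters are whitespace-separated; check both against your own files.

```python
import shlex

DEFAULT_STACK_SIZE = 40  # default value quoted in the answer above


def stack_size_from_parameters(text):
    """Return the -s value found in startup parameter text, else the default."""
    # Assumption: '#' begins a comment and tokens are whitespace-separated.
    tokens = shlex.split(text, comments=True)
    value = DEFAULT_STACK_SIZE
    for name, argument in zip(tokens, tokens[1:]):
        # Case matters: lowercase -s is the stack size, uppercase -S is not.
        if name == "-s":
            value = int(argument)
    return value


sample_pf = "-db sports -S 2500\n# bigger stack for reports\n-s 200\n"
print(stack_size_from_parameters(sample_pf))
```

Note that the comparison is case-sensitive on purpose, since the error message refers to the lowercase -s parameter.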
You can also get the startup values by sending a SIGUSR1 to the _progres process that the session is running, i.e. kill -USR1 <pid>. That will (safely) create a "protrace.<pid>" file that includes startup parameters and a 4GL stack trace. The file will appear in either the current directory, the home directory or the temp-file directory (I forget which, just look for protrace*).
This error usually means that your code is manipulating a field that is too large. (Like the error says.) That might be for a lot of reasons.
One common possibility is string concatenation in a loop.
Or you might be calling lots of sub-procedures and passing parameters around.
If "nothing has changed" in the code then it probably just means that some data structure has grown slightly larger over time and increasing -s is really no big deal so long as it solves the problem.
If you keep having to increase it then it is more likely that you have some sort of coding issue. Maybe you're passing things by value that ought to be passed by reference, or maybe you have runaway recursion. Or something else. You'd need to provide a lot more detail to say for sure.
It is also possible (but unlikely) that you have a corrupt data record that appears to have a field in it that is too large. You could run "proutil dbName -C dbanalys" as an initial step to see if that is true.
Part of the error message is non-standard -- I'm not certain which log file it is coming from or how it got there (applications can write their own messages) but it seems that it might have something to do with trying to send an e-mail. So I'd be suspicious that either the list of recipients got too long or that the body of the e-mail is too large.

Fortran Input Files from Mac OS to XP

I recently got some Fortran code, which successfully ran on Mac OS. This code along with input files were later sent to me to get compiled. I precisely used the same code and the same input files but an error "array bounds exceeded" appeared. I am using CVF 6.6 on Windows XP.
I wanted to know the following things:
Is this a compiler or OS problem?
Shall I arrange a Mac OS to get them compiled?
After a lot of searching on the internet, I think the wise thing to do is to get my data "format free". But I don't know how to do that when my data is a time series with time in one column and voltage in the second.
The error message array bounds exceeded always (I think) indicates that your code has tried to access an array element outside the bounds of an array, for example element 25 in an array with 24 elements. This can only occur at run-time, and your compiler/run-time will only spot it if, when compiling, you set on the compiler option(s) for array bounds checking; your compiler documentation will tell you what those options are.
The error message should have been accompanied by some more information telling you where in the program the error occurred and the index of the out-of-bounds array access.
Given that your source code and your input data are identical, how could this have occurred? Since you have compiled the program on 2 different platforms, your compilations cannot have been identical; it is entirely possible that array bounds checking is disabled on your Mac and enabled on your Windows PC.
Fortran programs may execute apparently successfully despite making accesses to out-of-bounds array elements. If the memory address of array element 25 out of 24 holds a value which is meaningful and the address is within your program's space the computation is likely to continue. It is also likely to be useless, but you can go for many years before finding that out.
I suggest that you go back to the Mac, recompile with array bounds checking, and run again, see what happens.
It's also possible that the routines which read your file find a different number of values on XP and Mac; I suspect that can be caused by different line ending characters, even by whether or not the input file has a newline at the end. Check this too.
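To illustrate how line endings can change what a reader sees, here is a short Python sketch (Python rather than Fortran, purely to show the idea) that parses a two-column time/voltage series in a way that does not depend on the line-ending convention or on a trailing newline. The function name and the sample values are invented for this example.

```python
def read_time_series(raw):
    """Parse two whitespace-separated columns (time, voltage) from raw bytes."""
    samples = []
    # splitlines() treats LF, CRLF and a lone CR all as line breaks.
    for line in raw.decode("ascii").splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines in this sketch
        samples.append((float(fields[0]), float(fields[1])))
    return samples


unix_file = b"0.0 1.5\n0.1 1.7\n0.2 1.6\n"
dos_file = b"0.0 1.5\r\n0.1 1.7\r\n0.2 1.6"        # CRLF, no final newline
old_mac_file = b"0.0 1.5\r0.1 1.7\r0.2 1.6\r"      # lone CR line endings

for data in (unix_file, dos_file, old_mac_file):
    print(len(read_time_series(data)))
```

A reader that counts lines by looking for one specific terminator would report different record counts for these three inputs, which is one way an array sized on one platform ends up too small on another.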

Finding out the call site from hex representation

I'm trying to analyse a crash dump of MS BizTalk service, which is constantly consuming 100% CPU (and I assume that's because of our code :) ). I have a couple of dumps and the stack trace of the busiest threads looks similar - the only problem is, that the top of the stack seems to be missing symbols. It looks like this:
0x642`810b2fd0
So, the question is - how can I find out the module/function from this address? (or at least the module, so that I know what symbol file is missing).
lm in WinDbg dumps the list of loaded modules. In your case WinDbg does not find any module that occupies this address; otherwise it would have printed <module>+<offset>. Some libraries generate code dynamically; in that case the body of the function is placed on the heap and won't have any symbols or even a module associated with it. I know MCF at some point did this.
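The check that lm lets you do by eye can be sketched as a range lookup. The Python below is illustrative only: the module names and address ranges are invented, and it assumes each lm row gives a start address and an end address that lies one past the module's last byte.

```python
def parse_address(text):
    """Convert a WinDbg-style address such as 0x642`810b2fd0 to an integer."""
    return int(text.replace("`", ""), 16)


def find_module(address, modules):
    """Return the name of the module whose range contains address, or None."""
    for start, end, name in modules:
        if start <= address < end:
            return name
    return None


# Hypothetical rows, in the spirit of "start end module name" from lm.
modules = [
    (parse_address("00000000`77000000"), parse_address("00000000`77180000"), "moduleA"),
    (parse_address("000007fe`f0000000"), parse_address("000007fe`f0200000"), "moduleB"),
]

target = parse_address("0x642`810b2fd0")
print(find_module(target, modules))
```

A result of None corresponds to the situation described above: the address does not belong to any loaded module, which is consistent with dynamically generated code.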
I suggest you try to analyze the frames at the top of the stack that have symbols and try to find out what they might be doing.
Wish I could help more, but the only thing I can suggest is reading this cheat sheet of WinDbg commands. There is one command wt which has a list of params which could help with getting module information about that call site.
Let me know if this is any use for you.

Need help debugging a minidump with WinDbg

I've read a lot of similar questions, but I can't seem to find an answer to exactly what my problem is.
I've got a set of minidumps from a 32-bit application that was running on 64-bit Windows 2008. The 32-bit Visual Studio on my 32-Bit Vista Business wouldn't touch them at all, so I've been trying to open them in WinDbg.
I don't have the EXACT corresponding .pdb files (we only started saving them AFTER this particular release), but I have .pdbs built by the same machine with the same code. I also have access to the exact executable that created the minidumps.
I found a nifty little application called ChkMatch that can make .pdbs match an executable... the only difference (according to ChkMatch) was age, so I matched my newer .pdbs to the original executable.
However, when I load it in WinDbg, it still says that it is a "mismatched pdb"; then, since I had set .symopt+0x40, it tries to load them anyway. I then get the warning:
*** WARNING: Unable to verify checksum for myexe.exe
I ran !lmi myexe and saw that, indeed, the checksum of the executable was in fact zero. From poking around a bit, I've found that the executable should have been built with the /release flag to have a checksum. That's all well and good, but I can't exactly go back in time and rebuild (if I did though, I'd definitely save the original .pdbs :-P ).
Is there anything I can do here? Seems a little ridiculous I can't make things match here at least enough to get a call stack.
You don't need the checksum to get a call stack - this warning can be safely ignored.
To get the stack you need to issue the stack command (any variant of k).
If the minidumps are any good (i.e. describe an actual fault), you should first try the auto analysis !analyze -v, which will get you started.
Come back when you have exhausted your expertise :o)
If you're working with minidumps then you have to set your image path (Ctrl+I) to point to a location with the images in the dump. The trouble with minidumps is that they don't contain any code or data from the executables on the target, so you have to supply them yourself.
-scott