How can I debug a deadlocked rippled? - rippled

My dogfood machine hit a deadlock earlier this morning in a debug build. It's at a breakpoint in gdb now. What is my best next step before I kill it?

Turn off pagination: set pagination off
Turn on logging: set logging on
Get a stack trace: thread apply all bt
Get a core dump in case we want more info: generate-core-file
Confirm the core file and log file are reasonable (the core file should be binary, the log file should be text, both should be non-empty) before exiting from gdb.
Upload the log file to a Gist so we can all see it.
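The "confirm the files are reasonable" step can be scripted. This is a minimal sketch, assuming a Linux host where gdb writes an ELF core file; the function name check_gdb_artifacts is made up for illustration, and the NUL-byte test is only a rough heuristic for "is this text".

```python
import os

ELF_MAGIC = b"\x7fELF"  # first four bytes of an ELF file (assumption: Linux core files)


def check_gdb_artifacts(core_path, log_path):
    """Return a list of problems found; an empty list means both files look sane."""
    problems = []
    for label, path in (("core", core_path), ("log", log_path)):
        if not os.path.isfile(path):
            problems.append(f"{label} file missing: {path}")
        elif os.path.getsize(path) == 0:
            problems.append(f"{label} file is empty: {path}")
    if problems:
        return problems

    # The core file should be binary: check for the ELF magic at the start.
    with open(core_path, "rb") as f:
        if f.read(4) != ELF_MAGIC:
            problems.append("core file does not start with the ELF magic bytes")

    # The log file should be text: a NUL byte in the first 4 KiB suggests it is not.
    with open(log_path, "rb") as f:
        sample = f.read(4096)
    if b"\x00" in sample:
        problems.append("log file contains NUL bytes, so it is probably not text")
    return problems
```

By default gdb logs to gdb.txt and generate-core-file writes core.<pid> in the current directory, so a call would look like check_gdb_artifacts("core.12345", "gdb.txt"), where 12345 stands in for the real process ID.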

Related

Unable to obtain Proper Stack Trace after running the dump file with WinDbg

An exception thrown in Production is crashing the w3wp process. To find the faulting code, we configured Debug Diag to create a dump file when the exception occurs. We then open the dump file in WinDbg to obtain the stack trace, but this is what we see after opening the dump file and running the required commands.
As you can see in the image above, WinDbg does not give the stack trace after the commands are run. I'm not sure what I'm missing.
UPDATE
After running a command twice as suggested in the comments, I'm able to get the stack trace. But the stack does not seem to point to any faulting code of ours; instead it shows a long list of underlying framework calls. Below is a snapshot of the start of the stack. I'm not sure how to identify the error. Any suggestions, or should I open a separate question for this?

Working with WinDbg, save .dmp file

I have only just started working with WinDbg, and I saw a video where someone attached WinDbg to a process and then saved a dump file with the command ".dump /f C:\example\mydump.dmp".
What do ".dump" and "/f" mean? Thank you, and sorry for my English.
Refer to the docs
The .dump command produces a user-mode or kernel-mode crash dump file; with the /f switch it writes a complete memory dump to the given location.
However, personally I always use the flag /ma for user-mode dumps as this has more info (and produces a larger memory dump).
The dump is essentially memory (either the entire memory for kernel or your process for user mode) and depending on the flags you get more info such as state, handles and other info that help diagnose the problem.
For a more complete explanation you can read these links
Kernel mode dumps
user mode dumps

agetty log file location

On CentOS 6.2, while trying to get the kernel log redirected to the serial console, I came across an issue where agetty seems to be respawning every few keypresses.
That is, I get a login prompt in the middle of typing (after logging in).
In order to investigate the issue further, I'm looking for the location of agetty logs, but to no avail. Where and how can I see log messages for respawned agetty process?
The "diagnostics" section of the "agetty" command manpage states:
Depending on how the program was configured, all diagnostics are written to
the console device or reported via the syslog(3) facility. Error messages are
produced if the port argument does not specify a terminal device; if there is
no utmp entry for the current process (System V only); and so on.
By default the syslog facility writes to the "/var/log/messages" file, but it can be configured to write to another file by editing its configuration file "/etc/syslog.conf".
Finally, if the error you get is "respawning too fast", you should check your "/etc/inittab" file, as described here.

STM32 GDB/OpenOCD Commands and Initialization for Flash and Ram Debugging

I am looking for assistance with the proper GDB / OpenOCD initialization and run commands (external tools) to use within Eclipse for flash and RAM debugging, as well as any modifications or additions needed in a makefile for flash vs RAM builds for this MCU, if this matters of course.
MCU: STM32F103VET6
I am using Eclipse Helios with Zylin Embedded CDT, Yagarto Tools and Bins, OpenOCD 0.4, and have an Olimex ARM-USB-OCD JTAG adapter.
I have already configured the ARM-USB-OCD and added it as an external tool in Eclipse. For initializing OpenOCD I used the following command in Eclipse. The board config file references the stm32 MCU:
openocd -f interface/olimex-arm-usb-ocd-h.cfg -f board/stm32f10x_128k_eval.cfg
When I run this within Eclipse everything appears to be working (GDB Interface, OpenOCD finds the MCU, etc). I can also telnet into OpenOCD and run commands.
So, I am stuck on the next part; initialization and commands for flash and RAM debugging, as well as erasing flash.
I read through several tutorials, and scoured the net, but have not been able to find anything particular to this processor. I am new to this, so I might not be recognizing an equivalent product for an example.
I'm working with the same tool chain to program and debug a STM32F107 board. Following are my observations to get an STM32Fxxx chip programmed and debugged under this toolchain.
Initial Starting Point
So at this point you've got a working OpenOCD to ARM-USB-OCD connection and so you should be all set on that end. Now the work is on getting Eclipse/Zylin/Yagarto GDB combination to properly talk to the STM32Fxxx through the OpenOCD/Olimex connection. One thing to keep in mind is that all the OpenOCD commands to issue are the run mode commands. The configuration scripts and command-line options to invoke the OpenOCD server are configuration mode commands. Once you issue the init command then the server enters run mode which opens up the set of commands you'll need next. You've probably done it somewhere else but I tack on a '-c "init"' option when I call the OpenOCD server like so:
openocd -f /path to scripts/olimex-arm-usb-ocd-h.cfg -f /path to targets/stm32f107.cfg -c "init"
The following commands I issue next are done by the Eclipse Debug Configurations dialogue. Under the Zylin Embedded debug (Native) section, I create a new configuration, give it a name, Project (optional), and absolute path to the binary that I want to program. Under the Debugger tab I set the debugger to Embedded GDB, point to the Yagarto GDB binary path, don't set a GDB command file, set GDB command set to Standard, and the protocol to mi.
The Commands Tab - Connect GDB to OpenOCD
So the next tab is the Commands tab and that's where the meat of the issue lies. You have two spaces Initialize and Run. Not sure exactly what the difference is except to guess that they occur pre- and post-invocation of GDB. Either way I haven't noticed a difference in how my commands are run.
But anyway, following the examples I found on the net, I filled the Initialize box with the following commands:
set remote hardware-breakpoint-limit 6
set remote hardware-watchpoint-limit 4
target remote localhost:3333
monitor halt
monitor poll
The first two lines tell GDB how many hardware breakpoints and watchpoints you have. OpenOCD Manual Section 20.3 says GDB can't query for that information, so I tell it myself. The next line tells GDB to connect to the remote target on localhost, port 3333. The last two lines are monitor commands, which tell GDB to pass the command on to the target without taking any action itself. In this case the target is OpenOCD: first I give it the command halt, then with poll I tell OpenOCD to switch to asynchronous mode of operation. As some of the following operations take a while, it's useful not to have OpenOCD block and wait for every operation.
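If you keep more than one debug configuration, it can help to generate the Initialize commands rather than retype them. A small sketch: the helper name gdb_init_commands and its parameters are made up for illustration, and the defaults are simply the values used in this answer. GDB expects both limit settings in the fully hyphenated spelling shown here.

```python
def gdb_init_commands(host="localhost", port=3333, hw_breakpoints=6, hw_watchpoints=4):
    """Build the GDB commands for the Initialize box, one command per list entry."""
    return [
        # GDB cannot query these limits from the target, so they are stated explicitly.
        f"set remote hardware-breakpoint-limit {hw_breakpoints}",
        f"set remote hardware-watchpoint-limit {hw_watchpoints}",
        # Connect to the GDB server that OpenOCD exposes.
        f"target remote {host}:{port}",
        # 'monitor' passes the rest of the line through to OpenOCD.
        "monitor halt",
        "monitor poll",
    ]


print("\n".join(gdb_init_commands()))
```

The printed text can be pasted straight into the Initialize box.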
Sidenote #1: If you're ever in doubt about the state of GDB or OpenOCD then you can use the Eclipse debug console to send commands to GDB or OpenOCD (via GDB monitor commands) after invoking this debug configuration.
The Commands Tab - Setting up the User Flash
Next are commands I give in the Run commands section:
monitor flash probe 0
monitor flash protect 0 0 127 off
monitor reset halt
monitor stm32x mass_erase 0
monitor flash write_image STM3210CTest/test_rom.elf
monitor flash protect 0 0 127 on
disconnect
target remote localhost:3333
monitor soft_reset_halt
to be explained in the following sections...
Setting up Access to User Flash Memory
First I issue an OpenOCD query to see if it can find the flash module and report the proper address. If it responds that it found the flash at address 0x08000000 then we're good. The 0 at the end specifies to get information about flash bank 0.
Sidenote #2: The STM32Fxxx part-specific data sheets have a memory map in section 4. Very useful to keep on hand as you work with the chip. Also as everything is accessed as a memory address, you'll come to know this layout like the back of your hand after a little programming time!
So after confirming that the flash has been properly configured we invoke the command to turn off write protection to the flash bank. PM0075 describes everything you need to know about programming the flash memory. What you need to know for this command is the flash bank, starting sector, ending sector, and whether to enable or disable write protection. The flash bank is defined in the configuration files you passed to OpenOCD and was confirmed by the previous command. Since I want to disable protection for the entire flash space I specify sectors 0 to 127. PM0075 explains how I got that number as it refers to how the flash memory is organized into 2KB pages for my (and your) device. My device has 256KB of flash so that means I have 128 pages. Your device has 512KB of flash so you'll have 256 pages. To confirm that your device's write-protection has been disabled properly, you can check the FLASH_WRPR register at address 0x40022020 using the OpenOCD command:
monitor mdw 0x40022020
The resulting word that it prints will be 0xffffffff which means all pages have their write protection disabled. 0x00000000 means all pages have write protection enabled.
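The sector arithmetic and the two FLASH_WRPR readings above can be written down as a short sketch. It assumes the 2KB page size quoted from PM0075; the function names are made up for illustration, and the code does not decode which individual pages a partly set word protects.

```python
PAGE_SIZE_KB = 2  # page size quoted from PM0075 for these devices (assumption)


def last_flash_sector(flash_size_kb, page_size_kb=PAGE_SIZE_KB):
    """Return the highest sector index to pass to 'flash protect 0 0 <last> off'."""
    pages, remainder = divmod(flash_size_kb, page_size_kb)
    if remainder:
        raise ValueError("flash size is not a whole number of pages")
    return pages - 1  # sectors are numbered from 0


def describe_wrpr(word):
    """Classify the 32-bit value that 'mdw 0x40022020' prints."""
    word &= 0xFFFFFFFF
    if word == 0xFFFFFFFF:
        return "no pages write-protected"
    if word == 0x00000000:
        return "all pages write-protected"
    return "some pages write-protected"
```

For the 256KB part, last_flash_sector(256) gives 127, matching the "0 127" in the protect command; for a 512KB part it gives 255.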
Sidenote #3: On the subject of the memory commands, I bricked my chip twice while messing with the option bytes in the block starting at address 0x1ffff800. The first time I set the read protection on the flash (kind of hard to figure out what you're doing if you do that); the second time I enabled the hardware watchdog, which prevented me from doing anything afterwards since the watchdog kept firing off! Fixed it by using the OpenOCD memory access commands. Moral of the story is: With great power comes great responsibility.... Or another take is that if I shoot myself in the foot I can still fix things via JTAG.
Sidenote #4: One thing that'll happen if you try to write to protected flash memory is the FLASH_SR:WRPRTERR bit will be set. OpenOCD will report a more user-friendly error message.
Erasing the Flash
So after disabling the write protection, we need to erase the memory that we want to program. I do a mass erase, which erases everything; you also have the option to erase by sector or address (I think). Either way you need to erase before programming, as the hardware checks for erasure before allowing a write to occur. If the PGERR bit in FLASH_SR (0x4002200c) ever gets set during programming then you know you haven't erased that chunk of memory yet.
Sidenote #5: Erasing a bit in flash memory means setting it to 1.
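To make the erase-before-write rule concrete, here is a toy model in Python. It is a simplification for illustration only, not how OpenOCD or the flash controller is implemented: the class name ToyFlash and its pgerr attribute are made up, with pgerr standing in for the FLASH_SR:PGERR flag.

```python
ERASED = 0xFFFF  # an erased half-word has every bit set to 1


class ToyFlash:
    """Toy model of half-word flash: erase sets bits to 1, programming needs an erased cell."""

    def __init__(self, half_words):
        self.cells = [ERASED] * half_words
        self.pgerr = False  # stands in for the FLASH_SR:PGERR flag

    def mass_erase(self):
        """Set every half-word back to the erased value."""
        self.cells = [ERASED] * len(self.cells)

    def program(self, index, value):
        """Write one half-word; refuse and set pgerr if the cell is not erased."""
        if self.cells[index] != ERASED:
            self.pgerr = True
            return False
        self.cells[index] = value & 0xFFFF
        return True
```

Programming the same cell twice without an erase in between fails on the second attempt and sets pgerr, which mirrors the symptom described above.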
Programming Your Binary
The next two lines after erasure write the binary image to the flash and re-enable the write protection. There isn't much more to say that isn't covered by PM0075. Basically any error that occurs when you issue flash write_image is probably related to the flash protection not being disabled. It's probably NOT OpenOCD, though if you're curious you can enable the debug output and follow what it does.
GDB Debugging
So finally after programming I disconnect GDB from the remote connection and then reconnect it to the target, do a soft-reset, and my GDB is now ready to debug. This last part I just figured out last night as I was trying to figure out why, after programming, GDB wouldn't properly stop at main() after reset. It kept going off into the weeds and blowing up.
My current thinking and from what I read in the OpenOCD and GDB manuals is that the remote connection is, first and foremost, meant to be used between GDB and a target that has already been configured and running. Well I'm using GDB to configure before I run so I think the symbol table or some other important info gets messed up during the programming. The OpenOCD manual says that the server automatically reports the memory and symbols when GDB connects but all that info probably becomes invalid when the chip gets programmed. Disconnecting and reconnecting I think refreshes the info GDB needs to debug properly. So that has led me to create another Debug Configuration, this one just connects and resets the target since I don't necessarily need to program the chip every time I want to use GDB.
Whew! Done! Kind of long but this took me 3 weekends to figure out so isn't too terribly bad I think...
Final sidenote: During my time debugging I found the OpenOCD debug output invaluable for understanding what OpenOCD was doing under the covers. To program an STM32x chip you need to unlock the flash registers, flip the right bits, and can only write a half-word at a time. For a while I was questioning whether OpenOCD was doing this properly, but after looking through the OpenOCD debug output and comparing it against the PM0075 instructions, I was able to confirm that it did indeed follow the proper steps for each operation. I also found I was duplicating steps that OpenOCD was already doing, so I was able to cut out instructions that weren't helping! So moral of the story: Debug output is your friend!
I struggled getting JLink to work with a STM3240XX and found a statement in the JLink GDB server documentation saying that after loading flash you must issue a "target reset":
"When debugging in flash the stack pointer and the PC are set automatically when the target is reset after the flash download. Without reset after download, the stack pointer and the PC need to be initialized correctly, typically in the .gdbinit file."
When I added a "target reset" in the Run box of the debugger Setup of Eclipse, suddenly everything worked. I did not have this problem with a Kinetis K60.
The document also explains how to manually set the stack pointer and pc directly if you don't want to issue a reset. It may not be the disconnect/connect that solves the problem but the reset.
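For reference, on Cortex-M parts the values that a reset loads come from the first two words of the vector table: word 0 is the initial stack pointer and word 1 is the reset handler address. The sketch below reads them from a raw binary image (not an ELF file); the function name initial_sp_and_pc is made up for illustration, and it assumes the image starts with the vector table.

```python
import struct


def initial_sp_and_pc(image):
    """Return (stack pointer, reset handler address) from the start of a Cortex-M image."""
    if len(image) < 8:
        raise ValueError("image is too short to hold the first two vector table entries")
    # Word 0 is the initial stack pointer, word 1 is the reset vector (little-endian).
    sp, reset_vector = struct.unpack_from("<II", image, 0)
    return sp, reset_vector
```

These are the two values you would hand to GDB with set $sp = ... and set $pc = ... if you skip the reset. The reset vector normally has bit 0 set to mark Thumb state.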
What I use after the last line in the Commands tab 'Run' commands is:
symbol-file STM3210CTest/test_rom.elf
thbreak main
continue
The thbreak main line sets a temporary hardware breakpoint at main, which is what makes gdb stop there.

DebugView Error

I'm working with Windows 7 x64 and DebugView 4.76.0.0.
Logs aren't shown in DebugView.
I'm trying to write logs with Debug.WriteLine("Text"); and see nothing.
I can see that it's connected to my computer.
When I use DebugView V4.64.0.0 I get an error message saying that it is already connected to another instance of DebugView, but I've checked and there isn't any other.
What can I do or check?
BTW,
I can see the log in the output window.
Regards,
Eitan Gabay
To check if you really have another instance of debugview running, open up your task manager, and select "show processes from all users". Make sure that only one debugview is running.
When debugging through Visual Studio, Visual Studio actually competes against DebugView. If you were to compile your executable, and run it externally, you will see your log messages printed in DebugView.
One other thing that people sometimes overlook is that Debug.Write statements are excluded if a program is compiled for Release. However, you can still write to the trace if you use Trace.Write instead of Debug.Write.
All messages that you print go to a shared section of memory called DBWIN_BUFFER. It is important to realize that each Windows session has its own DBWIN_BUFFER. Whenever DebugView detects that you are not in session 0, it will provide a "Capture Global" option. If your program is running as a Windows service, then you will need to enable Capture Global (unless you are already in Session 0, which is only possible in Windows XP).
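For the curious, the shared section is a 4096-byte block: a 4-byte process ID followed by the NUL-terminated message text. A minimal sketch of decoding a snapshot of it, written in Python for illustration (the function name parse_dbwin_buffer is made up, and it works on bytes you have already copied out rather than opening the section itself):

```python
import struct

DBWIN_BUFFER_SIZE = 4096  # total size of the shared section, in bytes


def parse_dbwin_buffer(raw):
    """Split a snapshot of the buffer into (process id, message text)."""
    if len(raw) < 4:
        raise ValueError("buffer too short to contain a process id")
    (pid,) = struct.unpack_from("<I", raw, 0)  # first 4 bytes: id of the writing process
    payload = raw[4:DBWIN_BUFFER_SIZE]
    text = payload.split(b"\x00", 1)[0]  # the message ends at the first NUL byte
    # Decoding as ASCII with replacement is a simplification for this sketch.
    return pid, text.decode("ascii", errors="replace")
```

A viewer such as DebugView reads this block each time a writer signals that new data is ready, which is why only one viewer per session can capture it at a time.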