How can I solve the error "Can't switch processors on a single processor kernel triage dump" in WinDbg? - windbg

I have a mini dump generated with the default parameters described at Collecting User-Mode Dumps.
The dump was generated when the system was hanging through right CTRL+SCROLL LOCK+SCROLL LOCK as set in the following register keys:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\kbdhid\Parameters]
"CrashOnCtrlScroll"=dword:00000001
So the call stack that WinDbg shows me after the command 0: kd> !analyze -v is the one of the thread that was executing from kbdhid device driver.
When I tried to switch to a different processor I get the error:
0: kd> ~1
Can't switch processors on a single processor kernel triage dump
How can I solve this error?
What is a "single processor kernel triage dump"? If I search with Google I will get 3 or 4 results... no more, maybe someone from Microsoft could be of great help here :-).
Is there some particular value of CustomDumpFlags that I have to set? See MINIDUMP_TYPE enumeration.
I know that my system is multiprocessor and WinDbg confirms it:
0: kd> ~8
8 is not a valid processor number
0: kd> ~7
Can't switch processors on a single processor kernel triage dump

A Single Processor Kernel Dump or a kernel triage dump is a feature
where you can collect the kernel mode stack trace of an user mode process
on a machine that was not booted with /DEBUG on iirc available from vista+
you can also collect this dump using kdbgctrl
D:\>tasklist | grep -i edge
xxxxxxxxxxxxxxxxxxxxxx
MicrosoftEdgeCP.exe 12588 Console 5 41,892 K
MicrosoftEdgeCP.exe 9152 Console 5 1,49,064 K
xxxxxxxxxxxxxx
D:\>kdbgctrl -td 9152 edgy.dmp
Dump created in edgy.dmp, 1048564 bytes
D:\>file edgy.dmp
edgy.dmp: MS Windows 64bit crash dump, 1018708 pages
run !process -1 1f command to get the stack of all the threads for the current process
only one process kernel memory will be available in this dump
!process 0 0 wont work
it is not full kernel memory dump and may not be having information about any other processor stack aswell
run !cpuid only the info about 0 processor will be present in this dump
0: kd> !cpuid
CP F/M/S Manufacturer MHz
0 6,142,9 GenuineIntel 2304
Unable to get information for processor 1
Unable to get information for processor 2
Unable to get information for processor 3
0: kd>
or irql
0: kd> !irql 0
Debugger saved IRQL for processor 0x0 -- 0 (LOW_LEVEL)
0: kd> !irql 1
Cannot get PRCB address from processor 0x1
0: kd> !irql 2
Cannot get PRCB address from processor 0x2
0: kd> !irql 3
Cannot get PRCB address from processor 0x3
0: kd>

Related

How to count cache-misses in mmap-ed memory (using eBPF)?

I would like to get timeseries
t0, misses
...
tN, misses
where tN is a timestamp (second-resolution) and misses is a number of times the kernel made disk-IO for my PID to load missing page of the mmap()-ed memory region when process did access to that memory. Ok, maybe connection between disk-IO and memory-access is harder to track, lets assume my program can not do any disk-io with another (than assessing missing mmapped memory) reason. I THINK, I need to track something called node-load-misses in perf world.
Any ideas how eBPF can be used to collect such data? What probes should I use?
Tried to use perf record for similar purpose: I dislike how much data perf records. As I recall the try was like (also I dont remember how I parsed that output.data file):
perf record -p $PID -a -F 10 -e node-loads -e node-load-misses -o output.data
I thought eBPF could give some facility to implement such thing in less overhead way.
Loading of mmaped pages which are not present in memory is not hardware event like perf's cache-misses or node-loads or node-load-misses. When your program assess not present memory address, GPFault/pagefault exception is generated by hardware and it is handled in software by Linux kernel codes. For first access to anonymous memory physical page will be allocated and mapped for this virtual address; for access of mmaped file disk I/O will be initiated. There are two kinds of page faults in linux: minor and major, and disk I/O is major page fault.
You should try to use trace-cmd or ftrace or perf trace. Support of fault tracing was planned for perf tool in 2012, and patches were proposed in https://lwn.net/Articles/602658/
There is a tracepoint for page faults from userspace code, and this command prints some events with memory address of page fault:
echo 2^123456%2 | perf trace -e 'exceptions:page_fault_user' bc
With recent perf tool (https://mirrors.edge.kernel.org/pub/linux/kernel/tools/perf/) there is perf trace record which can record both mmap syscalls and page_fault_user into perf.data and perf script will print all events and they can be counted by some awk or python script.
Some useful links on perf and tracing: http://www.brendangregg.com/perf.html http://www.brendangregg.com/ebpf.html https://github.com/iovisor/bpftrace/blob/master/INSTALL.md
And some bcc tools may be used to trace disk I/O, like https://github.com/iovisor/bcc/blob/master/examples/tracing/disksnoop.py or https://github.com/brendangregg/perf-tools/blob/master/examples/iosnoop_example.txt
And for simple time-series stat you can use perf stat -I 1000 command with correct software events
perf stat -e cpu-clock,page-faults,minor-faults,major-faults -I 1000 ./program
...
# time counts unit events
1.000112251 413.59 msec cpu-clock # 0.414 CPUs utilized
1.000112251 5,361 page-faults # 0.013 M/sec
1.000112251 5,301 minor-faults # 0.013 M/sec
1.000112251 60 major-faults # 0.145 K/sec
2.000490561 16.32 msec cpu-clock # 0.016 CPUs utilized
2.000490561 1 page-faults # 0.005 K/sec
2.000490561 1 minor-faults # 0.005 K/sec
2.000490561 0 major-faults # 0.000 K/sec

Is there an equivalent register to Intel's MSR_SMI_COUNT on AMD architecture?

On recent Intel CPUs it's possible to count the number of SMIs that have occurred, by reading msr 0x34.
I have checked the manuals at -
https://developer.amd.com/resources/developer-guides-manuals/
for an equivalent register/function, without success.
AMD Zen specifies the LsSmiRx performance counter for System Management Interrupts (SMIs):
PMCx02B [SMIs Received] (Core::X86::Pmc::Core::LsSmiRx)
Counts the number of SMIs received.
(Open-Source
Register Reference
For AMD Family 17h Processors
Models 00h-2Fh. Rev 3.03, 2018, page 153)
On Linux, you can monitor it like this:
# perf stat -e ls_smi_rx -I 60000
This command prints each minute a count of all newly triggered SMIs aggregated over all CPUs.
That means for monitoring - unlike with the MSR_SMI_COUNT register available on Intel CPUs - you have to actively program a PMU register (to observe the LsSmiRx event).
NB: The above referenced AMD documentation confirms that AMD Zen doesn't support the SMI_COUNT MSR (0x34), since it isn't included in the list of available MSRs (in Chapter 2.1.10, page 77).
No, but SMI count is available as a PMC (performance counter) on AMD processors.

stm32 factory bootloader possibly overwritten with openocd?

tl;dr: flashed firmware to 0x00000000 instead of 0x08000000, am I lost?
Hello,
my device is based on a STM32F103CBTx which came with a proprietary firmware and had readout protection on.
I connect to it with a ST-Link v2 SWDIO and SWCLK connected to PA13 and PA14 and this command:
sudo openocd -f /usr/share/openocd/scripts/interface/stlink-v2.cfg -f /usr/share/openocd/scripts/target/stm32f1x.cfg
I don't remember how I removed flash protection, but it worked as the original firmware didn't work anymore. Then I created a simple hello world firmware which pulls up and down three gpios and flashed it. The gpios are pulled up and down in 700ms intervals.
After flashing, I can't connect with openocd anymore. I forgot to specify the offset, the manual says the offset defaults to 0 and as it worked, I suppose instead of the boot loader my shitty hello world is pulling up and down some random pins happily… Is this possible? All other threads I found say the boot loader is write protected.
This is the last contact I had:
> halt
halt
target halted due to debug-request, current mode: Handler HardFault
xPSR: 0x01000003 pc: 0xfffffffe msp: 0xffffffdc
> flash write_image erase fw.hex
flash write_image erase fw.hex
auto erase enabled
target halted due to breakpoint, current mode: Handler HardFault
xPSR: 0x61000003 pc: 0x2000003a msp: 0xffffffdc
wrote 4096 bytes from file fw.hex in 0.285697s (14.001 KiB/s)
> reset
reset
jtag status contains invalid mode value - communication failure
Polling target stm32f1x.cpu failed, trying to reexamine
Examination failed, GDB will be halted. Polling again in 100ms
Any directions highly appreciated.
Edit:
What I get now, also tried another st-link:
% sudo openocd -f /usr/share/openocd/scripts/interface/stlink-v2.cfg -f /usr/share/openocd/scripts/target/stm32f1x.cfg
Open On-Chip Debugger 0.10.0
Licensed under GNU GPL v2
For bug reports, read
http://openocd.org/doc/doxygen/bugs.html
Info : auto-selecting first available session transport "hla_swd". To override use 'transport select '.
Info : The selected transport took over low-level target control. The results might differ compared to plain JTAG/SWD
adapter speed: 1000 kHz
adapter_nsrst_delay: 100
none separate
Info : Unable to match requested speed 1000 kHz, using 950 kHz
Info : Unable to match requested speed 1000 kHz, using 950 kHz
Info : clock speed 950 kHz
Info : STLINK v2 JTAG v17 API v2 SWIM v4 VID 0x0483 PID 0x3748
Info : using stlink api v2
Info : Target voltage: 3.244356
Error: init mode failed (unable to connect to the target)
in procedure 'init'
in procedure 'ocd_bouncer'
flashed firmware to 0x00000000 instead of 0x08000000, am I lost?
No, it doesn't matter at all, they are the same.
After reset, the MCU loads the word at address 0 in SP, and the next one at address 4 in PC. The BOOT0 and BOOT1 pins control which memory gets mapped to 0x00000000. Usually, BOOT0 is tied low, and flash memory at 0x08000000 gets mirrored at 0x00000000.
instead of the boot loader my shitty hello world is pulling up and down some random pins happily… Is this possible? All other threads I found say the boot loader is write protected.
The factory bootloader is indeed write protected, openocd can't overwrite it.
However, your application could have reconfigured the SWD pins, by writing a wrong value in GPIOA->CRH or AFIO->MAPR, thereby preventing openocd from working. It's the most common cause of this problem.
Fortunately, there is a way to recover.
Connect under Reset
If the reset pin of the controller is held low for a while when openocd is started, the application is prevented from starting, and messing up the GPIO configuration.
Openocd can do this automatically, when
It's told to do so, the line reset_config srst_only srst_nogate is present somewhere in the configuration script.
The MCU reset pin is connected to the debugger hardware, pin 15 on an official ST-Link/V2.
Or you can do it manually, by whatever means your board provides. If you are lucky, it has a reset button, if not, you must find a way to somehow ground the MCU reset pin.
Pull the reset pin low
Start openocd
Wait until the Info : Target voltage line appears. Maybe a second longer.
Release the reset pin.
It requires a bit of trial and error, you'll get better with practice.
Then you can flash your improved application, which carefully avoids reconfiguring the SWD pins.

OpenOCD multiple STLinks

I need to be connect to 2 STM32s over 2 ST-Links at the same time. I found this issue described here.
However, solution doesn't work for me.
ST-Link ID1: 55FF6B067087534923182367
ST-Link ID2: 49FF6C064983574951291787
OpenOCD cfg file:
source [find interface/stlink-v2.cfg]
hla_serial "55FF6B067087534923182367"
source [find target/stm32f4x.cfg]
# use hardware reset, connect under reset
reset_config srst_only srst_nogate
I get:
$ openocd.exe -f stm32f4_fmboard.cfg
Open On-Chip Debugger 0.10.0
Licensed under GNU GPL v2
For bug reports, read
http://openocd.org/doc/doxygen/bugs.html
Info : auto-selecting first available session transport "hla_swd". To override use 'transport select <transport>'.
Info : The selected transport took over low-level target control. The results might differ compared to plain JTAG/SWD
adapter speed: 2000 kHz
adapter_nsrst_delay: 100
none separate
srst_only separate srst_nogate srst_open_drain connect_deassert_srst
Info : Unable to match requested speed 2000 kHz, using 1800 kHz
Info : Unable to match requested speed 2000 kHz, using 1800 kHz
Info : clock speed 1800 kHz
Error: open failed
in procedure 'init'
in procedure 'ocd_bouncer'
I do not know if solved but:
pi#raspberrypi:~/prog/bootloader $ st-info --probe
Found 1 stlink programmers
serial: 363f65064b46323613500643
openocd: "\x36\x3f\x65\x06\x4b\x46\x32\x36\x13\x50\x06\x43"
flash: 0 (pagesize: 0)
sram: 0
chipid: 0x0000
descr: unknown device
this tool shows serial of st-links and there is option called openocd. When I put hla_serial "\x36\x3f\x65\x06\x4b\x46\x32\x36\x13\x50\x06\x43" in file then it works for me. Your way does not. It also does not work in command line given as argument. It works only as I described in cfg file
The format of the configuration file seems to have changed recently. The following applies for Open On-Chip Debugger 0.10.0+dev-00634-gdb070eb8 (2018-12-30-23:05).
Find out the serial number with lsusb, st-link, or with ls -l /dev/serial/by-id. The latter yields (with two STLink/V2.1 connected):
total 0
lrwxrwxrwx 1 root root 13 Nov 30 14:31 usb-STMicroelectronics_STM32_STLink_066CFF323535474B43125623-if02 -> ../../ttyACM0
lrwxrwxrwx 1 root root 13 Dec 30 23:55 usb-STMicroelectronics_STM32_STLink_0672FF485457725187052924-if02 -> ../../ttyACM1
The specification on the .cfg-file is now plain hex. Do not use the C string syntax any longer. For selecting the latter device, simply write:
#hla_serial "066CFF323535474B43125623"
hla_serial "0672FF485457725187052924"

Unable to read crash dump in windbg

I have been getting a stackoverflow exception in my program which may be originating from a thirdparty libary, microsoft.sharepoint.client.runtime.dll.
Using adplus to create the crash dump, I'm facing the problem that I'm struggling to get any information from it when i open it in windbg. This is what I get as a response:
> 0:000> .restart /f
Loading Dump File [C:\symbols\FULLDUMP_FirstChance_epr_Process_Shut_Down_DocumentumMigrator.exe__0234_2011-11-17_15-19-59-426_0d80.dmp]
User Mini Dump File with Full Memory: Only application data is available
Comment: 'FirstChance_epr_Process_Shut_Down'
Symbol search path is: C:\symbols
Executable search path is:
Windows 7 Version 7601 (Service Pack 1) MP (8 procs) Free x64
Product: Server, suite: Enterprise TerminalServer SingleUserTS
Machine Name:
Debug session time: Thu Nov 17 15:19:59.000 2011 (UTC + 2:00)
System Uptime: 2 days 2:44:48.177
Process Uptime: 0 days 0:13:05.000
.........................................WARNING: rsaenh overlaps cryptsp
.................WARNING: rasman overlaps apphelp
......
..WARNING: webio overlaps winhttp
.WARNING: credssp overlaps mswsock
.WARNING: IPHLPAPI overlaps mswsock
.WARNING: winnsi overlaps mswsock
............
wow64cpu!CpupSyscallStub+0x9:
00000000`74e42e09 c3 ret
Any ideas as to how i can get more information from the dump, or how to use it to find where my stackoverflow error is occuring?
The problem you are facing is that the process is 32-bit, but you are running on 64-bit, therefore your dump is a 64-bit dump. To make use of the dump you have to run the following commands:
.load wow64exts
.effmach x86
!analyze -v
The last command should give you a meaningful stack trace.
This page provides lots of useful information and method to analyze the problem.
http://www.dumpanalysis.org/blog/index.php/2007/09/11/crash-dump-analysis-patterns-part-26/
You didn't mention if your code is managed or unmanaged. Assuming it is unmanaged. In debugger:
.symfix
.reload
~*kb
Look through the call stack for all threads and identify thread that caused SO. It is easy to identify the thread with SO, because the call stack will be extra long. Switch to that thread using command ~<N>s, where is thread number, dump more of the call stack using command k 200 to dump up to 200 lines of call stack. At the very bottom of the call stack you should be able to see the code that originated the nested loop.
If your code is managed, use SOS extension to dump call stacks.