Breaking a stack/call frame information chain on ELF/Linux? - x86-64

I'm trying to do a rather niche thing, which is essentially breaking the CFI (Call Frame Information in DWARF EH info) and rbp & rsp links between frames. The main reason is that, past a certain point in thread control flow, I want to do a call continuation, which is basically a one-way tail call combined with a yield: it should clean up the stack and then return to the top of the stack, ready to be executed again at the continuation point.
Here is the idea in principle, which works as long as I keep the lines that mess with the stack commented out:
/*
* x86_64 SysV:
* rdi, rsi, rdx, rcx, r8, r9, xmm0-xmm7
*/
__asm {
mov rax, TCB
mov rax, qword ptr [rax] OSThreadControlBlock.StartFn;
call rax;
mov rax, 0;
// end of stack
//push rax;
//push rax;
//push rbx;
// last "real" frame
//push rbp;
//mov rbp, rsp;
//push rbx;
// make the call
mov rdi, RL;
lea rax, qword ptr __OS_RUNLOOP_START__;
call rax;
// trap if it returns
//int 3;
}
I'm aware of the general principles behind the SP/BP registers, and I'm specifically using -fno-omit-frame-pointer. My question is, after having spent hours trying to get it to work, what am I missing? It seems that any alteration to the stack layout, even one as simple as a push before a call while keeping the stack aligned, will cause a snowball crash starting with something like this (custom signal handler):
Received fatal signal: Segmentation fault (11) [thread: 10298 ctl-thrd]
* Unknown error at address 0x0 Regs:
%rip=0x00000000003E2D91 %rbp=0x00007F820A547EA8 %rsp=0x00007F820A547DE8 %rax=0x00007F820A547DE8 %rbx=0x00007F820A547F38
%rdi=0x00000000002121E1 %rsi=0x000000000000007B %rcx=0x000000000000000A %r8=0x0000000000000900 %r9=0x00007F820A5490C0
The ABI in question is libc++/libc++abi on x86_64 Linux, with a LLVM/Clang 6.0.X based toolchain. I tried practically everything, I know the above looks weird but it's an MS extension for inline assembly, I checked multiple times in disassemblies that it generates perfectly sane code. As far as I understand this is some weird conflict between CFI and frame pointer based stuff but I'm not that amazingly good at x86_64 so I'm not really sure what I'm missing. I know the unwinding process is meant to be terminated by a sentinel (null SP/FP on the last frame) but at this point I'm honestly lost because even the debugger gets completely thrown off by this.
If anyone has any suggestions that would be really appreciated, I tried various things but the core problem is the same, as soon as I touch the stack, even if I return it to normal, everything goes haywire. Clobber beyond the asm block doesn't matter since the last call is not meant to conventionally return. One thing I did notice is that it seems this is somehow related to TLVs but I'm not sure how since NPTL is meant to configure that.
Any help or suggestions would be immensely appreciated.
Edit:
Looks like this comment from Valgrind may explain what is happening:
/* NB 9 Sept 07. There is a nasty kludge here in all these CALL_FN_
macros. In order not to trash the stack redzone, we need to drop
%rsp by 128 before the hidden call, and restore afterwards. The
nastyness is that it is only by luck that the stack still appears
to be unwindable during the hidden call - since then the behaviour
of any routine using this macro does not match what the CFI data
says. Sigh.
Why is this important? Imagine that a wrapper has a stack
allocated local, and passes to the hidden call, a pointer to it.
Because gcc does not know about the hidden call, it may allocate
that local in the redzone. Unfortunately the hidden call may then
trash it before it comes to use it. So we must step clear of the
redzone, for the duration of the hidden call, to make it safe.
Probably the same problem afflicts the other redzone-style ABIs too
(ppc64-linux, ppc32-aix5, ppc64-aix5); but for those, the stack is
self describing (none of this CFI nonsense) so at least messing
with the stack pointer doesn't give a danger of non-unwindable
stack. */

Related

Breakpoint with debugger command jump in Xcode

I made a breakpoint in Xcode with the jump command to force execution past a condition, but when execution reaches line 168 it crashes with the message
"Thread 1: EXC_BAD_ACCESS (code=1, address=0x1)"
Why did that happen?
The console logged:
warning: MoreMultitypeCollectionViewCell.swift:178 appears multiple times in this function, selecting the first location:
MoreMultitypeCollectionViewCell.(updateButtonStateCkeck in _9A12557DCAB30EEB52DC7C2EA09487CD)() -> () + 1580 at MoreMultitypeCollectionViewCell.swift:178
MoreMultitypeCollectionViewCell.(updateButtonStateCkeck in _9A12557DCAB30EEB52DC7C2EA09487CD)() -> () + 1600 at MoreMultitypeCollectionViewCell.swift:178
My questions are:
What should I type in lldb to select a location?
Is there a better way to force execution into an if statement without changing the code and rebuilding the project?
Sometimes when I type 'po' in lldb or click Print Description in the variable view, it shows a failure message. Why is that?
1) In lldb, the equivalent command is thread jump and you can specify an address as well as a line number there.
2) thread jump or the Xcode equivalent is an inherently dangerous operation. If you jump over the initialization of some variable, you will be dealing with bad data now and will likely crash. That sort of thing you can sometimes spot by eye - though Swift is lazy about initialization so the actual initialization of a variable may not happen where you think it does in the source. There are more subtle problems as well. For instance, if you jump over some code that as a byproduct of its operation retains or releases an object, the object will end up under or over retained. The former will cause crashes, the latter memory leaks. These retains & releases are generated by the compiler, so you can't see them in your source code, though you could if you look at the disassembly of the code you are jumping over.
Without looking at the code in question, I can't tell why this particular jump caused a crash.
But you can't 100% safely skip code the compiler chose to emit. Looking at the disassembly, you might be able to spot either (a) a better place to stop before the jump, i.e. past some retain or release that is causing a problem, or (b) an address in the middle of a line to jump to, so that you still call a retain that's needed. You'll have to figure this out by hand.
3) There's not enough info to answer this question.
BTW, your image links don't seem to resolve.

FreeRTOS wrong context switch restoration

I'm using FreeRTOS with an STM32F407. I have a problem with the context being restored incorrectly after a context switch. The code inside the task looks like this:
char *ptr = pvPortMalloc(sizeof(char) * size);
memcpy(ptr, buf, size);
...
log("Before:");
logItoa((int)ptr);
blockingFunction(); // Here preemption will occur
log("After:");
logItoa((int)ptr);
blockingFunction() does not use ptr.
When I debug, I can see that the address held in ptr is stored with the instruction:
STR R0, [R7, #24]
so I check the value at address (R7 + 24)(^1) in data memory and see that the address of the dynamically allocated data has been saved successfully.
After the context is restored, I check ptr and see that it no longer points to my newly allocated data. The value at address (^1) is unchanged, but the value in R7 (used to compute the address) is not the same as before preemption.
As a result, all of my local variables appear wrong, because they are fetched from the wrong locations in data memory.
If this is some kind of stack overflow issue, how can I debug it?
Most issues on Cortex-M come down to incorrect interrupt priority assignments and stack overflow, so later versions of FreeRTOS have lots of traps for both of these errors to let you know immediately if they occur - but you have to turn on the ability to trap these common errors, as per below:
Do you have configASSERT() defined, and which version of FreeRTOS are you using? The later the version the more helpful configASSERT() will be.
Do you have configCHECK_FOR_STACK_OVERFLOW set to 2 and a stack overflow hook defined?
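As a rough illustration of the two settings asked about above, a configuration fragment might look like the following. This is a sketch rather than a drop-in file: the halt-in-a-loop behaviour is just one common choice, and the exact parameter types of the hook have varied between FreeRTOS versions, so check them against the version in use.

```c
/* --- FreeRTOSConfig.h (fragment) --- */

/* Halt with interrupts disabled when an assertion fails, so the
   failing call site can be inspected in the debugger. */
#define configASSERT( x ) \
    if( ( x ) == 0 ) { taskDISABLE_INTERRUPTS(); for( ;; ); }

/* Method 2 checks a known pattern at the end of the task stack on
   every context switch. */
#define configCHECK_FOR_STACK_OVERFLOW 2

/* --- application source file (fragment) --- */
#include "FreeRTOS.h"
#include "task.h"

/* Called by the kernel when the check above detects an overflow.
   Assumption: older FreeRTOS releases declare pcTaskName with a
   different char type. */
void vApplicationStackOverflowHook( TaskHandle_t xTask, char *pcTaskName )
{
    ( void ) xTask;
    ( void ) pcTaskName;   /* inspect this in the debugger */
    taskDISABLE_INTERRUPTS();
    for( ;; );
}
```

With this in place, a breakpoint inside the hook or on the configASSERT() loop shows which task or call tripped the check.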
I found the source of the problem. Inside blockingFunction() a buffer was being filled explicitly. The buffer was too small, and the writes ran past its end and overwrote my task's TCB.

release build variable corruption when using ne10 math library assembly function

Has anyone experienced the following issue?
A stack variable gets changed/corrupted after calling an ne10 assembly function such as ne10_len_vec2f_neon.
e.g.
float gain = 8.0;
ne10_len_vec2f_neon(src, dst, len);
After the call to ne10_len_vec2f_neon, the value of gain changes, as if its memory is getting corrupted.
1. Note this only happens when the project is compiled as a release build, not as a debug build.
2. Do Ne10 assembly functions preserve registers?
3. After replacing the assembly function call with the C equivalent, such as ne10_len_vec2f_c, both release and debug builds seem to work OK.
Thanks for any help on this. I'm not sure if there's an inherent issue within the program or if it is really the call to ne10_len_vec2f_neon causing the corruption in the release build.
I had a quick rummage through the master NEON code here:
https://github.com/projectNe10/Ne10/blob/master/modules/math/NE10_len.neon.s
... and it doesn't really touch address-based stack at all, so not sure it's a stack problem in memory.
However based on what I remember of the NEON procedure call standard q4-q7 (alias d8-d15 or s16-s31) should be preserved by the callee, and as far as I can tell that code is clobbering q4-6 without the necessary save/restore, so it does indeed look like it's clobbering the stack in registers.
In the failed case do you know if gain is still stored in FPU registers, and if yes which ones? If it's stored in any of s16/17/18/19 then this looks like the problem. It also seems plausible that a compiler would choose to use s16 upwards for things it needs to keep across a function call, as it avoids the need to touch in-RAM stack memory.
In terms of a fix, if you perform the following replacements:
s/q4/q8/
s/q5/q9/
s/q6/q10/
in that file, then I think it should work; no means to test here, but those higher register blocks are not callee saved.

GDB error invalid offset, value too big (0x00000400) Unable to build app in debug mode need help

I have an app which was working fine a few days ago. But today I'm getting this error:
{standard input}:1948:invalid offset, value too big (0x00000400)
Command /Developer/Platforms/iPhoneOS.platform/Developer/usr/bin/gcc-4.2 failed with exit code 1
Ok Guys,
After a lot of troubleshooting finally I found the solution. A big Switch Case was the problem. Converting that into if else statement solved the problem.
I had a similar issue today while I was writing an assembly routine for an ARM Cortex-M0 processor. In my case, the code that caused the error looked like this:
ldr r7 ,=PRTCNFG_10
This is a pseudo instruction causing the processor to load the value of constant PRTCNFG_10 (defined using the .equ directive) into register r7. The pseudo instruction will be translated into
ldr r7 ,[pc, #immed8]
where #immed8 is an 8-bit immediate value. The immediate is scaled by 4, so the word holding the value must lie within 1020 bytes after pc; otherwise the assembler will throw an error.
I solved the issue by explicitly allocating PRTCNFG_10 in memory:
PRTCNFG_10:
.word 0x606
Just saw the same issue, which also turned out to be caused by a switch case. It wasn't even that big (26 cases), and it had compiled fine in the past, but for some reason it started to fail today. Replacing it with if-else solved the weird GCC error.
While this question is not strictly about assembler, it pops up in web searches about this specific error often enough that I'd like to add an answer that should be helpful to people programming in it.
The assembler syntax is LDR REG, =SOMETHING.
If that SOMETHING is >16 bits, we got a problem because Thumb doesn't have 32-bit immediates. To fix this, the assembler remembers the constant and replaces the statement with a PC-relative load to something that's less than 0x400 bytes off (more than that doesn't fit in the instruction).
You then say
.ltorg
someplace convenient (e.g. right behind the next bx lr or pop {pc}) to direct the assembler to place these constants there.
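To make the 0x400 limit in the error message concrete, here is a small sketch of the range check for the 16-bit Thumb "LDR Rt, [PC, #offset]" encoding. The function and macro names are made up for illustration. The offset is measured from the word-aligned pc value the instruction sees, not from the instruction's own address.

```c
#include <stdbool.h>
#include <stdint.h>

/* The 16-bit Thumb PC-relative LDR holds an 8-bit immediate that the
   CPU multiplies by 4, so the largest reachable offset is 255 * 4. */
#define THUMB_LDR_LITERAL_MAX_OFFSET (0xFFu * 4u)   /* 1020 bytes */

/* Returns true if a literal `offset` bytes past the aligned pc can be
   loaded by a single 16-bit "LDR Rt, [PC, #offset]". The name is
   hypothetical; it only mirrors the check the assembler performs. */
bool thumb_ldr_literal_reachable(uint32_t offset)
{
    return (offset % 4u == 0u) && (offset <= THUMB_LDR_LITERAL_MAX_OFFSET);
}
```

An offset of 1020 passes this check, while 0x400 (1024) is the first word-aligned offset that does not, which matches the value quoted in the error.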

How could an assembly OUTB function cause a triple fault?

In my systems programming class we are working on a small, simple hobby OS. Personally I have been working on an ATA hard disk driver. I have discovered that a single line of code seems to cause a fault which then immediately reboots the system. The code in question is at the end of my interrupt service routine for the IDE interrupts. Since I was using the IDE channels, they are sent through the slave PIC (which is cascaded through the master). Originally my code was only sending the end-of-interrupt byte to the slave, but then my professor told me that I should be sending it to the master PIC as well.
So here is my problem: when I un-comment the line which sends the EOI byte to the master PIC, the system triple faults and then reboots. Likewise, if I leave it commented out, the system stays running.
_outb( PIC_MASTER_CMD_PORT, PIC_EOI ); // this causes (or at least sets off) a triple fault reboot
_outb( PIC_SLAVE_CMD_PORT, PIC_EOI );
Without seeing the rest of the system, is it possible for someone to explain what could possibly be happening here?
NOTE: Just as a shot in the dark, I replaced the _outb() call with another _outb() call which just made sure that interrupts were enabled for the IDE controller; the generated assembly would have been almost identical. This did not cause a fault.
*_outb() is a wrapper for the x86 OUTB instruction.
What is so special about my function to send EOI to the master PIC that is an issue?
I realize without seeing the code this may be impossible to answer, but thanks for looking!
Triple faults usually point to a stack overflow or odd stack pointer. When a fault or interrupt occurs, the system immediately tries to push some more junk onto the stack (before invoking the fault handler). If the stack is hosed, this will cause another fault, which then tries to push more stuff on the stack, which causes another fault. At this point, the system gives up on you and reboots.
I know this because I actually have a silly patent (while working at Dell about 20 years ago) on a way to cause a CPU reset without external hardware (used to be done through the keyboard controller):
MOV ESP,1
PUSH EAX ; triple fault and reset!
An OUTB instruction can't cause a fault on its own. My guess is you are re-enabling an interrupt, and the interrupt gets triggered while something is wrong with your stack.
When you re-enable the PIC, are you doing it with the CPU's interrupt flag set, or cleared (ie. are you doing it sometime after a CLI opcode, or, sometime after an STI opcode)?
Assuming that the CPU's interrupt flag is enabled, your act of re-enabling the PIC allows any pending interrupts to reach the CPU: which would interrupt your code, dispatch to a vector specified by the IDT, etc.
So I expect that it's not your opcode that's directly causing the fault: rather, what's faulting is code that's run as the result of an interrupt which happens as a result of your re-enabling the PIC.
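For reference, the EOI sequence discussed in the question can be sketched as below. The port numbers and EOI byte are the conventional 8259A values and are assumed to match the question's PIC_MASTER_CMD_PORT, PIC_SLAVE_CMD_PORT and PIC_EOI. The port write is passed in as a callback so the ordering can be checked on a host machine, and record_outb is a hypothetical stand-in for the real _outb().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Conventional 8259A command ports and non-specific EOI command
   (assumed to match the constants used in the question). */
#define PIC_MASTER_CMD_PORT 0x20
#define PIC_SLAVE_CMD_PORT  0xA0
#define PIC_EOI             0x20

/* Port-write callback with the same argument order as _outb(). */
typedef void (*outb_fn)(uint16_t port, uint8_t value);

/* Acknowledge IRQ `irq` (0..15). IRQs 8..15 arrive through the slave,
   so the slave is acknowledged first, then the master's cascade line. */
void pic_send_eoi(unsigned irq, outb_fn outb)
{
    assert(irq < 16);
    if (irq >= 8) {
        outb(PIC_SLAVE_CMD_PORT, PIC_EOI);
    }
    outb(PIC_MASTER_CMD_PORT, PIC_EOI);
}

/* Hypothetical host-side recorder standing in for the real port write. */
struct port_write { uint16_t port; uint8_t value; };
struct port_write eoi_log[4];
size_t eoi_log_len;

void record_outb(uint16_t port, uint8_t value)
{
    if (eoi_log_len < sizeof eoi_log / sizeof eoi_log[0]) {
        eoi_log[eoi_log_len].port = port;
        eoi_log[eoi_log_len].value = value;
        eoi_log_len++;
    }
}
```

In a real handler this would typically be the last step before returning from the interrupt. As the answers above point out, once the master has been acknowledged any pending interrupt can be delivered as soon as the CPU's interrupt flag allows it.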