Why does ILSpy put variables on the stack instead of instructions? - ilspy

Why does ILSpy put variables on the stack instead of instructions? I mean, when pushing to or popping from the stack, it adds Ldloc and Stloc instructions. Can anyone explain why it behaves this way? Thanks!

Because a stack slot acts like a variable: it can be used multiple times (e.g. on both branches of an if), but the effect of the instruction only happens once, when the value is pushed on the stack.
A decompiler that uses a stack of instructions would effectively cause the side effects of the instruction to instead happen at the point where the value is popped from the stack. This would be a reordering of the program that could subtly change its behavior, resulting in an incorrect decompilation.
In principle, using a stack of instructions would be possible within basic blocks; but when there's control flow (either outgoing or incoming) or a dup instruction, the whole stack of instructions would have to be converted to a stack of variables.
Currently the ILSpy ILReader uses a single pass (as specified in the ECMA-335 spec), so it doesn't know about incoming control flow while it runs. To be safe, it therefore always uses a stack of variables.
It turns out that this is not how the .NET Framework reads IL bytecode, and some obfuscators exploit the difference. So in the future, we may rewrite the ILReader to work more like the .NET bytecode importer, at which point we might move to a mixed model that combines a stack of variables with a stack of instructions. See ILSpy issue #901.
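To make the reordering problem concrete, here is a minimal sketch in C rather than IL. It is not ILSpy code; next_id and the two use_twice_* functions are hypothetical names chosen for illustration. The first function models a stack of variables (the side effect happens once, at push time, and the stored value is read twice, as after a dup). The second models naive substitution of the instruction at each use.

```c
#include <assert.h>

static int calls = 0;

/* A side-effecting "instruction": every evaluation bumps the counter. */
static int next_id(void) { return ++calls; }

/* Stack-of-variables model: evaluate once at push time (stloc),
   then read the stored value at each use (ldloc, ldloc). */
int use_twice_via_variable(void) {
    int slot = next_id();
    return slot + slot;
}

/* Stack-of-instructions model: the expression itself is substituted
   at every use, so the side effect is repeated. */
int use_twice_via_instruction(void) {
    return next_id() + next_id();
}

/* Self-check: the two models disagree on both result and side effects. */
void compare_models(void) {
    calls = 0;
    assert(use_twice_via_variable() == 2 && calls == 1);
    calls = 0;
    assert(use_twice_via_instruction() == 3 && calls == 2);
}
```

The second version is the kind of subtle behavior change that the answer calls an incorrect decompilation.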

Related

How to determine the stack depth required for an individual task created with FreeRTOS xTaskCreate?

I am using FreeRTOS to develop Firmware on STM32 MCU.
What techniques can I use to determine the stack size required for an individual task created with the FreeRTOS xTaskCreate API?
I know this depends on the work done by each task, but I need a way to find an approximate stack depth value so that my task runs without a stack overflow error at runtime.
The easiest way is to use a FreeRTOS-aware IDE plug-in that tells you the stack usage. Failing that, you can calculate it - or get GCC to calculate it for you - but my preference is a bit more pragmatic. First ensure you have a stack overflow hook defined, in case the stack is too small. Then assign a stack you think is too large, let the code execute through what is assumed to be the highest stack usage code path, then call uxTaskGetStackHighWaterMark() to see how much stack was actually used and adjust accordingly - remembering to add anything necessary for whatever the likely interrupt nesting stack usage will be. You can also use more invasive functions such as uxTaskGetSystemState() to see the stack usage of all tasks.
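For intuition, the high-water-mark technique boils down to filling the stack with a known byte up front and later counting how much of it is still untouched. The following is a standalone sketch of that idea, not FreeRTOS source: the function names are hypothetical, the fill value 0xA5 is an assumption, the stack is assumed to grow downward, and sizes are in bytes (check the FreeRTOS documentation for the units its own API reports).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FILL_BYTE 0xA5u  /* assumed fill pattern for this sketch */

/* Paint the whole stack area with the fill pattern before the task runs. */
void stack_fill(unsigned char *stack, size_t size) {
    assert(stack != NULL);
    memset(stack, FILL_BYTE, size);
}

/* Count the bytes at the low end that still hold the pattern.
   With a downward-growing stack this is the minimum free space
   the task has ever had. */
size_t stack_high_water_mark(const unsigned char *stack, size_t size) {
    assert(stack != NULL);
    size_t unused = 0;
    while (unused < size && stack[unused] == FILL_BYTE) {
        unused++;
    }
    return unused;
}
```

A small value means the task came close to overflowing, so the stack should be enlarged. A large value means the allocation can be trimmed, leaving some margin for interrupts.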

Sibling calls don’t appear in stack trace?

I just stumbled upon a line in the Wikipedia article on stack traces.
It says:
Sibling calls do not appear in a stack trace.
What exactly does this mean? I thought all stack frames appeared in a stack trace. From my understanding, even with a tail call, new frames are still pushed onto the stack, and are thus traceable. Is there an example where I can see this in action, where sibling calls are not shown in the stack trace?
From my understanding, even with a tail call, new frames are still pushed onto the stack, and are thus traceable
You misunderstand.
From Wikipedia:
Tail calls can be implemented without adding a new stack frame to the call stack. [emphasis mine] Most of the frame of the current procedure is no longer needed, and can be replaced by the frame of the tail call, modified as appropriate (similar to overlay for processes, but for function calls). The program can then jump to the called subroutine. Producing such code instead of a standard call sequence is called tail call elimination.
As "sibling calls" are just a special case of tail calls, they can be optimized in the same way. You should be able to see examples of this in any scenario where the compiler would optimize other tail calls, as well as in those specific examples such as described in the above-referenced Wikipedia article.
Sibling calls are, according to the article linked from that term, "tail calls to functions which take and return the same types as the caller".
Such a call is compiled as a jump that reuses the current stack frame, rather than as a regular function call.
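As a minimal sketch of what such calls look like in source, consider two mutually recursive functions with identical signatures (the names are hypothetical). Each call is in tail position, so a compiler may turn it into a jump. Whether it actually does depends on the compiler and flags; GCC, for example, has the -foptimize-sibling-calls option, which is enabled at -O2 and above.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEPTH 10000u  /* keeps recursion shallow if the calls are not eliminated */

static bool is_odd(unsigned n);

/* The call to is_odd is the last thing is_even does, and both functions
   take and return the same types: a sibling call. */
static bool is_even(unsigned n) {
    assert(n <= MAX_DEPTH);
    return n == 0 ? true : is_odd(n - 1);
}

static bool is_odd(unsigned n) {
    assert(n <= MAX_DEPTH);
    return n == 0 ? false : is_even(n - 1);
}
```

If you break inside is_odd in a debugger with optimization on, the backtrace will typically show only one of these frames instead of a chain of alternating is_even/is_odd frames. In an unoptimized build every call shows up.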

release build variable corruption when using ne10 math library assembly function

Has anyone experienced the following issue?
A stack variable gets changed/corrupted after calling an Ne10 assembly function such as ne10_len_vec2f_neon.
E.g.:
float gain = 8.0;
ne10_len_vec2f_neon(src, dst, len);
After the call to ne10_len_vec2f_neon, the value of gain changes because its memory is getting corrupted.
1. Note that this only happens when the project is compiled as a release build, not as a debug build.
2. Do Ne10 assembly functions preserve registers?
3. After replacing the assembly function call with the C equivalent, such as ne10_len_vec2f_c, both release and debug builds seem to work OK.
Thanks for any help on this. I'm not sure whether there's an inherent issue within the program or whether it is really the call to ne10_len_vec2f_neon that causes the corruption in the release build.
I had a quick rummage through the master NEON code here:
https://github.com/projectNe10/Ne10/blob/master/modules/math/NE10_len.neon.s
... and it doesn't really touch address-based stack at all, so not sure it's a stack problem in memory.
However based on what I remember of the NEON procedure call standard q4-q7 (alias d8-d15 or s16-s31) should be preserved by the callee, and as far as I can tell that code is clobbering q4-6 without the necessary save/restore, so it does indeed look like it's clobbering the stack in registers.
In the failed case do you know if gain is still stored in FPU registers, and if yes which ones? If it's stored in any of s16/17/18/19 then this looks like the problem. It also seems plausible that a compiler would choose to use s16 upwards for things it needs to keep across a function call, as it avoids the need to touch in-RAM stack memory.
In terms of a fix, if you perform the following replacements:
s/q4/q8/
s/q5/q9/
s/q6/q10/
in that file, then I think it should work; no means to test here, but those higher register blocks are not callee saved.

stack overflow method

In some operating systems, every process has a stack and a heap that grow towards each other. There must be a guard band between them to check for overlap. Can anyone give me an illustration of this? I want to write my own function for checking for stack overflow errors.
In a system like that, you would normally have a guard word or something similar at the top of the heap, something like 0xa55a or 0xdeadbeef.
Then, periodically, that guard word is checked to see if it's been corrupted. If so something has overwritten the memory.
Now this may not necessarily be a stack overflow, it may be a rogue memory write. But, in both those cases, something is seriously wrong so you may as well treat them the same.
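A minimal sketch of that guard-word check in C might look like this. The names and the choice of 0xDEADBEEF are illustrative assumptions, and where the guard word lives (top of the heap, bottom of the stack area) depends on your memory layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GUARD_WORD 0xDEADBEEFu  /* any distinctive, unlikely value works */

/* Write the guard value once, at the boundary between heap and stack. */
void guard_install(volatile uint32_t *guard) {
    assert(guard != NULL);
    *guard = GUARD_WORD;
}

/* Call periodically (timer tick, scheduler, idle loop).
   false means something wrote over the boundary. */
bool guard_intact(const volatile uint32_t *guard) {
    assert(guard != NULL);
    return *guard == GUARD_WORD;
}
```

Note that this only detects the damage after the fact, and only if the offending write actually touched the guard word.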
Of course, more modern operating systems may take the approach of using the assistance of the hardware such as in the Intel chips. In those, you can set up a stack segment to a specific size and, if you try to write outside of there (using the stack selector), you'll get a trap raised.
The heap in that case would be using a different selector so as to be kept separate.
Many operating systems place a guard page (or similar techniques) between stack and heap to protect against such attack vectors. I haven't seen canaries (the method mentioned by paxdiablo) there yet, they're mostly used to guard against stack-internal overflows (aka to guard the return address).
Guard pages on Windows: http://msdn.microsoft.com/en-us/library/aa366549(VS.85).aspx
Linux had an interesting exploit based on this problem some time ago though: http://www.h-online.com/open/news/item/Root-privileges-through-Linux-kernel-bug-Update-1061563.html

The stack size used in kernel development

I'm developing an operating system and rather than programming the kernel, I'm designing the kernel. This operating system is targeted at the x86 architecture and my target is for modern computers. The estimated amount of RAM required is 256 MB or more.
What is a good size to make the stack for each thread run on the system? Should I try to design the system in such a way that the stack can be extended automatically if the maximum length is reached?
If I remember correctly, a page in RAM is 4k (4096 bytes), and that just doesn't seem like a lot to me. I can definitely see times, especially when using lots of recursion, that I would want to have more than 1000 integers in RAM at once. Now, the real solution would be to have the program do this by using malloc and managing its own memory resources, but really I would like to know the user opinion on this.
Is 4k big enough for a stack with modern computer programs? Should the stack be bigger than that? Should the stack be auto-expanding to accommodate any types of sizes? I'm interested in this both from a practical developer's standpoint and a security standpoint.
Is 4k too big for a stack? Considering normal program execution, especially from the point of view of classes in C++, I notice that good source code tends to malloc/new the data it needs when classes are created, to minimize the data being thrown around in a function call.
What I haven't even gotten into is the size of the processor's cache memory. Ideally, I think the stack would reside in the cache to speed things up and I'm not sure if I need to achieve this, or if the processor can handle it for me. I was just planning on using regular boring old RAM for testing purposes. I can't decide. What are the options?
Stack size depends on what your threads are doing. My advice:
1. Make the stack size a parameter at thread creation time (different threads will do different things, and hence will need different stack sizes).
2. Provide a reasonable default for those who don't want to be bothered with specifying a stack size (4K appeals to the control freak in me, as it will cause the stack-profligate to, er, get the signal pretty quickly).
3. Consider how you will detect and deal with stack overflow. Detection can be tricky. You can put guard pages--empty--at the ends of your stack, and that will generally work. But you are relying on the behavior of the Bad Thread not to leap over that moat and start polluting what lies beyond. Generally that won't happen...but then, that's what makes the really tough bugs tough. An airtight mechanism involves hacking your compiler to generate stack checking code. As for dealing with a stack overflow, you will need a dedicated stack somewhere else on which the offending thread (or its guardian angel, whoever you decide that is--you're the OS designer, after all) will run.
I would strongly recommend marking the ends of your stack with a distinctive pattern, so that when your threads run over the ends (and they always do), you can at least go in post-mortem and see that something did in fact run off its stack. A page of 0xDEADBEEF or something like that is handy.
By the way, x86 page sizes are generally 4k, but they do not have to be. You can go with a 64k size or even larger. The usual reason for larger pages is to avoid TLB misses. Again, I would make it a kernel configuration or run-time parameter.
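The "parameter with a default" advice could be sketched like this. Everything here is a hypothetical illustration: the names, the 4K page size, and the choice of one page as the default are assumptions you would turn into kernel configuration.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_PAGE_SIZE    4096u            /* assumed; ideally a kernel config value */
#define DEFAULT_STACK_SIZE STACK_PAGE_SIZE  /* hypothetical default of one page */

/* Decide how much stack to allocate for a new thread:
   0 means "use the default", anything else is rounded up to whole pages. */
size_t effective_stack_size(size_t requested) {
    if (requested == 0) {
        return DEFAULT_STACK_SIZE;
    }
    /* Guard against wrap-around in the rounding below. */
    assert(requested <= (size_t)-1 - (STACK_PAGE_SIZE - 1));
    return (requested + STACK_PAGE_SIZE - 1) / STACK_PAGE_SIZE * STACK_PAGE_SIZE;
}
```

Rounding to whole pages keeps the allocation compatible with guard pages placed at either end of the stack.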
Search for KERNEL_STACK_SIZE in the Linux kernel source code and you will find that it is very much architecture dependent: PAGE_SIZE, 2*PAGE_SIZE, etc. (below are just some of the results; much of the intermediate output has been deleted).
./arch/cris/include/asm/processor.h:
#define KERNEL_STACK_SIZE PAGE_SIZE
./arch/ia64/include/asm/ptrace.h:
# define KERNEL_STACK_SIZE_ORDER 3
# define KERNEL_STACK_SIZE_ORDER 2
# define KERNEL_STACK_SIZE_ORDER 1
# define KERNEL_STACK_SIZE_ORDER 0
#define IA64_STK_OFFSET ((1 << KERNEL_STACK_SIZE_ORDER)*PAGE_SIZE)
#define KERNEL_STACK_SIZE IA64_STK_OFFSET
./arch/ia64/include/asm/mca.h:
u64 mca_stack[KERNEL_STACK_SIZE/8];
u64 init_stack[KERNEL_STACK_SIZE/8];
./arch/ia64/include/asm/thread_info.h:
#define THREAD_SIZE KERNEL_STACK_SIZE
./arch/ia64/include/asm/mca_asm.h:
#define MCA_PT_REGS_OFFSET ALIGN16(KERNEL_STACK_SIZE-IA64_PT_REGS_SIZE)
./arch/parisc/include/asm/processor.h:
#define KERNEL_STACK_SIZE (4*PAGE_SIZE)
./arch/xtensa/include/asm/ptrace.h:
#define KERNEL_STACK_SIZE (2 * PAGE_SIZE)
./arch/microblaze/include/asm/processor.h:
# define KERNEL_STACK_SIZE 0x2000
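To get a feel for the numbers, the ia64 formula above, (1 << KERNEL_STACK_SIZE_ORDER) * PAGE_SIZE, can be evaluated with a small helper. This is only an arithmetic sketch with a hypothetical function name. The 4096-byte page used in the examples is an assumption for illustration, since real page sizes vary by architecture and configuration.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the shape of IA64_STK_OFFSET: 2^order pages of page_size bytes. */
size_t kernel_stack_size(unsigned order, size_t page_size) {
    assert(order < 8);  /* arbitrary sanity bound for this sketch */
    return ((size_t)1 << order) * page_size;
}
```

With a 4096-byte page, orders 0 through 3 give 4096, 8192, 16384 and 32768 bytes. For comparison, the microblaze value 0x2000 is 8192 bytes.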
I'll throw my two cents in to get the ball rolling:
I'm not sure what a "typical" stack size would be. I would guess maybe 8 KB per thread, and if a thread exceeds this amount, just throw an exception. However, according to this, Windows has a default reserved stack size of 1MB per thread, but it isn't committed all at once (pages are committed as they are needed). Additionally, you can request a different stack size for a given EXE at compile-time with a compiler directive. Not sure what Linux does, but I've seen references to 4 KB stacks (although I think this can be changed when you compile the kernel and I'm not sure what the default stack size is...)
This ties in with the first point. You probably want a fixed limit on how much stack each thread can get. Thus, you probably don't want to automatically allocate more stack space every time a thread exceeds its current stack space, because a buggy program that gets stuck in an infinite recursion is going to eat up all available memory.
If you are using virtual memory, you do want to make the stack growable. Forcing static allocation of stack sizes, as is common in user-level threading such as Qthreads and Windows Fibers, is a mess. Hard to use, easy to crash. All modern OSes do grow the stack dynamically, I think usually by having a write-protected guard page or two below the current stack pointer. Writes there then tell the OS that the stack has stepped below its allocated space, and you allocate a new guard page below that and make the page that got hit writable. As long as no single function allocates more than a page of data, this works fine. Or you can use two or four guard pages to allow larger stack frames.
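The bookkeeping behind that guard-page scheme can be sketched as follows. This is a simplified model, not real kernel code: the struct and function names are hypothetical, the hard cap is my own addition so that runaway recursion cannot consume all memory, and a real fault handler would also change page protections and map memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    size_t committed_pages;  /* writable pages currently backing the stack */
    size_t max_pages;        /* hard cap on stack growth (assumed policy) */
} grow_stack;

/* Called when a write hits the guard page just below the committed stack.
   On success the old guard page becomes a normal stack page and a new
   guard page is placed below it. Returns false once the cap is reached,
   at which point the OS would report a stack overflow instead. */
bool on_guard_page_fault(grow_stack *s) {
    assert(s != NULL);
    if (s->committed_pages >= s->max_pages) {
        return false;
    }
    s->committed_pages++;
    return true;
}
```

A thread that starts with one page and a cap of three would grow twice and then fail on the third fault.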
If you want a way to control stack size and your goal is a really controlled and efficient environment, but you do not care about programming in the same style as Linux etc., go for a single-shot execution model where a task is started each time a relevant event is detected, runs to completion, and then stores any persistent data in its task data structure. In this way, all threads can share a single stack. This model is used in many slim real-time operating systems for automotive control and similar.
Why not make the stack size a configurable item, either stored with the program or specified when a process creates another process?
There are any number of ways you can make this configurable.
There's a guideline that states "0, 1 or n", meaning you should allow zero, one or any number (limited by other constraints such as memory) of an object - this applies to sizes of objects as well.