Why is syncblk located at -4 and not at 0? - windbg

So if you want to look at the sync block for an object under SOS, you have to look 4 bytes before the object address (on 32-bit machines). Does anyone know the reasoning behind going back 4 bytes? They could have put the sync block at 0, the type handle at +4, and the object fields at +8.

This is an implementation detail, so I can't give you the exact reason for the placement of the syncblock. However, if you look at the shared source CLI, you'll see that the runtime has all sorts of optimizations for how objects are allocated and used, and actually the data associated with a single instance is located in several different places. The syncblock for instance is just an index value for a structure located elsewhere. Similarly the MethodTable and the EEClass are stored elsewhere. These are all implementation details. The important point IMO is understanding how to dig out the information needed during debugging. It is of much less importance to understand why the implementation details are as they are.

I'd say it matches expectations, especially for structs that have been explicitly laid out. As Brian says, it's just an implementation detail though. It's similar to how many implementations of malloc will allocate more space than requested, store the allocation size in the first four (or eight) bytes, and then return a pointer that is offset to point to the next byte beyond that.
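The malloc analogy can be sketched with a toy allocator. This is a simulation over a bytearray, not how any real malloc or the CLR is implemented; the names `ToyHeap`, `alloc`, and `size_of` are made up for illustration, and the 4-byte header is an assumption matching the 32-bit case discussed above.

```python
import struct

HEADER_SIZE = 4  # assumed header width, as on a 32-bit machine


class ToyHeap:
    """Bump allocator that hides a size header just before each block."""

    def __init__(self, capacity):
        self.memory = bytearray(capacity)
        self.next_free = 0

    def alloc(self, size):
        start = self.next_free
        if start + HEADER_SIZE + size > len(self.memory):
            raise MemoryError("toy heap exhausted")
        # Store the requested size in the hidden header.
        struct.pack_into("<I", self.memory, start, size)
        self.next_free = start + HEADER_SIZE + size
        # Hand back the address just past the header.
        return start + HEADER_SIZE

    def size_of(self, address):
        # The header sits at a negative offset from the address the caller holds.
        return struct.unpack_from("<I", self.memory, address - HEADER_SIZE)[0]


heap = ToyHeap(64)
a = heap.alloc(10)
b = heap.alloc(20)
print(a, heap.size_of(a))
print(b, heap.size_of(b))
```

The caller only ever sees addresses that start at the payload, so bookkeeping at a negative offset never disturbs the layout of the data itself.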

Related

$readmemh to load subblocks of memory

Sorry if the question is too newbie. I have been looking for information on $readmemh and I haven't been able to find a suitable solution to my problem.
I have a large hex file with the contents of a memory that I need to read. Because the file is too large, I want to bring only a smaller memory block into my program at a time. For example, if my memory has 32-bit addresses, I'd like to load only chunks of 2048 addresses and store the upper part of the address in a variable, so I know whether I'm in the right chunk.
Or more or less, if my hex file is as follows:
@00000000 1212
@00000001 3434
@00000002 5656
...
@ffffffff 9a9a
I want to be able to save it in a structure with the following fields:
logic [15:0] chunck [2047:0]; // Keeps 2048 entries
logic [20:0] address_top; // Keeps the top part of the address
Looking at the documentation, I see that $readmemh lets me specify a start and an end address, but as far as I can tell these refer to positions in the destination array, not in the source file.
How can I do it?
Thanks in advance!
If you are using SystemVerilog, then you can use an associative array to model your memory and let it take care of allocating and mapping the physical memory as needed, instead of doing it yourself.
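As a rough illustration of the idea (in Python, not SystemVerilog), a dictionary keyed by address behaves much like an associative array: only the addresses that appear in the file take up storage. The parser below is a simplified sketch that assumes the standard `@address` syntax of $readmemh and ignores comments; `load_readmemh`, `split_address`, and `CHUNK_BITS` are hypothetical names.

```python
import io

CHUNK_BITS = 11  # 2**11 = 2048 entries per chunk, as in the question


def load_readmemh(stream):
    """Parse '@address data' tokens into a sparse {address: word} mapping."""
    memory = {}
    address = 0
    for line in stream:
        for token in line.split():
            if token.startswith("@"):
                address = int(token[1:], 16)   # jump to a new address
            else:
                memory[address] = int(token, 16)
                address += 1                   # consecutive words auto-increment
    return memory


def split_address(address):
    """Return (address_top, index within the 2048-entry chunk)."""
    return address >> CHUNK_BITS, address & ((1 << CHUNK_BITS) - 1)


sample = io.StringIO("@00000000 1212\n@00000001 3434\n@00000002 5656\n@ffffffff 9a9a\n")
memory = load_readmemh(sample)
print(len(memory), hex(memory[0xFFFFFFFF]))
print(split_address(0xFFFFFFFF))
```

With a sparse mapping there is no need to track which chunk is currently loaded; unwritten addresses simply have no entry.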

Questions around memory utilization in Perl

SO community,
I have been scratching my head lately over two memory issues I am running into with some of my Perl scripts, and I am hoping to find some help/pointers here to better understand what is going on.
Questionable observation #1:
I am running the same Perl script on different server instances (local laptop running Mac OS X, dedicated server hardware, virtual server hardware) and am getting significantly varying results in the traced memory consumption. Just after script initialization, one instance reports a memory consumption of 210 MB for the script, compared to 330 MB on another box, which is a difference of almost 60%. I understand that the malloc() function in charge of "garbage collection" for Perl is OS specific, but are these deviations normal or should I be looking more closely at what is going on?
Questionable observation #2:
One script that is having memory leaks is relatively trivial:
foreach (@dataSamples) {
    #memorycheck_1
    my $string = subRoutine($_);
    print FILE $string;
    #memorycheck_2
}
All variables in the subRoutine are kept local and should be out of scope once the subroutine finishes. Yet when checking memory usage at #memorycheck_1 and #memorycheck_2 there is a significant memory leak.
Is there any explanation for that? According to Devel::Leak there are leaked pointers, but I have a hard time understanding where they would be coming from. Is there an easy way to translate the output of Devel::Leak into something that can actually give me pointers to where those leaked references originate?
Thanks
You have two different questions:
1) Why is the memory footprint not the same across various environments?
Well, are all the OSes involved 64 bit? Or is there a mix? If one OS is 32 bit and the other 64 bit, the variation is to be expected. Or, as @hobbs notes in the comments, is one of the perls compiled with threads support whereas another is not?
2) Why does the memory footprint change between check #1 and check #2?
That does not necessarily mean there is a memory leak. Perl won't give back memory to the OS. The memory footprint of your program will be the largest footprint it reaches and will not go down.
Neither of these points is Perl specific. For more detail, you'll need to show more detail.
See also Question 7.25 in the C FAQ and further reading mentioned in that FAQ entry.
The most common reason for a memory leak in Perl is circular references. The simplest form would be something along the lines of:
sub subRoutine {
    my( $this, $that );
    $this = \$that;
    $that = \$this;
    return $_[0];
}
Now of course people reading that are probably saying, "Why would anyone do that?" And one generally wouldn't. But more complex data structures can contain circular references pretty easily, and we don't even blink an eye at them. Consider doubly linked lists where each node refers to the node to its left and its right. It's important to not let the last explicit reference to such a list pass out of scope without first breaking the circular references contained in each of its nodes, or you'll get a structure that is inaccessible but can't be garbage collected because the reference count to each node never falls to zero.
Per Eric Strom's excellent suggestion, the core module Scalar::Util has a function called weaken. A reference that has been weakened won't hold a reference count to the entity it refers to. This can be helpful for preventing circular references. Another strategy is to implement your circular-reference-wielding data structure within a class where an object method explicitly breaks the circular reference. Either way, such data structures do require careful handling.
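The same idea can be illustrated outside Perl. The sketch below is a Python analogue, not Perl code: it uses Python's `weakref` module in place of Scalar::Util's weaken, and it relies on CPython's reference counting (other Python implementations may free objects later). The `Node` class and `link` helper are made up for the example.

```python
import weakref


class Node:
    """Doubly linked list node whose back pointer is a weak reference."""

    def __init__(self, value):
        self.value = value
        self.next = None     # strong reference to the right neighbour
        self._prev = None    # weak reference to the left neighbour

    @property
    def prev(self):
        return self._prev() if self._prev is not None else None


def link(left, right):
    left.next = right
    right._prev = weakref.ref(left)   # does not add to left's reference count


head = Node("a")
tail = Node("b")
link(head, tail)
prev_is_head = tail.prev is head

probe = weakref.ref(tail)    # lets us observe when tail is actually freed
del tail                     # head.next still keeps the node alive
alive_while_linked = probe() is not None
del head                     # no strong cycle, so both nodes are released
freed_after_drop = probe() is None
print(prev_is_head, alive_while_linked, freed_after_drop)
```

Because only the forward links are strong, dropping the last outside reference to the head lets the counts fall to zero without any explicit cleanup step.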
Another source of trouble is poorly written XS modules (nothing against XS authors; it's just really tricky to write XS modules well). What goes on behind the closed doors of an XS module may be a memory leak.
Until we see what's happening inside of subRoutine we can only guess whether or not there's actually an issue, and what the source of the issue may be.

What makes NSData advantageous?

I've been looking through the Apple documentation for the NSData class, and I didn't really find it too enlightening. I know how to use the class but I don't really understand the gravity of the advantages that it may or may not provide. I know it's a simple question but perhaps it would be good to have such information as a reference.
Advantages over what? Certainly, it's useful to represent an arbitrary block of data as an object just as it's useful to represent a string, a number, or a value as an object. Memory management becomes simpler and is consistent with memory management for all other objects, and there are a number of useful methods defined.
Say you want to read a binary file into memory. We won't worry about the reasons why -- there are as many reasons as there are data file formats. You'll have to:
1) Check the size of the file
2) Allocate a block of memory of the proper size
3) Open the file
4) Read the contents into memory
5) Close the file
6) Remember to free the memory when you're done with it (a condition that can sometimes be tricky to detect)
7) (Optional) Worry about whether the block of memory has been modified
With NSData, you can just create a new instance from a path or URL and not have to think about the rest.
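To make the contrast concrete, here is a sketch in Python rather than Objective-C, so it is only an analogy for what NSData does. `read_manually` walks through the steps above by hand, while `read_in_one_call` plays the role of creating a data object from a path. Both function names are made up, and the sample file is a temporary file created just for the demonstration.

```python
import os
import tempfile
from pathlib import Path


def read_manually(path):
    size = os.path.getsize(path)          # check the size of the file
    buffer = bytearray(size)              # allocate a block of that size
    handle = open(path, "rb")             # open the file
    try:
        view = memoryview(buffer)
        filled = 0
        while filled < size:              # read the contents into memory
            count = handle.readinto(view[filled:])
            if not count:
                break
            filled += count
    finally:
        handle.close()                    # close the file
    return bytes(buffer[:filled])


def read_in_one_call(path):
    # One call replaces all of the bookkeeping above.
    return Path(path).read_bytes()


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00\x01binary\xff")
    sample_path = tmp.name

manual = read_manually(sample_path)
one_call = read_in_one_call(sample_path)
os.remove(sample_path)
print(manual == one_call)
```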

Post mortem minidump debugging In windbg -- what causes <memory access error> for heap memory?

I'm looking at a crash dump. Some variables seem perfectly viewable in windbg, while others just say "memory access error". What causes this? Why do some variables have sensible values while others simply list <memory access error>?
It appears that all the problems are associated with following pointers. I'm certain that while many of these pointers are uninitialized the vast majority of them should be pointing somewhere valid. Based on the nature of this crash (a simple null ptr dereference) I'm fairly certain the whole process hasn't gone out to lunch.
Mini-dumps are fairly useless; they don't contain a snapshot of all in-use memory. Instead, all they contain are some critical structures/lists (e.g. the loaded module list) and the contents of the crashing stack.
So, any pointer that you try to follow in the dump will just give you question marks. Grab a full memory dump instead and you'll be able to see what these buffers point to.
-scott
If they are local pointer variables, what is most likely happening is that the pointers are not initialized, or that stack location has been reused to contain another variable, that may not be a pointer. In both cases, the pointer value may point to a random, unreadable portion of memory.

The stack size used in kernel development

I'm developing an operating system and rather than programming the kernel, I'm designing the kernel. This operating system is targeted at the x86 architecture and my target is modern computers. The estimated amount of required RAM is 256 MB or more.
What is a good size to make the stack for each thread run on the system? Should I try to design the system in such a way that the stack can be extended automatically if the maximum length is reached?
If I remember correctly, a page in RAM is 4k or 4096 bytes, and that just doesn't seem like a lot to me. I can definitely see times, especially when using lots of recursion, that I would want to have more than 1000 integers in RAM at once. Now, the real solution would be to have the program do this by using malloc and managing its own memory resources, but really I would like to know the user opinion on this.
Is 4k big enough for a stack with modern computer programs? Should the stack be bigger than that? Should the stack be auto-expanding to accommodate any types of sizes? I'm interested in this both from a practical developer's standpoint and a security standpoint.
Is 4k too big for a stack? Considering normal program execution, especially from the point of view of classes in C++, I notice that good source code tends to malloc/new the data it needs when classes are created, to minimize the data being thrown around in a function call.
What I haven't even gotten into is the size of the processor's cache memory. Ideally, I think the stack would reside in the cache to speed things up and I'm not sure if I need to achieve this, or if the processor can handle it for me. I was just planning on using regular boring old RAM for testing purposes. I can't decide. What are the options?
Stack size depends on what your threads are doing. My advice:
make the stack size a parameter at thread creation time (different threads will do different things, and hence will need different stack sizes)
provide a reasonable default for those who don't want to be bothered with specifying a stack size (4K appeals to the control freak in me, as it will cause the stack-profligate to, er, get the signal pretty quickly)
consider how you will detect and deal with stack overflow. Detection can be tricky. You can put guard pages--empty--at the ends of your stack, and that will generally work. But you are relying on the behavior of the Bad Thread not to leap over that moat and start polluting what lies beyond. Generally that won't happen...but then, that's what makes the really tough bugs tough. An airtight mechanism involves hacking your compiler to generate stack checking code. As for dealing with a stack overflow, you will need a dedicated stack somewhere else on which the offending thread (or its guardian angel, whoever you decide that is--you're the OS designer, after all) will run.
I would strongly recommend marking the ends of your stack with a distinctive pattern, so that when your threads run over the ends (and they always do), you can at least go in post-mortem and see that something did in fact run off its stack. A page of 0xDEADBEEF or something like that is handy.
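A minimal sketch of that post-mortem check, simulated in Python over a bytearray instead of real kernel memory. The layout (a red zone at the low end, with the stack growing downward toward it), the sizes, and the names `make_stack`, `push_bytes`, and `redzone_intact` are all assumptions for illustration.

```python
import struct

MARKER = struct.pack("<I", 0xDEADBEEF)   # 4-byte pattern, little-endian
WORD = len(MARKER)


def make_stack(stack_bytes, redzone_bytes):
    """Low addresses hold the red zone; the usable stack sits above it."""
    return bytearray(MARKER * (redzone_bytes // WORD)) + bytearray(stack_bytes)


def push_bytes(memory, depth):
    """Simulate a downward-growing stack that has written `depth` bytes."""
    start = len(memory) - depth
    memory[start:] = b"\xAA" * depth


def redzone_intact(memory, redzone_bytes):
    """Post-mortem check: is the pattern below the stack still untouched?"""
    return memory[:redzone_bytes] == MARKER * (redzone_bytes // WORD)


well_behaved = make_stack(4096, 4096)
push_bytes(well_behaved, 4000)           # stays inside its 4096-byte stack

runaway = make_stack(4096, 4096)
push_bytes(runaway, 4100)                # spills 4 bytes into the red zone

print(redzone_intact(well_behaved, 4096), redzone_intact(runaway, 4096))
```

A distinctive pattern does not prevent the overrun; it only leaves evidence that can be inspected afterwards.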
By the way, x86 page sizes are generally 4k, but they do not have to be. The hardware also supports large pages (2 MB or 4 MB, and 1 GB on x86-64). The usual reason for larger pages is to avoid TLB misses. Again, I would make it a kernel configuration or run-time parameter.
Search for KERNEL_STACK_SIZE in the Linux kernel source code and you will find that it is very much architecture dependent: PAGE_SIZE, 2*PAGE_SIZE, and so on (below are just some of the results; much of the intermediate output has been deleted).
./arch/cris/include/asm/processor.h:
#define KERNEL_STACK_SIZE PAGE_SIZE
./arch/ia64/include/asm/ptrace.h:
# define KERNEL_STACK_SIZE_ORDER 3
# define KERNEL_STACK_SIZE_ORDER 2
# define KERNEL_STACK_SIZE_ORDER 1
# define KERNEL_STACK_SIZE_ORDER 0
#define IA64_STK_OFFSET ((1 << KERNEL_STACK_SIZE_ORDER)*PAGE_SIZE)
#define KERNEL_STACK_SIZE IA64_STK_OFFSET
./arch/ia64/include/asm/mca.h:
u64 mca_stack[KERNEL_STACK_SIZE/8];
u64 init_stack[KERNEL_STACK_SIZE/8];
./arch/ia64/include/asm/thread_info.h:
#define THREAD_SIZE KERNEL_STACK_SIZE
./arch/ia64/include/asm/mca_asm.h:
#define MCA_PT_REGS_OFFSET ALIGN16(KERNEL_STACK_SIZE-IA64_PT_REGS_SIZE)
./arch/parisc/include/asm/processor.h:
#define KERNEL_STACK_SIZE (4*PAGE_SIZE)
./arch/xtensa/include/asm/ptrace.h:
#define KERNEL_STACK_SIZE (2 * PAGE_SIZE)
./arch/microblaze/include/asm/processor.h:
# define KERNEL_STACK_SIZE 0x2000
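To put numbers on the ia64 formula in the listing, here is the same arithmetic as a small Python sketch. The 4096-byte page size is an assumption chosen for illustration only (the real value depends on the kernel configuration), and `kernel_stack_size` is a made-up helper name.

```python
def kernel_stack_size(order, page_size):
    """Same arithmetic as ((1 << KERNEL_STACK_SIZE_ORDER) * PAGE_SIZE)."""
    return (1 << order) * page_size


ASSUMED_PAGE_SIZE = 4096  # assumption; real kernels configure this per architecture

sizes = {order: kernel_stack_size(order, ASSUMED_PAGE_SIZE) for order in range(4)}
for order, size in sizes.items():
    print(f"order {order}: {size} bytes ({size // 1024} KB)")
```

Under that assumption, the fixed 0x2000 used by microblaze is 8192 bytes, the same as order 1.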
I'll throw my two cents in to get the ball rolling:
I'm not sure what a "typical" stack size would be. I would guess maybe 8 KB per thread, and if a thread exceeds this amount, just throw an exception. However, according to this, Windows has a default reserved stack size of 1MB per thread, but it isn't committed all at once (pages are committed as they are needed). Additionally, you can request a different stack size for a given EXE at compile-time with a compiler directive. Not sure what Linux does, but I've seen references to 4 KB stacks (although I think this can be changed when you compile the kernel and I'm not sure what the default stack size is...)
This ties in with the first point. You probably want a fixed limit on how much stack each thread can get. Thus, you probably don't want to automatically allocate more stack space every time a thread exceeds its current stack space, because a buggy program that gets stuck in an infinite recursion is going to eat up all available memory.
If you are using virtual memory, you do want to make the stack growable. Forcing static allocation of stack sizes, as is common in user-level threading packages such as Qthreads and Windows Fibers, is a mess: hard to use, easy to crash. All modern OSes do grow the stack dynamically, I think usually by having a write-protected guard page or two below the current stack pointer. A write there tells the OS that the stack has stepped below its allocated space; the OS then allocates a new guard page below that and makes the page that got hit writable. As long as no single function allocates more than a page of data, this works fine. Or you can use two or four guard pages to allow larger stack frames.
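The guard-page scheme can be sketched as a small simulation. This is Python modelling the bookkeeping only, not real page-table code; the class name `GrowableStack`, the 4096-byte page size, the addresses, and the hard limit on growth are all assumptions for illustration.

```python
PAGE = 4096  # assumed page size


class GrowableStack:
    """Downward-growing stack with guard pages below the committed region."""

    def __init__(self, top, max_pages, guard_pages=1):
        self.top = top                          # highest address (exclusive)
        self.committed_low = top - PAGE         # one page committed up front
        self.limit = top - max_pages * PAGE     # never grow below this address
        self.guard_pages = guard_pages

    def touch(self, address):
        """Return 'ok', 'grew', or 'fault' for a write to `address`."""
        if self.committed_low <= address < self.top:
            return "ok"
        guard_low = self.committed_low - self.guard_pages * PAGE
        in_guard = guard_low <= address < self.committed_low
        if in_guard and address >= self.limit:
            # Commit down to the page that was hit; the guard moves below it.
            self.committed_low = address - (address % PAGE)
            return "grew"
        return "fault"                          # jumped past the guard or hit the limit


stack = GrowableStack(top=0x100000, max_pages=4)
results = [
    stack.touch(0xFF800),   # inside the committed page
    stack.touch(0xFEFF0),   # lands in the guard page, so the stack grows
    stack.touch(0xFC800),   # a large frame that skips over the guard page
]
wide_guard = GrowableStack(top=0x100000, max_pages=4, guard_pages=2)
wide_result = wide_guard.touch(0xFD800)   # two guard pages catch a bigger step
print(results, wide_result)
```

The third access shows the weakness mentioned above: a frame larger than the guard region lands beyond it, which the simulation reports as a fault rather than a clean growth.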
If you want a way to control stack size and your goal is a really controlled and efficient environment, but you do not care about programming in the same style as Linux etc., go for a single-shot execution model where a task is started each time a relevant event is detected, runs to completion, and then stores any persistent data in its task data structure. In this way, all threads can share a single stack. This model is used in many slim real-time operating systems for automotive control and similar applications.
Why not make the stack size a configurable item, either stored with the program or specified when a process creates another process?
There are any number of ways you can make this configurable.
There's a guideline that states "0, 1 or n", meaning you should allow zero, one or any number (limited by other constraints such as memory) of an object - this applies to sizes of objects as well.