Is the heap preallocated for a process - operating-system

Ever since I was introduced to the concept of a process's heap, I have assumed that the OS allocates it when the process is created. But then I did some research and read a statement here.
It says:
When a program asks malloc for space, malloc asks sbrk to increment the heap size and returns a pointer to the start
of the new region on the heap.
If I understood this correctly, the OS allocates zero cells for the process's heap at creation, and the process only gets heap cells by calling malloc. To me this fits the expression "dynamic allocation" better. Is this correct?

In the figure you can see that a C/C++ program has a free memory area into which the heap and the stack can grow until the region is full. Initially the heap is empty. When a process calls malloc, it first searches its free list; only if no suitable entry is found does it call sbrk() to increase the size of the heap (see "malloc implementation?" for an implementation of malloc()). That is the traditional scheme: many modern implementations also obtain memory with mmap(), typically for large requests, and some use it exclusively.
So the OS does not directly decide how a process's heap is allocated. This is how things work in C/C++, but I think other languages can differ slightly.
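To make the sbrk() step concrete, here is a minimal sketch of an allocator that only ever moves the program break. It assumes a Linux/glibc system where sbrk() is available; bump_alloc and heap_growth are hypothetical names made up for this illustration, and a real malloc() additionally keeps a free list and reuses freed blocks.

```c
#define _DEFAULT_SOURCE   /* exposes sbrk() in glibc headers */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical bump allocator: every request moves the program break up.
 * It never frees and keeps no free list, unlike a real malloc(). */
void *bump_alloc(size_t size)
{
    /* Round up to a multiple of 16 so consecutive blocks stay 16 bytes apart. */
    size_t rounded = (size + 15u) & ~(size_t)15u;

    void *old_break = sbrk((intptr_t)rounded);   /* returns the previous break */
    if (old_break == (void *)-1)
        return NULL;                             /* the kernel refused to grow the heap */
    return old_break;                            /* start of the newly added region */
}

/* Number of bytes the heap grew by between two sbrk(0) snapshots. */
size_t heap_growth(const void *before, const void *after)
{
    assert((const char *)after >= (const char *)before);
    return (size_t)((const char *)after - (const char *)before);
}
```

Taking a snapshot with sbrk(0) before and after a call to bump_alloc shows the break moving by the rounded request size, which is the "heap cells appear only on demand" behaviour asked about in the question.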

Related

Differentiate between memory leak and NULL dereferencing

I don't understand the difference between memory leak and null dereferencing. How are these two terms related?
The operating system has a memory map for each process (each executable you start, like the one you compile and run). This memory map tells the OS which pages of physical memory are allocated to the process. A memory leak is when you allocate memory using the new operator in C++ (or malloc() in C) but never release it later. The leak actually happens when you lose the last pointer to a block allocated with new, for example by assigning a different address to it, without first releasing the memory with delete.
There are two types of memory allocation: static and dynamic. Static memory allocation works like the following:
unsigned char memory[10];
In this example, I allocate 10 unsigned chars statically (assuming the declaration is at file scope). This means that the memory is laid out at compilation time: the executable reserves space for the array. When you launch the executable, the OS places that array in RAM (after loading the executable from disk). In this example, memory is an array, and in most expressions its name converts to a pointer to the first element of the unsigned char array.
Dynamic memory works like the following (in C++):
unsigned char* memory = new unsigned char[10];
In this example, I allocate 10 unsigned chars on the heap instead of statically in the executable. The heap is managed by the language runtime's allocator, which asks the OS for more memory as your allocations grow. new itself imposes no fixed limit on how much you can allocate; apart from limits set by the OS, nothing prevents a program from allocating the whole RAM. If a program runs for a long time and it has memory leaks, it could allocate a lot of RAM until the OS starts having a hard time making it work with the rest of the system (or until the amount of memory allocated to the process is bigger than RAM).
This works through system calls to the OS. When you compile a program containing the dynamic allocation above, the line compiles to a call into the runtime's allocator, which in turn makes a system call to the kernel when it needs more memory. This is OS specific, so the program will have to be recompiled to work on a different OS.
Separately, you can create a null pointer with nullptr, or by initializing a pointer to 0, like the following:
unsigned char* ptr = nullptr;
or
unsigned char* ptr = 0;
When you dereference that pointer, the dereference is typically compiled to a memory fetch. The memory fetch triggers a page fault because the memory at address 0 wasn't allocated to your process. The OS then looks in its memory map for the process, determines that this access wasn't legal, and kills the process.
The two terms are quite different and there isn't much relation between them.
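The two failure modes can be put side by side in a short sketch. It is written in C with malloc()/free() rather than new/delete, and the counting wrappers (counted_malloc, counted_free, outstanding_blocks) are hypothetical helpers invented here only to make the leak observable; they are not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping: blocks that were allocated but not yet freed. */
static long live_blocks = 0;

void *counted_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        live_blocks++;
    return p;
}

void counted_free(void *p)
{
    if (p != NULL) {
        live_blocks--;
        free(p);
    }
}

long outstanding_blocks(void) { return live_blocks; }

/* Memory leak: the only pointer to the first block is overwritten,
 * so that block can never be released. */
void leaky(void)
{
    unsigned char *memory = counted_malloc(10);
    memory = counted_malloc(10);   /* the first block is now unreachable */
    counted_free(memory);          /* frees only the second block */
}

/* No leak: each block is released before the pointer is reused. */
void tidy(void)
{
    unsigned char *memory = counted_malloc(10);
    counted_free(memory);
    memory = counted_malloc(10);
    counted_free(memory);
}

/* Null dereference avoided: check before reading through the pointer.
 * Returns -1 when ptr is null instead of touching address 0. */
int read_first_byte(const unsigned char *ptr)
{
    if (ptr == NULL)
        return -1;
    return ptr[0];
}
```

Calling tidy() leaves the outstanding count unchanged, while each call to leaky() leaves one block allocated for the rest of the process's life. A leak therefore wastes memory the process legitimately owns, whereas a null dereference is an access to memory it never owned.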

pycuda shared memory up to device hard limit

This is an extension of the discussion here: pycuda shared memory error "pycuda._driver.LogicError: cuLaunchKernel failed: invalid value"
Is there a method in pycuda that is equivalent to the following C++ API call?
#define SHARED_SIZE 0x18000 // 96 kbyte
cudaFuncSetAttribute(func, cudaFuncAttributeMaxDynamicSharedMemorySize, SHARED_SIZE);
Working on a recent GPU (Nvidia V100), going beyond 48 kbyte shared memory requires this function attribute be set. Without it, one gets the same launch error as in the topic above. The "hard" limit on the device is 96 kbyte shared memory (leaving 32 kbyte for L1 cache).
There's a deprecated method Function.set_shared_size(bytes) that sounds promising, but I can't find what it's supposed to be replaced by.
PyCUDA uses the driver API, and the corresponding function call for setting a function's dynamic shared memory limit is cuFuncSetAttribute.
I can't find that anywhere in the current PyCUDA tree, and therefore suspect that it has not been implemented.
I'm not sure if this is what you're looking for, but this might help someone looking in this direction.
The dynamic shared memory size in PyCUDA can be set either using:
shared argument in the direct kernel call (the "unprepared call"). For example:
myFunc(arg1, arg2, shared=numBytes, block=(1,1,1), grid=(1,1))
shared_size argument in the prepared kernel call. For example:
myFunc.prepared_call(grid, block, arg1, arg2, shared_size=numBytes)
where numBytes is the amount of memory in bytes you wish to allocate at runtime.

Can't get max ram size - STM32 with rtos

I'm using an STM32F103R8T6 and I'm currently setting the maximum heap size for the RTOS.
When I try setting it to 12000:
#define configTOTAL_HEAP_SIZE ((size_t)12000)
compilation fails with this error:
region `RAM' overflowed by 780 bytes Project-STM32 C/C++ Problem
So what's the maximum I can use?
Look in the linker (.ld) file. You'll see a section defining RAM. That will tell you how much RAM you have, assuming the linker file was properly generated.
The error message you've pasted indicates that the linker went 780 bytes past the end of the available RAM area. In your case (STM32F103R8T6), it tried to place 21260 bytes (20 KB + 780) into RAM, which is defined to fit only 20 KB. If you decrease configTOTAL_HEAP_SIZE by the amount reported by the linker, it'll likely link successfully. There will, however, be zero space remaining for the regular (non-RTOS) heap, so no malloc or new will succeed, in case any part of your code wanted to use it.
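The arithmetic in that paragraph can be written out as a small sketch. The 20 KB RAM size, the 780-byte overflow and the 12000-byte request come from the question and the linker message; the function names are hypothetical, and the result assumes nothing else in RAM changes and that the linker adds no alignment padding.

```c
#include <assert.h>
#include <stddef.h>

#define RAM_SIZE_BYTES (20u * 1024u)   /* 20 KB of RAM, as stated above */

/* Total number of bytes the linker tried to place into RAM. */
size_t ram_demand(size_t ram_size, size_t overflow)
{
    return ram_size + overflow;
}

/* Largest RTOS heap that should still link, assuming every other
 * section keeps its current size. */
size_t max_heap_that_links(size_t requested_heap, size_t overflow)
{
    assert(requested_heap > 0);
    return overflow >= requested_heap ? 0 : requested_heap - overflow;
}
```

With the numbers from the question, ram_demand(RAM_SIZE_BYTES, 780) gives 21260 and max_heap_that_links(12000, 780) gives 11220, so a configTOTAL_HEAP_SIZE of about 11220 is the upper bound under these assumptions.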
You can determine exactly what gets put into RAM by your linker by analyzing your *.map file (sidenote: map file is created only if your program gets linked successfully, so you need to at least get it to that state). When you open it, search for 20000000 (start of your RAM region) and there you should see what exactly gets put there, including size of each chunk.
Unless you did something out-of-ordinary to your project (which I think is safe to assume you didn't as you mention using generated project), your RAM area during linking will need to at least fit the following sections:
.data segment where things like global variables initialized by value live
.bss segment which is similar to the one above except values are zero-initialized. This is where eventually the byte array of size configTOTAL_HEAP_SIZE will be put that RTOS uses as its own heap
Stack (don't confuse with RTOS stack sizes, this one is totally separate) - stack used outside of RTOS tasks. This has a constant size - consult your sections.ld file to find the value.
Heap segment that has a size calculated dynamically by the linker and which is equal to total size of RAM minus size of all other sections. The bigger you make your other segments, the smaller your regular heap will be.
Having said that, apart from going through the *.map file to determine what else other than the RTOS heap occupies your RAM, I'd also think twice about why you need 12 KB (out of 20 KB total) assigned only to the RTOS heap: do you need so many tasks, do they need such large stacks, and do you need so many or such large queues, mutexes and semaphores?

Heap Memory Overflow

I have written some code for my program in Matlab 7.10.0, which has a graphical user interface, but sometimes I receive this error in the command window:
[ConditionalEventPump] Exception occurred during event dispatching:
java.lang.OutOfMemoryError: Java heap space
And the system gets too slow...
Please help me with this: how can I resolve this heap memory issue?
You need to increase the amount of heap memory for the JVM, see this "solution"
Go to Preferences -> General -> Java Heap Memory and increase the heap size. Restart MATLAB and check if it works (if not, maybe you could consider increasing a bit more).

The stack size used in kernel development

I'm developing an operating system and rather than programming the kernel, I'm designing the kernel. This operating system is targeted at the x86 architecture and my target is modern computers. The estimated amount of RAM required is 256 MB or more.
What is a good size to make the stack for each thread run on the system? Should I try to design the system in such a way that the stack can be extended automatically if the maximum length is reached?
If I remember correctly, a page in RAM is 4k, or 4096 bytes, and that just doesn't seem like a lot to me. I can definitely see times, especially when using lots of recursion, that I would want to have more than 1000 integers in RAM at once. Now, the real solution would be to have the program do this by using malloc and managing its own memory resources, but really I would like to know the user opinion on this.
Is 4k big enough for a stack with modern computer programs? Should the stack be bigger than that? Should the stack be auto-expanding to accommodate any types of sizes? I'm interested in this both from a practical developer's standpoint and a security standpoint.
Is 4k too big for a stack? Considering normal program execution, especially from the point of view of classes in C++, I notice that good source code tends to malloc/new the data it needs when classes are created, to minimize the data being thrown around in a function call.
What I haven't even gotten into is the size of the processor's cache memory. Ideally, I think the stack would reside in the cache to speed things up and I'm not sure if I need to achieve this, or if the processor can handle it for me. I was just planning on using regular boring old RAM for testing purposes. I can't decide. What are the options?
Stack size depends on what your threads are doing. My advice:
make the stack size a parameter at thread creation time (different threads will do different things, and hence will need different stack sizes)
provide a reasonable default for those who don't want to be bothered with specifying a stack size (4K appeals to the control freak in me, as it will cause the stack-profligate to, er, get the signal pretty quickly)
consider how you will detect and deal with stack overflow. Detection can be tricky. You can put guard pages--empty--at the ends of your stack, and that will generally work. But you are relying on the behavior of the Bad Thread not to leap over that moat and start polluting what lies beyond. Generally that won't happen...but then, that's what makes the really tough bugs tough. An airtight mechanism involves hacking your compiler to generate stack checking code. As for dealing with a stack overflow, you will need a dedicated stack somewhere else on which the offending thread (or its guardian angel, whoever you decide that is--you're the OS designer, after all) will run.
I would strongly recommend marking the ends of your stack with a distinctive pattern, so that when your threads run over the ends (and they always do), you can at least go in post-mortem and see that something did in fact run off its stack. A page of 0xDEADBEEF or something like that is handy.
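The marker-pattern idea can be sketched in a few lines. This assumes a stack that grows downward from the highest word of the region toward stack[0]; the function names and the word-sized fill are choices made for this illustration, not part of any particular kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xDEADBEEFu   /* distinctive marker value */

/* Paint the whole stack region with the marker before the thread starts. */
void stack_paint(uint32_t *stack, size_t words)
{
    assert(stack != NULL);
    for (size_t i = 0; i < words; i++)
        stack[i] = STACK_FILL;
}

/* Count words at the low end that still hold the marker.
 * Assumes the stack grows downward, from stack[words - 1] toward stack[0]. */
size_t stack_unused_words(const uint32_t *stack, size_t words)
{
    size_t n = 0;
    while (n < words && stack[n] == STACK_FILL)
        n++;
    return n;
}

/* Post-mortem check: if the lowest word no longer holds the marker,
 * the thread reached (or ran past) the end of its stack. */
int stack_overflow_suspected(const uint32_t *stack, size_t words)
{
    return words == 0 || stack[0] != STACK_FILL;
}
```

Besides the post-mortem check, the count of untouched words gives a rough high-water mark, which helps when choosing a default stack size. It is only a heuristic: a thread that happens to store the marker value itself, or that skips over part of the region, will not be measured exactly.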
By the way, x86 pages are generally 4k, but they do not have to be. x86 also supports large pages (2 MB or 4 MB, and 1 GB on x86-64). The usual reason for larger pages is to avoid TLB misses. Again, I would make it a kernel configuration or run-time parameter.
Search for KERNEL_STACK_SIZE in the Linux kernel source code and you will find that it is very much architecture dependent: PAGE_SIZE, 2*PAGE_SIZE, etc. (below are just some of the results; many intermediate lines of output are deleted).
./arch/cris/include/asm/processor.h:
#define KERNEL_STACK_SIZE PAGE_SIZE
./arch/ia64/include/asm/ptrace.h:
# define KERNEL_STACK_SIZE_ORDER 3
# define KERNEL_STACK_SIZE_ORDER 2
# define KERNEL_STACK_SIZE_ORDER 1
# define KERNEL_STACK_SIZE_ORDER 0
#define IA64_STK_OFFSET ((1 << KERNEL_STACK_SIZE_ORDER)*PAGE_SIZE)
#define KERNEL_STACK_SIZE IA64_STK_OFFSET
./arch/ia64/include/asm/mca.h:
u64 mca_stack[KERNEL_STACK_SIZE/8];
u64 init_stack[KERNEL_STACK_SIZE/8];
./arch/ia64/include/asm/thread_info.h:
#define THREAD_SIZE KERNEL_STACK_SIZE
./arch/ia64/include/asm/mca_asm.h:
#define MCA_PT_REGS_OFFSET ALIGN16(KERNEL_STACK_SIZE-IA64_PT_REGS_SIZE)
./arch/parisc/include/asm/processor.h:
#define KERNEL_STACK_SIZE (4*PAGE_SIZE)
./arch/xtensa/include/asm/ptrace.h:
#define KERNEL_STACK_SIZE (2 * PAGE_SIZE)
./arch/microblaze/include/asm/processor.h:
# define KERNEL_STACK_SIZE 0x2000
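The ia64 lines above derive the stack size from an "order", that is, a power-of-two number of pages. A minimal sketch of that formula follows; stack_size_from_order is a hypothetical helper, and the page size is passed in as a parameter because it differs between architectures and configurations.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors ((1 << KERNEL_STACK_SIZE_ORDER) * PAGE_SIZE) from the listing:
 * order 0 is one page, order 1 is two pages, order 2 is four pages, ... */
size_t stack_size_from_order(unsigned order, size_t page_size)
{
    assert(order < 16);   /* arbitrary sanity bound chosen for this sketch */
    return ((size_t)1 << order) * page_size;
}
```

For example, with a 4096-byte page, order 1 yields 8192 bytes, which matches the 0x2000 value in the microblaze line.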
I'll throw my two cents in to get the ball rolling:
I'm not sure what a "typical" stack size would be. I would guess maybe 8 KB per thread, and if a thread exceeds this amount, just throw an exception. However, according to this, Windows has a default reserved stack size of 1MB per thread, but it isn't committed all at once (pages are committed as they are needed). Additionally, you can request a different stack size for a given EXE at compile-time with a compiler directive. Not sure what Linux does, but I've seen references to 4 KB stacks (although I think this can be changed when you compile the kernel and I'm not sure what the default stack size is...)
This ties in with the first point. You probably want a fixed limit on how much stack each thread can get. Thus, you probably don't want to automatically allocate more stack space every time a thread exceeds its current stack space, because a buggy program that gets stuck in an infinite recursion is going to eat up all available memory.
If you are using virtual memory, you do want to make the stack growable. Forcing statically sized stacks, as is common in user-level threading packages such as Qthreads and Windows Fibers, is a mess: hard to use, easy to crash. All modern OSes grow the stack dynamically, I think usually by keeping a write-protected guard page or two below the current stack pointer. A write to a guard page tells the OS that the stack has stepped below its allocated space; you then allocate a new guard page below that and make the page that got hit writable. As long as no single function allocates more than a page of data, this works fine. Or you can use two or four guard pages to allow larger stack frames.
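The bookkeeping behind such a growable stack can be sketched in user space with mmap() and mprotect(), assuming a POSIX system. This is only a model of the idea: in a kernel the page-fault handler would do the equivalent of gs_grow when a guard page is hit, whereas here it is called explicitly. The grow_stack type and the gs_* names are hypothetical.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS in glibc headers */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct {
    char  *base;       /* lowest address of the reserved range */
    size_t page;       /* page size in bytes */
    size_t reserved;   /* total pages reserved: the hard limit */
    size_t committed;  /* usable pages, counted down from the top */
} grow_stack;

/* Make the next lower page writable. The lowest page is never committed,
 * so it always stays an inaccessible guard. Returns 0 on success, -1 on failure. */
int gs_grow(grow_stack *s)
{
    if (s->committed + 1 >= s->reserved)
        return -1;                                   /* fixed limit reached */
    char *next = s->base + (s->reserved - s->committed - 1) * s->page;
    if (mprotect(next, s->page, PROT_READ | PROT_WRITE) != 0)
        return -1;
    s->committed++;
    return 0;
}

/* Reserve max_pages of address space with no access, then commit the top page. */
int gs_init(grow_stack *s, size_t max_pages)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || max_pages < 2)
        return -1;
    void *p = mmap(NULL, max_pages * (size_t)page, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    s->base = p;
    s->page = (size_t)page;
    s->reserved = max_pages;
    s->committed = 0;
    return gs_grow(s);
}

/* Lowest address that is currently writable. */
char *gs_low_water(const grow_stack *s)
{
    assert(s->committed <= s->reserved);
    return s->base + (s->reserved - s->committed) * s->page;
}
```

Reserving the whole range up front gives each stack a fixed upper bound, so a runaway recursion stops at the limit instead of consuming all memory, while physical pages are only handed out as the stack actually deepens.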
If you want a way to control stack size and your goal is a really controlled and efficient environment, but you do not care about programming in the same style as Linux etc., go for a single-shot execution model where a task is started each time a relevant event is detected, runs to completion, and then stores any persistent data in its task data structure. In this way, all threads can share a single stack. This model is used in many slim real-time operating systems for automotive control and similar applications.
Why not make the stack size a configurable item, either stored with the program or specified when a process creates another process?
There are any number of ways you can make this configurable.
There's a guideline that states "0, 1 or n", meaning you should allow zero, one or any number (limited by other constraints such as memory) of an object - this applies to sizes of objects as well.