I noticed quite a strange thing while trying to allocate a lot of memory on my iPhone 3G running iOS 4.2.1.
When I call malloc(512 * 1024) in a loop it returns a valid pointer for about 1500 times after which I get a NULL and
app(2032,0x3e7518b8) malloc: *** mmap(size=524288) failed (error code=12)
*** error: can't allocate region
It surprised me, since I don't think my iPhone has 750 MB of RAM. Then I added a memset after the malloc and it brought the number of allocations down to 120, which makes much more sense.
Here's the super-simple code that I used:
for (int i = 1; ; ++i)
{
    void *p = malloc(512 * 1024);
    NSLog(@"%d %p", i, p);
    memset(p, 0, 512 * 1024);
}
I thought the iPhone didn't have any virtual memory system that could explain behavior like this. What would be a reasonable explanation?
On iOS (and many other systems), a call to malloc() doesn't actually commit physical memory. It reserves a range of virtual addresses from the OS/kernel, but physical pages are not assigned until the memory is written to (e.g. with memset()). This allows for greater efficiency in the system's memory management, but it can result in misleading malloc() behaviour.
The iPhone definitely has a virtual memory system. What it's missing is the ability to page memory out to disk. In other words, it's missing swap space.
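As a rough illustration of why the memset changes the result, here is a minimal sketch in plain C. The helper names touch_pages and alloc_committed are hypothetical, and the 4 KB page size used below is an assumption, not something stated in the question. Writing one byte per page should be enough to make the kernel back each page with physical memory.

```c
#include <stddef.h>
#include <stdlib.h>

/* Write one byte into every page of a block so that each page has to be
 * backed by physical memory. Returns the number of pages touched.
 * (Hypothetical helper; page_size is supplied by the caller.) */
size_t touch_pages(void *block, size_t size, size_t page_size)
{
    volatile unsigned char *bytes = block;
    size_t pages = 0;

    if (block == NULL || page_size == 0)
        return 0;
    for (size_t offset = 0; offset < size; offset += page_size) {
        bytes[offset] = 0;
        ++pages;
    }
    return pages;
}

/* Allocate a block and touch it immediately. Returns NULL if malloc fails,
 * so the caller never writes through a null pointer. */
void *alloc_committed(size_t size, size_t page_size)
{
    void *block = malloc(size);

    if (block != NULL)
        touch_pages(block, size, page_size);
    return block;
}
```

With 512 KB blocks and an assumed 4 KB page size, touch_pages visits 128 pages per block. A loop that calls alloc_committed until it returns NULL would be expected to stop much earlier than one that only calls malloc, which matches the drop from about 1500 to about 120 iterations described in the question.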
First of all, is it even possible to read back groupshared data? Or does groupshared data have to be copied to some RWBuffer before transferring it to CPU memory? I ask because RWBuffers can't be groupshared (I'm assuming it's because the size of the buffer isn't known at compile time).
For those interested, this is the error it throws when declaring a groupshared buffer:
Shader error in 'FOWComputeShader': 'Result': groupshared variables cannot hold resources at kernel CSMain at ...
Basically, I'm declaring a big groupshared uint array in the shader, 16 KB in size. I bind a ComputeBuffer in the main code to this groupshared array, dispatch the shader, then read back from the buffer. Sadly, the data I read back is all 0.
I'm working in a Unity environment with a compute shader, setting my buffer up like this:
// MapSize is 128 * 128, so 16kb
// sizeof(uint) is the stride size
// ComputeBufferType.Raw, because I intend to use each uint as 4 bytes later on, so I don't want funny stuff to happen to the values
ComputeBuffer FOWMapBuffer = new ComputeBuffer(MapSize, sizeof(uint), ComputeBufferType.Raw);
FOWComputeShader.SetBuffer(kernel, "_FoWMap", FOWMapBuffer);
//just the dispatch
int ThreadCount = Mathf.CeilToInt((float)FOWdata.Count / ThreadGroupSizeX);
FOWComputeShader.Dispatch(kernel, ThreadCount, 1, 1);
//outVisibleToFaction is a byte array of 128 * 128 size
FOWMapBuffer.GetData(outVisibleToFaction);
FOWMapBuffer.Dispose();
Then inside the shader:
// 4096 uints * 4 bytes per uint = 16kb
#define FoWMap_Size 4096
groupshared uint _FoWMap[FoWMap_Size];
[numthreads(32,1,1)]
void CSMain(uint3 id : SV_DispatchThreadID)
{
for (uint i = 0; i < FoWMap_Size; i++)
{
_FoWMap[i] = i;
}
}
That's my environment.
Does anyone know if reading back groupshared data is possible? If so, why is my buffer reading back all 0s?
No, you can't access groupshared memory on the CPU directly. Groupshared memory is a block of on-chip memory and, as the name suggests, it's only shared between the threads inside a single group. So there isn't one single groupshared memory, but rather multiple instances (which may or may not co-exist, depending on hardware and shader). The lifetime of each block of groupshared memory ends once the thread group that it belongs to finishes executing (which allows the hardware to re-use that memory for the next thread group). In your case, for example, you're actually dispatching ThreadCount groups, so there will be that many logical blocks of groupshared memory, each 16 KB in size.
So, as a summary, groupshared memory is more like a temporary cache that the threads inside your thread group can use to communicate with each other. Nothing outside of these 32 threads in your thread group knows about the content or even the existence of that memory (since it only really exists while these threads are executing).
If anything outside of these 32 threads needs to have access to the memory, you will need to write it out to an RW Buffer.
I don't understand the difference between memory leak and null dereferencing. How are these two terms related?
The operating system has a memory map for each process (each executable you start, like the one you compile and run). This memory map tells the OS which pages of physical memory are allocated to the process. A memory leak is when you allocate memory using the new operator in C++ (or malloc() in C) but never release it later. A typical leak happens when you change the address stored in a pointer that was assigned by new without first releasing the memory with delete.
There are 2 types of memory allocation: one is static, the other is dynamic. Static memory allocation works like the following:
unsigned char memory[10];
In this example, I allocate 10 unsigned chars statically (assuming the declaration is at global scope). This means that the storage is reserved at compilation time: the executable describes the space for these unsigned chars. When you launch the executable, the OS sets up that array in RAM (after loading the executable from disk). In this example, memory is an array, which in most expressions converts to a pointer to its first element.
Dynamic memory works like the following (in C++):
unsigned char* memory = new unsigned char[10];
In this example, I allocate 10 unsigned chars on the heap instead of statically in the executable. The heap is managed by the allocator in the language runtime, which requests memory from the OS and grows according to how much memory you allocate. new itself imposes no fixed limit: an allocation fails only when the process runs out of address space or hits a limit set by the OS. Nothing else prevents a program from allocating the whole RAM. If a program runs for a long time and it has memory leaks, it could allocate a lot of RAM until the OS starts having a hard time making it work with the rest of the system (or until the amount of memory allocated to the process is bigger than RAM).
This works through system calls to the OS. The line above compiles to a call into the allocator in the runtime library, which in turn makes a system call to the kernel when it needs more memory. This is OS-specific, so the program will have to be recompiled to work on a different OS.
In the meantime, you can create a nullptr or initialize a pointer to 0 like the following:
unsigned char* ptr = nullptr;
or
unsigned char* ptr = 0;
When you dereference that pointer, the dereference is typically compiled to a memory fetch. (Formally, dereferencing a null pointer is undefined behaviour in C and C++; what follows is what happens on typical systems.) The memory fetch triggers a page fault because the memory at address 0 wasn't allocated to your process. The OS then looks in its memory map for the process, determines that the access wasn't legal, and kills the process.
The two terms are quite different and there isn't much relation between them.
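To make the contrast concrete, here is a small sketch in C (the answer's snippets are C++, so malloc/free stand in for new/delete). The live-block counter and the names tracked_malloc, tracked_free, leak_example and safe_read are hypothetical, introduced only for illustration. The first function leaks by overwriting the only pointer to a block; the second avoids a null dereference by checking before reading.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping: number of blocks allocated but not yet freed. */
static int live_blocks = 0;

static void *tracked_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        ++live_blocks;
    return p;
}

static void tracked_free(void *p)
{
    if (p != NULL) {
        --live_blocks;
        free(p);
    }
}

/* Memory leak: the first block's address is overwritten before it is
 * freed, so it can never be released. Returns the number of blocks
 * still outstanding afterwards. */
int leak_example(void)
{
    unsigned char *memory = tracked_malloc(10);

    memory = tracked_malloc(10); /* the first block is now unreachable */
    tracked_free(memory);        /* frees only the second block */
    return live_blocks;
}

/* Null dereference avoided: check the pointer before reading through it.
 * Returns 0 on success, -1 if ptr is null. */
int safe_read(const unsigned char *ptr, unsigned char *out)
{
    if (ptr == NULL)
        return -1;
    *out = *ptr;
    return 0;
}
```

The leak wastes memory but the program keeps running; reading through the null pointer without the check in safe_read would instead typically end the process immediately.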
I posted here a function that I use to compute an FFT with the Accelerate framework:
Setup the accelerator framework for fft on the iPhone
It is working great.
The thing is that I use it in real time, so for each new audio buffer I call this function with the new buffer.
I get a memory warning, probably because of these lines:
A.realp = (float *) malloc(nOver2 * sizeof(float));
A.imagp = (float *) malloc(nOver2 * sizeof(float));
Questions:
Is there another way besides calling malloc for them again and again? (Don't forget that I have to feed it a new buffer many times a second.)
How exactly do I free them? (Code lines, please.)
Can it be caused by the fact that the FFT is heavy on the system?
Any way to get rid of this warning would help me a lot.
Thanks a lot.
These things should be done once, at the start of your program:
Allocate memory for buffers, using code like float *buffer = malloc(NumberOfElements * sizeof *buffer);.
Create an FFT setup, using code like FFTSetup setup = vDSP_create_fftsetup(log2n, FFT_RADIX2);.
Also test the return values. If malloc or vDSP_create_fftsetup returns 0, write an error message and exit the program or handle the error in some other way.
These things should be done once, at the end of your program:
Destroy the FFT setup, using code like vDSP_destroy_fftsetup(setup);.
Release the memory for the buffers, using code like free(buffer);.
In the middle of your program, while you are processing samples, the code should use the existing buffers and setup. So the variables pointing to the buffers and the setup must be visible to that code. You can either pass them in as parameters (perhaps grouped together in a struct) or make them global (which should be only a temporary solution for small programs).
Your program should be arranged so that it is never necessary to allocate memory or create an FFT setup while samples are being processed.
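The arrangement described above might be sketched as follows. This is only an outline under assumptions: the struct FFTBuffers and the functions fft_buffers_init, fft_buffers_load and fft_buffers_destroy are hypothetical names, and the vDSP setup and the FFT call itself are left out so that the sketch stays plain C. It shows only the allocate once, reuse, free once pattern for the realp and imagp buffers.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical container for the two split-complex buffers. */
typedef struct {
    size_t n_over_2;
    float *realp;
    float *imagp;
} FFTBuffers;

/* Call once at start-up. Returns 0 on success, -1 if allocation fails. */
int fft_buffers_init(FFTBuffers *b, size_t n_over_2)
{
    b->n_over_2 = n_over_2;
    b->realp = malloc(n_over_2 * sizeof *b->realp);
    b->imagp = malloc(n_over_2 * sizeof *b->imagp);
    if (b->realp == NULL || b->imagp == NULL) {
        free(b->realp);
        free(b->imagp);
        b->realp = NULL;
        b->imagp = NULL;
        return -1;
    }
    return 0;
}

/* Call for every audio buffer. Copies even samples to realp and odd
 * samples to imagp, reusing the existing memory; nothing is allocated.
 * samples must hold at least 2 * n_over_2 floats. */
void fft_buffers_load(FFTBuffers *b, const float *samples)
{
    for (size_t i = 0; i < b->n_over_2; ++i) {
        b->realp[i] = samples[2 * i];
        b->imagp[i] = samples[2 * i + 1];
    }
}

/* Call once at shutdown. */
void fft_buffers_destroy(FFTBuffers *b)
{
    free(b->realp);
    free(b->imagp);
    b->realp = NULL;
    b->imagp = NULL;
    b->n_over_2 = 0;
}
```

In the real program, the FFTSetup would presumably live alongside these buffers (for example as another struct member), created next to fft_buffers_init and destroyed next to fft_buffers_destroy.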
All memory that is allocated should be freed eventually.
If you are malloc'ing and never freeing, you will run out of memory. Make sure to 'free' your memory using free().
*Note: free() doesn't actually erase any memory. It simply tells the system that we're done with the memory and it's available for other allocations.
// Example:
// allocating memory
int *intpointer = malloc(sizeof *intpointer);
if (intpointer == NULL) { /* handle the allocation failure */ }
// ... do stuff...
// 'Freeing' it when you are done
free(intpointer);
I get this error only if I select the Release or Distribution configuration on Device; on the Simulator it works well. Where is my mistake?
cc1obj(4113) malloc: *** mmap(size=429379584) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
cc1obj: out of memory allocating 429376832 bytes after a total of 0 bytes
{standard input}:13160:non-relocatable subtraction expression,
Thanks for your help! :)
Probably because your simulator is able to allocate ~500 MB of memory while your iPhone is not. I think you should rethink what you are doing:
Do you really need so much memory?
Isn't it just a calculation bug (maybe a wrong sizeof or similar)?
In any case, this is really too much data to handle.
It looks like you're trying to allocate 429 MB. iPhones don't have that much RAM. I suggest you look at what you're allocating to see why it's so big.
I'm trying to find all abandoned memory using Instruments.
The Leaks test passes; at least, it can't find any memory leak.
I perform some repeated actions between each Marked Heap, and the average is 100,00 kb of heap growth and 1000 objects still alive.
Doing a quick search on each snapshot, I found 700 with a heap of 64 kbytes.
The others are objects used by iOS internals, like:
UIDeviceWhiteColor => responsible caller +[UIColor allocWithZone:], for which I can find only the malloc, but not the release.
I'm using the whiteColor like this:
scoreLabel.textColor = [UIColor whiteColor];
So, are all these objects really being abandoned?
This is a complex example to debug/analyze, as it navigates through 9 UIViewControllers, and each round takes approx. 2 minutes to complete (the user must enter some data ...)
In other, easier parts of this project, the heap really has 0 bytes and 0 objects, but those parts are simple.
thanks for your advice,
regards,
m.
Some things will be cached, and therefore not released. You could try triggering a memory warning.
I wouldn't worry too much about small leaks.