When we define a variable with following syntax does that mean it is hanging in the memory all the time:
static NSString *const kMyLabel = #"myLabel";
I have 100 of constants. Should I go with this of #define pre-processor compiler considering that #define will not keep them alive in the memory.
Hardcoded strings, in the format #"my string", are baked into the application binary. In order to make it not be permanent, you'd have to do:
static NSString *kMyLabel = nil;
...somewhere else
kMyLabel = [[NSMutableString alloc] initWithString:#"myLabel"];
But that'd be stupid, because then you'd have both #"myLabel" in memory (because it's part of the app binary) AND your allocated string. So double the memory.
In short:
If you have a constant string, there's no way to "unload" it from memory. And unless you're hard coding a few chapters from a book into your binary, it's not going to be something to worry about. Have you measured it as being a performance issue?
There would be no difference between a constant static variable and #define directive. When using #define, the preprocessor will replace the variable with #"myLabel" every time it is used. This could mean that you have one instance of the string for each use, but the compiler combines them so that any strings in the binary are unique. Using the constant static, the code will load the location of the variable when needed. This means #define may be a tiny bit faster as there is less dereferencing to get the string, but it would be unnoticeable.
It will be "in memory", but it will just be a memory mapped section of your application's executable file. If there's memory pressure, that page will be flushed without writing to disk.
Basically, it's "free" except for a tiny bit of IO on startup. Go nuts with them.
Related
I'm relatively new to CUDA programming, so I want to clarify the behaviour of a struct when I pass it into a kernel. I've defined the following struct to somewhat imitate the behavior of a 3D array that knows its own size:
struct protoarray {
size_t dim1;
size_t dim2;
size_t dim3;
float* data;
};
I create two variables of type protoarray, dynamically allocate space to data via malloc and cudaMalloc on the host and device side, and update dim1, dim2 and dim3 to reflect the size of array I want this struct to represent. I read in this thread that the struct should be passed via copy. So this is what I do in my kernel
__global__ void kernel(curandState_t *state, protoarray arr_device){
const size_t dim1 = arr_device.dim1;
const size_t dim2 = arr_device.dim2;
for(size_t j(0); j < dim2; j++){
for(size_t i(0); i < dim1; i++){
// Do something
}
}
}
The struct is passed by copy, so all its contents are copied into shared memory of each block. This is where I'm getting bizarre behaviour, which I'm hoping you could help me with. Suppose I had set arr_device.dim1 = 2 on the host side. While debugging inside the kernel and setting a breakpoint at one of the for loops, checking the value of arr_device.dim1 yields something like 16776576, nowhere large enough to cause overflow, but this value copies correctly into dim1 as 2, which means that the for loops execute as I intended them to. As a side question, is using size_t which is essential unsigned long long int bad practice, seeing as the GPU's are made of 32bit cores?
Generally, how safe is it to pass struct and class into kernels as arguments, is bad practice that should be avoided at all cost? I imagine that passing pointers to classes to kernels is difficult in case they contain members which point to dynamically allocated memory, and that they should be very lightweight if I want to pass them by value.
This is a partial answer, since without a proper program to look into, it is difficult/impossible to guess why you would see an invalid value in your arr_device.dim1.
The struct is passed by copy, so all its contents are copied into shared memory of each block.
Incorrect. Kernel arguments are stored in constant memory, which is device-global and not block-specific. They are not stored shared memory (which is block-specific).
When a thread runs, it typically reads arguments from constant memory into registers (and again, not shared memory).
Generally, how safe is it to pass struct and class into kernels as arguments
My personal rule of thumb on this matter is: If the struct/class...
is trivially-copyable; and
all its members of the struct/class are defined both for the host and the device side, or at least - designed with GPU use in mind;
then it should be safe to pass to a kernel.
passing struct and class into kernels as arguments [ - ] is [it] bad practice that should be avoided at all cost?
No. But remember that most C++ libraries only provide host-side code; and were not written with a mind of being used on a GPU. So I'd be wary of using non-trivial classes without a lot of scrutiny.
I imagine that passing pointers to classes to kernels is difficult in case they contain members which point to dynamically allocated memory
Yes, this can be problematic. However - if you used cuda::memory::managed::allocate(), cuda::memory::managed::make_unique() or cudaMallocManaged() - then this should "just work", i.e. the relevant memory pages will be fetched to the GPU or the CPU as necessary when accessed. See:
Unified Memory in CUDA for beginners
Beyond GPU Memory Limits with Unified Memory on Pascal
and that they should be very lightweight if I want to pass [objects to kernels] by value.
Yes, because each and every thread has to read each argument from constant memory before it can use that argument. And while constant memory allows this to happen relatively quickly, it's still a bunch of overhead that you want to minimize.
Also remember that you can't pass anything to kernels by (C++) reference; it's all "by-value" - the object itself or a pointer to it.
I need to read several dozen files and do some trivial processing with their contents. Each file individually won't cause problems, but having all the data loaded at once will quickly exhaust my memory.
I started with:
for (NSString *filename in filenames)
do_something([NSData dataWithContentsOfFile:filename]);
Then of course, I remembered that Objective-C on the iPhone is not really garbage collected, and those would all stick around until the end of the frame anyway. Okay:
for (NSString *filename in filenames) {
NSData *d = [[NSData alloc] initWithContentsOfFile:filename];
do_something(d);
[d release];
}
This nominally only uses as much memory as the largest file, but that's only assuming the allocator is playing friendly at the moment - it could also thrash and fragment everything.
Is there some way I can make an NSMutableData, and keep reusing that Data's buffer, growing it as necessary? I need it as an NSData for other third-party APIs. The best idea I have at the moment is mallocing/reallocing a char* buffer as I go, reading using e.g. stdio, and constructing NSDatas with freeWhenDone:NO backed by that; that way I only thrash/retain a small amount per file.
What you are doing is the second example is fine. Even if you reused an NSMutableData object for its capacity another NSData object would need to be created with the file contents. If you are running into memory issues consider modifying do_something() to work with NSInputStreams.
You could use -[NSData initWithContentsOfMappedFile:] with your second example to keep the memory usage as low as possible.
From the documentation:
A mapped file uses virtual memory techniques to avoid copying pages of the file into memory until they are actually needed.
Do
[[NSMutableArray alloc] init];
and
[[NSMutableArray alloc] initWithCapacity:0];
compile into the exact same thing?
If they differ, then how, and which form is "better" in terms of memory and runtime performance?
No, they will not behave identically. init will create an array with some unknown but most likely nonzero capacity—whatever the authors decided was a reasonable default for most situations. initWithCapacity:0 will request an array with absolutely no allocated space. What this means is up to the NSMutableArray implementation: it might not allow a capacity of 0 and behave exactly as init, or it might not immediately allocate anything and instead allocate some amount (the same as init, possibly, or maybe not) when you first need it.
Which will perform "better" completely depends on how you expect to use the array. -init says "I have no expectations for this array; I'll either be adding to and removing from it a lot, or it will likely be pretty small." -initWithCapacity: says "I expect this array to have no more than this many elements for the foreseeable future."
About the only place where I would expect initWithCapacity:0 to be reasonable is if you are creating an array that you don't expect to fill with anything for a fairly long period of time.
Note that the standard performance caveat applies here: it's probably not an issue unless you profile and determine that it is. I can't really imagine a situation (except for thousands upon thousands of NSMutableArray objects) where the difference between these two will be substantial.
I've been looking through the apple documentation for the NSdata class, and I didn't really find it too enlightening. I know how to use the class but I don't really understand the gravity of the advantages that it may or may not provide. I know its a simple question but perhaps it would be good to have such information as a reference.
Advantages over what? Certainly, it's useful to represent an arbitrary block of data as an object just as it's useful to represent a string, a number, or a value as an object. Memory management becomes simpler and is consistent with memory management for all other objects, and there are a number of useful methods defined.
Say you want to read a binary file into memory. We won't worry about the reasons why -- there are as many reasons as there are data file formats. You'll have to:
Check the size of the file
Allocate a block of memory of the proper size
Open the file
Read the contents into memory
Close the file
Remember to free the memory when you're done with it (a condition that can sometimes be tricky to detect)
(Optional) Worry about whether the block of memory has been modified
With NSData, you can just create a new instance from a path or URL and not have to think about the rest.
After testing my app with Instruments I realized that the current CSV parser I use has a huge memory footprint. Does anybody have a recommendation for one with a low memory footprint?
You probably should do this row-by-row, rather than reading the whole file, parsing it, and returning an array with all the rows in it. In any case, the code you linked to produces zillions of temporary objects in a loop, which means it'll have very high memory overhead.
A quick fix would be to create an NSAutoreleasePool at the lop of the loop, and drain it at the bottom:
while ( ![scanner isAtEnd] ) {
NSAutoreleasePool *innerPool = [[NSAutoreleasePool alloc] init];
... bunch of code...
[innerPool drain];
}
This will wipe out the temporary objects, so your memory usage will be the size of the data, plus an object for each string in the file (roughly 8 bytes * rows * columns)
There are some other CSV parsers to try:
http://michael.stapelberg.de/cCSVParse
http://cocoawithlove.com/2009/11/writing-parser-using-nsscanner-csv.html (my own blog)
You could experiment to see if either is lower memory overhead.
Neither of these supports "event based" parsing. In event based parsing, you never load the whole source file into memory, just enough of the file to read the current row (you can also do this in-progress on a download). You must handle each row as it is read and make certain all data from the source is freed between rows.
This would be the theoretical lowest overhead solution. If you really needed low overhead, you should adapt an existing solution to do that (I don't have any advice on how this would be done).
It's not a CSV parser, but my open source Cocoa ParseKit framework has a powerfull/convenient/configurable string tokenizer which might be handy for CSV or other types of parsing/tokenizing.
The framework:
http://parsekit.com
Some usage documentation:
http://parsekit.com/tokenization.html
The PKTokenizer class:
http://github.com/itod/parsekit/blob/master/include/ParseKit/PKTokenizer.h
http://github.com/itod/parsekit/blob/master/src/PKTokenizer.m