NSKeyedArchiver on NSArray has large size overhead - iphone

I'm using NSKeyedArchiver in a Mac OS X program that generates data for an iPhone application. I found that, by default, the resulting archives are much bigger than I expected. Example:
NSMutableArray *ar = [NSMutableArray arrayWithCapacity:10];
for (int i = 0; i < 100000; i++) {
    // "item%06d" yields 10-character strings: item000000 ... item099999
    NSString *s = [NSString stringWithFormat:@"item%06d", i];
    [ar addObject:s];
}
[NSKeyedArchiver archiveRootObject:ar toFile:@"NSKeyedArchiver.test"];
This stores 10 * 100,000 = 1M bytes of useful data, yet the resulting file is almost three megabytes. The overhead seems to grow with the number of items in the array: for 1,000 items (10 KB of useful data), the file was about 22 KB.
"file" reports that it is an "Apple binary property list" (not the XML format).
Is there a simple way to prevent this huge overhead? I wanted to use NSKeyedArchiver for the simplicity it provides. I could write the data in my own, non-generic binary format, but that's not very elegant. Aggregating the data into large chunks and feeding those to NSKeyedArchiver should also work, but again, that rather defeats the purpose of using a simple, ready-to-use archiver. Am I missing some method call or usage pattern that would reduce this overhead?
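For comparison, the "own binary format" fallback mentioned above can be quite small. A minimal sketch in plain C (callable from Objective-C), assuming a made-up layout of a 4-byte item count followed by a 2-byte length prefix and the raw bytes of each string; packed_size, pack_strings and unpack_string are hypothetical names, not part of any Apple API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: a 4-byte big-endian item count, then for each
 * string a 2-byte big-endian length followed by its raw bytes. */

/* Number of bytes pack_strings will write for these n strings. */
size_t packed_size(const char **items, size_t n) {
    size_t total = 4;                        /* item count */
    for (size_t i = 0; i < n; i++)
        total += 2 + strlen(items[i]);       /* length prefix + bytes */
    return total;
}

/* Packs n strings into out (size it with packed_size first).
 * Returns the number of bytes written. */
size_t pack_strings(const char **items, size_t n, unsigned char *out) {
    unsigned char *p = out;
    *p++ = (unsigned char)(n >> 24);
    *p++ = (unsigned char)(n >> 16);
    *p++ = (unsigned char)(n >> 8);
    *p++ = (unsigned char)n;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(items[i]);
        assert(len <= 0xFFFF);               /* must fit the 2-byte prefix */
        *p++ = (unsigned char)(len >> 8);
        *p++ = (unsigned char)len;
        memcpy(p, items[i], len);
        p += len;
    }
    return (size_t)(p - out);
}

/* Copies string number idx out of a packed buffer into dst as a C string
 * (dst needs len + 1 bytes). Returns its length. */
size_t unpack_string(const unsigned char *buf, size_t idx, char *dst) {
    const unsigned char *p = buf + 4;        /* skip the item count */
    for (size_t i = 0; i < idx; i++)
        p += 2 + ((size_t)p[0] << 8 | p[1]); /* hop over earlier records */
    size_t len = (size_t)p[0] << 8 | p[1];
    memcpy(dst, p + 2, len);
    dst[len] = '\0';
    return len;
}
```

With 10-character strings each record is 12 bytes, so 100,000 items come to 4 + 100,000 * 12 = 1,200,004 bytes. The trade-off is the one the question already names: the format is specific to this data, and random access by index needs either a linear scan (as here) or an extra offset table.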

I filed a bug to track this.
That aside, NSKeyedArchiver is designed for archiving object networks: if an object appears twice in the graph, you'll find after unarchiving that it still does (as one shared object). You're probably seeing the overhead of this kind of uniquing.
For hierarchically structured data, as opposed to arbitrary object networks, try NSPropertyListSerialization. I see 1.8 MB for a binary plist.
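To make the "uniquing" point concrete, here is a toy model in plain C. It illustrates the idea only and is not NSKeyedArchiver's actual on-disk format; ObjectTable, table_ref and archive_array are hypothetical names. Each distinct object is stored once, and each array slot stores just a reference into the table.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of uniquing, NOT NSKeyedArchiver's real format: each distinct
 * object is stored once in a table, and every array slot stores only a
 * reference (an index) into that table. */
#define MAX_OBJECTS 64

typedef struct {
    const void *objects[MAX_OBJECTS];   /* each distinct object, once */
    size_t count;
} ObjectTable;

/* Returns the table index for obj, adding it on first sight. Objects are
 * matched by identity (pointer equality), as in an object graph. */
size_t table_ref(ObjectTable *t, const void *obj) {
    for (size_t i = 0; i < t->count; i++)   /* linear scan; real code would hash */
        if (t->objects[i] == obj)
            return i;
    assert(t->count < MAX_OBJECTS);
    t->objects[t->count] = obj;
    return t->count++;
}

/* "Archives" an array as n references; returns the number of distinct objects. */
size_t archive_array(ObjectTable *t, const void **items, size_t n, size_t *refs) {
    for (size_t i = 0; i < n; i++)
        refs[i] = table_ref(t, items[i]);
    return t->count;
}
```

In a scheme like this, the question's 100,000 distinct strings share nothing, so the table still ends up holding every string and the per-element references (plus the table bookkeeping) are pure added cost. A plain property list has no such indirection layer, which is consistent with the smaller binary plist.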

Related

Get lower quality UIImage from NSData?

I have an image compressed into NSData using JPEG compression. I access it with [UIImage imageWithContentsOfFile:]. With larger images though, this takes a few seconds. Is there a faster way to load images from the file system, perhaps at the same speed that images are loaded from the bundle? And if not, is there a way to load a lower quality version of the image temporarily while the full quality version loads, other than saving a lower quality version too?
Although you may be able to build something like this using JPEG 2000 (you'd need to build your own copy of the JPEG library and then hand-write the reading code), I don't think you're going to get a good return on investment there. The cost of reading the data off disk is still likely to overwhelm everything else.
First, if you're reading from your bundle, use PNG if at all possible. iOS highly optimizes PNGs stored in the bundle (part of the copying process is to rewrite them in an iOS-specific optimized format).
No matter what you do, if you want a placeholder you are probably going to need to provide it yourself, either as a separate file or as a custom file format that you read and manage yourself. This wouldn't be an incredibly difficult format to devise, but you'd still need to do all the resizing beforehand somewhere.
The main point is that reading a large image file is expensive and you shouldn't do it on the main thread. Do this work on a background queue (GCD or an NSOperationQueue) and update the UI when the data becomes available. There's no really easy way around this fact.
A lower-quality image is a smaller file than a full-quality one. Here is code to check the size of a file in the app's Documents folder:
NSFileManager *manager = [NSFileManager defaultManager];
if ([manager fileExistsAtPath:path]) {
    NSDictionary *attributes = [manager attributesOfItemAtPath:path error:nil];
    unsigned long long size = [attributes fileSize]; // size in bytes
    // %llu matches unsigned long long (%d does not)
    resultlbl.text = [NSString stringWithFormat:@"%llu", size];
}

Converting/uploading large amounts of data from iPad to Dropbox

I'm finishing up my app by running it through Instruments as well as stressing it with large amounts of data. The Instruments tests go fine, but the stress test is where I'm having issues. Without getting into too much detail, I'm giving my app increasing amounts of Core Data events from which it needs to extrapolate data, make graphs, and present locations on an MKMapView instance. I started small and increased to 56000 events, which it handled fine without any leaks or memory warnings (and I was quite proud of it for handling it all).
My app implements the Dropbox API to allow for uploading and downloading templates and data for sync purposes. Files uploaded from my app are converted from Core Data to an NSDictionary, then to NSData. I create a temporary folder for the data, then upload that file to Dropbox, which works fine... normally. If I try to upload my data file with 56000 events, it crashes. I've logged it and watched as the data is converted. It reaches the last event with no issues, but when it's supposed to start uploading to Dropbox, the app crashes and I cannot for the life of me figure out why. I see memory warnings pop up in my log. Typically, it will go Level=1, Level=2, Level=1, Level=2, then crash, which confuses me as it never reaches Level=3.
The majority of the information I've found is in my edit at the bottom. Below is some relevant code:
- (void)uploadSurveys:(NSDictionary *)dict {
    NSArray *templateArray = [dict objectForKey:@"templates"];
    NSArray *dataArray = [dict objectForKey:@"data"];
    NSString *filename;
    NSLog(@"upload called");
    if ([templateArray count] || [dataArray count]) {
        if ([templateArray count]) {
            // irrelevant code
        }
        if ([dataArray count]) {
            SurveyData *survey;
            for (int i = 0; i < [dataArray count]; i++) {
                BOOL matchExists = NO;
                // ...... code to make sure no file exists in the Dropbox folder and create a new version if necessary
                dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
                    // convert off the main thread
                    NSData *data = [self convertSurvey:survey];
                    dispatch_async(dispatch_get_main_queue(), ^{
                        // back on the main thread: write the temp file and upload
                        [self uploadData:data withFilename:filename];
                        NSLog(@"converted and uploading");
                    });
                });
            }
        }
    }
}
[self convertSurvey:survey] simply converts my Core Data object to NSData.
- (void)uploadData:(NSData *)data withFilename:(NSString *)filename {
    NSFileManager *manager = [NSFileManager defaultManager];
    NSString *pathComponent = [NSString stringWithFormat:@"tempData.%@", filename];
    NSString *path = [NSTemporaryDirectory() stringByAppendingPathComponent:pathComponent];
    // write the data to a temporary file, then upload that file
    if ([manager createFileAtPath:path contents:data attributes:nil]) {
        [self.restClient uploadFile:filename toPath:[NSString stringWithFormat:@"/%@", currentSearch] fromPath:path];
        NSLog(@"uploading data");
    }
}
Any help would be much appreciated, and I thank you in advance. I'm just trying to figure out whether I'm taking the wrong approach for large files or whether it's simply not allowed. If I have to split the files, that is fine, but I'd prefer to know what prevents my app from performing this action before I try to make a workaround. Thank you again.
UPDATE: As this issue is now the only hindrance to the release of my application, I'm adding a bounty to this question to hopefully get a solution or workaround. It will be open for a week; after that, I will most likely just split up the files as they upload to ensure that this apparent size limit is not reached. This approach is not ideal, which is why a better solution is very welcome, but it is my backup plan if this fails to bring in something more convenient.
EDIT: It appears that NSTemporaryDirectory plays no part in this at all. Here is the new situation. As you can see in the code above, NSData *data = [self convertSurvey:survey]; is called on a secondary thread (which isn't the issue). I had been logging the objects as they were created and knew that conversion reached the last one, but never thought to check whether the NSData object was actually returned. It turns out it isn't. In short, I convert all my Core Data objects into arrays and place them into a dictionary (only for the relevant survey/data to be converted). This does indeed work, and the dictionary is created. Then I create an NSData object using NSData *data = [NSKeyedArchiver archivedDataWithRootObject:d];, where d is my dictionary. Directly after that, I call return data; to set the value for NSData *data = [self convertSurvey:survey];. This being the case, it appears that NSData or NSKeyedArchiver is at fault here. According to the Apple documentation:
Using 32-bit Cocoa, the size of the data is subject to a theoretical 2GB limit (in practice, because memory will be used by other objects this limit will be smaller); using 64-bit Cocoa, the size of the data is subject to a theoretical limit of about 8EB (in practice, the limit should not be a factor).
I have checked the file sizes in small increments to see where the failure occurs. I have successfully gotten 48.2MB of data through, but not 51.5MB, which leads me to believe that the issue occurs around 50MB, well below the theoretical limit for NSData (unless there is a discrepancy between iOS and OS X in that respect).
Hopefully this new information will help to solve this problem.
The 2 GB limit for NSData is completely theoretical on iOS: even the iPhone 4 only has 512 MB of RAM, and iOS (unlike Mac OS X) cannot swap, so if your physical RAM is full, you crash (or your app is terminated before that).
The 50 MB NSData object alone is already very large and it's not the only object you have in memory – given that you convert the data from Core Data to a dictionary representation and then to NSData, you probably consume at least twice as much memory (likely more). The system and other apps also need RAM, so you're probably reaching a limit.
Try running your app in Instruments to see how much memory you actually consume.
To reduce your peak memory usage, you have a couple of options that largely depend on your data model:
As Jason Foreman suggested in his answer, try to avoid having your whole file in memory at once. Using NSFileHandle, you can write chunks of data to a file without needing to have the whole data in memory at once. Of course, this requires that you prepare your data accordingly, so that it can be split into chunks. A higher-level approach might be to serialize your data into an XML format that you could write out as a stream. If your data format is very simple, something like CSV might also work.
Don't use NSData for uploading to Dropbox. Write your data to a file instead (see above) and point the Dropbox SDK to that file. The Dropbox SDK makes it pretty easy to do so (DBRestClient has an uploadFile:toPath:fromPath: method).
If your data model makes it difficult to take a streaming approach, try to segment the data into more manageable parts. You could then use your old method of serializing dictionaries, just with multiple files.
Be careful with Core Data's memory usage. If possible, re-fault objects using refreshObject:mergeChanges: to break cyclic references within your data (see the Core Data Programming Guide for details).
Avoid creating autoreleased objects in a long-running loop, or create a separate NSAutoreleasePool that gets drained in each iteration of the loop.
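The chunked-write suggestion can be sketched in plain C, with stdio standing in for NSFileHandle. This is an illustration under stated assumptions: the record format, CHUNK_SIZE and write_records_chunked are all hypothetical. Records are produced one at a time, copied into a fixed buffer, and written out whenever the buffer fills, so memory use does not grow with the number of records.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CHUNK_SIZE 4096   /* hypothetical; the only buffer held in memory */

/* Writes `count` made-up records ("event000042,0\n") to path, copying
 * each into a fixed chunk and writing the chunk out whenever it fills.
 * Memory use stays at CHUNK_SIZE however many records there are.
 * Returns total bytes written, or -1 on error. */
long write_records_chunked(const char *path, int count) {
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    char chunk[CHUNK_SIZE];
    size_t used = 0;
    long total = 0;
    for (int i = 0; i < count; i++) {
        char line[64];
        int len = snprintf(line, sizeof line, "event%06d,%d\n", i, i % 7);
        assert(len > 0 && (size_t)len < sizeof line);
        if (used + (size_t)len > sizeof chunk) {      /* chunk full: flush it */
            if (fwrite(chunk, 1, used, f) != used) { fclose(f); return -1; }
            total += (long)used;
            used = 0;                                  /* reuse the same buffer */
        }
        memcpy(chunk + used, line, (size_t)len);
        used += (size_t)len;
    }
    if (used > 0 && fwrite(chunk, 1, used, f) != used) { fclose(f); return -1; }
    total += (long)used;
    return fclose(f) == 0 ? total : -1;
}
```

stdio already buffers internally, so the explicit chunk is redundant in pure C; it is there to mirror the build-a-chunk, write-it, reuse-the-buffer pattern you would follow with NSFileHandle's writeData:.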
A way to work around this type of memory pressure is to build your APIs using streams, both for writing your converted data to a file on disk and also for uploading the data to a web service.
During conversion you can use an NSOutputStream to write chunks of data to the file, to avoid keeping a large chunk of data in memory at one time. Then, since NSMutableURLRequest can accept an NSInputStream for the body instead of an NSData (setHTTPBodyStream:), you can create an NSInputStream that reads your file back from disk and upload it that way.
Using streams in this way will ensure you never have 50+ MB of data loaded and should avoid the memory warnings you are seeing.
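The upload side follows the same pattern in reverse. A plain C sketch under stated assumptions: stream_file plays the part of the input stream, and a hypothetical chunk_sink callback stands in for the code that actually transmits the bytes. The file is read back in fixed-size pieces, so peak memory is one buffer rather than the whole file.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical consumer of one chunk (e.g. the code that sends bytes over
 * the network). Returns 0 on success, nonzero to abort. */
typedef int (*chunk_sink)(const unsigned char *bytes, size_t len, void *ctx);

/* Streams the file at path to sink in fixed-size chunks, so at most
 * sizeof buf bytes of the file are in memory at any moment.
 * Returns total bytes streamed, or -1 on error. */
long stream_file(const char *path, chunk_sink sink, void *ctx) {
    assert(sink != NULL);
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    unsigned char buf[8192];
    long total = 0;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0) {
        if (sink(buf, n, ctx) != 0) { fclose(f); return -1; }
        total += (long)n;
    }
    int failed = ferror(f);
    fclose(f);
    return failed ? -1 : total;
}

/* Example sink: tallies bytes and remembers the largest chunk seen. */
typedef struct { long bytes; size_t largest; } Tally;

int tally_sink(const unsigned char *bytes, size_t len, void *ctx) {
    Tally *t = ctx;
    (void)bytes;                       /* a real sink would send these */
    t->bytes += (long)len;
    if (len > t->largest) t->largest = len;
    return 0;
}
```

For a 20,000-byte file, tally_sink sees chunks of 8192, 8192 and 3616 bytes, never the whole file at once.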

How can I reuse an NSData to read multiple large files?

I need to read several dozen files and do some trivial processing with their contents. Each file individually won't cause problems, but having all the data loaded at once will quickly exhaust my memory.
I started with:
for (NSString *filename in filenames)
    do_something([NSData dataWithContentsOfFile:filename]);
Then of course, I remembered that Objective-C on the iPhone is not really garbage collected, and those would all stick around until the end of the frame anyway. Okay:
for (NSString *filename in filenames) {
    NSData *d = [[NSData alloc] initWithContentsOfFile:filename];
    do_something(d);
    [d release]; // free this file's data before loading the next one
}
This nominally only uses as much memory as the largest file, but that's only assuming the allocator is playing friendly at the moment - it could also thrash and fragment everything.
Is there some way I can make an NSMutableData, and keep reusing that Data's buffer, growing it as necessary? I need it as an NSData for other third-party APIs. The best idea I have at the moment is mallocing/reallocing a char* buffer as I go, reading using e.g. stdio, and constructing NSDatas with freeWhenDone:NO backed by that; that way I only thrash/retain a small amount per file.
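The buffer-reuse idea from the question can be sketched in plain C; ReusableBuffer and read_file_into are hypothetical names. In Objective-C, each result could then be wrapped without copying via [NSData dataWithBytesNoCopy:length:freeWhenDone:NO], as the question suggests, provided that NSData is not used after the next file is read into the same buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* A heap buffer reused for every file; it only grows (via realloc) when
 * a file is bigger than anything read so far. Start with {NULL, 0}. */
typedef struct {
    unsigned char *bytes;
    size_t capacity;
} ReusableBuffer;

/* Reads the whole file at path into buf->bytes, growing the buffer if
 * needed. Returns the file size, or -1 on error. The contents are only
 * valid until the next call. */
long read_file_into(ReusableBuffer *buf, const char *path) {
    assert(buf != NULL);
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0) size = ftell(f);
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return -1; }
    if (size == 0) { fclose(f); return 0; }
    if ((size_t)size > buf->capacity) {
        unsigned char *grown = realloc(buf->bytes, (size_t)size);
        if (!grown) { fclose(f); return -1; }   /* old buffer still valid */
        buf->bytes = grown;
        buf->capacity = (size_t)size;
    }
    size_t got = fread(buf->bytes, 1, (size_t)size, f);
    fclose(f);
    return got == (size_t)size ? size : -1;
}
```

Because the buffer never shrinks, the total allocation settles at the size of the largest file seen, which is the same bound as the release-per-file loop, just without repeated malloc/free churn.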
What you are doing in the second example is fine. Even if you reused an NSMutableData object for its capacity, another NSData object would need to be created with the file contents. If you are running into memory issues, consider modifying do_something() to work with NSInputStreams.
You could use -[NSData initWithContentsOfMappedFile:] with your second example to keep the memory usage as low as possible.
From the documentation:
A mapped file uses virtual memory techniques to avoid copying pages of the file into memory until they are actually needed.
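The same kind of virtual-memory mapping is available directly through POSIX mmap. A sketch, where sum_mapped_file is a hypothetical example function: it maps a file read-only and touches every byte, and the kernel pages the file in on demand instead of the whole file being copied into a heap buffer up front.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Maps the file read-only and sums its bytes. Pages are faulted in as the
 * loop reaches them rather than read into malloc'ed memory up front.
 * Returns 0 on success, -1 on error. */
int sum_mapped_file(const char *path, uint64_t *sum_out) {
    assert(sum_out != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }
    uint64_t sum = 0;
    if (st.st_size > 0) {                 /* mmap of length 0 is an error */
        size_t len = (size_t)st.st_size;
        const unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { close(fd); return -1; }
        for (size_t i = 0; i < len; i++)
            sum += p[i];
        munmap((void *)p, len);
    }
    close(fd);
    *sum_out = sum;
    return 0;
}
```

Clean mapped pages can be dropped and re-read from the file by the system, which is why mapping tends to keep resident memory lower than reading the file into a buffer.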

CSV parser with low memory footprint for iPhone

After testing my app with Instruments I realized that the current CSV parser I use has a huge memory footprint. Does anybody have a recommendation for one with a low memory footprint?
You probably should do this row-by-row, rather than reading the whole file, parsing it, and returning an array with all the rows in it. In any case, the code you linked to produces zillions of temporary objects in a loop, which means it'll have very high memory overhead.
A quick fix would be to create an NSAutoreleasePool at the top of the loop and drain it at the bottom:
while (![scanner isAtEnd]) {
    NSAutoreleasePool *innerPool = [[NSAutoreleasePool alloc] init];
    // ... bunch of code ...
    [innerPool drain]; // frees this iteration's temporary objects
}
This will wipe out the temporary objects, so your memory usage will be the size of the data plus an object for each string in the file (roughly 8 bytes * rows * columns).
There are some other CSV parsers to try:
http://michael.stapelberg.de/cCSVParse
http://cocoawithlove.com/2009/11/writing-parser-using-nsscanner-csv.html (my own blog)
You could experiment to see if either is lower memory overhead.
Neither of these supports "event based" parsing. In event based parsing, you never load the whole source file into memory, just enough of the file to read the current row (you can also do this in-progress on a download). You must handle each row as it is read and make certain all data from the source is freed between rows.
This would be the theoretical lowest overhead solution. If you really needed low overhead, you should adapt an existing solution to do that (I don't have any advice on how this would be done).
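An event-based parser does not take much code. A minimal sketch in plain C, assuming hypothetical per-row limits (MAX_ROW_BYTES, MAX_FIELDS) and a row_handler callback of my own naming: it reads one character at a time, handles quoted fields with doubled quotes, and calls the handler once per row, so only the current row is ever in memory.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ROW_BYTES 1024   /* hypothetical per-row limits */
#define MAX_FIELDS 64

/* Called once per row; the field strings are only valid during the call. */
typedef void (*row_handler)(char **fields, int count, void *ctx);

typedef struct {
    char bytes[MAX_ROW_BYTES];   /* current row, fields NUL-separated */
    char *fields[MAX_FIELDS];
    size_t used, start;          /* bytes used; start of current field */
    int count;                   /* completed fields in this row */
} Row;

static int row_put(Row *r, char c) {
    if (r->used + 1 >= MAX_ROW_BYTES) return -1;   /* keep room for a NUL */
    r->bytes[r->used++] = c;
    return 0;
}

static int row_end_field(Row *r) {
    if (r->count == MAX_FIELDS || r->used >= MAX_ROW_BYTES) return -1;
    r->bytes[r->used++] = '\0';
    r->fields[r->count++] = r->bytes + r->start;
    r->start = r->used;
    return 0;
}

/* Reads CSV from f one character at a time and calls handle for each row.
 * Returns the number of rows, or -1 if a row exceeds the limits. */
long parse_csv(FILE *f, row_handler handle, void *ctx) {
    Row r = {.used = 0, .start = 0, .count = 0};
    long rows = 0;
    int c, in_quotes = 0, pending = 0;   /* pending: row has unflushed input */
    while ((c = fgetc(f)) != EOF) {
        pending = 1;
        if (in_quotes) {
            if (c != '"') {
                if (row_put(&r, (char)c)) return -1;   /* commas/newlines kept */
            } else {
                int next = fgetc(f);
                if (next == '"') {                     /* "" is a literal quote */
                    if (row_put(&r, '"')) return -1;
                } else {                               /* closing quote */
                    in_quotes = 0;
                    if (next != EOF) ungetc(next, f);
                }
            }
        } else if (c == '"' && r.used == r.start) {
            in_quotes = 1;                             /* quote opens a field */
        } else if (c == ',') {
            if (row_end_field(&r)) return -1;
        } else if (c == '\n') {
            if (row_end_field(&r)) return -1;
            handle(r.fields, r.count, ctx);
            rows++;
            r.used = r.start = 0;
            r.count = 0;
            pending = 0;
        } else if (c != '\r') {                        /* CR of CRLF dropped */
            if (row_put(&r, (char)c)) return -1;
        }
    }
    if (pending) {                                     /* no trailing newline */
        if (row_end_field(&r)) return -1;
        handle(r.fields, r.count, ctx);
        rows++;
    }
    return rows;
}

/* Example handler: counts fields and keeps the last row's first field. */
typedef struct { int fields; char last_first[64]; } Stats;
static void count_fields(char **fields, int count, void *ctx) {
    Stats *s = ctx;
    assert(count > 0);
    s->fields += count;
    snprintf(s->last_first, sizeof s->last_first, "%s", fields[0]);
}
```

Rows that exceed the fixed limits make parse_csv return -1 instead of growing the buffer; that is the deliberate trade-off for a hard memory ceiling. Feeding it a FILE * opened on a download in progress would also fit the "in-progress" case mentioned above.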
It's not a CSV parser, but my open source Cocoa ParseKit framework has a powerful, convenient, configurable string tokenizer which might be handy for CSV or other types of parsing/tokenizing.
The framework:
http://parsekit.com
Some usage documentation:
http://parsekit.com/tokenization.html
The PKTokenizer class:
http://github.com/itod/parsekit/blob/master/include/ParseKit/PKTokenizer.h
http://github.com/itod/parsekit/blob/master/src/PKTokenizer.m

Bluetooth Transfer for Core Data Entities

How would I go about using Bluetooth to transfer a Core Data entity with its corresponding relationships? I have three Core Data entities with inverse relationships set up, and it all works fine. Now I need to transfer these to another iPhone, but only the objects that are not already present in the corresponding entity on that phone. I know how to transfer simple things such as strings and integers over Bluetooth, but this is on a whole new level, and I only started programming for the iPhone about 4 months ago. Thanks for all your help, experts!
EDIT:
Thanks, but for some reason I keep getting this error! What should I do?
2010-02-12 21:24:14.907 PitScout[92918:207] Failed to call designated initializer on NSManagedObject class 'Team'
2010-02-12 21:24:14.907 PitScout[92918:207] *** -[Team setTeamNumber:]: unrecognized selector sent to instance 0x112b630
2010-02-12 21:24:14.908 PitScout[92918:207] *** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '*** -[Team setTeamNumber:]: unrecognized selector sent to instance 0x112b630'
Thanks.
You will need to serialize your objects in some way to transfer and then re-insert into a context on the other side. I suggest looking into the NSCoding protocol and examples which will allow you to use NSKeyedArchiver and NSKeyedUnarchiver to serialize your objects to NSData for transfer (or base64 encoded to an NSString if necessary).
First make sure your model object implements NSCoding:
@interface MyObject : NSManagedObject <NSCoding>
And then implement the following methods in your model object to handle the encoding and decoding of the objects:
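-(id)initWithCoder:(NSCoder*)coder
{
    // Note: for an NSManagedObject subclass, plain -init bypasses the designated
    // initializer (initWithEntity:insertIntoManagedObjectContext:), which is what
    // produces the "Failed to call designated initializer" error quoted above.
    if ((self = [super init]))
    {
        self.myProperty = [coder decodeObjectForKey:@"myProperty"];
    }
    return self;
}
-(void)encodeWithCoder:(NSCoder*)coder
{
    // encode the same property that initWithCoder: decodes
    [coder encodeObject:self.myProperty forKey:@"myProperty"];
}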
Use NSKeyedArchiver to serialize your object to NSData:
NSData *data = [NSKeyedArchiver archivedDataWithRootObject:myObject];
Use NSKeyedUnarchiver to deserialize:
MyObject *myObject = (MyObject *)[NSKeyedUnarchiver unarchiveObjectWithData:myData];
If a string is required then you'll have to base64 encode and decode the NSData, see this post for details on that: How do I do base64 encoding on iphone-sdk?
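If you need to roll your own, standard base64 (RFC 4648, with '=' padding) is short enough to sketch in plain C; base64_encode is a hypothetical name and only encoding is shown. On iOS 7 and later, NSData provides base64EncodedStringWithOptions: and initWithBase64EncodedString:options: directly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encodes len bytes from in as standard base64 into out, which needs room
 * for 4 * ((len + 2) / 3) + 1 chars. Returns the encoded length
 * (excluding the NUL terminator). */
size_t base64_encode(const unsigned char *in, size_t len, char *out) {
    assert(out != NULL);
    size_t o = 0;
    for (size_t i = 0; i < len; i += 3) {
        unsigned long v = (unsigned long)in[i] << 16;   /* pack up to 3 bytes */
        if (i + 1 < len) v |= (unsigned long)in[i + 1] << 8;
        if (i + 2 < len) v |= in[i + 2];
        out[o++] = B64[(v >> 18) & 63];                  /* 4 x 6-bit groups */
        out[o++] = B64[(v >> 12) & 63];
        out[o++] = i + 1 < len ? B64[(v >> 6) & 63] : '=';
        out[o++] = i + 2 < len ? B64[v & 63] : '=';
    }
    out[o] = '\0';
    return o;
}
```

For example, the 5 bytes of "hello" encode to the 8 characters "aGVsbG8=": every 3 input bytes become 4 output characters, with '=' padding the final group.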
Trying to serialize NSManagedObject instances is going to fail because they are tied directly to the NSManagedObjectContext that they come from.
You will need to translate them into another data structure and then transmit them. Both JSON and XML work very well for this, since you can use KVC to get the data out of an NSManagedObject and into an NSDictionary, which can then easily be translated into the intermediate format.
Once you have them in the intermediate format and sent over the wire then you can easily reconstruct them into the destination NSManagedObjectContext without issue.
It may be overkill for this, but a method that has yet to fail me is SLIP, RFC 1055, the 1988 version. For years I have used it to map blocks of data into a 7- or 8-bit ASCII stream for transmission over every medium I have encountered, then used the inverse, or some modification of it, to convert the stream back to the needed configuration on the other end. Examples of the code in C are in the RFC. I always used Phil Karn's suggestion to use the same character for both the start and end of a packet.
That way only one routine is needed to deal with the stream: it gobbles up characters until the SOP/EOP character is encountered. This was chosen to deal with noise that can accumulate on the input of radio links as they sit idle awaiting data. Phil addressed that in other writings.
I usually use \x0D or \x0A, whichever one the system running the debugging tools uses as its carriage return, and the ever-popular backslash '\' as the escape character. Now and then it is handy to use another control code, or different values for the control characters, to reduce the packet size. Using the system this way allows a terminal program, with the SLIP code added and a few modifications, to function as a monitor and as a tool to enter packets into the stream by hand.
I have always found I had enough options if the first character in the packet indicated the options to the other end. Of course, some form of error checking and/or error recovery, and the ability to retransmit a MUNGED packet, must be provided. For small packets of data sent over highly reliable links a simple checksum might do, while for transmissions using three mineralized volcanoes as antenna sites that are a bit farther apart than one would like, a highly redundant forward error correction algorithm is right at home.
SLIP is versatile enough to take data from a 16-bit Motorola 68HC11 and reconstruct it on a 32-bit Intel system if the programmer reverses the endianness and takes care of the offset between 16- and 32-bit data.
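For reference, a plain C sketch of RFC 1055 framing, using the RFC's standard special characters (END 0xC0, ESC 0xDB, ESC_END 0xDC, ESC_ESC 0xDD) rather than the CR/LF-and-backslash variant described above; slip_encode, SlipDecoder and slip_feed are hypothetical names. The encoder sends END both before and after each packet, so any line noise received while the link sat idle is closed off before the real data starts, and the decoder simply ignores the empty frames that back-to-back END bytes produce.

```c
#include <assert.h>
#include <stddef.h>

/* Special characters from RFC 1055 (octal 0300, 0333, 0334, 0335). */
#define SLIP_END     0xC0
#define SLIP_ESC     0xDB
#define SLIP_ESC_END 0xDC
#define SLIP_ESC_ESC 0xDD

/* Frames one packet as END, escaped payload, END. out needs room for
 * 2 * len + 2 bytes in the worst case. Returns the framed length. */
size_t slip_encode(const unsigned char *in, size_t len, unsigned char *out) {
    size_t o = 0;
    out[o++] = SLIP_END;               /* closes off any idle-line noise */
    for (size_t i = 0; i < len; i++) {
        if (in[i] == SLIP_END) {
            out[o++] = SLIP_ESC; out[o++] = SLIP_ESC_END;
        } else if (in[i] == SLIP_ESC) {
            out[o++] = SLIP_ESC; out[o++] = SLIP_ESC_ESC;
        } else {
            out[o++] = in[i];
        }
    }
    out[o++] = SLIP_END;
    return o;
}

/* Incremental decoder state; set buf/cap, zero len and escaped. */
typedef struct {
    unsigned char *buf;
    size_t cap, len;
    int escaped;
} SlipDecoder;

/* Feeds one received byte. Returns the length of a completed, non-empty
 * packet (its bytes are in d->buf until the next call), otherwise 0.
 * Bytes beyond d->cap are dropped, as in the RFC's sample receiver. */
size_t slip_feed(SlipDecoder *d, unsigned char c) {
    assert(d != NULL && d->buf != NULL);
    if (c == SLIP_END) {
        size_t n = d->len;             /* 0 for back-to-back END bytes */
        d->len = 0;
        d->escaped = 0;
        return n;
    }
    if (d->escaped) {
        d->escaped = 0;
        if (c == SLIP_ESC_END) c = SLIP_END;
        else if (c == SLIP_ESC_ESC) c = SLIP_ESC;
        /* any other byte after ESC is a protocol error; the RFC keeps it */
    } else if (c == SLIP_ESC) {
        d->escaped = 1;
        return 0;
    }
    if (d->len < d->cap) d->buf[d->len++] = c;
    return 0;
}
```

SLIP itself has no error detection, so, as noted above, a checksum or forward error correction layer still has to be added on top of these frames.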
Gordon 
Gordon Couger
Stillwater, OK