I'm looking for the best read performance for a bunch (~200) of cached 80px by 80px images. A large chunk of them (~50) will all need to be accessed at once.
Should I store the UIImages (as binary data) in a plist or using Core Data?
A couple of basic concepts:
Core Data is probably the worst way to go for the image data; the documentation states that large BLOBs in the store cause massive performance problems.
Use the file system for what it's built for: reading and writing random chunks of data.
The rest depends on how you organize your data, so here are some thoughts:
80x80 is pretty small, you could probably hold 50 or so in memory at a given time.
You need a way to hash the images into some kind of structure so you know which ones to fetch. I would use Core Data to store the locations of the images on the file system, and back your view with an NSFetchedResultsController to pull out the list of file names.
Use some in-memory data structure to store the UIImage objects. A FIFO queue with a size of 50 would work well here: as a new image is loaded from the file system, the oldest one gets evicted (see the sketch after this list).
Finally, you have to know which images you're going to view and stay ahead of it; file system reads won't be super fast, so you'll need to either chunk your reads or stay far enough ahead of your view to avoid lagging. If your view shows 50, you might want to keep 100 in memory: the 50 visible, plus 25 previous and 25 next if you're scrolling, for example.
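A minimal sketch of that FIFO idea, assuming the images already live on the file system under file names you track in Core Data (the class name and API here are made up):

```swift
import UIKit

/// A tiny FIFO image cache: holds at most `capacity` decoded UIImages in memory
/// and evicts the oldest entry whenever a new image is loaded from disk.
final class FIFOImageCache {
    private let capacity: Int
    private var order: [String] = []            // file names, oldest first
    private var images: [String: UIImage] = [:]

    init(capacity: Int = 50) {
        self.capacity = capacity
    }

    /// Returns a cached image, or loads it from `directory` and caches it.
    func image(named fileName: String, in directory: URL) -> UIImage? {
        if let cached = images[fileName] { return cached }

        let url = directory.appendingPathComponent(fileName)
        guard let data = try? Data(contentsOf: url),
              let image = UIImage(data: data) else { return nil }

        if order.count >= capacity, let oldest = order.first {
            order.removeFirst()
            images[oldest] = nil                // drop the oldest image
        }
        order.append(fileName)
        images[fileName] = image
        return image
    }
}
```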
A premature optimization:
If read performance is essential, it would be worthwhile to store the images in "page"-sized chunks, such as a zip of 5 (or n) images that can be read into memory at once and then split into their corresponding UIImage objects.
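As a rough illustration of that idea, here is a sketch that reads one "page" file and splits it into images. The file layout (a 4-byte big-endian length before each image's data) is invented for the example, not a standard format:

```swift
import UIKit

/// Reads one "page" file and splits it back into UIImages.
/// Assumed layout: repeated records of [4-byte big-endian length][image bytes].
func loadImagePage(at url: URL) throws -> [UIImage] {
    let data = try Data(contentsOf: url)
    var images: [UIImage] = []
    var offset = 0

    while offset + 4 <= data.count {
        // Combine four big-endian bytes into the length of the next image.
        let length = data.subdata(in: offset..<offset + 4).reduce(0) { ($0 << 8) | Int($1) }
        offset += 4
        guard length > 0, offset + length <= data.count else { break }
        if let image = UIImage(data: data.subdata(in: offset..<offset + length)) {
            images.append(image)
        }
        offset += length
    }
    return images
}
```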
I have large amounts of data stored as nested cells in .mat files. My biggest problem right now is the load times for accessing these files, but I'm wondering if the underlying problem is that I came up with an inefficient way for storing the data and I should restructure it to be smaller.
The full file consists of a cell array:
Hemi{1,h} where there are 52 versions of h
.{n,p} where there are 85 versions of n and up to ~100 versions of p
.Variable where there are 10 variables, each with ~2500 values
This full file ate up all my memory, so I saved it in parts, i.e.:
Hemi1.mat=Hemi{1,1}
Hemi2.mat=Hemi{1,2}
etc.
The next step for this application is to load each file, determine which part of it is an appropriate solution (I need Hemi{1,h}.{n,p}.Var1, Hemi{1,h}.{n,p}.Var2, and Hemi{1,h}.{n,p}.Var3 for this, but I still need to keep track of the other Variables), save the solution, then close the file and move to the next one.
Is there a faster way to load these files?
Is the problem less my dataset and more how I've chosen to store it? Is there a better alternative?
That is quite a lot of data. I have a few suggestions you could look into. The first is to see whether you can change the data types to something like categorical arrays; they are far more memory efficient. Also, if you are storing strings as your final data, that can be quite heavy storage-wise.
Second, you could look into HDF5 file storage. I hear it is a nice way to store structured data.
Finally, you could try converting your {n,p} arrays into table structures. I am not sure whether this is better for memory, but tables are nice to work with and may help you out. (Depending on your version of MATLAB, you may not have tables :P)
I hope this helps!
-Kyle
I have a 215 MB CSV file which I have parsed and stored in Core Data, wrapped in my own custom objects. The problem is that my Core Data SQLite file is around 260 MB. The CSV file contains about 4.5 million lines of data on my city's transit system (bus stops, times, routes, etc.).
I have tried modifying attributes so that arrays of strings representing stop times are stored instead as NSData, but for some reason the file size still remains at around 260 MB.
I can't ship an app this size. I doubt anyone would want to download a 260MB app even if it means they have the whole city's transit schedule on it.
Are there any ways to compress or minimize the storage space used (even if it means not using core data, I am willing to hear suggestions)?
EDIT: I just want to provide an update right now because I have been staring at the file size in disbelief. With some clever manipulation involving strings, indexing and database normalization in general, I have managed to reduce the size down to 6.5MB or 2.6MB when compressed. About 105,000 objects stored in Core Data containing the full details of the city's transit system. I'm almost in tears right now D':
Unless your original CSV is encoded in a really foolish manner, it seems unlikely that the size is going to get below 100 MB, no matter how much you compress it. That's still really large for an app. The solution is to move your data to a web service. You may want to download and cache significant parts, but if you're talking about millions of records, then fetching from a server seems best. Besides, I have to believe that the transit system changes from time to time, and it would be frustrating to have to upgrade a many-tens-of-megabytes app every time there was a single stop adjustment.
Having said that, there are some things you may consider:
Move booleans into bit fields. You can put 64 booleans into an NSUInteger. (And don't use a full 64-bit integer if you just need 8 bits; store the smallest thing you can. A short sketch follows this list.)
Compress how you store times. There are only 1440 minutes in a day. You can store that in 2 bytes. Transit times are generally not to the second; they don't need a CGFloat.
Days of the week and dates can similarly be compressed.
Obviously you should normalize any strings. Look at the CSV for duplicated string values on many lines.
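To illustrate the first two points, here is a minimal sketch of hand-packing a stop time; the field layout and flag meanings are invented for the example:

```swift
/// One stop-time record packed by hand: several booleans share one byte, and the
/// departure time is stored as minutes after midnight (0...1439 fits in a UInt16).
struct PackedStopTime {
    var flags: UInt8 = 0          // bit 0: wheelchair accessible, bit 1: express, ...
    var departureMinute: UInt16 = 0

    mutating func setFlag(_ bit: Int, to value: Bool) {
        if value { flags |= 1 << bit } else { flags &= ~(1 << bit) }
    }

    mutating func setDeparture(hour: Int, minute: Int) {
        departureMinute = UInt16(hour * 60 + minute)   // e.g. 13:05 becomes 785
    }
}
```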
I generally would recommend raw sqlite rather than core data for this kind of problem. Core Data is more about object persistence than raw data storage. The fact that you're seeing a 20% bloat over CSV (which is not itself highly efficient) is not a good direction for this problem.
If you want to get even tighter, and don't need very good searching capabilities, you can create packed data blobs. I used to do this on phone switches where memory was extremely tight. You create a bit field struct and allocate 5 bits for one variable, and 7 bits for another, etc. With that, and some time shuffling things so they line up correctly on word boundaries, you can get pretty tight.
Since you care most about your initial download size, and may be willing to expand your data later for faster access, you can consider very domain-specific compression. For example, in the above discussion, I mentioned how to get down to 2 bytes for a time. You could probably get down to 1 byte in many cases by storing times as delta minutes since the last time (since most of your times are going to be increasing by fairly small steps if they're bus and train schedules). Abandoning the database, you could create a very tightly encoded data file that you could extract into a database on first launch.
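A small sketch of that delta-minutes idea, assuming the times for a trip are already sorted and that consecutive gaps always fit in one byte (a real encoder would need an escape for larger gaps):

```swift
import Foundation

/// Encodes a sorted list of minute-of-day values as [2-byte first time][1-byte deltas...].
/// Assumes every gap between consecutive times is under 256 minutes.
func encodeTimes(_ minutes: [UInt16]) -> Data {
    guard let first = minutes.first else { return Data() }
    var data = Data([UInt8(first >> 8), UInt8(first & 0xFF)])
    for (previous, next) in zip(minutes, minutes.dropFirst()) {
        data.append(UInt8(next - previous))     // one byte per subsequent stop
    }
    return data
}

func decodeTimes(_ data: Data) -> [UInt16] {
    guard data.count >= 2 else { return [] }
    var current = UInt16(data[data.startIndex]) << 8 | UInt16(data[data.startIndex + 1])
    var result = [current]
    for delta in data.dropFirst(2) {
        current += UInt16(delta)
        result.append(current)
    }
    return result
}
```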
You also can use domain-specific knowledge to encode your strings into smaller tokens. If I were encoding the NY subway system, I would notice that some strings show up a lot, like "Avenue", "Road", "Street", "East", etc. I'd probably encode those as unprintable ASCII like ^A, ^R, ^S, ^E, etc. I'd probably encode "138 Street" as two bytes (0x8A13). This of course is based on my knowledge that è (0x8a) never shows up in the NY subway stops. It's not a general solution (in Paris it might be a problem), but it can be used to highly compress data that you have special knowledge of. In a city like Washington DC, I believe their highest numbered street is 38th St, and then there's a 4-value direction. So you can encode that in two bytes, first a "numbered street" token, and then a bit field with 2 bits for the quadrant and 6 bits for the street number. This kind of thinking can potentially significantly shrink your data size.
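Purely as an illustration of that tokenization idea (the word list and byte values are made up, and a real codec would also have to match whole words only and prove those bytes never occur in the data):

```swift
import Foundation

/// A token table: words that appear constantly, mapped to single bytes that
/// (by assumption, which you must verify) never occur in the real strings.
let tokens: [(word: String, byte: UInt8)] = [
    ("Street", 0x01), ("Avenue", 0x02), ("Road", 0x03), ("East", 0x04),
]

/// Replaces known words with their one-byte tokens and copies everything else through.
func encodeStopName(_ name: String) -> Data {
    var data = Data()
    var rest = Substring(name)
    while let character = rest.first {
        if let hit = tokens.first(where: { rest.hasPrefix($0.word) }) {
            data.append(hit.byte)               // one byte instead of several
            rest = rest.dropFirst(hit.word.count)
        } else {
            data.append(contentsOf: String(character).utf8)
            rest = rest.dropFirst()
        }
    }
    return data
}
```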
You might be able to perform some database normalization.
Look for anything that might be redundant, or the same values being stored in multiple rows. You will probably need to restructure your database so these duplicate values (if any) are stored in separate tables and then referenced from their original rows by means of IDs.
How big is the sqlite file compressed? If it's satisfactorily small, the simplest thing would be to ship it compressed, then uncompress it to NSCachesDirectory.
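For example, a hedged sketch of that first-launch step using NSData's built-in LZFSE support (iOS 13+); the file names are placeholders, and a plain zip would work just as well:

```swift
import Foundation

/// Copies the bundled, compressed database into Caches on first launch.
/// Assumes "transit.sqlite.lzfse" was produced with the same LZFSE algorithm.
func installDatabaseIfNeeded() throws -> URL {
    let caches = FileManager.default.urls(for: .cachesDirectory, in: .userDomainMask)[0]
    let destination = caches.appendingPathComponent("transit.sqlite")
    if FileManager.default.fileExists(atPath: destination.path) {
        return destination                      // already installed
    }
    guard let bundled = Bundle.main.url(forResource: "transit.sqlite", withExtension: "lzfse") else {
        throw CocoaError(.fileNoSuchFile)
    }
    let compressed = try Data(contentsOf: bundled) as NSData
    let raw = try compressed.decompressed(using: .lzfse) as Data
    try raw.write(to: destination, options: .atomic)
    return destination
}
```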
I'm building an application with a "record" feature that records user interaction over time. As time progresses, I fill an in-memory array with "state" objects representing the current state of the user input. A typical recording results in about 5,000 of these objects.
I then archive this data using NSKeyedArchiver's archiveRootObject:toFile:. This works fine, but the file size is very large (3.5 MB or so). My question is this:
Is there any inherent file-size overhead involved in archiving files? Would I be able to save this data using much less disk space if I were to use SQLite, or even roll my own file format? Or is the only way to reduce the disk size of the data going to be to reduce the bit depth of the numbers I'm storing?
If your concern is performance, Core Data gives you more granularity. You can lazy-load and save in parts during app execution, versus loading/saving the whole 3.5 MB object graph.
If your concern is file size, compare what you get from the binary plist format (which is what NSKeyedArchiver writes by default) with the SQLite file format. But more important than the overhead is how complex the translation between your object graph and the Core Data model would be.
You may also be interested in this comparison of speed and performance for several file formats: https://github.com/eishay/jvm-serializers/wiki/ I'm not sure whether everything there has a C, C++, or Objective-C implementation.
3.5 MB isn't a very large file. However, if your app has to load or save a 3.5 MB file all the time, then using Core Data is a lot smarter as this allows you to save only the data that has changed and retrieve only the parts that you're interested in -- not the whole thing every time.
If storage is the main concern, there would be little difference between SQLite and Core Data.
I had to store UIViewControllers with their state in an app; I ended up not saving the serialized objects, but only their most specific properties, plus a class that read that data back and re-created those objects (a rough sketch of the idea follows).
The property map was then stored in a CSV file (admittedly very difficult to manage, but tiny) and then compressed.
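Not my exact code, but the general shape of that approach looks something like this sketch using Codable; the property names are invented:

```swift
import Foundation

/// Instead of archiving whole view controllers, persist only the few properties
/// needed to rebuild them later.
struct ScreenState: Codable {
    let screenIdentifier: String
    let scrollOffset: Double
    let selectedItemID: Int?
}

func save(_ states: [ScreenState], to url: URL) throws {
    let json = try JSONEncoder().encode(states)
    // Compression is optional here; the payload is tiny either way.
    let compressed = try (json as NSData).compressed(using: .zlib) as Data
    try compressed.write(to: url, options: .atomic)
}

func load(from url: URL) throws -> [ScreenState] {
    let compressed = try Data(contentsOf: url)
    let json = try (compressed as NSData).decompressed(using: .zlib) as Data
    return try JSONDecoder().decode([ScreenState].self, from: json)
}
```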
Our app downloads about 10 SQLite files, each of which contains about 4000 rows. We process that data and display it in a table view. We are running into speed and memory issues when scrolling through the table view.
We were wondering whether we would get better performance than SQLite if we used CSV files or some other format instead. I have read that XML or JSON won't help, since the number of records is huge and parsing time would go up.
Please suggest.
First, don't assume that SQLite is your bottleneck. I made that same assumption in my own application and spent days trying to optimize the database access, only to run Instruments against it and find that I had a slow string-processing routine in my interface that was bogging things down.
Use Time Profiler and Object Allocations first to verify where your hotspots are in code. SQLite is ridiculously fast.
That said, with 4000 rows, you will probably run into memory issues at the least if you try to load all of them into an array for display to the screen. My recommendation would be to import that data into a Core Data SQLite database and use an NSFetchedResultsController with a batch size set for its fetch request to be slightly larger than the number of rows displayed onscreen.
Core Data will handle the loading / unloading of batched data this way, meaning that only a small part of the database is loaded into memory at once. This can lead to a tremendous speedup (particularly on the initial load) and will significantly reduce memory usage. It also does it using a trivial amount of code.
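A minimal sketch of that setup, assuming a Core Data entity named Stop with a name attribute (both names are placeholders):

```swift
import CoreData

/// Builds a fetched-results controller whose request faults rows in small batches
/// instead of loading all 4000 at once.
func makeStopsController(context: NSManagedObjectContext) -> NSFetchedResultsController<NSManagedObject> {
    let request = NSFetchRequest<NSManagedObject>(entityName: "Stop")
    request.sortDescriptors = [NSSortDescriptor(key: "name", ascending: true)]
    request.fetchBatchSize = 30   // a bit more than one screenful of rows

    return NSFetchedResultsController(
        fetchRequest: request,
        managedObjectContext: context,
        sectionNameKeyPath: nil,
        cacheName: nil)
}
```

Call performFetch() once, then hand controller.object(at:) to your cell-configuration code; rows are fetched and released in batches behind the scenes.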
A properly indexed SQLite database will run circles around any flat file, especially if you have a lot of records. Also try consolidating those 10 files into 1 database, so you can perform joins on indexed columns and use clever tricks such as views. Right now it seems like you're pulling data from 10 different databases and manually comparing/processing them, which would of course take a lot of time and memory.
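If merging the files offline isn't an option, SQLite can also attach the other databases and create indexes at runtime. A rough sketch using the C API from Swift, where the file, table, and column names are all invented:

```swift
import SQLite3

/// Opens the main database, attaches a second file, adds an index,
/// and runs a join across both.
func queryAcrossDatabases(mainPath: String, otherPath: String) {
    var db: OpaquePointer?
    guard sqlite3_open(mainPath, &db) == SQLITE_OK else { return }
    defer { sqlite3_close(db) }

    // Attach the second file so both can be joined in a single query,
    // then index the join column once so later queries get much cheaper.
    guard sqlite3_exec(db, "ATTACH DATABASE '\(otherPath)' AS other;", nil, nil, nil) == SQLITE_OK,
          sqlite3_exec(db, "CREATE INDEX IF NOT EXISTS other.idx_times_stop ON times(stop_id);", nil, nil, nil) == SQLITE_OK
    else { return }

    let sql = """
    SELECT stops.name, t.departure
    FROM stops JOIN other.times AS t ON t.stop_id = stops.stop_id;
    """
    var statement: OpaquePointer?
    if sqlite3_prepare_v2(db, sql, -1, &statement, nil) == SQLITE_OK {
        while sqlite3_step(statement) == SQLITE_ROW {
            // Read columns here with sqlite3_column_text / sqlite3_column_int.
        }
        sqlite3_finalize(statement)
    }
}
```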
It is going to depend on the application, how you are using and querying the data. Profile it, confirm that sqlite is or isn't the problem. Then attack whatever the profiling turns up.
Profilers: Shark
Or some other profiling solution
In the app I am working on now, I was storing about 500 images in Core Data. I have since pulled those images out and store them in the file system now, but in the process I found that the app would crash on the device if I had an array of 500 objects with image data in them. An array with 500 object IDs, with the image data in those objects, worked fine. The 500 objects without the image data also worked fine. I found that I got the best performance with both an array of object IDs and image data stored on the file system instead of in Core Data.
The conclusion I came to was that having an object in an array tells Core Data I am "using" that object, so Core Data holds on to its data. Is this correct?
Short answer is yes.
The long answer is that it depends on the size of the images. The rule is:
Less than 100 KB: store it in the main table.
Less than 1 MB: store it in a secondary table on the other end of a relationship.
Greater than 1 MB: store it on disk and reference it via a file path.
So the size of your files will determine where to store them. Also, keep in mind that UIImage handles caching, so you may not need to store the images in an array at all.
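For the greater-than-1 MB case, a minimal sketch of that pattern; the entity and attribute names here are assumptions:

```swift
import CoreData
import Foundation

/// Saves the image data under Application Support and stores only its relative
/// file name on the Core Data object (assumed entity "Photo" with a "fileName" attribute).
func store(imageData: Data, for photo: NSManagedObject, context: NSManagedObjectContext) throws {
    let support = try FileManager.default.url(
        for: .applicationSupportDirectory, in: .userDomainMask,
        appropriateFor: nil, create: true)
    let fileName = UUID().uuidString + ".jpg"
    try imageData.write(to: support.appendingPathComponent(fileName), options: .atomic)

    photo.setValue(fileName, forKey: "fileName")   // only the reference lives in the store
    try context.save()
}
```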
Update
Your question is unclear, then. You do not need to store the images in an array, because A) the image is being held by the cell, and B) UIImage will cache the image for you, so it will not be retrieved from disk if it has been accessed recently. So you are forcing the retention of images unnecessarily.
With regard to Core Data itself, it will drop attributes out of memory as needed as well. It will automatically pull them back into memory when accessed. Core Data also caches the data so you should not see any performance issues there either as things are being moved around in memory.
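If you do keep references around, a hedged sketch of how you might let Core Data drop the image bytes again, using object IDs rather than the objects themselves:

```swift
import CoreData

/// Keep lightweight NSManagedObjectID values in your array instead of the objects
/// themselves, and re-fault objects you are finished with so their image data
/// can be released from memory.
func releaseImageData(for objectIDs: [NSManagedObjectID], in context: NSManagedObjectContext) {
    for objectID in objectIDs {
        let object = context.object(with: objectID)     // returns a fault if not already loaded
        context.refresh(object, mergeChanges: false)    // turns it back into a fault
    }
}
```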