Decide which caching strategy to use? - iPhone

I want to cache my loaded data so that I can reduce my application's start time.
I know several strategies for storing application data,
e.g. Core Data, NSUserDefaults, and archiving.
Now my scenario: suppose I have an array of at most 10 objects, each object having 5 fields.
I cannot decide which strategy to use for storing this array and later retrieving it.
Thanks.

Never store cache data in NSUserDefaults; that is not what it is for.
Archiving is expensive and should not be used. It is also far more difficult to manage.
Core Data is almost always the right answer unless the data storage is trivial.
Update
Archiving, also known as serializing, is one of the most expensive ways to write data to disk compared to other formats. The exact details are difficult to explain in an answer here, but it boils down to an old design that does not perform nearly as well as more modern persistence systems such as Core Data. Putting the two side by side, you will see significant performance increases (due to internal threading, caching, database support on the back end, etc.) when using Core Data.
The fact that Core Data also handles your data model life cycle and structure is just gravy on top of that performance increase.
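For a data set this small, the Core Data side is tiny; a minimal fetch at launch might look like this (the CachedItem entity name and the already-configured stack are assumptions, not from the question):

    // Fetch all cached items back out at launch; the model is assumed to
    // define a hypothetical "CachedItem" entity with five attributes.
    NSFetchRequest *request = [[NSFetchRequest alloc] init];
    [request setEntity:[NSEntityDescription entityForName:@"CachedItem"
                                   inManagedObjectContext:context]];
    NSError *error = nil;
    NSArray *cachedItems = [context executeFetchRequest:request error:&error];
    if (cachedItems == nil) {
        NSLog(@"Fetch failed: %@", error);
    }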

Related

Too much data duplication in mongodb?

I'm new to this whole NOSQL stuff and have recently been intrigued with MongoDB. I'm creating a new website from scratch and decided to go with MONGODB/NORM (for C#) as my only database. I've been reading up a lot about how to properly design your document model database, and I think for the most part I have my design worked out pretty well. I'm about 6 months into my new site and I'm starting to see issues with data duplication/sync that I need to deal with over and over again. From what I read, this is expected in the document model, and for performance it makes sense. I.e. you stick embedded objects into your document so it's fast to read - no joins; but of course you can't always embed, so MongoDB has this concept of a DbReference, which is basically analogous to a foreign key in relational DBs.
So here's an example: I have Users and Events; both get their own document. Users attend Events, and Events have User attendees. I decided to embed a list of Events with limited data into the User objects. I embedded a list of Users into the Event objects as their "attendees" as well. The problem here is that now I have to keep the Users in sync with the list of Users that is also embedded in the Event object. As I read it, this seems to be the preferred approach, and the NOSQL way to do things. Retrieval is fast, but the drawback is that when I update the main User document, I also need to go into the Event objects, possibly find all references to that user, and update those as well.
So the question I have is, is this a pretty common problem people need to deal with? How much does this problem have to happen before you start saying "maybe the NOSQL strategy doesn't fit what I'm trying to do here"? When does the performance advantage of not having to do joins turn into a disadvantage because you're having a hard time keeping data in sync in embedded objects and doing multiple reads to the DB to do so?
Well, that is the trade-off with document stores. You can store data in a normalized fashion like any standard RDBMS, and you should strive for normalization as much as possible. It's only where normalization costs you performance that you should break it and flatten your data structures. The trade-off is read efficiency vs. update cost.
Mongo has really efficient indexes, which can make normalizing easier, like a traditional RDBMS (most document stores do not give you this for free, which is why Mongo is more of a hybrid than a pure document store). Using this, you can make a relation collection between users and events. It's analogous to a surrogate table in a tabular data store. Index the event and user fields and it should be pretty quick, and it will help you normalize your data better.
I like to plot the efficiency of flattening a structure vs. keeping it normalized in terms of the time it takes me to update a record's data vs. reading out what I need in a query. You can do it in terms of big-O notation, but you don't have to be that fancy. Just put some numbers down on paper based on a few use cases with different models for the data and get a good gut feeling about how much work is required.
Basically, what I do is first try to predict the probability of how many updates a record will have vs. how often it's read. Then I try to predict what the cost of an update is vs. a read when the data is normalized or flattened (or maybe a partial combination of the two... there are lots of optimization options). I can then judge the savings of keeping it flat vs. the cost of building up the data from normalized sources. Once I've plotted all the variables, if keeping it flat saves me a bunch, then I will keep it flat.
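For example, a crude version of that arithmetic (all numbers below are invented for illustration), assuming a flattened read costs one query, a normalized read costs two (to join by hand), and a flattened update fans out to $k$ embedded copies:

$$C_{\text{flat}} = f_r + f_u (1 + k), \qquad C_{\text{norm}} = 2 f_r + f_u$$

With $f_r = 1000$ reads and $f_u = 10$ updates per day and $k = 50$, you get $C_{\text{flat}} = 1510$ vs. $C_{\text{norm}} = 2010$, so flattening wins; push updates to $f_u = 100$ and $C_{\text{flat}} = 6100$ vs. $C_{\text{norm}} = 2100$, and normalization wins.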
A few tips:
If you require lookups to be quick and atomic (perfectly up to date), favor a solution where you flatten over normalization and take the hit on the update.
If you require updates to be quick and immediately visible, then favor normalization.
If you require fast lookups but don't require perfectly up to date data, consider building out your normalized data in batch jobs (using map/reduce possibly).
If your queries need to be fast, and updates are rare and do not necessarily need to be visible immediately or to have transaction-level locking guaranteeing they went through 100% of the time (i.e. were written to disk), you can consider writing your updates to a queue and processing them in the background. (In this model, you will probably have to deal with conflict resolution and reconciliation later.)
Profile different models. Build out a data query abstraction layer (like an ORM in a way) in your code so you can refactor your data store structure later.
There are a lot of other ideas you can employ. There are a lot of great blogs online that go into this, like highscalability.org, and make sure you understand the CAP theorem.
Also consider a caching layer, like Redis or memcache. I will put one of those products in front of my data layer. When I query Mongo (which is storing everything normalized), I use the data to construct a flattened representation and store it in the cache. When I update the data, I invalidate anything in the cache that references what I'm updating. (Although you have to factor the time it takes to invalidate data, and the tracking of data in the cache that is getting updated, into your scaling considerations.) Someone once said, "The two hardest things in Computer Science are naming things and cache invalidation."
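A rough sketch of that cache-aside pattern, written in Objective-C to match the rest of this thread, with NSCache standing in for Redis/memcache (the method names here are hypothetical):

    // Read path: check the cache first, rebuild the flattened view on a miss.
    - (NSDictionary *)flattenedUserWithID:(NSString *)userID {
        NSDictionary *flat = [self.cache objectForKey:userID];
        if (flat == nil) {
            // Cache miss: read the normalized records and flatten them.
            flat = [self buildFlattenedUserFromStore:userID];
            [self.cache setObject:flat forKey:userID];
        }
        return flat;
    }

    // Write path: update the store, then invalidate anything that
    // references the user so the next read rebuilds from fresh data.
    - (void)saveUser:(NSDictionary *)user withID:(NSString *)userID {
        [self writeUserToStore:user withID:userID];
        [self.cache removeObjectForKey:userID];
    }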
Try adding an IList<UserEvent> property to your User object. You didn't specify much about how your domain model is designed. Check the NoRM group http://groups.google.com/group/norm-mongodb/topics for examples.

Core Data NSFetchedResultsController Performance Advantages Over NSArray?

Does using an NSFetchedResultsController provide any performance advantages on the iPhone over an NSArray?
I have between 4,000 and 8,000 records stored in Core Data and wanted to know if I should pick one over the other. Is NSFetchedResultsController just used to make code 'prettier'?
My concern is searching, and lag on keyboard button presses (as well as the issue of loading that many records into memory). Thanks!
Given your parameters, Core Data will be faster than an array, especially if you make any changes to the data.
The disadvantage of an array in this case is that you have to load the entire array into memory in one go.
It might seem obvious that Core Data will be slower than more primitive methods, but owing to fine-tuned optimization, plus the ease of integration with the rest of the API, it is actually fairly hard to beat Core Data in real-world apps with significant amounts of data.
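As a rough sketch of the memory side of that (the entity and sort key below are hypothetical): an NSFetchedResultsController drives its fetch request's batch size, so only a window of those 4,000-8,000 records is ever realized in memory at once:

    NSFetchRequest *request = [[NSFetchRequest alloc] init];
    [request setEntity:[NSEntityDescription entityForName:@"Record"
                                   inManagedObjectContext:context]];
    [request setSortDescriptors:[NSArray arrayWithObject:
        [NSSortDescriptor sortDescriptorWithKey:@"name" ascending:YES]]];
    [request setFetchBatchSize:20]; // roughly one screenful of rows

    NSFetchedResultsController *frc = [[NSFetchedResultsController alloc]
        initWithFetchRequest:request
        managedObjectContext:context
          sectionNameKeyPath:nil
                   cacheName:@"Records"];
    NSError *error = nil;
    if (![frc performFetch:&error]) {
        NSLog(@"Fetch failed: %@", error);
    }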

Core Data for temporary data

Is Core Data still a good option to use on iOS even if the data will only be temporary? I.e. data being sent up to a server in the cloud once within range of a network, and then never needed again on the mobile device.
You don't have to use Core Data to persist data.
If you don't want to persist the data at all, you can define an in-memory store which never writes to disk.
Core Data's true function is the management of an object graph, i.e. it handles the relationships between objects. Its real advantage is the ability to automatically handle complexity.
That complexity can arise from the data objects themselves or from their needed relationships to controller or view objects. Either way, Core Data makes it easy to tie all the objects together without great gobs of custom code. Where the objects end up being persisted, or whether they are persisted at all, is really secondary.
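A minimal sketch of that in-memory configuration, assuming you already have a managed object model loaded:

    // Add an in-memory store to the coordinator: the object graph is
    // fully managed by Core Data, but nothing is ever written to disk.
    NSPersistentStoreCoordinator *psc = [[NSPersistentStoreCoordinator alloc]
        initWithManagedObjectModel:model];
    NSError *error = nil;
    if (![psc addPersistentStoreWithType:NSInMemoryStoreType
                           configuration:nil
                                     URL:nil
                                 options:nil
                                   error:&error]) {
        NSLog(@"Failed to add in-memory store: %@", error);
    }

Contexts, fetches, and relationships then work exactly as they would with a SQLite store.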
Yes, if there could ever be a large number of records (e.g. the user is overseas and doesn't have a data connection), use Core Data. The point of the Core Data abstraction is just this: if you only have 10 records max, ever, it may just use a flat file of data, or maybe a SQLite database if there are more than that. By "handing the problem" over to Core Data, you make storage decisions Apple's problem, and for free you'll get all the optimizations that Apple will throw into the Core Data framework in the coming years.
Core Data is complex when you first look at it. Apple's API docs aren't bad, but there are a few "gotchas". If you've worked with anything like the Entity Framework ORM, etc., it's really easy to pick up.
Alternatively, if you're reasonably certain that you're only going to get 5-10 records, and the data is anything that conforms to NSCoding, you could just archive it and save it, then unarchive it when you launch the app. Also, if it's array data, a plist is pretty nice.
The approach I take is to insert entities into a nil context and provide a base class for insertion into a valid context when I wish to persist them. Code can be found here... Temporary Core Data
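That pattern looks roughly like this (the entity and attribute names are hypothetical):

    // Create a managed object outside any context...
    NSEntityDescription *entity =
        [[model entitiesByName] objectForKey:@"Reading"];
    NSManagedObject *temp =
        [[NSManagedObject alloc] initWithEntity:entity
                 insertIntoManagedObjectContext:nil];
    [temp setValue:@"pending" forKey:@"status"];

    // ...and register it with a context only when it should be persisted.
    [context insertObject:temp];
    NSError *error = nil;
    [context save:&error];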

What's the best way to store static data in an iOS app?

I have in my app a considerable amount of data that it needs to access, but will never be changed by the app. Currently I'm using this data in other applications in JSON files and SQL databases, but neither seems very straightforward to use in iOS.
I don't want to use CoreData, which provides tons of unnecessary functionality and complexity.
Would it be a good idea store the data in PropertyList file and build an accessor class? Are there any simple ways to incorporate SQLite without going the CoreData route?
You can only use a plist if the amount of data is relatively small. Plists are loaded entirely into memory, so you can only really use them if you can sustain all the objects created by the plist in memory at once, for as long as you need them.
Core Data has a learning curve, but in use it is usually less complex than SQL. In most cases the "simpler" SQL leads to more coding, because you end up having to duplicate much of the functionality of Core Data to shoehorn the procedural SQL into the object-oriented API. You have to manually manage the memory use of all the data by tracking retention. You have to write a lot of SQL code every time you want data. I've updated several apps from SQL to Core Data, and in all cases the Core Data implementation was smaller and cleaner than the SQL.
Neither is the memory or processor "overhead" any larger. Core Data is highly optimized. In most cases, off-the-shelf Core Data is more efficient than hand-tuned SQL. One minor suboptimization in SQL usually destroys any theoretical advantage it might have.
Of course, if you're already highly skilled at managing SQL in C, then you personally might get the app to market more quickly by using SQL. However, if you're wondering what you should plan to use in general on Apple platforms, Core Data is almost always the answer, and you should take the time to learn it.
You can just use SQLite directly without the overhead of Core Data using the SQLite C API.
Here is a tutorial I found on your use case: simply loading some data from an SQLite database. Hope this helps.
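The shape of that approach is roughly as follows (the table and column names are hypothetical; link against libsqlite3 and #import <sqlite3.h>):

    sqlite3 *db = NULL;
    NSString *path = [[NSBundle mainBundle] pathForResource:@"data"
                                                     ofType:@"sqlite"];
    if (sqlite3_open([path UTF8String], &db) == SQLITE_OK) {
        sqlite3_stmt *stmt = NULL;
        // Prepare once, then step through the result rows.
        if (sqlite3_prepare_v2(db, "SELECT name, value FROM items",
                               -1, &stmt, NULL) == SQLITE_OK) {
            while (sqlite3_step(stmt) == SQLITE_ROW) {
                const char *name = (const char *)sqlite3_column_text(stmt, 0);
                int value = sqlite3_column_int(stmt, 1);
                NSLog(@"%s = %d", name, value);
            }
            sqlite3_finalize(stmt);
        }
        sqlite3_close(db);
    }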
Depending on the type of your data, its size, and how often it changes, you may want to just keep things simple and use a property list. Otherwise, using SQLite (documented in Jergason's answer) would be where I'd go. Though let me say that if you have a relatively small (less than a couple hundred) set of basic types (arrays, dictionaries, numbers, strings) that don't change frequently, then a property list will be a better choice in my opinion.
As an example to that, in one of my games, I create the levels from a single property list per difficulty. Since there are only a handful of levels per difficulty (99) and a small set of parameters for each (number of elements in play, their initial positions, mass, etc) then it makes sense, and I avoid having to deal with SQLite directly or worse yet, setting up and maintaining CoreData.
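Loading that kind of plist is nearly a one-liner (the file and key names here are made up):

    NSString *path = [[NSBundle mainBundle] pathForResource:@"Levels"
                                                     ofType:@"plist"];
    // The whole array of level dictionaries is materialized at once,
    // which is fine at this size.
    NSArray *levels = [NSArray arrayWithContentsOfFile:path];
    NSDictionary *firstLevel = [levels objectAtIndex:0];
    NSLog(@"Elements in play: %@", [firstLevel objectForKey:@"elementCount"]);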
What do you mean by "best"? What kind of data?
If it's a bunch of objects, then JSON or (binary) plist aren't terrible formats, since you'll want the whole thing loaded in memory to walk the object graph. Compare space efficiency and loading performance to pick which one to use.
If it's a bunch of binary blobs, then store the blobs in a big file, memory-map the file (NSDataReadingMapped a.k.a. NSMappedRead), and use indexes into the blobs. iOS frameworks use a mixture of these (e.g. there are a lot of .pngs, but also "other.artwork" which just contains raw image data).
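A rough sketch of the blob approach (the offset and length would come from your own index; the values here are placeholders):

    NSString *path = [[NSBundle mainBundle] pathForResource:@"blobs"
                                                     ofType:@"artwork"];
    NSError *error = nil;
    NSData *blobFile = [NSData dataWithContentsOfFile:path
                                              options:NSDataReadingMapped
                                                error:&error];
    // Pages are faulted in by the kernel only as they are touched, so
    // pulling one blob out never loads the whole file into memory.
    NSUInteger offset = 0, length = 1024; // placeholders; read from your index
    NSData *blob = [blobFile subdataWithRange:NSMakeRange(offset, length)];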
You can also use NSKeyedArchiver and friends if your classes implement the NSCoding protocol, but there's some object graph management overhead and the plist format it produces isn't exactly nice to work with.
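For completeness, the NSCoding route looks roughly like this (the Item class and its single field are hypothetical):

    @interface Item : NSObject <NSCoding>
    @property (nonatomic, copy) NSString *name;
    @end

    @implementation Item
    @synthesize name = _name;
    - (void)encodeWithCoder:(NSCoder *)coder {
        [coder encodeObject:self.name forKey:@"name"];
    }
    - (id)initWithCoder:(NSCoder *)coder {
        if ((self = [super init])) {
            _name = [[coder decodeObjectForKey:@"name"] copy];
        }
        return self;
    }
    @end

    // Archive an array of Items to disk, then restore it on the next launch.
    [NSKeyedArchiver archiveRootObject:items toFile:path];
    NSArray *restored = [NSKeyedUnarchiver unarchiveObjectWithFile:path];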

Optimal way to persist an object graph to flash on the iPhone

I have an object graph in Objective-C on the iPhone platform that I wish to persist to flash when closing the app. The graph has about 100k-200k objects and contains many loops (by design). I need to be able to read/write this graph as quickly as possible.
So far I have tried using NSCoder. This not only struggles with the loops but also takes an age and a significant amount of memory to persist the graph - possibly because an XML document is used under the covers. I have also used an SQLite database but stepping through that many rows also takes a significant amount of time.
I have considered using Core Data but fear I will suffer the same issues as with SQLite or NSCoder, as I believe the backing stores to Core Data work in the same way.
So is there any other way I can handle the persistence of this object graph in a lightweight way? Ideally I'd like something like Java's serialization. I've been thinking of trying Tokyo Cabinet, or writing the memory occupied by a bunch of C structs out to disk, but that's going to be a lot of rewrite work.
I would recommend rewriting as C structs. I know it will be a pain, but not only will it be quick to write to disk, it should also perform much better.
Before anyone gets upset, I am not saying people should always use structs, but there are some situations where this is actually better for performance, especially if you pre-allocate your memory in, say, 20k contiguous blocks at a time (with pointers into the block) rather than creating/allocating lots of little chunks within a repeated loop.
I.e. if your loop continually allocates objects, that is going to slow it down. If you have preallocated 1,000 structs and just have an array of pointers (or a single pointer), then this is an order of magnitude faster.
(I have had situations where even my desktop Mac was too slow and did not have enough memory to cope with those millions of objects being created in a row.)
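A rough sketch of what that looks like (the field names and counts are invented); the key points are one contiguous allocation and index-based links, so the graph's loops survive the round trip to disk:

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    typedef struct {
        int32_t value;
        int32_t links[4];  // indices into the same block; -1 means no link
    } Node;

    static void saveGraph(const char *path) {
        size_t count = 200000;
        Node *nodes = calloc(count, sizeof(Node)); // one contiguous allocation

        /* ... populate nodes, storing edges as indices, not pointers ... */

        // The whole graph is then a single sequential write.
        FILE *f = fopen(path, "wb");
        fwrite(nodes, sizeof(Node), count, f);
        fclose(f);
        free(nodes);
    }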
Rather than rolling your own, I'd highly recommend taking another look at Core Data. Core Data was designed from the ground up for persisting object graphs. An NSCoder-based archive, like the one you describe, requires you to have the entire object graph in memory and all writes are atomic. Core Data brings objects in and out of memory as needed, and can only write the part of your graph that has changed to disk (via SQLite).
If you read the Core Data Programming Guide or their tutorial guide, you can see that they've put a lot of thought into performance optimizations. If you follow Apple's recommendations (which can seem counterintuitive, like their suggestion to denormalize your data structures at some points), you can squeeze a lot more performance out of your data model than you'd expect. I've seen benchmarks where Core Data handily beat hand-tuned SQLite for data access within databases of the size you're looking at.
On the iPhone, you also have some memory advantages when controlling the batch size of fetches, and there is a very nice helper class in NSFetchedResultsController.
It shouldn't take that long to build up a proof-of-principle Core Data implementation of your graph to compare it to your existing data storage methods.