I'm working on an application backed by Core Data.
Right now, I'm saving the object context every time I add an entity to or delete an entity from the context.
I'm afraid this will hurt performance, so I was thinking of delaying the save.
In fact, I could delay it all the way until the application is about to terminate.
Is it too risky to save the data only when the application is about to close? How often should I call save on the object context?
I was thinking of having a separate thread handle the save: it would wait on a counting semaphore. Every time any part of the application calls a helper/util method to save Core Data, the semaphore is decremented. When it reaches zero, the "save thread" performs a single save, resets the semaphore to, say, 5, and goes back to sleep.
Any good recommendation?
Thanks!
You should save frequently. The actual performance of the save operation has a lot to do with which persistent store type you're using. Since binary and XML stores are atomic, they need to be completely rewritten to disk on every save. As your object graph grows, this can really slow down your application. The SQLite store, on the other hand, is much easier to write to incrementally. So, while there will be some stuff that gets written above and beyond the objects you're saving, the overhead is much lower than with the atomic store types. Saves affecting only a few objects will always be fast, regardless of overall object graph size.
That said, if you're importing data in a loop, say, I would wait until the end of the complete operation to save rather than saving on each iteration. Your primary goal should be to prevent data loss. (I have found that users don't care for that very much!) Performance should be a close second. You may have to do some work to balance the frequency of saving against performance, but the solution you outline above seems like overkill unless you've identified a specific and significant performance issue.
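To make that concrete, here is a rough sketch of the "save once after the loop" pattern; the Item entity, its name attribute, and the method name are hypothetical placeholders for whatever your model actually uses.

// Import in a loop, saving once at the end instead of on every iteration.
// "Item" and "name" are placeholder names; substitute your own entity/attributes.
- (void)importItemNames:(NSArray *)names intoContext:(NSManagedObjectContext *)context
{
    for (NSString *name in names) {
        NSManagedObject *item =
            [NSEntityDescription insertNewObjectForEntityForName:@"Item"
                                           inManagedObjectContext:context];
        [item setValue:name forKey:@"name"];
        // No save here; saving on every iteration is what gets expensive.
    }

    NSError *error = nil;
    if (![context save:&error]) {
        NSLog(@"Import save failed: %@", error);
    }
}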
One issue not mentioned in the other answers is that your solution involves a background thread, and a background thread should not operate on a managed object context that is used by another thread. Generally you create a new MOC for a background thread, but that would defeat the purpose here: saving a separate, unmodified background MOC would not persist the changes made on the original context.
So a few answers to your question:
You would need to call back to your original thread to save the MOC.
As the currently accepted answer suggests, the whole counter scheme is probably overkill for your needs unless you have actually measured a performance issue.
If a performance issue WAS measured, you could use a simple throttling technique: set a limit of, say, one save per 10 seconds, store the date of the last save, and whenever your save function is called, return early unless more than 10 seconds have passed since that last save (see the sketch below).
You really want to be saving immediately as much as possible, so at the very least my recommendation is to throttle rather than arbitrarily set any timer or countdown.
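Here is a minimal sketch of that throttle, assuming a class that keeps its managed object context in a context property; the 10-second window and the names are only illustrative.

// Throttled save: at most one actual save every 10 seconds.
- (void)saveIfNeeded
{
    static NSDate *lastSave = nil;
    if (lastSave && [[NSDate date] timeIntervalSinceDate:lastSave] < 10.0) {
        return; // Saved less than 10 seconds ago, so skip this one.
    }

    NSError *error = nil;
    if ([self.context save:&error]) {
        lastSave = [NSDate date];
    } else {
        NSLog(@"Save failed: %@", error);
    }
}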
The best way, I think, is to save after every object. If something such as a sudden crash ever happens, nothing will be lost.
One performance enhancement, if you are adding a lot of objects, is to batch: add all the objects to the context, then save. This is good, for example, if you are adding many objects in a loop. Your idea is similar, but there could be a long time between saves, during which the program could crash.
I don't think adding a single object would be that much of a performance problem. How big are your objects? Do they contain a lot of data?
I am doing a calculation-intensive operation in loops (hundreds of iterations for iterative formulas). In each iteration the values are fetched directly from NSUserDefaults, calculated, and saved back. My question is: should I use the -synchronize method each time I write to NSUserDefaults? I think my application runs much faster without calling it. Does using synchronize slow down the calculations?
Does using synchronize slow down the calculations?
Yes, absolutely. synchronize writes the current user default values to the disk.
Should I use the -synchronize method each time I write to NSUserDefaults?
No, absolutely not. If you have a long loop in which you are changing user defaults, the values are kept in memory, so skipping synchronize won't mess up your calculations. It is only necessary to save to disk after the loop is done.
synchronize is usually done:
manually, before the app is terminated or sent to the background
automatically by the system every few minutes
manually by the program after some important changes are made that you don't want to risk losing in the event of a crash or sudden power off.
In your case, after the long loop, you want to do it for reason 3.
By doing it every time within the loop, you are just unnecessarily writing values to flash, which you likely immediately overwrite.
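For illustration, a minimal sketch of that pattern, with a made-up key name and a stand-in formula:

#import <Foundation/Foundation.h>

// Write to NSUserDefaults freely inside the loop (this only touches the
// in-memory cache) and synchronize at most once after the loop.
void runCalculation(void)
{
    NSUserDefaults *defaults = [NSUserDefaults standardUserDefaults];
    double total = 0.0;

    for (NSUInteger i = 1; i <= 1000; i++) {
        total += 1.0 / (double)i;                           // stand-in calculation
        [defaults setDouble:total forKey:@"runningTotal"];  // in-memory update only
    }

    // One write to disk, after the loop is done (reason 3 above).
    [defaults synchronize];
}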
No! You should not. Consider synchronizing in applicationWillTerminate.
No. In theory you never need to call it at all, it will be done for you (it “is automatically invoked at periodic intervals”). In practice, it's a good idea to do so in applicationWillResignActive:.
There are so many functions like
1. NSDefaultMallocZone()
2. NSCreateZone();
3. NSRecycleZone();
4. NSSetZoneName();
5. NSZoneMalloc();
and many more related to NSZone
What does NSZone mean, and where and when should these functions be used?
What are the advantages of initWithZone:, and how do I use it in my iPhone app?
NSZone is Apple's way of optimizing object allocation and freeing. NSZone is not an object; it is an opaque C-struct storing information about how memory should be handled for a set of objects.
One rarely needs to worry about handling your own zones in applications; Cocoa handles it transparently. A default NSZone is created on startup and all objects default to being allocated there. So why would you want to use your own?
If you are mass-allocating hundreds of cheap objects, you may find the cost of actually allocating space for them becomes significant. Because the standard zone is used all the time, it can become very patchy; deleted objects can leave awkward gaps throughout memory. The allocator for the standard NSZone knows this, and it tries to fill these gaps in preference to grabbing more memory off the system, but this can be costly in time if the zone has grown quite large.
If you want to mass-allocate objects, then, you can create your own zone and tell it not to bother with finding gaps to put new objects in. The allocator can now jump to the end of its allotted memory each time and quickly assign memory to your new objects, saving a lot of effort. Allocators can save you time elsewhere, too, as asking the OS for more memory, which a zone needs to do whenever it fills up, is another costly operation if it's done a lot. Much quicker is to ask for huge chunks of memory at a time, and you can tell your NSZone what to do here as well.
Rumor has it that NSZone could save you deallocation time in the Good Old Days, too, with a method that simply chucks away all the allotted memory without bothering to call deallocators. If a set of objects is self-contained, this could save a lot of time, as you can chuck them all away at once without tediously deallocating them all. Alas, there appears to be no sign of this godsend in the current documentation; the single NSZone method (NSRecycleZone?) carefully puts all the objects in a zone neatly on the default NSZone. Not exactly a huge time-saver.
So, in summary, zones save you time in mass allocations. But only if programmers know how to use them!
From CocoaDev
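For reference, here is a short sketch of how the functions listed in the question fit together. Bear in mind that on modern Apple runtimes custom zones are essentially ignored (allocWithZone: just uses the default zone), so this is mostly of historical interest; the sizes and the zone name below are arbitrary.

#import <Foundation/Foundation.h>

void demoZones(void)
{
    // Create a zone that starts at roughly 64 KB and grows in 8 KB steps.
    NSZone *zone = NSCreateZone(64 * 1024, 8 * 1024, YES);
    NSSetZoneName(zone, @"BulkZone");

    // Raw allocation from the zone (paired with NSZoneFree).
    char *buffer = NSZoneMalloc(zone, 1024);
    NSZoneFree(zone, buffer);

    // Object allocation nominally directed at the zone.
    NSMutableArray *array = [[NSMutableArray allocWithZone:zone] init];
    [array addObject:@"example"];

    // The default zone that everything is normally allocated from.
    NSZone *defaultZone = NSDefaultMallocZone();
    (void)defaultZone;

    // Hand the zone back; as the quoted answer notes, NSRecycleZone moves the
    // zone's objects onto the default zone rather than freeing them wholesale.
    NSRecycleZone(zone);
}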
I'm trying to import a large amount of data into a Core Data store on the iPhone. I'm using a SQLite backing for the Core Data store. It seems to be taking way longer than I would expect it to. I've trimmed down the routines so that it is basically just attempting to fetch an object (to see if it already exists) and then create a new object if it doesn't (they never do, since I am importing data). The fetching isn't the time-consuming part, though. It's the creation of the objects. Basically, the offending code is:
MobileObject *newObject = (MobileObject *)[NSEntityDescription insertNewObjectForEntityForName:objDesc inManagedObjectContext:managedObjectContext];
I've noticed that on the simulator, it is fairly quick at the start with about 100 objects created a second. It slows down though and by the time five thousand objects are created it's almost 2 seconds for 100 objects and by the time ten thousand objects are created, it's 4 seconds per 100 objects. The whole group of 21000 objects takes more than 10 minutes. That is with all the actual useful code taken out (that's just a fetch and an object create). And it's much much slower on the actual device (by maybe 4 times as much).
What I don't understand is why core data starts off fast but then begins to slow down. I've tried both with index and no indexes on my data. I've tried creating my own autorelease pool which I periodically drain in my loop. I've tried saving after every object creation. I've tried waiting until the end to save. But no matter what I do, the performance still seems miserable. Is it just that slow to add a new object to a core data store with a few thousand objects in it? Any suggestions?
It can be quite speedy but it depends on what you are doing. As others have suggested you should be looking at Instruments and finding the actual hotspot. Also posting the actual import code would help to identify the issue.
Try using Instruments. Are you saving after inserting every single object? Seeing more of the insert-related code and your schema would be very useful.
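Without seeing the rest of the code, one common pattern worth profiling against is a batched import: save and reset the context every N inserts, inside an autorelease pool, so the unsaved object graph never grows large. This is only a sketch; the batch size, the dictionary-shaped records, and the KVC-based attribute setting are assumptions, not taken from the question.

static const NSUInteger kBatchSize = 500;  // tune by measuring in Instruments

- (void)importRecords:(NSArray *)records
          intoContext:(NSManagedObjectContext *)context
{
    [context setUndoManager:nil];  // no undo registration during a bulk import

    NSUInteger count = 0;
    for (NSDictionary *record in records) {
        @autoreleasepool {
            NSManagedObject *obj =
                [NSEntityDescription insertNewObjectForEntityForName:@"MobileObject"
                                               inManagedObjectContext:context];
            [obj setValuesForKeysWithDictionary:record];

            if (++count % kBatchSize == 0) {
                NSError *error = nil;
                if (![context save:&error]) {
                    NSLog(@"Batch save failed: %@", error);
                }
                [context reset];  // drop the already-saved objects from memory
            }
        }
    }

    NSError *error = nil;
    if (![context save:&error]) {
        NSLog(@"Final save failed: %@", error);
    }
}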
I had a discussion with a coworker about the architecture of a program I'm writing, and I'd like some more opinions.
The Situation:
The program should update in near real time (+/- 1 minute).
It involves the movement of objects on a coordinate system.
There are some events that occur at regular intervals (e.g. creation of the objects).
Movements can change at any time through user input.
My solution was:
Build a server that runs continuously and stores the data internally.
The server dumps the state of the program at regular intervals to protect against power failures and/or crashes.
He argued that the program requires a database and that I should use cron jobs to update the data. I could store movement information as start point, end point and speed, and have the cron job update the positions (and calculate collisions with other objects there) from direction and speed.
His reasons:
A constantly running server requires more CPU & memory.
Power failures/crashes might destroy data.
Databases are faster.
My reasons against this are mostly:
It's not very precise, as events can only occur at full minutes (that wouldn't be too bad, though).
It requires a (possibly costly) transformation of the data from relational form to objects on every run.
An RDBMS is a general solution for a specialized problem, so a specialized solution should be more efficient.
Power failures (or other crashes) can leave the data in an undefined state, with only partially updated data, unless (possibly costly) precautions such as transactions are taken.
What are your opinions about that?
Which arguments can you add for any side?
Databases are not faster. How silly... how can a database be faster than writing a custom data structure and storing it in memory? Databases are generalized tools that persist data to disk for you so you don't have to write all the code to do that yourself. Because they have to address numerous disparate (and sometimes inconsistent) needs (persistence/durability, transactional integrity, caching, relational integrity, atomicity, etc.), and do it in a way that protects the application developer from having to worry about them so much, by definition they are going to be slower. That doesn't necessarily mean his conclusion is wrong, however.
Each of his other objections can be addressed by writing the code to handle that issue yourself... but you see where that is going. At some point, the development effort of writing custom code to address the issues that matter for your application outweighs the performance hit of just using a database, which already does all of that out of the box. How many of these issues are important? And do you know how to write the code necessary to address them?
From what you've described here, I'd say your solution does seem to be the better option. You say it runs once a minute, but how long does it take to run? If only a few seconds, then the transformation to relational data would likely be inconsequential, as would any other overhead; most of it would likely take 30 seconds. This is assuming, again, that the program is quite small.
However, if it is larger, and assuming that it will get larger, doing a straight dump is a better method. You might not want to do a full dump every run, but that's up to you; just remember that it could wind up taking a lot of space (the same goes if you're using a database).
If you're going to dump the state, you would need some sort of redundancy system in place, along with quasi-transactions. You would want to store several copies in case something happens to the newest version, say the power goes out while you're storing and you have no backups beyond that half-written one. As for transactions, you would need something to tell you that the file has been fully written, so that if something does go wrong you can always tell which save was the most recent successful one (a sketch of this is below).
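To make that concrete, here is a small sketch of an atomic dump that keeps the previous copy as a backup. The file names, the directory handling, and the assumption that the state has already been serialized into an NSData are placeholders, not a prescription.

// Write the new state atomically and keep the last known-good dump as a backup.
- (BOOL)dumpState:(NSData *)stateData toDirectory:(NSString *)dir error:(NSError **)error
{
    NSString *currentPath = [dir stringByAppendingPathComponent:@"state.dump"];
    NSString *backupPath  = [dir stringByAppendingPathComponent:@"state.dump.previous"];
    NSFileManager *fm = [NSFileManager defaultManager];

    // Keep the previous dump around in case this write is interrupted.
    if ([fm fileExistsAtPath:currentPath]) {
        [fm removeItemAtPath:backupPath error:NULL];
        [fm copyItemAtPath:currentPath toPath:backupPath error:NULL];
    }

    // Atomic write: the data goes to a temporary file first and is renamed into
    // place, so a power failure never leaves a half-written state.dump behind.
    return [stateData writeToFile:currentPath
                          options:NSDataWritingAtomic
                            error:error];
}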
Oh, and as for his argument that it runs constantly: if it is driven by a cron job, or even a self-contained sleep loop or something similar, it doesn't use any CPU time when it's not doing work, which is no more than it would use with an RDBMS.
If you're writing straight to disk, this will be faster than a database for both writing and retrieval, since, as you pointed out, there is no overhead.
Summary: A database is a good idea if you have a lot of idle processor time or historical records, but if resources are a legitimate concern, then it can become too much overhead and a dump with precautions taken is better.
MySQL can now model spatial data.
http://dev.mysql.com/doc/refman/4.1/en/gis-introduction.html
http://dev.mysql.com/doc/refman/5.1/en/spatial-extensions.html
You could use the database to keep track of world locations, user locations, item locations, etc.
Is it a good idea to warm up a cache in a BEGIN block, when it gets used?
You didn't really provide any information on what kind of environment you're talking about, which I think is important. In most cases the answer is probably "no", but I can think of one case where it's a definite yes: preforking servers -- web applications and the like. In that case, any work that you can do "before the fork" not only saves the cost of having the children recompute the same values individually, it also saves memory, since the pages containing the results can be shared across all of the child processes by the OS's COW mechanism.
If you're talking about a module you're writing and not an application, then I'd say no, don't lift things to compilation time without the user's permission unless they're things that have to be done for the module to work. Instead, provide a preheat_cache class method, and if your caller has a reason to need a hot cache at compile time they can put the call into a BEGIN block themselves. You could also use a :preheat_cache import tag but that's unnecessarily fancy in my book.
If it's a choice between preloading your cache at compile time, or preloading your cache as the first thing you do at run time, there's virtually no difference.
If your cache is large enough that loading it will trigger a lot of page swaps, that's an argument for waiting until run time. That way, all your module loading and other compile time code can be done while your system is under a lighter load.
I'm going to go with "no", even though I could be wrong. Reasoning goes like this: keep the code, and data it uses, small, so that it takes up less space in any caches (I am presuming you mean CPU cache, not programmatic hashes with common query results or some such thing).
Unless you see some sort of bad access pattern, trying to second guess what needs to be prefetched is probably useless at best. In fact such code or initialization data is likely to displace something you (or another process on the system) were actually using. Think about what you can do in the actual work part of the code to maximize locality of reference, to try to stay within smaller memory regions at any one time.
I used to use "top" to detect when processes were swapping between memory and disk. I don't know of any good tools yet to tell how often a process is getting cache misses and going to plain old slow mo'board memory. There must be such tools, I just don't know what they are yet (software tools, rather than some custom In Circuit Emulator type hardware). Perhaps some thought on this earlier in the day...
By "warm up" I assume you mean using a BEGIN block to guarantee the cache is preloaded before anything else in your script executes?
If you need the cache for your program to run properly, then yes, I think it would be a good idea.