How big an XML file can we parse in an iPhone application? - iphone

I will have an XML file on a server. It will store information for about 600 stores; the information includes name, address, opening time, and coordinates. Is it OK to parse the whole file on the iPhone and then select the nearest stores according to the coordinates?
I am concerned about processing time and memory use.
Please advise.

The way I would do this is to write a web service, pass it the coordinates, and download only the stores within a certain radius. Always try to download as little data as possible to the iPhone (especially XML data).
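For illustration, a minimal client-side sketch of that idea. The host, the /stores path, and the parameter names are all placeholders for whatever your own service exposes:

    #import <Foundation/Foundation.h>
    #import <CoreLocation/CoreLocation.h>

    // Ask the server for only the stores near the user. The URL and its
    // parameter names are made up; shape them to match your own service.
    NSData *FetchNearbyStores(CLLocation *userLocation, double radiusKm)
    {
        NSString *urlString = [NSString stringWithFormat:
            @"http://example.com/stores?lat=%.6f&lon=%.6f&radius_km=%.1f",
            userLocation.coordinate.latitude,
            userLocation.coordinate.longitude,
            radiusKm];
        // Synchronous for brevity; in a real app do this off the main
        // thread, or with NSURLConnection's asynchronous API.
        return [NSData dataWithContentsOfURL:[NSURL URLWithString:urlString]];
    }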

I'll just put this here:
http://quatermain.tumblr.com/post/93651539/aqxmlparser-big-memory-win

A simple solution would be to group the stores into clusters that are somehow related, probably by location. You already have the XML on a server, so simply split it up into 3 groups of around 200 related stores each, or preferably even smaller. I'm not entirely sure why you would want to store 600 data points of that nature on the device; if you filter/shrink on the server side you could save a lot of time/memory.
I have seen people store 300-400 data points, though it depends so much on how large the objects in your Core Data store are that it is probably best for you to just run some tests.

Related

How would you minimize or compress Core Data sqlite file size?

I have a 215 MB CSV file which I have parsed and stored in Core Data, wrapped in my own custom objects. The problem is that my Core Data SQLite file is around 260 MB. The CSV file contains about 4.5 million lines of data on my city's transit system (bus stops, times, routes, etc.).
I have tried modifying attributes so that arrays of strings representing stop times are stored instead as NSData, but for some reason the file size still remains at around 260 MB.
I can't ship an app this size. I doubt anyone would want to download a 260 MB app, even if it means they have the whole city's transit schedule on it.
Are there any ways to compress or minimize the storage space used? Even if it means not using Core Data, I am willing to hear suggestions.
EDIT: I just want to provide an update, because I have been staring at the file size in disbelief. With some clever manipulation involving strings, indexing, and database normalization in general, I have managed to reduce the size to 6.5 MB, or 2.6 MB when compressed: about 105,000 objects stored in Core Data containing the full details of the city's transit system. I'm almost in tears right now D':
Unless your original CSV is encoded in a really foolish manner, it seems unlikely that the size is going to get below 100 MB, no matter how much you compress it. That's still really large for an app. The solution is to move your data to a web service. You may want to download and cache significant parts, but if you're talking about millions of records, fetching from a server seems best. Besides, I have to believe that the transit system changes from time to time, and it would be frustrating to have to upgrade a many-tens-of-MB app every time there was a single stop adjustment.
Having said that, there are some things you may consider:
Move booleans into bit fields. You can put 64 booleans into a 64-bit integer. (And don't use a full 64-bit integer if you just need 8 bits. Store the smallest thing you can; see the sketch after this list.)
Compress how you store times. There are only 1440 minutes in a day; you can store that in 2 bytes. Transit times are generally not measured to the second, so they don't need a CGFloat.
Days of the week and dates can similarly be compressed.
Obviously you should normalize any strings. Look at the CSV for duplicated string values on many lines.
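To make the first two suggestions concrete, here is a minimal sketch; the field and flag names are invented for illustration:

    #include <stdint.h>

    // Minute-of-day fits in 16 bits (0..1439), and eight booleans fit in
    // one byte, tested with bit masks.
    typedef struct {
        uint16_t departureMinute;  // e.g. 8:35 AM -> 8*60 + 35 = 515
        uint8_t  flags;            // packed booleans
    } StopTimeRecord;

    enum {
        kStopFlagWheelchair = 1 << 0,
        kStopFlagExpress    = 1 << 1,
    };

    // Usage:
    // StopTimeRecord rec = { 515, kStopFlagExpress };
    // BOOL isExpress = (rec.flags & kStopFlagExpress) != 0;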
I would generally recommend raw SQLite rather than Core Data for this kind of problem. Core Data is more about object persistence than raw data storage. The fact that you're seeing 20% bloat over CSV (which is not itself highly efficient) is a sign that it's not a good fit for this problem.
If you want to get even tighter, and don't need very good search capabilities, you can create packed data blobs. I used to do this on phone switches, where memory was extremely tight. You create a bit-field struct and allocate 5 bits for one variable, 7 bits for another, etc. With that, and some time spent shuffling things so they line up correctly on word boundaries, you can get pretty tight.
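A sketch of that kind of struct; the widths here are made up for illustration, so size them to your actual value ranges:

    #include <stdint.h>

    // Packed record using C bit fields. Five fields in one 32-bit word.
    typedef struct {
        unsigned int minuteOfDay : 11;  // 0..1439 fits in 11 bits
        unsigned int dayOfWeek   : 3;   // 0..6
        unsigned int routeIndex  : 10;  // up to 1024 routes
        unsigned int stopIndex   : 7;   // up to 128 stops per route
        unsigned int isExpress   : 1;
    } PackedStopTime;                   // 32 bits total, word-aligned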
Since you care most about your initial download size, and may be willing to expand your data later for faster access, you can consider very domain-specific compression. For example, in the discussion above I mentioned how to get down to 2 bytes for a time. You could probably get down to 1 byte in many cases by storing each time as the delta in minutes since the previous one (since most of your times will be increasing by fairly small steps if they're bus and train schedules). Abandoning the database, you could create a very tightly encoded data file that you extract into a database on first launch.
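A sketch of that delta encoding, assuming sorted times with gaps under 256 minutes (a real encoder would need an escape value for larger gaps):

    #include <stdint.h>
    #include <stddef.h>

    // Store the first minute-of-day in two bytes, then one byte per
    // subsequent time as the gap from the previous one.
    size_t EncodeTimes(const uint16_t *times, size_t count, uint8_t *out)
    {
        if (count == 0) return 0;
        size_t n = 0;
        out[n++] = times[0] & 0xFF;  // low byte of first time
        out[n++] = times[0] >> 8;    // high byte of first time
        for (size_t i = 1; i < count; i++) {
            out[n++] = (uint8_t)(times[i] - times[i - 1]);  // small delta
        }
        return n;  // bytes written: 2 + (count - 1)
    }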
You also can use domain-specific knowledge to encode your strings into smaller tokens. If I were encoding the NY subway system, I would notice that some strings show up a lot, like "Avenue", "Road", "Street", "East", etc. I'd probably encode those as unprintable ASCII like ^A, ^R, ^S, ^E, etc. I'd probably encode "138 Street" as two bytes (0x8A13). This of course is based on my knowledge that è (0x8a) never shows up in the NY subway stops. It's not a general solution (in Paris it might be a problem), but it can be used to highly compress data that you have special knowledge of. In a city like Washington DC, I believe their highest numbered street is 38th St, and then there's a 4-value direction. So you can encode that in two bytes, first a "numbered street" token, and then a bit field with 2 bits for the quadrant and 6 bits for the street number. This kind of thinking can potentially significantly shrink your data size.
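A naive version of that token substitution. The token bytes and word list are arbitrary; a real implementation also needs the reverse mapping, and care with words contained inside other words:

    #import <Foundation/Foundation.h>

    // Replace frequent words with single unprintable bytes before storing.
    // The bytes just need to never occur in your real stop names.
    static NSString *CompressStopName(NSString *name)
    {
        NSDictionary *tokens = [NSDictionary dictionaryWithObjectsAndKeys:
            @"\x01", @"Avenue",
            @"\x02", @"Street",
            @"\x03", @"Road",
            @"\x04", @"East",
            nil];
        NSMutableString *result = [[name mutableCopy] autorelease];
        for (NSString *word in tokens) {
            [result replaceOccurrencesOfString:word
                                    withString:[tokens objectForKey:word]
                                       options:0
                                         range:NSMakeRange(0, [result length])];
        }
        return result;
    }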
You might be able to perform some database normalization.
Look for anything that might be redundant, or the same values stored in multiple rows. You will probably need to restructure your database so that these duplicate values (if any) are stored in separate tables and referenced from their original rows by ID.
How big is the SQLite file when compressed? If it's satisfactorily small, the simplest thing would be to ship it compressed, then uncompress it to NSCachesDirectory.
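A sketch of that first-launch decompression, assuming the store was gzipped and the app links against libz (file names are placeholders):

    #import <Foundation/Foundation.h>
    #include <zlib.h>  // add libz to "Link Binary With Libraries"

    // Inflate the bundled, gzipped store into Caches on first launch.
    static void InstallDatabaseIfNeeded(void)
    {
        NSString *cachesDir = [NSSearchPathForDirectoriesInDomains(
            NSCachesDirectory, NSUserDomainMask, YES) objectAtIndex:0];
        NSString *dbPath = [cachesDir stringByAppendingPathComponent:@"transit.sqlite"];
        if ([[NSFileManager defaultManager] fileExistsAtPath:dbPath]) return;

        NSString *gzPath = [[NSBundle mainBundle] pathForResource:@"transit.sqlite"
                                                           ofType:@"gz"];
        gzFile src = gzopen([gzPath fileSystemRepresentation], "rb");
        NSOutputStream *dst = [NSOutputStream outputStreamToFileAtPath:dbPath
                                                                append:NO];
        [dst open];
        uint8_t buf[64 * 1024];
        int n;
        while ((n = gzread(src, buf, sizeof(buf))) > 0) {
            [dst write:buf maxLength:(NSUInteger)n];
        }
        [dst close];
        gzclose(src);
    }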

Load and perform search on a large amount of data

I need a suggestion on how to work with a large amount of data on the iPhone. Let's say I have an XML file with ~120k text records, and I need to perform searches on this data. The solution I have tried is to use Core Data to store the information in sorted order in caches, and then use binary search, which works fast. But the problem is building these caches: on first launch the application takes about 15-25 seconds to build them. Maybe I need a different approach to searching the data?
Thanks in advance.
If you're using an XML file with the requirement that you can't cache, then you're not going to succeed unless you somehow carefully format your XML file to have useful data-traversal properties -- but then you may as well use a more suitable binary file, unless you have some very esoteric requirements.
Really, what you want from the start is one of the typical indexing structures (an on-disk hash, a B-tree, etc.).
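That said, with a flat sorted cache like the one you describe, the lookup side is cheap. A sketch using the binary search built into NSArray (iOS 4+), assuming an array of plain string keys:

    #import <Foundation/Foundation.h>

    // Binary search over an NSArray of strings that is already sorted.
    // Returns NSNotFound if the key is absent.
    NSUInteger FindRecord(NSArray *sortedKeys, NSString *key)
    {
        return [sortedKeys indexOfObject:key
                           inSortedRange:NSMakeRange(0, [sortedKeys count])
                                 options:NSBinarySearchingFirstEqual
                         usingComparator:^NSComparisonResult(id a, id b) {
                             return [a compare:b];
                         }];
    }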
However...
If you have to read in and parse your XML text file, then you can sidestep the typical big, slow, generic XML parser and write a fast, hackish version, since most of the data records you need to recognize are probably formatted the same way over and over. Nothing special: just find where the relevant data field starts, grab the data until it ends, and move on to the next field.
Honestly, 120k of text isn't very much; it sounds like whatever XML parser you're using is just slow. (I use this trick all the time for autogenerated XML data that just represents things like tables or simple data records; my own parser is faster than any generic XML parser.)
This is probably the solution you actually want, since you sound fairly attached to the XML file format. It won't be as error-proof as a generic XML parser if you're not careful, but it will eat that 120KB file up like nobody's business. And it's entry-level CS work: read in a file with certain specific formatting and grab the data values from it. Regexps are your friend if you have access to them.
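As a sketch of that hackish approach, here is a scanner that pulls every value between a fixed pair of tags. The tag handling is deliberately naive (no attributes, CDATA, or entities), so only use it on rigidly machine-generated files whose shape you control:

    #import <Foundation/Foundation.h>

    // Extract every <tag>value</tag> occurrence without building a DOM.
    NSArray *ExtractValues(NSString *xml, NSString *tag)
    {
        NSMutableArray *values = [NSMutableArray array];
        NSString *open  = [NSString stringWithFormat:@"<%@>", tag];
        NSString *close = [NSString stringWithFormat:@"</%@>", tag];
        NSScanner *scanner = [NSScanner scannerWithString:xml];
        while (![scanner isAtEnd]) {
            [scanner scanUpToString:open intoString:NULL];
            if (![scanner scanString:open intoString:NULL]) break;  // no more records
            NSString *value = nil;
            if ([scanner scanUpToString:close intoString:&value]) {
                [values addObject:value];
            }
        }
        return values;
    }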
Try storing the data and doing your searches in the cloud (using a database stored on a server somewhere), unless you specifically need ALL of the information on the device.

How to load data into Core Data?

Thanks for your help.
I'm attempting to add Core Data to my project, and I'm stuck on where and how to add the actual data into the persistent store (I'm assuming this is the place for the raw data).
I will have more than 1000 objects, so I don't want to use a plist approach. From my searches, there seem to be XML and CSV approaches. Is there a way I can use SQL for input?
The data will not be changed by the user, and the data file will be typed in by hand, so I won't need to update these files at runtime. At this point I am not limited to any particular type of file; whatever is lightest on syntax is preferred.
Thanks again for any help.
You could load your data from an XML/CSV/JSON file and create the DB on the first launch of your application (if the DB is not there, read the data and create it).
A better/faster approach might be to ship your SQLite DB within your application. You can parse the file in whatever format you want on the simulator, create a DB with all your entities, then take it from the simulator's application data folder and just add it to your app as a resource.
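A sketch of that first-launch copy, with placeholder file names; on first run this costs one file copy instead of a full parse-and-import:

    #import <Foundation/Foundation.h>

    // Copy the prebuilt store out of the bundle before handing its path
    // to the NSPersistentStoreCoordinator. "Stores.sqlite" is a placeholder.
    static NSString *InstallSeedStore(void)
    {
        NSString *docsDir = [NSSearchPathForDirectoriesInDomains(
            NSDocumentDirectory, NSUserDomainMask, YES) objectAtIndex:0];
        NSString *storePath = [docsDir stringByAppendingPathComponent:@"Stores.sqlite"];
        NSFileManager *fm = [NSFileManager defaultManager];
        if (![fm fileExistsAtPath:storePath]) {
            NSString *seedPath = [[NSBundle mainBundle] pathForResource:@"Stores"
                                                                 ofType:@"sqlite"];
            [fm copyItemAtPath:seedPath toPath:storePath error:NULL];
        }
        return storePath;
    }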
Although I'm sure there are lighter file types that could be used, I would include a JSON file in the app bundle and import the initial dataset from it.
Update: some folks are recommending XML. NSXMLParser is almost as fast as JSONKit (and much faster than most other parsers), but XML syntax is heavier than JSON, so a bundled XML file holding the initial dataset would weigh more than the same data as JSON.
Considering that Apple treats the format of its persistent stores as an implementation detail, shipping a prefabricated SQLite database is not a very good idea; the names of fields and tables may change between iOS versions/phones/whatever hidden variable you can think of. In general, you should not concern yourself with how this serialization of your data is formatted.
There's a brief article about importing data on Apple's developer site: Efficiently Importing Data
You should ship the initial data in whatever format you're comfortable with (XML allows you to do incremental parsing efficiently, which reduces the memory footprint) and write an import routine to run when you need to import it.
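A sketch of such an import routine in the spirit of that article, saving in batches so memory stays flat; the entity and attribute names are placeholders:

    #import <CoreData/CoreData.h>

    // Insert records in batches, saving and draining the autorelease pool
    // every few hundred objects (per "Efficiently Importing Data").
    void ImportRecords(NSArray *records, NSManagedObjectContext *moc)
    {
        NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
        NSUInteger i = 0;
        for (NSDictionary *record in records) {
            NSManagedObject *obj = [NSEntityDescription
                insertNewObjectForEntityForName:@"Store"
                         inManagedObjectContext:moc];
            [obj setValue:[record objectForKey:@"name"] forKey:@"name"];
            if (++i % 500 == 0) {  // batch boundary
                [moc save:NULL];
                [pool drain];
                pool = [[NSAutoreleasePool alloc] init];
            }
        }
        [moc save:NULL];
        [pool drain];
    }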
Edit: With EliBud's comment in mind, I still consider the approach a bit "iffy"... The format of the SQLite database used by Core Data is not something you'd want to generate by yourself (it's weird, simply put, and still not something you should rely on).
So you'd want to use a mock app running on the Simulator and use Core Data to create the database (as per EliBud's answer). But you'd still have to import the data into that mock app! And while it might make sense to do this once on a "real" computer instead of many times on a mobile device (copying a file is easy; importing data is hard), you're essentially using the Simulator as an administration tool.
But hey, if it works...

Saving fetched data on disk

I'm creating an iPhone app which fetches information from a server every time it is started. However, I'm planning on using the fetched data of the last month/few months/year to calculate some averages.
I had been thinking about saving the data to NSUserDefaults using dictionaries (associating a date with a value), but I just remembered there is also something called Core Data. Seeing that I do not have any experience with Core Data, I don't know if it's better; if it isn't, I could save the time I'd otherwise spend learning it.
The data comes in XML format, and I get several sets of the same response each time (for different locations on a map). The number of sets can change, as the user can add more locations. Currently I only save the raw data to disk, to fall back on if the fetch fails the next time the app starts. However, I also want to save some specific values from that XML in a way that lets me access them easily. What would be the best way to do this?
Edit: I actually also need to know how fast/efficient Core Data is. I'm currently passing around NSArrays of NSDictionaries for the sets of data during a session. For saving the data that outlives the session, Core Data is ideal, I've found out that much (I just need a nice way to associate an entity with a date); what I need now is advice on its efficiency.
If you're going to be working with larger amounts of data, it's probably better to give Core Data a try anyway; it's not that complicated, after all, and there are plenty of good tutorials where you can learn it. There are different settings for the storage type: you can use either a SQLite database or an XML file.
According to Apple, Core Data should be fast and memory-efficient compared to self-made solutions, so it's the preferred way to go.
Core Data also makes it easier to manipulate and query the data using predicates. Core Data supports dates, so you can even find items in date ranges.
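For example, a date-range fetch might look like this; the entity and attribute names ("Measurement", "date") are placeholders for your own model, and moc is your managed object context:

    #import <CoreData/CoreData.h>

    // Fetch every "Measurement" whose "date" falls in the last 30 days.
    NSDate *monthAgo = [NSDate dateWithTimeIntervalSinceNow:-30 * 24 * 60 * 60];
    NSFetchRequest *request = [[[NSFetchRequest alloc] init] autorelease];
    [request setEntity:[NSEntityDescription entityForName:@"Measurement"
                                   inManagedObjectContext:moc]];
    [request setPredicate:[NSPredicate predicateWithFormat:
        @"date >= %@ AND date <= %@", monthAgo, [NSDate date]]];
    NSArray *results = [moc executeFetchRequest:request error:NULL];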

Core Data vs. file access

I have hundreds of files which need to be accessed to display content on the iPhone. They are all plists.
Which one is faster, Core Data or file access? Which one is more secure?
You have to consider the file size first. A nice rule of thumb found on these boards: if a file is under 100 kB you can store it as a BLOB attribute on an entity; if it is larger than that, you may want to create an ad-hoc entity for it; and if it exceeds 1 MB in size, access it through the filesystem.
Secondly, you should evaluate the cost of the operations too. 100 files may seem like many, but if you access them only a few times, maybe file access is the way to go; on the other hand, if you need the stored information frequently, you can create ad-hoc entities for Core Data and load the files at startup. And so on.
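The rule of thumb above, written down as a sketch; the thresholds mirror the figures mentioned, so tune them for your data:

    #import <Foundation/Foundation.h>

    // Decide where a payload should live, per the rule of thumb above.
    static BOOL ShouldStoreAsBlobAttribute(NSData *payload)
    {
        return [payload length] < 100 * 1024;   // under 100 kB: inline BLOB
    }
    static BOOL ShouldStoreOnFilesystem(NSData *payload)
    {
        // over 1 MB: keep as a plain file and store only its path
        return [payload length] > 1024 * 1024;
    }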
This is a nice book on Core Data. You can find many guidelines by reading it, but also keep in mind the general guidelines of database design.
If they are static files, I would recommend pre-loading them into a Core Data SQLite file. That will yield far better performance, especially if you structure your model properly.