How to maintain the huge result of xml parsing - iphone

I am developing an application in which I get 1000 results from XML parsing. Every result has different attributes, so I created one class for the attributes, create one object for every result, and save the results in one array. My concern is that, with this many results, I may run into memory problems. If that is the case, how do I handle it?

If you're parsing an exceptionally large document, use NSXMLParser and a delegate object to parse the document. Rather than creating an enormous tree of objects to represent the XML, the parser will call your delegate each time it encounters a new attribute, element, etc. This way you can build up your data objects directly, without wasting memory on an intermediate XML parse tree representation.
Once you are doing this, you can save the objects as you create them, or in batches.
If you're very memory conscious, you can actually use NSXMLParser to parse the input stream as it is downloading, so you never even need to have the full XML text in memory. (To avoid interruptions you could also download to a disk file, then parse from the file.)
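A minimal sketch of that delegate approach (assuming each result arrives as a <result> element carrying its values as attributes; ResultItem and configureWithAttributes: stand in for your own class and are not real API):

// Kick off parsing; xmlData could be the downloaded document or a chunk of it.
NSXMLParser *parser = [[NSXMLParser alloc] initWithData:xmlData];
[parser setDelegate:self];   // self adopts NSXMLParserDelegate
[parser parse];
[parser release];            // pre-ARC style, matching the era of this question

// Delegate callback: build your own model objects as elements are seen,
// so no intermediate XML tree is kept in memory.
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict
{
    if ([elementName isEqualToString:@"result"]) {
        ResultItem *item = [[ResultItem alloc] init];      // your per-result class
        [item configureWithAttributes:attributeDict];      // hypothetical helper
        [self.results addObject:item];                     // results: an NSMutableArray you keep, or save in batches
        [item release];
    }
}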

Memory management in Objective-C is explained well in the discussion below:
http://www.iphonedevsdk.com/forum/iphone-sdk-tutorials/7295-getters-setters-properties-newbie.html
Hope this answers your question.

Related

Better way to load content from web, JSON or XML?

I have an app which will load content from a website.
There will be around 100 articles with every load.
I would like to know which way is better to load content from web if we look at:
speed
compatibility (will there be any problems with encoding if we use special characters etc.)
your experience
JSON is better if your data is huge. Read more here:
http://www.json.org/xml.html
Strongly recommend JSON for better performance and less bandwidth consumption.
JSON all the way. The Saad's link is an excellent resource for comparing the two (+1 to the Saad), but here is my take from experience and based on your post:
speed
JSON is likely to be faster in many ways. Firstly the syntax is much simpler, so it'll be quicker to parse and to construct. Secondly, it is much less verbose. This means it will be quicker to transfer over the wire.
compatibility
In theory, there are no issues with either JSON or XML here. In terms of character encodings, I think JSON wins because you must use Unicode. XML allows you to use any character encoding you like, but I've seen parsers choke because the line at the top specifies one encoding and the actual data is in a different one.
experience
I find XML to be far more difficult to hand craft. You can write JSON in any text editor but XML really needs a special XML editor in order to get it right.
XML is more difficult to manipulate in a program. Parsers have to deal with more complexity: namespaces, attributes, entities, CDATA, etc. So if you are using a stream-based parser you need to track attributes, element content, namespace maps, etc. DOM-based parsers tend to produce complex graphs of custom objects (because they have to in order to model the complexity). I have to admit I've never used a stream-based JSON parser, but parsers producing object graphs can use the natural Objective-C collections.
On the iPhone, there is no built-in XML DOM parser in Cocoa (you can use the C-based parser, libxml2), but there is a simple-to-use JSON parser as of iOS 5.
In summary, if I have control of both ends of the link, I'll use JSON every time. On OS X, if I need a structured human readable document format, I'll use JSON.
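For reference, the iOS 5 parser mentioned above (NSJSONSerialization) takes only a few lines to use. This sketch assumes the downloaded response is already in an NSData called jsonData and that each article has a "title" key; both are placeholders:

NSError *error = nil;
id articles = [NSJSONSerialization JSONObjectWithData:jsonData
                                              options:0
                                                error:&error];
if ([articles isKindOfClass:[NSArray class]]) {
    for (NSDictionary *article in articles) {
        NSLog(@"Loaded article: %@", [article objectForKey:@"title"]);
    }
} else {
    NSLog(@"JSON parse failed: %@", error);
}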
You say you are loading "articles". If you mean documents containing rich text (stuff like italic and bold), then it's not clear that JSON is an option - JSON doesn't really do mixed content.
If it's pure, simple structured data, and if you don't have to handle complexities like the need for the software at both ends of the communication to evolve separately rather than remaining in lock step, then JSON is simpler and cheaper: you don't need the extra power or complexity of XML.

Cocoa app layout with Core Data and lots of business logic

For those who have seen my other questions: I am making progress, but I haven't yet wrapped my head around this aspect. I've been poring over Stack Overflow answers and sites like Cocoa With Love, but I haven't found an app layout that fits (why is there such a lack of scientific or business app examples? Recipe and book examples are too simplistic).
I have a data analysis app that is laid out like this:
Communication Manager (singleton, manages the hardware)
DataController (tells Comm.mgr what to do, and checks raw data it receives)
Model (receives data from datacontroller, cleans, analyzes and stores it)
MainViewController (skeleton right now, listens to comm.mgr to present views and alerts)
Now, my data will never be shown directly in a view (like a simple table of entities and attributes); I'll probably use Core Plot to plot the analyzed results (once I figure that out). The raw data saved will be huge (tens of thousands of points), and I am using a C++ vector wrapped in an ObjC++ class to access it. The vector class also has encodeWithCoder and initWithCoder methods which use NSData as a transport for the vector. I'm trying to follow proper design practices, but I'm lost on how to get persistent storage into my app (which will be needed to store and review old data sets).
I've read several sources that say the "business logic" should go into the model class. That is how I have it right now: I send the model the raw data, and it parses, cleans, and analyzes the results and then saves them into ivar arrays (of the vector class). However, I haven't seen a Core Data example yet whose managed object is anything but a simple store of very basic attributes (strings, dates), and they never have any business logic. So I wonder, how can I meld these two aspects? Should all of my analysis go into the data controller and have it manage the object context? If so, where is my model? (That seems to break the MVC architecture if my data is stored in my controller. Since these are vector arrays, I can't be constantly encoding and decoding them into NSData streams; they need a place to exist before I save them to disk with Core Data, and a place to exist after I retrieve them from storage and decode them for review.)
Any suggestions would be helpful (even on the layout I've already started). I just drew some of the communication between objects to give you an idea. Also, I don't have any of the connections between the model and view/view controllers yet (using NSLog for now).
While vector<> is great for handling your data that you are sampling (because of its support for dynamically resizing underlying storage), you may find that straight C arrays are sufficient (even better) for data that is already stored. This does add a level of complexity but it avoids a copy for data arrays that are already of a known and static size.
NSData's -bytes returns a pointer to the raw data within an NSData object. Core Data supports NSData as one of its attribute types. If you know the size of each item in the data, then you can use -length to calculate the number of elements, etc.
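As an illustration, pulling typed values back out of a stored NSData attribute might look like this (dataSet and double_t match the snippet in the Update below; the other names are placeholders):

// Read stored samples straight out of the Core Data NSData attribute, no copying.
NSData *stored = [dataObject dataSet];
const double_t *samples = (const double_t *)[stored bytes];
NSUInteger count = [stored length] / sizeof(double_t);
for (NSUInteger i = 0; i < count; i++) {
    // use samples[i] for analysis or plotting
}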
On the sampling side, I would suggest using vector<> as you collect data and, intermittently, copying the data to an NSData attribute and saving. Note: I ran into a bit of a problem with this approach (Truncated Core Data NSData objects) that I attribute to Core Data not recognizing changes made to an NSData attribute when it is backed by an NSMutableData object and that mutable object's data is changed.
As for the MVC question: I would suggest that the data (model) is managed by the Model. Views and controllers can ask the Model for data (or subsets of data) in order to display it, but ownership of the data stays with the Model. In my case, which may be similar to yours, there were times when the Model returned abridged data sets (using the Douglas-Peucker algorithm). The views and controllers were none the wiser that points were being dropped, even though their requests to the Model may have played a role in that (graph scaling factors, etc.).
Update
Here is a snippet of code from my Data class which extends NSManagedObject. For a filesystem solution, NSFileHandle's -writeData: and methods for monitoring file offset might allow similar (better) management controls.
// Exposed interface for adding data point to stored data
- (void) addDatum:(double_t)datum
{
    [self addToCache:datum];
}

- (void) addToCache:(double_t)datum
{
    if (cache == nil)
    {
        // This is temporary. Ideally, cache is separate from main store, but
        // is appended to main store periodically - and then cleared for reuse.
        cache = [NSMutableData dataWithData:[self dataSet]];
        [cache retain];
    }

    [cache appendBytes:&datum length:sizeof(double_t)];

    // Periodic copying of cache to dataSet could happen here...
}

// Called at end of sampling.
- (void) wrapup
{
    [self setDataSet:[NSData dataWithData:cache]]; // force a copy to alert Core Data of change
    [cache release];
    cache = nil;
}
I think you may be reading into Core Data a bit too much. I'm not that experienced with it, so I speak as a non-expert, but there are basically two categories of data storage systems.
First is the basic database, like SQLite, PostgreSQL, or any number of solutions. These are meant to store data and retrieve it; there's no logic so you have to figure out how to manage the tables, etc. It's a bit tricky at times but they're very efficient. They're best at managing lots of data in raw form; if you want objects with that data you have to create and manage them yourself.
Then you have something like Core Data, which shouldn't be considered a database as much as an "object persistence" framework. With Core Data, you create objects, store them, and then retrieve them as objects. It's great for applications where you have large numbers of objects that each contain several pieces of data and relationships with other objects.
From my limited knowledge of your situation, I would venture a guess that a true database may be better suited to your needs, but you'll have to make the decision there (like I said, I don't know much about your situation).
As for MVC, my view is that the view should only contain display code, the model should only contain code for managing the data itself and its storage, and the controller should contain the logic that processes the data. In your case it sounds like you're gathering raw data and processing it before storing it, in which case you'd want to have another object to process the data before storing it in the model, then a separate controller to sort, manage, and otherwise prepare the data before the view receives it. Again, this may not be the best explanation (or the best methods for your situation) so take it with a grain of salt.
EDIT: Also, if you're looking to get into Core Data more, I like this book. It explains the whole object-persistence-vs-database concept a lot better than I can.

How to save / reload a custom array to a plist

I'm loading data from an sqlite database, storing the values in the instance variables of a custom class, adding those objects to a mutable array, and then assigning that array to an instance variable of my view controller for use in a table view.
I would, though, like to save this array into a .plist file in the documents directory on the app's first run, so that I can retrieve the whole thing from there on later launches rather than pulling all 214 items from the database.
Is this approach a better option? If so, could someone please help me with some code that will allow me to save an array of my custom class as a .plist file? I've come across a lot of sample code on the web, but none of it works correctly.
I'd like to:
Check for the existence of the my_data.plist file.
If it exists, read it in as the array.
If it doesn't, read the data from the sqlite db into an array.
Save this data to a .plist so that it can be read in faster later.
Thanks guys, appreciate any help you can give me.
It will probably be faster to just get the values from your database on launch. There will almost definitely be more cost to parse a plist containing these values than to just get them all from the database, unless the query you have to use to get them from the database is really slow.
Note also that once you're saving these objects to a plist on disk, you're actually going to be hurting performance of your program because you'll be writing your objects to disk twice and reading them from disk twice. You'll also be introducing opportunities for discrepancies between the plist and the database in the event of a bug or a crash.
That said, the only way to prove this to yourself may be to implement and profile both options, and compare actual numbers. Check out #occulus's link, above, for instructions on how to read and write a plist. To profile your app, try using Instruments.
When I google for "nsarray writetofile custom object" (no quotes) and click on the first link in the results, I find a really useful page.
For the record, it's this:
http://www.cocoabuilder.com/archive/cocoa/240775-saving-nsarray-of-custom-objects.html
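If the archive-to-disk route is taken anyway, a minimal sketch of the check-then-load flow could look like the following; it assumes the custom class adopts NSCoding, and loadItemsFromDatabase stands in for the existing sqlite loading code (note that NSKeyedArchiver writes a binary plist rather than an XML one):

NSArray *paths = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES);
NSString *path = [[paths objectAtIndex:0] stringByAppendingPathComponent:@"my_data.plist"];

NSArray *items = nil;
if ([[NSFileManager defaultManager] fileExistsAtPath:path]) {
    // The file exists: read it back in as the array.
    items = [NSKeyedUnarchiver unarchiveObjectWithFile:path];
} else {
    // First run: pull the 214 items from sqlite, then archive them for next launch.
    items = [self loadItemsFromDatabase];              // hypothetical helper
    [NSKeyedArchiver archiveRootObject:items toFile:path];
}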

CoreData or Individual Files (iOS)

I've got a little conundrum: would it be better to use direct file management, or a CoreData SQLite database?
Here's my scenario:
I have a bunch of 'user' objects, each with a list of 'post' objects. This is easily done in CoreData, and would be great; however, the 'post' objects are downloaded from a web server, and they each have a unique identifier. I don't want to have multiple 'post' objects with the same ID. I could solve this by caching CoreData responses in an NSDictionary, but this would not fit well with the design pattern of the application. As far as I am aware, when adding a new 'post' to my CoreData NSManagedObjectContext, I would have to look up the unique ID to check for its existence (fast), then add it if it does not exist (slow), and update the previous one if it does (fast), effectively replacing it. How would you guys handle this?
I've been trying to think of alternatives for a few days now, but no matter which way I look at it, CoreData is going to be slower than my alternative:
A file architecture inside the Caches/ directory of an iOS application could solve the problem. Something like this:
Users/
    {unique ID}.user
    {unique ID}.user
Posts/
    {unique ID}.post
    {unique ID}.post
Then, when retrieving a post object or user object, I can check the files for the existence of the data, and cache the file contents in an NSDictionary. If the ID exists in the dictionary, retrieve it from there instead. Replacing previous 'user' and 'post' objects is as simple as overwriting the file and updating the cache.
My second alternative would clearly be faster - however, I would not be taking advantage of any efficiencies built into CoreData, and I would have to provide my own memory management scheme to clear my cached dictionaries when a memory warning occurs.
Is there some way of 'uniquing' in CoreData? That would solve my problem. Something similar to using a primary key in an ordinary SQLite database.
I'll start doing tests to verify speeds of both methods, but I thought I'd post this up here before starting in case anyone has any better solutions.
This exact question comes up a lot.
You can check whether a value exists in Core Data without reading in the entire object. Just set the fetch to fetch the specific property you want to test (the ID in this case) and have it return dictionaries. Provide a predicate that looks for one or more IDs; if the returned array has values, you know you have existing objects.
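A sketch of that kind of existence check, assuming the entity is named Post, its identifier attribute is postID, and incomingIDs holds the IDs from the latest download (all three names are placeholders):

NSEntityDescription *entity = [NSEntityDescription entityForName:@"Post"
                                           inManagedObjectContext:context];
NSFetchRequest *request = [[NSFetchRequest alloc] init];
[request setEntity:entity];
[request setResultType:NSDictionaryResultType];   // return plain dictionaries, not full objects
[request setPropertiesToFetch:
    [NSArray arrayWithObject:[[entity propertiesByName] objectForKey:@"postID"]]];
[request setPredicate:[NSPredicate predicateWithFormat:@"postID IN %@", incomingIDs]];

NSError *error = nil;
NSArray *existing = [context executeFetchRequest:request error:&error];
[request release];
// IDs that came back in 'existing' should be updated in place;
// the remaining IDs in 'incomingIDs' need new Post objects.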
It's very rare that you can end up with a custom system which is faster and more robust than Core Data. It's rarely worth even trying.
Remember as well that premature optimization is the root of all evil. All this work is predicated on the premise that the simplest Core Data implementation is too slow. Have you actually tested that it is too slow? If not, do so before you try more elaborate designs.
After testing, I've found that CoreData at least halves the time taken. The test I ran was as follows: I added 1000 posts to an empty CoreData object graph and then retrieved 100 of those objects for updating. Adding the objects took 0.069s, and retrieving them took 0.181s. I measured these values on a 3G iPad device. Using files, adding these objects took 10 times longer, and retrieving them took 4 times longer.
My recommendation: Stick to using CoreData!

How a class that wraps and provides access to a single file should be designed?

MyClass is all about providing access to a single file. It must CheckHeader(), ReadSomeData(), UpdateHeader(WithInfo), etc.
But since the file that this class represents is very complex, it requires special design considerations.
That file contains a potentially huge folder-like tree structure with various node types and is block/cell based to handle fragmentation better. Size is usually smaller than 20 MB. It is not of my design.
How would you design such a class?
Read a ~20MB stream into memory?
Put a copy on temp dir and keep its path as property?
Keep a copy of big things on memory and expose them as read-only properties?
GetThings() from the file with exception-throwing code?
This class (or classes) will be used only by me at first, but if it ends up good enough I might open-source it.
(This is a question on design, but platform is .NET and class is about offline registry access for XP)
It depends on what you need to do with this data. If you only need to process it linearly one time, then it might be faster to just take the performance hit of holding the large file in memory.
If however you need to do various things with the file beyond a single, linear parsing, I would parse the data into a lightweight database such as SQLite and then operate on that. This way all of your file's structure is preserved and all subsequent operations on the file will be faster.
Registry access is quite complex. You are basically reading a large binary tree. The class design should rely heavily on the stored data structures; only then can you choose an appropriate class design. To stay flexible you should model the primitives such as REG_SZ, REG_EXPAND_SZ, DWORD, SubKey, .... Don Syme has a nice section in his book Expert F# about binary parsing with binary combinators. The basic idea is that your objects know by themselves how to deserialize from a binary representation. When you have a stream of bytes which is structured like this
<Header>
    <Node1/>
    <Node2>
        <Directory1/>
    </Node2>
</Header>
you start with a BinaryReader to read the binary objects byte by byte. Since you know that the first thing must be the header, you can pass the reader to the Header object:
public class Header
{
    static Header Deserialize(BinaryReader reader)
    {
        Header header = new Header();
        int magic = reader.ReadByte();
        if (magic == 0xf4)       // we have a node entry
            header.Insert(Node.Read(reader));
        else if (magic == 0xf3)  // directory entry
            header.Insert(DirectoryEntry.Read(reader));
        else
            throw new NotSupportedException("Invalid data");
        return header;
    }
}
To stay performant, you can, for example, delay parsing the data until specific properties of this or that instance are actually accessed.
Since the registry in Windows can get quite big, it is not possible to read it completely into memory at once. You will need to chunk it. One solution that Windows applies is that the whole file is allocated in paged pool memory, which can span several gigabytes, but only the actually accessed parts are paged in from disk into memory. That allows Windows to deal with a very large registry file in an efficient manner. You will need something similar for your reader as well. Lazy parsing is one aspect, and the ability to jump around in the file without the need to read the data in between is crucial to stay performant.
More info about paged pool and the registry can be found here:
http://blogs.technet.com/b/markrussinovich/archive/2009/03/26/3211216.aspx
Your API design will depend on how you read the data to stay efficient (e.g. use a memory-mapped file and read from different mapped regions). With .NET 4, a memory-mapped file implementation has arrived that is quite good, but wrappers around the OS APIs exist as well.
Yours,
Alois Kraus
To support delayed loading from a memory-mapped file, it would make sense not to read the byte array into the object and parse it later, but to go one step further and store only the offset and length of the memory chunk from the memory-mapped file. Later, when the object is actually accessed, you can read and deserialize the data. This way you can traverse the whole file and build a tree of objects which contain only the offsets and the reference to the memory-mapped file. That should save huge amounts of memory.