Getting Data From Webpages? - iphone

When looking to get data from a web page, what's the recommended method if the page does not provide a structured data feed? Am I right in thinking that it's just a case of doing an NSURLRequest and then hacking what you need out of the responseData (NSData *)? I am not too concerned about the implementation in Xcode; I am more curious about actually collecting the data before I start coding a "hunt & peck" through a list of data.
gary

Unless you are in control of what is getting fetched (e.g. you're sending yourself well-formed XML and can parse it appropriately), you're pretty much forced to pick through it "by hand", as you say. What you're doing here is also called "screen scraping".
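To make "picking through it by hand" concrete, here is a minimal sketch that pulls link targets out of downloaded HTML with NSRegularExpression. The function name LinksInHTML is made up for illustration, the code assumes ARC, and a regular expression will miss plenty of real-world markup (single-quoted attributes, for instance), so treat it as a starting point rather than a robust scraper.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the href value of every <a> tag in `html`.
// Only double-quoted href attributes are recognised.
static NSArray<NSString *> *LinksInHTML(NSString *html) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<a\\s[^>]*href=\"([^\"]*)\""
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:NULL];
    NSMutableArray<NSString *> *links = [NSMutableArray array];
    NSArray<NSTextCheckingResult *> *matches =
        [regex matchesInString:html options:0 range:NSMakeRange(0, html.length)];
    for (NSTextCheckingResult *match in matches) {
        // Capture group 1 is the text between the quotes after href=
        [links addObject:[html substringWithRange:[match rangeAtIndex:1]]];
    }
    return links;
}
```

The responseData would first be turned into a string, for example with [[NSString alloc] initWithData:responseData encoding:NSUTF8StringEncoding], assuming the page really is UTF-8.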

Related

iPhone: Is SAX or DOM better for this specific problem?

My iPhone app communicates with my SOAP web service.
I have a questionnaire in my XML, which includes questions, their answers, choices, some UI information, etc. When the user enters values for a set of questions on the iPhone and sends them back to the web service, the service adds new unanswered questions to that XML.
So a conversation with the web service starts with an empty XML and builds up to 200-300 KB of XML as the user answers the questions and receives new ones. Since it is a stateless web service, all the information is kept in the XML.
Practically, I only need to find and parse the latest questions from the response XML, which should work well with a SAX parser, and modify only that part of the XML while adding the new answer and sending the modified XML back via the web service. BUT the user should also be able to click "Back". That means I have to hold that XML in memory (200-300 KB) and parse it when necessary as the user clicks back and next.
My question is which approach is better:
1. Get the XML, parse it completely into objects with the DOM, release the XML from memory and work only with the objects as the user clicks back and next. Then, when it comes to sending it back, assemble a new XML message from scratch with my objects. This approach also seems to reduce the time each click takes.
2. Use the SAX parser and only parse when I need to as the user clicks back and next. But then I have to hold all the XML in memory. I do not know if the iPhone can handle that, and back/next actions should take longer since I parse every time. The good side is that I don't need to re-assemble the XML from objects when I am done.
I think the second approach is better. What do you think? And which parser is good for this job?
Personally I prefer DOM XML parsers due to their ease of use and the ability to separate parsing/creating logic from the rest of the code. It seems that your application would be best suited to use a DOM parser because you said so yourself that sometimes you only need certain parts of the XML, not the entire document. SAX parsers do not support random read access.
Read the following article; it should help you find the best solution for your case:
http://www.raywenderlich.com/553/how-to-chose-the-best-xml-parser-for-your-iphone-project
I would go with NSXMLDocument nevertheless.

NSDictionaries and NSArrays from JSON from complicated YouTube API

I'm making an awesome iPhone app which searches for YouTube videos using the JSON API. However, Google is lazy so they just transformed the ATOM feed into JSON. Things look like this:
feed->entry[0]->author[0]->name->$t
This means that getting the information out of the NSArray is difficult, as I need to get a value of a key of an object of an array of an object of an array of an object of a key.
To check whether the structure is correct, I can choose between two things:
Write a huge amount of code for each item to check that the JSON is correct.
Wrap everything in a @try block.
I'd like to choose the second one. The problem is that some time ago I read that this is bad practice. Is it? And if so, is there a shorter way to validate the NSArrays and NSDictionaries? My app must never crash, not even if the user removes the processor at runtime, so not checking at all is not an option.
Can you please help me? Thanks.
Have you tried the GData API? I'm using it for my application ( http://itunes.apple.com/us/app/skystop/id392782307?mt=8 ) for the Youtube Feed. It basically spits out an XML file for whatever you've requested and you can convert it right into a plist file or an NSArray.
I'm not sure I understand: the API itself works in JSON instead of Atom, so you need to dig into every item?
If so, then you are right: there is not much you can do except search the web for helper libraries that may have been written, perhaps on Google Code, to support this API.
In any case, #2 is bad practice, first of all because try/catch usually consumes more system resources than a simple boolean check or checks.
Second, once you are in the catch block you are stuck: all you can do is report an error to the user or to yourself. If you want to go on parsing and checking, you can't.
And last but not least (I'm sure there are reasons I'm not thinking of), apart from the message you might get with the exception, you are never quite sure where it came from.
Are you parsing the JSON yourself? If so, I suggest using an external framework to do the work for you. I use Json-framework in a few of my own projects and it does the job just fine.
http://code.google.com/p/json-framework/
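For the "shorter way to validate" part of the question, one option that avoids both @try and a wall of checks is a small helper that walks a key path and returns nil as soon as the structure is not what was expected. The sketch below is one way to do it, not something taken from the answers above; the name ValueAtPath is invented, and the code assumes ARC and that the JSON has already been parsed into NSDictionary/NSArray objects.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: walks `path` through nested containers.
// NSString components look up dictionary keys, NSNumber components index arrays.
// Returns nil on any mismatch (wrong container type, missing key, index out of
// range) instead of raising an exception.
static id ValueAtPath(id root, NSArray *path) {
    id current = root;
    for (id component in path) {
        if ([component isKindOfClass:[NSString class]] &&
            [current isKindOfClass:[NSDictionary class]]) {
            current = [(NSDictionary *)current objectForKey:component];
        } else if ([component isKindOfClass:[NSNumber class]] &&
                   [current isKindOfClass:[NSArray class]]) {
            NSArray *array = current;
            NSUInteger index = [component unsignedIntegerValue];
            current = (index < array.count) ? [array objectAtIndex:index] : nil;
        } else {
            return nil; // e.g. expected a dictionary but found a string or NSNull
        }
        if (current == nil) return nil;
    }
    return current;
}
```

The path from the question would then read ValueAtPath(json, @[@"feed", @"entry", @0, @"author", @0, @"name", @"$t"]). The caller should still check that the result is an NSString before using it, since the final value could be a number or NSNull.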

How should I architect my iPhone app to talk to my website?

I'm planning my first iPhone app and I'd like to get some inputs as to how to build it, right from the start. The iPhone app is being built to be paired with a public facing web application that is already built in PHP.
I'd like the web platform to be central (data is housed in a MySQL database), and have the iPhone clients talk to it and use RESTful methods to perform the functions of the site (fetching latest content, posting content, voting, account management as examples).
I'd like the clients to get a local copy of the data in a SQLite database, but refresh to get the latest version of the feed (similar to the Twitter app).
Couple of thoughts I have right now:
Use something like ASIHTTPRequest to send/receive data to PHP files on the server listening for requests
JSON - would I be better off to send the GET/POSTS to a PHP that returns JSON objects, and work with some sort of wrapper that manages the data and communicates changes to the local SQLite database?
Am I totally off in how I should be building this thing to communicate with the web? Is there a best practice for this?
I'd really appreciate any input on how you would architect this sort of a setup.
Thank you,
EDIT: After reading my own post again, I know it sounds like a Twitter client, but it is NOT, although it has similar features/structure of a Twitter type setup. Thanks!
As you already outlined in your plan, XML and REST are a great way to communicate with a web application. I want to suggest a few details about how to actually design and build it, and what you should keep in mind.
First of all, I believe it's important to stick with MVC. I've seen people creating HTTP connections in view controllers, controllers acting as NSXMLParser's delegate, and controllers holding data in member variables. I've even seen UITableViewCells establishing HTTP connections. Don't do it!
Your model and its basic manipulation code should be separated from the user interface as much as possible. As you have already created the model in your web application, try to recreate the entities in your iPhone project. Don't be afraid of having some simple methods in entity classes, but do not make them use external resources, especially TCP connections. For example, an entity class might have methods that format data in specific ways (dates, or a full name returned as the concatenation of first name and surname), or even a method like - (void)update that acts as a wrapper calling the class responsible for updating the model.
Create another class for updating the model, i.e. fetching the XML from the web app. Do not even consider using synchronous connections, not even from a dedicated thread. Asynchronous connections with a delegate are the way to go. Sometimes multiple requests need to be made to get all the required data. You might want to create some kind of state machine that tracks which stage of downloading you are in and progresses from stage to stage, skipping to the end if an error occurs and re-executing from the failed stage after a short delay.
Download the data somewhere temporary, and only once you have it all, make the switch and update the user interface. This helps responsiveness while launching the app: the user gets to work immediately with data stored locally while the update mechanism downloads the new data.
If you need to download lots of files, try to download them simultaneously, if dependencies between files allow for that. This involves creating a connection per request, probably delegate instance for each of them. You can of course have only one delegate instance for all of those connections, but it gets a bit more complex to track the data. Downloading simultaneously might decrease latency considerably, making the mechanism much faster for the user.
To save time and bandwidth, consider using HTTP's If-Modified-Since and/or ETag headers. Remember the time or tag from the last time you requested the data, and send it in the HTTP headers next time. Your web application should return HTTP status 304 if the content has not changed. The iPhone app should react to this code accordingly in connection:didReceiveResponse:.
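A rough sketch of the client side of such a conditional GET follows. Note that a saved ETag is sent back in the If-None-Match request header, while a saved Last-Modified value goes into If-Modified-Since. The function names are invented for this example, the code assumes ARC, and where the two saved values are stored between launches is left open.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: builds a GET request carrying the validators saved from
// the previous response. Either argument may be nil if the server never sent it.
static NSURLRequest *ConditionalRequest(NSURL *url, NSString *etag, NSString *lastModified) {
    NSMutableURLRequest *request = [NSMutableURLRequest requestWithURL:url];
    // Bypass the local URL cache so the app sees the server's answer directly.
    request.cachePolicy = NSURLRequestReloadIgnoringLocalCacheData;
    if (etag) [request setValue:etag forHTTPHeaderField:@"If-None-Match"];
    if (lastModified) [request setValue:lastModified forHTTPHeaderField:@"If-Modified-Since"];
    return request;
}

// Hypothetical helper: YES when the server says the cached copy is still current.
static BOOL IsNotModified(NSURLResponse *response) {
    return [response isKindOfClass:[NSHTTPURLResponse class]] &&
           [(NSHTTPURLResponse *)response statusCode] == 304;
}
```

In connection:didReceiveResponse: the delegate could call IsNotModified(response) and, if it returns YES, cancel the connection and keep the locally stored data.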
Create a dedicated class to parse the XML and update the model. You can use NSXMLParser, but if your files are not huge I strongly recommend TouchXML; it's a pleasure to work with XML as a document (it also supports XPath) instead of through an event-based API. You can also use this parser once files are downloaded to check their validity, and re-download if parsing fails. That's when a dedicated class for parsing comes in handy.
If your dataset is not huge, if you do not need to persist downloaded data on iPhone forever, you probably don't need to store them in SQLite database, you can simply store them in XML format - just a simple caching. That at least might be the way for a twitter app. It gets easier that way, but for bigger data sets XML consumes lots of memory and processing power - in that case SQLite is better.
I'd suggest using Core Data, but you mention this is your first iPhone app, so I suggest you don't use it. Yet.
Do not forget about multitasking - your app can go to sleep in the middle of download, you need to cancel connections, and cleanup your update mechanisms. On app's wake-up you might want to resume the update.
Regarding the view part of the application - use Interface Builder. It might be painful in the beginning, but it pays off in the long run.
View controllers are the glue between model and views. Do not store data in there. Think twice about what to implement where, and who should call it.
This is not related to the architecture of the app, but I want to point out that Objective-C is a very expressive language. Code should read much like a sentence. Extend classes with categories. As an example, the other day I needed the first line of a string. Sure, you can write a one-liner that finds the first occurrence of a newline and takes the substring from the beginning up to there. But it doesn't look right. I added - (NSString*)firstLine to my NSString category. The code looks so much better this way, and it doesn't need any comments.
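The mechanism for adding a method like firstLine to an existing class is an Objective-C category. The answer does not show its implementation, so the following is only one plausible version; the category name FirstLine is an assumption, and the code assumes ARC.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical category; the name "FirstLine" is chosen for illustration.
@interface NSString (FirstLine)
- (NSString *)firstLine;
@end

@implementation NSString (FirstLine)
// Returns the text up to, but not including, the first line break.
// A string without any line break is returned unchanged.
- (NSString *)firstLine {
    NSRange newline = [self rangeOfCharacterFromSet:[NSCharacterSet newlineCharacterSet]];
    if (newline.location == NSNotFound) return [self copy];
    return [self substringToIndex:newline.location];
}
@end
```

A call site then reads like a sentence, for example NSString *subject = [message firstLine];.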
There are lots of things to consider in both architecture and design of any project, they both should go hand in hand. If one is causing trouble to the other, you need to adapt. Nothing is written in stone.
I'm currently working on an app that sounds similar to yours. I'd also suggest ASIHTTPRequest, and probably something like TouchJSON for JSON parsing, or extending/making a delegate of NSXMLParser if you want to parse XML.
As suggested by JosephH, depending on how your app works you may want to consider alternate authentication methods: I'd take a look at something token-based like OAuth, which has ready-made libraries for people to dig in to.
SQLite is totally viable for feed caching, although I prefer NSCoding so that you can freeze-dry your custom data structures.
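As an illustration of "freeze-drying" a custom data structure with NSCoding, here is a small sketch. The FeedItem class and the FreezeItem/ThawItem helpers are invented for this example, the code assumes ARC, and it uses the current secure-coding variants of the archiver calls (older SDKs used archivedDataWithRootObject: and unarchiveObjectWithData: instead).

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical model class that knows how to archive itself.
@interface FeedItem : NSObject <NSSecureCoding>
@property (nonatomic, copy) NSString *title;
@property (nonatomic, strong) NSDate *published;
@end

@implementation FeedItem
+ (BOOL)supportsSecureCoding { return YES; }

- (void)encodeWithCoder:(NSCoder *)coder {
    [coder encodeObject:self.title forKey:@"title"];
    [coder encodeObject:self.published forKey:@"published"];
}

- (instancetype)initWithCoder:(NSCoder *)coder {
    if ((self = [super init])) {
        _title = [[coder decodeObjectOfClass:[NSString class] forKey:@"title"] copy];
        _published = [coder decodeObjectOfClass:[NSDate class] forKey:@"published"];
    }
    return self;
}
@end

// Turns an item into NSData that can be written to a cache file.
static NSData *FreezeItem(FeedItem *item) {
    return [NSKeyedArchiver archivedDataWithRootObject:item requiringSecureCoding:YES error:NULL];
}

// Rebuilds an item from previously archived data; returns nil on failure.
static FeedItem *ThawItem(NSData *data) {
    return [NSKeyedUnarchiver unarchivedObjectOfClass:[FeedItem class] fromData:data error:NULL];
}
```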
As a general suggestion, make sure to spend a lot of time thinking about every use case and corner case for connections: it's easy to assume a user will only contact the server in certain ways and at certain times, and then after you throw in multitasking/incoming calls/lock screen/memory warnings, things can get hairy without any planning.
All in all, you seem to be on the right track, just make sure you plan out everything beforehand :)
Apple has a brand new, in-depth piece of sample code, MVCNetworking, that shows how to use NSOperation subclasses that wrap HTTP requests together with NSOperationQueues.
As others mentioned, I think you are asking the right questions and are heading in the right direction. All of the replies above are valuable advice. Here is my advice, and I hope you'll find it useful.
No matter which method/library you choose to talk to your web services, I think it's important to make a clean separation in the way you design your data model on the phone VS. the data model in your web application. You have 3 major distinctions to keep in mind for your design:
Data model on the web application (reflected by your existing mySQL database)
Since this is already there, there is not much to say about it, except that it will influence a lot your design for the following 2 parts. I suggest to make this model the 'master reference' for how your data is represented across platforms.
Data model on the iPhone app (reflected by the information you need to display in the iPhone app)
This is where the fun begins. First, you need a good understanding of what data you need to display in the phone app. So have a good, high level design of your app first (use pen and paper, draw mock-ups of each view and the interactions between them, model the navigation between your view controllers etc.). It really helps to understand the interactions between your view controllers and the various bits and pieces of data you want to show in the app. This will help you create the requirements for the data model on the phone. Based on these requirements, map the existing (web) data model to a new model, suited to your iPhone app. This new model may or may not include all tables and fields found in your web app. But the general representation of the 2 models should be very similar (e.g. relationships, data types, etc.)
Data model used to communicate between the 2 above (this is your 'data exchange protocol')
Once you have the 2 representations of your data above, you need to 'translate' from one to the other, both ways. Design your data exchange protocol to be as simple and compact as possible. You don't want to waste bytes on useless information, as transmissions over the network are costly. (As a side note, you might think of compressing the transmitted data later on, but it's just as important to have a good design from the beginning). It's probably best to begin with a protocol in which the metadata is the same as the one in your web application model (e.g. same relationships, names of tables, attributes, etc.). But remember, you'll only have to serialize/de-serialize those entities and relationships that you listed in point 2) above. So design accordingly. Your exchange protocol may also include session tokens, authentication info, a version number, or other metadata, if you need it.
Remember: your data exchange protocol is what will de-couple your web application and iPhone application models. I found that it's best to de-couple them because they may both evolve over time. The data model on the iPhone for example, may evolve a lot especially when you will find that you need to re-model some relationships or add/remove attributes from your entities in order to improve application responsiveness, or the user experience, the navigation, or whatever.
Since this is a whole concern in and by itself, well, you need to design a generic serialization/de-serialization mechanism on top of your (JSON/XML/whatever parser you choose) that is flexible enough to sustain the potential differences between your 2 data models. These differences might be: entity/attribute/relationship names, primary key identifier names, data types, attributes to ignore, and the list goes on. I would definitely implement a serializer/de-serializer utility class in the iPhone app, backed by a .plist configuration file containing all supported entities, concerns, aliases you might have. Of course, each model object should 'know' how to serialize, de-serialize itself and its relationships (i.e. the required object graph depth).
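A very small sketch of the alias idea described above: a dictionary maps server-side field names to local property names, and key-value coding copies the values across. The function name ApplyFields and the example keys are invented, and the code assumes ARC. The alias dictionary could just as well be loaded from the .plist mentioned above.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: copies values from a decoded server dictionary into
// `object`, translating server key names to local property names via `aliases`.
// Server keys that are absent from `aliases` are ignored, as are null values.
static void ApplyFields(id object, NSDictionary *serverFields, NSDictionary *aliases) {
    [aliases enumerateKeysAndObjectsUsingBlock:^(NSString *serverKey, NSString *localKey, BOOL *stop) {
        id value = serverFields[serverKey];
        if (value != nil && value != [NSNull null]) {
            // Key-value coding; `object` must have a property or key named localKey.
            [object setValue:value forKey:localKey];
        }
    }];
}
```

For a real model class, setValue:forKey: raises an exception for property names the class does not have, so the alias map has to be kept in step with the class definitions.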
One last note, since you will end up with 2 representations of your data, you will need a way to uniquely identify an object on both sides. So for example, think of adding a uuid attribute to all data that needs to be exchanged, or use any other approach that suits your needs.
I am building an app that has similar requirements to yours, and these are the approaches I found to be best so far. Also, you might find this video useful (it inspired me a lot on how to implement some of the issues I mentioned above and is especially interesting if you're using CoreData) :
http://itunes.apple.com/ca/podcast/linkedin-important-life-lessons/id384233225?i=85092597
(see the lecture entitled "LinkedIn: Important Life Lessons on CoreData & GameKit (March 12, 2010)" )
Good luck!
It's quite a broad question, and I think you're going in the right way anyway, however I'll do my best to give some advice:
JSON, ASIHTTPRequest and POSTs to PHP scripts sound like a great way to go.
If the data is not really sensitive, I'd use http most of the time, and use https only for a login page that either sets a cookie or returns a "token" that you use in subsequent requests. (HTTPS can be quite slow over a 3G connection as the overhead in terms of number of packets to setup an SSL connection is higher than a plain TCP connection.)
You should make sure you correctly pass any data from the input to the PHP scripts on to the database, to avoid SQL injection attacks. That is, use parameterised SQL; don't build SQL queries by concatenation, e.g. $sql = "SELECT * FROM users WHERE username = '" . $_GET['username'] . "'";
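The advice above concerns the PHP/MySQL side, but the same rule applies to the local SQLite cache on the phone. As a hedged illustration of what "parameterised" means, here is a sketch using the sqlite3 C API from Objective-C. The users table and the function name are assumptions made up for the example, and the program needs to be linked against libsqlite3.

```objective-c
#import <Foundation/Foundation.h>
#import <sqlite3.h>

// Hypothetical helper: counts rows in an assumed `users` table whose username
// equals `username`. The value is bound to the ? placeholder and is never
// spliced into the SQL text, so quotes in the input cannot change the query.
// Returns -1 if the statement could not be prepared or stepped.
static int CountUsersNamed(sqlite3 *db, NSString *username) {
    sqlite3_stmt *stmt = NULL;
    int count = -1;
    if (sqlite3_prepare_v2(db, "SELECT COUNT(*) FROM users WHERE username = ?",
                           -1, &stmt, NULL) == SQLITE_OK) {
        sqlite3_bind_text(stmt, 1, username.UTF8String, -1, SQLITE_TRANSIENT);
        if (sqlite3_step(stmt) == SQLITE_ROW) {
            count = sqlite3_column_int(stmt, 0);
        }
    }
    sqlite3_finalize(stmt); // harmless no-op when stmt is NULL
    return count;
}
```

On the PHP side the equivalent is a prepared statement with bound parameters rather than string concatenation.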
I would do this like I have done with a lot of AJAX web-page stuff. i.e.:
Have a URL on your server side package the information to be transmitted into XML format. (This can be through a CGI/PHP script or whatever.) You're transmitting XML in the message body, so it's easy for a human to read and debug with a standard web browser.
Use the standard iPhone NSXMLParser methods to parse out the individual data fields from the XML doc, and write them back to your database. This class is equipped to both fetch the data from a URL and parse it in one call, like:
NSURL *xmlURL = [NSURL URLWithString:@"http://www.example.com/livefeed.cgi"];
NSXMLParser *myParser = [[NSXMLParser alloc] initWithContentsOfURL:xmlURL];
Walk through the data hierarchy with the NSXMLParser methods and populate your database accordingly.
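"Walking through the data hierarchy" with NSXMLParser means implementing its delegate callbacks. The sketch below collects the text of every title element; the class name TitleCollector and the element name are assumptions, since the feed's actual structure is not shown. It assumes ARC and parses NSData that has already been downloaded, rather than fetching from the URL on the calling thread.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical delegate that gathers the text content of <title> elements.
@interface TitleCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, readonly) NSMutableArray<NSString *> *titles;
@end

@implementation TitleCollector {
    NSMutableString *_buffer; // text of the <title> being read, nil otherwise
}

- (instancetype)init {
    if ((self = [super init])) _titles = [NSMutableArray array];
    return self;
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary<NSString *, NSString *> *)attributeDict {
    if ([elementName isEqualToString:@"title"]) _buffer = [NSMutableString string];
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    // May be called several times per element; does nothing outside <title>.
    [_buffer appendString:string];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if ([elementName isEqualToString:@"title"] && _buffer) {
        [_titles addObject:[_buffer copy]];
        _buffer = nil;
    }
}
@end

// Parses `xmlData` and returns the collected titles, or nil if parsing failed.
static NSArray<NSString *> *TitlesInXML(NSData *xmlData) {
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:xmlData];
    TitleCollector *collector = [TitleCollector new];
    parser.delegate = collector;
    return [parser parse] ? collector.titles : nil;
}
```

The same callbacks are where rows would be written to the local database instead of being appended to an array.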

iphone - retrieving information from the internet (rss alternative?)

(Very) basically my app is just a load of information collected from the internet - eg: someone can log into an admin panel on a website and update their app from there. The information gets put into a mysql database.
The way I thought about going about this was to use an RSS feed - it works for blog/twitter feeds, so I thought why not do it for the rest of the information that I want to get.
My question is, is this a suitable way to do it? Basically just make dynamic XML files (php scripts that output XML) and parse them on the iphone, or is there a better way to do it?
I'm not looking for a full blown tutorial, just maybe a few keywords that I can go off and look up myself - or a "XML is the best way... stick at that". :p
Thanks a lot.
I personally like JSON more than XML, since it takes fewer characters to transfer the same data, which means less bandwidth used and a faster response.
You can use a JSON library from here or just stick with XML since you're familiar with it. I guess it's just a matter of personal preference.

"Refreshing" an XML feed on iPhone/Mac OSX

I'm curious for those of you who are building iPhone apps based on REST/SOAP/XML-RPC or simply pulling down a dynamic XML feed, what does it mean exactly to you when a user says 'refresh' the feed?
The straight forward way is to populate some collection, say an NSMutableArray, with whatever you bring down from the feed. If a widget on the UI is available to refresh, I typically do something like:
[myMutableArray removeAllObjects];
// follow steps to repopulate myMutableArray
It seems this is the least efficient algorithm for refreshing an XML feed. For instance many folks who are building Twitter clients, are appending changes to their existing feed, versus bringing down the entire feed in its complete form again.
What kind of algorithms are you using to "refresh" your models when speaking to a server-side data source?
Thanks all.
You should look into using the PubSub framework if you can require OS X 10.5. It's explicitly designed to fetch and update RSS/Atom feeds.
(Disclaimer: I wrote a lot of that framework while I was at Apple :)
The answer to your question is that feeds are inherently inefficient. You can minimize this by
Using HTTP "conditional GETs", so if the feed hasn't changed on the server you'll just get back a tiny 304 response. This saves time for the server and for you. (Some feed servers, like slashdot, will ban you if you don't use conditional gets!)
Check the "Last-Modified:" date on the response. Yes, even if you use a conditional GET. Some servers don't handle them properly. If the date is unchanged, ignore the feed.
Compare the raw data of the response against the last raw response you got. If identical, ignore the feed. (Some servers don't support conditional gets or send last-modified dates...)
Now you have to parse the XML.
Check the top-level mod date on the feed itself (this varies between Atom and the different flavors of RSS.) Again, if it's the same as it was last time, ignore the feed.
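The cheap checks in this list that happen before XML parsing could be combined roughly as follows. This is only a sketch of the idea, not how PubSub does it; the function name and the choice to keep the previous Last-Modified string and raw body around are assumptions, and the code assumes ARC.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: YES when the response gives no sign that the feed changed.
// `previousLastModified` and `previousBody` are whatever was saved after the
// last successful fetch; either may be nil on the first run.
static BOOL FeedLooksUnchanged(NSHTTPURLResponse *response, NSData *body,
                               NSString *previousLastModified, NSData *previousBody) {
    // 1. The server honoured the conditional GET.
    if (response.statusCode == 304) return YES;
    // 2. The server ignored the conditional GET but reports the same date.
    NSString *lastModified = response.allHeaderFields[@"Last-Modified"];
    if (lastModified && previousLastModified &&
        [lastModified isEqualToString:previousLastModified]) return YES;
    // 3. No usable headers: fall back to comparing the raw bytes.
    return previousBody != nil && [body isEqualToData:previousBody];
}
```

Only when this returns NO is it worth parsing the XML and checking the feed's own modification date.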
If you got here, the feed's been updated, most likely. The easiest thing to do is to throw away all of your old saved entries and replace them with the new ones. But this means you can't keep 'historic' entries that have fallen off the end of the feed. If you want to do that, you have to go through each entry in the just-parsed feed, match it with the corresponding entry in your persistent storage, and update the persistent one based on the new one. If you couldn't find a persistent one, add it as a new entry. (Matching entries can be difficult in lame RSS feeds that don't include unique GUIDs for each entry. You have to try comparing permalinks and titles. Yuck.)
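The matching step described above might look like this when every entry carries a GUID. It is a simplified sketch: entries are plain dictionaries with an assumed "guid" key, the persistent store is stood in for by an NSMutableDictionary, and the fallback matching on permalinks and titles is left out. The code assumes ARC.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: merges freshly parsed entries into the stored ones,
// keyed by GUID. Existing entries are overwritten with the new version, unseen
// entries are added, and entries that have fallen off the end of the feed are
// left in place. Returns the GUIDs that were newly added.
static NSArray<NSString *> *MergeEntries(NSMutableDictionary<NSString *, NSDictionary *> *stored,
                                         NSArray<NSDictionary *> *parsed) {
    NSMutableArray<NSString *> *added = [NSMutableArray array];
    for (NSDictionary *entry in parsed) {
        NSString *guid = entry[@"guid"];
        if (guid == nil) continue; // would need permalink/title matching instead
        if (stored[guid] == nil) [added addObject:guid];
        stored[guid] = entry;
    }
    return added;
}
```

The returned list of new GUIDs is what a UI would typically use to insert only the new rows instead of reloading everything.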
This whole thing really is a big mess. It took a lot of work to make everything behave correctly and work with all the broken feeds and servers out there; take advantage of my pain and use PubSub, if you can :)
One approach is using the built in NSXML pull parser in a background thread and comparing entries from the stream to what you have in memory, updating only what has changed.
I've just released an open source RSS/Atom Parser for iPhone and hopefully it might be of some use.
I'd love to hear your thoughts on it too!