parsing HTML on the iPhone [closed] - iphone

Closed 10 years ago.
Can anyone recommend a C or Objective-C library for HTML parsing? It needs to handle messy HTML code that won't quite validate.
Does such a library exist, or am I better off just trying to use regular expressions?

I found hpple quite useful for parsing messy HTML. The hpple project is an Objective-C wrapper around the XPathQuery library for parsing HTML. With it you can send an XPath query and receive the result.
Requirements:
-Add the libxml2 includes to your project
Menu Project->Edit Project Settings
Search for the setting "Header Search Paths"
Add a new search path "${SDKROOT}/usr/include/libxml2"
Enable the recursive option
-Add the libxml2 library to your project
Menu Project->Edit Project Settings
Search for the setting "Other Linker Flags"
Add a new linker flag "-lxml2"
-From hpple, get the following source files and add them to your project:
TFHpple.h
TFHpple.m
TFHppleElement.h
TFHppleElement.m
XPathQuery.h
XPathQuery.m
-Work through the W3Schools XPath Tutorial to get comfortable with the XPath language.
Code Example
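#import "TFHpple.h"

// Load the HTML; a real app would pass a full path (for example, one obtained from NSBundle)
NSData *data = [[NSData alloc] initWithContentsOfFile:@"example.html"];
// Create parser
TFHpple *xpathParser = [[TFHpple alloc] initWithHTMLData:data];
// Get all the cells of the 2nd row of the 3rd table
NSArray *elements = [xpathParser searchWithXPathQuery:@"//table[3]/tr[2]/td"];
// Access the first cell, guarding against an empty result
TFHppleElement *element = [elements count] > 0 ? [elements objectAtIndex:0] : nil;
// Get the text within the cell tag (nil if no cell matched)
NSString *content = [element content];
// Manual reference counting: release what we allocated
[xpathParser release];
[data release];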
Known issues
As hpple is a wrapper over XPathQuery, which is itself another wrapper, this option is probably not the most efficient. If performance is an issue in your project, I recommend writing your own lightweight solution based on the hpple and XPathQuery library code.

Looks like libxml2.2 comes in the SDK, and libxml/HTMLparser.h claims the following:
This module implements an HTML 4.0 non-verifying parser with API compatible with the XML parser ones. It should be able to parse "real world" HTML, even if severely broken from a specification point of view.
That sounds like what I need, so I'm probably going to use that.

Just in case anyone has got here by googling for a nice XPath parser and gone off and used TFHpple: note that TFHpple uses XPathQuery. This is pretty good, but it has a memory leak.
In the function *PerformXPathQuery, if the nodes are found to be nil, it returns before cleaning up.
So where you see this bit of code, add in the two cleanup lines:
xmlNodeSetPtr nodes = xpathObj->nodesetval;
if (!nodes)
{
NSLog(@"Nodes was nil.");
/* Cleanup */
xmlXPathFreeObject(xpathObj);
xmlXPathFreeContext(xpathCtx);
return nil;
}
If you are doing a LOT of parsing, it's a vicious leak.
Now.... how do I get my night back :-)
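The leak described above is the classic "early return skips cleanup" problem. A common way to make it hard to reintroduce is to route every exit through a single cleanup label. The following is only a minimal sketch in plain C: `acquire`, `release_resource`, `run_query` and `live_resource_count` are hypothetical stand-ins for the libxml2 context and XPath object, not part of XPathQuery or libxml2.

```c
#include <stdlib.h>

/* Counts live allocations so that a leak would be observable in this sketch. */
static int live_resources = 0;

/* Hypothetical stand-in for creating an XPath context or XPath object. */
static void *acquire(void)
{
    void *p = malloc(1);
    if (p != NULL) {
        live_resources++;
    }
    return p;
}

/* Hypothetical stand-in for the matching free function; NULL is ignored. */
static void release_resource(void *p)
{
    if (p != NULL) {
        free(p);
        live_resources--;
    }
}

/* Returns 1 if "nodes" were found, 0 otherwise.
   Both resources are released on every path, including the early exits. */
int run_query(int nodes_found)
{
    int result = 0;
    void *ctx = NULL;
    void *obj = NULL;

    ctx = acquire();
    if (ctx == NULL) goto cleanup;

    obj = acquire();
    if (obj == NULL) goto cleanup;

    /* This is the kind of early exit that leaked in the code quoted above. */
    if (!nodes_found) goto cleanup;

    result = 1;

cleanup:
    release_resource(obj);
    release_resource(ctx);
    return result;
}

/* Number of resources currently held; 0 means nothing leaked. */
int live_resource_count(void)
{
    return live_resources;
}
```

With this shape, adding another early exit later cannot skip the two release calls, because there is only one way out of the function.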

I wrote a lightweight wrapper around libxml which may be useful:
Objective-C-HMTL-Parser

This probably depends on how messy the HTML is and what you want to extract. But usually Tidy does quite a good job. It is written in C and I guess you should be able to build and statically link it for the iPhone. You can easily install the command line version and test the results first.

You may want to check out ElementParser. It provides "just enough" parsing of HTML and XML. Nice interfaces make walking around XML / HTML documents very straightforward. http://touchtank.wordpress.com/

How about using the WebKit component, possibly with third-party packages such as jQuery, for tasks like these? Wouldn't it be possible to load the HTML into an invisible web view and take advantage of the very mature selectors of the JavaScript frameworks?

Google's GData Objective-C API reimplements NSXMLElement and other related classes that Apple removed from the iPhone SDK. You can find it here: http://code.google.com/p/gdata-objectivec-client/. I've used it for handling messaging via Jabber. Of course, if your HTML is malformed (missing closing tags), this might not help much.

We use Convertigo to parse HTML on the server side and return clean JSON from web services to our mobile apps.

Related

iOS basic FTP setup; Read and Write Stream

I'm attempting to create an iOS 5 app with some very basic FTP functionality and need some guidance. It will be connecting to a device on a local network and performing read/write actions with .dat/txt files. I've done some searching for the past few days and have seen various recommendations but nothing simple enough that I can pick up and quickly modify for my personal use.
My questions are these:
Are there any tutorials/sample code that you could recommend to me?
What frameworks and classes should I be working with for basic read/write operations?
Lastly, I should mention that I have given a considerable amount of time to analyzing the SimpleFTPSample from Apple but the sample code is giving "Connection Failure" and "Stream Open Error" notices for each example, so I'm a bit wary of its usefulness.
Forgive me if this has been answered elsewhere. All of the related posts have pieces of the answer I need, but not the whole thing. Thank you in advance!
EDIT for clarity: A well-defined example or step-by-step tutorial is what I would really like. My own Google searches have turned up nothing and I am desperately in need of some guidance here.
UPDATE:
I posted this question long ago but have continued using the FTPHelper mentioned in the accepted answer. I recently brushed the dust off the old project and realized there was a minor memory leak in FTPHelper's fetch function that can be an app-killer if called repeatedly. If anybody stumbles across this question and chooses to use FTPHelper, be sure to add the CFRelease line seen in the code below.
- (void) fetch: (NSString *) anItem
{
    if (!self.uname || !self.pword) COMPLAIN_AND_BAIL(@"Please set user name and password first");
    if (!self.urlString) COMPLAIN_AND_BAIL(@"Please set URL string first");
    NSString *ftpRequest = [NSString stringWithFormat:@"%@/%@", self.urlString, [anItem stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding]];
    /* CFShow(ftpRequest); */
    NSString *writepath = [NSHomeDirectory() stringByAppendingPathComponent:@"Documents"];
    self.filePath = [writepath stringByAppendingPathComponent:anItem];
    CFURLRef writeURL = CFURLCreateFromFileSystemRepresentation (NULL, (const UInt8 *) [writepath UTF8String], [writepath length], NO);
    MySimpleDownload((CFStringRef)ftpRequest, writeURL, (CFStringRef) self.uname, (CFStringRef)self.pword);
    CFRelease(writeURL); // ADD THIS LINE TO FIX MEMORY LEAK
}
The SimpleFTPSample app runs perfectly for me, so there is probably an issue in your setup that you can't see. Besides Apple's example, I recommend checking this example, which contains a helper class for all basic FTP operations. One thing to be aware of is ARC in iOS 5. Both Apple's example and the one I linked were written for older iOS versions.
There are basically 2 ways to use them in iOS 5 - by telling the compiler to not use ARC by adding -fno-objc-arc flag in [Your project] -> TARGETS -> [Your app] -> Build Phases -> Compile Sources -> [Your file], or by using the built-in tool in Xcode for converting to ARC.
I personally have tested only the first method and it works for me.
If this does not help you I can write an example, but unfortunately today I am very busy.
UPDATED:
The basic mechanism is to use [FTPHelper list:THE_FTP_URL] to list the content of a folder, then create one list with the content and depending on the type (file or folder) download using [FTPHelper download: THE_FTP_URL_WITH_THE_FILENAME_FROM_LISTING]. From here you have to implement
- (void) downloadFinished
{
//do the reading depending on the file type
NSData *data = [NSData dataWithContentsOfFile:[FTPHelper sharedInstance].filePath];
}
The uploading is achieved in a similar way - using [FTPHelper upload:FILE_TO_UPLOAD] with a file from the filesystem.
There are many libraries which you could use and they are working great. :)
For example:
http://www.chilkatsoft.com/ftp-objc.asp
http://code.google.com/p/ios-ftp-server/
I recommend using them, because coding one by yourself would take a lot of time :)
One thing to remember, as o15a3d4l11s2 said, is to be aware of ARC. If you use it don't forget to add build flags to libraries which aren't ARC.

getting the information of the audio file like author, title, album, year, etc. [duplicate]

I have MP3 files stored on the iPhone, and my application should be able to read the ID3 information, e.g. length in seconds, artist, etc.
Does anyone know how to do this or what library to use in Objective-C?
Your thoughts are much appreciated.
ID3 information can be read by retrieving the kAudioFilePropertyInfoDictionary property of an audio file using the AudioFileGetProperty function of the AudioToolbox framework.
A detailed explanation is available at iphonedevbook.com
edit: Original link is now down. InformIT has some similar sample code, but it's not as complete.
Look into the Media Player framework:
Guide
Reference
All documentation
This requires that the MP3 in question is part of the iPod library on the phone.
For example, determining the name of every media file on the phone (including movies, podcasts, etc.):
MPMediaQuery *everything = [[MPMediaQuery alloc] init];
NSArray *itemsFromGenericQuery = [everything items];
for (MPMediaItem *item in itemsFromGenericQuery) {
NSString *itemTitle = [item valueForProperty:MPMediaItemPropertyTitle];
// ...
}
It appears that the following properties are available:
MPMediaItemPropertyMediaType
MPMediaItemPropertyTitle
MPMediaItemPropertyAlbumTitle
MPMediaItemPropertyArtist
MPMediaItemPropertyAlbumArtist
MPMediaItemPropertyGenre
MPMediaItemPropertyComposer
MPMediaItemPropertyPlaybackDuration
MPMediaItemPropertyAlbumTrackNumber
MPMediaItemPropertyAlbumTrackCount
MPMediaItemPropertyDiscNumber
MPMediaItemPropertyDiscCount
MPMediaItemPropertyArtwork
MPMediaItemPropertyLyrics
MPMediaItemPropertyIsCompilation
Doing this without going through the media player framework will be somewhat difficult, and will need an external framework.
There are not many ID3 parsing libraries out there that are not GPLed. There is an Objective-C framework that could probably be modified to work on the iPhone when statically linked, but it is LGPL. In order to satisfy the terms of the LGPL with a statically linked binary, you have to provide enough of the intermediary components that someone could relink it with their own version of the library, which is difficult (but not impossible) for an iPhone app. Of course, since I have not been in a position where I have had to do that, I have not actually discussed it with a lawyer, and since I am not one you should not take that as authoritative.
Your best bet if you don't feel like consulting a lawyer is to use a more liberally licensed C library like libID3 and wrap that in some Objective C classes. I would also recommend just directly including the code rather than dealing with all the static library build and link issues, but that is just a personal style thing.
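To give a feel for what such a C layer involves, here is a minimal sketch that reads only the old ID3v1 tag: a fixed 128-byte block at the very end of the file that starts with "TAG", followed by 30-byte title, artist and album fields and a 4-byte year. It is not libID3 and it does not handle ID3v2, which stores its tag at the start of the file and is what many files actually carry. The names `ID3v1Tag`, `copy_field` and `id3v1_parse` are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Text fields of an ID3v1 tag, NUL-terminated for convenience. */
typedef struct {
    char title[31];
    char artist[31];
    char album[31];
    char year[5];
} ID3v1Tag;

/* Copy a fixed-width field, then drop NUL padding and trailing spaces. */
static void copy_field(char *dst, const unsigned char *src, size_t width)
{
    size_t n;

    memcpy(dst, src, width);
    dst[width] = '\0';
    n = strlen(dst);              /* stops at the first NUL pad byte */
    while (n > 0 && dst[n - 1] == ' ') {
        dst[--n] = '\0';
    }
}

/* Parse an ID3v1 tag from the last 128 bytes of a file held in memory.
   Returns 1 on success, 0 if the buffer is too short or has no "TAG" marker. */
int id3v1_parse(const unsigned char *data, size_t len, ID3v1Tag *out)
{
    const unsigned char *tag;

    assert(out != NULL);
    if (data == NULL || len < 128) {
        return 0;
    }
    tag = data + len - 128;
    if (memcmp(tag, "TAG", 3) != 0) {
        return 0;
    }
    copy_field(out->title,  tag + 3,  30);
    copy_field(out->artist, tag + 33, 30);
    copy_field(out->album,  tag + 63, 30);
    copy_field(out->year,   tag + 93, 4);
    return 1;
}
```

From Objective-C you could load the file into an NSData, pass its bytes and length to `id3v1_parse`, and turn the fields into NSString objects. Note that the playing time is not stored in the ID3v1 tag at all, so the duration has to come from somewhere else, such as the audio APIs mentioned earlier.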

Is there a native YAML library for iPhone?

I'm considering using YAML as part of my next iPhone application, but I haven't been able to find an Objective-C library to use.
The Wikipedia page for YAML mentions one, but the link is dead.
Is there an Objective-C library that can parse YAML into native collection objects (NSArray, NSDictionary, etc...)?
The Cocoa extensions for Syck are probably what you're looking for -- it's where the library that Shaggy Frog mentioned seems to be living these days.
You can try YAML.framework. It is based on LibYAML, and it is fast and easy to use. It follows the same pattern as the standard NSPropertyListSerialization.
You can use it for iOS (iPhone/iPad) development.
The YAMLKit framework is a thin wrapper around LibYAML. It does exactly what you want. For example:
YKParser *p = [[YKParser alloc] init];
[p readString:@"- foo\n- bar\n- baz"];
id result = [p parse];
/* result is now an NSArray containing an NSArray with elements:
@"foo", @"bar", @"baz" */
[p release];
I recently wrote modern ObjC-YAML bindings, based on the standard NSCoder/NSKeyedArchiver interface: http://github.com/th-in-gs/YACYAML. I'm using them in my own projects, and intend to maintain them for at least as long as I continue to do so.
More here: http://www.blog.montgomerie.net/yacyaml
If you are doing a lot of C++ in your iPhone projects, then please have a look at yaml-cpp:
http://code.google.com/p/yaml-cpp/
It has native iPhone support (via its CMake build system)
It has no dependencies beyond a good compiler and CMake
It is very C++ friendly (thus the name), with solid documentation (see the wiki/HowToParseADocument page)
I found this right from YAML's front page. But it looks like it might be out of date (c. 2004?), and the CVS link doesn't work for me.
I would bet that it's just a thin wrapper around an underlying C library like this or this... C code being "native" code that the Objective-C compiler will grok.
I found this question while looking for YAML + Objective-C options. I ended up using this solution: https://github.com/icanzilb/JSONModel. Very cool, up to date and easy to use. Parses YAML directly into Objective-C models that you create by inheriting from the JSONModel class.

How can I query chapter metadata from a m4a file?

I need to write some code that will let me query a m4a file and extract the chapter information out. Including:
chapter name
chapter start time
chapter artwork
I did some quick searching and it seems this is viewed as proprietary information by Apple? I found some discussions, but most were from 2005. Also there have been some similar questions on here, but more for CREATING m4a files with chapters, not querying.
Is this just something I have to DIY because there isn't a nice Apple API for me to use? Or am I missing something obvious?
Also, ideally I need whatever technique I end up using to work on the iPhone.
The metadata tags system is Apple-proprietary. To work with the tags, you have to (sigh) reverse-engineer it or work with a library that has already done this.
I found the following links, but honestly it seems like you will have to pull out the hex editor.
Binary format info (basic spec for generic tags)
Perl library for working with M4A files.
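If you do end up reading the binary format yourself, the container layer is at least regular: an M4A file is a sequence of boxes (Apple calls them atoms), each starting with a 4-byte big-endian size that includes the header, followed by a 4-byte type code, and boxes such as `moov` contain further boxes. The sketch below only walks one level of sibling boxes in a memory buffer. Where the chapter data itself lives (it is commonly reported to be either in a `chpl` box under `moov`/`udta` or in a separate text track) is an assumption you would need to verify against your own files. `read_be32` and `mp4_find_box` are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit big-endian integer. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Find the first box with the given four-character type among the sibling
   boxes stored in data[0..len). Returns the offset of the box header, or -1
   if the box is missing or the data is malformed. On success, *box_size
   receives the total box size, including the 8-byte header. */
long mp4_find_box(const unsigned char *data, size_t len,
                  const char type[4], size_t *box_size)
{
    size_t pos = 0;

    assert(data != NULL && type != NULL && box_size != NULL);
    while (len - pos >= 8) {
        uint32_t size = read_be32(data + pos);

        /* Size 0 (box runs to end of file) and size 1 (64-bit size follows)
           are legal in the format but not handled in this sketch. */
        if (size < 8 || size > len - pos) {
            return -1;
        }
        if (memcmp(data + pos + 4, type, 4) == 0) {
            *box_size = size;
            return (long)pos;
        }
        pos += size;
    }
    return -1;
}
```

To descend into a container box, call `mp4_find_box` again on the bytes that follow its 8-byte header, using the returned size minus 8 as the length.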
Turns out this is much simpler than talked about here in the "answers". Not sure if this works on the iPhone, but I just tested it in a command line app:
QTMovie* movie = [QTMovie movieWithFile:@"filename.m4a" error:nil];
NSInteger numChapters = [movie chapterCount];
// NSInteger is not always an int, so cast to long for the format string
NSLog(@"Number of Chapters: %ld", (long)numChapters);
NSArray* chapterArray = [movie chapters];
for ( NSDictionary* chapDict in chapterArray )
{
    NSLog(@"%@", [chapDict objectForKey:@"QTMovieChapterName"] );
}
Easy as pie. DOH!
This library should meet your needs, but I would think it is not runnable on the iPhone without jailbreaking: http://wmptagext.sourceforge.net/
Oops, if you need it to work on the iPhone, there is probably an Apple API to get this info. /me looks
It sounds like you need to play around with the iPod Library Access API:
http://developer.apple.com/iphone/library/documentation/Audio/Conceptual/iPodLibraryAccess_Guide/UsingTheiPodLibrary/UsingTheiPodLibrary.html#//apple_ref/doc/uid/TP40008765-CH101-SW1
If the files in question live in the iPod library, maybe you can get your information via the MPMediaLibrary query interface (3.0 upward).