I am working on a GWT project that has to run on phones.
The project needs to parse large XML files, and since phones are tight on memory I am refraining from using the XML DOM parser bundled with GWT.
In my view a pull parser would be apt here. Is there a GWT implementation of a pull parser? It would help reduce the memory required for parsing large XML files.
Thanks,
Karthik.
If you want to parse large XML files on phones, you will likely hit the memory limits of many phones pretty early.
Maybe there is an alternative approach: parse the files server side and give the relevant information to the phones in a format that is faster to process, like JSON.
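A minimal sketch of that server-side idea in plain Java, using the JDK's StAX pull parser so the whole document never has to sit in memory as a tree. The class name `TitleExtractor`, the `<title>` element, and the sample catalog are assumptions made up for illustration; a real service would pick whichever fields the phone actually needs.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical server-side helper: pulls events from the XML one at a time
// and emits only the text of <title> elements as a JSON array of strings.
final class TitleExtractor {

    static String titlesAsJson(String xml) {
        StringBuilder json = new StringBuilder("[");
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // The input comes from outside, so DTD processing is switched off.
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            boolean first = true;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    if (!first) {
                        json.append(',');
                    }
                    // getElementText() expects text-only content inside <title>.
                    json.append('"').append(escape(reader.getElementText())).append('"');
                    first = false;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
        return json.append(']').toString();
    }

    // Escapes the characters that may not appear raw inside a JSON string.
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == '"' || c == '\\') {
                out.append('\\').append(c);
            } else if (c < 0x20) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<catalog><book><title>Dune</title></book>"
                   + "<book><title>Emma</title></book></catalog>";
        System.out.println(titlesAsJson(xml)); // prints ["Dune","Emma"]
    }
}
```

The phone then only has to handle the small JSON array, not the original XML.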
I agree, JSON seems like a good solution. I did some digging, and the following link shows how to parse JSON and convert it to Java objects in GWT code:
http://code.google.com/webtoolkit/doc/1.6/tutorial/JSON.html
Related
I'm a beginner with GXT and I'm wondering if there is a way to parse a file and extract some information from it without uploading it.
I created a FormPanel that contains a file upload form, but I don't know what's next: how do I get the complete path of the file so I can read/write it with Java IO, or how do I retrieve the file? Or is there an alternative solution? Thank you.
Best Regards.
You can do it in some modern browsers using the bleeding-edge HTML5 APIs, for which you would need to write GWT JSNI code. The GWT team does not provide an API for this as of now.
HTML5 FileReader
FileReader includes four options for reading a file, asynchronously:
FileReader.readAsBinaryString(Blob|File) - The result property will contain the file/blob's data as a binary string.
FileReader.readAsText(Blob|File, opt_encoding) - The result property will contain the file/blob's data as a text string.
FileReader.readAsDataURL(Blob|File) - The result property will contain the file/blob's data encoded as a data URL.
FileReader.readAsArrayBuffer(Blob|File) - The result property will contain the file/blob's data as an ArrayBuffer object.
An example of a GWT wrapper over these:
https://github.com/bradrydzewski/gwt-filesystem
You can read more about it here: How to retrieve file from GWT FileUpload component?
IMHO you cannot read it.
For security reasons, JavaScript (and therefore GWT) doesn't have access to the files on the system's drives.
http://en.wikipedia.org/wiki/JavaScript#Security
See: Opening a file in local file system in javascript
In order to get at the file's contents you need to make a server call.
Instead, you can do your validation server side and show proper messages to the user.
P.S.: I am not considering the modern-browser approach. What happens if someone opens the page in something other than a so-called modern browser? Will the program run the same? It's always better to do server-side validation.
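As a rough sketch of what such a server-side check could look like in plain Java: the code below assumes the uploaded file is XML and that the servlet has already read it into a byte array. The class name `UploadCheck`, the `expectedRoot` parameter, and the message texts are all made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical server-side check for an uploaded file that is assumed to be XML.
final class UploadCheck {

    // Returns null when the upload looks fine, otherwise a message for the user.
    static String validate(byte[] upload, String expectedRoot) {
        if (upload == null || upload.length == 0) {
            return "The uploaded file is empty.";
        }
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Uploaded content is untrusted, so DOCTYPE declarations are refused.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(upload));
            String root = doc.getDocumentElement().getNodeName();
            if (!expectedRoot.equals(root)) {
                return "Expected a <" + expectedRoot + "> document but found <" + root + ">.";
            }
            return null;
        } catch (Exception e) {
            // Covers parser configuration, I/O, and well-formedness errors.
            return "The file could not be parsed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        byte[] good = "<orders><order id='1'/></orders>".getBytes(StandardCharsets.UTF_8);
        byte[] bad = "<orders><order>".getBytes(StandardCharsets.UTF_8);
        System.out.println(validate(good, "orders"));
        System.out.println(validate(bad, "orders"));
    }
}
```

The returned message can be sent back in the form's submit response, so the behaviour is the same in every browser.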
I have an iOS application that parses XML data from the web. I've set it up to parse some XML tags for me and then display some information in the application.
I do not own the XML data, so it's not unlikely that the XML tags could change without my knowledge, rendering my iOS application useless because it can no longer parse the data.
So instead of having the application crash when (if) they change the XML tags, I was thinking of having the application send an e-mail in the background alerting me that the XML tags have changed, or something like that. Is that possible to do, and is it even a smart solution to my problem?
Why don't you parse the XML file on your server side using any technology you prefer, and provide your own controlled XML file to your iOS application? That way you will have full control over the XML tags that your application expects. If the other party changes the tags, you just rewrite your server-side program to handle the changes gracefully.
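A small sketch of how the server-side program could notice a tag change before it ever reaches the app, here in plain Java. The class name `FeedContract`, the tag names, and the idea of sending a notification when the list is non-empty are assumptions for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical server-side check: reports which of the tags the app relies on
// no longer occur in the upstream XML.
final class FeedContract {

    static List<String> missingTags(String xml, List<String> expectedTags) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Upstream XML could not be parsed", e);
        }
        List<String> missing = new ArrayList<>();
        for (String tag : expectedTags) {
            if (doc.getElementsByTagName(tag).getLength() == 0) {
                missing.add(tag);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String upstream = "<feed><entry><headline>Hi</headline></entry></feed>";
        List<String> missing = missingTags(upstream, Arrays.asList("headline", "summary"));
        if (!missing.isEmpty()) {
            // This is the point where the server could e-mail you instead of the app crashing.
            System.out.println("Upstream tags changed, missing: " + missing);
        }
    }
}
```

While the upstream feed is broken, the server can keep serving the last good copy of your own XML, so the app keeps working.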
I'm trying to parse HTML using TouchXML. However, it seems that the data I want to parse (I do not control the source, it's downloaded from the internet) is partially malformed - I get various errors during the parse. Therefore, it seems that I should be using the inbuilt tidy support to fix the HTML but I cannot seem to find any documentation or information on how to enable it or link libtidy successfully into my project.
If anyone has any information on how to do this, it'd be much appreciated. Alternatively if there's another tool I could be using to do this - do tell me!
Actually, you can both link to the framework and include the headers, without needing to download the source.
Link to the existing framework libtidy.dylib
Add /usr/include/tidy to HEADER_SEARCH_PATHS
It turns out that although the framework can be linked into an Xcode project, the headers are missing. I got around this by downloading the HTML Tidy source (the src and include directories) and adding them so that they compile as part of my Xcode project.
Essentially, I have an iPhone app that can query and parse an XML file on my server. Right now I have to manually update and upload my XML file every morning so my users get the updated information. I would like to automate this process, which would entail parsing various websites (NYTimes, iAmBored.com, etc.), outputting the relevant information from each of these websites to an XML file, and uploading that file to my server.
Does anyone know the best way to accomplish this (parsing HTML into an XML file)? Since I am a beginner, I'm not sure which languages this requires or what the best way to do it is.
Thanks a lot in advance!
You can try to translate the HTML to XHTML (XHTML is based on XML, so it is XML with some rules defined in a DTD).
You can also try to parse the HTML directly with an SGML parser (just as XHTML is based on XML, HTML is based on SGML).
The links are provided as inspiration.
If the content you need to scrape is in XHTML, you can use XSLT to transform the original content into whatever you need in the XML you provide to your users.
Otherwise any scraping and XML-producing solution will be fine; most programming languages have libraries for this. For example, you could use XPath to select the elements you need from the page and then save them in the output file.
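To make the XPath route concrete, here is a minimal Java sketch using only the JDK. It assumes the page is already well-formed XHTML, and the `h2` elements with `class="headline"`, the `<items>` output format, and the class name `HeadlineScraper` are invented for illustration; a real site will need its own expressions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical scraper: selects headline elements from a well-formed XHTML
// page with XPath and writes them into a small XML document.
final class HeadlineScraper {

    static String headlinesToXml(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Do not fetch the XHTML DTD over the network while parsing.
            factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Document page = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(
                    "//h2[@class='headline']", page, XPathConstants.NODESET);

            StringBuilder out = new StringBuilder("<items>");
            for (int i = 0; i < hits.getLength(); i++) {
                String text = hits.item(i).getTextContent().trim();
                out.append("<item>").append(escapeXml(text)).append("</item>");
            }
            return out.append("</items>").toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Page is not well-formed XHTML", e);
        }
    }

    // Escapes the characters that are special in XML text content.
    static String escapeXml(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        String page = "<html><body><h2 class=\"headline\">First story</h2>"
                    + "<h2>Not a headline</h2></body></html>";
        System.out.println(headlinesToXml(page));
    }
}
```

Pages that are not well-formed would have to be cleaned up first, before a step like this can run.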
Can you get what you need from the RSS/Atom feeds? That will simplify things greatly because they are XML rather than HTML and can be parsed by a standard XML parser. Of course, descriptions embedded inside RSS feeds will be HTML, so depending on your application, that may be when you need to parse HTML.
XSLT is a domain-specific programming language designed for processing XML, but you can also use any programming language that includes an XML parser for the task.
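For the XSLT route, a short sketch in Java with the JDK's built-in transformer. The stylesheet, the `<stories>`/`<story>` output elements, and the class name `FeedSimplifier` are assumptions for illustration; it expects an RSS 2.0 style `rss/channel/item/title` layout.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: applies a tiny XSLT stylesheet that keeps only the
// item titles of an RSS feed.
final class FeedSimplifier {

    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes' indent='no'/>"
        + "<xsl:template match='/'>"
        + "<stories><xsl:for-each select='rss/channel/item'>"
        + "<story><xsl:value-of select='title'/></story>"
        + "</xsl:for-each></stories>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String simplify(String rss) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(rss)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Feed could not be transformed", e);
        }
    }

    public static void main(String[] args) {
        String rss = "<rss><channel><item><title>One</title></item>"
                   + "<item><title>Two</title></item></channel></rss>";
        System.out.println(simplify(rss));
    }
}
```

The same stylesheet could be run by any other XSLT 1.0 processor, which is the main attraction of keeping the mapping in XSLT instead of in application code.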
TagSoup - Just Keep On Truckin'
...a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short.
TagSoup is designed for people who have to process this stuff using some semblance of a rational application design.
By providing a SAX interface, it allows standard XML tools to be applied to even the worst HTML. TagSoup also includes a command-line processor that reads HTML files and can generate either clean HTML or well-formed XML that is a close approximation to XHTML.
Also, Taggle, a TagSoup in C++, available now
I am using the SIMPLE RSS reading example found at http://theappleblog.com/2008/08/04/tutorial-build-a-simple-rss-reader-for-iphone/
It uses parseXML to load the RSS feeds.
Here is the problem I am having. I am having trouble getting the following RSS feed to load. It comes up with an error saying that it cannot connect. However, the feed works fine in my Mac RSS reader, so I know the link is good.
Any ideas on why it cannot load this particular feed but it can load others fine?
http://www.okstate.com/rss.dbml?db_oem_id=200&media=news
Thanks.
I've just released an open source RSS/Atom Parser for iPhone and hopefully it might be of some use.
I'd love to hear your thoughts on it too!
In my experience, HTML markup causes an RSS parser to fail in most cases. I've experienced a problem like this with a lot of parser classes I've come across (in search of the ultimate one, which I didn't find)
My guess is that character entities such as an escaped 's are responsible for your crash. That was usually the case with my crashes. This also led to my decision to create a 'proxy server' to pre-parse the XML before sending it to the iPhone (which gives me the advantage of caching, scaling, and some other stuff). I do believe there are solid solutions out there, but it is always difficult to write a parser for so many RSS implementations.
P.S.: W3C validates this feed as 'valid', so it really is 'our' problem.
Your problem could lie with:
Unicode characters (e.g. I see some o's with two dots above them in the feed)
The code you have doesn't respect CDATA sections correctly
To find out which is the case, save the feed file to your local disk and load it via your code to confirm that the error still happens.
Then do a binary search on the file to find out whether a particular RSS entry is causing the problem (e.g. remove all but the first RSS entry and see if the problem exists. If it does, the problem is there; if it doesn't, put half the RSS entries back in the file and repeat).
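The binary search described above can also be automated. Below is a rough Java sketch that assumes the entries have already been split into separate `<item>...</item>` strings and that a failure is caused by an individual entry, not by the surrounding feed. The class name `FeedBisect`, the `<rss><channel>` wrapper, and the use of the JDK's DOM parser as the "does it load" test are stand-ins for your own parsing code.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical helper: bisects a list of feed entries to find the first one
// that makes the feed fail to parse.
final class FeedBisect {

    // Wraps the given entries in a minimal feed and reports whether it parses.
    static boolean parses(List<String> entries) {
        String feed = "<rss><channel>" + String.join("", entries) + "</channel></rss>";
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(feed)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    // Returns the index of the first offending entry, or -1 if everything parses.
    static int firstBadEntry(List<String> entries) {
        if (parses(entries)) {
            return -1;
        }
        int lo = 0;
        int hi = entries.size(); // invariant: entries[lo, hi) fails to parse
        while (hi - lo > 1) {
            int mid = (lo + hi) / 2;
            if (parses(entries.subList(lo, mid))) {
                lo = mid; // first half is clean, so the problem is in the second half
            } else {
                hi = mid; // the problem is already in the first half
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList(
                "<item><title>fine</title></item>",
                "<item><title>fish & chips</title></item>", // raw ampersand
                "<item><title>also fine</title></item>");
        System.out.println("First bad entry: " + firstBadEntry(entries));
    }
}
```

With n entries this needs roughly log2(n) parse attempts, compared with up to n when removing entries one at a time by hand.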
I've been experiencing a similar issue. I haven't yet pinned down the answer, but I've noticed that RSS 2 tends to parse more successfully than the rest.
There are many RSS feeds that contain invalid XML, usually because they were hacked together on the server side using HTML templates by somebody who didn't understand XML. I've seen improperly escaped (or non-escaped) HTML post contents, missing close tags, badly nested tags, and so on.
If you want to be able to parse arbitrary feeds, you have to clean up bad XML. The usual way is to use the "htmlTidy" library, which is included in the OS. This can clean up XML as well as HTML.
This example you're following uses NSXMLParser -- I have no idea why. It's a lower-level API and it doesn't support tidying. I would suggest using NSXMLDocument instead. There's a flag in that API that will tell it to use tidy when parsing the XML. This API also returns you the XML as a handy tree of elements that's easy to work with.