Hi, I am working on an application that takes data from a website and displays it in a table. I have successfully built something like an RSS feed (modeled on a Twitter feed, so I think it uses an XML parser), but now I want to get data from a website that doesn't have an RSS feed. I just want to get the titles from the webpage. Any suggestion on how to do that without the XMLParser?
thanks
I think the best way is to create a PHP/ASP/... page on your server that will scrape the data from the remote website.
Then, in that page, you can use cURL to fetch and scrape the data.
Next, you return the data in whatever format you want (XML/JSON/etc.).
Finally, you can easily call that script from your code.
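As a rough illustration of that idea (written with Node/TypeScript instead of PHP, with a made-up target URL and a naive regex standing in for a real HTML parser), the scraping endpoint could look something like this:

```typescript
// Minimal sketch of a scraping endpoint (Node 18+, TypeScript).
// The target URL and the <h2> pattern are assumptions; adapt them to the real site.
import { createServer } from "node:http";

const TARGET = "https://example.com/news"; // hypothetical page you want the titles from

async function fetchTitles(): Promise<string[]> {
  const res = await fetch(TARGET);
  const html = await res.text();
  // Very naive extraction: grab the text of every <h2>...</h2>.
  // A real implementation would use a proper HTML parser instead of a regex.
  const matches = html.matchAll(/<h2[^>]*>(.*?)<\/h2>/gis);
  return [...matches].map(m => m[1].replace(/<[^>]+>/g, "").trim());
}

createServer(async (_req, res) => {
  try {
    const titles = await fetchTitles();
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ titles })); // your app consumes this JSON
  } catch (err) {
    res.writeHead(502);
    res.end(JSON.stringify({ error: String(err) }));
  }
}).listen(8080);
```

Your app would then call this endpoint and read the titles out of the returned JSON.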
On the other hand, pay attention to what you scrape: scraping/skimming other sites is often against their terms (and can be illegal), and Apple can reject your app because of that.
There is a nice post talking about it.
There's only one similar question and it hasn't been answered, at least not for me.
There are public pages on Facebook. Suppose I want to get their photo stream, or their album pictures?
I don't see how the Graph API allows me access, since I can't get an access token. If I browse from my browser, NOT LOGGED IN, I can still see this public information, so how do I use the API to access it?
BTW, I tried scraping with Python + mechanize and it's a no-go: if you fetch the regular web photo stream you only get part of it, not all of it, and for the rest you need to scroll, or to know how to build the same request the browser is building, but surprise surprise, the JS doing the request is obfuscated pretty well...
Any help?
In short, it's not possible to do what I wanted anonymously. Moreover, at the moment only some of the API works on the website, and only by accident; if you are not logged in, FB does not want to let you see any page or any person, which, considering the recent article about their cooperation with the US government, is not surprising.
In any case, I just use Greasemonkey and Firefox to get a full page/album and then download the whole thing, and if I need scraping I'll do it on those pages with a script.
Basically, if I log in I can get a full page using Greasemonkey scripts... Greasemonkey makes it easier anyway, since it has the browser to parse all the data and works inside the browser...
I gave up after 3 weeks...
These days, modern sites like Facebook and Gmail are becoming more and more service oriented.
A main page is loaded, and then AJAX requests fetch all sorts of data and add it to the page. This is also something that is promoted in ASP.NET MVC 4 with the Web API.
So let's say we want to create a product category page for an e-shop. My understanding is that the way to implement this is to create a nice layout plus a Web API that will retrieve all the data on request.
So we'll have a URL like
/api/Products
that will return JSON with all of our products, and then we can build on this API by adding filters/paging (/api/Products?sort-by=name, for example) or anything else that returns the filtered JSON, and we can pass that back and forth with AJAX requests, offering the user an excellent experience.
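For concreteness, a minimal client-side sketch (assuming, purely for illustration, that /api/Products returns an array of { name, price } objects and that the page has a #product-list element):

```typescript
// Sketch of the client side; the response shape and element id are assumptions.
interface Product {
  name: string;
  price: number;
}

async function loadProducts(sortBy = "name"): Promise<void> {
  const res = await fetch(`/api/Products?sort-by=${encodeURIComponent(sortBy)}`);
  const products: Product[] = await res.json();

  // Render the products into the existing layout.
  const list = document.getElementById("product-list")!;
  list.innerHTML = products
    .map(p => `<li>${p.name}: ${p.price}</li>`)
    .join("");
}

loadProducts();          // initial load
// later: loadProducts("price") when the user picks a different sort order
```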
My question now is: what happens with SEO?
A few years ago, before single-page AJAX/service-oriented sites, we would have
http://website.com/apples/
http://website.com/apples/2/
which would load the list of apples with pagination.
Now the site would be
http://website.com/apples/
but it wouldn't load the apples right away; it would load a blank page and call the service
/api/apples
which would return JSON, and then the data would be loaded into the page.
I read this Google article, https://developers.google.com/webmasters/ajax-crawling/docs/html-snapshot, and it didn't convince me. I really don't want to render the service output in the background and then do string replacement.
Is it possible to have
http://website.com/apples/
call the service
/api/apples
and load the data, while at the same time being Google friendly?
You have a couple of options. You can use HTML5 pushState to update the URL, but then you will also need to create a version of your site that works without JavaScript turned on.
Another option is to use Google's AJAX crawling specification. I don't know which other search providers currently support it, but it should be a good way to at least get into Google's search results.
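For illustration, here is a minimal sketch of the pushState approach, reusing the hypothetical /api/apples endpoint from the question (the response shape and the #content element are assumptions):

```typescript
// Sketch of the pushState idea: content is still loaded from /api/apples,
// but every page gets a real, shareable, crawl-friendly URL.
async function showApplesPage(page: number): Promise<void> {
  const res = await fetch(`/api/apples?page=${page}`);
  const apples: string[] = await res.json(); // assumed response shape

  document.getElementById("content")!.innerHTML =
    apples.map(a => `<li>${a}</li>`).join("");

  // Update the address bar so /apples/2/ is a URL of its own.
  history.pushState({ page }, "", `/apples/${page}/`);
}

// Handle back/forward navigation by re-rendering the stored page number.
window.addEventListener("popstate", (e) => {
  const page = (e.state as { page?: number } | null)?.page ?? 1;
  showApplesPage(page);
});

showApplesPage(1);
```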
Hello, I am using GDataXML to parse RSS feeds.
However, most of today's feeds don't include the full text of the article, so most of the time I end up with just a tiny piece of the whole thing.
I see this feature in a lot of iPhone and iPad readers: they fetch the article from the web and show it in full text.
So how do I do that?
My idea is this: the root element starts at the start of the article.
So if the root element is [article],
I need to go to the website, fetch the HTML code between the starting divs, and then display it in my app.
So how do I get the code between those divs? Regular expressions, or something else? I'd like an example, thanks.
And finally, how do I display images after I get the full article in HTML format?
Thanks guys and regards.
Use MWFeedParser; it will give you RSS feed entries with
identifier, title, link, date, updated, summary, content, enclosures
I use MWFeedParser as well, because it will get all the elements of a feed entry, but you are correct that it will not do a "deep dive" into all of the links in the feed entry.
If you want to bring in the full content from the link, and the full content from the enclosures (such as audio or video from a podcast), you are basically talking about saving the web page for offline viewing. For a full html page, you would have to save that HTML, plus crawl the whole page and save the images, and change the path of those images so that you would be able to load it offline. It's not really the job of the RSS applications to save HTML content for offline use, but to get the elements of the RSS feed. Once you have all the links you want to save for offline use, you need to provide the code that will take a URL and save it offline.
I did a search for "ios save html offline" and found this post, which seems pretty positive about using ASIHTTPRequest to save a page offline: https://stackoverflow.com/a/6698854/1072068. I would recommend you try something like that once you have the parts of the RSS feed entry from MWFeedParser.
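To make the "take a URL and save it offline" part concrete, here is a rough sketch (in TypeScript rather than Objective-C, with a placeholder article-container class and end marker, since every site structures its HTML differently) of fetching the linked page, cutting out the article block, and collecting the image URLs you would need to download and re-path:

```typescript
// Rough sketch only: the "article-body" class and the end marker are placeholders.
// Inspect the real page to find its actual container, and prefer an HTML parser
// over string slicing in production code.
async function fetchArticleHtml(
  articleUrl: string
): Promise<{ body: string; images: string[] }> {
  const res = await fetch(articleUrl);
  const html = await res.text();

  // Cut out everything between the opening article div and a known end marker.
  const start = html.indexOf('<div class="article-body"');
  const end = html.indexOf("<!-- end article -->", start); // placeholder marker
  const body = start >= 0 && end > start ? html.slice(start, end) : html;

  // Collect image URLs so they can be downloaded and re-pathed for offline viewing.
  const images = [...body.matchAll(/<img[^>]+src="([^"]+)"/gi)].map(m => m[1]);

  return { body, images };
}
```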
I want to parse a Wikipedia page to retrieve information for my iOS app. Is there a parser, or some tutorial that explains how I can do it, or how to get the page in an XML format? I have looked at the http://www.mediawiki.org/wiki/MediaWiki page, but I haven't understood anything. If anyone can help me, please, maybe with some example...
Have you read the MediaWiki API page, the page that describes the Query action, and above all else their API FAQ? These links will tell you what URLs you should be using to get the data that you require.
Do you know how to download a URL with NSURLConnection?
To start with, try using their API to download a Wikipedia page of your choice in HTML format. There's an answer in their FAQ that tells you how to request HTML format. If you do that, you'll get something you could display in a web view and style as you'd like.
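For example, here is a small sketch (in TypeScript, using the API's action=parse; double-check the exact parameters against the API docs and FAQ) of requesting one article's rendered HTML:

```typescript
// Sketch of fetching the rendered HTML of one article through the MediaWiki API.
// Parameters are based on the documented action=parse endpoint; verify against the docs.
async function fetchWikipediaHtml(title: string): Promise<string> {
  const url =
    "https://en.wikipedia.org/w/api.php" +
    `?action=parse&page=${encodeURIComponent(title)}` +
    "&prop=text&format=json&formatversion=2&origin=*";

  const res = await fetch(url);
  const data = await res.json();
  return data.parse.text; // HTML you can drop into a web view
}

fetchWikipediaHtml("Objective-C").then(html => console.log(html.slice(0, 200)));
```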
I'm trying to implement a feature like that, where a user inputs a URL and, when displaying that URL, I want to have a custom display: an embed object if it's a YouTube video, a thumbnail if it's an image link, and the title and an excerpt of the body if it's a normal link.
How can such a feature be realized?
There is a new idea called oEmbed that a few sites support (Flickr, Vimeo and a few others) that addresses this problem; see the oEmbed site.
Otherwise, just check the site against a list of ones you pick and then pull out the relevant bits to construct an embed link.
I liked the idea of oEmbed a lot, but unfortunately it doesn't have that much adoption yet.
oohEmbed tries to solve this issue by providing oEmbed endpoints for many websites.
For the feature to work, it needs the server's involvement; I believe the following scenario is how it works.
Assume that we have the site humanzz.com and that it provides such a feature:
A user enters a URL on a humanzz.com webpage and presses a button like Facebook's preview button.
An AJAX call is made to a dedicated page on humanzz.com.
humanzz.com calls the remote website and gets its data.
The AJAX call now returns the page's data (an oEmbed JSON object).
This involves a lot of server overhead.
I really wanted to do it using JavaScript, as the server's only role is to get around the Same-Origin Policy restrictions.
oohEmbed allows you to bypass the server step by specifying a callback parameter, so that the JSON object returned is passed to a callback function on your page.
An example illustrating this is as follows:
Add a script tag dynamically to your page
<script type="text/javascript" src="http://oohembed.com/oohembed/?url=http%3A//www.amazon.com/Myths-Innovation-Scott-Berkun/dp/0596527055/&callback=myCallback"></script>
This would result in executing myCallback(oEmbedJSONObject) which is great.
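To make "add a script tag dynamically" concrete, here is a small sketch; the fields read from the response are the standard oEmbed ones, and the #preview element is an assumption:

```typescript
// Define the callback that the JSONP response will invoke with the oEmbed object.
(window as any).myCallback = (oembed: { title?: string; html?: string }) => {
  console.log("Got oEmbed data:", oembed.title);
  if (oembed.html) {
    document.getElementById("preview")!.innerHTML = oembed.html;
  }
};

// Dynamically add the script tag; the browser executes the JSONP response,
// which in turn calls myCallback with the data.
const script = document.createElement("script");
script.src =
  "http://oohembed.com/oohembed/?url=" +
  encodeURIComponent("http://www.amazon.com/Myths-Innovation-Scott-Berkun/dp/0596527055/") +
  "&callback=myCallback";
document.head.appendChild(script);
```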
The problem with that solution is you still have to have a fallback for websites that don't have oEmbed representations.
For the embedded things, I have been using auto_html (https://github.com/dejan/auto_html) with great success (Vimeo, YouTube, images), and I even added SoundCloud support myself. But I am still looking for a Facebook-like "thumbnail" generation with an image and text.
I guess you have to construct it yourself by manually checking which kind of URL you got.
If it is an image URL, then you just have to scale it down, and if the user clicks on it, handle that by opening the original somehow.
If it is a link to some YouTube video, then you have to take a look at how embedding YouTube videos works. You can copy the embed code that YouTube itself provides, and then replace the video URL in it with the URL you got from your user.
I never implemented something like that, but I assume it should work somehow like this.
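As a rough sketch of that manual approach (the URL patterns and the embed markup are simplified; real YouTube URLs come in more shapes than this):

```typescript
// Decide how to render a user-supplied URL: image thumbnail, YouTube embed, or plain link.
function renderUrlPreview(url: string): string {
  // Image link: show a scaled-down thumbnail that links to the original.
  if (/\.(png|jpe?g|gif)$/i.test(url)) {
    return `<a href="${url}"><img src="${url}" style="max-width:200px"></a>`;
  }

  // YouTube link: swap the watch URL for YouTube's own embed URL.
  const yt = url.match(/(?:youtube\.com\/watch\?v=|youtu\.be\/)([\w-]+)/);
  if (yt) {
    return `<iframe width="420" height="315" src="https://www.youtube.com/embed/${yt[1]}" frameborder="0" allowfullscreen></iframe>`;
  }

  // Anything else: fall back to a plain link (title/excerpt would need
  // server-side fetching of the page, as discussed earlier).
  return `<a href="${url}">${url}</a>`;
}
```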