SEO and JavaScript Data Load - REST

These days modern sites are becoming more and more service-oriented, like Facebook or Gmail.
A main page is loaded, and then AJAX requests fetch all sorts of data and add it to the page. This is also the approach promoted in ASP.NET MVC 4 with the Web API.
So now let's say we want to create a product category page for an e-shop. My understanding is that the way to implement this is to create a nice layout and a Web API that retrieves all the data on request.
So we'll have a URL like
/api/Products
that will return JSON with all of our products, and then we can build on this API by adding filters, paging, and so on (e.g. /api/Products?sort-by=name), each returning filtered JSON that we pass back and forth with AJAX requests, offering the user an excellent experience.
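For example, the client side of that might look something like this (TypeScript; the element id, the page parameter, and the product fields are made up for illustration):

// Hypothetical product shape returned by /api/Products
interface Product {
  id: number;
  name: string;
  price: number;
}

// Fetch a sorted/paged slice of products and render it into the page
async function loadProducts(sortBy: string = "name", page: number = 1): Promise<void> {
  const response = await fetch(`/api/Products?sort-by=${encodeURIComponent(sortBy)}&page=${page}`);
  const products: Product[] = await response.json();

  document.getElementById("product-list")!.innerHTML = products
    .map(p => `<li>${p.name} - ${p.price}</li>`)
    .join("");
}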
My question now is: what happens with SEO?
So a few years ago, before one-page AJAX/service-oriented sites, we would have
http://website.com/apples/
http://website.com/apples/2/
which would load the list of apples with pagination.
Now the site would be
http://website.com/apples/
however, it wouldn't load the apples directly; it would load a blank page and call the service
/api/apples
which would return JSON, and the data would then be loaded into the page.
I read this Google article, https://developers.google.com/webmasters/ajax-crawling/docs/html-snapshot, but it didn't convince me. I really don't want to call the service behind the scenes and then do string replacement.
Is it possible to have
http://website.com/apples/
call the service
/api/apples
and load the data, while remaining Google-friendly at the same time?

You have a couple of options. You can use HTML5 pushState to update the URL, but then you will also need to create a version of your site that works without JavaScript turned on.
Another option is to use Google's AJAX Crawling specification. I don't know which search providers currently support it, but it should be a good way to at least get into Google's search results.
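To illustrate the first option, here is a minimal pushState sketch (TypeScript; /api/apples comes from the question, while the element id and item shape are placeholders, not a definitive implementation):

// Load a category via the API, render it, and keep the URL crawlable/bookmarkable.
// Assumes the server can also render /apples/ as full HTML for non-JavaScript clients.
async function showCategory(category: string): Promise<void> {
  const response = await fetch(`/api/${encodeURIComponent(category)}`);
  const items: { name: string }[] = await response.json();

  document.getElementById("content")!.innerHTML =
    items.map(item => `<li>${item.name}</li>`).join("");

  // Update the address bar to the "real" URL without a full page reload
  history.pushState({ category }, "", `/${category}/`);
}

// Handle back/forward navigation so the view matches the URL again
window.addEventListener("popstate", (event: PopStateEvent) => {
  if (event.state?.category) {
    showCategory(event.state.category);
  }
});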

Related

RESTful-compliant "create page" URL

I have a question about HTTP method mapping based on RESTful architecture. This is a widely discussed topic, I know, but unfortunately I can't find a useful example.
What I'm interested in is how I should map a "create new" page to an HTTP request/URL. Most web applications use AJAX with popups or something similar, so there's no difference between the list of resources (users/) and the creation of a new user (users/), and hence the creation is not bookmarkable. That is justifiable, for sure, since in general why would one want to bookmark a page with a bunch of input fields if they will be empty anyway and all the data has to be filled in from scratch?
Incidentally, SO uses questions/ask. What do you use/prefer, and which is more RESTful in your opinion?
I mostly use GET /users/new to serve the creation form and POST /users/new to submit it.
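For example, as a rough sketch (TypeScript with Express, purely illustrative of the two routes rather than any particular framework's canonical way):

import express from "express";

const app = express();
app.use(express.urlencoded({ extended: true }));

// GET /users/new serves the (empty) creation form
app.get("/users/new", (_req, res) => {
  res.send(`<form method="post" action="/users/new">
    <input name="name" /><button type="submit">Create</button>
  </form>`);
});

// POST /users/new receives the submitted form and creates the user
app.post("/users/new", (req, res) => {
  // ... persist req.body.name somewhere ...
  res.redirect("/users");
});

app.listen(3000);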

Merge Orchard Blog Into Existing Website

I'm trying to determine the best way to "merge" my Orchard blog into my existing website. Currently the blog is accessed outside the site.
I threw together a quick view in my MVC site that just loads the blog into an iframe. Any other ideas?
The blog is tuned up with a great theme and tons of mods & styling that match my main site's design to a T.
On the home page of my site, I'm using the RSS feed to output a list of the last 3 blog posts. My idea is that the user will click on a blog post link and go directly to the view that hosts the blog in the inline frame.
I guess the only variable that I haven't handled yet is how to load up the correct page in the blog based on the link that the user clicked on my main site home page.
I've read other posts on this subject, and the solution that is always offered is to merge all the code from the main website into Orchard, which seems insane... I have a very large auction-based website; taking all that logic & content and putting it into Orchard is not an option.
Hope all that makes sense; thanks for the input. I can't imagine it would be a huge issue to "seamlessly" integrate my blog with my MVC site.
Orchard was never designed to be integrated into an existing application, so something like what you've done is what you have to do. The iframe, however, has a number of problems, such as its fixed size and awkward navigation. It's better to integrate data than markup. It's now easy to build WebAPI controllers to expose Orchard data. You could consume that data in your application and render it there. That lets you manipulate the data before rendering, which is of course easier than manipulating rendered HTML. For example, you can build your own link URLs so that clicking on a post's title goes to an action on your site that fetches the post contents, rather than to the Orchard post URL.
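For instance, the consuming side could look roughly like this (TypeScript; the /api/blog/posts endpoint and the post fields are hypothetical, standing in for whatever your WebAPI controller actually exposes):

// Hypothetical shape of a post exposed by a custom Orchard WebAPI controller
interface BlogPost {
  id: number;
  title: string;
  bodyHtml: string;
}

// Fetch the latest posts and render links that stay on the main site,
// pointing at your own action instead of the Orchard post URL
async function renderRecentPosts(): Promise<void> {
  const response = await fetch("/api/blog/posts?count=3");
  const posts: BlogPost[] = await response.json();

  document.getElementById("recent-posts")!.innerHTML = posts
    .map(p => `<a href="/blog/post/${p.id}">${p.title}</a>`)
    .join("<br/>");
}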
One final comment: It is a little weird that an auction website would need to integrate a blog in the middle of its own rendering. Shouldn't the blog be a separate section of the site?

How do I define an edit page within a REST API?

In a Play app I'm designing, these are some of my routes
POST /visits controllers.Visit.create
GET /visits controllers.Visit.visits
GET /visits/:id controllers.Visit.visit(id: Long)
PUT /visits/:id controllers.Visit.update(id: Long)
DELETE /visits/:id controllers.Visit.delete(id: Long)
I'm supporting a browser interface too. I'm following the guidance I saw here:
RESTful on Play! framework
I can easily provide an HTML template to display detailed information about one specific visit, or a list of visits. But how does an "edit page" fall cleanly into this? It would have to be prefilled with the information from a particular visit. I could do something like GET /visits/:id/edit controllers.Visit.edit(id: Long), which would return a page prefilled with the visit information, or I could have a static HTML page that calls /visits/:id with an AJAX request to populate the fields, which would let me avoid corrupting my resource-driven API with a browser-specific route. Or is there some better option? What is best practice, and why?
In REST it doesn't make sense to create additional resources simply to perform standardized actions. Everyone who knows the HTTP protocol knows your visit object should be editable through a PATCH request carrying the diff you want applied, or through a PUT request that replaces the whole resource with a new one. Why create a custom, non-standard edit action with POST that you will have to document and explain to everyone?
In that sense, I'd say your best option is a static HTML page that drives your API, using a GET on /visits/:id to populate the fields, and replacing /visits/:id with a PUT when the edited content is submitted.
However, keep in mind that there's nothing wrong with having a browser-specific route in your API, as long as you respect the Accept header sent by the client. In my APIs, I sometimes have routes return a human-friendly representation of the resource when the request is made with the Accept: text/html header; it's a simple way to have a built-in admin client, and the API can easily be explored with a browser. In REST, the only difference between an API and a web page is that the first returns a representation in a machine-friendly format, and the second a representation that you expect to be rendered by a browser as a human-friendly document. Both are supposed to include links and/or forms with directions on how to edit the resource.
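A rough sketch of that flow (TypeScript; the routes match the ones above, the field names are made up):

// Hypothetical visit shape returned by GET /visits/:id
interface Visit {
  id: number;
  patientName: string;
  date: string;
}

// Populate the static edit form from the resource itself
async function loadVisitForm(id: number): Promise<void> {
  const response = await fetch(`/visits/${id}`, { headers: { Accept: "application/json" } });
  const visit: Visit = await response.json();
  (document.getElementById("patientName") as HTMLInputElement).value = visit.patientName;
  (document.getElementById("date") as HTMLInputElement).value = visit.date;
}

// Replace the whole resource with a PUT when the form is submitted
async function saveVisit(id: number): Promise<void> {
  const updated: Visit = {
    id,
    patientName: (document.getElementById("patientName") as HTMLInputElement).value,
    date: (document.getElementById("date") as HTMLInputElement).value,
  };
  await fetch(`/visits/${id}`, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(updated),
  });
}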

iPhone web services NSURL

Hi, I am working on an application that takes data from a website and displays it in a table. I have been successful in building something like an RSS feed (modeled on a Twitter feed, so I think it uses an XML parser), but now I want to get data from a website that doesn't have an RSS feed. I just want to get the titles from the web page. Any suggestions on how to do that without the XML parser?
Thanks.
I think the best way is to create on your server a PHP/ASP/... page that will scrape the data from the remote website.
Then, in that page, you can use cURL to fetch and scrape the data.
See here.
Next, you return the data in whatever format you want (XML/JSON/etc.).
Finally, you can easily call that script from your code.
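As a sketch of the idea (here in TypeScript on Node with Express rather than PHP, and with a deliberately naive title regex; a real implementation would use an HTML parser):

import express from "express";

const app = express();

// Proxy endpoint: fetch the remote page server-side and return only the titles as JSON
app.get("/titles", async (req, res) => {
  const url = String(req.query.url);
  const html = await (await fetch(url)).text();

  // Very rough extraction of <title>/<h1>/<h2> text
  const titles = [...html.matchAll(/<(?:title|h1|h2)[^>]*>(.*?)<\/(?:title|h1|h2)>/gis)]
    .map(match => match[1].trim());

  res.json({ titles });
});

app.listen(3000);

Your app would then request /titles?url=... and parse the small JSON response instead of the full page.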
On the other hand, pay attention to what you scrape: scraping is often against a site's terms of use, and Apple can reject your app because of that.
There is a nice post talking about it.

Best way to store data for Greasemonkey based crawler?

I want to crawl a site with Greasemonkey and wonder if there is a better way to temporarily store values than with GM_setValue.
What I want to do is crawl my contacts in a social network and extract the Twitter URLs from their profile pages.
My current plan is to open each profile in its own tab, so that it looks more like a normal person browsing (i.e. CSS, scripts, and images will be loaded by the browser). Then I store the Twitter URL with GM_setValue. Once all profile pages have been crawled, I create a page using the stored values.
I am not so happy with the storage option, though. Maybe there is a better way?
I have considered inserting the user profiles into the current page so that I could process them all with the same script instance, but I am not sure whether XMLHttpRequest looks indistinguishable from normal user-initiated requests.
I've had a similar project where I needed to get a whole lot of data (invoice lines) from a website and export it into an accounting database.
You could create an .aspx (or PHP, etc.) back end that processes POST data and stores it in a database.
Any data you want from a single page can be stored in a form (hidden using style properties if you want), using field names or IDs to identify the data. Then all you need to do is point the form action at the .aspx page and submit the form using JavaScript.
(Alternatively, you could add a submit button to the page, allowing you to check the form values before submitting them to the database.)
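For example (TypeScript; store.aspx and the field names are placeholders for whatever your back end expects):

// Build a hidden form holding the scraped values and post it to the back end
function postScrapedData(twitterUrl: string, profileUrl: string): void {
  const form = document.createElement("form");
  form.method = "POST";
  form.action = "/store.aspx"; // hypothetical back-end page that writes to the database
  form.style.display = "none";

  const addField = (name: string, value: string): void => {
    const input = document.createElement("input");
    input.type = "hidden";
    input.name = name;
    input.value = value;
    form.appendChild(input);
  };

  addField("twitterUrl", twitterUrl);
  addField("profileUrl", profileUrl);

  document.body.appendChild(form);
  form.submit();
}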
I think you should first ask yourself why you want to use Greasemonkey for this particular problem. Greasemonkey was developed as a way to modify one's browsing experience, not as a web spider. While you might be able to get Greasemonkey to do this using GM_setValue, I think you will find your solution kludgy and hard to develop. On top of that, it will require many manual steps (like opening all of those tabs, clearing the Greasemonkey variables between runs of your script, etc.).
Does anything you are doing require the JavaScript on the page to be executed? If so, you may want to consider using Perl and WWW::Mechanize::Plugin::JavaScript. Otherwise, I would recommend that you do all of this in a simple Python script. You will want to take a look at the urllib2 module. For example, take a look at the following code (note that it uses cookielib to support cookies, which you will most likely need if your script requires you to be logged into a site):
import urllib2
import cookielib

# Reuse one opener so cookies persist across requests (needed when logged in to a site)
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookielib.CookieJar()))

# Fetch the profile page and read the raw HTML
response = opener.open("http://twitter.com/someguy")
responseText = response.read()
Then you can do all of the processing you want using regular expressions.
Have you considered Google Gears? It would give you access to a local SQLite database in which you can store large amounts of information.
The reason for wanting Greasemonkey is that the page to be crawled does not really approve of robots. Greasemonkey seemed like the easiest way to make the crawler look legitimate.
Actually, running your crawler through the browser does not make it any more legitimate. You are still breaking the site's terms of use! WWW::Mechanize, for example, is equally well suited to 'spoof' your User-Agent string, but spoofing and crawling are still against the rules, and possibly illegal, if the site does not allow spiders/crawlers!
The reason for wanting Greasemonkey is that the page to be crawled does not really approve of robots. Greasemonkey seemed like the easiest way to make the crawler look legitimate.
I think this is the hardest way imaginable to make a crawler look legitimate. Spoofing a web browser is trivially easy with some basic understanding of HTTP headers.
Also, some sites have heuristics that look for clients that behave like spiders, so simply making the requests look like they come from a browser doesn't mean they won't know what you are doing.
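For example (TypeScript on Node 18+; the User-Agent value is just an example string):

// A plain HTTP request that presents itself as a desktop browser
async function fetchAsBrowser(url: string): Promise<string> {
  const response = await fetch(url, {
    headers: {
      "User-Agent":
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    },
  });
  return response.text();
}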