Indexing an HTML page that redirects onload - gwt

I have a pure GWT based website, and as we know, search engines cannot index pure GWT based websites. I have therefore created an alternate web page, shown below, which is stored as a separate HTML file in the war folder. The purpose of this page is to list and index details about my website. This page is never displayed on my website; it is meant only for indexing. The URL leading to this page is part of my sitemap.xml, so I assume the HTML below will be indexed because it's part of the sitemap. So here are my questions:
1. Will the content I put in the div with id "crawler" be indexed, given that it is removed onload and the browser is redirected to another URL on load?
2. Is there a better way to get content indexed for a pure GWT website that has no HTML-based user interface?
3. I could also have URLs that invoke a servlet and return a response meant only for indexing. But then that same URL would be displayed in search results, which is not useful. In other words, I am trying to figure out a way in which the content gets indexed, but when users click the search result they are redirected to the home page instead of being shown the indexed content.
<head>
  <script>
    function load() {
      // Remove the crawler-only content, then send real visitors to the site
      var element = document.getElementById("crawler");
      element.parentNode.removeChild(element);
      window.location.href = 'http://<mysite>.com';
    }
  </script>
</head>
<body onload="load()">
  <div id="crawler">
    <CONTENT TO BE INDEXED>......
  </div>
</body>
As you can see here the div (crawler) that contains all the content that is meant for indexing, is removed as soon as the body loads. Apart from this the page also redirects to the home page of the site on load.

The crawler will read the entire contents of the page for indexing, so it will have no trouble picking up the portion within the div. The onload handler is not executed by the crawler before it reads the page.
A method I have used in the past was to generate static HTML versions of the pages and reference these through the sitemap.xml, along the lines of the sketch below. Users landing on the HTML page would then be directed to the equivalent dynamic page when they click on a link (e.g. Buy or Specifications). This worked well for search engine placement, with many pages appearing in the top ten.
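For illustration, a minimal sitemap.xml along those lines; the URLs are hypothetical:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Static HTML snapshots generated for the crawler (hypothetical paths) -->
  <url>
    <loc>http://www.example.com/static/product-page.html</loc>
  </url>
  <url>
    <loc>http://www.example.com/static/specifications.html</loc>
  </url>
</urlset>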

The best solution for notifying search engines about an undiscoverable website's content is to create an HTML website (as you did). If you create redirects based on the crawler, search engines will not love you. I think you have to fill out your HTML with relevant content and add a
<link rel="canonical" href="https://gwtsite.com/exact_url"/>
tag to your website's head section. This tells search engines that the other page should appear in the SERPs instead of the HTML one.

Related

How to manipulate the meta area of the HTML dom with Scala-JS for a single page application

General Scala-JS page building advice needed. Most of the examples seem to follow the pattern where the main div into which your single page application will go sits between the body tags of a landing page HTML file. How do you handle the need to insert something in the meta area of the DOM? Do I need to render my landing page dynamically from the server to accomplish this? My specific need is to inject a script tag into the meta area of an already defined static HTML page. I'm using scalajs-react.
Generally you will want a server-rendered "root page" for the SPA. This allows you to dynamically compute proper cache-busting file names for your script and stylesheet tags, and to easily manage the cache expiration of the root page. Also, for proper HTML5 pushState support you'll want to serve that page at every URL, which is easily done with a server-side route.
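As a minimal sketch of that idea; the Express framework, asset names, and port are assumptions, since the answer doesn't prescribe a stack:
// server.js — serve the SPA root page at every URL (hypothetical Express setup)
var express = require('express');
var app = express();

// Fingerprinted asset names computed at build time (hypothetical values);
// changing the hash busts browser caches on each deploy.
var scriptUrl = '/assets/app.abc123.js';
var styleUrl = '/assets/app.abc123.css';

// Static assets can be cached aggressively because their names change per build.
app.use('/assets', express.static('public/assets', { maxAge: '365d' }));

// Every other URL returns the root page, so HTML5 pushState routes
// still resolve on a full page load or browser refresh.
app.get('*', function (req, res) {
  res.set('Cache-Control', 'no-cache'); // the root page itself should revalidate
  res.send('<!DOCTYPE html><html><head>' +
    '<link rel="stylesheet" href="' + styleUrl + '">' +
    '</head><body><div id="app"></div>' +
    '<script src="' + scriptUrl + '"></script>' +
    '</body></html>');
});

app.listen(8080);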

How can I render a completed CGI form as a PDF?

I have an HTML form which a user may have filled in or partially filled in. I want to snapshot that state and render it as a PDF document. I've been using wkhtmltopdf.
I've tried this from both the client side and the server side, and the rendered result is always the original form, never the filled-in one.
I notice if I reload the filled-in form page I get back the filled-in form, but if I cut and paste the form's URL into a new window, I get the initial, non-filled-in form.
So I've convinced myself that, if I could use CGI::Session properly, I could successfully open a session identical to the filled-in session. I tried using CGI::Session::Plugin::Redirect with no joy. I think the key is that window.open() has to use the SID of the filled-in form window.
I don't have a lot of experience with CGI session management, so this has been a four-day quest to nowhere. Any advice is appreciated, even if it's to abandon this approach and go back to the more common post->render a new form in a new window, and generate the PDF from that. I'd like to avoid all of that if I can.
Say you have the following HTML document on your web server:
/var/www/html/index.html
<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
  </head>
  <body>
    <form action="/process.cgi">
      <input type="text" name="foo">
    </form>
  </body>
</html>
When you navigate to http://hostname/index.html in your browser, the webserver returns this document and the browser displays it.
When you fill in the text field in your browser, the document on the webserver doesn't change. So anybody who navigates to http://hostname/index.html will get the original, unmodified form. This is why you can't simply copy and paste the URL into another browser tab and get the filled-in form.
Most browsers use caching by default. When you fill in some fields in a form, the browser caches what you entered. When you reload the page, the webserver sends the exact same document as before* (i.e. the unmodified form), but the browser uses the cached data to fill in the form fields the way you had them. If you override the cache when you reload the page (Ctrl+F5 in Firefox), the form fields will not be filled in. Note that neither the URL nor the document on the server have changed. This is why you can't copy and paste the URL into another browser tab after reloading the page and get the filled-in form.
wkhtmltopdf takes a URL, renders the corresponding page, and generates a PDF based on what is rendered. Based on the explanation above, it should now be clear why wkhtmltopdf always generates an image of the unmodified form.
The solution
If filling in form fields doesn't change anything on the webserver, what does it change? It changes the DOM, a structure describing the document in your browser that you can access using JavaScript.
One approach would be to use a client-side JavaScript PDF generator like jsPDF; since it runs on the client, it has access to the DOM that the user is interacting with, so it can "see" the values the user enters into the form fields.
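For illustration, a rough sketch of the jsPDF approach; the form markup, field id, and CDN build are assumptions, not part of the original question:
<!-- Hypothetical form plus a button that snapshots the typed value to PDF -->
<form id="myForm">
  <input type="text" name="foo" id="foo">
  <button type="button" onclick="makePdf()">Save as PDF</button>
</form>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jspdf/2.5.1/jspdf.umd.min.js"></script>
<script>
  function makePdf() {
    // The DOM value includes whatever the user typed into the field
    var doc = new jspdf.jsPDF();
    doc.text('foo: ' + document.getElementById('foo').value, 10, 10);
    doc.save('form.pdf'); // triggers a client-side download
  }
</script>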
* Actually, the webserver will typically send a 304 Not Modified response to save bandwidth, but form caching works the same either way.
The explanation from ThisSuitIsBlackNot is accurate about why your design is failing. Typing characters into form fields in a browser changes only your screen and the data in the memory allocated to the browser.
I suggest a different solution. The WWW::Mechanize::Firefox module is a variant of WWW::Mechanize that uses a real browser application to retrieve and render web pages. It is mostly chosen when a site requires JavaScript support, but it is useful here because it has a content_as_png method that returns a PNG image of the current page. Hopefully that is enough for you to build a PDF file with the required content.

Using Adobe Test-and-Target, how do I avoid seeing the first page before the redirect kicks in?

I've got an A/B Test set up in Adobe Test-and-Target. The idea is that 50% of the time, visitors to a certain page should be redirected to a different page instead. It is working correctly, in that half of users are sent to the new page.
However, sometimes the entire original page is loaded before the redirect happens. I put the mbox in the head tag of the page, which I thought would ensure the redirect happened before any HTML was displayed to the user, but that's not happening.
How can I create a seamless result for the user, where the redirected users only see the new page loading, and never see the original page?
For our site, the <script src='http://maur.imageg.net/js/mbox.js'></script> is at the very end of the head tag and works fine.
Your mbox.js should be as close to the top as possible, and your inline mbox should then be defined preferably right after the opening body tag, as sketched below. This way the request is made before the content starts to render, and the redirect kicks in before the guest even sees the page.
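A minimal sketch of that placement, assuming the legacy mbox.js API; the mbox name is hypothetical:
<html>
  <head>
    <!-- Load mbox.js as early as possible -->
    <script src="http://maur.imageg.net/js/mbox.js"></script>
  </head>
  <body>
    <!-- Define the inline mbox immediately after the opening body tag -->
    <div class="mboxDefault"></div>
    <script>mboxCreate('redirect-test-mbox');</script>
    ... rest of the page ...
  </body>
</html>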
Avoid using anything that runs post DOM ready, for example jQuery's:
$(document).ready(function () { ... });
If you paste the code you're using for the A/B test, we'll be able to review and respond accordingly.
However, pure JavaScript and pure CSS should execute seamlessly.
You can first use CSS so that nothing is shown:
<style>
body {display:none!important}
</style>
Then use JavaScript to redirect the page to the new page.
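A minimal sketch; in practice the branch decision comes from your Test-and-Target offer, so the random split and target URL here are stand-ins:
<script>
  // Stand-in for the Test-and-Target decision (hypothetical 50/50 split)
  if (Math.random() < 0.5) {
    // Redirect before anything is painted
    window.location.replace('http://www.example.com/new-page');
  } else {
    // Control group: reveal the original page once the DOM exists;
    // 'important' is needed to override the display:none!important rule
    document.addEventListener('DOMContentLoaded', function () {
      document.body.style.setProperty('display', 'block', 'important');
    });
  }
</script>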

og meta tags, social buttons and angularjs

I'm creating a website using multiple views.
The title tag and the meta tags of the page get changed through a $rootScope variable,
so I have something like
<html>
  <head>
    <title ng-bind="page_title"></title>
    <meta property="og:title" content="{{page_title}}">
  </head>
Whenever each view gets loaded on the website, the page_title variable changes, and the title and og:title tags get updated (everything works as expected).
The problem is that on some views I need to load a Facebook, a Google+ and a Twitter button.
I can display them properly, but if I click on each of them the page title appears as something like:
{{page_title}}
I've tried to delay the execution of the scripts of each button using setTimeout, but to no avail.
The scripts just read whatever is written in the markup; they don't evaluate the page_title binding.
Does anyone know a workaround to this?
Thank you
This can't be done using JavaScript. Some people think that Facebook reads what's currently on the page. It doesn't. It makes a separate request to your server for the same URL (from window.location.href) using its scraper, and the Facebook scraper does not run JavaScript. That's why you get {{page_title}} when clicking on something like a Facebook share button. Your content will have to be generated by the server, so that when Facebook hits the URL it gets the content it needs up front, without the need for JavaScript. You can tackle the server-side rendering in a few ways.
You can allow your server side technology to render the content.
You can use the PhantomJS approach https://github.com/steeve/angular-seo.
There's also a possibility that you can re-render Facebook widgets. Use their parse method:
FB.XFBML.parse();
after your Angular stuff has completed. It's not working for my share button (yet!!), but I tested it on likes, and it's cool. Basically it re-scans the DOM and renders the Facebook widgets. You can also pass it a single element, as in this directive:
'use strict';
angular.module('ngApp')
  .directive("fbLike", function () {
    return function (scope, iElement, iAttrs) {
      // Re-parse only after the last repeated element renders, and only
      // if the Facebook SDK has actually loaded
      if (window.FB && scope.$last) {
        FB.XFBML.parse(iElement[0]);
      }
    };
  });
This snippet rescans the DOM for HTML5 Facebook fb-like widgets when the last element in an Angular repeater is created.
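For illustration, a hypothetical usage inside a repeater; the markup and scope names are assumed:
<!-- The directive calls FB.XFBML.parse once the last repeated item renders -->
<div ng-repeat="item in items" fb-like>
  <div class="fb-like"
       data-href="http://www.example.com/{{item.slug}}"
       data-layout="button_count"></div>
</div>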

How to get "Facebook Like" to focus on items in the page not the page itself?

<iframe src='http://www.facebook.com/plugins/like.php?href=http%3A%2F%2Fwww.example.com&layout=standard&show_faces=false&width=100&action=like&colorscheme=light&height=35' scrolling='no' frameborder='0' style='border:none; overflow:hidden; width:100px; height:35px;' allowTransparency='true'></iframe>
or:
<script src='http://connect.facebook.net/en_US/all.js#xfbml=1'></script><fb:like href='http://www.example.com' show_faces='false' width='100'></fb:like>
I have media files that are loaded dynamically onto a page.
After the file is loaded, I want to put out a Facebook Like button for it so that the text that appears in the Facebook News Feed is specific to the media item.
At the moment it always describes the page itself.
In the developer docs it says to do this you need to set Open Graph info in the page HEAD section.
I don't see how that would work with the multiple media items I'm dealing with.
Create a new page for each media item (passing the item as a parameter to a script) so that FB indexes each individually:
<fb:like href='http://www.example.com/fblike.php?item=name_id_url_whatever' show_faces='false' width='100'></fb:like>
Where fblike.php redirects back to the main example.com page, or whatever page the media item in question is on. You'll probably need some sort of database to track it, unless everything links back to the main page. A sketch of what such a page could output follows.
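As a rough sketch, the per-item page could emit Open Graph tags for the scraper and bounce human visitors back to the hosting page; all names and URLs here are hypothetical:
<!-- Hypothetical output of fblike.php?item=123 -->
<!DOCTYPE html>
<html>
  <head>
    <!-- Per-item Open Graph data for the Facebook scraper -->
    <meta property="og:title" content="Media item 123">
    <meta property="og:description" content="Description of media item 123">
    <meta property="og:image" content="http://www.example.com/media/123.jpg">
    <!-- Human visitors are sent back to the page hosting the item -->
    <script>window.location.href = 'http://www.example.com/#item-123';</script>
  </head>
  <body></body>
</html>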