Get Google to index links from JavaScript-generated content - robots.txt

On my site I have a directory of things that is generated through jQuery Ajax calls, which then build the HTML.
To my knowledge, Google and other bots aren't aware of DOM changes made after the page loads, so they won't index the directory.
What I'd like to do is serve the search bots a dedicated page that contains only the links to the things.
Would adding a noscript tag to the directory page be a solution? (In the noscript section, I would link to a page that merely serves the links to the things.)
I've looked at both robots.txt and the meta tag, but neither seems to do what I want.

It looks like you stumbled on the answer to this yourself, but I'll post the answer to this question anyway for posterity:
Implement Google's AJAX crawling specification. If links to your page contain #! (a URL fragment starting with an exclamation point), Googlebot will send everything after the ! to the server in the special query string parameter _escaped_fragment_.
You then look for the _escaped_fragment_ parameter in your server code, and if present, return static HTML.
(I went into a little more detail in this answer.)
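As a rough illustration of the server-side check described above, here is a minimal Python sketch. The snapshot table, the app shell markup, and the `render` function are hypothetical names invented for this example; only the `_escaped_fragment_` parameter comes from the scheme itself. Note that Google has since deprecated this crawling scheme, so treat this as an illustration of the mechanism rather than a current recommendation.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical pre-rendered snapshots, keyed by the text that follows "#!".
SNAPSHOTS = {
    "directory": "<ul><li><a href='/things/1'>Thing 1</a></li></ul>",
}

# Hypothetical page that normal visitors get; JavaScript fills it in later.
APP_SHELL = "<div id='app'></div><script src='/app.js'></script>"


def render(url: str) -> str:
    """Return static HTML for crawler requests, the JavaScript shell otherwise."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    if "_escaped_fragment_" not in query:
        return APP_SHELL
    fragment = query["_escaped_fragment_"][0]
    return SNAPSHOTS.get(fragment, "<p>Not found</p>")
```

A crawler request such as `http://example.com/?_escaped_fragment_=directory` would get the static link list, while a plain request to `http://example.com/` would get the shell.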

How do I rewrite a URL to drop the file extension for a PDF on GitHub Pages?

Imagine my website is hosted on GitHub Pages and has a custom domain, website.com. I can access a PDF at website.com/mypdf.pdf
Is there a way to make it available at website.com/mypdf?
As mentioned in the comments, if you are using a static website hosted by a third party like GitHub Pages, you don't get much control over the HTTP server. I would tentatively say you cannot control URL rewrite rules on GitHub Pages.
What you could do instead is host a page with a bit of JavaScript that starts the download on a given event (button click, page load, etc.). That way you mask the actual download URL with this HTML page (which by convention is served without a file extension).
UPD: Sure enough, someone has already done this: http://lea.verou.me/2016/11/url-rewriting-with-github-pages/. The post is about having nice URLs, but I believe file downloads can be implemented similarly.
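One way to produce such a masking page is to generate it as part of building the site. The sketch below is only an illustration under assumptions: the function names are made up, and it assumes the host serves `mypdf/index.html` when `/mypdf` is requested, which is worth checking against your own deployment.

```python
import json
from html import escape
from pathlib import Path


def redirect_stub(target: str) -> str:
    """Build a tiny HTML page that forwards the browser to `target`."""
    safe = escape(target, quote=True)
    return (
        "<!DOCTYPE html>\n"
        "<html><head>\n"
        # Fallback for browsers with JavaScript disabled.
        f'<meta http-equiv="refresh" content="0; url={safe}">\n'
        # JavaScript redirect on page load; json.dumps yields a quoted JS string.
        f"<script>window.location.replace({json.dumps(target)});</script>\n"
        "</head><body>\n"
        f'<a href="{safe}">Download the PDF</a>\n'
        "</body></html>\n"
    )


def write_stub(site_root: str, name: str, target: str) -> Path:
    """Write <site_root>/<name>/index.html so that /<name> leads to `target`."""
    page = Path(site_root) / name / "index.html"
    page.parent.mkdir(parents=True, exist_ok=True)
    page.write_text(redirect_stub(target), encoding="utf-8")
    return page
```

Calling `write_stub(".", "mypdf", "/mypdf.pdf")` before publishing would create the stub page next to the PDF.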
Yes, you should build your website with an MVC structure. Create a controller and load the PDF file in its Index action.
The PDF is then served whenever that action is called, at a URL such as:
Students/AllResult

AMP errors in Webmaster Tools

I have implemented AMP for my web pages and Google has started indexing them, which I learned from Webmaster Tools. I am seeing some issues that appear and then disappear within a short span of time.
The issue logged is:
User authored JavaScript found on page
The pages don't contain any script tags except the schema markup.
This error shows for only a few of the 120 pages, even though they all use the same template. Below is the image link:
I have some more questions:
I have observed that some AMP URLs get redirected to the original page when the same AMP URL is opened in a web browser.
Does Google take care of this, or is it up to us to do the redirection?
I am planning to add sign-in and share buttons to my web pages, which will use JavaScript. But if I do so, I get a validation error. What is the right approach?
Can anyone please help me with this?
Please ensure that all script tags are of type application/ld+json. There should be no executable code in these script tags.
You must handle redirection on your end. Google doesn't do any sort of redirection from AMP to non-AMP pages if the URL is hit directly. In fact, the URL scheme that Google uses in its carousel is entirely its own and just includes the path to your page inside it, e.g. https://cdn.ampproject.org/v/www.yoursitehere.com/path/to/article.html
Social sharing using JavaScript inserted in the page is not allowed, as no JavaScript is allowed. If you want social sharing, use a non-JavaScript implementation, or try out the amp-social-share component.
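To track down which pages trip the "User authored JavaScript" error, a small audit script can list the script tags that look suspicious. This is a sketch, not the AMP validator: it assumes that the only acceptable script tags are JSON-LD blocks and scripts loaded from the AMP CDN, and the class and function names are invented for the example.

```python
from html.parser import HTMLParser

# Assumption: scripts served from the AMP CDN (runtime and components) are fine.
AMP_SCRIPT_HOST = "https://cdn.ampproject.org/"


class ScriptAudit(HTMLParser):
    """Collect <script> tags that are neither JSON-LD nor loaded from the AMP CDN."""

    def __init__(self) -> None:
        super().__init__()
        self.offending: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        attributes = dict(attrs)
        script_type = (attributes.get("type") or "").lower()
        src = attributes.get("src") or ""
        if script_type == "application/ld+json":
            return
        if src.startswith(AMP_SCRIPT_HOST):
            return
        # Record the raw start tag so it is easy to find in the template.
        self.offending.append(self.get_starttag_text())


def find_offending_scripts(html: str) -> list[str]:
    """Return the start tags of scripts that would likely fail validation."""
    parser = ScriptAudit()
    parser.feed(html)
    parser.close()
    return parser.offending
```

Running `find_offending_scripts` over the rendered HTML of each of the 120 pages should show whether a plugin or widget is injecting a script into only some of them.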
Thanks for the response. Regarding the points I asked about:
"Please ensure that all script tags are of type application/ld+json. There should be no executable code in these script tags." - I am not using any scripts at the moment except the AMP ones.
"Redirection is something that you must be doing on your end. Google doesn't do any sort of redirection from AMP to non-amp pages if the URL is hit directly. In fact that URL schema that Google uses in their carousel is entirely their own, and just includes the path to your page inside it. E.g. https://cdn.ampproject.org/v/www.yoursitehere.com/path/to/article.html" - Understood.
"Social sharing using Javascript inserted in the page is not allowed, as no Javascript is allowed. If you want to use social sharing, use a non-javascript implementation, or try out the amp-social-share" - I implemented amp-social-share and it's working fine.
Can we implement AMP for eCommerce sites, which may include a lot of JavaScript, forms, and plugins? To my knowledge, AMP wants to keep things simple and therefore restricts JavaScript as much as possible, and even the form tag is not valid. So is there any chance we can implement AMP on eCommerce sites?

The title, link and description don't work

I've been reading guides and examples for hours, but I can't get this to work. I tried all the HTML meta tags such as title, description, and the og: properties. I also tried the link sharer, and I created a new blank page containing just the info I want to share on Facebook in order to test. I also tried generating a random URL in PHP so that the URL variable is always different (both the URL to share and the URL of the main page containing the script). I also ran the URL through the URL linter many times to clear Facebook's cache. It always gives me the site's domain as the title, or the URL itself as the shared title and description. I don't know what to do.
The main website runs on Joomla. In Joomla's index code I added a PHP include that runs if the URL has the "articolo" id variable. The included PHP page has the regular head, body, etc. So maybe Facebook checks Joomla's main meta tags first? I have now tried opening a popup with just the page for sharing. Look here: link
It's possible that the title is locked in, meaning that after X number of likes Facebook doesn't allow you to change it anymore. Can you give us an example URL you're having issues with?
EDIT
Ok, now the link you provided shows some very interesting output. http://modernolatina.it/wjs/index.php?option=com_content&view=article&id=96&Itemid=258&autore=6&articolo=6
First, your web server is sending back a 500 code instead of a 200.
Second, the HTML your web server sends back has two html tags (do a View Source on the returned content).
Fix those two issues and I think the linter will be happier with your page.
Test your page here:
http://developers.facebook.com/tools/debug
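The two problems above can also be checked locally before going back to the debugger. The following is a minimal sketch; `fetch` and `diagnose` are names made up for this example, and the regular expression is a crude count of opening html tags rather than a real HTML validator.

```python
import re
from urllib.error import HTTPError
from urllib.request import urlopen

# Matches an opening <html> tag (with or without attributes), not </html>.
HTML_OPEN_TAG = re.compile(r"<html[\s>]", re.IGNORECASE)


def fetch(url: str) -> tuple[int, str]:
    """Return (status code, body), including for error responses such as 500."""
    try:
        with urlopen(url) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except HTTPError as error:
        return error.code, error.read().decode("utf-8", "replace")


def diagnose(status: int, body: str) -> list[str]:
    """List the problems that would confuse a scraper reading this page."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    count = len(HTML_OPEN_TAG.findall(body))
    if count != 1:
        problems.append(f"expected one <html> tag, found {count}")
    return problems
```

Calling `diagnose(*fetch(url))` on the shared URL should return an empty list once both issues are fixed.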

Two sites, same content: how to redirect?

I need to maintain two sites (let's call them example.com and yyy.com) that act as aliases of each other.
I want visitors to be able to access pages with the same content through either of them.
What's the best way of doing this without getting in trouble with search engines?
I know about the 301 redirect, but I want visitors to stay on example.com or yyy.com, with the same name showing in the address bar, rather than being redirected.
One thing you could do is to use the rel=canonical tag on the pages of the site you consider to be "the copy".
Basically, in the head section of each page's HTML you can tell which page on the "original" site has the same content.
So if (for instance) your sites are called www.yourmainsite.com and www.yoursecondsite.com, you should tag testpage.htm on yoursecondsite.com like this:
<link rel="canonical" href="http://www.yourmainsite.com/testpage.htm"/>
See here for more details.
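If the pages on the second site are generated from templates, the tag can be produced per page rather than written by hand. A minimal sketch, assuming the same paths exist on both sites; `canonical_link` and `MAIN_SITE` are illustrative names, not part of any framework.

```python
from html import escape
from urllib.parse import urljoin

# The site treated as "the original" in the example above.
MAIN_SITE = "http://www.yourmainsite.com/"


def canonical_link(path: str, main_site: str = MAIN_SITE) -> str:
    """Build the canonical link tag for a page path on the copy site."""
    href = urljoin(main_site, path.lstrip("/"))
    return f'<link rel="canonical" href="{escape(href, quote=True)}"/>'
```

For example, `canonical_link("/testpage.htm")` yields the tag shown above for testpage.htm.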
Otherwise, you can simply tell search engines not to crawl yoursecondsite.com by putting this in that site's robots.txt:
User-agent: *
Disallow: /
Warning: I'm not an SEO person. I did have to implement something similar, but take my advice with a grain of salt
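To confirm that those two robots.txt lines block what you expect, Python's standard-library robots parser can evaluate them offline. This is a sketch; `is_crawlable` is a helper name invented here, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rules from the answer above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Report whether `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With the rules above, `is_crawlable(ROBOTS_TXT, "http://www.yoursecondsite.com/testpage.htm")` is false for every user agent, since no more specific section overrides the wildcard.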
From a theoretical point of view, the "Content-Location" HTTP header was invented for this, as defined here and explained here.
However, search engines prefer the "canonical" link tag (as in Paolo's explanation) for the same purpose, because the "Content-Location" header is often misused by web designers.
I would probably use both.

How can I pull my BlogSpot page into a page on my web site?

I have a blog on BlogSpot.com, and I have a domain based on my own name. I want to have a URL on my site (like http://www.mydomain.com/blog) that will then pull in the content from my blog page, but I want the URL in the address bar to stay on http://www.mydomain.com/blog, so that it does not look like you left my site.
(I have a Windows hosting account on 1and1.com)
I did Google this question, and I found a few things, like:
1: Adding a meta "refresh" tag. I tried this, but it changes the address bar.
<meta http-equiv="refresh" content="0; URL=http://myblog.blogspot.com" />
2: I also learned about the html iframe thing, but it has height and scrollbar issues.
3: Then, I found this partial code snippet, but I don't know what to do with it, or if it will even work against the BlogSpot server, or on my server:
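<%
' Fetch the blog's HTML on the server and write it into the response unchanged.
Dim objHTTP
' MSXML2.ServerXMLHTTP is the component intended for server-side requests.
Set objHTTP = Server.CreateObject("MSXML2.ServerXMLHTTP")
objHTTP.Open "GET", "http://myblog.blogspot.com", False
objHTTP.Send
Response.Write objHTTP.responseText
Set objHTTP = Nothing
%>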
I am a client app guy, so this web stuff is all new to me.
Any help will be greatly appreciated.
The third option will probably work for the initial page load, but any links on the page will then send the user to the BlogSpot page and change the URL. It simply fetches the page from BlogSpot and sends it to the user without any changes.
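To make the link problem concrete, here is a rough Python sketch of what a fetch-and-forward page would additionally have to do: rewrite links that point at the blog so they stay under your own path. The function name and the `/blog` prefix are assumptions for illustration, the string replacement is deliberately crude, and your server would still need to handle requests for the rewritten paths.

```python
def rewrite_links(html: str, blog_origin: str, local_prefix: str) -> str:
    """Point href attributes that target the blog at a local path instead."""
    origin = blog_origin.rstrip("/")
    prefix = local_prefix.rstrip("/")
    # Handle both double-quoted and single-quoted href attributes.
    for quote in ('"', "'"):
        html = html.replace(f"href={quote}{origin}/", f"href={quote}{prefix}/")
    return html
```

For instance, a link to `http://myblog.blogspot.com/2010/01/post.html` would become `/blog/2010/01/post.html`, while links to other sites are left alone.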
For me, the changing URL is not a big deal, as long as it's easy for the user to get from one site to the other; put prominent links on both pages that tell users where they lead. Most people don't care about the URL, they just care about the content.
Using an IFrame is probably your best bet. Many Facebook applications are in IFrames and still integrate very well.
I think using a regular frame or an iFrame is probably the easiest solution. What kind of scrollbar issues did you encounter? You can set custom values for some of these attributes, just check out the documentation here:
http://www.w3schools.com/TAGS/tag_iframe.asp
If you didn't want to use frames, you could proxy the entire page using a server-side application like Squid. However, this is more difficult to set up, requires the ability to install software and configure firewall/iptables settings on your host, and must be configured properly to prevent malicious abuse.
-Mark
Here are some options you can try:
If you have PHP installed:
<?php
echo file_get_contents('http://myblog.blogspot.com'); // or you can use fopen()
?>
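Or, if Server-Side Includes are enabled (note that some servers, such as Apache, only accept a local path for virtual, so this may not work with a remote URL):
<!--#include virtual="http://myblog.blogspot.com" -->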
You can also pull blog content from Blogspot using the Blogger Data API.
The advantage of this is that you can reformat and reorganize the content to match the style of your website. The disadvantage is that it's more work than an iframe, and you probably won't match the full functionality of Blogspot.
I'm playing with this now to see whether I can use Blogspot as a type of CMS for a club news system.
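As a small illustration of the reformatting idea, the sketch below pulls post titles and links out of an Atom feed so they can be rendered in your own layout. It assumes the blog exposes an Atom feed (commonly at a URL like `http://myblog.blogspot.com/feeds/posts/default`, which you should verify for your blog); `list_posts` and the sample feed are made up for the example, and the feed would need to be downloaded separately.

```python
import xml.etree.ElementTree as ET

# XML namespace used by Atom feeds.
ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical, heavily trimmed feed used only to demonstrate the parsing.
SAMPLE_FEED = """\
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>My Blog</title>
  <entry>
    <title>Club news</title>
    <link rel="alternate" href="http://myblog.blogspot.com/2010/01/club-news.html"/>
  </entry>
</feed>
"""


def list_posts(feed_xml: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs for every entry in an Atom feed."""
    root = ET.fromstring(feed_xml)
    posts = []
    for entry in root.findall(f"{ATOM}entry"):
        title = entry.findtext(f"{ATOM}title", default="")
        link = ""
        # An entry can carry several links; the "alternate" one is the web page.
        for candidate in entry.findall(f"{ATOM}link"):
            if candidate.get("rel") == "alternate":
                link = candidate.get("href", "")
                break
        posts.append((title, link))
    return posts
```

The resulting list can then be fed into whatever template your site already uses, which is where the extra work compared with an iframe comes in.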