Can a rel=canonical meta tag point to a 301 redirect URL?

I'm working on a project which has versioned URLs. I want the rel=canonical meta tag to always point to the latest version, and this can always be reached via a 301 redirect.
Here are the URLs:
/example 301 redirect to /example/3
/example/1 <link rel="canonical" href="/example" />
/example/2 <link rel="canonical" href="/example" />
/example/3 <link rel="canonical" href="/example" />
Will this setup work? Which URL will a service like Google choose to index, or will it get caught in a "redirect loop" going between /example and /example/3?

So this website is now being indexed by Google. Here's how Google reacts to this scenario:
Results are shown for both /example and /example/{latestVersion} where {latestVersion} is the result of the 301 redirect (most recent version)
The /example link appears first in the results
Often the links to /example/{latestVersion} are only revealed if the user clicks on the following text:
In order to show you the most relevant results, we have omitted some entries very similar to the 13 already displayed.
If you like, you can repeat the search with the omitted results included.
I am changing the code so that the canonical URL is always /example/{latestVersion}, to ensure duplicate content is not indexed.
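The behaviour above can be modelled in a few lines. This is only a sketch of the URL scheme from the question, not a description of how Google resolves URLs: `REDIRECTS`, `follow_redirects` and `canonical_for` are hypothetical names made up for the example. It shows that a canonical pointing at /example ends at /example/3 after one hop (so there is no loop), and that pointing the canonical straight at the latest version removes that hop.

```python
# Hypothetical model of the URL scheme in the question; not a crawler.
REDIRECTS = {"/example": "/example/3"}  # 301: bare URL -> latest version


def follow_redirects(url, redirects, max_hops=10):
    """Follow 301s until a URL that does not redirect; raise on a loop."""
    seen = [url]
    while url in redirects:
        url = redirects[url]
        if url in seen or len(seen) > max_hops:
            raise ValueError("redirect loop: " + " -> ".join(seen + [url]))
        seen.append(url)
    return url


def canonical_for(path, latest_version, point_at_latest=True):
    """Return the canonical href for any versioned page."""
    base = path.rsplit("/", 1)[0]  # "/example/2" -> "/example"
    return f"{base}/{latest_version}" if point_at_latest else base


old_style = canonical_for("/example/2", 3, point_at_latest=False)
new_style = canonical_for("/example/2", 3)
print(old_style, "->", follow_redirects(old_style, REDIRECTS))
print(new_style, "->", follow_redirects(new_style, REDIRECTS))
```

Both styles end up at the same final URL; the difference is only whether a crawler has to follow a redirect to get there.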

Related

Google Search Console: How to remove url which is under the AMP

We mistakenly added rel="amphtml" to some of our pages, so those URLs now appear in the AMP section of Google Search Console. I don't want those URLs treated as AMP, so I have removed rel="amphtml" from all of them, but they still show up in the AMP page list.
How can I remove them from the AMP list?
Any ideas are welcome.
Remove the AMP link tags from the pages; Google will take some time to drop the listings:
<link rel="amphtml" href="https://www.example.com/url/to/amp/document.html">
<link rel="canonical" href="https://www.example.com/url/to/full/document.html">

What is the best approach for redirection of old pages in Jekyll and GitHub Pages?

I have a blog on GitHub Pages, built with Jekyll.
What is the best way to handle a migration to a new URL scheme?
The common best practice I found is to create an .htaccess file like this:
Redirect 301 /programovani/2010/04/git-co-to-je-a-co-s-tim/ /2010/04/05/git-co-to-je-a-co-s-tim.html
But this does not seem to work on GitHub. Another solution I found is a rake task that generates redirect pages. But since those are plain HTML, they cannot send a 301 header, so search engine crawlers will not recognize them as redirects.
The best solution is to use both <meta http-equiv="refresh" and <link rel="canonical" href=
It works very well, Google Bot reindexed my entire website under new links without losing positions. Also the users are redirected to the new posts right away.
<meta http-equiv="refresh" content="0; url=http://konradpodgorski.com/blog/2013/10/21/how-i-migrated-my-blog-from-wordpress-to-octopress/">
<link rel="canonical" href="http://konradpodgorski.com/blog/2013/10/21/how-i-migrated-my-blog-from-wordpress-to-octopress/" />
Using <meta http-equiv="refresh" redirects each visitor to the new post.
As for Googlebot, it treats <link rel="canonical" href= much like a 301 redirect, so your pages get reindexed under the new URLs, which is what you want.
I described the whole process of moving my blog from WordPress to Octopress here:
http://konradpodgorski.com/blog/2013/10/21/how-i-migrated-my-blog-from-wordpress-to-octopress/#redirect-301-on-github-pages
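For anyone who wants to generate such stub pages with a script instead of writing them by hand, here is a minimal sketch. The function name `redirect_stub` and the example.com URL are my own placeholders, not part of the answer above; the page it produces simply combines the meta refresh and the canonical link described there.

```python
from html import escape


def redirect_stub(new_url):
    """Return a static HTML page that forwards visitors and crawlers to new_url."""
    href = escape(new_url, quote=True)  # keep "&" and quotes safe inside attributes
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        f'<meta http-equiv="refresh" content="0; url={href}">\n'
        f'<link rel="canonical" href="{href}">\n'
        "</head>\n<body>\n"
        f'<a href="{href}">This page has moved.</a>\n'
        "</body>\n</html>\n"
    )


page = redirect_stub("http://example.com/new/post/")
print(page)
```

Write the returned string to the old path (for example old/path/index.html) so the static host serves it at the old URL.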
Have you tried the Jekyll Alias Generator plugin?
You put the alias urls in the YAML front matter of a post:
---
layout: post
title: "My Post With Aliases"
alias: [/first-alias/index.html, /second-alias/index.html]
---
When a user visits one of the alias urls, they are redirected to the main url via a meta tag refresh:
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<meta http-equiv="refresh" content="0;url=/blog/my-post-with-aliases/" />
</head>
</html>
See also this blog post on the subject.
redirect-from plugin
https://github.com/jekyll/jekyll-redirect-from#redirect-to
It is supported by GitHub and makes it easy:
_config.yml
gems:
- jekyll-redirect-from
a.md
---
permalink: /a
redirect_to: 'http://example.com'
---
as explained at: https://help.github.com/articles/redirects-on-github-pages/
Now:
firefox localhost:4000/a
will redirect you to example.com.
The plugin takes over whenever the redirect_to is defined by the page.
Tested on GitHub pages v64.
Note: this version has a serious recently fixed bug which wrongly reuses the default layout for the redirect: https://github.com/jekyll/jekyll-redirect-from/pull/106
Manual layout method
If you don't feel like using https://github.com/jekyll/jekyll-redirect-from it's easy to implement it yourself:
a.md
---
layout: 'redirect'
permalink: /a
redir_to: 'http://example.com'
sitemap: false
---
_layouts/redirect.html based on Redirect from an HTML page :
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Redirecting...</title>
{% comment %}
Don't use 'redirect_to' to avoid conflict
with the page redirection plugin: if that is defined
it takes over.
{% endcomment %}
<link rel="canonical" href="{{ page.redir_to }}"/>
<meta http-equiv="refresh" content="0;url={{ page.redir_to }}" />
</head>
<body>
<h1>Redirecting...</h1>
<a href="{{ page.redir_to }}">Click here if you are not redirected.</a>
<script>location='{{ page.redir_to }}'</script>
</body>
</html>
Like this example, the redirect-from plugin does not generate 301s, only meta + JavaScript redirects.
We can verify what is going on with:
curl localhost:4000/a
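If you would rather check the response body programmatically than read the curl output by eye, something like the following could work. It is a sketch using only the standard library; `RedirectFinder` and `find_redirect` are names I made up, and `sample` is a hand-written stand-in modelled on the layout above, not real output from the server.

```python
from html.parser import HTMLParser


class RedirectFinder(HTMLParser):
    """Collect the meta-refresh target and the canonical href from a page."""

    def __init__(self):
        super().__init__()
        self.refresh_url = None
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("http-equiv") or "").lower() == "refresh":
            # content looks like "0;url=http://example.com"
            _, _, rest = (attrs.get("content") or "").partition(";")
            key, _, value = rest.strip().partition("=")
            if key.strip().lower() == "url":
                self.refresh_url = value.strip()
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")


def find_redirect(html_text):
    """Return (meta refresh URL, canonical URL); either may be None."""
    finder = RedirectFinder()
    finder.feed(html_text)
    return finder.refresh_url, finder.canonical


sample = """<!DOCTYPE html><html><head>
<link rel="canonical" href="http://example.com"/>
<meta http-equiv="refresh" content="0;url=http://example.com" />
</head><body></body></html>"""
print(find_redirect(sample))
```

A page that is redirected this way still comes back as a normal HTML document, which is why the redirect has to be dug out of the markup rather than read from a Location header.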
This solution allows you to use true HTTP redirects via .htaccess. However, nothing involving .htaccess will work on GitHub Pages because they do not use Apache.
As of May 2014 GitHub Pages supports redirects, but according to the jekyll-redirect-from Gem documentation they are still based on HTTP-REFRESH (using <meta> tags), which requires a full page load before redirection can occur.
I don't like the <meta> approach so I whipped up a solution for anyone looking to provide real HTTP 301 redirects within an .htaccess file using Apache, which serves a pre-generated Jekyll site:
First, add .htaccess to the include property in _config.yml
include: [.htaccess]
Next, create an .htaccess file and be sure to include YAML front matter. Those dashes are important because now Jekyll will parse the file with Liquid, Jekyll's templating language:
---
---
DirectoryIndex index.html
RewriteEngine On
RewriteBase /
...
Make sure your posts that require redirects have two properties like so:
---
permalink: /my-new-path/
original: blog/my/old/path.php
---
Now in .htaccess, just add a loop:
{% for post in site.categories.post %}
RewriteRule ^{{ post.original }} {{ post.permalink }} [R=301,L]
{% endfor %}
This regenerates .htaccess every time you build the site, and the include entry in your config file ensures that .htaccess makes it into the _site directory. For the post above, the generated rule is:
RewriteRule ^blog/my/old/path.php /my-new-path/ [R=301,L]
From there, it's up to you to serve _site using Apache. I normally clone the full Jekyll repo into a non-webroot directory, then my vhost is a symlink to the _site folder:
ln -s /path/to/my-blog/_site /var/www/vhosts/my-blog.com
Tada! Now Apache can serve the _site folder from your virtual root, complete with .htaccess-powered redirects that use whichever HTTP response code you desire!
You could even get super fancy and use a redirect property within each post's front matter to designate which redirect code to use in your .htaccess loop.
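If you ever need to produce the same rules outside of Jekyll, the Liquid loop translates to a few lines of Python. This is a sketch, not the answer's code: `rewrite_rules` is a made-up name, the posts are plain dicts standing in for front matter, and unlike the Liquid version it escapes regex metacharacters and anchors the end of the pattern. The optional `redirect` key is the per-post status code idea mentioned above.

```python
import re


def rewrite_rules(posts, status=301):
    """Build one Apache RewriteRule line per post that has an old path."""
    lines = []
    for post in posts:
        old = post.get("original")
        if not old:
            continue  # this post never lived at another URL
        code = post.get("redirect", status)
        # Escape regex metacharacters (the "." in ".php") and anchor both ends.
        lines.append(
            f"RewriteRule ^{re.escape(old)}$ {post['permalink']} [R={code},L]"
        )
    return lines


posts = [
    {"permalink": "/my-new-path/", "original": "blog/my/old/path.php"},
    {"permalink": "/no-redirect/"},
]
print("\n".join(rewrite_rules(posts)))
```

Escaping matters because an unescaped "." in a RewriteRule pattern matches any character, so the rule would also fire for paths you did not intend.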
The best option is to avoid url changes altogether by setting the permalink format in _config.yml to match your old blog.
Beyond that, the most complete solution is generating redirect pages, but isn't necessarily worth the effort. I ended up simply making my 404 page a bit friendlier, with javascript to guess the correct new url. It doesn't do anything for search, but actual users can get to the page they were looking for and there's no legacy stuff to support in the rest of the code.
http://tqcblog.com/2012/11/14/custom-404-page-for-a-github-pages-jekyll-blog/
Since github doesn't allow 301 redirects (which isn't surprising), you'll have to make a decision between moving to your new URL structure (and taking a search engine hit) or leaving the URLs the way they are. I suggest you go ahead and make the move. Let the search engine chips fall where they may. If someone hits one of your old links via the search engine, they'll be redirected to the new location. Over time, the search engines will pick up your changes.
Something you can do to help matters is to create a Sitemap where you only list your new pages and not the old ones. This should speed up the replacement of old URLs with the new ones. Additionally, if all your old URLs are in your '/programovani' directory, you can also use a robots.txt file to tell future crawls they should ignore that directory. For example:
User-agent: *
Disallow: /programovani/
It will take a little while for the search engines to catch up with the changes. This isn't really a big deal. As long as the old URLs still exist and redirect actual people to the active pages, you'll be fine.
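A sitemap that lists only the new pages can be produced with the standard library. This is a rough sketch: `build_sitemap` is a hypothetical helper and the example.com domain is a placeholder, since the real domain is not given here.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (as a string) listing only the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Only the new-style URL goes in; the old /programovani/ path is left out.
new_urls = ["http://example.com/2010/04/05/git-co-to-je-a-co-s-tim.html"]
sitemap_xml = build_sitemap(new_urls)
print(sitemap_xml)
```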
As others have mentioned, the best solution is to preserve working URLs or duplicate the pages and specify a canonical URL.
Since github pages doesn't support true redirects, I chose to set up rerouter on Heroku to return 301 (permanent) redirects from my site's old domain to the new one. I described the details here:
http://joey.aghion.com/simple-301-redirects/
Jekyll has gone through some major updates in the past few months, so maybe this wasn't possible when this question was originally posted...
Jekyll supports a permalink attribute in the YAML front-matter section of your blog posts. You can specify the URL that you would like your post to have and Jekyll will use that (instead of the filename) when generating your site.
---
title: My special blog post
permalink: /programovani/2010/04/git-co-to-je-a-co-s-tim
---
My blog post markdown content

301/302 with document body showing 'click here if your browser doesn't redirect you' anchor

We will be implementing a tiny document body with all our 301 and 302 responses.
They will contain a small bit of html with an anchor pointing towards the URL where the user should be redirected.
Are there any pitfalls or things we should know about when doing this or is it as simple as including the html in the document body when sending out a 'location' header?
If the browser sees a 301/302 HTTP status code, it will IGNORE the response body and redirect immediately to the URL given in the Location: response header.
You can instead display such a page and then redirect to the new URL, but that is equivalent to a normal click on a link (not a 301/302 redirect in any sense) and is therefore not good for SEO. If you are interested, this is how it can be done:
When the user hits such a page, show your redirect message. From that page the redirect can be achieved in 2 ways:
Using JavaScript: window.location = "http://www.example.com/new-url". All you need to do is execute this code 10 seconds after the page has loaded, using setTimeout().
Without JavaScript (the preferred method, as it works even if JavaScript is disabled or unavailable), using a <meta http-equiv="refresh" tag:
<meta http-equiv="refresh" content="10; url=http://www.example.com/new-url">
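For completeness, the setup the question describes (a real 301 with a Location header plus a tiny fallback body) could look like this with Python's built-in http.server. It is a sketch under assumptions: `TARGET`, `redirect_body` and `RedirectHandler` are names invented for the example, and the URL is the placeholder used above.

```python
from html import escape
from http.server import BaseHTTPRequestHandler

TARGET = "http://www.example.com/new-url"  # placeholder URL


def redirect_body(target):
    """Small HTML fallback for clients that do not follow the Location header."""
    href = escape(target, quote=True)
    text = "Click here if your browser does not redirect you."
    return f'<p>Moved. <a href="{href}">{text}</a></p>'.encode("utf-8")


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = redirect_body(TARGET)
        self.send_response(301)  # use 302 for a temporary move
        self.send_header("Location", TARGET)  # this is what browsers act on
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)  # only seen by clients that ignore Location
```

To try it, run `http.server.HTTPServer(("", 8000), RedirectHandler).serve_forever()` and request it with `curl -i localhost:8000/`, which prints the headers and the body without following the redirect.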

301 Redirects using Typepad

Does anyone know if it is possible to do redirects using Typepad?
Not domain mapping but pure honest to goodness 301 redirects. There seems to be some mention of a .htaccess file put in the root of the site working but it has not worked for me.
In your header, you can add a meta Refresh tag with time set to zero, pointing to your new blog.
Example:
<meta http-equiv="refresh" content="0;url=http://example.com/" />
If Typepad allows, insert this inside your <head> section. And while you're at it, add a rel=canonical tag pointing to your new domain, too.

How to prevent favicon.ico requests?

I don't have a favicon.ico, but my browser always makes a request for it.
Is it possible to prevent the browser from making a request for the favicon from my site? Maybe some META-TAG in the HTML header?
I will first say that having a favicon in a Web page is a good thing (normally).
However it is not always desired and sometime developers need a way to avoid the extra payload. For example an IFRAME would request a favicon without showing it.
Worse yet, in Chrome and Android an IFRAME will generate 3 requests for favicons:
"GET /favicon.ico HTTP/1.1" 404 183
"GET /apple-touch-icon-precomposed.png HTTP/1.1" 404 197
"GET /apple-touch-icon.png HTTP/1.1" 404 189
The following uses data URI and can be used to avoid fake favicon requests:
<link rel="shortcut icon" href="data:image/x-icon;," type="image/x-icon">
For references see here:
https://github.com/h5bp/html5-boilerplate/issues/1103
https://twitter.com/diegoperini/status/4882543836930048
UPDATE 1:
From the comments (jpic) it looks like Firefox >= 25 doesn't like the above syntax anymore. I tested on Firefox 27 and it doesn't work, while it still works on WebKit/Chrome.
So here is the new one that should cover all recent browsers. I tested Safari, Chrome and Firefox:
<link rel="icon" href="data:;base64,=">
I left out the "shortcut" name from the "rel" attribute value since that's only for older IE, and versions of IE < 8 don't like data URIs either. Not tested on IE8.
UPDATE 2:
If you need your document to validate against HTML5 use this instead:
<link rel="icon" href="data:;base64,iVBORw0KGgo=">
Just add the following line to the <head> section of your HTML file:
<link rel="icon" href="data:,">
Features of this solution:
100% valid HTML5
very short
does not incur any quirks from IE 8 and older
does not make the browser interpret the current HTML code as favicon (which would be the case with href="#")
You can use the following HTML in your <head> element:
<link rel="shortcut icon" href="#" />
I tested this on a forced full refresh, and no favicon requests were seen in Fiddler. (tested against IE8 in compat mode as IE7 standards, and FF 3.6)
Note: this may download the html file twice, so while it works in hiding the error, it comes with a cost.
You can't. All you can do is make that image as small as possible and set caching headers (Expires, Cache-Control) far in the future. Here's what Yahoo! has to say about favicon.ico requests.
If you use nginx:
# skip favicon.ico
#
location = /favicon.ico {
    access_log off;
    return 204;
}
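If there is no nginx in front of the app, the same idea (answer /favicon.ico with an empty 204) can be done in the application itself. Below is a minimal WSGI sketch; `no_favicon` and the `hello` stand-in app are hypothetical names for the example, not part of any framework.

```python
def no_favicon(app):
    """WSGI middleware: answer /favicon.ico with an empty 204, pass the rest on."""
    def wrapper(environ, start_response):
        if environ.get("PATH_INFO") == "/favicon.ico":
            start_response("204 No Content", [])
            return []
        return app(environ, start_response)
    return wrapper


def hello(environ, start_response):
    # Stand-in application used only for this example.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


application = no_favicon(hello)
```

Note that this only makes the request cheap and keeps 404s out of the logs; the browser still sends the request.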
Put this into your HTML head:
<link rel="icon" href="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAIAAACQd1PeAAAADElEQVQI12P4//8/AAX+Av7czFnnAAAAAElFTkSuQmCC">
This is a bit larger than the other answers, but does contain an actually valid PNG image (1x1 pixel white).
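If you want a different colour or just want to see where such a string comes from, a valid 1x1 PNG can be assembled by hand and base64-encoded. This is a sketch with made-up helper names (`png_chunk`, `one_pixel_png`, `favicon_link`); the bytes it produces will not be identical to the string above, since the compressed data can differ, but the result is likewise a valid PNG.

```python
import base64
import struct
import zlib


def png_chunk(kind, data):
    """Length + type + data + CRC32 over type and data, as PNG requires."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def one_pixel_png(rgb=(255, 255, 255)):
    """Return the bytes of a valid 1x1 truecolour PNG of the given colour."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0)  # 1x1, 8-bit, RGB
    scanline = b"\x00" + bytes(rgb)  # filter type 0 followed by one pixel
    return (
        b"\x89PNG\r\n\x1a\n"
        + png_chunk(b"IHDR", ihdr)
        + png_chunk(b"IDAT", zlib.compress(scanline))
        + png_chunk(b"IEND", b"")
    )


def favicon_link(png_bytes):
    """Wrap PNG bytes in a <link rel="icon"> tag with a data URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f'<link rel="icon" href="data:image/png;base64,{encoded}">'


print(favicon_link(one_pixel_png()))
```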
The easiest way to block these requests temporarily for testing is to open Chrome DevTools (right-click anywhere on the page and click Inspect, or press Ctrl+Shift+J), go to the Network tab, and reload the page. That lists every request the page makes, including that annoying favicon.ico. You can then right-click the favicon.ico request and click "Block request URL".
All of the above answers are for devs who control the app source code. If you are a sysadmin who is figuring out load-balancer or proxy configuration and is annoyed by these favicon.ico shenanigans, this simple trick does a better job. This answer is for Chrome, but there should be a similar option in Firefox/Opera/Tor/any other browser :)
You can use .htaccess or server directives to deny access to favicon.ico, but the server will send an access denied reply to the browser and this still slows page access.
You can stop the browser requesting favicon.ico when a user returns to your site, by getting it to stay in the browser cache.
First, provide a small favicon.ico image; it could be blank, but make it as small as possible. I made a black and white one under 200 bytes. Then, using .htaccess or server directives, set the file's Expires header a month or two in the future. When the same user comes back to your site, the icon will be loaded from the browser cache and no request will go to your site. No more 404s in the server logs either.
If you have control over a complete Apache server or maybe a virtual server you can do this:-
If the server document root is say /var/www/html then add this to /etc/httpd/conf/httpd.conf:-
Alias /favicon.ico "/var/www/html/favicon.ico"
<Directory "/var/www/html">
<Files favicon.ico>
ExpiresActive On
ExpiresDefault "access plus 1 month"
</Files>
</Directory>
Then a single favicon.ico will work for all the virtual hosted sites since you are aliasing it. It will be drawn from the browser cache for a month after the users visit.
For .htaccess this is reported to work (not checked by me):-
AddType image/x-icon .ico
ExpiresActive On
ExpiresByType image/x-icon "access plus 1 month"
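If the favicon is served by application code rather than Apache, the equivalent of "access plus 1 month" is a pair of response headers. A small sketch, assuming a 30-day month; `favicon_cache_headers` is a hypothetical helper name and the `now` parameter exists only to make the result reproducible.

```python
import time
from email.utils import formatdate


def favicon_cache_headers(days=30, now=None):
    """Response headers asking browsers to reuse the favicon for `days` days."""
    now = time.time() if now is None else now
    max_age = days * 24 * 60 * 60
    return {
        "Cache-Control": f"public, max-age={max_age}",
        "Expires": formatdate(now + max_age, usegmt=True),  # HTTP-date in GMT
    }


print(favicon_cache_headers())
```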
A very simple solution is to put the code below in your .htaccess. I had the same issue and it solved my problem.
<IfModule mod_alias.c>
RedirectMatch 403 favicon.ico
</IfModule>
Reference: http://perishablepress.com/block-favicon-url-404-requests/
Elaborating on previous answers, this might be the shortest solution from the HTML file itself:
<link rel="shortcut icon" href="data:" />
Tested working, no error messages or failed requests on Chrome Version 94.0.4606.81
Just make it simple with :
<link rel="shortcut icon" href="#" type="image/x-icon">
It displays nothing!!!!
In Node.js,
res.writeHead(200, {'Content-Type': 'text/plain', 'Link': '<#>; rel="shortcut icon"'});
Personally I used this in my HTML head tag:
<link rel="shortcut icon" href="#" />
I need to prevent the request AND have an icon displayed, e.g. in Chrome.
Quick code to try in <head>:
<link rel="icon" type="image/png" sizes="16x16" href="data:image/png;base64,
iVBORw0KGgoAAAANSUhEUgAAABAAAAAQBAMAAADt3eJSAAAAMFBMVEU0OkArMjhobHEoPUPFEBIu
O0L+AAC2FBZ2JyuNICOfGx7xAwTjCAlCNTvVDA1aLzQ3COjMAAAAVUlEQVQI12NgwAaCDSA0888G
CItjn0szWGBJTVoGSCjWs8TleQCQYV95evdxkFT8Kpe0PLDi5WfKd4LUsN5zS1sKFolt8bwAZrCa
GqNYJAgFDEpQAAAzmxafI4vZWwAAAABJRU5ErkJggg==" />
In our experience, with Apache falling over on request of favicon.ico, we commented out extra headers in the .htaccess file.
For example we had
Header set X-XSS-Protection "1; mode=block"
... but we had forgotten to sudo a2enmod headers beforehand. Commenting out extra headers being sent resolved our favicon.ico issue.
We also had several virtual hosts set up for development, and only failed out with 500 Internal Server Error when using http://localhost and fetching /favicon.ico. If you run "curl -v http://localhost/favicon.ico" and get a warning about the host name not being in the resolver cache or something to that effect, you might experience problems.
It could be as simple as not fetching (we tried that and it didn't work, because our root cause was different) or look around for directives in apache2.conf or .htaccess which might be causing strange 500 Internal Server Error messages.
We found it failed so quickly there was nothing useful in Apache's error logs whatsoever and spent an entire morning changing small things here and there until we resolved the problem of setting extra headers when we had forgotten to have mod_headers loaded!
Sometimes this error appears when the HTML contains commented-out code and the browser still tries to look something up. In my case I had commented-out code for a web form in Flask and I was getting this.
After spending 2 hours I fixed it in the following ways:
1) I created a new Python environment, and then it threw an error on the commented-out HTML line; before this, the only error I saw was 'GET /favicon.ico HTTP/1.1" 404'
2) Sometimes, when I had duplicate code, such as a Python file existing under the same name, I also saw this error; try removing those too
If you are not writing the HTML yourself and it is auto-generated by Flask or another framework, you can always add a dummy route to the app that just returns dummy text.
Or you can just add the favicon :)
E.g. for a Python Flask application:
@app.route('/favicon.ico')
def favicon():
    return 'dummy', 200
I solved this problem by using the Content-Security-Policy HTTP response header. With it, it is possible to block the browser from making further requests for media such as images (other types are also possible). I added the following header to the response:
Content-Security-Policy: img-src 'none'
The problem is that it will block ALL image requests. If your HTML has any images, they won't be loaded. In my case it was very likely a bug in Firefox, because the browser was requesting favicon.ico for a response whose Content-Type is text/xml!
It also depends on the browser implementing this feature, as it is enforced on the client side.
Check https://content-security-policy.com for a complete guide on CSP.
Cheers!
You could use
<link rel="shortcut icon" href="http://localhost/" />
That way it won't actually be requested from the server.