SEO redirects for removed pages

Apologies if SO is not the right place for this, but there are 700+ other SEO questions on here.
I'm a senior developer for a travel site with 12k+ pages. We completely redeveloped the site and relaunched in January, and with the volatile nature of travel, there are many pages which are no longer on the site. Examples:
/destinations/africa/senegal.aspx
/destinations/africa/features.aspx
Of course, we have a 404 page in place (and it's a hard 404 page rather than a 30x redirect to a 404).
Our SEO advisor has asked us to 30x redirect all our 404 pages (as found in Webmaster Tools), his argument being that 404s are damaging to our PageRank. He'd want us to redirect our Senegal and features pages above to the Africa page (which doesn't contain the content previously found on senegal.aspx or features.aspx).
An equivalent for SO would be taking the URL for a removed question and redirecting it to /questions rather than showing a 404 'Question/Page not found'.
My argument is that, as these pages are no longer on the site, 404 is the correct status to return. I'd also argue that redirecting these to less relevant pages could damage our SEO (due to duplicate content, perhaps). It's also very time-consuming to redirect all 404s when our site takes some content from our in-house system, which adds/removes content at will.
Thanks for any advice,
Adam

The correct status to return is 410 Gone. I wouldn't want to speculate about what search engines will do if they are redirected to a page with entirely different content.
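For what it's worth, if this is an ASP.NET site on IIS 7+ with the URL Rewrite module installed (an assumption on my part), a 410 can be returned with a rule along these lines. This is only a sketch; the rule name is made up and the path is just the example from the question:

<!-- web.config sketch (IIS URL Rewrite module assumed):
     return 410 Gone for a page that has been removed permanently -->
<system.webServer>
  <rewrite>
    <rules>
      <rule name="Senegal page gone" stopProcessing="true">
        <match url="^destinations/africa/senegal\.aspx$" />
        <action type="CustomResponse" statusCode="410"
                statusReason="Gone"
                statusDescription="This destination is no longer featured" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>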

As far as I know, 404s are quite bad for SEO because your site won't get any PageRank from pages that are linked to from somewhere but are missing.
I would add another page that explains that, due to the redesign, the original pages are no longer available, and that offers links to some other relevant pages (e.g. to Africa and the FAQ). That page then sounds like a good 301 target for those old URLs.
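If you go that route, and again assuming IIS with the URL Rewrite module, the 301 itself might look roughly like this; /destinations/site-changes.aspx is a made-up name for the explanatory page:

<!-- web.config sketch (IIS URL Rewrite assumed):
     301 an old page to a hypothetical "content retired" page -->
<system.webServer>
  <rewrite>
    <rules>
      <rule name="Features page retired" stopProcessing="true">
        <match url="^destinations/africa/features\.aspx$" />
        <action type="Redirect" url="/destinations/site-changes.aspx"
                redirectType="Permanent" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>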

This is actually a good idea.
As described at http://www.seomoz.org/blog/url-rewrites-and-301-redirects-how-does-it-all-work
(which is a good resource for the non-SEO people here)
404 is obviously not good. A 301 tells spiders/users that this is a permanent redirect of a resource. The content should not get flagged as duplicate because you are not sending a 200 (good page) response, so there is nothing to be spidered and compared.
This IS kind of a grey-hat tactic though, so be careful. It would be much better to put actual 301s in place where the old pages are being requested, and also to find whoever posted the erroneous link and, if possible, correct it.

I agree that 404 is the correct status, but then again you should take a step back and answer the following questions:
Do these old pages have any inbound links?
Did these old pages have any good content relevant to the page you are 301'ing them to?
Is there any active traffic that is trying to reach these pages?
While the pages may no longer exist, I would investigate them against those three questions, because you can steer incoming traffic and PageRank to existing pages that need the PR/traffic, or to pages that are already high-traffic.
With regards to your in-house SEO saying you are losing PR, this can be true if those pages have inbound links, because those links will be met with a 404 status code and will not pass link juice, since nothing exists there any more. That's why 301s rock.

404s should not affect the overall PageRank of the other pages of a website.
If they are really gone, then 404/410 is appropriate. Check the official Google webmasters blog.

Related

Redirects and metadata

I wondered if someone could answer this question.
When putting 301 redirects in place from an old website to a new website, would the metadata from the old website still show on Google? If so, what is the best way to resolve this?
How will our meta description, Google preview and suchlike be impacted by the redirect? Meaning, will the current ones still show up once the redirect is in place, or will it be the meta description and Google preview of the URL it is being pointed to?
I guess that question applies to pretty much all of the current site settings/errors. Will we still be ranked on these, and is it therefore in our interest to fix any errors on the old site, or should all the focus be on the destination domain, i.e. will any errors or settings on the referring domain no longer matter?

Disallow certain URLs in robots.txt

I'm currently running a web service where people can browse products. The URL for that is basically just /products/product_pk/. However, we don't serve products with certain product_pks, e.g. nothing smaller than 200. Is there, then, a way to discourage bots from hitting URLs like /products/10/ (because they will receive a 404)?
Thank you for your help :)
I am pretty sure that crawlers don't try auto-generated URLs and then fail on them. A crawler crawls your website and finds the next links to follow. If you have any links that return 404, that is bad design on your site, since they should not be there.
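For what it's worth, robots.txt matches URL prefixes only, so there is no way to express a numeric range like "ids below 200". If you did find specific low-id paths being requested, the most you could do is list the offending prefixes explicitly. A sketch, with made-up paths:

# robots.txt sketch: one Disallow line per unwanted prefix;
# robots.txt has no syntax for "any id smaller than 200"
User-agent: *
Disallow: /products/10/
Disallow: /products/11/
Disallow: /products/12/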

Googlebot guesses URLs. How to avoid/handle this crawling

Googlebot is crawling our site. Based on our URL structure it is guessing new possible URLs.
Our structure is of the kind /x/y/z/param1.value. Now Googlebot swaps the values of x, y, z and value with tons of different keywords.
Problem is, that each call triggers a very expensive operation and it will return positive results only in very rare cases.
I tried to set a URL parameter in the crawling section of Webmaster Tools (param1. -> no crawling). But this seems not to work, probably because of our inline URL format (would it be better to use the HTML GET format ?param1=..?).
As Disallow: */param1.* seems not to be an allowed robots.txt entry, is there another way to disallow Google from crawling these pages?
As another solution I thought of detecting Googlebot and returning it a special page.
But I have heard that this will be punished by Google.
Currently we always return an HTTP status code 200 and a human-readable page which says: "No targets for your filter criteria found". Would it help to return another status code?
Note: This is probably not a general answer!
Joachim was right. It turned out that Googlebot is not guessing URLs.
Doing a bit of research, I found out that half a year ago I added a new DIV to my site containing those special URLs (which I unfortunately forgot about). A week ago Googlebot started crawling it.
My solution: I deleted the DIV and I now return a 404 status code for those URLs. I think, sooner or later, Googlebot will stop crawling the URLs after revisiting my site.
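As an aside, had the bot really been generating those URLs, Googlebot does understand the * wildcard in robots.txt (a Google extension, not guaranteed for every crawler), so something along these lines would have been an option. A sketch only, untested against this particular URL scheme:

# robots.txt sketch: Googlebot honours the * wildcard,
# so this blocks any path containing "param1."
User-agent: Googlebot
Disallow: /*param1.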
Thanks for the help!

How to tell Google that a specific page of my website disappeared and won't come back?

I have a website where 50% of the pages have a limited lifetime.
To give an idea, 4,000 pages appear each week and the same amount disappears.
By "appearing" and "disappearing", I mean that the appearing pages are completely new ones, and disappearing pages are removed from the website forever. There is no "this new page replaces this old page".
I naively used a 410 code on every URL where a page had disappeared.
Meaning the URL http://mywebsite/this-page-was-present-until-yesterday.php returned a 200 OK code until yesterday, and now returns a 410 Gone code.
I didn't use a redirect, because I want to tell the user that the URL he accessed isn't wrong, but that it has expired.
The problem is: Google won't acknowledge this information. It is still crawling the pages, and Webmaster Tools alerts me as if the pages were 404 broken. This significantly affects my "reputation".
Did I do something wrong ? How should I proceed ?
It's always a very good idea to make your own error page. This can save you a lot of the visits that come in through broken links.
.htaccess error pages
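Assuming an Apache setup, both a custom error page and an explicit "gone" mapping can be done in .htaccess; /expired.html is a hypothetical page name here:

# .htaccess sketch (Apache assumed): custom page for 410 responses,
# plus an explicit "gone" mapping via mod_alias
ErrorDocument 410 /expired.html
Redirect gone /this-page-was-present-until-yesterday.php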
Google's Webmaster Tools enables you to remove certain pages from the index.
You can find this under "Crawler access".
Try adding a noindex header.
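For example, with Apache and mod_headers you could send an X-Robots-Tag for the expired pages; "always" is needed because these pages return a non-2xx status. A hedged sketch, matching a single file for illustration:

# .htaccess sketch: noindex hint for an expired page,
# sent even on the 410 response ("always")
<Files "this-page-was-present-until-yesterday.php">
  Header always set X-Robots-Tag "noindex"
</Files>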

Dealing with 301 redirects for a brand new website

I have seen multiple articles on redirecting URLs when a site has been redesigned or the URLs have simply changed to a standard format, but I need to know how to manage things when the new URL has no correlation to the old one.
For instance, an old URL may have been www.mysite.com/index.php?product=12, but there is no way to map that URL to the new site.
I don't want search engines to think that the page is broken, so I assume the best thing to do is to 301 redirect to the home page, but I am not sure how I would do that effectively. Would I just change the 404 error page to do a 301 to the home page?
Also, would that then cause issues with duplicate content via different URLs?
Is it better to just not worry about these and let the search engines re-index the new URLs?
I am running IIS 7 with the URL Rewrite module and ASP.NET 2.
Thanks.
Why do you say there is no way to map that URL to the new one? There probably is, since both should be unique identifiers for a given resource. If your site has good rankings, it may be worth the pain to work this out and have a 301 redirect to the right page. In this way, the ranks should be unchanged.
Redirecting everything to the new home page will probably have a negative effect. It really depends on how the bots are going to interpret this. But it may seem an artificial way to increase the rank of the home page, and correspondingly get a penalty.
Doing nothing and waiting for the bots to index your new site will of course work, but often you cannot afford to lose the high rank you have gained.
All in all, I would advise you to ask a new question here on how to map the old URLs to the new ones, and do proper redirects.
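Since you're on IIS 7 with the URL Rewrite module, a per-URL 301 keyed on the query string could look roughly like the sketch below; /products/new-product-page/ is only a placeholder for wherever product 12 now lives:

<!-- web.config sketch (IIS URL Rewrite assumed):
     301 the old index.php?product=12 URL to its new location -->
<system.webServer>
  <rewrite>
    <rules>
      <rule name="Old product 12" stopProcessing="true">
        <match url="^index\.php$" />
        <conditions>
          <add input="{QUERY_STRING}" pattern="^product=12$" />
        </conditions>
        <action type="Redirect" url="/products/new-product-page/"
                appendQueryString="false" redirectType="Permanent" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>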
That product URL you supplied is obviously, well, a product. The best bet is to 301 redirect it to the new page that is most relevant to that old page. If there aren't any external links pointing to it at all, just let it die. Be sure to remove it from any sitemaps or old internal navigation links you may have, though, or it will keep getting re-indexed, which is what you want to avoid.
Once you have your new site structure set up, visit a site like AuditMyPc.com and create a brand new sitemap of your new site. Then log in to Google Webmaster Tools and resubmit the new sitemap. This will normally fix the problem, but if that page is indexed, expect it to stay in Google's index for a while. They don't clean themselves up too well.