We redeveloped a site in 2012 that had been a simple HTML page catalog site. It became a dynamic, SOLR-driven site under the same domain name. The site content is updated daily (although not all products change daily) and there are now nearly 300,000 products. XML sitemaps of the new site are uploaded daily. About 2 years ago we moved the site to HTTPS.
Google Webmaster Tools is still reporting pages from the old site (blah-blah.html) in the Crawl Errors under Not Found. All the pages are showing status 410. Here's the relevant part of the vhost config:
# Escape the dot so only real .html URLs match; NC belongs on the matching
# condition, and G already implies L
RewriteCond %{REQUEST_URI} \.html$ [NC]
RewriteRule .* - [G]
We cannot see where Google is finding these dead pages after 6 years! We remove them in Webmaster Tools and mark them as fixed, but they keep coming back.
How long do we have to wait before Google stops 'finding' them? How can we find where Google may be picking up old backlinks - there aren't any showing in search?
This is my site link:
www.englishact.com
This is the sitemap's current position:
Google is showing no errors in the sitemap or any other pages, but indexed pages have been 0 for about 3 months. I have also uploaded new sitemaps, which are behaving the same way, with nothing indexed.
NB:
I am using a 1and1 paid hosting package. Also, Google has accepted AdSense for this site. Now what can I do? Any suggestions?
Your website is indexed on Google; I just searched for site:www.englishact.com and got many results.
Check whether the links in your XML sitemap are valid or redirect to another URL.
You also have to solve the duplication in the URLs: you can access your website with WWW and without it, and you have two URLs for the homepage, http://englishact.com/ and http://www.englishact.com/index.php.
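A minimal sketch of an Apache fix, assuming the www version is the preferred host and mod_rewrite is available (only the host name comes from this thread; the rest is an assumption about the server setup):

# Send bare-domain requests to the canonical www host
RewriteEngine On
RewriteCond %{HTTP_HOST} ^englishact\.com$ [NC]
RewriteRule ^ http://www.englishact.com%{REQUEST_URI} [R=301,L]
# Collapse explicit /index.php requests onto the homepage URL
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/index\.php[\s?] [NC]
RewriteRule ^/?index\.php$ / [R=301,L]

The THE_REQUEST condition matches only the client's original request line, which avoids a redirect loop when DirectoryIndex internally serves index.php for /.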
After fixing these errors your website should be healthy, and Google will understand its structure.
I just saw in my WMT that I have a lot of indexed soft 404 pages. They all start with mysite.com/index.php/somepage.. The problem is that not a single page on my site starts with index.php.
Can someone explain to me:
1. Why are those soft 404 pages generated at all when I don't have an index.php part on my site?
2. The thing that worries me is that I had, for example, 700 soft 404 pages, and after some time I had 300, and I didn't change anything.
3. Are they harmful to my site?
4. Can this be fixed using some rewrite rule in Apache to prevent them from being indexed at all? (See the sketch below.)
In my Apache file I have some redirection rules for the same pages, but
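For point 4, a hedged sketch of the kind of rule being asked about, assuming the index.php/somepage URLs duplicate clean /somepage URLs (the mapping is an assumption, since the site's real URL scheme isn't shown):

RewriteEngine On
# 301 the phantom /index.php/... URLs to their clean equivalents
RewriteRule ^/?index\.php/(.+)$ /$1 [R=301,L]

If no clean equivalent exists, swapping [R=301,L] for [G] returns 410 Gone instead, which tells Google the URLs are permanently dead.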
1) Why does Google Webmaster Tools show Total indexed = 0 for my website?
When I run site:ziedireizija.lv in Google, it shows 59 results.
I have added both the www and non-www versions to Webmaster Tools, and I have set the preferred domain to be without www.
When I open non-www in webmaster tools it shows:
Total indexed = 0
Ever crawled = 80
Not selected = 52
What does this mean? Why is Total indexed = 0?
This is for the website ziedireizija.lv.
2) The second question is about the HTML Improvements section in Webmaster Tools.
It shows Duplicate meta descriptions = 12.
I have updated the meta descriptions for those pages. However, it still shows duplicate meta descriptions, and I can see that Google has not updated these pages (neither the meta descriptions nor the page content). Some time has passed since I did this. Why?
Webmaster Tools also shows Last updated Dec 24, 2012; however, Duplicate meta descriptions = 12, and I can see that those pages were not updated.
It could somehow be related to question 1.
The URLs on the HTML Improvements page don't update very quickly - often taking months to refresh.
To encourage Google to update this section simply use the 'Fetch as Google' (under 'Health' in webmaster tools) for pages that you know have been fixed and then click the 'Submit to Index' button when it appears.
Using this technique I usually see URLs drop off the HTML Improvements page in a couple of days.
My blog was successfully transferred to Octopress and GitHub Pages. My problem, though, is that the website's search uses Google search, and the results of a 'search', as you can see, point to the old (WordPress) links. These links have now changed structure, following the default Octopress structure.
I don't understand why this is happening. Is it possible that Google has stored the old links in its DB (my blog was on the 1st page for some searches, but gathered just 3,000 hits / month... not much by internet standards) and this will change with time, or is it something I'm able to change somehow?
Thanks.
1. You can wait for Google to crawl and re-index your pages, or you can use the URL Removal Request tool to expedite removal of old pages from the index.
http://www.google.com/support/webmasters/bin/answer.py?answer=61062
According to that page, the removal process "usually takes 3-5 business days."
Consider submitting a Sitemap, and resubmit it whenever the site changes:
http://www.google.com/support/webmasters/bin/answer.py?answer=40318
More information about Sitemaps:
http://www.google.com/support/webmasters/bin/answer.py?answer=34575
http://www.google.com/support/webmasters/bin/topic.py?topic=8467
http://www.google.com/support/webmasters/bin/topic.py?topic=8477
https://www.google.com/webmasters/tools/docs/en/protocol.html
2. Perhaps your company might consider the Google Mini? You could set up the Mini to crawl the site every night or even 'continuously'.
http://www.google.com/enterprise/mini/
According to the US pricing page, the Mini currently starts at $1995 for a 50,000-document license with a year of support.
Here is the Google Mini discussion group:
http://groups.google.com/group/Google-Mini
http://www.google.com/enterprise/hosted_vs_appliance.html (click "show all descriptions")
http://www.google.com/support/mini/ (Google Mini detailed FAQ)
Apologies if SO is not the right place for this, but there are 700+ other SEO questions on here.
I'm a senior developer for a travel site with 12k+ pages. We completely redeveloped the site and relaunched in January, and with the volatile nature of travel, there are many pages which are no longer on the site. Examples:
/destinations/africa/senegal.aspx
/destinations/africa/features.aspx
Of course, we have a 404 page in place (and it's a hard 404 page rather than a 30x redirect to a 404).
Our SEO advisor has asked us to 30x redirect all our 404 pages (as found in Webmaster Tools), his argument being that 404s are damaging to our PageRank. He'd want us to redirect our Senegal and features pages above to the Africa page (which doesn't contain the content previously found on senegal.aspx or features.aspx).
An equivalent for SO would be taking a url for a removed question and redirecting it to /questions rather than showing a 404 'Question/Page not found'.
My argument is that, as these pages are no longer on the site, 404 is the correct status to return. I'd also argue that redirecting these to less relevant pages could damage our SEO (due to duplicate content, perhaps). It's also very time-consuming to redirect all 404s when our site takes some content from our in-house system, which adds/removes content at will.
Thanks for any advice,
Adam
The correct status to return is 410 Gone. I wouldn't want to speculate about what search engines will do if they are redirected to a page with entirely different content.
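For reference, a minimal sketch of serving 410 in Apache with mod_alias, using the example paths from the question:

# Mark removed pages as permanently gone
Redirect gone /destinations/africa/senegal.aspx
Redirect gone /destinations/africa/features.aspx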
As far as I know, a 404 is quite bad for SEO, because your site won't get any PageRank for pages that are linked from somewhere but missing.
I would add another page explaining that, due to the redesign, the original pages are not available, and offering links to some of the most relevant pages (e.g. to Africa and the FAQ). That page then sounds like a good 301 target for those pages.
This is actually a good idea.
As described at http://www.seomoz.org/blog/url-rewrites-and-301-redirects-how-does-it-all-work (which is a good resource for the non-SEO people here), a 404 is obviously not good. A 301 tells spiders/users that this is a permanent redirect of a source. The content should not get flagged as duplicate, because you are not sending a 200 (good page) response, and so there is nothing spidered/compared.
This IS kind of a grey-hat tactic, though, so be careful. It would be much better to put actual 301 redirects in place where it is looking for the page, and also to find who posted the erroneous link and, if possible, correct it.
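If the 301 route is chosen, a minimal sketch with mod_alias, using the question's example paths; the Africa landing-page URL here is a hypothetical target, not taken from the question:

# Permanently redirect retired pages to the closest relevant page
# (/destinations/africa.aspx is an assumed URL for the Africa page)
Redirect 301 /destinations/africa/senegal.aspx /destinations/africa.aspx
Redirect 301 /destinations/africa/features.aspx /destinations/africa.aspx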
I agree that 404 is the correct status, but then again you should take a step back and answer the following questions:
Do these old pages have any inbound links?
Did these old pages have any good content relevant to the page you are 301'ing them to?
Is there any active traffic that is trying to reach these pages?
While the pages may no longer exist, I would investigate the pages in question using those 3 questions, because you can steer incoming traffic and PageRank to other existing pages that either need the PR/traffic or are already high-traffic.
With regard to your in-house SEO saying you are losing PR: this can be true if those pages have inbound links, because they will be met with a 404 status code and will not pass link juice, since nothing exists there any more. That's why 301s rock.
404s should not affect the overall PageRank of a website's other pages.
If they are really gone, then 404/410 is appropriate. Check the official Google Webmasters blog.