How to tell Google that a specific page of my website disappeared and won't come back?

I have a website where 50% of the pages have a limited lifetime.
To give an idea, 4,000 pages appear each week and the same number disappear.
By "appearing" and "disappearing", I mean that the appearing pages are completely new ones and the disappearing pages are removed from the website forever; there is no "this new page replaces that old page".
I naively returned a 410 status code on every URL where a page had disappeared.
Meaning that the URL http://mywebsite/this-page-was-present-until-yesterday.php returned a 200 OK status until yesterday, and now returns 410 Gone.
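For reference, a minimal PHP sketch of this kind of setup (the expiry flag and page body are hypothetical; only the 410 mechanics matter):

<?php
// this-page-was-present-until-yesterday.php (hypothetical expiry check)
$expired = true; // in reality, looked up from a database or a date field

if ($expired) {
    // 410 tells clients and crawlers the page is gone for good,
    // unlike 404, which only says it wasn't found this time.
    http_response_code(410);
    echo '<h1>This page has expired</h1>';
    echo '<p>The URL was correct, but its content had a limited lifetime.</p>';
    exit;
}
// ...otherwise render the normal, still-live page...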
I didn't use a redirect, because I want to tell users that the URL they accessed isn't wrong, just expired.
The problem is that Google won't acknowledge this information: it keeps crawling the pages, and Webmaster Tools alerts me as if the pages were broken 404s. This significantly affects my "reputation".
Did I do something wrong? How should I proceed?

It's always a good idea to make your own error page; it can save you a lot of lost visits through broken links. You can set one up with .htaccess error pages (Apache's ErrorDocument directive).
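A sketch of what the target of such a directive might look like, assuming PHP and a hypothetical gone.php wired up with "ErrorDocument 410 /gone.php" in .htaccess:

<?php
// gone.php — hypothetical ErrorDocument target for 410 responses.
// Re-send the status explicitly so this page is never served as 200 OK.
http_response_code(410);
?>
<h1>This page is gone</h1>
<p>It existed once, but it has been removed permanently.
   You could start again from the <a href="/">home page</a>.</p>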

Google's Webmaster Tools lets you request removal of specific pages.
You can find this under "Crawler access".
Try adding a noindex header.
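In PHP, that can be sent as an X-Robots-Tag HTTP header before any output (a sketch; it can be combined with the 410 status):

<?php
// Ask crawlers not to index this URL; headers must be sent before any body output.
header('X-Robots-Tag: noindex');
http_response_code(410); // optionally also keep signalling that the page is gone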


Facebook Lint/Debugger 403 and 503 response codes (WordPress site)

Humbly asking for any assistance people have time to give me on this one. Let me start by saying that I am aware there are previous questions about this on this site and elsewhere on the web; I have read a lot of them, and they are either unanswered/unresolved, had a particular cause that doesn't apply to me, or suggest things I have already done.
Over the past few days, Facebook has suddenly stopped scraping my website posts successfully, so when I paste a link into Facebook it pulls nothing through: no thumbnail or description. I run the links through the FB Lint/Debugger, and it alternates between 403 and 503 response codes, but mainly 403. Previous links that Facebook has cached/successfully scraped still display with thumbnails and descriptions, but still present a 403 or 503 response.
My site is http://21stcenturyburlesque.com
One of the new URLs I have been testing is: http://21stcenturyburlesque.com/the-burlesque-top-50-2013/
I have checked with the server/host people. Nothing has changed; everything is fine.
I have tried the default WordPress theme. No change.
I have read threads about BulletProof Security causing issues, although why it suddenly would, I don't know. It was deactivated on my site anyway, but I went through the removal process to remove the .htaccess file with the BPS code in it. I then ran the debugger without an .htaccess file present, and with a very basic .htaccess present. No change.
Hotlink protection is disabled in my cPanel.
I have experimented with adding/removing "www." and a trailing "/" when I paste the link into Lint, as someone suggested. No change.
I use the Facebook OGP WordPress plugin. I spoke to the creator and he says the plugin is working as it should, and to contact my host/server. See the first point above.
I tried creating a new FB app and using the new app ID with the OGP plugin. No change.
I checked the cPanel error log. This came up three times tonight:
[Fri Nov 01 21:47:53 2013] [error] [client 193.242.149.35] File does not exist: /home/**/public_html/403.shtml
There are a few other things I ruled out but I've been at this for so long I can't remember all of them, so if someone suggests something else I've tried then I apologise for not mentioning it here in advance.
If anyone can suggest anything else, I would really appreciate it. I manage to fix most technical problems I come up against, but this one has stumped me and my much more experienced colleague, and it is really affecting my click-through rates and site traffic. If it comes down to adding things to my .htaccess file, I would appreciate guidance on what to add/remove. Many thanks in advance.
I had the same problem. It drove me crazy for hours (maybe days). In your FB app settings, make sure that the top Facebook URL includes http://.

"Error parsing input URL, no data was scraped" only with new pages on my site

I own a website where other people can post content, creating new pages on my domain. The problem that occurred today is that all the new post pages created today are malfunctioning: sharing doesn't load the thumbnail picture, the title, and so on. The weird thing is that all the posts (new pages) created before today are working fine.
What caused an error to occur out of nowhere?
I also cannot debug any of my website's URLs; they all give the same error: "Error parsing input URL, no data was scraped".
The website I'm having problems with is http://www.vabameedia.ee/vm/184/h%C3%A4da-ei-anna-h%C3%A4beneda.html
This is one of the pages where the debugger reports no error but Facebook still can't reach it: http://www.vabameedia.ee/vm/178/craig-parks-%C3%BChek%C3%A4eline-krossisoitja.html
For people experiencing the same problem but for different causes: I discovered a few interesting things about how Facebook "scrapes" pages by checking my server's logs during some trials.
First of all, if you have never tried to share a page with FB, FB has never tried to scrape it, and it will not do so just because you put the URL into the Debug tool.
That's the first reason you get the error: it just states that FB has no information on the page; you must "force" it to scrape the page.
The first time you try to share a page, FB scrapes it (requests the first 40 kB of the page from your server and analyses the Open Graph tags).
What can happen is that you do not see the image on the first try: see "Facebook Share Dialog does not display thumbnails on first load".
The reason is that behind the scenes FB is still scraping your page and caching the image. The next time, the image is indeed there.
How to solve it? Pre-caching: https://developers.facebook.com/docs/sharing/best-practices#precaching
or simply add
<meta property="og:image:width" content="450"/>
<meta property="og:image:height" content="298"/>
I was pulling my hair out trying to fix this issue. Hours and hours of troubleshooting to no avail. After speaking with one of our programmers about a topic unrelated I thought of something to try as a long shot.
Much to my surprise, it worked!!!
This is the reason behind the problem and my solution for it:
When you draft a post in WordPress, it generates a link based on your article's title (unless you manually change it). The title of my article included special characters; however, the auto-generated link didn't display these special characters, only hyphens to replace the spaces. Should be fine, right? Wrong! Somewhere embedded in metadata and code in the WordPress platform are those special characters, and they mess up the way Facebook pulls info from the article being linked to. This is a problem because certain special characters invalidate hyperlinks.
For example:
Article Title: R[eloaded]
Auto-generated hyperlink DISPLAYED in WordPress "Permalink" field: http://www.example.com/reloaded
Actual WordPress Auto-generated hyperlink: http://www.example.com/r[eloaded]
Those brackets will invalidate the link, and Facebook will be unable to pull any information (i.e. pictures) from it.
Solution:
(1) Simply, manually change the WordPress hyperlink address to something that doesn't include any special characters (this will not change the title of your article).
(2) Click "Update" to change the post to include the new hyperlink.
(3) Click "Purge from Cache" in the WordPress window
(4) Refresh your Facebook browser window
(5) Paste the new hyperlink for your article
(6) Enjoy your Facebook post with a preview image and information
Sidenote: Don't pull your hair out over Facebook, it's not worth it. =)
If you're using Wordpress, edit the post in question to change the permalink (just alter it slightly), then update the post. Using the new permalink in the Facebook OG debugger should now work.
It's a weird fix, but I think it takes care of a problem caused by special characters being used in the title of a post, which is then used to make the permalink.
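If you want to guard against this programmatically, here is a small WordPress sketch (the wp_insert_post_data filter and sanitize_title() are real WordPress APIs; forcing every saved slug through sanitize_title() is my own suggested safeguard, not something from the answers above):

<?php
// Re-sanitize the slug on every save so special characters from the
// title (e.g. "R[eloaded]") cannot leak into the permalink.
add_filter('wp_insert_post_data', function ($data) {
    $source = $data['post_name'] !== '' ? $data['post_name'] : $data['post_title'];
    $data['post_name'] = sanitize_title($source);
    return $data;
});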
It was all a DNS issue; I had the same problem and resolved it by updating the domain's name servers to the actual name servers.
In my case, my domain pointed to ns1.websterz.net and ns2.websterz.net, and on that server I had a DNS redirect to my other server (where the website is hosted). I just updated the domain's name servers to the actual name servers of the host where my website lives. This was an account-migration case: I had forgotten to update the name servers to those of the new server.
Everything works fine now.

Facebook - click on a like button does not increase the like count

After searching the internet and doing my own research on this subject, I still cannot find the answer to my problem, so here it is.
When I click the Like button (to like my website http://openarchitecture.cz), the like count is not increased.
Debugging the FB JavaScript code on the client side (in Chrome) and examining the AJAX response sent back from FB's servers after the click on the "Like" button revealed that FB is instructing the Like button to be "disconnected", resulting in the behaviour described below.
The term "disconnected" is a strict FB term (in the sense of the JavaScript code): it means that on the client side a "plugin" will be used that performs certain operations leading to the "inactivity" of the Like button. Technically, when the "disconnect" plugin is recognized as part of the AJAX response, there is an array of predefined actions (functions) that are then called sequentially.
Now for the reproducibility of the problem:
Go to http://developers.facebook.com/docs/reference/plugins/like and fill the "URL to Like" field with the http://openarchitecture.cz URL.
Click "Get Code", then click "Okay" on the pop-up, and finally click the "Like" button on the right.
The like count should increase. Instead, a pop-up shows up for (approx.) one second and then disappears. Now I am in the same state as before I clicked the Like button, i.e. the like count is not increased.
I have found similar questions here on SO, but none of them seems to finally resolve the issue. The related questions here on SO are:
1. http://facebook.stackoverflow.com/questions/5195183/facebook-like-button-flashing-on-then-off/12958474#12958474
2. Facebook Like button does not work on one website?
One of the suggestions was that this might actually be an FB bug. I found a (very recently created) bug report in the FB bug-tracking system, located here:
http://developers.facebook.com/bugs/268340209965207?browse=search_512b8e0bed9724580954683
The bug however has "Low" priority, and so far it does not seem to be resolved (it might even be closed as not an FB issue; I am not sure whether that possibility is still open).
So, for all interested in this:
Is this a real FB bug?
How have you dealt with this?
Could it be that my site is, for some reason, on an FB spam/black/"whatever nasty" list?
Well, this will end up like the other posts, i.e. no lesson learned here.
[The term "page" used later in this post refers to the http://openarchitecture.cz page.]
I just tried again today to like the page via the FB-generated Like button (on http://developers.facebook.com/docs/reference/plugins/like/) and the result is now OK. The like count gets increased after clicking the Like button.
The difference I observed when checking the request exchange with FB's servers is that this time the communication was done (by default, i.e. using the XFBML version of the Like button) over an iframe, not a direct AJAX call (as it was in the past for XFBML).
I don't know what the cause was (I had tried the pure iframe version of the Like button before), but the response coming back from the mentioned iframe request is now correct, i.e. FB sends back a response instructing the JavaScript in the client's browser to use the "connect" plugin, not the "disconnect" plugin.
One more thing: one month ago I created an FB profile for the page (http://www.facebook.com/pages/Openarchitecture/125515934292877) and made some updates to it. So maybe FB decided that the page (being referenced from an FB profile) has now earned the privilege to be "liked".
Like I said at the beginning. Problem solved, but no lesson learned.
For me, the problem (Like popup disappearing after a second; "Plugin","disconnect" response) was happening when the Like button URL redirected to another URL.
The fix was to add og:type, og:url, and og:title (required per https://developers.facebook.com/docs/reference/opengraph/object-type/website), and then run the URL through the Facebook debugger to clear the cache (https://developers.facebook.com/tools/debug).
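In markup terms, that amounts to something like the following (the values are placeholders for the page being liked), e.g. in a PHP template:

<?php /* sketch: the three Open Graph tags required for an og:type "website" object */ ?>
<meta property="og:type"  content="website"/>
<meta property="og:url"   content="https://www.example.com/page-being-liked/"/>
<meta property="og:title" content="Page title"/>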
More at https://stackoverflow.com/a/16597060/2391566.

Domain blocked and no data scraped

I recently purchased the domain www.iacro.dk from UnoEuro and installed WordPress planning to integrate blogging with Facebook. However, I cannot even get to share a link to the domain.
When I try to share any link on my timeline, it gives the error "The content you're trying to share includes a link that's been blocked for being spammy or unsafe: iacro.dk". Searching around, I came across Sucuri SiteCheck, which showed that McAfee TrustedSource had marked the site as having malicious content. Strange, considering that I just bought it, it contains nothing but WordPress, and I can't find any previous history of ownership. But I got McAfee to reclassify it, and it now shows up green at SiteCheck. However, now, a few days later, Facebook still blocks it. Clicking the "let us know" link in the FB block dialog got me to a "Blocked from Adding Content" form that I submitted, but this just triggered a confirmation mail stating that individual issues are not processed.
I then noticed the same behavior as here and here: when I type any iacro.dk link on my timeline, it generates a blank preview with "(No Title)". It doesn't matter whether it's the front page, an .htm document, or even an image; nothing is returned. So I tried the debugger, which returns the very generic "Error Parsing URL: Error parsing input URL, no data was scraped.". Searching on this site, a lot of people suggest that missing "og:" tags might prevent scraping. I installed a WP plugin for that and verified tag generation, but nothing changed. And since FB can't even scrape a plain .htm or .jpg file from the domain, I assume tags can be ruled out.
Here someone suggests that 301 redirects can be a problem, but I haven't set up any redirection; I don't even have an .htaccess file.
So, my questions are: Is this all because of the domain being marked as "spammy"? If so, how can I get the FB ban lifted? However, I have seen examples of other "spammy" sites where the preview is being generated just fine, e.g. http://dagbok.nu described in this question. So if the blacklist is not the only problem, what else is wrong?
This is driving me nuts so thanks a lot in advance!
I don't know the details, but it is a problem Facebook has with websites hosted on shared servers, i.e. where the server hosting your website also hosts a number of other websites.

SEO redirects for removed pages

Apologies if SO is not the right place for this, but there are 700+ other SEO questions on here.
I'm a senior developer for a travel site with 12k+ pages. We completely redeveloped the site and relaunched in January, and with the volatile nature of travel, there are many pages which are no longer on the site. Examples:
/destinations/africa/senegal.aspx
/destinations/africa/features.aspx
Of course, we have a 404 page in place (and it's a hard 404 page rather than a 30x redirect to a 404).
Our SEO advisor has asked us to 30x-redirect all our 404 pages (as found in Webmaster Tools), his argument being that 404s are damaging to our PageRank. He'd want us to redirect our Senegal and features pages above to the Africa page (which doesn't contain the content previously found on senegal.aspx or features.aspx).
An equivalent for SO would be taking a url for a removed question and redirecting it to /questions rather than showing a 404 'Question/Page not found'.
My argument is that, as these pages are no longer on the site, 404 is the correct status to return. I'd also argue that redirecting them to less relevant pages could damage our SEO (due to duplicate content, perhaps). It's also very time-consuming to redirect all 404s, since our site takes some content from our in-house system, which adds/removes content at will.
Thanks for any advice,
Adam
The correct status to return is 410 Gone. I wouldn't want to speculate about what search engines will do if they are redirected to a page with entirely different content.
As far as I know, a 404 is quite bad for SEO, because your site won't get any PageRank for pages that are linked from somewhere but missing.
I would add another page explaining that, due to the redesign, the original pages are no longer available, and offering links to the most relevant remaining pages (e.g. to Africa and the FAQ). That page then makes a good 301 target for the removed pages.
This is actually a good idea.
As described at http://www.seomoz.org/blog/url-rewrites-and-301-redirects-how-does-it-all-work (which is a good resource for the non-SEO people here), a 404 is obviously not good. A 301 tells spiders/users that this is a permanent redirect of a resource. The content should not get flagged as duplicate, because you are not sending a 200 (good page) response, so there is nothing to be spidered/compared.
This IS kind of a grey-hat tactic, though, so be careful; it would be much better to put actual 301 pages in place where it is looking for the page, and also to find who posted the erroneous link and, if possible, correct it.
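As a concrete sketch of such a redirect (the question's site is ASP.NET, so this PHP version is only illustrative; the target path is the Africa page from the question):

<?php
// Sketch: a removed page issuing a permanent redirect before any output.
header('Location: /destinations/africa/', true, 301); // 301 = Moved Permanently
exit;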
I agree that 404 is the correct status, but then again you should take a step back and answer the following questions:
Do these old pages have any inbound links?
Did these old pages have any good, relevant content to the page you are 301'ing it to?
Is there any active traffic that is trying to reach these pages?
While the pages may not exist, I would investigate them with those three questions in mind, because you can steer incoming traffic and PageRank to existing pages that either need the PR/traffic or are already high-traffic.
With regard to your in-house SEO saying you are losing PR: this can be true if those pages have inbound links, because those links will be met with a 404 status code and will not pass link juice, since nothing exists there any more. That's why 301s rock.
404s should not affect the overall PageRank of the other pages of a website.
If the pages are really gone, then 404/410 is appropriate. Check the official Google Webmaster Central blog.