How to remove Google search results for a 303 redirect?

I run a dynamic site that may or may not redirect a certain route based on user preferences.
Let's say it's http://clientname.example.com/maybe. Our backend has a response for /maybe, but if the client decides they would rather use their site for the information on that page, we instead use a 303 Redirect to their page on a separate domain.
All of our content pages use the <meta name="robots" content="noindex"> tag, so Google will not index any of our pages. However, when I search Google for "site:our_domain_name.com", I get a bunch of results that all trace back to those dynamic routes that return a 303. When I click on those search results in Google, the 303 is followed as expected and I arrive at the client's site. What I want is for my piece of the puzzle not to show up in the results at all.
I was troubleshooting it this morning, and I realized that our noindex meta tag was obviously not being seen by the robot, since it was following the redirect, so I added a rule on the server that adds the 'X-Robots-Tag: noindex' header to redirect responses.
Is that enough? If I wait long enough, will those search results be removed?

Is that enough? If I wait long enough, will those search results be removed?
No, because if an external page links to your site, Google will follow the link to your site and then your 303 (if you return such a code), and it will never see the noindex.
Don't return a 303 to Googlebot and you should be fine. It may take a bit of time, because Google needs to recrawl the page and see the noindex before removing it.
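To make that concrete, here is a minimal Flask sketch of both ideas, assuming a Flask backend; the route, the preference lookup, and the client URL are hypothetical stand-ins for the real setup. It attaches the X-Robots-Tag header to the 303 and, alternatively, serves the noindex page itself to Googlebot instead of redirecting:

from flask import Flask, redirect, render_template_string, request

app = Flask(__name__)

def client_prefers_own_site():
    # Hypothetical stand-in for the real per-client preference lookup.
    return True

@app.route("/maybe")
def maybe():
    if client_prefers_own_site():
        if "Googlebot" in request.headers.get("User-Agent", ""):
            # Serve the page (with its noindex tag) so the crawler actually sees it.
            return render_template_string(
                '<meta name="robots" content="noindex">Content lives on the client site.')
        resp = redirect("https://client-site.example.com/page", code=303)  # hypothetical target
        resp.headers["X-Robots-Tag"] = "noindex"  # note the header name: X-Robots-Tag
        return resp
    return "Our own version of the page"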

Related

Remove sitemaps URL from Google results

My sitemap page http://foo.com/oddname.php appears in Google's results if I search "oddname site:foo.com".
How could I remove it from the results without using 'robots.txt'? I do not want anybody to know what my sitemap URL is.
Thank you.
Try adding the noindex meta tag.
Block search indexing with 'noindex'
You can prevent a page from appearing in Google Search by including a noindex meta tag in the page's HTML code, or by returning a 'noindex' header in the HTTP response. When Googlebot next crawls that page and sees the tag or header, it will drop that page entirely from Google Search results, regardless of whether other sites link to it.
See https://support.google.com/webmasters/answer/93710
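As a rough illustration of the header approach, a noindex directive could be attached where the page is served. This is only a sketch, assuming a Flask backend; the route and helper are hypothetical:

from flask import Flask, make_response

app = Flask(__name__)

def build_sitemap():
    # Hypothetical helper standing in for however the real sitemap page is produced.
    return "<urlset><url><loc>http://foo.com/</loc></url></urlset>"

@app.route("/oddname.php")  # the sitemap URL from the question
def sitemap_page():
    resp = make_response(build_sitemap())
    # A header-based noindex also works for non-HTML responses such as XML sitemaps.
    resp.headers["X-Robots-Tag"] = "noindex"
    return resp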

How to prevent Google from indexing redirect URL I do not own

A domain name that I do not own is redirecting to my domain. I don't know who owns it or why it is redirecting to my domain.
This domain, however, is showing up in Google's search results. A whois lookup also returns this message:
"Domain:http://[baddomain].com webserver returns 307 Temporary Redirect"
Since I do not own this domain, I cannot set a 301 redirect or disable it. When clicking the bad domain in Google, it shows the content of my website, but baddomain.com stays visible in the URL bar.
My question is: How can I stop Google from indexing and showing this bad domain in the search results and only show my website instead?
Thanks.
Some thoughts:
You cannot directly stop Google from indexing other sites, but what you could do is add the canonical tag to your pages so Google can see that the original content is located on your domain and not on the "bad domain" (see the sketch after these notes).
For example check out : https://support.google.com/webmasters/answer/139394?hl=en
Other actions can be taken SEO-wise if 'baddomain' is outscoring you in the search rankings, because then it sounds like your site could use some optimizing.
The better your site and domain rank in the SERPs, the less likely it is that people will see the scraped content and 'baddomain'.
You could, however, also look at the referrer of the request, and if it is 'baddomain' you should be able to redirect to your own domain, change the content, etc., because the code is being run from your own server.
But that might be more trouble than it's worth, as you'd need to investigate how 'baddomain' is doing things and code accordingly (probably an iframe or similar from what you describe, but that can still be circumvented using scripts).
Depending on what country you and 'baddomain' are located in, there are also legal options, such as DMCA complaints. This, however, can also be quite a task, and it's often not worth it because a new domain will just pop up.
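For the canonical-tag suggestion above, here is a minimal sketch, assuming a Flask-served site; the domain name and the catch-all route are hypothetical placeholders. Each page declares its own URL on your domain as canonical, so even when the content is reached through 'baddomain', search engines are told where the original lives:

from flask import Flask, render_template_string

app = Flask(__name__)

PAGE = """<!doctype html>
<html><head>
<link rel="canonical" href="https://www.mydomain.example/{{ path }}">
</head><body>...page content...</body></html>"""

@app.route("/", defaults={"path": ""})
@app.route("/<path:path>")
def page(path):
    # The canonical link always points at our own domain, whichever host served the request.
    return render_template_string(PAGE, path=path)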

Googlebot vs "Google Plus +1 Share Button bot"?

Site Setup
I have a fully client-side one-page web app that is dynamically updated and routed on the client side. I redirect any #! requests to a headless server that renders the request with JavaScript executed and returns the final HTML to the bot. The head of the site also contains:
<meta name="fragment" content="!">
Fetch as Google works
Using the Fetch as Google webmaster tool, in the Fetch Status page, I can see that the jQuery I used to update the og:title, og:image, and og:description was executed and the default values replaced. Everything looks good, and if I mouseover the URL, the screenshot is correct.
However, with the Google Plus button, no matter what values og:title, og:image, and og:description tags are updated to, the share pop-up always uses the default/initial values.
Attempted use
I call this each time the site content is updated, rerouted, and the og meta content changed:
gapi.plusone.render("plusone-div");
I was assuming that if this approach works for the Googlebot, it should also work for the +1 button. Is there a difference between the Googlebot and whatever is used by +1 to retrieve the site metadata?
edit:
Passing a URL containing the #! results in a 'site not found':
gapi.plusone.render("plusone-div", {"href": "http://www.site.com/#!city/Paris"});
The Google crawler does not render the snippet when the +1 button is rendered, but rather when a user clicks the +1 (or share) button. What you should try to determine is what your server is sending to the Google crawler during this user-initiated, asynchronous load.
You can emulate this by using the following cURL command:
curl -A "Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google (+https://developers.google.com/+/web/snippet/)" http://myurl.com/path/to/page
You can write the output to a file by adding -o testoutput.html to the command.
This will give you an idea of what the Google crawler sees when it encounters your page. The structured data testing tool can also give you hints.
What you'll likely see is that, unless you're doing your snippet preparation in a static file or on the server side, you're not going to get the snippet that you desire.
If you can provide real URLs to test, I can probably provide more specific feedback.
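One way to act on that, sketched below under the assumption of a Flask backend with a prerendering helper (both hypothetical), is to detect the snippet crawler by its user agent and return the same prerendered HTML that the #! routes already produce for Googlebot:

from flask import Flask, request

app = Flask(__name__)

SNIPPET_UA_MARKER = "developers.google.com/+/web/snippet"  # as seen in the curl user agent above

def prerender(path):
    # Hypothetical call into the headless renderer used for the #! routes.
    return "<html><head><meta property='og:title' content='Paris'></head></html>"

@app.route("/", defaults={"path": ""})
@app.route("/<path:path>")
def page(path):
    if SNIPPET_UA_MARKER in request.headers.get("User-Agent", ""):
        return prerender(path)  # give the snippet crawler the finished og tags
    return "<html>client-side app shell</html>"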
Google+ fetches the page using the _escaped_fragment_ query parameter, but without the equals sign.
So it would fetch http://www.site.com/?_escaped_fragment_ and NOT http://www.site.com/?_escaped_fragment_=
The Google Search crawler still uses the fragment with the equals sign; this only applies to the Google+ crawler.
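A minimal sketch of that detail, assuming a Flask backend (the route and the snapshot text are hypothetical): treat a bare ?_escaped_fragment_ (no equals sign) the same as ?_escaped_fragment_= with a value:

from flask import Flask, request

app = Flask(__name__)

@app.route("/")
def index():
    if "_escaped_fragment_" in request.args:  # present with or without the "="
        fragment = request.args.get("_escaped_fragment_", "")
        return "prerendered snapshot for #!%s" % fragment  # hypothetical snapshot
    return "normal client-side app shell"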

Is a page redirect to the most current article bad from an SEO perspective?

I have navigation at the top of my site that links to news/
The news is always paginated, one article per page, with the ability to navigate to the next or previous article.
I would like the default article to be the second-to-last article (the second most recent). So if there are 10 articles, when the user clicks on news/, they are redirected to news/9 with a 302 redirect.
From an SEO perspective, is it bad to be constantly redirecting like this? Would it be better to change the link in the top navigation to link directly to news/9 instead, and keep changing that every time there is a new article?
Search engines expect a given piece of content to have a canonical URL. It's OK to have any number of URLs pointing to a single page, as long as you declare a canonical URL.
So no matter what redirect you have, add a canonical URL and search engines will take care of any mess.
302 redirects exist for exactly this case. Use them.
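As a sketch of how the two pieces fit together, assuming a Flask app and a hypothetical article count: news/ issues the 302, and every article page declares its own canonical URL, so the redirect itself cannot muddy indexing:

from flask import Flask, redirect, render_template_string, url_for

app = Flask(__name__)

ARTICLE_COUNT = 10  # hypothetical; the real count would come from the CMS

@app.route("/news/")
def news_index():
    latest = ARTICLE_COUNT - 1  # e.g. article 9 of 10, as in the question
    return redirect(url_for("article", num=latest), code=302)

@app.route("/news/<int:num>")
def article(num):
    return render_template_string(
        '<link rel="canonical" href="https://example.com/news/{{ n }}">'
        "<h1>Article {{ n }}</h1>", n=num)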

SEO redirects for removed pages

Apologies if SO is not the right place for this, but there are 700+ other SEO questions on here.
I'm a senior developer for a travel site with 12k+ pages. We completely redeveloped the site and relaunched in January, and with the volatile nature of travel, there are many pages which are no longer on the site. Examples:
/destinations/africa/senegal.aspx
/destinations/africa/features.aspx
Of course, we have a 404 page in place (and it's a hard 404 page rather than a 30x redirect to a 404).
Our SEO advisor has asked us to 30x redirect all our 404 pages (as found in Webmaster Tools), his argument being that 404s are damaging to our PageRank. He'd want us to redirect our Senegal and features pages above to the Africa page (which doesn't contain the content previously found on senegal.aspx or features.aspx).
An equivalent for SO would be taking a url for a removed question and redirecting it to /questions rather than showing a 404 'Question/Page not found'.
My argument is that, as these pages are no longer on the site, 404 is the correct status to return. I'd also argue that redirecting these to less relevant pages could damage our SEO (due to duplicate content, perhaps)? It's also very time-consuming to redirect all 404s when our site takes some content from our in-house system, which adds and removes content at will.
Thanks for any advice,
Adam
The correct status to return is 410 Gone. I wouldn't want to speculate about what search engines will do if they are redirected to a page with entirely different content.
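A minimal sketch of returning 410 for deliberately removed pages, assuming a Flask app; the set of retired paths is hypothetical (taken from the examples in the question):

from flask import Flask, abort, request

app = Flask(__name__)

# Hypothetical set of deliberately retired URLs (the examples from the question).
REMOVED = {
    "/destinations/africa/senegal.aspx",
    "/destinations/africa/features.aspx",
}

@app.before_request
def gone_for_removed_pages():
    if request.path in REMOVED:
        abort(410)  # 410 Gone: the page was removed on purpose, not merely missing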
As far as I know, a 404 is quite bad for SEO because your site won't get any PageRank from pages that are linked from elsewhere but missing.
I would add another page that explains that, due to the redesign, the original pages are no longer available, and offers links to the most relevant alternatives (e.g. to Africa and the FAQ). That page sounds like a good 301 target for those URLs.
This is actually a good idea.
As described at http://www.seomoz.org/blog/url-rewrites-and-301-redirects-how-does-it-all-work
(which is a good resource for the non-SEO people here)
404 is obviously not good. A 301 tells spiders/users that this is a permanent redirect of a resource. The content should not get flagged as duplicate because you are not sending a 200 (good page) response, so there is nothing to be spidered and compared.
This IS kind of a grey-hat tactic, though, so be careful. It would be much better to put proper 301 targets in place where the pages are being looked for, and also to find who posted the erroneous links and, if possible, correct them.
I agree that 404 is the correct status, but then again you should take a step back and answer the following questions:
Do these old pages have any inbound links?
Did these old pages have any good content relevant to the page you are 301'ing them to?
Is there any active traffic that is trying to reach these pages?
While the pages may no longer exist, I would investigate the pages in question with those three questions, because you can steer incoming traffic and PageRank to other existing pages that either need the PR/traffic or are already high-traffic.
With regard to your SEO advisor saying you are losing PR: this can be true if those pages have inbound links, because those links will be met with a 404 status code and will not pass link juice, since nothing exists there any more. That's why 301s rock.
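For the cases where those three questions say a redirect is worthwhile, here is a minimal sketch of the 301 mapping, assuming a Flask app; the mapping itself is a hypothetical example:

from flask import Flask, redirect, request

app = Flask(__name__)

# Hypothetical mapping of retired URLs to their most relevant live replacements.
REDIRECTS = {
    "/destinations/africa/senegal.aspx": "/destinations/africa/",
    "/destinations/africa/features.aspx": "/destinations/africa/",
}

@app.before_request
def redirect_retired_pages():
    target = REDIRECTS.get(request.path)
    if target:
        return redirect(target, code=301)  # permanent redirect passes the link equity along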
404s should not affect the overall PageRank of the other pages of a website.
If they are really gone, then 404/410 is appropriate. Check the official Google Webmasters blog.