Remove sitemap URL from Google results

My sitemap page http://foo.com/oddname.php appears in Google's results if I search "oddname site:foo.com".
How can I remove it from the results without using robots.txt? I don't want anyone to know what my sitemap URL is.
Thank you.

Try adding the noindex meta tag.
Block search indexing with 'noindex'
You can prevent a page from appearing in Google Search by including a noindex meta tag in the page's HTML code, or by returning a noindex header in the HTTP response. When Googlebot next crawls that page and sees the tag or header, it will drop that page entirely from Google Search results, regardless of whether other sites link to it.
See https://support.google.com/webmasters/answer/93710
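Since the page in question is a PHP file, the header variant is easy to apply without touching robots.txt. A minimal sketch, assuming oddname.php is an ordinary PHP script (header() must run before any output):

<?php
// oddname.php — ask crawlers to drop this page from their index.
header('X-Robots-Tag: noindex');
// ... generate and output the sitemap as usual ...

Note that the page must stay crawlable for this to work: blocking it in robots.txt would prevent Googlebot from ever fetching the page and seeing the noindex, which is another reason to avoid robots.txt here.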

Related

How to remove Google search results for a 303 redirect?

I run a dynamic site that may or may not redirect a certain route based on user preferences.
Let's say it's http://clientname.example.com/maybe. Our backend has a response for /maybe, but if the client decides they would rather use their site for the information on that page, we instead use a 303 Redirect to their page on a separate domain.
All of our content pages use the <meta name="robots" content="noindex"> tag, so Google will not index any of our pages. However, when I search Google for "site:our_domain_name.com", I get a bunch of results that all trace back to those dynamic routes that return a 303. When I click on the search results in Google, the 303 is followed as expected and I arrive at the client's site. What I want is for my piece of the puzzle not to show in results at all.
I was troubleshooting it this morning and realized that our noindex meta tag was obviously not being seen by the robot, since it was following the redirect, so I added a rule on the server that adds the 'X-Robots-Tag: noindex' header to redirect responses.
Is that enough? If I wait long enough, will those search results be removed?
No, because if an external page links to your site, Google will follow the link to your site, then follow your 303 (if you return such a code), and never see the noindex.
Don't return a 303 for Googlebot and you should be fine. It may take a bit of time, because Google needs to reprocess the page and see the noindex before removing it.
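As a rough sketch of that idea (hypothetical PHP handler for the /maybe route; matching on the User-Agent string is a simplification, and render_maybe_page is a made-up helper):

<?php
$ua = $_SERVER['HTTP_USER_AGENT'] ?? '';
if (stripos($ua, 'Googlebot') !== false) {
    // Answer the crawler directly, with noindex, so it actually sees the tag.
    header('X-Robots-Tag: noindex');
    echo render_maybe_page();
} else {
    // Regular visitors still get the client's preferred destination.
    header('Location: https://clientsite.example.com/maybe', true, 303);
}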

Googlebot vs "Google Plus +1 Share Button bot"?

Site Setup
I have a fully client-side, one-page webapp that is dynamically updated and routed on the client side. I redirect any #! requests to a headless server that renders the request with JavaScript executed and returns the final HTML to the bot. The head of the site also contains:
<meta name="fragment" content="!">
Fetch as Google works
Using the Fetch as Google webmaster tool, in the Fetch Status page, I can see that the jQuery I used to update the og:title, og:image, and og:description was executed and the default values replaced. Everything looks good, and if I mouseover the URL, the screenshot is correct.
However, with the Google Plus button, no matter what values og:title, og:image, and og:description tags are updated to, the share pop-up always uses the default/initial values.
Attempted use
I call this after each time the site content is updated, rerouted, and og meta content updated.
gapi.plusone.render("plusone-div");
I was assuming that if this approach works for the Googlebot, it should also work for the +1 button. Is there a difference between the Googlebot and whatever is used by +1 to retrieve the site metadata?
Edit:
Passing a URL containing the #! results in a 'site not found':
gapi.plusone.render("plusone-div", {"href": "http://www.site.com/#!city/Paris"});
The Google crawler does not render the snippet when the +1 button is rendered, but rather when a user clicks the +1 button (or share button). What you should try is to determine what your server is sending to the Google crawler during this user-initiated, asynchronous load.
You can emulate this by using the following cURL command:
curl -A "Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google (+https://developers.google.com/+/web/snippet/)" http://myurl.com/path/to/page
You can write the command's output to a file by adding -o testoutput.html to the command.
This will give you an idea of what the Google crawler sees when it encounters your page. The structured data testing tool can also give you hints.
What you'll likely see is that unless you're doing your snippet preparation in a static file or on the server side, you're not going to get the snippet you desire.
If you can provide real URLs to test, I can probably provide more specific feedback.
Google+ fetches the page using the _escaped_fragment_ query parameter, but without the equals sign.
So it would fetch http://www.site.com/?_escaped_fragment_ and NOT https://www.site.com/?_escaped_fragment_=
The Google Search crawler still uses the fragment with the equals sign; this quirk applies only to the Google+ crawler.
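If the backend is PHP, both forms happen to land in $_GET the same way, so a single check covers the Search crawler and the +1 crawler. A minimal sketch (render_snapshot is a hypothetical headless-render helper):

<?php
// "?_escaped_fragment_" with no equals sign parses like "?_escaped_fragment_=":
// the key exists in $_GET either way, with an empty string for the bare form.
if (isset($_GET['_escaped_fragment_'])) {
    $fragment = $_GET['_escaped_fragment_'];
    echo render_snapshot($fragment);
    exit;
}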

Facebook Like is broken, adds a trailing # to the final URL

I'm running my site on Cargo Collective and trying to have likes per page. I cannot modify the code in the head tag, only within the body tag.
When I debug a page I get the following:
Response Code: 206
Fetched URL: http://www.iamneuron.com/Break-this
Canonical URL: http://www.iamneuron.com/Break-this
URL for Likes: http://www.iamneuron.com/#Break-this
Final URL: http://www.iamneuron.com/#Break-this
I can't figure this out and I have been searching for a while now. Even if I explicitly specify the URL rather than leaving the code to figure it out, Facebook still adjusts the URL to one with the trailing # that doesn't work.
Originally I was trying to create the like code via Facebook, but I have now switched to this, which works better with Cargo but still produces the same error:
http://randomcodescraps.tumblr.com/post/1363402555/js-dynamic-like-button-on-cargo-collective-projects
Anybody any ideas? Thanks in advance btw!
Your page doesn't have any OG tags that Facebook can use; the ones that are in your page are commented out. Add proper OG tags, then use the lint tool to check your tags and see if you still have the same problem.
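For reference, a minimal set of OG tags for the page head might look like this (placeholder values based on the debug output above; the image URL is made up):

<meta property="og:title" content="Break this" />
<meta property="og:type" content="website" />
<meta property="og:url" content="http://www.iamneuron.com/Break-this" />
<meta property="og:image" content="http://www.iamneuron.com/images/break-this.jpg" />
<meta property="og:description" content="A short description of the page." />

An explicit og:url pointing at the canonical URL also tells Facebook which URL the likes should attach to, which may help with the trailing-# rewriting.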

The title, link and description don't work

I've been reading guides and examples for hours but I can't manage to get it working. I've tried all the HTML meta tags (title, description, and the og: properties). I also tried the link sharer, and I created a new blank page containing just the info I want to share to Facebook, in order to test. I even generated a random URL parameter in PHP so there is always a different URL variable (for both the URL to share and the URL of the main page containing the script), and I ran the URL through the URL linter many times to clear Facebook's cache. It always gives me the site domain as the title, or the URL itself as the shared title and description. I don't know what to do.
The main website runs on Joomla. In Joomla's index code I put a PHP include that runs when the URL has the "articolo" id variable. This included PHP page has a regular head, body, etc. So maybe Facebook checks Joomla's main meta tags first? For now I've tried opening a popup with just the page for sharing. Look here: link
It's possible that the title is locked in, meaning that after X number of likes Facebook doesn't allow you to change it anymore. Can you give us an example URL you're having issues with?
EDIT
Ok, now the link you provided shows some very interesting output. http://modernolatina.it/wjs/index.php?option=com_content&view=article&id=96&Itemid=258&autore=6&articolo=6
First, your webserver, instead of sending back a 200 code, is sending back a 500 code.
Secondly, the HTML your webserver is sending back contains two <html> tags (do a view-source on the returned content).
Fix up those two issues and I think the linter will be happier with your page.
Test your page here:
http://developers.facebook.com/tools/debug

Get Google to index links from JavaScript-generated content

On my site I have a directory of things which is generated through jQuery AJAX calls, which subsequently create the HTML.
To my knowledge, Google and other bots aren't aware of DOM changes made after page load, and won't index the directory.
What I'd like to achieve, is to serve the search bots a dedicated page which only contains the links to the things.
Would adding a noscript tag to the directory page be a solution? (in the noscript section, I would link to a page which merely serves the links to the things.)
I've looked at both the robots.txt and the meta tag, but neither seem to do what I want.
It looks like you stumbled on the answer to this yourself, but I'll post the answer to this question anyway for posterity:
Implement Google's AJAX crawling specification. If links to your page contain #! (a URL fragment starting with an exclamation point), Googlebot will send everything after the ! to the server in the special query string parameter _escaped_fragment_.
You then look for the _escaped_fragment_ parameter in your server code, and if present, return static HTML.
(I went into a little more detail in this answer.)
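A minimal sketch of that server-side check, assuming a PHP backend (render_static_links is a hypothetical helper returning plain HTML with <a> links to the things):

<?php
// Googlebot rewrites "example.com/#!directory" to
// "example.com/?_escaped_fragment_=directory" before requesting it.
if (isset($_GET['_escaped_fragment_'])) {
    // Serve a static snapshot the crawler can index without running JavaScript.
    echo render_static_links($_GET['_escaped_fragment_']);
    exit;
}
// Otherwise fall through to the normal jQuery-driven page.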