Removing my login page from search engine crawling - robots.txt

I have a login page (login.aspx) that is currently indexed in Google and shows up when somebody does a search.
I have created a robots.txt file with the following:
User-agent: *
Disallow: /login.aspx
My question is: how long will it take before my login.aspx page is no longer indexed by Google? Is there anything else I need to do to tell Google not to index my login page?

It could take up to 90 days before the page is removed from Google's index, but realistically it takes a week or two to update. You could also ask Google to remove the page through Webmaster Tools, but that works the same way as the crawler.

You might also want to log in to Google Webmaster Tools and use the "Remove URL" feature under Site Configuration/Crawler access, and also increase the crawl rate under Site Configuration/Settings. This might help accelerate the removal of the URL.
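
As a quick local sanity check, the two rules from the question can be fed to Python's standard-library robots.txt parser. This is only a sketch: the host www.example.com is a placeholder, and Python's parser is not Google's, so it shows how the rules read, not when Google will act on them.

```python
from urllib.robotparser import RobotFileParser

# The two rules from the question, exactly as written there.
ROBOTS_TXT = """\
User-agent: *
Disallow: /login.aspx
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # www.example.com is a placeholder host, not the asker's site.
    print(is_allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/login.aspx"))
    print(is_allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/default.aspx"))
```

If the first call reports that the login page may be fetched, the rule is not matching the path you expect.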

Related

English Literature site is not indexing new pages in google

This is my site link:
www.englishact.com
This is the sitemap current position:
Google is showing no errors for the sitemap or any other pages, but the number of indexed pages has been 0 for about 3 months. I have also uploaded new sitemaps, which behave the same way: nothing gets indexed.
NB:
I am using a 1and1 paid hosting package. Also, Google has accepted the site for AdSense. What can I do now? Any suggestions?
Your website is indexed on Google. I just searched for site:www.englishact.com and got many results.
Check whether the links in your XML sitemap are valid or whether they redirect to another URL.
You also have to resolve the duplicate URLs: your website can be reached both with and without www, and the homepage has two URLs, http://englishact.com/ and http://www.englishact.com/index.php
After fixing these errors your website should be healthy, and Google will understand its structure.
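
To make the duplicate-URL point concrete, here is a minimal sketch that maps each variant onto one canonical form. It assumes the www host is the preferred one and that /index.php should collapse to the directory URL; both are assumptions, not something the site owner has confirmed. On a live site this mapping would normally be enforced with server-side 301 redirects rather than in a script.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the www form is the one the site owner wants indexed.
CANONICAL_HOST = "www.englishact.com"
BARE_HOST = "englishact.com"


def canonical_url(url: str) -> str:
    """Map the duplicate URL variants of the site onto a single form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == BARE_HOST:
        host = CANONICAL_HOST
    path = parts.path or "/"
    # Treat ".../index.php" as the same page as the directory URL ".../".
    if path.endswith("/index.php"):
        path = path[: -len("index.php")]
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


if __name__ == "__main__":
    for variant in ("http://englishact.com/",
                    "http://www.englishact.com/index.php"):
        print(variant, "->", canonical_url(variant))
```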

Different Google Index Information

About three months ago, I launched my own website. On the first day, I also verified the site in Google Webmaster Tools, linked it to my Google Analytics account, and submitted a sitemap index file that points to five sitemap files.
But so far I keep getting different Google index status figures:
In Webmaster Tools:
Menu: Crawl -> Sitemaps: 123,861 Urls submitted, 64,313 Urls indexed
Menu: Google Index -> Index Status: 65,375 Urls indexed
When I search google.de for “site:www.mysite.de”, I get 103,000 results.
When I check my website with push2check.net, I get 110,000 URLs in the Google index.
What is wrong here? I understand that it is impossible for Google to deliver accurate data because of distributed processing, and that the result also depends on the location you are searching from, and so on. But the gap between 65,000 and 110,000 is huge. What is the reason?
Thanks in advance!
Toby
Searching google.de for “site:www.mysite.de” lists the pages of your site that Google reports as indexed.
push2check.net runs its own search and counts every result in which a link to your website appears.
Because the two count different things, the results differ.

How come when I block a directory in robots.txt, its contents are still coming up?

This is what I've got in my robots.txt, placed in the base directory, of course:
User-Agent: *
Disallow: /foo/
But then, in Google, I have no index of /foo/, but for some reason, I still have /foo/foo.php showing up as a link in Google.
How come? Did I write something incorrectly? Do I need to write something else?
If you added robots.txt after your site went live, Google may already have indexed files under /foo/.
You can remove already indexed files with a removal request in Google Webmaster Tools.
robots.txt does not prevent Google from linking to your blocked pages. Google won't index the content of your blocked pages (so it won't show the page title, description, or snippet), but if it finds a link to a blocked page, it might still list that URL in its search results.
If you also want to forbid this linking, you could use the meta element with the name robots and the value noindex.
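
For illustration, here is a small sketch that checks whether a page's HTML carries a robots meta element with noindex, using only Python's standard library. The class and function names are made up for this example. Keep in mind that a crawler can only see the tag if it is allowed to fetch the page in the first place.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> element."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() == "robots":
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = ('<html><head><meta name="robots" content="noindex">'
            '</head><body>foo</body></html>')
    print(has_noindex(page))
```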

Octopress, github pages, CNAME domain and google website search

My blog was successfully transferred to Octopress and GitHub Pages. My problem, though, is that the website's search uses Google search, and the results of a search, as you can see, point to the old (WordPress) links. These links have since changed structure and now follow the default Octopress structure.
I don't understand why this is happening. Is it possible that Google has stored the old links in its DB (my blog was on the first page for some searches, but gathered just 3,000 hits/month... not much by the internet's standards) and that this will change with time, or is it something I can change somehow?
thanks.
1. You can wait for Google to crawl and re-index your
pages, or you can use the URL Removal Request tool
to expedite removal of old pages from the index.
http://www.google.com/support/webmasters/bin/answer.py?answer=61062
According to that page, the removal process
"usually takes 3-5 business days."
Consider submitting a Sitemap:
http://www.google.com/support/webmasters/bin/answer.py?answer=40318
click here to resubmit your sitemap.
More information about Sitemaps:
http://www.google.com/support/webmasters/bin/answer.py?answer=34575
http://www.google.com/support/webmasters/bin/topic.py?topic=8467
http://www.google.com/support/webmasters/bin/topic.py?topic=8477
https://www.google.com/webmasters/tools/docs/en/protocol.html
2. Perhaps your company might consider the
Google Mini? You could set up the Mini to
crawl the site every night or even 'continuously'.
http://www.google.com/enterprise/mini/
According to the US pricing page,
the Mini currently starts at $1995 for a
50,000-document license with a year of support.
Here is the Google Mini discussion group:
http://groups.google.com/group/Google-Mini
http://www.google.com/enterprise/hosted_vs_appliance.html
(Click: "show all descriptions")
http://www.google.com/support/mini/
(Google Mini detailed FAQ)
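
For the Sitemap suggestion in point 1, a minimal sitemap file can be produced with a few lines of Python. This is a sketch only: the example.com URLs are placeholders for the new Octopress-style links, and the function name is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal sitemap document listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_element = ET.SubElement(urlset, "url")
        ET.SubElement(url_element, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Placeholder URLs; replace them with the blog's real new links.
    new_links = [
        "http://example.com/blog/2013/01/01/first-post/",
        "http://example.com/blog/2013/02/01/second-post/",
    ]
    with open("sitemap.xml", "w", encoding="utf-8") as handle:
        handle.write(build_sitemap(new_links))
```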

robots.txt: user-agent: Googlebot disallow: / Google still indexing

Look at the robots.txt of this site:
fr2.dk/robots.txt
The content is:
User-Agent: Googlebot
Disallow: /
That ought to tell google not to index the site, no?
If true, why does the site appear in google searches?
Besides having to wait, because Google's index updates take some time, also note that if you have other sites linking to your site, robots.txt alone won't be sufficient to remove your site.
Quoting Google's support page "Remove a page or site from Google's search results":
If the page still exists but you don't want it to appear in search results, use robots.txt to prevent Google from crawling it. Note that in general, even if a URL is disallowed by robots.txt we may still index the page if we find its URL on another site. However, Google won't index the page if it's blocked in robots.txt and there's an active removal request for the page.
One possible alternative solution is also mentioned in the same document:
Alternatively, you can use a noindex meta tag. When we see this tag on a page, Google will completely drop the page from our search results, even if other pages link to it. This is a good solution if you don't have direct access to the site server. (You will need to be able to edit the HTML source of the page).
If you just added this, you'll have to wait; it's not instantaneous. Until Googlebot comes back to respider the site and sees the robots.txt, the site will still be in their database.
I doubt it's relevant, but you might want to change your "Agent" to "agent" - Google's most likely not case sensitive for this, but can't hurt to follow the standard exactly.
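
As a rough check of the capitalization point, the file can be run through Python's standard-library robots.txt parser with both spellings of the field name. Python's parser is not Google's, so this only shows that one common implementation treats "User-Agent" and "User-agent" alike; it says nothing about how quickly Google drops pages that are already indexed. The agent name "OtherBot" is a made-up example.

```python
from urllib.robotparser import RobotFileParser

# The file as published on the site, and the same rules with the
# conventional "User-agent" capitalization.
AS_PUBLISHED = "User-Agent: Googlebot\nDisallow: /\n"
CONVENTIONAL = "User-agent: Googlebot\nDisallow: /\n"


def is_blocked(robots_txt: str, user_agent: str, path: str = "/") -> bool:
    """Return True if the rules forbid user_agent from fetching path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for label, text in (("as published", AS_PUBLISHED),
                        ("conventional", CONVENTIONAL)):
        print(label,
              "Googlebot blocked:", is_blocked(text, "Googlebot"),
              "OtherBot blocked:", is_blocked(text, "OtherBot"))
```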
I can confirm Google doesn't respect the Robots Exclusion File. Here's my file, which I created before putting this origin online:
https://git.habd.as/robots.txt
And the full contents of the file:
User-agent: *
Disallow:
User-agent: Google
Disallow: /
And Google still indexed it.
I haven't used Google since cancelling my account last March, and I never added this site to any webmaster console other than Yandex's, which leaves me with two assumptions:
Google is scraping Yandex
Google doesn't respect the Robots Exclusion Standard
I haven't grepped my logs yet, but I will, and my assumption is that I'll find Google spiders misbehaving in there.