Robots.txt and secret URLs with ?xx=xx

I have several secret URLs such as ?id=123, ?id=567, or example.com/123, and I need to block all of them from crawlers while keeping my homepage accessible. The problem is that Disallow: /* only works with Google.
My first robots.txt (blocked by Google)
User-Agent: *
Allow: /
Disallow: /$
I have since replaced example.com/123 with example.com/?id=123, because $ did not work, and now I use:
User-Agent: *
Allow: /
Disallow: /?id=
I have also added a robots meta tag via PHP:
$robotIndex = "index,nofollow";
if (!empty($_GET)) {
    // any query string (e.g. ?id=123) switches the page to noindex
    $robotIndex = "noindex,nofollow";
}
// $robotIndex is then echoed into the page's <meta name="robots"> tag
Is it correct? What is the syntax to disallow all pages except the homepage?

Recently, Google added a robots.txt testing tool to Webmaster Tools (under the Crawl section). You can add a rule and test a URL against it, which lets you verify whether your configuration behaves as intended.
Also under the Crawl section is the URL Parameters option, where you can declare whether (and how) the parameters in your URLs change the content of the page, and whether those URLs should be indexed.
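To answer the syntax part of the question: a minimal robots.txt that keeps only the homepage crawlable and blocks the parameterized URLs (a sketch relying on the $ and wildcard extensions, which Google supports but not every crawler does) would be:
User-Agent: *
Allow: /$
Disallow: /
Allow: /$ matches only the bare homepage URL, and Disallow: / blocks everything else, including /?id=123 and /123; the testing tool above is a good way to confirm the behavior for individual URLs.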

Related

Want to disallow a few URLs with robots.txt

I want to block a few URLs in robots.txt, but I really don't know how to do this.
Below I have mentioned the URL. How should I disallow this dynamic URL? I would really appreciate your help in clearing up these doubts.
https://falgunishanepeacock.in/order-inquire?sku=FSPI-20NOVUN03LH
User-agent: Googlebot
Allow: /order-inquire$
Disallow: /order-inquire*
Test it with:
https://www.google.com/webmasters/tools/robots-testing-tool
Reference:
https://developers.google.com/search/docs/advanced/robots/robots_txt?csw=1
* designates 0 or more instances of any valid character.
$ designates the end of the URL.
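Applied to the URL from the question, those two rules resolve roughly like this under Google's precedence rules (the longer match wins, and Allow wins ties):
/order-inquire                          allowed   (Allow: /order-inquire$)
/order-inquire?sku=FSPI-20NOVUN03LH     blocked   (Disallow: /order-inquire*)
This is only an illustration; the robots testing tool linked above is the authoritative check.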

Firebase hosting multiple domains rewrites for specific robots.txt

I have my website setup correctly via Firebase hosting and have connected multiple domains to it:
https://domain1.com, https://domain2.com and https://domain3.com
For SEO, each domain should ideally have its own robots.txt with a sitemap.xml reference in it:
User-agent: *
Disallow:
Sitemap: https://domain1 or domain2 or domain3/sitemap.xml
The sitemap URL in robots.txt should be a full URL (not relative), and I can only specify one per file. So in the root of my project I created separate files such as domain1_robots.txt and domain2_robots.txt.
I read Firebase documentation about rewrite rules and glob patterns:
https://firebase.google.com/docs/hosting/full-config#glob_pattern_matching
But these patterns do not seem to match on the host name.
So how can I create rewrite rules like this:
https://domain1.com/robots.txt -> /domain1_robots.txt
https://domain2.com/robots.txt -> /domain2_robots.txt
https://domain3.com/robots.txt -> /domain3_robots.txt
So that I can correctly serve the right robots.txt for each domain?
This would be possible with a Cloud Function and some JS code, but I do not want to invoke a Cloud Function for every crawler request (cost).
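Hosting rewrites and glob patterns appear to match only the request path, not the host, so they cannot pick a file per domain. One approach that avoids a Cloud Function, sketched here assuming the project is split into one Hosting site per domain (the target and directory names below are made up for illustration), is to give each site its own public directory containing its own robots.txt and sitemap.xml:
firebase.json:
{
  "hosting": [
    { "target": "domain1", "public": "public-domain1" },
    { "target": "domain2", "public": "public-domain2" },
    { "target": "domain3", "public": "public-domain3" }
  ]
}
Each target is then bound to a Hosting site with firebase target:apply hosting domain1 <site-id>, and each custom domain is connected to its own site, so https://domain1.com/robots.txt is served statically from public-domain1/robots.txt.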

Disallow routes in robots.txt

If I have routes like /info/page1 and /info/page2, but the route /info itself doesn't exist, and I write Disallow: /info in robots.txt, will robots still go to /info/page1?
If you disallow /info, compliant robots will not crawl /info/* either, because robots.txt rules are prefix matches; the path /info does not need to exist as a real page.
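A minimal illustration using the paths from the question:
User-agent: *
Disallow: /info
This blocks /info, /info/page1, /info/page2 and any other URL whose path starts with /info, whether or not /info itself exists. (It would also block something like /information; use Disallow: /info/ if only the sub-paths should be blocked.)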

Add a rule to block URLs in robots.txt

I want to allow robots to read only these URLs on my site:
example.com/site/faq
example.com/site/blog
example.com/site/aboutus
All other URLs need to be blocked. What are the rules? Thanks.
I think what you're looking for is:
User-agent: *
Allow: /site/faq
Allow: /site/blog
Allow: /site/aboutus
Disallow: /
That specifically allows the three folders you mentioned, and disallows everything else.
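As a quick sanity check of how those rules resolve (the Allow rules win for the three paths because they are the longer, more specific matches; /site/contact is just a made-up example path):
example.com/site/faq        allowed
example.com/site/blog       allowed
example.com/site/aboutus    allowed
example.com/site/contact    blocked
example.com/                blocked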

Robots.txt to allow only specified URLs in a domain

I want to allow only the main URL (the domain root) and http://domain/about; the other URLs should not be visible to Google search. For example, I have links like these:
http://example.com
http://example.com/about
http://example.com/other1
http://example.com/other2
http://example.com/other3
http://example.com/other4
http://example.com/other5
http://example.com/other6
and more URLs.
My question is: what should the content of robots.txt be? I want to allow just http://example.com and http://example.com/about. My site uses WordPress.
Thanks.
What you want is:
User-agent: *
Allow: /$
Allow: /about
Disallow: /
The $ indicates that the URL string has to end there. So it won't allow, for example, http://example.com/foo.
User-agent: *
Allow: /about
Disallow: /