I have been looking for the answer but cannot find it.
I have the following URL structure:
http://mysite-com/universities/UNIVERSITY_ID/review
I am trying to disallow the review page while keeping the UNIVERSITY_ID page, which is dynamic, crawlable.
How can I accomplish this in robots.txt?
Would this work: Disallow: /universities/*/review
Thanks
Using a wildcard is a good way to tackle this:
User-agent: *
Disallow: /*review$
Note that this blocks any URL ending in "review", not just those under /universities/. Your own suggestion, Disallow: /universities/*/review, is more narrowly scoped and also works with the major crawlers.
reference: https://webmasters.stackexchange.com/questions/72722/can-we-use-regex-in-robots-txt-file-to-block-urls
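To see why both patterns catch the review URL, here is a minimal sketch of Google-style pattern matching (my own illustration, not an official API): '*' matches any run of characters, a trailing '$' anchors the end, and everything else is a literal prefix match. The example paths are assumed.

```python
import re

# Sketch of Google-style robots.txt pattern matching:
# '*' matches any run of characters, a trailing '$' anchors the end,
# and everything else is a case-sensitive prefix match.
def rule_matches(pattern, path):
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "^" + ".*".join(re.escape(part) for part in body.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

print(rule_matches("/*review$", "/universities/123/review"))              # True  -> blocked
print(rule_matches("/*review$", "/universities/123"))                     # False -> still crawlable
print(rule_matches("/universities/*/review", "/universities/123/review")) # True  -> blocked
```

The difference shows up on unrelated URLs: /*review$ would also match something like /peer-review, while /universities/*/review would not.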
Related
I need help removing or disallowing some malicious URLs that Google has indexed under my main domain. I didn't really pay attention until my site broke down and I found hundreds of pages like website dot com / 10588msae28bdem12b84
Now I want to Disallow: all of them in robots.txt.
I also want to remove them from the Google index. Any advice would be appreciated. Thanks
You could use ten rules with wildcards like this:
User-agent: *
Disallow: /*0
Disallow: /*1
Disallow: /*2
Disallow: /*3
Disallow: /*4
Disallow: /*5
Disallow: /*6
Disallow: /*7
Disallow: /*8
Disallow: /*9
However, this is probably not the best solution to your problem. It is likely that your site was hacked. You should clean that up by following this guide: Help, I think I've been hacked! | Web Fundamentals | Google Developers
You need to find a way of removing the hack so that these URLs return a 404 or 410 status code, which Google won't index. Then you should actually let Google crawl those URLs so it learns they are gone; if they are blocked in robots.txt, Googlebot can never see the 404s and the URLs may linger in the index.
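To illustrate why the ten digit rules are a blunt instrument: since robots.txt rules are prefix matches ('*' expands to "anything"), /*0 through /*9 block every URL that contains any digit anywhere. A sketch, with assumed example paths:

```python
import re

# The ten digit rules above, translated the way Google-style matching
# works: '*' -> '.*', and rules are prefix matches (no end anchor).
digit_rules = ["/*%d" % d for d in range(10)]

def blocked(path):
    return any(re.match("^" + rule.replace("*", ".*"), path) for rule in digit_rules)

print(blocked("/10588msae28bdem12b84"))  # True: the spam URLs are caught
print(blocked("/about-us"))              # False: no digits, still crawlable
print(blocked("/top-10-posts"))          # True: legitimate pages with digits are caught too
```

The last line is the collateral damage that makes cleaning up the hack the better fix.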
I'm trying to make sure our robots.txt file is correct and would greatly appreciate some info. We want all bots to be able to crawl and index the homepage and the 'sample triallines' but that's it. Here's the file:
User-agent: *
Allow: /$
Allow: /sample-triallines$
Disallow: /
Can anyone please let me know if this is correct?
Thanks in advance.
You can test your robots.txt file directly with a robots testing tool or within the webmaster tools of most major search engines (e.g. Google Search Console). Your current robots.txt file will work for most crawlers for the exact URLs you mentioned (e.g. https://www.example/ and https://www.example/sample-triallines).
However, just to note, if your URLs deviate from these exact URLs they will be blocked to crawlers (e.g. URLs with tracking parameters). For example, the URLs below will be blocked with the current robots.txt setup, which may or may not be acceptable for what you're working on.
https://www.example/index.html
https://www.example/?marketing=promo
https://www.example/sample-triallines/
https://www.example/sample-triallines?marketing=promo
If any of these above URLs need to be crawled you'll just need to add additional directives into the robots.txt file as needed and test them within the robots testing tools. Additional information on robots directives can be found here.
Hope this helps
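To make the behavior above concrete, here is a sketch of the rule-precedence logic Google documents (the longest matching rule wins, and ties go to Allow); the matcher and example paths are my own illustration, not an official API:

```python
import re

# Translate a Google-style robots pattern: '*' -> '.*',
# trailing '$' -> end anchor, otherwise a prefix match.
def rule_regex(pattern):
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    return "^" + ".*".join(re.escape(p) for p in body.split("*")) + ("$" if anchored else "")

def is_allowed(rules, path):
    hits = [(len(pat), kind == "Allow")
            for kind, pat in rules if re.match(rule_regex(pat), path)]
    if not hits:
        return True           # no matching rule -> crawlable by default
    hits.sort(reverse=True)   # longest pattern first; Allow beats Disallow on ties
    return hits[0][1]

rules = [("Allow", "/$"), ("Allow", "/sample-triallines$"), ("Disallow", "/")]
print(is_allowed(rules, "/"))                   # True:  Allow /$ is more specific than Disallow /
print(is_allowed(rules, "/sample-triallines"))  # True
print(is_allowed(rules, "/sample-triallines/")) # False: the $ anchor no longer matches
print(is_allowed(rules, "/?marketing=promo"))   # False
```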
I am trying to disallow a specific page and its parameters along with a parameter on the entire site. Below I have the exact examples.
We now have a page that will redirect and track external URLs. Any external URL we want to track will be linked like /redirect?u=http://example.com We do not want to add rel="nofollow" to every link.
Last but not least (our biggest SEO and indexing issue), every single page has an auto-generated URL to disable or enable the mobile version, so it can appear on any page, like /?mobileVersion=off (or on) or /accounts?login_to=%2Fdashboard&mobileVersion=off
Basically, the easy way to disallow the two parameters would be to disallow mobileVersion and u on any page. (u is the parameter needed to redirect the URL and is only valid on /redirect.)
My current robots.txt config:
User-Agent: *
Disallow: /redirect
Disallow: / *?*mobileVersion=off
If you want to see our full robots.txt files its located at http://spicethymeinc.com/robots.txt.
You could change
Disallow: / *?*mobileVersion=off
to
Disallow: /*mobileVersion=off
but it looks like it should work.
I'm going off the wildcard section and examples on this page:
http://tools.seobook.com/robots-txt/
Edit: I have tested with Googlebot and Googlebot Mobile. They are blocked by both your current robots.txt and my suggested change. Google Webmaster Tools has a handy robots checker you can use to test.
How would I disallow all dynamic pages within my robots.txt?
E.g.
page.php?hello=there
page.php?hello=everyone
page.php?thank=you
I would like page.php AND all possible dynamic versions to be disallowed.
At the moment I have
User-Agent: *
Disallow: /page.php
But this still allows e.g. page.php?hello=there
Thanks
What you've already got should block all access to /page.php for all search engines which respect robots.txt, no matter what query string parameters are provided, because robots.txt rules are prefix matches against the full path and query.
Don't forget robots.txt is only for robots :-) If you're trying to block users from accessing the page you'll need to use .htaccess or similar
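You can verify the prefix-match behavior with Python's standard-library parser, which implements exactly this kind of plain prefix matching (note it does not support * or $ wildcards); the example URLs are assumed:

```python
from urllib.robotparser import RobotFileParser

# Feed the robots.txt rules directly, without fetching anything.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /page.php",
])

print(rp.can_fetch("*", "https://example.com/page.php"))              # False
print(rp.can_fetch("*", "https://example.com/page.php?hello=there"))  # False: prefix match covers query strings
print(rp.can_fetch("*", "https://example.com/other.php"))             # True
```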
I have been trying to get an answer on this question on various Google forums but no-one answers so I'll try here at SO.
I had an old site that used different URL parameters like
domain.com/index.php?showimage=166
domain.com/index.php?x=googlemap&showimage=139
How can I block access to these pages for these parameters, without blocking my domain.com/index.php page itself?
Can this be done in robots.txt?
EDIT: I found a post here: Ignore urls in robot.txt with specific parameters?
Allow: *
Disallow: /index.php?showImage=*
Disallow: /index.php?x=*
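One caveat with the rules quoted above: robots.txt matching is case-sensitive, and the URLs in the question use showimage (lowercase) while the rule says showImage, so the rule would not match them as written. A sketch matcher (my own illustration, not an official API) makes this visible:

```python
import re

# Robots.txt patterns are case-sensitive prefix matches; '*' -> '.*'.
def rule_matches(pattern, path_and_query):
    regex = "^" + ".*".join(re.escape(p) for p in pattern.split("*"))
    return re.match(regex, path_and_query) is not None

print(rule_matches("/index.php?showImage=*", "/index.php?showimage=166"))     # False: case mismatch
print(rule_matches("/index.php?showimage=*", "/index.php?showimage=166"))     # True
print(rule_matches("/index.php?x=*", "/index.php?x=googlemap&showimage=139")) # True
```

So the Disallow lines need to use the same casing as the live URLs (and the trailing * is redundant, since rules are prefix matches anyway).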