My robots.txt is User-agent: * Disallow: /feeds/ Disallow: /*/_/ What does it mean?

I have a Google site, and today I found that Google generated a new robots.txt file:
User-agent: *
Disallow: /feeds/
Disallow: /*/_/
What does it mean? Your kind reply is appreciated.

Here is the breakdown:
User-agent: * -- Apply to all robots
Disallow: /feeds/ -- Do not crawl the /feeds/ directory
Disallow: /*/_/ -- Do not crawl any subdirectory that is named _ (the * is a wildcard, an extension to the original specification that Google supports)
For more information, see www.robotstxt.org.
The User-Agent header is how browsers and robots identify themselves.
The Disallow lines define the rules the robots are supposed to follow - in this case what they shouldn't crawl.
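To see what these two rules cover, here is a minimal Python sketch of wildcard matching as Google documents it: rules are prefix matches, * matches any run of characters, and a trailing $ anchors the end of the path. The function name rule_matches and the sample paths are made up for illustration; this is not any crawler's actual implementation.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    Assumption: Google-style matching (prefix match, '*' wildcard,
    optional trailing '$'). Crawlers that follow only the original
    specification treat '*' as a literal character instead.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start of the path, which gives prefix matching.
    return re.match(regex, path) is not None


RULES = ["/feeds/", "/*/_/"]

# Hypothetical paths, chosen only to exercise the two rules.
for path in ["/feeds/posts/default", "/site/mysite/_/rsrc/logo.png",
             "/_/rsrc/logo.png", "/home"]:
    blocked = any(rule_matches(rule, path) for rule in RULES)
    print(path, "blocked" if blocked else "allowed")
```

Under these assumptions the first two paths are blocked and the last two are allowed: /*/_/ needs at least one path segment before the /_/ part, so /_/rsrc/logo.png at the root is not covered.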

Related

How to disallow multiple folders in robots.txt

I want to disallow robots from crawling any folder/subfolder.
I want to disallow the following:
http://example.com/staging/
http://example.com/test/
And this is the code inside my robots.txt
User-agent: *
Disallow: /staging/
Disallow: /test/
Is this right, and will it work?
Yes, it is right!
You have to add a separate Disallow line for each path.
Like this:
User-agent: *
Disallow: /cgi-bin/
Disallow: /img/
Disallow: /docs/
A good trick is to use a robots.txt generator.
Another tip is to test your robots.txt with Google's robots.txt testing tool.
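If you would rather check the rules locally, Python's standard library includes urllib.robotparser. The sketch below feeds it the robots.txt from the question; the helper name is_allowed and the sample URLs are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /staging/
Disallow: /test/
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical URLs on the example.com host used in the question.
for url in ["http://example.com/staging/index.html",
            "http://example.com/test/",
            "http://example.com/products/"]:
    print(url, is_allowed(ROBOTS_TXT, "*", url))
```

Rules are prefix matches, so /staging/ with the trailing slash covers everything inside the folder but not a URL that is exactly http://example.com/staging.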

Writing a correct robots.txt file

I have a webshop on my test domain that no one knows about. I store every search made on the site in a SQL table, and the same search terms keep showing up. Could it be a robot?
How can I fix this? What should I write in the robots.txt? Which folders or links should I disallow in the file?
My robots.txt looks like this:
User-agent: *
Disallow: /cms
Sitemap: http://www.my-domain.hu/sitemap.xml
Host: www.my-domain.hu
Sorry for my bad English; I hope you understand what I wrote. :)
Update:
And what about this robots.txt file? Is it correct? What is MJ12bot?
User-agent: *
Disallow: /admin/
Disallow: /index.php?route=checkout*
Disallow: /cache/*/block/
Disallow: /custom/*/cache/block/
Disallow: /cib.php
Disallow: /cib_facebook.php
Disallow: /index.php?route=product/relatedproducts/
Disallow: /index.php?route=product/similar_products/
Disallow: /index.php?route=module/upsale/
Disallow: /.well-known/
Allow: /
Sitemap: http://mydomain.hu/sitemap.xml
User-Agent: MJ12bot
Disallow: /
To exclude all robots from accessing anything under the root
User-agent: *
Disallow: /
To allow all crawlers complete access
User-agent: *
Disallow:
Alternatively, you can skip creating a robots.txt file, or create one with empty content.
To exclude a single robot
User-agent: Googlebot
Disallow: /
This will disallow Google’s crawler from the entire website.
To allow just Google's crawler
User-agent: Googlebot
Disallow:
User-agent: *
Disallow: /
That is the basic idea. Does that work for you?
Source: https://www.wst.space/allow-disallow-robots-txt/
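As a quick check of how per-robot groups behave, here is a small sketch using Python's standard urllib.robotparser. It uses the Googlebot token for Google's crawler; the host example.com and the path are placeholders, not taken from the question.

```python
from urllib.robotparser import RobotFileParser

# One group for Googlebot (empty Disallow = everything allowed),
# one catch-all group that blocks every other robot.
parser = RobotFileParser()
parser.parse("""\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
""".splitlines())

URL = "http://example.com/index.php"  # placeholder URL

googlebot_ok = parser.can_fetch("Googlebot", URL)
mj12bot_ok = parser.can_fetch("MJ12bot", URL)
print("Googlebot:", googlebot_ok)
print("MJ12bot:", mj12bot_ok)
```

A robot uses the group that names it and falls back to the * group only when no group does, which is why MJ12bot ends up blocked here while Googlebot is not.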

Website Description not showing in Google search engine

My website description is not showing in Google search results.
I put the description meta tag inside the head tag.
My robots.txt file is as follows:
User-agent: *
Disallow: /
When I search on Google, I get the message below:
A description for this result is not available because of this site's robots.txt – learn more.
User-agent: *
Disallow: /
You're blocking all robots from crawling your website.
Simply remove the Disallow: / line to let robots crawl your site, or use:
User-agent: *
Disallow:
robots.txt examples:
To exclude all robots from the entire server (what you have!)
User-agent: *
Disallow: /
To allow all robots complete access (or just create an empty "/robots.txt" file, or don't use one at all)
User-agent: *
Disallow:
To exclude all robots from part of the server
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
To exclude a single robot
User-agent: BadBot
Disallow: /
To allow a single robot
User-agent: Googlebot
Disallow:
Learn more about robots.txt at http://www.robotstxt.org
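The difference between the two Disallow forms is easy to confirm with Python's standard urllib.robotparser. This is only a sketch: the helper can_crawl and the example.com URL are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

BLOCK_ALL = "User-agent: *\nDisallow: /\n"   # what the question has
ALLOW_ALL = "User-agent: *\nDisallow:\n"     # the suggested replacement
EMPTY_FILE = ""                              # an empty robots.txt


def can_crawl(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


URL = "http://example.com/"  # placeholder URL
for name, text in [("block all", BLOCK_ALL),
                   ("allow all", ALLOW_ALL),
                   ("empty file", EMPTY_FILE)]:
    print(name, can_crawl(text, URL))
```

Only the first variant reports the home page as not crawlable, which matches the message Google shows in place of the description.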

Disallow only pages that end with a number in robots.txt

Is it possible to tell Google not to crawl these pages
/blog/page/10
/blog/page/20
…
/blog/page/100
These are basically Ajax calls that bring blog posts data.
I created this in robots.txt:
User-agent: *
Disallow: /blog/page/*
But now I have another page that I want to allow, which is
/blog/page/start
Is there a way to tell robots to disallow only the pages that end with a number?
e.g.
User-agent: *
Disallow: /blog/page/(:num)
I also got an error below when I tried to validate the robots.txt file:
Following the original robots.txt specification, this would work (for all conforming bots, including Google’s):
User-agent: *
Disallow: /blog/page/0
Disallow: /blog/page/1
Disallow: /blog/page/2
Disallow: /blog/page/3
Disallow: /blog/page/4
Disallow: /blog/page/5
Disallow: /blog/page/6
Disallow: /blog/page/7
Disallow: /blog/page/8
Disallow: /blog/page/9
This blocks all URLs whose path begins with /blog/page/ followed by a digit (/blog/page/9129831823, /blog/page/9.html, /blog/page/5/10/foo etc.), while /blog/page/start stays crawlable.
So you should not append the * character (it’s not a wildcard in the original robots.txt specification, and not even needed in your case for bots that interpret it as wildcard).
Google supports some features for robots.txt which are not part of the original robots.txt specification, and therefore are not supported by (all) other bots, e.g., the Allow field. But as the above robots.txt would work, there is no need for using it.
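The ten-line approach can be checked with Python's standard urllib.robotparser, which does plain prefix matching and does not treat * as a wildcard. The sketch below generates the rules for the /blog/page/ prefix from the question; the example.com host is a placeholder.

```python
from urllib.robotparser import RobotFileParser

PREFIX = "/blog/page/"

# One Disallow line per leading digit, 0 through 9.
lines = ["User-agent: *"] + [f"Disallow: {PREFIX}{digit}" for digit in range(10)]

parser = RobotFileParser()
parser.parse(lines)

BASE = "http://example.com"  # placeholder host
for path in ["/blog/page/10", "/blog/page/100", "/blog/page/start"]:
    print(path, parser.can_fetch("*", BASE + path))
```

The numbered pages are reported as not fetchable, and /blog/page/start remains fetchable because it does not begin with any of the ten prefixes.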

Robots.txt: Disallow subdirectory but allow directory

I want to allow crawling of files in:
/directory/
but not crawling of files in:
/directory/subdirectory/
Is the correct robots.txt instruction:
User-agent: *
Disallow: /subdirectory/
I'm afraid that if I disallowed /directory/subdirectory/
I would be disallowing crawling of all files in /directory/, which I do not want to do. So am I correct in using:
User-agent: *
Disallow: /subdirectory/
You're overthinking it:
User-agent: *
Disallow: /directory/subdirectory/
is correct.
User-agent: *
Disallow: /directory/subdirectory/
Spiders aren't stupid, they can parse a path :)
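A short sketch with Python's standard urllib.robotparser shows why the full path is the one to use: Disallow values are matched as prefixes of the URL path, starting from the root. The helper name blocked and the file names are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def blocked(disallow: str, path: str) -> bool:
    """Return True if a single 'Disallow: <disallow>' rule blocks `path`."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {disallow}"])
    # example.com is a placeholder host.
    return not parser.can_fetch("*", "http://example.com" + path)


# Full path: blocks the subdirectory, leaves the parent directory alone.
print(blocked("/directory/subdirectory/", "/directory/subdirectory/a.html"))
print(blocked("/directory/subdirectory/", "/directory/a.html"))

# Short form from the question: matches only a top-level /subdirectory/.
print(blocked("/subdirectory/", "/directory/subdirectory/a.html"))
```

With this parser the short form /subdirectory/ leaves /directory/subdirectory/ fully crawlable, since that path does not start with /subdirectory/.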