Remove multiple URLs of the same type from Google Webmaster Tools - google-search-console

I accidentally published some URLs of the form www.example.com/abc/?id=1, where the value of id can vary from 1 to 200. I don't want these to appear in search results, so I am using the Remove URLs feature of Google Webmaster Tools. How can I remove all these URLs in one shot? I tried www.example.com/abc/?id=* but that didn't work!

Just block them using robots.txt, i.e.:
User-agent: *
Disallow: /junk.html
Disallow: /foo.html
Disallow: /bar.html
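Since Disallow values are prefix matches, a single rule can cover all 200 id variants in one shot (a sketch, using the /abc/ path from the question):
User-agent: *
Disallow: /abc/?id=
Keep in mind that robots.txt only stops crawling; pairing a rule like this with the Remove URLs tool is what actually gets already-indexed pages dropped from the results.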

Related

Why does Google not index my "robots.txt"?

I am trying to allow the Googlebot webcrawler to index my site. My robots.txt initially looked like this:
User-agent: *
Disallow: /
Host: www.sitename.com
Sitemap: https://www.sitename.com/sitemap.xml
And I changed it to:
User-agent: *
Allow: /
Host: www.sitename.com
Sitemap: https://www.sitename.com/sitemap.xml
But Google is still not indexing my links.
I am trying to allow the Googlebot webcrawler to index my site.
Robots rules have nothing to do with indexing! They are ONLY about crawling. A page can be indexed even if it is forbidden to be crawled!
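(If the goal is to keep a page out of Google's index, the usual mechanism is a robots meta tag such as <meta name="robots" content="noindex"> in the page's <head>; note the page must stay crawlable so the bot can actually see that tag.)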
The Host directive is supported only by Yandex.
If you want all bots to be able to crawl your site, your robots.txt file should be available at https://www.sitename.com/robots.txt with status code 200 and contain:
User-agent: *
Disallow:
Sitemap: https://www.sitename.com/sitemap.xml
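As a quick sanity check (a sketch, assuming the file is already live at that URL), Python's standard urllib.robotparser can confirm that the file parses and permits crawling:
import urllib.robotparser

# Fetch and parse the live robots.txt (www.sitename.com is the
# placeholder host from the question).
rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://www.sitename.com/robots.txt")
rp.read()

# With "Disallow:" left empty, every bot may fetch every path.
print(rp.can_fetch("Googlebot", "https://www.sitename.com/"))  # expect True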
From the docs:
Robots.txt syntax can be thought of as the “language” of robots.txt files. There are five common terms you’re likely to come across in a robots file. They include:
User-agent: The specific web crawler to which you’re giving crawl instructions (usually a search engine). A list of most user agents can be found here.
Disallow: The command used to tell a user-agent not to crawl a particular URL. Only one "Disallow:" line is allowed for each URL.
Allow (Only applicable for Googlebot): The command to tell Googlebot it can access a page or subfolder even though its parent page or subfolder may be disallowed.
Crawl-delay: How many seconds a crawler should wait before loading and crawling page content. Note that Googlebot does not acknowledge this command, but crawl rate can be set in Google Search Console.
Sitemap: Used to call out the location of any XML sitemap(s) associated with this URL. Note this command is only supported by Google, Ask, Bing, and Yahoo.
Try specifically mentioning Googlebot in your robots.txt directives, such as:
User-agent: Googlebot
Allow: /
or allow all web crawlers access to all content:
User-agent: *
Disallow:
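Putting the five quoted directives together, an annotated record might look like this (illustrative paths only; the hostname is the placeholder from the question):
User-agent: Googlebot                    # this record applies to Google's crawler
Crawl-delay: 10                          # ignored by Googlebot; set crawl rate in Search Console instead
Disallow: /private/                      # do not crawl anything under /private/
Allow: /private/welcome.html             # exception: this one page may still be crawled
Sitemap: https://www.sitename.com/sitemap.xml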

Robots.txt different from the one in the directory

I can't figure out why Google reads my robots.txt file as Disallow: /.
This is what I have in my robots.txt file that is in the main root directory:
User-agent: *
Allow: /
But if I open it in a browser, it shows Disallow: /: http://revita.hr/robots.txt
I tried everything: submitted the sitemap, added a meta robots index, follow tag into the <head>, but it's always the same.
Any ideas?
You seem to have a different robots.txt file when accessing it via HTTPS (→ Allow) instead of HTTP (→ Disallow).
By the way, you don’t need to state
User-agent: *
Allow: /
because allowing everything is the default. As Allow is not part of the original robots.txt specification, you might want to use this instead:
User-agent: *
Disallow:
Also note that you should not have a blank line inside a record.
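A quick way to confirm such a mismatch (a sketch; revita.hr is the host from the question) is to fetch the file over both schemes and compare, since crawlers request robots.txt separately per protocol and host:
import urllib.request

# HTTP and HTTPS can legitimately serve different robots.txt files;
# print both bodies to spot the difference.
for scheme in ("http", "https"):
    url = scheme + "://revita.hr/robots.txt"
    with urllib.request.urlopen(url) as resp:
        print(url)
        print(resp.read().decode("utf-8", "replace").strip())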

Check for specific text in Robots.txt

My URL ends with &content=Search. I want to block all URLs that end with this. I have added the following to robots.txt:
User-agent: *
Disallow:
Sitemap: http://local.com/sitemap.xml
Sitemap: http://local.com/en/sitemap.xml
Disallow: /*&content=Search$
But it doesn't work when I test /en/search?q=terms#currentYear=2015&content=search in https://webmaster.yandex.com/robots.xml. I suspect it is not working because content=search comes after the # character.
The Yandex Robots.txt analysis will block your example if you test for Search instead of search, as Robots.txt Disallow values are case-sensitive.
If your site uses case-insensitive URLs, you might want to use:
User-agent: *
Disallow: /*&content=Search$
Disallow: /*&content=search$
# and possibly also =SEARCH, =SEarch, etc.
Having said that, I don’t know if Yandex really supports this for URL fragments (it would be unusual, I guess), although their tool gives this impression.
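To see the case-sensitivity concretely, here is a rough sketch of Google/Yandex-style wildcard matching, hand-rolled because Python's standard urllib.robotparser does not implement the * and $ extensions:
import re

def robots_rule_matches(pattern: str, path: str) -> bool:
    """Approximate wildcard matching for a Disallow value:
    '*' matches any run of characters, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = "^" + ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.search(regex, path) is not None

# Matching is case-sensitive, hence the need for both spellings:
print(robots_rule_matches("/*&content=Search$", "/en/x?q=terms&content=Search"))  # True
print(robots_rule_matches("/*&content=Search$", "/en/x?q=terms&content=search"))  # False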

Google robots.txt file

I want to allow the Google robot:
1) to only see the main page
2) to see description in a search results for main page
I have the following code, but it seems that it doesn't work:
User-agent: *
Disallow: /feed
Disallow: /site/terms-of-service
Disallow: /site/rules
Disallow: /site/privacy-policy
Allow: /$
Am I missing something, or do I just need to wait for the Google robot to visit my site?
Or maybe some action is required in the Google Webmaster panel?
Thanks in advance!
Your robots.txt should work (and yes, it takes time), but you might want to make the following changes:
It seems you want to target only Google’s bot, so you should use User-agent: Googlebot instead of User-agent: * (which targets all bots that don’t have a specific record in your robots.txt).
It seems that you want to disallow crawling of all pages except the home page, so there is no need to list specific path beginnings in Disallow.
So it could look like this:
User-agent: Googlebot
Disallow: /
Allow: /$
With this, Google’s bot may crawl only your home page and nothing else, while all other bots may crawl everything.
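Under Google’s documented precedence rules, the most specific (longest) matching rule wins, which is why Allow: /$ beats Disallow: / for the home page and nothing else. For example:
/                     -> allowed (Allow: /$ is the longer match)
/feed                 -> blocked (only Disallow: / matches)
/site/privacy-policy  -> blocked (only Disallow: / matches)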

Proper wildcard Disallow for robots.txt

I am trying to disallow a specific page and its parameters along with a parameter on the entire site. Below I have the exact examples.
We now have a page that redirects to and tracks external URLs. Any external URL we want to track will be linked like /redirect?u=http://example.com. We do not want to add rel="nofollow" to every link.
Last but not least (our biggest SEO and index issue), every single page has an auto-generated URL to disable or enable the mobile version. So it can be on any page, like /?mobileVersion=off (or on) or /accounts?login_to=%2Fdashboard&mobileVersion=off.
Basically, the easy way to disallow the two parameters would be to disallow mobileVersion and u on any page. (u is the parameter needed to redirect the URL and is only valid on /redirect.)
My current robots.txt config:
User-Agent: *
Disallow: /redirect
Disallow: / *?*mobileVersion=off
If you want to see our full robots.txt file, it's located at http://spicethymeinc.com/robots.txt.
You could change
Disallow: / *?*mobileVersion=off
to
Disallow: /*mobileVersion=off
but it looks like your current rule should work anyway.
I'm going off the wildcard section and examples on this page:
http://tools.seobook.com/robots-txt/
Edit: I have tested with Googlebot and Googlebot-Mobile. The URLs are blocked by both your current robots.txt and my suggested change. Google Webmaster Tools has a handy robots.txt checker you can use to test.
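For a quick offline check (a sketch; Google's tester remains authoritative), the suggested rule translates by hand into the regex its wildcard implies:
import re

# Disallow: /*mobileVersion=off  ->  '*' means "any run of characters",
# and without a trailing '$' the rule matches anywhere after the prefix.
rule = re.compile(r"^/.*mobileVersion=off")

for path in ("/?mobileVersion=off",
             "/accounts?login_to=%2Fdashboard&mobileVersion=off",
             "/redirect?u=http://example.com"):
    # the /redirect URL is caught by the separate "Disallow: /redirect" rule
    print(path, "->", "blocked" if rule.search(path) else "not matched")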