robots.txt on domain instead of subdomain? - robots.txt

I am researching this. I need suggestions.
I own a domain. It points to a subdomain that hosts my business, and I do not own that subdomain. The subdomain's robots.txt file governs the images I am trying to list in Microsoft Shopping. Microsoft Shopping needs to crawl the image URLs, and that robots.txt does not allow it.
I need a way for my domain to serve its own robots.txt file while staying integrated with the subdomain. I also need to avoid redirects for Microsoft.
TIA
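
For context, crawlers generally fetch robots.txt from the root of whichever host serves the URL they want to crawl, so the host name in each image URL decides which robots.txt applies. A minimal sketch of that lookup, using hypothetical host names (`shop.example.com` and `www.example.org` are placeholders, not the real sites):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(resource_url: str) -> str:
    """Return the robots.txt URL on the same scheme and host as resource_url."""
    parts = urlsplit(resource_url)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical image URLs: one on the subdomain, one on the owned domain.
for image_url in (
    "https://shop.example.com/images/item.jpg",
    "https://www.example.org/images/item.jpg",
):
    print(image_url, "->", robots_url_for(image_url))
```

Under that assumption, the image URLs submitted to Microsoft Shopping would have to be served directly from the owned domain for its robots.txt to be the one consulted.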

Related

Removing robots.txt from tomcat

If I remove robots.txt from my webapps root directory, will that allow Googlebot to crawl the pages on my site?
We have currently disallowed all bots, but we want to remove that restriction.
So please clarify: for bots, does a missing robots.txt file mean "don't crawl this site"?
A missing robots.txt file means the site is open to unrestricted crawling by anyone.
Also, most websites don't need a robots.txt file.
It is better practice to have a robots.txt listing disallowed paths than to reject or block HTTP requests based on the User-Agent string.
A little side note:
On dynamic web pages, it's relatively easy to filter bots at runtime using the User-Agent string, but it may be more difficult to reject bots on static assets such as files or images.
Also, many bots don't even have the word "bot" or "crawler" in their User-Agent string, which makes it harder to tell humans from bots.
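
The robots.txt behaviour described in this answer can be checked locally with Python's standard `urllib.robotparser`. This is only a sketch: the helper name `can_crawl` and the example URL are made up for illustration, and an empty rule list stands in for a missing file, which is an assumption about how a crawler treats a 404.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_lines, user_agent, url):
    """Evaluate robots.txt rules (given as a list of lines) for one URL."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


page = "https://example.com/page.html"  # hypothetical URL

# No rules at all, standing in for a missing robots.txt.
no_rules = []
# The "disallow all bots" setup from the question.
block_all = ["User-agent: *", "Disallow: /"]
# Listing only the paths that should stay out of crawlers' reach.
block_private = ["User-agent: *", "Disallow: /private/"]

print(can_crawl(no_rules, "Googlebot", page))
print(can_crawl(block_all, "Googlebot", page))
print(can_crawl(block_private, "Googlebot", page))
```

Well-behaved crawlers apply rules like these voluntarily; robots.txt does not stop a client that chooses to ignore it.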

Website not indexed/scraped/crawled via Bing Custom Search

I want to create a multi-site search function using Bing Custom Search. I have created a search instance on https://www.customsearch.ai/ using my company-provided Microsoft account.
After adding my company's main www website (www.mysite.com) the search works, in the try-it-out tool and when going via the v7 API.
When I add a subsite, e.g. `mysubsite.mysite.com`, however, it does not crawl or display search results from that site.
I have tried:
Allowing subpages for mysite.com
Specifying protocol, e.g. HTTPS
Waiting for a day or two
What can be the problem? Sure, the subsite is not publicly released yet (or announced I mean), but it is accessible by everyone with an Internet connection and a web browser. How come Bing Custom Search does not find it when I tell it the exact address?
Thank you in advance.
If your subsite/pages are not crawled or indexed, you can check out the webmaster info on this page: https://learn.microsoft.com/en-us/azure/cognitive-services/bing-custom-search/define-your-custom-view. Search for webmaster documentation on this page.

How do I change robots.txt on a Google site?

I need to make changes to someone's robots.txt file, but their site is managed by Google (so no FTP access).
I have full access to the site via the browser (Site Actions / Manage Site, etc) but how do I get to the robots.txt file to update it?

Can I add wildcard domain to Google Webmaster Tools?

I have just added a domain to Webmaster tools.
I realise now that it does not recognise all the subdomains.
My web app has user created subdomains. So I would like to have all the search engine information under the entire domain to be in one webmaster account.
What I can't figure out is a way to add the wildcard domain.
Does anyone know how to solve this?
It takes a while for Google to include subdomains of the same domain in Webmaster Tools.
You can either wait or add each subdomain individually.
Hope that helps.

redirect for smartphones and Googlebot-mobile

I'm building a mobile version of my site for smart-phones
(iPhone/Blackberry/Android/WebOS)
and I want to redirect to the mobile version from my main site whenever the user agent is of one of the kinds listed above (my mobile site is on a different url than my Desktop site).
My mobile version is more like a WebApp and does not contain the same content as the Desktop site.
After reading This Post by Google I understand that the Googlebot expects smartphones to display the Desktop version of the site (Googlebot-Mobile is not used for smartphones)
I'm afraid that if I redirect smartphones to the mobile version, Google will penalize me for cloaking. How can I avoid this?
I know that including a link from the main site to the mobile version and vice versa helps a lot.
Any other advice/best practices on how to be google friendly when creating mobile versions of the site for smartphones?
From the article:
For Googlebot and Googlebot-Mobile, it does not matter what the URL structure is as long as it returns exactly what a user sees too.
The key thing is you must be consistent in the content you give to the bot and the one you serve to the user.
Another interesting excerpt from the article:
For now, we expect smartphones to handle desktop experience content so there is no real need for mobile-specific effort from webmasters. However, for many websites it may still make sense for the content to be formatted differently for smartphones, and the decision to do so should be based on how you can best serve your users.
You can also serve a different page/content/styling based on the UA string, as stated in the article:
If you serve all types of content from www.example.com, i.e. serving desktop-optimized content or mobile-optimized content from the same URL depending on the User-agent, this will also lead to correct crawling by Googlebot and Googlebot-Mobile. This is not considered cloaking by Google.
I think it all boils down to how different the content/styling is. If it's only slightly different, I would probably serve both from the same URL. If it's dramatically different, I would use a different URL for smartphones.
Hope this helps!
Updating this with current information. Google now crawls with a smartphone Googlebot-Mobile user agent. See: Google blog post
Google's SEO PDF explains how to avoid cloaking penalties. Specifically, see Page 27. See: SEO PDF
The gist is that the content you serve a desktop user can differ from the content you serve a mobile user, as long as Googlebot is always served the same content you serve to any desktop user, and Googlebot-Mobile is always served the same content you serve to any mobile user. To abide by this, it seems to me you should not configure your site to serve mobile content based on finding "Googlebot-Mobile" in the user agent. The bot supplies a typical smartphone user agent string as part of its own user agent, and that is the part to rely on. Otherwise, if a new device comes out that you do not yet account for, you'll serve desktop content to that device but mobile content to Googlebot-Mobile impersonating it.
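
A rough sketch of that rule, assuming a hand-written token list: the decision looks only at device tokens in the user agent and never at the string "Googlebot-Mobile". The pattern, the function name `wants_mobile`, and the sample user agent strings are illustrative assumptions, not an official or complete list.

```python
import re

# Hypothetical device tokens; real detection needs a maintained list.
SMARTPHONE_TOKENS = re.compile(
    r"iPhone|Android.*Mobile|BlackBerry|webOS", re.IGNORECASE
)


def wants_mobile(user_agent: str) -> bool:
    """Decide on mobile content from device tokens only, never the bot name."""
    return bool(SMARTPHONE_TOKENS.search(user_agent))


# Illustrative strings, not guaranteed to match what any crawler sends today.
smartphone_bot = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) "
    "AppleWebKit/536.26 (KHTML, like Gecko) Mobile Safari "
    "(compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"
)
desktop_bot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

print(wants_mobile(smartphone_bot))
print(wants_mobile(desktop_bot))
```

Because the bot name is ignored, a crawler impersonating a device gets the same answer as a real user on that device, which is the consistency the answer asks for.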
You could use a subdomain for your mobile site and redirect Google's mobile bot there along with smartphones.