Removing robots.txt from tomcat - robots.txt

If I remove robots.txt from my webapps root directory, it's allow the Google bot to crawl pages in my site?
We have already disallowed all th bots, but we want to remove it.
so pls clarify, for bots does missing robots.txt file means don't crawl into the site?

A missing robots.txt file, means it's open for unlimited crawling by anyone.
Also, most websites don’t need a robots.txt file.
It is better practice to have an robots.txt listing disallowed paths, than rejecting/blocking the HTTP requests based on the User-Agent string.
A little side note:
On dynamic web pages, it's relatively easy to filter bots on runtime, using the User-Agent string, but it may be more difficult to rejecting bots on static assets, like files or images.
Also, many bots doesn't even have the word bot or crawler in it's User-Agent string, making it harder to differentiate humans from bots.

Related

robots.txt on domain instead of subdomain?

I am researching this. I need suggestions.
I have a domain that I own. It is pointed to a subdomain that is my business that I do not own. The subdomain's robots.txt file controls the images I am trying to list in the Microsoft Shopping. But Microsoft Shopping wants to crawl the image URLs. The robots.txt will not allow this.
I need to use resources to have my domain carry its own robots.txt file while integrated with the subdomain. I need to avoid redirects for Microsoft.
TIA

How do I change robots.txt on a Google site?

I need to make changes to someone's robot.txt file, but their site is managed by Google, (so no FTPing).
I have full access to the site via the browser (Site Actions / Manage Site, etc) but how do I get to the robots.txt file to update it?

How does the server know when to serve an amp page

I understand that there will be a version of a site with HTML designed for desktop devices and then the AMP pages.
Is there anything I need to do so that the site serves AMP content to mobile devices?
Good question!
Summary:
AMP supplies no automatic means of redirection, only necessary markup for its search engine (and potentially other websites/apps/search engines to send their users to the AMP page
Old methods of redirecting mobile users to mobile sites could be used, typically by detecting mobile user-agents and redirecting them via 301/302 redirect to the AMP page
Redirecting mobile users might not be worth doing because the aforementioned old methods kinda suck
Full answer:
In terms of Google and the search engine results page (SERP), you will need to include this in your desktop markup:
<link rel="amphtml"
href="https://www.example.com/url/to/amp/document.html">
and this in your AMP markup:
<link rel="canonical"
href="https://www.example.com/url/to/standard/document.html">
so that Google and other high-traffic networks like Twitter, LinkedIn or Pinterest, will detect the amphtml signature and direct mobile browsers to the AMP page accordingly. I would say Facebook but since AMP is a competitor product to Facebook Instant Articles, I suspect they will drag their heels.
AMP is of course a completely different animal, being both open-source and a web technology as opposed to a native app platform for content, but the web and native platforms stand in opposition to one another and while Google provides a great number of apps, it's clear from technologies like ServiceWorker that they are pushing for the web as a content platform—which should come as no surprise because time spent in Facebook or Apple apps is time spent away from Google search and its advertising, from which Google derives its revenue.
But I digress: obviously this rel="amphtml" declaration will only instruct Google et al. to redirect this result to mobile users from their pages. This is because a redirection policy was not the intention by Google or the AMP team, who rather envisage a world where everybody goes through Google or other big player rather than being visited directly or linked directly by email or something.
In theory, it might one day be implemented at the browser level, but it takes browser vendors long enough to standardise essential layout/styling properties and JavaScript APIs, let alone random non-standard considerations as AMP currently is. Apple will drag its heels when it comes to the browser because it would compete with their own News app.
We can probably expect that AMP redirection will be implemented in the Chrome browser (and therefore Opera), but even that could be a while yet. So, in order to force mobile devices to redirect to the AMP pages as opposed to your standard ones, you'd ultimately need to configure your web server to sniff for mobile user-agents (or less commonly supported MIME types) and redirect (use 302 for the sake of SEO) them to the AMP pages.
This may seem like something of a regression to past habits, and you'd be right to think so. The redirection will immediately slow the journey down a little bit, though AMP is valuable for its on-page optimisations as well as its HTTP response/transport time. Before the advent and current zenith of responsive web design, this is how mobile users would be catered for, especially in the WAP days. Websites would serve a mobile-friendly version served under a subdomain like mob.website.com or m.website.com. There were flavours of XHTML targeting mobile devices, which are still used by Google+ for its "basic" pages (note the DOCTYPE). These "basic" pages are reserved for devices of a low screen resolution, as we can see from this line:
<link rel="alternate"
media="only screen and (max-width: 640px)"
href="/app/basic/+SOME_PAGE">
This approach may have even served as an inspiration for AMP.
A similar redirection practice should hopefully not present a difficulty for you, because you probably intend to use amp.website.com or perhaps a subdirectory for your AMP pages anyway.
Since all websites should be responsive anyway — in terms of SEO, and because targeting only mobile devices is made more difficult by the unreliability of redirection techniques and of using user-agents and MIME types as a detection method — you might be tempted to try to estimate the connection speed or physical location of the visitor.
Then, if the connection speed is low, or if the user is located far from your origin server, it might be best to redirect them to the AMP page (since it is served from Google's CDN and uses HTTP/2 + heavy caching to serve content faster).
However, any CDN can be used for all pages to deliver them faster to everybody, not just slow connection users or people located far from your origin server; the point of AMP is only partially to deliver content via CDN and perhaps more about serving responsibly constructed pages to devices which are well known for their crappy JavaScript execution times, like mobile phones.
Ultimately, I wouldn't enforce a redirect for all mobile users. I would leave it to Google to direct visitors arriving via its search engine to be sent to the AMP pages. If AMP is going to catch on and be a long-lived product, browsers will implement it eventually.
Come to think of it, if you're serving content to mobile devices, it might be irresponsible to serve AMP pages to people using older Windows Phone or Blackberry devices whose browsers may not even properly support AMP.
There's a lot to think about but I hope I have provided an answer to your question, and if not, at least some considerations to bear in mind before deciding on the right answer for your product.
For more information about separate mobile sites, you can read this documentation on the subject provided by Google.
For examples of how to configure your web server to detect mobile user-agents and send them to a different subdomain, you can find articles or code samples quite easily if you search for them.
Just for completitude, I used the following redirect to serve AMP pages to certain user agents, it's a .htaccess for an apache web server with mod_redirect enabled:
<IfModule mod_rewrite.c>
RewriteBase /
RewriteCond %{REQUEST_URI} !/amp/$ [NC]
RewriteCond %{HTTP_USER_AGENT} (android|blackberry|googlebot\- mobile|iemobile|iphone|ipod|\#opera\ mobile|palmos|webos) [NC]
RewriteRule ^([a-zA-Z0-9-]+)([\/]*)$ https://www.yoursite.com/$1/amp/ [L,R=302]
</IfModule>
IMHO this question deserves more interest.
amdouglas answer is great and fully covers the front-end aspects. As for Server-side, Jesús Diéguez Fernández is a good start but would require to be maintained overtime to remain accurate (user agent signatures).
To complete this (and even if it was not a PHP related question), here below is a PHP snippet I use server-side to redirect mobile-devices requests.
It uses Mobile-Detect (+23M downloads), which holds an incredibly complete and up-to-date list of user-agent strings (that should be easily adapted to any programming language).
<?php
require_once "libs/Mobile_Detect.php";
$detector = new MobileDetect();
$is_mobile = $detector->isMobile();

Hashbang link previews and crawling in Facebook and others popular services

Do currently any popular websites support previewing links with hashbangs?
E.g. pasting link like
https://groups.google.com/forum/#!topic/pigfi/cF6GzPfeuO8
To a Facebook chat message.
(Translates to https://groups.google.com/forum/?_escaped_fragment_=topic/pigfi/cF6GzPfeuO8 )
There exists a spec for crawling hashbang URLs https://developers.google.com/webmasters/ajax-crawling/docs/getting-started
However, I found little information which websites or popular services currently support crawling / previewing these kind of hash bang URLs (besides GoogleBot?)

redirect for smartphones and Googlebot-mobile

I'm building a mobile version of my site for smart-phones
(iPhone/Blackberry/Android/WebOS)
and I want to redirect to the mobile version from my main site whenever the user agent is of one of the kinds listed above (my mobile site is on a different url than my Desktop site).
My mobile version is more like a WebApp and does not contain the same content as the Desktop site.
After reading This Post by Google I understand that the Googlebot expects smartphones to display the Desktop version of the site (Googlebot-Mobile is not used for smartphones)
I'm afraid that if I redirect to the mobile version for smartphones, Google will give me penalty for cloaking, How can I avoid this?
I know that including a link from the main site to the mobile version and vice versa helps a lot.
Any other advice/best practices on how to be google friendly when creating mobile versions of the site for smartphones?
From the article:
For Googlebot and Googlebot-Mobile, it does not matter what the URL structure is as long as it returns exactly what a user sees too.
The key thing is you must be consistent in the content you give to the bot and the one you serve to the user.
Another interesting excerpt from the article:
For now, we expect smartphones to handle desktop experience content so there is no real need for mobile-specific effort from webmasters. However, for many websites it may still make sense for the content to be formatted differently for smartphones, and the decision to do so should be based on how you can best serve your users.
You can also serve a different page/content/styling based on the UA string, as stated in the article:
If you serve all types of content from www.example.com, i.e. serving desktop-optimized content or mobile-optimized content from the same URL depending on the User-agent, this will also lead to correct crawling by Googlebot and Googlebot-Mobile. This is not considered cloaking by Google.
I think it all boils down how different the content/styling is. If it's only slightly different, I would probably go with the same url serving both. If it's dramatically different, I would use a different url for smartphones.
Hope this helps!
Updating this with current information. Google now crawls with a smartphone Googlebot-Mobile user agent. See: Google blog post
Google's SEO PDF explains how to avoid cloaking penalties. Specifically, see Page 27. See: SEO PDF
The gist is, the content you serve a desktop user can be different from the content you serve a mobile user, as long as Googlebot is always served the same content you serve to any desktop user, and Googlebot-Mobile is always served the same content you serve to any mobile user. To abide by this, it seems to me you should not configure your site to serve mobile content based on finding "Googlebot-Mobile" in the user agent. The bot will supply a typical smartphone user agent string as part of it's own user agent--that's the part to rely on, or else if a new device comes out that you do not yet account for, you'll serve desktop content to it, but mobile content to Googlebot-Mobile impersonating that device.
You could use subdomain for your mobile site and redirect google mobile bot there together with smartphones