How and why would a site redirect requests for its robots.txt file to its homepage? - robots.txt

A robots.txt file is usually just a text file under your site's root directory. For example, you can view www.amazon.com/robots.txt. But today I found a website with a strange robots.txt. If you type
http://xli.bugs3.com/robots.txt
it does not show a text file; instead it still shows the home page of that site.
How does this happen, and why would the webmaster set it up this way?

Assuming a fairly conventional/basic server setup, where it is just files as you say, it could simply be an .htaccess rewrite rule. The rule might be something like "serve a file if it's on the server, otherwise just serve the index".
Or it might be an application server like Rails, where there's no direct relationship between the server directory structure and URL pathnames.
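A minimal .htaccess sketch of such a rule, assuming Apache with mod_rewrite (index.php here is just a stand-in for whatever script renders the home page):
RewriteEngine On
# If the requested path is not an existing file...
RewriteCond %{REQUEST_FILENAME} !-f
# ...and not an existing directory...
RewriteCond %{REQUEST_FILENAME} !-d
# ...hand the request to the front page instead of returning a 404
RewriteRule ^ index.php [L]
With a rule like that in place, requesting /robots.txt on a site that never created the file falls through to the home page, which is exactly the behaviour you are seeing.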

Related

How to correctly change page extensions site-wide without losing rankings in Google

I moved content from PHP files to a content management system and changed the page URLs from .php to no extension at all, e.g. before: some-page.php, after: some-page.
All the PHP pages are still kept on the server, but each one now redirects with code like header('Location: ' . basename($_SERVER['PHP_SELF'], '.php'));. I cannot (do not know how to) do it in .htaccess, because .htaccess already contains this code
RewriteRule ^([a-zA-Z0-9_-]+)$ show-content.php?$1
DirectoryIndex show-content.php?index
In Google Webmaster Tools I removed the previous sitemap and added the new one. I can see that the new sitemap was accepted, but no URLs are included.
I did all of the above 3-4 days ago, and today I see that the website has dropped in the search results (some pages cannot be found at all in the first pages).
I am trying to understand what I did wrong.
It seems the first problem is the redirection code, so I changed it to header('Location: ' . basename($_SERVER['PHP_SELF'], '.php'), true, 301);. Is such code OK?
What else did I do wrong?
You definitely need to do a 301 redirect from the old URL (with the extension) to the new URL (without the extension), as this tells Google the page has moved to a new location and that you want all links and other SEO value (like rankings) "transferred" to the new URL.
You can do the 301 redirect any way you want, so using PHP's header() is fine as long as you are redirecting to the right page. Your example should work fine.
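If you would rather handle it in .htaccess after all, here is a rough sketch built on the rules quoted in the question; it assumes show-content.php is the only .php file that should not be redirected:
# Send a permanent (301) redirect from the old .php URLs to the extensionless ones,
# but leave the internal handler show-content.php alone
RewriteEngine On
RewriteCond %{THE_REQUEST} \s/([a-zA-Z0-9_-]+)\.php[\s?]
RewriteCond %1 !=show-content
RewriteRule ^ /%1 [R=301,L]
# Existing rule from the question: extensionless URLs are served by show-content.php
RewriteRule ^([a-zA-Z0-9_-]+)$ show-content.php?$1 [L]
Matching against %{THE_REQUEST} (the original request line) rather than the rewritten URL avoids a redirect loop once the extensionless URL is internally rewritten back to show-content.php.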

Creating a page on CMS

For example:
I create a page in Joomla or WordPress and then save it.
I create an entry in the menu that points to the new page.
When I select the new entry in the menu the page opens on the browser.
The URL that appears points to a file that doesn't exist on the server.
What is the mechanism that is used by a CMS like Joomla or WordPress to accomplish this?
This is typically done with a URL rewriting module that runs on the web server (mod_rewrite for Apache or URL Rewrite for IIS on Windows). It will rewrite a request URL like /blog/article-title to something like /index.php/blog/article-title or /index.php?q=blog/article-title before the website code even sees the request. Then, the code in index.php extracts the rest of the path and determines which content to serve based on that.
For WordPress, see http://codex.wordpress.org/Using_Permalinks for some info about how the rewrites are set up.
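As a concrete illustration, the rules WordPress writes to .htaccess for "pretty" permalinks look roughly like this (the exact block can differ if WordPress lives in a subdirectory):
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
# Requests for index.php itself pass straight through
RewriteRule ^index\.php$ - [L]
# Anything that is not a real file or directory is handed to index.php
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
WordPress's index.php then inspects the requested path (via $_SERVER['REQUEST_URI']) to decide which post or page to render.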

Creating a redirect in IIS6

I need to create a redirect from www.domain.com/page to another page in a different domain. I need the first (referring) url to have no extension (meaning no .html or .asp).
I know how to do it in Apache, but have no clue how to do it in IIS 6.
Is there any simple way of doing this?
Here is an easy solution:
Create a folder for your website www.domain.com and then another sub folder for "page".
Create a website in IIS manager for www.domain.com
Expand that website, then right-click your "page" folder and choose Properties.
On the Directory tab, choose the option "A redirection to a URL" and enter the full URL of the target location. You may also want to look at the check boxes below it if they fit your needs.
Another option you could use outside of IIS is to set up your website and take advantage of the "default document". If you add an index.html file to your website at http://www.domain.com/page , the default document will be served automatically without referencing index.html in the path, and you can do a JavaScript redirect:
<script>
window.location='http://www.someothersite.com/page.html'
</script>
This may be easier when dealing with a large number of pages.

Trying to redirect entire folder to website front page

Our site used to have a blog at oursite.com/blog
Years later, we're still getting many 404s for pages in the /blog/ folder.
How can I use htaccess to redirect the blog folder and all its contents to the front page of our site, oursite.com ?
I have tried a lot of things based on web research, but the closest thing I found redirected things like:
oursite.com/blog/?p=3226
to oursite.com/?p=3226
I don't want it to work that way. None of the blog files exist any more, so I just want to redirect ALL files from the blog to the main front page, i.e. oursite.com, so:
oursite.com/blog/?p=3226
or
oursite.com/blog/cool-permalink/
or
oursite.com/blog/image.jpg
or
oursite.com/blog/
would ALL simply point to oursite.com
Can you please tell me how to do this? I've spent many hours Googling it unsuccessfully...
Thanks a bunch in advance!
Michael
Create a .htaccess inside the /blog directory.
Add this line to .htaccess
ErrorDocument 404 http://oursite.com
That will take care of any URL that is requested in /blog/
If you want everyone to be redirected when a file doesn't exist anywhere on the domain, no matter what directory they're in, just put the .htaccess in your main directory.
If you are using Apache as your front-end, mod_rewrite is what you want to use. Something like
RewriteEngine On
RewriteRule ^blog(/.*)?$ /? [R=301,L]
in the .htaccess at the site root. The trailing ? drops the query string (so oursite.com/blog/?p=3226 redirects to plain oursite.com rather than oursite.com/?p=3226), and R=301 marks the redirect as permanent.

Can I use a `robots.txt` file for a subdirectory on my school's domain?

I own some webspace which is registered with a university. Google has unfortunately found my CV (resume) on the site, but has mis-indexed it as a scholarly publication, which is screwing up things like citation counts on Google Scholar. I tried to upload a robots.txt into my local subdirectory. The problem is that Google ignores this file and instead uses the rules listed for the school's domain.
That is, the url looks like
www.someschool.edu/~myusername/mycv.pdf
I have uploaded a robots.txt, which can be found here
www.someschool.edu/~myusername/robots.txt
And Google is ignoring it and instead using the robots.txt for the school's domain
www.someschool.edu/robots.txt
How can I make Googlebot ignore my CV?
Sadly, robots.txt is defined to be whatever you get when you GET /robots.txt, so you can't use it for your subdirectory.
What you can do is use the X-Robots-Tag HTTP header, if you can use custom .htaccess files. Here's Google's documentation on X-Robots-Tag.
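For example, if the school's Apache setup honours per-directory .htaccess files and has mod_headers enabled (an assumption you would need to confirm), a minimal sketch targeting just the CV from the question would be:
# Tell crawlers not to index this one file
<Files "mycv.pdf">
  Header set X-Robots-Tag "noindex, nofollow"
</Files>
Googlebot has to recrawl the PDF before it notices the header, so it can take a while for the entry to drop out of the index.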