Does robots.txt apply to subdomains? - robots.txt

Let's say I have a test folder (test.domain.com) and I don't want search engines to crawl it. Do I need a robots.txt in the test folder, or can I place a robots.txt in the root and disallow the test folder there?

Each subdomain is generally treated as a separate site and requires its own robots.txt file.

When the crawler fetches test.domain.com/robots.txt, that is the only robots.txt file it sees for that host. It does not look at any other robots.txt file.
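The per-host scoping can be illustrated with a minimal Python sketch. The helper name robots_url is hypothetical; it only shows how the robots.txt location follows from the scheme and host of the page being crawled, which is why a subdomain does not inherit the file of the main domain.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (hypothetical helper)."""
    parts = urlsplit(page_url)
    # Only scheme and host (netloc) matter; path, query and fragment are dropped.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("http://test.domain.com/some/page.html"))
    print(robots_url("http://domain.com/test/page.html"))
```

The two calls yield different robots.txt locations, so a Disallow rule in one file has no effect on the other host.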

If your test folder is configured as a virtual host, you need a robots.txt in your test folder as well. (This is the most common setup.)
But if you route traffic from the subdomain via an .htaccess file, you could modify it to always serve robots.txt from the root of your main domain.
Either way, in my experience it's better to be safe than sorry: put a robots.txt file (especially one that denies access) on every domain you need to protect, and double-check that you get the right file when accessing:
http://yourrootdomain.com/robots.txt
http://subdomain.yourrootdomain.com/robots.txt
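To make the effect of separate files concrete, here is a small Python sketch using the standard library's urllib.robotparser. The file contents and the helper names (build_parsers, allowed) are assumptions for illustration: the subdomain is assumed to serve a deny-all robots.txt and the root domain an empty one. Nothing is fetched over the network; in practice each body would come from the two URLs above.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Assumed robots.txt bodies, one per host (illustrative, not fetched).
ROBOTS_BY_HOST = {
    "yourrootdomain.com": "",
    "subdomain.yourrootdomain.com": "User-agent: *\nDisallow: /\n",
}


def build_parsers(bodies):
    """Parse each host's robots.txt body into its own RobotFileParser."""
    parsers = {}
    for host, body in bodies.items():
        parser = RobotFileParser()
        parser.parse(body.splitlines())
        parsers[host] = parser
    return parsers


def allowed(parsers, url, agent="*"):
    """Check url against the rules of its own host only (hypothetical helper)."""
    host = urlsplit(url).hostname
    return parsers[host].can_fetch(agent, url)


if __name__ == "__main__":
    parsers = build_parsers(ROBOTS_BY_HOST)
    print(allowed(parsers, "http://yourrootdomain.com/index.html"))
    print(allowed(parsers, "http://subdomain.yourrootdomain.com/index.html"))
```

Keeping one parser per host mirrors how crawlers behave: the rules for the subdomain are looked up only in the subdomain's own file.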

Related

What's the best way to add a robots.txt file to a SvelteKit project?

The official SvelteKit docs on SEO mention that a sitemap can be created dynamically using an endpoint. I could not find other documentation about the robots.txt file, which can be used to point web crawlers to the sitemap.
I looked on other forums as well but could not find a solution. I created my robots.txt and placed it at the root of my project (/) and in /src as well. When I request nazar-design.com/robots.txt, I get a 404 error.
Any idea how to fix this?
You can place files in the directory named in your kit.files.assets configuration (which is the /static folder by default) to be served to users as-is.
In your case, placing the file at /static/robots.txt would yield the desired nazar-design.com/robots.txt URL.

Markdown CMS bypasses htaccess file

I'm using Pico CMS, a small Markdown project (http://pico.dev7studios.com/). It is installed and running well, but when I try to password-protect a folder with an .htaccess file, the CMS bypasses it and shows the file I request in the browser.
The funny thing is that the URL for the file does not contain the "content" folder, which is where all the files/pages are stored. All the other folders are contained in the URL. This is the only reason I can find to explain what's happening.
If I manually enter the URL of that same password-protected folder, including the "content" folder in its path, the htaccess auth window appears. This proves the .htaccess file is being read, but not when the CMS accesses the folder. Can anyone explain why, and how to force the folder to be protected when I call any page in the browser?
If you open a Pico site, your request is rewritten to the index.php file (via mod_rewrite). That's why the "content" folder does not show up in the URL.
That's also the reason why you are not asked for a password. The index.php file reads the *.md files from the filesystem, so it does not have to pass the htaccess auth to get to them.
Read this for a bigger picture:
https://stackoverflow.com/a/10923542/3294973
This plugin may be interesting to you:
https://github.com/jbleuzen/Pico-Private
Originally it could not protect only part of the website.
Update: protecting single pages is now possible. (Check my GitHub fork.)

Unlist a subdomain or directory according to robotstxt.org

According to robotstxt.org:
"The first answer is a workaround: You could put all the files you don't want robots to visit in a separate sub directory, make that directory un-listable on the web (by configuring your server)"
How do I configure my server to have an unlisted directory or subdomain?
It depends on the server and its configuration.
As it may be a privacy/security issue to list the content of a folder, most servers will probably not do it by default. Some servers might display folder content only if there is no index.html file.
For Apache, see mod_autoindex.
You can easily test whether your server lists content:
create a folder test
add a dummy.txt file
visit the URL of this folder, e.g. http://example.com/test/
If you get an error message, your server doesn’t list content. If you see a link list containing dummy.txt, your server does list content.

Change document-root on Shared-host?

When you deploy a Zend Framework website to a shared host, you usually cannot change the DocumentRoot to point at the public/ folder of the website. As a result the URL to the website is now http://www.example.com/public/.
Apart from choosing a proper host, is there any workaround?
Thanks,
Luca
If you have access to directories above public, you can put all non public files there.
Otherwise, you can put everything in a subdirectory, and block access to it with an .htaccess file.

Problems with setting the path for Zend framework, needed for Youtube API

I copied & pasted this text here. It seems the editor formats some parts randomly. ;)
I downloaded ZendGdata 1.9.6, extracted it, and uploaded it to my site's root folder ...; I need it for the YouTube API to get videos onto my site.
I must say I'm new to all this, so I would appreciate it if you took that into account.
The library folder is at /ZendGdata/library.
The problem I'm having is with step 3 of the setup instructions (http://code.google.com/intl/de-DE/apis/gdata/articles/php_client_lib.html#gdata-installation):
1. Download the Google Data Client Library files.
2. Decompress the downloaded files. Four sub-directories should be created:
   demos: Sample applications
   documentation: Documentation for the client library files
   library: The actual client library source files.
   tests: Unit-test files for automated testing.
3. Add the location of the library folder to your PHP path (see the next section)
One of the suggested places to add the path, apart from the .htaccess file, is php.ini.
My site is on shared hosting. I have no access to the main php.ini file, but I'm allowed to create one if I need one. For Drupal CMS, for some functions, it suffices to place one in the root folder.
I added this line:
include_path=".:/usr/lib/php:/usr/local/lib/php:/home/habaris6/public_html/site.root.folder/ZendGdata/library";
However, when I go to mysite.com/ZendGdata/demos/Zend/Gdata/InstallationChecker.php to test the setup, as described in the YouTube documentation, I get this error:
PHP Extension Errors: Tested. No errors found
Zend Framework Installation Errors: Tested
0 Exception thrown trying to access Zend/Loader.php using 'use_include_path' = true. Make sure you include Zend Framework in your include_path which currently contains: .:/usr/lib/php:/usr/local/lib/php
SSL Capabilities Errors: Not tested
YouTube API Connectivity Errors: Not tested
So my question is: is that the correct way to "Add the location of the library folder to your PHP path"?
I'm a bit mixed up.
Someone said the php.ini file is only active in the folder where it is located. If that is the case, which of the ZendGdata folders should have it?
As I said, my goal is to have the Zend framework properly set up so I can use the YouTube API, something I also have yet to learn.
In the YouTube API Google group, I was referred here. The documentation that comes with the download and at zend.com presupposes that one knows much more than beginners like me do.
Another person suggested I try placing this
$clientLibraryPath = '/home/habaris6/public_html/site.root.folder/ZendGdata/library';
$oldPath = set_include_path(get_include_path() . PATH_SEPARATOR . $clientLibraryPath);
in mysite.com/ZendGdata/demos/Zend/Gdata/InstallationChecker.php
Everything I had tried before failed except for the first test, but when I placed the above snippet in the installation checker, every test passed:
Ran PHP Installation Checker on 2009-12-09T21:16:08+00:00
PHP Extension Errors: Tested. No errors found
Zend Framework Installation Errors: Tested. No errors found
SSL Capabilities Errors: Tested. No errors found
YouTube API Connectivity Errors: Tested. No errors found
Does this mean that if I place that snippet in the install checker, all scripts needing the library can access it?
If not, please let me know exactly what to place in the self-made php.ini and which folder(s) it should be in.
If that doesn't work and I have to use .htaccess files, what exactly should their content be, based on the folders mentioned above, and which folders should they be in? I read that .htaccess files should be placed in each folder. Does that really mean I should place one in each of the ZendGdata folders?
I would be grateful for any guidance that enables me to finally get started, after failing to get sufficient responses elsewhere.
Thanks in advance.
It's not necessary to put all the ZendGdata code under your website document root. In fact, as a rule I don't put PHP class libraries in a location that can be accessed directly by web requests, because if there's any way to do mischief by invoking the class files directly, then anyone can do it.
Instead, put libraries outside your document root and then reference them from scripts that are run directly. For example, you could create a directory phplib as a sister to your public_html directory. Then upload the ZendGdata bundle under that phplib directory.
You can set your PHP include path in a .htaccess file. You don't need to create a .htaccess file in every directory, because the directives in any .htaccess file apply to all files and directories under the directory where the .htaccess resides. See http://httpd.apache.org/docs/2.2/howto/htaccess.html for more information.
So I would recommend creating a .htaccess file at /home/habaris6/public_html/site.root.folder containing the following directives:
<IfModule mod_php5.c>
php_value include_path ".:/usr/local/lib/php:/home/habaris6/phplib/ZendGdata/library"
</IfModule>
See http://php.net/manual/en/configuration.changes.php for more info on this.
Note that this assumes your web hosting company allows you to use .htaccess files, and that they allow you to use the php_value directive in .htaccess files. Enabling these options is a matter of Apache configuration, and your host could have its own policies against it for reasons of performance or security. You should contact them for this answer; no one on the internet can answer questions about your hosting provider's policies.
If you choose to use the set_include_path() PHP function to append a directory to your runtime include path, you need to do this in each file that serves as a landing point for a web request. That is, if you permit a request to be made directly to foo.php then you need to add the code to foo.php. Any files or classes subsequently included by foo.php use the include path you defined.
Note also that whatever method you use to define the include path, it has to take effect before your script tries to load any PHP class files via the include path. The .htaccess method should accomplish this, and if you use the code method you just have to put the code high enough in your PHP script.
I don't use the method of creating a custom php.ini file under each directory within your site document tree. That's a new feature of PHP 5.3.0, not supported by earlier versions of PHP. If you're using Apache you should just use .htaccess for the same effect.