What position order to put Host: directive in robots.txt

What position order to put Host: directive in robots.txt - robots.txt

Should the Host directive in my robots.txt be at the top or the bottom of the file or does the order not matter.
Here is my robots.txt file :
User-agent: *
Crawl-delay: 10
Disallow: /administrator/
Does each User-Agent specified also require the host directive ?

From Yandex: https://yandex.com/support/webmaster/controlling-robot/robots-txt.html#host
[…] the Host directive is intersectional, so it will be used by the robot regardless of its location in robots.txt.
For every robots.txt file, only one Host directive is processed. If several directives are indicated in the file, the robot will use the first one.
For example:
Host: myhost.ru # uses
User-agent: *
Disallow: /cgi-bin
User-agent: Yandex
Disallow: /cgi-bin
Host: www.myhost.ru # is not used
So regardless of what User-Agent the Host directive is under or how many Host directives there are in the robots.txt and position of them only the first occurrence will be the one that is used.

Related

X-Frame-Options Header Not Set: How do I set it please?

I am using Apache server. While doing security testing, I got these error reports which says:
X-Frame-Options Header Not Set. For this I know that there are 3 types of X-Frame Options. But where do I implement the SAMEORIGIN option and how?
Header set X-Frame-Options: "SAMEORIGIN"
Tried adding the above in apache2.conf in /etc/apache2/
Tried with .htaccess file also
Restarted Apache and tried in Chrome , Developer Tools -> Networks -> Headers
No effect of new header . Please clarify how to add this header with file details.

Firstly look for .htaccess file in the html folder in the file manager (it could be par of the hidden files) and input this code
<If module mod_headers.c> Header always append X-Frame-Options SAMEORIGIN </IfModule>
After that test again for Clickjacking

404 redirect to another server/domain

I'm looking for a solution with redirects to another domain if the response from HTTP server was 404.
acl not_found status 404
acl found_ceph status 200
use_backend minio_s3 rsprep ^HTTP/1.1\ 404\ (.*)$ HTTP/1.1\ 302\ Found\nLocation:\ / if not_found
use_backend ceph if found_ceph
But still not working, this rule goes to minio_s3 backend.
Thank you for you advice.

When the response from this backend has status 404, first add a Location header that will send the browser to example.com with the original URI intact, then set the status code to 302 so the browser executes a redirect.
backend my-backend
mode http
server my-server 203.0.113.113:80 check inter 60000 rise 1 fall 2
http-response set-header Location http://example.com%[capture.req.uri] if { status eq 404 }
http-response set-status 302 if { status eq 404 }
Test:
$ curl -v http://example.org/pics/funny/cat.jpg
* Hostname was NOT found in DNS cache
* Trying 127.0.0.1...
* Connected to example.org (127.0.0.1) port 80 (#0)
> GET /pics/funny/cat.jpg HTTP/1.1
> User-Agent: curl/7.35.0
> Host: example.org
> Accept: */*
The actual back-end returns 404, but we don't see it. Instead...
< HTTP/1.1 302 Moved Temporarily
< Last-Modified: Thu, 04 Aug 2016 16:59:51 GMT
< Content-Type: text/html
< Content-Length: 332
< Date: Sat, 07 Oct 2017 00:03:22 GMT
< Location: http://example.com/pics/funny/cat.jpg
The response body from the back-end's 404 error page will still be sent to the browser, but -- as it turns out -- the browser will not display it, so no harm done. This requires HAProxy 1.6 or later.

#Michael's answer is rather good, but isno't working for me for two reasons:
Mainly because the %[capture.req.uri] tag resolves to empty (HA Proxy 1.7.9 Docker image)
Also due to the fact that the original assumptions are incomplete, due to the fact that the frontend section is missing...
So I struggled for a while, as you find all kinds of answers on the Internet, between those guys who swear the 404 logic should be put in the frontend, vs those who choose the backend, and any possible kind of tags...
This is my answer, which works for me.
My use case is that if an image is not found on the backend behind HA Proxy, then an S3 bucket is checked.
The entry point is: https://myhostname:8080/path/to/image.jpeg
defaults
mode http
global
log 127.0.0.1:514 local0 debug
frontend come_on_over_here
bind :8080
# The following two lines are here to save values while we have access to them. They won't be available in the backend section.
http-request set-var(txn.path) path
http-request set-var(txn.query) query
http-request replace-value Host localhost:8080 dev.local:80
default_backend onprems_or_s3_be
backend onprems_or_s3_be
log global
acl path_photos var(txn.path) -m beg /path/prefix/i/want/to/strip/off
acl p_ext_jpeg var(txn.path) -m end .jpeg
acl is404 status eq 404
http-response set-header Location https://mybucket.s3.eu-west-3.amazonaws.com"%[var(txn.path),regsub(^/path_prefix_i_want_to_strip_off/,/)]?%[var(txn.query)]" if path_photos p_ext_jpeg is404
http-response set-status 301 if is404
server onprems_server dev.local:80 check

HAProxy redirect to subdomain

I am trying to redirect these:
http://www.example.co.uk/blog/xyz?a=b
https://www.example.co.uk/blog/xyz?a=b
to these:
http://blog.example.co.uk/xyz?a=b
https://blog.example.co.uk/xyz?a=b
But struggling with the documentation and the best way to do this.
* Update *
This is what I have got working at the moment. If I pass in:
http://www.example.co.uk/blog?a=b
then this redirects to:
http://blog.example.co.uk?a=b
... and the section of the config:
acl blog_page path_beg -i /blog
use_backend blog_site if blog_page
backend blog_site
reqrep ^([^\ :]*)\ \/?(.*)\/blog\/?(.*) \1\ /\2\3
redirect prefix http://blog.example.co.uk code 301

The following line in the frontend section will accomplish this rewrite and redirect.
Shown as multiple lines for clarity, this must all appear on a single line of your configuration:
http-request redirect
code 301
location https://blog.example.com%[capture.req.uri,regsub(^/blog,)]
if { hdr(host) -i www.example.com } { path_beg /blog }
If the host header matches www.example.com and path begins with blog, redirect to a location beginning with the literal string https://blog.example.com then concatenate a value derived by taking the request URI (path + query string) and using regex substitution to remove /blog from the beginning.
Verifying:
$ curl -v 'http://www.example.com/blog/posts?which=this&that=1'
* Hostname was NOT found in DNS cache
* Trying 127.0.0.1...
* Connected to www.example.com (127.0.0.1) port 80 (#0)
> GET /blog/posts?which=this&that=1 HTTP/1.1
> User-Agent: curl/7.35.0
> Host: www.example.com
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Content-length: 0
< Location: https://blog.example.com/posts?which=this&that=1
The redirect location appears to be correct.
If you want to redirect http and https separately, you'd need two lines, each of them testing an additional condition to determine whether the original request was over http or https.
Using the regsub() converter requires HAProxy 1.6+.

Right way to enable requests with method DELETE in nginx

I was writing a RESTful applicaiton in PHP and enabled DELETE, PUT requests for nginx.
location / {
root html;
index index.php index.html index.htm;
dav_methods PUT DELETE;
}
When I executed a REST Request with method DELETE, which I wanted to handle inside my index.php - nginx removed the html folder.
What is the right way to tell nginx to pass DELETE requests to my index.php ?

Nginx does not disable PUT or DELETE requests, but it does not allow these requests on a folder index. There isn't really anything that needs to be enabled with nginx (you should remove the dav_methods line), but you need to avoid accessing your index.php through the index directive like:
index index.php index.html index.htm;
Instead use try_files to match the index.php file ie like:
try_files $uri /index.php$is_args$args;
In this case nginx wont complain about your DELETE method.

NginX executes those HTTP methods (DELETE, PUT) directly without even invoking PHP engine because they are handled by the DAV extension inside nginX.
To overcome the issue, you may use POST HTTP method for all of your API calls, but add additional custom header to indicate the actual REST method - instead of this
PUT /api/Person/4 HTTP/1.1
Host: localhost:10320
Content-Type: application/json
Cache-Control: no-cache
you will invoke this
POST /api/Person/4 HTTP/1.1
Host: localhost:10320
Content-Type: application/json
X-REST-Method: PUT
Cache-Control: no-cache
and then in PHP you will check in this way
if($_SERVER['HTTP_X_REST_METHOD']!='')
switch($_SERVER['HTTP_X_REST_METHOD'])
{
case 'PUT':
...
break;
case 'PATCH':
...
break;
case 'DELETE':
...
break;
}

nginx static index redirect

This seems ridiculous but I've not found a working answer in over an hour of searching.
I have a static website running off nginx (which happens to be behind Varnish). The index file is called index.html. I want to redirect anyone who actually visits the URL mydomain.com/index.html back to mydomain.com.
Here is my nginx config for the site:
server {
listen 8080;
server_name www.mydomain.com;
port_in_redirect off;
location / {
root /usr/share/nginx/www.mydomain.com/public;
index index.html;
}
rewrite /index.html http://www.mydomain.com/ permanent;
}
http://www.mydomain.com/index.html responds as expected with a 301 with the location http://www.mydomain.com/ but unfortunately http://www.mydomain.com/ also serves a 301 back to itself so we get a redirect loop.
How can I tell nginx to only serve the 301 if index.html is literally in the request?

Add a new location block to handle your homepage, and use try_files directive (instead of "index index.html;") to look for the index.html file directly. Note that try_files requires you to enter at least 2 choices. So I put the same file twice.
location = / {
root /usr/share/nginx/www.mydomain.com/public;
try_files /index.html /index.html;
}
Looks good based on my experiment:
curl -iL http://www.mydomain.com/index.html
HTTP/1.1 301 Moved Permanently
Server: nginx
Date: Sat, 16 Mar 2013 09:07:27 GMT
Content-Type: text/html
Content-Length: 178
Connection: keep-alive
Location: http://www.mydomain.com/
HTTP/1.1 200 OK
Server: nginx
Date: Sat, 16 Mar 2013 09:07:27 GMT
Content-Type: text/html
Content-Length: 4
Last-Modified: Sat, 16 Mar 2013 08:05:47 GMT
Connection: keep-alive
Accept-Ranges: bytes
[UPDATE]
The root cause of the redirect loop is the 'index' directive, which triggers nginx to do another round of location match again. That's how the rewrite rule outside the location block gets executed again, causing the loop. So the 'index' directive is like a "rewrite...last;" directive. You don't want that in your case.
The trick is to not trigger another location match again. try_files can do that efficiently. That's why I picked it in my original answer. However, if you like, another simple fix is to replace
index index.html;
by
rewrite ^/$ /index.html break;
inside your original "location /" block. This 'rewrite...break;' directive will keep nginx stay inside the same location block, effectively stop the loop. However, the side effect of this approach is that you lose the functionality provided by 'index' directive.
[UPDATE 2]
Actually, index directive executes after rewrite directive. So the following also works. Note that I just added the rewrite...break; line. If the request uri is "/", nginx finds the existing file /index.html from the rewrite rule first. So the index directive is never being triggered for this request. As a result, both directives can work together.
location / {
root /usr/share/nginx/www.mydomain.com/public;
index index.html;
rewrite ^/$ /index.html break;
}

Looks like you really don't want index.php to show up in the address bar, is that correct?
If you add a rewrite directive to the nginx config, you'll get a redirect loop, as you have experienced. If you are open to a javascript solution, you can place this anywhere in your index.html to silently rewrite the address bar:
<script>
history.pushState(null, '', '/');
</script>
For more information
Keep in mind that while most modern browsers support the history API, not all do (namely, most versions of IE).

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

What position order to put Host: directive in robots.txt - robots.txt

Should the Host directive in my robots.txt be at the top or the bottom of the file or does the order not matter. Here is my robots.txt file : User-agent: * Crawl-delay: 10 Disallow: /administrator/ Does each User-Agent specified also require the host directive ?

Related

X-Frame-Options Header Not Set: How do I set it please?

404 redirect to another server/domain

HAProxy redirect to subdomain

Right way to enable requests with method DELETE in nginx

nginx static index redirect

Categories

Resources