404 redirect to another server/domain - haproxy

I'm looking for a solution that redirects to another domain if the response from the HTTP server was 404.
acl not_found status 404
acl found_ceph status 200
use_backend minio_s3 rsprep ^HTTP/1.1\ 404\ (.*)$ HTTP/1.1\ 302\ Found\nLocation:\ / if not_found
use_backend ceph if found_ceph
But it still doesn't work: requests always go to the minio_s3 backend.
Thank you for your advice.

When the response from this backend has status 404, first add a Location header that will send the browser to example.com with the original URI intact, then set the status code to 302 so the browser executes a redirect.
backend my-backend
    mode http
    server my-server 203.0.113.113:80 check inter 60000 rise 1 fall 2
    http-response set-header Location http://example.com%[capture.req.uri] if { status eq 404 }
    http-response set-status 302 if { status eq 404 }
Test:
$ curl -v http://example.org/pics/funny/cat.jpg
* Hostname was NOT found in DNS cache
* Trying 127.0.0.1...
* Connected to example.org (127.0.0.1) port 80 (#0)
> GET /pics/funny/cat.jpg HTTP/1.1
> User-Agent: curl/7.35.0
> Host: example.org
> Accept: */*
The actual back-end returns 404, but we don't see it. Instead...
< HTTP/1.1 302 Moved Temporarily
< Last-Modified: Thu, 04 Aug 2016 16:59:51 GMT
< Content-Type: text/html
< Content-Length: 332
< Date: Sat, 07 Oct 2017 00:03:22 GMT
< Location: http://example.com/pics/funny/cat.jpg
The response body from the back-end's 404 error page will still be sent to the browser, but, as it turns out, the browser will not display it, so no harm is done. This requires HAProxy 1.6 or later.
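As a minimal sketch, here is a Python model of what the two http-response rules compute (purely illustrative, not HAProxy code; `rewrite_404` and its `target` parameter are hypothetical names). The order of the rules matters in the config as well: the Location rule runs first, while the status is still 404; if set-status ran first, the status would likely no longer match `status eq 404` for the set-header rule.

```python
from typing import Dict, Tuple


def rewrite_404(status: int, headers: Dict[str, str], request_uri: str,
                target: str = "http://example.com") -> Tuple[int, Dict[str, str]]:
    """Model of the two http-response rules in my-backend.

    On a 404, set Location to the target host plus the original request URI
    (path and query string), then change the status to 302. Any other
    response passes through unchanged.
    """
    if status != 404:
        return status, headers
    new_headers = dict(headers)  # copy, so the caller's dict is not mutated
    # http-response set-header Location http://example.com%[capture.req.uri]
    new_headers["Location"] = target + request_uri
    # http-response set-status 302
    return 302, new_headers


status, headers = rewrite_404(404, {"Content-Type": "text/html"}, "/pics/funny/cat.jpg")
print(status, headers["Location"])  # 302 http://example.com/pics/funny/cat.jpg
```

Note that the other headers (and, in HAProxy, the body) of the original 404 response are kept; only the status and Location change.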

Michael's answer is rather good, but it isn't working for me, for two reasons:
Mainly, the %[capture.req.uri] tag resolves to an empty value (HAProxy 1.7.9 Docker image).
Also, the original answer is incomplete, because the frontend section is missing.
So I struggled for a while, since you find all kinds of answers on the Internet: some people swear the 404 logic belongs in the frontend, others put it in the backend, and they use every possible kind of tag.
This is my answer, which works for me.
My use case is that if an image is not found on the backend behind HAProxy, an S3 bucket is checked instead.
The entry point is: https://myhostname:8080/path/to/image.jpeg
defaults
    mode http
global
    log 127.0.0.1:514 local0 debug
frontend come_on_over_here
    bind :8080
    # The following two lines are here to save values while we have access to them. They won't be available in the backend section.
    http-request set-var(txn.path) path
    http-request set-var(txn.query) query
    http-request replace-value Host localhost:8080 dev.local:80
    default_backend onprems_or_s3_be
backend onprems_or_s3_be
    log global
    acl path_photos var(txn.path) -m beg /path/prefix/i/want/to/strip/off
    acl p_ext_jpeg var(txn.path) -m end .jpeg
    acl is404 status eq 404
    # The regsub() pattern must strip the same prefix that the path_photos ACL matches
    http-response set-header Location https://mybucket.s3.eu-west-3.amazonaws.com"%[var(txn.path),regsub(^/path/prefix/i/want/to/strip/off/,/)]?%[var(txn.query)]" if path_photos p_ext_jpeg is404
    # Use the same conditions as above, so other 404s are not turned into a 301 without a Location header
    http-response set-status 301 if path_photos p_ext_jpeg is404
    server onprems_server dev.local:80 check
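A minimal Python model of how the backend builds the S3 Location, assuming the path_photos ACL prefix and the regsub() pattern denote the same placeholder prefix (`S3_BASE`, `PREFIX` and `s3_location` are hypothetical names; this is a sketch, not HAProxy code). In the model, a 404 that fails the path or extension check gets no Location at all, which is why it makes sense for set-status to carry the same three conditions as set-header.

```python
import re
from typing import Optional

# Placeholder values copied from the config above (hypothetical bucket and prefix).
S3_BASE = "https://mybucket.s3.eu-west-3.amazonaws.com"
PREFIX = "/path/prefix/i/want/to/strip/off"


def s3_location(path: str, query: str, status: int) -> Optional[str]:
    """Return the Location the backend rules would set, or None if they don't fire."""
    # acl path_photos, acl p_ext_jpeg and acl is404 must all match
    if status != 404 or not path.startswith(PREFIX) or not path.endswith(".jpeg"):
        return None
    # regsub(^<prefix>/,/): strip the prefix but keep one leading slash
    stripped = re.sub("^" + re.escape(PREFIX) + "/", "/", path, count=1)
    # "?%[var(txn.query)]" is appended unconditionally, as in the config line,
    # so an empty query leaves a bare "?"
    return S3_BASE + stripped + "?" + query


print(s3_location(PREFIX + "/2019/cat.jpeg", "v=2", 404))
# https://mybucket.s3.eu-west-3.amazonaws.com/2019/cat.jpeg?v=2
```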

Related

Filter by hostname is not working in my Wireshark

I am using the REST client Insomnia to test my GET request.
I get a 500 Internal Server Error, as shown below:
* Preparing request to https://sample.azure-api.net/masterData/carTypes
* Using libcurl/7.57.0-DEV OpenSSL/1.0.2o zlib/1.2.11 libssh2/1.7.0_DEV
* Current time is 2019-08-30T05:03:09.029Z
* Disable timeout
* Enable automatic URL encoding
* Enable SSL validation
* Enable cookie sending with jar of 0 cookies
* Found bundle for host sample.azure-api.net: 0x205d69260b0 [can pipeline]
* Re-using existing connection! (#7) with host sample.azure-api.net
* Connected to sample.azure-api.net (XX.XXX.XXX.XX) port 443 (#7)
> GET /masterData/carTypes HTTP/1.1
> Host: sample.azure-api.net
> User-Agent: insomnia/6.6.2
> Accept: */*
< HTTP/1.1 500 Internal Server Error
< Content-Length: 111
< Content-Type: application/json
< Date: Fri, 30 Aug 2019 05:03:09 GMT
To troubleshoot, I opened Wireshark, selected the Ethernet2 interface and started capturing traffic. I also added the following filter:
http.host == "sample.azure-api.net"
But no traffic is shown when I apply this filter.
When I filter by destination IP instead, I do see the traffic:
ip.dst == XX.XXX.XXX.XX && tcp.port == 443
Why filter by hostname is not working?
What am I trying to solve? (The root issue)
When I send the same request from C# code using a REST client, I get the following error:
{"The request was aborted: Could not create SSL/TLS secure channel."}
So basically, I am trying to find out exactly where the request is failing.

How to redirect to domain name with https using haproxy

I want to receive requests and redirect them to another host, using its DNS name and exposed over HTTPS. For example, my server is http://8.8.8.8:10101/partnerA/getUser, and I want HAProxy to redirect this to https://partner.com/partnerA/getUser (keeping the same path as the source).
I also want to route by path to a different redirect destination: for example, http://8.8.8.8:10101/partnerB/getMarketShare should be redirected by HAProxy to https://subdomainb.differentpartner.com/partnerB/getMarketShare (note that the path follows the same rule, but the host name depends on the path).
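To pin down the requirement, here is a minimal Python sketch of the intended mapping (not HAProxy config; `PREFIX_TO_HOST` and `redirect_target` are hypothetical names, and the hosts come from the two examples above): the scheme becomes https, the host is chosen by path prefix, and the path and query string are kept.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical routing table built from the two examples in the question.
# Prefixes are stored in lower case for a case-insensitive match.
PREFIX_TO_HOST = {
    "/partnera": "partner.com",
    "/partnerb": "subdomainb.differentpartner.com",
}


def redirect_target(url: str) -> Optional[str]:
    """Return the HTTPS URL a request should be redirected to, or None."""
    parts = urlsplit(url)
    for prefix, host in PREFIX_TO_HOST.items():
        # Case-insensitive prefix match, like `path_beg -i`
        if parts.path.lower().startswith(prefix):
            # Keep the original path (and query string, if any) unchanged
            target = "https://" + host + parts.path
            return target + "?" + parts.query if parts.query else target
    return None
```

For example, `redirect_target("http://8.8.8.8:10101/partnerA/getUser")` returns `"https://partner.com/partnerA/getUser"`, while a path matching neither prefix returns `None` and would fall through to the default backend.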
I tried the haproxy.cfg below:
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000
#---------------------------------------------------------------------
# main frontend which proxys to the backends
#---------------------------------------------------------------------
frontend main
    bind *:10101
    acl url_partnerA path_beg -i /partnerA
    acl url_partnerB path_beg -i /partnerB
    http-request redirect scheme https if url_partnerA
    http-request redirect scheme https if url_partnerB
    http-request redirect prefix https://partnerA.com if url_partnerA
    http-request redirect prefix https://subdomainb.differentpartner.com/ if url_partnerA
    default_backend app
#---------------------------------------------------------------------
# round robin balancing between the various backends
backend app
    balance roundrobin
    # server app1 127.0.0.1:11003 check
But every time I send POST http://8.8.8.8:10101/partnerA/getUser (over plain HTTP), the debug output of haproxy -f haproxy10101.cfg -d shows this:
00000000:main.accept(0005)=0009 from [8.8.8.8:48554] ALPN=<none>
00000000:main.clireq[0009:ffffffff]: POST /partnerA/getUser HTTP/1.1
00000000:main.clihdr[0009:ffffffff]: Host: 8.8.8.8:10101
00000000:main.clihdr[0009:ffffffff]: User-Agent: curl/7.47.0
00000000:main.clihdr[0009:ffffffff]: Accept: */*
00000000:main.clihdr[0009:ffffffff]: Authorization: Basic dGNhc2g6RzBqM2tmMHJsMWYzIQ==
00000000:main.clihdr[0009:ffffffff]: Content-Type: application/json
00000000:main.clihdr[0009:ffffffff]: Postman-Token: 45a236c-740a-4859-a13a-1c45195a99f2
00000000:main.clihdr[0009:ffffffff]: cache-control: no-cache
00000000:main.clihdr[0009:ffffffff]: Content-Length: 218
00000000:main.clicls[0009:ffffffff]
00000000:main.closed[0009:ffffffff]
Is there anything I am missing to make this work? Thanks.

HAProxy redirect to subdomain

I am trying to redirect these:
http://www.example.co.uk/blog/xyz?a=b
https://www.example.co.uk/blog/xyz?a=b
to these:
http://blog.example.co.uk/xyz?a=b
https://blog.example.co.uk/xyz?a=b
But I'm struggling with the documentation and with finding the best way to do this.
* Update *
This is what I have got working at the moment. If I pass in:
http://www.example.co.uk/blog?a=b
then this redirects to:
http://blog.example.co.uk?a=b
... and the section of the config:
acl blog_page path_beg -i /blog
use_backend blog_site if blog_page
backend blog_site
reqrep ^([^\ :]*)\ \/?(.*)\/blog\/?(.*) \1\ /\2\3
redirect prefix http://blog.example.co.uk code 301
The following line in the frontend section will accomplish this rewrite and redirect.
Shown as multiple lines for clarity, this must all appear on a single line of your configuration:
http-request redirect
code 301
location https://blog.example.com%[capture.req.uri,regsub(^/blog,)]
if { hdr(host) -i www.example.com } { path_beg /blog }
If the Host header matches www.example.com and the path begins with /blog, redirect to a location that begins with the literal string https://blog.example.com, followed by a value derived by taking the request URI (path + query string) and using a regex substitution to remove /blog from the beginning.
Verifying:
$ curl -v 'http://www.example.com/blog/posts?which=this&that=1'
* Hostname was NOT found in DNS cache
* Trying 127.0.0.1...
* Connected to www.example.com (127.0.0.1) port 80 (#0)
> GET /blog/posts?which=this&that=1 HTTP/1.1
> User-Agent: curl/7.35.0
> Host: www.example.com
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Content-length: 0
< Location: https://blog.example.com/posts?which=this&that=1
The redirect location appears to be correct.
If you want to redirect http and https separately, you'd need two lines, each of them testing an additional condition to determine whether the original request was over http or https.
Using the regsub() converter requires HAProxy 1.6+.
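As a minimal sketch, the rule can be modeled in Python to check which locations it produces (illustrative only; `blog_redirect` is a hypothetical name, and `request_uri` stands for what capture.req.uri returns, the path plus query string):

```python
import re
from typing import Optional


def blog_redirect(host: str, request_uri: str) -> Optional[str]:
    """Model of the one-line http-request redirect rule above."""
    path = request_uri.split("?", 1)[0]
    # if { hdr(host) -i www.example.com } { path_beg /blog }
    if host.lower() != "www.example.com" or not path.startswith("/blog"):
        return None
    # regsub(^/blog,) with no flags: remove the first match of ^/blog
    return "https://blog.example.com" + re.sub(r"^/blog", "", request_uri, count=1)
```

`blog_redirect("www.example.com", "/blog/posts?which=this&that=1")` gives the same Location as the curl test above. As modeled, a path such as /blogroll also satisfies path_beg /blog and yields `https://blog.example.comroll`, so a tighter condition (for example matching /blog followed by a slash, a question mark, or the end of the path) may be worth considering.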

Nutch inconsistently ignores redirects

I ran into trouble crawling some pretty simple redirect cases (Nutch 1.9 / OpenJDK 7).
Here is a packet capture for the process.
Time       Source          Destination     Protocol  Info
12.988003  99.99.99.99     8.8.4.4         DNS       Standard query 0xc165 A bloomberg.com
13.032343  8.8.4.4         99.99.99.99     DNS       Standard query response 0xc165 A 69.191.212.191 A 69.191.251.238
13.124471  99.99.99.99     69.191.212.191  HTTP      GET /robots.txt HTTP/1.0
13.228846  69.191.212.191  99.99.99.99     HTTP      HTTP/1.1 301 Moved Permanently (text/html)
13.264230  99.99.99.99     8.8.4.4         DNS       Standard query 0x7089 A www.bloomberg.com
13.344767  8.8.4.4         99.99.99.99     DNS       Standard query response 0x7089 CNAME www.bloomberg.com.edgekey.net CNAME e4569.x.akamaiedge.net A 23.214.189.136
13.351030  99.99.99.99     23.214.189.136  HTTP      GET /robots.txt HTTP/1.0
13.359121  23.214.189.136  99.99.99.99     HTTP      HTTP/1.0 200 OK (text/plain)
13.448604  99.99.99.99     69.191.212.191  HTTP      GET / HTTP/1.0
13.537211  69.191.212.191  99.99.99.99     HTTP      HTTP/1.1 301 Moved Permanently (text/html)
13.640146  99.99.99.99     69.191.212.191  HTTP      GET / HTTP/1.0
13.738564  69.191.212.191  99.99.99.99     HTTP      HTTP/1.1 301 Moved Permanently (text/html)
Nutch tries to fetch http://bloomberg.com, which replies with a 301 redirect to http://www.bloomberg.com. The redirect is handled correctly for robots.txt. However, for GET /, the fetcher keeps requesting the original host, which keeps replying with 301. No matter how large http.redirect.max is, fetching fails (I've tried 10).
Nutch 1.9 running on
OpenJDK Runtime Environment (IcedTea 2.5.3) (7u71-2.5.3-0ubuntu0.12.04.1)
OpenJDK Client VM (build 24.65-b04, mixed mode, sharing)
Is this a bug (if so, could you confirm it?) or just a misconfiguration?
Thanks.
This was a bug; 1.10 should ship with the fix:
https://github.com/apache/nutch/commit/ed052df8822380ccfa89a9ffa1df324933669a59
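For illustration only, a minimal Python sketch of redirect following as it is expected to behave (this is not Nutch code; `follow_redirects` and the `responses` table are hypothetical stand-ins for the fetcher and the network). Each hop requests the URL taken from the previous Location header, and the loop gives up after a configurable maximum, similar in spirit to http.redirect.max. The capture above (repeated GET / to the original host) is what you would see if the next request did not use the Location target.

```python
from typing import Dict, List, Tuple
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def follow_redirects(start: str, responses: Dict[str, Tuple[int, str]],
                     max_redirects: int) -> List[str]:
    """Return the list of URLs requested, starting with `start`.

    `responses` maps a URL to (status, Location) and is a hypothetical
    stand-in for the network; unknown URLs are treated as 404.
    """
    url = start
    chain = [url]
    while True:
        status, location = responses.get(url, (404, ""))
        if status not in REDIRECT_CODES:
            return chain
        if len(chain) - 1 >= max_redirects:
            raise RuntimeError("too many redirects: " + " -> ".join(chain))
        # The key step: the next request goes to the Location target,
        # resolved against the current URL, not back to `start`.
        url = urljoin(url, location)
        chain.append(url)
```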

Questions on proper REST api design specifically on the PUT action when updating a resource

I'm creating a REST interface (aren't we all), and I want to UPDATE a resource.
So I figured I should use a PUT.
So I read this.
My takeaway is that I PUT to a URL like this
/hc/api/v1/organizer/event/762d36c2-afc5-4c51-84eb-9b5b0ef2990c
with a payload, and the server responds with a permanent redirect to the URL from which the client can GET the updated version of the resource.
In this case it happens to be the same URL, different action.
So my questions are:
Is my understanding correct that a PUT is the right way to update a resource, and do I understand the use of PUT correctly?
When a client gets a redirect, does it repeat the same method on the redirected URL that it used on the original URL? If it "depends", is there a standard that most clients follow?
I ask the second question because Postman and my jQuery AJAX calls are choking; jQuery fails with net::ERR_TOO_MANY_REDIRECTS. So is it following the redirect and retrying the PUT, which then gets another redirect?
curl blows up too. Even though its documentation says that on a 301 it will switch to a GET, it doesn't seem to do that when I look at the output (below).
When curl follows a redirect and the request is not a plain GET (for example POST or PUT), it will do the following request with a GET if the HTTP response was 301, 302, or 303. If the response code was any other 3xx code, curl will re-send the following request using the same unmodified method.
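The quoted rule can be sketched as a small Python function (a model, not curl's code; `curl_follow_method` and `custom_request` are hypothetical names). The extra `custom_request` argument reflects curl's documentation for -X/--request, which warns that the custom method string is used for all requests, including those made while following redirects with -L. Since -d makes curl build a POST internally and -X PUT only changes the method string, this would likely explain why the output below says "Switch from POST to GET" and then sends PUT again.

```python
from typing import Optional


def curl_follow_method(method: str, status: int,
                       custom_request: Optional[str] = None) -> str:
    """Method curl would use for the request that follows a 3xx response."""
    if custom_request is not None:
        # -X/--request replaces the method string on every request,
        # including the ones made while following redirects with -L
        return custom_request
    if method != "GET" and status in (301, 302, 303):
        # Rule quoted above: a request that is not a plain GET becomes a GET
        return "GET"
    # Any other 3xx: re-send with the same, unmodified method
    return method
```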
curl output (edited for brevity). Note how it says it's going to switch to a GET [incorrectly, from a POST], but then it does a PUT anyway:
curl -X PUT -H "Authorization: Basic AUTHZ==" -H "Content-Type: application/json" -H "Cache-Control: no-cache" -H "Postman-Token: e80657f0-a8f5-af77-1d9d-d7bc22ed0b30" -d '{ JSONDATA"}' http://localhost:8080/hc/api/v1/organizer/event/762d36c2-afc5-4c51-84eb-9b5b0ef2990c -v -L
* Hostname was NOT found in DNS cache
* Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> PUT /hc/api/v1/organizer/event/762d36c2-afc5-4c51-84eb-9b5b0ef2990c HTTP/1.1
> User-Agent: curl/7.37.1
> Host: localhost:8080
> Accept: */*
> Authorization: Basic AUTHZ==
> Content-Type: application/json
> Cache-Control: no-cache
> Postman-Token: e80657f0-a8f5-af77-1d9d-d7bc22ed0b30
> Content-Length: 203
>
* upload completely sent off: 203 out of 203 bytes
< HTTP/1.1 301 Moved Permanently
< Connection: keep-alive
< X-Powered-By: Undertow/1
< Set-Cookie: rememberMe=deleteMe; Path=/hc; Max-Age=0; Expires=Fri, 20-Feb-2015 03:53:28 GMT
< Set-Cookie: JSESSIONID=uwI3_41LAa7vlvapTsrZdw10.macbook-air; path=/hc
* Server WildFly/8 is not blacklisted
< Server: WildFly/8
< Location: /hc/api/v1/organizer/event/762d36c2-afc5-4c51-84eb-9b5b0ef2990c
< Content-Length: 0
< Date: Sat, 21 Feb 2015 03:53:28 GMT
<
* Connection #0 to host localhost left intact
* Issue another request to this URL: 'http://localhost:8080/hc/api/v1/organizer/event/762d36c2-afc5-4c51-84eb-9b5b0ef2990c'
* Switch from POST to GET
* Found bundle for host localhost: 0x7f9e4b415430
* Re-using existing connection! (#0) with host localhost
* Connected to localhost (127.0.0.1) port 8080 (#0)
> PUT /hc/api/v1/organizer/event/762d36c2-afc5-4c51-84eb-9b5b0ef2990c HTTP/1.1
> User-Agent: curl/7.37.1
> Host: localhost:8080
> Accept: */*
> Authorization: Basic dGVzdHVzZXIxOlBhc3N3b3JkMQ==
> Content-Type: application/json
> Cache-Control: no-cache
> Postman-Token: e80657f0-a8f5-af77-1d9d-d7bc22ed0b30
>
< HTTP/1.1 500 Internal Server Error
< Connection: keep-alive
< Set-Cookie: JSESSIONID=fDXxlH2xI-0-DEaC6Dj5EhD9.macbook-air; path=/hc
< Content-Type: text/html; charset=UTF-8
< Content-Length: 8593
< Date: Sat, 21 Feb 2015 03:53:28 GMT
<
...failure ensues... It actually does a PUT.
Thanks in advance.
I think you're reading too much into the 301 redirect section.
If you want to update a resource using PUT, return:
201: if the resource was created
200: with the updated resource
The 301 in that section only applies if there actually is a redirect involved: for example, if something can be identified by name and you need to redirect to a URL that contains its id (maybe you refactored and people are still consuming the old endpoint).
So, do you really need to redirect your PUT requests? You should be sending back the updated resource in the same response, with a 200 as stated above, instead of "redirecting to a GET".
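A minimal, framework-agnostic sketch in Python of a PUT handler that follows this advice (`handle_put` and the in-memory `store` are hypothetical; a real service would wire this into its web framework and serialize the body as JSON):

```python
from typing import Any, Dict, Tuple


def handle_put(store: Dict[str, Any], resource_id: str,
               payload: Dict[str, Any]) -> Tuple[int, Dict[str, Any]]:
    """Create or replace a resource and return (status, body) directly.

    No redirect is involved, so the client never re-sends the PUT.
    """
    created = resource_id not in store
    store[resource_id] = payload
    # 201 if the PUT created the resource, otherwise 200 with the updated resource
    return (201 if created else 200), payload
```

Because the updated representation comes back in the PUT response itself, the client does not need a follow-up GET, and there is no 301 for Postman, jQuery or curl to follow.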