HAProxy ACL not matching user-agent in file - haproxy

I'm trying to set up HAProxy to deny requests from a user-agent blacklist, but it's not working.
I'm doing as explained in the answer: https://serverfault.com/a/616974/415429 (that is the example given in the official HAProxy docs: http://cbonte.github.io/haproxy-dconv/1.8/configuration.html#7)
The example in the official docs is: acl valid-ua hdr(user-agent) -f exact-ua.lst -i -f generic-ua.lst test
I've created a demo to simulate the issue of the user agent not working in a condition:
defaults
timeout connect 5s
timeout client 30s
timeout server 30s
frontend frontend
mode http
bind :80
acl is_blacklisted_ua hdr(User-Agent) -f /path/to/ua-blacklist.lst
http-request deny deny_status 403 if is_blacklisted_ua
http-request deny deny_status 404
Then if I access the browser at localhost:8080 it returns the status 404 instead of 403 (haproxy is in a docker container that forwards port 8080 to 80)
The file /path/to/ua-blacklist.lst is just:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36
localhost:8080
in which the 1st line is my user agent (and the 2nd line is to test with the Host header, as explained below). I can see it in the Chrome inspector, and if I capture the user-agent header in haproxy and log it, I can see it too (it's exactly the same).
But if I change (only for testing purposes):
acl is_blacklisted_ua hdr(User-Agent) -f /path/to/ua-blacklist.lst
To:
acl is_blacklisted_ua hdr(Host) -f /path/to/ua-blacklist.lst
so that it uses the Host header, it then returns a 403 status code (it works, because it matches the 2nd line of the file).
Then if I change the 2nd line localhost:8080 to localhost:8081 in the file it then gives 404 (as expected).
So, it seems the user agent header is not retrieved correctly, or can't match the provided values (I even tried to capture it to see if there's some difference, to no avail). The Host header works, though.
I also tried to use hdr(user-agent) (in lowercase), as well as some combinations like hdr_sub instead of hdr, and the -i option for case insensitivity, but none of these attempts worked either. I'm sure the user-agent value in the file is correct.
Update (2021-03-12)
I was able to make it work defining a user agent string and running curl with that user agent:
$ curl -o /dev/null -s -w "%{http_code}\n" -H "user-agent: test-ua" http://localhost:8080
403
I also tried a user agent with spaces test-ua space and it worked too, but using the user agent Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36 didn't return 403.
I'll keep digging to see if I can solve it.
Any suggestions?

I dug into the problem using curl, and I saw that I got the desired 403 response in the cases I tested.
Then I tried the entire user agent and it didn't work, so I started with just the beginning of the user agent and added the other parts one at a time, until I included (KHTML, like Gecko) and it stopped working.
So I ended up discovering that commas weren't working (tested with curl and a very simple test,comma user agent). Then I found that HAProxy handles commas in header values differently when matching against list files, as per:
https://discourse.haproxy.org/t/comma-in-acl-list-file/2333/2
I was able to solve it with the solution provided in the above answer, using req.fhdr: req.fhdr(User-Agent) instead of hdr(User-Agent).
According to the docs:
[...] It differs from req.hdr() in that any commas present in the
value are returned and are not used as delimiters. This is sometimes
useful with headers such as User-Agent.
(Weird that the official example of list files uses the user-agent header with hdr instead of req.fhdr, which can be misleading, and it was, for me at least)
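The difference between the two fetches can be pictured with a small Python sketch. This is only a simplified model of the behaviour described above, not HAProxy's actual implementation: the function names hdr_values, fhdr_values and matches_list are made up for illustration, and the model ignores details such as quoted strings and the -i flag.

```python
def hdr_values(header_value):
    """Rough model of hdr(): commas act as delimiters, so one header
    line yields several values that are matched one by one."""
    return [part.strip() for part in header_value.split(",")]


def fhdr_values(header_value):
    """Rough model of req.fhdr(): the full header value, commas intact."""
    return [header_value.strip()]


def matches_list(values, patterns):
    """Exact string match of any extracted value against the list entries."""
    return any(value in patterns for value in values)


# The user agent from the question and the curl test value.
UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36")
BLACKLIST = {UA, "test-ua"}

# With comma splitting, no single piece equals the full line in the file.
print(hdr_values(UA))
print(matches_list(hdr_values(UA), BLACKLIST))
print(matches_list(fhdr_values(UA), BLACKLIST))
```

Under this model the comma inside (KHTML, like Gecko) cuts the browser user agent into two pieces, neither of which equals the full line stored in the list file, while a comma-free value such as test-ua still matches.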


Fail2ban - How to detect the client IP in Apache Logs

As much as I like Fail2ban's concept, I'm giving up on it because it's too difficult to configure filters.
I'm looking to create an Apache-404 filter simply to detect IPs causing excessive 404 errors while trying to hit random pages.
For example, I have the following two log formats. How can I detect the IP in each?
sub.domain.com:443 145.86.60.76 - - [12/May/2022:08:35:00 +0300] "GET /folder/filepath/styles.js?t=KA9B HTTP/1.1" 404 2212 "https://sub.domain.com/path" "Mozilla/5.0 (iPhone; CPU iPhone OS 15_4_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko)"
99.39.28.218 - - [12/May/2022:02:39:33 +0000] "GET /js/amcharts/amcharts.js HTTP/1.1" 404 64258 "https://sub.domain.com/path" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36"
Is there an easy way to build and test regex in Fail2ban? The fail2ban-regex command is not informative. It says when a line is matched but doesn't say which portion is actually matching the <HOST> tag. I ran into cases where Fail2ban jailed its own domain IP.
I admit I'm not that good with regex and it's a bit difficult for me to get things to work.
This is my last resort before I move on.
You can give CrowdSec a try. There is already an apache2 parser and a scenario that detects this behavior.
You can easily install it by following the documentation. It will automatically detect apache2 and install the apache2 collection, which contains the scenario you are looking for.
You can also benefit from the community by receiving IPs that have already attacked the same kind of services you are running (based on the scenarios you installed with CrowdSec).
There is also a discord community where you can ask questions if you are stuck somewhere.
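For readers who want to stay with Fail2ban, a candidate pattern can be tried against the two log lines from the question in plain Python first, since Fail2ban's failregex uses Python regular expression syntax. The sketch below is an assumption, not an official Fail2ban filter: the pattern, the named group host (standing in for Fail2ban's <HOST> tag, and limited to IPv4 here) and the helper extract_404_ip are all made up for illustration.

```python
import re

# Optional "vhost:port " prefix, then the client IP, then a request
# that ended with status 404. The named group plays the role of <HOST>.
PATTERN = re.compile(
    r'^(?:\S+:\d+ )?'                        # e.g. "sub.domain.com:443 "
    r'(?P<host>\d{1,3}(?:\.\d{1,3}){3})'     # IPv4 client address
    r' - - \[[^\]]+\] '                      # identity, user, timestamp
    r'"[A-Z]+ \S+ [^"]*" 404 '               # request line and 404 status
)


def extract_404_ip(line):
    """Return the client IP if the line is a 404 entry, else None."""
    match = PATTERN.match(line)
    return match.group("host") if match else None


LINE_WITH_VHOST = (
    'sub.domain.com:443 145.86.60.76 - - [12/May/2022:08:35:00 +0300] '
    '"GET /folder/filepath/styles.js?t=KA9B HTTP/1.1" 404 2212 '
    '"https://sub.domain.com/path" "Mozilla/5.0 (iPhone; CPU iPhone OS '
    '15_4_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko)"'
)
LINE_PLAIN = (
    '99.39.28.218 - - [12/May/2022:02:39:33 +0000] '
    '"GET /js/amcharts/amcharts.js HTTP/1.1" 404 64258 '
    '"https://sub.domain.com/path" "Mozilla/5.0 (Macintosh; Intel Mac OS X '
    '10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 '
    'Safari/537.36"'
)

print(extract_404_ip(LINE_WITH_VHOST))
print(extract_404_ip(LINE_PLAIN))
```

Anchoring the pattern at the start of the line and making the vhost prefix optional is what keeps the server's own host name from being picked up as the offending address. Printing the captured group shows exactly which portion of the line was taken as the host.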

How can Plack debug a request?

I am debugging an HTTP Client with Plack, where the plack server listens on port 80 and the client sends request. How can I see the client request using Plack? I am trying to do something like this:
my $app = sub {
my $self = shift;
[200, ['Content-Type' => 'text/html'], [ $self ]];
};
How can I debug the request?
PSGI/Plack is an abstraction around HTTP. Its goal is to spare you from dealing with the implementation details as much as possible. By the time your app sees the request, it has already been parsed into $env and is in Plack's representation.
You could monkey-patch something into Plack::HTTPParser::PP to dump the $chunks out to see what's coming in. You would have to set PLACK_HTTP_PARSER_PP=1 in your environment to make sure it loads the pure Perl version.
However that seems really tedious. If you're on Linux, you can use netcat (nc). Listen on your port, and send requests there with the client you are testing.
$ nc -l 3000
GET / HTTP/1.1
Host: localhost:3000
User-Agent: curl/7.68.0
Accept: */*
And on another terminal...
$ curl localhost:3000
If you don't care about the exact representation, but only about whether it's got the right stuff in it after parsing, start by dumping out $env in your Plack app instead.
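If netcat is not available, the same raw capture can be approximated with a few lines of Python. This is a minimal sketch and not part of Plack: the function name start_capture_server and its return shape are made up for illustration, and it only reads the request head (up to the blank line), not a request body.

```python
import socket
import threading


def start_capture_server(host="127.0.0.1", port=0):
    """Accept one connection and record the raw request head.

    Returns (bound_port, thread, captured); captured is a list that
    receives the raw bytes once a client has sent its request.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))  # port=0 lets the OS pick a free port
    server.listen(1)
    bound_port = server.getsockname()[1]
    captured = []

    def serve_once():
        conn, _addr = server.accept()
        with conn:
            conn.settimeout(5.0)
            data = b""
            # Read until the blank line that ends the request headers.
            while b"\r\n\r\n" not in data:
                chunk = conn.recv(4096)
                if not chunk:
                    break
                data += chunk
            captured.append(data)
            # Minimal reply so the client does not hang waiting.
            conn.sendall(
                b"HTTP/1.1 200 OK\r\n"
                b"Content-Length: 0\r\n"
                b"Connection: close\r\n\r\n"
            )
        server.close()

    thread = threading.Thread(target=serve_once, daemon=True)
    thread.start()
    return bound_port, thread, captured
```

A possible way to use it: call start_capture_server(port=3000), point the client under test at localhost:3000, join the returned thread, and print captured[0].decode() to see the request exactly as it arrived on the wire.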

Drools workbench (business-central) requests timing out

I have installed business central along with Keycloak authentication using MySQL as a database for storing Keycloak's data. The business-central workbench and Keycloak server are behind Nginx.
While working on the workbench, some of the requests time out with a 504 error code. The whole business central UI freezes and the user is not able to do anything after that.
The urls that error out in 504 are like: https://{host}:{port}/business-central/out.43601-24741.erraiBus?z=105&clientId=43601-24741
Other details about the setup are as below:
Java: 1.8.0_242
Business central version: 7.34.Final
Keycloak version: 9.0.0
MySql: 8
Java options for business central: -Xms1024M -Xmx2048M -XX:MaxPermSize=2048M -XX:MaxHeapSize=2048M
Note: All of this setup of mine is on a 4GB EC2 instance.
Any help on this issue would be appreciated.
EDIT: I have checked the access_log.log and it looks like the server takes more than 45 sec to process the request. Here is a log:
"POST /business-central/in.93979-28827.erraiBus?z=15&clientId=93979-28827&wait=1 HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36"i 45001 45.001
EDIT 2: Here is a sample request data that is sent:
[{"CommandType":"CDIEvent","BeanType":"org.kie.workbench.common.screens.library.api.ProjectCountUpdate","BeanReference":{"^EncodedType":"org.kie.workbench.common.screens.library.api.ProjectCountUpdate","^ObjectID":"1","count":1,"space":{"^EncodedType":"org.uberfire.spaces.Space","^ObjectID":"2","name":"Fraud_Team"}},"FromClient":"1","ToSubject":"cdi.event:Dispatcher"},{"ToSubject":"org.kie.workbench.common.screens.library.api.LibraryService:RPC","CommandType":"getAllUsers:","Qualifiers":{"^EncodedType":"java.util.ArrayList","^ObjectID":"1","^Value":[]},"MethodParms":{"^EncodedType":"java.util.Arrays$ArrayList","^ObjectID":"2","^Value":[]},"ReplyTo":"org.kie.workbench.common.screens.library.api.LibraryService:RPC.getAllUsers::94:RespondTo:RPC","ErrorTo":"org.kie.workbench.common.screens.library.api.LibraryService:RPC.getAllUsers::94:Errors:RPC"}]
The URL hit is : business-central/in.59966-45867.erraiBus?z=56&clientId=59966-45867&wait=1
It took more than a minute to process.
Problem Description
I had this same problem on 7.38.0. The problem, I believe, is that ERRAI seems to keep a rolling series of 45-second requests open between the client and server to ensure the communication channel stays open. For me, Nginx had a default socket timeout of 30s, which meant that it was returning a 504 gateway timeout for these requests when in actuality they weren't "stuck". This would only happen if you didn't do anything within Business Central for 30 seconds, as otherwise the request would close and a new one would take over. I feel like ERRAI should really be able to recover from such a scenario, but anyway.
Solution
For me, I updated the socket timeout of my Nginx server to 60s so that the 45s requests didn't get timed out by Nginx. I believe this is equivalent to the proxy_read_timeout config in Nginx.
If you can't touch your Nginx config, it seemed like there may also be a way to turn off the server to client communication as outlined here: https://docs.jboss.org/errai/4.0.0.Beta3/errai/reference/html_single/#sid-59146643_BusLifecycle-TurningServerCommunicationOnandOff. I didn't test this as I didn't need to, but it may be an option.

What is sent to the server when I modify the Host: header with FiddlerScript?

I'm using FiddlerScript to modify the request as follows:
oSession.oRequest["Host"] = "www.example.com";
oSession["x-overridehost"] = "Dotted.Quad.IP.Address";
Now, when I inspect one of the modified sessions, I see this:
GET https://www.example.com/rest/of/url HTTP/1.1
Host: www.example.com
My question is whether the host name from the full URL is passed to the server, or if the server is only sent something like:
GET /rest/of/url HTTP/1.1
in the first line of the request. I don't have access to the server's encryption key, so I can't use something like Wireshark to examine the exact traffic that is going out over the network.
If it helps at all, I see the following when performing a GET to an application running on my local machine:
GET http://localhost:51425/ HTTP/1.1
Host: localhost:51425
.
.
.
GET should always include a fully qualified domain name.

How to send HTTP Commands through Port 80

Brief description of what I am trying to accomplish: I am working with Crestron's Simpl+ software. My job is to create a module for a sound masking system called QT Pro. QT Pro has an API that lets you control it via HTTP. I need a way to establish a connection with the QT Pro via HTTP (I have everything I need: IP, username, password).
What's the problem? I have just started working with this language. Unfortunately there isn't as much documentation as I would like, otherwise I wouldn't be here. I know I need to create a socket connection via TCP on port 80. I just don't know what I'm supposed to send through it.
Here is an example:
http://username:password@address/cmd.htm?cmd=setOneZoneData&ZN=Value&mD=Value&mN=Value&auxA=Value&auxB=Value&autoR=Value
If I were to put this into the browser's URL box and fill it in correctly, it would change the values that I specify. Am I supposed to send the entire thing? Or just the part after cmd.htm? Or is there some other way I'm supposed to send data? I'd like to stay away from the TCP/IP Module so I can keep this all within the same module.
Thanks.
You send
GET /cmd.htm?cmd=setOneZoneData&ZN=Value&mD=Value&mN=Value&auxA=Value&auxB=Value&autoR=Value HTTP/1.1
Host: address
Connection: close
(End with a blank line, that is, two line breaks after the last header. HTTP expects each line break to be CR LF.)
If you need to use HTTP basic authentication, then also include a header like
Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=
where the gibberish is the base64-encoded version of username:password.
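The request text and the Authorization header value can be reproduced with a short Python sketch. It only illustrates the byte layout described above and is not Simpl+ code: the function name build_request is made up, and address, username and password are the placeholders from the question.

```python
import base64


def build_request(host, path, username=None, password=None):
    """Build the raw bytes of a minimal HTTP/1.1 GET request."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    if username is not None:
        # Basic auth is just base64("username:password").
        token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
        lines.append("Authorization: Basic " + token.decode("ascii"))
    lines.append("Connection: close")
    # Each line ends with CR LF; an empty line terminates the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_request(
    "address",
    "/cmd.htm?cmd=setOneZoneData&ZN=Value&mD=Value",
    username="username",
    password="password",
)
print(request.decode("ascii"))
```

The returned bytes are what would be written to the TCP socket on port 80. Note that the user name and password do not appear in the request line at all; they travel only in the Authorization header.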
But surely there is some mechanism for opening HTTP connections already there for you? Just blindly throwing out headers like this and hoping the response is what you expect is not robust, to say the least.
To see what is going on with your requests and responses, a great tool is netcat (or telnet, for that matter.)
Do nc address 80 to connect to server address on port 80, then paste your HTTP request:
GET /cmd.htm HTTP/1.1
Host: address
Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=
Connection: close
and see what comes back. Something should come back. (Remember to finish the request with a blank line, i.e. press Enter twice after the last header.)
To see what requests your browser is sending when you do something that works, you can listen like this: nc -l -p 8080.
Then direct your browser to localhost:8080 with the rest of the URL as before, and you'll see the request that was sent. (Then you can type back to see how the browser handles the response.)