HAProxy reqrep with text after last backreference - haproxy

I am trying to redirect links that point at an old application to a new one. The new application can't handle direct links, so I want to point them to a search instead, but the search term needs to be quoted to work properly. I have tried the following configuration, but it results in a 400 Bad Request:
reqrep ^([^\ :]*)\ /\?pdf=(.*) \1\ /newpdf?search=%22\2%22
The closest I have come is to remove everything after the last backreference, like this:
reqrep ^([^\ :]*)\ /\?pdf=(.*) \1\ /newpdf?search=%22\2
Is it not possible to put anything after the last backreference?

The problem isn't text after the backreference. You are overlooking the nature of the string you are manipulating:
GET /?pdf=foo HTTP/1.1 <<< here
After the URI there is a space and the HTTP version. You're capturing that inside \2. Separate it into \3, capturing a space and one or more non-space characters, anchored to the end.
reqrep ^([^\ :]*)\ /\?pdf=(.*)(\ [^\ ]+)$ \1\ /newpdf?search=%22\2%22\3
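Applied to the example request line above, this rule yields
GET /newpdf?search=%22foo%22 HTTP/1.1
with the HTTP version carried through in \3 instead of being swallowed by \2.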
A better solution would be to use fetches and the http-request request modification capabilities to manipulate the query string.
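For reference, a minimal sketch of that approach, assuming HAProxy 1.6 or newer, where set-var, set-query and set-path are available (reqrep was deprecated in 2.0 and removed in 2.1):

http-request set-var(req.pdf) url_param(pdf) if { path / } { url_param(pdf) -m found }
http-request set-query search=%22%[var(req.pdf)]%22 if { var(req.pdf) -m found }
http-request set-path /newpdf if { var(req.pdf) -m found }

Because these rules operate on parsed request elements rather than the raw start line, the HTTP version can never bleed into the rewritten query string.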

Related

Single GKE Ingress path with multiple wildcards

I am trying to create a path in a GKE ingress like this: /organizations/*/entity/*/download
The values after organizations/ and after entity/ are dynamic, so I have to use two wildcards, but this is not working: the first wildcard, /organizations/*, swallows all the requests.
I want to apply a different timeout only to this specific request, so I need to match exactly this path; if the URL ends in /test instead of /download, the rule shouldn't apply.
I can't be the only one in this situation, but I am struggling to find anything on the internet about this issue.
Any workaround?
The only supported wildcard character for the path field of an Ingress is the * character. The * character must follow a forward slash (/) and must be the last character in the pattern. For example, /*, /foo/*, and /foo/bar/* are valid patterns, but *, /foo/bar*, and /foo/*/bar are not.
Source (From GKE Ingress docs): https://cloud.google.com/kubernetes-engine/docs/concepts/ingress#multiple_backend_services
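As a sketch of the closest supported configuration (resource and service names here are hypothetical): route everything under /organizations/ to one backend and let the application distinguish /download from /test, since the path pattern cannot:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: org-ingress
spec:
  rules:
  - http:
      paths:
      - path: /organizations/*   # valid: * follows a / and is the last character
        pathType: ImplementationSpecific
        backend:
          service:
            name: org-service
            port:
              number: 80
      # path: /organizations/*/entity/*/download   # invalid: * is not the last character

A per-backend timeout on GKE would then be set through a BackendConfig (its timeoutSec field) attached to org-service, rather than through the path itself.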

Garbage in Thunderbird multipart boundaries

In messages I send with Thunderbird I have the following email header
Content-Type: multipart/alternative;
boundary="000_160551222008756181axiscom"
The message body is divided by lines looking like
--000_160551222008756181axiscom
Those leading -- mess things up for other mail viewers, like Outlook's webmail client. Where do those -- come from, and how do I get rid of them?
I'm using version 78.4.2 (64-bit) of Thunderbird.
The hyphens come from RFC 2046. You don't normally get rid of them; look for the problem elsewhere.
The Content-Type field for multipart entities requires one parameter,
"boundary". The boundary delimiter line is then defined as a line
consisting entirely of two hyphen characters ("-", decimal value 45)
followed by the boundary parameter value from the Content-Type header
field, optional linear whitespace, and a terminating CRLF.
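For illustration, a minimal well-formed multipart body (the boundary value XYZ is made up):

Content-Type: multipart/alternative; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset=utf-8

Hello, plain-text part.
--XYZ
Content-Type: text/html; charset=utf-8

<p>Hello, HTML part.</p>
--XYZ--

Every delimiter line is the boundary value prefixed with --, and the closing delimiter carries an extra trailing --. A viewer that renders these lines literally is failing to parse the MIME structure; the sender is not at fault.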

robots.txt: how are ill-formed disallow lines treated

What happens when a Disallow line includes more than one URI? Example:
Disallow: / tmp/
The white space was introduced by mistake.
Is there a standard for how crawlers deal with this? Do they ignore the whole line, or just ignore the second URI and treat it like:
Disallow: /
Google, at least, seems to treat the first non-space character as the beginning of the path, and the last non-space character as the end. Anything in-between is counted as part of the path, even if it's a space. Google also silently percent-encodes certain characters in the path, including spaces.
So the following:
Disallow: / tmp/
will block:
http://example.com/%20tmp/
but it will not block:
http://example.com/tmp/
I have verified this on Google's robots.txt tester. YMMV for crawlers other than Google.
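If the intent was to block http://example.com/tmp/, the line must be written without the stray space:
Disallow: /tmp/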

postfix header_checks.pcre wrongly blocking iPhone emails

I have a postfix/dovecot mail server which has been working fine for a year or so but today one user came to me with his iPhone and said he couldn't send emails.
It turns out the emails were being rejected by my header_checks.pcre which I set up as per the example in http://www.postfix.org/header_checks.5.html
The error I got was something like:
Apr 30 09:48:28 mail06 postfix/cleanup[28849]: 53893A00CD: reject:
header Content-Type:
image/png;??name=email_logo.png;??x-apple-part-url="part22.05080008.04000601#mydomain.com"
from unknown[112.134.156.178]; from=
to= proto=ESMTP helo=<[192.168.1.12]>: 5.7.1
Attachment name
"email_logo.png;??x-apple-part-url="part22.05080008.04000601#mydomain.com"
may not end with ".com"
So it seems that the iPhone mail app was appending an "x-apple-part-url" suffix to the attachment name, and the PCRE was mistakenly blocking it as a .com instead of letting the .png through.
Does anyone know how I can safely modify the PCRE in http://www.postfix.org/header_checks.5.html to avoid this happening?
So far as I know, ".com" is still a viable extension for Windows malware. The problem is that the second .* in the example PCRE in the Postfix documentation spans two parameters, as if the .com ended the name or filename parameter.
According to RFC 2045, value := token / quoted-string. This means you need to cater for both the quoted and unquoted cases with appropriate character classes. You could split this into two rules or, to avoid repeating the list of extensions, do something like:
/etc/postfix/header_checks.pcre:
/^Content-(Disposition|Type).*name\s*=\s*
    ("(?:[^"]|\\")*|[^();:,\/<>\#\"?=<>\[\]\ ]*)
    ((?:\.|=2E)(
        ade|adp|asp|bas|bat|chm|cmd|com|cpl|crt|dll|exe|
        hlp|ht[at]|
        inf|ins|isp|jse?|lnk|md[betw]|ms[cipt]|nws|
        \{[[:xdigit:]]{8}(?:-[[:xdigit:]]{4}){3}-[[:xdigit:]]{12}\}|
        ops|pcd|pif|prf|reg|sc[frt]|sh[bsm]|swf|
        vb[esx]?|vxd|ws[cfh])(\?=)?"?)\s*(;|$)/x
    REJECT Attachment name $2$3 may not end with ".$4"
The new second line of the rule distinguishes between the quoted and unquoted cases and any closing quotation mark is absorbed into $3.
BTW I'd probably stick .mso, .xl, .ocx (obscure MS extensions) and .jar in there too. Obviously this check is useful against malware floods but doesn't substitute for an up-to-date antivirus or more detailed spam analysis.
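To check a rule like this before deploying it, you can query the map directly with postmap, as described in the header_checks(5) man page (the evil.com value below is made up; the second query reuses the header from the log above):

postmap -q 'Content-Type: application/octet-stream; name="evil.com"' pcre:/etc/postfix/header_checks.pcre
postmap -q 'Content-Type: image/png; name=email_logo.png; x-apple-part-url="part22.05080008.04000601#mydomain.com"' pcre:/etc/postfix/header_checks.pcre

The first query should print the REJECT action; the second should print nothing, confirming the iPhone-style header now passes.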

What causes Tuckey's UrlRewriteFilter to malform urlencoded unicode characters (e.g. %C3%B6 for ö) and how can I avoid it?

We are using a simple UrlRewriteFilter rule to permanently (301) redirect HTTP requests without trailing slash to the same URL with trailing slash.
In some cases our presentation layer needs URLs with encoded special characters (e.g. %C3%B6 for ö) in it, which works fine as long as the UrlRewriteFilter is not involved. But when the rule kicks in I can see the encoded character getting malformed while redirecting, e.g.
www.mydomain.com/asdf%C3%B6asdf/ --> 301 --> www.mydomain.com/asdf%F6asdf/
%F6 is not a valid UTF-8 sequence (it ends up as a question mark in a black diamond, the replacement character, when URL-decoded).
We use UTF-8 throughout our application, it's set in response headers as well as in the HTML's <head> section. The malformed encoding occurs on Windows and Linux machines. The rewrite rule looks as follows
<rule enabled="true" match-type="regex" >
<name>Force trailing slash</name>
<note>...</note>
<condition type="request-uri" operator="notequal">...</condition> <!-- some URLs shall not be redirected -->
<from>(^[^\?]*)(\?.*)?$</from>
<to type="permanent-redirect" last="true" >$1/$2</to> <!-- adding trailing slash and query string, if present -->
</rule>
I'd be happy to hear any ideas on how this could be solved. I've played with the decode-using and encode attributes, but it did not help.
I had a similar problem. What I did was set decode-using to null:
<urlrewrite decode-using="null">
The issue I described above seems to be related to this bug report, which was filed in 2010 and has been untouched since then. I'll probably have to work around this by handling the request "manually" in Java. Other ideas are still welcome, though.
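For what it's worth, a minimal sketch of that manual approach as a servlet filter (the class name is hypothetical; assumes the javax.servlet API). The key point is that HttpServletRequest.getRequestURI() returns the still-percent-encoded URI, so the Location header can be built without ever decoding it:

import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.*;

public class TrailingSlashRedirectFilter implements Filter {
    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;

        // getRequestURI() is not decoded by the container, so %C3%B6
        // passes through untouched; no decode/re-encode cycle occurs.
        String uri = request.getRequestURI();
        String query = request.getQueryString(); // null if there is none

        if (!uri.endsWith("/")) {
            String location = uri + "/" + (query != null ? "?" + query : "");
            response.setStatus(HttpServletResponse.SC_MOVED_PERMANENTLY);
            response.setHeader("Location", location);
            return;
        }
        chain.doFilter(req, res);
    }

    @Override public void init(FilterConfig filterConfig) {}
    @Override public void destroy() {}
}

The "some URLs shall not be redirected" exclusion from the original rule would need to be re-implemented in this filter as well.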