Regex for Apache - fail2ban

I would like to create a fail2ban rule for Apache that blocks mass login attempts, based on log lines of this type:
93.176.51.15 - - [21/Nov/2019:00:02:40 +0100] "GET /wordpress/wp-login.php HTTP/1.1" 200 5485
What exact regex do I need? I currently use this:
^.+?:\d+ <HOST> -.*"(GET|POST|HEAD) .*/wp-login.php.*$
Thanks in advance

A more "precise" regex would look like:
^<HOST> \S+ \S+ [^"]*"[A-Z]{3,15}\s+\S*/wp-login\.php\b
It is anchored (^.+ is not an anchor), has no catch-alls (such as .*, and especially non-greedy ones such as .+?), and covers all HTTP methods as well as a user name supplied by the intruder (in case no authentication is expected and the web server logs that name instead of -).
And if you have fail2ban >= 0.10, use <ADDR> instead of <HOST> (faster, safer, and more precise if only IPs are logged).
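
As a rough way to check the suggested pattern outside fail2ban, the sketch below tries it in Python against the log line from the question. It assumes a simplified IPv4-only stand-in for <HOST> (fail2ban substitutes its own, more complete expression), and the names HOST, FAILREGEX, and banned_host are made up for illustration.

```python
import re

# Assumption: a simplified IPv4-only stand-in for fail2ban's <HOST> tag.
HOST = r"(?P<host>\d{1,3}(?:\.\d{1,3}){3})"

# The pattern from the answer, with <HOST> replaced by the stand-in above.
FAILREGEX = re.compile(
    r'^<HOST> \S+ \S+ [^"]*"[A-Z]{3,15}\s+\S*/wp-login\.php\b'.replace("<HOST>", HOST)
)

def banned_host(line):
    """Return the client IP if the line is a wp-login.php request, else None."""
    m = FAILREGEX.match(line)
    return m.group("host") if m else None

sample = ('93.176.51.15 - - [21/Nov/2019:00:02:40 +0100] '
          '"GET /wordpress/wp-login.php HTTP/1.1" 200 5485')
other = ('93.176.51.15 - - [21/Nov/2019:00:02:41 +0100] '
         '"GET /wordpress/index.php HTTP/1.1" 200 5485')

print(banned_host(sample))  # the client IP of the login request
print(banned_host(other))   # None: not a wp-login.php request
```

Inside fail2ban itself, the fail2ban-regex tool is the usual way to try a failregex against a real log file.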

Related

How can I separate out only the mobile number from this expression in Notepad++?

103.199.145.50 - - [02/Jan/2020:04:42:34 +0530]
"GET /mspProducerM/sendSMS?user=userjesus&pwd=jcallshttp&sender=JCALLS&mobile9159525910&msg=Promise%20for%20Jan,%2002%20The%20Lord%20blesses%20His%20people%20with%20peace.-%20Psalm%2029:11.%20For%20Prayers%20and%20Queries%20call%20044-45999000&mt=0
HTTP/1.1" 200 42
I need to extract just the number, i.e. 9159525910.

You may try the following find and replace, in regex mode, with DOT ALL enabled:
Find: \b\d+\.\d+\.\d+\.\d+ - - \[[^]]+\].*?".*?\bmobile=(\d+).*?" \d+ \d+
Replace: $1
The approach here is to match the entire line of your input, and then capture the mobile number. The replacement is just the captured phone number.
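
Outside Notepad++, the same capture-group idea can be sketched in Python. The entry below is a shortened copy of the one in the question; because the question shows mobile9159525910 without an equals sign while the answer's pattern expects mobile=, this sketch treats the = as optional, which is an assumption. MOBILE and mobile_number are illustrative names.

```python
import re

# Shortened version of the log entry from the question. The question shows
# "mobile9159525910" without "=", so "=" is optional here (an assumption).
entry = ('103.199.145.50 - - [02/Jan/2020:04:42:34 +0530]\n'
         '"GET /mspProducerM/sendSMS?user=userjesus&pwd=jcallshttp'
         '&sender=JCALLS&mobile9159525910&msg=Promise&mt=0\n'
         'HTTP/1.1" 200 42')

# Capture the run of digits that follows the "mobile" parameter name.
MOBILE = re.compile(r'\bmobile=?(\d+)')

def mobile_number(text):
    """Return the digits following the 'mobile' parameter, or None."""
    m = MOBILE.search(text)
    return m.group(1) if m else None

print(mobile_number(entry))
print(mobile_number(entry.replace("mobile9", "mobile=9")))
```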

snort | pcre| rule specification

My objective is to write a rule to detect a simple truth exploit (SQLi)
The string example is of a form:
% ' or 1 = 1 #
In order to identify the string above and some of its variations, I have developed following pcre.
pcre: "/\W\s*\W\s*or\s*([\d\w])\s*\W\s*\1\s*\W/";
I ran a test at regextester and my regex seems to work. However, in Snort, the rule does not trigger.
The rule is of a format
alert 192.168.x.x any -> 192.168.y.y 80 (msg: "SQL Query"; pcre: "/\W\s*\W\s*or\s*([\d\w])\s*\W\s*\1\s*\W/"; sid: 1001;);
I'd appreciate any help
GET request from Wireshark
GET /dvwa/vulnerabilities/sqli/?id=%25+%27+or+1+%3D+1+%23&Submit=Submit

The rule fails because of URL encoding: %25 means %, %27 means ', + (or %20) means space, and %3D means =. See https://www.w3schools.com/tags/ref_urlencode.asp
Snort has an HTTP normalization module, but I think it is not perfect.
Refer to the following rule.
alert tcp any any -> any any (content:"+or+"; nocase; pcre:"/\+or\+\w\+%3D\+\w/";)
Using pcre alone can degrade performance. When used with content, it narrows the scope of the pcre inspection and improves performance.
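
The encoding mismatch can be reproduced outside Snort. The sketch below is only an illustration in Python (not Snort behavior): it applies the question's pcre and the answer's pcre to the id value from the captured request, before and after URL decoding. The variable names are made up.

```python
import re
from urllib.parse import unquote_plus

# Value of the id parameter as it appears on the wire (from the Wireshark capture).
raw = "%25+%27+or+1+%3D+1+%23"

# The pcre from the question, and the one from the answer that targets the raw form.
question_re = re.compile(r"\W\s*\W\s*or\s*([\d\w])\s*\W\s*\1\s*\W")
answer_re = re.compile(r"\+or\+\w\+%3D\+\w")

decoded = unquote_plus(raw)  # "+" becomes a space, %XX becomes the character it encodes

print(repr(decoded))
print(bool(question_re.search(raw)))      # False: the raw bytes are still encoded
print(bool(question_re.search(decoded)))  # True: matches once decoded
print(bool(answer_re.search(raw)))        # True: written against the encoded form
```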

PCRE Regex - How to return matches with multiline string looking for multiple strings in any order

I need to use Perl-compatible regex to match several strings which appear over multiple lines in a file.
The matches need to appear in any order (server servernameA.company.com followed by servernameZ.company.com followed by servernameD.company.com or any order combination of the three). Note: All matches will appear at the beginning of each line.
In my testing with grep -P, I haven't even been able to produce a match on simple string terms that appear in any order over new lines (even when using the /s and /m modifiers). I am pretty sure from reading I need a look-ahead assertion but the samples I used didn't produce a match for me even after analyzing each bit of the regex to make sure it was relevant to my scenario.
Since I need to support this in Production, I would like an answer that is simple and relatively straight-forward to interpret.
Sample Input
irrelevant_directive = 0
# Comment
server servernameA.company.com iburst
additional_directive = yes
server servernameZ.company.com iburst
server servernameD.company.com iburst
# Additional Comment
final_directive = true
Expectation
The regex should match and return the 3 lines beginning with server (that appear in any order) if and only if there is a perfect match for the strings 'serverA.company.com', 'serverZ.company.com', and 'serverD.company.com' followed by iburst. All 3 strings must be included.
Finally, if the answer (or a very similar form of the answer) can address checking for strings in any order on a single line, that would be very helpful. For example, if I have a single-line string of: preauth param audit=true silent deny=5 severe=false unlock_time=1000 time=20ms and I want to ensure the terms deny=5 and time=20ms appear in any order and if so match.
Thank you in advance for your assistance.

Regarding the main issue [for the secondary question, see Casimir et Hippolyte's answer] (using the x modifier): https://regex101.com/r/mkxcap/5
(?:
(?<a>.*serverA\.company\.com\s+iburst.*)
|(?<z>.*serverZ\.company\.com\s+iburst.*)
|(?<d>.*serverD\.company\.com\s+iburst.*)
|[^\n]*(?:\n|$)
)++
(?(a)(?(z)(?(d)(*ACCEPT))))(*SKIP)(*F)
The matches are now all in the a, z and d capturing groups.
It's not the most efficient (it goes three times over each line with backtracking...), but the main takeaway is to register the matches with capturing groups and then checking for them being defined.
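
If a single regex is not a hard requirement, the "register each match, then check that all are present" idea can also be sketched without PCRE verbs. The following Python is a plain alternative, not the technique above; it assumes the host names from the sample input (servernameA/Z/D), and REQUIRED, SERVER_LINE, and required_server_lines are illustrative names.

```python
import re

# Assumption: the host names as they appear in the sample input.
REQUIRED = {
    "servernameA.company.com",
    "servernameZ.company.com",
    "servernameD.company.com",
}

# One "server <name> iburst" directive occupying a whole line.
SERVER_LINE = re.compile(r"^server[ \t]+(\S+)[ \t]+iburst[ \t]*$", re.MULTILINE)

def required_server_lines(text):
    """Return the matching lines if every required server is present, else []."""
    hits = [m for m in SERVER_LINE.finditer(text) if m.group(1) in REQUIRED]
    if {m.group(1) for m in hits} != REQUIRED:
        return []
    return [m.group(0).rstrip() for m in hits]

config = """irrelevant_directive = 0
# Comment
server servernameA.company.com iburst
additional_directive = yes
server servernameZ.company.com iburst
server servernameD.company.com iburst
# Additional Comment
final_directive = true
"""

for line in required_server_lines(config):
    print(line)
```

The order of the lines in the file does not matter here, because the check compares sets of names.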

You don't need to use the PCRE features, you can simply write in ERE:
grep -E '.*(\bdeny=5\b.*\btime=20ms\b|\btime=20ms\b.*\bdeny=5\b).*' file
The PCRE approach will be different: (however you can also use the previous pattern)
grep -P '^(?=.*\bdeny=5\b).*\btime=20ms\b.*' file
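
The lookahead form can be made symmetric by using one lookahead per term, so neither term has to come first. A minimal sketch in Python, using the line from the question (BOTH_TERMS and has_both are illustrative names):

```python
import re

# One lookahead per required term; each scans the whole line from the start.
BOTH_TERMS = re.compile(r"^(?=.*\bdeny=5\b)(?=.*\btime=20ms\b)")

def has_both(line):
    """True if both terms occur somewhere in the line, in any order."""
    return BOTH_TERMS.search(line) is not None

line = ("preauth param audit=true silent deny=5 severe=false "
        "unlock_time=1000 time=20ms")
print(has_both(line))
```

The \b before time keeps unlock_time=... from counting as a match, since _ is a word character.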

Google Calculator Thousands Separator Special Character

NOTE: For more answers related to this, please see
Special Characters in Google Calculator
I noticed when grabbing the return value for a Google Calculator calculation, the thousands place is separated by a rather odd character. It is not simply a space.
Let's take the example of converting $4,000 USD to GBP.
If you visit the following Google link:
http://www.google.com/ig/calculator?hl=en&q=4000%20usd%20to%20gbp
You'll note that the response is:
{lhs: "4000 U.S. dollars",rhs: "2 497.81441 British pounds",error: "",icc: true}
This looks reasonable, and the thousands place appears to be separated by a whitespace character.
However, if you enter the following into your command line:
curl -s "http://www.google.com/ig/calculator?hl=en&q=4000%20usd%20to%20gbp"
You'll note that the response is:
{lhs: "4000 U.S. dollars",rhs: "2?498.28243 British pounds",error: "",icc: true}
That question mark (?) is a replacement character. What is going on?
AppleScript returns a different replacement character:
{lhs: "4000 U.S. dollars",rhs: "2†498.28243 British pounds",error: "",icc: true}
I am also getting from other sources:
{lhs: "4000 U.S. dollars",rhs: "2�498.28243 British pounds",error: "",icc: true}
It turns out that � is the proper Unicode replacement character 65533.
Can anyone give me insight into what Google is passing me?

It's a non-breaking space, U+00A0. It's to ensure that the number won't get broken at the end of a line.
Google returns the correct encoding (UTF-8) however:
Content-Type: text/html; charset=UTF-8
so ...
if it comes out as a normal space (U+0020) instead (Firefox does that when copying, stupidly enough), then the application performs conversion of certain characters to lookalikes, maybe to fit in some sort of restricted code page (ASCII perhaps).
if there is a question mark, then it was correctly read as Unicode but some part in processing uses a legacy character set that doesn't contain that character so it gets converted.
if there is a replacement character � (U+FFFD) then it was likely read as UTF-8, converted into a legacy character set that contains the character (e.g. Latin 1) and then re-interpreted as UTF-8.
if there is a totally different character, such as your dagger (†), then I'd guess the response is read correctly as Unicode, gets converted to a character set that contains the character and re-interpreted in another character set. A quick look at the Mac Roman codepage reveals that A0 indeed maps to †.
Needless to say, some parts of whatever you use to process that response seem to be horribly broken with regard to Unicode. Something I'd hope wouldn't really happen that often in this millennium, but apparently it still does.
I figured out what it was by fiddling around in PowerShell a bit:
PS Home:\> $wc = new-object net.webclient
PS Home:\> $x = $wc.downloadstring('http://www.google.com/ig/calculator?hl=en&q=4000%20usd%20to%20gbp')
PS Home:\> [char[]]$x|%{"$_ - " + +$_}
...
" - 34
2 - 50
  - 160
4 - 52
9 - 57
8 - 56
. - 46
2 - 50
8 - 56
2 - 50
4 - 52
...
Also a quick look at the response headers revealed that the encoding is set correctly.
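
The conversions described above can be reproduced with Python's built-in codecs. This is only a sketch of the byte-level round trips, using the number from the question; it does not call the Google endpoint, and the variable names are made up.

```python
NBSP = "\u00a0"            # the character between "2" and "498"
value = "2" + NBSP + "498.28243"

utf8_bytes = value.encode("utf-8")        # what arrives on the wire as UTF-8

# Case "?": forced into a character set that lacks U+00A0 (ASCII here).
as_question_mark = value.encode("ascii", errors="replace").decode("ascii")

# Case U+FFFD: converted to Latin-1 (single byte 0xA0), then read as UTF-8.
as_replacement = value.encode("latin-1").decode("utf-8", errors="replace")

# Case dagger: the Latin-1 byte 0xA0 read as Mac Roman.
as_dagger = value.encode("latin-1").decode("mac_roman")

# Code points, similar in spirit to the PowerShell dump above.
for ch in value[:3]:
    print(repr(ch), ord(ch))
```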

According to my tests with curl in the Terminal on OS X, changing the international character encoding in the Terminal preferences indicates that the response encoding is ISO Latin 1.
When I set the encoding to UTF-8, I get "2?498.28243".
When I set the encoding to MacRoman, I get "2†498.28243".
First solution: use a user agent from any browser (Safari on OS X 10.6.8 in this example)
curl -s -A 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.48 (KHTML, like Gecko) Version/5.1 Safari/534.48' 'http://www.google.com/ig/calculator?hl=en&q=4000%20usd%20to%20gbp'
Second solution: use iconv
curl -s 'http://www.google.com/ig/calculator?hl=en&q=4000%20usd%20to%20gbp' | iconv -t utf8 -f iso-8859-1
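
For reference, the iconv step amounts to decoding the bytes as ISO-8859-1 and re-encoding them as UTF-8. A minimal Python sketch, with a hypothetical response fragment standing in for the real output (latin1_to_utf8 is an illustrative name):

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8, like `iconv -f iso-8859-1 -t utf8`."""
    return raw.decode("iso-8859-1").encode("utf-8")

# Hypothetical response fragment with the separator as the single byte 0xA0.
raw = b'rhs: "2\xa0498.28243 British pounds"'
print(latin1_to_utf8(raw))
```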

Try
set myUrl to quoted form of "http://www.google.com/ig/calculator?hl=en&q=4000%20usd%20to%20gbp"
set xxx to do shell script "curl " & myUrl & " | sed 's/[†]/,/'"

What are these lines of log-parsing Perl doing and how can I come up with something that might work?

This problem comes under the context of pop-before-smtp / Postfix / Dovecot, but if I knew Perl string parsing, I could come up with an answer myself. However, I'm so lost I don't even know the precise question. To wit:
We've been using Postfix for a LONG time now and are kind of hooked on it. Now we need to "move into the modern era" and let people SEND email from our SMTP server(s) even when they're outside our network. So, tasked with this job, I've found pop-before-smtp.
You can find it here.
So, I've got it all configured but it fails in testing. I've troubleshot it using the directions here, and determined that the Perl that's trying to parse the log appears to be incorrect. We're using Dovecot as our IMAP / POP server, and there are three choices given in the configuration file. Here is an excerpt from the config file showing the three sets:
# For Dovecot POP3/IMAP when using syslog.
#$pat = '^[LOGTIME] \S+ (?:dovecot: )?(?:imap|pop3)-login: ' .
# 'Login: .*? (?:\[|rip=)[:f]*(\d+\.\d+\.\d+\.\d+)[],]';
#$out_pat = '^[LOGTIME] \S+ (?:dovecot: )?(?:imap|pop3)-login: ' .
# 'Disconnected.*? (?:\[|rip=)[:f]*(\d+\.\d+\.\d+\.\d+)[],]';
# For Dovecot POP3/IMAP when it does its own logging.
##$logtime_pat = '(\d\d\d\d-\d+-\d+ \d+:\d+:\d+)';
#$pat = '^dovecot: [LOGTIME] Info: (?:imap|pop3)-login: ' .
# 'Login: .+? rip=[:f]*(\d+\.\d+\.\d+\.\d+),';
#$out_pat = '^dovecot: [LOGTIME] Info: (?:imap|pop3)-login: ' .
# 'Disconnected.*? rip=[:f]*(\d+\.\d+\.\d+\.\d+),';
# For older Dovecot POP3/IMAP when it does its own logging.
#$pat = '^(?:imap|pop3)-login: [LOGTIME] Info: ' .
# 'Login: \S+ \[[:f]*(\d+\.\d+\.\d+\.\d+)\]';
#$out_pat = '^(?:imap|pop3)-login: [LOGTIME] Info: ' .
# 'Disconnected.*? \[[:f]*(\d+\.\d+\.\d+\.\d+)\]';
One is supposed to uncomment the ones that apply, however, none of them work.
I surmise that 'pat' is the pattern for login, and out-pat is the pattern for logging out or otherwise disconnecting.
The actual log record format is clearly different from any of these three, but they're close. Here is an example pair:
Mar 11 17:53:55 imap-login: Info: Login: user=<username>, method=PLAIN, rip=208.54.4.205, lip=192.168.1.1, TLS
Mar 11 17:59:10 IMAP(username): Info: Disconnected: Logged out bytes=352/43743
When using POP, 'imap-login' is replaced by 'pop-login', and on log-out, 'POP' replaces 'IMAP' - why the changes in capitalization I can't say!
Important data are: the timestamp, the username, and, when logging in, the "remote" IP ("rip").
Given enough time, I may be able to piece together something that works, but since I don't actually know Perl, this is kind of tough. Please help me write new rules to parse the logging output used with our Dovecot package.

The (?:...) portion of a Perl regular expression asks for clustering but not capturing; this allows entire groups to be matched or ignored as a group without influencing the capture group numbers; all the lines capture exactly one field, the IP to allow. (Which is a little odd, I might have expected both username and IP, but this might be easier in the long run.)
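
The effect of clustering without capturing on group numbers can be seen in a small sketch. It uses Python's re module, which shares the (?:...) syntax with Perl, and a made-up log line:

```python
import re

line = "pop3-login: Login: rip=10.0.0.7,"   # made-up line for illustration

capturing = re.search(r"(imap|pop3)-login: .*rip=(\d+\.\d+\.\d+\.\d+)", line)
clustering = re.search(r"(?:imap|pop3)-login: .*rip=(\d+\.\d+\.\d+\.\d+)", line)

print(capturing.groups())   # protocol and IP: the IP is group 2
print(clustering.groups())  # IP only: the IP is group 1
```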
# For Dovecot POP3/IMAP when using syslog.
$pat = '^[LOGTIME] \S+ (?:imap|pop3)-login: Info: ' .
'Login: .*? (?:\[|rip=)[:f]*(\d+\.\d+\.\d+\.\d+)[],]';
# not necessary? see comment header START OF PATTERNS
# $out_pat = '^[LOGTIME] \S+ (?:IMAP|POP3)\(\S+\): Info: ' .
# 'Disconnected.*';
I've removed the dovecot pieces since they weren't in your input. I added the Info: to both lines. I've modified the $out_pat to use IMAP(username) instead of the no-longer-there imap-login from the original. (The use of \S+ will break if usernames have spaces. Since this assumption was made elsewhere in the file, I hope it's fine.)
Since there is no longer any IP address to capture for the logout line, it is probably best to not define $out_pat -- the START OF PATTERNS comment block includes the phrase If the entry of your choice also provides $out_pat, you should uncomment that variable as well, which allows us to keep track of users who are still connected to the server (e.g. Thunderbird caches open IMAP connections).
I haven't tested this but I have good feelings about it.
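
One way to try the login pattern before touching the config is to run it against the sample line. The sketch below does that in Python rather than Perl, and rests on assumptions: LOGTIME is a guessed stand-in for pop-before-smtp's [LOGTIME] placeholder (a syslog-style timestamp), and all names are illustrative. Since the sample line in the question shows nothing between the timestamp and imap-login, the sketch also tries a variant in which the \S+ field after the timestamp is optional; that variant is an adjustment for illustration, not part of the pattern above.

```python
import re

# Assumption: stand-in for pop-before-smtp's [LOGTIME] placeholder.
LOGTIME = r"(?P<ts>\w{3} +\d+ \d+:\d+:\d+)"
LOGIN_TAIL = (r"(?:imap|pop3)-login: Info: "
              r"Login: .*? (?:\[|rip=)[:f]*(?P<ip>\d+\.\d+\.\d+\.\d+)[\],]")

# As posted: a non-space field (normally the syslog hostname) follows the timestamp.
as_posted = re.compile("^" + LOGTIME + r" \S+ " + LOGIN_TAIL)
# Variant for illustration: the same pattern with that field made optional.
host_optional = re.compile("^" + LOGTIME + r" (?:\S+ )?" + LOGIN_TAIL)

sample = ("Mar 11 17:53:55 imap-login: Info: Login: user=<username>, "
          "method=PLAIN, rip=208.54.4.205, lip=192.168.1.1, TLS")

print(as_posted.search(sample))      # None for this sample: no hostname field
m = host_optional.search(sample)
print(m.group("ts"), m.group("ip"))  # timestamp and remote IP
```

If the real log lines do carry a hostname after the timestamp, both forms match, so the optional form is the more forgiving of the two.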
