awk: Process all lines after a certain date

I have an Apache log file I'd like to process with awk. But I'd like to process only lines after a certain date. I'm already converting the dates in the log file with the code found in https://stackoverflow.com/a/2115940/130121
Log file lines look like this:
194.88.248.197 - - [18/Sep/2012:11:08:40 +0200] "GET start HTTP/1.1" 200 3063 "" "Mozilla/5.0 (X11; Linux i686; rv:8.0) Gecko/20100101 Firefox/8.0"
How can I use a date that I give as a command line parameter to compare that date to the dates in the log file?

gawk supports time functions such as mktime and strftime. You could use them to format the parameter and then use the formatted parameter in a pattern to select only the records you are interested in.
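For example, here is a minimal sketch of that idea (the field position, the month-name lookup and the access.log file name are assumptions based on the sample line above, and the timezone offset is ignored): pass the cutoff with -v and compare epoch seconds built with mktime():

gawk -v cutoff="2012-09-18 00:00:00" '
BEGIN {
    # assumption: the cutoff is given as "YYYY-MM-DD HH:MM:SS"
    split(cutoff, c, /[- :]/)
    cutoff_ts = mktime(c[1] " " c[2] " " c[3] " " c[4] " " c[5] " " c[6])
    months = "JanFebMarAprMayJunJulAugSepOctNovDec"
}
{
    # $4 looks like [18/Sep/2012:11:08:40 -- strip the bracket, split on / and :
    split(substr($4, 2), d, /[\/:]/)
    mon = (index(months, d[2]) + 2) / 3
    ts = mktime(d[3] " " mon " " d[1] " " d[4] " " d[5] " " d[6])
    if (ts >= cutoff_ts) print
}' access.log

Comparing epoch seconds rather than formatted strings keeps the test correct across month and year boundaries.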

Regex for Apache

I would like to create a rule for Apache to block massive login attempts, based on this type of log:
93.176.51.15 - - [21/Nov/2019:00:02:40 +0100] "GET /wordpress/wp-login.php HTTP/1.1" 200 5485
What is the exact regex that I need? I currently use this:
^.+?:\d+ <HOST> -.*"(GET|POST|HEAD) .*/wp-login.php.*$
Thanks in advance
More "precise" regex would look like:
^<HOST> \S+ \S+ [^"]*"[A-Z]{3,15}\s+\S*/wp-login\.php\b
It is anchored (^.+ is not an anchor), has no catch-alls (like .*, and especially no non-greedy ones like .+?), and covers all HTTP methods as well as an intruder-supplied user name (if no auth is expected and the web server logs it instead of -).
And if you have fail2ban >= 0.10, use <ADDR> instead of <HOST> (faster, safer and more precise if only IPs are logged).
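You can test such a pattern against your own log before enabling the jail with the fail2ban-regex tool that ships with fail2ban (the log path below is an assumption, adjust it to your setup):

fail2ban-regex /var/log/apache2/access.log '^<HOST> \S+ \S+ [^"]*"[A-Z]{3,15}\s+\S*/wp-login\.php\b'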

Google Calculator Thousands Separator Special Character

NOTE: For more answers related to this, please see
Special Characters in Google Calculator
I noticed when grabbing the return value for a Google Calculator calculation, the thousands place is separated by a rather odd character. It is not simply a space.
Let's take the example of converting $4,000 USD to GBP.
If you visit the following Google link:
http://www.google.com/ig/calculator?hl=en&q=4000%20usd%20to%20gbp
You'll note that the response is:
{lhs: "4000 U.S. dollars",rhs: "2 497.81441 British pounds",error: "",icc: true}
This looks reasonable, and the thousands place appears to be separated by a whitespace character.
However, if you enter the following into your command line:
curl -s "http://www.google.com/ig/calculator?hl=en&q=4000%20usd%20to%20gbp"
You'll note that the response is:
{lhs: "4000 U.S. dollars",rhs: "2?498.28243 British pounds",error: "",icc: true}
That question mark (?) is a replacement character. What is going on?
AppleScript returns a different replacement character:
{lhs: "4000 U.S. dollars",rhs: "2†498.28243 British pounds",error: "",icc: true}
I am also getting from other sources:
{lhs: "4000 U.S. dollars",rhs: "2�498.28243 British pounds",error: "",icc: true}
It turns out that � is the Unicode replacement character, U+FFFD (decimal 65533).
Can anyone give me insight into what Google is passing me?
It's a non-breaking space, U+00A0. It's to ensure that the number won't get broken at the end of a line.
Google returns the correct encoding (UTF-8) however:
Content-Type: text/html; charset=UTF-8
so ...
if it comes out as a normal space (U+0020) instead (Firefox does that when copying, stupidly enough), then the application performs conversion of certain characters to lookalikes, maybe to fit in some sort of restricted code page (ASCII perhaps).
if there is a question mark, then it was correctly read as Unicode but some part in processing uses a legacy character set that doesn't contain that character so it gets converted.
if there is a replacement character � (U+FFFD) then it was likely read as UTF-8, converted into a legacy character set that contains the character (e.g. Latin 1) and then re-interpreted as UTF-8.
if there is a totally different character, such as your dagger (†), then I'd guess the response is read correctly as Unicode, gets converted to a character set that contains the character and re-interpreted in another character set. A quick look at the Mac Roman codepage reveals that A0 indeed maps to †.
Needless to say, some parts of whatever you use to process that response seem to be horribly broken with regard to Unicode. Something I'd hope wouldn't really happen that often in this millennium, but apparently it still does.
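You can reproduce the dagger case from the command line (a sketch, assuming a UTF-8 terminal with iconv and xxd available): U+00A0 is the byte pair C2 A0 in UTF-8, converting it to Latin-1 leaves the single byte A0, and a MacRoman-configured viewer renders A0 as †.

printf '2\xc2\xa0498' | iconv -f UTF-8 -t ISO-8859-1 | xxd
# the dump shows the lone a0 byte between "2" and "498"; viewed as MacRoman that byte shows up as †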
I figured out what it was by fiddling around in PowerShell a bit:
PS Home:\> $wc = new-object net.webclient
PS Home:\> $x = $wc.downloadstring('http://www.google.com/ig/calculator?hl=en&q=4000%20usd%20to%20gbp')
PS Home:\> [char[]]$x|%{"$_ - " + +$_}
...
" - 34
2 - 50
  - 160
4 - 52
9 - 57
8 - 56
. - 46
2 - 50
8 - 56
2 - 50
4 - 52
...
Also a quick look at the response headers revealed that the encoding is set correctly.
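If you want to check that header yourself, something like this works (the endpoint has since been retired, so treat it as illustrative only):

curl -sD - -o /dev/null 'http://www.google.com/ig/calculator?hl=en&q=4000%20usd%20to%20gbp' | grep -i '^content-type'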
According to my tests with curl in the Terminal on OS X (changing the international character encoding in the Terminal preferences), the response encoding is ISO Latin 1.
When I set the encoding to UTF8 : I get "2?498.28243"
When I set the encoding to MacRoman : I get "2†498.28243"
First solution: use a user agent from any browser (Safari on OS X 10.6.8 in this example)
curl -s -A 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.48 (KHTML, like Gecko) Version/5.1 Safari/534.48' 'http://www.google.com/ig/calculator?hl=en&q=4000%20usd%20to%20gbp'
Second solution: use iconv
curl -s 'http://www.google.com/ig/calculator?hl=en&q=4000%20usd%20to%20gbp' | iconv -t utf8 -f iso-8859-1
Try
set myUrl to quoted form of "http://www.google.com/ig/calculator?hl=en&q=4000%20usd%20to%20gbp"
set xxx to do shell script "curl " & myUrl & " | sed 's/[†]/,/'"

Maildrop: Filter mail by Date: header

I'm using a getmail + maildrop + mutt + msmtp chain, with messages stored in a Maildir. A very big inbox bothers me, so I wanted to organize mail by date like this:
Maildir
|-2010.11->all messages with "Date: *, * Nov 2010 *"
|-2010.12->same as above...
|-2011.01
`-2011.02
I've googled a lot and read about the mailfilter language, but it is still hard for me to write such a filter. Maildrop's mailing list archives have almost nothing on this (as far as I could tell from scanning them). There is a semi-solution at https://unix.stackexchange.com/questions/3092/organize-email-by-date-using-procmail-or-maildrop, but I don't like it, because I want to use the "Date:" header and I want to sort by month as "YEAR.MONTH" in digits.
Any help, thoughts, links or materials will be appreciated.
Using mostly man pages, I came up with the following solution for use on Ubuntu 10.04. Create a mailfilter file called, for example, mailfilter-archive with the following content:
DEFAULT="$HOME/mail-archive"
MAILDIR="$DEFAULT"
# Uncomment the following to get logging output
#logfile $HOME/tmp/maildrop-archive.log
# Create maildir folder if it does not exist
`[ -d $DEFAULT ] || maildirmake $DEFAULT`
if (/^date:\s+(.+)$/)
{
    datefile=`date -d "$MATCH1" +%Y-%m`
    to $DEFAULT/$datefile
}
# In case the message is missing a date header, send it to a default mail file
to $DEFAULT/inbox
This uses the date command, taking the date header content as input (assuming it is in RFC-2822 format) and producing a formatted date to use as the mail file name.
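For example, with GNU date (the header value below is made up for illustration):

date -d "Thu, 18 Nov 2010 11:08:40 +0200" +%Y-%m
# prints 2010-11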
Then execute the following on existing mail files to archive your messages:
cat mail1 mail2 mail3 mail4 | reformail -s maildrop mailfilter-archive
If the mail-archive contents look good, you could remove the mail1, mail2, mail3, mail4, etc. mail files.

characters changed in a Curl request

When I look at the XML data feed I get with the code below, the special characters are correct in the XML.
However, when cURL returns the data, characters like "ó" and "ä" are converted into "ó" and "ä" respectively.
This conversion happens to all special characters; these two are just an example.
$myvar = curl_init();
$myURL = "http://someurl.com/";
curl_setopt($myvar, CURLOPT_USERAGENT, '[Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2) Gecko/20070219 Firefox/2.0.0.2")]');
curl_setopt($myvar, CURLOPT_URL, $myURL);
curl_setopt($myvar, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($myvar, CURLOPT_TRANSFERTEXT, TRUE);
curl_setopt($myvar, CURLOPT_CONNECTTIMEOUT,3);
$xmlstr = curl_exec ($myvar);
The header of the XML file says to encode as follows: <?xml version="1.0" encoding="UTF-8"?>
All I want is to get the same characters to show up in the Curl result without any transformation.
Hoping I just missed some plain easy step; looking forward to any answers.
Best regards
Fons
How do you know $xmlstr contains the wrong bytes? If you're looking at the output in a terminal window of some sort, it's probable that the problem is that the terminal does not support UTF-8, not that cURL is broken.
cURL doesn't care about UTF-8 or any other character encoding - its job is just to fetch a sequence of bytes from somewhere. It's not likely to be doing anything that will mangle special characters. If there's something wrong with the way you're using cURL, it'll be mangling everything, not just non-ASCII characters.
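One way to convince yourself of this (a sketch, assuming a UTF-8 terminal with iconv installed): "ó" in UTF-8 is the byte pair C3 B3, and interpreting those two bytes as Latin-1 is exactly what produces the mangled pair you are seeing, so the bytes coming back from cURL are intact and only the display step picks the wrong character set.

printf 'ó' | iconv -f ISO-8859-1 -t UTF-8
# treats the two UTF-8 bytes of "ó" as Latin-1 and prints the mangled pair: Ã³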

How can I change the format of a date from the command line?

What's the quickest way to convert a date in one format, say
2008-06-01
to a date in another format, say
Sun 1st June 2008
The important bit is actually the 'Sun' because depending on the dayname, I may need to fiddle other things around - in a non-deterministic fashion. I'm running GNU bash, version 3.2.17(1)-release (i386-apple-darwin9.0).
[Background: The reason that I want to do it from the command line, is that what I really want is to write it into a TextMate command... It's an annoying task I have to do all the time in textMate.]
$ date -d '2005-06-30' +'%a %F'
Thu 2005-06-30
See man date for other format options.
This option is available on Linux, but not on Darwin. In Darwin, you can use the following syntax instead:
date -j -f "%Y-%m-%d" 2006-06-30 +"%a %F"
The -f argument specifies the input format and the + argument specifies the output format.
As pointed out by another poster below, you would be wise to use %u (numeric day of week) rather than %a to avoid localization issues.
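For instance, with GNU date (this assumes the de_DE.UTF-8 locale is installed; it is only meant to show that %a moves with the locale while %u does not):

LC_TIME=de_DE.UTF-8 date -d '2008-06-01' +'%a %u'   # So 7
LC_TIME=C date -d '2008-06-01' +'%a %u'             # Sun 7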
Reading the date(1) manpage would have revealed:
-j      Do not try to set the date.  This allows you to use the -f flag
        in addition to the + option to convert one date format to another.
Thanks for that sgm. So just so I can come back to refer to it -
date -j -f "%Y-%m-%d" "2008-01-03" +"%a%e %b %Y"
^ ^ ^
parse using | output using
this format | this format
|
date expressed in
parsing format
Thu 3 Jan 2008
Thanks.
date -d yyyy-mm-dd
If you want more control over formatting, you can also add it like this:
date -d yyyy-mm-dd +%a
to just get the Sun part that you say you want.
date -d ...
doesn't seem to cut it, as I get a usage error:
usage: date [-jnu] [-d dst] [-r seconds] [-t west] [-v[+|-]val[ymwdHMS]] ...
            [-f fmt date | [[[mm]dd]HH]MM[[cc]yy][.ss]] [+format]
I'm running GNU bash, version 3.2.17(1)-release (i386-apple-darwin9.0), and as far as the man goes, date -d is just for
-d dst  Set the kernel's value for daylight saving time.  If dst is non-
        zero, future calls to gettimeofday(2) will return a non-zero for
        tz_dsttime.
If you're just looking to get the day of the week, don't try to match strings; that breaks when the locale changes. The %u format gives you the day number:
$ date -j -f "%Y-%m-%d" "2008-01-03" +"%u"
4
And indeed, that was a Thursday. You might use that number to index into an array you have in your program, or just use the number itself.
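For example, a small sketch of that idea in shell, using the BSD/macOS -j -f form from above (on Linux the equivalent would be date -d "2008-06-01" +%u):

dow=$(date -j -f "%Y-%m-%d" "2008-06-01" +"%u")   # 7 = Sunday
case "$dow" in
    6|7) echo "weekend fiddling" ;;
    *)   echo "weekday fiddling" ;;
esac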
See the date and strftime man pages for more details. The date manpage on OS X is the wrong one, though, since it doesn't list these options that work.