How to structure DKIM headers - email

I'm building a system that sends email from the ground up.
I'm currently working on the DKIM signer. I know how to do the signing itself, but I cannot find how to format the header and body before hashing them.
Does anyone know this? Thanks in advance.

3.4.1. The "simple" Header Canonicalization Algorithm
The "simple" header canonicalization algorithm does not change header
fields in any way. Header fields MUST be presented to the signing or
verification algorithm exactly as they are in the message being
signed or verified. In particular, header field names MUST NOT be
case folded and whitespace MUST NOT be changed.
3.4.2. The "relaxed" Header Canonicalization Algorithm
The "relaxed" header canonicalization algorithm MUST apply the
following steps in order:
- Convert all header field names (not the header field values) to
  lowercase. For example, convert "SUBJect: AbC" to "subject: AbC".
- Unfold all header field continuation lines as described in
  [RFC5322]; in particular, lines with terminators embedded in
  continued header field values (that is, CRLF sequences followed by
  WSP) MUST be interpreted without the CRLF. Implementations MUST
  NOT remove the CRLF at the end of the header field value.
- Convert all sequences of one or more WSP characters to a single SP
  character. WSP characters here include those before and after a
  line folding boundary.
- Delete all WSP characters at the end of each unfolded header field
  value.
- Delete any WSP characters remaining before and after the colon
  separating the header field name from the header field value. The
  colon separator MUST be retained.
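The relaxed header steps above can be sketched in Python roughly as follows. This is a minimal sketch, not a full DKIM implementation: the function name `relaxed_header` is made up for illustration, and it assumes the input is one complete, well-formed header field (name, colon, value, possibly folded) held in a `str` and ending in CRLF.

```python
import re


def relaxed_header(field: str) -> str:
    """Canonicalize one raw header field with the "relaxed" algorithm.

    Assumption: `field` is a single complete header field, including any
    folded continuation lines and the terminating CRLF.
    """
    # Split at the first colon; a header field name cannot contain one.
    name, _, value = field.partition(":")
    # Lowercase the name only. Stripping it removes WSP before the colon.
    name = name.strip(" \t").lower()
    # Unfold: drop every CRLF (the terminating one is re-added below).
    value = value.replace("\r\n", "")
    # Collapse each run of WSP (space or tab) into a single SP.
    value = re.sub(r"[ \t]+", " ", value)
    # Remove the WSP left at the end of the value and right after the colon.
    value = value.strip(" ")
    # Keep the colon separator and the final CRLF.
    return f"{name}:{value}\r\n"


if __name__ == "__main__":
    # The folded "B" field from Example 1 of the quoted text.
    print(repr(relaxed_header("B : Y\t\r\n\tZ  \r\n")))
```

Note that only the field name is lowercased; the case of the value is left alone.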
3.4.3. The "simple" Body Canonicalization Algorithm
The "simple" body canonicalization algorithm ignores all empty lines
at the end of the message body. An empty line is a line of zero
length after removal of the line terminator. If there is no body or
no trailing CRLF on the message body, a CRLF is added. It makes no
other changes to the message body. In more formal terms, the
"simple" body canonicalization algorithm converts "*CRLF" at the end
of the body to a single "CRLF".
Note that a completely empty or missing body is canonicalized as a
single "CRLF"; that is, the canonicalized length will be 2 octets.
The SHA-1 value (in base64) for an empty body (canonicalized to a
"CRLF") is:
uoq1oCgLlTqpdDX/iUbLy7J1Wic=
The SHA-256 value is:
frcCV1k9oG9oKj3dpUqdJg1PxRT2RSN/XKdLCPjaYaY=
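A minimal Python sketch of the "simple" body canonicalization and the body hash might look like this. The function names `simple_body` and `body_hash` are hypothetical, and the body is assumed to be available as raw `bytes` with CRLF line endings.

```python
import base64
import hashlib


def simple_body(body: bytes) -> bytes:
    """Replace the run of CRLFs at the end of the body with exactly one CRLF."""
    # Strip every trailing CRLF (there may be none) ...
    while body.endswith(b"\r\n"):
        body = body[:-2]
    # ... then add a single one back. An empty body becomes just CRLF.
    return body + b"\r\n"


def body_hash(canonical: bytes, algorithm: str = "sha256") -> str:
    """Hash an already canonicalized body and return the digest in base64."""
    digest = hashlib.new(algorithm, canonical).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Compare these with the SHA-256 and SHA-1 values quoted above.
    print(body_hash(simple_body(b"")))
    print(body_hash(simple_body(b""), "sha1"))
```

Hashing the canonicalized empty body is a convenient self-check, since it should reproduce the two values quoted above.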
3.4.4. The "relaxed" Body Canonicalization Algorithm
The "relaxed" body canonicalization algorithm MUST apply the
following steps (1) and (2) in order:
1. Reduce whitespace:
   - Ignore all whitespace at the end of lines. Implementations
     MUST NOT remove the CRLF at the end of the line.
   - Reduce all sequences of WSP within a line to a single SP
     character.
2. Ignore all empty lines at the end of the message body. "Empty
   line" is defined in Section 3.4.3. If the body is non-empty but
   does not end with a CRLF, a CRLF is added. (For email, this is
   only possible when using extensions to SMTP or non-SMTP transport
   mechanisms.)
The SHA-1 value (in base64) for an empty body (canonicalized to a
null input) is:
2jmj7l5rSw0yVb/vlWAYkK/YBwk=
The SHA-256 value is:
47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=
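The "relaxed" body algorithm can be sketched in the same way. Again this is only an illustration under stated assumptions: `relaxed_body` is a made-up name, and the body is assumed to be raw `bytes` using CRLF as the line terminator.

```python
import re


def relaxed_body(body: bytes) -> bytes:
    """Canonicalize a message body with the "relaxed" algorithm."""
    if not body:
        # An empty body stays empty (a null input to the hash).
        return b""
    lines = body.split(b"\r\n")
    # split() yields one extra empty item after a final CRLF; drop it.
    if lines[-1] == b"":
        lines.pop()
    # Step 1: remove WSP at the end of each line, then collapse the
    # remaining runs of WSP to a single SP.
    lines = [re.sub(rb"[ \t]+", b" ", line.rstrip(b" \t")) for line in lines]
    # Step 2: ignore empty lines at the end of the body.
    while lines and lines[-1] == b"":
        lines.pop()
    # Every remaining line gets its CRLF, including the last one.
    return b"".join(line + b"\r\n" for line in lines)


if __name__ == "__main__":
    # The body from Example 1 of the quoted text.
    print(relaxed_body(b" C \r\nD \t E\r\n\r\n\r\n"))
```

Unlike the simple algorithm, a body that consists only of empty lines canonicalizes to zero octets here, which is why the two hash values for an empty body differ.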
3.4.5. Canonicalization Examples (INFORMATIVE)
In the following examples, actual whitespace is used only for
clarity. The actual input and output text is designated using
bracketed descriptors: "<SP>" for a space character, "<HTAB>" for a
tab character, and "<CRLF>" for a carriage-return/line-feed sequence.
For example, "X <SP> Y" and "X<SP>Y" represent the same three
characters.
Example 1: A message reading:
A: <SP> X <CRLF>
B <SP> : <SP> Y <HTAB><CRLF>
<HTAB> Z <SP><SP><CRLF>
<CRLF>
<SP> C <SP><CRLF>
D <SP><HTAB><SP> E <CRLF>
<CRLF>
<CRLF>
when canonicalized using relaxed canonicalization for both header and
body results in a header reading:
a:X <CRLF>
b:Y <SP> Z <CRLF>
and a body reading:
<SP> C <CRLF>
D <SP> E <CRLF>
Example 2: The same message canonicalized using simple
canonicalization for both header and body results in a header
reading:
A: <SP> X <CRLF>
B <SP> : <SP> Y <HTAB><CRLF>
<HTAB> Z <SP><SP><CRLF>
and a body reading:
<SP> C <SP><CRLF>
D <SP><HTAB><SP> E <CRLF>
Example 3: When processed using relaxed header canonicalization and
simple body canonicalization, the canonicalized version has a header
of:
a:X <CRLF>
b:Y <SP> Z <CRLF>
and a body reading:
<SP> C <SP><CRLF>
D <SP><HTAB><SP> E <CRLF>

Related

perl regex - pattern matching

Can anyone explain what is being done below?
$name=~m,common/([^/]+)/run.*/([^/]+)/([^/]+)$,;
common, run and / match themselves.
() captures.
[^/]+ matches 1 or more characters that aren't /.
.* matches 0 or more characters that aren't Line Feeds.[1]
$ is equivalent to (?=\n?\z).[2]
\n? optionally matches a Line Feed.
\z matches the end of the string.
I think it's trying to match a path of one or both of the following forms:
.../common/XXX/runYYY/XXX/XXX
common/XXX/runYYY/XXX/XXX
Where
XXX is a sequence of at least one character that doesn't contain /.
YYY is a sequence of any number of characters (incl zero) that doesn't contain /.
It matches more than that, however.
It matches uncommon/XXX/runYYY/XXX/XXX
It matches common/XXX/runYYY/XXX/XXX/XXX/XXX/XXX/XXX
The three XXX parts are captured (available to the caller as $1, $2 and $3).
[1] When the s flag isn't used.
[2] When the m flag isn't used.
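To see the captures in action, here is a small sketch that ports the same pattern to Python's `re` module, which treats these constructs the same way as Perl does. The helper name `captures` and the sample paths are made up for illustration.

```python
import re

# Same pattern as in the question, without Perl's m,..., delimiters.
PATTERN = re.compile(r"common/([^/]+)/run.*/([^/]+)/([^/]+)$")


def captures(path: str):
    """Return the three captured groups, or None if the path doesn't match."""
    match = PATTERN.search(path)
    return match.groups() if match else None


if __name__ == "__main__":
    for path in (
        "/opt/common/proj/run_01/logs/out.txt",  # the intended form
        "uncommon/a/run1/b/c",                   # also matches
        "common/a/run1/b/c/d/e",                 # .* swallows the extra parts
        "common/a/b/c",                          # no "run": no match
    ):
        print(path, "->", captures(path))
```

Because the pattern isn't anchored at the start and `.*` is free to cross `/`, the last two captures are always the final two path components.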

Regex rule match up to string

I need to use grep / egrep / sed to extract certain parts out of a SNORT rule string.
given a string that can be in the format:
alert tcp any any -> any any (msg:"Some message";
content:"c1"; content:"GET /blah"; offset:0; depth:9; content:"something else";)
How would I go about extracting just the following:
content:"GET /blah"; offset:0; depth:9;
Given that the following are true:
It must match up until the start of the next content match (if there is one)
A rule may have only this content term, or it may have more, and they may be in any order
Other modifiers may be applied before, after or in between the offset and depth operators, they must also be extracted as follows:
content:"GET "; offset:5; http_uri; depth:12;
Rules can be "malformed" i.e. instead of having a single semicolon after the content term it may have two or more.
What I have so far which I believe would work in other regex systems is:
(GET|POST).*?(?=content)
The idea behind this being that .*? is an ungreedy match on any character any number of times and a non grabbing (not sure if that's the term) match on the next term "content".
I believe this breaks though if there is no following content term and also doesn't seem to extract anything in grep or egrep.
Not sure what to do, any ideas?
This should do the trick:
grep -Po '\bcontent\s*:\s*"(GET|POST)\b[^"]*"((?!;\s*content\s*:)[^"]|"[^"]*")*;'
Sample input:
alert tcp any any -> any any (msg:"Some message";
content:"c1"; content:"GET /blah"; offset:0; depth:9; content:"something else";)
content:"GET "; offset:5; http_uri; depth:12;
Output:
content:"GET /blah"; offset:0; depth:9;
content:"GET "; offset:5; http_uri; depth:12;
Explanation:
Instead of looking ahead for the next content, I am using a negative lookahead to consume anything other than the word content. This way, end of line also qualifies as the end of the match.
The regex in detail:
\b - word boundary (to prevent matching e.g. othercontent)
content\s*:\s* - literally: content followed by a colon; with optional spaces
" - opening quote
(GET|POST) - either one of these verbs
\b - word boundary (to prevent matching e.g. POSTAL)
[^"]*" - everything up to and including the closing quote
( - begin repeating subpattern
(?!;\s*content\s*:) - negative lookahead, to make sure we stop before any subsequent content
[^"] - any non-quote; spaces, letters, colons, semicolons...
| - or...
"[^"]*" - some attribute string; matching this as a whole prevents the negative lookahead from picking up something between quotes
)* - end repeating subpattern; zero or more times
; - closing semicolon
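For anyone who wants to experiment with the pattern outside grep, here is a rough Python equivalent. It is a sketch only: the name `extract_http_content` is hypothetical, and the groups are written as non-capturing because only the whole match is needed.

```python
import re

# Same structure as the grep -P pattern, with non-capturing groups.
CONTENT_RE = re.compile(
    r'\bcontent\s*:\s*"(?:GET|POST)\b[^"]*"'   # content:"GET ..." or "POST ..."
    r'(?:(?!;\s*content\s*:)[^"]|"[^"]*")*;'   # modifiers up to the next content
)


def extract_http_content(rule: str) -> list:
    """Return every GET/POST content term together with its modifiers."""
    return [match.group(0) for match in CONTENT_RE.finditer(rule)]


if __name__ == "__main__":
    rule = ('alert tcp any any -> any any (msg:"Some message"; content:"c1"; '
            'content:"GET /blah"; offset:0; depth:9; content:"something else";)')
    print(extract_http_content(rule))
    print(extract_http_content('content:"GET "; offset:5; http_uri; depth:12;'))
```

The same pattern also copes with doubled semicolons after the content term, since a semicolon is only refused when the next thing after it is another content keyword.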

Escape from open()

I am a newcomer to Perl, however not to programming in general. I have been looking for any hints how to escape from open() in Perl, but have not been lucky, and that is why I am asking here.
I have a:
$mailprog = '/usr/lib/sendmail';
open(MAIL,"|$mailprog -t");
read(STDIN, $buffer, 18);
print MAIL "To: xxx#xxx.xxx\n";
print MAIL "From: xxx#xxx.xxx\n";
print MAIL "Subject: xxx\n";
print MAIL $buffer;
close (MAIL);
Is there any way I can shape the input that goes into $buffer so as to escape from sendmail? The buffer input length is arbitrary. The input is totally under my control. Thanks a lot for any ideas!
man sendmail says:
By default, Postfix sendmail(1) reads a message from standard input
until EOF or until it reads a line with only a . character, and
arranges for delivery. Postfix sendmail(1) relies on the postdrop(1)
command to create a queue file in the maildrop directory.
So you would want your input to contain the sequence "\n.\n" somewhere.
Only one sequence is special to sendmail once it starts reading the body: A line containing a single . signals the end of the input. (EOF does the same.)
That means that if your input contains a line that contains nothing but ., you need to escape it. The default transfer encoding doesn't provide a means of escape, so you will need to specify a Content-Transfer-Encoding that avoids the issue (e.g. base64) or allows you to escape the period (e.g. quoted-printable), and encode the content accordingly.
This brings us to the restrictions that the content transfer encoding you choose imposes.
The default content transfer encoding, 7bit, requires lines of no more than 998 octets terminated by CRLF. Those lines may only contain octets in [1,127], and octets 10 and 13 may only appear as part of the line terminator.
If the content transfer encoding you chose isn't suitable to encode your input, you will need to choose a different one.
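As an illustration of the base64 option, here is a minimal Python sketch that builds a message whose body can never contain a line consisting of a single period. The function name `build_message`, the `example.com` addresses and the `text/plain; charset=utf-8` content type are assumptions made for the example. It does not validate the header values, so it does nothing against header injection.

```python
import base64


def build_message(to_addr: str, from_addr: str, subject: str, body: bytes) -> bytes:
    """Return headers plus a base64-encoded body, ready to pipe to sendmail -t."""
    headers = [
        f"To: {to_addr}",
        f"From: {from_addr}",
        f"Subject: {subject}",
        "MIME-Version: 1.0",
        "Content-Type: text/plain; charset=utf-8",  # assumed content type
        "Content-Transfer-Encoding: base64",
    ]
    # encodebytes() emits short lines made only of the base64 alphabet,
    # so no encoded line can be a lone "." whatever the input contains.
    encoded = base64.encodebytes(body)
    return ("\n".join(headers) + "\n\n").encode("ascii") + encoded


if __name__ == "__main__":
    message = build_message("to@example.com", "from@example.com", "test",
                            b"first line\n.\nlast line\n")
    print(message.decode("ascii"))
```

The receiving side decodes the body again, so the lone period survives in the delivered text without ever appearing on the wire to sendmail.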
You really should be using something like Email::Sender instead of working at such a low level.

What is the proper way of denoting URI with diacritics (letters with accents)?

What is the correct and official way of using diacritics in URI?
I have 3 different ways shown below:
Here á = %E1, â = %E2, space = %20, comma = %2C, but this link doesn't work properly since the characters are mangled:
http://www.recordspreservation.org/cgi-bin/list_directory_1.cgi?directory=%2CBrasil%2CGoi%E1s%2CLuzi%E2nia%2CSanta%20Luzia%2CBatismos%201749-1753%2CImagens&image_name=_MG_5229.JPG
Here space = %20, comma = %2C and I don't do anything with the a's. This link works:
http://www.recordspreservation.org/cgi-bin/list_directory_1.cgi?directory=%2CBrasil%2CGoiás%2CLuziânia%2CSanta%20Luzia%2CBatismos%201749-1753%2CImagens&image_name=_MG_5229.JPG
Here space = +, comma = %2C and I don't do anything with the a's. This link works:
http://www.recordspreservation.org/cgi-bin/list_directory_1.cgi?directory=%2CBrasil%2CGoiás%2CLuziânia%2CSanta+Luzia%2CBatismos+1749-1753%2CImagens&image_name=_MG_5229.JPG
The characters in a URL string must be within a restricted subset of 7-bit ASCII, and no encoding is specified for wide characters
Some of that set are unreserved, and may be used literally anywhere the syntax allows
The remaining characters are reserved because they form part of the URL syntax; reserved characters must be percent-encoded if they are used outside their syntactical meaning
Eight-bit characters that are in neither the reserved nor the unreserved categories must always be percent-encoded
##Unreserved characters
0 to 9
A to Z
a to z
-
.
_
~
##Reserved characters
! - %21
# - %23
$ - %24
& - %26
' - %27
( - %28
) - %29
* - %2A
+ - %2B
, - %2C
/ - %2F
: - %3A
; - %3B
= - %3D
? - %3F
@ - %40
[ - %5B
] - %5D
This link doesn't work properly since the characters are mangled
That is a problem between the client and the server. It looks like you're sending ISO-8859-1 characters, in which scheme E1 and E2 correspond to a acute (á) and a circumflex (â). But if your server is expecting UTF-8 encoding, then those should appear as the byte sequences C3 A1 and C3 A2
I can't tell what encoding is expected by your server, but it clearly isn't what you're sending. The current standard is to encode non-ASCII characters in UTF-8 and percent-encode the resulting bytes
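The difference between the two encodings is easy to reproduce. The following Python sketch uses the standard `urllib.parse.quote` function; the wrapper name `percent_encode` is made up for the example, and it encodes every reserved character because `safe` is set to the empty string.

```python
from urllib.parse import quote


def percent_encode(text: str, encoding: str = "utf-8") -> str:
    """Percent-encode everything except the unreserved characters."""
    return quote(text, safe="", encoding=encoding)


if __name__ == "__main__":
    # UTF-8: each accented letter becomes two percent-encoded bytes.
    print(percent_encode("Goiás"))
    # ISO-8859-1: a single byte, which is what the first link is sending.
    print(percent_encode("Goiás", "latin-1"))
```

Comparing the two outputs with the links in the question shows which byte sequence a given link is actually sending.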
###Update
The best solution is to use the URI module, which will encode character strings as necessary
Take special note that, if you need to use UTF-8-encoded characters in your source code, as below, then you must have use utf8 at the top of your program. You also need to make sure that your editor is writing UTF-8 data to the program file.
use utf8;
use strict;
use warnings 'all';
use feature 'say';
use URI;
my $url = URI->new('http://www.recordspreservation.org/cgi-bin/list_directory_1.cgi?directory=,Brasil,Goiás,Luziânia,Santa Luzia,Batismos 1749-1753,Imagens&image_name=_MG_5229.JPG');
say $url;
###output
http://www.recordspreservation.org/cgi-bin/list_directory_1.cgi?directory=,Brasil,Goi%C3%A1s,Luzi%C3%A2nia,Santa%20Luzia,Batismos%201749-1753,Imagens&image_name=_MG_5229.JPG

Telnet connection and issue while sending input to the server

I have written a server program using select, and I connected a client to it using telnet. The connection is established successfully.
If the input is 6 characters long including the newline, the server reports a length of 7 characters. How is that possible?
Server side:
The client is sending \r\n instead of \n, which would account for the extra character. You can translate it back to just a newline with a simple regex:
# $data holds the input line from the client.
$data =~ s/\r\n/\n/g; # Search for \r\n, replace it with \n
Client side:
Assuming you're using Net::Telnet, you're probably sending 2 characters for the newline, \r and \n, as specified by the Telnet RFC.
The documentation I linked to says this,
In the input stream, each sequence of
carriage return and line feed (i.e.
"\015\012" or CR LF) is converted to
"\n". In the output stream, each
occurrence of "\n" is converted to a
sequence of CR LF. See binmode() to
change the behavior. TCP protocols
typically use the ASCII sequence,
carriage return and line feed to
designate a newline.
And the default is not binary mode (binmode), meaning that all instances of \n in your client data will be replaced by \r\n before it gets sent to the server.
The default Binmode is 0, which means
do newline translation.
You can stop the module from replacing your newlines by calling binmode on your file descriptor, or in the case of Net::Telnet, call binmode on your object and pass 1.
# Do not translate newlines.
$obj->binmode(1);
Or on the server you can search for \r\n on the input data and replace it with \n.
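The extra character is easy to demonstrate on raw bytes. This is a small Python sketch of the server-side fix; the function name `normalize_newlines` is hypothetical, and the input is assumed to be the bytes exactly as read from the socket.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Turn the CR LF pairs sent by a telnet client into plain LF."""
    return data.replace(b"\r\n", b"\n")


if __name__ == "__main__":
    received = b"hello\r\n"           # what telnet sends for "hello" + Enter
    print(len(received))              # length as seen by the server
    cleaned = normalize_newlines(received)
    print(len(cleaned))               # length after dropping the CR
```

Only complete CR LF pairs are replaced, so a bare CR inside the data is left untouched.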