Go (lang) parsing an email header and keeping order

I'm using the net/mail library in Go and everything is great; however, I want to pass in an original email and keep the order of the headers. This is important because each mail server that passes the message on adds its headers in order. Without that order, it's hard to know who received what, when, and which headers each server added.
The net/mail library stores the headers in a map, which by definition has no concept of order. That seems a strange choice, since header order is determined purely by the order in which the headers appear in the email, but it is the case.
Does anyone have any suggestions as to how I can retain the order in which the headers were read?
Thanks

The net/mail package uses the net/textproto package to parse the headers
(see ReadMessage()). Specifically, it uses ReadMIMEHeader() for
the headers, which is documented as:
The returned map m maps CanonicalMIMEHeaderKey(key) to a sequence of values
in the same order encountered in the input.
You can view the full source if you want, but the basic process is:
headers := make(map[string][]string)
for {
    // readNextHeader stands in for the textproto code that reads and
    // unfolds the next "Key: value" line.
    key, value := readNextHeader()
    if key == "" {
        return headers // blank line: end of headers
    }
    if headers[key] == nil {
        headers[key] = []string{value}
    } else {
        headers[key] = append(headers[key], value)
    }
}
It's true that the original order of the headers as they appeared in the message
is lost, but I'm not aware of any scenario where this truly matters. What
isn't lost is the order of the multi-valued headers. The slice ensures they're
in the same order as they appeared in the email.
You can verify this with a simple program which loops over the headers and
compares the values (such as this one in the
Playground).
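For example, something along these lines (a minimal sketch; the addresses and dates are invented purely for illustration):

package main

import (
    "fmt"
    "net/mail"
    "strings"
)

func main() {
    // A made-up message with two Received headers, newest first.
    raw := "Received: from mx2.example.com by mail.example.com; Tue, 2 Jan 2024 10:00:02 +0000\r\n" +
        "Received: from mx1.example.com by mx2.example.com; Tue, 2 Jan 2024 10:00:01 +0000\r\n" +
        "From: alice@example.com\r\n" +
        "Subject: order test\r\n" +
        "\r\n" +
        "body\r\n"

    msg, err := mail.ReadMessage(strings.NewReader(raw))
    if err != nil {
        panic(err)
    }

    // The map key is canonicalised, but the slice keeps the values in
    // the order they appeared in the message.
    for i, v := range msg.Header["Received"] {
        fmt.Printf("Received[%d]: %s\n", i, v)
    }
}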
However, matching Received and Received-SPF headers is a bit more complex,
as:
not every Received header may have a corresponding Received-SPF header;
the Received-SPF header may not appear above the Received header; this is
recommended but not mandated by the RFC (besides, many programs don't
even follow the RFC, so this wouldn't be a guarantee anyway).
So you'll either need to parse the value of the headers and match them based on
that, or use the net/textproto package for more low-level access to the
headers. You can use the source of ReadMIMEHeader() as a starting point.
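If you do need the overall order of all headers, not just the per-key order, one option is to read the header block yourself and keep the fields in a slice. A rough sketch, assuming reasonably well-formed input (it uses ReadContinuedLine to unfold continuation lines and does no other validation):

package main

import (
    "bufio"
    "fmt"
    "net/textproto"
    "strings"
)

// header is one key/value pair, kept in the order it was read.
type header struct {
    Key, Value string
}

// readOrderedHeaders reads the header block and keeps every field in a
// slice, so the overall order is preserved (unlike a map).
func readOrderedHeaders(r *textproto.Reader) ([]header, error) {
    var hs []header
    for {
        // ReadContinuedLine joins folded (indented) continuation lines.
        line, err := r.ReadContinuedLine()
        if err != nil {
            return nil, err
        }
        if line == "" { // blank line marks the end of the headers
            return hs, nil
        }
        key, value, ok := strings.Cut(line, ":")
        if !ok {
            return nil, fmt.Errorf("malformed header line: %q", line)
        }
        hs = append(hs, header{
            Key:   textproto.CanonicalMIMEHeaderKey(key),
            Value: strings.TrimSpace(value),
        })
    }
}

func main() {
    raw := "Received: from a by b\r\nReceived-SPF: pass\r\nReceived: from c by a\r\nSubject: hi\r\n\r\nbody\r\n"
    r := textproto.NewReader(bufio.NewReader(strings.NewReader(raw)))
    hs, err := readOrderedHeaders(r)
    if err != nil {
        panic(err)
    }
    for _, h := range hs {
        fmt.Printf("%s: %s\n", h.Key, h.Value)
    }
    // What remains in r after the loop is the message body.
}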


How to handle error responses in a REST endpoint that accepts different Accept header values.

I'm trying to add a new content type to a REST endpoint. Currently it only returns JSON, but I now also need to be able to return a CSV file.
As far as I know, the best way to do this is by using the Accept header with the value text/csv and then adding a converter that is able to react to this and convert the returned body to the proper CSV representation.
I've been able to do this, but then I have a problem handling exceptions. Up until now, all the errors returned are in JSON. The frontend expects any 500 status code to contain a specific body with the error. But now, by adding the option to return either application/json or text/csv from my endpoint, in case of an error the converter used to transform the body is going to be either the Jackson converter or my custom one, depending on the Accept header passed. Moreover, my frontend is going to need to read the content-type returned and parse the value based on the type of representation returned.
Is this the normal approach to handle this situation?
A faster workaround would be to forget about the Accept header and include a url parameter indicating the format expected. Doing it this way, I'd be able to change the content type of the response and the parsing of the data directly in the controller, since the GET request won't include any Accept header and will accept anything. There are already some parts of the code doing this where the only expected response format is CSV, so I'm going to have a hard time defending the use of the Accept header unless there is a better way of handling this.
my frontend is going to need to read the content-type returned and parse the value based on the type of representation returned.
Is this the normal approach to handle this situation?
Yes.
For example, RFC 7807 describes a common format for describing problems. So the server would send an application/problem+json or an application/problem+xml representation of the issue in the response, along with the usual metadata in the headers.
Consumers that understand application/problem+json can parse the data within it and forward a useful description of the problem to the user, the logs, or wherever. Consumers that don't understand that representation are limited to acting on the information in the headers.
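The question sounds like Spring, but the shape of such a response is easy to sketch in any language. Here is a hypothetical Go handler that always reports errors as application/problem+json, whatever representation the client asked for (the /report path and error text are made up):

package main

import (
    "encoding/json"
    "net/http"
)

// problem mirrors the RFC 7807 members used here.
type problem struct {
    Type   string `json:"type,omitempty"`
    Title  string `json:"title"`
    Status int    `json:"status"`
    Detail string `json:"detail,omitempty"`
}

// writeProblem reports an error as application/problem+json regardless of
// what media type the client asked for in its Accept header.
func writeProblem(w http.ResponseWriter, status int, title, detail string) {
    w.Header().Set("Content-Type", "application/problem+json")
    w.WriteHeader(status)
    json.NewEncoder(w).Encode(problem{Title: title, Status: status, Detail: detail})
}

func main() {
    http.HandleFunc("/report", func(w http.ResponseWriter, r *http.Request) {
        // Pretend the CSV export failed, whatever representation was requested.
        writeProblem(w, http.StatusInternalServerError,
            "Report generation failed", "The CSV exporter timed out")
    })
    http.ListenAndServe(":8080", nil)
}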
A faster workaround would be to forget about the Accept header and include a url parameter indicating the format expected.
That's also fine -- more precisely, you can have a different resource responsible for each of the different media types that you support.
It may be useful to review section 3.4 of RFC 7231, which describes the semantics of content negotiation.

PlayWS: calculate the size of an HTTP call without consuming the stream

I'm currently using the PlayWS HTTP client, which returns an Akka stream. From my understanding, I can consume the stream and turn it into a Byte[] to calculate the size. However, this also consumes the stream and I can't use it anymore. Any way around this?
I think there are two different aspects related to the question.
You want to know the size of the server response in advance, to prepare a buffer. Unfortunately there is no guaranteed way to do this. The HTTP 1.1 spec explicitly allows, via chunked transfer encoding, a transfer mode in which the server does not know the size of the response in advance. See also this quote from section 3.3.1, Transfer-Encoding:
A recipient MUST be able to parse the chunked transfer coding
(Section 4.1) because it plays a crucial role in framing messages
when the payload body size is not known in advance.
Section 3.3.3, Message Body Length, specifies how the length of a message body is determined; besides the aforementioned chunked transfer encoding, it also contains this rather unhelpful provision:
Otherwise, this is a response message without a declared message
body length, so the message body length is determined by the
number of octets received prior to the server closing the
connection.
This is kept for backward compatibility and its use is discouraged, but it is still legal.
Still, in many real-world scenarios you can use the Content-Length header field that the server may return. However, there is a catch here as well: if gzip Content-Encoding is used, then Content-Length will contain the size of the compressed body.
To sum up: in the general case you can't get the size of the message body before you have fully received the server response, i.e. in terms of code, before you perform a blocking call on the response. You may try to use Content-Length, and it might or might not help in your specific case.
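To make that concrete (sketched in Go rather than PlayWS, simply to keep the example short and self-contained), this is roughly what inspecting the server's declared length looks like:

package main

import (
    "fmt"
    "net/http"
)

func main() {
    resp, err := http.Get("https://example.com/")
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    // ContentLength is -1 when the server declared no length (e.g. chunked
    // transfer encoding) or when the transport transparently decompressed
    // the body and discarded the original header.
    fmt.Println("declared length:", resp.ContentLength)
    fmt.Println("content-encoding:", resp.Header.Get("Content-Encoding"))
    fmt.Println("transfer-encoding:", resp.TransferEncoding)
}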
You already have a fully downloaded response (or you are OK with blocking on your StreamedResponse) and you want to process it by first getting the size and only then processing the actual data. In such a case you may first use the getBodyAsBytes method, which returns an IndexedSeq[Byte] and thus has a size, and then convert it into a new Source using Source.single, which is actually exactly what the default (i.e. non-streaming) implementation of getBodyAsSource does.
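The same buffer-then-measure idea, again sketched in Go rather than PlayWS/Akka for brevity, looks roughly like this:

package main

import (
    "bytes"
    "fmt"
    "io"
    "net/http"
)

func main() {
    resp, err := http.Get("https://example.com/")
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    // Drain the stream once into memory to learn its exact size...
    body, err := io.ReadAll(resp.Body)
    if err != nil {
        panic(err)
    }
    fmt.Println("size in bytes:", len(body))

    // ...then hand a fresh reader over the same bytes to whatever still
    // needs to consume the "stream".
    consume(bytes.NewReader(body))
}

func consume(r io.Reader) {
    io.Copy(io.Discard, r)
}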

Writing the Pragma header in a DelegatingHandler in Asp.Net Web API

I've asked this question over on programmers that's linked to this one. I'm trying to find a suitable header, that is unlikely to be stripped, that I can use to send back a unique Request ID with every response, even if it does not send a body.
One of the headers I considered was the Pragma header, as looking at the spec it appears to be intended not only for the additional no-cache HTTP 1.0 backwards-compatibility value, but also for application-specific values, so I should be able to use it. It should be possible, for example, to send something like no-cache; requestid=id.
So in a DelegatingHandler I tried writing to it with my ID:
//HttpResponseMessage Response;
Response.Headers.Add("pragma", "some_value");
But it always arrives at the client as just no-cache. I think Web API automatically sends caching headers consistent with caching being switched off, which includes the Pragma one.
So, how do I make sure my value is maintained and not overwritten?
I've cracked it: the answer is to make sure you also set the CacheControl header on the HttpResponseMessage, which then bypasses some slightly fishy logic in System.Web.Http.WebHost.HttpControllerHandler (I've opened a discussion on CodePlex about this; I think the logic needs to be changed).
So instead of
//HttpResponseMessage Response;
Response.Headers.Add("pragma", "some_value");
you have to do:
// Set Cache-Control explicitly so Web API doesn't rewrite the Pragma header.
Response.Headers.CacheControl =
    new System.Net.Http.Headers.CacheControlHeaderValue()
    {
        NoCache = true
    };
Response.Headers.Add("pragma", "some_value");
(I've used NoCache since the current API default is to switch caching off for all responses).

Detect duplicated header in HTTP::Response

I have a problem with an HTTP::Response Perl object for a remote server that sometimes returns the HTTP response with duplicated 'Content-Length' headers.
When this occurs, if the Content-Length value is '43215' and I read the header value with:
print ($response->header('Content-length'));
the result is:
4321543215
How can I detect whether the header is duplicated and access the real value?
From the fine manual for HTTP::Headers:
A multi-valued field will be returned as separate values in list context and will be
concatenated with "," as separator in scalar context.
and this is list context:
print ($response->header('Content-length'))
So, $response->header() is returning both Content-length headers as a list and the result is, essentially:
print join('', 43215, 43215)
You can either use kork's $response->content_length() approach or grab all the Content-length headers in an array and use the first one as the length:
my @lengths = $response->header('Content-length');
my $length  = $lengths[0];
If you end up getting multiple Content-length headers and they're different then someone is very confused.
You cannot detect this, at least not reliably. You could of course split the header value in the middle and check whether the left half equals the right half, but with sizes like 4444 you can't tell whether it's duplicated or not. The only real fix is in the upstream server that sends you the duplicated headers.
You could maybe try to access the content length via the content_length accessor:
$response->content_length
Maybe this is aware of duplicate headers, but I did not try it.

Parse and display MIME multipart email on website

I have a raw email, (MIME multipart), and I want to display this on a website (e.g. in an iframe, with tabs for the HTML part and the plain text part, etc.). Are there any CPAN modules or Template::Toolkit plugins that I can use to help me achieve this?
At the moment, it's looking like I'll have to parse the message with Email::MIME, then iterate over all the parts, and write a handler for all the different mime types.
It's a long shot, but I'm wondering if anyone has done all this already? It's going to be a long and error prone process writing handlers if I attempt it myself.
Thanks for any help.
I actually dealt with this problem just a few months ago. I added an email feature to the product I work on, both sending and receiving. The first part was sending reminders to users, but we didn't want to manage the bounce-backs for our customer admins, so we decided to have a message inbox in which the admins could see bounces and replies without us, and deal with adjusting email addresses if they needed to.
Because of this, we accept all email that is sent to an inbox we watch. We use VERP to associate an email with a user, and store the entire email as is in the database. Then, when the admin requests to see the email, we have to parse the email.
My first attempt was very similar to an earlier answer: if one of the parts is HTML, show it; if it's text, show it; otherwise, show the original, raw email. This broke down very fast with a few emails not generated by sendmail. Outlook, Exchange, and a few other email systems don't do that; they use multiparts to send the email. After a lot of digging and cussing, I discovered that the problem doesn't appear to be well documented. With the help of looking through MHonArc and reading the RFCs (RFC 2045 and RFC 2046), I settled on the solution below. I decided against using MHonArc, since I couldn't easily reuse its parsing and display functionality. I wouldn't say this is perfect, but it's been good enough that we used it.
First, take the message and use Email::MIME to parse it. Then call a function called get_part with the array of parts Email::MIME gives you with ->parts().
For each part it is passed, get_part decodes the content type, looks it up in a hash and, if it exists, calls the function associated with that content type. If the decoder was able to give us something, it is pushed onto a result array.
The last piece of the puzzle is this decoder array. Basically, it defines the content types I can deal with:
text/html
text/plain
message/delivery-status, which is actually also plain text
multipart/mixed
multipart/related
multipart/alternative
The non-multipart sections I return as is. With mixed, related and alternative, I merely call get_parts on that MIME node and return the results. Because alternative is special, it has some extra code after calling get_parts: it will return only the HTML if it has an HTML part, or only the text part if it has a text part. If it has neither, it won't return anything valid.
The advantage of the hash of valid content types is that I can easily add logic for more parts as needed. And by the time get_parts is done, you should have an array of all the content you care about.
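The answer above is Perl, but if it helps to see the recursive dispatch spelled out, here is a rough sketch of the same idea using Go's standard library (mime and mime/multipart). It deliberately skips Content-Transfer-Encoding handling and the multipart/alternative preference logic described above:

package main

import (
    "fmt"
    "io"
    "mime"
    "mime/multipart"
    "net/mail"
    "os"
    "strings"
)

// collectParts walks a MIME tree and returns the leaf bodies we know how to
// display, in the order they appear. mediaType and params come from parsing
// the Content-Type header of the enclosing entity.
func collectParts(mediaType string, params map[string]string, body io.Reader) ([]string, error) {
    switch {
    case strings.HasPrefix(mediaType, "multipart/"):
        var out []string
        mr := multipart.NewReader(body, params["boundary"])
        for {
            p, err := mr.NextPart()
            if err == io.EOF {
                return out, nil
            }
            if err != nil {
                return nil, err
            }
            mt, ps, err := mime.ParseMediaType(p.Header.Get("Content-Type"))
            if err != nil {
                continue // skip parts with an unparsable type
            }
            sub, err := collectParts(mt, ps, p)
            if err != nil {
                return nil, err
            }
            out = append(out, sub...)
        }
    case mediaType == "text/plain", mediaType == "text/html":
        b, err := io.ReadAll(body)
        if err != nil {
            return nil, err
        }
        return []string{string(b)}, nil
    default:
        return nil, nil // a content type we don't display
    }
}

func main() {
    msg, err := mail.ReadMessage(os.Stdin)
    if err != nil {
        panic(err)
    }
    mt, params, err := mime.ParseMediaType(msg.Header.Get("Content-Type"))
    if err != nil {
        panic(err)
    }
    parts, err := collectParts(mt, params, msg.Body)
    if err != nil {
        panic(err)
    }
    for _, p := range parts {
        fmt.Println(p)
    }
}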
One more item I should mention. As a part of this, we created a separate domain that actually serves these messages. The main domain that an admin works on will refuse to serve the message and redirect the browser to our user content domain. This second domain will only serve user content. This is to help the browser properly sandbox the content away from our main domain. See same origin policy (http://en.wikipedia.org/wiki/Same_origin_policy)
It doesn't sound like a difficult job to me:
use Email::MIME;

my $parsed = Email::MIME->new($message);
my @parts  = $parsed->parts;    # These will be Email::MIME objects, too.

print <<EOF;
<html><head><title>!</title></head><body>
EOF

for my $part (@parts) {
    my $content_type = $part->content_type;    # the part's type, not the whole message's
    if ($content_type eq "text/plain") {
        print "<pre>", $part->body, "</pre>\n";
    }
    elsif ($content_type eq "text/html") {
        print $part->body;
    }
    # Handle some more cases here
}

print <<EOF;
</body></html>
EOF
Reuse existing complete software. The MHonArc mail-to-HTML converter has excellent MIME support.