I am writing a small mail parser for a project, and to understand mail better.
A question came up about multiparts.
When a mail has Content-Type: multipart/alternative, or any other multipart type with a divider (boundary), can that multipart contain another multipart (e.g. a multipart/mixed), or the other way round? If so, which divider does the nested part use: does it have its own?
In other words, is the MIME type multipart/* a flat structure (one that can be parsed by splitting on a single divider), or is it a tree (where each split part can itself be split again)?
After three long nights of programming and intensive testing, I found that multipart/* is NOT flat. It is a tree structure, and each nested multipart declares its own boundary in its own Content-Type header. For example, if you have an HTML part and a plaintext part as well as attachments, the mail is multipart/mixed, holding the attachments and a multipart/alternative part. If there are also inline images, the HTML part can be multipart/related, holding the images and the HTML.
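To illustrate the tree structure, here is a minimal sketch in Python using the standard library's email package. The sample message, its boundary strings (outer, inner) and the helper name outline are made up for this example; only the parsing calls come from the library.

```python
from email import message_from_string

# Hypothetical sample: multipart/mixed holding a multipart/alternative
# (plain + HTML) and one attachment. Each multipart has its own boundary.
RAW = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="outer"\n'
    "\n"
    "--outer\n"
    'Content-Type: multipart/alternative; boundary="inner"\n'
    "\n"
    "--inner\n"
    "Content-Type: text/plain\n"
    "\n"
    "Hello\n"
    "--inner\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>Hello</p>\n"
    "--inner--\n"
    "--outer\n"
    "Content-Type: application/pdf\n"
    'Content-Disposition: attachment; filename="a.pdf"\n'
    "\n"
    "%PDF-placeholder\n"
    "--outer--\n"
)


def outline(part, depth=0):
    """Return one indented line per MIME part, recursing into multiparts."""
    lines = ["  " * depth + part.get_content_type()]
    if part.is_multipart():
        for child in part.get_payload():  # a list of sub-messages
            lines.extend(outline(child, depth + 1))
    return lines


tree = outline(message_from_string(RAW))
print("\n".join(tree))
```

The recursion is the point: a parser that splits only once on the outer boundary would hand back the whole multipart/alternative as a single opaque part.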
Executive summary: I can construct a multipart message which contains HTML in two separate parts, and it displays correctly in multiple MUAs. However, outlook.com insists on putting any additional HTML after the first HTML part in a downloadable attachment, instead of displaying it. It also does this for plaintext parts.
In detail: I need to add a signature to an email message, where the structure of the original message is, in general, unknown. I do this by wrapping the original message in a multipart/mixed, and then adding a new multipart/alternative which contains text and html versions of the required signature.
If the original message was itself a multipart/alternative, then the new message now looks like:
multipart/mixed
    multipart/alternative [the original message]
        text/plain
        text/html
    multipart/alternative [the appended signature]
        text/plain [plaintext signature]
        text/html [html signature]
This displays well in various clients (Thunderbird, and webmail from gmail/Yahoo/AOL/gmx), showing the original message with the appended signature. However, it doesn't work for MS clients (I've only tried outlook.com). The two alternative signatures are presented to the user as attachments, and not inline, so the user only sees download boxes.
To get around this, I've historically done this for Microsoft:
multipart/mixed
    multipart/alternative [the original message]
        text/plain
        text/html
    text/html [html-only signature]
This worked for several years for Microsoft, but has now stopped working - the signature is again shown as an attachment.
I've spent some hours experimenting with this, and can't find any way to get outlook.com to show two different HTML (or even plain) text parts in the same message. The second one always appears as an attachment. Some of the things I've tried are:
Replacing the second multipart/alternative above with another multipart/mixed, which encloses the multipart/alternative signature
Trying to force Content-Disposition: inline for the signature: this never works, and MS appears to ignore Content-Disposition
Replacing the outer multipart/mixed with multipart/related, with type=multipart/alternative
Any ideas on how I can get MS clients to actually show the signature, short of parsing the internals of the original message and re-writing it?
I want to write an HTTP implementation.
I've been reading for a few days about sending files over HTTP with Content-Type: multipart/form-data, and I'm really interested in how browsers (or any HTTP client) create that kind of request.
I have already looked at a lot of questions about it here on Stack Overflow, such as:
How does HTTP file upload work?
What does enctype='multipart/form-data' mean?
I dug into RFCs 2616 (and newer versions), 2046, etc., but I didn't find a clear answer (obviously I did not get the idea behind it). In most articles and answers I found this kind of request, which is simple for me to interpret, since all of these things are documented in the RFCs...
POST /upload?upload_progress_id=12344 HTTP/1.1
Host: localhost:3000
Content-Length: 1325
Origin: http://localhost:3000
... other headers ...
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryePkpFF7tjBAqx29L

------WebKitFormBoundaryePkpFF7tjBAqx29L
Content-Disposition: form-data; name="MAX_FILE_SIZE"

100000
------WebKitFormBoundaryePkpFF7tjBAqx29L
Content-Disposition: form-data; name="uploadedfile"; filename="hello.o"
Content-Type: application/x-object

... contents of file goes here ...
------WebKitFormBoundaryePkpFF7tjBAqx29L--
...and it would be simple to implement an HTTP client that constructs a string like that in any language. The problem is the ... contents of file goes here ... part: there's little information about what the "contents of file" actually is. I know it's binary data with a certain type and encoding, but it's difficult to think beyond string data: how would I add a piece of binary data that has no string representation inside a string?
I would like to see examples of low-level implementations of the HTTP protocol in any language, and maybe in-depth explanations of binary data transfer over HTTP: how a client creates requests and how a server reads and parses them. PS: I know this question may look like a duplicate, but most of the answers do not focus on explaining binary data transfer (such as media).
You should not try to handle this part of the body as a string. You should send binary data: think of it as reading bytes from the resource and sending these bytes unaltered.
So, in particular, no encoding is applied: no UTF-8, no base64. HTTP is not a protocol with a 7-bit ASCII restriction like SMTP, where base64 encoding is applied to ensure that only 7-bit ASCII characters are used.
There is, by definition, no string version of this data, and if you look at a raw HTTP transfer (with Wireshark, for example) you should see binary data: bytes.
This is one reason why many HTTP servers are written in C: they parse the HTTP communication byte by byte (the protocol headers are 7-bit ASCII only, certainly not multibyte characters), and they can also read and write arbitrary binary data for the body quite easily (or even use system calls like sendfile to let the kernel manage the binary part).
Now, about examples.
When you use Content-Length and no multipart, the body is exactly Content-Length bytes long, so the peer parsing your data will just read that number of bytes and treat the whole raw data as the body content (which may have a MIME type and encoding information, but that is just information for layers built on top of the HTTP protocol).
When you use Transfer-Encoding: chunked, the raw binary body is split into pieces. Each piece is prefixed by a hexadecimal number (the size of the chunk in bytes) and an end-of-line marker, and is followed by another end-of-line marker. A final zero-size chunk marks the end.
If we take the Wikipedia example:
4\r\n
Wiki\r\n
5\r\n
pedia\r\n
E\r\n
 in\r\n
\r\n
chunks.\r\n
0\r\n
\r\n
We could replace each ASCII letter with any byte, even a byte that has no 7-bit ASCII representation. I'll use a * character for each real body byte:
4\r\n
****\r\n
5\r\n
*****\r\n
E\r\n
**************\r\n
0\r\n
\r\n
All the other characters are part of the HTTP protocol (here, a chunked body transmission). I could also use a \0 representation of binary data and send only the null byte for each byte of the body, which would be:
4\r\n
\0\0\0\0\r\n
5\r\n
\0\0\0\0\0\r\n
E\r\n
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\r\n
0\r\n
\r\n
That's just a representation; we could also use \xNN or \NN representations. In reality these are bytes, 8 bits each (too lazy to write the 0/1 representation of this body :-) ).
Suppose the text of the example, instead of being:
Wikipedia in\r\n
\r\n
chunks.
had been a more complex one, with multibyte characters (here an é in UTF-8):
Wikipédia in\r\n
\r\n
chunks.
This é is in fact 11000011 10101001 in UTF-8, two bytes (\xc3\xa9 in \xNN representation), instead of the single byte 01100101 / \x65 of the e character. The HTTP body is now (note that the second chunk size is 6 and not 5):
4\r\n
Wiki\r\n
6\r\n
p\xc3\xa9dia\r\n
E\r\n
 in\r\n
\r\n
chunks.\r\n
0\r\n
\r\n
But this is only valid if the source data was actually in UTF-8; it could have been in another encoding. Unless your web server has a specific configuration setting that enforces conversion of the source document to a given encoding, converting the source document is not really the web server's job: you take what you have, and you maybe add a header to tell the client which encoding the source document uses.
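The chunked framing described above can be sketched in a few lines of Python. This is only an illustration working on bytes objects: the function names encode_chunked and decode_chunked are made up here, and the decoder ignores chunk extensions and trailers that a real implementation would need to handle.

```python
def encode_chunked(chunks):
    """Frame each non-empty bytes chunk as <hex size>CRLF<data>CRLF."""
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            continue  # a zero-size chunk would end the body early
        out += b"%X\r\n" % len(chunk)  # size counts bytes, not characters
        out += chunk
        out += b"\r\n"
    out += b"0\r\n\r\n"  # final zero-size chunk, no trailers
    return bytes(out)


def decode_chunked(wire):
    """Reassemble the body from a chunked byte stream (no trailer support)."""
    body = bytearray()
    pos = 0
    while True:
        eol = wire.index(b"\r\n", pos)
        size = int(wire[pos:eol].split(b";")[0], 16)  # drop any extension
        pos = eol + 2
        if size == 0:
            return bytes(body)
        body += wire[pos:pos + size]
        pos += size + 2  # skip the data and its trailing CRLF


# "p\u00e9dia" is "pédia": five characters, six bytes once encoded as UTF-8.
chunks = [b"Wiki", "p\u00e9dia".encode("utf-8"), b" in\r\n\r\nchunks."]
wire = encode_chunked(chunks)
print(wire)
```

Note that the encoder never looks inside a chunk: the CRLF pairs within the last chunk are ordinary body bytes, and only the declared size tells the reader where the chunk ends.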
Finally, there is the multipart way of transmitting the body, as in your question. It is a lot like the chunked version, except that boundaries and intermediate headers are used. For the binary data between these boundaries, headers, and line-ending control characters, the same rule applies: everything inside is just bytes...
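As a rough sketch of what a client does, the following Python builds a multipart/form-data body entirely as bytes. The function name build_form_data and the boundary value are invented for this example, and it assumes that field names and filenames contain no quotes or line breaks and that the boundary does not occur in the file data; a real client would have to check both.

```python
def build_form_data(fields, files, boundary):
    """Assemble a multipart/form-data body as bytes.

    fields: {name: text value}
    files:  {name: (filename, content_type, data_bytes)}
    """
    delimiter = b"--" + boundary.encode("ascii")
    body = bytearray()
    for name, value in fields.items():
        body += delimiter + b"\r\n"
        body += ('Content-Disposition: form-data; name="%s"\r\n\r\n'
                 % name).encode("utf-8")
        body += value.encode("utf-8") + b"\r\n"
    for name, (filename, content_type, data) in files.items():
        body += delimiter + b"\r\n"
        body += ('Content-Disposition: form-data; name="%s"; filename="%s"\r\n'
                 % (name, filename)).encode("utf-8")
        body += ("Content-Type: %s\r\n\r\n" % content_type).encode("ascii")
        body += data + b"\r\n"  # raw file bytes: no base64, no text decoding
    body += delimiter + b"--\r\n"  # closing delimiter
    return bytes(body)


# Bytes with no sensible string form, including a NUL and a bare CR LF pair.
file_bytes = bytes([0x00, 0xFF, 0x0D, 0x0A, 0x80])
body = build_form_data(
    {"MAX_FILE_SIZE": "100000"},
    {"uploadedfile": ("hello.o", "application/x-object", file_bytes)},
    boundary="----ExampleBoundary123",
)
headers = {
    "Content-Type": "multipart/form-data; boundary=----ExampleBoundary123",
    "Content-Length": str(len(body)),
}
```

Only the headers and delimiters are ever text; the file content is appended to the byte buffer untouched, which is why the body is built as a bytearray rather than a string.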
I have a requirement to add a signature to emails passing through a specialised proxy server. There doesn't seem to be a very good way to do this, and my initial thoughts are to:
1. If the mail is non-MIME, or is text/plain, just add text to the bottom of the body,
with a preceding "-- " line (an old convention for adding signatures
to text/plain messages; see RFC3676, 4.3)
2. If the mail is multipart/mixed, find the last boundary, and insert
a new multipart/alternative entity above it, containing plain and
HTML versions of the sig. This has the disadvantage that the sig may
appear below attachments, though.
3. If the mail is not multipart/mixed, make it multipart/mixed, and
demote the existing entities into the mixed section; now add the
multipart/alternative signature as the last part of the new
multipart/mixed. Same disadvantage as (2).
Seems pretty long-winded. Any ideas on better ways to do this? Thanks.
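The three cases above could be sketched with Python's standard email package roughly as follows. This is an illustration, not the proxy's actual code: the names signature_part and add_signature are invented, the in-place case is limited to text/plain bodies with a 7bit or 8bit transfer encoding (anything base64 or quoted-printable is wrapped instead), and the plaintext signature is assumed to be representable in the body's charset.

```python
from email import message_from_string
from email.message import Message
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def signature_part(plain_sig, html_sig):
    """Build the multipart/alternative entity holding both signatures."""
    alt = MIMEMultipart("alternative")
    del alt["MIME-Version"]  # only needed on the top-level message
    alt.attach(MIMEText(plain_sig, "plain", "utf-8"))
    alt.attach(MIMEText(html_sig, "html", "utf-8"))
    return alt


def add_signature(msg, plain_sig, html_sig):
    cte = msg.get("Content-Transfer-Encoding", "7bit").lower()
    if (not msg.is_multipart() and msg.get_content_type() == "text/plain"
            and cte in ("7bit", "8bit")):
        # Case 1: append after a "-- " separator line.
        body = msg.get_payload()
        if not body.endswith("\n"):
            body += "\n"
        msg.set_payload(body + "-- \n" + plain_sig + "\n")
    elif msg.is_multipart() and msg.get_content_type() == "multipart/mixed":
        # Case 2: add the signature as the last part.
        msg.attach(signature_part(plain_sig, html_sig))
    else:
        # Case 3: demote the existing content into a new multipart/mixed.
        inner = Message()
        for name, value in msg.items():
            if name.lower().startswith("content-"):
                inner[name] = value
        inner.set_payload(msg.get_payload())
        for name in {n.lower() for n in msg.keys()
                     if n.lower().startswith("content-")}:
            del msg[name]
        msg["Content-Type"] = "multipart/mixed"  # boundary added on output
        if "MIME-Version" not in msg:
            msg["MIME-Version"] = "1.0"
        msg.set_payload([inner, signature_part(plain_sig, html_sig)])
    return msg


plain = add_signature(message_from_string("Subject: hi\n\nBody\n"),
                      "Sig", "<p>Sig</p>")
print(plain.as_string())
```

In case 3 only the Content-* headers move to the inner part, so headers such as Subject, From and Received stay on the outer message where mail clients expect them.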
I coded this as above, and it's fine on Yahoo, gmail, AOL, GMX, and Thunderbird, with the MIME validated here and here.
The problem was that I forgot, again, about Microsoft's complete inability to read an RFC. If a multipart/mixed section contains two multipart/alternative sections, where each one contains a text/plain and a text/html alternative, then Outlook/hotmail actually displays both alternatives in the second multipart/alternative. So, it displays both the plain signature, and the HTML version.
The fix is straightforward - instead of adding a second multipart/alternative for the signatures, I instead add a single text/html section, containing only the HTML signature. This displays correctly on all the MUAs above, including the brain-dead ones. Not exactly ideal, but I could enable the fix only for Microsoft recipients, and let them sort things out if they have persuaded their MUA to display only plain text.
If you are only interested in the text/html part and not in the text/plain part, a somewhat simpler implementation is to transform the text/html part into a multipart/mixed that holds the original HTML and your HTML signature.
before
multipart/alternative
    text/plain
    text/html
after
multipart/alternative
    text/plain
    multipart/mixed
        text/html
        text/html <== your signature
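A minimal Python sketch of that before/after transformation, using the standard email package. The function name nest_html_signature and the sample message are invented for illustration, and this only shows how to build the structure; whether a given mail client renders it inline is not something the code can tell you.

```python
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def nest_html_signature(alternative, html_sig):
    """Replace the text/html alternative with multipart/mixed(html, sig)."""
    if (not alternative.is_multipart()
            or alternative.get_content_type() != "multipart/alternative"):
        return False
    parts = alternative.get_payload()
    for index, part in enumerate(parts):
        if part.get_content_type() == "text/html":
            mixed = MIMEMultipart("mixed")
            del mixed["MIME-Version"]  # not needed on a nested part
            mixed.attach(part)  # the original HTML stays first
            mixed.attach(MIMEText(html_sig, "html", "utf-8"))
            parts[index] = mixed
            alternative.set_payload(parts)
            return True
    return False


sample = message_from_string(
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="b1"\n\n'
    "--b1\nContent-Type: text/plain\n\nBody\n"
    "--b1\nContent-Type: text/html\n\n<p>Body</p>\n--b1--\n"
)
changed = nest_html_signature(sample, "<p>Sig</p>")
types = [p.get_content_type() for p in sample.walk()]
print(types)
```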
Requirement: To send a VB file with record length 100 as an attachment using the TCPSMTP utility, with a proper message in the body, without using the IEBGENER utility.
I am trying to send a VB dataset as an email attachment. I can get either the message to appear in the attachment, or the attachment file to appear in the body, but not a message body and an attachment together.
My JCL is:
//IRTCPN15 EXEC PROC=TCPSMTP
//SMTPIN DD DSN=EMAIL.CODE,
// DISP=SHR
// DD DSN=FILE.TOBE.SENTAS.ATTACH.MENT,DISP=SHR
Here, both datasets, EMAIL.CODE and FILE.TOBE.SENTAS.ATTACH.MENT, have the same specification: VB, record length 100. I have also tried using a boundary delimiter, but the two still do not work together.
Dataset EMAIL.CODE contains:
HELO *******
MAIL FROM:<*******>
RCPT TO: <********>;
DATA
FROM: <******>
TO: <*******>;
SUBJECT: subject data
MIME-VERSION: 1.0
CONTENT-TYPE: TEXT/PLAIN
---Mail Body---
CONTENT DISPOSITION: ATTACHMENT; FILENAME=FILE.TXT
Please suggest how to send this attachment together with a body. I have used asterisks for security reasons. Please feel free to ask if any more information is needed.
In the EMAIL.CODE dataset, you're specifying that content-type of your message is text/plain. However, text/plain on its own (which is the default content type anyway) is always going to appear inline.
In order for the text in the message to be seen as an attachment, you need a Content-Disposition header that specifies attachment.
I can see in your question that you have a CONTENT DISPOSITION line, but it's labeled as being part of the message body. In addition to the fact that it needs to be a header, not a part of the body, it also needs to be hyphenated. So you should have CONTENT-DISPOSITION, not CONTENT DISPOSITION.
However, what all of this gets you is a message containing nothing but the attachment, and your question specifies that you want both a message body and an attachment. In order to do that, your Content-type at the top level needs to be multipart/mixed, and the body of the message needs to contain two MIME parts, one specified simply as being text/plain, and the other also text/plain, but with Content-Disposition: attachment.
This example shows the data for a MIME message containing both a text/plain body and a text/plain attachment.
FROM: <sender#example.com>
TO: <receiver#example.com>
Subject: TESTING message with body and attachment.
Mime-Version: 1.0
Content-type: multipart/mixed; boundary=MIME_BOUNDARY

This is the non-MIME body of a multipart message in MIME format.
Unless you are using a genuinely ancient email client or viewing
the raw source of a message, you should never see this paragraph.
--MIME_BOUNDARY
Content-type: text/plain

This is the inline text section of a multipart message
in MIME format. This is what will appear as the body
of your email when using any normal email client.
--MIME_BOUNDARY
Content-type: text/plain
Content-Disposition: attachment; filename=example.txt

This is the plain-text attachment.
--MIME_BOUNDARY--
.
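If you want to check what such a message should look like before typing it into a dataset, one option is to generate an equivalent message with a MIME library and compare. The sketch below uses Python's standard email package purely as an illustration; it has nothing to do with TCPSMTP itself, and the addresses are example.com placeholders.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "sender@example.com"      # placeholder address
msg["To"] = "receiver@example.com"      # placeholder address
msg["Subject"] = "TESTING message with body and attachment."

# First part: the inline text/plain body.
msg.set_content("This is the inline text section of the message.\n")

# Second part: text/plain again, but with Content-Disposition: attachment.
# Adding it converts the message to multipart/mixed with a generated boundary.
msg.add_attachment("This is the plain-text attachment.\n",
                   filename="example.txt")

raw = msg.as_string()
print(raw)
```

Printing raw shows the same layout as the hand-written example: top-level headers, a blank line, then each part introduced by a boundary line and its own headers.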
I just want a basic understanding of what parts an email message may have.
I know there is a messageId, date, subject, from, cc, bcc, body, etc.
Specifically, I want to know how attachments and images may be embedded in the email.
At this point I think there are two kinds; please correct me if I am wrong:
attachments
embedded attachments/images
Is that correct?
The official answer to this question is contained in RFC5322 and some related RFCs. The Wikipedia entry for email does a pretty good job of referencing the RFC numbers. To get started with MIME see RFC2045.
Attachments are encoded as multipart, similar to multipart file uploads. Basically, the message has a Content-Type header saying that it is multipart and declaring a boundary (a string of characters, usually random, that does not occur in the content). A line consisting of -- followed by the boundary marks where each part starts, and each part carries its own headers; the filename of an attachment is given in that part's Content-Disposition header, not on the boundary line. I am doing a bit of hand waving, but this is the basic idea.
So you get something like:
To: ...
From: ...
Content-Type: multipart/mixed; boundary=ewafoiuasfjasdfoashiafhj

--ewafoiuasfjasdfoashiafhj
Content-Type: text/plain

message here
--ewafoiuasfjasdfoashiafhj
Content-Disposition: attachment; filename=...

attachment here
--ewafoiuasfjasdfoashiafhj--
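To make the distinction between ordinary attachments and embedded images concrete, here is a small Python sketch using the standard email package. The helper name classify_parts, the test message and the classification rule itself (attachment if the disposition says so, inline if a non-text part is marked inline or has a Content-ID, body otherwise) are assumptions made for illustration; real mail clients apply their own, more involved heuristics.

```python
from email.message import EmailMessage


def classify_parts(msg):
    """Sort the leaf parts of a message into body, inline and attachment."""
    found = {"body": [], "inline": [], "attachment": []}
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold parts, they are not content
        label = part.get_filename() or part.get_content_type()
        disposition = part.get_content_disposition()
        if disposition == "attachment":
            found["attachment"].append(label)
        elif (part.get_content_maintype() != "text"
              and (disposition == "inline" or part["Content-ID"])):
            found["inline"].append(label)
        else:
            found["body"].append(label)
    return found


# Build a test message: plain + HTML alternatives, an image embedded in the
# HTML via a Content-ID, and one regular attachment.
msg = EmailMessage()
msg.set_content("Hello")
msg.add_alternative('<p>Hello <img src="cid:logo"></p>', subtype="html")
html_part = msg.get_payload()[1]
html_part.add_related(b"\x89PNG placeholder bytes",
                      maintype="image", subtype="png", cid="<logo>")
msg.add_attachment(b"report bytes", maintype="application",
                   subtype="octet-stream", filename="report.bin")

result = classify_parts(msg)
print(result)
```

The embedded image lives in a multipart/related container next to the HTML that references it, while the regular attachment sits at the top level of the multipart/mixed.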