Extracting email attachment filename: Content-Disposition vs Content-Type

I am working on a script that will handle email attachments. Most of the time both the Content-Type and Content-Disposition headers carry the filename, but I have seen cases where only one of them was properly encoded or was a valid MIME header.
Is there a preferred header to use to extract the file name? If so, which one?

Quoting Wikipedia (http://en.wikipedia.org/wiki/MIME):
"Many mail user agents also send messages with the file name in the name parameter of the content-type header instead of the filename parameter of the content-disposition header. This practice is discouraged."
So it seems Content-Disposition is the preferred header. However, the current JavaMail API seems to expose the disposition only through a String getDisposition() method: http://javamail.kenai.com/nonav/javadocs/javax/mail/Part.html#getDisposition(). So if you are using JavaMail, you might need to work with the header directly.
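
For comparison, here is a minimal Python sketch of the "prefer Content-Disposition, fall back to Content-Type" rule, using only the standard email package. The sample message, the boundary string, and the helper name attachment_filename are made up for illustration. Real mail may also carry RFC 2231 or RFC 2047 encoded names, which this sketch does not decode.

```python
from email import message_from_string

# Hypothetical sample message: one body part and two attachments.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUND"

--BOUND
Content-Type: text/plain

See attached.
--BOUND
Content-Type: application/pdf; name="from-content-type.pdf"
Content-Disposition: attachment; filename="from-disposition.pdf"
Content-Transfer-Encoding: base64

JVBERi0=
--BOUND
Content-Type: application/pdf; name="only-in-content-type.pdf"
Content-Transfer-Encoding: base64

JVBERi0=
--BOUND--
"""


def attachment_filename(part):
    """Return the part's filename, preferring Content-Disposition."""
    # Preferred location: the "filename" parameter of Content-Disposition.
    name = part.get_param("filename", header="content-disposition")
    if name is None:
        # Discouraged but common: the "name" parameter of Content-Type.
        name = part.get_param("name", header="content-type")
    # Note: for RFC 2231 encoded values get_param() returns a tuple;
    # that case is not handled in this sketch.
    return name


msg = message_from_string(RAW)
filenames = [attachment_filename(p) for p in msg.walk() if not p.is_multipart()]
print(filenames)
```

Python's Message.get_filename() applies the same fallback order, so in practice the helper above is only useful if you want to see or log which header supplied the name.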

Related

Does message rfc822 allow a new line between two headers?

Does message rfc822 allow a new line in between two headers?
After the Content-Disposition header I got a blank line.
The Received header (and all of the headers that follow it) are not part of the MIME part headers - they are the content of the MIME part.
This attachment has a MIME-type of message/rfc822 which is an email message. When you parse the content of the MIME part (which starts with the Received header), what you end up with is another message object.
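
A small Python sketch of that behaviour, using the standard email package. The message text, addresses, and boundary below are invented for the example. The blank line after Content-Disposition ends the headers of the MIME part, and everything after it is parsed as a nested message.

```python
from email import message_from_string

# Hypothetical message with one message/rfc822 attachment.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUND"

--BOUND
Content-Type: message/rfc822
Content-Disposition: attachment

Received: from mail.example.org by mx.example.com; Mon, 1 Jan 2024 00:00:00 +0000
From: alice@example.org
Subject: Forwarded message

Inner body text.
--BOUND--
"""

outer = message_from_string(RAW)
part = outer.get_payload(0)    # the MIME part of type message/rfc822
inner = part.get_payload(0)    # its content, parsed as another message

print(part.get_content_type())   # type of the MIME part itself
print(part["Received"])          # not a header of the MIME part
print(inner["Subject"])          # header of the embedded message
```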

Headers for REST API with optional Base64 encoding

We have a media file repository, with which other services communicate over a REST API. For various reasons we want the users of the repository to be able to upload and download files over HTTP both directly (plaintext for text files and byte array for binary files) and using Base64 encoding. We want the fact that the file is uploaded (PUT, POST) and requested for download (GET) in the Base64 encoding be reflected in the header of the HTTP request.
How do we reflect the fact that the content of the request or requested response is Base64 encoded in the HTTP header?
So far I'm tending towards appending ;base64 after the mime type in the Content-Type header, for example Content-Type: image/png;base64. Other options (X- header, Content-Encoding) are discussed in this related question but do not offer satisfactory resolution to our question.
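You can use the Content-Transfer-Encoding header.
It is defined in RFC 2045: https://www.rfc-editor.org/rfc/rfc2045#page-14.
It supports base64 among other values: "7bit" / "8bit" / "binary" / "quoted-printable" / "base64" / ietf-token / x-token.
This header was designed for exactly this purpose, as a complement to the MIME type. Note, however, that it is a MIME (email) header: the HTTP specification states that HTTP does not use Content-Transfer-Encoding (RFC 7231, Appendix A.5), so over HTTP it only works as a convention agreed between your clients and your server.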
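
If clients and server agree to signal the encoding this way, the receiving side could decode the body roughly as in the Python sketch below. The function name decode_body and the plain dict of headers are assumptions made for the example, not part of any framework. A missing header is treated as "binary", which is also an assumption.

```python
import base64
import quopri


def decode_body(headers, body):
    """Decode a request/response body according to Content-Transfer-Encoding."""
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {key.lower(): value for key, value in headers.items()}
    # Assumption: no header means the body was sent as raw bytes.
    encoding = normalized.get("content-transfer-encoding", "binary").strip().lower()

    if encoding == "base64":
        return base64.b64decode(body)
    if encoding == "quoted-printable":
        return quopri.decodestring(body)
    if encoding in ("7bit", "8bit", "binary"):
        return body  # identity encodings: nothing to undo
    raise ValueError(f"unsupported Content-Transfer-Encoding: {encoding!r}")


png_magic = b"\x89PNG\r\n\x1a\n"
decoded = decode_body(
    {"Content-Type": "image/png", "Content-Transfer-Encoding": "base64"},
    base64.b64encode(png_magic),
)
print(decoded == png_magic)
```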

Content-Disposition for email body

As far as I understand, the Content-Disposition header can be used for any body part of an email message:
It specifies the "Content-Disposition" header field, which is
optional and valid for any MIME entity ("message" or "body part")
RFC 2183
I saw that many mail clients set the Content-Disposition header only for attachment body parts.
The question is: is it normal to set Content-Disposition to inline for the message body (the text/html email body)?
What do you mean by "normal"?
It's acceptable to set Content-Disposition to "inline" for the message body, but as you noted, most mailers only use Content-Disposition to set it to "attachment". Setting it to "inline" generally doesn't make any difference to how a mailer is going to display the message. That is, you can't force it to display some content inline if it doesn't know how to, or if it only considers additional body parts as attachments.
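
To illustrate how a reader of the message might see this, here is a short Python sketch that lists the disposition of each part. The sample message and the helper treated_as_attachment are hypothetical. The helper shows one plausible policy (only an explicit "attachment" counts), not what any particular mail client does.

```python
from email import message_from_string

# Hypothetical message: explicit inline body, body without a disposition,
# and an explicit attachment.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUND"

--BOUND
Content-Type: text/plain
Content-Disposition: inline

Plain body.
--BOUND
Content-Type: text/html

<p>HTML body.</p>
--BOUND
Content-Type: application/pdf
Content-Disposition: attachment; filename="report.pdf"
Content-Transfer-Encoding: base64

JVBERi0=
--BOUND--
"""


def treated_as_attachment(part):
    """One possible policy: "inline" and a missing header are handled alike."""
    return part.get_content_disposition() == "attachment"


msg = message_from_string(RAW)
dispositions = [
    (p.get_content_type(), p.get_content_disposition()) for p in msg.get_payload()
]
attachments = [p.get_filename() for p in msg.get_payload() if treated_as_attachment(p)]
print(dispositions)
print(attachments)
```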

How does an email client read the content-type headers for encoding?

It is possible to send an email with different content types: text/html, text/plain, mime, etc. It also is possible to use different encodings, including (according to the RFCs) for header fields: us-ascii, utf8, etc.
How do you solve the chicken and egg problem? The content-type header is just one of several headers. If the headers can be any encoding, how does a mail server or client know how to read the content-type header if it does not know what encoding the headers themselves are in?
I can see it if the first line, e.g. had to be the content-type and it had to be in a pre-agreed encoding, (e.g. ascii), but that is not the case.
How do you parse a stream of bytes whose encoding is embedded as a string inside that very same stream?
Headers are defined to be in ASCII. They can be in UTF-8 if agreed to out of band, such as via the SMTP or IMAP UTF-8 capability extensions.
Internationalization in headers is performed via "encoded words", where the encoding is part of the header data. (This looks like a string such as =?iso-8859-1?q?sample_header_data?=). See RFC 2047.
The Content-Type header does not apply to the headers themselves, only to the body content.
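
As a quick illustration, Python's standard email.header module can decode such an encoded word. The header value below is a made-up example. The point is that the value on the wire is pure ASCII and names its own charset.

```python
from email.header import decode_header, make_header

# Hypothetical header value: an RFC 2047 encoded word (charset, encoding, data).
raw_value = "=?iso-8859-1?q?caf=E9_menu?="

# The wire form contains only ASCII characters.
wire_is_ascii = raw_value.isascii()

# decode_header() splits the value into (bytes, charset) chunks;
# make_header() joins them back into a single Unicode string.
chunks = decode_header(raw_value)
decoded = str(make_header(chunks))

print(wire_is_ascii)
print(decoded)
```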

HTTP Headers to use to specify CSV delimiter and options

I would like my REST service to accept CSV files in addition to JSON and XML.
I would accept an HTTP PUT request such as:
PUT /myservice/user
Content-Type: text/csv; charset=utf-8
"tomas";"1980-01-01"
"george";"1981-02-02"
I would like to be able to accept different delimiters and other format options for my CSV file. Preferably without using the querystring, which doesn't seem to be the proper tool for that. I understand I could just invent my own headers such as:
PUT /myservice/user
Content-Type: text/csv; charset=utf-8
CSV-Delimiter: ,
CSV-Options: merge-duplicates, no-header-row
Or maybe I could invent my own parameters to Content-Type if that is allowed (after all it is a part of the content-type just like the charset used):
PUT /myservice/user
Content-Type: text/csv; charset=utf-8; delimiter=,; options=no-header-row
What would be the proper way to handle this? Are there any HTTP-headers conventionally used for this?
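For "no-header-row" a parameter already exists: the text/csv media type (RFC 4180) defines an optional header parameter whose value is "present" or "absent", for example Content-Type: text/csv; charset=utf-8; header=absent.
As for adding new parameters to the Content-Type header, the media type registration rules (RFC 6838, section 4.3) say: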
New parameters SHOULD NOT be defined as a way to introduce new
functionality in types registered in the standards tree, although new
parameters MAY be added to convey additional information that does
not otherwise change existing functionality. An example of this
would be a "revision" parameter to indicate a revision level of an
external specification such as JPEG. Similar behavior is encouraged
for media types registered in the vendor or personal trees but is not
required.
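
A rough Python sketch of the server side, combining the registered header parameter with the custom CSV-Delimiter header from the question. The function name parse_csv_request, the dict of headers, and the defaults (comma delimiter, header row assumed present when the parameter is missing) are all assumptions made for the example.

```python
import csv
import io
from email.message import Message


def parse_csv_request(headers, body):
    """Return (header_row, data_rows) for a text/csv request body."""
    # Reuse the stdlib MIME parameter parser for the Content-Type value.
    holder = Message()
    holder["Content-Type"] = headers["Content-Type"]
    charset = holder.get_param("charset", "utf-8")
    # Assumption: treat a missing "header" parameter as "present".
    has_header = holder.get_param("header", "present") != "absent"

    # CSV-Delimiter is the custom header proposed in the question.
    delimiter = headers.get("CSV-Delimiter", ",").strip() or ","

    # newline="" leaves line endings untouched, as the csv module expects.
    text = io.StringIO(body.decode(charset), newline="")
    rows = list(csv.reader(text, delimiter=delimiter))
    if has_header and rows:
        return rows[0], rows[1:]
    return None, rows


header_row, data_rows = parse_csv_request(
    {
        "Content-Type": "text/csv; charset=utf-8; header=absent",
        "CSV-Delimiter": ";",
    },
    b'"tomas";"1980-01-01"\r\n"george";"1981-02-02"\r\n',
)
print(header_row, data_rows)
```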