Handling diacritics in SIP headers - sip

Following the SIMPLE specification of OMA, when sending a SIP INVITE for chat we can use a header named Subject.
Typically, this header contains the first message sent by a user to his contact.
My question is: this message can contain diacritics, so how should I encode them? Is there a standard definition on how to do this?

You should encode them as UTF-8, as specified in the SIP RFC (RFC 3261). There are a few SIP headers where UTF-8 is not allowed and US-ASCII with escaping rules is mandated, but the Subject header is not one of them.
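As a rough illustration of the advice above, here is a minimal Python sketch that serializes a Subject header line as UTF-8 bytes. The function name build_subject_header is hypothetical, and collapsing whitespace so the value stays on one line is an assumption of this sketch, not something the SIMPLE specification is known to require.

```python
def build_subject_header(text: str) -> bytes:
    """Return a SIP Subject header line with the value encoded as UTF-8."""
    # Assumption: collapse any whitespace runs (including line breaks)
    # so the value cannot break out of the header line.
    value = " ".join(text.split())
    return b"Subject: " + value.encode("utf-8") + b"\r\n"


if __name__ == "__main__":
    line = build_subject_header("Ça va, José?")
    print(line)
    # The receiver reverses the step by decoding the bytes as UTF-8.
    print(line.decode("utf-8"))
```

The diacritics are sent as raw UTF-8 octets, with no percent-escaping or encoded-word wrapping.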

Related

How does an email client read the content-type headers for encoding?

It is possible to send an email with different content types: text/html, text/plain, mime, etc. It also is possible to use different encodings, including (according to the RFCs) for header fields: us-ascii, utf8, etc.
How do you solve the chicken and egg problem? The content-type header is just one of several headers. If the headers can be any encoding, how does a mail server or client know how to read the content-type header if it does not know what encoding the headers themselves are in?
I can see it if the first line, e.g. had to be the content-type and it had to be in a pre-agreed encoding, (e.g. ascii), but that is not the case.
How do you parse a stream of bytes whose encoding is embedded as a string inside that very same stream?
Headers are defined to be in ASCII. They can be in UTF-8 if agreed to out of band, such as via the SMTP or IMAP UTF-8 capability extensions.
Internationalization in headers is performed via "encoded words", where the encoding is part of the header data. (This looks like a string such as =?iso-8859-1?q?sample_header_data?=). See RFC 2047.
Content-Type headers do not apply to the headers themselves, only to the body content.
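To make the "encoded words" idea concrete, here is a small sketch using Python's standard email.header module. The helper name decode_mime_header and the sample header value are made up for illustration.

```python
from email.header import decode_header, make_header


def decode_mime_header(raw: str) -> str:
    """Decode RFC 2047 encoded words in a header value into a Unicode string."""
    # decode_header splits the value into (bytes_or_str, charset) chunks;
    # make_header reassembles them using each chunk's declared charset.
    return str(make_header(decode_header(raw)))


if __name__ == "__main__":
    print(decode_mime_header("=?iso-8859-1?q?caf=E9_au_lait?="))
    print(decode_mime_header("Plain ASCII subject"))
```

The header bytes on the wire stay pure ASCII; the charset name travels inside the encoded word itself, which is how the chicken-and-egg problem is avoided.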

values of SIP Accept and SIP Accept-Contact

I am trying to find out the range of possible values of Accept and Accept-Contact header fields, but I can't find a complete list in the RFCs. Does anyone know where they are? I often see
Accept: application/sdp;level=1, application/x-private, text/html
but don't know all possible values. More generally, where can I find all possible values of SIP headers?
Thanks,
A lot of sections in the SIP RFC (RFC 3261) are based on the HTTP 1.1 RFC (RFC 2616), in acknowledgement that the semantics of SIP and HTTP are very similar. The SIP Accept header is a good case in point. The SIP RFC section that deals with the Accept header refers to [H14.1], which translates to section 14.1 in the HTTP 1.1 RFC and which goes into detail about how the Accept header can be used to specify the different types of media that are acceptable in the response.
That all being said, in the real world 90% of the time the SIP response media is going to be application/sdp. There will be SIP requests that accept other types of response media, but they are not that common.
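Since the Accept value is an open-ended list of media types rather than a fixed set, it can help to see how one is taken apart. The following is a simplified sketch, not a full RFC grammar: the function name parse_accept is hypothetical, and it assumes no commas or semicolons appear inside quoted parameter values.

```python
def parse_accept(value: str):
    """Split an Accept header value into (media_type, params) tuples."""
    result = []
    for item in value.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue  # skip empty list elements
        params = {}
        for p in parts[1:]:
            if not p:
                continue  # tolerate a trailing semicolon
            name, _, val = p.partition("=")
            params[name.strip().lower()] = val.strip().strip('"')
        result.append((media_type, params))
    return result


if __name__ == "__main__":
    header = "application/sdp;level=1, application/x-private, text/html"
    for media_type, params in parse_accept(header):
        print(media_type, params)
```

Each entry is just a MIME type with optional parameters, so the set of "possible values" is whatever media types the endpoints agree to support.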
Bob, what you are looking for is MIME types.
You can find some common MIME types here:
http://en.wikipedia.org/wiki/Internet_media_type
The text format of SIP is derived from HTTP, so you can also refer to the HTTP headers to find possible values of other headers.
Most headers and parameters, with their corresponding RFCs, are listed here: http://www.iana.org/assignments/sip-parameters

Extracting email attachment filename : Content-Disposition vs Content-type

I am working on a script that will handle email attachments. I see that, most of the time, both content-type and content-disposition headers have the filename, but I have seen cases where only one had proper encoding or valid mime header.
Is there a preferred header to use to extract the file name? If so, which one?
Quoting wikipedia http://en.wikipedia.org/wiki/MIME:
"Many mail user agents also send messages with the file name in the name parameter of the content-type header instead of the filename parameter of the content-disposition header. This practice is discouraged."
So it seems content-disposition is preferred. However, I am using JavaMail, and the current JavaMail API seems to have only a String getDisposition() method: http://javamail.kenai.com/nonav/javadocs/javax/mail/Part.html#getDisposition(). So you might need to work with the header directly if you are using JavaMail.
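For comparison, here is how the same preference order looks in Python's standard email package (this is not JavaMail). Message.get_filename() reads the filename parameter of Content-Disposition first and falls back to the name parameter of Content-Type. The wrapper attachment_filename and the sample messages are made up for illustration.

```python
from email import message_from_string
from email.message import Message
from typing import Optional


def attachment_filename(part: Message) -> Optional[str]:
    """Prefer Content-Disposition 'filename', fall back to Content-Type 'name'."""
    # get_filename() implements exactly this order in the standard library.
    return part.get_filename()


if __name__ == "__main__":
    both = message_from_string(
        'Content-Type: application/pdf; name="from-type.pdf"\n'
        'Content-Disposition: attachment; filename="from-disposition.pdf"\n'
        "\n"
        "body"
    )
    only_type = message_from_string(
        'Content-Type: application/pdf; name="from-type.pdf"\n\nbody'
    )
    print(attachment_filename(both))
    print(attachment_filename(only_type))
```

A script that handles attachments by hand can follow the same order: try Content-Disposition, then Content-Type, then give up.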

Forwarded email detection

Is there any way to detect (using RFC 2822 headers) that an email is a forwarded email?
There are two things that are normally referred to as "forwarding".
When you set up automatic account-level forwarding to another email address, your mail system will usually introduce an extra header to enable it to detect and break mail loops. Unfortunately, the name of this header has never been standardized. Some use Delivered-To, some use X-Loop, some use X-Original-To, some use an X-header proprietary to their mail software. But there's no single header field that's present in all cases.
When you manually forward a message by clicking the "Forward" button in your mailer and entering a recipient email address and some descriptive text, a new message with a new Message-ID header is generated. The set of headers on this message will be indistinguishable from a normal reply -- In-Reply-To and References are set in exactly the same way. The only difference is that the Subject header will usually start with "Fwd:" or end with "(fwd)". ("Usually" because some clients format it as "[Fwd: <original subject>]" with square brackets around the new subject, some clients localize the prefix Fwd: into their own language, and some users manually edit the Subject before hitting "send".)
So there are good hints that a message is forwarded, but no hard and fast rules.
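The hints described above could be collected with something like the following sketch. It is a heuristic only: the function name forwarding_hints is hypothetical, the regular expression also accepts "Fw:" (an assumption beyond the prefixes named above), localized prefixes are not handled, and headers such as Delivered-To also show up on mail that was never forwarded.

```python
import re
from email import message_from_string
from email.message import Message

# Matches "Fwd:", "Fw:", "[Fwd: ..." at the start, or "(fwd)" at the end.
SUBJECT_HINT = re.compile(r"^\s*\[?\s*fwd?\s*:|\(fwd\)\s*$", re.IGNORECASE)

# Non-standard loop-detection headers mentioned above.
LOOP_HEADERS = ("Delivered-To", "X-Loop", "X-Original-To")


def forwarding_hints(msg: Message) -> list:
    """Return the names of the hints suggesting that msg was forwarded."""
    hints = []
    if SUBJECT_HINT.search(str(msg.get("Subject", ""))):
        hints.append("subject")
    for name in LOOP_HEADERS:
        if name in msg:  # header lookup is case-insensitive
            hints.append(name)
    return hints


if __name__ == "__main__":
    msg = message_from_string("Subject: Fwd: hello\nX-Loop: a@example.com\n\nbody")
    print(forwarding_hints(msg))
```

An empty list does not prove the message was not forwarded; it only means none of these conventions was found.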
Reading the spec, CTRL+F for "forward" gives the following header fields:
resent-date = "Resent-Date:" date-time CRLF
resent-from = "Resent-From:" mailbox-list CRLF
resent-sender = "Resent-Sender:" mailbox CRLF
resent-to = "Resent-To:" address-list CRLF
resent-cc = "Resent-Cc:" address-list CRLF
resent-bcc = "Resent-Bcc:" (address-list / [CFWS]) CRLF
resent-msg-id = "Resent-Message-ID:" msg-id CRLF
I'm not sure whether the major mail software uses these though.
EDIT
I read the spec a little too quickly; there is also this note:
Note: Reintroducing a message into the transport system and using
resent fields is a different operation from "forwarding".
"Forwarding" has two meanings: One sense of forwarding is that a mail
reading program can be told by a user to forward a copy of a message
to another person, making the forwarded message the body of the new
message. A forwarded message in this sense does not appear to have
come from the original sender, but is an entirely new message from
the forwarder of the message. On the other hand, forwarding is also
used to mean when a mail transport program gets a message and
forwards it on to a different destination for final delivery. Resent
header fields are not intended for use with either type of
forwarding.
There are no other mentions of "forwarding", so there are no header fields that you can use to detect a forward, except for the subject = "Fwd: <msg>" convention.

How to retrieve an IMAP body with the right charset?

I'm trying to issue an IMAP FETCH command to retrieve the message body. I need to pass/use the correct charset, otherwise the special characters in the response come back garbled.
How can I make the IMAP request/command take into account the charset I received in the BODYSTRUCTURE result?
The IMAP server will send non-ASCII message body data as an 8-bit "literal", which is essentially an array of bytes. You can't make the FETCH command use a specific charset, as far as I know.
It's up to the IMAP library or your application to decode the byte array into a string representation using the appropriate charset you received in the BODYSTRUCTURE response.
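A minimal sketch of that client-side decoding step in Python follows. The function name decode_body and its parameters are hypothetical. It also assumes you read the content transfer encoding (base64, quoted-printable, and so on) from the same BODYSTRUCTURE response, since the fetched bytes are still transfer-encoded; that step is an addition to what is described above.

```python
import base64
import quopri


def decode_body(raw: bytes, charset: str = "us-ascii",
                transfer_encoding: str = "7bit") -> str:
    """Turn fetched body bytes into text using BODYSTRUCTURE metadata."""
    enc = transfer_encoding.lower()
    # Step 1: undo the content transfer encoding, if any.
    if enc == "base64":
        raw = base64.b64decode(raw)
    elif enc == "quoted-printable":
        raw = quopri.decodestring(raw)
    # Step 2: decode the bytes with the declared charset.
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Assumption: fall back to UTF-8 with replacement characters
        # when the declared charset is unknown or wrong.
        return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_body(b"caf\xe9", charset="iso-8859-1"))
    print(decode_body(b"caf=C3=A9", charset="utf-8",
                      transfer_encoding="quoted-printable"))
```

The FETCH command itself stays the same; only the interpretation of the returned bytes changes.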