Why is there a '=' at the end of an SMTP message body? - email

I receive email messages over sockets and see that long lines in the message body are broken up, separated by the following expression
'=\r\n'
I cannot find any documentation on this and wonder if someone just happens to know where I can find information on this behavior.
Also, please ONLY give feedback on my question; no comments regarding email and sockets!
Thanks
Alex

From Wikipedia, regarding Quoted-printable:
Lines of quoted-printable encoded data must not be longer than 76 characters. To satisfy this requirement without altering the encoded text, soft line breaks may be added as desired. A soft line break consists of an "=" at the end of an encoded line, and does not appear as a line break in the decoded text.
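The \r\n that follows the "=" is the CR LF that ends the encoded line. Together, "=\r\n" forms the soft line break, and a quoted-printable decoder removes the whole sequence. A \r\n that is not preceded by "=" is a real (hard) line break in the content; depending on the client used to view the message, it may or may not render as an actual line break.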
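To make the soft line break behavior concrete, here is a minimal Python sketch using the standard-library quopri module. The sample body is made up for illustration, and how the bytes are read from the socket is left out.

```python
import quopri

# A quoted-printable body as it might arrive over the wire (made-up sample).
# "=\r\n" is a soft line break; "=3D" is an encoded "=" character.
raw = b"This long line was wrap=\r\nped by the sender, and 1+1=3D2.\r\n"

# Decoding drops the soft line break and keeps the hard CR LF at the end.
decoded = quopri.decodestring(raw)

# Going the other way: encoding a long line inserts soft line breaks
# so that no encoded line exceeds 76 characters.
long_line = b"x" * 200
encoded = quopri.encodestring(long_line)
```

Decoding the encoded long line gives back the original 200 bytes, since soft line breaks never appear in the decoded text.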

Related

Why does an email subject contain linefeed or carriage return characters?

I'm writing code to check a mailbox and forward unseen mails to another user.
But sometimes it fails with an error:
ValueError: Header values may not contain linefeed or carriage return characters
I checked the raw fetched data and found out that the 'Subject' value contains \r\n.
Not all mails contain it, but some do.
It just appears normal in the mailbox, and I have no idea why some contain such characters.
Does it have to do with the length of the subject?
How can I deal with these situations?
Thanks :)
Email messages have a maximum line length. That's historical, and the rule isn't upheld 100% of the time. But in header fields, a CR LF followed by a sequence of spaces or a htab character is to be treated the same as a single space. This is a really long subject, wrapped in that way:
Subject: Pretend this is about 80-90
 characters long
The simplest way to deal with it is to consider any sequence of whitespace characters (including the CR LF) to be a single space.
Read the source of any email message and you'll see this wrapping in most of them. The Received field is almost always wrapped, for instance, and quite often To if there are many addressees, or Content-Type/Content-Disposition for attachments.
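The "treat any run of whitespace as a single space" approach can be sketched in Python as follows. The function name unfold_header is made up for this example. Strict RFC 5322 unfolding only removes the CR LF and keeps the whitespace after it, so this is the simplified variant from the answer, not the exact rule.

```python
def unfold_header(value: str) -> str:
    """Collapse every run of whitespace (including CR LF) into one space."""
    # str.split() with no arguments splits on any whitespace run,
    # so folded lines and repeated spaces all become single spaces.
    return " ".join(value.split())

# Raw Subject value as fetched from the mailbox (made-up sample).
raw_subject = "Pretend this is about 80-90\r\n characters long"
subject = unfold_header(raw_subject)
```

The resulting value no longer contains \r or \n, so it can be assigned to a header of the forwarded message without triggering the ValueError.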

550 Maximum line length exceeded on CSS or attachments

I'm using Mailgun for sending emails. The mail has a short subject (11 characters), a text body whose longest line is 115 characters, and a PDF attachment.
I'm getting some errors from Mailgun (on very few emails) with message: "550 Maximum line length exceeded (see RFC 5322 2.1.1)."
RFC 5322, 2.1.1 says that the maximum line length is 998 characters excluding the CRLF.
As my email's longest line is way shorter than that, is it possible that this issue is being caused by a header, CSS rule or the attachment?
The attachment should not be a problem. If you have CSS, then I suspect you have an HTML body as well. I would check the line lengths there and in the text body. Maybe a line break is missing somewhere.
I ran into the same error and I wanted to make a more definite answer: CSS is counted in the line length limit, so if you have a lot of CSS with no line breaks, it will cause this error.
The server will only allow so many characters per line (998 according to RFC 5322; 550 is just the SMTP reply code of the rejection). When there are no line breaks, all the HTML is counted as a single line.
So the simple solution is to add a few line breaks in the email body.
That is, put a few \r\n in the body of the message.
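To find out which part of a message trips the limit, something like the following Python sketch could be run over the raw body before sending. The function name and the sample HTML are made up for illustration; the 998 figure is the one from RFC 5322 section 2.1.1 quoted in the question.

```python
MAX_LINE = 998  # RFC 5322 section 2.1.1 limit, excluding the CR LF

def too_long_lines(body: bytes, limit: int = MAX_LINE):
    """Return (line_number, length) for every line longer than `limit` bytes."""
    return [
        (number, len(line))
        for number, line in enumerate(body.split(b"\r\n"), start=1)
        if len(line) > limit
    ]

# Made-up HTML body: minified CSS with no line breaks, then a short paragraph.
html = b"<style>" + b"a{color:red}" * 100 + b"</style>\r\n<p>Hello</p>\r\n"
offenders = too_long_lines(html)
```

Here only the first line (the unbroken style block) is reported, which matches the situation described above.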
I ran into this problem when an attachment I had included in my raw email (encoded in base64) was a single line longer than 998 characters. As the error suggests, this is due to RFC 5322 section 2.1.1.
I resolved it by chunking the base64 up into lines of 998 characters, each followed by \r\n (1000 bytes per line in total). PHP has a function for this called chunk_split, and a similar thing can be done in JavaScript with a regex implementation of chunk_split.
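For comparison, the same chunking can be sketched in Python. The helper below is hypothetical and only named after PHP's chunk_split. It defaults to 76 characters per line, which is the limit MIME (RFC 2045) sets for base64 lines and is stricter than the 998 used in the answer; pass length=998 to mirror the answer exactly. The attachment bytes are a stand-in, not real PDF data.

```python
import base64

def chunk_split(text: str, length: int = 76, end: str = "\r\n") -> str:
    """Split `text` into pieces of `length` characters, each followed by `end`."""
    return "".join(text[i:i + length] + end for i in range(0, len(text), length))

attachment = bytes(range(256)) * 20                      # stand-in for the PDF bytes
one_line = base64.b64encode(attachment).decode("ascii")  # a single, very long line
wrapped = chunk_split(one_line)                          # many short CR LF lines
```

base64.b64decode ignores the inserted line breaks by default, so the wrapped text still decodes to the original bytes.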

Socket, read until \n. What if bytes in message happen to be \n and end reading prematurely?

I plan to have a socket reading data until it gets to the \n character (In order to read individual messages from a data stream). What happens, however, if the message you're sending happens to have bytes that match the \n character? Won't that end reading prematurely and mess up everything? How do people usually read until a certain part in their data?
Ok, Joe provided a lot of good alternatives to reading till the "\n" character. (Now that I think of it, \n is probably only used for text based things)
Quote Joe: There's a lot of choices here. 1) substitute something for the delimiter when it's part of the content. 2) pick a less likely delimiter than \n. 3) include a message length ahead of each message that you parse out.
And #3 seems like the best way to accomplish what I'm trying to do.
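Option 3 (a length ahead of each message) could look roughly like this in Python. The 4-byte big-endian length header and the function names are assumptions made for this sketch; any fixed-size header that both ends agree on would do.

```python
import socket
import struct

HEADER = struct.Struct(">I")  # assumed framing: 4-byte big-endian payload length

def send_message(sock: socket.socket, payload: bytes) -> None:
    """Send the payload length first, then the payload itself."""
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; recv() may return fewer than requested."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed before the full message arrived")
        buf.extend(chunk)
    return bytes(buf)

def recv_message(sock: socket.socket) -> bytes:
    """Read one length-prefixed message."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)
```

Because the reader knows the length up front, a \n inside the payload is just another byte and cannot end a message early.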

Quoted Printable Email showing equal signs in certain email clients

I'm generating emails. They show up fine for me in Gmail and Outlook 2010. However, my client sees the = sign that gets added to the end of lines by the quoted-printable formatting. Her mail client also eats the first character on the next line while still displaying the equal sign.
Example:
line that en=
ds like this
shows up like
line that en=s like this
(Note: The EOL character in my emails is just LF. No CR.)
I'm confirming what Outlook version my client is using, but I think it's 2007. The email headers from her appear to come through Exchange 6.5.
My emails are created in PHP using the HtmlMimeMail5 library. They are multipart emails, with the applicable section sent with:
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
It appears I could just make sure nothing in my email reaches the line wrap at 76 characters, but that seems like the wrong way to solve the problem. Should the EOL character be different? (In emails from the client, the EOL character is simply a LF) Any other ideas?
I do not know what the PHP library does, but in the end MIME mail must contain CR LF line endings. Obviously the client notices that = is not followed by a proper CR LF sequence, so it assumes that it is not a soft line break, but a character encoded in two hex digits, therefore it reads the next two bytes. It should notice that the next two bytes are not valid hex digits, so its behavior is wrong too, but we have to admit that at that point it does not have a chance to display something useful. They opted for the garbage in, garbage out approach.
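If the library cannot be told to emit CR LF itself, one option is to normalize the line endings of the generated message just before it is handed to the mail server. The sketch below is in Python, not PHP, and the function name is made up; it only illustrates the idea.

```python
import re

def to_crlf(data: bytes) -> bytes:
    """Rewrite bare LF line endings as CR LF, leaving existing CR LF untouched."""
    # "\r?\n" matches both LF and CR LF, so already correct endings
    # are replaced by themselves and never doubled.
    return re.sub(rb"\r?\n", b"\r\n", data)

# The example from the question, with LF-only line endings.
body = b"line that en=\nds like this\n"
fixed = to_crlf(body)
```

After this step the soft line break reads "=\r\n", which is the form a quoted-printable decoder expects.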

What escaping or purification is needed for an email subject?

Sites like Facebook have the user's name in the subject line that sent you a message.
Because of this, what escaping would you do on user entered values in a message subject? Or would you just not allow anything other than a-z, 0-9, period, comma and single quotes?
You need to be careful with email headers: raw 8-bit characters are a no-no (mail servers may reject them).
The proper way to do it is to MIME encode your subject lines and make sure the ASCII char \n is not in the subject line (technically multi-line subjects are possible, but I'd imagine plenty of mail clients would have problems).
See http://en.wikipedia.org/wiki/MIME#Encoded-Word for more info.
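As a rough illustration of MIME encoding a subject, Python's standard email.header module can produce and read encoded-words. The wrapper function name and the sample subject are made up for this sketch.

```python
from email.header import Header, decode_header, make_header

def encode_subject(subject: str) -> str:
    """Return an ASCII-only, MIME encoded-word form of `subject`."""
    return Header(subject, "utf-8").encode()

# Non-ASCII subject (made-up sample) turned into an encoded-word.
encoded = encode_subject("Grüße from Jürgen")

# Decoding it again, as a receiving mail client would.
decoded = str(make_header(decode_header(encoded)))
```

The encoded form contains only ASCII characters, so it is safe to put on the Subject line, and decoding it gives back the original text.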
Escaping is needed if there are forbidden characters. The subject is terminated by a newline, so newline characters (CR and LF) are the only ASCII characters that shouldn't be put in the header.
See also RFC 821.
It’s the same problem with contact forms.
If you look at an email header you get e.g. this:
Subject: user123 has sent you an invite
From: "User123" <user123@example.org>
You have to make sure that user names do not resemble values of an email header. If it’s possible for a user to name himself “To: spamreceiver1@example.org, spamreceiver2@example.org, spamreceiver3@example.org, spamreceiver4@example.org” you have to clean the input.
A search for “contact form spam” should show you what to do. You should at least remove all occurrences of "To:", "Subject:", "From:" etc.
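A user name can only start a new header if it contains a line break, so a complementary measure (not the keyword removal suggested above) is to strip CR and LF from anything that ends up in a header. A minimal Python sketch, with a made-up function name and sample input:

```python
import re
from email.message import EmailMessage

def clean_header_value(value: str) -> str:
    """Replace any run of CR/LF with one space so the value stays on one line."""
    return re.sub(r"[\r\n]+", " ", value).strip()

# Hostile user name that tries to smuggle in an extra To: header.
user_name = "User123\r\nTo: spamreceiver1@example.org"

msg = EmailMessage()
msg["Subject"] = clean_header_value(user_name) + " has sent you an invite"
```

With the line break gone, the injected text is just part of the subject and the message has no extra To: header.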