Alright. I thought this problem had something to do with my rails app, but it seems to have to do with the deeper workings of email attachments.
I have to send out a CSV file from my Rails app to a warehouse that fulfills orders placed in my store. The warehouse has a format for the CSV, and ironically the header line of the CSV file is super long (1000+ characters).
I was getting a line break in the header line of the CSV file when I received the test emails and couldn't figure out what put it there. However, some googling has finally shown the reason: attached files have a line length limit of 1000 characters. Why? I don't know. It seems ridiculous, but I still have to send this CSV file somehow.
I tried manually setting the MIME type of the attachment to text/csv, but that was no help. Does anybody know how to solve this problem?
Some relevant google results : http://www.google.com/search?client=safari&rls=en&q=csv+wrapped+990&ie=UTF-8&oe=UTF-8
update
I've tried encoding the attachment in base64 like so:
attachments['205.csv'] = {:data=> ActiveSupport::Base64.encode64(@string), :encoding => 'base64', :mime_type => 'text/csv'}
That doesn't seem to have made a difference. I'm receiving the email with a me.com account via Sparrow for Mac. I'll try using gmail's web interface.
This seems to be because the SendGrid mail server is modifying the attachment content. If you send an attachment with a plain-text MIME type (e.g. text/csv), it will wrap the content every 990 characters, as you observed. I think this is related to RFC 2045 and RFC 821:
Content-Transfer-Encoding Header Field
Many media types which could be usefully transported via email are
represented, in their "natural" format, as 8bit character or binary
data. Such data cannot be transmitted over some transfer protocols.
For example, RFC 821 (SMTP) restricts mail messages to 7bit US-ASCII
data with lines no longer than 1000 characters including any trailing
CRLF line separator.
It is necessary, therefore, to define a standard mechanism for
encoding such data into a 7bit short line format. Proper labelling
of unencoded material in less restrictive formats for direct use over
less restrictive transports is also desireable. This document
specifies that such encodings will be indicated by a new "Content-
Transfer-Encoding" header field. This field has not been defined by
any previous standard.
If you send the attachment using base64 encoding instead of the default 7-bit, the attachment remains unchanged (no added line breaks):
attachments['file.csv']= { :data=> ActiveSupport::Base64.encode64(@string), :encoding => 'base64' }
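To see why base64 sidesteps the line limit, here is a minimal sketch in plain Ruby, outside Rails. The 200-column header is made up for illustration. `[data].pack("m")` is the core-Ruby call that the standard library's `Base64.encode64` wraps, and it breaks its output into lines of 60 encoded characters, so no line of the encoded attachment comes anywhere near the limit.

```ruby
# Hypothetical stand-in for the warehouse CSV: one header line well over 1000 characters.
header = (1..200).map { |i| "column_#{i}" }.join(",")
csv = header + "\n" + "value," * 199 + "value\n"

# Base64-encode; pack("m") inserts a newline after every 60 encoded characters.
encoded = [csv].pack("m")

longest_raw_line     = csv.lines.map { |line| line.chomp.length }.max
longest_encoded_line = encoded.lines.map { |line| line.chomp.length }.max

# Decoding restores the original bytes, long header line included.
decoded = encoded.unpack1("m")

puts "longest raw line: #{longest_raw_line}"
puts "longest encoded line: #{longest_encoded_line}"
puts "round trip intact: #{decoded == csv}"
```

The receiving mail client decodes the attachment, so the warehouse still gets the header as a single line.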
Could you have newlines in your data that would cause this? Check and see if
csv_for_orders(orders).lines.count == orders.count
If so, a quick/hackish fix might be changing where you call values_for_line_item(item) to values_for_line_item(item).map{|c| c.gsub(/(\r|\n)/, '')} (same for the other line_item calls).
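A minimal sketch of that check and fix, using made-up rows in place of the real `csv_for_orders` and `values_for_line_item` helpers (which are not shown here). `strip_line_breaks` is a hypothetical name, and the naive `join(",")` skips CSV quoting, so treat it as an illustration only.

```ruby
# Replace any run of CR/LF characters inside a field with a single space.
def strip_line_breaks(values)
  values.map { |value| value.to_s.gsub(/[\r\n]+/, " ") }
end

# Made-up order rows; the third field of the first row contains a newline.
rows = [
  ["1001", "Widget", "Leave at\nfront door"],
  ["1002", "Gadget", "none"]
]

raw_csv   = rows.map { |row| row.join(",") }.join("\n") + "\n"
clean_csv = rows.map { |row| strip_line_breaks(row).join(",") }.join("\n") + "\n"

puts raw_csv.lines.count == rows.count    # false: the embedded newline adds a line
puts clean_csv.lines.count == rows.count  # true
```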
Related
In messages I send with Thunderbird I have the following email header
Content-Type: multipart/alternative;
boundary="000_160551222008756181axiscom"
All the history is divided by lines looking like
--000_160551222008756181axiscom
Those initial -- mess things up for other mail viewers like Outlook's webmail client. Where do those -- come from, and how do I get rid of them?
I'm using version 78.4.2 (64-bit) of Thunderbird.
The hyphens come from RFC 2046. You don't normally get rid of them; they are how every mail client finds the parts of a multipart message. Look for the problem elsewhere.
The Content-Type field for multipart entities requires one parameter,
"boundary". The boundary delimiter line is then defined as a line
consisting entirely of two hyphen characters ("-", decimal value 45)
followed by the boundary parameter value from the Content-Type header
field, optional linear whitespace, and a terminating CRLF.
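As a rough illustration of how a reader uses those delimiter lines, the sketch below splits a small hand-written multipart body on `--` plus the boundary value. The message body and the `split_multipart` helper are made up for this example; a real parser handles more cases (preamble, nested multiparts, folded headers).

```ruby
# Split a multipart body into its parts using the RFC 2046 delimiter lines.
def split_multipart(body, boundary)
  delimiter = "--#{boundary}"
  parts = []
  current = nil
  body.each_line do |line|
    marker = line.chomp.rstrip # chomp drops the CRLF, rstrip drops optional trailing whitespace
    if marker == "#{delimiter}--" # closing delimiter ends the last part
      parts << current if current
      break
    elsif marker == delimiter     # ordinary delimiter starts a new part
      parts << current if current
      current = String.new
    elsif current
      current << line
    end
  end
  parts
end

boundary = "000_160551222008756181axiscom"
body = [
  "--#{boundary}",
  "Content-Type: text/plain",
  "",
  "Hello",
  "--#{boundary}",
  "Content-Type: text/html",
  "",
  "<p>Hello</p>",
  "--#{boundary}--",
  ""
].join("\r\n")

parts = split_multipart(body, boundary)
puts "found #{parts.length} parts"
```

Removing the leading hyphens would leave a reader like this with nothing to match on, which is why the fix has to be elsewhere.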
It's about sending emails with non ASCII chars in the email address.
When I send the RCPT TO command to the SMTP server, I know that I need to use punycode there.
But what about the To: and From: headers? Again, I know that if the user-friendly part contains a non-ASCII character, I can use the standard header encoding that I also use for the subject. But this encoding is only used for the user-friendly part.
But what if the email address itself contains a non-ASCII character? How must the To header be formatted?
So how to encode "Tüst" ?
This is the encoding as far as I know.
"=?iso-8859-1?Q?T=FCst?="<tüst@domain.de>
But what about the email address?
In fact, I don't understand the RFCs. I tried hard but failed.
The answer is: UTF-8 is the correct way to encode the header.
After some more research I found the answer hidden inside this article:
https://en.wikipedia.org/wiki/International_email
Although the traditional format for email header section allows
non-ASCII characters to be included in the value portion of some of
the header fields using MIME-encoded words (e.g. in display names or
in a Subject header field), MIME-encoding must not be used to encode
other information in a header, such as an email address, or header
fields like Message-ID or Received. Moreover, the MIME-encoding
requires extra processing of the header to convert the data to and
from its MIME-encoded word representation, and harms readability of a
header section.
The 2012 standards RFC 6532 and RFC 6531 allow the inclusion of
Unicode characters in a header content using UTF-8 encoding, and their
transmission via SMTP - but in practice support is only slowly rolling
out.
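A small sketch of the two styles, under the assumption that the receiving side supports RFC 6531/6532 for the second one. `encoded_word` is a hypothetical helper that builds an RFC 2047 encoded-word in its base64 form, and the address is the example from the question.

```ruby
# Build an RFC 2047 encoded-word (base64 form) for a display name.
def encoded_word(text)
  "=?UTF-8?B?#{[text.encode("UTF-8")].pack("m0")}?="
end

display_name = "T\u00FCst"            # "Tüst"
address      = "t\u00FCst@domain.de"  # example internationalized address

# Traditional style: only the display name may be MIME-encoded, never the address.
legacy_name = encoded_word(display_name)

# RFC 6532 style: the whole header value, address included, is plain UTF-8.
utf8_header = "To: #{display_name} <#{address}>"

puts legacy_name
puts utf8_header
```

The second form only works end to end when the servers involved advertise the SMTPUTF8 extension.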
I have a mail command that looks something like this:
sh ~/filter.sh ~/build/site_build.err | mail -s "Site updated." me@site.com
The bash script has a few commands, mostly grep, to filter the stderr. It sends the filtered content to me when our build process has finished.
It used to send the file contents in the message body, however now it's sending as an attachment, which I do not want.
When I delete most of the text it sends to me in the message body once again, so I've reasoned that this has to do with the message size -- is this correct?
Anyway, how can I prevent Unix from sending the message contents as an attachment, no matter the circumstances?
Thanks!
I'm still unsure, as many things could cause mail to apparently send its input as an attachment. But in many Linux distributions, mail is an alias for Heirloom mailx, which is known to encode its input if it contains any non-printable characters.
As soon as it finds non-standard characters, it adds the following headers to the mail:
Content-Type: application/octet-stream
Content-Transfer-Encoding: base64
and the text of the mail is effectively base64 encoded. It is not a true attachment, but many mail readers treat such mails as an empty body with an unnamed attached file.
If your LANG environment variable declares a locale that can use non-7-bit characters (éèûôüö ...), mail seems to be clever enough to declare an extended charset (ISO-8859-1 for a fr locale) and to use quoted-printable encoding. So even with (at least Western European) non-7-bit characters in the message, provided there are no control characters, the mail should be sent normally.
The alternative would be to filter all non-printable characters from the output of filter.sh, and the last-resort solution would be not to use mail and to call sendmail directly.
(Ref: an answer of mine to another post, though that one concerned Java)
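The filtering idea could look roughly like this. It is a sketch that assumes the troublesome bytes are ASCII control characters such as terminal color escapes; `strip_control_chars` is a hypothetical helper, and the sample text is invented.

```ruby
# Remove ASCII control characters, keeping tab (0x09) and newline (0x0A).
def strip_control_chars(text)
  text.gsub(/[\x00-\x08\x0B-\x1F\x7F]/, "")
end

# Invented build output containing ANSI color escapes and a carriage return.
sample  = "build ok\n\e[31mwarning\e[0m: stale cache\r\n"
cleaned = strip_control_chars(sample)

puts cleaned.inspect
```

The same method could be applied to standard input in a small script placed between filter.sh and mail in the pipeline.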
In the example for Zend_Mail on http://framework.zend.com/manual/en/zend.mail.attachments.html they use ENCODING_8BIT, but searching for what that might be sends me to http://msdn.microsoft.com/en-us/library/ms526992%28EXCHG.10%29.aspx where (and this sounds logical to me) it is explained that 8bit encoding does not make sense for emails.
Edit:
When I use this encoding for a mail with an attachment, I receive the mail with a corrupted attachment in my mail software (Thunderbird)
In which cases does it make sense to use ENCODING_8BIT?
As everybody said, ENCODING_8BIT represents the Content Transfer Encoding.
Basically, 8BITMIME is used for internationalization. It uses 8-bit character sets and therefore allows you to send any character supported in the UTF-8 charset.
In general, non-MIME mailers send 8-bit data but do not include any
MIME headers to mark the message as 8-bit data. MIME mailers should
cope with this without any problems. [source]
So basically there is not really a case where it makes sense to use ENCODING_8BIT over another encoding, since emails in UTF-8 are standard today. Also, note that most MTAs (Message Transfer Agents, such as Postfix) automatically force the encoding to 8BITMIME (UTF-8).
Here is a good resource about the 8BITMIME encoding.
The 8BITMIME extension has two effects in practice:
The client will avoid Q-P conversion.
The client may add extra
information at the end of a MAIL request: a space followed by either
"BODY=7BIT" or "BODY=8BITMIME".
Zend_Mime::ENCODING_8BIT sets the Content-Transfer-Encoding.
The Content-Transfer-Encoding defines methods for representing binary data in ASCII text format.
The use of Zend_Mime::ENCODING_8BIT in the example is a bug.
For sending attachments you should always use Zend_Mime::ENCODING_BASE64.
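To make the difference between the encodings concrete, here is a heuristic sketch in Ruby (not Zend code) that looks at raw bytes and suggests a Content-Transfer-Encoding. It follows the RFC 2045 definitions of 7bit and 8bit data: lines of at most 998 octets before the CRLF and no NUL octets. `suggest_transfer_encoding` is a hypothetical helper, not part of any library.

```ruby
# RFC 2045 limit for 7bit/8bit data, not counting the trailing CRLF.
MAX_LINE_OCTETS = 998

# Suggest a Content-Transfer-Encoding from the raw bytes (heuristic only).
def suggest_transfer_encoding(data)
  bytes = data.b # work on a binary copy so multibyte characters count as octets
  long_line = bytes.each_line.any? { |line| line.chomp.bytesize > MAX_LINE_OCTETS }
  has_nul   = bytes.include?("\x00".b)
  return "base64" if long_line || has_nul

  bytes.each_byte.any? { |byte| byte > 127 } ? "8bit" : "7bit"
end

puts suggest_transfer_encoding("a,b,c\n")           # short ASCII lines
puts suggest_transfer_encoding("T\u00FCst\n")       # non-ASCII octets
puts suggest_transfer_encoding("x" * 1200 + "\n")   # over-long line
```

Even when this returns "8bit", that label is only safe if every server on the path accepts 8-bit data, which is why base64 remains the usual choice for attachments.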
Not for email, but for attachments. Take a look at page 7 of RFC 2045:
RFC2045
"Binary data" refers to data where any
sequence of octets whatsoever is
allowed.
The book "Designing Embedded Hardware" in the chapter "9.3. Old Faithful: RS-232C" mentions that emails are still sent in 7bit char set because of RS-232C:
It's also not unheard of to see
RS-232C systems still using 7-bit data
frames (another leftover from the
'60s), rather than the more common
8-bit. In fact, this is one of the reasons why you'll
still see email being sent on the
Internet limited to a 7-bit character
set, just in case the packets happen
to be routed via a serial connection
that supports only 7-bit
transmissions.
How can I confirm the observation?
Check out the spec. The original RFC 822, for ARPA Internet Text Messages, explicitly states:
A message consists of header fields
and, optionally, a body. The body is
simply a sequence of lines containing
ASCII characters.
Since ASCII is 7-bit, voila.
Note, however, that there are a whole bunch of additions to that original spec, namely the MIME extensions, which allow non-ASCII text in message headers and bodies.
The Quoted-printable MIME encoding is specifically designed to encode 8-bit data in 7-bit characters. This encoding is widely used to encode email.
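For a feel of what quoted-printable does, here is a short sketch using Ruby's built-in `pack("M")` and `unpack1("M")` directives. The sample text is made up; each non-ASCII octet becomes `=` followed by two hex digits, so the encoded form contains only 7-bit characters.

```ruby
original = "T\u00FCst, caf\u00E9\n"   # "Tüst, café" in UTF-8

# Encode as quoted-printable; the result contains only ASCII characters.
encoded = [original].pack("M")

# Decoding returns raw bytes, so tag them as UTF-8 again before comparing.
decoded = encoded.unpack1("M").force_encoding("UTF-8")

puts encoded
puts "7-bit safe: #{encoded.ascii_only?}"
puts "round trip intact: #{decoded == original}"
```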
Note also that the text you quoted says "in case the packets happen to be routed via a serial connection" which is misleading, especially if they're talking in a context of IP packets. IP packets assume an 8-bit data path, and cannot be sent directly over a 7-bit RS-232 link without additional encoding (and then it's not a 7-bit data path anymore, it's 8-bit).
The systems that were restricted to 7 bits were already old when email first became popular. The chances that you will find one today approach zero.
Since certain characters have special meaning to email programs (most notably the end-of-line character), it still makes sense to limit the character set.