MIXED_ES Too many es are not es - email

I'm sending emails using PHPMailer and SpamAssassin is tagging them with:
* 3.3 MIXED_ES Too many es are not es
What does it mean? What are "es"? How do I fix it?

SpamAssassin rule descriptions are often unhelpful! The source for that rule suggests it fires when there are too many letter 'e's that are not "regular" letter e's, for example any of éèëêēĕėëẻěȅȇẹȩęḙḛềếễểḕḗệḝɇǝⱸ. It's to do with your content, not PHPMailer.
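To get a feel for what the rule is reacting to, here is a rough Python sketch (an illustration only, not SpamAssassin's actual implementation) that counts "e-like" letters that are not a plain ASCII e:
import unicodedata

def e_like_but_not_e(text):
    count = 0
    for ch in text:
        if ch in "eE":
            continue
        # crude heuristic: Unicode names of accented e variants contain "LETTER E"
        if "LETTER E" in unicodedata.name(ch, ""):
            count += 1
    return count

print(e_like_but_not_e("café crème, très élevé"))  # 5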

To disable the test, you can add these lines somewhere near the top of
/etc/mail/spamassassin/local.cf
meta __E_LIKE_LETTER (0)
meta __LOWER_E (0)
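Alternatively (not from the original answer, but the conventional way to silence a single SpamAssassin rule), you can zero the rule's score in the same file:
score MIXED_ES 0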

It's an error in SpamAssassin's rules, as @Synchro says. We can't do anything about it until they remove that crappy code. I've tried to contact the authors.

My solution was adding this code before the closing </body> tag:
<div style="display:none;">eeeeeeeeeeeeeeee</div>

Is your plaintext version of the email in a different language than your HTML content?
Like Synchro said, it is caused by different kinds of 'e' with diacritics. If the mailing is recognized by SpamAssassin as being "English" but the HTML content is actually in another language that makes heavy use of those e's, it can result in this message.

In my case this SpamAssassin rule triggered because I generated email content in Russian, which included the Russian letter е (yes, that is not e but е, a totally different character, although almost identical to the Latin e), while the email signature contained an email address and a plain-text link to a domain with the Latin letter e.
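A quick Python check makes it obvious that the Cyrillic е and the Latin e are different characters even though they render almost identically:
import unicodedata
for ch in ("e", "е"):
    print("U+%04X" % ord(ch), unicodedata.name(ch))
# U+0065 LATIN SMALL LETTER E
# U+0435 CYRILLIC SMALL LETTER IE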

Related

Is it possible to add another Unicode character for "at sign" without changing any code in the back-end of all the email providers?

So let's say for some reason we wanted to add another Unicode character for the at sign, and use it instead of @ in all the email providers.
Now I have three questions:
How do email providers parse the email address? Do they actually parse it until they see an @, with the @ symbol's code point hard-coded in the parser?
Do different service providers have different email parsers with different standards, or is there a standard parser library that every email provider uses?
Would it be possible to add another at-sign symbol and use it in emails without having to make changes to every email provider's code?
Yes, e-mail addresses are parsed using a hard-wired @ character. After almost fifty years of e-mail, there are literally millions of e-mail handling programs, and they all use this same syntax. So you're not going to be able to change this convention, and your second and third questions are moot.
E-mail addresses are parsed by dozens of different kinds of software, not just "email server" software inside "e-mail providers". Even things as trivial as client-side JavaScript highlighting for an e-mail field - of which there are easily tens of thousands around - would have to adapt.
An "#" is not a character class by itself - so, even if it were an unique "unicode character class" for "Unicode Separator", whou would ever have written code that would check for the character class of the separator? Have you ever done that, even for filtering punctuation out? (A real use case for the unicode classification of characters, and even them, this sees little use in real-world code).
Now, of course, you are free to write email client code that presents the "@" as anything else when rendering e-mail data to the users. Internally, though, if this software did not use "@", even for its own purposes, it would not work with anything else in the world - from antivirus software to text-based templates.
And finally, such a change would hardly have anything to do with Unicode itself - Unicode can standardize characters, but the e-mail protocol is a separate thing. Normally the series of documents kept as RFCs is what mandates various internet protocols, including IMAP, POP and SMTP - the three protocols used to make e-mail work. Even if new RFCs for all of these were published accepting a new character in place of "@", it would likely take more than a decade until all the software around, as detailed above, was compliant enough for it to be usable. (And yes, all of it would have to be changed.)
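To make that concrete, virtually all of that software splits an address on the literal @ character, along these lines (a Python illustration):
addr = "someone@example.com"
local_part, _, domain = addr.rpartition("@")  # the "@" is hard-wired, as it is everywhere else
print(local_part, domain)  # someone example.com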

postfix header_checks.pcre wrongly blocking IPHONE emails

I have a postfix/dovecot mail server which has been working fine for a year or so but today one user came to me with his iPhone and said he couldn't send emails.
It turns out the emails were being rejected by my header_checks.pcre which I set up as per the example in http://www.postfix.org/header_checks.5.html
The error I got was something like:
Apr 30 09:48:28 mail06 postfix/cleanup[28849]: 53893A00CD: reject:
header Content-Type:
image/png;??name=email_logo.png;??x-apple-part-url="part22.05080008.04000601@mydomain.com"
from unknown[112.134.156.178]; from=
to= proto=ESMTP helo=<[192.168.1.12]>: 5.7.1
Attachment name
"email_logo.png;??x-apple-part-url="part22.05080008.04000601#mydomain.com"
may not end with ".com"
So it seems that the iPhone mail app was appending an "x-apple-part-url" suffix to the attachment name and the PCRE was mistakenly blocking this as a .com instead of allowing through a .png.
Does anyone know how I can safely modify the PCRE in http://www.postfix.org/header_checks.5.html to avoid this happening?
So far as I know ".com" is still a viable extension for Windows malware. The problem is that the second .* in the example PCRE in the Postfix documentation spans two parameters, treating the .com from the following parameter as if it ended the name or filename parameter.
According to RFC 2045, value := token / quoted-string. This means you need to cater for both the quoted and unquoted cases by providing appropriate character classes. You could split into two rules or, to save multiple lists of extensions, do something like:
/etc/postfix/header_checks.pcre:
/^Content-(Disposition|Type).*name\s*=\s*
("(?:[^"]|\\")*|[^();:,\/<>\#\"?=<>\[\]\ ]*)
((?:\.|=2E)(
ade|adp|asp|bas|bat|chm|cmd|com|cpl|crt|dll|exe|
hlp|ht[at]|
inf|ins|isp|jse?|lnk|md[betw]|ms[cipt]|nws|
\{[[:xdigit:]]{8}(?:-[[:xdigit:]]{4}){3}-[[:xdigit:]]{12}\}|
ops|pcd|pif|prf|reg|sc[frt]|sh[bsm]|swf|
vb[esx]?|vxd|ws[cfh])(\?=)?"?)\s*(;|$)/x
REJECT Attachment name $2$3 may not end with ".$4"
The new second line of the rule distinguishes between the quoted and unquoted cases and any closing quotation mark is absorbed into $3.
BTW I'd probably stick .mso, .xl, .ocx (obscure MS extensions) and .jar in there too. Obviously this check is useful against malware floods but doesn't substitute for an up-to-date antivirus or more detailed spam analysis.
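To sanity-check the idea outside Postfix, here is a rough Python approximation of the rule's core (shortened extension list, [[:xdigit:]] spelled out; the re module's dialect differs slightly from Postfix's PCRE), showing that the quoted/unquoted split stops the iPhone false positive while a dangerous name still matches:
import re

rule = re.compile(r'''
    ^Content-(Disposition|Type).*?name\s*=\s*
    ("(?:[^"]|\\")*|[^();:,/<>@"?=\[\]\ ]*)     # quoted string OR unquoted token
    ((?:\.|=2E)(com|exe|bat|pif|scr|vbs)"?)     # shortened extension list
    \s*(;|$)''', re.IGNORECASE | re.VERBOSE)

iphone = ('Content-Type: image/png; name=email_logo.png; '
          'x-apple-part-url="part22.05080008.04000601@mydomain.com"')
print(bool(rule.search(iphone)))   # False: the .com belongs to x-apple-part-url, not to name

bad = 'Content-Disposition: attachment; filename="invoice.pdf.exe"'
print(bool(rule.search(bad)))      # True: the .exe attachment is still caught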

Mandrill "reject_reason": "invalid-sender"

I'm trying to send emails using the Mandrill email service but I get the following error.
Full Response
[
{
"email": "someemail#somedomain.com",
"status": "rejected",
"_id": "b814c2974594466cba9c904c54dca6c6",
"reject_reason": "invalid-sender"
}
]
Apart from the above error there are no more details about it. We are using .NET to send emails with Mandrill SMTP settings.
It'd be useful to see the call/email that's being sent. That error means that there's an invalid sender, as indicated in the reject reason field. That could be because of an invalid email address, invalidly-encoded from name, or invalid or broken encoding in other headers making it so that Mandrill can't parse the "from" header, but without seeing the actual email that you're sending, it's hard to say for sure exactly what the issue is.
You probably want to check that there's a fully-qualified domain name in the from email address, and that if the subject line is encoded, there aren't things like newline (\n) characters that break multibyte characters in the subject line. If you aren't able to identify the issue in the raw SMTP message, feel free to get in touch with support for further troubleshooting assistance.
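As a rough illustration of those two checks (the helper below is made up for this answer, not part of any Mandrill SDK), something like this Python sketch can be run before handing the message to SMTP:
from email.utils import parseaddr

def looks_sendable(from_header, subject):
    name, addr = parseaddr(from_header)
    local, _, domain = addr.rpartition("@")
    has_fqdn = bool(local) and "." in domain        # e.g. "example.com", not "localhost"
    clean_subject = "\n" not in subject and "\r" not in subject
    return has_fqdn and clean_subject

print(looks_sendable("Zoe <zoe@example.com>", "Hello"))    # True
print(looks_sendable("Zoe <zoe@localhost>", "Hi\nthere"))  # False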
I had the same problem; in my case, I had forgotten to fill in the template defaults "From Name" and "Subject".
I had the same problem. In my case the header encoding was the problem. I changed the header encoding to UTF-8 and it worked. I was using the C# SMTP client and the code is below.
message.HeadersEncoding = Encoding.UTF8;
Hope it works!
For me, it was because my emails were coming from email@example.net1
Mandrill rejected me because of the 1 at the end. e+mail@example.net and email@example.neta are both valid and will be accepted.
My other tests just had blank From headers, so they were rejected as well. I didn't even realize these emails were being received by Mandrill until I logged in and checked the API logs.
I've had a similar problem recently. It was due to my use of certain characters in the message.from_name field. After searching through documentation and stack overflow, I couldn't find a list of forbidden characters, so although this doesn't necessarily pertain to your case, I thought I'd share this small list I compiled of some acceptable characters (not an exhaustive list):
a-z
A-Z
0-9
_, -, !, #, $, %, \, ^, &, *, +, =, {, }, ?, .
In JS, here's a RegExp that will match with forbidden characters (or, rather, any characters that aren't in the aforementioned list):
const pattern = /[^a-zA-Z0-9_\-!#$%\^&*+={}?.]/;
Hope this is helpful for anyone else stuck on this.
If you use the .NET SmtpClient, maybe this is because of a bug in it: https://social.msdn.microsoft.com/Forums/vstudio/en-US/4d1c1752-70ba-420a-9510-8fb4aa6da046/subject-encoding-on-smtpclientmailmessage
Workaround, that helped us:
use
message.SubjectEncoding = Encoding.Unicode;
instead of
message.SubjectEncoding = Encoding.UTF8;
This is still the case in .NET Framework 4.7.2.

Decoding Korean text files from the 90s

I have a collection of .html files created in the mid-90s, which include a significant amount of Korean text. The HTML lacks character set metadata, so of course none of the Korean text renders properly now. The following examples will all make use of the same excerpt of text.
In text editors such as Coda and Text Wrangler the text displays as
╙╦ ╝№бя└К ▓щ╥НВь╕цль▒Ф ▓щ╥НВь╕цль▒Ф
Which in the absence of character set metadata in <head> is rendered by the browser as:
ÓË ¼ü¡ïÀŠ ²éÒ‚ì¸æ«ì±” ²éÒ‚ì¸æ«ì±”
Adding euc-kr metadata to <head>
<meta http-equiv="Content-Type" content="text/html; charset=euc-kr">
Yields the following, which is illegible nonsense (verified by a native speaker):
沓 숩∽핅 꿴�귥멩レ콛 꿴�귥멩レ콛
I have tried this approach with all historic Korean character sets, each yielding similarly unsuccessful results. I also tried parsing and upgrading to UTF-8, via Beautiful Soup, which also failed.
Viewing the files in Emacs seems promising, as it reveals the text encoding at a lower level. The following is the same sample of text:
\323\313 \274\374\241\357\300\212
\262\351\322\215\202\354\270\346\253\354\261\224 \262\351\322\215\202\354\270\346\253\354\261\224
How can I identify this text encoding and promote it to UTF-8?
All of those octal codes that Emacs revealed are less than 254 (or \376 in octal), so it looks like one of those old pre-Unicode fonts that just used its own mapping in the ASCII range. If this is right, you'll just have to try to figure out what font it was intended for, find it and perhaps do the conversion yourself.
It's a pain. Many years ago I did something similar for some popular pre-Unicode Greek fonts: http://litot.es/unicode-converter/ (the code: https://github.com/seanredmond/Encoding-Converter)
In the end, it is about finding the correct character encoding and using iconv.
iconv --list
displays all available encodings. Grepping for "KR" reveals that at least my system can do CSEUCKR, CSISO2022KR, EUC-KR, ISO-2022-KR and ISO646-KR. Korean is also covered by BIG5HKSCS, CSKSC5636 and KSC5636 according to Wikipedia. Try them all until something reasonable pops out.
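The same brute-force approach is easy to script; here is a rough Python sketch using the octal bytes from the question (the candidate list is just a guess, extend it as needed):
raw = b"\323\313 \274\374\241\357\300\212"  # first chunk shown in Emacs

for enc in ("euc-kr", "cp949", "iso2022_kr", "johab", "cp1251", "cp866"):
    try:
        print(enc, "->", raw.decode(enc))
    except (UnicodeDecodeError, LookupError):
        print(enc, "-> failed")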
Even though this thread is old, it's still an issue. Not having found a way to convert the files in bulk (outside of using a Korean version of Windows 7), I'm now using Naver, which has a cloud service like Google Docs: if you upload those weirdly encoded files there, it deals with them very well. I just edit and copy the text, and it's back to being standard when I paste it elsewhere.
Not the kind of solution I like, but it might save a few passers-by.
By the way, you can register for the cloud account with an ID even if you do not live in South Korea; there's some minimal English to get by.

Can an email address contain international (non-english) characters?

If it's possible, should I accept such addresses from users, and what problems should I expect when sending mail to them?
Officially, per RFC 6532 - Yes.
For a quick explanation, check out wikipedia on the subject.
Update 2015: Use RFC 6532
The experimental RFC 5335 has been obsoleted by RFC 6532, and the latter has been set to "Category: Standards Track", making it the standard.
Section 3.2 (Syntax Extensions to RFC 5322) updates most text fields to include (proper) UTF-8.
The following rules extend the ABNF syntax defined in [RFC5322] and
[RFC5234] in order to allow UTF-8 content.
VCHAR =/ UTF8-non-ascii
ctext =/ UTF8-non-ascii
atext =/ UTF8-non-ascii
qtext =/ UTF8-non-ascii
text =/ UTF8-non-ascii
; note that this upgrades the body to UTF-8
dtext =/ UTF8-non-ascii
The preceding changes mean that the following constructs now
allow UTF-8:
1. Unstructured text, used in header fields like
"Subject:" or "Content-description:".
2. Any construct that uses atoms, including but not limited
to the local parts of addresses and Message-IDs. This
includes addresses in the "for" clauses of "Received:"
header fields.
3. Quoted strings.
4. Domains.
Note that header field names are not on this list; these are still
restricted to ASCII.
Please note the explicit inclusion of Domains.
And the explicit exclusion of header names.
Also Note about NFKC:
The UTF-8 NFKC normalization form SHOULD NOT be used because
it may lose information that is needed to correctly spell
some names in some unusual circumstances.
And from the start of Section 3:
Also note that messages in this format require the use of the
SMTPUTF8 extension [RFC6531] to be transferred via SMTP.
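As a practical aside, here is a minimal Python sketch of sending to such an address, assuming the receiving server advertises SMTPUTF8; smtp.example.com is a placeholder:
import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "zoe@example.com"
msg["To"] = "müller@öko.de"          # non-ASCII local part and domain
msg["Subject"] = "Grüße"
msg.set_content("Hello")

with smtplib.SMTP("smtp.example.com") as smtp:
    smtp.send_message(msg, mail_options=["SMTPUTF8"])  # needs RFC 6531 support on the server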
The problem is that some mail clients (server tools and/or desktop tools) don't support it and throw an 'invalid email' exception when you try to send mail to an address which contains umlauts, for example.
If you want full support, you could use the trick of converting the domain part of the email address to "punycode". This allows users to type their addresses the usual way while you store them in a form that is supported everywhere.
Example: müller.com » xn--mller-kva.com
Both point to the same thing.
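Python's built-in IDNA codec will do that conversion for the domain part, roughly like this:
addr = "info@müller.com"
local, _, domain = addr.rpartition("@")
ascii_domain = domain.encode("idna").decode("ascii")
print(local + "@" + ascii_domain)   # info@xn--mller-kva.com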
I would assume yes, since a number of top-level domains already allow non-ASCII characters for domains, and since the domain is part of an email address, it's perfectly possible. An example of such a domain would be www.öko.de
Short answer: yes.
Non-ASCII characters are allowed not only in the username but also in the domain name.
The answer is yes, but they need to be encoded specially.
Look at this. Read the part that refers to email-headers and RFC 2047.
Not yet. The IETF plans to do this:
H-Online article: IETF planning internationalised email addresses; here is the RFC: SMTP Extension for Internationalized Email Addresses
Quote from H-Online (as it went down):
The Internet Engineering Task Force (IETF) has published three crucial documents for the standardisation of email address headers
that include symbols outside the ASCII character set. This means that
soon you'll be able to use Chinese characters, French accents, and
German umlauts in email addresses as well as just in the body of the
message. So if your name is Zoë and you work for a company that makes
façades, you might be interested in a new email address. But
representatives of providers are already moaning. They say there would
need to be an "upgrade mania" if the Unicode standard UTF-8 is to
replace the American Standard Code for Information Interchange (ASCII)
currently used as the general email language.
RFC 5335 specifies the use of UTF-8 in practically all email headers.
Changes would have to be made to SMTP clients, SMTP servers, mail user
agents (MUAs), software for mailing lists, gateways to other media,
and everywhere else where email is processed or passed along. RFC 5336
expands the SMTP email transport protocol. At the level of the
protocol, the expansion is labelled UTF8SMTP.
A new header field will be added as a sort of "emergency parachute" to
ensure that UTF-8 emails have a soft landing if they are thrown out
before reaching the recipient by systems that have not been upgraded.
The "OldAddress" is a purely ASCII address. But OldAddress is not to
be used as a channel for a second transfer attempt, but rather to make
sure that feedback is sent home.
Finally, RFC5337 ensures that correct messages are sent pertaining to
the delivery status of non-ASCII emails. The correct address of an
unreachable addressee must be sent back, even if further transport has
been refused. The email Address Internationalization (EAI) working
group is also working on a number of "downgrade mechanisms" for
various header fields and the envelope. If possible, original header
information is to be "packaged" and preserved.
Germany's DeNIC, the registrar for the ".de" domain, is nonetheless
taking this in its stride. "There is really not much we can do",
explained DeNIC spokesperson Klaus Herzig. DeNIC is instead paying
more attention to the update that the IETF is working on for the
standard of international domains – RFC3490, or IDNA2003 as it's
sometimes known. "We are not that happy about it because there is no
backwards compatibility," Herzig explained. When the update comes,
DeNIC says it will be throwing its weight behind the symbol "ß" - also
known as estzett - which has been overlooked up to now. The German
registrar also says that it may wait a bit before switching in light
of the lack of backward compatibility. Once the new standard is
running stably and registrars and providers have adopted it, the ß
will be added.
In contrast, experts believe that Chinese registrars in China and
Taiwan will quickly implement the change for internationalised email.
Representatives of CNIC and TWNIC are authors of the standards.
Chinese users currently have to write emails in ASCII to the left of
the @ and in Chinese characters to the right of it for Chinese
domains, which have already been internationalized.
(Monika Ermert)