Gmail username collapsing - email

I noticed that the following Gmail addresses are equivalent: foo@gmail.com and f.oo@gmail.com, and I would like to collapse such equivalent addresses into a single category. I searched the Internet for Gmail's collapsing rules, but I didn't find anything. Do you have any idea how I can normalize Gmail addresses?
P.S. By equivalent I mean that if I send an email to f.oo@gmail.com, I will receive it in my own mailbox, i.e. foo@gmail.com.
P.P.S. I think somebody asked the same question here: What emails are equivalent to each other?, but no correct answer was given. Maybe I should close this thread?

The Gmail rules work like this:
Case is ignored.
Dots are ignored.
A plus character and anything following it are ignored.
You could thus normalize Gmail usernames by first lowercasing the string, then removing all dots, then truncating the string right before the first plus character.
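A minimal sketch of those three steps in Python. The function name `normalize_gmail` is made up for illustration, and the sketch assumes that only addresses at `gmail.com` should be touched; anything else is returned unchanged.

```python
def normalize_gmail(address: str) -> str:
    """Collapse equivalent Gmail addresses to one canonical form."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local:
        raise ValueError(f"not an email address: {address!r}")
    domain = domain.lower()
    if domain != "gmail.com":
        # The rules are Gmail-specific, so leave other providers alone.
        return address
    local = local.lower()            # 1. case is ignored
    local = local.replace(".", "")   # 2. dots are ignored
    local = local.split("+", 1)[0]   # 3. drop the first "+" and what follows
    return f"{local}@{domain}"


print(normalize_gmail("F.oo+news@Gmail.com"))  # foo@gmail.com
```

The result is best treated as a grouping key only; the address the user actually typed is still the one to store for sending.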
Note that these rules are specific to Gmail. (Ignoring case in usernames is fairly universal but apparently not required by the relevant standards.)
Users may be angry if you send them email at "stripped" addresses. If someone gives you the address joe+yourapp@gmail.com, that's generally because they want to be able to filter the output from your application. If you then send mail to joe@gmail.com, you're going against the user's explicit wishes.

Related

Any advantages to including recipient's name in email headers?

Are there any reasons (Spam, etc.) to include the name in the To: headers instead of just omitting them and only using the address?
No, it just looks pretty, that's mostly it.
When you're sending mail to multiple people it can be useful when some of the recipients' email addresses don't easily map to their names.
It usually displays in your email client. If you leave it out, it simply displays <example@test.com> instead of Example Test <example@test.com>.
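For illustration, this is how the two forms can be built with Python's standard `email.utils` module. The name and address are the placeholder values from the answer above, not real data.

```python
from email.utils import formataddr, parseaddr

# Build a header value that carries both the display name and the address.
to_header = formataddr(("Example Test", "example@test.com"))
print(to_header)  # Example Test <example@test.com>

# Without a name, only the bare address is produced.
bare_header = formataddr(("", "example@test.com"))
print(bare_header)  # example@test.com

# parseaddr splits a header value back into (name, address).
name, address = parseaddr(to_header)
```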
I'm guessing it might be a factor for spam detection too, so adding the recipient's name may make the message more likely to show up in the inbox.

Is it safe to generate an email subject from the body?

I'm writing an app which allows users to send out a text-only email to a bunch of recipients. I want to try to generate the subject of this email from the body of the message, to avoid the need for a subject field.
Is it safe enough to do this? Are these emails likely to fall foul of spam filters?
I'm already scanning the entire email for spam words, so there won't be any in the subject.
You could download the widely used spam filter SpamAssassin and search for 'SUBJ' in the *.cf files. This will give you many spam rules that trigger based on the subject (such as an empty subject, all caps, bad words, or bad encoding of non-ASCII characters).
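A rough sketch of that search in Python. The helper name `find_subject_rules` is hypothetical, and the directory that holds the `*.cf` files depends on your installation, so it is left as a parameter.

```python
from pathlib import Path


def find_subject_rules(rules_dir):
    """Return (file name, line number, line) for every line mentioning SUBJ."""
    hits = []
    for cf_file in sorted(Path(rules_dir).glob("*.cf")):
        text = cf_file.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if "SUBJ" in line:
                hits.append((cf_file.name, lineno, line.strip()))
    return hits
```

Calling `find_subject_rules(path_to_rules)` and reading through the hits gives a quick picture of which subject patterns a generated subject line should avoid.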
I would suggest that if the mail is from a trusted source, there is no problem. Also, since the receiving mailbox doesn't know that the subject was generated automatically, that fact by itself makes no difference to it. Finally, check the guidelines that email filters follow; looking at an open source mail filter is a good start.

Is there a "no-reply" email header?

I often see automated emails postfixed with a message like
Amazon:
*Please note: this e-mail was sent from an address that cannot accept incoming e-mail. Please use the link above if you need to contact us again about this same issue.
Twitter:
Please do not reply to this message; it was sent from an unmonitored email address. This message is a service email related to your use of Twitter.
Google Checkout:
Need help? Visit the Google Checkout help center. Please do not reply to this message.
Directly underneath this warning, Gmail shows me a reply input field. It seems to me that there should be some sort of header that could be attached to such automated emails that would tell the recipient's email client to not allow replies.
Is there such a header? If not, has it ever been discussed by the groups that control email formats?
Is there such a header?
No. I'm pretty sure there isn't anything like that; and even if there is, it'd be non-standard and not widely supported, so it'd be pretty much useless at the moment. Even if it were to become standard, any such header would presumably just be informational; and for backwards-compatibility, support would have to be entirely optional for email clients.
Clients would be slow to implement it, and many users would still be on old versions of mail clients.
If not, has it ever been discussed by the groups that control email formats?
Probably. People have had a long time to suggest all manner of things with email, but my gut feeling is that it would never be implemented; well... not unless there is a fundamental shift in the ideas of what email is designed to do.
I'm sure Google would be much happier if you didn't even have a "Reply" button when they email you, so if anyone is pushing for it, it'll be the people who are already sending from donotreply@...
Email is designed to be sent from real mailboxes. RFC 2822 and RFC 5322 say:
In all cases, the "From:" field SHOULD NOT contain any mailbox that
does not belong to the author(s) of the message.
To me, that is a clear indication that email is designed as a method for conversation, rather than broadcast.
Probably the biggest killer to any change would be the little bit above that line, which would need to be entirely redefined; which would cause more problems than would be solved:
The originator fields also provide the information required when
replying to a message. When the "Reply-To:" field is present, it
indicates the address(es) to which the author of the message suggests
that replies be sent. In the absence of the "Reply-To:" field,
replies SHOULD by default be sent to the mailbox(es) specified in the
"From:" field unless otherwise specified by the person composing the
reply.
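The default described in that passage can be sketched with Python's standard `email` package. The function name `reply_target` is invented for this example; it only mirrors the "Reply-To if present, otherwise From" rule and ignores the case where the person replying overrides the address.

```python
from email import message_from_string
from email.policy import default


def reply_target(raw_message: str):
    """Return the header value a reply would go to by default, or None."""
    msg = message_from_string(raw_message, policy=default)
    # "Reply-To:" wins when present; otherwise fall back to "From:".
    header = msg["Reply-To"] or msg["From"]
    return str(header) if header is not None else None


raw = (
    "From: shop@example.com\n"
    "Reply-To: support@example.com\n"
    "Subject: Your order\n"
    "\n"
    "Thanks for your purchase.\n"
)
print(reply_target(raw))  # support@example.com
```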
RFC 6854 updates RFC 5322 to allow the group construct to be used in the From field as well (among other things). A group can be empty, which is likely the only way you've ever seen the group syntax being used: undisclosed-recipients:;.
Section 1 of the RFC explicitly lists "no-reply" among the motivations for allowing the group construct in the From field:
The use cases for the "From:" field have evolved. There are numerous instances of automated systems that wish to send email but cannot handle replies, and a "From:" field with no usable addresses would be extremely useful for that purpose.
It provides the following example: From: Automated System:;
However, at the end of the same section, the RFC also says:
This document recommends against the general use of group syntax in these fields at this time
In section 3, the RFC clarifies that the group syntax in the From field is only for Limited Use.
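As an illustration only (the RFC itself recommends against general use), the `From: Automated System:;` form can be produced with the `Group` class from Python's standard `email.headerregistry` module. The recipient, subject, and body below are placeholders.

```python
from email.headerregistry import Group
from email.message import EmailMessage

msg = EmailMessage()
# A group with a display name and no member addresses.
msg["From"] = Group("Automated System")
msg["To"] = "user@example.com"
msg["Subject"] = "Your report is ready"
msg.set_content("This message was sent by an automated system.")

print(msg["From"])
```

The printed header value is the empty group, so there is no address in it that a client could reply to.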
Personally, I think this method should not be used – unless we're certain that all relevant clients display the originating domain in some other way (reconstructed from the Return-Path or a new header). Otherwise, this defeats all the efforts towards domain authentication (SPF, DKIM, and DMARC). Introducing an additional header field which causes clients to simply hide the reply button seems the much better approach to me.
The RFC comments on this aspect in section 5:
Some protocols attempt to validate the originator address by matching the "From:" address to a particular verified domain (for one such protocol, see the Author Domain Signing Practices (ADSP) document [RFC5617]). Such protocols will not be applicable to messages that lack an actual email address (whether real or fake) in the "From:" field. Local policy will determine how such messages are handled, and senders, therefore, need to be aware that using groups in the "From:" might adversely affect deliverability of the message.
What a failed opportunity…
It seems that Thunderbird shows a built-in warning message if the From address is of the form no-reply@example.com. (The message I noticed this with also had no-reply@example.com in To and my email address only in the Cc field. I haven't tested whether this matters.)
As far as I know, the form no-reply@example.com has not been defined in any RFC.
Update: It appears that this behavior has been implemented in this bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1342809
and the actual implementation is a regex
/^(.*[._-])?(do[._-]?not|no)[._-]?reply([._-].*)?@/
If that matches, a confirmation prompt is displayed:
Reply Not Supported
The reply address ({ $email }) does not appear to be a monitored
address. Messages to this address will likely not be read by anyone.
[Reply Anyway] [Cancel]
This seems sensible enough for me and maybe other vendors could agree here. Note that this causes all the following to show the warning before allowing a reply:
service-name-no-reply@example.com
donot-reply@example.com
noreply.xyz@example.com
no-reply-userid@example.com
Unfortunately, it doesn't match
no-reply+eventid@example.com
so you have to use something like
no-reply-productname+eventid@example.com
if you want to encode extra information in the tag part.
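To check other candidate addresses against the same pattern, here is a small Python sketch. It uses the regex as quoted above, with `@` before the domain and no flags; whether Thunderbird applies any flags is not something this sketch assumes, and the helper name `looks_unmonitored` is made up.

```python
import re

NO_REPLY = re.compile(r"^(.*[._-])?(do[._-]?not|no)[._-]?reply([._-].*)?@")


def looks_unmonitored(address: str) -> bool:
    """True if the address matches the no-reply pattern quoted above."""
    return NO_REPLY.search(address) is not None


for candidate in [
    "service-name-no-reply@example.com",
    "no-reply+eventid@example.com",
    "no-reply-productname+eventid@example.com",
]:
    print(candidate, looks_unmonitored(candidate))
```

The second address prints `False` because `+` is not one of the separators `.`, `_`, or `-` that the pattern allows directly after `reply`.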
Update: Note that none of this is specified in any RFC related to email so this is about what works in practice instead of in theory.

Handling typos in emails or signing up users

I have a web app at which visitors are signing up and getting a newsletter to the email they registered with.
I am using only a single email field in the signup form, since I wish to reduce the number of fields, plus I figure most people (like me) copy and paste the email, which means a typo would propagate to the secondary verification field.
My problem is that a fair percentage of the signups have a typo in the email address, e.g. @yhaoo, @hotmaill, etc.
How can I effectively deal with such typos?
I was thinking of doing a simple auto-correction by using a list of misspellings for common domains, but I can't find a ready-made comprehensive list for that.
When the form is posted, you can do a DNS lookup to see if there is an MX record for the domain. If there is not, you can be almost certain that it is a typo, because mail sent to that address would not get delivered. Then you could re-display the form with a friendly error message, asking the user to confirm that the email address is correct.
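A hedged sketch of the MX check in Python. The standard library has no MX lookup, so the default resolver here assumes the third-party dnspython package is installed; the function names are invented, and the lookup is passed in as a parameter so it can be swapped or stubbed.

```python
def dnspython_mx_lookup(domain):
    """Return MX host names for `domain`, or [] if there are none.

    Assumes dnspython (2.0 or later) is available. Timeouts and resolver
    failures propagate on purpose: they do not indicate a typo.
    """
    import dns.resolver

    try:
        answers = dns.resolver.resolve(domain, "MX")
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return []
    return [str(record.exchange) for record in answers]


def domain_accepts_mail(address, mx_lookup=dnspython_mx_lookup):
    """True if the domain part of `address` has at least one MX record."""
    _, sep, domain = address.rpartition("@")
    if not sep or not domain:
        return False
    return len(mx_lookup(domain)) > 0
```

A `False` result is a reason to show the confirmation message, not to reject outright, since a domain without MX records can still receive mail through its address record.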
Don't auto-correct without prompting the user. It will be very hard to get right, and you might end up with confused users, that have their email address on a domain that closely resembles another domain.
I had this same question, and I just found a free JavaScript library at http://getmailcheck.org that I think will solve our problems:
The Javascript library and jQuery plugin that suggests a right domain when your users misspell it in an email address.
When your user types in "user@gmil.con", Mailcheck will suggest
"user@gmail.com".
Mailcheck will offer up suggestions for second and top level domains
too. For example, when a user types in "user#hotmail.cmo",
"hotmail.com" will be suggested.
Similarly, if only the second level domain is misspelled, it will be
corrected independently of the top level domain.
It is supposedly used by Dropbox, Lyft, Kickstarter, Khan Academy, and more.
First, make a DNS lookup to see if there's a valid MX record for that domain (which implies the domain exists). If not, you shouldn't accept that email address.
Second, look for an HTTP redirect from the domain to another domain. E.g. yayoo.com and yahooo.com both redirect to yahoo.com, so you may want to show a warning message "Did you mean ...@yahoo.com?" or even automatically correct the addresses from a whitelist that you've made sure are safe to correct.
Lastly, if there's a valid MX record and no redirect, your remaining culprits will most likely be typos that lead to hit farms riding on misspellings of large providers (or to innocent other services), e.g. gmial.com. For these you can resort to manually building a hash table of auto-correct suggestions (again, offering the user a "Did you mean..." step before accepting the submission).
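As a stand-in for a hand-built table, the "Did you mean..." step can be sketched with fuzzy matching from Python's standard `difflib` module. This is not how Mailcheck works internally; the domain list, the `cutoff` value, and the function name are assumptions chosen for the example.

```python
import difflib

# Illustrative list only; a real deployment would use its own set of domains.
KNOWN_DOMAINS = ["gmail.com", "yahoo.com", "hotmail.com"]


def suggest_address(address, known_domains=KNOWN_DOMAINS, cutoff=0.8):
    """Return a corrected address to offer the user, or None."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        return None
    domain = domain.lower()
    if domain in known_domains:
        return None  # already a known domain, nothing to suggest
    matches = difflib.get_close_matches(domain, known_domains, n=1, cutoff=cutoff)
    return f"{local}@{matches[0]}" if matches else None


print(suggest_address("user@gmial.com"))  # user@gmail.com
```

The return value is meant to be shown as a suggestion the user can accept or dismiss, never applied silently.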
I know that the question is old. But maybe my answer will help someone.
I'm using Mailgun API to handle typos in email addresses.

How does Gmail recognize email signatures (alternatively, "What's the best way to recognize email signatures?")

Gmail automatically greys text that looks like a signature. Anyone have any guesses how it does this? (I've noticed that it depends on the presence of the sender's name, but I think that's only part of the story).
I ask because I'm working on a web application that has an email interface, and I'd like to remove users' signatures before displaying the contents of their emails.
Email signatures are supposed to start with a line consisting of two dashes and a space, followed by a newline.
See Wikipedia and RFC 3676.
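A minimal sketch of stripping a signature by that delimiter in Python. It only handles senders that follow the convention, it cuts at the first delimiter line (a design choice for this example), and the function name is made up.

```python
def strip_signature(body: str) -> str:
    """Remove everything from the first "-- " delimiter line onward."""
    lines = body.splitlines(keepends=True)
    for index, line in enumerate(lines):
        # The delimiter is exactly dash, dash, space on a line of its own.
        if line.rstrip("\r\n") == "-- ":
            return "".join(lines[:index])
    return body  # no delimiter found, leave the text untouched


message = "See you on Friday.\n-- \nJoe\njoe@example.com\n"
print(repr(strip_signature(message)))  # 'See you on Friday.\n'
```

Signatures that do not use the delimiter need heuristics beyond this sketch.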