Surefire way to know if an email is in reply to an email my mail client sent - email

I'm looking for a way to know definitively if an email I receive is in response to a specific email I sent. I manually set the Message-Id of the outgoing message using make_msgid, store this value, and then check the In-Reply-To of an incoming email to determine if it is equal to the original Message-Id I sent.
This approach is basically what is suggested here in this very helpful answer by Mohammad Eghlima.
But I wonder if this approach is "foolproof" and if there is a better way to accomplish this? For example, are there clients other than Outlook, Gmail etc. that do not follow this convention of setting In-Reply-To to the Message-Id of the original mail for replies, or that set their own Message-Id for some reason (e.g. Gmail does this if it determines the existing message id doesn't follow RFC standards)?
I've seen some other answers mention other potential methods to accomplish what I'm trying to do (for example, here), but most of these questions/answers are from 10 years ago, so I'm wondering if there is a better way to accomplish this now.

No, nothing is entirely foolproof. Junior PHP programmers write new email-sending code every day and none of it conforms to any particular set of conventions or RFCs. And then there's Microsoft and Google ... Oh, you are already familiar with them.
There have been no significant developments in email standardization on this particular front in the last decade, so advice from 10 years ago is by and large still relevant.
If anything, the field has been polarized by Microsoft and Google plunging ahead to "innovate" in various aspects of what may charitably be characterized as usability improvements over traditional email, but the motivation has often been to silo in users to prefer or be forced to use their solutions, not standardize anything.
(The efforts to improve e.g. email security through DMARC etc. have been better coordinated and standardized.)
The post you link to basically summarizes the information from D. J. Bernstein's excellent email reference resource https://cr.yp.to/mail.html; see in particular the threading conventions at https://cr.yp.to/immhf/thread.html
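For reference, the approach from the question can be sketched in Python roughly as follows. This is a minimal sketch, not a definitive implementation: the function names, the example.com domain and the idea of also scanning the References header (as the threading conventions describe) are assumptions for illustration, and it still fails if the replying client drops or rewrites these headers.

```python
import re
from email.message import EmailMessage, Message
from email.utils import make_msgid

# Matches anything that looks like <id@host> inside a header value.
MSGID_RE = re.compile(r"<[^<>\s]+>")


def build_outgoing(sender, recipient, subject, body, domain="example.com"):
    """Build a message with an explicit Message-ID; return (message, msg_id).

    The caller is expected to store msg_id somewhere persistent.
    """
    msg_id = make_msgid(domain=domain)
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg["Message-ID"] = msg_id
    msg.set_content(body)
    return msg, msg_id


def referenced_ids(incoming: Message):
    """Collect every message id named in In-Reply-To or References."""
    ids = set()
    for header in ("In-Reply-To", "References"):
        for value in incoming.get_all(header, []):
            ids.update(MSGID_RE.findall(str(value)))
    return ids


def is_reply_to(incoming: Message, sent_ids):
    """True if the incoming message refers to any id we stored earlier."""
    return bool(referenced_ids(incoming) & set(sent_ids))
```

Checking References as well as In-Reply-To catches replies further down a thread, but neither header is guaranteed to be present.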

Related

SpamAssassin flag 'RAND_MKTG_HEADER' is unclear in what it means

I'm managing a bulk email service for the company I work at and a recent change to SpamAssassin has started flagging emails sent by our bulk-email solution with 'RAND_MKTG_HEADER'. I can't find much about this on the internet other than 'Has partially-randomized marketing/tracking header(s)'. The thing is, the software doesn't randomize any of the marketing headers for the campaigns sent with it, so I'm a bit confused as to the hows and whys and what I can do to fix this issue.
Naturally, campaign IDs are randomized UIDs; that's the nature of identifying things uniquely. If anyone has any insight as to what this particular flag entails and what I can do to fix the issue it would be GREATLY appreciated as it's starting to impact our legitimate customers with delivery issues.
Thanks in advance!
It's likely caused by a custom X- header prefix, i.e. you have an X-something- prefix in your mailer software. If you happen to use Mailwizz, here is the solution: https://kb.mailwizz.com/articles/low-score-in-spamassassin-because-of-the-rand_mktg_header-rule/
You can essentially fix the problem by changing the custom header prefix to a traditional X- prefix.

Random email addresses being signed up to my website

Over the past few months random email addresses, some of which are on known spam lists, have been added at the rate of 2 or 3 a day to my website.
I know they aren't real humans - for a start the website serves a very narrow geographical area, and many of these emails are clearly from a different country; others are info@ addresses that appear to have been harvested from a website, rather than something a human would use to sign up to a site.
What I can't work out is, what are reasons for somebody doing this? I can't see any benefit to an external party beyond being vaguely destructive. (I don't want to link to the site here, it's just a textbox where you enter email and press join).
These emails are never verified - my question isn't about how to prevent this, but what are some valid reasons why somebody might do this. I think it's important to understand why malicious users do what they do.
This is probably a list bombing attack, which is definitely not valid. The only valid use I can think of is for security research, and that's a corner case.
List bomb
I suspect this is part of a list bombing attack, which is when somebody uses a tool or service to maliciously sign up a victim for as much junk email as possible. I work in anti-spam and have seen victims' perspectives on this: it's nearly all opt-in verifications, meaning the damage is only one message per service. It sounds like you're in the Confirmed Opt-In (COI) camp, so congratulations, it could be worse.
We don't have good solutions for list bombing. There are too many problems to entertain a global database of hashed emails that have recently opted into lists (so list maintainers could look up an address, conclude it's being bombed, and refuse to invite). A global database of hashed emails opting out of bulk mail (like the US Do Not Call list or the now-defunct Blue Frog's Do Not Intrude registry but without the controversial DDoS-the-spammers portion) could theoretically work in this capacity, though there'd still be a lot of hurdles to clear.
At the moment, the best thing you can do is to rate-limit (which this attacker is savvy enough to avoid) and use captchas. You can measure your success based on the click rate of the links in your COI emails; if it's still low, you still have a problem.
In your particular case, asking the user to identify a region via drop-down, with no default, may give you an easy way to reject subscriptions or trigger more complex captchas.
If you're interested in a more research-driven approach, you could try to fingerprint the subscription requests and see if you can identify the tool (if it's client-run, and I believe most are) or the service (if it's cloud-run, in which case you can hopefully just blacklist a few CIDR ranges instead). Pay attention to requesters' HTTP headers, especially the referer. Browser fingerprinting is its own arms race; take a gander at the EFF's Panopticlick or Brian Krebs's piece on AntiDetect.
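A rough first pass at fingerprinting the subscription requests could look like the sketch below. The choice of headers, the helper names and the min_count threshold are all assumptions for illustration; a real tool may randomize these headers, in which case this tells you nothing.

```python
import hashlib
from collections import Counter

# Headers assumed to be stable for a given tool; adjust to what your logs contain.
FINGERPRINT_HEADERS = ("User-Agent", "Accept", "Accept-Language",
                       "Accept-Encoding", "Referer")


def fingerprint(headers):
    """Hash a fixed set of request headers into a short, comparable id."""
    lowered = {name.lower(): value for name, value in headers.items()}
    parts = [f"{name.lower()}={lowered.get(name.lower(), '')}"
             for name in FINGERPRINT_HEADERS]
    digest = hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
    return digest[:16]


def common_fingerprints(requests, min_count=3):
    """Return (fingerprint, count) pairs seen at least min_count times.

    `requests` is an iterable of header dicts, one per signup request.
    """
    counts = Counter(fingerprint(headers) for headers in requests)
    return [(fp, n) for fp, n in counts.most_common() if n >= min_count]
```

A fingerprint that keeps recurring on unverified signups is a candidate for a stricter captcha or an outright block.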
Security research
The only valid case I can consider, whose validity is debatable, is that of security research (which is my field). When I'm given a possible phishing link, I'm going to anonymize it. This means I'll enter fake data rather than reveal my source. I'd never intentionally go after a subscription mechanism (at least with an email I don't control), but I suppose automation could accidentally stumble into such a thing.
You can avoid that by requiring POST requests to subscribe. No (well-designed) subscription mechanism should accept GET requests or action links without parameters (though there are plenty that do). No (well-designed) web crawler, for search or archiving or security, should generate POST requests, at least without several controls to ensure it's acceptable (such as already concluding that it's a bad actor's site). I'm going to be generous and not call out any security vendors that I know do this.
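As an illustration of the POST-only rule, here is a minimal WSGI sketch using only the standard library. The handler name, the "email" form field and the PENDING_CONFIRMATIONS list are hypothetical stand-ins; a real handler would also need proper address validation, a captcha and rate limiting.

```python
from urllib.parse import parse_qs

# Hypothetical stand-in for whatever queues the confirmation (COI) email.
PENDING_CONFIRMATIONS = []


def subscribe_app(environ, start_response):
    """WSGI handler that only accepts subscriptions sent via POST."""
    text = [("Content-Type", "text/plain; charset=utf-8")]

    # Crawlers and link scanners issue GET/HEAD requests; refuse those outright.
    if environ.get("REQUEST_METHOD") != "POST":
        start_response("405 Method Not Allowed", [("Allow", "POST")] + text)
        return [b"Use POST to subscribe.\n"]

    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length).decode("utf-8", "replace")
    email = parse_qs(body).get("email", [""])[0].strip()

    # Deliberately crude check; this is a sketch, not an address validator.
    if "@" not in email:
        start_response("400 Bad Request", text)
        return [b"Missing or invalid email.\n"]

    PENDING_CONFIRMATIONS.append(email)
    start_response("202 Accepted", text)
    return [b"Check your inbox to confirm.\n"]
```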

Can I put star (★) in my email subject?

I got a request from my client that they want to add stars (★) to their email subject (They send these mails through the application we made as a part of bigger CRM for them).
I tried to send a test mail, and the email title is displayed nicely in my Gmail account, and I must agree with my client that it is eye catching, but what came to my mind is that this may be a spam magnet, so I googled about it but I can't find the actual "don't do this".
Generally, my opinion would be not to use it, but now I have to explain to the client why. My best explanation would be that there is a probability their emails will be treated as spam, but I don't have the background for this statement.
Do you have any suggestions about what I should do?
The only information I could find is on the SpamAssassin page of how to avoid false positives. The only relevant part I found was this part.
Do not use "cute" spellings, Don't S.P.A.C.E out your words, don't put
str#nge |etters 0r characters into your emails.
SpamAssassin is a very widely used spam filtering tool. Breaking one of its rules (strange characters) alone wouldn't get an email marked as spam, but combined with other problems it could. If your email is a completely legitimate business email, it's likely that few other rules are triggered, and the special characters wouldn't create a huge problem. Still, you should probably try a couple of test emails on SpamAssassin and a couple of other spam filtering tools to come to a better conclusion about the emails you plan to send out.
Simply explain to your client as you have explained to SO: you stated that the star made it eye catching: this doesn't directly mean that it will be treated as spam, but you could explain how that concept COULD be considered spam.
If the star is part of their branding, however, this could be quite a nice way in which your client expresses themselves.
Spam emails are becoming more and more like what one would consider 'normal', so I think they should trial it internally to test the concept.
Talk it over with your client - there is going to be no basis in hard fact with things like this, purely social perception.
More and more retailers have started using Unicode symbols in their subject lines over the past few months, of course in order to gain more attention in cluttered inboxes. So far, there has been no evidence that such symbols increase the likelihood of failing spam filter tests. However, keep in mind that rare symbols might not render (correctly) across all mail user agents. Especially keep an eye on Android and BlackBerry smartphones, but also on Outlook. In addition, due to a Hotmail bug, symbols will render much bigger in subject lines and in the email body within the web front end; in fact, they are being replaced by images. All in all, the star shouldn't cause any problems, at least if it's encoded correctly in the subject line. So, go for it.
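On the "encoded correctly" point: a non-ASCII subject has to go out as an RFC 2047 encoded word rather than as raw bytes in the header. A minimal sketch, assuming the application can use Python's standard email package (the function name and addresses are made up for illustration):

```python
from email.message import EmailMessage


def build_star_message(subject, sender, recipient, body):
    """Build a message whose non-ASCII subject is encoded on serialization.

    EmailMessage uses the modern default policy, which emits non-ASCII
    header text as RFC 2047 encoded words instead of raw UTF-8 bytes.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


if __name__ == "__main__":
    message = build_star_message("★ Spring sale", "shop@example.com",
                                 "customer@example.org", "Hello!")
    # Shows the wire format, including the encoded Subject header.
    print(message.as_bytes().decode("ascii"))
```

Whatever library the CRM actually uses, the thing to verify is the same: the raw Subject header on the wire should contain only ASCII.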

How to identify emails sent by humans?

I am working on a project where I need to identify emails sent by real humans as opposed to bulk mails, notifications and newsletters. Is there any definite way of doing that? Is there any information in the email headers that can help? I am working on top of Gmail IMAP so I already have non-spam emails.
Any help in this regard is appreciated. Thanks!
There isn't a clear way to distinguish bulk mail from personalised mailings. Unlike with spam, most bulk mail is requested/expected, so the sender doesn't do odd things to get round spam filters, which means these emails often blend in fairly well.
However, there are some trends that you can look for. If you want to do it reliably, you will probably need to apply some scoring system, like spam-filters do.
You will also need to accept that you are bound to get a substantial proportion of false positives and false negatives.
Some things that are common to bulk mail that appear less often in personalised correspondence:
"To" and "Cc" addresses do not contain a local recipient. Sometimes the sender will send to "mailList@mydomain.com" instead of "recipientA@recipientAdomain.com", "recipientB@recipientBdomain.com", etc. In these cases, it is also likely that only one address appears in "To" and nothing appears in "Cc"
"From" address is "noreply@", "newsletter@", "do-not-reply@", "mailinglist@", even less common terms like "support@" or "sales@" (but remember, they could cause false positives)
The presence of a "List-Unsubscribe:" header
The message contains an unsubscribe link. Run pattern matching to find common phrases in the final few lines of the email. Look for links, or words such as "unsubscribe", "opt out", etc.
Mailing lists tend to have rich content. Check for heavy use of CSS and lots of images, the entire message being contained within a <table></table> or <ul><li></li></ul> structure. i.e. the stuff that something like Dreamweaver would put in, rather than a mail client.
Headers or bold content at the top of the message. If the first bit of a message resembles a newsletter, it's probably a newsletter.
Lots of links or frequent linking to the same (or same few) websites. Newsletters will try to guide the user to the company's site(s), as much as they can. You may score this even more highly if the linked domain matches (or resembles) the sender domain.
Heavy references to social media. If it's a newsletter containing several articles, each story may have its own "Tweet this", "Like this" link. Personal emails are likely to contain (at most) one reference to Twitter, Facebook, etc. (in the signature)
Notifications and other auto-generated messages will often follow the same basic format. If you have the capabilities, run some kind of diffing or other comparison against previous messages. A strong match would imply automation.
There is no greeting, or a generic greeting. However, personal emails will often skip the "Dear Fred" bit too, so this isn't a good enough detection by itself; but things like "Dear User" or "Dear Customer" are almost certainly generic.
Unlikely to end in "Regards, Ian" or "Yours Sincerely, John Doe"
The sender has scored highly before. Keep a record. If a sender triggers a high score several times, they are almost certainly bulk mailing.
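A few of the signals above can be combined into a crude score, along the lines of the sketch below. The weights, the cut-off of five links and the exact patterns are arbitrary assumptions chosen for illustration; they would need tuning against real mail, and the plain-text body is passed in separately because extracting it from multipart messages is a problem of its own.

```python
import re
from email.message import Message
from email.utils import parseaddr

# Local parts that suggest an automated sender (assumed list, extend as needed).
BULK_LOCAL_PARTS = {"noreply", "no-reply", "do-not-reply",
                    "newsletter", "mailinglist"}
UNSUBSCRIBE_RE = re.compile(r"unsubscribe|opt[ -]?out", re.IGNORECASE)
GENERIC_GREETING_RE = re.compile(r"^\s*dear (user|customer)\b",
                                 re.IGNORECASE | re.MULTILINE)
LINK_RE = re.compile(r"https?://", re.IGNORECASE)


def bulk_score(msg: Message, body: str) -> int:
    """Return a rough 'looks like bulk mail' score; higher means more likely."""
    score = 0

    # An explicit List-Unsubscribe header is the strongest single hint.
    if msg.get("List-Unsubscribe"):
        score += 3

    address = parseaddr(str(msg.get("From", "")))[1]
    if address.partition("@")[0].lower() in BULK_LOCAL_PARTS:
        score += 2

    # Unsubscribe wording in the last few lines of the body.
    tail = "\n".join(body.strip().splitlines()[-5:])
    if UNSUBSCRIBE_RE.search(tail):
        score += 2

    if GENERIC_GREETING_RE.search(body):
        score += 1

    if len(LINK_RE.findall(body)) >= 5:
        score += 1

    return score
```

Keeping a running total of these scores per sender gives you the "has scored highly before" signal as well.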

Is it still worth obfuscating email-addresses to prevent harvesting?

I was wondering, is it really worth the trouble to implement email-obfuscation techniques in order to prevent emails from being harvested these days? My initial thought is no, but I might be wrong. My (possibly inaccurate) arguments:
spam filtering and detection is superior these days (when looking at my Gmail spam box, over 90% of all mail I receive is spam but none ends up in my inbox). Is it safe to assume the same for most other email services?
most techniques aren't 100% proof against advanced harvesting scripts so all effort could be in vain.
You might argue that it's no trouble to obfuscate an e-mail address, but I notice a lot of our clients enter their e-mail addresses through our CMS, which requires me to filter the e-mail addresses out of the text and replace them with an obfuscated version, which obviously is a little more trouble.
I'd like to hear from other people wondering the same or actually proving me wrong :)
If it's your address, you can do whatever you see fit.
If it's not your address, you might want to ask the owners. (Or check DNS to see if it's hosted on Google Apps)
As I described here, it is possible to block even the most advanced harvesters. (Unless they specifically target your site and work with the script)
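For the CMS case in the question, the filter-and-replace step could be sketched like this. The regular expression is a deliberately loose approximation of an email address, and HTML-entity encoding is only one example of an obfuscation; it stops only naive harvesters, since any script that decodes entities can read the address again.

```python
import re

# Loose approximation of an email address; not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def to_entities(text):
    """Encode every character as a numeric HTML entity."""
    return "".join(f"&#{ord(ch)};" for ch in text)


def obfuscate_emails(text):
    """Replace each address found in CMS text with its entity-encoded form."""
    return EMAIL_RE.sub(lambda match: to_entities(match.group(0)), text)
```

Browsers render the result as the original address, so visitors see no difference.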