Do not send mails with duplicate subjects - email

We've got different processes that send mails when they encounter issues (e.g. not enough permissions to perform an operation on a certain order item). This works fine, except that identical messages are sometimes sent every 5 minutes. In our environment it is very difficult to synchronize email sending at the application layer (there are several different applications sending email, so we'd have to touch every one of them to implement this there).
It seems logical to me that filtering out mails (by duplicate subject) is best done within the email layer, e.g. in the application receiving the SMTP requests.
Yet we'd also prefer not to go down to the SMTP layer ourselves, but rather use an existing service/application.
Is anybody aware of a web mailer (like googlemail) which does this kind of filtering? It would be OK for us to pay for such a service, so being "free as in beer" would be nice, but not being free is not a showstopper.
Thanks in advance
Holger

I find the idea of filtering duplicate e-mail messages by the Subject: header quite worrisome. If they are produced by multiple applications, how can you be certain that the content of the messages is duplicated and that you are not unwittingly dropping important notifications?
The only unique feature of a message that can be used to filter out duplicates is its Message-ID: header. If that header is the same for two messages, then it's usually reasonable to assume that they are copies of the same original message - e.g. one received directly and one that was CC'ed to a mailing list.
That said, you can do pretty much anything you want on most SMTP servers - at least those that are based on a Unix-like OS. For example, Postfix can use custom shell scripts for filtering.
You can, for example, use formail to extract the body of each message and produce its MD5 hash. Comparing the message body hashes along with the Date:, Subject:, From:, To: and Cc: headers at the same time is a good start to detect real duplicates.
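
As a rough illustration of the hashing approach described above, here is a minimal Python sketch (Python rather than formail and shell, purely for readability). The names `fingerprint` and `is_duplicate` and the in-memory `seen` set are assumptions made up for this example; a real filter would need persistent storage and some expiry policy.

```python
import hashlib
from email import message_from_bytes
from email.message import Message

# Headers compared alongside the body hash, as listed in the answer above.
FINGERPRINT_HEADERS = ("Date", "Subject", "From", "To", "Cc")


def body_bytes(msg: Message) -> bytes:
    """Return the decoded payload of all non-multipart parts, concatenated."""
    chunks = []
    for part in msg.walk():
        if part.is_multipart():
            continue
        chunks.append(part.get_payload(decode=True) or b"")
    return b"".join(chunks)


def fingerprint(raw: bytes) -> str:
    """Combine the selected headers with the MD5 hash of the body."""
    msg = message_from_bytes(raw)
    digest = hashlib.md5()
    for name in FINGERPRINT_HEADERS:
        value = msg.get(name, "")
        digest.update(f"{name}:{value}\n".encode("utf-8", "replace"))
    digest.update(hashlib.md5(body_bytes(msg)).hexdigest().encode("ascii"))
    return digest.hexdigest()


def is_duplicate(raw: bytes, seen: set) -> bool:
    """Return True if an identical message was seen before; otherwise remember it."""
    fp = fingerprint(raw)
    if fp in seen:
        return True
    seen.add(fp)
    return False
```

Because Date: is part of the fingerprint, two alerts generated five minutes apart would not count as duplicates. Removing it from `FINGERPRINT_HEADERS` would be one way to loosen the match, at the cost of the risk mentioned at the start of this answer.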

Related

Postfix: isolate multiple sites' mail headers so that if one gets blocked/blacklisted, the others sharing the server don't also get blacklisted

I have a few separate sites on a server with a single IP.
The sites shouldn't ever send spam, but the customers are free to send emails from their sites so I have no way to prevent them from doing so.
What I'd like to do is when sending the emails via postfix, somehow separate the sites in the headers sent out.
Previously I've set up an IP for each, but I'm trying to avoid doing this.
I've also found that with /etc/postfix/header_checks I can remove headers, but I'm not sure whether removing specific headers will cause issues.
One thing to consider here is that blacklisting is usually based on IP addresses. Separate headers won't help much there. The reason for this is that (a) it's simple and (b) many spam sending servers have been compromised and taken over by an attacker, using custom mail sending software, so headers don't matter anymore.
Different headers might still have their merit though, as spamfilters will check those. It just won't help if your server's IP gets blacklisted.
I guess rolling out DKIM might help here; it would give you an artificial separation of domains by using a different domain key for each. There are some good tutorials on the net on how to set it up with OpenDKIM.
A better solution, used by big mail providers like GMX, is to send mail from a separate IP if it looks like it's spam. The setup for this is a little complicated, as it requires you to scan outgoing mail with spamassassin (or something similar) and to route mail depending on the respective spam score. Not an easy task. Marking spam as such, without sending it through a separate IP, might be enough to convince the other side that you try to prevent spam being sent from your server, but this really depends on their spam filter.
The way your server identifies itself during an SMTP conversation is through the HELO command. The smtp_helo_name parameter specifies the name used there. One could try to set up a transport mechanism that uses a different name for each sender domain. I'm honestly not sure how difficult that would be.
If you are still set on changing headers: the header_checks tables allow you not only to remove headers, but also to alter them via regular expressions.
Use the REPLACE command to do so. Example:
/^(Message-ID:.*)@your-domain.example(.*)/ REPLACE ${1}@other-domain.example${2}
I'd advise against it, though. It provides too little gain for the effort of finding and setting up the right rules.
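
To see what such a rewrite rule would do to a header line before putting it into Postfix, the substitution can be tried out in Python. This is only a sketch: it assumes the address separator in the rule is the usual `@`, it uses Python's `re` module rather than Postfix's own regexp engine, and `rewrite_header` is a name invented for this example.

```python
import re

# Python rendering of the header_checks pattern. The dots are escaped here
# so that they only match a literal "."; matching is case-insensitive,
# which is also the default for Postfix regexp tables.
PATTERN = re.compile(r"^(Message-ID:.*)@your-domain\.example(.*)", re.IGNORECASE)


def rewrite_header(line: str) -> str:
    """Return the header line with the Message-ID domain replaced, if it matches."""
    return PATTERN.sub(r"\g<1>@other-domain.example\g<2>", line)
```

Lines that do not start with Message-ID: are returned unchanged, which mirrors how a header_checks rule only acts on the headers its pattern matches.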

How can I read the importance of mails using Perl's Mail::IMAPClient?

I am using Mail::IMAPClient to connect to an IMAP server. Can someone advise on how to get the priority of the messages? I tried dumping the headers, but I cannot see where the importance is set.
That may depend on the client the sender used.
In my eyes, the most important importance setting is the header
Precedence: bulk
which indicates messages like delivery errors, for which the sender doesn't even expect a failure notice; the mail system is simply free to drop these silently if any problems occur. (A good idea to set for outbound error messages, to avoid loops.)
Then, there is a header field called Importance, which I'm not sure any mail client uses; RFC 4021 defines Priority, but it doesn't yet seem to be that common in the wild (hey, it's only been out for eight years!); and then there is the non-standard X-Priority, which may be what you are looking for.
Note that none of these are part of the IMAP layer; they belong to the message headers themselves. Depending on the IMAP server, IMAP flags may or may not have been set based on them, but to be sure you see what the sender thought about the mail (or their own ego), you may need to get the actual mail headers and look there.
That being said, most people I know simply ignore these values/appraisals, anyway.
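
If you do want to look at them, the check boils down to parsing the raw header block and picking out the fields named above. The question is about Perl, so treat the following Python as a language-neutral sketch only; `priority_hints` is a made-up helper, and it assumes you have already fetched the raw header bytes from the server (for example with the IMAP fetch item `BODY.PEEK[HEADER]`).

```python
from email import message_from_bytes

# Header fields mentioned in the answer above that may carry a priority hint.
PRIORITY_HEADERS = ("Importance", "Priority", "X-Priority", "Precedence")


def priority_hints(raw_headers: bytes) -> dict:
    """Return whichever priority-related headers are present in the message."""
    msg = message_from_bytes(raw_headers)
    hints = {}
    for name in PRIORITY_HEADERS:
        value = msg.get(name)
        if value is not None:
            hints[name] = str(value).strip()
    return hints
```

An empty result simply means the sender's client set none of these fields, which is common.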

What is the effect of a Precedence: bulk header on e-mail messages

According to Google Mail guidelines, bulk mail must contain the header
Precedence: bulk
What does it do? I could not find an RFC describing the effect on mail delivery.
Some background: I'm working on scripts that will send 500k+ mails daily. They are different kinds of messages: account-related mail (delivery critical), but also notifications of new content (delivery not critical).
The exact meaning of Precedence: isn't standardised, but it prevents some mail servers from sending vacation and bounce messages, and may be used by service providers to deprioritise bulk mail during busy times so that "personal" mail continues to be delivered quickly.
From RFC2076:
Non-standard, controversial, discouraged.
Sometimes used as a priority value which can influence transmission speed and delivery. Common values are "bulk" and "first-class". Other uses is to control automatic replies and to control return-of-content facilities, and to stop mailing list loops.
See the answers to this question, as well as RFC 3834.
In short, the Precedence: header is non-standard. Google's recommendation is perhaps just a way to give a hint to their spam filters that you really did intend to send out 500k+ emails from your server (as opposed to one abused by spammers).
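
For the two kinds of mail described in the question, one option would be to add the header only to the non-critical notifications. A minimal sketch with Python's standard `email` package follows; `build_message` and its `bulk` parameter are names invented for this example, and whether account mail should go without the header is a judgment call the sources above do not settle.

```python
from email.message import EmailMessage


def build_message(sender: str, recipient: str, subject: str, body: str,
                  bulk: bool) -> EmailMessage:
    """Build a plain-text message, tagging it as bulk mail when requested."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    if bulk:
        # Non-standard header; receivers may use it to suppress auto-replies
        # or to deprioritise delivery.
        msg["Precedence"] = "bulk"
    msg.set_content(body)
    return msg
```

The resulting object can be handed to `smtplib.SMTP.send_message` for delivery.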

What's the most straightforward way to delete emails marked as spam by SpamAssassin?

I'm on Ubuntu Intrepid, using Postfix and SpamAssassin. I've seen approaches using procmail (like the one suggested @ Apache), but I'm looking for a solution that does not use procmail.
This is a programming question because the correct answer will be some form of code that accomplishes the task at hand (my response to the negative votes).
UPDATE to the situation: I used this tutorial and it worked out excellently: https://help.ubuntu.com/community/PostfixAmavisNew
It really depends at which level you want to delete the spam:
At the mail client level, using email client rules (like the ones available in Thunderbird) is easy: just set a rule that deletes any email marked as SPAM in the subject.
At the user level, if mail is received automatically by the machine, you could set up a cron job that periodically inspects the local mailbox and again deletes mails marked as SPAM.
It's easy if your local store uses maildir since each email is just a file, as opposed to the mbox format which would require some more work since it's a single file.
Setting up maildir for postfix is trivial.
At the server level, using Amavisd will allow you to have more control over how mail is handled.
Amavisd has threshold settings where you can define an evasive action depending on the spam score given by spamassassin.
For instance, anything above 15 points is put in quarantine and anything above 30 points is deleted.
There are some instructions for installing Amavisd on Ubuntu.
The point is, as far as I know, spamassassin's job is to identify and give spam points to emails. How you want these to be handled is not up to spamassassin but the other modules down the chain.
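
For the user-level option with a maildir store, the cron job could be as small as the following Python sketch. It assumes SpamAssassin is configured to add its `X-Spam-Flag: YES` header to messages it considers spam (rather than relying on a rewritten subject); `delete_spam` and the path passed to it are placeholders for this example.

```python
import mailbox


def is_marked_spam(msg) -> bool:
    """True if the message carries SpamAssassin's X-Spam-Flag: YES header."""
    flag = str(msg.get("X-Spam-Flag") or "").strip().upper()
    return flag == "YES"


def delete_spam(maildir_path: str) -> int:
    """Delete flagged messages from a maildir and return how many were removed."""
    box = mailbox.Maildir(maildir_path, create=False)
    removed = 0
    # Copy the keys first, since messages are removed while iterating.
    for key in list(box.keys()):
        if is_marked_spam(box[key]):
            box.remove(key)
            removed += 1
    return removed
```

Each maildir message is a separate file, so removing one does not require rewriting the rest of the mailbox, which is the advantage over mbox mentioned above.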

Guidelines for email newsletter service

I'm implementing an email newsletter sender service using .NET and Windows Server technologies. Are there comprehensive guidelines which could help avoid emails being trapped by spam filters and other mechanisms?
They should cover all aspects of (legal) bulk mail sending: SMTP configuration, DNS, HTML content, images, links within content etc. A simple example: is it better to embed images or load them from a server?
It would be great if you could provide some empirical data to show the efficiency of some measures taken.
Although I don't have a definitive answer, I think this is a very important question.
Here are a few tidbits I know about it:
Choose a clean hosting/smtp server. IP addresses of spamming SMTP servers are often black-listed by other ISPs.
Send a simple introductory email to every subscriber, asking them to add your sender address to their safe list.
Be very prudent in sending to only those people who are actually expecting it. You wouldn't want pattern recognizers of spam filters learning the smell of your content.
If you don't know your SMTP servers in advance, it's good practice to provide configuration options in your application for controlling batch sizes and the delay between batches. Some servers don't like large batches or continuous activity.
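
The last point, configurable batch sizes and delays, might look roughly like this. The sketch is in Python rather than .NET, and every name in it (`batches`, `send_in_batches`, `send_one`, the default values) is an assumption for illustration; `send_one` stands in for whatever actually hands a message to the SMTP server.

```python
import time
from typing import Callable, Iterable, Iterator, List


def batches(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def send_in_batches(recipients: Iterable[str],
                    send_one: Callable[[str], None],
                    batch_size: int = 50,
                    delay_seconds: float = 30.0,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Send to every recipient, pausing between batches; return the count sent."""
    sent = 0
    for index, batch in enumerate(batches(recipients, batch_size)):
        if index > 0:
            sleep(delay_seconds)  # pause before every batch except the first
        for address in batch:
            send_one(address)
            sent += 1
    return sent
```

Passing `sleep` as a parameter is only there so the pacing can be tested without actually waiting; in production the default `time.sleep` would be used.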
Unless you have a very specific reason to host the newsletter yourself, I think you'd be much better off using a third party service. There are lots out there, and some are very cheaply priced.
It'll save you on development work (no point in re-inventing the wheel).
Their system will handle all the unsubscribe link stuff that you need to include in email newsletters to comply with CAN SPAM laws or whatever.
They handle the spam reports that you will inevitably get if you have a list of any non-trivial size.
They keep records of who signed up, how they signed up, and their IP address, and can present those on receipt of a spam report to prove that their service wasn't sending out spam.
You can use double opt-in (or confirmed opt-in), for extra evidence to prove that the people you're sending emails to actually signed up to receive them.
If you really do need to host it yourself I'd suggest you search the web for "email deliverability". Things that are known to help include properly set up SPF records, DomainKeys/DKIM, correct DNS settings (reverse DNS especially - best to just use an online service to check your DNS settings). You can test a lot of these things by sending an email to check-auth@verifier.port25.com.
It's best to avoid using spammy words in your email. This is always a bit of guesswork, but some words can trip filters.
But I'd guess that by far the most important thing is to be sending your email from a trusted server that maintains good relationships with ISPs (i.e. ensuring that ISPs don't think that the server is sending out spam). This is a big reason why it's much much easier to get a third party to handle everything for you.