Email headers of a real person - email

I am working on creating a system that requires me to differentiate a email that is sent by a real person or a automated bot.
I used gmail-api and got pretty sure that all emails under "Personal" label would have all the real life people emails, including important automated ones. Now how do I differentiate from there?

Tricky problem probably not solvable using the view of one mailbox content. If working across a large organisation where access and reading of all inbound email is implemented some sort of categorisation may be possible. Using the toolbox of spam filtering technology, Bayes filtering on keywords, urls, pictures etc. it might be possible to categorise email in to personal / work / commercial / bulk spam / transaction receipts. Check all other message headers as part of the email context.
Can you do the differentiate task by hand ? Automate how you did that.

Related

Sending Bulk Email To Members of Application

I've been reading up on sending mass email to a user-base, and I'm not feeling comfortable using the PHP mail function. It tends to be too simple, spammy and unreliable.
But that leads me to my question... for a custom application, what should I be using to send email to potentially hundreds of people? ...or is mail okay to use?
I appreciate the help.
I would use a third-party service. There are several of them. They ensure the emails are sent from white-listed IP's and have spent a LOT of money on legal prep for terms, privacy policy, etc to ensure the ISP's play nicely with the incoming mail.
If you're only sending mail to potentially hundreds of people, and not hundreds of thousands of people, PHP's sendmail will handle the load fine. You should be worrying more about the newsletter content and the opt-out easiness than the capability of PHP to send your email. For small campaigns to hundreds of people, check out MyEmma.com as an example of a small-list solution.
What you're probably looking for is an API to offload your email calls to and let a service handle the delivery for you. Sending a large number of email messages from PHP can be tricky as if it's not done quickly enough you run the risk of time-outs, and tracking which have been sent is always troublesome if you want to re-try a big batch.
Not surprisingly there are several companies which offer an email API service to make this sort of thing significantly easier than doing it yourself:
SendGrid (PHP example)
Mandrill (PHP package) from MailChimp
Postmark (PHP libraries)
PostageApp (PHP example)
MailGun (PHP sample)
While I'm a developer for PostageApp, but I encourage you to try out many of these to see what works best for you.
In most cases you need to re-write a small portion of your application to work with the particular API or library used to access the API, and once that's done you can send a very large number of messages with one quick call. The delivery of those messages becomes the responsibility of your provider.
The truth is that the less money you're willing to spend on email sending, the more things you have to do yourself, such as:
White listing your sender IP address (especially if you're on shared hosting this can be a PITA, because the other users can mess things up for you).
Setting up SPF and DKIM to add trust for mail hosts (Hotmail, GMail, etc.)
Checking bounce emails
Handling ISP complaints
This is also amongst the things that third party providers charge you for; if you don't wish to bother with any of the above, feel free to use providers such as Mailchimp, Bluehornet, etc. Make sure they provide what you need before you whip your wallet, some might have surprising hidden costs (such as charging you extra for API usage, use of transactional emails, life-cycle emails, etc.)
If you don't mind doing a few of the above (like checking bounce / complaint emails and making some simple DNS changes) you could sign up for Amazon SES; it has a proper API and their email charges are the lowest I've seen so far and recently they have introduced DKIM (signed emails) support. You can also configure your sendmail (assuming dedicated hosting) to talk directly to SES, so it's easy to hook up any mail() based solution and run it.
First, thanks for everyone that has helped me with this thus far.
The answer I was looking for is http://mandrillapp.com/
This is the service behind MailChimp and it rules in every way!

How to send bulk emails with good success rate?

There are many articles and threads about guidelines while sending bulk emails. Most of the times it was mentioned that emails should be sent to the subscribed users. So that we can avoid "users clicking on spam in their mail boxes".
There are some features in famous social networking sites where we can send invitations to Yahoo contact list. In such scenarios the persons in our contact list actually did not subscribe with Linkedin to get invitations or other mails. But the hard thing to understand is how do Linkedin and Facebook mails don't go to spam?
Developers often attack this problem from "how can I do everything (DomainKeys, et al) to ensure the message will not end up in someone's spam box?".
I used to work on a website that mailed 20,000 users weekly.
The key point in keeping out of the spam box - beyond all the technical components laid out in the link above posted by Martin - is making sure that your recipients do not spam-can you.
Bayesian detection algorithms on mail servers source data from users who click "this is spam" in their mail clients, so if your users think you are SPAM, you are - and you'll get lower delivery percentages (be prepared: 100% is impossible).
Use your brand or website name as the From: field. Your customer is most likely to remember that, if nothing else. Attempts to "be close" to the customer by using a name of someone in your organization is usually a bad idea, even if you're consistent. Identify your organization UP FRONT, and in the subject line every time.
Make the unsubscribe link incredibly easy to find (not at the bottom!). Users will say "this is spam" to make the message go away if they can't figure out how to get off your list. You don't want these people on your list anyhow.
Double opt-in. Don't add anyone to your list without their explicit opt-in. Not only do you have their blessing, but the delivery of the opt-in message indicates that they're getting your mail. The ever-famous "call to action" of "please add us to your address book" is viable, but I personally never do this when a service asks me to.
Have a schedule on which you send, to whatever the user has agreed (weekly, monthly, etc). If you are not consistent, users will forget registering with you and the mail will seem random. Also, allow them to change the frequency as the next link right next to "unsubscribe" - many people unsubscribe because they are annoyed by too-frequent messages.
To avoid being considered as a SPAM, the easiest and most powerful method, is to not send a SPAM.
Check this to understand what is a SPAM and avoid sending one: http://en.wikipedia.org/wiki/Spam
Jeff Atwood has written quite a good article on this.
From now on, we are using a commercial provider. There are several on the market (like critsend, which is among the more inexpensive ones) which also do bounce handling and make sure that "permanent failures" are not sent out again in the next bulk.

How can I test email generating without creating and checking 1,000 addresses?

I am working on generating emails using Java. I would love to use some sort of system where I can send out mass emails and ensure that the emails were received by the intended recipient (A. My code to send the emails worked and B.The emails were not marked as spam). How can I do this without setting up (and keeping track of) a couple hundred email addresses and then checking each one individually?
Thanks!
Jeff Atwood's latest blog post talks about several things you need to do when sending email to users:
So You'd Like to Send Some Email (Through Code)
Whenever I've needed to test a system like this, I simply use an override property in the code that substitutes my own email address, creating a loopback test.
I don't know how you're going to test the "not marked as spam" thing, without having at least a handful of test accounts at various providers. Every email provider uses different bayesian filters to filter out spam.

Sending Email Broadcasts

I'm working on an application that will allow management to send registered users (opt-in) broadcast emails at regular intervals, or based on various other criteria. In any case, I'm curious as to whether I should send a separate email to each recipient or bcc all of them on a single message. Currently the email list would be about 1500 recipients, but it should scale all the way up to at least 25k without problems.
Thoughts? Am I getting into a range that I need to worry about being put on spam lists?
Yes, I've had spam list problem with mailing lists of that size, managing email lists for non-profits.
One wants to take extra precautions: make sure your email has SPF records, write a script to send the emails in batches, paced out over time. Definitely send them one one at a time, not as bcc, as direct mail has a better chance of arriving. Make it very easy to unsubscribe. Include people's subscribed email in the message sent -- often people have email forwarded to another account and then try to unsubscribe that account and get frustrated.
Even so, don't be surprised if you have to change your IP at some point.
You are getting into that range. This is the point where I would look to get a third party to send the email on my behalf. Let them worry about being marked as spammers, supply the bandwidth, etc.
I recently built an application with those same criteria. We do the emailing in-house, and send one email to each recipient.
Do use domain keys signing or be sure to use SPF records for your domain. We didn't do that at first, and were blacklisted by a number of different ISPs. Fortunately, it is fairly easy to get them to unblock you. Most will include an online form you can fill out or an email address you can use in the server bounce message.
Don't try to implement the actual email sending yourself. That's a huge waste of time. Either outsource the entire process to one of the many reputable vendors out there (Many organizations I deal with use Constant Contact, and it works well), or run a garden-variety mailing list server (e.g. Mailman) in-house.
Either way, take efforts to make it very easy to unsubscribe (good vendors have that covered), to authenticate that messages are from your company, and to show that your company is not spamming. Real mailing list server software supports all of these goals, by adding proper headers that identify the source very clearly and making unsubscription easy. For instance, Gmail will now offer to send unsubscribe requests in response to mailing list messages marked as 'spam', as has AOL for a long time.
Definitely set up SPF and DKIM if you can manage it.
Finally, whatever you do, make sure you keep logs of your subscriptions, so that if someone does accuse you of spamming, you can defend yourself.
The task is mostly uninteresting on a strictly technical level. You should worry about what happens when a recipient thinks that your list's content is spam and starts (a) complaining or (b) flagging the message as spam with one or more anti-spam service providers. Something like this is bound to happen with a list of the size you describe.
If you are prepared and have the time handle such cases, go for it, at least for a start. (Changing your mail server's IP address as Devin Ceartas suggests won't be of much use by the way.)
If you want to build your own thing, I have two pieces of advice:
Unsubscribing has to be easy, no more than one or two clicks. Using Mailman or any other mailing list manager that was intended for discussion mailing lists is asking for trouble.
BCCing the same message to 1500 (or 25k) recipients may take some load off your mail server, but it has one serious disadvantage: You won't be able to use VERP in order to determine if all addresses that have once been subscribed to your list are still valid. (Large mail providers tend to classify messages as spam if there are delivery attempts to many invalid addresses.)

Guidelines for accepting email messages as input to application

A number of applications have the handy feature of allowing users to respond to notification emails from the application. The responses are slurped back into the application.
For example, if you were building a customer support system the email would likely contain some token to link the response back to the correct service ticket.
What are some guidelines, hints and tips for implementing this type of system? What are some potential pitfalls to be aware of? Hopefully those who have implemented systems like this can share their wisdom.
Some guidelines and considerations:
The address question: The best thing to do is to use the "+" extension part of an email (myaddr**+custom**#gmail.com) address. This makes it easier to route, but most of all, easier to keep track of the address routing to your system. Other techniques might use a token in the subject
Spam: Do spam processing outside the app, and have the app filter based on a header.
Queuing failed messages: Don't, for the most part. The standard email behavior is to try for up to 3 days to deliver a message. For an application email server, all this does is create giant spool files of mail you'll most likely never process. Only queue messages if the failure reasons are out of your control (e.g., server is down).
Invalid message handling: There are a multiple of ways a message can be invalid. Some are limitations of the library (it can't parse the address, even though its an RFC valid one). Others are because of broken clients (e.g., omitting quotes around certain headers). Other's might be too large, or use an unknown encoding, be missing critical headers, have multiple values where there should only be one, violate some semantic specific to your application, etc, etc, etc. Basically, where ever the Java mail API could throw an exception is an error handling case you must determine how to appropriately handle.
Error responses: Not every error deserves a response. Some are generated because of spam, and you should avoid sending messages back to those addresses. Others are from automated systems (yourself, a vacation responder, another application mail system, etc), and if you reply, it'll send you another message, repeating the cycle.
Client-specific hacks: like above, each client has little differences that'll complicate your code. Keep this in mind anytime you traverse the structure of a message.
Senders, replies, and loops: Depending on your situation, you might receive mail from some of the following sources:
Real people, maybe from external sources
Mailing lists
Yourself, or one of your own recipient addresses
Other mail servers (bounces, failures, etc)
Entity in another system (my-ldap-group#company.com, system-monitor#localhost)
An automated system
An alias to one of the above
An alias to an alias
Now, your first instinct is probably "Only accept mail from correct sources!", but that'll cause you lots of headaches down the line because people will send the damnedest things to an application mail server. I find its better to accept everything and explicitly deny the exceptions.
Debugging: Save a copy of the headers of any message you receive. This will help out tremendously anytime you have a problem.
--Edit--
I bought the book, Building Scalable Web Sites, mentioned by rossfabricant. It -does- have a good email section. A couple of important points it has are about handling email from wireless carriers and authentication of emails.
You can set the address that the email is sent from, what will be put into the To: address if someone just presses 'Reply-to'. Make that unique, and you'll be able to tell where it came from, and to where it must be directed back to.
When it comes to putting a name beside it though '"something here" ' - put something inviting to have them just reply to the mail. I've seen one major web-app, with Email capturing that has 'do not reply', which turns people off from actually sending anything to it though.
Building Scalable Web sites has a nice section on handling email. It's written by a Flickr developer.
(source: lsl.com.au)
EDIT: I misunderstood your question.
You could configure your email server to catch-all, and generate a unique reply-to address. E.g. CST-2343434#example.com.
A polling process on the server could read the inbox and parse out the relevant part from the received email, CS-2343434 could mean Customer Support ticket ID no. 2343434.
I implemented something like this using JavaMail API.
Just a thought.
The best way to achieve this will be to write a window service that acts like a mail client [pop3 or imap]. This windows service should execute a timed action triggered by a timer, which connects to the mail server and polls the server for any unread message(s) available in the email inbox. The email ID to check for is the email ID on which the users will give their input on/to. If the windows service client finds that there exists any new mail(s) then it should download and filter the email body and push further for processing based on the user input in the email. You can host the input processing in the same windows service but it is not advisable to do so. The windows service can put the inputs in a special application directory or database from where your main appication can read the user inputs received in email and process them as needed.
You will be required to develop a high performance TCP/IP client for doing so. I advise you not to use the default .Net library due to performance issues, instead use one of the best availabel open source TCP/IP implementations for .Net like XF.Server from kodart. we have used this in our applications and achieved remarkably grear results.
Hope this helps..
Bose has a pretty great system where they embed a Queue and Ticket ID into the email itself.
My company has the traditional Case # on the subject line, but when CREATING a case, require a specific character string "New Case" "Tech Support Issue" on the subject line to get through the spam filters.
If the email doesn't match the create or update semantics, the autoresponder sends an email back to the recipient demonstrating how to properly send an email, or directs them to our forums or web support site.
It helps eliminate the spam issue, and yet is still accessible to a wide technical audience that is still heavily email dependent.
Spam is going to be a bit of a concern. However since you are initiating the conversation you can use the presence of your unique identifier (I prefer to use the subject line - "Trouble ticket: Unable to log into web...[artf123456]") to filter out spam. Be sure to check the filter on occasion since some folks mangle the subject when replying.
Email is a cesspool of bad standards and broken clients. You need to be prepared to accept almost anything as input. You will need to be very forgiving about what kinds of input are tolerated. Anything easy for you to program will likely be difficult for your users to use correctly. Consider the old mailing list programs that require you to issue commands in the subject line. Only hardcore nerds can use those effectively. And some of those trouble-ticket CRM things you mentioned have bizarre requirements, such as forcing the user to reply between two specific text markers in the text. That sort of thing is confusing to people.
You'll need to deal with email clients that send you formatted text instead of plain text. Some email clients still don't handle HTML properly (cough GMail) so your replies will also need to be designed appropriately. There are various ways in which photos might be "uploaded" via email as well, especially when mobile phones are involved. You will need to implement various hacks and heuristics to deal with these situations.
It's also entirely possible that you will get email that is valid but unusable by the email parsing library you are using. Whether or not this is important enough to roll your own will be a judgement call.
Finally, others have mentioned using specific email addresses to uniquely identify a "conversation". This is probably the easiest way to do this, as the content of the mail will often not survive a round trip to a client. Be prepared, however, to get mail to old IDs from old customers who, instead of opening a new ticket somehow, reply to an old ticket. Your application will probably need some way to push emails with an old ID into a new case, either manually or automatically. For a CRM system it's very likely that a user would reply to an old email even if you already sent him a new email with a new ID in it. As for whether you should use some.email.address+some.id#yourdomain.com or just some.id#yourdomain.com, I'd go with the latter because the plus-sign confuses some email clients. Make your IDs guids or something and have some way to validate them (such as a CRC or something) and you'll get less junk. Humans should never have to type in the GUIDs, just reply to them. The downside is spam filtering: a user's computer might view such email addresses as spam, and there wouldn't be an easy way to whitelist the addresses.
Which reminds me: sending email these days is full of pitfalls. There are many anti-spam technologies which make it extremely hard for you to send email to your customers. You will need to research all of these and you need to be careful, and do some testing, to ensure that you can reach the major email providers. A website like Campaign Monitor
can help you if you are sending email.