I have two users, Joe and Jill, on an email where they are both recipients in the to field. Using IMAP and their respective user oauth tokens, I poll GMail and get this email. What fields will be unique to their user and what won't, e.g. will the email's message_uid be different for both of them or the same?
The message bodies themselves will be the same. The UID and ID will likely be different, but not necessarily. The message headers will be the same, except for possibly the Received headers.
There is completely reliable way of detecting this. The usual way of dealing with this is to work on unique Message-Id header which is supposed to be unique, but in real world, you will sooner or later came across a pair of messages which are vastly different, yet share the same Message-Id.
Tricks like hashing the raw, undecoded message body tend to work as well (or at least until you hit a user who happens to use a mail server which changes the MIME encoding of such a message).
Related
Like many web apps, we use Postmark to send all notifications for server side events. Many of our events are grouped and related by something simple and logical (think multiple replies to the same issue, like in GitHub).
Right now, every email sent for these related events is it's own email thread. My question is: how do I send these emails so that related ones get pushed into the same thread?
I'm not sure if this is something at the Postmark level (like include a previous message ID) or if this is something I do with SMTP (like I should format my subject better and inline previous responses), so that's why I'm seeking guidance. Also, every Google search about: "Postmark email threads" returns concerns over the thread safety of the Ruby Gem.
For more information, the app is written in PHP and right now we are znarkus/postmark-php for sending emails and jjaffeux/postmark-inbound-php for parsing inbound ones. However, I am more than willing to add any extra packages if they help me in my quest.
Thanks in advance!
You can add a few SMTP headers with the original Message-ID that most clients use to link together replies. If the original email had a Message-ID header of <123#mail.example.com> the new email you send out should keep the subject line the same and add headers of:
In-Reply-To: <123#mail.example.com>
References: <123#mail.example.com>
And that should inform clients that the two emails should be threaded.
Edit:
The value for these headers should be the SMTP Message-ID header, which is slightly confusing because it is a separate concept from the Postmark MessageID value, which is just a UUID for the email.
The SMTP Message-ID header is always in the form an email address, because that's how it's supposed to be formed, but doesn't have to be related to the from address.
Story is simple: one user creates a new discussion, and system sends out email notification to other users about that. When these users reply to a notification, their replies should be properly routed as comments to the particular discussion.
When system sends out email notification, it includes routing code in the subject. For example, subject of a notification may look like this: 'Discussion "Lets Talk" has been started {123}'. Since all email clients use Re: ORIGINAL SUBJECT we get {123} back as part of the subject, parse it and know where to put the comment.
We have this working already (had it for years actually), but current implementation looks a bit dirty (especially when codes become longer) so we would like to explore alternatives if there are any. Is there are a more elegant way to approach this, that works reliably across most email clients? Email header that we might be missing? Something similar?
Thanks so much
Since you didn't mention it I'm not sure if you looked into this:
There is a field in the email header called In-Reply-To which should contain the message id(s) of the email(s) that mail is replying to and one name References which should specify a thread this mail belongs to:
"In-Reply-To:" field may be used to identify the message (or
messages) to which the new message is a reply, while the
"References:" field may be used to identify a "thread" of
conversation.
According to the rfc the In-Reply-To field should contain the "parent"-message's Message-Id while the References field will quote the parent-message's References field.
The problem with this fields is that there is no guarantee that there is something useful in them because they are not required to be filled correctly for mail delivery so some mail clients might not fill them correctly or maybe not even at all.
I found this article about building a threading algorithm using the In-Reply-To field and claiming to be robust against garbage and malicious input in these fields.
I'm making a webstore with integrated customer service. Every few minutes, the system will retrieve emails into the database, parsing headers and relating the message to customers and orders.
Customer threading is fairly reliable via the messages From: header. But what about orders? It seems most people use the Reply-To: header for threading orders ...
From: <orders#company.com>
To: <person#place.com>
Subject: Company Order #314159
Reply-To: <order-314159#company.com>
But a messy Reply-to: obscures and uglifies things, and probably flags spam sensors or something. I definitely don't want to count on the Subject: field, people modify the subject all the time, even when replying. There are other headers that seem suited to the job, like ...
From: <orders#company.com>
To: <person#place.com>
Subject: Company Order #314159
Message-ID: <314159-2>
... or ...
In-Reply-To: <314159-1>
But are these sent back when the person reply? Are there any headers (other than Reply-To:) that are reliably copied into replies and forwards?
You can't rely entirely on headers being preserved. When replying or forwarding, a mail client creates a new message; that mail client can quite legitimately ignore or alter any content, as deemed appropriate.
You might be able to track by the following means, but all are vulnerable to alteration (mainly by the user, but also by a primitive mail client). You should really just use them to make a best-guess.
A disposable Reply-To address. Theoretically, you can also do it with "From" if you want, but Reply-To is better to ensure the user (and their mail server/client) recognise it's from you and act appropriately. I see no reason why a spam filter would care about disposable addresses. Seeing as most spam uses fake addresses anyway and doesn't care about replies, it is not really a spammer trick. It's unlikely to cause a substantial increase in spam filtering. Using a Reply-To for the same domain as your From address is also unlikely to look suspicious.
A unique subject. Yes, it can easily be changed, but usually the existing subject is appended to, rather than deleted (especially if it obviously contains some kind of reference number). You could apply a regular expression match - maybe only using it as a confirmation of your other detection methods.
A unique string in the body (possibly preceded by the words "DO NOT REMOVE THIS LINE")
The In-Reply-To and Reference headers are probably fine, when supported. There is a small chance that a user will copy their reply into a new blank message and trash the headers anyway.
Reply-To is sadly not entirely reliable. All replies should have References: which is better standardized than In-Reply-To: which is not easily machine readable.
Your best bet may be to set the envelope header to a unique identifier, perhaps with a From: and Sender: combo that directs replies to the right place but displays nicely.
See also tangentially Dan Bernstein's notes; http://cr.yp.to/immhf.html and in particular http://cr.yp.to/immhf/thread.html
I don't think you can count on anything when it comes to forwards.
Although you have already received some answers, however, we had a similar situation where we supposed to send emails to customers and read them back and associated them with various activities.
During the research the the only HEADER we found that does not get replaced or Removed by the various email clients (Outook, Yahoo, Gmail etc.) was "XREF".
We have thoroughly tested it and it has been working since we first introduced it.
Am trying to determine the best way to persist information from an originating email, through to a reply back.
Essentially, it is to pass a GUID from the original email (c#), whereby when the receiver replies back, that GUID is also sent back for reference.
I have tried setting the MessageID, whereby using Outlook, the In-Reply-To value is set with the original ID, however using some webclient email systems, that value is not created on reply. Is there another way to sent this info through email headers?
Some variation on VERP is probably the most reliable...
http://en.wikipedia.org/wiki/Variable_envelope_return_path
Specifically, instead of having all your replies coming to the same address, encode the information you want to persist into the From address for the email.
For example, in the case of a helpdesk ticket, you could use something like:
From: Helpdesk <support-ticket-123#example.com>
To: End User <user#example.org>
Subject: Ticket #123 - problem with computer.
That way, regardless of what the user edits in the subject or text, you know what ticket is being referred to by the receiving email address.
I don't think you'll be able to do anything that is perfectly reliable by headers alone -- the number of clients that would have to cooperate is immense.
Most systems that do this work by including something in the body of the email that is sent that allows it to identify the message, and including text instructing the recipient to include that block of text in the response. You could also try including it in the subject (and including text in the body to leave the subject unchanged). That's how some mailing list managers I've seen do it.
I stumbled upon this question, and it's been very informative. This, however, leaves me with one question: Will using VERP, or a variation of editing the 'reply-to' or 'from address', cause the messages to be locked up in spam filters?
I have read that spam senders often change the bounce address to prevent their servers from getting clogged with bad email address bounces. Is it a spam risk to assume this approach?
The most reliable way is to put the ID in the subject, which should be preserved throughout replies.
(It doesn't hurt to tell your users that they should keep the subject intact.)
RT, a popular ticketing system, does this. They use a simple subject format like "[Ticket #123]" and key off of 123.
A number of applications have the handy feature of allowing users to respond to notification emails from the application. The responses are slurped back into the application.
For example, if you were building a customer support system the email would likely contain some token to link the response back to the correct service ticket.
What are some guidelines, hints and tips for implementing this type of system? What are some potential pitfalls to be aware of? Hopefully those who have implemented systems like this can share their wisdom.
Some guidelines and considerations:
The address question: The best thing to do is to use the "+" extension part of an email (myaddr**+custom**#gmail.com) address. This makes it easier to route, but most of all, easier to keep track of the address routing to your system. Other techniques might use a token in the subject
Spam: Do spam processing outside the app, and have the app filter based on a header.
Queuing failed messages: Don't, for the most part. The standard email behavior is to try for up to 3 days to deliver a message. For an application email server, all this does is create giant spool files of mail you'll most likely never process. Only queue messages if the failure reasons are out of your control (e.g., server is down).
Invalid message handling: There are a multiple of ways a message can be invalid. Some are limitations of the library (it can't parse the address, even though its an RFC valid one). Others are because of broken clients (e.g., omitting quotes around certain headers). Other's might be too large, or use an unknown encoding, be missing critical headers, have multiple values where there should only be one, violate some semantic specific to your application, etc, etc, etc. Basically, where ever the Java mail API could throw an exception is an error handling case you must determine how to appropriately handle.
Error responses: Not every error deserves a response. Some are generated because of spam, and you should avoid sending messages back to those addresses. Others are from automated systems (yourself, a vacation responder, another application mail system, etc), and if you reply, it'll send you another message, repeating the cycle.
Client-specific hacks: like above, each client has little differences that'll complicate your code. Keep this in mind anytime you traverse the structure of a message.
Senders, replies, and loops: Depending on your situation, you might receive mail from some of the following sources:
Real people, maybe from external sources
Mailing lists
Yourself, or one of your own recipient addresses
Other mail servers (bounces, failures, etc)
Entity in another system (my-ldap-group#company.com, system-monitor#localhost)
An automated system
An alias to one of the above
An alias to an alias
Now, your first instinct is probably "Only accept mail from correct sources!", but that'll cause you lots of headaches down the line because people will send the damnedest things to an application mail server. I find its better to accept everything and explicitly deny the exceptions.
Debugging: Save a copy of the headers of any message you receive. This will help out tremendously anytime you have a problem.
--Edit--
I bought the book, Building Scalable Web Sites, mentioned by rossfabricant. It -does- have a good email section. A couple of important points it has are about handling email from wireless carriers and authentication of emails.
You can set the address that the email is sent from, what will be put into the To: address if someone just presses 'Reply-to'. Make that unique, and you'll be able to tell where it came from, and to where it must be directed back to.
When it comes to putting a name beside it though '"something here" ' - put something inviting to have them just reply to the mail. I've seen one major web-app, with Email capturing that has 'do not reply', which turns people off from actually sending anything to it though.
Building Scalable Web sites has a nice section on handling email. It's written by a Flickr developer.
(source: lsl.com.au)
EDIT: I misunderstood your question.
You could configure your email server to catch-all, and generate a unique reply-to address. E.g. CST-2343434#example.com.
A polling process on the server could read the inbox and parse out the relevant part from the received email, CS-2343434 could mean Customer Support ticket ID no. 2343434.
I implemented something like this using JavaMail API.
Just a thought.
The best way to achieve this will be to write a window service that acts like a mail client [pop3 or imap]. This windows service should execute a timed action triggered by a timer, which connects to the mail server and polls the server for any unread message(s) available in the email inbox. The email ID to check for is the email ID on which the users will give their input on/to. If the windows service client finds that there exists any new mail(s) then it should download and filter the email body and push further for processing based on the user input in the email. You can host the input processing in the same windows service but it is not advisable to do so. The windows service can put the inputs in a special application directory or database from where your main appication can read the user inputs received in email and process them as needed.
You will be required to develop a high performance TCP/IP client for doing so. I advise you not to use the default .Net library due to performance issues, instead use one of the best availabel open source TCP/IP implementations for .Net like XF.Server from kodart. we have used this in our applications and achieved remarkably grear results.
Hope this helps..
Bose has a pretty great system where they embed a Queue and Ticket ID into the email itself.
My company has the traditional Case # on the subject line, but when CREATING a case, require a specific character string "New Case" "Tech Support Issue" on the subject line to get through the spam filters.
If the email doesn't match the create or update semantics, the autoresponder sends an email back to the recipient demonstrating how to properly send an email, or directs them to our forums or web support site.
It helps eliminate the spam issue, and yet is still accessible to a wide technical audience that is still heavily email dependent.
Spam is going to be a bit of a concern. However since you are initiating the conversation you can use the presence of your unique identifier (I prefer to use the subject line - "Trouble ticket: Unable to log into web...[artf123456]") to filter out spam. Be sure to check the filter on occasion since some folks mangle the subject when replying.
Email is a cesspool of bad standards and broken clients. You need to be prepared to accept almost anything as input. You will need to be very forgiving about what kinds of input are tolerated. Anything easy for you to program will likely be difficult for your users to use correctly. Consider the old mailing list programs that require you to issue commands in the subject line. Only hardcore nerds can use those effectively. And some of those trouble-ticket CRM things you mentioned have bizarre requirements, such as forcing the user to reply between two specific text markers in the text. That sort of thing is confusing to people.
You'll need to deal with email clients that send you formatted text instead of plain text. Some email clients still don't handle HTML properly (cough GMail) so your replies will also need to be designed appropriately. There are various ways in which photos might be "uploaded" via email as well, especially when mobile phones are involved. You will need to implement various hacks and heuristics to deal with these situations.
It's also entirely possible that you will get email that is valid but unusable by the email parsing library you are using. Whether or not this is important enough to roll your own will be a judgement call.
Finally, others have mentioned using specific email addresses to uniquely identify a "conversation". This is probably the easiest way to do this, as the content of the mail will often not survive a round trip to a client. Be prepared, however, to get mail to old IDs from old customers who, instead of opening a new ticket somehow, reply to an old ticket. Your application will probably need some way to push emails with an old ID into a new case, either manually or automatically. For a CRM system it's very likely that a user would reply to an old email even if you already sent him a new email with a new ID in it. As for whether you should use some.email.address+some.id#yourdomain.com or just some.id#yourdomain.com, I'd go with the latter because the plus-sign confuses some email clients. Make your IDs guids or something and have some way to validate them (such as a CRC or something) and you'll get less junk. Humans should never have to type in the GUIDs, just reply to them. The downside is spam filtering: a user's computer might view such email addresses as spam, and there wouldn't be an easy way to whitelist the addresses.
Which reminds me: sending email these days is full of pitfalls. There are many anti-spam technologies which make it extremely hard for you to send email to your customers. You will need to research all of these and you need to be careful, and do some testing, to ensure that you can reach the major email providers. A website like Campaign Monitor
can help you if you are sending email.