Idempotency with side effects - email

Imagine I have a message queue onto which I place messages saying that I want to send a user an email.
If I’m using a message broker that provides “at least once” delivery guarantees, then every resource I’ve read says “you must make sure your processing is idempotent”.
However, in the case of a side effect like sending an email, I don’t see how this is possible.
I have two possible choices:
Store the message ID in my database, then send the email.
Send the email, then store the message ID in the database.
When a message is received, I would then check the database to see if the message ID exists. If it does, I would skip the message as a duplicate.
However, the first case leaves me with “at most once” semantics on sending my email (if sending the email fails, it will be skipped next time the message is seen), and the second case gives me “at least once” semantics (if storing the ID in the database fails, I’ll end up sending multiple emails).
Some things I have read say “you need an email API that supports idempotency”, but as far as I can tell that just pushes the problem on to their servers - they still have the same dilemma.
Am I missing something here? Or is it just not possible to have idempotent message processing when that processing has external side effects?

I am going to claim that email inherently does not support exactly-once semantics. When SMTP is used, your side can always crash between the moment the far side confirms and the moment you persist the confirmation. Your best bet is to use at-least-once semantics.
But you may still get an exactly-once user experience. If you use the same Message-ID every time you send or resend a message, then the client side (the mail client) may deduplicate those messages. See https://en.wikipedia.org/wiki/Message-ID for details.
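A minimal Python sketch of the stable Message-ID idea, assuming the queue message carries its own identifier. The namespace, the domain `mail.example.com`, the sender address, and the function name `build_email` are made up for illustration, and whether a given mail client actually deduplicates on Message-ID is not guaranteed.

```python
import uuid
from email.message import EmailMessage

# Hypothetical namespace and domain, chosen only for this example.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "mail.example.com")


def build_email(queue_message_id: str, to_addr: str, subject: str, body: str) -> EmailMessage:
    """Build an email whose Message-ID depends only on the queue message ID."""
    msg = EmailMessage()
    msg["From"] = "noreply@example.com"  # assumed sender address
    msg["To"] = to_addr
    msg["Subject"] = subject
    # uuid5 is deterministic: the same queue_message_id always gives the same value,
    # so a redelivered queue message produces an email with an identical Message-ID.
    msg["Message-ID"] = f"<{uuid.uuid5(NAMESPACE, queue_message_id)}@mail.example.com>"
    msg.set_content(body)
    return msg


first = build_email("queue-msg-42", "user@example.com", "Welcome", "Hello!")
retry = build_email("queue-msg-42", "user@example.com", "Welcome", "Hello!")
print(first["Message-ID"] == retry["Message-ID"])  # True
```

The processing itself stays at-least-once (send, then record the queue message ID); the fixed Message-ID only gives receiving software a chance to collapse the duplicates.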

Related

How to process read receipt vs delivery receipt vs bounceback (JavaMail)

We have a requirement coming in to try to, as best we can, determine the progress an email made to the user. We know it's not 100%, and the solution I'm advising is to use an image/watermark in the email that is loaded from a URL that would record that the image was read...BUT there's a fair chance that they're going to rely on read/delivery receipts and bouncebacks. So I wanted to learn more about it, both to be ready and so I can argue against it in the meeting.
Suppose we set up a mailbox to receive bouncebacks, read receipts, and delivery receipts, and then write a Java program to poll that mailbox, fetch the messages, and inspect them. How could I tell the bounces from the read receipts from the delivery receipts from the spam? I know that the read and delivery receipt REQUESTS are SMTP headers. Do the returning messages have a header that tells which they are? And do the bouncebacks? And if so, what are they? If not, am I parsing the message body? Does that differ from server to server? Is there any standard (or close to standard) thing in it? Like the word 'undeliverable' is always there?
I tried to google, but all the hits I could get were about REQUESTING the receipts.
How could I tell the bounces from the read receipts from the delivery receipts from the spam?
Jakarta Mail and JavaMail provide the com.sun.mail:dsn artifact, which supports parsing and creating messages that contain Delivery Status Notifications.
Its classes have constructors that can parse delivery notifications, but they may not be able to parse every read receipt.
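Independent of the library, standards-compliant reports can be told apart by their Content-Type: delivery status notifications are `multipart/report` with `report-type=delivery-status`, and read receipts use `report-type=disposition-notification`. The sketch below uses Python's standard `email` package rather than the com.sun.mail:dsn classes; the function name `classify`, its return labels, and the sample message are assumptions for illustration. Servers that send non-standard bounces would still fall into "other".

```python
import email
from email.message import Message


def classify(msg: Message) -> str:
    """Return 'bounce', 'delivery-receipt', 'read-receipt' or 'other'."""
    if msg.get_content_type() != "multipart/report":
        return "other"  # ordinary mail, spam, or a non-standard bounce
    report_type = str(msg.get_param("report-type", "")).lower()
    if report_type == "disposition-notification":
        return "read-receipt"
    if report_type == "delivery-status":
        for part in msg.walk():
            if part.get_content_type() == "message/delivery-status" and part.is_multipart():
                # Python parses each header block of the status part as a sub-message.
                for block in part.get_payload():
                    action = str(block.get("Action", "")).strip().lower()
                    if action == "failed":
                        return "bounce"
                    if action in ("delivered", "relayed", "expanded"):
                        return "delivery-receipt"
    return "other"  # e.g. a "delayed" notice or an unrecognized report


SAMPLE_BOUNCE = """\
From: MAILER-DAEMON@mx.example.com
To: bounces@example.com
Subject: Undelivered Mail Returned to Sender
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status; boundary="BOUND"

--BOUND
Content-Type: text/plain

Delivery failed.

--BOUND
Content-Type: message/delivery-status

Reporting-MTA: dns; mx.example.com

Final-Recipient: rfc822; nobody@example.net
Action: failed
Status: 5.1.1

--BOUND--
"""

print(classify(email.message_from_string(SAMPLE_BOUNCE)))
```

In the polling program, each fetched message would be passed through a function like this before any body parsing is attempted.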

Ensure at-most-once semantic with SendGrid Mail API

I have an [Azure Storage] queue where I put e-mail messages to be sent. Then, there is a separate service which monitors that queue and sends e-mails using some service. In this particular case I'm using SendGrid.
So, theoretically, if the sender crashes right after a successful call to SendGrid Mail Send API (https://sendgrid.com/docs/API_Reference/Web_API_v3/Mail/index.html), the message will be returned to the queue and retried later. This may result in the same e-mail being delivered more than once, which could be really annoying for some types of e-mail.
The normal way to avoid this situation would be to provide some sort of idempotency key to Send API. Then the side being called can make sure the operation is performed at most once.
After carefully reading the SendGrid documentation and googling, I could not find any way to achieve what I'm looking for here (at-most-once semantics). Any ideas?
Without support for an idempotency key in the API itself, your options are limited, I think.
You could modify your email sending service to dequeue and commit before calling the Send API. That way, if the service fails while sending, the message will not be retried, because it has already been removed from the queue; it will be sent at most once.
Additionally, you could implement some limited retries on particular HTTP responses (e.g. 429 & 5xx) from SendGrid where you are certain the message was not sent and retrying might be useful - this would maintain "at most once" whilst lowering the failure rate. Probably this should include some backoff time between each attempt.
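The dequeue-first approach with limited retries could be sketched in Python as follows. `delete_from_queue` and `send` are hypothetical callables standing in for the queue client and the mail API call (`send` is assumed to return the HTTP status code), and the set of retryable codes simply follows the suggestion above; whether a given status really means "not sent" has to be checked against the provider's behaviour.

```python
import time
from typing import Callable

# Status codes treated as "definitely not sent, worth retrying" (an assumption).
RETRYABLE = {429, 500, 502, 503, 504}


def send_at_most_once(
    delete_from_queue: Callable[[], None],
    send: Callable[[], int],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Delete the queue message first, then try to send with exponential backoff."""
    # Removing the message before sending means a crash after this point
    # loses the email instead of duplicating it.
    delete_from_queue()
    for attempt in range(max_attempts):
        status = send()
        if 200 <= status < 300:
            return True
        if status not in RETRYABLE:
            return False  # unknown outcome or permanent error: do not retry
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    return False


# Example with fake collaborators: two transient failures, then success.
responses = iter([503, 429, 202])
delays = []
ok = send_at_most_once(lambda: None, lambda: next(responses), sleep=delays.append)
print(ok, delays)  # True [1.0, 2.0]
```

A network timeout is deliberately not retried here, since the request may already have been accepted on the other side.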

Sending emails in web applications

I'm looking for some opinions here, I'm building a web application which has the fairly standard functionality of:
Register for an account by filling out a form and submitting it.
Receive an email with a confirmation code link
Click the link to confirm the new account and log in
When you send emails from your web application, it's often (usually) the case that there will be some change to the persistence layer. For example:
A new user registers for an account on your site - the new user is created in the database and an email is sent to them with a confirmation link
A user assigns a bug or issue to someone else - the issue is updated and email notifications are sent.
How you send these emails can be critical to the success of your application. How you send them depends on how important it is that the intended recipient receives the email.
We'll look at the following four strategies in relation to the case where the mail server is down, using example 1.
TRANSACTIONAL & SYNCHRONOUS
The sending of the email fails and the user is shown an error message saying that their account could not be created. The application will appear to be slow and unresponsive as the application waits for the connection timeout. The account is not created in the database because the transaction is rolled back.
TRANSACTIONAL & ASYNCHRONOUS
The transactional definition here refers to sending the email to a JMS queue or saving it in a database table for another background process to pick up and send.
The user account is created in the database, the email is sent to a JMS queue for processing later. The transaction is successful and committed. The user is shown a message saying that their account was created and to check their email for a confirmation link. It's possible in this case that the email is never sent due to some other error, even though the user has been told that it was sent to them. There may be some delay in getting the email sent to the user if application support has to be called in to diagnose the email problem.
NON-TRANSACTIONAL & SYNCHRONOUS
The user is created in the database, but the application gets a timeout error when it tries to send the email with the confirmation link. The user is shown an error message saying that there was an error. The application is slow and unresponsive as it waits for the connection timeout.
When the mail server comes back to life and the user tries to register again, they are told their account already exists but has not been confirmed and are given the option of having the email re-sent to them.
NON-TRANSACTIONAL & ASYNCHRONOUS
The only difference between this and transactional & asynchronous is that if there is an error sending the email to the JMS queue or saving it in the database, the user account is still created but the email is never sent until the user attempts to register again.
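The "transactional & asynchronous" variant that saves the email in a database table could look roughly like the Python sketch below, using SQLite in place of the real database. The table names `users` and `outbox`, the template name, and the `send` callable are all assumptions made for the example.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(
        """
        CREATE TABLE users (email TEXT PRIMARY KEY, confirmed INTEGER NOT NULL DEFAULT 0);
        CREATE TABLE outbox (
            id INTEGER PRIMARY KEY,
            recipient TEXT NOT NULL,
            template TEXT NOT NULL,
            sent INTEGER NOT NULL DEFAULT 0
        );
        """
    )


def register_user(conn: sqlite3.Connection, email: str) -> None:
    # "with conn" commits on success and rolls back on any exception, so the
    # user row and the queued email are written together or not at all.
    with conn:
        conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        conn.execute(
            "INSERT INTO outbox (recipient, template) VALUES (?, ?)",
            (email, "confirm-account"),
        )


def drain_outbox(conn: sqlite3.Connection, send) -> int:
    """Background worker step: try every pending email, mark the ones that went out."""
    rows = conn.execute(
        "SELECT id, recipient, template FROM outbox WHERE sent = 0"
    ).fetchall()
    sent = 0
    for row_id, recipient, template in rows:
        try:
            send(recipient, template)  # hypothetical mail-sending callable
        except Exception:
            continue  # mail server down: leave the row pending for the next run
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
        sent += 1
    return sent


conn = sqlite3.connect(":memory:")
create_schema(conn)
register_user(conn, "new.user@example.com")
print(drain_outbox(conn, lambda recipient, template: None))  # 1
```

With this shape the web request never waits on the mail server, and a mail outage only delays the confirmation email until a later worker run.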
What I'd like to know is what have other people done here? Can you recommend any other solutions other than the 4 I've mentioned above? What's a reasonable way of approaching this problem? I don't want to over-engineer a system that's dealing with the (hopefully) rare situation where my mail server goes down!
The simplest thing to do is to code it synchronously, but are there any other pitfalls to this approach? I guess I'm wondering if there's a best practice, I couldn't find much out there by googling.
My 2 cents:
Once you have a user sign up, never roll back the registration if sending the E-Mail fails. For simple business reasons: They may not come back or re-register if it doesn't work out on the first try. Rather tolerate an incomplete registration and nag the user to confirm their E-Mail address as soon as possible.
In most cases when sending an E-Mail goes wrong, your app will not get immediate feedback anyway - non-existent E-Mail addresses on valid servers will send back an "undeliverable" message with some delay; if the mail gets eaten by a spam filter, you'll get no feedback at all; in other scenarios, it may take several minutes (greylisting) to several days (mail server temporarily down) for an E-Mail to get delivered. A synchronous approach waiting for the delivery of the mail is therefore doomed IMO. Even an immediate failure (because the user entered an obviously fake address) should never result in the registration getting rolled back.
What I would do is, make account creation as easy as possible, allow the user access to the account before it is confirmed, and then nag the hell out of them to confirm their E-Mail (if necessary, limit access to certain areas until confirmation). I would prevent the creation of a second account with the same E-Mail, though, to prevent clutter.
Make sure you allow changing the E-Mail address even if the previous address hasn't been confirmed yet, and enable the user to re-request the confirmation message to a different address.

How should one handle sending xmpp welcome messages when users subscribe to bot (in general)

As the title says, I would like to send a welcome message when a user subscribes to a bot.
However, as I understand it, presence subscribe stanzas should not contain a from-JID that includes a resource (and my testing with Adium indicates that this is the case). That is, a welcome message could easily be sent to the bare JID, but is that really the right way to do it? It feels like it should be sent to the actual instance where the subscription originated.
Perhaps I'm seeing a problem where there is none? If not, any ideas on how to solve it?
Do not fear sending a message to a bare JID. Almost all the time this is what you want. The user may already have a fantastic system in place using priority to get the answer at the right device, like a BlackBerry, their home Jabber client, the one at work, and so on. Heck, they may have sent the request from their BlackBerry that has a 0 priority, and they want to get the answer back at their desk.
Just send a message stanza with a type of headline, since you don't want them to reply to the notice.
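As a rough illustration, the stanza could be built like this in Python with the standard XML library; the helper names and the JIDs are invented for the example, and a real bot would normally let its XMPP library construct and send the stanza.

```python
import xml.etree.ElementTree as ET


def bare_jid(jid: str) -> str:
    """Drop the resource part: 'user@host/resource' -> 'user@host'."""
    return jid.split("/", 1)[0]


def welcome_stanza(bot_jid: str, subscriber_jid: str, text: str) -> str:
    """Build a headline message addressed to the subscriber's bare JID."""
    message = ET.Element(
        "message",
        {"from": bot_jid, "to": bare_jid(subscriber_jid), "type": "headline"},
    )
    ET.SubElement(message, "body").text = text
    return ET.tostring(message, encoding="unicode")


print(welcome_stanza("bot@example.com", "alice@example.org/phone", "Welcome!"))
```

Passing the address through `bare_jid` makes the function safe to call whether or not the incoming JID happened to carry a resource.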
The things said about messages are all right.
If you are worried about which address to send presence subscribe stanzas to, I doubt you even know the resource at that point. IIRC, resources are stripped off before presence subscribes are forwarded, and I assume that you are responding to them. Furthermore, the bot wants to be informed about all presences, so subscribing to the bare JID is the right thing to do.

Email Receipt Assurance

Our clients sometimes don't get the emails that we send out. It's a BIG loss. How do I make sure that they receive the emails, so that if one is not received at the other end, the program can resend it or do something about it?
None of the suggestions above will work 100% of the time. Many email clients will (rightly so) refuse to load foreign images, negating the usefulness of "web bugs". They will also refuse (or be unable to) return Outlook-style "receipts". And many mail servers either deliberately (to curb spam) or mistakenly (due to misconfiguration) won't return bounce messages. Or possibly an over-aggressive spam filter ate your message, so it arrived but was never seen by the end user. Plus there is the little matter of mail taking hours or days to reach the end user or bounce, and how do you correlate these late notifications or bounces with the mail you sent 4 days ago?
So basically, you can catch some but not all, no matter what you do. I'd say that any design that relies on being able to know with certainty whether the end user got your mail is fatally flawed.
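One thing that you can do is set up a bounceback address that receives any mail that is undeliverable. Bounces are sent to the envelope sender (the address that ends up in Return-Path), which many mail libraries take from the From address by default, so use the bounceback address there -- you may want a different one for Reply-To so that replies get directed properly.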
Check the bounceback mailbox daily and contact customers to get updated email addresses for the ones that fail. You may be able to automate a couple of retries to failed addresses before resorting to the manual contact in case the failure is only intermittent.
This would take some code outside your application that scans the mailbox and keeps some state information about the number of contacts, etc. and attempts the resend.
Depending on how you generate the mails, you might be able to make this process easier: generate a unique bounce address for every single email you send out. You could use bounces+1234@example.com, for example.
Many SMTP servers will allow you to use the part after the + as a parameter to an external script, etc.
The problem is that many (broken) SMTP servers don't return enough info with a bounce to identify the original message -- sometimes, when there are forwardings involved, you don't even get back the original addressee...
With the above trick you can reliably correlate outgoing messages with incoming bounces.
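A small Python sketch of the per-message bounce address, assuming each outgoing mail has a numeric ID in your own database; the function names, the `bounces` local part, and `example.com` are placeholders.

```python
from typing import Optional


def bounce_address(message_id: int, local: str = "bounces", domain: str = "example.com") -> str:
    """Encode the outgoing message's ID after the '+' of the bounce address."""
    return f"{local}+{message_id}@{domain}"


def message_id_from_bounce(address: str) -> Optional[int]:
    """Recover the ID from the address a bounce was delivered to, if present."""
    local_part, _, _domain = address.partition("@")
    _prefix, plus, tag = local_part.partition("+")
    return int(tag) if plus and tag.isdigit() else None


addr = bounce_address(1234)
print(addr, message_id_from_bounce(addr))  # bounces+1234@example.com 1234
```

Because the ID comes from the recipient address of the bounce itself, the correlation does not depend on what the remote server chose to include in the bounce body.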
There is no standard way to know whether the email reached the destination. Many email clients support different types of receipts though. You can use any of those if you want.
There are some ways to know when the user actually read the email.
There are many techniques like adding an image to your email that is to be fetched from your web server. When the user reads the email, the request for the image comes to your server and you can capture the event.
The problem is that there is no way to know that the mail did not reach the destination.
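The image technique can be sketched in Python as below. The base URL `https://example.com/open`, the function name, and the token scheme are assumptions; the web server that records hits on that URL is not shown, and clients that block remote images will never trigger it.

```python
import html
import secrets
from email.message import EmailMessage


def build_tracked_email(to_addr: str, subject: str, text: str,
                        base_url: str = "https://example.com/open"):
    """Return (message, token); store the token to match later image requests."""
    token = secrets.token_urlsafe(16)  # unguessable per-message identifier
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = subject
    msg.set_content(text)  # plain-text part for clients that do not render HTML
    msg.add_alternative(
        f"<p>{html.escape(text)}</p>"
        f'<img src="{base_url}/{token}.gif" width="1" height="1" alt="">',
        subtype="html",
    )
    return msg, token


message, token = build_tracked_email("user@example.com", "Your invoice", "Please find your invoice attached.")
print(token in message.get_body(preferencelist=("html",)).get_content())  # True
```

A hit on the URL shows the mail was opened with images enabled; the absence of a hit says nothing either way.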
I worked on a bulk email system in a previous life. Deliverability was one of our major issues. The most common cause of undelivered emails is a spam filter.
Here are the steps we took to ensure the highest delivery rates:
We used Return Path to test emails for that spam-like smell.
If you send a lot of emails, you need to make sure your SMTP server is not blacklisted.
Remind your users to add your FROM address to their "safe senders" list.
Use a system that collects bouncebacks and use them to scrub your mailing list. This will also help keep you off the blacklists.
If the emails are critical, consider sending them return-receipt-requested. This will not really guarantee anything, but it might give you some metrics on actual deliverability.
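For the last point, a read receipt is usually requested with the `Disposition-Notification-To` header, and a delivery report can be requested with the SMTP `NOTIFY` option where the server supports it. The Python sketch below shows both; the addresses and function names are placeholders, the sending function is not called here because it needs a real SMTP host, and either request may simply be ignored by the other side.

```python
import smtplib
from email.message import EmailMessage


def build_with_receipt_request(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Build a message that asks the recipient's mail client for a read receipt."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # Clients that honor this header may offer to send a receipt to this address.
    msg["Disposition-Notification-To"] = sender
    msg.set_content(body)
    return msg


def send_with_dsn_request(host: str, msg: EmailMessage) -> None:
    """Send the message and ask the server for delivery status notifications."""
    with smtplib.SMTP(host) as smtp:
        # NOTIFY is only used when the server speaks ESMTP; support for DSN varies.
        smtp.send_message(msg, rcpt_options=["NOTIFY=SUCCESS,FAILURE,DELAY"])


msg = build_with_receipt_request("billing@example.com", "client@example.org", "Invoice", "See attached.")
print(msg["Disposition-Notification-To"])  # billing@example.com
```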
There's not really a good way to determine whether the email actually arrives in their inbox; you can only confirm that you sent it. Perhaps attach a receipt request that lets you know when they open it?
Microsoft Outlook provides similar functionality, however it is based on the email client. I'm not sure if other clients, like Thunderbird, support this.
However, nothing in the protocols guarantees receipts: the sender can only request them, and the receiving server or client is free to ignore the request.
One option that may work: send a link to a generated web page and monitor that page for hits. This raises its own issues, however: confidentiality, etc.