I am working on a module that enables our system users to send bulk emails to all the registered active applicants from the applicant pool. Currently, there are more than 10 million active applicants in the pool to which emails can be sent. I am thinking of creating blocks of emails and waiting a few minutes between sending individual blocks. What I am more concerned about is the attachment.
Since every email can contain an attachment (max. 2 MB), there is a possibility that a huge amount of bandwidth will be consumed, even if the email is sent to only 10,000 applicants (2 MB × 10,000 applicants ≈ 20 GB of bandwidth). My questions are:
Since every attachment is sent as a MIME part, will the size of the email be calculated the way I have calculated above? Or is there a different mechanism, especially in the context of bandwidth usage?
In your opinion, what options do I have if I have to send a document to thousands of people and want to save bandwidth as well? I could put the document on a server and let everybody download it, but wouldn't that consume the same amount of bandwidth? (I don't want to go down the FTP route.)
Somebody suggested moving these kinds of documents to the cloud. Does cloud technology offer solutions that cater for this kind of need?
Many thanks,
The attachment creates a risk of the message being flagged as spam. Best to avoid it if you can.
The attachment is Base64-encoded for MIME transport rather than gzip compressed, so it takes up roughly a third more bandwidth than the raw file (and considerably more than a compressed download would).
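For a rough sense of the numbers, a small Python sketch of the Base64/MIME overhead (the 2 MB and 10,000 figures come from your question; everything else is illustrative):

```python
# Rough arithmetic for the MIME overhead: Base64 encoding (plus the line
# breaks MIME inserts) inflates the raw attachment by roughly a third.
import base64

raw = b"\x00" * (2 * 1024 * 1024)        # a 2 MB attachment, as in the question
encoded = base64.encodebytes(raw)         # Base64 with line breaks, as in MIME
print(len(encoded) / len(raw))            # ~1.35
print(len(encoded) * 10_000 / 1024**3)    # ~26 GB on the wire for 10,000 recipients
```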
It is not easy to tell whether the attachment has been opened unless it carries some payload that reports that for you - which, again, could get the message flagged as spam.
Putting these documents on a regular web server makes sense. You can use normal Google Analytics to see what is going on. You can also use public caching to make sure that the document is cached by ISPs etc., thereby reducing your downloads. The document can also be served gzip-compressed and opened with a browser, which unobtrusively does the decompression for your recipients.
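As a minimal illustration, assuming a small Python/Flask front end (the file name and max-age are placeholders; transparent gzip is usually configured on the web server or proxy itself):

```python
from flask import Flask, make_response, send_file

app = Flask(__name__)

@app.route("/newsletter.pdf")
def newsletter():
    # Serve the document with a public Cache-Control header so browsers
    # and shared caches (ISPs, proxies, CDNs) can keep a copy.
    resp = make_response(send_file("newsletter.pdf"))
    resp.headers["Cache-Control"] = "public, max-age=604800"   # one week
    return resp
```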
For a newsletter mailing to about 50,000 users, using PEAR, is it better to order the list by mail provider or leave it in random order?
From my experience using Exim to send large amounts of email, performance will suffer heavily if your email queue grows too large. Depending on your hardware, once you have around 10,000 emails in the queue, you will start to see significant effects of bogosorting, where the server spends more CPU juggling the queue than actually getting useful work done.
One way of avoiding large queues is, of course, to get the emails delivered as fast and efficiently as possible. One of the many ways of achieving this is to get Exim to deliver multiple emails over the same TCP connection. This in turn can be achieved by sorting the recipients by domain, but that is not enough! By default, Exim will try to deliver each mail it receives immediately, and each delivery will then open its own connection (this gives fast deliveries for very small volumes but will drive server load through the roof for larger volumes). You need to first spool the mails to Exim and then let a queue runner handle the actual delivery; the queue runner will automatically see all other emails in the queue that should go to the same host and deliver them over the same connection.
Optimizing Exim for sending large amounts of email is a very complex subject that cannot be solved with just a few magical tricks. Crucial configuration options include (but are not limited to): queue_only, queue_run_max, deliver_queue_load_max, remote_max_parallel, split_spool_directory, but also a fast spool disk, enough RAM, and making sure Exim starts new queue runners often enough (a command-line option when starting the Exim daemon).
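Purely as an illustration, a main-configuration sketch using the options named above (the values are placeholders to be tuned for your hardware and volume, not recommendations):

```
queue_only                      # spool incoming mail first; do not deliver immediately
queue_run_max = 10              # maximum number of simultaneous queue runners
deliver_queue_load_max = 8      # skip queue runs when the load average is above this
remote_max_parallel = 10        # parallel deliveries Exim may open to one host
split_spool_directory           # spread the spool over subdirectories for large queues

# Start the daemon with frequent queue runners, e.g.: exim -bd -q1m
```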
How this relates to PEAR escapes me, but perhaps this gives you some ideas of how to approach your problem.
I am currently working on an endpoint for the Postmark service. Postmark sends received emails as JSON and puts the attachments as Base64 into an "Attachment" array.
My question now is: is it possible to use Akka Streams to extract the attachments while receiving them and "do my things" with them?
In detail: I want to slice the attachments into chunks and hash them as they come in, storing the chunks in my database.
The rest of the incoming JSON can be processed (without the attachments) afterwards.
The reason I want to do this is to use as little memory as possible, because attachment sizes can go up to 100 MB, and hashing such big files can use a lot of processing power at once.
I hope to save memory and avoid too much CPU load by processing the attachments in chunks.
If you know a better way to achieve this, I am open to any suggestion! :)
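For reference, the kind of chunk-and-hash processing I have in mind, sketched in plain Python rather than Akka Streams (the chunking and storage functions are hypothetical, and it assumes each incoming Base64 chunk is a multiple of 4 characters so it can be decoded on its own):

```python
import base64
import hashlib

def hash_attachment_chunks(encoded_chunks, store_chunk):
    """Incrementally hash an attachment that arrives as Base64 chunks,
    storing each decoded chunk as it comes in, so the full (up to 100 MB)
    file never has to be held in memory at once."""
    digest = hashlib.sha256()
    for encoded in encoded_chunks:        # e.g. chunks read from the request body
        raw = base64.b64decode(encoded)
        digest.update(raw)                # incremental hashing, chunk by chunk
        store_chunk(raw)                  # e.g. write the chunk to the database
    return digest.hexdigest()
```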
We've got different processes that send mails when issues are encountered (e.g. not enough permissions to perform an operation on a certain order item). This works fine, except that sometimes identical messages are sent every 5 minutes. In our environment it is very difficult to synchronize the email sending at the application layer (there are actually several different applications sending out email, so we'd have to touch every application if we were to implement this inside the application layer).
It would seem logical to me that filtering out mails (by duplicate subject) is best done within the email layer, e.g. the application receiving the SMTP requests.
Yet we'd prefer not to go down to the SMTP layer ourselves, but rather use an existing service/application.
Is anybody aware of a web mailer (like googlemail) which does this kind of filtering? It would be OK for us to pay for such a service; being "free as in beer" would be nice, but not being free is not a showstopper.
Thanks in advance
Holger
I find the idea of filtering duplicate e-mail messages by the Subject: header quite worrisome. If they are produced by multiple applications, how can you be certain that the content of the messages is duplicated and that you are not unwittingly dropping important notifications?
The only unique feature of a message that can be used to filter out duplicates is its Message-ID: header. If that header is the same for two messages, then it's usually reasonable to assume that they are copies of the same original message - e.g. one received directly and one that was CC'ed to a mailing list.
That said, you can do pretty much anything you want on most SMTP servers - at least those that are based on a Unix-like OS. For example, Postfix can use custom shell scripts for filtering.
You can, for example, use formail to extract the body of each message and produce its MD5 hash. Comparing the message body hashes along with the Date:, Subject:, From:, To: and Cc: headers at the same time is a good start for detecting real duplicates.
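A rough Python sketch of that fingerprinting idea (instead of the formail/shell route); it only handles single-part messages and keeps the seen set in memory, both of which are simplifications:

```python
import email
import hashlib

seen = set()   # in practice: a persistent store with an expiry window

def is_duplicate(raw_message: bytes) -> bool:
    msg = email.message_from_bytes(raw_message)
    body = msg.get_payload(decode=True) or b""          # single-part messages only
    headers = "".join(msg.get(h, "") for h in ("Date", "Subject", "From", "To", "Cc"))
    fingerprint = hashlib.md5(body + headers.encode("utf-8", "replace")).hexdigest()
    if fingerprint in seen:
        return True
    seen.add(fingerprint)
    return False
```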
I have to make a newsletter-sending utility application which will collect the list of subscribers from our central database and send out the newsletter. I've considered the possibility of being blacklisted for flooding if I just flush out all the emails at once, so I decided to go with a desktop-based application which will send them out slowly.
My question is:
What is the maximum number of emails per hour that may be addressed to the same email domain (recipient/incoming server)?
Or what should the delay between two emails to the same server be so that it doesn't consider it flooding?
Whichever of the above is more appropriate to real-world mail server configuration...
thanks
I call Thread.Sleep(2000) after every 2 mails.
It's really going to vary by configuration, so there's not necessarily a one-size-fits-all answer. You might want to check with your ISP - it's probably them or their upstream that you'd need to worry about.
Since you're sending a newsletter, could you add multiple recipients via BCC rather than individual messages? That should be less "abusive" to all concerned.
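Roughly, the BCC-plus-throttling idea could look like this in Python (assuming smtplib and a local relay; the batch size, pause, and host are placeholders): recipients are grouped by domain, each batch goes out as a single message with the addresses only in the envelope, and there is a pause between batches.

```python
import smtplib
import time
from email.message import EmailMessage
from itertools import groupby

def _domain(addr):
    return addr.split("@")[-1].lower()

def send_newsletter(subject, body, sender, recipients,
                    batch_size=100, pause_seconds=60):
    recipients = sorted(recipients, key=_domain)
    with smtplib.SMTP("localhost") as smtp:
        for _, addrs in groupby(recipients, key=_domain):
            addrs = list(addrs)
            for i in range(0, len(addrs), batch_size):
                msg = EmailMessage()
                msg["From"] = sender
                msg["To"] = sender          # recipients stay in the envelope only (BCC-style)
                msg["Subject"] = subject
                msg.set_content(body)
                smtp.send_message(msg, from_addr=sender,
                                  to_addrs=addrs[i:i + batch_size])
                time.sleep(pause_seconds)   # throttle deliveries to each domain
```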
I've implemented sending a maximum of 600 emails per recipient domain. That seems to have been working fine for some time now and sounds like an OK solution.
Still, some SysAdmin insight on this would be appreciated.
I'm about to add IMAP email integration to one of our web applications (ASP.NET / SQL Server). I'm already using a commercial library which exposes the most important IMAP functionality (get folder list, get message headers, get MIME message, etc.).
Getting email data "live" from the IMAP server works very well. But here comes the difficult task: I have to keep the email/folder caching SQL database synchronized with the IMAP server (I have to show the data according to different criteria).
Our database schema essentially contains a "Folders" and an "Emails" table. The "Emails" table contains primarily header information like "FromAddress", "FromName", "IsRead", "IsAnswered", "IsForwarded", "HasAttachments" etc. (without the email content or attachments).
I have to consider two major scenarios:
Getting all messages the first time (or after a user re-organized the folders)
Getting new/recent messages
What would be a good synchronization strategy for keeping the mail server and database server up to date, considering that performance is a major design criterion? (I can't just query/compare thousands of messages every time I connect in order to find out whether the user moved or deleted some old emails.)
Thanks!
From your library's feature list:
Better UniqueId Support: We've added even more options for requesting a message's unique id. You can now return the UniqueId in a message's DataTable for return trips to the IMAP server.
And:
Retrieve only New Messages
Search Flagged Messages
Mark/Unmark Messages as Read
It looks to me as though your library has all the support you need to keep your SQL server synchronized. You can programmatically mark messages as read, and the library supports retrieval of only new messages. That takes care of your second item.
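To sketch the UID idea independently of your commercial library, here is roughly how it would look with Python's imaplib (host, folder, and credentials are placeholders): keep the highest UID per folder in SQL and fetch only headers above it on the next connect.

```python
import imaplib

def fetch_new_headers(host, user, password, folder, last_seen_uid):
    """Return (uid, raw header data) pairs for messages newer than last_seen_uid."""
    with imaplib.IMAP4_SSL(host) as imap:
        imap.login(user, password)
        imap.select(folder, readonly=True)
        # Only messages with a UID above the last one recorded in the database.
        _, data = imap.uid("SEARCH", "UID", f"{last_seen_uid + 1}:*")
        new = []
        for uid in data[0].split():
            _, msg_data = imap.uid("FETCH", uid, "(FLAGS BODY.PEEK[HEADER])")
            new.append((uid, msg_data))
        return new
```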
Your strategy will depend partly on how your solution works. If I read your question correctly, your users manage their email on the IMAP server, and your SQL Server is "subscribed" to the IMAP server, from a synchronization perspective.
If this is correct, then synchronization is effectively a background task. My approach would be to synchronize using an event model on a user-by-user basis. If possible, "notify" the synchronization program when there is activity (new/deleted emails) for a user. Add a synchronization "job" to a background process that batches synch jobs together. A notification model will ensure that the synch program only works on users that need a synch.
Small new/deleted email synch jobs go to one "processor" and larger jobs like total resynch and folder reorganization go to another. Really big resynch jobs may have to be split up in order to keep overall throughput high. The "small job" and "big job" processors could be two different services, or possibly two different threads depending on performance and design considerations.
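A bare-bones sketch of that notification/queue split (the names and the threading model are illustrative only): activity notifications are queued per user, and small incremental syncs are handled by a different worker than full resyncs.

```python
import queue
import threading

small_jobs = queue.Queue()   # new/deleted message syncs
big_jobs = queue.Queue()     # full resyncs, folder reorganizations

def notify(user_id, kind):
    """Called whenever there is IMAP activity for a user."""
    (big_jobs if kind == "resync" else small_jobs).put((user_id, kind))

def _worker(jobs, handler):
    while True:
        user_id, kind = jobs.get()
        handler(user_id, kind)           # perform the actual IMAP <-> SQL sync
        jobs.task_done()

def start_workers(sync_incremental, sync_full):
    threading.Thread(target=_worker, args=(small_jobs, sync_incremental), daemon=True).start()
    threading.Thread(target=_worker, args=(big_jobs, sync_full), daemon=True).start()
```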