Anonymous email verification system - similar to an anonymous tip system - forms

I want to create an anonymous tip system that verifies a user's email address without saving it.
The point would be to verify that someone is affiliated with a certain organization through their email address (whether it be a .gov or a .edu or a particular website's address).
Ideally, however, the email address would not be saved anywhere so that the individual could still comfortably submit their tip/complaint in a totally anonymous and secure way. I suppose we could also be open to encryption, but ideally somehow we would be blind to the user's email address.
What would be the best way to implement this if you have no constraints (it could be an email system, PHP, whatever)?

I'll change the scenario a little, then. If we simply want to prevent anyone who has access only to our system from directly learning the email address of anyone who registers or submits a story, what would be a possible approach?
The best way is to never store any part of the email.
Assuming you do need to be able to tell, given the email/hostname again, whether it was associated with a tip, then just treat the email/hostname as you would any sensitive secret like a password. Salt and hash it.
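An email address has lower entropy than a good password, so generate good (unique, random) salts. Note that a salt only defeats precomputed tables: someone holding the salt can still try candidate addresses. Pair the salt with a deliberately slow hash so that each guess stays expensive.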
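A minimal Python sketch of "salt and hash it" follows. It assumes PBKDF2 from the standard library as the slow hash (bcrypt, scrypt or Argon2 would also do). It also assumes addresses are compared case-insensitively after trimming. The names `hash_email` and `seen_before` and the iteration count are hypothetical.

```python
import hashlib
import hmac
import os
from typing import List, Optional, Tuple

# Assumed work factor; raise it until hashing is noticeably slow on your hardware.
ITERATIONS = 100_000

def normalize(email: str) -> bytes:
    # Assumption: treat addresses case-insensitively after trimming whitespace.
    return email.strip().lower().encode("utf-8")

def hash_email(email: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); store only these, never the address itself."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per record
    digest = hashlib.pbkdf2_hmac("sha256", normalize(email), salt, ITERATIONS)
    return salt, digest

def seen_before(email: str, records: List[Tuple[bytes, bytes]]) -> bool:
    """Check a newly supplied address against every stored (salt, digest) pair."""
    for salt, digest in records:
        _, candidate = hash_email(email, salt)
        if hmac.compare_digest(candidate, digest):  # constant-time comparison
            return True
    return False
```

Because every record has its own salt, there is no index to look up. Checking an address means recomputing the hash against each stored record, which is acceptable for a modest number of tips.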

If I was a whistleblower who wanted to submit information about my employer to anyone, I would make absolutely sure that no part of the transaction involved any facility under my employer's control -- equipment, communications, or people. That's the only way I could be reasonably convinced that my anonymity could not be trivially subverted.

Related

Is email harvesting from login/signup/forgot forms a genuine concern?

I'm building a membership system keyed by email address, i.e. email/password.
As I do it there is always this niggling concern in the back of my mind that a spammer is going to bot my forms and use them to verify email addresses somehow.
For example, if I put an ajax email checker on the registration form which goes off and asks the server if we have this email address on file already, I envisage someone might trivially throw email addresses at it and take note of the ones which return true/false, then go off and use that information for their own evil purposes.
Another example, for the forgot password routine, rather than a non-committal 'We may or may not have that email address on file' it is nice to tell the genuine user that we really did find their email on file and sent them their password. But again I worry about bad people submitting lots of email addresses and using the +/- response for their own purposes.
Third example, the way the login form reacts to bad passwords or unknown email addresses also can throw off hints as to whether the email address is a known and active user, or not.
Is this something I should actually worry about? Or am I just making life hard for my users and myself?
Yes, you should be concerned. Even if you have only a few email addresses compared to Facebook, the security of your users is your responsibility. Don't let lackadaisical attitudes deter you from seeking the best security you can find for your site and users.
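One way to act on this is the non-committal response mentioned in the question. The page says the same thing whether or not the address is on file, and only a real account owner receives mail. This is a hypothetical Python sketch: `users` (normalized email to user id) and the `send_reset_link` callback are assumptions.

```python
from typing import Callable, Dict

# Identical text whether or not the address is on file, so the response leaks nothing.
GENERIC_REPLY = ("If that address is registered, we have sent it a link "
                 "to reset your password.")

def request_password_reset(email: str,
                           users: Dict[str, int],
                           send_reset_link: Callable[[str, int], None]) -> str:
    """Hypothetical handler: `users` maps a normalized email to a user id."""
    user_id = users.get(email.strip().lower())
    if user_id is not None:
        # Only the genuine owner of a known address gets the email.
        send_reset_link(email, user_id)
    return GENERIC_REPLY
```

Response timing can still differ between the two branches. Queuing the email instead of sending it inline keeps the two cases closer together.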
Do you think you're going to have a significant proportion of the world's email addresses in your system? If not, then I don't think this is a concern.
Facebook might need to worry about this, not you. You will only have a tiny insignificant fraction of the world's email addresses, so using your db to check addresses will be futile.

Handling typos in emails or signing up users

I have a web app at which visitors are signing up and getting a newsletter to the email they registered with.
I am using only a single email field in the signup form, since I wish to reduce the number of fields. Plus, I figure most people (like me) copy and paste the email, which means a typo would propagate to the secondary verification field.
My problem is that a fair percentage of the signups have a typo in the email address, e.g. @yhaoo, @hotmaill, etc.
How can I effectively deal with such typos?
I was thinking of doing a simple auto-correction using a list of misspellings for common domains, but I can't find a ready-made comprehensive list for that.
When the form is posted, you can do a DNS lookup to see if there is an MX record for the domain. If there is not, you can be almost certain that it is a typo, because sending to that address would most likely not get delivered. (Strictly, senders fall back to the domain's A record when no MX exists, so a missing MX is a strong hint rather than proof.) Then you could re-display the form with a friendly error message, asking the user to confirm that the email address is correct.
Don't auto-correct without prompting the user. It will be very hard to get right, and you might end up with confused users, that have their email address on a domain that closely resembles another domain.
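A sketch of the "check MX, then ask the user" flow follows. The DNS lookup is passed in as a function, so the logic stays testable without network access. In production it might wrap dnspython's `dns.resolver.resolve(domain, "MX")`; that backend, and the names `check_domain` and `lookup_mx`, are assumptions.

```python
from typing import Callable, List, Optional

def check_domain(email: str,
                 lookup_mx: Callable[[str], List[str]]) -> Optional[str]:
    """Return a friendly warning to show the user, or None if the domain looks fine.

    `lookup_mx` is an injected (hypothetical) function that returns the MX hosts
    for a domain, or an empty list when there are none.
    """
    _, sep, domain = email.rpartition("@")
    if not sep or not domain:
        return "That doesn't look like an email address. Please check it."
    if not lookup_mx(domain.lower()):
        # Prompt instead of auto-correcting: the user makes the final call.
        return (f"We couldn't find a mail server for {domain}. "
                "Is your email address correct?")
    return None
```

If the user confirms the address anyway, accepting it is reasonable, since a missing MX record is not conclusive.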
I had this same question, and I just found a free JavaScript library at http://getmailcheck.org that I think will solve our problems:
The Javascript library and jQuery plugin that suggests a right domain when your users misspell it in an email address.
When your user types in "user@gmil.con", Mailcheck will suggest "user@gmail.com".
Mailcheck will offer up suggestions for second and top level domains too. For example, when a user types in "user@hotmail.cmo", "hotmail.com" will be suggested.
Similarly, if only the second level domain is misspelled, it will be corrected independently of the top level domain.
It is supposedly used by Dropbox, Lyft, Kickstarter, Khan Academy, and more.
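Mailcheck itself is JavaScript and uses its own string-distance function. A rough server-side analogue uses Python's standard `difflib` instead; this is a swapped-in technique, not Mailcheck's algorithm. `KNOWN_DOMAINS` and the 0.8 cutoff are assumptions to tune against your own signup data.

```python
import difflib
from typing import Optional

# Hypothetical list of popular domains to compare against.
KNOWN_DOMAINS = ["gmail.com", "yahoo.com", "hotmail.com", "outlook.com", "aol.com"]

def suggest(email: str, cutoff: float = 0.8) -> Optional[str]:
    """Return a 'did you mean' address, or None if the domain is known or too different."""
    local, sep, domain = email.rpartition("@")
    if not sep:
        return None
    domain = domain.lower()
    if domain in KNOWN_DOMAINS:
        return None
    # Closest known domain by SequenceMatcher similarity ratio, if any clears the cutoff.
    matches = difflib.get_close_matches(domain, KNOWN_DOMAINS, n=1, cutoff=cutoff)
    return f"{local}@{matches[0]}" if matches else None
```

Show the result as a suggestion the user can accept, rather than replacing the address silently.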
First, you should do a DNS lookup to see if there's a valid MX record for that domain (which implies the domain exists). If not, you shouldn't accept that email.
Second, look for an HTTP redirect from the domain to another domain. E.g. yayoo.com and yahooo.com both redirect to yahoo.com, so you may want to show a warning message "Did you mean ...@yahoo.com?" or even automatically correct the addresses from a whitelist that you've made sure are safe to correct.
Lastly, if there's a valid MX record and no redirect, your remaining culprits will most likely be typos that lead to hit farms riding on typos of large providers (or to innocent other services), e.g. gmial.com. For these you can resort to manually building a hash table of auto-correct suggestions (again, offering the user a "Did you mean..." step before accepting the submission).
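The hand-built hash table can be as simple as a dictionary from known typo domains to intended domains. The entries and the name `did_you_mean` below are illustrative assumptions. The function only proposes a correction for the user to confirm.

```python
from typing import Optional

# Hand-maintained table: typo domain -> intended domain (example entries only).
TYPO_DOMAINS = {
    "gmial.com": "gmail.com",
    "yahooo.com": "yahoo.com",
    "hotmaill.com": "hotmail.com",
}

def did_you_mean(email: str) -> Optional[str]:
    """Return a suggested address to confirm with the user; never applied silently."""
    local, sep, domain = email.rpartition("@")
    fix = TYPO_DOMAINS.get(domain.lower()) if sep else None
    return f"{local}@{fix}" if fix else None
```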
I know that the question is old. But maybe my answer will help someone.
I'm using the Mailgun API to handle typos in email addresses.

In PHP, approaches to reduce bot form submissions and invalid email accounts?

I know it's kind of a common question, but I can't find the best answer (for now)...
What are the best approaches to reduce bot form submissions and invalid email accounts in PHP and HTML?
Bots: captcha? Hidden CSS? What else?
Invalid email: this is a truly insane job. How can I detect that the user typed user@yahooo.com and report the email as invalid? What if he types user@yaho.com, user@yahoo1.com, etc.? Is there any way to check whether the email is valid?
Captchas are the most common way to prevent bots. Coding Horror has a good article on the subject (see: http://www.codinghorror.com/blog/archives/001067.html and http://www.codinghorror.com/blog/archives/000712.html)
As to valid/invalid emails, your best bet is to require a validation step in registration. Don't activate the account until the user has used a link/special key sent in an email.
One way is to use a service like Akismet, which provides a free API you can hook your form up to, validating form inputs against known spammers (and spam-like text).
With so many email accounts out there, validating that an account exists is a lot of overhead. You can always check the string's validity (e.g. xyz@abc.com) using a regex, but there is no quick or lightweight way to check whether the account itself is valid.
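For the string-validity check, a deliberately loose pattern is usually enough. A full RFC 5322 grammar is far larger and still cannot catch typos. The pattern below is an assumption, not a standard: it only rejects obvious garbage before the confirmation email does the real test.

```python
import re

# Loose shape check: one "@", no whitespace, and a dot somewhere in the domain.
# NOT a full RFC 5322 validator; user@yahooo.com still passes, as it should.
EMAIL_SHAPE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def looks_like_email(text: str) -> bool:
    return EMAIL_SHAPE.fullmatch(text.strip()) is not None
```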
Your best bet for checking valid email addresses is to send an email to it with a random value which you have the user click on.
e.g.
Welcome to McFadder's site!
Click here to validate your email address:
http://www.example.com/validate.php?Hash=c4ca4238a0b923820dcc509a6f75849b
You then have a database table (say, called UserEmailValidate) which contains the user ID and the hash.
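The sample value in the email above is the MD5 of the string "1". Tokens derived from user IDs like that can be guessed. The sketch below uses random tokens from Python's `secrets` module instead, stored in an in-memory SQLite table named after the UserEmailValidate example. The column names and helper functions are assumptions.

```python
import secrets
import sqlite3
from typing import Optional

def create_table(db: sqlite3.Connection) -> None:
    # Mirrors the UserEmailValidate table described above: user id + token.
    db.execute("CREATE TABLE IF NOT EXISTS UserEmailValidate "
               "(UserID INTEGER PRIMARY KEY, Hash TEXT UNIQUE NOT NULL)")

def issue_token(db: sqlite3.Connection, user_id: int) -> str:
    token = secrets.token_urlsafe(32)  # unguessable and URL-safe
    db.execute("INSERT OR REPLACE INTO UserEmailValidate (UserID, Hash) VALUES (?, ?)",
               (user_id, token))
    return f"http://www.example.com/validate.php?Hash={token}"

def validate(db: sqlite3.Connection, token: str) -> Optional[int]:
    """Return the user id for a valid token and consume it, or None."""
    row = db.execute("SELECT UserID FROM UserEmailValidate WHERE Hash = ?",
                     (token,)).fetchone()
    if row is None:
        return None
    db.execute("DELETE FROM UserEmailValidate WHERE UserID = ?", (row[0],))
    return row[0]
```

Deleting the row on success makes each link single-use.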
To validate email addresses in the form, use JavaScript regular expressions, or PHP validation.
To avoid bots abusing your form, use captchas. http://recaptcha.net/ is a free service.
I think CAPTCHA is going to be your best option; I've used reCAPTCHA in the past:
http://recaptcha.net/plugins/php/
You can only validate the email on face value as per the RFC.
http://en.wikipedia.org/wiki/E-mail_address
You might want to send an email to them and ask them to click on a link to validate their account.
We used a cross site request forgery block in combination with a captcha and a field hidden with CSS to cut out almost all faked emails on our site. It isn't perfect, but the volume was significantly reduced. If you combined all that with a human verification of the actual email and deleting unverified accounts you could tighten up the spam net even more.
Set a session cookie of a hashed and salted secret value
Submit the form with that secret cookie and make sure the session matches the hidden form field. This beats the lazy bot submissions.
Add a captcha to beat better bots
Create a hidden field called "comments" that is hidden with CSS. Put a label that says "don't fill this out or your submission will be ignored" and style that hidden as well. Anybody that fills it out is either a bot or a dumbo and you can pretend to send the email but not really do it.
Add in Akismet (no experience personally) and a quick verification email, and you have a pretty reliable net that will skim out the crap for you.
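The session-secret and hidden-field steps can be sketched in Python as follows. This version stores a random token from `secrets` in a server-side session rather than a hashed and salted value; that is a swapped-in choice with the same effect. The session dict, field names and return labels are assumptions, and the captcha step is not shown.

```python
import hmac
import secrets
from typing import Dict

HONEYPOT_FIELD = "comments"  # hidden with CSS; a human should leave it empty

def new_form_token(session: Dict[str, str]) -> str:
    """Store a random secret in the server-side session; echo it in a hidden field."""
    token = secrets.token_hex(16)
    session["form_token"] = token
    return token

def classify_submission(session: Dict[str, str], form: Dict[str, str]) -> str:
    """Return "reject", "silently_drop" or "accept" for a posted form."""
    expected = session.get("form_token", "")
    sent = form.get("form_token", "")
    # Constant-time comparison of the session secret and the hidden field.
    if not expected or not hmac.compare_digest(expected.encode(), sent.encode()):
        return "reject"  # e.g. a lazy bot posting without loading the form
    if form.get(HONEYPOT_FIELD, "").strip():
        return "silently_drop"  # pretend to send, but don't
    return "accept"
```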
Send a confirmation email to the address provided with an activation key that the user has to use to activate their account to verify that the email is valid.
To get rid of bots, you probably want to use a captcha.
First of all, you can try simply not to deal with these problems by using alternative methods (like Stack Overflow does). The next thing is to check whether the address could be valid by resolving the hostname, and to let the user play the usual captcha game. You can either build something of your own or use third-party services. You can make extensive use of cookies, Flash and JavaScript; however, that might annoy a few users without stopping many spammers. What do you mean by hidden CSS? Hiding certain input fields via CSS, giving them names like URL/firstmail/name, and hoping that a robot that doesn't obey display:none; will fill them out? Yes, that could stop a few. The last thing is to send a link to the given address to validate and activate the account; if an account is not activated within two days, just drop it. You could even go one step further and ask the user in this mail to send YOU a mail at a specific address...
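The "drop it after two days" rule is easy to run as a periodic job. For illustration, the Python sketch below filters an in-memory list; the account fields `activated` and `created` are assumptions. In a real system this would be a scheduled DELETE query with the same condition.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

ACTIVATION_WINDOW = timedelta(days=2)

def purge_unactivated(accounts: List[Dict], now: Optional[datetime] = None) -> List[Dict]:
    """Keep activated accounts and those still inside the activation window."""
    now = now or datetime.now(timezone.utc)
    return [a for a in accounts
            if a["activated"] or now - a["created"] < ACTIVATION_WINDOW]
```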

Guidelines for accepting email messages as input to application

A number of applications have the handy feature of allowing users to respond to notification emails from the application. The responses are slurped back into the application.
For example, if you were building a customer support system the email would likely contain some token to link the response back to the correct service ticket.
What are some guidelines, hints and tips for implementing this type of system? What are some potential pitfalls to be aware of? Hopefully those who have implemented systems like this can share their wisdom.
Some guidelines and considerations:
The address question: The best thing to do is to use the "+" extension part of an email address (myaddr+custom@gmail.com). This makes it easier to route, but most of all, easier to keep track of the address routing to your system. Other techniques might use a token in the subject.
Spam: Do spam processing outside the app, and have the app filter based on a header.
Queuing failed messages: Don't, for the most part. The standard email behavior is to try for up to 3 days to deliver a message. For an application email server, all this does is create giant spool files of mail you'll most likely never process. Only queue messages if the failure reasons are out of your control (e.g., server is down).
Invalid message handling: There are multiple ways a message can be invalid. Some are limitations of the library (it can't parse the address, even though it's an RFC-valid one). Others are because of broken clients (e.g., omitting quotes around certain headers). Others might be too large, use an unknown encoding, be missing critical headers, have multiple values where there should only be one, violate some semantic specific to your application, etc. Basically, every place where the Java mail API could throw an exception is an error-handling case you must decide how to handle appropriately.
Error responses: Not every error deserves a response. Some are generated because of spam, and you should avoid sending messages back to those addresses. Others are from automated systems (yourself, a vacation responder, another application mail system, etc), and if you reply, it'll send you another message, repeating the cycle.
Client-specific hacks: like above, each client has little differences that'll complicate your code. Keep this in mind anytime you traverse the structure of a message.
Senders, replies, and loops: Depending on your situation, you might receive mail from some of the following sources:
Real people, maybe from external sources
Mailing lists
Yourself, or one of your own recipient addresses
Other mail servers (bounces, failures, etc)
Entity in another system (my-ldap-group@company.com, system-monitor@localhost)
An automated system
An alias to one of the above
An alias to an alias
Now, your first instinct is probably "Only accept mail from correct sources!", but that'll cause you lots of headaches down the line because people will send the damnedest things to an application mail server. I find it's better to accept everything and explicitly deny the exceptions.
Debugging: Save a copy of the headers of any message you receive. This will help out tremendously anytime you have a problem.
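Two of the guidelines above can be made concrete with Python's standard `email` package: reading the "+" extension, and not replying to machines. The function names, the mailbox name `myaddr` and the header heuristics are assumptions. The Auto-Submitted header comes from RFC 3834; Precedence and List-Id are common conventions. This is not a complete loop filter.

```python
from email.message import Message
from email.utils import parseaddr
from typing import Optional, Set

def route_token(to_header: str, mailbox: str = "myaddr") -> Optional[str]:
    """Return the "+custom" part of e.g. 'Support <myaddr+custom@gmail.com>'."""
    _, address = parseaddr(to_header)
    local, sep, _ = address.rpartition("@")
    base, plus, tag = local.partition("+")
    if not sep or base.lower() != mailbox or not plus or not tag:
        return None
    return tag

def should_send_error_reply(msg: Message, own_addresses: Set[str]) -> bool:
    """Return False for mail that looks machine-generated, to avoid reply loops.

    `own_addresses` is expected to hold lowercase addresses.
    """
    # RFC 3834: any Auto-Submitted value other than "no" marks automatic mail.
    if (msg.get("Auto-Submitted") or "no").strip().lower() != "no":
        return False
    # Bulk senders and mailing lists.
    if (msg.get("Precedence") or "").strip().lower() in {"bulk", "list", "junk"}:
        return False
    if msg.get("List-Id") is not None:
        return False
    # Bounces are sent with an empty envelope sender.
    if (msg.get("Return-Path") or "").strip() == "<>":
        return False
    sender = parseaddr(msg.get("From", ""))[1].lower()
    if not sender or sender in own_addresses:
        return False  # unparseable sender, or one of our own addresses
    if sender.split("@")[0] in {"mailer-daemon", "postmaster"}:
        return False
    return True
```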
--Edit--
I bought the book, Building Scalable Web Sites, mentioned by rossfabricant. It -does- have a good email section. A couple of important points it has are about handling email from wireless carriers and authentication of emails.
You can set the address that the email is sent from, which is what will be put into the To: address if someone just presses Reply. Make that unique, and you'll be able to tell where a reply came from and where it must be directed back to.
When it comes to putting a display name beside the address ("something here"), though, put something that invites people to just reply to the mail. I've seen one major web app with email capture that uses 'do not reply', which turns people off from actually sending anything to it.
Building Scalable Web sites has a nice section on handling email. It's written by a Flickr developer.
EDIT: I misunderstood your question.
You could configure your email server as a catch-all and generate a unique reply-to address, e.g. CST-2343434@example.com.
A polling process on the server could read the inbox and parse out the relevant part from the received email: CST-2343434 could mean Customer Support ticket ID no. 2343434.
I implemented something like this using JavaMail API.
Just a thought.
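The polling side can be sketched with Python's standard `imaplib` in place of JavaMail. The host, credentials, the INBOX folder and the `CST-` pattern are assumptions. Only `ticket_id` can be exercised without a live server.

```python
import email
import imaplib
import re
from email.message import Message
from typing import Iterator, Optional

# Matches the unique reply-to local part, e.g. CST-2343434@example.com.
TICKET_RE = re.compile(r"\bCST-(\d+)@", re.IGNORECASE)

def ticket_id(msg: Message) -> Optional[int]:
    """Find the ticket number in the addresses the message was delivered to."""
    for header in ("Delivered-To", "To", "Cc"):
        for value in msg.get_all(header, []):
            m = TICKET_RE.search(value)
            if m:
                return int(m.group(1))
    return None

def poll_inbox(host: str, user: str, password: str) -> Iterator[Message]:
    """Yield unread messages from the catch-all mailbox."""
    with imaplib.IMAP4_SSL(host) as imap:
        imap.login(user, password)
        imap.select("INBOX")
        _, data = imap.search(None, "UNSEEN")
        for num in data[0].split():
            _, parts = imap.fetch(num, "(RFC822)")  # fetching marks it \Seen
            yield email.message_from_bytes(parts[0][1])
```

A scheduler (cron, or a timer in a service) would call `poll_inbox` periodically and hand each message with a ticket number to the application.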
The best way to achieve this is to write a Windows service that acts like a mail client (POP3 or IMAP). The service should run a timer-triggered action that connects to the mail server and polls it for any unread messages in the inbox. The mailbox to check is the address to which users send their input. If the service finds new mail, it should download it, filter the email body, and pass it on for processing based on the user input in the email. You can host the input processing in the same Windows service, but that is not advisable. Instead, the service can put the inputs in a special application directory or database, from which your main application can read the user inputs received by email and process them as needed.
You will be required to develop a high-performance TCP/IP client for this. I advise you not to use the default .NET library due to performance issues; instead, use one of the best available open-source TCP/IP implementations for .NET, like XF.Server from kodart. We have used this in our applications and achieved remarkably great results.
Hope this helps..
Bose has a pretty great system where they embed a Queue and Ticket ID into the email itself.
My company has the traditional Case # on the subject line, but when CREATING a case, require a specific character string "New Case" "Tech Support Issue" on the subject line to get through the spam filters.
If the email doesn't match the create or update semantics, the autoresponder sends an email back to the sender demonstrating how to properly send an email, or directs them to our forums or web support site.
It helps eliminate the spam issue, and yet is still accessible to a wide technical audience that is still heavily email dependent.
Spam is going to be a bit of a concern. However since you are initiating the conversation you can use the presence of your unique identifier (I prefer to use the subject line - "Trouble ticket: Unable to log into web...[artf123456]") to filter out spam. Be sure to check the filter on occasion since some folks mangle the subject when replying.
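Mangled subjects are easier to handle if the identifier is searched for anywhere in the line, not at a fixed position. The sketch below assumes identifiers look like `artf` followed by digits, as in the example above. Prefixes such as "Re:" or "FW:", changed case, and stray spaces inside the brackets then don't matter.

```python
import re
from typing import Optional

# Identifier like artf123456 anywhere in the subject, bracketed or not.
ID_RE = re.compile(r"\b(artf\d+)\b", re.IGNORECASE)

def ticket_from_subject(subject: str) -> Optional[str]:
    m = ID_RE.search(subject)
    return m.group(1).lower() if m else None
```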
Email is a cesspool of bad standards and broken clients. You need to be prepared to accept almost anything as input. You will need to be very forgiving about what kinds of input are tolerated. Anything easy for you to program will likely be difficult for your users to use correctly. Consider the old mailing list programs that require you to issue commands in the subject line. Only hardcore nerds can use those effectively. And some of those trouble-ticket CRM things you mentioned have bizarre requirements, such as forcing the user to reply between two specific text markers in the text. That sort of thing is confusing to people.
You'll need to deal with email clients that send you formatted text instead of plain text. Some email clients still don't handle HTML properly (cough Gmail), so your replies will also need to be designed appropriately. There are various ways in which photos might be "uploaded" via email as well, especially when mobile phones are involved. You will need to implement various hacks and heuristics to deal with these situations.
It's also entirely possible that you will get email that is valid but unusable by the email parsing library you are using. Whether or not this is important enough to roll your own will be a judgement call.
Finally, others have mentioned using specific email addresses to uniquely identify a "conversation". This is probably the easiest way to do this, as the content of the mail will often not survive a round trip to a client. Be prepared, however, to get mail to old IDs from old customers who reply to an old ticket instead of somehow opening a new one. Your application will probably need some way to push emails with an old ID into a new case, either manually or automatically. For a CRM system it's very likely that a user would reply to an old email even if you already sent him a new email with a new ID in it. As for whether you should use some.email.address+some.id@yourdomain.com or just some.id@yourdomain.com, I'd go with the latter because the plus sign confuses some email clients. Make your IDs GUIDs or something similar, have some way to validate them (such as a CRC), and you'll get less junk. Humans should never have to type in the GUIDs, just reply to them. The downside is spam filtering: a user's computer might view such email addresses as spam, and there wouldn't be an easy way to whitelist the addresses.
Which reminds me: sending email these days is full of pitfalls. There are many anti-spam technologies which make it extremely hard for you to send email to your customers. You will need to research all of these and you need to be careful, and do some testing, to ensure that you can reach the major email providers. A website like Campaign Monitor can help you if you are sending email.
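The "GUIDs plus a CRC" suggestion from the previous answer can be sketched as follows. It assumes a local part of the form `<32 hex digits>-<8 hex CRC32 digits>` and the placeholder domain `yourdomain.com`. The CRC only filters typos and random junk; it is not a security control, since anyone can compute it.

```python
import uuid
import zlib
from typing import Optional

DOMAIN = "yourdomain.com"  # placeholder domain from the text

def _check(u: uuid.UUID) -> str:
    # CRC32 of the canonical hex form, rendered as 8 hex digits.
    return f"{zlib.crc32(u.hex.encode('ascii')):08x}"

def make_reply_address(ticket: Optional[uuid.UUID] = None) -> str:
    """Build <guid>-<crc>@yourdomain.com for a new (or given) ticket GUID."""
    u = ticket if ticket is not None else uuid.uuid4()
    return f"{u.hex}-{_check(u)}@{DOMAIN}"

def parse_reply_address(address: str) -> Optional[uuid.UUID]:
    """Return the GUID if the address is ours and its checksum matches, else None."""
    local, _, domain = address.strip().lower().rpartition("@")
    if domain != DOMAIN:
        return None
    guid_hex, _, check = local.partition("-")
    try:
        u = uuid.UUID(hex=guid_hex)
    except ValueError:
        return None
    return u if check == _check(u) else None
```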

What is the best and safest way to store user email addresses in the database? [closed]

For security reasons, is it worth encrypting user emails before putting them into the database?
I know we hash and salt passwords but that's another story as we do not really need password originals. With emails it is different.
Knowing that the decryption key will anyway be somewhere close to the database, does it make sense to encrypt emails? I suppose if someone gets into the system, they will find the key as well, if not immediately then eventually.
What are the best-practices? Are there any other options available if I run my own servers and not on a shared/virtual hosting?
EDIT: I intend to use SQL Server. And no, it is no corporate software with security requirements, just some entertainment site I have in mind.
If you're going to need the email address in the future, then you'll have to store them in plain text.
You could encrypt them, of course, however, this is effectively security through obscurity in this case. Basically, if your application's perimeter is secure, your data within it can be plain text. Encrypting here adds complexity to you working with the data, but doesn't really stop an attacker from getting your raw data.
As you say, if he gets through your perimeter defenses, he's likely to easily get your decryption key to decrypt the email data. Encryption may slow down the determined attacker slightly, but will not add any real security to your data.
The best scenario is to hash the email address (with salt!) and store that. This allows you to check the email address against an input value (for example) and verify that the email address input is the same as what you have stored, of course, the major downside for this is that you can't know what the email address is without that additional value, so if you're wanting to (for example) regularly email your users, you'll be out of luck.
I suspect you're storing the email address because it's useful data, and you will want to do something with it (like send an email :) in which case, encrypting just adds overhead to working with that data, whilst gaining very little in return.
In this case, I would focus on securing access to the database itself (i.e. your "perimeter" defenses) and making sure those defenses are as strong as they can be, whilst leaving the data in the database in plain text.
Hopefully this answer will answer your question as well.
Is it worth encrypting email addresses in the database?
In short, no, it is not worth encrypting user email addresses. You're right in thinking that a database compromise will likely result in somebody also gaining access to the keys required to break your encryption.
In general I agree with others saying it's not worth the effort. However, I disagree that anyone who can access your database can probably also get your keys. That's certainly not true for SQL Injection, and may not be true for backup copies that are somehow lost or forgotten about. And I feel an email address is a personal detail, so I wouldn't care about spam but about the personal consequences when the addresses are revealed.
Of course, when you're afraid of SQL Injection then you should make sure such injection is prohibited. And backup copies should be encrypted themselves.
Still, for some online communities the members might definitely not want others to know that they are a member (like related to mental healthcare, financial help, medical and sexual advice, adult entertainment, politics, ...). In those cases, storing as few personal details as possible and encrypting those that are required (note that database-level encryption does not prevent the details from showing using SQL Injection), might not be such a bad idea. Again: treat an email address as such personal detail.
For your entertainment site this is probably not the case, and you should focus on prohibiting SELECT * FROM through SQL Injection, and making sure visitors cannot somehow get to someone else's personal profile or order information by changing the URL.
One of the most often-cited truisms in computer security is that the only truly secure computer is one buried in concrete, with the power turned off and the network cable cut.
With that in mind, the best way to securely store email addresses? Don't store them at all!
tl;dr Do you need their email address, or a way of sending them emails? Either trust someone who will do a better job than you or don't use the email address at all.
Why do you need to keep a record of a customer's email address? The only reasons I have run into are:
Account confirmation & authentication
Transaction & Marketing emails
Confirmation & Authentication
The core of what we want is two-step authentication: something they know and something they have. Something they know is a password, and is easy to prove since they will be the only one who knows it. Something they have is harder to prove, and traditionally we use an email address since it is easy to verify. These days, though, there are other things we can use:
Mobile phone
An account with a trusted website (Facebook, Google, Twitter)
Mobile phone verification is simple. Send them an SMS using a service like twilio.com and ask them to text back a confirmation code. We now know that the mobile belongs to the customer who wanted to register. With OpenID you can verify existing accounts with other trusted sites, and the confirmation process is handled by them.
For the customer to authenticate, all they provide is either their mobile number and password, or an OpenID authentication token. Neither requires an email address (well, the OpenID provider might, but that's not your responsibility).
If these are not an option then you can still confirm an email address and then use it for authentication. Confirmation only requires a unique token to be stored and a link to be sent to the email address. Store a salted hash of the email address, and use that to match the account in the same way we do passwords.
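One caveat with matching the account "in the same way we do passwords": a per-user random salt means the stored hash cannot be looked up by address at login. A common alternative is a keyed hash (HMAC) with a secret kept outside the database. This technique is swapped in here and is not from the answer above. It is deterministic, so it can be an indexed column, and it stays safe only while the secret stays out of an attacker's hands. The environment variable name `EMAIL_PEPPER` and the function name are assumptions.

```python
import hashlib
import hmac
import os

# Hypothetical server-side secret ("pepper") kept outside the database.
# The fallback is a placeholder only; set a real secret in production.
PEPPER = os.environ.get("EMAIL_PEPPER", "change-me").encode("utf-8")

def email_lookup_key(email: str) -> str:
    """Deterministic keyed hash, usable as an indexed column to find an account."""
    normalized = email.strip().lower().encode("utf-8")
    return hmac.new(PEPPER, normalized, hashlib.sha256).hexdigest()
```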
Transaction & Marketing Emails
The real reason why we want to store the email address! So we can send them offers of stuff we think they need so they can delete it without reading it. Seriously though, is email the best medium for this? If we have an OpenID account, then why not use that for notifications? Send a Facebook message or write on their wall, @mention them on Twitter, send a text message to their mobile, build an app and push notifications at them. There are so many channels much more effective than email.
If you want to use email, then use an email platform like Mandrill and MailChimp. When they register, create a subscriber in a mailing list on MailChimp. Store the subscriber ID with the account. For transaction emails (reset password, account updates), fetch the subscriber and pass the stored email to Mandrill to send the email. For mass marketing, just send to the mailing list in MailChimp.
The only thing stored in the database is the subscriber ID. It also gives all the benefits of using an email platform: unsubscribes, open and click-through rates, e-commerce tracking, etc. Email platforms will do a better job of delivering emails than you. They will also do a better job at protecting the privacy of their data than you. Let them do the hard work of database security so you can focus on getting more customers.
I think that when people can come in your database you are anyway screwed :)
It doesn't make a lot of sense to just encrypt your email addresses. Besides the fact that there will be a lot of other information in your database that you would not like to be gathered, the decryption key will indeed be within reach as soon as your database is open.
I would suggest placing your security and data-integrity layer at a higher level, i.e. preventing people from getting into your database in the first place.
And why would email addresses be so important? Most people will anyway get spam or their email addresses will otherwise be available somewhere on the web.
Depends on how often you access the addresses. If you read them once in a while, it might make sense, but this would be one of the last security issues I would spend time on.
I do not encrypt user e-mails. The point is to protect the database; the keys are accessible anyway if you actually want to use the e-mails once they are stored.
Do check the address for validity and possible SQL injection, though.
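On the SQL injection point, the usual defence is to pass the address as a bound parameter rather than pasting it into the SQL text. The question mentions SQL Server, but this self-contained sketch uses Python's `sqlite3`; the principle carries over to other drivers. The `users` table is hypothetical.

```python
import sqlite3
from typing import Optional

def find_user_id(db: sqlite3.Connection, email: str) -> Optional[int]:
    # The address is a bound parameter, never spliced into the SQL string,
    # so input like "x' OR '1'='1" is treated purely as data.
    row = db.execute("SELECT id FROM users WHERE email = ?", (email,)).fetchone()
    return row[0] if row else None
```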
If the application server and database are on separate servers, it would generally increase security to have all or parts of the database encrypted.
Even if they are on the same machine, a hacker may not figure out where your password is stored (although I wouldn't rely on that).
I generally wouldn't encrypt the emails at the application level, instead relying on database-wide encryption offered by most enterprise databases.
Of course if you're using something like MySQL, then you have no choice but to do it at the application level.
I normally tell my clients it isn't worth the trouble encrypting a database, however if you have stricter privacy requirements it may make sense to do so.
Encrypting database content is always a tricky consideration. Clearly the content is useless unless it can be decrypted, and if that has to happen without human intervention, then you're storing both the ciphertext AND the key somewhere. If that somewhere is on the same machine, then one might wonder why you even bothered.
Well, there are a few reasons why you might want to do this. One is that you're required to do so by some company policy. Another is that perhaps your database is housed in a more hostile environment than the machine that accesses it.
In general, encrypting database content isn't going to win you any awards, but if you can justify it, then you clearly have at least some motivation to do so.
Yeah, it could be helpful for the user. Be aware, though, that salting and hashing is one-way, so a hashed address can't be decrypted later. I had code before where the flow was: once the user registers, encrypt the address; then, whenever you need to fetch it, decrypt it. That flow requires reversible encryption with a key, not a salted hash.