Email obfuscation question - email

Yes, I realize this question was asked and answered, but I have specific questions about this that I feel were not clear on that thread and I'd prefer not to get lost in the shuffle on another thread as well.
Previous threads said that rendering the email address to an image the way Facebook does is overkill and unprofessional user experience for business/professional websites. And it seems that the general consensus is to use a JavaScript document.write solution using html entities or some other method that breaks up and/or makes the string unreadable by a simple bot. The application I'm building doesn't even need the "mailto:" functionality, I just need to display the email address. Also, this is a business web application, so it needs to look/act as professional as possible. Here are my questions:
If I go the document.write route and pass the html entity version of each character, are there no web crawlers sophisticated enough to execute the javascript and pull the rendered text anyway? Or is this considered best practice and completely (or almost completely) spammer proof?
What's so unprofessional about the image solution? If Facebook is one of the highest trafficked applications in the world and not at all run by amateurs, why is their method completely dismissed in the other thread about this subject?
If your answer (as in the other thread) is to not bother myself with this issue and let the users' spam filters do all the work, please explain why you feel this way. We are displaying our users' email addresses that they have given us, and I feel responsible to protect them as much as I can. If you feel this is unnecessary, please explain why.
Thanks.

It is not spammer proof. If someone looks at the code for your site and determines the pattern that you are using for your email addresses, then specific code can be written to try and decipher that.
I don't know that I would say it is unprofessional, but it prevents copy-and-paste functionality, which is quite a big deal. With images, you simply don't get that functionality. What if you want to copy a relatively complex email address to your address book in Outlook? You have to resort to typing it out which is prone to error.
Moving the responsibility to the users spam filters is really a poor response. While I believe that users should be diligent in guarding against spam, that doesn't absolve the person publishing the address from responsibility.
To that end, trying to do this in an absolutely secure manner is nearly impossible. The only way to do that is to have a shared secret which the code uses to decipher the encoded email address. The problem with this is that because the javascript is interpreted on the client side, there isn't anything that you can keep a secret from scrapers.
Encoders for email addresses nowadays generally work because most email bot harvesters aren't going to concern themselves with coding specifically for every site. They are going to try and have a minimal algorithm which will get maximum results (the payoff isn't worth it otherwise). Because of this, simple encoders will defeat most bots. But if someone REALLY wants to get at the emails on your site, then they can and probably easily as well, since the code that writes the addresses is publically available.
Taking all this into consideration, it makes sense that Facebook went the image route. Because they can alter the image to make OCR all but impossible, they can virtually guarantee that email addresses won't be harvested. Given that they are probably one of the largest email address repositories in the world, it could be argued that they carry a heavier burden than any of us, and while inconvenient, are forced down that route to ensure security and privacy for their vast user base.

Quite a few reasons Javascript is a good solution for now (that may change as the landscape evolves).
Javascript obfuscation is a better mouse trap for now
You just need to outrun the others. As long as there are low hanging fruit, spammers will go for those. So unless everyone starts moving to javascript, you're okay for now at least
most spammers use http based scripts which GET and parse using regex. using a javascript engine to parse is certainly possible but will slow things down
Regarding the facebook solution, I don't consider it unprofessional but I can clearly see why purists may disagree.
It breaks accessibility standards (cannot be parsed by browsers, voice readers or be clicked.
It breaks semantic construct (it's an image, not a mailto link anymore)
It breaks the presentational layer. If you increase browser default font size or use high contrast custom CSS, it won't apply to the email.

Here is a nice blog post comparing a few methods, with benchmarks.
http://techblog.tilllate.com/2008/07/20/ten-methods-to-obfuscate-e-mail-addresses-compared/

Related

What is the current best antispam solution for online forms without using a captcha?

Hopefully this is an acceptable subjective question. We are interested in your specific and recent experiences.
For years, we used the “hidden field” trick to confuse spambots into not being able to deliver spam via an online form. In particular, this solution from Nfriendly worked perfectly.
Over time, though, spambots got smarter, and it no longer stopped much. We removed the previous solution during a site redesign because it didn’t work well with the new setup. Spam continued, albeit not at a noticeably higher rate.
We would like a new solution that blocks at least most spam without having to resort to a captcha that could discourage user interaction.
There are lots of good posts on StackOverflow about various methods to try and stop, or at least reduce, spam via an online form without using a captcha, but most of them are rather old and tend to be discussing techniques that probably also have been defeated by now like the hidden field solution.
In your recent experience, have basic math problems in lieu of a captcha been defeated already? Has it discouraged user interaction? How about this clever idea of displaying a photo of Mario on a video game site and asking the user to identify him? In your experience, are spambots smart enough to defeat that method?
It seems like the most recent ideas to avoid a captcha are trying to detect human-like activity, but these may add quite a bit of overhead.
Before attempting to implement one of the most recent ideas or try to roll our own, some community feedback would be helpful.
Based on your recent experience, what do you think is the current, best, lightweight solution for reducing spam submitted via an online form without using a captcha? Why? Thank you.
CAPTCHAs now require increasing amounts of third party data to be loaded. Sometimes they require increasing amounts of click-n-wait interaction. The only consolation has been the thought that we're providing useful evaluation for a distant process.
The depressing result of this is that the old 'evaluate a written puzzle' approach is now, comparatively, less annoying! So choosing a topic appropriate to your audience, all you need to do is refresh your question database occasionally to balance the spam against the complaints.
Unless of course you already have a dataset that could do with some crowd-evaluation.

Clickable email-links encryption? How to do them?

I would like to know if and how it is possible to create a clickable email-link for websites, that are "encrypted" in a way emailspiders can't collect them and it is still possible for living users to click it to open in email-clients or even copy it.
I saw some links that were done in javascript but I on't know how they did this and how "safe" they are.
thank you in advance for any reply
Most approaches to this are splitting the address across multiple elements and inserting extra formatting; then for JS-enabled browsers, they use JavaScript to turn it back into an e-mail address.
The poster example for this is SpamSpan, which even has several "levels" of obfuscation - each level progressively less and less resembles an e-mail in the source code, yet it still manages to piece it back together by JS. Although some spambots today are supposedly capable of executing JavaScript, te vast majority doesn't - and the e-mails are still human-readable with JS off. An advantage of JS-assisted de/obfuscation is that it doesn't rely on external servers, you just need to (simply) integrate the JS library.
Another approach is taken by reCAPTCHA Mailhide - the e-mail is revealed only after solving a CAPTCHA (same type as for normal reCAPTCHA). This is less convenient for the user, but practically safe against robots. A disadvantage of this is that it depends on reCAPTCHA's servers (in essence, on Google) - some people are dead-set against any external dependencies.
This would be a very simple and effective way:
Scramble email addresses
All it does is convert it into ASCII, and all you need to do is insert it where your email address would go!
Although there are more (crazily) secure ways you can choose, this would be the simply option. You can also try this solution, it uses JavaScript to protect your email.
Hope this helps!

Whitelist for player to player chat in kids game

We're developing an educational multiplayer game for kids and want to allow players to chat with each other using a whitelist system. When using whitelist chat, players will be able to type only words which appear in the whitelist.
We're aware of the limitations of whitelists in general, but we think a whitelist chat system is something that would allow our players to express themselves better in the game, while allowing a higher level of security than moderated or blacklist chat.
While the system is easy enough to implement, we haven't been able to find a sample whitelist of "safe" words online. Does anyone know of where we can find such a list, preferably with a license that allows us to use it in a commercial project?
Thanks.
I do not believe that a simple whitelist of words will cut it. There are quite a few euphemisms for a lot of stuff out there, that a whitelist would never block (e.g. "he is growing like a weed" is fine, "he is growing weed" is NOT). And let's not mention the basic "would you like to meet?" which would be fine if the meeting were to happen in-game, but very dangerous if it were to happen out of it. Then there is also the issue of blocking rare, foreign or mistyped words, that might make your chat system frustrating enough that it would not be used.
In my opinion, there is absolutely no way you could ever match the security offered by an active and competent human moderator. Of course, depending on the volume of chat traffic and any real-time requirements there are quite a few practical issues with using humans for this. Considering that your application is targeted at children, however, human moderation might be quite acceptable, despite its much higer cost.
A second choise, but one very far from the abilities of human moderation, is to use some statistical filter such as Bogofilter, which will happily sort arbitrary text if you train it well. A blacklist would also help to immediately cut down messages with words that little kids should not (but usually do) know. You would also need a bunch of filters that would cut down messages with stuff like telephone numbers, email and street addresses and web links.
Perhaps the method with the best effectiveness/cost ratio would be to use human moderators assisted by multiple statistical filters to better make use of their time. Keep in mind, however, that if there are malicious users (i.e. anything else than same-age kids in a classroom) there is no way to make sure that nothing questionable or dangerous ever goes through.
You can try the standard unix dictionary. /usr/share/dict/words. But you'll have to modify it to remove the naughty words.
http://en.wikipedia.org/wiki/Words_%28Unix%29
http://www.openwall.com/wordlists/
While this doesn't exactly answer your question, Runescape uses a white list of phrases, rather than words.
The implementation in Runescape is awkward, because there are so many phrases to choose from. You have to go through 3 or 4 menus sometimes to get to the phrase you want.
If you can come up with a better organization of phrases, then this might work for you.

How to overcome fear of user-input (web development)

I'm writing a web application for public consumption...How do you get over/ deal with the fear of User Input? As a web developer, you know the tricks and holes that exist that can be exploited particularly on the web which are made all the more easier with add-ons like Firebug etc
Sometimes it's so overwhelming you just want to forget the whole deal (does make you appreciate Intranet Development though!)
Sorry if this isn't a question that can be answered simply, but perhaps ideas or strategies that are helpful...Thanks!
One word: server-side validation (ok, that may have been three words).
There's lots of sound advice in other answers, but I'll add a less "programming" answer:
Have a plan for dealing with it.
Be ready for the contingency that malicious users do manage to sneak something past you. Have plans in place to mitigate damage, restore clean and complete data, and communicate with users (and potentially other interested parties such as the issuers of any credit card details you hold) to tell them what's going on. Know how you will detect the breach and close it. Know that key operational and development personnel are reachable, so that a bad guy striking at 5:01pm on the Friday before a public holiday won't get 72+ clear hours before you can go offline let alone start fixing things.
Having plans in place won't help you stop bad user input, but it should help a bit with overcoming your fears.
If its "security" related concerns you need to just push through it, security and exploits are a fact of life in software, and they need to be addressed head-on as part of the development process.
Here are some suggestions:
Keep it in perspective - Security, Exploits and compromises are going to happen to any application which is popular or useful, be prepared for them and expect them to occur
Test it, then test it again - QA, Acceptance testing and sign off should be first class parts of your design and production process, even if you are a one-man shop. Enlist users to test as a dedicated (and vocal) user will be your most useful tool in finding problems
Know your platform - Make sure you know the technology, and hardware you are deploying on. Ensure that relevant patches and security updates are applied
research - look at applications similar to your own and see what issues they experience, surf their forums, read their bug logs etc.
Be realistic - You are not going to be able to fix every bug and close every hole. Pick the most impactful ones and address those
Lots of eyes - Enlist as many people to review your designs and code as possible. This should be in addition to your QA resources
You don't get over it.
Check everything at server side - validate input again, check permissions, etc.
Sanitize all data.
That's very easy to write in bold letter and a little harder to do in practice.
Something I always did was wrap all user strings in an object, something like StringWrapper which forces you to call an encoding method to get the string. In other words, just provide access to s.htmlEncode() s.urlEncode().htmlEncode() etc. Of course you need to get the raw string so you can have a s.rawString() method, but now you have something you can grep for to review all uses of raw strings.
So when you come to 'echo userString' you will get a type error, and you are then reminded to encode/escape the string through the public methods.
Some other general things:
Prefer white-lists over black lists
Don't go overboard with stripping out bad input. I want to be able to use the < character in posts/comments/etc! Just make sure you encode data correctly
Use parameterized SQL queries. If you are SQL escaping user input yourself, you are doing it wrong.
First, I'll try to comfort you a bit by pointing out that it's good to be paranoid. Just as it's good to be a little scared while driving, it's good to be afraid of user input. Assume the worst as much as you can, and you won't be disappointed.
Second, program defensively. Assume any communication you have with the outside world is entirely compromised. Take in only parameters that the user should be able to control. Expose only that data that the user should be able to see.
Sanitize input. Sanitize sanitize sanitize. If it's input that will be displayed on the site (nicknames for a leaderboard, messages on a forum, anything), sanitize it appropriately. If it's input that might be sent to SQL, sanitize that too. In fact, don't even write SQL directly, use an intermediary of some sort.
There's really only one thing you can't defend from if you're using HTTP. If you use a cookie to identify somebody's identity, there's nothing you can do from preventing somebody else in a coffeehouse from sniffing the cookie of somebody else in that coffee house if they're both using the same wireless connection. As long as they're not using a secure connection, nothing can save you from that. Even Gmail isn't safe from that attack. The only thing you can do is make sure an authorization cookie can't last forever, and consider making them re-login before they do something big like change password or buy something.
But don't sweat it. A lot of the security details have been taken care of by whatever system you're building on top of (you ARE building on top of SOMETHING, aren't you? Spring MVC? Rails? Struts? ). It's really not that tough. If there's big money at stake, you can pay a security auditing company to try and break it. If there's not, just try to think of everything reasonable and fix holes when they're found.
But don't stop being paranoid. They're always out to get you. That's just part of being popular.
P.S. One more hint. If you have javascript like this:
if( document.forms["myForm"]["payment"].value < 0 ) {
alert("You must enter a positive number!");
return false;
}
Then you'd sure as hell have code in the backend that goes:
verify( input.payment >= 0 )
"Quote" everything so that it can not have any meaning in the 'target' language: SQL, HTML, JavaScript, etc.
This will get in the way of course, so you have to be careful to identify when this needs special handling, like through administrative privileges to deal with some if the data.
There are multiple types of injection and cross-site scripting (see this earlier answer), but there are defenses against all of them. You'll clearly want to look at stored procedures, white-listing (e.g. for HTML input), and validation, to start.
Beyond that, it's hard to give general advice. Other people have given some good tips, such as always doing server-side validation and researching past attacks.
Be vigilant, but not afraid.
No validation in web-application layer.
All validations and security checks should be done by the domain layer or business layer.
Throw exceptions with valid error messages and let these execptions be caught and processed at presentation layer or web-application.
You can use validation framework
to automate validations with the help
of custom validation attributes.
http://imar.spaanjaars.com/QuickDocId.aspx?quickdoc=477
There should be some documentation of known exploits for the language/system you're using. I know the Zend PHP Certification covers that issue a bit and you can read the study guide.
Why not hire an expert to audit your applications from time to time? It's a worthwhile investment considering your level of concern.
Our client always say: "Deal with my users as they dont differentiate between the date and text fields!!"
I code in Java, and my code is full of asserts i assume everything is wrong from the client and i check it all at server.
#1 thing for me is to always construct static SQL queries and pass your data as parameters. This limits the quoting issues you have to deal with enormously. See also http://xkcd.com/327/
This also has performance benefits, as you can re-use the prepared queries.
There are actually only 2 things you need to take care with:
Avoid SQL injection. Use parameterized queries to save user-controlled input in database. In Java terms: use PreparedStatement. In PHP terms: use mysql_real_escape_string() or PDO.
Avoid XSS. Escape user-controlled input during display. In Java/JSP terms: use JSTL <c:out>. In PHP terms: use htmlspecialchars().
That's all. You don't need to worry about the format of the data. Just about the way how you handle it.

How to write a spec for a website

As I'm starting to develop for the web, I'm noticing that having a document between the client and myself that clearly lays out what they want would be very helpful for both parties. After reading some of Joel's advice, doing anything without a spec is a headache, unless of course your billing hourly ;)
In those that have had experience,
what is a good way to extract all
the information possible from the
client about what they want their
website to do and how it looks? Good
ways to avoid feature creep?
What web specific requirements
should I be aware of? (graphic
design perhaps)
What do you use to write your specs in?
Any thing else one should know?
Thanks!
Ps: to "StackOverflow Purists" , if my question sucks, i'm open to feed back on how to improve it rather than votes down and "your question sucks" comments
Depends on the goal of the web-site. If it is a site to market a new product being released by the client, it is easier to narrow down the spec, if it's a general site, then it's a lot of back and forth.
Outline the following:
What is the goal of the site / re-design.
What is the expected raise in customer base?
What is the customer retainment goal?
What is the target demographic?
Outline from the start all the interactive elements - flash / movies / games.
Outline the IA, sit down with the client and outline all the sections they want. Think up of how to organize it and bring it back to them.
Get all changes in writing.
Do all spec preparation before starting development to avoid last minute changes.
Some general pointers
Be polite, but don't be too easy-going. If the client is asking for something impossible, let them know that in a polite way. Don't say YOU can't do it, say it is not possible to accomplish that in the allotted time and budget.
Avoid making comparisons between your ideas and big name company websites. Don't say your search function will be like Google, because you set a certain kind of standard for your program that the user is used to.
Follow standards in whatever area of work you are. This will make sure that the code is not only easy to maintain later but also avoid the chances of bugs.
Stress accessibility to yourself and the client, it is a big a thing.
More stuff:
Do not be afraid to voice your opinion. Of course, the client has the money and the decision at hand whether to work with you - so be polite. But don't be a push-over, you have been in the industry and you know how it works, so let them know what will work and what won't.
If the client stumbles on your technical explanations, don't assume they are stupid, they are just in another industry.
Steer the client away from cliches and buzz words. Avoid throwing words like 'ajax' and 'web 2.0' around, unless you have the exact functionality in mind.
Make sure to plan everything before you start work as I have said above. If the site is interactive, you have to make sure everything meshes together. When the site is thought up piece by piece, trust me it is noticeable.
One piece of advice that I've seen in many software design situations (not just web site design) relates to user expectations. Some people manage them well by giving the user something to see, while making sure that the user doesn't believe that the thing they're seeing can actually work.
Paper prototyping can help a lot for this type of situation: http://en.wikipedia.org/wiki/Paper_prototyping
I'm with the paper prototyping, but use iplotz.com for it, which is working out fine so far from us.
It makes you think about how the application should work in more detail, and thus makes it less likely to miss out on certain things you need to build, and it makes it much easier to explain to the client what you are thinking of.
You can also ask the client to use iplotz to explain the demands to you, or cooperate in it.
I also found looking for client questionnaires on google a good idea to help generate some more ideas:
Google: web client questionnaire,
There are dozens of pdfs and other forms to learn from