In GWT javadoc, we are advised
If you only need a simple label (text,
but not HTML), then the Label widget
is more appropriate, as it disallows
the use of HTML, which can lead to
potential security issues if not used
properly.
I would like to be educated/reminded about the security susceptibilities. It would be nice to list the description of the mechanisms of those risks.
Are the susceptibilities equally potent on GAE vs Amazon vs my home linux server?
Are they equally potent across the browser brands?
Thank you.
The security risk of using the HTML widget is that it doesn't escape html characters like the Label widget does. This opens up the possibility of Cross-site scripting (XSS). Therefore you should not use it to display data supplied from users. There's risk in itself to use it for string literals in your code.
How your GWT project is deployed doesn't matter much for the security risk, as the risk is there anyway if you allow user supplied data being printed back unescaped. But how your site is used, e.g. how much content is user contributed, and how popular your page is have a huge effect of how likely it is that someone actually will exploit a weakness.
Though ... if you have a habit of typing malicious javascript in your own sourcecode using Labels won't help anyway... :P
Related
We've for years been struggling on what Rich Text Editor to use for our clients. The large share of our client request concern issues with the editor, and we have summarized the causes to be 90% of the time a combination of the following:
Client does not understand anything of HTML and can only use what the editor provides.
Client wants to do "advanced stuff" like specifically formatted footers under photos (we create classes for these kinds of requests), combined with floating boxes and special fonts.
Client copies stuff from other places - especially Word.
Images go into the article as well.
From a client-perspective, these are not overly crazy things, and I want to be able to actually support this. However, after so many years and trying several editors (mainly CKEditor and TinyMCE) we always end up with continuous requests because things go wrong. We generally fix it manually or hack up things in the editor, but I am really interested if this is normal. Are there Rich Text Editors that can do the above easily and without causing much trouble? Do your clients also have problems with these things?
Rich text editing on the web is a complex problem. Effectively, vendors that sell products like ckeditor and tinyMCE are restricted by what the browsers provide in terms of a rich text editing engine.
Each editor is merely an iframed document with contentEditable set to true. This exposes the underlying RTE engine of the browser, including all the quirks and implementation specifics of that browser.
Bottom line is, you are never going to achieve a word like experience in the browser with the current state of browsers. Period. Your clients will always have an unrealistic expectation of rich text editing on the web.
Our job is to mitigate the problems through managing expectations. They need to know that this is the web, not Word. They need to know that most things will work OK, and some will work poorly. They also need to know that things are never going to be consistent across all browsers.
HTML e-mails are a complex beast. Deciding what to send (as the sender) and what to display (as the recipient) is tricky and potentially dangerous.
On the recipient side of things, we have webmail and we have regular e-mail clients. For my purposes, I consider 'webmail' anything that displays the HTML e-mail as part of something that in itself is HTML, and regular e-mail clients anything that displays the HTML e-mail in a different context (e.g. OS- and program-specific GUI).
What should webmail do with HTML headers (<head>, <title>, <meta>, ...) in an e-mail?
Is there a spec somewhere, be that as an actual standard or de-facto-standard?
My motivation for asking is that we use HTML Purifier to sanitise our HTML and if its Core.CollectErrors feature reports changes, they're reported. This 'reported' is both necessary... and frustrating. We strip out some of the reported errors as insignificant for our purposes, but HTML headers mark a massive hurdle:
Someone could potentially use <link> in their e-mail, which we would strip out. (HTML Purifier is intended for HTML fragments, not full documents)
The desire to use things like <link> in HTML e-mails certainly seems to exist, and there are plenty of e-mail clients that send <meta>-tags in an HTML header (e.g. Outlook), but how are things handled in the wild? Is it safe to strip them out silently (which for our purposes denotes a 'non-breaking change') and lay proverbial blame on the sending party if it does break? Is that reasonable? Has someone ever decided this in the one or other way? My google-fu is weak. :(
I seriously doubt there is a spec anywhere which specifies how HTML emails should be embedded into webmail clients. It's mostly a question of achieving parity with existing webmail providers which provide the ability to view HTML email. I suspect stylesheets are a notable exception, but I also suspect most HTML mailers that support that heavy styling are fairly constrained in what they can and cannot do, given how webmail things may handle them. I'd suggest doing some experiments, and consulting the source code of open-source webmail systems like SquirrelMail.
If you are worried about information loss, one thing that many clients allow you to do is download the original HTML, for offline viewing. Of course, it tends to be pretty atrocious, so I don't know why anyone would do that.
You should look at:
http://htmlemailboilerplate.com/
You will find a boilerplate code for HTML emails. There is also a good practice slideshow.
My approach to HTML email is to to write the sort of basic HTML we were doing in the 1990's - table layouts, minimal inline CSS (for colours only) and thats pretty much it. I don't know how modern clients deal with CSS positioning but people are still using Outlook 2003 which I believe is based on the hateful rendering engine that runs IE6 - so it pays to go for lowest common denominator.
I've never seen anything that looked like a standard for it, i've seen some email clients (GMail) strip out various things - including CSS, and others just ignore certain things (Outlook & background images).
Rationally, I can't think of what use any meta information would be in an email anyway - its hard enough to get people to read your mail anyway, I suspect even less will view the source! I've always included a title tag in case anything wants to use it as a subject - but even thats a stab in the dark.
Whenever i've looked at how mails are requested server side - admittedly a while ago, but I never noticed that anything was cached. You open the mail, the requests are made again. I'm sure things have moved on since I last checked, but personally - i'd still be inclined keep HTML email as simple and stripped back as possible.
They seem to go against the advice
You may be tempted to store the chosen
locale in a session or a cookie. Do
not do so. The locale should be
transparent and a part of the URL.
from the official Rails I18n guide
I tried seeing if they set a cookie for the locale, and it seems like they don't.
So how do they do it and why they chose not to use URLs for different languages, like http://github.com/en/foo, http://github.com/fr/foo, etc. ?
It uses the _gh_sess cookie to store that information.
Multiple URLS are sometimes avoided because they create something looking a lot like duplicate content for search engines. This can deplete your Google Karma and lead to poor SEO performance.
I would like to know if and how it is possible to create a clickable email-link for websites, that are "encrypted" in a way emailspiders can't collect them and it is still possible for living users to click it to open in email-clients or even copy it.
I saw some links that were done in javascript but I on't know how they did this and how "safe" they are.
thank you in advance for any reply
Most approaches to this are splitting the address across multiple elements and inserting extra formatting; then for JS-enabled browsers, they use JavaScript to turn it back into an e-mail address.
The poster example for this is SpamSpan, which even has several "levels" of obfuscation - each level progressively less and less resembles an e-mail in the source code, yet it still manages to piece it back together by JS. Although some spambots today are supposedly capable of executing JavaScript, te vast majority doesn't - and the e-mails are still human-readable with JS off. An advantage of JS-assisted de/obfuscation is that it doesn't rely on external servers, you just need to (simply) integrate the JS library.
Another approach is taken by reCAPTCHA Mailhide - the e-mail is revealed only after solving a CAPTCHA (same type as for normal reCAPTCHA). This is less convenient for the user, but practically safe against robots. A disadvantage of this is that it depends on reCAPTCHA's servers (in essence, on Google) - some people are dead-set against any external dependencies.
This would be a very simple and effective way:
Scramble email addresses
All it does is convert it into ASCII, and all you need to do is insert it where your email address would go!
Although there are more (crazily) secure ways you can choose, this would be the simply option. You can also try this solution, it uses JavaScript to protect your email.
Hope this helps!
Yes, I realize this question was asked and answered, but I have specific questions about this that I feel were not clear on that thread and I'd prefer not to get lost in the shuffle on another thread as well.
Previous threads said that rendering the email address to an image the way Facebook does is overkill and unprofessional user experience for business/professional websites. And it seems that the general consensus is to use a JavaScript document.write solution using html entities or some other method that breaks up and/or makes the string unreadable by a simple bot. The application I'm building doesn't even need the "mailto:" functionality, I just need to display the email address. Also, this is a business web application, so it needs to look/act as professional as possible. Here are my questions:
If I go the document.write route and pass the html entity version of each character, are there no web crawlers sophisticated enough to execute the javascript and pull the rendered text anyway? Or is this considered best practice and completely (or almost completely) spammer proof?
What's so unprofessional about the image solution? If Facebook is one of the highest trafficked applications in the world and not at all run by amateurs, why is their method completely dismissed in the other thread about this subject?
If your answer (as in the other thread) is to not bother myself with this issue and let the users' spam filters do all the work, please explain why you feel this way. We are displaying our users' email addresses that they have given us, and I feel responsible to protect them as much as I can. If you feel this is unnecessary, please explain why.
Thanks.
It is not spammer proof. If someone looks at the code for your site and determines the pattern that you are using for your email addresses, then specific code can be written to try and decipher that.
I don't know that I would say it is unprofessional, but it prevents copy-and-paste functionality, which is quite a big deal. With images, you simply don't get that functionality. What if you want to copy a relatively complex email address to your address book in Outlook? You have to resort to typing it out which is prone to error.
Moving the responsibility to the users spam filters is really a poor response. While I believe that users should be diligent in guarding against spam, that doesn't absolve the person publishing the address from responsibility.
To that end, trying to do this in an absolutely secure manner is nearly impossible. The only way to do that is to have a shared secret which the code uses to decipher the encoded email address. The problem with this is that because the javascript is interpreted on the client side, there isn't anything that you can keep a secret from scrapers.
Encoders for email addresses nowadays generally work because most email bot harvesters aren't going to concern themselves with coding specifically for every site. They are going to try and have a minimal algorithm which will get maximum results (the payoff isn't worth it otherwise). Because of this, simple encoders will defeat most bots. But if someone REALLY wants to get at the emails on your site, then they can and probably easily as well, since the code that writes the addresses is publically available.
Taking all this into consideration, it makes sense that Facebook went the image route. Because they can alter the image to make OCR all but impossible, they can virtually guarantee that email addresses won't be harvested. Given that they are probably one of the largest email address repositories in the world, it could be argued that they carry a heavier burden than any of us, and while inconvenient, are forced down that route to ensure security and privacy for their vast user base.
Quite a few reasons Javascript is a good solution for now (that may change as the landscape evolves).
Javascript obfuscation is a better mouse trap for now
You just need to outrun the others. As long as there are low hanging fruit, spammers will go for those. So unless everyone starts moving to javascript, you're okay for now at least
most spammers use http based scripts which GET and parse using regex. using a javascript engine to parse is certainly possible but will slow things down
Regarding the facebook solution, I don't consider it unprofessional but I can clearly see why purists may disagree.
It breaks accessibility standards (cannot be parsed by browsers, voice readers or be clicked.
It breaks semantic construct (it's an image, not a mailto link anymore)
It breaks the presentational layer. If you increase browser default font size or use high contrast custom CSS, it won't apply to the email.
Here is a nice blog post comparing a few methods, with benchmarks.
http://techblog.tilllate.com/2008/07/20/ten-methods-to-obfuscate-e-mail-addresses-compared/