Is HTML::StripScripts still safe for removing modern exploits? - perl

I need a way in Perl to strip naughty things, such as XSS, image interjection, and the works.
I found HTML::StripScripts but it hasn't updated in close to two years, and I'm not up to date with all the new exploits.
Is it safe?
What other markups languages (in Perl) would you use?

XSS is a vast topic and exploits come up every other day.
Just removing scripts will not make your code/site safe.
It is better to not try to strip (Blacklisting) certain things. It is safer to white list html/special characters you will allow on your site. i.e <b>, <i>
Defang seems to be the latest/greatest anti XSS lib for perl on cpan
Blacklisting vs Whitelisting
OWASP XSS Cheat Sheet
And I suggest playing with CAL9000 to get an idea of how widespread / tricky XSS is

HTML::StripScripts is a whitelist, and can use a tree-based parser and should be as safe as the whitelist.

Related

How to test/classify CPAN modules for utf8 correctness

Here is an excellent question and the wonderful tchrist's answer with 7+24+52 advices&comments how to make an perl program utf8 safe.
But here is 19k CPAN modules. What is possible to do for differentiating "good" and "bad" ones? (from the utf8's point of view)
For example: File::Slurp if you will read the file with
#use strict encoding warnings utf8 autodie... etc....
my $str = read_file($file, binmode => ':utf8');
you will get different results based on command line switches, and perl -CSDA will not work. Sad. (Yes, i know than Encode::decode("utf8", read_file($file, binmode => ':raw')); will help, but SAD anyway.
My questions:
is here any preferred way, how to test/classify what CPAN modules are utf8 safe/ready/correct?
is here some Test::something already done for utf8 testing?
is here something like Perl::Critic for utf8 - what will check the module source for possible utf8 incorrectness? (because manually checking sources for 7+24+52 things i cannot classify as the "easy way to programming")
or any other way? :)
I understand, than much of CPAN modules simply does not need to know about utf8. But here are zilion others what should.
Please, don't misunderstand me. I love Perl language. I know than perl has extremely powerful utf8 capability. (especially 5.14). The above was not mean as perl critique - but me (and probably some others too) need to know what CPAN modules are OK, and how classify them...)
When doing development using several CPAN modules, and initially everything goes well but in the final testing, you find that some modules does not support utf8, and therefore part of your work is useless - that really can cause a bit disillusionment. :(
Edit:
I understand than all complicated things around the unicode has two roots:
unicode itself - as tchrist excellently analyzed some of problematic points
perl - simple can't break all working modules, live servers etc - so need maintaining backward compatibility.
My only hope: perl6. Is is an totally new and different language. Don't need maintaining any backward compatibility. So I hope, in perl6 will be default some things what is not possible do in perl5 and all utf8 things will be much more intuitive.
But, back to modules: #daxim told: "Authors won't even reveal whether their module is taint-safe, and this feature exists for decades!" - and this is a catastrophe. Maybe (big maybe, and honestly haven't idea how to do it), but maybe we arrived to the time, when need put much-much harder restrictions into CPAN submissions.
At one side i'm really very happy with volunteer works of CPAN authors. At the other side, publishing source code is not only like a free speech "right" - but should obey some rules too.
I understand, than is is nearly impossible make any "revolution", but we probably need some "evolution". Maybe flag any CPAN module what is not utf8 safe. Flag all what are not taint safe. Flag (like here in SO) what module does not meet the minimal coding standards and remove them. Maybe I'm an idealist and/or naive. :)
Chill, the situation is less dire than you're thinking. No one except tchrist operates on this level of Unicode correctness, also see Aristotle's recent commentary. As with all things, you get 80% of the way with 20% of the effort. This base effort, namely getting the topic of character encoding right, is well documented; and jrockway repeats it in his answer in that thread.
Replies to your specific questions:
No, there isn't. There is no concerted effort to collect this information in a central place. The Perl 5 wiki could be used to document problematic modules, Juerd already discusses some in uniadvice. I would really like to see a statement from each module author in their documentation that "this module DTRT w.r.t. encoding", but I don't see it happening. Authors won't even reveal whether their module is taint-safe, and this feature exists for decades!
encoding::warnings can be used to smoke out unintended upgrades. I mention it in the work-flow of Checklist for going the Unicode way with Perl
You can't do that with Perl::Critic or static analysis. I see no other way than knowledgable people poking at the module with pointy characters until it falls apart (or not), like mirod just commented.

What text editor does most accurate job of syntax highlighting Perl

I know I risk asking a speculative question, however, inspired by this recent question I wonder which editor does the best job of syntax highlighting Perl. Being well aware of the difficulties (impossibilities) of parsing Perl I know there will not be a perfect case. Still I wonder if there is a clear leader in faithful representation.
N.B. I use gedit and it works fine, but with known issues.
Komodo Edit does a good job and also scans your modules (including those installed via CPAN) and does well at generating autocomplete data for them.
I'm a loyal vim user and rarely encounter anything odd with the native syntax.vim, except for these cases (I'll edit in more if/when I find them; others please feel free also):
!!expression is better written !!!!expression (everything after two ! is rendered as a comment quoted string; four ! brings everything back to normal)
m## or s### renders everything after the # as a comment; I usually use {} as a delimiter when avoiding / for leaning toothpick syndrome
some edge cases for $hash{key} where key is not a simple alphanumeric string - although it's safer to enclose such key names in '' anyway so as to not have to look up the exact cases for when a bareword is treated as a key name
I haven't used it, but Padre should be good since it's written in Perl. IIRC It uses PPI for parsing
jEdit...with the tweaks that I have amassed over the years. It's got the most customizable syntax highlighting I've ever seen.
I use Emacs in CPerl mode. I think it does a terrific job, although similar to Ether's answer, it's not perfect. What's more, I usually use Htmlize to publish highlighted code to the web. It's kind of annoying to use fancier forums like this one that do their own syntax highlighting, since it's not really any easier and the results aren't as good.

Do Perl CGI programs have a buffer overflow or script vulnerability for HTML contact forms?

My hosting company says it is possible to fill an HTML form text input field with just the right amount of garbage bytes to cause a buffer overflow/resource problem when used with Apache/HTTP POST to a CGI-Bin Perl script (such as NMS FormMail).
They say a core dump occurs at which point an arbitrary script (stored as part of the input field text) can be run on the server which can compromise the site. They say this isn't something they can protect against in their Apache/Perl configuration—that it's up to the Perl script to prevent this by limiting number of characters in the posted fields. But it seems like the core dump could occur before the script can limit field sizes.
This type of contact form and method is in wide use by thousands of sites, so I'm wondering if what they say is true. Can you security experts out there enlighten me—is this true? I'm also wondering if the same thing can happen with a PHP script. What do you recommend for a safe site contact script/method?
I am not sure about the buffer overflow, but in any case it can't hurt to limit the POST size anyway. Just add the following on top of your script:
use CGI qw/:standard/;
$CGI::POST_MAX=1024 * 100; # max 100K posts
$CGI::DISABLE_UPLOADS = 1; # no uploads
Ask them to provide you with a specific reference to the vulnerability. I am sure there are versions of Apache where it is possible to cause buffer overflows by specially crafted POST requests, but I don't know any specific to NMS FormMail.
You definitely should ask for specifics from your hosting company. There are a lot of unrelated statements in there.
A "buffer overflow" and a "resource problem" are completely different things. A buffer overflow suggests that you will crash perl or mod_perl or httpd themselves. If this is the case, then there is a bug in one of these components, and they should reference the bug in question and provide a timeline for when they will be applying the security update. Such a bug would certainly make Bugtraq.
A resource problem on the other hand, is a completely different thing. If I send you many megabytes in my POST, then I could eat an arbitrary amount of memory. This is resolvable by configuring the LimitRequestBody directive in httpd.conf. The default is unlimited. This has to be set by the hosting provider.
They say a core dump occurs at which point an arbitrary script (stored as part of the input field text) can be run on the server which can compromise the site. They say this isn't something they can protect against in their Apache/Perl configuration—that it's up to the Perl script to prevent this by limiting number of characters in the posted fields. But it seems like the core dump could occur before the script can limit field sizes.
Again, if this is creating a core dump in httpd (or mod_perl), then it represents a bug in httpd (or mod_perl). Perl's dynamic and garbage-collected memory management is not subject to buffer overflows or bad pointers in principle. This is not to say that a bug in perl itself cannot cause this, just that the perl language itself does not have the language features required to cause core dumps this way.
By the time your script has access to the data, it is far too late to prevent any of the things described here. Your script of course has its own security concerns, and there are many ways to trick perl scripts into running arbitrary commands. There just aren't many ways to get them to jump to arbitrary memory locations in the way that's being described here.
Formail has been vulnerable to such in the past so I believe your ISP was using this to illustrate. Bad practices in any perl script could lead to such woe.
I recommend ensuring the perl script verifies all user input if possible. Otherwise only use trusted scripts and ensure you keep them updated.

The future of Perl? (Perl 6, employability)

I've found a few related questions, like Python vs. Perl (now deleted) and Is Perl Worth it? (now deleted), but I can't seem to find anything that directly addresses this question.
Is there a legitimate future in Perl? I work in a Perl shop right now, and I came from PHP so I see some of the advantages of an arguably "lower" level language when doing things on the server-level, but it seems to me a lot of the tasks in Perl can be performed more quickly in PHP, and SOME ARGUE (subjective, not my opinion) that Python does these tasks in a more explicit way that's easier to maintain.
Is having this job on my resume ultimately going to make me less employable, especially if the language no longer grows?
A few notes:
I love Perl, so don't think I'm bashing the language. It's fun to use and we use a fairly verbose syntax that is relatively easy to maintain.
I realize that "Vaporware" is a buzzword that isn't necessarily applicable to this situation, because Perl doesn't have a marketing department and they're not "promising" Perl 6 by any date.
I realize that CPAN keeps the community going, so whether Perl 6 comes out or not people continue to build modules that increase possibilities in the language, but that doesn't mean that industry shops realize this, and switch to "more supported" languages that keep coming out with revised versions of the language like Python and (especially) PHP.*
EDIT {CLARIFICATION}
Cade Roux and Telemachus both brought up good points about whether or not your future can be defined by your resume.
To be honest, this was brought up when one of my former employers said "I don't hire anyone with Perl as their last job. That's OLD technology." This was a PHP shop, so take all that with a grain of salt.
Now without defaming my former employer, she's not a tech person AT ALL, so she was really expressing an opinion of a layperson, and in this case my question was more along the lines of "Is there a stigma on this particular technology placed on it by people who don't utilize it?", specifically more along the lines of people who may have had past experience with similar employers. I'm not asking you to look into the future with a magic glass to assume what the next "hot" language would be, but rather if this particular language (which is accused of stunted growth, again by laypeople) has negative connotations placed upon it.
I hope that makes a little more sense.
Plenty of shops - including on Wall Street - heavily use Perl and will continue to do so.
However, I have never seen a PHP or Python used in this industry (not saying it is not used, but that I never encountered. Purely personal anecdote. Nor have I EVER heard any conversation of "Perl can not do X that Python can, let's use Python").
Perl6 is irrelevant to job picture.
Many shops are still on 5.8 or G-d forbid 5.6
More importantly, perl5 continues to evolve, including with features/ideas from Perl6. See Perl 5.10 and 5.11
Plus evolution includes really cool framework like Moose etc...
I can probably come up with more bullets later, but the summary is that no, having a Perl job will in no way negatively affect your career prospects.
However, knowing nothing but Perl may affect it negatively, so make sure you know Java, C#, C++ or something besides dynamic interpreted languages. Not many shops would hire "Perl Only" developer, even if they gladly hire "Perl + other stuff" ones.
See Tim Bunce's Perl Myths slides on slide share.
In short, Perl is not dead and has lots of jobs available.
Anyone who actually watches the development of Perl, would know that that there has perhaps been more work on the Perl language in the past decade, than in the previous decade.
This has been spurred on by the introduction of Perl6.
The introduction of Perl 6 spurred on, the now deeply ingrained, testing culture.
Just look at how much the Rakudo implementation of Perl 6, is tested:
Rakudo Progress http://rakudo.de/progress.png
There has also been a lot of back-porting of Perl 6 features into Perl 5.
For example, the Perl 6 "switch" statement
#!/usr/bin/perl
use strict;
use warnings;
use 5.10.1;
# or
use feature qw'switch say';
my $str = "testing 123";
given( $str ){
when(/(\d+)/){
say $1;
}
when( [0..10] ){
say $_, 'is equal to some number between 0 and 10';
# given, sets the current topic "$_"
}
}
There are few languages I would tie my career to. Perl will always be there and it will always be the best tool for certain kinds of jobs. But this is true for many languages. However, there are also languages which have more competition in some of the spaces where they are used. Perl is one language that has a lot more strong niches.
Still, you wouldn't restrict yourself to using just one language for your entire life - or even in one project if there are better options to solve a problem.
Career-wise, there are basic technologies which are fairly universally used, and of these I think a few of the most valuable are: relational database concepts and SQL, XML/HTML/HTTP/DOM, regular expressions. These are all basically independent of any particular vendor or language, and if you are strong in these areas, choice of language and platform are going to be informed by the problem being addressed.
Perl is, and always will be, a practical language for manipulating large amounts of data. I work in an industry where moving, converting, and parsing large amounts of text and image data is what we do, and I couldn't live without Perl.
Likewise, if you're a sysadmin (especially a Unix one), Perl is a necessary tool. There are tons of places where you need to be able to whip up a quick and dirty application that runs right along with the shell functions.
Languages have niches. Perl has a big stable niche, in many ways much more stable than fad-driven web languages. PHP, for example, is a nice little web language, but its saving grace is that it's quick and easy to develop in, not that it is a particularly great language. I'll tend to use PHP over Perl for web applications (though I use Python over PHP, if I have time), but 90% of the stuff I do in my day-to-day would be nearly impossible in PHP, and is flat trivial in Perl.
#Nate: I love Python. LOVE it. I actually worry that I love it too much, and I'm being irrational about it. PHP is a nice tool, but when your main selling point is "Quick and Easy" then you're running a risk. That was the big push behind original Visual Basic, and we all know how that worked out.
I'd discourage you from putting Perl on your resume - there's already too many people in the perl market and we don't want any more! ... just kidding.
The past is supposedly no guide to the future, but, despite having plenty of C (etc.) and Java in my 'skills toolbag' I've seen more gainful employ from my Perl than anything else over the last decade.
I suspect that offshore-perl-new-build may not be the biggest market in the future, but there's certainly active development in the city and media industries in the UK.
Otherwise, I'd just agree with the points above. Technicians with diverse skills are more able to pick the right tools, and less inclined to 'get religious' about language choice.
If you're looking at a post where the non-technical management have a strong point of view about what technology should and shouldn't be used - I'd place that one in the 'avoid' pile.
To add another separate answer - as you have noted - there is a very real danger when dealing with recruiters and others that your resume will be interpreted and things inferred that are not necessarily how you see yourself, and you might get pigeon-holed.
This WILL happen both ways - too much variation and you aren't an expert in anything OR too little variation and you are only good at one thing.
I don't have a simple answer for combatting that, except to ensure that you emphasize portable skills and also achievements which are independent of technology - making the company more money, landing new business, making new markets, etc.
Perl is another tool in your toolbox. If I have an opening and one person is narrow focused to a specific technology, and another has a broad range of skills I would be more inclined to hire the one with the wider range of skills even if they might not be quite as deeply knowledgeable. Some one who has a wide range of skills on a range of platforms is someone who can think, innovate and adapt.
I don't understand the point of this question. You have a job and you already know Perl. You can ask whether or not to learn new languages and which ones to learn (please don't, but you could), but none of us can or should predict whether or not you're going to get another job using Perl.
You ask, "Is having this job on my resume ultimately going to make me less employable, especially if the language no longer grows?"
Well, it's better than a blank resume, and you can't change your past, so really what are we talking about here?

How can I combine Catalyst and ngettext?

I'm trying to get my head around i18n with Catalyst. As far as I understood the matter, there are two ways to make translations with Perl: Maketext and Gettext. However, I have a requirement to support gettext's .po format so basically I'm going with gettext.
Now, I've found Catalyst::Plugin::I18n and thus Locale::Maketext::Lexicon, which does what I want most of the time. However, it doesn't generate proper pluralization forms, i.e. properly writing msgid_plural and msgstr[x] into the .pot file. This happens probably because Maketext depends on its bracket notation [quant,_1...] and thus has to have the same notation in the translation.
Yet another solution might be using some direct gettext port like Locale::Messages, however this would mean rewriting C::P::I18n.
Does anybody have a proper solution for this problem apart from rewriting several modules? Anything that combines proper gettext with all its features and Catalyst will do.
You will probably get a better answer on the mailing list:
http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
I assume you've also read this:
http://www.catalystframework.org/calendar/2006/18