mature, powerful Perl library to support text/string operation? - perl

Do we have a mature, powerful Perl library to support text/string operation ? for example, if I need to trim a string, I need to write a function like below.
Then question is, do we have an existing API so that I can import and call it ? just like StringUtils.trim(s) in Apache Common Lang.
Thanks.
sub trim($) {
my $string = shift;
$string =~ s/^\s+//;
$string =~ s/\s+$//;
return $string;
}

Most string operations in StringUtils are so trivially done in Perl that I question the need for such a module. Yes, it would produce more readable code to those less familiar with Perl, but it would require learning the peculiarities of the given routines, which would be more work for those more familiar with Perl.

Perhaps String::Util?
Searching CPAN should be your first step when looking for libraries.
Here are a few other string modules:
http://search.cpan.org/modlist/String_Language_Text_Processing/String
http://search.cpan.org/modlist/String_Language_Text_Processing/Text

Related

Evaluating escape sequences in perl

I'm reading strings from a file. Those strings contain escape sequences which I would like to have evaluated before processing them further. So I do:
$t = eval("\"$t\"");
which works fine. But I'm having doubt about the performance. If eval is forking a perl process each time, it will be a performance killer. Another way I considered to do the job were regex, where I have found related questions in SO.
My question: is there a better, more efficient way to do it?
EDIT: before calling eval in my example $t is containing \064\065\x20a\n. It is evaluated to 45 a<LF>.
It's not quite clear what the strings in the file look like and what you do to them before passing off to eval. There's something missing in the explanation.
If you simply want to undo C-style escaping (as also used in Perl), use Encode::Escape:
use Encode qw(decode);
use Encode::Escape qw();
my $string_with_unescaped_literals = decode 'ascii-escape', $string_with_escaped_literals;
If you have placeholders in the file which look like Perl variables that you want to fill with values, then you are abusing eval as a poor man's templating engine. Use a real one that does not have the dangerous side effect of running arbitrary code.
$string =~ s/\\([rnt'"\\])/"qq|\\$1|"/gee
string eval can solve the problem too, but it brings up a host of security and maintenance issues, like # in string
oh gah don't use eval for this, thats dangerous if someone provides it with input like "system('sync;reboot');"..
But, you could do something like this:
#!/usr/bin/perl
$string = 'foo\"ba\\\'r';
printf("%s\n", $string);
$string =~ s/\\([\"\'])/$1/g;
printf("%s\n", $string);

What Perl module can I use to test CGI output for common errors?

Is there a Perl module which can test the CGI output of another program? E.g. I have a program
x.cgi
(this program is not in Perl) and I want to run it from program
test_x_cgi.pl
So, e.g. test_x_cgi.pl is something like
#!perl
use IPC::Run3
run3 (("x.cgi"), ...)
So in test_x_cgi.pl I want to automatically check that the output of x.cgi doesn't do stupid things like, e.g. print messages before the HTTP header is fully outputted. In other words, I want to have a kind of "browser" in Perl which processes the output. Before I try to create such a thing myself, is there any module on CPAN which does this?
Please note that x.cgi here is not a Perl script; I am trying to write a test framework for it in Perl. So, specifically, I want to test a string of output for ill-formedness.
Edit: Thanks
I have already written a module which does what I want, so feel free to answer this question for the benefit of other people, but any further answers are academic as far as I'm concerned.
There's CGI::Test, which looks like what you're looking for. It specifically mentions the ability to test non-Perl CGI programs. It hasn't been updated for a while, but neither has the CGI spec.
There is Test::HTTP. I have not used it, but seems to have an interface that fits your requirements.
$test->header_is($header_name, $value [, $description]);
Compares the response header
$header_name with the value $value
using Test::Builder-is>.
$test->header_like($header_name, $regex, [, $description]);
Compares the response header
$header_name with the regex $regex
using Test::Builder-like>.
Look at the examples from chapter 16 from the perl cookbook
16.9. Controlling the Input, Output, and Error of Another Program
It uses IPC::Open3.
Fom perl cookbook, might be modified by me, see below.
Example 16.2
cmd3sel - control all three of kids in, out, and error.
use IPC::Open3;
use IO::Select;
$cmd = "grep vt33 /none/such - /etc/termcap";
my $pid = open3(*CMD_IN, *CMD_OUT, *CMD_ERR, $cmd);
$SIG{CHLD} = sub {
print "REAPER: status $? on $pid\n" if waitpid($pid, 0) > 0
};
#print CMD_IN "test test 1 2 3 \n";
close(CMD_IN);
my $selector = IO::Select->new();
$selector->add(*CMD_ERR, *CMD_OUT);
while (my #ready = $selector->can_read) {
foreach my $fh (#ready) {
if (fileno($fh) == fileno(CMD_ERR)) {print "STDERR: ", scalar <CMD_ERR>}
else {print "STDOUT: ", scalar <CMD_OUT>}
$selector->remove($fh) if eof($fh);
}
}
close(CMD_OUT);
close(CMD_ERR);
If you want to check that the output of x.cgi is properly formatted HTML/XHTML/XML/etc, why not run it through the W3 Validator?
You can download the source and find some way to call it from your Perl test script. Or, you might able to leverage this Perl interface to calling the W3 Validator on the web.
If you want to write a testing framework, I'd suggest taking a look at Test::More from CPAN as a good starting point. It's powerful but fairly easy to use and is definitely going to be better than cobbling something together as a one-off.

Where can I find simple exercises to practice string manipulation in Perl?

Does anyone know where I can find a collection of very simple exercises on string manipulation in order to practice Perl language?
I'm learning Perl just to use the resources it has got of string manipulation.
Learning Perl book has lots of exercises as far as I recall.
Also, search StackOverflow on "perl" and ("regex" or "split") and try to answer any questions yourself and then go and understand other people's answers
Also, go through any string manipulation tasks on Rosetta Code and do them in Perl
You could do this simply with Test::More. Do your manipulations and then test your guess:
my $subs1 = substr( 'apple', 0, 1 );
is( $subs1, 'a' );
The underlying testing system will usually tell you what you guessed wrong. If you did this in REPL (since it's very linear), it could serve as a training exercise.
You can start with
my $s = 'PERL';
$s =~ s/PERL/Perl/;
or
$s = ucfirst lc $s;
perldoc -q string

Can I make sure Perl code written on 5.10+ will run on 5.8?

Some of the new features of Perl 5.10 and 5.12, such as "say", are defined as features, that you can enable or disallow explicitly using the "feature" pragma. But other additions, like the named capture groups of regexes, are implicit.
When I write Perl using a 5.10+ interpreter, but want it to also run on 5.8, can I make Perl complain about using anything that's not in 5.8? Obviously, it is good practice to test your code on all major versions you intend it to run on, but it'd still be nice to have Perl warn me automatically.
When I want to ensure that a program will run under particular versions of perl, I test it under that version of perl. A feature of my release application tests under multiple perls before it actually uploads.
This requires that you have a proper test suite and write enough tests. It's easy to maintain several separate perl installations at the same time, too, as I show in Effective Perl Programming.
Test::MinimumVersion almost sounds like it might work, but it has several limitations. It only looks at the file you give it (so it will not check anything you load), and I don't think it actually looks inside regex patterns. Each of these report that the minimum version is 5.004, which is not true for any of them:
#!perl
use Perl::MinimumVersion;
my $p_flag = <<'SOURCE';
'123' =~ m/[123]/p; # 5.10 feature
SOURCE
my $named_capture = <<'SOURCE';
'123' =~ m/(?<num>[123])/; # 5.10 feature
SOURCE
my $r_line_ending = <<'SOURCE';
'123' =~ m/[123]\R/p; # 5.12 feature
SOURCE
my $say = <<'SOURCE';
say 'Hello';
SOURCE
my $smart_match = <<'SOURCE';
$boolean = '123' ~~ #array;
SOURCE
my $given = <<'SOURCE';
given( $foo ) {
when( /123/ ) { say 'Hello' }
};
SOURCE
foreach my $source ( $p_flag, $named_capture, $r_line_ending, $say, $smart_match, $given ) {
print "----Source---\n$source\n-----";
my $version = Perl::MinimumVersion->new( \$source )->minimum_version;
print "Min version is $version\n";
}
Part of the reason Perl::MinimumVersion works is because it looks for hints that the source gives it already, such as use 5.010, and use feature so on. However, that's not the only way to enable features. And, as you'll note, it misses things like the /p flag, at least until someone adds a check for that. However, you'll always be chasing things like that with a PPI solution.
It's easier just to compile it, run the tests, and find out.
I don't think you can make Perl check as you run the code, but
check out Perl::MinimumVersion and Test::MinimumVersion. (The latter is just a Test::Builder wrapper around the former.)
Well from reading other answers your answer is no, but perhaps App::perlbrew can help you install and manage multiple versions of perl for testing.

How can I identify and remove redundant code in Perl?

I have a Perl codebase, and there are a lot of redundant functions and they are spread across many files.
Is there a convenient way to identify those redundant functions in the codebase?
Is there any simple tool that can verify my codebase for this?
You could use the B::Xref module to generate cross-reference reports.
I've run into this problem myself in the past. I've slapped together a quick little program that uses PPI to find subroutines. It normalizes the code a bit (whitespace normalized, comments removed) and reports any duplicates. Works reasonably well. PPI does all the heavy lifting.
You could make the normalization a little smarter by normalizing all variable names in each routine to $a, $b, $c and maybe doing something similar for strings. Depends on how aggressive you want to be.
#!perl
use strict;
use warnings;
use PPI;
my %Seen;
for my $file (#ARGV) {
my $doc = PPI::Document->new($file);
$doc->prune("PPI::Token::Comment"); # strip comments
my $subs = $doc->find('PPI::Statement::Sub');
for my $sub (#$subs) {
my $code = $sub->block;
$code =~ s/\s+/ /; # normalize whitespace
next if $code =~ /^{\s*}$/; # ignore empty routines
if( $Seen{$code} ) {
printf "%s in $file is a duplicate of $Seen{$code}\n", $sub->name;
}
else {
$Seen{$code} = sprintf "%s in $file", $sub->name;
}
}
}
It may not be convenient, but the best tool for this is your brain. Go through all the code and get an understanding of its interrelationships. Try to see the common patterns. Then, refactor!
I've tagged your question with "refactoring". You may find some interesting material on this site filed under that subject.
If you are on Linux you might use grep to help you make list all of the functions in your codebase. You will probably need to do what Ether suggests and really go through the code to understand it if you haven't already.
Here's an over-simplified example:
grep -r "sub " codebase/* > function_list
You can look for duplicates this way too. This idea may be less effective if you are using Perl's OOP capability.
It might also be worth mentioning NaturalDocs, a code documentation tool. This will help you going forward.