Is there a mature, powerful Perl library for text/string operations? For example, if I need to trim a string, I have to write a function like the one below.
So the question is: is there an existing API that I can import and call, just like StringUtils.trim(s) in Apache Commons Lang?
Thanks.
sub trim($) {
    my $string = shift;
    $string =~ s/^\s+//;    # strip leading whitespace
    $string =~ s/\s+$//;    # strip trailing whitespace
    return $string;
}
Most string operations in StringUtils are so trivially done in Perl that I question the need for such a module. Yes, it would produce more readable code to those less familiar with Perl, but it would require learning the peculiarities of the given routines, which would be more work for those more familiar with Perl.
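To illustrate how little code such helpers need, here is a minimal sketch in core Perl. The names trim, squeeze and is_blank are made up for this example and do not come from any CPAN module.

```perl
use strict;
use warnings;

# Hypothetical helper names; none of these come from a CPAN module.

# Strip leading and trailing whitespace; returns a new string.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

# Collapse every run of whitespace into a single space.
sub squeeze {
    my ($s) = @_;
    $s =~ s/\s+/ /g;
    return $s;
}

# True if the string is empty or contains only whitespace.
sub is_blank {
    my ($s) = @_;
    return $s !~ /\S/;
}

print trim("  hello world \n"), "|\n";
print squeeze("a   b\t\tc"), "\n";
print is_blank("   ") ? "blank\n" : "not blank\n";
```

Each helper copies its argument before modifying it, so the caller's variable is left untouched.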
Perhaps String::Util?
Searching CPAN should be your first step when looking for libraries.
Here are a few other string modules:
http://search.cpan.org/modlist/String_Language_Text_Processing/String
http://search.cpan.org/modlist/String_Language_Text_Processing/Text
See Update Below
I am going through a whole bunch of Perl scripts that someone at my company wrote. He used join to concatenate strings. For example, he does this (taken hot out of a real Perl script):
$fullpath=join "", $Upload_Loc, "/", "$filename";
Instead of this:
$fullpath = "$Upload_Loc" . "/" . "$filename";
Or even just this:
$fullpath = "$Upload_Loc/$filename";
He's no longer here, but the people who are here tell me he concatenated strings in this way because it was somehow better. (They're not too clear why).
So, why would someone use join in this manner over using the . concatenation operator, or just typing the strings together as in the third example? Is there a valid reason for this style of coding?
I'm trying to clean up a lot of the mess here, and my first thought would be to end this practice. It makes the code harder to read, and I'm sure doing a join is not a very efficient way to concatenate strings. However, although I've been writing scripts in Perl since version 3.x, I don't consider myself a guru because I've never had a chance to hang around with people who were better at Perl than I am and could teach me Perl's deep inner secrets. I just want to make sure that my instinct is correct here before I make a fool of myself.
I've got better ways of doing that around here.
Update
People are getting confused. He isn't just using join for concatenating paths. Here's another example:
$hotfix=join "", "$app", "_", "$mod", "_", "$bld", "_", "$hf", ".zip";
Whereas I would do something like this:
$hotfix = $app . "_" . $mod . "_" . $bld . "_" . "$hf.zip";
Or, more likely
$hotfix = "${app}_${mod}_${bld}_${hf}.zip";
Or maybe in this case, I might actually use join because the underscore causes problems:
$hotfix = join("_", $app, $mod, $bld, $hf) . ".zip";
My question is still: Is he doing something that real Perl hackers know, and a newbie like me who's been doing this for only 15 years doesn't know about? Do people look at me concatenating strings using . or just putting them in quotes and say "Ha! What a noob! I bet he owns a Macintosh too!"
Or, does the previous guy just have a unique style of programming much like my son's unique style of driving includes running head on into trees?
I've done my fair share of commercial Perl development for "a well known online retailer", and I've never seen join used like that. Your third example would be my preferred alternative, as it's simple, clean and readable.
Like others here, I don't see any genuine value in using join as a performance enhancer. It might well perform marginally better than string interpolation but I can't imagine a real-world situation where the optimisation could be justified yet the code still written in a scripting language.
As this question demonstrates, esoteric programming idioms (in any language) just lead to a lot of misunderstanding. If you're lucky, the misunderstanding is benign. The developers I enjoy working alongside are the ones who code for readability and consistency and leave the Perl Golf for the weekends. :)
In short: yes, I think his unique style is akin to your son's unique style of driving. :)
I would consider
$fullpath = join "/", $Upload_Loc, $filename;
clearer than the alternatives. However, File::Spec has been in the core for a long time, so
use File::Spec::Functions qw( catfile );
# ...
$fullpath = catfile $Upload_Loc, $filename;
is much better. And, better yet, there is Path::Class:
use Path::Class;
my $fullpath = file($Upload_Loc, $filename);
Speed is usually not a factor I consider in concatenating file names and paths.
The example you give in your update:
$hotfix=join "", "$app", "_", "$mod", "_", "$bld", "_", "$hf", ".zip";
demonstrates why the guy is clueless. First, there is no need to interpolate those individual variables. Second, that is better written as
$hotfix = join '_', $app, $mod, $bld, "$hf.zip";
or, alternatively, as
$hotfix = sprintf '%s_%s_%s_%s.zip', $app, $mod, $bld, $hf;
with reducing unnecessary punctuation being my ultimate goal.
In general, unless the lists of items to be joined are huge, you will not see much of a performance difference changing them over to concatenations. The main concern is readability and maintainability, and in those cases, if the string interpolation form is clearer, you can certainly use that.
I would guess that this is just a personal coding preference of the original programmer.
In general, I use join when the length of the list is large/unknown, or if I am joining with something other than the empty string (or a single space for array interpolation). Otherwise, using . or simple string interpolation is usually shorter and easier to read.
Perl compiles double-quoted strings into things with join and . catenation in them:
$ perl -MO=Deparse,-q -e '$fullpath = "$Upload_Loc/$filename"'
$fullpath = $Upload_Loc . '/' . $filename;
-e syntax OK
$ perl -MO=Deparse,-q -le 'print "Got @ARGV"'
BEGIN { $/ = "\n"; $\ = "\n"; }
print 'Got ' . join($", @ARGV);
-e syntax OK
which may inspire you to things like this:
$rx = do { local $" = "|"; qr{^(?:@args)$} };
as in:
$ perl -le 'print $rx = do { local $" = "\t|\n\t"; qr{ ^ (?xis: @ARGV ) $ }mx }' good stuff goes here
(?^mx: ^ (?xis: good |
stuff |
goes |
here ) $ )
Nifty, eh?
Interpolation is a little slower than joining a list. That said I've never known anyone to take it to this extreme.
You could use the Benchmark module to determine how much difference there is.
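As a rough sketch of such a measurement, the core Benchmark module's cmpthese can time the three styles from the question side by side. The values assigned to $app, $mod, $bld and $hf are invented, and results will vary by Perl version and machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Illustrative values; the variable names mirror the question's example.
my ($app, $mod, $bld, $hf) = qw(shop cart 42 7);

# Three ways to build the same string.
my %build = (
    join   => sub { join '_', $app, $mod, $bld, "$hf.zip" },
    concat => sub { $app . '_' . $mod . '_' . $bld . '_' . $hf . '.zip' },
    interp => sub { "${app}_${mod}_${bld}_${hf}.zip" },
);

# A negative count means "run each for at least that many CPU seconds";
# cmpthese then prints a table of rates and relative differences.
cmpthese(-1, \%build);
```

Whatever the table shows, the differences only matter if this code sits in a hot loop.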
Also, you could ask this question over on http://perlmonks.org/. There are real gurus there who can probably give you the inner secrets much better than I can.
All of those approaches are fine.
Join can sometimes be more powerful than . concatenation, particularly when some of the things you are joining are arrays:
join "/", "~", @document_path_elements, $myDocument;
While recognizing that in all the examples I see here, there are no significant performance differences, a series of concatenations, whether with . or with double-quotish interpolation, is indeed going to be more memory-inefficient than a join, which precomputes the needed string buffer for the result instead of expanding it several times (potentially even needing to move the partial result to a new location each time).
I have a problem with the criticism I see leveled here; there are many right ways to speak perl, and this is certainly one of them.
Inconsistent indentation, on the other hand...
I have a Perl codebase, and there are a lot of redundant functions and they are spread across many files.
Is there a convenient way to identify those redundant functions in the codebase?
Is there any simple tool that can verify my codebase for this?
You could use the B::Xref module to generate cross-reference reports.
I've run into this problem myself in the past. I've slapped together a quick little program that uses PPI to find subroutines. It normalizes the code a bit (whitespace normalized, comments removed) and reports any duplicates. Works reasonably well. PPI does all the heavy lifting.
You could make the normalization a little smarter by normalizing all variable names in each routine to $a, $b, $c and maybe doing something similar for strings. Depends on how aggressive you want to be.
#!perl
use strict;
use warnings;
use PPI;

my %Seen;    # normalized sub body => "name in file" of its first occurrence

for my $file (@ARGV) {
    my $doc = PPI::Document->new($file)
        or do { warn "Could not parse $file\n"; next };
    $doc->prune("PPI::Token::Comment");    # strip comments

    # find() returns an empty string, not an array ref, when nothing matches
    my $subs = $doc->find('PPI::Statement::Sub') || [];
    for my $sub (@$subs) {
        my $code = $sub->block or next;    # skip forward declarations
        $code =~ s/\s+/ /g;                # normalize whitespace
        next if $code =~ /^\{\s*\}$/;      # ignore empty routines

        if ( $Seen{$code} ) {
            printf "%s in $file is a duplicate of $Seen{$code}\n", $sub->name;
        }
        else {
            $Seen{$code} = sprintf "%s in $file", $sub->name;
        }
    }
}
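The smarter normalization mentioned above (renaming variables before comparing) could be sketched roughly as follows. The function name normalize_vars is hypothetical, and the regex is deliberately naive: it also rewrites sigils inside strings and special variables such as $_, so the output is only useful for comparison, not for execution.

```perl
use strict;
use warnings;

# Rename variables to $v1, $v2, ... in order of first appearance, so that
# bodies differing only in naming compare equal. The sigil is preserved;
# the alias is keyed on the bare name, so $x and @x share one alias.
sub normalize_vars {
    my ($code) = @_;
    my (%alias, $n);
    $code =~ s/([\$\@%])(\w+)/$1 . ($alias{$2} ||= 'v' . ++$n)/ge;
    return $code;
}

my $one = normalize_vars('{ my $total = $price * $qty; return $total }');
my $two = normalize_vars('{ my $sum = $cost * $count; return $sum }');
print $one eq $two ? "same shape\n" : "different\n";
```

In the script above, this would be applied to $code after the whitespace normalization and before the %Seen lookup.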
It may not be convenient, but the best tool for this is your brain. Go through all the code and get an understanding of its interrelationships. Try to see the common patterns. Then, refactor!
I've tagged your question with "refactoring". You may find some interesting material on this site filed under that subject.
If you are on Linux you might use grep to help you list all of the functions in your codebase. You will probably need to do what Ether suggests and really go through the code to understand it if you haven't already.
Here's an over-simplified example:
grep -r "sub " codebase/* > function_list
You can look for duplicates this way too. This idea may be less effective if you are using Perl's OOP capability.
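The same idea can be sketched in Perl itself with the core File::Find module. This version only reports sub names defined in more than one place; the function name find_sub_definitions and the .pl/.pm filter are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Find;

# Map each sub name to the places where it is defined under the given
# directories. Only plain "sub name" at the start of a line is matched,
# so anonymous subs and generated methods are missed.
sub find_sub_definitions {
    my @dirs = @_;
    my %where;
    return \%where unless @dirs;
    find(sub {
        return unless -f && /\.(?:pl|pm)$/;
        open my $fh, '<', $_ or return;
        while (my $line = <$fh>) {
            next unless $line =~ /^\s*sub\s+(\w+)/;
            push @{ $where{$1} }, sprintf '%s:%d', $File::Find::name, $.;
        }
        close $fh;
    }, @dirs);
    return \%where;
}

# Directories to scan are taken from the command line.
my $where = find_sub_definitions(@ARGV);
for my $name (sort keys %$where) {
    my @places = @{ $where->{$name} };
    print "$name: @places\n" if @places > 1;
}
```

Identical names are only a hint; the bodies still have to be compared to confirm they are redundant.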
It might also be worth mentioning NaturalDocs, a code documentation tool. This will help you going forward.
Inspired by the original thread and the up and coming clones, here's one for the Perl community.
What are questions a good Perl programmer should be able to respond to?
Why is use strict helpful?
My bellwether question is What's the difference between a list and an array?.
I also tend to like asking people to show me as many ways as they can to define a scope. There's one that people almost always forget, and another that most people think provides a scope but doesn't.
What is the difference between
if ($foo) { ... }
and
if (defined $foo) { ... }
and when should you use one over the other?
Questions
What is a reference?
How does Perl implement object orientation?
How does Perl's object orientation differ from other languages like C# and Java?
Traditional object orientation in core Perl has largely been superseded by what?
Why?
What is the difference between a package and a module?
What features were implemented in 5.10?
What is a Schwartzian transform?
Explain the difference between these lines of code and the values of the variables.
my $a = (4, 5, 6);
my @a = (4, 5, 6);
my $b = 4, 5, 6;
my $c = @a;
What are some of Perl's greatest strengths?
What are some of Perl's greatest weaknesses?
Name several hallmarks of the "modern Perl" movement.
What does the binding operator do?
What does the flip-flop operator do?
What is the difference between for and foreach?
What makes Perl difficult to parse?
What are prototypes?
What is AUTOLOAD?
What is the Perl motto?
Why is this a problem?
What does use strict; do? Why is it useful?
What does the following block of code do?
print (3 + 4) * 2;
Tests
Implement grep using map.
Implement and use a dispatch table.
Given a block of text, replace a word in that block with the return value of a function that takes that word as an argument.
Implement a module, including documentation compatible with perldoc.
Slurp a file.
Draw a table that illustrates Perl's concept of truthiness.
What is the difference between my and our?
What is the difference between my and local?
For the above, when is it appropriate to use one over the other?
What are list context and scalar context?
What is the difference between my $x = ... and my($x) = ...?
What does my($x,undef,$z) = ... do?
Why is my(@a,@b) = (@list1, @list2) likely a bug?
How can a user-defined sub know whether it was called in list or scalar context? Give an example of when it makes sense for the same sub to return different values in one context or the other.
What is the difference between /a(.*)b/ and /a(.*?)b/?
What's wrong with this code?
my @array = qw/a b c d e f g h/;
for ( @array ) {
my $val = shift @array;
print $val, "\n";
}
my $a = 1;
if($a) {
my $a = 2;
}
print $a;
What is the value of $a at the end?
What's wrong with using a variable as a variable name?
Study guide: Part 1, Part 2, and Part 3.
I think brian d foy's approach is an ingenious tactic to test knowledge, understanding, and partiality about the language and the programming craft in general: What are five things you hate about your favorite language?. If they can't name 5 they probably aren't great with the language, or are totally inept at other approaches.
He applies this to people trying to push a language; I would extend that and say it is just as applicable here. I would expect every good Perl programmer to be able to name five things they don't like. And I would expect those five things to have some degree of merit.
Write code that builds a moderately complex data structure, say an array of hashes of arrays. How would you access a particular leaf? How would you traverse the entire structure?
For each of the following problems, how would you solve it using hashes?
Compute set relationships, e.g., union, intersection, mutual exclusion.
Find unique elements of a list.
Write a dispatch table.
My favourite question. What is the following code missing?
open(my $fh, "<", "file.txt");
while (<$fh>) {
print $_;
}
close($fh);
This question should open a discussion about error handling in Perl. It can be adapted to other languages too.
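One possible answer, sketched with the common "or die" idiom and $!. The function name print_file is invented for the example, and file.txt is just the question's placeholder.

```perl
use strict;
use warnings;

# Print a file line by line, dying with the OS error if a step fails.
sub print_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open '$path': $!";
    while (my $line = <$fh>) {
        print $line;
    }
    close($fh) or die "Cannot close '$path': $!";
    return 1;
}

# eval traps the die so the caller decides how to report the failure.
eval { print_file('file.txt'); 1 } or warn "Skipped: $@";
```

A candidate might equally mention the autodie pragma, which makes open and close throw on failure without the explicit checks.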
Some months ago, chromatic—author of Modern Perl—wrote a similar nice article “How to Identify a Good Perl Programmer,” which contains a list of questions that every good Perl programmer should be able to answer effectively.
Some of those nice questions are given below:
What’s the difference between accessing an array element with $items[$index] and @items[$index]?
What’s the difference between == and eq?
How do you load and import symbols from a Perl 5 module?
What is the difference, on the caller side, between return; and return undef;?
What is the difference between reading a file with for and with while?
For complete details read How to Identify a Good Perl Programmer.
How is $foo->{bar}[$baz]($quux) evaluated?
What is a lexical closure? When are closures useful? (Please, no counter-creators!)
What is the difference between list context and scalar context? How do you access each? Is there such a thing as Hash context? Maybe a little bit?
How do you swap the values of two variables without using a temporary variable?
What does this one-liner print and why:
perl -pe '}{$_ = $.' file
answer: number of lines in the file, similar to wc -l.
What's wrong with this code:
my $i;
print ++$i + ++$i;
answer: modifying a variable twice in the same statement leads to undefined behaviour.
Simple one: will the if block run:
my @arr = undef;
if (@arr) { ... }
answer: yes
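A short sketch of why: assigning undef to an array is a list assignment of one value, so the array holds a single undefined element and is true in boolean context. The @empty array is added here only for contrast.

```perl
use strict;
use warnings;

my @arr   = undef;   # list assignment of one value: (undef)
my @empty = ();      # a genuinely empty array, for contrast

print scalar(@arr), "\n";      # number of elements in @arr
print scalar(@empty), "\n";    # number of elements in @empty
print defined $arr[0] ? "defined\n" : "undef\n";
print "block runs\n" if @arr;
```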
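How would you code a reverse() Perl builtin yourself? You can use other Perl functions.
answer: many ways. A short one: sub my_reverse {sort {1} @_} (though this depends on how the sort implementation treats an inconsistent comparator, so it is not guaranteed to work)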
I would also probably dig on regex, as I expect every good Perl programmer to master regex (but not just that). Some possible questions:
what are lookahead and lookbehind assertions/modifiers?
how do you check that two individual parts of a regex are identical?
what does greedy mean?
what are POSIX character classes?
what does \b match?
what is the use of the \c modifier?
how do you precompile a regex?
What is the correct way to initialize an empty string?
my $str = q{};
or
my $str = "";
or
my $str = '';
First off, does anyone have a comprehensive list of the Perl special variables?
Second, are there any tasks that are much easier using them? I always unset $/ to read in files all at once, and $| to automatically flush buffers, but I'm not sure of any others.
And third, should one use the Perl special variables, or be more explicit in one's coding? Personally I'm a fan of using the special variables to manipulate the way code behaves, but I've heard others argue that it just confuses things.
They are all documented in perlvar.
Note that the long names are only usable if you use English qw( -no_match_vars ); first.
Always remember to local'ize your changes to the punctuation variables. Some of the punctuation variables are useful, others should not be used. For instance, $[ should never be used (it changes the base index of arrays, so local $[ = 1; will cause 1 to refer to the first item in a list or array). Others like $" are iffy. You have to balance the convenience of not having to do the join manually against the cost in readability. For instance, which of these is easier to understand?
local $" = " :: "; #"
my $s = "@a / @b / @c\n";
versus
my $sep = " :: ";
my $s = join(" / ", join($sep, @a), join($sep, @b), join($sep, @c)) . "\n";
or
my $s = join(" / ", map { join " :: ", @$_ } \(@a, @b, @c)) . "\n";
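The advice to localize changes can be sketched with the two variables mentioned in this thread, $/ and $". The helper names slurp and show are made up for the example; in both, local restores the previous value when the sub returns.

```perl
use strict;
use warnings;

# Slurp a whole file; the change to $/ is undone when the sub returns.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open '$path': $!";
    local $/;            # undef input record separator: read everything
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Interpolate an array with a custom separator, again only locally.
sub show {
    my @items = @_;
    local $" = ', ';
    return "[@items]";
}

print show(qw(a b c)), "\n";
my @again = qw(x y);
print "@again\n";    # outside show(), $" is a single space again
```

Without local, every later read or interpolation in the program would silently inherit the changed value.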
1) As far as which ones I use often:
$! is quintessential for IO error handling
$@ for eval error handling when calling mis-designed libraries (like database ones) whose coders weren't considerate enough to code in decent error handling other than "die"
$_ for map/grep blocks, although I 100% agree with a poster above that using it for regular code is not a good practice.
$| for flushing buffers
2) As far as using punctuation vs. English names, I'll pick on Marc Bollinger's reply above although the same rebuttal goes for anyone arguing that there's no benefit to using English names.
"if you're using Perl, you're obviously not choosing it for neophyte readability"
Marc, I find that is not always (or rather almost never) true. Then again, 99% of my Perl experience is writing production Perl code for large companies, 90% of it full fledged applications instead of 10-line hack scripts, so my analysis may not apply in other domains. The reasons such thinking as Marc's is wrong are:
Just because I'm a Perl non-neophyte (to put it mildly), some noob analyst hired a year ago - or an outsourced "genius" - is probably not. You may not want to confuse them any more than they already are. "If code was hard to write, it should be hard to read" is not exactly high on the list of good attitudes of professional developers, in any language.
When I'm up at 2am, half-asleep and troubleshooting a production problem, I really do not want to depend on the ability of my already-nearly-blind eyes to distinguish between $! and $|. Especially in code written by the aforementioned "genius" who may not have known which one of them to use and switched them around.
When I'm reading code left unfinished by a guy who was cough "restructured" cough out of the company a year ago, I'd rather concentrate on the intricacies of screwy logic than on the readability of the punctuation soup.
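For comparison, here is a small sketch of the English names in use. The path is a deliberately missing, made-up file; $OS_ERROR, $EVAL_ERROR and $OUTPUT_AUTOFLUSH are the documented long names for $!, $@ and $|.

```perl
use strict;
use warnings;
use English qw( -no_match_vars );

# $OUTPUT_AUTOFLUSH is $|; hard to confuse with $OS_ERROR ($!) at 2am.
$OUTPUT_AUTOFLUSH = 1;

# The path below is a deliberately missing, made-up file.
my $ok = eval {
    open my $fh, '<', '/no/such/file.txt'
        or die "open failed: $OS_ERROR\n";
    1;
};
print "caught: $EVAL_ERROR" unless $ok;    # $EVAL_ERROR is $@
```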
The three I use the most are $_, @_ and $!.
I like to use $_ when looping through an array, retrieving parameters (as pointed out by Motti, this is actually @_) or performing substitutions:
Example 1.1:
foreach (@items)
{
print $_;
}
Example 1.2:
my $prm1 = shift; # implicit use of @_ or @ARGV depending on context
Example 1.3:
s/" "/""/ig; # implicit use of $_
I use $! in cases like this:
Example 2.1:
open(FILE, ">>myfile") || die "Error: $!";
I do agree though, it makes the code more confusing to someone not familiar with Perl. But confusing other people is one of the joys of knowing the language! :)
Typical ones I use are $_, @_, @ARGV, $!, $/. Other ones I comment heavily.
Brad notes that $@ is also a pretty common variable. (Error value from eval()).
I say use them--if you're using Perl, you're obviously not choosing it for neophyte readability. Any more-than-casual developer will likely have a browser/reference window open, and sifting through the perlvar manpage in one window is likely no less arduous than looking up definitions of (and assignments to!) global or external variables. As an example, I just recently encountered the new-in-5.10.x named capture buffers:
/^(?<myName>.*)$/;
# and later
my $capture = $+{'myName'};
And figuring out what was going on wasn't any harder than going into perlvar/perlre and reading a little bit.
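A slightly fuller sketch of named captures, assuming Perl 5.10 or later. The function name parse_line, the log line and the group names are all invented for illustration.

```perl
use strict;
use warnings;

# Named captures land in the %+ hash, keyed by group name. A single
# element is read with the $ sigil, as with any other hash.
sub parse_line {
    my ($line) = @_;
    return unless $line =~ /^(?<date>\S+)\s+(?<level>\w+)\s+(?<msg>.*)$/;
    # Copy the values out right away: %+ changes on the next successful match.
    return { date => $+{date}, level => $+{level}, msg => $+{msg} };
}

my $rec = parse_line('2024-01-15 ERROR disk full');
printf "%s [%s] %s\n", $rec->{date}, $rec->{level}, $rec->{msg} if $rec;
```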
I'd much rather find a bunch of wacky special vars in undocumented code than a bunch of wacky algorithms in undocumented code.