Perl for loop explanation

I'm looking through perl code and I see this:
sub html_filter {
my $text = shift;
for ($text) {
s/&/&amp;/g;
s/</&lt;/g;
s/>/&gt;/g;
s/"/&quot;/g;
}
return $text;
}
what does the for loop do in this case and why would you do it this way?

The for loop aliases each element of the list it is looping over to $_. In this case, there is only one element, $text.
Within the body, this allows one to write
s/&/&amp;/g;
etc. instead of having to write
$text =~ s/&/&amp;/g;
repeatedly. See also perldoc perlsyn.
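To see the aliasing in action, here is a minimal sketch (the sample string is made up for illustration). Because $_ is an alias rather than a copy, the substitutions inside the loop change $text itself:

```perl
use strict;
use warnings;

my $text = 'a < b & c';   # sample input, chosen only for illustration

# The one-element list aliases $_ to $text, so each s/// edits $text in place.
for ($text) {
    s/&/&amp;/g;   # ampersand first, so the entities added next are not re-escaped
    s/</&lt;/g;
}

print "$text\n";   # prints "a &lt; b &amp; c"
```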

Without an explicit loop variable, the for loop uses the special variable called $_. The substitution statements inside the loop also use the special $_ variable because none other is specified, so this is just a trick to make the source code shorter. I would probably write this function as:
sub html_filter {
my $text = shift;
$text =~ s/&/&amp;/g;
$text =~ s/</&lt;/g;
$text =~ s/>/&gt;/g;
$text =~ s/"/&quot;/g;
return $text;
}
This will have no performance consequences and is readable by people other than Perl experts.

As Mr Hewgill points out, the code sample is implicitly localizing and aliasing to $_, the magical implied variable.
He offers a substitute that is more readable at the cost of boilerplate code.
There is no reason to sacrifice readability for brevity. Simply replace the implicit localization and assignment with an explicit version:
sub html_filter {
local $_ = shift;
s/&/&amp;/g;
s/</&lt;/g;
s/>/&gt;/g;
s/"/&quot;/g;
return $_;
}
If I didn't know Perl all that well and came across this code, I'd know that I needed to look at the docs for $_ and local. As a bonus, perlvar has a few examples of localizing $_.
For anyone who uses Perl a lot, the above should be easy to understand.
So there is really no reason to sacrifice readability for brevity here.

It's just used to alias $text to $_, the default variable. Done because they're too lazy to use an explicit variable or don't want to waste precious cycles creating a new scalar.

It's cleaning up &, <, > and quote characters by replacing them with the appropriate HTML entities.

It loops through your text and substitutes ampersands (&) with &amp;, < with &lt;, > with &gt; and " with &quot;. You'd do this for output to a .html document... those are the proper entity characters.

The original code could be more flexible by using wantarray to test the desired context:
sub html_filter {
my @text = @_;
for (@text) {
s/&/&amp;/g;
s/</&lt;/g;
s/>/&gt;/g;
s/"/&quot;/g;
}
return wantarray ? @text : "@text";
}
That way you could call it in list context or scalar context and get back the correct results, for example:
my @stuff = html_filter('"','>');
print "$_\n" for @stuff;
my $stuff = html_filter('&');
print $stuff;

Related

Perl: test regex without creating new variable

Sorry if this is a basic question, but I'm somewhat new to perl, and I feel there should be a way to do this, but am having trouble finding any documentation. I'm wondering if you can do the following without the throw-away variable $doto:
my $file="foo/bar.c";
my $doto = $file;
$doto =~ s/\.c$/\.o/;
print ".o exists" if ( -f $doto );
That is, something like:
print ".o exists" if ( -f ($file =~ s/\.c$/\.o/gr) );
(but that creates a compile error of course).
My compile error is as follows:
Bareword found where operator expected at - line 2, near "s/.c$/.o/gr"
This is perl, v5.8.9
Your statement
print ".o exists" if ( -f ($file =~ s/\.c$/\.o/gr) )
works fine on versions of Perl that support the /r modifier—v5.14 or better. (Note that /g is superfluous.)
Without it there is no way to apply a substitution without modifying a variable, although you can make it a very short-lived temporary variable using a block
{
(my $doto = $file) =~ s/\.c$/\.o/;
print ".o exists" if -f $doto;
}
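For comparison, here is a small sketch of the /r form on a newer Perl (5.14 or later is assumed, as noted above). It returns the modified copy and leaves the original variable alone, which is exactly what the temporary variable was emulating:

```perl
use strict;
use warnings;
use 5.014;   # s///r needs Perl 5.14 or later

my $file = "foo/bar.c";

# /r returns the modified copy; $file itself is not changed.
my $doto = $file =~ s/\.c$/.o/r;

print "$file -> $doto\n";   # prints "foo/bar.c -> foo/bar.o"
```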
This answer talks about making the actual print if -f lookup code more readable. If you want the code to run faster, this solution is more expensive than your ugly one.
Since in your version of Perl there is no non-destructive substitution all you could do is implement your own function for that. It will not be as nice as the s///r, but it does the job. If you've got several occurrences of this type of code, it will make sense.
sub replace {
my ($text, $pattern, $replacement) = @_;
$text =~ s{$pattern}{$replacement}g; # do you need /g?
return $text;
}
# ... later
print ".o exists" if -f replace($file, qr/\.c$/, '.o');
This already takes care of making a copy for you, much like your temporary variable does, so $file will not actually be altered.
Note that your /g was useless as the filename will only ever have one end of the line, but it might not be useless later. However, it would be better to not fix it there, but to pass in an optional flag as another argument.
replace( $file, qr/.../, '.o', 'g' ); # where 'g' just means any true value
sub replace {
my ($text, $pattern, $replacement, $global) = @_;
if ($global) {
$text =~ s{$pattern}{$replacement}g;
} else {
$text =~ s{$pattern}{$replacement};
}
return $text;
}
You also generally don't need to escape the . in the replacement part because that's not actually a regular expression pattern, just a string.
I would approach it by adding a function as follows.
sub doto_exists {
my $doto = shift;
$doto =~ s/\.c$/\.o/;
return (-f $doto);
}
$file = "file1.c";
print ".o exists\n" if doto_exists($file) ;

How to match square bracket without escaping with \

When reading an input file in Perl, I get the line below:
u_pwrup_control/g_pwrup_bscan_cell[262]_u_pwrup_bscan
Now I want to find a similar line in a reference file using a regexp, but it does not match when I use the code below.
while(<INPUT_FILE>){
$k=$_;
##opening ref file in read mode
while(<REF_FILE>) {
if ($_ =~ /$k/) {
print $_;
} else {
print "$k is not matching";
}
}
}
Please tell me how to match [] without escaping with \.
You are looking for the function quotemeta. Alternatively, you can use \Q...\E inside the regex (more information about that in perlre).
Applied to your code:
either do $k = quotemeta $_; (the $_ is optional) instead of $k = $_;
or keep $k = $_; and use $_ =~ /\Q$k/ in the regex.
You didn't provide many details in your question, so I can't guarantee that this will match what you are trying to match, but at least [ and ] (and any other unsafe character) will be escaped in the regex.
In particular, you might want to chomp in both while loops after reading the lines, but it really depends on what you are reading.
Your code could also be improved in several ways, including:
Always add use strict; and use warnings; at the beginning of your script.
Related to the previous point, use lexically scoped variables (i.e. declared with my) instead of your undeclared global ones. So write my $k = ... instead of just $k = ... (only where you declare it).
Instead of writing your first while like this:
while (<INPUT_FILE>){ $k = $_; ... }
it would be much cleaner to write something like:
while (my $k = <INPUT_FILE>) { ... }
Using $_ is convenient in a lot of cases, but in this one, it's really not.
Don't use global filehandles; use lexical variables instead:
open my $INPUT_FILE, '<', 'your_file_name' or die $!;
And then you can use them the same way as your old global ones: while (<$INPUT_FILE>) { ... }
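Putting the suggestions together, a rough sketch might look like the following. The arrays @input_lines and @ref_lines are hypothetical stand-ins for the lines read from the two files, and the reference lines are invented for the example:

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the lines read from the input and reference files.
my @input_lines = ("u_pwrup_control/g_pwrup_bscan_cell[262]_u_pwrup_bscan\n");
my @ref_lines   = (
    "top/u_pwrup_control/g_pwrup_bscan_cell[262]_u_pwrup_bscan/Q\n",
    "top/u_other/some_unrelated_cell\n",
);

my @matches;
for my $k (@input_lines) {
    chomp(my $needle = $k);                              # drop the newline before matching
    push @matches, grep { /\Q$needle\E/ } @ref_lines;    # \Q...\E escapes [ and ]
}

print @matches;
```

Without \Q...\E, [262] would be read as a character class matching a single 2 or 6, so the literal brackets in the reference line would never match.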

Using a char variable in tr///

I am trying to count the characters in a string and found an easy solution counting a single character using the tr operator. Now I want to do this with every character from a to z. The following solution doesn't work because tr/// matches every character.
my @chars = ('a' .. 'z');
foreach my $c (@chars)
{
$count{$c} = ($text =~ tr/$c//);
}
How do I correctly use the char variable in tr///?
tr/// doesn't work with variables unless you wrap it in an eval.
But there is a nicer way to do this:
$count{$_} = () = $text =~ /$_/g for 'a' .. 'z';
For the TIMTOWTDI:
$count{$_}++ for grep /[a-z]/i, split //, $text;
tr doesn't support variable interpolation (neither in the search list nor in the replacement list). If you want to use variables, you must use eval():
$count{$c} = eval "\$text =~ tr/$c/$c/";
That said, a more efficient (and secure) approach would be to simply iterate over the characters in the string and increment counters for each character, e.g.:
my %count = map { $_ => 0 } 'a' .. 'z';
for my $char (split //, $text) {
$count{$char}++ if defined $count{$char};
}
If you look at the perldoc for tr/SEARCHLIST/REPLACEMENTLIST/cdsr, then you'll see, right at the bottom of the section, the following:
Because the transliteration table is built at compile time, neither the SEARCHLIST nor the REPLACEMENTLIST are subjected to double quote interpolation. That means that if you want to use variables, you must use an eval():
eval "tr/$oldlist/$newlist/";
die $@ if $@;
eval "tr/$oldlist/$newlist/, 1" or die $@;
Thus, you would need an eval to generate a new SEARCHLIST.
This is going to be very inefficient... the code might feel neat, but you're processing the complete string 26 times. You're also not counting uppercase characters.
You'd be better off stepping through the string once and just incrementing counters for each character found.
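A minimal sketch of that single-pass idea, assuming you want uppercase letters folded into the lowercase counts (the sample string is made up):

```perl
use strict;
use warnings;

my $text = "Hello, World";   # sample input, chosen only for illustration

my %count = map { $_ => 0 } 'a' .. 'z';
for my $char (split //, lc $text) {            # lc folds uppercase into lowercase
    $count{$char}++ if exists $count{$char};   # skip anything that is not a-z
}

print "l=$count{l} o=$count{o} h=$count{h}\n";   # prints "l=3 o=2 h=1"
```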
From the perlop documentation:
tr/AAA/XYZ/
will transliterate any A to X.
Because the transliteration table is built at compile time, neither
the SEARCHLIST nor the REPLACEMENTLIST are subjected to double quote
interpolation. That means that if you want to use variables, you must
use an eval()
Alternatively in your case you can use the s/// operator as:
foreach my $c (@chars) {
$count{$c} += ($text =~ s/$c//g);
}
My solution with some modification based from http://www.perlmonks.org/?node_id=446003
sub lowerLetters {
my $string = shift;
my %table;
@table{split //, $letters_uc} = split //, $letters_lc;
my $table_re = join '|', map { quotemeta } reverse sort keys %table;
$string =~ s/($table_re)/$table{$1}/g;
return if not defined $string;
return $string;
}
You may want to use s instead. Substitution is much more powerful than tr
My solution:
$count{$c} =~ s/\$search/$replace/g;
g at the end means "use it globally".
See:
https://blog.james.rcpt.to/2010/10/25/perl-search-and-replace-using-variables/
https://docstore.mik.ua/orelly/perl3/lperl/ch09_06.htm

A couple of Perl subtleties

I've been programming in Perl for a while, but I never have understood a couple of subtleties about Perl:
The use and the setting/unsetting of the $_ variable confuses me. For instance, why does
# ...
shift @queue;
($item1, @rest) = split /,/;
work, but (at least for me)
# ...
shift @queue;
/some_pattern.*/ or die();
does not seem to work?
Also, I don't understand the difference between iterating through a file using foreach versus while. For instance, I seem to be getting different results for
while(<SOME_FILE>){
# Do something involving $_
}
and
foreach (<SOME_FILE>){
# Do something involving $_
}
Can anyone explain these subtle differences?
shift @queue;
($item1, @rest) = split /,/;
If I understand you correctly, you seem to think that this shifts off an element from @queue to $_. That is not true.
The value that is shifted off of @queue simply disappears. The following split operates on whatever is contained in $_ (which is independent of the shift invocation).
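A small sketch makes this visible (the values in the queue and in $_ are invented for the example). The first split ignores the shifted element entirely; the second passes it explicitly:

```perl
use strict;
use warnings;

my @queue = ("a,b,c", "d,e,f");   # sample data, for illustration only
$_ = "x,y,z";

shift @queue;                          # return value discarded; $_ is untouched
my ($item1, @rest) = split /,/;        # splits $_, i.e. "x,y,z"
print "$item1 @rest\n";                # prints "x y z"

my $element = shift @queue;            # keep the shifted value explicitly
($item1, @rest) = split /,/, $element;
print "$item1 @rest\n";                # prints "d e f"
```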
while(<SOME_FILE>){
# Do something involving $_
}
Reading from a filehandle in a while statement is special: It is equivalent to
while ( defined( $_ = readline *SOME_FILE ) ) {
This way, you can process even colossal files line-by-line.
On the other hand,
for(<SOME_FILE>){
# Do something involving $_
}
will first load the entire file as a list of lines into memory. Try a 1GB file and see the difference.
Another, albeit subtle, difference between:
while (<FILE>) {
}
and:
foreach (<FILE>) {
}
is that while() will modify the value of $_ outside of its scope, whereas, foreach() makes $_ local. For example, the following will die:
$_ = "test";
while (<FILE1>) {
print "$_";
}
die if $_ ne "test";
whereas, this will not:
$_ = "test";
foreach (<FILE1>) {
print "$_";
}
die if $_ ne "test";
This becomes more important with more complex scripts. Imagine something like:
sub func1() {
while (<$fh2>) { # clobbers $_ set from <$fh1> below
<...>
}
}
while (<$fh1>) {
func1();
<...>
}
Personally, I stay away from using $_ for this reason, in addition to it being less readable, etc.
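As a sketch of that approach, the nested reads below use a lexical loop variable instead of $_, so the inner loop cannot clobber the outer one. The in-memory filehandles and the count_inner helper are stand-ins invented for the example:

```perl
use strict;
use warnings;

# In-memory "files", used here only so the example is self-contained.
my ($inner, $outer) = ("x\ny\n", "1\n2\n");
open my $fh1, '<', \$outer or die "Cannot open outer data: $!";

sub count_inner {
    open my $fh2, '<', \$inner or die "Cannot open inner data: $!";
    my $n = 0;
    while (my $line = <$fh2>) { $n++ }   # lexical $line leaves $_ alone
    return $n;
}

my @seen;
while (my $line = <$fh1>) {
    chomp $line;
    count_inner();          # reads another handle without touching our $line
    push @seen, $line;
}

print "@seen\n";   # prints "1 2"
```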
Regarding the 2nd question:
while (<FILE>) {
}
and
foreach (<FILE>) {
}
Have the same functional behavior, including setting $_. The difference is that while() evaluates <FILE> in a scalar context, while foreach() evaluates <FILE> in a list context. Consider the difference between:
$x = <FILE>;
and
@x = <FILE>;
In the first case, $x gets the first line of FILE, and in the second case @x gets the entire file. Each entry in @x is a different line in FILE.
So, if FILE is very big, you'll waste memory slurping it all at once using foreach (<FILE>) compared to while (<FILE>). This may or may not be an issue for you.
The place where it really matters is if FILE is a pipe descriptor, as in:
open FILE, "some_shell_program|";
Now foreach(<FILE>) must wait for some_shell_program to complete before it can enter the loop, while while(<FILE>) can read the output of some_shell_program one line at a time and execute in parallel to some_shell_program.
That said, the behavior with regard to $_ remains unchanged between the two forms.
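The context difference can be checked with a short sketch. It uses an in-memory filehandle with made-up contents so that no real file is needed:

```perl
use strict;
use warnings;

my $data = "one\ntwo\nthree\n";   # stand-in for a file's contents

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $first = <$fh>;    # scalar context: reads exactly one line
my @rest  = <$fh>;    # list context: reads all remaining lines at once
close $fh;

print "first: $first";                        # prints "first: one"
print "remaining: ", scalar(@rest), "\n";     # prints "remaining: 2"
```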
foreach evaluates the entire list up front. while evaluates the condition on each pass to see whether it's true. while should be considered for incremental operations, foreach only for list sources.
For example:
my $t= time() + 10 ;
while ( $t > time() ) {
# do something
}
StackOverflow: What’s the difference between iterating over a file with foreach or while in Perl?
It is to avoid this sort of confusion that it's considered better form to avoid using the implicit $_ constructions.
my $element = shift @queue;
($item,@rest) = split /,/ , $element;
or
($item,@rest) = split /,/, shift @queue;
likewise
while(my $foo = <SOMEFILE>){
# do something
}
or
foreach my $thing(<FILEHANDLE>){
# do something
}
while only checks whether the value is true; for also places the value in $_. There are exceptions: for example, <> will set $_ when used as the condition of a while loop.
To get behaviour similar to:
foreach(qw'a b c'){
# Do something involving $_
}
You have to set $_ explicitly.
my @list = qw'a b c';   # build the list once, outside the condition
while( defined( $_ = shift @list ) ){
# Do something involving $_
}
It is better to explicitly set your variables
for my $line(<SOME_FILE>){
}
or better yet
while( my $line = <SOME_FILE> ){
}
which will only read in the file one line at a time.
Also, shift doesn't set $_ unless you specifically ask it to:
$_ = shift @_;
And split works on $_ by default. If used in scalar or void context, it will populate @_.
Please read perldoc perlvar so that you will have an idea of the different variables in Perl.

How can I make Perl functions that use $_ by default?

I have an array and a simple function that trims white spaces:
my @ar=("bla ", "ha 1");
sub trim { my $a = shift; $a =~ s/\s+$//; $a}
Now, I want to apply this to an array with the map function. Why can't I do this by just giving the function name like one would do with built-in functions?
For example, you can do
print map(length, @ar)
But you can't do
print map(trim, @ar)
You have to do something like:
print map {trim($_)} @ar
print map(trim($_), @ar)
If you are using 5.10 or later, you can specify _ as the prototype for trim. If you are using earlier versions, use Axeman's answer:
As the last character of a prototype, or just before a semicolon, you can use _ in place of $ : if this argument is not provided, $_ will be used instead.
use strict; use warnings;
my @x = ("bla ", "ha 1");
sub trim(_) { my ($x) = @_; $x =~ s!\s+$!!; $x }
print map trim, @x;
Incidentally, don't use $a and $b outside of a sort comparator: They are immune from strict checking.
However, I prefer not to use prototypes for functions I write mainly because their use makes it harder to mentally parse the code. So, I would prefer using:
map trim($_), @x;
See also perldoc perlsub:
This is all very powerful, of course, and should be used only in moderation to make the world a better place.
The prototype that Sinan talks about is the best current way. But for earlier versions, there is still the old standby:
sub trim {
# v-- Here's the quick way to do it.
my $str = @_ ? $_[0] : $_;
# That was it.
$str =~ s/^\s+|\s+$//g; # /g so that both ends are trimmed
return $str;
}
Of course, I have a trim function with more features and handles more arguments and list context, but it doesn't demonstrate the concept as well. The ternary expression is a quick way to do what the '_' prototype character now does.
My favorite way to optionally use $_ without needing 5.10+ is as follows:
sub trim {
my ($s) = (@_, $_);
$s =~ s/\s+$//;
$s
}
This assigns the first element of @_ to $s if there is one. Otherwise it uses $_.
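Here is a short sketch of that idiom used both ways, with and without an argument. The sample strings are made up, and trim() is called with explicit parentheses inside map to keep the parsing unambiguous:

```perl
use strict;
use warnings;

sub trim {
    my ($s) = (@_, $_);   # first argument if one was given, otherwise $_
    $s =~ s/\s+$//;
    return $s;
}

my @ar = ("bla ", "ha 1");
my @trimmed  = map { trim() } @ar;   # no argument: falls back to $_
my $explicit = trim("x  ");          # an explicit argument takes precedence

print join("|", @trimmed, $explicit), "\n";   # prints "bla|ha 1|x"
```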
Many Perl built-in functions operate on $_ if given no arguments.
If your function did the same, it would work:
my @ar = ("bla ", "ha 1");
sub trim { my $s = @_ ? $_[0] : $_; $s =~ s/\s+$//; $s}
print map(trim, @ar), "\n";
And yes, Perl is kind of gross.