Perl trim spaces from scalar or array - perl

This is a very basic Perl question but I just want to make sure the actual good practice to it.
Consider I have built a function to trim spaces from strings and I will pass to it either single scalar as string or array of strings, I have this basic working example:
sub trim_spaces {
my (#out) = #_;
for (#out) {
s/\s+//g;
}
return (scalar #out >1)? #out : $out[0];
}
this works in the following calls:
trim_spaces(" These Spaces Are All Removed");
and
#str = (" Str Number 1 ", " Str Number 2 ", " Str Number 3 ");
trim_spaces(#str);
What I am trying to do and understand is the shortest version of this function like this:
sub trim_spaces {
s/\s+//g for (#_);
return #_;
}
This works only if I pass an array:
trim_spaces(#str);
but it does not work if I pass a scalar string:
trim_spaces(" These Spaces Are All Removed");
I understand it should be converted from scalar ref to array, how this can be done in the short version.
Trying to understand the best practices of Perl.

The strict best practice answer to this is to always unpack the contents of #_ into lexical variables, first thing. Perl Best Practices provides the following (paraphrased) arguments:
It's not self-documenting to directly access #_. $_[0], $_[1], and so on tell you nothing about what these parameters are for.
The aliasing behavior of #_ is easily forgotten and can be a source of hard-to-find bugs in a program. Whenever possible, avoid spooky action at a distance.
You can verify each argument while unpacking the #_ array.
And one argument not from PBP:
Seeing my $self = shift; at the beginning of a subroutine clearly marks it as an OO method instead of an ordinary sub.
Sources: Perl Best Practices (Conway 2005), Perl::Critic's relevant policy from PBP.
The elements in #_ are aliases to the original values, which means modifying them inside the subroutine will change them outside as well. The array you're returning is ignored in your examples.
If you store the string in a variable this would work:
my $string = ' These Spaces Are Removed ';
trim_spaces($string); # $string is now 'TheseSpacesAreRemoved'
Or you could use non-destructive substitution and assign the results created by this:
sub trim_spaces { return map { s/\s+//gr } #_ }
my #trimmed = trim_spaces('string one', ' string two');
my ($trimmed_scalar) = trim_spaces('string three');
map will create a list of the values returned by the substitution with the /r flag. The parens around $trimmed_scalar are necessary; see the last example for a version where it isn't.
Alternatively, you could copy the parameters inside the subroutine into lexical variables to avoid action at a distance, which is generally better practice than directly modifying the elements of #_:
sub trim_spaces
{
my #strings = #_;
s/\s+//g for #strings;
return #strings;
}
Personally, I find it nicer when the subroutine returns a value without side effects, and the /r flag saves me the trouble of thinking of a better name for a lexical copy. We can use wantarray to make it smarter in regards to the calling context:
sub trim_spaces
{
return if not defined wantarray;
return map { s/\s+//gr } #_ if wantarray;
return shift =~ s/\s+//gr;
}
On a side note, trim_spaces would be better named remove_whitespace or something similar. Trimming usually means to remove leading and trailing whitespace, and the \s character class matches tabs, newlines, form feeds, and carriage returns in addition to spaces. Use tr/ //dcr to remove just spaces instead if that's what you wanted.

Related

Use variadic arguments as arguments for sprintf

I'd like to program a wrapper around printf(...).
My first attempt was:
sub printf2 {
my $test = sprintf(#_);
print $test;
}
As the array (in scalar context) isn't a format string, this doesn't work (as expected).
Does anyone know a solution? Probably without using any special packages?
EDIT: In the real context, I'd like to use sprintf. Apparently there is a difference between printf and sprintf.
The sprintf function has a ($#) prototype, so the first argument to sprintf is always evaluated in scalar context, even if it is an array.
$x = sprintf(#a); # same as sprintf(scalar #a)
So before you call sprintf, you need to separate the template from the rest of the arguments.
Here's a concise way:
sub printf2 {
my $test = sprintf(shift, #_);
print $test;
}
Curiously, printf doesn't have a prototype and does what you expect.
printf(#a); # same as printf($a[0], #a[1..$#a])
How about this
sub pf { printf $_[0],#_[1..$#_] }
Many of the named operators (such as sprintf) have special syntaxes. sprintf's syntax is defined to be
sprintf FORMAT, LIST
This can often (but not always) be seen using prototype.
>perl -wE"say prototype 'CORE::sprintf'"
$#
The problem is that you used one of the following syntaxes instead of the documented syntax.
sprintf ARRAY
sprintf LIST
Simply switch to the documented syntax to solve your problem.
sub printf2 {
my ($format, #args) = #_;
print sprintf($format, #args);
}
Or if you want to avoid the copying,
sub printf2 {
print sprintf($_[0], #_[ 1..$#_ ]);
}

Scope of the default variable $_ in Perl

I have the following method which accepts a variable and then displays info from a database:
sub showResult {
if (#_ == 2) {
my #results = dbGetResults($_[0]);
if (#results) {
foreach (#results) {
print "$count - $_[1] (ID: $_[0])\n";
}
} else {
print "\n\nNo results found";
}
}
}
Everything works fine, except the print line in the foreach loop. This $_ variable still contains the values passed to the method.
Is there anyway to 'force' the new scope of values on $_, or will it always contain the original values?
If there are any good tutorials that explain how the scope of $_ works, that would also be cool!
Thanks
The problem here is that you're using really #_ instead of $_. The foreach loop changes $_, the scalar variable, not #_, which is what you're accessing if you index it by $_[X]. Also, check again the code to see what it is inside #results. If it is an array of arrays or refs, you may need to use the indirect ${$_}[0] or something like that.
In Perl, the _ name can refer to a number of different variables:
The common ones are:
$_ the default scalar (set by foreach, map, grep)
#_ the default array (set by calling a subroutine)
The less common:
%_ the default hash (not used by anything by default)
_ the default file handle (used by file test operators)
&_ an unused subroutine name
*_ the glob containing all of the above names
Each of these variables can be used independently of the others. In fact, the only way that they are related is that they are all contained within the *_ glob.
Since the sigils vary with arrays and hashes, when accessing an element, you use the bracket characters to determine which variable you are accessing:
$_[0] # element of #_
$_{...} # element of %_
$$_[0] # first element of the array reference stored in $_
$_->[0] # same
The for/foreach loop can accept a variable name to use rather than $_, and that might be clearer in your situation:
for my $result (#results) {...}
In general, if your code is longer than a few lines, or nested, you should name the variables rather than relying on the default ones.
Since your question was related more to variable names than scope, I have not discussed the actual scope surrounding the foreach loop, but in general, the following code is equivalent to what you have.
for (my $i = 0; $i < $#results; $i++) {
local *_ = \$results[$i];
...
}
The line local *_ = \$results[$i] installs the $ith element of #results into the scalar slot of the *_ glob, aka $_. At this point $_ contains an alias of the array element. The localization will unwind at the end of the loop. local creates a dynamic scope, so any subroutines called from within the loop will see the new value of $_ unless they also localize it. There is much more detail available about these concepts, but I think they are outside the scope of your question.
As others have pointed out:
You're really using #_ and not $_ in your print statement.
It's not good to keep stuff in these variables since they're used elsewhere.
Officially, $_ and #_ are global variables and aren't members of any package. You can localize the scope with my $_ although that's probably a really, really bad idea. The problem is that Perl could use them without you even knowing it. It's bad practice to depend upon their values for more than a few lines.
Here's a slight rewrite in your program getting rid of the dependency on #_ and $_ as much as possible:
sub showResults {
my $foo = shift; #Or some meaningful name
my $bar = shift; #Or some meaningful name
if (not defined $foo) {
print "didn't pass two parameters\n";
return; #No need to hang around
}
if (my #results = dbGetResults($foo)) {
foreach my $item (#results) {
...
}
}
Some modifications:
I used shift to give your two parameters actual names. foo and bar aren't good names, but I couldn't find out what dbGetResults was from, so I couldn't figure out what parameters you were looking for. The #_ is still being used when the parameters are passed, and my shift is depending upon the value of #_, but after the first two lines, I'm free.
Since your two parameters have actual names, I can use the if (not defined $bar) to see if both parameters were passed. I also changed this to the negative. This way, if they didn't pass both parameters, you can exit early. This way, your code has one less indent, and you don't have a if structure that takes up your entire subroutine. It makes it easier to understand your code.
I used foreach my $item (#results) instead of foreach (#results) and depend upon $_. Again, it's clearer what your program is doing, and you wouldn't have confused $_->[0] with $_[0] (I think that's what you were doing). It would have been obvious you wanted $item->[0].

Is there an analogue of Ruby gsub method in Perl? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
How do I perform a Perl substitution on a string while keeping the original?
How do I do one line replacements in Perl without modifying the string itself? I also want it to be usable inside expressions, much like I can do p s.gsub(/from/, 'to') in Ruby.
All I can think of is
do {my $r = $s; $r =~ s/from/to/; $r}
but sure there is a better way?
Starting on the day you feel comfortable writing use 5.14.0 at the top of all of your programs, you can use the s/foo/bar/r variant of the s/// operator, which returns the changed string instead of modifying the original in place (added in perl 5.13.2).
The solution you found with do is not bad, but you can shorten it a little:
do {(my $r = $s) =~ s/from/to/; $r}
It still reveals the mechanics though. You can hide the implementation, and also apply substitutions to lists by writing a subroutine. In most implementations, this function is called apply which you could import from List::Gen or List::MoreUtils or a number of other modules. Or since it is so short, just write it yourself:
sub apply (&#) { # takes code block `&` and list `#`
my ($sub, #ret) = #_; # shallow copy of argument list
$sub->() for #ret; # apply code to each copy
wantarray ? #ret : pop #ret # list in list context, last elem in scalar
}
apply creates a shallow copy of the argument list, and then calls its code block, which is expected to modify $_. The block's return value is not used. apply behaves like the comma , operator. In list context, it returns the list. In scalar context, it returns the last item in the list.
To use it:
my $new = apply {s/foo/bar/} $old;
my #new = apply {s/foo/bar/} qw( foot fool fooz );
From Perl's docs: Regexp-like operators:
($foo = $bar) =~ s/this/that/g; # copy first, then change would match gsub, while
$bar =~ s/this/that/g; # change would match gsub!

Why do printf and sprintf behave differently when only given an array?

sub do_printf { printf #_ }
sub do_sprintf { print sprintf #_ }
do_printf("%s\n", "ok"); # prints ok
do_sprintf("%s\n", "ok"); # prints 2
sprintf has prototype $# while printf has prototype of #
From the perldoc on sprintf:
Unlike printf, sprintf does not do
what you probably mean when you pass
it an array as your first argument.
The array is given scalar context, and
instead of using the 0th element of
the array as the format, Perl will use
the count of elements in the array as
the format, which is almost never
useful.
See codeholic's and Mark's answers for the explanation as to why they behave differently.
As a workaround, simply do:
sub do_sprintf { print sprintf(shift, #_) }
Then,
sub do_printf { printf #_ }
sub do_sprintf { print sprintf(shift, #_) }
do_printf("%s\n", "ok"); # prints ok
do_sprintf("%s\n", "ok2"); # prints ok2
They do different things. For printf the output is to a stream; for sprintf you want the string constructed. It handles the formatting (the f) of the print command. The main purpose for printf is to print out the value it constructs to a stream but with s(tring)printf(ormat) you're only trying to create the string, not print it.
printf returns the number of characters printed to a stream as feedback. Once characters are printed to a stream they've passed out of the logical structure of the program. Meanwhile, sprintf needs to hand you back a string. The most convenient way is as a return value--which because it is within the program structure can be inspected for length, or whether it contains any 'e's, or whatever you want.
Why shouldn't they behave differently?
sprintf evaluates the array in scalar context. Your array has two elements, so it evaluates as "2" (without a trailing \n).

What does '#_' do in Perl?

I was glancing through some code I had written in my Perl class and I noticed this.
my ($string) = #_;
my #stringarray = split(//, $string);
I am wondering two things:
The first line where the variable is in parenthesis, this is something you do when declaring more than one variable and if I removed them it would still work right?
The second question would be what does the #_ do?
The #_ variable is an array that contains all the parameters passed into a subroutine.
The parentheses around the $string variable are absolutely necessary. They designate that you are assigning variables from an array. Without them, the #_ array is assigned to $string in a scalar context, which means that $string would be equal to the number of parameters passed into the subroutine. For example:
sub foo {
my $bar = #_;
print $bar;
}
foo('bar');
The output here is 1--definitely not what you are expecting in this case.
Alternatively, you could assign the $string variable without using the #_ array and using the shift function instead:
sub foo {
my $bar = shift;
print $bar;
}
Using one method over the other is quite a matter of taste. I asked this very question which you can check out if you are interested.
When you encounter a special (or punctuation) variable in Perl, check out the perlvar documentation. It lists them all, gives you an English equivalent, and tells you what it does.
Perl has two different contexts, scalar context, and list context. An array '#_', if used in scalar context returns the size of the array.
So given these two examples, the first one gives you the size of the #_ array, and the other gives you the first element.
my $string = #_ ;
my ($string) = #_ ;
Perl has three 'Default' variables $_, #_, and depending on who you ask %_. Many operations will use these variables, if you don't give them a variable to work on. The only exception is there is no operation that currently will by default use %_.
For example we have push, pop, shift, and unshift, that all will accept an array as the first parameter.
If you don't give them a parameter, they will use the 'default' variable instead. So 'shift;' is the same as 'shift #_;'
The way that subroutines were designed, you couldn't formally tell the compiler which values you wanted in which variables. Well it made sense to just use the 'default' array variable '#_' to hold the arguments.
So these three subroutines are (nearly) identical.
sub myjoin{
my ( $stringl, $stringr ) = #_;
return "$stringl$stringr";
}
sub myjoin{
my $stringl = shift;
my $stringr = shift;
return "$stringl$stringr";
}
sub myjoin{
my $stringl = shift #_;
my $stringr = shift #_;
return "$stringl$stringr";
}
I think the first one is slightly faster than the other two, because you aren't modifying the #_ variable.
The variable #_ is an array (hence the # prefix) that holds all of the parameters to the current function.