Perl: return an array from subroutine - perl

Perl noob.
I am having trouble understanding returning an array value from a subroutine in a module.
The script has the following:
print "Enter your first and last name.\n";
chomp(my $fullname = <STDIN>); #input is 'testing this' all lower case
Jhusbands::Loginpass->check_name($fullname);
print "$fullname\n";
The module includes the following subroutine:
sub check_name {
    my $class = shift;
    if ($_[0] =~ /^\w+\s+\w+$/ ) {
        @_ = split( / /, $_[0]);
        foreach $_ (@_) {
            $_ = ucfirst lc for @_;
            @_ = join(" ", @_);
            print Dumper(@_) . "\n";
            return @_;
        }
    }
}
I am taking the name, checking it for only first and last (I'll get to else statements later), splitting it, correcting the case, and joining again. Dumper displays the final array as:
$VAR1 = 'Testing This';
So it appears to be working that far. However, the return value for $fullname in the script displays as all lower case:
testing this
Why is it not taking the corrected uppercase variable that Dumper displays as the last array iteration?

You don't assign the return to anything. Also, the sub manipulates @_, which it shouldn't be doing, as discussed below. It can also be greatly simplified:
sub check_name {
    my ($class, $name) = @_;
    if ($name =~ /^\w+\s+\w+$/) {
        return join ' ', map { ucfirst lc } split ' ', $name;
    }
    return; # returns "undef" (as input wasn't in expected format)
}
Then the caller can do
my $fullname = Jhusbands::Loginpass->check_name($name);
print "$fullname\n" if $fullname;
A sub's return should always be checked, but in this case even more so, since it processes its input conditionally. I renamed the sub's input to $name for clarity.
If the code in the sub is meant to change $fullname by writing directly to @_ (and you had no return for that reason), that fails: assigning to @_ as a whole, as the @_ = split(...) line does, breaks the aliasing, so afterwards $_[0] is no longer aliased to the argument that was passed.
In any case, doing that is very tricky, can lead to opaque code -- and is unneeded. To directly change the argument pass it as a reference and write to it. However, it is probably far clearer and less error prone to return the result in this case.
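The aliasing point can be checked with a minimal sketch. Both helper subs below are hypothetical and exist only for this illustration: the first writes through $_[0] while it is still an alias, the second assigns to @_ first, as the code in the question does.

```perl
use strict;
use warnings;

# Hypothetical helper: writes through the alias in $_[0].
sub upcase_in_place {
    $_[0] = uc $_[0];    # $_[0] is still an alias of the caller's variable
    return;
}

# Hypothetical helper: assigns to @_ first, which breaks the alias.
sub upcase_after_reassign {
    @_ = split ' ', $_[0];    # @_ now holds fresh scalars
    $_[0] = uc $_[0];         # changes only the sub's own copy
    return;
}

my $first  = 'testing this';
my $second = 'testing this';

upcase_in_place($first);
upcase_after_reassign($second);

print "$first\n";     # TESTING THIS
print "$second\n";    # testing this
```

Only the first call changes the caller's variable, which is why returning the result is the less surprising design.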
It should be noted that the above name "processing" runs into the standard problems with processing names, due to their bewildering variety. If this needs to be comprehensive, then the name parsing should be dispatched to a well-rounded library (or procedure) that can deal with the possible diversity.
Thanks to ikegami for comments bringing this up with examples, as well as a more direct way:
$name =~ s/(\w+)/\u\L$1/g;
return $name;
which with /r introduced in v5.14 can be written as
return $name =~ s/(\w+)/\u\L$1/gr;
If $name has no word-characters (\w) and there is no match this returns the same string.
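As a small sketch of the /r form (the variable names are chosen only for this illustration): the substitution returns a new string and leaves its input unchanged, and a string without word characters comes back as it was.

```perl
use strict;
use warnings;
use 5.014;    # the /r modifier needs Perl 5.14 or later

my $name  = 'testing this';
my $fixed = $name =~ s/(\w+)/\u\L$1/gr;    # returns the changed copy

my $dashes      = '--';
my $dashes_copy = $dashes =~ s/(\w+)/\u\L$1/gr;    # no match: same string back

print "$name\n";           # testing this
print "$fixed\n";          # Testing This
print "$dashes_copy\n";    # --
```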

Related

Perform the same operation to all the variables but first variable passed in a Perl function

I am using a Perl function to dump a CSV file, where I pass certain values, and inside this function I want to perform the same operation on these passed variables except for the first one, which is the file handle.
What I want to do is check whether a passed argument (string) has commas in it and, if so, enclose it in quotation marks (").
But I need to assign these values to variable names, as I have to use them later for different purposes.
Following is my subroutine:
sub printCSVRowData
{
    my $CSVFileHandle = shift;
    foreach my $str (@_) {
        if ($str eq "" or not defined $str or $str =~ /^ *$/) {
            $str = "NA";
        }
        $str =~ s/\"//g;
    }
    my $firstCol = shift;
    my $secondCol = shift;
    my $thirdCol = shift;
    # Do some modifications
    print $CSVFileHandle "$firstCol, $secondCol, $thirdCol";
}
Now the issue is that when I pass values to this subroutine, I get the following error message:
Modification of a read-only value attempted at line (where $str =~ s/\"//g; is called).
Can anyone help me with this? What am I doing wrong here? Is there any other way around this?
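You are modifying the elements of @_, which are aliases of the scalars passed as arguments. If an argument is a read-only value, such as a string literal, the modification fails with this error. For this reason, modifying the elements of @_ isn't safe. That's why we copy the elements of @_ and then modify the copies instead.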
sub printCSVRowData {
    my ($csv, $fh, @fields) = @_;
    @fields = map { defined($_) && /\S/ ? $_ : "NA" } @fields;
    $csv->say($fh, \@fields);
}
You should be using Text::CSV_XS or similar.
my $csv = Text::CSV_XS->new({
auto_diag => 2,
binary => 1,
});
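The read-only error itself can be reproduced without any CSV handling. This is a minimal sketch with a hypothetical sub name, assuming the original call passed at least one string literal: editing through @_ works for a variable and dies for a literal.

```perl
use strict;
use warnings;

# Hypothetical sub that edits its arguments in place through @_.
sub strip_quotes_in_place {
    s/"//g for @_;    # each element of @_ aliases the caller's argument
    return;
}

my $field = 'say "hi"';
strip_quotes_in_place($field);    # fine: $field is a modifiable variable
print "$field\n";                 # say hi

# Passing a literal makes the alias point at a read-only constant,
# so the substitution dies; eval catches the exception for display.
my $ok = eval { strip_quotes_in_place('a "literal"'); 1 };
print "error: $@" unless $ok;
```

Copying the arguments first, as in the answer's my (...) = @_ line, avoids the problem because the copies are always writable.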

Perl: Modify variable passed as param to subroutine

I need to modify a variable inside a routine, so it keeps the changes after leaving the routine. Here's an example:
$text = "hello";
&convert_to_uppercase($text);
print $text;
I want to see "HELLO" on the screen, not "hello".
The routine would be:
sub convert_to_uppercase($text){
<something like $text = uc($text);>
}
I know how to do it in PHP, but it seems that the parameters are not changed the same way. And, I've been searching everywhere and I couldn't find a concrete answer.
You really shouldn't use an ampersand & when calling a Perl subroutine. It is necessary only when treating the code as a data item, for instance when taking a reference, like \&convert_to_uppercase. Using it in a call hasn't been necessary since Perl 5 superseded Perl 4, and it does some arcane things that you probably don't want.
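One of those arcane effects, as a minimal sketch (the sub names are made up for this illustration): calling &name; without parentheses hands the callee the caller's own @_ instead of an empty argument list.

```perl
use strict;
use warnings;

sub show_args {
    return scalar @_;    # number of arguments this call can see
}

sub caller_sub {
    my $with_parens = show_args();    # normal call: empty argument list
    my $with_amp    = &show_args;     # & without parens: reuses caller_sub's @_
    return ($with_parens, $with_amp);
}

my ($plain, $amp) = caller_sub('a', 'b', 'c');
print "$plain $amp\n";    # 0 3
```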
It is unusual for subroutines to modify their parameters, but the elements of @_ are aliases of the actual parameters so you can do what you ask by modifying that array.
If you write your subroutine like this
sub convert_to_uppercase {
$_[0] = uc $_[0];
}
then it will do what you ask. But it is generally best to return the modified value so that the decision on whether to overwrite the original value can be taken by the calling code. For instance, if I have
sub upper_case {
uc shift;
}
then it can be called either as
my $text = "hello";
$text = upper_case($text);
print $text;
which does as you require, and modifies $text; or as
my $text = "hello";
print upper_case($text);
which leaves $text unchanged, but returns the altered value.
Passing a reference and modifying the original variable inside the subroutine would be done like this:
$text = 'hello';
convert_to_uppercase(\$text); #notice the \ before $text
print $text;
sub convert_to_uppercase { # Perl doesn't declare arguments here
    ### arguments arrive in @_, so @_ is now a one-element list holding a reference to $text
    my $ref = shift; # $ref is NOT 'hello'; it is a reference to $text
    ### add some output so you can see what's going on:
    print 'Variable $ref is: ', $ref, " \n"; # prints something like SCALAR(0xad1d2)
    print 'Variable ${$ref} is: ', ${$ref}, " \n"; # prints 'hello'
    # Now do what this function is supposed to do:
    ${$ref} = uc ${$ref}; # modifies the original variable, not a copy of it
}
The other way is to create a return value inside the subroutine and modify the variable outside of the subroutine:
$text = 'hello';
$text = convert_to_uppercase($text); #there's no \ this time
print $text;
sub convert_to_uppercase {
# @_ contains 'hello'
my $input = shift; #$input is 'hello'
return uc $input; #returns 'HELLO'
}
But the convert_to_uppercase routine seems redundant because that's what uc does. Skip all of that and just do this:
$text = 'hello';
$text = uc $text;

What does this if statement do? (string comparison)

I am trying to understand a piece of code which loops over a file, does various assignments, then enters a set of if statements where a string is seemingly compared to nothing. What are /nonsynonymous/ and /prematureStop/ being compared to here? I am mostly experienced with Python.
open(IN,$file);
while(<IN>){
    chomp $_;
    my @tmp = split /\t+/,$_;
    my $id = join("\t",$tmp[0],$tmp[1]-1);
    $id =~ s/chr//;
    my @info_field = split /;/,$tmp[2];
    my $vat = $info_field[$#info_field];
    my $score = 0;
    $self -> {VAT} ->{$id}= $vat;
    $self ->{GENE} -> {$id} = $tmp[3];
    if (/nonsynonymous/ || /prematureStop/){...
It is comparing against the current input line ($_).
By default, perl will automatically use the current input line ($_) when doing regex matches unless overridden (with =~).
From http://perldoc.perl.org/perlretut.html
If you're matching against the special default variable $_ , the $_ =~
part can be omitted:
$_ = "Hello World";
if (/World/) {
print "It matches\n";
}
else {
print "It doesn't match\n";
}
Often in Perl, if a specific variable isn't given, it's assumed that you want to use the default variable $_. For instance, the while loop assigns the incoming lines from <IN> to that variable, chomp $_; could just as well have been written chomp;, and the regular expressions in the if statement try to match with $_ as well.

A couple of Perl subtleties

I've been programming in Perl for a while, but I never have understood a couple of subtleties about Perl:
The use and the setting/unsetting of the $_ variable confuses me. For instance, why does
# ...
shift @queue;
($item1, @rest) = split /,/;
work, but (at least for me)
# ...
shift @queue;
/some_pattern.*/ or die();
does not seem to work?
Also, I don't understand the difference between iterating through a file using foreach versus while. For instance, I seem to be getting different results for
while(<SOME_FILE>){
# Do something involving $_
}
and
foreach (<SOME_FILE>){
# Do something involving $_
}
Can anyone explain these subtle differences?
shift @queue;
($item1, @rest) = split /,/;
If I understand you correctly, you seem to think that this shifts off an element from @queue to $_. That is not true.
The value that is shifted off of @queue simply disappears. The following split operates on whatever is contained in $_ (which is independent of the shift invocation).
while(<SOME_FILE>){
# Do something involving $_
}
Reading from a filehandle in a while statement is special: It is equivalent to
while ( defined( $_ = readline *SOME_FILE ) ) {
This way, you can process even colossal files line-by-line.
On the other hand,
for(<SOME_FILE>){
# Do something involving $_
}
will first load the entire file as a list of lines into memory. Try a 1GB file and see the difference.
Another, albeit subtle, difference between:
while (<FILE>) {
}
and:
foreach (<FILE>) {
}
is that while() will modify the value of $_ outside of its scope, whereas, foreach() makes $_ local. For example, the following will die:
$_ = "test";
while (<FILE1>) {
print "$_";
}
die if $_ ne "test";
whereas, this will not:
$_ = "test";
foreach (<FILE1>) {
print "$_";
}
die if $_ ne "test";
This becomes more important with more complex scripts. Imagine something like:
sub func1() {
while (<$fh2>) { # clobbers $_ set from <$fh1> below
<...>
}
}
while (<$fh1>) {
func1();
<...>
}
Personally, I stay away from using $_ for this reason, in addition to it being less readable, etc.
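A minimal sketch of the safer pattern, using in-memory filehandles in place of real files (the sub name and data are assumptions for this illustration): the inner loop reads into a lexical variable, so the outer loop's $_ survives the call.

```perl
use strict;
use warnings;

# In-memory filehandles stand in for real files in this demo.
my $inner_text = "x\ny\n";

sub count_inner_lines {
    open my $fh, '<', \$inner_text or die "open failed: $!";
    my $n = 0;
    while (my $line = <$fh>) {    # lexical $line: the caller's $_ is untouched
        $n++;
    }
    return $n;
}

my $outer_text = "one\ntwo\n";
open my $outer, '<', \$outer_text or die "open failed: $!";

my @seen;
while (<$outer>) {
    count_inner_lines();
    chomp;
    push @seen, $_;    # still the line read from $outer
}
print "@seen\n";    # one two
```

Had the inner loop been written as while (<$fh>), $_ would be undefined after each call and the outer loop would lose its line.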
Regarding the 2nd question:
while (<FILE>) {
}
and
foreach (<FILE>) {
}
have the same functional behavior inside the loop, including setting $_ to the current line. The difference is that while() evaluates <FILE> in scalar context, while foreach() evaluates <FILE> in list context. Consider the difference between:
$x = <FILE>;
and
@x = <FILE>;
In the first case, $x gets the first line of FILE, and in the second case @x gets the entire file. Each entry in @x is a different line in FILE.
So, if FILE is very big, you'll waste memory slurping it all at once using foreach (<FILE>) compared to while (<FILE>). This may or may not be an issue for you.
The place where it really matters is if FILE is a pipe descriptor, as in:
open FILE, "some_shell_program|";
Now foreach(<FILE>) must wait for some_shell_program to complete before it can enter the loop, while while(<FILE>) can read the output of some_shell_program one line at a time and execute in parallel to some_shell_program.
That said, inside the loop body $_ holds the current line in both forms.
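The scalar-versus-list context difference can be seen with a short sketch. It uses an in-memory filehandle and made-up data, purely as assumptions for the illustration.

```perl
use strict;
use warnings;

my $data = "first\nsecond\nthird\n";

open my $fh, '<', \$data or die "open failed: $!";
my $x = <$fh>;    # scalar context: reads one line
my @x = <$fh>;    # list context: reads all remaining lines at once
close $fh;

chomp($x, @x);
print "$x\n";              # first
print scalar(@x), "\n";    # 2
```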
foreach evaluates the entire list up front. while evaluates its condition on each pass to see whether it's true. while should be considered for incremental operations, foreach only for list sources.
For example:
my $t = time() + 10;
while ( $t > time() ) {
    # do something
}
StackOverflow: What’s the difference between iterating over a file with foreach or while in Perl?
It is to avoid this sort of confusion that it's considered better form to avoid using the implicit $_ constructions.
my $element = shift @queue;
($item,@rest) = split /,/ , $element;
or
($item,@rest) = split /,/, shift @queue;
likewise
while(my $foo = <SOMEFILE>){
    # do something
}
or
foreach my $thing(<FILEHANDLE>){
    # do something
}
while only checks whether its condition is true; for also places each value in $_. There are exceptions: for example, <> will set $_ if used as the condition of a while loop.
To get behaviour similar to:
foreach(qw'a b c'){
# Do something involving $_
}
You have to set $_ explicitly.
my @list = qw'a b c';
while( defined( $_ = shift @list ) ){
    # Do something involving $_
}
It is better to explicitly set your variables
for my $line(<SOME_FILE>){
}
or better yet
while( my $line = <SOME_FILE> ){
}
which will only read in the file one line at a time.
Also, shift doesn't set $_ unless you specifically ask it to:
$_ = shift @_;
And split works on $_ by default. In older versions of Perl, split in scalar or void context also populated @_; that behaviour was deprecated and later removed.
Please read perldoc perlvar so that you will have an idea of the different variables in Perl.

How can I make Perl functions that use $_ by default?

I have an array and a simple function that trims white spaces:
my @ar = ("bla ", "ha 1");
sub trim { my $a = shift; $a =~ s/\s+$//; $a}
Now, I want to apply this to an array with the map function. Why can't I do this by just giving the function name like one would do with built-in functions?
For example, you can do
print map(length, @ar)
But you can't do
print map(trim, @ar)
You have to do something like:
print map {trim($_)} @ar
print map(trim($_), @ar)
If you are using 5.10 or later, you can specify _ as the prototype for trim. If you are using earlier versions, use Axeman's answer:
As the last character of a prototype, or just before a semicolon, you can use _ in place of $ : if this argument is not provided, $_ will be used instead.
use strict; use warnings;
my @x = ("bla ", "ha 1");
sub trim(_) { my ($x) = @_; $x =~ s!\s+$!!; $x }
print map trim, @x;
Incidentally, don't use $a and $b outside of a sort comparator: They are immune from strict checking.
However, I prefer not to use prototypes for functions I write mainly because their use makes it harder to mentally parse the code. So, I would prefer using:
map trim($_), @x;
See also perldoc perlsub:
This is all very powerful, of course, and should be used only in moderation to make the world a better place.
The prototype that Sinan talks about is the best current way. But for earlier versions, there is still the old standby:
sub trim {
    # v-- Here's the quick way to do it.
    my $str = @_ ? $_[0] : $_;
    # That was it.
    $str =~ s/^\s+|\s+$//g; # /g so both leading and trailing whitespace are removed
    return $str;
}
Of course, I have a trim function with more features and handles more arguments and list context, but it doesn't demonstrate the concept as well. The ternary expression is a quick way to do what the '_' prototype character now does.
My favorite way to optionally use $_ without needing 5.10+ is as follows:
sub trim {
    my ($s) = (@_, $_);
    $s =~ s/\s+$//;
    $s
}
This assigns the first element of @_ to $s if there is one. Otherwise it uses $_.
Many Perl built-in functions operate on $_ if given no arguments.
If your function did the same, it would work:
my @ar = ("bla ", "ha 1");
sub trim { my $s = @_ ? $_[0] : $_; $s =~ s/\s+$//; $s}
print map(trim, @ar), "\n";
And yes, Perl is kind of gross.