Perl: Modify variable passed as param to subroutine

I need to modify a variable inside a routine, so it keeps the changes after leaving the routine. Here's an example:
$text = "hello";
&convert_to_uppercase($text);
print $text;
I want to see "HELLO" on the screen, not "hello".
The routine would be:
sub convert_to_uppercase($text){
<something like $text = uc($text);>
}
I know how to do it in PHP, but it seems that the parameters are not changed the same way. And, I've been searching everywhere and I couldn't find a concrete answer.

You really shouldn't use an ampersand & when calling a Perl subroutine. It is necessary only when treating the code as a data item, for instance when taking a reference, like \&convert_to_uppercase. Using it in a call hasn't been necessary since Perl 5 was released, and it does some arcane things that you probably don't want.
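As a minimal sketch of one of those arcane behaviours (the sub names count_args and outer are made up for illustration): calling a sub as &name; with no parentheses hands it the caller's current @_ instead of a fresh, empty argument list.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many arguments it received.
sub count_args { return scalar @_ }

sub outer {
    my $with_ampersand = &count_args;    # no parens: count_args sees outer's @_
    my $plain_call     = count_args();   # ordinary call: empty argument list
    return ($with_ampersand, $plain_call);
}

my ($with_ampersand, $plain_call) = outer('a', 'b', 'c');
print "$with_ampersand $plain_call\n";   # prints "3 0"
```

The ordinary call is the one that behaves the way most readers expect, which is a good reason to drop the ampersand.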
It is unusual for subroutines to modify their parameters, but the elements of @_ are aliases of the actual parameters so you can do what you ask by modifying that array.
If you write your subroutine like this
sub convert_to_uppercase {
$_[0] = uc $_[0];
}
then it will do what you ask. But it is generally best to return the modified value so that the decision on whether to overwrite the original value can be taken by the calling code. For instance, if I have
sub upper_case {
uc shift;
}
then it can be called either as
my $text = "hello";
$text = upper_case($text);
print $text;
which does as you require, and modifies $text; or as
my $text = "hello";
print upper_case($text);
which leaves $text unchanged, but returns the altered value.

Passing a reference and modifying the original variable inside the subroutine would be done like this:
$text = 'hello';
convert_to_uppercase(\$text); #notice the \ before $text
print $text;
sub convert_to_uppercase { #perl doesn't specify arguments here
### arguments will be in @_, so @_ is now a one-element list holding a reference to $text
my $ref = shift; #$ref is NOT 'hello'. it's a reference to $text
### add some output so you can see what's going on:
print 'Variable $ref is: ', $ref, " \n"; #will print some hex number like SCALAR(0xad1d2)
print 'Variable ${$ref} is: ', ${$ref}, " \n"; #will print 'hello'
# Now do what this function is supposed to do:
${$ref} = uc ${$ref}; #it's modifying the original variable, not a copy of it
}
The other way is to create a return value inside the subroutine and modify the variable outside of the subroutine:
$text = 'hello';
$text = convert_to_uppercase($text); #there's no \ this time
print $text;
sub convert_to_uppercase {
# @_ contains 'hello'
my $input = shift; #$input is 'hello'
return uc $input; #returns 'HELLO'
}
But the convert_to_uppercase routine seems redundant because that's what uc does. Skip all of that and just do this:
$text = 'hello';
$text = uc $text;

Related

Confusion about using perl #_ variable with regex capture

When I use the following code, subroutine f can't print @_ correctly because of the substitution on $tmp before it, which is explainable.
use strict;
use warnings;
sub f {
# print("args = ", @_);
my $tmp = "111";
$tmp =~ s/\d+//;
print("args = ", @_);
}
"dd11ddd" =~ /(?<var>\d+)/;
f($+{"var"});
But when I uncomment the first print statement, both prints give the correct @_, which confuses me: why hasn't the capture group been overwritten? Or is there some underlying mechanism of Perl I don't know about? Please help, thanks.
When I pass a capture group into a Perl subroutine, the capture group isn't overwritten as I expected.
I want to know why this could happen and how to explain it correctly.
Perl arguments are passed by reference.
sub f {
$_[0] = "def";
}
my $x = "abc";
say $x; # abc
f( $x );
say $x; # def
%+ is affected by $tmp =~ s/\d+//, and thus so is $_[0]. We don't usually run into problems because we usually make an explicit copy of the arguments.
sub f {
my $y = shift;
$y = "def";
}
my $x = "abc";
say $x; # abc
f( $x );
say $x; # abc
Passing a copy of the scalar would also avoid the problems.
sub f {
$_[0] = "def";
}
my $x = "abc";
say $x; # abc
f( "$x" );
say $x; # abc
The above explains why you get weird behaviour and how to avoid it, but not why accessing $_[0] before the substitution seems to fix it. Honestly, it doesn't really matter. It's some weird interaction between the magical nature of %+, the implicit localization of %+, the optimizations to avoid needless implicit localizations of %+, and the way localization works.
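Applied to the code in the question, the advice about copying might look like the sketch below. It assumes the goal is simply to keep the captured text stable; the variable names $captured, $from_copy and $from_alias are introduced here for illustration.

```perl
use strict;
use warnings;

sub f {
    my ($arg) = @_;      # copy first, before any regex runs in this sub
    my $tmp = "111";
    $tmp =~ s/\d+//;     # this match can no longer affect $arg
    return $arg;
}

"dd11ddd" =~ /(?<var>\d+)/ or die "no match";

my $captured   = $+{var};        # copy at the call site
my $from_copy  = f($captured);
my $from_alias = f($+{var});     # relies only on the copy made inside f

print "args = $from_copy\n";     # prints "args = 11"
print "args = $from_alias\n";    # prints "args = 11"
```

Either copy is enough on its own; doing it at the call site has the advantage of not depending on how the sub is written.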

Perform the same operation to all the variables but first variable passed in a Perl function

I am using a Perl function to dump a CSV file, where I pass certain values, and inside this function, I want to perform the same operation on these passed variables except for the first variable, which is the file handle.
What I want to do is to check whether a passed argument (string) has commas in it and, if so, enclose it in quotation marks (").
But I need to assign these values to variable names, as I have to use them later for different purposes.
Following is my subroutine:
sub printCSVRowData
{
my $CSVFileHandle = shift;
foreach my $str (@_) {
if ($str eq "" or not defined $str or $str =~ /^ *$/) {
$str = "NA";
}
$str =~ s/\"//g;
}
my $firstCol = shift;
my $secondCol = shift;
my $thirdCol = shift;
# Do some modifications
print $CSVFileHandle "$firstCol, $secondCol, $thirdCol";
}
Now the issue is that when I pass values to this subroutine, I get the following error message:
Modification of a read-only value attempted at line (where $str =~ s/\"//g; is called).
Can anyone help me with this? What am I doing wrong here? Is there any other way around this?
You are modifying @_, whose elements are aliases of the scalars passed as arguments. For this reason, modifying the elements of @_ isn't safe. That's why we copy the elements of @_ and then modify the copies instead.
sub printCSVRowData {
my ($csv, $fh, @fields) = @_;
@fields = map { defined($_) && /\S/ ? $_ : "NA" } @fields;
$csv->say($fh, \@fields);
}
You should be using Text::CSV_XS or similar.
my $csv = Text::CSV_XS->new({
auto_diag => 2,
binary => 1,
});
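To see where the error in the question comes from, here is a small sketch that assumes the sub was called with a string literal, which is one common way to end up with a read-only scalar behind @_. The sub names strip_quotes_in_place and strip_quotes are hypothetical.

```perl
use strict;
use warnings;

# Edits the caller's scalars through the @_ aliases (hypothetical name).
sub strip_quotes_in_place {
    s/"//g for @_;
    return;
}

# Works on private copies and returns them (hypothetical name).
sub strip_quotes {
    my @fields = @_;
    s/"//g for @fields;
    return @fields;
}

# The literal is read-only, so the in-place version dies; eval traps that.
my $survived = eval { strip_quotes_in_place('say "hi"'); 1 };
my $error = $survived ? '' : $@;
print "in place failed: $error" if $error;

my @clean = strip_quotes('say "hi"', 'plain');
print join('|', @clean), "\n";    # prints "say hi|plain"
```

The copying version accepts literals and variables alike, and leaves the caller's data untouched.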

Perl: return an array from subroutine

Perl noob.
I am having trouble understanding returning an array value from a subroutine in a module.
The script has the following:
print "Enter your first and last name.\n";
chomp(my $fullname = <STDIN>); #input is 'testing this' all lower case
Jhusbands::Loginpass->check_name($fullname);
print "$fullname\n";
The module includes the following subroutine:
sub check_name {
my $class = shift;
if ($_[0] =~ /^\w+\s+\w+$/ ) {
@_ = split( / /, $_[0]);
foreach $_ (@_) {
$_ = ucfirst lc for @_;
@_ = join(" ", @_);
print Dumper(@_) . "\n";
return @_;
}
}
}
I am taking the name, checking it for only first and last (I'll get to else statements later), splitting it, correcting the case, and joining again. Dumper displays the final array as:
$VAR1 = 'Testing This';
So it appears to be working that far. However, the returned value for $fullname in the script is displayed all lower case:
testing this
Why is it not taking the corrected uppercase variable that Dumper displays as the last array iteration?
You don't assign the return value to anything. Also, the sub manipulates @_, which it shouldn't be doing, as discussed below. It can also be greatly simplified:
sub check_name {
my ($class, $name) = @_;
if ($name =~ /^\w+\s+\w+$/) {
return join ' ', map { ucfirst lc } split ' ', $name;
}
return; # returns "undef" (as input wasn't in expected format)
}
Then the caller can do
my $fullname = Jhusbands::Loginpass->check_name($name);
print "$fullname\n" if $fullname;
A sub's return should always be checked but in this case even more so, since it processes its input conditionally. I renamed the input to sub (to $name), for clarity.
If the code in the sub is meant to change $fullname by writing directly to @_ (and you had no return for that reason), that fails, since after those manipulations $_[0] is no longer aliased to the argument that was passed.
In any case, doing that is very tricky, can lead to opaque code -- and is unneeded. To directly change the argument pass it as a reference and write to it. However, it is probably far clearer and less error prone to return the result in this case.
It should be noted that the above name "processing" runs into the standard problems with processing of names, due to their bewildering variety. If this needs to be comprehensive then the name parsing should be dispatched to a rounded library (or procedure) that can deal with the possible diversity.
Thanks to ikegami for comments bringing this up with examples, as well as a more direct way:
$name =~ s/(\w+)/\u\L$1/g;
return $name;
which with /r introduced in v5.14 can be written as
return $name =~ s/(\w+)/\u\L$1/gr;
If $name has no word-characters (\w) and there is no match this returns the same string.
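Put together as a runnable sketch (the plain function fix_case is a made-up stand-in for the class method, so the example needs no module):

```perl
use strict;
use warnings;
use 5.014;    # for the /r modifier

# Hypothetical stand-alone version of the name fix-up.
sub fix_case {
    my ($name) = @_;
    # \L lowercases the word, \u then uppercases its first character
    return $name =~ s/(\w+)/\u\L$1/gr;
}

my $fullname = fix_case('testing this');
print "$fullname\n";           # prints "Testing This"
print fix_case('--'), "\n";    # no \w characters: prints "--"
```

Because the caller assigns the return value, the original input string is left alone unless the caller chooses to overwrite it.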

Interpolating a non-interpolated passed string inside a subroutine in Perl

I am looking to parse a tab delimited text file into a nested hash with a subroutine. Each file row will be keyed by a unique id from a uid column(s), with the header row as nested keys. Which column(s) is(are) to become the uid changes (as sometimes there isn't a unique column, so the uid has to be a combination of columns). My issue is with the $uid variable, which I pass as a non-interpolated string. When I try to use it inside the subroutine in an interpolated way, it will only give me the non-interpolated value:
use strict;
use warnings;
my $lofrow = tablehash($lof_file, '$row{gene}', "transcript", "ENST");
##sub to generate table hash from file w/ headers
##input values are file, uid, header starter, row starter, max column number
##returns hash reference (deref it)
sub tablehash {
my ($file, $uid, $headstart, $rowstart, $colnum) = @_;
if (!$colnum){ # takes care of a unknown number of columns
$colnum = 0;
}
open(INA, $file) or die "failed to open $file, $!\n";
my %table; # permanent hash table
my %row; # hash of column values for each row
my @names = (); # column headers
my @values = (); # line/row values
while (chomp(my $line = <INA>)){ # reading lines for lof info
if ($line =~ /^$headstart/){
@names = split(/\t/, $line, $colnum);
} elsif ($line =~ /^$rowstart/){ # splitting lof info columns into variables
@values = split(/\t/, $line, $colnum);
@row{@names} = @values;
print qq($uid\t$row{gene}\n); # problem: prints "$row{gene} ACB1"
$table{"$uid"} = { %row }; # puts row hash into permanent hash, but with $row{gene} key)
}
}
close INA;
return \%table;
}
I am out of ideas. I could put $table{$row{$uid}} and simply pass "gene", but in a couple of instances I want to have a $uid of "$row{gene}|$row{rsid}" producing $table{ACB1|123456}
Interpolation is a feature of the Perl parser. When you write something like
"foo $bar baz"
Perl compiles it into something like
'foo ' . $bar . ' baz'
It does not interpret data at runtime.
What you have is a string where one of the characters happens to be $ but that has no special effect.
There are at least two possible ways to do something like what you want. One of them is to use a function, not a string. (Which makes sense because interpolation really means concatenation at runtime, and the way to pass code around is to wrap it in a function.)
my $lofrow = tablehash($lof_file, sub { my ($row) = @_; $row->{gene} }, "transcript", "ENST");
sub tablehash {
my ($file, $mkuid, $headstart, $rowstart, $colnum) = @_;
...
my $uid = $mkuid->(\%row);
$table{$uid} = { %row };
Here $mkuid isn't a string but a reference to a function that (given a hash reference) returns a uid string. tablehash calls it, passing a reference to %row to it. You can then later change it to e.g.
my $lofrow = tablehash($lof_file, sub { my ($row) = @_; "$row->{gene}|$row->{rsid}" }, "transcript", "ENST");
Another solution is to use what amounts to a template string:
my $lofrow = tablehash($lof_file, "gene|rsid", "transcript", "ENST");
sub tablehash {
my ($file, $uid_template, $headstart, $rowstart, $colnum) = @_;
...
(my $uid = $uid_template) =~ s/(\w+)/$row{$1}/g;
$table{$uid} = { %row };
The s/// code goes through the template string and manually replaces every word by the corresponding value from %row.
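Both approaches can be tried in isolation. The sketch below uses a made-up %row in place of a parsed file line; the column values are invented for illustration.

```perl
use strict;
use warnings;

# Made-up row data standing in for one parsed line of the file.
my %row = (gene => 'ACB1', rsid => '123456', chrom => '7');

# Approach 1: a code reference builds the key from a row.
my $mkuid = sub { my ($r) = @_; return "$r->{gene}|$r->{rsid}" };
my $uid_from_sub = $mkuid->(\%row);

# Approach 2: a template whose words are column names.
my $uid_template = 'gene|rsid';
(my $uid_from_template = $uid_template) =~ s/(\w+)/$row{$1}/g;

print "$uid_from_sub\n";        # prints "ACB1|123456"
print "$uid_from_template\n";   # prints "ACB1|123456"
```

The code reference is the more flexible of the two, since it can run arbitrary logic; the template is shorter but treats every word as a column name, so a word that is not a key of %row would produce an undefined-value warning.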
Random notes:
Bonus points for using strict and warnings.
if (!$colnum) { $colnum = 0; } can be simplified to $colnum ||= 0;.
Use lexical variables instead of bareword filehandles. Barewords are effectively global variables (and syntactically awkward because they're not first-class citizens of the language).
Always use the 3-argument form of open to avoid unexpected interpretation of the second argument.
Include the name of your program in error messages (either explicitly with $0 or implicitly by omitting \n from die).
my @foo = (); my %bar = (); is redundant and can be simplified to my @foo; my %bar;. Arrays and hashes start out empty; overwriting them with an empty list is pointless.
chomp(my $line = <INA>) will throw a warning when you reach EOF (because you're trying to chomp a variable containing undef).
my %row; should probably be declared inside the loop. It looks like it's supposed to only contain values from the current line.
Suggestion:
open my $fh, '<', $file or die "$0: can't open $file: $!\n";
while (my $line = readline $fh) {
chomp $line;
...
}

perl subroutine argument lists - "pass by alias"?

I just looked in disbelief at this sequence:
my $line;
$rc = getline($line); # read next line and store in $line
I had understood all along that Perl arguments were passed by value, so whenever I've needed to pass in a large structure, or pass in a variable to be updated, I've passed a ref.
Reading the fine print in perldoc, however, I've learned that @_ is composed of aliases to the variables mentioned in the argument list. After reading the next bit of data, getline() returns it with $_[0] = $data;, which stores $data directly into $line.
I do like this - it's like passing by reference in C++. However, I haven't found a way to assign a more meaningful name to $_[0]. Is there any?
You can, but it's not very pretty:
use strict;
use warnings;
sub inc {
# manipulate the local symbol table
# to refer to the alias by $name
our $name; local *name = \$_[0];
# $name is an alias to first argument
$name++;
}
my $x = 1;
inc($x);
print $x; # 2
The easiest way is probably just to use a loop, since loops alias their arguments to a name; i.e.
sub my_sub {
for my $arg ( $_[0] ) {
# code here sees $arg as an alias for $_[0]
}
}
A version of @Steve's code that allows for multiple distinct arguments:
sub my_sub {
SUB:
for my $thisarg ( $_[0] ) {
for my $thatarg ($_[1]) {
# code here sees $thisarg and $thatarg as aliases
last SUB;
}
}
}
Of course this brings multilevel nesting and its own code readability issues, so use it only when absolutely necessary.
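For a single argument, the loop trick can be checked with a short runnable sketch (the sub name inc and the variable $count are chosen here for illustration):

```perl
use strict;
use warnings;

sub inc {
    # foreach aliases $count to $_[0], which is itself an alias
    # of the caller's variable
    for my $count ($_[0]) {
        $count++;
    }
    return;
}

my $x = 1;
inc($x);
print "$x\n";    # prints "2"
```

The loop body runs exactly once, so the for here serves only to give $_[0] a readable name.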