How to avoid input modification in PDL subroutines - perl

I would like to avoid the assignment operator .= to modify the user input from a subroutine.
One way to avoid this is to perform a copy of the input inside the subroutine. Is this the best way to proceed? Are there other solutions?
use PDL;use strict;
my $a=pdl(1);
f_0($a);print "$a\n";
f_1($a);print "$a\n";
sub f_0{
my($input)=#_;
my $x=$input->copy;
$x.=0;
}
sub f_1{
my($input)=#_;
$input.=0;
}
In my case (perl 5.22.1), executing last script prints 1 and 0 in two lines. f_0 does not modify user input in-place, while f_1 does.

According to the FAQ 6.17 What happens when I have several references to the same PDL object in different variables :
Piddles behave like Perl references in many respects. So when you say
$a = pdl [0,1,2,3]; $b = $a;
then both $b and $a point to the same
object, e.g. then saying
$b++;
will not create a copy of the original piddle but just
increment in place
[...]
It is important to keep the "reference nature" of piddles in mind when
passing piddles into subroutines. If you modify the input piddles you
modify the original argument, not a copy of it. This is different from
some other array processing languages but makes for very efficient
passing of piddles between subroutines. If you do not want to modify
the original argument but rather a copy of it just create a copy
explicitly...
So yes, to avoid modification of the original, create a copy as you did:
my $x = $input->copy;
or alternatively:
my $x = pdl( $input );

Related

Perl subroutines

Here I fixed most of my mistakes and thank you all, any other advice please with my hash at this point and how can I clear each word and puts the word and its frequency in a hash, excluding the empty words.. I think my code make since now.
So you can focus on the key part of the algorithm, how about accepting input on STDIN and output to STDOUT. That way there's no argument checking, etc. Just a simple:
$ prog < words.txt
All you really need is a very simple algorithm:
Read a line
Split it into words
Record a count of the word
When done, display the counts
Here's a sample program
#! /usr/bin/perl -w
use strict;
my (%data);
while (<STDIN>) {
chomp;
my(#words) = split(/\s+/);
foreach my $word (#words) {
if (!defined($data{$word})) {
$data{$word} = 0;
}
$data{$word}++;
}
}
foreach (sort(keys(%data))) {
print "$_: $data{$_}\n";
}
Once you understand this and have it working in your environment, you can extend it to meet your other requirements:
remove non-alphabetic characters from each word
print three results per line
use input and output files
put the algorithm into a subroutine
I agree that starting with dave's answer would be more productive, but if you are interested in your mistakes, here is what I see:
You assign the return value of checkArgs to a scalar variable $checkArgs, but return an array value. It means that $checkArgs will always contain 2 (the size of the array) after this call (because the program dies if the number of arguments is not 2). It is not very bad since you do not use the value later, but why you need it at all in this case?
You open files and close them immediately without reading from them. Does not make sense.
Statement
while (<>)
reads either from standard output or from all files in the command line arguments. The latter variant is like what you want, but your second argument is the output file, not input. The diamond operator will try to read from it too. You have two options: a) use only one file name in the command line arguments, read the file with <>, use standard output for output, and redirect output to a file in shell; b) use
while(<$file1>)
instead, of course, before closing files. Option a) is the traditional Unix- and Perl-style, but b) provides for clearer code for beginners.
Statements
return $word;
and
return $str, $hash{$str};
return corresponding values on the first iterations of the loops, all other data remain unprocessed. In the first case, you should create a local array, store all $word in it and return the array as a whole. In the second case, you already have such local %hash, it is enough to return this hash. In both cases, you need should assign the return values of the functions not to scalars, but to an array and a hash correspondingly. Now, you actually lose all you data.

Perl operators that modify inputs in-place

I recently took a Perl test and one of the questions was to find all the Perl operations that can be used to modify their inputs in-place. The options were
sort
map
do
grep
eval
I don't think any of these can modify the inputs in-place. Am I missing anything here or is the question wrong?
Try this:
my #array = qw(1 2 3 4);
print "#array\n";
my #new_array = map ++$_, #array;
print "#new_array\n";
print "#array\n"; # oops, we modified this in-place
grep is similar. For sort, the $a and $b variables are aliases back to the original array, so can also be used to modify it. The result is somewhat unpredictable, depending on what sorting algorithm Perl is using (which has historically changed in different versions of Perl, though hasn't changed in a while).
my #arr = qw(1 2 3 4 5);
my #new = sort { ++$a } #arr;
print "#arr\n";
do and eval can take an arbitrary code block, so can obviously modify any non-readonly variable, though it's not clear whether that counts as modifying inputs in place. Slade's example using the stringy form of eval should certainly count though.
I'm assuming the question is testing to see if the student knows to properly use the return values of sort, map, and so on instead of using them in void context and expecting side effects. It's totally possible to modify the parameters given, though.
map and grep alias $_ to each element, so modifying $_ will change the values of the variables in the list passed to it (assuming they're not constants or literals).
eval EXPR and do EXPR can do anything, more or less, so there's nothing stopping you from doing something like:
my $code = q($code = 'modified');
eval $code;
say $code;
The arguments to do BLOCK and eval BLOCK are always a literal block of code, which aren't valid lvalues in any way I know of.
sort has a special optimization when called like #array = sort { $a <=> $b } #array;. If you look at the opcodes generated by this with B::Concise, you'll see something like:
9 <#> sort lK/INPLACE,NUM
But for a question about the language semantics, an implementation detail is irrelevant.

A better variable naming system?

A newbie to programming. The task is to extract a particular data from a string and I chose to write the code as follows -
while ($line =<IN>) {
chomp $line;
#tmp=(split /\t/, $line);
next if ($tmp[0] !~ /ch/);
#tgt1=#tmp[8..11];
#tgt2=#tmp[12..14];
#tgt3=#tmp[15..17];
#tgt4=#tmp[18..21];
foreach (1..4) {
print #tgt($_), "\n";
}
I thought #tgt($_) would be interpreted as #tgt1, #tgt2, #tgt3, #tgt4 but I still get the error message that #tgt is a global symbol (#tgt1, #tgt2, #tgt3, #tgt4` have been declared).
Q1. Did I misunderstand foreach loop?
Q2. Why couldn't perl see #tgt($_) as #tgt1, #tgt2 ..etc?
Q2. From the experience this is probably a bad way to name variables. What would be a preferred way to name variables that have similar features?
Q2. Why couldn't perl see #tgt($_) as #tgt1, #tgt2 ..etc?
Q2. From the experience this is probably a bad way to name variables. What would be a preferred way to name variables that have similar features?
I'll asnswer both together.
#tgt($_) does NOT mean what you hope it means
First off, it's an invalid syntax (you can't use () after an array name, perl interpeter will produce a compile error).
What you're trying to do is access distinct variables by accessing a variable via an expression resulting in its name (aka symbolic references). This IS possible to do; but is typically a bad idea and poor-style Perl (as in, you CAN but you SHOULD NOT do it, without a very very good reason).
To access element $_ the way you tried, you use #{"tgt$_"} syntax. But I repeat - Do Not Do That, even if you can.
A correct idiomatic solution: use an array of arrayrefs, with your 1-4 (or rather 0-3) indexing the outer array:
# Old bad code: #tgt1=#tmp[8..11];
# New correct code:
$tgt[0]=[ #tmp[8..11] ]; # [] creates an array reference from a list.
# etc... repeat 4 times - you can even do it in a smart loop later.
What this does is, it stores a reference to an array slice into a zeroth element of a single #tgt array.
At the end, #tgt array has 4 elements , each an array reference to an array containing one of the slices.
Q1. Did I misunderstand foreach loop?
Your foreach loop (as opposed to its contents - see above) was correct, with one style caveat - again, while you CAN use a default $_ variable, you should almost never use it, instead always use named variables for readability.
You print the abovementioned array of arrayrefs as follows (ask separately if any of the syntax is unclear - this is a mid-level data structure handling, not for beginners):
foreach my $index (0..3) {
print join(",", #{ $tgt[$index]}) . "\n";
}

Pass by value vs pass by reference for a Perl hash

I'm using a subroutine to make a few different hash maps. I'm currently passing the hashmap by reference, but this conflicts when doing it multiple times. Should I be passing the hash by value or passing the hash reference?
use strict;
use warnings;
sub fromFile($){
local $/;
local our %counts =();
my $string = <$_[0]>;
open FILE, $string or die $!;
my $contents = <FILE>;
close FILE or die $!;
my $pa = qr{
( \pL {2} )
(?{
if(exists $counts{lc($^N)}){
$counts{lc($^N)} = $counts{lc($^N)} + 1;
}
else{
$counts{lc($^N)} = '1';
}
})
(*FAIL)
}x;
$contents =~ $pa;
return %counts;
}
sub main(){
my %english_map = &fromFile("english.txt");
#my %german_map = &fromFile("german.txt");
}
main();
When I run the different txt files individually I get no problems, but with both I get some conflicts.
Three comments:
Don't confuse passing a reference with passing by reference
Passing a reference is passing a scalar containing a reference (a type of value).
The compiler passes an argument by reference when it passes the argument without making a copy.
The compiler passes an argument by value when it passes a copy of the argument.
Arguments are always passed by reference in Perl
Modifying a function's parameters (the elements of #_) will change the corresponding variable in the caller. That's one of the reason the convention to copy the parameters exists.
my ($x, $y) = #_; # This copies the args.
Of course, the primary reason for copying the parameters is to "name" them, but it saves us from some nasty surprises we'd get by using the elements of #_ directly.
$ perl -E'sub f { my ($x) = #_; "b"=~/(.)/; say $x; } "a"=~/(.)/; f($1)'
a
$ perl -E'sub f { "b"=~/(.)/; say $_[0]; } "a"=~/(.)/; f($1)'
b
One cannot pass an array or hash as an argument in Perl
The only thing that can be passed to a Perl sub is a list of scalars. (It's also the only thing that can be returned by one.)
Since #a evaluates to $a[0], $a[1], ... in list context,
foo(#a)
is the same as
foo($a[0], $a[1], ...)
That's why we create a reference to the array or hash we want to pass to a sub and pass the reference.
If we didn't, the array or hash would be evaluated into a list of scalars, and it would have to be reconstructed inside the sub. Not only is that expensive, it's impossible in cases like
foo(#a, #b)
because foo has no way to know how many arguments were returned by #a and how many were returned by #b.
Note that it's possible to make it look like an array or hash is being passed as an argument using prototypes, but the prototype just causes a reference to the array/hash to be created automatically, and that's what actually passed to the sub.
For a couple of reasons you should use pass-by-reference, but the code you show returns the hash by value.
You should use my rather than local except for built-in variables like $/, and then for only as small a scope as possible.
Prototypes on subroutines are almost never a good idea. They do something very specific, and if you don't know what that is you shouldn't use them.
Calling subroutines using the ampersand sigil, as in &fromFile("english.txt"), hasn't been correct since Perl 4, about twenty years ago. It affects the parameters delivered to a subroutine in at least two different ways and is a bad idea.
I'm not sure why you are using a file glob with my $string = <$_[0]>. Are you expecting wildcards in the filename passed as the parameter? If so then you will be opening and reading only the first matching file, otherwise the glob is unnecessary.
Lexical file handles like $fh are better than bareword file handles like FILE, and will be closed implicitly when they are destroyed - usually at the end of the block where they are declared.
I am not sure how your hash %counts gets populated. No regex on its own can fill a hash, but I will have to trust you!
Try this version. People familiar with Perl will thank you (ironically!) for not using camel-case variable names. And it is rare to see a main subroutine declared and called. That is C, this is Perl.
Update I have changed this code to do what your original regex did.
use strict;
use warnings;
sub from_file {
my ($filename) = #_;
my $contents = do {
open my $fh, '<', $filename or die qq{Unable to open "$filename": $!};
local $/;
my $contents = <$fh>;
};
my %counts;
$counts{lc $1}++ while $contents =~ /(?=(\pL{2}))/g;
return \%counts;
}
sub main {
my $english_map = from_file('english.txt');
my $german_map = from_file('german.txt');
}
main();
You can use either a reference or pass the entire hash or array. Your choice. There are two issues that might make you choose one over the other:
Passing other parameters
Memory Management
Perl doesn't really have subroutine parameters. Instead, you're simply passing in an array of parameters. What if you're subroutine is seeing which array has more elements. I couldn't do this:
foo(#first, #second);
because all I'll be passing in is one big array that combines all the members of both. This is true with hashes too. Imagine a program that takes two hashes and finds the ones with common keys:
#common_keys = common(%hash1, %hash1);
Again, I'm combining all the keys and their values in both hashes into one big array.
The only way around this issue is to pass a reference:
foo(\#first, \#second);
#common_keys = common(\%hash1, \%hash2);
In this case, I'm passing the memory location where these two hashes are stored in memory. My subroutine can use those hash references. However, you do have to take some care which I'll explain with the second explanation.
The second reason to pass a reference is memory management. If my array or hash is a few dozen entries, it really doesn't matter all that much. However, imagine I have 10,000,000 entries in my hash or array. Copying all those members could take quite a bit of time. Passing by reference saves me memory, but with a terrible cost. Most of the time, I'm using subroutines as a way of not affecting my main program. This is why subroutines are suppose to use their own variables and why you're taught in most programming courses about variable scope.
However, when I pass a reference, I'm breaking that scope. Here's a simple program that doesn't pass a reference.
#! /usr/bin/env perl
use strict;
use warnings;
my #array = qw(this that the other);
foo (#array);
print join ( ":", #array ) . "\n";
sub foo {
my #foo_array = #_;
$foo_array[1] = "FOO";
}
Note that the subroutine foo1 is changing the second element of the passed in array. However, even though I pass in #array into foo, the subroutine doesn't change the value of #array. That's because the subroutine is working on a copy (created by my #foo_array = #_;). Once the subroutine exists, the copy disappears.
When I execute this program, I get:
this:that:the:other
Now, here's the same program, except I'm passing in a reference, and in the interest of memory management, I use that reference:
#! /usr/bin/env perl
use strict;
use warnings;
my #array = qw(this that the other);
foo (\#array);
print join ( ":", #array ) . "\n";
sub foo {
my $foo_array_ref = shift;
$foo_array_ref->[1] = "FOO";
}
When I execute this program, I get:
this:FOO:the:other
That's because I don't pass in the array, but a reference to that array. It's the same memory location that holds #array. Thus, changing the reference in my subroutine causes it to be changed in my main program. Most of the time, you do not want to do this.
You can get around this by passing in a reference, then copying that reference to an array. For example, if I had done this:
sub foo {
my #foo_array = #{ shift() };
I would be making a copy of my reference to another array. It protects my variables, but it does mean I'm copying my array over to another object which takes time and memory. Back in the 1980s when I first was programming, this was a big issue. However, in this age of gigabyte memory and quadcore processors, the main issue isn't memory management, but maintainability. Even if your array or hash contained 10 million entries, you'll probably not notice any time or memory issues.
This also works the other way around too. I could return from my subroutine a reference to a hash or the entire hash. Many people like returning a reference, but this could be problematic.
In object oriented Perl programming, I use references to keep track of my objects. Normally, I'll have a reference to a hash I can use to store other values, arrays, and hashes.
In a recent program, I was counting IDs and how many times they are referenced in a log file. This was stored in an object (which is just a reference to a hash). I had a method that would return the entire hash of IDs and their counts. I could have done this:
return $self->{COUNT_HASH};
But, what happened, if the user started modifying that reference I passed? They would be actually manipulating my object without using my methods to add and subtract from the IDs. Not something that I want them to do. Instead, I create a new hash, and then return a reference to that hash:
my %hash_counts = % { $self-{COUNT_HASH} };
return \%hash_count;
This copied my reference to an array, and then I passed the reference to the array. This protects my data from outside manipulation. I could still return a reference, but the user would no longer have access to my object without going through my methods.
By the way, I like using wantarray which gives the caller a choice on how they want their data:
my %hash_counts = %{ $self->{COUNT_HASH} };
return want array ? %hash_counts : \%hash_counts;
This allows me to return a reference or a hash depending how the user called my object:
my %hash_counts = $object->totals(); # Returns a hash
my $hash_counts_ref = $object->totals(); # Returns a reference to a hash
1 A footnote: The #_ array is pointing to the same memory location as the parameters of your calling subroutine. Thus, if I pass in foo(#array) and then did $_[1] = "foo";, I would be changing the second element of #array.

Passing arrays to functions in Perl

I think I have misunderstood some aspects of argument passing to functions in Perl. What's the difference between func(\#array) and func(#array)?
AFAIK, in both functions, arguments are passed by reference and in both functions we can change the elements of #array in the main program. So what's the difference? When should we use which?
#array = (1,2,3);
func(#array);
func(\#array);
sub func {
...
}
Also, how do I imitate pass-by-value in Perl? Is using #_ the only way?
It's impossible to pass arrays to subs. Subs take a list of scalars for argument. (And that's the only thing they can return too.)
You can pass a reference to an array:
func(\#array)
You can pass the elements of an array:
func(#array)
When should we use which?
If you want to pass more than just the elements of the array (e.g. pass $x, $y and #a), it can become tricky unless you pass a reference.
If you're going to process lists (e.g. sum mysub grep { ... } ...), you might not want to pass a reference.
If you want to modify the array (as opposed to just modifying the existing elements of the array), you need to pass a reference.
It can be more efficient to pass a reference for long arrays, since creating and putting one reference on the stack is faster than creating an alias for each element of a large array. This will rarely be an issue, though.
It's usually decided by one of the first two of the above. Beyond that, it's mostly a question of personal preference.
Also, how do I imitate pass-by-value in Perl?
sub foo {
my ($x) = #_; # Changing $x doesn't change the argument.
...
}
sub foo {
my #a = #_; # Changing #a or its contents
... # doesn't change the arguments.
}
AFAIK, in both functions, arguments are passed by reference and in both functions we can change the elements of #array in the main program.
"change the elements of", yes. However, in the func(#array) case, the sub has no means to make other changes to the array (truncating it, pushing, popping, slicing, passing a reference to something else, even undef'ing it).
I would avoid using the term "passed by reference", since the mechanism is completely different than Perl's references. It is less overloaded :) to say that in the sub, #_'s elements start off aliased to the elements passed to the sub.
func(\#array) passes a reference. func(#array) passes a list (of the elements in #array). As Keith pointed out, these elements are passed by reference. However, you can make a copy inside the sub in order to pass by value.
What you are after is this:
sub func {
my #array = #_;
}
This will pass a copy of the arguments of func to #array, which is a local variable within the scope of the subroutine.
Documentation here