foreach my $var (@list) -- $var is a reference? - perl

So, I never knew this and I want to get some clarification on it. I know if you do
foreach (@list){
if you change $_ in that loop it will affect the actual data. But, I did not know that if you did
foreach my $var1 (@list){
If you changed $var1 in the loop it would change the actual data. :-/ So, is there a way to loop over @list but keep the variable a read-only copy, or a copy that if changed will not change the value in @list?

$var is aliased to each item in turn.
See http://perldoc.perl.org/perlsyn.html#Foreach-Loops
If any element of LIST is an lvalue, you can modify it by modifying VAR inside the loop.
Conversely, if any element of LIST is NOT an lvalue, any attempt to modify that element
will fail. In other words, the foreach loop index variable is an implicit alias for each
item in the list that you're looping over.
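A minimal sketch of the aliasing described in that passage (the variable names are illustrative and not taken from perlsyn): modifying the loop variable changes an array element, while the same operation on a literal constant fails at run time.

```perl
use strict;
use warnings;

my @list = (1, 2, 3);

# Each $n is an alias for an element of @list, so this edits @list in place.
foreach my $n (@list) {
    $n *= 10;
}
print "@list\n";    # 10 20 30

# Literal constants are not lvalues; modifying the alias dies at run time.
my $ok = eval {
    foreach my $n (1, 2, 3) {
        $n *= 10;
    }
    1;
};
my $error = $ok ? '' : $@;
print "error: $error";
```

The eval block is only there so the script can report the error instead of stopping.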

Easiest way is just to copy it:
foreach my $var1 (@list) {
my $var1_scratch = $var1;
or
foreach my $var1 ( map $_, @list ) {
But if $var1 is a reference, $var1_scratch will be a reference to the same thing.
To be really safe, you'd have to use something like Storable::dclone to do a deep copy:
foreach my $var1 ( @{ Storable::dclone( \@list ) } ) {
}
(untested). Then you should be able to safely change $var1. But it could be expensive if
@list is a big data structure.
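To make the shallow versus deep copy point concrete, here is a small sketch. It assumes a list of hash references with a hypothetical `name` key; only `Storable::dclone` comes from the answer above.

```perl
use strict;
use warnings;
use Storable qw(dclone);

my @list = ( { name => 'a' }, { name => 'b' } );

# Shallow copy: $copy is a new scalar, but it still points at the same hash.
foreach my $item (@list) {
    my $copy = $item;
    $copy->{name} = uc $copy->{name};
}
my $after_shallow = join ',', map { $_->{name} } @list;

# Deep copy: dclone duplicates the nested hashes, so @list is left alone.
foreach my $item ( @{ dclone( \@list ) } ) {
    $item->{name} = 'changed';
}
my $after_deep = join ',', map { $_->{name} } @list;

print "$after_shallow\n";   # A,B  (the originals were changed)
print "$after_deep\n";      # A,B  (no further change)
```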

It is an alias, not a reference. If you want to create your own aliases (outside of for) you can use Data::Alias.

The only difference between these loops:
foreach (@array) { ... }
foreach my $var (@array) { ... }
is the loop variable. The aliasing is a function of foreach, not the implicit variable $_. Note that this is an alias (another name for the same thing) and not a reference (a pointer to a thing).
In the simple (and common) case, you can break the aliasing by making a copy:
foreach my $var (@array) {
my $copy = $var;
# do something that changes $copy
}
This works for ordinary scalar values. For references (or objects) you would need to make a deep copy using Storable or Clone, which could be expensive. Tied variables are problematic as well; perlsyn recommends avoiding the situation entirely.

I don't know how to force the variable to be by-value instead of by-reference in the foreach statement itself. You can copy the value of $_ though.
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
my @data = (1, 2, 3, 4);
print "Before:\n", Dumper(\@data), "\n\n\n";
foreach (@data) {
my $v = $_;
$v++;
}
print "After:\n", Dumper(\@data), "\n\n\n";
__END__

Make a copy of @list in the for statement:
my @copy;
foreach my $var1 (@copy = @list) {
# modify $var1 without modifying @list here
}
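A sketch of copying inside the for statement, assuming a named scratch array (`@copy` is an illustrative name). A list assignment in list context yields the elements of the left-hand side, so the loop variable aliases the copy rather than the original. An empty left-hand side such as `() = @list` yields no elements in list context, so a loop over it would not run at all.

```perl
use strict;
use warnings;

my @list = (1, 2, 3);
my @copy;

# The assignment yields the elements of @copy, so $item aliases the copy.
foreach my $item (@copy = @list) {
    $item *= 2;
}

# A loop over an empty-list assignment never executes its body.
my $iterations = 0;
$iterations++ foreach (() = @list);

print "list: @list\n";            # list: 1 2 3
print "copy: @copy\n";            # copy: 2 4 6
print "iterations: $iterations\n"; # iterations: 0
```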

Related

How to reuse a variable, which is a reference to a hash?

I'm trying to add a few hashes to the array, reusing the same variable (in my real program, I'm doing this in a loop, that's why reusing the var). This is the code:
my @items;
my %x;
$x{'aa'} = 'bb';
push(@items, \%x);
%x = (); # I think the error is here, I'm not resetting the reference :(
$x{'cc'} = 'dd';
push(@items, \%x);
use Data::Dumper;
print Dumper \@items;
However, what I see is not what I expect:
$VAR1 = [
{
'cc' => 'dd'
},
$VAR1->[0]
];
What do I misunderstand?
ps. This is how it looks in the loop:
my @items;
my %x;
foreach ... {
%x = ();
$x{'aa'} = some_random_number();
push(@items, \%x);
}
Reusing a hash reference that way is not possible: \%x will always point to the same hash. And if you reset the same hash over and over and add new values to it, of course its earlier contents will be deleted.
You are correct that %x = () is the problem, because that is where you delete the content in %x (the 'aa' key).
What you want is to create a new hash reference for each value you want to store.
use strict;
use warnings;
my @items = qw( .... );
foreach ... {
my %x; # use a lexically scoped variable, which will be new each iteration
$x{'aa'} = some_random_number();
push(@items, \%x);
}
Or even:
my %x = (aa => some_random());
Or better yet, use a hash reference right away
my $x = { aa => some_random() };
push @items, $x;
Or again, a bit quicker:
push @items, { aa => some_random() };
{ ... } creates an anonymous hash ref. Reusing variables is hardly ever a good idea, unless you are using so many variables that you are afraid of memory issues. Use lexical scope and anonymous references to your advantage to encapsulate your code and avoid confusion.
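The difference between reusing one hash and declaring a fresh one per iteration can be seen in a short sketch (the key `n` and the loop values are made up for illustration):

```perl
use strict;
use warnings;

# Reusing one hash: both array slots hold a reference to the same %x.
my (@shared, %x);
for my $value (1, 2) {
    %x = (n => $value);
    push @shared, \%x;
}

# Declaring the hash inside the loop gives a new hash on every iteration.
my @fresh;
for my $value (1, 2) {
    my %y = (n => $value);
    push @fresh, \%y;
}

print "shared: $shared[0]{n} $shared[1]{n}\n";   # shared: 2 2
print "fresh:  $fresh[0]{n} $fresh[1]{n}\n";     # fresh:  1 2
```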
But since you are using different keys, and pushing them onto an array, I feel like you are confused about Perl data structures. You could just use the same hash, if your keys are different:
my %x;
foreach ... {
$x{$key} = some_random_number();
}
You probably want a variable to keep track of the key name there, not a constant.
In the comments you describe a practice of adding values to the hash, and then adding it to the array to start over. This is exactly why you should use a lexically scoped hash. For example:
....
for my $table (@tables) {
my %hash; # <--- new variable for each $table
for my $stuff (@stuff) {
$hash{$stuff} = something();
}
....
push @array, \%hash;
}
If you declare the hash inside a block, it is reset automatically: each iteration uses the "same" name, but that name refers to a different memory location (a new data reference).
Or if, as you say, you cannot use a loop outside, you can just put a block around the variable.
{ # start of a block
my %hash;
for my $stuff (@stuff) {
$hash{$stuff} = something();
}
....
push @array, \%hash;
} # end of a block
my %hash; # this "%hash" is a different variable from the previous

How can I use a named iterator with postfix foreach?

@values = (1..5);
foreach $s(@values){
print "$s\n"; #It works
}
print "$_\n",foreach (@values); #It also works
print "$s\n",foreach $s(@values); #It does not work
How do I give the variable a name?
The last line above does not print anything; it produces a syntax error.
How can I name the loop variable when foreach is written after the statement, separated by a comma?
It is not possible to assign a variable name to the for loop iterator when using a statement modifier:
Statement Modifiers
Any simple statement may optionally be followed by a SINGLE modifier, just before the terminating semicolon (or block ending). The possible modifiers are:
...
for LIST
foreach LIST
...
The for(each) modifier is an iterator: it executes the statement once for each item in the LIST (with $_ aliased to each item in turn).
print "Hello $_!\n" for qw(world Dolly nurse);
Instead, as stated above, you must use the $_ variable:
@values = (1..5);
print "$_\n" for @values;
If you want to use a variable name for the iterator, then simply use a for(each) in long form:
for my $var (@values) {
print "$var\n";
}
my @values = (1..5);
print "$_\n" foreach (@values);
__END__
1
2
3
4
5
If you really want the variable to be named $s, you can do this:
my @values = (1..5);
print "$s\n" while($s = shift @values);
but the first approach is more idiomatic and readable.
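One caveat with the `while ... shift` form, shown as a sketch with made-up data: it empties the array as it goes, and it stops at the first false value unless the test is wrapped in `defined`.

```perl
use strict;
use warnings;

my @values = (3, 0, 5);
my (@seen, $s);

# Stops as soon as shift returns a false value (here, 0).
push @seen, $s while ( $s = shift @values );

print "seen: @seen\n";      # seen: 3
print "left: @values\n";    # left: 5

# Testing definedness instead lets false values through.
my @all = (3, 0, 5);
my @seen_all;
push @seen_all, $s while defined( $s = shift @all );

print "seen_all: @seen_all\n";   # seen_all: 3 0 5
```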
The other answers are correct, you can't declare a name for the iterator in a for statement modifier. But here is a way you can get the similar effect of doing a loop on one line with a named variable.
perl5i provides a number of language enhancements, including this one...
use perl5i::2;
@a->foreach(func($s) { say $s });
# Similar to...
print "$_\n" for @a;
This also works for grep, map and the notoriously tricky to use correctly "each".
%h->each(func($key, $val) { say "$key => $val" });
You can also work in pairs, triplets, etc...
@a->foreach(func($this,$that) { say "$this, $that" });
Performance may suffer as perl5i makes a full function call for each iteration, rather than just a cheaper block entry.
Although I wouldn't recommend this in general due to readability issues, you could temporarily alias another global variable to the $_ variable.
{
our $s;
local *s = *_;
print "$s\n" for 1, 2, 3;
}
You use a do block and reassign the $_ variable inside the block:
@values = (1..5);
do { $s = $_; print "$s\n" },foreach (@values);

How to parse through many hashes using foreach?

foreach my %hash (%myhash1,%myhash2,%myhash3)
{
while (($keys,$$value) = each %hash)
{
#use key and value...
}
}
Why doesn't this work?
It reports a syntax error on the foreach line.
Please tell me why it is wrong.
This is wrong because you seem to think that this allows you to access each hash as a separate construct, whereas what you are in fact doing is, besides a syntax error, accessing the hashes as a mixed-together new list. For example:
my %hash1 = qw(foo 1 bar 1);
my %hash2 = qw(abc 1 def 1);
for (%hash1, %hash2) # this list is now qw(foo 1 bar 1 abc 1 def 1)
When you place a hash (or array) in a list context statement, they are expanded into their elements, and their integrity is not preserved. Some built-in functions do allow this behaviour, but normal Perl code does not.
You also cannot assign a hash as the for iterator variable, that can only ever be a scalar value. What you can do is this:
for my $hash (\%myhash1, \%myhash2, \%myhash3) {
while (my ($key, $value) = each %$hash) {
...
Which is to say, you create a list of hash references and iterate over them. Note that you cannot tell the difference between the hashes with this approach.
Note also that I use my $hash because this variable must be a scalar.
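If the loop does need to know which hash it is looking at, one possible sketch is to pair each reference with a label. The labels and the `@named` array are hypothetical additions, not part of the answer above.

```perl
use strict;
use warnings;

my %myhash1 = (a => 1);
my %myhash2 = (b => 2);

# Hypothetical labels so each hash can be identified inside the loop.
my @named = ( [ myhash1 => \%myhash1 ], [ myhash2 => \%myhash2 ] );

my @lines;
for my $pair (@named) {
    my ($name, $hash) = @$pair;
    for my $key (sort keys %$hash) {
        push @lines, "$name: $key => $hash->{$key}";
    }
}
print "$_\n" for @lines;
```

Using `keys` here instead of `each` also avoids sharing the hash's internal iterator with other code.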
The syntax should be like:
my $hash1 = {'a'=>1};
my $hash2 = {'b'=>1};
my @arr2 = ($hash1, $hash2);
foreach my $hash (@arr2)
{
while(my ($key, $value) = each %$hash)
{
print $key, $value;
}
}
You need to take a reference to each hash and then dereference it.

three questions on a Perl function

I am trying to use an existing Perl program, which includes the following function of GetItems. The way to call this function is listed in the following.
I have several questions for this program:
What does foreach my $ref (@_) aim to do? I think @_ should be related to the parameters passed, but not quite sure.
In my @items = sort { $a <=> $b } keys %items; the "items" on the left side should be different from the "items" on the right side? Why do they use the same name?
What does $items{$items[$i]} = $i + 1; aim to do? Looks like it just sets up the value for the hash $items sequentially.
$items = GetItems($classes, $pVectors, $nVectors, $uVectors);
######################################
sub GetItems
######################################
{
my $classes = shift;
my %items = ();
foreach my $ref (@_)
{
foreach my $id (keys %$ref)
{
foreach my $cui (keys %{$ref->{$id}}) { $items{$cui} = 1 }
}
}
my @items = sort { $a <=> $b } keys %items;
open(VAL, "> $classes.items");
for my $i (0 .. $#items)
{
print VAL "$items[$i]\n";
$items{$items[$i]} = $i + 1;
}
close VAL;
return \%items;
}
When you enter a function, @_ starts out as an array of (aliases to) all the parameters passed into the function; but the my $classes = shift removes the first element of @_ and stores it in the variable $classes, so the foreach my $ref (@_) iterates over all the remaining parameters, storing (aliases to) them one at a time in $ref.
Scalars, hashes, and arrays are all distinguished by the syntax, so they're allowed to have the same name. You can have a $foo, a @foo, and a %foo all at the same time, and they don't have to have any relationship to each other. (This, together with the fact that $foo[0] refers to @foo and $foo{'a'} refers to %foo, causes a lot of confusion for newcomers to the language; you're not alone.)
Exactly. It sets each element of %items to a distinct integer ranging from one to the number of elements, proceeding in numeric (!) order by key.
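The first two points can be checked with a small sketch. The subroutine `show_args` and its arguments are hypothetical stand-ins for GetItems and its vectors.

```perl
use strict;
use warnings;

sub show_args {
    my $first = shift;    # removes and returns the first element of @_
    my @rest  = @_;       # whatever is left after the shift
    return ($first, scalar @rest);
}

my ($first, $remaining) = show_args('classes', {}, {}, {});
print "$first, $remaining remaining\n";   # classes, 3 remaining

# Same name, three unrelated variables.
my $foo = 'scalar';
my @foo = ('array');
my %foo = (a => 'hash');
my $summary = "$foo $foo[0] $foo{a}";
print "$summary\n";                       # scalar array hash
```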
foreach my $ref (@_) loops through each hash reference passed as a parameter to GetItems. If the call looks like this:
$items = GetItems($classes, $pVectors, $nVectors, $uVectors);
then the loop processes the hash refs in $pVectors, $nVectors, and $uVectors.
@items and %items are COMPLETELY DIFFERENT VARIABLES!! @items is an array variable and %items is a hash variable.
$items{$items[$i]} = $i + 1 does exactly as you say. It sets the value of the %items hash whose key is $items[$i] to $i+1.
Here is an (nearly) line by line description of what is happening in the subroutine
Define a sub named GetItems.
sub GetItems {
Store the first value in the default array @_, and remove it from the array.
my $classes = shift;
Create a new hash named %items.
my %items;
Loop over the remaining values given to the subroutine, setting $ref to the value on each iteration.
for my $ref (@_){
This code assumes that the previous line set $ref to a hash ref. It loops over the unsorted keys of the hash referenced by $ref, storing the key in $id.
for my $id (keys %$ref){
Using the key ($id) given by the previous line, loop over the keys of the hash ref at that position in $ref, storing each key in $cui.
for my $cui (keys %{$ref->{$id}}) {
Set the value of %items at position $cui to 1.
$items{$cui} = 1;
End of the loops on the previous lines.
}
}
}
Store a sorted list of the keys of %items in @items according to numeric value.
my @items = sort { $a <=> $b } keys %items;
Open the file named by $classes with .items appended to it. This uses the old-style two arg form of open. It also ignores the return value of open, so it continues on to the next line even on error. It stores the file handle in the global *VAL{IO}.
open(VAL, "> $classes.items");
Loop over a list of indexes of @items.
for my $i (0 .. $#items){
Print the value at that index on its own line to *VAL{IO}.
print VAL "$items[$i]\n";
Using that same value as a key into %items (which it is a key of), set the corresponding hash value to the index plus one.
$items{$items[$i]} = $i + 1;
End of loop.
}
Close the file handle *VAL{IO}.
close VAL;
Return a reference to the hash %items.
return \%items;
End of subroutine.
}
I have several questions for this program:
What does foreach my $ref (@_) aim to do? I think @_ should be related to the parameters passed, but not quite sure.
Yes, you are correct. When you pass parameters into a subroutine, they automatically are placed in the @_ array. (Called a list in Perl). The foreach my $ref (@_) begins a loop. This loop will be repeated for each item in the @_ array, and each time, the value of $ref will be assigned the next item in the array. See Perldoc's Perlsyn (Perl Syntax) section about for loops and foreach loops. Also look at Perldoc's Perlvar (Perl Variables) section of General variables for information about special variables like @_.
Now, the line my $classes = shift; is removing the first item in the @_ list and putting it into the variable $classes. Thus, the foreach loop will be repeated three times. $ref will be set first to the value of $pVectors, then $nVectors, and finally $uVectors.
By the way, these aren't really scalar values. In Perl, you can have what is called a reference. This is the memory location of the data structure you're referencing. For example, I have five students, and each student has a series of tests they've taken. I want to store all the values of each test in a hash keyed by the student's ID.
Normally, each entry in the hash can only contain a single item. However, what if this item refers to a list that contains the student's grades?
Here's the list of student #100's grades:
@grades = (100, 93, 89, 95, 74);
And here's how I set Student 100's entry in my hash:
$student{100} = \@grades;
Now, I can talk about the first grade of the year for Student #100 as $student{100}[0]. See Mark's very short tutorial about references in the Perldoc.
In my @items = sort { $a <=> $b } keys %items; the "items" on the left side should be different from the "items" on the right side? Why do they use the same name?
In Perl, you have three major types of variables: Lists (what some people call Arrays), Hashes (what some people call Keyed Arrays), and Scalars. In Perl, it is perfectly legal to have different variable types have the same name. Thus, you can have $var, %var, and @var in your program, and they'll be treated as completely separate variables [1].
This is usually a bad thing to do and is highly discouraged. It gets worse when you think of the individual values: $var refers to the scalar while $var[3] refers to the list, and $var{3} refers to the hash. Yes, it can be very, very confusing.
In this particular case, he has a hash (a keyed array) called %items, and he's converting the keys in this hash into a list sorted numerically:
my @items = sort { $a <=> $b } keys %items;
Note that this is not the same as the shorter:
my @items = sort keys %items;
which compares the keys as strings (so 10 sorts before 9). The comparison block is needed when the keys are numbers.
See the Perldocs on the sort function and the keys function.
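One point worth checking about the two sort forms: without a comparison block, sort compares its arguments as strings, so the results differ for numeric keys. A small sketch with made-up keys:

```perl
use strict;
use warnings;

my %items = (2 => 1, 10 => 1, 9 => 1);

# Numeric comparison, as in the original subroutine.
my @numeric = sort { $a <=> $b } keys %items;

# Default comparison is by string, so "10" sorts before "2".
my @string = sort keys %items;

print "numeric: @numeric\n";   # numeric: 2 9 10
print "string:  @string\n";    # string:  10 2 9
```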
What does $items{$items[$i]} = $i + 1; aim to do? Looks like it just sets up the value for the hash $items sequentially.
Let's look at the entire loop:
foreach my $i (0 .. $#items)
{
print VAL "$items[$i]\n";
$items{$items[$i]} = $i + 1;
}
The subroutine runs this loop once for each item in the @items list. This is the sorted list of keys of the %items hash. The $#items means the largest index in the @items list. For example, if @items = ("foo", "bar", "foobar"), then $#items would be 2 because the last item in this list is $items[2], which equals foobar.
This way, he's hitting the index of each entry in @items. (REMEMBER: This is different from %items!).
The next line is a bit tricky:
$items{$items[$i]} = $i + 1;
Remember that $items{...} refers to the %items hash, while $items[$i] refers to the @items list! He's overwriting the values in the %items hash. It is keyed by each item in the @items list, and the value is being set to the index of that item plus 1. Let's assume that:
@items = ("foo", "bar", "foobar")
In the end, he's doing this:
$items{foo} = 1;
$items{bar} = 2;
$items{foobar} = 3;
[1] Well, this isn't 100% true. Perl stores each variable in a kind of hash structure. In memory, $var, @var, and %var will be stored in the same hash entry in memory, but in positions related to each variable type. 99.9999% of the time, this matters not one bit. As far as you are concerned, these are three completely different variables.
However, there are a few rare occasions where a programmer will take advantage of this when they futz directly with memory in Perl.
I want to show you how I would write that subroutine.
But first, I want to show you some of the steps of how, and why, I changed the code.
Reduce the number of for loops:
First off this loop doesn't need to set the value of $items{$cui} to anything in particular. It also doesn't have to be a loop at all.
foreach my $cui (keys %{$ref->{$id}}) { $items{$cui} = 1 }
This does practically the same thing. The only real difference is it sets them all to undef instead.
@items{ keys %{$ref->{$id}} } = ();
If you really need to set the values to 1, note that (1) x @keys returns a list of 1's with the same number of elements as @keys:
my @keys = keys %{$ref->{$id}};
@items{ @keys } = (1) x @keys;
If you are going to have to loop over a very large number of elements then a for loop may be a good idea, but only if you have to set the value to something other than undef. Since we are only using the loop variable once, to do something simple; I would use this code:
$items{$_} = 1 for keys %{$ref->{$id}};
Swap keys with values:
On the line before that we see:
foreach my $id (keys %$ref){
In case you didn't notice $id was used only once, and that was for getting the associated value.
That means we can use values and get rid of the %{$ref->{$id}} syntax.
for my $hash (values %$ref){
@items{ keys %$hash } = ();
}
( $hash isn't a good name, but I don't know what it represents. )
3 arg open:
It isn't recommended to use the two argument form of open, or to blindly use the bareword style of filehandles.
open(VAL, "> $classes.items");
As an aside, did you know there is also a one argument form of open. I don't really recommend it though, it's mostly there for backward compatibility.
our $VAL = "> $classes.items";
open(VAL);
The recommended way to do it is with 3 arguments.
open my $val, '>', "$classes.items";
There may be some rare edge cases where you need/want to use the two argument version though.
Put it all together:
sub GetItems {
# this will cause open and close to die on error (in this subroutine only)
use autodie;
my $classes = shift;
my %items;
for my $vector_hash (@_){
# use values so that we don't have to use $vector_hash->{$id}
for my $hash (values %$vector_hash){
# create the keys in %items
@items{keys %$hash} = ();
}
}
# This assumes that the keys of %items are numbers
my @items = sort { $a <=> $b } keys %items;
# using 3 arg open
open my $output, '>', "$classes.items";
my $index; # = 0;
for my $item (@items){
print {$output} $item, "\n";
$items{$item} = ++$index; # 1...
}
close $output;
return \%items;
}
Another option for that last for loop.
for my $index ( 1..@items ){
my $item = $items[$index-1];
print {$output} $item, "\n";
$items{$item} = $index;
}
If your version of Perl is 5.12 or newer, you could write that last for loop like this:
while( my($index,$item) = each @items ){
print {$output} $item, "\n";
$items{$item} = $index + 1;
}

Why is Perl foreach variable assignment modifying the values in the array?

OK, I have the following code:
use strict;
my @ar = (1, 2, 3);
foreach my $a (@ar)
{
$a = $a + 1;
}
print join ", ", @ar;
and the output?
2, 3, 4
What the heck? Why does it do that? Will this always happen? Is $a not really a local variable? What were they thinking?
Perl has lots of these almost-odd syntax things which greatly simplify common tasks (like iterating over a list and changing the contents in some way), but can trip you up if you're not aware of them.
$a is aliased to the value in the array - this allows you to modify the array inside the loop. If you don't want to do that, don't modify $a.
See perldoc perlsyn:
If any element of LIST is an lvalue, you can modify it by modifying VAR inside the loop. Conversely, if any element of LIST is NOT an lvalue, any attempt to modify that element will fail. In other words, the foreach loop index variable is an implicit alias for each item in the list that you're looping over.
There is nothing weird or odd about a documented language feature, although I do find it odd how many people refuse to check the docs upon encountering behavior they do not understand.
$a in this case is an alias to the array element. Just don't have $a = in your code and you won't modify the array. :-)
If I remember correctly, map, grep, etc. all have the same aliasing behaviour.
As others have said, this is documented.
My understanding is that the aliasing behavior of @_, for, map and grep provides a speed and memory optimization as well as providing interesting possibilities for the creative. What happens is, essentially, a pass-by-reference invocation of the construct's block. This saves time and memory by avoiding unnecessary data copying.
use strict;
use warnings;
use List::MoreUtils qw(apply);
my @array = qw( cat dog horse kangaroo );
foo(@array);
print join "\n", '', 'foo()', @array;
my @mapped = map { s/oo/ee/g } @array;
print join "\n", '', 'map-array', @array;
print join "\n", '', 'map-mapped', @mapped;
my @applied = apply { s/fee//g } @array;
print join "\n", '', 'apply-array', @array;
print join "\n", '', 'apply-applied', @applied;
sub foo {
$_ .= 'foo' for @_;
}
Note the use of the List::MoreUtils apply function. It works like map but makes a copy of the topic variable, rather than aliasing it. If you hate writing code like:
my @foo = map { my $f = $_; $f =~ s/foo/bar/; $f } @bar;
you'll love apply, which makes it into:
my @foo = apply { s/foo/bar/ } @bar;
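For readers without List::MoreUtils installed, the same copy-then-modify effect can be sketched with core Perl only. The data is made up, and the /r form assumes Perl 5.14 or newer.

```perl
use strict;
use warnings;

my @bar = qw(foo1 foo2);

# Copy $_ into a lexical, edit the copy, and return it.
my @copied = map { ( my $f = $_ ) =~ s/foo/bar/; $f } @bar;

# The /r flag (Perl 5.14 and newer) returns the modified copy directly.
my @with_r = map { s/foo/bar/r } @bar;

print "copied: @copied\n";   # copied: bar1 bar2
print "with_r: @with_r\n";   # with_r: bar1 bar2
print "source: @bar\n";      # source: foo1 foo2
```

Both forms return the modified string rather than the substitution count, and neither touches the source array.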
Something to watch out for: if you pass read only values into one of these constructs that modifies its input values, you will get a "Modification of a read-only value attempted" error.
perl -e '$_++ for "o"'
The important distinction here is that when you declare a my variable in the initialization section of a for loop, it seems to share some properties of both locals and lexicals (someone with more knowledge of the internals care to clarify?)
my @src = 1 .. 10;
for my $x (@src) {
# $x is an alias to elements of @src
}
for (@src) {
my $x = $_;
# $_ is an alias but $x is not an alias
}
The interesting side effect of this is that in the first case, a sub{} defined within the for loop is a closure around whatever element of the list $x was aliased to. Knowing this, it is possible (although a bit odd) to close around an aliased value which could even be a global, which I don't think is possible with any other construct.
our @global = 1 .. 10;
my @subs;
for my $x (@global) {
push @subs, sub {++$x}
}
$subs[5](); # modifies the @global array
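A shorter version of the same experiment, with three elements so the effect is easy to read. This is a sketch of the behaviour described in the answer above, not a statement about the internals.

```perl
use strict;
use warnings;

our @global = 1 .. 3;
my @subs;

# Each closure captures the element that $x was aliased to on that iteration.
for my $x (@global) {
    push @subs, sub { ++$x };
}

$subs[1]->();          # increments the second element of @global
print "@global\n";     # 1 3 3
```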
Your $a is simply being used as an alias for each element of the list as you loop over it. It's being used in place of $_. You can tell that $a is not a local variable because it is declared outside of the block.
It's more obvious why assigning to $a changes the contents of the list if you think about it as being a stand-in for $_ (which is what it is). In fact, the loop does not set $_ if you define your own iterator like that.
foreach my $a (1..10) {
print $_; # $_ is not set by this loop
}
If you're wondering what the point is, consider the case:
my @row = (1..10);
my @col = (1..10);
foreach (@row){
print $_;
foreach(@col){
print $_;
}
}
In this case it is more readable to provide a friendlier name for $_
foreach my $x (@row){
print $x;
foreach my $y (@col){
print $y;
}
}
Try
foreach my $a (@_ = @ar)
now modifying $a does not modify @ar.
Works for me on v5.20.2