Accessing variable outside foreach loop - perl

What is the scope of a variable declared in this way:
foreach $variable (<FILE>){
if($variable...){
}
}
print "$variable \n";
Is it possible to use it outside the loop?
Thanks in advance.

At first, it would seem that it should work, because you're obviously not running this code with strict as you don't declare $variable. And you're not declaring it a lexical variable (my $variable), so it is a "package variable", and works like a global.
However, Perl (needlessly) localizes the scope of the loop variable.
And even though this looks like it should work:
use strict;
use warnings;
use feature 'say';
...
my $variable; # creates a lexical variable.
foreach $variable (<FILE>){
if($variable...){
...
}
}
say $variable; # modern form of: print "$variable \n";
Perl, again needlessly, localizes the scope of the variable.
Often you can declare the lexical as part of the loop. Like so:
foreach my $variable ( <FILE> ) {
...
}
That does not allow you to access the variable outside of the loop. However, whether you specify my in the loop or not, just putting the variable before the parentheses localizes the scope of whatever variable you use there.
So if you want to know what the value is outside the loop, it has to be explicitly other than the loop variable.
my $var;
foreach my $variable ( <FILE> ) {
$var = $variable;
}
say $var;
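In the comments below, you asked me what a better way to read a file would be. So the below contains some nitpicks.
By far the best way to loop through a file is a while loop. It has much less overhead than a foreach loop, because foreach evaluates <FILE> in list context and reads the whole file into memory before iterating, whereas while reads one line at a time. The Perl syntax also makes it easy to use.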
use English qw<$OS_ERROR>; # imports a standard readable alias for $!
# 1) Use lexical file handles, not "barewords"; 2) use 3-argument open;
# 3) always open or die.
open( my $handle, '<', 'foo.txt' )
or die "Could not open file: $OS_ERROR!"
;
while ( my $line = <$handle> ) {
...
}
close $handle;
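Here is a minimal, self-contained sketch of the same loop. It assumes an in-memory file (the $text string is hypothetical sample data, not from the question) so that it runs without foo.txt. It also applies the earlier advice: copy whatever you need into a variable declared outside the loop.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the contents of a real file.
my $text = "first\nsecond\nthird\n";

# Opening a reference to a scalar gives an in-memory filehandle.
open( my $handle, '<', \$text )
    or die "Could not open in-memory file: $!";

my $last_line;    # declared outside the loop, so it survives the loop
my $count = 0;
while ( my $line = <$handle> ) {
    chomp $line;
    $last_line = $line;
    $count++;
}
close $handle;

print "$count lines, last was '$last_line'\n";
```

Only one line is held in memory at a time, which is the reason to prefer while over foreach here.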

Yes, it is possible, but be aware that the variable is always localized to the loop (it gets its previous value back after the loop).
From perldoc perlsyn
If the variable is preceded with the keyword my, then it is lexically scoped, and is therefore visible only within the loop.
Otherwise, the variable is implicitly local to the loop and regains its former value upon exiting the loop. If the variable was previously declared with my, it uses that variable instead of the global one, but it's still localized to the loop. This implicit localization occurs only in a foreach loop.
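The perlsyn behaviour quoted above can be checked with a short sketch. The list values and the names $variable and $last_seen are only illustrative.

```perl
use strict;
use warnings;

my $variable = 'before';
my $last_seen;

# No "my" here: foreach uses the existing lexical, but localizes it.
for $variable (qw(a b c)) {
    $last_seen = $variable;    # copy into a variable that outlives the loop
}

print "loop variable after loop: $variable\n";    # before
print "last value seen: $last_seen\n";            # c
```

The loop variable regains 'before' once the loop ends, so the copy in $last_seen is the only way to see the final element.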

Related

How to re-declare a variable in the same scope in perl?

Is there a way to re-declare a variable in the same scope using the my keyword in perl? When I run the following script:
use warnings;
use strict;
my $var = 3;
print "$var\n";
undef $var;
my $var = 4;
print "$var\n";
I get the "desired" output, but there is also a warning "my" variable $var masks earlier declaration in same scope. Is there a way to re-declare the variable without getting the warning?
I'm not sure, but I think this is because my happens at compile-time and undef happens at run-time because the warning is being printed even before the first print statement. (I'm not even sure if perl actually compiles the thing before running it.)
Context: I want to be able to copy a chunk of code and paste it multiple times in the same file without having to edit-out all the my declarations. I guess this isn't the best way to do it, but any solution to the problem would be appreciated.
To avoid the warning, you can enclose the new variable declaration, and the code that uses it, inside curly braces ({...}) and create a new scope.
my $var = 3;
print "$var\n";
{
my $var = 4;
print "$var\n";
}
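As a small follow-up sketch (same variable names as above), printing after the block shows that the inner declaration is a separate variable and the outer one is untouched:

```perl
use strict;
use warnings;

my $var = 3;
{
    my $var = 4;              # a new variable that shadows the outer one
    print "inner: $var\n";    # inner: 4
}
print "outer: $var\n";        # outer: 3
```

Each pasted chunk can therefore sit in its own pair of braces without any "masks earlier declaration" warning.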

In Perl, can local() create a variable?

I have read many posts in Stackoverflow and in Google which tell that local does not create a variable, instead it works on the existing ones.
I have a small piece of code below and I wonder how local is working when there is no such variable already created.
#use strict;
#use warnings;
&func;
sub func{
local $temp = 20;
print $temp;
}
This I wrote just to understand the concept and I am relatively new to Perl.
Unless you declare a variable with my, variables without a full package specification go into the current package. Here's how you might see variables used for the first time and what they would be:
my $temp; # a scoped, lexical variable that does not live in any package
state $temp; # a persistent lexical variable
our $temp; # a package variable in the current package, declared
$temp; # a package variable in the current package
$main::temp # a package variable in main
$Foo::Bar::temp # a package variable in Foo::Bar
local $temp # a package variable in the current package, with a dynamically-scoped (temporary) value
The local sets the scope of a package variable. When you declare this "dynamic" scope, Perl uses the temporary value you set until the end of the scope. As with other package variables, Perl creates them when you first use them. That you might use it first with local in front doesn't affect that.
Many people who tried to answer your question immediately nagged you about strict. This is a programming aid that helps you not mistype a variable name by forcing you to declare all variables you intend to use. When you use a variable name you haven't declared, it stops the compilation of your program. You can do that with the vars pragma, my, state, or our:
use vars qw($temp);
our $temp;
my $temp;
state $temp;
local isn't part of that, as you've seen. Why? Because that's just how it is. I'd like it more if it were different.
strict won't complain if you use the full package specification, such as $Foo::Bar::temp. You can mistype all of those without ever noticing.
I mostly reserve my use of local for Perl's special variables, which you don't have to declare. If I want to use $_ in a subroutine, perhaps to use the operators that use $_ by default, I'll probably start that with local $_:
sub something {
local $_ = shift @_;
s/.../.../;
tr/.../.../;
...;
}
I probably use local more often with the input record separator so I can use different line endings without affecting what might have come before:
my $data = do { local $/; <FILE> };
Those work because there's an implicit first use of those variables that you haven't seen.
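A runnable sketch of that slurp idiom, assuming an in-memory filehandle and hypothetical sample text instead of FILE, also shows that $/ is back to its default once the do block ends:

```perl
use strict;
use warnings;

# Hypothetical sample data; a real program would open a file on disk.
my $text = "a\nb\nc\n";
open( my $fh, '<', \$text ) or die "Could not open in-memory file: $!";

# local $/ makes $/ undef for the block, so one read returns everything.
my $data = do { local $/; <$fh> };
close $fh;

print length($data), "\n";    # 6
print( ( defined $/ && $/ eq "\n" ) ? "restored\n" : "changed\n" );
```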
Otherwise, I probably want to make a variable private to its subroutine so nothing outside the subroutine can see it. In that case, I don't want a package variable that the rest of the program can read or write. That's the job for my variables:
sub something {
my $temp = ...;
}
The trick of programming is to limit what can happen to exactly what you want. If the rest of your program shouldn't be able to see or change the variable, my is the way to go.
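To make the difference concrete, here is a small sketch with made-up names ($setting, show, with_local, with_my). A local value is visible to any subroutine called during its scope, while a my variable is invisible to them:

```perl
use strict;
use warnings;

our $setting = 'global';

sub show { return $setting }    # always reads the package variable

sub with_local {
    local $setting = 'temporary';    # temporary value of the package variable
    return show();
}

sub with_my {
    my $setting = 'private';         # a different, lexical variable
    return show();
}

print with_local(), "\n";    # temporary
print with_my(),    "\n";    # global
print show(),       "\n";    # global
```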
I explain this in Learning Perl and write about the details of the package variables in Mastering Perl.
local does not create a variable instead works on the existing ones. but i have a small piece of code below and i wonder how local is working when there is no such variable already created.
Let's take a few steps and let perl do some diagnostics:
perl -wE 'local $temp =3'
Name "main::temp" used only once: possible typo at -e line 1.
So local $temp alters $main::temp which is package variable and
perl -wE 'local $main::temp =3'
Name "main::temp" used only once: possible typo at -e line 1.
gives the same warning. So we created a new package variable which is localized.
What does this mean? It means that unlike our $temp it keeps the value of package ('global') variable $temp until it exits enclosing block at which point it restores value to previous value.
A few more tests,
perl -MData::Dumper -E 'say Dumper [exists $main::{t}, ${$main::{t}}]'
$VAR1 = [
'', # `$main::t` is NOT created in main package
undef # retrieving value of `$main::t` thus returns undef
];
perl -MData::Dumper -E '{our $t=7} say Dumper [exists $main::{t}, ${$main::{t}}]'
$VAR1 = [
1, # `$main::t` is created in main package
7 # value of `$main::t`
];
and finally,
perl -MData::Dumper -E '{local $t=7} say Dumper [exists $main::{t}, ${$main::{t}}]'
$VAR1 = [
1, # `$main::t` is *CREATED* in main package
undef # value of `$main::t` reverts to undef at exit of enclosing block
];
local does not create a variable. Simply mentioning $temp creates the variable. It is created as soon as it is first encountered, whether at compile-time or at run-time.
$ perl -E'
$foo;
${"bar"};
BEGIN { say $::{foo} && *{ $::{foo} }{SCALAR} ? "exists" : "doesn'\''t exist"; }
BEGIN { say $::{bar} && *{ $::{bar} }{SCALAR} ? "exists" : "doesn'\''t exist"; }
BEGIN { say $::{baz} && *{ $::{baz} }{SCALAR} ? "exists" : "doesn'\''t exist"; }
say $::{foo} && *{ $::{foo} }{SCALAR} ? "exists" : "doesn'\''t exist";
say $::{bar} && *{ $::{bar} }{SCALAR} ? "exists" : "doesn'\''t exist";
say $::{baz} && *{ $::{baz} }{SCALAR} ? "exists" : "doesn'\''t exist";
'
exists # $foo exists at compile-time
doesn't exist # $bar doesn't exist at compile-time
doesn't exist # $baz doesn't exist at compile-time
exists # $foo exists at run-time
exists # $bar exists at run-time
doesn't exist # $baz doesn't exist at run-time
Having variables created simply by naming them makes it hard to spot typos. We use use strict; because it prevents that.
local only has a run-time effect. local temporarily backs up the value of $temp in a way that causes Perl to restore it when the lexical scope is exited.
$ perl -E'
sub f { say $temp; }
$temp = 123;
f();
{
local $temp = 456;
f();
}
f();
'
123
456
123
You forgot to use use strict. If you do not use strict, the global package variable $temp will be used. See http://perlmaven.com/global-symbol-requires-explicit-package-name.
Package variables are always global. They have a name and a package qualifier. You can omit the package qualifier, in which case Perl uses a default, which you can set with the package declaration.
To avoid using global variables by accident, add use strict 'vars' to your program.
From the documentation:
use strict vars: This generates a compile-time error if you access a variable that was
neither explicitly declared (using any of my, our, state, or use vars
) nor fully qualified. (Because this is to avoid variable suicide
problems and subtle dynamic scoping issues, a merely local variable
isn't good enough.)
Without use strict -- specifically use strict 'vars', which is a subset -- just mentioning a variable creates it in the current package. There is no need even for local, and your code can be written like this
sub func{
$temp = 20;
print $temp;
}
func();
output
20
That is one reason why use strict is so important, and it is dangerous to omit it. Without it you have no protection against misspelling variables and silently breaking your program.
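The following sketch shows strict catching a misspelling, and also that local alone does not count as a declaration. It uses string eval only so that the script can survive the compile-time error and report it; the variable names are illustrative.

```perl
use strict;
use warnings;

# Each snippet is compiled under the enclosing "use strict".
my $typo_ok  = eval q{ my $temp = 20; $tmep = 21; 1 };    # misspelled name
my $local_ok = eval q{ local $temp = 20; 1 };              # local without a declaration
my $our_ok   = eval q{ our $temp; local $temp = 20; 1 };   # declared, then localized

print "typo:        ", ( $typo_ok  ? "accepted" : "rejected" ), "\n";
print "local only:  ", ( $local_ok ? "accepted" : "rejected" ), "\n";
print "our + local: ", ( $our_ok   ? "accepted" : "rejected" ), "\n";
```

The first two snippets are rejected at compile time and the third is accepted.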

Perl - Use of uninitialized value within %frequency in concatenation (.) or string

Not entirely sure why but for some reason i cant print the hash value outside the while loop.
#!/usr/bin/perl -w
opendir(D, "cwd" );
my @files = readdir(D);
closedir(D);
foreach $file (@files)
{
open F, $file or die "$0: Can't open $file : $!\n";
while ($line = <F>) {
chomp($line);
$line=~ s/[-':!?,;".()]//g;
$line=~ s/^[a-z]/\U/g;
@words = split(/\s/, $line);
foreach $word (@words) {
$frequency{$word}++;
$counter++;
}
}
close(F);
print "$file\n";
print "$ARGV[0]\n";
print "$frequency{$ARGV[0]}\n";
print "$counter\n";
}
Any help would be much appreciated!
cheers.
This line
print "$frequency{$ARGV[0]}\n";
Expects you to have an argument to your script, e.g. perl script.pl argument. If you have no argument, $ARGV[0] is undefined, but it will stringify to the empty string. This empty string is a valid key in the hash, but the value is undefined, hence your warning
Use of uninitialized value within %frequency in concatenation (.) or string
But you should also see the warning
Use of uninitialized value $ARGV[0] in hash element
And it is a very big mistake not to include that error in this question.
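One way to avoid both warnings is to check the key before looking it up. This is only a sketch: the %frequency contents and the helper name count_for are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical counts; the real script fills this hash from files.
my %frequency = ( Perl => 2, loop => 1 );

# Return the count for a word, or 0 if the word is undefined or unseen.
sub count_for {
    my ($word) = @_;
    return 0 unless defined $word && exists $frequency{$word};
    return $frequency{$word};
}

print count_for('Perl'),    "\n";    # 2
print count_for(undef),     "\n";    # 0
print count_for('missing'), "\n";    # 0
```

In the original script the call would be count_for($ARGV[0]), which prints 0 instead of warning when no argument is given.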
Also, when using readdir, you get all the files in the directory, including directories. You might consider filtering the files somewhat.
Using
use strict;
use warnings;
Is something that will benefit you very much, so add that to your script.
I had originally written this,
There is no %frequency defined at the top level of your program.
When perl sees you reference %frequency inside the inner-most
loop, it will auto-vivify it, in that scratchpad (lexical scope).
This means that when you exit the inner-most loop (foreach $word
(@words)), the auto-vivified %frequency is out of scope and
garbage-collected. Each time you enter that loop, a new, different
variable will be auto-vivified, and then discarded.
When you later refer to %frequency in your print, yet another new,
different %frequency will be created.
… but then realized that you had forgotten to use strict, and Perl was being generous and giving you a global %frequency, which ironically is probably what you meant. So, this answer is wrong in your case … but declaring the scope of %frequency would probably be good form, regardless.
These other, “unrelated” notes are still useful perhaps, or else I'd delete the answer altogether:
As @TLP mentioned, you should probably also skip directories (at least) in your file loop. A quick way to do this would be my @files = grep { -f "cwd/$_" } (readdir D); this will filter the list to contain only files.
I'm further suspicious that you named a directory "cwd" … are you perhaps meaning the current working directory? In all the major OS'es in use today, that directory is referenced as “.” — you're looking for a directory literally named "cwd"?
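Putting those two notes together, a sketch of the directory listing might look like this. It assumes the current working directory "." was intended and uses a lexical directory handle; adjust $dir if a directory really named cwd is meant.

```perl
use strict;
use warnings;

my $dir = '.';    # assumption: the current working directory was intended

opendir( my $dh, $dir ) or die "$0: Can't open directory $dir: $!\n";
# readdir returns bare names, so prefix the directory before testing with -f.
my @files = grep { -f "$dir/$_" } readdir $dh;
closedir $dh;

print "$_\n" for sort @files;
```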

How is $_ different from named input or loop arguments?

As I use $_ a lot I want to understand its usage better. $_ is a global variable for implicit values as far as I understood and used it.
As $_ seems to be set anyway, are there reasons to use named loop variables over $_ besides readability?
In what cases does it matter $_ is a global variable?
So if I use
for (@array){
print $_;
}
or even
print $_ for @array;
it has the same effect as
for my $var (@array){
print $var;
}
But does it work the same? I guess it does not exactly but what are the actual differences?
Update:
It seems $_ is even scoped correctly in this example. Is it not global anymore? I am using 5.12.3.
#!/usr/bin/perl
use strict;
use warnings;
my @array = qw/one two three four/;
my @other_array = qw/1 2 3 4/;
for (@array){
for (@other_array){
print $_;
}
print $_;
}
that prints correctly 1234one1234two1234three1234four.
For global $_ I would have expected 1234 4 1234 4 1234 4 1234 4 .. or am i missing something obvious?
When is $_ global then?
Update:
Ok, after having read the various answers and perlsyn more carefully I came to a conclusion:
Besides readability it is better to avoid using $_ because implicit localisation of $_ must be known and taken account of otherwise one might encounter unexpected behaviour.
Thanks for clarification of that matter.
are there reasons to use named loop variables over $_ besides readability?
The issue is not if they are named or not. The issue is if they are "package variables" or "lexical variables".
See the very good description of the 2 systems of variables used in Perl "Coping with Scoping":
http://perl.plover.com/FAQs/Namespaces.html
package variables are global variables, and should therefore be avoided for all the usual reasons (eg. action at a distance).
Avoiding package variables is a question of "correct operation" or "harder to inject bugs" rather than a question of "readability".
In what cases does it matter $_ is a global variable?
Everywhere.
The better question is:
In what cases is $_ local()ized for me?
There are a few places where Perl will local()ize $_ for you, primarily foreach, grep and map. All other places require that you local()ize it yourself, therefore you will be injecting a potential bug when you inevitably forget to do so. :-)
The classic failure mode of using $_ (implicitly or explicitly) as a loop variable is
for $_ (@myarray) {
/(\d+)/ or die;
foo($1);
}
sub foo {
 open(F, "foo_$_[0]") or die;
while (<F>) {
...
}
}
where, because the loop variable in for/foreach is aliased to the actual list item, the while (<F>) overwrites the elements of @myarray with lines read from the files.
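The failure mode can be reproduced without real files. The sketch below assumes an in-memory filehandle and two hypothetical subroutines, careless and careful, that differ only in whether they localize $_ first.

```perl
use strict;
use warnings;

my $text = "line from file\n";    # hypothetical file contents

# Reads with the implicit $_ and never localizes it.
sub careless {
    open( my $fh, '<', \$text ) or die "Could not open: $!";
    while (<$fh>) { }
    close $fh;
}

# Same body, but $_ is localized before reading.
sub careful {
    local $_;
    open( my $fh, '<', \$text ) or die "Could not open: $!";
    while (<$fh>) { }
    close $fh;
}

my @clobbered = ( 'item1', 'item2' );
careless() for @clobbered;    # $_ is aliased to each element

my @intact = ( 'item1', 'item2' );
careful() for @intact;

print join( ',', map { defined $_ ? $_ : 'undef' } @clobbered ), "\n";    # undef,undef
print join( ',', @intact ), "\n";                                          # item1,item2
```

The elements end up undef rather than holding a line because the final read at end of file assigns undef to $_.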
$_ is the same as naming the variable as in your second example with the way it is usually used. $_ is just a shortcut default variable name for the current item in the current loop to save on typing when doing a quick, simple loop. I tend to use named variables rather than the default. It makes it more clear what it is and if I happen to need to do a nested loop there are no conflicts.
Since $_ is a global variable, you may get unexpected values if you try to use its value that it had from a previous code block. The new code block may be part of a loop or other operation that inserts its own values into $_, overwriting what you expected to be there.
The risk in using $_ is that it is global (unless you localise it with local $_), and so if some function you call in your loop also uses $_, the two uses can interfere.
For reasons which are not clear to me, this has only bitten me occasionally, but I usually localise $_ if I use it inside packages.
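There is nothing special about $_ apart from it being the default parameter for many functions. If you explicitly lexically scope your $_ with my, perl will use the lexical version of $_ rather than the global one. There is nothing strange in this, it is just like any other named variable. (Lexical $_ was introduced in Perl 5.10, marked experimental in 5.18, and removed in 5.24, so the examples below only run on older perls.)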
sub p { print "[$_]"; } # Prints the global $_
# Compare and contrast
for my $_ (b1..b5) { for my $_ (a1..a5) { p } } print "\n"; # ex1
for my $_ (b1..b5) { for (a1..a5) { p } } print "\n"; # ex2
for (b1..b5) { for my $_ (a1..a5) { p } } print "\n"; # ex3
for (b1..b5) { for (a1..a5) { p } } print "\n"; # ex4
You should be slightly mystified by the output until you find out that perl will preserve the original value of the loop variable on loop exit (see perlsyn).
Note ex2 above. Here the second loop is using the lexically scoped $_ declared in the first loop. Subtle, but expected. Again, this value is preserved on exit so the two loops do not interfere.

Is it a convention to avoid using $_ when using other people's Perl API's?

I've just been caught out when using someone else's API in conjunction with the default variable $_
foreach (@rps_server_details) {
@server_data = ();
@server_data = split(/,/);
@$esp_hosts = ();
$filters{server_name} = $server_data[0];
print "--->$_<--\n";
$esp_hosts = $esp->get_hosts(fields => $fields, %filters) || die "$@";
print "--->$_<--\n";
The output for this is:
--->igrid8873.someone.com,app_10<--
Use of uninitialized value in concatenation (.) or string at ./rps_inv_lookup.pl line 120.
---><--
Specifying my own loop variable instead of relying on $_ fixes the problem.
Am I just being naive by using $_ in conjunction with an API someone else has written? Or is this a bug in that API module?
It is a bug in the API. If you use $_ in a function it is important to add a
local($_);
inside the function to avoid clobbering the caller's $_, or otherwise avoid using $_ in a library function to be called by others.
If you can limit yourself to Perl versions > 5.9.1, you can also make $_ lexical, which is easier to understand than local, with
my $_;
But this will break on earlier versions of Perl, and also on Perl 5.24 and later, where lexical $_ was removed.
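On the caller's side, the fix mentioned in the question (a named loop variable) can be sketched as follows. Here rude_api is a hypothetical stand-in for a method such as get_hosts that clobbers $_, and the server strings are invented sample data.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an API call that overwrites the global $_.
sub rude_api { $_ = undef; return 1 }

my @servers = ( 'host1,app_10', 'host2,app_11' );
my @seen;

for my $server (@servers) {
    my @server_data = split /,/, $server;
    rude_api();
    push @seen, $server;    # $server is lexical, so rude_api cannot touch it
}

print "$_\n" for @seen;
```

Both server strings are printed intact, whereas the same loop written with the implicit $_ would lose them.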
From man perlvar:
As $_ is a global variable, this may lead in some cases to
unwanted side-effects. As of perl 5.9.1, you can now use a
lexical version of $_ by declaring it in a file or in a block
with "my". Moreover, declaring "our $_" restores the global $_
in the current scope.
I would say it's:
a violation of best practices on your part (always use as-local-as-possible variable scope and avoid using $_ due to just the issue you encountered)
coupled with a bug in the API caused by the same violation of the best practices as well as not localizing the special variable with local $_ as prescribed by perldoc perlvar.
In addition to perldoc, the API violates Perl Best Practices (as in Conway's book's rules):
Section 5.6. Localizing Punctuation Variables
If you're forced to modify a punctuation variable, localize it.
The problems described earlier under "Localization" can also crop up whenever you're forced to change the value in a punctuation variable (often in I/O operations). All punctuation variables are global in scope. They provide explicit control over what would be completely implicit behaviours in most other languages: output buffering, input line numbering, input and output line endings, array indexing, et cetera.
It's usually a grave error to change a punctuation variable without first localizing it. Unlocalized assignments can potentially change the behaviour of code in entirely unrelated parts of your system, even in modules you did not write yourself but are merely using.
Using local is the cleanest and most robust way to temporarily change the value of a global variable. It should always be applied in the smallest possible scope, so as to minimize the effects of any "ambient behaviour" the variable might control:
Here's full perldoc perlvar documentation as well - search for the word "nasty_break" in the web page (I couldn't find direct in-page link but it's close to the start of the page)
You should be very careful when
modifying the default values of most
special variables described in this
document. In most cases you want to
localize these variables before
changing them, since if you don't, the
change may affect other modules which
rely on the default values of the
special variables that you have
changed. This is one of the correct
ways to read the whole file at once:
open my $fh, "<", "foo" or die $!;
local $/; # enable localized slurp mode
my $content = <$fh>;
close $fh;
But the following code is quite bad:
open my $fh, "<", "foo" or die $!;
undef $/; # enable slurp mode
my $content = <$fh>;
close $fh;
since some other module, may want to
read data from some file in the
default "line mode", so if the code we
have just presented has been executed,
the global value of $/ is now changed
for any other code running inside the
same Perl interpreter.
Usually when a variable is localized
you want to make sure that this change
affects the shortest scope possible.
So unless you are already inside some
short {} block, you should create one
yourself. For example:
my $content = '';
open my $fh, "<", "foo" or die $!;
{
local $/;
$content = <$fh>;
}
close $fh;
Here is an example of how your own
code can go broken:
for (1..5){
nasty_break();
print "$_ ";
}
sub nasty_break {
$_ = 5;
# do something with $_
}
You probably expect this code to
print:
1 2 3 4 5
but instead you get:
5 5 5 5 5
Why? Because nasty_break() modifies $_
without localizing it first. The fix
is to add local():
local $_ = 5;
foreach (@rps_server_details) {
@server_data = ();
@server_data = split(/,/);
@$esp_hosts = ();
$filters{server_name} = $server_data[0];
print "--->$_<--\n";
{
local *_; # disconnects the remaining scope from the implicit
# variables so you can clean up after the dirty api.
# NOTE: Submit a bug report against the offending module.
# If you notice this across multiple api features
# consider finding a different module for this task.
$esp_hosts = $esp->get_hosts(fields => $fields, %filters) || die "$@";
}
print "--->$_<--\n";
}