Perl: Variable value is 'glob', but should be 'scalar' - perl

I have the following simple code:
my $TimeZone = $hCache->{'TimeZone'}; # Cache gets filled earlier
my $DateTime = DateTime->now();
$DateTime->set_time_zone($TimeZone);
This code runs in an application server, which is basically a long-running Perl process that accepts incoming network connections.
From time to time this application server somehow gets "dirty", and the code above prints the following error:
The 'name' parameter ("Europe/Berlin") to DateTime::TimeZone::new was
a 'glob', which is not one of the allowed types: scalar at
/srv/epages/eproot/Perl/lib/site_perl/linux/DateTime.pm line 1960.
When I try to debug the variable "$TimeZone" I'm getting no further details.
E.g.
print ref($TimeZone); # prints nothing (scalar?)
print $TimeZone; # prints "Europe/Berlin"
The code works if I'm forcing the timezone to be a string again, like so:
my $TimeZone = $hCache->{'TimeZone'}; # Cache gets filled earlier
my $DateTime = DateTime->now();
$DateTime->set_time_zone($TimeZone."");
My questions are:
If 'glob' is not a reference, how can I debug the variable properly?
How can I create a 'glob' variable? What is the syntax for it? I'm quite sure that my huge codebase has some accidents in it, but I don't know what to search for.
Is there a way to 'monitor' the variable? Basically, getting a stack trace if the variable changes.

How can I create a 'glob' variable?
Glob, short for "typeglob", is a structure (in the C sense of the word) that contains a field for each type of variable that can be found in the symbol table (scalar, array, hash, code, glob, etc.). They form the symbol table.
Globs are created by simply mentioning a package variable.
@a = 4..6; # Creates glob *main::a containing a reference to the new array.
Since globs are themselves package variables, you can bring a glob into existence just by mentioning it.
my $x = *glob; # The glob *main::glob is created by this line at compile-time.
Note that file handles are often accessed via globs. For example, open(my $fh, '<', ...) populates $fh with a reference to a glob that contains a reference to an IO.
$fh # Reference to glob that contains a reference to an IO.
*$fh # Glob that contains a reference to an IO.
*$fh{IO} # Reference to an IO.
If 'glob' is not a reference, how can I debug the variable properly?
ref(\$var) will return GLOB for a glob.
$ perl -e'$x = *STDOUT; CORE::say ref(\$x)'
GLOB
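As a minimal sketch of that check (the helper name value_kind is made up for illustration), taking a reference to the variable and calling ref on it distinguishes a plain string from a glob or a reference:

```perl
use strict;
use warnings;

# Hypothetical helper: classify what a scalar variable actually holds.
# ref(\$var) yields 'SCALAR' for strings and numbers, 'GLOB' for a typeglob,
# and 'REF' when the variable holds a reference.
sub value_kind {
    return ref(\$_[0]);    # $_[0] aliases the caller's variable, so nothing is copied
}

my $string = "Europe/Berlin";
my $glob   = *STDOUT;
my $ref    = \$string;

print value_kind($string), "\n";    # SCALAR
print value_kind($glob),   "\n";    # GLOB
print value_kind($ref),    "\n";    # REF
```

Calling such a helper on the cached value just before set_time_zone would show whether a glob has crept into the cache.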
Is there a way to 'monitor' the variable?
Yes. You can add magic to it.
$ perl -e'
use feature qw( say );
use Carp qw( cluck );
use Variable::Magic qw( wizard cast );
my $wiz = wizard(
data => sub { $_[1] },
set => sub { cluck("Variable $_[1] modified"); },
);
my $x;
cast($x, $wiz, q{$x});
$x = 123; # Line 14
'
Variable $x modified at -e line 9.
main::__ANON__(SCALAR(0x50bcee23c0), "\$x") called at -e line 14
eval {...} called at -e line 14
More work is needed to detect if a hash or array changes, but the above can be used to monitor the elements of hashes and arrays.
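If Variable::Magic is not available, a similar effect can probably be approximated with core Perl's tie. This is a different technique from the magic shown above, and the package name WatchScalar and the %cache hash are made up for this sketch. It ties a single hash element so that every store prints a stack trace:

```perl
use strict;
use warnings;

{
    package WatchScalar;    # hypothetical name, not an existing module
    use Carp qw( cluck );
    require Tie::Scalar;    # core module that provides Tie::StdScalar
    our @ISA = ('Tie::StdScalar');
    our $stores = 0;        # how many times the value was overwritten

    sub STORE {
        my ($self, $value) = @_;
        $stores++;
        # ref(\$value) is 'GLOB' if a typeglob is being stored.
        cluck("watched value set to a " . ref(\$value));
        $$self = $value;
    }
}

my %cache = ( TimeZone => "Europe/Berlin" );

# Tie just this element, seeding it with its current value.
tie $cache{TimeZone}, 'WatchScalar', $cache{TimeZone};

$cache{TimeZone} = "Europe/Paris";    # prints a stack trace to STDERR
```

This only catches stores made through that element while it stays tied; replacing the whole hash would bypass it.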

Related

in Perl, how to assign the print function to a variable?

I need to control the print method using a variable
My code is below
#!/usr/bin/perl
# test_assign_func.pl
use strict;
use warnings;
sub echo {
my ($string) = @_;
print "from echo: $string\n\n";
}
my $myprint = \&echo;
$myprint->("hello");
$myprint = \&print;
$myprint->("world");
When I ran it, I got the following error for the assignment of the print function:
$ test_assign_func.pl
from echo: hello
Undefined subroutine &main::print called at test_assign_func.pl line 17.
Looks like I need to prefix a namespace to print function but I cannot find the name space. Thank you for any advice!
print is an operator, not a sub.
perlfunc:
The functions in this section can serve as terms in an expression. They fall into two major categories: list operators and named unary operators.
Perl provides a sub for named operators that can be duplicated by a sub with a prototype. A reference to these can be obtained using \&CORE::name.
my $f = \&CORE::length;
say $f->("abc"); # 3
But print isn't such an operator (because of the way it accepts a file handle). For these, you'll need to create a sub with a more limited calling convention.
my $f = sub { print @_ };
$f->("abc\n");
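If the caller also needs to choose the destination, the wrapper can take the handle as its first argument. A minimal sketch (the name $print_to is made up here):

```perl
use strict;
use warnings;

# Hypothetical wrapper: the first argument is the file handle, the rest is printed.
my $print_to = sub {
    my $fh = shift;
    return print {$fh} @_;    # the block form keeps the handle unambiguous
};

$print_to->(\*STDOUT, "hello\n");

# The same code ref works with any handle, for example an in-memory one.
open(my $mem, '>', \my $buffer) or die "Cannot open in-memory handle: $!";
$print_to->($mem, "world\n");
close($mem);
print $buffer;    # world
```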
Related:
What are Perl built-in operators/functions?
As mentioned in CORE, some functions can't be called as subroutines, only as barewords. print is one of them.

using file handle returned by select

I am pulling out my hair on using the file handle returned by select.
The documentation about select reads:
select
Returns the currently selected filehandle.
I have a piece of code, that prints some data and usually is executed without any re-direction. But there is one use case, where select is used to re-direct the print output to a file.
In this piece of code, I need to use the current selected file handle. I tried the following code fragment:
my $fh = select;
print $fh "test\n";
I wrote a short test program to demonstrate my problem:
#!/usr/bin/perl
use strict;
use warnings;
sub test
{
my $fh=select;
print $fh "@_\n";
}
my $oldfh;
# this works :-)
open my $test1, "> test1.txt";
$oldfh = select $test1;
test("test1");
close select $oldfh if defined $oldfh;
#this doesn't work. :-(
# Can't use string ("main::TEST2") as a symbol ref while "strict refs" in use
open TEST2,">test2.txt";
$oldfh = select TEST2;
test("test2");
close select $oldfh if defined $oldfh;
#this doesn't work, too. :-(
# gives Can't use string ("main::STDOUT") as a symbol ref while "strict refs" in use at
test("test");
It seems, that select is not returning a reference to the file handle but a string containing the name of the file handle.
What do I have to do to always get a usable file handle from select's return value?
P.S. I need to pass this file handle as OutputFile to XML::Simple's XMLout().
Just use
print XMLout(...);
It seems, that select is not returning a reference to the file handle but a string containing the name of the file handle.
It can indeed return a plain ordinary string.
>perl -MDevel::Peek -E"Dump(select())"
SV = PV(0x6cbe38) at 0x260e850
REFCNT = 1
FLAGS = (PADTMP,POK,pPOK)
PV = 0x261ce48 "main::STDOUT"\0
CUR = 12
LEN = 24
But that's perfectly acceptable as a file handle to Perl. There are four things that Perl accepts as file handles:
A reference to an IO object.
>perl -e"my $fh = *STDOUT{IO}; CORE::say($fh 'foo');"
foo
A glob that contains a reference to an IO object.
>perl -e"my $fh = *STDOUT; CORE::say($fh 'foo');"
foo
A reference to a glob that contains a reference to an IO object.
>perl -e"my $fh = \*STDOUT; CORE::say($fh 'foo');"
foo
A "symbolic reference" to a glob that contains a reference to an IO object.
>perl -e"my $fh = 'STDOUT'; CORE::say($fh 'foo');"
foo
This type doesn't work under strict refs, though.
>perl -Mstrict -e"my $fh = 'STDOUT'; CORE::say($fh 'foo');"
Can't use string ("STDOUT") as a symbol ref while "strict refs" in use at -e line 1.
What do I have to do to always get a usable file handle from select's return value?
As demonstrated above, it already returns a perfectly usable file handle. If XMLout doesn't support it, then it's a bug in XMLout. You could work around it as follows:
my $fh = select();
if (!ref($fh) && ref(\$fh) ne 'GLOB') {
no strict qw( refs );
$fh = \*$fh;
}
This can also be used to make the handle usable in a strict environment.
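Wrapped in a function, the workaround might look like the following sketch. The name current_fh is an assumption for illustration; the body is the same check as above:

```perl
use strict;
use warnings;

# Hypothetical helper: return the currently selected handle as a glob reference.
sub current_fh {
    my $fh = select();
    if (!ref($fh) && ref(\$fh) ne 'GLOB') {
        no strict qw( refs );
        $fh = \*{$fh};    # turn a name such as "main::STDOUT" into a glob ref
    }
    return $fh;
}

my $out = current_fh();
print {$out} "written via the selected handle\n";
```

The returned value is a reference, so it can be passed to code that runs under strict refs.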
As bad as XML::Simple is at reading XML, it's a million times worse at generating it. See Why is XML::Simple Discouraged?.
Consider XML::LibXML or XML::Twig if you're modifying XML.
Consider XML::Writer if you're generating XML.
The point of select is you don't need to specify the handle at all, since it's the default one.
sub test {
print "@_\n";
}
That's also the reason why select isn't recommended: it introduces global state which is hard to track and debug.
First of all, you shouldn't use XML::Simple, because it takes a lot of work to make sure that it generates consistent XML. At least make sure you're using the appropriate ForceArray parameters.
Instead of doing filehandle shenanigans, why don't you use the simpler
print XMLout($data, %options);
... instead of trying to pass a default filehandle around?

Issues with function calls for search routine

My aim is to have multiple searches of specific files recursively. So I have these files:
/dir/here/tmp1/recursive/foo2013.log
/dir/here/tmp1/recursive/foo2014.log
/dir/here/tmp2/recursive/foo2013.log
/dir/here/tmp2/recursive/foo2014.log
where 2013 and 2014 indicate the year in which each file was last modified.
I then want to find the more up to date files (foo2014.log) for each directory tree (tmp1 and tmp2 likewise).
Referring to this answer I have the following code in script.pl:
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
func("tmp1");
print "===\n";
func("tmp2");
sub func{
my $varName = shift;
my %times;
find(\&upToDateFiles, "/dir/here");
for my $dir (keys %times) {
if ($times{$dir}{file} =~ m{$varName}){
print $times{$dir}{file}, "\n";
# do stuff here
}
}
sub upToDateFiles {
return unless (-f && /^foo/);
my $mod = -M $_;
if (!defined($times{$File::Find::dir})
or $mod < $times{$File::Find::dir}{mod})
{
$times{$File::Find::dir}{mod} = $mod;
$times{$File::Find::dir}{file} = $File::Find::name;
}
}
}
which will give me this output:
Variable "%times" will not stay shared at ./script.pl line 25.
/dir/here/tmp1/recursive/foo2014.log
===
I have three questions:
Why isn't the second call of the function func working like the first one? Variables are just defined in the scope of the function so why am I getting interferences?
Why do I get the notification for variable %times and how can I get rid of it?
If I define the function upToDateFiles outside of func I am getting this error: Execution of ./script.pl aborted due to compilation errors. I think this is because the variables aren't defined outside of func. Is it possible to change this and still get the desired output?
For starters - embedding a sub within another sub is rather nasty. If you use diagnostics; you'll get:
(W closure) An inner (nested) named subroutine is referencing a
lexical variable defined in an outer named subroutine.
When the inner subroutine is called, it will see the value of
the outer subroutine's variable as it was before and during the *first*
call to the outer subroutine; in this case, after the first call to the
outer subroutine is complete, the inner and outer subroutines will no
longer share a common value for the variable. In other words, the
variable will no longer be shared.
This problem can usually be solved by making the inner subroutine
anonymous, using the sub {} syntax. When inner anonymous subs that
reference variables in outer subroutines are created, they
are automatically rebound to the current values of such variables.
Which is directly relevant to the problem you're having. Try to avoid nesting your subs, and you won't have this problem. It certainly looks like you're trying to be far more complicated than you need to. Have you considered something like:
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
use File::Find;
my %filenames;
sub compare_tree {
return unless -f && m/^foo/;
my $mtime = -M $File::Find::name;
if ( !$filenames{$_} || $mtime < $filenames{$_}{mtime} ) {
$filenames{$_} = {
newest => $File::Find::name,
mtime => $mtime,
};
}
}
find( \&compare_tree, "/dir/here" );
foreach my $filename ( keys %filenames ) {
print "$filename has newest version path of:", $filenames{$filename}{newest}, "\n";
print "$filename has newest mtime of:", $filenames{$filename}{mtime}, "\n";
}
I'd also note - you seem to be using $File::Find::dir - this looks wrong to me, based on what you describe you're doing. Likewise - you're running find twice on the same directory structure, which is not a very efficient approach - very big finds are expensive operations, so doubling the work needed isn't good.
Edit: I was caught out by forgetting that -M is "script start time minus file modification time, in days". So 'newer' files have the lower number, not the higher. (I have amended the above accordingly.)
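The diagnostic's own suggestion, an anonymous sub, can be sketched as follows. This is not the code from either post: the name newest_files and its two parameters are assumptions, and it keeps the question's per-directory logic. Because the callback is created anew on each call, it always sees the current %times:

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical rewrite of func(): returns the newest foo* file of each directory
# below $root, keeping only paths that contain $varName.
sub newest_files {
    my ($root, $varName) = @_;
    my %times;    # fresh for every call, and visible to the closure below

    find(sub {
        return unless -f && /^foo/;
        my $mod = -M $_;    # age in days; smaller means newer
        my $dir = $File::Find::dir;
        if (!defined $times{$dir} or $mod < $times{$dir}{mod}) {
            $times{$dir} = { mod => $mod, file => $File::Find::name };
        }
    }, $root);

    my @files = grep { m{\Q$varName\E} } map { $_->{file} } values %times;
    return sort @files;
}

# Example call, using the paths from the question:
# print "$_\n" for newest_files("/dir/here", "tmp1");
```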

2 Sub references as arguments in perl

I have a Perl function and I don't know what it does.
my what does min in perl?
What does @ARGV mean?
sub getArgs
{
my $argCnt=0;
my %argH;
for my $arg (@ARGV)
{
if ($arg =~ /^-/) # insert this entry and the next in the hash table
{
$argH{$ARGV[$argCnt]} = $ARGV[$argCnt+1];
}
$argCnt++;
}
return %argH;}
Code like that makes David sad...
Here's a reformatted version of the code doing the indentations correctly. That makes it so much easier to read. I can easily tell where my if and loops start and end:
sub getArgs {
my $argCnt = 0;
my %argH;
for my $arg ( @ARGV ) {
if ( $arg =~ /^-/ ) { # insert this entry and the next in the hash table
$argH{ $ARGV[$argCnt] } = $ARGV[$argCnt+1];
}
$argCnt++;
}
return %argH;
}
@ARGV is what is passed to the program. It is an array of all the arguments passed. For example, I have a program foo.pl, and I call it like this:
foo.pl one two three four five
In this case, @ARGV is set to the list of values ("one", "two", "three", "four", "five"). The name comes from a similar variable found in the C programming language.
The author is attempting to parse these arguments. For example:
foo.pl -this that -the other
would result in:
$argH{"-this"} = "that";
$argH{"-the"} = "other";
I don't see min. Do you mean my?
This is a wee bit of a complex discussion which would normally involve package variables vs. lexically scoped variables, and how Perl stores variables. To make things easier, I'm going to give you a sort-of correct, but technically wrong answer: If you use the strict pragma, and you should, you have to declare your variables with my before they can be used. For example, here's a simple two-line program that's wrong. Can you see the error?
$name = "Bob";
print "Hello $Name, how are you?\n";
Note that when I set $name to "Bob", $name is spelled with a lowercase n. But I used $Name (uppercase N) in my print statement. As it stands now, Perl will print out "Hello , how are you?" without a care that I've used the wrong variable name. If it's hard to spot an error like this in a two-line program, imagine what it would be like in a 1000-line program.
By using strict and forcing me to declare variables with my, Perl can catch that error:
use strict;
use warnings; # Another Pragma that should always be used
my $name = "Bob";
print "Hello $Name, how are you doing\n";
Now, when I run the program, I get the following error:
Global symbol "$Name" requires explicit package name at (line # of print statement)
This means that $Name isn't defined, and Perl points to where that error is.
When you define variables like this, they are in scope within the block where they're defined. A block could be the code contained in a set of curly braces or a while, if, or for statement. If you define a variable with my outside of these, it's defined to the end of the file.
Thus, by using my, the variables are only defined inside this subroutine. And, the $arg variable is only defined in the for loop.
One more thing:
The person who wrote this should have used the Getopt::Long module. There's a major bug in their code:
For example:
foo.pl -this that -one -two
In this case, my hash looks like this:
$argH{'-this'} = "that";
$argH{'-one'} = "-two";
$argH{'-two'} = undef;
If I did this:
if ( defined $argH{'-two'} ) {
...
}
I would not execute the if statement.
Also:
foo.pl -this=that -one -two
would also fail.
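For comparison, here is a minimal sketch of the same parsing done with Getopt::Long. The function name get_args is made up, and the option names (this, one, two) are simply the ones from the examples above:

```perl
use strict;
use warnings;
use Getopt::Long qw( GetOptionsFromArray );

# Hypothetical replacement for getArgs(): -this takes a value,
# -one and -two are plain flags.
sub get_args {
    my @argv = @_;    # work on a copy so the caller's array is untouched
    my %opt;
    GetOptionsFromArray(\@argv, \%opt, 'this=s', 'one', 'two')
        or die "Invalid command-line options\n";
    return %opt;
}

my %opt = get_args(qw( -this=that -one -two ));
print "$_ => $opt{$_}\n" for sort keys %opt;
```

With this approach both "-this that" and "-this=that" are accepted, and the flags are set to a true value instead of swallowing the next argument.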
@ARGV is a special variable (refer to perldoc perlvar):
@ARGV
The array @ARGV contains the command-line arguments intended for the
script. $#ARGV is generally the number of arguments minus one, because
$ARGV[0] is the first argument, not the program's command name itself.
See $0 for the command name.
Perl documentation is also available from your command line:
perldoc -v @ARGV

Confusion about proper usage of dereference in Perl

I noticed the other day, while altering values in a hash, that when you dereference a hash in Perl, you are actually making a copy of that hash. To confirm, I wrote this quick little script:
#! perl
use warnings;
use strict;
my %h = ();
my $hRef = \%h;
my %h2 = %{$hRef};
my $h2Ref = \%h2;
if($hRef eq $h2Ref) {
print "\n\tThey're the same $hRef $h2Ref";
}
else {
print "\n\tThey're NOT the same $hRef $h2Ref";
}
print "\n\n";
The output:
They're NOT the same HASH(0x10ff6848) HASH(0x10fede18)
This leads me to realize that there could be spots in some of my scripts where they aren't behaving as expected. Why is it even like this in the first place? If you're passing or returning a hash, it would be more natural to assume that dereferencing the hash would allow me to alter the values of the hash being dereferenced. Instead I'm just making copies all over the place without any real need/reason to beyond making syntax a little more obvious.
I realize the fact that I hadn't even noticed this until now shows it's probably not that big of a deal (in terms of the need to go fix it in all of my scripts, but important going forward). I think it's going to be pretty rare to see noticeable performance differences out of this, but that doesn't alter the fact that I'm still confused.
Is this by design in perl? Is there some explicit reason I don't know about for this; or is this just known and you - as the programmer - expected to know and write scripts accordingly?
The problem is that you are making a copy of the hash to work with in this line:
my %h2 = %{$hRef};
And that is understandable, since many posts here on SO use that idiom to make a local name for a hash, without explaining that it is actually making a copy.
In Perl, a hash is a plural value, just like an array. This means that in list context (such as you get when assigning to a hash) the aggregate is taken apart into a list of its contents. This list of pairs is then assembled into a new hash as shown.
What you want to do is work with the reference directly.
for (keys %$hRef) {...}
for (values %$href) {...}
my $x = $href->{some_key};
# or
my $x = $$href{some_key};
$$href{new_key} = 'new_value';
When working with a normal hash, you have the sigil which is either a % when talking about the entire hash, a $ when talking about a single element, and @ when talking about a slice. Each of these sigils is then followed by an identifier.
%hash # whole hash
$hash{key} # element
@hash{qw(a b)} # slice
To work with a reference named $href simply replace the string hash in the above code with $href. In other words, $href is the complete name of the identifier:
%$href # whole hash
$$href{key} # element
@$href{qw(a b)} # slice
Each of these could be written in a more verbose form as:
%{$href}
${$href}{key}
@{$href}{qw(a b)}
Which is again a substitution of the string '$href' for 'hash' as the name of the identifier.
%{hash}
${hash}{key}
@{hash}{qw(a b)}
You can also use a dereferencing arrow when working with an element:
$hash->{key} # exactly the same as $$hash{key}
But I prefer the doubled sigil syntax since it is similar to the whole aggregate and slice syntax, as well as the normal non-reference syntax.
So to sum up, any time you write something like this:
my @array = @$array_ref;
my %hash = %$hash_ref;
You will be making a copy of the first level of each aggregate. When using the dereferencing syntax directly, you will be working on the actual values, and not a copy.
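A small sketch makes the difference visible (the variable names are chosen only for this illustration):

```perl
use strict;
use warnings;

my %original = ( a => 1 );
my $hash_ref = \%original;

my %copy = %$hash_ref;     # list assignment: copies the first level
$copy{a} = 2;              # changes only the copy
print "$original{a}\n";    # 1

$hash_ref->{a} = 3;        # dereference in place: changes %original
print "$original{a}\n";    # 3
```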
If you want a REAL local name for a hash, but want to work on the same hash, you can use the local keyword to create an alias.
sub some_sub {
my $hash_ref = shift;
our %hash; # declare a lexical name for the global %{__PACKAGE__::hash}
local *hash = \%$hash_ref;
# install the hash ref into the glob
# the `\%` bit ensures we have a hash ref
# use %hash here, all changes will be made to $hash_ref
} # local unwinds here, restoring the global to its previous value if any
That is the pure Perl way of aliasing. If you want to use a my variable to hold the alias, you can use the module Data::Alias.
You are confusing the actions of dereferencing, which does not inherently create a copy, and using a hash in list context and assigning that list, which does. $hashref->{'a'} is a dereference, but most certainly does affect the original hash. This is true for $#$arrayref or values(%$hashref) also.
Without the assignment, just the list context %$hashref is a mixed beast; the resulting list contains copies of the hash keys but aliases to the actual hash values. You can see this in action:
$ perl -wle'$x={"a".."f"}; for (%$x) { $_=chr(ord($_)+10) }; print %$x'
epcnal
vs.
$ perl -wle'$x={"a".."f"}; %y=%$x; for (%y) { $_=chr(ord($_)+10) }; print %$x; print %y'
efcdab
epcnal
but %$hashref isn't acting any differently than %hash here.
No, dereferencing does not create a copy of the referent. It's my that creates a new variable.
$ perl -E'
my %h1; my $h1 = \%h1;
my %h2; my $h2 = \%h2;
say $h1;
say $h2;
say $h1 == $h2 ?1:0;
'
HASH(0x83b62e0)
HASH(0x83b6340)
0
$ perl -E'
my %h;
my $h1 = \%h;
my $h2 = \%h;
say $h1;
say $h2;
say $h1 == $h2 ?1:0;
'
HASH(0x9eae2d8)
HASH(0x9eae2d8)
1
No, $#{$someArrayHashRef} does not create a new array.
If perl did what you suggest, then variables would get aliased very easily, which would be far more confusing. As it is, you can alias variables with globbing, but you need to do so explicitly.