How to pass filehandle as reference between modules and subs in perl - perl

I'm maintaining old Perl code and need to enable strict pragma in all modules. I have a problem in passing a file handle as a reference between modules and subs. We have a common module responsible for opening the log file which is passed as typeglob reference. In other modules, the run function first calls open_log() from the common module, then it passes this file handle to other subs.
Here I've written a simple test to simulate the situation.
#!/usr/bin/perl -w
use strict;
$::STATUS_OK = 0;
$::STATUS_NOT_OK = 1;
sub print_header {
our $file_handle = #_;
print { $$file_handle } "#### HEADER ####"; # reference passing fails
}
sub print_text {
my ($file_handle, $text)= #_;
print_header(\$file_handle);
print { $$file_handle } $text;
}
sub open_file_handle {
my ($file_handle, $path, $name) = #_;
my $filename = $path."\\".$name;
unless ( open ($$file_handle, ">".$filename)) {
print STDERR "Failed to open file_handle $filename for writing.\n";
return $::STATUS_NOT_OK;
}
print STDERR "File $filename was opened for writing successfully.\n";
return $::STATUS_OK;
}
my $gpath = "C:\\Temp";
my $gname = "mylogfile.log";
my $gfile_handle;
if (open_file_handle(\$gfile_handle, $gpath, $gname) == $::STATUS_OK) {
my $text = "BIG SUCCESS!!!\n";
print_text(\$gfile_handle, $text);
print STDERR $text;
} else {
print STDERR "EPIC FAIL!!!!!!!!\n";
}
The Main function first calls open_file_handle and passes a file handle reference to the print_text function. If I comment out the row:
print_header(\$file_handle);
Everything works fine, but I need to pass the file handle reference to other functions from the print_text function, and this doesn't work.
I'm a Java developer and Perl's reference handling is not familiar to me. I don't want to change the open_log() sub to return a file handle (now it returns only status), since I have lots of modules and hundreds of code lines to go through to make this change in all places.
How can I fix my code to make it work?

There are two types of filehandles in Perl. Lexical and global bareword filehandles:
open my $fh, '>', '/path/to/file' or die $!;
open FILEHANDLE, '>', '/path/to/file' or die $!;
You are dealing with the first, which is good. The second one is global and should not be used.
The file handles you have are lexical, and they are stored in a scalar variable. It's called scalar because it has a dollar sign $. These can be passed as arguments to subs.
foo($fh);
They can also be referenced. In that case, you get a scalar reference.
my $ref = \$fh;
Usually you reference stuff if you hand it over to a function so Perl does not make a copy of the data. Think of a reference like a pointer in C. It's only the memory location of the data (structure). The piece of data itself remains where it is.
Now, in your code you have references to these scalars. You can tell because it is dereferenced in the print statement by saying $$fh.
sub print_text {
my ($file_handle, $text)= #_;
print_header(\$file_handle);
print { $$file_handle } $text;
}
So the $file_handle you get as a parameter (that's what the = #_ does) is actually a reference. You do not need to reference it again when you pass it to a function.
I guess you wrote the print_header yourself:
sub print_header {
our $file_handle = #_;
print { $$file_handle } "#### HEADER ####"; # reference passing fails
}
There are a few things here:
- our is for globals. Do not use that. Use my instead.
- Put parenthesis around the parameter assignment: my ($fh) = #_
- Since you pass over a reference to a reference to a scalar, you need to dereference twice: ${ ${ $file_handle } }
Of course the double-deref is weird. Get rid of it passing the variable $file_hanlde to print_header instead of a refence to it:
sub print_text {
my ($file_handle, $text)= #_;
print_header($file_handle); # <-- NO BACKSLASH HERE
print { $$file_handle } $text;
}
That is all you need to to make it work.
In general, I would get rid of all the references to the $file_handle vars here. You don't need them. The lexical filehandle is already a reference to an IO::Handle object, but don't concern yourself with that right now, it is not important. Just remember:
use filehandles that have a $ up front
pass them without references and you do not need to worry about \ and ${} and stuff like that
For more info, see perlref and perlreftut.

You are having difficulties because you added multiple extra level of references. Objects like lexical filehandles already are references.
If you have difficulties keeping track of what is a reference, you might want to use some kind of hungarian notation, like a _ref suffix.
In print_text, this would be:
sub print_text {
my ($file_handle_ref, $text)= #_;
print_header(\$file_handle_ref);
print { $$file_handle_ref } $text;
}
And in print_header:
sub print_header {
my ($file_handle_ref_ref) = #_; # don't use `our`, and assign to a lvalue list!
print { $$$file_handle_ref_ref } "#### HEADER ####"; # double derefernence … urgh
}
A far superior solution is to pass the filehandle around directly, without references.
sub print_header {
my ($file_handle) = #_;
print {$file_handle} "#### HEADER ####"; # no reference, no cry
}
sub print_text {
my ($file_handle, $text)= #_;
print_header($file_handle);
print {$file_handle} $text;
}
And in the main part:
my $gpath = "C:/Temp"; # forward slashes work too, as long as you are consistent
my $gname = "mylogfile.log";
if (open_file_handle(\my $gfile_handle, $gpath, $gname) == $::STATUS_OK) {
my $text = "BIG SUCCESS!!!\n";
print_text($gfile_handle, $text);
...
} else {
...
}

the reference operator is "\" (backslash)
anything includes arrays, hashes and even sub-routines can be referenced
the 5th line to count backwards
print_text(\$gfile_handle, $text);
you passed a referenced variable \$gfile_handle to the sub-routine print_text
sub print_text {
my ($file_handle, $text)= #_;
print_header(\$file_handle);
print { $$file_handle } $text;
}
and in this sub-routine, $file_handle is already a reference
then your referenced it again and pass it to the sub-routine print_header
so, you can solve this problem by putting off the reference operator the 5th line to count backwards like this:
print_text($gfile_handle, $text);
and try again :-)

Related

can a variable be defined in perl whose value cannot be changed in a subroutine?

In the following script, I declare and modify #basearray in the main program. Inside the dosomething subroutine, I access #basearray, assign it to an array local to the script, and modify the local copy. Because I have been careful to change the value only of local variables inside the subroutine, #basearray is not changed.
If I had made the mistake of assigning a value to #basearray inside the subroutine, though, it would have been changed and that value would have persisted after the call to the subroutine.
This is demonstrated in the 2nd subroutine, doagain.
Also, doagain receives the reference \#basearray as an argument rather than accessing #basearray directly. But going to that additional trouble provides no additional safety. Why do it that way at all?
Is there a way to guarantee that I cannot inadvertently change #basearray inside any subroutine? Any kind of hard safety device that I can build into my code, analogous to use strict;, some combination perhaps of my and local?
Am I correct in thinking that the answer is No, and that the only solution is to not make careless programmer errors?
#!/usr/bin/perl
use strict; use warnings;
my #basearray = qw / amoeba /;
my $count;
{
print "\#basearray==\n";
$count = 0;
foreach my $el (#basearray) { $count++; print "$count:\t$el\n" };
}
sub dosomething
{
my $sb_name = (caller(0))[3];
print "entered $sb_name\n";
my #sb_array=( #basearray , 'dog' );
{
print "\#sb_array==\n";
$count = 0;
foreach my $el (#sb_array) { $count++; print "$count:\t$el\n" };
}
print "return from $sb_name\n";
}
dosomething();
#basearray = ( #basearray, 'rats' );
{
print "\#basearray==\n";
$count = 0;
foreach my $el (#basearray) { $count++; print "$count:\t$el\n" };
}
sub doagain
{
my $sb_name = (caller(0))[3];
print "entered $sb_name\n";
my $sf_array=$_[0];
my #sb_array=#$sf_array;
#sb_array=( #sb_array, "piglets ... influenza" );
{
print "\#sb_array==\n";
$count = 0;
foreach my $el (#sb_array) { $count++; print "$count:\t$el\n" };
}
print "now we demonstrate that passing an array as an argument to a subroutine does not protect it from being globally changed by programmer error\n";
#basearray = ( #sb_array );
print "return from $sb_name\n";
}
doagain( \#basearray );
{
print "\#basearray==\n";
$count = 0;
foreach my $el (#basearray) { $count++; print "$count:\t$el\n" };
}
There isn't a pragma or a keyword or such, but there are well established "good practices," which in this case completely resolve what you reasonably ponder about.
The first sub, dosomething, commits the sin of using variables visible in its scope but defined in the higher scope. Instead, always pass needed data to a subroutine (exceptions are rare, in crystal clear cases).
Directly using data from "outside" defies the idea of a function as an encapsulated procedure, exchanging data with its users via a well defined and clear interface. It entangles ("couples") sections of code that are in principle completely unrelated. In practice, it can also be outright dangerous.
Also, the fact the #basearray is up for grabs in the sub is best considered an accident -- what when that sub gets moved to a module? Or another sub is introduced to consolidate code where #basearray is defined?
The second sub, doagain, nicely takes a reference to that array. Then, to protect the data in the caller, one can copy the caller's array to another one which is local to the sub
sub doagain {
my ($ref_basearray) = #_;
my #local_ba = #$ref_basearray;
# work with local_ba and the caller's basearray is safe
}
The names of local lexical variables are of course arbitrary, but a convention where they resemble the caller's data names may be useful.
Then you can adopt a general practice, for safety, to always copy input variables to local ones. Work directly with references that are passed in only when you want to change the caller's data (relatively rare in Perl). This may hurt efficiency if it's done a lot with sizeable data, or when really large data structures are involved. So perhaps then make an exception and change data via its reference, and be extra careful.
(Putting my comment as answer)
One way to guarantee not changing a variable inside a subroutine is to not change it. Use only lexically scoped variables inside the subroutine, and pass whatever values you need inside the subroutine as arguments to the subroutine. It is a common enough coding practice, encapsulation.
One idea that you can use -- mainly as practice, I would say -- to force yourself to use encapsulation, is to put a block around your "main" code, and place subroutines outside of it. That way, if you should accidentally refer to a (formerly) global variable, use strict will be able to do it's job and produce a fatal error. Before runtime.
use strict;
use warnings;
main: { # lexical scope reduced to this block
my #basearray = qw / amoeba /;
print foo(#basearray); # works
print bar(); # fatal error
} # END OF MAIN lexical scope of #basearray ends here
sub foo {
my #basearray = #_; # encapsulated
return $basearray[1]++;
}
sub bar {
return $basearray[1]++; # out of scope ERROR
}
This will not compile, and will produce the error:
Global symbol "#basearray" requires explicit package name at foo.pl line 15.
Execution of foo.pl aborted due to compilation errors.
I would consider this a training device to force yourself to using good coding practices, and not something to necessarily use in production code.
There are several solutions with various levels of pithiness from "just don't change it" to "use an object or tied array and lock down the update functions". An intermediate solution, not unlike using an object with a getter method, is to define a function that returns your array but can only operate as an rvalue, and to use that function inside subroutines.
my #basearray = (...);
sub basearray { return #basearray }
sub foo {
foreach my $elem (basearray()) {
...
}
#bar = map { $_ *= 2 } basearray(); # ok
#bar = map { $_ *= 2 } #basearray; # modifies #basearray!
}
TLDR: yes, but.
I'll start with the "but". But it's better to design your code so that the variable simply doesn't exist in the scope where the untrusted function is defined.
sub untrusted_function {
...
}
my #basearray = qw( ... ); # declared after untrusted_function
If untrusted_function needs to be able to access the contents of the array, pass it a copy of the array as a parameter, so it can't modify the original.
Now here's the "yes".
You can mark the array as read-only before calling the untrusted function.
Internals::SvREADONLY($_, 1) for #basearray;
Internals::SvREADONLY(#basearray, 1);
Then mark it read-write again after the function has finished.
Internals::SvREADONLY(#basearray, 0);
Internals::SvREADONLY($_, 0) for #basearray;
Using Internals::SvREADONLY(#basearray, $bool) modifies the read-only state of the array itself, preventing elements from being added or removed from it; Internals::SvREADONLY($_, $bool) for #basearray modifies the read-only state of each element in the array too, which you probably want.
Of course, if your array contains references like blessed objects, you then need to consider whether you need to recurse into the references, marking them read-only too. (But can also be a concern with the shallow copy of the array I mentioned in the preferred solution!)
So yes, it is possible to prevent a sub from accidentally modifying a variable by marking that variable read-only before calling the sub, but it's a better idea to restructure your code so the sub simply doesn't have access to the variable at all.
Yes, but.
Here is a prototype that uses #TLP's answer.
#!/usr/bin/perl
use strict; use warnings;
{ # block_main BEG
my #basearray = qw / amoeba elephants sequoia /;
print join ( ' ', 'in main, #basearray==', join ( ' ', #basearray ), "\n" );
print "Now we call subroutine to print it:\n"; enumerateprintarray ( \#basearray );
my $ref_basearray = changearray ( \#basearray, 'wolves or coyotes . . . ' );
#basearray = #$ref_basearray;
print "Now we call subroutine to print it:\n"; enumerateprintarray ( \#basearray );
} # block_main END
sub enumerateprintarray
{
my $sb_name = (caller(0))[3];
#print join ( '' , #basearray ); # mortal sin! for in the day that thou eatest thereof thou shalt surely die.
my $sb_exact_count_arg = 1;
die "$sb_name must have exactly $sb_exact_count_arg arguments" unless ( ( scalar #_ ) == $sb_exact_count_arg );
my $sf_array = $_[0];
my #sb_array = #$sf_array;
my $sb_count = 0;
foreach (#sb_array)
{
$sb_count++;
print "\t$sb_count:\t$_\n";
}
}
sub changearray
{
my $sb_name = (caller(0))[3];
#print join ( '' , #basearray ); # in the day that thou eatest thereof thou shalt surely die.
my $sb_exact_count_arg = 2;
die "$sb_name must have exactly $sb_exact_count_arg arguments" unless ( ( scalar #_ ) == $sb_exact_count_arg );
my ( $sf_array, $addstring ) = #_;
my #sb_array = #$sf_array;
push #sb_array, $addstring;
return ( \#sb_array );
}

is it allowed to pass pipes to constructors?

I tried to do something very fancy in Perl, and I think I'm suffering the consequences. I don't know if what I was trying to do is possible, actually.
My main program creates a pipe like this:
pipe(my $pipe_reader, my $pipe_writer);
(originally it was pipe(PIPE_READER, PIPE_WRITER) but I changed to regular variables when I was trying to debug this)
Then it forks, but I think that is probably irrelevant here. The child does this:
my $response = Response->new($pipe_writer);
The constructor of Response is bare bones:
sub new {
my $class = shift;
my $writer = shift;
my $self = {
writer => $writer
};
bless($self, $class);
return($self);
}
Then later the child will write its response:
$response->respond(123, "Here is my response");
The code for respond is as follows:
sub respond {
my $self = shift;
my $number = shift;
my $text = shift;
print $self->{writer} "$number\n";
print $self->{writer} "$text\n";
close $self->{writer}
}
This triggers a strange compile error: 'String found where operator expected ... Missing operator before "$number\n"?' at the point of the first print. Of course this is the normal syntax for a print, except that I have the object property instead of a normal handle AND it happens to be a pipe, not a file handle. So now I'm wondering if I'm not allowed to do this.
From print
If you're storing handles in an array or hash, or in general whenever you're using any expression more complex than a bareword handle or a plain, unsubscripted scalar variable to retrieve it, you will have to use a block returning the filehandle value instead, ...
print { $files[$i] } "stuff\n";
print { $OK ? *STDOUT : *STDERR } "stuff\n";
(my emphasis)
So you need
print { $self->{writer} } "$number\n";
Or, per Borodin's comment
$self->{writer}->print("$number\n");
The syntax of print is special, see for example this post and this post. For one, after print must come either a "simple" filehandle or a block evaluating to one, as quoted above, to satisfy the parser.
But with the dereference (arrow) operator the filehandle is found to be an IO::File object† and so its parent's IO::Handle::print method is invoked on it.
Prior to v5.14 there had to be use IO::Handle; for this to work, though not anymore. See this post and links in it for more.
Note that print FILEHANDLE LIST is not an indirect method call,
even as it may appear to be. It is just a function call to the print builtin under rather special syntax rules. It is only with an explicit ->
that an IO::Handle method gets called.
† It is either blessed into the class as the method call is encountered (and fails), or at creation; I can't find it in docs or otherwise resolve whether filehandles are blessed at creation or on demand
perl -MScalar::Util=blessed -wE'
pipe(RD,WR);
say *WR{IO}; #--> IO::File=IO(0xe8cb58)
say blessed(WR)//"undef"; #--> undef
'
(warns of unused RD)   We can't do this with lexical filehandles as they are not in the symbol table.
But once needed a filehandle is an IO::File or IO::Handle object (depending on Perl version).

Trouble passing an IO::File handle to a Perl class

If I pass it as an argument I get the error:
'Can't locate object method "getline" via package "Bad" at Bad.pm line 27.'
But if I insert it in the module it works.
This is the boiled down code. bad.pl uses the module Bad.pm. Set $CAUSE_ERROR to see the problem.
#!/usr/bin/env perl
# This is bad.pl
use strict;
use warnings;
use IO::File;
use Bad; # use the bad module "Bad.pm"
&Main();
sub Main {
my $filename = "bad.pl";
warn "About to parse '$filename'\n";
my $MyWord = Bad->new(); # Create a new object.
my $io = IO::File->new($filename, "r");
#####################
my $CAUSE_ERROR = 1; # Set to 0 it does NOT cause an error. Set to 1 it DOES.
#####################
if($CAUSE_ERROR ) {
$MyWord->Parse($MyWord, $io);
} else {
$MyWord->{fd} = $io;
$MyWord->Parse($MyWord);
}
}
This is Bad.pm
package Bad;
# This is Bad.pm
use warnings;
use strict;
sub new {
my ($class, $args) = #_;
my $self = {
fd => undef,
};
return bless($self, $class); # Changes a function to a class
}
sub Parse {
my ($MyWord, $io) = #_;
if(defined($MyWord->{fd})){
# WORKS
$io = $MyWord->{fd};
while ( defined(my $inputline = $io->getline) ) {
print "${inputline}";
}
} else {
# FAILS
while ( defined(my $inputline = $io->getline) ) {
print "${inputline}";
}
}
}
1;
Using Perl v5.22.3 under Cygwin.
Originally I had Bad.pm in a sub directory but I simplified it.
Thank you for you time.
To summarize:
$MyWord->Parse($MyWord, $io);
Given that $MyWord is a reference blessed into the Bad class (i.e, it's an instance of Bad), this calls Bad::Parse with the arguments ($MyWord, $MyWord, $io). That is, it behaves as if you'd called:
Bad::Parse($MyWord, $MyWord, $io)`.
However, Bad::Parse() is written to expect the arguments ($MyWord, $io), so $io gets set to the second $MyWord, and Bad::Parse() throws an error when it tries to call $io->getline because the Bad module doesn't implement that method.
The fix is simple:
Call the function as $MyWord->Parse($io).
Change the variable name for the first argument in Bad::Parse() from $MyWord to $self. This isn't strictly necessary -- you can technically call this variable whatever you want -- but it's conventional, and will make your code much more readable to other Perl programmers.
To summarize errors in the posted code: The class name is passed to the constructor behind the scenes, as is the object to methods; we do not supply them. We do pass the filehandle to new, so that it is assigned to object's data and it can thus be used by methods in the class.
Here is a basic example. I try to stick to the posted design as much as possible. This does not do much of what is needed with I/O objects, but is rather about writing a class in general.
The class is meant to process a file, having been passed a filehandle for it. We expect to have one filehandle per object. Since we get it open the reponsibility to close it is left to the caller.
script.pl
use strict;
use warnings;
use feature 'say';
use IO::File;
use ProcessFile;
my $filename = shift || $0; # from command line, or this file
say "About to parse '$filename'";
my $io = IO::File->new($filename, "r") or die "Can't open $filename: $!";
my $word = ProcessFile->new($io); # Create a new object, initialize with $io
$word->parse();
# OR, by chaining calls
#my $word = ProcessFile->new($io)->parse();
say "Have ", ProcessFile->num_objects(), " open filehandles";
$io->close;
The package file ProcessFile.pm
package ProcessFile;
use warnings;
use strict;
use Carp qw(croak);
use Scalar::Util qw(openhandle);
# Example of "Class" data and methods: how many objects (open filehandles)
our $NumObjects;
sub num_objects { return $NumObjects }
sub DESTROY { --$NumObjects }
sub new {
my ($class, $fh) = #_; # class name, arguments passed to constructor
# To also check the mode (must be opened for reading) use Fcntl module
croak "No filehandle or not open or invalid " if not openhandle $fh;
my $self = { _fh => $fh }; # add other data that may make sense
bless $self, $class; # now $self is an object of class ProcessFile
++$NumObjects;
return $self;
}
sub parse {
my ($self, #args) = #_; # object, arguments passed to method (if any)
# Filehandle is retrieved from data, $self->{_fh}
while ( defined(my $inputline = $self->{_fh}->getline) ) {
print $inputline;
}
# Rewind before returning $self (or not, depending on design/#args)
# Can do more here, set some data etc, as needed by class design
seek $self->{_fh}, 0, 0;
return $self;
}
1;
A few comments on the above code follow. Let me know if more would be helpful.
Class data and methods don't belong to any one object, and are used for purposes that relate to the class as a whole (for example, to track all objects in play).
The DESTROY method runs when an object is destroyed, for example when it goes out of scope. Here we need it in order to decrease the count of existing objects. Try: place the code creating an object in a block { ... }; and see what count we get after the block.
We use openhandle from Scalar::Util to test whether the filehandle is open. We should really also test whether it is open for reading, since that is the fixed purpose of the class, using Fcntl.
In the sole, example method parse we read out the file and then rewind the filehandle, before returning the object. That is a placeholder for saving and/or setting the state for repeated use. What is done depends on the purpose and design of the class, and can be controlled by arguments.
Documentation: tutorial perlootut and reference perlobj on object-oriented work in Perl, perlmod for modules (a class is firstly a package), and a tutorial perlreftut for references.
There are also many informative SO posts around, please search.

Reference found where even-sized list expected

I am writing this code in perl where i create a unique key and then assign a value to it.
sub populate {
my $file = shift;
my %HoH = shift;
open(INFILE,$file);
.
.
.
$final_name = $prepend.$five;
$HoH{$final_name} = $seven;
}
Now i am passing in two parameters to a subroutine which id like
&populate(\%abc,$file_1);
&populate(\%xyz,$file_2);
Why does it give me an error like this:
Reference found where even-sized list expected
Because your hash is assigned to a reference, and not a hash (even-sized list). You need to do:
my $hashref = shift;
...
$hashref->{$final_name} = $seven;
ETA: You should call subroutines without &, e.g. populate(...), unless you specifically want to override the prototype of the sub. If you don't know what a prototype is, just don't use &.
ETA2: You really should use a lexical filehandle and three-argument open. Consider this scenario:
open INFILE, $file;
some_sub();
$args = <INFILE>; # <--- Now reading from a closed filehandle
sub some_sub {
open INFILE, $some_file;
random code...
close INFILE;
}

Perl: How to pass and use a lexical file handle to a subroutine as a named argument?

I want to pass a lexical file handle to a subroutine using a named argument, but the following does not compile:
#!/usr/bin/perl -w
use strict;
my $log_fh;
my $logname = "my.log";
sub primitive {
my ($fh, $m) = #_;
print $fh $m;
}
sub sophisticated {
my ($args) = #_;
print $args->{m};
print $args->{fh} $args->{m} ;
}
open $log_fh, ">", $logname;
print $log_fh "Today I learned ...\n";
primitive($log_fh,"... the old way works ...\n");
sophisticated({
fh=>$log_fh,
m=>"... and the new way requires an intervention by SO.",
});
close $log_fh;
The complaint is:
Scalar found where operator expected at ./lexical.file.handle.pl line 15, near
} $args"
(Missing operator before $args?)
$ perl --version
This is perl, v5.10.1
It works O.K. when I use the primitive technique of passing arguments, and the named-argument hash technique works for the message portion, just not for the file handle portion. Do I need a new version of print ?
When you've got a complex expression that returns a filehandle (like $args->{fh}) you'll need to disambiguate the syntax a bit by adding some extra curlies:
print { $args->{fh} } $args->{m};
This is due to the weird way the print operator is designed, with no comma between the filehandle and the list of stuff to print.
Alternatively, you could grab the filehandle out of your arguments hashref first, e.g.
my $fh = $args->{fh};
print $fh $args->{m};
friedo's answer covers your problem, but there's a stylistic issue I'd like to point out. You don't need to wrap everything in an anonymous hash to emulate named arguments. A hash initializer is just a list interpreted as key/value pairs. Passing such a list to a sub provides a cleaner syntax for the caller:
sub sophisticated {
my %arg = #_;
print $arg{m};
print {$arg{fh}} $arg{m};
}
sophisticated(fh => $log_fh, m => "Hello, world!\n");