What is a "handle" in Perl? - perl

I'm wondering what a handle is in Perl.
I can see file handles, directory handles, etc., but I want to know what "handle" itself means in Perl.
For example, in the IO::Pipe docs I see the explanation below, and I'd like to be clear on what "becomes a handle" means:
reader ([ARGS])
The object is re-blessed into a sub-class of IO::Handle,
and becomes a handle at the reading end of the pipe. If
ARGS are given then fork is called and ARGS are passed
to exec.
Also could you please explain the meaning of bless?

A handle is a way to get to something without actually being that thing. A handle has an interface to interact with something managed by the system (or something else).
Start with the idea of a scalar. Defining a simple scalar stores a value, and that value is actually in the memory of your program. In very simplistic terms, you manage that resource directly and wholly within your program. You don't need to ask the system to increment the variable for you:
my $n = 5;
$n++;
Talking to the outside world
A handle represents a connection to something managed by something else, typically through "system calls".
A file handle is your connection to a file (so, managed by the filesystem or OS) but is not the file itself. With that filehandle, you can read from or write to the file; there is code behind all that to talk to the system to do the actual work.
open my $filehandle, '<', $filename or die "$!";
Since you are not managing the actual work and since you depend on the system to do the work, you check the $! system error variable to check that the system was able to do what you wanted. If it couldn't, it tells you how it ran into a problem (although the error may not be very specific).
A directory handle is a way to get a list of the things inside a directory, but is not the directory itself. To get that, you have to ask the system to do things for you. And so on.
Perl is wonderful though
But, in Perl, you can make a handle to anything you like (and I write a lot about this in both Effective Perl Programming and Mastering Perl). You can use the interface for a handle even if you wholly control the thing and don't need to ask the system to do something on your behalf.
For example, you can use the filehandle interface on a string:
open my $string_filehandle, '>', \my $string;
print {$string_filehandle} "This goes to the string";
To your code as you read it, it looks like the thing is a file (socket, whatever) because that is the main use of the handle interface. This is quite handy when you are handcuffed to using a filehandle because someone else wrote some code you can't change. This function is designed to only send $message to some output handle:
sub print_to_file_only {
my( $filehandle, $message ) = @_;
print {$filehandle} $message;
}
But sometimes you don't want that message to go to the terminal, file, socket, or whatever. You want to see it in your program. You can capture the message in your $string_filehandle because it uses the same handle interface even though it's not a static resource.
print_to_file_only( $string_filehandle, "This message is captured too" );
Now you'll see the message show up in $string, and you can do whatever you like with it.
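Putting the pieces above together, here is a minimal, self-contained sketch of the capture trick. The message text is made up for illustration; the sub has the same shape as the one shown earlier.

```perl
use strict;
use warnings;

# Same shape as the sub above: it only knows how to print to a handle.
sub print_to_file_only {
    my ( $filehandle, $message ) = @_;
    print {$filehandle} $message;
}

# Open a write handle onto a scalar instead of a file.
my $string = '';
open my $string_filehandle, '>', \$string
    or die "Can't open in-memory handle: $!";

print_to_file_only( $string_filehandle, "captured, not printed\n" );
close $string_filehandle;

# $string now holds the message as ordinary data in the program.
```

The sub never learns that its "file" is really a variable, which is the point of coding against the handle interface.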
There are many more tricks like this, and I'm tempted to talk about all of them. But, this should be a good start.

This can be a very broad topic and I'll try to stay with the crux of the question, captured in a line quoted from the IO::Pipe docs which needs explaining:
The object is re-blessed into a sub-class of IO::Handle,
and becomes a handle at the reading end of the pipe.
A "handle" in Perl is a construct built around some resource, out in the OS or in our program, which allows us to manage that resource. A filehandle, for instance, may facilitate access to a file, via libraries and OS facilities, and is more than a plain file descriptor:
use warnings;
use strict;
use feature 'say';
my $file = shift // die "Usage: $0 file\n";
open my $fh, '<', $file or die "Can't open $file: $!";
say fileno $fh; # file descriptor, a small integer, normally >= 3
say $fh->fileno; # can use handle as object of IO::Handle or IO::File
print while <$fh>; # use it to access data at/via the resource
close $fh;
The opened file got a file descriptor in the OS, a small integer, the first one available.† But we get the "filehandle" $fh associated with the opened file, which is far nicer to work with and with which various tools can be used. (The fileno was used to get the fd from it.)
In newer Perls (since v5.14.0) a (file)handle can in fact be treated as an object of IO::Handle or IO::File, as these classes get loaded on demand once a method call from them is used on the variable with the handle (if the call can't be resolved otherwise).‡
This brings us to the second question, of "re-bless"-ing.
When a reference is bless-ed into a package it becomes an object of (the class supposedly defined in) that package. The sub in which this is done is thus a constructor, and the "blessed" reference is returned to the caller. In the caller that is an "instance" of the class, an object.
The object's internal structure has fields saying what package it's from so it gets treated accordingly, one can call methods defined in the package on it, etc. This is a bit simplified, see perlootut and perlobj for starters.
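As a rough illustration of bless and re-bless, here is a small sketch. The Counter and LoudCounter classes are hypothetical, invented only for this example; they are not part of IO::Pipe or any library.

```perl
use strict;
use warnings;

package Counter;    # hypothetical class, for illustration only

sub new {
    my ($class) = @_;
    my $self = { count => 0 };     # a plain hash reference
    return bless $self, $class;    # now an object of $class
}

sub increment { my ($self) = @_; return ++$self->{count} }

package LoudCounter;    # hypothetical subclass
our @ISA = ('Counter');

sub increment {
    my ($self) = @_;
    return 'count is ' . $self->Counter::increment;
}

package main;

my $obj    = Counter->new;
my $before = ref $obj;         # 'Counter'
my $plain  = $obj->increment;  # 1

bless $obj, 'LoudCounter';     # re-bless: same data, different class
my $after = ref $obj;          # 'LoudCounter'
my $loud  = $obj->increment;   # 'count is 2'
```

The data inside the reference is untouched by the second bless; only the package used for method lookup changes, which is what the IO::Pipe docs mean by "re-blessed".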
The quote from the docs comes from the reader or writer methods in IO::Pipe class. Once they are called on an object of that class it becomes beneficial for the object to have facilities from IO::Handle, so it is "made" into an object of that class. (Not of IO::File class since a pipe isn't seekable while IO::File inherits from IO::Seekable as well.)
Since bless is a crucial and telling part of the process, people often simply say that it's "blessed" (or "re-blessed" here, since it was already an object of another class), but as you can see from the linked sources there is a bit more to do.
As a final comment, note that a "(file)handle" can be opened to things entirely other than an OS resource like a file or socket or such. For example, it can be "tied" (see perltie, Tie::Handle); or, opened to a scalar ("in-memory file").§
† If STDIN/STDOUT/STDERR (fd's 0,1,2) aren't closed and this is the first thing opened, it gets 3
‡ IO::File inherits from IO::Handle and IO::Seekable, adding only a few methods. Most classes that represent handles, like IO::Pipe or IO::Socket, inherit from IO::Handle. So its docs are the first place to get a feel for what is available for a handle, and thus for what a "handle" is.
§ This isn't a full filehandle though; try fileno on it (-1). But it behaves well enough to be useful. One example: I use it in forked processes to accumulate prints in a child in a string, which is in the end sent back to the parent. That way they can be logged/printed coherently, and in some order.
# in a child
my $stdout;
open my $fh_stdout, '>', \$stdout or croak "Can't open var for write: $!";
my $fh_STDOUT = select $fh_stdout; # set as default, save old (STDOUT)
say "goes to string"; # winds up in $stdout (sent to parent in the end)
select $fh_STDOUT; # can switch back (normally not needed in child)
This can be done in other ways of course (append messages to a string, for example) but this way we can print normally once the handle is select-ed as default (and can use existing subs/libraries which may just print, not caring where they run, etc).

Most languages create variables and objects in very different ways. In Perl, they are very similar.
Perl allows a reference to most types of variables to be marked as an object with the bless function.
This confers additional powers on the variable: it can call methods in the class. Perl will search that class for a method of that name. If you don't supply the second argument to bless, it will use the current package as the class.
In their IO::Pipe example, you call IO::Pipe's new() method to obtain a blessed object. To then make use of it, they fork(), and the parent converts ("re-blesses") $pipe into a reader, a subclass of IO::Handle with methods that work for reading. The child process converts its $pipe into a writer. Now they can communicate from child to parent via the pipe.
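The fork-and-pipe flow described above might look roughly like this. It is a sketch that assumes a Unix-like system where fork is available; the message text is made up.

```perl
use strict;
use warnings;
use IO::Pipe;

my $pipe   = IO::Pipe->new;
my $before = ref $pipe;          # class before reader/writer is called

my $pid = fork;
die "fork failed: $!" unless defined $pid;

if ( $pid == 0 ) {
    # Child: turn its copy of $pipe into the writing end.
    $pipe->writer;
    print {$pipe} "hello from child\n";
    close $pipe;
    exit 0;
}

# Parent: turn its copy of $pipe into the reading end.
$pipe->reader;
my $after = ref $pipe;           # class after re-blessing
my $line  = <$pipe>;             # read what the child wrote
close $pipe;
waitpid $pid, 0;
```

Comparing $before and $after shows the re-blessing: the same variable starts as an IO::Pipe object and ends up as a handle object that inherits from IO::Handle.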

Related

In perl is \*STDIN the same as STDIN?

I'm the author of Pythonizer and I'm trying to translate the code of CGI.pm from the standard perl library to Python. I came across this code in read_from_client:
read(\*STDIN, $$buff, $len, $offset)
Is \*STDIN the same thing as just STDIN? I'm not understanding why they are using it this way. Thanks for your help!
The module also references \*main::STDIN - is this the same as STDIN too (I would translate plain STDIN to sys.stdin in python)? Code:
foreach my $fh (
\*main::STDOUT,
\*main::STDIN,
\*main::STDERR,
) { ... }
Instead of translating CGI.pm line for line, I'd recommend you understand the interface and then do whatever Python would do for that. Or, better yet, just forget it exists. It often seems like a translation will be a drop-in replacement, but the libraries and structures you'll use in the new language are different enough that you are just going to make new bugs. Since you are going to make new bugs anyway, you might as well do something smarter.
But, I know nothing about your situation, so let's get to the literal question.
You're looking at:
# Read data from a file handle
sub read_from_client {
my($self, $buff, $len, $offset) = @_;
local $^W=0; # prevent a warning
return $MOD_PERL
? $self->r->read($$buff, $len, $offset)
: read(\*STDIN, $$buff, $len, $offset);
}
Instead of worrying about the Perl code, just do whatever you need to do in Python to satisfy the interface. Given a buffer and a length, get some more data from the filehandle. Since you are not handling mod_perl (I'm guessing, because how would you?), you can ignore most stuff there.
The \*main::STDIN and \*STDIN are references to a typeglob, which is a way to track all the Perl variables with the same name (scalar, array, hash, subroutine, filehandle, and a few others). The STDIN identifier is a special-case variable that lives in main by default, so adding the package main:: in front is probably merely developer comfort.
When you use those references in a place that wants to work on a filehandle, the filehandle portion of the typeglob is used. It's just a way to pass the identifier STDIN and have something else use it as a filehandle.
You see this as a way to pass around the named, standard file handles.
The read takes a filehandle (or reference to typeglob) as its first argument.
In python, you'd do something like sys.stdin.read(...).
The following can usually be used as a file handle:
An IO object (*STDIN{IO})
A glob containing an IO object (*STDIN)
A reference to a glob containing an IO object (\*STDIN)
The name of a glob containing an IO object ("STDIN")
The builtin operators that expect a file handle allow you to omit the * when providing a glob. For example, read( FH, ... ) means read( *FH, ... ).
The builtin functions that expect a file handle should accept all of these. So you could use any of the following:
read( *STDIN{IO}, ... )
read( STDIN, ... )
read( *STDIN, ... )
read( \*STDIN, ... )
read( "STDIN", ... )
They will have the same effect.
Third-party libraries probably accept the globs and reference to globs, and they should also expect IO objects. I expect the least support for providing the name as a string. Your mileage may vary.
You can't go wrong with a reference to a glob (\*FH) since that's what open( my $fh, ... ) produces.
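To see several of these forms behave the same way, here is a small sketch that uses a named handle FH opened on an in-memory string rather than STDIN, so it runs without any input. The next_line helper is hypothetical. The string-name form is left out because it is a symbolic reference and does not mix well with strict.

```perl
use strict;
use warnings;

my $data = "one\ntwo\nthree\n";
open FH, '<', \$data or die "Can't open in-memory handle: $!";

# Hypothetical helper: reads one line from whatever handle-like thing it gets.
sub next_line {
    my ($fh) = @_;
    my $line = readline $fh;
    return $line;
}

my @got = (
    next_line( *FH ),        # a glob
    next_line( \*FH ),       # a reference to a glob
    next_line( *FH{IO} ),    # the IO object inside the glob
);
close FH;
```

All three calls read from the same underlying handle, so each one picks up where the previous one stopped.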

How to pipe to and read from the same tempfile handle without race conditions?

I was debugging a Perl script for the first time in my life and came across this:
$my_temp_file = File::Temp->tmpnam();
system("cmd $blah | cmd2 > $my_temp_file");
open(FIL, "$my_temp_file");
...
unlink $my_temp_file;
This works pretty much like I want, except the obvious race conditions in lines 1-3. Even if using proper tempfile() there is no way (I can think of) to ensure that the file streamed to at line 2 is the same opened at line 3. One solution might be pipes, but the errors during cmd might occur late because of limited pipe buffering, and that would complicate my error handling (I think).
How do I:
Write all output from cmd $blah | cmd2 into a tempfile opened file handle?
Read the output without re-opening the file (risking race condition)?
You can open a pipe to a command and read its contents directly with no intermediate file:
open my $fh, '-|', 'cmd', $blah or die "Can't run cmd: $!";
while( <$fh> ) {
...
}
With short output, backticks might do the job, although in this case you have to be more careful to scrub the inputs so they aren't misinterpreted by the shell:
my $output = `cmd $blah`;
There are various modules on CPAN that handle this sort of thing, too.
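As a runnable sketch of the list-form pipe open, the following uses the running perl itself ($^X) as a stand-in for cmd, since the real command is not known here. It also checks the result of close, which is where a late failure of the command shows up.

```perl
use strict;
use warnings;

# $^X is the path to the running perl; it stands in for 'cmd' here.
my @command = ( $^X, '-e', 'print "line $_\n" for 1 .. 3' );

# List form: the arguments bypass the shell, so nothing needs quoting.
open my $fh, '-|', @command
    or die "Can't start @command: $!";

my @lines;
while ( my $line = <$fh> ) {
    chomp $line;
    push @lines, $line;
}

# close on a pipe handle waits for the child and sets $? to its status.
close $fh or die "Command failed, exit status: ", $? >> 8;
```

Because the whole output is read before close is checked, an error that happens late in the command is still reported, just after the data has been consumed.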
Some comments on temporary files
The comments mentioned race conditions, so I thought I'd write a few things for those wondering what people are talking about.
In the original code, Andreas uses File::Temp, a module from the Perl Standard Library. However, they use the tmpnam POSIX-like call, which has this caveat in the docs:
Implementations of mktemp(), tmpnam(), and tempnam() are provided, but should be used with caution since they return only a filename that was valid when function was called, so cannot guarantee that the file will not exist by the time the caller opens the filename.
Using tmpnam this way is discouraged; POSIX's tmpnam was deprecated in Perl v5.22 and removed in v5.26.
That is, you get back the name of a file that does not exist yet. After you get the name, you don't know if that filename was made by another program. And, that unlink later can cause problems for one of the programs.
The "race condition" comes in when two programs that probably don't know about each other try to do the same thing at roughly the same time. Your program tries to make a temporary file named "foo", and so does some other program. They both might see at the same time that a file named "foo" does not exist, then try to create it. They both might succeed, and as they both write to it, they might interleave or overwrite the other's output. Then, one of those programs thinks it is done and calls unlink. Now the other program wonders what happened.
In the malicious exploit case, some bad actor knows a temporary file will show up, so it recognizes a new file and gets in there to read or write data.
But this can also happen within the same program. Two or more versions of the same program run at the same time and try to do the same thing. With randomized filenames, it is probably exceedingly rare that two running programs will choose the same name at the same time. However, we don't care how rare something is; we care how devastating the consequences are should it happen. And, rare is much more frequent than never.
File::Temp
Knowing all that, File::Temp handles the details of ensuring that you get a filehandle:
my( $fh, $name ) = File::Temp->tempfile;
This uses a default template to create the name. When the filehandle goes out of scope, File::Temp also cleans up the mess.
{
my( $fh, $name ) = File::Temp->tempfile;
print $fh ...;
...;
} # file cleaned up
Some systems might automatically clean up temp files, although I haven't cared about that in years. Typically it was a batch thing (say once a week).
I often go one step further by giving my temporary filenames a template, where the Xs are literal characters the module recognizes and fills in with randomized characters:
my( $fh, $name ) = File::Temp->tempfile(
sprintf "$0-%d-XXXXXX", time );
I'm often doing this while I'm developing things so I can watch the program make the files (and in which order) and see what's in them. In production I probably want to obscure the source program name ($0) and the time; I don't want to make it easier to guess who's making which file.
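For comparison, here is a sketch using File::Temp's object interface (File::Temp->new) rather than the tempfile function used above. The template name and the record text are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp;

my ( $name, $existed_inside, $line );

{
    # The object is both a filehandle and a guard that removes the file.
    my $tmp = File::Temp->new( TEMPLATE => 'scratch-XXXXXX', TMPDIR => 1 );
    $name = $tmp->filename;

    print {$tmp} "first record\n";
    $tmp->flush;                               # push buffered data to disk
    seek $tmp, 0, 0 or die "seek failed: $!";  # rewind before reading
    $line = <$tmp>;

    $existed_inside = -e $name;
}    # $tmp goes out of scope here and the file is removed

my $exists_after = -e $name;
```

The same handle is used for writing and reading, so the file is never re-opened by name, which is what closes the race discussed above.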
A scratchpad
I can also open a temporary file with open by not giving it a filename. This is useful when you want to collect data outside the program's memory. Opening it read-write means you can output some stuff then move around that file (we show a fixed-length record example in Learning Perl):
open(my $tmp, "+>", undef) or die ...;
print $tmp "Some stuff\n";
seek $tmp, 0, 0;
my $line = <$tmp>;
File::Temp opens the temp file in O_RDWR mode so all you have to do is use that one file handle for both reading and writing, even from external programs. The returned file handle is overloaded so that it stringifies to the temp file name so you can pass that to the external program. If that is dangerous for your purpose you can get the fileno() and redirect to /dev/fd/<fileno> instead.
All you have to do is mind your seeks and tells. :-) Just remember to always set autoflush!
use File::Temp;
use Data::Dump;
$fh = File::Temp->new;
$fh->autoflush;
system "ls /tmp/*.txt >> $fh" and die $!;
@lines = <$fh>;
printf "%s\n\n", Data::Dump::pp(\@lines);
print $fh "How now brown cow\n";
seek $fh, 0, 0 or die $!;
@lines2 = <$fh>;
printf "%s\n", Data::Dump::pp(\@lines2);
Which prints
[
"/tmp/cpan_htmlconvert_DPzx.txt\n",
"/tmp/cpan_htmlconvert_DunL.txt\n",
"/tmp/cpan_install_HfUe.txt\n",
"/tmp/cpan_install_XbD6.txt\n",
"/tmp/cpan_install_yzs9.txt\n",
]
[
"/tmp/cpan_htmlconvert_DPzx.txt\n",
"/tmp/cpan_htmlconvert_DunL.txt\n",
"/tmp/cpan_install_HfUe.txt\n",
"/tmp/cpan_install_XbD6.txt\n",
"/tmp/cpan_install_yzs9.txt\n",
"How now brown cow\n",
]
HTH

Simple perl file copy methods without using File::Copy

I have very little Perl experience and have been trying a few methods on OS X before I attempt to use MacPerl on a more difficult to access OS 9 machine with very limited memory.
I have been trying simple file copy methods without using File::Copy.
Both of the following appear to work:
open R,"<old";
open W,">new";
print W <R>;
open $r,"<old";
open $w,">new";
while (<$r>) { print $w $_ }
Why can't I use $r and $w with the first method?
How does the first method work without a 'while'?
Is there a better way to do this?
Is there a better way to do this?
There sure is... File::Copy is a core module (no installation required), so there's little reason to avoid using it.
use File::Copy;
copy('old', 'new');
Alternatively, you can use a system call to the underlying OS, in your case OS X:
system('cp', 'old', 'new');
UPDATE Oops, you're on OS9, so I'm not sure what system calls are available.
You can use your first method with lexical file handles, but you need to disambiguate a little.
open $r, '<', 'old';
open $w, '>', 'new';
print {$w} <$r>;
Bear in mind this is unidiomatic code, and if you just want to create a direct copy, the first method is preferable (EDIT: if your memory constraints allow for it).
Perl operators and functions can return different things depending on what their context expects.
The first method works because the print function creates what is called a list context for the <> operator; the operator will then "slurp in" the entire file.
In the second example, the <> operator is called in the condition of the loop, which creates a scalar context, returning one line at a time (some asterisks here, but that's another story.)
Here is some explanation about contexts: http://perlmaven.com/scalar-and-list-context-in-perl.
And, both methods should work with the R and W filehandles (which are old fashioned Perl filehandles that are different from regular variables), and with the $r/$w notation that actually denotes variables which hold a filehandle reference. The difference is subtle but in most everyday use cases these can be used interchangeably. Have you tried using $ variables in the first example?
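The context difference can be seen in a few lines. This sketch reads from and writes to in-memory strings rather than the old and new files from the question, so it can run anywhere; the sample text is made up.

```perl
use strict;
use warnings;

my $text = "a\nb\nc\n";

open my $r, '<', \$text or die "Can't open source: $!";
my $first = <$r>;    # scalar context: exactly one line
my @rest  = <$r>;    # list context: every remaining line at once
close $r;

# The one-statement copy from the first method, with lexical handles.
my $copy = '';
open $r, '<', \$text or die "Can't reopen source: $!";
open my $w, '>', \$copy or die "Can't open destination: $!";
print {$w} <$r>;     # print supplies list context, so <$r> slurps
close $w;
close $r;
```

The list-context forms hold the whole input in memory at once, which matters on a machine with very limited memory; the while loop in the second method only ever holds one line.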
In addition to Hellmar Becker's answer:
The print $w <$r>; does not work (gives a syntax error) because if the FILEHANDLE argument is a variable, perl tries to interpret the beginning of print's argument list ($w <$r) as an operator expression (see the NOTE in http://perldoc.perl.org/functions/print.html). To disambiguate, put parentheses around the <$r>:
print $w (<$r>);

How does open(STDOUT,'>:scalar', \$stdout) work in Perl?

What does >:scalar mean?
I've never seen this kind of code before.
The particular ">:THING" syntax tells the Perl IO system to use the layer specified by THING. Have a look at the PerlIO documentation for 'layer'. Common layers are 'raw' and 'utf8'.
In this case, this allows you to use $stdout as an in-memory file which should end up containing whatever gets sent to STDOUT. More generally, the syntax lets you open an in-memory file, then send the filehandle to other functions that normally write to files, so that you can collect their output (or provide their input).
You can also achieve the same result by opening a "file" which is a reference to a scalar:
open my $fh, ">:scalar", \$scalar or die;
open my $fh, ">", \$scalar or die;
It's provided by PerlIO, and implemented by PerlIO::scalar, although you do not have to 'use' the module to access the functionality.
Perl uses a layered IO system. At the bottom of the chain, one finds a layer that deals with accessing the media. scalar is the IO system layer that handles reading from and writing to a scalar instead of a file. Saying
open(my $fh, '>:scalar', \$scalar)
for scalar handles is the equivalent of saying
open(my $fh, '>:unix', $file_name)
for OS handles. It's wholly redundant, since Perl already knows it's a scalar handle and not an OS handle.
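A minimal sketch of capturing STDOUT in a scalar, as in the question's title, could look like this. It uses the plain '>' mode since the layer name is redundant, and it localizes the STDOUT glob so the real STDOUT comes back at the end of the block; the printed text is made up.

```perl
use strict;
use warnings;

my $stdout = '';

{
    # Give this block its own STDOUT; the original returns on scope exit.
    local *STDOUT;
    open STDOUT, '>', \$stdout or die "Can't redirect STDOUT: $!";

    print "this goes to the scalar\n";    # a plain print, no handle named

    close STDOUT;
}

# Here STDOUT is the original again, and $stdout holds the captured text.
```

Any code called inside the block that simply prints, including library code you can't change, ends up writing into $stdout.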

Are there reasons to ever use the two-argument form of open(...) in Perl?

Are there any reasons to ever use the two-argument form of open(...) in Perl rather than the three-or-more-argument versions?
The only reason I can come up with is the obvious observation that the two-argument form is shorter. But assuming that verbosity is not an issue, are there any other reasons that would make you choose the two-argument form of open(...)?
One- and two-arg open applies any default layers specified with the -C switch or open pragma. Three-arg open does not. In my opinion, this functional difference is the strongest reason to choose one or the other (and the choice will vary depending what you are opening). Which is easiest or most descriptive or "safest" (you can safely use two-arg open with arbitrary filenames, it's just not as convenient) take a back seat in module code; in script code you have more discretion to choose whether you will support default layers or not.
Also, one-arg open is needed for Damian Conway's file slurp operator
$_ = "filename";
$contents = readline!open(!((*{!$_},$/)=\$_));
Imagine you are writing a utility that accepts an input file name. People with reasonable Unix experience are used to substituting - for STDIN. Perl handles that automatically only when the magical form is used, where the mode characters and file name are one string; otherwise you have to handle this and similar special cases yourself. This is a somewhat common gotcha; I am surprised no one has posted it yet. Proof:
use IO::File qw();
my $user_supplied_file_name = '-';
IO::File->new($user_supplied_file_name, 'r') or warn "IO::File/non-magical mode - $!\n";
IO::File->new("<$user_supplied_file_name") or warn "IO::File/magical mode - $!\n";
open my $fh1, '<', $user_supplied_file_name or warn "non-magical open - $!\n";
open my $fh2, "<$user_supplied_file_name" or warn "magical open - $!\n";
__DATA__
IO::File/non-magical mode - No such file or directory
non-magical open - No such file or directory
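If you prefer three-argument open but still want - to mean STDIN, handling the special case yourself can be as small as the sketch below. The open_input helper is hypothetical, and the temporary file exists only to give the example something real to open.

```perl
use strict;
use warnings;
use File::Temp;

# Hypothetical helper: '-' means STDIN, anything else is a literal filename.
sub open_input {
    my ($name) = @_;
    return \*STDIN if $name eq '-';
    open my $fh, '<', $name or die "Can't open '$name': $!";
    return $fh;
}

# A real file to read from, created just for this example.
my $tmp = File::Temp->new;
print {$tmp} "from a real file\n";
$tmp->flush;

my $fh   = open_input( $tmp->filename );
my $line = <$fh>;
close $fh;

# The special case returns the standard input handle, unopened by us.
my $is_stdin = open_input('-') == \*STDIN;
```

This keeps the safety of three-argument open for ordinary names while supporting the one piece of magic that users actually expect.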
Another small difference: the two-argument form trims spaces
$foo = " fic";
open(MH, ">$foo");
print MH "toto\n";
Writes in a file named fic
On the other hand
$foo = " fic";
open(MH, ">", $foo);
print MH "toto\n";
Will write in a file whose name begins with a space.
For short admin scripts with user input (or configuration file input), not having to bother with such details as trimming filenames is nice.
The two argument form of open was the only form supported by some old versions of perl.
If you're opening from a pipe, the three argument form isn't really helpful. Getting the equivalent of the three argument form involves doing a safe pipe open (open(FILE, '|-')) and then executing the program.
So for simple pipe opens (e.g. open(FILE, 'ps ax |')), the two argument syntax is much more compact.
I think William's post pretty much hits it. Otherwise, the three-argument form is going to be more clear, as well as safer.
See also:
What's the best way to open and read a file in Perl?
Why is three-argument open calls with autovivified filehandles a Perl best practice?
One reason to use the two-argument version of open is if you want to open something which might be a pipe, or a file. If you have one function
sub strange
{
my ($file) = #_;
open my $input, $file or die $!;
}
then you want to call this either with a filename like "file":
strange ("file");
or a pipe like "zcat file.gz |"
strange ("zcat file.gz |");
depending on the situation of the file you find, then the two-argument version may be used. You will actually see the above construction in "legacy" Perl. However, the most sensible thing might be to open the filehandle appropriately and send the filehandle to the function rather than using the file name like this.
When you are combining a string or using a variable, it can be rather unclear whether '<' or '>' etc is in already. In such cases, I personally prefer readability, which means, I use the longer form:
open($FILE, '>', $varfn);
When you simply use a constant, I prefer the ease of typing (and actually consider the short version more readable anyway, or at least equal to the long version).
open($FILE, '>somefile.xxx');
I'm guessing you mean open(FH, '<filename.txt') as opposed to open(FH, '<', 'filename.txt') ?
I think it's just a matter of preference. I always use the former out of habit.