Can someone explain this Perl code snippet?

This little piece of code has been a staple in a bunch of my scripts, but I took the syntax from another working script that someone else wrote and adapted it to fit my needs. I'm not even sure that the syntax used here is the best or most common way to open a filehandle either.
The code is:
$fh = \*STAT_FILE;
open ($fh,">>".$stat_file) or die "Can't open $stat_file: $!\n";
my $print_flag = ( -z $stat_file );
I don't fully understand the first line and also the last line of the code above. Specifically, \*STAT_FILE and -z, respectively.
I know that, for the most part, the second line will open a file for appending or quit and throw an error. But again, I don't understand what purpose the $! serves in that line either.
Can someone explain this Perl code to me line by line, in pseudocode? Also, if the method above is not the preferred method, what is?
Thanks in advance

Before perl 5.6, file handles could only be globs (bare words) or references to globs (which is what \*STAT_FILE is). Also, it's better to use 3-argument open (See the docs. Also see perlopentut). So you can now do:
open(my $fh, ">>", $stat_file) or die "Failed to open $stat_file: $!";
and forget about \*STAT_FILE.
-z is one of the file test functions (and takes a file name or file handle as an argument) and tests to see if the file has zero size.
$! is one of the Special Variables and contains the most recent system error message (in this case, why you cannot open the file: perhaps permission issues, or a directory in the path to the file does not exist, etc.).
You should learn to use perldoc; all of this is in perldoc:
perldoc perlfunc (specifically perldoc -f open and perldoc -f -X)
perldoc perlvar
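To see the three lines working together in the style recommended above, here is a minimal sketch with a lexical handle, three-argument open, and -z deciding whether the file was empty (for example, to write a header line first). The helper name open_stat_file is hypothetical, not something from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: open $stat_file for appending and report
# whether it was empty, so the caller can write a header first.
sub open_stat_file {
    my ($stat_file) = @_;

    # Three-argument open with a lexical handle; $! holds the OS error text.
    open(my $fh, '>>', $stat_file) or die "Can't open $stat_file: $!\n";

    # -z is true if the file exists and has zero size.
    my $print_flag = -z $stat_file;

    return ($fh, $print_flag);
}

# Usage sketch (file name is hypothetical):
# my ($fh, $print_flag) = open_stat_file('stats.log');
# print {$fh} "date,count\n" if $print_flag;
```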

The first line assigns to the variable a reference (the backslash) to the typeglob (a full symbol table entry) STAT_FILE. This has been a fairly idiomatic Perl construct for passing filehandles around, as described, for example, in Larry Wall's "Programming Perl". The $! variable contains the error message returned by the operating system.
So the whole meaning is:
line 1. put a filehandle in the $fh variable;
line 2. open the file for appending, reporting the system error message if it fails;
line 3. set a flag variable that is true if the file has zero length.
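A small sketch of what "passing filehandles" means in practice, assuming a hypothetical log_line helper: the same sub accepts both a typeglob reference like \*STAT_FILE and a lexical handle, since both are GLOB references. In-memory files (opening a reference to a scalar) stand in for the real stat file so the example touches no disk.

```perl
use strict;
use warnings;

# Hypothetical helper that prints to whichever handle it is given.
sub log_line {
    my ($fh, $msg) = @_;
    print {$fh} "$msg\n" or die "print failed: $!";
}

# Old style: a bareword handle, passed around as a reference to its typeglob.
my $old_buf = '';
open(STAT_FILE, '>', \$old_buf) or die "Can't open in-memory file: $!";
my $glob_ref = \*STAT_FILE;          # what the question's first line does
log_line($glob_ref, 'via glob reference');
close(STAT_FILE);

# Modern style: open autovivifies a lexical handle (also a glob reference).
my $new_buf = '';
open(my $fh, '>', \$new_buf) or die "Can't open in-memory file: $!";
log_line($fh, 'via lexical handle');
close($fh);

print ref($glob_ref), ' ', ref($fh), "\n";   # prints "GLOB GLOB"
```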

Related

Is this a standard Perl language construction or a customization: open HANDLE, ">$fname"

Not a Perl guru, working with an ancient script, ran into a construct I didn't recognize that yields results I don't expect. Curious whether this is the standard language, or a PM customization of sorts:
open FILE1, ">./$disk_file" or die "Can't open file: $disk_file: $?";
From the looks of this, file is to be opened for writing, but the log error says that file is not found. Perl's file i/o expects 3 parameters, not 2. Log doesn't have the die output, instead saying: "File not found"
Confused a bit here.
EDIT: Made it work using the answers below. It seems I was running a cached version of the .pl for some time, instead of the newly edited one. Finally it caught up with a 2-param open; thanks, y'all, for your help!
That is the old 2-argument form of open. The second argument is a bit magical:
if it starts with '>' the remainder of the string is used as the name of a file to open for writing
if it starts with '<' the remainder of the string is used as the name of a file to open for reading (this is the default if '<' is omitted)
if it ends with '|' the string up to that point is interpreted as a command which is executed with its STDOUT connected to a pipe which your script will open for reading
if it starts with '|' the string after that point is interpreted as a command which is executed with its STDIN connected to a pipe which your script will open for writing
This is a potential security vulnerability: if your script accepts a filename as user input, the user can add a '|' at the beginning or end to trick your script into running a command.
The 3-argument form of open was added in version 5.6, so it has been a standard part of Perl for a very long time.
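To make the risk concrete, the sketch below feeds a hypothetical untrusted string ending in '|' to both forms of open. It assumes a Unix-like system where the echo command is on the PATH.

```perl
use strict;
use warnings;

# Hypothetical untrusted "filename" that ends with a pipe character.
my $user_input = 'echo injected |';

# Two-argument open: the trailing '|' makes Perl run the command
# and read its output instead of opening a file.
my $two_arg_result;
if (open(my $in, $user_input)) {
    $two_arg_result = <$in>;    # the command's output, not file contents
    close($in);
}

# Three-argument open: the mode is explicit, so the whole string is a
# literal file name; no command runs and the open simply fails.
my $three_arg_ok = open(my $safe, '<', $user_input) ? 1 : 0;

print "two-arg read: ", (defined $two_arg_result ? $two_arg_result : "nothing\n");
print "three-arg open succeeded: $three_arg_ok\n";
```

With the mode given separately, the same string can only ever be treated as a filename.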
The FILE1 part is known as a bareword filehandle - which is a global. Modern style would be to use a lexical scalar like my $file1 instead.
See perldoc -f open (and perlopentut) for details but, in brief...
Perl's open() will accept either two or three parameters (there's even a one-parameter version - which no-one ever uses). The two-parameter version is a slightly older style where the open mode and the filename are joined together in the second parameter.
So what you have is equivalent to:
open FILE1, '>', "./$disk_file" or die "Can't open file: $disk_file: $?";
A couple of other points.
We prefer to use lexical variables as filehandles these days (so, open my $file1, ... instead of open FILE1, ...).
I think you'll find that $! will be more useful in the error message than $?. $? contains the status of the last child process (from system, backticks, or closing a pipe), but there's no child process here.
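A quick illustration of the difference, assuming the hypothetical path /no/such/dir/file.txt does not exist: a failed open sets $!, while running a child process sets $?. The current perl binary ($^X) stands in for an arbitrary child command.

```perl
use strict;
use warnings;
use Errno qw(ENOENT);

# $! is set when a system call such as open() fails.
# Hypothetical path, assumed not to exist.
my $missing    = '/no/such/dir/file.txt';
my $open_error = '';
my $is_enoent  = 0;
open(my $fh, '<', $missing) or do {
    $open_error = "$!";                    # text, e.g. "No such file or directory"
    $is_enoent  = ($! == ENOENT) ? 1 : 0;  # numeric errno value
};

# $? is set by child processes: system(), backticks, closing a pipe.
system($^X, '-e', 'exit 3');
my $child_exit = $? >> 8;                  # exit status is in the high byte

print "open failed: $open_error\n";
print "child exit status: $child_exit\n";
```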
Update: And none of this seems to be causing the problems that you're seeing. That seems to be caused by a file actually not being in the expected place. Can you please edit your question to add the exact error message that you're seeing.
The other answers here are correct that this is the two-argument syntax. They've done a good job covering why and how you should ideally change it, so I won't rehash that here.
However they haven't tried to help you fix it, so let me try that...
This is a guess, but I suspect $disk_file contains a filename with a path (e.g. my_logs/somelog.log), and the directory part (my_logs in my entirely guessed example) doesn't exist, so open fails. You could create that directory, or alter whatever sets that variable so it writes to a location that does exist.
Bear in mind these paths will be relative to wherever you're running the script from - not relative to the script itself, so if there's a log directory (or whatever) in the same dir as the script you may want to cd to the script's dir first.
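If the problem does turn out to be a missing directory or a relative path, one option is to build the path from the script's own directory (FindBin's $Bin) and create the directory before opening. This is a sketch under those assumptions; open_for_writing is a hypothetical helper name.

```perl
use strict;
use warnings;
use FindBin qw($Bin);              # directory containing the running script
use File::Spec;
use File::Path qw(make_path);
use File::Basename qw(dirname);

# Hypothetical helper: resolve $relative against $base_dir, create any
# missing directories, then open the file for writing.
sub open_for_writing {
    my ($base_dir, $relative) = @_;
    my $path = File::Spec->catfile($base_dir, $relative);

    # Create the directory part (e.g. my_logs/) if it does not exist yet.
    make_path(dirname($path));

    open(my $fh, '>', $path) or die "Can't open $path: $!";
    return ($fh, $path);
}

# Usage sketch: anchor the log next to the script, not the current directory.
# my ($log, $log_path) = open_for_writing($Bin, 'my_logs/somelog.log');
```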

What is the meaning of the dot in this open() usage in Perl?

How can I understand the following usage of the open() function in Perl File I/O?
open(FHANDLE, ">" . $file )
I tried to find this syntax in the docs but could not; note that there is a . (dot) after ">".
The only thing I don't understand is the dot; the rest I know.
This is an example of the old, two-argument form of open (which should be avoided now that three-argument open is available). In Perl, . is the string concatenation operator. It combines the two strings into a single string.
The line of code you posted is equivalent to open(FHANDLE, ">$file" ), it just uses a different method of combining the > and $file.
The better way to do it these days would be open(my $fhandle, '>', $file), as shown in the documentation you linked to.
This is the two-argument open. The dot . is the string concatenation operator in Perl. If open is called with two arguments, the second argument contains both the mode and the path.
In your case, it will open the file named in $file for writing.
However, for several reasons you should not do this. It's more common to use the three-argument-open, and the lexical filehandles instead of the global GLOB filehandle.
A lexical filehandle makes sure Perl implicitly closes the handle for you as soon as it goes out of scope. Passing the mode and the filename as separate arguments also matters for security, because otherwise a malicious user could smuggle mode changes into the filename.
open my $fh, '>', $file or die $!;
In addition to the lexical filehandle and the separation of mode and filename, this code also checks for errors, which is always a good idea.
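A minimal sketch of the lexical-handle behaviour described above, assuming a scratch directory from File::Temp: the handle is never closed explicitly, yet the data is on disk once the block that declared it ends.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);   # scratch directory for the demo
my $file = "$dir/out.txt";

# Same effect as open(FHANDLE, ">" . $file), but with an explicit mode
# and a lexical filehandle.
{
    open(my $fh, '>', $file) or die "Can't open $file: $!";
    print {$fh} "hello\n";
}   # $fh goes out of scope here, so Perl closes (and flushes) it

# The data is already on disk even though close() was never called.
open(my $in, '<', $file) or die "Can't open $file: $!";
my $content = do { local $/; <$in> };
close($in);
print $content;
```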

perl's open() fails sometimes when file name ends with whitespace

I'm facing a problem with Perl's open() function. It is related to files whose names end with whitespace. If I use open() with 2 arguments (filehandle and filename) and the filename ends with whitespace, open() fails. The error message says that the file cannot be found, although the file exists. No such thing happens when the opening mode is specified, e.g., if I state explicitly that the file is opened for reading. Here is some sample code:
#!/usr/bin/perl
use warnings;
use strict;
my $file = '/tmp/test_with_ending_space ';
open WRITE, ">", $file or die "open for writing got error: $!";
print WRITE "my open() test\n";
close WRITE;
# open() with mode
open READ, "<", $file or die "open with mode got error: $!";
while (<READ>) {
    print;
}
close READ;
# open() without mode
open READ1, $file or die "open without mode got error: $!";
while (<READ1>) {
    print;
}
close READ1;
And here is the output from such code:
marius@mariusm-PC:~/perl$ ./test.pl
my open() test
open without mode got error: No such file or directory at ./test.pl line 21.
No such things happen with "usual" filenames, i.e., when filenames end with some other character.
Any ideas whether this is a known problem? If so, is there a way to work around it?
And just in case, before you start telling me "be nice, specify the mode and tell your open() how to open the file": unfortunately, this issue is present in some core modules, e.g., IO::File::open() (that's where I originally got stuck). The last call in this function is open($fh, $file), i.e., it calls the native open() without any particular mode.
It's documented in perldoc -f open:
The filename passed to the one- and two-argument forms of
open() will have leading and trailing whitespace deleted
Read the following paragraphs for more details.
@choroba gave the "why", but you also asked for a workaround.
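The difference is easy to reproduce. This sketch (assuming File::Temp for a scratch directory) creates a file whose name ends in a space and then tries the two-argument form, the three-argument form, sysopen, and IO::File with a numeric mode; according to the IO::File documentation, a numeric mode is handed to sysopen, so it avoids the trimming as well.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY);
use File::Temp qw(tempdir);
use IO::File ();

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/test_with_ending_space ";   # note the trailing space

# Create the file with three-argument open, which keeps the space.
open(my $out, '>', $file) or die "Can't create '$file': $!";
print {$out} "my open() test\n";
close($out) or die "Can't close '$file': $!";

# Two-argument open strips the trailing space, so it looks for another name.
my $two_arg_ok = open(my $fh2, $file) ? 1 : 0;

# These all use the name exactly as given.
my $three_arg_ok = open(my $fh3, '<', $file) ? 1 : 0;
my $sysopen_ok   = sysopen(my $fh4, $file, O_RDONLY) ? 1 : 0;
my $io           = IO::File->new($file, O_RDONLY);   # numeric mode: sysopen

print "two-arg: $two_arg_ok, three-arg: $three_arg_ok, ",
      "sysopen: $sysopen_ok, IO::File: ", (defined $io ? 1 : 0), "\n";
```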
Well, this is VERY kludgey, but if you're desperate and can't change the open() calls, this will work. First, detect if the filename ends with whitespace (I assume you can handle that). If it does, create a temp symlink to the file (without trailing whitespace!), and open the symlink.
Works for me on my (old) Solaris 2.6 box.
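A sketch of that symlink workaround, assuming a Unix-like filesystem that supports symlinks and an absolute path in $file. open_via_symlink is a hypothetical helper; the legacy two-argument open is kept on purpose to mirror code you cannot change.

```perl
use strict;
use warnings;

# Hypothetical helper implementing the symlink kludge.
# $file should be absolute: a relative symlink target would be
# resolved from $link_dir, not from the current directory.
sub open_via_symlink {
    my ($file, $link_dir) = @_;

    # No trailing whitespace: the two-argument open works as is.
    if ($file !~ /\s\z/) {
        open(my $fh, $file) or return;
        return $fh;
    }

    # Link name without trailing whitespace, pointing at the real file.
    my $link = "$link_dir/ws_link_$$";
    symlink($file, $link) or return;

    my $ok = open(my $fh, $link);   # two-argument open, as in the legacy code
    unlink($link);                  # the open handle stays valid after unlink
    return $ok ? $fh : undef;
}
```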

In Perl, why does print not generate any output after I close STDOUT?

I have the code:
open(FILE, "<$new_file") or die "Can't open file \n";
@lines=<FILE>;
close FILE;
open(STDOUT, ">$new_file") or die "Can't open file\n";
$old_fh = select(OUTPUT_HANDLE);
$| = 1;
select($old_fh);
for(@lines){
s/(.*?xsl.*?)xsl/$1xslt/;
print;
}
close(STDOUT);
STDOUT -> autoflush(1);
print "file changed";
After closing STDOUT, the program does not write the last print, print "file changed". Why is this?
*Edited*: I want the message printed to the console, not to the file.
I suppose it is because print's default filehandle is STDOUT, which at that point is already closed. You could reopen it, or print to another filehandle, for example STDERR.
print STDERR "file changed";
It's because you've closed the filehandle stored in STDOUT, so print can't use it anymore. Generally speaking opening a new filehandle into one of the predefined handle names isn't a very good idea because it's bound to lead to confusion. It's much clearer to use lexical filehandles, or just a different name for your output file. Yes you then have to specify the filehandle in your print call, but then you don't have any confusion over what's happened to STDOUT.
A print statement outputs the string to STDOUT, which is the default output file handle.
So the statement
print "This is a message";
is same as
print STDOUT "This is a message";
In your code, you close STDOUT and then print the message, which will not work. Reopen the STDOUT filehandle or do not close it. When the script ends, the file handles will be closed automatically.
open OLDOUT, ">&", \*STDOUT or die "Can't dup STDOUT: $!";
close STDOUT;
open(STDOUT, ">$new_file") or die "Can't open file\n";
...
close(STDOUT);
open (STDOUT, ">&", \*OLDOUT) or die "Can't restore STDOUT: $!";
print "file changed";
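The same save-and-restore idea written with a lexical handle for the saved copy, as a sketch; a File::Temp scratch file stands in for $new_file.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $new_file = tempdir(CLEANUP => 1) . '/out.txt';   # stand-in for the real file

# Save a duplicate of the current STDOUT so it can be restored later.
open(my $oldout, '>&', \*STDOUT) or die "Can't dup STDOUT: $!";

# Redirect STDOUT to the file; plain print now goes there.
open(STDOUT, '>', $new_file) or die "Can't redirect STDOUT: $!";
print "this line goes to the file\n";
close(STDOUT) or die "Can't close redirected STDOUT: $!";

# Restore the original STDOUT from the saved duplicate.
open(STDOUT, '>&', $oldout) or die "Can't restore STDOUT: $!";
close($oldout);
print "file changed\n";   # back on the console
```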
You seem to be confused about how file IO operations are done in perl, so I would recommend you read up on that.
What went wrong?
What you are doing is:
Open a file for reading
Read the entire file and close it
Open the same file for overwriting (the original file is truncated), using the STDOUT file handle.
Juggle around the default print handle in order to set autoflush on a file handle which is not even opened in the code you show.
Perform a substitution on all lines and print them
Close STDOUT then print a message when everything is done.
Your biggest mistake is trying to reopen the default output file handle STDOUT. I assume this is because you do not know that you can supply a file handle to print, as in print FILEHANDLE "text", or that you did not know STDOUT was a predefined file handle.
Your other errors:
You did not use use strict; use warnings;. No program you write should be without these. They will prevent you from doing bad things, and give you information on errors, and will save you hours of debugging.
You should never "slurp" a file (read the entire file into a variable) unless you really need to, because this is inefficient and slow, and for huge files it can crash your program due to lack of memory.
Never reassign the default file handles STDIN, STDOUT, STDERR unless A) you really need to and B) you know what you are doing.
select sets the default file handle for print; read the documentation. This is rarely something that you need to concern yourself with. The variable $| turns on autoflush (if set to a true value) for the currently selected file handle. So what you did actually accomplished nothing, because OUTPUT_HANDLE is a non-existent file handle. If you had skipped the select statements, it would have set autoflush for STDOUT. (But you wouldn't have noticed any difference.)
print uses print buffers because it is efficient. I assume you are trying to autoflush because you think your prints get caught in the buffer, which is not true. Generally speaking, this is not something you need to worry about. All the print buffers are automatically flushed when a program ends.
For the most part, you do not need to explicitly close file handles. File handles are automatically closed when they go out of scope, or when the program ends.
Using lexical file handles, e.g. open my $fh, ... instead of global, e.g. open FILE, .. is recommended, because of the previous statement, and because it is always a good idea to avoid global variables.
Using three-argument open is recommended: open FILEHANDLE, MODE, FILENAME. Otherwise you risk having meta-characters in your file names corrupt your open statement.
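For reference, this is what the select()/$| lines in the question would do if aimed at a handle that actually exists, next to the equivalent autoflush method from IO::Handle. An in-memory handle is used purely as a stand-in.

```perl
use strict;
use warnings;
use IO::Handle;      # provides the autoflush method on filehandles

open(my $out, '>', \my $buffer) or die "Can't open in-memory file: $!";

# The question's select() juggling, pointed at a real handle:
my $old_fh = select($out);   # make $out the default output handle
$| = 1;                      # autoflush on for the selected handle ($out)
select($old_fh);             # restore the previous default (normally STDOUT)

# The same thing in one readable call:
$out->autoflush(1);

# $| reports the setting of whichever handle is currently selected.
my $prev = select($out);
my $out_autoflush = $|;
select($prev);
print "autoflush on \$out: $out_autoflush\n";
```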
The quick fix:
Now, as I said in the comments, this -- or rather, what you intended, because this code is wrong -- is pretty much identical to the idiomatic usage of the -p command line switch:
perl -pi.bak -e 's/(.*?xsl.*?)xsl/$1xslt/' file.txt
This short little snippet actually does all that your program does, but does it much better. Explanation:
The -p switch wraps the code you provide in a while (<>) { } loop and prints each line after your code is executed.
The -i switch tells perl to do an in-place edit on the file, saving a backup copy in "file.txt.bak".
So, that one-liner is equivalent to a program such as this:
$^I = ".bak"; # turns inplace-edit on
while (<>) { # diamond operator automatically uses STDIN or files from @ARGV
s/(.*?xsl.*?)xsl/$1xslt/;
print;
}
Which is equivalent to this:
my $file = shift; # first argument from @ARGV, the command-line arguments
open my $fh, "<", $file or die $!;
open my $tmp, ">", "/tmp/foo.bar" or die $!; # not sure where tmpfile is; rename() below needs it on the same filesystem
while (<$fh>) { # read lines from original file
    s/(.*?xsl.*?)xsl/$1xslt/;
    print $tmp $_; # print line to tmp file
}
close $fh;
close $tmp or die $!; # flush the tmp file before renaming it
rename($file, "$file.bak") or die $!; # save backup
rename("/tmp/foo.bar", $file) or die $!; # overwrite original file
The inplace-edit option actually creates a separate file, then copies it over the original. If you use the backup option, the original file is first backed up. You don't need to know this information, just know that using the -i switch will cause the -p (and -n) option to actually perform changes on your original file.
Using the -i switch with the backup option activated is not required (except on Windows), but it is recommended. A good idea is to run the one-liner without the -i option first, so the output is printed to the screen, and then add it once you see the output is OK.
The regex
s/(.*?xsl.*?)xsl/$1xslt/;
You search for a string that contains "xsl" twice. The usage of .*? is good in the second case, but not in the first. Any time you find yourself starting a regex with a wildcard string, you're probably doing something wrong. Unless you are trying to capture that part.
In this case, though, you capture it and remove it, only to put it back, which is completely useless. So the first order of business is to take that part out:
s/(xsl.*?)xsl/$1xslt/;
Now, removing something and putting it back is really just a magic trick for not removing it at all. We don't need magic tricks like that, when we can just not remove it in the first place. Using look-around assertions, you can achieve this.
In this case, since you have a variable-length expression and would need a look-behind assertion, we have to use the \K (mnemonic: Keep) escape instead, because variable-length look-behinds of this kind are not supported.
s/xsl.*?\Kxsl/xslt/;
So, since we didn't take anything out, we don't need to put anything back using $1. Now, you may notice, "Hey, if I replace 'xsl' with 'xslt', I don't need to remove 'xsl' at all." Which is true:
s/xsl.*?xsl\K/t/;
You may consider using options for this regex, such as /i, which causes it to ignore case and thus also match strings such as "XSL FOO XSL". Or the /g option which will allow it to perform all possible matches per line, and not just the first match. Read more in perlop.
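As a sanity check, assuming only the first match per line matters (no /g), the original substitution and the final \K version can be run side by side on a few hypothetical sample lines.

```perl
use strict;
use warnings;

# Hypothetical sample lines.
my @lines = (
    "a.xsl b.xsl c.xsl\n",
    "only one xsl here\n",
    "xslxsl\n",
);

my (@old_results, @new_results);
for my $line (@lines) {
    (my $old = $line) =~ s/(.*?xsl.*?)xsl/$1xslt/;   # original substitution
    (my $new = $line) =~ s/xsl.*?xsl\K/t/;           # \K keeps what was matched so far
    push @old_results, $old;
    push @new_results, $new;
    print $old eq $new ? "same: " : "DIFF: ", $new;
}
```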
Conclusion
The finished one-liner is:
perl -pi.bak -e 's/xsl.*?xsl\K/t/' file.txt

Are there reasons to ever use the two-argument form of open(...) in Perl?

Are there any reasons to ever use the two-argument form of open(...) in Perl rather than the three-or-more-argument versions?
The only reason I can come up with is the obvious observation that the two-argument form is shorter. But assuming that verbosity is not an issue, are there any other reasons that would make you choose the two-argument form of open(...)?
One- and two-arg open applies any default layers specified with the -C switch or open pragma. Three-arg open does not. In my opinion, this functional difference is the strongest reason to choose one or the other (and the choice will vary depending on what you are opening). Which form is easiest, most descriptive, or "safest" (you can safely use two-arg open with arbitrary filenames, it's just not as convenient) takes a back seat in module code; in script code you have more discretion to choose whether you will support default layers or not.
Also, one-arg open is needed for Damian Conway's file slurp operator
$_ = "filename";
$contents = readline!open(!((*{!$_},$/)=\$_));
Imagine you are writing a utility that accepts an input file name. People with reasonable Unix experience are used to substituting - for STDIN. Perl handles that automatically only when the magical form is used where the mode characters and file name are one string, else you have to handle this and similar special cases yourself. This is a somewhat common gotcha, I am surprised no one has posted that yet. Proof:
use IO::File qw();
my $user_supplied_file_name = '-';
IO::File->new($user_supplied_file_name, 'r') or warn "IO::File/non-magical mode - $!\n";
IO::File->new("<$user_supplied_file_name") or warn "IO::File/magical mode - $!\n";
open my $fh1, '<', $user_supplied_file_name or warn "non-magical open - $!\n";
open my $fh2, "<$user_supplied_file_name" or warn "magical open - $!\n";
__DATA__
IO::File/non-magical mode - No such file or directory
non-magical open - No such file or directory
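If you prefer three-argument open but still want '-' to mean STDIN, you can handle that one special case yourself. A sketch, with open_input as a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: honour the Unix convention that '-' means STDIN,
# while using the safe three-argument open for real file names.
sub open_input {
    my ($name) = @_;
    return \*STDIN if $name eq '-';
    open(my $fh, '<', $name) or die "Can't open '$name': $!\n";
    return $fh;
}

# Usage sketch:
# my $in = open_input($ARGV[0] // '-');
# while (my $line = <$in>) { ... }
```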
Another small difference: the two-argument form trims spaces.
$foo = " fic";
open(MH, ">$foo");
print MH "toto\n";
Writes to a file named fic.
On the other hand
$foo = " fic";
open(MH, ">", $foo);
print MH "toto\n";
Will write to a file whose name begins with a space.
For short admin scripts with user input (or configuration file input), not having to bother with such details as trimming filenames is nice.
The two argument form of open was the only form supported by some old versions of perl.
If you're opening from a pipe, the three argument form isn't really helpful. Getting the equivalent of the three argument form involves doing a safe pipe open (open(FILE, '|-')) and then executing the program.
So for simple pipe opens (e.g. open(FILE, 'ps ax |')), the two argument syntax is much more compact.
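For completeness: on platforms that support it, open also accepts a list form for pipes, where the command and its arguments are passed separately and no shell is involved. In this sketch the running perl ($^X) stands in for a command like ps ax.

```perl
use strict;
use warnings;

# List-form pipe open: command and arguments are separate, no shell parsing.
# Two-argument equivalent for a real command: open(my $ps, 'ps ax |')
open(my $ps, '-|', $^X, '-e', 'print "line one\n"; print "line two\n"')
    or die "Can't start command: $!";
my @output = <$ps>;
close($ps) or die "Command failed: exit status " . ($? >> 8);

print "read ", scalar(@output), " lines\n";
```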
I think William's post pretty much hits it. Otherwise, the three-argument form is going to be more clear, as well as safer.
See also:
What's the best way to open and read a file in Perl?
Why is three-argument open calls with autovivified filehandles a Perl best practice?
One reason to use the two-argument version of open is if you want to open something which might be a pipe or a file. Suppose you have one function:
sub strange
{
my ($file) = @_;
open my $input, $file or die $!;
}
then you want to call this either with a filename like "file":
strange ("file");
or a pipe like "zcat file.gz |"
strange ("zcat file.gz |");
Depending on what kind of input you have, you can pass either one, and the two-argument version handles both. You will actually see the above construction in "legacy" Perl. However, the most sensible approach is usually to open the filehandle appropriately and pass the filehandle to the function, rather than the file name.
When you are combining a string or using a variable, it can be rather unclear whether '<', '>', etc. is already in it. In such cases I prefer readability, which means I use the longer form:
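A sketch of that last suggestion, with a hypothetical count_lines in place of strange: the function takes an open handle, and the caller chooses a file, a pipe, or (here) an in-memory string.

```perl
use strict;
use warnings;

# Hypothetical rewrite of strange(): it receives an already-open handle,
# so the caller decides whether it is a file, a pipe, or memory.
sub count_lines {
    my ($input) = @_;
    my $count = 0;
    $count++ while <$input>;
    return $count;
}

# A file:  open(my $fh, '<', 'file') or die $!;
# A pipe:  open(my $fh, '-|', 'zcat', 'file.gz') or die $!;
# Here an in-memory string stands in for either.
my $text = "one\ntwo\nthree\n";
open(my $fh, '<', \$text) or die "Can't open in-memory file: $!";
my $n = count_lines($fh);
close($fh);
print "lines: $n\n";
```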
open($FILE, '>', $varfn);
When you simply use a constant, I prefer the ease of typing (and actually consider the short version more readable anyway, or at least as readable as the long version):
open($FILE, '>somefile.xxx');
I'm guessing you mean open(FH, '<filename.txt') as opposed to open(FH, '<', 'filename.txt') ?
I think it's just a matter of preference. I always use the former out of habit.