Backticks vs. the native way of doing things in Perl

Consider these two snippets:
#!/usr/bin/perl
open(DATA, "<input.txt");
while (<DATA>)
{
    print($_);
}
and
$abcd = `cat input.txt`;
print $abcd;
Both print the contents of the file input.txt as output.
Question: Is there any standard as to which one (backticks or the native method) should be preferred over the other in any particular case, or are the two always equivalent?
The reason I am asking is that I find the cat method easier than opening a file the native Perl way. So if I can achieve something through backticks, should I go with it, or should I prefer the native way of doing it?
I checked this thread too: What's the difference between Perl's backticks, system, and exec? but it addresses a different question than mine.

Use builtin functions wherever possible:
They are more portable: open works on Windows, while `cat input.txt` will not.
They have less overhead: using backticks forks and execs a shell, which parses the command and in turn execs the cat program. That loads two extra programs unnecessarily, in contrast to open, which is a builtin Perl function.
They make error handling easier. The open function returns a false value on error, which allows you to take different actions, such as terminating the program with an error message:
open my $fh, "<", "input.txt" or die "Couldn't open input.txt: $!";
They are more flexible. For example, you can add encoding layers if your data isn't Latin-1 text:
open my $fh, "<:utf8", "input.txt" or die "Couldn't open input.txt: $!";
open my $fh, "<:raw", "input.bin" or die "Couldn't open input.bin: $!";
If you want a “just read this file into a scalar” function, look at the File::Slurp module:
use File::Slurp;
my $data = read_file "input.txt";

Using the backtick operator to call cat is highly inefficient, because:
It spawns a separate process (or maybe more than one if a shell is used) which does nothing more than read the file, which perl could do itself.
You are reading the whole file into memory instead of processing it one line at a time. OK for a small file, not so good for a large one.
The backtick method is OK for a quick and dirty script, but I would not use it for anything serious.

Related

Perl: `die` did not work upon opening a nonexistent gz file using gzip

The following script creates a gzipped file named "input.gz". Then the script attempts to open "input.gz" using gzip -dc. Intuitively, die should be triggered if a wrong input file name is provided. However, as the following script shows, the program will not die even if a wrong input file name is provided ("inputx.gz"):
use warnings;
use strict;
system("echo PASS | gzip -c > input.gz");
open(IN,"-|","gzip -dc inputx.gz") || die "can't open input.gz!";
print STDOUT "die statment was not triggered!\n";
close IN;
The output of the script above was
die statement was not triggered!
gzip: inputx.gz: No such file or directory
My question is: why wasn't the die statement triggered even though gzip quit with an error? And how can I make the die statement trigger when a wrong file name is given?
It's buried in perlipc, but this seems relevant (emphasis added):
Be careful to check the return values from both open() and close(). If you're writing to a pipe, you should also trap SIGPIPE. Otherwise, think of what happens when you start up a pipe to a command that doesn't exist: the open() will in all likelihood succeed (it only reflects the fork()'s success), but then your output will fail--spectacularly. Perl can't know whether the command worked, because your command is actually running in a separate process whose exec() might have failed. Therefore, while readers of bogus commands return just a quick EOF, writers to bogus commands will get hit with a signal, which they'd best be prepared to handle.
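For the writer case, the SIGPIPE trap perlipc suggests looks roughly like this:
# trap SIGPIPE so writing to a dead command fails loudly instead of killing us silently
local $SIG{PIPE} = sub { die "whoops, pipe broke" };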
Use IO::Uncompress::Gunzip to read gzipped files instead.
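A minimal sketch of that approach (the constructor and $GunzipError are IO::Uncompress::Gunzip's documented interface; the file name is the one from the question):
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
my $z = IO::Uncompress::Gunzip->new('input.gz')
    or die "Can't open input.gz: $GunzipError\n";
while (my $line = $z->getline()) {
    print $line;
}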
The open documentation is explicit about open-ing a process since that is indeed different
If you open a pipe on the command - (that is, specify either |- or -| with the one- or two-argument forms of open), an implicit fork is done, so open returns twice: in the parent process it returns the pid of the child process, and in the child process it returns (a defined) 0. Use defined($pid) or // to determine whether the open was successful.
For example, use either
my $child_pid = open(my $from_kid, "-|") // die "Can't fork: $!";
or
my $child_pid = open(my $to_kid, "|-") // die "Can't fork: $!";
(The documentation follows with code showing one use of this, which you don't need here.) The main point is to check for definedness -- by design we get undef if open for a process fails, not just any "false" value.
While this should be corrected, keep in mind that the open call fails only if fork itself fails, which is rare; in most cases when a "command fails" the fork was successful but something later wasn't. So in such cases we just cannot get the // die message, but end up seeing messages from the shell or command or OS, hopefully.
This is alright though, if informative messages indeed get emitted by some part of the process. Wrap the whole thing in eval and you'll have manageable error reporting.
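A sketch of that wrapping (assuming the @cmd list built as in the example below):
my @lines = eval {
    my $pid = open(my $in, '-|', @cmd) // die "Can't fork for '@cmd': $!";
    my @out = <$in>;
    close $in or die "Error closing pipe: $!";
    @out;
};
warn "Couldn't read from '@cmd': $@" if $@;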
But it is in general difficult to ensure you get all the right messages, and in some cases not possible. One good approach is to use a module for running and managing external commands. Among many other advantages, they also usually handle errors much more nicely. If you need to handle the process's output right as it is emitted, I recommend IPC::Run (which I'd recommend otherwise as well).
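For instance, a minimal IPC::Run sketch (run and its '>' / '2>' redirection arguments are the module's documented interface):
use IPC::Run qw(run);
my @cmd = ('gzip', '-dc', $file);
run \@cmd, '>', \my $out, '2>', \my $err
    or die "'@cmd' failed: $err (exit status $?)";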
Read what the linked docs say, for specific examples of what you need and for much useful insight.
In your case
# Check input; depending on how it is given,
# consider String::ShellQuote if needed
my $file = ...;
my @cmd = ('gzip', '-dc', $file);
my $child_pid = open(my $in, '-|', @cmd)
    // die "Can't fork for '@cmd': $!";
while (<$in>) {
...
}
close $in or die "Error closing pipe: $!";
Note a few other points
the "list form" of the command bypasses the shell
a lexical filehandle (my $fh) is much better than a typeglob (IN)
print the actual error in the die statement, via the $! variable
check close for a good final check on how it all went

In Perl, why does print not generate any output after I close STDOUT?

I have the code:
open(FILE, "<$new_file") or die "Cant't open file \n";
#lines=<FILE>;
close FILE;
open(STDOUT, ">$new_file") or die "Can't open file\n";
$old_fh = select(OUTPUT_HANDLE);
$| = 1;
select($old_fh);
for(#lines){
s/(.*?xsl.*?)xsl/$1xslt/;
print;
}
close(STDOUT);
STDOUT -> autoflush(1);
print "file changed";
After closing STDOUT, the program does not execute the last print statement, print "file changed". Why is this?
Edit: I want to print the message to the console, not to the file.
I suppose it is because print's default filehandle is STDOUT, which at that point is already closed. You could reopen it, or print to another filehandle, for example STDERR.
print STDERR "file changed";
It's because you've closed the filehandle stored in STDOUT, so print can't use it anymore. Generally speaking opening a new filehandle into one of the predefined handle names isn't a very good idea because it's bound to lead to confusion. It's much clearer to use lexical filehandles, or just a different name for your output file. Yes you then have to specify the filehandle in your print call, but then you don't have any confusion over what's happened to STDOUT.
A print statement outputs the string to STDOUT, which is the default output file handle.
So the statement
print "This is a message";
is the same as
print STDOUT "This is a message";
In your code, you have closed STDOUT and then printed the message, which will not work. Reopen the STDOUT filehandle or do not close it; when the script ends, the file handles are closed automatically:
open OLDOUT, ">&", STDOUT;
close STDOUT;
open(STDOUT, ">$new_file") or die "Can't open file\n";
...
close(STDOUT);
open (STDOUT, ">&",OLDOUT);
print "file changed";
You seem to be confused about how file IO operations are done in perl, so I would recommend you read up on that.
What went wrong?
What you are doing is:
Open a file for reading
Read the entire file and close it
Open the same file for overwrite (the original file is truncated), using the STDOUT file handle.
Juggle around the default print handle in order to set autoflush on a file handle which is not even opened in the code you show.
Perform a substitution on all lines and print them
Close STDOUT then print a message when everything is done.
Your biggest mistake is trying to reopen the default output file handle STDOUT. I assume this is because you did not know how print works, i.e. that you can supply a file handle to print to: print FILEHANDLE "text". Or that you did not know that STDOUT was a pre-defined file handle.
Your other errors:
You did not use use strict; use warnings;. No program you write should be without these. They will prevent you from doing bad things, and give you information on errors, and will save you hours of debugging.
You should never "slurp" a file (read the entire file to a variable) unless you really need to, because this is ineffective and slow and for huge files will cause your program to crash due to lack of memory.
Never reassign the default file handles STDIN, STDOUT, STDERR, unless A) you really need to, B) you know what you are doing.
select sets the default file handle for print, read the documentation. This is rarely something that you need to concern yourself with. The variable $| sets autoflush on (if set to a true value) for the currently selected file handle. So what you did actually accomplished nothing, because OUTPUT_HANDLE is a non-existent file handle. If you had skipped the select statements, it would have set autoflush for STDOUT. (But you wouldn't have noticed any difference)
print uses print buffers because it is efficient. I assume you are trying to autoflush because you think your prints get caught in the buffer, which is not true. Generally speaking, this is not something you need to worry about. All the print buffers are automatically flushed when a program ends.
For the most part, you do not need to explicitly close file handles. File handles are automatically closed when they go out of scope, or when the program ends.
Using lexical file handles, e.g. open my $fh, ... instead of global, e.g. open FILE, .. is recommended, because of the previous statement, and because it is always a good idea to avoid global variables.
Using three-argument open is recommended: open FILEHANDLE, MODE, FILENAME. Otherwise you risk meta-characters in your file names corrupting your open statement.
The quick fix:
Now, as I said in the comments, this -- or rather, what you intended, because this code is wrong -- is pretty much identical to the idiomatic usage of the -p command line switch:
perl -pi.bak -e 's/(.*?xsl.*?)xsl/$1xslt/' file.txt
This short little snippet actually does all that your program does, but does it much better. Explanation:
-p switch automatically assumes that the code you provide is inside a while (<>) { } loop, and prints each line, after your code is executed.
-i switch tells perl to do inplace-edit on the file, saving a backup copy in "file.txt.bak".
So, that one-liner is equivalent to a program such as this:
$^I = ".bak"; # turns inplace-edit on
while (<>) { # diamond operator automatically uses STDIN or files from @ARGV
s/(.*?xsl.*?)xsl/$1xslt/;
print;
}
Which is equivalent to this:
my $file = shift; # first argument from @ARGV
open my $fh, "<", $file or die $!;
open my $tmp, ">", "/tmp/foo.bar" or die $!; # not sure where tmpfile is
while (<$fh>) { # read lines from org file
s/(.*?xsl.*?)xsl/$1xslt/;
print $tmp $_; # print line to tmp file
}
rename($file, "$file.bak") or die $!; # save backup
rename("/tmp/foo.bar", $file) or die $!; # overwrite original file
The inplace-edit option actually creates a separate file, then copies it over the original. If you use the backup option, the original file is first backed up. You don't need to know this information, just know that using the -i switch will cause the -p (and -n) option to actually perform changes on your original file.
Using the -i switch with the backup option activated is not required (except on Windows), but recommended. A good idea is to run the one-liner without the option first, so the output is printed to screen instead, and then adding it once you see the output is ok.
The regex
s/(.*?xsl.*?)xsl/$1xslt/;
You search for a string that contains "xsl" twice. The usage of .*? is good in the second case, but not in the first. Any time you find yourself starting a regex with a wildcard string, you're probably doing something wrong, unless you are trying to capture that part.
In this case, though, you capture it and remove it, only to put it back, which is completely useless. So the first order of business is to take that part out:
s/(xsl.*?)xsl/$1xslt/;
Now, removing something and putting it back is really just a magic trick for not removing it at all. We don't need magic tricks like that, when we can just not remove it in the first place. Using look-around assertions, you can achieve this.
In this case, since you have a variable-length expression and need a look-behind assertion, we have to use the \K (mnemonic: Keep) escape instead, because variable-length look-behinds are not implemented.
s/xsl.*?\Kxsl/xslt/;
So, since we didn't take anything out, we don't need to put anything back using $1. Now, you may notice, "Hey, if I replace 'xsl' with 'xslt', I don't need to remove 'xsl' at all." Which is true:
s/xsl.*?xsl\K/t/;
You may consider using options for this regex, such as /i, which causes it to ignore case and thus also match strings such as "XSL FOO XSL". Or the /g option which will allow it to perform all possible matches per line, and not just the first match. Read more in perlop.
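For instance, with both options applied:
s/xsl.*?xsl\K/t/gi;  # ignore case, and handle every match on the line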
Conclusion
The finished one-liner is:
perl -pi.bak -e 's/xsl.*?xsl\K/t/' file.txt

Writing a macro in Perl

open $FP, '>', $outfile or die $outfile." Cannot open file for writing\n";
I have this statement a lot of times in my code.
I want to keep the format same for all of those statements, so that when something is changed, it is only changed at one place.
In Perl, how should I go about resolving this situation?
Should I use macros or functions?
I have seen this SO thread How can I use macros in Perl?, but it doesn't say much about how to write a general macro like
#define fw(FP, outfile) open $FP, '>', \
$outfile or die $outfile." Cannot open file for writing\n";
First, you should write that as:
open my $FP, '>', $outfile or die "Could not open '$outfile' for writing:$!";
including the reason why open failed.
If you want to encapsulate that, you can write:
use Carp;
sub openex {
my ($mode, $filename) = @_;
open my $h, $mode, $filename
or croak "Could not open '$filename': $!";
return $h;
}
# later
my $FP = openex('>', $outfile);
Starting with Perl 5.10.1, autodie is in the core and I will second Chas. Owens' recommendation to use it.
Perl 5 really doesn't have macros (there are source filters, but they are dangerous and ugly, so ugly even I won't link you to the documentation). A function may be the right choice, but you will find that it makes it harder for new people to read your code. A better option may be to use the autodie pragma (it is core as of Perl 5.10.1) and just cut out the or die part.
Another option, if you use Vim, is to use snipMate. You just type fw<tab>FP<tab>outfile<tab> and it produces
open my $FP, '>', $outfile
or die "Couldn't open $outfile for writing: $!\n";
The snipMate text is
snippet fw
open my $${1:filehandle}, ">", $${2:filename variable}
or die "Couldn't open $$2 for writing: $!\n";
${3}
I believe other editors have similar capabilities, but I am a Vim user.
There are several ways to handle something similar to a C macro in Perl: a source filter, a subroutine, Template::Toolkit, or use features in your text editor.
Source Filters
If you gotta have a C / CPP style preprocessor macro, it is possible to write one in Perl (or, actually, any language) using a source filter that transforms your code before compilation. You can write fairly simple to complex Perl classes that operate on the text of your source code and perform transformations on it before the code goes to the Perl compiler. You can even run your Perl code directly through a CPP preprocessor to get the exact type of macro expansions you get in C / CPP using Filter::CPP.
Damian Conway's Filter::Simple is part of the Perl core distribution. With Filter::Simple, you could easily write a simple module to perform the macro you are describing. An example:
package myopinion;
# save in your Perl's @INC path as "myopinion.pm"...
use Filter::Simple;
FILTER {
s/Hogs/Pigs/g;
s/Hawgs/Hogs/g;
}
1;
Then a Perl file:
use myopinion;
print join(' ',"Hogs", 'Hogs', qq/Hawgs/, q/Hogs/, "\n");
print "In my opinion, Hogs are Hogs\n\n";
Output:
Pigs Pigs Hogs Pigs
In my opinion, Pigs are Pigs
If you rewrote the FILTER block to make the substitution for your desired macro, Filter::Simple should work fine. Filter::Simple can be restricted to making substitutions in parts of your code only, such as the executable part but not the POD; only in strings; only in code.
Source filters are not widely used, in my experience. I have mostly seen them in lame attempts to encrypt Perl source code or in humorous Perl obfuscators. In other words, I know it can be done this way, but I personally don't know enough about them to recommend them or say not to use them.
Subroutines
Sinan Ünür's openex subroutine is a good way to accomplish this. I will only add that a common older idiom you will see involves passing a reference to a typeglob, like this:
sub opensesame {
my $fn=shift;
local *FH;
return open(FH,$fn) ? *FH : undef;
}
$fh=opensesame('> /tmp/file');
Read perldata for why it is this way...
Template Toolkit
Template::Toolkit can be used to process Perl source code. For example, you could write a template along the lines of:
[% fw(fp, outfile) %]
running that through Template::Toolkit can result in expansion and substitution to:
open my $FP, '>', $outfile or die "$outfile could not be opened for writing:$!";
Template::Toolkit is most often used to separate the messy HTML and other presentation code from the application code in web apps. Template::Toolkit is very actively developed and well documented. If your only use is a macro of the type you are suggesting, it may be overkill.
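As a hedged sketch (MACRO and BLOCK are standard Template::Toolkit directives; the expansion call and argument names are illustrative), the template side might look like:
[% MACRO fw(fp, outfile) BLOCK -%]
open my $[% fp %], '>', $[% outfile %] or die "$[% outfile %] could not be opened for writing:$!";
[%- END %]
[% fw('FP', 'outfile') %]
Processing the source through a tool such as tpage (shipped with Template Toolkit) would then emit the expanded open statement.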
Text Editors
Chas. Owens has a method using Vim. I use BBEdit and could easily write a Text Factory to replace the skeleton of a open with the precise and evolving open that I want to use. Alternately, you can place a completion template in your "Resources" directory in the "Perl" folder. These completion skeletons are used when you press the series of keys you define. Almost any serious editor will have similar functionality.
With BBEdit, you can even use Perl code in your text replacement logic. I use Perl::Critic this way. You could use Template::Toolkit inside BBEdit to process the macros with some intelligence. It can be set up so the source code is not changed by the template until you output a version to test or compile; the editor is essentially acting as a preprocessor.
There are two potential issues with using a text editor. First, it is a one-way, one-time transform: if you want to change what your "macro" does, you can't, since the previous text of your "macro" was already expanded; you have to change each use manually. Second, if you use a template form, you can't send the macro version of the source code to someone else, because of the preprocessing being done inside the editor.
Don't Do This!
If you type perl -h to get valid command switches, one option you may see is:
-P run program through C preprocessor before compilation
Tempting! Yes, you can run your Perl code through the C preprocessor and expand C style macros and have #defines. Put down that gun; walk away; don't do it. There are many platform incompatibilities and language incompatibilities.
You get issues like this:
#!/usr/bin/perl -P
#define BIG small
print "BIG\n";
print qq(BIG\n);
Prints:
BIG
small
In Perl 5.12 the -P switch has been removed...
Conclusion
The most flexible solution here is just to write a subroutine: all your code is visible in the subroutine, it is easily changed, and the call is shorter. There is no real downside other than, potentially, the readability of your code.
Template::Toolkit is widely used. You can write complex replacements that act like macros or even more complex than C macros. If your need for macros is worth the learning curve, use Template::Toolkit.
For very simple cases, use the one way transforms in an editor.
If you really want C style macros, you can use Filter::CPP. This may have the same incompatibilities as the perl -P switch. I cannot recommend this; just learn the Perl way.
If you want to run Perl one liners and Perl regexs against your code before it compiles, use Filter::Simple.
And don't use the -P switch. You can't on newer versions of Perl anyway.
For something like open, I think it's useful to include close in your factored-out routine. Here's an approach that looks a bit weird but encapsulates a typical open/close idiom.
our $FP;   # package variable, so it can be localized below
sub with_file_do(&$$) {
    my ($code, $mode, $file) = @_;
    open my $fp, $mode, $file or die "Could not open '$file': $!";
    local $FP = $fp;
    $code->(); # perhaps wrap in an eval
    close $fp;
}
# usage
with_file_do {
print $FP "whatever\n";
# other output things with $FP
} '>', $outfile;
Having the open params specified at the end is a bit weird, but it allows you to avoid having to specify the sub keyword.

Are there reasons to ever use the two-argument form of open(...) in Perl?

Are there any reasons to ever use the two-argument form of open(...) in Perl rather than the three-or-more-argument versions?
The only reason I can come up with is the obvious observation that the two-argument form is shorter. But assuming that verbosity is not an issue, are there any other reasons that would make you choose the two-argument form of open(...)?
One- and two-arg open applies any default layers specified with the -C switch or the open pragma; three-arg open does not. In my opinion, this functional difference is the strongest reason to choose one or the other (and the choice will vary depending on what you are opening). Which is easiest or most descriptive or "safest" (you can safely use two-arg open with arbitrary filenames, it's just not as convenient) takes a back seat in module code; in script code you have more discretion to choose whether you will support default layers or not.
Also, one-arg open is needed for Damian Conway's file slurp operator
$_ = "filename";
$contents = readline!open(!((*{!$_},$/)=\$_));
Imagine you are writing a utility that accepts an input file name. People with reasonable Unix experience are used to substituting - for STDIN. Perl handles that automatically only when the magical form is used, where the mode characters and file name are one string; otherwise you have to handle this and similar special cases yourself. This is a somewhat common gotcha; I am surprised no one has posted it yet. Proof:
use IO::File qw();
my $user_supplied_file_name = '-';
IO::File->new($user_supplied_file_name, 'r') or warn "IO::File/non-magical mode - $!\n";
IO::File->new("<$user_supplied_file_name") or warn "IO::File/magical mode - $!\n";
open my $fh1, '<', $user_supplied_file_name or warn "non-magical open - $!\n";
open my $fh2, "<$user_supplied_file_name" or warn "magical open - $!\n";
__DATA__
IO::File/non-magical mode - No such file or directory
non-magical open - No such file or directory
Another small difference: the two-argument form trims whitespace around the file name.
$foo = " fic";
open(MH, ">$foo");
print MH "toto\n";
This writes to a file named fic.
On the other hand
$foo = " fic";
open(MH, ">", $foo);
print MH "toto\n";
will write to a file whose name begins with a space.
For short admin scripts with user input (or configuration file input), not having to bother with such details as trimming filenames is nice.
The two argument form of open was the only form supported by some old versions of perl.
If you're opening from a pipe, the three-argument form isn't really helpful. Getting the equivalent of the three-argument form involves doing a safe pipe open (open(FILE, '-|')) and then executing the program.
So for simple pipe opens (e.g. open(FILE, 'ps ax |')), the two argument syntax is much more compact.
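A sketch of that safe-pipe-open-plus-exec equivalent, adapted from perlipc (ps ax is just an example command):
my $pid = open(my $child, '-|') // die "Can't fork: $!";
if ($pid) {                                # parent: read the command's output
    while (<$child>) { print }
    close $child;
} else {                                   # child: become the command
    exec 'ps', 'ax' or die "Can't exec ps: $!";
}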
I think William's post pretty much hits it. Otherwise, the three-argument form is going to be more clear, as well as safer.
See also:
What's the best way to open and read a file in Perl?
Why is three-argument open calls with autovivified filehandles a Perl best practice?
One reason to use the two-argument version of open is if you want to open something which might be a pipe, or a file. If you have one function
sub strange
{
my ($file) = @_;
open my $input, $file or die $!;
}
then you want to call this either with a filename like "file":
strange ("file");
or a pipe like "zcat file.gz |"
strange ("zcat file.gz |");
Depending on the situation, then, the two-argument version may be used. You will actually see this construction in "legacy" Perl. However, the most sensible thing might be to open the filehandle appropriately and pass the filehandle to the function, rather than passing the file name like this.
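A sketch of passing the filehandle instead (names are illustrative):
sub strange
{
    my ($fh) = @_;
    while (my $line = <$fh>) {
        # process $line
    }
}
open my $in, '-|', 'zcat', 'file.gz' or die "Can't run zcat: $!";
strange($in);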
When you are combining strings or using a variable, it can be rather unclear whether a '<' or '>' etc. is already included. In such cases I personally prefer readability, which means I use the longer form:
open($FILE, '>', $varfn);
When I simply use a constant, I prefer the ease of typing (and actually consider the short version more readable anyway, or at least equal to the long version).
open($FILE, '>somefile.xxx');
I'm guessing you mean open(FH, '<filename.txt') as opposed to open(FH, '<', 'filename.txt') ?
I think it's just a matter of preference. I always use the former out of habit.

What's the best way to open and read a file in Perl?

Please note: I am not looking for the "right" way to open/read a file, or the way I should open/read a file every single time. I am just interested to find out what way most people use, and maybe learn a few new methods at the same time. :)
A very common block of code in my Perl programs is opening a file and reading or writing to it. I have seen so many ways of doing this, and my style on performing this task has changed over the years a few times. I'm just wondering what the best (if there is a best way) method is to do this?
I used to open a file like this:
my $input_file = "/path/to/my/file";
open INPUT_FILE, "<$input_file" || die "Can't open $input_file: $!\n";
But I think that has problems with error trapping.
Adding a parenthesis seems to fix the error trapping:
open (INPUT_FILE, "<$input_file") || die "Can't open $input_file: $!\n";
I know you can also assign a filehandle to a variable, so instead of using "INPUT_FILE" like I did above, I could have used $input_filehandle - is that way better?
For reading a file, if it is small, is there anything wrong with globbing, like this?
my @array = <INPUT_FILE>;
or
my $file_contents = join( "\n", <INPUT_FILE> );
or should you always loop through, like this:
my @array;
while (<INPUT_FILE>) {
    push(@array, $_);
}
I know there are so many ways to accomplish things in Perl; I'm just wondering if there are preferred/standard methods of opening and reading in a file.
There are no universal standards, but there are reasons to prefer one or another. My preferred form is this:
open( my $input_fh, "<", $input_file ) || die "Can't open $input_file: $!";
The reasons are:
You report errors immediately. (Replace "die" with "warn" if that's what you want.)
Your filehandle is now reference-counted, so once you're no longer using it, it will be closed automatically. If you use the global name INPUT_FILEHANDLE, then you have to close the file manually or it will stay open until the program exits.
The read-mode indicator "<" is separated from the $input_file, increasing readability.
The following is great if the file is small and you know you want all lines:
my #lines = <$input_fh>;
You can even do this, if you need to process all lines as a single string:
my $text = join('', <$input_fh>);
For long files you will want to iterate over lines with while, or use read.
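For example:
while (my $line = <$input_fh>) {
    chomp $line;
    # process one line at a time, without holding the whole file in memory
}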
If you want the entire file as a single string, there's no need to iterate through it.
use strict;
use warnings;
use Carp;
use English qw( -no_match_vars );
my $data = q{};
{
local $RS = undef; # This makes it just read the whole thing,
my $fh;
croak "Can't open $input_file: $!\n" if not open $fh, '<', $input_file;
$data = <$fh>;
croak 'Some Error During Close :/ ' if not close $fh;
}
The above satisfies perlcritic --brutal, which is a good way to test for 'best practices' :). $input_file is still undefined here, but the rest is kosher.
Having to write 'or die' everywhere drives me nuts. My preferred way to open a file looks like this:
use autodie;
open(my $image_fh, '<', $filename);
While that's very little typing, there are a lot of important things to note which are going on:
We're using the autodie pragma, which means that all of Perl's built-ins will throw an exception if something goes wrong. It eliminates the need for writing or die ... in your code, it produces friendly, human-readable error messages, and has lexical scope. It's available from the CPAN.
We're using the three-argument version of open. It means that even if we have a funny filename containing characters such as <, > or |, Perl will still do the right thing. In my Perl Security tutorial at OSCON I showed a number of ways to get 2-argument open to misbehave. The notes for this tutorial are available for free download from Perl Training Australia.
We're using a scalar file handle. This means that we're not going to coincidentally close someone else's file handle of the same name, which can happen if we use package file handles. It also means strict can spot typos, and that our file handle will be cleaned up automatically if it goes out of scope.
We're using a meaningful file handle. In this case it looks like we're going to read from an image.
The file handle ends with _fh. If we see it being used like a regular scalar, then we know that it's probably a mistake.
If your files are small enough that reading the whole thing into memory is feasible, use File::Slurp. It reads and writes full files with a very simple API, plus it does all the error checking so you don't have to.
There is no best way to open and read a file. It's the wrong question to ask. What's in the file? How much data do you need at any point? Do you need all of the data at once? What do you need to do with the data? You need to figure those out before you think about how you need to open and read the file.
Is anything that you are doing now causing you problems? If not, don't you have better problems to solve? :)
Most of your question is merely syntax, and that's all answered in the Perl documentation (especially perlopentut). You might also like to pick up Learning Perl, which answers most of the problems you have in your question.
Good luck, :)
It's true that there are as many best ways to open a file in Perl as there are
$files_in_the_known_universe * $perl_programmers
...but it's still interesting to see who usually does it which way. My preferred form of slurping (reading the whole file at once) is:
use strict;
use warnings;
use IO::File;
my $file = shift @ARGV or die "what file?";
my $fh = IO::File->new( $file, '<' ) or die "$file: $!";
my $data = do { local $/; <$fh> };
$fh->close();
# If you didn't just run out of memory, you have:
printf "%d characters (possibly bytes)\n", length($data);
And when going line-by-line:
my $fh = IO::File->new( $file, '<' ) or die "$file: $!";
while ( my $line = <$fh> ) {
print "Better than cat: $line";
}
$fh->close();
Caveat lector of course: these are just the approaches I've committed to muscle memory for everyday work, and they may be radically unsuited to the problem you're trying to solve.
I once used the
open (FILEIN, "<", $inputfile) or die "...";
my @FileContents = <FILEIN>;
close FILEIN;
boilerplate regularly. Nowadays, I use File::Slurp for small files that I want to hold completely in memory, and Tie::File for big files that I want to scalably address and/or files that I want to change in place.
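A minimal Tie::File sketch (the tie call is the module's documented usage; the file name variable and substitution are illustrative):
use Tie::File;
tie my @lines, 'Tie::File', $filename or die "Can't tie $filename: $!";
for (@lines) { s/foo/bar/g }  # each change is written straight back to the file
untie @lines;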
For OO, I like:
use FileHandle;
...
my $handle = FileHandle->new( "< $file_to_read" );
croak( "Could not open '$file_to_read'" ) unless $handle;
...
my $line1 = <$handle>;
my $line2 = $handle->getline;
my @lines = $handle->getlines;
$handle->close;
Read the entire file $file into variable $text with a single line
$text = do { local(@ARGV, $/) = $file; <> };
or as a function
$text = load_file($file);
sub load_file { local(@ARGV, $/) = @_; <> }
If these programs are just for your productivity, whatever works! Build in as much error handling as you think you need.
Reading in a whole file if it's large may not be the best way long-term to do things, so you may want to process lines as they come in rather than load them up in an array.
One tip I got from one of the chapters in The Pragmatic Programmer (Hunt & Thomas) is that you might want to have the script save a backup of the file for you before it goes to work slicing and dicing.
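One way to do that with the core File::Copy module (the .bak suffix is just a convention):
use File::Copy;
copy($file, "$file.bak") or die "Can't back up $file: $!";
# ... now slice and dice $file, knowing a backup exists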
The || operator has higher precedence than the comma, so it binds to the filename string and is evaluated before the result is passed to open; since the string is true, the die never runs. In the code you've mentioned, use the low-precedence or operator instead, and you wouldn't have that problem.
open INPUT_FILE, "<$input_file"
or die "Can't open $input_file: $!\n";
Damian Conway does it this way:
$data = readline!open(!((*{!$_},$/)=\$_)) for "filename";
But I don't recommend that to you.