Writing a macro in Perl - perl

open $FP, '>', $outfile or die $outfile." Cannot open file for writing\n";
I have this statement a lot of times in my code.
I want to keep the format same for all of those statements, so that when something is changed, it is only changed at one place.
In Perl, how should I go about resolving this situation?
Should I use macros or functions?
I have seen this SO thread How can I use macros in Perl?, but it doesn't say much about how to write a general macro like
#define fw(FP, outfile) open $FP, '>', \
$outfile or die $outfile." Cannot open file for writing\n";

First, you should write that as:
open my $FP, '>', $outfile or die "Could not open '$outfile' for writing:$!";
including the reason why open failed.
If you want to encapsulate that, you can write:
use Carp;
sub openex {
my ($mode, $filename) = @_;
open my $h, $mode, $filename
or croak "Could not open '$filename': $!";
return $h;
}
# later
my $FP = openex('>', $outfile);
Starting with Perl 5.10.1, autodie is in the core and I will second Chas. Owens' recommendation to use it.

Perl 5 really doesn't have macros (there are source filters, but they are dangerous and ugly, so ugly even I won't link you to the documentation). A function may be the right choice, but you will find that it makes it harder for new people to read your code. A better option may be to use the autodie pragma (it is core as of Perl 5.10.1) and just cut out the or die part.
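As a rough sketch of the autodie suggestion (the temporary directory and the report.txt name are placeholders made up for this illustration), the open and close calls below carry no "or die" and still fail loudly:

```perl
use strict;
use warnings;
use autodie;                   # open and close now die on failure by themselves
use File::Temp qw(tempdir);

# Hypothetical output location, used only for this illustration.
my $dir     = tempdir( CLEANUP => 1 );
my $outfile = "$dir/report.txt";

open my $FP, '>', $outfile;    # no "or die" needed
print {$FP} "hello\n";
close $FP;

# A failing open raises an exception instead of returning false.
my $ok = eval { open my $missing, '<', "$dir/does-not-exist"; 1 };
print "open failed as expected\n" unless $ok;
```

The error text comes from autodie itself, so every open in the file reports failures in the same format without any per-call boilerplate.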
Another option, if you use Vim, is to use snipMate. You just type fw<tab>FP<tab>outfile<tab> and it produces
open my $FP, '>', $outfile
or die "Couldn't open $outfile for writing: $!\n";
The snipMate text is
snippet fw
open my $${1:filehandle}, ">", $${2:filename variable}
or die "Couldn't open $$2 for writing: $!\n";
${3}
I believe other editors have similar capabilities, but I am a Vim user.

There are several ways to handle something similar to a C macro in Perl: a source filter, a subroutine, Template::Toolkit, or use features in your text editor.
Source Filters
If you gotta have a C / CPP style preprocessor macro, it is possible to write one in Perl (or, actually, any language) using a precompile source filter. You can write fairly simple to complex Perl classes that operate on the text of your source code and perform transformations on it before the code goes to the Perl compiler. You can even run your Perl code directly through a CPP preprocessor to get the exact type of macro expansions you get in C / CPP using Filter::CPP.
Damian Conway's Filter::Simple is part of the Perl core distribution. With Filter::Simple, you could easily write a simple module to perform the macro you are describing. An example:
package myopinion;
# save in your Perl's @INC path as "myopinion.pm"...
use Filter::Simple;
FILTER {
s/Hogs/Pigs/g;
s/Hawgs/Hogs/g;
}
1;
Then a Perl file:
use myopinion;
print join(' ',"Hogs", 'Hogs', qq/Hawgs/, q/Hogs/, "\n");
print "In my opinion, Hogs are Hogs\n\n";
Output:
Pigs Pigs Hogs Pigs
In my opinion, Pigs are Pigs
If you rewrote the FILTER to make the substitution for your desired macro, Filter::Simple should work fine. Filter::Simple can be restricted to make substitutions only in parts of your code, such as the executable part but not the POD part, only in strings, or only in code.
Source filters are not widely used in my experience. I have mostly seen them in lame attempts to encrypt Perl source code or in humorous Perl obfuscators. In other words, I know it can be done this way, but I personally don't know enough about them to recommend them or to say not to use them.
Subroutines
Sinan Ünür's openex subroutine is a good way to accomplish this. I will only add that a common older idiom that you will see involves returning a localized typeglob like this:
sub opensesame {
my $fn=shift;
local *FH;
return open(FH,$fn) ? *FH : undef;
}
$fh=opensesame('> /tmp/file');
Read perldata for why it is this way...
Template Toolkit
Template::Toolkit can be used to process Perl source code. For example, you could write a template along the lines of:
[% fw(fp, outfile) %]
running that through Template::Toolkit can result in expansion and substitution to:
open my $FP, '>', $outfile or die "$outfile could not be opened for writing:$!";
Template::Toolkit is most often used to separate the messy HTML and other presentation code from the application code in web apps. Template::Toolkit is very actively developed and well documented. If your only use is a macro of the type you are suggesting, it may be overkill.
Text Editors
Chas. Owens has a method using Vim. I use BBEdit and could easily write a Text Factory to replace the skeleton of a open with the precise and evolving open that I want to use. Alternately, you can place a completion template in your "Resources" directory in the "Perl" folder. These completion skeletons are used when you press the series of keys you define. Almost any serious editor will have similar functionality.
With BBEdit, you can even use Perl code in your text replacement logic. I use Perl::Critic this way. You could use Template::Toolkit inside BBEdit to process the macros with some intelligence. It can be set up so the source code is not changed by the template until you output a version to test or compile; the editor is essentially acting as a preprocessor.
There are two potential issues with using a text editor. First, it is a one-way, one-time transform. If you want to change what your "macro" does, you can't, since the previous text of your "macro" has already been expanded; you have to change each expansion manually. Second, if you use a template form, you can't send the macro version of the source code to someone else, because the preprocessing is done inside the editor.
Don't Do This!
If you type perl -h to get valid command switches, one option you may see is:
-P run program through C preprocessor before compilation
Tempting! Yes, you can run your Perl code through the C preprocessor and expand C style macros and have #defines. Put down that gun; walk away; don't do it. There are many platform incompatibilities and language incompatibilities.
You get issues like this:
#!/usr/bin/perl -P
#define BIG small
print "BIG\n";
print qq(BIG\n);
Prints:
BIG
small
In Perl 5.12 the -P switch has been removed...
Conclusion
The most flexible solution here is just to write a subroutine. All your code is visible in the subroutine and easily changed, and the call is shorter. There is no real downside other than, potentially, the readability of your code.
Template::Toolkit is widely used. You can write complex replacements that act like macros or even more complex than C macros. If your need for macros is worth the learning curve, use Template::Toolkit.
For very simple cases, use the one way transforms in an editor.
If you really want C style macros, you can use Filter::CPP. This may have the same incompatibilities as the perl -P switch. I cannot recommend this; just learn the Perl way.
If you want to run Perl one liners and Perl regexs against your code before it compiles, use Filter::Simple.
And don't use the -P switch. You can't on newer versions of Perl anyway.

For something like open, I think it's useful to include close in your factored-out routine. Here's an approach that looks a bit weird but encapsulates a typical open/close idiom.
our $FP; # package variable, so the block passed in can see the handle
sub with_file_do(&$$) {
my ($code, $mode, $file) = @_;
open my $fp, $mode, $file or die "Could not open '$file': $!";
local $FP = $fp;
$code->(); # perhaps wrap in an eval
close $fp;
}
# usage
with_file_do {
print $FP "whatever\n";
# other output things with $FP
} '>', $outfile;
Having the open params specified at the end is a bit weird, but it allows you to avoid having to specify the sub keyword.
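The "perhaps wrap in an eval" remark can be sketched as follows. This is a hypothetical variant, not the routine above: with_file is a made-up name, it takes the code reference last, and it passes the handle to the callback as an argument instead of using a package variable.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical variant: the handle is passed to the callback, and the file
# is closed even when the callback dies.
sub with_file {
    my ( $mode, $file, $code ) = @_;
    open my $fh, $mode, $file or die "Could not open '$file': $!";
    my $ok  = eval { $code->($fh); 1 };
    my $err = $@;
    close $fh or die "Could not close '$file': $!";
    die $err unless $ok;       # re-throw the callback's error after closing
    return;
}

# usage, with a throwaway file
my $dir     = tempdir( CLEANUP => 1 );
my $outfile = "$dir/out.txt";
with_file( '>', $outfile, sub {
    my ($fh) = @_;
    print {$fh} "whatever\n";
} );
```

The price of this shape is the explicit sub keyword at the call site, which is exactly what the prototype trick above avoids.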

Related

How to pipe to and read from the same tempfile handle without race conditions?

Was debugging a perl script for the first time in my life and came over this:
$my_temp_file = File::Temp->tmpnam();
system("cmd $blah | cmd2 > $my_temp_file");
open(FIL, "$my_temp_file");
...
unlink $my_temp_file;
This works pretty much like I want, except the obvious race conditions in lines 1-3. Even if using proper tempfile() there is no way (I can think of) to ensure that the file streamed to at line 2 is the same opened at line 3. One solution might be pipes, but the errors during cmd might occur late because of limited pipe buffering, and that would complicate my error handling (I think).
How do I:
Write all output from cmd $blah | cmd2 into a tempfile opened file handle?
Read the output without re-opening the file (risking race condition)?
You can open a pipe to a command and read its contents directly with no intermediate file:
open my $fh, '-|', 'cmd', $blah;
while( <$fh> ) {
...
}
With short output, backticks might do the job, although in this case you have to be more careful to scrub the inputs so they aren't misinterpreted by the shell:
my $output = `cmd $blah`;
There are various modules on CPAN that handle this sort of thing, too.
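One way to deal with the worry about late errors when reading from a pipe is to check the result of close, which waits for the child and reports its exit status in $?. The sketch below uses the running perl ($^X) as a stand-in for cmd, purely so that it runs anywhere; list-form pipe opens may not be available on older Perls on Windows.

```perl
use strict;
use warnings;

# Stand-in for "cmd $blah": run the current perl so the sketch is portable.
my @cmd = ( $^X, '-e', 'print "line $_\n" for 1 .. 3' );

# List form: the arguments go to the program directly, with no shell involved.
open( my $fh, '-|', @cmd ) or die "Cannot start '@cmd': $!";
my @lines = <$fh>;

# close waits for the child; a false return with $! == 0 means a non-zero exit.
unless ( close $fh ) {
    die $! ? "Error closing pipe: $!" : "Command exited with status " . ( $? >> 8 );
}
print "got ", scalar @lines, " lines\n";
```

Because the output is only trusted after close succeeds, a command that fails halfway through is still detected before the data is used.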
Some comments on temporary files
The comments mentioned race conditions, so I thought I'd write a few things for those wondering what people are talking about.
In the original code, Andreas uses File::Temp, a module from the Perl Standard Library. However, they use the tmpnam POSIX-like call, which has this caveat in the docs:
Implementations of mktemp(), tmpnam(), and tempnam() are provided, but should be used with caution since they return only a filename that was valid when function was called, so cannot guarantee that the file will not exist by the time the caller opens the filename.
This is discouraged; the POSIX module's own tmpnam is no longer available as of Perl v5.26.
That is, you get back the name of a file that does not exist yet. After you get the name, you don't know if that filename was made by another program. And, that unlink later can cause problems for one of the programs.
The "race condition" comes in when two programs that probably don't know about each other try to do the same thing at roughly the same time. Your program tries to make a temporary file named "foo", and so does some other program. They both might see at the same time that a file named "foo" does not exist, then try to create it. They both might succeed, and as they both write to it, they might interleave or overwrite the other's output. Then, one of those programs thinks it is done and calls unlink. Now the other program wonders what happened.
In the malicious exploit case, some bad actor knows a temporary file will show up, so it recognizes a new file and gets in there to read or write data.
But this can also happen within the same program. Two or more versions of the same program run at the same time and try to do the same thing. With randomized filenames, it is probably exceedingly rare that two running programs will choose the same name at the same time. However, we don't care how rare something is; we care how devastating the consequences are should it happen. And, rare is much more frequent than never.
File::Temp
Knowing all that, File::Temp handles the details of ensuring that you get a filehandle. Call tempfile as a plain function; recent versions of File::Temp refuse to run it as a class method:
my( $fh, $name ) = File::Temp::tempfile();
This uses a default template to create the name. With the object interface, File::Temp also cleans up the mess when the object goes out of scope.
{
my $fh = File::Temp->new;
print $fh ...;
...;
} # file cleaned up
Some systems might automatically clean up temp files, although I haven't cared about that in years. Typically it was a batch thing (say, once a week).
I often go one step further by giving my temporary filenames a template, where the Xs are literal characters the module recognizes and fills in with randomized characters:
my( $fh, $name ) = File::Temp::tempfile(
sprintf "$0-%d-XXXXXX", time );
I'm often doing this while I'm developing things so I can watch the program make the files (and in which order) and see what's in them. In production I probably want to obscure the source program name ($0) and the time; I don't want to make it easier to guess who's making which file.
A scratchpad
I can also open a temporary file with open by not giving it a filename. This is useful when you want to collect outside the program. Opening it read-write means you can output some stuff then move around that file (we show a fixed-length record example in Learning Perl):
open(my $tmp, "+>", undef) or die ...
print $tmp "Some stuff\n";
seek $tmp, 0, 0;
my $line = <$tmp>;
File::Temp opens the temp file in O_RDWR mode so all you have to do is use that one file handle for both reading and writing, even from external programs. The returned file handle is overloaded so that it stringifies to the temp file name so you can pass that to the external program. If that is dangerous for your purpose you can get the fileno() and redirect to /dev/fd/<fileno> instead.
All you have to do is mind your seeks and tells. :-) Just remember to always set autoflush!
use File::Temp;
use Data::Dump;
$fh = File::Temp->new;
$fh->autoflush;
system "ls /tmp/*.txt >> $fh" and die $!;
@lines = <$fh>;
printf "%s\n\n", Data::Dump::pp(\@lines);
print $fh "How now brown cow\n";
seek $fh, 0, 0 or die $!;
@lines2 = <$fh>;
printf "%s\n", Data::Dump::pp(\@lines2);
Which prints
[
"/tmp/cpan_htmlconvert_DPzx.txt\n",
"/tmp/cpan_htmlconvert_DunL.txt\n",
"/tmp/cpan_install_HfUe.txt\n",
"/tmp/cpan_install_XbD6.txt\n",
"/tmp/cpan_install_yzs9.txt\n",
]
[
"/tmp/cpan_htmlconvert_DPzx.txt\n",
"/tmp/cpan_htmlconvert_DunL.txt\n",
"/tmp/cpan_install_HfUe.txt\n",
"/tmp/cpan_install_XbD6.txt\n",
"/tmp/cpan_install_yzs9.txt\n",
"How now brown cow\n",
]
HTH

What is the meaning of the dot in this open() usage in Perl?

How can I understand the following usage of the open() function in Perl File I/O?
open(FHANDLE, ">" . $file )
I tried to find this type of syntax in the docs but did not find it; please note there is a . (dot) after ">".
All I cannot understand is a use of dot, the rest I know.
This is an example of the old, two-argument form of open (which should be avoided now that three-argument open is available). In Perl, . is the string concatenation operator. It combines the two strings into a single string.
The line of code you posted is equivalent to open(FHANDLE, ">$file" ), it just uses a different method of combining the > and $file.
The better way to do it these days would be open(my $fhandle, '>', $file), as shown in the documentation you linked to.
This is the two-argument open. The dot . is the string concatenation operator in Perl. If open is called with two arguments, the second argument contains both the mode and the path.
In your case, it will open the file named in $file for writing.
However, for several reasons you should not do this. It's more common to use the three-argument-open, and the lexical filehandles instead of the global GLOB filehandle.
The lexical filehandle makes sure Perl implicitly closes the handle for you as soon as it goes out of scope. Using separate arguments for mode and filename addresses a security concern, because otherwise a malicious user could smuggle mode changes into the filename.
open my $fh, '>', $file or die $!;
In addition to the now lexical filehandle and the separation of the mode and the filename, we also check for errors in this code, which is always a good idea.
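The equivalence between the concatenated and the interpolated argument can be checked directly. This is a small sketch; the temporary directory and the out.txt name are placeholders chosen for the illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/out.txt";     # hypothetical file name

# The dot joins two strings, so both spellings build the same argument.
my $concatenated = ">" . $file;
my $interpolated = ">$file";
print "same argument\n" if $concatenated eq $interpolated;

# The three-argument form keeps mode and file name apart and checks for errors.
open( my $fh, '>', $file ) or die "Cannot open '$file' for writing: $!";
print {$fh} "data\n";
close $fh or die "Cannot close '$file': $!";
```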

backtick vs native way of doing things in PERL

Consider these 2 snippets :
#!/bin/bash/perl
open(DATA,"<input.txt");
while(<DATA>)
{
print($_) ;
}
and
$abcd = `cat input.txt`;
print $abcd;
Both will print the content of file input.txt as output
Question: Is there any standard as to which one (backticks or the native method) should be preferred in a particular case, or are both always equal?
The reason I am asking is that I find the cat method easier than opening a file the native Perl way, so I am unsure: if I can achieve something with backticks, should I go with that or prefer the native way of doing it?
I checked this thread too: What's the difference between Perl's backticks, system, and exec? but it went a different route than my question.
Use builtin functions wherever possible:
They are more portable: open works on Windows, while `cat input.txt` will not.
They have less overhead: Using backticks will fork, exec a shell which parses the command, which execs the cat program. This unnecessarily loads two programs. This is in contrast to open which is a builtin Perl function.
They make error handling easier. The open function will return a false value on error, which allows you to take different actions, e.g. like terminating the program with an error message:
open my $fh, "<", "input.txt" or die "Couldn't open input.txt: $!";
They are more flexible. For example, you can add encoding layers if your data isn't Latin-1 text:
open my $fh, "<:utf8", "input.txt" or die "Couldn't open input.txt: $!";
open my $fh, "<:raw", "input.bin" or die "Couldn't open input.bin: $!";
If you want a “just read this file into a scalar” function, look at the File::Slurp module:
use File::Slurp;
my $data = read_file "input.txt";
Using the back tick operators to call cat is highly inefficient, because:
It spawns a separate process (or maybe more than one if a shell is used) which does nothing more than read the file, which perl could do itself.
You are reading the whole file into memory instead of processing it one line at a time. OK for a small file, not so good for a large one.
The back tick method is ok for a quick and dirty script but I would not use it for anything serious.
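If the appeal of cat is that it is a one-liner, a small helper gives the same convenience with only core Perl. The following is a minimal sketch; slurp is a name made up here, and the input file is created on the fly so the example is self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: return the whole file as one string.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Couldn't open '$path': $!";
    local $/;                  # undef record separator: one read returns everything
    my $data = <$fh>;
    close $fh or die "Couldn't close '$path': $!";
    return $data;
}

# Create a small input file so the sketch runs on its own.
my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/input.txt";
open my $out, '>', $path or die "Couldn't create '$path': $!";
print {$out} "first\nsecond\n";
close $out or die "Couldn't close '$path': $!";

print slurp($path);
```

Like the backtick version, this reads the whole file into memory, so the line-by-line loop remains the better choice for large files.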

How to include a "diff" in a Perl test? [duplicate]

Possible Duplicate:
How can I use Perl to determine whether the contents of two files are identical?
If I am writing a Perl module test, and for example I want to test that an output file is exactly what is expected, if I use an external command like diff, the test might fail on some operating systems which don't provide the diff command. What would be a simple way to do something like diff on files, which doesn't rely on external commands? I understand that there are modules on CPAN which can do file diffs, but I would rather not complicate the build process unless necessary.
File::Compare, in core since 5.004.
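A minimal sketch of how File::Compare could be used; the three file names and their contents are invented for the illustration. Its compare function returns 0 when the files are equal, 1 when they differ, and -1 on error.

```perl
use strict;
use warnings;
use File::Compare qw(compare);
use File::Temp qw(tempdir);

my $dir = tempdir( CLEANUP => 1 );

# Hypothetical fixture files: two identical, one different.
my %content = ( expected => "a\nb\n", same => "a\nb\n", other => "a\nc\n" );
for my $name ( sort keys %content ) {
    open my $fh, '>', "$dir/$name" or die "Can't write '$dir/$name': $!";
    print {$fh} $content{$name};
    close $fh or die "Can't close '$dir/$name': $!";
}

my $same   = compare( "$dir/expected", "$dir/same" );
my $differ = compare( "$dir/expected", "$dir/other" );
print "same: $same, differ: $differ\n";    # prints "same: 0, differ: 1"
```

In a test script the result would typically feed an assertion such as checking that compare returned 0.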
When testing and looking for differences in files or strings I always use Test::Differences that uses Text::Diff. I know that you probably know that and you would like a non module solution, but looking for differences has many corner cases so is not trivial. Also I write this answer more for googlers (just in case you already know these modules).
I like the table output of this module. It is very convenient when the differences are a small number.
Why not just read and compare the two files in perl? Something like...
sub readfile
{
local ($/) = undef;
open READFILE, "<", $_[0]
or die "Can't read '$_[0]': $!";
my $contents = <READFILE>;
close READFILE or die "Can't close '$_[0]': $!";
return $contents;
}
$expected = readfile("expected_results");
$actual = readfile("actual_results");
if ($expected ne $actual) {
die "Got wrong results!";
}
(If you're concerned about multiple OS portability, you may also need to do something about line endings, either in your test program or here, because some OSs use CRLF instead of LF to separate lines in text files. If you want to handle it here, a regular expression replace will do the trick.)
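The regular expression replace mentioned above might look like the sketch below. normalize_eol is a hypothetical helper name, and the two strings stand in for file contents that were read as raw bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: turn CRLF (or a lone CR) into LF before comparing.
sub normalize_eol {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;
    return $text;
}

my $expected = "one\ntwo\n";        # written on a Unix-like system
my $actual   = "one\r\ntwo\r\n";    # same text with CRLF line endings

print normalize_eol($expected) eq normalize_eol($actual)
    ? "match\n"
    : "differ\n";
```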

Are there reasons to ever use the two-argument form of open(...) in Perl?

Are there any reasons to ever use the two-argument form of open(...) in Perl rather than the three-or-more-argument versions?
The only reason I can come up with is the obvious observation that the two-argument form is shorter. But assuming that verbosity is not an issue, are there any other reasons that would make you choose the two-argument form of open(...)?
One- and two-arg open applies any default layers specified with the -C switch or open pragma. Three-arg open does not. In my opinion, this functional difference is the strongest reason to choose one or the other (and the choice will vary depending what you are opening). Which is easiest or most descriptive or "safest" (you can safely use two-arg open with arbitrary filenames, it's just not as convenient) take a back seat in module code; in script code you have more discretion to choose whether you will support default layers or not.
Also, one-arg open is needed for Damian Conway's file slurp operator
$_ = "filename";
$contents = readline!open(!((*{!$_},$/)=\$_));
Imagine you are writing a utility that accepts an input file name. People with reasonable Unix experience are used to substituting - for STDIN. Perl handles that automatically only when the magical form is used where the mode characters and file name are one string, else you have to handle this and similar special cases yourself. This is a somewhat common gotcha, I am surprised no one has posted that yet. Proof:
use IO::File qw();
my $user_supplied_file_name = '-';
IO::File->new($user_supplied_file_name, 'r') or warn "IO::File/non-magical mode - $!\n";
IO::File->new("<$user_supplied_file_name") or warn "IO::File/magical mode - $!\n";
open my $fh1, '<', $user_supplied_file_name or warn "non-magical open - $!\n";
open my $fh2, "<$user_supplied_file_name" or warn "magical open - $!\n";
__DATA__
IO::File/non-magical mode - No such file or directory
non-magical open - No such file or directory
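Handling the special case yourself with three-argument open could look like this sketch. open_input is a made-up helper name, and the sample file exists only so the example runs without reading from the terminal.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: treat "-" as standard input, anything else as a
# literal file name opened with the three-argument form.
sub open_input {
    my ($name) = @_;
    return \*STDIN if $name eq '-';
    open my $fh, '<', $name or die "Cannot open '$name': $!";
    return $fh;
}

# Demonstration with a real file.
my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/input.txt";
open my $out, '>', $path or die "Cannot create '$path': $!";
print {$out} "hello\n";
close $out or die "Cannot close '$path': $!";

my $in    = open_input($path);
my $first = <$in>;
print $first;
```

This keeps the conventional "-" behaviour while every other name, including odd ones, is used exactly as given.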
Another small difference: the two-argument form trims spaces
$foo = " fic";
open(MH, ">$foo");
print MH "toto\n";
Writes in a file named fic
On the other hand
$foo = " fic";
open(MH, ">", $foo);
print MH "toto\n";
Will write in a file whose name begins with a space.
For short admin scripts with user input (or configuration file input), not having to bother with such details as trimming filenames is nice.
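The trimming can be observed in a throwaway directory. This sketch uses a trailing space instead of the leading one from the example above, since the two-argument form strips both and a trailing space works with an absolute path; file names with trailing spaces may behave differently on Windows.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir( CLEANUP => 1 );
my $name = "$dir/fic ";        # note the trailing space

# Two-argument form: surrounding whitespace is stripped from the name.
open( my $two, ">$name" ) or die "two-arg open failed: $!";
close $two or die "close failed: $!";
my $trimmed_created = -e "$dir/fic" ? 1 : 0;
my $literal_before  = -e $name      ? 1 : 0;

# Three-argument form: the name is used exactly as given.
open( my $three, '>', $name ) or die "three-arg open failed: $!";
close $three or die "close failed: $!";
my $literal_after = -e $name ? 1 : 0;

print "trimmed name created by two-arg open: $trimmed_created\n";
print "literal name exists before/after three-arg open: $literal_before/$literal_after\n";
```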
The two argument form of open was the only form supported by some old versions of perl.
If you're opening from a pipe, the three argument form isn't really helpful. Getting the equivalent of the three argument form involves doing a safe pipe open (open(FILE, '|-')) and then executing the program.
So for simple pipe opens (e.g. open(FILE, 'ps ax |')), the two argument syntax is much more compact.
I think William's post pretty much hits it. Otherwise, the three-argument form is going to be more clear, as well as safer.
See also:
What's the best way to open and read a file in Perl?
Why is three-argument open calls with autovivified filehandles a Perl best practice?
One reason to use the two-argument version of open is if you want to open something which might be a pipe, or a file. If you have one function
sub strange
{
my ($file) = @_;
open my $input, $file or die $!;
}
then you want to call this either with a filename like "file":
strange ("file");
or a pipe like "zcat file.gz |"
strange ("zcat file.gz |");
depending on the situation of the file you find, then the two-argument version may be used. You will actually see the above construction in "legacy" Perl. However, the most sensible thing might be to open the filehandle appropriately and send the filehandle to the function rather than using the file name like this.
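The alternative suggested in the last sentence, opening the handle in the caller and passing it in, might be sketched like this. count_lines is a hypothetical function, and the running perl ($^X) stands in for a command such as zcat so the example is portable.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical consumer: it only sees a handle and does not care where it came from.
sub count_lines {
    my ($in) = @_;
    my $count = 0;
    while ( defined( my $line = <$in> ) ) {
        $count++;
    }
    return $count;
}

# Source 1: a plain file.
my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/file";
open my $out, '>', $path or die "Cannot create '$path': $!";
print {$out} "a\nb\n";
close $out or die "Cannot close '$path': $!";
open my $from_file, '<', $path or die "Cannot open '$path': $!";
my $file_count = count_lines($from_file);
close $from_file;

# Source 2: a pipe from a command.
open( my $from_pipe, '-|', $^X, '-e', 'print "x\n" for 1 .. 3' )
    or die "Cannot start command: $!";
my $pipe_count = count_lines($from_pipe);
close $from_pipe or die "Command failed";

print "file: $file_count, pipe: $pipe_count\n";
```

The caller decides between file and pipe with explicit three-argument opens, so nothing in the data can change how the source is opened.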
When you are combining a string or using a variable, it can be rather unclear whether '<' or '>' etc is in already. In such cases, I personally prefer readability, which means, I use the longer form:
open($FILE, '>', $varfn);
When you simply use a constant, I prefer the ease-of-typing (and, actually, consider the short version better readable anyway, or at least even to the long version).
open($FILE, '>somefile.xxx');
I'm guessing you mean open(FH, '<filename.txt') as opposed to open(FH, '<', 'filename.txt') ?
I think it's just a matter of preference. I always use the former out of habit.