So I've recently started using Perl::Critic to check the quality of the code I've written. I'm running it in brutal mode and have one suggestion it is making which I don't understand as being an issue. The output is:
Return value of flagged function ignored - print at line 197, column 13. See pages 208,278 of PBP. (Severity: 1)
This is basically a call to the print function with a short message which outputs to the console. Why then should I capture the return value which will almost certainly always be 1 as I can't think of any use case where this wouldn't be a 1.
Is brutal mode being 'too brutal'? Or am I missing something? I should add that I did read pages 208 and 278 of the PBP and the answer is not clear to me.
I agree that most of the time print will not fail. But, you can disable this function by creating a .perlcriticrc file and adding these lines to it (like I do):
# Check all builtins except "print"
[InputOutput::RequireCheckedSyscalls]
functions = :builtins
exclude_functions = print
This is described in Perl::Critic::Policy::InputOutput::RequireCheckedSyscalls
Also, if you disagree with all the policies of the Brutal setting, you can just use one of the other 4 less brutal settings. The tool is highly configurable.
Here is a trivial case where print can fail (printing to a closed filehandle):
open my $fh, '>', 'out';
print $fh "555\n";
close $fh;
print $fh "888\n" or die "print failed: $!";
# we shouldn't get here
print "777\n";
In such short code, it is obvious that you just closed the filehandle, and you would never then try to print to it. But, if you had a lot of (poorly designed) code, maybe it would occur.
There are other reasons print could fail, such as if another process came along and deleted a directory or disabled write permissions on your open file.
I created a script for myself to run perlcritic which makes it easy to access the POD for a given policy: Sort and summarize perlcritic output
One use case where print "something"; fails is when STDOUT has been opened to a file and the file system is full. But in my projects I also do not check the return value of print.
To implement the fix for
print "$updated_service_name\n";
Use
my $printed = (print "$updated_service_name\n");
if (!$printed) {
die "Unable to write to stdout\n";
}
Related
Was debugging a perl script for the first time in my life and came over this:
$my_temp_file = File::Temp->tmpnam();
system("cmd $blah | cmd2 > $my_temp_file");
open(FIL, "$my_temp_file");
...
unlink $my_temp_file;
This works pretty much like I want, except the obvious race conditions in lines 1-3. Even if using proper tempfile() there is no way (I can think of) to ensure that the file streamed to at line 2 is the same opened at line 3. One solution might be pipes, but the errors during cmd might occur late because of limited pipe buffering, and that would complicate my error handling (I think).
How do I:
Write all output from cmd $blah | cmd2 into a tempfile opened file handle?
Read the output without re-opening the file (risking race condition)?
You can open a pipe to a command and read its contents directly with no intermediate file:
open my $fh, '-|', 'cmd', $blah;
while( <$fh> ) {
...
}
With short output, backticks might do the job, although in this case you have to be more careful to scrub the inputs so they aren't misinterpreted by the shell:
my $output = `cmd $blah`;
There are various modules on CPAN that handle this sort of thing, too.
Some comments on temporary files
The comments mentioned race conditions, so I thought I'd write a few things for those wondering what people are talking about.
In the original code, Andreas uses File::Temp, a module from the Perl Standard Library. However, they use the tmpnam POSIX-like call, which has this caveat in the docs:
Implementations of mktemp(), tmpnam(), and tempnam() are provided, but should be used with caution since they return only a filename that was valid when function was called, so cannot guarantee that the file will not exist by the time the caller opens the filename.
This is discouraged and was removed for Perl v5.22's POSIX.
That is, you get back the name of a file that does not exist yet. After you get the name, you don't know if that filename was made by another program. And, that unlink later can cause problems for one of the programs.
The "race condition" comes in when two programs that probably don't know about each other try to do the same thing as roughly the same time. Your program tries to make a temporary file named "foo", and so does some other program. They both might see at the same time that a file named "foo" does not exist, then try to create it. They both might succeed, and as they both write to it, they might interleave or overwrite the other's output. Then, one of those programs think it is done and calls unlink. Now the other program wonders what happened.
In the malicious exploit case, some bad actor knows a temporary file will show up, so it recognizes a new file and gets in there to read or write data.
But this can also happen within the same program. Two or more versions of the same program run at the same time and try to do the same thing. With randomized filenames, it is probably exceedingly rare that two running programs will choose the same name at the same time. However, we don't care how rare something is; we care how devastating the consequences are should it happen. And, rare is much more frequent than never.
File::Temp
Knowing all that, File::Temp handles the details of ensuring that you get a filehandle:
my( $fh, $name ) = File::Temp->tempfile;
This uses a default template to create the name. When the filehandle goes out of scope, File::Temp also cleans up the mess.
{
my( $fh, $name ) = File::Temp->tempfile;
print $fh ...;
...;
} # file cleaned up
Some systems might automatically clean up temp files, although I haven't care about that in years. Typically is was a batch thing (say once a week).
I often go one step further by giving my temporary filenames a template, where the Xs are literal characters the module recognizes and fills in with randomized characters:
my( $name, $fh ) = File::Temp->tempfile(
sprintf "$0-%d-XXXXXX", time );
I'm often doing this while I'm developing things so I can watch the program make the files (and in which order) and see what's in them. In production I probably want to obscure the source program name ($0) and the time; I don't want to make it easier to guess who's making which file.
A scratchpad
I can also open a temporary file with open by not giving it a filename. This is useful when you want to collect outside the program. Opening it read-write means you can output some stuff then move around that file (we show a fixed-length record example in Learning Perl):
open(my $tmp, "+>", undef) or die ...
print $tmp "Some stuff\n";
seek $tmp, 0, 0;
my $line = <$tmp>;
File::Temp opens the temp file in O_RDWR mode so all you have to do is use that one file handle for both reading and writing, even from external programs. The returned file handle is overloaded so that it stringifies to the temp file name so you can pass that to the external program. If that is dangerous for your purpose you can get the fileno() and redirect to /dev/fd/<fileno> instead.
All you have to do is mind your seeks and tells. :-) Just remember to always set autoflush!
use File::Temp;
use Data::Dump;
$fh = File::Temp->new;
$fh->autoflush;
system "ls /tmp/*.txt >> $fh" and die $!;
#lines = <$fh>;
printf "%s\n\n", Data::Dump::pp(\#lines);
print $fh "How now brown cow\n";
seek $fh, 0, 0 or die $!;
#lines2 = <$fh>;
printf "%s\n", Data::Dump::pp(\#lines2);
Which prints
[
"/tmp/cpan_htmlconvert_DPzx.txt\n",
"/tmp/cpan_htmlconvert_DunL.txt\n",
"/tmp/cpan_install_HfUe.txt\n",
"/tmp/cpan_install_XbD6.txt\n",
"/tmp/cpan_install_yzs9.txt\n",
]
[
"/tmp/cpan_htmlconvert_DPzx.txt\n",
"/tmp/cpan_htmlconvert_DunL.txt\n",
"/tmp/cpan_install_HfUe.txt\n",
"/tmp/cpan_install_XbD6.txt\n",
"/tmp/cpan_install_yzs9.txt\n",
"How now brown cow\n",
]
HTH
This strange interest comes from expanding requirements and no time to change design (refactor). This is not good design, sure, but I need to deal with it now and hope to refactor later.
There are a few log files opened early on which are printed to throughout code. The new requirement implies that with a (new) command-line option (--noflag) one of these log files is irrelevant.
All I could do at the moment is to pad the definition (open my $fh, ...) and all uses of it (print $fh ...) with if $flag. This is clearly bad design and it is error prone (it isn't pretty either).
Is there a way to do something with $fh when it is associated with the file
so that any following print $fh ... is accepted by intepreter but will result in simply not running the print, without error? (Let me imagine something like, say, $fh = VOID if $flag;.) Or, is there some NULL stream or such? All I know of are STDOUT (1), STDERR (2), and STDIN (0).
I do not want $fh to print anywhere else, ideally not even to /dev/null (if that is possible?). I did look around and couldn't find anything related. I'd appreciate being pointed to information if in fact it is out there already.
Any ideas are appreciated.
PS. First question ever asked here (after years of using SO), please let me know if it's off.
UPDATE
Thanks for responses. They prompt me to add to/refine this question: Are prints marked to go to /dev/null possibly optimized, so that the 'printing' actually doesn't happen? (While I am still interested in whether it is possible to set a filehandle so to tell to Perl 'do not print here'.)
I am trying to avoid running void (print) statements, without adding conditionals.
Update/Clarification
To summarize a bit from comments (thank you!): This was not a quest for performance optimization. I completely agree with everything said in comments on this. It is simply that executing pointless statements (typically around a million) makes me uneasy. Also, I was curious about some possible dark corner of Perl that I haven't run into. (Most of this has been addressed in answers/comments.)
If you are on a unix operating system you can use '/dev/null'
open my $fh, '>', '/dev/null' or die 'This should never happen';
Dev null will silently accept all input.
Closing your filehandle
close $fh;
will make all your prints to that file handle fail. Run
no warnings 'closed';
to suppress all the warning messages that would generate (you do use warnings, right?)
Through magic, you could create a magical handle for which operations are always successful.
perl -e'
{
package Handle::Dummy;
use Tie::Handle qw( );
use Symbol qw( gensym );
our #ISA = qw( Tie::Handle );
sub new { my $fh = gensym; tie *$fh, $_[0]; $fh }
sub TIEHANDLE { bless(\my $dummy, $_[0]) }
sub READ { return 1; }
sub WRITE { return 1; }
sub CLOSE { return 1; }
}
my $fh = Handle::Dummy->new();
print($fh "abc\n") or die $!;
close($fh) or die $!;
print("ok\n");
'
ok
That avoids the systems calls, but it replaces them with expensive Perl subroutine calls.
It's far simpler and more reliable[1] to simply use /dev/null. It could very well be faster too.
Are prints marked to go to /dev/null possibly optimized
No. Perl doesn't know anything about /dev/null.
How slow do you think a system call is? This doesn't sound like the right thing to optimize!
The magical file handle is not associated with a system file handle, so it can't be passed to a C library, it won't survive exec, etc.
You can use an anonymous, temporary file (about a quarter of the way down the perldoc page) like so;
#!/usr/bin/env perl
use strict;
use Getopt::Long;
my $fh;
my $need_log = 2;
print "Intitial need_log: $need_log\n";
GetOptions('flag!' => \$need_log);
print "After option processing, need_log: ", $need_log, "\n";
if ($need_log) {
open($fh, '>', "log.txt") or die "Failed to open log: $!\n";
}
else {
open($fh, '>', undef);
}
print $fh "Hello World... NOT\n";
exit 0;
Here is a few runs with different use of the --flag option;
User#Ubuntu:~$ ls -l log.txt
ls: cannot access log.txt: No such file or directory
User#Ubuntu:~$ ./nf.pl
Intitial need_log: 2
After option processing, need_log: 2
User#Ubuntu:~$ cat log.txt
Hello World... NOT
User#Ubuntu:~$ rm log.txt
User#Ubuntu:~$
User#Ubuntu:~$
User#Ubuntu:~$ ./nf.pl --flag
Intitial need_log: 2
After option processing, need_log: 1
User#Ubuntu:~$ cat log.txt
Hello World... NOT
User#Ubuntu:~$ rm log.txt
User#Ubuntu:~$
User#Ubuntu:~$
User#Ubuntu:~$ ./nf.pl --noflag
Intitial need_log: 2
After option processing, need_log: 0
User#Ubuntu:~$ cat log.txt
cat: log.txt: No such file or directory
User#Ubuntu:~$
I've initialized the $need_log variable to '2' so that we can tell if it has a 'True' value as a result of the flag option being present (in which case it will have the value 1) or as a result of no mention of the flag option at all (in which case it will have the value 2).
Specifying '--noflag' triggers the else clause which has 'undef' as the third argument which creates the anonymous temporary file. This doesn't perfectly match your question of not writing at all, but if the file is temporary and you're not putting gigabytes in it, this will hopefully suffice.
I'm a self-taught Perler, seeking assistance from the Perl experts:
I keep getting an error saying I can't use the filehandle within a foreach loop, even though I'm sure to close it (or undef it, I've tried both). See the full error here: http://cl.ly/image/2b2D1T403s14
The code is available on GitHub: https://github.com/bsima/yeast-TRX
The code in question can be found in the file "script.pl" at around line 90:
foreach my $species (keys %Saccharomyces) {
open(RAW,">./data/$species/$geneName/raw.csv");
print RAW "gene,dinucleotide,position,trx.score,energy.score\n";
undef RAW;
open(SMOOTH,">./data/$species/$geneName/smooth.csv");
print SMOOTH "gene,position,trx.score,energy.score\n";
undef SMOOTH;
}
Help is much appreciated! I don't know the intricacies of how Perl works with filehandles, probably because of my lack of formal training. Any comments on my overall code quality is welcome too, if someone is feeling particularly helpful.
EDIT: Found the problem. Perl cannot generate directories on the fly, so the $species/$geneName directory was never even being created. I added a line at the beginning of the foreach loop that said simply mkdir("$species/$geneName"); and that solve the issue.
You are getting warning that is quite telling:
Bareword RAW not allowed while "strict subs" in use
Also, undef FILEHANDLE is not as good as close FILEHANDLE.
Solution is to use normal scoped variables for file handles and close them, something like this:
foreach my $species (keys %Saccharomyces) {
open my $raw, ">", "./data/$species/$geneName/raw.csv";
print $raw "gene,dinucleotide,position,trx.score,energy.score\n";
close $raw;
open my $smooth, ">", "./data/$species/$geneName/smooth.csv";
print $smooth "gene,position,trx.score,energy.score\n";
close $smooth;
}
Also, you should check if $raw and $smooth were opened before trying to write to them.
Perl cannot generate directories on the fly, so the $species/$geneName directory was never even being created. I added a line at the beginning of the foreach loop that said simply mkdir("$species/$geneName"); and that solve the issue.
I have the code:
open(FILE, "<$new_file") or die "Cant't open file \n";
#lines=<FILE>;
close FILE;
open(STDOUT, ">$new_file") or die "Can't open file\n";
$old_fh = select(OUTPUT_HANDLE);
$| = 1;
select($old_fh);
for(#lines){
s/(.*?xsl.*?)xsl/$1xslt/;
print;
}
close(STDOUT);
STDOUT -> autoflush(1);
print "file changed";
After closing STDOUT closing the program does not write the last print print "file changed". Why is this?
*Edited* Print message I want to write on Console no to file
I suppose it is because print default filehandle is STDOUT, which at that point it is already closed. You could reopen it, or print to other filehandle, for example, STDERR.
print STDERR "file changed";
It's because you've closed the filehandle stored in STDOUT, so print can't use it anymore. Generally speaking opening a new filehandle into one of the predefined handle names isn't a very good idea because it's bound to lead to confusion. It's much clearer to use lexical filehandles, or just a different name for your output file. Yes you then have to specify the filehandle in your print call, but then you don't have any confusion over what's happened to STDOUT.
A print statement will output the string in the STDOUT, which is the default output file handle.
So the statement
print "This is a message";
is same as
print STDOUT "This is a message";
In your code, you have closed STDOUT and then printing the message, which will not work. Reopen the STDOUT filehandle or do not close it. As the script ends, the file handles will be automatically closed
open OLDOUT, ">&", STDOUT;
close STDOUT;
open(STDOUT, ">$new_file") or die "Can't open file\n";
...
close(STDOUT);
open (STDOUT, ">&",OLDOUT);
print "file changed";
You seem to be confused about how file IO operations are done in perl, so I would recommend you read up on that.
What went wrong?
What you are doing is:
Open a file for reading
Read the entire file and close it
Open the same file for overwrite (org file is truncated), using the STDOUT file handle.
Juggle around the default print handle in order to set autoflush on a file handle which is not even opened in the code you show.
Perform a substitution on all lines and print them
Close STDOUT then print a message when everything is done.
Your main biggest mistake is trying to reopen the default output file handle STDOUT. I assume this is because you do not know how print works, i.e. that you can supply a file handle to print to print FILEHANDLE "text". Or that you did not know that STDOUT was a pre-defined file handle.
Your other errors:
You did not use use strict; use warnings;. No program you write should be without these. They will prevent you from doing bad things, and give you information on errors, and will save you hours of debugging.
You should never "slurp" a file (read the entire file to a variable) unless you really need to, because this is ineffective and slow and for huge files will cause your program to crash due to lack of memory.
Never reassign the default file handles STDIN, STDOUT, STDERR, unless A) you really need to, B) you know what you are doing.
select sets the default file handle for print, read the documentation. This is rarely something that you need to concern yourself with. The variable $| sets autoflush on (if set to a true value) for the currently selected file handle. So what you did actually accomplished nothing, because OUTPUT_HANDLE is a non-existent file handle. If you had skipped the select statements, it would have set autoflush for STDOUT. (But you wouldn't have noticed any difference)
print uses print buffers because it is efficient. I assume you are trying to autoflush because you think your prints get caught in the buffer, which is not true. Generally speaking, this is not something you need to worry about. All the print buffers are automatically flushed when a program ends.
For the most part, you do not need to explicitly close file handles. File handles are automatically closed when they go out of scope, or when the program ends.
Using lexical file handles, e.g. open my $fh, ... instead of global, e.g. open FILE, .. is recommended, because of the previous statement, and because it is always a good idea to avoid global variables.
Using three-argument open is recommended: open FILEHANDLE, MODE, FILENAME. This is because you otherwise risk meta-characters in your file names to corrupt your open statement.
The quick fix:
Now, as I said in the comments, this -- or rather, what you intended, because this code is wrong -- is pretty much identical to the idiomatic usage of the -p command line switch:
perl -pi.bak -e 's/(.*?xsl.*?)xsl/$1xslt/' file.txt
This short little snippet actually does all that your program does, but does it much better. Explanation:
-p switch automatically assumes that the code you provide is inside a while (<>) { } loop, and prints each line, after your code is executed.
-i switch tells perl to do inplace-edit on the file, saving a backup copy in "file.txt.bak".
So, that one-liner is equivalent to a program such as this:
$^I = ".bak"; # turns inplace-edit on
while (<>) { # diamond operator automatically uses STDIN or files from #ARGV
s/(.*?xsl.*?)xsl/$1xslt/;
print;
}
Which is equivalent to this:
my $file = shift; # first argument from #ARGV -- arguments
open my $fh, "<", $file or die $!;
open my $tmp, ">", "/tmp/foo.bar" or die $!; # not sure where tmpfile is
while (<$fh>) { # read lines from org file
s/(.*?xsl.*?)xsl/$1xslt/;
print $tmp $_; # print line to tmp file
}
rename($file, "$file.bak") or die $!; # save backup
rename("/tmp/foo.bar", $file) or die $!; # overwrite original file
The inplace-edit option actually creates a separate file, then copies it over the original. If you use the backup option, the original file is first backed up. You don't need to know this information, just know that using the -i switch will cause the -p (and -n) option to actually perform changes on your original file.
Using the -i switch with the backup option activated is not required (except on Windows), but recommended. A good idea is to run the one-liner without the option first, so the output is printed to screen instead, and then adding it once you see the output is ok.
The regex
s/(.*?xsl.*?)xsl/$1xslt/;
You search for a string that contains "xsl" twice. The usage of .*? is good in the second case, but not in the first. Any time you find yourself starting a regex with a wildcard string, you're probably doing something wrong. Unless you are trying to capture that part.
In this case, though, you capture it and remove it, only to put it back, which is completely useless. So the first order of business is to take that part out:
s/(xsl.*?)xsl/$1xslt/;
Now, removing something and putting it back is really just a magic trick for not removing it at all. We don't need magic tricks like that, when we can just not remove it in the first place. Using look-around assertions, you can achieve this.
In this case, since you have a variable length expression and need a look-behind assertion, we have to use the \K (mnemonic: Keep) option instead, because variable length look-behinds are not implemented.
s/xsl.*?\Kxsl/xslt/;
So, since we didn't take anything out, we don't need to put anything back using $1. Now, you may notice, "Hey, if I replace 'xsl' with 'xslt', I don't need to remove 'xsl' at all." Which is true:
s/xsl.*?xsl\K/t/;
You may consider using options for this regex, such as /i, which causes it to ignore case and thus also match strings such as "XSL FOO XSL". Or the /g option which will allow it to perform all possible matches per line, and not just the first match. Read more in perlop.
Conclusion
The finished one-liner is:
perl -pi.bak -e 's/xsl.*?xsl\K/t/' file.txt
I have a module that uses IPC::Open3 (or IPC::Open2, both exhibit this problem) to call an external binary (bogofilter in this case) and feed it some input via the child-input filehandle, then reads the result from the child-output handle. The code works fine when run in most environments. However, the main use of this module is in a web service that runs under Apache 2.2.6. And under that environment, I get the error:
Cannot fdopen STDOUT: Invalid argument
This only happens when the code runs under Apache. Previously, the code constructed a horribly complex command, which included a here-document for the input, and ran it with back-ticks. THAT worked, but was very slow and prone to breaking in unique and perplexing ways. I would hate to have to revert to the old version, but I cannot crack this.
Could it be because mod_perl 2 closes STDOUT? I just discovered this and posted about it:
http://marc.info/?l=apache-modperl&m=126296015910250&w=2
I think it's a nasty bug, but no one seems to care about it thus far. Post a follow up on the mod_perl list if your problem is related and you want it to get attention.
Jon
Bogofilter returns different exit codes for spam/nonspam.
You can "fix" this by redirecting stdout to /dev/null
system("bogofilter < $input > /dev/null") >> 8;
Will return 0 for spam, 1 for nonspam, 2 for unknown (the >> 8 is because perl helpfully corrects the exit code, this fixes the damage).
Note: the lack of an environment may also prevent bogofilter from finding its wordlist, so pass that in explicitly as well:
system("bogofilter -d /path/to/.bogofilter/ < $input > /dev/null") >> 8;
(where /path/to/.bogofilter contains the wordlist.db)
You can't retrieve the actual rating that bogofilter gave that way, but it does get you something.
If your code is only going to be run on Linux/Unix systems it is easy to write an open3 replacement that does not fail because STDOUT is not a real file handle:
sub my_open3 {
# untested!
pipe my($inr), my($inw) or die;
pipe my($outr), my($outw) or die;
pipe my($errr), my($errw) or die;
my $pid = fork;
unless ($pid) {
defined $pid or die;
POSIX::dup2($inr, 0);
POSIX::dup2($outw, 1);
POSIX::dup2($errw, 2);
exec #_;
POSIX::_exit(1);
}
return ($inw, $outr, $errr);
}
my ($in, $out, $err) = my_open3('ls /etc/');
Caveat Emptor: I am not a perl wizard.
As #JonathanSwartz suggested, I believe the issue is that apache2 mod_perl closes STDIN and STDOUT. That shouldn't be relevant to what IPC::Open3 is doing, but it has a bug in it, described here.
In summary (this is the part I'm not super clear on), open3 tries to match the child processes STDIN/OUT/ERR to your process, or duplicate it if that was what is requested. Due to some undocumented ways that open('>&=X') works, it generally works fine, except in the case where STDIN/OUT/ERR are closed.
Another link that gets deep into the details.
One solution is to fix IPC::Open3, as described in both of those links. The other, which worked for me, is to temporarily open STDIN/OUT in your mod_perl code and then close it afterwards:
my ($save_stdin,$save_stdout);
open $save_stdin, '>&STDIN';
open $save_stdout, '>&STDOUT';
open STDIN, '>&=0';
open STDOUT, '>&=1';
#make your normal IPC::Open3::open3 call here
close(STDIN);
close(STDOUT);
open STDIN, '>&', $save_stdin;
open STDOUT, '>&', $save_stdout;
Also, I noticed a bunch of complaints around the net about IPC::Run3 suffering from the same problems, so if anyone runs into the same issue, I suspect the same solution would work.