Set a filehandle so that prints to it are quietly skipped? - perl

This strange interest comes from expanding requirements and no time to change design (refactor). This is not good design, sure, but I need to deal with it now and hope to refactor later.
There are a few log files opened early on which are printed to throughout code. The new requirement implies that with a (new) command-line option (--noflag) one of these log files is irrelevant.
All I could do at the moment is to pad the definition (open my $fh, ...) and all uses of it (print $fh ...) with if $flag. This is clearly bad design and it is error prone (it isn't pretty either).
Is there a way to do something with $fh when it is associated with the file
so that any following print $fh ... is accepted by the interpreter but simply doesn't run the print, without error? (Let me imagine something like, say, $fh = VOID if $flag;.) Or, is there some NULL stream or such? All I know of are STDOUT (1), STDERR (2), and STDIN (0).
I do not want $fh to print anywhere else, ideally not even to /dev/null (if that is possible?). I did look around and couldn't find anything related. I'd appreciate being pointed to information if in fact it is out there already.
Any ideas are appreciated.
PS. First question ever asked here (after years of using SO), please let me know if it's off.
UPDATE
Thanks for responses. They prompt me to add to/refine this question: Are prints marked to go to /dev/null possibly optimized, so that the 'printing' actually doesn't happen? (While I am still interested in whether it is possible to set a filehandle so to tell to Perl 'do not print here'.)
I am trying to avoid running void (print) statements, without adding conditionals.
Update/Clarification
To summarize a bit from comments (thank you!): This was not a quest for performance optimization. I completely agree with everything said in comments on this. It is simply that executing pointless statements (typically around a million) makes me uneasy. Also, I was curious about some possible dark corner of Perl that I haven't run into. (Most of this has been addressed in answers/comments.)

If you are on a Unix-like operating system, you can use /dev/null:
open my $fh, '>', '/dev/null' or die 'This should never happen';
/dev/null will silently accept all input.
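A minimal sketch of how that could slot into the existing setup (the option name and log path here are stand-ins for whatever the real program uses):
use strict;
use warnings;
use Getopt::Long;

GetOptions('flag!' => \(my $flag = 1)) or die "Bad options\n";

# Open the real log when the flag is on, the bit bucket otherwise;
# every later "print $fh ..." then stays exactly as it is.
my $log_path = $flag ? 'run.log' : '/dev/null';
open my $fh, '>', $log_path or die "Can't open $log_path: $!";

print $fh "this line disappears under --noflag\n";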

Closing your filehandle
close $fh;
will make all your prints to that file handle fail. Run
no warnings 'closed';
to suppress all the warning messages that this would otherwise generate (you do use warnings, right?)
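A short sketch of that approach; note that no warnings is lexically scoped, so it has to be in effect wherever the prints happen (the $flag variable is a stand-in for the real option handling):
use strict;
use warnings;

my $flag = 0;    # pretend --noflag was given

open my $fh, '>', 'run.log' or die "Can't open run.log: $!";
close $fh if !$flag;    # from here on, prints to $fh quietly fail

{
    no warnings 'closed';    # silence "print() on closed filehandle"
    print $fh "dropped once the handle is closed\n";
}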

Through the magic of tie, you could create a magical handle for which operations are always successful.
perl -e'
{
    package Handle::Dummy;
    use Tie::Handle qw( );
    use Symbol qw( gensym );
    our @ISA = qw( Tie::Handle );
    sub new { my $fh = gensym; tie *$fh, $_[0]; $fh }
    sub TIEHANDLE { bless(\my $dummy, $_[0]) }
    sub READ  { return 1; }
    sub WRITE { return 1; }
    sub CLOSE { return 1; }
}

my $fh = Handle::Dummy->new();
print($fh "abc\n") or die $!;
close($fh) or die $!;
print("ok\n");
'
ok
That avoids the system calls, but it replaces them with expensive Perl subroutine calls.
It's far simpler and more reliable[1] to simply use /dev/null. It could very well be faster too.
Are prints marked to go to /dev/null possibly optimized
No. Perl doesn't know anything about /dev/null.
How slow do you think a system call is? This doesn't sound like the right thing to optimize!
The magical file handle is not associated with a system file handle, so it can't be passed to a C library, it won't survive exec, etc.
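If the relative cost really matters, measuring beats guessing. A rough sketch, assuming the Handle::Dummy package from above is already defined (numbers will vary by system):
use Benchmark qw( cmpthese );

open my $null, '>', '/dev/null' or die "Can't open /dev/null: $!";
my $dummy = Handle::Dummy->new();

# Compare a real write that the kernel discards against a tied handle
# that never leaves Perl.
cmpthese(-2, {
    dev_null => sub { print $null  "x" x 80, "\n" },
    tied     => sub { print $dummy "x" x 80, "\n" },
});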

You can use an anonymous temporary file (described in perldoc -f open) like so:
#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Long;

my $fh;
my $need_log = 2;
print "Initial need_log: $need_log\n";
GetOptions('flag!' => \$need_log);
print "After option processing, need_log: ", $need_log, "\n";

if ($need_log) {
    open($fh, '>', "log.txt") or die "Failed to open log: $!\n";
}
else {
    open($fh, '>', undef) or die "Failed to open anonymous temp file: $!\n";
}

print $fh "Hello World... NOT\n";
exit 0;
Here are a few runs with different uses of the --flag option:
User@Ubuntu:~$ ls -l log.txt
ls: cannot access log.txt: No such file or directory
User@Ubuntu:~$ ./nf.pl
Initial need_log: 2
After option processing, need_log: 2
User@Ubuntu:~$ cat log.txt
Hello World... NOT
User@Ubuntu:~$ rm log.txt
User@Ubuntu:~$ ./nf.pl --flag
Initial need_log: 2
After option processing, need_log: 1
User@Ubuntu:~$ cat log.txt
Hello World... NOT
User@Ubuntu:~$ rm log.txt
User@Ubuntu:~$ ./nf.pl --noflag
Initial need_log: 2
After option processing, need_log: 0
User@Ubuntu:~$ cat log.txt
cat: log.txt: No such file or directory
User@Ubuntu:~$
I've initialized the $need_log variable to '2' so that we can tell if it has a 'True' value as a result of the flag option being present (in which case it will have the value 1) or as a result of no mention of the flag option at all (in which case it will have the value 2).
Specifying '--noflag' triggers the else clause, whose open call has undef as the third argument, which creates an anonymous temporary file. This doesn't perfectly match your goal of not writing at all, but if the file is temporary and you're not putting gigabytes into it, this will hopefully suffice.
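Incidentally, if the anonymous temporary file is opened read-write ('+>', the form perldoc -f open documents for this feature), you can even seek back and re-read what was written. A small sketch:
use strict;
use warnings;

# '+>' opens the anonymous temp file read-write, so it can be read back.
open my $tmp, '+>', undef or die "Can't open anonymous temp file: $!";

print $tmp "scratch data\n";
seek $tmp, 0, 0;         # rewind to the beginning
my $line = <$tmp>;       # "scratch data\n"
print "read back: $line";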

Related

simply tee in Perl without fork, File::Tee, or piping to tee

Is there a simple way in Perl to send STDOUT or STDERR to multiple places without forking, using File::Tee, or opening a pipe to /usr/bin/tee?
Surely there is a way to do this in pure perl without writing 20+ lines of code, right? What am I missing? Similar questions have been asked, both here on SO and elsewhere, but none of the answers satisfy the requirements that I not have to:
- fork
- use File::Tee / IO::Tee / some other module+dependencies whose code footprint is 1000x larger than my actual script
- open a pipe to the actual tee command
I can see the use of a Core module as a tradeoff here, but really is that needed?
It looks like I can simply do this:
BEGIN {
    open my $log, '>>', 'error.log' or die $!;
    $SIG{__WARN__} = sub { print $log @_ and print STDERR @_ };
    $SIG{__DIE__}  = sub { warn @_ and exit 1 };
}
This simply and effectively sends most error messages both to the original STDERR and to a log file (apparently stuff trapped in an eval doesn't show up, I'm told). So there are downsides to this, mentioned in the comments. But as mentioned in the original question, the need was specific. This isn't meant for reuse. It's for a simple, small script that will never be more than 100 lines long.
If you are looking for a way to do this that isn't a "hack", the following was adapted from http://grokbase.com/t/perl/beginners/096pcz62bk/redirecting-stderr-with-io-tee
use IO::Tee;
open my $save_stderr, '>&STDERR' or die $!;
close STDERR;
open my $error_log, '>>', 'error.log' or die $!;
*STDERR = IO::Tee->new( $save_stderr, $error_log ) or die $!;
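A quick way to verify the redirection works (the message text is just an example):
print STDERR "this line should appear on the terminal and in error.log\n";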

Perl: Pass one byte plus STDIN to another command

I would like to do this efficiently:
my $buf;
my $len = read(STDIN, $buf, 1);
if ($len) {
    # Not empty
    open(OUT, "|-", "wc") || die;
    print OUT $buf;
    # This is the line I want to do faster
    print OUT <STDIN>;
    exit;
}
The task is to start wc only if there is any input. If there is no input, the program should just exit.
wc is just an example here. It will be substituted with a much more complex command.
The input can be of several TB of data, so I would really like to not touch that data at all (not even with a sysread). I tried doing:
pipe(STDIN,OUT);
But that doesn't work. Is there some other way that I can tell OUT that after it has gotten the first byte, it should just read from STDIN? Maybe some open(">=&2") gymnastics combined with exec?
The FIONREAD ioctl, mentioned in the Perl Cookbook, can tell you how many bytes are pending on a file descriptor without consuming them. In perlish terms:
use strict;
use warnings;

use IO::Select qw( );

BEGIN { require 'sys/ioctl.ph'; }

sub fionread {
    my $sz = pack('L', 0);
    return unless ioctl($_[0], FIONREAD, $sz);
    return unpack('L', $sz);
}

# Wait until it's known whether the handle has data to read or has reached EOF.
IO::Select->new(\*STDIN)->can_read();

if (fionread(\*STDIN)) {
    system('wc');
    # Check for errors
}
This should be very widely portable to UNIX and UNIX-like platforms.
A child process is always given duplicates of its parent's file handles, so simply starting wc - either with backticks or with a call to system or exec - will cause it to read from the same place as the Perl process's STDIN.
As for starting wc only when there is something to read, it looks like you need IO::Select, which will allow you either to check whether a file handle has something to read, or to block until it does have something.
This program will check whether STDIN has any data waiting, and run wc and print its output if so.
use strict;
use warnings;

use IO::Select;

my $select = IO::Select->new(\*STDIN);

if ( $select->can_read(0) ) {
    print `wc`;
}
The parameter to can_read is a timeout in seconds. Passing a value of zero makes it return immediately, reporting true (actually it returns the file handle itself) if there is data waiting, or false (undef) if not.
If you don't pass a parameter then can_read will wait forever until there is something to read, so you can suspend your program and wait for data for wc by writing just
$select->can_read;
print `wc`;
or you could combine the construction of the object to make it even more concise
IO::Select->new(\*STDIN)->can_read;
print `wc`;
Note also that IO::Select works fine with file descriptors too, and as the fileno for STDIN is zero, you could write
my $select = IO::Select->new(0);
but that isn't very descriptive and would need a comment to make sense
The specific solution in which you're interested is impossible.
As you surely discovered already, you can't normally determine whether a file handle has reached EOF without reading from it (although, as the FIONREAD answer above shows, it apparently can be done). select(2) will get you close: it will tell you that a handle has reached EOF or has data waiting, but it won't tell you which. This is why you're looking into alternate solutions. Unfortunately, the one you're looking into is just as impossible.
Is there some other way that I can tell OUT that after it has gotten the first byte, it should just read from STDIN?
No. OUT isn't code; it doesn't read anything. It's a variable. Furthermore, it's a variable in the parent. Changing a variable in the parent isn't going to affect the child.
Maybe you meant to ask: Can one tell the child program to start reading from a second handle?
No, generally speaking. You can't go and edit another program's variables. The program would have to be specifically written to accept two file handles and read from one after the other.
Then again, it's possible to obtain a file name for an arbitrary file handle, so all we need is a program that is specifically written to accept two file names and read from one after the other, and that's quite common.
$ echo abcdef | perl -MFcntl -e'
    if (sysread(STDIN, $buf, 1)) {
        pipe(my $r, my $w);
        my $pid = fork();
        if (!$pid) {
            close($w);

            # Clear close-on-exec flag.
            my $flags = fcntl($r, Fcntl::F_GETFD, 0);
            fcntl($r, Fcntl::F_SETFD, $flags & ~Fcntl::FD_CLOEXEC);

            exec("cat", "/proc/$$/fd/".fileno($r), "/proc/$$/fd/".fileno(STDIN));
            die $!;
        }

        close($r);
        print($w $buf);
        close($w);
        waitpid($pid, 0);
    }
'
abcdef
(Lots of error checking needed.)
Above, cat was used as an example of where your program would go, but that suggests another solution: why not just use cat? The overhead of cat should be quite minor for an IO-bound program.
use String::ShellQuote qw( shell_quote );
my $cmd1 = shell_quote("cat", "/proc/$$/fd/".fileno($r), "/proc/$$/fd/".fileno(STDIN));
my $cmd2 = ...
exec("$cmd1 | $cmd2");

How reliable is the -B file test?

When I open a SQLite database file there is a lot of readable text in the beginning of the file - how big is the chance that a SQLite file is filtered wrongly away due the -B file test?
#!/usr/bin/env perl
use warnings;
use strict;
use 5.10.1;
use File::Find;

my $dir = shift;
my $databases;

find( {
    wanted => sub {
        my $file = $File::Find::name;
        return if not -B $file;
        return if not -s $file;
        return if not -r $file;
        say $file;
        open my $fh, '<', $file or die "$file: $!";
        my $firstline = readline( $fh ) // '';
        close $fh or die $!;
        push @$databases, $file if $firstline =~ /\ASQLite\sformat/;
    },
    no_chdir => 1,
},
$dir );

say scalar @$databases;
The perlfunc man page has the following to say about -T and -B:
The -T and -B switches work as follows. The first block or so of the file is
examined for odd characters such as strange control codes or characters with
the high bit set. If too many strange characters (>30%) are found, it's a -B
file; otherwise it's a -T file. Also, any file containing a zero byte in the
first block is considered a binary file.
Of course you could now do a statistical analysis of a number of sqlite files, parse their "first block or so" for "odd characters", calculate the probability of their occurrence, and that would give you an idea of how likely it is that -B fails for sqlite files. (In fact, an SQLite database starts with the 16-byte header string "SQLite format 3" terminated by a NUL byte, and per the quoted documentation any zero byte in the first block makes a file count as binary, so -B should reliably classify SQLite files as binary.)
However, you could also go the easy route. Can it fail? Yes, it's a heuristic. And a bad one at that. So don't use it.
File type recognition on Unix is usually done by evaluating the file's content. And yes, there are people who've done all the work for you already: it's called libmagic (the thing that powers the file command line tool). You can use it from Perl with e.g. File::MMagic.
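A minimal sketch of that route (File::MMagic is a CPAN module; the exact MIME string it reports depends on the magic database it ships with):
use strict;
use warnings;
use File::MMagic;

my $mm   = File::MMagic->new;
my $type = $mm->checktype_filename($ARGV[0]);
print "$ARGV[0]: $type\n";    # e.g. "application/octet-stream"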
Well, all files are technically a collection of bytes, and thus binary. Beyond that, there is no accepted definition of binary, so it's impossible to evaluate -B's reliability unless you care to posit a definition by which it is to be evaluated.

In Perl, why does print not generate any output after I close STDOUT?

I have the code:
open(FILE, "<$new_file") or die "Can't open file\n";
@lines = <FILE>;
close FILE;

open(STDOUT, ">$new_file") or die "Can't open file\n";
$old_fh = select(OUTPUT_HANDLE);
$| = 1;
select($old_fh);

for (@lines) {
    s/(.*?xsl.*?)xsl/$1xslt/;
    print;
}

close(STDOUT);
STDOUT->autoflush(1);
print "file changed";
After closing STDOUT, the program does not write the last print, print "file changed". Why is this?
Edited: I want to print the message to the console, not to the file.
I suppose it is because print's default filehandle is STDOUT, which at that point is already closed. You could reopen it, or print to another filehandle, for example STDERR.
print STDERR "file changed";
It's because you've closed the filehandle stored in STDOUT, so print can't use it anymore. Generally speaking opening a new filehandle into one of the predefined handle names isn't a very good idea because it's bound to lead to confusion. It's much clearer to use lexical filehandles, or just a different name for your output file. Yes you then have to specify the filehandle in your print call, but then you don't have any confusion over what's happened to STDOUT.
A print statement will output the string to STDOUT, which is the default output file handle.
So the statement
print "This is a message";
is same as
print STDOUT "This is a message";
In your code, you have closed STDOUT and are then printing the message, which will not work. Reopen the STDOUT filehandle, or do not close it at all; when the script ends, the file handles will be closed automatically.
open OLDOUT, ">&", \*STDOUT;    # save a duplicate of STDOUT
close STDOUT;
open(STDOUT, ">$new_file") or die "Can't open file\n";
...
close(STDOUT);
open(STDOUT, ">&", \*OLDOUT);   # restore the saved STDOUT
print "file changed";
You seem to be confused about how file IO operations are done in perl, so I would recommend you read up on that.
What went wrong?
What you are doing is:
1. Open a file for reading.
2. Read the entire file and close it.
3. Open the same file for overwrite (the original file is truncated), using the STDOUT file handle.
4. Juggle around the default print handle in order to set autoflush on a file handle which is not even opened in the code you show.
5. Perform a substitution on all lines and print them.
6. Close STDOUT, then print a message when everything is done.
Your biggest mistake is trying to reopen the default output file handle, STDOUT. I assume this is because you did not know how print works, i.e. that you can supply a file handle to print to: print FILEHANDLE "text". Or that you did not know that STDOUT was a pre-defined file handle.
Your other errors:
- You did not use use strict; use warnings;. No program you write should be without these. They will prevent you from doing bad things, give you information on errors, and save you hours of debugging.
- You should never "slurp" a file (read the entire file into a variable) unless you really need to, because this is inefficient and slow, and for huge files will cause your program to crash due to lack of memory.
- Never reassign the default file handles STDIN, STDOUT, STDERR, unless A) you really need to, and B) you know what you are doing.
- select sets the default file handle for print; read the documentation. This is rarely something that you need to concern yourself with. The variable $| sets autoflush on (if set to a true value) for the currently selected file handle. So what you did actually accomplished nothing, because OUTPUT_HANDLE is a non-existent file handle. If you had skipped the select statements, it would have set autoflush for STDOUT. (But you wouldn't have noticed any difference.)
- print uses buffers because that is efficient. I assume you are trying to autoflush because you think your prints get caught in the buffer, which is not true. Generally speaking, this is not something you need to worry about. All print buffers are automatically flushed when a program ends.
- For the most part, you do not need to explicitly close file handles. File handles are automatically closed when they go out of scope, or when the program ends.
- Using lexical file handles, e.g. open my $fh, ... instead of global ones, e.g. open FILE, .., is recommended, because of the previous point, and because it is always a good idea to avoid global variables.
- Using three-argument open is recommended: open FILEHANDLE, MODE, FILENAME. This is because you otherwise risk meta-characters in your file names corrupting your open statement. (A sketch of these last two points follows this list.)
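To make the last two points concrete, here is a small sketch of the recommended style (the file name is invented for the example):
use strict;
use warnings;

my $filename = 'input.txt';    # hypothetical input file

# Lexical handle, three-argument open, and an error message that includes $!
open my $in, '<', $filename or die "Can't open $filename: $!";
while (my $line = <$in>) {
    chomp $line;
    # process $line here
}
# $in is closed automatically when it goes out of scope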
The quick fix:
Now, as I said in the comments, this -- or rather, what you intended, because this code is wrong -- is pretty much identical to the idiomatic usage of the -p command line switch:
perl -pi.bak -e 's/(.*?xsl.*?)xsl/$1xslt/' file.txt
This short little snippet actually does all that your program does, but does it much better. Explanation:
- The -p switch automatically assumes that the code you provide is inside a while (<>) { } loop, and prints each line after your code is executed.
- The -i switch tells perl to do an in-place edit on the file, saving a backup copy in "file.txt.bak".
So, that one-liner is equivalent to a program such as this:
$^I = ".bak"; # turns inplace-edit on
while (<>) { # diamond operator automatically uses STDIN or files from #ARGV
s/(.*?xsl.*?)xsl/$1xslt/;
print;
}
Which is equivalent to this:
my $file = shift;    # first argument from @ARGV
open my $fh, "<", $file or die $!;
open my $tmp, ">", "/tmp/foo.bar" or die $!;    # not sure where tmpfile is
while (<$fh>) {                                 # read lines from original file
    s/(.*?xsl.*?)xsl/$1xslt/;
    print $tmp $_;                              # print line to tmp file
}
rename($file, "$file.bak") or die $!;           # save backup
rename("/tmp/foo.bar", $file) or die $!;        # overwrite original file
The inplace-edit option actually creates a separate file, then copies it over the original. If you use the backup option, the original file is first backed up. You don't need to know this information, just know that using the -i switch will cause the -p (and -n) option to actually perform changes on your original file.
Using the -i switch with the backup option activated is not required (except on Windows), but recommended. A good idea is to run the one-liner without the option first, so the output is printed to screen instead, and then adding it once you see the output is ok.
The regex
s/(.*?xsl.*?)xsl/$1xslt/;
You search for a string that contains "xsl" twice. The usage of .*? is good in the second case, but not in the first. Any time you find yourself starting a regex with a wildcard string, you're probably doing something wrong, unless you are trying to capture that part.
In this case, though, you capture it and remove it, only to put it back, which is completely useless. So the first order of business is to take that part out:
s/(xsl.*?)xsl/$1xslt/;
Now, removing something and putting it back is really just a magic trick for not removing it at all. We don't need magic tricks like that, when we can just not remove it in the first place. Using look-around assertions, you can achieve this.
In this case, since you have a variable length expression and need a look-behind assertion, we have to use the \K (mnemonic: Keep) option instead, because variable length look-behinds are not implemented.
s/xsl.*?\Kxsl/xslt/;
So, since we didn't take anything out, we don't need to put anything back using $1. Now, you may notice, "Hey, if I replace 'xsl' with 'xslt', I don't need to remove 'xsl' at all." Which is true:
s/xsl.*?xsl\K/t/;
You may consider using options for this regex, such as /i, which causes it to ignore case and thus also match strings such as "XSL FOO XSL". Or the /g option which will allow it to perform all possible matches per line, and not just the first match. Read more in perlop.
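For instance, a tiny sketch of what /g changes (the sample string is invented for illustration):
my $s = "xsl a xsl b xsl c xsl";
(my $first = $s) =~ s/xsl.*?xsl\K/t/;     # first pair only: "xsl a xslt b xsl c xsl"
(my $every = $s) =~ s/xsl.*?xsl\K/t/g;    # every pair:      "xsl a xslt b xsl c xslt"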
Conclusion
The finished one-liner is:
perl -pi.bak -e 's/xsl.*?xsl\K/t/' file.txt

STDOUT to array Perl

I am running a Perl program, and I am writing the output (STDOUT) to a file. In the same program, I want to run another small script, in a while loop, on that output. So I need to save the output of the first script in an array, which I can then use in while (<@array>). Like
open(File,"text.txt");
open(STDOUT,">output,txt");
#file_contents=<FILE>;
foreach (#file_contents){
//SCRIPT GOES HERE//
write;
}
format STDOUT =
VARIABLE #<<<<<< #<<<<<< #<<<<<<
$x $y $z
.
//Here I want to use output of above program in while loop //
while(<>){
}
How can I save the output of the first program into an array so that I can use it in a while loop, or how can I directly use STDOUT in a while loop? I have to make sure that the first part is completely executed. Thanks in advance.
Since you remapped STDOUT so it writes to a file, you could presumably close STDOUT, and then reopen the file for reading.
Quite where you're going to send any other output is a bit of a mystery, but presumably you can resolve that. Were it me, I'd not fiddle with STDOUT. I'd make the script write to a file handle:
use strict;
use warnings;

open my $input, "<", "text.txt" or die "A horrible death";
open my $output, ">", "output.txt" or die "A horrible death";

my @file_contents = <$input>;
close($input);

foreach (@file_contents)
{
    # Script goes here
    print $output "Any information that goes to output\n";
}
close $output;

open my $reread, "<", "output.txt" or die "A horrible death";
while (<$reread>)
{
    # Process the previous output
}
Note the use of lexical file handles, the checking that the open worked, the close when finished with the input file, the use of use strict; and use warnings;. (I've only been working with Perl for 20 years and I know I don't trust my scripts until they run clean with those settings.)
I assume you want to reopen STDOUT in order to make the write function work. However, the correct solution for that is to either specify the file handle, or to a lesser extent, to use select.
write FILEHANDLE;
or
select FILEHANDLE;
write;
Unfortunately, it seems the IO of perlform is a bit arcane, and does not seem to allow for lexical file handles.
Your problem is you can't reuse the formatted text within the program, so a bit of trixy programming is required. What you can do is open a file handle that prints to a scalar. Which is another somewhat arcane perl functionality, but in this case, it might be the only way to do this directly.
# Using FOO as the format name to avoid destroying STDOUT
format FOO =
VARIABLE @<<<<<< @<<<<<< @<<<<<<
$x $y $z
.

my $foo;
use autodie;                    # save yourself some typing

open INPUT, '<', "text.txt";    # normally, we would add "or die $!" on these
open FOO, '>', \$foo;           # but now autodie handles that for us
open my $output, '>', "output.txt";

while (<INPUT>) {
    $foo = "";              # we need to reset $foo each iteration
    write FOO;              # write to the in-memory file handle instead
    print $output $foo;     # this now prints $foo to output.txt
    do_something($foo);     # now you can also process the text at the same time
}
As you'll notice, we now first print the formatted line to the scalar $foo. While it is there, we can handle it as regular data, so there's no need to save to a file and reopening it to get to the data.
Each iteration, data is concatenated to the end of $foo, so to avoid accumulation, we need to reset $foo. The best way to handle this would be to make $foo lexical within the scope, but unfortunately we need $foo to be declared outside the while loop in order to be able to use it in the open statement.
It might be possible to use local $foo inside the while-loop, but I think that's adding yet more bad practice to this already very bad hack.
Conclusion:
With all this said and done, I suspect the best way to handle this is to not use perlform at all, and format your data in some other way. While perlform might be well suited to print to a file, it is not the best suited for what you have in mind. I recall this question from earlier; perhaps there was some other answer that would work better, such as using sprintf, like Jonathan suggested.
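For completeness, a sketch of that sprintf route (the %-7s widths mimic the @<<<<<< fields above and are an assumption; note that %-7s pads but, unlike @<<<<<<, does not truncate):
# assumes $output, $x, $y and $z are set up as in the earlier loop
my $line = sprintf "VARIABLE %-7s %-7s %-7s\n", $x, $y, $z;
print $output $line;    # roughly the text that write FOO would have produced
do_something($line);    # and it can be reused directly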
Assuming the output from your first program is tab-delimited:
while (<>) {
    chomp $_;
    my ($variable, $x, $y, $z) = split("\t", $_);
    # do stuff with values
}