Perl: STDOUT reopened as FH only for input

Perl: STDOUT reopened as FH only for input - perl

I have the following error message (with Perl 5):
Tools.pm: Filehandle STDOUT reopened as FH only for input at /usr/local/lib/perl5/site_perl/mach/5.20/Template/Provider.pm line 967.
I understand its cause: It is that STDOUT was closed and later the same FD was used for something unrelated to STDOUT.
The code does the right thing. The only problem is that this error message should not to be logged.
How to stop printing this error message into our log file?

Detailed handling of errors is outlined in Template::Manual::Config::ERROR.
It can be categorized in the constructor by specifying templates for exception types
my $template = Template->new({
ERRORS => {
user => 'user/index.html',
dbi => 'error/database',
default => 'error/default',
},
});
which can be raised using the THROW directive
[% THROW user.login 'no user id: please login' %]
or by calling throw method
$context->throw('user.passwd', 'Incorrect Password');
$context->throw('Incorrect Password'); # type 'undef'
or by Perl code by calling die, perhaps with a Template::Exception object.
How to use this to solve the problem is a matter of details, of which none are provided.
But you really want to find (user) code that triggers this and clean it up. For one, as noted by ikegami in a comment, don't close STDOUT but reopen it to /dev/null. (I'd say, never simply close standard streams.) For instance, if you just don't want to see STDOUT any more
open STDOUT, '>', '/dev/null';
or first copy it (uses dup2) before reopening so that you can restore it later
open my $SAVEOUT, '>&', 'STDOUT';
open STDOUT, '>', '/dev/null';
...
open STDOUT, '>', $SAVEOUT; # restore STDOUT
close $SAVEOUT; # if unneeded
(see open), or if feasible create a local *FOO and use that to save STDOUT.
The warning comes about since the lowest unused file descriptor is always used, and here fd 1 was vacated by closing STDOUT; but it is attempted to be used for input what isn't OK. As for why it's not OK and why emit a warning, this thread and an old bug report are useful. This goes beyond Perl.
One generic way is to use the __WARN__ hook
BEGIN {
$SIG{__WARN__} = sub {
warn #_
unless $_[0] ~= /Filehandle STDOUT reopened as FH only for input/
}
};
where the warning is emitted unless it matches the one you want to suppress.
This BEGIN block need be before the use statements for modules it is expected to affect. If you know the scope at which this is needed it is better to localize it, local $SIG{__WARN__} = sub {...};
See this post and its links, to other posts and relevant documentation, for more details.

Related

Capture NOTICE from PostgreSQL in Perl DBI

I'm using full text search with user generated input on a PostgreSQL table, "the a" in this example.
my $dbh = DBI->connect("...", { RaiseError => 0, AutoCommit => 1 });
my $sth = $dbh->prepare("SELECT name from mytable WHERE tsv ## plainto_tsquery('the a')");
my $rv = $sth->execute();
If the user input only contains stop words, I get a NOTICE on STDERR and the query returns no results, but produces no error.
I would like to capture that NOTICE in Perl to possibly alert the user to rephrase the search, but I can't seem to access it.
Setting RaiseError to 1 doesn't change anything and $dbh->pg_notifies returns undef.
Any ideas ?

I presume that the mention of "NOTICE" means that something does a RAISE. How that behaves is configurable but delving into that would require far more detail.
At perl level, there are various ways to get to the STDERR stream and capture what's sent to it.
If these are warn-ings then setting a hook for $SIG{__WARN__} runs that subroutine whenever a warning is to be printed
{
local $SIG{__WARN__} = sub { say "Got warning: $_[0]"; };
warn "a warning";
}
So you can capture it this way, do what you wish with it, and then perhaps reemit. The $_[0] in the example has the string a warning, and after the anonymous sub runs the control is back at the next line after a warning is issued (the warn statement in this snippet). See %SIG.
I put this in a block only to be able to local-ize the change to SIG{__WARN__}, what is practically compulsory (if this global isn't local-ized its change affects all code). So if this code is in a suitable lexical scope anyway then that block isn't needed.
But this won't catch things printed directly to STDERR. For that the most practical way is to use libraries, and a simple and convenient one for this purpose is Capture::Tiny
use Capture::Tiny qw(capture);
my ($stdout, $stderr, $exit) = capture {
say "to STDOUT";
say STDERR "to STDERR";
warn "a warn-ing";
# ...
};
Now $stdout has text to STDOUT while $stderr has to STDERR followed by a warn-ing. So your database code would go in such a block and the NOTICE should wind up in that $stderr variable. There is also a capture_stderr function if you prefer to capture only that this way.

That doesn't seem to work as it should, and it is recognized as an unsolved bug.

Perl: `die` did not work upon opening a nonexistent gz file using gzip

The following script creates a gziped file named "input.gz". Then the script attempts to open "input.gz" using gzip -dc. Intuitively, die should be triggered if a wrong input file name is provided. However, as in the following script, the program will not die even if a wrong input file name is provided ("inputx.gz"):
use warnings;
use strict;
system("echo PASS | gzip -c > input.gz");
open(IN,"-|","gzip -dc inputx.gz") || die "can't open input.gz!";
print STDOUT "die statment was not triggered!\n";
close IN;
The output of the script above was
die statment was not triggered!
gzip: inputx.gz: No such file or directory
My questions is: why wasn't die statement triggered even though gzip quit with error? And how can I make die statement triggered when a wrong file name is given?

It's buried in perlipc, but this seems relevant (emphasis added):
Be careful to check the return values from both open() and close(). If you're writing to a pipe, you should also trap SIGPIPE. Otherwise, think of what happens when you start up a pipe to a command that doesn't exist: the open() will in all likelihood succeed (it only reflects the fork()'s success), but then your output will fail--spectacularly. Perl can't know whether the command worked, because your command is actually running in a separate process whose exec() might have failed. Therefore, while readers of bogus commands return just a quick EOF, writers to bogus commands will get hit with a signal, which they'd best be prepared to handle.
Use IO::Uncompress::Gunzip to read gzipped files instead.

The open documentation is explicit about open-ing a process since that is indeed different
If you open a pipe on the command - (that is, specify either |- or -| with the one- or two-argument forms of open), an implicit fork is done, so open returns twice: in the parent process it returns the pid of the child process, and in the child process it returns (a defined) 0. Use defined($pid) or // to determine whether the open was successful.
For example, use either
my $child_pid = open(my $from_kid, "-|") // die "Can't fork: $!";
or
my $child_pid = open(my $to_kid, "|-") // die "Can't fork: $!";
(with code following that shows one use of this, which you don't need) The main point is to check for defined -- by design we get undef if open for a process fails, not just any "false."
While this should be corrected, keep in mind that the open call fails if fork itself fails, what is rare; in most cases when a "command fails" the fork was successful but something later wasn't. So in such cases we just cannot get the // die message, but end up seeing messages from the shell or command or OS, hopefully.
This is alright though, if informative messages indeed get emitted by some part of the process. Wrap the whole thing in eval and you'll have manageable error reporting.
But it is in general difficult to ensure to get all the right messages, and in some cases not possible. One good approach is to use a module for running and managing external commands. Among the many other advantages they also usually handle errors much more nicely. If you need to handle process's output right as it is emitted I recommend IPC::Run (which i'd recommend otherwise as well).
Read on what linked docs say, for specific examples on what you need and for much useful insight.
In your case
# Check input, depending on how it is given,
# consider String::ShellQuote if needed
my $file = ...;
my #cmd = ('gzip', '-dc', $file);
my $child_pid = open my $in, '-|', #cmd
// die "Can't fork for '#cmd': $!";
while (<$in>) {
...
}
close $in or die "Error closing pipe: $!";
Note a few other points
the "list form" of the command bypasses the shell
lexical filehandle (my $fh) is much better than typeglobs (IN)
print the actual error in the die statement, in $! variable
check close for a good final check on how it all went

Fallback Open File Perl

I was trying to write a program where perl opens one file, but falls back to another if that file does not exist or cannot be opened for some reason. The relevant line was:
open(my $fh,"<","/path/to/file") or open (my $fh,"<","/path/to/alternate/file") or die
Eventually, I figured out that:
open(my $fh,"<","/path/to/file") or open ($fh,"<","/path/to/alternate/file") or die
worked. What is the difference between these two statements, why doesn't the first work, and is the second the right way to do this, or are there still some problems with it?
Edit: If it matters, I'm using perl 5.12, and the first fails in the case that "/path/to/file" exists. My inclination is that the second open should not run if the first open is successful, so why is $fh being overwritten by the second?

my declares a variable. If you use it twice with the same name in the same scope, later mentions of it will be the second one, not the first. Your code will trigger a "my" variable ... masks earlier declaration in the same statement warning (if you enable warnings as you should.) So if the first open succeeds, it sets a $fh variable that isn't accessible later, and the second variable is left in an undocumented, undefined state, because its declaration wasn't actually executed. (See the "Here be dragons" warning in perldoc perlsyn, and realize that A or B is equivalent to B unless A.)
Your "working" code is also broken; while my returns the newly declared variable, which can be then set, the scope of a lexical (where later mentions of it find the variable) doesn't actually begin until the following statement. So your first $fh is the lexical that will be accessed on later lines, but the second is actually a global variable (or an error, if you are using strict as you should).
Correct code is:
my $fh;
open $fh, ... or open $fh, ...;

Others have said why the existing code doesn't work, but have also offered versions that have race conditions: the state of the file might change between when you checked it and when you opened it. It's fairly benign in your case, but it can produce subtle bugs and security holes. In general, you check if you can open a file by trying to open a file.
Here's a more general way which scales to multiple files, lets you know which file opened, and contains no race conditions.
use Carp;
sub try_open {
my #files = #_;
for my $file (#files) {
if( open my $fh, "<", $file ) {
return { fh => $fh, file => $file };
}
}
croak "Can't open any of #files";
}

In Perl, why does print not generate any output after I close STDOUT?

I have the code:
open(FILE, "<$new_file") or die "Cant't open file \n";
#lines=<FILE>;
close FILE;
open(STDOUT, ">$new_file") or die "Can't open file\n";
$old_fh = select(OUTPUT_HANDLE);
$| = 1;
select($old_fh);
for(#lines){
s/(.*?xsl.*?)xsl/$1xslt/;
print;
}
close(STDOUT);
STDOUT -> autoflush(1);
print "file changed";
After closing STDOUT closing the program does not write the last print print "file changed". Why is this?
*Edited* Print message I want to write on Console no to file

I suppose it is because print default filehandle is STDOUT, which at that point it is already closed. You could reopen it, or print to other filehandle, for example, STDERR.
print STDERR "file changed";

It's because you've closed the filehandle stored in STDOUT, so print can't use it anymore. Generally speaking opening a new filehandle into one of the predefined handle names isn't a very good idea because it's bound to lead to confusion. It's much clearer to use lexical filehandles, or just a different name for your output file. Yes you then have to specify the filehandle in your print call, but then you don't have any confusion over what's happened to STDOUT.

A print statement will output the string in the STDOUT, which is the default output file handle.
So the statement
print "This is a message";
is same as
print STDOUT "This is a message";
In your code, you have closed STDOUT and then printing the message, which will not work. Reopen the STDOUT filehandle or do not close it. As the script ends, the file handles will be automatically closed

open OLDOUT, ">&", STDOUT;
close STDOUT;
open(STDOUT, ">$new_file") or die "Can't open file\n";
...
close(STDOUT);
open (STDOUT, ">&",OLDOUT);
print "file changed";

You seem to be confused about how file IO operations are done in perl, so I would recommend you read up on that.
What went wrong?
What you are doing is:
Open a file for reading
Read the entire file and close it
Open the same file for overwrite (org file is truncated), using the STDOUT file handle.
Juggle around the default print handle in order to set autoflush on a file handle which is not even opened in the code you show.
Perform a substitution on all lines and print them
Close STDOUT then print a message when everything is done.
Your main biggest mistake is trying to reopen the default output file handle STDOUT. I assume this is because you do not know how print works, i.e. that you can supply a file handle to print to print FILEHANDLE "text". Or that you did not know that STDOUT was a pre-defined file handle.
Your other errors:
You did not use use strict; use warnings;. No program you write should be without these. They will prevent you from doing bad things, and give you information on errors, and will save you hours of debugging.
You should never "slurp" a file (read the entire file to a variable) unless you really need to, because this is ineffective and slow and for huge files will cause your program to crash due to lack of memory.
Never reassign the default file handles STDIN, STDOUT, STDERR, unless A) you really need to, B) you know what you are doing.
select sets the default file handle for print, read the documentation. This is rarely something that you need to concern yourself with. The variable $| sets autoflush on (if set to a true value) for the currently selected file handle. So what you did actually accomplished nothing, because OUTPUT_HANDLE is a non-existent file handle. If you had skipped the select statements, it would have set autoflush for STDOUT. (But you wouldn't have noticed any difference)
print uses print buffers because it is efficient. I assume you are trying to autoflush because you think your prints get caught in the buffer, which is not true. Generally speaking, this is not something you need to worry about. All the print buffers are automatically flushed when a program ends.
For the most part, you do not need to explicitly close file handles. File handles are automatically closed when they go out of scope, or when the program ends.
Using lexical file handles, e.g. open my $fh, ... instead of global, e.g. open FILE, .. is recommended, because of the previous statement, and because it is always a good idea to avoid global variables.
Using three-argument open is recommended: open FILEHANDLE, MODE, FILENAME. This is because you otherwise risk meta-characters in your file names to corrupt your open statement.
The quick fix:
Now, as I said in the comments, this -- or rather, what you intended, because this code is wrong -- is pretty much identical to the idiomatic usage of the -p command line switch:
perl -pi.bak -e 's/(.*?xsl.*?)xsl/$1xslt/' file.txt
This short little snippet actually does all that your program does, but does it much better. Explanation:
-p switch automatically assumes that the code you provide is inside a while (<>) { } loop, and prints each line, after your code is executed.
-i switch tells perl to do inplace-edit on the file, saving a backup copy in "file.txt.bak".
So, that one-liner is equivalent to a program such as this:
$^I = ".bak"; # turns inplace-edit on
while (<>) { # diamond operator automatically uses STDIN or files from #ARGV
s/(.*?xsl.*?)xsl/$1xslt/;
print;
}
Which is equivalent to this:
my $file = shift; # first argument from #ARGV -- arguments
open my $fh, "<", $file or die $!;
open my $tmp, ">", "/tmp/foo.bar" or die $!; # not sure where tmpfile is
while (<$fh>) { # read lines from org file
s/(.*?xsl.*?)xsl/$1xslt/;
print $tmp $_; # print line to tmp file
}
rename($file, "$file.bak") or die $!; # save backup
rename("/tmp/foo.bar", $file) or die $!; # overwrite original file
The inplace-edit option actually creates a separate file, then copies it over the original. If you use the backup option, the original file is first backed up. You don't need to know this information, just know that using the -i switch will cause the -p (and -n) option to actually perform changes on your original file.
Using the -i switch with the backup option activated is not required (except on Windows), but recommended. A good idea is to run the one-liner without the option first, so the output is printed to screen instead, and then adding it once you see the output is ok.
The regex
s/(.*?xsl.*?)xsl/$1xslt/;
You search for a string that contains "xsl" twice. The usage of .*? is good in the second case, but not in the first. Any time you find yourself starting a regex with a wildcard string, you're probably doing something wrong. Unless you are trying to capture that part.
In this case, though, you capture it and remove it, only to put it back, which is completely useless. So the first order of business is to take that part out:
s/(xsl.*?)xsl/$1xslt/;
Now, removing something and putting it back is really just a magic trick for not removing it at all. We don't need magic tricks like that, when we can just not remove it in the first place. Using look-around assertions, you can achieve this.
In this case, since you have a variable length expression and need a look-behind assertion, we have to use the \K (mnemonic: Keep) option instead, because variable length look-behinds are not implemented.
s/xsl.*?\Kxsl/xslt/;
So, since we didn't take anything out, we don't need to put anything back using $1. Now, you may notice, "Hey, if I replace 'xsl' with 'xslt', I don't need to remove 'xsl' at all." Which is true:
s/xsl.*?xsl\K/t/;
You may consider using options for this regex, such as /i, which causes it to ignore case and thus also match strings such as "XSL FOO XSL". Or the /g option which will allow it to perform all possible matches per line, and not just the first match. Read more in perlop.
Conclusion
The finished one-liner is:
perl -pi.bak -e 's/xsl.*?xsl\K/t/' file.txt

Is it kosher to assign to $! in Perl?

Is it OK to assign to $! on an error in Perl?
E.g.,
if( ! (-e $inputfile))
{
$! = "Input file $inputfile appears to be non-existent\n";
return undef;
}
This way I can handle all errors at the top-level.
Thanks.

If you assign to $!, it is placed in the system errno variable, which only takes numbers. So you can in fact do:
use Errno "EEXIST";
$! = EEXIST;
print $!;
and get the string value for a defined system error number, but you can't do what you want - setting it to an arbitrary string. Such a string will get you a Argument "..." isn't numeric in scalar assignment warning and leave errno set to 0.
The other problem is that $! may be changed by any system call. So you can only trust it to have the value you set until you do a print or just about anything else. You probably want your very own error variable.

Well, the documentation says it's an indicator for system errors, basically. So I wouldn't assign to it, you just make your lib's users mad.
Use exceptions instead:
eval { # this ain't the evil eval
# some code
die $something;
}
if (my $err = $#) {
# exception handling
}
Note that you can "throw", or die, with about anything you need..

My Rabbi said "no!"

Setting $! is fine, but:
you should do it at the end of your function
you should return a different value that indicates that an error occurred
you should use your OS's errno values rather than strings for setting purposes
the checking code needs to check the value should do so immediately on failure of the function (and only if a failure is indicated by the function)
Something to keep in mind is that die uses the value in $! for its exit code (so long as it is not zero).

Yes you can assign stuff(#'s) to $!, just be wary of where you do it so you don't mess up some other functions message.

If you only have one variable to store errors, you'll have problems if you have more than one error occurring in your program before checking the status of your error variable. That's worth avoiding if you can help it.
Thankfully in Perl you can help it. A really nice solution is to use object-oriented exception handling from Error.pm. This module will allow you to write try/catch blocks, like this:
try {
some code;
code that might thrown an exception;
more code;
return;
}
catch Error with {
my $ex = shift; # Get hold of the exception object
# handle the exception;
};
The CPAN documentation for the module is quite good, and there is a Perl.com article on the subject too.

$! has so many caveats, being a global variable which lots of functions assign to (some of them C functions which Perl calls), that I would simply throw an exception (which in Perl means dying) and let the user trap it if they care. So instead of writing:
$obj->foo or die $!;
$obj->bar or die $!;
$obj->baz or die $!;
or even
$obj->foo or die $obj->error;
$obj->bar or die $obj->error;
$obj->baz or die $obj->error;
you can just write
$obj->foo;
$obj->bar;
$obj->baz;
and know that if there is an error you'll be informed of it. Also anyone above you will be informed and can trap it. Since that's the most common case make it happen without the user needing to remember it and type it over and over and over again.
Should you want to ignore or recover from the error, simply use eval BLOCK.
eval { $obj->foo }; # don't care if it works or not
eval { $obj->bar } or do { ...something when it doesn't work... };
Since this is the exceptional case it should be where the user has to remember to add more code and does more work.
Examples of this approach include DBI's RaiseError flag, which is only off by default for backwards compatibility, and the wonderful autodie module.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse