How does Perl know how many bytes to read in a handle from IO::Select::->can_read? - perl

I'm using IO::Select's can_read method to select file handles that are ready for reading in a simple perl script.
However, the <...> operator on filehandles does not require a length to be passed to it.
Is IO::Select reaching inside the filehandle to set the "appropriate length" ... or what exactly is happening?
#!/usr/bin/env perl
use IO::Select;
use strict;
use warnings;
my @handles = IO::Select->new(\*STDIN)->can_read(3);
@handles == 1 or die;
my $handle = $handles[0];
print ("I read " . <$handle> . "\n");
For instance, the following script prints "a\n" immediately and then exits after three seconds.
% sh -c 'echo a; sleep 5; echo b' | perl reader.pl
I read a
Exit 141
It then exits abnormally for some strange reason ... not sure where the exit status is being set.
EDIT: the apparent abnormal exit appears to be a bug in tcsh.

Neither IO::Select nor the <...> operator (the readline function) knows the length. readline simply reads whatever is there until the end of the line, as defined by $/. If no line terminator is found, it returns all available data in the case of a non-blocking file handle, or waits for the end of the line or the end of the data in the case of a blocking file handle.
In your specific case, echo a; sleep 5; echo b results in a line a\n and, 5 seconds later, a line b\n. Since your code uses <...> in scalar context, it reads only a single line. This means it stops as soon as the first line ending is found and returns that line, resulting in a\n.
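A minimal sketch of the behaviour described above, using an in-process pipe as a stand-in for STDIN (that substitution and the variable names are assumptions made for illustration). readline returns as soon as it finds the line terminator among the bytes already available. Any extra bytes it pulled in stay in Perl's own buffer, where a later can_read cannot see them.

```perl
use strict;
use warnings;
use IO::Select;

# An in-process pipe stands in for STDIN so the sketch is self-contained.
pipe(my $reader, my $writer) or die "pipe failed: $!";

# One complete line plus an unterminated fragment, written unbuffered.
defined syswrite($writer, "a\nfrag") or die "syswrite failed: $!";

my $select = IO::Select->new($reader);
my @ready  = $select->can_read(3);
@ready == 1 or die "handle not ready";

# Scalar-context readline stops at the first "\n"; no length is needed.
my $line = readline($ready[0]);
print "I read $line";

# "frag" was consumed from the pipe into Perl's buffer along with the line,
# so the kernel has nothing left to report and select sees no readable data.
my @again = $select->can_read(0);
print "ready handles after readline: ", scalar(@again), "\n";
```

This is why mixing select with buffered reads is usually avoided; sysread reads straight from the file descriptor and takes an explicit maximum length.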

Related

Under what circumstances are END blocks skipped in Perl?

I have a long-running program that uses File::Temp::tempdir to create a temporary directory, and I sometimes interrupt it via ^C.
The following program prints the name of the temporary directory it creates and the name of a file in it.
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw[tempdir];
my $dir = tempdir(CLEANUP => 1);
print "$dir\n";
print "$dir/temp.txt\n";
`touch $dir/temp.txt`;
exit;
On OS X, this creates a directory inside /var/folders.
If the last line is exit; or die;, then the folder will get cleaned up and the temporary file inside it will get deleted.
However, if we replace the last line with sleep 20; and then interrupt the perl program via ^C, the temporary directory remains.
% perl maketemp.pl
/var/folders/dr/cg4fl5m11vg3jfxny3ldfplc0000gn/T/ycilyLSFs6
/var/folders/dr/cg4fl5m11vg3jfxny3ldfplc0000gn/T/ycilyLSFs6/temp.txt
^C
% stat /var/folders/dr/cg4fl5m11vg3jfxny3ldfplc0000gn/T/ycilyLSFs6/temp.txt
16777220 6589054 -rw-r--r-- 1 <name> staff 0 0 "Aug 1 20:46:27 2016" "Aug 1 20:46:27 2016" "Aug 1 20:46:27 2016" "Aug 1 20:46:27 2016" 4096 0 0
/var/folders/dr/cg4fl5m11vg3jfxny3ldfplc0000gn/T/ycilyLSFs6/temp.txt
%
Using a signal handler that just calls exit; does clean up the directory, e.g.
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw[tempdir];
$SIG{INT} = sub { exit; };
my $dir = tempdir(CLEANUP => 1);
print "$dir\n";
print "$dir/temp.txt\n";
`touch $dir/temp.txt`;
sleep 20;
As does using a "trivial" signal handler
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw[tempdir];
$SIG{INT} = sub { };
my $dir = tempdir(CLEANUP => 1);
print "$dir\n";
print "$dir/temp.txt\n";
`touch $dir/temp.txt`;
sleep 20;
I tried looking through the source code (https://github.com/Perl-Toolchain-Gang/File-Temp/blob/master/lib/File/Temp.pm) to determine how tempdir is registering a cleanup action
Here's the exit handler installation
https://github.com/Perl-Toolchain-Gang/File-Temp/blob/master/lib/File/Temp.pm#L1716
which calls _deferred_unlink
https://github.com/Perl-Toolchain-Gang/File-Temp/blob/master/lib/File/Temp.pm#L948
which modifies the global hashes %dirs_to_unlink and %files_to_unlink, but uses the pid $$ as a key for some reason (probably in case the Perl interpreter forks? Not sure why that's necessary though, since removing a directory seems like it would be an idempotent operation.)
The actual logic to clean up the files is here, in the END block.
https://github.com/Perl-Toolchain-Gang/File-Temp/blob/master/lib/File/Temp.pm#L878
A quick experiment shows that END blocks are skipped when perl is interrupted and no signal handler is installed:
sleep 20;
END {
print "5\n";
}
# does not print 5 when interrupted
And are run here
$SIG{INT} = sub {};
sleep 20;
END {
print "5\n";
}
# does print 5 when interrupted
So ... why does the END block get skipped after a SIGINT unless there's a signal handler, even one that seems like it should do nothing?
By default, SIGINT kills the process[1]. By kill, I mean the process is immediately terminated by the kernel. The process doesn't get to perform any cleanup.
By setting a handler for SIGINT, you override this behaviour. Instead of killing the process, the signal handler is called. It might not do anything, but its very existence prevents the process from being killed. In this situation, the program won't exit as a result of the signal unless it chooses to exit (by calling die or exit in the handler). If it does, it gets a chance to clean up as normal.
Note that if a signal for which a handler was defined comes in during a system call, the system call exits with error EINTR in order to allow the program to safely handle the signal. This is why sleep returns as soon as SIGINT is received.
If instead you had used $SIG{INT} = 'IGNORE';, the signal would have been completely ignored. Any system calls in progress wouldn't be interrupted.
[1] On my system, man 1 kill lists the default actions of signals.
Your signal handler $SIG{INT} = sub {} isn't doing nothing; it is trapping the signal and preventing the program from exiting.
But to answer your original question, perlmod says that an END block:
is executed as late as possible, that is, after perl has finished running the program and just before the interpreter is being exited, even if it is exiting as a result of a die() function. (But not if it's morphing into another program via exec, or being blown out of the water by a signal--you have to trap that yourself (if you can).)
That is, a fatal signal, if not trapped, circumvents Perl's global destruction and does not call END blocks.
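As a rough illustration of trapping the signal yourself, the sketch below starts a child perl whose SIGINT handler calls exit, interrupts it, and checks that its END block ran. The inline child program, the "ready" handshake and the exit code 130 (the shell convention of 128 plus the signal number) are assumptions chosen for the demonstration.

```perl
use strict;
use warnings;

# Child program: the handler turns SIGINT into an ordinary exit,
# so END blocks (and File::Temp cleanup) get a chance to run.
my $script = <<'CHILD';
$| = 1;
$SIG{INT} = sub { exit 130 };
END { print "END block ran\n" }
print "ready\n";
sleep 20;
CHILD

# List-form piped open runs the child directly, so $pid is the perl child.
my $pid = open(my $child, '-|', $^X, '-e', $script)
    or die "cannot start child: $!";

my $ready = <$child>;            # wait until the handler is installed
kill 'INT', $pid;

my $rest = do { local $/; <$child> } // '';
close $child;                    # reaps the child and sets $?
my $exit_code = $? >> 8;

print "child printed: $rest";
print "child exit code: $exit_code\n";
```

Removing the $SIG{INT} line from the child program should leave $rest empty, because the default action then terminates the process before END blocks run.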

Perl: process string with shell command (pipe)

Assume a pipeline with three programs:
start | middle | end
If start and end are now part of one perl script, how can I pipe data through a shell command in the perl script, in order to pass through middle?
I tried the following (apologies for lack of strict mode, it was supposed to be a simple proof of concept):
#!/usr/bin/perl -n
# Output of "start" stage
$start = "a b c d\n";
# This shell command is "middle"
open (PR, "| sed -E 's/a/-/g' |") or die 'Failed to start sed';
# Pipe data from "start" into "middle"
print PR $start;
# Read data from "middle" into "end"
$end = "";
while (<PR>) {
$end .= $_;
}
close PR;
# Apply "end" and print output
$end =~ s/b/+/g;
print $end;
Expected output:
- + c d
Actual output:
none, until I hit ENTER, then I get - b c d. The middle command is receiving data from start and processing it, but the output is going to STDOUT instead of end. Also, the attempt to read from middle seems to be reading from STDIN instead (hence the relevance of hitting ENTER).
I'm aware that this could all easily be done in one line of perl (or sed); my problem is how to do piping in perl, not how to replace chars in a string.
You can use IPC::Open2 for this.
This code creates two file handles: $to_sed, which you can print to to send input to the program, and $from_sed which you can readline (or <$from_sed>) from to read the program's output.
use IPC::Open2;
my $pid = open2(my ($from_sed, $to_sed), "sed -E 's/a/-/g'");
Most often it is simplest to involve the shell, but there is an alternative call that allows you to bypass the shell and instead run a program and populate its argv directly. It is described in the linked documentation.
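Putting the pieces together, a minimal sketch of the whole start | middle | end pipeline might look like this. It assumes sed is on PATH and uses the list form of open2, which runs sed directly without a shell. Writing everything before reading anything is only safe for small inputs; with large data both processes can block on full pipe buffers.

```perl
use strict;
use warnings;
use IPC::Open2;

# Output of the "start" stage.
my $start = "a b c d\n";

# "middle": sed, started without a shell.
my $pid = open2(my $from_sed, my $to_sed, 'sed', '-E', 's/a/-/g');

# Pipe data from "start" into "middle", then close the write end so that
# sed sees end-of-file and flushes its output.
print {$to_sed} $start;
close $to_sed;

# Read everything "middle" produced, then reap the child.
my $end = do { local $/; <$from_sed> };
close $from_sed;
waitpid($pid, 0);

# Apply "end" and print the output.
$end =~ s/b/+/g;
print $end;    # - + c d
```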
The reason your code does nothing until you hit enter is that you are using perl -n.
-n causes Perl to assume the following loop around your program, which makes it iterate over filename arguments somewhat like sed -n or awk:
LINE:
while (<>) {
... # your program goes here
}
The part in your code where you read your file again returns nothing.
If you turn on warnings you will discover that perl doesn't do bi-directional pipes.

IO::Pipe - close(<handle>) does not set $?

My understanding is that closing the handle for an IO::Pipe object should be done with the method ($fh->close) and not the built-in (close($fh)).
The other day I goofed and used the built-in out of habit on an IO::Pipe object that was opened to a command that I expected to fail. I was surprised when $? was zero, and my error checking wasn't triggered.
I realized my mistake. If I use the built-in, IO::Pipe can't perform the waitpid() and can't set $?. But what I was surprised by was that perl seemed to still close the pipe without setting $? via the core.
I worked up a little test script to show what I mean:
use 5.012;
use warnings;
use IO::Pipe;
say 'init pipes:';
pipes();
my $fh = IO::Pipe->reader(q(false));
say 'post open pipes:';
pipes();
say 'return: ' . $fh->close;
#say 'return: ' . close($fh);
say 'status: ' . $?;
say q();
say 'post close pipes:';
pipes();
sub pipes
{
for my $fd ( glob("/proc/self/fd/*") )
{
say readlink($fd) if -p $fd;
}
say q();
}
When using the method it shows the pipe being gone after the close and $? is set as I expected:
init pipes:
post open pipes:
pipe:[992006]
return: 1
status: 256
post close pipes:
And, when using the built-in it also appears to close the pipe, but does not set $?:
init pipes:
post open pipes:
pipe:[952618]
return: 1
status: 0
post close pipes:
It seems odd to me that the built-in results in the pipe closure, but doesn't set $?. Can anyone help explain the discrepancy?
Thanks!
If you look at the code for IO::Handle (of which IO::Pipe::End is a sub-class), you will see the following:
sub close {
@_ == 1 or croak 'usage: $io->close()';
my($io) = @_;
close($io);
}
It looks like $fh->close just calls close $fh. Of course, we should not be peeking behind the curtain.
We can see after IO::Pipe does a close $fh (behind the scenes), it then does a waitpid:
package IO::Pipe::End;
our(@ISA);
@ISA = qw(IO::Handle);
sub close {
my $fh = shift;
my $r = $fh->SUPER::close(@_); # <-- This just calls a CORE::close
waitpid(${*$fh}{'io_pipe_pid'},0)
if(defined ${*$fh}{'io_pipe_pid'});
$r;
}
Also interesting is this from the close Perldoc:
If the filehandle came from a piped open, close returns false if one of the other syscalls involved fails or if its program exits with non-zero status. If the only problem was that the program exited non-zero, $! will be set to 0.
Closing a pipe also waits for the process executing on the pipe to exit--in case you wish to look at the output of the pipe afterwards--and implicitly puts the exit status value of that command into $? and ${^CHILD_ERROR_NATIVE}.
That answers your question right there.
But what I was surprised by was that perl seemed to still close the pipe without setting $? via the core.
Why would it? It has no way to know the process at the other end is a child, much less one for which the program should wait. Since it has no reason to call waitpid, $? isn't going to get set.
In fact, I doubt it could wait for the process at the other end of the pipe even if it wanted to: I doubt there's a way of obtaining the pid of the process at the other end of the pipe, since there can actually be multiple processes at the other end of the pipe.
IO::Pipe::close only calls waitpid when IO::Pipe is used to "open a process".
Similarly, close only calls waitpid when open is used to "open a process".
A process "opened" using one method cannot be closed by the other.
It turns out that my confusion stems from a flawed assumption that the disappearing pipe coincided with a complete process termination. That appears to not be the case, as the process is still available for a wait().
> perl -MIO::Pipe -le 'my $io = IO::Pipe->reader(q(false)); close($io); print $?; print wait(); print $?'
0
8857
256
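The pairing described in these answers can be sketched side by side. This assumes a false command on PATH, as in the question; the variable names are illustrative only.

```perl
use strict;
use warnings;
use IO::Pipe;

# Core piped open: the built-in close reaps the child and sets $?.
open(my $core_fh, '-|', 'false') or die "cannot run false: $!";
my $core_ok     = close($core_fh);   # false, because the command failed
my $core_status = $?;

$? = 0;   # reset so the second measurement is independent of the first

# IO::Pipe: only the close method knows the child's pid and calls waitpid.
my $pipe_fh = IO::Pipe->new->reader('false');
$pipe_fh->close;
my $pipe_status = $?;

printf "core open: close %s, exit code %d\n",
    ($core_ok ? 'succeeded' : 'failed'), $core_status >> 8;
printf "IO::Pipe:  exit code %d\n", $pipe_status >> 8;
```

In both cases the close that matches the way the process was started is the one that waits for the child, which is why $? ends up set.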

Perl -> How return value of qx(perl file)

I need to know how to return values from one Perl file to another.
In my first file I call the second file with a statement similar to:
$variable = qx( perl file2.pl --param1 $p1 --param2 $p2);
I have tried to get this data using exit and return, but that does not work.
Any idea?
Processes are not subroutines.
Communication between processes (“IPC”) is mostly done via normal file handles. Such file handles can specifically be
STDIN and STDOUT,
pipes that are set up by the parent process, these are then shared by the child,
sockets
Every process also has an exit code. This code is zero for success, and non-zero to indicate a failure. The code can be any integer in the range 0–255. The exit code can be set via the exit function, e.g. exit(1), and is also set by die.
Using STDIN and STDOUT is the normal mode of operation for command line programs that follow the Unix philosophy. This allows them to be chained with pipes to more complex programs, e.g.
cat a b c | grep foo | sort >out
Such a tool can be implemented in Perl by reading from the ARGV or STDIN file handle, and printing to STDOUT:
while (<>) {
# do something with $_
print $output;
}
Another program can then feed data to that script, and read it from the STDOUT. We can use open to treat the output as a regular file handle:
use autodie;
open my $tool, "-|", "perl", "somescript.pl", "input-data"; # notice -| open mode
while (<$tool>) {
...
}
close $tool;
When you want all the output in one variable (scalar or array), you can use qx as a shortcut: my $tool_output = qx/perl somescript.pl input-data/, but this has two disadvantages: One, a shell process is executed to parse the command (shell escaping problems, inefficiency). Two, the output is available only when the command has finished. Using open on the other hand allows you to do parallel computations.
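To make both channels concrete, here is a small sketch in which a child process "returns" data on STDOUT and status through its exit code. The inline child program is a hypothetical stand-in for file2.pl, and the arguments p1 and p2 are placeholders.

```perl
use strict;
use warnings;

# Hypothetical stand-in for file2.pl: it "returns" data by printing it
# and reports status through its exit code.
my $child_code = 'print join(",", @ARGV), "\n"; exit 3;';

# List-form piped open: no shell is involved, so no quoting problems.
open(my $tool, '-|', $^X, '-e', $child_code, 'p1', 'p2')
    or die "cannot start child: $!";
my $output = do { local $/; <$tool> };
close $tool;                     # waits for the child and sets $?
my $exit_code = $? >> 8;

chomp $output;
print "output: $output, exit code: $exit_code\n";   # output: p1,p2, exit code: 3
```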
In file2.pl, you must print something to STDOUT. For example:
print "abc\n";
print is the solution.
Sorry for my idiot question!
$variable = system("perl file2.pl --param1 $p1 --param2 $p2");
# $variable holds the status returned by perl file2.pl; its exit code is $variable >> 8

Perl - Master script calling sub-scripts and return status

Here's the design I want to accomplish in Perl:
A master script calls multiple sub-scripts. The master script controls the calling of each sub-script in a particular sequence and records output from each sub-script in order to decide whether or not to call the next script.
Currently, I have a master script that calls the sub-script using a system() call, but I am having trouble having the sub-script communicate back status to the master script.
I do not want to use subroutines; I would really like to keep each sub-script's code separate.
To shed more light on the problem:
The sub-script should decide what to report back to the master script. For example: the sub-script sends code 1 when it finds a string value in the database, code 2 when it doesn't find the file it's looking for, and code 0 when everything goes fine.
Can't you just use exit codes for this?
my $code = system( 'perl', '-e', 'exit 2;' ) >> 8; # $code = 2
say "\$code=$code";
Exit codes can take 256 distinct values (0–255).
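A master script built on this idea could look roughly like the following sketch. The sub-scripts are hypothetical and written inline with perl -e so that the example is self-contained; in practice each entry would name a separate script file.

```perl
use strict;
use warnings;

# Hypothetical sub-scripts: a label plus the code each one runs.
my @steps = (
    [ 'step one',   'exit 0' ],
    [ 'step two',   'exit 2' ],   # e.g. "file not found"
    [ 'step three', 'exit 0' ],   # never reached
);

my @ran;
for my $step (@steps) {
    my ($name, $code) = @$step;

    my $status = system($^X, '-e', $code);
    die "could not run $name: $!" if $status == -1;
    die "$name was killed by signal " . ($status & 127) if $status & 127;

    my $exit_code = $status >> 8;
    push @ran, "$name=$exit_code";
    last if $exit_code != 0;      # stop the sequence on the first failure
}

print join(', ', @ran), "\n";     # step one=0, step two=2
```

Checking for -1 and for the low seven bits before shifting separates "could not start" and "killed by a signal" from an ordinary non-zero exit code.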
You can execute and capture output from system commands with backtick syntax.
# get result as scalar
$result = `ls -lA`;
# get the result as an array, each line of output is a separate array entry
@result = `ls -lA`;
Whenever you use the backtick syntax, the exit status of the command is also stored in the automatic variable $?.
You can then have the master script decide if the output is good or not using whatever logic you need.
Looking at Axeman's answer you could use the IPC::System::Simple module:
#!/usr/bin/perl
use warnings;
use 5.012;
use IPC::System::Simple qw(system $EXITVAL EXIT_ANY);
system( [2], 'perl', '-e', 'exit 2' );
say "EXITVAL: $EXITVAL";