Perl: Pass one byte plus STDIN to another command

I would like to do this efficiently:

my $buf;
my $len = read(STDIN, $buf, 1);
if ($len) {
    # Not empty
    open(OUT, "|-", "wc") || die;
    print OUT $buf;
    # This is the line I want to do faster
    print OUT <STDIN>;
    exit;
}
The task is to start wc only if there is any input. If there is no input, the program should just exit.
wc is just an example here. It will be substituted with a much more complex command.
The input can be several TB of data, so I would really like not to touch that data at all (not even with a sysread). I tried doing:
pipe(STDIN,OUT);
But that doesn't work. Is there some other way that I can tell OUT that after it has gotten the first byte, it should just read from STDIN? Maybe some open(">=&2") gymnastics combined with exec?

The FIONREAD ioctl, mentioned in the Perl Cookbook, can tell you how many bytes are pending on a file descriptor without consuming them. In perlish terms:
use strict;
use warnings;

use IO::Select qw( );

BEGIN { require 'sys/ioctl.ph'; }

sub fionread {
    my $sz = pack('L', 0);
    return unless ioctl($_[0], FIONREAD, $sz);
    return unpack('L', $sz);
}

# Wait until it's known whether the handle has data to read or has reached EOF.
IO::Select->new(\*STDIN)->can_read();

if (fionread(\*STDIN)) {
    system('wc');
    # Check for errors
}
This should be very widely portable to UNIX and UNIX-like platforms.
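For example, assuming the script above is saved as fionread.pl (a hypothetical name), a quick shell check might look like this; the exact wc column spacing varies by platform:

$ echo hello | perl fionread.pl
      1       1       6
$ perl fionread.pl </dev/null
$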

A child process is always given duplicates of its parent's file handles, so simply starting wc - either with backticks or with a call to system or exec - will cause it to read from the same place as the Perl process's STDIN.
As for starting wc only when there is something to read, it looks like you need IO::Select, which will allow you either to check whether a file handle has something to read, or to block until it does have something.
This program will check whether STDIN has any data waiting, and run wc and print its output if so.
use strict;
use warnings;

use IO::Select;

my $select = IO::Select->new(\*STDIN);

if ( $select->can_read(0) ) {
    print `wc`;
}
The parameter to can_read is a timeout in seconds. Passing a value of zero makes it return immediately, reporting true (actually it returns the file handle itself) if there is data waiting, or false (undef) if not.
If you don't pass a parameter then can_read will wait forever until there is something to read, so you can suspend your program and wait for data for wc by writing just
$select->can_read;
print `wc`;
or you could combine the construction of the object to make it even more concise
IO::Select->new(\*STDIN)->can_read;
print `wc`;
Note also that IO::Select works fine with plain file descriptors too, and as the fileno for STDIN is zero, you could write
my $select = IO::Select->new(0);
but that isn't very descriptive and would need a comment to make sense.

The specific solution in which you're interested is impossible.
As you surely discovered already, you can't ordinarily determine whether a file handle has reached EOF without reading from it. (Apparently you can, with the FIONREAD ioctl shown above.) select(2) will get you close: it will tell you that a handle has either reached EOF or has data waiting, but it won't tell you which. This is why you're looking into alternate solutions. Unfortunately, the one you're looking into is just as impossible.
Is there some other way that I can tell OUT that after it has gotten the first byte, it should just read from STDIN?
No. OUT isn't code; it doesn't read anything. It's a variable. Furthermore, it's a variable in the parent. Changing a variable in the parent isn't going to affect the child.
Maybe you meant to ask: Can one tell the child program to start reading from a second handle?
No, generally speaking. You can't go and edit another program's variables. The program would have to be specifically written to accept two file handles and read from one after the other.
Then again, it's possible to obtain a file name for an arbitrary file handle, so all we need is a program that is specifically written to accept two file names and read from one after the other, and that's quite common.
$ echo abcdef | perl -MFcntl -e'
    if (sysread(STDIN, $buf, 1)) {
        pipe(my $r, my $w);
        my $pid = fork();
        if (!$pid) {
            close($w);

            # Clear the close-on-exec flag so the fd survives exec.
            my $flags = fcntl($r, Fcntl::F_GETFD, 0);
            fcntl($r, Fcntl::F_SETFD, $flags & ~Fcntl::FD_CLOEXEC);

            exec("cat", "/proc/$$/fd/".fileno($r), "/proc/$$/fd/".fileno(STDIN));
            die $!;
        }

        close($r);
        print($w $buf);
        close($w);
        waitpid($pid, 0);
    }
'
abcdef
(Lots of error checking needed.)
Above, cat was used as an example of where your program would go, but that presents another solution: why not just use cat? The overhead of cat should be quite minor for an IO-bound program.
use String::ShellQuote qw( shell_quote );
my $cmd1 = shell_quote("cat", "/proc/$$/fd/".fileno($r), "/proc/$$/fd/".fileno(STDIN));
my $cmd2 = ...
exec("$cmd1 | $cmd2");

Related

How to pipe to and read from the same tempfile handle without race conditions?

I was debugging a Perl script for the first time in my life and came across this:
$my_temp_file = File::Temp->tmpnam();
system("cmd $blah | cmd2 > $my_temp_file");
open(FIL, "$my_temp_file");
...
unlink $my_temp_file;
This works pretty much like I want, except for the obvious race conditions in lines 1-3. Even using a proper tempfile() there is no way (that I can think of) to ensure that the file streamed to at line 2 is the same one opened at line 3. One solution might be pipes, but the errors during cmd might occur late because of limited pipe buffering, and that would complicate my error handling (I think).
How do I:
1. Write all output from cmd $blah | cmd2 to a file handle opened with tempfile?
2. Read the output without re-opening the file (risking a race condition)?
You can open a pipe to a command and read its contents directly with no intermediate file:
open my $fh, '-|', 'cmd', $blah;
while ( <$fh> ) {
    ...
}
With short output, backticks might do the job, although in this case you have to be more careful to scrub the inputs so they aren't misinterpreted by the shell:
my $output = `cmd $blah`;
There are various modules on CPAN that handle this sort of thing, too.
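For instance, a hedged way to scrub such input is String::ShellQuote (which also appears earlier in this thread); cmd is just a placeholder here:

use String::ShellQuote qw( shell_quote );

# Quote each word separately so the shell can't misinterpret $blah.
my $safe = shell_quote('cmd', $blah);
my $output = `$safe`;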
Some comments on temporary files
The comments mentioned race conditions, so I thought I'd write a few things for those wondering what people are talking about.
In the original code, Andreas uses File::Temp, a module from the Perl Standard Library. However, they use the tmpnam POSIX-like call, which has this caveat in the docs:
Implementations of mktemp(), tmpnam(), and tempnam() are provided, but should be used with caution since they return only a filename that was valid when function was called, so cannot guarantee that the file will not exist by the time the caller opens the filename.
This approach is discouraged, and tmpnam was removed from the POSIX module in Perl v5.22.
That is, you get back the name of a file that does not exist yet. After you get the name, you don't know if that filename was made by another program. And, that unlink later can cause problems for one of the programs.
The "race condition" comes in when two programs that probably don't know about each other try to do the same thing as roughly the same time. Your program tries to make a temporary file named "foo", and so does some other program. They both might see at the same time that a file named "foo" does not exist, then try to create it. They both might succeed, and as they both write to it, they might interleave or overwrite the other's output. Then, one of those programs think it is done and calls unlink. Now the other program wonders what happened.
In the malicious exploit case, some bad actor knows a temporary file will show up, so it recognizes a new file and gets in there to read or write data.
But this can also happen within the same program. Two or more versions of the same program run at the same time and try to do the same thing. With randomized filenames, it is probably exceedingly rare that two running programs will choose the same name at the same time. However, we don't care how rare something is; we care how devastating the consequences are should it happen. And, rare is much more frequent than never.
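The safe pattern, which File::Temp builds on, is to ask the operating system to create the file atomically and fail if it already exists. A minimal sketch with sysopen:

use Fcntl qw( O_RDWR O_CREAT O_EXCL );

# O_EXCL makes creation atomic: if another process already created
# "foo", sysopen fails instead of silently sharing the file.
sysopen(my $fh, 'foo', O_RDWR | O_CREAT | O_EXCL, 0600)
    or die "someone beat us to it: $!";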
File::Temp
Knowing all that, File::Temp handles the details of ensuring that you get a filehandle:
my( $fh, $name ) = File::Temp->tempfile;
This uses a default template to create the name. When the filehandle goes out of scope, File::Temp also cleans up the mess.
{
    my( $fh, $name ) = File::Temp->tempfile;
    print $fh ...;
    ...;
} # file cleaned up
Some systems might automatically clean up temp files, although I haven't cared about that in years. Typically it was a batch thing (say, once a week).
I often go one step further by giving my temporary filenames a template, where the Xs are literal characters the module recognizes and fills in with randomized characters:
my( $fh, $name ) = File::Temp->tempfile(
    sprintf "$0-%d-XXXXXX", time );
I'm often doing this while I'm developing things so I can watch the program make the files (and in which order) and see what's in them. In production I probably want to obscure the source program name ($0) and the time; I don't want to make it easier to guess who's making which file.
A scratchpad
I can also open a temporary file with open by not giving it a filename. This is useful when you want to collect output but don't need the file outside the program. Opening it read-write means you can output some stuff and then move around in that file (we show a fixed-length record example in Learning Perl):
open(my $tmp, "+>", undef) or die ...
print $tmp "Some stuff\n";
seek $tmp, 0, 0;
my $line = <$tmp>;
File::Temp opens the temp file in O_RDWR mode so all you have to do is use that one file handle for both reading and writing, even from external programs. The returned file handle is overloaded so that it stringifies to the temp file name so you can pass that to the external program. If that is dangerous for your purpose you can get the fileno() and redirect to /dev/fd/<fileno> instead.
All you have to do is mind your seeks and tells. :-) Just remember to always set autoflush!
use File::Temp;
use Data::Dump;

$fh = File::Temp->new;
$fh->autoflush;

system "ls /tmp/*.txt >> $fh" and die $!;
@lines = <$fh>;
printf "%s\n\n", Data::Dump::pp(\@lines);

print $fh "How now brown cow\n";
seek $fh, 0, 0 or die $!;
@lines2 = <$fh>;
printf "%s\n", Data::Dump::pp(\@lines2);
Which prints

[
  "/tmp/cpan_htmlconvert_DPzx.txt\n",
  "/tmp/cpan_htmlconvert_DunL.txt\n",
  "/tmp/cpan_install_HfUe.txt\n",
  "/tmp/cpan_install_XbD6.txt\n",
  "/tmp/cpan_install_yzs9.txt\n",
]
[
  "/tmp/cpan_htmlconvert_DPzx.txt\n",
  "/tmp/cpan_htmlconvert_DunL.txt\n",
  "/tmp/cpan_install_HfUe.txt\n",
  "/tmp/cpan_install_XbD6.txt\n",
  "/tmp/cpan_install_yzs9.txt\n",
  "How now brown cow\n",
]
HTH

IPC::Open3 and determining if child is waiting for input

sub run_command
{
    my $COMMAND = shift;
    my @OUTPUT;
    my %CMD;

    $CMD{pid} = open3(my $CH_IN, my $CH_OUT, my $CH_ERR, $COMMAND);
    $CMD{_STDIN}  = $CH_IN;
    $CMD{_STDOUT} = $CH_OUT;
    $CMD{_STDERR} = $CH_ERR;

    my $line = readline $CMD{_STDOUT};
    print $line;

    # open my $CMDPROC, q{-|}, $COMMAND or return;
    # foreach (<$CMDPROC>)
    # {
    #     push @OUTPUT, "$ARG";
    # }
    close $CMDPROC or return;
    return @OUTPUT;
}
The above code is part of a script I am writing which needs to run another script (called child). The child may or may not prompt for input, depending on the presence of a cookie file in /var/tmp (both scripts were written on CentOS 5 / Perl 5.8.8).
I need to determine if and when the child is waiting for input, so that the parent can pass along input from its own STDIN. I also need to use open3 to open the child process, because I need the parent to pass Perl::Critic's brutal (severity 1) check.
I included the comments because, when the cookie file is already set, I can at least get the parent to call the child properly, since the child doesn't wait for input in that case.
I've checked around trying to find examples of how to determine if the child is waiting for input. The one example I found used strace (http://www.perlmonks.org/?node_id=964971), and I feel that might be too complex for what I am trying to do.
Any links to guide me will be greatly appreciated.
You can check if there's space in the pipe (using select); you can even check how much space is available. However, I've never heard of any way to check whether the process at the other end is blocked waiting to read from the pipe. I think you should explore other avenues.
It seems to me that a program that only reads from STDIN when certain conditions unrelated to arguments are met would provide a prompt indicating it's waiting for input. If that's the case, one could use Expect to launch and control the child program.
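If the child does print a prompt, a hedged sketch with Expect might look like this; the prompt pattern and child path are hypothetical:

use Expect;

# Spawn the child under a pty so it behaves as if run interactively.
my $exp = Expect->spawn('/path/to/child.pl')
    or die "Cannot spawn child: $!";

# Wait up to 10 seconds for something that looks like a prompt.
if ($exp->expect(10, '-re', ':\s*$')) {
    $exp->send("the input the child wants\n");
}
$exp->soft_close();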
But the simplest solution would be to write the data to STDIN unconditionally. Implementing this using IPC::Open3 is very complicated[1], so I recommend switching to IPC::Run3 (simpler) or IPC::Run (more flexible).
# Captures the child's STDERR
run3 [ $prog, @args ], \$text_for_stdin, \my $text_from_stdout, \my $text_from_stderr;
or
# Inherits the parent's STDERR
run3 [ $prog, @args ], \$text_for_stdin, \my $text_from_stdout;
When you both write to the child's STDIN and read from the child's STDOUT, you need to use select (or something similar) to avoid deadlocks. IPC::Open3 is very low-level and doesn't do this for you, whereas handling this is the raison d'être of IPC::Run3 and IPC::Run.
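A fuller run3 sketch with an exit-status check; the command and input here are placeholders:

use IPC::Run3 qw( run3 );

my @cmd = ('wc', '-l');        # placeholder command
my $in  = "one\ntwo\n";

run3 \@cmd, \$in, \my $out, \my $err;
die "wc exited with " . ($? >> 8) if $? >> 8;

print $out;                     # "2\n"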

How do I influence the width of Perl IPC::Open3 output?

I have the following Perl code and would like it to display exactly as invoking /bin/ls in the terminal would display. For example on a terminal sized to 100 columns, it would print up to 100 characters worth of output before inserting a newline. Instead this code prints 1 file per line of output. I feel like it involves assigning some terminal settings to the IO::Pty instance, but I've tried variations of that without luck.
UPDATE: I replaced the <$READER> with a call to sysread hoping the original code might just have a buffering issue, but the output received from sysread is still one file per line.
UPDATE: I added code showing my attempt at changing the IO::Pty's size via the clone_winsize_from method. This didn't result in the output being any different.
UPDATE: As best I can tell (from reading IPC::open3 code for version 1.12) it seems you cannot pass a variable of type IO::Handle without open3 creating a pipe rather than dup'ing the filehandle. This means isatty doesn't return a true value when ls invokes it and ls then forces itself into "one file per line" mode.
I think I just need to do a fork/exec and handle the I/O redirection myself.
#!/usr/bin/env perl
use strict;

use IPC::Open3;
use IO::Pty;

my $READER = IO::Pty->new();
$READER->slave->clone_winsize_from(\*STDIN);

my $pid = open3(undef, $READER, undef, "/bin/ls");

while (my $line = <$READER>)
{
    print $line;
}
waitpid($pid, 0) or die "Error waiting for pid: $!\n";
$READER->close();
I think $READER is getting overwritten with a pipe created by open3, which can be avoided by changing
my $READER = ...;
my $pid = open3(undef, $READER, undef, "/bin/ls");
to
local *READER = ...;
my $pid = open3(undef, '>&READER', undef, "/bin/ls");
See the docs.
You can pass the -C option to ls to force it to use columnar output (without getting IO::Pty involved).
The IO::Pty docs describe a clone_winsize_from(\*FH) method. You might try cloning your actual pty's dimensions.
I see that you're setting up the pty only as stdout of the child process. You might need to set it up also as its stdin: when the child process sends the "query terminal size" escape sequence to its stdout, it would need to receive the response on its stdin.

How do I read a file which is constantly updating?

I am getting a stream of data (text format) from an external server and would like to pass it on to a script line by line. The file is being appended to continuously. What is the ideal method to perform this operation? Will Perl's IO::Socket do the job? Eventually this data has to pass through a PHP program (reusable) and eventually land in a MySQL database.
The question is: how do I open a file which is continuously being updated?
In Perl, you can make use of seek and tell to read from a continuously growing file. It might look something like this (borrowed liberally from perldoc -f seek)
open(FH, '<', $the_file) || handle_error();  # typical open call
for (;;) {
    while (<FH>) {
        # ... process $_ and do something with it ...
    }
    # eof reached on FH, but wait a second and maybe there will be more output
    sleep 1;
    seek FH, 0, 1;  # this clears the eof flag on FH
}
In Perl there are a couple of modules that make tailing a file easier: IO::Tail and File::Tail. One uses a callback, the other uses a blocking read, so it just depends on which suits your needs better. There are likely other tailing modules as well, but these are the two that came to mind.
IO::Tail - follow the tail of files/stream
use IO::Tail;
my $tail = IO::Tail->new();
$tail->add('test.log', \&callback);
$tail->check();
$tail->loop();
File::Tail - Perl extension for reading from continously updated files
use File::Tail;
my $file = File::Tail->new("/some/log/file");
while (defined(my $line = $file->read)) {
    print $line;
}
Perhaps a named pipe would help you?
You talk about opening a file, yet you ask about IO::Socket. These aren't quite the same things, even if deep down you're going to be reading data off a file descriptor.
If you can access the remote stream from a named pipe or FIFO, then you can just open it as an ordinary file. It will block when nothing is available, and return whenever there is data that needs to be drained. You may, or may not, need to bring File::Tail to bear on the problem of not losing data if the sender runs too far ahead of you.
On the other hand, if you're opening a socket directly to the other server (which seems more likely), IO::Socket is not going to work out of the box as there is no getline method available. You would have to read and buffer block-by-block and then dole it out line by line through an intermediate holding pen.
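A minimal sketch of that block-and-buffer approach; process_line is a hypothetical handler:

my $buf = '';
while (sysread($sock, my $chunk, 4096)) {
    $buf .= $chunk;
    # Peel off complete lines; anything after the last newline
    # stays in $buf until more data arrives.
    while ($buf =~ s/\A([^\n]*)\n//) {
        process_line($1);
    }
}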
You could pull out the socket descriptor into an IO::Handle, and use getline() on that. Something like:
my $sock = IO::Socket::INET->new(
    PeerAddr => '172.0.0.1',
    PeerPort => 1337,
    Proto    => 'tcp',
) or die $!;

my $io = IO::Handle->new;
$io->fdopen(fileno($sock), "r") or die $!;

while (defined( my $data = $io->getline() )) {
    chomp $data;
    # do something
}
You may have to perform a handshake in order to start receiving packets, but that's another matter.
In Python it is pretty straightforward:

import time

f = open('teste.txt', 'r')
for line in f:            # read all lines already in the file
    print line.strip()

# keep waiting forever for more lines.
while True:
    line = f.readline()   # just read more
    if line:              # if you got something...
        print 'got data:', line.strip()
    time.sleep(1)         # wait a second so as not to fry the CPU needlessly
Solutions that read the whole file just to seek to the end are performance-unwise. If this is happening under Linux, I would suggest just renaming the log file. Then you can scan all the entries in the renamed file, while the original file fills up again. After scanning all of the renamed file, delete it, or move it wherever you like. This way you get something like logrotate, but for scanning newly arriving data.
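A minimal sketch of that rotate-and-scan idea; the paths are placeholders, and it assumes the writer reopens the log by name after the rename:

my $log  = '/var/log/stream.log';        # placeholder path
my $work = "$log.scan";

rename $log, $work or die "rename: $!";  # writer starts a fresh $log

open my $fh, '<', $work or die "open: $!";
while (my $line = <$fh>) {
    # ... process $line ...
}
close $fh;
unlink $work or die "unlink: $!";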

How can I capture the stdin and stdout of system command from a Perl script?

In the middle of a Perl script, there is a system command I want to execute. I have a string that contains the data that needs to be fed into stdin (the command only accepts input from stdin), and I need to capture the output written to stdout. I've looked at the various methods of executing system commands in Perl, and the open function seems to be what I need, except that it looks like I can only capture stdin or stdout, not both.
At the moment, it seems like my best solution is to use open, redirect stdout into a temporary file, and read from the file after the command finishes. Is there a better solution?
IPC::Open2/3 are fine, but I've found that usually all I really need is IPC::Run3, which handles the simple cases really well with minimal complexity:
use IPC::Run3;   # Exports run3() by default
run3( \@cmd, \$in, \$out, \$err );
The documentation compares IPC::Run3 to other alternatives. It's worth a read even if you don't decide to use it.
The perlipc documentation covers many ways that you can do this, including IPC::Open2 and IPC::Open3.
Somewhere at the top of your script, include the line
use IPC::Open2;
That will include the necessary module, usually installed with most Perl distributions by default. (If you don't have it, you could install it using CPAN.) Then, instead of open, call:
$pid = open2($cmd_out, $cmd_in, 'some cmd and args');
You can send data to your command by sending it to $cmd_in and then read your command's output by reading from $cmd_out.
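For small amounts of data, a hedged end-to-end sketch, using sort as a stand-in command (for large volumes in both directions you would need select to avoid deadlock):

use IPC::Open2;

my $pid = open2(my $cmd_out, my $cmd_in, 'sort');

print $cmd_in "pear\napple\n";
close $cmd_in;              # send EOF so the child can finish

print while <$cmd_out>;     # prints "apple\npear\n"
waitpid $pid, 0;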
If you also want to be able to read the command's stderr stream, you can use the IPC::Open3 module instead.
IPC::Open3 would probably do what you want. It can capture STDERR and STDOUT.
http://metacpan.org/pod/IPC::Open3
A very easy way to do this that I recently found is the IPC::Filter module. It lets you do the job extremely intuitively:
$output = filter $input, 'somecmd', '--with', 'various=args', '--etc';
Note how it invokes your command without going through the shell if you pass it a list. It also does a reasonable job of handling errors for common utilities. (On failure, it dies, using the text from STDERR as its error message; on success, STDERR is just discarded.)
Of course, it’s not suitable for huge amounts of data since it provides no way of doing any streaming processing; also, the error handling might not be granular enough for your needs. But it makes the many simple cases really really simple.
I think you want to take a look at IPC::Open2.
There is a special Perl function for it, open2().
More info can be found at: http://sunsite.ualberta.ca/Documentation/Misc/perl-5.6.1/lib/IPC/Open2.html
If you do not want to include extra packages, you can just do

open(TMP, ">", "tmpfile");
print TMP $tmpdata;
close(TMP);   # flush before the command reads the file

open(RES, "$yourcommand|");   # $yourcommand is expected to read tmpfile
$res = "";
while (<RES>) {
    $res .= $_;
}

which is the contrary of what you suggested, but should also work.
I always do it this way if I'm only expecting a single line of output or want to split the result on something other than a newline:

my $result = qx( command args 2>&1 );
my $rc = $?;
# $rc >> 8 is the exit code of the called program.
if ($rc != 0) {
    error();
}
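If you need to distinguish a signal death from a nonzero exit, $? can be unpacked further; this is the standard idiom from perldoc -f system:

if ($rc == -1) {
    print "failed to execute: $!\n";
}
elsif ($rc & 127) {
    printf "child died with signal %d\n", $rc & 127;
}
else {
    printf "child exited with value %d\n", $rc >> 8;
}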
If you want to deal with a multi-line response, get the result as an array:

my @lines = qx( command args 2>&1 );
foreach my $line (@lines) {
    if ( $line =~ /some pattern/ ) {
        do_something();
    }
}