Problem with piped filehandle in Perl

I am trying to run bp_genbank2gff3.pl (from the BioPerl package) from another Perl script that
takes a GenBank file as its argument.
This does not work (no output files are generated):
my $command = "bp_genbank2gff3.pl -y -o /tmp $ARGV[0]";
open( my $command_out, "-|", $command );
close $command_out;
but this does:
open( my $command_out, "-|", $command );
sleep 3; # why do I need to sleep?
close $command_out;
Why?
I thought that close is supposed to block until the command is done:
Closing any piped filehandle causes the parent process to wait for the child to finish...
(see http://perldoc.perl.org/functions/open.html).
Edit
I added this as the last line (where $ret is the return value of close $command_out):
say "ret=$ret, \$?=$?, \$!=$!";
and in both cases the printout is:
ret=, $?=13, $!=
(which means close failed in both cases, right?)

$? = 13 means your child process was terminated by a SIGPIPE signal. Your external program (bp_genbank2gff3.pl) tried to write some output to a pipe to your perl program. But the perl program closed its end of the pipe so your OS sent a SIGPIPE to the external program.
By sleeping for 3 seconds, you are letting your program run for 3 seconds before the OS kills it, so this will let your program get something done. Note that pipes have a limited capacity, though, so if your parent perl script is not reading from the pipe and if the external program is writing a lot to standard output, the external program's write operations will eventually block and you may not really get 3 seconds of effort from your external program.
The workaround is to read the output from the external program, even if you are just going to throw it away.
open( my $command_out, "-|", $command );
my @ignore_me = <$command_out>;
close $command_out;
Update: If you really don't care about the command's output, you can avoid SIGPIPE issues by redirecting the output to /dev/null:
open my $command_out, "-|", "$command > /dev/null";
close $command_out; # succeeds, no SIGPIPE
Of course if you are going to go to that much trouble to ignore the output, you might as well just use system.
Additional info: As the OP says, closing a piped filehandle causes the parent to wait for the child to finish (by using waitpid or something similar). But before it starts waiting, it closes its end of the pipe. In this case, that end is the read end of the pipe that the child process is writing its standard output to. The next time the child tries to write something to standard output, the OS detects that the read end of that pipe is closed and sends a SIGPIPE to the child process, killing it and quickly letting the close statement in the parent finish.
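Putting that together, here is a minimal sketch of the pattern described above: read the command's output (even if you discard it), then close the pipe and inspect $?. The command string is just the one from the question.
my $command = "bp_genbank2gff3.pl -y -o /tmp $ARGV[0]";
open( my $command_out, "-|", $command ) or die "Cannot run '$command': $!";
# Drain the pipe so the child never gets SIGPIPE, even if we ignore the text.
while ( my $line = <$command_out> ) {
    # process or ignore $line
}
# close waits for the child; afterwards $? holds its exit status.
if ( close $command_out ) {
    print "command finished successfully\n";
}
else {
    warn $! ? "error closing pipe: $!\n"
            : "command exited with status " . ( $? >> 8 ) . "\n";
}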

I'm not sure what you're trying to do, but system is probably better in this case...
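For example, a minimal sketch (assuming you don't need the command's output in the parent at all), using the command string from the question:
my $command = "bp_genbank2gff3.pl -y -o /tmp $ARGV[0]";
system($command) == 0
    or die "'$command' failed: exit status " . ( $? >> 8 ) . "\n";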

Related

A daemon to tail a log and fork multiple external (perl) scripts

I'm trying to write a program, actually a daemon, that stays in memory and performs something like tail -F on a rapidly updated log file. When the program detects a new line in the file, it has to launch another compiled Perl script, which performs some operations on the log line and then sends it with a POST.
To explain clearly, I will refer to these two programs as "prgTAIL" and "prgPROCESS": prgTAIL tails the log and launches prgPROCESS, passing the new line to it.
Obviously prgTAIL must not wait for prgPROCESS to finish, because prgTAIL has to stay in memory and keep detecting new lines in the log. Also, the rate of file updates requires launching multiple prgPROCESS instances in parallel. For this reason I'm using two programs: the first, small and fast, just passes the data to the second, which may be heavier since it can be launched in multiple instances.
In prgTAIL I used:
a pipe to tail the log file
a while loop to launch prgPROCESS on each new log line
a fork() to continue without waiting for prgPROCESS to end
my $log_csv = "/log/csv.csv";
open (my $pipe, "-|", "tail", "-n0", "-F", $log_csv) or die "error";
while (<$pipe>) {
    $line = $_;
    my $pid = fork();
    if (defined $pid && $pid == 0) {
        exec("/bin/prgPROCESS " . $line); # I tried system() too.
        exit 0;
    }
}
The prgPROCESS operations are not so important; anyway, it parses the $line passed as an argument, constructs an XML document, and then posts it via HTTPS.
So, this actually runs, but I think I messed something up with the processes, because after around 550 new lines and prgPROCESS calls, prgTAIL keeps running but can't call prgPROCESS anymore: there are too many processes. I get this error in bash:
-bash: fork: Resource temporarily unavailable
What's wrong? Any idea? Maybe the prgPROCESS processes don't end and stay stuck, without making room for other processes?
PS: I'm using Mac OS X now, but this will run on Linux.
Your problem is this:
while () {
doesn't have any constraint condition, so it's just spinning as fast as it can. You're never actually reading from your pipe, you're just forking as fast as you can and spawning that new script.
You might be wanting:
while ( my $line = <$pipe> ) {
#....
}
But really - it's arguable that you don't actually need to fork at all, because a read/process/read loop would probably do just fine - fork() and exec() is basically what system already does anyway.
You should also, if forking, clean up your child processes. It doesn't matter too much for short-running things, but anything that sits in a loop will leave a lot of zombie processes otherwise; reap them either by setting $SIG{CHLD} or by using waitpid.
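As a rough sketch, assuming you keep the fork approach, either of the options below keeps zombies from piling up; the list form of exec is used here only to sidestep shell-quoting issues:
use POSIX ":sys_wait_h";

# Option 1: never accumulate zombies at all.
# $SIG{CHLD} = 'IGNORE';

while (<$pipe>) {
    my $line = $_;
    my $pid = fork();
    if (defined $pid && $pid == 0) {
        exec("/bin/prgPROCESS", $line) or die "exec failed: $!";
    }
    # Option 2: reap any children that have already finished, without blocking.
    1 while waitpid(-1, WNOHANG) > 0;
}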

Perl Expect Script Exiting Prematurely

I have a Perl Expect script which handles file transfers. The script works fine except that it exits before the file transfer finishes. I don't want to rely on sleep() because the amount of time needed can vary.
Is there someway for expect to wait for my command to finish, before continuing?
my $exp = Expect->spawn("perl ./fileTransfer.pl $url")
or die "Cannot spawn program: $!\n";
#Enter credentials
$exp->send($username);
sleep(1);
$exp->send($password);
sleep(1);
#This only executes for a bit, before the program exits:
$exp->send($getFiles);
$exp->soft_close();
exit;
This was solved by simply using $exp->expect(undef); instead of $exp->soft_close();
I also took @Mark Setchell's advice and now 'expect' specific prompts; this way I can easily do multiple 'sends' without fear of one executing before the prior one finishes.
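Roughly, the prompt-driven version could look like the sketch below. The prompt strings ('Username:', 'Password:' and 'ftp>') are only placeholders for whatever fileTransfer.pl actually prints, so adjust them to the real prompts:
use Expect;

my $exp = Expect->spawn("perl ./fileTransfer.pl $url")
    or die "Cannot spawn program: $!\n";

$exp->expect(30, 'Username:');   # wait up to 30s for the username prompt
$exp->send($username);           # append "\n" if $username doesn't already end with one

$exp->expect(30, 'Password:');
$exp->send($password);

$exp->expect(30, 'ftp>');        # whatever prompt appears before the transfer command
$exp->send($getFiles);

# Block until fileTransfer.pl closes its output, i.e. until the transfer finishes.
$exp->expect(undef);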

Is There Any Way to Pipe Data from Perl to a Unix Command Line Utility

I have a command line utility from a third party (it's big and written in Java) that I've been using to help me process some data. This utility expects information in a line delimited file and then outputs processed data to STDOUT.
In my testing phases, I was fine with writing some Perl to create a file full of information to be processed and then sending that file to this third party utility, but as I'm nearing putting this code in production, I'd really prefer to just pipe data to this utility directly instead of first writing that data to a file as this would save me the overhead of having to write unneeded information to disk. I recently asked on this board how I could do this in Unix, but have since realized that it would be incredibly more convenient to actually run it directly out of a Perl module. Perhaps something like:
system(bin/someapp do-action --option1 some_value --input $piped_in_data)
Currently I call the utility as follows:
bin/someapp do-action --option1 some_value --input some_file
Basically, what I want is to write all my data either to a variable or to STDOUT and then to pipe it to the Java app through a system call in the SAME Perl script or module. This would make my code a lot more fluid. Without it, I'd wind up needing to write a Perl script which calls a bash file half way through which in turn would need to call another Perl script to prep data. If at all possible I'd love to just stay in Perl the whole way through. Any ideas?
If I am reading your question correctly, you are wanting to spawn a process and be able to both write to its stdin and read from its stdout. If that is the case, then IPC::Open2 is exactly what you need. (Also see IPC::Open3 if you also need to read from the process's stderr.)
Here is some sample code. I have marked the areas you will have to change.
#!/usr/bin/perl
use strict;
use warnings;
use IPC::Open2;
# Sample data -- ignore this.
my @words = qw(the quick brown fox jumped over the lazy dog);
# Automatically reap child processes. This is important when forking.
$SIG{'CHLD'} = 'IGNORE';
# Spawn the external process here. Change this to the process you need.
open2(*READER, *WRITER, "wc -c") or die "wc -c: $!";
# Fork into a child process. The child process will write the data, while the
# parent process reads data back from the process. We need to fork in case
# the process' output buffer fills up and it hangs waiting for someone to read
# its output. This could cause a deadlock.
my $pid;
defined($pid = fork()) or die "fork: $!";
if (!$pid) {
# This is the child.
# Close handle to process' stdout; the child doesn't need it.
close READER;
# Write out some data. Change this to print out your data.
print WRITER $words[rand(@words)], " " for (1..100000);
# Then close the handle to the process' stdin.
close WRITER;
# Terminate the child.
exit;
}
# Parent closes its handle to the process' stdin immediately! As long as one
# process has an open handle, the program on the receiving end of the data will
# never see EOF and may continue waiting.
close WRITER;
# Read in data from the process. Change this to whatever you need to do to
# process the incoming data.
print "READ: $_" while (<READER>);
# Close the handle to the process' stdout. After this call, the process should
# be finished executing and will terminate on its own.
close READER;
If it only accepts files, let it open "/proc/self/fd/0", which is the same as STDIN. For the rest, see cdhowie's answer.
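A rough sketch of that idea, assuming you are on Linux (where /proc/self/fd/0 exists) and using the someapp options from the question; @records is just a stand-in for the data you were writing to the input file:
open( my $app, "|-",
      "bin/someapp do-action --option1 some_value --input /proc/self/fd/0" )
    or die "Couldn't start someapp: $!";
print {$app} "$_\n" for @records;   # the app reads this via /proc/self/fd/0
close $app or warn "someapp exited with status $?";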
If all you want to do is pipe the STDOUT from your program into your other program's STDIN, you can do this via the standard Perl open command:
open (CMD, "|$command") or die qq(Couldn't execute $command for piping);
Then, all you have to do to send data to this command is to use the print statement:
print CMD $dataToCommand;
And, you finally close your pipe with the close statement:
close (CMD);
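The same thing with a lexical filehandle and the three-argument form of open, if you prefer that style (a minor variation, not a requirement):
open( my $cmd, "|-", $command )
    or die qq(Couldn't execute $command for piping);
print {$cmd} $dataToCommand;
close($cmd);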
PERL HINT
Perl has a command called perldoc which can give you the documentation of any Perl function or Perl module installed on your system. To get more information about the open command, type:
$ perldoc -f open
The -f flag says this is a Perl function.
If you're doing what cdhowie said in his answer (you're spawning a process, then reading from and writing to that process), you will need IPC::Open2. To get information about the IPC::Open2 module, type:
$ perldoc IPC::Open2

How can I make fork in Perl in different scripts?

I have a process in Perl that creates another one with the system command; I leave it in memory and pass some variables like this:
my $var1 = "Hello";
my $var1 = "World";
system "./another_process.pl $var1 $var2 &";
But the system command only returns the result; I need to get the PID. I want to do something like fork. What should I do? How can I do something like fork but across different scripts?
Thanks in advance!
Perl has a fork function.
See perldoc perlfaq8 - How do I start a process in the background?
(contributed by brian d foy)
There's not a single way to run code in the background so you don't have to wait for it to finish before your program moves on to other tasks. Process management depends on your particular operating system, and many of the techniques are in perlipc.
Several CPAN modules may be able to help, including IPC::Open2 or IPC::Open3, IPC::Run, Parallel::Jobs, Parallel::ForkManager, POE, Proc::Background, and Win32::Process. There are many other modules you might use, so check those namespaces for other options too.
If you are on a Unix-like system, you might be able to get away with a system call where you put an & on the end of the command:
system("cmd &")
You can also try using fork, as described in perlfunc (although this is the same thing that many of the modules will do for you).
STDIN, STDOUT, and STDERR are shared
Both the main process and the backgrounded one (the "child" process) share the same STDIN, STDOUT and STDERR filehandles. If both try to access them at once, strange things can happen. You may want to close or reopen these for the child. You can get around this with opening a pipe (see open) but on some systems this means that the child process cannot outlive the parent.
Signals
You'll have to catch the SIGCHLD signal, and possibly SIGPIPE too. SIGCHLD is sent when the backgrounded process finishes. SIGPIPE is sent when you write to a filehandle whose child process has closed (an untrapped SIGPIPE can cause your program to silently die). This is not an issue with system("cmd&").
Zombies
You have to be prepared to "reap" the child process when it finishes.
$SIG{CHLD} = sub { wait };
$SIG{CHLD} = 'IGNORE';
You can also use a double fork. You immediately wait() for your first child, and the init daemon will wait() for your grandchild once it exits.
unless ($pid = fork) {
    unless (fork) {
        exec "what you really wanna do";
        die "exec failed!";
    }
    exit 0;
}
waitpid($pid, 0);
See Signals in perlipc for other examples of code to do this. Zombies are not an issue with system("prog &").
It's true that you can use fork/exec, but I think it will be much easier to simply use the pipe form of open. Not only is the return value the pid you are looking for, you can be connected to either the stdin or stdout of the process, depending on how you open. For instance:
open my $handle, "foo|";
will return the pid of foo and connect you to its stdout, so that when you read from $handle you get a line of output from foo. Using "|foo" instead will allow you to write to foo's stdin.
You can also use open2 and open3 to do both simultaneously, though that has some major caveats, as you can run into unexpected issues due to I/O buffering.
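For the script in the question, that could look something like this sketch; note that close waits for the child to finish, so skip it (or defer it) if you want the child to keep running in the background:
my $var1 = "Hello";
my $var2 = "World";

# The return value of a piped open is the child's PID.
my $pid = open( my $child_out, "-|", "./another_process.pl $var1 $var2" )
    or die "Couldn't start another_process.pl: $!";
print "child PID is $pid\n";

while (<$child_out>) { print }   # read the child's output if you care about it
close $child_out;                # this waits for the child to finish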
Use fork and exec.
If you need to get the PID of a Perl script you can use the $$ variable. You can put it in your another_process.pl and have it write the PID to a file. Can you be clearer about what you mean by "like fork"? You can always use the fork/exec combination, as sketched below.
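A minimal sketch of that fork/exec combination, assuming another_process.pl is the script from the question and takes two arguments:
my $var1 = "Hello";
my $var2 = "World";

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: replace this process with the other script.
    exec("./another_process.pl", $var1, $var2) or die "exec failed: $!";
}

# Parent: $pid is the child's process ID, and the child runs on its own.
print "started another_process.pl with PID $pid\n";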

How can I run Perl system commands in the background?

#!/usr/bin/env perl
use warnings; use strict;
use 5.012;
use IPC::System::Simple qw(system);
system( 'xterm', '-geometry', '80x25-5-5', '-bg', 'green', '&' );
say "Hello";
say "World";
I tried this to run the xterm-command in the background, but it doesn't work:
No absolute path found for shell: &
What would be the right way to make it work?
Perl's system function has two modes:
taking a single string and passing it to the command shell to allow special characters to be processed
taking a list of strings, exec'ing the first and passing the remaining strings as arguments
In the first form you have to be careful to escape characters that might have a special meaning to the shell. The second form is generally safer since arguments are passed directly to the program being exec'd without the shell being involved.
In your case you seem to be mixing the two forms. The & character only has the meaning of "start this program in the background" if it is passed to the shell. In your program, the ampersand is being passed as the 5th argument to the xterm command.
As Jakob Kruse said the simple answer is to use the single string form of system. If any of the arguments came from an untrusted source you'd have to use quoting or escaping to make them safe.
If you prefer to use the multi-argument form then you'll need to call fork() and then probably use exec() rather than system().
Note that the list form of system is specifically there to not treat characters such as & as shell meta-characters.
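A sketch of that fork()/exec() approach for the xterm example from the question; the list form is kept, so no shell and no & are involved:
my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: becomes xterm; the parent continues immediately.
    exec('xterm', '-geometry', '80x25-5-5', '-bg', 'green')
        or die "exec xterm failed: $!";
}

# Parent: carry on with the rest of the script, and reap the child eventually
# (or set $SIG{CHLD} = 'IGNORE' if you don't care about its exit status).
waitpid($pid, 0);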
From perlfaq8's answer to How do I start a process in the background?
(contributed by brian d foy)
There's not a single way to run code in the background so you don't have to wait for it to finish before your program moves on to other tasks. Process management depends on your particular operating system, and many of the techniques are in perlipc.
Several CPAN modules may be able to help, including IPC::Open2 or IPC::Open3, IPC::Run, Parallel::Jobs, Parallel::ForkManager, POE, Proc::Background, and Win32::Process. There are many other modules you might use, so check those namespaces for other options too.
If you are on a Unix-like system, you might be able to get away with a system call where you put an & on the end of the command:
system("cmd &")
You can also try using fork, as described in perlfunc (although this is the same thing that many of the modules will do for you).
STDIN, STDOUT, and STDERR are shared
Both the main process and the backgrounded one (the "child" process) share the same STDIN, STDOUT and STDERR filehandles. If both try to access them at once, strange things can happen. You may want to close or reopen these for the child. You can get around this with opening a pipe (see open in perlfunc) but on some systems this means that the child process cannot outlive the parent.
Signals
You'll have to catch the SIGCHLD signal, and possibly SIGPIPE too. SIGCHLD is sent when the backgrounded process finishes. SIGPIPE is sent when you write to a filehandle whose child process has closed (an untrapped SIGPIPE can cause your program to silently die). This is not an issue with system("cmd&").
Zombies
You have to be prepared to "reap" the child process when it finishes.
$SIG{CHLD} = sub { wait };
$SIG{CHLD} = 'IGNORE';
You can also use a double fork. You immediately wait() for your first child, and the init daemon will wait() for your grandchild once it exits.
unless ($pid = fork) {
unless (fork) {
exec "what you really wanna do";
die "exec failed!";
}
exit 0;
}
waitpid($pid, 0);
See Signals in perlipc for other examples of code to do this. Zombies are not an issue with system("prog &").
Have you tried?
system('xterm -geometry 80x25-5-5 -bg green &');
http://www.rocketaware.com/perl/perlfaq8/How_do_I_start_a_process_in_the_.htm
This is not specific to Perl; the same issue exists in C and other languages.
First understand what the system command does:
It forks.
In the child process, it calls exec.
The parent process waits for the forked child process to finish.
It does not matter whether you pass multiple arguments or one argument. The difference is that with multiple arguments the command is executed directly, while with a single argument it is wrapped by the shell and effectively executed as:
/bin/sh -c your_command_with_redirections_and_ampersand
When you pass a command as some_command par1 par2 &, the sh or bash process sits between the Perl interpreter and the command as a wrapper, and because of the trailing & the shell starts some_command in the background and exits without waiting for it. Your script waits only for the shell interpreter, and no additional waitpid is needed, because Perl's system function does that for you.
When you want to implement this mechanism directly in your script, you should:
Use the fork function. See example: http://users.telenet.be/bartl/classicperl/fork/all.html
In the child branch of the if, use the exec function. Its usage is similar to system; see the manual. Note that exec replaces the child process's program, code and data with the executed command.
In the parent branch (where fork returns a non-zero PID), use waitpid with the PID returned by the fork function.
This is how you can run the process in the background. I hope this is simple.
The simplest example:
if (my $pid = fork) {   # fork returns 0 (false) in the child; from here on there are two processes
    # Parent: $pid is the process ID of the child.
    # Do whatever you want, asynchronously with the executed command.
    waitpid($pid, 0);   # wait until the child ends
    # If you don't want to wait, don't: when your process ends, the child is
    # reparented to the init process, which will reap it once it finishes.
    # Alternatively, you can handle the SIGCHLD signal in your script.
}
else {
    # Child
    exec('some_command arg1 arg2'); # or exec('some_command', 'arg1', 'arg2');
    # exit is not needed, because exec completely replaces the process image
}