In OCaml, how to get stdout string from subprocess? - subprocess

I'm working with OCaml and need to start a new process and communicate with it.
If the subprocess terminated once it's called and produced some output, then how to retrieve the strings in it's stdout?
What if the subprocess never terminates? That is, everytime given a string to stdin, it will produce a result to stdout, how to fetch the result?

Use Unix.open_process_in function. It will spawn a process and return an input channel, that will allow you to read data from the process.
If you want to wait for the process to terminate, you can just read all the data (i.e., wait until the process pipe returns EOF), and then close the channel, e.g.,
(* [run cmd] runs a shell command, waits until it terminates, and
returns a list of strings that the process outputed *)
let run cmd =
let inp = Unix.open_process_in cmd in
let r = In_channel.input_lines inp in
In_channel.close inp; r
Working with non-terminating process is even easier.
You may also find interesting lwt library that has a very descent interface to multiprocessing. And async library is another asynchronous library that provide an excellent interface to multiprocessing. Although, this libraries are great, they are little bit advance, for simple cases standard Unix module is enough.

Related

Redirect output of sys process to stdout, no buffering

I would like to run an external process and have its stdout and stderr piped immediately to stdout, without any buffering (at least not line-buffering). With line buffering, this isn't difficult:
val cmd: String = ...
cmd ! ProcessLogger(println,println)
However, how can I do this so that I don't have to wait for a whole line before outputting? The problem is that the process I am running has a percentage output that keeps updating, but when I use something like the above, it waits for a bit and then finally outputs 100%.
I've looked into ProcessIO and the helpful BasicIO companion, but I haven't managed to get anything working end to end.

What is the preferred cross-platform IPC Perl module?

I want to create a simple IO object that represents a pipe opened to another program to that I can periodically write to another program's STDIN as my app runs. I want it to be bullet-proof (in that it catches all errors) and cross-platform. The best options I can find are:
open
sub io_read {
local $SIG{__WARN__} = sub { }; # Silence warning.
open my $pipe, '|-', #_ or die "Cannot exec $_[0]: $!\n";
return $pipe;
}
Advantages:
Cross-platform
Simple
Disadvantages
No $SIG{PIPE} to catch errors from the piped program
Are other errors caught?
IO::Pipe
sub io_read {
IO::Pipe->reader(#_);
}
Advantages:
Simple
Returns an IO::Handle object for OO interface
Supported by the Perl core.
Disadvantages
Still No $SIG{PIPE} to catch errors from the piped program
Not supported on Win32 (or, at least, its tests are skipped)
IPC::Run
There is no interface for writing to a file handle in IPC::Run, only appending to a scalar. This seems…weird.
IPC::Run3
No file handle interface here, either. I could use a code reference, which would be called repeatedly to spool to the child, but looking at the source code, it appears that it actually writes to a temporary file, and then opens it and spools its contents to the pipe'd command's STDIN. Wha?
IPC::Cmd
Still no file handle interface.
What am I missing here? It seems as if this should be a solved problem, and I'm kind of stunned that it's not. IO::Pipe comes closest to what I want, but the lack of $SIG{PIPE} error handling and the lack of support for Windows is distressing. Where is the piping module that will JDWIM?
Thanks to guidance from #ikegami, I have found that the best choice for interactively reading from and writing to another process in Perl is IPC::Run. However, it requires that the program you are reading from and writing to have a known output when it is done writing to its STDOUT, such as a prompt. Here's an example that executes bash, has it run ls -l, and then prints that output:
use v5.14;
use IPC::Run qw(start timeout new_appender new_chunker);
my #command = qw(bash);
# Connect to the other program.
my ($in, #out);
my $ipc = start \#command,
'<' => new_appender("echo __END__\n"), \$in,
'>' => new_chunker, sub { push #out, #_ },
timeout(10) or die "Error: $?\n";
# Send it a command and wait until it has received it.
$in .= "ls -l\n";
$ipc->pump while length $in;
# Wait until our end-of-output string appears.
$ipc->pump until #out && #out[-1] =~ /__END__\n/m;
pop #out;
say #out;
Because it is running as an IPC (I assume), bash does not emit a prompt when it is done writing to its STDOUT. So I use the new_appender() function to have it emit something I can match to find the end of the output (by calling echo __END__). I've also used an anonymous subroutine after a call to new_chunker to collect the output into an array, rather than a scalar (just pass a reference to a scalar to '>' if you want that).
So this works, but it sucks for a whole host of reasons, in my opinion:
There is no generally useful way to know that an IPC-controlled program is done printing to its STDOUT. Instead, you have to use a regular expression on its output to search for a string that usually means it's done.
If it doesn't emit one, you have to trick it into emitting one (as I have done here—god forbid if I should have a file named __END__, though). If I was controlling a database client, I might have to send something like SELECT 'IM OUTTA HERE';. Different applications would require different new_appender hacks.
The writing to the magic $in and $out scalars feels weird and action-at-a-distance-y. I dislike it.
One cannot do line-oriented processing on the scalars as one could if they were file handles. They are therefore less efficient.
The ability to use new_chunker to get line-oriented output is nice, if still a bit weird. That regains a bit of the efficiency on reading output from a program, though, assuming it is buffered efficiently by IPC::Run.
I now realize that, although the interface for IPC::Run could potentially be a bit nicer, overall the weaknesses of the IPC model in particular makes it tricky to deal with at all. There is no generally-useful IPC interface, because one has to know too much about the specifics of the particular program being run to get it to work. This is okay, maybe, if you know exactly how it will react to inputs, and can reliably recognize when it is done emitting output, and don't need to worry much about cross-platform compatibility. But that was far from sufficient for my need for a generally useful way to interact with various database command-line clients in a CPAN module that could be distributed to a whole host of operating systems.
In the end, thanks to packaging suggestions in comments on a blog post, I decided to abandon the use of IPC for controlling those clients, and to use the DBI, instead. It provides an excellent API, robust, stable, and mature, and suffers none of the drawbacks of IPC.
My recommendation for those who come after me is this:
If you just need to execute another program and wait for it to finish, or collect its output when it is done running, use IPC::System::Simple. Otherwise, if what you need to do is to interactively interface with something else, use an API whenever possible. And if it's not possible, then use something like IPC::Run and try to make the best of it—and be prepared to give up quite a bit of your time to get it "just right."
I've done something similar to this. Although it depends on the parent program and what you are trying to pipe. My solution was to spawn a child process (leaving $SIG{PIPE} to function) and writing that to the log, or handling the error in what way you see fit. I use POSIX to handle my child process and am able to utilize all the functionality of the parent. However if you're trying to have the child communicate back to the parent - then things get difficult. Do you have an example of the main program and what you're trying to PIPE?

What can be the possible situations where one should prefer the unbuffered output?

By the discussion in my previous question I came to know that Perl gives line buffer output by default.
$| = 0; # for buffered output (by default)
If you want to get unbuffered output then set the special variable $| to 1 i.e.
$| = 1; # for unbuffered output
Now I want to know that what can be the possible situations where one should prefer the unbuffered output?
You want unbuffered output for interactive tasks. By that, I mean you don't want output stuck in some buffer when you expect someone or something else to respond to the output.
For example, you wouldn't want user prompts sent to STDOUT to be buffered. (That's why STDOUT is never fully buffered when attached to a terminal. It is only line buffered, and the buffer is flushed by attempts to read from STDIN.)
For example, you'd want requests sent over pipes and sockets to not get stuck in some buffer, as the other end of the connection would never see it.
The only other reason I can think of is when you don't want important data to be stuck in a buffer in the event of a unrecoverable error such as a panic or death by signal.
For example, you might want to keep a log file unbuffered in order to be able to diagnose serious problems. (This is why STDERR isn't buffered by default.)
Here's a small sample of Perl users from StackOverflow who have benefited from learning to set $| = 1:
STDOUT redirected externally and no output seen at the console
Perl Print function does not work properly when Sleep() is used
can't write to file using print
perl appending issues
Unclear perl script execution
Perl: Running a "Daemon" and printing
Redirecting STDOUT of a pipe in Perl
Why doesn't my parent process see the child's output until it exits?
Why does adding or removing a newline change the way this perl for loop functions?
Perl not printing properly
In Perl, how do I process input as soon as it arrives, instead of waiting for newline?
What is the simple way to keep the output stream exactly as it shown out on the screen (while interactive data used)?
Is it possible to print from a perl CGI before the process exits?
Why doesn't print output anything on each iteration of a loop when I use sleep?
Perl Daemon Not Working with Sleep()
It can be useful when writing to another program over a socket or pipe. It can also be useful when you are writing debugging information to STDOUT to watch the state of your program live.

How can I run Perl system commands in the background?

#!/usr/bin/env perl
use warnings; use strict;
use 5.012;
use IPC::System::Simple qw(system);
system( 'xterm', '-geometry', '80x25-5-5', '-bg', 'green', '&' );
say "Hello";
say "World";
I tried this to run the xterm-command in the background, but it doesn't work:
No absolute path found for shell: &
What would be the right way to make it work?
Perl's system function has two modes:
taking a single string and passing it to the command shell to allow special characters to be processed
taking a list of strings, exec'ing the first and passing the remaining strings as arguments
In the first form you have to be careful to escape characters that might have a special meaning to the shell. The second form is generally safer since arguments are passed directly to the program being exec'd without the shell being involved.
In your case you seem to be mixing the two forms. The & character only has the meaning of "start this program in the background" if it is passed to the shell. In your program, the ampersand is being passed as the 5th argument to the xterm command.
As Jakob Kruse said the simple answer is to use the single string form of system. If any of the arguments came from an untrusted source you'd have to use quoting or escaping to make them safe.
If you prefer to use the multi-argument form then you'll need to call fork() and then probably use exec() rather than system().
Note that the list form of system is specifically there to not treat characters such as & as shell meta-characters.
From perlfaq8's answer to How do I start a process in the background?
(contributed by brian d foy)
There's not a single way to run code in the background so you don't have to wait for it to finish before your program moves on to other tasks. Process management depends on your particular operating system, and many of the techniques are in perlipc.
Several CPAN modules may be able to help, including IPC::Open2 or IPC::Open3, IPC::Run, Parallel::Jobs, Parallel::ForkManager, POE, Proc::Background, and Win32::Process. There are many other modules you might use, so check those namespaces for other options too.
If you are on a Unix-like system, you might be able to get away with a system call where you put an & on the end of the command:
system("cmd &")
You can also try using fork, as described in perlfunc (although this is the same thing that many of the modules will do for you).
STDIN, STDOUT, and STDERR are shared
Both the main process and the backgrounded one (the "child" process) share the same STDIN, STDOUT and STDERR filehandles. If both try to access them at once, strange things can happen. You may want to close or reopen these for the child. You can get around this with opening a pipe (see open in perlfunc) but on some systems this means that the child process cannot outlive the parent.
Signals
You'll have to catch the SIGCHLD signal, and possibly SIGPIPE too. SIGCHLD is sent when the backgrounded process finishes. SIGPIPE is sent when you write to a filehandle whose child process has closed (an untrapped SIGPIPE can cause your program to silently die). This is not an issue with system("cmd&").
Zombies
You have to be prepared to "reap" the child process when it finishes.
$SIG{CHLD} = sub { wait };
$SIG{CHLD} = 'IGNORE';
You can also use a double fork. You immediately wait() for your first child, and the init daemon will wait() for your grandchild once it exits.
unless ($pid = fork) {
unless (fork) {
exec "what you really wanna do";
die "exec failed!";
}
exit 0;
}
waitpid($pid, 0);
See Signals in perlipc for other examples of code to do this. Zombies are not an issue with system("prog &").
Have you tried?
system('xterm -geometry 80x25-5-5 -bg green &');
http://www.rocketaware.com/perl/perlfaq8/How_do_I_start_a_process_in_the_.htm
This is not purely an explanation for Perl. The same problem is under C and other languages.
First understand what the system command does:
Forks
Under the child process call exec
The parent process is waiting for forked child process to finish
It does not matter if you pass multiple arguments or one argument. The difference is, with multiple arguments, the command is executed directly. With one argument, the command is wrapped by the shell, and finally executed as:
/bin/sh -c your_command_with_redirections_and_ambersand
When you pass a command as some_command par1 par2 &, then between the Perl interpreter and the command is the sh or bash process used as a wrapper, and it is waiting for some_command finishing. Your script is waiting for the shell interpreter, and no additional waitpid is needed, because Perl's function system does it for you.
When you want to implement this mechanism directly in your script, you should:
Use the fork function. See example: http://users.telenet.be/bartl/classicperl/fork/all.html
Under the child condition (if), use the exec function. Your user is similar to system, see the manual. Notice, exec causes the child process program/content/data cover by the executed command.
Under the parent condition (if, fork exits with non-zero), you use waitpid, using pid returned by the fork function.
This is why you can run the process in the background. I hope this is simple.
The simplest example:
if (my $pid = fork) { #exits 0 = false for child process, at this point is brain split
# parent ($pid is process id of child)
# Do something what you want, asynchronously with executed command
waitpid($pid); # Wait until child ends
# If you don't want to, don't wait. Your process ends, and then the child process will be relinked
# from your script to INIT process, and finally INIT will assume the child finishing.
# Alternatively, you can handle the SIGCHLD signal in your script
}
else {
# Child
exec('some_command arg1 arg2'); #or exec('some_command','arg1','arg2');
#exit is not needed, because exec completely overwrites the process content
}

What's the differences between system and backticks and pipes in Perl?

Perl supports three ways (that I know of) of running external programs:
system:
system PROGRAM LIST
as in:
system "abc";
backticks as in:
`abc`;
running it through a pipe as in:
open ABC, "abc|";
What are the differences between them? Here's what I know:
You can use backticks and pipes to get the output of the command easily.
that's it (more in future edits?)
system(): runs command and returns command's exit status
backticks: runs command and returns the command's output
pipes : runs command and allows you to use them as an handle
Also backticks redirects the executed program's STDOUT to a variable, and system sends it to your main program's STDOUT.
The perlipc documentation explains the various ways that you can interact with other processes from Perl, and perlfunc's open documentation explains the piped filehandles.
The system sends its output to standard output (and error)
The backticks captures the standard output and returns it (but not standard error)
The piped open allows you to capture the output and read it from a file handle, or to print to a file handle and use that as the input for the external command.
There are also modules that handle these details with the cross-platform edge cases.
system is also returning the exit value of the application (ERRORLEVEL in Windows).
Pipes are a bit more complicated to use, as reading from them and closing them adds extra code.
Finally, they have different implementation which was meant to do different things. Using pipes you're able to communicate back with the executed applications, while the other commands doesn't allow that (easily).
Getting the program's exit status is not limited to system(). When you call close(PIPE), it returns the exit status, and you can get the latest exit status for all 3 methods from $?.
Please also note that
readpipe('...')
is the same as
`...`