External program called with backticks still produces output - perl

So I call an external program in Perl and want to capture its output:
my @RNAalifoldOut = `RNAalifold some parameters`;
If called from command line the output consists of three lines, e.g:
4 sequences; length of alignment 48.
__GCCGA_UGUAGCUCAGUUGGG_AGAGCGCCAGACUGAAAAUCAGA
...((((.....((((.........)))).(((((.......)))))
However, my array @RNAalifoldOut contains only the last two lines, and the first line appears directly on the screen when the line is being executed.
How can this be? I thought maybe the program writes the first line to STDERR, but isn't that discarded by the backticks operator? And what could I do to hide this output?
Regards
Nick

You are likely seeing the standard error from RNAalifold. Backticks capture only the standard output.
Capture both standard output and standard error by changing your code to
my @RNAalifoldOut = `RNAalifold some parameters 2>&1`;
To discard the standard error, use
my @RNAalifoldOut = `RNAalifold some parameters 2>/dev/null`;
on Unix-like platforms. On Windows, use
my @RNAalifoldOut = `RNAalifold some parameters 2>nul`;
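The difference between the two redirections can be checked without RNAalifold. The following is a minimal sketch, assuming a throwaway child script (a hypothetical stand-in, not the real program) that writes one line to STDERR and two lines to STDOUT. File::Spec->devnull is used so the same code picks /dev/null or nul as appropriate.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempfile);

# Hypothetical stand-in for RNAalifold: one status line on STDERR,
# two result lines on STDOUT.
my ($fh, $script) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$fh} <<'END';
print STDERR "4 sequences; length of alignment 48.\n";
print STDOUT "sequence line\n", "structure line\n";
END
close $fh or die "close failed: $!";

# $^X is the path of the perl that is currently running.
my $cmd  = qq{"$^X" "$script"};
my $null = File::Spec->devnull;      # /dev/null on Unix, nul on Windows

my @stdout_only = `$cmd 2>$null`;    # STDERR is thrown away
my @merged      = `$cmd 2>&1`;       # STDERR is captured together with STDOUT

printf "stdout only: %d lines\n", scalar @stdout_only;
printf "merged:      %d lines\n", scalar @merged;
```

With 2>&1 the order of the merged lines depends on buffering in the child, so it is safer not to rely on the STDERR line coming first.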

Related

Perl interface with Aspell

I am trying to identify misspelled words with Aspell via Perl. I am working on a Linux server without administrator privileges which means I have access to Perl and Aspell but not, for example, Text::Aspell which is a Perl interface for Aspell.
I want to do the very simple task of passing a list of words to Aspell and having it return the words that are misspelled. If the words I want to check are "dad word lkjlkjlkj" I can do this through the command line with the following commands:
aspell list
dad word lkjlkjlkj
Aspell requires CTRL + D at the end to submit the word list. It would then return "lkjlkjlkj", as this isn't in the dictionary.
In order to do the exact same thing, but submitted via Perl (because I need to do this for thousands of documents) I have tried:
my $list = q(dad word lkjlkjlkj);
my @arguments = ("aspell list", $list, "^D");
my $aspell_out=`@arguments`;
print "Aspell output = $aspell_out\n";
The expected output is "Aspell output = lkjlkjlkj" because this is the output that Aspell gives when you submit these commands via the command line. However, the actual output is just "Aspell output = ". That is, Perl does not capture any output from Aspell. No errors are thrown.
I am not an expert programmer, but I thought this would be a fairly simple task. I've tried various iterations of this code and nothing works. I did some digging and I'm concerned that perhaps because Aspell is interactive, I need to use something like Expect, but I cannot figure out how to use it. Nor am I sure that it is actually the solution to my problem. I also think ^D should be an appropriate replacement for CTRL+D at the end of the commands, but all I know is it doesn't throw an error. I also tried \cd instead. Whatever it is, there is obviously an issue in either submitting the command or capturing the output.
The complication with using aspell from within a program is that it is an interactive, command-line driven tool, as you suspect. However, there is a simple way to do what you need.
In order to use aspell's command list one needs to pass it words via STDIN, as its man page says. While I find the GNU Aspell manual a little difficult to get going with, passing input to a program via its STDIN is easy enough and we can rewrite the invocation as
echo dad word lkj | aspell list
We get lkj printed back, as expected. Now this can be run from a program just as it stands
my $word_list = q(word lkj good asdf);
my $cmd = qq(echo $word_list | aspell list);
my @aspell_out = qx($cmd);
print for @aspell_out;
This prints lines lkj and asdf.
I assemble the command in a string (as opposed to an array) for specific reasons, explained below. The qx is the operator form of backticks, which I prefer for its far superior readability.
Note that qx returns all output in one string when in scalar context (assigned to a scalar, for example), or as a list of lines when in list context. Here I assign to an array so you get each word as an element (alas, each also comes with a newline, so you may want to do chomp @aspell_out;).
Comment on a list vs string form of a command
I think that it's safe to recommend to use a list-form for a command, in general. So we'd say
my @cmd = ('ls', '-l', $dir); # to be run as an external command
instead of
my $cmd = "ls -l $dir"; # to be run as an external command
The list form generally makes it easier to manage the command, and it avoids the shell altogether.
However, this case is a little different
The qx operator doesn't really behave differently -- the array gets concatenated into a string, and that runs. The very fact that we can pass it an array is incidental, and not even documented
We need to pipe input to aspell's STDIN, and shell does that for us simply. We can use a shell with command's LIST form as well, but then we'd need to invoke it explicitly. We can also go for aspell's STDIN by means other than the shell but that's more complex
With a command in a list the command name must be the first word, so that "aspell list" from the question is wrong and it should fail (there is no command named that) ... except that in this case it wouldn't (if the rest were correct), since for qx the array gets collapsed into a string
Finally, aspell nicely exposes its API in a C library, and that has been utilized for the module you mention. I'd suggest installing it as a user (no privileges needed) and using that.
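For completeness, reaching a program's STDIN without the shell could look roughly like the sketch below, which uses the core module IPC::Open2. The helper name run_filter is made up for this illustration, and a child perl stands in for aspell so the code runs on a machine without it. The aspell call in the final comment is an untested assumption.

```perl
use strict;
use warnings;
use IPC::Open2;

# Feed $input to a command's STDIN and return its STDOUT lines.
# The command is given as a list, so no shell is involved.
sub run_filter {
    my ($input, @cmd) = @_;
    my $pid = open2(my $from_child, my $to_child, @cmd);
    print {$to_child} $input;
    close $to_child;               # end of input, the equivalent of Ctrl+D
    my @lines = <$from_child>;
    close $from_child;
    waitpid $pid, 0;               # reap the child
    chomp @lines;
    return @lines;
}

# Stand-in filter: a child perl that prints back only the words
# longer than four characters, one per line.
my @demo = run_filter(
    "dad word lkjlkjlkj\n",
    $^X, '-ne', 'print "$_\n" for grep { length($_) > 4 } split',
);
print "$_\n" for @demo;

# With aspell installed, the call would presumably be:
# my @misspelled = run_filter("dad word lkjlkjlkj\n", 'aspell', 'list');
```

Writing all the input before reading any output is fine for short word lists, but it can deadlock once the data exceeds the pipe buffers. For thousands of documents, feeding one document per call or using a module that manages both directions is the safer route.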
You should take a step back and investigate whether you can install Text::Aspell without administrator privileges. In most cases that's perfectly possible.
You can install modules into your home directory. If there is no C-compiler available on the server you can install the module on a compatible machine, compile and copy the files.

use a variable with whitespace Perl

I am currently working on a project but I have one big problem. I have some picture with a whitespace in the name and I want to do a montage. The problem is that I can't rename my picture and my code is like that :
$pic1 = qq(picture one.png);
$pic2 = qq(picture two.png);
my $cmd = "C:\...\montage.exe $pic1 $pic2 output.png";
system($cmd);
but because of the whitespace montage.exe doesn't work. How can I execute my code without renaming all my pictures?
Thanks a lot for your answer!
You can properly quote the filenames within the string you pass to system, as @Borodin shows in his answer. Something like: system("montage.exe '$pic1' '$pic2'")
However, a more reliable and safer solution is to pass the arguments to montage.exe as extra parameters in the system call:
system('montage.exe', $pic1, $pic2, 'output.png')
Now you don't have to worry about nesting the correct quotes, or worry about files with unexpected characters. Not only is this simpler code, but it avoids malicious injection issues, should those file names ever come from a tainted source. Someone could enter | rm *, but your system call will not remove all your files for you.
Further, in real life, you probably are not going to have a separate scalar variable for each file name. You'll have them in an array. This makes your system call even easier:
system('montage.exe', @filenames, 'output.png')
Not only is that super easy, but it avoids the pitfall of having a command line too long. If your filenames have nice long paths (maybe 50-100 characters), a Windows command line will exceed the max command length after around 100 files. Passing the arguments through system() instead of in one big string avoids that limitation.
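To see that the list form really keeps each file name intact, here is a small sketch. The helper capture_list is a name invented for this example, and a child perl stands in for montage.exe and simply reports the arguments it received. It relies on the list form of a piped open, which works on Unix and, as far as I know, on Windows only from Perl 5.22 onwards.

```perl
use strict;
use warnings;

# Run a command given as a list (no shell) and return its STDOUT lines.
sub capture_list {
    my @cmd = @_;
    open(my $out, '-|', @cmd) or die "cannot run $cmd[0]: $!";
    chomp(my @lines = <$out>);
    close $out;
    return @lines;
}

my @filenames = ('picture one.png', 'picture two.png');

# Stand-in for montage.exe: prints every argument on its own line.
my @seen = capture_list(
    $^X, '-e', 'print "$_\n" for @ARGV',
    @filenames, 'output.png',
);

print scalar(@seen), " arguments received\n";
print "[$_]\n" for @seen;
```

The child receives three arguments, with the spaces inside the first two preserved, even though nothing was quoted by hand.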
Alternatively, you can pass the arguments to montage.exe as a list (instead of concatenating them all into a string):
use strict;
use warnings;
my $pic1 = qq(picture one.png);
my $pic2 = qq(picture two.png);
my @cmd = ('C:\...\montage.exe', $pic1, $pic2, "output.png");
system(@cmd);
You need to put quotes around the file names that have spaces. You also need to escape the backslashes
my $cmd = qq{C:\\...\\montage.exe "$pic1" "$pic2" output.png};
On Unix systems, the best approach is the multi-argument form of system because 1) it avoids invoking a shell, and 2) that's the format accepted by the OS call. Neither of those is true on Windows. The OS call to spawn a program expects a command line, and system's attempt to form this command line is sometimes incorrect. The safest approach is to use Win32::ShellQuote.
use Win32::ShellQuote qw( quote_system );
system quote_system("C:\\...\\montage.exe", $pic1, $pic2, "output.png");

Calling a shell command with multiple arguments

I'm trying to automate creating certificates via a Perl script.
The command I want to run is:
easyrsa build-client-full $clientname nopass
The way I thought it should be done in Perl is:
my $arguments = ("build-client-full $clientname nopass");
my $cmd = "$easyrsa_path/easyrsa"." "."$arguments";
system("bash", $cmd);
However, this yields
"file not found"
on execution. I triple checked that the path is correct.
If I try it like this:
my @arguments = ("bash", $easyrsa_path,"build-client-full $clientname nopass");
system(@arguments);
Bash returns
"Unknown command 'build-client-full test nopass'. Run without commands
for usage help."
Background
When you use system(LIST) where LIST has more than one element, Perl will not call the shell, and instead directly invoke the program given by the first element in the LIST, and use the rest of the list as command line arguments to be passed verbatim, with no interpolation by the shell, including no splitting arguments on whitespace.
So in your first example, Perl is running the command bash and passing the string "$easyrsa_path/easyrsa build-client-full $clientname nopass", literally as one big long argument, and in your second example, it's running the command bash and passing the two arguments $easyrsa_path and "build-client-full $clientname nopass". However, I assume that easyrsa needs the three arguments as separate strings in its argument list, which the shell would normally split, but since both of your calls to system aren't using the shell, it's not working.
system (and exec) have four ways of interpreting their arguments, as per the documentation:
If you pass a single string (including a LIST with only one element) that does not contain any shell metacharacters, it is split into words and passed directly to execvp(3) (meaning it bypasses the shell).
Warning: This invocation is easily confused with the following - a single metacharacter will cause the shell to be invoked, which can be dangerous especially when unchecked variables are interpolated into the command string.
If you pass a single string (including a LIST with only one element) that does contain shell metacharacters, the entire argument is passed to the system's command shell for parsing. Normally, that's /bin/sh -c on Unix platforms, but the idea of the "default shell" is problematic, and there is certainly no guarantee that it'll be bash (though it could be).
Warning: In this invocation of system, you have the full power of the shell, which also means you're responsible for correctly quoting and escaping any shell metacharacters and/or whitespace. I recommend you only use this form if you explicitly want the power of the shell, and otherwise, it's usually best to use one of the following two.
If there is more than one argument in LIST, this calls execvp(3) with the arguments in LIST, meaning the shell is avoided.
(See below for caveats on Windows.)
The form system {EXPR} LIST always runs the program named by EXPR and avoids the shell, no matter what's in LIST.
(See below for caveats on Windows.)
The latter two are desirable if you want to pass special characters that the shell would normally interpret, and I'd actually always recommend doing this, since blindly passing user input into system can open up a security hole - I wrote a longer article about that over on PerlMonks.
Solutions
@Borodin and @AnFi have already pointed out: If you simply split up the elements of the LIST properly, it should work - it doesn't look like you need any features of bash or any shell here. And don't forget to check for errors!
system("$easyrsa_path/easyrsa","build-client-full",$clientname,"nopass") == 0
or warn "system failed: \$? = $?";
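The raw value of $? is not the exit code itself. A minimal sketch of decoding it, following the scheme described in the documentation for system, might look like this. The helper name describe_status is made up for the example, and a child perl that exits with code 3 stands in for easyrsa.

```perl
use strict;
use warnings;

# Turn the value of $? after system() into a readable description.
sub describe_status {
    my ($status) = @_;
    return "failed to execute" if $status == -1;    # $! has the reason
    return sprintf("died with signal %d", $status & 127) if $status & 127;
    return sprintf("exited with value %d", $status >> 8);
}

# Stand-in command: a child perl that exits with code 3.
system($^X, '-e', 'exit 3');
print describe_status($?), "\n";    # prints "exited with value 3"
```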
Note that there are good modules that provide alternatives to system and qx, my go-to module is usually IPC::Run3. These modules are very helpful if you want to capture output from the external command. In this case, IPC::System::Simple might be easier since it provides a drop-in replacement for system with better error handling, as well as systemx which always avoids the shell. (That module is what autodie uses when you say use autodie ':all';.)
use IPC::System::Simple qw/systemx/;
systemx("$easyrsa_path/easyrsa","build-client-full",$clientname,"nopass");
Note that if you really wanted to call bash, you'd need to add the -c option and say system("bash","-c","--","$easyrsa_path/easyrsa build-client-full $clientname nopass"). But as I said above, I strongly recommend against this, since if $easyrsa_path or $clientname contain any shell metacharacters or malicious content, you may end up having a huge problem.
Windows
Windows is more complicated than the above. The documentation says that the only "reliable" way to avoid calling the shell there is the system PROGRAM LIST form, but on Windows, command line arguments are not passed as a list, but as a single big string, and it's up to the called command, not the shell, to interpret that string, and different commands may do that differently. (I have heard good things about Win32::ShellQuote, though.)
Plus, there's the special system(1, @args) form documented in perlport.
If you pass multiple parameters to system then each one forms a separate parameter to the command line. So it is as though you had entered
easyrsa "build-client-full test nopass"
and you correctly get the error
Unknown command 'build-client-full test nopass'
You also don't need to add bash: perl will run the shell for you if necessary
You can either pass the whole command to system
system($cmd)
and perl will pass it to the shell to be processed as if you'd entered it at the command prompt. Or you can split the parameters properly
system("$easyrsa_path/easyrsa", "build-client-full", $clientname, "nopass")
which will make perl call easyrsa directly unless the command contains things that need the shell to process, like output redirection

Suppress output to stderr in matlab

I'm trying to suppress output from a code section in a script (namely the network initialization from a Caffe network). I've tried wrapping the corresponding bit of code in an evalc command
[suppressed_output, var_output] = evalc('someFunction(input)');
But this doesn't work. I've still got loads of lines of (non-error) output from the network initialization that are clogging my logs (amidst all the wanted output printed via fprintf('') in the script). I think this happens because the corresponding function is writing to STDERR (instead of STDOUT?) - the first line it prints is this warning:
WARNING: Logging before InitGoogleLogging() is written to STDERR
... and then hundreds of lines of what it is doing follow, e.g.:
I0215 15:01:51.840272 28620 upgrade_proto.cpp:66] Attempting to upgrade input file specified using deprecated input fields: tmp-def.prototxt
I0215 15:01:51.840360 28620 upgrade_proto.cpp:69] Successfully upgraded file specified using deprecated input fields.
...
Can I somehow suppress the output to STDERR (without messing with the function content)? Ideally only locally for this specific function, since I'd still like to get potential error messages.
In case it is relevant:
I call myScript via matlab command line and its output written to a log (mlexec.log) with tee:
matlab -nodesktop -nosplash -display :1 -r "try, myScript; catch e, disp(getReport(e)), end, quit force" 2>&1| tee mlexec.log
The problem here is, that in the matlab command line call, the output from STDERR is streamed to STDOUT by this "command": 2>&1. Since the .cpp file seems to stream its output to STDERR (according to the Warning), it will be forwarded to STDOUT and eventually the log.
Replacing 2>&1 with a redirect that streams STDERR (2) to Nirvana, 2>/dev/null (2>NUL on Windows), or to a different log file (e.g. 2>mlexec.stderr.log) solves the problem.
EDIT/UPDATE:
One thing you can do is enclose a section of your code with warning on/off statements as follows:
warning('off','all')
%your code here
warning('on','all')
This should stop any warnings being output to stderr from that section. I personally do not recommend this, it's good to know what you're doing that the MATLAB runtime does not like.

How to read from a redirected file instead of taking command line parameters

I am writing a program where, if no command line arguments are supplied, i.e. @ARGV == 0, the program takes in three inputs. But the program also has the feature to read any files given as arguments, thus
calculate input1 input2
runs the formula on the numbers found in input1 and input2.
The problem I am running into is when I run
calculate < input1
@ARGV returns 0, thus it runs the code for user input.
How do I get around this so that the program can read input1 and use the values inside for calculations?
calculate < input1 is equivalent to cat input1 | calculate.
You need to read from <STDIN> and not look for command line arguments.
That should not be a problem. If you are reading from <> (which is really <ARGV>), then there is no difference.
You must be doing something wrong if redirection changes things. Are you actually opening files yourself???
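One way to combine the two points above is to test whether STDIN is attached to a terminal in addition to looking at @ARGV. The sketch below is only an illustration: the names input_mode and sum_numbers are invented, and adding the numbers up stands in for whatever formula calculate really applies.

```perl
use strict;
use warnings;

# 'stream' when files were named on the command line or STDIN is
# redirected/piped, 'prompt' when run bare at a terminal.
sub input_mode {
    my ($argc, $stdin_is_tty) = @_;
    return ($argc > 0 || !$stdin_is_tty) ? 'stream' : 'prompt';
}

# Extract every number from the given lines and add them up.
sub sum_numbers {
    my $total = 0;
    $total += $_ for map { /(-?\d+(?:\.\d+)?)/g } @_;
    return $total;
}

my @lines;
if (input_mode(scalar @ARGV, -t STDIN) eq 'stream') {
    @lines = <>;    # reads the named files, or STDIN if there are none
}
else {
    print "Enter three numbers: ";
    @lines = (scalar(<STDIN>) // '');
}
print "sum: ", sum_numbers(@lines), "\n";
```

With this structure, calculate input1 input2, calculate < input1 and cat input1 | calculate all go through the same <> loop, and only a bare calculate typed at a terminal asks for input.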
You might consider using a module like Getopt::Euclid or Getopt::Long to make the argument passing more explicit. That might make the program easier to understand for other users too.