Safely pass input line by line (from generator) to subprocess' stdin on Python - subprocess

I want to manage a subprocess with the subprocess module, and I need to pipe a (really) large number of lines to the child's stdin. I'm creating the input with a generator and passing it on to the subprocess like this:
def my_gen (end): # simplified example
    for i in range(0, end):
        yield f"line {i}"
with subprocess.Popen(["command", "-o", "option_value"], # simplified example
        stdin = subprocess.PIPE, stdout = sys.stdout, stderr = sys.stderr) as process:
    for line in my_gen(1e7):
        process.stdin.write(line.encode()) # This is apparently not safe
    out, err = process.communicate() # out and err will be None,
    # but this closes the process gracefully, which "with" does too
This results in a BrokenPipeError, although it doesn't happen every time or on every machine I've tried:
Traceback (most recent call last):
  File "my_script", line 170, in <module>
    process.stdin.write(line.encode())
BrokenPipeError: [Errno 32] Broken pipe
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "path/tolib/python3.8/subprocess.py", line 171, in <module>
  File "path/tolib/python3.8/subprocess.py", line 914, in __exit__
    self.stdin.close()
BrokenPipeError: [Errno 32] Broken pipe
So, what's the safe way to pass input line by line from a generator to a subprocess?
Edit: I've been getting suggestions about using communicate, which is of course in the docs. That answers how to communicate safely, but it doesn't accept a generator as input.
Edit2: As Booboo pointed out, the example as written throws a runtime error (not the one I was seeing in my code). The call to range should be range(0, int(end)) so that my_gen can accept numbers in 1e7 notation.

First of all, if you want stdout and stderr to not be piped, then either do not specify these arguments to the Popen call at all or specify their values as None, the default value if not specified (but do not specify these as sys.stdout and sys.stderr).
Why not? Looking at the source for the Popen.communicate method, I can see that there is special optimized code for the case where there is only one non-None argument. When that argument is the stdin argument, Popen.communicate is implemented by simply writing the passed input string to the pipe, ignoring any BrokenPipeError that might occur. But by passing the stdout and stderr arguments as you are, I suspect that communicate is confused and is now starting threads to handle the processing, and that this ultimately, and intermittently, leads to your exception.
Now I believe that you can execute your writes without using communicate and also ignore the BrokenPipeError. When I tried the following code (substituting my own command being executed by Popen that writes what is being piped in to a file and using text mode), I, in fact, did not encounter any BrokenPipeError exceptions (nor do I expect to with the proper setting of stdout and stderr). So I can't swear to whether the output will still be correct if such an exception should occur.
As an aside, the range built-in function does not take a float object (at least not for me), so I don't know how you are able to specify 1e7.
I have also modified the code to add terminating newline characters at the end of each line and to process in text mode, but you should not feel constrained to do so.
import subprocess
import sys
def my_gen (end): # simplified example
    for i in range(0, end):
        yield f"line {i}\n"
with subprocess.Popen(["command", "-o", "option_value"], stdin=subprocess.PIPE, text=True) as process: # simplified example
    for line in my_gen(10_000_000):
        try:
            process.stdin.write(line)
        except BrokenPipeError as e:
            pass
    out, err = process.communicate()
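A slightly more defensive variant of the loop above, as a minimal sketch: it stops pulling from the generator as soon as the child closes its end of the pipe, instead of swallowing the error for every remaining line. The helper name feed_lines is my own, and any command you try it with is a stand-in for the real one in the question.

```python
import subprocess


def my_gen(end):
    """Yield newline-terminated lines (stand-in for the real generator)."""
    for i in range(int(end)):
        yield f"line {i}\n"


def feed_lines(cmd, lines):
    """Write lines to the child's stdin one at a time.

    Returns (returncode, completed). completed is False when the child
    closed its stdin before all lines were written.
    Assumption: POSIX behaviour, where an early close raises
    BrokenPipeError in the parent.
    """
    completed = True
    with subprocess.Popen(cmd, stdin=subprocess.PIPE, text=True) as process:
        try:
            for line in lines:
                process.stdin.write(line)
        except BrokenPipeError:
            # The child stopped reading; there is no point generating more.
            completed = False
        # With only stdin piped, communicate() closes stdin (ignoring a
        # BrokenPipeError on the final flush) and waits for the child.
        process.communicate()
    return process.returncode, completed
```

Called as feed_lines(["command", "-o", "option_value"], my_gen(1e7)), this tells you whether the child consumed everything, which the plain "pass" version cannot.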

Docs say to use .communicate:
Warning: Use communicate() rather than .stdin.write, .stdout.read or .stderr.read to avoid deadlocks due to any of the other OS pipe buffers filling up and blocking the child process.
https://docs.python.org/3/library/subprocess.html#subprocess.Popen.communicate
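The deadlock this warning describes only comes into play when another stream (stdout or stderr) is piped as well. As a sketch of one common workaround that still accepts a generator, you can feed stdin from a helper thread while the main thread drains stdout. The names pump and run_with_generator are hypothetical.

```python
import subprocess
import threading


def pump(lines, pipe):
    """Write every line to pipe, then close it so the child sees EOF."""
    try:
        for line in lines:
            pipe.write(line)
    except BrokenPipeError:
        pass  # child stopped reading early
    finally:
        try:
            pipe.close()
        except BrokenPipeError:
            pass  # the final flush can fail for the same reason


def run_with_generator(cmd, lines):
    """Stream lines into cmd's stdin and return (returncode, captured stdout)."""
    with subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                          text=True) as process:
        writer = threading.Thread(target=pump, args=(lines, process.stdin))
        writer.start()
        # Reading here keeps the child's stdout pipe from filling up while
        # the writer thread is still sending input.
        output = process.stdout.read()
        writer.join()
    return process.returncode, output
```

The generator is consumed lazily in the writer thread, so the full input is never held in memory. The captured output is, though, so this only suits cases where the output is of manageable size.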

Related

Suppress output to stderr in matlab

I'm trying to suppress output from a code section in a script (namely the network initialization from a Caffe network). I've tried wrapping the corresponding bit of code in an evalc command
[suppressed_output, var_output] = evalc('someFunction(input)');
But this doesn't work. I've still got loads of lines of (non-error) output from the network initialization that are clogging my logs (amidst all the wanted output printed via fprintf('') in the script). I think this happens because the corresponding function is writing to STDERR (instead of STDOUT?) - the first line it prints is this warning:
WARNING: Logging before InitGoogleLogging() is written to STDERR
... and then hundreds of lines of what it is doing follow, e.g.:
I0215 15:01:51.840272 28620 upgrade_proto.cpp:66] Attempting to upgrade input file specified using deprecated input fields: tmp-def.prototxt
I0215 15:01:51.840360 28620 upgrade_proto.cpp:69] Successfully upgraded file specified using deprecated input fields.
...
Can I somehow suppress the output to STDERR (without messing with the function content)? Ideally only locally for this specific function, since I'd still like to get potential error messages.
In case it is relevant:
I call myScript via the matlab command line, and its output is written to a log (mlexec.log) with tee:
matlab -nodesktop -nosplash -display :1 -r "try, myScript; catch e, disp(getReport(e)), end, quit force" 2>&1| tee mlexec.log
The problem here is that, in the matlab command line call, the output from STDERR is redirected to STDOUT by this part of the command: 2>&1. Since the .cpp file seems to write its output to STDERR (according to the warning), it is forwarded to STDOUT and eventually ends up in the log.
Sending STDERR (2) to nirvana with 2>/dev/null (2>NUL on Windows) or to a different log file (e.g. 2>mlexec.stderr.log) solves the problem.
I wanted to post this in a comment but it said I had to have 50 reputation (I have 49 now...)
I think this is what you're looking for
EDIT/UPDATE:
One thing you can do is enclose a section of your code with warning on/off statements as follows:
warning('off','all')
%your code here
warning('on','all')
This should stop any warnings being output to stderr from that section. I personally do not recommend this; it's good to know what you're doing that the MATLAB runtime does not like.

opening a batch file that opens a text file in python

I am writing a script that can execute a batch file, which needs to open a file in the same folder first. My current code is:
from subprocess import Popen
p = Popen("Mad8dl.bat <RUNTHISTO.txt>", cwd=r"C:\...\test")
stdout, stderr = p.communicate()
where the ... is just the path to the folder. However, every time I run it I get the syntax error:
The syntax of the command is incorrect
Any help regarding the syntax would be greatly appreciated.
First, you should probably remove the < and > angle brackets from your code; just pass the filename, without any brackets, to your batch file. (Unless your filename really does contain < and > characters, in which case I really want to know how you managed it since those characters are forbidden in filenames in Windows).
Second, your code should look like:
from subprocess import Popen, PIPE
p = Popen(["Mad8dl.bat", "RUNTHISTO.txt"], cwd=r"C:\...\test", stdout=PIPE, stderr=PIPE)
stdout, stderr = p.communicate()
Note the list containing the components of the call, rather than a single string. Also note that you need to specify stdout=PIPE and stderr=PIPE in your Popen() call if you want communicate() to return the captured output later on; without them it returns (None, None).
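To illustrate the list form together with cwd and captured output, here is a small self-contained sketch. run_in_folder is a hypothetical helper name, and nothing in it is specific to batch files.

```python
import subprocess


def run_in_folder(args, folder):
    """Run args (a list, one element per argument) with folder as the
    working directory, and return (returncode, stdout, stderr)."""
    process = subprocess.Popen(
        args,
        cwd=folder,               # relative file names resolve against this
        stdout=subprocess.PIPE,   # needed for communicate() to return output
        stderr=subprocess.PIPE,
        text=True,                # decode bytes to str
    )
    stdout, stderr = process.communicate()
    return process.returncode, stdout, stderr
```

For the question's case the call would be run_in_folder(["Mad8dl.bat", "RUNTHISTO.txt"], r"C:\...\test"), with the real folder path filled in.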

External program called with backticks still produces output

So I call an external program in Perl and want to capture its output:
my @RNAalifoldOut = `RNAalifold some parameters`;
If called from command line the output consists of three lines, e.g:
4 sequences; length of alignment 48.
__GCCGA_UGUAGCUCAGUUGGG_AGAGCGCCAGACUGAAAAUCAGA
...((((.....((((.........)))).(((((.......)))))
However, my array @RNAalifoldOut contains only the last two lines, and the first line appears directly on the screen when the line is executed.
How can this be? I thought maybe the program writes the first line to STDERR, but isn't that discarded by the backticks operator? And what could I do to hide this output?
Regards
Nick
You are likely seeing the standard error from RNAalifold. Backticks capture only the standard output.
Capture both standard output and standard error by changing your code to
my @RNAalifoldOut = `RNAalifold some parameters 2>&1`;
To discard the standard error, use
my @RNAalifoldOut = `RNAalifold some parameters 2>/dev/null`;
on Unix-like platforms. On Windows, use
my @RNAalifoldOut = `RNAalifold some parameters 2>nul`;

How to access buffer contents of Expect module in perl

I am using expect to automate terminal-based applications. I send data depending on the result of the "expect" command. I know that expect, while doing string matching, stores all the unmatched output in a buffer. For example, $expect_out(0,string) stores the string that expect was actually waiting for, while $expect_out(buffer) contains all the unmatched output that has accumulated since the previous command.
I want to know if there is any way of accessing these expect buffers, like copying expect buffer contents into some variable as shown below
$mybuffer = $expect_out(buffer);
but the above statement is actually throwing an error "syntax error at perl_app_hh.pl line 72, near "$expect_out(""
I just want to copy contents of expect buffer to a variable. So please help me on this issue.
You're going to have to read the documentation for the Expect module. $expect_out(buffer) is Tcl syntax from the original expect tool; it is not valid Perl.
$exp = Expect->spawn(...);
$exp->send(...);
$buffer = $exp->before();

Spotify Tech Puzzle - stdin in Python

I'm trying to solve the bilateral problem on Spotify's Tech Puzzles. http://www.spotify.com/us/jobs/tech/bilateral-projects/ I have something working on my computer that reads input from a file input.txt and writes to output.txt. My problem is that I cannot figure out how to make my code work when I submit it, where it must read from stdin. I have looked at several other posts and I don't see anything that makes sense to me. I see some people just use raw_input, but doesn't this produce a user prompt? Not sure what to do. Here is the portion of my code that is supposed to read the input and write the output. Any suggestions on how this might need to be changed? Also, how would I test the code once it is changed to read from stdin? How can I put test data in stdin? The error I get back from Spotify says Run Time Error - NameError.
import sys
# Read input
Input = []
for line in sys.stdin.readlines():
    if len(line) <9:
        teamCount = int(line)
    if len(line) > 8:
        subList = []
        a = line[0:4]
        b = line[5:9]
        subList.append(a)
        subList.append(b)
        Input.append(subList)
##### algorithm here
#write output
print listLength
for empWin in win:
    print empWin
You are actually doing ok.
for line in sys.stdin.readlines():
will read lines from stdin. It can however be shortened to:
for line in sys.stdin:
I don't use Windows, but to test your solution from a command line, you should run it like this:
python bilateral.py < input.txt > output.txt
If I run your code above like that, I see the error message
Traceback (most recent call last):
  File "bilateral.py", line 20, in <module>
    print listLength
NameError: name 'listLength' is not defined
which, presumably by coincidence (I assume this is not exactly the code you submitted), is the same error the Spotify puzzle checker reported. You have probably just misspelled a variable somewhere.
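On the question of how to put test data into stdin without a shell redirect: one option, sketched below in Python 3, is to move the parsing into a function that accepts any iterable of lines. The same code can then be given sys.stdin on submission and an io.StringIO in a test. The function name read_pairs and the assumed input format (a count line followed by lines holding two whitespace-separated IDs) are my guesses from the slicing in the question, not the puzzle's specification.

```python
import io


def read_pairs(stream):
    """Parse an iterable of lines into (team_count, pairs).

    Assumed format: one line with a single integer, followed by lines
    that each hold two whitespace-separated IDs.
    """
    team_count = None
    pairs = []
    for line in stream:
        fields = line.split()
        if len(fields) == 1:
            team_count = int(fields[0])
        elif len(fields) == 2:
            pairs.append(fields)
    return team_count, pairs


# In a test, an in-memory file stands in for stdin:
sample = io.StringIO("2\n1009 2011\n1017 2011\n")
team_count, pairs = read_pairs(sample)

# In the submitted script the call would instead be:
#     team_count, pairs = read_pairs(sys.stdin)
```

Splitting on whitespace rather than slicing fixed columns also means a trailing newline or a missing one on the last line does not change the result.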