What are the possible situations where one should prefer unbuffered output? - perl

From the discussion in my previous question, I came to know that Perl line-buffers output by default.
$| = 0; # for buffered output (by default)
If you want unbuffered output, set the special variable $| to 1, i.e.
$| = 1; # for unbuffered output
Now I want to know: what are the possible situations where one should prefer unbuffered output?

You want unbuffered output for interactive tasks. By that, I mean you don't want output stuck in some buffer when you expect someone or something else to respond to the output.
For example, you wouldn't want user prompts sent to STDOUT to be buffered. (That's why STDOUT is never fully buffered when attached to a terminal. It is only line buffered, and the buffer is flushed by attempts to read from STDIN.)
For example, you'd want requests sent over pipes and sockets to not get stuck in some buffer, as the other end of the connection would never see them.
The only other reason I can think of is when you don't want important data to be stuck in a buffer in the event of an unrecoverable error such as a panic or death by signal.
For example, you might want to keep a log file unbuffered in order to be able to diagnose serious problems. (This is why STDERR isn't buffered by default.)
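To make the interactive case concrete, here is a minimal sketch (the prompt text is invented for illustration) of why a prompt must be flushed before the program blocks on input:
use strict;
use warnings;
use IO::Handle;

STDOUT->autoflush(1);        # flush STDOUT after every print

print "Enter your name: ";   # must reach the user before we block on STDIN
my $name = <STDIN>;
exit unless defined $name;   # EOF
chomp $name;
print "Hello, $name\n";
Without the autoflush call, if STDOUT is redirected to a pipe or file the prompt can sit in the buffer while the program waits on STDIN.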

Here's a small sample of Perl users from StackOverflow who have benefited from learning to set $| = 1:
STDOUT redirected externally and no output seen at the console
Perl Print function does not work properly when Sleep() is used
can't write to file using print
perl appending issues
Unclear perl script execution
Perl: Running a "Daemon" and printing
Redirecting STDOUT of a pipe in Perl
Why doesn't my parent process see the child's output until it exits?
Why does adding or removing a newline change the way this perl for loop functions?
Perl not printing properly
In Perl, how do I process input as soon as it arrives, instead of waiting for newline?
What is the simple way to keep the output stream exactly as it shown out on the screen (while interactive data used)?
Is it possible to print from a perl CGI before the process exits?
Why doesn't print output anything on each iteration of a loop when I use sleep?
Perl Daemon Not Working with Sleep()

It can be useful when writing to another program over a socket or pipe. It can also be useful when you are writing debugging information to STDOUT to watch the state of your program live.
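As a hedged illustration of the pipe case (the command name 'consumer_process' is made up), enabling autoflush on a pipe handle so the other program sees each request immediately:
use strict;
use warnings;
use IO::Handle;

# 'consumer_process' is a placeholder for whatever reads the pipe.
open(my $pipe, '|-', 'consumer_process') or die "cannot open pipe: $!";
$pipe->autoflush(1);          # flush each request instead of letting it sit

print {$pipe} "request 1\n";  # the other end sees this immediately
print {$pipe} "request 2\n";
close($pipe) or warn "pipe close failed: $!";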

Related

Find data point generating Error in Perl code

I have a program in Perl that reads one line at a time from a data file and computes certain statistics for each line of data. Every now and then, while the program reads through my dataset, I get a warning about an ...uninitialized value... and I would like to know which line of data generates this warning.
Is there any way I can tell Perl to print (to screen or file) the data point that is generating the error?
If your script prints one line for each input line, it is simpler to see where the error occurs by flushing the standard output so it keeps pace with the standard error (making the message appear at the "right" point):
$| = 1;
That is, turn on Perl's autoflush feature, as discussed in How to flush output to the console?
What (auto)flushing does:
error messages are written to the predefined STDERR stream; normal print/printf calls go to the (default) predefined STDOUT.
data written to these streams is saved up by the system (under Perl's control) to be written out in chunks (called "buffers") to improve efficiency.
the STDERR and STDOUT buffers are independent, and may be written line by line or in whole buffers (many characters, not necessarily whole lines).
using autoflush tells Perl to modify its scheme for writing buffers so that their content is handed to the operating system at the end of each print/printf call.
STDERR is normally written promptly anyway; setting $| = 1 enables the same immediate flushing for the currently selected (default) stream, i.e., STDOUT.
doing that makes both of them write promptly, so that messages sent close in time via either stream appear close together in your script's output.
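Putting that together, a minimal sketch (the tab-separated field layout is invented) of a line-processing script with autoflush on, so each line's output and any warning it triggers appear side by side:
#!/usr/bin/perl
use strict;
use warnings;

$| = 1;   # autoflush the currently selected handle, i.e. STDOUT

while (my $line = <>) {
    chomp $line;
    print "processing: $line\n";        # flushed right away
    my @fields = split /\t/, $line;
    my $sum = $fields[0] + $fields[1];  # an "uninitialized value" warning
                                        # here now appears next to its line
}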
Perl usually includes the file handle and input line number in warnings by default; e.g.,
>echo hello | perl -lnwe 'print $x'
Name "main::x" used only once: possible typo at -e line 1.
Use of uninitialized value $x in print at -e line 1, <> line 1.
So if you're doing the computation while reading, you get the appropriate warning.

Perl STDIN without buffering or line buffering

I have a Perl script that receives input piped from another program. It's buffering with an 8k (Ubuntu default) input buffer, which is causing problems. I'd like to use line buffering or disable buffering completely. It doesn't look like there's a good way to do this. Any suggestions?
use IO::Handle;
use IO::Poll qw[ POLLIN POLLHUP POLLERR ];
use Text::CSV;

my $csv = Text::CSV->new();            # parser for the piped CSV input

my $stdin = IO::Handle->new();
$stdin->fdopen(fileno(STDIN), 'r');
$stdin->setbuf(undef);                 # attempt to disable input buffering

my $poll = IO::Poll->new() or die "cannot create IO::Poll object";
$poll->mask($stdin => POLLIN);
STDIN->blocking(0);

my $halt = 0;
for (;;) {
    $poll->poll($config{poll_timeout});    # timeout value defined elsewhere
    for my $handle ($poll->handles(POLLIN | POLLHUP | POLLERR)) {
        next unless $handle eq $stdin;
        if (eof) {
            $halt = 1;
            last;
        }
        my $row = $csv->getline($stdin);   # getline returns an array ref
        # Do more stuff here
    }
    last if $halt;
}
Polling STDIN throws a wrench into things, since readline-style reads are buffered and direct calls like sysread are not (and they can't mix). I don't want to call sysread in an endless non-blocking loop; I need select or poll because I don't want to hammer the CPU.
PLEASE NOTE: I'm talking about STDIN, NOT STDOUT. $|++ is not the solution.
[EDIT]
Updating my question to clarify based on the comments and other answers.
The program that is writing to STDOUT (on the other side of the pipe) is line buffered and flushed after every write. Every write contains a newline, so in effect, buffering is not an issue for STDOUT of the first program.
To verify this is true, I wrote a small C program that reads piped input from the same program with STDIN buffering disabled (setvbuf with _IONBF). The input appears in STDIN of the test program immediately. Sadly, it does not appear to be an issue with the output from the first program.
[/EDIT]
Thanks for any insight!
PS. I have done a fair amount of Googling. This link is the closest I've found to an answer, but it certainly doesn't satisfy all my needs.
Say there are two short lines in the pipe's buffer.
IO::Poll notifies you there's data to read, which you proceed to read (indirectly) using readline.
Reading one character at a time from a file handle is very inefficient. As such, readline (aka <>) reads a block of data from the file handle at a time. The two lines end up in Perl's buffer, and the first of the two lines is returned.
Then you wait for IO::Poll to notify you that there is more data. It doesn't know about Perl's buffer; it just knows the pipe is empty. As such, it blocks.
This post demonstrates the problem. It uses IO::Select, but the principle (and solution) is the same.
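One common workaround (a sketch under assumptions, not the linked post's exact code) is to bypass readline's buffer entirely: sysread into your own buffer whenever poll reports data, and split out complete lines yourself:
use strict;
use warnings;
use IO::Poll qw[ POLLIN POLLHUP POLLERR ];

my $poll = IO::Poll->new() or die "cannot create IO::Poll object";
$poll->mask(\*STDIN => POLLIN);

my $pending = '';                  # our own line buffer
READ: for (;;) {
    $poll->poll(1);                # wake at most once per second
    for my $handle ($poll->handles(POLLIN | POLLHUP | POLLERR)) {
        my $n = sysread($handle, my $chunk, 65536);  # one real read(2)
        die "sysread: $!" unless defined $n;
        last READ if $n == 0;      # EOF
        $pending .= $chunk;
        while ($pending =~ s/^([^\n]*)\n//) {        # peel off full lines
            my $line = $1;
            # process $line here
        }
    }
}
Because sysread never reads more than what the kernel already has, poll's view of the pipe and Perl's view of the data stay in sync.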
You're actually talking about the other program's STDOUT. The solution is $|=1; (or equivalent) in the other program.
If you can't, you might be able to convince the other program use line-buffering instead of block buffering by connecting its STDOUT to a pseudo-tty instead of a pipe (like Expect.pm does, for example).
The unix program expect has a tool called unbuffer which does exactly that. (It's part of the expect-dev package on Ubuntu.) Just prefix the command name with unbuffer.

Line buffered reading in Perl

I have a perl script, say "process_output.pl", which is used in the following context:
long_running_command | "process_output.pl"
The process_output script needs to act like the unix "tee" command: dump the output of "long_running_command" to the terminal as it is generated, capture it to a text file as well, and, when "long_running_command" finishes, fork another process with the text file as its input.
The behavior I am currently seeing is that the output of "long_running_command" gets dumped to the terminal only when it completes, instead of as it is generated. Do I need to do something special to fix this?
Based on my reading of a few other Stack Exchange posts, I tried the following in "process_output.pl", without much help:
select(STDOUT); $| =1;
select(STDIN); $| =1; # Not sure even if this is needed
use FileHandle; STDOUT->autoflush(1);
stdbuf -oL -eL long_running_command | "process_output.pl"
Any pointers on how to proceed further?
Thanks
AB
This is more likely an issue with the output of the first process being buffered, rather than the input of your script. The easiest solution would be to try using the unbuffer command (I believe it's part of the expect package), something like
unbuffer long_running_command | "process_output.pl"
The unbuffer command will disable the buffering that happens normally when output is directed to a non-interactive place.
This will be the output buffering of long_running_command. More than likely it is using stdio, which looks at what the output file descriptor is connected to before it does any output. If that is a terminal (tty), it will generally output line by line, but in the above case it notices it is writing to a pipe and therefore buffers the output into larger chunks.
You can control the buffering in your own process by using, as you showed
select(STDOUT); $| =1;
This means that whatever your process prints to STDOUT is not buffered. It makes no sense to do this for input, as you already control how much buffering is done: if you use sysread() you read unbuffered, while a construct like <$fh> makes Perl wait until it has a "whole line" (it actually reads up to the next input record separator, as defined in the variable $/, which is a newline by default) before returning data to you.
unbuffer can be used to "disable" the output buffering; what it actually does is make the writing process think it is talking to a tty (by using a pseudo-tty), so that process does not buffer its output.
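For completeness, a hedged sketch of what a tee-like "process_output.pl" could look like, with autoflush on so lines reach the terminal as they arrive (the log file name and 'postprocess.pl' are placeholders):
#!/usr/bin/perl
use strict;
use warnings;

$| = 1;                                      # flush STDOUT after each print
open(my $log, '>', 'captured_output.txt')    # file name is illustrative
    or die "cannot open log: $!";

while (my $line = <STDIN>) {
    print $line;                             # echo to the terminal now
    print {$log} $line;                      # and capture to the file
}
close($log) or die "cannot close log: $!";

# long_running_command has finished; hand the capture to the next stage
# ('postprocess.pl' stands in for whatever is forked next)
system($^X, 'postprocess.pl', 'captured_output.txt') == 0
    or warn "postprocess failed: $?";
Note that this only fixes buffering inside the script itself; if long_running_command block-buffers its own output when writing to a pipe, you still need unbuffer (or stdbuf) on that side.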

Disabling Inappropriate Buffering Perl

I was working with a file parser in Perl that prints the name of every file it processes. I noticed that these print outputs appeared out of order, which got my attention. After further digging, I found out that this is because Perl buffers output and releases these print statements only when the buffer is full. I also learned that there is a workaround: "making the filehandle hot". Whenever you print to a hot filehandle, Perl flushes the buffer immediately. So my questions are:
Are there any consequences of "making the filehandle hot" ?
Does letting the buffer fill up before flushing, versus flushing immediately, have any effect on performance?
Perl uses different output buffering modes depending on context: writing to files etc. is buffered in chunks (this is important for performance), while a handle is flushed after each line if Perl has reason to believe that the output goes to a terminal. STDERR is unbuffered by default.
You can deactivate buffering for the currently selected file handle by setting the special $| variable to a true value. However, this is better expressed as:
use IO::File; # on older perls
...
$some_file_handle->autoflush(1);
print { $some_file_handle } "this isn't buffered";
which has the advantage that you don't have to use the annoying select function for handles other than STDOUT. Why is this method called autoflush? The file handle is still buffered, but the buffer is automatically flushed after each print or say call.
Careful: The autoflush method won't work on truly ancient perls where file handles aren't objects yet. In that case, do the select dance:
my $old_fh = select $my_fh;
$| = 1;
select $old_fh;
print { $my_fh } "this isn't buffered";
(select returns the currently selected file handle).

Perl - can't flush STDOUT or STDERR

Perl 5.14 from stock Ubuntu Precise repos. Trying to write a simple wrapper to monitor progress on copying from one stream to another:
use IO::Handle;

while ($bufsize = read(SOURCE, $buffer, 1048576)) {
    STDERR->printflush("Transferred $xferred of $sendsize bytes\n");
    $xferred += $bufsize;
    print TARGET $buffer;
}
This does not perform as expected (writing a line each time the 1M buffer is read). I end up seeing the first line (with a blank value of $xferred), and then nothing until everything flushes on the 7th and 8th lines (on an 8MB transfer). Been pounding my brains out on this for hours - I've read the perldocs, I've read the classic "Suffering from Buffering" article, I've tried everything from select and $|++ to IO::Handle to binmode (STDERR, "::unix") to you name it. I've also tried flushing TARGET with each line using IO::Handle (TARGET->flush). No dice.
Has anybody else ever encountered this? I don't have any ideas left. Sleeping one second "fixes" the problem, but obviously I don't want to sleep a second every time I read a buffer just so my progress will output on the screen!
FWIW, the problem is exactly the same whether I'm outputting to STDERR or STDOUT.
Perl read calls fread(3), not read(2).
This means that it goes through libc and may be using an internal buffer larger than yours; i.e., it gets all the data there is to be received and then quickly throws it at you in 1MB increments.
If this conjecture is correct, the solution might be to use sysread, which calls read(2), instead of read.
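An untested sketch of that suggested change, swapping read for sysread so each 1 MB request is one real read(2) call (the counter is also incremented before printing, so the first line shows a real total; SOURCE, TARGET, and $sendsize are assumed to be set up as in the question's surrounding code):
use IO::Handle;

# SOURCE, TARGET and $sendsize come from the question's setup code.
my ($buffer, $xferred) = ('', 0);
while (my $bufsize = sysread(SOURCE, $buffer, 1048576)) {
    $xferred += $bufsize;   # count first so the message shows a real total
    STDERR->printflush("Transferred $xferred of $sendsize bytes\n");
    print TARGET $buffer;
}
Because sysread bypasses the PerlIO/stdio layer, the loop body now runs as the data actually arrives rather than after libc has drained its own larger buffer.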