I was working with a file parser in perl that prints the name of every file it processes. But i noticed that these print outputs appeared out of order which got my attention. After further digging, i found out that this is because, Perl is using Buffering and releases these print statements to the output only when the buffer is full. I also learned that there is a work around by "making the filehandle hot". Whenever you print to a hot filehandle, Perl flushes the buffer immediately. So my question is :
Are there any consequences of "making the filehandle hot" ?
Does leaving the buffer to get filled up before flushing vs flushing immediately have any effect on performance ?
Perl uses different output buffering modes depending on context: Writing to files etc. buffers in chunks (this is important for performance), while a handle is flushed after each line if perl has reason to believe that the output goes to a terminal. STDERR is unbuffered by default.
You can deactivate buffering for the currently selected file handle by setting the special $| variable to a true value. However, this is better expressed as:
use IO::File; # on older perls
...
$some_file_handle->autoflush(1);
print { $some_file_handle } "this isn't buffered";
which has the advantage that you don't have to use the annoying select function for handles other than STDOUT. Why is this method called autoflush? The file handle is still buffered, but the buffer is automatically flushed after each print or say call.
Careful: The autoflush method won't work on truly ancient perls where file handles aren't objects yet. In that case, do the select dance:
my $old_fh = select $my_$fh;
$| = 1;
select $old_fh;
print { $my_fh } "this isn't buffered";
(select returns the currently selected file handle).
Related
I have a Perl script that received input piped from another program. It's buffering with an 8k (Ubuntu default) input buffer, which is causing problems. I'd like to use line buffering or disable buffering completely. It doesn't look like there's a good way to do this. Any suggestions?
use IO::Handle;
use IO::Poll qw[ POLLIN POLLHUP POLLERR ];
use Text::CSV;
my $stdin = new IO::Handle;
$stdin->fdopen(fileno(STDIN), 'r');
$stdin->setbuf(undef);
my $poll = IO::Poll->new() or die "cannot create IO::Poll object";
$poll->mask($stdin => POLLIN);
STDIN->blocking(0);
my $halt = 0;
for(;;) {
$poll->poll($config{poll_timout});
for my $handle ($poll->handles(POLLIN | POLLHUP | POLLERR)) {
next unless($handle eq $stdin);
if(eof) {
$halt = 1;
last;
}
my #row = $csv->getline($stdin);
# Do more stuff here
}
last if($halt);
}
Polling STDIN kind of throws a wrench into things since IO::Poll uses buffering and direct calls like sysread do not (and they can't mix). I don't want to infinitely call sysread without no blocking. I require the use of select or poll since I don't want to hammer the CPU.
PLEASE NOTE: I'm talking about STDIN, NOT STDOUT. $|++ is not the solution.
[EDIT]
Updating my question to clarify based on the comments and other answers.
The program that is writing to STDOUT (on the other side of the pipe) is line buffered and flushed after every write. Every write contains a newline, so in effect, buffering is not an issue for STDOUT of the first program.
To verify this is true, I wrote a small C program that reads piped input from the same program with STDIN buffering disabled (setvbuf with _IONBF). The input appears in STDIN of the test program immediately. Sadly, it does not appear to be an issue with the output from the first program.
[/EDIT]
Thanks for any insight!
PS. I have done a fair amount of Googling. This link is the closest I've found to an answer, but it certainly doesn't satisfy all my needs.
Say there are two short lines in the pipe's buffer.
IO::Poll notifies you there's data to read, which you proceed to read (indirectly) using readline.
Reading one character at a time from a file handle is very inefficient. As such, readline (aka <>) reads a block of data from the file handle at a time. The two lines ends up in a buffer and the first of the two lines is returned.
Then you wait for IO::Poll to notify you that there is more data. It doesn't know about Perl's buffer; it just knows the pipe is empty. As such, it blocks.
This post demonstrates the problem. It uses IO::Select, but the principle (and solution) is the same.
You're actually talking about the other program's STDOUT. The solution is $|=1; (or equivalent) in the other program.
If you can't, you might be able to convince the other program use line-buffering instead of block buffering by connecting its STDOUT to a pseudo-tty instead of a pipe (like Expect.pm does, for example).
The unix program expect has a tool called unbuffer which does that exactly that. (It's part of the expect-dev package on Ubuntu.) Just prefix the command name with unbuffer.
I have a perl script, say "process_output.pl" which is used in the following context:
long_running_command | "process_output.pl"
The process_output script, needs to be like the unix "tee" command, which dumps output of "long_running_command" to the terminal as it gets generated, and in addition captures output to a text file, and at the end of "long_running_command", forks another process with the text file as an input.
The behavior I am currently seeing is that, the output of "long_running_command" gets dumped to the terminal, only when it gets completed instead of, dumping output as it gets generated. Do I need to do something special to fix this?
Based on my reading in a few other stackexchange posts, i tried the following in "process_output.pl", without much help:
select(STDOUT); $| =1;
select(STDIN); $| =1; # Not sure even if this is needed
use FileHandle; STDOUT->autoflush(1);
stdbuf -oL -eL long_running_command | "process_output.pl"
Any pointers on how to proceed further.
Thanks
AB
This is more likely an issue with the output of the first process being buffered, rather than the input of your script. The easiest solution would be to try using the unbuffer command (I believe it's part of the expect package), something like
unbuffer long_running_command | "process_output.pl"
The unbuffer command will disable the buffering that happens normally when output is directed to a non-interactive place.
This will be the output processing of long_running_processing. More than likely it is using stdio - which will look to see what the output file descriptor is connected to before it does outputing. If it is a terminal (tty), then it will generally output line based, but in the above case - it will notice it is writing to a pipe and will therefore buffer the output into larger chunks.
You can control the buffering in your own process by using, as you showed
select(STDOUT); $| =1;
This means that things that your process prints to STDIO, are not buffered - it makes no sense doing this for input, as you control how much buffering is done - if you use sysread() then you are reading unbuffered, if you use a construct like <$fh> then perl will await until it has a "whole line" (it actually reads up to the next input line separator (as defined in variable $/ which is newline by default)) before it returns data to you.
unbuffer can be used to "disable" the output buffering, what it actually does is make the outputing process think that it is talking to a tty (by using a pseudo tty) so the output process does not buffer.
By the discussion in my previous question I came to know that Perl gives line buffer output by default.
$| = 0; # for buffered output (by default)
If you want to get unbuffered output then set the special variable $| to 1 i.e.
$| = 1; # for unbuffered output
Now I want to know that what can be the possible situations where one should prefer the unbuffered output?
You want unbuffered output for interactive tasks. By that, I mean you don't want output stuck in some buffer when you expect someone or something else to respond to the output.
For example, you wouldn't want user prompts sent to STDOUT to be buffered. (That's why STDOUT is never fully buffered when attached to a terminal. It is only line buffered, and the buffer is flushed by attempts to read from STDIN.)
For example, you'd want requests sent over pipes and sockets to not get stuck in some buffer, as the other end of the connection would never see it.
The only other reason I can think of is when you don't want important data to be stuck in a buffer in the event of a unrecoverable error such as a panic or death by signal.
For example, you might want to keep a log file unbuffered in order to be able to diagnose serious problems. (This is why STDERR isn't buffered by default.)
Here's a small sample of Perl users from StackOverflow who have benefited from learning to set $| = 1:
STDOUT redirected externally and no output seen at the console
Perl Print function does not work properly when Sleep() is used
can't write to file using print
perl appending issues
Unclear perl script execution
Perl: Running a "Daemon" and printing
Redirecting STDOUT of a pipe in Perl
Why doesn't my parent process see the child's output until it exits?
Why does adding or removing a newline change the way this perl for loop functions?
Perl not printing properly
In Perl, how do I process input as soon as it arrives, instead of waiting for newline?
What is the simple way to keep the output stream exactly as it shown out on the screen (while interactive data used)?
Is it possible to print from a perl CGI before the process exits?
Why doesn't print output anything on each iteration of a loop when I use sleep?
Perl Daemon Not Working with Sleep()
It can be useful when writing to another program over a socket or pipe. It can also be useful when you are writing debugging information to STDOUT to watch the state of your program live.
I've found something similar like this in a piece of code:
use IO::Handle;
autoflush STDOUT 1;
print '';
Is the purpose of "print" to empty a possibly filled buffer?
The print forces all text in buffer (from previous prints) to be ouputted immediately. The code basically disable buffering and flush everything.
The print call should be a wasted system call. perlvar states, "If set to nonzero, forces a flush right away and after every write or print on the currently selected output channel." The code in this example should turn on autoflush, causing a flush, then add noting to the STDOUT buffer and cause a flush. There may be another reason for the print but my guess is that the original author of the code made the same assumption as bvr that there would be data left in the buffer after the call to autoflush that would need to be flushed.
I have Perl script which appends a new line to the existing file every 3 seconds. Also, there is a C++ application which reads from that file.
The problem is that the application begins to read the file after the script is done and file handle is closed. To avoid this I want to flush after each line append. How can I do that?
Try:
use IO::Handle;
$fh->autoflush;
This was actually posted as a way of auto-flushing in an early question of mine, which asked about the universally accepted bad way of achieving this :-)
TL/DR: use IO::Handle and the flush method, eg:
use IO::Handle;
$myfile->flush();
First, you need to decide how "flushed" you want it. There can be quite a few layers of buffering:
Perl's internal buffer on the file handle. Other programs can't see data until it's left this buffer.
File-system level buffering of "dirty" file blocks. Other programs can still see these changes, they seem "written", but they'll be lost if the OS or machine crashes.
Disk-level write-back buffering of writes. The OS thinks these are written to disk, but the disk is actually just storing them in volatile memory on the drive. If the OS crashes the data won't be lost, but if power fails it might be unless the disk can write it out first. This is a big problem with cheap consumer SSDs.
It gets even more complicated when SANs, remote file systems, RAID controllers, etc get involved. If you're writing via pipes there's also the pipe buffer to consider.
If you just want to flush the Perl buffer, you can close the file, print a string containing "\n" (since it appears that Perl flushes on newlines), or use IO::Handle's flush method.
You can also, per the perl faq use binmode or play with $| to make the file handle unbuffered. This is not the same thing as flushing a buffered handle, since queuing up a bunch of buffered writes then doing a single flush has a much lower performance cost than writing to an unbuffered handle.
If you want to flush the file system write back buffer you need to use a system call like fsync(), open your file in O_DATASYNC mode, or use one of the numerous other options. It's painfully complicated, as evidenced by the fact that PostgreSQL has its own tool just to test file syncing methods.
If you want to make sure it's really, truly, honestly on the hard drive in permanent storage you must flush it to the file system in your program. You also need to configure the hard drive/SSD/RAID controller/SAN/whatever to really flush when the OS asks it to. This can be surprisingly complicated to do and is quite OS/hardware specific. "plug-pull" testing is strongly recommended to make sure you've really got it right.
From 'man perlfaq5':
$old_fh = select(OUTPUT_HANDLE);
$| = 1;
select($old_fh);
If you just want to flush stdout, you can probably just do:
$| = 1;
But check the FAQ for details on a module that gives you a nicer-to-use abstraction, like IO::Handle.
Here's the answer - the real answer.
Stop maintaining an open file handle for this file for the life of the process.
Start abstracting your file-append operation into a sub that opens the file in append mode, writes to it, and closes it.
# Appends a new line to the existing file
sub append_new_line{
my $linedata = shift;
open my $fh, '>>', $fnm or die $!; # $fnm is file-lexical or something
print $fh $linedata,"\n"; # Flavor to taste
close $fh;
}
The process observing the file will encounter a closed file that gets modified whenever the function is called.
All of the solutions suggesting setting autoflush are ignoring the basic fact that most modern OS's are buffering file I/O irrespective of what Perl is doing.
You only possibility to force the commitment of the data to disk is by closing the file.
I'm trapped with the same dilemma atm where we have an issue with rotation of the log being written.
To automatically flush the output, you can set autoflush/$| as described by others before you output to the filehandle.
If you've already output to the filehandle and need to ensure that it gets to the physical file, you need to use the IO::Handle flush and sync methods.
There an article about this in PerlDoc: How do I flush/unbuffer an output filehandle? Why must I do this?
Two solutions:
Unbuffer the output filehandler with : $|
Call the autoflush method if you are using IO::Handle or one of its subclasses.
An alternative approach would be to use a named pipe between your Perl script and C++ program, in lieu of the file you're currently using.
For those who are searching a solution to flush output line by line to a file in Ansys CFD Post using a Session File (*.cse), this is the only solution that worked for me:
! $file="Test.csv";
! open(OUT,"+>>$file");
! select(OUT);$|=1; # This is the important line
! for($i=0;$i<=10;$i++)
! {
! print out "$i\n";
! sleep(3);
! }
Note that you need the exclamation marks at every begin of every line that contains Perl script. sleep(3); is only applied for demonstration reasons. use IO::Handle; is not needed.
The genuine correct answer is to use:-
$|=1; # Make STDOUT immediate (non-buffered)
and although that is one cause of your problem, the other cause of the same problem is this: "Also, there is a C++ application which reads from that file."
It is EXTREMELY NON-TRIVIAL to write C++ code which can properly read from a file that is growing, because your "C++" program will encounter an EOF when it gets to the end... (you cannot read past the end of a file without serious extra trickery) - you have to do a pile of complicated stuff with IO blocking and flags to properly monitor a file this way (like how the linux "tail" command works).
I had the same problem with the only difference of writing the same file over and over again with new content. This association of "$| = 1" and autoflush worked for me:
open (MYFILE, '>', '/internet/web-sites/trot/templates/xml_queries/test.xml');
$| = 1; # Before writing!
print MYFILE "$thisCardReadingContentTemplate\n\n";
close (MYFILE);
MYFILE->autoflush(1); # After writing!
Best of luck.
H