How do I flush a file in Perl?

I have a Perl script which appends a new line to an existing file every 3 seconds. There is also a C++ application which reads from that file.
The problem is that the application only begins to read the file after the script is done and the file handle is closed. To avoid this I want to flush after each line is appended. How can I do that?

Try:
use IO::Handle;
$fh->autoflush;
This was actually posted as a way of auto-flushing in an earlier question of mine, which asked about the universally accepted bad way of achieving this :-)

TL;DR: use IO::Handle and the flush method, e.g.:
use IO::Handle;
$myfile->flush();
First, you need to decide how "flushed" you want it. There can be quite a few layers of buffering:
Perl's internal buffer on the file handle. Other programs can't see data until it's left this buffer.
File-system level buffering of "dirty" file blocks. Other programs can still see these changes, they seem "written", but they'll be lost if the OS or machine crashes.
Disk-level write-back buffering of writes. The OS thinks these are written to disk, but the disk is actually just storing them in volatile memory on the drive. If the OS crashes the data won't be lost, but if power fails it might be unless the disk can write it out first. This is a big problem with cheap consumer SSDs.
It gets even more complicated when SANs, remote file systems, RAID controllers, etc get involved. If you're writing via pipes there's also the pipe buffer to consider.
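If you just want to flush the Perl buffer, you can close the file or use IO::Handle's flush method. Printing a string that ends in "\n" is only enough when the handle is line-buffered, which by default is the case only for handles attached to a terminal; output to a regular file is block-buffered.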
You can also, per the Perl FAQ, set $| (or call autoflush) so that the file handle is flushed after every write. This is not the same thing as flushing a buffered handle on demand, since queuing up a bunch of buffered writes and then doing a single flush has a much lower performance cost than flushing after every write.
If you want to flush the file system write-back buffer you need to use a system call like fsync(), open your file with O_DSYNC or O_SYNC, or use one of the numerous other options. It's painfully complicated, as evidenced by the fact that PostgreSQL ships its own tool (pg_test_fsync) just to test file syncing methods.
If you want to make sure it's really, truly, honestly on the hard drive in permanent storage you must flush it to the file system in your program. You also need to configure the hard drive/SSD/RAID controller/SAN/whatever to really flush when the OS asks it to. This can be surprisingly complicated to do and is quite OS/hardware specific. "plug-pull" testing is strongly recommended to make sure you've really got it right.
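To make the first layer (Perl's own buffer) concrete, here is a minimal sketch. It assumes a scratch file created with File::Temp, which is not part of the original question; the variable names are illustrative only. It compares the file size as seen by the OS before and after calling flush.

```perl
use strict;
use warnings;
use IO::Handle;
use File::Temp qw(tempfile);

# Hypothetical scratch file, used only for this demonstration.
my ($fh, $path) = tempfile(UNLINK => 1);

print {$fh} "first line\n";
# The line is normally still sitting in Perl's buffer at this point,
# so the size reported by the OS is typically 0.
my $size_before = (stat $path)[7];

$fh->flush or die "flush failed: $!";
# After flush the data has been handed to the OS and other processes can read it.
my $size_after = (stat $path)[7];

print "before flush: $size_before bytes, after flush: $size_after bytes\n";
close $fh or die "close failed: $!";
```

Note that flush only moves data from Perl to the OS; it says nothing about the file-system or disk-level layers described above.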

From 'man perlfaq5':
$old_fh = select(OUTPUT_HANDLE);
$| = 1;
select($old_fh);
If you just want to flush stdout, you can probably just do:
$| = 1;
But check the FAQ for details on a module that gives you a nicer-to-use abstraction, like IO::Handle.

Here's the answer - the real answer.
Stop maintaining an open file handle for this file for the life of the process.
Start abstracting your file-append operation into a sub that opens the file in append mode, writes to it, and closes it.
# Appends a new line to the existing file
sub append_new_line{
my $linedata = shift;
open my $fh, '>>', $fnm or die $!; # $fnm is file-lexical or something
print $fh $linedata,"\n"; # Flavor to taste
close $fh;
}
The process observing the file will encounter a closed file that gets modified whenever the function is called.

All of the solutions suggesting setting autoflush are ignoring the basic fact that most modern OSes buffer file I/O irrespective of what Perl is doing.
Your only possibility to force the commitment of the data to disk is by closing the file.
I'm trapped in the same dilemma at the moment, where we have an issue with rotation of the log being written.

To automatically flush the output, you can set autoflush/$| as described by others before you output to the filehandle.
If you've already output to the filehandle and need to ensure that it gets to the physical file, you need to use the IO::Handle flush and sync methods.
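As a rough sketch of the flush-then-sync sequence, assuming a hypothetical helper name (append_durably) and a caller-supplied path, neither of which comes from the original answer:

```perl
use strict;
use warnings;
use IO::Handle;

# Append one line and push it through both Perl's buffer and the OS cache.
# $path is a hypothetical file name supplied by the caller.
sub append_durably {
    my ($path, $line) = @_;
    open my $fh, '>>', $path or die "Cannot open $path: $!";
    print {$fh} "$line\n" or die "Cannot write to $path: $!";
    $fh->flush or die "flush failed: $!";   # Perl buffer -> OS
    $fh->sync  or die "sync failed: $!";    # OS cache -> device, via fsync(2)
    close $fh  or die "Cannot close $path: $!";
    return;
}
```

Calling sync on every line is expensive, so this is only worth doing when the data must survive a crash; for making lines visible to another process, flush alone is enough.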

There is an article about this in perldoc: How do I flush/unbuffer an output filehandle? Why must I do this?
Two solutions:
Unbuffer the output filehandle with $|
Call the autoflush method if you are using IO::Handle or one of its subclasses.

An alternative approach would be to use a named pipe between your Perl script and C++ program, in lieu of the file you're currently using.
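A minimal sketch of the named-pipe idea on a POSIX system follows. The FIFO path is made up for the demonstration, and a forked child stands in for the Perl writer while the parent stands in for the reader (the C++ program in the question); a real setup would use two separate programs that agree on the path.

```perl
use strict;
use warnings;
use POSIX qw(mkfifo);
use File::Temp qw(tempdir);
use IO::Handle;

# Hypothetical FIFO location; a real setup would agree on a fixed path.
my $fifo = tempdir(CLEANUP => 1) . '/lines.fifo';
mkfifo($fifo, 0600) or die "mkfifo failed: $!";

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: plays the role of the Perl writer.
    open my $out, '>', $fifo or die "Cannot open FIFO for writing: $!";
    $out->autoflush(1);                 # hand each line to the pipe at once
    print {$out} "line $_\n" for 1 .. 3;
    close $out;
    POSIX::_exit(0);                    # skip the parent's cleanup handlers
}

# Parent: plays the role of the reader.
open my $in, '<', $fifo or die "Cannot open FIFO for reading: $!";
my @received = <$in>;                   # reads until the writer closes its end
close $in;
waitpid($pid, 0);
print "reader got: $_" for @received;
```

Opening a FIFO blocks until the other end is opened too, so the writer will wait if no reader is running.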

For those who are searching for a solution to flush output line by line to a file in Ansys CFD Post using a Session File (*.cse), this is the only solution that worked for me:
! $file="Test.csv";
! open(OUT,"+>>$file");
! select(OUT);$|=1; # This is the important line
! for($i=0;$i<=10;$i++)
! {
! print OUT "$i\n";
! sleep(3);
! }
Note that you need an exclamation mark at the beginning of every line that contains Perl script. sleep(3); is there only for demonstration. use IO::Handle; is not needed.

The genuine correct answer is to use:-
$|=1; # Make STDOUT immediate (non-buffered)
and although that is one cause of your problem, the other cause of the same problem is this: "Also, there is a C++ application which reads from that file."
It is EXTREMELY NON-TRIVIAL to write C++ code which can properly read from a file that is growing, because your "C++" program will encounter an EOF when it gets to the end... (you cannot read past the end of a file without serious extra trickery) - you have to do a pile of complicated stuff with IO blocking and flags to properly monitor a file this way (like how the linux "tail" command works).

I had the same problem, with the only difference being that I write the same file over and over again with new content. Turning on autoflush for the handle before writing (the equivalent of "$| = 1" with that handle selected) worked for me:
use IO::Handle;
open (MYFILE, '>', '/internet/web-sites/trot/templates/xml_queries/test.xml') or die "Cannot open file: $!";
MYFILE->autoflush(1); # Before writing! A bare "$| = 1" here would only affect STDOUT
print MYFILE "$thisCardReadingContentTemplate\n\n";
close (MYFILE);
Best of luck.
H

Related

How to read, write, append the input file using perl

How to read and overwrite the same input file using perl?
My input:
GCGCCACTGCACTCCAGCCTGGGCGACAGAGC (873 TO 904) GCTCTGTCGCCCAGGCTGGAGTGCAGTGGCGC (3033 TO 3064)
CAAAAAAAAAAAAAAAAAAA (917 TO 936) TTTTTTTTTTTTTTTTTTTG (2998 TO 3017)
AAAAAAAAAAAAAAAAAAAG (922 TO 941) CTTTTTTTTTTTTTTTTTTT (2997 TO 3016)
I tried the below code:
#!/usr/local/bin/perl
open($in,'<',"/home/httpd/cgi-bin/exa.txt") || die("error");
open($out,'>>',"/home/httpd/cgi-bin/exa.txt")||die("error");
while(<$in>)
{
print $out;
}
close $in;
close $out;
Bear in mind what you're looking at doing here - you're opening a file for reading, reading it one line at a time.
What do you think is going to happen when you modify that file in the process?
There are also some constraints - Windows doesn't support concurrent opening for read/write anyway.
However take a look at open specifically:
You can put a + in front of the > or < to indicate that you want both read and write access to the file; thus +< is almost always preferred for read/write updates--the +> mode would clobber the file first. You can't usually use either read-write mode for updating textfiles, since they have variable-length records. See the -i switch in perlrun for a better approach. The file is created with permissions of 0666 modified by the process's umask value.
What I would suggest instead though - don't read and write from the same file at all. Rename one, execute your process, verify that it worked properly, and then tidy up afterwards.
That way a partial success won't mean corrupted data.
You can use the -i flag - see perlrun - this allows you to in place edit as you might be used to with sed. (Can be used within program via $^I - see perlvar )
There's a couple of constraints on doing this though - specifically it only works if you're using the while ( <> ) { construct. Practically speaking, I think this wouldn't be a good choice outside more simplistic programs - it's doing something implicitly, so might not be entirely clear to future readers, and it's doing essentially the same thing as opening and renaming anyway.
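The rename-based approach suggested above might look roughly like this. The helper name rewrite_file and the ".tmp" suffix are assumptions made for the sketch, not something from the question.

```perl
use strict;
use warnings;

# Rewrite $path by streaming it through $transform into a temporary file,
# then renaming the result over the original once everything has succeeded.
sub rewrite_file {
    my ($path, $transform) = @_;
    my $tmp = "$path.tmp";    # hypothetical temp name in the same directory
    open my $in,  '<', $path or die "Cannot read $path: $!";
    open my $out, '>', $tmp  or die "Cannot write $tmp: $!";
    while (my $line = <$in>) {
        my $new = $transform->($line);
        print {$out} $new or die "Write to $tmp failed: $!";
    }
    close $in;
    close $out or die "Cannot close $tmp: $!";
    rename $tmp, $path or die "Cannot rename $tmp to $path: $!";
    return;
}
```

For example, rewrite_file('exa.txt', sub { uc $_[0] }) would upper-case every line. If anything fails before the rename, the original file is left untouched.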

Perl STDIN without buffering or line buffering

I have a Perl script that received input piped from another program. It's buffering with an 8k (Ubuntu default) input buffer, which is causing problems. I'd like to use line buffering or disable buffering completely. It doesn't look like there's a good way to do this. Any suggestions?
use IO::Handle;
use IO::Poll qw[ POLLIN POLLHUP POLLERR ];
use Text::CSV;
my $stdin = new IO::Handle;
$stdin->fdopen(fileno(STDIN), 'r');
$stdin->setbuf(undef);
my $poll = IO::Poll->new() or die "cannot create IO::Poll object";
$poll->mask($stdin => POLLIN);
STDIN->blocking(0);
my $halt = 0;
for(;;) {
$poll->poll($config{poll_timout});
for my $handle ($poll->handles(POLLIN | POLLHUP | POLLERR)) {
next unless($handle eq $stdin);
if(eof) {
$halt = 1;
last;
}
my @row = $csv->getline($stdin);
# Do more stuff here
}
last if($halt);
}
Polling STDIN kind of throws a wrench into things since IO::Poll uses buffering and direct calls like sysread do not (and they can't mix). I don't want to call sysread in an endless loop without blocking. I require the use of select or poll since I don't want to hammer the CPU.
PLEASE NOTE: I'm talking about STDIN, NOT STDOUT. $|++ is not the solution.
[EDIT]
Updating my question to clarify based on the comments and other answers.
The program that is writing to STDOUT (on the other side of the pipe) is line buffered and flushed after every write. Every write contains a newline, so in effect, buffering is not an issue for STDOUT of the first program.
To verify this is true, I wrote a small C program that reads piped input from the same program with STDIN buffering disabled (setvbuf with _IONBF). The input appears in STDIN of the test program immediately. Sadly, it does not appear to be an issue with the output from the first program.
[/EDIT]
Thanks for any insight!
PS. I have done a fair amount of Googling. This link is the closest I've found to an answer, but it certainly doesn't satisfy all my needs.
Say there are two short lines in the pipe's buffer.
IO::Poll notifies you there's data to read, which you proceed to read (indirectly) using readline.
Reading one character at a time from a file handle is very inefficient. As such, readline (aka <>) reads a block of data from the file handle at a time. The two lines end up in a buffer and the first of the two lines is returned.
Then you wait for IO::Poll to notify you that there is more data. It doesn't know about Perl's buffer; it just knows the pipe is empty. As such, it blocks.
This post demonstrates the problem. It uses IO::Select, but the principle (and solution) is the same.
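One way around the mismatch described above is to never use readline on the handle: wait with select, read with sysread (which bypasses Perl's buffer), and split lines out of a buffer you manage yourself. The following is only a sketch under that assumption; the function names read_available_lines and process_stream are made up for illustration.

```perl
use strict;
use warnings;
use IO::Select;

# Read whatever is available with sysread and split complete lines out of
# our own buffer. Returns the byte count read (0 at EOF) followed by the
# complete lines; any trailing partial line stays in $$buf_ref.
sub read_available_lines {
    my ($fh, $buf_ref) = @_;
    my $n = sysread($fh, $$buf_ref, 65536, length $$buf_ref);
    die "sysread failed: $!" unless defined $n;
    my @lines;
    push @lines, $1 while $$buf_ref =~ s/\A([^\n]*\n)//;
    return ($n, @lines);
}

# Call $on_line for each complete line until EOF, sleeping in select
# (up to $timeout seconds at a time) while no data is pending.
sub process_stream {
    my ($fh, $on_line, $timeout) = @_;
    my $select = IO::Select->new($fh);
    my $buffer = '';
    while (1) {
        my @ready = $select->can_read($timeout);
        next unless @ready;
        my ($n, @lines) = read_available_lines($fh, \$buffer);
        $on_line->($_) for @lines;
        last if $n == 0;    # EOF
    }
    return;
}
```

A call such as process_stream(\*STDIN, sub { print "got: $_[0]" }, 1) would handle every line as soon as it arrives. A parser that insists on reading from a handle itself would need each line passed to it as a string instead.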
You're actually talking about the other program's STDOUT. The solution is $|=1; (or equivalent) in the other program.
If you can't, you might be able to convince the other program to use line-buffering instead of block buffering by connecting its STDOUT to a pseudo-tty instead of a pipe (like Expect.pm does, for example).
The unix program expect has a tool called unbuffer which does exactly that. (It's part of the expect-dev package on Ubuntu.) Just prefix the command name with unbuffer.

Perl - can't flush STDOUT or STDERR

Perl 5.14 from stock Ubuntu Precise repos. Trying to write a simple wrapper to monitor progress on copying from one stream to another:
use IO::Handle;
while ($bufsize = read (SOURCE, $buffer, 1048576)) {
STDERR->printflush ("Transferred $xferred of $sendsize bytes\n");
$xferred += $bufsize;
print TARGET $buffer;
}
This does not perform as expected (writing a line each time the 1M buffer is read). I end up seeing the first line (with a blank value of $xferred), and then nothing until everything flushes on the 7th and 8th lines (on an 8MB transfer). Been pounding my brains out on this for hours - I've read the perldocs, I've read the classic "Suffering from Buffering" article, I've tried everything from select and $|++ to IO::Handle to binmode (STDERR, "::unix") to you name it. I've also tried flushing TARGET with each line using IO::Handle (TARGET->flush). No dice.
Has anybody else ever encountered this? I don't have any ideas left. Sleeping one second "fixes" the problem, but obviously I don't want to sleep a second every time I read a buffer just so my progress will output on the screen!
FWIW, the problem is exactly the same whether I'm outputting to STDERR or STDOUT.
Perl read calls fread(3), not read(2).
This means that it goes through libc and may be using an internal buffer larger than yours; i.e., it gets all the data there is to be received and then quickly throws it at you in 1MB increments.
If this conjecture is correct, the solution might be to use sysread, which calls read(2), instead of read.
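A sketch of what the sysread variant might look like, assuming the handles have not already been read or written through Perl's buffered I/O (mixing the two is not safe). The function name copy_with_progress and its parameters are illustrative, not taken from the question.

```perl
use strict;
use warnings;
use IO::Handle;

# Copy $src to $dst in chunks using sysread/syswrite, which bypass Perl's
# buffering, and report progress on STDERR after every chunk.
sub copy_with_progress {
    my ($src, $dst, $total, $chunk) = @_;
    $chunk ||= 1_048_576;
    my $xferred = 0;
    while (1) {
        my $n = sysread($src, my $buffer, $chunk);
        die "read failed: $!" unless defined $n;
        last if $n == 0;                      # EOF
        my $off = 0;
        while ($off < $n) {                   # syswrite may write only part
            my $w = syswrite($dst, $buffer, $n - $off, $off);
            die "write failed: $!" unless defined $w;
            $off += $w;
        }
        $xferred += $n;
        STDERR->printflush("Transferred $xferred of $total bytes\n");
    }
    return $xferred;
}
```

Here the counter is updated before the message is printed, so the first progress line no longer shows an empty value.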

How can I make writes to a gzip file from my Perl script non-blocking?

I'm currently writing a script that takes a database as input and generates all valid combinations from the 10+ tables, following certain rules. Since the output is pretty darn huge, I'm dumping this through gzip into the file, like this:
open( my $OUT, '|-', "gzip > file" );
for ( @data ) {
my $line = calculate($_);
print $OUT $line;
}
Due to the nature of the beast, though, I end up having to make hundreds of thousands of small writes, one for each line. This means that between each calculation it waits for gzip to receive the data and finish compressing it. At least I think so; I might be wrong.
In case I'm right, though, I'm wondering how I can make this print asynchronous, i.e. it fires the data at gzip and then goes on processing the data.
Give IO::Compress::Gzip a try. It accepts a filehandle to write to. You can set O_NONBLOCK on that filehandle.
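A minimal sketch of writing through IO::Compress::Gzip instead of a pipe to an external gzip; the helper name write_gzipped and the output path are assumptions for illustration. Note that compression then happens inside the Perl process, so this removes the pipe rather than making the writes asynchronous.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw($GzipError);

# Write lines straight into a gzip file without spawning an external gzip.
# $path is a hypothetical output file name.
sub write_gzipped {
    my ($path, @lines) = @_;
    my $z = IO::Compress::Gzip->new($path)
        or die "Cannot create $path: $GzipError";
    for my $line (@lines) {
        $z->print($line) or die "Write to $path failed: $GzipError";
    }
    $z->close or die "Cannot close $path: $GzipError";
    return;
}
```

For example, write_gzipped('file.gz', map { calculate($_) } @data) would mirror the loop in the question, at the cost of holding all lines in memory; calling print inside the loop avoids that.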
Pipes already use a buffer so that the writing program doesn't have to wait for the reading program. However, that buffer is usually fairly small (it's normally only 64KB on Linux) and not easily changed from a script (older kernels required recompiling; since Linux 2.6.35 a program can resize it with fcntl's F_SETPIPE_SZ). If the standard buffer is not enough, the easiest thing to do is include a buffering program in the pipeline:
open( my $OUT, '|-', "bfr | gzip > file" );
bfr simply reads STDIN into an in-memory buffer, and writes to STDOUT as fast as the next program allows. The default is a 5MB buffer, but you can change that with the -b option (e.g. bfr -b10m for a 10MB buffer).
Naturally, you can do it in a thread or with a fork, as you wish.
http://hell.jedicoder.net/?p=82

Efficient way to continually read in data from a text file

We have a script on an FTP endpoint that monitors the FTP logs spewed out by our FTP daemon.
Currently what we do is have a Perl script that essentially runs a tail -F on the file and sends every single line into a remote MySQL database, with slightly different column content based on the record type.
This database has tables for content of both the tarball names/content, as well as FTP user actions with said packages; Downloads, Deletes, and everything else VSFTPd logs.
I see this as particularly bad, but I'm not sure what's better.
The goal of a replacement is to still get log file content into a database as quickly as possible. I'm thinking of doing something like making a FIFO/pipe file in place of where the FTP log file is, so I can read it in once periodically, ensuring I never read the same thing in twice. That assumes VSFTPd will play nice with that (I'm thinking it won't, insight welcome!).
The FTP daemon is VSFTPd. I'm at least fairly sure the extent of its logging capabilities is: xfer style log, vsftpd style log, both, or no logging at all.
The question is, what's better than what we're already doing, if anything?
Honestly, I don't see much wrong with what you're doing now. tail -f is very efficient. The only real problem with it is that it loses state if your watcher script ever dies (which is a semi-hard problem to solve with rotating logfiles). There's also a nice File::Tail module on CPAN that saves you from the shellout and has some nice customization available.
Using a FIFO as a log can work (as long as vsftpd doesn't try to unlink and recreate the logfile at any point, which it may do) but I see one major problem with it. If no one is reading from the other end of the FIFO (for instance if your script crashes, or was never started), then a short time later all of the writes to the FIFO will start blocking. And I haven't tested this, but it's pretty likely that having logfile writes block will cause the entire server to hang. Not a very pretty scenario.
Your problem with reading a continually updated file is you want to keep reading, even after the end of file is reached. The solution to this is to re-seek to your current position in the file:
seek FILEHANDLE, 0, 1;
Here is my code for doing this sort of thing:
open(FILEHANDLE,'<', '/var/log/file') || die 'Could not open log file';
seek(FILEHANDLE, 0, 2) || die 'Could not seek to end of log file';
for (;;) {
while (<FILEHANDLE>) {
if ( $_ =~ /monitor status down/ ) {
print "Gone down\n";
}
}
sleep 1;
seek FILEHANDLE, 0, 1; # clear eof
}
You should look into inotify (assuming you are on Linux, since inotify is a Linux kernel facility) so you can run your Perl script whenever the logfile is updated. If this level of IO causes problems you could always keep the logfile on a RAM disk so IO is very fast.
This should help you set this up:
http://www.cyberciti.biz/faq/linux-inotify-examples-to-replicate-directories/
You can read the file through an input pipe from tail:
open(my $file, '-|', 'tail', '-F', '/ftp/file/path') or die "Cannot run tail: $!";
while (<$file>) {
# you know the rest
}
File::Tail does this, plus heuristic sleeping and nice error handling and recovery.
Edit: On second thought, a real system pipe is better if you can manage it. If not, you need to find the last thing you put in the database, and spin through the file until you get to the last thing you put in the database whenever your process starts. Not that easy to accomplish, and potentially impossible if you have no way of identifying where you left off.
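One possible way to remember where you left off, if the log lines themselves carry no usable marker, is to persist the byte offset between runs. This is a different technique from searching for the last database row, and the sketch below makes several assumptions: the function name read_new_lines and both file names are hypothetical, and a file that has shrunk is simply treated as rotated.

```perl
use strict;
use warnings;

# Return the complete lines added to $log since the offset stored in
# $state_file, then store the new offset. Both file names are hypothetical.
sub read_new_lines {
    my ($log, $state_file) = @_;
    my $offset = 0;
    if (open my $st, '<', $state_file) {
        my $saved = <$st>;
        $offset = $1 if defined $saved && $saved =~ /\A(\d+)/;
        close $st;
    }
    open my $fh, '<', $log or die "Cannot open $log: $!";
    my $size = (stat $fh)[7];
    $offset = 0 if $offset > $size;      # file shrank: assume it was rotated
    seek($fh, $offset, 0) or die "Cannot seek in $log: $!";
    my @lines;
    my $new_offset = $offset;
    while (my $line = <$fh>) {
        last unless $line =~ /\n\z/;     # incomplete line: leave it for next time
        push @lines, $line;
        $new_offset = tell $fh;
    }
    close $fh;
    open my $out, '>', $state_file or die "Cannot write $state_file: $!";
    print {$out} "$new_offset\n";
    close $out or die "Cannot close $state_file: $!";
    return @lines;
}
```

The offset should only be saved after the lines have been stored in the database successfully; otherwise a crash between the two steps loses lines.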