When do you need to `END { close STDOUT }` in Perl? - perl

In tchrist's boilerplate I found this explicit closing of STDOUT in the END block:
END { close STDOUT }
I know END and close, but I'm missing why it is needed.
When I started searching about it, I found the following in perlfaq8:
For example, you can use this to make sure your filter program managed to finish its output without filling up the disk:
END {
    close(STDOUT) || die "stdout close failed: $!";
}
and I still don't understand it. :(
Can someone explain (maybe with some code examples):
why and when it is needed
how and in what cases my Perl filter can fill up the disk, and so on
when things go wrong without it
etc.?

A lot of systems implement "optimistic" file operations. By this I mean that a call to, for instance, print, which should add some data to a file, can return successfully before the data is actually written to the file, or even before enough space is reserved on disk for the write to succeed.
In these cases, if your disk is nearly full, all your prints can appear successful, but when it is time to close the file and flush it out to disk, the system realizes that there is no room left. You then get an error when closing the file.
This error means that all the output you thought you saved might actually not have been saved at all (or partially saved). If that was important, your program needs to report an error (or try to correct the situation, or ...).
All this can happen on the STDOUT filehandle if it is connected to a file, e.g. if your script is run as:
perl script.pl > output.txt
If the data you're outputting is important, and you need to know if all of it was indeed written correctly, then you can use the statement you quoted to detect a problem. For example, in your second snippet, the script explicitly calls die if close reports an error; tchrist's boilerplate runs under use autodie, which automatically invokes die if close fails.
(This will not guarantee that the data is stored persistently on disk though, other factors come into play there as well, but it's a good error indication. i.e. if that close fails, you know you have a problem.)
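If you want to see this failure mode without actually filling a disk and you are on Linux, the /dev/full device (which rejects every write with "No space left on device") makes a convenient stand-in. A minimal sketch:
# filter.pl - the print lands in Perl's output buffer and "succeeds";
# the error only surfaces when close() flushes that buffer.
print "some output\n" or warn "print failed: $!";
END { close(STDOUT) or die "stdout close failed: $!" }
Running perl filter.pl > output.txt is silent, while perl filter.pl > /dev/full dies with "stdout close failed: No space left on device".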

I believe Mat is mistaken.
Both Perl and the system have buffers. close causes Perl's buffers to be flushed to the system. It does not necessarily cause the system's buffers to be written to disk as Mat claimed. That's what fsync does.
Now, this would happen anyway on exit, but calling close gives you a chance to handle any error it encountered flushing the buffers.
The other thing close does is report earlier errors in attempts by the system to flush its buffers to disk.
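A minimal sketch of the two layers (the filename is made up, and the sync method assumes a platform where fsync is available): flush moves Perl's buffer to the system, IO::Handle's sync asks the system to write its buffers to disk, and close only promises the first step plus error reporting.
use strict;
use warnings;
use IO::Handle;

open my $fh, '>', 'data.txt' or die "open failed: $!";
print {$fh} "important record\n" or die "print failed: $!";  # goes into Perl's buffer

$fh->flush or die "flush failed: $!";  # Perl's buffer -> system buffers
$fh->sync  or die "fsync failed: $!";  # system buffers -> disk (close does not promise this)

close $fh or die "close failed: $!";   # reports any remaining flush errors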

Related

Perl: system command returns before it is done

I'm using a few system() commands in my Perl script that is running on Linux.
The commands I run with the system() function output their data to a log which I then parse to decide what to do next.
I noticed that sometimes it looks like the code that parses the log file (which comes after the system() function) doesn't use the final log.
For example, I search for a "test pass" phrase in the log file - and the script doesn't find it even though when I open the log it is there.
Another example - I try to delete the folder where the log was placed, but it doesn't let me because it's "Not empty". When I try to delete it manually, it is deleted without errors.
(These examples happen every now and then, but most of the time they don't)
It looks like some kind of "timing" problem to me. How can I solve it?
If you want to be safe and are on Linux, call system("sync"); after every command that writes to the disk and before reading from the disk. That will force the OS to write everything still buffered to the filesystem and only return afterwards. Thus you can be sure that when it is finished, everything you wrote to files has actually arrived there.
But be aware that this may be overkill in many situations. There are reasons for those buffers, and manually calling sync constantly is most likely not the fastest way of achieving things. See also http://linux.die.net/man/8/sync
If, for example, you have something else that you could do between the writing and the reading, like some calculations or whatever, that would likely be enough, and you would not waste time by telling the OS that you know better how and when it has to do its jobs. ^^
But if perfect efficiency is not your main concern (and you should not be using Perl if it was), system("sync"); after everything that modifies files and before accessing those files is probably okay and safe.
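A minimal sketch of that approach (the tool name and log path are placeholders, and system("sync") assumes the sync binary is on the PATH):
use strict;
use warnings;

# Placeholder for whatever external command writes the log.
system('some_test_tool --log /tmp/run.log') == 0
    or die "command failed: $?";

# Force buffered data out to the filesystem before reading the log back.
system('sync') == 0 or warn "sync failed: $?";

open my $log, '<', '/tmp/run.log' or die "cannot open log: $!";
my $passed = grep { /test pass/ } <$log>;
close $log;
print $passed ? "test passed\n" : "test failed\n";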

what might cause a print error in perl?

I have a long-running script that every hour opens a file, prints to it, and closes the file. I've recently found that, very rarely, the print is failing; I know this not because I'm testing the status of the print itself, but because of entries missing from the file, which continues until the system is actually rebooted!
I do trap file open failures and write a message to syslog when they happen, and I'm not seeing any open failures, so I'm now guessing it may be the print that is failing. I'm not trapping print failures (which I suspect most people don't), but I'm now going to update that one print.
Meanwhile, my question is does anyone know what types of situations could cause a print statement to fail when there is plenty of disk storage and no contention for a file which has been successfully opened in append mode?
You could be out of memory (ENOMEM) or over a filesize limit (EFBIG or SIGXFSZ). You could have an old-fashioned I/O error (EIO). You could have a race condition if the script is run concurrently or if the file is accessed over NFS. And, of course, you could have an error in the expression whose value you would print.
An exotic cause that I once saw is that a CPU heatsink failure can lead to sprintf spuriously failing, causing some surprising results including writing garbage to file descriptors.
Finally, I remind you that print will often write its stuff in an I/O buffer. This means two things. (1) You need to check the result of close() as well. (2) If you print but you don't immediately close() or flush() then your data can be buffered and not actually written until much later (or not at all if the process dies horribly).
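Which is why it is worth checking both calls; a minimal sketch (the log path is made up):
use strict;
use warnings;

open my $out, '>>', '/var/log/hourly.log' or die "open failed: $!";
print {$out} scalar(localtime) . ": heartbeat\n"
    or warn "print failed: $!";   # error while handing data to the I/O buffer
close $out
    or warn "close failed: $!";   # error while flushing the buffer out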

Is it bad to open() and close() in a loop if speed isn't an issue?

I modified another programmer's Perl script I use to make it output logs. The Perl script goes through files, and for every file it goes through I open() the log, write/print to it, and then close() it. This happens a lot of times. I do this to make sure I don't lose any data if said Perl script hangs up (it eventually starts doing that, and I'm not knowledgeable enough to fix it). Therefore, I don't have a good alternative to repeating open() and close() in that loop.
My main question is this: the Perl script is for personal use, so speed reduction is not an issue. But are there other bad things that could follow from this likely improper use of open/close? It may sound like a stupid question, but is it possible this would wear my hard disk down faster, or am I misunderstanding how file handling works?
Thanks in advance.
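A minimal sketch of the pattern being described (file and log names are made up):
use strict;
use warnings;

for my $file (glob '*.dat') {
    # ... process $file ...

    open my $log, '>>', 'run.log' or die "cannot open log: $!";
    print {$log} "processed $file\n";
    close $log or warn "close of log failed: $!";  # flushes this entry before moving on
}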
As others have mentioned, there is no issue here other than performance (and arguably cleanliness of code).
However, if you are merely worried about "losing data if Perl hangs up", just set autoflush on the file handle:
use IO::Handle;
open my $log_fh, '>', 'log.txt'
    or die "Unable to open log.txt for writing: $!";
$log_fh->autoflush(1);   # flush after every print
Now every print to $log_fh will get flushed automatically. No need to keep opening and closing.
Search for "autoflush" in the perldoc man page for more information.
In theory it's usually better to open and close connections as quickly as possible, and files are no different. The two things you will run into are file locking and performance.
File locking could come about if something else is accessing your file at the same time.
Performance, as you mentioned, isn't a huge concern.
We're not talking about lifetimes of waiting for open/close operations anyway...it's mostly noticeable with high concurrency or hundreds of thousands of actions.
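If something else really could be writing to the same file, an advisory lock around each append keeps the entries from interleaving. A minimal sketch:
use strict;
use warnings;
use Fcntl ':flock';

open my $log, '>>', 'run.log' or die "cannot open log: $!";
flock($log, LOCK_EX) or die "cannot lock log: $!";  # block until we have exclusive access
print {$log} "another entry\n";
close $log;                                         # closing releases the lock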
The OS determines hard drive access, so you should be fine. If you need to open() and close() a lot of files, that's OK. The only thing that might happen is that, if your script hangs (for some odd reason) while it still holds the file handle from open(), you could lose data if it resumes after you have edited the file manually (but this is a pretty rare scenario). Also, if your script crashes, the descriptors get released anyway, so there's no issue as far as I can tell.

How can I debug a Perl program that suddenly exits?

I have a Perl program based on IO::Async, and it sometimes just exits after a few hours/days without printing any error message whatsoever. There's nothing in dmesg or /var/log either. STDOUT/STDERR are both autoflush(1), so data shouldn't be lost in buffers. It doesn't actually exit from IO::Async::Loop->loop_forever; a print I put there just to make sure of that never gets triggered.
Now one way would be to keep peppering the program with more and more prints and hope one of them gives me some clue. Is there better way to get information what was going on in a program that made it exit/silently crash?
One trick I've used is to run the program under strace or ltrace (or attach to the process using strace). Naturally that was under Linux. Under other operating systems you'd use ktrace or dtrace or whatever is appropriate.
A trick I've used for programs which only exhibit sparse issues over days or weeks, and then only on handfuls among hundreds of systems, is to direct the output from my tracer to a FIFO and have a custom program keep only 10K lines in a ring buffer, with a handler on SIGPIPE and SIGHUP to dump the current buffer contents into a file. (It's a simple program, but I don't have a copy handy and I'm not going to re-write it tonight; my copy was written for internal use and is owned by a former employer.)
The ring buffer allows the program to run indefinitely without fear of running systems out of disk space ... we usually only need a few hundred, or at most a couple thousand, lines of the trace in such matters.
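As a rough idea only (this is a hypothetical sketch, not the original tool; the FIFO path, line limit, and dump file are all made up):
use strict;
use warnings;

my $fifo      = '/tmp/trace.fifo';   # point strace/ltrace output at this FIFO
my $dump_file = '/tmp/trace.dump';
my $max_lines = 10_000;
my @ring;

sub dump_ring {
    open my $out, '>', $dump_file or return;
    print {$out} @ring;
    close $out;
}

$SIG{HUP}  = \&dump_ring;                  # dump the buffer on demand
$SIG{PIPE} = sub { dump_ring(); exit 0 };  # dump and quit when asked to

open my $in, '<', $fifo or die "cannot open $fifo: $!";
while (my $line = <$in>) {
    push @ring, $line;
    shift @ring while @ring > $max_lines;  # keep only the newest lines
}
dump_ring();                               # tracer closed the FIFO normally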
If you are capturing STDERR, you could start the program as perl -MCarp::Always foo_prog. Carp::Always forces a stack trace on all errors.
A sudden exit without any error message is possibly a SIGPIPE. Traditionally SIGPIPE is used to stop things like the cat command in the following pipeline:
cat file | head -10
It doesn't usually result in anything being printed either by libc or perl to indicate what happened.
Since in an IO::Async-based program you'd not want to silently exit on SIGPIPE, my suggestion would be to put somewhere in the main file of the program a line something like
$SIG{PIPE} = sub { die "Aborting on SIGPIPE\n" };
which will at least alert you to this fact. If instead you use Carp::croak without the \n you might even be lucky enough to get the file/line number of the syswrite, etc... that caused the SIGPIPE.

Why does IIS crash when I print to stderr in Perl?

This has been driving me crazy. We have IIS (6) and windows 2008 and ActiveState Perl 5.10. For some reason whenever we do a warn or a carp it eventually corrupts the app pool. Of course, that's a pretty big deal since it means that our errors actually cause problems.
This happened with the previous versions of Perl (5.8), Windows (2003), and IIS (5). Anyway, basically I put in a carp or a warn and I get an error message and then some garbage text. Any thoughts?
Check to make sure that IIS and the perl DLL are linked with the same version of the C runtime library. (Use depends.exe or dumpbin /dependents).
To expand: the problem may be that IIS has its FILE* table in one place, and the perl DLL thinks it's going to be in a slightly different place. When perl goes to find the stderr handle, it treats random memory as a file handle, with predictable results.
Try adding the following to the top of your scripts:
BEGIN {
    # Send STDERR to a real file so IIS never has to deal with it.
    open STDERR, '>>', 'c:/iisError.log'
        or die "Can't write to c:/iisError.log: $!\n";
    binmode STDERR;
}
I'm not sure why you would have this problem. But several "wild" guesses as to sources for such a problem would be addressed by the above code.
(It has been a while since I read the source code for appending to files in Win32, but, as I recall, the >> mode plus binmode means that writes to the file from different processes are unlikely to collide, preventing overlapping text in the log.)
A couple of suggestions:
Make sure that the id of the worker process has write permission to the directory/file you are writing. I probably wouldn't give it full control of C:, though. Better to make a sub-directory.
Write to the event log instead of a file, using Win32::EventLog.
Update: I discovered that this error only happens when you have a variable in the warn. If the warn is just regular text there are no issues. Also, the variable cannot be empty and it looks like you have to have two warns with nonempty variables to hit the bug.
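For anyone trying to reproduce it, the pattern described would be something like this (the variable name and text are only illustrative):
my $detail = 'some non-empty value';
warn "first warning: $detail";   # warns with plain text were reportedly fine
warn "second warning: $detail";  # two warns interpolating non-empty variables hit the bug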