Can a call to write(2) be interrupted by the OS with an fsync(2)?

I have a loop that appends to a file line by line: each iteration issues a write(2) with an arbitrary amount of data plus an EOL, followed by an fsync(2).
Can a crash of the process leave me with a file that has only half of the data from a write(2) call written to it?
My theory is that if the OS calls fsync occasionally, there might be a coincidence of it happening during a call to write(2), leaving the file with half of the line written, without the ending newline.

Yes. Even without a crash, you might end up with a partial line written, because the write call might not write all the data passed to it -- it might return a short write.
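To make the failure mode concrete, here is a minimal Perl sketch (the question is language-agnostic; Perl's syswrite maps directly to write(2), and IO::Handle's sync method calls fsync(2); the filename is a placeholder). Note that looping over short writes still does not make the append atomic -- a crash between iterations leaves exactly the partial line the question worries about:

use IO::Handle;

open my $fh, '>>', 'data.log' or die "open: $!";

sub append_line {
    my ($fh, $line) = @_;
    my $off = 0;
    while ($off < length $line) {
        # syswrite may perform a short write; retry until the whole line is out
        my $n = syswrite($fh, $line, length($line) - $off, $off);
        die "write failed: $!" unless defined $n;
        $off += $n;
    }
    $fh->sync or die "fsync failed: $!";    # fsync(2) after the full line
}

append_line($fh, "some data\n");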

Fortran and Eclipse: Displaying text in console

I'm having a small difficulty with Fortran 90 and Eclipse. I installed the "Photran" plugin to Eclipse, and have managed to compile everything perfectly, and overall the program does what it has to do. The problem comes when displaying text in the Eclipse console. The code itself is not that important, since it does what it has to do; the issue is the output generation.
The piece of the code I'm having trouble with is the following:
subroutine main_program
write(*,*) "Program begins!"
<Program that takes ~5mins to run>
write(*,*) "Program ends!"
end subroutine main_program
Specifically, the problem is that the first message, "Program begins!", should be shown in the console immediately, and after ~5 minutes it should show "Program ends!". Instead, both of these messages get displayed only after the program is done running, not while the program is executing.
I have used:
subroutine main_program
print*, "Program begins!"
<Program that takes ~5mins to run>
print*, "Program ends!"
end subroutine main_program
but it keeps on doing the same thing. I saw a "similar" post earlier (can't find the link though, sorry about that) but it was not really what I was looking for.
OK, here's the answer. Insert the statement
flush 6
after the first write statement to have its output sent immediately to the console. Insert it anywhere else you wish once you understand what it is doing.
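For concreteness, the OP's snippet with the fix applied would look like this (a sketch, assuming * is connected to unit 6, as discussed below):
subroutine main_program
write(*,*) "Program begins!"
flush 6
<Program that takes ~5mins to run>
write(*,*) "Program ends!"
end subroutine main_program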
It is obvious (to me) from the situation OP describes that the output is being buffered: the program issues a write statement and passes the output off to the operating system, which does as it damn well pleases -- here it waits until the program ends before writing anything to the console. I guess that its buffering capabilities have some limits, and if the program exceeded them the o/s would empty its buffers prior to program end.
Fortran now (since 2003 I think) provides a standard way of telling the o/s to actually flush the buffer to the output device -- the flush statement. In its simplest form flush takes only one argument, the unit number of the output channel to be flushed. I guessed that OP had unit 6 connected to stdout (aka *), since this is a near-universal default configuration, though not one guaranteed by the Fortran language standard.
I don't think that flush * is correct.
If you have a pre-2003 compiler then (a) for Backus' sake update and (b) it is likely that it supports a non-standard way to flush buffers; if memory serves gfortran used to provide a subroutine which would be called something like call flush(6).
There are other ways, outside Fortran, to tell the o/s to write to disk (or console or what have you) immediately. Look at the documentation for your o/s if you are interested in them.

what might cause a print error in perl?

I have a long-running script that every hour opens a file, prints to it, and closes the file. I've recently found that, very rarely, the print fails -- I know this not because I'm testing the status of the print itself, but because of missing entries in the file, which persist until the system is actually rebooted!
I do trap file open failures and write a message to syslog when that happens, and I'm not seeing any open failures, so I'm now guessing it may be the print that is failing. I'm not trapping print failures, which I suspect most people don't, but I am now going to update that one print.
Meanwhile, my question is: does anyone know what types of situations could cause a print statement to fail when there is plenty of disk storage and no contention for a file that has been successfully opened in append mode?
You could be out of memory (ENOMEM) or over a filesize limit (EFBIG, or SIGXFSZ if the limit is enforced by signal). You could have an old-fashioned I/O error (EIO). You could have a race condition if the script is run concurrently or if the file is accessed over NFS. And, of course, you could have an error in the expression whose value you would print.
An exotic cause I once saw: a CPU heatsink failure led to sprintf spuriously failing, causing some surprising results, including garbage being written to file descriptors.
Finally, I remind you that print will often leave its output in an I/O buffer. This means two things. (1) You need to check the result of close() as well. (2) If you print but don't immediately close() or flush(), then your data can sit in the buffer and not actually be written until much later (or not at all if the process dies horribly).
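As a sketch of what trapping that one print (and the close) might look like -- the path and entry are placeholders, and warn could just as well be a syslog call:

my $entry = "entry at " . localtime() . "\n";
open my $fh, '>>', '/tmp/hourly.log' or die "open failed: $!";
print {$fh} $entry or warn "print failed: $!";
close $fh or warn "close failed: $!";   # Perl's buffer is flushed here, so errors can surface now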

When do you need to `END { close STDOUT }` in Perl?

In tchrist's boilerplate I found this explicit closing of STDOUT in the END block.
END { close STDOUT }
I know END and close, but I'm missing why it is needed.
When I started searching about it, I found the following in perlfaq8:
For example, you can use this to make sure your filter program managed to finish its output without filling up the disk:
END {
    close(STDOUT) || die "stdout close failed: $!";
}
and I don't understand it anyway. :(
Can someone explain (maybe with some code examples):
why and when it is needed
how and in what cases my Perl filter can fill up the disk, and so on
when things go wrong without it...
etc.?
A lot of systems implement "optimistic" file operations. By this I mean that a call to, for instance, print, which should add some data to a file, can return successfully before the data is actually written to the file, or even before enough space is reserved on disk for the write to succeed.
In these cases, if your disk is nearly full, all your prints can appear successful, but when it is time to close the file and flush it out to disk, the system realizes that there is no room left. You then get an error when closing the file.
This error means that all the output you thought you saved might actually not have been saved at all (or partially saved). If that was important, your program needs to report an error (or try to correct the situation, or ...).
All this can happen on the STDOUT filehandle if it is connected to a file, e.g. if your script is run as:
perl script.pl > output.txt
If the data you're outputting is important, and you need to know if all of it was indeed written correctly, then you can use the statement you quoted to detect a problem. For example, in your second snippet, the script explicitly calls die if close reports an error; tchrist's boilerplate runs under use autodie, which automatically invokes die if close fails.
(This will not guarantee that the data is stored persistently on disk though, other factors come into play there as well, but it's a good error indication. i.e. if that close fails, you know you have a problem.)
I believe Mat is mistaken.
Both Perl and the system have buffers. close causes Perl's buffers to be flushed to the system. It does not necessarily cause the system's buffers to be written to disk as Mat claimed. That's what fsync does.
Now, this would happen anyway on exit, but calling close gives you a chance to handle any error it encountered flushing the buffers.
The other thing close does is report earlier errors in attempts by the system to flush its buffers to disk.
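A minimal sketch of the distinction, using IO::Handle's flush and sync methods (the filename is a placeholder):

use IO::Handle;

open my $fh, '>', 'out.txt' or die "open: $!";
print {$fh} "data\n" or die "print: $!";  # may sit in Perl's buffer
$fh->flush or die "flush: $!";            # Perl's buffer -> system buffer
$fh->sync  or die "fsync: $!";            # system buffer -> disk, via fsync(2)
close $fh  or die "close: $!";            # reports any remaining flush errors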

Delete file on exit

Maybe I'm wrong, but I am convinced there is some facility provided by UNIX and by the C standard library to get the OS to delete a file once a process exits. But I can't remember what it's called (or maybe I imagined it). In my particular case I would like to access this functionality from perl.
Java has the deleteOnExit function but I understand the deletion is done by the JVM as opposed to the OS which means that if the JVM exits uncleanly (e.g. power failure) then the file will never get deleted.
But as I understand it, the facility I am looking for (if it exists) is provided by the OS, so the OS looks after the file's deletion -- presumably doing some cleanup work on OS start in the case of power failure etc., and certainly doing cleanup in the case the process exits uncleanly.
A very very simple solution to this (that only works on *nix systems) is to:
Create and open the file (keep the file handle around)
Immediately call unlink on the file
Proceed as normal using the file handle, and exit when you feel like it
Then when your program is complete, the file descriptor is closed and the file is truly deleted. This will even work if the program crashes.
Of course this only works within the context of a single script (i.e. other scripts won't be able to directly manipulate the file, although you COULD pass them the file descriptor).
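A minimal Perl sketch of the trick (the path is a placeholder):

open my $fh, '+>', '/tmp/scratch.dat' or die "open: $!";
unlink '/tmp/scratch.dat' or die "unlink: $!";
# The name is gone from the directory, but $fh still works; the OS
# frees the storage when the last descriptor goes away -- at the
# latest when the process exits, cleanly or not.
print {$fh} "temporary data\n";
seek $fh, 0, 0;
my $line = <$fh>;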
If you are looking for something that the OS may automatically take care of on restart after power failure, an END block isn't enough; you need to create the file where the OS expects temporary files to live. And once you are doing that, you should just use one of the File::Temp routines (which even offer the option of opening and immediately unlinking the file for you, if you want).
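For example (a sketch using File::Temp's documented interface):

use File::Temp qw(tempfile);

# Named temp file in the system temp directory, removed at program exit.
my ($fh, $fname) = tempfile(UNLINK => 1);

# In scalar context the file is opened and immediately unlinked on
# systems that support it -- no name to clean up at all.
my $anon = tempfile();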
You're looking for atexit(). In Perl this is usually done with END blocks. Java and Perl provide their own because they want to be portable to systems that don't follow the relevant standards (in this case C90).
That said, on Unix the common convention is to open a file and then unlink it; the kernel will delete it when the last reference (which is to say, your file descriptor) is closed. You almost always want to open for read+write.
I think you are looking for a function called tmpfile(), which creates a file when called and deletes it upon close.
You could do your work in an END block.
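For instance (a sketch; the path is a placeholder), bearing in mind that END blocks run on normal exit and on die, but not after a power failure or SIGKILL:

my $tmp = "/tmp/myapp.$$";
END { unlink $tmp if defined $tmp }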

How can I debug a Perl program that suddenly exits?

I have a Perl program based on IO::Async, and it sometimes just exits after a few hours/days without printing any error message whatsoever. There's nothing in dmesg or /var/log either. STDOUT/STDERR both have autoflush(1) set, so data shouldn't be lost in buffers. It doesn't actually exit from IO::Async::Loop->loop_forever -- a print I put there just to make sure of that never gets triggered.
Now one way would be to keep peppering the program with more and more prints and hope one of them gives me some clue. Is there a better way to get information about what was going on in the program that made it exit or silently crash?
One trick I've used is to run the program under strace or ltrace (or attach to the process using strace). Naturally that was under Linux. Under other operating systems you'd use ktrace or dtrace or whatever is appropriate.
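For example, attaching to an already-running process on Linux (the PID and output path are placeholders):
strace -f -tt -o /tmp/trace.out -p 12345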
A trick I've used for programs which only exhibit sparse issues over days or weeks, and then only on a handful among hundreds of systems, is to direct the output from my tracer to a FIFO and have a custom program keep only 10K lines in a ring buffer, with a handler on SIGPIPE and SIGHUP to dump the current buffer contents into a file. (It's a simple program, but I don't have a copy handy and I'm not going to re-write it tonight; my copy was written for internal use and is owned by a former employer.)
The ring buffer allows the program to run indefinitely without fear of running systems out of disk space; we usually only need a few hundred, or at most a couple thousand, lines of the trace in such matters.
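I don't have the original, but a minimal Perl sketch of the idea might look like this (the FIFO path, dump path, and buffer size are placeholders):

#!/usr/bin/perl
use strict;
use warnings;

my $fifo  = '/tmp/trace.fifo';   # the tracer's output is directed here
my $limit = 10_000;              # keep only the last 10K lines
my @ring;

# On SIGHUP or SIGPIPE, dump the current buffer contents into a file.
my $dump = sub {
    open my $out, '>', "/tmp/trace.dump.$$" or return;
    print {$out} @ring;
    close $out;
};
$SIG{HUP}  = $dump;
$SIG{PIPE} = $dump;

open my $in, '<', $fifo or die "open $fifo: $!";
while (my $line = <$in>) {
    push @ring, $line;
    shift @ring while @ring > $limit;
}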
If you are capturing STDERR, you could start the program as perl -MCarp::Always foo_prog. Carp::Always forces a stack trace on all errors.
A sudden exit without any error message is possibly a SIGPIPE. Traditionally SIGPIPE is used to stop things like the cat command in the following pipeline:
cat file | head -10
It doesn't usually result in anything being printed either by libc or perl to indicate what happened.
Since in an IO::Async-based program you'd not want to silently exit on SIGPIPE, my suggestion would be to put somewhere in the main file of the program a line something like
$SIG{PIPE} = sub { die "Aborting on SIGPIPE\n" };
which will at least alert you to this fact. If instead you use Carp::croak without the \n you might even be lucky enough to get the file/line number of the syswrite, etc... that caused the SIGPIPE.
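That is, something like this sketch:

use Carp;
$SIG{PIPE} = sub { croak "Aborting on SIGPIPE" };  # no trailing \n, so Carp reports a location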