Atomic open of non-existing file in Perl - perl

I want to write something to a file which name is in variable $filename.
I don't want to overwrite it, so I check first if it exists and then open it:
#stage1
if(-e $filename)
{
print "file $filename exists, not overwriting\n";
exit 1;
}
#stage2
open(OUTFILE, ">", $filename) or die $!;
But this is not atomic. Theoretically someone can create this file between stage1 and stage2. Is there some variant of open command that will do these both things in atomic way, so it will fail to open a file for writing if the file exists?

Here is an atomic way of opening files:
#!/usr/bin/env perl
use strict;
use warnings qw(all);
use Fcntl qw(:DEFAULT :flock);
my $filename = 'test';
my $fh;
# this is "atomic open" part
unless (sysopen($fh, $filename, O_CREAT | O_EXCL | O_WRONLY)) {
print "file $filename exists, not overwriting\n";
exit 1;
}
# flock() isn't required for "atomic open" per se
# but useful in real world usage like log appending
flock($fh, LOCK_EX);
# use the handle as you wish
print $fh scalar localtime;
print $fh "\n";
# unlock & close
flock($fh, LOCK_UN);
close $fh;
Debug session:
stas#Stanislaws-MacBook-Pro:~/stackoverflow$ cat test
Wed Dec 19 12:10:37 2012
stas#Stanislaws-MacBook-Pro:~/stackoverflow$ perl sysopen.pl
file test exists, not overwriting
stas#Stanislaws-MacBook-Pro:~/stackoverflow$ cat test
Wed Dec 19 12:10:37 2012

If you're concerned about multiple Perl scripts modifying the same file, just use the flock() function in each one to lock the file you're interested in.
If you're worried about external processes, which you probably don't have control over, you can use the sysopen() function. According to the Programming Perl book (which I highly recommend, by the way):
To fix this problem of overwriting, you’ll need to use sysopen, which
provides individual controls over whether to create a new file or
clobber an existing one. And we’ll ditch that –e file existence test
since it serves no useful purpose here and only increases our exposure
to race conditions.
They also provide this sample block of code:
use Fcntl qw/O_WRONLY O_CREAT O_EXCL/;
open(FH, "<", $file)
|| sysopen(FH, $file, O_WRONLY | O_CREAT | O_EXCL)
|| die "can't create new file $file: $!";
In this example, they first pull in a few constants (to be used in the sysopen call). Next, they try to open the file with open, and if that fails, they then try sysopen. They continue on to say:
Now even if the file somehow springs into existence between when open
fails and when sysopen tries to open a new file for writing, no harm
is done, because with the flags provided, sysopen will refuse to open
a file that already exists.
So, to make things clear for your situation, remove the file test completely (no more stage 1), and only do the open operation using code similar to the block above. Problem solved!

Related

How to write to an existing file in Perl?

I want to open an existing file in my desktop and write to it, for some reason I can't do it in ubuntu. Maybe I don't write the path exactly?
Is it possible without modules and etc.
open(WF,'>','/home/user/Desktop/write1.txt';
$text = "I am writing to this file";
print WF $text;
close(WF);
print "Done!\n";
You have to open a file in append (>>) mode in order to write to same file.
(Use a modern way to read a file, using a lexical filehandle:)
Here is the code snippet (tested in Ubuntu 20.04.1 with Perl v5.30.0):
#!/usr/bin/perl
use strict;
use warnings;
my $filename = '/home/vkk/Scripts/outfile.txt';
open(my $fh, '>>', $filename) or die "Could not open file '$filename' $!";
print $fh "Write this line to file\n";
close $fh;
print "done\n";
For more info, refer these links - open or appending-to-files by Gabor.
Please see following code sample, it demonstrates some aspects of correct usage of open, environment variables and reports an error if a file can not be open for writing.
Note: Run a search in Google for Perl bookshelf
#!/bin/env perl
#
# vim: ai ts=4 sw=4
#
use strict;
use warnings;
use feature 'say';
my $fname = $ENV{HOME} . '/Desktop/write1.txt';
my $text = 'I am writing to this file';
open my $fh, '>', $fname
or die "Can't open $fname";
say $fh $text;
close $fh;
say 'Done!';
Documentation quote
About modes
When calling open with three or more arguments, the second argument -- labeled MODE here -- defines the open mode. MODE is usually a literal string comprising special characters that define the intended I/O role of the filehandle being created: whether it's read-only, or read-and-write, and so on.
If MODE is <, the file is opened for input (read-only). If MODE is >, the file is opened for output, with existing files first being truncated ("clobbered") and nonexisting files newly created. If MODE is >>, the file is opened for appending, again being created if necessary.
You can put a + in front of the > or < to indicate that you want both read and write access to the file; thus +< is almost always preferred for read/write updates--the +> mode would clobber the file first. You can't usually use either read-write mode for updating textfiles, since they have variable-length records. See the -i switch in perlrun for a better approach. The file is created with permissions of 0666 modified by the process's umask value.
These various prefixes correspond to the fopen(3) modes of r, r+, w, w+, a, and a+.
Documentation: open, close,

Synchronize processes by locking a file

One of my scripts is installing a component. When run in parallel, the same script tries to install the same component, so I thought about synchronizing the process by locking a file while the script is installing and wait while other script is installing something.
The code would look like this:
# this will create a file handler on a file from TEMP dir with the
# name of the component; if it doesn't exist in TEMP dir, it will create it
my $file = $ENV{"TEMP"}. "\\" . $componentName;
open (my $fh, ">", "$file") or die "Couldn't open file!";
# this will apply an exclusive lock meaning that if another process
# already locked the file, it will wait until the lock is removed
flock($fh, 2) or die "Failed to lock the file";
# install the component..
# closing the file handle automatically removes the lock
close $fh;
I am concerned about the situation when a script locks the file and is starting the installation and the second script comes and tries to create a file handle on the locked file. I didn't see any errors, but I don't want to miss something.
Will there be a problem with this?
The thing that's important to remember is - the 'open' will work in either case, because that doesn't test the lock. It's the flock operation that will block until the lock is released.
And this should work just fine, although once the lock is released - you might want to check if you still need to run the install, unless you don't really care about doing it twice - e.g. if the rest of the script makes use of/relies upon it.
Also - are there other sources of 'installing' that aren't your script, that could cause the same problem? A lock is an advisory thing.
It would be a style improvement in your program to also:
Test $ENV{'TEMP'} to see that it exists, and default (or fail) if it doesn't.
use Fcntl qw ( :flock ); because then you can flock ( $fh, LOCK_EX ); to make it clear you're taking an exclusive lock.
You appear to be using \\ as a file separator. That's probably better if you used something like File::Spec to do that, for portability reasons.
You can use a LOCK_NB for nonblocking: flock ( $fh, LOCK_EX | LOCK_NB ) and then just skip if it's locked.
A lock doesn't prevent the file from being opened or modified; it prevents it from being locked.
This means the open won't fail, and it will clobber the file even if it's locked and still being used. If the lock is meant to protect access to the file (i.e. if the programs actually write to the locked file), you want to use sysopen to avoid clobbering the file if it already exists[1].
use Fcntl qw( LOCK_EX O_CREAT O_WRONLY );
# Open the file without clobbering it, creating it if necessary.
sysopen(my $fh, $qfn, O_WRONLY | O_CREAT)
or die($!);
# Wait for the file to become available.
flock($fh, LOCK_EX)
or die($!);
truncate($fh, 0)
or die($!);
...
or
use Fcntl qw( LOCK_EX LOCK_NB O_CREAT O_WRONLY );
# Open the file without clobbering it, creating it if necessary.
sysopen(my $fh, $qfn, O_WRONLY | O_CREAT)
or die($!);
# Check if the file is locked.
flock($fh, LOCK_EX | LOCK_NB)
or die($!{EWOULDBLOCK} ? "File already in use\n" : $!);
truncate($fh, 0)
or die($!);
...
You could also use open(my $fh, '>>', $qfn) if you don't mind having the file in append mode.

How to write perl sript that can't be run simultaneously [duplicate]

I need to run Perl script by cron periodically (~every 3-5 minutes). I want to ensure that only one Perl script instance will be running in a time, so next cycle won't start until the previous one is finished. Could/Should that be achieved by some built-in functionality of cron, Perl or I need to handle it at script level?
I am quite new to Perl and cron, so help and general recommendations are appreciated.
I have always had good luck using File::NFSLock to get an exclusive lock on the script itself.
use Fcntl qw(LOCK_EX LOCK_NB);
use File::NFSLock;
# Try to get an exclusive lock on myself.
my $lock = File::NFSLock->new($0, LOCK_EX|LOCK_NB);
die "$0 is already running!\n" unless $lock;
This is sort of the same as the other lock file suggestions, except I don't have to do anything except attempt to get the lock.
The Sys::RunAlone module does what you want very nicely. Just add
use Sys::RunAlone;
near the top of your code.
Use File::Pid to store the script's pid in a file, which the script should check for at the start, and abort if found. You can remove the pidfile when the script is done, but it's not truly necessary, as you can simply check later to see if that process id is still alive (which will also account for the cases when your script aborts unexpectedly):
use strict;
use warnings;
use File::Pid;
my $pidfile = File::Pid->new({file => /var/run/myscript});
exit if $pidfile->running();
$pidfile->write();
# ... rest of script...
# end of script
$pidfile->remove();
exit;
A typical approach is for each process to open and lock a certain file. Then the process reads the process ID contained in the file.
If a process with that ID is running, the latecomer exits quietly. Otherwise, the new winner writes its process ID ($$ in Perl) to the pidfile, closes the handle (which releases the lock), and goes about its business.
Example implementation below:
#! /usr/bin/perl
use warnings;
use strict;
use Fcntl qw/ :DEFAULT :flock :seek /;
my $PIDFILE = "/tmp/my-program.pid";
sub take_lock {
sysopen my $fh, $PIDFILE, O_RDWR | O_CREAT or die "$0: open $PIDFILE: $!";
flock $fh => LOCK_EX or die "$0: flock $PIDFILE: $!";
my $pid = <$fh>;
if (defined $pid) {
chomp $pid;
if (kill 0 => $pid) {
close $fh;
exit 1;
}
}
else {
die "$0: readline $PIDFILE: $!" if $!;
}
sysseek $fh, 0, SEEK_SET or die "$0: sysseek $PIDFILE: $!";
truncate $fh, 0 or die "$0: truncate $PIDFILE: $!";
print $fh "$$\n" or die "$0: print $PIDFILE: $!";
close $fh or die "$0: close: $!";
}
take_lock;
print "$0: [$$] running...\n";
sleep 2;
I have always used this - small and simple - no dependancy on any module and works both Windows + Linux.
use Fcntl ':flock';
### Check to make sure there is only one instance ###
open SELF, "< $0" or die("Cannot run two instances of this program");
unless ( flock SELF, LOCK_EX | LOCK_NB ) {
print "You cannot run two instances of this program , a process is still running";
exit 1;
}
AFAIK perl has no such thing builtin. You could easily create a temporary file, when you start your application and delete it, when your script is done.
Given the frequency I would normally write a daemon (server) that nicely waits idly between job runs (i.e. sleep()) rather than try to use cron for fairly fine-grained access.
If necessary, on Unix / Linux systems you could run it from /etc/inittab (or replacement) to ensure that it always running, and is automatically restarted in the process is killed or dies.
Added: (and some irrelevant stuff removed)
The always present (running, but mostly idle) daemon approach has the benefit of eliminating the possibility of concurrent instances of the script being being started by cron automatically.
However it does mean you are responsible for managing the timing correctly, such as in the case of there is an overlap (i.e. a previous run is still running, while a new trigger occurs). This may help you decide whether to use a forking daemon or non-forking design. Threads don't provide any advantage in this scenario, so there is no need to consider their usage.
This does not completely eliminate the possibility of multiple processes running, but that a common problem with many daemons. The typical solution is to use a semaphore such as a mutually-exclusive lock on a file, to prevent a second instance from being run. The file-lock is automatically forgotten when the process ends, so in the case of abnormal termination (e.g. power failure) there is no clean-up necessary of the lock itself.
An approach using Fcntl module, and using a Perl sysopen with a O_EXCL flag (or O_RDWR | O_CREAT | O_EXCL) was given by Greg Bacon. The only differences I would make are combine exclusive locking into the sysopen call (i.e. use the flags I've suggested), and remove the then redundant flock call. Oh, and I would follow the UNIX (& Linux FHS) file-system and naming conventions of /var/run/daemonname.pid.
Another approach would be to use djb's daemontools or similar to "daemonize" the task.

Perl File pointers

I have a question concerning these two files:
1)
use strict;
use warnings;
use v5.12;
use externalModule;
my $file = "test.txt";
unless(unlink $file){ # unlink is UNIX equivalent of rm
say "DEBUG: No test.txt persent";
}
unless (open FILE, '>>'.$file) {
die("Unable to create $file");
}
say FILE "This name is $file"; # print newline automatically
unless(externalModule::external_function($file)) {
say "error with external_function";
}
print FILE "end of file\n";
close FILE;
and external module (.pm)
use strict;
use warnings;
use v5.12;
package externalModule;
sub external_function {
my $file = $_[0]; # first arguement
say "externalModule line 11: $file";
# create a new file handler
unless (open FILE, '>>'.$file) {
die("Unable to create $file");
}
say FILE "this comes from an external module";
close FILE;
1;
}
1; # return true
Now,
In the first perl script line 14:
# create a new file handler
unless (open FILE, '>>'.$file) {
die("Unable to create $file");
}
If I would have
'>'.$file
instead, then the string printed by the external module will not be displayed in the final test.txt file.
Why is that??
Kind Regards
'>' means open the file for output, possibly overwriting it ("clobbering"). >> means appending to it if it already exists.
BTW, it is recommended to use 3 argument form of open with lexical file-handles:
open my $FH, '>', $file or die "Cannot open $file: $!\n";
If you use >$file in your main function, it will write to the start of the file, and buffer output as well. So after your external function returns, the "end of file" will be appended to the buffer, and the buffer flushed -- with the file pointer still at position 0 in the file, so you'll just overwrite the text from the external function. Try a much longer text in the external function, and you'll see that the last part of it remains, with the first part getting overwritten.
This is very old syntax, and not recommended in modern Perl. The 3-argument version, which was "only" introduced in 5.6.1 about 10 years ago, is preferred. So is using a lexical variable for a file handle, rather than an uppercase bareword.
Anyway, >> means open for append, whereas > means open for write, which will remove any existing data in the file.
You're clobbering your file when you reopen it once more. The > means open the file for writing, and delete the old file if it exists and create a new one. The >> means open the file for writing, but append the data if the file already exists.
As you can see, it's very hard to pass FILE back and forth between your module and your program.
The latest syntax is to use lexically scoped variables for file handles:
use autodie;
# Here I'm using the newer three parameter version of the open statement.
# I'm also using '$fh' instead of 'FILE' to store the pointer to the open
# file. This makes it easier to pass the file handle to various functions.
#
# I'm also using "autodie". This causes my program to die due to certain
# errors, so if I forget to test, I don't cause problems. I can use `eval`
# to test for an error if I don't want to die.
open my $fh, ">>", $file; # No die if it doesn't open thx to autodie
# Now, I can pass the file handle to whatever my external module needs
# to do with my file. I no longer have to pass the file name and have
# my external module reopen the file
externalModule::xternal_function( $fh );

Running only one Perl script instance by cron

I need to run Perl script by cron periodically (~every 3-5 minutes). I want to ensure that only one Perl script instance will be running in a time, so next cycle won't start until the previous one is finished. Could/Should that be achieved by some built-in functionality of cron, Perl or I need to handle it at script level?
I am quite new to Perl and cron, so help and general recommendations are appreciated.
I have always had good luck using File::NFSLock to get an exclusive lock on the script itself.
use Fcntl qw(LOCK_EX LOCK_NB);
use File::NFSLock;
# Try to get an exclusive lock on myself.
my $lock = File::NFSLock->new($0, LOCK_EX|LOCK_NB);
die "$0 is already running!\n" unless $lock;
This is sort of the same as the other lock file suggestions, except I don't have to do anything except attempt to get the lock.
The Sys::RunAlone module does what you want very nicely. Just add
use Sys::RunAlone;
near the top of your code.
Use File::Pid to store the script's pid in a file, which the script should check for at the start, and abort if found. You can remove the pidfile when the script is done, but it's not truly necessary, as you can simply check later to see if that process id is still alive (which will also account for the cases when your script aborts unexpectedly):
use strict;
use warnings;
use File::Pid;
my $pidfile = File::Pid->new({file => /var/run/myscript});
exit if $pidfile->running();
$pidfile->write();
# ... rest of script...
# end of script
$pidfile->remove();
exit;
A typical approach is for each process to open and lock a certain file. Then the process reads the process ID contained in the file.
If a process with that ID is running, the latecomer exits quietly. Otherwise, the new winner writes its process ID ($$ in Perl) to the pidfile, closes the handle (which releases the lock), and goes about its business.
Example implementation below:
#! /usr/bin/perl
use warnings;
use strict;
use Fcntl qw/ :DEFAULT :flock :seek /;
my $PIDFILE = "/tmp/my-program.pid";
sub take_lock {
sysopen my $fh, $PIDFILE, O_RDWR | O_CREAT or die "$0: open $PIDFILE: $!";
flock $fh => LOCK_EX or die "$0: flock $PIDFILE: $!";
my $pid = <$fh>;
if (defined $pid) {
chomp $pid;
if (kill 0 => $pid) {
close $fh;
exit 1;
}
}
else {
die "$0: readline $PIDFILE: $!" if $!;
}
sysseek $fh, 0, SEEK_SET or die "$0: sysseek $PIDFILE: $!";
truncate $fh, 0 or die "$0: truncate $PIDFILE: $!";
print $fh "$$\n" or die "$0: print $PIDFILE: $!";
close $fh or die "$0: close: $!";
}
take_lock;
print "$0: [$$] running...\n";
sleep 2;
I have always used this - small and simple - no dependancy on any module and works both Windows + Linux.
use Fcntl ':flock';
### Check to make sure there is only one instance ###
open SELF, "< $0" or die("Cannot run two instances of this program");
unless ( flock SELF, LOCK_EX | LOCK_NB ) {
print "You cannot run two instances of this program , a process is still running";
exit 1;
}
AFAIK perl has no such thing builtin. You could easily create a temporary file, when you start your application and delete it, when your script is done.
Given the frequency I would normally write a daemon (server) that nicely waits idly between job runs (i.e. sleep()) rather than try to use cron for fairly fine-grained access.
If necessary, on Unix / Linux systems you could run it from /etc/inittab (or replacement) to ensure that it always running, and is automatically restarted in the process is killed or dies.
Added: (and some irrelevant stuff removed)
The always present (running, but mostly idle) daemon approach has the benefit of eliminating the possibility of concurrent instances of the script being being started by cron automatically.
However it does mean you are responsible for managing the timing correctly, such as in the case of there is an overlap (i.e. a previous run is still running, while a new trigger occurs). This may help you decide whether to use a forking daemon or non-forking design. Threads don't provide any advantage in this scenario, so there is no need to consider their usage.
This does not completely eliminate the possibility of multiple processes running, but that a common problem with many daemons. The typical solution is to use a semaphore such as a mutually-exclusive lock on a file, to prevent a second instance from being run. The file-lock is automatically forgotten when the process ends, so in the case of abnormal termination (e.g. power failure) there is no clean-up necessary of the lock itself.
An approach using Fcntl module, and using a Perl sysopen with a O_EXCL flag (or O_RDWR | O_CREAT | O_EXCL) was given by Greg Bacon. The only differences I would make are combine exclusive locking into the sysopen call (i.e. use the flags I've suggested), and remove the then redundant flock call. Oh, and I would follow the UNIX (& Linux FHS) file-system and naming conventions of /var/run/daemonname.pid.
Another approach would be to use djb's daemontools or similar to "daemonize" the task.