How to protect file from multiple perl application access?

I have a file, /root/abc/sample.xml, that is accessed by many other applications (around 2000) in parallel. Can you please suggest a mechanism that lets all of these applications access /root/abc/sample.xml safely, without any file corruption? I am a beginner in Perl programming.
Each application will read this file and close it once its processing is done.

There are file locking solutions available with this in mind; flock is one. Perl's built-in flock() waits for a file to be free and then locks it (LOCK_EX for an exclusive lock, LOCK_SH for a shared one). There is no separate unlock() function; calling flock() again with LOCK_UN releases the lock.
For your example you could do:
#!/usr/bin/env perl
use strict;
# import LOCK_* and SEEK_END constants
use Fcntl qw(:flock SEEK_END);
# Open the file
open(my $xml, ">>", "sample.xml") or die "Can't open xml $!";
# Lock the file
flock($xml, LOCK_EX) or die "Cannot lock xml - $!\n";
# Someone may have changed the file while we waited for the lock.
# Either re-read it or seek to the end, depending on what you're doing.
# We are just going to append some stuff, so we seek to the end
# for this example.
seek($xml, 0, SEEK_END) or die "Cannot seek - $!\n";
# Do some stuff
print $xml "<foo>bar</foo>" ,"\n\n";
# Unlock
flock($xml, LOCK_UN) or die "Cannot unlock xml - $!\n";

If all of your applications just read from the file (and do not write), there will be no problem. You can set file permissions to read-only to make sure.
Data corruption is only an issue if at least one party tries to update the file.

If you require write access, you can also use locks (file locking) or a distributed locking service like ZooKeeper + the Net::ZooKeeper module(possibly overkill).
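For the many-readers case in the question, a minimal sketch might look like the following. It assumes every application cooperates by calling flock: readers take a shared lock (LOCK_SH), and any writer takes an exclusive lock (LOCK_EX). The helper names read_locked and write_locked are made up for illustration and are not part of any module.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Read the whole file under a shared lock; many readers may hold it at once.
sub read_locked {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    flock($fh, LOCK_SH)      or die "Cannot lock $path: $!";
    local $/;                          # slurp mode
    my $content = <$fh>;
    close($fh) or die "Cannot close $path: $!";   # closing releases the lock
    return $content;
}

# Replace the file's content under an exclusive lock.
sub write_locked {
    my ($path, $content) = @_;
    # '>>' opens without clobbering, so nothing is changed before we hold the lock.
    open(my $fh, '>>', $path) or die "Cannot open $path: $!";
    flock($fh, LOCK_EX)       or die "Cannot lock $path: $!";
    truncate($fh, 0)          or die "Cannot truncate $path: $!";
    print {$fh} $content      or die "Cannot write $path: $!";
    close($fh) or die "Cannot close $path: $!";   # flushes, then releases the lock
    return 1;
}
```

A reader would then call something like my $xml = read_locked('/root/abc/sample.xml'); and parse the returned string.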

Related

Synchronize processes by locking a file

One of my scripts is installing a component. When run in parallel, the same script tries to install the same component, so I thought about synchronizing the process by locking a file while the script is installing and wait while other script is installing something.
The code would look like this:
# this will create a file handler on a file from TEMP dir with the
# name of the component; if it doesn't exist in TEMP dir, it will create it
my $file = $ENV{"TEMP"}. "\\" . $componentName;
open (my $fh, ">", "$file") or die "Couldn't open file!";
# this will apply an exclusive lock meaning that if another process
# already locked the file, it will wait until the lock is removed
flock($fh, 2) or die "Failed to lock the file";
# install the component..
# closing the file handle automatically removes the lock
close $fh;
I am concerned about the situation when a script locks the file and is starting the installation and the second script comes and tries to create a file handle on the locked file. I didn't see any errors, but I don't want to miss something.
Will there be a problem with this?
The thing that's important to remember is - the 'open' will work in either case, because that doesn't test the lock. It's the flock operation that will block until the lock is released.
And this should work just fine, although once the lock is released - you might want to check if you still need to run the install, unless you don't really care about doing it twice - e.g. if the rest of the script makes use of/relies upon it.
Also - are there other sources of 'installing' that aren't your script, that could cause the same problem? A lock is only advisory: it has no effect on programs that don't call flock themselves.
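The "check whether you still need to run the install" step could be sketched roughly like this. The marker file, the install_once name and the callback are assumptions made for illustration; your script may have a better way to tell that the component is already installed.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Run $install at most once across cooperating processes.
# Returns 1 if this call ran the installer, 0 if it was already done.
sub install_once {
    my ($lock_path, $marker_path, $install) = @_;

    open(my $lock, '>>', $lock_path) or die "Cannot open $lock_path: $!";
    flock($lock, LOCK_EX)            or die "Cannot lock $lock_path: $!";

    # We may have waited for another process; re-check before installing.
    my $ran = 0;
    unless (-e $marker_path) {
        $install->();
        open(my $marker, '>', $marker_path) or die "Cannot create $marker_path: $!";
        close($marker) or die "Cannot close $marker_path: $!";
        $ran = 1;
    }

    close($lock) or die "Cannot close $lock_path: $!";   # releases the lock
    return $ran;
}
```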
It would be a style improvement in your program to also:
Test $ENV{'TEMP'} to see that it exists, and default (or fail) if it doesn't.
use Fcntl qw ( :flock ); because then you can flock ( $fh, LOCK_EX ); to make it clear you're taking an exclusive lock.
You appear to be using \\ as a file separator. It would probably be better to use something like File::Spec for that, for portability reasons.
You can use a LOCK_NB for nonblocking: flock ( $fh, LOCK_EX | LOCK_NB ) and then just skip if it's locked.
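Put together, the suggestions above might look something like this sketch. The sub name try_install_lock and the ".lock" suffix are hypothetical; falling back to File::Spec->tmpdir when TEMP is not set is one possible default, not the only one.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Spec;

# Try to take a per-component lock without blocking.
# Returns the locked handle on success, or undef if another process holds it.
sub try_install_lock {
    my ($component) = @_;

    # Use TEMP if it is set, otherwise the platform's temp directory.
    my $dir = (defined $ENV{TEMP} && length $ENV{TEMP})
            ? $ENV{TEMP}
            : File::Spec->tmpdir;
    my $file = File::Spec->catfile($dir, "$component.lock");

    # '>>' creates the file if needed and never clobbers it.
    open(my $fh, '>>', $file) or die "Cannot open $file: $!";
    return $fh if flock($fh, LOCK_EX | LOCK_NB);

    close $fh;
    return;
}
```

The caller keeps the returned handle open for as long as the install runs and closes it afterwards; if the sub returns undef, it can simply skip the install.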
A lock doesn't prevent the file from being opened or modified; it prevents it from being locked.
This means the open won't fail, and it will clobber the file even if it's locked and still being used. If the lock is meant to protect access to the file (i.e. if the programs actually write to the locked file), you want to use sysopen to avoid clobbering the file if it already exists.
use Fcntl qw( LOCK_EX O_CREAT O_WRONLY );
# Open the file without clobbering it, creating it if necessary.
sysopen(my $fh, $qfn, O_WRONLY | O_CREAT)
or die($!);
# Wait for the file to become available.
flock($fh, LOCK_EX)
or die($!);
truncate($fh, 0)
or die($!);
...
or
use Fcntl qw( LOCK_EX LOCK_NB O_CREAT O_WRONLY );
# Open the file without clobbering it, creating it if necessary.
sysopen(my $fh, $qfn, O_WRONLY | O_CREAT)
or die($!);
# Check if the file is locked.
flock($fh, LOCK_EX | LOCK_NB)
or die($!{EWOULDBLOCK} ? "File already in use\n" : $!);
truncate($fh, 0)
or die($!);
...
You could also use open(my $fh, '>>', $qfn) if you don't mind having the file in append mode.

How to write perl script that can't be run simultaneously [duplicate]

I need to run a Perl script by cron periodically (~every 3-5 minutes). I want to ensure that only one instance of the script is running at a time, so the next cycle won't start until the previous one is finished. Could/should that be achieved by some built-in functionality of cron or Perl, or do I need to handle it at script level?
I am quite new to Perl and cron, so help and general recommendations are appreciated.
I have always had good luck using File::NFSLock to get an exclusive lock on the script itself.
use Fcntl qw(LOCK_EX LOCK_NB);
use File::NFSLock;
# Try to get an exclusive lock on myself.
my $lock = File::NFSLock->new($0, LOCK_EX|LOCK_NB);
die "$0 is already running!\n" unless $lock;
This is sort of the same as the other lock file suggestions, except I don't have to do anything except attempt to get the lock.
The Sys::RunAlone module does what you want very nicely. Just add
use Sys::RunAlone;
near the top of your code.
Use File::Pid to store the script's pid in a file, which the script should check for at the start, and abort if found. You can remove the pidfile when the script is done, but it's not truly necessary, as you can simply check later to see if that process id is still alive (which will also account for the cases when your script aborts unexpectedly):
use strict;
use warnings;
use File::Pid;
my $pidfile = File::Pid->new({file => '/var/run/myscript'});
exit if $pidfile->running();
$pidfile->write();
# ... rest of script...
# end of script
$pidfile->remove();
exit;
A typical approach is for each process to open and lock a certain file. Then the process reads the process ID contained in the file.
If a process with that ID is running, the latecomer exits quietly. Otherwise, the new winner writes its process ID ($$ in Perl) to the pidfile, closes the handle (which releases the lock), and goes about its business.
Example implementation below:
#! /usr/bin/perl
use warnings;
use strict;
use Fcntl qw/ :DEFAULT :flock :seek /;
my $PIDFILE = "/tmp/my-program.pid";
sub take_lock {
sysopen my $fh, $PIDFILE, O_RDWR | O_CREAT or die "$0: open $PIDFILE: $!";
flock $fh => LOCK_EX or die "$0: flock $PIDFILE: $!";
my $pid = <$fh>;
if (defined $pid) {
chomp $pid;
if (kill 0 => $pid) {
close $fh;
exit 1;
}
}
else {
die "$0: readline $PIDFILE: $!" if $!;
}
sysseek $fh, 0, SEEK_SET or die "$0: sysseek $PIDFILE: $!";
truncate $fh, 0 or die "$0: truncate $PIDFILE: $!";
print $fh "$$\n" or die "$0: print $PIDFILE: $!";
close $fh or die "$0: close: $!";
}
take_lock;
print "$0: [$$] running...\n";
sleep 2;
I have always used this - small and simple - no dependency on any module and works on both Windows + Linux.
use Fcntl ':flock';
### Check to make sure there is only one instance ###
open SELF, '<', $0 or die("Cannot open $0: $!");
unless ( flock SELF, LOCK_EX | LOCK_NB ) {
print "You cannot run two instances of this program, a process is still running\n";
exit 1;
}
AFAIK perl has no such thing built in. You could easily create a temporary file when you start your application and delete it when your script is done.
Given the frequency I would normally write a daemon (server) that nicely waits idly between job runs (i.e. sleep()) rather than try to use cron for fairly fine-grained access.
If necessary, on Unix / Linux systems you could run it from /etc/inittab (or replacement) to ensure that it is always running, and is automatically restarted if the process is killed or dies.
Added: (and some irrelevant stuff removed)
The always present (running, but mostly idle) daemon approach has the benefit of eliminating the possibility of concurrent instances of the script being started by cron automatically.
However it does mean you are responsible for managing the timing correctly, such as when there is an overlap (i.e. a previous run is still running while a new trigger occurs). This may help you decide whether to use a forking daemon or non-forking design. Threads don't provide any advantage in this scenario, so there is no need to consider their usage.
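As a rough illustration of the non-forking design, here is a sketch of a loop that never overlaps runs: the next run starts only after the previous one has returned, and the loop sleeps for whatever is left of the interval. The name run_every and its parameters are made up for this example; a real daemon would loop forever rather than for a fixed number of runs.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Call $job->($n) $runs times, aiming for one start every $interval seconds.
# A slow run simply delays the next one; runs never overlap.
sub run_every {
    my ($interval, $runs, $job) = @_;
    for my $n (1 .. $runs) {
        my $started = time;
        $job->($n);
        my $remaining = $interval - (time - $started);
        sleep $remaining if $remaining > 0 && $n < $runs;
    }
    return;
}
```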
The daemon approach does not completely eliminate the possibility of multiple processes running, but that is a common problem with many daemons. The typical solution is to use a semaphore, such as a mutually exclusive lock on a file, to prevent a second instance from being run. The file lock is automatically released when the process ends, so in the case of abnormal termination (e.g. power failure) no clean-up of the lock itself is necessary.
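The point about the lock disappearing with the process can be checked with a small experiment like the one below. It is only a sketch: the sub name is invented, it needs fork (so a Unix-like system), and it uses a temporary file as the lock file. A child takes the lock and is then killed with SIGKILL to simulate an abnormal end.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);
use POSIX ();

# Returns two flags: could the parent take the lock while the child held it,
# and could it take the lock after the child was killed?
sub lock_released_on_death {
    my ($tmp_fh, $path) = tempfile(UNLINK => 1);
    close $tmp_fh;
    pipe(my $reader, my $writer) or die "pipe: $!";

    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: take the lock, report it, then hang until killed.
        close $reader;
        open(my $fh, '>>', $path) or POSIX::_exit(1);
        flock($fh, LOCK_EX)       or POSIX::_exit(1);
        syswrite($writer, "locked\n");
        sleep 60;
        POSIX::_exit(0);
    }

    close $writer;
    my $ready = <$reader>;               # wait until the child holds the lock
    die "child failed to take the lock" unless defined $ready;

    open(my $fh, '>>', $path) or die "open $path: $!";
    my $while_held = flock($fh, LOCK_EX | LOCK_NB) ? 1 : 0;

    kill 'KILL', $pid;                   # no chance for the child to clean up
    waitpid($pid, 0);
    my $after_kill = flock($fh, LOCK_EX | LOCK_NB) ? 1 : 0;

    close $fh;
    return ($while_held, $after_kill);
}
```

On a system where Perl's flock maps to the usual flock(2) behaviour, I would expect the first flag to be false and the second to be true.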
An approach using the Fcntl module and Perl's sysopen was given by Greg Bacon. The only changes I would make are to add the O_EXCL flag to the sysopen call (i.e. O_RDWR | O_CREAT | O_EXCL), so that exclusive creation takes the place of the lock, and to remove the then-redundant flock call. Oh, and I would follow the UNIX (& Linux FHS) file-system and naming conventions of /var/run/daemonname.pid.
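A sketch of that O_EXCL variant could look like this. The sub name claim_pidfile is hypothetical, and the caller decides where the pidfile lives. Note one consequence of dropping flock: nothing is released automatically here, so the script has to remove the pidfile itself when it finishes, and a file left behind by a crash will block later runs until it is deleted.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);

# Atomically create the pidfile and write our PID into it.
# Returns 1 if this process created the file, 0 if it already existed.
sub claim_pidfile {
    my ($path) = @_;

    # O_CREAT | O_EXCL makes sysopen fail if the file is already there.
    sysopen(my $fh, $path, O_WRONLY | O_CREAT | O_EXCL) or do {
        return 0 if $!{EEXIST};
        die "Cannot create $path: $!";
    };

    print {$fh} "$$\n" or die "Cannot write $path: $!";
    close($fh)         or die "Cannot close $path: $!";
    return 1;
}
```

A script would call exit unless claim_pidfile($path); near the top and unlink $path; at the end (for example from an END block).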
Another approach would be to use djb's daemontools or similar to "daemonize" the task.

Perl - passing an open socket across fork/exec

I want to have a Perl daemon listen for and accept an incoming connection from a client, and then fork & exec another Perl program to continue the conversation with the client.
I can do this fine when simply forking - where the daemon code also contains the child's code. But I don't see how the open socket can be "passed" across an exec() to another Perl program.
Somehow I got the impression that this was easy in Unix (which is my environment) and therefore in Perl, too. Can it actually be done?
This can be done in approximately three steps:
Clear the close-on-exec flag on the file descriptor.
Tell the exec'd program which file descriptor to use.
Restore the file descriptor into a handle.
1. Perl (by default) sets the close-on-exec flag on file descriptors it opens above $^F (normally 2, i.e. everything except stdin, stdout and stderr). This means such file descriptors won't be preserved across an exec. You have to clear this flag first:
use Fcntl;
my $flags = fcntl $fh, F_GETFD, 0 or die "fcntl F_GETFD: $!";
fcntl $fh, F_SETFD, $flags & ~FD_CLOEXEC or die "fcntl F_SETFD: $!";
2. Now that the file descriptor will stay open across exec, you need to tell the program which descriptor it is:
my $fd = fileno $fh;
exec 'that_program', $fd; # pass it on the command line
# (you could also pass it via %ENV or whatever)
3. Recover the filehandle on the other side:
my $fd = $ARGV[0]; # or however you passed it
open my $fh, '+<&=', $fd or die "fdopen $fd: $!"; # fdopen
$fh->autoflush(1); # because "normal" sockets have that enabled by default
Now you have a Perl-level handle in $fh again.
Addendum: As ikegami mentioned in a comment, you can also make sure the socket is using one of the three "standard" file descriptors (0 (stdin), 1 (stdout), 2 (stderr)) which are 1. left open by default across execs, 2. have known numbers so no need to pass anything, and 3. perl will create corresponding handles for them automatically.
open STDIN, '+<&', $fh; # now STDIN refers to the socket
exec 'that_program';
Now that_program can simply use STDIN. This works even for output; there is no inherent restriction on file descriptors 0, 1, 2 that they be for input or output only. It's just a convention that all unix programs follow.
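To see the three steps working end to end, here is a self-contained sketch. It is not the daemon from the question: it uses a socketpair instead of an accepted client connection, and the "other program" is just an inline perl -e script run via $^X. The sub name exec_with_socket is made up for the example.

```perl
use strict;
use warnings;
use Fcntl qw(F_GETFD F_SETFD FD_CLOEXEC);
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

# Fork, exec another perl with one end of a socketpair still open,
# and return the line that program sends back over it.
sub exec_with_socket {
    socketpair(my $parent_end, my $child_end, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
        or die "socketpair: $!";

    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {
        close $parent_end;

        # Step 1: clear close-on-exec so the descriptor survives exec.
        my $flags = fcntl($child_end, F_GETFD, 0) or die "fcntl F_GETFD: $!";
        fcntl($child_end, F_SETFD, $flags & ~FD_CLOEXEC) or die "fcntl F_SETFD: $!";

        # Step 2: pass the descriptor number on the command line.
        my $fd = fileno $child_end;

        # Step 3 runs in the exec'd program: turn the number back into a handle.
        my $program = q{
            open(my $fh, '+<&=', $ARGV[0]) or die "fdopen: $!";
            $fh->autoflush(1);
            print {$fh} "hello from pid $$ on fd $ARGV[0]\n";
        };
        exec($^X, '-MIO::Handle', '-e', $program, $fd) or die "exec: $!";
    }

    close $child_end;
    my $line = <$parent_end>;
    waitpid($pid, 0);
    return $line;
}
```

In a real daemon, $child_end would be the handle returned by accept, and the exec target would be the second Perl program rather than an inline script.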

Atomic open of non-existing file in Perl

I want to write something to a file whose name is in the variable $filename.
I don't want to overwrite it, so I check first if it exists and then open it:
#stage1
if(-e $filename)
{
print "file $filename exists, not overwriting\n";
exit 1;
}
#stage2
open(OUTFILE, ">", $filename) or die $!;
But this is not atomic. Theoretically someone can create this file between stage1 and stage2. Is there some variant of open command that will do these both things in atomic way, so it will fail to open a file for writing if the file exists?
Here is an atomic way of opening files:
#!/usr/bin/env perl
use strict;
use warnings qw(all);
use Fcntl qw(:DEFAULT :flock);
my $filename = 'test';
my $fh;
# this is "atomic open" part
unless (sysopen($fh, $filename, O_CREAT | O_EXCL | O_WRONLY)) {
print "file $filename exists, not overwriting\n";
exit 1;
}
# flock() isn't required for "atomic open" per se
# but useful in real world usage like log appending
flock($fh, LOCK_EX);
# use the handle as you wish
print $fh scalar localtime;
print $fh "\n";
# unlock & close
flock($fh, LOCK_UN);
close $fh;
Debug session:
stas@Stanislaws-MacBook-Pro:~/stackoverflow$ cat test
Wed Dec 19 12:10:37 2012
stas@Stanislaws-MacBook-Pro:~/stackoverflow$ perl sysopen.pl
file test exists, not overwriting
stas@Stanislaws-MacBook-Pro:~/stackoverflow$ cat test
Wed Dec 19 12:10:37 2012
If you're concerned about multiple Perl scripts modifying the same file, just use the flock() function in each one to lock the file you're interested in.
If you're worried about external processes, which you probably don't have control over, you can use the sysopen() function. According to the Programming Perl book (which I highly recommend, by the way):
To fix this problem of overwriting, you’ll need to use sysopen, which
provides individual controls over whether to create a new file or
clobber an existing one. And we’ll ditch that –e file existence test
since it serves no useful purpose here and only increases our exposure
to race conditions.
They also provide this sample block of code:
use Fcntl qw/O_WRONLY O_CREAT O_EXCL/;
open(FH, "<", $file)
|| sysopen(FH, $file, O_WRONLY | O_CREAT | O_EXCL)
|| die "can't create new file $file: $!";
In this example, they first pull in a few constants (to be used in the sysopen call). Next, they try to open the file with open, and if that fails, they then try sysopen. They continue on to say:
Now even if the file somehow springs into existence between when open
fails and when sysopen tries to open a new file for writing, no harm
is done, because with the flags provided, sysopen will refuse to open
a file that already exists.
So, to make things clear for your situation, remove the file test completely (no more stage 1), and only do the open operation using code similar to the block above. Problem solved!
