Is there any noticeable performance difference between these two ways of reading/writing a user file with Perl, on Linux?
Option 1:
open (READFILE, '<:utf8', "users/$_[0]") or die ("no read users/$_[0]");
# Do the reading
close (READFILE) or die;
# Do more stuff
open (WRITEFILE, '>:utf8', "users/$_[0]") or die ("no write users/$_[0]");
flock (WRITEFILE, 2) or die ("no lock users/$_[0]");
# Do the writing
close (WRITEFILE) or die;
Option 2:
open (USERFILE, '+<:utf8', "users/$_[0]") or die ("no open users/$_[0]");
flock (USERFILE, 2) or die ("no lock users/$_[0]");
# Do the reading
# Do more stuff
seek (USERFILE, 0, 0);
truncate (USERFILE, 0);
# Do the writing
close (USERFILE) or die ("no write users/$_[0]");
The user files are not big, typically 20-40 lines or 2-4 KB each.
And would there be other reasons for choosing option 1 or 2 (or a 3rd option)?
Here is a benchmark you can use to test it. I suspect that obtaining a new file descriptor is the part that takes longer when you close and then reopen the file.
#!/usr/bin/env perl
use warnings;
use strict;
use open qw(:encoding(utf8) :std);
use Benchmark qw<cmpthese>;
my $text = <<TEXT;
I had some longer text here, but for better readability, just
these two lines.
TEXT
cmpthese(10_000, {
    close => sub {
        open my $file, '<', "bla" or die "$!";
        my @array = <$file>;
        close $file or die;
        open $file, '>', "bla" or die "$!";
        $file->print($text);
    },
    truncate => sub {
        open my $file, '+<', "bla" or die "$!";
        my @array = <$file>;
        seek $file, 0, 0;
        truncate $file, 0;
        $file->print($text);
    },
    truncate_flock => sub {
        open my $file, '+<', "bla" or die "$!";
        flock $file, 2;
        my @array = <$file>;
        seek $file, 0, 0;
        truncate $file, 0;
        $file->print($text);
    },
});
Output on my machine:
                 Rate          close truncate_flock       truncate
close          2703/s             --           -15%           -17%
truncate_flock 3175/s            17%             --            -3%
truncate       3257/s            21%             3%             --
A higher rate is better. Using close is about 17% slower; put the other way, the truncate variant is roughly 1.2 times faster.
But it heavily depends on how long your "more stuff" takes: in your truncate example you hold the flock on the file for the entire read-modify-write cycle, so another program trying to access the file may be slowed down because of that.
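For a cleaner take on option 2, here is a minimal sketch using a lexical filehandle and the Fcntl constants instead of the magic number 2; $path stands in for "users/$_[0]". Note that the exclusive lock covers the whole read-modify-write cycle, which option 1 cannot guarantee, since there the '>' open truncates the file before the lock is even acquired.
use Fcntl qw(LOCK_EX SEEK_SET);

open my $fh, '+<:utf8', $path or die "no open $path: $!";
flock $fh, LOCK_EX            or die "no lock $path: $!";
my @lines = <$fh>;            # do the reading
# ... do more stuff ...
seek $fh, 0, SEEK_SET;        # rewind before writing
truncate $fh, 0;              # discard the old contents
print $fh @lines;             # do the writing
close $fh or die "no close $path: $!";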
For my upcoming PulseAudio library, I want to redirect STDERR and STDOUT to /dev/null. Logically, this works:
sub _exec {
open (*STDERR, '>', '/dev/null');
open (*STDOUT, '>', '/dev/null');
CORE::system('pacmd', @_ ) or die $?;
However, this version, with the handles localized, still outputs to the terminal:
sub _exec {
local ( *STDERR, *STDOUT );
open (*STDERR, '>', '/dev/null');
open (*STDOUT, '>', '/dev/null');
CORE::system('pacmd', @_ ) or die $?;
That leaves me with two questions
First and foremost, why am I experiencing the behavior that I'm seeing?
Secondly, is there a more efficient method that doesn't involve storing the old value and replacing it?
The child writes to fd 1 and 2, yet you didn't change fd 1 and 2. You just created new Perl variables (something the child knows nothing about) with fd 3 and 4 (something the child doesn't care about).
Here's one way of achieving what you want:
use IPC::Open3 qw( open3 );
sub _exec {
open(local *CHILD_STDIN, '<', '/dev/null') or die $!;
open(local *CHILD_STDOUT, '>', '/dev/null') or die $!;
my $pid = open3(
'<&CHILD_STDIN',
'>&CHILD_STDOUT',
undef, # 2>&1
'pacmd', @_,
);
waitpid($pid, 0);
die $! if $? == -1;
die $? if $?;
}
open3 is pretty low level, but it's far higher level than doing it yourself*. IPC::Run and IPC::Run3 are even higher level.
* — It takes care of forking and of assigning the handles to the right file descriptors. It handles error checking, including making pre-exec errors in the child appear as the launch failures they are, not as errors from the executed program.
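For comparison, a minimal sketch with the higher-level IPC::Run3; per its documentation, passing \undef routes the corresponding stream to /dev/null. @args stands in for the pacmd arguments.
use IPC::Run3 qw( run3 );

# stdin from /dev/null, stdout and stderr discarded to /dev/null
run3 [ 'pacmd', @args ], \undef, \undef, \undef;
die "pacmd failed: $?" if $?;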
Hi 
this
is just
an example.
Let's assume the above is out.txt. I want to read out.txt and write back onto the same file.
<Hi >
<this>
<is just>
<an example.>
Modified out.txt.
I want to add tags at the beginning and end of some lines. As I will be reading the file several times, I cannot keep writing it to a different file each time.
EDIT 1
I tried using "+<" but its giving an output like this :
Hi
this
is just
an example.
<Hi >
<this>
<is just>
<an example.>
EDIT 2
Code for reference :
open(my $fh, "+<", "out.txt");# or die "cannot open < C:\Users\daanishs\workspace\CCoverage\out.txt: $!";
while(<$fh>)
{
$s1 = "<";
$s2 = $_;
$s3 = ">";
$str = $s1 . $s2 . $s3;
print $fh "$str";
}
The very idea of what you are trying to do is flawed. The file starts as follows (one character per column, with / standing for a newline):
H i / t h i s / ...
If you were to change it in place, it would look as follows after processing the first line:
< H i > / i s / ...
Notice how you clobbered "th"? You need to make a copy of the file, modify the copy, then replace the original with the copy.
The simplest way is to make this copy in memory.
my $file;
{ # Read the file
    open(my $fh, '<', $qfn)
        or die "Can't open \"$qfn\": $!\n";
    local $/;
    $file = <$fh>;
}

# Change the file
$file =~ s/^(.*)\n/<$1>\n/mg;

{ # Save the changes
    open(my $fh, '>', $qfn)
        or die "Can't create \"$qfn\": $!\n";
    print($fh $file);
}
If you wanted to use the disk instead:
rename($qfn, "$qfn.old")
    or die "Can't rename \"$qfn\": $!\n";
open(my $fh_in, '<', "$qfn.old")
    or die "Can't open \"$qfn\": $!\n";
open(my $fh_out, '>', $qfn)
    or die "Can't create \"$qfn\": $!\n";
while (<$fh_in>) {
    chomp;
    $_ = "<$_>";
    print($fh_out "$_\n");
}
unlink("$qfn.old");
Using a trick, the above can be simplified to
local @ARGV = $qfn;
local $^I = '';
while (<>) {
    chomp;
    $_ = "<$_>";
    print(ARGVOUT "$_\n"); # ARGVOUT is the in-place output handle; ARGV is the input
}
Or as a one-liner:
perl -i -ple'$_ = "<$_>"' file
Read the contents into memory, then prepare the required string as you write it back to the file. (A seek back to the zeroth byte, SEEK_SET, is required before writing.)
#!/usr/bin/perl
open(INFILE, "+<in.txt") or die "cannot open in.txt: $!";
@a = <INFILE>;
seek INFILE, 0, SEEK_SET;
foreach $i (@a) {
    chomp $i;
    print INFILE "<" . $i . ">" . "\n";
}
close INFILE;
If you are worried about the amount of data being read into memory, you will have to create a temporary result file and finally copy the result file over the original.
You could use Tie::File for easy random access to the lines in your file:
use strict;
use warnings;
use Tie::File;

my $filename = "out.txt";
my @array;
tie @array, 'Tie::File', $filename or die "can't tie file \"$filename\": $!";

for my $line (@array) {
    $line = "<$line>";
    # or $line =~ s/^(.*)$/<$1>/g; # -- whatever modifications you need to do
}

untie @array;
Disclaimer: Of course, this option is only viable if the file is not shared with other processes. Otherwise you could use flock to prevent shared access while you modify the file.
Disclaimer-2 (thanks to ikegami): Don't use this solution if you have to edit big files and are concerned about performance. Most of the performance loss is mitigated for small files (less than 2MB, though this is configurable using the memory arg).
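Following up on the flock note: here is a minimal sketch of locking through Tie::File's own flock method (documented in Tie::File), assuming exclusive access is wanted for the duration of the edit.
use Fcntl qw(LOCK_EX);
use Tie::File;

my $o = tie my @array, 'Tie::File', $filename or die "can't tie: $!";
$o->flock(LOCK_EX);       # lock the underlying file
$_ = "<$_>" for @array;   # make the modifications
undef $o;                 # drop the object reference before untying
untie @array;             # flush and unlock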
One option is to open the file twice: open it once read-only, read the data, close it, process the data, open it again for writing (truncating, not appending), write the data, and close it. This is good practice because it minimizes the time you have the file open, in case someone else needs it.
If you only want to open it once, then you can use the +< open mode; just use a seek call between reading and writing to return to the beginning of the file. Otherwise, when you finish reading you are at the end of the file, and writing starts there, which is why you get the behavior you're seeing.
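A minimal sketch of that single-open approach, assuming the file is small enough to slurp; the truncate at the end makes sure no leftover bytes survive if the new contents are ever shorter than the old:
open my $fh, '+<', 'out.txt' or die "cannot open out.txt: $!";
my @lines = <$fh>;           # read everything first
seek $fh, 0, 0 or die "cannot seek: $!";
for my $line (@lines) {
    chomp $line;
    print $fh "<$line>\n";
}
truncate $fh, tell $fh or die "cannot truncate: $!";
close $fh or die "cannot close: $!";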
Need to specify
use Fcntl qw(SEEK_SET);
in order to use
seek INFILE, 0, SEEK_SET;
Thanks user1703205 for the example.
I'm trying to read a binary file with the following code:
open(F, "<$file") || die "Can't read $file: $!\n";
binmode(F);
$data = <F>;
close F;
open (D,">debug.txt");
binmode(D);
print D $data;
close D;
The input file is 16M; debug.txt is only about 400k. When I look at debug.txt in Emacs, the last two chars are ^A^C (SOH and ETX, according to Notepad++), although that same pattern also occurs earlier in debug.txt. The next line of the input file does have a ^O (SI) char, and I think that's the first occurrence of that particular character.
How can I read in this entire file?
If you really want to read the whole file at once, use slurp mode. Slurp mode can be turned on by setting $/ (which is the input record separator) to undef. This is best done in a separate block so you don't mess up $/ for other code.
my $data;
{
open my $input_handle, '<', $file or die "Cannot open $file for reading: $!\n";
binmode $input_handle;
local $/;
$data = <$input_handle>;
close $input_handle;
}
open my $output_handle, '>', 'debug.txt' or die "Cannot open debug.txt for writing: $!\n";
binmode $output_handle;
print {$output_handle} $data;
close $output_handle;
Use my $data for a lexical and our $data for a global variable.
TIMTOWTDI.
File::Slurp is the shortest way to express what you want to achieve. It also has built-in error checking.
use File::Slurp qw(read_file write_file);
my $data = read_file($file, binmode => ':raw');
write_file('debug.txt', {binmode => ':raw'}, $data);
The IO::File API solves the global variable $/ problem in a more elegant fashion.
use IO::File qw();
my $data;
{
my $input_handle = IO::File->new($file, 'r') or die "could not open $file for reading: $!";
$input_handle->binmode;
$input_handle->input_record_separator(undef);
$data = $input_handle->getline;
}
{
my $output_handle = IO::File->new('debug.txt', 'w') or die "could not open debug.txt for writing: $!";
$output_handle->binmode;
$output_handle->print($data);
}
I don't think this is about using slurp mode or not, but about correctly handling binary files. Instead of
$data = <F>;
you should do
read(F, $buffer, 1024);
This will read only 1024 bytes, so you have to increase the buffer size or read the whole file part by part using a loop.
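A minimal sketch of the loop variant, assuming $file holds the path; read returns the number of bytes read, 0 at end-of-file, and undef on error:
open my $fh, '<', $file or die "Can't read $file: $!\n";
binmode $fh;
my $data = '';
while (1) {
    my $n = read($fh, my $buffer, 64 * 1024);  # 64 KB chunks
    die "read failed: $!" unless defined $n;
    last if $n == 0;                           # end of file
    $data .= $buffer;
}
close $fh;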
open(my $fh, '>', $path) || die $!;
my_sub($fh);
Can my_sub() somehow extrapolate $path from $fh?
A filehandle might not even be connected to a file but instead to a network socket or a pipe hooked to the standard output of a child process.
If you want to associate handles with paths your code opens, use a hash and the fileno operator, e.g.,
my %fileno2path;

sub myopen {
    my ($path) = @_;
    open my $fh, "<", $path or die "$0: open: $!";
    $fileno2path{fileno $fh} = $path;
    $fh;
}

sub myclose {
    my ($fh) = @_;
    delete $fileno2path{fileno $fh};
    close $fh or warn "$0: close: $!";
}

sub path {
    my ($fh) = @_;
    $fileno2path{fileno $fh};
}
For anyone looking for a better way to find the file name from a filehandle or file descriptor: I would prefer find -inum, if available.
Or, how about the following approach? Its only drawback is that it relies on /proc, so it works only on Linux and Unix-likes that provide /proc/PID/fd:
my $filename='/tmp/tmp.txt';
open my $fh, '>', $filename;
my $fd = fileno $fh;
print readlink("/proc/$$/fd/$fd");
You can call stat or IO::Handle::stat on a filehandle -- that will give you the device and inode of the file that you have opened. With that and a little operating system wizardry you can find the filename. OK, maybe a lot of operating system wizardry.
The find command has an -inum option to find a file with a specified inode number. This is probably not going to be as efficient as caching the path when you open the file, as gbacon recommends.
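A rough sketch of that idea; the search root (/var/log here) is a placeholder assumption, since find has to be pointed somewhere, and the file must still have a link under that root:
# Get the device and inode from the open handle, then ask find(1) for a
# path with that inode; -xdev keeps find on the same filesystem.
my ($dev, $ino) = stat $fh;
chomp(my $path = `find /var/log -xdev -inum $ino 2>/dev/null`);
print "path: $path\n" if $path;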