Shrinking file that is opened in read/write - perl

In perl:
I have a file opened in read/write, with an exclusive lock.
use Fcntl qw(:flock);   # provides the LOCK_EX constant
open( $f, "+< $filename" );
flock( $f, LOCK_EX );
If I write more data to the file than it previously held, the file will grow.
If I write less data, my new contents are at the beginning, but the old contents are still there at the end of the file.
This isn't surprising, however it's not what I want.
Is there a simple way to shrink the file while it is opened in read/write? Basically I want to tell it to end the file at exactly this byte position.
I know I can open it differently, and I'm considering doing that, but a one line fix would be nice.

truncate

I actually don't know about perl, but since ftruncate(2) would be the C function, maybe this helps?
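Perl's built-in truncate does exactly that and works on an open filehandle. A minimal sketch using the handle from the question ($new_contents is a hypothetical variable holding the replacement data):

use Fcntl qw(:flock);

open( my $f, '+<', $filename ) or die "Cannot open '$filename': $!";
flock( $f, LOCK_EX )           or die "Cannot lock '$filename': $!";

seek( $f, 0, 0 );              # rewind (0 = SEEK_SET) before rewriting
print {$f} $new_contents;      # write the new, possibly shorter, data
truncate( $f, tell($f) )       # end the file at exactly this byte position
    or die "Cannot truncate '$filename': $!";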

Related

Faster way to read a file line by line in perl

I have to read a big Unix file line by line using Perl. The script takes more than 2 minutes to run for a big file but much less time for a small file.
I am using the following code:
open(FILE, "filename");
while (<FILE>) {
}
Please let me know a way to parse the file faster.
What do you mean by a "big file" and a "small file"? What size are those files? How many lines do they have?
Unless your big file is absolutely huge, it seems likely that what is slowing your program down is not the reading from a file, but whatever you're doing in the while loop. To prove me wrong, you'd just need to run your program with nothing in the while loop to see how long that takes.
Assuming that I'm right, then you need to work out what section of your processing is causing the problems. Without seeing that code we, obviously, can't be any help there. But that's where a tool like Devel::NYTProf would be useful.
I'm not sure where you learned your Perl from, but the idiom you're using to open your file is rather outdated. These days we would a) use lexical variables as filehandles, b) use the 3-argument version of open() and c) always check the return value from open() and take appropriate action.
open(my $fh, '<', 'filename')
    or die "Cannot open 'filename': $!\n";

while (<$fh>) {
    ...
}
If you have the memory: @array = <$fh>, then loop through the array.
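A minimal sketch of that slurping approach (only sensible if the file comfortably fits in memory):

open(my $fh, '<', 'filename')
    or die "Cannot open 'filename': $!\n";
my @lines = <$fh>;    # read every line into the array in one go
close $fh;

for my $line (@lines) {
    chomp $line;
    # process $line here
}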

Perl File::Copy isn't working

I am trying to copy a file to a new filename using File::Copy but getting an error saying the file doesn't exist.
print "\nCopying $hash->{Filename1} to $hash->{Filename2}.\n"
copy( $hash->{Filename1}, $hash->{Filename2} ) or die "Unable to copy model. Copy failed: $!";
I have checked that both references are populated (by printing them) and that $hash->{Filename1} does actually exist - and it does.
My error message is this:
Unable to copy model. Copy failed: No such file or directory at B:\Script.pl line 467.
Anyone got any ideas of what I might have done wrong? I use this exact same line earlier in my script with no problems so I'm a bit confused.
Is there a file size limit on File::Copy?
Many thanks.
Filename1 may exist but what about Filename2?
Your error message states "No such file or directory at ..." so I'd be investigating the possibility that the directory you're trying to copy the file to is somehow deficient.
You may also want to check permissions if the destination directory and file do exist.
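For example, a quick sanity check along those lines before calling copy() (a sketch; File::Basename is a core module, and the hash keys are taken from the question):

use File::Basename qw(dirname);

die "Source '$hash->{Filename1}' does not exist\n"
    unless -e $hash->{Filename1};

my $dest_dir = dirname( $hash->{Filename2} );
die "Destination directory '$dest_dir' does not exist\n"
    unless -d $dest_dir;
die "Destination directory '$dest_dir' is not writable\n"
    unless -w $dest_dir;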
First step is to print out both file names before attempting the copy so that you can see what they are, and investigate the problem from that viewpoint. You should also post those file names in your question so that we can help further. It may well be that there's a dodgy character in one of the filenames, such as a newline you forgot to chomp off.
Re your question on file size limits, I don't believe the module itself imposes one. If you don't provide a buffer size, it uses a maximum of 2GB for the chunks used for transferring data, but there's nothing in the module that restricts the overall size.
It may be that the underlying OS restricts it but, unless your file is truly massive or you're very low on disk space, that's not going to come into play. However, since you appear to be working from the b: drive, that may be a possibility you want to check. I wasn't even aware people used floppy disks any more :-)
Check that there is no extra whitespace or other hard to spot problems with your filename variables with:
use Data::Dumper;
$Data::Dumper::Useqq = 1;
print Dumper( { filename1 => $hash->{Filename1}, filename2 => $hash->{Filename2} } );

Accessing a file in perl

In my script I am dealing with opening files and writing to files. I found that there is something wrong with a file I try to open: the file exists, it is not empty, and I am passing the right path to the filehandle.
I know that my question might sound weird, but while I was debugging my code I put the following command in my script to check some files:
system ("ls");
Then my script worked well; when it's removed, it does not work correctly anymore.
my @unique = ("test1","test2");
open(unique_fh,">orfs");
print unique_fh @unique ;
open(ORF,"orfs") or die ("file doesnot exist");
system ("ls");
while(<ORF>){
    split ;
}
@neworfs = @_ ;
print @neworfs ;
Perl buffers the output when you print to a file. In other words, it doesn't actually write to the file every time you say print; it saves up a bunch of data and writes it all at once. This is faster.
In your case, you couldn't see anything you had written to the file, because Perl hadn't written anything yet. Adding the system("ls") call, however, caused Perl to write your output first (the interpreter is smart enough to do this, because it thinks you might want to use the system() call to do something with the file you just created).
How do you get around this? You can close the file before you open it again to read it, as choroba suggested. Or you can disable buffering for that file. Put this code just after you open the file:
my $fh = select (unique_fh);   # make unique_fh the currently selected handle
$| = 1;                        # turn on autoflush for the selected handle
select ($fh);                  # restore the previously selected handle
Then anytime you print to the file, it will get written immediately ($| is a special variable that sets the output buffering behavior).
Closing the file first is probably a better idea, although it is possible to have a filehandle for reading and writing open at the same time.
You did not close the filehandle before trying to read from the same file.
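For illustration, a minimal sketch of how the question's code could be restructured so the write handle is closed (and therefore flushed) before the file is read back:

my @unique = ("test1", "test2");

open( my $out_fh, '>', 'orfs' ) or die "Cannot write 'orfs': $!";
print {$out_fh} "$_\n" for @unique;
close $out_fh;                        # closing flushes the buffer to disk

open( my $in_fh, '<', 'orfs' ) or die "Cannot read 'orfs': $!";
my @neworfs;
while ( my $line = <$in_fh> ) {
    push @neworfs, split ' ', $line;
}
close $in_fh;

print "@neworfs\n";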

Perl: Clear contents of file and open file in append mode

I need to open a file in append mode in Perl, but all the old data must be deleted before the file is opened, so that only fresh data goes in.
I will be entering data line by line, so before entering the first line I need all previous data to be deleted.
Please help.
I think you are confused about what "append" means in perl. What you are describing is opening a file and truncating it, i.e.:
open my $fh, '>', $file;
This will delete the contents of $file and open a new file with the same name.
The reason to use open for appending is when you have a file that you do not wish to overwrite. I.e. the difference between > and >> is simply that the former truncates the existing file and begins writing at the start of the file, and the latter skips to the end of the existing file and starts writing there.
Documentation here
truncate
File handling modes include:
Read a file (<)
Write to a file / overwrite (>)
Append to a file (>>)
For a detailed explanation, see the documentation for open.
open(my $fh, '>', $filePath) or die "Cannot open '$filePath': $!";
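Putting the answers together, a minimal sketch (using $file from the first answer; @lines is hypothetical): open once with '>' to wipe the old contents and write the new data, and use '>>' only for later additions you do not want to lose.

# '>' truncates the file, so any previous data is deleted here.
open( my $fh, '>', $file ) or die "Cannot open '$file': $!";
print {$fh} "$_\n" for @lines;
close $fh;

# '>>' appends after the existing contents instead of wiping them.
open( my $append_fh, '>>', $file ) or die "Cannot open '$file': $!";
print {$append_fh} "one more line\n";
close $append_fh;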

How do I flush a file in Perl?

I have a Perl script which appends a new line to an existing file every 3 seconds. There is also a C++ application which reads from that file.
The problem is that the application only begins to read the file after the script is done and the file handle is closed. To avoid this I want to flush after each appended line. How can I do that?
Try:
use IO::Handle;
$fh->autoflush;
This was actually posted as a way of auto-flushing in an earlier question of mine, which asked about the universally accepted bad way of achieving this :-)
TL;DR: use IO::Handle and the flush method, e.g.:
use IO::Handle;
$myfile->flush();
First, you need to decide how "flushed" you want it. There can be quite a few layers of buffering:
Perl's internal buffer on the file handle. Other programs can't see data until it's left this buffer.
File-system level buffering of "dirty" file blocks. Other programs can still see these changes and they appear to be written, but they'll be lost if the OS or machine crashes.
Disk-level write-back buffering of writes. The OS thinks these are written to disk, but the disk is actually just storing them in volatile memory on the drive. If the OS crashes the data won't be lost, but if power fails it might be unless the disk can write it out first. This is a big problem with cheap consumer SSDs.
It gets even more complicated when SANs, remote file systems, RAID controllers, etc get involved. If you're writing via pipes there's also the pipe buffer to consider.
If you just want to flush the Perl buffer, you can close the file, print a string containing "\n" (since it appears that Perl flushes on newlines), or use IO::Handle's flush method.
You can also, per the perlfaq, use binmode or play with $| to make the file handle unbuffered. This is not the same thing as flushing a buffered handle, since queuing up a bunch of buffered writes and then doing a single flush has a much lower performance cost than writing to an unbuffered handle.
If you want to flush the file system write-back buffer you need to use a system call like fsync(), open your file with the O_DSYNC flag, or use one of the numerous other options. It's painfully complicated, as evidenced by the fact that PostgreSQL has its own tool just to test file syncing methods.
If you want to make sure it's really, truly, honestly on the hard drive in permanent storage you must flush it to the file system in your program. You also need to configure the hard drive/SSD/RAID controller/SAN/whatever to really flush when the OS asks it to. This can be surprisingly complicated to do and is quite OS/hardware specific. "plug-pull" testing is strongly recommended to make sure you've really got it right.
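In Perl terms, a minimal sketch of the first two layers ($logfile is hypothetical): IO::Handle's flush() empties Perl's own buffer, and sync() calls fsync(2) so the file system pushes its dirty blocks toward the device. The hardware layers below that still depend on how the drive and OS are configured.

use IO::Handle;

open( my $fh, '>>', $logfile ) or die "Cannot open '$logfile': $!";
print {$fh} "new line of data\n";

$fh->flush();   # empty Perl's buffer into the OS
$fh->sync();    # fsync(2): ask the OS to write its dirty blocks to the device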
From 'man perlfaq5':
$old_fh = select(OUTPUT_HANDLE);   # make OUTPUT_HANDLE the default output handle
$| = 1;                            # enable autoflush on it
select($old_fh);                   # restore the previous default handle
If you just want to flush stdout, you can probably just do:
$| = 1;
But check the FAQ for details on a module that gives you a nicer-to-use abstraction, like IO::Handle.
Here's the answer - the real answer.
Stop maintaining an open file handle for this file for the life of the process.
Start abstracting your file-append operation into a sub that opens the file in append mode, writes to it, and closes it.
# Appends a new line to the existing file
sub append_new_line {
    my $linedata = shift;
    open my $fh, '>>', $fnm or die $!;   # $fnm is file-lexical or something
    print $fh $linedata, "\n";           # Flavor to taste
    close $fh;
}
The process observing the file will encounter a closed file that gets modified whenever the function is called.
All of the solutions suggesting setting autoflush are ignoring the basic fact that most modern OSes buffer file I/O irrespective of what Perl is doing.
Your only possibility to force the data to be committed to disk is to close the file.
I'm trapped in the same dilemma at the moment, where we have an issue with rotation of the log being written.
To automatically flush the output, you can set autoflush/$| as described by others before you output to the filehandle.
If you've already output to the filehandle and need to ensure that it gets to the physical file, you need to use the IO::Handle flush and sync methods.
There's an article about this in the Perl documentation: How do I flush/unbuffer an output filehandle? Why must I do this?
Two solutions:
Unbuffer the output filehandle with $|.
Call the autoflush method if you are using IO::Handle or one of its subclasses.
An alternative approach would be to use a named pipe between your Perl script and C++ program, in lieu of the file you're currently using.
For those who are searching for a way to flush output line by line to a file from an Ansys CFD-Post Session File (*.cse), this is the only solution that worked for me:
! $file="Test.csv";
! open(OUT,"+>>$file");
! select(OUT);$|=1; # This is the important line
! for($i=0;$i<=10;$i++)
! {
! print out "$i\n";
! sleep(3);
! }
Note that you need the exclamation mark at the beginning of every line that contains Perl code. sleep(3); is included only for demonstration purposes. use IO::Handle; is not needed.
The genuinely correct answer is to use:
$|=1; # Make STDOUT immediate (non-buffered)
and although that is one cause of your problem, the other cause of the same problem is this: "Also, there is a C++ application which reads from that file."
It is EXTREMELY NON-TRIVIAL to write C++ code which can properly read from a file that is still growing, because your C++ program will encounter an EOF when it gets to the end (you cannot read past the end of a file without extra trickery). You have to do a pile of complicated work with I/O blocking and flags to properly monitor a file this way (like how the Linux "tail" command works).
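For reference, the usual trick for following a growing file looks like this in Perl (a sketch with a hypothetical file name): read until EOF, wait, then clear the handle's EOF state with a no-op seek and read again, which is essentially what tail -f does.

open( my $fh, '<', 'growing.log' ) or die "Cannot open 'growing.log': $!";

while (1) {
    while ( my $line = <$fh> ) {
        print "got: $line";
    }
    sleep 1;
    seek( $fh, 0, 1 );   # no-op seek (1 = SEEK_CUR) clears the EOF flag so new lines are seen
}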
I had the same problem, the only difference being that I was writing the same file over and over again with new content. This combination of $| = 1 and autoflush worked for me:
use IO::Handle;   # needed for the autoflush method call below
open (MYFILE, '>', '/internet/web-sites/trot/templates/xml_queries/test.xml');
$| = 1; # Before writing!
print MYFILE "$thisCardReadingContentTemplate\n\n";
close (MYFILE);
MYFILE->autoflush(1); # After writing!
Best of luck.
H