perl: open filehandle, write into it, give it a name later on?

I think I've read how to do this somewhere, but I can't find where. Maybe it's only possible in new(ish) versions of Perl; I am using 5.14.2.
I have a Perl script that writes results to a file if certain criteria are met. Given the structure of the script, it is more logical to write the results first and check later whether the criteria for saving them to a file are met.
I think I've read somewhere that I can write content into a filehandle (which on Linux, I guess, would correspond to a temporary file or a pipe of some sort) and then give that file a name, including the directory where it should go, later on. If I don't, the content is discarded when the script finishes.
Other than faffing around with temporary files and deleting them manually, is there a straightforward way of doing this in Perl?

There's no simple (UNIX) facility for what you describe, but the behavior can be composed out of basic system operations. Perl's File::Temp already does most of what you want:
use File::Temp;
my $tmp = File::Temp->new; # Will be unlinked at end of program.
while ($work_to_do) {
    print $tmp a_lot_of_stuff(); # $tmp is a filehandle
}
close $tmp or die "close failed: $!"; # flush buffered output first
if ($save_it) {
    rename("$tmp", $new_file) # $tmp is also a string. Move (rename) the file.
        or die "rename failed: $!";
} # If you need this to work across filesystems, you
  # might want to "use File::Copy qw(move)" instead.
exit; # $tmp will be unlinked here if it was not renamed

I use File::Temp for this.
But keep in mind that File::Temp deletes the file by default. That is usually fine, but when debugging I don't want it: if the script terminates and the output is not what I expected, I cannot inspect the temp file.
So I prefer to set $File::Temp::KEEP_ALL = 1, or call $fh->unlink_on_destroy(0) when using the OO interface, or use ($fh, $filename) = tempfile($template, UNLINK => 0); with the functional interface, and then unlink the file myself or move it to a proper place.
It is safer to move the file after closing the filehandle (in case some output is still buffered). So I prefer an approach where the temp file is not deleted by default, and when all is done, a conditional either deletes it or moves it to the desired place and name.
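A minimal sketch of that approach, assuming hypothetical $save_it and $debug flags and a hypothetical destination results.txt: the file is created with UNLINK => 0, closed before anything else happens, and then either moved, kept for debugging, or removed by hand.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Copy qw(move);

# Hypothetical flags and destination: set $save_it when the results should
# be kept, and $debug to leave the temp file behind for inspection.
my $save_it  = 1;
my $debug    = 0;
my $new_file = 'results.txt';

# UNLINK => 0: File::Temp will not delete the file for us.
my ( $fh, $filename ) = tempfile( 'resultsXXXXXX', TMPDIR => 1, UNLINK => 0 );
print {$fh} "result line $_\n" for 1 .. 3;

# Close first so any buffered output is written to the file.
close($fh) or die "Cannot close $filename: $!";

if ($save_it) {
    # move() falls back to copy-and-delete across filesystems.
    move( $filename, $new_file ) or die "Cannot move $filename to $new_file: $!";
}
elsif ($debug) {
    warn "Debug: kept temporary file $filename\n";
}
else {
    unlink($filename) or die "Cannot unlink $filename: $!";
}
```

Using File::Copy's move rather than rename keeps this working when the temp directory and the destination are on different filesystems.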

Related

How to pipe to and read from the same tempfile handle without race conditions?

I was debugging a Perl script for the first time in my life and came across this:
$my_temp_file = File::Temp->tmpnam();
system("cmd $blah | cmd2 > $my_temp_file");
open(FIL, "$my_temp_file");
...
unlink $my_temp_file;
This works pretty much as I want, except for the obvious race conditions in lines 1-3. Even with a proper tempfile() call there is no way (that I can think of) to ensure that the file streamed to in line 2 is the same one opened in line 3. One solution might be pipes, but errors in cmd might surface late because of limited pipe buffering, and that would complicate my error handling (I think).
How do I:
Write all output from cmd $blah | cmd2 into a tempfile opened file handle?
Read the output without re-opening the file (risking race condition)?
You can open a pipe to a command and read its contents directly with no intermediate file:
open my $fh, '-|', 'cmd', $blah or die "Cannot run cmd: $!";
while( <$fh> ) {
...
}
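The list form of open above runs a single command. For the full cmd $blah | cmd2 pipeline from the question, one option is to hand sh a fixed script, pass $blah as a positional parameter so the shell never re-parses it, and check the status when closing the handle. A sketch, with printf and sort standing in (hypothetically) for cmd and cmd2:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins: printf plays "cmd" and sort plays "cmd2".
# $blah deliberately contains shell metacharacters.
my $blah = 'b; echo injected';

# The pipeline is a fixed sh script; $blah arrives as positional
# parameter $1, so the shell never re-parses its contents.
open( my $fh, '-|', 'sh', '-c', 'printf "%s\n" "$1" a | sort', 'sh', $blah )
    or die "Cannot start pipeline: $!";

my @lines = <$fh>;

# close() waits for the child and sets $?. In sh, a pipeline's status is
# that of its last command (sort here), so a failure in an earlier stage
# is not reported this way.
close($fh)
    or die $! ? "Error closing pipe: $!" : "Pipeline exited with status " . ( $? >> 8 );

print @lines;
```

Because the output is read only after the pipeline finishes, a late failure still shows up at close() rather than being lost, which addresses the error-handling concern in the question.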
With short output, backticks might do the job, although in this case you have to be more careful to scrub the inputs so they aren't misinterpreted by the shell:
my $output = `cmd $blah`;
There are various modules on CPAN that handle this sort of thing, too.
Some comments on temporary files
The comments mentioned race conditions, so I thought I'd write a few things for those wondering what people are talking about.
In the original code, Andreas uses File::Temp, a module from the Perl Standard Library. However, they use the tmpnam POSIX-like call, which has this caveat in the docs:
Implementations of mktemp(), tmpnam(), and tempnam() are provided, but should be used with caution since they return only a filename that was valid when function was called, so cannot guarantee that the file will not exist by the time the caller opens the filename.
This is discouraged, and tmpnam was removed from Perl's POSIX module in v5.22.
That is, you get back the name of a file that does not exist yet. After you get the name, you don't know whether another program has created a file with that name. And the later unlink can cause problems for one of the programs.
The "race condition" comes in when two programs that probably don't know about each other try to do the same thing at roughly the same time. Your program tries to make a temporary file named "foo", and so does some other program. They both might see at the same time that a file named "foo" does not exist, then try to create it. They both might succeed, and as they both write to it, they might interleave or overwrite each other's output. Then one of those programs thinks it is done and calls unlink. Now the other program wonders what happened.
In the malicious exploit case, some bad actor knows a temporary file will show up, so it recognizes a new file and gets in there to read or write data.
But this can also happen within the same program: two or more instances of it run at the same time and try to do the same thing. With randomized filenames, it is probably exceedingly rare that two running programs will choose the same name at the same time. However, we don't care how rare something is; we care how devastating the consequences are should it happen. And rare is much more frequent than never.
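File::Temp avoids this race by creating the file with the O_CREAT and O_EXCL flags, so the creation itself fails if the name already exists. A small sketch of that mechanism using sysopen directly, with a hypothetical file name playing the part of a name that two programs both received:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
use File::Spec;

# Hypothetical fixed name, the kind tmpnam() might hand to two programs.
my $name = File::Spec->catfile( File::Spec->tmpdir, "race-demo-$$.tmp" );

# "Program one" creates the file. O_EXCL means: fail if it already exists.
sysopen( my $first, $name, O_WRONLY | O_CREAT | O_EXCL, 0600 )
    or die "First create of $name failed: $!";

# "Program two" tries the same name with the same flags. Instead of
# silently sharing the file, the open fails with EEXIST.
my $second_ok  = sysopen( my $second, $name, O_WRONLY | O_CREAT | O_EXCL, 0600 );
my $second_err = $!{EEXIST} ? 'EEXIST' : "$!";

print $second_ok ? "second open succeeded\n" : "second open failed: $second_err\n";

close($first);
unlink($name) or die "Cannot remove $name: $!";
```

The check for existence and the creation happen in one system call, so there is no window between "the name is free" and "I own the file".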
File::Temp
Knowing all that, File::Temp handles the details of creating the file and handing you a filehandle to it in one step:
use File::Temp qw(tempfile);
my( $fh, $name ) = tempfile();
This uses a default template to create the name. The functional tempfile() leaves the file in place unless you pass UNLINK => 1 (which removes it at program exit), but the object interface cleans up the mess when the object goes out of scope:
{
my $fh = File::Temp->new;
print $fh ...;
...;
} # file cleaned up
Some systems might automatically clean up temp files, although I haven't cared about that in years. Typically it was a batch job (say, once a week).
I often go one step further by giving my temporary filenames a template, where the Xs are literal characters the module recognizes and fills in with randomized characters:
my( $fh, $name ) = File::Temp::tempfile(
sprintf "$0-%d-XXXXXX", time );
I'm often doing this while I'm developing things so I can watch the program make the files (and in which order) and see what's in them. In production I probably want to obscure the source program name ($0) and the time; I don't want to make it easier to guess who's making which file.
A scratchpad
I can also open a temporary file with open by not giving it a filename. This is useful when you want to collect data that you don't need outside the program. Opening it read-write means you can output some stuff and then move around in that file (we show a fixed-length record example in Learning Perl):
open(my $tmp, "+>", undef) or die ...
print $tmp "Some stuff\n";
seek $tmp, 0, 0;
my $line = <$tmp>;
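To make the scratchpad idea concrete, here is a small self-contained sketch (not the Learning Perl example) that stores hypothetical 10-byte fixed-length records in an anonymous temporary file and then seeks straight to one of them:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read-write handle to a newly created, empty anonymous temporary file.
open( my $tmp, '+>', undef ) or die "Could not open anonymous temp file: $!";

# Hypothetical fixed-length records: each is exactly 10 bytes.
my $reclen = 10;
for my $word (qw(alpha bravo charlie delta)) {
    print {$tmp} pack( "A$reclen", $word );    # pad with spaces to 10 bytes
}

# Jump straight to record 2 (0-based) and read exactly one record.
seek( $tmp, 2 * $reclen, 0 ) or die "seek failed: $!";
read( $tmp, my $record, $reclen ) == $reclen or die "short read";
my ($value) = unpack( "A$reclen", $record );    # "A" strips trailing spaces

print "$value\n";
```

Because every record has the same length, record N starts at byte N * $reclen, so no scanning is needed; the seek also switches the handle from writing to reading.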
File::Temp opens the temp file in O_RDWR mode, so all you have to do is use that one filehandle for both reading and writing, even when an external program does the writing. The returned filehandle is overloaded so that it stringifies to the temp file name, so you can pass that to the external program. If that is dangerous for your purpose, you can get the fileno() and redirect to /dev/fd/<fileno> instead (Perl marks descriptors above $^F, normally 2, as close-on-exec, so the child only inherits that descriptor if you raise $^F before opening the file or clear the flag).
All you have to do is mind your seeks and tells. :-) Just remember to always set autoflush!
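use strict;
use warnings;
use File::Temp;
use Data::Dump;    # CPAN module, used only for pretty-printing
my $fh = File::Temp->new;
$fh->autoflush;
system("ls /tmp/*.txt >> $fh") == 0 or die "ls failed: $?";
my @lines = <$fh>;    # our handle is still at offset 0
printf "%s\n\n", Data::Dump::pp(\@lines);
seek $fh, 0, 2 or die $!;    # go to the end before writing
print $fh "How now brown cow\n";
seek $fh, 0, 0 or die $!;
my @lines2 = <$fh>;
printf "%s\n", Data::Dump::pp(\@lines2);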
Which prints
[
"/tmp/cpan_htmlconvert_DPzx.txt\n",
"/tmp/cpan_htmlconvert_DunL.txt\n",
"/tmp/cpan_install_HfUe.txt\n",
"/tmp/cpan_install_XbD6.txt\n",
"/tmp/cpan_install_yzs9.txt\n",
]
[
"/tmp/cpan_htmlconvert_DPzx.txt\n",
"/tmp/cpan_htmlconvert_DunL.txt\n",
"/tmp/cpan_install_HfUe.txt\n",
"/tmp/cpan_install_XbD6.txt\n",
"/tmp/cpan_install_yzs9.txt\n",
"How now brown cow\n",
]
HTH

Accessing a file in perl

In my script I open files and write to them. I found that something is wrong with a file I try to open: the file exists, it is not empty, and I am passing the right path to the filehandle.
I know my question might sound weird, but while I was debugging my code I put the following command in my script to check some files:
system ("ls");
With that line in place my script works; when I remove it, the script no longer works correctly.
my @unique = ("test1","test2");
open(unique_fh,">orfs");
print unique_fh @unique ;
open(ORF,"orfs")or die ("file doesnot exist");
system ("ls");
while(<ORF>){
split ;
}
@neworfs=@_ ;
print @neworfs ;
Perl buffers the output when you print to a file. In other words, it doesn't actually write to the file every time you say print; it saves up a bunch of data and writes it all at once. This is faster.
In your case, you couldn't see anything you had written to the file, because Perl hadn't written anything yet. Adding the system("ls") call, however, caused Perl to write your output first: before any operation that may fork, such as system(), Perl (since v5.6.0) attempts to flush all files opened for output, in case the new process wants to use the file you just created.
How do you get around this? You can close the file before you open it again to read it, as choroba suggested. Or you can disable buffering for that file. Put this code just after you open the file:
my $fh = select (unique_fh);
$|=1;
select ($fh);
Then anytime you print to the file, it will get written immediately ($| is a special variable that sets the output buffering behavior).
Closing the file first is probably a better idea, although it is possible to have a filehandle for reading and writing open at the same time.
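A minimal corrected sketch of the question's script, assuming the intent was to read back the words written to orfs. It writes one name per line (an assumption; the original printed them with no separator), closes the output handle before reopening the file, and collects the fields explicitly rather than relying on split to fill @_:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @unique = ( "test1", "test2" );

# Write one name per line, then close so Perl's buffer is flushed to disk.
open( my $out, '>', 'orfs' ) or die "Cannot write orfs: $!";
print {$out} "$_\n" for @unique;
close($out) or die "Cannot close orfs: $!";

# Reopen for reading; everything written above is now on disk.
open( my $in, '<', 'orfs' ) or die "Cannot read orfs: $!";
my @neworfs;
while ( my $line = <$in> ) {
    push @neworfs, split ' ', $line;    # split on whitespace
}
close($in);

print "@neworfs\n";
```

Lexical filehandles and the three-argument open are not required for the fix, but they avoid the bareword-handle and mode-in-filename pitfalls of the original.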
You did not close the filehandle before trying to read from the same file.

Perl Subdirectory Traversal

I am writing a script that goes through our large directory of Perl Scripts and checks for certain things. Right now, it takes two kinds of input: the main directory, or a single file. If a single file is provided, it runs the main function on that file. If the main directory is provided, it runs the main function on every single .pm and .pl inside that directory (due to the recursive nature of the directory traversal).
How can I write it (or what package might help) so that I can also enter one of the seven subdirectories, and it will traverse ONLY that subdirectory (instead of the entire thing)?
I can't really see the difference in processing between the two directory arguments. Surely, using File::Find will just do the right thing in both instances.
Something like this...
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
my $input = shift;
if (-f $input) {
# Handle a single file
handle_a_file($input);
} else {
# Handle a directory
handle_a_directory($input);
}
sub handle_a_file {
my $file = shift;
# Do whatever you need to with a single file
}
sub handle_a_directory {
my $dir = shift;
find(\&do_this, $dir);
}
sub do_this {
return unless -f;
return unless /\.p[ml]$/;
handle_a_file($File::Find::name);
}
One convenient way would be to use the excellent Path::Class module, more precisely the traverse() method of Path::Class::Dir. You control what to process from within the callback function, which is supplied as the first argument to traverse(). The manpage has sample snippets.
Using the built-ins like opendir is perfectly fine, of course.
I've just turned to using Path::Class almost everywhere, though, as it has so many nice convenience methods and simply feels right. Be sure to read the docs for Path::Class::File to know what's available. Really does the job 99% of the time.
If you know exactly which directory and subdirectories you want to look at, you can use glob("$dir/*/*/*.txt"), for example, to get every .txt file in the 3rd level of the given $dir.
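A sketch of the same glob idea applied to the question's .pm and .pl files, using brace expansion; it builds a small throwaway directory tree (hypothetical names) so it runs on its own:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Build a small throwaway tree so the sketch runs anywhere.
my $dir = tempdir( CLEANUP => 1 );
make_path( "$dir/lib/App", "$dir/bin/tools" );
for my $file ( "$dir/lib/App/Foo.pm", "$dir/bin/tools/run.pl", "$dir/lib/App/notes.txt" ) {
    open( my $fh, '>', $file ) or die "Cannot create $file: $!";
    close($fh);
}

# Match files exactly two directory levels below $dir. {pm,pl} is brace
# expansion, which Perl's built-in glob supports. Note that glob splits
# its pattern on whitespace, so this assumes $dir contains no spaces.
my @scripts = sort glob("$dir/*/*/*.{pm,pl}");
print "$_\n" for @scripts;
```

Unlike File::Find, a glob pattern only looks at the fixed depth you spell out, which is handy when the layout is known in advance.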

How can I generate random unique temp file names?

I am trying to create a temp file using the following code:
use File::Temp ;
$tmp = File::Temp->new( TEMPLATE => 'tempXXXXX',
DIR => 'mydir',
SUFFIX => '.dat');
This creates the temp file. Because of a permissions issue, the other program is not able to write to the file.
So I just want to generate the file name without creating the file. Is there any way to do that?
If you don't create the file at the same time you create the name, then it is possible for a file with the same name to be created before you create the file manually. If you need to have a different process open the file, simply close it first:
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp;
sub get_temp_filename {
my $fh = File::Temp->new(
TEMPLATE => 'tempXXXXX',
DIR => 'mydir',
SUFFIX => '.dat',
);
return $fh->filename;
}
my $filename = get_temp_filename();
open my $fh, ">", $filename
or die "could not open $filename: $!";
The best way to handle the permissions problem is to make sure the users that run the two programs are both in the same group. You can then use chmod to change the permissions inside the first program to allow the second program (or any user in that group) to modify the file:
my $filename = get_temp_filename();
chmod 0660, $filename;
Just to obtain the name of the tempfile you can do:
#!/usr/bin/perl
use strict;
use warnings;
use 5.10.1;
use File::Temp qw/tempfile/;
my $file;
(undef, $file) = tempfile('tmpXXXXXX', OPEN=>0);
say $file;
But as Chas. Owens said, be careful: a file with the same name could be created before you use it.
The get_temp_filename function proposed by Chas. Owens uses a local filehandle object ($fh), which is destroyed when the function returns, and that deletes the temp file it created.
To avoid this, and therefore keep the file (less risk), add:
UNLINK => 0
to the new method arguments, so the file is not unlinked when the object is destroyed.
Actually, I agree with Chas. Owens - the design is fatally flawed.
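Putting UNLINK => 0 together with the chmod advice gives something like the following sketch. It assumes, as suggested earlier, that both users share a group, and reuses the template, directory, and suffix from the question:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp;

# 'mydir' comes from the question; create it if it is missing.
-d 'mydir' or mkdir('mydir') or die "Cannot create mydir: $!";

my $tmp = File::Temp->new(
    TEMPLATE => 'tempXXXXX',
    DIR      => 'mydir',
    SUFFIX   => '.dat',
    UNLINK   => 0,    # keep the file when $tmp is destroyed
);
my $filename = $tmp->filename;

# Assumes both users share a group: give that group read/write access.
chmod( 0660, $filename ) or die "Cannot chmod $filename: $!";
close($tmp) or die "Cannot close $filename: $!";

print "$filename\n";    # pass this name to the other program
```

The file is created atomically with a unique name, so there is no race; the only thing handed to the other program is a name that already belongs to an existing file.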
It really feels like you need to fix the design, so:
If you have control of the 2nd program, have that program create the filename and the file, and pass the filename to the 1st program.
But, if the 2nd program isn't something you wrote and so you cannot modify it then I'd recommend one of the following:
1 - Use the first process's PID as part of the file name in an attempt to minimize the risk of duplicate filenames.
2 - Have the 2nd program pipe its output to the 1st program, and don't bother with a file at all. Personally, I think this is a much better solution than 1.
3 - Wrap the 2nd program in a script (shell, perl, whatever) which creates the name and the file and passes that to both programs.

How do I delete a random value from an array in Perl?

I'm learning Perl and building an application that gets a random line from a file using this code:
use List::Util qw(shuffle);
open(my $random_name, "<", "out.txt") or die "Cannot open out.txt: $!";
my @array = shuffle(<$random_name>);
chomp @array;
close($random_name) or die "Error when trying to close out.txt: $!";
print shift @array;
But now I want to delete this random name from the file. How can I do this?
shift already deletes a name from the array.
So does pop (shift removes from the beginning, pop from the end). I would suggest using pop, as it may be more efficient, and since the array is shuffled, you don't care which one you use.
Or do you need to delete it from a file?
If that's the case, you need to:
A. Get a count of the names in the file (if it is small, read it all into memory using File::Slurp; if it is large, either read it line by line and count, or simply run the wc -l $filename command via backticks).
B. Generate a random number from 1 to the number of lines (say, $random_line_number).
C. Read the file line by line. Write every line you read to another temp file (use File::Temp to generate temp files), except do NOT write line number $random_line_number.
D. Close the temp file and move it over your original file.
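Steps A to D above can be sketched as follows. It counts lines in Perl instead of calling wc, creates demo data only if out.txt is missing, and places the temp file next to the original so the final rename stays on one filesystem:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp;
use File::Basename qw(dirname);

my $file = 'out.txt';    # file name from the question

# Demo data only, so the sketch runs on its own; skipped if out.txt exists.
unless ( -e $file ) {
    open( my $demo, '>', $file ) or die "Cannot create $file: $!";
    print {$demo} "$_\n" for qw(alice bob carol);
    close($demo) or die "Cannot close $file: $!";
}

# A. Count the lines with one streaming pass.
open( my $in, '<', $file ) or die "Cannot read $file: $!";
my $count = 0;
$count++ while <$in>;
die "$file is empty\n" unless $count;

# B. Pick a random line number from 1 to $count.
my $random_line_number = 1 + int rand $count;

# C. Copy every line except that one into a temp file in the same
#    directory, so the final rename stays on one filesystem.
seek( $in, 0, 0 ) or die "Cannot rewind $file: $!";
my $tmp = File::Temp->new( DIR => dirname($file) );
my ( $line_no, $removed ) = ( 0, undef );
while ( my $line = <$in> ) {
    $line_no++;
    if ( $line_no == $random_line_number ) {
        $removed = $line;
        next;
    }
    print {$tmp} $line;
}
close($in);
close($tmp) or die "Cannot close temp file: $!";

# D. Move the temp file over the original. The replacement keeps
#    File::Temp's restrictive permissions (0600), not the original's.
rename( $tmp->filename, $file ) or die "Cannot replace $file: $!";
$tmp->unlink_on_destroy(0);    # the temp name no longer exists

chomp $removed;
print "Removed: $removed\n";
```

Writing to a separate file and renaming it at the end means the original is never left half-rewritten if the script dies partway through.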
If the list contains filenames and you need to delete the file itself (the random file), use the unlink() function. Don't forget to check the return value of unlink() and, as with any IO operation, print an error message containing $!, which holds the text of the system error on failure.
Done.
D.
When you say "delete this … from the list", do you mean delete it from the file? If you simply mean remove it from @array, then you've already done that by using shift. If you want it removed from the file, and the order doesn't matter, simply write the remaining names in @array back into the file. If the file order does matter, you're going to have to do something slightly more complicated, such as reopen the file, read the items in, in order, except for the one you don't want, and then write them all back out again. Either that, or take more notice of the order when you read the file.
If you need to delete a line from a file (it's not entirely clear from your question), one of the simplest and most efficient ways is to use Tie::File to manipulate a file as if it were an array. Otherwise, perlfaq5 explains how to do it the long way.
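A minimal Tie::File sketch, assuming out.txt holds one name per line (demo data is created only if the file is missing). Removing an element from the tied array removes that line from the file:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;

my $file = 'out.txt';    # file name from the question

# Demo data only, so the sketch runs on its own; skipped if out.txt exists.
unless ( -e $file ) {
    open( my $demo, '>', $file ) or die "Cannot create $file: $!";
    print {$demo} "$_\n" for qw(alice bob carol);
    close($demo) or die "Cannot close $file: $!";
}

# Each element of @lines is one line of the file, without its newline.
tie my @lines, 'Tie::File', $file or die "Cannot tie $file: $!";
die "$file is empty\n" unless @lines;

# Removing an element with splice rewrites the file on disk.
my $index = int rand @lines;
my ($removed) = splice( @lines, $index, 1 );
untie @lines;

print "Removed: $removed\n";
```

Tie::File does the read-copy-rewrite bookkeeping for you, so picking and deleting the random line is just array manipulation.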