How can I generate random unique temp file names? - perl

I am trying to create a temp file using the following code:
use File::Temp ;
$tmp = File::Temp->new( TEMPLATE => 'tempXXXXX',
DIR => 'mydir',
SUFFIX => '.dat');
This creates the temp file. Because of a permissions issue, the other program is not able to write into the file.
So I just want to generate the file name without creating the file. Is there any way to do that?

If you don't create the file at the same time you create the name, then it is possible for a file with the same name to be created before you create the file manually. If you need to have a different process open the file, simply close it first:
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp;
sub get_temp_filename {
my $fh = File::Temp->new(
TEMPLATE => 'tempXXXXX',
DIR => 'mydir',
SUFFIX => '.dat',
);
return $fh->filename;
}
my $filename = get_temp_filename();
open my $fh, ">", $filename
or die "could not open $filename: $!";
The best way to handle the permissions problem is to make sure the users that run the two programs are both in the same group. You can then use chmod to change the permissions inside the first program to allow the second program (or any user in that group) to modify the file:
my $filename = get_temp_filename();
chmod 0660, $filename;

Just to obtain the name of the tempfile you can do:
#!/usr/bin/perl
use strict;
use warnings;
use 5.10.1;
use File::Temp qw/tempfile/;
my $file;
(undef, $file) = tempfile('tmpXXXXXX', OPEN=>0);
say $file;
But as Chas. Owens said, be careful: a file with the same name could be created before you use it.

The get_temp_filename function proposed by Chas. Owens uses a local filehandle object ($fh), which is destroyed when the function returns, and destroying that object deletes the temp file.
To avoid this, and therefore keep the file (less risk), add:
UNLINK => 0
to the new method's arguments, which prevents the file from being unlinked when the object is destroyed.
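A minimal runnable sketch of the function with that option added. It assumes a throwaway directory from tempdir() in place of 'mydir' so it can run anywhere, and it combines the chmod 0660 step suggested above; the data written is made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Stand-in for 'mydir': a scratch directory removed when the program exits.
my $dir = tempdir( CLEANUP => 1 );

sub get_temp_filename {
    my $tmp = File::Temp->new(
        TEMPLATE => 'tempXXXXX',
        DIR      => $dir,
        SUFFIX   => '.dat',
        UNLINK   => 0,    # keep the file when $tmp is destroyed
    );
    return $tmp->filename;
}

my $filename = get_temp_filename();    # $tmp is gone, the file is not

# Let the group write to it as well (File::Temp creates it as 0600).
chmod 0660, $filename or die "chmod $filename: $!";

# The name refers to an existing, empty file another program can open by name.
open my $fh, '>', $filename or die "could not open $filename: $!";
print {$fh} "some data\n";
close $fh or die "close $filename: $!";
```

Because the file already exists when the name is handed out, nothing else can claim that name in between; the trade-off is that you are now responsible for deleting it.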

Actually, I agree with Chas. Owens - the design is fatally flawed.
It really feels like you need to fix the design, so:
If you have control of the 2nd program, have that program create the filename and the file, and pass the filename to the 1st program.
But, if the 2nd program isn't something you wrote and so you cannot modify it then I'd recommend one of the following:
1 - Use the first process's PID as part of the file name in an attempt to minimize the risk of duplicate filenames.
2 - Have the 2nd program pipe its output to the 1st program, and don't bother with a file at all. Personally, I think this is a much better solution than 1.
3 - Wrap the 2nd program in a script (shell, perl, whatever) which creates the name and the file and passes that to both programs.

Related

How to pipe to and read from the same tempfile handle without race conditions?

Was debugging a perl script for the first time in my life and came over this:
$my_temp_file = File::Temp->tmpnam();
system("cmd $blah | cmd2 > $my_temp_file");
open(FIL, "$my_temp_file");
...
unlink $my_temp_file;
This works pretty much like I want, except the obvious race conditions in lines 1-3. Even if using proper tempfile() there is no way (I can think of) to ensure that the file streamed to at line 2 is the same opened at line 3. One solution might be pipes, but the errors during cmd might occur late because of limited pipe buffering, and that would complicate my error handling (I think).
How do I:
Write all output from cmd $blah | cmd2 into a tempfile opened file handle?
Read the output without re-opening the file (risking race condition)?
You can open a pipe to a command and read its contents directly with no intermediate file:
open my $fh, '-|', 'cmd', $blah or die "Can't run cmd: $!";
while( <$fh> ) {
...
}
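On the error-handling worry from the question: with a piped open you still learn whether the command failed, because close on a pipe waits for the child and puts its wait status in $?. A small sketch, assuming a Unix-like system (list-form pipe open) and using a perl one-liner via $^X as a hypothetical stand-in for cmd and $blah:

```perl
use strict;
use warnings;

# Hypothetical stand-in for 'cmd', $blah: prints three lines, then exits with status 2.
my @cmd = ( $^X, '-e', 'print "line $_\n" for 1..3; exit 2' );

open my $fh, '-|', @cmd or die "Can't start $cmd[0]: $!";
my @output;
while ( my $line = <$fh> ) {
    push @output, $line;
}

# close waits for the child; it returns false if the command exited non-zero,
# and $? holds the wait status (exit code in the high byte).
my $ok   = close $fh;
my $exit = $? >> 8;
warn "command exited with status $exit\n" unless $ok;
```

The failure only shows up after you have read the output, so if partial output is harmful, collect it first (as above) and act on it only when close succeeds.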
With short output, backticks might do the job, although in this case you have to be more careful to scrub the inputs so they aren't misinterpreted by the shell:
my $output = `cmd $blah`;
There are various modules on CPAN that handle this sort of thing, too.
Some comments on temporary files
The comments mentioned race conditions, so I thought I'd write a few things for those wondering what people are talking about.
In the original code, Andreas uses File::Temp, a module from the Perl Standard Library. However, they use the tmpnam POSIX-like call, which has this caveat in the docs:
Implementations of mktemp(), tmpnam(), and tempnam() are provided, but should be used with caution since they return only a filename that was valid when function was called, so cannot guarantee that the file will not exist by the time the caller opens the filename.
This interface is discouraged, and POSIX::tmpnam was deprecated in Perl v5.22 and removed in v5.26.
That is, you get back the name of a file that does not exist yet. After you get the name, you don't know if that filename was made by another program. And, that unlink later can cause problems for one of the programs.
The "race condition" comes in when two programs that probably don't know about each other try to do the same thing at roughly the same time. Your program tries to make a temporary file named "foo", and so does some other program. They both might see at the same time that a file named "foo" does not exist, then try to create it. They both might succeed, and as they both write to it, they might interleave or overwrite the other's output. Then, one of those programs thinks it is done and calls unlink. Now the other program wonders what happened.
In the malicious exploit case, some bad actor knows a temporary file will show up, so it recognizes a new file and gets in there to read or write data.
But this can also happen within the same program. Two or more versions of the same program run at the same time and try to do the same thing. With randomized filenames, it is probably exceedingly rare that two running programs will choose the same name at the same time. However, we don't care how rare something is; we care how devastating the consequences are should it happen. And, rare is much more frequent than never.
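The usual way to close that window is to make "does it exist?" and "create it" a single atomic step by opening with O_CREAT|O_EXCL. As far as I know, this is the same primitive File::Temp relies on internally. A minimal sketch, assuming a scratch directory and the deliberately predictable name "foo" from the story above:

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
use File::Spec;
use File::Temp qw(tempdir);

my $dir  = tempdir( CLEANUP => 1 );
my $name = File::Spec->catfile( $dir, 'foo' );    # predictable, guessable name

# O_CREAT|O_EXCL: create the file only if it does not already exist,
# checked and created in one step by the kernel.
sysopen( my $first, $name, O_WRONLY | O_CREAT | O_EXCL, 0600 )
    or die "Can't create $name: $!";

# A second attempt at the same name (standing in for the other program)
# fails instead of silently sharing the file.
my $second_ok = sysopen( my $second, $name, O_WRONLY | O_CREAT | O_EXCL, 0600 );
my $msg = $second_ok ? "second open succeeded\n" : "second open refused: $!\n";
print $msg;
close $first;
```

A plain -e test followed by open has a gap between the two calls; with O_EXCL there is no gap, so the loser of the race gets an error rather than someone else's file.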
File::Temp
Knowing all that, File::Temp handles the details of ensuring that you get a filehandle:
my $fh   = File::Temp->new;
my $name = $fh->filename;
This uses a default template to create the name. When the filehandle goes out of scope, File::Temp also cleans up the mess.
{
my $fh   = File::Temp->new;
my $name = $fh->filename;
print $fh ...;
...;
} # file cleaned up
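A runnable sketch of that scope-based cleanup, assuming the object-oriented File::Temp->new interface, whose default is to delete the file when the object is destroyed (the data written is made up):

```perl
use strict;
use warnings;
use File::Temp;

my $name;
{
    my $tmp = File::Temp->new;    # created in the system temp directory
    $name = $tmp->filename;
    print {$tmp} "scratch data\n";
    print "exists inside scope: ", ( -e $name ? "yes" : "no" ), "\n";
}   # $tmp is destroyed here, and the file is removed with it
print "exists after scope: ", ( -e $name ? "yes" : "no" ), "\n";
```

Only the filename string survives the block; the file itself does not.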
Some systems might automatically clean up temp files, although I haven't cared about that in years. Typically it was a batch job (say, once a week).
I often go one step further by giving my temporary filenames a template, where the Xs are literal characters the module recognizes and fills in with randomized characters:
use File::Basename qw(basename);
use File::Temp qw(tempfile);
my( $fh, $name ) = tempfile(
	sprintf "%s-%d-XXXXXX", basename($0), time );
I'm often doing this while I'm developing things so I can watch the program make the files (and in which order) and see what's in them. In production I probably want to obscure the source program name ($0) and the time; I don't want to make it easier to guess who's making which file.
A scratchpad
I can also open a temporary file with open by passing undef as the filename. This is useful when you want to collect data outside the program's memory without managing a name on disk. Opening it read-write means you can output some stuff and then move around in that file (we show a fixed-length record example in Learning Perl):
open(my $tmp, "+>", undef) or die ...
print $tmp "Some stuff\n";
seek $tmp, 0, 0;
my $line = <$tmp>;
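This is not the Learning Perl example, just a small sketch along the same lines: fixed-length records in an anonymous temp file, so you can seek straight to record n. The record length and the data are made up for illustration.

```perl
use strict;
use warnings;

# Anonymous temp file: undef as the filename in a read-write open.
open( my $tmp, '+>', undef ) or die "Can't open anonymous temp file: $!";

my $RECLEN = 16;    # hypothetical fixed record length, in bytes
my @names  = qw(alpha beta gamma delta);

# Each record is space-padded (or truncated) to exactly $RECLEN bytes.
print {$tmp} pack( "A$RECLEN", $_ ) for @names;

# Jump directly to record 2 (zero-based) and read only that record.
seek( $tmp, 2 * $RECLEN, 0 ) or die "seek: $!";
read( $tmp, my $record, $RECLEN ) == $RECLEN or die "short read";
my ($third) = unpack "A$RECLEN", $record;    # "A" strips the trailing spaces
print "$third\n";    # gamma
```

Because every record has the same length, the offset of record n is just n * $RECLEN; no scanning through earlier lines is needed.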
File::Temp opens the temp file in O_RDWR mode so all you have to do is use that one file handle for both reading and writing, even from external programs. The returned file handle is overloaded so that it stringifies to the temp file name so you can pass that to the external program. If that is dangerous for your purpose you can get the fileno() and redirect to /dev/fd/<fileno> instead.
All you have to do is mind your seeks and tells. :-) Just remember to always set autoflush!
use strict;
use warnings;
use File::Temp;
use Data::Dump;
my $fh = File::Temp->new;
$fh->autoflush(1);
# $fh stringifies to the temp file's name
system("ls /tmp/*.txt >> $fh") == 0 or die "ls failed: $?";
my @lines = <$fh>;
printf "%s\n\n", Data::Dump::pp(\@lines);
print $fh "How now brown cow\n";
seek $fh, 0, 0 or die $!;
my @lines2 = <$fh>;
printf "%s\n", Data::Dump::pp(\@lines2);
Which prints
[
"/tmp/cpan_htmlconvert_DPzx.txt\n",
"/tmp/cpan_htmlconvert_DunL.txt\n",
"/tmp/cpan_install_HfUe.txt\n",
"/tmp/cpan_install_XbD6.txt\n",
"/tmp/cpan_install_yzs9.txt\n",
]
[
"/tmp/cpan_htmlconvert_DPzx.txt\n",
"/tmp/cpan_htmlconvert_DunL.txt\n",
"/tmp/cpan_install_HfUe.txt\n",
"/tmp/cpan_install_XbD6.txt\n",
"/tmp/cpan_install_yzs9.txt\n",
"How now brown cow\n",
]
HTH

List all variables loaded by 'require' function

I have a config file with a bunch of data structures (arrays, hashes), and I load them into my Perl script using
require '<config>';
I can use the variables from the config that I know of, but is there a way to list all the variables loaded by the require function? Ideally I would want them to load into a hash variable and refer to them through it, to avoid variable name conflicts.
Not easily, and this is why relying on global named variables is problematic. Instead, have your config file return a single data structure (like a hashref, so you can name parts of it) and load it with do into a lexical variable:
use strict;
use warnings;
my $file = '/path/to/foo.conf';
my $data = do $file;
die "Failed to parse $file: $@" if !defined $data and $@;
die "Failed to read $file: $!" if !defined $data;
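A self-contained sketch of both halves, assuming a throwaway file from File::Temp in place of /path/to/foo.conf; the config keys and values are hypothetical.

```perl
use strict;
use warnings;
use File::Temp;

my $conf_text = <<'CONF';
# Config file: its last expression is a hashref, which `do` returns.
# The leading + makes Perl read the braces as an anonymous hash, not a block.
+{
    servers => [ 'alpha', 'beta' ],
    limits  => { retries => 3, timeout => 30 },
};
CONF

my $tmp = File::Temp->new( SUFFIX => '.conf' );
print {$tmp} $conf_text;
close $tmp or die "close: $!";

my $file = $tmp->filename;    # absolute path, so @INC is not searched
my $data = do $file;
die "Failed to parse $file: $@" if !defined $data and $@;
die "Failed to read $file: $!"  if !defined $data;

# Everything lives under one lexical hashref, so names cannot clash.
print "retries: $data->{limits}{retries}\n";
print "servers: @{ $data->{servers} }\n";
```

Listing "all the variables" then becomes simply keys %$data.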
Make sure either to pass an absolute path to the file (recommended, to avoid depending on what your current working directory happens to be) or prepend a relative path with ./, otherwise do (and require) will search @INC for the file, which since Perl 5.26 does not contain the current working directory. See Path::This for a way to get an absolute path relative to the current file.

Calling one Perl program from another

I have two Perl files and I want to call one file from another with arguments
First file a.pl
$OUTFILE = "C://programs/perls/$ARGV[0]";
# this should be some out file created inside work like C://programs/perls/abc.log
Second File abc.pl
require "a.pl" "abc.log";
# $OUTFILE is a variable inside a.pl and want to append current file's name as log.
I want it to create a log file named after the current file.
One more constraint I have is to use $OUTFILE in both a.pl and abc.pl.
If there is a better approach, please suggest it.
The require keyword only takes one argument. That's either a file name or a package name. Your line
require "a.pl" "abc.log";
is wrong. It gives a syntax error along the lines of String found where operator expected.
You can require one .pl file from another .pl, but that is very old-fashioned, badly written Perl code.
If neither file defines a package then the code is implicitly placed in the main package. You can declare a package variable in the outside file and use it in the one that is required.
In abc.pl:
use strict;
use warnings;
# declare a package variable
our $OUTFILE = "C://programs/perls/filename";
# load and execute the other program
require './a.pl';   # ./ is needed: since Perl 5.26, @INC no longer contains the current directory
And in a.pl:
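use strict;
use warnings;
# our is lexically scoped, so declare it here too; it names the same $main::OUTFILE set in abc.pl
our $OUTFILE;
# do something with $OUTFILE, like use it to open a file handle
print $OUTFILE;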
If you run this, it will print
C://programs/perls/filename
You should convert your perl file you want to call to a perl module:
Hello.pm
#!/usr/bin/perl
package Hello;
use strict;
use warnings;
sub printHello {
print "Hello $_[0]\n"
}
1;
Then you can call it:
test.pl
#!/usr/bin/perl
use strict;
use warnings;
# you have to put the current directory to the module search path
use lib (".");
use Hello;
Hello::printHello("a");
I tested it in git bash on windows, maybe you have to do some modifications in your environment.
This way you can pass as many arguments as you like, and you don't have to hunt through the file you are calling for variables that you use but may not have initialized (a less safe approach, I think; e.g. sometimes you will delete something you didn't really want to). The disadvantage is that you need to learn a bit about Perl modules, but I think it is definitely worth it.
A second approach could be to use the exec/system call (you can pass arguments this way too, if forking a child process is acceptable), but that is another story.
I would do this another way. Have the program take the name of the log file as a command-line parameter:
% perl a.pl name-of-log-file
Inside a.pl, open that file to append to it then output whatever you like. Now you can run it from many other sorts of places besides another Perl program.
# a.pl
my $log_file = $ARGV[0] // 'default_log_name';
open my $fh, '>>:utf8', $log_file or die ...;
print { $fh } $stuff_to_output;
But you could also call it from another Perl program. $^X is the path to the currently running perl, and this uses system in the slightly safer list form:
system $^X, 'a.pl', $name_of_log_file;
How you get something into $name_of_log_file is up to you. In your example you already knew the value in your first program.
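A runnable end-to-end sketch of that approach. To keep it self-contained, it writes a hypothetical a.pl into a scratch directory (appending one line to the log file named on its command line), runs it with $^X, and checks the exit status via $?.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

my $dir = tempdir( CLEANUP => 1 );

# Hypothetical a.pl: appends one line to the log file given as its argument.
my $script = File::Spec->catfile( $dir, 'a.pl' );
open my $out, '>', $script or die "Can't write $script: $!";
print {$out} <<'PERL';
use strict;
use warnings;
my $log_file = $ARGV[0] // 'default_log_name';
open my $fh, '>>:utf8', $log_file or die "Can't open $log_file: $!";
print {$fh} "called with $log_file\n";
close $fh or die "Can't close $log_file: $!";
PERL
close $out or die "Can't close $script: $!";

# The caller chooses the log name, e.g. based on its own file name.
my $name_of_log_file = File::Spec->catfile( $dir, 'abc.log' );

# List form: no shell, so spaces or metacharacters in the name pass through as-is.
system( $^X, $script, $name_of_log_file ) == 0
    or die "a.pl failed with status ", $? >> 8, "\n";

open my $in, '<', $name_of_log_file or die "Can't read $name_of_log_file: $!";
my @log = <$in>;
close $in;
print @log;
```

Nothing is shared between the two programs except the argument, so a.pl works the same whether a person, a shell script, or another Perl program runs it.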

Perl: Substitute text string with value from list (text file or scalar context)

I am a perl novice, but have read the "Learning Perl" by Schwartz, foy and Phoenix and have a weak understanding of the language. I am still struggling, even after using the book and the web.
My goal is to be able to do the following:
Search a specific folder (current folder) and grab filenames with full path. Save filenames with complete path and current foldername.
Open a template file and insert the filenames with full path at a specific location (e.g. using substitution) as well as current foldername (in another location in the same text file, I have not gotten this far yet).
Save the new modified file to a new file in a specific location (current folder).
I have many files/folders that I want to process and plan to copy the perl program to each of these folders so the perl program can make new .
I have gotten so far ...:
use strict;
use warnings;
use Cwd;
use File::Spec;
use File::Basename;
my $current_dir = getcwd;
open SECONTROL_TEMPLATE, '<secontrol_template.txt' or die "Can't open SECONTROL_TEMPLATE: $!\n";
my @secontrol_template = <SECONTROL_TEMPLATE>;
close SECONTROL_TEMPLATE;
opendir(DIR, $current_dir) or die $!;
my @seq_files = grep {
/gz/
} readdir (DIR);
open FASTQFILENAMES, '> fastqfilenames.txt' or die "Can't open fastqfilenames.txt: $!\n";
my @fastqfiles;
foreach (@seq_files) {
$_ = File::Spec->catfile($current_dir, $_);
push(@fastqfiles,$_);
}
print FASTQFILENAMES @fastqfiles;
open (my ($fastqfilenames), "<", "fastqfilenames.txt") or die "Can't open fastqfilenames.txt: $!\n";
my @secontrol;
foreach (@secontrol_template) {
$_ =~ s/#/$fastqfilenames/eg;
push(@secontrol,$_);
}
open SECONTROL, '> secontrol.txt' or die "Can't open SECONTROL: $!\n";
print SECONTROL @secontrol;
close SECONTROL;
close FASTQFILENAMES;
My problem is that I cannot figure out how to use my list of files to replace the "#" in my template text file:
my @secontrol;
foreach (@secontrol_template) {
$_ =~ s/#/$fastqfilenames/eg;
push(@secontrol,$_);
}
The substitute function will not replace the "#" with the list of files listed in $fastqfilenames. I get the "#" replaced with GLOB(0x8ab1dc).
Am I doing this the wrong way? Should I not use substitute as this can not be done, and then rather insert the list of files ($fastqfilenames) in the template.txt file? Instead of the $fastqfilenames, can I substitute with content of file (e.g. s/A/{r file.txt ...). Any suggestions?
Cheers,
JamesT
EDIT:
This made it all better.
foreach (@secontrol_template) {
s/#/$fastqfilenames/g;
push @secontrol, $_;
}
And as both answers suggested, $fastqfilenames was a filehandle.
I replaced this: open (my ($fastqfilenames), "<", "fastqfilenames.txt") or die "Can't open fastqfilenames.txt: $!\n";
with this:
my $fastqfilenames = join "\n", @fastqfiles;
That made it all good. Thanks to both of you.
$fastqfilenames is a filehandle. You have to read the information out of the filehandle before you can use it.
However, you have other problems.
You are printing all of the filenames to a file, then reading them back out of the file. This is not only a questionable design (why read from the file again, since you already have what you need in an array?), it also won't even work:
Perl buffers file I/O for performance reasons. The lines you have written to the file may not actually be there yet, because Perl is waiting until it has a large chunk of data saved up, to write it all at once.
You can override this buffering behavior in a few different ways (closing the file handle being the simplest if you are done writing to it), but as I said, there is no reason to reopen the file again and read from it anyway.
Also note, the /e option in a regex replacement evaluates the replacement as Perl code. This is not necessary in your case, so you should remove it.
Solution: Instead of reopening the file and reading it, just use the @fastqfiles variable you previously created when replacing in the template. It is not clear exactly what you mean by replacing # with the filenames.
Do you want to replace each # with a list of all filenames together? If so, you probably need to join the filenames together in some way before doing the replacement.
Do you want to create a separate version of the template file for each filename? If so, you need an inner for loop that goes over each filename for each template line. And you will need something other than a simple in-place replacement, because the replacement changes the original string the first time through. If you are on Perl 5.14 or later, you could use the /r option to replace non-destructively: push(@secontrol, s/#/$file_name/gr); Otherwise, you should copy to another variable before doing the replacement.
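A sketch of both readings, assuming made-up template lines and paths in place of the question's secontrol_template.txt and @fastqfiles (and Perl 5.14+ for /r):

```perl
use strict;
use warnings;

# Hypothetical inputs standing in for the template lines and the readdir results.
my @secontrol_template = ( "files: #\n", "done\n" );
my @fastqfiles         = ( '/data/run1/a.fastq.gz', '/data/run1/b.fastq.gz' );

# Reading 1: every # becomes the whole list, joined with spaces.
my $all_files = join ' ', @fastqfiles;
my @secontrol = map { s/#/$all_files/gr } @secontrol_template;

# Reading 2: one copy of the template per file. /r returns the modified
# string and leaves the template line itself untouched.
my %per_file;
for my $file (@fastqfiles) {
    $per_file{$file} = [ map { s/#/$file/gr } @secontrol_template ];
}

print @secontrol;
```

No file is written and read back in between: the filenames go straight from the array into the substitution.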
$_ =~ s/#/$fastqfilenames/eg;
$fastqfilenames is a file handle, not the file contents.
In any case, I recommend the Text::Template module for this kind of work (text substitution in files).

Error with opening a filehandle

I have just begun working with Perl, I am only at the introductory level, and I have been having trouble with opening filehandles.
Here is the code:
#!/usr/bin/perl -w
$proteinfilename = 'peptide';
open(PROTEINFILE, $proteinfilename) or die "Can't write to file '$proteinfilename' [$!]\n";
$protein = <PROTEINFILE>;
close PROTEINFILE;
print $protein;
exit;
Every time I tried to run the program, it gave me an error
readline() on closed filehandle PROTEINFILE at C:\BIN\protein.pl
or
Can't write to file 'peptide' [No such file or directory]
Can you please help me figure this out? I have the file peptide saved as a .txt, and it's in the same folder as protein.pl. What else can I do to make this work?
You're telling perl to open file peptide in the current directory, but it doesn't find such a file there ("No such file or directory").
Perhaps the current directory isn't C:\BIN, the directory in which you claim the file is located. You can address that by moving the file, using an absolute path, or changing the current directory to the one where the script is located.
use Cwd qw( realpath );
use Path::Class qw( file );
chdir(file(realpath($0))->dir) or die "Can't chdir: $!";
Perhaps the file isn't named peptide. It might actually be named peptide.txt, for example. Windows hides extensions it recognises by default, a feature I HATE. You can address this by renaming the file or by using the correct file name.
Are you looking to open the file for reading or writing? Your open statement opens it for reading; your error message says 'writing'. You use it for reading — so your error message is confusing, I believe.
If you get 'No such file or directory' errors, it means that despite what you thought, the name 'peptide' is not the name of a file in the current directory. Perl does not add extensions to file names for you; if your file is actually peptide.txt (since you mention that it is a 'txt file'), then that's what you need to specify to open. If you run perl protein.pl and peptide (or peptide.txt) is in the current directory, then it is not clear what your problem is. If your script is in C:\BIN directory and your current directory is not C:\BIN but peptide (or peptide.txt) is also in C:\BIN, then you need to arrange to open C:/bin/peptide or c:/bin/peptide.txt. Note the switch from backslashes to slashes. Backslashes have meanings specific to Perl as an escape character, and Windows is happy with slashes in place of backslashes. If you must use backslashes, then use single quotes around the name:
my $proteinfilename = 'C:\BIN\peptide.txt';
It may be simplest to take the protein file name from a command line argument; this gives you the flexibility of having the script anywhere on your PATH and the file anywhere you choose.
Two suggestions to help your Perl:
Use the 3-argument form of open and lexical file handles, as in:
open my $PROTEINFILE, '<', $proteinfilename or
die "Can't open file '$proteinfilename' for reading [$!]\n";
my $protein = <$PROTEINFILE>;
close $PROTEINFILE;
Note that this reads a single line from the file. If you need to slurp the whole file into $protein, then you have to do a little more work. There are modules to handle slurping for you, but you can also simply use:
my $protein;
{ local $/; $protein = <$PROTEINFILE>; }
This sets the line delimiter to undef which means the entire file is slurped in one read operation. The $/ variable is global, but this adjusts its value in a minimal scope. Note that $protein was declared outside the block containing the slurp operation!
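A runnable sketch contrasting the single-line read with the slurp, assuming a throwaway file with made-up contents in place of peptide.txt:

```perl
use strict;
use warnings;
use File::Temp;

# Throwaway file standing in for peptide.txt (hypothetical contents).
my $tmp = File::Temp->new( SUFFIX => '.txt' );
print {$tmp} "MKTAYIAKQR\nQISFVKSHFS\n";
close $tmp or die "close: $!";
my $proteinfilename = $tmp->filename;

open my $PROTEINFILE, '<', $proteinfilename
    or die "Can't open file '$proteinfilename' for reading [$!]\n";

my $first_line = <$PROTEINFILE>;    # scalar context: one line only

seek $PROTEINFILE, 0, 0 or die "seek: $!";
my $protein;
{
    local $/;                       # undef record separator: read to end of file
    $protein = <$PROTEINFILE>;
}                                   # $/ is restored here
close $PROTEINFILE;

print "first line: $first_line";
print "whole file: $protein";
```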
Use use strict; as well as -w or use warnings;. It will save you grief over time.
I've only been using Perl for 20 years; I don't write a serious script without both use strict; and use warnings; because I don't trust my ability to spot silly mistakes (and Perl will do it for me). I don't make all that many mistakes, but Perl has saved me on many occasions because I use them.
Here is how your program could look:
#!/usr/bin/perl
use strict;
use warnings;
my $proteinfilename = 'peptide.txt';
open(PROTEINFILE, '<', $proteinfilename) or die "Can't open file '$proteinfilename' [$!]\n";
my $protein = <PROTEINFILE>;
close PROTEINFILE;
print $protein;
You need to add the file extension (for example .txt) at the end, like below.
my $proteinfilename = 'peptide.txt';
Your program (say, peptide_test.pl) and the input text file peptide.txt should be in the same directory, and you should run the program from that directory.
If they are not in the same directory, use absolute path like below.
my $proteinfilename = 'C:\somedirectory\peptide.txt';
Note: use single quotes for an absolute Windows path, so that Perl does not treat the backslashes (\) in the path as escape characters.
Now about the errors: if you don't use a die statement, you will get the error
readline() on closed filehandle PROTEINFILE at C:\BIN\protein.pl
After using die,
or die $! ;
you will get error No such file or directory.
Also always
use strict;
use warnings;
-w still works, but use warnings (available since Perl 5.6) is preferred because it is lexically scoped. These two lines will help you find typos and similar mistakes.
One more thing: I don't think you need exit; at the end.
See the documentation for the exit function.