Accessing a file in perl - perl

In my script I am dealing with opening files and writing to files. I found that there is some thing wrong with a file I try to open, the file exists, it is not empty and I am passing the right path to file handle.
I know that my question might sounds weird but while I was debugging my code I put the following command in my script to check some files
system ("ls");
Then my script worked well, when it's removed it does not work correctly anymore.
my #unique = ("test1","test2");
open(unique_fh,">orfs");
print unique_fh #unique ;
open(ORF,"orfs")or die ("file doesnot exist");
system ("ls");
while(<ORF>){
split ;
}
#neworfs=#_ ;
print #neworfs ;

Perl buffers the output when you print to a file. In other words, it doesn't actually write to the file every time you say print; it saves up a bunch of data and writes it all at once. This is faster.
In your case, you couldn't see anything you had written to the file, because Perl hadn't written anything yet. Adding the system("ls") call, however, caused Perl to write your output first (the interpreter is smart enough to do this, because it thinks you might want to use the system() call to do something with the file you just created).
How do you get around this? You can close the file before you open it again to read it, as choroba suggested. Or you can disable buffering for that file. Put this code just after you open the file:
my $fh = select (unique_fh);
$|=1;
select ($fh);
Then anytime you print to the file, it will get written immediately ($| is a special variable that sets the output buffering behavior).
Closing the file first is probably a better idea, although it is possible to have a filehandle for reading and writing open at the same time.

You did not close the filehandle before trying to read from the same file.

Related

Is this a standard Perl language construction or a customization: open HANDLE, ">$fname"

Not a Perl guru, working with an ancient script, ran into a construct I didn't recognize that yields results I don't expect. Curious whether this is the standard language, or a PM customization of sorts:
open FILE1, ">./$disk_file" or die "Can't open file: $disk_file: $?";
From the looks of this, file is to be opened for writing, but the log error says that file is not found. Perl's file i/o expects 3 parameters, not 2. Log doesn't have the die output, instead saying: "File not found"
Confused a bit here.
EDIT: Made it work using the answers below. Seemed like I was running a cashed version of the .pl for some time, instead of the newly-edited. Finally it caught up with a 2-param open, thanks y'all for your help!
That is the old 2-argument form of open. The second argument is a bit magical:
if it starts with '>' the remainder of the string is used as the name of a file to open for writing
if it starts with '<' the remainder of the string is used as the name of a file to open for reading (this is the default if '<' is omitted)
if it ends with '|' the string up to that point is interpreted as a command which is executed with its STDOUT connected to a pipe which your script will open for reading
if it starts with '|' the string after that point is interpreted as a command which is executed with its STDIN connected to a pipe which your script will open for writing
This is a potentially security vulnerability because if your script accepts a filename as user input, the user can add a '|' at the beginning or end to trick your script into running a command.
The 3-argument form of open was added in (I think) version 5.8 so it has been a standard part of Perl for a very long time.
The FILE1 part is known as a bareword filehandle - which is a global. Modern style would be to use a lexical scalar like my $file1 instead.
See perldoc perlopen for details but, in brief...
Perl's open() will accept either two or three parameters (there's even a one-parameter version - which no-one ever uses). The two-parameter version is a slightly older style where the open mode and the filename are joined together in the second parameter.
So what you have is equivalent to:
open FILE1, '>', "./$disk_file" or die "Can't open file: $disk_file: $?";
A couple of other points.
We prefer to use lexical variables as filehandles these days (so, open my $file1, ... instead of open FILE1, ...).
I think you'll find that $! will be more useful in the error message than $?. $? contains the error from a child process, but there's no child process here.
Update: And none of this seems to be causing the problems that you're seeing. That seems to be caused by a file actually not being in the expected place. Can you please edit your question to add the exact error message that you're seeing.
The other answers here are correct that's the two-argument syntax. They've done a good job covering why and how you should ideally change it, so I won't rehash here.
However they haven't tried to help you fix it, so let me try that...
This is a guess, but I suspect $disk_file contains a filename with a path (eg my_logs/somelog.log), and the directory part (my_logs in my entirely guessed example) doesn't exists, so is throwing an error. You could create that directory, or alter whatever sets that variable so it's writing to a location that does exist.
Bear in mind these paths will be relative to wherever you're running the script from - not relative to the script itself, so if there's a log directory (or whatever) in the same dir as the script you may want to cd to the script's dir first.

perl print to a file and STDOUT which is the file

My program is trying to print to a file which for it is the STDOUT.
To say, print "text here"; prints to a file x.log , while I am also trying to print to the file x.log using file-handler method as in print FH1 "text here"; . I notice that when the file-handler method statement is provided first and then followed by the STDOUT procedure. My second print can override the first.I would like to know more on why this occurs.
This makes me think of a race condition or that the file-handler is relatively slow (if it goes through buffer?) than the STDOUT print statements. This I am not sure on how if that is the way Perl works. Perl version - 5.22.0
As far as I understand your program basically looks like this:
open(my $fh,'>','foobar.txt');
print $fh "foo\n";
print "bar\n"; # prints to STDOUT
And then you use it in a way that STDOUT is redirected in the shell to the same file which is already opened in your program:
$ perl test.pl > foobar.txt
This will open two independent file handles to the same file: one within your program and the other within the shell where you start the program. Both file handles manage their own file position for writing, start at position 0 and advance the position after each write.
Since these file handles are independent from each other they will not care if there are other file handles dealing currently with this file, no matter if these other file handles are inside or outside the program. This means that these writes will overwrite each other.
In addition to this there is also internal buffering done, i.e. each print will first result into a write into some internal buffer and might result immediately into a write to the file handle. When data are written to the file handle depends on the mode of the file handle, i.e. unbuffered, line-buffered or a buffer of a specific size. This makes the result kind of unpredictable.
If you don't want this behavior but still want to write to the same file using multiple file handle you better use the append-mode, i.e. open with >> instead of > in both Perl code and shell. This will make sure that all data will be appended to the end of the file instead of written to the file position maintained by the file handle. This way data will not get overwritten. Additionally you might want to make the file handles unbuffered so that data in the file end up in the same order as the print statements where done:
open(my $fh,'>>','foobar.txt');
$fh->autoflush(1); # make $fh unbuffered
$|=1; # make STDOUT unbuffered
print $fh "foo\n";
print "bar\n"; # prints to STDOUT
$ perl test.pl >> foobar.txt

How to run a local program with user input in Perl

I'm trying to get user input from a web page written in Perl and send it to a local program (blastp), then display the results.
This is what I have right now:
(input code)
print $q->p, "Your database: $bd",
$q->p, "Your protein is: $prot",
$q->p, "Executing...";
print $q->p, system("blastp","-db $bd","-query $prot","-out results.out");
Now, I've done a little research, but I can't quite grasp how you're supposed to do things like this in Perl. I've tried opening a file, writing to it, and sending it over to blastp as an input, but I was unsuccessful.
For reference, this line produces a successful output file:
kold#sazabi ~/BLAST/pataa $ blastp -db pataa -query ../teste.fs -out results.out
I may need to force the bd to load from an absolute path, but that shouldn't be difficult.
edit: Yeah, the DBs have an environmental variable, that's fixed. Ok, all I need is to get the input into a file, pass it to the command, and then print the output file to the CGI page.
edit2: for clarification:
I am receiving user input in $prot, I want to pass it over to blastp in -query, have the program blastp execute, and then print out to the user the results.out file (or just have a link to it, since blastp can output in HTML)
EDIT:
All right, fixed everything I needed to fix. The big problem was me not seeing what was going wrong: I had to install Tiny:Capture and print out stderr, which was when I realized the environmental variable wasn't getting set correctly, so BLAST wasn't finding my databases. Thanks for all the help!
Write $prot to the file. Assuming you need to do it as-is without processing the text to split it or something:
For a fixed file name (may be problematic):
use File::Slurp;
write_file("../teste.fs", $prot, "\n") or print_error_to_web();
# Implement the latter to print error in nice HTML format
For a temp file (better):
my ($fh, $filename) = tempfile( $template, DIR => "..", CLEANUP => 1);
# You can also create temp directory which is even better, via tempdir()
print $fh "$prot\n";
close $fh;
Step 2: Run your command as you indicated:
my $rc = system("$BLASTP_PATH/blastp", "-db", "pataa"
,"-query", "../teste.fs", "-out", "results.out");
# Process $rc for errors
# Use qx[] instead of system() if you want to capture
# standard output of the command
Step 3: Read the output file in:
use File::Slurp;
my $out_file_text = read_file("results.out");
Send back to web server
print $q->p, $out_file_text;
The above code has multiple issues (e.g. you need better file/directory paths, more error handling etc...) but should start you on the right track.

Perl Skipping Some Files When Looping Across Files and Writing Their Contents to Output File

I'm having an issue with Perl and I'm hoping someone here can help me figure out what's going on. I have about 130,000 .txt files in a directory called RawData and I have a Perl program that loads them into an array, then loops through this array, loading each .txt file. For simplicity, suppose I have four text files I'm looping through
File1.txt
File2.txt
File3.txt
File4.txt
The contents of each .txt file look something like this:
007 C03XXYY ZZZZ
008 A01XXYY ZZZZ
009 A02XXYY ZZZZ
where X,Y,Z are digits. In my simplified code below, the program then pulls out just line 007 in each .txt file, saves XX as ID, ignores YY and grabs the variable data ZZZZ that I've called VarVal. Then it writes everything to a file with a header specified in the code below:
#!/usr/bin/perl
use warnings;
use strict;
open(OUTFILE, "> ../Data/OutputFile.csv") or die $!;
opendir(MYDIR,"../RawData")||die $!;
my #txtfiles=grep {/\.txt$/} readdir(MYDIR);
closedir(MYDIR);
print OUTFILE "ID,VarName,VarVal\n";
foreach my $txtfile (#txtfiles){
#Prints to the screen so I can see where I am in the loop.
print $txtfile","\n";
open(INFILE, "< ../RawData/$txtfile") or die $!;
while(<INFILE>){
if(m{^007 C03(\d{2})(\d+)(\s+)(.+)}){
print OUTFILE "$1,VarName,$4\n"
}
}
}
The issue I'm having is that the contents of, for example File3.txt, don't show up in OutputFile.csv. However, it's not an issue with Perl not finding a match because I checked that the if statement is being executed by deleting OUTFILE and looking at what the code prints to the terminal screen. What shows up is exactly what should be there.
Furthermore, If I just run the problematic file (File3.txt) through the loop itself by commenting out the opendir and closedir stuff and doing something like my #textfile = "File3.txt";. Then when I run the code, the only data that shows up in the OutputFile.csv IS what's in File3.txt. But when it goes through the loop, it won't show up in OutputFile.csv. Plus, I know that File3.txt is being sent to into the loop because I can see it being printed on the screen with print $txtfile","\n";. I'm at a loss as to what is going on here.
The other issue is that I don't think it's something specific to this one particular file (maybe it is) but I can't just troubleshoot this one file because I have 130,000 files and I just happened to stumble across the fact that this one wasn't being written to the output file. So there may be other files that also aren't getting written, even though there is no obvious reason they shouldn't be just like the case of File3.txt.
Perhaps because I'm doing so many files in rapid succession, looping 130,000 files, causes some sort of I/O issues that randomly fails every so often to write the contents in memory to the output file? That's my best guess but I have not idea how to diagnose or fix this.
This is kind of a difficult question to debug, but I'm hoping someone on here has some insight or has seen similar problems that would provide me with a solution.
Thanks
There's nothing obviously wrong that I can see in your code. It is a little outdated as using autodie and lexical filehandles would be better.
However, I would recommend that you make your regex slightly less restrictive by making the spacing variable length after the first value and making the last variable optionally of 0 length. I'd also output the filename as well. Then you can see which other files aren't being caught for whatever reason:
if (m{^007\s+C03(\d{2})\d+\s+(.*)}){
print OUTFILE "$txtfile $1,VarName,$2\n";
last;
}
Finally, assuming there is only a single 007 C03 in each file, you could throw in a last call after one is found.
You may want to try sorting the #txtfiles list, then trying to systematically look through the output to see what is or isn't there. With 130k files in random order, it would be pretty difficult to be certain that you missed one. Perl should be giving you the files in the actual order they appear in the directory, which is different that user level commands like ls, so it may be different that you'd expect.

Perl script getting stuck in terminal for no apparent reason

I have a Perl script which reads three files and writes new files after reading each one of them. Everything is one thread.
In this script, I open and work with three text files and store the contents in a hash. The files are large (close to 3 MB).
I am using a loop to go through each of the files (open -> read -> Do some action (hash table) -> close)
I am noticing that the whenever I am scanning through the first file, the Perl terminal window in my Cygwin shell gets stuck. The moment I hit the enter key I can see the script process the rest of the files without any issues.
It's very odd as there is no read from STDIN in my script. Moreover, the same logic applies to all the three files as everything is in the same loop.
Has anyone here faced a similar issue? Does this usually happen when dealing with large files or big hashes?
I can't post the script here, but there is not much in it to post anyway.
Could this just be a problem in my Cygwin shell?
If this problem does not go away, how can I circumvent it? Like providing the enter input when the script is in progress? More importantly, how can I debug such a problem?
sub read_set
{
#lines_in_set = ();
push #lines_in_set , $_[0];
while (<INPUT_FILE>)
{ $line = $_;
chomp($line);
if ($line=~ /ENDNEWTYPE/i or $line =~ /ENDSYNTYPE/ or eof())
{
push #lines_in_set , $line;
last;
}
else
{
push #lines_in_set , $line;
}
}
return #lines_in_set;
}
--------> I think i found the problem :- or eof() call was ensuring that the script would be stuck !! Somehow happening only at the first time. I have no idea why though
The eof() call is the problem. See perldoc -f eof.
eof with empty parentheses refers to the pseudo file accessed via while (<>), which consists of either all the files named in #ARGV, or to STDIN if there are none.
And in particular:
Note that this function actually reads a character and then "ungetc"s it, so isn't useful in an interactive context.
But your loop reads from another handle, one called INPUT_FILE.
It would make more sense to call eof(INPUT_FILE). But even that probably isn't necessary; your outer loop will terminate when it reaches the end of INPUT_FILE.
Some more suggestions, not related to the symptoms you're seeing:
Add
use strict;
use warnings;
near the top of your script, and correct any error messages this produces (perl -cw script-name does a compile-only check). You'll need to declare your variables using my (perldoc -f my). And use consistent indentation; I recommend the same style you'll find in most Perl documentation.