Possible Duplicate:
Why does Perl's glob return undef for every other call?
This is a continuation of another problem I was having: I needed to find a file with a long name when I knew only part of that name. Here is the code I used:
my @RTlayerRabcd = ("WIRA_Rabcd_RT","WIRB_Rabcd_RT","WIRC_Rabcd_RT","WIRD_Rabcd_RT","WIRE_Rabcd_RT","BASE_Rabcd_RT");
#Rabcd calculations
for($i = 0; $i < 6; $i++)
{
    print "$RTlayerRabcd[$i]\n";
    #Searching for Rabcd Room Temperature readings
    my $file = glob($dir . "*$RTlayerRabcd[$i]" . '.txt');
    print "$file\n";
    my $Rtot = 0;
    #Open file, Read line
    open (FILE, $file);
    while (<FILE>)
    {
        #read line and separate at "tab" into $Res, $temp
        chomp;
        ($Res, $temp) = split("\t");
        $j++;
        $Rtot = $Res + $Rtot;
    }
    close (FILE);
    $Ravg = $Rtot/$j;
    print FILE_OUT "$Ravg \t";
}
After I run the code I get the following print outs:
WIRA_Rabcd_RT
Vesuvious_C6R8_051211/vesu_R6C8_05112011_WIRA_Rabcd_Rt.txt
WIRB_Rabcd_RT
WIRC_Rabcd_RT
Vesuvious_C6R8_051211/vesu_R6C8_05112011_WIRC_Rabcd_Rt.txt
WIRD_Rabcd_RT
WIRE_Rabcd_RT
BASE_Rabcd_RT
Vesuvious_C6R8_051211/vesu_R6C8_05112011_BASE_Rabcd_Rt.txt
The program seems to be skipping files. Any idea why?
In scalar context, glob iterates through all files matched by the glob, returning undef after the last one. It's intended to be used as the condition in a while loop, for example:
while (my $file = glob('*.c')) {
    say $file;
}
The iterator is tied to that particular call to glob. Once the iteration has started, there's no way to reset the iterator early. glob ignores its argument until after it has returned undef.
You can fix your problem by using glob in list context:
my ($file) = glob($dir . "*$RTlayerRabcd[$i]" . '.txt');
My guess would be that the directory Vesuvious_C6R8_051211 does not contain the files *WIRB_Rabcd_RT.txt, *WIRD_Rabcd_RT.txt and *WIRE_Rabcd_RT.txt. What is the file listing of that directory? Also, I recommend you check the return code for open(), to be sure that it indeed opened a file successfully.
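To see the scalar-context iterator behaviour in isolation, here is a self-contained sketch (the file names are invented for the demo, and File::Temp is used so it does not touch your real directories). It reproduces the same found/missed/found pattern as the output in the question:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create three files, each matched by a different pattern.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(A B C)) {
    open my $fh, '>', "$dir/$name.txt" or die $!;
    close $fh;
}

# Scalar context: this single glob call site keeps ONE iterator, so
# after returning its only match it must return undef once (to signal
# the end of the iteration) before it will look at a new pattern.
my @results;
for my $name (qw(A B C)) {
    my $file = glob("$dir/$name.txt");
    push @results, defined $file ? "found" : "missed";
}
print "@results\n";   # found missed found
```

Changing the call to `my ($file) = glob(...)` (list context) makes every lookup succeed, because list context evaluates the whole glob at once instead of sharing an iterator between calls.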
I have a script, which looks something like the one below, that I want to use to search through the current directory, open all directories in that directory, open all files that match certain regexes (fastq files, formatted so that every four lines go together), do some work with those files, and write some results to a file in each directory. (Note: the actual script does a lot more than this, but I think I have a structural issue with the iteration over folders, because the script works when a simplified version is run in a single folder, so I am posting a simplified version here.)
#!/usr/local/bin/perl
#Created by C. Pells, M. R. Snyder, and N. T. Marshall 2017
#Script trims and merges high throughput sequencing reads from fastq files for a specific primer set
use Cwd;
use warnings;
my $StartTime = localtime;
my $MasterDir = getcwd; #obtains a full path to the current directory
opendir (DIR, $MasterDir);
my @objects = readdir (DIR);
closedir (DIR);
foreach (@objects){
    print $_,"\n";
}
my @Dirs = ();
foreach my $O (0..$#objects){
    my $CurrDir = "";
    if ((length ($objects[$O]) < 7) && ($O>1)){ #Checking if the length of the object name is < 7 characters. All samples are 6 or less. Removing the first two elements: "." and ".."
        $CurrDir = $MasterDir."/".$objects[$O]; #appends directory name to full path
        push (@Dirs, $CurrDir);
    }
}
foreach (@Dirs){
    print $_,"\n"; #checks that all directories were read in
}
foreach my $S (0..$#Dirs){
    my @files = ();
    opendir (DIR, $Dirs[$S]) || die "cannot open $Dirs[$S]: $!";
    @files = readdir DIR; #reads in all files in a directory
    closedir DIR;
    my @AbsFiles = ();
    foreach my $F (0..$#files){
        my $AbsFileName = $Dirs[$S]."/".$files[$F]; #appends file name to full path
        push (@AbsFiles, $AbsFileName);
    }
    foreach my $AF (0..$#AbsFiles){
        if ($AbsFiles[$AF] =~ /_R2_001\.fastq$/m){ #finds reverse fastq file
            my @readbuffer = ();
            #read in reverse fastq
            my %RSeqHash;
            my $cc = 0;
            print "Reading, reversing, complementing, and trimming reverse fastq file $AbsFiles[$AF]\n";
            open (INPUT1, $AbsFiles[$AF]) || die "Can't open file: $!\n";
            while (<INPUT1>){
                chomp ($_);
                push(@readbuffer, $_);
                if (@readbuffer == 4) {
                    $rsn = substr($readbuffer[0], 0, 45); #trims reverse seq name
                    $cc++ % 10000 == 0 and print "$rsn\n";
                    $RSeqHash{$rsn} = $readbuffer[1];
                    @readbuffer = ();
                }
            }
        }
    }
    foreach my $AFx (0..$#AbsFiles){
        if ($AbsFiles[$AFx] =~ /_R1_001\.fastq$/m){ #finds forward fastq file
            print "Reading forward fastq file $AbsFiles[$AFx]\n";
            open (INPUT2, $AbsFiles[$AFx]) || die "Can't open file: $!\n";
            my $OutMergeName = $Dirs[$S]."/"."Merged.fasta";
            open (OUT, ">", "$OutMergeName");
            my $cc = 0;
            my @readbuffer = ();
            while (<INPUT2>){
                chomp ($_);
                push(@readbuffer, $_);
                if (@readbuffer == 4) {
                    my $fsn = substr($readbuffer[0], 0, 45); #trims forward seq name
                    #$cc++ % 10000 == 0 and print "$fsn\n$readbuffer[1]\n";
                    if ( exists($RSeqHash{$fsn}) ){ #checks to see if forward seq name is present in reverse seq hash
                        print "$fsn was found in Reverse Seq Hash\n";
                        print OUT "$fsn\n$readbuffer[1]\n";
                    }
                    else {
                        $cc++ % 10000 == 0 and print "$fsn not found in Reverse Seq Hash\n";
                    }
                    @readbuffer = ();
                }
            }
            close INPUT1;
            close INPUT2;
            close OUT;
        }
    }
}
my $EndTime= localtime;
print "Script began at\t$StartTime\nCompleted at\t$EndTime\n";
Again, I know that the script works without iterating over folders, but with this version I just get empty output files. From the print statements I inserted, I've determined that Perl can't find the variable $fsn as a key in the hash while reading INPUT2. I can't understand why, because each file is there, and the keys match when I don't iterate over folders. So either there is something simple I am missing, or I have hit some limitation of Perl's memory. Any help is appreciated!
Turns out my issue was with where I was declaring the hash. The script fails unless I declare the hash before the foreach loop that cycles through all items in @AbsFiles searching for the first input file, even though I had only been declaring it after the first input file was found. Declaring it before the loop is fine, because it means the hash is cleared in every new directory, but I don't understand why it failed the way it did, since the hash should only be declared (or cleared) when the input file name is found. I guess I don't NEED to know why it didn't work before, but some help understanding would be nice.
I have to give credit to another user for helping me realize this. They attempted to answer my question but did not, and then gave me this hint about where I declare my hash in a comment on that answer. That answer has since been deleted, so I can't credit that user for pointing me in this direction. I would love to know what they understood about Perl that I did not, which made it clear to them that this was the problem. I apologize that I was busy with data analysis and a conference, so I could not respond to that comment sooner.
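For anyone else hitting this, the likely explanation is lexical scope: a hash declared with my inside a block ceases to exist at that block's closing brace, and without use strict a later mention of %RSeqHash quietly refers to a different, package-level hash that is always empty. A minimal sketch of that failure mode (the variable name is borrowed from the question purely for illustration; strict and warnings are deliberately omitted to mirror the original script):

```perl
#!/usr/bin/perl
# Deliberately no "use strict"/"use warnings", to mirror the question.

sub fill_hash {
    my %RSeqHash;                    # lexical: exists only inside this sub
    $RSeqHash{'@M00001:1:1'} = 'ACGT';
    return scalar keys %RSeqHash;    # 1 while we are in scope
}

my $inside = fill_hash();

# Out here, %RSeqHash is the package hash %main::RSeqHash, an entirely
# different (and empty) variable from the lexical one above. Without
# strict, Perl accepts this silently instead of reporting an error.
my $outside = scalar keys %RSeqHash;

print "inside=$inside outside=$outside\n";   # inside=1 outside=0
```

Declaring the hash before the loop works because then every later mention is inside the scope of that one declaration; use strict would have turned the silent empty-hash lookup into a compile-time error.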
Below I've provided just a chunk of a huge Perl script I am trying to write. I am getting a syntax error in the else statement, but the console only reports "syntax error" at that line without clearly describing the problem. I am trying to create a variable file file_no_$i.txt, copy the contents of t_code.txt into it, and then find and replace a string in the variable file with some selected keys of the hash %defines_2.
open ( my $pointer, "<", "t_code.txt" ) or die $!;
my $out_pointer;
for (my $i=0 ; $i <= $#match ; $i++) {
    for (my $j=0; $j <= $#match ; $j++) {
        if ($match[$i] =~ $match[$j]) {
            next;
        }
        else {
            my $file_name = "file_no_$i.txt";
            open $out_pointer, ">", $file_name or die "Can't open the output file!";
            copy("$file_name","t_code.txt") or die "Copy failed: $!";
            my @lin = <$out_pointer>;
            foreach $_ (@lin) {
                $_ =~ s/UART90_BASE_ADDRESS/$defines_2{ $_ = grep{/$match[$i]/} (keys %defines_2)};
            }
        }
    }
}
You cannot use / unquoted inside a s/// construct. Instead of escaping the slashes with backslashes, you can use different delimiters:
s#UART90_BASE_ADDRESS#$defines_2{ $_ = grep{/$match[$i]/} (keys %defines_2)}#;
It fixes the syntax error, but I fear it still won't do what you want. Without data, it's hard to test, though.
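As a standalone illustration of the delimiter rule (the path here is invented; only the syntax matters), the same substitution written both ways:

```perl
use strict;
use warnings;

my $path = "prefix /usr/local/uart suffix";

# With the default s/// delimiters, every / in the pattern or the
# replacement must be escaped with a backslash:
(my $with_escapes = $path) =~ s/\/usr\/local\/uart/\/opt\/uart/;

# With # (or braces, or almost any punctuation) as the delimiter,
# the slashes can appear unescaped:
(my $with_hash = $path) =~ s#/usr/local/uart#/opt/uart#;

print "$with_escapes\n$with_hash\n";   # both: prefix /opt/uart suffix
```

The `(my $copy = $original) =~ s...` idiom copies the string first, so the substitution edits the copy and leaves the original intact.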
What I think you're doing is editing a number of text files whose names look like file_no_1.txt etc. You're doing that by copying the current file to t_code.txt and then reading that file line by line, editing as required, and writing the lines back to the original text file.
The problem with that approach is that the file will be copied and rewritten many times, and it would be better to read the whole file into an array, make all the edits, and then write them back in one operation. That would be fine unless the file is enormous — say, several GB.
Here's some code that implements that approach. You can see that $file_name is defined and @lines is filled outside the inner loop. The innermost loop modifies the elements of @lines and, outside that loop again, @lines is written back to the original text file.
I couldn't fathom a couple of things about your code.
I'm not sure whether you should be using =~ or whether you intended a simple eq. The former does a contains test, and you had a problem in the past where you meant to check that the first string ended with the second.
The grep call
grep{/$match[$i]/} (keys %defines_2)
worries me, as it can potentially return more than one key of the %defines_2 hash, in which case your code will insert what is pretty much a random selection from the hash elements.
If your code is working then that's fine, but if not then I hope this helps you fix it. If you need more help on this chunk of code then you should include a small sample of the data so that we can better understand what is going on.
for my $i (0 .. $#match) {
    my $file_name = "file_no_$i.txt";
    my @lines = do {
        open my $in_fh, '<', 't_code.txt' or die $!;
        <$in_fh>;
    };
    for my $j (0 .. $#match) {
        next if $match[$i] =~ $match[$j];
        for ( @lines ) {
            my ($match) = grep { /$match[$i]/ } keys %defines_2;
            s/UART90_BASE_ADDRESS/$defines_2{$match}/;
        }
    }
    open my $out_fh, '>', $file_name or die qq{Can't open "$file_name" for output: $!};
    print $out_fh $_ for @lines;
    close $out_fh or die qq{Failed to close output file "$file_name": $!};
}
I have a big file with repeated lines as follows:
@UUSM
ABCDEADARFA
+------qqq
!2wqeqs6777
I would like to output all the "second lines" in the file. I have the following code snippet for doing this, but it's not working as expected: lines 1, 3 and 4 are in the output instead.
open(IN,"<", "file1.txt") || die "cannot open input file:$!";
while (<IN>) {
    $line = $line . $_;
    if ($line =~ /^\@/) {
        <IN>;
        #next;
        my $line = $line;
    }
}
print "$line";
Please help!
Try this:
open(IN,"<", "file1.txt") || die "cannot open input file:$!";
my $lines = "";
while (<IN>) {
    $lines .= $_ if $. % 4 == 2;
}
print "$lines";
I assume what you are asking is how to print the line that comes after a line that begins with @:
perl -ne 'if (/^\@/) { print scalar <> }' file1.txt
This says, "If the line begins with @, then print the next line. Do this for all the files in the argument list." The scalar function is used here to impose scalar context on the file handle read, so that it does not print the whole remaining file. By default, print supplies list context to its arguments.
If you actually want to print the second line in the file, well, that's even easier. Here's a few examples:
Using the line number $. variable, printing if it equals line number 2.
perl -ne '$. == 2 and print, close ARGV' yourfile.txt
Note that if you have multiple files, you must close the ARGV file handle to reset the $. counter. Note also the use of the low-precedence and operator, which forces both print and close to be bound to the conditional.
Using regular logic.
perl -ne 'print scalar <>; close ARGV;'
perl -pe '$_ = <>; close ARGV;'
Both of these use a short-circuit approach, closing the ARGV file handle as soon as the second line is printed. If you want to print every other line of a file instead, both will do that if you remove the close statements.
perl -ne '$at = $. if /^\@/; print if $. - 1 == $at' file1.txt
Written out longhand, the above is equivalent to
open my $fh, "<", "file1.txt";
my $at_line = 0;
while (<$fh>) {
    if (/^\@/) {
        $at_line = $.;
    }
    else {
        print if $. - 1 == $at_line;
    }
}
If you want lines 2, 6, 10 printed, then:
while (<>)
{
    print if $. % 4 == 2;
}
Where $. is the current line number — and I didn't spend the time opening and closing the file. That might be:
{
    my $file = "file1.txt";
    open my $in, "<", $file or die "cannot open input file $file: $!";
    while (<$in>)
    {
        print if $. % 4 == 2;
    }
}
This uses the modern preferred form of file handle (a lexical file handle), and the braces around the construct mean the file handle is closed automatically. The name of the file that couldn't be opened is included in the error message; the or operator is used so the precedence is correct (the parentheses and || in the original were fine too and could be used here, but conventionally are not).
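If you want to see the $. % 4 == 2 arithmetic in action without creating a file, Perl can open an in-memory file handle on a string; the four-line record format from the question is reused here purely for illustration (the second record is invented):

```perl
use strict;
use warnings;

# Four-line records: $. (the current line number) is 2, 6, 10, ...
# on every "second line" of a record.
my $data = "\@UUSM\nABCDEADARFA\n+------qqq\n!2wqeqs6777\n"
         . "\@UUSN\nGGGGCCCCTTT\n+------qqq\n!9abcde1234\n";

open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my @second_lines;
while (<$fh>) {
    chomp;
    push @second_lines, $_ if $. % 4 == 2;   # keep lines 2, 6, ...
}
print "@second_lines\n";   # ABCDEADARFA GGGGCCCCTTT
```

The $. variable works the same way for in-memory handles as for real files, so the demo transfers directly to "file1.txt".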
If you want the line after a line starting with @ printed, you have to organize things differently.
my $print_next = 0;
while (<>)
{
    if ($print_next)
    {
        print $_;
        $print_next = 0;
    }
    elsif (m/^\@/)
    {
        $print_next = 1;
    }
}
Dissecting the code in the question
The original version of the code in the question was (line numbers added for convenience):
1 open(IN,"<", "file1.txt") || die "cannot open input file:$!";
2 while (<IN>) {
3 $line = $line . $_;
4 if ($line =~ /^\@/) {
5 <IN>;
6 #next;
7 my $line = $line;
8 }
9 }
10 print "$line";
Discussion of each line:
1. OK, though it doesn't use a lexical file handle or report which file could not be opened.
2. OK.
3. Premature and misguided. This adds the current line to the variable $line before any analysis is done. If it were desirable, it could be written $line .= $_;
4. Suggests that the correct description for the desired output is not "the second lines" but "the line after a line starting with @". Note that since there is no multi-line modifier on the regex, this will always match only the first line segment in the variable $line. Because of the premature concatenation, it will match on every line (because the first line of data starts with @), executing the code in lines 5-8.
5. Reads another line into $_. It doesn't test for EOF, but that's harmless.
6. Comment line; no significance except to suggest some confusion.
7. my $line = $line; is a self-assignment to a new variable that hides the outer $line. Mainly, this is weird, and to a lesser extent it is a no-op. You are not using use strict; and use warnings;; if you were, you would have seen warnings here. Perl experts use use strict; and use warnings; to make sure they haven't made silly mistakes; novices should use them for the same reason.
8. Of itself, OK. However, the code in the condition has not really done very much. It skips the second line in the file; it will later skip the fourth, the sixth, the eighth, etc.
9. OK.
10. OK, but... if you're only interested in printing the lines after a line starting with @, or only interested in printing line numbers 2N+2 for integral N, then there is no need to build up the entire string in memory before printing. It is simpler to print each line that needs printing as it is found.
Basically, I'm looping through HTML files looking for a couple of regexes. They match, which is fine, but I don't expect every file to contain matches; yet when the loop runs, every iteration reports the same match (despite it not being in that file). I assume that by using $1 it is persisting through each iteration.
I've tried using an arbitrary regex straight after each real match to reset it, but that doesn't seem to work. The thread I got that idea from seemed to have a lot of argument about best practice and the original question's problem, so I thought it would be worth asking for advice specific to my code. It's likely not written in a great way either:
# array of diff filenames
opendir(TDIR, "$folder/diff/$today") || die "can't opendir $today: $!";
@diffList = grep !/^\.\.?$/, readdir(TDIR);
closedir TDIR;
# List of diff files
print "List of Diff files:\n" . join("\n", @diffList) . "\n\n";
for($counter = 0; $counter < scalar(@diffList); $counter++) {
    # Open diff file, read in to string
    $filename = $diffList[$counter];
    open FILE, "<", "$folder/diff/$today/$filename";
    while(<FILE>) {
        $lines .= $_;
    }
    close FILE or warn "$0: close today/$filename: $!";
    # Use regular expressions to extract the found differences
    if($lines =~ m/$plus1(.*?)$span/s) {
        $plus = $1;
        "a" =~ m/a/;
    } else {$plus = "0";}
    if($lines =~ m/$minus1(.*?)$span/s) {
        $minus = $1;
        "a" =~ m/.*/;
    } else {$minus = "0";}
    # If changes were found, send them to the database
    if($plus ne "0" && $minus ne "0") {
        # Do stuff
    }
    $plus = "0";
    $minus = "0";
}
If I put a print inside the "do stuff" if, it's always true and always shows the same two values that are found in one of the files.
Hopefully I've explained my situation well enough. Any advice is appreciated, thanks.
It looks like your code appends each newly-read file onto $lines without ever clearing it, so matches from an earlier file are still present on every later iteration. Have you tried explicitly emptying $lines at the end of each iteration?
It's already been answered, but you could also consider a different syntax for reading the file. It can be noticeably quicker and helps you avoid little bugs like this.
Just add this to read the file between the open/close:
local $/ = undef;
$lines = <FILE>;
That'll temporarily unset the line separator so a single read grabs the whole file at once. Just enclose it in a { } block if you need to read another file line by line in the same scope.
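A common way to keep the change to $/ contained is a do block, so the slurp and the localisation end together. A self-contained sketch (using an in-memory file handle so it runs without a real file):

```perl
use strict;
use warnings;

my $data = "line one\nline two\nline three\n";
open my $fh, '<', \$data or die $!;

# local $/ inside the do block means "no line separator" only for the
# duration of the block: one read returns the entire file.
my $all = do { local $/; <$fh> };

# $/ is back to "\n" here, so other handles still read line by line.
my $count = () = $all =~ /\n/g;   # count the newlines in the slurp
print "read ", length($all), " bytes, $count lines\n";
```

This also resets $/ correctly if the read dies, which a bare { } block with an explicit assignment would not.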
So far, I've been successful generating output to individual files by opening a file for output in the outer loop and closing it after all output is written. I used a counting variable ($x) with .txt appended to create each filename, and wrote to the same directory as my Perl script. I want to step the code up a bit: prompt the user for a file name, open that file once and only once, and write my output one "printed letter" per page. Is this possible in plain text? From what I understand, chr(12) is the ASCII form feed character and will get me close to what I want, but is there a better way? Thanks in advance, guys. :)
sub PersonalizeLetters{
    print "\n\n Beginning finalization of letters...";
    print "\n\n I need a filename to save these letters to.";
    print "\n Filename > ";
    $OutFileName = <STDIN>;
    chomp ($OutFileName);
    open(OutFile, ">$OutFileName");
    for ($x=0; $x<$NumRecords; $x++){
        $xIndex = (6 * $x);
        $clTitle = $ClientAoA[$xIndex];
        $clName = $ClientAoA[$xIndex+1];
        #I use this 6x multiplier because my records have 6 elements.
        #For this routine I'm only interested in name and title.
        #Reset OutLetter array.
        #MiddleLetter has other merged fields that aren't specific to who's receiving the letter.
        @OutLetter = @MiddleLetter;
        for ($y=0; $y<=$ifLength; $y++){
            #Step through line by line and insert the name.
            $WorkLine = $OutLetter[$y];
            $WorkLine =~ s/\[ClientTitle\]/$clTitle/;
            $WorkLine =~ s/\[ClientName\]/$clName/;
            $OutLetter[$y] = $WorkLine;
        }
        print OutFile "@OutLetter";
        #Will chr(12) work here, or is there something better?
        print OutFile chr(12);
        $StatusX = $x+1;
        print "Writing output $StatusX of $NumRecords... \n\n";
    }
    close(OutFile);
}
Separate the "pages" with form feeds, but you have to print one after each page, not only at the end. I'm not sure exactly what PersonalizeLetters is supposed to do, but it looks like you want to use it to print all of the letters. In that case, I think you just need to restructure it a bit: do all the setup outside the subroutine, pass in the filename, then do whatever you need for each record. After you process a record, print the form feed:
sub PersonalizeLetters
{
    my( $OutFileName ) = @_;

    open my $out, '>', $OutFileName
        or die "Could not open $OutFileName: $!";

    for( $x=0; $x < $NumRecords; $x++ )
    {
        print "Writing output $x of $NumRecords...\n\n";
        print $out $stuff_for_this_record;
        print $out "\f";
    }
}
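On the chr(12) point from the question: chr(12) is the ASCII form feed character (FF), which Perl spells "\f" in double-quoted strings, so the two are interchangeable; ASCII line feed is chr(10), written "\n". A tiny check:

```perl
use strict;
use warnings;

# "\f" and chr(12) are the same byte: ASCII form feed, which printers
# and pagers treat as "start a new page". Line feed is chr(10) ("\n").
print ord("\f"), " ", ord("\n"), "\n";          # 12 10
print "\f" eq chr(12) ? "same\n" : "differ\n";  # same
```

So print $out "\f"; and print $out chr(12); produce identical output; "\f" is simply the more readable spelling.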