Perl while loops not working - perl

I'm quite new to perl and apologies if this has already been answered in a previous discussion. I have a script that needs to use the declared variables outside the loops, but only one loop is working, even though I have declared the variables outside of the loop, the code is:
my $sample;
open(IN, 'ls /*_R1_*.gz |');
while (my $sample = <IN>) {
chomp $sample;
print "sample = $sample\n";
my $fastq1="${sample}"; #need to use fastq1 later on hence it's declared here
my $sample2;
open(IN, 'ls /*_R2_*.gz |');
while (my $sample2 = <IN>) {
chomp $sample2;
print "sample2 = $sample2\n";
my $fastq2="${sample2}"; #need to use fastq2 later on hence it's declared here
}
}
Sample2 works but sample1 does not, only the first sample is output and then the loop goes onto sample2, the output is:
sample =/sample1_R1_001.fastq.gz
sample2 =/sample1_R2_001.fastq.gz
sample2 =/sample2_R2_001.fastq.gz
sample2 =/sample3_R2_001.fastq.gz
etc..
Can anyone figure this out?
Thanks

From your comments, I assume that your problem is probably that you declare $fastq1 and $fastq2 inside the loop. That means they will be out of scope outside the loops, and not accessible. You need something like:
my ($fastq1, $fastq2);
while ( ... ) {
....
$fastq1 = $sample;
}
Note that this will only save the last value in the loop of that variable. The others will of course be overwritten each loop iteration. If you have more values to save, use an array or hash.
Some other notes on your code.
You should always use
use strict;
use warnings;
Not doing so is a very bad idea, as it will only hide the errors and warnings, not solve them.
my $sample;
You declare this variable twice.
open(IN, 'ls /*_R1_*.gz |');
This is just bad on all possible levels:
System calls are always the least desirable option, unless no alternatives exist
Perl has many ways of reading file names
Parsing the output of ls is fragile and not portable
Piping the result of the system command through open is compounding the other flaws with this approach.
Recommended solution: Use either opendir + readdir or glob:
for my $files (</*_R1_*.gz>) { ... }
# or
opendir my $dh, "/" or die $!;
while (my file = readdir $dh) {
next unless $file =~ /_R1_.*\.gz$/;
...
}
my $fastq1 = "${sample}";
You do not need to quote a variable. Nor use support curly braces.
When declaring the variable with my inside a loop, it only retains its value that single loop iteration. Since you never use this variable, I assume you meant to use it outside the loop. But it will be out of scope there.
This can be written
my $fastq1 = $sample;
But you probably want to declare those variables outside your while loops, or they will be out of scope there. You should know that this will only save the last value for these variables, of course.
Also, as Rohit says, your loops are nested, which I assume is not what you wanted. This is most likely because you do not use a proper text editor to write your code, so your indentation is all messed up, and it is hard to see where one loop ends. Follow Rohit's advice there.

You are closing the first while loop after the end of 2nd while loop. Because of that, your 2nd while loop become a part of your 1st while loop, wherein, you are re-assigning the file handler - IN to a different file. And since you are exhausting it in the inner while loop, your outer while loop never run again.
You should close the brace before starting the next while:
while(my $sample = <IN>){
chomp $sample;
print "sample = $sample\n";
my $fastq1="${sample}";
} # You need this
my $sample2;
open(IN, 'ls /data_n2/vmistry/Fluidigm_Exome/300bp_fastq/*_R2_*.gz |');
while(my $sample2 = <IN>){
chomp $sample2;
print "sample2 = $sample2\n";
my $fastq2="${sample2}";
}
# } # Remove this

Related

Data::Dumper wraps second word's output

I'm experiencing a rather odd problem while using Data::Dumper to try and check on my importing of a large list of data into a hash.
My Data looks like this in another file.
##Product ID => Market for product
ABC => Euro
XYZ => USA
PQR => India
Then in my script, I'm trying to read in my list of data into a hash like so:
open(CONFIG_DAT_H, "<", $config_data);
while(my $line = <CONFIG_DAT_H>) {
if($line !~ /^\#/) {
chomp($line);
my #words = split(/\s*\=\>\s/, $line);
%product_names->{$words[0]} = $words[1];
}
}
close(CONFIG_DAT_H);
print Dumper (%product_names);
My parsing is working for the most part that I can find all of my data in the hash, but when I print it using the Data::Dumper it doesn't print it properly. This is my output.
$VAR1 = 'ABC';
';AR2 = 'Euro
$VAR3 = 'XYZ';
';AR4 = 'USA
$VAR5 = 'PQR';
';AR6 = 'India
Does anybody know why the Dumper is printing the '; characters over the first two letters on my second column of data?
There is one unclear thing in the code: is *product_names a hash or a hashref?
If it is a hash, you should use %product_names{key} syntax, not %product_names->{key}, and need to pass a reference to Data::Dumper, so Dumper(\%product_names).
If it is a hashref then it should be labelled with a correct sigil, so $product_names->{key} and Dumper($product_names}.
As noted by mob if your input has anything other than \n it need be cleaned up more explicitly, say with s/\s*$// per comment. See the answer by ikegami.
I'd also like to add, the loop can be simplified by loosing the if branch
open my $config_dat_h, "<", $config_data or die "Can't open $config_data: $!";
while (my $line = <$config_dat_h>)
{
next if $line =~ /^\#/; # or /^\s*\#/ to account for possible spaces
# ...
}
I have changed to the lexical filehandle, the recommended practice with many advantages. I have also added a check for open, which should always be in place.
Humm... this appears wrong to me, even you're using Perl6:
%product_names->{$words[0]} = $words[1];
I don't know Perl6 very well, but in Perl5 the reference should be like bellow considering that %product_names exists and is declared:
$product_names{...} = ... ;
If you could expose the full code, I can help to solve this problem.
The file uses CR LF as line endings. This would become evident by adding the following to your code:
local $Data::Dumper::Useqq = 1;
You could convert the file to use unix line endings (seeing as you are on a unix system). This can be achieved using the dos2unix utility.
dos2unix config.dat
Alternatively, replace
chomp($line);
with the more flexible
$line =~ s/\s+\z//;
Note: %product_names->{$words[0]} makes no sense. It happens to do what you want in old versions of Perl, but it rightfully throws an error in newer versions. $product_names{$words[0]} is the proper syntax for accessing the value of an element of a hash.
Tip: You should be using print Dumper(\%product_names); instead of print Dumper(%product_names);.
Tip: You might also find local $Data::Dumper::Sortkeys = 1; useful. Data::Dumper has such bad defaults :(
Tip: Using split(/\s*=>\s*/, $line, 2) instead of split(/\s*=>\s*/, $line) would permit the value to contain =>.
Tip: You shouldn't use global variable without reason. Use open(my $CONFIG_DAT_H, ...) instead of open(CONFIG_DAT_H, ...), and replace other instances of CONFIG_DAT_H with $CONFIG_DAT_H.
Tip: Using next if $line =~ /^#/; would avoid a lot of indenting.

Syntax errors at line 24 and 26. I don't know why?

syntax error at bioinfo2.pl line 24, near ");"
syntax error at bioinfo2.pl line 26, near "}"
Execution of bioinfo2.pl aborted due to compilation errors.
print "Enter file name......\n\n";
chomp($samplefile = <STDIN>);
open(INFILE,"$samplefile") or die "Could not open $samplefile";
#residue_name= ();
#residue_count= ();
while($newline = <INFILE>)
{
if ($newline =~ /^ATOM/)
{
chomp $newline;
#columns = split //, $newline;
$res = join '', $columns[17], $columns[18], $columns[19];
splice #columns,0;
$flag=0
for ($i = 0; $i<scalar(#residue_name); $i++;)
{
if (#residue_name[i] == $res)
{
#residue_count[i] = #residue_count[i] + 1;
$flag=1;
}
}
if($flag==0)
{
push(#residue_name, $res);
}
for ($i = 0; $i<scalar(#residue_name); $i++)
{
print (#residue_name[i], "-------", #residue_count[i], "\n");
}
}
}
It might be advisable to use strict; use warnings. That forces you to declare your variables (you can do so with my), and rules out many possible errors.
Here are a few things that I noticed:
In Perl5 v10 and later, you can use the say function (use 5.010 or use feature 'say'). This works like print but adds a newline at the end.
Never use the two-arg form of open. This opens some security issues. Provide an explicit open mode. Also, you can use scalars as filehandles; this provides nice features like auto-closing of files.
open my $INFILE, '<', $samplefile or die "Can't open $samplefile: $!";
The $! variable contains the reason why the open failed.
If you want to retrieve a list of elements from an array, you can use a slice (multiple subscripts):
my $res = join '', #columns[17 .. 19]; # also, range operator ".."
Note that the sigil is now an #, because we take multiple elems.
The splice #columns, 0 is a fancy way of saying “delete all elements from the array, and return them”. This is not neccessary (you don't read from that variable later). If you use lexical variables (declared with my), then each iteration of the while loop will receive a new variable. If you really want to remove the contents, you can undef #columns. This should be more efficient.
Actual error: You require a semicolon after $flag = 0 to terminate the statement before you can begin a loop.
Actual error: A C-style for-loop contains three expressions contained in parens. Your last semicolon divides them into 4 expressions, this is an error. Simply remove it, or look at my next tip:
C-style loops (for (foo; bar; baz) {}) are painful and error-prone. If you only iterate over a range (e.g. of indices), then you can use the range operator:
for my $i (0 .. $#residue_name) { ... }
The $# sigil gives the last index of an array.
When subscripting arrays (accessing array elements), then you have to include the sigil of the index:
$residue_name[$i]
Note that the sigil of the array is $, because we access only one element.
The pattern $var = $var + 1 can be shortened to $var++. This uses the increment operator.
The $flag == 0 could be abbreviated to !$flag, as all numbers except zero are considered true.
Here is a reimplementation of the script. It takes the filename as a command line argument; this is more flexible than prompting the user.
#!/usr/bin/perl
use strict; use warnings; use 5.010;
my $filename = $ARGV[0]; # #ARGV holds the command line args
open my $fh, "<", $filename or die "Can't open $filename: $!";
my #residue_name;
my #residue_count;
while(<$fh>) { # read into "$_" special variable
next unless /^ATOM/; # start a new iteration if regex doesn't match
my $number = join "", (split //)[17 .. 19]; # who needs temp variables?
my $push_number = 1; # self-documenting variable names
for my $i (0 .. $#residue_name) {
if ($residue_name[$i] == $number) {
$residue_count[$i]++;
$push_number = 0;
}
}
push #residue_name, $number if $push_number;
# are you sure you want to print this after every input line?
# I'd rather put this outside the loop.
for my $i (0 .. $#residue_name) {
say $residue_name[$i], ("-" x 7), $residue_count[$i]; # "x" repetition operator
}
}
And here is an implementation that may be faster for large input files: We use hashes (lookup tables), instead of looping through arrays:
#!/usr/bin/perl
use strict; use warnings; use 5.010;
my $filename = $ARGV[0]; # #ARGV holds the command line args
open my $fh, "<", $filename or die "Can't open $filename: $!";
my %count_residue; # this hash maps the numbers to counts
# automatically guarantees that every number has one count only
while(<$fh>) { # read into "$_" special variable
next unless /^ATOM/; # start a new iteration if regex doesn't match
my $number = join "", (split //)[17 .. 19]; # who needs temp variables?
if (exists $count_residue{$number}) {
# if we already have an entry for that number, we increment:
$count_residue{$number}++;
} else {
# We add the entry, and initialize to zero
$count_residue{$number} = 0;
}
# The above if/else initializes new numbers (seen once) to zero.
# If you want to count starting with one, replace the whole if/else by
# $count_residue{$number}++;
# print out all registered residues in numerically ascending order.
# If you want to sort them by their count, descending, then use
# sort { $count_residue{$b} <=> $count_residue{$a} } ...
for my $num (sort {$a <=> $b} keys %count_residue) {
say $num, ("-" x 7), $count_residue{$num};
}
}
It took me a while to chance down all the various errors. As others have said, use use warnings; and use strict;
Rule #1: Whenever you see syntax error pointing to a perfectly good line, you should always see if the line before is missing a semicolon. You forgot the semicolon after $flag=0.
In order to track down all the issues, I've rewritten your code into a more modern syntax:
#! /usr/bin/env perl
use strict;
use warnings;
use autodie;
print "Enter file name......\n\n";
chomp (my $samplefile = <STDIN>);
open my $input_file, '<:crlf', $samplefile;
my #residue_name;
my #residue_count;
while ( my $newline = <$input_file> ) {
chomp $newline;
next if $newline !~ /^ATOM/; #Eliminates the internal `if`
my #columns = split //, $newline;
my $res = join '', $columns[17], $columns[18], $columns[19];
my $flag = 0;
for my $i (0..$#residue_name) {
if ( $residue_name[$i] == $res ) {
$residue_count[$i]++;
$flag = 1;
}
}
if ( $flag == 0 ) {
push #residue_name, $res;
}
for my $i (0..$#residue_name) {
print "$residue_name[$i] ------- $residue_count[$i]\n";
}
}
close $input_file;
Here's a list of changes:
Lines 2 & 3: Always use use strict; and use warnings;. These will help you track down about 90% of your program errors.
Line 4: Use use autodie;. This will eliminate the need for checking whether a file opened or not.
Line 7 (and others): Using use strict; requires you to predeclare variables. Thus, you'll see my whenever a variable is first used.
Line 8: Use the three parameter open and use local variables for file handles instead of globs (i.e. $file_handle vs. FILE_HANDLE). The main reasons is that local variables are easier to pass into subroutines than globs.
Lines 9 & 10: No need to initialize the arrays, just declare them is enough.
Line 13: Always chomp as soon as you read in.
Line 14: Doing this eliminates an entire inner if statement that's embraces your entire while loop. Code blocks (such as if, while, and for) get hard to figure out when they get too long and too many embedded inside each other. Using next in this way allows me to eliminate the if block.
Line 17: Here's where you missed the semicolon which gave you your first syntax error. The main thing is I eliminated the very confusing splice command. If you want to zero out your array, you could have simply said #columns = (); which is much clearer. However, since #columns is now in scope only in the while loop, I no longer have to blank it out since it will be redefined for each line of your file.
Line 18: This is a much cleaner way of looping through all lines of your array. Note that $#residue_name gives you the last index of $#residue_name while scalar #resudue_name gives you the number of elements. This is a very important distinction! If I have an #array = (0, 1, 2, 3, 4), $#array will be 4, but scalar #array will be 5. Using the C style for loop can be a bit confusing when doing this. Should you use > or >=? Using (0..$#residue) name is obvious and eliminate the chance of errors which included the extra semi-colon inside your C style for statement. Because of the chance of errors and the complexity of the syntax, The developers who created Python have decided not allow for C style for loops.
Line 19 (and others): Using warnings pointed out that you did #residue_name[i] and it had several issues. First of all, you should use $residue_name[...] when indexing an array, and second of all, i is not an integer. You meant $i. Thus #residue_name[i] becomes $residue_name[$i].
Line 20: If you're incrementing a variable, use $foo++; or $foo += 1; and not $foo = $foo + 1;. The first two make it easier to see that you're incrementing a variable and not recalculating it's value.
Line 29: One of the great features of Perl is that variables can be interpolated inside quotes. You can put everything inside a single set of quotes. By the way, you should use . and not , if you do break up a print statement into multiple pieces. The , is a list operation. This means that what you print out is dependent upon the value of $,. The $, is a Perl variable that says what to print out between each item of a list when you interpolate a list into a string.
Please don't take this as criticism of your coding abilities. Many Perl books that teach Perl, and many course that teach Perl seem to teach Perl as it was back in the Perl 3.0 days. When I first learned Perl, it was at Perl 3.0, and much of my syntax would have looked like yours. However, Perl 5.x has been out for quite a while and contains many features that made programming easier and cleaner to read.
It took me a while to get out of Perl 3.0 habits and into Perl 4.0 and later Perl 5.0 habits. You learn by looking at what others do, and asking questions on forums like Stack Overflow.
I still can't say your code will work. I don't have your input, so I can't test it against that. However, by using this code as the basis of your program, debugging these errors should be pretty easy.

How Can I Store a File Handle in a Perl Object and how can I access the result?

I wanted to store a file handle in a Perl Object. Here is how I went about it.
sub openFiles {
my $self = shift;
open (my $itemsFile, "<", "items.txt") or die $!;
open (my $nameFile, "<", "FullNames.txt") or die $!;
$self->{itemsFile} = $itemsFile;
$self->{nameFile} = $nameFile;
return $self;
}
Then I'm looking to access some information from one of these files. Here is how I go about it.
sub getItemDescription {
my $self = #_;
chomp(my $record = $self->{itemsFile});
return $record;
}
I attempt to access it in another procedure as follows:
print "Test 3: $self->getItemDescription()\n";
My questions are as follows:
Is the way I'm saving the file handle in the object correct? If not, how is it wrong?
Is the way I'm reading the lines of the file correct? If not, how can I get it right?
Finally, is the way I'm printing the returned object correct?
This is really important to me. If there is any way that I can improve the structure of my code, i.e. making a global variable for file handling or changing the structure of the object, please let me know.
Is the way I'm saving the file handle in the object correct?
Yes.
Is the way I'm reading the lines of the file correct?
No. That just assigns the file handle. One reads a line from the file using the readline operator.
One would normally use the <...> syntax of the readline operator, but <...> is a shortcut for both readline(...) and glob(qq<...>), and Perl thinks <$self->{itemsFile}> is short for glob(qq<$self->{itemsFile}>). You have to use readline specifically
my $record = readline($self->{itemsFile});
chomp($record) if defined($record);
or do some extra work
my $fh = $self->{itemsFile};
my $record = <$fh>;
chomp($record) if defined($record);
(Note that I don't call chomp unconditionally since readline/<> can return undef.)
Finally, is the way I'm printing the returned object correct?
I presume you mean returned string, as in the string returned by getItemDescription. The catch is, you never actually call the method. ->getItemDescription() has no meaning in double quoted string literals, even after a variable. You need to move $self->getItemDescription() out of the double quotes.
You also fail to check if you've reached the end of the file.
You are close.
To read a record (line) from a filehandle, you use the builtin readline function or the <...> operator AFTER you assign the filehandle to a "simple scalar" (see edit below).
chomp(my $record = readline( $self->{itemsFile} );
my $fh = $self->{itemsFile};
chomp(my $record = <$fh>);
There is also a bug in your getItemDescription method. You'll want to say
my ($self) = #_;
instead of
my $self = #_;
The latter call is a scalar assignment of an array, which resolves to the length of the array, not the first element of the array.
EDIT: <$self->{itemsFile}> and <{$self->{itemsFile}}> do not work, as perlop explains:
If what's within the angle brackets is neither a filehandle nor a simple scalar variable containing a filehandle name, typeglob, or typeglob reference, it is interpreted as a filename pattern to be globbed, and either a list of filenames or the next filename in the list is returned, depending on context. This distinction is determined on syntactic grounds alone. That means <$x> is always a readline() from an indirect handle, but <$hash{key}> is always a glob(). That's because $x is a simple scalar variable, but $hash{key} is not--it's a hash element. Even <$x > (note the extra space) is treated as glob("$x "), not readline($x).
The openFiles piece is correct.
The errors occur primarily getItemDescription method.
First as previously mentioned my $self = #_; should be my ($self) = #_;.
However, the crux of the question is solved in the following fashion:
Change chomp(my $record = $self->{itemsFile}); to two lines:
$file1 = $self->{itemsFile};
chomp(my $record = $file1);
To clarify you must (in my experience and I tried all the solutions suggested) use a scalar value.
Finally, see the last two paragraphs in ikagami's answer.

Perl Global variable uninitialized

I'm new to perl so please bear with me.
I have script that is parsing a CSV file. To make things easier to debug I am using a state machine FSA::Rules (works great love it).
Every thing is going well only now I need to make my logs make sense, as part of this I need to record line numbers so my program looks some thing like this.
my $line = '';
my $lineCount = 0;
sub do {
...
#CSV opened
...
#State machine stuff happens here
readLine;
if ($line =~ m/.*Pattern*/){
#do stuff
}
}
sub readLine{
$line = <CSV>;
$lineCount ++;
}
But I get the following error
Use of uninitialized value $line in pattern match (m//) at
Any one know why $line would not be initialized?
Thanks.
When you reach end of file, $line = <CSV> will assign the undefined value to $line. The usual idiom is to check whether the readline function (which is implicitly called by the <> operator) returned a good value or not before proceeding ...
while (my $line = <CSV>) {
# guaranteed that $line has a defined value
...
}
but you with your sequence of calls, you are avoiding that check. Your current code also increments $lineCount even when <CSV> does not return a good value, which may not be what you want either.

Nested while loop which does not seem to keep variables appropriately

I'm an amature Perl coder, and I'm having a lot of trouble figuring what is causing this particular issue. It seems as though it's a variable issue.
sub patch_check {
my $pline;
my $sline;
while (<SYSTEMINFO>) {
chomp($_);
$sline = $_;
while (<PATCHLIST>) {
chomp($_);
$pline = $_;
print "sline $sline pline $pline underscoreline $_ "; #troubleshooting
print "$sline - $pline\n";
if ($pline =~ /($sline)/) {
#print " - match $pline -\n";
}
} #end while
}
}
There is more code, but I don't think it is relevant. When I print $sline in the first loop it works fine, but not in the second loop. I tried making the variables global, but that did not work either.
The point of the subform is I want to open a file (patches) and see if it is in (systeminfo). I also tried reading the files into arrays and doing foreach loops.
Does anyone have another solution?
It looks like your actual goal here is to find lines which are in both files, correct? The normal (and much more efficient! - it only requires you to read in each file once, rather than reading all of one file for each line in the other) way to do this in Perl would be to read the lines from one file into a hash, then use hash lookups on each line in the other file to check for matches.
Untested (but so simple it should work) code:
sub patch_check {
my %slines;
while (<SYSTEMINFO>) {
# Since we'll just be comparing one file's lines
# against the other file's lines, there's no real
# reason to chomp() them
$slines{$_}++;
}
# %slines now has all lines from SYSTEMINFO as its
# keys and the values are the number of times the
# line appears, in case that's interesting to you
while (<PATCHLIST>) {
print "match: $_" if exists $slines{$_};
}
}
Incidentally, if you're reading your data from SYSTEMINFO and PATCHLIST, then you're doing it the old-fashioned way. When you get a chance, read up on lexical filehandles and the three-argument form of open if you're not already familiar with them.
Your code is not entering the PATCHLIST while loop the 2nd time through the SYSTEMINFO while loop because you already read all the contents of PATCHLIST the first time through. You'd have to re-open the PATCHLIST filehandle to accomplish what you're trying to do.
That's a pretty inefficient way to see if the lines of one file match the lines of another file. Take a look at grep with the -f flag for another way.
grep -f PATCHFILE SYSTEMINFO
What I like to do in such cases is: read one file and create keys for a hash from the values you are looking for. And then read the second file and look if the keys are already existing. In this way you have to read each file only once.
Here is example code, untested:
sub patch_check {
my %patches = ();
open(my $PatchList, '<', "patch.txt") or die $!;
open(my $SystemInfo, '<', "SystemInfo.txt") or die $!;
while ( my $PatchRow = <$PatchList> ) {
$patches($PatchRow) = 0;
}
while ( my $SystemRow = <$SystemInfo> ) {
if exists $patches{$SystemRow} {
#The Patch is in System Info
#Do whateever you want
}
}
}
You can not read one file inside the read loop of another. Slurp one file in, then have one loop as a foreach line of the slurped file, the outer loop, the read loop.