Perl script - Confusing error - perl

When I run this code, I am purely trying to get all the lines containing the word "that" in them. Sounds easy enough. But when I run it, I get a list of matches that contain the word "that" but only at the end of the line. I don't know why it's coming out like this and I have been going crazy trying to solve it. I am currently getting an output of 268 total matches, and the output I need is only 13. Please advise!
#!/usr/bin/perl -w
#Usage: conc.shift.pl textfile word
open (FH, "$ARGV[0]") || die "cannot open";
#array = (1,2,3,4,5);
$count = 0;
while($line = <FH>) {
chomp $line;
shift #array;
push(#array, $line);
$count++;
if ($line =~ /that/)
{
$output = join(" ",#array);
print "$output \n";
}
}
print "Total matches: $count\n";

Don't you want to increment your $count variable only if the line contains "that", i.e.:
if ($line =~ /that/) {
$count++;
instead of incrementing the counter before checking if $line contains "that", as you have it:
$count++;
if ($line =~ /that/) {
Similarly, I suspect that your push() and join() calls, for stashing a matching line in #array, should also be within the if block, only executed if the line contains "that".
Hope this helps!

Related

Populate an array by splitting a string

I am trying to convert a string into an array based on space delimiter.
My input file looks like this:
>Reference
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnctcACCATGGTGTCGACTC
TTCTATGGAAACAGCGTGGATGGCGTCTCCAGGCGATCTGACGGTTCACTAAACGAGCTC
Ignoring the line starting with >, the length of rest of the string is 360.
I am trying to convert this into an array.
Here's my code so far:
#!/usr/bin/perl
use strict;
use warnings;
#### To to change bases with less than 10X coverage to N #####
#### Take depth file and consensus fasta file as input arguments ####
my ($in2) = #ARGV;
my $args = $#ARGV + 1;
if ( $args != 1 ) {
print "Error!!! Insufficient Number of Argumrnts\n";
print "Usage: $0 <consensus fasta file> \n";
}
#### Open a filehandle to read in consensus fasta file ####
my $FH2;
my $line;
my #consensus;
my $char;
open($FH2, '<', $in2) || die "Could not open file $in2\n";
while ( <$FH2> ) {
$line = $_;
chomp $line;
next if $line =~ />/; # skip header line
$line =~ s/\s+//g;
my $len = length($line);
print "$len\n";
#print "$line";
#consensus = split(// , $line);
print "$#consensus\n";
#print "#consensus\n";
#for $char (0 .. $#consensus){
# print "$char: $consensus[$char]\n";
# }
}
The problem is the $len variable returns a value of 60 instead of 360 and $#consensus returns a value of 59 instead of 360 which is the length of the string.
I have removed the whitespace after each line with code $line =~ s/\s+//g;but it still is not working.
It looks like your code is essentially working. It's just your checking logic that makes no sense. I'd do the following:
use strict;
use warnings;
if (#ARGV != 1) {
print STDERR "Usage: $0 <consensus fasta file>\n";
exit 1;
}
open my $fh, '<', $ARGV[0] or die "$0: cannot open $ARGV[0]: $!\n";
my #consensus;
while (my $line = readline $fh) {
next if $line =~ /^>/;
$line =~ s/\s+//g;
push #consensus, split //, $line;
}
print "N = ", scalar #consensus, "\n";
Main things to note:
Error messages should go to STDERR, not STDOUT.
If an error occurs, the program should exit with an error code, not keep running.
Error messages should include the name of the program and the reason for the error.
chomp is redundant if you're going to remove all whitespace anyway.
As you're processing the input line by line, you can just keep pushing elements to the end of #consensus. At the end of the loop it'll have accumulated all characters across all lines.
Examining #consensus within the loop makes little sense as it hasn't finished building yet. Only after the loop do we have all characters we're interested in.

perl script miscounting because of empty lines

the below script is basically catching the second column and counting the values. The only minor issue I have is that the file has empty lines at the end (it's how the values are being exported) and because of these empty lines the script is miscounting. Any ideas please? Thanks.
my $sum_column_b = 0;
open my $file, "<", "file_to_count.txt" or die($!);
while( my $line = <$file>) {
$line =~ m/\s+(\d+)/; #regexpr to catch second column values
$sum_column_b += $1;
}
print $sum_column_b, "\n";
I think the main issue has been established, you are using $1 when it is not conditionally tied to the regex match, which causes you to add values when you should not. This is an alternative solution:
$sum_column_b += $1 if $line =~ m/\s+(\d+)/;
Typically, you should never use $1 unless you check that the regex you expect it to come from succeeded. Use either something like this:
if ($line =~ /(\d+)/) {
$sum += $1;
}
Or use direct assignment to a variable:
my ($num) = $line =~ /(\d+)/;
$sum += $num;
Note that you need to use list context by adding parentheses around the variable, or the regex will simply return 1 for success. Also note that, like Borodin says, this will give an undefined value when the match fails, and you must add code to check for that.
This can be handy when capturing several values:
my #nums = $line =~ /(\d+)/g;
The main problem is that if the regex does not match, then $1 will hold the value it received in the previous successful match. So every empty line will cause the previous line to be counted again.
An improvement would be:
my $sum_column_b = 0;
open my $file, "<", "file_to_count.txt" or die($!);
while( my $line = <$file>) {
next if $line =~ /^\s*$/; # skip "empty" lines
# ... maybe skip other known invalid lines
if ($line =~ m/\s+(\d+)/) { #regexpr to catch second column values
$sum_column_b += $1;
} else {
warn "problematic line '$line'\n"; # report invalid lines
}
}
print $sum_column_b, "\n";
The else-block is of course optional but can help noticing invalid data.
Try putting this line just after the while line:
next if ( $line =~ /^$/ );
Basically, loop around to the next line if the current line has no content.
#!/usr/bin/perl
use warnings;
use strict;
my $sum_column_b = 0;
open my $file, "<", "file_to_count.txt" or die($!);
while (my $line = <$file>) {
next if (m/^\s*$/); # next line if this is unsignificant
if ($line =~ m/\s+(\d+)/) {
$sum_column_b += $1;
}
}
print "$sum_column_b\n";

Ignore lines in a file till match and process lines after that

I am looping over lines in a file and when matched a particular line, i want to process the lines after the current (matched) line. I can do it :-
open my $fh, '<', "abc" or die "Cannot open!!";
while (my $line = <$fh>){
next if($line !~ m/Important Lines below this Line/);
last;
}
while (my $line = <$fh>){
print $line;
}
Is there a better way to do this (code needs to be a part of a bigger perl script) ?
I'd use flip-flop operator:
while(<DATA>) {
next if 1 .. /Important/;
print $_;
}
__DATA__
skip
skip
Important Lines below this Line
keep
keep
output:
keep
keep

Perl: Searching a file

I am creating a perl script that takes in the a file (example ./prog file)
I need to parse through the file and search for a string. This is what I thought would work, but it does not seem to work. The file is one work per line containing 50 lines
#array = < >;
print "Enter the word you what to match\n";
chomp($match = <STDIN>);
foreach $line (#array){
if($match eq $line){
print "The word is a match";
exit
}
}
You're chomping your user input, but not the lines from the file.
They can't match; one ends with \n the other does not. Getting rid of your chomp should solve the problem. (Or, adding a chomp($line) to your loop).
$match = <STDIN>;
or
foreach $line (#array){
chomp($line);
if($match eq $line){
print "The word is a match";
exit;
}
}
Edit in the hope that the OP notices his mistake from the comments below:
Changing eq to == doesn't "fix" anything; it breaks it. You need to use eq for string comparison. You need to do one of the above to fix your code.
$a = "foo\n";
$b = "bar";
print "yup\n" if ($a == $b);
Output:
yup

Why doesn't my Perl code work when I put it in a foreach loop?

This code outputs the scalars in the row array properly:
$line = "This is my favorite test";
#row = split(/ /, $line);
print $row[0];
print $row[1];
The same code inside a foreach loop doesn't print any scalar values:
foreach $line (#lines){
#row = split(/ /, $line);
print $row[0];
print $row[1];
}
What could cause this to happen?
I am new to Perl coming from python. I need to learn Perl for my new position.
As already mentioned in Jefromi's comments, whatever the problem is, it exists outside of the code you posted. This works entirely fine:
$lines[0] = "This is my favorite test";
foreach $line (#lines) {
#row = split(/ /, $line);
print $row[0];
print $row[1];
}
The output is Thisis
When I run into these sorts of problems where I wonder why something in a block is not happening, I add some debugging code to ensure I actually enter the block:
print "\#lines has " . #lines . " elements to process\n";
foreach my $line ( #lines )
{
print "Processing [$line]\n";
...;
}
You can also do this in your favorite debugger by setting breakpoints and inspecting variables, but that's a bit too much work for me. :)
If you need to learn Perl and already know Python, you shouldn't have that much trouble going through Learning Perl in a couple of days.