Perl: averaging numbers in a file

I have a file where each line consists of a numerical value:
1
2
3
3
1
My function looks something like this:
print "Enter file name to average \n";
$infile = <>;
open IN, "$infile";
$total = 0;
$count =0;
while (my $line = <>) {
$total +=$line;
$count ++=;
}
print "Average = ", $total / $count, "\n";
close(IN);
But I'm getting an error at the $count ++=; line saying that there's a syntax error near "+=;".

Just do $count++, no =.
See http://perldoc.perl.org/perlop.html#Auto-increment-and-Auto-decrement.

There is no need for = after the increment; just write $count++. A simpler way to do it:
#!/usr/bin/perl
use strict;
use warnings;
my $total = 0;
my $count = 0;
while (<>)
{
$total += $_;
$count++;
}
print "Average = ", $total/$count, "\n";
_____file.txt_____
1
2
3
3
1
Run your program as:
./scriptname file.txt
Output:
Average = 2

A more robust version of your code would look like this:
#!/usr/bin/perl
use strict;
use warnings;
print "Enter file name to average \n";
chomp( my $infile = <> ); #remove new line characters
open my $IN, '<', $infile
or die "unable to open file: $! \n"; #use 3 arg file open syntax
my $total = 0;
my $count = 0;
while ( my $line = <$IN> ) {## you should use <> on file handle
chomp $line;
next if ($line=~/^\s*$/); #ignore blank lines
$total += $line;
$count++;
}
if($count > 0){
print "Average = ", $total / $count, "\n";
}
else{
print "Average cannot be calculated Check if file is blank \n";
}
close($IN);

Related

The output of a subroutine is returning 0

I have written a script which uses a subroutine to call percentage of nucleotides in a given sequence. When I run the script the output for each nucleotide percentage is always shown to be zero.
Here's my code;
#!/usr/bin/perl
use strict;
use warnings;
#### Subroutine to report percentage of each nucleotide in DNA sequence ####
my $input = $ARGV[0];
my $nt = $ARGV[1];
my $args = $#ARGV +1;
if($args != 2){
print "Error!!! Insufficient number of arguments\n";
print "Usage: $0 <input fasta file>\n";
}
my($FH, $line);
open($FH, '<', $input) || die "Could\'nt open file: $input\n";
$line = do{
local $/;
<$FH>;
};
$line =~ s/>(.*)//g;
$line =~ s/\s+//g;
my $perc = perc_nucleotide($line , $nt);
printf("The percentage of $nt nucleotide in given sequence is %.0f", $perc);
print "\n";
sub perc_nucleotide {
my($line, $nt) = @_;
print "$nt\n";
my $count = 0;
if( $nt eq "A" || $nt eq "T" || $nt eq "G" || $nt eq "C"){
$count++;
}
my $total_len = length($line);
my $perc = ($count/$total_len)*100;
}
I think that I am setting the $count variable wrong. I tried different ways but can't figure it out.
This is the input file
>XM_024894547.1 Trichoderma citrinoviride Redoxin (BBK36DRAFT_1163529), partial mRNA
ATGGCCTTCCGTCTCCCTCTGCGCCGCATTGCCCTGGCCCGCCCCGCCACCGTTGCGCGTGGCTTCCACT
CGACGCCCCGCGCCCTGGTCAAGGTCGGCGACGAGGTCCCGAGCTTGGAGCTGTTCGAGAAGTCGGCCGC
CAGCAAGATCAACCTGGCCGACGAGTTCAAGAAGGGCGACGGCTACATTGTCGGCGTCCCGGGCGCCTTC
TCCGGCACCTGCTCCGGCACCCACGTCCCGTCGTACATCAACCACCCTGACATCAAGACGGCCGGCCAGG
TCTTTGTCGTCTCCGTCAACGACCCCTTTGTCATGAAGGCTTGGGCAGACCAGCTGGATCCCGCCGGAGA
GACAGGAATCCGGTTCGTTGCCGACCCCACGGCTGAGTTCACAAAGGCTCTGGAACTGGGATTCGACGAC
GCTGCTCCTCTGTTCGGAGGCACCCGAAGCAAGCGCTATGCTCTCAAGGTTAAGGATGGCAAGGTCACTG
CCGCCTTTGTTGAGCCCGACAACACGGGCACTTCCGTGTCAATGGCCGACAAGGTCCTCAGCTAA
The problem is here:
my $perc = perc_nucleotide($line , $nt);
printf("The percentage of $nt nucleotide in given sequence is %.0f", $perc);
perc_nucleotide is returning 0.18018018018018, but the format %.0f says to print it with no decimal places, so it gets rounded to 0. You should probably use something more like %.2f.
It's also worth noting that perc_nucleotide does not have a return. It still works, but for reasons that might not be obvious.
perc_nucleotide sets my $perc = ($count/$total_len)*100; but never uses that $perc. The $perc in the main program is a different variable.
perc_nucleotide does return something: every Perl subroutine without an explicit return returns the "last evaluated expression". In this case it's my $perc = ($count/$total_len)*100; but the last-evaluated-expression rules can get a bit tricky.
It's easier to read and safer to have an explicit return: return ($count/$total_len)*100;
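A minimal sketch of both points, assuming a hypothetical helper named percent_of (not part of the original script) and a made-up ten-character sequence. It uses an explicit return and shows how the printf format changes what is displayed:

```perl
use strict;
use warnings;

# Hypothetical helper: percentage of one nucleotide in a sequence.
sub percent_of {
    my ($seq, $nt) = @_;
    return 0 unless length $seq;            # avoid division by zero
    my $count = () = $seq =~ /\Q$nt\E/g;    # number of matches of $nt
    return ($count / length($seq)) * 100;   # explicit return
}

my $perc = percent_of('AACGTTTTGG', 'A');   # 2 of 10 characters are A

printf "%.0f\n", 0.18018018018018;   # prints 0
printf "%.2f%%\n", $perc;            # prints 20.00%
```

The explicit return makes it obvious which value leaves the subroutine, regardless of what the last statement happens to be.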
I corrected the script and it now gives the right answers.
#!/usr/bin/perl
use strict;
use warnings;
##### Subroutine to calculate percentage of all nucleotides in a DNA sequence #####
my $input = $ARGV[0];
my $nt = $ARGV[1];
my $args = $#ARGV + 1;
if($args != 2){
print "Error!!! Insufficient number of arguments\n";
print "Usage: $0 <input_fasta_file> <nucleotide>\n";
}
my($FH, $line);
open($FH, '<', $input) || die "Couldn\'t open input file: $input\n";
$line = do{
local $/;
<$FH>;
};
chomp $line;
#print $line;
$line =~ s/>(.*)//g;
$line =~ s/\s+//g;
#print "$line\n";
my $total_len = length($line);
my $perc_of_nt = perc($line, $nt);
printf("The percentage of nucleotide $nt in a given sequence is %.2f%%", $perc_of_nt); # changed
print "\n"; # changed
#print "$total_len\n";
sub perc{
my($line, $nt) = @_;
my $char; my $count = 0;
foreach $char (split //, $line){ # changed
if($char eq $nt){
$count += 1;
}
}
return (($count/$total_len)*100); # changed
}
The answer for the above input file is:
Total_len = 555
The percentage of nucleotide A in a given sequence is 18.02%
The percentage of nucleotide T in a given sequence is 18.74%
The percentage of nucleotide G in a given sequence is 28.47%
The changes I made are marked with "# changed" comments.
Thanks for amazing insight!!!

How to take input from one csv file and count the words in another file in perl

I am very new to Perl. I want to take the string from the first column of one CSV file, count how often that word occurs in another file, and write the result to a third file. Here is my code:
#!/usr/bin/perl
$inputfile = 'Input.txt';
$outputfile = 'Out.csv';
$file = 'File.csv';
open(INPUT, "<$inputfile") or die "Could not read from $inputfile, program halting.";
open(OUTPUT, ">$outputfile") or die "Could not open $outputfile, program halting.";
open(FILE, "<$file") or die "Could not read from $file, program halting";
@temp;
@token;
$count;
#skip first line of approved file
if(<FILE>)
{
(@temp) = split (/\,/);
}
$count = 0;
while(<FILE>)
{
@temp = split (/\,/);
print "First Column - @temp[0], ";
print "Count - @temp[1], ";
print "Priority - @temp[3], ";
$count = 0;
while(<INPUT>)
{
#read the fields in the current record into an array
@words = split(/\s+/);
foreach $word (@words)
{
$temp1 = @temp[0];
if($word == $temp1)
{
$count++;
}
}
}
print "$temp1 - Count - $count \n ";
print OUTPUT "$temp1,$count,@temp[3]";
print OUTPUT "\n";
}
close INPUT;
close OUTPUT;
close FILE;
print "Done, please check the output file.\n";
Somebody please help.
First and foremost, always start your script with:
use strict;
use warnings;
You have misunderstood how to get a single value out of an array. An array element is a scalar, so it takes the $ sigil:
@temp = split (/\,/); # @temp is the array
print "First Column - $temp[0], "; # $temp[0] is a scalar element of the array
print "Count - $temp[1], "; # $temp[1] is a scalar element of the array
print "Priority - $temp[3], "; # $temp[3] is a scalar element of the array
and here
foreach $word (@words)
{
$temp1 = $temp[0]; #this should be scalar.
if($word eq $temp1) # compare strings with eq; == compares numbers
{
$count++;
}
}
Wherever you use @temp[0], @temp[1], ... it is the wrong usage; it should be $temp[0], $temp[1], $temp[2], ...
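Beyond the sigils, the nested while(<INPUT>) loop in the question reads the input file to the end during the first CSV row, so later rows would find nothing left to read. One way around that, sketched here under assumptions, is to count every word once into a hash and then look each CSV key up. The helper name count_words and the in-memory sample data are made up for illustration; they stand in for Input.txt and File.csv.

```perl
use strict;
use warnings;

# Hypothetical helper: count how often each whitespace-separated word
# occurs in the text read from a filehandle.
sub count_words {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        $count{$_}++ for split ' ', $line;
    }
    return \%count;
}

# In-memory stand-ins for Input.txt and File.csv (made-up sample data).
my $text = "apple pear apple\nplum apple pear\n";
my $csv  = "Word,Count,Other,Priority\napple,0,x,high\npear,0,x,low\nkiwi,0,x,low\n";

open my $in, '<', \$text or die "Cannot open in-memory text: $!";
my $count = count_words($in);
close $in;

open my $rows, '<', \$csv or die "Cannot open in-memory CSV: $!";
<$rows>;    # skip the header line
while (my $row = <$rows>) {
    chomp $row;
    my @temp = split /,/, $row;
    my $n = $count->{ $temp[0] } // 0;    # words never seen count as 0
    print "$temp[0],$n,$temp[3]\n";
}
close $rows;
```

Reading the text file only once also avoids rescanning it for every CSV row.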

Perl script grep

The script prints every input line; I want it to print only the input lines that are also present in another file.
#!/usr/bin/perl -w
open("file", "text.txt");
@todd = <file>;
close "file";
while(<>){
if( grep( /^$_$/, @todd)){
#if( grep @todd, /^$_$/){
print $_;
}
print "\n";
}
if for example file contains
1
3
4
5
7
and the input file that will be read from contains
1
2
3
4
5
6
7
8
9
I would want it to print 1,3,4,5 and 7
but 1-9 are being printed instead
UPDATE******
This is my code now and I am getting this error
readline() on closed filehandle todd at ./may6test.pl line 3.
#!/usr/bin/perl -w
open("todd", "<text.txt");
@files = <todd>; #file looking into
close "todd";
while( my $line = <> ){
chomp $line;
if ( grep( /^$line$/, @files) ) {
print $_;
}
print "\n";
}
which makes no sense to me because I have this other script that is basically doing the same thing
#!/usr/bin/perl -w
open("file", "<text2.txt"); #
@file = <file>; #file looking into
close "file"; #
while(<>){
$temp = $_;
$temp =~ tr/|/\t/; #puts tab between name and id
my ($name, $number1, $number2) = split("\t", $temp);
if ( grep( /^$number1$/, @file) ) {
print $_;
}
}
print "\n";
OK, the problem here is that grep sets $_ too. So grep( /^$_$/, @todd ) matches each element of the array against itself, which will practically always give you every element in the array.
At a basic level - you need to:
while ( my $line = <> ) {
chomp $line;
if ( grep { /^$line$/ } @todd ) {
#do something
}
}
But I'd suggest instead that you might want to consider building a hash of your lines instead:
open( my $input, '<', "text.txt" ) or die $!;
my %in_todd = map { $_ => 1 } <$input>;
close $input;
while (<>) {
print if $in_todd{$_};
}
Note - you might want to watch for trailing linefeeds.
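A minimal sketch of the hash approach that also deals with trailing linefeeds, assuming a hypothetical helper named lines_in_both and in-memory arrays in place of the two files. Both sides are chomped before comparison, so a final line without a newline still matches:

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of @$candidates that also occur in
# @$reference, comparing them without their trailing newlines.
sub lines_in_both {
    my ($reference, $candidates) = @_;
    my %seen;
    for my $line (@$reference) {
        chomp(my $key = $line);
        $seen{$key} = 1;
    }
    my @found;
    for my $line (@$candidates) {
        chomp(my $key = $line);
        push @found, $key if $seen{$key};
    }
    return @found;
}

my @todd  = ("1\n", "3\n", "4\n", "5\n", "7");    # last line lacks a newline
my @input = map { "$_\n" } 1 .. 9;
print join(',', lines_in_both(\@todd, \@input)), "\n";    # prints 1,3,4,5,7
```

A hash lookup is an exact string comparison, so it also sidesteps regex metacharacters in the input lines.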

Perl simple filehandling of text

What this program is meant to do is that it reads a text file which looks like:
Item \t\t Price
apple \t\t 20
orange \t\t 50
lime \t\t 30
I'm using the split function to split these 2 columns, and then I should apply a 25% discount to all items and print the result to a new file. My code so far does what I want, but the new text file has a '0' value under the last number in the price column. I also get 2 warnings if I run it with "use warnings", which are:
Use of uninitialized value $item in multiplication * ...
Use of uninitialized value $item[0] in concatenation (.) ...
I should also report the total number of items processed, but I get five 1's instead of 5 (11111 instead of 5).
use strict;
use warnings;
my $filename = 'shop.txt';
if (-e $filename){
open (IN, $filename);
}
else{
die "Can't open input file for reading: $!";
}
open (OUT,">","discount.txt") or die "Can't open output file for writing: $!";
my $header = <IN>;
print OUT $header;
while (<IN>) {
chomp;
my @items = split(/\t\t/);
foreach my $item ($items[1]){
my $discount = $item * (0.75);
print OUT "$items[0]\t\t$discount\n";
}
}
This is too complicated: it is not clear what you are doing in the foreach loop, and you are not skipping empty lines. Keep it simple:
use warnings;
use strict;
use v5.10;
<>; # skip header
while(my $line = <>)
{
chomp $line;
next unless ($line);
my ($title, $price ) = split /\s+/, $line;
if( $title && defined $price )
{
$price *= 0.75;
say "$title\t\t$price";
}
}
and run it like this:
perl script.pl <input.txt >output.txt
use strict;
use warnings;
my $filename = 'shop.txt';
if (-e $filename){
open (IN, $filename);
}
else{
die "Can't open input file for reading: $!";
}
open (OUT,">","discount.txt") or die "Can't open output file for writing: $!";
my $header = <IN>;
my $item;
my $price;
print OUT $header;
while (<IN>) {
chomp;
($item, $price) = split(/\t\t/);
my $discount = $price*0.75;
print OUT "$item $discount\n";
}
This should help! :)
If the total item count isn't very important to you:
$ perl -wane '$F[1] *= 0.75 if $. > 1; print join("\t", @F), "\n";' input.txt
Output:
Item Price
apple 15
orange 37.5
lime 22.5
If you really need the total item count:
$ perl -we 'while (<>) { @F = split; if ($. > 1) { $F[1] *= 0.75; $i++ } print join("\t", @F), "\n"; } print "$i items\n";' input.txt
Output:
Item Price
apple 15
orange 37.5
lime 22.5
3 items
I'd use this approach
#!/usr/bin/perl
use strict;
use warnings;
my %items;
my $filename = 'shop.txt';
my $discount = 'discount.txt';
open my $in, '<', $filename or die "Failed to open file! : $!\n";
open my $out, ">", $discount or die "Can't open output file for writing: $!";
print $out "Item\t\tPrice\n";
my $cnt = 0;
while (my $line = <$in>) {
chomp $line;
if (my ($item,$price) = $line =~ /(\w.+)\s+([0-9.]+)/){
$price = $price * (0.75);
print $out "$item\t\t$price\n";
$items{$item} = $price;
$cnt++;
}
}
close($in);
close($out);
my $total = keys %items;
print "Total items - $total \n";
print "Total items - $cnt\n";
Using regex capture groups to capture the item and price (using \w.+ in case the item is 2 words like apple sauce), this will also prevent empty lines from printing to file.
I also hard coded the Item and Price header, probably a good idea if you are going to be using a consistent header.
Hope it helps
---Update ----
I added 2 examples of a total count in my script. The first one uses a hash and prints the hash size; the second uses a counter. The hash option is good, except when your list has 2 items with the same name: the hash entry is then overwritten by the last item found with that name, so the item is counted only once. The counter is a simple solution.
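The difference between the two counts only shows up with a repeated item name. A small sketch with made-up sample lines (the repeated "apple" entry is an assumption for illustration, not from shop.txt):

```perl
use strict;
use warnings;

# Made-up sample with a repeated item name.
my @lines = ("apple\t\t20", "lime\t\t30", "apple\t\t40");

my (%items, $cnt);
for my $line (@lines) {
    my ($item, $price) = split /\t\t/, $line;
    $items{$item} = $price * 0.75;    # a repeated name overwrites the earlier entry
    $cnt++;
}

my $unique = keys %items;             # scalar context gives the number of keys
print "Distinct items: $unique\n";    # prints Distinct items: 2
print "Lines counted:  $cnt\n";       # prints Lines counted:  3
```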

How to count the number of specific characters in each line of a file?

I'm trying to count the number of 'N's in a FASTA file which is:
>Header
AGGTTGGNNNTNNGNNTNGN
>Header2
AGNNNNNNNGNNGNNGNNGN
In the end I want the number of 'N's in each read (each header marks one read) so that I can make a histogram. The output would look something like this:
# of N's # of Reads
0 300
1 240
etc...
That is, there are 300 sequences (reads) that contain zero 'N's.
use strict;
use warnings;
my $file = shift;
my $output_file = shift;
my $line;
my $sequence;
my $length;
my $char_N_count = 0;
my @array;
my $count = 0;
if (!defined ($output_file)) {
die "USAGE: Input FASTA file\n";
}
open (IFH, "$file") or die "Cannot open input file$!\n";
open (OFH, ">$output_file") or die "Cannot open output file $!\n";
while($line = <IFH>) {
chomp $line;
next if $line =~ /^>/;
$sequence = $line;
@array = split ('', $sequence);
foreach my $element (@array) {
if ($element eq 'N') {
$char_N_count++;
}
}
print "$char_N_count\n";
}
Try this. I changed a few things like using scalar file handles. There are many ways to do this in Perl, so some people will have other ideas. In this case I used an array which may have gaps in it - another option is to store results in a hash and key by the count.
Edit: Just realised I'm not using $output_file, because I have no idea what you want to do with it :) Just change the 'print' at the end to 'print $out_fh' if your intent is to write to it.
use strict;
use warnings;
my $file = shift;
my $output_file = shift;
if (!defined ($output_file)) {
die "USAGE: $0 <input_file> <output_file>\n";
}
open (my $in_fh, '<', $file) or die "Cannot open input file '$file': $!\n";
open (my $out_fh, '>', $output_file) or die "Cannot open output file '$output_file': $!\n";
my @results = ();
while (my $line = <$in_fh>) {
next if $line =~ /^>/;
my $num_n = ($line =~ tr/N//);
$results[$num_n]++;
}
print "# of N's\t# of Reads\n";
for (my $i = 0; $i < scalar(@results) ; $i++) {
unless (defined($results[$i])) {
$results[$i] = 0;
# another option is to 'next' if you don't want to show the zero totals
}
print "$i\t\t$results[$i]\n";
}
close($in_fh);
close($out_fh);
exit;
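The hash option mentioned above could look roughly like the sketch below. It assumes each read sits on a single line, as in the sample in the question; the helper name n_histogram and the in-memory FASTA string are made up for illustration. Only counts that actually occur get a key, so there are no gaps to fill with zeros.

```perl
use strict;
use warnings;

# Hypothetical helper: map "number of N's in a read" to "number of reads".
# Assumes every sequence occupies exactly one line.
sub n_histogram {
    my ($fh) = @_;
    my %reads_with;
    while (my $line = <$fh>) {
        next if $line =~ /^>/;            # skip FASTA header lines
        my $num_n = ($line =~ tr/N//);    # tr counts N's without changing $line
        $reads_with{$num_n}++;
    }
    return \%reads_with;
}

# The two sample reads from the question, held in memory.
my $fasta = ">Header\nAGGTTGGNNNTNNGNNTNGN\n>Header2\nAGNNNNNNNGNNGNNGNNGN\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory FASTA: $!";
my $hist = n_histogram($fh);
close $fh;

print "# of N's\t# of Reads\n";
print "$_\t\t$hist->{$_}\n" for sort { $a <=> $b } keys %$hist;
```

The numeric sort keeps the rows in ascending order of N count, which a plain string sort would not once counts reach two digits.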