Why is this array only printing the last number? - perl

My first time working with Perl. I'm using it to take data from multiple cells from one Excel file and put them in another, existing Excel file.
I've managed to extract the data I need from the first file and put it into an #array. I started a new file to experiment with writing the data into the specific cells I need.
The problem is that when the script runs it has the same number in all cells, 18365. While the #rows arrays is working correctly and putting the number in the correct cell, the #revenue array only prints the last number.
Is there something I am overlooking or not understanding? Is there a better way to do this? I thank you in advance.
use warnings;
use strict;
use Spreadsheet::ParseExcel;
use Spreadsheet::ParseExcel::SaveParser;
my $parser = Spreadsheet::ParseExcel::SaveParser->new();
my $workbook = $parser->Parse('xls_test.xls');
if (!defined $workbook ) {
die $parser->error(), ".\n";
}
my $worksheet = $workbook->worksheet(0);
my #rows = (2, 10, 17);
my #revenue = (10200, 9025, 18365);
my $col = 2;
foreach my $rev (#revenue) {
foreach my $r (#rows) {
$worksheet->AddCell( $r, $col, "$rev" );
}
}
$workbook->SaveAs('xls_test.xls');

If you take your loop:
foreach my $rev (#revenue) {
foreach my $r (#rows) {
$worksheet->AddCell( $r, $col, "$rev" );
}
}
The last $rev to write is 18365, and you will overwrite the previous values in all 3 rows.
What you can do is creating a row-to-revenue hash from both your lists and traverse it:
my #rows = (2, 10, 17);
my #revenue = (10200, 9025, 18365);
my $col = 2;
my %data;
for my $i (0 .. $#rows) {
$data{$rows[$i]} = $revenue[$i]; # row => revenue
}
foreach $row (keys %data) {
$worksheet->AddCell($row, $col, $data{$row});
}

Here's an illustration of what your code does. I replaced the AddCell with a simple say, which is like print with a newline at the end.
use strict;
use warnings 'all';
use feature 'say';
my #rows = (2, 10, 17);
my #revenue = (10200, 9025, 18365);
my $col = 2;
say "row\tcol\trev";
foreach my $rev (#revenue) {
foreach my $r (#rows) {
say join "\t", $r, $col, $rev;
}
}
And this is the output:
row col rev
2 2 10200
10 2 10200
17 2 10200
2 2 9025
10 2 9025
17 2 9025
2 2 18365
10 2 18365
17 2 18365
As you can see, it iterates through all the revenues, and then for each revenue it writes to rows 2, 10 and 17.
2 2 10200
10 2 10200
17 2 10200
And here it goes again.
2 2 9025
10 2 9025
17 2 9025
Since it's always in the same column (that's the 2), the values get overwritten. That's why only the last round of values are there.
I don't really know what you are expecting as the correct output, but since you have fixed rows, you might want to use three columns? You could increase the $col variable after you're done writing all rows for each $rev.
foreach my $rev (#revenue) {
foreach my $r (#rows) {
$worksheet->AddCell( $r, $col, $rev );
}
$col++;
}
Now the output of our little program above would be like this, and all values would be there.
row col rev
2 2 10200
10 2 10200
17 2 10200
2 3 9025
10 3 9025
17 3 9025
2 4 18365
10 4 18365
17 4 18365
Note that I removed the double-quotes "" around $rev. You don't need to quote variables like that.

Related

perl: finding the region with the greatest width from a list

I have a table having the following structure
gene transcript exon length
A NM_1 1 10
A NM_1 2 5
A NM_1 3 20
A NM_2 1 10
A NM_2 2 5
A NM_2 3 50
B NM_5 1 10
... ... ... ...
So basically, the table consists of a column with all human genes. The second column contains the transcript name. The same gene can have multiple transcripts. The third column contains an exon number. Every gene consists of multiple exons. The fourth column contains the length of each exon.
Now I want to create a new table looking like this:
gene transcript length
A NM_2 65
B NM_5 10
... ... ...
So what I basically want to do is find the longest transcript for each gene.
This means that when there are multiple transcripts (column transcript) for each gene (column gene), I need to make the sum of the values in the length column for all the exons of the transcript of that gene.
So in the example there are two transcripts for gene A: NM_1 and NM_2. Each has three exons. The sum of these three values for NM_1 = 10+5+20 = 35, for NM_2 it's 10+5+50 = 65. So for gene A, NM_2 is the longest transcript, so I want to put this in the new table. For gene B there is only 1 transcript, with one exon of length 10. So in the new table, I just want the length of this transcript reported.
I've worked with hashes before, so I thought of storing 'gene' and 'transcript' as two different keys:
#! /usr/bin/perl
use strict;
use warnings;
open(my $test,'<',"test.txt") || die ("Could not open file $!");
open(my $output, '+>', "output.txt") || die ("Can't write new file: $!");
# skip the header of $test # I know how to do this
my %hash = ();
while(<$test>){
chomp;
my #cols = split(/\t/);
my $keyfield = $cols[0]; #gene name
my $keyfield2 = $cols[1]; # transcript name
push #{ $hash{$keyfield} }, $keyfield2;
...
Given what you're trying to do, I'd be thinking something like this:
use strict;
use warnings;
my %genes;
my $header_line = <DATA>;
#read the data
while (<DATA>) {
my ( $gene, $transcript, $exon, $length ) = split;
$genes{$gene}{$transcript} += $length;
}
print join( "\t", "gene", "transcript", "length_sum" ), "\n";
foreach my $gene ( keys %genes ) {
#sort by length_sum, and 'pop' the top of the list.
my ($longest_transcript) =
( sort { $genes{$gene}{$b} <=> $genes{$gene}{$a} or $a cmp $b }
keys %{ $genes{$gene} } );
print join( "\t",
$gene, $longest_transcript, $genes{$gene}{$longest_transcript} ),
"\n";
}
__DATA__
gene transcript exon length
A NM_1 1 10
A NM_1 2 5
A NM_1 3 20
A NM_2 1 10
A NM_2 2 5
A NM_2 3 50
B NM_5 1 10
output
gene transcript length_sum
B NM_5 10
A NM_2 65
This is made much less untidy using the nmax_by (numeric maximum by) function from List::UtilsBy. This program accumulates the total length in a hash and then picks out the longest transcript for each gene using nmax_by.
I presume you're able to open the input file on $fh instead of using the DATA handle? Or you could pass the path to the input file on the command line and just use <> instead of <$fh>without explicitly opening anything.
use strict;
use warnings;
use List::UtilsBy qw/ nmax_by /;
my $fh = \*DATA;
<$fh>; # Drop header line
my %genes;
while ( <$fh> ) {
my ($gene, $trans, $exon, $len) = split;
$genes{$gene}{$trans} += $len;
}
my $fmt = "%-7s%-14s%-s\n";
printf $fmt, qw/ gene transcript length /;
for my $gene ( sort keys %genes ) {
my $trans = nmax_by { $genes{$gene}{$_} } keys %{ $genes{$gene} };
printf ' '.$fmt, $gene, $trans, $genes{$gene}{$trans};
}
__DATA__
gene transcript exon length
A NM_1 1 10
A NM_1 2 5
A NM_1 3 20
A NM_2 1 10
A NM_2 2 5
A NM_2 3 50
B NM_5 1 10
output
gene transcript length
A NM_2 65
B NM_5 10
Update
Here's a much shortened version of nmax_by that will work for you to test. You can add this at the top of the program, or if you'd rather put it at the end then you need to pre-declare it with sub nmax_by(&#); at the top because it has a prototype
sub nmax_by(&#) {
my $code = shift;
my ($max, $maxval);
for ( #_ ) {
my $val = $code->($_);
($max, $maxval) = ($_, $val) unless defined $maxval and $maxval >= $val;
}
$max;
}

Perl find the elements that appears once in an array

Given an array of elements, how to find the element that occurs once only in that array:
my #array = qw(18 1 18 3 18 1 1 2 3 3);
result should be: 2
This is a variation on perlfaq5 - How can I remove duplicate elements from a list or array?
Just use a hash to count the elements, and then print the ones seen only once.
use strict;
use warnings;
my #array = qw(18 1 18 3 18 1 1 2 3 3);
my #nondup = do {
my %count;
$count{$_}++ for #array;
grep {$count{$_} == 1} keys %count;
};
print "#nondup\n";
Outputs:
2
You can also try this in simple way.
use strict;
use warnings;
my #array = qw(7 8 7 5 18 1 18 3 18 1 1 2 3 3 4 5 6 7);
my $tm = "";
my %hash=();
foreach $tm(#array){
if(exists $hash{$tm}){
$hash{$tm} = "";
}
else{
$hash{$tm} = "$tm";
}
}
print join ("\n", values %hash);exit;

grep tab separated string in perl

I am trying to grep tab separated numbers (eg 1\t3) in an array something like
#data=
1 3
2 3
1 3
3 3
the idea behind the code is something like this
#!usr/bin/perl
use strict;
use warnings;
my #data = ( "1\t3", "2\t3", "1\t3", "3\t3", );
for (my $i=0;$i<4;$i++) {
for (my $j=0;$j<4_size;$j++) {
my $pattern= "$i\t$j";
my #count=grep(/$pattern/,#data);
undef $pattern;
print "$pattern\tcount\n";
}
}
hoping for output something like
1st and second column: pattern
3nd column : count of total matches
1 1
1 2
1 3 2
2 1
2 3 1
3 1
3 2
3 3 1
but the output is null for some reasons,
I am recently learnt and finding it very intriguing.
any suggestions?
The code below:
Does not crash if input contains unexpected characters (e.g., '(')
Only counts exact matches for the sequences of digits on either side of "\t".
Matches lines that might have been read from a file or __DATA__ section without using chomp using \R.
--
#!/usr/bin/env perl
use strict;
use warnings;
my #data = ( "1\t3", "2\t3", "(\t4", "1\t3", "3\t3", "11\t3" );
for my $i (1 .. 3) {
for my $j (1 .. 3) {
my $pattern = "$i\t$j";
my $count = grep /\A\Q$pattern\E\R?\z/, #data;
print join("\t", $pattern, $count ? $count : ''), "\n";
}
}
Output:
1 1
1 2
1 3 2
2 1
2 2
2 3 1
3 1
3 2
3 3 1
You almost got it. Here is a working version:
#!usr/bin/perl
use strict;
use warnings;
my #data = ( "1\t3", "2\t3", "1\t3", "3\t3", );
foreach my $i (1 .. 3) {
foreach my $j (1 .. 3) {
my $pattern = "$i\t$j";
my $count = grep(/$pattern/, #data);
print $pattern . ($count ? "\t$count\n" : "\n");
}
}

How can I organize this data using Perl?

I am new to Perl. I have an input file such as:
a 7 5
b 8 2
a 3 2
b 4 1
c 6 1
How can I get output like
column_1_val, number_occurrence_column_1, sum_of_column_2, sum_of_column_3
For example
a 2 10 7
b 2 12 3
c 1 6 1
The program below is a possible solution. I have used the DATA file handle whereas you will presumably need to open an external file and use the handle from that.
use strict;
use warnings;
use feature 'say';
my %data;
while (<DATA>) {
my ($key, #vals) = split;
$data{$key}[0]++;
my $i;
$data{$key}[++$i] += $_ for #vals;
}
say join ' ', $_, #{$data{$_}} for sort keys %data;
__DATA__
a 7 5
b 8 2
a 3 2
b 4 1
c 6 1
output
a 2 10 7
b 2 12 3
c 1 6 1
That would be something like (untested):
while (<>) {
if (m/(\w+)\s+(\d+)\s+(\d+)/) {
($n, $r1, $r2) = ($1, $2, $3);
$nr{$n}++;
$r1{$n} += $r1;
$r2{$n} += $r2;
}
}
for $n (sort keys %nr) {
print "$n $nr{$n} $r1{$n} $r2{$n}\n";
}
This is a very quick-and-dirty way of doing what you described, but it should get you on your way.
Even i am not aware of perl.But in case you are concerned with the result.the below is the solution in awk.It might /might not help you.but in case u need it :
awk '{c[$1]++;a[$1]=a[$1]+$2;b[$1]+=$3}END{for(i in a)print i,c[i],a[i],b[i]}' file3
A slightly different take:
my %records;
while ( <> ) {
my #cols = split ' ';
my $rec = $records{ $cols[0] } ||= {};
$rec->{number_occurrence_column_1}++;
$rec->{sum_of_column_2} += $cols[1];
$rec->{sum_of_column_3} += $cols[2];
}
foreach my $rec ( map { { col1 => $_, %{ $records{ $_ } } }
sort keys %records
) {
print join( "\t"
, #$rec{ qw<col1 number_occurrence_column_1
sum_of_column_2 sum_of_column_3
>
}
), "\n"
;
}

Going out of loop Perl

I have two arrays, I am evaluating the values of one array with other. What i have done is
#array_x= qw(1 5 3 4 6);
#array_y= qw(-3 4 2 1 3);
foreach $x (#array_x){
foreach $y (#array_y){
if ($x-$y > 0){
next;
}
print "$x\n";
}
}
Here, problem is , in array_x, its first index i.e 1-(-3)=4, it satisfies, but next 1-4=-3 is not satisfying the condition, hence it should break the loop and go for next element of array_x. Here only 5 and 6 satisfies the condition with all elements of array_y, so i should get only 5,6 in the output.
Here is your loops with labels so you can break to the outer level:
XVALUE:
foreach $x (#array_x){
YVALUE:
foreach $y (#array_y){
if ($x-$y > 0){
next XVALUE;
}
print "$x\n";
}
}
You can label each loop and exit the one you want. See perldoc last
E.g.:
LINE: while (<STDIN>) {
last LINE if /^$/; # exit when done with header
#...
}
If the intention is to just find the elements which are greater than the element in the subsequent list, the following would find it in 1 iteration of each array.
use strict;
my #array_x= qw(1 5 3 4 6);
my #array_y= qw(-3 4 2 1 3);
my $max_y = $array_y[0];
foreach my $y (#array_y) {
$max_y = $y if $y > $max_y;
}
foreach my $x (#array_x) {
print "\nX=$x" if $x > $max_y;
}
Output:
X=5
X=6
Not really sure what is your need, but is this what you want?
#!/usr/bin/perl
use Modern::Perl;
my #array_x= qw(1 5 3 4 6);
my #array_y= qw(-3 4 2 1 3);
foreach my $x(#array_x){
my $OK=1;
foreach my $y(#array_y){
next if $x > $y;
$OK=0;
last;
}
say "x=$x" if $OK;
}
output:
x=5
x=6
I think you might want to rethink your method. You want to find all values in #x which are greater than all in #y. You shouldn't loop over all #y each time, you should find the max of it, then filter on the max.
use strict;
use warnings;
use List::Util 'max';
my #x= qw(1 5 3 4 6);
my #y= qw(-3 4 2 1 3);
my $ymax = max #y;
my #x_result = grep { $_ > $ymax } #x;
Or since I am crazy about the new state keyword:
use strict;
use warnings;
use 5.10.0;
use List::Util 'max';
my #x= qw(1 5 3 4 6);
my #y= qw(-3 4 2 1 3);
my #x_result = grep { state $ymax = max #y; $_ > $ymax } #x;
Edit: on re-reading previous answers, this is the same concept as angel_007, though I think this implementation is more self-documenting/readable.
Revised answer:
#!/usr/bin/perl
use strict;
use warnings;
my #array_x= qw(1 5 3 4 6);
my #array_y= qw(-3 4 2 1 3);
LABEL: for my $x (#array_x) {
for my $y (#array_y) {
next LABEL unless $x > $y;
}
print "$x\n";
}