Adding columns of numbers in Perl - perl

I have a file with columns of numbers:
1 0.0 0.0
2 0.0 0.0
3 15.2 0.0
4 7.0 9.0
5 0.0 3.0
6 1.0 0.0
7 0.0 2.5
8 0 0 0 0
I need to find the sum of numbers from row 3 to 7 of the right two columns. So for column2 i want to sum 15.2, 7.0 and 1.0. For column3 i want to sum 9.0, 3.0 and 2.5. I need to maintain the single decimal point format.
code:
While (<INPUT>){
my #a = split;
my $c2 .= $a[1];
my $c3 .= $a[2];
my $c2_string = substr($c2, 2, 5);
my $c3_string = substr($c3, 2, 5);
my #sumarray = split ('', $c2);
#then loop through each element and add them up.
This doesnt seem to work. How can i maintain separation of each number while maintaining the decimal format?
For c2, wrong Output:
1
5
.
2
7
.
0
0
.
0
etc
Desired Output:
c2=23.2
c3=14.5

my $x = my $y = 0;
while (<INPUT>) {
my #a = split;
($a[0] >=3 and $a[0] <=7) or next;
$x += $a[1];
$y += $a[2];
}
print "c2=$x\n", "c3=$y\n";
perl -lane'
($F[0] >=3 and $F[0] <=7) or next;
$x += $F[1]; $y += $F[2];
END{ print for "c2=$x","c3=$y" }
' file

my #data;
while (<INPUT>) {
push #data, [ split ];
}
my ($sum2, $sum3);
for (my $i = 2; $i < 7; $i++) {
$sum2 += $data[$i][1];
$sum3 += $data[$i][2];
}
print "$sum2, $sum3\n";
Output:
23.2, 14.5
And this one does not create an array for the entire file:
my ($sum2, $sum3);
while (<INPUT>) {
my #v = split;
if ($v[0] > 2 && $v[0] < 8) {
$sum2 += $v[1];
$sum3 += $v[2];
}
}

#!/usr/bin/perl -w
use strict;
my $infile = 'in.txt';
open my $input, '<', $infile or die "Can't open to $infile: $!";
my ($col1, $sum_col2, $sum_col3 );
while (<$input>) {
my (#cols) = split;
$col1 = $cols[0];
$sum_col2 += $cols[1] if $col1 == 3 .. 7;
$sum_col3 += $cols[2] if $col1 == 3 .. 7;
}
print "Column2: $sum_col2\n";
print "Column3: $sum_col3\n";
Output:
Column2: 23.2
Column3: 14.5

Related

Perl power of the adjacency matrix

Does anyone know how they would find the n-th power of the adjacency matrix?
Here is the matrix I am trying to write the code for
0 0 1 0
0 0 1 0
1 1 0 1
0 0 1 0
and the 2nd power adjacency matrix is:
1 1 0 1
1 1 0 1
0 0 3 0
1 1 0 1
I am not sure how I can calculate it with a perl code
I only have a code for reading in my file
sub matrix_read_file {
my ($filename) = #_;
my #matrix;
open (my $F, '<', $filename) or die "Could not open $filename: $!";
while (my $line =<$F> ) {
chomp $line;
next if $line =~ /^\s*$/; # skip blank lines
my #row = split /\s+/, $line;
push #matrix, \#row;
}
close $F;
print "$matrix[2][1]\n"; #test to see if individual elements print
print "$matrix[1][2]\n";
return \#matrix;
}
Here is an example:
use v5.22.0; # signatures requires perl >= 5.22
use feature qw(say);
use strict;
use warnings;
use experimental qw(signatures);
use Data::Dumper;
{
my $m = [[0,0,1,0],[0,0,1,0],[1,1,0,1],[0,0,1,0]];
my $m2 = matmul( $m, $m );
print Dumper($m2);
}
# Assume $m1 has size [pxq] and $m2 has size [qxr] :
# p = number of rows in $m1
# q = number of columns in $m1 == number of rows in $m2
# r = number of columns in $m2
# result = [pxq] * [qxr] => [p x r]
sub matmul( $m1, $m2 ) {
my $p = $#$m1;
my $q = $#{$m1->[0]};
my $r = $#{$m2->[0]};
my #res;
for my $i (0..$p) {
for my $j (0..$r) {
my $sum = 0;
for my $k (0..$q) {
$sum += $m1->[$i][$k] * $m2->[$k][$j];
}
$res[$i][$j] = $sum;
}
}
return \#res;
}
Or alternatively, using the PDL module:
use feature qw(say);
use strict;
use warnings;
use PDL;
my $m = pdl [[0,0,1,0],[0,0,1,0],[1,1,0,1],[0,0,1,0]];
my $m2 = $m x $m;
print $m2;

Using pointers to lines in a file in perl

I'm trying to use some sort of pointers in perl so that I can look at two at two files that are sorted in alphabetical order and match things in both the files if they have the same name in the first column. The way i'm searching through each file though is I'm looking at which lines first column is lower in alphabetical order and then moving the pointer on that file to the next line. Somewhat similar to the pointers in merge sort. The code below is an example of what I want.
Using these two files.
set1
apple 17 20
boombox 23 29
carl 25 29
cat 22 33
dog 27 44
set2
ants yes
boombox no
carl yes
dentist yes
dice no
dog no
I can make a script that does something like this
($name, $affirmation) = first line in set2; #part I'm confused about I just kind of need some sort of command of something that will do this
while (<>){
#set1 = split;
while ($name < set1[0]){
($name, $affirmation) = next line in set2; # part i'm confused about I just kind of need some sort of command of something that will do this
}
if ($name = $set[0]{
print #set1, $affirmation;
}
This is how I would run it
./script.txt set1
I would end up with
boombox 23 29 no
carl 25 29 yes
dog 27 44 no
.
.
Edit:
I tried some code in some of the answers to see if I could make some functional code out of it but I seem to be running into problems, and some of the syntax in the answers I could not understand so I'm having a lot of trouble figuring out how to debug or solve this.
This is my specific example using the folllowing two text files
text.txt
Apples 0 -1 -1 0 0 0 0 -1
Apricots 0 1 1 0 0 0 0 1
Fruit 0 -1 -1 0 0 0 0 -1
Grapes 0 -2 -1 0 0 0 0 -2
Oranges 0 1 1 0 0 0 0 -1
Peaches 0 -2 -1 0 0 0 0 -2
text2.txt
Apples CHR1 + 1167628 1170420 1 1 N
Apricots CHR1 - 2115898 2144159 1 1 N
Oranges CHR1 - 19665266 19812066 1 1 N
Noidberry CHR1 - 1337728 1329993 1 1 N
Peaches CHR1 - 1337275 1342693 1 1 N
And this script
script.pl
#!/usr/bin/perl
use warnings;
my $file_1 = $ARGV[0];
my $file_2 = $ARGV[1];
open my $fh1, '<', $file_1 or die "Can't open $file_1: $!";
open my $fh2, '<', $file_2 or die "Can't open $file_2: $!";
open(my $single, '>', 'text.txt');
open(my $deep, '>', 'text2.txt');
OUTER: while (my $outer = <$fh1>){
chomp $outer;
#CopyNumber = split(' ', $outer);
($title, $title2) = split('\|', $CopyNumber[0]);
#print 'title: ',$title,' title2: ',$title2,"\n";
my $numLoss = 0;
my $deepLoss = 0;
for ($i = 1; $i <= $#CopyNumber; $i++){
#print "$CopyNumber[$i], $#CopyNumber, $i, \n";
if ($CopyNumber[$i] < 0){
$numLoss = $numLoss + 1;
if ($CopyNumber[$i] <-1){
$deepLoss = $deepLoss + 1;
}
}
}
if ($GeneSym and (($GeneSym cmp $title)==0)){ #or (($GeneSym cmp $title2)==0))){
print $single $Chrom,"\t",$Start,"\t",$Stop,"\t",$numLoss/$#CopyNumber,"\n";
print $deep $Chrom,"\t",$Start,"\t",$Stop,"\t",$deepLoss/$#CopyNumber,"\n";
next OUTER;
}
INNER: while (my $inner = <$fh2>){
($GeneSym, $Chrom, $Strand, $Start, $Stop, $MapId, $TotalMap, $AbnormalMerge, $Overlap) = split(' ', $inner);
$Chrom =~ s/CHR/hs/ee;
my $cmp = ($GeneSym cmp $title);
next OUTER if $cmp < 0;
if ($cmp==0){ #or (($GeneSym cmp $title2)==0)){
print $single $Chrom,"\t",$Start,"\t",$Stop,"\t",$numLoss/$#CopyNumber,"\n";
print $deep $Chrom,"\t",$Start,"\t",$Stop,"\t",$deepLoss/$#CopyNumber,"\n";
next OUTER;
}
}
}
If I run ./script.pl text.txt text2.txt I should get this printed into Number.txt
//corresponding to columns 2,4,5 of text2.txt and the last column being the percentage of columns which have a number lower than 0
hs1 1167628 1170420 0.375 //For Apples
hs1 2115898 2144159 0 //For Apricots
hs1 19665266 19812066 0.125 //For Oranges
hs1 1337275 1342693 0.375 //For Peaches
Instead I get this
hs1 1167628 1170420 0.375
hs1 2115898 2144159 0
hs1 1337275 1342693 0.375
So I'm just getting an error where
hs1 19665266 19812066 0.125 //For Oranges
isn't printing
Quite like you state, with: use cmp for comparison, split line into two terms.
For each line of FILE1 file go through lines of FILE2 file, exiting when a match is found. Once the FILE2 overshoots alphabetically move to the next line of FILE1.
use warnings 'all';
use strict;
sub process {
my ($name, $affirm_1, $affirm_2) = #_;
print "$name $affirm_1 $affirm_2\n";
}
my $file_1 = 'set1.txt';
my $file_2 = 'set2.txt';
open my $fh1, '<', $file_1 or die "Can't open $file_1: $!";
open my $fh2, '<', $file_2 or die "Can't open $file_2: $!";
my ($name_2, $affirm_2);
FILE1: while (my $line1 = <$fh1>) {
chomp $line1;
my ($name_1, $affirm_1) = split ' ', $line1, 2;
if ($name_2) {
my $cmp = $name_1 cmp $name_2;
next FILE1 if $cmp < 0;
if ($cmp == 0) {
process($name_1, $affirm_1, $affirm_2);
next FILE1;
}
}
FILE2: while (my $line2 = <$fh2>) {
chomp $line2;
($name_2, $affirm_2) = split ' ', $line2, 2;
my $cmp = $name_1 cmp $name_2;
next FILE1 if $cmp < 0;
if ($cmp == 0) {
process($name_1, $affirm_1, $affirm_2);
next FILE1;
}
}
}
Comments on a few remaining details.
Once a FILE2 line "overshoots," in the next iteration of FILE1 we need to first check that line, before entering the FILE2 loop to iterate over its remaining lines. For the first FILE1 line the $name_2 is still undef thus if ($name_2).
Updated for edited post.
use warnings 'all';
use strict;
sub process_line {
my ($single, $deep, $rline, $GeneSym, $Chrom, $Start, $Stop) = #_;
my ($numLoss, $deepLoss) = calc_loss($rline);
$Chrom =~ s/CHR/hs/;
print $single (join "\t", $Chrom, $Start, $Stop, $numLoss/$#$rline), "\n";
print $deep (join "\t", $Chrom, $Start, $Stop, $deepLoss/$#$rline), "\n";
}
sub calc_loss {
my ($rline) = #_;
my ($numLoss, $deepLoss) = (0, 0);
for my $i (1.. $#$rline) {
$numLoss += 1 if $rline->[$i] < 0;
$deepLoss += 1 if $rline->[$i] < -1;
}
return $numLoss, $deepLoss;
}
my ($Number, $NumberDeep) = ('Number.txt', 'NumberDeep.txt');
open my $single, '>', $Number or die "Can't open $Number: $!";
open my $deep, '>', $NumberDeep or die "Can't open $NumberDeep: $!";
my ($file_1, $file_2) = ('set1_new.txt', 'set2_new.txt');
open my $fh1, '<', $file_1 or die "Can't open $file_1: $!";
open my $fh2, '<', $file_2 or die "Can't open $file_2: $!";
my ($GeneSym, $Chrom, $Strand, $Start, $Stop,
$MapId, $TotalMap, $AbnormalMerge, $Overlap);
FILE1: while (my $line1 = <$fh1>) {
next if $line1 =~ /^\s*$/;
chomp $line1;
my #line = split ' ', $line1;
if ($GeneSym) {
my $cmp = $line[0] cmp $GeneSym;
next FILE1 if $cmp < 0;
if ($cmp == 0) {
process_line($single, $deep, \#line,
$GeneSym, $Chrom, $Start, $Stop);
next FILE1;
}
}
FILE2: while (<$fh2>) {
next if /^\s*$/;
chomp;
($GeneSym, $Chrom, $Strand, $Start, $Stop,
$MapId, $TotalMap, $AbnormalMerge, $Overlap) = split;
my $cmp = $line[0] cmp $GeneSym;
next FILE1 if $cmp < 0;
if ($cmp == 0) {
process_line($single, $deep, \#line,
$GeneSym, $Chrom, $Start, $Stop);
next FILE1;
}
}
}
This produces the desired output with given sample files. Some shortcuts are taken, please let me know if comments would be helpful. Here are a few
Much error checking should be added around.
I assume the first field of FILE1 to be used as it stands. Otherwise changes are needed.
Processing is split into two functions, calculations being separate. This is not necessary.
$#$rline is the index of the last element of $rline arrayref. If this is too much syntax to stomach use #$rline - 1, for example as (0..#$rline-1)
Some comments on the code posted in the question:
Always, always, please use warnings; (and use strict;)
loop over indices is best written foreach my $i (0..$#array)
The regex modifier /ee is very involved. There is absolutely no need for it here.
You're right. It's exactly like a merge sort, except only matching lines are output.
sub read_and_parse1 {
my ($fh) = #_;
defined( my $line = <$fh> )
or return undef;
my ($id, #copy) = split(' ', $line); # Use split(/\t/, $line) if tab-separated data
my ($gene_sym) = split(/\|/, $id);
return [ $gene_sym, #copy ];
}
sub read_and_parse2 {
my ($fh) = #_;
defined( my $line = <$fh> )
or return undef;
return [ split(' ', $line) ]; # Use split(/\t/, $line) if tab-separated data
}
my $fields1 = read_and_parse1($fh1);
my $fields2 = read_and_parse2($fh2);
while ($fields1 && $fields2) {
my $cmp = $fields1->[0] cmp $fields2->[0];
if ($cmp < 0) { $fields1 = read_and_parse1($fh1); }
elsif ($cmp > 0) { $fields2 = read_and_parse2($fh2); }
else {
my ($gene_sym, #copy) = #$fields1;
my (undef, $chrom, $strand, $start, $stop, $map_id, $total_map, $abnormal_merge, $overlap) = #$fields2;
$chrom =~ s/^CHR/hs/;
my $num_loss = grep { $_ < 0 } #copy;
my $deep_loss = grep { $_ < -1 } #copy;
print($single_fh join("\t", $chrom, $start, $stop, $num_loss/#copy ) . "\n");
print($deep_fh join("\t", $chrom, $start, $stop, $deep_loss/#copy ) . "\n");
$fields1 = read_and_parse1($fh1);
$fields2 = read_and_parse2($fh2);
}
}
Output:
$ cat single.txt
hs1 1167628 1170420 0.375
hs1 2115898 2144159 0
hs1 19665266 19812066 0.125
hs1 1337275 1342693 0.375
$ cat deep.txt
hs1 1167628 1170420 0
hs1 2115898 2144159 0
hs1 19665266 19812066 0
hs1 1337275 1342693 0.25

Majority Element Failing to close cycles

I'm trying to figure out why this keeps printing the "majority element" candidate in every cycle.
The code I've been trying to make work is a Majority Element search (to find an element that is repeated more than half of the length of a list).
I can't separate the processes of finding the candidate and testing against the array because my input is a text file that has an indeterminate number of arrays. It's an exercise from rosalind.info that has different inputs every time you try to solve it.
An example of the input would be
-5 5 5 5 5 5 5 5 -8 7 7 7 1 7 3 7 -7 1 6 5 10 100 1000 1 -5 1 6 7 1 1 10 1
Here's what I've written so far.
foreach my $currentrow (#lists) {
my #row = ();
#row = split( /\s/, $currentrow );
my $length = $#row;
my $count = 0;
my $i = 0;
for $i ( 0 .. $length - 1 ) {
if ( $count == 0 ) {
$candidate = $row[$i];
$count++;
}
if ( ( $count > 0 ) and ( $i = $length - 1 ) ) {
my $counter2 = 0;
for my $j ( 0 .. $length - 1 ) {
if ( $row[$j] == $candidate ) {
$counter2++;
}
}
if ( $counter2 <= ( $#row / 2 ) and ( $i = $length - 1 ) ) {
$candidate = -1;
print $candidate, " ", $i, " ";
}
if ( $counter2 > ( $#row / 2 ) and ( $i = $length - 1 ) ) {
print $candidate, " ", $i, " ";
}
}
if ( $candidate == $row[$i] and $count > 0 ) {
$count = $count + 1;
}
if ( $candidate != $row[$i] and $count > 0 ) {
$count = $count - 1;
}
}
}
Do you have use strict and use warnings 'all' in place?
I imagine that your problem may be because of the test $i = $length - 1, which is an assignment, and should be $i == $length - 1
To find a majority element I would use a hash:
perl -nae '%h=(); $h{$_}+=2 for #F; $h{$_}>#F and print for keys %h; print "\n"'
Each line of input is treated separately. Each line of output matches a line of input and presents its majority element or is empty if there is no such element.
Edit: Now the solution uses autosplit (-a), which is shorter and work not only for numbers.

to change the value of an element of an array by using the index ref in perl

#!/bin/usr/perl -w
use strict;
print "Enter your input filename for original sample data values: \n";
chomp($data=<STDIN>);
print "Enter your input filename for adjustment values\n";
chomp($adj=<STDIN>) ;
print "Enter your output filename for resultant adjusted new sample data \n";
chomp($new=<STDIN>);
open(R1,"$data") or die("error");
open(R2,"$adj") or die ("error");
open(WW,"+>$new") or die ("error");
while( ($line1=(<R1>)) && ($line2=(<R2>)) )
{
$l1=$line1;
#arr1= split(" ",$l1);
$l2=$line2;
#arr2= split(" ",$l2);
$l= ( scalar#arr1);
$p= (scalar#arr2);
for ( $i = 0; $i <= $l; $i++ ){
for ( $j =($i+1); $j <= $l; $j++ ){
if ($arr1[$i]< $arr1[$j]){
$a = $arr1[$i] + ($arr2[$i]/2);
$b = $arr1[$j] - ($arr2[$i]/2);
push ( $arr1[$i]->$a , $arr1[$j]->$b);
}
elsif ( $arr1[$i]= $arr1[$j]){
$a = $arr1[$i];
$b = $arr1[$j];
push ($arr1[$i]->$a,$arr1[$j]->$b);
}
else{
$a = $arr1[$i]-($arr2[$i]/2);
$b = $arr1[$j]+ ($arr2[$i]/2);
push ($arr1[$i]->$a,$arr1[$j]->$b);
}
}
}
$l1 = scalar#arr1;
for ($k = 0; $k <= $l1 ; $k++)
{
if (($k % 10) != 0){
print WW "$arr1[$k]";
print WW "\t" ;
}
else {
print WW "\n";
print WW "$arr1[$k]";
print WW "\t";
}
}
}
close(R1);
close(R2);
close(WW);
exit;
when i am running this prog. i am getting an error that "not an ARRAY reference at line 29".
how can i create the reference to my first array #arr1 ??? so that it stores the changed values of the element at the particular index after running the iteration.
input :
#array1
1 2 3 4 5 6 7 8 9 10
#array2
1 2 3 4 5 6 7 8 9 10 9 8 7 6 5 4 3 2
desired output
#array1
15 1.5 2 3 6 4 11.5 5 5.5
Well, I'm not getting the answer you say you're looking for, but what it appears you're trying to do is to store the value of $a into the $i'th index of array #arr1 and the value of $b into the $jth index of #arr1. I have hoisted the assignment code out of the if branches since it will be the same for all three cases. I have also fixed a subtle error you had in your conditions. You had
elsif ( $arr1[$i]= $arr1[$j]){
but you surely meant to do an equality comparison rather than an assignment here:
elsif ( $arr1[$i] == $arr1[$j]){
So here is the modified section. As I say, it still doesn't print out what you say the desired result is, and I'm not sure whether it's because your computation is wrong or your printing is wrong (I couldn't figure out any obvious transform from your inputs to your desired output), but this should at least put you in the right direction:
for ( $i = 0; $i <= $l; $i++ ){
for ( $j =($i+1); $j <= $l; $j++ ){
if ($arr1[$i]< $arr1[$j]){
$a = $arr1[$i] + ($arr2[$i]/2);
$b = $arr1[$j] - ($arr2[$i]/2);
# push ( $arr1[$i]->$a , $arr1[$j]->$b);
}
elsif ( $arr1[$i] == $arr1[$j]){
$a = $arr1[$i];
$b = $arr1[$j];
# push ($arr1[$i]->$a,$arr1[$j]->$b);
}
else{
$a = $arr1[$i]-($arr2[$i]/2);
$b = $arr1[$j]+ ($arr2[$i]/2);
# push ($arr1[$i]->$a,$arr1[$j]->$b);
}
$arr1[$i] = $a;
$arr1[$j] = $b;
}
}

Using perl, given an array of any size, how do I randomly pick 1/4 of the list

For clarification, if I had a list of 8 elements, i would want to randomly pick 2. If I had a list of 20 elements, I would want to randomly pick 5. I would also like to assure (though not needed) that two elements don't touch, i.e. if possible not the 3 and then 4 element. Rather, 3 and 5 would be nicer.
The simplest solution:
Shuffle the list
select the 1st quarter.
Example implementation:
use List::Util qw/shuffle/;
my #nums = 1..20;
my #pick = (shuffle #nums)[0 .. 0.25 * $#nums];
say "#pick";
Example output: 10 2 18 3 19.
Your additional restriction “no neighboring numbers” actually makes this less random, and should be avoided if you want actual randomness. To avoid that two neighboring elements are included in the output, I would iteratively splice unwanted elements out of the list:
my #nums = 1..20;
my $size = 0.25 * #nums;
my #pick;
while (#pick < $size) {
my $i = int rand #nums;
push #pick, my $num = $nums[$i];
# check and remove neighbours
my $len = 1;
$len++ if $i < $#nums and $num + 1 == $nums[$i + 1];
$len++, $i-- if 0 < $i and $num - 1 == $nums[$i - 1];
splice #nums, $i, $len;
}
say "#pick";
use strict;
use warnings;
sub randsel {
my ($fact, $i, #r) = (1.0, 0);
while (#r * 4 < #_) {
if (not grep { $_ == $i } #r) {
$fact = 1.0;
# make $fact = 0.0 if you really don't want
# consecutive elements
$fact = 0.1 if grep { abs($i - $_) == 1 } #r;
push(#r, $i) if (rand() < 0.25 * $fact);
}
$i = ($i + 1) % #_;
}
return map { $_[$_] } sort { $a <=> $b } #r;
}
my #l;
$l[$_] = $_ for (0..19);
print join(" ", randsel(#l)), "\n";