How to sum two lists element-wise - perl

I want to parse a file line by line, where each line contains two integers, and sum these values into two distinct variables. My naive approach was like this:
my $i = 0;
my $j = 0;
foreach my $line (<INFILE>)
{
($i, $j) += ($line =~ /(\d+)\t(\d+)/);
}
But it yields the following warning:
Useless use of private variable in void context
hinting that resorting to the += operator triggers evaluation of the left-hand side in scalar instead of list context (please correct me if I'm wrong on this point).
Is it possible to achieve this elegantly (possibly in one line) without resorting to arrays or intermediate variables?
Related question: How can I sum arrays element-wise in Perl?

No, it's because the expression ($i, $j) += (something, 1) parses as adding 1 to $j only, leaving $i hanging in void context. Perl 5 has no hyper-operators or automatic zipping for the assignment operators such as +=. This works:
my ($i, $j) = (0, 0);
foreach my $line (<INFILE>) {
my ($this_i, $this_j) = split /\t/, $line;
$i += $this_i;
$j += $this_j;
}
You can avoid the repetition by using a compound data structure instead of named variables for the columns.
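As a rough sketch of the compound-data-structure idea, the sums can live in one array indexed by column. The sub name sum_columns and the sample lines are made up for illustration, and the input is assumed to be tab-separated:

```perl
use strict;
use warnings;

# Hypothetical helper: sums each tab-separated column across all lines.
sub sum_columns {
    my @lines = @_;
    my @sums;
    for my $line (@lines) {
        chomp(my $copy = $line);            # work on a copy, drop the newline
        my @fields = split /\t/, $copy;
        $sums[$_] += $fields[$_] for 0 .. $#fields;
    }
    return @sums;
}

my @totals = sum_columns("11\t1\n", "22\t2\n", "33\t3\n");
print "@totals\n";    # prints "66 6"
```

This also copes with more than two columns without any further variables.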

First of all, your way of adding arrays pairwise does not work (the related question you posted yourself gives some hints there).
As for the parsing part: how about just splitting the lines? That works if your lines are formatted accordingly (surrounding whitespace should not be a problem).
split(/\t/, $line, 2)
If you really, really want to do it in one line, you could do something like this (though I don't think you would call it elegant):
my @a = (0, 0);
foreach my $line (<INFILE>)
{
@a = map { shift(@a)+$_ } split(/\t/, $line, 2);
}
For an input of @lines = ("11\t1\n", " 22 \t 2 \n", "33\t3"); it gave me @a = (66, 6).
I would advise you to use the split part of my answer, but not the adding-up part. There is nothing wrong with using more than one line! If it makes your intention clearer, more lines are better than one. But then again, I hardly use Perl nowadays and write Python instead, so my Perl coding style might have a "bad" influence there...

It is quite possible to swap the pair over for each addition, meaning you're always adding to the same element in each pair. (This generalises to rotating multi-element arrays if required.)
use strict;
use warnings;
my @pair = (0, 0);
while (<DATA>) {
@pair = ($pair[1], $pair[0] + $_) for /\d+/g;
}
print "@pair\n";
__DATA__
99 42
12 15
18 14
output
129 71

Here's another option:
use Modern::Perl;
my $i = my $j = 0;
map{$i += $_->[0]; $j += $_->[1]} [split] for <DATA>;
say "$i - $j";
__DATA__
1 2
3 4
5 6
7 8
Output:
16 - 20

Related

Passing strings as array to subroutine and return count of specific char

I am trying to work out the right way to tackle this:
I would like to pass an array of, say, n elements as an argument to a subroutine, match two characters, S and T, in each element, and print the count of these letters for each element. So far I have this, but I am stuck and my code loops forever.
use strict;
use warnings;
sub main {
my @array = @_;
while (@array) {
my $s = ($_ = tr/S//);
my $t = ($_ = tr/T//);
print "ST are in total $s + $t\n";
}
}
my @bunchOfdata = ("QQQRRRRSCCTTTS", "ZZZSTTKQSST", "ZBQLDKSSSS");
main(@bunchOfdata);
I would like the output to be:
Element 1 Counts of ST = 5
Element 2 Counts of ST = 6
Element 3 Counts of ST = 4
Any clue how to solve this?
while (@array) will be an infinite loop since @array never gets smaller. It also does not read anything into the default variable $_. For this to work, use for (@array), which puts the array items into $_ one at a time until all have been visited.
The tr transliteration operator is the right tool for your task.
The code needed to get your results could be:
#!/usr/bin/perl
use strict;
use warnings;
my @data = ("QQQRRRRSCCTTTS", "ZZZSTTKQSST", "ZBQLDKSSSS");
my $i = 1;
for (@data) {
my $count = tr/ST//;
print "Element $i Counts of ST = $count\n";
$i++;
}
Also, note that my $count = tr/ST//; doesn't require binding the transliteration operator to $_. Perl operates on $_ by default when no binding is given. Your code had my $s = ($_ = tr/S//); with = instead of =~, which is an error: it assigns the count to $_ and overwrites the string. The bound form would be $s = ($_ =~ tr/S//); but the shorter way I've shown is preferred.
You can combine the two sought letters as in my code. It's not necessary to count them separately.
I got the output you want.
Element 1 Counts of ST = 5
Element 2 Counts of ST = 6
Element 3 Counts of ST = 4
Also, you can't perform math operations in a quoted string like you had.
print "ST are in total $s + $t\n";
Instead, you would need to do:
print "ST are in total ", $s + $t, "\n";
where the operation is performed outside of the string.
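Since the question asked for a subroutine, here is a minimal sketch of the same counting wrapped in one. The name count_st is hypothetical; it returns the combined S and T count for each string passed in:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the combined count of S and T for each string.
sub count_st {
    my @strings = @_;                        # copy the arguments
    my @counts;
    for my $string (@strings) {
        push @counts, ($string =~ tr/ST//);  # tr with an empty replacement only counts
    }
    return @counts;
}

my @counts = count_st("QQQRRRRSCCTTTS", "ZZZSTTKQSST", "ZBQLDKSSSS");
for my $i (0 .. $#counts) {
    print "Element ", $i + 1, " Counts of ST = $counts[$i]\n";
}
```

Returning the counts rather than printing inside the sub leaves the caller free to format the output.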
Don't use while to traverse an array - your array gets no smaller, so the condition is always true and you get an infinite loop. You should use for (or foreach) instead.
for (@array) {
my $s = tr/S//; # No need for =~ as tr/// works on $_ by default
my $t = tr/T//;
print "ST are in total $s + $t\n";
}
Why tr///??
sub main {
my @array = @_;
for (@array) {
my $s = split(/S/, $_, -1) - 1;
my $t = split(/T/, $_, -1) - 1;
print "ST are in total $s + $t\n";
}
}

Perl: iterating values in for loop

print "Input value \n";
$line = <>;
chomp $line;
@val = $line;
for ($i = 1; $i <= 10; $i++){
print -@val* $i;
}
I have a simple for loop here where the user enters a value that I store in @val, and I want my loop to iterate from 1 to 10 and print out the value of -@val * $i. Suppose my @val = 2; then I should see the output: -2 -4 -6 -8 ... -20. But my actual output is: -1 -2 ... -10. What went wrong?
As I wrote in my comment, I can't imagine what you were trying to achieve by copying the value of $line to the array @val. There are also a number of other points that I would like to make.
You must always
use strict;
use warnings 'all';
at the start of all your Perl programs. You will then have to declare all of your variables with my, and it will alert you to many simple errors that you may otherwise overlook.
The C-style for loop is rarely useful in Perl. It is usually best to iterate over a simple list. In your program that would be
for my $i ( 1 .. 10 ) {
...
}
So putting all that together, your program looks like this
use strict;
use warnings 'all';
print "Input value\n";
my $val = <>;
chomp $val;
for my $i ( 1 .. 10 ) {
print -$val * $i;
}
It's also worth pointing out that, when the contents of the for loop is just a single statement like this, you can use for as a statement modifier and write just
print -$val * $_ for 1 .. 10;
The problem is that you're storing your value in an array (@val). When you use that array in a scalar context (the math in the for loop) you just get the number of elements in the array, which in your case is 1. Change @val to $val or just use $line directly.
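To see the scalar-context behaviour in isolation, here is a small sketch (the variable names $count and $first are only for illustration):

```perl
use strict;
use warnings;

my @val   = (2);        # one element, as in the question
my $count = @val;       # scalar context: the number of elements, i.e. 1
my $first = $val[0];    # the stored value itself, i.e. 2

print "count=$count first=$first\n";    # prints "count=1 first=2"

# Using the element rather than the array gives the intended products.
for my $i (1 .. 3) {
    print -$first * $i, "\n";           # prints -2, -4, -6 on separate lines
}
```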

Selecting highest count of element except when...

So I have been working on this Perl script that will analyze and count the same letters in different line spaces. I have implemented the count with a hash but am having trouble excluding the "-" character from the output of this hash. I tried using delete or next if, but I am not getting rid of the - count in the output.
So with this input:
@extract = ------------------------------------------------------------------MGG-------------------------------------------------------------------------------------
And following code:
#Count selected amino acids.
my %counter = ();
foreach my $extract(@extract) {
#next if $_ =~ /\-/; #This line code does not function correctly.
$counter{$_}++;
}
sub largest_value_mem (\%) {
my $counter = shift;
my ($key, @keys) = keys %$counter;
my ($big, @vals) = values %$counter;
for (0 .. $#keys) {
if ($vals[$_] > $big) {
$big = $vals[$_];
$key = $keys[$_];
}
}
$key
}
I expect the most common element to be G, same as the output. If there is a tie in the elements, say G = M, it would be great if there is a way to display both, but that is not necessary. Any tips on how to delete or remove the '-' are much appreciated. I am slowly learning the Perl language.
Please let me know if what I am asking is not clear or if more information is needed, thanks again kindly for all the comments.
Your data doesn't entirely make sense, since it's not actually working Perl code. I'm guessing that it's a string divided into characters. After that, it sounds like you just want to find the highest-frequency character, which is essentially just a sort by descending count.
Therefore the following demonstrates how to count your characters and then sort the results:
use strict;
use warnings;
my $str = '------------------------------------------------------------------MGG-------------------------------------------------------------------------------------';
my @chars = split '', $str;
# Count characters
my %count;
$count{$_}++ for @chars;
delete $count{'-'}; # Don't count -
# Sort keys by count descending
my @keys = sort {$count{$b} <=> $count{$a}} keys %count;
for my $key (@keys) {
print "$key $count{$key}\n";
}
Outputs:
G 2
M 1
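The question also asked about ties. One possible sketch, assuming a tie should report every character that shares the top count (the sub name most_common is made up, and List::Util is a core module):

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: returns every non-gap character that shares the top count.
sub most_common {
    my ($str) = @_;
    my %count;
    $count{$_}++ for grep { $_ ne '-' } split //, $str;
    return unless %count;                    # nothing but gaps
    my $top = max values %count;
    my @winners = grep { $count{$_} == $top } keys %count;
    return sort @winners;                    # sorted for a stable order
}

print join(',', most_common('--MGG--')), "\n";     # prints "G"
print join(',', most_common('--MMGG--')), "\n";    # prints "G,M"
```

Skipping the '-' while counting gives the same result as deleting the key afterwards.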
foreach my $extract(@extract) {
#next if $_ =~ /\-/
$_ is not set here, because the loop uses the named variable $extract instead.
(In this case, $_ keeps whatever value it had from earlier code.)
Also, you can use a character class for better readability:
next if $extract=~/[-]/;

Could i search between keys of a hash and assign its value to a variable in Perl?

I want to use the substr function to extract some nucleotides from sequences. Here I have those sequences in FASTA format:
>dvex28051
AAAACAAAAACATTCGCTAGAAAGTAATCAGCTGGTCATTTATTTGAAATGTTAATGATATATTTCATGTTGCTAATTTTTTATGAAAAAAATCATTGCTTATTTAATTACTCTTGGTTCTTGACCAACTATAAAAGCATTGTTTAGTATCAAGTGTCCAGGTATCAGCAGTTTTGTTTGAAAACAAACTTTTATTCATGCAGTCAGTGGCGGATCCAGGTAGAGTGCAGAGGCAGCACCCTCCGTCAGAAAACCAAAAAAAGAAGAAATGAAAAATTATAAAAAAAATTTCTAAACGTTGGTGCACTTAAGTGTAGCAAAAAATTCCTGTTTAGATATTCAGTGGGGAGCGACACCTTTTGGGGCCTATAGCTTCAAATCTTACTTGGTGACCTAAAATCGCTTTTTCGTTGGATCTGCGAAAGCTAGAATTTGGTTGCTGCAAATCGAATCGGTGCATCAACTGCATCAATATCAACGATGTGGTGACTGGTGGTATATTTTGGGTTCGTGCAATGCTACATTTATTTCAATCATATTTCAAGGCAGAAAGGGAAAGAAAACATCAGGTCAAGACAGTGGCGTAGCGAGGGAAGGGGGGCATACGTCCCCGGGCGCAACACGATGTCTTTTTTTTTAATCATCTGCGAAATTCAGACATTTTTTAGAGACTAAATGAAACTATGGAAAACCGGGCCCTTATAAAAGTTGAGACCAAGTGAAAAACTGGGGATAAAACATGAAAATCGGGCTCCAAAAGAATGAGAGTCCGCCCTTGGTCTGTACCAGCATGATTTGAGCGCAAATTTCATTAAGCCCCCGGGCGCAAGACACTCACGCTACGCCCCTGGGTAAAGACAAACAGAGTAGTTTTTCTTATAAACACAAGCATGCACAAACAACATAAAAACAAAACACAGTTTTTTTTAAGACGATGTGCTGCGTGCACCCGCTCAATGTTTTTTTTTTTTTTTTATAGAAAAGCAAAACTTTGAAAGGTTAACGTCAACTCATTTTACAACAATTTGTGGCAAATGGTATCAAGGTATCAAGCAATTAACTAAATGTCTTCCACTAGAACGCAGAACACCATTTTGCAATTATTTATTTGATGTAAACCAGTGTGTTAGATCAAAATCACTTCGACGCCGTTTTTTGACTCCGTGAAAATCTTGGTATTCTTCTCGCATTGCATAATGATGGTTTGTTGAAATAAAATTAAACGCTTAACGTTCTTAAAATGAGCGCGATACTACTTTTCTTTGTAGATTTTCTGCATGCGCTCCTTTTAAGTTGATCCCGAGCTACAAACTTCTTTATGAACGTTTTGGATTTCTCCAAAATAAAGCCTGCAAGCAGTTTTCTAAAAACACCGCACCCCCCATTAGGAATTTCTAGATCCGCCCCTGCATACAGTATTTGTTAATTATTAAAACCAACCAGCAGCAATTGTTTATTCAATGACTATTAAACCAACCTGGATAGTGCGTTTGGTCTTGATTGAAGCGATTGCTGCATTGACGTCTTTCGGAACCACATCACC
>dvex294195
GAATCAGTGGAAAAGTCACAACGCAGCTTGCCGAATTACTGCAGATTCTTTACACTTTTTTTTCTACATTATCACTGTTTTGCTTAATTTTCAATTATAGAAATCAAAATTAATAACTGGTATGTAGTTGGTCGGTGCTTCGAGAAAGTAGCCTACTCAATGATTTCTCAGAATGTTACAGTACTTCAAAAAAACAGACTACCCATTTCAAAAAATATAAACCTAGTA
I want to compare each key of the hash with the Hit column (dvex\d++) of this table:
#Query Hit sense start end star_q end_q lenght_q # this line is informative and is not part of the data.
miRNA1 dvex28051 + 205 232 11 38 51
miRNA1 dvex202016 - 75 106 17 48 51
miRNA1 dvex294195 + 55 85 11 48 51
If the key exists, I want to assign its hash value to a variable (e.g. $sequence) and apply the substr function to it:
my $fragment = substr $sequence, $start, $length_sequence;
I made an array with the sequences, and tried reading it two values at a time and comparing them:
while (my $line1 = <$MYINPUTFILE>){ #Entry of the sequences Fasta file
chomp $line1;
push @array_lines, $line1;
}
while (my $line2 = <$IN>){ #Entry of the table
chomp $line2;
push @database_lines, $line2;
}
foreach my $database_line (@database_lines){ #each value of the table
my @entry = split /\s++/,$database_line;
$pattern = $entry[1];
$query = $entry[0];
$start = $entry[3];
$l_pattern = length $pattern;
$end = $entry[4];
$lng_sequence = ($end - $start) + 1;
$sense = $entry[2];
$l_query = $entry[7];
my $count = 2;
for (my $i = 0; $i <= $#array_lines; $i +=$count){
chomp $array_lines[$i-2];
chomp $array_lines[$i-1];
$seq = $array_lines[$i-1];
$header = $array_lines[$i-2];
if($new_header =~ /$pattern/ && $l_header == $l_pattern){
if(($end+$right_diff+$increment) > $l_query){
$clean_seq = substr $seq, $start, $l_query;
} else {;}
}
The problem with my code is that Perl always treats $seq as the last sequence and always applies the substr function to it. I need to search for $pattern among the sequences and, if it exists, assign the matching sequence to $seq, then apply the substr function.
Any suggestions?
I see two significant problems with your code. First, in the loop:
for (my $i = 0; $i <= $#array_lines; $i +=$count){
chomp $array_lines[$i-2];
chomp $array_lines[$i-1];
$seq = $array_lines[$i-1];
$i is set to zero the first time through, but you access array elements $i-1 and $i-2. Element -1 will be the last element of the array, and -2 will be the second to the last element. So it looks like $seq and $header will have incorrect values the first time through your loop. Maybe you need to start $i at $count instead of zero?
Secondly, in this line:
if(($end+$right_diff+$increment) > $l_query){
$increment appears only here in your code. It is never set to anything. Did you mean to use $i here?
A few other suggestions:
Make sure you use warnings; use strict; This will catch errors such as the $increment variable above.
Here is a simpler way to read a file into an array:
my @array_lines = <$MYINPUTFILE>;
chomp @array_lines;
Within regexes, ++ is a possessive quantifier that disables backtracking. If you want to split on one or more whitespace characters, it is more typical to use split /\s+/, or the nearly equivalent split ' ', which also ignores leading whitespace.
With this line, you appear to be simply checking that two strings are equal:
if($new_header =~ /$pattern/ && $l_header == $l_pattern)
You could just do this instead:
if($new_header eq $pattern)
When you have multiple conditions, it is clearer to put them all in one if statement instead of using nested statements. If you have many conditions, you can put them on multiple lines for clarity.
It isn't necessary to use else {;} If you don't need to do anything there, just omit the else clause altogether.
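To address the title question directly, here is a sketch of the hash-based approach: key the sequences by their FASTA header, then look each Hit up with exists. The helper name read_fasta and the toy data are invented for illustration, and the table coordinates are assumed to be 1-based and inclusive, which the question does not state:

```perl
use strict;
use warnings;

# Hypothetical helper: parses FASTA text into a hash of header => sequence.
sub read_fasta {
    my ($text) = @_;
    my (%seq_for, $header);
    for my $line (split /\n/, $text) {
        if ($line =~ /^>(\S+)/) { $header = $1 }
        elsif (defined $header) { $seq_for{$header} .= $line }
    }
    return %seq_for;
}

# Toy data standing in for the real files.
my %seq_for = read_fasta(">dvexA\nAAAACCCCGG\nTT\n>dvexB\nGGGGTTTT\n");
my @table   = ("miRNA1 dvexA + 2 5", "miRNA1 dvexZ - 1 3");

for my $row (@table) {
    my ($query, $hit, $sense, $start, $end) = split ' ', $row;
    next unless exists $seq_for{$hit};    # skip hits with no sequence
    # Assumption: 1-based inclusive coordinates; substr offsets are 0-based.
    my $fragment = substr $seq_for{$hit}, $start - 1, $end - $start + 1;
    print "$hit $fragment\n";             # prints "dvexA AAAC"
}
```

A hash lookup removes the need for the inner loop over array pairs, so $seq can no longer be left holding the last sequence.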

how to add numbers in numeric string in perl

I am facing some problem while adding values in numeric string:
I have a string that looks like 02:03:05:07:04:06. All the numbers have to be <10. Now, I have to take a random number from 1-9 (e.g. 3) and add that number to the number in the last position of the string.
If the sum > 10, then I have to add that number to the number in the second-to-last position.
So far, I have
#!/usr/bin/perl -w
use strict;
my $str='02:03:05:07:04:06';
my @arr=split(/:/,$str);
my @new_arr=pop(@arr);
my $rand_val=int(rand(9));
my $val=$new_arr[0]+$rand_val;
if($val>=10)
{
# I am unable to work out the logic here :(
}
Please help me out of this problem.
After adding the number we have to join the string and print it also :)
my $str = '02:03:05:07:04:06';
my @nums = split /:/, $str;
my $add = int(rand(9)) + 1;
my $overflow = 1;
for (1..@nums) {
if ($nums[-$_] + $add < 10) {
$nums[-$_] += $add;
$overflow = 0;
last;
}
}
die "Overflow" if $overflow;
$str = join ':', map sprintf('%02d', $_), @nums;
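For testing, it can help to take the random number out of the picture. The following sketch wraps the same right-to-left search in a sub that takes the number to add as an argument (the name add_to_last_fitting is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: adds $add to the right-most field that stays below 10.
sub add_to_last_fitting {
    my ($str, $add) = @_;
    my @nums = split /:/, $str;
    for my $i (reverse 0 .. $#nums) {
        next if $nums[$i] + $add >= 10;    # would overflow, try the field to the left
        $nums[$i] += $add;
        return join ':', map { sprintf '%02d', $_ } @nums;
    }
    return;    # no field can absorb $add
}

print add_to_last_fitting('02:03:05:07:04:06', 3), "\n";    # prints "02:03:05:07:04:09"
print add_to_last_fitting('02:03:05:07:04:06', 5), "\n";    # prints "02:03:05:07:09:06"
```

Calling it with int(rand(9)) + 1 as the second argument gives the random behaviour back.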
I just ran this and it works. The caveat is that the lower the last number of the string is, the smaller the chance that the condition if ($val>=10) will be true.
This doesn't solve the problem of your rand_val potentially being 0, but I'll leave that as a task for you to resolve. This should give you what you're looking for in terms of traversing the values in the array from the end until the sum of the random value and the current value is less than 10.
use strict;
my $str='02:03:05:07:04:06';
my @arr=split(/:/,$str);
my $rand_val=int(rand(9));
my $val;

foreach my $i (reverse @arr){
$val = $i + $rand_val;
next if ($val >= 10);
print "val: $val, rand_val: $rand_val, value_used: $i\n";
last if ($val < 10);
}
I see a mistake: you do
my @new_arr=pop(@arr);
(...)
my $val=$new_arr[0]+$rand_val;
but pop only returns the last element, not a list.