Using perl, given an array of any size, how do I randomly pick 1/4 of the list - perl

For clarification, if I had a list of 8 elements, i would want to randomly pick 2. If I had a list of 20 elements, I would want to randomly pick 5. I would also like to assure (though not needed) that two elements don't touch, i.e. if possible not the 3 and then 4 element. Rather, 3 and 5 would be nicer.

The simplest solution:
Shuffle the list
select the 1st quarter.
Example implementation:
use List::Util qw/shuffle/;
my #nums = 1..20;
my #pick = (shuffle #nums)[0 .. 0.25 * $#nums];
say "#pick";
Example output: 10 2 18 3 19.
Your additional restriction “no neighboring numbers” actually makes this less random, and should be avoided if you want actual randomness. To avoid that two neighboring elements are included in the output, I would iteratively splice unwanted elements out of the list:
my #nums = 1..20;
my $size = 0.25 * #nums;
my #pick;
while (#pick < $size) {
my $i = int rand #nums;
push #pick, my $num = $nums[$i];
# check and remove neighbours
my $len = 1;
$len++ if $i < $#nums and $num + 1 == $nums[$i + 1];
$len++, $i-- if 0 < $i and $num - 1 == $nums[$i - 1];
splice #nums, $i, $len;
}
say "#pick";

use strict;
use warnings;
sub randsel {
my ($fact, $i, #r) = (1.0, 0);
while (#r * 4 < #_) {
if (not grep { $_ == $i } #r) {
$fact = 1.0;
# make $fact = 0.0 if you really don't want
# consecutive elements
$fact = 0.1 if grep { abs($i - $_) == 1 } #r;
push(#r, $i) if (rand() < 0.25 * $fact);
}
$i = ($i + 1) % #_;
}
return map { $_[$_] } sort { $a <=> $b } #r;
}
my #l;
$l[$_] = $_ for (0..19);
print join(" ", randsel(#l)), "\n";

Related

Sum the odd and even indices of an array separately - Perl

I have an array of 11 elements. Where I want to sum the odd elements including the first and last elements as one scalar and the evens as another.
This is my code I am trying to use map adding 2 to each index to achieve the result but I think I have got it wrong.
use strict;
use warnings;
use Data::Dumper;
print 'Enter the 11 digiet serial number: ';
chomp( my #barcode = //, <STDIN> );
my #sum1 = map { 2 + $_ } $barcode[1] .. $barcode[11];
my $sum1 = sum Dumper( \#sum1 );
# sum2 = l2 + l4 + l6 + r8 + r10;
printf "{$sum1}";
What is a good way to achieve this?
Sum of even/odd indicies (what you asked for, but not what you want[1]):
use List::Util qw( sum ); # Or: sub sum { my $acc; $acc += $_ for #_; $acc }
my $sum_of_even_idxs = sum grep { $_ % 2 == 0 } 0..$#nums;
my $sum_of_odd_idxs = sum grep { $_ % 2 == 1 } 0..$#nums;
Sum of even/odd values (what you also asked for, but not what you want[1]):
use List::Util qw( sum ); # Or: sub sum { my $acc; $acc += $_ for #_; $acc }
my $sum_of_even_vals = sum grep { $_ % 2 == 0 } #nums;
my $sum_of_odd_vals = sum grep { $_ % 2 == 1 } #nums;
Sum of values at even/odd indexes (what you appear to want):
use List::Util qw( sum ); # Or: sub sum { my $acc; $acc += $_ for #_; $acc }
my $sum_of_vals_at_even_idxs = sum #nums[ grep { $_ % 2 == 0 } 0..$#nums ];
my $sum_of_vals_at_odd_idxs = sum #nums[ grep { $_ % 2 == 1 } 0..$#nums ];
Given that you know how many elements you have, you could use the following:
use List::Util qw( sum ); # Or: sub sum { my $acc; $acc += $_ for #_; $acc }
my $sum_of_vals_at_even_idxs = sum #nums[0,2,4,6,8,10];
my $sum_of_vals_at_odd_idxs = sum #nums[1,3,5,7,9];
I included these in case someone needing these lands on this Q&A.
Add up values at odd and at even indices
perl -wE'#ary = 1..6;
for (0..$#ary) { $_ & 1 ? $odds += $ary[$_] : $evens += $ary[$_] };
say "odds: $odds, evens: $evens"
'
Note for tests: with even indices (0,2,4) we have (odd!) values (1,3,5), in this (1..6) example
You can use the fact that the ?: operator is assignable
print 'Enter the 11 digiet serial number: ';
chomp( my #barcode = //, <STDIN> );
my $odd = 0;
my $even = 0;
for (my $index = 0; $index < #barcode; $index++) {
($index % 2 ? $even : $odd) += $barcode[$index];
}
This works by indexing over #barcode and taking the mod 2 of the index, ie dividing the index by 2 and taking the remainder, and if the remainder is 1 adding that element of #barcode to $even otherwise to $odd.
That looks strange until you remember that arrays are 0 based so your first number of the barcode is stored in $barcode[0] which is an even index.
chomp( my #barcode = //, <STDIN> ); presumably was supposed to have a split before the //?
#barcode will have all the characters in the line read, including the newline. The chomp will change the final element from a newline to an empty string.
Better to chomp first so you just have your digits in the array:
chomp(my $barcode = <STDIN>);
my #barcode = split //, $barcode;
Another Perl, if the string is of length 11 and contains only digits
$ perl -le ' $_="12345678911"; s/(.)(.)|(.)$/$odd+=$1+$3;$even+=$2/ge; print "odd=$odd; even=$even" '
odd=26; even=21
$
with different input
$ perl -le ' $_="12121212121"; s/(.)(.)|(.)$/$odd+=$1+$3;$even+=$2/ge; print "odd=$odd; even=$even" '
odd=6; even=10
$

Heapsort function behaves the same with push and unshift.(Perl)

I've been trying to implement a heap-sort function but it sorts the array in reverse order. The weird part is that it doesn't matter if I use unshift or push, the elements always get printed reversed.
#!/usr/bin/perl -w
use 5.014;
no warnings 'recursion';
sub heapify{
my $index = pop #_;
my $larger;
unless ($index > int(#_/2-1))
{
my $left = 2 * $index + 1;
my $right = 2 * $index + 2;
if($right < #_ && ($_[$left]<$_[$right]))
{
$larger = $right;
} else {
$larger = $left;
}
if($_[$index] < $_[$larger])
{
($_[$index],$_[$larger]) =
($_[$larger],$_[$index]) ;
heapify(#_,$larger);
}
}
}
sub max_heap{
for(my $i = int(#_/2 -1) ; $i > -1; --$i){
heapify(#_,$i);
}
}
sub heapsort{
return unless #_ > 1 ;
max_heap(#_);
my $last = shift(#_);
heapsort(#_);
push(#_,$last);
}
my #test = (9,3,13,7,6,78,2);
heapsort(#test);
say "Heapsorted:";
say join("\n",#test);
Used pseudocode
use strict;
use warnings;
my #a = reverse 1..500;
heapSort(\#a);
print "#a\n";
sub heapSort {
my ($a) = #_;
# input: an unordered array a of length count
# (first place a in max-heap order)
heapify($a);
my $end = $#$a; # //in languages with zero-based arrays the children are 2*i+1 and 2*i+2
while ($end) {
# (swap the root(maximum value) of the heap with the last element of the heap)
#$a[$end, 0] = #$a[0, $end]; # swap(a[end], a[0])
# (decrease the size of the heap by one so that the previous max value will
# stay in its proper placement)
$end--;
# (put the heap back in max-heap order)
siftDown($a, 0, $end);
}
}
sub heapify {
my ($a) = #_;
my $count = #$a;
# (start is assigned the index in a of the last parent node)
my $start = ($count - 2 ) / 2;
while ($start >= 0) {
#(sift down the node at index start to the proper place such that all nodes below
# the start index are in heap order)
siftDown($a, $start, $count-1);
$start--;
#(after sifting down the root all nodes/elements are in heap order)
}
}
sub siftDown {
my ($a, $start, $end) = #_;
# input: end represents the limit of how far down the heap
# to sift.
my $root = $start;
while ($root * 2 + 1 <= $end) { # (While the root has at least one child)
my $child = $root * 2 + 1; # (root*2 + 1 points to the left child)
my $swap = $root; # (keeps track of child to swap with)
#(check if root is smaller than left child)
$swap = $child if $a->[$swap] < $a->[$child];
#(check if right child exists, and if it's bigger than what we're currently swapping with)
$swap = $child + 1 if $child+1 <= $end and $a->[$swap] < $a->[$child+1];
# (check if we need to swap at all)
if ($swap != $root) {
# swap(a[root], a[swap])
#$a[$root, $swap] = #$a[$swap, $root];
$root = $swap; # (repeat to continue sifting down the child now)
}
else {
return;
}
}
}
This is to mention the required change in the code by Сухой27 answered Dec 31 '13 at 17:46.
Code works fine when input array has even n number of elements, but goes wrong when n is odd.
For an input like #a = qw(10 11 2);
Output would be = 2 11 10
The output is wrong because in the first step, the above script skips checking of left child of last parent. When left is larger, the same does not swap with its parent.
The fix is simple:
my $start = int(($count - 2 ) / 2);

Trying to figure out if there's a shorter/better way to implement a conditional-sum-like function

Warning: Project Euler Problem 1 Spoiler
I recently discovered Project Euler and decided to try a few of the problems. The first problem was to sum the numbers from 0-999 that are multiples of 3 or 5.
My first, "java-like" solution was:
print threeAndFive(1000)."\n";
# Returns the sum of the numbers less than $max that are multiples of 3 or 5
sub threeAndFive
{
my $max = shift;
my $sum = 0;
for (my $i=; $i < $max; $i++)
{
$sum+=$i if (validate($i));
}
return $sum;
}
sub validate
{
my $num = shift;
if ($num % 3 == 0 || $num % 5 == 0)
{
return 1;
}
return undef;
}
I then rewrote it in a more perlish fashion:
print eval(join ('+', map {($_ % 3 == 0 || $_ % 5 == 0) ? $_ : ()} (1 .. 999)));
While this is obviously way more concise than the original code, I feel that it can probably be shorter or done in a better fashion. For example, in Python, one can do:
print sum([i for i in range(1,1000) if i%3==0 or i%5==0])
Are there more concise/better/clearer ways to do this? Or other equivalent ways that use different functions? I'm interested in learning as much perl as I can, so the more solutions, the merrier.
Thanks in advance.
The Straightforward Approach
To answer your question, List::Util provides sum.
use List::Util qw( sum );
Or you could write your own
sub sum { my $acc; $acc += $_ for #_; $acc }
Then you get:
say sum grep { $_ % 3 == 0 || $_ % 5 == 0 } 0..999;
Of course, that's an unoptimised approach.
The Optimised Approach
You can easily reduce the above to Ω(1) memory from Ω(N) by using a counting loop.
my $acc;
for (1..999) { $acc += $_ if $_ % 3 == 0 || $_ % 5 == 0; }
say $acc;
But that's far from the best, since the result can be obtained in Ω(1) time and memory!
This is done by adding the sum of the multiples of 3 to the sum of the multiples of 5, then subtracting the sum of the multiples of 15, because the sums of the multiples of $x can be calculated using
( sum 1..floor($n/$x) ) * $x # e.g. 3+6+9+... = (1+2+3+...)*3
which can take advantage of the formula
sum 1..$n = $n * ($n+1) * 0.5
Less concise, but faster:
sub sum1toN { my $N = int(shift); ($N * ($N+1)) / 2; }
my $N = 999;
print sum1toN($N/3)*3 + sum1toN($N/5)*5 - sum1toN($N/15)*15, "\n";
The sum1toN function computes the sum of integers from 1 to N.
Since:
3 + 6 + 9 + 12 ... + 999
Equals:
(1 + 2 + 3 + ... 333 ) * 3
We can computes sum of multiples of 3 using sum1toN(N/3) * 3. And the same applies to 5. Note that since we count the multiples of by 15 in both cases, a subtraction of sum1toN(N/15)*15 is needed.

Perl to count current value based on next value

Currently I'm learning Perl and gnuplot. I would like to know how to count certain value based on the next value. For example I have a text file consist of:
#ID(X) Y
1 1
3 9
5 11
The output should show the value of the unknown ID as well. So, the output should show:
#ID(X) Y
1 1
2 5
3 9
4 10
5 11
The Y of ID#2 is based on the following:
((2-3)/(1-3))*1 + ((2-1)/(3-1))*9 which is linear algebra
Y2=((X2-X3)/(X1-X3))*Y1 + ((X2-X1)/(X3-X1)) * Y3
Same goes to ID#5
Currently I have this code,
#! /usr/bin/perl -w
use strict;
my $prev_id = 0;
my $prev_val = 0;
my $next_id;
my $next_val;
while (<>)
{
my ($id, $val) = split;
for (my $i = $prev_id + 1; $i < $next_id; $i++)
{
$val = (($id - $next_id) / ($prev_id - $next_id)) * $prev_val + (($id - $prev_id) / ($next_id - $prev_id)) * $next_val;
printf ("%d %s\n", $i, $val);
}
printf ("%d %s\n", $id, $val);
($prev_val, $prev_id) = ($val, $id);
($next_val, $next_id) = ($prev_val, $prev_id);
}
Your formula seems more complicated than I would expect, given that you are always dealing with integer spacings of 1.
You did not say whether you want to fill gaps for multiple consecutive missing values, but let's assume you want to.
What you do is read in the first line, and say that's the current one and you output it. Now you read the next line, and if its ID is not the expected one, you fill the gaps with simple linear interpolation...
Pseudocode
(currID, currY) = readline()
outputvals( currID, currY )
while lines remain do
(nextID, nextY) = readline()
gap = nextID - currID
for i = 1 to gap
id = currID + i
y = currY + (nextY - currY) * i / gap
outputvals( id, y )
end
(currID, currY) = (nextID, nextY)
end
Sorry for the non-Perl code. It's just that I haven't been using Perl for ages, and can't remember half of the syntax. =) The concepts here are pretty easy to translate into code though.
Using an array may be the way to go. This will also make your data available for further manipulation.
** Caveat: will not work for multiple consecutive missing values of y; see #paddy's answer.
#!/usr/bin/perl
use strict;
use warnings;
my #coordinates;
while (<DATA>) {
my ($x, $y) = split;
$coordinates[$x] = $y;
}
# note that the for loop starts on index 1 here ...
for my $x (1 .. $#coordinates) {
if (! $coordinates[$x]) {
$coordinates[$x] = (($x - ($x + 1)) / (($x - 1) - ($x + 1)))
* $coordinates[$x - 1]
+ (($x - ($x - 1)) / (($x + 1) - ($x - 1)))
* $coordinates[$x + 1];
}
print "$x - $coordinates[$x]\n";
}
__DATA__
1 1
3 9
5 11
You indicated your problem is getting the next value. The key isn't to look ahead, it's to look behind.
my $prev = get first value;
my ($prev_a, $prev_b) = parse($prev);
my $this = get second value;
my ($this_a, $this_b) = parse($this);
while ($next = get next value) {
my ($next_a, $next_b) = parse($next);
...
$prev = $this; $prev_a = $this_a; $prev_b = $this_b;
$this = $next; $this_a = $next_a; $this_b = $next_b;
}
#! /usr/bin/perl -w
use strict;
my #in = (1,9,11);
my #out;
for (my $i = 0; $i<$#in; $i++) {
my $j = $i*2;
my $X1 = $i;
my $X2 = $i+1;
my $X3 = $i+2;
my $Y1 = $in[$i];
my $Y3 = $in[$i+1];
my $Y2 = $Y1*(($X2-$X3)/($X1-$X3))
+ $Y3*(($X2-$X1)/($X3-$X1));
$out[$j] = $in[$i];
$out[$j+1] = $Y2;
}
$out[$#in*2] = $in[$#in];
print (join " ",#out);

How do I change this to "idiomatic" Perl?

I am beginning to delve deeper into Perl, but am having trouble writing "Perl-ly" code instead of writing C in Perl. How can I change the following code to use more Perl idioms, and how should I go about learning the idioms?
Just an explanation of what it is doing: This routine is part of a module that aligns DNA or amino acid sequences(using Needelman-Wunch if you care about such things). It creates two 2d arrays, one to store a score for each position in the two sequences, and one to keep track of the path so the highest-scoring alignment can be recreated later. It works fine, but I know I am not doing things very concisely and clearly.
edit: This was for an assignment. I completed it, but want to clean up my code a bit. The details on implementing the algorithm can be found on the class website if any of you are interested.
sub create_matrix {
my $self = shift;
#empty array reference
my $matrix = $self->{score_matrix};
#empty array ref
my $path_matrix = $self->{path_matrix};
#$seq1 and $seq2 are strings set previously
my $num_of_rows = length($self->{seq1}) + 1;
my $num_of_columns = length($self->{seq2}) + 1;
#create the 2d array of scores
for (my $i = 0; $i < $num_of_rows; $i++) {
push(#$matrix, []);
push(#$path_matrix, []);
$$matrix[$i][0] = $i * $self->{gap_cost};
$$path_matrix[$i][0] = 1;
}
#fill out the first row
for (my $i = 0; $i < $num_of_columns; $i++) {
$$matrix[0][$i] = $i * $self->{gap_cost};
$$path_matrix[0][$i] = -1;
}
#flag to signal end of traceback
$$path_matrix[0][0] = 2;
#double for loop to fill out each row
for (my $row = 1; $row < $num_of_rows; $row++) {
for (my $column = 1; $column < $num_of_columns; $column++) {
my $seq1_gap = $$matrix[$row-1][$column] + $self->{gap_cost};
my $seq2_gap = $$matrix[$row][$column-1] + $self->{gap_cost};
my $match_mismatch = $$matrix[$row-1][$column-1] + $self->get_match_score(substr($self->{seq1}, $row-1, 1), substr($self->{seq2}, $column-1, 1));
$$matrix[$row][$column] = max($seq1_gap, $seq2_gap, $match_mismatch);
#set the path matrix
#if it was a gap in seq1, -1, if was a (mis)match 0 if was a gap in seq2 1
if ($$matrix[$row][$column] == $seq1_gap) {
$$path_matrix[$row][$column] = -1;
}
elsif ($$matrix[$row][$column] == $match_mismatch) {
$$path_matrix[$row][$column] = 0;
}
elsif ($$matrix[$row][$column] == $seq2_gap) {
$$path_matrix[$row][$column] = 1;
}
}
}
}
You're getting several suggestions regarding syntax, but I would also suggest a more modular approach, if for no other reason that code readability. It's much easier to come up to speed on code if you can perceive the big picture before worrying about low-level details.
Your primary method might look like this.
sub create_matrix {
my $self = shift;
$self->create_2d_array_of_scores;
$self->fill_out_first_row;
$self->fill_out_other_rows;
}
And you would also have several smaller methods like this:
n_of_rows
n_of_cols
create_2d_array_of_scores
fill_out_first_row
fill_out_other_rows
And you might take it even further by defining even smaller methods -- getters, setters, and so forth. At that point, your middle-level methods like create_2d_array_of_scores would not directly touch the underlying data structure at all.
sub matrix { shift->{score_matrix} }
sub gap_cost { shift->{gap_cost} }
sub set_matrix_value {
my ($self, $r, $c, $val) = #_;
$self->matrix->[$r][$c] = $val;
}
# Etc.
One simple change is to use for loops like this:
for my $i (0 .. $num_of_rows){
# Do stuff.
}
For more info, see the Perl documentation on foreach loops and the range operator.
I have some other comments as well, but here is the first observation:
my $num_of_rows = length($self->{seq1}) + 1;
my $num_of_columns = length($self->{seq2}) + 1;
So $self->{seq1} and $self->{seq2} are strings and you keep accessing individual elements using substr. I would prefer to store them as arrays of characters:
$self->{seq1} = [ split //, $seq1 ];
Here is how I would have written it:
sub create_matrix {
my $self = shift;
my $matrix = $self->{score_matrix};
my $path_matrix = $self->{path_matrix};
my $rows = #{ $self->{seq1} };
my $cols = #{ $self->{seq2} };
for my $row (0 .. $rows) {
$matrix->[$row]->[0] = $row * $self->{gap_cost};
$path_matrix->[$row]->[0] = 1;
}
my $gap_cost = $self->{gap_cost};
$matrix->[0] = [ map { $_ * $gap_cost } 0 .. $cols ];
$path_matrix->[0] = [ (-1) x ($cols + 1) ];
$path_matrix->[0]->[0] = 2;
for my $row (1 .. $rows) {
for my $col (1 .. $cols) {
my $gap1 = $matrix->[$row - 1]->[$col] + $gap_cost;
my $gap2 = $matrix->[$row]->[$col - 1] + $gap_cost;
my $match_mismatch =
$matrix->[$row - 1]->[$col - 1] +
$self->get_match_score(
$self->{seq1}->[$row - 1],
$self->{seq2}->[$col - 1]
);
my $max = $matrix->[$row]->[$col] =
max($gap1, $gap2, $match_mismatch);
$path_matrix->[$row]->[$col] = $max == $gap1
? -1
: $max == $gap2
? 1
: 0;
}
}
}
Instead of dereferencing your two-dimensional arrays like this:
$$path_matrix[0][0] = 2;
do this:
$path_matrix->[0][0] = 2;
Also, you're doing a lot of if/then/else statements to match against particular subsequences: this could be better written as given statements (perl5.10's equivalent of C's switch). Read about it at perldoc perlsyn:
given ($matrix->[$row][$column])
{
when ($seq1_gap) { $path_matrix->[$row][$column] = -1; }
when ($match_mismatch) { $path_matrix->[$row][$column] = 0; }
when ($seq2_gap) { $path_matrix->[$row][$column] = 1; }
}
The majority of your code is manipulating 2D arrays. I think the biggest improvement would be switching to using PDL if you want to do much stuff with arrays, particularly if efficiency is a concern. It's a Perl module which provides excellent array support. The underlying routines are implemented in C for efficiency so it's fast too.
I would always advise to look at CPAN for previous solutions or examples of how to do things in Perl. Have you looked at Algorithm::NeedlemanWunsch?
The documentation to this module includes an example for matching DNA sequences. Here is an example using the similarity matrix from wikipedia.
#!/usr/bin/perl -w
use strict;
use warnings;
use Inline::Files; #multiple virtual files inside code
use Algorithm::NeedlemanWunsch; # refer CPAN - good style guide
# Read DNA sequences
my #a = read_DNA_seq("DNA_SEQ_A");
my #b = read_DNA_seq("DNA_SEQ_B");
# Read Similarity Matrix (held as a Hash of Hashes)
my %SM = read_Sim_Matrix();
# Define scoring based on "Similarity Matrix" %SM
sub score_sub {
if ( !#_ ) {
return -3; # gap penalty same as wikipedia)
}
return $SM{ $_[0] }{ $_[1] }; # Similarity Value matrix
}
my $matcher = Algorithm::NeedlemanWunsch->new( \&score_sub, -3 );
my $score = $matcher->align( \#a, \#b, { align => \&check_align, } );
print "\nThe maximum score is $score\n";
sub check_align {
my ( $i, $j ) = #_; # #a[i], #b[j]
print "seqA pos: $i, seqB pos: $j\t base \'$a[$i]\'\n";
}
sub read_DNA_seq {
my $source = shift;
my #data;
while (<$source>) {
push #data, /[ACGT-]{1}/g;
}
return #data;
}
sub read_Sim_Matrix {
#Read DNA similarity matrix (scores per Wikipedia)
my ( #AoA, %HoH );
while (<SIMILARITY_MATRIX>) {
push #AoA, [/(\S+)+/g];
}
for ( my $row = 1 ; $row < 5 ; $row++ ) {
for ( my $col = 1 ; $col < 5 ; $col++ ) {
$HoH{ $AoA[0][$col] }{ $AoA[$row][0] } = $AoA[$row][$col];
}
}
return %HoH;
}
__DNA_SEQ_A__
A T G T A G T G T A T A G T
A C A T G C A
__DNA_SEQ_B__
A T G T A G T A C A T G C A
__SIMILARITY_MATRIX__
- A G C T
A 10 -1 -3 -4
G -1 7 -5 -3
C -3 -5 9 0
T -4 -3 0 8
And here is some sample output:
seqA pos: 7, seqB pos: 2 base 'G'
seqA pos: 6, seqB pos: 1 base 'T'
seqA pos: 4, seqB pos: 0 base 'A'
The maximum score is 100