Perl mergesort - array reference - perl

I was trying to implement merge-sort in Perl, I am quite new to Perl and I know I am doing something wrong with the array references. The arrays end up holding the same value after the process is done. Please help cause I don't see where I am going wrong.
The Corrected Code:
use strict;
use warnings;
my ( #aref, #auxref ) = ();
my ( $hi, $lo, $i, $j, $k, $n ) = 0;
#aref = ( 5, 7, 6, 3, 4, 1, 8, 9, 4 );
$n = #aref;
mergeSort( \#aref, \#auxref, 0, $n - 1 );
print "#auxref\n";
print "#aref\n";
sub mergeSort {
my ($aref) = $_[0];
my ($auxref) = $_[1];
my $lo = $_[2];
my $hi = $_[3];
if ( $hi <= $lo ) { return; }
my $mid = 0;
$mid = int( $lo + ( $hi - $lo ) / 2 );
mergeSort( $aref, $auxref, $lo, $mid );
mergeSort( $aref, $auxref, $mid + 1, $hi );
merge( $aref, $auxref, $lo, $mid, $hi );
}
sub merge {
my ($aref) = $_[0];
my ($auxref) = $_[1];
my $lo = $_[2];
my $mid = $_[3];
my $hi = $_[4];
for ( $i = $lo ; $i <= $hi ; $i++ ) {
$auxref->[$i] = $aref->[$i];
}
$i = $lo;
$j = $mid + 1;
for ( $k = $lo ; $k <= $hi ; $k++ ) {
if ( $i > $mid ) {
$aref->[$k] = $auxref->[$j];
$j++;
}
elsif ( $j > $hi ) {
$aref->[$k] = $auxref->[$i];
$i++;
}
elsif ( $auxref->[$i] <= $auxref->[$j] ) {
$aref->[$k] = $auxref->[$i];
$i++;
}
else {
$aref->[$k] = $auxref->[$j];
$j++;
}
}
}

In sub merge, you have two array refs: $auxref and $aref.
And you're accessing the array elements as though they were ordinary arrays (i.e. $aref[0]) but as they are array references, you need to dereference with an arrow first: $aref->[0].
Adding use strict; and use warnings; to the top of your script should have weeded out these errors though?
Arrays
my #arr = (1, 2, 3, 4);
$arr[0] = 5;
push #arr, 6;
# #arr = (5, 2, 3, 4, 6)
Array References
my $arr = [1,2,3];
$arr->[0] = 5;
push #$arr, 6;
# $arr = [5, 2, 3, 4, 6];
2D arrays of array references
my #arr = ([1, 2], [3, 4]);
print $arr[0][1]; # identical to $arr[0]->[1];
push #{$arr[1]}, 5;
# #arr = ([1, 2], [3, 4, 5]);
2D arrayref of array references
my $arr = [[1, 2], [3, 4]];
print $arr->[0][1]; # identical to $arr->[0]->[1];
push #{$arr->[1]}, 5;
# $arr = [[1, 2], [3, 4, 5]];
2D Array of arrays
...can't exist because an array can only hold scalars
my #arr = ((1, 2), (3, 4));
# #arr = (1, 2, 3, 4);

The following is a version of merge sort that doesn't rely on references at all. It almost certainly isn't as memory efficient as some of the original merge sort algorithms were intended, but it gets the job done.
use strict;
use warnings;
my #array = ( 5, 7, 6, 3, 4, 1, 8, 9, 4 );
my #sorted = mergeSort(#array);
print "#sorted\n";
sub mergeSort {
my #array = #_;
if (#array > 1) {
my $mid = int(#array / 2);
my #lowArray = mergeSort(#array[0..$mid-1]);
my #highArray = mergeSort(#array[$mid..$#array]);
# Merge the two halves
my #newArray = ();
while (#lowArray && #highArray) {
if ($lowArray[0] < $highArray[0]) {
push #newArray, shift #lowArray;
} else {
push #newArray, shift #highArray;
}
}
# Either the low or high array will be empty at this point,
# so no need to compare for the remainder.
return (#newArray, #lowArray, #highArray);
} else {
return #array;
}
}

Related

perl: sprintf for element in list

I've been really confused about this, I'm trying to create a big matrix of numbers and I want to use sprintf with perl to have a nicer output. I'm trying to use sprintf like so
my $x = 0;
my $y = 0;
for ($x=1; $x<=$steps; $y++) { # loop through lines
for ($y=0; $y<=$distances; $y++) {
my $format = sprintf ("%s",$matrix[$x][$y]);
but this is really doing my head in, as I am looping through all the values of $x and $y and getting their combinations. So I am not sure if I'm meant to use more formatting arguments like so
my $format = sprintf ("%s%s%s",$matrix[$x][$y]);
(of course this is giving me compilation errors as it's not right)
But when I only use one argument, I can't put spaces in between my columns :/ Can somebody explain what's happening? I really don't understand what I'm meant to do to get the formatting nice. I'm looking to just align the columns and have a couple of whitespaces between them. Thank you all so much.
I would be thinking in terms of using map, as a way to display every element:
#!/usr/bin/env perl
use strict;
use warnings;
my #matrix = ( [1,2,3,4],
[5,6,7,8],
[9,10,11,12], );
print join ("\n", map { join ( "\t", #$_ ) } #matrix );
This is formatting on tab-stops, rather than fixed width columns, and outputs:
1 2 3 4
5 6 7 8
9 10 11 12
If you particularly wanted sprintf though:
foreach my $row ( #matrix ) {
print map { sprintf("%5s", $_) } #$row,"\n";
}
(5 columns wide).
In each of these, I'm working on whole rows - that only really applies though, if I'm right about the assumptions I've made about which elements you're displaying.
At a very basic level - your code could work as:
#!/usr/bin/env perl
use strict;
use warnings;
my #matrix = ( [ 1, 2, 3, 4 ],
[ 5, 6, 7, 8 ],
[ 9, 10, 11, 12 ], );
my $steps = 2;
my $distances = 3;
for ( my $x = 1; $x <= $steps; $x++ ) { # loop through lines
for ( my $y = 0; $y <= $distances; $y++ ) {
printf( "%5s", $matrix[$x][$y] );
}
print "\n";
}
Although note - that will only work with equal numbers of columns. You could, however, do something like:
#!/usr/bin/env perl
use strict;
use warnings;
my #matrix = ( [ 1, 2, ],
[ 3, 4, 5, ],
[ 6, 7, 8, 9, 10, 11, 12 ], );
my $steps = 2;
my $distances = 3;
for ( my $x = 1; $x <= $steps; $x++ ) { # loop through lines
for ( my $y = 0; $y <= $distances; $y++ ) {
printf( "%5s", $matrix[$x][$y] // '' );
}
print "\n";
}
Which omits the first row (because you set $x to 1), and iterates up to 4 columns:
3 4 5
6 7 8 9
This omits the extra values on the last line, and uses // to test if the cell is empty or not.
for my $row (#matrix) {
my $format = join(' ', ('%5.2f') x #$row)."\n";
printf($format, #$row);
}
If all rows have the same number of columns, you could calculate the format once.
if (#matrix) {
my $format = join(' ', ('%5.2f') x #{$matrix[0]})."\n";
for my $row (#matrix) {
printf($format, #$row);
}
}
If the size of the columns isn't unknown in advance, you'll need to need to perform the following in order:
Format the cells (if needed),
Find the length of the largest cell of each column, then
Print out the matrix with padding.
The following assumes every row of the matrix is the same length.
use List::Util qw( max );
if (#matrix) {
for my $row (#matrix) {
$_ = sprinf('%.2f', $_) for #$row;
}
my $num_cols = #{$matrix[0]};
my #col_sizes = (0) x $num_cols;
for my $row (#matrix) {
$col_sizes[$x] = max(0, $col_sizes[$x], $row->[$x]);
}
my $format = join(' ', map { "%$_s" } #col_sizes)."\n";
for my $row (#matrix) {
printf($format, #$row);
}
}

Modifying an array through a reference

I am trying to write a subroutine that will receive an array reference and then delete some of the elements of the array. For example:
use strict;
use warnings;
my #a = (1, 2, 3, 6);
func1 (\#a);
sub func1 {
my $a = shift;
my #b = (2, 6);
for my $val_to_remove (#b) {
for my $i (0..$#$a) {
my $val = $a->[$i];
if ( $val == $val_to_remove ) {
splice #$a, $i, 1;
last;
}
}
}
}
This seems, to say the least, a little awkward using two for loops.
Is it possible to simplify this?
I also tried
use strict;
use warnings;
my #a = (1, 2, 3, 6);
my $temp = \#a;
func2 (\$temp );
sub func2 {
my $a = shift;
$$a = [2, 6];
}
but then #a is not modified, but rather $temp will be..
I would also rather like to avoid passing a reference to a reference, since that will mess up the calling syntax for other modules.
Use a hash as an indicator function for identifying efficiently the items to be removed; use a grep for filtering them out:
sub func1 {
my $a = shift;
my %b = map { ($_ => 1) } (2, 6);
#$a = grep { !$b{$_} } #$a;
}
Loic's solution works well and is quite readable. I would recommend it unless you're working with large arrays that cause the grep to eat a lot of memory, or if performance is absolutely critical.
You can get a bit of a performance boost by using splice:
use strict;
use warnings;
use Data::Dump;
my #haystack = (1, 2, 3, 6);
my %needle = map { $_ => 1 } (2, 6);
foreach my $i (reverse 0 .. $#haystack) {
splice #haystack, $i, 1 if exists $needle{ $haystack[$i] };
}
dd \#haystack;
Output:
[1, 3]
Note that you must iterate through #haystack in reverse order, since every time you remove an element, the remaining elements shift to the left, changing the array indexes.
Benchmark
Here are the results from a slightly modified version of BrowserUk's corrected benchmark, written in response to foreach array - delete current row ? on PerlMonks. The original benchmark included several other methods for removing elements from an array, which I've left out for simplicity.
$ ./benchmark -N=1e2
Rate grep for_splice
grep 40959/s -- -37%
for_splice 65164/s 59% --
$ ./benchmark -N=1e3
Rate grep for_splice
grep 4072/s -- -38%
for_splice 6515/s 60% --
$ ./benchmark -N=1e4
Rate grep for_splice
grep 366/s -- -33%
for_splice 550/s 50% --
$ ./benchmark -N=1e5
Rate grep for_splice
grep 32.7/s -- -38%
for_splice 52.9/s 62% --
$ ./benchmark -N=1e6
(warning: too few iterations for a reliable count)
Rate grep for_splice
grep 2.36/s -- -28%
for_splice 3.28/s 39% --
And the benchmark code itself:
#!/usr/bin/perl -sl
use strict;
use warnings;
use Benchmark 'cmpthese';
our $N //= 1e3;
our $I //= -1;
# 10% the size of the haystack
my $num_needles = int($N / 10) || 1;
our #as;
#{ $as[ $_ ] } = 1 .. $N for 0 .. 4;
our %needle = map { int(rand($N)) => 1 } 1 .. $num_needles;
cmpthese $I, {
for_splice => q[
my $ar = $as[0];
foreach my $i (reverse 0 .. $#$ar) {
splice #$ar, $i, 1 if exists $needle{ $ar->[$i] };
}
$I == 1 and print "0: ", "#$ar";
],
grep => q[
my $ar = $as[1];
#$ar = grep { ! exists $needle{$_} } #$ar;
$I == 1 and print "1: ", "#$ar";
],
};
You can't use a simple for (LIST) loop to iterate over the indices of an array if you're also modifying the contents of the array. That's because the index of the last item may change, and you will also skip over elements if you delete the current element and increment the counter.
A while loop is required instead, or the equivalent C-style for.
This program demonstrates, as well as uing List::Util::any to check whether an array elemnent should be deleted
use strict;
use warnings;
use List::Util 'any';
my #a = (1, 2, 3, 6);
func1 (\#a);
use Data::Dump;
dd \#a;
sub func1 {
my ($a) = #_;
my #b = (2, 6);
for ( my $i = 0; $i < #$a; ) {
if ( any { $a->[$i] == $_ } #b ) {
splice #$a, $i, 1;
}
else {
++$i;
}
}
}
output
[1, 3]
With problems of this nature, it is often easier to build the array you want rather than delete from an existing one.
my #a = ( 1, 2, 3, 6 );
sub func1 {
my $aref = shift #_;
my #b = ( 2, 6 );
my #results = ();
for my $item ( #$aref ){
if( grep { $item == $_ } #b ){
next;
}
push #results, $item;
}
return #results;
}
my #results = func1( \#a );
say "#results";

How do I create a Bubble Sort in Perl

I am trying to create a simple bubble sort in Perl but it doesn't seem to work. Can any one help me?
Code:
for ( my $i = 1; $i < #array; $i++ ) {
for ( my $k = 0; $k = #array < $i - 1; $k++ ) {
if ( $array[$k] > $array[ $k + 1 ] ) {
$temp = $array[$k];
$array[$k] = $array[ $k + 1 ];
$array[ $k + 1 ] = $temp;
}
}
}
Then when I iterate through the array again it is not sorted.
Shouldn't the outer loop go from the back to the front of the array? Also the $k = #array< $i - 1 statement in the for inner loop doesn't make sense.
my #array = (5,6,3,1,7,3,2,9,10,4);
my $i, $k;
for ($i = $#array; $i > 0; $i--) { # $#array = last index = length-1
for ($k = 0; $k < $i; $k++) {
if ($array[$k] > $array[$k+1]) {
($array[$k], $array[$k+1]) = ($array[$k+1], $array[$k]);
}
}
}
print "#array\n"; # 1 2 3 3 4 5 6 7 9 10
You should avoid this sort of confusion by using list version of for
my #array = ( 5, 6, 3, 1, 7, 3, 2, 9, 10, 4 );
for my $i ( 1 .. $#array ) {
for my $k ( 0 .. $i - 1 ) {
#array[ $k, $k + 1 ] = #array[ $k + 1, $k ]
if $array[$k] > $array[ $k + 1 ];
}
}
print "#array\n";
You can prevent a lot of bugs in this way and code is more readable because there is clearly visible intent what do you like to achieve. Which leads to the realisation your algorithm is wrong and what you would probably like is
my #array = ( 5, 6, 3, 1, 7, 3, 2, 9, 10, 4 );
for my $i ( reverse 1 .. $#array ) {
for my $k ( 0 .. $i - 1 ) {
#array[ $k, $k + 1 ] = #array[ $k + 1, $k ]
if $array[$k] > $array[ $k + 1 ];
}
}
print "#array\n";
I cannot believe nobody gave him the true PERL answer yet. I believe he asked for it in Perl, not in C transcribed to Perl :)
for($i=$#a;$i>0;$i--){$m=$a[0];splice(#a,0,$i+1,map{$s=$m;$m>$_?$_:($s,$m=$_)[0]}#a[1..$i],$m);}

Performing a function on each combination of variables in two arrays

I am trying to take one set of data and subtract each value in that data by another set of data.
For example:
Data set one (1, 2, 3)
Data set two (1, 2, 3, 4, 5)
So I should get something like (1 - (1 .. 5)) then (2 - (1..5)) and so on.
I currently have:
#!/usr/bin/perl
use strict;
use warnings;
my $inputfile = $ARGV[0];
open( INPUTFILE, "<", $inputfile ) or die $!;
my #array = <INPUTFILE>;
my $protein = 'PROT';
my $chain = 'P';
my $protein_coords;
for ( my $line = 0; $line <= $#array; ++$line ) {
if ( $array[$line] =~ m/\s+$protein\s+/ ) {
chomp $array[$line];
my #splitline = ( split /\s+/, $array[$line] );
my %coordinates = (
x => $splitline[5],
y => $splitline[6],
z => $splitline[7],
);
push #{ $protein_coords->[0] }, \%coordinates;
}
}
print "$protein_coords->[0]->[0]->{'z'} \n";
my $lipid1 = 'MEM1';
my $lipid2 = 'MEM2';
my $lipid_coords;
for ( my $line = 0; $line <= $#array; ++$line ) {
if ( $array[$line] =~ m/\s+$lipid1\s+/ || $array[$line] =~ m/\s+$lipid2\s+/ ) {
chomp $array[$line];
my #splitline = ( split /\s+/, $array[$line] );
my %coordinates = (
x => $splitline[5],
y => $splitline[6],
z => $splitline[7],
);
push #{ $lipid_coords->[1] }, \%coordinates;
}
}
print "$lipid_coords->[1]->[0]->{'z'} \n";
I am trying to take every value in $protein_coords->[0]->[$ticker]->{'z'} minus each value in $lipid_coords->[1]->[$ticker]->{'z'}.
My overall objective is to find (z2-z1)^2 in the equation d = sqrt((x2-x1)^2+(y2-y1)^2-(z2-z1)^2). I think that if I can do this once then I can do it for X and Y also. Technically I am trying to find the distance between every atom in a PDB file against every lipid atom in the same PDB and print the ResID for distance less than 5A.
To iterate on all combinations of two arrays, just embed two for loops:
use strict;
use warnings;
my #dataone = (1, 2, 3);
my #datatwo = (1, 2, 3, 4, 5);
for my $one (#dataone) {
for my $two (#datatwo) {
print "$one - $two\n";
}
}
Outputs:
1 - 1
1 - 2
1 - 3
1 - 4
1 - 5
2 - 1
2 - 2
2 - 3
2 - 4
2 - 5
3 - 1
3 - 2
3 - 3
3 - 4
3 - 5
This will give you the result of subtracting each element of set 2 from each element of set 1 in what I believe is the manner you were asking.
#!/usr/bin/perl
use strict;
use warnings;
my #set1 = (1, 2, 3);
my #set2 = (1, 2, 3, 4, 5);
my #set3 = ();
for my $val (#set1) {
push #set3, map { $val - $_ } #set2;
}
local $" = ', ';
print "#set3\n";
system 'pause';
The result will be an array containing (1 - (1..5), 2 - (1..5), 3 - (1..5)).
Contents of #set3 after script runs:
0, -1, -2, -3, -4, 1, 0, -1, -2, -3, 2, 1, 0, -1, -2
All the other protein and lipid stuff is way over my head, but I hope this at least helps a little. You should now have an array containing the subtracted elements that you can work with to get the rest of your results!
Edit:
Can replace the loop with this one liner :)
my #set3 = map { my $v = $_; map { $v - $_ } #set2 } #set1;
map is a pretty nifty function!
The easiest way to do this is to do your calculations while you're going through file two:
for (my $line = 0; $line <= $#array; ++$line) {
if (($array[$line] =~ m/\s+$lipid1\s+/) | ($array[$line] =~ m/\s+$lipid2\s+/)) {
chomp $array[$line];
my #splitline = (split /\s+/, $array[$line]);
my %coordinates = (x => $splitline[5],
y => $splitline[6],
z => $splitline[7],
);
push #{$lipid_coords->[1]}, \%coordinates;
# go through each of the sets of protein coors in your array...
for my $p (#{$protein_coords->[0]}) {
# you can store this value however you want...
my $difference = $protein_coords->[0][$p]{z} - $coordinates{z};
}
}
}
If I were you, I would use some form of unique identifier to allow me to access the data on each combination -- e.g. build a hash of the form $difference->{<protein_id>}{<lipid_id>} = <difference>.

Perl: What is the easiest way to flatten a multidimensional array?

What's the easiest way to flatten a multidimensional array ?
One level of flattening using map
$ref = [[1,2,3,4],[5,6,7,8]]; # AoA
#a = map {#$_} #$ref; # flattens it
print "#a"; # 1 2 3 4 5 6 7 8
Using List::Flatten seems like the easiest:
use List::Flatten;
my #foo = (1, 2, [3, 4, 5], 6, [7, 8], 9);
my #bar = flat #foo; # #bar contains 9 elements, same as (1 .. 9)
Actually, that module exports a single simple function flat, so you might as well copy the source code:
sub flat(#) {
return map { ref eq 'ARRAY' ? #$_ : $_ } #_;
}
You could also make it recursive to support more than one level of flattening:
sub flat { # no prototype for this one to avoid warnings
return map { ref eq 'ARRAY' ? flat(#$_) : $_ } #_;
}
The easiest and most natural way, is to iterate over the values and use the # operator to "dereference" / "unpack" any existing nested values to get the constituent parts. Then repeat the process for every reference value encountered.
This is similar to Viajayenders solution, but works for values not already in an array reference and for any level of nesting:
sub flatten {
map { ref $_ ? flatten(#{$_}) : $_ } #_;
}
Try testing it like so:
my #l1 = [ 1, [ 2, 3 ], [[[4]]], 5, [6], [[7]], [[8,9]] ];
my #l2 = [ [1,2,3,4,5], [6,7,8,9] ];
my #l3 = (1, 2, [3, 4, 5], 6, [7, 8], 9); # Example from List::Flatten
my #r1 = flatten(#l1);
my #r2 = flatten(#l1);
my #r3 = flatten(#l3);
if (#r1 ~~ #r2 && #r2 ~~ #r3) { say "All list values equal"; }
if data is always like an example, I recommend List::Flatten too.
but data has more than 2 nested array, flat cant't work.
like #foo = [1, [2, [3, 4, 5]]]
in that case, you should write recursive code for it.
how about bellow.
sub flatten {
my $arg = #_ > 1 ? [#_] : shift;
my #output = map {ref $_ eq 'ARRAY' ? flatten($_) : $_} #$arg;
return #output;
}
my #foo = (1, 2, [3, 4, 5, [6, 7, 8]], 9);
my $foo = [1, 2, [3, 4, 5, [6, 7, 8]], 9];
my #output = flatten #foo;
my #output2 = flatten $foo;
print "#output";
print "#output2";
The easiest way to flatten a multidimensional array when it includes:
1. arrays
2. array references
3. scalar values
4. scalar references
sub flatten {
map { ref $_ eq 'ARRAY' ? flatten(#{$_}) :
ref $_ eq 'SCALAR' ? flatten(${$_}) : $_
} #_;
}
The other flatten sub answer crashes on scalar references.
Something along the lines of:
my $i = 0;
while ($i < scalar(#array)) {
if (ref #array[$i] eq 'ARRAY') {
splice #array, $i, 1, #$array[$i];
} else {
$i++;
}
}
I wrote it blindly, no idea if it actually works but you should get the idea.
Same as Vijayender's solution but will work on mixed arrays containing arrayrefs and scalars.
$ref = [[1,2,3,4],[5,6,7,8],9,10];
#a = map { ref $_ eq "ARRAY" ? #$_ : $_ } #$ref;
print "#a"
Of course you can extend it to also dereference hashrefs:
#a = map { ref $_ eq "ARRAY" ? #$_ : ref $_ eq "HASH" ? %$_: $_ } $#ref;
or use grep to weed out garbage:
#a = map { #$_} grep { ref $_ eq 'ARRAY' } #$ref;
As of List::MoreUtils 0.426 we have an arrayify function that flattens arrays recursively:
#a = (1, [[2], 3], 4, [5], 6, [7], 8, 9);
#l = arrayify #a; # returns 1, 2, 3, 4, 5, 6, 7, 8, 9
It was introduced earlier but was broken.