I've a set of strings with variable sizes, for example:
AAA23
AB1D1
A1BC
AAB212
My goal is have in alphabetical order and unique characters collected for COLUMNS, such as:
first column : AAAA
second column : AB1A
and so on...
For this moment I was able to extract the posts through a hash of hashes. But now, how can I sort data? Could I for each hash of hash make a new array?
Thank you very much for you help!
Al
My code:
#!/usr/bin/perl
use strict;
use warnings;
my #sessions = (
"AAAA",
"AAAC",
"ABAB",
"ABAD"
);
my $length_max = 0;
my $length_tmp = 0;
my %columns;
foreach my $string (#sessions){
my $l = length($string);
if ($l > $length_tmp){
$length_max = $l;
}
}
print "max legth : $length_max\n\n";
my $n = 1;
foreach my $string (#sessions){
my #ch = split("",$string);
for my $col (1..$length_max){
$columns{$n}{$col} = $ch[$col-1];
}
$n++;
}
foreach my $col (keys %columns) {
print "colonna : $col\n";
my $deref = $columns{$col};
foreach my $pos (keys %$deref){
print " posizione : $pos --> $$deref{$pos}\n";
}
print "\n";
}
exit(0);
What you're doing is rotating the array. It doesn't need a hash of hash or anything, just another array. Surprisingly, neither List::Util nor List::MoreUtils supplies one. Here's a straightforward implementation with a test. I presumed you want short entries filled in with spaces so the columns come out correct.
#!/usr/bin/perl
use strict;
use warnings;
use Test::More;
use List::Util qw(max);
my #Things = qw(
AAA23
AB1D1
A1BC
AAB212
);
sub rotate {
my #rows = #_;
my $maxlength = max map { length $_ } #rows;
my #columns;
for my $row (#rows) {
my #chars = split //, $row;
for my $colnum (1..$maxlength) {
my $idx = $colnum - 1;
$columns[$idx] .= $chars[$idx] || ' ';
}
}
return #columns;
}
sub print_columns {
my #columns = #_;
for my $idx (0..$#columns) {
printf "Column %d: %s\n", $idx + 1, $columns[$idx];
}
}
sub test_rotate {
is_deeply [rotate #_], [
"AAAA",
"AB1A",
"A1BB",
"2DC2",
"31 1",
" 2",
];
}
test_rotate(#Things);
print_columns(#Things);
done_testing;
You can sort the output of %columns in your code with
foreach my $i (sort { $a <=> $b } keys %columns) {
print join(" " => sort values %{ $columns{$i} }), "\n";
}
This gives
A A A A
A A A C
A A B B
A A B D
But using index numbers as hash keys screams that you should use an array instead, so let's do that. To get the columns, use
sub columns {
my #strings = #_;
my #columns;
while (#strings) {
push #columns => [ sort map s/^(.)//s ? $1 : (), #strings ];
#strings = grep length, #strings;
}
#columns;
}
Given the strings from your question, it returns
A A A A
1 A A B
1 A B B
2 2 C D
1 1 3
2
As you can see, this is unsorted and repeats characters. With Perl, when you see the word unique, always think of hashes!
sub unique_sorted_columns {
map { my %unique;
++$unique{$_} for #$_;
[ sort keys %unique ];
}
columns #_;
}
If you don't mind destroying information, you can have columns sort and filter duplicates:
sub columns {
my #strings = #_;
my #columns;
while (#strings) {
my %unique;
map { ++$unique{$1} if s/^(.)//s } #strings;
push #columns => [ sort keys %unique ];
#strings = grep length, #strings;
}
#columns;
}
Output:
A
1 A B
1 A B
2 C D
1 3
2
Related
I have a list in Perl.
#alist=("a_vld","a_sf","a_ef","a_val");
print join(',', #alist), "\n";
Output:
a_vld,a_sf,a_ef,a_val
How can I re-order the elements in list such that its output is as following expected output:
a_sf,a_ef,a_vld,a_val
Note: a_ will keep changing with different strings but I want to preserve the order of sf,ef,vld,val
One way to order (sort) by a given set is to associate a numeric order with its elements.
Then we need to find these keys in the strings in the list to sort. That is done once, ahead of the actual sorting, via the Schwartzian transform.
use List::Util qw(first);
my #alist = ("a_vld", "a_sf", "a_ef", "a_val");
my #keys = qw(sf ef vld val); # keys to sort by
my %order_by = map { $keys[$_] => $_ } 0..$#keys;
my #sorted =
map { $_->[0] }
sort { $order_by{$a->[1]} <=> $order_by{$b->[1]} }
map {
my $elem = $_;
[ $elem, first { $elem =~ /$_/ } keys %order_by ]
}
#alist;
say "#sorted";
This prints the line: a_sf a_ef a_vld a_val
The elements of the list are associated with the sorting keys by finding the key in the string via a regex. With this in mind the above can be used reasonably generally, with a desired list of keys .
For a short list, use a sort function that identifies the sequences in order
#sorted = sort {
($b =~ /sf$/) <=> ($a =~ /sf$/)
|| ($b =~ /ef$/) <=> ($a =~ /ef$/)
|| ($b =~ /vld$/) <=> ($a =~ /vld$/)
|| ($b =~ /val$/) <=> ($a =~ /val$/)
} #alist;
If there's always a _ before the code, and if the code is always at the end
use Sort::Key qw( ikeysort );
my #order = qw( sf ef vld val );
my %order = map { $order[$_] => $_ } 0..$#order;
my #sorted = ikeysort { /_([^_\W]+)\z/ ? $order{$1} : 0 } #unsorted;
The above is a faster, cleaner version of the following:
my #order = qw( sf ef vld val );
my %order = map { $order[$_] => $_ } 0 .. $#order;
my #sorted =
sort {
my $key_a = /_([^_\W]+)\z/ ? $order{$1} : 0;
my $key_b = /_([^_\W]+)\z/ ? $order{$1} : 0;
$key_a <=> $key_b || $a cmp $b
}
#unsorted;
Otherwise
use Sort::Key qw( ikeysort );
my #order = qw( sf ef vld val );
my %order = map { $order[$_] => $_ } 0..$#order;
my $alt = join '|', map quotemeta, #order;
my $re = qr/($alt)/;
my #sorted = ikeysort { /$re/ ? $order{$1} : 0 } #unsorted;
The above is a faster, cleaner version of the following:
my #order = qw( sf ef vld val );
my %order = map { $order[$_] => $_ } 0..$#order;
my $alt = join '|', map quotemeta, #order;
my $re = qr/($alt)/;
my #sorted =
sort {
my $key_a = /$re/ ? $order{$1} : 0;
my $key_b = /$re/ ? $order{$1} : 0;
$key_a <=> $key_b || $a cmp $b
}
#unsorted;
I want to make a hash of array from a file that looks like:
xx500173:56QWER 45 A rtt34 34C
...
I would like to have a unique "key" (e.g. column1_column2)
#!/usr/bin/perl
use warnings;
use strict;
my $seq;
while(<>){
chomp;
my #line = split(/\s+/, $_);
my $key = $line[0] . "_" . $line[1]; #try to make a unique key for each entry
map { $seq->{ $_->[$key] } = [#$_[0..4]] } [ split/\s+/ ];
}
foreach my $s (keys %{$seq} ) {
print $s,": ",join( "\t", #{ $seq->{$s}} ) . "\n";
}
but I get the following error:
Argument "xx500173:56QWER_45" isn't numeric in array element
Does is it matter if key is numeric or string?
An index to an array [] should be numeric, but $key is not numeric. Assuming you want all the white-space-separated tokens as elements of your array:
use warnings;
use strict;
my $seq;
while (<DATA>) {
chomp;
my #line = split;
my $key = $line[0] . "_" . $line[1]; #try to make a unique key for each entry
$seq->{$key} = [ #line ];
}
foreach my $s ( keys %{$seq} ) {
print $s, ": ", join( "\t", #{ $seq->{$s} } ) . "\n";
}
__DATA__
xx500173:56QWER 45 A rtt34 34C
Outputs:
xx500173:56QWER_45: xx500173:56QWER 45 A rtt34 34C
You have confused yourself with the line
map { $seq->{ $_->[$key] } = [#$_[0..4]] } [ split/\s+/ ];
which is wrong because
map is an operator for translating one list into another by performing the same operation on every element of the input list, but you are ignoring the returned value
The input list is only one item long - the array reference returned by [ split/\s+/ ]
What you have written is the same as
$_ = [ split /\s+/ ];
$seq->{ $_->[$key] } = [ #$_[0..4] ];
and the problem is that $_->[$key] tries to index the anonymous array using the string $key, which is clearly wrong.
All you need here is
$seq->{$key} = [ #line[0..4] ];
and your complete program should look like this
#!/usr/bin/perl
use strict;
use warnings;
my $seq;
while ( <> ) {
chomp;
my #line = split;
$seq->{"$line[0]_$line[1]"} = [ #line[0..4] ];
}
for my $s ( keys %{$seq} ) {
printf "%s: %s\n", $s, join("\t", #{ $seq->{$s} } );
}
After working with this code, I am stuck at what I think is a simple error, yet I need outside eyes to see what is wrong.
I used unpack function to divide an array into the following.
#extract =
------MMMMMMMMMMMMMMMMMMMMMMMMMM-M-MMMMMMMM
------SSSSSSSSSSSSSSSSSSSSSSSSSS-S-SSSSSDTA
------TIIIIIIIIIIIIITIIIVVIIIIII-I-IIIIITTT
Apparently, after unpacking into the array, when I try to go into the while loop, #extract shows up completely empty. Any idea as to why this is happening?
print #extract; #<-----------Prints input
my $sum = 0;
my %counter = ();
while (my $column = #extract) {
print #extract; #<------- This extract is completely empty. Should be input
for (my $aa = (split ('', $column))){
$counter{$aa}++;
delete $counter{'-'}; # Don't count -
}
# Sort keys by count descending
my #keys = (sort {$counter{$b} <=> $counter{$a}} keys %counter) [0]; #gives highest letter
for my $key (#keys) {
$sum += $counter{$key};
print OUTPUT "$key $counter{$key} ";
Each line is an array element correct? I don't see in your code where you are checking the individual characters.
Assuming the input that you have shown is a 3 element array containing the line as a string:
#!/usr/bin/perl
use strict;
use warnings;
my #entries;
while(my $line = shift(#extract)){
my %hash;
for my $char(split('', $line)){
if($char =~ /[a-zA-Z]/) { $hash{$char}++ }
}
my $high;
for my $key (keys %hash) {
if(!defined($high)){ $high = $key }
elsif($hash{$high} < $hash{$key}){
$high = $key
}
}
push #entries, {$high => $hash{$high}};
}
Note this empties #extract, if you don't want to do that you'd have to use a for loop like below
for my $i (0 .. $#extract){
#my %hash etc...
}
EDIT:
Changed it so that only the highest number is actually kept
An approach using reduce from List::Util.
#!/usr/bin/perl
use strict;
use warnings;
use List::Util 'reduce';
my #extract = qw/
------MMMMMMMMMMMMMMMMMMMMMMMMMM-M-MMMMMMMM
------SSSSSSSSSSSSSSSSSSSSSSSSSS-S-SSSSSDTA
------TIIIIIIIIIIIIITIIIVVIIIIII-I-IIIIITTT
/;
for (#extract) {
my %count;
tr/a-zA-Z//cd;
for (split //) {
$count{$_}++;
}
my $max = reduce { $count{$a} > $count{$b} ? $a : $b } keys %count;
print "$max $count{$max}\n";
}
I have two arrays:
#array1 = (A,B,C,D,E,F);
#array2 = (A,C,H,D,E,G);
The arrays could be of different size. I want to find how many mismatches are there between the arrays. The indexes should be the same. In this case there are three mismatch :b->c,c->h and F->G.(i.e , The 'C' in $array[2] should not be considered a match to 'C' in $array[1]) I would like to get the number of mismatches as well as the mismatch.
foreach my $a1 ( 0 .. $#array1) {
foreach my $a2( 0 .. $#array2)
if($array1[$a1] ne $array2[$a2]) {
}
}
}
my %array_one = map {$_, 1} #array1;
my #difference = grep {!$array_one {$_}} #array1;
print "#difference\n";
Ans: gives me H, G but not C.
with my little Perl knowledge I tried this, with no result. Could you suggest me how I should deal this? Your suggestions and pointers would be very helpful.
You shouldn't have nested loops. You only need to go through the indexes once.
use List::Util qw( max );
my #mismatches;
for my $i (0..max($#array1, $#array2)) {
push #mismatches, $i
if $i >= #array1
|| $i >= #array2
|| $array1[$i] ne $array2[$i];
}
}
say "There are " . (0+#mismatches) . " mismatches";
for my $i (#mismatches) {
...
}
Since you mentioned grep, this is how you'd replace the for with grep:
use List::Util qw( max );
my #mismatches =
grep { $_ >= #array1
|| $_ >= #array2
|| array1[$_] ne $array2[$_] }
0 .. max($#array1, $#array2);
say "There are " . (0+#mismatches) . " mismatches";
for my $i (#mismatches) {
...
}
Here's an example using each_arrayref from List::MoreUtils.
sub diff_array{
use List::MoreUtils qw'each_arrayref';
return unless #_ && defined wantarray;
my #out;
my $iter = each_arrayref(#_);
my $index = 0;
while( my #current = $iter->() ){
next if all_same(#current);
unshift #current, $index;
push #out, \#current;
}continue{ ++$index }
return #out;
}
This version should be faster if you are going to use this for determining the number of differences often. The output is exactly the same. It just doesn't have to work as hard when returning a number.
Read about wantarray for more information.
sub diff_array{
use List::MoreUtils qw'each_arrayref';
return unless #_ && defined wantarray;
my $iter = each_arrayref(#_);
if( wantarray ){
# return structure
my #out;
my $index = 0;
while( my #current = $iter->() ){
next if all_same(#current);
unshift #current, $index;
push #out, \#current;
}continue{ ++$index }
return #out;
}else{
# only return a count of differences
my $out = 0;
while( my #current = $iter->() ){
++$out unless all_same #current;
}
return $out;
}
}
diff_array uses the subroutine all_same to determine if all of the current list of elements are the same.
sub all_same{
my $head = shift;
return undef unless #_; # not enough arguments
for( #_ ){
return 0 if $_ ne $head; # at least one mismatch
}
return 1; # all are the same
}
To get just the number of differences:
print scalar diff_array \#array1, \#array2;
my $count = diff_array \#array1, \#array2;
To get a list of differences:
my #list = diff_array \#array1, \#array2;
To get both:
my $count = my #list = diff_array \#array1, \#array2;
The output for the input you provided:
(
[ 1, 'B', 'C' ],
[ 2, 'C', 'H' ],
[ 5, 'F', 'G' ]
)
Example usage
my #a1 = qw'A B C D E F';
my #a2 = qw'A C H D E G';
my $count = my #list = diff_array \#a1, \#a2;
print "There were $count differences\n\n";
for my $group (#list){
my $index = shift #$group;
print " At index $index\n";
print " $_\n" for #$group;
print "\n";
}
You're iterating over both arrays when you don't want to be doing so.
#array1 = ("A","B","C","D","E","F");
#array2 = ("A","C","H","D","E","G");
foreach my $index (0 .. $#array1) {
if ($array1[$index] ne $array2[$index]) {
print "Arrays differ at index $index: $array1[$index] and $array2[$index]\n";
}
}
Output:
Arrays differ at index 1: B and C
Arrays differ at index 2: C and H
Arrays differ at index 5: F and G
Well, first, you're going to want to go over each element of one of the arrays, and compare it to the same element of the other array. List::MoreUtils provides an easy way to do this:
use v5.14;
use List::MoreUtils qw(each_array);
my #a = qw(a b c d);
my #b = qw(1 2 3);
my $ea = each_array #a, #b;
while ( my ($a, $b) = $ea->() ) {
say "a = $a, b = $b, idx = ", $ea->('index');
}
You can extend that to find where there is a non-match by checking inside that while loop (note: this assumes your arrays don't have undefs at the end, or that if they do, undef is the same as having a shorter array):
my #mismatch;
my $ea = each_array #a, #b;
while ( my ($a, $b) = $ea->() ) {
if (defined $a != defined $b || $a ne $b) {
push #mismatch, $ea->('index');
}
}
and then:
say "Mismatched count = ", scalar(#mismatch), " items are: ", join(q{, }, #mismatch);
The following code builds a list of mismatched pairs, then prints them out.
#a1 = (A,B,C,D,E,F);
#a2 = (A,C,H,D,E,G);
#diff = map { [$a1[$_] => $a2[$_]] }
grep { $a1[$_] ne $a2[$_] }
(0..($#a1 < $#a2 ? $#a1 : $#a2));
print "$_->[0]->$_->[1]\n" for #diff
You have the right idea, but you only need a single loop, since you are looking at each index and comparing entries between the arrays:
foreach my $a1 ( 0 .. $#array1) {
if($array1[$a1] ne $array2[$a1]) {
print "$a1: $array1[$a1] <-> $array2[$a1]\n";
}
}
I am trying to do work on a hash of files and the work has to be done in a specific order. Most would say the list can be ordered like so:
for my $k (sort keys %my_hash)
{
print "$k=>$my_hash{$k}, ";
}
However, I need nonalphabetical order, in fact the keys start with a word then _ and they go G to digits to L to any of M,P,R,T or D (eg. word_G.txt,word_2.txt,...,word_P.txt). Is there any way to sort by custom order?
Is there any way to sort by custom order?
Yes. See sort.
For example:
#!/usr/bin/env perl
use warnings; use strict;
my #order = qw(G 1 2 3 L M P R T D);
my %order_map = map { $order[$_] => $_ } 0 .. $#order;
my $pat = join '|', #order;
my #input = qw(word_P.txt word_2.txt word_G.txt);
my #sorted = sort {
my ($x, $y) = map /^word_($pat)[.]txt\z/, $a, $b;
$order_map{$x} <=> $order_map{$y}
} #input;
print "#sorted\n";
use 5.014;
sub rank {
my ($word) = #_;
$word =~ s{\A \w+ _}{}msx;
return do {
given ($word) {
0 when /\A G/msx;
1 when /\A [0-9]/msx;
2 when /\A L/msx;
3 when /\A [MPRTD]/msx;
default { 1000 };
}
};
}
say for sort { rank($a) <=> rank($b) } qw(word_P.txt word_2.txt word_G.txt);
Output:
word_G.txt
word_2.txt
word_P.txt
Edit: Before Perl 5.14, use a temporary variable.
use 5.010;
⋮
return do {
my $dummy;
given ($word) {
$dummy = 0 when /\A G/msx;
$dummy = 1 when /\A [0-9]/msx;
$dummy = 2 when /\A L/msx;
$dummy = 3 when /\A [MPRTD]/msx;
default { $dummy = 1000 };
}
$dummy;
};
I had a specific use case where I wanted to sort with certain values first, other values last, then everything else alphabetically in the middle.
Here's my solution:
my #sorted = sort {
my #order = qw(Mike Dave - Tom Joe);
my ($x,$y) = (undef,undef);
for (my $i = 0; $i <= $#order; $i++) {
my $token = $order[$i];
$x = $i if ($token eq $a or (not defined $x and $token eq "-"));
$y = $i if ($token eq $b or (not defined $y and $token eq "-"));
}
$x <=> $y or
$a cmp $b
} #ARGV;
Output:
$ perl customsort.pl Tom Z Mike A Joe X Dave G
Mike Dave A G X Z Tom Joe