Perl: Get position and length of element in a string - perl

Say I have a string like:
my $refseq="CCCC-TGA---ATAAAC--TCCAT-GCTCCCCC--------------------AAGC";
I want to detect the positions where "-" occurs and the number of contiguous "-". I want to end up with a hash with "-" position as key, and extension length as value, for this example above:
%POSLENGTH = (5 => 1, 8 => 3, 14 => 2, 19 => 1, 27 => 20);
Note that the positions should be given based on the string without "-".

Check the @- array in perlvar.
my $refseq = "CCCC-TGA---ATAAAC--TCCAT-GCTCCCCC--------------------AAGC";
my %POSLENGTH;
$POSLENGTH{ $-[0] +1 } = length($1) while $refseq =~ s/(-+)//;
use Data::Dumper; print Dumper \%POSLENGTH;
output
$VAR1 = {
'14' => 2,
'8' => 3,
'27' => 20,
'19' => 1,
'5' => 1
};

You can do this using the built-in @- and @+ arrays. Together they hold the start and end offsets of the last successful pattern match in element 0 (and of any captures in elements 1 onwards), so the length of the last match is $+[0] - $-[0].
They're documented under Variables related to regular expressions in perldoc perlvar.
I've used Data::Dump here just to display the contents of the hash that is built.
On a side note, I'm very doubtful that a hash is a useful structure for this information, as I can't imagine a situation where you know the start position of a substring and need to know its length. I would have thought it was better represented as just an array of pairs.
use strict;
use warnings;
use Data::Dump;
my $refseq="CCCC-TGA---ATAAAC--TCCAT-GCTCCCCC--------------------AAGC";
my %pos_length;
while ( $refseq =~ /-+/g ) {
my ($pos, $len) = ( $-[0] + 1, $+[0] - $-[0] );
$pos_length{$pos} = $len;
}
dd \%pos_length;
output
{ 5 => 1, 9 => 3, 18 => 2, 25 => 1, 34 => 20 }
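Note that the offsets above are positions in the original string, dashes included, whereas the question asks for positions relative to the string without "-". A minimal sketch of one way to get those, assuming it is enough to subtract the number of dashes already seen; the helper name gap_runs is made up for illustration, and it returns an array of [position, length] pairs rather than a hash:

```perl
use strict;
use warnings;

# Hypothetical helper: returns [position, length] pairs for each run of '-',
# with 1-based positions counted in the string as if the dashes were removed.
sub gap_runs {
    my ($seq) = @_;
    my @runs;
    my $removed = 0;    # number of dashes seen in earlier runs
    while ( $seq =~ /-+/g ) {
        my $len = $+[0] - $-[0];
        push @runs, [ $-[0] - $removed + 1, $len ];
        $removed += $len;
    }
    return @runs;
}

# Same sequence as in the question (the long gap is 20 dashes)
my $refseq = 'CCCC-TGA---ATAAAC--TCCAT-GCTCCCCC' . ( '-' x 20 ) . 'AAGC';
printf "%d => %d\n", @$_ for gap_runs($refseq);
```

For the example sequence this prints the pairs 5 => 1, 8 => 3, 14 => 2, 19 => 1 and 27 => 20, in order of appearance, without modifying $refseq.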

Related

How to print specific key in an array (Perl) [duplicate]

This question already has answers here:
Simple hash search by value
(5 answers)
Closed 5 years ago.
I recently started learning Perl, so I'm not too familiar with the functions and syntax.
If I have a Perl array and some variables,
#!/usr/bin/perl
use strict;
use warnings;
my @numbers = (a =>1, b=> 2, c => 3, d =>4, e => 5);
my $x;
my $range = 5;
$x = int(rand($range));
print "$x";
to generate a random number between 1-5, how can I get the program to print the actual key (a, b, c, etc.) instead of just the number (1, 2, 3, 4, 5)?
It seems that you want to do a reverse lookup, key-by-value, the opposite of what a hash gives us. Since a hash flattens to a list of key/value pairs in list context, you can reverse that list and use the resulting hash to look up by number.
A couple of corrections: you need a hash variable (not an array), and you need to add 1 to your rand integer generator to get the desired 1..5 range.
use warnings;
use strict;
use feature 'say';
my %numbers = (a => 1, b => 2, c => 3, d => 4, e => 5);
my %lookup_by_number = reverse %numbers; # values need be unique
my $range = 5;
my $x = int(rand $range) + 1;
say $lookup_by_number{$x};
Without reversing the hash you'd need to iterate over the values of %numbers, testing each against $x to find its key.
If several keys share the same value in your original hash, you have to do it by hand: reverse-ing would try to create a hash with duplicate keys, and only the last one assigned would remain, so you'd lose some entries. One way is
my @at_num = grep { $x == $numbers{$_} } keys %numbers;
as in the post that this was marked as a duplicate of.
But then you should build a data structure for reverse lookup, so that you don't search through the list every time the information is needed. This can be a hash whose keys are the unique numbers and whose values are array references (arrayrefs) holding the corresponding keys from the original hash:
use warnings;
use strict;
use feature 'say';
my %num = (a => 1, b => 2, c => 1, d => 3, e => 2); # with duplicate values
my %lookup_by_num;
foreach my $key (keys %num) {
    push @{ $lookup_by_num{$num{$key}} }, $key;
}
say "$_ => [ @{$lookup_by_num{$_}} ]" for keys %lookup_by_num;
This prints
1 => [ c a ]
3 => [ d ]
2 => [ e b ]
A nice way to display complex data structures is via Data::Dumper, or Data::Dump (or others).
The expression @{ $lookup_by_num{ $num{$key} } } extracts the value of %lookup_by_num for the key $num{$key} and dereferences it with @{ ... }, so that push can then add $key to it. The critical part is that the first time it encounters $num{$key} it autovivifies the arrayref and its corresponding key. See this post with its references for details.
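To see the autovivification step in isolation, here is a minimal sketch. The helper name group_keys_by_value and the hash %by_num are made up for illustration; keys are sorted only to make the output order predictable.

```perl
use strict;
use warnings;

# Hypothetical helper: group the keys of a hash by their values.
sub group_keys_by_value {
    my (%num) = @_;
    my %by_num;
    for my $key ( sort keys %num ) {
        # On the first push for a given value there is no array reference yet;
        # Perl creates it automatically (autovivification).
        push @{ $by_num{ $num{$key} } }, $key;
    }
    return %by_num;
}

my %by_num = group_keys_by_value( a => 1, b => 2, c => 1, d => 3, e => 2 );
print "$_ => [ @{ $by_num{$_} } ]\n" for sort { $a <=> $b } keys %by_num;
```

No explicit "$by_num{...} = [] unless exists ..." check is needed before the push.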
There are many ways to do it. For example, declare "numbers" as a hash rather than an array. Note that the keys come first in each key-value pair, and here you want to use your random int as the key:
my %numbers = ( 0 => 'a', 1 => 'b', 2 => 'c', 3 => 'd', 4 => 'e' );
Then you can look up the "key" as you call it using:
my $key = $numbers{$x};
Note that rand( $x ); returns a number greater than or equal to zero and less than $x. So if you want integers in the range 1-5, you must add 1 in your code: at the moment you'll get 0-4, not 1-5.
Firstly, arrays don't have keys (well, they kind of do, but they're integers and not the values you want). So I think you want a hash, not an array.
my %numbers = (a =>1, b=> 2, c => 3, d =>4, e => 5);
And if you want to get the letter, given the integer then you need the reverse of this hash:
my %rev_numbers = reverse %numbers;
Note that reversing a hash like this only works if the values in your original hash are unique (because reversing a hash makes the values into keys and hash keys are always unique).
Then, you can just look up an integer in your %rev_numbers hash to get its associated letter.
my $integer = 3;
say $rev_numbers{$integer}; # prints 'c'

How to use the Perl format function to print multiple columns

I have a Perl hash like %word. The key is the word and the value is its count. Now I want to display %word like:
the 20 array 10 print 2
a 18 perl 8 function 1
of 12 code 5
I searched and found that Perl's format can solve this, and I read the perlform page, but I still don't know how to do it.
I knew about format and that it could be very handy for generating nice forms... back when we still had a world where everything was monospaced...
So, I researched it a bit and found the following solution:
use strict;
use warnings;
my %word = (
    the      => 20,
    array    => 10,
    print    => 2,
    a        => 18,
    perl     => 8,
    function => 1,
    of       => 12,
    code     => 5,
);
my @word = %word; # turn the hash into a list
format =
@<<<<<<<<<<< @>>>> @<<<<<<<<<<< @>>>> @<<<<<<<<<<< @>>>>~~
shift @word, shift @word, shift @word, shift @word, shift @word, shift @word
.
write;
The nasty part is the ~~, which makes the line repeat, together with the fact that each field in the format line needs a corresponding scalar value. To get those scalar values, I shifted them off the @word array.
There is a lot more to know about format and write.
Have fun!
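The format above prints the pairs in whatever order the hash returns them. If the order shown in the question matters (highest count first, filled column by column), one possible sketch swaps format for plain sprintf. The helper name columns_by_count, the row count of 3 and the field widths are assumptions made for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: lay out word counts in $rows rows, sorted by
# descending count and filled column by column.
sub columns_by_count {
    my ( $rows, %count ) = @_;
    my @words = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;
    my @lines;
    for my $i ( 0 .. $#words ) {
        my $cell = sprintf '%-12s %5d', $words[$i], $count{ $words[$i] };
        my $r    = $i % $rows;    # row this cell belongs to
        $lines[$r] = defined $lines[$r] ? "$lines[$r]  $cell" : $cell;
    }
    return @lines;
}

my %word = (
    the => 20, array => 10, print    => 2,
    a   => 18, perl  => 8,  function => 1,
    of  => 12, code  => 5,
);
print "$_\n" for columns_by_count( 3, %word );
```

With three rows, the first column holds the, a, of; the second holds array, perl, code; the third holds print and function.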

Perl: How does hash assignment with 'map' work?

I have trouble understanding how to assign to a hash using the map function.
Why does
my %a = map {$_=>1 if $_>=2} (1..4);
give me an Odd number of elements in hash assignment error while
my %a = map {$_=>1 if $_>2} (1..4);
gives me
$VAR1 = {
'' => '',
'4' => 1,
'3' => 1
};
and why is there only one empty string in the hash? If I assign to an array
my @a = map {$_ if $_>2} (1..4);
$VAR1 = [
'',
'',
3,
4
];
I get two empty strings, which makes more sense to me.
Is there a possibility to return no empty string if the condition is not met?
Although map is not the best way to do this job (grep, as pointed out, would be better), it is still possible using just map with the ternary ?: operator:
#!/usr/bin/perl
use strict ;
use warnings ;
use Data::Dumper ;
my %a = map { $_>2 ? ( $_ => 1 ) : () } (1..4) ;
print Dumper( \%a ) ;
Returning the empty list makes map behave like grep when condition is not met.
>perl test.pl
$VAR1 = {
'4' => 1,
'3' => 1
};
map transforms a list into another list. In the first case, your input list is 1, 2, 3, 4. For each member, you return a tuple if the member is >= 2, but otherwise, you return just a single value. The single value is returned for 1 only and causes the "odd number of elements".
In the second case, the transformation works as follows:
input | output
------+-------
1 | ''
2 | ''
3 | 3 => 1
4 | 4 => 1
If you make a hash from it, you take the first empty string as the key, the second empty string as the value, which creates "one empty string in the hash" - there are in fact two.
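For comparison, here is a minimal sketch of the grep-based variant mentioned above: filter first, then build the pairs, so that map never sees an element for which it would return only the false condition value. The helper name keep_above is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build a lookup hash of the values greater than $min.
sub keep_above {
    my ( $min, @list ) = @_;
    # grep drops the unwanted values first, so map always returns a full pair
    return map { $_ => 1 } grep { $_ > $min } @list;
}

my %seen = keep_above( 2, 1 .. 4 );
print "$_ => $seen{$_}\n" for sort keys %seen;
```

This leaves only the keys 3 and 4 in the hash, with no empty-string entry.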

Math::BaseConvert with strange basenumbers like 23 or 1000

I just found the Perl module Math::BaseConvert. My task is to convert numbers to some very unusual bases: not only base 2, 8, or 16, but also 23, 134, and up to 1000. (This is part of a larger task to balance a tree of files in a directory.)
I could not get it to work. Reading the tests for the module on CPAN also confused me. So I wrote a little test; maybe you can tell me what's wrong. The result is:
ok 1 - use Math::BaseConvert;
ok 2 - Convert number '23' (base10) to '27' (base8)
not ok 3 - Convert number '23' (base10) to '23' (base32)
# Failed test 'Convert number '23' (base10) to '23' (base32)'
# at test_math_baseconvert.pl line 35.
# got: 'N'
# expected: '23'
not ok 4 - Convert number '64712' (base10) to '64:712' (base1000)
# Failed test 'Convert number '64712' (base10) to '64:712' (base1000)'
# at test_math_baseconvert.pl line 35.
# got: '-1'
# expected: '64:712'
1..4
# Looks like you failed 2 tests of 4.
The testprogram is this:
#!/usr/bin/perl
use strict;
use warnings;
use Test::More;
use_ok( 'Math::BaseConvert', '1.7' );
my @lines = (
{
# http://www.wolframalpha.com/input/?i=23+from+base+10+to+base+16
old_number => '23',
old_base => 10,
new_number => '27',
new_base => 8,
},
{
# http://www.wolframalpha.com/input/?i=23+from+base+10+to+base+32
old_number => '23',
old_base => 10,
new_number => '23', # stays same
new_base => 32,
},
{
# http://www.wolframalpha.com/input/?i=64712+from+base+10+to+base+1000
old_number => '64712',
old_base => 10,
new_number => '64:712',
new_base => 1000,
},
);
for my $line (@lines) {
cmp_ok(
Math::BaseConvert::cnv(
$line->{old_number}, $line->{old_base}, $line->{new_base}
),
'eq',
$line->{new_number},
sprintf(
"Convert number '%s' (base%d) to '%s' (base%d)",
$line->{old_number}, $line->{old_base},
$line->{new_number}, $line->{new_base}
)
);
}
done_testing();
Wolfram Alpha's method of showing bases greater than base 16 is to separate digits with a colon. There's nothing wrong with that, as they display those numbers using CSS styling that lessens the shading of the colon to make it more obvious what they're doing. They also add a message stating exactly how many digits they're showing, as in "1:1617 (2 digits)", since the colon alone isn't obvious enough.
The method Math::BaseConvert and other such modules use is to expand the character set for digits, just as hex numbers use 0-9A-F to include the first 6 letters of the alphabet. For base 32 numbers the character set is 0-9A-V. Given that N is the 14th letter in the alphabet, it is the appropriate representation for 23 in base 32.
If you want the colon representation for bases greater than 16, you can use the module for the smaller bases and roll your own solution for the rest:
#!/usr/bin/perl
use strict;
use warnings;
use Test::More;
use_ok( 'Math::BaseConvert', '1.7' );
my @lines = (
    # Old_Number Old_Base New_Number New_Base
    [qw(23 10 27 8)], # http://www.wolframalpha.com/input/?i=23+from+base+10+to+base+16
    [qw(23 10 23 32)], # http://www.wolframalpha.com/input/?i=23+from+base+10+to+base+32
    [qw(64712 10 64:712 1000)], # http://www.wolframalpha.com/input/?i=64712+from+base+10+to+base+1000
);
for my $line (@lines) {
    cmp_ok(
        base10toN(@$line[0,3]),
        'eq',
        $line->[2],
        sprintf("Convert number '%s' (base%d) to '%s' (base%d)", $line->[0], 10, @$line[2,3])
    );
}
sub base10toN {
    my ($num, $base) = @_;
    return Math::BaseConvert::cnv($num, 10, $base)
        if $base <= 16;
    my @digits = ();
    while (1) {
        my $remainder = $num % $base;
        unshift @digits, $remainder;
        $num = ($num - $remainder) / $base
            or last;
    }
    return join ':', @digits;
}
done_testing();
You seem to be expecting decimal output, with "digits" being decimal numbers separated by :.
Math::BaseConvert doesn't do that. It only supports having a single character per digit.
By default, the digits used are '0'..'9', 'A'..'Z', 'a'..'z', '.', '_' though you can supply your own list instead (and you would have to do so to support up to base 1000).
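To illustrate the "single character per digit" idea without relying on the module, here is a minimal, module-free sketch. The helper to_base_chars and its argument order are made up for illustration and are not part of Math::BaseConvert; it assumes a non-negative integer input and a digit list at least as long as the base.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Math::BaseConvert:
# convert a non-negative integer to $base using one character per digit.
sub to_base_chars {
    my ( $num, $base, @digits ) = @_;
    die "need at least $base digit characters\n" if @digits < $base;
    my $out = '';
    do {
        $out = $digits[ $num % $base ] . $out;    # lowest digit first, prepended
        $num = int( $num / $base );
    } while ( $num > 0 );
    return $out;
}

my @digits = ( '0' .. '9', 'A' .. 'Z' );
print to_base_chars( 23, 32, @digits ), "\n";    # N
print to_base_chars( 23, 8,  @digits ), "\n";    # 27
```

With this scheme a base such as 1000 would need a list of 1000 distinct characters, which is why the colon-separated decimal form is the more practical notation there.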

Getting fixed width columnar output using 'printf' in Perl [duplicate]

Closed 10 years ago.
Possible Duplicate:
Formatting output with 'printf' in Perl
my @selections = ("Hamburger","Frankfurter","French Fries","Large Coke","Medium Coke","Small Coke","Onion Rings");
my @prices = (3.49, 2.19, 1.69, 1.79, 1.59, 1.39, 1.19);
my @quantity = (3, 0, 0, 4, 0, 0, 8);
printf("%s %10s %12s %10s\n", "Qty", "Desc.", "Unit \$", "Total");
for($meh = 0; $meh <= 6; $meh++)
{
if($quantity[$meh] != 0)
{
printf("%d %10s %9.2f %7.2f\n", $quantity[$meh], $selections[$meh], $prices[$meh], $prices[$meh]*$quantity[$meh])
}
}
I can't figure out how to make the columns line up. I followed the suggestions of another post, but it still isn't working.
The problem is that your strings are more than 10 characters long, and Perl won't cut them unless you specify a maximum width, which is given after the dot for strings (%10.10s). Also, you may want to use a negative number so they become aligned to the left (%-10.10s).
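A minimal sketch of the difference between a minimum width alone and a minimum plus maximum width; the helper name fixed_cell is made up for illustration, and it uses the * form of sprintf to pass the width as an argument:

```perl
use strict;
use warnings;

# Hypothetical helper: left-justify $text and force it to exactly $width characters.
sub fixed_cell {
    my ( $text, $width ) = @_;
    # '-' left-justifies, the first * is the minimum width, .* is the maximum
    return sprintf '%-*.*s', $width, $width, $text;
}

for my $name ( 'Hamburger', 'French Fries' ) {
    # %10s pads short strings but lets long ones overflow the column
    printf "[%10s] [%s]\n", $name, fixed_cell( $name, 10 );
}
```

"French Fries" overflows the first column but is cut to "French Fri" in the second, while "Hamburger" is padded to ten characters in both.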
If you want the columns to be exactly aligned based on dynamic input data, you need to make two passes over the rows. The first time through, record the maximum length of each column. Then construct a format string using those lengths. Finally, print each row using that format string.
use strict;
use warnings;
my @selections = ("Hamburger","Frankfurter","French Fries","Large Coke","Medium Coke","Small Coke","Onion Rings");
my @prices = (3.49, 2.19, 1.69, 1.79, 1.59, 1.39, 1.19);
my @quantity = (3, 0, 0, 4, 0, 0, 8);
my @rows;
push @rows, ["Qty", "Desc.", "Unit \$", "Total"];
# construct table data as a two-dimensional array
for (my $meh = 0; $meh < @selections; $meh++) {
    next unless $quantity[$meh];
    push @rows, [$quantity[$meh], $selections[$meh], $prices[$meh], $prices[$meh]*$quantity[$meh]];
}
# first pass over rows: compute the maximum width for each column
my @widths;
for my $row (@rows) {
    for (my $col = 0; $col < @$row; $col++) {
        $widths[$col] = length $row->[$col] if length $row->[$col] > ($widths[$col] // 0);
    }
}
# compute the format. for this data, it works out to "%-3s %-11s %-6s %-5s\n"
my $format = join(' ', map { "%-${_}s" } @widths) . "\n";
# second pass: print each row using the format
for my $row (@rows) {
    printf $format, @$row;
}
That yields this output:
Qty Desc.       Unit $ Total
3   Hamburger   3.49   10.47
4   Large Coke  1.79   7.16
8   Onion Rings 1.19   9.52
A long time ago, Perl was mainly used for formatting files. It still has these capabilities, although I haven't seen them used in a program since Perl 4.x came out.
Check out the perlform documentation, the format function, and the write function.
I'd give you an example on what the code would look like except I haven't done it in years. Otherwise, use the printf statement. You can limit the size of a text field with a %-10.10s type of format. This says to left justify the string, and pad it out to 10 characters, but not more than 10 characters.
I also suggest you get a book on modern Perl. One that will teach you about references.
I've rewritten your program to use references. Notice that all of the data is now in a single array instead of being spread over three separate arrays whose indexes you have to hope stay in sync.
I can talk about the ENTREE of $menu[1] by saying $menu[1]->{ENTREE}. It's easier to read and easier to maintain.
Also note that I've changed your for loop. In yours, you had to know that you had seven items. If you added a new item, you'd have to change your loop. In mine, I use $#menu to get the last index of my menu. I then use (0..$#menu) to automatically loop from 0 to the last item in the @menu array.
use strict;
use warnings;
use Data::Dumper;
my @menu = (
    { ENTREE => "Hamburger", PRICE => 3.49, QUANTITY => 3 },
    { ENTREE => "Frankfurter", PRICE => 2.19, QUANTITY => 0 },
    { ENTREE => "French Fries", PRICE => 1.69, QUANTITY => 0 },
    { ENTREE => "Large Coke", PRICE => 1.79, QUANTITY => 4 },
    { ENTREE => "Medium Coke", PRICE => 1.59, QUANTITY => 0 },
    { ENTREE => "Small Coke", PRICE => 1.39, QUANTITY => 0 },
    { ENTREE => "Onion Rings", PRICE => 1.19, QUANTITY => 8 },
);
printf "%-3.3s %-10.10s %-6.6s %s\n\n", 'Qty', 'Desc.', 'Unit $', 'Total';
# Use $#menu to get the last index of the array instead of hard-coding 6
foreach my $item (0..$#menu) {
    # Dereference $menu[$item] to make %menu_item a hash
    # This makes the syntax easier to read.
    my %menu_item = %{ $menu[$item] };
    if ( $menu_item{QUANTITY} ) {
        printf "%3d %-10.10s %9.2f %7.2f\n",
            $menu_item{QUANTITY}, $menu_item{ENTREE}, $menu_item{PRICE},
            $menu_item{QUANTITY} * $menu_item{PRICE};
    }
}
OUTPUT:
Qty Desc.      Unit $ Total
  3 Hamburger       3.49   10.47
  4 Large Coke      1.79    7.16
  8 Onion Ring      1.19    9.52
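Since each element of the menu array is already a hash reference, the index loop is optional. Here is a minimal sketch that iterates over the references directly; the helper name order_lines, the shortened menu and the column widths are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: one formatted line per item that was actually ordered.
sub order_lines {
    my @menu = @_;
    my @lines;
    for my $item (@menu) {
        next unless $item->{QUANTITY};    # skip items with a quantity of 0
        push @lines, sprintf '%3d %-12.12s %6.2f %7.2f',
            $item->{QUANTITY}, $item->{ENTREE}, $item->{PRICE},
            $item->{QUANTITY} * $item->{PRICE};
    }
    return @lines;
}

my @menu = (
    { ENTREE => 'Hamburger',   PRICE => 3.49, QUANTITY => 3 },
    { ENTREE => 'Frankfurter', PRICE => 2.19, QUANTITY => 0 },
    { ENTREE => 'Onion Rings', PRICE => 1.19, QUANTITY => 8 },
);
print "$_\n" for order_lines(@menu);
```

The arrow syntax $item->{ENTREE} reads the field through the reference, so no copy of each hash is made inside the loop.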