I need help figuring out how these two subroutines work and what values or data structures they return. Here's a minimal representation of the code:
#!/usr/bin/perl
use strict; use warnings;
# an array of ASCII encrypted characters
my @quality = ("C~#p)eOA`/>*", "DCCec)ds~~", "*^&*"); # for instance
# input the quality
# the '@' character in front dereferences the subroutine's returned array ref
my @q = @{unpack_qual_to_phred(@quality)};
print pack_phred_to_qual(\@q) . "\n";
sub unpack_qual_to_phred{
my ($qual)=@_;
my $upack_code='c' . length($qual);
my @q=unpack("$upack_code",$qual);
for(my $i=0;$i<@q;$i++){
$q[$i]-=64;
}
return(\@q);
}
sub pack_phred_to_qual{
my ($q_ref)=@_;
@q=@{$q_ref};
for(my $i=0;$i<@q;$i++){
$q[$i]+=64;
}
my $pack_code='c' . int(@q);
my $qual=pack("$pack_code",@q);
return ($qual);
}
1;
From my understanding, the unpack_qual_to_phred() subroutine apparently decrypts the ASCII characters stored in @quality. It reads in an array of ASCII strings, processes each element, and returns an array ref holding the decrypted values. I understand this much, but I'm not really familiar with the Perl functions pack and unpack, and I was unable to find any good examples of them online.
I think the pack_phred_to_qual subroutine converts the quality array ref back into ASCII characters and prints them.
Thanks. Any help or suggestions are greatly appreciated. Also, if someone could provide a simple example of how Perl's pack and unpack functions work, that would help too.
Calculating the length is needless. Those functions can be simplified to
sub unpack_qual_to_phred { [ map $_ - 64, unpack 'c*', $_[0] ] }
sub pack_phred_to_qual { pack 'c*', map $_ + 64, @{ $_[0] } }
In encryption terms, it's a crazy simple substitution cypher. It simply subtracts 64 from the character number of each character. It could have been written as
sub encrypt { map $_ - 64, @_ }
sub decrypt { map $_ + 64, @_ }
The pack/unpack doesn't factor in the encryption/decryption at all; it's just a way of iterating over each byte.
It is fairly simple, as packs go. It is calling unpack("c12", "C~#p)eOA`/>*"), which takes each character in turn and finds its ASCII value; the code then subtracts 64 from each value (the subtraction is a post-processing step and has nothing to do with unpack). So the letter "C" is ASCII 67, and 67-64 is 3. Thus the first value out of that function is 3. Next is "~", which is ASCII 126, and 126-64 is 62. Next is "#", which is ASCII 35, and 35-64 is -29, etc.
The complete set of numbers being generated from your script is:
3,62,-29,48,-23,37,15,1,32,-17,-2,-22
The "encryption" step simply reverses this process. Adds 64 and then converts to a char.
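Since the question asked for a simple pack/unpack example, here is a minimal round-trip sketch. It uses only the first four characters of the sample string, and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# 'c*' means "signed 8-bit value, repeated for every byte in the string".
my $qual  = 'C~#p';
my @codes = unpack 'c*', $qual;          # (67, 126, 35, 112)
my @phred = map { $_ - 64 } @codes;      # (3, 62, -29, 48)

# Going back: add the offset again and pack the numbers into bytes.
my $round_trip = pack 'c*', map { $_ + 64 } @phred;

print "@codes\n";        # 67 126 35 112
print "@phred\n";        # 3 62 -29 48
print "$round_trip\n";   # C~#p
```

The template is the only thing pack and unpack share; the offset of 64 lives entirely in the map calls.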
This is not a full answer to your question, but did you read perlpacktut? Or the pack/unpack docs on perldoc? Those will probably go a long way to helping you understand.
EDIT:
Here's a simple way to think of it: say you have the number 1234 in a Perl scalar, $num. Then
pack('s*', $num)
would return
π♦
or whatever the two raw bytes of a 16-bit 1234 look like on your terminal. So pack() took the scalar's numeric value and turned it into its binary representation, here a native-order signed short (you see "pi-diamond" printed because that is how the terminal rendered those two bytes; it is not ASCII text). Conversely,
unpack('s*', "π♦")
would return the number 1234, provided the string holds those same two raw bytes.
The unpack() part of your unpack_qual_to_phred() subroutine could be simplified to:
my @q = unpack("c12", "C~#p)eOA`/>*");
which would return a list of twelve signed integers, one for each byte of the second argument.
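To see the actual bytes instead of whatever glyphs a terminal shows, something like the following sketch may help. Note that it swaps the native-order 's' template for the fixed-order 'v' and 'n' templates, so the result does not depend on the machine it runs on.

```perl
use strict;
use warnings;

my $num = 1234;                       # 0x04D2

# 'v' = unsigned 16-bit little-endian, 'n' = unsigned 16-bit big-endian.
my $le = pack 'v', $num;              # bytes D2 04
my $be = pack 'n', $num;              # bytes 04 D2

# 'H*' unpacks a byte string as hex digits, high nybble first.
print unpack('H*', $le), "\n";        # d204
print unpack('H*', $be), "\n";        # 04d2

# unpack with the same template recovers the number.
my ($back) = unpack 'v', $le;
print "$back\n";                      # 1234
```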
Related
I have a code like this.
$entry = &function(); # returns a number between 0 and 20
$var = sprintf("%#.4x", $entry);
if($var=~ /$hex/)
{
# block of statements
}
$hex will be within 0x0000 ..... 0x0014. Now, when function returns from 1 to 20, $var matches $hex. (Like 0x0001 .... 0x0014)
But when $entry is 0, $var becomes 0000, and I want it to be 0x0000. Currently I check for 0000 and change it in an if statement. Please let me know whether that is possible in sprintf itself.
According to the documentation for sprintf:
flags
# prefix non-zero hexadecimal with "0x" or "0X"
Note that it says non-zero, so only non-zero values will be prefixed by 0x.
A simple fix is to add the prefix manually:
sprintf "0x%04x", $entry;
The doc clearly mentions that 0x is prepended only for non-zero numbers when the # flag is used. This makes sense, since zero is zero whether it is in octal or hexadecimal, so prefixing it with 0x adds nothing.
Best way to handle this would be:
if($var=~ /$hex/ or $var eq '0000')
Sounds like you are doing things backwards. Wouldn't the following make more sense?
if ($entry == hex($hex))
If you want to compare numbers, compare the numbers, not their text representation.
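A small sketch putting the two suggestions side by side. The helper name to_hex4 and the sample values are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration: always emits the 0x prefix.
sub to_hex4 { return sprintf '0x%04x', $_[0] }

for my $entry (0, 1, 20) {
    my $with_flag = sprintf '%#.4x', $entry;   # the # flag skips the prefix for 0
    my $manual    = to_hex4($entry);           # prefix is part of the format
    print "$entry: $with_flag vs $manual\n";
}

# Numeric comparison sidesteps the formatting question entirely.
my $hex = '0x0014';
print "match\n" if 20 == hex($hex);
```

hex() accepts the leading 0x, so the stored string can be compared as a number without reformatting $entry at all.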
UPDATE
As pointed out in the answer, this question really has to do with Scalar versus List Context in Perl.
## ## ##
I am learning perl via self-taught crash course (primarily with the Llama book and the web). In attempting some byte swap code, I have found a one liner I do not understand completely. A comment in the script explains what I think is happening in the one-liner.
#!/usr/bin/perl --
#
# Script to print byte-swapped hex values
#
use 5.010;
use warnings;
use strict;
# NOTE: I realize I could use a single variable 'data', but the x- y- z- prefixes may help in
# identification (for clarity) in the code for this SO question.
my $xData;
my $yData;
my $zData;
for ( my $ijk = 998; $ijk < 1001; $ijk++ )
{
printf ( "\n%4d is hex " . (sprintf "0x%04X", $ijk) . "\n", $ijk );
# with byte swap
say "These numbers (bytes swapped) should match...";
# do sprintf, match pattern and store to ($1)($2), now reverse them into ($2)($1).
# BindOp leaves $_ alone, match stuffs $_ & is then used as input for reverse, prints.
say reverse ((sprintf "%04X", $ijk) =~ /(..)(..)/) ; # from perlmonks' webpage
$yData = (reverse ((sprintf "%04X", $ijk) =~ /(..)(..)/) );
say $yData; # does NOT match
$xData = sprintf "%04X", $ijk;
$xData =~ s/(..)(..)/$2$1/ ;
say $xData; # does match
$_ = sprintf "%04X", $ijk;
/(..)(..)/;
$zData = $2 . $1 ;
say $zData; # does match
}
exit 0;
OUTPUT:
998 is hex 0x03E6 These numbers (bytes swapped) should match...
E603
6E30
E603
E603
999 is hex 0x03E7 These numbers (bytes swapped) should match...
E703
7E30
E703
E703
1000 is hex 0x03E8 These numbers (bytes swapped) should match...
E803
8E30
E803
E803
Why does the one liner work, and why doesn't the $yData perform the same way? I'm pretty sure I understand why $xData and $zData work, but I would expect $yData to be the closest equivalent non-one-liner. What is the closest equivalent non-one-liner and why? Where is the discrepancy?
The reverse in your print (say) statement comes in the list context, while when assigned to $yData the context is scalar. This function (may) behave considerably differently based on the context.
From perldoc -f reverse
reverse LIST
In list context, returns a list value consisting of the elements of LIST in the opposite order. In scalar context, concatenates the elements of LIST and returns a string value with all characters in the opposite order.
In this case this produces different results.
When called in list context, it swaps the (two) input bytes, keeping each byte intact (represented by two hexadecimal digits matched in a group). When called in scalar context, it joins the input and returns a character string, running in the opposite order. Taken to represent a hex number this would have each byte changed.
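The difference can be reproduced in isolation with a sketch like this one, using the 998 case (03E6) from the question; the variable names are only for illustration.

```perl
use strict;
use warnings;

my @pairs = ('03E6' =~ /(..)(..)/);       # ('03', 'E6')

# List context: the two elements change places, each stays intact.
my $swapped = join '', reverse @pairs;    # E603

# Scalar context: elements are concatenated, then reversed per character.
my $flipped = reverse @pairs;             # 6E30

print "$swapped\n$flipped\n";
```

So the closest non-one-liner is to force list context, for example with join '' as above, before storing the result in a scalar.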
Is there a way to get number of bytes that "consumed" by an 'unpack' call?
I just want to parse(unpack) different structures from a long string in several steps, like following:
my $record1 = unpack "TEMPLATE", substr($long_str, $pos);
# Advance position pointer
$pos += NUMBER_OF_BYTES_CONSUMED_BY_LAST_UNPACK();
# Other code that might determine what to read in the following steps
# ...
# Read again at the new position
my $record2 = unpack "TEMPLATE2", substr($long_str, $pos);
This does seem like a glaring omission in unpack, doesn't it? As a consolation prize, you could add an a* to the end of the unpack template to return the unused portion of the input string.
# The variable-length "w" format is to make the example slightly more interesting.
$x = pack "w*", 126..129;
while(length $x) {
# unpack one number, keep the rest packed in $x
($n, $x) = unpack "wa*", $x;
print $n;
}
If your packed string is really long, this is not a good idea since it has to make a copy of the "remainder" portion of the string every time you do an unpack.
You can add the character . to the end of the format string:
my (@ary) = unpack("a4v3a*.", "abcdefghijklmn");
say for @ary;
Output:
abcd
26213
26727
27241
klmn
14 # <-- 14 bytes consumed
This was cleverly hidden in the perl5100delta file. If it is documented in perlfunc somewhere, I cannot find it.
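Applied to the stepwise parsing in the question, the trailing . might be used roughly as follows. This is only a sketch: the record layout (a 16-bit id, a length-prefixed name, then a 16-bit value) is invented for the example, and it assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Two records back to back: (16-bit id, length-prefixed name), then a 16-bit value.
my $buf = pack('n C/a*', 7, 'abc') . pack('n', 9);

# The trailing '.' yields the offset reached by this unpack call.
my ($id, $name, $used) = unpack 'n C/a* .', $buf;

# Continue from that offset for the next record.
my ($value) = unpack 'n', substr($buf, $used);

print "id=$id name=$name used=$used value=$value\n";
```

Unlike the a* approach, this does not copy the remainder of the string into a new scalar on every step, although substr still builds the argument for the next call.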
Can anybody explain the following Perl code for me, please?
I think it's in Perl, and I have no clue about Perl programming. Please explain what the following code does.
$t = test(10);
sub test() {
my $str = unpack("B32", pack("N",shift));
$str2 = substr($str,16,length($str));
return $str2;
}
The pack, unpack and substr functions are documented here, here and here, respectively.
pack("N"...) packs a number into a four-byte network-order representation. unpack("B32"...) unpacks this packed number as a string of bits (zeros and ones). The substr call takes the second half of this bit string (from bit 16 onwards), which represents the lower 16 bits of the original 32-bit number.
Why it does it this way is a mystery to me. A simpler and faster solution is to deal with the lower 16 bits at the outset (note the lower case "n"):
sub test($) {
return unpack("B16", pack("n",shift));
}
shift
removes and returns the first argument from the list of arguments passed to the function
pack("N", shift)
returns a 32bit network byte order representation of that value
my $str = unpack("B32", pack("N", shift));
stores a bitstring representation (32 bits worth) of said value (i.e. a string that looks like "00010011").
The substr is buggy and should be substr($str, 16); to get the last 16 characters of the above. (or substr($str, 16, 16);.)
In addition to Marcelo's answer, the shift function takes @_ as its default argument. @_ contains the subroutine's arguments.
pack("N", shift) takes the argument of the function (the return value of shift, which works on the arguments array by default) and packs it into a 4-byte big-endian integer. The unpack("B32", ...) part then turns it into a string again, 32 characters long, made of 0's and 1's. The substr just takes the last 16 bit-characters, in this case.
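Traced step by step for the argument 10 from the question, the code behaves roughly like this sketch. The sprintf line at the end is an alternative that is not in the original code.

```perl
use strict;
use warnings;

my $packed = pack 'N', 10;              # 4 bytes, big-endian: 00 00 00 0A
my $bits   = unpack 'B32', $packed;     # 32 characters of 0/1, high bit first
my $low16  = substr $bits, 16;          # everything from position 16 onwards

print "$bits\n";    # 00000000000000000000000000001010
print "$low16\n";   # 0000000000001010

# sprintf's %b conversion gives the same low 16 bits directly.
my $same = sprintf '%016b', 10 & 0xFFFF;
print "$same\n";    # 0000000000001010
```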
What is the Perl equivalent of strlen()?
length($string)
perldoc -f length
length EXPR
length
Returns the length in characters of the value of EXPR. If EXPR is omitted, returns length of $_. Note that this cannot be used on an entire array or hash to find out how many elements these have. For that, use "scalar @array" and "scalar keys %hash" respectively.
Note the characters: if the EXPR is in Unicode, you will get the number of characters, not the number of bytes. To get the length in bytes, use "do { use bytes; length(EXPR) }", see bytes.
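The characters-versus-bytes distinction can be seen with a short sketch. It measures the byte length by encoding to UTF-8 with the core Encode module, which is a different route from the "use bytes" form quoted above; the sample string is arbitrary.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $str = "caf\x{e9} \x{263A}";                  # "café ☺" as a character string

my $chars = length($str);                        # counts characters
my $bytes = length(encode('UTF-8', $str));       # counts encoded bytes

print "$chars characters, $bytes bytes in UTF-8\n";   # 6 characters, 9 bytes in UTF-8
```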
Although 'length()' is the correct answer that should be used in any sane code, Abigail's length horror should be mentioned, if only for the sake of Perl lore.
Basically, the trick consists of using the return value of the catch-all transliteration operator:
print "foo" =~ y===c; # prints 3
y///c replaces all characters with themselves (thanks to the complement option 'c'), and returns the number of characters replaced (so, effectively, the length of the string).
length($string)
The length() function:
$string ='String Name';
$size=length($string);
You shouldn't use this, since length($string) is simpler and more readable, but I came across some of these while looking through code and was confused, so in case anyone else does, these also get the length of a string:
my $length = map $_, $str =~ /(.)/gs;
my $length = () = $str =~ /(.)/gs;
my $length = split '', $str;
The first two work by using the global flag to match each character in the string, then using the returned list of matches in a scalar context to get the number of characters. The third works similarly, by splitting on each character instead of regex-matching and using the resulting list in scalar context.