Learning Perl, One-Liner behaves differently - perl

UPDATE
As pointed out in the answer, this question really has to do with Scalar versus List Context in Perl.
## ## ##
I am learning perl via self-taught crash course (primarily with the Llama book and the web). In attempting some byte swap code, I have found a one liner I do not understand completely. A comment in the script explains what I think is happening in the one-liner.
#!/usr/bin/perl --
#
# Script to print byte-swapped hex values
#
use 5.010;
use warnings;
use strict;
# NOTE: I realize I could use a single variable 'data', but the x- y- z- prefixes may help in
# identification (for clarity) in the code for this SO question.
my $xData;
my $yData;
my $zData;
for ( my $ijk = 998; $ijk < 1001; $ijk++ )
{
printf ( "\n%4d is hex " . (sprintf "0x%04X", $ijk) . "\n", $ijk );
# with byte swap
say "These numbers (bytes swapped) should match...";
# do sprintf, match pattern and store to ($1)($2), now reverse them into ($2)($1).
# BindOp leaves $_ alone, match stuffs $_ & is then used as input for reverse, prints.
say reverse ((sprintf "%04X", $ijk) =~ /(..)(..)/) ; # from perlmonks' webpage
$yData = (reverse ((sprintf "%04X", $ijk) =~ /(..)(..)/) );
say $yData; # does NOT match
$xData = sprintf "%04X", $ijk;
$xData =~ s/(..)(..)/$2$1/ ;
say $xData; # does match
$_ = sprintf "%04X", $ijk;
/(..)(..)/;
$zData = $2 . $1 ;
say $zData; # does match
}
exit 0;
OUTPUT:
998 is hex 0x03E6 These numbers (bytes swapped) should match...
E603
6E30
E603
E603
999 is hex 0x03E7 These numbers (bytes swapped) should match...
E703
7E30
E703
E703
1000 is hex 0x03E8 These numbers (bytes swapped) should match...
E803
8E30
E803
E803
Why does the one liner work, and why doesn't the $yData perform the same way? I'm pretty sure I understand why $xData and $zData work, but I would expect $yData to be the closest equivalent non-one-liner. What is the closest equivalent non-one-liner and why? Where is the discrepancy?

The reverse in your print (say) statement comes in the list context, while when assigned to $yData the context is scalar. This function (may) behave considerably differently based on the context.
From perldoc -f reverse
reverse LIST
In list context, returns a list value consisting of the elements of LIST in the opposite order. In scalar context, concatenates the elements of LIST and returns a string value with all characters in the opposite order.
In this case this produces different results.
When called in list context, it swaps the (two) input bytes, keeping each byte intact (represented by two hexadecimal digits matched in a group). When called in scalar context, it joins the input and returns a character string, running in the opposite order. Taken to represent a hex number this would have each byte changed.

Related

Remove upfront zeros from floating point lower than 1 in Perl

I would like to normalize the variable from ie. 00000000.1, to 0.1 using Perl
my $number = 000000.1;
$number =\~ s/^0+(\.\d+)/0$1/;
Is there any other solution to normalize floats lower than 1 by removing upfront zeros than using regex?
When I try to put those kind of numbers into an example function below
test(00000000.1, 0000000.025);
sub test {
my ($a, $b) = #_;
print $a, "\n";
print $b, "\n";
print $a + $b, "\n";
}
I get
01
021
22
which is not what is expected.
A number with leading zeros is interpreted as octal, e.g. 000000.1 is 01. I presume you have a string as input, e.g. my $number = "000000.1". With this your regex is:
my $number = "000000.1";
$number =~ s/^0+(?=0\.\d+)//;
print $number;
Output:
0.1
Explanation of regex:
^0+ -- 1+ 0 digits
(?=0\.\d+) -- positive lookahead for 0. followed by digits
Learn more about regex: https://twiki.org/cgi-bin/view/Codev/TWikiPresentation2018x10x14Regex
Simplest way, force it to be treated as a number and it will drop the leading zeros since they are meaningless for decimal numbers
my $str = '000.1';
...
my $num = 0 + $str;
An example,† to run from the command-line:
perl -wE'$n = shift; $n = 0 + $n; say $n' 000.1
Prints 0.1
Another, more "proper" way is to format that string ('000.1' and such) using sprintf. Then you do need to make a choice about precision, but that is often a good idea anyway
my $num = sprintf "%f", $str; # default precision
Or, if you know how many decimal places you want to keep
my $num = sprintf "%.3f", $str;
† The example in the question is really invalid. An unquoted string of digits which starts with a zero (077, rather than '077') would be treated as an octal number except that the decimal point (in 000.1) renders that moot as octals can't be fractional; so, Perl being Perl, it is tortured into a number somehow, but possibly yielding unintended values.
I am not sure how one could get an actual input like that. If 000.1 is read from a file or from the command-line or from STDIN ... it will be a string, an equivalent of assigning '000.1'
See Scalar value constructors in perldata, and for far more detail, perlnumber.
As others have noted, in Perl, leading zeros produce octal numbers; 10 is just a decimal number ten but 010 is equal to decimal eight. So yeah, the numbers should be in quotes for the problem to make any sense.
But the other answers don’t explain why the printed results look funny. Contrary to Peter Thoeny’s comment and zdim’s answer, there is nothing ‘invalid’ about the numbers. True, octals can’t be floating point, but Perl does not strip the . to turn 0000000.025 into 025. What happens is this:
Perl reads the run of zeros and recognises it as an octal number.
Perl reads the dot and parses it as the concatenation operator.
Perl reads 025 and again recognises it as an octal number.
Perl coerces the operands to strings, i.e. the decimal value of the numbers in string form; 0000000 is, of course, '0' and 025 is '21'.
Perl concatenates the two strings and returns the result, i.e. '021'.
And without error.
(As an exercise, you can check something like 010.025 which, for the same reason, turns into '821'.)
This is why $a and $b are each printed with a leading zero. Also note that, to evaluate $a + $b, Perl coerces the strings to numbers, but since leading zeros in strings do not produce octals, '01' + '021' is the same as '1' + '21', returning 22.

Shortest Perl solution for outputing 4 random words

I have this one-line Unix shell script
for i in 1 2 3 4; do sed "$(tr -dc '0-9' < /dev/urandom | fold -w 5 |
awk '$0>=35&&$0<=65570' | head -1)q;d" "$0"; done | perl -p00e
's/\n(?!\Z)/ /g'
The script has 65K words in it, one per line, from line 35 to 65570. The code and the data are in the same file.
This script outputs 4 space-separated random words from this list with a newline at the end. For example
first fourth third second
How can I make this one-liner much shorter with Perl, keeping the
tr -dc '0-9' < /dev/urandom
part?
Keeping it is important since it provides Cryptographically Secure Pseudo-Random Numbers (CSPRNs) for all Unix OSs. Of course, if Perl can get numbers from /dev/urandom then the tr can be replaced with Perl too, but the numbers from urandom need to stay.
For convenience, I shared the base script with 65K words
65kwords.txt
or
65kwords.txt
Please use only core modules. It would be used for generating "human memorable passwords".
Later, the (hashing) iteration count, where we would use this to store the passwords would be extremely high, so brute-force would be very slow, even with many many GPUs/FPGAs.
You mention needing a CSPRN, which makes this a non trivial exercise - if you need cryptographic randomness, then using built in stuff (like rand) is not a good choice, as the implementation is highly variable across platforms.
But you've got Rand::Urandom which looks like it does the trick:
By default it uses the getentropy() (only available in > Linux 3.17) and falls back to /dev/arandom then /dev/urandom.
#!/usr/bin/env perl
use strict;
use warnings;
use Rand::Urandom;
chomp ( my #words = <DATA> );
print $words[rand #words], " " for 1..4;
print "\n";
__DATA__
yarn
yard
wound
worst
worry
work
word
wool
wolf
wish
wise
wipe
winter
wing
wind
wife
whole
wheat
water
watch
walk
wake
voice
Failing that though - you can just read bytes from /dev/urandom directly:
#!/usr/bin/env perl
use strict;
use warnings;
my #number_of_words = 4;
chomp ( my #words = <DATA> );
open ( my $urandom, '<:raw', '/dev/urandom' ) or die $!;
my $bytes;
read ( $urandom, $bytes, 2 * $number_of_words ); #2 bytes 0 - 65535
#for testing
#unpack 'n' is n An unsigned short (16-bit)
# unpack 'n*' in a list context returns a list of these.
foreach my $value ( unpack ( "n*", $bytes ) ) {
print $value,"\n";
}
#actually print the words.
#note - this assumes that you have the right number in your list.
# you could add a % #words to the map, e.g. $words[$_ % #words]
#but that will mean wrapping occurs, and will alter the frequency distribution.
#a more robust solution would be to fetch additional bytes if the 'slot' is
#empty.
print join " ", ( map { $words[$_] } unpack ( "n*", $bytes )),"\n";
__DATA__
yarn
yard
wound
worst
#etc.
Note - the above relies on the fact that your wordlist is the same size as two bytes (16 bits) - if this assumption isn't true, you'll need to deal with 'missed' words. A crude approach would be to take a modulo, but that would mean some wrapping and therefore not quite truly even distribution of word picks. Otherwise you can bit-mask and reroll, as indicated below:
On a related point though - have you considered not using a wordlist, and instead using consonant-vowel-consonant groupings?
E.g.:
#!/usr/bin/env perl
use strict;
use warnings;
#uses /dev/urandom to fetch bytes.
#generates consonant-vowel-consonant groupings.
#each are 11.22 bits of entropy, meaning a 4-group is 45 bits.
#( 20 * 6 * 20 = 2400, which is 11.22 bits of entropy log2 2400
#log2(2400 ^ 4) = 44.91
#but because it's generated 'true random' it's a know entropy string.
my $num = 4;
my $format = "CVC";
my %letters = (
V => [qw ( a e i o u y )],
C => [ grep { not /[aeiouy]/ } "a" .. "z" ], );
my %bitmask_for;
foreach my $type ( keys %letters ) {
#find the next power of 2 for the number of 'letters' in the set.
#So - for the '20' letter group, that's 31. (0x1F)
#And for the 6 letter group that's 7. (0x07)
$bitmask_for{$type} = ( 2 << log ( #{$letters{$type}} ) / log 2 ) - 1 ;
}
open( my $urandom, '<:raw', '/dev/urandom' ) or die $!;
for ( 1 .. $num ) {
for my $type ( split //, $format ) {
my $value;
while ( not defined $value or $value >= #{ $letters{$type} } ) {
my $byte;
read( $urandom, $byte, 1 );
#byte is 0-255. Our key space is 20 or 6.
#So rather than modulo, which would lead to an uneven distribution,
#we just bitmask and discard and 'too high'.
$value = (unpack "C", $byte ) & $bitmask_for{$type};
}
print $letters{$type}[$value];
}
print " ";
}
print "\n";
close($urandom);
This generates 3 character CVC symbols, with a known entropy level (11.22 per 'group') for making reasonably robust passwords. (45 bits as opposed to the 64 bits of your original, although obviously you can add extra 'groups' to gain 11.22 bits per time).
This answer is not cryptographically safe!
I would do this completely in Perl. No need for a one-liner. Just grab your word-list and put it into a Perl program.
use strict;
use warnings;
my #words = qw(
first
second
third
fourth
);
print join( q{ }, map { $words[int rand #words] } 1 .. 4 ), "\n";
This grabs four random words from the list and outputs them.
rand #words evaluates #words in scalar context, which gives the number of elements, and creates a random floating point value between 0 and smaller than that number. int cuts off the decimals. This is used as the index to grab an element out of #words. We repeat this four times with the map statement, where the 1 .. 4 is the same as passing a list of (1, 2, 3, 4) into map as an argument. This argument is ignored, but instead our random word is picked. map returns a list, which we join on one space. Finally we print the resulting string, and a newline.
The word list is created with the quoted words qw() operator, which returns a list of quoted words. It's shorthand so you don't need to type all the quotes ' and commas ,.
If you'd want to have the word list at the bottom you could either put the qw() in a sub and call it at the top, or use a __DATA__ section and read from it like a filehandle.
The particular method using tr and fold on /dev/urandom is a lot less efficient than it could be, so let's fix it up a little bit, while keeping the /dev/urandom part.
Assuming that available memory is enough to contain your script (including wordlist):
chomp(#words = <DATA>);
open urandom, "/dev/urandom" or die;
read urandom, $randbytes, 4 * 2 or die;
print join(" ", map $words[$_], unpack "S*", $randbytes), "\n";
__DATA__
word
list
goes
here
This goes for brevity and simplicity without outright obfuscation — of course you could make it shorter by removing whitespace and such, but there's no reason to. It's self-contained and will work with several decades of perls (yes, those bareword filehandles are deliberate :-P)
It still expects exactly 65536 entries in the wordlist, because that way we don't have to worry about introducing bias to the random number choice using a modulus operator. A slightly more ambitious approach might be to read 48 bytes from urandom for each word, turning it into a floating-point value between 0 and 1 (portable to most systems) and multiplying it by the size of the word list, allowing for a word list of any reasonable size.
A lot of nonsense is talked about password strength, and I think you're overestimating the worth of several of your requirements here
I don't understand your preoccupation with making your code "much shorter with perl". (Why did you pick Perl?) Savings here can only really be useful to make the script quicker to read and compile, but they will be dwarfed by the half megabyte of data following the code which must also be read
In this context, the usefulness to a hacker of a poor random number generator depends on prior knowledge of the construction of the password together with the passwords that have been most recently generated. With a sample of only 65,000 words, even the worst random number generator will show insignificant correlation between successive passwords
In general, a password is more secure if it is longer, regardless of its contents. Forming a long password out of a sequence of English words is purely a way of making the sequence more memorable
"Of course later, the (hashing) iteration count ... would be extreme high, so brute-force [hacking?] would be very slow"
This doesn't follow at all. Cracking algorithms won't try to guess the four words you've chosen: they will see only a thirty-character (or so) string consisting only of lower-case letters and spaces, and whose origin is insignificant. It will be no more or less crackable than any other password of the same length with the same character set
I suggest that you should rethink your requirements and so make things easier for yourself. I don't find it hard to think of four English words, and don't need a program to do it for me. Hint: pilchard is a good one: they never guess that!
If you still insist, then I would write something like this in Perl. I've used only the first 18 lines of your data for
use strict;
use warnings 'all';
use List::Util 'shuffle';
my #s = map /\S+/g, ( shuffle( <DATA> ) )[ 0 .. 3 ];
print "#s\n";
__DATA__
yarn
yard
wound
worst
worry
work
word
wool
wolf
wish
wise
wipe
winter
wing
wind
wife
whole
wheat
output
wind wise winter yarn
You could use Data::Random::rand_words()
perl -MData::Random -E 'say join $/, Data::Random::rand_words(size => 4)'

In Perl, how can I tell split not to strip empty trailing fields?

Was trying to count the number of lines in a string of text (including empty lines). A little surprised by the behavior of split. Had expected the following to output 2 but it printed 1 on my perl 5.14.2.
$str = "hello\
world\n\n";
#a = split(/\n/, $str);
print $#a, "\n";
Seems that split() is insensitive to consecutive \n (add more \n's at the end of the string will not increase the printout). The only I can get it sort of close to giving the number of lines is
$str = "hello\
world\n\n";
#a = split(/(\n)/, $str);
printf "%d\n", ($#a + 1)/2, "\n";
But it looks more like a workaround than a straight solution. Any ideas?
perldoc -f split:
If LIMIT is negative, it is treated as if it were instead
arbitrarily large; as many fields as possible are produced.
If LIMIT is omitted (or, equivalently, zero), then it is usually
treated as if it were instead negative but with the exception that
trailing empty fields are stripped (empty leading fields are
always preserved); if all fields are empty, then all fields are
considered to be trailing (and are thus stripped in this case).
$ perl -E 'my $x = "1\n2\n\n"; my #x = split /\n/, $x, -1; say $#x'
3
Perhaps the problem is that you are using $#a when scalar #a is what you are actually looking for?
I apologize if you are already aware of this or if this is not the issue, but $#a returns the index of the last element of #a and (scalar #a) returns the number of elements that #a contains. Since array indexing starts at 0, $#a is one less than scalar #a.

Perl Cryptology: Encrypting/Decrypting ASCII chracters with pack and unpack functions

I need help figuring out how these two subroutines work and what values or data structures they return. Here's a minimal representation of the code:
#!/usr/bin/perl
use strict; use warnings;
# an array of ASCII encrypted characters
my #quality = ("C~#p)eOA`/>*", "DCCec)ds~~", "*^&*"); # for instance
# input the quality
# the '#' character in front deferences the subroutine's returned array ref
my #q = #{unpack_qual_to_phred(#quality)};
print pack_phred_to_qual(\#q) . "\n";
sub unpack_qual_to_phred{
my ($qual)=#_;
my $upack_code='c' . length($qual);
my #q=unpack("$upack_code",$qual);
for(my $i=0;$i<#q;$i++){
$q[$i]-=64;
}
return(\#q);
}
sub pack_phred_to_qual{
my ($q_ref)=#_;
#q=#{$q_ref};
for(my $i=0;$i<#q;$i++){
$q[$i]+=64;
}
my $pack_code='c' . int(#q);
my $qual=pack("$pack_code",#q);
return ($qual);
}
1;
From my understanding, the unpack_qual_to_phread() subroutine apparently decrypts the ASCII character elements stored in #quality. The subroutine reads in an array containing elements of ASCII characters. Each element of the array is processed and apparently decrypted. The subroutine then returns an array ref containing elements of the decrypted array. I understand this much however I'm not really familiar with the Perl functions pack and unpack. Also I was unable to find any good examples of them online.
I think the pack_phred_to_qual subroutine converts the quality array ref back into ASCII characters and prints them.
thanks. any help or suggestions are greatly appreciated. Also if someone could provide a simple example of how Perl's pack and unpack functions work that would help too.
Calculating the length is needless. Those functions can be simplified to
sub unpack_qual_to_phred { [ map $_ - 64, unpack 'c*', $_[0] ] }
sub pack_phred_to_qual { pack 'c*', map $_ + 64, #{ $_[0] } }
In encryption terms, it's a crazy simple substitution cypher. It simply subtracts 64 from the character number of each character. It could have been written as
sub encrypt { map $_ - 64, #_ }
sub decrypt { map $_ + 64, #_ }
The pack/unpack doesn't factor in the encryption/decryption at all; it's just a way of iterating over each byte.
It is fairly simple, as packs go. Is is calling unpack("c12", "C~#p)eOA/>*)` which takes each letter in turn and finds the ascii value for that letter, and then subtracts 64 from the value (well, subtracting 64 is a post-processing step, nothing to do with pack). So letter "C" is ascii 67 and 67-64 is 3. Thus the first value out of that function is a 3. Next is "~" which is ascii 126. 126-64 is 62. Next is # which is ascii 35, and 35-64 is -29, etc.
The complete set of numbers being generated from your script is:
3,62,-29,48,-23,37,15,1,32,-17,-2,-22
The "encryption" step simply reverses this process. Adds 64 and then converts to a char.
This is not a full answer to your question, but did you read perlpacktut? Or the pack/unpack docs on perldoc? Those will probably go a long way to helping you understand.
EDIT:
Here's a simple way to think of it: say you have a 4-byte number stored in memory, 1234. If that's in a perl scalar, $num, then
pack('s*', $num)
would return
π♦
or whatever the actual internal storage value of "1234" is. So pack() treated the scalar value as a string, and turned it into the actual binary representation of the number (you see "pi-diamond" printed out, because that's the ASCII representation of that number). Conversely,
unpack('s*', "π♦")
would return the string "1234".
The unpack() part of your unpack_qual_to_phred() subroutine could be simplified to:
my #q = unpack("c12", "C~#p)e0A`/>*");
which would return a list of ASCII character pairs, each pair corresponding to a byte in the second argument.

How do I get the length of a string in Perl?

What is the Perl equivalent of strlen()?
length($string)
perldoc -f length
length EXPR
length Returns the length in characters of the value of EXPR. If EXPR is
omitted, returns length of $_. Note that this cannot be used on an
entire array or hash to find out how many elements these have. For
that, use "scalar #array" and "scalar keys %hash" respectively.
Note the characters: if the EXPR is in Unicode, you will get the num-
ber of characters, not the number of bytes. To get the length in
bytes, use "do { use bytes; length(EXPR) }", see bytes.
Although 'length()' is the correct answer that should be used in any sane code, Abigail's length horror should be mentioned, if only for the sake of Perl lore.
Basically, the trick consists of using the return value of the catch-all transliteration operator:
print "foo" =~ y===c; # prints 3
y///c replaces all characters with themselves (thanks to the complement option 'c'), and returns the number of character replaced (so, effectively, the length of the string).
length($string)
The length() function:
$string ='String Name';
$size=length($string);
You shouldn't use this, since length($string) is simpler and more readable, but I came across some of these while looking through code and was confused, so in case anyone else does, these also get the length of a string:
my $length = map $_, $str =~ /(.)/gs;
my $length = () = $str =~ /(.)/gs;
my $length = split '', $str;
The first two work by using the global flag to match each character in the string, then using the returned list of matches in a scalar context to get the number of characters. The third works similarly by splitting on each character instead of regex-matching and using the resulting list in scalar context