Perl: Encoding binary number to base64 - perl

I have a string that is essentially binary
my $string = "000110";
I've been trying to use encode_base64, but that encodes strings, if I'm reading the documentation correctly.
my $j = MIME::Base64->encode_base64($string);
print "$j\n"; # should print 'A'
>> TUlNRTo6QmFzZTY000000
How can I achieve this in Perl? The string is expected to be ~120 binary bits in length.
I'd rather not use any modules that are not installed with Perl by default; the target audience for this script is not familiar with the shell.
Edit:
A lot of the answers to this question have been about strings, not actual numbers. There was one solution I found, but it required the Math::BaseCalc module to be installed.
Edit2: Essentially, if I have
my $binary_string = "000110";
I would like to have it encoded in base64 (as a number), so it returns
>>G # for this case (binary number 000110 to base64 number = G)

Base64 is an algorithm that encodes strings of 8-bit bytes/characters. Anything else must first be packed into bytes.
You already have a string, but you could be more space-efficient by packing the 120 bits into 15 bytes using the following:
my $base64 = encode_base64(pack("B*", $binary), "");
The inverse operation is
my $binary = unpack("B*", decode_base64($base64));
For example,
$ perl -MMIME::Base64 -E'say encode_base64(pack("B*", $ARGV[0]), "")' \
0100000101000010
QUI=
$ perl -MMIME::Base64 -E'say unpack("B*", decode_base64($ARGV[0]))' \
QUI=
0100000101000010
If you actually have a number of bits that's not divisible by 8, you can prefix the string with the number of bits.
my $base64 = encode_base64(pack("CB*", length($binary), $binary), "");
The inverse operation is
my ($length, $binary) = unpack("CB*", decode_base64($base64));
substr($binary, $length) = "";
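As a minimal sketch of the length-prefixed round trip above, using only the core MIME::Base64 module: the helper names bits_to_base64 and base64_to_bits are made up for illustration, and the one-byte "C" prefix assumes the bit string is at most 255 bits long (enough for the ~120 bits in the question).

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical helper: store the bit count in one byte, then pack the bits
# MSB-first. pack pads the final byte with zero bits.
sub bits_to_base64 {
    my ($bits) = @_;
    die "too many bits for a one-byte length prefix\n" if length($bits) > 255;
    return encode_base64(pack("CB*", length($bits), $bits), "");
}

# Hypothetical helper: read the bit count back and drop the padding bits.
sub base64_to_bits {
    my ($base64) = @_;
    my ($length, $bits) = unpack("CB*", decode_base64($base64));
    return substr($bits, 0, $length);
}

my $encoded = bits_to_base64("000110");
print "$encoded\n";                    # Bhg=
print base64_to_bits($encoded), "\n";  # 000110
```

Note that this encodes the packed bytes, so "000110" comes out as Bhg= rather than the single base64 digit the question asks for.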

Related

How can I escape a string in Perl for LDAP searching?

I want to escape a string, per RFC 4515. So, the string "u1" would be transformed to "\75\31", that is, the ordinal value of each character, in hex, preceded by backslash.
Has to be done in Perl. I already know how to do it in Python, C++, Java, etc., but Perl is baffling.
Also, I cannot use Net::LDAP and I may not be able to add any new modules, so, I want to do it with basic Perl features.
Skimming through RFC 4515, this encoding escapes the individual octets of multi-byte UTF-8 characters, not codepoints. So, something that works with non-ASCII text too:
#!/usr/bin/env perl
use strict;
use warnings;
use feature qw/say/;
sub valueencode ($) {
# Unpack format returns octets of UTF-8 encoded text
my @bytes = unpack "U0C*", $_[0];
sprintf '\%02x' x @bytes, @bytes;
}
say valueencode 'u1';
say valueencode "Lu\N{U+010D}i\N{U+0107}"; # Lučić, from the RFC 4515 examples
Example:
$ perl demo.pl
\75\31
\4c\75\c4\8d\69\c4\87
Or an alternative using the vector flag:
use Encode qw/encode/;
sub valueencode ($) {
sprintf '\%0*v2x', "\\", encode('UTF-8', $_[0]); # 02 width keeps two hex digits per byte
}
Finally, a smarter version that only escapes ASCII characters when it has to (And multi-byte characters, even though upon a closer read of the RFC they don't actually need to be if they're valid UTF-8):
# Encode according to RFC 4515 valueencoding grammar rules:
#
# Text is UTF-8 encoded. Bytes can be escaped with the sequence
# \XX, where the X's are hex digits.
#
# The characters NUL, LPAREN, RPAREN, ASTERISK and BACKSLASH all MUST
# be escaped.
#
# Bytes > 0x7F that aren't part of a valid UTF-8 sequence MUST be
# escaped. This version assumes there are no such bytes and that input
# is an ASCII or Unicode string.
#
# Single bytes and valid multibyte UTF-8 sequences CAN be escaped,
# with each byte escaped separately. This version escapes multibyte
# sequences, to give ASCII results.
sub valueencode ($) {
my $encoded = "";
for my $byte (unpack 'U0C*', $_[0]) {
if (($byte >= 0x01 && $byte <= 0x27) ||
($byte >= 0x2B && $byte <= 0x5B) ||
($byte >= 0x5D && $byte <= 0x7F)) {
$encoded .= chr $byte;
} else {
$encoded .= sprintf '\%02x', $byte;
}
}
return $encoded;
}
This version returns the strings 'u1' and 'Lu\c4\8di\c4\87' from the above examples.
In short, one way is just as the question says: split the string into characters, get their ordinals then convert format to hex; then put it back together. I don't know how to get the \nn format so I'd make it 'by hand'. For instance
my $s = join '', map { sprintf '\%02x', ord } split //, 'u1';
Or use vector flag %v to treat the string as a "vector" of integers
my $s = sprintf '\%0*v2x', '\\', 'u1';
With %v the string is broken up into the numerical representation of its characters, each is converted (%02x here, so that every value gets two hex digits), and they're joined back, with . between them by default. The (optional) * lets us specify the string to join them with instead, \ (escaped) here.
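To see what the vector flag does on its own, here is a small sketch; the helper name hex_ordinals is made up for illustration, and everything else is plain sprintf.

```perl
use strict;
use warnings;

# Hypothetical helper: two-digit hex ordinals of each character in $string,
# joined with $sep.
sub hex_ordinals {
    my ($sep, $string) = @_;
    return sprintf '%0*v2x', $sep, $string;
}

print sprintf('%vd', 'u1'), "\n";        # 117.49  (decimal ordinals, joined by .)
print sprintf('%vx', 'u1'), "\n";        # 75.31   (the same ordinals in hex)
print sprintf('%*vx', ':', 'u1'), "\n";  # 75:31   (custom join string)
print hex_ordinals(':', "\t1"), "\n";    # 09:31   (zero-padded to two digits)
```

The leading backslash in the escaping one-liner is ordinary literal text in the format string; the join string only goes between the values.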
This can also be done with pack + unpack, see the link below. Also see that page if there is a wide range of input characters.†
See the documentation for ord and sprintf.
† If the input contains non-ASCII characters, you may need to encode it first to get octets, if it is the octets (and not whole codepoints) that are to be escaped
use Encode qw(encode);
my $s = sprintf '\%0*v2x', '\\', encode('UTF-8', $input);
See the linked page for more.

Perl - convert hexadecimal to binary and use it as string

I am new to Perl and I have difficulties using the different types.
I am trying to get a hexadecimal register, transform it to binary, use it as a string and get substrings from the binary string.
I have done a few searches and what I tried is :
my $hex = 0xFA1F;
print "$hex\n";
result was "64031" . First surprise : can't I print the hex value in Perl and not just the decimal value ?
$hex = hex($hex);
print "$hex\n";
Result was 409649. Second surprise : I would expect the result to be also 64031 since "hex" converts hexadecimal to decimal.
my $bin = printf("%b", $hex);
It prints the binary value. Is there a way to transform the hex to bin without printing it ?
Thanks,
SLP
Decimal, binary, and hexadecimal are all text representations of a number (i.e. ways of writing a number). Computers can't deal with these as numbers.
my $num = 0xFA1F; stores the specified number (sixty-four thousand and thirty-one) into $num. It's stored in a format the hardware understands, but that's not very important. What's important is that it's stored as a number, not text.
When print is asked to print a number, it prints it out in decimal (or scientific notation if large/small enough). It has no idea how the number was created (from a hex constant? from addition? etc), so it can't determine how to output the number based on that.
To print a number as hex, you can use
my $hex = 'FA1F'; # $hex contains the hex representation of the number.
print $hex; # Prints the hex representation of the number.
or
my $num = 0xFA1F; # $num contains the number.
printf "%X", $num; # Prints the hex representation of the number.
You are assigning an integer value using hexadecimal format. print by default prints numbers in decimal format, so you are getting 64031.
You can verify this using the printf() by giving different formats.
$ perl -e ' my $num = 0xFA1F; printf("%d %X %b\n", ($num) x 3 ) '
64031 FA1F 1111101000011111
$ perl -e ' my $num = 64031; printf("%d %X %b\n", ($num) x 3 ) '
64031 FA1F 1111101000011111
$ perl -e ' my $num = 0b1111101000011111; printf("%d %X %b\n", ($num) x 3 ) '
64031 FA1F 1111101000011111
$
To get the binary format of 0xFA1F in string, you can use sprintf()
$ perl -e ' my $hex = 0xFA1F; my $bin=sprintf("%b",$hex) ; print "$bin\n" '
1111101000011111
$
Let's take each bit of confusion in order.
my $hex = 0xFA1F;
This stores a hex constant in $hex, but Perl doesn't have a hex data type so although you can write hex constants, and binary and octal constants for that matter, Perl converts them all to decimal. Note that there is a big difference between
my $hex = 0xFA1F;
and
my $hex = '0xFA1F';
The first stores a number in $hex, which prints as a decimal number. The second stores a string, which prints as 0xFA1F but can be passed to the hex() function to be converted to a number.
$hex = hex($hex);
The hex function interprets a string as a hex number and returns its value. Since $hex has only ever been used as a number up to this point, Perl first stringifies it (giving "64031") and then passes that string to hex(), which reads it as hexadecimal 0x64031, that is 409649.
So, to the solution. You are almost there with printf(): there is a function called sprintf() which takes the same parameters as printf() but, instead of printing the formatted value, returns it as a string. So what you need is:
my $hex = 0xFA1F;
my $bin = sprintf("%b", $hex);
print $bin;
Technical note:
Yes, I know that Perl stores all its numbers internally as binary, but let's not go there for this answer, OK?
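For the last part of the question (taking substrings of the binary string), a minimal sketch along the same lines: it assumes a 16-bit register, and the helper name bit_field is made up for illustration. Padding with %016b keeps leading zeros so that substr offsets line up with bit positions.

```perl
use strict;
use warnings;

# Hypothetical helper: format $number as a $width-digit binary string and
# return $length digits starting at $offset (offset 0 is the MSB).
sub bit_field {
    my ($number, $width, $offset, $length) = @_;
    my $bin = sprintf("%0${width}b", $number);
    return substr($bin, $offset, $length);
}

my $register = 0xFA1F;
my $high = bit_field($register, 16, 0, 8);
my $low  = bit_field($register, 16, 8, 8);

print "$high\n";                 # 11111010
print "$low\n";                  # 00011111
print oct("0b$high"), "\n";      # 250 (oct understands the 0b prefix)
```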
If you're ok with using a distribution, I wrote Bit::Manip to make my prototyping a bit easier when dealing with registers (There's also a Pure Perl version available if you have problems compiling the XS code).
Not only can it fetch out bits from a number, it can toggle, clear, set etc:
use warnings;
use strict;
use Bit::Manip qw(:all);
my $register = 0xFA1F;
# fetch the bits from register using msb, lsb
my $msbyte = bit_get($register, 15, 8);
print "value: $msbyte\n";
print "bin: " . bit_bin($msbyte) . "\n";
# or simply:
# printf "bin: %b\n", $msbyte;
Output:
value: 250
bin: 11111010
Here's a blog post I wrote that shows how to use some of the software's functionality with an example datasheet register.

how to print the result from pack function?

I'd like to verify what pack does. I have the following code to give it a try.
$bits = pack 'N','134744072';
How do I print $bits?
I did the following:
printf ("bits = %032b \n", $bits);
but it does not work.
Thanks !!
If you want the binary representation of a number, use
my $num = 134744072;
printf("bits = %032b\n", $num);
If you want the binary representation of a string of bytes, use
my $bytes = pack('N', 134744072);
printf("bits = %s\n", unpack('B*', $bytes));
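To check what pack produced, it can help to look at the same bytes in several ways. A short sketch (the helper name inspect_packed is made up for illustration); it relies only on the pack and unpack builtins:

```perl
use strict;
use warnings;

# Hypothetical helper: return the length, hex digits and bit string of a
# packed byte string.
sub inspect_packed {
    my ($bytes) = @_;
    return (length($bytes), unpack('H*', $bytes), unpack('B*', $bytes));
}

my $bytes = pack('N', 134744072);
my ($len, $hex, $bits) = inspect_packed($bytes);

print "length = $len\n";                      # 4
print "hex    = $hex\n";                      # 08080808
print "bits   = $bits\n";                     # 00001000000010000000100000001000
print "number = ", unpack('N', $bytes), "\n"; # 134744072
```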
The Devel::Peek module (which comes with Perl) allows you to examine Perl's representation of the variable. This is probably more useful than just a raw print when you're dealing with binary data rather than printable character strings.
#!/usr/bin/perl
use strict;
use warnings;
use Devel::Peek qw(Dump);
my $bits = pack 'N','134744072';
Dump($bits);
Which produces output like this:
SV = PV(0xaedb20) at 0xb15650
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0xb06630 "\10\10\10\10"\0
CUR = 4
LEN = 10
The 'SV' at the beginning indicates that this is a dump of a 'scalar value' (as opposed to say an array or a hash value).
The 'SV = PV' indicates that this scalar contains a string of bytes (as opposed to say an integer or floating point value).
The 'PV = 0xb06630' is the pointer to where those bytes are located.
The "\10\10\10\10"\0 is probably the bit you're interested in. The double quoted string represents the bytes making up the contents of this string.
Inside the string, you would typically see the bytes interpreted as if they were ASCII, so the byte 65 decimal would appear as 'A'. All non-printable characters are displayed in octal with a preceding \.
So your $bits variable contains 4 bytes, each octal '10' which is hex 0x08.
The LEN and CUR are telling you that Perl allocated 10 bytes of storage and is currently using 4 of them (so length($bits) would return 4).

Perl substr based on bytes

I'm using SimpleDB for my application. Everything goes well except for the limitation that one attribute can hold at most 1024 bytes. So for a long string I have to chop the string into chunks and save it.
My problem is that sometimes my string contains Unicode characters (Chinese, Japanese, Greek) and the substr() function is based on character count, not bytes.
I tried the bytes pragma (use bytes) for byte semantics, and later
substr(encode_utf8($str), $start, $length), but neither helps.
Any help would be appreciated.
UTF-8 was engineered so that character boundaries are easy to detect. To split the string into chunks of valid UTF-8, you can simply use the following:
my $utf8 = encode_utf8($text);
my @utf8_chunks = $utf8 =~ /\G(.{1,1024})(?![\x80-\xBF])/sg;
Then either
# The saving code expects bytes.
store($_) for @utf8_chunks;
or
# The saving code expects decoded text.
store(decode_utf8($_)) for @utf8_chunks;
Demonstration:
$ perl -e'
use Encode qw( encode_utf8 );
# This character encodes to three bytes using UTF-8.
my $text = "\N{U+2660}" x 342;
my $utf8 = encode_utf8($text);
my @utf8_chunks = $utf8 =~ /\G(.{1,1024})(?![\x80-\xBF])/sg;
CORE::say(length($_)) for @utf8_chunks;
'
1023
3
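The same idea wrapped into a reusable function, as a sketch: the name utf8_chunks and the $max parameter are made up for illustration, and the function must be called in list context to get the chunks back.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# Hypothetical helper: encode $text as UTF-8 and cut it into pieces of at
# most $max bytes without splitting a multi-byte character.
sub utf8_chunks {
    my ($text, $max) = @_;
    my $utf8 = encode_utf8($text);
    # A piece must not be followed by a continuation byte (0x80-0xBF).
    return $utf8 =~ /\G(.{1,$max})(?![\x80-\xBF])/sg;
}

my $text   = "\x{2660}" x 342;          # 342 characters, 3 bytes each
my @chunks = utf8_chunks($text, 1024);
print length($_), "\n" for @chunks;     # 1023, then 3

# Joining the pieces and decoding gives back the original text.
my $restored = decode_utf8(join '', @chunks);
print $restored eq $text ? "round trip ok\n" : "round trip failed\n";
```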
substr operates on 1-byte characters unless the string has the UTF-8 flag on. So this will give you the first 1024 bytes of the UTF-8 encoding of a decoded string:
substr encode_utf8($str), 0, 1024;
although it does not necessarily split the string on a character boundary. To discard a split character at the end, you can use:
$str = decode_utf8($str, Encode::FB_QUIET);

How do I convert little Endian to Big Endian using a Perl Script?

I am using the Perl Win32::SerialPort module. In this particular module I sent over data using the input command. The data that I sent over to an embedded system were scalar data (numbers) using the transmit_char function (if it were C it would be integers, but since it's a scripting language I am not sure what the internal format is in Perl. My guess is that Perl always stores all numbers as 32-bit floating point, which are adjusted by the module when transmitting).
Then, after sending the data, I receive data using the input command. The data that I receive is probably in binary form, but Perl doesn't know how to interpret it. I use the unpack function like this:
my $binData = $PortObj->input;
my $hexData = unpack("H*",$binData);
Suppose I transmit 0x4294 over the serial cable, which is a command on the embedded system that I am communicating with. I expect a response of 0x5245. Now the problem is with the endianness: when I unpack I get 0x4552, which is wrong. Is there a way to correct that by adjusting the binary data? I also tried h*, which gives me 0x5425, which is also not correct.
Note: the data I receive is sent one byte at a time, and the LSB is sent first.
Endianness applies to the ordering of the bytes of an integer (primarily). You need to know the size of the integer.
Example for 32-bit unsigned:
my $bytes = pack('H*', '1122334455667788');
my @n = unpack('N*', $bytes);
# @n = ( 0x11223344, 0x55667788 );
my $bytes = pack('H*', '4433221188776655');
my @n = unpack('V*', $bytes);
# @n = ( 0x11223344, 0x55667788 );
See pack. Note the "<" and ">" modifiers, which control the endianness of formats whose default endianness is not the one you want.
Note: If you're reading from a file, you already have bytes. Don't create bytes using pack 'H*'.
Note: If you're reading from a file, don't forget to binmode the handle.
Regarding the example the OP added to his post:
To get 0x5245 from "\x45\x52", use unpack("v", $two_bytes).
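A small sketch of that case, assuming the two response bytes arrive LSB first as described in the question. The helper name swap16 is made up for illustration; "n" and "v" are the standard 16-bit big-endian and little-endian formats.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse the byte order of a 16-bit value held in a
# two-byte string (read as little-endian, write as big-endian).
sub swap16 {
    my ($two_bytes) = @_;
    return pack('n', unpack('v', $two_bytes));
}

my $two_bytes = "\x45\x52";                        # bytes in arrival order

printf "H*: %s\n",   unpack('H*', $two_bytes);     # 4552 (bytes as received)
printf "n : %04x\n", unpack('n',  $two_bytes);     # 4552 (read as big-endian)
printf "v : %04x\n", unpack('v',  $two_bytes);     # 5245 (read as little-endian)
printf "swapped: %s\n", unpack('H*', swap16($two_bytes));  # 5245
```

Using unpack 'v' directly gives the number; swap16 is only needed if the bytes themselves have to be reordered before being passed on.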
What sort of data types are these? Perl's pack has the N and V format specifiers for integers, and Perl 5.10 added the > and < modifiers so you can read shorts, floats, doubles, and quads (and some other types) in the endianness you want.
With these, you read the data in the endianness it uses in the input. After you do that, you have the data internally-represented as the number you expect and you can re-pack them anyway that you like.
For example, the Q format doesn't have an endianness partner like the pair N and V. I'm always going to get the architecture's interpretation of the octet sequence:
my @octets = ( 0x19, 0x36 );
my $bom = pack 'C*', @octets;
my ( $short ) = unpack 'S', $bom;
my $last = $short & 0x00FF;
my $first = ( $short & 0xFF00 ) >> 8;
printf "SHORT: %x FIRST: %x LAST: %x\n", $short, $first, $last;
my $quad_format = $first == $octets[0] ? 'Q' : 'Q>';
say "QUAD_FORMAT: $quad_format";
my $data = pack 'C*', 0b11011110, 0xAD, 0xBE, 0xEF, 0xAA, 0xBB, 0xCC, 0xDD;
my $q = unpack $quad_format, $data;
printf "$quad_format: %x\n", $q;
The output shows that the packed value 0x1936 comes back as 0x3619 with the plain S format, which means this was run on a little-endian architecture. The same thing will happen with a quad value, so I tell Perl to read the quad as big-endian (the "big" part of '>' touches the Q) to get the expected internal numerical value:
SHORT: 3619 FIRST: 36 LAST: 19
QUAD_FORMAT: Q>
Q>: deadbeefaabbccdd
I write more about this in Use the > and < pack modifiers to specify the architecture.