Bit selection in perl - perl

How can we bit select the variables in perl code?
I am new to perl and I have a scenario where I need to extract a particular format from a file and give input to a another module for analysis.
Currently I have extracted the required pattern which is in 16-bit Hexadecimal.
Now from this 16bits hexadecimal format ,I want only LSB 10bits.
Kindly refer the below example(This is a sample code where I have used only 1 line of my requirement)
use strict;
my $string = "HDR 0c0d PlD 1000 GAP 412";
$string =~ s/.*HDR\s(\S+).*/$1/g;
print "$string\n";
my $hex = hex($string);
print "$hex";
The output in $hex is 3095 which is 16bit 16’b0011000010000101 now I need to extract only the LSB 10bits(0010000101), Please let me know some easy way to do this .

Use bit mask to select bits which you need. To select right 10 bit you can use:
my $x = 0xfff0;
print $x & 0x3ff;
output is
1008
which is decimal number of ten bits of number 0xfff0

Related

perl pack a utf-8 Chinese Character how to unpack to get this Character?

I am learning pack function in perl. I found I cannot unpack and get the origin value. Following is the code. File encode utf8. How can I unpack and get the Chinese character.
I have checked the perldoc. I am not sure which TEMPLATE I can use. Document said that:
U A Unicode character number. Encodes to a character in character mode and UTF-8 (or UTF-EBCDIC in EBCDIC platforms) in byte mode.
So I tried U. But it didn't work.
use Encode;
open(DAT,"+>T.dat");
binmode(DAT,":raw");
print DAT pack("f",-3.938345);
print DAT pack("l",1234556);
print DAT pack("U*","我");
seek(DAT,0,0);
read(DAT,$Val,4);
$V=unpack("f",$Val);
print "V $V\n";
read(DAT,$int,4);
$I=unpack("l",$int);
print "int $I\n";
read(DAT,$HZ,4);
$HZ=unpack("U*",$HZ);
print("HZ $HZ\n");
close(DAT);
And I have another question, I know one Chinese Character only take 2 bytes if encoded in GB2312. How can I pack one Character and only take 2 bytes space?
Unicode pack and unpack in Perl works the other way round:
use utf8;
binmode STDOUT,":utf8";
my $packed = pack("U*", 0x6211);
print "$packed\n"; # 我
my $unpacked = unpack("U*", "我");
printf "0x%X\n", $unpacked; # 0x6211

Can't use an undefined value as an ARRAY reference

I have a simple script written in perl, and i keep getting this particular error when i try to run. The script is for generating some numbers for use in checking integer to floating point. This is the particular error i get.
Can't use an undefined value as an ARRAY reference at /tools/oss/packages/i86pc-5.10/perl/5.8.8-32/lib/5.8.8/Math/BigInt/Calc.pm line 1180
From the error message am not able to figure out where my code is going wrong. By the way i need to use 64 bit numbers. How do i debug this issue?
Here is the code sample
use bignum;
use warnings;
use strict;
open(VCVT, ">CvtIntToFp") or die "couldn't open file to write:$!";
my $number;
my $sgn;
# left with 31 bits excluding the sign
# 23 bits of significand needed, all the result
# will be exact except where leading bit ignoring singn is >23
# take its 2's complement to get the negative number and put it
# into the register
# 32 bit number 1 bit sign 31 left any number with leading 1 #position >23 (counting from 0) will be inexact when in floating point
# 30-24 bit positons can have a leading ones at at any position and result is an inexact
my $twoPwr32 = 0x100000000; #2**32
my #num=();
for(my $i=0; $i<100; $i++)
{
$sgn = (rand()%2);
my $tempLead = (rand()%7); # there are 7 bits from 24 to 30
$number=$tempLead << 24;
if($sgn)
{$number = ($twoPwr32- $number +1) & 0xffffffff;
}
$number = sprintf("%x", $number);
push(#num, $number);
}
my $item=0;
foreach $item (#num)
{
print "$item\n";
print VCVT "$item\n";
}
Try using use diagnostics to get a better error message and read perldoc bignum. The error and explanation is given there that usage of bignum internally converts the numbers into bignum and returns a reference. Since I have perl 5.14 documentation I have the link for documentation of perl 5.20 and I think the bug still exists. Refer to http://perldoc.perl.org/bignum.html
Update :
Hexadecimal number > 0xffffffff non-portable at throw_stack.pl line 19 (#1)
(W portable) The hexadecimal number you specified is larger than 2**32-1
(4294967295) and therefore non-portable between systems. See
perlport for more on portability concerns.
Also refer to this question for the usage of 64 bit arithmetic in Perl.

Perl ord and chr working with unicode

To my horror I've just found out that chr doesn't work with Unicode, although it does something. The man page is all but clear
Returns the character represented by that NUMBER in the character set. For example, chr(65)" is "A" in either ASCII or Unicode, and chr(0x263a) is a Unicode smiley face.
Indeed I can print a smiley using
perl -e 'print chr(0x263a)'
but things like chr(0x00C0) do not work. I see that my perl v5.10.1 is a bit ancient, but when I paste various strange letters in the source code, everything's fine.
I've tried funny things like use utf8 and use encoding 'utf8', I haven't tried funny things like use v5.12 and use feature 'unicode_strings' as they don't work with my version, I was fooling around with Encode::decode to find out that I need no decoding as I have no byte array to decode. I've read much more documentation than ever before, and found quite a few interesting things but nothing helpful. It looks like a sort of the Unicode Bug but there's no usable solution given. Moreover I don't care about the whole string semantics, all I need is a trivial function.
So how can I convert a number into a string consisting of the single character corresponding with it, so that for example real_chr(0xC0) eq 'À' holds?
The first answer I've got explains quite everything about IO, but I still don't understand why
#!/usr/bin/perl -w
use strict;
use utf8;
use encoding 'utf8';
print chr(0x00C0) eq 'À' ? 'eq1' : 'ne1', " - ", chr(0x263a) eq '☺' ? 'eq1' : 'ne1', "\n";
print 'À' =~ /\w/ ? "match1" : "no_match1", " - ", chr(0x00C0) =~ /\w/ ? "match2" : "no_match2", "\n";
prints
ne1 - eq1
match1 - no_match2
It means that the manually entered 'À' differs from chr(0x00C0). Moreover, the former is a word constituent character (correct!) while the latter is not (but should be!).
First,
perl -le'print chr(0x263A);'
is buggy. Perl even tells you as much:
Wide character in print at -e line 1.
That doesn't qualify as "working". So while they differ in how fail to provide what you want, neither of the following gives you what you want:
perl -le'print chr(0x263A);'
perl -le'print chr(0x00C0);'
To properly output the UTF-8 encoding of those Unicode code points, you need to tell Perl to encoding the Unicode points with UTF-8.
$ perl -le'use open ":std", ":encoding(UTF-8)"; print chr(0x263A);'
☺
$ perl -le'use open ":std", ":encoding(UTF-8)"; print chr(0x00C0);'
À
Now on to the "why".
File handle can only transmit bytes, so unless you tell it otherwise, Perl file handles expect bytes. That means the string you provide to print cannot contain anything but bytes, or in other words, it cannot contain characters over 255. The output is exactly what you provide:
$ perl -e'print map chr, 0x00, 0x65, 0xC0, 0xF0' | od -t x1
0000000 00 65 c0 f0
0000004
This is useful. This is different then what you want, but that doesn't make it wrong. If you want something different, you just need to tell Perl what you want.
By adding an :encoding layer, the handle now expects a string of Unicode characters, or as I call it, "text". The layer tells Perl how to convert the text into bytes.
$ perl -e'
use open ":std", ":encoding(UTF-8)";
print map chr, 0x00, 0x65, 0xC0, 0xF0, 0x263a;
' | od -t x1
0000000 00 65 c3 80 c3 b0 e2 98 ba
0000011
Your right that chr doesn't know or care about Unicode. Like length, substr, ord and reverse, chr implements a basic string function, not a Unicode function. That doesn't mean it can't be used to work with text string. As you've seen, the problem wasn't with chr but with what you did with the string after you built it.
A character is an element of a string, and a character is a number. That means a string is just a sequence of numbers. Whether you treat those numbers as Unicode code points (text), packed IP addresses or temperature measurements is entirely up to you and the functions to which you pass the strings.
Here are a few example of operators that do assign meaning to the strings they receive as operands:
m// expects a string of Unicode code points.
connect expects a sequence of bytes that represent a sockaddr_in structure.
print with a handle without :encoding expect a sequence of bytes.
print with a handle with :encoding expect a sequence of Unicode code points.
etc
So how can I convert a number into a string consisting of the single character corresponding with it, so that for example real_chr(0xC0) eq 'À' holds?
chr(0xC0) eq 'À' does hold. Did you remember to tell Perl you encoded your source code using UTF-8 by using use utf8;? If you didn't tell Perl, Perl actually sees a two-character string on the RHS.
Regarding the question you've added:
There are problems with the encoding pragma. I recommend against using it. Instead, use
use open ':std', ':encoding(UTF-8)';
That'll fix one of the problems. The other problem you are encountering is with
chr(0x00C0) =~ /\w/
It's a known bug that's intentionally left broken for backwards compatibility reasons. That is, unless you request a more recent version of the language as follows:
use 5.014; # use 5.012; *might* suffice.
A workaround that works as far back as 5.8:
my $x = chr(0x00C0);
utf8::upgrade($x);
$x =~ /\w/

Convert raw hex to readable hex in perl?

I have written a little perl app to reads the serial port. When I run my little script I receive data but it's written in unreadable signs.. it shows like *I??. However if I do
perl test.pl | hexdump
I get the required data. And the hex data makes sense to me. Does anyone know how I can get this output using perl without using hexdump?
Right now I use print ($data) to print my data.
"Raw hex" doesn't mean anything; what you've got is a string of bytes that you want to convert to a textual representation. To do that you can use unpack. For example,
my $bytes = read_from_serial_port();
my $hex = unpack 'h*', $bytes;
Use H instead of h if you want the opposite endianness. (I always forget which is which.)

In Perl, can I treat a string as a byte array?

In Perl, is it appropriate to use a string as a byte array containing 8-bit data? All the documentation I can find on this subject focuses on 7-bit strings.
For instance, if I read some data from a binary file into $data
my $data;
open FILE, "<", $filepath;
binmode FILE;
read FILE $data 1024;
and I want to get the first byte out, is substr($data,1,1) appropriate? (again, assuming it is 8-bit data)
I come from a mostly C background, and I am used to passing a char pointer to a read() function. My problem might be that I don't understand what the underlying representation of a string is in Perl.
The bundled documentation for the read command, reproduced here, provides a lot of information that is relevant to your question.
read FILEHANDLE,SCALAR,LENGTH,OFFSET
read FILEHANDLE,SCALAR,LENGTH
Attempts to read LENGTH characters of data into variable SCALAR
from the specified FILEHANDLE. Returns the number of
characters actually read, 0 at end of file, or undef if there
was an error (in the latter case $! is also set). SCALAR will
be grown or shrunk so that the last character actually read is
the last character of the scalar after the read.
An OFFSET may be specified to place the read data at some place
in the string other than the beginning. A negative OFFSET
specifies placement at that many characters counting backwards
from the end of the string. A positive OFFSET greater than the
length of SCALAR results in the string being padded to the
required size with "\0" bytes before the result of the read is
appended.
The call is actually implemented in terms of either Perl's or
system's fread() call. To get a true read(2) system call, see
"sysread".
Note the characters: depending on the status of the filehandle,
either (8-bit) bytes or characters are read. By default all
filehandles operate on bytes, but for example if the filehandle
has been opened with the ":utf8" I/O layer (see "open", and the
"open" pragma, open), the I/O will operate on UTF-8 encoded
Unicode characters, not bytes. Similarly for the ":encoding"
pragma: in that case pretty much any characters can be read.
See perldoc -f pack and perldoc -f unpack for how to treat strings as byte arrays.
You probably want to use sysopen and sysread if you want to read bytes from binary file.
See also perlopentut.
Whether this is appropriate or necessary depends on what exactly you are trying to do.
#!/usr/bin/perl -l
use strict; use warnings;
use autodie;
use Fcntl;
sysopen my $bin, 'test.png', O_RDONLY;
sysread $bin, my $header, 4;
print map { sprintf '%02x', ord($_) } split //, $header;
Output:
C:\Temp> t
89504e47
Strings are strings of "characters", which are bigger than a byte.1 You can store bytes in them and manipulate them as though they are characters, taking substrs of them and so on, and so long as you're just manipulating entities in memory, everything is pretty peachy. The data storage is weird, but that's mostly not your problem.2
When you try to read and write from files, the fact that your characters might not map to bytes becomes important and interesting. Not to mention annoying. This annoyance is actually made a bit worse by Perl trying to do what you want in the common case: If all the characters in the string fit into a byte and you happen to be on a non-Windows OS, you don't actually have to do anything special to read and write bytes. Perl will complain, however, if you have stored a non-byte-sized character and try to write it without giving it a clue about what to do with it.
This is getting a little far afield, largely because encoding is a large and confusing topic. Let me leave it off there with some references: Look at Encode(3perl), open(3perl), perldoc open, and perldoc binmode for lots of hilarious and gory details.
So the summary answer is "Yes, you can treat strings as though they contained bytes if they do in fact contain bytes, which you can assure by only reading and writing bytes.".
1: Or pedantically, "which can express a larger range of values than a byte, though they are stored as bytes when that is convenient". I think.
2: For the record, strings in Perl are internally represented by a data structure called a 'PV' which in addition to a character pointer knows things like the length of the string and the current value of pos.3
3: Well, it will start storing the current value of pos if it starts being interesting. See also
use Devel::Peek;
my $x = "bluh bluh bluh bluh";
Dump($x);
$x =~ /bluh/mg;
Dump($x);
$x =~ /bluh/mg;
Dump($x);
It might help more if you tell us what you are trying to do with the byte array. There are various ways to work with binary data, and each lends itself to a different set of tools.
Do you want to convert the data into a Perl array? If so, pack and unpack are a good start. split could also come in handy.
Do you want to access individual elements of the string without unpacking it? If so, substr is fast and will do the trick for 8 byte data. If you want other bit depths, take a look at the vec function, which treads a string as a bit vector.
Do you want to scan the string and convert certain bytes to other bytes? Then the s/// or tr/// constructs might be useful.
Allow me just to post a small example about treating string as binary array - since I myself found it difficult to believe that something called "substr" would handle null bytes; but seemingly it does - below is a snippet of a perl debugger terminal session (with both string and array/list approaches):
$ perl -d
Loading DB routines from perl5db.pl version 1.32
Editor support available.
Enter h or `h h' for help, or `man perldebug' for more help.
^D
Debugged program terminated. Use q to quit or R to restart,
use o inhibit_exit to avoid stopping after program termination,
h q, h R or h o to get additional info.
DB<1> $str="\x00\x00\x84\x00"
DB<2> print $str
�
DB<3> print unpack("H*",$str) # show content of $str as hex via `unpack`
00008400
DB<4> $str2=substr($str,2,2)
DB<5> print unpack("H*",$str2)
8400
DB<6> $str2=substr($str,1,3)
DB<7> print unpack("H*",$str2)
008400
[...]
DB<30> #stra=split('',$str); print #stra # convert string to array (by splitting at empty string)
�
DB<31> print unpack("H*",$stra[3]) # print indiv. elems. of array as hex
00
DB<32> print unpack("H*",$stra[2])
84
DB<33> print unpack("H*",$stra[1])
00
DB<34> print unpack("H*",$stra[0])
00
DB<35> print unpack("H*",join('',#stra[1..3])) # print only portion of array/list via indexes (using flipflop [two dots] operator)
008400