Perl bitwise AND and bitwise shifting

I was reading some example code snippet for the module Net::Pcap::Easy, and I came across this piece of code
my $l3protlen = ord substr $raw_bytes, 14, 1;
my $l3prot = $l3protlen & 0xf0 >> 2; # the protocol part
return unless $l3prot == 4; # return unless IPv4
my $l4prot = ord substr $packet, 23, 1;
return unless $l4prot == '7';
After doing a total hex dump of the raw packet $raw_bytes, I can see that this is an Ethernet frame, not a bare TCP/UDP packet. Can someone please explain what the above code does?

For parsing the frame, I looked up this page.
Now onto the Perl...
my $l3protlen = ord substr $raw_bytes, 14, 1;
Extract the 15th byte (character) from $raw_bytes, and convert to its ordinal value (e.g. a character 'A' would be converted to an integer 65 (0x41), assuming the character set is ASCII). This is how Perl can handle binary data as if it were a string (e.g. passing it to substr) but then let you get the binary values back out and handle them as numbers. (But remember TMTOWTDI.)
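For instance, here is a throwaway sketch (fake data, purely for illustration) that shows the same extraction two ways, in the spirit of the TMTOWTDI remark:
my $raw_bytes = ("\x00" x 14) . "\x45" . ("\x00" x 20);   # stand-in for a captured frame
my $byte = ord substr $raw_bytes, 14, 1;                  # 69, i.e. 0x45
my $same = unpack "x14 C", $raw_bytes;                    # skip 14 bytes, read one unsigned byte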
In the Ethernet frame, the first 14 bytes are the MAC header (6 bytes each for destination and source MAC address, followed by the 2-byte EtherType, which for IPv4 should be 0x0800 - you could have checked this). Following this, the 15th byte is the start of the Ethernet data payload: the first byte of this contains Version (upper 4 bits) and Header Length in DWORDs (lower 4 bits).
Now it looks to me like there is a bug in the next line of this sample code, but it may well normally work by a fluke!
my $l3prot = $l3protlen & 0xf0 >> 2; # the protocol part
In Perl, >> has higher precedence than &, so this will be equivalent to
my $l3prot = $l3protlen & (0xf0 >> 2);
or if you prefer
my $l3prot = $l3protlen & 0x3c;
So this extracts bits 2 - 5 from the $l3protlen value: the mask value 0x3c is 0011 1100 in binary. So for example a value of 0x86 (in binary, 1000 0110) would become 0x04 (binary 0000 0100).
In fact a 'normal' IPv4 value is 0x45, i.e. protocol type 4, header length 5 dwords. Mask that with 0x3c and you get... 4! But only by fluke: you have tested the top 2 bits of the length, not the protocol type!
This line should surely be
my $l3prot = ($l3protlen & 0xf0) >> 4;
(note brackets for precedence and a shift of 4 bits, not 2). (I found this same mistake in the CPAN documentation so I guess it's probably quite widely spread.)
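A quick sketch of why the precedence and the shift amount matter (byte values picked purely for illustration):
my $b = 0x45;                        # typical IPv4 first byte: version 4, header length 5
printf "%d\n",  $b & 0xf0 >> 2;      # 4  -- parsed as $b & 0x3c, the right answer only by luck
printf "%d\n", ($b & 0xf0) >> 4;     # 4  -- the intended version nibble
$b = 0x64;                           # a made-up byte with version 6
printf "%d\n",  $b & 0xf0 >> 2;      # 36 -- the buggy expression no longer yields the version
printf "%d\n", ($b & 0xf0) >> 4;     # 6  -- the corrected expression still does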
return unless $l3prot == 4; # return unless IPv4
For IPv4 we expect this value to be 4 - if it isn't, jump out of the function right away. (So the wrong code above gives the result which lets this be interpreted as an IPv4 packet, but only by luck.)
my $l4prot = ord substr $packet, 23, 1;
Now extract the 24th byte and convert to ordinal value in the same way. This is the Protocol byte from the IP header:
return unless $l4prot == '7';
We expect this to be 7 - if it isn't jump out of the function right away. (According to IANA, 7 is "Core-based trees"... but I guess you know which protocols you are interested in!)
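Putting the corrected pieces together, a hedged recap of the whole check (offsets as in the question's snippet; 6 and 17 are the IANA protocol numbers for TCP and UDP):
# inside the pcap callback, with $raw_bytes holding the raw Ethernet frame
my $vhl    = ord substr $raw_bytes, 14, 1;    # Version + Header Length byte of the IP header
my $l3prot = ($vhl & 0xf0) >> 4;              # IP version: 4 for IPv4
my $l4prot = ord substr $raw_bytes, 23, 1;    # Protocol field: 6 = TCP, 17 = UDP, 7 = CBT, ...
return unless $l3prot == 4 && $l4prot == 6;   # e.g. keep only IPv4/TCP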

Related

how to set max-functions in pcie device tree? (about the syntax "max-functions = /bits/ 8 <8>;")

In linux-6.15.68, in Documentation/devicetree/bindings/pci/rockchip-pcie-ep.txt, I see these explanations. (please see the marked line.)
Optional Property:
- num-lanes: number of lanes to use
- max-functions: Maximum number of functions that can be configured (default 1).
pcie0-ep: pcie@f8000000 {
    compatible = "rockchip,rk3399-pcie-ep";
    #address-cells = <3>;
    #size-cells = <2>;
    rockchip,max-outbound-regions = <16>;
    clocks = <&cru ACLK_PCIE>, <&cru ACLK_PERF_PCIE>,
             <&cru PCLK_PCIE>, <&cru SCLK_PCIE_PM>;
    clock-names = "aclk", "aclk-perf",
                  "hclk", "pm";
    max-functions = /bits/ 8 <8>; // <---- see this line
    num-lanes = <4>;
    reg = <0x0 0xfd000000 0x0 0x1000000>, <0x0 0x80000000 0x0 0x20000>;
    <skip>
In the example dts, what does "max-functions = /bits/ 8 <8>;" mean?
I found in Documentation/devicetree/bindings/pci/snps,dw-pcie-ep.yaml, it says
max-functions:
  $ref: /schemas/types.yaml#/definitions/uint32
  description: maximum number of functions that can be configured
But I don't know how to read the $ref document.
ADD
I found this.
The storage size of an element can be changed using the /bits/ prefix. The /bits/ prefix allows for the creation of 8, 16, 32, and 64-bit elements. The resulting array will not be padded to a multiple of the default 32-bit element size.
e.g. interrupts = /bits/ 8 <17 0xc>;
e.g. clock-frequency = /bits/ 64 <0x0000000100000000>;
Does this mean 17 and 0xc are both 8-bit values, and that when the source is compiled to a dtb they keep the 8-bit format? The Linux code will analyse the dtb file; does the dtb contain the format information too?
The Device Tree Compiler v1.4.0 onwards supports some extra syntaxes for specifying property values that are not present in The Devicetree Specification up to at least version v0.4-rc1. These extra property value syntaxes are documented in the Device Tree Compiler's Device Tree Source Format and include:
A number in an array between angle brackets can be specified as a character literal such as 'a', '\r' or '\xFF'.
The size of elements in an array between angle brackets can be set using the prefix /bits/ and a bit-width of 8, 16, 32, or 64, defaulting to 32-bit integers.
The binary Flattened Devicetree (DTB) Format contains no explicit information on the type of a property value. A property value is just a string of bytes. Numbers (and character literals) between angle brackets in the source are converted to bytes in big-endian byte order in accordance with the element size in bits divided by 8. For example:
<0x11 'a'> is encoded in the same way as the bytestring [00 00 00 11 00 00 00 61].
/bits/ 8 <17 0xc> is encoded in the same way as the bytestring [11 0c].
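As a rough cross-check of those two byte strings, here is a small Perl sketch (this is not how dtc works internally, it just reproduces the encoding with pack):
use strict;
use warnings;

my $default = pack "N*", 0x11, ord 'a';   # default 32-bit big-endian cells
my $bytes8  = pack "C*", 17, 0x0c;        # /bits/ 8 cells: one byte each
print unpack("H*", $default), "\n";       # 0000001100000061
print unpack("H*", $bytes8),  "\n";       # 110c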
It is up to the reader of the property value to "know" what type it is expecting. For example, the Rockchip AXI PCIe endpoint controller driver in the Linux kernel ("drivers/pci/controller/pcie-rockchip-ep.c") "knows" that the "max-functions" property should have been specified as a single byte and attempts to read it using the statement err = of_property_read_u8(dev->of_node, "max-functions", &ep->epc->max_functions);. (It is probably encoded as a single byte property for convenience so that it can be copied directly into the u8 max_functions member of a struct pci_epc.)

Convert byte array (hex) to signed Int

I am trying to convert a (variable length) Hex String to Signed Integer (I need either positive or negative values).
[int16], [int32] and [int64] seem to work fine with 2- and 4+-byte hex strings, but I'm stuck with 3-byte strings ([int24] is not a type in PowerShell).
Here's what I have now (snippet):
$start = $mftdatarnbh.Substring($DataRunStringsOffset+$LengthBytes*2+2,$StartBytes*2) -split "(..)"
[array]::reverse($start)
$start = -join $start
if($StartBytes*8 -le 16){$startd =[int16]"0x$($start)"}
elseif($StartBytes*8 -in (17..48)){$startd =[int32]"0x$($start)"}
else{$startd =[int64]"0x$($start)"}
With the above code, a $start value of "D35A71" gives '13851249' instead of '-2925967'. I tried to figure out a way to implement two's complement but got lost. Any easy way to do this right?
Thank you in advance
Edit: Basically, I think I need to implement something like this:
int num = (sbyte)array[0] << 16 | array[1] << 8 | array[2];
as seen here.
Just tried this:
$start = "D35A71"
[sbyte]"0x$($start.Substring(0,2))" -shl 16 -bor "0x$($start.Substring(2,2))" -shl 8 -bor "0x$($start.Substring(4,2))"
but doesn't seem to get the correct result :-/
To parse your hex.-number string as a negative number you can use [bigint] (System.Numerics.BigInteger):
# Since the most significant hex digit has a 1 as its most significant bit
# (is >= 0x8), it is parsed as a NEGATIVE number.
# To force unconditional interpretation as a positive number, prepend '0'
# to the input hex string.
PS> [bigint]::Parse('D35A71', 'AllowHexSpecifier')
-2925967
You can cast the resulting [bigint] instance back to an [int] (System.Int32).
Note:
The result is a negative number, because the most significant hex digit of the hex input string is >= 0x8, i.e. has its high bit set.
To force [bigint] to unconditionally interpret a hex. input string as a positive number, prepend 0.
The internal two's complement representation of a resulting negative number is performed at byte boundaries, so that a given hex number with an odd number of digits (i.e. if the first hex digit is a "half byte") has the missing half byte filled with 1 bits.
Therefore, a hex-number string whose most significant digit is >= 0x8 (parses as a negative number) results in the same number as prepending one or more Fs (0xF == 1111) to it; e.g., the following calls all result in -2048:
[bigint]::Parse('800', 'AllowHexSpecifier'),
[bigint]::Parse('F800', 'AllowHexSpecifier'),
[bigint]::Parse('FF800', 'AllowHexSpecifier'), ...
See the docs for details about the parsing logic.
Examples:
# First digit (7) is < 8 (high bit NOT set) -> positive number
[bigint]::Parse('7FF', 'AllowHexSpecifier') # -> 2047
# First digit (8) is >= 8 (high bit IS SET) -> negative number
[bigint]::Parse('800', 'AllowHexSpecifier') # -> -2048
# Prepending additional 'F's to a number that parses as
# a negative number yields the *same* result
[bigint]::Parse('F800', 'AllowHexSpecifier') # -> -2048
[bigint]::Parse('FF800', 'AllowHexSpecifier') # -> -2048
# ...
# Starting the hex-number string with '0'
# *unconditionally* makes the result a *positive* number
[bigint]::Parse('0800', 'AllowHexSpecifier') # -> 2048

Perl - Forming a value from bits lying between two (16 bit) fields (data across word boundaries)

I am reading some data into 16-bit data words, and extracting VALUES from parts of the 16-bit words. Some of the values I need straddle the word boundaries.
I need to take the bits from the first word and some from the second word and join them to form a value.
I am thinking of the best way to do this. I could bit-shift stuff all over the place and compose the data that way, but I am thinking there must be an easier/better way, because I have many cases like this and the values are in some cases different sizes (which I know, since I have a data map).
For instance:
[TTTTTDDDDPPPPYYY] - 16 bit field
[YYYYYWWWWWQQQQQQ] - 16 bit field
TTTTT = 5 bit value, easily extracted
DDDD = 4 bit value, easily extracted
WWWWW = 5 bit value, easily extracted
QQQQQQ = 6 bit value, easily extracted
YYYYYYYY = 8 bit value, which straddles the word boundaries. What is the best way to extract this? In my case I have a LOT of data like this, so elegance/simplicity in a solution is what I seek.
Aside - In Perl what are the limits of left shifting? I am on a 32 bit computer, am I right to guess that my (duck) types are 32 bit variables and that I can shift that far, even though I unpacked the data as 16 bits (unpack with type n) into a variable? This situation came up in the case of trying to extract a 31 bit variable that lies between two 16 bit fields.
Lastly (someone may ask), reading/unpacking the data into 32 bit words does not help me as I still face the same issue - Data is not aligned on word boundaries but crosses it.
The size of your integers is given (in bytes) by perl -V:ivsize or programmatically using use Config qw( %Config ); $Config{ivsize}. They'll be 32 bits in a 32-bit build (since they are guaranteed to be large enough to hold a pointer). That means you can use
my $i = ($hi << 16 | $lo); # TTTTTDDDDPPPPYYYYYYYYWWWWWQQQQQQ
my $q = ($i >> 0) & (2**6-1);
my $w = ($i >> 6) & (2**5-1);
my $y = ($i >> 11) & (2**8-1);
my $p = ($i >> 19) & (2**4-1);
my $d = ($i >> 23) & (2**4-1);
my $t = ($i >> 27) & (2**5-1);
If you wanted to stick to 16 bits, you could use the following:
my $y = ($hi & 0x7) << 5 | ($lo >> 11);
$hi              = TTTT TDDD DPPP PYYY
$hi & 0x7        = 0000 0000 0000 0YYY
($hi & 0x7) << 5 = 0000 0000 YYY0 0000
$lo >> 11        = 0000 0000 000Y YYYY
                   -------------------
bitwise OR       = 0000 0000 YYYY YYYY
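A hypothetical usage sketch (the input words and their packing are made up; unpack type n gives big-endian 16-bit words, as mentioned in the question):
# $buf stands in for two raw 16-bit fields read back to back from the input
my $buf = pack "n n", 0b00000_0000_0000_101, 0b10110_00000_000000;
my ($hi, $lo) = unpack "n n", $buf;        # the two 16-bit words
my $y = ($hi & 0x7) << 5 | ($lo >> 11);    # 8-bit YYYYYYYY: 101 . 10110
printf "0x%02x\n", $y;                     # prints 0xb6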

Default integer length in Perl?

I am reading some script which is written in Perl and I don't understand it. I have never used Perl before. I've read something about scalars and that confuses me. For example, look at this code:
my $sim_packets = 5;
my $sim_length = $payload_length * 2 * $sim_packets;
push @data, (1...$sim_length * 10);
my $data_32bit = 0;
If after this I use:
$data_32bit = shift @data;
What is the length of $data_32bit in bits?
I ask this because I have another array in this code: @payload, and this line confuses me:
push @payload, ($data_32bit >> 24) & 0xff,
($data_32bit >> 16) & 0xff,
($data_32bit >> 8) & 0xff,
($data_32bit) & 0xff;
Is $data_32bit 32 bits long?
Oh boy,
push @payload, ($data_32bit >> 24) & 0xff, ($data_32bit >> 16) & 0xff, ($data_32bit >> 8) & 0xff, ($data_32bit) & 0xff;
Somebody apparently needs to
perldoc -f pack
perldoc -f unpack
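For example, a minimal pack/unpack sketch of the same byte-splitting (variable names borrowed from the question, value made up):
my @payload;
my $data_32bit = 0x01020304;
push @payload, unpack "C4", pack "N", $data_32bit;   # (1, 2, 3, 4): big-endian bytes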
Regarding your question, $data_32bit is not 32 bits long just because the term 32bit appears in its name. If you need to know how exactly it is represented, you should go for Data::Dumper.
Perl stores integers natively as IVs (32 or 64 bits, depending on the build) and falls back to the mantissa of a native floating point number for larger values, so it really depends on the machine architecture. With IEEE doubles, integers stay exact up to something like 53 bits.
If I understand correctly, the third line of this code created the array 1..50?
It added to an array rather than creating one.
The scalars added consist of numbers starting with 1 and going up to and including $payload_length * 2 * $sim_packets * 10, which is $payload_length * 100.
$payload_length is unlikely to be 1/2, so I suspect the number of scalars added is more than the 50 you mentioned.
What is the length of $data_32bit in bits?
What does that even mean?
The size of the scalar is 24 bytes on one of my systems:
$ perl -MDevel::Size=total_size -E'$i=123; say total_size $i'
24
The number of bits required to store the value:
ceil(log($payload_length * 100) / log(2))
In this case, the author appears to be indicating the value will/should fit in 32 bits. That will be the case unless $payload_length exceeds some number larger than 40,000,000.
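For instance, a throwaway check of that formula ($payload_length chosen arbitrarily):
use POSIX qw(ceil);
my $payload_length = 16;                      # hypothetical value
my $max  = $payload_length * 100;             # largest value pushed onto @data
my $bits = ceil(log($max) / log(2));          # 11 bits are enough for 1600
print "$bits\n";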
and this line confuses me:
It adds four values to the array. The four values correspond to the bytes of $data_32bit when stored as a 32-bit unsigned integer with the most significant byte first (big-endian).

implementation of sha-256 in perl

I'm trying very hard to implement the SHA-256 algorithm. I have problems with the padding of the message. For SHA-256 you have to append a 1 bit at the end of the message, which I have done so far with $message .= (chr 0x80);
The next step should be to fill the empty space (512-bit block) with 0's.
I calculated it with this formula: l+1+k=448-l and then append it to the message.
My problem comes now: append in the last 64-bit block the binary representation of the length of the message and fill the rest with 0's again. Since Perl handles its data types by itself, there is no "byte" datatype. How can I figure out which value I should append?
please see also the official specification:
http://csrc.nist.gov/publications/fips/fips180-3/fips180-3_final.pdf
If at all possible, pull something off the shelf. You do not want to roll your own SHA-256 implementation because to get official blessing, you would have to have it certified.
That said, the specification is
5.1.1 SHA-1, SHA-224 and SHA-256
Suppose that the length of the message, M, is l bits. Append the bit 1 to the end of the message, followed by k zero bits, where k is the smallest, non-negative solution to the equation
l + 1 + k ≡ 448 mod 512
Then append the 64-bit block that is equal to the number l expressed using a binary representation. For example, the (8-bit ASCII) message “abc” has length 8 × 3 = 24, so the message is padded with a one bit, then 448 - (24 + 1) = 423 zero bits, and then the message length, to become the 512-bit padded message
                              <--- 423 bits --->  <--- 64 bits --->
01100001 01100010 01100011  1     00 ... 00          00...011000
   "a"      "b"      "c"                                l = 24
The length of the padded message should now be a multiple of 512 bits.
You might be tempted to use vec because it allows you to address single bits, but you would have to work around funky addressing.
If bits is 4 or less, the string is broken into bytes, then the bits of each byte are broken into 8/BITS groups. Bits of a byte are numbered in a little-endian-ish way, as in 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80. For example, breaking the single input byte chr(0x36) into two groups gives a list (0x6, 0x3); breaking it into 4 groups gives (0x2, 0x1, 0x3, 0x0).
Instead, a pack template of B* specifies
A bit string (descending bit order inside each byte).
and N
An unsigned long (32-bit) in "network" (big-endian) order.
The latter is useful for assembling the message length. Although pack has a Q parameter for quad, the result is in the native order.
Start with a bit of prep work
our ($UPPER32BITS, $LOWER32BITS);
BEGIN {
    use Config;
    die "$0: $^X not configured for 64-bit ints"
        unless $Config{use64bitint};

    # create non-portable 64-bit masks as constants
    no warnings "portable";
    *UPPER32BITS = \0xffff_ffff_0000_0000;
    *LOWER32BITS = \0x0000_0000_ffff_ffff;
}
Then you can define pad_message as
sub pad_message {
    use bytes;
    my($msg) = @_;

    my $l = bytes::length($msg) * 8;
    my $extra = $l % 512;                # pad to 512-bit boundary
    my $k = 448 - ($extra + 1);

    # append 1 bit followed by $k zero bits
    $msg .= pack "B*", 1 . 0 x $k;

    # add big-endian length
    $msg .= pack "NN", (($l & $UPPER32BITS) >> 32), ($l & $LOWER32BITS);

    die "$0: bad length: ", bytes::length $msg
        if (bytes::length($msg) * 8) % 512;

    $msg;
}
Say the code prints the padded message with
my $padded = pad_message "abc";

# break into multiple lines for readability
for (unpack("H*", $padded) =~ /(.{64})/g) {
    print $_, "\n";
}
Then the output is
6162638000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000018
which matches the specification.
First of all I hope you do this just as an exercise -- there is a Digest::SHA module in core that already computes SHA-256 just fine.
Note that $message .= (chr 0x80); appends one byte, not one bit. If you really need bitwise manipulation, take a look at the vec function.
To get the binary representation of an integer, you should use pack. To get it to 64 bits (big-endian, and as a bit count, which is what the spec asks for), do something like
$message .= pack 'Q>', 8 * length($message);
Note that the 'Q' format is only available on 64-bit perls; if yours isn't one, simply concatenate four 0-bytes with a big-endian 32-bit value (pack format N).
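For completeness, the off-the-shelf route mentioned above looks like this (Digest::SHA ships with core Perl):
use Digest::SHA qw(sha256_hex);
print sha256_hex("abc"), "\n";
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad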