Little Endian Encoding

The following byte sequence is encoded as Little Endian Unsigned Int.
F0 00 00 00
I just read about endianness. Just wanted to verify if it is 240 decimal.

Translating the byte sequence to bits...
[1111 0000] [0000 0000] [0000 0000] [0000 0000]
Converting the first byte to decimal...
= 0*2^0 + 0*2^1 + 0*2^2 + 0*2^3 + 1*2^4 + 1*2^5 + 1*2^6 + 1*2^7
Doing the math...
= 16 + 32 + 64 + 128 = 240

Yes, 0x000000F0 = 240.
If it were big-endian, it would be 0xF0000000 = 4026531840 (or -268435456 if signed).
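For a quick cross-check of the interpretation above, here is a minimal Python sketch (not part of the original question):

data = bytes([0xF0, 0x00, 0x00, 0x00])
print(int.from_bytes(data, byteorder='little'))              # 240
print(int.from_bytes(data, byteorder='big'))                 # 4026531840
print(int.from_bytes(data, byteorder='big', signed=True))    # -268435456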

Related

[guid]::NewGuid().GetBytes() returns different result than [System.Text.Encoding]::UTF8.GetBytes(...)

I found this excellent approach to shortening GUIDs here on Stack Overflow: .NET Short Unique Identifier
I have some other strings that I wanted to treat the same way, but I found out that in most cases the Base64String is even longer than the original string.
My question is: why does [guid]::NewGuid().ToByteArray() return a significantly smaller byte array than [System.Text.Encoding]::UTF8.GetBytes([guid]::NewGuid().Guid)?
For example, let's look at the following GUID:
$guid = [guid]::NewGuid()
$guid
Guid
----
34c2b21e-18c3-46e7-bc76-966ae6aa06bc
With $guid.ToByteArray(), the following is returned:
30 178 194 52 195 24 231 70 188 118 150 106 230 170 6 188
And [System.Convert]::ToBase64String($guid.ToByteArray()) generates HrLCNMMY50a8dpZq5qoGvA==
[System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes($guid.Guid)), however, returns MzRjMmIyMWUtMThjMy00NmU3LWJjNzYtOTY2YWU2YWEwNmJj, with [System.Text.Encoding]::UTF8.GetBytes($guid.Guid) being:
51 52 99 50 98 50 49 101 45 49 56 99 51 45 52 54 101 55 45 98 99 55 54 45 57 54 54 97 101 54 97 97 48 54 98 99
The GUID struct is an object storing a 16-byte array that contains its value.
These are the 16 bytes you see when you call its .ToByteArray() method.
The 'normal' string representation is a grouped series of these bytes in hexadecimal format. (4-2-2-2-6)
As for converting to Base64, this will always return a longer string because each Base64 digit represents exactly 6 bits of data.
Therefore, every three 8-bit bytes of the input (3×8 bits = 24 bits) can be represented by four 6-bit Base64 digits (4×6 = 24 bits).
The resulting string is padded with = characters at the end so that its length is always a multiple of 4.
The result is a string of [math]::Ceiling(<original size> / 3) * 4 length.
Using [System.Text.Encoding]::UTF8.GetBytes([guid]::NewGuid().Guid) actually first performs the GUID's .ToString() method and then returns the ASCII value of each character in that string.
(hexadecimal representation = 2 characters per byte = 32 characters, which together with the four dashes leaves a 36-byte array)
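As a quick check of that length formula, a minimal Python sketch (illustrative only, using the base64 module instead of the PowerShell [System.Convert] call above):

import base64, math

def b64_len(n_bytes):
    # Base64 output length: ceil(n / 3) * 4, '=' padding included
    return math.ceil(n_bytes / 3) * 4

print(b64_len(16), len(base64.b64encode(bytes(16))))   # 24 24
print(b64_len(36), len(base64.b64encode(bytes(36))))   # 48 48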
[guid]::NewGuid().ToByteArray()
In the scope of this question, a GUID can be seen as a 128-bit number (actually it is a structure, but that's not relevant to the question). When converting it into a byte array, you divide 128 by 8 (bits per byte) and get an array of 16 bytes.
[System.Text.Encoding]::UTF8.GetBytes([guid]::NewGuid().Guid)
This converts the GUID to a hexadecimal string representation first. Then this string gets encoded as UTF-8.
A hex string uses two characters per input byte (one hex digit for the lower and one for the upper 4 bits). So we need at least 32 characters (16 bytes of GUID multiplied by 2). When converted to UTF-8 each character relates to exactly one byte, because all hex digits as well as the dash are in the basic ASCII range which maps 1:1 to UTF-8. So including the dashes we end up with 32 + 4 = 36 bytes.
So this is what [System.Convert]::ToBase64String() has to work with - 16 bytes of input in the first case and 36 bytes in the second case.
Each Base64 output digit represents up to 6 input bits.
16 input bytes = 128 bits, divided by 6 ≈ 21.3, rounded up to 22 Base64 characters (24 with the == padding)
36 input bytes = 288 bits, divided by 6 = 48 Base64 characters
That's how you end up with more than twice the number of Base64 characters when converting a GUID to hex string first.
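To see the same size difference outside PowerShell, here is a hedged Python sketch (note that uuid.UUID.bytes stores the fields big-endian, unlike .NET's ToByteArray, which stores the first three fields little-endian; only the lengths matter here):

import uuid, base64

g = uuid.uuid4()
raw = g.bytes                   # 16 raw bytes of the GUID value
txt = str(g).encode('utf-8')    # 36 bytes: 32 hex digits plus 4 dashes

print(len(raw), len(base64.b64encode(raw)))   # 16 24
print(len(txt), len(base64.b64encode(txt)))   # 36 48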

Create PCAP file from values in a database

I have a database filled with a lot of logged IPV4 messages. It is used to get queries like: "give me all messages from MacAddress ... that were logged in the period ... to ... that have ..."
Some queries will result in a huge amount of logged messages. Therefore we decided to make a PCAP file if such a request was made.
"Please create a PCAP file containing all logged messages from your
database that ..."
So upon request, my service should fetch the requested data from the database (in pages) and create a PCAP file filled with that data. Later, callers can ask for a read-only OWIN stream to this file.
The service can create such a file. The problem is that it is not recognized as a proper Wireshark file.
I've read the Libpcap File Format documentation. Whenever I have to create a file filled with LoggedMessages, I fill a binary file as follows.
Global Header
Per logged message:
A packet header
Packet data with:
Ethernet Frame: Destination Mac, Source Mac, EtherType (0x800)
IPV4 header
Logged Data
Wireshark starts complaining about the file when it attempts to read the EtherType; it says this is a Length. See: Definition of Ethernet Frame with EtherType
Below I show the start of my file: hexadecimal format per byte, plus my interpretation of it. After that, the comments from Wireshark.
The created stream starts with the Global Header: a 24-byte structure. First the hexadecimal values, then the interpretation:
=== Global Header ====
D4 C3 B2 A1 02 00 04 00
00 00 00 00 00 00 00 00
FF FF 00 00 01 00 00 00
Magic number A1B2C3D4 (microsecond time precision)
Version: 2 - 4
ThisZone 0
sigFigs 0
snapLen 0000FFFF
datalinkType 1
Note that the magic number has the LSB first, indicating that every multi-byte number will have the least significant byte first. So a 2 byte value of 0x1234 will have in memory first 34 then 12.
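As a cross-check of this layout, a small illustrative Python sketch (not from the original question) that unpacks the 24 header bytes above with a little-endian struct format:

import struct

hdr = bytes.fromhex('D4 C3 B2 A1 02 00 04 00 00 00 00 00 00 00 00 00 FF FF 00 00 01 00 00 00')
magic, v_major, v_minor, thiszone, sigfigs, snaplen, linktype = struct.unpack('<IHHiIII', hdr)
print(hex(magic), v_major, v_minor, thiszone, sigfigs, hex(snaplen), linktype)
# 0xa1b2c3d4 2 4 0 0 0xffff 1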
After that the packets should come: each time one Packet Header, followed by one Packet Data block.
=== Packet header ===
09 89 58 5A C8 85 0B 00
6B 00 00 00 6B 00 00 00
Timestamp: 1515751689.755144 (usec precision)
Number of saved bytes (incl_len) 107 bytes (0x006b)
Actual packet length (orig_len) 107 bytes (0x006b)
=== Packet Data ===
CF 31 59 D3 E7 98 53 39 - 17 F0 A9 9C 00 08 45 00
5D 00 00 00 00 00 FF 00 - E0 0D 8A 84 77 44 E0 2B
9C FB 4D 43 D5 8A 00 00 - 00 00 41 41 41 41 41 41
41 41 41 41 41 41 41 41 - 41 41 41 41 41 41 41 41
// etc, until total 107 bytes
The packet data consists of a MAC header, an IPV4 header, and a number of 0x41 bytes as data.
=== Mac Header ===
Destination Mac: CF:31:59:D3:E7:98
Source Mac: 53:39:17:F0:A9:9C
Ether type: 0800
Note that the magic number showed that every multi-byte number has the LSB first, so the two bytes 00 08 will have a 16-bit meaning of 0x0800
If you look at the PCAP file interpretation I show below, then the problem starts here: the Ether Type is not interpreted as Ether Type, but as length.
After a remark in one of the answers, I tried to reverse the two-byte EtherType from 00 08 into 08 00 (MSB first), but that made the problems worse.
=== IPV4 header ===
- 45 00 5D 00
- 00 00 00 00
- FF 00 E0 0D
- 8A 84 77 44
- E0 2B 9C FB
Specification of the IPV4 header structure
DWORD 0
- bits 00..03: version; bits 04..07 IP Header Length: 04 05
- bits 08..13 DSCP; bits 14..15 ECN: 00
- bits 16..31 Total Length (header + Payload): 93 (005D)
DWORD 1
- bits 00..15 Identification: 0000
- bits 16..18 Flags; bits 19..31 offset: 0000
DWORD 2
- bits 00..07 Time to Live FF
- bits 08..15 Protocol; used protocol 00
- bits 16..31 Header Checksum 3552 (0DE0)
DWORD 3 and 4
Source IP: 138.132.119.68
Destination IP: 224.43.156.251
Because Wireshark complains about the checksum, I verify it as follows:
Verify checksum:
Header: 0045 005D 0000 0000 00FF 0DE0 848A 4477 2BE0 FB9C
69 + 93 + 0 + 0 + 255 + 3552 + 33930 + 17527 + 11232 + 64412 = 131070 (01FFFE)
0001 + FFFE = FFFF
1's complement: 0000 (checksum ok)
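The same one's-complement check as a small Python sketch (using the ten 16-bit words exactly as listed above; illustrative only):

words = [0x0045, 0x005D, 0x0000, 0x0000, 0x00FF, 0x0DE0, 0x848A, 0x4477, 0x2BE0, 0xFB9C]
total = sum(words)                           # 0x1FFFE
total = (total & 0xFFFF) + (total >> 16)     # fold the carry back in: 0xFFFE + 0x1 = 0xFFFF
print(hex((~total) & 0xFFFF))                # 0x0 -> checksum verifies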
This is what Wireshark (version 2.4.4) makes of it:
The following seems normal:
Frame 1: 107 bytes on wire (856 bits), 107 bytes captured (856 bits)
Encapsulation type: Ethernet (1)
Arrival Time: Jan 12, 2018 11:08:09.755144000 W. Europe Standard Time
[Time shift for this packet: 0.000000000 seconds]
Epoch Time: 1515751689.755144000 seconds
[Time delta from previous captured frame: 0.000000000 seconds]
[Time delta from previous displayed frame: 0.000000000 seconds]
[Time since reference or first frame: 0.000000000 seconds]
Frame Number: 1
Frame Length: 107 bytes (856 bits)
Capture Length: 107 bytes (856 bits)
[Frame is marked: False]
[Frame is ignored: False]
[Protocols in frame: eth:llc:data]
[Coloring Rule Name: Checksum Errors]
[Coloring Rule String [truncated]: eth.fcs.status=="Bad" ||
ip.checksum.status=="Bad" || tcp.checksum.status=="Bad" ||
udp.checksum.status=="Bad" || sctp.checksum.status=="Bad" ||
mstp.checksum.status=="Bad" || cdp.checksum.status=="Bad" ||]
Here comes the first problem: EtherType is interpreted as Length
IEEE 802.3 Ethernet
Destination: cf:31:59:d3:e7:98 (cf:31:59:d3:e7:98)
Source: 53:39:17:f0:a9:9c (53:39:17:f0:a9:9c)
Length: 8
Padding: ff00e00d8a847744e02b9cfb4d43d58a0000000041414141...
Trailer: 414141414141414141414141414141414141414141414141...
Frame check sequence: 0x41414141 incorrect, should be 0xe19cae36
[FCS Status: Bad]
After the Length field, which I intended to be an EtherType, comes a lot of padding instead of an interpretation of my 5 DWORDs.
The Wikipedia article on the Ethernet frame that I linked says:
The EtherType field is two octets long and it can be used for two
different purposes. Values of 1500 and below mean that it is used to
indicate the size of the payload in octets, while values of 1536 and
above indicate that it is used as an EtherType, to indicate which
protocol is encapsulated in the payload of the frame.
My value of 0x0800 = 2048, which is certainly above 1536.
For example, an EtherType value of 0x0800 signals that the frame
contains an IPv4 datagram.
Is 0x0800 the incorrect value? Or is my error somewhere else?
Looks like your EtherType has the wrong byte order. The little-endian magic number only governs the pcap global and per-packet headers; the packet data itself is the raw bytes from the wire, so the EtherType and the IPv4 header fields must be in network byte order (big-endian). It should be:
=== Packet Data ===
CF 31 59 D3 E7 98 53 39 - 17 F0 A9 9C 08 00 XX XX
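For reference, a minimal Python sketch of the layout described here (file-level headers little-endian to match the 0xD4C3B2A1 magic, packet bytes in network order; the payload below is only a placeholder):

import struct

def pcap_global_header():
    # magic, version 2.4, thiszone, sigfigs, snaplen, linktype 1 (Ethernet), all little-endian
    return struct.pack('<IHHiIII', 0xA1B2C3D4, 2, 4, 0, 0, 0xFFFF, 1)

def pcap_packet(ts_sec, ts_usec, frame):
    # per-packet header: seconds, microseconds, captured length, original length
    return struct.pack('<IIII', ts_sec, ts_usec, len(frame), len(frame)) + frame

def ethernet_frame(dst_mac, src_mac, payload):
    # packet data is raw network bytes: EtherType 0x0800 (IPv4) is written big-endian
    return dst_mac + src_mac + struct.pack('!H', 0x0800) + payload

# illustrative values taken from the dump above; 93 zero bytes stand in for the IPv4 part, giving a 107-byte frame
frame = ethernet_frame(bytes.fromhex('CF3159D3E798'), bytes.fromhex('533917F0A99C'), bytes(93))
capture = pcap_global_header() + pcap_packet(1515751689, 755144, frame)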

UTF8 Hex Codepoint to Decimal Mismatch

I'm working on a program that takes the hex value of a unicode character and converts it to an integer, then to a byte array, then to a UTF-8 string. All is fine other than the fact that, for example, the hex value E2 82 AC (€ symbol) is 14 844 588 in decimal, but, if you look at the code point value of it on the web page provided below, it's 226 130 172, which is a big difference.
http://utf8-chartable.de/unicode-utf8-table.pl?start=8320&number=128&names=-
If you sort the values there by decimal, they're not just converting the hex to decimal. Obviously I don't understand encodings as well as I thought I did.
E2 82 AC maps to 226 130 172 instead of 14 844 588.
Why is this discrepancy?
Thanks in advance.
I think your statement, "the hex value E2 82 AC (€ symbol) is 14 844 588 in decimal", is incorrect.
How did you interpret the hex values E2, 82, and AC?
hex E2 = hex E * 16 + hex 2 = 14 * 16 + 2 = 226.
hex 82 = hex 8 * 16 + hex 2 = 8 * 16 + 2 = 130.
hex AC = hex A * 16 + hex C = 10 * 16 + 12 = 172.
So the hex bytes E2 82 AC (€ symbol) are in fact 226, 130 and 172 in decimal; the table lists each byte separately rather than treating the three bytes as a single number.
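A short Python illustration of the two readings (per byte vs. as one combined integer; the code point is included only for context):

b = '€'.encode('utf-8')
print(list(b))                      # [226, 130, 172]  <- what the table lists, one decimal per byte
print(int.from_bytes(b, 'big'))     # 14844588         <- the three bytes read as a single number
print(hex(ord('€')))                # 0x20ac           <- the actual Unicode code point of €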

Matlab reading endian-incorrect binary data input / interpreting as uint32

While writing this post, I attempted b = fread(s, 1, 'uint32')
This would work great, but my poor data is sent LSB first! (no I can not change this)
Before, I was using b = fread(s, 4)' which gives me a vector similar to [47 54 234 0].
Here is my input stream:
0A
0D 39 EA 00 04 39 EA 00
4B 39 EA 00 D0 38 EA 00
0A
etc...
I can successfully delimit by 0x0A by
while ~isequal(fread(s, 1), 10) end
Basically I need to get the array of uint32s represented by [00EA390D 00EA3904 00EA394B 00EA38D0]
The documentation for swapbytes doesn't help me much and the uint32 operator operates on individual elements!!
The MATLAB fread function directly supports a little-endian machine format. Just set the 5th argument of fread to 'l':
b = fread(s, 4, 'uint32',0,'l');
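A quick cross-check of the expected values (in Python rather than MATLAB, purely to illustrate the little-endian reassembly of the sample bytes above):

import struct
payload = bytes.fromhex('0D39EA00 0439EA00 4B39EA00 D038EA00')
print([hex(v) for v in struct.unpack('<4I', payload)])
# ['0xea390d', '0xea3904', '0xea394b', '0xea38d0']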

How does Binary Lambda Calculus encode parenthesis?

How does BLC encode parentheses? For example, how would this:
λa.λb.λc.(a ((b c) d))
Be encoded in BLC?
Note: the Wikipedia article is not very helpful, as it uses an unfamiliar notation and provides only one simple example, which doesn't involve parentheses, and a very complex example, which is hard to analyze. The paper is similar in that respect.
If you mean the binary encoding based on De Bruijn indices discussed in the Wikipedia article, that's actually quite simple. You first need to do the De Bruijn encoding, which means replacing each variable with a natural number giving the position of its λ binder, counted outward from the innermost enclosing λ starting at 1. In this notation,
λa.λb.λc.(a ((b c) d))
becomes
λλλ 3 ((2 1) d)
where d is some natural number >=4. Since it is unbound in the expression, we can't really tell which number it should be.
Then the encoding itself, defined recursively as
enc(λM) = 00 + enc(M)
enc(MN) = 01 + enc(M) + enc(N)
enc(i) = 1*i + 0
where + denotes string concatenation and * means repetition. Systematically applying this, we get
enc(λλλ 3 ((2 1) d))
= 00 + enc(λλ 3 ((2 1) d))
= 00 + 00 + enc(λ 3 ((2 1) d))
= 00 + 00 + 00 + enc(3 ((2 1) d))
= 00 + 00 + 00 + 01 + enc(3) + enc((2 1) d)
= 00 + 00 + 00 + 01 + enc(3) + 01 + enc(2 1) + enc(d)
= 00 + 00 + 00 + 01 + enc(3) + 01 + 01 + enc(2) + enc(1) + enc(d)
= 000000011110010111010 + enc(d)
and as you can see, the application prefix 01 plays the role of the open parenthesis (application is written in prefix form, like Polish notation), while closing parentheses are not needed because each sub-encoding is self-delimiting.
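A tiny Python sketch of these rules (the free variable d is assumed here to be De Bruijn index 4, the smallest value it could have):

def enc(term):
    kind = term[0]
    if kind == 'lam':                    # enc(λM)  = 00 + enc(M)
        return '00' + enc(term[1])
    if kind == 'app':                    # enc(M N) = 01 + enc(M) + enc(N)
        return '01' + enc(term[1]) + enc(term[2])
    return '1' * term[1] + '0'           # enc(i)   = i ones followed by a zero

# λλλ 3 ((2 1) d), with d taken as 4
body = ('app', ('var', 3), ('app', ('app', ('var', 2), ('var', 1)), ('var', 4)))
print(enc(('lam', ('lam', ('lam', body)))))
# 00000001111001011101011110  = the string above followed by enc(d) = 11110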