Im not sure if endian is the right word but..
I have been parsing through a PNG file and I have noticed that all of the integer values are in big endian. Is this true?
For example, the width and height are stored in the PNG file as 32bit unsigned integers. My image is 16x16 and in the file its stored as:
00 00 00 10
when it should be:
10 00 00 00
Is this true or is there something I am missing?
Yes, according to the specification, integers must be in network byte order (big endian):
All integers that require more than one byte shall be in network byte order: the most significant byte comes first, then the less significant bytes in descending order of significance (MSB LSB for two-byte integers, MSB B2 B1 LSB for four-byte integers). The highest bit (value 128) of a byte is numbered bit 7; the lowest bit (value 1) is numbered bit 0. Values are unsigned unless otherwise noted. Values explicitly noted as signed are represented in two's complement notation.
http://www.w3.org/TR/2003/REC-PNG-20031110/#7Integers-and-byte-order
Integers in PNG are in network byte order (big endian).
See: the spec.
Related
I am streaming some data of fixed length. The data consists of some 64-bit ints as well as some 32-bit floats. Because the format of the data is fixed and known, I am just sending an array of bytes with a known endian-ness. The data can then be easily reconstructed at the other end.
However, my transport protocol will not allow any 0x00 bytes. Is there a way I can encode my data differently to avoid this? Losing some range in the data is fine (e.g. ints having a maximum of 2^60 is totally fine). Incresing the full size of the message is totally fine too, as long as the full length of data is fixed no matter what the values of the ints and floats are (e.g. if ints now take 9 bytes to store).
I don't know much about encoding formats, but I learned about CRCs a long time ago and I'm wondering if there's something like that, which will add some fixed length block to the end of the bytestream, but which will prevent the bytestream from containing any 0x00 bytes?
Let's take the case of 64-bit numbers:
Reduce your value range to 256 and use the last (or first) 7 bytes to encode that value.
For the 7 value bytes, replace all the 0x00 bytes with 0xff bytes. Record the positions of the bytes that have been flipped.
Use the remaining byte as a bit mask to encode the positions of the bytes that have been flipped. This will take up 7 bits of that remaining byte. The first (or last) bit of that byte needs to be always set to 1 to prevent the encoding byte to become 0x00 itself.
For example:
Take the 7 byte value b2 00 c3 d4 e5 ff 00.
Flip the 0x00 bytes to get b2 ff c3 d4 e5 ff ff. Bytes 2 and 7 have been flipped.
Create the bit mask 0100001 and prefix with a 1 bit to get a binary value of 10100001, or a hex value of 0xa1.
Your encoded 64-bit value will then be a1 b2 ff c3 d4 e5 ff ff.
The approach for 32-bit numbers is the same. Use 28 bits for the value, 3 bits to encode which bytes have been flipped, and the leftover bit always set to 1.
I am looking for an hashing algorithm that generates alphanumeric output. I did few tests with MD5 , SHA3 etc and they produce hexadecimal output.
Example:
Input: HelloWorld
Output[sha3_256]: 92dad9443e4dd6d70a7f11872101ebff87e21798e4fbb26fa4bf590eb440e71b
The 1st character in the above output is 9. Since output is in HEX format, maximum possible values are [0-9][a-f]
I am trying to achieve maximum possible values for the 1st character. [0-9][a-z][A-Z]
Any ideas would be appreciated . Thanks in advance.
Where MD5 computes a 128bit hash and SHA256 a 256bit hash, the output they provide is nothing more than a 128, respectively 256 long binary number. In short, that are a lot of zero's and ones. In order to use a more human-friendly representation of binary-coded values, Software developers and system designers use hexadecimal numbers, which is a representation in base(16). For example, an 8-bit byte can have values ranging from 00000000 to 11111111 in binary form, which can be conveniently represented as 00 to FF in hexadecimal.
You could convert this binary number into a base(32) if you want. This is represented using the characters "A-Z2-7". Or you could use base(64) which needs the characters "A-Za-z0-9+/". In the end, it is just a representation.
There is, however, some practical use to base(16) or hexadecimal. In computer lingo, a byte is 8 bits and a word consists of two bytes (16 bits). All of these can be comfortably represented hexadecimally as 28 = 24×24 = 16×16. Where 28 = 25×23 = 32×8. Hence, in base(32), a byte is not cleanly represented. You already need 5 bytes to have a clean base(32) representation with 8 characters. That is not comfortable to deal with on a daily basis.
I have found numerous references to the encoding requirements of Integers in ASN.1
and that Integers are inherently signed objects
TLV 02 02 0123 for exmaple.
However, I have a 256 bit integer (within a certificate) encoded
30 82 01 09 02 82 01 00 d1 a5 xx xx xx… 02 03 010001
30 start
82 2 byte length
0109 265 bytes
02 Integer
82 2 byte length
0100 256 bytes
d1 a5 xxxx
The d1 is the troubling part because the leading bit is 1, meaning this 256 bit number is signed when in fact it is an unsigned number, a public rsa key infact. Does the signed constraint apply to Integers > 64 bits?
Thanks,
BER/DER uses 2s-complement representation for encoding integer values. This means the the first bit (not byte) determines whether a number is positive or negative. This means that sometimes an extra leading zero byte needs to be added to prevent the first bit from causing the integer to be interpreted as a negative number. Note that it is invalid BER/DER to have the first 9 bits all zero.
Yes, you are right. For any non negative DER/BER-encoded INTEGER - no matter its length - the MSB of the first payload byte is 0.
The program that generated such key is incorrect.
The "signed constraint" (actually, a rule) totally applies to any size integers. However, depending on a domain you might find all sorts of oddities in how domain objects are encoded. This is something that has to be learned and accounted for the hard way, unfortunately.
What does this call to typecast do in MATLAB?
y=typecast(x,'single');
what does it mean? When I run typecast(3,'single') it gives 0 2.1250.
I don't understand what that is.
I am trying to convert this to Java, how can I do that?
From the MATLAB manual:
single - Convert to single precision
Syntax
B = single(A)
Description
B = single(A) converts the matrix A
to single precision, returning that
value in B. A can be any numeric
object (such as a double). If A is
already single precision, single has
no effect. Single-precision quantities
require less storage than
double-precision quantities, but have
less precision and a smaller range.
typecast reinterprets the bytes used to represent a value of one type as if those same bytes were representing a different type. For example, the constant 3 in MATLAB is an IEEE double-precision value, meaning it takes 8 bytes to store it. Those eight bytes in this case are
40 08 00 00 00 00 00 00
A value of type single in MATLAB is an IEEE single-precision value, meaning it takes only 4 bytes to store it. So the eight bytes of the double will map to two 4-byte singles, those being
40 08 00 00, and
00 00 00 00
It turns out that 40 08 00 00 is the single-precision representation of the value 2.125, and as you might guess, 00 00 00 00 is the single-precision representation of 0. I believe they come out in reverse order due to the endian-ness of the machine, and on a big-endian machine I think you'd get 2.125 0 instead.
In C++ this would be something like a reinterpret_cast. In Java, there doesn't appear to be as direct a mapping, but the answers to this Stack Overflow question discuss some alternatives such as Serialization.
From running help typecast it looks like it changes the datatype, but keeps the bit assignment the same, whereas single( ) keeps the number the same, but changes the bit arrangement.
If I understand it, you could think of it like you have two boxes, each containing up to 8 balls. Lets say, box 1 is full, whilst box 2 contains 3 balls. We now typecast this into a system where a box holds 4 balls.
This system will need three boxes to hold our balls. So we have boxes 1 and 2 which are full. Box 3 contains 3 balls.
So you'd have [8,3] converted to [4,4,3].
Alternatively, if you converted the number into our new system in the same way as single( ) works (e.g. for changing an int8 to a single), you'd change the number of balls, not the container.
What is the difference between Big Endian and Little Endian Byte order ?
Both of these seem to be related to Unicode and UTF16. Where exactly do we use this?
Big-Endian (BE) / Little-Endian (LE) are two ways to organize multi-byte words. For example, when using two bytes to represent a character in UTF-16, there are two ways to represent the character 0x1234 as a string of bytes (0x00-0xFF):
Byte Index: 0 1
---------------------
Big-Endian: 12 34
Little-Endian: 34 12
In order to decide if a text uses UTF-16BE or UTF-16LE, the specification recommends to prepend a Byte Order Mark (BOM) to the string, representing the character U+FEFF. So, if the first two bytes of a UTF-16 encoded text file are FE, FF, the encoding is UTF-16BE. For FF, FE, it is UTF-16LE.
A visual example: The word "Example" in different encodings (UTF-16 with BOM):
Byte Index: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
------------------------------------------------------------
ASCII: 45 78 61 6d 70 6c 65
UTF-16BE: FE FF 00 45 00 78 00 61 00 6d 00 70 00 6c 00 65
UTF-16LE: FF FE 45 00 78 00 61 00 6d 00 70 00 6c 00 65 00
For further information, please read the Wikipedia page of Endianness and/or UTF-16.
Ferdinand's answer (and others) are correct, but incomplete.
Big Endian (BE) / Little Endian (LE) have nothing to do with UTF-16 or UTF-32.
They existed way before Unicode, and affect how the bytes of numbers get stored in the computer's memory. They depend on the processor.
If you have a number with the value 0x12345678 then in memory it will be represented as 12 34 56 78 (BE) or 78 56 34 12 (LE).
UTF-16 and UTF-32 happen to be represented on 2 respectively 4 bytes, so the order of the bytes respects the ordering that any number follows on that platform.
UTF-16 encodes Unicode into 16-bit values. Most modern filesystems operate on 8-bit bytes. So, to save a UTF-16 encoded file to disk, for example, you have to decide which part of the 16-bit value goes in the first byte, and which goes into the second byte.
Wikipedia has a more complete explanation.
little-endian: adj.
Describes a computer architecture in which, within a given 16- or 32-bit word, bytes at lower addresses have lower significance (the word is stored ‘little-end-first’). The PDP-11 and VAX families of computers and Intel microprocessors and a lot of communications and networking hardware are little-endian. The term is sometimes used to describe the ordering of units other than bytes; most often, bits within a byte.
big-endian: adj.
[common; From Swift's Gulliver's Travels via the famous paper On Holy Wars and a Plea for Peace by Danny Cohen, USC/ISI IEN 137, dated April 1, 1980]
Describes a computer architecture in which, within a given multi-byte numeric representation, the most significant byte has the lowest address (the word is stored ‘big-end-first’). Most processors, including the IBM 370 family, the PDP-10, the Motorola microprocessor families, and most of the various RISC designs are big-endian. Big-endian byte order is also sometimes called network order.
---from the Jargon File: http://catb.org/~esr/jargon/html/index.html
Byte endianness (big or little) needs to be specified for Unicode/UTF-16 encoding because for character codes that use more than a single byte, there is a choice of whether to read/write the most significant byte first or last. Unicode/UTF-16, since they are variable-length encodings (i.e. each char can be represented by one or several bytes) require this to be specified. (Note however that UTF-8 "words" are always 8-bits/one byte in length [though characters can be multiple points], therefore there is no problem with endianness.) If the encoder of a stream of bytes representing Unicode text and the decoder aren't agreed on which convention is being used, the wrong character code can be interpreted. For this reason, either the convention of endianness is known beforehand or more commonly a byte order mark is usually specified at the beginning of any Unicode text file/stream to indicate whethere big or little endian order is being used.