Is there a difference between Signed and Unsigned LEB128, when *encoding* the number?

I understand that LEB128 decoders need to know whether an encoded number is signed or unsigned, but the encoder seems to work identically either way (though Wikipedia uses distinct functions for encoding signed and unsigned numbers).
If positive numbers are encoded the same way in Signed and Unsigned LEB128 (only the range changes), and negative numbers only occur in Signed LEB128, it seems more sensible to create a single function that encodes any integer (using two's complement when the argument is negative).
I implemented a function that works the way I described, and it seems to work fine.
This is not an implementation detail (unless I've misunderstood something). Any function that can encode Signed LEB128 makes any function that encodes Unsigned LEB128 completely redundant, so there would never be a good reason to create both.
I used JavaScript, but the actual implementation is not important. Is there ever a reason to have a Signed LEB128 encoder and an Unsigned one?
const toLEB128 = function * (arg) {

    /* This generator takes any BigInt, LEB128 encodes it, and
    yields the result, one byte at a time (little-endian). */

    const digits = arg.toString(2).length;
    const length = digits + (7 - digits % 7);
    const sevens = new RegExp(".{1,7}", "g");
    const number = BigInt.asUintN(length, arg);
    const string = number.toString(2).padStart(length, "0");
    const eights = string.match(sevens).map(function(string, index) {

        /* This callback takes each string of seven digits and its
        index (big-endian), prepends the correct continuation digit,
        converts the 8-bit result to a BigInt, then returns it. */

        return BigInt("0b" + Boolean(index) * 1 + string);
    });

    while (eights.length) yield eights.pop();
};
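For comparison, here is a minimal sketch of the same single-encoder idea in C++ (the name and the 64-bit width are illustrative): it is the canonical signed-LEB128 loop, which likewise handles any integer, positive or negative.

#include <cstdint>
#include <vector>

// Canonical signed-LEB128 loop: encodes any 64-bit value, relying on
// arithmetic right shift to propagate the sign bit (two's complement).
std::vector<uint8_t> encodeLEB128(int64_t value) {
    std::vector<uint8_t> bytes;
    bool more = true;
    while (more) {
        uint8_t byte = value & 0x7F;
        value >>= 7; // arithmetic shift: fills with the sign bit
        // Stop once the remaining bits are pure sign extension and the
        // sign bit of the current byte already agrees with them.
        more = !((value == 0 && !(byte & 0x40)) ||
                 (value == -1 && (byte & 0x40)));
        if (more) byte |= 0x80; // set the continuation bit
        bytes.push_back(byte);
    }
    return bytes;
}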

Related

C++ libtomcrypt library outputting shorter/truncated hashes

I am trying to generate hashes to use in a blockchain project. While looking for a crypto library I stumbled across libtomcrypt and chose to download it since it was easy to install. But now I have a problem: when I create the hashes (I'm using SHA3-512, but the bug is present with every other SHA hashing algorithm), sometimes it outputs the correct hash, but truncated.
(screenshot: hash truncating example)
This is the code for the hashing function:
string hashSHA3_512(const std::string& input) {
    // Initial buffer
    unsigned char* hashResult = new unsigned char[sha3_512_desc.hashsize];
    // Initialize a state variable for the hash
    hash_state md;
    sha3_512_init(&md);
    // Process the text - remember you can call process() multiple times
    sha3_process(&md, (const unsigned char*) input.c_str(), input.size());
    // Finish the hash calculation
    sha3_done(&md, hashResult);
    // Convert to string
    string stringifiedHash(reinterpret_cast<char*>(hashResult));
    // Return the result
    return stringToHex(stringifiedHash);
}
And here is the code for the toHex function, though I already checked and the truncation problem appears before this function is called:
string stringToHex(const std::string& input)
{
    static const char hex_digits[] = "0123456789abcdef";
    std::string output;
    output.reserve(input.length() * 2);
    for (unsigned char c : input)
    {
        output.push_back(hex_digits[c >> 4]);
        output.push_back(hex_digits[c & 15]);
    }
    return output;
}
If someone has knowledge about this library, or about this problem in general and possible fixes, please explain; I've been stuck for three days.
UPDATE
I figured out that the program truncates the hashes when it encounters two consecutive zeros in hex, i.e. eight zeros in binary (a single zero byte), but I still don't understand why. If you do, please let me, and hopefully other people with the same problem, know.
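For context on that symptom: std::string's const char* constructor stops at the first zero byte, which matches truncation at a 00 pair in hex. A minimal sketch of a length-aware variant, reusing the libtomcrypt calls from the question (the _fixed suffix is illustrative):

std::string hashSHA3_512_fixed(const std::string& input) {
    unsigned char hashResult[64]; // SHA3-512 digest is 64 bytes
    hash_state md;
    sha3_512_init(&md);
    sha3_process(&md, (const unsigned char*) input.c_str(), input.size());
    sha3_done(&md, hashResult);
    // Pass the length explicitly: the single-argument (char*) constructor
    // would stop at the first embedded NUL byte.
    std::string stringifiedHash(reinterpret_cast<char*>(hashResult), sizeof(hashResult));
    return stringToHex(stringifiedHash);
}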

Comparing unsigned integers with signed type

Some languages, like Dart or Java, do not have support for unsigned integers.
I have two integer numbers int a, b that are really unsigned (basically hashes or bitfields), but they have to be stored in signed data types.
A comparison function is needed. The usual a < b will not work here, as it would wrongly treat negative values as smaller, while in the desired unsigned interpretation they are actually larger. Each of the two ranges is handled correctly if considered alone.
A working solution I came up with (in Dart, but the language shouldn't really matter) is:
int compareAsUnsigned(int a, int b) {
    final signA = a.sign;
    final signB = b.sign;
    if (signA == signB) return a.compareTo(b);
    if (signA == -1 || signB == -1) return b.compareTo(a);
    return a.compareTo(b);
}
Are there any efficient and/or elegant ways to get the unsigned compare for values stored in signed data types (a longer type is not available and all bits are used)?
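One classic trick, sketched here in C++ but portable to any two's-complement language (it is essentially what Java's Integer.compareUnsigned does): flipping the sign bit of both operands maps unsigned order onto signed order, so a single signed comparison suffices.

#include <cstdint>

// Flipping the top (sign) bit converts unsigned ordering into signed
// ordering: bit pattern 0 maps to INT64_MIN, all-ones maps to INT64_MAX,
// and the order of everything in between is preserved.
int compareAsUnsigned(int64_t a, int64_t b) {
    int64_t x = a ^ INT64_MIN;
    int64_t y = b ^ INT64_MIN;
    return (x < y) ? -1 : (x > y) ? 1 : 0;
}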

What is the point of writing integer in hexadecimal, octal and binary?

I am well aware that one is able to assign a value to a variable or constant in Swift and have that value represented in different formats.
For Integer: one can declare it in decimal, binary, octal or hexadecimal format.
For Float or Double: one can declare it in either decimal or hexadecimal format, and can make use of an exponent too.
For instance:
var decInt = 17
var binInt = 0b10001
var octInt = 0o21
var hexInt = 0x11
All of the above variables give the same result, which is 17.
But what's the catch? Why bother using those other than decimal?
Some notations can be much easier for people to understand, even if the resulting value is the same. Think, for example, of colour notation (hexadecimal) or file-permission notation (octal).
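A small illustration of that point, sketched in C++ syntax (the constants are made up):

// The base that matches the domain makes the value self-documenting.
constexpr unsigned colour      = 0xFF8800;   // RGB: one hex byte per channel
constexpr unsigned permissions = 0644;       // POSIX mode bits read best in octal
constexpr unsigned flags       = 0b10110;    // individual bits read best in binary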
Code is best written in the most meaningful way.
Using the number format that best matches the domain of your program is just one example. You don't want to obscure domain-specific details, and you want to minimize the mental effort for the reader of your code.
Two other examples:
Do not simplify calculations. For example: to convert a scaled integer value in 1/10000 arc minutes to a floating-point value in degrees, do not write the conversion factor as 600000.0, but instead write 10000.0 * 60.0.
Choose a code structure that matches the nature of your data. For example: if a function has two return values, determine whether the situation is symmetrical or asymmetrical. For a symmetrical situation, always write a full if (condition) { return A; } else { return B; }. It's a common mistake to write if (condition) { return A; } return B; (simply because 'it works').
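Both points, sketched in C++ (the numbers and names are illustrative):

// Keep the factor's derivation visible instead of pre-simplifying it.
double toDegrees(long scaledArcMinutes) {
    // 1/10000 arc minute -> degrees: 10000 steps per arc minute,
    // 60 arc minutes per degree. Don't collapse this to 600000.0.
    return scaledArcMinutes / (10000.0 * 60.0);
}

// A symmetrical situation gets a structurally symmetrical if/else.
int pick(bool condition, int a, int b) {
    if (condition) {
        return a;
    } else {
        return b;
    }
}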
Meaning matters!

HMAC-MD5 in pure lua

I need to write an HMAC-MD5 algorithm in pure Lua.
I got this algorithm from Wikipedia:
function hmac (key, message)
    if (length(key) > blocksize) then
        key = hash(key) // keys longer than blocksize are shortened
    end if
    if (length(key) < blocksize) then
        key = key ∥ [0x00 * (blocksize - length(key))] // keys shorter than blocksize are zero-padded ('∥' is concatenation)
    end if

    o_key_pad = [0x5c * blocksize] ⊕ key // Where blocksize is that of the underlying hash function
    i_key_pad = [0x36 * blocksize] ⊕ key // Where ⊕ is exclusive or (XOR)

    return hash(o_key_pad ∥ hash(i_key_pad ∥ message)) // Where '∥' is concatenation
end function
and I have the MD5 code from here. The MD5 calculation function works correctly.
Implementing the algorithm in Lua, so far I have the following code:
local function hmac_md5(key,msg)
    local blocksize = 64
    if string.len(key) > blocksize then
        key = calculateMD5(key)
    end
    while string.len(key) < blocksize do
        key = key .. "0"
    end
    -- local o_key_pad = bit_xor((0x5c * blocksize),key)
    -- local i_key_pad = bit_xor((0x36 * blocksize),key)
    return calculateMD5(o_key_pad..calculateMD5(i_key_pad..message))
end
-- calculateMD5 is the md5.Calc function in the Stack Overflow link specified
I am stuck on the part where o_key_pad and i_key_pad are calculated. Do I just XOR the two values? The Python implementation in the Wikipedia link had some weird calculations.
Please help!
Yes, "⊕" is the symbol for "exclusive or".
Remember: once you compute the final hash, DO NOT use an ordinary string comparison to check whether a MAC is correct. A comparison that exits early on the first mismatch leaks timing information and WILL allow attackers to forge messages.
Note that 0x5c * blocksize is probably not what you are looking for, since that multiplies 0x5c by blocksize. You want to create an array of length blocksize containing 0x5c in each position.
Note that you must pad with zero bytes, not the character "0". So key = key .. "0" is wrong. It should be key = key .. "\0", or however you create NUL bytes in Lua.
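To make the last two notes concrete, here is the pad construction sketched in C++ over byte arrays (illustrative names; in Lua the same thing is done character by character over the zero-padded key):

#include <cstddef>
#include <cstdint>
#include <vector>

// XOR each byte of the (already zero-padded) key with the pad constant.
// This is the [0x5c * blocksize] ⊕ key operation from the pseudocode:
// an array of 0x5c bytes XORed with the key, not a multiplication.
std::vector<uint8_t> xorPad(const std::vector<uint8_t>& paddedKey, uint8_t pad) {
    std::vector<uint8_t> out(paddedKey.size());
    for (std::size_t i = 0; i < paddedKey.size(); ++i)
        out[i] = paddedKey[i] ^ pad; // pad = 0x5c (outer) or 0x36 (inner)
    return out;
}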

Are int32s signed or unsigned in OSC (or is it unspecified?)

The OSC Specification, version 1.0 specifies the "int32" data type as "32-bit big-endian two's complement integer". This implies that it's signed (otherwise, why would you write "two's complement"...), but it doesn't come right out and say it.
This comes up most clearly in the encoding of blobs: should it be legal to have a blob of length #x90000000? This number can be encoded as an unsigned 32-bit integer, but not as a signed 32-bit integer. I grant you, that's an extremely big blob (more than 2 gigabytes).
The specification gives you no more details. I checked the code of the C++ osc implementation I use and it's defined as:
typedef signed long int32;
the blob is defined as:
struct Blob{
    Blob() {}
    explicit Blob( const void* data_, unsigned long size_ )
        : data( data_ ), size( size_ ) {}

    const void* data;
    unsigned long size;
};
So yes, it's a signed integer for the "atomic" int32 type.
The blob, on the other hand, has its size stored as an unsigned long, so it can probably be larger. You may have to try it first, because I only have the oscpack implementation here.
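To illustrate the difference, a hypothetical sketch (not oscpack code) of reading the same four big-endian wire bytes both ways:

#include <cstdint>
#include <cstring>

// Assemble the 32-bit big-endian value; as unsigned it can hold a blob
// size like 0x90000000, while the signed view wraps it to a negative.
int32_t readInt32BE(const uint8_t* p) {
    uint32_t u = (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16)
               | (uint32_t(p[2]) << 8)  |  uint32_t(p[3]);
    int32_t s;
    std::memcpy(&s, &u, sizeof s); // two's-complement reinterpretation
    return s;
}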