Kademlia XOR Distance as an Integer - encoding

The Kademlia paper mentions using the XOR of NodeIDs, interpreted as an integer, as the distance metric. Let's pretend my NodeID1 is aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d and my NodeID2 is ab4d8d2a5f480a137067da17100271cd176607a1. What's the appropriate way to interpret these as integers when comparing NodeID1 and NodeID2? Would I convert them into BigInts and XOR the two BigInts? I saw that in one implementation. Could I also just convert each NodeID into decimal and XOR those values?
I found this question but I'm trying to better understand exactly how this works.
Note: This isn't for implementation, I'm just trying to understand how the integer interpretation works.

For a basic Kademlia implementation you only need two bitwise operations on the IDs: XOR and comparison. In both cases the ID is conceptually a 160-bit unsigned integer with overflow, i.e. modulo-2^160 arithmetic. It can be decomposed into an array of 20 bytes or 5 u32s, assuming correct endianness conversion in the latter case. The most common endianness for network protocols is big-endian, so byte 0 contains the most significant 8 bits out of the 160.
The XOR and the comparison can then be applied subunit by subunit: the XOR is just an XOR over all the bytes, and the comparison is a lexicographic comparison of the byte arrays.
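As an illustration, here is a minimal Python sketch of that byte-wise approach (the helper names xor_distance and closer are just for this example, not part of any particular implementation):
# Big-endian 20-byte IDs: byte 0 holds the most significant 8 bits of the 160.
def xor_distance(id_a: bytes, id_b: bytes) -> bytes:
    # XOR applied byte by byte yields the 20-byte distance
    return bytes(a ^ b for a, b in zip(id_a, id_b))

def closer(dist_a: bytes, dist_b: bytes) -> bool:
    # Equal-length big-endian byte arrays compare like unsigned integers,
    # so a plain lexicographic comparison is the 160-bit comparison
    return dist_a < dist_b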
Using bigint library functions is probably sufficient for an implementation, but not optimal: they carry size and signedness overhead compared to implementing the necessary bit-twiddling on fixed-size arrays.
A more complete implementation may also need some additional arithmetic and utility functions.
Could I also just convert each NodeID into decimal and XOR those values?
Considering the size of the numbers, a decimal representation is not particularly useful. For the human reader, hexadecimal or the individual bits are more useful, and computers operate on binary and practically never on decimal.
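To make that concrete, here is a small Python sketch using the two NodeIDs from the question; decimal and hexadecimal are just different textual renderings of the same 160-bit integer, so "converting to decimal" changes nothing about the XOR:
a = int("aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d", 16)
b = int("ab4d8d2a5f480a137067da17100271cd176607a1", 16)
distance = a ^ b                 # same value a byte-wise XOR would produce
print(distance)                  # decimal rendering of the distance
print(format(distance, "040x"))  # hexadecimal rendering, easier to read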

Related

Portability of auto kind/type conversions in numerical operations in Fortran

According to the Fortran standard, if the operands of a numeric operation have different data kind/types, then the resulting value has a kind/type determined by the operand with greater decimal precision. Before the operation is evaluated, the operand with the lower decimal precision is first converted to the higher-precision kind/type.
Now, the use of a high-precision data kind/type implies there is accuracy to a certain level of significant digits, but kind/type conversion does not seem to guarantee such things¹. For this reason, I avoid mixing single- and double-precision reals.
But does this mean that automatic kind/type conversions should be avoided at all costs? For example, I would not hesitate to write x = y**2 where both x and y are reals (of the same kind), but the exponent is an integer.
Let us limit the scope of this question to the result of a single operation between two operands. We are not considering the outcome of equations with operations between multiple values where other issues might creep in.
Let us also assume we are using a portable type/kind system. For example, in the code below selected_real_kind is used to define the kind assigned to double-precision real values.
Then, I have two questions regarding numerical expressions with type/kind conversions between two operands:
Is it "portable", in practice? Can we expect the same result for an operation that uses automatic type/kind conversion from different compilers?
Is it "accurate" (and "portable") if the lower-precision operands are limited to integers or whole-number reals? To make this clear, can we always assume that 0==0.0d0, 1==1.0d0, 2==2.0d0, ... , for all compilers? And if so, then can we always assume that simple expressions such as (1 - 0.1230d0) == (1.0d0 - 0.1230d0) are true, and therefore the conversion is both accurate and portable?
To provide a simple example, would automatic conversion from an integer to a double-precision real like shown in the code below be accurate and/or portable?
program main
  implicit none
  integer, parameter :: dp = selected_real_kind(p=15)
  print *, ((42 - 0.10_dp) == (42.0_dp - 0.10_dp))
end program
I have tested with gfortran and ifort, using different operands and operations, but have yet to see anything to cause concern as long as I limit the conversions to integers or whole-number reals. Am I missing anything here, or just revealing my non-CS background?
¹According to these Intel Fortran docs (for example), integers converted to a real type have their fractional part filled with zeros. For the conversion of a single-precision real to a higher-precision real, the additional mantissa bits of the converted higher-precision operand are filled with zeros. So, for example, when a single-precision real operand with a non-zero fractional part (such as 1.2) is converted to a double, the conversion does not automatically increase the accuracy of the value: 1.2 does not become 1.2000000000000000d0 but instead something like 1.2000000476837158d0. How much this actually matters probably depends on the application.
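The effect described in this footnote is easy to reproduce outside Fortran as well; for instance, a small Python/NumPy sketch (assuming NumPy is available) shows the widened value:
import numpy as np

x32 = np.float32(1.2)          # nearest single-precision value to 1.2
x64 = np.float64(x32)          # widening fills the extra mantissa bits with zeros
print(x64)                     # 1.2000000476837158, not 1.2
print(x64 == np.float64(1.2))  # False: widening does not recover accuracy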

Rationale for CBOR negative integers

I am confused as to why CBOR chooses to encode negative integers as unsigned binary numbers with the value defined as -1 minus the unsigned value, instead of e.g. regular two's complement representation. Is there an obvious advantage that I'm missing, apart from increased negative range (which, IMO, is of questionable value weighed against increased complexity)?
Advantages:
There's only one allowed encoding type for each integer value, so all encoders will emit consistent output. If the encoders use the shortest encoding for each value as recommended by the spec, they'll emit identical output.
Picking the shortest numeric field is easier for non-negative numbers than for signed negative numbers, and CBOR aims for tiny IoT devices to readily transmit data.
It fits twice as many values into each integer encoding field width, thus making the data more compact. (It'd be yet more compact if the integer encodings didn't overlap, but that'd be notably more complicated.)
It can handle twice as large a negative value before needing the bignum extension.
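For illustration, here is a small Python sketch of the value mapping CBOR uses for major type 1 (just the arithmetic from the spec, not a full encoder; the function names are made up for this example):
def decode_negative(n: int) -> int:
    # n is the unsigned argument carried in the encoding; it stands for -1 - n
    return -1 - n

def encode_negative(value: int) -> int:
    # the unsigned argument to store for a negative value is -1 - value
    assert value < 0
    return -1 - value

print(decode_negative(0))          # -1
print(decode_negative(2**64 - 1))  # -2**64, twice the magnitude an int64 can reach
print(encode_negative(-500))       # 499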

Why are there 'Int's **and** 'Double's? Why not just have one class?

So I know this may be a very naive question, but I'm new to Scala and other similar languages, and this legitimately befuddles me.
Why does Scala (and other languages) have two distinct classes for integers and doubles? Since integers are in floats (1.0 = 1), why bother with an Int class?
I sort of understand that maybe you want to make sure that some integers are never changed into floats, or that maybe you want to guard against occurrences like 1.00000000002 != 1, which may be confusing when you only see the first few digits, but is there some other underlying justification that I'm missing?
Thanks!
Integers are important to the internal workings of software, because many things are internally implemented as integers that you wouldn't necessarily think of as "numbers". For example, memory addresses are generally represented as integers; individual eight-bit bytes are generally conceived of as one-byte integers; and characters (such as ASCII characters and Unicode characters) are usually identified by integer-valued codepoints. (In all of these cases, incidentally, in the rare event that we want to display them, we generally use hexadecimal notation, which is convenient because it uses exactly two characters per eight-bit byte.) Another important case, one that is usually thought of as numeric even outside programming, is array indices; even in math, an array of length three will have elements numbered 1, 2, and 3 (though in programming, many languages, including Scala, use the indices 0, 1, and 2 instead, sometimes because of the underlying scheme for mapping indices to memory addresses, and sometimes simply for historical reasons, because older languages did so).
More generally, many things in computing (and in the real world) are strictly quantized; it doesn't make sense to speak of "2.4 table rows" or "2.4 loop iterations", so it's convenient to have a data type whose arithmetic is exact and which represents exact integer quantities.
But you're right that the distinction is not absolutely essential; a number of scripting languages, such as Perl and JavaScript and Tcl, have mostly dispensed with the explicit distinction between integers and floating-point numbers (though the distinction is still often drawn in the workings of the interpreters, with conversions occurring implicitly when needed). For example, in JavaScript, typeof 3 and typeof 3.0 are both 'number'.
Integers are generally much easier to work with given that they have exact representations. This is true not only at the software level, but at the hardware level as well. For example, looking at this source describing the x86 architecture, floating point addition generally takes 4X longer than integer addition. As such, it is advantageous to separate the two types of operations for performance reasons as well as usability reasons.
To add to other two answers, integers are not actually in floats. That is, there are some integers which can't be represented exactly by a float:
scala> Int.MaxValue.toFloat == (Int.MaxValue - 1).toFloat
res0: Boolean = true
Same for longs and doubles.
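The same check can be done in any language whose floats are IEEE 754 doubles; for example, a quick Python sketch (Python ints are arbitrary precision, its floats are doubles):
print(float(2**53) == float(2**53 + 1))  # True: 2**53 + 1 has no exact double
print((2**63 - 1) == float(2**63 - 1))   # False: the conversion rounds up to 2**63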

comparing float and double and printing them

I have a quick question. Say I have a really big number, up to 15 digits, and I assign the input to two variables, one float and one double. If I were to compare the two numbers, how would you compare them? I think a double has precision up to about 15 digits, and a float about 8. So do I simply compare them while the float only contains 8 digits and pad the rest, or do I have the float print out all 15 digits and then make the comparison? Also, if I were asked to print out the float number, is the standard way to print it up to 8 digits, which is its maximum precision?
thanks
Most languages will do some form of type promotion to let you compare types that are not identical, but reasonably similar. For details, you would have to indicate what language you are referring to.
Of course, the real problem with comparing floating-point numbers is that the results might be unexpected due to rounding errors. Most mathematical equivalences don't hold for floating-point arithmetic, so two sequences of operations which SHOULD yield the same value might actually yield slightly different values (or even very different values if you aren't careful).
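For example, floating-point addition is not even associative; a quick Python sketch (the same holds in most languages using IEEE 754 doubles):
print((0.1 + 0.2) + 0.3)                       # 0.6000000000000001
print(0.1 + (0.2 + 0.3))                       # 0.6
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False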
EDIT: as for printing, the "standard way" is based on what you need. If, for some reason, you are doing monetary computations in floating point, chances are that you'll only want to print 2 decimal digits.
Thinking in terms of digits may be a problem here. Floats can have a range from negative infinity to positive infinity. In C# for example the range is ±1.5 × 10^−45 to ±3.4 × 10^38 with a precision of 7 digits.
Also, IEEE 754 defines floats and doubles.
Here is a link that might help http://en.wikipedia.org/wiki/IEEE_floating_point
Your question is the right one. You want to consider your approach, though.
Whether at 32 or 64 bits, the floating-point representation is not meant for comparing numbers for equality. For example, the assertion 2.0/7.0 == 60.0/210.0 may or may not be true in the CPU's view. Conceptually, floating point is inherently imprecise.
If you wish to compare numbers for equality, use integers. Consider again the ratios of the last paragraph. The assertion that 2*210 == 7*60 is always true -- noting that those are the integral versions of the same four numbers as before, only related using multiplication rather than division. One suspects that what you are really looking for is something like this.
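Here is a short Python sketch of both points; whether the floating-point comparison happens to hold depends on the rounding behaviour, while the integer cross-multiplication is always exact:
from fractions import Fraction

print(2.0 / 7.0 == 60.0 / 210.0)            # True under strict IEEE rounding, but don't rely on it
print(2 * 210 == 7 * 60)                    # True: integer arithmetic is exact
print(Fraction(2, 7) == Fraction(60, 210))  # True: exact rational comparison, same idea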

Irrational number representation in computer

We can write a simple Rational Number class using two integers representing A/B with B != 0.
If we want to represent an irrational number class (storing and computing), the first thing that came to my mind is to use floating point, which means using the IEEE 754 standard (binary fractions). This is because an irrational number must be approximated.
Is there another way to write an irrational number class other than using binary fractions (whether it conserves memory space or not)?
I studied jsbeuno's solution using Python: Irrational number representation in any programming language?
He's still using the built-in floating point to store.
This is not homework.
Thank you for your time.
By a cardinality argument, there are many more irrational numbers than rational ones (and the number of IEEE 754 floating-point numbers is finite, at most 2^64 for 64-bit doubles).
You can represent numbers with something other than fractions (e.g. logarithmically).
jsbeuno is storing the number as a base and a radix and using those when doing calcs with other irrational numbers; he's only using the float representation for output.
If you want to get fancier, you can define the base and the radix as rational numbers (with two integers) as described above, or make them themselves irrational numbers.
To make something thoroughly useful, though, you'll end up replicating a symbolic math package.
You can always use symbolic math, where items are stored exactly as they are and calculations are deferred until they can be performed with precision above some threshold.
For example, say you performed two operations on a non-irrational number like 2, one to take the square root and then one to square that. With limited precision, you may get something like:
(√2)²
= 1.414213562²
= 1.999999999
However, storing symbolic math would allow you to store the result of √2 as √2 rather than an approximation of it, then realise that (√x)² is equivalent to x, removing the possibility of error.
Now, that obviously involves a more complicated encoding than simple IEEE 754, but it's not impossible to achieve.
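As a small illustration of the symbolic approach, here is a Python sketch using SymPy (assuming SymPy is installed): the symbolic value of √2 stays exact, whereas plain floating point does not.
import math
import sympy

x = sympy.sqrt(2)       # stored symbolically as sqrt(2), not as 1.414213562...
print(x**2)             # 2, exactly: (sqrt(2))**2 simplifies back to 2
print(x**2 == 2)        # True

print(math.sqrt(2)**2)  # 2.0000000000000004 with IEEE 754 doubles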