Binary representation of the inverse of a big integer - type-conversion

For cryptographic purposes, I have to write out the binary representation of the inverse of a big number p, roughly 16k bits in size. But I couldn't find any library that can convert such an integer to a float (I'm working in C/Python). In particular, since the inverse 1/p is extremely small, I've tried to shift the binary representation by instead dividing the largest power of ten smaller than p by p, but the problem remains: I can't convert p to perform the division.
Do algorithms exist for computing this kind of inversion? Do I have to implement them myself, or are they already available in some libraries? I am working with Python 3.5.
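One way to sidestep floating point entirely is to note that Python's built-in integers are arbitrary precision, and that floor(2**k / p), left-padded to k binary digits, is exactly the first k fractional bits of 1/p. A minimal sketch of that idea (the helper name and the demo values are mine, not from the question):

    def inverse_fraction_bits(p, n_bits):
        """Return the first n_bits binary digits after the point of 1/p,
        computed exactly with integer arithmetic (no floats involved)."""
        # floor(2**n_bits / p) is the integer whose binary expansion,
        # left-padded to n_bits digits, gives the fractional digits of 1/p.
        q = (1 << n_bits) // p
        return format(q, 'b').zfill(n_bits)

    # Example: 1/10 = 0.000110011001... in binary
    print(inverse_fraction_bits(10, 16))     # 0001100110011001

    # For a ~16k-bit p the first p.bit_length() - 1 digits are zero, so ask
    # for p.bit_length() plus however many significant bits you need.

This runs unchanged on Python 3.5; on the C side, the same shift-and-divide can be done with a bignum library such as GMP.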

Related

Compression algorithm for contiguous numbers

I'm looking for an efficient encoding for storing simulated coefficients.
The data consists of thousands of curves, each with 512 contiguous single-precision numbers. The data may be stored as fixed point, provided it preserves about 23 bits of precision (relative to unity level).
My best approach so far was to convert the numbers to 24-bit fixed point, then repeatedly take adjacent differences as long as the sum of squares decreases. When compressing the resulting data with LZMA (xz, lzip) I get about 7.5x compression (compared to float32).
Adjacent differences are good at the beginning, but they emphasize the quantization noise at each turn.
I've also tried the cosine transform after subtracting the slope/curve at the boundaries. The resulting compression was much weaker.
I tried AEC, but LZMA compressed much better. The highest compression came from bzip3 (after adjacent differences).
I found no function to fit the data with high precision and a limited parameter count.
Is there a way to reduce the penalty of quantization noise when using adjacent differences?
Are there encodings which are better suited for this type of data?
You could try a higher-order predictor. Your "adjacent difference" is a zeroth-order predictor, where the next sample is predicted to be equal to the last sample. You take the differences between the actuals and the predictions, and then compress those differences.
You can try first, second, etc. order predictors. A first-order predictor would look at the last two samples, draw a line between those, and predict that the next sample will fall on the line. A second-order predictor would look at the last three samples, fit those to a parabola, and predict that the next sample will fall on the parabola. And so on.
Assuming that your samples are equally spaced on your x-axis, the predictors for x[0], up through cubic, are:
x[-1] (what you're using now)
2*x[-1] - x[-2]
3*x[-1] - 3*x[-2] + x[-3]
4*x[-1] - 6*x[-2] + 4*x[-3] - x[-4]
(Note that the coefficients are alternating-sign binomial coefficients.)
I doubt that the cubic polynomial predictor will be useful for you, but experiment with all of them to see if any help.
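As a quick way to experiment, here is a sketch (Python/NumPy is my choice here, not something given in the question) that applies the order-0 through order-3 predictors listed above to a block of samples and reports the mean absolute residual for each, a rough proxy for how compressible the residuals will be:

    import numpy as np

    # Alternating-sign binomial coefficients: prediction = sum(c[i] * x[n-1-i])
    PREDICTORS = {
        0: [1],
        1: [2, -1],
        2: [3, -3, 1],
        3: [4, -6, 4, -1],
    }

    def residuals(samples, order):
        """Residuals (actual - predicted) for a fixed linear predictor."""
        coeffs = PREDICTORS[order]
        k = len(coeffs)
        x = np.asarray(samples, dtype=np.int64)
        pred = np.zeros_like(x)
        for i, c in enumerate(coeffs):
            pred[k:] += c * x[k - 1 - i : len(x) - 1 - i]
        res = x.copy()
        res[k:] -= pred[k:]          # first k samples are kept verbatim
        return res

    # Stand-in for one 512-sample curve, quantized to 24-bit fixed point
    t = np.linspace(0.0, 3.0, 512)
    curve = np.round((np.sin(t) ** 2) * (1 << 23)).astype(np.int64)
    for order in PREDICTORS:
        r = residuals(curve, order)
        print(order, float(np.abs(r[order + 1:]).mean()))

On real data you would pick whichever order gives the smallest residuals per curve (or per block) and record that choice alongside the data.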
Assuming that the differences are small, you should use a variable-length integer to represent them. The idea would be to use one byte for each difference most of the time. For example, you could code seven bits of difference, say -64 to 63, in one byte with the high bit clear. If the difference doesn't fit in that, then make the high bit set, and have a second byte with another seven bits for a total of 14 with that second high bit clear. And so on for larger differences.
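One concrete realization of that byte-oriented scheme is a signed LEB128-style varint; this sketch is my own illustration rather than code from the answer (same 7-payload-bits-per-byte layout, with the high bit as the continuation flag):

    def encode_signed_varint(value):
        """Signed LEB128-style varint: 7 payload bits per byte; a set high bit
        means another byte follows. Values in -64..63 fit in one byte."""
        out = bytearray()
        while True:
            byte = value & 0x7F
            value >>= 7      # arithmetic shift, so negatives sign-extend
            done_positive = value == 0 and not (byte & 0x40)
            done_negative = value == -1 and (byte & 0x40)
            if done_positive or done_negative:
                out.append(byte)
                return bytes(out)
            out.append(byte | 0x80)

    def decode_signed_varint(data):
        """Decode one varint; returns (value, bytes_consumed)."""
        result, shift = 0, 0
        for i, byte in enumerate(data):
            result |= (byte & 0x7F) << shift
            shift += 7
            if not (byte & 0x80):
                if byte & 0x40:              # sign-extend the last payload byte
                    result -= 1 << shift
                return result, i + 1
        raise ValueError("truncated varint")

    for d in (0, 1, -1, 63, -64, 64, -65, 1000, -100000):
        enc = encode_signed_varint(d)
        assert decode_signed_varint(enc)[0] == d
        print(d, len(enc), enc.hex())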

Can float16 data type save compute cycles while computing transcendental functions?

It's clear that float16 can save bandwidth, but can float16 also save compute cycles when computing transcendental functions, like exp()?
If your hardware has full support for it, not just conversion to float32, then yes, definitely. For example, on a GPU, on Intel Alder Lake with AVX-512 enabled, or on Sapphire Rapids.
See Half-precision floating-point arithmetic on Intel chips. Apparently Apple M2 CPUs have it as well.
If you can do two 64-byte SIMD vectors of FMAs per clock on a core, you go twice as fast if that's 32 half-precision FMAs per vector instead of 16 single-precision FMAs.
Speed vs. precision tradeoff: only enough precision for FP16 is needed
Without hardware ALU support for FP16, you can only save cycles by not requiring as much precision, because you know you're eventually going to round to fp16. So you'd use polynomial approximations of lower degree, and thus fewer FMA operations, even though you're computing with float32.
BTW, exp and log are interesting for floating point because the format itself is built around an exponential representation. So you can do an exponential by converting fp->int and stuffing that integer into the exponent field of an FP bit pattern; then, with the fractional part of your FP number, you use a polynomial approximation to get the mantissa of the result. A log implementation is the reverse: extract the exponent field and use a polynomial approximation of log of the mantissa over a range like 1.0 to 2.0.
See
Efficient implementation of log2(__m256d) in AVX2
Fastest Implementation of Exponential Function Using AVX
Very fast approximate Logarithm (natural log) function in C++?
vgetmantps vs andpd instructions for getting the mantissa of float
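To make the exponent-stuffing idea above concrete, here is a rough Python/NumPy sketch of an exp2 approximation; the quadratic coefficients are illustrative placeholders of my own, not taken from any of the linked answers, and a real implementation would use a tuned minimax polynomial:

    import numpy as np

    def fast_exp2(x):
        """Approximate 2**x for float32 input: the integer part of x is
        stuffed into the exponent field of the result, and 2**frac over
        [0, 1) is handled by a small polynomial."""
        x = np.asarray(x, dtype=np.float32)
        xi = np.floor(x)                       # integer part -> exponent bits
        xf = (x - xi).astype(np.float32)       # fractional part in [0, 1)
        # Illustrative quadratic for 2**xf on [0, 1): exact at 0 and 1.
        p = (1.0 + xf * (0.6565 + xf * 0.3435)).astype(np.float32)
        # float32 layout: exponent bias 127, exponent starts at bit 23.
        scale = ((xi.astype(np.int32) + 127) << 23).view(np.float32)  # == 2**xi
        return p * scale                       # valid roughly for -126 < x < 128

    xs = np.linspace(-3.5, 3.5, 8, dtype=np.float32)
    print(fast_exp2(xs))
    print(np.exp2(xs))    # reference

In SIMD code the same steps become a float-to-int conversion, an integer shift into the exponent field, and a bit-cast back to float.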
Normally you do want some FP operations, so I don't think it would be worth trying to use only 16-bit integer operations to avoid unpacking to float32 even for exp or log, which are somewhat special and intimately connected with floating point's significand * 2^exponent format, unlike sin/cos/tan or other transcendental functions.
So I think your best bet would normally still be to start by converting fp16 to fp32, unless you have instructions like AVX-512 FP16 that can do actual FP math on it. But you can gain performance from not needing as much precision, since implementing these functions normally involves a speed vs. precision tradeoff.

What is the optimum precision to use in an arithmetic encoder?

I've implemented an arithmetic coder here - https://github.com/danieleades/arithmetic-coding
I'm struggling to understand a general way to choose an optimal number of bits for representing integers within the encoder. I'm using a model where probabilities are represented as rationals.
I know that to prevent underflows/overflows, the number of bits used to represent integers within the encoder/decoder must be at least 2 bits greater than the maximum number of bits used to represent the denominator of the probabilities.
For example, if I use a maximum of 10 bits to represent the denominator of the probabilities, then to ensure the encoding/decoding works, I need to use at least MAX_DENOMINATOR_BITS + 2 = 12 bits to represent the integers.
If I were to use 32-bit integers to store these values, I would have another 10 bits up my sleeve (right?).
I've seen a couple of examples that use 12 bits for integers and 8 bits for probabilities, with a 32-bit integer type. Is this somehow optimal, or is it just a fairly generic choice?
I've found that increasing the precision above the minimum improves the compression ratio slightly (but it saturates quickly). Given that increasing the precision improves compression, what is the optimum choice? Should I simply aim to maximise the number of bits I use to represent the integers for a given denominator? Performance is a non-goal for my application, in case that's a consideration.
Is it possible to quantify the benefit of moving to, say, a 64-bit internal representation to provide a greater number of precision bits?
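One rough way to put a number on the saturation effect: model the coder as only being able to realize probabilities that are multiples of 2^-k, and measure the per-symbol penalty as the KL divergence between the true distribution and that quantized one. This sketch is entirely my own back-of-the-envelope model, not an analysis of the linked crate:

    import math
    import random

    def overhead_bits_per_symbol(probs, k):
        """Excess bits per symbol if the coder can only realize probabilities
        that are (nonzero) multiples of 2**-k: KL(true || quantized)."""
        scale = 1 << k
        counts = [max(1, round(p * scale)) for p in probs]
        total = sum(counts)
        quantized = [c / total for c in counts]
        return sum(p * math.log2(p / q) for p, q in zip(probs, quantized) if p)

    random.seed(0)
    weights = [random.random() for _ in range(256)]
    total_w = sum(weights)
    probs = [w / total_w for w in weights]
    for k in (10, 12, 16, 20, 24):
        print(k, overhead_bits_per_symbol(probs, k))

Under that model the penalty falls off roughly exponentially with k, which matches the observation that going beyond the minimum precision helps only slightly.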
I've based my implementation on this (excellent) article - https://marknelson.us/posts/2014/10/19/data-compression-with-arithmetic-coding.html

Are there architectures which are not using two's complement for representation of negative values?

The benefits of using two's complement for storing negative values in memory are well known and well discussed on this board.
Hence, I'm wondering:
Do or did any architectures exist that chose a different way of representing negative values in memory than two's complement?
If so: What were the reasons?
Sign-magnitude existed as the most obvious, naive implementation of signed numbers.
One's complement has also been used on real machines.
In both of those representations, there's a benefit that the positive and negative ranges span equal intervals. A downside is that they both contain a negative-zero representation that doesn't naturally occur in the sort of integer arithmetic commonly used in computation. And of course, the hardware for two's complement turns out to be much simpler to build.
Note that the above applies to integers. Common IEEE-style floating point representations are effectively sign-magnitude, with some more details layered into the magnitude representation.
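For a concrete comparison, here is a small Python sketch (my own illustration, not from the answer) that prints the 8-bit patterns a value takes under each of the three representations, and notes where the extra negative-zero patterns come from:

    def sign_magnitude(value, bits=8):
        """High bit is the sign, remaining bits hold the magnitude."""
        mask = (1 << (bits - 1)) - 1
        return (1 << (bits - 1)) | (-value & mask) if value < 0 else value & mask

    def ones_complement(value, bits=8):
        """Negative values are the bitwise inverse of their magnitude."""
        mask = (1 << bits) - 1
        return value & mask if value >= 0 else ~(-value) & mask

    def twos_complement(value, bits=8):
        """Negative values wrap around modulo 2**bits."""
        return value & ((1 << bits) - 1)

    for v in (5, -5, 0):
        print(v, format(sign_magnitude(v), '08b'),
                 format(ones_complement(v), '08b'),
                 format(twos_complement(v), '08b'))

    # 10000000 (sign-magnitude) and 11111111 (ones' complement) are the
    # "negative zero" patterns; two's complement has no such duplicate.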

double or decimal for temperature spread formula

I am looking at writing an application which calculates the temperature spread in a material under an exposure (such as fire) over time.
I then need to draw the result as a heat map on the geometry of the material.
Now I wonder if I should in general go with Decimal or Double for all calculations and drawing. I have looked into both but am still unsure which to use.
I will need to compare values, including interpolated values over time. And double has, as far as I understand it, comparison problems due to its inexact representation.
But decimal is heavy to work with.
I am leaning towards double only, but at the same time a more exact representation and reliable comparison would be worth a lot too.
Any finite representation of decimal numbers is bound to have the same “inexact representation” issues as a finite binary representation. Not all real numbers can be represented in finite space and going to base 10 is not going to help there.
Decimal just uses the same bits less efficiently, when we're talking about representing a physical simulation. On the other hand, a particular implementation of decimal numbers may allow you to use more bits, but multi-precision binary floating-point implementations are available too.
The superstition that decimal floating-point libraries would somehow not be “inexact” or have “comparison problems” is widespread because these libraries can represent decimal numbers, starting with 0.1, exactly. If your simulation involves coefficients that are powers of ten, then decimal would indeed be a good fit. Otherwise, decimal will not solve any of the problems inherent to the finite representation of a continuous range of numbers.
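To make that concrete, a short Python sketch (my own illustration of the point above, not part of the original answer):

    from decimal import Decimal
    import math

    # Binary double: 0.1 has no exact representation, so sums drift.
    print(0.1 + 0.2 == 0.3)                                   # False
    # Decimal: powers of ten are exact, so this particular check passes.
    print(Decimal('0.1') + Decimal('0.2') == Decimal('0.3'))  # True
    # But anything that isn't a finite decimal is inexact in both.
    print(math.sqrt(2) ** 2 == 2)                             # False
    print(Decimal(2).sqrt() ** 2 == Decimal(2))               # False
    # So a simulation needs tolerance-based comparisons either way.
    print(abs(math.sqrt(2) ** 2 - 2) < 1e-9)                  # True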