Newton-Raphson Division For Floating Point Divide? - matlab

I am trying to use the Newton-Raphson division algorithm (Wikipedia entry) to implement an IEEE-754 32-bit floating-point divide on a processor which has no hardware divide unit.
My memory locations are 32-bit two's complement words, and I have already implemented floating-point addition, subtraction, and multiplication, so I can reuse that code to implement the Newton-Raphson algorithm. I am trying to prototype it all in Matlab first.
At this step:
X_0 = 48/17 - 32/17 * D
How do I bit-shift D so that it is scaled to between 0.5 and 1, as described in the algorithm details?

You might look at the compiler-rt runtime library (part of LLVM), which has a liberal license and implements floating-point operations for processors that lack hardware support.
You could also look at libgcc, though I believe that's GPL, which may or may not be an issue for you.
In fact, don't just look at them. Use one of them (or another soft-float library). There's no need to re-invent the wheel.
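That said, the scaling step the question asks about, plus the iteration itself, can be sketched in C. This is a rough illustration under the assumption that the divisor is a normal, positive float; the helper names are mine, not from any library:

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Scale a normal, positive float D into [0.5, 1) by rewriting its
 * biased exponent field to 126. Stores the power of two removed in
 * *shift, so that D == m * 2^shift with m in [0.5, 1). */
static float scale_to_half_one(float d, int *shift)
{
    uint32_t bits;
    memcpy(&bits, &d, sizeof bits);
    int exp = (int)((bits >> 23) & 0xFF);       /* biased exponent */
    *shift = exp - 126;                         /* 126 puts m in [0.5, 1) */
    bits = (bits & ~(0xFFu << 23)) | (126u << 23);
    float m;
    memcpy(&m, &bits, sizeof m);
    return m;
}

/* Newton-Raphson reciprocal: X_{i+1} = X_i * (2 - m * X_i).
 * The initial guess 48/17 - 32/17 * m bounds the error by 1/17, so
 * three quadratically converging iterations suffice for single
 * precision. The quotient is then N * nr_reciprocal(D). */
static float nr_reciprocal(float d)
{
    int shift;
    float m = scale_to_half_one(d, &shift);
    float x = 48.0f / 17.0f - (32.0f / 17.0f) * m;
    for (int i = 0; i < 3; i++)
        x = x * (2.0f - m * x);
    return ldexpf(x, -shift);   /* undo scaling: 1/D = (1/m) * 2^-shift */
}
```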

Related

Can float16 data type save compute cycles while computing transcendental functions?

It's clear that float16 can save bandwidth, but can float16 also save compute cycles while computing transcendental functions, like exp()?
If your hardware has full support for it, not just conversion to float32, then yes, definitely. e.g. on a GPU, or on Intel Alder Lake with AVX-512 enabled, or Sapphire Rapids.
See Half-precision floating-point arithmetic on Intel chips. Apparently Apple M2 CPUs support it as well.
If you can do two 64-byte SIMD vectors of FMAs per clock on a core, you go twice as fast if that's 32 half-precision FMAs per vector instead of 16 single-precision FMAs.
Speed vs. precision tradeoff: only enough for FP16 is needed
Without hardware ALU support for FP16, you can only save cycles by not requiring as much precision, because you know you're eventually going to round to fp16. So you'd use polynomial approximations of lower degree, thus fewer FMA operations, even though you're computing with float32.
BTW, exp and log are interesting for floating point because the format itself is built around an exponential representation. So you can do an exponential by converting fp->int and stuffing that integer into the exponent field of an FP bit pattern. Then you use a polynomial approximation on the fractional part of your FP number to get the mantissa of the result. A log implementation is the reverse: extract the exponent field and use a polynomial approximation of log of the mantissa, over a range like 1.0 to 2.0.
See
Efficient implementation of log2(__m256d) in AVX2
Fastest Implementation of Exponential Function Using AVX
Very fast approximate Logarithm (natural log) function in C++?
vgetmantps vs andpd instructions for getting the mantissa of float
Normally you do want some FP operations, so I don't think it would be worth trying to use only 16-bit integer operations to avoid unpacking to float32 even for exp or log, which are somewhat special and intimately connected with floating point's significand * 2^exponent format, unlike sin/cos/tan or other transcendental functions.
So I think your best bet would normally still be to start by converting fp16 to fp32, unless you have instructions like AVX-512 FP16 that can do actual FP math on it. But you can gain performance from not needing as much precision, since implementing these functions normally involves a speed vs. precision tradeoff.

How to resolve Underflow (/Overflow) issues in Fixed-Point tool of MATLAB Simulink?

I am working on a Simulink model which involves Floating-point data types. So using the Fixed-Point tool available in Simulink, I am trying to convert my floating-point system to a fixed-point one. I am following the tutorial available here to achieve the conversion.
Link to the tutorial on converting the floating-point system to the fixed point
In the data type proposing step, I got underflow values for some of the variables. My question is how to bring those underflowing values into range as well. Or can I ignore them and proceed with the further steps? In general, how does one tackle this type of underflow/overflow issue?
Using fixed-point arithmetic can be faster and use less resources than floating-point arithmetic, but a significant disadvantage is that underflow and overflow are not handled gracefully. If you try to detect and recover from these conditions you will lose much of the advantage provided by fixed-point.
In practice, you should select a fixed-point format for your variables that provides enough bits for the integer part (the bits to the left of the radix point) so that overflow cannot occur. This requires careful analysis of your algorithms and the potential ranges of all variables. Your format should also provide enough fraction bits (to the right of the radix point) so that underflows do not cause significant problems with your algorithm.
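For example, a hypothetical Q16.16 format in C (16 integer bits, 16 fraction bits; the type and helper names are mine) makes both hazards concrete: products of in-range values can overflow the integer part, and small values silently underflow to zero:

```c
#include <assert.h>
#include <stdint.h>

/* Q16.16 signed fixed point: integer range roughly -32768..+32768,
 * resolution 2^-16 (about 1.5e-5). */
typedef int32_t q16_16;

#define Q_ONE (1 << 16)

static q16_16 q_from_double(double x) { return (q16_16)(x * Q_ONE); }
static double q_to_double(q16_16 x)   { return (double)x / Q_ONE; }

/* Multiplication needs a wider intermediate; even so, the final result
 * must fit in 16 integer bits. 200.0 * 200.0 = 40000.0 would not fit,
 * which is exactly what range analysis has to rule out up front. */
static q16_16 q_mul(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * b) >> 16);
}
```

Values below the resolution, such as 1e-6, quantize to exactly zero on conversion, which is the kind of underflow the fixed-point tool reports.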

Can CPU(like Intel/AMD/ARM) do higher math computation except for addition/subtraction/multiplication/division?

I have been learning about logic circuits and computer architecture, including assembly instruction sets (such as the x86 and ARM instruction sets) and microarchitectures (x86/ARM). I found that both Intel and ARM processors can only do addition, subtraction, multiplication, and division in hardware, because they only have adders, subtracters, multipliers, and dividers as basic computation units.
But do these processors support more advanced math, like trigonometric functions, exponential functions, power functions, and these functions' derivatives/definite integrals? Even matrix computation?
I know these advanced computations can be done in software (like Python's NumPy/SciPy), but can Intel/ARM processors support these advanced math computations in hardware, just like addition/subtraction/multiplication/division?
Generally speaking, you can build hardware structures to help accelerate the calculation of things such as trigonometric functions. However, in practice, it's pointless, because it's not a good use of hardware resources.
There is a paper from 1983 on how trigonometric functions were implemented on the 8087 floating-point co-processor (Implementation of transcendental functions on a numerics processor). Even there, they rely on a CORDIC implementation, which is a method of calculating trig functions using relatively basic hardware (add/sub/shift/table look-up). You can read more about CORDIC implementations in the following paper: Evaluating Elementary Functions in a Numerical Coprocessor Based on Rational Approximations
On a modern x86 processor, complex instructions like FCOS are implemented in microcode. Intel doesn't like to talk about their microcoded instructions, but there is a paper from AMD that describes this particular use of microcode: The K5 Transcendental Functions
Intel processors do support trigonometric and many advanced computations
According to "Professional Assembly Language" by Richard Blum:
Since the 80486, the Intel IA-32 platform has directly supported floating-point operations.
The FPU (Floating Point Unit) supports many advanced functions beyond simple add, subtract, multiply, and divide.
They are
Absolute value FABS
Change sign FCHS
Cosine FCOS
Partial Tangent FPTAN
etc.

Does unary minus just change sign?

Consider for example the following double-precision numbers:
x = 1232.2454545e-89;
y = -1232.2454545e-89;
Can I be sure that y is always exactly equal to -x (or Matlab's uminus(x))? Or should I expect small numerical differences of the order of eps, as often happens with numerical computations? Try for example sqrt(3)^2-3: the result is not exactly zero. Can that happen with unary minus as well? Is it lossy like square root is?
Another way to put the question would be: is a negative numerical literal always equal to negating its positive counterpart?
My question refers to Matlab, but probably has more to do with the IEEE 754 standard than with Matlab specifically.
I have done some tests in Matlab with a few randomly selected numbers. I have found that, in those cases,
They turn out to be equal indeed.
typecast(x, 'uint8') and typecast(-x, 'uint8') differ only in the sign bit as defined by IEEE 754 double-precision format.
This suggests that the answer may be affirmative. If applying unary minus only changes the sign bit, and not the significand, no precision is lost.
But of course I have only tested a few cases. I'd like to be sure this happens in all cases.
This question is computer architecture dependent. However, the sign of floating point numbers on modern architectures (including x64 and ARM cores) is represented by a single sign bit, and they have instructions to flip this bit (e.g. FCHS). That being the case, we can draw two conclusions:
A change of sign can be achieved (and indeed is by modern compilers and architectures) by a single bit flip/instruction. This means that the process is completely invertible, and there is no loss of numerical accuracy.
It would make no sense for MATLAB to do anything other than the fastest, most accurate thing, which is just to flip that bit.
That said, the only way to be sure would be to inspect the assembly code for uminus in your MATLAB installation. I don't know how to do this.
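One indirect check, outside MATLAB: in C you can flip the sign bit yourself and compare bit-for-bit against the compiler's unary minus. The function name is mine; the constant 0x8000000000000000 is bit 63, the IEEE-754 double sign bit:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Flip only the sign bit of a double, leaving exponent and
 * significand untouched. */
static double flip_sign(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits ^= 0x8000000000000000ull;   /* sign bit only */
    double y;
    memcpy(&y, &bits, sizeof y);
    return y;
}
```

For every finite value, the result is bit-identical to -x, and applying it twice recovers the original exactly, confirming that negation is lossless.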

Fixed point in Matlab

Can someone please explain this?
As I understand it, fixed point provides less precision. Is it a speed-up that one wishes to get by using it? When is it good to use? Should I use it in Matlab Coder?
Not all the computers in the world use floating-point arithmetic. In particular, many devices which have a connection to the world (such as sensors and the computers which process their data) use fixed-point representations of numbers. Some researchers into algorithms and similar matters also want to use fixed-point numbers. Matlab's fixed-point toolbox allows its users to do fixed-point arithmetic on their PCs, and to write code targeted at execution on devices which implement it.
It's not (necessarily) true that the Matlab fixed-point arithmetic provides less precision, it can be used to provide more precision than IEEE floating-point types.
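For example, a 64-bit format with 62 fraction bits (a C sketch; the "Q2.62" name is mine, though Matlab's fixed-point objects can be configured with comparably long fraction lengths) resolves values near 1.0 that double cannot:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical signed fixed point with 62 fraction bits ("Q2.62").
 * Near 1.0 its resolution is 2^-62, finer than double, whose 52-bit
 * significand gives steps of 2^-52 there. */
typedef int64_t q2_62;

#define Q62_ONE (1LL << 62)

/* 1 + 2^-60 is exactly representable here: 2^-60 in Q2.62 is the
 * integer 2^(62-60) = 4. In double, 1.0 + 2^-60 rounds back to 1.0. */
static q2_62 one_plus_tiny(void)
{
    return Q62_ONE + (1LL << 2);
}
```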
Is it a speed-up? That's beside the point. (Read on.)
When is it good to use? When you need to use fixed-point arithmetic. I'm not sure anyone would recommend it as a general-purpose replacement for floating-point arithmetic.
Should you use it? Your question suggests that the answer is almost certainly 'No; you would already know it if you ought to be using fixed-point arithmetic.'