Representation of fixed point number in systemverilog - system-verilog

How should I represent a fixed-point number in SystemVerilog, since it doesn't support fixed-point types for reg and logic? Is using the real data type the correct method, or can we use a different data type?
I am trying to write a square root function in SystemVerilog whose result is fixed point (FP), e.g. sqrt(8) = 2.82.
What should the data type of my inputs and outputs (sqrt) be so that I can check the decimal places correctly while verifying?

You use integral types for fixed point numbers. Some people will index their variables like
logic [M-1:-F] fp_number; // M-bits integer, F bit fractional
But it is up to you to align the binary point when adding numbers of different formats, and to adjust it after multiplication and division. There are some OpenCores libraries that implement many of these operations for you.
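As a rough illustration of the bookkeeping described above, here is a minimal Python reference model of the kind one might use to check a fixed-point sqrt in a testbench. The choice of 8 fractional bits and the helper names (to_fixed, to_float, fx_add, fx_mul, fx_sqrt) are assumptions made for this sketch, not part of any library.

```python
from math import isqrt

F = 8  # number of fractional bits (assumed for this example)

def to_fixed(x: float, frac_bits: int = F) -> int:
    """Scale a real value by 2**frac_bits and round to the nearest integer."""
    return round(x * (1 << frac_bits))

def to_float(q: int, frac_bits: int = F) -> float:
    """Convert the stored integer back to a real value."""
    return q / (1 << frac_bits)

def fx_add(a: int, a_frac: int, b: int, b_frac: int) -> tuple[int, int]:
    """Add two numbers with different fractional widths by aligning the binary point."""
    frac = max(a_frac, b_frac)
    return (a << (frac - a_frac)) + (b << (frac - b_frac)), frac

def fx_mul(a: int, b: int, frac_bits: int = F) -> int:
    """The raw product has 2*frac_bits fractional bits, so shift back down."""
    return (a * b) >> frac_bits

def fx_sqrt(a: int, frac_bits: int = F) -> int:
    """sqrt(a / 2**F) * 2**F equals sqrt(a * 2**F), so pre-shift then take an integer sqrt."""
    return isqrt(a << frac_bits)

root = fx_sqrt(to_fixed(8.0))
print(root, to_float(root))  # 724 2.828125
```

With 8 fractional bits the result for sqrt(8) is 724/256 = 2.828125, so the resolution (1/256) sets how many decimal places can be checked meaningfully.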

Related

SystemVerilog: Data types and display of default size of data type

How can I display the size of a 'real' (or 'float') in system verilog?
$bits can display size of int, shortint, longint, time, integer, etc. but cannot do the same for a real.
You cannot select individual bits of a real number, nor is there any other construct that requires knowing the number of bits in a real number. So SystemVerilog does not need to provide a way to tell you.
real is not a synthesizable Verilog type. It is intended for testbenches or for analog calculations, not for design, so it has no bit size associated with it.
However, from the LRM:
The real data type is the same as a C double. The shortreal data type is the same as a C float. The
realtime declarations shall be treated synonymously with real declarations and can be used
interchangeably. Variables of these three types are collectively referred to as real variables.
And there is a function which converts real to bits:
$realtobits converts values from a real type to a 64-bit vector representation of the real number.
and corresponding
$bitstoreal converts a bit pattern created by $realtobits to a value of the real type
So, you can assume that the size of real is 64 bits after conversion to bits.
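Since the LRM text quoted above equates real with a C double, the same 64-bit view can be reproduced outside the simulator. The sketch below uses Python's struct module as an analogy to $realtobits and $bitstoreal; the function names real_to_bits and bits_to_real are hypothetical and chosen only to mirror the system tasks.

```python
import struct

def real_to_bits(x: float) -> int:
    """Reinterpret an IEEE 754 double as a 64-bit unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def bits_to_real(bits: int) -> float:
    """Reinterpret a 64-bit pattern as an IEEE 754 double."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

packed = struct.pack(">d", 2.82)
print(len(packed) * 8)          # 64
print(hex(real_to_bits(1.0)))   # 0x3ff0000000000000
```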

Simulink data types

I'm reading an IMU on an Arduino board with an S-function block in Simulink using double or single data types, though I only need two decimals of precision (as in "xyz.ab"). I want to improve performance by changing the data types and wonder:
Is there a way to reduce the precision to two decimals in the S-function block, or by adding any other conversion blocks or code in Simulink, aside from using the fixed-point tool?
For true fixed point transfer, fixed-point toolbox is the most general answer, as stated in Phil's comment.
However, to avoid toolbox use, you could also devise your own fixed-point integer format and add a block that takes a floating point input and converts it into an integer format (and vice versa on the output).
E.g. if you know the range is -327.68 <= var <= 327.67, you could just define your float as an int16 divided by 100. In a MATLAB function block you would then just say
y=int16(u*100.0);
to convert the input to the S-function.
On the output it would be a reversal
y=double(u)/100.0;
(Eml/matlab function code can be avoided by using multiply, divide and convert blocks.)
However, be mindful of the bits available and that the scaling (*,/) operations are done on the floating point value rather than the integer.
2^(nrOfBits-1)-1 gives the largest value you can represent with a signed type. For the unsigned types uint8/16/32 the largest value is 2^(nrOfBits)-1. You then use the scaling to fit the representable integer range to the floating point range you actually use. The scaled range divided by 2^nrOfBits tells you what the resolution will be (how large the steps are).
You will need to scale the variables correspondingly on the Arduino side as well when you go to an integer interface of this type. (I'm assuming you have access to that code - if not it'd be hard to use any other interface than what is already provided)
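Note that the integer conversion behaves differently depending on where it runs: MATLAB's intXX(doubleVar*scale) rounds to the nearest integer, whereas a C-style cast on the Arduino side truncates toward zero. To make the rounding explicit and consistent, include the round function, e.g.: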
int16(round(doubleVar*scale));
You don't need to use a base 10 scale, any scaling and offsets can be used, but it's easier to make out numbers manually if you keep to base 10 (i.e. 0.1 10.0 100.0 1000.0 etc.).
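The scaling scheme described above can be sketched in Python as follows. The names encode and decode, the fixed scale of 100 and the saturation step are assumptions for illustration; the rounding helper rounds halves away from zero, which is how MATLAB's round behaves.

```python
import math

INT16_MIN, INT16_MAX = -32768, 32767
SCALE = 100.0  # two decimals of precision (assumed scale)

def round_half_away(x: float) -> int:
    """Round to nearest, ties away from zero (Python's built-in round uses ties-to-even)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))

def encode(value: float) -> int:
    """Scale in floating point first, then round and saturate to the int16 range."""
    raw = round_half_away(value * SCALE)
    return max(INT16_MIN, min(INT16_MAX, raw))

def decode(raw: int) -> float:
    """Reverse the scaling on the receiving side."""
    return raw / SCALE

resolution = 1.0 / SCALE                      # step size: 0.01
value_range = (INT16_MIN / SCALE, INT16_MAX / SCALE)  # (-327.68, 327.67)
print(encode(123.45), decode(encode(123.45)), value_range)
```

Values outside the representable range are clamped here rather than wrapped, so an out-of-range reading shows up as a stuck value at the limit instead of a sign flip.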
As a final note, if the Arduino code interface is floating point (single/double) and can't be changed to an integer type, you will not get any speedup from rounding decimals, since the full floating point value is what will be transferred anyway. Even if you do manage to reduce the data a bit using integers, I suspect this might not give a huge speedup unless you transfer large amounts of data. The interface code will have a comparatively large overhead anyway.
Good luck with your project!

How to tell if two numbers are really different or they are actually the same due to floating point error

For example,
0.168033639538270
and
0.168033639538270
are two double type numbers that are from two different calculations (some further calculations from the eigenvalues of a matrix).
But they are treated as different by MATLAB (by unique or ==). How do I know whether MATLAB treats them as different because of floating point error (eps = 2.220446049250313e-16), or whether they are actually different (the digits after the first 15 are not the same, but MATLAB just does not display them)? Sometimes MATLAB treats two numbers with the same displayed value as equal and sometimes as different, so I want to know if they are really different.
You can print a formatted version of each number at the required precision using sprintf, and then compare the two strings using strcmp.
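The same format-then-compare idea can be shown in Python with two doubles that look identical at 15 significant digits but are not bit-for-bit equal. The values 0.1 + 0.2 and 0.3 are stand-ins chosen for this sketch, and the tolerance passed to math.isclose is an arbitrary assumption; 17 significant digits are enough to distinguish any two different doubles.

```python
import math

a = 0.1 + 0.2   # result of one calculation
b = 0.3         # result of another calculation

# Formatting to 15 significant digits hides the difference
same_at_15 = f"{a:.15g}" == f"{b:.15g}"

# Formatting to 17 significant digits exposes it
same_at_17 = f"{a:.17g}" == f"{b:.17g}"

# The hexadecimal form shows the exact stored value
bits_equal = a.hex() == b.hex()

# Comparing with a tolerance treats rounding-level differences as equal
close = math.isclose(a, b, rel_tol=1e-12)

print(same_at_15, same_at_17, bits_equal, close)  # True False False True
```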

Irrational number representation in computer

We can write a simple Rational Number class using two integers representing A/B with B != 0.
If we want to write an irrational number class (for storing and computing), the first thing that comes to mind is to use floating point, which means using the IEEE 754 standard (binary fractions). This is because an irrational number must be approximated.
Is there another way to write an irrational number class other than using binary fractions (whether or not it conserves memory space)?
I studied jsbeuno's solution using Python: Irrational number representation in any programming language?
He's still using the built-in floating point to store.
This is not homework.
Thank you for your time.
By a cardinality argument, there are many more irrational numbers than rational ones (and the number of IEEE 754 floating point numbers is finite, at most 2^64 for the 64-bit format).
You can represent numbers with something else than fractions (e.g. logarithmically).
jsbeuno is storing the number as a base and a radix and using those when doing calcs with other irrational numbers; he's only using the float representation for output.
If you want to get fancier, you can define the base and the radix as rational numbers (with two integers) as described above, or make them themselves irrational numbers.
To make something thoroughly useful, though, you'll end up replicating a symbolic math package.
You can always use symbolic math, where items are stored exactly as they are and calculations are deferred until they can be performed with precision above some threshold.
For example, say you performed two operations on a rational number like 2, one to take the square root and then one to square that. With limited precision, you may get something like:
(√2)²
= 1.414213562²
= 1.999999999
However, storing symbolic math would allow you to store the result of √2 as √2 rather than an approximation of it, then realise that (√x)² is equivalent to x, removing the possibility of error.
Now that obviously involves a more complicated encoding than simple IEEE 754, but it's not impossible to achieve.
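A very small sketch of that idea in Python: the class Surd below is hypothetical and only handles values of the form coeff * sqrt(radicand), which is enough to show (√2)² coming out exactly. A real symbolic package covers far more cases.

```python
from dataclasses import dataclass
from fractions import Fraction
import math

@dataclass(frozen=True)
class Surd:
    """Exact value coeff * sqrt(radicand); illustrative only."""
    coeff: Fraction
    radicand: int

    def __mul__(self, other: "Surd"):
        if self.radicand == other.radicand:
            # sqrt(x) * sqrt(x) == x, so the product is an exact rational
            return self.coeff * other.coeff * self.radicand
        # Otherwise stay symbolic: sqrt(x) * sqrt(y) == sqrt(x * y)
        return Surd(self.coeff * other.coeff, self.radicand * other.radicand)

    def approx(self) -> float:
        """Only approximate when a floating point value is finally needed."""
        return float(self.coeff) * math.sqrt(self.radicand)

root2 = Surd(Fraction(1), 2)
exact = root2 * root2          # exact rational 2
rounded = math.sqrt(2) ** 2    # floating point result, not exactly 2
print(exact, rounded)
```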

Arbitrary precision Float numbers on JavaScript

I have some inputs on my site representing floating point numbers with up to ten digits of precision (in decimal). At some point, in the client-side validation code, I need to compare a couple of those values to see if they are equal or not, and here, as you would expect, the intricacies of IEEE 754 make that simple check fail with things like (2.0000000000==2.0000000001) = true.
I may break the floating point number in two longs for each side of the dot, make each side a 64 bit long and do my comparisons manually, but it looks so ugly!
Is there any decent JavaScript library to handle arbitrary (or at least guaranteed) precision float numbers?
Thanks in advance!
PS: A GWT based solution has a ++
There is the GWT-MATH library at http://code.google.com/p/gwt-math/.
However, I warn you, it's a GWT jsni overlay of a java->javascript automated conversion of java.BigDecimal (actually the old com.ibm.math.BigDecimal).
It works, but speedy it is not. (Nor lean. It will pad on a good 70k into your project).
At my workplace, we are working on a fixed point simple decimal, but nothing worth releasing yet. :(
Use an arbitrary precision integer library such as silentmatt’s javascript-biginteger, which can store and calculate with integers of any arbitrary size.
Since you want ten decimal places, you’ll need to store the value n as n×10^10. For example, store 1 as 10000000000 (ten zeroes), 1.5 as 15000000000 (nine zeroes), etc. To display the value to the user, simply place a decimal point in front of the tenth-last character (and then cut off any trailing zeroes if you want).
Alternatively you could store a numerator and a denominator as bigintegers, which would then allow you arbitrarily precise fractional values (but beware – fractional values tend to get very big very quickly).
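The scaled-integer scheme (storing n as n×10^10) can be sketched in Python, whose built-in int is already arbitrary precision and so stands in for a big-integer library here. The function names to_scaled and to_text are assumptions for this example.

```python
SCALE_DIGITS = 10  # ten decimal places, as in the question

def to_scaled(text: str) -> int:
    """Parse a decimal string such as '2.0000000001' into an integer scaled by 10**10."""
    sign = -1 if text.startswith("-") else 1
    whole, _, frac = text.lstrip("+-").partition(".")
    if len(frac) > SCALE_DIGITS:
        raise ValueError("more than ten fractional digits")
    frac = frac.ljust(SCALE_DIGITS, "0")  # pad to exactly ten fractional digits
    return sign * (int(whole or "0") * 10**SCALE_DIGITS + int(frac))

def to_text(value: int) -> str:
    """Place the decimal point before the tenth-last digit and trim trailing zeroes."""
    sign = "-" if value < 0 else ""
    digits = str(abs(value)).rjust(SCALE_DIGITS + 1, "0")
    whole, frac = digits[:-SCALE_DIGITS], digits[-SCALE_DIGITS:].rstrip("0")
    return sign + whole + ("." + frac if frac else "")

print(to_scaled("1"), to_scaled("1.5"))                          # 10000000000 15000000000
print(to_scaled("2.0000000000") == to_scaled("2.0000000001"))    # False
print(to_text(to_scaled("0.25")))                                # 0.25
```

Parsing from the input string rather than from a float avoids introducing a binary rounding error before the comparison is made.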