Long Integer signed Fixed Point to Real convertion SystemVerilog - system-verilog

I need to convert a Long number as Fixed point into a Double rappresentation.
The fixed point math is used into the synthesis process and the Real data type only for validation and simulation.
If I make multiple convertion in chain with multiple datatypes to adjust the format then it is not enough or completely wrong .
In my case with a fixed point mantissa of 44 bit I have 3bit integer+sign bit. Q notation like "sfix_44_48"
As example I am doing this to convert a fixed point integer number into a Real value(getting the number 0.5f ):
logic signed [47:0] fp_number = 48'h0800_0000_0000; // it should be 0.5f
real r_val;
real rr_val;
real rrr_val;
real tmp;
initial
begin
r_val = $itor(fp_number)/(2**44); // doesn't solve the problem.
rr_val = real'{fp_number}/(2**44); // doesn't solve the problem.
$cast(tmp,fp_number>>>44); // doesn't solve the problem
rrr_val = tmp;
end
$itor(...) is limited to 32bit integer part.
As result of above I get zero or NaN, on Modelsim simulation.
No luck during all these convertions.
the SV LRM doesn't seem to have a clear way to do this convertion.
What is the SV workaround to allow simulations to analize data greater than 32bit size easily? please.
C.

You want to use
rr_val = real'(fp_number)/(2.0**44);
Do not use any of the $TtoT functions from Verilog. They have fixed datatype inputs and outputs.
2**44 gets computed as a 32-bit 2-complement value and overflows, giving you 0. You can use 2.0 or real'(2) instead.

thanks to #dave_59 I post this piece of code which show the mess with the convertion.
logic signed [47:0] fp_number = 48'h0800_0000_0000; // it should be 0.5f
logic signed [31:0] fp_number2 = 32'h0800_0000; // it should be 0.5f
real r_val;
real rr_val;
real rrr_val;
real rrrr_val;
initial
begin
$display("48bit fp convertion sfix_44_48");
r_val = real'{fp_number}/(2**44); // doesn't solve the problem (curly braces valid sintax but wrong convertion + wrong convertion on the denominator).
rr_val = real'{fp_number}/(2.0**44); // doesn't solve the problem (curly braces valid sintax but wrong convertion + denominator convertion OK).
rrr_val = real'(fp_number)/(2**44); // doesn't solve the problem (numerator OK + the power operation is not properly converted to a Real number as result).
rrrr_val = real'(fp_number)/(2.0**44); // solve the problem with long integer fixed points convertion (the braces are not curly anymore).
$display("r_val[%08f]",r_val,", rr_val[%08f]",rr_val,", rrr_val[%08f]",rrr_val,", rrrr_val[%08f]",rrrr_val); // it should be 0.5 on the fourth data
$display("32bit fp convertion sfix_28_32");
r_val = real'{fp_number2}/(2**28); // result totally different than previous 48bit operation, doesn't solve the problem (curly braces valid sintax but wrong convertion + wrong convertion on the denominator).
rr_val = real'{fp_number2}/(2.0**28); // doesn't solve the problem (curly braces valid sintax but wrong convertion + denominator convertion OK).
rrr_val = real'(fp_number2)/(2**28); // with a 32bit range it apparently solve the problem (numerator OK + the power operation is OK with this range).
rrrr_val = real'(fp_number2)/(2.0**28); // solve the problem with long integer fixed points convertion (the braces are not curly anymore).
$display("r_val[%08f]",r_val,", rr_val[%08f]",rr_val,", rrr_val[%08f]",rrr_val,", rrrr_val[%08f]",rrrr_val); // it should be 0.5 on the fourth data
end
the only valid convertion at 48bit is the fourth case.
For 32bit the third case is valid and also the fourth case.
First: the classic pow(...) operation must be done with this syntax (2.0**BIT) which will create a Real division and not a integer division using (2**BIT) when a scaling fixed point will be applied.
In this case the operation above is managed as float/double(C style) or real(SystemVerilog)
* Second: the real'() cast operation MUST be used with NO curly braces.
I didn't have a Linting tool to proper check the syntax so I would expect a Warning due to the validity of the operation with the curly braces.
Third: the subdle results are ok if the INTEGER denominator is limited at 32bit operations.
As results shown below:
SIM START.
# 48bit fp convertion sfix_44_48
# ** Error (suppressible): (vsim-8604) ./blocks/sim/test_tb.sv(141): NaN (not a number) resulted from a division operation.
# ** Error (suppressible): (vsim-8630) Infinity results from division operation.
# Time: 0 ps Iteration: 0 Process: /test_tb/#INITIAL#138 File: ./blocks/sim/test_tb.sv Line: 143
# r_val[-1.#IND00], rr_val[0.000000], rrr_val[1.#INF00], rrrr_val[0.500000]
# 32bit fp convertion sfix_28_32
SIM END.
# r_val[0.000000], rr_val[0.000000], rrr_val[0.500000], rrrr_val[0.500000]
The solution is to avoid any not protected casting, like a casting with the braces boundary:
r_val = real'{ byteU,byteH,byteL} / (2.0**44) ; // WRONG
rr_val = real'({byteU,byteH,byteL}) / (2.0**44) ; // CORRECT
If the scaling factor occurs then the operation, generally /, must be done with the same type of operands (real/real).
Unsafe way is (real/long) which leads into a nightmare.

Related

Arithmetic right shift on logic (2 dimensional) signal

I am trying to do a Shift Right, Arithmetic (keep sign) on the signal in.
When I set the value in[0] to 16'hbb00, I expect in_sign_extend[0] to be 16'hf760 after it is signed right shifted. But, I notice that the actual result I see on in_sign_extend[0] is 16'h0680.
localparam CHANNELS = 8;
localparam AXI_M_DATA_WIDTH = 32;
logic signed [0:CHANNELS-1] [AXI_M_DATA_WIDTH/2-1:0] in;
logic signed [0:CHANNELS-1] [AXI_M_DATA_WIDTH/2-1:0] in_sign_extend;
assign in_sign_extend[0] = (in[0] >>> 3);
I am trying to understand if the in is actually correctly signed. Or am I missing something here.
The part select of a packed array is always unsigned, even when selecting the entire array. Only the variable as a whole reference (in) is signed. You can change your code to
assign in_sign_extend[0] = (signed'(in[0]) >>> 3);
Or you might prefer to use an unpacked array
logic signed [AXI_M_DATA_WIDTH/2-1:0] in[CHANNELS];
logic signed [AXI_M_DATA_WIDTH/2-1:0] in_sign_extend[CHANNELS];
assign in_sign_extend[0] = (in[0] >>> 3);

Systemverilog expansion rule

When I review some codes, I found something strange.
It seems that it comes from expansion and operation priority.
(I know that because "sig" is declared with 'signed', $signed is not necessary and '-sig' is correct one, anyway..)
reg signed [9:0] sig;
reg signed [11:0] out;
initial
begin
$monitor ("%0t] sig=%0d, out=%0h", $time, sig, out);
sig = 64;
out = $signed(-sig);
#1
out = -$signed(sig);
#1
sig = -512;
out = $signed(-sig);
#1
out = -$signed(sig);
#1
$finish;
end
Simulation result for above codes is,
0] sig=64, out=-64
2] sig=-512, out=-512
3] sig=-512, out=512
When sig=-512, I expected that 10 bits sig would be expanded to 12bits before negation, but it was expanded after negation.
Therefore negation of -512 was still -512, and after expansion, it had a -512.
I guess "$signed() blocks expansion..Any idea what happens??
First of all, -512 and 512 are identical numbers in 10-bit represenntation. 10 bits can actually only hold signed values from -512 to 511. In this scheme negation of -512 should work weirdly, not mentioned in lrm, at least i was not able to locate anything related. This is probably an undefined behavior.
However, it is logical to assume that in this scheme in order to represent a negated value of '-512' just removing signess is sufficient. It seems that all commercial compilers in eda playground do this. So, a result of the unaray - operator in this case will be unsigned value of 512.
So, in out = $signed(-(-512)) the negation operator returns an unsigned value of 512 and it gets converted to a signed by the system task. Therefore, it gets sign extended in out.
out = -$signed(-512) for the same reason the outermost negation operator returns an unsigned value of 512. No sign extension happens here.
You can again make it signed by enclosing in yet another $signed as out = $signed(-$signed(-512))

What is the point of writing integer in hexadecimal, octal and binary?

I am well aware that one is able to assign a value to an array or constant in Swift and have those value represented in different formats.
For Integer: One can declare in the formats of decimal, binary, octal or hexadecimal.
For Float or Double: One can declare in the formats of either decimal or hexadecimal and able to make use of the exponent too.
For instance:
var decInt = 17
var binInt = 0b10001
var octInt = 0o21
var hexInt = 0x11
All of the above variables gives the same result which is 17.
But what's the catch? Why bother using those other than decimal?
There are some notations that can be way easier to understand for people even if the result in the end is the same. You can for example think in cases like colour notation (hexadecimal) or file permission notation (octal).
Code is best written in the most meaningful way.
Using the number format that best matches the domain of your program, is just one example. You don't want to obscure domain specific details and want to minimize the mental effort for the reader of your code.
Two other examples:
Do not simplify calculations. For example: To convert a scaled integer value in 1/10000 arc minutes to a floating point in degrees, do not write the conversion factor as 600000.0, but instead write 10000.0 * 60.0.
Chose a code structure that matches the nature of your data. For example: If you have a function with two return values, determine if it's a symmetrical or asymmetrical situation. For a symmetrical situation always write a full if (condition) { return A; } else { return B; }. It's a common mistake to write if (condition) { return A; } return B; (simply because 'it works').
Meaning matters!

Converting Double to Integer for Modulo Operation in MATLAB

I am trying to perform a modulo operation in MATLAB, and I'm not sure how to convert the input variable to the correct data type for the modulo operation to complete.
Here is what I have:
sequence = 0;
....
sequence = sequence + 1;
if (modp(sequence, 3) == 0)
....
In C-ish, I'm looking to perform if (sequence % 3 == 0).
MATLAB complains that there is no modp operation for a double, and that I must use an int. However, the documentation doesn't say which integer format I need to use (i.e., int8, int64, et cetera) and none of those integer formats work.
What am I doing wrong?
Did you realize you are using a function of the "symbolic toolbox"? I don't see any advantage in this case thus simply use mod(a,b) from Matlab (there is also a fixed point mod(a,b) and symbolic mod(a,b), don't confuse them)
http://www.mathworks.de/de/help/matlab/ref/mod.html

Floating point bug when writing to file in Matlab?

Im not sure if Im missing something simple but the following code fails (a and b are meant to be the same):
a=single(2147483584)
f=fopen('test','wb');
fwrite(f,a,'int32')
fclose(f);
f=fopen('test','rb');
b=fread(f,inf,'int32');
fclose(f)
a
b
with output:
a =
2.1475e+009
b =
-2.1475e+009
and the following code succeeds:
a=single(2147483583)
f=fopen('test','wb');
fwrite(f,a,'int32')
fclose(f);
f=fopen('test','rb');
b=fread(f,inf,'int32');
fclose(f)
a
b
with output:
a =
2.1475e+009
b =
2.1475e+009
Does anyone know why?
I don't know Matlab well, but it seems fairly clear what's happening here. You're converting a to a float and then storing the result of that conversion as a 32-bit signed integer. But the nearest single-precision IEEE 754 float to the integer 2147483584 is 2147483648.0, or 2**31. A 32-bit integer can only represent values in the range [-2**31, 2**31-1], so it looks as though when you write this value as an integer, it gets wrapped modulo 2**32 to give -2**31 instead of 2**31.
In contrast, the nearest single-precision float to 2147483583 is 2147483520.0, which does fit in a 32-bit integer.