The best way to convert a real valued number to an integer value larger than 32 bits? - system-verilog

Are there any system functions that support more than 32 bits in System Verilog? I want to convert a real-valued quantity to an integral value that contains more than 32 bits. The $rtoi() system function does precisely what I want for values that can be represented in 32 bits or less. Is there a built-in for this, or would I need to write my own?
For a concrete example, I would like to be able to do something like the following:
logic [41:0] test_value;
initial begin
test_value = $rtoi($pow(2.0, 39.5));
end
Where, instead of $rtoi(), I would use the unknown sought after system function. Given the correct function, I would expect this to result in test_value being initialized with the bit sequence 42'b1011010100000100111100110011001111111001 or possibly 42'b1011010100000100111100110011001111111010 if rounding is supported.
I can write my own function, but I would like to avoid reinventing the wheel unless there is no wheel.

A implicit cast from real to integral gives you what you want with rounding
test_value = 2.0**39.5;
Or you can use an explicit cast
typedef logic [41:0] uint42_t;
test_value = uint42_t'(2.0**39.5);

Related

How to convert values to N bits of resolution in MATLAB?

My computer uses 32 bits of resolution as default. I'm writing a script that involves taking measurements with a multimeter that has N bits of resolution. How do I convert the values to that?
For example, if I have a RNG that gives 1000 values
nums = randn(1,1000);
and I use an N-bit multimeter to read those values, how would I get the values to reflect that?
I currently have
meas = round(nums,N-1);
but it's giving me N digits, not N bits. The original random numbers are unbounded, but the resolution of the multimeter is the limitation; how to implement the limitation is what I'm looking for.
Edit I: I'm talking about the resolution of measurement, not the bounds of the numbers. The original values are unbounded. The accuracy of the measured values should be limited by the resolution.
Edit II: I revised the question to try to be a bit clearer.
randn doesn’t produce bounded numbers. Let’s say you are producing 32-bit integers instead:
mums = randi([0,2^32-1],1,n);
To drop the bottom 32-N bits, simply divide by an appropriate value and round (or take the floor):
nums = round(nums/(2^(32-N)));
Do note that we only use floating-point arithmetic here, numbers are integer-valued, but not actually integers. You can do a similar operation using actual integers if you need that.
Also, obviously, N should be lower than 32. You cannot invent new bits. If N is larger, the code above will add zero bits at the bottom of the number.
With a multimeter, it is likely that the range is something like -M V to M V with a a constant resolution, and you can configure the M selecting the range.
This is fixed point math. My answer will not use it because I don't have the toolbox available, if you have it you could use it to have simpler code.
You can generate the integer values with the intended resolution, then rescale it to the intended range.
F=2^N-1 %Maximum integer value
X=randi([0,F],100,1)
X*2*M/F-M %Rescale, divide by the integer range, multiply by the intended range. Then offset by intended minimum.

Are certain MATLAB functions only precise to a certain decimal? How precise is MATLAB really?

I am converting a program from MATLAB 2012 to 2016. I've been getting some strange errors, which I believe some of are due to a lack of precision in MATLAB functions.
For instance, I have a timeseries oldTs as such:
Time Data
-----------------------------
1.00000000000000001 1.277032377439511
1.00000000000000002 1.277032378456123
1.00000000000000003 1.277032380112478
I have another timeseries newTs with similar data, but many more rows. oldTs may have half a million rows, whereas newTs could have a million. I want to interpolate the data from the old timeseries with the new timeseries, for example:
interpolatedTs = interp(oldTs.time, oldTs.data, newTs.time)
This is giving me an error: x values must be distinct
The thing is, my x values are distinct. I think that MATLAB may be truncating some of the data, and therefore believing that some of the data is not unique. I found that other MATLAB functions do this:
test = [1.00000000000000001, 1.00000000000000002, 1.0000000000000000003]
unique(test)
ans =
1
test2 = [10000000000000000001, 10000000000000000002, 10000000000000000003]
unique(test2)
ans =
1.000000000000000e+19
MATLAB thinks that this vector only has one unique value in it instead of three! This is a huge issue for me, as I need to maintain the highest level of accuracy and precision with my data, and I cannot sacrifice any of that precision. Speed/Storage is not a factor.
Do certain MATLAB functions, by default, truncate data at a certain nth decimal? Has this changed from MATLAB 2012 to MATLAB 2016? Is there a way to force MATLAB to use a certain precision for a program? Why does MATLAB do this to begin with?
Any light shed on this topic is much appreciated. Thanks.
No, this has not changed since 2012, nor since the very first version of MATLAB. MATLAB uses, and has always used, double precision floating point values by default (8 bytes). The first value larger than 1 that can be represented is 1 + eps(1), with eps(1) = 2.2204e-16. Basically you have less than 16 decimal digits to play with. Your value 1.00000000000000001 is identical to 1 in double precision floating point representation.
Note that this is not something specific to MATLAB, it is a standard that your hardware conforms to. MATLAB simply uses your hardware's capabilities.
Use the variable precision arithmetic from the Symbolic Math Toolbox to work with higher precision numbers:
data = [vpa(1) + 0.00000000000000001
vpa(1) + 0.00000000000000002
vpa(1) + 0.00000000000000003]
data =
1.00000000000000001
1.00000000000000002
1.00000000000000003
Note that vpa(1.00000000000000001) will not work, as the number is first interpreted as a double-precision float value, and only after converted to VPA, but the damage has already been done at that point.
Note also that arithmetic with VPA is a lot slower, and some operations might not be possible at all.

Variable precicion arithmetic for symbolic integral in Matlab

I am trying to calculate some integrals that use very high power exponents. An example equation is:
(-exp(-(x+sqrt(p)).^2)+exp(-(x-sqrt(p)).^2)).^2 ...
./( exp(-(x+sqrt(p)).^2)+exp(-(x-sqrt(p)).^2)) ...
/ (2*sqrt(pi))
where p is constant (1000 being a typical value), and I need the integral for x=[-inf,inf]. If I use the integral function for numeric integration I get NaN as a result. I can avoid that if I set the limits of the integration to something like [-20,20] and a low p (<100), but ideally I need the full range.
I have also tried setting syms x and using int and vpa, but in this case vpa returns:
1.0 - 1.0*numeric::int((1125899906842624*(exp(-(x - 10*10^(1/2))^2) - exp(-(x + 10*10^(1/2))^2))^2)/(3991211251234741*(exp(-(x - 10*10^(1/2))^2) + exp(-(x + 10*10^(1/2))^2)))
without calculating a value. Again, if I set the limits of the integration to lower values I do get a result (also for low p), but I know that the result that I get is wrong – e.g., if x=[-100,100] and p=1000, the result is >1, which should be wrong as the equation should be asymptotic to 1 (or alternatively the codomain should be [0,1) ).
Am I doing something wrong with vpa or is there another way to calculate high precision values for my integrals?
First, you're doing something that makes solving symbolic problems more difficult and less accurate. The variable pi is a floating-point value, not an exact symbolic representation of the fundamental constant. In Matlab symbolic math code, you should always use sym('pi'). You should do the same for any other special numeric values, e.g., sqrt(sym('2')) and exp(sym('1')), you use or they will get converted to an approximate rational fraction by default (the source of strange large number you see in the code in your question). For further details, I recommend that you read through the documentation for the sym function.
Applying the above, here's a runnable example:
syms x;
p = 1000;
f = (-exp(-(x+sqrt(p)).^2)+exp(-(x-sqrt(p)).^2)).^2./(exp(-(x+sqrt(p)).^2)...
+exp(-(x-sqrt(p)).^2))/(2*sqrt(sym('pi')));
Now vpa(int(f,x,-100,100)) and vpa(int(f,x,-1e3,1e3)) return exactly 1.0 (to 32 digits of precision, see below).
Unfortunately, vpa(int(f,x,-Inf,Inf)), does not return an answer, but a call to the underlying MuPAD function numeric::int. As I explain in this answer, this is what can happen when int cannot obtain a result. Normally, it should try to evaluate the the integral numerically, but your function appears to be ill-defined at ±∞, resulting in divide by zero issues that the variable precision quadrature methods can't handle well. You can evaluate the integral at wider bounds by increasing the variable precision using the digits function (just remember to set digits back to the default of 32 when done). Setting digits(128) allowed me to evaluate vpa(int(f,x,-1e4,1e4)). You can also more efficiently evaluate your integral over a wider range via 2*vpa(int(f,x,0,1e4)) at lower effective digits settings.
If your goal is to see exactly how much less than one p = 1000 corresponds to, you can use something like vpa(1-2*int(f,x,0,1e4)). At digits(128), this returns
0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000086457415971094118490438229708839420392402555445545519907545198837816908450303280444030703989603548138797600750757834260181259102
Applying double to this shows that it is approximately 8.6e-89.

how I must use digits function in matlab

i have code and use double function several time to convert sym to double.to increase precision , I want to use digits function.
I want to know it is enough that I write digits in the top of code or I must write digits in above of every double function.
digits set's the precision until it is changed again. Calling digits() without any input you get the precision to verify it's set correct.
In many cases digis has absoluetly no influence on symbolic variables because an analytical solution is found. This means there are no precision errors unless you convert to double. When convertig, digits should be set to at least 16 because this matches double precision.

fortran90 reading array with real numbers

I have a list of real data in a file. The real data looks like this..
25.935
25.550
24.274
29.936
23.122
27.360
28.154
24.320
28.613
27.601
29.948
29.367
I write fortran90 code to read this data into an array as below:
PROGRAM autocorr
implicit none
INTEGER, PARAMETER :: TRUN=4000,TCOR=1800
real,dimension(TRUN) :: angle
real :: temp, temp2, average1, average2
integer :: i, j, p, q, k, count1, t, count2
REAL, DIMENSION(0:TCOR) :: ACF
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
open(100, file="fort.64",status="old")
do k = 1,TRUN
read(100,*) angle(k)
end do
Then, when I print again to see the values, I get
25.934999
25.549999
24.274000
29.936001
23.122000
27.360001
28.153999
24.320000
28.613001
27.601000
29.948000
29.367001
32.122002
33.818001
21.837000
29.283001
26.489000
24.010000
27.698000
30.799999
36.157001
29.034000
34.700001
26.058001
29.114000
24.177000
25.209000
25.820999
26.620001
29.761000
May I know why the values are now 6 decimal points?
How to avoid this effect so that it doesn't affect the calculation results?
Appreciate any help.
Thanks
You don't show the statement you use to write the values out again. I suspect, therefore, that you've used Fortran's list-directed output, something like this
write(output_unit,*) angle(k)
If you have done this you have surrendered the control of how many digits the program displays to the compiler. That's what the use of * in place of an explicit format means, the standard says that the compiler can use any reasonable representation of the number.
What you are seeing, therefore, is your numbers displayed with 8 sf which is about what single-precision floating-point numbers provide. If you wanted to display the numbers with only 3 digits after the decimal point you could write
write(output_unit,'(f8.3)') angle(k)
or some variation thereof.
You've declared angle to be of type real; unless you've overwritten the default with a compiler flag, this means that you are using single-precision IEEE754 floating-point numbers (on anything other than an exotic computer). Bear in mind too that most real (in the mathematical sense) numbers do not have an exact representation in floating-point and that the single-precision decimal approximation to the exact number 25.935 is likely to be 25.934999; the other numbers you print seem to be the floating-point approximations to the numbers your program reads.
If you really want to compute your results with a lower precision, then you are going to have to employ some clever programming techniques.