Consider for example the following double-precision numbers:
x = 1232.2454545e-89;
y = -1232.2454545e-89;
Can I be sure that y is always exactly equal to -x (or Matlab's uminus(x))? Or should I expect small numerical differences of the order or eps as it often happens with numerical computations? Try for example sqrt(3)^2-3: the result is not exactly zero. Can that happen with unary minus as well? Is it lossy like square root is?
Another way to put the question would be: is a negative numerical literal always equal to negating its positive counterpart?
My question refers to Matlab, but probably has more to do with the IEEE 754 standard than with Matlab specifically.
I have done some tests in Matlab with a few randomly selected numbers. I have found that, in those cases,
They turn out to be equal indeed.
typecast(x, 'uint8') and typecast(-x, 'uint8') differ only in the sign bit as defined by IEEE 754 double-precision format.
This suggests that the answer may be affirmative. If applying unary minus only changes the sign bit, and not the significand, no precision is lost.
But of course I have only tested a few cases. I'd like to be sure this happens in all cases.
This question is computer architecture dependent. However, the sign of floating point numbers on modern architectures (including x64 and ARM cores) is represented by a single sign bit, and they have instructions to flip this bit (e.g. FCHS). That being the case, we can draw two conclusions:
A change of sign can be achieved (and indeed is by modern compilers and architectures) by a single bit flip/instruction. This means that the process is completely invertible, and there is no loss of numerical accuracy.
It would make no sense for MATLAB to do anything other than the fastest, most accurate thing, which is just to flip that bit.
That said, the only way to be sure would be to inspect the assembly code for uminus in your MATLAB installation. I don't know how to do this.
Related
I am wondering what would be the way to tell Matlab that all computations need to be done and continued with let's say 4 digits.
format long or other formats I think is for showing the results with specific precision, not for their value precision.
Are you hoping to increase performance or save memory by not using double precision? If so, then variable precision arithmetic and vpa (part of the Symbolic Math toolbox) are probably not the right choice. Instead, you might consider fixed point arithmetic. This type of math is mostly used on microcontrollers and architectures with memory address widths of less than 32 bits or that lack dedicated floating point hardware. In Matlab you'll need the Fixed Point Designer toolbox. You can read more about the capabilities here.
Use digits: http://www.mathworks.com/help/symbolic/digits.html
digits(d) sets the current VPA accuracy to d significant decimal digits. The value d must be a positive integer greater than 1 and less than 2^29 + 1.
The benefits of using the two's complement for storing negative values in memory are well-known and well-discussed in this board.
Hence, I'm wondering:
Do or did some architectures exist, which have chosen a different way for representing negative values in memory than using two's complement?
If so: What were the reasons?
Signed-magnitude existed as the most obvious, naive implementation of signed numbers.
One's complement has also been used on real machines.
On both of those representations, there's a benefit that the positive and negative ranges span equal intervals. A downside is that they both contain a negative zero representation that doesn't naturally occur in the sort of integer arithmetic commonly used in computation. And of course, the hardware for two's complement turns out to be much simpler to build
Note that the above applies to integers. Common IEEE-style floating point representations are effectively sign-magnitude, with some more details layered into the magnitude representation.
I am using Matlab's corr function to calculate the correlation of a dataset. While the results agree within the double point accuracy (<10^-14), they are not exactly the same even on the same computer for different runs.
Is floating-point calculation deterministic? Where is the source of the randomness?
Yes and no.
Floating point arithmetic, as in a sequence of operations +, *, etc. is deterministic. However in this case, linear algebra libraries (BLAS, LAPACK, etc) are most likely being used, which may not be: for example, matrix multiplication is typically not performed as a "triple loop" as some references would have you believe, but instead matrices are split up into blocks that are optimised for maximum performance based on things like cache size. Therefore, you will get different sequences of operations, with different intermediate rounding, which will give slightly different results. Typically, however, the variation in these results is smaller than the total rounding error you are incurring.
I have to admit, I am a little bit surprised that you get different results on the same computer, but it is difficult to know why without knowing what the library is doing (IIRC, Matlab uses the Intel BLAS libraries, so you could look at their documentation).
This is something I noticed in Matlab when trying to do MLE. My first estimator used the log likelihood of a pdf and broke the product up as a sum. For example, a log weibull pdf (f(x)=b ax^(a-1)exp(-bx^a)) broken up is:
log_likelihood=log(b)+log(a)+(a-1)log(x)-bx^a
Evaluating this is wildly different to this:
log_likelihood=log(bax^(a-1)exp(-bx^a))
What is the computer doing differently in the two stages? The first one gives a much larger number (by a couple orders of magnitude).
Depending on the numbers you use, this could be a numerical issue: If you combine very large numbers with very small numbers, you can get inaccuracies due to limitations in number precision.
One possibility is that you lose some accuracy in the second case since you are operating at different scales.
I work on a scientific software project implementing maximum likelihood of phylogenetic trees, and consistently run into issues regarding numerical precision. Often the descepency is ...
between competing applications with the same values in the model,
when calculating the MLE scores by hand,
in the order of the operations in the computation.
It really all comes down to number three, and even in your case. Mulitplication of small and very large numbers can cause weird results when their exponents are scaled during computation. There is a lot about this in the (in)famous "What Every Computer Scientist Should Know About Floating-Point Arithmetic". But, what I've mentioned is the short of it if that's all you are interested in.
Over all, the issue you are seeing are strictly numerical issues in the representation of floating point / double precision numbers and operations when computing the function. I'm not too familiar with MATLAB, but they may have an arbitrary-precision type that would give you better results.
Aside from that, keep them symbolic as long as possible and if you have any intuition about the variables size (as in a is always very large compared to x), then make sure you are choosing the order of parenthesis wisely.
The first equation should be better since it is dealing with adding logs, and should be much more stable then the second --although x^a makes me a bit weary as it would dominate the equation, but it would in practice anyways.
I am trying to implement the Newton-Raphson Division Algorithm Wikipedia entry to implement a IEEE-754 32-bit floating point divide on a processor which has no hardware divide unit.
My memory locations are 32-bit two's complement word and I have already implemented floating point addition, subtraction, and multiplication, so I can reuse the code to implement the Newton-Raphson algorithm. I am trying to first implement this all in Matlab.
At this step:
X_0 = 48/17 - 32/17 * D
How do I bitshift D properly to between 0.5 and 1 as described in the algorithm details?
You might look at the compiler-rt runtime library (part of LLVM), which has a liberal license and implements floating-point operations for processors that lack hardware support.
You could also look at libgcc, though I believe that's GPL, which may or may not be an issue for you.
In fact, don't just look at them. Use one of them (or another soft-float library). There's no need to re-invent the wheel.