Probability of generating a particular random number, such as in MATLAB - matlab

In real probability, there is a 0% chance that a random number p, selected from all of the real numbers in the interval (0,1), will be 0.5. However, what are the odds that
rand == 0.5
in MATLAB? I suppose this is like asking how many double-precision numbers are between zero and one, or maybe there are other factors at play.

I have no specific information about MATLAB's generator.
In general, even simple pseudo-random generators have cycles long enough to cover every value representable by a double.
If MATLAB uses some other method of generating random numbers, it would be even better, so I will assume it covers the whole range of double values uniformly.
I believe the probability would be the distance between the representable numbers around the value you are interested in, divided by the length of the interval. See What is the minimal step in double data type? (.NET) for a discussion of that distance.

Looking at this question, we see that there are 2^62 - 2^52 doubles in the interval (0,1). Therefore, the probability of picking any single one (like 0.5) would be roughly equal to one divided by this number, or
>> p = 1/(2^62-2^52)
p =
2.170523997312134e-019
However, as horchler already indicates, it also depends on the type of random number generator you use, as well as MATLAB's implementation thereof. Sadly, I have only basic knowledge of the implementation details for each, but you can look here for a list of the random number generators available in MATLAB and google a bit further for more precise numbers.
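The count quoted above can be checked from the bit patterns, assuming IEEE 754 doubles, where positive doubles sort in the same order as their 64-bit integer representations. A minimal sketch; the variable names are illustrative only:

```matlab
% Reinterpret the bits of 0.0 and 1.0 as unsigned 64-bit integers.
bits_zero = typecast(0, 'uint64');   % all bits zero
bits_one  = typecast(1, 'uint64');   % 0x3FF0000000000000

% Every integer strictly between the two patterns is a distinct double
% in (0,1), subnormals included.
n_doubles = bits_one - bits_zero - 1;

% 2^62 - 2^52, built with integer shifts to avoid any double rounding.
expected = bitshift(uint64(1), 62) - bitshift(uint64(1), 52);

disp(n_doubles);
disp(expected);
```

The two printed values differ by one because the interval is open at both ends, which has no practical effect on the estimate.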

I am not sure whether Alexei was trying to say this, but inspired by him I think the probability will indeed be approximately the distance between numbers around 0.5.
Therefore I expect the probability to be approximately:
eps(0.5)
Which evaluates to 1.1102e-16
Given the monotonic nature of the difference between double numbers I would actually think this holds:
eps(0.5-eps(0.5)) <= yourprobability <= eps(0.5)
Implying a range of 5.5511e-17 to 1.1102e-16
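The two bounds can be reproduced in a few lines. This only illustrates the spacing of doubles around 0.5; whether rand can actually return every one of those doubles depends on the generator, which is treated as an assumption here.

```matlab
x = 0.5;
gap_above = eps(x);            % distance from 0.5 up to the next double
gap_below = eps(x - eps(x));   % spacing of the doubles just below 0.5

fprintf('gap above 0.5: %g (2^-53 = %g)\n', gap_above, 2^-53);
fprintf('gap below 0.5: %g (2^-54 = %g)\n', gap_below, 2^-54);
```

The spacing halves when crossing below 0.5 because 0.5 is a power of two, which is where the factor of two between the bounds comes from.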

Related

Matlab Zero Tolerance in rank function

I am wondering whether there is a technical or theoretical reason why MATLAB's rank function treats values below max(size(A))*eps(norm(A)) as zero. Can you please provide some intuition?
Thank you!
The following answer is not based on rigorous mathematical reasoning; it is just speculation (as you were asking for intuition):
norm(A) is the order of magnitude of the matrix entries.
eps(norm(A)) is thus the accuracy that the floating-point representation of the matrix entries typically has.
Now, suppose you add N numbers that should theoretically sum to zero, but each of them carries an error of about eps. I think we would expect an error on the order of sqrt(N) * eps in the result.
Then, if the algorithm that computes the rank performs N^2 operations on the matrix entries (where N = max(size(A)) is the matrix size) to produce a number that is checked against zero, the expected error is sqrt(N^2) * eps = N * eps, which is the tolerance you stated in your question.
What I don't know is whether the algorithm that MATLAB uses really has complexity N^2.
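As a sketch of how the tolerance is used, assuming (as the rank documentation describes) that the rank is the number of singular values above the tolerance:

```matlab
A = magic(4);                          % a 4-by-4 matrix whose exact rank is 3

s   = svd(A);                          % singular values, largest first
tol = max(size(A)) * eps(norm(A));     % norm(A) is the largest singular value
r   = sum(s > tol);                    % singular values treated as nonzero

disp(s.');
fprintf('tol = %g, rank estimate = %d\n', tol, r);
```

The smallest singular value of magic(4) comes out tiny but nonzero in floating point; the tolerance is what lets it be counted as zero.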

Matlab function that generates random real numbers in a closed interval

Is there any function in MATLAB that generates random real numbers in a closed interval? I found unifrnd(), but it generates numbers in an open interval.
If I use unifrnd(x,y); I get the interval (x,y) instead of [x,y].
Given the discussion of accuracy in the comments, you could use something like:
mag = floor(log10( y - x))
num = unifrnd(x-(10^mag)*eps, y+(10^mag)*eps)
This essentially adds one "point" to the discrete interval representation, taking into account the accuracy based on the size of the numbers you're using. unifrnd() is essentially a wrapper around rand() (which means you don't really need the stats toolbox to do this), and thus it is really just scaling the uniform distribution on (0,1). If you're worried about the endpoints, though, the scaling matters, because you can't get more granular than the product of the magnitude of your interval length and eps.
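To illustrate the remark that unifrnd is essentially a scaled rand, the following sketch draws the same kind of samples without the Statistics Toolbox. The endpoints x = 2 and y = 5 and the sample count are example values, not taken from the question.

```matlab
x = 2;  y = 5;                 % example endpoints (assumed for illustration)
n = 1e5;                       % number of samples (arbitrary)

u   = rand(n, 1);              % uniform samples from the unit interval
num = x + (y - x) .* u;        % shift and scale onto the interval from x to y

fprintf('min = %.6f, max = %.6f\n', min(num), max(num));
```

Since the probability of drawing any single double is tiny, the difference between the open and the closed interval rarely shows up in practice.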

Calculate the variance of an integer vector in MATLAB

I need to calculate the variance of a large vector which is stored as uint8. The MATLAB var function however only accepts double and single types as input. The easiest way to calculate the variance would therefore be
vec = randi(255,1,100,'uint8');
var(single(vec))
This of course gives the correct result. However, using the single datatype increases the memory usage by a factor of 4. For large vectors (~ 1 million elements) this will quickly fill up the memory.
What I tried: The definition of the variance for a discrete random variable X is
$\operatorname{Var}(X) = \sum_{i=1}^{n} p_i \, (x_i - \mu)^2$
(Source: Wikipedia)
I estimated the p's using the histogram, but then got stuck: To calculate the variance in a vectorized fashion, I would need to convert the x_i's to single or double.
Is there any possibility to calculate the variance without converting the whole vector to single or double?
If you're willing to work with uint16, you can do this. It creates only 3 floating-point numbers (the variance and the 2 means) and uses Var(X)=Mean(X^2)-Mean(X)^2:
uivec=uint16(vec);
mean(uivec.^2)-mean(uivec)^2
So, not as good as keeping uint8, but it still needs only half the memory of converting to single. It works with uint16 because your input is uint8 and the largest possible square, 255^2 = 65025, is below 2^16.
If you want the exact same answer as var, you need to remember that MATLAB uses the unbiased estimator for var (it divides the sum by n-1 instead of n, where n is your number of samples) so you need to do:
n=length(vec);
v=(mean(uivec.^2)-mean(uivec)^2)*(n/(n-1))
then your v will match var(single(vec)) up to floating-point rounding.
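A small check of the identity, assuming the n/(n-1) correction is applied to the whole difference rather than only to the squared mean. The vector length is arbitrary, and the conversion to double is there only to produce a reference value.

```matlab
vec = randi(255, 1, 1e5, 'uint8');     % test data in the style of the question
n   = numel(vec);

uivec = uint16(vec);                   % 255^2 = 65025 still fits in uint16
m1 = mean(uivec);                      % mean of X, returned as double
m2 = mean(uivec.^2);                   % mean of X^2, returned as double

v   = (m2 - m1^2) * (n / (n - 1));     % unbiased sample variance
ref = var(double(vec));                % reference value, for comparison only

fprintf('v = %.10g, var = %.10g\n', v, ref);
```

The two numbers agree to many digits but need not be bit-identical, since the operations are carried out in a different order.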
No. The value of the variance will most likely be a floating-point value, so you need to perform floating-point operations.
p_i is the probability mass function, so sum(p_i) must be one; therefore each p_i is a floating-point number.
In addition, mu, the mean, will probably not be an integer either.

Matlab Bug in Sine function?

Has anyone tried plotting a sine function for large values in MATLAB?
For e.g.:
x = 0:1000:100000;
plot(x,sin(2*pi*x))
I was just wondering why the amplitude is changing for this periodic function? As per what I expect, for any value of x, the function has a period of 2*pi. Why is it not?
Does anyone know? Is there a way to get it right? Also, is this a bug and is it already known?
That's actually not the amplitude changing. That is due to the numerical imprecisions of floating point arithmetic. Bear in mind that you are specifying an integer sequence from 0 to 100000 in steps of 1000. If you recall from trigonometry, sin(n*x*pi) = 0 when x and n are integers, and so theoretically you should be obtaining an output of all zeroes. In your case, n = 2, and x is a number from 0 to 100000 that is a multiple of 1000.
However, this is what I get when I use the above code in your post:
Take a look at the scale of that graph. It's 10^{-11}. Do you know how small that is? As further evidence, here's what the max and min values are of that sequence:
>> min(sin(2*pi*x))
ans =
-7.8397e-11
>> max(sin(2*pi*x))
ans =
2.9190e-11
The values are so small that they might as well be zero. What you are seeing in the graph is due to numerical imprecision. As I mentioned before, sin(n*x*pi) = 0 when n and x are integers, under the assumption that we have all of the decimal places of pi available. However, because we only have 64 bits in total to represent pi numerically, you will certainly not get a result of exactly zero. In addition, be advised that the sin function very likely uses numerical approximation algorithms (Taylor / Maclaurin series, for example), which could also contribute to the result not being exactly 0.
There are, of course, workarounds, such as using the symbolic mathematics toolbox (see @yoh.lej's answer). In this case, you will get zero, but I won't focus on that here. Your post questions the accuracy of the sin function in MATLAB when it works on numeric inputs. Theoretically, since your input to sin is an integer sequence, every value of x should make sin(n*x*pi) = 0.
BTW, this article is good reading. This is what every programmer needs to know about floating point arithmetic and functions: http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html. A more simple overview can be found here: http://floating-point-gui.de/
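One way to sidestep the rounding of pi, sketched here as an illustration rather than as part of the answer above, is to reduce the argument to a single period before multiplying by 2*pi. For integer x the reduced argument is exactly zero.

```matlab
x = 0:1000:100000;

y_naive   = sin(2*pi*x);            % 2*pi*x is rounded, so results are tiny but nonzero
y_reduced = sin(2*pi*mod(x, 1));    % mod(x,1) is exact; integers reduce to 0

fprintf('largest |naive|   = %g\n', max(abs(y_naive)));
fprintf('largest |reduced| = %g\n', max(abs(y_reduced)));
```

Newer MATLAB releases also provide sinpi, so sinpi(2*x) is another option where it is available.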
Because what is the exact value of pi?
This apparent error is due to the limit of floating point accuracy. If you really need/want to get around that you can do symbolic computation with matlab, see the difference between:
>> sin(2*pi*10)
ans =
-2.4493e-15
and
>> sin(sym(2*pi*10))
ans =
0

Matlab: reverse of eps? Accuracy on positive weight?

eps returns the distance from 1.0 to the next largest double-precision number, so I can use it to interpret the numbers value on negative weight position. But for very large number with value on high positive weight position, what can I use to interpret?
I mean that I need to have some reference to count out computation noise on numbers obtained on Matlab.
Have you read "What Every Computer Scientist Should Know About Floating-Point Arithmetic"?
It discusses rounding error (what you're calling "computation noise"), the IEEE 754 standard for representation of floating-point numbers, and implementations of floating-point math on computers.
I believe that reading this paper would answer your question, or at least give you more insight into exactly how floating point math works.
Some clarifications to aid your understanding, too big to fit in the comments of @Richante's post:
Firstly, the difference between realmin and eps:
realmin is the smallest normalised floating point number. You can represent smaller numbers in denormalised form.
eps(x) is the distance from x to the next larger double, i.e. the smallest increment that distinguishes two numbers of that magnitude. realmin = eps(realmin) * 2^52.
"Normalised" and "denormalised" floating point numbers are explained in the paper linked above.
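The relation between realmin and eps can be checked directly. A minimal sketch, using only built-in functions:

```matlab
smallest_normal    = realmin;        % 2^-1022, smallest normalised double
smallest_subnormal = eps(0);         % 2^-1074, smallest denormalised double

fprintf('realmin      = %g\n', smallest_normal);
fprintf('eps(realmin) = %g\n', eps(realmin));
fprintf('eps(0)       = %g\n', smallest_subnormal);

% The identity from the text: 52 fraction bits separate realmin from its spacing.
identity_holds = (realmin == eps(realmin) * 2^52);
disp(identity_holds);
```

Values below realmin remain representable down to eps(0), but with fewer and fewer significant bits.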
Secondly, rounding error is no indicator of how much you can "trust" the nth digit of a number.
Take, for example, this:
>> ((0.1+0.1+0.1)^512)/(0.3^512)
ans =
1.0000
We're dividing 0.3^512 by itself, so the answer should be exactly one, right? We should be able to trust every digit up to eps(1).
The error in this calculation is actually more than 400 * eps:
>> ((0.1+0.1+0.1)^512)/(0.3^512) - 1
ans =
9.4591e-014
>> ans / eps(1)
ans =
426
The calculation error, i.e. the extent to which the nth digit is untrustworthy, is far greater than eps, the floating-point roundoff error in the representation of the answer. Note that we only did six floating-point operations here! You can easily rack up millions of FLOPs to produce one result.
I'll say it one more time: eps() is not an indicator of the error in your calculation. Do not attempt to display: "My result is 1234.567 +/- eps(1234.567)". That is meaningless and deceptive, because it implies your numbers are more precise than they actually are.
eps, the rounding error in the representation of your answer, is only about 1 part in 10^16. Your real enemy is the error that accumulates every time you do a floating-point operation, and that is what you need to track for a meaningful estimate of the error.
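To make the accumulation concrete, here is a minimal sketch (not from the answer above) that adds 0.1 one million times and measures the result against the spacing of doubles near the exact answer:

```matlab
n = 1e6;
s = 0;
for k = 1:n
    s = s + 0.1;                    % each addition rounds to the nearest double
end

exact = 1e5;                        % the mathematically exact sum
err   = abs(s - exact);
fprintf('error = %g, eps(exact) = %g, ratio = %g\n', ...
        err, eps(exact), err / eps(exact));
```

The accumulated error is orders of magnitude larger than eps(exact), even though every individual operation is correctly rounded.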
Easier to digest than the paper Li-aung Yip recommends would be the Wikipedia article on machine epsilon. Then read What Every Computer Scientist ...
Your question isn't very well worded, but I think you want something that gives the distance from a number to the next smallest double-precision number? If this is the case, then you can just use:
x = 100;
x + eps(x) %Next largest double-precision number
x - eps(-x) %Next smallest double-precision number
Double-precision numbers have a single sign bit, so counting up from a negative number is the same as counting down from a positive.
Edit:
According to help eps, "For all X, EPS(X) is equal to EPS(ABS(X))." which really confuses me; I can't see how that can be consistent with double having a single sign bit, and values not being equally spaced.
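The apparent contradiction resolves once spacing is seen as a function of magnitude only: doubles are laid out symmetrically about zero, so eps(x) and eps(-x) agree. A sketch, with the caveat that stepping down from an exact power of two needs half the step:

```matlab
x = 100;
next_up   = x + eps(x);             % next larger double
next_down = x - eps(x);             % next smaller double (x is not a power of two)

same_spacing = (eps(-x) == eps(x)); % spacing depends only on abs(x)

% Just below a power of two the spacing is half as large.
p = 128;
below_p    = p - eps(p)/2;          % next smaller double than 128
round_trip = (below_p + eps(below_p) == p);

disp([same_spacing, round_trip]);
```

Both flags come out true: the sign bit only mirrors the number line and does not change the gaps between neighbouring values.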