Calculate the variance of an integer vector in MATLAB - matlab

I need to calculate the variance of a large vector which is stored as uint8. The MATLAB var function however only accepts double and single types as input. The easiest way to calculate the variance would therefore be
vec = randi(255,1,100,'uint8');
var(single(vec))
This of course gives the correct result. However, using the single datatype increases the memory usage by a factor of 4. For large vectors (~ 1 million elements) this will quickly fill up the memory.
What I tried: The definition of the variance for a discrete random variable X is
$$\operatorname{Var}(X) = \sum_{i=1}^{n} p_i\,(x_i - \mu)^2$$
(Source: Wikipedia)
I estimated the p's using the histogram, but then got stuck: To calculate the variance in a vectorized fashion, I would need to convert the x_i's to single or double.
Is there any possibility to calculate the variance without converting the whole vector to single or double?

If you're willing to work with uint16, you can use Var(X)=Mean(X^2)-Mean(X)^2. This creates only 3 floating point numbers (the variance and the 2 means):
uivec=uint16(vec);
mean(uivec.^2)-mean(uivec)^2
So, not as good as keeping uint8, but it still needs only half the memory of a conversion to single. It works with uint16 because your input is uint8 and (2^8)^2=2^16, so the squares (at most 255^2=65025) do not overflow.
If you want the same answer as var, you need to remember that MATLAB uses the unbiased estimator for var (it divides the sum by n-1 instead of n, where n is your number of samples), so the whole difference has to be scaled by n/(n-1):
n=length(vec);
v=(mean(uivec.^2)-mean(uivec)^2)*(n/(n-1))
Then your v will be equal to var(single(vec)) up to floating point rounding.
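If even a uint16 copy is too large, another option (not part of the answer above, only a sketch) is to accumulate the two sums block by block, so that only one block is held as double at any time. The block size `chunk` and the names `s1` and `s2` are illustrative assumptions.

```matlab
% Sketch: variance of a uint8 vector without converting it all at once.
vec = randi(255, 1, 1e6, 'uint8');
n = numel(vec);
chunk = 1e5;            % assumed block size; tune to the available memory
s1 = 0;                 % running sum of x
s2 = 0;                 % running sum of x.^2
for k = 1:chunk:n
    x = double(vec(k:min(k + chunk - 1, n)));  % only this block is converted
    s1 = s1 + sum(x);
    s2 = s2 + sum(x.^2);
end
% Unbiased estimator, same normalisation (n-1) as var
v = (s2 - s1^2 / n) / (n - 1);
```

Both sums are integers far below 2^53 for vectors of this size, so they are accumulated exactly in double precision.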

No. The value of the variance is going to be a floating point value most likely, so you need to perform floating point operations.
p_i itself is the Probability mass function, so sum(p_i) should be one, therefore each p_i is a floating point number.
In addition, mu, the mean, will probably not be an integer either.

Related

MATLAB: Generate N pseudo-random numbers with a Poisson distribution having mean M and total T where N, M, and T are user defined

I’d like to be able to generate in MATLAB a sequence of N pseudo-random numbers with a Poisson distribution having mean M. The sum of the N numbers should be T. N, M, and T are always positive or zero and would be user specified parameters to any function.
Obviously, if T is small relative to N it is likely that there will be problems achieving a total of T. In that case the function could just return the values T and then N-1 zeros or an error code. However, it is highly likely that in most cases T>>N.
I have been trying variations based on the method of generating random numbers with a given distribution provided at http://matlabtricks.com/post-44/generate-random-numbers-with-a-given-distribution and trying various normalizations at each step but have not been successful.
You could try to approximate what you want by using a multinomial distribution.
If you use Wikipedia notation, then k=N, n=T and p_i=M/T. The Poisson distribution has the distinctive property that its mean equals its variance, but if your parameters are such that p_i is small, then the mean n*p_i would be quite close to the variance n*p_i*(1-p_i). The sum would automatically (by a property of the multinomial) be equal to T.
Multinomial sampling in MATLAB is done using the mnrnd function.
UPDATE
Regarding the comment, let's consider N sampled values v_i, and write their sum
Sum(i=1...N) v_i = T
Let's compute the mean value of the left and right side of this equation.
Sum(i=1...N) E(v_i) = E(T) = T
On the right side, the mean value of a constant is the constant itself. On the left side we have
Sum(i=1...N) E(v_i) = Sum(i=1...N) M = N*M = T
Therefore, M=T/N and p_i=M/T=1/N.
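A minimal sketch of this suggestion, assuming the Statistics and Machine Learning Toolbox (for mnrnd) is available. The values of N and M are illustrative only, and T is taken as N*M so that the constraint derived above holds.

```matlab
% Sketch: N counts that sum to exactly T, each with mean M = T/N.
N = 50;                     % number of values (illustrative)
M = 20;                     % desired mean (illustrative)
T = N * M;                  % total implied by M = T/N
p = repmat(1 / N, 1, N);    % equal cell probabilities p_i = 1/N
v = mnrnd(T, p);            % one multinomial draw, 1-by-N vector of counts
fprintf('sum = %d, mean = %.2f, variance = %.2f\n', sum(v), mean(v), var(v));
```

The sum and the sample mean are fixed by construction. The sample variance fluctuates around M*(1-1/N), which is close to the Poisson value M only when N is large.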

MATLAB - negative values go to NaN in a symmetric function

Can someone please explain why the following symmetric function cannot pass a certain limit of negative values?
D = 0.1; l = 4;
c = @(x,v) (v/D).*exp(-v*x/D)./(1-exp(-v*l/D));
v_vec = -25:0.01:25;
figure(2)
hold on
plot(v_vec,c(l,v_vec),'b')
plot(v_vec,c(0,v_vec),'r')
Notice where the blue line is cut off in the figure: this is where I get inf/nan values.
It seems that Matlab is trying to compute a result that is too large, outputs +inf, and then operates on that, which yields +/- inf and NaNs.
For instance, at v=-25, part of the function computes exp(-(-25)*4/0.1), which is exp(1000), and that outputs +inf (it is larger than the largest representable double precision float).
You can potentially solve that problem by rewriting your function to avoid operating on such very large (or very small) numbers, say by reorganising the fraction containing exp() functions.
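One way to reorganise the fraction, as a sketch: multiplying numerator and denominator by exp(v*l/D) gives an algebraically identical form whose exponents are non-positive for v < 0 (as long as 0 <= x <= l), so it underflows to 0 instead of overflowing. The names c_pos, c_neg and c_l are assumptions introduced here, and the value 1/l at v = 0 is the limit of the expression.

```matlab
D = 0.1; l = 4;
% Original form: exponents are non-positive for v > 0
c_pos = @(x,v) (v/D).*exp(-v.*x/D)./(1 - exp(-v*l/D));
% Same expression multiplied through by exp(v*l/D): safe for v < 0
c_neg = @(x,v) (v/D).*exp(v.*(l - x)/D)./(exp(v*l/D) - 1);

v_vec = -25:0.01:25;
pos = v_vec > 0;
neg = v_vec < 0;
c_l = zeros(size(v_vec));
c_l(pos) = c_pos(l, v_vec(pos));
c_l(neg) = c_neg(l, v_vec(neg));
c_l(~pos & ~neg) = 1/l;      % limit for v -> 0 (otherwise 0/0)

figure
plot(v_vec, c_l, 'b')
```

The same masks can be reused for c(0,v) by passing 0 instead of l as the first argument.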
I ran into the same hurdle with exp() arguments that trigger overflow. It can be difficult to trace numerical imprecision or convergence errors back to their source. Here the exp() terms only cause problems in intermediate results; I guess the function was intended as a continuous transition function.
My solution to this problem is to divide the argument range into regions and provide an approximation function in each region. In your case that is zero for negative x and proportional to x for positive x; in between you can use the original function. Take care that the approximations match at the borders of the regions, including the number of continuous derivatives, which is important for convergence in loops.

Why Kernel smoothing function, ksdensity, in MATLAB, results in values greater than one?

I have a set of samples, S, and I want to find its PDF. The problem is when I use ksdensity I get values greater than one!
[f,xi] = ksdensity(S)
In array f, most of the values are greater than one! Would you please tell me what the problem can be? Thanks for your help.
For example:
S=normrnd(0.3035, 0.0314,1,1000);
ksdensity(S)
ksdensity, as the name says, estimates a probability density function over a continuous variable. Probability densities can be larger than 1; they can take arbitrary values from zero upwards. The constraint on probabilities is that their sum over an exhaustive range of possibilities has to be 1. For probability densities, the constraint is that the integral over the whole range of values is 1.
A crude approximation of an integral of the pdf estimated by ksdensity can be obtained in Matlab like this:
sum(f) * min(diff(xi))
assuming that the values in xi are equally spaced. The value of this expression should be approximately 1.
If in your application you believe this approximation is not close enough to 1, you might want to specify the grid of estimation points (second parameter pts) such that the spacing is finer or the range is wider than the one automatically generated by ksdensity.
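As a sketch of that check with the sample data from the question (assuming the Statistics and Machine Learning Toolbox for normrnd and ksdensity), trapz can be used in place of the crude sum; it also works when the points in xi are not equally spaced. The names peak and area are introduced here for illustration.

```matlab
S = normrnd(0.3035, 0.0314, 1, 1000);   % narrow distribution, sigma << 1
[f, xi] = ksdensity(S);                 % density estimate on an automatic grid
peak = max(f);                          % density values, not probabilities
area = trapz(xi, f);                    % trapezoidal integral of the estimate
fprintf('max density = %.2f, integral = %.4f\n', peak, area);
```

With a standard deviation this small, the peak density is well above 1 while the integral stays close to 1.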

How does matlab compare two complex numbers?

I saw a MATLAB file which used max() on a matrix whose entries are complex numbers. I can't understand how MATLAB compares two complex numbers.
ls1=max(tfsp');
Here, tfsp contains complex numbers.
The complex numbers are compared first by magnitude, then by phase angle (if there is a tie for the maximum magnitude.)
From help max:
When X is complex, the maximum is computed using the magnitude
MAX(ABS(X)). In the case of equal magnitude elements, then the phase
angle MAX(ANGLE(X)) is used.
NaN's are ignored when computing the maximum. When all elements in X
are NaN's, then the first one is returned as the maximum.
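The rule quoted above can be reproduced by hand, as a small sketch: sort by magnitude first and by phase angle second, then take the last element. The vector z and the names keys, order and m_manual are illustrative assumptions.

```matlab
z = [1+1i, -2, 2i];          % |-2| and |2i| tie at magnitude 2
m = max(z);                  % built-in comparison

% Manual version: primary key magnitude, secondary key phase angle
keys = [abs(z(:)), angle(z(:))];
[~, order] = sortrows(keys); % ascending by column 1, then column 2
m_manual = z(order(end));    % largest magnitude, ties broken by largest angle
disp(m_manual)
```

Here -2 has phase angle pi and 2i has phase angle pi/2, so the tie at magnitude 2 is resolved in favour of -2.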

Probability of generating a particular random number, such as in MATLAB

In real probability, there is a 0% chance that a random number p, selected from all of the real numbers in the interval (0,1), will be 0.5. However, what are the odds that
rand == 0.5
in MATLAB? I suppose this is like asking how many double-precision numbers are between zero and one, or maybe there are other factors at play.
No particular info on MATLAB's generator...
In general even simple pseudo-random generators have long enough cycles which would cover all values representable by double.
If MATLAB uses some other form of generating random numbers it would be even better - so assuming it uniformly covers whole range of double values.
I believe probability would be: distance between representable numbers around values you are interested divided by length of the interval. See What is the minimal step in double data type? (.NET) for discussion on the distance.
Looking at this question, we see that there are 2^62 - 2^52 doubles in the interval (0,1). Therefore, the probability of picking any single one (like 0.5) would be roughly equal to one divided by this number, or
>> p = 1/(2^62-2^52)
p =
2.170523997312134e-019
However, as horchler already indicates, it also depends on the type of random number generator you use, as well as MATLAB's implementation thereof. Sadly, I have only basic knowledge of the implementation details for each, but you can look here for a list of available random number generators in MATLAB and google a bit further for more precise numbers.
I am not sure whether Alexei was trying to say this, but inspired by him I think the probability will indeed be approximately the distance between numbers around 0.5.
Therefore I expect the probability to be approximately:
eps(0.5)
Which evaluates to 1.1102e-16
Given the monotonic nature of the difference between double numbers I would actually think this holds:
eps(0.5-eps(0.5)) <= yourprobability <= eps(0.5)
Implying a range of 5.5511e-17 to 1.1102e-16
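The two bounds can be checked directly, as a short sketch; the variable names are introduced here for illustration. The spacing of doubles halves when crossing 0.5 downwards, which is why the lower bound is half the upper one.

```matlab
step_above = eps(0.5);            % spacing of doubles in [0.5, 1): 2^-53
below      = 0.5 - eps(0.5);      % a representable double just below 0.5
step_below = eps(below);          % spacing of doubles in [0.25, 0.5): 2^-54
fprintf('%.4e <= p <= %.4e\n', step_below, step_above);
```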