Standard Deviation (SD) for Geometric Mean - usage-statistics

I want to calculate the Standard Deviation around the Geometric mean for a highly variable data with >500 entries.
I calculated the geometric mean using the function exp(mean(log(x)))
To calculate the Standard Deviation around the geometric mean I created another function:
function(x) sqrt((sum(((x-(exp(mean(log(x)))))^2))/(NROW(x)-1)))
But applying this gave the following error:
Error in x : object of type 'closure' is not subsettable
Can anyone help me for calculating the SD around geometric mean by creating a function using logarithms, so that it is arithmetically applicable in a large and varied dataset?

This is a php function ( stats_standard_deviation ) that calculate the standard deviation:
See the php documentation
Example:
$data = array(2,3,3,3,3,2,5);
$standardDeviation = stats_standard_deviation($data);

Related

Double numerical integration in Matlab - Singularity

I have discrete data of a 2D function defined as
theta = linspace(0,pi,nTheta);
phi = linspace(0,2*pi,nPhi);
p=zeros(nPhi,nTheta);%only to show the dimension of my matrix
[np,nt]=ndgrid(phi,theta);
f1 = griddedInterpolant(np,nt,p,'spline');
f2= #(np,nt) f1(np,nt);
integral2(f2,0,2*pi,0,pi)
Note that p is calculated from a complex physical problem, but i showed above how it is initialized.
Also, I can increase nTheta and nPhi, which leads to more accurate calculation of p.
My calculated function (with nPhi=400,nTheta=200) is something like:
I tried 3 ways :
using Trapz function
using the code above but with linear interpolation for gridded interpolant
using the code above with spline interpolation
Although the spline is better than others, i still need to increase nPhi and nTheta, which makes it impossible for me to do the simulation due to its cost.
Is there any suggestion except these 3 methods or any general suggestion how i can do this computation more efficient? (I also took advantage of the symmetry in both directions)
Note that the shape of my function varies in each time step, so a local mesh refinement might be challenging because i don't know the detail of my function in advance.

Python and Matlab compute variances differently. Am I using the correct functions? [duplicate]

I try to convert matlab code to numpy and figured out that numpy has a different result with the std function.
in matlab
std([1,3,4,6])
ans = 2.0817
in numpy
np.std([1,3,4,6])
1.8027756377319946
Is this normal? And how should I handle this?
The NumPy function np.std takes an optional parameter ddof: "Delta Degrees of Freedom". By default, this is 0. Set it to 1 to get the MATLAB result:
>>> np.std([1,3,4,6], ddof=1)
2.0816659994661326
To add a little more context, in the calculation of the variance (of which the standard deviation is the square root) we typically divide by the number of values we have.
But if we select a random sample of N elements from a larger distribution and calculate the variance, division by N can lead to an underestimate of the actual variance. To fix this, we can lower the number we divide by (the degrees of freedom) to a number less than N (usually N-1). The ddof parameter allows us change the divisor by the amount we specify.
Unless told otherwise, NumPy will calculate the biased estimator for the variance (ddof=0, dividing by N). This is what you want if you are working with the entire distribution (and not a subset of values which have been randomly picked from a larger distribution). If the ddof parameter is given, NumPy divides by N - ddof instead.
The default behaviour of MATLAB's std is to correct the bias for sample variance by dividing by N-1. This gets rid of some of (but probably not all of) of the bias in the standard deviation. This is likely to be what you want if you're using the function on a random sample of a larger distribution.
The nice answer by #hbaderts gives further mathematical details.
The standard deviation is the square root of the variance. The variance of a random variable X is defined as
An estimator for the variance would therefore be
where denotes the sample mean. For randomly selected , it can be shown that this estimator does not converge to the real variance, but to
If you randomly select samples and estimate the sample mean and variance, you will have to use a corrected (unbiased) estimator
which will converge to . The correction term is also called Bessel's correction.
Now by default, MATLABs std calculates the unbiased estimator with the correction term n-1. NumPy however (as #ajcr explained) calculates the biased estimator with no correction term by default. The parameter ddof allows to set any correction term n-ddof. By setting it to 1 you get the same result as in MATLAB.
Similarly, MATLAB allows to add a second parameter w, which specifies the "weighing scheme". The default, w=0, results in the correction term n-1 (unbiased estimator), while for w=1, only n is used as correction term (biased estimator).
For people who aren't great with statistics, a simplistic guide is:
Include ddof=1 if you're calculating np.std() for a sample taken from your full dataset.
Ensure ddof=0 if you're calculating np.std() for the full population
The DDOF is included for samples in order to counterbalance bias that can occur in the numbers.

Higher order moments and shape parameters

I have used 5th moment of my data as a feature for classification and it gives good results, but i don't know what it measures? is it a shape parameter like kurtosis and skewness?
I'm using matlab's
m=moment(X,order);
which returns the central sample moment of X specified by the positive integer order.

Summing a series in matlab

I'm trying to write a generic function for finding the cosine of a value inputted into the function. The formula for cosine that I'm using is:
n
cosx = sum((-1)^n*x^(2n)/(2n)!)
n=1
I've looked at the matlab documentation and this page implies that the "sum" function should be able to do it so I tried to test it by entering:
sum(x^n, n=1..3)
but it just gives me "Error: The expression to the left of the equals sign is not a valid target for an assignment".
Is summing an infinite series something that matlab is able to do by default or do I have to simulate it using a function and loops?
Well if you want to approximate it to a finite number of terms you can do it in Matlab without toolboxes or loops:
sumCos = #(x, n)(sum(((-1).^(0:n)).*(x.^(2*(0:n)))./(factorial(2*(0:n)))));
and then use it like this
sumCos(pi, 30)
The first parameter is the angle, the second is the number of terms you want to take the series to (i.e. effects the precision). This is a numerical solution which I think is really what you're after.
btw I took the liberty of correcting your initial sum, surely n must start from 0 if you are trying to approximate cos
If you want to understand my formula (which surely you do) then you need to read up on some essential Matlab basics namely the colon operator and then the concept of using . to perform element-wise operations.
In MATLAB itself, no, you cannot solve an infinite sum. You would have to estimate it as you suggested. The page you were looking at is part of the Symbolic Math toolbox, which is an add-on to MATLAB. In particular, you were looking at MuPAD, which is rather similar to Mathematica. It is a symbolic math workspace, whereas MATLAB is more of a numeric math workspace. If you own the Symbolic Math toolbox, you can either use MuPAD as you tried to above, or you can use the symsum function from within MATLAB itself to carry out sums of series.

Hamming distance in k-means clustering

I want to use the hamming distance in kmeans clustering in Matlab, but I get an error saying that my data must be binary.
Is there anyway around this? The data matrix that I use can't be binary (it has a physical interpretation that must allow for values 0,1,2,3) but it's important that I use the Hamming distance.
Per the MATLAB documentation, the Hamming distance measure for kmeans can only be used with binary data, as it's a measure of the percentage of bits that differ.
You could try mapping your data into a binary representation before using the function. You could also look at using the city block distance as an alternative if possible, as it is suitable for non-binary input.
The data to cluster must be of type logical. You can convert your 0/1 double, single, uintX data by a single command:
x = logical( y );
If you want to convert uint8 type data to binary, check the function uint8tobit(). Take a look at de2bi() and bi2de() functions.