Higher-order moments and shape parameters - MATLAB

I have used the 5th moment of my data as a feature for classification and it gives good results, but I don't know what it measures. Is it a shape parameter like kurtosis and skewness?
I'm using MATLAB's
m = moment(X, order);
which returns the central sample moment of X specified by the positive integer order.
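One way to see what it measures is to standardize it, the same way skewness and kurtosis are standardized central moments. A minimal sketch, where the example data and variable names are purely illustrative:
X = randn(1000, 1);            % example data
m5 = moment(X, 5);             % 5th central moment (depends on the scale of X)
m5_std = m5 / std(X, 1)^5;     % standardized 5th moment: a scale-free shape measure
sk = skewness(X);              % 3rd standardized moment
ku = kurtosis(X);              % 4th standardized moment
Like skewness, the standardized 5th moment is an odd-order moment, so it is a shape parameter describing asymmetry; it simply weights the tails more heavily than skewness does.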


How do I know the confidence level of my correct probability score?

I have a writer recognition system that gives back an NLL (negative log-likelihood) score for a test sample against every trained model. For example, if there are thirteen models to compare the sample against, the NLL output will look like this.
15885.1881156907 17948.1931699086 17205.1548161452 16846.8936368077 20798.8048757930 18153.8179076007 18972.6746781821 17398.9047592641 19292.8326540969 22559.3178790489 17315.0994094185 19471.9518308519 18867.2297851016
Each column represents the score of that sample against one model: column 1 gives the score against model 1, and so on.
This test sample was written by the writer of model 1, so for a correct prediction the first column should have the minimum value.
The output I provided here gives the desired prediction, as the value of column 1 is the minimum.
When I presented my results, I was asked how confident I was about the scores and the predicted values, and to provide a confidence level for each score.
I did some reading after this and found many posts on the 95% confidence interval, which dominates my Google results, but that does not appear to be what I need.
The reason I need this: suppose for a test sample I have scores from two models. Then, using the confidence level, I am supposed to know which score to pick.
For example for the same test sample the scores from another model are:
124494.535128967 129586.451168849 126269.733526396 129579.895935672 128582.387405272 125984.657455834 127486.755531507 125162.136816278 129790.811437270 135902.112799503 126599.346536290 136223.382395325 126182.202727967
Both predict correctly, since in both cases the score in column 1 is the minimum. But again, how do I find the confidence level of my score?
Would appreciate any guidance here.
As far as I know, you cannot evaluate a confidence level for just one value.
Suppose you store your results in a matrix where each column corresponds to a model and each row corresponds to an example (or observation). You can evaluate the confidence for every single model by using all the predicted results from that model (i.e. you can evaluate the confidence interval for any column in your matrix) according to the following procedure:
Evaluate the mean value of the column, let's call this µ
Evaluate the standard deviation of the column, let's call this σ
Evaluate the mean error as ε=σ/sqrt(N), where N is the number of samples (rows)
The lower bound of the confidence interval is given by µ-2ε, whereas the upper bound is given by µ+2ε. By straightforward subtraction you can find the amplitude of this confidence interval: the closer it is to zero, the more accurate your measurement (see the sketch after this list).
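A minimal MATLAB sketch of that procedure, assuming the scores are collected in a matrix scores with one row per test sample and one column per model (the variable names are illustrative):
mu    = mean(scores);          % per-model mean, 1-by-numModels
sigma = std(scores);           % per-model standard deviation
N     = size(scores, 1);       % number of test samples (rows)
err   = sigma / sqrt(N);       % standard error of the mean
lower = mu - 2*err;            % approximate 95% lower bound
upper = mu + 2*err;            % approximate 95% upper bound
width = upper - lower;         % amplitude of each confidence interval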
Hope this is what you're looking for.

Python and Matlab compute variances differently. Am I using the correct functions? [duplicate]

I am trying to convert MATLAB code to NumPy and found that NumPy gives a different result for the std function.
In MATLAB:
std([1,3,4,6])
ans = 2.0817
In NumPy:
np.std([1,3,4,6])
1.8027756377319946
Is this normal? And how should I handle this?
The NumPy function np.std takes an optional parameter ddof: "Delta Degrees of Freedom". By default, this is 0. Set it to 1 to get the MATLAB result:
>>> np.std([1,3,4,6], ddof=1)
2.0816659994661326
To add a little more context: in the calculation of the variance (of which the standard deviation is the square root), we typically divide by the number of values we have.
But if we select a random sample of N elements from a larger distribution and calculate the variance, dividing by N can lead to an underestimate of the actual variance. To fix this, we lower the number we divide by (the degrees of freedom) to a number less than N (usually N-1). The ddof parameter allows us to change the divisor by the amount we specify.
Unless told otherwise, NumPy calculates the biased estimator of the variance (ddof=0, dividing by N). This is what you want if you are working with the entire distribution (and not a subset of values which have been randomly picked from a larger distribution). If the ddof parameter is given, NumPy divides by N - ddof instead.
The default behaviour of MATLAB's std is to correct the bias of the sample variance by dividing by N-1. This gets rid of some (but probably not all) of the bias in the standard deviation. This is likely to be what you want if you're using the function on a random sample of a larger distribution.
The nice answer by @hbaderts gives further mathematical details.
The standard deviation is the square root of the variance. The variance of a random variable X is defined as

$$\sigma^2 = \mathrm{E}\!\left[(X-\mu)^2\right]$$

An estimator for the variance would therefore be

$$s_N^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i-\bar{x}\right)^2$$

where $\bar{x}$ denotes the sample mean. For randomly selected $x_i$, it can be shown that this estimator does not converge to the real variance $\sigma^2$, but to

$$\mathrm{E}\!\left[s_N^2\right] = \frac{N-1}{N}\,\sigma^2$$

If you randomly select samples and estimate the sample mean and variance, you will have to use a corrected (unbiased) estimator

$$s^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(x_i-\bar{x}\right)^2$$

which will converge to $\sigma^2$. The correction term $N-1$ is also called Bessel's correction.
Now, by default, MATLAB's std calculates the unbiased estimator with the correction term N-1. NumPy, however (as @ajcr explained), calculates the biased estimator with no correction term by default. The parameter ddof allows you to set any correction term N-ddof; by setting it to 1 you get the same result as in MATLAB.
Similarly, MATLAB's std accepts a second parameter w, which specifies the "weighting scheme". The default, w=0, results in the correction term N-1 (unbiased estimator), while for w=1, only N is used as the correction term (biased estimator).
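A minimal sketch of these equivalences in MATLAB, using the values from the question:
x = [1 3 4 6];
std(x)        % 2.0817 -- unbiased, divides by N-1 (same as std(x, 0))
std(x, 1)     % 1.8028 -- biased, divides by N, matches NumPy's default np.std(x)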
For people who aren't great with statistics, a simplistic guide is:
Include ddof=1 if you're calculating np.std() for a sample taken from your full dataset.
Use ddof=0 (the default) if you're calculating np.std() for the full population.
The ddof correction is applied for samples in order to counterbalance the bias that arises because the sample mean is estimated from the same data.

2-Dimensional Minimization without Derivatives and Ignoring certain Input Parameters on the go

I have a function V which depends on two variables v1 and v2 and a parameter array p containing 15 parameters.
I want to minimize V with respect to v1 and v2, but there is no closed-form expression for the function, so I can't build and use its derivatives.
The problem is the following: to calculate the value of my function I need the eigenvalues of two 4x4 matrices (which should be symmetric and real in principle, but sometimes the eigensolver does not return real eigenvalues). I calculate these eigenvalues with the Eigen package. The entries of the matrices are given by v1, v2, and p.
There are certain input sets for which some of these eigenvalues become negative. I want to ignore these input sets in my calculation, as they lead to a complex function value and my function is only allowed to take real values.
Is there a way to include this? My first attempt was a Nelder-Mead simplex algorithm using the GSL library, returning a far-too-high output value for the function whenever one of the eigenvalues becomes negative, but this doesn't work.
Thanks for any suggestions.
For the Nelder-Mead simplex, you could reject new points as vertices for the simplex, unless they have the desired properties.
Your method of artificially increasing the function value at forbidden points is also known as a penalty or barrier function. You might want to re-design your penalty function; a sketch of that idea follows below.
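This is not the asker's C++/GSL setup, but a minimal MATLAB sketch of a smoother penalty used with a Nelder-Mead search (fminsearch); V_true and eig_of_point are hypothetical placeholders for the real objective and the eigenvalue computation:
function f = penalized_objective(v, p)
    lambda = eig_of_point(v, p);           % eigenvalues of the two 4x4 matrices (placeholder)
    bad = lambda(lambda < 0);
    if ~isempty(bad)
        f = 1e6 + 1e3 * sum(abs(bad));     % penalty grows with the size of the violation
    else
        f = V_true(v, p);                  % ordinary, real-valued objective (placeholder)
    end
end
% vopt = fminsearch(@(v) penalized_objective(v, p0), [v1_start; v2_start]);
Making the penalty grow with the amount of constraint violation, instead of returning one huge constant, gives the simplex a slope to follow back toward the feasible region.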
Another optimization method without derivatives is the Simulated Annealing method. Again, you could modify the method to avoid forbidden points.
What do you mean by "doesn't work"? Does it take too long? Are the resulting function values too high?
Depending on the cost of a function evaluation, another approach is to simply scan a 2D interval, evaluate all width x height function values, and drill down into the tile with the lowest function values; a rough sketch of this follows.
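A rough MATLAB sketch of that scan-and-refine idea; the bounds, the resolution, and the objective V(v1, v2, p) are illustrative assumptions:
v1grid = linspace(v1_lo, v1_hi, 50);
v2grid = linspace(v2_lo, v2_hi, 50);
vals = inf(numel(v2grid), numel(v1grid));
for i = 1:numel(v1grid)
    for j = 1:numel(v2grid)
        vals(j, i) = V(v1grid(i), v2grid(j), p);   % return Inf for forbidden points
    end
end
[~, idx] = min(vals(:));
[jbest, ibest] = ind2sub(size(vals), idx);
% refine by repeating the scan on a smaller box around (v1grid(ibest), v2grid(jbest))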

Which scaling technique does it use?

I have a matrix X of size 100-by-2000 (double). I want to know which kind of scaling technique is applied to X in the following command, and why it does not use the z-score for scaling:
X = X./repmat(sqrt(sum(X.^2)),size(X,1),1);
That scaling comes from linear algebra: it is what we call normalizing to a unit vector. Assuming that each row is an observation and each column is a feature, each feature is being normalized over all observations so that its overall length / magnitude across the observations is set to 1.
The denominator looks at each feature and computes its norm, or magnitude, over all observations. Once you have these magnitudes, you divide every value of each feature by its respective magnitude; a short sketch of this follows.
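A minimal sketch of what the command does; after it runs, every column of X has unit Euclidean norm (the example data is illustrative):
X  = rand(100, 2000);                               % example data
Xn = X ./ repmat(sqrt(sum(X.^2)), size(X, 1), 1);   % divide each column by its 2-norm
check = sum(Xn.^2);                                 % every entry is (approximately) 1
% In newer MATLAB releases the same thing can be written as Xn = X ./ vecnorm(X)
% or Xn = normalize(X, 'norm').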
Unit vectors are often employed to describe a point in feature space with respect to a set of basis vectors. Normalizing to unit vectors gives a simple, scale-free way to represent each component in feature space, so what is probably happening here is that the observations are being transformed so that each component / feature is expressed in terms of a set of basis vectors, where each basis vector corresponds to one feature in the data.
Check out the Wikipedia article on Unit Vectors for more details: http://en.wikipedia.org/wiki/Unit_vector

Using Linear Prediction Over Time Series to Determine Next K Points

I have a time series of N sunspot data points and would like to predict, based on a subset of these points, the remaining points in the series, and then evaluate how correct the predictions are.
I'm just getting introduced to linear prediction in MATLAB, so I decided to use the following code segment within a loop so that every point outside of the training set, up to the end of the given data, has a prediction:
%x is the data, training set is some subset of x starting from beginning
%'unknown' is the number of points to extend the prediction over starting from the
%end of the training set (i.e. difference in length of training set and data vectors)
%x_pred is set to x initially
p = length(training_set);
coeffs = lpc(training_set, p);
for i = 1:unknown
    nextValue = -coeffs(2:end) * x_pred(end-unknown-1+i:-1:end-unknown-1+i-p+1)';
    x_pred(end-unknown+i) = nextValue;
end
error = norm(x - x_pred)
I have three questions regarding this:
1) Does this appropriately do what I have described? I ask because my error seems rather large (>100) when predicting over only the last 20 points of a dataset that has hundreds of points.
2) Am I interpreting the second argument of lpc correctly? Namely, that it is the 'order', i.e. the number of past points used to predict the next point?
3) Is there a more efficient, single-line function in MATLAB that I can call to replace the loop and compute all the necessary predictions for me, given some subset of my overall data as a training set?
I tried looking through the lpc MATLAB tutorial, but it didn't seem to do the kind of prediction my needs require. I have also been using How to use aryule() in Matlab to extend a number series? as a reference.
So, after much deliberation and experimentation, I have found the above approach to be correct, and there does not appear to be a single MATLAB function to do this work. The large errors are reasonable, since I am using a linear prediction algorithm on a problem (sunspot prediction) that has inherently nonlinear behavior.
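If the hand-written loop ever becomes inconvenient, the same recursion can be expressed as the zero-input response of the AR filter defined by the lpc coefficients. This is a sketch assuming the Signal Processing Toolbox, not a single built-in predictor, just an equivalent vectorized form:
p = length(training_set);                               % AR order, as in the loop version above
coeffs = lpc(training_set, p);                          % coeffs(1) == 1
zi = filtic(1, coeffs, training_set(end:-1:end-p+1));   % filter state from the last p known samples
x_pred_tail = filter(1, coeffs, zeros(1, unknown), zi); % the 'unknown' predicted points
With zero input, filter reproduces the recursion x(n) = -coeffs(2)*x(n-1) - ... - coeffs(p+1)*x(n-p), so x_pred_tail should match x_pred(end-unknown+1:end) from the loop above.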
Hope this helps anyone else out there working on something similar.