ANN Mean Squared Error - neural-network

When using Mean Squared Error to measure error, should the calculation be done after every epoch or after every pattern within a specific data set?
Thanks John.

You should calculate the MSE after every epoch, using the errors from every pattern in that epoch.

You need to find the sum of the squared errors over every pattern in the epoch first. Then find the mean squared error, which is that sum divided by N, the number of patterns.
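For illustration, here is a minimal NumPy sketch of that per-epoch calculation; the variable names and values are invented for the example:

import numpy as np

# Per-epoch MSE: accumulate the squared errors over all patterns, divide by N.
targets = np.array([0.0, 1.0, 1.0, 0.0])  # desired outputs, one per pattern
outputs = np.array([0.1, 0.8, 0.7, 0.2])  # network outputs for this epoch

errors = targets - outputs
mse = np.sum(errors ** 2) / errors.size   # equivalently np.mean(errors ** 2)
print(mse)                                # 0.045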

Related

Matlab Zero Tolerance in rank function

I am wondering if there is a technical or theoretical reason why MATLAB's rank function treats as zero any singular value below max(size(A))*eps(norm(A)). Can you please provide some intuition?
Thank you!
The following answer is not based on proper mathematical reasoning, it is just speculation (as you were asking for intuition):
norm(A) is the order of magnitude of the matrix entries.
eps(norm(A)) is thus the accuracy that the floating point representation of the matrix entries typically has.
Now, suppose you add N numbers that should theoretically add up to zero, but each of them carries an error of about eps ... I think we would expect an error on the order of sqrt(N) * eps in the result.
Then, given that the algorithm that computes the rank performs N^2 operations on the matrix entries (where N is its size) to result in a number that is checked against zero, the error that we would then expect is what you stated in your question.
What I don't know is whether the algorithm that MATLAB uses really has complexity N^2.
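As a concrete illustration (a NumPy sketch of the same thresholding rule, not MATLAB's actual implementation), the tolerance amounts to discarding singular values that are indistinguishable from accumulated rounding error:

import numpy as np

# Count singular values above the MATLAB-style tolerance max(size(A)) * eps(norm(A)).
# For the 2-norm, norm(A) is the largest singular value, and np.spacing(x) is the
# floating-point spacing at x, i.e. MATLAB's eps(x).
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],   # an exact multiple of row 1
              [1.0, 0.0, 1.0]])

s = np.linalg.svd(A, compute_uv=False)  # singular values, largest first
tol = max(A.shape) * np.spacing(s[0])
print(int(np.sum(s > tol)))             # 2: the tiny third value counts as zero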

Compute average distance between vector and its permutations

I have a vector, say x = [1 1.5 2]. I want to compute the expected distance between that vector and a random permutation of the vector. The assumption is that all permutations are equally likely.
For the example above, the solution should be 4/9. The first element changes 1/2 on average, the second element changes 1/3 on average, and the last one 1/2. The average change is therefore 4/9.
The problem is that this vector has about 50-100 entries. Is there a smart way to compute this expected distance?
I am now using mean(mean(abs(bsxfun(@minus,x,x')))) and this seems to do the trick.
One of the rare cases where bsxfun does not provide the fastest solution. If you want to make use of the symmetry, use pdist (note that pdist expects one observation per row, hence x(:)):
s = sum(pdist(x(:),'cityblock'))/numel(x)^2*2
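The same symmetry trick can be sketched in Python with SciPy's pdist: each position receives each value with probability 1/n, so the expected per-element change is the average of all n^2 pairwise absolute differences, i.e. twice the sum over distinct pairs divided by n^2.

import numpy as np
from scipy.spatial.distance import pdist

x = np.array([1.0, 1.5, 2.0])
# pdist also wants one observation per row, hence the reshape to a column
s = 2 * pdist(x[:, None], metric='cityblock').sum() / x.size ** 2
print(s)  # 0.4444... == 4/9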

How do I know the confidence level of my correct probability score?

I have a writer recognition system that returns an NLL (negative log-likelihood) score for a test sample against every trained model. For example, if there are thirteen models to compare the sample against, the NLL output will look like this:
15885.1881156907 17948.1931699086 17205.1548161452 16846.8936368077 20798.8048757930 18153.8179076007 18972.6746781821 17398.9047592641 19292.8326540969 22559.3178790489 17315.0994094185 19471.9518308519 18867.2297851016
Each column represents the score of that sample against one model: column 1 gives the score against model 1, and so on.
This test sample was written by writer 1, so the first column should have the minimum value for a correct prediction.
The output I provided here gives the desired prediction, as the value in column 1 is the minimum.
When I presented my results, I was asked how confident I was about the scores or the predicted values, and to provide a confidence level for each score.
I did some reading after this and found posts on 95% confidence intervals, which dominate my Google results, but that does not appear to be what I need.
The reason I need this: suppose for a test sample I have scores from two models. Then, using the confidence level, I am supposed to know which score to pick.
For example for the same test sample the scores from another model are:
124494.535128967 129586.451168849 126269.733526396 129579.895935672 128582.387405272 125984.657455834 127486.755531507 125162.136816278 129790.811437270 135902.112799503 126599.346536290 136223.382395325 126182.202727967
Both predict correctly, since in both cases the score in column 1 is the minimum. But again, how do I find the confidence level of my score?
Would appreciate any guidance here.
To my knowledge, you cannot evaluate a confidence level for just one value.
Suppose you store your results in a matrix where each column corresponds to a model and each row corresponds to an example (or observation). You can evaluate the confidence for every single model by using all the predicted results from that model (i.e. you can evaluate the confidence interval for any column of the matrix) according to the following procedure (sketched in code below):
Evaluate the mean value of the column, let's call this µ
Evaluate the standard deviation of the column, let's call this σ
Evaluate the standard error of the mean as ε=σ/sqrt(N), where N is the number of samples (rows)
The lower bound of the confidence interval is given by µ-2ε, and the upper bound by µ+2ε (roughly a 95% interval). By straightforward subtraction you can find the width of this confidence interval: the closer it is to zero, the more precise your estimate.
Hope this is what you're looking for.
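A minimal NumPy sketch of the procedure above, with random numbers standing in for the real score matrix (all names and values here are invented):

import numpy as np

rng = np.random.default_rng(0)
scores = rng.normal(17000, 500, size=(100, 13))  # fake (observations x models) NLL matrix

mu = scores.mean(axis=0)                 # per-model mean
sigma = scores.std(axis=0, ddof=1)       # per-model sample standard deviation
eps = sigma / np.sqrt(scores.shape[0])   # standard error of the mean
lower, upper = mu - 2 * eps, mu + 2 * eps
width = upper - lower                    # the closer to zero, the more precise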

Python and Matlab compute variances differently. Am I using the correct functions? [duplicate]

I am trying to convert MATLAB code to NumPy and found that NumPy gives a different result for the std function.
in matlab
std([1,3,4,6])
ans = 2.0817
in numpy
np.std([1,3,4,6])
1.8027756377319946
Is this normal? And how should I handle this?
The NumPy function np.std takes an optional parameter ddof: "Delta Degrees of Freedom". By default, this is 0. Set it to 1 to get the MATLAB result:
>>> np.std([1,3,4,6], ddof=1)
2.0816659994661326
To add a little more context, in the calculation of the variance (of which the standard deviation is the square root) we typically divide by the number of values we have.
But if we select a random sample of N elements from a larger distribution and calculate the variance, division by N can lead to an underestimate of the actual variance. To fix this, we can lower the number we divide by (the degrees of freedom) to a number less than N (usually N-1). The ddof parameter allows us to change the divisor by the amount we specify.
Unless told otherwise, NumPy will calculate the biased estimator for the variance (ddof=0, dividing by N). This is what you want if you are working with the entire distribution (and not a subset of values which have been randomly picked from a larger distribution). If the ddof parameter is given, NumPy divides by N - ddof instead.
The default behaviour of MATLAB's std is to correct the bias for sample variance by dividing by N-1. This gets rid of some (but probably not all) of the bias in the standard deviation. This is likely to be what you want if you're using the function on a random sample of a larger distribution.
The nice answer by @hbaderts gives further mathematical details.
The standard deviation is the square root of the variance. The variance of a random variable $X$ is defined as

$$\mathrm{Var}(X) = \mathbb{E}\left[(X - \mathbb{E}[X])^2\right]$$

An estimator for the variance would therefore be

$$\hat{\sigma}_n^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$$

where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ denotes the sample mean. For randomly selected $x_i$, it can be shown that this estimator does not converge to the real variance $\sigma^2$, but to

$$\mathbb{E}\left[\hat{\sigma}_n^2\right] = \frac{n-1}{n}\,\sigma^2$$

If you randomly select samples and estimate the sample mean and variance, you will have to use a corrected (unbiased) estimator

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$$

which will converge to $\sigma^2$. The correction factor $\frac{n}{n-1}$ is also called Bessel's correction.
Now, by default, MATLAB's std calculates the unbiased estimator with the denominator n-1. NumPy, however (as @ajcr explained), calculates the biased estimator with no correction by default. The parameter ddof allows you to set any denominator n-ddof; by setting it to 1 you get the same result as in MATLAB.
Similarly, MATLAB's std accepts a second parameter w, which specifies the weighting scheme. The default, w=0, results in the denominator n-1 (unbiased estimator), while w=1 uses the denominator n (biased estimator).
For people who aren't great with statistics, a simplistic guide is:
Include ddof=1 if you're calculating np.std() for a sample taken from your full dataset.
Use ddof=0 (the default) if you're calculating np.std() for the full population.
The ddof=1 correction is included for samples to counterbalance the bias introduced by estimating the mean from the same sample; a quick numerical check follows below.
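Here is a short NumPy check of the relationship, using the numbers from the question:

import numpy as np

x = np.array([1.0, 3.0, 4.0, 6.0])
n = x.size

biased = np.var(x)            # divides by n     -> 3.25
unbiased = np.var(x, ddof=1)  # divides by n - 1 -> 4.333...
assert np.isclose(unbiased, biased * n / (n - 1))  # Bessel's correction

print(np.sqrt(biased), np.sqrt(unbiased))  # 1.8028 and 2.0817, as in the question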

Matlab computing the distance

I have a problem computing the distance between two matrices. The first matrix is 5000x6, the second is 5x80.
I want to use this syntax to calculate the distances:
pdist2(mCe(1,:),row);
But this gives me an error saying "columns in x have to be same in y".
Is there a way to compute the distances when the matrices have a different number of columns?
The pdist2 function calculates the distance between sets of points based on a metric. A metric is a function of two vector arguments from the same metric space, and as such the arguments are required to have the same dimension. What you want to do is not possible under the definition of a metric. Read this link for more details:
http://en.wikipedia.org/wiki/Metric_space
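For what it's worth, the analogous function in Python, SciPy's cdist, enforces the same requirement; a short sketch:

import numpy as np
from scipy.spatial.distance import cdist

A = np.random.rand(5000, 6)
B = np.random.rand(5, 6)   # must have the same number of columns (dimension) as A

D = cdist(A, B)            # 5000 x 5 matrix of pairwise Euclidean distances
# cdist(A, np.random.rand(5, 80)) would raise an error: mismatched dimensions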