calculate error neural network - neural-network

Recently I have been implementing neural networks for my research. While trying to define the error of the network, I got confused because I found several ways to compute the mean squared error:
Method 1:
global error = sum of (Tik - Yik)^2
where i = number of output neurons and k = number of training samples
RMS1 = sqrt(global error / (i + k))

Method 2:
global error = sum of (Tik - Yik)^2
mean error = sum of (Tik - mean(Tik))^2
where i = number of output neurons and k = number of training samples
RMS2 = global error / mean error
I am confused about these two. Can somebody explain to me which one is correct, or whether both are valid?

Assuming these are the equations you're referring to (RMS1 from the first method and RMS2 from the second):
If so, the RMS1 value looks like a valid RMS expression, while RMS2 is simply a fractional expression, not an RMS expression at all.
Just one thing though: it looks like your RMS1 also contains an error/typo: the denominator inside your sqrt() should simply be "k", not "i + k". I think you should divide by the training-set size only, and not include the neuron count.
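To make that concrete, here is a minimal NumPy sketch of the corrected RMS1 (the arrays T and Y and their values are made up for illustration; T holds the targets and Y the network outputs, one row per training sample and one column per output neuron):

import numpy as np

# made-up targets T and network outputs Y: rows = training samples, columns = output neurons
T = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Y = np.array([[0.9, 0.2], [0.1, 0.8], [0.7, 0.9]])

global_error = np.sum((T - Y) ** 2)   # sum of squared errors over all neurons and samples
k = T.shape[0]                        # number of training samples
rms = np.sqrt(global_error / k)       # divide by the sample count only, as suggested above
print(rms)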
It would be useful if you also cited your references for these equations.


Anomaly in accuracy calculation

I am classifying a dataset with four classes using pretrained VGG19. To calculate accuracy, I used this formula:
accuracy = sum(predictedLabels==testLabels)/numel(predictedLabels) --Eq 1
Then I calculated the confusion matrix using:
confMat = confusionmat(testLabels, predictedLabels) --Eq 2
From which I got a matrix with 4 rows and 4 columns since I had 4 classes.
Now, we know that the accuracy formula is also:
Accuracy = (TP + TN) / (TP + TN + FP + FN) --Eq 3
So I also calculated accuracy from the confusion matrix formed through Eq. 2 above, where
TP = value at (row == column),
FP = sum of the column - TP,
FN = sum of the row - TP,
TN = sum of the diagonal - TP
If I am doing the above steps correctly, my confusion is that I am getting different accuracies from the two methods, Eq. 1 and Eq. 3. The accuracy I get with Eq. 1 is equivalent to the formula TP/(TP+TN). If that is the case, then Eq. 1 would be the wrong formula for calculating accuracy, yet this formula is used across all the MATLAB deep learning examples.
So either MATLAB is doing something wrong (which has probability 0, I know) or I am doing something wrong, but unfortunately I am unable to pinpoint my mistake.
Now, the question is,
Am I doing it wrong? Where am I missing the step? How to correct it? What is the logical explanation of this anomaly?
EDIT
This anomaly in the accuracy calculation happens because of the class imbalance problem, i.e. when there are different numbers of samples in each class. The regular accuracy formula in Eq. 3 will therefore not work in such cases.
The main issue is that negative and positive refer to a prediction of a single class (is this a cat or not), while you are doing classification with more than two categories. The classifier does not give you positive and negative answers (for an "is it a cat" prediction), so it is not possible to label its answers as true positive, false positive, etc. Therefore equation 3 is meaningless here, and so is the method for computing TP, TN, etc. For example, if TP is row == column as you defined, then these are the correctly classified values on the diagonal of confMat. But what is TN? According to your definition it is the sum of the diagonal (i.e. the TP values) minus TP, which does not count true negatives at all. I hope this helps put things on the right track.
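To illustrate the distinction, here is a hedged NumPy sketch (the confusion-matrix values are made up): the overall accuracy of a multiclass confusion matrix is the trace divided by the total, which is exactly what Eq. 1 computes, while TP/TN/FP/FN only make sense per class in a one-vs-rest view:

import numpy as np

# made-up 4-class confusion matrix: rows = true class, columns = predicted class
confMat = np.array([[50,  2,  1,  0],
                    [ 3, 45,  4,  1],
                    [ 0,  5, 40,  2],
                    [ 1,  0,  3, 43]])

# overall accuracy: correctly classified samples over all samples (this is Eq. 1)
overall_accuracy = np.trace(confMat) / confMat.sum()
print("overall accuracy:", overall_accuracy)

# per-class one-vs-rest counts: only in this binary view do TP/TN/FP/FN make sense
for c in range(confMat.shape[0]):
    TP = confMat[c, c]
    FP = confMat[:, c].sum() - TP        # predicted as class c but actually another class
    FN = confMat[c, :].sum() - TP        # actually class c but predicted as another class
    TN = confMat.sum() - TP - FP - FN    # everything not involving class c
    print("class", c, "accuracy:", (TP + TN) / (TP + TN + FP + FN))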

How to interpret coefficients and p-values in multiple linear regression with two categorical variables and interaction

I am new to linear regression so I hope you can help me with interpreting the output of a multiple linear regression with two categorical predictor variables and an interaction term.
I did the following linear regression:
lm(H1A1c ~ Vowel * Speaker, data=data)
Vowel and Speaker are both categorical variables. Vowel can be "breathy", "modal" or "creaky" and there are four different speakers (F01, F02, M01, M02). I want to see if a combination of those two categories can predict the values for H1A1c.
My output is this:
Output of lm
Please correct me if I am wrong but I think we can see from this output that the relationship between most of my variables can't be characterised as linear. What I don't really understand is how to interpret the first p-value. When I googled I found that all the other p-values refer to the relationship of the respective coefficient and what this coefficient relates to. E.g. the p-value in the third line refers to the relationship of the coefficient of the third line to the first one, i.e. 23.1182-9.6557.
What about the p-value of the first coefficient, though? There can't be a linear relationship if there is no relationship? What does this p-value refer to?
Thanks in advance for your answers!
The first p-value (Intercept) tells you how likely it is that the y-intercept of your fitted line is zero (i.e. that the line passes through the origin). Since the p-value in your result is far below 0.05, you can say the y-intercept is almost certainly not zero.
The other p-values are to be interpreted differently. Your interpretation is correct: they give an idea of whether the coefficients of the variables they represent are likely to be zero or not.
"the p-value in the third line refers to the relationship of the coefficient of the third line to the first one, i.e. 23.1182-9.6557"
(-9.6557) means that, on average, the predicted value of H1A1c will be 9.6557 units lower when GlottalContext = creaky (i.e. GlottalContextcreaky = 1) compared to when GlottalContext = breathy (since breathy is your reference category here), keeping all other predictors unchanged. This holds, of course, only when the corresponding p-value is less than 0.05, which, I see, is the case for GlottalContextcreaky.
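For a concrete picture of how the intercept and the dummy coefficients line up, here is a hedged Python sketch using statsmodels (the data frame and its values are invented purely for illustration; the column names mirror the question):

import pandas as pd
import statsmodels.formula.api as smf

# invented data: H1A1c is continuous, Vowel and Speaker are categorical
data = pd.DataFrame({
    "H1A1c":   [23.1, 13.5, 14.2, 25.0, 11.8, 12.9, 22.7, 13.1, 15.0, 24.3, 12.2, 14.8],
    "Vowel":   ["breathy", "creaky", "modal"] * 4,
    "Speaker": ["F01"] * 6 + ["F02"] * 6,
})

# same structure as lm(H1A1c ~ Vowel * Speaker): main effects plus interaction
fit = smf.ols("H1A1c ~ Vowel * Speaker", data=data).fit()
print(fit.summary())

# The (Intercept) row is the predicted mean H1A1c for the reference levels
# (here Vowel = breathy, Speaker = F01); its p-value tests whether that mean is zero,
# not whether the model as a whole fits. Each other coefficient is a difference
# relative to that reference cell.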
(Additionally, if I were to assume that H1A1c is a continuous variable, I am not sure if choosing a linear regression to predict H1A1c would be the best way to go since both your predictors are categorical. You might want to explore other algorithms e.g. transform your dependent variable to categorical and do a binary/multinomial logistic regression or a decision tree)

Simple binary logistic regression using MATLAB

I'm working on doing a logistic regression using MATLAB for a simple classification problem. My covariate is one continuous variable ranging between 0 and 1, while my categorical response is a binary variable of 0 (incorrect) or 1 (correct).
I'm looking to run a logistic regression to establish a predictor that would output the probability of some input observation (e.g. the continuous variable as described above) being correct or incorrect. Although this is a fairly simple scenario, I'm having some trouble running this in MATLAB.
My approach is as follows: I have one column vector X that contains the values of the continuous variable, and another equally-sized column vector Y that contains the known classification of each value of X (e.g. 0 or 1). I'm using the following code:
[b,dev,stats] = glmfit(X,Y,'binomial','link','logit');
However, this gives me nonsensical results with a p = 1.000, coefficients (b) that are extremely high (-650.5, 1320.1), and associated standard error values on the order of 1e6.
I then tried using an additional parameter to specify the size of my binomial sample:
glm = GeneralizedLinearModel.fit(X,Y,'distr','binomial','BinomialSize',size(Y,1));
This gave me results that were more in line with what I expected. I extracted the coefficients, used glmval to create estimates (Y_fit = glmval(b,[0:0.01:1],'logit');), and created an array for the fitting (X_fit = linspace(0,1)). When I overlaid the plots of the original data and the model using figure, plot(X,Y,'o',X_fit,Y_fit,'-'), the resulting plot of the model essentially looked like the lower 1/4th of the 'S'-shaped curve that is typical of logistic regression plots.
My questions are as follows:
1) Why did my use of glmfit give strange results?
2) How should I go about addressing my initial question: given some input value, what's the probability that its classification is correct?
3) How do I get confidence intervals for my model parameters? glmval should be able to input the stats output from glmfit, but my use of glmfit is not giving correct results.
Any comments and input would be very useful, thanks!
UPDATE (3/18/14)
I found that mnrval seems to give reasonable results. I can use [b_fit,dev,stats] = mnrfit(X,Y+1); where Y+1 simply makes my binary classifier into a nominal one.
I can loop through [pihat,lower,upper] = mnrval(b_fit,loopVal(ii),stats); to get various pihat probability values, where loopVal = linspace(0,1) or some appropriate input range and ii = 1:length(loopVal).
The stats parameter has a great correlation coefficient (0.9973), but the p-values for b_fit are 0.0847 and 0.0845, which I'm not quite sure how to interpret. Any thoughts? Also, why would mnrfit work over glmfit in my example? I should note that the p-values for the coefficients when using GeneralizedLinearModel.fit were both p << 0.001, and the coefficient estimates were quite different as well.
Finally, how does one interpret the dev output from the mnrfit function? The MATLAB document states that it is "the deviance of the fit at the solution vector. The deviance is a generalization of the residual sum of squares." Is this useful as a stand-alone value, or is this only compared to dev values from other models?
It sounds like your data may be linearly separable. In short, since your input data is one-dimensional, that means there is some value xDiv such that all values of x < xDiv belong to one class (say y = 0) and all values of x > xDiv belong to the other class (y = 1).
If your data were two-dimensional this means you could draw a line through your two-dimensional space X such that all instances of a particular class are on one side of the line.
This is bad news for logistic regression (LR) as LR isn't really meant to deal with problems where the data are linearly separable.
Logistic regression is trying to fit a function of the following form:
y = 1 / (1 + exp(-(b0 + b1*x)))
This will only return values of y = 0 or y = 1 when the expression inside the exponential in the denominator goes to positive or negative infinity.
Now, because your data is linearly separable, and Matlab's LR function attempts to find a maximum likelihood fit for the data, you will get extreme weight values.
This isn't necessarily a solution, but try flipping the labels on just one of your data points (so for some index t where y(t) == 0 set y(t) = 1). This will cause your data to no longer be linearly separable and the learned weight values will be dragged dramatically closer to zero.
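Here is a hedged Python sketch (using scikit-learn rather than MATLAB, with made-up data) that shows the effect: with perfectly separable 1-D data, the fitted coefficient keeps growing as the regularization is weakened, which is the same runaway behaviour the unpenalized maximum-likelihood fit in glmfit runs into:

import numpy as np
from sklearn.linear_model import LogisticRegression

# made-up, perfectly separable 1-D data: every x below 0.5 is class 0, above 0.5 is class 1
X = np.linspace(0, 1, 20).reshape(-1, 1)
y = (X.ravel() > 0.5).astype(int)

# scikit-learn always applies some regularization; weakening it (larger C) lets the
# maximum-likelihood solution run away, so the coefficient keeps growing
for C in (1.0, 1e3, 1e6):
    clf = LogisticRegression(C=C).fit(X, y)
    print("C =", C, "coefficient =", clf.coef_[0][0], "intercept =", clf.intercept_[0])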

Connecting perceptrons with output of previous ones?

Thanks to the help I received and the research I did here, I was able to create a simple perceptron in C#; the code goes like this:
// inputs (W1, W2 and Bias are declared elsewhere)
int Input1 = A;
int Input2 = B;
// weighted sum
double WSum = Input1 * W1 + Input2 * W2 + Bias;
// get the sign: -1 for negative, +1 for positive
int Sign = Math.Sign(WSum);
double error = Desired - Sign;
// update the weights, 0.1 being the learning rate
W1 += error * Input1 * 0.1;
W2 += error * Input2 * 0.1;
return Sign;
I do not use Sigmoid here and just get -1 or 1.
I would have two questions:
1) Is it correct that my weights take on values like -5, etc.? When the input is e.g. 100, 50 the update goes like: W1 += error * 100 * 0.1
2) I want to go deeper and connect more neurons; I guess I would need at least two to provide inputs to a third. Is it correct that the third one will be fed only the values -1 and 1? I am aiming at simple pattern recognition, but so far I do not understand how it should work.
It is perfectly valid for the values of your weights to range from -Infinity to +Infinity. You should always use real numbers instead of integers (so, as mentioned above, double will work; 32-bit float precision is perfectly sufficient for neural networks).
Moreover, you should decay your learning rate with every learning step, e.g. reduce it by a factor of 0.99 after each update. Otherwise, your algorithm will oscillate when approaching an optimum.
If you want to go "deeper", you will need to implement a multilayer perceptron (MLP). There exists a proof that a neural network with simple threshold neurons and multiple layers always has an equivalent with only one layer. This is why, several decades ago, the research community temporarily abandoned the idea of artificial neural networks. In 1986, Geoffrey Hinton made the backpropagation algorithm popular; with it you can train MLPs with multiple hidden layers.
To solve non-linear problems like XOR or other complex problems like pattern recognition, you need to apply a non-linear activation function. Have a look at the logistic sigmoid activation function for a start. f(x) = 1. / (1. + exp(-x)). When doing this you should normalize your input as well as your output values to the range [0.0; 1.0]. This is especially important for the output neurons since the output of the logistic sigmoid activation function is defined in exactly this range.
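As a hedged illustration of that point (a minimal NumPy sketch with invented weights, not a trained network): with a logistic sigmoid, the hidden-layer outputs that feed the next neuron all lie in (0, 1), which is why the inputs and targets should be normalised to that range:

import numpy as np

def sigmoid(x):
    # logistic sigmoid: maps any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

# invented weights for a tiny 2-2-1 network (two inputs, two hidden neurons, one output)
W_hidden = np.array([[0.5, -0.4],
                     [0.3,  0.8]])          # one row per hidden neuron
b_hidden = np.array([0.1, -0.2])
W_out = np.array([1.2, -0.7])
b_out = 0.05

x = np.array([0.9, 0.2])                     # inputs normalised to [0, 1]
hidden = sigmoid(W_hidden @ x + b_hidden)    # hidden activations lie in (0, 1)
output = sigmoid(W_out @ hidden + b_out)     # so does the final output
print(hidden, output)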
A simple Python implementation of feed-forward MLPs using arrays can be found in this answer.
Edit: You also need at least 1 hidden layer to solve e.g. XOR.
Try setting your weights as double.
Also, I think it is much better to work with arrays, especially in neural networks; for a perceptron it is really the only practical way.
And you will need some for or while loops to achieve what you want.

Matlab - bug with linear discriminant analysis

I run
Y_testing_obtained = classify(X_testing, X_training, Y_training);
and the error I get is
Error using ==> classify at 246
The pooled covariance matrix of TRAINING must be positive definite.
X_training is a 1550 x 5 matrix. Can you please tell me what this error means, i.e. why it is appearing, and how to work around it?
Thanks
Explanation: When you run the function classify without specifying the type of discriminant function (as you did), MATLAB uses Linear Discriminant Analysis (LDA). Without going into too much detail on LDA, the algorithm needs to calculate the covariance matrix of X_training in order to solve an optimisation problem, and this matrix has to be positive definite (see Wikipedia: Positive-definite matrix). The underlying assumption is that your data is represented by a multivariate probability distribution, which always has a positive definite covariance matrix unless one or more variables are exact linear combinations of the others.
To solve your problem: It is possible that one of your variables is a linear combination of the others. You can try selecting a sensible subset of your variables, or perform Principal Component Analysis (PCA) on the training data and then classify using the first few principal components. Or, you could specify the type of discriminant function and choose one of the two naive Bayes classifiers, for example:
Y_testing_obtained = classify(X_testing, X_training, Y_training, 'diaglinear');
As a side note, you also need to have more observations (rows) than variables (columns), but in your case this is not the problem as you seem to have 1550 observations and 5 variables.
Finally, you can also have a look at the answers posted to a similar question on the Matlab forum.
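As a hedged sketch of those two workarounds in Python (scikit-learn rather than MATLAB, with synthetic data in which one column is deliberately an exact linear combination of two others): a rank check reveals the dependence, and projecting onto a few principal components before the discriminant analysis removes it:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# synthetic training data: column 4 is an exact linear combination of columns 0 and 1,
# which makes the pooled covariance matrix singular (not positive definite)
rng = np.random.default_rng(0)
X_training = rng.normal(size=(1550, 5))
X_training[:, 4] = 2.0 * X_training[:, 0] - X_training[:, 1]
Y_training = rng.integers(0, 2, size=1550)

# a rank lower than the number of columns signals exact linear dependence
print("rank:", np.linalg.matrix_rank(X_training), "columns:", X_training.shape[1])

# workaround: project onto the leading principal components, then classify on those
pca = PCA(n_components=4).fit(X_training)
lda = LinearDiscriminantAnalysis().fit(pca.transform(X_training), Y_training)
print("training accuracy:", lda.score(pca.transform(X_training), Y_training))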
Try regularizing the data using the cvshrink function in MATLAB.