MSE in neuralnet results and ROC curve of the results - neural-network

Hi, my question is a bit long, so please bear with me and read it to the end.
I am working on a project with 30 participants. We have two data sets: the first has 30 rows and 160 columns, and the second has the same 30 rows and 200 columns as outputs (y), and these outputs are independent. What I want to do is use the first data set to predict the outputs of the second data set. Since the first data set was wide (far more columns than rows) and high-dimensional, I used factor analysis and now have 19 factors that cover up to 98% of the variance. Now I want to use these 19 factors to predict the outputs of the second data set.
I am using neuralnet with backpropagation, and everything goes well: my predictions are really close to the outputs.
My questions :
1- Since my inputs are the factors (they range between -1 and 1) and my outputs are integers ranging from 4 to 10000, should I still scale them before running the neural network?
2- I scaled the data (both inputs and outputs) and then predicted with neuralnet. When I checked the MSE it was very high, around 6000, even though my predictions and the real outputs are very close to each other. But if I rescale the predictions and outputs back to the original units and then check the MSE, it is near zero. Is it unbiased to rescale and then check the MSE?
3- I read that it is better not to scale the outputs from the beginning, but if I scale only the inputs, all my predictions are 1. Is it correct not to scale the outputs?
4- If I want to plot the ROC curve, how can I do it, given that my predictions are never exactly equal to the real outputs?
Thank you for reading my question.

[edit#1]: There is a publication on how to produce ROC curves from neural network results:
http://www.lcc.uma.es/~jja/recidiva/048.pdf
1) You can scale your values (using min-max, for example), but fit the scaling on the training data set only. Save the parameters used in the scaling process (for min-max these are the min and max values by which the data is scaled). Only then can you scale your test data set WITH the min and max values you got from the training data set. Remember, with the test data set you are trying to mimic the process of classifying unseen data, and unseen data is scaled with the scaling parameters from the training data set.
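For illustration, here is a minimal Python sketch of that process (the array shapes and variable names are placeholders, not your actual data):

import numpy as np

def minmax_fit(train):
    # learn the scaling parameters from the training set only
    return train.min(axis=0), train.max(axis=0)

def minmax_apply(data, lo, hi):
    # scale any data with the training-set parameters
    return (data - lo) / (hi - lo)

X_train = np.random.rand(20, 19)   # e.g. factor scores of the training rows
X_test = np.random.rand(10, 19)    # held-out rows

lo, hi = minmax_fit(X_train)                 # fit on training data only
X_train_s = minmax_apply(X_train, lo, hi)
X_test_s = minmax_apply(X_test, lo, hi)      # reuse the training lo/hi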
2) When talking about errors, always mention which data set the error was computed on, and on which scale. You can compute an error function (there are different error functions; one of them is the mean squared error, or MSE) on the training data set and another on your test data set.
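This also bears on your question 2: the MSE depends on the units it is computed in. For min-max scaling, the MSE in the original units differs from the scaled MSE exactly by the squared range of the data, so neither is "biased"; just report which scale you used. A quick Python illustration with made-up numbers:

import numpy as np

y_true = np.array([4.0, 120.0, 9800.0, 560.0])
y_pred = np.array([5.0, 118.0, 9750.0, 570.0])

lo, hi = y_true.min(), y_true.max()          # scaling parameters
mse_orig = np.mean((y_true - y_pred) ** 2)
mse_scaled = np.mean(((y_true - lo) / (hi - lo) - (y_pred - lo) / (hi - lo)) ** 2)

# the two differ exactly by the squared scaling factor
assert np.isclose(mse_orig, mse_scaled * (hi - lo) ** 2)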
4) Think about this: let's say you train a network with the training data set, and it has only 1 neuron in the output layer. Then you present it with the test data set. Depending on which transfer function (activation function) you use in the output layer, you will get a value for each exemplar. Let's assume you use a sigmoid transfer function, whose min and max values are 0 and 1. That means the predictions will be limited to values between 0 and 1.
Let's also say that your target labels ("truth") only contain discrete values of 0 and 1 (indicating which class the exemplar belongs to).
targetLabels=[0 1 0 0 0 1 0 ];
NNprediction=[0.2 0.8 0.1 0.3 0.4 0.7 0.2];
How do you interpret this?
You can apply a hard-limiting function such that the NNprediction vector only contains the discrete values 0 and 1. Let's say you use a threshold of 0.5:
NNprediction_thresh_0.5 = [0 1 0 0 0 1 0];
vs.
targetLabels =[0 1 0 0 0 1 0];
With this information you can compute your false positives (FP), false negatives (FN), true positives (TP), and true negatives (TN), along with a number of derived metrics such as the True Positive Rate = TP/(TP+FN).
On a ROC curve, which shows the True Positive Rate vs. the False Positive Rate, this would be a single point in the plot. However, if you vary the threshold in the hard-limit function, you get all the points you need for a complete curve.
Makes sense? Do you see how each step depends on the previous ones?
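Putting it together, here is a small Python sketch that sweeps the threshold over the toy vectors above and prints one (FPR, TPR) point per threshold; plotting these points gives the ROC curve:

import numpy as np

target = np.array([0, 1, 0, 0, 0, 1, 0])
pred = np.array([0.2, 0.8, 0.1, 0.3, 0.4, 0.7, 0.2])

for thr in np.linspace(0.0, 1.0, 11):
    hard = (pred >= thr).astype(int)            # hard-limit at this threshold
    tp = np.sum((hard == 1) & (target == 1))
    fp = np.sum((hard == 1) & (target == 0))
    fn = np.sum((hard == 0) & (target == 1))
    tn = np.sum((hard == 0) & (target == 0))
    print(f"thr={thr:.1f}  TPR={tp / (tp + fn):.2f}  FPR={fp / (fp + tn):.2f}")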


Understanding and coding the zero-lag cross-correlation in MATLAB

First of all, I am sorry if I am being dense and can't understand this part of an article. I have a set of data with 200 channels, in which every specific pair of channels is co-dependent. In the paper it is mentioned:
"
For each channel, we filtered both signals
between 0.5 and 2.5 Hz to preserve only the cardiac component and
normalized the resulting signals to balance any difference between
their amplitude.
"
Question 1: Does this mean I need to normalize both co-dependent channels to the average of their medians, or just normalize each signal to its own median?
Here is the rest of the paragraph:
"
Then, we computed the cross-correlation and
extracted the value at a time lag of 0 to quantify the similarity
between the filtered signals. In-phase and counter-phase identical
waveforms yielded a zero-lag cross-correlation value of 1 and -1
respectively, whereas a null value derived from totally uncorrelated
signals. "
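If I understand correctly, the paper's computation would look roughly like this in Python (the filter order and the zero-mean/unit-norm normalization are my own assumptions, since the paper does not specify them):

import numpy as np
from scipy.signal import butter, filtfilt

def zero_lag_xcorr(x, y, fs):
    # band-pass both signals to keep only the cardiac component
    b, a = butter(3, [0.5, 2.5], btype='bandpass', fs=fs)
    xf = filtfilt(b, a, x)
    yf = filtfilt(b, a, y)
    # normalize to zero mean and unit norm so the result lies in [-1, 1]
    xf = xf - xf.mean()
    yf = yf - yf.mean()
    xf = xf / np.linalg.norm(xf)
    yf = yf / np.linalg.norm(yf)
    return np.dot(xf, yf)   # cross-correlation at lag 0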
I wrote the code below, but I get -1 or +1 everywhere: even for signals that are not co-dependent it gives me 1 or -1. I guess I am wrong in part of the code, but rationally I don't know where. Here is the code:
datafile='data_sess_03.nirs';
ch_num=1;
[w,src,det,mlOrg,mlo,mlm,Data,datap,acc1,acc2]=readData(datafile);
fc=[0.5 2.5];                                 % band-pass edges in Hz
dataf=filterData(Data,fc);
[acor,lags]=xcorr(dataf(1,:),dataf(5,:),0);   % channels 1 and 5 are co-dependent
%% acor is -1 or +1 everywhere, even in the noisy channels
plot(acor,'black')
[~,I] = max(abs(acor));
lagDiff = lags(I)/fs                          % fs is the sampling rate (must be defined)
Any help will be really appreciated. Thanks a lot for helping me.

Should the output of backpropagation converge to 1 given the output is in (0,1)?

I am currently trying to understand the ANN that I created for an assignment, which essentially takes grayscale (0-150) images (120x128) and determines whether the person is male or female. It works for the most part. I am treating this like a boolean problem where the output is (male = 1, female = 0). I am able to get the ANN to correctly identify male or female. However, the outputs I am getting for the males are 0.3-0.6, depending on the run. Should I be getting a value of ~1 out?
I am using a sigmoid unit 1/(1+e^-y) and have tried taking the inverse. I have tried this using 5-60 hidden units in 1 layer, and tried 2 outputs with flip-flop results. I want to understand this so that I can apply it to a non-boolean problem, i.e. if I want a numerical output, how would I go about doing that, or am I using the wrong machine learning technique?
You can use a binary decision function at the output with some threshold. Assuming you have assigned 0 for female and 1 for male in training, while testing you will get values between 0 and 1, and sometimes even below 0 or above 1. So to make a decision on the output value, just apply a threshold of 0.5: if the output value is less than 0.5, the estimated class is female; if it is equal to or greater than 0.5, the estimated class is male.
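A minimal sketch of that decision rule in Python (the function name and label strings are just placeholders):

def classify(nn_output, threshold=0.5):
    # map the network's continuous output to a class label
    return 'male' if nn_output >= threshold else 'female'

print(classify(0.61))   # -> 'male'
print(classify(0.34))   # -> 'female'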

Caffe classification labels in HDF5

I am finetuning a network. In a specific case I want to use it for regression, which works. In another case, I want to use it for classification.
For both cases I have an HDF5 file with a label. For regression, this is just a 1-by-1 numpy array that contains a float. I thought I could use the same label for classification after changing my EuclideanLoss layer to SoftmaxLoss. However, I then get a negative loss, like so:
Iteration 19200, loss = -118232
Train net output #0: loss = 39.3188 (* 1 = 39.3188 loss)
Can you explain if, and if so what, goes wrong? I do see that the training loss is about 40 (which is still terrible), but does the network still train? The negative loss just keeps getting more negative.
UPDATE
After reading Shai's comment and answer, I have made the following changes:
- I made the num_output of my last fully connected layer 6, as I have 6 labels (used to be 1).
- I now create a one-hot vector and pass that as a label into my HDF5 dataset as follows:
f['label'] = numpy.array([1, 0, 0, 0, 0, 0])
Trying to run my network now returns
Check failed: hdf_blobs_[i]->shape(0) == num (6 vs. 1)
After some research online, I reshaped the vector to a 1x6 vector. This led to the following error:
Check failed: outer_num_ * inner_num_ == bottom[1]->count() (40 vs. 240)
Number of labels must match number of predictions; e.g., if softmax axis == 1
and prediction shape is (N, C, H, W), label count (number of labels)
must be N*H*W, with integer values in {0, 1, ..., C-1}.
My idea is to add 1 label per data point (image), and in my train.prototxt I create batches. Shouldn't this create the correct batch size?
Since you moved from regression to classification, you need to output not a scalar to compare with "label" but rather a probability vector of length num-labels to compare with the discrete class "label". You need to change the num_output parameter of the layer before "SoftmaxWithLoss" from 1 to num-labels.
I believe you are currently accessing uninitialized memory, and I would expect caffe to crash sooner or later in this case.
Update:
You made two changes: num_output 1-->6, and you also changed your input label from a scalar to a vector.
The first change was the only one you needed for using "SoftmaxWithLoss".
Do not change the label from a scalar to a one-hot vector.
Why?
Because "SoftmaxWithLoss" basically looks at the 6-vector prediction you output, interpret the ground-truth label as index and looks at -log(p[label]): the closer p[label] is to 1 (i.e., you predicted high probability for the expected class) the lower the loss. Making a prediction p[label] close to zero (i.e., you incorrectly predicted low probability for the expected class) then the loss grows fast.
Using a "hot-vector" as ground-truth input label, may give rise to multi-category classification (does not seems like the task you are trying to solve here). You may find this SO thread relevant to that particular case.

Unreasonable [positive] log-likelihood values from MATLAB "fitgmdist" function

I want to fit a data set with a Gaussian mixture model. The data set contains about 120k samples, and each sample has about 130 dimensions. Using MATLAB, I run the following script (with cluster number 1000):
gm = fitgmdist(data, 1000, 'Options', statset('Display', 'iter'), 'RegularizationValue', 0.01);
I get the following outputs:
iter log-likelihood
1 -6.66298e+07
2 -1.87763e+07
3 -5.00384e+06
4 -1.11863e+06
5 299767
6 985834
7 1.39525e+06
8 1.70956e+06
9 1.94637e+06
The log-likelihood is bigger than 0! I think this is unreasonable, and I don't know why.
Could somebody help me?
First of all, it is not a problem of how large your dataset is.
Here is some code that produces similar results with a quite small dataset:
options = statset('Display', 'iter');
x = ones(5,2) + (rand(5,2)-0.5)/1000;
fitgmdist(x,1,'Options',options);
this produces
iter log-likelihood
1 64.4731
2 73.4987
3 73.4987
Of course you know that the log function (the natural logarithm) has a range from -inf to +inf. I guess your problem is that you think the input to the log (i.e. the a posteriori function) should be bounded by [0,1]. Well, the a posteriori function is a pdf, which means its value can be very large for a very dense dataset.
PDFs must be positive (which is why we can take their log) and must integrate to 1, but they are not bounded by [0,1].
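For instance, a one-line Python check (using scipy) shows a perfectly valid density whose value is far above 1:

from scipy.stats import norm

# a very narrow Gaussian: the density at its mean greatly exceeds 1,
# yet it still integrates to 1 over the real line
print(norm.pdf(0.0, loc=0.0, scale=0.001))   # about 398.9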
You can verify this by reducing the density in the above code
x = ones(5,2) + (rand(5,2)-0.5)/1;
fitgmdist(x,1,'Options',options);
this produces
iter log-likelihood
1 -8.99083
2 -3.06465
3 -3.06465
So, I would rather assume that your dataset contains several duplicate (or very close) values.

Output Value of Neural Network Does Not Reach Desired Values

I made a neural network with back propagation. It has 5 nodes in the input layer, 6 nodes in the hidden layer, and 1 node in the output layer, with random initial weights, and I use sigmoid as the activation function.
I have two sets of input data, for example:
13.5 22.27 0 0 0 desired value=0.02
7 19 4 7 2 desired value=0.03
Now I train the network for 5000 iterations, or the iterations stop early if the error value (desired minus calculated output value) is less than or equal to 0.001.
The output value in the first iteration for each input set is about 60, and it decreases with each iteration.
Now the problem is that the second set of inputs (the one with the desired value of 0.03) causes the iterations to stop because of a calculated output value of 3.001, but the first set of inputs never arrives at its desired value of 0.02; its output is about 0.03.
EDIT:
I used the LMS algorithm and changed the error threshold to 0.00001 to find the correct error value, but now the output value of the last iteration for both the 0.03 and 0.02 desired values is between 0.023 and 0.027, which is still incorrect.
For your error-value stopping threshold, you should take the error over one epoch (the sum of the errors of all examples in your dataset) and not only the error of one member of your dataset. With this you will have to increase the value of your error threshold, but it will force your neural network to fit all of your examples well, not just some of them.
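A minimal Python sketch of that stopping criterion (predict, net, and the helper names are placeholders for your own implementation):

def epoch_sse(predict, inputs, targets):
    # sum of squared errors over ALL training examples (one epoch)
    return sum((t - predict(x)) ** 2 for x, t in zip(inputs, targets))

# hypothetical usage: stop on the epoch error, not on one example's error
# while epoch_sse(net.forward, train_inputs, train_targets) > threshold:
#     run_one_training_epoch(net, train_inputs, train_targets)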