Using Torch's ClassNLLCriterion

I am currently using Torch and just trying to get a simple neural network program running. Each of my inputs has 3 attributes, and the output is supposed to be a classification from 1 to 7. I've extracted my data from a CSV file and put it into 2 Tensors (1 with the inputs and 1 with the outputs). The data is in this format:
**Data**
1914 1993 2386
1909 1990 2300
.....
1912 1989 2200
[torch.DoubleTensor of size 99999x3]
**Class**
1
1
2
.....
7
[torch.DoubleTensor of size 99999]
For the model I'm using to train the network, I simply have
model = nn.Sequential()
model:add(nn.Linear(3, 7))
model:add(nn.LogSoftMax())
criterion = nn.ClassNLLCriterion()
And this is the code I use to train the network:
for i = 1, 10 do
   prediction = model:forward(data)              -- forward pass
   loss = criterion:forward(prediction, class)   -- compute the negative log-likelihood loss
   model:zeroGradParameters()                    -- reset accumulated gradients
   grad = criterion:backward(prediction, class)  -- gradient of the loss w.r.t. the prediction
   model:backward(data, grad)                    -- backpropagate through the model
   model:updateParameters(0.1)                   -- SGD step with learning rate 0.1
end
My test data tensor is formatted in the same way as the training data (a Tensor of size 99999x3). I want the program to give me a prediction of the classification when I run this line.
print (model:forward(test_data))
However, I am getting negative numbers (which shouldn't happen with ClassNLLCriterion?) and the probabilities do not sum to 1. My suspicion is that I have either not formatted the data correctly or that I wasn't able to perform the training process correctly. If anyone could help me figure out what the issue is, I'd be very grateful.
Thank you!

The reason you cannot see the prediction is the nn.LogSoftMax() layer, which applies the log function; that's why you have negative values (they are log-probabilities, not probabilities). To get the probabilities back, exponentiate the output. As an example:
model = nn.Sequential()
model:add(nn.Linear(3, 7))
model:add(nn.LogSoftMax())
criterion = nn.ClassNLLCriterion()
data = torch.Tensor{1914, 1993, 2386}
print (model:forward(data):exp())
>> 0.0000
0.0000
1.0000
0.0000
0.0000
0.0000
0.0000
[torch.DoubleTensor of size 7]
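If you want the predicted class rather than the probabilities, take the index of the largest output per row. A minimal sketch, assuming test_data is the 99999x3 test tensor from the question:
logprobs = model:forward(test_data)  -- 99999x7 log-probabilities
_, predicted = logprobs:max(2)       -- argmax over dimension 2 (the classes)
print(predicted[1][1])               -- predicted class of the first example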
Sorry for the late answer.

Here is what I currently use, which may be the wrong way of using ClassNLLCriterion, but at least it will get you somewhere toward understanding it.
Make the targets either
(7,1,1,1,1,1,1) <--First class representation
.......
(1,1,1,1,1,1,7) <--Last class representation
or
(1,1,1,1,1,1,1) <--First class representation
.......
(7,7,7,7,7,7,7) <--Last class representation
I figured it's much easier to train the last representation as targets, but I have a feeling we should use the first one instead.
EDIT: I've just found out that ClassNLLCriterion only accepts a scalar class index as the target for a single example (or, for batched input, a 1D tensor with one index per example), hence using the above is wrong!
You should instead use the values 1 to 7 directly as targets: one class index per example.
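For reference, a minimal sketch of the two accepted target formats (the values here are illustrative):
require 'nn'
criterion = nn.ClassNLLCriterion()
-- single example: input is a 7-element log-probability vector, target is a scalar index
logprobs = nn.LogSoftMax():forward(torch.randn(7))
loss1 = criterion:forward(logprobs, 3)
-- batch: input is 5x7, target is a 1D tensor with one class index per example
batch = nn.LogSoftMax():forward(torch.randn(5, 7))
loss2 = criterion:forward(batch, torch.Tensor{1, 2, 7, 3, 5})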

Related

Modeling an hrf time series in MATLAB

I'm attempting to model fMRI data so I can check the efficacy of an experimental design. I have been following a couple of tutorials and have a question.
I first need to model the BOLD response by convolving a stimulus input time series with a canonical haemodynamic response function (HRF). The first tutorial I checked said that one can make an HRF of any amplitude as long as the 'shape' of the HRF is correct, so they created the following HRF in MATLAB:
hrf = [ 0 0 1 5 8 9.2 9 7 4 2 0 -1 -1 -0.8 -0.7 -0.5 -0.3 -0.1 0 ]
And then convolved the HRF with the stimulus by just using 'conv' so:
hrf_convolved_with_stim_time_series = conv(input,hrf);
This is very straightforward, but I want my model to eventually be as accurate as possible, so I checked a more advanced tutorial, where they did the following. First they created a vector of 20 timepoints, then used the 'gampdf' function to create the HRF.
t = 1:1:20; % MEASUREMENTS
h = gampdf(t,6) + -.5*gampdf(t,10); % HRF MODEL
h = h/max(h); % SCALE HRF TO HAVE MAX AMPLITUDE OF 1
Is there a benefit to doing it this way over the simpler one? I suppose I have 3 specific questions.
The 'gampdf' help page is super short and only says that the '6' and '10' in each function call represent 'A', which is a 'shape' parameter. What does this mean? It gives no other information. Why is it 6 in the first call and 10 in the second?
This question is directly related to the above one. This code is written for a situation where there is a TR = 1 and the stimulus is very short (like 1s). In my situation my TR = 2 and my stimulus is quite long (12s). I tried to adapt the above code to make a working HRF for my situation by doing the following:
t = 1:2:40; % 2s timestep with the 40 to try to equate total time to above
h = gampdf(t,6) + -.5*gampdf(t,10); % HRF MODEL
h = h/max(h); % SCALE HRF TO HAVE MAX AMPLITUDE OF 1
Because I have no idea what the 'gampdf' parameters mean (or what that line does, in all actuality), I'm not sure this gives me what I'm looking for. I essentially get out 20 values, where 1-14 have SOME numeric value in them but 15-20 are all 0. I'm assuming there will be a response during the entire 12s stimulus period (the first 6 TRs, so values 1-6) with the appropriate rectification, which could be the rest of the values, but I'm not sure.
Final question. The other code does not 'scale' the HRF to have an amplitude of 1. Will that matter, ultimately?
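For reference: gampdf(t, a) is the gamma probability density with shape a and scale 1, whose peak sits near t = a - 1. So gampdf(t,6) peaks around 5 s (the positive BOLD response) and -.5*gampdf(t,10) subtracts an undershoot peaking around 9 s. A sketch of adapting this to TR = 2 s with a 12-s stimulus (the values here are illustrative, not from the tutorials):
TR = 2;                                 % repetition time in seconds
t = 0:TR:30;                            % sample the HRF over ~30 s
h = gampdf(t, 6) - 0.5*gampdf(t, 10);   % response peak ~5 s, undershoot ~9 s
h = h/max(h);                           % scale to unit peak amplitude
stim = zeros(1, 60);                    % 60 scans = 120 s of scan time
stim(5:10) = 1;                         % 12-s (6-TR) stimulus starting at scan 5
bold = conv(stim, h);                   % predicted BOLD time series
bold = bold(1:numel(stim));             % trim the convolution tail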
The canonical HRF you choose is dependent upon where in the brain the BOLD signal is coming from. It would be inappropriate to choose just any HRF. Your best source of a model is going to come from a lit review. I've linked a paper discussing the merits of multiple HRF models. The methods section brings up some salient points.

Matlab enforce variable accuracy

I have two double arrays that are of equal size but one always has the following format:
A1 = 0.0756 0.0368 0.0124 0.0024 0.0002 0.0000 0.0000
while the other one is:
A2 = 0.0797 0.0368 0.0120 0.0024 0.0004 0 0
I want to force the last two elements to have the same accuracy, that is, be 0.0000 instead of 0. The naïve approach of A2(7) = 0.0000 does not work, although A2(7) = A1(7) does the trick.
How can I achieve this a bit more cleverly?
Careful! I think you will find that A1(7) == 0 returns false.
What exactly do you mean by the same accuracy? Internally, MATLAB uses the same precision for every element of both of your arrays (they're all doubles); it is just displaying them differently.
Try the following commands:
A1(7)
format long g
A1(7)
and I think you'll find that A1(7) is in fact not 0, and furthermore is accurate to far more than the 4 decimal places you are seeing.
So the question is: do you actually want to round off to 4 decimal places, or do you just want to display up to 4 decimal places? I imagine you want the latter, so have a look at sprintf.
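A minimal sketch of the difference; the small values here are hypothetical, chosen so that format short displays them as 0.0000:
A = [0.0756 0.0368 0.0124 0.0024 0.0002 2.3e-6 1.1e-7];  % hypothetical values
format short
disp(A)                  % the last two elements display as 0.0000
format long g
disp(A(6))               % the stored value is not actually zero
fprintf('%.4f\n', A(6))  % force exactly 4 decimal places: prints 0.0000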
The numbers actually already have the desired precision. If you want, you can print them with any number of digits you like (though typically only about the first fourteen will be significant).
For this you can use the various print commands; however, if you want to change the default way numbers are displayed, your simple options are quite limited.
Check help format for the things you can choose from.
If you always want to show a minimum number of decimals, I believe your only choice is:
format shorteng
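For example, a quick sketch of what shortEng does:
format shortEng
disp(0.0002)   % engineering notation with fixed digits, e.g. 200.0000e-06
disp(0)        % zero still shows its digits, e.g. 0.0000e+00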

Interpretation of Probability Estimate for Multi-class classification in LibSVM for MATLAB

Problem: 3-class classification with labels 1, 2, 3.
Tool: LibSVM for MATLAB
svmModel = svmtrain(<TrainClassLabels>, <TrainFeatures>, '-b 1 -c <someCValue> -g <someGammaValue>');
[predLabels, classAccuracy, probEstimates] = svmpredict(<TestClassLabels>, <TestFeatures>, svmModel, '-b 1');
After this step, the first ten rows of probEstimates are:
0.9129 0.0749 0.0122
0.9059 0.0552 0.0389
0.8231 0.0183 0.1586
0.9077 0.0098 0.0825
0.9074 0.0668 0.0257
0.8685 0.0146 0.1169
0.8962 0.0664 0.0374
0.9074 0.0548 0.0377
0.9474 0.0054 0.0472
0.9178 0.0642 0.0180
but the first ten predicted labels are:
2
2
2
2
2
2
2
2
2
2
Questions:
My understanding was that the probability estimate is the probability that a particular item belongs to a particular class, given its feature vector. However, if that were true, then these items should belong to class 1 and not class 2. Does LibSVM change the order of the classes, or am I missing something here? If I am wrong, can someone please explain the real interpretation of the probability estimates?
If I have to move the decision boundary to increase the precision of class 1 (predict fewer items as class 1, i.e. be more conservative about the decision boundary), which of these class probabilities do I have to work with, and how?
I came across the same problem recently.
The cause is the order of the training data: the columns of the probability-estimate matrix follow the order in which the class labels first appear in the training set, not ascending label order.
If you want the i-th entry of the probability vector to correspond to label i, the training data should be sorted by label.
For example, if the label of the first training point is 4, then the first column of the probability matrix corresponds to class 4.
The order of the labels stored in the model may differ from what we expect. You can check it with svmModel.Label; the probability estimates are output in that order.
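A minimal sketch of that mapping (testLabels and testFeatures are placeholder names, not from the post):
[predLabels, accuracy, probEstimates] = svmpredict(testLabels, testFeatures, svmModel, '-b 1');
labelOrder = svmModel.Label;             % e.g. [2; 1; 3] -- the order seen during training
[~, col] = max(probEstimates, [], 2);    % column with the highest probability per row
predFromProbs = labelOrder(col);         % should reproduce predLabels
% For question 2: be more conservative about class 1 by requiring a higher
% probability before predicting it, e.g. with a hypothetical threshold:
p1 = probEstimates(:, labelOrder == 1);  % the column that belongs to class 1
conservative1 = p1 > 0.9;                % predict class 1 only when very confident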

Matlab neural networks - bad results

I've got a problem implementing a multilayer perceptron with the MATLAB Neural Network Toolbox.
I'm trying to implement a neural network that recognizes a single character stored as a binary image (size 40x50).
The image is transformed into a binary vector. The output is encoded in 6 bits. I use the simple newff function in the following way (with 30 perceptrons in the hidden layer):
net = newff(P, [30, 6], {'tansig' 'tansig'}, 'traingd', 'learngdm', 'mse');
Then I train my network with a dozen characters in 3 different fonts, with the following training parameters:
net.trainParam.epochs = 1000000;
net.trainParam.goal = 0.00001;
net.trainParam.lr = 0.01;
After training, the net recognized all characters from the training sets correctly, but...
it cannot recognize more than a couple of characters from other fonts.
How could I improve this simple network?
You can try adding random elastic distortions to your training set (in order to expand it and make it more generalizable).
You can see the details in this nice article from Microsoft Research:
http://research.microsoft.com/pubs/68920/icdar03.pdf
You have a very large number of input variables (2,000, if I understand your description). My first suggestion is to reduce this number if possible. Some possible techniques include subsampling the input variables or calculating informative features (such as row and column totals, which would reduce the input vector to 90 = 40 + 50), as in the sketch below.
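A sketch of the row/column-totals reduction, where img stands in for one hypothetical 40x50 binary character image:
img = rand(40, 50) > 0.5;                % stand-in for a binary character image
features = [sum(img, 2); sum(img, 1)'];  % 40 row totals + 50 column totals = 90 features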
Also, your output is coded as 6 bits, which provides 64 possible values; I assume that you are using these to represent the 26 letters? If so, then you may fare better with another output representation. Consider that various letters which look nothing alike will, for instance, share the value of 1 on bit 1, complicating the mapping from inputs to outputs. An output representation with 1 bit for each class would simplify things.
You could use patternnet instead of newff; this creates a network more suitable for pattern recognition. As targets, use a 26-element vector with 1 in the right letter's position (0 elsewhere). The output of the recognition will be a vector of 26 real values between 0 and 1, with the recognized letter having the highest value.
Make sure to use data from all fonts for the training.
Give all the data sets as input; train will automatically divide them into training, validation, and test sets according to the specified ratios:
net.divideParam.trainRatio = .70;
net.divideParam.valRatio = .15;
net.divideParam.testRatio = .15;
(choose your own percentages).
Then test using only the test set; you can find its indices via:
[net, tr] = train(net,inputs,targets);
tr.testInd
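Putting that together, a minimal end-to-end sketch with synthetic stand-in data (names and sizes are illustrative, not from the post):
N = 300;                               % number of training images
inputs = double(rand(2000, N) > 0.5);  % stand-ins for 40x50 binary images, one per column
labels = randi(26, 1, N);              % letter index 1..26 for each image
targets = zeros(26, N);                % one-hot target vectors
targets(sub2ind(size(targets), labels, 1:N)) = 1;
net = patternnet(30);                  % 30 hidden neurons
net.divideParam.trainRatio = .70;
net.divideParam.valRatio = .15;
net.divideParam.testRatio = .15;
[net, tr] = train(net, inputs, targets);
outputs = net(inputs(:, tr.testInd));  % evaluate on the held-out test split only
[~, recognized] = max(outputs, [], 1); % index of the recognized letter per column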

Why doesn't newff() work properly when I use its arguments?

My data-set includes 29 inputs and 6 outputs. When I use
net = newff(minmax(Pl),[14 12 8 6]);
to build my feed forward MLP network and train it by
net.trainParam.epochs=50;
net=train(net,Pl,Tl);
the network cannot learn my data-set and its error does not decrease below 0.7. But when I pass the arguments of the newff function explicitly, like this:
net=newff(minmax(Pl),[14 12 8 6],{'tansig' 'tansig' 'tansig' 'purelin'},'trainlm');
the error decreases very fast and drops below 0.0001! The odd thing is that when I use the same explicit form with only one hidden layer of 2 neurons:
net=newff(minmax(Pl),[2 6],{'tansig' 'purelin'},'trainlm');
the error again drops below 0.2, which seems suspicious!
Please give me some tips and help me understand the difference between:
net = newff(minmax(Pl),[14 12 8 6]);
and
net=newff(minmax(Pl),[14 12 8 6],{'tansig' 'tansig' 'tansig' 'purelin'},'trainlm');
?
I think that the second argument to NEWFF is supposed to be the target vectors, not the sizes of the hidden layers (which are the third argument).
Note that the default transfer function for hidden layers is tansig and for output layer is purelin, and the default training algorithm is trainlm.
Finally, remember that if you want to get reproducible results, you have to manually reset the random number generator to a fixed state at the beginning of each run.
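A sketch of the newer calling convention this answer describes, with synthetic stand-in data (29 inputs and 6 outputs as in the question):
rng(0);              % reset the random number generator for reproducible initial weights
Pl = rand(29, 200);  % stand-in input data
Tl = rand(6, 200);   % stand-in target data
% second argument = target vectors; hidden-layer sizes move to the third argument
net = newff(Pl, Tl, [14 12 8], {'tansig','tansig','tansig','purelin'}, 'trainlm');
net.trainParam.epochs = 50;
net = train(net, Pl, Tl);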