I have trained an RNN in Keras. Now, I want to get the values of the trained weights:
model = Sequential()
model.add(SimpleRNN(27, return_sequences=True, input_shape=(None, 27), activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
model.get_weights()
This gives me 2 arrays of shape (27,27) and 1 array of shape (27,1). I don't understand the meaning of these arrays. Also, shouldn't I get 2 more arrays, of shape (27,27) and (27,1), that are used to calculate the hidden-state activation 'a'? How can I get these weights?
The arrays returned by model.get_weights() directly correspond to the weights used by SimpleRNNCell. They include:
The kernel matrix of size (input_shape[-1], units). In your case, input_shape=(None, 27) and units=27, so it's (27, 27). The kernel gets multiplied by the input.
The recurrent_kernel matrix of size (units, units), which also happens to be (27, 27). This matrix gets multiplied by the previous state.
The bias array of shape (units,) == (27,).
These arrays correspond to the standard formula:
# W = kernel
# U = recurrent_kernel
# B = bias
output = new_state = act(W * input + U * state + B)
Note that the Keras implementation uses a single bias vector, so in all there are exactly three arrays.
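As a sanity check, here is a minimal NumPy sketch of that formula, using the three arrays returned by get_weights() for the model above (the input x_t and previous state h_prev are made-up data):
import numpy as np

kernel, recurrent_kernel, bias = model.get_weights()

x_t = np.random.rand(1, 27)   # hypothetical input for one timestep
h_prev = np.zeros((1, 27))    # initial hidden state

# act(W * input + U * state + B), with the matrices applied on the right
z = x_t @ kernel + h_prev @ recurrent_kernel + bias
# softmax, matching activation='softmax' in the model definition
h_t = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)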
I was wondering how LSTM works in Keras.
Let's take an example.
My maximum sentence length is 3 words, e.g. 'how are you'.
I vectorize each word into a vector of length 4, so one sentence has shape (3, 4).
Now, I want to use an LSTM to do translation (just an example).
model = Sequential()
model.add(LSTM(1, input_shape=(3,4), return_sequences=True))
model.summary()
I'm going to have an output shape of (3,1) according to Keras.
Layer (type)                 Output Shape              Param #
=================================================================
lstm_16 (LSTM)               (None, 3, 1)              24
=================================================================
Total params: 24
Trainable params: 24
Non-trainable params: 0
_________________________________________________________________
And this is what I don't understand.
Each unit of an LSTM (with return_sequences=True, so I get the output of every state) should give me a vector of shape (timesteps, x), where timesteps is 3 in this case and x is the size of my word vectors (4 in this case).
So why do I get an output shape of (3, 1)?
I searched everywhere but can't figure it out.
Your interpretation of what the LSTM should return is not right. The output dimensionality doesn't need to match the input dimensionality. Concretely, the first argument of keras.layers.LSTM corresponds to the dimensionality of the output space, and you're setting it to 1.
In other words, setting:
model.add(LSTM(k, input_shape=(3,4), return_sequences=True))
will result in a (None, 3, k) output shape.
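You can verify this quickly (a minimal sketch, assuming a working Keras installation), e.g. with k = 4:
from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(4, input_shape=(3, 4), return_sequences=True))
print(model.output_shape)  # (None, 3, 4): 4 units give 4 outputs per timestep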
In the snippet:
criterion = nn.CrossEntropyLoss()
raw_loss = criterion(output.view(-1, ntokens), targets)
output size is torch.Size([5, 5, 8967]), targets size is torch.Size([25]), and ntokens is 8967
After modifying the code, my
output size is torch.Size([5, 8967]) and targets size is torch.Size([25]),
which raises dimensionality issues when computing the loss.
Would it be sensible to increase the size of the Linear layer that produces the output by a factor of 5, so that I can later reshape the output to torch.Size([5, 5, 8967])?
The problem with increasing the size of the tensor is that ntokens can become quite large, so I could easily run out of memory because of that. Is there an alternative approach?
You should do something like this:
import numpy as np
import torch
import torch.nn as nn
from torch.autograd import Variable

ntokens = 8000
output = Variable(torch.randn(5, 5, ntokens))
targets = Variable(torch.from_numpy(np.random.randint(0, ntokens, size=25)))
criterion = nn.CrossEntropyLoss()
loss = criterion(output.view(-1, ntokens), targets)
print(loss)
This prints:
Variable containing:
9.4613
[torch.FloatTensor of size 1]
Here, I am assuming that output contains the next-word predictions for 5 sentences (the minibatch size is 5) and that each sentence is of length 5 (the sequence length is 5). 8000 is the vocabulary size, so your model is predicting a probability distribution over the entire vocabulary.
Now you can compute the loss of predicting each word, since your target has the required shape of 25.
Please note that CrossEntropyLoss expects its input to contain a score for each class: the input has to be a 2D tensor of size (minibatch, C), and the target has to be a 1D tensor of size minibatch holding a class index (0 to C-1) for each value.
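A quick shape check of the snippet above (same assumptions: batch of 5 sentences, sequence length 5, vocabulary of 8000):
flat = output.view(-1, ntokens)
print(flat.size())     # torch.Size([25, 8000]) -- (minibatch, C)
print(targets.size())  # torch.Size([25])       -- one class index per score row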
I'm implementing a multiclass classification with LIBSVM using a one-versus-all strategy. For this purpose, I used the ovrtrain and ovrpredict MATLAB functions:
model = ovrtrain(GroupTrain, TrainingSet, '-t 0');
[predicted_labels ac decv] = ovrpredict(testY, TestSet, model);
The output of ovrpredict is as follows
Accuracy = 90% (18/20) (classification)
Accuracy = 90% (18/20) (classification)
Accuracy = 90% (18/20) (classification)
Accuracy = 95% (19/20) (classification)
Accuracy = 90% (18/20) (classification)
Accuracy = 90% (18/20) (classification)
Accuracy = 90% (18/20) (classification)
Accuracy = 90% (18/20) (classification)
Accuracy = 90% (18/20) (classification)
Accuracy = 90% (18/20) (classification)
I have 10 classes. I'm new to LIBSVM, so I guess those accuracies correspond to the classification accuracy for each class. However, I don't understand the difference between this output and the accuracy value ac returned by ovrpredict, which is 60%:
ac =
0.6000
Thanks
The two values mean different things. Each Accuracy line is the output of the svmpredict() function and tells you how well your test set fits one specific class, while ac gives you the accuracy of the input test class labels (testY in your case) with respect to the final predicted class labels.
Let's have a look inside the ovrpredict function and see how these accuracy values are generated.
function [pred, ac, decv] = ovrpredict(y, x, model)
From its definition, we can see that it has 3 input parameters:
y = class labels
x = test data set
model = a struct containing the 10 models for the 10 different classes
labelSet = model.labelSet;
This extracts labelSet, the unique class labels. In your case, you will have 10 unique labels, as defined when you set up the 10 separate classes.
labelSetSize = length(labelSet)
Here you get the number of classes (10 in your case).
models = model.models;
The models variable contains all the trained models (10 in your case).
decv = zeros(size(y, 1), labelSetSize)
Here, the decv matrix is created to hold the decision values for each test example.
for i=1:labelSetSize
    [l,a,d] = svmpredict(double(y == labelSet(i)), x, models{i});
    decv(:, i) = d * (2 * models{i}.Label(1) - 1);
end
Here, we pass our test data through svmpredict once for each trained model. In your case, this loop iterates 10 times and prints the classification Accuracy of the test set for each specific class. For example, Accuracy = 90% (18/20) (classification) indicates that 18 out of the 20 rows of your test set were classified correctly by that one-vs-rest model.
Please note that in multi-class SVM you can't make a decision based on these Accuracy values alone. You need the pred and ac values to make per-example and overall estimates, respectively.
double(y == labelSet(i)) converts the multi-class labels into single-class (binary) labels by checking which entries of y belong to the specific class that the iterator i points to. It outputs 1 for matched and 0 for unmatched cases, so the resulting label vector contains only 0s and 1s, corresponding to a one-vs-rest SVM.
decv(:, i) = d * (2 * models{i}.Label(1) - 1) fixes the sign of the decision values. models{i}.Label(1) contains only 2 possible values, i.e. 0 (for unmatched cases) or 1 (for matched cases), so (2 * models{i}.Label(1) - 1) always evaluates to -1 or +1. Multiplying by it flips the sign where necessary, so that a positive decision value consistently means "belongs to class i".
[tmp,pred] = max(decv, [], 2);
pred = labelSet(pred);
max returns two column vectors: the first (tmp) contains the maximum decision value in each row, and the second (pred) the corresponding column (i.e. class) index. Since we are only interested in the class index, we discard tmp. The second line then maps that index back to the actual label.
ac = sum(y==pred) / size(x, 1);
Finally, we calculate ac by counting how many predicted labels match the input test labels and dividing the sum by the number of test examples.
In your case, ac = 0.6 means that 60% of the test labels match the predicted labels, i.e. 12 of your 20 test examples were labelled correctly.
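To illustrate the decision rule, here is a minimal NumPy sketch of the same argmax-and-accuracy logic (the decision values decv and labels are made up, and smaller than your 10-class case):
import numpy as np

label_set = np.array([1, 2, 3])        # hypothetical class labels
y = np.array([1, 2, 3, 1])             # hypothetical true test labels
decv = np.array([[ 0.9, -0.2, -0.7],   # hypothetical decision values,
                 [-0.4,  0.8, -0.1],   # one column per one-vs-rest model
                 [-0.6, -0.3,  0.5],
                 [-0.2,  0.4, -0.9]])

pred = label_set[np.argmax(decv, axis=1)]  # class with the largest decision value
ac = np.mean(y == pred)                    # fraction of correctly labelled examples
print(pred, ac)                            # [1 2 3 2] 0.75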
I hope this answers your question.
I want to classify a list of 5 test images using the LIBSVM library with a 'one against all' strategy in order to obtain probabilities for each class. The code I used is below:
load('D:\xapp.mat');
load('D:\xtest.mat');
load('D:\yapp.mat');%% matrix contains true class of images yapp=[641;645;1001;1010;1100]
load('D:\ytest.mat');%% matrix contains unlabeled class of test set ytest=[1;2;3;4;5]
numLabels=max(yapp);
numTest=size(ytest,1);
%# train one-against-all models
model = cell(numLabels,1);
for k=1:numLabels
    model{k} = svmtrain(double(yapp==k), xapp, '-c 1000 -g 10 -b 1');
end
%# get probability estimates of test instances using each model
prob = zeros(numTest,numLabels);
for k=1:numLabels
    [~,~,p] = svmpredict(double(ytest==k), xtest, model{k}, '-b 1');
    prob(:,k) = p(:,model{k}.Label==1); %# probability of class==k
end
%# predict the class with the highest probability
[~,pred] = max(prob,[],2);
acc = sum(pred == ytest) ./ numel(ytest) %# accuracy
I obtain this error:
Model does not support probabiliy estimates
Subscripted assignment dimension mismatch.
Error in comp (line 98)
prob(:,k) = p(:,model{k}.Label==1); %# probability of class==k
Please help me to solve this error. Thanks in advance.
What you're trying to do is use a code snippet that evaluates an SVM classifier's performance, whereas your goal is to estimate the labels for your test set.
I assume your five labels are [641;645;1001;1010;1100] (as in yapp). The first thing you have to do is delete ytest, because you don't know any labels for the test set. It is pointless to fill ytest with dummy values: the SVMs will return the predicted labels.
The first error, as already pointed out in the comments, is in
numLabels=max(yapp);
you must replace max() with length() in order to obtain the number of classes.
The training stage is almost correct.
Since k goes from 1 to 5 whereas yapp has the values above, you should change double(yapp==k) into double(yapp==yapp(k)): in this manner we mark the k-th value in yapp as the positive class. As k goes from 1 to 5, yapp(k) goes from 641 to 1100.
And now the prediction stage.
The first input to svmpredict() should be the test labels, but here we don't know them, so we can fill it with a vector of zeros (as many zeros as there are patterns in the test set). svmpredict() uses the true test labels only to report an accuracy, which we cannot compute in this case. So you must change the second for-loop to
for k=1:numLabels
    [~,~,p] = svmpredict(zeros(size(xtest,1),1), xtest, model{k}, '-b 1');
    prob(:,k) = p(:,model{k}.Label==1); %# probability of class==k
end
and finally predict the labels with
[~,pred] = max(prob,[],2);
and pred contains the predicted labels.
Note 1: with this method, however, you cannot measure accuracy or other performance metrics, because what we called the test set is not actually a test set. A proper test set is labelled: we pretend not to know its labels, let the SVM predict them, and then match the predicted labels with the actual labels in order to measure accuracy.
Note 2: predicted labels in pred will most likely have values in range 1 to 5 due to the second for-loop. However, since your labels have different values, you can map back taking into account that 1 is 641, 2 is 645, 3 is 1001, 4 is 1010, 5 is 1100.
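For example, a minimal sketch of that mapping in Python (pred is hypothetical, holding 1-based indices as produced by the loop):
import numpy as np

labels = np.array([641, 645, 1001, 1010, 1100])  # the actual labels from yapp
pred = np.array([3, 1, 5, 2, 4])                 # hypothetical 1-based predicted indices
print(labels[pred - 1])                          # [1001  641 1100  645 1010]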
I'm making a decision tree using the classregtree(X,Y) function. I'm passing X as a matrix of size 70X9 (70 data objects, each having 9 attributes), and Y as a 70X1 matrix. Each one of my Y values is either 2 or 4. However, in the decision tree formed, it gives values of 2.5 or 3.5 for some of the leaf nodes.
Any ideas why this might be caused?
You are using classregtree in regression mode (which is the default mode).
Change the mode to classification mode.
Here is an example using CLASSREGTREE for classification:
%# load dataset
load fisheriris
%# split training/testing
cv = cvpartition(species, 'holdout',1/3);
trainIdx = cv.training;
testIdx = cv.test;
%# train
t = classregtree(meas(trainIdx,:), species(trainIdx), 'method','classification', ...
'names',{'SL' 'SW' 'PL' 'PW'});
%# predict
pred = t.eval(meas(testIdx,:));
%# evaluate
cm = confusionmat(species(testIdx),pred)
acc = sum(diag(cm))./sum(testIdx)
The output (confusion matrix and accuracy):
cm =
17 0 0
0 13 3
0 2 15
acc =
0.9
Now, if your target class is encoded as numbers, the returned prediction will still be a cell array of strings, so you have to convert them back to numbers:
%# load dataset
load fisheriris
[species,GN] = grp2idx(species);
%# ...
%# evaluate
cm = confusionmat(species(testIdx),str2double(pred))
acc = sum(diag(cm))./sum(testIdx)
Note that classification always returns strings, so I think you mistakenly used the method=regression option (the default), which performs regression (numeric target, hence averaged leaf values like 2.5 or 3.5) rather than classification (discrete target).