I'm training on a data set with 17 features and 5 output classes using PyTorch, but I'm most interested in two of them, let's say classes 2 and 3 out of 0-4. What's a good strategy to get as high accuracy as possible on 2 and 3, while the rest can have lower accuracy?
If you are using nn.CrossEntropyLoss(), you can pass in the weights to emphasize or de-emphasize certain classes. From the PyTorch docs:
torch.nn.CrossEntropyLoss(weight: Optional[torch.Tensor] = None, ...)
The weights do not have to sum to one, since PyTorch normalizes them itself when reduction='mean', which is the default setting. The weights specify how heavily each class counts in the loss: the higher a class's weight, the larger the penalty for misclassifying a sample of that class.
import torch
import torch.nn as nn

x = torch.randn(10, 5)                        # dummy logits for 10 samples, 5 classes
target = torch.randint(0, 5, (10,))           # dummy targets
weights = torch.tensor([1., 1., 2., 2., 1.])  # emphasize classes 2 and 3
criterion_weighted = nn.CrossEntropyLoss(weight=weights)
loss_weighted = criterion_weighted(x, target)
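To check that the weighting actually helps the classes you care about, you can also track per-class accuracy separately; a minimal sketch, reusing the dummy tensors above:
preds = x.argmax(dim=1)                  # predicted class per sample
for c in (2, 3):                         # the classes you care most about
    mask = target == c
    if mask.any():
        acc = (preds[mask] == c).float().mean().item()
        print("class %d accuracy: %.2f" % (c, acc))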
I am currently reading "Introduction to Machine Learning" by Ethem Alpaydin; I came across nearest centroid classifiers and tried to implement one. I believe I have implemented the classifier correctly, but I am only getting 68% accuracy. So, is the nearest centroid classifier itself inefficient, or is there some error in my implementation (below)?
The data set contains 1372 data points, each having 4 features, and there are 2 output classes.
My MATLAB implementation:
DATA = load("-ascii", "data.txt");
#DATA is 1372x5 matrix with 762 data points of class 0 and 610 data points of class 1
#there are 4 features of each data point
X = DATA(:,1:4); #matrix to store all features
X0 = DATA(1:762,1:4); #matrix to store the features of class 0
X1 = DATA(763:1372,1:4); #matrix to store the features of class 1
X0 = X0(1:610,:); #to make sure both datasets have same size for prior probability to be equal
Y = DATA(:,5); # to store outputs
mean0 = sum(X0)/610; #mean of features of class 0
mean1 = sum(X1)/610; #mean of features of class 1
count = 0;
for i = 1:1372
  pre = 0;
  cost1 = X(i,:)*(mean0'); #dot product of the i-th data point with the class-0 mean
  cost2 = X(i,:)*(mean1'); #dot product with the class-1 mean
  if (cost1 < cost2)
    pre = 1;
  end
  if pre == Y(i)
    count = count + 1; #count the correctly predicted values
  end
end
disp("accuracy"); #display the accuracy
disp((count/1372)*100);
There are at least a few things here:
You are using the dot product to measure similarity in the input space; this is almost never valid. The only justification for the dot product would be the assumption that all your data points have the same norm, or that the norm does not matter (nearly never true). Try using the Euclidean distance instead: even though it is very naive, it should be significantly better (see the sketch at the end of this answer).
Is it an inefficient classifier? That depends on the definition of efficiency. It is extremely simple and fast, but in terms of predictive power it is extremely weak. In fact, it is weaker than Naive Bayes, which is itself considered a "toy" model.
There is also something wrong with the code:
X0 = DATA(1:762,1:4); #matrix to store the features of class 0
X1 = DATA(763:1372,1:4); #matrix to store the features of class 1
X0 = X0(1:610,:); #to make sure both datasets have same size for prior probability to be equal
Once you subsample X0, you have 1220 training samples, yet later, during "testing", you test on both the training set and the "missing" elements of X0; this does not make sense from a probabilistic perspective. First of all, you should never measure accuracy on the training set (it overestimates the true accuracy). Second, by subsampling your training data you are not equalizing the priors: in a method like this one, you are simply degrading the quality of your centroid estimate, nothing else. These kinds of techniques (sub-/over-sampling) equalize priors for models that actually model the priors. Your method does not (it is basically a generative model with an assumed prior of 1/2), so nothing good can come of it.
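For illustration, here is a minimal sketch of a nearest centroid classifier using Euclidean distance and a held-out test set (plain NumPy; the file name follows the question, everything else is my own naming):
import numpy as np

data = np.loadtxt("data.txt")               # 1372 x 5: four features plus the class label
X, y = data[:, :4], data[:, 4]

# hold out 20% of the points for testing instead of testing on the training set
rng = np.random.default_rng(0)
idx = rng.permutation(len(X))
split = int(0.8 * len(X))
train, test = idx[:split], idx[split:]

# one centroid per class, estimated from the training portion only
centroids = np.stack([X[train][y[train] == c].mean(axis=0) for c in (0, 1)])

# assign each test point to the class whose centroid is nearest in Euclidean distance
dists = np.linalg.norm(X[test][:, None, :] - centroids[None, :, :], axis=2)
pred = dists.argmin(axis=1)
print("accuracy: %.1f%%" % (100 * (pred == y[test]).mean()))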
I am finetuning a network. In a specific case I want to use it for regression, which works. In another case, I want to use it for classification.
For both cases I have an HDF5 file, with a label. With regression, this is just a 1-by-1 numpy array that contains a float. I thought I could use the same label for classification, after changing my EuclideanLoss layer to SoftmaxLoss. However, then I get a negative loss as so:
Iteration 19200, loss = -118232
Train net output #0: loss = 39.3188 (* 1 = 39.3188 loss)
Can you explain whether, and if so what, goes wrong? I do see that the training loss is about 40 (which is still terrible), but does the network still train? The negative loss just keeps getting more negative.
UPDATE
After reading Shai's comment and answer, I have made the following changes:
- I made the num_output of my last fully connected layer 6, as I have 6 labels (it used to be 1).
- I now create a one-hot vector and pass it as a label into my HDF5 dataset as follows:
f['label'] = numpy.array([1, 0, 0, 0, 0, 0])
Trying to run my network now returns
Check failed: hdf_blobs_[i]->shape(0) == num (6 vs. 1)
After some research online, I reshaped the vector to a 1x6 vector. This led to the following error:
Check failed: outer_num_ * inner_num_ == bottom[1]->count() (40 vs. 240)
Number of labels must match number of predictions; e.g., if softmax axis == 1
and prediction shape is (N, C, H, W), label count (number of labels)
must be N*H*W, with integer values in {0, 1, ..., C-1}.
My idea is to add one label per data sample (image), and in my train.prototxt I create batches. Shouldn't this produce the correct batch size?
Since you moved from regression to classification, you need to output not a scalar to compare with "label" but rather a probability vector of length num-labels to compare with the discrete class "label". You need to change the num_output parameter of the layer before "SoftmaxWithLoss" from 1 to num-labels.
I believe you are currently accessing uninitialized memory, and I would expect caffe to crash sooner or later in this case.
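For reference, the num_output change in train.prototxt would look something like this (the layer and bottom names here are hypothetical placeholders for your own last InnerProduct layer):
layer {
  name: "fc_out"          # hypothetical name of your last fully connected layer
  type: "InnerProduct"
  bottom: "fc7"           # hypothetical name of the preceding blob
  top: "fc_out"
  inner_product_param {
    num_output: 6         # was 1 for the regression setup; now one output per class
  }
}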
Update:
You made two changes: num_output 1-->6, and you also changed your input label from a scalar to a vector.
The first change was the only one you needed for using "SoftmaxWithLoss".
Do not change the label from a scalar to a one-hot vector.
Why?
Because "SoftmaxWithLoss" basically looks at the 6-vector prediction you output, interpret the ground-truth label as index and looks at -log(p[label]): the closer p[label] is to 1 (i.e., you predicted high probability for the expected class) the lower the loss. Making a prediction p[label] close to zero (i.e., you incorrectly predicted low probability for the expected class) then the loss grows fast.
Using a "hot-vector" as ground-truth input label, may give rise to multi-category classification (does not seems like the task you are trying to solve here). You may find this SO thread relevant to that particular case.
I have a training set of size 54 * 65536 and a testing set of size 18 * 65536.
I want to use a knn classifier, but I have some questions:
1) How should I define trainlabel?
Class = knnclassify(TestVec,TrainVec, TrainLabel,k);
Is it a vector of size 54 * 1 that defines which group each row in the training set belongs to? Here the groups are numbered 1, 2, ...
2) To find the accuracy I used this:
cp = classperf(TrainLabel);
Class = knnclassify(TestVec,TrainVec, TrainLabel);
cp = classperf(TestLabel,Class);
cp.CorrectRate*100
Is this right? Is there another method to calculate it?
3) How can I enhance the accuracy?
4) How do I choose the best value of k?
I do not know MATLAB or the knn implementation you are using, so I can answer only a few of your questions.
1) Your assumption is correct. trainlabel is a 54*1 vector, or an array of size 54, or something equivalent, that defines which group each data point (row) in the training set belongs to.
2) ... MATLAB / implementation related, sorry.
3) That is a very big discussion. Possible ways are:
Choose a better value of K.
Preprocess the data (or make preprocessing better if already applied).
Get a better / bigger training set.
to name a few...
4) You can try different values of k, measure the accuracy for each, and keep the best. (Note: if you do that, make sure you do not measure the accuracy per value of k only once; rather, use some technique like 10-fold cross-validation.)
There is more than a fair chance that the library you are using for the k-NN classifier provides such utilities.
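For illustration, here is what that selection loop could look like in Python with scikit-learn (the question uses MATLAB's knnclassify, so this is only a sketch of the idea; the data shapes follow the question, the data itself is dummy):
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# dummy stand-ins for the 54 x 65536 training matrix and the 54 x 1 label vector
X_train = np.random.rand(54, 65536)
y_train = np.repeat([1, 2], 27)

best_k, best_acc = None, 0.0
for k in (1, 3, 5, 7, 9):
    # cross-validated accuracy for this k (6 folds, since 54 = 6 * 9)
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k),
                          X_train, y_train, cv=6).mean()
    if acc > best_acc:
        best_k, best_acc = k, acc
print(best_k, best_acc)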
I am trying to train a random forest classifier on a very imbalanced dataset with 2 classes (benign/malign).
I have seen and followed the code from a previous question (How to set up and use sample weight in the Orange python package?) and tried to set various higher weights on the minority-class data instances, but the classifiers that I get behave exactly the same.
My code:
data = Orange.data.Table(filename)
st = Orange.classification.tree.SimpleTreeLearner(min_instances=3)
forest = Orange.ensemble.forest.RandomForestLearner(learner=st, trees=40, name="forest")
weight = Orange.feature.Continuous("weight")
weight_id = -10
data.domain.add_meta(weight_id, weight)
data.add_meta_attribute(weight, 1.0)
for inst in data:
    if inst[data.domain.class_var] == 'malign':
        inst[weight] = 100
classifier = forest(data, weight_id)
Am I missing something?
The simple tree learner is simple: it is optimized for speed and does not support weights. I guess learning algorithms in Orange that do not support weights should raise an exception if the weight argument is specified.
If you need weights just to change the class distribution, multiply the data instances instead: create a new data table and add 100 copies of each malignant instance.
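A minimal sketch of that oversampling, assuming Orange 2.x and that Table supports append (method names may differ in your version; filename and forest come from the question's code):
import Orange

data = Orange.data.Table(filename)
oversampled = Orange.data.Table(data.domain)   # empty table with the same domain (assumed constructor)
for inst in data:
    copies = 100 if inst.get_class() == 'malign' else 1
    for _ in range(copies):
        oversampled.append(inst)

classifier = forest(oversampled)               # no weight_id needed any more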
I've got a problem implementing a multilayer perceptron with the MATLAB Neural Network Toolbox.
I am trying to implement a neural network which will recognize a single character stored as a binary image (size 40x50).
The image is transformed into a binary vector. The output is encoded in 6 bits. I use the simple newff function in the following way (with 30 perceptrons in the hidden layer):
net = newff(P, [30, 6], {'tansig' 'tansig'}, 'traingd', 'learngdm', 'mse');
Then I train my network on a dozen characters in 3 different fonts, with the following training parameters:
net.trainParam.epochs=1000000;
net.trainParam.goal = 0.00001;
net.trainParam.lr = 0.01;
After training, the net recognized all characters from the training set correctly, but it cannot recognize more than a couple of characters from other fonts.
How could I improve this simple network?
You can try adding random elastic distortions to your training set (in order to expand it and make the network generalize better).
You can see the details in this nice article from Microsoft Research:
http://research.microsoft.com/pubs/68920/icdar03.pdf
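For illustration, a minimal sketch of such a distortion in Python, following the Simard et al. paper linked above (the parameter values are illustrative, not taken from the paper):
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_distort(image, alpha=8.0, sigma=3.0, rng=None):
    # Displace every pixel by a random field smoothed with a Gaussian,
    # then resample the image at the displaced coordinates (bilinear).
    rng = np.random.default_rng() if rng is None else rng
    dx = gaussian_filter(rng.uniform(-1, 1, image.shape), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, image.shape), sigma) * alpha
    rows, cols = np.meshgrid(np.arange(image.shape[0]),
                             np.arange(image.shape[1]), indexing='ij')
    return map_coordinates(image.astype(float), [rows + dy, cols + dx], order=1)

# each call yields a slightly warped copy of a 40x50 glyph to add to the training set
glyph = np.random.randint(0, 2, (40, 50))
warped = elastic_distort(glyph)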
You have a very large number of input variables (2,000, if I understand your description correctly). My first suggestion is to reduce this number if possible. Some possible techniques include subsampling the input variables, or calculating informative features (such as row and column totals, which would reduce the input vector to 90 = 40 + 50 values).
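For instance, the row/column-total features could be computed like this (a NumPy sketch; the 40x50 shape follows the question, the glyph itself is dummy data):
import numpy as np

img = np.random.randint(0, 2, (40, 50))      # stand-in for one 40x50 binary glyph
features = np.concatenate([img.sum(axis=1),  # 40 row totals
                           img.sum(axis=0)]) # 50 column totals
assert features.shape == (90,)               # 90 = 40 + 50 inputs instead of 2000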
Also, your output is coded as 6 bits, which provides 64 possible combined values; I assume you are using these to represent the 26 letters? If so, then you may fare better with another output representation. Consider that various letters which look nothing alike will, for instance, share the value of 1 on bit 1, complicating the mapping from inputs to outputs. An output representation with one bit for each class would simplify things.
You could use patternnet instead of newff; it creates a network more suitable for pattern recognition. As the target, use a 26-element vector with 1 in the right letter's position (0 elsewhere). The output of the recognition will be a vector of 26 real values between 0 and 1, the recognized letter being the one with the highest value.
Make sure to use data from all fonts for the training.
Give all data sets as input; train will automatically divide them into train-validation-test sets according to the specified percentages:
net.divideParam.trainRatio = .70;
net.divideParam.valRatio = .15;
net.divideParam.testRatio = .15;
(choose your own percentages).
Then test using only the test set; after training with
[net, tr] = train(net, inputs, targets);
you can find the test-set indices in tr.testInd.