How to calculate precision and recall from a probabilistic classification model

As far as I know, precision and recall can be calculated for a binary classification model whose output is only 1 or 0. However, in a probabilistic binary classification model, the output is a decimal number between 0 and 1. So how do I get precision and recall from it? Thanks!
I expect to generate the precision and recall from the probabilistic classification model.
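A minimal sketch of the usual approach, assuming scikit-learn (the data here is illustrative): pick a decision threshold such as the common default of 0.5, turn the predicted probabilities into hard 0/1 labels, then compute precision and recall as for any binary classifier.

import numpy as np
from sklearn.metrics import precision_score, recall_score

# Illustrative ground truth and predicted probabilities from some model
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.4, 0.7, 0.3, 0.2, 0.6, 0.8, 0.1])

threshold = 0.5                             # any cut-off in (0, 1) works
y_pred = (y_prob >= threshold).astype(int)  # probabilities -> hard labels

print("precision:", precision_score(y_true, y_pred))  # 0.75 for this data
print("recall:   ", recall_score(y_true, y_pred))     # 0.75 for this data

Different thresholds trade precision against recall, so a single probability model yields a whole family of (precision, recall) pairs rather than one fixed pair.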

Related

Cholesky decomposition for simulating correlated random variables

I have a correlation matrix for N random variables, each of them uniformly distributed within [0,1]. I am trying to simulate these random variables; how can I do that? Note N > 2. I was trying to use the Cholesky decomposition, and below are my steps:
1. Get the lower-triangular Cholesky factor of the correlation matrix (L, an N*N matrix).
2. Independently draw 10000 samples for each of the N uniformly distributed random variables (S, an N*10000 matrix).
3. Multiply the two: L*S. This gives me correlated samples, but their range is no longer within [0,1].
How can I solve this problem?
I know that if I only have 2 random variables I can do something like
rho*x1 + sqrt(1-rho^2)*y1
to get my correlated sample y. But if more than two variables are correlated, I am not sure what I should do.
You can get approximate solutions by generating correlated normals using the Cholesky factorization, then converting them to U(0,1) variables using the normal CDF. The solution is approximate because the normals have the desired correlation, but converting to uniforms is a non-linear transformation, and only linear transformations preserve correlation.
There's a transformation available which will give exact solutions if the transformed var/cov matrix is positive semidefinite, but that's not always the case. See the abstract at https://www.tandfonline.com/doi/abs/10.1080/03610919908813578.
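A minimal sketch of that approximate approach in Python, assuming NumPy and SciPy (the target correlation matrix here is illustrative):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Illustrative target correlation matrix for N = 3 variables
R = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])

L = np.linalg.cholesky(R)            # lower-triangular factor, R = L @ L.T
Z = rng.standard_normal((3, 10000))  # independent standard normals, N x samples
X = L @ Z                            # correlated normals with correlation ~ R
U = norm.cdf(X)                      # map each margin to U(0, 1)

print(np.corrcoef(U))                # close to R, but not exactly R

Because the CDF step is non-linear, the correlation of the uniforms is close to, but not exactly, the target R, which is the approximation described above.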

Why is the softmax function necessary? Why not simple normalization?

I am not familiar with deep learning so this might be a beginner question.
In my understanding, the softmax function in multi-layer perceptrons is in charge of normalization and of distributing a probability to each class.
If so, why don't we use simple normalization?
Let's say we get a vector x = (10 3 2 1).
Applying softmax, the output will be y = (0.9986 0.0009 0.0003 0.0001).
Applying simple normalization (dividing each element by the sum, 16),
the output will be y = (0.625 0.1875 0.125 0.0625).
It seems like simple normalization could also distribute the probabilities.
So what is the advantage of using the softmax function on the output layer?
Normalization does not always produce probabilities: it doesn't work when some values are negative, and it fails outright if the values sum to zero.
Taking the exponential of the logits changes that. The exponential is strictly positive, so the sum can never be zero, and it maps the full range of the logits into valid probabilities. Softmax is preferred because it actually works.
This depends on the training loss function. Many models are trained with a log-loss objective, so the values you see in that vector estimate the log of each probability. SoftMax is thus merely converting back to linear values and normalizing.
The empirical reason is simple: SoftMax is used where it produces better results.
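A minimal sketch of those failure modes, assuming NumPy:

import numpy as np

def softmax(x):
    z = np.exp(x - x.max())    # subtract the max for numerical stability
    return z / z.sum()

def naive_normalize(x):
    return x / x.sum()

x = np.array([10.0, 3.0, 2.0, 1.0])
print(softmax(x))              # ~[0.9986 0.0009 0.0003 0.0001]
print(naive_normalize(x))      # [0.625 0.1875 0.125 0.0625]

# Logits are often negative; naive normalization then breaks down
# (and a zero sum, e.g. [1, -1], would divide by zero outright):
x_neg = np.array([2.0, -1.0, -0.5])
print(naive_normalize(x_neg))  # [ 4. -2. -1.] -- not probabilities
print(softmax(x_neg))          # still a valid distribution summing to 1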

How can I calculate precision and recall for a multi-class sentiment-analysis classifier using a confusion matrix?

I wonder how to compute precision and recall for a sentiment-analysis multi-class classifier using a confusion matrix. I have a dataset of 5000 texts, and I did human labeling for a sample of 100. Now I would like to compute the precision and recall for the classifier based on this sample of data. I have three classes: Positive, Neutral, and Negative.
So how can I compute these metrics for each class?
As I am new here on Stack Overflow, I couldn't illustrate the confusion matrix I have, so let us assume that we have the following confusion matrix:
(In the matrix image: red = Negative, green = Positive, purple = Neutral.)
You can measure, for the Positive class:
precision = TPos / (all predicted Positive) = 30/(30+20+10) = 50%,
recall = TPos / (all actually Positive) = 30/(30+50+20) = 30%,
F-measure = 2*precision*recall/(precision+recall) = 37.5%, and
accuracy = (all true)/(all data) = (30+60+80)/300 = 56.7%.
For more, see http://blog.kaggle.com/2015/10/23/scikit-learn-video-9-better-evaluation-of-classification-models/
You can use sklearn's classification report.
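A minimal sketch in Python, assuming NumPy, using one confusion matrix that is consistent with the numbers quoted above (rows are actual classes, columns are predicted classes; the off-diagonal cells not pinned down by those numbers are filled in arbitrarily):

import numpy as np

labels = ["Positive", "Negative", "Neutral"]
cm = np.array([
    [30, 50, 20],   # actual Positive
    [20, 60, 20],   # actual Negative (last cell assumed)
    [10, 10, 80],   # actual Neutral (middle cell assumed)
])

for i, label in enumerate(labels):
    tp = cm[i, i]
    precision = tp / cm[:, i].sum()  # column sum = all predicted as this class
    recall = tp / cm[i, :].sum()     # row sum = all actually in this class
    f1 = 2 * precision * recall / (precision + recall)
    print(f"{label}: precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")

print("accuracy:", np.trace(cm) / cm.sum())  # (30+60+80)/300 = 0.567

For the Positive class this reproduces the 50% precision, 30% recall, and 37.5% F-measure computed above.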

Calculating the area under the curve from classification accuracy

I have an assignment:
Using Naive Bayes we built a model on some data with 2 classes (model returns 2 probabilities - one for positive and one for negative class). We calculated the area under ROC curve AUC = 0.8 and classification accuracy CA = 0.6 with threshold set to 0.5 (if the probability of some example for positive class is higher than 0.5, we predict positive class for that example, else the negative class). Then we discovered that if we set the threshold to 0.3, classification accuracy becomes CA = 0.7. What is the AUC for the second threshold? If the result depends on initial data, present all possibilities.
How can I calculate that?
Not sure if this qualifies as an answer, but the ROC AUC is the area under the curve of sensitivity versus (1 - specificity) traced out over all classification thresholds. It therefore summarizes the model's ranking behaviour as a whole, and you cannot compute the AUC for one specific threshold.
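A minimal sketch of that point, assuming scikit-learn, with illustrative data chosen to reproduce the numbers in the assignment: moving the threshold changes the accuracy, but the AUC is a property of the whole ranking and does not change.

import numpy as np
from sklearn.metrics import roc_auc_score, accuracy_score

# Illustrative labels and predicted positive-class probabilities
y_true = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
y_prob = np.array([0.9, 0.8, 0.7, 0.45, 0.32, 0.6, 0.55, 0.4, 0.25, 0.1])

print("AUC:", roc_auc_score(y_true, y_prob))  # 0.8, no threshold involved

for t in (0.5, 0.3):
    y_pred = (y_prob >= t).astype(int)
    print(f"accuracy at threshold {t}:", accuracy_score(y_true, y_pred))
    # 0.6 at threshold 0.5, 0.7 at threshold 0.3

On this data the AUC is unchanged (still 0.8) no matter which threshold is later used for classification, since it is computed from the ranking of all examples rather than from any single cut-off.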

How can I efficiently find the accuracy of a classifier

Even with a simple classifier like nearest neighbour, I cannot seem to judge its accuracy and thus cannot improve it.
For example, with the code below:
% For each test image, find the index of the nearest training image
IDX = knnsearch(train_image_feats, test_image_feats);
predicted_categories = cell([size(test_image_feats, 1), 1]);
for i = 1:size(IDX, 1)
    % Copy that neighbour's label; braces extract the string itself,
    % assuming train_labels is a cell array of strings
    predicted_categories{i} = train_labels{IDX(i)};
end
Here train_image_feats is a 300-by-256 matrix where each row represents an image; test_image_feats has the same structure. train_labels holds the label corresponding to each row of the training matrix.
The book I am following simply says that the above method achieves an accuracy of 19%.
How did the author come to this conclusion? Is there any way to judge the accuracy of my results, be it with this classifier or another?
The author then uses another method of feature extraction and says it improved the accuracy by 30%.
How can I find the accuracy, be it graphically or just via a simple percentage?
Accuracy when doing machine learning and classification is usually calculated by comparing your predicted outputs from your classifier in comparison to the ground truth. When you're evaluating the classification accuracy of your classifier, you will have already created a predictive model using a training set with known inputs and outputs. At this point, you will have a test set with inputs and outputs that were not used to train the classifier. For the purposes of this post, let's call this the ground truth data set. This ground truth data set helps assess the accuracy of your classifier when you are providing inputs to this classifier that it has not seen before. You take your inputs from your test set, and run them through your classifier. You get outputs for each input and we call the collection of these outputs the predicted values.
For each predicted value, you compare to the associated ground truth value and see if it is the same. You add up all of the instances where the outputs match up between the predicted and the ground truth. Adding all of these values up, and dividing by the total number of points in your test set yields the fraction of instances where your model accurately predicted the result in comparison to the ground truth.
In MATLAB, this is really simple to calculate. Supposing that your categories for your model were enumerated from 1 to N where N is the total number of labels you are classifying with. Let groundTruth be your vector of labels that denote the ground truth while predictedLabels denote your labels that are generated from your classifier. The accuracy is simply calculated by:
accuracy = sum(groundTruth == predictedLabels) / numel(groundTruth);
accuracyPercentage = 100*accuracy;
The first line of code calculates the accuracy of your model as a fraction. The second line expresses it as a percentage by multiplying the first line by 100. You can use either one when assessing accuracy; one is normalized to [0,1] while the other runs from 0% to 100%. What groundTruth == predictedLabels does is compare each element of groundTruth with the corresponding element of predictedLabels: if the ith values match, it outputs a 1, and if not, a 0. This produces a vector of 0s and 1s, so we simply add up all of the 1s with the sum operation, then divide by the total number of points in our test set to obtain the final accuracy of the classifier.
With a toy example, supposing I had 4 labels, and my groundTruth and predictedLabels vectors were this:
groundTruth = [1 2 3 2 3 4 1 1 2 3 3 4 1 2 3];
predictedLabels = [1 2 2 4 4 4 1 2 3 3 4 1 2 3 3];
The accuracy using the above vectors gives us:
>> accuracy
accuracy =
0.4000
>> accuracyPercentage
accuracyPercentage =
40
This means that we have a 40% accuracy, or an accuracy of 0.40. Using this example, the predictive model was only able to correctly classify 40% of the test set when each test input was put through the classifier. This makes sense, because between our predicted outputs and the ground truth, only 40%, or 6 outputs, match up: the 1st, 2nd, 6th, 7th, 10th and 15th elements. There are other ways to evaluate a classifier, like ROC curves, but when calculating accuracy in machine learning, this is what is usually done.