Calculate statistical significance for the difference in accuracy between two models (confidence interval)

There are two generative models, modelA and modelB.
They are evaluated on a test dataset of 7,500 samples.
The output from each model is binary, i.e. 0 or 1.
If the response from the model matches the expected outcome, it is counted as an exact match (“1”); otherwise it is counted as “0”.
Accuracy from modelA is 56%.
Accuracy from modelB is 65%.
How do I calculate a confidence interval for a two-sample difference-in-proportions test, and what is the statistical significance of my result?
I have looked at blogs and YouTube videos, but they discuss numerical (continuous) data, whereas my case involves a binary outcome. How do I compute the statistical significance given the accuracies of the two models?
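
As a concrete starting point, here is a minimal sketch of a two-proportion z-test and a 95% confidence interval for the difference in accuracy, using only SciPy. It assumes the two sets of predictions can be treated as independent samples; if both models were scored on the same 7,500 examples, a paired test such as McNemar's test would be the more appropriate choice.

    # Sketch of a two-proportion z-test and confidence interval for the
    # difference in accuracy (assumes the two prediction sets are independent;
    # for paired predictions on the same test set, prefer McNemar's test).
    import math
    from scipy.stats import norm

    n1 = n2 = 7500              # test-set size for each model
    p1, p2 = 0.56, 0.65         # accuracies of modelA and modelB
    x1, x2 = p1 * n1, p2 * n2   # number of exact matches

    # 95% confidence interval for the difference p2 - p1 (unpooled SE)
    diff = p2 - p1
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = norm.ppf(0.975)
    ci = (diff - z_crit * se, diff + z_crit * se)

    # z-test with pooled proportion under H0: p1 == p2
    p_pool = (x1 + x2) / (n1 + n2)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = diff / se_pool
    p_value = 2 * norm.sf(abs(z))

    print(f"difference = {diff:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
    print(f"z = {z:.2f}, two-sided p-value = {p_value:.2e}")

With the numbers above, the 0.09 difference has a 95% CI of roughly (0.074, 0.106) and z is around 11, so the gap is highly significant under the independence assumption. The same test is also available as proportions_ztest in statsmodels.stats.proportion if you prefer a library call.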

Related

KNN giving highest accuracy with K=1?

I am using Weka's IBk to perform classification on text (tweets). I am converting the training and test data to a vector space representation, and when I perform the classification on the test data, the best result comes from K=1. The training and testing data are separate from each other. Why does K=1 give the best accuracy?
Because you are using vector representations, at K=1 the proximity of the single nearest neighbour is what decides the class, and that can be more informative than the most common class among the neighbours when K=n (e.g. K=5).
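
If you want to check whether K=1 is genuinely best rather than an artifact of one particular split, a cross-validated sweep over K is cheap to run. The sketch below uses scikit-learn as a stand-in for Weka's IBk; X and y are placeholders for your vectorized tweets and their labels.

    # Sketch: cross-validated accuracy for several values of K
    # (scikit-learn stand-in for Weka's IBk; X and y are placeholders
    # for the already-vectorized tweets and their labels).
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    def knn_accuracy_by_k(X, y, ks=(1, 3, 5, 7, 9), folds=5):
        scores = {}
        for k in ks:
            clf = KNeighborsClassifier(n_neighbors=k)
            scores[k] = cross_val_score(clf, X, y, cv=folds, scoring="accuracy").mean()
        return scores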

Evaluating performance of Neural Network embeddings in kNN classifier

I am solving a classification problem. I train my unsupervised neural network on a set of entities (using a skip-gram architecture).
The way I evaluate is to search for the k nearest neighbours of each validation point among the training data. I take a weighted sum (weights based on distance) of the labels of the nearest neighbours and use that as the score for each validation point.
Observation - As I increase the number of epochs (model 1: 600 epochs, model 2: 1,400 epochs, model 3: 2,000 epochs), my AUC improves at smaller values of k but saturates at similar values.
What could be a possible explanation of this behaviour?
[Reposted from CrossValidated]
To cross-check whether imbalanced classes are an issue, try fitting an SVM model. If that gives better classification (possible if your ANN is not very deep), it may be concluded that the classes should be balanced first.
Also, try some kernel functions to check whether such a transformation makes the data linearly separable.
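
For reference, the evaluation procedure described in the question (score each validation point by a distance-weighted sum of the labels of its k nearest training neighbours, then compute AUC on the scores) could be sketched as follows; train_emb, train_labels and val_emb are placeholder NumPy arrays.

    # Sketch of the distance-weighted k-NN scoring described in the question.
    # train_emb, train_labels, val_emb are placeholder NumPy arrays.
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def knn_scores(train_emb, train_labels, val_emb, k=10):
        nn = NearestNeighbors(n_neighbors=k).fit(train_emb)
        dist, idx = nn.kneighbors(val_emb)            # shapes: (n_val, k)
        weights = 1.0 / (dist + 1e-12)                # inverse-distance weights
        weights /= weights.sum(axis=1, keepdims=True)
        return (weights * train_labels[idx]).sum(axis=1)   # one score per validation point

    # The scores can then be passed to sklearn.metrics.roc_auc_score(val_labels, scores).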

Neural Network Output

I'm working on a dataset that states whether a person is positive or negative for diabetes. If, in my dataset, the number of observations negative for diabetes is 10 times greater than the number of observations positive for diabetes, is it a given that my network will only learn and predict the negative class because it has more observations than the positive one?
The short answer is "No, not necessarily". The longer answer is that it depends on how the ANN was trained (some cross-validation scheme), whether it had a sufficiently large sample of each class of data points, and what proportion of the population was used as the training set. You also need to account for type I and type II errors (false positives and false negatives).
Try searching for something like "evaluating classification model" to get in-depth information.
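
To make this concrete, here is a minimal sketch of checking per-class behaviour instead of overall accuracy; y_true and y_pred are placeholders for your validation labels and the network's predictions.

    # Sketch: per-class evaluation so the 10:1 imbalance cannot hide poor
    # performance on the positive (diabetic) class.
    # y_true and y_pred are placeholders for validation labels and predictions.
    from sklearn.metrics import confusion_matrix, classification_report

    def report(y_true, y_pred):
        print(confusion_matrix(y_true, y_pred))   # rows = actual class, columns = predicted class
        # Per-class precision/recall expose the type I and type II errors
        # (false positives and false negatives) mentioned above.
        print(classification_report(y_true, y_pred, target_names=["negative", "positive"]))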

Matlab Neural Network correctly classified results

I have trained an NN with the backpropagation algorithm and calculated the MSE. Now I want to find the percentage of correctly classified results (I am facing a classification problem). Any help?
It depends on whether you generate the data yourself or are given a dataset with samples.
In the first case, you feed your NN a generated sample and check whether the NN predicts the correct class. You repeat this, say, 100 times, and for each correctly classified sample you increment a counter CorrectlyClassified by one. The percentage of correctly classified results is then equal to CorrectlyClassified. For a more precise estimate you may generate not 100 samples but X samples (where X is bigger than 100). Then the percentage of correctly classified results is:
CorrectlyClassified/X*100.
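
As a language-agnostic sketch of that counting (shown here in Python rather than MATLAB; y_true and y_pred are placeholders for the true classes and the classes predicted by the trained NN):

    # Sketch: percentage of correctly classified samples,
    # i.e. CorrectlyClassified / X * 100.
    def percent_correct(y_true, y_pred):
        correctly_classified = sum(1 for t, p in zip(y_true, y_pred) if t == p)
        return correctly_classified / len(y_true) * 100
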
If you are given a dataset, you should use cross-validation. See the MATLAB documentation for an example.

Rapidminer - neural net operator - output confidence

I have a feed-forward neural network with six inputs, 1 hidden layer, and two output nodes (1; 0). This NN is trained on 0;1 values.
When applying the model, the variables confidence(0) and confidence(1) are created, where the sum of these two numbers for each row is 1.
My question is: what exactly do these two numbers (confidence(0) and confidence(1)) mean? Are they probabilities?
Thanks for your answers.
In general
The confidence values (or scores, as they are called in other programs) represent a measure of how, well, confident the model is that the presented example belongs to a certain class. They are highly dependent on the general strategy and the properties of the algorithm.
Examples
The easiest example to illustrate is the majority classifier, which just assigns the same score to all observations, based on the class proportions in the original training set.
Another example is the k-nearest-neighbour classifier, where the score for a class i is calculated by averaging the distance to those examples which both belong to the k nearest neighbours and have class i. The score is then sum-normalized across all classes.
In the specific case of the NN, I do not know how the values are calculated without checking the code. My guess is that it is just the value of each output node, sum-normalized across both classes.
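
If that guess is right, the computation is nothing more than the following sketch (the raw output values are made up for illustration):

    # Sketch: sum-normalizing two raw output-node activations into
    # confidence(0) and confidence(1); the raw values here are made up.
    raw = {"0": 0.8, "1": 0.4}
    total = sum(raw.values())
    confidence = {label: value / total for label, value in raw.items()}
    # -> {'0': 0.667, '1': 0.333}; the two confidences sum to 1.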
Do the confidences represent probabilities?
In general, no. To illustrate what probabilities mean in this context: if an example has probability 0.3 for class "1", then 30% of all examples with similar feature/variable values should belong to class "1" and 70% should not.
As far as I know, this task is called "calibration". For this purpose there are some general methods (e.g. binning the scores and mapping them to the class fraction of the corresponding bin) and some classifier-dependent ones (e.g. Platt scaling, which was invented for SVMs). A good place to start is:
Bianca Zadrozny, Charles Elkan: Transforming Classifier Scores into Accurate Multiclass Probability Estimates
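
A minimal sketch of the binning approach mentioned above: on a held-out set, map each score bin to the fraction of positive examples observed in that bin (scores and labels are placeholder NumPy arrays). In Python, scikit-learn's CalibratedClassifierCV offers Platt scaling and isotonic regression for the same purpose.

    # Sketch of calibration by binning: each confidence bin is mapped to the
    # empirical positive fraction observed in that bin on held-out data.
    # scores and labels are placeholder NumPy arrays.
    import numpy as np

    def bin_calibrate(scores, labels, n_bins=10):
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        bin_ids = np.clip(np.digitize(scores, edges) - 1, 0, n_bins - 1)
        calib = np.array([labels[bin_ids == b].mean() if np.any(bin_ids == b) else np.nan
                          for b in range(n_bins)])
        return edges, calib

    # At prediction time, a new score is mapped to calib[index of its bin].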
The confidence measures correspond to the proportion of outputs 0 and 1 that are activated in the initial training dataset.
E.g. if 30% of your training set has outputs (1; 0) and the remaining 70% has outputs (0; 1), then confidence(0) = 30% and confidence(1) = 70%.