How should you test the significance of 2 classification accuracy scores: paired permutation test - classification

I have a single trained classifier tested on 2 related multiclass classification tasks. As each trial of the classification tasks are related, the 2 sets of predictions constitute paired data. I would like to run a paired permutation test to find out if the difference in classification accuracy between the 2 prediction sets is significant.
So my data consists of 2 lists of predicted classes, where each prediction is related to the prediction in the other test set at the same index.
Example:
actual_classes = [1, 3, 6, 1, 22, 1, 11, 12, 9, 2]
predictions1 = [1, 3, 6, 1, 22, 1, 11, 12, 9 10] # 90% acc.
predictions2 = [1, 3, 7, 10, 22, 1, 7, 12, 2, 10] # 50% acc.
H0: There is no significant difference in classification accuracy.
How do I go about running a paired permutation test to test significance of the difference in classification accuracy?

I have been thinking about this and I'm going to post a proposed solution and see if someone approves or explains why I'm wrong.
actual_classes = [1, 3, 6, 1, 22, 1, 11, 12, 9, 2]
predictions1 = [1, 3, 6, 1, 22, 1, 11, 12, 9 10] # 90% acc.
predictions2 = [1, 3, 7, 10, 22, 1, 7, 12, 2, 10] # 50% acc.
paired_predictions = [[1,1], [3,3], [6,7], [1,10], [22,22], [1,1], [11,7], [12,12], [9,2], [10,10]]
actual_test_statistic = predictions1 - predictions2 # 90%-50%=40 # 0.9-0.5=0.4
all_simulations = [] # empty list
for number_of_iterations:
shuffle(paired_predictions) # only shuffle between pairs, not within
simulated_predictions1 = paired_predictions[first prediction of each pair]
simulated_predictions2 = paired_predictions[second prediction of each pair]
simulated_accuracy1 = proportion of times simulated_predictions1 equals actual_classes
simulated_accuracy2 = proportion of times simulated_predictions2 equals actual_classes
all_simulations.append(simulated_accuracy1 - simulated_accuracy2) # Put the simulated difference in the list
p = count(absolute(all_simulations) > absolute(actual_test_statistic ))/number_of_iterations
If you have any thoughts, let me know in the comments. Or better still, provide your own corrected version in your own answer. Thank you!

Related

How to compare 2 sets of different date which contains 2 different sets of data?

I have 2 sets of Date, their 1st and last dates are the same respectively but their dates within might not be the same to each other. Both DateA and DateB contain different values on their each date, which are arrays A and B.
DateA= '2016-01-01'
'2016-01-02'
'2016-01-04'
'2016-01-05'
'2016-01-06'
'2016-01-07'
'2016-01-08'
'2016-01-09'
'2016-01-10'
'2016-01-12'
'2016-01-13'
'2016-01-14'
'2016-01-16'
'2016-01-17'
'2016-01-18'
'2016-01-19'
'2016-01-20'
DateB= '2016-01-01'
'2016-01-02'
'2016-01-03'
'2016-01-04'
'2016-01-05'
'2016-01-09'
'2016-01-10'
'2016-01-11'
'2016-01-12'
'2016-01-13'
'2016-01-15'
'2016-01-16'
'2016-01-17'
'2016-01-19'
'2016-01-20'
A = [5, 2, 3, 4, 6, 1, 7, 9, 3, 6, 1, 7, 9, 2, 1, 4, 6]
B = [4, 2, 7, 1, 8, 4, 9, 5, 3, 9, 3, 6, 7, 2, 9]
I have converted the dates into datenumber,ie
datenumberA= 736330
736331
736333
736334
736335
736336
736337
736338
736339
736341
736342
736343
736345
736346
736347
datenumberB= 736330
736331
736332
736333
736334
736338
736339
736340
736341
736342
736344
736345
736346
736348
736349
Now I want to compare the value of A on DateA(n) to that of B on DateB while DateB is the date that is closest to and before the date of DateA(n).
For example,
comparing the value of A on DateA '2016-01-12' to that of B on DateB '2016-01-11'.
Please help and thanks a lot.
It'll get you the desired output!
all_k=0;
out(1)=1; % not comparing the first index as you mentioned
for n=2:size(datenumberA,1)
j=0;
while 1
k=find(datenumberB+j==datenumberA(n)-1); %finding the index of DateB closest to and before DateA(n)
if size(k,1)==1 break; end %if found, come out of the while loop
j=j+1; % otherwise keep adding 1 in the values of datenumberB until found
end
if size(find(all_k==k),2) ~=1 % to avoid if any DateB is already compared
out(end+1)=A(n)> B(k); %Comparing Value in A with corresponding value in B
all_k(end+1)=k; end %Storing which indices of DateB are already compared
end
out' %Output
Output:-
ans =
1
0
0
1
0
0
1
0
0
1
0
0
1

Initial values in WinBUGS

I gave all values for stochastic nodes, but WinBUGS still gives me the message that a chain contains uninitialized variables. When I try to generate it, BUGS gives me an error undefined real result. What node I am missing here?
model {
# Population Model start on the 1 of October 2014
# Look up initial numbers in each age class
F1.2014 <- 4 # no. first-year females October 2014
F2.2014 <- 2 # no. older females October 2014
F.2014 <- F1.2014+F2.2014 # total no. of females in October 2014
F.trans <- 15 # no. of trans. juv females in March 2015
F2[1] ~ dbin(phi.af.annual, F.2014) # sample number older females alive October 2015 (12 mo)
mutot.2014 <- mu.1*F1.2014+mu.2*F2.2014 # expected number of fledglings 2014/15
J.2014 ~ dpois(mutot.2014) # sample actual number of fledglings
JF.2014 ~ dbin(0.5,J.2014) # sample no. juv females in January 2015
phi.trans<-pow((phi.jf.mo),6) # probability trans juvs survive from March to October
F1.trans ~ dbin(phi.trans, F.trans) # sample no. translocated female juven in October 2015
F1.juvs ~ dbin(phi.jf,JF.2014) # sample no. female juven in October 2015
F1[1]<-F1.juvs+F1.trans # total number of first year birds in October 2015
F[1] <- F1[1]+F2[1]
PE[1] <- step(-F[1]) # prob extinction by October 2015
# run simulations for 20 years
for (i in 1:20) {
mutot[i] <- mu.1*F1[i]*mu.2*F2[i] # expected number of total fledglings by females
J[i] ~ dpois(mutot[i]) # sample total fledglings
JF[i] ~ dbin(0.5,J[i]) # sample number female fledglings
F1[i+1] ~ dbin(phi.jf,JF[i]) # sample number female recruits next year
F2[i+1] ~ dbin(phi.af.annual,F[i]) # sample number adult females that survive to next year
F[i+1] <- F1[i+1]+F2[i+1] # total number of females next year
PE[i+1] <- step(-F[i+1]) # prob that population extinct next year
}
}
# Data
list(phi.af.annual=0.6, mu.1=2.5, mu.2=3.1, phi.jf.mo=0.8, phi.jf=0.87)
# Inits
list(F1=c(NA, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5),
F2=c(NA, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3),
F1.trans=10, F1.juvs=6, J.2014=20, JF.2014=10,
J=c(10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10),
JF=c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5)
)

how to use removeObjectsInArray in swift

I have two arrays
var availableIndex: Int[] = [0, 1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14]
var answerIndex: Int[] = [1, 3, 10, 8]
I want to remove 1, 3, 10, 8 from availableIndex array. I've seen the documentation how to achieve that using removeObjectsInArray
availableIndex.removeObjectsInArray(answerIndex)
but I can't use that method, it gave me an error. I have no idea where's my fault. Sorry if my bad English
edit:
here is the error 'Int[]' does not have a member named 'removeObjectsInArray'
The proper Swifty way to do this is
availableIndex = availableIndex.filter { value in
!answerIndex.contains(value)
}
(will create a new filtered array only with values not contained in answerIndex)
Of course, a better solution would be to convert answerIndex into a Set.
removeObjectsInArray is defined only on Obj-C mutable arrays (NSMutableArray).
An Obj-C workaround is to define the array directly as NSMutableArray
var availableIndex: NSMutableArray = [0, 1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14]
var answerIndex: Int[] = [1, 3, 10, 8]
availableIndex.removeObjectsInArray(answerIndex)
It's also possible to do it like that:
availableIndex.removeAll(where: { answerIndex.contains($0) })

ZXing library Reed Solomon example

I want to try the ReedSolomonDecoder from the ZXing library on the example given on page 10 of this paper
Basically, it encodes the message
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
using the generator polynomial
x^4 + 15x^3 + 3x^2 + x + 12
which results in
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 3, 3, 12, 12
I want to decode this in the following manner:
int[] data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 3, 3, 12, 12};
GenericGF field = new GenericGF(?, 16, 1); // what integer should I use for primitive here?
ReedSolomonDecoder decoder = new ReedSolomonDecoder(field);
decoder.decode(data, 4);
I don't know how to create a GenericGF object from the given generator polynomial. I know that it expects a binary integer representation of the polynomial, but to do that, I would need the polynomial to be in an irreducible form, i.e. all the coefficients to be either 0 or 1. How can I achieve that from this given generator polynomial?
I'm pretty new to this as well but I think you would want to use
public static GenericGF AZTEC_PARAM = new GenericGF(0x13, 16, 1);

How to check if a number can be represented as a sum of some given numbers

I've got a list of some integers, e.g. [1, 2, 3, 4, 5, 10]
And I've another integer (N). For example, N = 19.
I want to check if my integer can be represented as a sum of any amount of numbers in my list:
19 = 10 + 5 + 4
or
19 = 10 + 4 + 3 + 2
Every number from the list can be used only once. N can raise up to 2 thousand or more. Size of the list can reach 200 integers.
Is there a good way to solve this problem?
4 years and a half later, this question is answered by Jonathan.
I want to post two implementations (bruteforce and Jonathan's) in Python and their performance comparison.
def check_sum_bruteforce(numbers, n):
# This bruteforce approach can be improved (for some cases) by
# returning True as soon as the needed sum is found;
sums = []
for number in numbers:
for sum_ in sums[:]:
sums.append(sum_ + number)
sums.append(number)
return n in sums
def check_sum_optimized(numbers, n):
sums1, sums2 = [], []
numbers1 = numbers[:len(numbers) // 2]
numbers2 = numbers[len(numbers) // 2:]
for sums, numbers_ in ((sums1, numbers1), (sums2, numbers2)):
for number in numbers_:
for sum_ in sums[:]:
sums.append(sum_ + number)
sums.append(number)
for sum_ in sums1:
if n - sum_ in sums2:
return True
return False
assert check_sum_bruteforce([1, 2, 3, 4, 5, 10], 19)
assert check_sum_optimized([1, 2, 3, 4, 5, 10], 19)
import timeit
print(
"Bruteforce approach (10000 times):",
timeit.timeit(
'check_sum_bruteforce([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 200)',
number=10000,
globals=globals()
)
)
print(
"Optimized approach by Jonathan (10000 times):",
timeit.timeit(
'check_sum_optimized([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 200)',
number=10000,
globals=globals()
)
)
Output (the float numbers are seconds):
Bruteforce approach (10000 times): 1.830944365834205
Optimized approach by Jonathan (10000 times): 0.34162875449254027
The brute force approach requires generating 2^(array_size)-1 subsets to be summed and compared against target N.
The run time can be dramatically improved by simply splitting the problem in two. Store, in sets, all of the possible sums for one half of the array and the other half separately. It can now be determined by checking for every number n in one set if the complementN-n exists in the other set.
This optimization brings the complexity down to approximately: 2^(array_size/2)-1+2^(array_size/2)-1=2^(array_size/2 + 1)-2
Half of the original.
Here is a c++ implementation using this idea.
#include <bits/stdc++.h>
using namespace std;
bool sum_search(vector<int> myarray, int N) {
//values for splitting the array in two
int right=myarray.size()-1,middle=(myarray.size()-1)/2;
set<int> all_possible_sums1,all_possible_sums2;
//iterate over the first half of the array
for(int i=0;i<middle;i++) {
//buffer set that will hold new possible sums
set<int> buffer_set;
//every value currently in the set is used to make new possible sums
for(set<int>::iterator set_iterator=all_possible_sums1.begin();set_iterator!=all_possible_sums1.end();set_iterator++)
buffer_set.insert(myarray[i]+*set_iterator);
all_possible_sums1.insert(myarray[i]);
//transfer buffer into the main set
for(set<int>::iterator set_iterator=buffer_set.begin();set_iterator!=buffer_set.end();set_iterator++)
all_possible_sums1.insert(*set_iterator);
}
//iterator over the second half of the array
for(int i=middle;i<right+1;i++) {
set<int> buffer_set;
for(set<int>::iterator set_iterator=all_possible_sums2.begin();set_iterator!=all_possible_sums2.end();set_iterator++)
buffer_set.insert(myarray[i]+*set_iterator);
all_possible_sums2.insert(myarray[i]);
for(set<int>::iterator set_iterator=buffer_set.begin();set_iterator!=buffer_set.end();set_iterator++)
all_possible_sums2.insert(*set_iterator);
}
//for every element in the first set, check if the the second set has the complemenent to make N
for(set<int>::iterator set_iterator=all_possible_sums1.begin();set_iterator!=all_possible_sums1.end();set_iterator++)
if(all_possible_sums2.find(N-*set_iterator)!=all_possible_sums2.end())
return true;
return false;
}
Ugly and brute force approach:
a = [1, 2, 3, 4, 5, 10]
b = []
a.size.times do |c|
b << a.combination(c).select{|d| d.reduce(&:+) == 19 }
end
puts b.flatten(1).inspect