How to compare two date series that each carry a different set of data? - matlab

I have two sets of dates, DateA and DateB. Their first and last dates are the same, but the dates in between may differ. Each date in DateA and DateB has an associated value, stored in the arrays A and B respectively.
DateA= '2016-01-01'
'2016-01-02'
'2016-01-04'
'2016-01-05'
'2016-01-06'
'2016-01-07'
'2016-01-08'
'2016-01-09'
'2016-01-10'
'2016-01-12'
'2016-01-13'
'2016-01-14'
'2016-01-16'
'2016-01-17'
'2016-01-18'
'2016-01-19'
'2016-01-20'
DateB= '2016-01-01'
'2016-01-02'
'2016-01-03'
'2016-01-04'
'2016-01-05'
'2016-01-09'
'2016-01-10'
'2016-01-11'
'2016-01-12'
'2016-01-13'
'2016-01-15'
'2016-01-16'
'2016-01-17'
'2016-01-19'
'2016-01-20'
A = [5, 2, 3, 4, 6, 1, 7, 9, 3, 6, 1, 7, 9, 2, 1, 4, 6]
B = [4, 2, 7, 1, 8, 4, 9, 5, 3, 9, 3, 6, 7, 2, 9]
I have converted the dates into date numbers, i.e.
datenumberA= 736330
736331
736333
736334
736335
736336
736337
736338
736339
736341
736342
736343
736345
736346
736347
datenumberB= 736330
736331
736332
736333
736334
736338
736339
736340
736341
736342
736344
736345
736346
736348
736349
Now I want to compare the value of A on DateA(n) with the value of B on the date in DateB that is closest to and before DateA(n).
For example,
comparing the value of A on DateA '2016-01-12' with that of B on DateB '2016-01-11'.
Please help, and thanks a lot.

The following should give you the desired output:
all_k = 0;
out(1) = 1; % not comparing the first index as you mentioned
for n = 2:size(datenumberA,1)
    j = 0;
    while 1
        k = find(datenumberB+j == datenumberA(n)-1); % finding the index of DateB closest to and before DateA(n)
        if size(k,1) == 1, break; end % if found, come out of the while loop
        j = j+1; % otherwise keep adding 1 to the values of datenumberB until found
    end
    if size(find(all_k==k),2) ~= 1 % skip if this DateB index has already been compared
        out(end+1) = A(n) > B(k); % comparing the value in A with the corresponding value in B
        all_k(end+1) = k; % storing which indices of DateB have already been compared
    end
end
out' % Output
Output:
ans =
1
0
0
1
0
0
1
0
0
1
0
0
1
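The "closest date before" lookup can also be sketched with a binary search. The snippet below is a minimal sketch in Python rather than MATLAB, assuming both date lists are sorted in ascending order. The helper names `previous_b_index` and `compare_with_previous` are made up for illustration. Unlike the loop above, it does not skip a DateB entry that has already been compared, and it simply omits any DateA entry that has no earlier DateB date.

```python
from bisect import bisect_left
from datetime import date


def previous_b_index(dates_b, target):
    """Index of the latest date in dates_b strictly before target, or None."""
    i = bisect_left(dates_b, target)  # first position with dates_b[i] >= target
    return i - 1 if i > 0 else None


def compare_with_previous(dates_a, values_a, dates_b, values_b):
    """Compare each A value with the B value on the closest earlier B date."""
    results = []
    for day, a_value in zip(dates_a, values_a):
        k = previous_b_index(dates_b, day)
        if k is not None:  # skip A dates that have no earlier B date
            results.append((day, dates_b[k], a_value > values_b[k]))
    return results


# Data from the question (days of January 2016).
days_a = [1, 2, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18, 19, 20]
days_b = [1, 2, 3, 4, 5, 9, 10, 11, 12, 13, 15, 16, 17, 19, 20]
dates_a = [date(2016, 1, d) for d in days_a]
dates_b = [date(2016, 1, d) for d in days_b]
A = [5, 2, 3, 4, 6, 1, 7, 9, 3, 6, 1, 7, 9, 2, 1, 4, 6]
B = [4, 2, 7, 1, 8, 4, 9, 5, 3, 9, 3, 6, 7, 2, 9]

for day, b_day, a_is_greater in compare_with_previous(dates_a, A, dates_b, B):
    print(day, "vs", b_day, "A > B:", a_is_greater)
```

For the example in the question, the entry for 2016-01-12 is paired with 2016-01-11.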

Related

How the splitBetween method works

List integerList=[1,2,4,11,14,15,16,16,19,30,31,50,51,100,101,105]; //input
var subList=integerList.splitBetween((v1, v2) => (v2 - v1).abs() > 6);
print(subList); //([1, 2, 4], [11, 14, 15, 16, 16, 19], [30, 31], [50, 51], [100, 101, 105])
What logic does the splitBetween method use here?
It checks each pair of adjacent elements v1 and v2.
Let's use your data:
[1,2,4,11,14,15,16,16,19,30,31,50,51,100,101,105]
Begin with indices 0 and 1:
we have v1 = 1, v2 = 2
then test with the function (v2 - v1).abs() > 6
(2 - 1).abs() > 6 = false
Indices 1 and 2: v1 = 2, v2 = 4
(4 - 2).abs() > 6 = false
Indices 2 and 3: v1 = 4, v2 = 11
(11 - 4).abs() > 6, i.e. 7 > 6 = true
Since it is true, the elements since the previous chunk-splitting elements are emitted as a list,
which means the elements at indices 0 to 2 are emitted as a list.
current sublists = ([1,2,4])
And so on:
the pairs from indices 3 and 4 up to indices 7 and 8 are all false, and the pair at indices 8 and 9 (19 and 30) is true.
current sublists = ([1,2,4], [11,14,15,16,16,19])
Repeat until the last index.
Lastly, if the test is false for the final pairs, the elements keep being added to the current list, because the documentation says: Any final elements are emitted at the end.
final result : ([1, 2, 4], [11, 14, 15, 16, 16, 19], [30, 31], [50, 51], [100, 101, 105])
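The walk-through above can be condensed into a few lines. This is a minimal sketch in Python of the same chunking logic, not the Dart library's actual implementation; `split_between` is a hypothetical helper name chosen to mirror the Dart method.

```python
def split_between(items, test):
    """Split items into chunks, starting a new chunk wherever test(v1, v2) is true."""
    chunks = []
    current = []
    for item in items:
        # Compare the last element of the current chunk with the next element.
        if current and test(current[-1], item):
            chunks.append(current)  # emit the chunk collected so far
            current = []
        current.append(item)
    if current:
        chunks.append(current)  # any final elements are emitted at the end
    return chunks


integer_list = [1, 2, 4, 11, 14, 15, 16, 16, 19, 30, 31, 50, 51, 100, 101, 105]
sub_lists = split_between(integer_list, lambda v1, v2: abs(v2 - v1) > 6)
print(sub_lists)
```

With the input from the question this yields the same five chunks as the Dart call.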

How should you test the significance of 2 classification accuracy scores: paired permutation test

I have a single trained classifier tested on 2 related multiclass classification tasks. As each trial of one classification task is related to the corresponding trial of the other, the 2 sets of predictions constitute paired data. I would like to run a paired permutation test to find out if the difference in classification accuracy between the 2 prediction sets is significant.
So my data consists of 2 lists of predicted classes, where each prediction is related to the prediction in the other test set at the same index.
Example:
actual_classes = [1, 3, 6, 1, 22, 1, 11, 12, 9, 2]
predictions1 = [1, 3, 6, 1, 22, 1, 11, 12, 9, 10] # 90% acc.
predictions2 = [1, 3, 7, 10, 22, 1, 7, 12, 2, 10] # 50% acc.
H0: There is no significant difference in classification accuracy.
How do I go about running a paired permutation test to test significance of the difference in classification accuracy?
I have been thinking about this and I'm going to post a proposed solution and see if someone approves or explains why I'm wrong.
actual_classes = [1, 3, 6, 1, 22, 1, 11, 12, 9, 2]
predictions1 = [1, 3, 6, 1, 22, 1, 11, 12, 9, 10] # 90% acc.
predictions2 = [1, 3, 7, 10, 22, 1, 7, 12, 2, 10] # 50% acc.
paired_predictions = [[1,1], [3,3], [6,7], [1,10], [22,22], [1,1], [11,7], [12,12], [9,2], [10,10]]
actual_test_statistic = accuracy(predictions1) - accuracy(predictions2) # 90% - 50% = 40 points, i.e. 0.9 - 0.5 = 0.4
all_simulations = [] # empty list
for each of number_of_iterations:
    shuffle(paired_predictions) # only shuffle between pairs, not within
    simulated_predictions1 = paired_predictions[first prediction of each pair]
    simulated_predictions2 = paired_predictions[second prediction of each pair]
    simulated_accuracy1 = proportion of times simulated_predictions1 equals actual_classes
    simulated_accuracy2 = proportion of times simulated_predictions2 equals actual_classes
    all_simulations.append(simulated_accuracy1 - simulated_accuracy2) # Put the simulated difference in the list
p = count(absolute(all_simulations) > absolute(actual_test_statistic)) / number_of_iterations
If you have any thoughts, let me know in the comments. Or better still, provide your own corrected version in your own answer. Thank you!
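For comparison, here is a minimal sketch of the more common form of a paired permutation test, which differs from the proposal above: instead of shuffling the order of the pairs, it randomly swaps the two predictions within each pair, so every prediction stays aligned with its true class. The function name `paired_permutation_test`, the default of 10000 iterations, the fixed seed, and the "+1" correction in the p-value are choices made for this illustration, not something taken from the question.

```python
import random


def paired_permutation_test(actual, preds1, preds2, n_iter=10000, seed=0):
    """Two-sided paired permutation test on the difference in accuracy."""
    rng = random.Random(seed)
    # 1 where a prediction is correct, 0 otherwise.
    correct1 = [int(p == a) for p, a in zip(preds1, actual)]
    correct2 = [int(p == a) for p, a in zip(preds2, actual)]
    n = len(actual)
    observed = sum(correct1) - sum(correct2)  # difference in correct counts

    extreme = 0
    for _ in range(n_iter):
        simulated = 0
        for c1, c2 in zip(correct1, correct2):
            if rng.random() < 0.5:  # swap within the pair with probability 1/2
                c1, c2 = c2, c1
            simulated += c1 - c2
        if abs(simulated) >= abs(observed):
            extreme += 1

    # Counting the observed labelling itself avoids reporting p = 0.
    p_value = (extreme + 1) / (n_iter + 1)
    return observed / n, p_value


actual_classes = [1, 3, 6, 1, 22, 1, 11, 12, 9, 2]
predictions1 = [1, 3, 6, 1, 22, 1, 11, 12, 9, 10]
predictions2 = [1, 3, 7, 10, 22, 1, 7, 12, 2, 10]

difference, p = paired_permutation_test(actual_classes, predictions1, predictions2)
print("accuracy difference:", difference, "p-value:", p)
```

Only the pairs where exactly one of the two predictions is correct can change the simulated difference, so with just four such pairs in this example the test has very little resolution.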

PostGIS conditional aggregation - presence/absence matrix

I have a dataset that resembles the following:
site_id, species
1, spp1
2, spp1
2, spp2
2, spp3
3, spp2
3, spp3
4, spp1
4, spp2
I want to create a table like this:
site_id, spp1, spp2, spp3, spp4
1, 1, 0, 0, 0
2, 1, 1, 1, 0
3, 0, 1, 1, 0
4, 1, 1, 0, 0
This question was asked here, however the issue I face is that my list of species is significantly greater and so creating a massive query listing each species manually would take a significant amount of time. I would therefore like a solution that does not require this and could instead read from the existing species list.
In addition, when playing with that query, the count() function would keep adding so I would end up with values greater than 1 where multiples of the same species were present in a site_id. Ideally I want a binary 1 or 0 output.
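To make the target shape concrete, here is a minimal sketch of the transformation in Python, outside the database. It assumes the (site_id, species) rows and the full species list have already been read from the existing tables; `presence_absence` is a hypothetical helper name. Collecting species per site in a set is what keeps the output binary even when a species is recorded several times at one site.

```python
def presence_absence(rows, species_list):
    """Build a 0/1 site-by-species matrix from (site_id, species) rows."""
    present = {}
    for site_id, species in rows:
        # A set ignores repeated records of the same species at a site.
        present.setdefault(site_id, set()).add(species)

    header = ["site_id"] + list(species_list)
    table = [
        [site_id] + [1 if sp in present[site_id] else 0 for sp in species_list]
        for site_id in sorted(present)
    ]
    return header, table


rows = [
    (1, "spp1"),
    (2, "spp1"), (2, "spp2"), (2, "spp3"),
    (3, "spp2"), (3, "spp3"),
    (4, "spp1"), (4, "spp2"),
]
species_list = ["spp1", "spp2", "spp3", "spp4"]

header, table = presence_absence(rows, species_list)
print(", ".join(header))
for line in table:
    print(", ".join(str(value) for value in line))
```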

How to check if a number can be represented as a sum of some given numbers

I've got a list of some integers, e.g. [1, 2, 3, 4, 5, 10]
And I've another integer (N). For example, N = 19.
I want to check if my integer can be represented as a sum of any amount of numbers in my list:
19 = 10 + 5 + 4
or
19 = 10 + 4 + 3 + 2
Every number from the list can be used only once. N can be as large as 2,000 or more. The list can contain up to 200 integers.
Is there a good way to solve this problem?
Four and a half years later, this question was answered by Jonathan.
I want to post two implementations (bruteforce and Jonathan's) in Python and their performance comparison.
def check_sum_bruteforce(numbers, n):
    # This bruteforce approach can be improved (for some cases) by
    # returning True as soon as the needed sum is found;
    sums = []
    for number in numbers:
        for sum_ in sums[:]:
            sums.append(sum_ + number)
        sums.append(number)
    return n in sums

def check_sum_optimized(numbers, n):
    sums1, sums2 = [], []
    numbers1 = numbers[:len(numbers) // 2]
    numbers2 = numbers[len(numbers) // 2:]
    for sums, numbers_ in ((sums1, numbers1), (sums2, numbers2)):
        for number in numbers_:
            for sum_ in sums[:]:
                sums.append(sum_ + number)
            sums.append(number)
    # A sum that uses numbers from only one half
    if n in sums1 or n in sums2:
        return True
    # A sum that combines numbers from both halves
    for sum_ in sums1:
        if n - sum_ in sums2:
            return True
    return False

assert check_sum_bruteforce([1, 2, 3, 4, 5, 10], 19)
assert check_sum_optimized([1, 2, 3, 4, 5, 10], 19)

import timeit
print(
    "Bruteforce approach (10000 times):",
    timeit.timeit(
        'check_sum_bruteforce([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 200)',
        number=10000,
        globals=globals()
    )
)
print(
    "Optimized approach by Jonathan (10000 times):",
    timeit.timeit(
        'check_sum_optimized([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 200)',
        number=10000,
        globals=globals()
    )
)
Output (the float numbers are seconds):
Bruteforce approach (10000 times): 1.830944365834205
Optimized approach by Jonathan (10000 times): 0.34162875449254027
The brute force approach requires generating 2^(array_size)-1 subsets to be summed and compared against target N.
The run time can be dramatically improved by simply splitting the problem in two. Store, in sets, all of the possible sums for one half of the array and the other half separately. It can then be determined, by checking for every number n in one set, whether the complement N-n exists in the other set. (A sum that uses elements from only one half has to be checked separately, by looking for N itself in each set.)
This optimization brings the number of sums to generate down to approximately: 2^(array_size/2)-1+2^(array_size/2)-1=2^(array_size/2 + 1)-2
That is roughly twice the square root of the original count, which is far better than half of it.
Here is a C++ implementation using this idea.
#include <set>
#include <vector>
using namespace std;
bool sum_search(const vector<int>& myarray, int N) {
    // values for splitting the array in two
    int size = static_cast<int>(myarray.size());
    int right = size - 1, middle = (size - 1) / 2;
    set<int> all_possible_sums1, all_possible_sums2;
    // iterate over the first half of the array
    for (int i = 0; i < middle; i++) {
        // buffer set that will hold new possible sums
        set<int> buffer_set;
        // every value currently in the set is used to make new possible sums
        for (set<int>::iterator set_iterator = all_possible_sums1.begin(); set_iterator != all_possible_sums1.end(); set_iterator++)
            buffer_set.insert(myarray[i] + *set_iterator);
        all_possible_sums1.insert(myarray[i]);
        // transfer buffer into the main set
        for (set<int>::iterator set_iterator = buffer_set.begin(); set_iterator != buffer_set.end(); set_iterator++)
            all_possible_sums1.insert(*set_iterator);
    }
    // iterate over the second half of the array
    for (int i = middle; i < right + 1; i++) {
        set<int> buffer_set;
        for (set<int>::iterator set_iterator = all_possible_sums2.begin(); set_iterator != all_possible_sums2.end(); set_iterator++)
            buffer_set.insert(myarray[i] + *set_iterator);
        all_possible_sums2.insert(myarray[i]);
        for (set<int>::iterator set_iterator = buffer_set.begin(); set_iterator != buffer_set.end(); set_iterator++)
            all_possible_sums2.insert(*set_iterator);
    }
    // a sum that uses elements from only one half
    if (all_possible_sums1.count(N) || all_possible_sums2.count(N))
        return true;
    // for every element in the first set, check if the second set has the complement to make N
    for (set<int>::iterator set_iterator = all_possible_sums1.begin(); set_iterator != all_possible_sums1.end(); set_iterator++)
        if (all_possible_sums2.find(N - *set_iterator) != all_possible_sums2.end())
            return true;
    return false;
}
Ugly and brute force approach:
a = [1, 2, 3, 4, 5, 10]
b = []
# try every combination size from 1 up to the full list
(1..a.size).each do |c|
  b << a.combination(c).select { |d| d.reduce(&:+) == 19 }
end
puts b.flatten(1).inspect
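For the sizes mentioned in the question (a list of up to 200 integers and N around 2,000), enumerating subset sums is not practical even after halving, since each half could still produce on the order of 2^100 sums. A different technique, dynamic programming over the reachable sums, may fit better. The sketch below is a swapped-in approach rather than any of the methods above. It assumes all numbers are positive integers, and `can_sum` is a hypothetical name. It tracks reachable sums up to N as bits of a single Python integer, so the work grows with the list length times N.

```python
def can_sum(numbers, n):
    """Return True if some subset of numbers (each used at most once) sums to n.

    Assumes every number is a positive integer and n >= 0.
    """
    mask = (1 << (n + 1)) - 1  # keep only sums in the range 0..n
    reachable = 1  # bit i is set when sum i is reachable; bit 0 is the empty subset
    for number in numbers:
        # Either skip this number (old bits) or add it to every reachable sum.
        reachable |= (reachable << number) & mask
    return bool((reachable >> n) & 1)


print(can_sum([1, 2, 3, 4, 5, 10], 19))
print(can_sum([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 200))
```

Note that this sketch treats n = 0 as reachable through the empty subset. If at least one number must be used, that case needs a separate check.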

How can we assign letters to numbers

I have the following:
d=[1 2 3 4 5 6 7]
I want Matlab to assign a day name to every number by doing a loop or
any suitable method as follows:
1 =tuesday
2=wednesday
.
.
.
7=monday
The result I am aiming for after running the program is:
the Matlab window asks the user to enter a number from 1 to 7
n=('enter a number from 1 to 7')
then,
if we enter, for example, 4, the printed result is: Friday
or
if we enter, for example, 7, the printed result is: Monday
and so on.
Is there any way to do this?
Regards
You could use a cell array, which allows you to store an array of text strings. The curly bracket is the key:
>> weekdays = {'Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun'};
>> weekdays{4}
ans =
Thurs
Edit: You can get the relevant number from the user by using MATLAB's input function:
n = input('Enter your number:');
disp(weekdays{n})
Using a map might be one approach:
weekDays = containers.Map({1, 2, 3, 4, 5, 6, 7} , ...
    {'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday', 'monday'});
number = input('enter a number from 1 to 7');
disp(sprintf('You did choose %s\n', weekDays(number)));
EDIT:
Using the solution by Bill Cheatham you end up with
weekdays = {'Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun'};
number = input('enter a number from 1 to 7');
disp(sprintf('You did choose %s\n', weekdays{number}));