How to implement cumulative score in Matlab

Can someone explain to me what the "cumulative score" is and how to implement it in Matlab?
I searched the net and found that the cumulative score is defined as the percentage of test images whose absolute error is not higher than a threshold t (in years in this study).
I read in an article that the cumulative score is calculated as CS(t) = 100% * N_{e<=t} / N, where N_{e<=t} is the number of test images whose absolute error does not exceed t and N is the total number of test images.
I also used the "one category error" in my study, calculated as follows:
correct = abs(predict_label - test_label) <= 1;   % error within one category
num_correct = sum(correct);                       % count the correct predictions
accuracy2Svmk2 = (num_correct ./ length(test_label)) * 100;
Could these two metrics be more or less the same?
Thank you.

Maybe you forgot the "1-":
accuracy2Svmk2 = (1 - (num_correct ./ length(test_label))) * 100;
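Either way, the cumulative score is usually reported as a curve over a range of thresholds rather than as a single number. A minimal sketch, assuming predict_label and test_label are numeric vectors of predicted and true ages as in the question:
% Cumulative score CS(t) for error thresholds t = 0..10 years
thresholds = 0:10;
CS = zeros(size(thresholds));
for k = 1:length(thresholds)
    withinT = abs(predict_label - test_label) <= thresholds(k);
    CS(k) = 100 * sum(withinT) / length(test_label);
end
plot(thresholds, CS);
xlabel('Error threshold t (years)');
ylabel('Cumulative score (%)');
At t = 1 this is exactly the accuracy2Svmk2 computation from the question, so the "one category error" accuracy is the cumulative score evaluated at a single threshold.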

Related

Ruby version of gamma.fit from scipy.stats

As the title suggests, I am trying to find a function that can take an array of floats and find a distribution that fits my data.
From here I'll use it to find the CDF of new data I am passing it.
I have installed and looked through the sciruby Distribution and NArray docs, but nothing appears to match the 'fit' method.
The Python code looks like this:
from scipy import stats

# Approach 2: Model-based percentiles.
# Step 1: Find a Gamma distribution that fits your data
alpha, _, beta = stats.gamma.fit(data, floc=0.)
# Step 2: Use that distribution's CDF to get percentiles.
scores = 100 - 100 * stats.gamma.cdf(new_data, a=alpha, scale=beta)
print(scores)
Thank you in advance
After a deep dive into other packages and a lot of help from someone on the 'Cross Validated' forum, I have the answer I needed.
To obtain the 'alpha' and 'beta' values that give the shape and rate of the gamma distribution, you first need to compute the variance of the data.
There are a few approaches to this; see here for more information:
https://www.statisticshowto.com/probability-and-statistics/descriptive-statistics/sample-variance/
Code example:
data = [<insert your numbers>]

# Sample variance (with Bessel's correction)
sum = data.sum
sum_square_mean = (sum**2) / data.size     # (sum of data)^2 / n
all_square = data.map { |n| n**2 }.sum     # sum of squared data
net_square = all_square - sum_square_mean  # sum of squared deviations
minus_one = data.size - 1                  # n - 1, Bessel's correction
variance = net_square / minus_one

# Method-of-moments estimates for the gamma distribution
mean = data.sum(0.0) / data.size
mean_squared = mean**2
alpha = mean_squared / variance            # shape
beta = mean / variance                     # rate
theta = variance / mean                    # scale (1 / rate)
The 'minus_one' line isn't strictly necessary, but it is done in statistics to reduce bias; look up Bessel's correction. You can also just compute the variance as net_square / data.size.
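For context, these are the standard method-of-moments identities for the gamma distribution: with shape alpha and scale theta, the mean is alpha * theta and the variance is alpha * theta^2, so
alpha = mean^2 / variance
theta = variance / mean
beta = 1 / theta = mean / variance   (the rate)
which is exactly what the code above computes.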
A second option uses the 'descriptive_statistics' gem:
require('descriptive_statistics')

# Note: data.variance doesn't account for Bessel's correction
alpha = (data.mean**2) / data.variance
beta = data.mean / data.variance
theta = data.variance / data.mean
Once you have these values, you can use the cdf function from the Distribution gem (docs here).
The next stage is to pass the values into this function, which returns a percentile.
Make sure to pass the '1 / beta' value (the scale) rather than beta itself, or it won't work:
percentile = 100 - (100 * Distribution::Gamma::Ruby_.cdf(x, alpha, 1 / beta))
You may have noticed I have also calculated theta.
This was for a separate function that returns a value from the gamma distribution when given a percentile. It is used like so:
value = Distribution::Gamma.quantile(0.5, alpha, theta)
This function is also known as the 'inverse CDF', 'inverse cumulative distribution function', 'quantile function' or 'percent point function'. Here it is simply named 'quantile'.
For more information on gamma distributions, please see the Wikipedia article on the Gamma distribution.

Statistics on an amplitude-time graph in MATLAB

MATLAB - I have amplitude and time data, and I am trying to ascertain the times at which the peaks in amplitude occur. My most recent attempt, below, uses the findpeaks function. However, I would like the MinPeakHeight parameter, which I read directly off the graph of the data as 0.2, to be chosen in a more accurate and general way. Is there some statistical or other way to do this?
y = y(:,1);                              % keep the first channel
dt = 1 / Fs;                             % sample period
t = 0 : dt : (length(y) * dt) - dt;      % time vector matching y
[pks,index] = findpeaks(y,'MinPeakHeight',0.2);
pksTimes = t(index);                     % times at which the peaks occur
I hope that this is clear, but, if not, I am more than happy to clarify any points. Thank you!
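One heuristic (my own suggestion, not something from the post) is to derive MinPeakHeight from the statistics of the signal itself instead of reading it off the graph, for example the median plus a few scaled median absolute deviations, which is robust to outliers:
% Hypothetical sketch: threshold from the signal's own spread.
% 1.4826 * mad(y, 1) estimates the standard deviation for Gaussian noise;
% mad with the flag 1 (median-based) requires the Statistics Toolbox.
thresh = median(y) + 3 * 1.4826 * mad(y, 1);
[pks, index] = findpeaks(y, 'MinPeakHeight', thresh);
pksTimes = t(index);
The multiplier 3 controls how far above the background a peak must rise; tune it to taste.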

Iterative quantile estimation in Matlab

I'm trying to implement an iterative algorithm to estimate quantiles in data generated from a Monte-Carlo simulation. I want to make it iterative because I have many iterations and variables, so storing all data points and using Matlab's quantile function would take much of the memory that I actually need for the simulation.
I found some approaches based on the Robbins-Monro process, whose update is q_{t+1} = q_t + c_t * (p - 1{X_t < q_t}), where p is the target probability, X_t is the t-th sample, c_t is a control sequence, and 1{.} is the indicator function.
The implementation with a control sequence c_t = c / t, where c is a constant, is quite straightforward. In the cited paper, they show that c = 2 * sqrt(2 * pi) gives quite good results, at least for the median. But they also propose an adaptive approach based on an estimate of the histogram. Unfortunately, I haven't figured out how to implement this adaptation yet.
I tested the implementation with a constant c on three test samples with 10,000 data points each. The value c = 2 * sqrt(2 * pi) did not work well for me, but c = 100 looks quite good for the test samples. However, this selection is not very robust, and it failed in the actual Monte-Carlo simulation, giving results wide of the mark.
probabilities = [0.1, 0.4, 0.7];   % target quantile levels
controlFactor = 100;               % constant c in the control sequence
quantile = zeros(size(probabilities));
indicator = zeros(size(probabilities));
for index = 1:length(data)
    control = controlFactor / index;                  % c_t = c / t
    indices = (data(index) >= quantile);
    indicator(indices) = probabilities(indices);      % sample above: step up
    indices = (data(index) < quantile);
    indicator(indices) = probabilities(indices) - 1;  % sample below: step down
    quantile = quantile + control * indicator;
end
Is there a more robust solution for iterative quantile estimation or does anyone have an implementation for an adaptive approach with small memory consumption?
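To make the adaptation concrete, here is a rough sketch of the kind of density-adaptive control sequence I mean (my own guess, not the scheme from the cited paper): scale the step by 1 / f(q), where f is estimated by counting the samples that fall in a small window around the current estimate.
% Hypothetical density-adaptive Robbins-Monro update for a single quantile
p = 0.5;      % target probability
q = 0;        % running quantile estimate
h = 0.1;      % half-width of the density window (tuning parameter)
near = 0;     % samples seen within +/- h of the estimate so far
for t = 1:length(data)
    x = data(t);
    near = near + (abs(x - q) <= h);
    fHat = max(near / (2 * h * t), 1e-3);   % crude density estimate, floored away from zero
    c = 1 / (t * fHat);                     % adaptive step size, roughly 1 / (t * f(q))
    q = q + c * (p - (x < q));              % Robbins-Monro update
end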
After trying some of the adaptive iterative approaches that I found in the literature without great success (not sure if I implemented them correctly), I came up with a solution that gives me good results for my test samples and also for the actual Monte-Carlo simulation.
I buffer a subset of simulation results, compute the sample quantiles, and average over all subset sample quantiles at the end. This seems to work quite well without tuning many parameters. The only parameter is the buffer size, which is 100 in my case.
The results converge quite fast, and increasing the sample size does not improve the results dramatically. There seems to be a small but constant bias, presumably the averaged error of the subset sample quantiles, and that is the downside of my solution: by choosing the buffer size, one fixes the achievable accuracy. Increasing the buffer size reduces this bias. In the end, it is a trade-off between memory and accuracy.
% Generate data
rng('default');
data = sqrt(0.5) * randn(10000, 1) + 5 * rand(10000, 1) + 10;

% Set parameters
probabilities = 0.2;

% Compute reference sample quantiles
quantileEstimation1 = quantile(data, probabilities);

% Estimate quantiles by averaging over a number of subset sample quantiles
subsetSize = 100;
numSubsets = length(data) / subsetSize;
quantileSum = 0;
for index = 1:numSubsets
    subset = data(((index - 1) * subsetSize + 1):(index * subsetSize));
    quantileSum = quantileSum + quantile(subset, probabilities);
end
quantileEstimation2 = quantileSum / numSubsets;

% Estimate quantiles with the iterative Robbins-Monro computation
quantileEstimation3 = zeros(size(probabilities));
indicator = zeros(size(probabilities));
controlFactor = 2 * sqrt(2 * pi);
for index = 1:length(data)
    control = controlFactor / index;
    indices = (data(index) >= quantileEstimation3);
    indicator(indices) = probabilities(indices);
    indices = (data(index) < quantileEstimation3);
    indicator(indices) = probabilities(indices) - 1;
    quantileEstimation3 = quantileEstimation3 + control * indicator;
end

fprintf('Reference result: %f\nSubset result: %f\nIterative result: %f\n\n', quantileEstimation1, quantileEstimation2, quantileEstimation3);

Select a subset of stocks using genetic algorithm in Matlab

I want to select 10 stocks out of a given set of stocks that should receive some weight, while the rest should be given zero weight. I have read the covariance matrix and the returns from a file. My code is:
Aeq = ones(1, stocks);              % weights must sum to 1
beq = 1;
lb = zeros(1, stocks);              % no short selling
ub = ones(1, stocks);
options = gaoptimset;
options = gaoptimset(options, 'PopulationSize', 10);
fitnessFunction = @(x) (x * covariance * x') - (x * returns);   % risk minus return
W = ga(fitnessFunction, stocks, [], [], Aeq, beq, lb, ub, [], options);
This code is giving weights to all the stocks. I cannot figure it out how to limit the number to 10.
The 'PopulationSize' parameter specifies how many entities - in your case portfolios - exist at each epoch; it has nothing to do with the weights assigned to each asset.
You need to write appropriate custom crossover and mutation functions (the 'CrossoverFcn' and 'MutationFcn' options) that explicitly maintain exactly 10 non-zero weights.
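As an illustration (my own sketch, not code from this answer), one way to do that is a repair step that each custom crossover or mutation function applies to every child it produces: keep the k largest weights, zero the rest, and renormalize so the weights still sum to 1.
function w = repairPortfolio(w, k)
% Hypothetical helper: keep only the k largest weights and renormalize.
% Custom CrossoverFcn / MutationFcn implementations would call this on
% every child they generate, so the population stays feasible.
[~, order] = sort(w, 'descend');
w(order(k+1:end)) = 0;           % zero all but the k largest weights
s = sum(w);
if s > 0
    w = w / s;                   % re-impose the budget constraint sum(w) == 1
else
    w(order(1:k)) = 1 / k;       % degenerate case: fall back to equal weights
end
end
A mutation function would then return repairPortfolio(mutatedChild, 10), so every portfolio in the population has exactly 10 non-zero weights.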

Calculating an interest rate tree in Matlab

I would like to calibrate an interest rate tree using the optimization tool in Matlab and need some guidance on doing it.
The interest rate tree is a two-period binomial tree that discounts a final payoff of 100 back to the root. How it works:
3.73% = 2.5% * exp(2*0.2)
96.40453 = (0.5*100 + 0.5*100) / (1 + 3.73%)
94.15801 = (0.5*96.40453 + 0.5*97.56098) / (1 + 3.00%)
The value of 2.5% is arbitrary; the upper node is obtained by multiplying it by exp(2*volatility), where the volatility here is 20%. The root is discounted at the 3% rate set in the code below.
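A quick numerical check of the example, using only the numbers above:
upperRate = 0.025 * exp(2 * 0.2)                     % 0.0373
upperVal = 100 / (1 + upperRate)                     % 96.40453
lowerVal = 100 / (1 + 0.025)                         % 97.56098
rootVal = (0.5 * upperVal + 0.5 * lowerVal) / 1.03   % 94.15801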
I need to optimize by varying the value of the lower node.
How do I do this optimization in Matlab?
What I have tried so far:
InterestTree{1}(1,1) = 0.03;                 % root short rate (3%)
InterestTree{2}(1,2) = 2.5/100;              % lower-node rate (the value to vary)
InterestTree{3}(2,:) = 100;                  % final payoff at maturity
InterestTree{2}(1,1) = (2.5*exp(2*0.2))/100; % upper-node rate = lower rate * exp(2*vol)
% Discount the payoff back through the tree
InterestTree{2}(2,2) = (0.5*InterestTree{3}(2,3) + 0.5*InterestTree{3}(2,2)) / (1 + InterestTree{2}(1,2));
InterestTree{2}(2,1) = (0.5*InterestTree{3}(2,2) + 0.5*InterestTree{3}(2,1)) / (1 + InterestTree{2}(1,1));
InterestTree{1}(2,1) = (0.5*InterestTree{2}(2,2) + 0.5*InterestTree{2}(2,1)) / (1 + InterestTree{1}(1,1));
But I am not sure how to go about the optimization. Any suggestions to improve the code are welcome; I need some guidance on this.
Are you expecting the tree to increase in size? Or are you just optimizing over the value of the "2.5%" parameter?
If it's the latter, there are two ways. The first is to model the tree as a closed-form expression by replacing 2.5% with x, which is possible with a tree this small. There are nonlinear optimization toolboxes available in Matlab (e.g. more here), but it's been too long since I've done this to give you a more detailed answer.
The second is the approach I would try immediately. I'm interpreting the example you gave, so the equations I'm using may be incorrect; however, the principle of using the for loop is the same.
vol = 0.2;
maxival = 100;
val1 = zeros(1, maxival);      % preallocate
finalval = zeros(1, maxival);
for ival = 1:maxival
    val1(ival) = ival / 1000;  % use any scaling you want; this goes from 0.1% to 10%
    val2 = val1(ival) * exp(2 * vol);
    x1 = (0.5*100 + 0.5*100) / (1 + val2);        % based on the equation you gave
    x2 = (0.5*100 + 0.5*100) / (1 + val1(ival));  % I'm assuming this is how you calculate the bottom node
    finalval(ival) = x1*0.5 + x2*0.5/(1+...);     % the example you gave isn't clear, so replace this with whatever it should be
end
[maxval, indmaxval] = max(finalval);
The maximum value is in maxval, and the interest rate that maximized it is val1(indmaxval).
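If the aim is to calibrate the lower-node rate to a target price rather than to maximize, a bracketed scalar solver also works. A minimal sketch, under my own assumptions (the root rate is the 3% from the question's code, and marketPrice is a hypothetical calibration target taken from the example):
% Hypothetical calibration sketch: find the lower-node rate r whose
% model price matches a target price, using plain MATLAB's fminbnd.
vol = 0.2;
rootRate = 0.03;                       % assumed, from the question's code
marketPrice = 94.15801;                % assumed target, from the example
priceFromRate = @(r) (0.5 * (100 / (1 + r * exp(2 * vol))) + ...
                      0.5 * (100 / (1 + r))) / (1 + rootRate);
objective = @(r) (priceFromRate(r) - marketPrice)^2;   % squared pricing error
rStar = fminbnd(objective, 0.001, 0.10);               % search rates from 0.1% to 10%
With the example numbers this recovers rStar close to the 2.5% lower-node rate the example started from.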