Matlab: binomial simulation

How do I simulate a binomial distribution with values for an investment in two stocks, Acme and Widget?
The number of trials is 1000, and each stock is invested in for 5 years.
This is my code. What am I doing wrong?
nyears = 5;
ntrials = 1000;
startamount = 100;
yrdeposit = 50;
acme = zeros(nyears, 1);
widget = zeros(nyears,1);
v5 = zeros(ntrials*5, 1);
v5 = zeros(ntrials*5, 1);
%market change between -5 to 1%
marketchangeacme = (-5+(1+5)*rand(nyears,1));
marketchangewidget = (-3+(3+3)*rand(nyears,1));
acme(1) = startamount;
widget(1) = startamount;
for m=1:numTrials
for n=1:nyears
acme(n) = acme(n-1) + (yrdeposit * (marketchangeacme(n)));
widget(n) = acme(n-1) + (yrdeposit * (marketchangewidget(n)));
vacme5(i) = acme(j);
vwidget5(i) = widget(j);
end
theMean(m) = mean(1:n*nyears);
p = 0.5 % prob neg return
acmedrop = (marketchangeacme < p)
widgetdrop = (marketchangewidget <p)
end
plot(mean)

Exactly what you are trying to calculate is not clear. However, some things that are obviously wrong with the code are:
widget(n) presumably isn't a function of acme(n-1) but rather widget(n-1).
Every entry of theMean will be mean(1:nyears*nyears), which for nyears = 5 will be 13. (This is because n = nyears always at that point in the code.)
The probability of a negative return for acme is 5/6, not 0.5.
To find the locations of the negative returns you want acmedrop = (marketchangeacme < 0); not < 0.5 (nor any other probability). Similarly for widgetdrop.
You are not preallocating vacme5 or vwidget5 (but you do preallocate v5 twice, and then never use it).
You don't create a variable called mean (and you never should), so plot(mean) will not work.
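Putting those points together, a minimal corrected sketch might look like the following. The growth model is kept exactly as in the question; the undefined numTrials, i, and j are repaired with what appears to be the intended meaning (an assumption on my part):
nyears = 5;
ntrials = 1000;
startamount = 100;
yrdeposit = 50;
vacme5 = zeros(ntrials, 1);      % preallocate final-year values
vwidget5 = zeros(ntrials, 1);
for m = 1:ntrials
    % draw a fresh market path for each trial
    marketchangeacme = -5 + 6*rand(nyears, 1);    % uniform on (-5, 1)
    marketchangewidget = -3 + 6*rand(nyears, 1);  % uniform on (-3, 3)
    acme = zeros(nyears, 1);
    widget = zeros(nyears, 1);
    acme(1) = startamount;
    widget(1) = startamount;
    for n = 2:nyears             % start at 2 so that acme(n-1) exists
        acme(n) = acme(n-1) + yrdeposit*marketchangeacme(n);
        widget(n) = widget(n-1) + yrdeposit*marketchangewidget(n);  % widget, not acme
    end
    vacme5(m) = acme(end);
    vwidget5(m) = widget(end);
end
acmedrop = (marketchangeacme < 0);      % negative-return years of the last trial
widgetdrop = (marketchangewidget < 0);
plot(vacme5)                            % per-trial results; use mean(vacme5) for the average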

Solving probability problems with MATLAB

How can I simulate this question using MATLAB?
Out of 100 apples, 10 are rotten. We randomly choose 5 apples without
replacement. What is the probability that there is at least one
rotten?
The Expected Answer
0.4162476
My Attempt:
r=0
for i=1:10000
for c=1:5
a = randi(1,100);
if a < 11
r=r+1;
end
end
end
r/10000
but it didn't work, so what would be a better way of doing it?
Use randperm to choose randomly without replacement:
A = false(1, 100);
A(1:10) = true;
r = 0;
for k = 1:10000
a = randperm(100, 5);
r = r + any(A(a));
end
result = r/10000;
Short answer:
Your problem follows a hypergeometric distribution (similar to a binomial distribution but without replacement). If you have the necessary toolbox, you can simply use the probability density function of the hypergeometric distribution:
r = 1-hygepdf(0,100,10,5) % r = 0.4162
Since P(x>=1) = P(x=1) + P(x=2) + P(x=3) + P(x=4) + P(x=5) = 1 - P(x=0).
Of course, here I calculate the exact probability; this is not an experimental result.
To get further:
Note that if you do not have access to hygepdf, you can easily write the function yourself using binomial coefficients:
N = 100; K = 10;
n = 5; k = 0;
r = 1-(nchoosek(K,k)*nchoosek(N-K,n-k))/nchoosek(N,n) % r = 0.4162
You can also use the binomial probability density function; it is a bit trickier (but also more intuitive):
r = 1-prod(binopdf(0,1,10./(100-[0:4])))
Here we compute the probability of obtaining zero rotten apples five times in a row; the probability increases at every step, since we remove one good, juicy apple each time. Then, according to the explanation above, we take 1 - P(x=0).
There are a couple of issues with your code. First of all, implicitly in what you wrote, you replace the apple after you look at it. When you generate the random number, you need to eliminate the possibility of choosing that number again.
I've rewritten your code to include better practices:
clear
n_runs = 1000;
success = zeros(n_runs, 1);
failure = zeros(n_runs, 1);
approach = zeros(n_runs, 1);
for ii = 1:n_runs
a = randperm(100, 5);        % 5 distinct apples, chosen without replacement
if any(a < 11)               % apples 1 through 10 are the rotten ones
success(ii) = 1;
else
failure(ii) = 1;
end
approach(ii) = sum(success)/(sum(success)+sum(failure));
end
figure; hold on
plot(approach)
title("r = " + approach(end))
hold off
The results are stored in an array (called approach), rather than a single number being updated every time, which means you can see how quickly you approach the end value of r.
Another good habit is including clear at the beginning of any script, which reduces the possibility of an error occurring due to variables stored in the workspace.

How to speed up this for-loop code (for large matrix `H_sparse`)?

H_Sparse is a large matrix of size 20,000-by-5,000. The matrix-vector product dk = A * Del_H; in the code below is time consuming. How can I speed up this code?
This code is another way to get a result equivalent to the built-in function pinv(H_Sparse) in MATLAB. I think MATLAB uses MEX files and bsxfun in pinv, so it's fast.
But in theory the algorithm below is faster:
function PINV_H_Spp = Recur_Pinv_Comp( H_Sparse )
L = 1;
H_candidate = H_Sparse(:,L);
A = pinv( H_candidate );
for L = 1:size( H_Sparse, 2 ) - 1
L = L + 1;
Del_H = H_Sparse(:,L);
dk = A * Del_H;
Ck = Del_H - H_candidate * dk;
Gk = pinv( Ck );
A = A - dk * Gk;
A(end+1,:) = Gk;
H_candidate(:,end+1) = Del_H;
end
PINV_H_Spp = A;
The code can be compared with pinv(H_Sparse), using H_Sparse = rand(20000, 5000) as sample data.
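For a rough check of both speed and agreement, something like the following sketch can be used (the sizes here are scaled down from the 20,000-by-5,000 case so it finishes quickly):
H = rand(2000, 500);                 % smaller stand-in for H_Sparse
tic; P1 = pinv(H); t1 = toc;
tic; P2 = Recur_Pinv_Comp(H); t2 = toc;
fprintf('pinv: %.2f s, recursive: %.2f s, max abs diff: %g\n', ...
    t1, t2, max(abs(P1(:) - P2(:))));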
A few points of improvement:
You can change your loop index to 2:size(H_Sparse, 2) and remove the line L = L + 1;.
There's no need to create a separate variable H_candidate, since it only ever holds the leading columns of H_Sparse. Instead, just index H_Sparse accordingly and you'll save on memory.
Instead of building your matrix A row-by-row, you can preallocate it and update it using indexing. This usually provides some speed-up.
Return A as your output. No need to put it in another variable.
Here's a new version of the code incorporating the above improvements:
function A = Recur_Pinv_Comp(H_Sparse)
[nRows, nCols] = size(H_Sparse);
A = [pinv(H_Sparse(:, 1)); zeros(nCols-1, nRows)];
for L = 2:nCols
Del_H = H_Sparse(:, L);
dk = A(1:L-1, :)*Del_H;
Ck = Del_H - H_Sparse(:, 1:L-1)*dk;
Gk = pinv(Ck);
A(1:L-1, :) = A(1:L-1, :) - dk*Gk;
A(L, :) = Gk;
end
end
In addition, it looks like your calls to pinv only ever operate on column vectors, so you may be able to replace them with a simple array transpose and scaling by the sum of the squares of the vector (which might speed things up a little more):
Gk = Ck.'./(Ck.'*Ck);
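One caveat (this guard is my addition, not part of the answer above): the shortcut divides by Ck.'*Ck, which blows up when a new column is numerically dependent on the previous ones, whereas pinv returns a zero row for a (near-)zero vector. A sketch of a guard that mimics pinv's behavior, with a placeholder tolerance to tune:
den = Ck.'*Ck;
if den > 1e-12                    % hypothetical tolerance; adjust for your data
    Gk = Ck.'./den;
else
    Gk = zeros(1, numel(Ck));     % matches pinv of a (near-)zero column vector
end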

How to implement parallel-for in a 4-level nested for loop block

I have to calculate the std and mean of a large data set with respect to quite a few models. The final loop block is nested to four levels.
This is what it looks like:
count = 1;
alpha = 0.5;
%%%Below if each individual block is to be posterior'd and then average taken
c = 1;
for i = 1:numel(writers) %no. of writers
for j = 1: numel(test_feats{i}) %no. of images
for k = 1: numel(gmm) %no. of models
for n = 1: size(test_feats{i}{j},1)
[~, scores(c)] = posterior(gmm{k}, test_feats{i}{j}(n,:));
c = c + 1;
end
c = 1;
index_kek=find(abs(scores-mean(scores))>alpha*std(scores));
avg = mean(scores(index_kek)); %using std instead of mean... beacause of ..reasons
NLL(count) = avg;
count = count + 1;
end
count = 1; %reset count
NLL_scores{i}(j,:) = NLL;
end
fprintf('***score for model_%d done***\n', i)
end
It works and gives the desired result, but it takes 3 days to give me the final calculation, even on my i7 processor. During processing, Task Manager tells me that only 20% of the CPU is being used, so I would rather put more load on the CPU to get the result faster.
Going by the official help, if I want to make the outermost loop a parfor while keeping the rest as normal for loops, all I have to do is insert integer limits rather than function calls such as size or numel.
So making these changes the above code will become:
count = 1;
alpha = 0.5;
%%%Below if each individual block is to be posterior'd and then average taken
c = 1;
num_writers = numel(writers);
num_images = numel(test_feats{1});
num_models = numel(gmm);
num_feats = size(test_feats{1}{1},1);
parfor i = 1:num_writers %no. of writers
for j = 1: num_images %no. of images
for k = 1: num_models %no. of models
for n = 1: num_feats
[~, scores(c)] = posterior(gmm{k}, test_feats{i}{j}(n,:));
c = c + 1;
end
c = 1;
index_kek=find(abs(scores-mean(scores))>alpha*std(scores));
avg = mean(scores(index_kek)); %using std instead of mean... beacause of ..reasons
NLL(count) = avg;
count = count + 1;
end
count = 1; %reset count
NLL_scores{i}(j,:) = NLL;
end
fprintf('***score for model_%d done***\n', i)
end
Is this the optimal way to implement parfor in my case? Can it be improved or optimized further?
I couldn't test it in MATLAB just now, but it should be close to a working solution. It has a reduced number of loops and changes a few implementation details, but overall it might perform just as fast as (or even slower than) your earlier code.
If gmm and test_feats take lots of memory, then it is important that parfor is able to determine which pieces of data need to be delivered to which workers. The IDE should warn you if inefficient memory access is detected. This modification is especially useful if num_writers is much less than the number of cores in your CPU, or if it is only slightly larger (like 5 writers for 4 cores would take about as long as 8 writers).
[i_writer i_image i_model] = ndgrid(1:num_writers, 1:num_images, 1:num_models);
idx_combined = [i_writer(:) i_image(:) i_model(:)];
n_combined = size(idx_combined, 1);
NLL_scores = zeros(n_combined, 1);
parfor i_for = 1:n_combined
i = idx_combined(i_for, 1);
j = idx_combined(i_for, 2);
k = idx_combined(i_for, 3);
% pre-allocate the per-iteration scores vector
scores = zeros(num_feats, 1);
for i_feat = 1:num_feats
[~, scores(i_feat)] = posterior(gmm{k}, test_feats{i}{j}(i_feat,:));
end
% "find" is redundant here and performs a bit slower, might be insignificant though
index_kek = abs(scores - mean(scores)) > alpha * std(scores);
NLL_scores(i_for) = mean(scores(index_kek));
end
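If you need the results back in the original {writer}(image, model) layout, they can be recovered afterwards from the ndgrid ordering above (a sketch, assuming the same num_* variables; NLL_cell is just an illustrative name):
A = reshape(NLL_scores, num_writers, num_images, num_models);
NLL_cell = cell(num_writers, 1);
for i = 1:num_writers
    NLL_cell{i} = squeeze(A(i, :, :));    % num_images-by-num_models
end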

What is wrong with my Simpson algorithm?

I was trying to write an algorithm to approximate integrals with Simpson's method. When I plot the error in a loglog plot, however, I don't get the correct order of accuracy, which is O(h^4) (I get O(h)). I can't find any errors though. This is my code:
%Reference solution with Simpson's method (the reference solution works well)
yk = 0;
yj = 0;
href = 0.0001;
mref = (b-a)/href;
for k=2:2:mref-1
yk=yk+y(k*href+a);
end
for j=1:2:mref
yj=yj+y(href*j+a);
end
Iref=(href/3)*(y(a)+y(b)+2*yk+4*yj);
%Simpson's method
iter = 1;
Ehmatrix = 0;
for n = 0:nmax
h = b/(2^n+1);
x = a:h:b;
xodd = x(2:2:end-1);
xeven = x(3:2:end);
yodd = y(xodd);
yeven = y(xeven);
Isimp = (h/3)*(y(x(1))+4*sum(yodd)+2*sum(yeven)+y(b));
E = abs(Isimp-Iref);
Ehmatrix([iter],1) = [E];
Ehmatrix([iter],2) = [h];
iter = iter + 1;
end
figure
loglog(Ehmatrix(:,2),Ehmatrix(:,1))
a and b are the integration limits and y is the integrand that we want to approximate.
Djamillah - your code looks fine, though the initialization of h is probably valid only for the case where a == 0, so you may want to change this line of code to
h = (b-a)/(2^n+1);
I wonder if x = a:h:b; will always be valid - sometimes b may be included in the list, and sometimes it might not be, depending upon h. You may want to reconsider and use linspace instead
x = linspace(a,b,2^n+1);
which will guarantee that x has 2^n+1 points distributed evenly in the interval [a,b]. h could then be initialized as
h = x(2)-x(1);
Also, when determining the even and odd indices, we need to ignore the last element of x for both even and odd. So instead of
xodd = x(2:2:end-1);
xeven = x(3:2:end);
do
xodd = x(2:2:end-1);
xeven = x(3:2:end-1);
Finally, rather than using a vector y (how is this set?), I might just use the function handle to the function that I'm integrating and replace the calculation above with
Isimp = h/3*(func(x(1)) + 4*sum(func(xodd)) + 2*sum(func(xeven)) + ...
func(x(end)));
Other than these tiny things (which are probably negligible), there is nothing in your algorithm to indicate a problem. It produced similar results to a version that I have.
As for the order of convergence, should it be O(n^4) or O(h^4)?
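For reference, a standard result (not specific to this code): for composite Simpson with n subintervals of width h = (b-a)/n, the error satisfies
|E| <= (b-a) * h^4 * max|f''''(x)| / 180,
so O(h^4) and O(n^-4) are the same statement since h = (b-a)/n, and the slope of the loglog plot of E against h should be 4.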
Taking into account Geoff's suggestions, and making a few other changes, it all works as expected.
%Reference solution with Simpson's method (the reference solution works well)
a=0;
b=1;
y = @(x) cos(x);
nmax=10;
%Simpson's method
Ehmatrix = [];
for n = 0:nmax
x = linspace(a,b,2^n+1);
h = x(2)-x(1);
xodd = x(2:2:end-1);
xeven = x(3:2:end-1);
yodd = y(xodd);
yeven = y(xeven);
Isimp = (h/3)*(y(x(1))+4*sum(yodd)+2*sum(yeven)+y(b));
E = abs(Isimp-integral(y,a,b));
Ehmatrix(n+1,:) = [E h];
end
loglog(Ehmatrix(:,2),Ehmatrix(:,1))
P=polyfit(log(Ehmatrix(:,2)),log(Ehmatrix(:,1)),1);
OrderofAccuracy=P(1)
You were getting O(h) accuracy because xeven = x(3:2:end) was wrong. Replacing it with xeven = x(3:2:end-1) fixes the code, and thus the accuracy.

Generating random numbers...Faster way?

Using Run & Time on my algorithm, I found that it is a bit slow at adding the standard deviation to integers. First of all, I created the large integer matrix:
NumeroCestelli = 5;
lower_bound = 0;
upper_bound = 250;
steps = 10;
Alfa = 0.123;
livello = [lower_bound:steps:upper_bound];
L = length(livello);
[PianoSperimentale] = combinator(L,NumeroCestelli,'c','r');
for i=1:L
PianoSperimentale(PianoSperimentale==i)=livello(i);
end
then I add standard deviation (sigma = alpha * mu) and error (of a weigher) like this:
%Standard Deviation
NumeroEsperimenti = size(PianoSperimentale,1);
PesoCestelli = randn(NumeroEsperimenti,NumeroCestelli)*Alfa;
PesoCestelli = PesoCestelli.*PianoSperimentale + PianoSperimentale;
random = randn(NumeroEsperimenti,NumeroCestelli);
PesoCestelli(PesoCestelli<0) = random(PesoCestelli<0).*(Alfa.*PianoSperimentale(PesoCestelli<0) + PianoSperimentale(PesoCestelli<0));
%Error
IncertezzaCella = 0.5*10^(-6);
Incertezza = randn(NumeroEsperimenti,NumeroCestelli)*IncertezzaCella;
PesoIncertezza = PesoCestelli.*Incertezza+PesoCestelli;
PesoIncertezza = (PesoIncertezza<0).*(-PesoIncertezza)+PesoIncertezza;
Is there a faster way?
There is not enough information for me to test it, but I bet that eliminating all the duplicate calculations that you do will lead to a speedup. I have tried to remove some of them:
PesoCestelli = randn(NumeroEsperimenti,NumeroCestelli)*Alfa;
PesoCestelli = (1+PesoCestelli).*PianoSperimentale;
random = randn(NumeroEsperimenti,NumeroCestelli);
idx = PesoCestelli<0;
PesoCestelli(idx) = random(idx).*(1+Alfa).*PianoSperimentale(idx);
%Error
IncertezzaCella = 0.5*10^(-6);
Incertezza = randn(NumeroEsperimenti,NumeroCestelli)*IncertezzaCella;
PesoIncertezza = abs((1+Incertezza).*PesoCestelli);
Note that I reduced the last two lines to a single line.
You calculate PesoCestelli<0 a number of times. You could just calculate it once and save the value. You also create a full set of random numbers, but only use a subset of them where PesoCestelli<0. You might be able to speed things up by only creating the number of random numbers you need, as in the sketch after these suggestions.
It is not clear what Alfa is, but if it is a scalar, instead of
Alfa.*PianoSperimentale(PesoCestelli<0) + PianoSperimentale(PesoCestelli<0)
it might be faster to do
(1+Alfa).*PianoSperimentale(PesoCestelli<0)
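Putting the two ideas together (a sketch; it assumes Alfa is a scalar and that only the negative entries need fresh draws):
idx = PesoCestelli < 0;                 % compute the mask once
PesoCestelli(idx) = randn(nnz(idx), 1).*(1+Alfa).*PianoSperimentale(idx);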