How can I simulate this question using MATLAB?
Out of 100 apples, 10 are rotten. We randomly choose 5 apples without
replacement. What is the probability that there is at least one
rotten?
The Expected Answer
0.4162476
My Attempt:
r=0
for i=1:10000
for c=1:5
a = randi(1,100);
if a < 11
r=r+1;
end
end
end
r/10000
but it didn't work, so what would be a better way of doing it?
Use randperm to choose randomly without replacement:
A = false(1, 100);
A(1:10) = true;
r = 0;
for k = 1:10000
a = randperm(100, 5);
r = r + any(A(a));
end
result = r/10000;
Short answer:
Your problem follow an hypergeometric distribution (similar to a binomial distribution but without replacement), if you have the necessary toolbox you can simply use the probability density function of the hypergeometric distribution:
r = 1-hygepdf(0,100,10,5) % r = 0.4162
Since P(x>=1) = P(x=1) + P(x=2) + P(x=3) + P(x=4) + P(x=5) = 1-P(x=0)
Of course, here I calculate the exact probability, this is not an experimental result.
To get further:
Noticed that if you do not have access to hygepdf, you can easily write the function yourself by using binomial coefficient:
N = 100; K = 10;
n = 5; k = 0;
r = 1-(nchoosek(K,k)*nchoosek(N-K,n-k))/nchoosek(N,n) % r = 0.4162
You can also use the binomial probability density function, it is a bit more tricky (but also more intuitive):
r = 1-prod(binopdf(0,1,10./(100-[0:4])))
Here we compute the probability to obtain 0 rotten apple five time in a row, the probabily increase at every step since we remove 1 good juicy apple each time. And then, according to the above explaination, we take 1-P(x=0).
There are a couple of issues with your code. First of all, implicitly in what you wrote, you replace the apple after you look at it. When you generate the random number, you need to eliminate the possibility of choosing that number again.
I've rewritten your code to include better practices:
clear
n_runs = 1000;
success = zeros(n_runs, 1);
failure = zeros(n_runs, 1);
approach = zeros(n_runs, 1);
for ii = 1:n_runs
apples = 1:100;
a = randperm(100, 5);
if any(a < 11)
success(ii) = 1;
elseif a >= 11
failure(ii) = 1;
end
approach(ii) = sum(success)/(sum(success)+sum(failure));
end
figure; hold on
plot(approach)
title("r = "+ approach(end))
hold off
The results are stored in an array (called approach), rather than a single number being updated every time, which means you can see how quickly you approach the end value of r.
Another good habit is including clear at the beginning of any script, which reduces the possibility of an error occurring due to variables stored in the workspace.
Related
I am trying to use the Metropolis Hastings algorithm with a random walk sampler to simulate samples from a function $$ in matlab, but something is wrong with my code. The proposal density is the uniform PDF on the ellipse 2s^2 + 3t^2 ≤ 1/4. Can I use the acceptance rejection method to sample from the proposal density?
N=5000;
alpha = #(x1,x2,y1,y2) (min(1,f(y1,y2)/f(x1,x2)));
X = zeros(2,N);
accept = false;
n = 0;
while n < 5000
accept = false;
while ~accept
s = 1-rand*(2);
t = 1-rand*(2);
val = 2*s^2 + 3*t^2;
% check acceptance
accept = val <= 1/4;
end
% and then draw uniformly distributed points checking that u< alpha?
u = rand();
c = u < alpha(X(1,i-1),X(2,i-1),X(1,i-1)+s,X(2,i-1)+t);
X(1,i) = c*s + X(1,i-1);
X(2,i) = c*t + X(2,i-1);
n = n+1;
end
figure;
plot(X(1,:), X(2,:), 'r+');
You may just want to use the native implementation of matlab mhsample.
Regarding your code, there are a few things missing:
- function alpha,
- loop variable i (it might be just n but it is not suited for indexing since it starts at zero).
And you should always allocate memory in matlab if you want to fill it dynamically, i.e. X in your case.
To expand on the suggestions by #max, the code appears to work if you change the i indices to n and replace
n = 0;
with
n = 2;
X(:,1) = [.1,.1];
It would probably be better to assign X(:,1) to random values within your accept region (using the same code you use later), and/or include a burn-in period.
Depending upon what you are going to do with this, it may also make things cleaner to evaluate the argument to sin in the f function to keep it within 0 to 2 pi (likely by shifting the value by 2 pi if it exceeds those bounds)
I have an SDE I'm approximating by a numerical scheme using this code:
mu = 1.5;
Sigma = 0.5;
TimeStep = 0.001;
Time = 0:TimeStep:5;
random1 = normrnd(2,0.05,[1,500]);
random2 = randn(1,length(Time));
X = zeros(500, length(Time));
for j = 1:500
X(j,1)= random1(j);
for i = 1:length(Time)
X(j,i+1) = X(j,i) - mu*X(j,i)*TimeStep + Sigma*sqrt(2*mu)*sqrt(TimeStep)*random2(i);
end
end
How would it be possible to remove the outer for-loop and vectorize, so that at each time step, the first value is calculated for all 500 plots?
That's pretty simple, especially since j is only used for row indexing here:
X(:,1)= random1;
for i = 1:length(Time)
X(:,i+1) = X(:,i) - mu*X(:,i)*TimeStep + Sigma*sqrt(2*mu)*sqrt(TimeStep)*random2(i);
end
I tested both versions (Octave 5.1.0), and get identical results. Speed-up is about 400x on my machine.
General remark: Never use i and/or j as loop variables, since they are also used as imaginary units, cf. i and j.
Hope that helps!
Well basically the question says it all, my intuition tells me that a call to minmax should take less time than calling a min and then a max.
Is there some optimization I prevent Matlab carrying out in the following code?
minmax:
function minmax_vals = minmaxtest()
buffSize = 1000000;
A = rand(128,buffSize);
windowSize = 100;
minmax_vals = zeros(128,buffSize/windowSize*2);
for i=1:(buffSize/windowSize)
minmax_vals(:,(2*i-1):(2*i)) = minmax(A(:,((i-1)*windowSize+1):(i*windowSize)));
end
end
separate min-max:
function minmax_vals = minmaxtest()
buffSize = 1000000;
A = rand(128,buffSize);
windowSize = 100;
minmax_vals = zeros(128,buffSize/windowSize*2);
for i=1:(buffSize/windowSize)
minmax_vals(:,(2*i-1)) = min(A(:,((i-1)*windowSize+1):(i*windowSize)),[],2);
minmax_vals(:,(2*i)) = max(A(:,((i-1)*windowSize+1):(i*windowSize)),[],2);
end
end
Summary
You can see the overhead because minmax isn't completely obfuscated. Simply type
edit minmax
And you will see the function!
It appears that there is a data-type conversion to nntype.data('format',x,'Data');, which will not be the case for min or max and could be costly. This is for use with MATLAB's neural networking (nn) tools as minmax belongs to that toolbox.
In short, min and max are lower-level, compiled functions (hence they are fully obfuscated), which don't require functionality from the nn toolbox.
Benchmark
Here is a slightly more isolated benchmark, without your windowing and using timeit instead of the profiler. I've also included timings for just the data conversion used in minmax! The test gets the min and max of each row in a large matrix, see here the output plot and code below...
It appears that there is a linear relationship between number of rows and time taken (as expected for a linear operator), but the coefficient is much greater for the combined minmax relationship, with the separate operations being approximately 10x quicker. Also you can clearly see that data conversion takes more time that the min then max version alone!
function benchie()
K = zeros(10, 3);
for k = 1:10
n = 2^k;
A = rand(n, 200);
Arow = zeros(1,200);
m = zeros(n,2);
f1 = #()minmaxtest(A,m);
K(k,1) = timeit(f1);
f2 = #()minthenmaxtest(A,m);
K(k,2) = timeit(f2);
f3 = #()dataconversiontest(A, Arow);
K(k,3) = timeit(f3);
end
figure; hold on; plot(2.^(1:10), K(:,1)); plot(2.^(1:10), K(:,2)); plot(2.^(1:10), K(:,3));
end
function minmaxtest(A,m)
for ii = 1:size(A,1)
m(ii, 1:2) = minmax(A(ii,:));
end
end
function dataconversiontest(A, Arow)
for ii = 1:size(A,1)
Arow = nntype.data('format', A(ii,:), 'Data');;
end
end
function minthenmaxtest(A,m)
for ii = 1:size(A,1)
m(ii, 1) = min(A(ii,:));
m(ii, 2) = max(A(ii,:));
end
end
This question is from chegg.com.
Given a vector a of N elements a_{n},n =1,2,...,N, the simple moving average of m sequential elements of this vector is defined as
mu(j) = mu(j-1) + (a(m+j-1)-a(j-1))/m for j = 2,3,...,(N-m+1)
where
mu(1) = sum(a(k))/m for k = 1,2,...,m
Write a script that computes these moving averages when a is given by a=5*(1+rand(N,1)), where rand generates uniformly distributed random numbers. Assume that N=100 and m=6. Plot the results using plot(j,mu(j)) for j=1,2,...,N-m+1.
My current code is below, but I'm not sure where to go from here or if it's even right.
close all
clear all
clc
N = 100;
m = 6;
a = 5*(1+rand(N,1));
mu = zeros(N-m+1,1);
mu(1) = sum(a(1:m));
for j=2
mu(j) = mu(j-1) + (a-a)/m
end
plot(1:N-m+1,mu)
I'll step you through the modifications.
Firstly, mu(1) was not fully defined. The equation given is slightly incorrect, but this is what it should be:
mu(1) = sum(a(1:m))/m;
then the for loop has to go from j=2 to j=N-m+1
for j=2:N-m+1
and at each step, mu(j) is given by this formula, the same as given in the question
mu(j) = mu(j-1) + (a(m+j-1)-a(j-1))/m
And that's all you need to change!
Because for combinations of large numbers at times matlab replies NaN, the assignment is to write a program to compute combinations of 200 objects taken 90 at a time. Once this works we are to make it into a function y = comb(n,k).
This is what I have so far based on an example we were given of the probability that 2 people in a class have the same birthday.
This is the example:
nMax = 70; %maximum number of people in classroom
nArray = 1:nMax;
prevPnot = 1; %initialize probability
for iN = 1:nMax
Pnot = prevPnot*(365-iN+1)/365; %probability that no birthdays are the same
P(iN) = 1-Pnot; %probability that at least two birthdays are the same
prevPnot = Pnot;
end
plot(nArray, P, '.-')
xlabel('nb. of people')
ylabel('prob. that at least two have same birthday')
grid on
At this point I'm having trouble because I'm more familiar with java. This is what I have so far, and it isn't coming out at all.
k = 90;
n = 200;
nArray = 1:k;
prevPnot = 1;
for counter = 1:k
Pnot = (n-counter+1)/(prevPnot*(n-k-counter+1);
P(iN) = Pnot;
prevPnot = Pnot;
end
The point of the loop I wrote is to separate out each term
i.e. n/k*(n-k), times (n-counter)/(k-counter)*(n-k-counter), and so forth.
I'm also not entirely sure how to save a loop as a function in matlab.
To compute the number of combinations of n objects taken k at a time, you can use gammaln to compute the logarithm of the factorials in order to avoid overflow:
result = exp(gammaln(n+1)-gammaln(k+1)-gammaln(n-k+1));
Another approach is to remove terms that will cancel and then compute the result:
result = prod((n-k+1:n)./(1:k));