Matlab Coin Toss Simulation - matlab

I have to write some code in Matlab that simulates tossing a coin 150 times. I have to count how many times the coin lands on heads and create a vector that gives a running percentage of the heads.
Then I have to make a table of the number of trials, random "flips", and the running percentages of heads. I assume random "flips" means heads or tails for that trial.
I also have to create a line graph with trials on the x-axis and probabilities (percentages) on the y-axis. I'm assuming the percentages are just the percentage of getting heads.
Sorry if this post was long. I figure giving the details now will make it easier to see what I was trying to do with the code. I didn't create the table or plot yet because I'm not even sure how to code for the actual problem.
NUM_TRIALS = 150;
trials = 1:NUM_TRIALS;
heads = 0;
t = rand(NUM_TRIALS,1);
percent_h = zeros(size(t));
for i = trials
if (t(i) < 0.5)
heads = heads + 1;
percent_h = heads./trials;
end
end
flips = t;
disp('Number of Trials, Random flips, Heads Percentage')
disp([trials', flips, percent_h'])
plot(trials,percent_h)
title('Trial Number vs. Percent Heads')
xlabel('Trial number')
ylabel('Percent Heads')

Your code is actually pretty close to answering your question, but there are a few issues that I see.
You should index t by the current trial number.
Likewise, percent_h should be indexed accordingly. This should be pre-allocated as well.
Not sure what z is supposed to represent...
To make the plot, just use plot. xlabel will give a label to the x axis, ylabel to the y axis. title will give a name to the plot.
You should divide by i, not trials.
So, your code should look something like this. There's a fair number of ways to simplify it, but I'll preserve your code as much as possible.
NUM_TRIALS = 150;
trials = 1:NUM_TRIALS;
heads = 0;
t = rand(NUM_TRIALS,1);
percent_h=zeros(size(t));
for i = trials
if (t(i) < 0.5)
heads = heads + 1;
end
percent_h(i) = heads/i;
end
plot(trials,percent_h)
xlabel('Trial Number')
ylabel('Percent Heads')
title ('Trial Number vs Percent Heads')

You can actually solve this more simply by taking advantage of a few other MATLAB functions, as hinted at by @PearsonArtPhoto. First, you can use RANDI to generate the coin tosses, with a one for each head and a zero for each tail. Then, you can use CUMSUM to get the cumulative number of heads. Dividing this element-wise by 1:n gives you the cumulative fraction of heads.
n=150;
ishead = randi([0,1],1,n);
plot(cumsum(ishead)./(1:n));
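The original question also asks for a table of trial number, flip, and running percentage. A minimal sketch of one way to assemble that from the vectorised result, assuming a MATLAB release that has the table type (R2013b or later). The variable and column names are illustrative and do not come from the posts above:

```matlab
n = 150;                                   % number of tosses
ishead = randi([0, 1], n, 1);              % 1 = heads, 0 = tails (column vector)
trial = (1:n)';
pctHeads = 100 * cumsum(ishead) ./ trial;  % running percentage of heads

% Column names below are illustrative choices
results = table(trial, ishead, pctHeads, ...
    'VariableNames', {'Trial', 'IsHead', 'PercentHeads'});
disp(results(1:5, :))                      % show the first few rows

plot(results.Trial, results.PercentHeads)
xlabel('Trial number')
ylabel('Percent heads')
```

Multiplying by 100 gives a true percentage; drop the factor if you prefer the fraction of heads, as in the plot above.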

Related

Finding the longest linear section of non-linear plot in MATLAB

Apologies for the long post, but this takes a bit to explain. I'm trying to make a script that finds the longest linear portion of a plot. Sample data is in a csv file here; it is stress and strain data for calculating the shear modulus of 3D printed samples. The code I have so far is the following:
x_data = [];
y_data = [];
x_data = Data(:,1);
y_data = Data(:,2);
plot(x_data,y_data);
grid on;
answer1 = questdlg('Would you like to load last attempt''s numbers?');
switch answer1
case 'Yes'
[sim_slopes,reg_data] = regr_and_longest_part(new_x_data,new_y_data,str2num(answer2{3}),str2num(answer2{2}),K);
case 'No'
disp('Take a look at the plot, find a range estimate, and press any button to continue');
pause;
prompt = {'Eliminate values ABOVE this x-value:','Eliminate values BELOW this x-value:','Size of divisions on x-axis:','Factor for similarity of slopes:'};
dlg_title = 'Point elimination';
num_lines = 1;
defaultans = {'0','0','0','0.1'};
if isempty(answer2) < 1
defaultans = {answer2{1},answer2{2},answer2{3},answer2{4}};
end
answer2 = inputdlg(prompt,dlg_title,num_lines,defaultans);
uv_of_x_range = str2num(answer2{1});
lv_of_x_range = str2num(answer2{2});
x_div_size = str2num(answer2{3});
K = str2num(answer2{4});
close all;
iB = find(x_data > str2num(answer2{1}),1,'first');
iS = find(x_data > str2num(answer2{2}),1,'first');
new_x_data = x_data(iS:iB);
new_y_data = y_data(iS:iB);
[sim_slopes, reg_data] = regr_and_longest_part(new_x_data,new_y_data,str2num(answer2{3}),str2num(answer2{2}),K);
end
[longest_section0, Midx]= max(sim_slopes(:,4)-sim_slopes(:,3));
longest_section=1+longest_section0;
long_sec_x_data_start = x_div_size*(sim_slopes(Midx,3)-1)+lv_of_x_range;
long_sec_x_data_end = x_div_size*(sim_slopes(Midx,4)-1)+lv_of_x_range;
long_sec_x_data_start_idx=find(new_x_data >= long_sec_x_data_start,1,'first');
long_sec_x_data_end_idx=find(new_x_data >= long_sec_x_data_end,1,'first');
long_sec_x_data = new_x_data(long_sec_x_data_start_idx:long_sec_x_data_end_idx);
long_sec_y_data = new_y_data(long_sec_x_data_start_idx:long_sec_x_data_end_idx);
[b_long_sec, longes_section_reg_data] = robustfit(long_sec_x_data,long_sec_y_data);
plot(long_sec_x_data,b_long_sec(1)+b_long_sec(2)*long_sec_x_data,'LineWidth',3,'LineStyle',':','Color','k');
function [sim_slopes,reg_data] = regr_and_longest_part(x_points,y_points,x_div,lv,K)
reg_data = cell(1,3);
scatter(x_points,y_points,'.');
grid on;
hold on;
uv = lv+x_div;
ii=0;
while lv <= x_points(end)
if uv > x_points(end)
uv = x_points(end);
end
ii=ii+1;
indices = find(x_points>lv & x_points<uv);
temp_x_points = x_points((indices));
temp_y_points = y_points((indices));
if length(temp_x_points) <= 2
break;
end
[b,stats] = robustfit(temp_x_points,temp_y_points);
reg_data{ii,1} = b(1);
reg_data{ii,2} = b(2);
reg_data{ii,3} = length(indices);
plot(temp_x_points,b(1)+b(2)*temp_x_points,'LineWidth',2);
lv = lv+x_div;
uv = lv+x_div;
end
sim_slopes = NaN(length(reg_data),4);
sim_slopes(1,:) = [reg_data{1,1},0,1,1];
idx=1;
for ii=2:length(reg_data)
coff =sim_slopes(idx,1);
if abs(reg_data{ii,1}-coff) <= K*coff
C=zeros(ii-sim_slopes(idx,3)+1,1);
for kk=sim_slopes(idx,3):ii
C(kk)=reg_data{kk,1};
end
sim_slopes(idx,1)=mean(C);
sim_slopes(idx,2)=std(C);
sim_slopes(idx,4)=ii;
else
idx = idx + 1;
sim_slopes(idx,1)=reg_data{ii,1};
sim_slopes(idx,2)=0;
sim_slopes(idx,3)=ii;
sim_slopes(idx,4)=ii;
end
end
end
Apologies for the code not being well optimized; I'm still relatively new to MATLAB. I did not use derivatives because my data is relatively noisy and differentiation might have made it worse.
I've managed to get the code to find the longest straight part of the plot by splitting the data up into sections of width x_div_size and then performing a robustfit on each section, the results of which are written into reg_data. The code then runs through reg_data and finds which lines have the most similar slopes, as determined by the K factor, by calculating the average of the slopes in a section of the plot, and makes a note of it in sim_slopes. It then finds the longest interval with max(sim_slopes(:,4)-sim_slopes(:,3)) and performs a regression on it to give the final answer.
The problem is that it will only consider the first straight portion that it comes across. When the data is plotted, it has a few parts where it seems straightest:
As an example, when I run the script with answer2 = {'0.2','0','0.0038','0.3'} I get the following, where the black line is the straightest part found by the code:
I have the following questions:
It's clear that from about x = 0.04 to x = 0.2 there is a long straight part and I'm not sure why the script is not finding it. Playing around with different values the script always seems to pick the first longest straight part, ignoring subsequent ones.
MATLAB complains that Warning: Iteration limit reached. because there are more than 50 regressions to perform. Is there a way to bypass this limit on robustfit?
When generating sim_slopes there might be section of the plot whose slope is too different from the average of the previous slopes so it gets marked as the end of a long section. But that section sometimes is sandwiched between several other sections on either side which instead have similar slopes. How would it be possible to tell the script to ignore one wayward section and to continue as if it falls within the tolerance allowed by the K value?
Take a look at the Douglas-Peucker algorithm. If you think of your (x,y) values as the vertices of an (open) polygon, this algorithm will simplify it for you, such that the largest distance from the simplified polygon to the original is smaller than some threshold you can choose. The simplified polygon will be the set of straight lines. Find the two vertices that are furthest apart, and you're done.
MATLAB has an implementation in the Mapping Toolbox called reducem. You might also find an implementation on the File Exchange (but be careful, there is also really bad code on there). Or, you can roll your own, it's quite a simple algorithm.
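For reference, here is a compact sketch of the Douglas-Peucker idea, written iteratively with an explicit stack so it runs as a plain script. The piecewise-linear test data, the tolerance tol, and all variable names are made up for illustration. With real stress-strain data you would probably want to rescale x and y to comparable ranges first, since the algorithm measures plain Euclidean distance, and noisy data will call for a larger tolerance.

```matlab
% Synthetic piecewise-linear data with breakpoints at x = 0.2 and x = 0.9
x = linspace(0, 1, 101)';
y = interp1([0 0.2 0.9 1], [0 0.2 0.34 0.64], x);
tol = 1e-3;                          % maximum allowed deviation (assumed value)

keep = false(numel(x), 1);
keep([1 end]) = true;                % the end points are always retained
stack = [1 numel(x)];                % index ranges still to be examined
while ~isempty(stack)
    i1 = stack(end, 1);
    i2 = stack(end, 2);
    stack(end, :) = [];
    if i2 - i1 < 2, continue; end    % no interior points left
    idx = (i1+1:i2-1)';
    % Perpendicular distance of the interior points from the chord i1 -> i2
    dx = x(i2) - x(i1);
    dy = y(i2) - y(i1);
    d = abs(dy*(x(idx) - x(i1)) - dx*(y(idx) - y(i1))) / hypot(dx, dy);
    [dmax, k] = max(d);
    if dmax > tol                    % chord is not good enough: split here
        m = idx(k);
        keep(m) = true;
        stack = [stack; i1 m; m i2];
    end
end

v = find(keep);                               % vertices of the simplified polyline
segLen = hypot(diff(x(v)), diff(y(v)));       % length of each straight section
[~, s] = max(segLen);
fprintf('Longest straight section: x = %.2f to %.2f\n', x(v(s)), x(v(s+1)));
```

On this synthetic data the retained vertices are at x = 0, 0.2, 0.9 and 1, and the middle section is reported as the longest.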
You can also try using the ischange function to detect changes in the intercept and slope of the data, and then extract the longest portion from that.
Using the sample data you provided, here is what I see from a basic attempt:
>> T = readtable('Data.csv');
>> T = rmmissing(T); % Remove rows with NaN
>> T = groupsummary(T,'Var1','mean'); % Average duplicate timestamps
>> [tf,slopes,intercepts] = ischange(T.mean_Var2, 'linear', 'SamplePoints', T.Var1); % find changes
>> plot(T.Var1, T.mean_Var2, T.Var1, slopes.*T.Var1 + intercepts)
which generates the plot
You should be able to extract the longest segment based on the indices given by find(tf).
You can also tune the parameters of ischange to get fewer or more segments. Adding the name-value pair 'MaxNumChanges' with a value of 4 or 5 produces more linear segments with a tighter fit to the curve, for example, which effectively removes the kink in the plot that you see.
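One way to turn tf into the longest segment, sketched under the assumption that a true entry marks the first sample of a new segment. The tf vector here is a small hand-made stand-in so the snippet runs without the data file; in practice it would be the first output of ischange, and you may prefer to measure length in x units (via T.Var1) rather than in samples.

```matlab
tf = false(12, 1);
tf([4 10]) = true;                         % stand-in: changes flagged at samples 4 and 10

starts = unique([1; find(tf)]);            % first sample of each segment
stops  = [starts(2:end) - 1; numel(tf)];   % last sample of each segment
[nSamples, k] = max(stops - starts + 1);   % segment with the most samples
longestIdx = starts(k):stops(k);           % here: samples 4 to 9
```

The slope and intercept of that segment can then be read from the other two ischange outputs at any index in longestIdx.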

Animated plot of infectious disease spread with for loop (Matlab)

I'm a beginner in Matlab and I'm trying to model the spread of an infectious disease using Matlab. However, I encounter some problems.
At first, I define the matrices that need to be filled and their initial status:
diseasematrix=zeros(20,20);
inirow=10;
inicol=10;
diseasematrix(inirow,inicol)=1; % The first place where a sick person is
infectionmatrix=zeros(20,20); % Infected people, initially all 0
healthymatrix=round(rand(20,20)*100); % Initial healthy population (randomly)
Rate=0.0001; % Rate of spread
Now, I want to make a plot where the spread of the disease is shown, using a for loop. But I'm stuck here...
for t=1:365
Zneighboursum=zeros(size(diseasematrix));
out_ZT = calc_ZT(Zneighboursum, diseasematrix);
infectionmatrix(t) = round((Rate).*(out_ZT));
diseasematrix(t) = diseasematrix(t-1) + infectionmatrix(t-1);
healthymatrix(t) = healthymatrix(t-1) - infectionmatrix(t-1);
imagesc(diseasematrix(t));
title(sprintf('Day %i',t));
drawnow;
end
This basically says that the infectionmatrix is calculated from the formula in the loop, and the diseasematrix is calculated by adding the infected people of the previous time step to the sick people of the previous time step. The healthy people that remain are calculated by subtracting the infected people from the healthy people of the previous time step. The variable out_ZT is the output of a function I made:
function [ZT] = calc_ZT(Zneighboursum, diseasematrix)
Zneighboursum = Zneighboursum + circshift(diseasematrix,[1 0]);
Zneighboursum = Zneighboursum + circshift(diseasematrix,[0 1]);
ZT=Zneighboursum;
end
This is to quantify the number of sick people around a central cell.
However, the result is not what I want. The plot does not evolve dynamically and the values don't seem to be right. Can anyone help me?
Thanks in advance!
There are several problems with the code:
(Rate).*(out_ZT) is not itself an error, because .* accepts a scalar and a matrix. A single * is the more usual way to scale a matrix by a scalar, so I use that below.
infectionmatrix, diseasematrix and healthymatrix are all 2-dimensional matrices, so indexing them with a single subscript such as (t) addresses one element rather than the state at time t. To keep every time step in memory you would need a 3-dimensional array. But since you don't use the stored values later, you can just overwrite the old matrix.
You store integers in infectionmatrix, because you calculate it with round(). With such a small Rate, that always sets the result to zero.
The value for Rate was too low to see any result, so I increased it to 0.01.
(Just a cautionary point) you haven't used healthymatrix anywhere else in your code.
The code for the function is fine, so after debugging according to what I perceived, here's the code:
diseasematrix=zeros(20,20);
inirow=10;
inicol=10;
diseasematrix(inirow,inicol)=1; % The first place where a sick person is
infectionmatrix=zeros(20,20); % Infected people, initially all 0
healthymatrix=round(rand(20,20)*100); % Initial healthy population (randomly)
Rate=0.01;
for t=1:365
Zneighboursum=zeros(size(diseasematrix));
out_ZT = calc_ZT(Zneighboursum, diseasematrix);
infectionmatrix = (Rate*out_ZT);
diseasematrix = diseasematrix + infectionmatrix;
healthymatrix = healthymatrix - infectionmatrix;
imagesc(diseasematrix);
title(sprintf('Day %i',t));
drawnow;
end
There are several problems:
1) If you want to keep every time step, you need a 3D array,
so you have to replace myvariable(t) with myvariable(:,:,t).
2) Why did you use round? Rounding a value below 0.5 gives 0, so nothing will change in your loop.
3) You need to define the initial condition (t=1) and then start your loop at t = 2.
diseasematrix=zeros(20,20);
inirow=10;
inicol=10;
diseasematrix(inirow,inicol)=1; % The first place where a sick person is
infectionmatrix =zeros(20,20); % Infected people, initially all 0
healthymatrix=round(rand(20,20)*100); % Initial healthy population (randomly)
Rate=0.01; % Rate of spread
for t=2:365
Zneighboursum=zeros(size(diseasematrix,1),size(diseasematrix,2));
out_ZT = calc_ZT(Zneighboursum, diseasematrix(:,:,t-1));
infectionmatrix(:,:,t) = (Rate).*(out_ZT);
diseasematrix(:,:,t) = diseasematrix(:,:,t-1) + infectionmatrix(:,:,t-1);
healthymatrix(:,:,t) = healthymatrix(:,:,t-1) - infectionmatrix(:,:,t-1);
imagesc(diseasematrix(:,:,t));
title(sprintf('Day %i',t));
drawnow;
end
IMPORTANT: circshift wraps your matrix around, so cells on one edge treat cells on the opposite edge as neighbours. Keep this in mind when dealing with the boundary effect.
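To make the boundary remark concrete: circshift wraps values around the edges, so a sick cell in one corner is seen as a neighbour by cells on the opposite side. The sketch below compares that with a zero-padded neighbour count from conv2. The 4-neighbour kernel and the variable names are assumptions for illustration; note that calc_ZT in the question only adds two of the four shifts.

```matlab
A = zeros(4);
A(1, 1) = 1;                          % one sick person in the top-left corner

% Periodic boundaries: four circular shifts, values wrap around the edges
wrapSum = circshift(A, [1 0]) + circshift(A, [-1 0]) ...
        + circshift(A, [0 1]) + circshift(A, [0 -1]);

% Closed boundaries: conv2 pads with zeros, so nothing wraps
kernel = [0 1 0; 1 0 1; 0 1 0];
noWrapSum = conv2(A, kernel, 'same');

disp(wrapSum)     % nonzero at (2,1), (4,1), (1,2) and (1,4)
disp(noWrapSum)   % nonzero only at (2,1) and (1,2)
```

Which behaviour is appropriate depends on whether the grid is meant to represent a closed region or a periodic one.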

Remove noise from a rectangular wave matlab

I have some recordings (from 16:00 to 16:00 the next day) where ones indicate some kind of noise and zeros indicate quiet moments. The following code tries to replicate these recordings.
dt = datenum('00:02:00','HH:MM:ss') - datenum('00:01:00','HH:MM:ss');
time_begin = datenum('00:00:00','HH:MM:ss');
time_end = datenum('24:00:00','HH:MM:ss');
time = repmat(cellstr(datestr(time_begin:dt:time_end,'HH:MM:ss')),2,1);
loudness = ones(1,numel(time));
quiet_start = [1489 1737];
quiet_end = [1603 1906];
for i = 1: numel(quiet_start)
loudness(quiet_start(i):quiet_end(i))=0;
end
time = time(961:2400);
loudness = loudness(961:2400);
figure
plot(loudness)
ylim([0 3])
I know that in the interval from 16:00 to 16:00 there should be only 1 "bout" of zeros. Here (if you plot loudness) you can see that there are 2 bouts of zeros.
I have 2 possibilities:
remove one of the two bouts of zeros
remove the bout of ones in the middle
Is there any measure that I can use to make this decision? I.e., do I make a bigger error converting ones to zeros or vice versa?
There are 2 (or more) bouts of zeros because of some errors in the recordings, but for sure there should be only one. I would like to remove the bouts so as to modify the system as little as possible. For instance, in this case I would remove the first bout of zeros since it is the smallest, but what should I do if there are more than 2 bouts? Is there any algorithm that deals with this kind of problem?

A moving average with different functions and varying time-frames

I have a matrix of time-series data for 8 variables with about 2500 points (~10 years of Mon-Fri) and would like to calculate the mean, variance, skewness and kurtosis on a 'moving average' basis.
Let's say frames = [100 252 504 756]. I would like to calculate the four functions above over each of the (time-)frames, on a daily basis, so the return for day 300 in the case with the 100-day frame would be [mean variance skewness kurtosis] from the period day 201 to day 300 (100 days in total), and so on.
I know this means I would get an array output, and that the first frame-length of days would be NaNs, but I can't figure out the required indexing to get this done...
This is an interesting question because I think the optimal solution is different for the mean than it is for the other sample statistics.
I've provided a simulation example below that you can work through.
First, choose some arbitrary parameters and simulate some data:
%#Set some arbitrary parameters
T = 100; N = 5;
WindowLength = 10;
%#Simulate some data
X = randn(T, N);
For the mean, use filter to obtain a moving average:
MeanMA = filter(ones(1, WindowLength) / WindowLength, 1, X);
MeanMA(1:WindowLength-1, :) = nan;
I had originally thought to solve this problem using conv as follows:
MeanMA = nan(T, N);
for n = 1:N
MeanMA(WindowLength:T, n) = conv(X(:, n), ones(WindowLength, 1), 'valid');
end
MeanMA = (1/WindowLength) * MeanMA;
But as @PhilGoddard pointed out in the comments, the filter approach avoids the need for the loop.
Also note that I've chosen to make the dates in the output matrix correspond to the dates in X so in later work you can use the same subscripts for both. Thus, the first WindowLength-1 observations in MeanMA will be nan.
For the variance, I can't see how to use either filter or conv or even a running sum to make things more efficient, so instead I perform the calculation manually at each iteration:
VarianceMA = nan(T, N);
for t = WindowLength:T
VarianceMA(t, :) = var(X(t-WindowLength+1:t, :));
end
We could speed things up slightly by exploiting the fact that we have already calculated the mean moving average. Simply replace the within loop line in the above with:
VarianceMA(t, :) = (1/(WindowLength-1)) * sum((bsxfun(@minus, X(t-WindowLength+1:t, :), MeanMA(t, :))).^2);
However, I doubt this will make much difference.
If anyone else can see a clever way to use filter or conv to get the moving window variance I'd be very interested to see it.
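One possible answer to that open question, offered as a sketch rather than a tested recommendation: the sample variance over a window of n points can be written as (sum(x.^2) - sum(x)^2/n)/(n-1), and both rolling sums can be obtained with filter. This form can lose precision when the mean is large relative to the spread. The name VarianceFilt is introduced here for illustration, and the loop version is included only to check the result. Releases from R2016a onward also provide movmean and movvar for this purpose.

```matlab
T = 100; N = 5;
WindowLength = 10;
X = randn(T, N);

kernel = ones(1, WindowLength);
S1 = filter(kernel, 1, X);         % rolling sum of x over the window
S2 = filter(kernel, 1, X.^2);      % rolling sum of x.^2 over the window
VarianceFilt = (S2 - S1.^2 / WindowLength) / (WindowLength - 1);
VarianceFilt(1:WindowLength-1, :) = nan;   % incomplete windows

% Reference: explicit loop over windows, as in the answer above
VarianceLoop = nan(T, N);
for t = WindowLength:T
    VarianceLoop(t, :) = var(X(t-WindowLength+1:t, :));
end
maxDiff = max(abs(VarianceFilt(:) - VarianceLoop(:)));   % max skips the NaNs
fprintf('Largest difference between the two versions: %g\n', maxDiff);
```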
I leave the case of skewness and kurtosis to the OP, since they are essentially just the same as the variance example, but with the appropriate function.
A final point: if you were converting the above into a general function, you could pass in an anonymous function as one of the arguments, then you would have a moving average routine that works for arbitrary choice of transformations.
Final, final point: For a sequence of window lengths, simply loop over the entire code block for each window length.
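A rough sketch of those last two points together, using a function handle for the statistic and an outer loop over window lengths. The names statFun, frames and out are illustrative; skewness and kurtosis (Statistics Toolbox) could be passed in the same way as var is here.

```matlab
T = 100; N = 5;
X = randn(T, N);

frames  = [5 10 20];          % window lengths to evaluate (illustrative values)
statFun = @(W) var(W);        % any function that reduces each column of W to one value

% out(t, n, f) holds the statistic for variable n over the frames(f) days ending at t
out = nan(T, N, numel(frames));
for f = 1:numel(frames)
    w = frames(f);
    for t = w:T
        out(t, :, f) = statFun(X(t-w+1:t, :));
    end
end
```

The first w-1 rows of each page stay NaN, so the row index of out lines up with the row index of X.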
I have managed to produce a solution, which only uses basic functions within MATLAB and can also be expanded to include other functions, (for finance: e.g. a moving Sharpe Ratio, or a moving Sortino Ratio). The code below shows this and contains hopefully sufficient commentary.
I am using a time series of Hedge Fund data, with ca. 10 years worth of daily returns (which were checked to be stationary - not shown in the code). Unfortunately I haven't got the corresponding dates in the example so the x-axis in the plots would be 'no. of days'.
% start by importing the data you need - here it is a selection out of an
% excel spreadsheet
returnsHF = xlsread('HFRXIndices_Final.xlsx','EquityHedgeMarketNeutral','D1:D2742');
% two years to be used for the moving average. (250 business days in one year)
window = 500;
% create zero-vectors to fill with the MA values at each point in time
% (one entry per window position, i.e. length(returnsHF)-window+1 entries).
mean_avg = zeros(length(returnsHF)-window+1,1);
st_dev = zeros(length(returnsHF)-window+1,1);
skew = zeros(length(returnsHF)-window+1,1);
kurt = zeros(length(returnsHF)-window+1,1);
% Now work through the time-series with each of the functions (one can add
% any other functions required), assigning the values to the zero-vectors
for count = window:length(returnsHF)
% This is the most tricky part of the script, the indexing in this section
% The TwoYearReturn is what is shifted along one period at a time with the
% for-loop.
TwoYearReturn = returnsHF(count-window+1:count);
mean_avg(count-window+1) = mean(TwoYearReturn);
st_dev(count-window+1) = std(TwoYearReturn);
skew(count-window+1) = skewness(TwoYearReturn);
kurt(count-window +1) = kurtosis(TwoYearReturn);
end
% Plot the MAs
subplot(4,1,1), plot(mean_avg)
title('2yr mean')
subplot(4,1,2), plot(st_dev)
title('2yr stdv')
subplot(4,1,3), plot(skew)
title('2yr skewness')
subplot(4,1,4), plot(kurt)
title('2yr kurtosis')

Summing up Dice in MATLAB

My function called RollDice simulates the rolling of a given number of six sided dice a given number of times. The function has two input arguments, the number of dice (NumDice) that will be rolled in each experiment and the total number (NumRolls) of times that the dice will be rolled. The output of the function will be a vector SumDice of length NumRolls that contains the sum of the dice values in each experiment.
This is my code right now: how do I account for the SUM of the dice? Thanks!
function SumDice= RollDice(NumDice,NumRolls)
FACES= 6;
maxOut= FACES*NumDice;
count= zeros(1,maxOut);
for i = 1:NumRolls
outcome= 0;
for k= 1:NumDice
outcome= outcome + ceil(rand(1)*FACES);
end
count(outcome)= count(outcome) + 1;
end
bar(NumDice:maxOut, count(NumDice:length(count)));
message= sprintf('%d rolls of %d fair dice', NumRolls, NumDice);
title(message);
xlabel('sum of dice values'); ylabel('Count');
This is a simple and neat little problem (+1) and I enjoyed looking into it :-)
There were quite a few areas where I felt I could improve on your function. Rather than going through them one by one, I thought I'd just re-write the function how I'd do it, and we could go from there. I've written it as a script, but it can be turned into a function easily enough. Finally, I also generalized it a bit by allowing for the dice to have any number of faces (6 or otherwise). So, here it is:
%#Define the parameters
NumDice = 2;
NumFace = 6;
NumRoll = 50;
%#Generate the rolls and obtain the sum of the rolls
AllRoll = randi(NumFace, NumRoll, NumDice);
SumRoll = sum(AllRoll, 2);
%#Determine the bins for the histogram
Bins = (NumDice:NumFace * NumDice)';
%#Build the histogram
hist(SumRoll, Bins);
title(sprintf('Histogram generated from %d rolls of %d %d-sided dice', NumRoll, NumDice, NumFace));
xlabel(sprintf('Sum of %d dice', NumDice));
ylabel('Count');
I suggest you have a close look at my code and the documentation for each function I've used. The exercise may prove useful for you when tackling other problems in Matlab in the future. Once you've done that, if there is anything you don't understand, then please let me know in a comment and I'll try to help. Cheers!
ps, If you don't ever need to refer to the individual rolls again, you can of course convert the AllRoll and SumRoll line into a one-liner, ie: SumRoll = sum(randi(NumFace, NumRoll, NumDice), 2);. I think the two-liner is more readable personally, and I doubt it will make much difference to the efficiency of the code.
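If the goal is the function described in the question, returning a vector SumDice of length NumRolls, the script above might be condensed as follows. This sketch uses an anonymous function so it can be pasted into a script as-is; the counting via accumarray is a choice made here for illustration and is not part of the answer above.

```matlab
NumFace  = 6;
RollDice = @(NumDice, NumRolls) sum(randi(NumFace, NumRolls, NumDice), 2);

NumDice  = 2;
NumRolls = 50;
SumDice  = RollDice(NumDice, NumRolls);      % NumRolls-by-1 vector of sums

% Count how often each possible sum occurred
Bins  = (NumDice:NumFace * NumDice)';
Count = accumarray(SumDice - NumDice + 1, 1, [numel(Bins) 1]);
bar(Bins, Count);
xlabel(sprintf('Sum of %d dice', NumDice));
ylabel('Count');
```

Each row of the randi output is one experiment, so summing along dimension 2 gives one total per roll.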