Cumulative count of unique values in Matlab

Suppose you are collecting cards for an album made of n_cards cards. Each pack you buy contains cards_in_pack cards, and each card is equally likely to be drawn. How many packs do you need to buy to collect all the cards, given that you can't trade your doubles? Suppose you want to simulate the process. This is an obvious way to do it:
n_cards = 100; n_experiments = 1e4; cards_in_pack = 5;
cards = randi([1 n_cards], ceil(sqrt(n_cards)) * n_experiments * n_cards, 1, 'uint16');
tic
n_packs = zeros(n_experiments, 1);
ctrl1 = 1;
i_f = 0;
n = 0;
while ctrl1
    ctrl2 = 1;
    i1 = 0;
    while ctrl2
        i1 = i1 + 1;
        ctrl2 = numel(unique(cards((cards_in_pack * i_f + 1):(cards_in_pack * (i_f + i1))))) ~= n_cards;
    end
    n = n + 1;
    n_packs(n) = i1;
    i_f = i_f + i1;
    ctrl1 = n ~= n_experiments;
end
toc
% Average number of packs:
mean(n_packs)
% Distribution of the number of packs necessary to complete the album
hist(n_packs, 50)
% Number of cards needed in the experiments:
sum(n_packs) * cards_in_pack
This is very slow - is there a faster way to do it? Specifically: is there a fast way to calculate the cumulative count of unique values in Matlab?
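As for the literal question (a fast cumulative count of unique values), one possible sketch is to mark first occurrences with unique(...,'first') and take a cumulative sum of that mask. Applied to the pre-generated cards vector above it would look roughly like this (the slice of 5000 cards and the variable names are just illustrative):
v = cards(1:5000);                     % e.g. the cards of one run
[~, firstIdx] = unique(v, 'first');    % index of each value's first appearance
isNew = false(size(v));
isNew(firstIdx) = true;
cumUnique = cumsum(isNew);             % cumUnique(k) = number of distinct values in v(1:k)
idx = find(cumUnique == n_cards, 1);   % first position at which the album is complete
n_packs_one_run = ceil(idx / cards_in_pack);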

The simulation can be vectorized across experiments. So the experiment loop is removed, and simulation time is greatly reduced.
Since each experiment may finish at different times (different number of packs required), an experiment can be in two states: ongoing or finished. The code maintains a vector of ongoing experiments (exps_ongoing) and a 0-1 matrix of obtained cards in each experiment (cards_obtained).
For each ongoing experiment, a new pack is generated and the cards contained in that pack are (over)written on cards_obtained. When all cards have been obtained for an ongoing experiment, that experiment is removed from exps_ongoing. The code ends when all experiments have finished.
n_cards = 100;
cards_in_pack = 5;
n_experiments = 1e4;
cards_obtained = zeros(n_cards,n_experiments);
%// will contain cards obtained in each experiment
exps_ongoing = 1:n_experiments; %// list of which experiments are ongoing
n_packs = zeros(1,n_experiments); %// will record how many packs have been used
while ~isempty(exps_ongoing)
    n_packs(exps_ongoing) = n_packs(exps_ongoing) + 1;
    %// pick one pack for each ongoing experiment
    new_cards = randi(n_cards,cards_in_pack,numel(exps_ongoing));
    %// generate pack contents for each ongoing experiment
    cards_obtained(new_cards + repmat(((exps_ongoing)-1)*n_cards,cards_in_pack,1)) = true;
    %// take note of obtained cards in each ongoing experiment. Linear indexing is used here
    exps_ongoing = setdiff(exps_ongoing,exps_ongoing(all(cards_obtained(:,exps_ongoing))));
    %// ongoing experiments for which all cards have been obtained are removed
end
disp(mean(n_packs))
For your input data, this reduces running time by more than a factor of 50 on my computer (104.36 seconds versus 1.89 seconds, measured with tic and toc).

OK, in this instance, it's pretty simple to simulate because the constraints are in our favour - all we need to know is when there are no cards left that we don't have. Thus we can dump the explicit uniqueness test and just count...
I'd do it something like this:
n_packs = zeros(n_experiments, 1, 'uint32');
for i=1:n_experiments
    collection = zeros(n_cards, 1, 'uint32');
    while nnz(collection) < n_cards
        n_packs(i) = n_packs(i) + 1;
        pack = randi(n_cards, cards_in_pack, 1, 'uint32');
        collection(pack) = collection(pack) + 1;
    end
end
Now I can't guarantee that this will be faster (I don't have Matlab with me to test it, and there may be a bug or two as well), but it's about the simplest algorithm I can come up with, and simple code tends to be fast code. To tweak for maximum speed, have a play with the data types: uint32 may not be optimal for everything due to Matlab's internals.
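For example, a quick way to compare data types for a single hot spot (here just the randi call) is timeit; this is only a rough sketch of how one might measure, not a benchmark of the whole loop:
n_cards = 100; cards_in_pack = 5;
t_u32 = timeit(@() randi(n_cards, cards_in_pack, 1, 'uint32'));
t_dbl = timeit(@() randi(n_cards, cards_in_pack, 1));
fprintf('uint32 randi: %.3g s, double randi: %.3g s\n', t_u32, t_dbl);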

If all cards have equal probability, you can use this simple approach I took for a similar problem: open lots and lots of packs at once, and then check how many you would have needed to open.
n_cards = 100; n_experiments = 1e4; cards_in_pack = 5;
nPacks = zeros(n_experiments,1);
for i=1:n_experiments
    %# assume it is never going to take you >1500000 cards
    r = randi(n_cards,1500000,1);
    %# since R2013a, unique returns the first occurrence
    %# for earlier versions, take the minimum of x
    %# and subtract it from the total array length
    [~,x] = unique(r);
    nPacks(i,1) = ceil(max(x)/cards_in_pack);
end
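If you are on a release older than R2013a (see the comments above), the first-occurrence indices can also be requested explicitly, which keeps max(x) valid; this is just the same loop body with the occurrence flag spelled out:
[~, x] = unique(r, 'first');  %# indices of first occurrences
nPacks(i,1) = ceil(max(x)/cards_in_pack);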

Related

Solving probability problems with MATLAB

How can I simulate this question using MATLAB?
Out of 100 apples, 10 are rotten. We randomly choose 5 apples without
replacement. What is the probability that there is at least one
rotten?
The Expected Answer
0.4162476
My Attempt:
r=0
for i=1:10000
    for c=1:5
        a = randi(1,100);
        if a < 11
            r = r+1;
        end
    end
end
r/10000
but it didn't work, so what would be a better way of doing it?
Use randperm to choose randomly without replacement:
A = false(1, 100);
A(1:10) = true;
r = 0;
for k = 1:10000
    a = randperm(100, 5);
    r = r + any(A(a));
end
result = r/10000;
Short answer:
Your problem follows a hypergeometric distribution (similar to a binomial distribution but without replacement). If you have the necessary toolbox, you can simply use the probability density function of the hypergeometric distribution:
r = 1-hygepdf(0,100,10,5) % r = 0.4162
Since P(x>=1) = P(x=1) + P(x=2) + P(x=3) + P(x=4) + P(x=5) = 1-P(x=0)
Of course, here I calculate the exact probability; this is not an experimental result.
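Concretely, P(x=0) = nchoosek(10,0)*nchoosek(90,5)/nchoosek(100,5) = 43949268/75287520 ≈ 0.5838, so P(x>=1) = 1 - 0.5838 ≈ 0.4162, which matches the expected answer above.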
To get further:
Note that if you do not have access to hygepdf, you can easily write the function yourself using binomial coefficients:
N = 100; K = 10;
n = 5; k = 0;
r = 1-(nchoosek(K,k)*nchoosek(N-K,n-k))/nchoosek(N,n) % r = 0.4162
You can also use the binomial probability density function; it is a bit trickier (but also more intuitive):
r = 1-prod(binopdf(0,1,10./(100-[0:4])))
Here we compute the probability of obtaining zero rotten apples five times in a row; the probability increases at every step since we remove one good, juicy apple each time. Then, as explained above, we take 1-P(x=0).
There are a couple of issues with your code. First of all, implicitly in what you wrote, you replace the apple after you look at it. When you generate the random number, you need to eliminate the possibility of choosing that number again.
I've rewritten your code to include better practices:
clear
n_runs = 1000;
success = zeros(n_runs, 1);
failure = zeros(n_runs, 1);
approach = zeros(n_runs, 1);
for ii = 1:n_runs
    apples = 1:100;
    a = randperm(100, 5);
    if any(a < 11)
        success(ii) = 1;
    elseif a >= 11
        failure(ii) = 1;
    end
    approach(ii) = sum(success)/(sum(success)+sum(failure));
end
figure; hold on
plot(approach)
title("r = "+ approach(end))
hold off
The results are stored in an array (called approach), rather than a single number being updated every time, which means you can see how quickly you approach the end value of r.
Another good habit is including clear at the beginning of any script, which reduces the possibility of an error occurring due to variables stored in the workspace.

Regarding loop structure in Matlab for an iterative procedure

I'm trying to code a loop in Matlab that iteratively solves for an optimal vector s of zeros and ones. This is my code
N = 150;
s = ones(N,1);
for i = 1:N
    if s(i) == 0
        i = i + 1;
    else
        i = i;
    end
    select = s;
    HI = (item_c' * (weights.*s)) * (1/(weights'*s));
    s(i) = 0;
    CI = (item_c' * (weights.*s)) * (1/(weights'*s));
    standarderror_afterex = sqrt(var(CI - CM));
    standarderror_priorex = sqrt(var(HI - CM));
    ratio = (standarderror_afterex - standarderror_priorex)/(abs(mean(weights.*s) - weights'*select));
    ratios(i) = ratio;
    s(i) = 1;
end
[M,I] = min(ratios);
s(I) = 0;
This code sets to zero the element of s that has the lowest ratio. But I need the procedure to start all over again, using the new s (now with one zero), to recompute the ratios and exclude the element of s with the lowest ratio, and so on until no ratios are negative.
Do I need another loop, or am I missing something?
I hope my question is clear enough; just tell me if you need me to explain more.
Thank you in advance, for helping out a newbie programmer.
Edit
I think that I need to add some form of while loop as well. But I can't see how to structure this. This is the flow that I want
With all items included (s(i) = 1 for all i), calculate HI, CI and the standard errors, list the ratios, and exclude the item i (set s(I) = 0) that corresponds to the lowest negative ratio.
With the new s, containing all ones except one zero, calculate HI, CI and the standard errors, list the ratios, and exclude the item that corresponds to the lowest negative ratio.
With the new s, now containing all ones except two zeros, repeat the process.
Do this until there are no negative elements left in ratios to exclude.
I hope that is clearer now.
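For reference, a minimal sketch of that flow, reusing the question's own variables (and assuming item_c, weights and CM exist in the workspace as in the code above), would wrap the existing calculation in a while-loop:
done = false;
while ~done
    ratios = nan(N,1);
    for i = 1:N
        if s(i) == 0
            continue % already excluded
        end
        select = s;
        HI = (item_c' * (weights.*s)) * (1/(weights'*s));
        s(i) = 0;
        CI = (item_c' * (weights.*s)) * (1/(weights'*s));
        standarderror_afterex = sqrt(var(CI - CM));
        standarderror_priorex = sqrt(var(HI - CM));
        ratios(i) = (standarderror_afterex - standarderror_priorex)/(abs(mean(weights.*s) - weights'*select));
        s(i) = 1; % restore before testing the next item
    end
    [M, I] = min(ratios); % min ignores the NaN entries of already excluded items
    if M < 0
        s(I) = 0;    % exclude the item with the most negative ratio and repeat
    else
        done = true; % no negative ratios left, stop
    end
end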
Ok. I want to go through a few things before I list my code. This is just how I would try to do it, not necessarily the best or even the fastest way (though I'd think it'd be pretty quick). I tried to keep the structure as you had it in your code, so you can follow it nicely (even though I'd probably meld all the calculations down into a single function or line).
Some features that I'm using in my code:
bsxfun: Learn this! It is amazing how it works and can speed up code, and makes some things easier.
v = rand(n,1);
A = rand(n,4);
% The two lines below compute the same value:
W = bsxfun(@(x,y)x.*y,v,A);
W_= repmat(v,1,4).*A;
bsxfun dot multiplies the v vector with each column of A.
Both W and W_ are matrices the same size as A, but the first will be much faster (usually).
Precalculating dropouts: I made select a matrix, where before it was a vector. This allows me to form a variable included using logical constructs. ~eye(N) produces an identity matrix and negates it; logically AND-ing that with select makes the i-th column equal to select with the i-th element dropped out.
You were explicitly calculating weights'*s as the denominator in each pass of the for-loop. With the matrix above we can instead do sum(W), where each column of W is essentially weights.*s for one dropout.
Take advantage of column-wise operations: the var() and the sqrt() functions are both coded to work along the columns of a matrix, outputting the action for a matrix in the form of a row vector.
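For instance, with a throwaway 3-by-2 matrix X just to illustrate the column-wise behaviour:
X = [1 4; 2 5; 3 6];
var(X)        % returns [1 1], one variance per column
sqrt(var(X))  % sqrt is then applied element-wise to that row vector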
Ok. the full thing. Any questions let me know:
% Start with everything selected:
select = true(N);
stop = false; % Stopping flag:
while (~stop)
    % Each column leaves a variable out...
    included = ~eye(N) & select;
    % This calculates the weights with leave-one-out:
    W = bsxfun(@(x,y)x.*y,weights,included);
    % You can comment out the line below, if you'd like...
    W_= repmat(weights,1,N).*included; % This is the same as previous line.
    % This calculates the weights before dropping the variables:
    V = bsxfun(@(x,y)x.*y,weights,select);
    % There's different syntax, depending on whether item_c is a
    % vector or a matrix...
    if(isvector(item_c))
        HI = (item_c' * V)./(sum(V));
        CI = (item_c' * W)./(sum(W));
    else
        % For example: item_c is a matrix...
        % We have to use bsxfun() again
        HI = bsxfun(@rdivide, (item_c' * V),sum(V));
        CI = bsxfun(@rdivide, (item_c' * W),sum(W));
    end
    standarderror_afterex = sqrt(var(bsxfun(@minus,CI,CM)));
    standarderror_priorex = sqrt(var(bsxfun(@minus,HI,CM)));
    % or:
    %
    % standarderror_afterex = sqrt(var(CI - repmat(CM,1,size(CI,2))));
    % standarderror_priorex = sqrt(var(HI - repmat(CM,1,size(HI,2))));
    ratios = (standarderror_afterex - standarderror_priorex)./(abs(mean(W) - sum(V)));
    % Identify the negative ratios:
    negratios = ratios < 0;
    if ~any(negratios)
        % Drop out of the while-loop:
        stop = true;
    else
        % Find the most negative ratio:
        neginds = find(negratios);
        [mn, mnind] = min(ratios(negratios));
        % Drop out the most negative one...
        select(neginds(mnind),:) = false;
    end
end % end while(~stop)
% Your output:
s = select(:,1);
If for some reason it doesn't work, please let me know.

Writing a matlab program for combinations of large numbers

Because MATLAB at times returns NaN for combinations of large numbers, the assignment is to write a program to compute the number of combinations of 200 objects taken 90 at a time. Once this works, we are to make it into a function y = comb(n,k).
This is what I have so far based on an example we were given of the probability that 2 people in a class have the same birthday.
This is the example:
nMax = 70; %maximum number of people in classroom
nArray = 1:nMax;
prevPnot = 1; %initialize probability
for iN = 1:nMax
    Pnot = prevPnot*(365-iN+1)/365; %probability that no birthdays are the same
    P(iN) = 1-Pnot; %probability that at least two birthdays are the same
    prevPnot = Pnot;
end
plot(nArray, P, '.-')
xlabel('nb. of people')
ylabel('prob. that at least two have same birthday')
grid on
At this point I'm having trouble because I'm more familiar with Java. This is what I have so far, and it isn't coming out at all.
k = 90;
n = 200;
nArray = 1:k;
prevPnot = 1;
for counter = 1:k
    Pnot = (n-counter+1)/(prevPnot*(n-k-counter+1);
    P(iN) = Pnot;
    prevPnot = Pnot;
end
The point of the loop I wrote is to separate out each term
i.e. n/k*(n-k), times (n-counter)/(k-counter)*(n-k-counter), and so forth.
I'm also not entirely sure how to save a loop as a function in matlab.
To compute the number of combinations of n objects taken k at a time, you can use gammaln to compute the logarithm of the factorials in order to avoid overflow:
result = exp(gammaln(n+1)-gammaln(k+1)-gammaln(n-k+1));
Another approach is to remove terms that will cancel and then compute the result:
result = prod((n-k+1:n)./(1:k));
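Since the assignment also asks to turn this into a function y = comb(n,k), one possible function file comb.m (just a sketch wrapping the cancellation approach above) is:
function y = comb(n, k)
% COMB Number of combinations of n objects taken k at a time.
% Uses element-wise ratios so the intermediate products stay manageable.
y = prod((n-k+1:n) ./ (1:k));
end
Calling comb(200, 90) then returns the count directly as a double, so for numbers this large the result is only accurate to about 15 significant digits.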

How can I rearrange a list of numbers so that every N numbers is nonrepeating?

So I have a list of 190 numbers ranging from 1 to 19 (each number is repeated 10 times) that I need to sample 10 at a time. Within each sample of 10, I don't want the numbers to repeat. I tried incorporating a while loop, but computation time was way too long. So far I'm at the point where I can generate the numbers and check whether there are repetitions within each subset. Any ideas?
N=[];
for i=1:10
    N = [N randperm(19)];
end
B = [];
for j=1:10
    if length(unique(N(j*10-9:j*10))) < 10
        B = [B 1];
    end
end
sum(B)
Below is an updated version of the code; this might make it a little clearer what I want (19 targets taken 10 at a time without repetition, until all 19 targets have been repeated 10 times).
nTargs = 19;
pairs = nchoosek(1:nTargs, 10);
nPairs = size(pairs, 1);
order = randperm(nPairs);
values=randsample(order,19);
targs=pairs(values,:);
Alltargs=false;
while ~Alltargs
    targs = pairs(randsample(order,19),:);
    B = [];
    for i=1:19
        G = length(find(targs==i))==10;
        B = [B G];
    end
    if sum(B)==19
        Alltargs = true;
    end
end
Here are some very simple steps to do this: basically you just shuffle the vector once, and then you grab the last 10 unique values:
v = repmat(1:19,1,10);      % each of 1:19 repeated 10 times
v = v(randperm(numel(v)));  % shuffle once
[a, idx] = unique(v);       % idx holds one position per distinct value
result = unique(v);         % the distinct values (same as a)
v(idx) = [];                % remove those occurrences from the pool
The algorithm should be fairly efficient; if you want the next 10, just run the last part again and combine the results into a totalResult.
You want to sample the numbers 1:19 randomly in blocks of 10 without repetitions. The Matlab function 'randsample' has an optional 'replacement' argument which you can set to 'false' if you do not want repetitions. For example:
N = [];
replacement = false;
for i = 1:19
    N = [N randsample(19,10,replacement)];
end
This generates a 10 x 19 matrix of random integers in the range [1,..,19] with no repetitions within each column (each column is one sample of 10).
Edit: Here is a solution that addresses the requirement that each of the integers [1,..,19] occurs exactly 10 times, in addition to no repetition within each column / sample:
nRange = 19; nRep = 10;
valueRep = true; % true while there are repetitions
nLoops = 0; % count the number of iterations
while valueRep
    l = zeros(1,nRep);
    v = [];
    for m = 1:nRep
        v = [v, randperm(nRange,nRange)];
    end
    m1 = reshape(v,nRep,nRange);
    for n = 1:nRep
        l(n) = length(unique(m1(:,n)));
    end
    if all(l == nRep)
        valueRep = false;
    end
    nLoops = nLoops + 1;
end
result = m1;
For the parameters in the question it takes about 300 iterations to find a result.
I think you should approach this constructively.
It's easy to find an initial set of 19 groups that fulfils your conditions just by rearranging the series 1:19: series1 = repmat(1:19,1,10); and rearranged = reshape(series1,10,19).
Then shuffle the values: I would select two random columns, copy them, and swap the values at two random positions.
Then test whether each copied column still fulfils your condition - like: test = @(x) numel(unique(x))==10 - and if yes, replace your columns.
Just keep shuffling till your time runs out or you are happy; of course, you might come up with more efficient shuffling or testing. A rough sketch follows.
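A rough sketch of that shuffle, assuming only the series1, rearranged and test definitions above (the 1e4 iteration budget is arbitrary):
series1 = repmat(1:19,1,10);
rearranged = reshape(series1,10,19);    % each column already has 10 distinct values
test = @(x) numel(unique(x)) == 10;     % a column is valid if all 10 entries differ
for iter = 1:1e4                        % fixed shuffling budget
    cols = randperm(19,2);              % two distinct random columns
    rows = randi(10,1,2);               % one random position in each
    candidate = rearranged(:,cols);
    tmp = candidate(rows(1),1);         % swap the two entries between the columns
    candidate(rows(1),1) = candidate(rows(2),2);
    candidate(rows(2),2) = tmp;
    if test(candidate(:,1)) && test(candidate(:,2))
        rearranged(:,cols) = candidate; % keep the swap only if both columns stay valid
    end
end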
I was given another solution through the MATLAB forum that works pretty well (Credit to Niklas Nylen over on the MATLAB forum). Computation time is pretty low too. It basically shuffles the numbers until there are no repetitions within every 10 values. Thanks all for your help.
y = repmat(1:19,1,10);
% Run enough iterations to get the output random enough, I selected 100000
for ii = 1:100000
    % Select random index
    index = randi(length(y)-1);
    % Check if it is allowed to switch places
    if y(index)~=y(min(index+10, length(y))) && y(index+1)~=y(max(1,index-9))
        % Make the switch
        yTmp = y(index);
        y(index) = y(index+1);
        y(index+1) = yTmp;
    end
end

Setting sampling rate in MATLAB for Arduino

I'm having trouble getting consistent results with the code I am using. I want to run my Arduino for a specific amount of time (say 20 seconds) and collect data from the analog pin with a specific sampling rate (say four samples a second). The code is as follows.
a_pin = 0;
tic;
i = 0;
while toc < 20
    i = i + 1;
    time(i) = toc;
    v(i) = a.analogRead(a_pin);
    pause(.25);
end
Is there a way to make the loop run for a specific amount of time while sampling inside it at a specified rate?
You can try this:
a_pin = 0;
fs = 4; % sampling frequency (samplings per second)
mt = 20; % time for measurements
ind = 1;
nind = 1;
last_beep = 0;
tic;
while toc < mt
    time(ind) = toc;
    v(ind) = a.analogRead(a_pin);
    % wait for the appropriate time for the next measurement
    while( nind == ind )
        nind = floor(toc*fs) + 1;
    end
    ind = nind;
    % beep every second
    if (ceil(toc) > last_beep)
        beep(); % don't know if this one exists, check docs
        last_beep = ceil(toc);
    end
end
A single Arduino analog read command takes around 0.04 s, so in practice I'd go minimally 0.05 s per sample. Adding a second read operation puts you on the order of 2*0.04 s, in practice more like 0.1 s. I think it is mainly limited by the USB communication speed.
I am also new to Arduino, but having implemented a real-time EEG analysis with it, in practice I was able to sample 2 analog channels with a sampling frequency between 57 and 108 Hz. It was quite variable (measured with tic/toc), but still adequate for real-time processing in my case.
My code uses a while loop, a series of memory updates, digital pin manipulations, and a plot of the trace (drawnow), and it seems to run smoothly enough.
In short: about 0.0283 s to sample 2 analog inputs in my case.
Cheers