SET game odds simulation (MATLAB)

I recently discovered a great card game, SET. Briefly, there are 81 cards with four features: symbol (oval, squiggle or diamond), color (red, purple or green), number (one, two or three) and shading (solid, striped or open). The task is to find, among 12 dealt cards, a SET of 3 cards in which each of the four features is either all the same on each card or all different on each card (no 2+1 combination).
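The set rule can be checked compactly. The following is a minimal sketch, assuming each feature is encoded as 1, 2 or 3; the helper name isSet and the sample cards are hypothetical and not part of the code in this thread. It relies on the fact that three values from {1,2,3} are all equal or all different exactly when their sum is a multiple of 3.

```matlab
% Each row is a card, each column a feature encoded as 1, 2 or 3.
% Three values from {1,2,3} are all equal or all different exactly when
% their sum is a multiple of 3, so one mod() call tests every feature.
isSet = @(cards) all(mod(sum(cards, 1), 3) == 0);

validSet   = [1 1 1 1; 2 1 2 3; 3 1 3 2];   % feature 2 equal, others all different
invalidSet = [1 1 1 1; 1 2 2 3; 2 3 3 2];   % feature 1 is 1,1,2 (a 2+1 case)

disp(isSet(validSet))     % true
disp(isSet(invalidSet))   % false
```

The same test extends to any number of features, since it only looks at column sums.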
I've coded it in MATLAB to find a solution and to estimate odds of having a set in randomly selected cards.
Here is my code to estimate odds:
%% initialization
K = 12; % cards to draw
NF = 4; % number of features (usually 3 or 4)
setallcards = unique(nchoosek(repmat(1:3,1,NF),NF),'rows'); % all cards: rows - cards, columns - features
setallcomb = nchoosek(1:K,3); % index of all combinations of K cards by 3
%% test
tic
NIter=1e2; % number of test iterations
setexists = 0; % test results holder
% C = progress('init'); % if you have progress function from FileExchange
for d = 1:NIter
% C = progress(C,d/NIter);
% cards for current test
setdrawncardidx = randi(size(setallcards,1),K,1);
setdrawncards = setallcards(setdrawncardidx,:);
% find all sets in current test iteration
for setcombidx = 1:size(setallcomb,1)
setcomb = setdrawncards(setallcomb(setcombidx,:),:);
if all(arrayfun(@(x) numel(unique(setcomb(:,x))), 1:NF)~=2) % test one combination
setexists = setexists + 1;
break % to find only the first set
end
end
end
fprintf('Set:NoSet = %g:%g = %g:1\n', setexists, NIter-setexists, setexists/(NIter-setexists))
toc
100-1000 iterations are fast, but be careful with more: one million iterations takes about 15 hours on my home computer. With 12 cards and 4 features I get odds of around 13:1 of having a set. This is actually a problem: the instruction book says this number should be 33:1, and this was recently confirmed by Peter Norvig. He provides Python code, but I haven't tested it yet.
So can you find the error? Any comments on performance improvement are welcome.

I tackled the problem writing my own implementation before looking at your code. My first attempt was very similar to what you already had :)
%# some parameters
NUM_ITER = 100000; %# number of simulations to run
DRAW_SZ = 12; %# number of cards we are dealing
SET_SZ = 3; %# number of cards in a set
FEAT_NUM = 4; %# number of features (symbol,color,number,shading)
FEAT_SZ = 3; %# number of values per feature (eg: red/purple/green, ...)
%# cards features
features = {
'oval' 'squiggle' 'diamond' ; %# symbol
'red' 'purple' 'green' ; %# color
'one' 'two' 'three' ; %# number
'solid' 'striped' 'open' %# shading
};
fIdx = arrayfun(@(k) grp2idx(features(k,:)), 1:FEAT_NUM, 'UniformOutput',0);
%# list of all cards. Each card: [symbol,color,number,shading]
[W X Y Z] = ndgrid(fIdx{:});
cards = [W(:) X(:) Y(:) Z(:)];
%# all possible sets: choose 3 from 12
setsInd = nchoosek(1:DRAW_SZ,SET_SZ);
%# count number of valid sets in random draws of 12 cards
counterValidSet = 0;
for i=1:NUM_ITER
%# pick 12 cards
ord = randperm( size(cards,1) );
cardsDrawn = cards(ord(1:DRAW_SZ),:);
%# check for valid sets: features are all the same or all different
for s=1:size(setsInd,1)
%# set of 3 cards
set = cardsDrawn(setsInd(s,:),:);
%# check if set is valid
count = arrayfun(@(k) numel(unique(set(:,k))), 1:FEAT_NUM);
isValid = (count==1|count==3);
%# increment counter
if isValid
counterValidSet = counterValidSet + 1;
break %# break early if found valid set among candidates
end
end
end
%# ratio of found-to-notfound
fprintf('Size=%d, Set=%d, NoSet=%d, Set:NoSet=%g\n', ...
DRAW_SZ, counterValidSet, (NUM_ITER-counterValidSet), ...
counterValidSet/(NUM_ITER-counterValidSet))
After using the Profiler to discover hot spots, some improvement can be made mainly by early-break'ing out of loops when possible. The main bottleneck is the call to the UNIQUE function. Those two lines above where we check for valid sets can be rewritten as:
%# check if set is valid
isValid = true;
for k=1:FEAT_NUM
count = numel(unique(set(:,k)));
if count~=1 && count~=3
isValid = false;
break %# break early if one of the features doesnt meet conditions
end
end
Unfortunately, the simulation is still slow for larger numbers of iterations. Thus my next solution is a vectorized version where, for each iteration, we build a single matrix of all possible sets of 3 cards from the hand of 12 drawn cards. For all these candidate sets, we use logical vectors to indicate which feature values are present, thus avoiding the calls to UNIQUE/NUMEL (we want features all the same or all different on each card of the set).
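The logical-vector idea can be illustrated on a single feature before scaling it up to all candidate sets. This is a minimal sketch with illustrative variable names that are not taken from the answer's code: mark which of the 3 values occur among the 3 cards, then count the marks. One mark means all the same, three marks means all different, and two marks is the forbidden 2+1 case.

```matlab
featSz = 3;                        % number of values per feature
triples = [1 1 1; 1 2 3; 1 1 2];   % each row: one feature's values on 3 cards

nDistinct = zeros(size(triples, 1), 1);
for r = 1:size(triples, 1)
    present = false(featSz, 1);    % one logical slot per feature value
    present(triples(r, :)) = true; % repeated values mark the same slot
    nDistinct(r) = sum(present);   % distinct values, without calling unique()
end

okFeature = (nDistinct ~= 2);      % valid unless exactly two distinct values
```

The vectorized version applies the same marking to every feature of every candidate triple at once by offsetting the indices into one large logical matrix.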
I admit that the code is now less readable and harder to follow (thus I posted both versions for comparison). The reason being that I tried to optimize the code as much as possible, so that each iteration-loop is fully vectorized. Here is the final code:
%# some parameters
NUM_ITER = 100000; %# number of simulations to run
DRAW_SZ = 12; %# number of cards we are dealing
SET_SZ = 3; %# number of cards in a set
FEAT_NUM = 4; %# number of features (symbol,color,number,shading)
FEAT_SZ = 3; %# number of values per feature (eg: red/purple/green, ...)
%# cards features
features = {
'oval' 'squiggle' 'diamond' ; %# symbol
'red' 'purple' 'green' ; %# color
'one' 'two' 'three' ; %# number
'solid' 'striped' 'open' %# shading
};
fIdx = arrayfun(@(k) grp2idx(features(k,:)), 1:FEAT_NUM, 'UniformOutput',0);
%# list of all cards. Each card: [symbol,color,number,shading]
[W X Y Z] = ndgrid(fIdx{:});
cards = [W(:) X(:) Y(:) Z(:)];
%# all possible sets: choose 3 from 12
setsInd = nchoosek(1:DRAW_SZ,SET_SZ);
%# optimizations: some calculations taken out of the loop
ss = setsInd(:);
set_sz2 = numel(ss)*FEAT_NUM/SET_SZ;
col = repmat(1:set_sz2,SET_SZ,1);
col = FEAT_SZ.*(col(:)-1);
M = false(FEAT_SZ,set_sz2);
%# progress indication
%#hWait = waitbar(0./NUM_ITER, 'Simulation...');
%# count number of valid sets in random draws of 12 cards
counterValidSet = 0;
for i=1:NUM_ITER
%# update progress
%#waitbar(i./NUM_ITER, hWait);
%# pick 12 cards
ord = randperm( size(cards,1) );
cardsDrawn = cards(ord(1:DRAW_SZ),:);
%# put all possible sets of 3 cards next to each other
set = reshape(cardsDrawn(ss,:)',[],SET_SZ)';
set = set(:);
%# check for valid sets: features are all the same or all different
M(:) = false; %# if using PARFOR, it will complain about this
M(set+col) = true;
isValid = all(reshape(sum(M)~=2,FEAT_NUM,[]));
%# increment counter if there is at least one valid set in all candidates
if any(isValid)
counterValidSet = counterValidSet + 1;
end
end
%# ratio of found-to-notfound
fprintf('Size=%d, Set=%d, NoSet=%d, Set:NoSet=%g\n', ...
DRAW_SZ, counterValidSet, (NUM_ITER-counterValidSet), ...
counterValidSet/(NUM_ITER-counterValidSet))
%# close progress bar
%#close(hWait)
If you have the Parallel Computing Toolbox, you can easily replace the plain FOR-loop with a parallel PARFOR (you might want to move the initialization of the matrix M inside the loop again: replace M(:) = false; with M = false(FEAT_SZ,set_sz2);).
Here are some sample outputs of 50000 simulations (PARFOR used with a pool of 2 local instances):
» tic, SET_game2, toc
Size=12, Set=48376, NoSet=1624, Set:NoSet=29.7882
Elapsed time is 5.653933 seconds.
» tic, SET_game2, toc
Size=15, Set=49981, NoSet=19, Set:NoSet=2630.58
Elapsed time is 9.414917 seconds.
And with a million iterations (PARFOR for 12, no-PARFOR for 15):
» tic, SET_game2, toc
Size=12, Set=967516, NoSet=32484, Set:NoSet=29.7844
Elapsed time is 110.719903 seconds.
» tic, SET_game2, toc
Size=15, Set=999630, NoSet=370, Set:NoSet=2701.7
Elapsed time is 372.110412 seconds.
The odds ratios agree with the results reported by Peter Norvig.

Here's a vectorized version, where 1M hands can be calculated in about a minute. I got about 28:1 with it, so there might still be something a little off with finding 'all different' sets. My guess is that this is what your solution has trouble with, as well.
%# initialization
K = 12; %# cards to draw
NF = 4; %# number of features (this is hard-coded to 4)
nIter = 100000; %# number of iterations
%# each card has four features. This means that a card can be represented
%# by a coordinate in 4D space. A set is a full row, column, etc in 4D
%# space. We can even parallelize the iterations, at least as long as we
%# have RAM (each hand costs 81 bytes)
%# make card space - one dimension per feature, plus one for the iterations
cardSpace = false(3,3,3,3,nIter);
%# To draw cards, we put K trues into each cardSpace. I can't think of a
%# good, fast way to draw exactly K cards that doesn't involve calling
%# unique
for i=1:nIter
shuffle = randperm(81) + (i-1) * 81;
cardSpace(shuffle(1:K)) = true;
end
%# to test, all we have to do is check whether there is any row, column,
%# with all 1's
isEqual = squeeze(any(any(any(all(cardSpace,1),2),3),4) | ...
any(any(any(all(cardSpace,2),1),3),4) | ...
any(any(any(all(cardSpace,3),2),1),4) | ...
any(any(any(all(cardSpace,4),2),3),1));
%# to get a set of 3 cards where all symbols are different, we require that
%# no 'sub-volume' is completely empty - there may be something wrong with this
%# but since my test looked ok, I'm not going to investigate on Friday night
isDifferent = squeeze(~any(all(all(all(~cardSpace,1),2),3),4) & ...
~any(all(all(all(~cardSpace,1),2),4),3) & ...
~any(all(all(all(~cardSpace,1),3),4),2) & ...
~any(all(all(all(~cardSpace,4),2),3),1));
isSet = isEqual | isDifferent;
%# find the odds
fprintf('odds are %5.2f:1\n',sum(isSet)/(nIter-sum(isSet)))

I found my error. Thanks Jonas for the hint with RANDPERM.
I used RANDI to randomly draw K cards, but that samples with replacement, and there is a better than 50% chance of getting repeats even with only 12 cards. When I replaced this line with randperm, I got 33.8:1 with 10000 iterations, very close to the number in the instruction book.
setdrawncardidx = randperm(81);
setdrawncardidx = setdrawncardidx(1:K);
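The size of the RANDI effect can be checked directly with a birthday-problem calculation. This is a quick sketch for 12 draws with replacement from 81 cards; the variable names are illustrative.

```matlab
nCards = 81;
K = 12;
% Probability that K draws with replacement are all distinct:
% (81/81) * (80/81) * ... * (70/81)
pAllDistinct = prod((nCards - (0:K-1)) / nCards);
pRepeat = 1 - pAllDistinct;
fprintf('P(at least one repeated card) = %.3f\n', pRepeat)
```

The result is a little under 0.6, so more than half of the simulated hands contained a duplicated card.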
Anyway, it would be interesting to see other approaches to the problem.

I'm sure there's something wrong with my calculation of these odds, since several others have confirmed with simulations that it's close to 33:1 as in the instructions, but what's wrong with the following logic?
For 12 random cards, there are 220 possible combinations of three cards (12!/(9!3!) = 220). Each combination of three cards has a 1/79 chance of being a set, so there's a 78/79 chance of three arbitrary cards not being a set. So if you examined all 220 combinations and there were a 78/79 chance that each one wasn't a set, then your chance of not finding a set after examining all possible combinations would be 78/79 raised to the 220th power, or 0.0606, which is approx. 15.5:1 odds.
I must be missing something...?
Christopher
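The arithmetic in this question can be reproduced in a few lines. Note that raising 78/79 to the 220th power treats the 220 triples as independent events. They are not, because different triples drawn from the same 12 cards share cards, so the result is only an approximation, which may account for the gap from the simulated values. A sketch, with illustrative variable names:

```matlab
nTriples = nchoosek(12, 3);      % 220 triples in a 12-card hand
pTripleIsSet = 1/79;             % two cards fix the third; 79 cards remain
% Approximation: assumes the 220 triples are independent, which they are not
pNoSet = (1 - pTripleIsSet)^nTriples;
oddsApprox = (1 - pNoSet) / pNoSet;
fprintf('P(no set) ~ %.4f, odds ~ %.1f:1\n', pNoSet, oddsApprox)
```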

Related

How do I remove a group of rows based on condition met in one in matlab? Have code but not sure what's going wrong?

I am trying to detect high amplitude events and remove them along with rows above and below. I have the following code, which does this in part but not fully, and I'm not sure where the error is. I have commented out the audioread function and added randi to allow reproducible results. Code:
%[data, fs] = audioread("noise.wav");
%t1 = linspace(0, (numel(data)-1)/fs, numel(data));
rng(1)
data = randi(10,1000,1);
threshold = 5;
clear_range = 10; %rows/samples
data = clearRange(data, threshold, clear_range);
%t1 = linspace(0, (numel(data)-1)/fs, numel(data));
%plot(t1, data);
plot(data)
function [data] = clearRange(data, threshold, clear_range, compare_column)
% data: matrix of values to clean
% threshold: value to compare values against
% clear_range: number of rows to delete
% compare_column: column to check for value to compare against threshold
if nargin < 4
compare_column = 1;
end
for i = 1:length(data)
if i > length(data)
break
end
if data(i,compare_column) > threshold
data(max(1, i-clear_range):min(length(data), i+clear_range),:) = [];
end
end
end
I think the main problem with your code is that you modify data while looping over it. Deleting the rows around one peak (a high amplitude event, in your words) also deletes any peaks at indices greater than i that fall inside the deleted range, so they are never taken into account in later iterations.
For example, consider peaks in rows 4 and 6 with clear_range equal to 10. Together they should cause rows up to index 16 to be removed. However, when i equals 4 you remove rows up to index 14, which also removes the peak at row 6, so rows 15 and 16 are never cleared.
In general, it is easier to rely on MATLAB's matrix/array operations than on loops.
Below is a possible solution with explanations inline.
clc;
% I adjusted inputs to get a minimal example
data = randi(10,30,1);
threshold = 9;
rangeToClear = 1;
columnToCompare = 1;
dataOut = clearRange(data, threshold, rangeToClear, columnToCompare );
disp('In:')
disp( data' );
disp('Out:')
disp( dataOut' ); % Plot for cross-check
function data = clearRange(data, threshold, rangeToClear, columnToCompare)
% rowsWithPeak is 1-D logical array showing where columnToCompare is greater than the threshold
rowsWithPeak = data( :, columnToCompare ) > threshold;
% kernel is a column vector of ones of size Nx1, where N is the number of rows
% that should be removed around a peak
kernel = ones( 2*rangeToClear+1, 1 );
% rowsToRemove is a column vector that is greater than zero at row indices
% that should be removed from the data. To obtain rowsToRemove,
% we convolve rowsWithPeak with the kernel. The argument 'same' to
% the conv2 function specifies that rowsToRemove will have the same
% size as rowsWithPeak.
rowsToRemove = conv2( rowsWithPeak, kernel, 'same' );
% rowsToRemoveLogical is a logical array being one, where rowsToRemove is greater than 0.
% Note that, rowsToRemoveLogical = rowsToRemove > 0 would also work here
rowsToRemoveLogical = logical( rowsToRemove);
% Finally, we use rowsToRemoveLogical to mask positions of the rows that should be removed.
data( rowsToRemoveLogical, : ) = [];
end
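To see the convolution mask at work, here is a small worked sketch on a hand-made vector; the data values are made up for illustration. Peaks in rows 3 and 7 with rangeToClear equal to 1 should clear rows 2 to 4 and 6 to 8.

```matlab
data = [1 1 10 1 1 1 10 1]';     % peaks (value 10) in rows 3 and 7
threshold = 9;
rangeToClear = 1;

peaks = data > threshold;                        % logical peak mask
kernel = ones(2*rangeToClear + 1, 1);
% conv2(...,'same') spreads each peak over its neighbouring rows
mask = conv2(double(peaks), kernel, 'same') > 0;
cleaned = data(~mask);                           % only rows 1 and 5 survive
```

Because the mask is computed from the untouched input before anything is deleted, overlapping ranges around nearby peaks are handled without special cases.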

Fast way to get mean values of rows accordingly to subscripts

I have a data, which may be simulated in the following way:
N = 10^6;%10^8;
K = 10^4;%10^6;
subs = randi([1 K],N,1);
M = [randn(N,5) subs];
M(M<-1.2) = nan;
In other words, it is a matrix whose last column holds the subscripts.
Now I want to calculate nanmean() for each subscript. Also I want to save number of rows for each subscript. I have a 'dummy' code for this:
uniqueSubs = unique(M(:,6));
avM = nan(numel(uniqueSubs),6);
for iSub = 1:numel(uniqueSubs)
tmpM = M(M(:,6)==uniqueSubs(iSub),1:5);
avM(iSub,:) = [nanmean(tmpM,1) size(tmpM,1)];
end
The problem is that it is too slow. I want it to work for N = 10^8 and K = 10^6 (see the commented part in the definition of these variables).
How can I find the mean of the data in a faster way?
This sounds like a perfect job for findgroups and splitapply.
% Find groups in the final column
G = findgroups(M(:,6));
% function to apply per group
fcn = @(group) [mean(group, 1, 'omitnan'), size(group, 1)];
% Use splitapply to apply fcn to each group in M(:,1:5)
result = splitapply(fcn, M(:, 1:5), G);
% Check
assert(isequaln(result, avM));
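A tiny example makes the findgroups/splitapply output easier to check by hand. This is a sketch on made-up data with one data column instead of five; it assumes a MATLAB release recent enough to provide findgroups and splitapply.

```matlab
M = [1 1; 3 1; NaN 2; 5 2];      % column 1: data, column 2: subscript
G = findgroups(M(:, 2));         % group number for every row
fcn = @(group) [mean(group, 1, 'omitnan'), size(group, 1)];
result = splitapply(fcn, M(:, 1), G);   % each row: [mean, row count]
```

Subscript 1 has values 1 and 3 (mean 2, two rows); subscript 2 has NaN and 5 (mean 5 after omitting the NaN, still two rows).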
M = sortrows(M,6); % sort the data per subscript
IDX = diff(M(:,6)); % find where the subscript changes
tmp = find(IDX);
tmp = [0 ;tmp;size(M,1)]; % add start and end of data
for iSub= 2:numel(tmp)
% Calculate the mean over just a single subscript, store in iSub-1
avM2(iSub-1,:) = [nanmean(M(tmp(iSub-1)+1:tmp(iSub),1:5),1) tmp(iSub)-tmp(iSub-1)];
end
This is some 60 times faster than your original code on my computer. The speed-up mainly comes from presorting the data and then finding all locations where the subscript changes. That way you do not have to traverse the full array each time to find the correct subscripts, but rather you only check what's necessary each iteration. You thus calculate the mean over ~100 rows, instead of first having to check in 1,000,000 rows whether each row is needed that iteration or not.
Thus: in the original you check, for each of the numel(uniqueSubs) subscripts (10,000 in this case), whether each of the N rows (1,000,000 here) belongs to that subscript, which results in 10^10 checks. The proposed code sorts the rows (sorting is N log N, thus on the order of 6,000,000 operations here, taking the logarithm base 10) and then loops once over the full array without additional checks.
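The sort-then-find-boundaries step can be traced on a short vector. This is a minimal sketch with made-up subscripts and illustrative variable names.

```matlab
subs = [2 1 2 3 1]';
sorted = sort(subs);                 % [1 1 2 2 3]'
lastOfGroup = find(diff(sorted));    % rows after which the subscript changes
edges = [0; lastOfGroup; numel(sorted)];   % group k spans edges(k)+1 : edges(k+1)
counts = diff(edges);                % number of rows per subscript
```

Each group is a contiguous block of the sorted data, so the loop only ever touches the rows it needs.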
For completeness, here is the original code along with my version, showing that the two give the same result:
N = 10^6;%10^8;
K = 10^4;%10^6;
subs = randi([1 K],N,1);
M = [randn(N,5) subs];
M(M<-1.2) = nan;
uniqueSubs = unique(M(:,6));
%% zlon's original code
avM = nan(numel(uniqueSubs),7); % add the subscript for comparison later
tic
uniqueSubs = unique(M(:,6));
for iSub = 1:numel(uniqueSubs)
tmpM = M(M(:,6)==uniqueSubs(iSub),1:5);
avM(iSub,:) = [nanmean(tmpM,1) size(tmpM,1) uniqueSubs(iSub)];
end
toc
%%%%% End of zlon's code
avM = sortrows(avM,7); % Sort for comparison
%% Start of Adriaan's code
avM2 = nan(numel(uniqueSubs),6);
tic
M = sortrows(M,6);
IDX = diff(M(:,6));
tmp = find(IDX);
tmp = [0 ;tmp;size(M,1)];
for iSub = 2:numel(tmp)
avM2(iSub-1,:) = [nanmean(M(tmp(iSub-1)+1:tmp(iSub),1:5),1) tmp(iSub)-tmp(iSub-1)];
end
toc %tic/toc should not be used for accurate timing, this is just for order of magnitude
%%%% End of Adriaan's code
all(avM(:,1:6) == avM2) % Do the comparison
% End of script
% Output
Elapsed time is 58.561347 seconds.
Elapsed time is 0.843124 seconds. % ~70 times faster
ans =
1×6 logical array
1 1 1 1 1 1 % i.e. the matrices are equal to one another

How to loop over a matlab image without making copies?

I was trying to loop over an image in MATLAB but my code is running too slow. I am fairly new to MATLAB, but I suspect that it's because it's making a copy of my randomly selected image. My code is:
function patches = sampleIMAGES()
load IMAGES; % load images from disk
patchsize = 8; % we'll use 8x8 patches
numpatches = 10000;
patches = zeros(patchsize*patchsize, numpatches);
size_img = size(IMAGES);
num_rows_img = size_img(1);
num_cols_img = size_img(2);
num_images = size_img(3);
for i=1:numpatches,
%get random image
rand_img_number = randi(num_images);
rand_img = IMAGES(:, :, rand_img_number);
%get random patch patchsizexpatchsize
rand_row = randi(num_rows_img - patchsize);
rand_col = randi(num_cols_img - patchsize);
rand_patch = rand_img(rand_row:rand_row+patchsize-1, rand_col:rand_col+patchsize-1);
patches(:, i) = rand_patch(:)';
end
end
How is it possible to loop over this without making a copy if MATLAB does not allow to index twice into a matrix/array?
Approach #1 - im2col based
numpatches = 10000; %//Number of patches
blksz = 8; %// Blocksize
[m,n,r] = size(IMAGES); %// Get sizes
%// Store blocks from IMAGES as columns, so that they could be processed in
%// a vectorized fashion later on
blks_col(blksz*blksz,(m-blksz+1)*(n-blksz+1),r)=0; %// Pre-allocate
for k1=1:r
blks_col(:,:,k1) = im2col(IMAGES(:,:,k1),[blksz blksz],'sliding');
end
blks_col = reshape(blks_col,size(blks_col,1),[]);
%// Get rand row, column and dimension-3 indices to be used for indexing
%// into blks_col in one go
rand_row = randi(size(IMAGES,1)-blksz+1,numpatches,1);
rand_col = randi(size(IMAGES,2)-blksz+1,numpatches,1);
rand_dim3 = randi(size(IMAGES,3),numpatches,1);
%// Select the specific column from blks_col that represents the
%// [blksz x blksz] used to make a single patch in each iteration from
%// original code
num_cols_im2col = (m-blksz+1)*(n-blksz+1);
col_ind = (rand_dim3-1)*num_cols_im2col + (rand_col-1)*(m-blksz+1) + rand_row;
patches = blks_col(:,col_ind);
Example
As an example I assumed IMAGES as the 3D data obtained from reading one of the images provided in the image gallery of Image Processing Toolbox and increased the number of patches to 100000, i.e. -
IMAGES = imread('peppers.png');
numpatches = 100000;
The runtime with original code - 22.376446 seconds.
The runtime with im2col based code - 2.237993 seconds
Then, I doubled the number of patches to 200000, for which the runtime with original code literally doubled and im2col based approach's runtime stayed around that ~2.3 sec mark.
Thus, this im2col based approach would make sense when you are working with lots of patches as opposed to when working with lots of images (that are put in the third dimension of IMAGES).
Approach #2 - Indexing based
Being a purely indexing based approach, this is expected to be memory-efficient and good with performance too.
numpatches = 10000; %//Number of patches
blksz = 8; %// Blocksize
[m,n,r] = size(IMAGES); %// Get sizes
%// Get rand row, column and dimension-3 indices to be used for indexing
rand_row = randi(size(IMAGES,1)-blksz+1,numpatches,1);
rand_col = randi(size(IMAGES,2)-blksz+1,numpatches,1);
rand_dim3 = randi(size(IMAGES,3),numpatches,1);
%// Starting indices for each patch
start_ind = (rand_dim3-1)*m*n + (rand_col-1)*m + rand_row;
%// Row indices for each patch
lin_row = permute(bsxfun(@plus,start_ind,[0:blksz-1])',[1 3 2]); %//'
%// Get linear indices based on row and col indices
lin_rowcol = reshape(bsxfun(@plus,lin_row,[0:blksz-1]*m),blksz*blksz,[]);
%// Finally get the patches
patches = IMAGES(lin_rowcol);
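The linear-index arithmetic can be verified on a toy array for a single patch. This is a sketch with a made-up 5x4x3 stack and an arbitrary patch position; it checks that the computed linear indices pick out the same elements as ordinary subscript slicing.

```matlab
m = 5; n = 4; r = 3; blksz = 2;
IMAGES = reshape(1:m*n*r, m, n, r);      % toy stack of 3 "images"
row = 2; col = 3; img = 2;               % top-left corner and image number

% Linear index of the patch's top-left element (column-major storage)
start_ind = (img-1)*m*n + (col-1)*m + row;
% Offsets inside the patch: +1 per row step, +m per column step
offsets = bsxfun(@plus, (0:blksz-1)', (0:blksz-1)*m);
lin_ind = start_ind + offsets;

patchLinear = IMAGES(lin_ind);
patchDirect = IMAGES(row:row+blksz-1, col:col+blksz-1, img);
```

The vectorized code does the same thing for all patches at once, adding a third dimension that runs over the patches.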
To no longer copy the images, instead of these two lines:
rand_img = IMAGES(:, :, rand_img_number);
rand_patch = rand_img(rand_row:rand_row+patchsize-1, rand_col:rand_col+patchsize-1);
combine both to one line:
rand_patch = IMAGES(rand_row:rand_row+patchsize-1, rand_col:rand_col+patchsize-1, rand_img_number);
Another way to improve the performance: generating 100 random numbers at once is faster than generating 1 number 100 times. Generate all the numbers you need outside the loop:
rand_img_number = randi(num_images,numpatches,1);
Then use rand_img_number(i) instead of rand_img_number inside the loop. Do the same for the two other random numbers.

intermittent loops in matlab

Hello again logical friends!
I’m aware this is quite an involved question so please bear with me! I think I’ve managed to get it down to two specifics:- I need two loops which I can’t seem to get working…
Firstly; The variable rollers(1).ink is a (12x1) vector containing ink values. This program shares the ink equally between rollers at each connection. I’m attempting to get rollers(1).ink to interact with rollers(2) only at specific timesteps. The ink should transfer into the system once for every full revolution i.e. nTimesSteps = each multiple of nBins_max. The ink should not transfer back to rollers(1).ink as the system rotates – it should only introduce ink to the system once per revolution and not take any back out. Currently I’ve set rollers(1).ink = ones but only for testing. I’m truly stuck here!
Secondly; The reason it needs to do this is because at the end of the sim I also wish to remove ink in the form of a printed image. The image should be a reflection of the ink on the last roller in my system and half of this value should be removed from the last roller and taken out of the system at each revolution. The ink remaining on the last roller should be recycled and ‘re-split’ in the system ready for the next rotation.
So…I think it’s around the loop beginning line86 where I need to do all this stuff. In pseudo, for the intermittent in-feed I’ve been trying something like:
For k = 1:nTimeSteps
While nTimesSteps = mod(nTimeSteps, nBins_max) == 0 % This should only output when nTimeSteps is a whole multiple of nBins_max i.e. one full revolution
‘Give me the ink on each segment at each time step in a matrix’
End
The output for averageAmountOfInk is the exact format I would like to return this data except I don’t really need the average, just the actual value at each moment in time. I keep getting errors for dimensional mismatches when I try to re-create this using something like:
For m = 1:nTimeSteps
For n = 1:N
Rollers(m,n) = rollers(n).ink’;
End
End
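One likely cause of the dimension mismatch is that rollers(n).ink is a vector, while Rollers(m,n) addresses a single matrix element, and the rollers have different numbers of bins. A cell array can hold one vector per time step and roller. The following is a minimal sketch only: the name inkHistory, the bin counts and the stand-in update step are hypothetical and not part of the original program.

```matlab
nBins = [3 5 4];                    % hypothetical bin counts for 3 rollers
N = numel(nBins);
nTimeSteps = 4;
rollers = struct('ink', cell(1, N));
for n = 1:N
    rollers(n).ink = zeros(1, nBins(n));
end

inkHistory = cell(nTimeSteps, N);   % one cell per (time step, roller)
for m = 1:nTimeSteps
    rollers(1).ink(:) = m;          % stand-in for the real simulation step
    for n = 1:N
        inkHistory{m, n} = rollers(n).ink;   % vectors may differ in length
    end
end
```

The ink on roller n at time step m is then available as inkHistory{m, n}.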
I’ll post the full code below if anyone is interested to see what it does currently. There’s a function at the end also which of course needs to be saved out to a separate file.
I’ve posted variations of this question a couple of times but I’m fully aware it’s quite a tricky one and I’m finding it difficult to get my intent across over the internets!
If anyone has any ideas/advice/general insults about my lack of programming skills then feel free to reply!
%% Simple roller train
% # Single forme roller
% # Ink film thickness = 1 micron
clc
clear all
clf
% # Initial state
C = [0,70; % # Roller centres (x, y)
10,70;
21,61;
11,48;
21,34;
27,16;
0,0
];
R = [5.6,4.42,9.8,6.65,10.59,8.4,23]; % # Roller radii (r)
% # Direction of rotation (clockwise = -1, anticlockwise = 1)
rotDir = [1,-1,1,-1,1,-1,1]';
N = numel(R); % # Amount of rollers
% # Find connected rollers
isconn = @(m, n)(sum(([1, -1] * C([m, n], :)).^2)...
-sum(R([m, n])).^2 < eps);
[Y, X] = meshgrid(1:N, 1:N);
conn = reshape(arrayfun(isconn, X(:), Y(:)), N, N) - eye(N);
% # Number of bins for biggest roller
nBins_max = 50;
nBins = round(nBins_max*R/max(R))';
% # Initialize roller struct
rollers = struct('position',{}','ink',{}','connections',{}',...
'rotDirection',{}');
% # Initialise matrices for roller properties
for ii = 1:N
rollers(ii).ink = zeros(1,nBins(ii));
rollers(ii).rotDirection = rotDir(ii);
rollers(ii).connections = zeros(1,nBins(ii));
rollers(ii).position = 1:nBins(ii);
end
for ii = 1:N
for jj = 1:N
if(ii~=jj)
if(conn(ii,jj) == 1)
connInd = getConnectionIndex(C,ii,jj,nBins(ii));
rollers(ii).connections(connInd) = jj;
end
end
end
end
% # Initialize averageAmountOfInk and calculate initial distribution
nTimeSteps = 1*nBins_max;
averageAmountOfInk = zeros(nTimeSteps,N);
inkPerSeg = zeros(nTimeSteps,N);
for ii = 1:N
averageAmountOfInk(1,ii) = mean(rollers(ii).ink);
end
% # Iterate through timesteps
for tt = 1:nTimeSteps
rollers(1).ink = ones(1,nBins(1));
% # Rotate all rollers
for ii = 1:N
rollers(ii).ink(:) = ...
circshift(rollers(ii).ink(:),rollers(ii).rotDirection);
end
% # Update all roller-connections
for ii = 1:N
for jj = 1:nBins(ii)
if(rollers(ii).connections(jj) ~= 0)
index1 = rollers(ii).connections(jj);
index2 = find(ii == rollers(index1).connections);
ink1 = rollers(ii).ink(jj);
ink2 = rollers(index1).ink(index2);
rollers(ii).ink(jj) = (ink1+ink2)/2;
rollers(index1).ink(index2) = (ink1+ink2)/2;
end
end
end
% # Calculate average amount of ink on each roller
for ii = 1:N
averageAmountOfInk(tt,ii) = sum(rollers(ii).ink);
end
end
image(5:20) = (rollers(7).ink(5:20))./2;
inkPerSeg1 = [rollers(1).ink]';
inkPerSeg2 = [rollers(2).ink]';
inkPerSeg3 = [rollers(3).ink]';
inkPerSeg4 = [rollers(4).ink]';
inkPerSeg5 = [rollers(5).ink]';
inkPerSeg6 = [rollers(6).ink]';
inkPerSeg7 = [rollers(7).ink]';
This is an extended comment rather than a proper answer, but the comment box is a bit too small ...
Your code overwhelms me, I can't see the wood for the trees. I suggest that you eliminate all the stuff we don't need to see to help you with your immediate problem (all those lines drawing figures for example) -- I think it will help you to debug your code yourself to put all that stuff into functions or scripts.
Your code snippet
For k = 1:nTimeSteps
While nTimesSteps = mod(nTimeSteps, nBins_max) == 0
‘Give me the ink on each segment at each time step in a matrix’
End
might be (I don't quite understand your use of the while statement: MATLAB keywords are lowercase, so While is not one, and as you have written it the value of the condition doesn't change from iteration to iteration) equivalent to
for k = 1:nBins_max:nTimeSteps
% give me the ink on each segment at each time step in a matrix
end
You seem to have missed an essential feature of Matlab's colon operator ...
1:8 = [1 2 3 4 5 6 7 8]
but
1:2:8 = [1 3 5 7]
that is, the second number in the triplet is the stride between successive elements.
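A stride that starts at 1 visits steps 1, 1+nBins_max, and so on. To hit the exact multiples of nBins_max (one full revolution each), the range can start at nBins_max instead, or a mod test can be used inside an ordinary loop. A short sketch, with the number of revolutions chosen arbitrarily:

```matlab
nBins_max = 50;
nTimeSteps = 3*nBins_max;

stepsByStride = nBins_max:nBins_max:nTimeSteps;        % 50 100 150
stepsByMod = find(mod(1:nTimeSteps, nBins_max) == 0);  % the same steps

for k = 1:nTimeSteps
    if mod(k, nBins_max) == 0
        % once-per-revolution work (e.g. the ink in-feed) would go here
    end
end
```

The mod form is convenient when the loop must still run at every time step and only some of the work is intermittent.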
Your matrix conn has a 1 at the (row,col) where rollers are connected, and a 0 elsewhere. You can find the row and column indices of all the 1s like this:
[ri,ci] = find(conn==1)
You could then pick up the (row,col) locations of the 1s without the nest of loops and if statements that begins
for ii = 1:N
for jj = 1:N
if(ii~=jj)
if(conn(ii,jj) == 1)
I could go on, but won't, that's enough for one comment.
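As an illustration of the find-based approach, here is a sketch on a made-up 3-roller connectivity matrix; the real loop body would call the connection-index helper instead of printing.

```matlab
conn = [0 1 0; 1 0 1; 0 1 0];    % toy case: rollers 1-2 and 2-3 touch
[ri, ci] = find(conn == 1);      % pairs are returned column by column
for p = 1:numel(ri)
    fprintf('roller %d touches roller %d\n', ri(p), ci(p));
end
```

Since conn has zeros on its diagonal, the ii~=jj test from the nested loops is no longer needed.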

Matlab: Random sample with replacement

What is the best way to do random sampling with replacement from a dataset? My dataset is 316 x 34. I want to segment the data into three buckets, but with replacement. Should I use randperm? I need to keep the row index intact, because that index is handy for identifying the label data. I am new to MATLAB. I saw a couple of random sampling methods, but they didn't look like they do what I am looking for. It's strange to think that something like this doesn't exist in MATLAB, so I wrote the code shown below.
My issue is that when I do row_idx = round(rand(1)*316) I sometimes get zero, which leads to two questions:
What should I do to avoid zero?
What is the best way to do the random sample with replacement?
shuffle_X = X(randperm(size(X,1)),:);
lengthOf_shuffle_X = length(shuffle_X)
number_of_rows_per_bucket = round(lengthOf_shuffle_X / 3)
bucket_cell = cell(3,1)
bag_matrix = []
for k = 1:length(bucket_cell)
for i = 1:number_of_rows_per_bucket
row_idx = round(rand(1)*316)
bag_matrix(i,:) = shuffle_X(row_idx,:)
end
bucket_cell{k} = bag_matrix
end
I could do the following:
if row_idx == 0
row_idx = round(rand(1)*316)
end
assuming the random number will never give zero in two consecutive rounds.
randi is a good way to get integer indices for sampling with replacement. Assuming you want to fill three buckets with an equal number of samples, then you can write
data = rand(316,34); %# create some dummy data
number_of_data = size(data,1);
number_of_rows_per_bucket = 50;
bucket_cell = cell(1,3);
idx = randi([1,number_of_data],[number_of_rows_per_bucket,3]);
for iBucket = 1:3
bucket_cell{iBucket} = data(idx(:,iBucket),:);
end
To the question: if you use randperm, it gives you a draw order without replacement, since each item can be drawn only once.
If you use randi, it draws with replacement, that is, the same item may be drawn many times.
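The difference is easy to see side by side. A short sketch for 316 rows and 50 draws; the variable names are illustrative.

```matlab
n = 316;  k = 50;
idxWithout = randperm(n, k);     % k distinct indices (without replacement)
idxWith = randi(n, 1, k);        % k indices in 1..n, repeats are possible

nDistinctWithout = numel(unique(idxWithout));   % always equals k
nDistinctWith = numel(unique(idxWith));         % may be smaller than k
```

Both calls return integers between 1 and n, so neither can produce the zero index that round(rand(1)*316) occasionally gives.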
Segmenting a dataset usually means splitting it into three disjoint sets. For that you draw without replacement (you don't put the items back; use randperm). If you tried to cover every item by drawing with replacement (using randi), it would be very slow, since after some time the chance of drawing an item you have not drawn before is very low.
(For details, see the coupon collector's problem.)
If you need a segmentation that is a split, you can go over the elements and decide independently where to put each one. (That is, you choose a bucket for each item with replacement: any chosen bucket goes back into the game.)
For that:
% if your data items are vectors say data = [1 1; 2 2; 3 3; 4 4]
num_data = size(data,1); % number of rows (length() would return the largest dimension)
bucket_labels = randi(3,[1,num_data]); % draw a bucket label for each item, independently.
for i=1:3
bucket{i} = data(bucket_labels==i,:);
end
%if your data items are scalars say data = [1 2 3 4 5]
num_data = length(data);
bucket_labels = randi(3,[1,num_data]);
for i=1:3
bucket{i} = data(bucket_labels==i);
end
there we go.