MATLAB vector: prevent consecutive values from same range - matlab

Okay, this might seem like a weird question, but bear with me.
So I have a random vector in a .m file, with certain constraints built into it. Here is my code:
randvecall = randsample(done, done, true);
randvec = randvecall([1;diff(randvecall(:))]~=0);
"Done" is just the range of values we take the sample from, so don't worry about that. As you can see, this randsamples from a range of values, and then prunes this random vector with the diff function, so that consecutive duplicate values are removed. There is still the potential for duplicate values in the vector, but they simply cannot be consecutive.
This is all well and good, and works perfectly fine.
So, say, randvec looks like this:
randvec =
54
47
52
26
39
2
14
51
24
6
19
56
34
46
12
7
41
18
29
7
It is actually a lot longer, with something like 60-70 values, but you get the point.
What I want to do is add a little extra constraint on to this vector. When I sample from this vector, the values are classified according to their range. So values from 1-15 are category 1, 16-30 are category 2, and so on. The reasons for this are unimportant, but it is a pretty important part of the program. So if you look at the values I provided above, you see a section like this:
7
41
18
29
7
This is actually bad for my program. Because the value ranges are treated separately, 41, 18, and 29 are used differently than 7 is. So, for all intents and purposes, 7 is appearing consecutively in my script. What I want to do is somehow parse/modify/whatever the vector when it is generated so that the same number from a certain range cannot appear twice "in a row," regardless of how many other numbers from different ranges are between them. Does this make sense/did I describe this well? So, I want MATLAB to search the vector, and for all values within certain ranges (1-15,16-30,31-45,46-60) make sure that "consecutive" values from the same range are not identical.
So, then, that is what I want to do. This may not by any means be the best way to do this, so any advice/alternatives are, of course, appreciated. I know I can do this better with multiple vectors, but for various reasons I need this to be a single, long vector (the way my script is designed it just wouldn't work if I had a separate vector for each range of values).

What you may want to do is create four random vectors, one for each category, ensure that they do not contain any two consecutive equal values, and then build your final random vector by ordered picking of values from random categories, i.e.
%# make a 50-by-nCategories array of random numbers
categories = [1,16,31,46;15,30,45,60]; %# category min/max
nCategories = size(categories,2);
randomCategories = zeros(50,nCategories);
for c=1:nCategories
%# draw 100 numbers for good measure
tmp = randi(categories(:,c),[100 1]);
tmp(diff(tmp==0)) = []; %# remove consecutive duplicates
%# store
randomCategories(:,c) = tmp(1:50);
end
%# select from which bins to pick. Use half the numbers, so that we don't force the
%# numbers of entries per category to be exactly equal
bins = randi(nCategories,[100,1]);
%# combine the output, i.e. replace e.g. the numbers
%# '3' in 'bins' with the consecutive entries
%# from the third category
out = zeros(100,1);
for c = 1:nCategories
cIdx = find(bins==c);
out(cIdx) = randomCategories(1:length(cIdx),c);
end

First we assign each element the bin number of the range it lies into:
[~,bins] = histc(randvec, [1 16 31 46 61]);
Next we loop for each range, and find elements in those categories. For example for the first range of 1-16, we get:
>> ind = find(bins==1); %# bin#1 of 1-16
>> x = randvec(ind)
ans =
2
14
6
12
7
7
now you can apply the same process of removing consecutive duplicates:
>> idx = ([1;diff(x)] == 0)
idx =
0
0
0
0
0
1
>> problematicIndices = ind(idx) %# indices into the vector: randvec
Do this for all ranges, and collect those problematic indices. Next decide how you want to deal with them (remove them, generate other numbers in their place, etc...)

If I understand your problem correct, I think that is one solution. It uses unique, but applies it to each of the subranges of the vector. The values that are duplicated within a range of indices are identified so you can deal with them.
cat_inds = [1,16,31,46,60]; % need to include last element
for i=2:numel(cat_inds)
randvec_part = randvec( cat_inds(i-1):cat_inds(i) );
% Find the indices for the first unique elements in this part of the array
[~,uniqInds] = unique(randvec_part,'first');
% this binary vector identifies the indices that are duplicated in
% this part of randvec
%
% NB: they are indices into randvec_part
%
inds_of_duplicates = ~ismember(1:numel(randvec_part), uniqInds);
% code to deal with the problem indices goes here. Modify randvec_part accordingly...
% Write it back to the original vector (assumes that the length is the same)
randvec( cat_inds(i-1):cat_inds(i) ) = randvec_part;
end

Here's a different approach than what everyone else has been tossing up. The premise that I'm working on here is that you want to have a random arrangement of values in a vector without repitition. I'm not sure what other constraints you are applying prior to the point where we are giving out input.
My thoughts is to use the randperm function.
Here's some sample code how it would work:
%randvec is your vector of random values
randvec2 = unique(randvec); % This will return the sorted list of values from randvec.
randomizedvector = randvec2(randperm(length(randvec2));
% Note: if randvec is multidimensional you'll have to use numel instead of length
At this point randomizedvector should contain all the unique values from the initial randvec and but 'shuffled' or re-randomized after the unique function call. Now you could just seed the randvec differently to avoid needing the unique function call as simply calling randperm(n) will returning a randomized vector with values ranging from 1 to n.
Just an off the wall 2 cents there =P enjoy!

Related

Matlab: Array of random integers with no direct repetition

For my experiment I have 20 categories which contain 9 pictures each. I want to show these pictures in a pseudo-random sequence where the only constraint to randomness is that one image may not be followed directly by one of the same category.
So I need something similar to
r = randi([1 20],1,180);
just with an added constraint of two numbers not directly following each other. E.g.
14 8 15 15 7 16 6 4 1 8 is not legitimate, whereas
14 8 15 7 15 16 6 4 1 8 would be.
An alternative way I was thinking of was naming the categories A,B,C,...T, have them repeat 9 times and then shuffle the bunch. But there you run into the same problem I think?
I am an absolute Matlab beginner, so any guidance will be welcome.
The following uses modulo operations to make sure each value is different from the previous one:
m = 20; %// number of categories
n = 180; %// desired number of samples
x = [randi(m)-1 randi(m-1, [1 n-1])];
x = mod(cumsum(x), m) + 1;
How the code works
In the third line, the first entry of x is a random value between 0 and m-1. Each subsequent entry represents the change that, modulo m, will give the next value (this is done in the fourth line).
The key is to choose that change between 1 and m-1 (not between 0 and m-1), to assure consecutive values will be different. In other words, given a value, there are m-1 (not m) choices for the next value.
After the modulo operation, 1 is added to to transform the range of resulting values from 0,...,m-1 to 1,...,m.
Test
Take all (n-1) pairs of consecutive entries in the generated x vector and count occurrences of all (m^2) possible combinations of values:
count = accumarray([x(1:end-1); x(2:end)].', 1, [m m]);
imagesc(count)
axis square
colorbar
The following image has been obtained for m=20; n=1e6;. It is seen that all combinations are (more or less) equally likely, except for pairs with repeated values, which never occur.
You could look for the repetitions in an iterative manner and put new set of integers from the same group [1 20] only into those places where repetitions have occurred. We continue to do so until there are no repetitions left -
interval = [1 20]; %// interval from where the random integers are to be chosen
r = randi(interval,1,180); %// create the first batch of numbers
idx = diff(r)==0; %// logical array, where 1s denote repetitions for first batch
while nnz(idx)~=0
idx = diff(r)==0; %// logical array, where 1s denote repetitions for
%// subsequent batches
rN = randi(interval,1,nnz(idx)); %// new set of random integers to be placed
%// at the positions where repetitions have occured
r(find(idx)+1) = rN; %// place ramdom integers at their respective positions
end

Matlab matching first column of a row as index and then averaging all columns in that row

I need help with taking the following data which is organized in a large matrix and averaging all of the values that have a matching ID (index) and outputting another matrix with just the ID and the averaged value that trail it.
File with data format:
(This is the StarData variable)
ID>>>>Values
002141865 3.867144e-03 742.000000 0.001121 16.155089 6.297494 0.001677
002141865 5.429278e-03 1940.000000 0.000477 16.583748 11.945627 0.001622
002141865 4.360715e-03 1897.000000 0.000667 16.863406 13.438383 0.001460
002141865 3.972467e-03 2127.000000 0.000459 16.103060 21.966853 0.001196
002141865 8.542932e-03 2094.000000 0.000421 17.452007 18.067214 0.002490
Do not be mislead by the examples I posted, that first number is repeated for about 15 lines then the ID changes and that goes for an entire set of different ID's, then they are repeated as a whole group again, think first block of code = [1 2 3; 1 5 9; 2 5 7; 2 4 6] then the code repeats with different values for the columns except for the index. The main difference is the values trailing the ID which I need to average out in matlab and output a clean matrix with only one of each ID fully averaged for all occurrences of that ID.
Thanks for any help given.
A modification of this answer does the job, as follows:
[value_sort ind_sort] = sort(StarData(:,1));
[~, ii, jj] = unique(value_sort);
n = diff([0; ii]);
averages = NaN(length(n),size(StarData,2)); % preallocate
averages(:,1) = StarData(ii,1);
for col = 2:size(StarData,2)
averages(:,col) = accumarray(jj,StarData(ind_sort,col))./n;
end
The result is in variable averages. Its first column contains the values used as indices, and each subsequent column contains the average for that column according to the index value.
Compatibility issues for Matlab 2013a onwards:
The function unique has changed in Matlab 2013a. For that version onwards, add 'legacy' flag to unique, i.e. replace second line by
[~, ii, jj] = unique(value_sort,'legacy')

Find median value of the largest clump of similar values in an array in the most computationally efficient manner

Sorry for the long title, but that about sums it up.
I am looking to find the median value of the largest clump of similar values in an array in the most computationally efficient manner.
for example:
H = [99,100,101,102,103,180,181,182,5,250,17]
I would be looking for the 101.
The array is not sorted, I just typed it in the above order for easier understanding.
The array is of a constant length and you can always assume there will be at least one clump of similar values.
What I have been doing so far is basically computing the standard deviation with one of the values removed and finding the value which corresponds to the largest reduction in STD and repeating that for the number of elements in the array, which is terribly inefficient.
for j = 1:7
G = double(H);
for i = 1:7
G(i) = NaN;
T(i) = nanstd(G);
end
best = find(T==min(T));
H(best) = NaN;
end
x = find(H==max(H));
Any thoughts?
This possibility bins your data and looks for the bin with most elements. If your distribution consists of well separated clusters this should work reasonably well.
H = [99,100,101,102,103,180,181,182,5,250,17];
nbins = length(H); % <-- set # of bins here
[v bins]=hist(H,nbins);
[vm im]=max(v); % find max in histogram
bl = bins(2)-bins(1); % bin size
bm = bins(im); % position of bin with max #
ifb =find(abs(H-bm)<bl/2) % elements within bin
median(H(ifb)) % average over those elements in bin
Output:
ifb = 1 2 3 4 5
H(ifb) = 99 100 101 102 103
median = 101
The more challenging parameters to set are the number of bins and the size of the region to look around the most populated bin. In the example you provided neither of these is so critical, you could set the number of bins to 3 (instead of length(H)) and it still would work. Using length(H) as the number of bins is in fact a little extreme and probably not a good general choice. A better choice is somewhere between that number and the expected number of clusters.
It may help for certain distributions to change bl within the find expression to a value you judge better in advance.
I should also note that there are clustering methods (kmeans) that may work better, but perhaps less efficiently. For instance this is the output of [H' kmeans(H',4) ]:
99 2
100 2
101 2
102 2
103 2
180 3
181 3
182 3
5 4
250 3
17 1
In this case I decided in advance to attempt grouping into 4 clusters.
Using kmeans you can get an answer as follows:
nbin = 4;
km = kmeans(H',nbin);
[mv iv]=max(histc(km,[1:nbin]));
H(km==km(iv))
median(H(km==km(iv)))
Notice however that kmeans does not necessarily return the same value every time it is run, so you might need to average over a few iterations.
I timed the two methods and found that kmeans takes ~10 X longer. However, it is more robust since the bin sizes adapt to your problem and do not need to be set beforehand (only the number of bins does).

extracting data from an existing matrix

I have a matrix containing 4320 entries
for example:
P=[ 26 29 31 33 35 26 29 ..........25]
I want to create 180 matrices and every matrix contains 24 entries,i.e
the 1st matrix contains the 1st 24 entries
the 2nd matrix contains the 2nd 24 entries and so on
I know a simple method but it will take a long time which is:
P1=P(1:24);P2=P(25:48),..........P180=P(4297:4320)
and it is dificult since I have huge number of entries for
the original matrix P
thanks
I'm going to go ahead and assume this is MATLAB-related, in which case you'd use the reshape function:
Px = reshape(P, 24, []);
Px will now be a proper matrix, and you can access each of the 180 "matrices" (actually row vectors, you seem to be confusing the two) by simple MATLAB syntax:
P100 = P(:,100);
You can loop through the items in the index, counting up, creating a new matrix every 24 entries. Modular arithmetic might help:
foreach (var currentIndexInLargerMatrix : int = 0 to 4320)
begin
matrixToPutItIn := currentIndexInLargerMatrix div 24;
indexInNewMatrix := currentIndexInLargerMatrix mod 24;
end
in many languages the modulus (remainder) operator is either "mod" or "%".
"div" here denotes integer division. Most languages just use the virgule (slash) "/".
This obviously isn't complete code, but should get you started.
I think You's answer is the best way to approach your problem, where each submatrix is stored as a row or column in a larger matrix and is retrieved by simply indexing into that larger matrix.
However, if you really want/need to create 180 separate variables labeled P1 through P180, the way to do this has been discussed in other questions, like this one. In your case, you could use the function EVAL like so:
for iMatrix = 1:180 %# Loop 180 times
tempP = P((1:24)+24*(iMatrix-1)); %# Get your submatrix
eval(['P' int2str(iMatrix) ' = tempP;']); %# Create a new variable
end

Increasing the length of a column in MATLAB

I'm just beginning to teach myself MATLAB, and I'm making a 501x6 array. The columns will contain probabilities for flipping 101 sided die, and as such, the columns contain 101,201,301 entries, not 501. Is there a way to 'stretch the column' so that I add 0s above and below the useful data? So far I've only thought of making a column like a=[zeros(200,1);die;zeros(200,1)] so that only the data shows up in rows 201-301, and similarly, b=[zeros(150,1);die2;zeros(150,1)], if I wanted 200 or 150 zeros to precede and follow the data, respectively in order for it to fit in the array.
Thanks for any suggestions.
You can do several thing:
Start with an all-zero matrix, and only modify the elements you need to be non-zero:
A = zeros(501,6);
A(someValue:someOtherValue, 5) = value;
% OR: assign the range to a vector:
A(someValue:someOtherValue, 5) = 1:20; % if someValue:someOtherValue is the same length as 1:20