How to separate repetitive numbers - matlab

I have this kind of data
A = [1 0.5
1 0.1
1 0.3
2 1
2 0.5
2 0.1
2 1
5 2 ]
Looking at the first column, there can be repeating numbers, and those appearing only once. From rows starting with repeating numbers, I want to select the last occurrence, along with the rest of the row. For the above example, my output will become:
Output = [1 0.3
2 1 ]
How can I do this?

I am going to assume a couple of things:
The first column doesn't have to be sorted, but only contiguous groups are considered (i.e. if the first column contains entries like [2;2;2;3;2], the last row won't be considered a part of the "2 group"). If you want detached rows/groups to be considered, make sure to sort the rows of A before applying this algorithm.
The first column contains integers only.
Here's my suggestion:
out = A( [false; diff([logical(diff(A(:,1),1)); true])>0], :);
An explanation of the way it works:
We differentiate the first column to detect value transitions.
To the end of the previous result we concatenate a true, so that if the last row is part of a group, it gets considered.
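Then we differentiate this again, so that we detect a transition that follows a non-transition, i.e. the last row of a group with more than one row. We keep only "positive" changes (0 to 1), because a single-row group such as the 5 in your example comes right after another transition (the 2->5), produces no such change, and is therefore skipped.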
Finally, we concatenate a false to the beginning, because the first row is never selected.
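To make the steps above concrete, here is a small sketch that splits the one-liner into its intermediate vectors for the example matrix from the question. The variable names (transitions, padded, lastOfGroup) are only illustrative and not part of the original answer, and the explicit double() conversion is an addition made to keep the second diff unambiguous.

```matlab
A = [1 0.5; 1 0.1; 1 0.3; 2 1; 2 0.5; 2 0.1; 2 1; 5 2];

% Step 1: true where the first column changes between row k and row k+1
transitions = logical(diff(A(:,1)));              % [0 0 1 0 0 0 1]'

% Step 2: treat the end of the matrix as a transition as well
padded = [transitions; true];                     % [0 0 1 0 0 0 1 1]'

% Step 3: a rise from 0 to 1 marks a row that shares its value with the
% previous row but not with the next one, i.e. the last row of a group
% that has more than one member
lastOfGroup = [false; diff(double(padded)) > 0];  % [0 0 1 0 0 0 1 0]'

out = A(lastOfGroup, :);
disp(out)   % rows [1 0.3] and [2 1]
```

The single row starting with 5 is dropped because the appended true directly follows the 2->5 transition, so there is no rise at that position.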

Using the unique function, you can easily solve your problem:
%%% Find the first indices of the unique numbers in column 1
[~, i_first, ~] = unique(A(:,1),'first');
%%% Then, find the last indices of the unique numbers in column 1
[~, i_last, ~] = unique(A(:,1),'last');
%%% Lastly, remove the entries with the same starting and finishing index
%%% from the last indices vector
i_last(i_last == i_first) = [];
%%% Output the requested values
Output = A(i_last, :);
This solution assumes the following: (courtesy of Dev-iL)
1. first column has to contain integers (otherwise this would require uniquetol)
2. non-contiguous groups are treated as contiguous (i.e. this performs sorting implicitly)
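If non-contiguous groups should count as one group but you prefer the diff-based selection, one option is to sort by the first column beforehand. This is a minimal sketch, relying on the fact that sort keeps tied elements in their original order, so "last occurrence" still means last in the original matrix. The matrix B is made-up test data.

```matlab
B = [2 0.5; 3 7; 2 0.1; 2 1; 3 9; 5 2];   % the 2s and 3s are not contiguous

% Bring equal keys together; ties keep their original relative order
[~, order] = sort(B(:,1));
Bs = B(order, :);

% Same selection as the diff-based one-liner, applied to the sorted rows
keep = [false; diff(double([logical(diff(Bs(:,1))); true])) > 0];
out = Bs(keep, :);
disp(out)   % rows [2 1] and [3 9]
```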

Related

Matlab: Remove rows when first and last 2 elements are shuffled

I have a matrix where each element is one component of a 2D coordinate. The elements in each row are paired: the element in the first column is paired with the one in the second, and the element in the third column with the one in the fourth. All possible combinations of the 4 numbers are present in the matrix.
What I need to do is dedup the matrix by removing rows where the first set of coordinates (e.g. columns 1 and 2 in a row) is swapped with the second set of coordinates. For example, if one row contains the values "3, 4, 2, 1", then I would need to remove "2, 1, 3, 4" from elsewhere in the matrix.
An example of this can be seen below, where I would want to remove the last row, as it is the reverse of the first row:
3 3 1 1
1 2 2 3
3 4 1 2
4 4 3 1
4 1 4 4
1 1 3 3
I'm quite stumped as to how to do this, and my previous attempts have all failed. Whilst it may not be useful to the answer, I have included my code showing how I am constructing the initial matrix below;
%create list of all piece coordinates
p1_row_index=(1:n);
p1_column_index=(1:n);
p2_row_index=(1:n);
p2_column_index=(1:n);
% get all possible combinations of these variables
[p1_row_index,p1_column_index,p2_row_index,p2_column_index]=BalanceFactors(1,1,1:n,1:n,1:n,1:n);
pc_list(:,1)=p1_row_index; % piece1 coordinates for rows
pc_list(:,2)=p1_column_index; % piece1 coordinates for columns
pc_list(:,3)=p2_row_index; % piece2 coordinates for rows
pc_list(:,4)=p2_column_index; % piece2 coordinates for columns
Thank you for your time.
Many thanks,
Matt
Complex numbers come in handy for this:
[~, ind] = unique(sort(M(:,[1 3])+1j*M(:,[2 4]), 2), 'rows', 'stable');
result = M(ind, :);
The code works as follows:
M(:,[1 3])+1j*M(:,[2 4]) creates a complex matrix with half the columns, where each pair of coordinates of the original matrix becomes a complex number.
sort(..., 2) sorts each row. Rows that originally were shuffled versions of each other now become identical.
[~, ind] = unique(..., 'rows', 'stable') gives the index of the first occurrence of each unique (complex, sorted) row.
M(ind, :) selects the desired rows from M.
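For comparison, the same deduplication can be sketched without complex numbers by putting the two coordinate pairs of each row into a fixed order before calling unique. This is an alternative to the answer above, not a restatement of it; the names swap and K are hypothetical.

```matlab
M = [3 3 1 1; 1 2 2 3; 3 4 1 2; 4 4 3 1; 4 1 4 4; 1 1 3 3];

% A row is swapped when its first pair is lexicographically larger
% than its second pair
swap = M(:,1) > M(:,3) | (M(:,1) == M(:,3) & M(:,2) > M(:,4));

% K holds every row with the smaller pair first, so rows that are
% swapped versions of each other become identical
K = M;
K(swap, :) = M(swap, [3 4 1 2]);

% Keep the first occurrence of each canonical row, in original order
[~, ind] = unique(K, 'rows', 'stable');
result = M(ind, :);
disp(result)   % the first five rows of M
```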

Recognise that numbers in a row of a matrix are all the same number

I have a matrix of 0s, 1s, 2s and 3s. If all the elements in the same row are the same then I want it to display the text 'flush'. For example, I have the matrix
[0,1,0,2,3;
0,0,0,0,0;
3,2,1,3,1;
2,2,2,2,2];
How would I program Matlab to recognise the 2nd and 4th row all have the same number?
A = [0,1,0,2,3; 0,0,0,0,0; 3,2,1,3,1; 2,2,2,2,2]
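As was said before, you can use the variance: a row whose elements are all equal has zero variance.
n_flush = var(A, [], 2) == 0
This holds for negative numbers as well (a row like [-2 -1 1 2] has nonzero variance), but with non-integer data rounding errors can make the variance of a constant row slightly different from zero, so the exact comparison with 0 may fail.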
What I would do is to compare the first column with the rest and flag the rows where all the elements are equal.
n_flush = all(bsxfun(@eq, A(:,1), A(:,2:end)),2)
Now, if you want to display flush every time the rows are equal you can do
for ind = find(n_flush).' % transpose so the loop visits one row index at a time
    fprintf('flush row %i\n', ind)
end
If you need to have the whole thing in a one-liner (which is what many Matlab-geeks try to do), then maybe this here will suit your needs
cellfun(@(x) char((x==0)*sprintf('flush')), num2cell(var(A')'), 'UniformOutput', false)
Edit: nice idea GameOfThrows
Yet another solution by explicitly subtracting the first column from each column via duplicating the first column to other columns of a matching-sized matrix.
identical_rows = ~any(A - kron(ones(1,size(A,2)),A(:,1)),2)
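Another way to express the same row test, not taken from the answers above, is to compare each row's minimum with its maximum: they coincide exactly when every element in the row is the same. A minimal sketch on the question's matrix, with flush_rows as an illustrative name:

```matlab
A = [0,1,0,2,3; 0,0,0,0,0; 3,2,1,3,1; 2,2,2,2,2];

% A row is constant exactly when its smallest and largest elements agree
n_flush = min(A, [], 2) == max(A, [], 2);
flush_rows = find(n_flush);

for ind = flush_rows.'
    fprintf('flush row %i (all elements equal %g)\n', ind, A(ind, 1));
end
```

Because this only compares stored values and does no arithmetic, it is not affected by rounding in the way a variance test can be.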

Fastest way of finding repeated values in different cell arrays of different size

The problem is the following:
I have a cell array of the form indx{jj} where each jj is an array of 1xNjj, meaning they all have different sizes. In my case max(jj)==3, but let's consider a general case for the sake of it.
How would you find the value(s) repeated in all the jj in the fastest way?
I can guess how to do it with several for loops, but is there a "one (three?) liner"?
Simple example:
indx{1}=[ 1 3 5 7 9];
indx{2}=[ 2 3 4 1];
indx{3}=[ 1 2 5 3 3 5 4];
ans=[1 3];
One possibility is to use a for loop with intersect:
result = indx{1}; %// will be changed
for n = 2:numel(indx)
result = intersect(result, indx{n});
end
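As a self-contained run of the loop above on the example from the question, the sketch below also stops early once the running intersection is empty. The early exit and the initial unique call are additions for illustration, not part of the original answer.

```matlab
indx = {[1 3 5 7 9], [2 3 4 1], [1 2 5 3 3 5 4]};

result = unique(indx{1});            % start from the first cell
for n = 2:numel(indx)
    result = intersect(result, indx{n});
    if isempty(result)
        break                        % no value can be common any more
    end
end
disp(result)   % 1 3
```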
Almost no-loop approach (almost because cellfun essentially uses loop(s) inside it, but its effect here is minimal as we are using it to find just the number of elements in each cell) -
lens = cellfun(@numel,indx);
val_ind = bsxfun(@ge,lens,[1:max(lens)]');
vals = horzcat(indx{:});
mat1(max(lens),numel(lens))=0;
mat1(val_ind) = vals;
unqvals = unique(vals);
out = unqvals(all(any(bsxfun(@eq,mat1,permute(unqvals,[1 3 2]))),2));
Another possibility that I could suggest, though Luis Mendo's answer is very good, is to take all of the vectors in your cell array and remove the duplicates. This can be done through cellfun, and specifying unique as the function to operate on. You'd have to set the UniformOutput flag to false as we are outputting a cell array at each index. You also have to be careful in that each cell array is assumed to be all row vectors, or all column vectors. You can't mix the way the arrays are shaped or this method won't work.
Once you do this, concatenate all of the vectors into a single array through cell2mat, then do a histogram through histc. You'd specify the edges to be all of the unique numbers in the single array created before. Note that you'd have to make an additional call to unique on the output single array before proceeding. Once you calculate the histogram, for any entries with a bin count equal to the total number of elements in your cell array (which is 3 in your case), then these are values that you see in all of your cells. As such:
A = cell2mat(cellfun(@unique, indx, 'uni', 0));
edge_values = unique(A);
h = histc(A, edge_values);
result = edge_values(h == numel(indx));
With the unique call for each cell array, if a number appears in every single cell, then the total number of times you see this number should equal the total number of cells you have.
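Since histc is marked as not recommended in recent MATLAB releases, the same counting idea can be sketched with the third output of unique and accumarray instead. This is a swapped-in counting step under the same assumption as above (every cell holds a row vector); pooled, vals, k and counts are illustrative names.

```matlab
indx = {[1 3 5 7 9], [2 3 4 1], [1 2 5 3 3 5 4]};

% Remove duplicates inside each cell, then pool everything into one row
pooled = cell2mat(cellfun(@unique, indx, 'UniformOutput', false));

% k(i) is the position of pooled(i) within the sorted unique values
[vals, ~, k] = unique(pooled);

% Count how many cells each unique value appeared in
counts = accumarray(k(:), 1).';

% Values seen once per cell are present in every cell
result = vals(counts == numel(indx));
disp(result)   % 1 3
```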

Matlab matching first column of a row as index and then averaging all columns in that row

I need help with taking the following data which is organized in a large matrix and averaging all of the values that have a matching ID (index) and outputting another matrix with just the ID and the averaged value that trail it.
File with data format:
(This is the StarData variable)
ID>>>>Values
002141865 3.867144e-03 742.000000 0.001121 16.155089 6.297494 0.001677
002141865 5.429278e-03 1940.000000 0.000477 16.583748 11.945627 0.001622
002141865 4.360715e-03 1897.000000 0.000667 16.863406 13.438383 0.001460
002141865 3.972467e-03 2127.000000 0.000459 16.103060 21.966853 0.001196
002141865 8.542932e-03 2094.000000 0.000421 17.452007 18.067214 0.002490
Do not be misled by the example I posted: the first number is repeated for about 15 lines, then the ID changes, and so on through a whole set of different IDs; then the IDs repeat as a whole group again. Think of the first block as [1 2 3; 1 5 9; 2 5 7; 2 4 6], after which the block repeats with different values in every column except the index. The values trailing the ID are what I need to average in MATLAB, producing a clean matrix with one row per ID, averaged over all occurrences of that ID.
Thanks for any help given.
A modification of this answer does the job, as follows:
[value_sort, ind_sort] = sort(StarData(:,1));
[~, ii, jj] = unique(value_sort);
n = diff([0; ii]); % rows per ID; relies on ii being last-occurrence indices
averages = NaN(length(n),size(StarData,2)); % preallocate
averages(:,1) = value_sort(ii); % ii indexes the sorted IDs, not StarData
for col = 2:size(StarData,2)
    averages(:,col) = accumarray(jj,StarData(ind_sort,col))./n;
end
The result is in variable averages. Its first column contains the values used as indices, and each subsequent column contains the average for that column according to the index value.
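Compatibility issues for MATLAB R2013a onwards:
The function unique changed in MATLAB R2013a: by default it now returns the index of the first occurrence of each value rather than the last. From that version onwards, add the 'legacy' flag to unique, i.e. replace the second line by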
[~, ii, jj] = unique(value_sort,'legacy')
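A variant that does not depend on whether unique reports first or last occurrences can be sketched by letting accumarray compute the mean directly. This is an alternative formulation, not the answer's code, and it uses the small block from the question as stand-in data for StarData.

```matlab
StarData = [1 2 3; 1 5 9; 2 5 7; 2 4 6];   % toy block from the question

% ids: sorted unique IDs; jj(r): which entry of ids row r belongs to
[ids, ~, jj] = unique(StarData(:,1));

averages = NaN(numel(ids), size(StarData,2));   % preallocate
averages(:,1) = ids;
for col = 2:size(StarData,2)
    % average all rows that share an ID, one column at a time
    averages(:,col) = accumarray(jj(:), StarData(:,col), [], @mean);
end
disp(averages)   % [1 3.5 6; 2 4.5 6.5]
```

No sorting step is needed here because jj already maps every original row to its ID, wherever that row sits in the matrix.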

MATLAB vector: prevent consecutive values from same range

Okay, this might seem like a weird question, but bear with me.
So I have a random vector in a .m file, with certain constraints built into it. Here is my code:
randvecall = randsample(done, done, true);
randvec = randvecall([1;diff(randvecall(:))]~=0);
"Done" is just the range of values we take the sample from, so don't worry about that. As you can see, this randsamples from a range of values, and then prunes this random vector with the diff function, so that consecutive duplicate values are removed. There is still the potential for duplicate values in the vector, but they simply cannot be consecutive.
This is all well and good, and works perfectly fine.
So, say, randvec looks like this:
randvec =
54
47
52
26
39
2
14
51
24
6
19
56
34
46
12
7
41
18
29
7
It is actually a lot longer, with something like 60-70 values, but you get the point.
What I want to do is add a little extra constraint on to this vector. When I sample from this vector, the values are classified according to their range. So values from 1-15 are category 1, 16-30 are category 2, and so on. The reasons for this are unimportant, but it is a pretty important part of the program. So if you look at the values I provided above, you see a section like this:
7
41
18
29
7
This is actually bad for my program. Because the value ranges are treated separately, 41, 18, and 29 are used differently than 7 is. So, for all intents and purposes, 7 is appearing consecutively in my script. What I want to do is somehow parse/modify/whatever the vector when it is generated so that the same number from a certain range cannot appear twice "in a row," regardless of how many other numbers from different ranges are between them. Does this make sense/did I describe this well? So, I want MATLAB to search the vector, and for all values within certain ranges (1-15,16-30,31-45,46-60) make sure that "consecutive" values from the same range are not identical.
So, then, that is what I want to do. This may not by any means be the best way to do this, so any advice/alternatives are, of course, appreciated. I know I can do this better with multiple vectors, but for various reasons I need this to be a single, long vector (the way my script is designed it just wouldn't work if I had a separate vector for each range of values).
What you may want to do is create four random vectors, one for each category, ensure that they do not contain any two consecutive equal values, and then build your final random vector by ordered picking of values from random categories, i.e.
%# make a 50-by-nCategories array of random numbers
categories = [1,16,31,46;15,30,45,60]; %# category min/max
nCategories = size(categories,2);
randomCategories = zeros(50,nCategories);
for c=1:nCategories
%# draw 100 numbers for good measure
tmp = randi(categories(:,c).',[100 1]); %# [min max] passed as a row vector
tmp([false; diff(tmp)==0]) = []; %# remove consecutive duplicates
%# store
randomCategories(:,c) = tmp(1:50);
end
%# select from which bins to pick. Use half the numbers, so that we don't force the
%# numbers of entries per category to be exactly equal
bins = randi(nCategories,[100,1]);
%# combine the output, i.e. replace e.g. the numbers
%# '3' in 'bins' with the consecutive entries
%# from the third category
out = zeros(100,1);
for c = 1:nCategories
cIdx = find(bins==c);
out(cIdx) = randomCategories(1:length(cIdx),c);
end
First we assign each element the bin number of the range it falls into:
[~,bins] = histc(randvec, [1 16 31 46 61]);
Next we loop over the ranges and find the elements in each category. For example, for the first range of 1-15, we get:
>> ind = find(bins==1); %# bin#1 of 1-15
>> x = randvec(ind)
x =
2
14
6
12
7
7
now you can apply the same process of removing consecutive duplicates:
>> idx = ([1;diff(x)] == 0)
idx =
0
0
0
0
0
1
>> problematicIndices = ind(idx) %# indices into the vector: randvec
Do this for all ranges, and collect those problematic indices. Next decide how you want to deal with them (remove them, generate other numbers in their place, etc...)
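Put together, the per-range steps above might look like the following sketch, which collects the problematic indices for all four ranges of the sample vector from the question. The loop structure and the guard for nearly empty bins are assumptions made for illustration.

```matlab
randvec = [54 47 52 26 39 2 14 51 24 6 19 56 34 46 12 7 41 18 29 7].';

% Bin 1 is 1-15, bin 2 is 16-30, bin 3 is 31-45, bin 4 is 46-60
[~, bins] = histc(randvec, [1 16 31 46 61]);

problematicIndices = [];
for b = 1:4
    ind = find(bins == b);            % positions of this range in randvec
    if numel(ind) < 2
        continue                      % nothing to compare in this range
    end
    x = randvec(ind);                 % values of this range, in order
    idx = [false; diff(x) == 0];      % same value as its predecessor in range
    problematicIndices = [problematicIndices; ind(idx)];
end
problematicIndices = sort(problematicIndices);
disp(problematicIndices)   % 20, the second 7
```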
If I understand your problem correctly, I think this is one solution. It uses unique, but applies it to each of the subranges of the vector. The values that are duplicated within a range of indices are identified so you can deal with them.
cat_inds = [1,16,31,46,60]; % need to include last element
for i=2:numel(cat_inds)
randvec_part = randvec( cat_inds(i-1):cat_inds(i) );
% Find the indices for the first unique elements in this part of the array
[~,uniqInds] = unique(randvec_part,'first');
% this binary vector identifies the indices that are duplicated in
% this part of randvec
%
% NB: they are indices into randvec_part
%
inds_of_duplicates = ~ismember(1:numel(randvec_part), uniqInds);
% code to deal with the problem indices goes here. Modify randvec_part accordingly...
% Write it back to the original vector (assumes that the length is the same)
randvec( cat_inds(i-1):cat_inds(i) ) = randvec_part;
end
Here's a different approach from what everyone else has been tossing up. The premise I'm working on here is that you want a random arrangement of values in a vector without repetition. I'm not sure what other constraints you are applying prior to the point where we are giving our input.
My thought is to use the randperm function.
Here's some sample code how it would work:
%randvec is your vector of random values
randvec2 = unique(randvec); % This will return the sorted list of values from randvec.
randomizedvector = randvec2(randperm(length(randvec2)));
% Note: if randvec is multidimensional you'll have to use numel instead of length
At this point randomizedvector should contain all the unique values from the initial randvec, but 'shuffled' or re-randomized after the unique function call. Alternatively, you could generate randvec differently and avoid the unique call altogether, since calling randperm(n) returns a randomized vector with the values 1 to n.
Just an off the wall 2 cents there =P enjoy!