sorting with explicit tie (repeated elements) resolution - matlab

By default, MATLAB's sort function deals with ties/repeated elements by preserving the order of the elements, that is
>> [srt,idx] = sort([1 0 1])
srt =
0 1 1
idx =
2 1 3
Note that the two elements with value 1 in the input arbitrarily get assigned index 2 and 3, respectively. idx = [3 1 2], however, would be an equally valid sort.
I would like a function [srt,all_idx] = sort_ties(in) that explicitly returns all possible values for idx that are consistent with the sorted output. Of course this would only happen in the case of ties or repeated elements, and all_idx would be dimension nPossibleSorts x length(in).
I got started on a recursive algorithm for doing this, but quickly realized that things were getting out of hand and someone must have solved this before! Any suggestions?

I had a similar idea to what R. M. suggested. However, this solution is generalized to handle any number of repeated elements in the input vector. The code first sorts the input (using the function SORT), then loops over each unique value to generate all the permutations of the indices for that value (using the function PERMS), storing the results in a cell array. Then these index permutations for each individual value are combined into the total number of permutations for the sorted index by replicating them appropriately with the functions KRON and REPMAT:
function [srt,all_idx] = sort_ties(in,varargin)
[srt,idx] = sort(in,varargin{:});
uniqueValues = srt(logical([1 diff(srt)]));
nValues = numel(uniqueValues);
if nValues == numel(srt)
all_idx = idx;
return
end
permCell = cell(1,nValues);
for iValue = 1:nValues
valueIndex = idx(srt == uniqueValues(iValue));
if numel(valueIndex) == 1
permCell{iValue} = valueIndex;
else
permCell{iValue} = perms(valueIndex);
end
end
nPerms = cellfun('size',permCell,1);
for iValue = 1:nValues
N = prod(nPerms(1:iValue-1));
M = prod(nPerms(iValue+1:end));
permCell{iValue} = repmat(kron(permCell{iValue},ones(N,1)),M,1);
end
all_idx = [permCell{:}];
end
And here are some sample results:
>> [srt,all_idx] = sort_ties([0 2 1 2 2 1])
srt =
0 1 1 2 2 2
all_idx =
1 6 3 5 4 2
1 3 6 5 4 2
1 6 3 5 2 4
1 3 6 5 2 4
1 6 3 4 5 2
1 3 6 4 5 2
1 6 3 4 2 5
1 3 6 4 2 5
1 6 3 2 4 5
1 3 6 2 4 5
1 6 3 2 5 4
1 3 6 2 5 4

Consider the example A=[1,2,3,2,5,6,2]. You want to find the indices where 2 occurs, and get all possible permutations of those indices.
For the first step, use unique in combination with histc to find the repeated element and the indices where it occurs.
uniqA=unique(A);
B=histc(A,uniqA);
You get B=[1 3 1 1 1]. Now you know which value in uniqA is repeated and how many times. To get the indices,
repeatIndices=find(A==uniqA(B==max(B)));
which gives the indices as [2, 4, 7]. Lastly, for all possible permutations of these indices, use the perms function.
perms(repeatIndices)
ans =
7 4 2
7 2 4
4 7 2
4 2 7
2 4 7
2 7 4
I believe this does what you wanted. You can write a wrapper function around all this so that you have something compact like out=sort_ties(in). You probably should include a conditional around the repeatIndices line, so that if B is all ones, you don't proceed any further (i.e., there are no ties).

Here is a possible solution I believe to be correct, but it's somewhat inefficient because of the duplicates it generates initially. It's pretty neat otherwise, but I still suspect it can be done better.
function [srt,idx] = tie_sort(in,order)
L = length(in);
[srt,idx] = sort(in,order);
for j = 1:L-1 % for each position in sorted array, look for repeats following it
for k = j+1:L
% if repeat found, add possible permutations to the list of possible sorts
if srt(j) == srt(k)
swapped = 1:L; swapped(j) = k; swapped(k) = j;
add_idx = idx(:,swapped);
idx = cat(1,idx,add_idx);
idx = unique(idx,'rows'); % remove identical copies
else % because already sorted, know don't have to keep looking
break;
end
end
end

Related

How to split a matrix based on how close the values are?

Suppose I have a matrix A:
A = [1 2 3 6 7 8];
I would like to split this matrix into sub-matrices based on how relatively close the numbers are. For example, the above matrix must be split into:
B = [1 2 3];
C = [6 7 8];
I understand that I need to define some sort of criteria for this grouping so I thought I'd take the absolute difference of the number and its next one, and define a limit upto which a number is allowed to be in a group. But the problem is that I cannot fix a static limit on the difference since the matrices and sub-matrices will be changing.
Another example:
A = [5 11 6 4 4 3 12 30 33 32 12];
So, this must be split into:
B = [5 6 4 4 3];
C = [11 12 12];
D = [30 33 32];
Here, the matrix is split into three parts based on how close the values are. So the criteria for this matrix is different from the previous one though what I want out of each matrix is the same, to separate it based on the closeness of its numbers. Is there any way I can specify a general set of conditions to make the criteria dynamic rather than static?
I'm afraid, my answer comes too late for you, but maybe future readers with a similar problem can profit from it.
In general, your problem calls for cluster analysis. Nevertheless, maybe there's a simpler solution to your actual problem. Here's my approach:
First, sort the input A.
To find a criterion to distinguish between "intraclass" and "interclass" elements, I calculate the differences between adjacent elements of A, using diff.
Then, I calculate the median over all these differences.
Finally, I find the indices for all differences, which are greater or equal than three times the median, with a minimum difference of 1. (Depending on the actual data, this might be modified, e.g. using mean instead.) These are the indices, where you will have to "split" the (sorted) input.
At last, I set up two vectors with the starting and end indices for each "sub-matrix", to use this approach using arrayfun to get a cell array with all desired "sub-matrices".
Now, here comes the code:
% Sort input, and calculate differences between adjacent elements
AA = sort(A);
d = diff(AA);
% Calculate median over all differences
m = median(d);
% Find indices with "significantly higher difference",
% e.g. greater or equal than three times the median
% (minimum difference should be 1)
idx = find(d >= max(1, 3 * m));
% Set up proper start and end indices
start_idx = [1 idx+1];
end_idx = [idx numel(A)];
% Generate cell array with desired vectors
out = arrayfun(#(x, y) AA(x:y), start_idx, end_idx, 'UniformOutput', false)
Due to the unknown number of possible vectors, I can't think of way to "unpack" these to individual variables.
Some tests:
A =
1 2 3 6 7 8
out =
{
[1,1] =
1 2 3
[1,2] =
6 7 8
}
A =
5 11 6 4 4 3 12 30 33 32 12
out =
{
[1,1] =
3 4 4 5 6
[1,2] =
11 12 12
[1,3] =
30 32 33
}
A =
1 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 3
out =
{
[1,1] =
1 1 1 1 1 1 1
[1,2] =
2 2 2 2 2 2
[1,3] =
3 3 3 3 3 3 3
}
Hope that helps!

Remove single elements from a vector

I have a vector M containing single elements and repeats. I want to delete all the single elements. Turning something like [1 1 2 3 4 5 4 4 5] to [1 1 4 5 4 4 5].
I thought I'd try to get the count of each element then use the index to delete what I don't need, something like this:
uniq = unique(M);
list = [uniq histc(M,uniq)];
Though I'm stuck here and not sure how to go forward. Can anyone help?
Here is a solution using unique, histcounts and ismember:
tmp=unique(M) ; %finding unique elements of M
%Now keeping only those elements in tmp which appear only once in M
tmp = tmp(histcounts(M,[tmp tmp(end)])==1); %Thanks to rahnema for his insight on this
[~,ind] = ismember(tmp,M); %finding the indexes of these elements in M
M(ind)=[];
histcounts was introduced in R2014b. For earlier versions, hist can be used by replacing that line with this:
tmp=tmp(hist(M,tmp)==1);
You can get the result with the following code:
A = [a.', ones(length(a),1)];
[C,~,ic] = unique(A(:,1));
result = [C, accumarray(ic,A(:,2))];
a = A(~ismember(A(:,1),result(result(:,2) == 1))).';
The idea is, add ones to the second column of a', then accumarray base on the first column (elements of a). After that, found the elements in first column which have accum sum in the second column. Therefore, these elements repeated once in a. Finally, removing them from the first column of A.
Here is a cheaper alternative:
[s ii] = sort(a);
x = [false s(2:end)==s(1:end-1)]
y = [x(2:end)|x(1:end-1) x(end)]
z(ii) = y;
result = a(z);
Assuming the input is
a =
1 1 8 8 3 1 4 5 4 6 4 5
we sort the list s and get index of the sorted list ii
s=
1 1 1 3 4 4 4 5 5 6 8 8
we can find index of repeated elements and for it we check if an element is equal to the previous element
x =
0 1 1 0 0 1 1 0 1 0 0 1
however in x the first elements of each block is omitted to find it we can apply [or] between each element with the previous element
y =
1 1 1 0 1 1 1 1 1 0 1 1
we now have sorted logical index of repeated elements. It should be reordered to its original order. For it we use index of sorted elements ii :
z =
1 1 1 1 0 1 1 1 1 0 1 1
finally use z to extract only the repeated elements.
result =
1 1 8 8 1 4 5 4 4 5
Here is a result of a test in Octave* for the following input:
a = randi([1 100000],1,10000000);
-------HIST--------
Elapsed time is 5.38654 seconds.
----ACCUMARRAY------
Elapsed time is 2.62602 seconds.
-------SORT--------
Elapsed time is 1.83391 seconds.
-------LOOP--------
Doesn't complete in 15 seconds.
*Since in Octave histcounts hasn't been implemented so instead of histcounts I used hist.
You can test it Online
X = [1 1 2 3 4 5 4 4 5];
Y = X;
A = unique(X);
for i = 1:length(A)
idx = find(X==A(i));
if length(idx) == 1
Y(idx) = NaN;
end
end
Y(isnan(Y)) = [];
Then, Y would be [1 1 4 5 4 4 5]. It detects all single elements, and makes them as NaN, and then remove all NaN elements from the vector.

Matlab returning matrix with value of element's rank in descending order [duplicate]

Hi I need to sort a vector and assign a ranking for the corresponding sorting order. I'm using sort function [sortedValue_X , X_Ranked] = sort(X,'descend');
but the problem is it assigns different ranks for the same values (zeros).
i.e. x = [ 13 15 5 5 0 0 0 1 0 3] and I want zeros to take the same last rank which is 6 and fives needs to share the 3rd rank etc..
any suggestions?
The syntax [sortedValues, sortedIndexes] = sort(x, 'descend') does not return rank as you describe it. It returns the indexes of the sorted values. This is really useful if you want to use the sort order from one array to rearrange another array.
As suggested by #user1860611, unique seems to do what you want, using the third output as follows:
x = [ 13 15 5 5 0 0 0 1 0 3];
[~, ~, forwardRank] = unique(x);
%Returns
%forwardRank =
% 5 6 4 4 1 1 1 2 1 3
To get the order you want (decending) you'll need to reverse the order, like this:
reverseRank = max(forwardRank) - forwardRank + 1
%Returns
%reverseRank =
% 2 1 3 3 6 6 6 5 6 4
You may be done at this point. But you may want to sort these into the into an acsending order. This is a reorder of the reverseRank vector which keeping it in sync with the original x vector, which is exactly what the 2nd argument of sort is desined to help with. So we can do something like this:
[xSorted, ixsSort] = sort(x, 'descend'); %Perform a sort on x
reverseRankSorted = reverseRank(ixsSort); %Apply that sort to reverseRank
Which generates:
xSorted = 15 13 5 5 3 1 0 0 0 0
reverseRankSorted = 1 2 3 3 4 5 6 6 6 6
tiedrank.m might be the thing you are looking for.
>> x = round(rand(1,5)*10)
x =
8 7 3 10 0
>> tiedrank(x)
ans =
4 3 2 5 1

Order of elements in cell array constructed by accumarray

What I'm trying to do: given a 2D matrix, get the column indices of the elements in each row that satisfy some particular condition.
For example, say my matrix is
M = [16 2 3 13; 5 11 10 8; 9 7 6 12; 4 14 15 1]
and my condition is M>6. Then my desired output would be something like
Indices = {[1 4]'; [2 3 4]'; [1 2 4]'; [2 3]';}
After reading the answers to this similar question I came up with this partial solution using find and accumarray:
[ix, iy] = find(M>6);
Indices = accumarray(ix,iy,[],#(iy){iy});
This gives very nearly the results I want -- in fact, the indices are all right, but they're not ordered the way I expected. For example, Indices{2} = [2 4 3]' instead of [2 3 4]', and I can't understand why. There are 3 occurrences of 2 in ix, at indices 3, 6, and 9. The corresponding values of iy at those indices are 2, 3, and 4, in that order. What exactly is creating the observed order? Is it just arbitrary? Is there a way to force it to be what I want, other than sorting each element of Indices afterwards?
Here's one way to solve it with arrayfun -
idx = arrayfun(#(x) find(M(x,:)>6),1:size(M,1),'Uni',0)
Display output wtih celldisp(idx) -
idx{1} =
1 4
idx{2} =
2 3 4
idx{3} =
1 2 4
idx{4} =
2 3
To continue working with accumarray, you can wrap iy with sort to get your desired output which doesn't look too pretty maybe -
Indices = accumarray(ix,iy,[],#(iy){sort(iy)})
Output -
>> celldisp(Indices)
Indices{1} =
1
4
Indices{2} =
2
3
4
Indices{3} =
1
2
4
Indices{4} =
2
3
accumarray is not guaranteed to preserve order of each chunk of its second input (see here, and also here). However, it does seem to preserve it when the first input is already sorted:
[iy, ix] = find(M.'>6); %'// transpose and reverse outputs, to make ix sorted
Indices = accumarray(ix,iy,[],#(iy){iy}); %// this line is the same as yours
produces
Indices{1} =
1
4
Indices{2} =
2
3
4
Indices{3} =
1
2
4
Indices{4} =
2
3

sort in matlab and assign ranking

Hi I need to sort a vector and assign a ranking for the corresponding sorting order. I'm using sort function [sortedValue_X , X_Ranked] = sort(X,'descend');
but the problem is it assigns different ranks for the same values (zeros).
i.e. x = [ 13 15 5 5 0 0 0 1 0 3] and I want zeros to take the same last rank which is 6 and fives needs to share the 3rd rank etc..
any suggestions?
The syntax [sortedValues, sortedIndexes] = sort(x, 'descend') does not return rank as you describe it. It returns the indexes of the sorted values. This is really useful if you want to use the sort order from one array to rearrange another array.
As suggested by #user1860611, unique seems to do what you want, using the third output as follows:
x = [ 13 15 5 5 0 0 0 1 0 3];
[~, ~, forwardRank] = unique(x);
%Returns
%forwardRank =
% 5 6 4 4 1 1 1 2 1 3
To get the order you want (decending) you'll need to reverse the order, like this:
reverseRank = max(forwardRank) - forwardRank + 1
%Returns
%reverseRank =
% 2 1 3 3 6 6 6 5 6 4
You may be done at this point. But you may want to sort these into the into an acsending order. This is a reorder of the reverseRank vector which keeping it in sync with the original x vector, which is exactly what the 2nd argument of sort is desined to help with. So we can do something like this:
[xSorted, ixsSort] = sort(x, 'descend'); %Perform a sort on x
reverseRankSorted = reverseRank(ixsSort); %Apply that sort to reverseRank
Which generates:
xSorted = 15 13 5 5 3 1 0 0 0 0
reverseRankSorted = 1 2 3 3 4 5 6 6 6 6
tiedrank.m might be the thing you are looking for.
>> x = round(rand(1,5)*10)
x =
8 7 3 10 0
>> tiedrank(x)
ans =
4 3 2 5 1