Matlab returning matrix with value of element's rank in descending order [duplicate] - matlab

Hi I need to sort a vector and assign a ranking for the corresponding sorting order. I'm using sort function [sortedValue_X , X_Ranked] = sort(X,'descend');
but the problem is it assigns different ranks for the same values (zeros).
For example, with x = [ 13 15 5 5 0 0 0 1 0 3] I want the zeros to share the last rank, which is 6, the fives to share the 3rd rank, and so on.
any suggestions?

The syntax [sortedValues, sortedIndexes] = sort(x, 'descend') does not return rank as you describe it. It returns the indexes of the sorted values. This is really useful if you want to use the sort order from one array to rearrange another array.
As suggested by @user1860611, unique seems to do what you want, using the third output as follows:
x = [ 13 15 5 5 0 0 0 1 0 3];
[~, ~, forwardRank] = unique(x);
%Returns
%forwardRank =
% 5 6 4 4 1 1 1 2 1 3
To get the order you want (descending), you'll need to reverse the ranks, like this:
reverseRank = max(forwardRank) - forwardRank + 1
%Returns
%reverseRank =
% 2 1 3 3 6 6 6 5 6 4
You may be done at this point. But you may also want to sort these ranks into ascending order. That means reordering the reverseRank vector while keeping it in sync with the original x vector, which is exactly what the second output of sort is designed to help with. So we can do something like this:
[xSorted, ixsSort] = sort(x, 'descend'); %Perform a sort on x
reverseRankSorted = reverseRank(ixsSort); %Apply that sort to reverseRank
Which generates:
xSorted = 15 13 5 5 3 1 0 0 0 0
reverseRankSorted = 1 2 3 3 4 5 6 6 6 6
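The same descending ranks can also be obtained from a single sort, without unique. This is only a minimal sketch, not part of the answer above: the names order, rankSorted and denseRank are made up for illustration, and x is assumed to be a non-empty row vector without NaN values.

```matlab
x = [13 15 5 5 0 0 0 1 0 3];

% Sort in descending order and remember where each value came from.
[xSorted, order] = sort(x, 'descend');

% Start at rank 1 and move to the next rank only when the sorted value changes.
rankSorted = cumsum([1, diff(xSorted) ~= 0]);

% Scatter the ranks back to the original positions of x.
denseRank = zeros(size(x));
denseRank(order) = rankSorted;
```

For the vector in the question this should give the same values as reverseRank, with the zeros sharing rank 6 and the fives sharing rank 3.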

tiedrank.m might be the thing you are looking for.
>> x = round(rand(1,5)*10)
x =
8 7 3 10 0
>> tiedrank(x)
ans =
4 3 2 5 1
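Note that tiedrank (Statistics Toolbox) ranks in ascending order and gives tied values the average of the ranks they span, so on the vector from the question the zeros would not get rank 6. The sketch below reproduces that averaging with base functions only, to show the behaviour; it is not how tiedrank itself is implemented, and the variable names are hypothetical.

```matlab
x = [13 15 5 5 0 0 0 1 0 3];
n = numel(x);

% Position of every element in the ascending sort (ties get consecutive positions).
[~, order] = sort(x(:));
pos = zeros(n, 1);
pos(order) = (1:n).';

% Group equal values and average the positions inside each group.
[~, ~, grp] = unique(x(:));
avgPerValue = accumarray(grp, pos, [], @mean);

% Map the averaged rank back onto every element, in the shape of x.
avgRank = reshape(avgPerValue(grp), size(x));
```

Here the four zeros occupy positions 1 to 4 and so share rank 2.5, while the two fives share rank 7.5.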

Related

Remove single elements from a vector

I have a vector M containing single elements and repeats. I want to delete all the single elements. Turning something like [1 1 2 3 4 5 4 4 5] to [1 1 4 5 4 4 5].
I thought I'd try to get the count of each element then use the index to delete what I don't need, something like this:
uniq = unique(M);
list = [uniq histc(M,uniq)];
Though I'm stuck here and not sure how to go forward. Can anyone help?
Here is a solution using unique, histcounts and ismember:
tmp=unique(M) ; %finding unique elements of M
%Now keeping only those elements in tmp which appear only once in M
tmp = tmp(histcounts(M,[tmp tmp(end)])==1); %Thanks to rahnema for his insight on this
[~,ind] = ismember(tmp,M); %finding the indexes of these elements in M
M(ind)=[];
histcounts was introduced in R2014b. For earlier versions, hist can be used by replacing that line with this:
tmp=tmp(hist(M,tmp)==1);
You can get the result with the following code:
A = [a.', ones(length(a),1)];
[C,~,ic] = unique(A(:,1));
result = [C, accumarray(ic,A(:,2))];
a = A(~ismember(A(:,1),result(result(:,2) == 1))).';
The idea is to put a column of ones next to a.', then use accumarray to sum those ones for each distinct value in the first column (the elements of a). The values whose sum is 1 occur exactly once in a, so they are removed from the first column of A.
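The counting idea can also be written as a logical mask applied directly to the vector. This is a minimal sketch under the assumption that M is a non-empty numeric vector; counts, keep and result are illustrative names, not taken from the answers here.

```matlab
M = [1 1 2 3 4 5 4 4 5];

% ic maps every element of M to the index of its value in unique(M).
[~, ~, ic] = unique(M);
ic = ic(:);

% Number of occurrences of each distinct value.
counts = accumarray(ic, 1);

% Keep only elements whose value occurs more than once.
keep = reshape(counts(ic) > 1, size(M));
result = M(keep);
```

Because the mask has the same shape as M, the original order of the remaining elements is preserved.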
Here is a cheaper alternative:
[s ii] = sort(a);
x = [false s(2:end)==s(1:end-1)]
y = [x(2:end)|x(1:end-1) x(end)]
z(ii) = y;
result = a(z);
Assuming the input is
a =
1 1 8 8 3 1 4 5 4 6 4 5
We sort the list to get s, together with the sort index ii:
s=
1 1 1 3 4 4 4 5 5 6 8 8
To find repeated elements, we check whether each element of s is equal to the previous one:
x =
0 1 1 0 0 1 1 0 1 0 0 1
However, x misses the first element of each block of repeats. To include it, we OR each element of x with the next one:
y =
1 1 1 0 1 1 1 1 1 0 1 1
We now have a logical index of the repeated elements in sorted order. To put it back into the original order, we assign it through the sort index ii:
z =
1 1 1 1 0 1 1 1 1 0 1 1
Finally, we use z to extract only the repeated elements:
result =
1 1 8 8 1 4 5 4 4 5
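The walkthrough above can be sketched as a self-contained script in which the two comparisons are built from diff and the mask is preallocated as logical. The names sameAsNext, sameAsPrev and keep are illustrative, and a is assumed to be a non-empty row vector.

```matlab
a = [1 1 8 8 3 1 4 5 4 6 4 5];

[s, ii] = sort(a);

% An element of s is repeated if it equals its right or its left neighbour.
isSame = diff(s) == 0;
sameAsNext = [isSame, false];
sameAsPrev = [false, isSame];
keepSorted = sameAsNext | sameAsPrev;

% Undo the sort: preallocating as logical keeps the mask usable as an index
% even if a variable with the same name already exists in the workspace.
keep = false(size(a));
keep(ii) = keepSorted;

result = a(keep);
```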
Here is a result of a test in Octave* for the following input:
a = randi([1 100000],1,10000000);
-------HIST--------
Elapsed time is 5.38654 seconds.
----ACCUMARRAY------
Elapsed time is 2.62602 seconds.
-------SORT--------
Elapsed time is 1.83391 seconds.
-------LOOP--------
Doesn't complete in 15 seconds.
*Since in Octave histcounts hasn't been implemented so instead of histcounts I used hist.
X = [1 1 2 3 4 5 4 4 5];
Y = X;
A = unique(X);
for i = 1:length(A)
    idx = find(X==A(i));
    if length(idx) == 1
        Y(idx) = NaN;
    end
end
Y(isnan(Y)) = [];
Then, Y would be [1 1 4 5 4 4 5]. The loop detects every element that occurs only once and marks it as NaN; all NaN elements are then removed from the vector.

Straighten and concatenate the individual grids from ndgrid [duplicate]

This question already has answers here:
Generate a matrix containing all combinations of elements taken from n vectors
(4 answers)
Closed 6 years ago.
I'm trying to do the following in a general way:
x = {0:1, 2:3, 4:6};
[a,b,c] = ndgrid(x{:});
Res = [a(:), b(:), c(:)]
Res =
0 2 4
1 2 4
0 3 4
1 3 4
0 2 5
1 2 5
0 3 5
1 3 5
0 2 6
1 2 6
0 3 6
1 3 6
I believe I have to start the following way, but I can't figure out how to continue:
cell_grid = cell(1,numel(x));
[cell_grid{:}] = ndgrid(x{:});
[cell_grid{:}]
ans =
ans(:,:,1) =
0 0 2 3 4 4
1 1 2 3 4 4
ans(:,:,2) =
0 0 2 3 5 5
1 1 2 3 5 5
ans(:,:,3) =
0 0 2 3 6 6
1 1 2 3 6 6
I can solve this in many ways for the case with three variables [a, b, c], both with and without loops, but I start to struggle when I get more vectors. Reshaping it directly does not give the correct result, and mixing reshape with permute becomes really hard when I have an arbitrary number of dimensions.
Can you think of a clever way to do this that scales to 3-30 vectors in x?
You can use cellfun to flatten each of the cell array elements and then concatenate them along the second dimension.
tmp = cellfun(@(x)x(:), cell_grid, 'uniformoutput', false);
out = cat(2, tmp{:})
Alternately, you could avoid cellfun and concatenate them along the dimension that is one higher than your dimension of each cell_grid member (i.e. numel(x) + 1). Then reshape to flatten all dimensions but the last one you just concatenated along.
out = reshape(cat(numel(x) + 1, cell_grid{:}), [], numel(x));
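Put together with the setup from the question, the concatenate-and-reshape approach can be sketched end to end as follows. The variable names n, grids and Res are chosen for illustration only.

```matlab
x = {0:1, 2:3, 4:6};
n = numel(x);

% One output grid per input vector, collected in a cell array.
grids = cell(1, n);
[grids{:}] = ndgrid(x{:});

% Stack the n grids along dimension n+1, then flatten the first n dimensions
% so that each grid becomes one column.
Res = reshape(cat(n + 1, grids{:}), [], n);

% The number of rows is the product of the vector lengths.
nRows = prod(cellfun(@numel, x));
```

Since the row count is the product of all vector lengths, it grows very quickly as more vectors are added, so it may be worth checking nRows before building the grids for a large x.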

Extracting unique values

I have data in two columns that looks as follows:
A B
1,265848208 3
-0,608043611 0
-0,285735893 0
0,006895134 7
0 7
-0,004526196 7
0,176326617 10
-0,159688071 2
0,22439945 2
-0,991045044 1
0,178022324 1
-0,270967397 4
0,285849994 4
1,881705539 23
1,057184204 10
NaN 10
For each unique value in B, I want to extract the corresponding values in column A and move them to a new matrix. I then want to compute the mean of those values in A and use it as the dependent variable (weighted by the number of observations per value in B) in a regression, with the common value of B as the independent variable, to reduce noise. Any help on how to do this in Matlab (except running the regression) would be great!
Thanks
Oscar
Here is an efficient solution:
X = [
1.265848208 3
-0.608043611 0
-0.285735893 0
0.006895134 7
0 7
-0.004526196 7
0.176326617 10
-0.159688071 2
0.22439945 2
-0.991045044 1
0.178022324 1
-0.270967397 4
0.285849994 4
1.881705539 23
1.057184204 10
NaN 10
];
%# unique values in B, and their indices
[valB,~,subs] = unique(X(:,2));
%# values of A for each unique number in B (cellarray)
valA = accumarray(subs, X(:,1), [], @(x) {x});
%# mean of each group
meanValA = cellfun(@nanmean, valA)
%# perform regression here...
The result:
%# B values, mean of corresponding values in A, number of A values
>> [valB meanValA cellfun(@numel,valA)]
ans =
0 -0.44689 2
1 -0.40651 2
2 0.032356 2
3 1.2658 1
4 0.0074413 2
7 0.00078965 3
10 0.61676 3
23 1.8817 1
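Note that in the table above the count for B = 10 is 3, which includes the NaN row, while the mean is taken over the two non-NaN values only. If the regression weights should count only the observations that actually enter each mean, the sums and counts can be accumulated directly. The sketch below is one possible way to do that, using only the rows of the data with B equal to 7 or 10; the names ok, sumA, nA and meanA are illustrative.

```matlab
A = [0.006895134; 0; -0.004526196; 0.176326617; 1.057184204; NaN];
B = [7; 7; 7; 10; 10; 10];

[valB, ~, subs] = unique(B);
nGroups = numel(valB);

% Ignore rows where A is NaN, both in the sums and in the counts.
ok = ~isnan(A);
sumA = accumarray(subs(ok), A(ok), [nGroups 1]);
nA = accumarray(subs(ok), 1, [nGroups 1]);

% Group means; a group with no valid observations would give NaN (0/0).
meanA = sumA ./ nA;
```

With these rows, nA is 3 for B = 7 and 2 for B = 10, and meanA agrees with the corresponding means shown in the table.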

sorting with explicit tie (repeated elements) resolution

By default, MATLAB's sort function deals with ties/repeated elements by preserving the order of the elements, that is
>> [srt,idx] = sort([1 0 1])
srt =
0 1 1
idx =
2 1 3
Note that the two elements with value 1 (input positions 1 and 3) appear in idx in their original order, because sort is stable. idx = [2 3 1], however, would be an equally valid sort, since indexing the input with it also gives [0 1 1].
I would like a function [srt,all_idx] = sort_ties(in) that explicitly returns all possible values for idx that are consistent with the sorted output. Of course this would only happen in the case of ties or repeated elements, and all_idx would be dimension nPossibleSorts x length(in).
I got started on a recursive algorithm for doing this, but quickly realized that things were getting out of hand and someone must have solved this before! Any suggestions?
I had a similar idea to what R. M. suggested. However, this solution is generalized to handle any number of repeated elements in the input vector. The code first sorts the input (using the function SORT), then loops over each unique value to generate all the permutations of the indices for that value (using the function PERMS), storing the results in a cell array. Then these index permutations for each individual value are combined into the total number of permutations for the sorted index by replicating them appropriately with the functions KRON and REPMAT:
function [srt,all_idx] = sort_ties(in,varargin)
    [srt,idx] = sort(in,varargin{:});
    uniqueValues = srt(logical([1 diff(srt)]));
    nValues = numel(uniqueValues);
    if nValues == numel(srt)
        all_idx = idx;
        return
    end
    permCell = cell(1,nValues);
    for iValue = 1:nValues
        valueIndex = idx(srt == uniqueValues(iValue));
        if numel(valueIndex) == 1
            permCell{iValue} = valueIndex;
        else
            permCell{iValue} = perms(valueIndex);
        end
    end
    nPerms = cellfun('size',permCell,1);
    for iValue = 1:nValues
        N = prod(nPerms(1:iValue-1));
        M = prod(nPerms(iValue+1:end));
        permCell{iValue} = repmat(kron(permCell{iValue},ones(N,1)),M,1);
    end
    all_idx = [permCell{:}];
end
And here are some sample results:
>> [srt,all_idx] = sort_ties([0 2 1 2 2 1])
srt =
0 1 1 2 2 2
all_idx =
1 6 3 5 4 2
1 3 6 5 4 2
1 6 3 5 2 4
1 3 6 5 2 4
1 6 3 4 5 2
1 3 6 4 5 2
1 6 3 4 2 5
1 3 6 4 2 5
1 6 3 2 4 5
1 3 6 2 4 5
1 6 3 2 5 4
1 3 6 2 5 4
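The number of rows in all_idx is the product of the factorials of the repeat counts, so it can be checked before any permutations are generated. The following is a small sketch of that count for the sample input; counts and nPossibleSorts are illustrative names (the latter borrowed from the wording of the question).

```matlab
in = [0 2 1 2 2 1];

% How many times each distinct value occurs.
[~, ~, ic] = unique(in);
counts = accumarray(ic(:), 1);

% Each group of k tied elements can be ordered in k! ways.
nPossibleSorts = prod(factorial(counts));
```

For this input the counts are 1, 2 and 3, giving 1! * 2! * 3! = 12, which matches the 12 rows listed above. The count grows factorially with the size of each tie group, so long runs of equal values quickly become impractical to enumerate.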
Consider the example A=[1,2,3,2,5,6,2]. You want to find the indices where 2 occurs, and get all possible permutations of those indices.
For the first step, use unique in combination with histc to find the repeated element and the indices where it occurs.
uniqA=unique(A);
B=histc(A,uniqA);
You get B=[1 3 1 1 1]. Now you know which value in uniqA is repeated and how many times. To get the indices,
repeatIndices=find(A==uniqA(B==max(B)));
which gives the indices as [2, 4, 7]. Lastly, for all possible permutations of these indices, use the perms function.
perms(repeatIndices)
ans =
7 4 2
7 2 4
4 7 2
4 2 7
2 4 7
2 7 4
I believe this does what you wanted. You can write a wrapper function around all this so that you have something compact like out=sort_ties(in). You probably should include a conditional around the repeatIndices line, so that if B is all ones, you don't proceed any further (i.e., there are no ties).
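A sketch of the steps above with the suggested guard for the no-ties case might look like this. Two things differ from the snippets in this answer and are assumptions of the sketch: the counts come from accumarray on the third output of unique rather than from histc, and find(..., 1) picks the first value when several values share the maximum count (otherwise the comparison with A would fail on mismatched sizes). The names tiePerms and mostRepeated are made up.

```matlab
A = [1 2 3 2 5 6 2];

% Distinct values and how often each one occurs.
[uniqA, ~, ic] = unique(A);
B = accumarray(ic(:), 1).';

if all(B == 1)
    % No repeated elements, so there is nothing to permute.
    tiePerms = [];
else
    % First value with the highest count, and the positions where it occurs.
    mostRepeated = uniqA(find(B == max(B), 1));
    repeatIndices = find(A == mostRepeated);
    tiePerms = perms(repeatIndices);
end
```

Like the answer it follows, this handles only the single most frequent value; inputs with several different repeated values need the permutations of each group to be combined.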
Here is a possible solution I believe to be correct, but it's somewhat inefficient because of the duplicates it generates initially. It's pretty neat otherwise, but I still suspect it can be done better.
function [srt,idx] = tie_sort(in,order)
L = length(in);
[srt,idx] = sort(in,order);
for j = 1:L-1 % for each position in sorted array, look for repeats following it
    for k = j+1:L
        % if repeat found, add possible permutations to the list of possible sorts
        if srt(j) == srt(k)
            swapped = 1:L; swapped(j) = k; swapped(k) = j;
            add_idx = idx(:,swapped);
            idx = cat(1,idx,add_idx);
            idx = unique(idx,'rows'); % remove identical copies
        else % because already sorted, know don't have to keep looking
            break;
        end
    end
end