Matlab find unique column-combinations in matrix and respective index

I have a large matrix with multiple rows and a limited (but larger than 1) number of columns containing values between 0 and 9, and would like to find an efficient way to identify unique row-wise combinations and their indices to then build sums (somewhat like a pivot logic). Here is an example of what I am trying to achieve:
a =
1 2 3
2 2 3
3 2 1
1 2 3
3 2 1
uniqueCombs =
1 2 3
2 2 3
3 2 1
numOccurrences =
2
1
2
indices =
[1;4]
[2]
[3;5]
From matrix a, I want to first identify the unique combinations (row-wise), then count the number of occurrences and identify the row indices of each combination.
I have achieved this through generating strings with num2str and strcat, but this method appears to be very slow. Along these thoughts I have tried to find a way to form a new unique number through concatenating the values horizontally, but Matlab does not seem to support this (e.g. from [1;2;3] build 123). Sums won't work because they would remove the possibility to identify unique combinations. Any suggestions on how to best achieve this? Thanks!

To get the unique rows you can use unique with the 'rows' option enabled:
[C, ix, ic] = unique(a, 'rows', 'stable');
C contains the unique rows; ix holds the index of the first occurrence of each unique row in a; ic maps every row of a to its row in C (so a equals C(ic,:)), which is basically the information you want. To access it you can loop over the unique rows and save the matching row indexes in a cell array:
indexes = cell(1, length(ix));
for k = 1:length(ix)
    indexes{k} = find(ic == k);
end
indexes will be a cell array containing the indexes you were looking for. For example:
indexes{1}
% ans =
%
% 1
% 4
And to count the occurrences of a particular combination you can just use numel. For example:
numel(indexes{1})
% ans =
%
% 2
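If the loop becomes a bottleneck on a large matrix, accumarray can build the counts and the index lists directly from ic. This is only a sketch of an alternative under the question's sample data; the variable names uniqueCombs, numOccurrences and indices are borrowed from the question, and ic(:) is used so the subscripts are a column vector regardless of the orientation unique returns.

```matlab
a = [1 2 3; 2 2 3; 3 2 1; 1 2 3; 3 2 1];

% Unique rows in order of first appearance; ic maps each row of a to a row of uniqueCombs
[uniqueCombs, ~, ic] = unique(a, 'rows', 'stable');

% Count how many rows of a fall into each group
numOccurrences = accumarray(ic(:), 1);

% Collect the row numbers of each group into a cell array
rowNumbers = (1:size(a, 1)).';
indices = accumarray(ic(:), rowNumbers, [], @(v) {sort(v)});
```

indices{k} then holds the rows of a that equal uniqueCombs(k,:), and numOccurrences(k) is the corresponding count.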

Related

Sorting with tiebreaks in matlab recursively

I want to define a recursive function that sorts an input vector and uses a sequence of secondary vectors to break any ties (or randomises them if it runs out of tiebreak vectors)
Given some input vector I and some tiebreaker matrix T, the pseudocode for the algorithm is as follows:
check if T is empty, if so, we reached stopping condition, therefore randomise input
get order of indices for sorted I, using matlab's standard sort function
find indices of duplicate values
for each duplicate value,
call function recursively on T(:,1) with rows corresponding to the indices of that duplicate value, with T(:,2:end)(with appropriate rows) as the new tiebreaker matrix - if empty then this call will just return random indices
fix the order of the sorted indices in the original sorted I
return the sorted I and corresponding indices
Here is what I have so far:
function [vals,idxs] = tiebreak_sort(input, ties)
% if the tiebreak matrix is empty, then return random
if isempty(ties)
idxs = randperm(size(input,1));
vals = input(idxs);
return
end
% sort the input
[vals,idxs] = sort(input);
% check for duplicates
[~,unique_idx] = unique(vals);
dup_idx = setdiff(1:size(vals,1),unique_idx);
% iterate over each duplicate index
for i = 1:numel(dup_idx)
% resolve tiebreak for duplicates
[~,d_order] = tiebreak_sort(ties(input==input(i),1),...
ties(input==input(i),2:end));
% fix the order of sorted indices (THIS IS WHERE I AM STUCK)
idxs(vals==input(i)) = ...?
end
return
I want to find a way to map the output of the recursive call, to the indices in idxs, to fix their order based on the (possibly recursive) tie breaks, but my brain is getting twisted in knots thinking about it..
Can I just use the fact that MATLAB's sort function is stable and preserves the original order, and do it like this?
% find indices of duplicate values
dups = find(input==input(i));
% fix the order of sorted indices
idxs(vals==input(i)) = dups(d_order);
Or will that not work? Is there another way of doing what I am trying to do, in general?
Just to give a concrete example, this would be a sample input:
I = [1 2 2 1 2 2]'
T = [4 1 ;
3 7 ;
3 4 ;
2 2 ;
1 8 ;
5 3 ]
and the output would be:
vals = [1 1 2 2 2 2]'
idxs = [4 1 5 3 2 6]'
Here, there are clearly duplicates in the input, so the function is called recursively on the first column of the tiebreaker matrix, which was able to fix the 1s but it needed a second recursive call on the 3s of the first column to break those ties.
No need to define a function; sortrows already sorts by the first column and uses the following columns as tiebreakers:
[S, idxs] = sortrows([I T]);
vals = S(:,1);
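As a quick check, the sample input from the question can be run through sortrows. The extra rand column is an assumption added here, not part of the answer: it is one possible way to get the random tie-breaking the question asks for once the tiebreak columns run out, since sortrows alone orders fully identical rows deterministically.

```matlab
I = [1 2 2 1 2 2]';
T = [4 1; 3 7; 3 4; 2 2; 1 8; 5 3];

% The last column is random and only matters when I and all columns of T tie
[S, idxs] = sortrows([I, T, rand(numel(I), 1)]);
vals = S(:, 1);
```

For this input no two rows of [I T] are identical, so the random column never decides anything and idxs comes out as [4 1 5 3 2 6]', matching the expected output in the question.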

Find the row/column of all maximum values in each row of a matrix

I want to find the indexes of the maximum value(s) in each row. If a row has more than one maximum, I want to save the indexes of all of them.
For example:
X = [5 6 8
1 2 3
4 4 0];
And I need indexes
inds = [1 3
2 3
3 1
3 2];
I wanted to use the max function, but it only returns one index per row.
You can use max to compute the max for each row and then compare the elements in each row to its row-wise max using bsxfun and eq. Then you can find the row/column positions of these maxima. We use a transpose in there (.') to ensure that we get the ordering of the output that you expect.
[c,r] = find(bsxfun(@eq, X, max(X, [], 2)).')
output = [r,c];
Another way to do it, would be to use max and repmat.
First you find the maximum of each row using
rowMaximum=max(X,[],2);
Then you replicate the maximum so that it has the same dimension as your input and compare it to the input
logicalMaximum=repmat(rowMaximum,1,size(X,2))==X;
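The last thing you want to do is convert this logical array into your desired indexes. find returns the row subscripts first and scans column by column, so transpose the mask to get the result ordered by row:
[columns,rows]=find(logicalMaximum.');
result=[rows,columns];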
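On MATLAB R2016b or later (and in Octave), implicit expansion lets you compare X with its row-wise maximum directly, without bsxfun or repmat. A minimal sketch using the matrix from the question; mask and inds are just illustrative names:

```matlab
X = [5 6 8
     1 2 3
     4 4 0];

% true wherever an element equals the maximum of its own row
mask = (X == max(X, [], 2));

% find scans column by column, so search the transpose to get row-major order;
% the first output is then the column of X and the second the row of X
[c, r] = find(mask.');
inds = [r, c];
```

For this X, inds is [1 3; 2 3; 3 1; 3 2], as requested in the question.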

How to sum aggregate matrix rows in the following way?

I have a matrix of size n-by-3. For some rows of this matrix, the first two columns are identical. I need to keep only one copy of these first-two-element combinations, where the third column will have the sum of 3rd column from rows with identical first-two-columns.
Here's an example of what I want to do:
M = [...
1 2 1
1 2 3
1 2 2
1 2 4
2 3 1
2 3 4
2 3 0];
The final matrix that I need is
R = [...
1 2 1+3+2+4
2 3 1+4+0];
How can this be done? I don't see how I can use the unique command for this.
You may use unique in combination with accumarray. Let's call the initial n x 3 array A:
[C, ~, ic] = unique(A(:,1:2), 'rows');
B = [C, accumarray(ic, A(:,3))];
Explanation:
unique returns not only the unique elements of the array (rows in our case, thanks to the 'rows' argument), but also two index arrays. The first one holds the indexes of the unique rows within A; I discard it since I don't use it. The second one can be used to reconstruct the original array from the output array: A(:,1:2) equals C(ic,:).
accumarray is a generalization of histogram computation: it sums the elements of its 2nd argument for each unique index in its first argument. In your case, the sum is taken over the 3rd column of the original array only.
And that's all in two simple commands!
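Applied to the matrix M from the question, the two commands look like this (a small sketch; ic(:) only guards against unique returning a row vector):

```matlab
M = [1 2 1
     1 2 3
     1 2 2
     1 2 4
     2 3 1
     2 3 4
     2 3 0];

% Group rows by their first two columns
[C, ~, ic] = unique(M(:, 1:2), 'rows');

% Sum the third column within each group and append it to the unique keys
R = [C, accumarray(ic(:), M(:, 3))];
```

R is then [1 2 10; 2 3 5], i.e. 1+3+2+4 and 1+4+0 in the third column.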

All possible combinations such that sum of all numbers is a fixed number

I need to find all possible combinations of numbers 1:8 such that sum of all elements is equal to 8
The combinations need to be arranged in an ascending order.
Eg
1 7
2 2 4
1 3 5
1 2 2 3
1 1 1 1 1 1 1 1
A number can repeat itself, but a combination must not,
i.e. 1 2 2 3 and 2 1 2 3 count as the same combination.
I need the solution in ascending order, so there will be only one possibility for every combination.
I tried a few codes online suggested on Find vector elements that sum up to specific number in MATLAB
VEC = [1:8];
NUM = 8;
n = length(VEC);
finans = zeros(2^n-1,NUM);
for i = 1:(2^n - 1)
ndx = dec2bin(i,n) == '1';
if sum(VEC(ndx)) == NUM
l = length(VEC(ndx));
VEC(ndx)
end
end
but they don't include the possibilities where the numbers repeat.
I found a better approach through recursion and it's more elegant (I like elegant) and faster than my previous attempt (0.00399705213 seconds on my computer).
EDIT: You will need my custom function stretchmat.m that stretches a vector to fit the size of another matrix. Kinda like repmat but stretching the first parameter (see help for details). Very useful!
script.m
% Define function to prepend a variable i to a cell x
cellprepend = @(x,i) {[i x]};
% Execute and time function
tic;
a = allcomb(cellprepend,1,8); % Solution in a
toc;
allcomb.m
function a = allcomb( cellprepend, m, n )
% Add entire block as a combination
a{1} = n;
% Exit recursion if block size 1
if n == 1
return;
end
% Recurse cutting blocks at different segments
for i = m:n/2
b = allcomb(cellprepend,i,n-i);
a = [a cellfun( cellprepend, b, num2cell( stretchmat( i, b ) ) )];
end
end
The idea is simple: testing every candidate that might add up to 8 is exhaustive. If you look for only valid answers, you can do a depth-first search by breaking up the problem into 2 blocks. This can be written recursively as I did above and is somewhat similar to merge sort. The allcomb call takes the block size (n) and finds all the ways of breaking it up into smaller pieces.
We want non-zero pieces, so the first block is at least 1; in the code the loop runs from m to n/2, so the first block is never smaller than the previous piece and never larger than what remains. It then prepends the first block to all the combinations of the second block. By only calling allcomb on one of the blocks, we can ensure that all solutions are unique.
As for the sorting, I'm not quite sure what you mean by ascending. From what I see, you appear to be sorting from the last number in ascending order. Can you confirm? Any sort can be appended to the end of script.m.
EDIT 2/3 Notes
For the permutatively unique case, the code can be found here
Thanks to @Simon for helping me QA the code multiple times
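For readers without stretchmat.m, the same depth-first idea can be sketched without any helper functions. This is not the code from the answer above but an iterative variant written for illustration: it keeps a stack of partial combinations and only ever appends a number that is at least as large as the previous one, so every result is already in ascending order and no combination appears twice. The names N, stack and result are assumptions of this sketch.

```matlab
N = 8;
stack = {[]};     % partial combinations that still need to be extended
result = {};      % finished combinations that sum to N

while ~isempty(stack)
    p = stack{end};          % take the most recent partial combination
    stack(end) = [];
    remaining = N - sum(p);
    if remaining == 0
        result{end+1} = p;   % complete: store it
        continue
    end
    if isempty(p)
        lo = 1;              % first number can be anything from 1 up
    else
        lo = p(end);         % later numbers must not decrease
    end
    for k = lo:remaining
        % keep k only if it finishes the sum or leaves room for a number >= k
        if k == remaining || remaining - k >= k
            stack{end+1} = [p k];
        end
    end
end
```

For N = 8 this yields 22 combinations (including the single number 8). The cell array is not sorted in any particular order, so a sort can still be applied afterwards if needed.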
EDIT: Look at my second more efficient answer!
The Naive approach! Where the cartprod.m function can be found here.
% Create all permutations
p(1:8) = {0:8};
M = fliplr( cartprod( p{:} ) );
% Check sums
r = sum( M, 2 ) == 8;
M = M(r,:); % Solution here
There are definitely more efficient solutions than this but if you just need a quick and dirty solution for small permutations, this will work. Please note that this made Matlab take 3.5 GB of RAM to temporarily store the permutations.
First save all combinations with repetitions in a cell array. In order to do that, just use nmultichoosek.
v = 1 : 8;
combs = cell(length(v),1);
for i = v
combs{i} = nmultichoosek(v,i);
end
In this way, each element of combs contains a matrix where each row is a combination. For instance, the i-th row of combs{4} is a combination of four numbers.
Now you need to check the sum. In order to do that to all the combinations, use cellfun
sums = cellfun(@(x)sum(x,2),combs,'UniformOutput',false);
sums contains the vectors with the sum of all combinations. For instance, sums{4} has the sums of the numbers in each combination of combs{4}.
The next step is check for the fixed sum.
fixed_sum = 10;
indices = cellfun(@(x)x==fixed_sum,sums,'UniformOutput',false);
indices contains arrays of logical values, telling if the combination satisfies the fixed sum. For instance, indices{4}(1) tells you if the first combination with 4 numbers sums to fixed_sum.
Finally, retrieve all valid combinations in a new cell array, sorting them at the same time.
valid_combs = cell(length(v),1);
for i = v
idx = indices{i};
c = combs{i};
valid_combs{i} = sortrows(c(idx,:));
end
valid_combs is a cell similar to combs, but with only combinations that sum up to your desired value, and sorted by the number of numbers used: valid_combs{1} has all valid combinations with 1 number, valid_combs{2} with 2 numbers, and so on. Also, thanks to sortrows, combinations with the same amount of numbers are also sorted. For instance, if fixed_sum = 10 then valid_combs{8} is
1 1 1 1 1 1 1 3
1 1 1 1 1 1 2 2
This code is quite efficient, on my very old laptop I am able to run it in 0.016947 seconds.

What does it mean to use logical indexing/masking to extract data from a matrix? (MATLAB)

I am new to matlab and I was wondering what it meant to use logical indexing/masking to extract data from a matrix.
I am trying to write a function that accepts a matrix and a user-inputted value to compute and display the total number of values in column 2 of the matrix that match with the user input.
The function itself should have no return value and will be called on later in another loop.
But besides all that hubbub, someone suggested that I use logical indexing/masking in this situation but never told me exactly what it was or how I could use it in my particular situation.
EDIT: since you updated the question, I am updating this answer a little.
Logical indexing is explained really well in this and this. In general, I doubt that I can do a better job in the time available. However, I will try to connect your problem to logical indexing.
Let's declare an array A which has 2 columns. The first column is the index (1,2,3,...) and the second column is its corresponding value, a random number.
A(:,1)=1:10;
A(:,2)=randi(5,[10 1]); % generates a 10x1 random array and puts it into the second column of A
userInputtedValue=3; % self-explanatory
You want to check which values in the second column of A are equal to 3. Imagine that you are making a query and MATLAB is giving you a binary response, YES (1) or NO (0).
q=A(:,2)==userInputtedValue % the query: which values in the second column of A equal 3?
Now, for the rows where the answer is YES, you want to extract the matching entries of A. Then do some processing.
values=A(q,2); % only those elements will be extracted which 1. lie in the
% second column of A AND 2. sit where q takes the value 1.
Now, if you want to count total number of values, just do:
numValues=length(values);
I hope now logical indexing is clear to you. However, do read the Mathworks posts which I have mentioned earlier.
I oversimplified the code and wrote more of it than required in order to explain things. It can be achieved in a one-liner:
sum(A(:,2)==userInputtedValue)
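To make the result reproducible, here is the same idea with fixed data instead of randi. The values in A and the name matchingRows are made up for this sketch; nnz counts the true entries of the mask and gives the same number as sum.

```matlab
% Hypothetical data: first column is a row label, second column holds the values
A = [(1:6).', [3; 1; 3; 5; 3; 2]];
userInputtedValue = 3;

q = (A(:, 2) == userInputtedValue);   % logical mask, one entry per row of A
matchingRows = A(q, 1);               % first-column entries of the rows that matched
numValues = nnz(q);                   % number of matches, same as sum(q)
```

Here matchingRows is [1; 3; 5] and numValues is 3.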
I'll give you an example that may illustrate what logical indexing is about:
array = [1 2 3 0 4 2];
array > 2
ans: [0 0 1 0 1 0]
Using logical indexing you can filter the elements that fulfil a certain condition:
array(array>2) will give: [3 4]
you could also perform alterations to only those elements:
array(array>2) = 100;
array(array<=2) = 0;
will result in "array" equal to
[0 0 100 0 100 0]
Logical indexing means to have a logical / Boolean matrix that is the same size as the matrix that you are considering. You would use this as input into the matrix you're considering, and any locations that are true would be part of the output. Any locations that are false are not part of the output. To perform logical indexing, you would need to use logical / Boolean operators or conditions to facilitate the selection of elements in your matrix.
Let's concentrate on vectors as it's the easiest to deal with. Let's say we had the following vector:
>> A = 1:9
A =
1 2 3 4 5 6 7 8 9
Let's say I wanted to retrieve all values that are 5 or more. The logical condition for this would be A >= 5. We want to retrieve all values in A that are greater than or equal to 5. Therefore, if we did A >= 5, we get a logical vector which tells us which values in A satisfy the above condition:
>> A >= 5
ans =
0 0 0 0 1 1 1 1 1
This certainly tells us where in A the condition is satisfied. The last step would be to use this as input into A:
>> B = A(A >= 5)
B =
5 6 7 8 9
Cool! As you can see, there isn't a need for a for loop to help us select out elements that satisfy a condition. Let's go a step further. What if I want to find all even values of A? This would mean that if we divide by 2, the remainder would be zero, or mod(A,2) == 0. Let's extract out those elements:
>> C = A(mod(A,2) == 0)
C =
2 4 6 8
Nice! So let's go back to your question. Given your matrix A, let's extract out column 2.
>> col = A(:,2)
Now, we want to check to see if any of column #2 is equal to a certain value. Well we can generate a logical indexing array for that. Let's try with the value of 3:
>> ind = col == 3;
Now you'll have a logical vector that tells you which locations are equal to 3. If you want to determine how many are equal to 3, you just have to sum up the values:
>> s = sum(ind);
That's it! s contains how many values were equal to 3. Now, if you wanted to write a function that only displays how many values are equal to some user-defined input, you can do something like this:
function checkVal(A, val)
disp(sum(A(:,2) == val));
end
Quite simply, we extract the second column of A and see how many values are equal to val. This produces a logical array, and we simply sum up how many 1s there are. This would give you the total number of elements that are equal to val.
Troy Haskin pointed you to a very nice link that talks about logical indexing in more detail: http://www.mathworks.com/help/matlab/math/matrix-indexing.html?refresh=true#bq7eg38. Read that for more details on how to master logical indexing.
Good luck!
%% M is your Matrix
M = randi(10,4)
%% Val is the value that you are seeking to find
Val = 6
%% Col is the value of the matrix column that you wish to find it in
Col = 2
%% r is a vector that has zeros in all positions except when the Matrix value equals the user input it equals 1
r = M(:,Col)==Val
%% We can now sum all the non-zero values in r to get the number of matches
n = sum(r)
M =
4 2 2 5
3 6 7 1
4 4 1 6
5 8 7 8
Val =
6
Col =
2
r =
0
1
0
0
n =
1