This question is motivated by a very specific combinatorial optimization problem, where the search space is defined as the space of permuted subsets of an unsorted vector of discrete values with multiplicities.
I am looking for an efficient function (fast enough, vectorized, or any other cleverer solution) that finds the indices of subsets in the following manner:
t = [1 1 3 2 2 2 3 ]
is an unsorted vector of all possible values, including their multiplicities.
item = [2 3 1; 2 1 2; 3 1 1; 1 3 3]
is a list of permuted subsets of vector t.
I need to find, for each row of item, the list of indices into t that the row corresponds to. So, for the above example we have:
item =
2 3 1
2 1 2
3 1 1
1 3 3
t =
1 1 3 2 2 2 3
ind = item2ind(item,t)
ind =
4 3 1
4 1 5
3 1 2
1 3 7
So, for item = [2 3 1] we get ind = [4 3 1], which means that:
the first value "2" in item corresponds to the first value "2" in t, at position 4,
the second value "3" in item corresponds to the first value "3" in t, at position 3, and
the third value "1" in item corresponds to the first value "1" in t, at position 1.
For item = [2 1 2] we get ind = [4 1 5], which means that:
the first value "2" in item corresponds to the first value "2" in t, at position 4,
the second value "1" in item corresponds to the first value "1" in t, at position 1, and
the third value "2" in item corresponds to the second(!!!) value "2" in t, at position 5.
For item = [1 1 1] no solution exists, because vector t contains only two "1" values.
My current version of the function "item2ind" is trivial serial code, which can easily be parallelized by changing the outer "for" loop to "parfor":
function ind = item2ind(item,t)
[nlp,N] = size(item);
ind = zeros(nlp,N);
for i = 1:nlp
    auxitem = item(i,:);
    auxt = t;
    for j = 1:N
        I = find(auxitem(j) == auxt,1,'first');
        if ~isempty(I)
            auxt(I) = 0; % mark this position of t as used
            ind(i,j) = I;
        else
            error('Incompatible content of item and t.');
        end
    end
end
end
But I need something definitely more clever ... and faster:)
Test case for larger input data:
t = 1:10; % 10 unique values at vector t
t = repmat(t,1,5); % unsorted vector t with multiplicity of all unique values 5
nlp = 100000; % number of item rows
[~,p] = sort(rand(nlp,length(t)),2); % 100000 random permutations
item = t(p); % transform permutations to items
item = item(:,1:30); % transform item to shorter subset
tic;ind = item2ind(item,t);toc % running and timing of the original function
tic;ind_ = item2ind_new(item,t);toc % running and timing of the new function
To vectorize the code, I have assumed that the error case won't be present. It should be ruled out first, with a simple procedure I present below.
Method
First, let's compute the indices of all elements in t:
t = t(:);
mct = max(accumarray(t,1));
G = accumarray(t,1:length(t),[],@(x) {sort(x)});
G = cellfun(@(x) padarray(x.',[0 mct-length(x)],0,'post'), G, 'UniformOutput', false);
G = vertcat(G{:});
Explanation: after reshaping the input into a column vector, we compute the maximum number of occurrences of any value in t using accumarray. Next, we collect the indices at which each value occurs. This gives a cell array, because the values may not all occur the same number of times. To form a matrix, we pad each index list independently to the maximum length (named mct). Then we can convert the cell array into a matrix. At this step, we have:
G =
1 11 21 31 41
2 12 22 32 42
3 13 23 33 43
4 14 24 34 44
5 15 25 35 45
6 16 26 36 46
7 17 27 37 47
8 18 28 38 48
9 19 29 39 49
10 20 30 40 50
Now, we process item. For that, let's figure out how to compute the running count of occurrences of each value inside a vector. For example, if I have:
A = [1 1 3 2 2 2 3];
then I want to get:
B = [1 2 1 1 2 3 2];
Thanks to implicit expansion, we can have it in one line:
B = diag(cumsum(A==A'));
As easy as that. The expression A==A' expands into a matrix whose element (i,j) is A(i)==A(j). Taking the cumulative sum along one dimension only and then the diagonal gives the desired result: each column is the cumulative count of occurrences of one value.
To use this trick with item, which is 2-D, we need a 3-D array. Let m=size(item,1) and n=size(item,2). Then:
C = cumsum(reshape(item,m,1,n)==item,3);
is a (big) 3-D array of all cumulative occurrence counts. The last step is to select the entries on the diagonal along dimensions 2 and 3:
ia = C(sub2ind(size(C),repelem((1:m).',1,n),repelem(1:n,m,1),repelem(1:n,m,1)));
Now, with all these matrices, indexing is easy:
ind = G(sub2ind(size(G),item,ia));
Finally, let's recap the code of the function:
function ind = item2ind_new(item,t)
t = t(:);
[m,n] = size(item);
mct = max(accumarray(t,1));
G = accumarray(t,1:length(t),[],@(x) {sort(x)});
G = cellfun(@(x) padarray(x.',[0 mct-length(x)],0,'post'), G, 'UniformOutput', false);
G = vertcat(G{:});
C = cumsum(reshape(item,m,1,n)==item,3);
ia = C(sub2ind(size(C),repelem((1:m).',1,n),repelem(1:n,m,1),repelem(1:n,m,1)));
ind = G(sub2ind(size(G),item,ia));
end
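To see the steps above at work, here is a minimal sketch that runs them on the small example from the question. It assumes a release with implicit expansion and repelem, and, as a deliberate substitution, pads with plain zeros concatenation instead of padarray so that no toolbox is needed; it is an illustration, not a replacement for the function above.

```matlab
% Small example from the question
t    = [1 1 3 2 2 2 3];
item = [2 3 1; 2 1 2; 3 1 1; 1 3 3];

t = t(:);
[m, n] = size(item);

% Row v of G lists the positions of value v in t, zero-padded on the right
mct = max(accumarray(t, 1));
G = accumarray(t, (1:numel(t)).', [], @(x) {sort(x)});
G = cellfun(@(x) [x.' zeros(1, mct - numel(x))], G, 'UniformOutput', false);
G = vertcat(G{:});            % here: [1 2 0; 4 5 6; 3 7 0]

% ia(i,j) = how many times item(i,j) has occurred in row i up to column j
C  = cumsum(reshape(item, m, 1, n) == item, 3);
ia = C(sub2ind(size(C), repelem((1:m).', 1, n), repelem(1:n, m, 1), repelem(1:n, m, 1)));

% Look up the ia-th position of each value
ind = G(sub2ind(size(G), item, ia));
disp(ind)
```

The result matches the ind matrix given in the question, [4 3 1; 4 1 5; 3 1 2; 1 3 7].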
Results
Running the provided script on an old 4-core machine, I get:
Elapsed time is 4.317914 seconds.
Elapsed time is 0.556803 seconds.
ans =
The speed-up is substantial (about 7.8x), at the cost of higher memory consumption (because of matrix C). I guess this part could be improved to save memory.
EDIT
Generating ia this way can cost a lot of memory. A way to save memory is to build this array directly with a for-loop:
ia = zeros(size(item));
for i=unique(t(:)).'
    ia = ia+cumsum(item==i, 2).*(item==i);
end
In all cases, when you have ia, it's easy to test if there is an error in item compared to t:
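The original check is not shown here, so the following is only one possible sketch of such a test. The variable names cnt and badRow are hypothetical, and item is assumed to contain positive integers. A row is flagged when it asks for more copies of a value than t holds, or for a value that t does not contain at all.

```matlab
t    = [1 1 3 2 2 2 3];
item = [2 3 1; 1 1 1];      % the second row needs three 1s, t has only two

% Running occurrence count per row, built with the memory-friendly loop
ia = zeros(size(item));
for v = unique(t(:)).'
    ia = ia + cumsum(item == v, 2) .* (item == v);
end

% cnt(v) = number of occurrences of v in t, sized to cover every value in item
cnt = accumarray(t(:), 1, [max([t(:); item(:)]) 1]);

% ia == 0 marks values never seen in t; ia > cnt(item) marks too many copies
badRow = any(ia > cnt(item) | ia == 0, 2);
disp(badRow)
```

Rows flagged in badRow could then be dropped, or reported with error, before the indexing step.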
MATLAB - Returning a matrix of sums of elements corresponding to the same kind

An n×m matrix A and an n×1 vector Date are the inputs of the function S = sumdate(A,Date).
The function returns an n×m matrix S such that each row of S is the sum of the rows of A that share the same date.
For example, if
A = [1 2 7 3 7 3 4 1 9
6 4 3 0 -1 2 8 7 5]';
Date = [161012 161223 161223 170222 160801 170222 161012 161012 161012]';
Then I would expect the returned matrix S is
S = [15 9 9 6 7 6 15 15 15;
26 7 7 2 -1 2 26 26 26]';
Because the elements Date(2) and Date(3) are the same, we have
S(2,1) and S(3,1) are both equal to the sum of A(2,1) and A(3,1)
S(2,2) and S(3,2) are both equal to the sum of A(2,2) and A(3,2).
Because the elements Date(1), Date(7), Date(8) and Date(9) are the same, we have
S(1,1), S(7,1), S(8,1), S(9,1) equal the sum of A(1,1), A(7,1), A(8,1), A(9,1)
S(1,2), S(7,2), S(8,2), S(9,2) equal the sum of A(1,2), A(7,2), A(8,2), A(9,2)
The same for S([4,6],1) and S([4,6],2)
As the element Date(5) does not repeat, so S(5,1) = A(5,1) = 7 and S(5,2) = A(5,2) = -1.
The code I have written so far
Here is my try on the code for this task.
function S = sumdate(A,Date)
S = A; %Pre-assign S as a matrix of the same size as A.
Dlist = unique(Date); %Sort out a non-repeating list from Date
for J = 1 : length(Dlist)
    loc = (Date == Dlist(J)); %Compute a logical indexing vector for locating the J-th element in Dlist
    %Replace the located rows of S by their sum (sum along dim 1, so a single row is left unchanged)
    S(loc,:) = repmat(sum(S(loc,:),1),sum(loc),1);
end
end
I tested it on my computer using A and Date with these attributes:
size(A) = [33055 400];
size(Date) = [33055 1];
length(unique(Date)) = 2645;
It took my PC about 1.25 seconds to perform the task.
This task is performed hundreds of thousands of times in my project, therefore my code is too time-consuming. I think the performance will be boosted up if I can eliminate the for-loop above.
I have found some built-in functions which do special types of sums like accumarray or cumsum, but I still do not have any ideas on how to eliminate the for-loop.
I would appreciate your help.
You can do this with accumarray, but you'll need to generate a set of row and column subscripts into A to do it. Here's how:
[~, ~, index] = unique(Date); % Get indices of unique dates
subs = [repmat(index, size(A, 2), 1) ... % repmat to create row subscript
repelem((1:size(A, 2)).', size(A, 1))]; % repelem to create column subscript
S = accumarray(subs, A(:)); % Reshape A into column vector for accumarray
S = S(index, :); % Use index to expand S to original size of A
S =
15 26
9 7
9 7
6 2
7 -1
6 2
15 26
15 26
15 26
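As a sanity check, the accumarray result can be compared against a plain loop on the example data from the question. This is only a sketch; Sloop is a hypothetical name for the loop reference, and the loop sums along dimension 1 explicitly so that a date occurring only once keeps its row.

```matlab
A = [1 2 7 3 7 3 4 1 9; 6 4 3 0 -1 2 8 7 5].';
Date = [161012 161223 161223 170222 160801 170222 161012 161012 161012].';

% Reference: one pass per unique date
Sloop = A;
Dlist = unique(Date);
for J = 1:numel(Dlist)
    loc = (Date == Dlist(J));
    Sloop(loc, :) = repmat(sum(A(loc, :), 1), sum(loc), 1);
end

% accumarray version from the answer
[~, ~, index] = unique(Date);
subs = [repmat(index, size(A, 2), 1), repelem((1:size(A, 2)).', size(A, 1))];
S = accumarray(subs, A(:));
S = S(index, :);

disp(isequal(S, Sloop))
```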
Note #1: This will use more memory than your for loop solution (subs will have twice the number of elements as A), but may give you a significant speed-up.
Note #2: If you are using a version of MATLAB older than R2015a, you won't have repelem. Instead you can replace that line using kron (or one of the other solutions here):
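A sketch of how the kron replacement could look, using the example data from the question. The name colSubs is hypothetical; kron with a column of ones repeats each column number size(A, 1) times, which is what the repelem call produces.

```matlab
A = [1 2 7 3 7 3 4 1 9; 6 4 3 0 -1 2 8 7 5].';
Date = [161012 161223 161223 170222 160801 170222 161012 161012 161012].';

[~, ~, index] = unique(Date);

% Column subscripts without repelem: [1;1;...;1;2;2;...;2]
colSubs = kron((1:size(A, 2)).', ones(size(A, 1), 1));

subs = [repmat(index, size(A, 2), 1), colSubs];
S = accumarray(subs, A(:));
S = S(index, :);
disp(S)
```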
How to efficiently find in some dataset the number of occurrences of a given list of items, without using loops?

I have a dataset, M, where some items and their category types are stored in columns 1 and 2 respectively. The vector cat stores the unique category types present in M. Vector Y is a subset of items in M. I want to find how many times each category type is associated with the items in Y. This is the code I have written to do this:
cat(:,1) = unique(M(:,2)); % Unique category types in M
cat(:,2) = zeros(size(cat,1),1); % initialize column 2 of cat to 0s
N = size(Y,1);
for i=1:N
    item = Y(i,1);
    temp = M(M(:,1)==item,:);
    C(:,1) = unique(temp(:,2));
    C(:,2) = histc(temp(:,2), unique(temp(:,2))); % Frequency of items in temp(:,2)
    for j=1:size(cat,1)
        for k=1:size(C,1)
            if cat(j,1)==C(k,1)
                cat(j,2) = cat(j,2)+C(k,2);
            end
        end
    end
    clear C; clear temp; clear item;
end
But this is obviously slow for even moderately sized M, Y and cat. How do I make it faster?
To illustrate with an example, say:
M=[3 2
4 12
1 7
3 4
2 10
1 6
4 19
4 6
3 12
1 10
2 12];
Then I want the output cat to be the following:
cat=[2 1
4 1
6 0
7 0
10 1
12 2
19 0];
If I understand correctly, you want a histogram of the categories of those items from M that also appear in Y.
Using ismember you can find index of items of M that also appear in Y:
idx = ismember(M(:,1), Y);
Use that index to filter out desired items and save it to temp:
temp = M(idx, :);
Form histogram of temp with unique values from Cat(:,1):
Cat(:,2) = histc(temp(:, 2), Cat(:, 1));
If you avoid saving intermediate results, the above code can be simplified:
idx = ismember(M(:,1),Y);
Cat(:,2) = histc(M(idx, 2), Cat(:,1));
Or all in one line:
Cat(:,2) = histc(M(ismember(M(:,1),Y), 2), Cat(:,1));
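A minimal sketch of the approach on the example M from the question. The question's Y is not shown, so Y = [2; 3] is an assumption here, chosen because it reproduces the expected output; Cat(:,1) is also assumed to hold the sorted unique categories, as in the question's own code.

```matlab
M = [3 2; 4 12; 1 7; 3 4; 2 10; 1 6; 4 19; 4 6; 3 12; 1 10; 2 12];
Y = [2; 3];                        % assumed, not given in the question

Cat = unique(M(:, 2));             % column 1: category types, sorted
idx = ismember(M(:, 1), Y);        % rows of M whose item appears in Y
Cat(:, 2) = histc(M(idx, 2), Cat(:, 1));  % count per category type
disp(Cat)
```

Because Cat(:,1) is sorted and every value of M(idx, 2) equals one of its entries, each histc bin counts exactly one category.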
How to get all the possible combinations of elements in a matrix, but don't allow exchange of elements inbetween columns?

Let's say I have this matrix A: [3 x 4]
1 4 7 10
2 5 8 11
3 6 9 12
I want to permute the elements in each column, but they can't move to a different column, so 1 2 3 always need to be part of the first column. So, for example, I want:
3 4 8 10
1 5 7 11
2 6 9 12
3 4 8 11
1 6 7 10
2 5 9 12
1 6 9 11
. . . .
So in one matrix I would like to have all the possible permutations. In this case there are 3 different choices per column, so 3x3x3x3 = 81 possibilities. My result matrix should therefore be 81x4, because I only need one [1x4] row vector per answer, 81 times.
Another way to ask the question (with the same end result for me) would be: if I have 4 column vectors:
Compared to my previous example, each column vector can have a different number of rows. It is as if I have 4 boxes, A, B, C, D, and I can only put one element of a in A, one of b in B, and so on; I would like to get all the possible combinations, with each answer [A B C D] being a [1x4] row. In this case I would have 3x3x3x4 = 108 different rows. So where I have been misunderstood (my fault) is that I don't want all the different [3x4] matrices as answers, just the [1x4] rows.
so in this case the answer would be:
1 4 7 10
and 1 4 7 11
and 1 4 7 12
and 1 4 7 13
and 2 4 8 10
and ...
until there are the 108 combinations
The function perms in Matlab can't do that, since I don't want to permute the whole matrix (and by the way, the matrix is already too big for that).
So do you have any idea how I could do this, or is there a function which can do that? I could, of course, also have matrices of different sizes. Thank you
Basically you want to get all combinations of 4x the permutations of 1:3.
You could generate these with combvec from the Neural Networks Toolbox (like @brainkz did), or with permn from the File Exchange.
After that it's a matter of managing indices, applying sub2ind (with the correct column index) and rearranging until everything is in the order you want.
a = [1 4 7 10
2 5 8 11
3 6 9 12];
siz = size(a);
perm1 = perms(1:siz(1));
Nperm1 = size(perm1,1); % = factorial(siz(1))
perm2 = permn(1:Nperm1, siz(2) );
Nperm2 = size(perm2,1);
permidx = reshape(perm1(perm2,:)', [Nperm2 siz(1), siz(2)]); % reshape unnecessary, easier for debugging
col_base_idx = 1:siz(2);
col_idx = col_base_idx(ones(Nperm2*siz(1) ,1),:);
lin_idx = reshape(sub2ind(size(a), permidx(:), col_idx(:)), [Nperm2*siz(1) siz(2)]);
result = a(lin_idx);
This avoids any loops or cell concatenation and uses straight indexing instead.
Permutations per column, unique rows
Same method:
siz = size(a);
permidx = permn(1:siz(1), siz(2) );
Npermidx = size(permidx, 1);
col_base_idx = 1:siz(2);
col_idx = col_base_idx(ones(Npermidx, 1),:);
lin_idx = reshape(sub2ind(size(a), permidx(:), col_idx(:)), [Npermidx siz(2)]);
result = a(lin_idx);
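If permn from the File Exchange is not available, the same row-index table can be built with the built-in ndgrid. This is a sketch of a substitute technique, not the answer's own code; grids, rowidx and colidx are hypothetical names, and the row order differs from permn (here the first column varies fastest).

```matlab
a = [1 4 7 10
     2 5 8 11
     3 6 9 12];
[nr, nc] = size(a);

% One nc-dimensional grid per column; each grid holds row numbers 1..nr
grids = cell(1, nc);
[grids{:}] = ndgrid(1:nr);

% Flatten every grid to a column: rowidx is nr^nc-by-nc
rowidx = cell2mat(cellfun(@(g) g(:), grids, 'UniformOutput', false));
colidx = repmat(1:nc, size(rowidx, 1), 1);

% Pick a(rowidx(k,j), j) for every row k and column j
result = a(sub2ind(size(a), rowidx, colidx));
disp(size(result))
```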
Your question appeared to be a very interesting brain-teaser. I suggest the following:
in = [1,2,3;4,5,6;7,8,9;10,11,12]';
b = perms(1:3);
a = 1:size(b,1);
c = combvec(a,a,a,a);
for k = 1:length(c(1,:))
    out{k} = [in(b(c(1,k),:),1),in(b(c(2,k),:),2),in(b(c(3,k),:),3),in(b(c(4,k),:),4)];
end
%and if you want your result as an ordinary array:
out = vertcat(out{:});
b is a 6x3 array that contains all possible permutations of [1,2,3]. c is 4x1296 array that contains all possible combinations of elements in a = 1:6. In the for loop we use number from 1 to 6 to get the permutation in b, and that permutation is used as indices to the column.
Hope that helps
This is another Octave-friendly solution:
function result = Tuples(A)
    [P,n]= size(A);
    M = reshape(repmat(1:P, 1, P ^(n-1)), repmat(P, 1, n));
    result = zeros(P^ n, n);
    for i = 1:n
        result(:, i) = A(reshape(permute(M, circshift((1:n)', i)), P ^ n, 1), i);
    end
end
A = [...
1 4 7 10;...
2 5 8 11;...
3 6 9 12];
result = Tuples(A)
The question was updated: given n vectors of different lengths, generate a list of all possible tuples whose ith element is taken from vector i:
function result = Tuples( A)
    if exist('repelem') ==0
        % fallback for Octave versions without repelem
        repelem = @(v,n) repelems(v,[1:numel(v);n]);
    end
    n = numel(A);
    siz = [ cell2mat(cellfun(@numel, A , 'UniformOutput', false))];
    tot_prd = prod(siz);
    cum_prd = cumprod(siz); % running product of the lengths (used below, missing from the listing)
    tot_cum = tot_prd ./ cum_prd;
    cum_siz = cum_prd ./ siz;
    result = zeros(tot_prd, n);
    for i = 1: n
        result(:, i) = repmat(repelem(A{i},repmat(tot_cum(i),1,siz(i))) ,1,cum_siz(i));
    end
end
a = {...
result =Tuples(a)
This is a little complicated but it works without the need for any additional toolboxes:
You basically want a b-element 'truth table', which you can generate like this (adapted from here) if you were applying it to each element:
[b, n] = size(A)
truthtable = dec2base(0:power(b,n)-1, b) - '0'
Now you need to convert the truth table to linear indexes by adding the column number times the total number of rows:
idx = bsxfun(@plus, b*(0:n-1)+1, truthtable)
Now, instead of applying this truth table to each element, you actually want to apply it to each permutation. There are 6 permutations, so b becomes 6. The trick is to then create a 6-by-1 cell array where each element holds a distinct permutation of [1,2,3], and then apply the truth table idea to that:
[m,n] = size(A);
b = factorial(m);
permutations = reshape(perms(1:m)',[],1);
permCell = mat2cell(permutations,ones(b,1)*m,1);
truthtable = dec2base(0:power(b,n)-1, b) - '0';
expandedTT = cell2mat(permCell(truthtable + 1));
idx = bsxfun(@plus, m*(0:n-1), expandedTT);
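To make the simpler per-element truth table described above concrete, here is a small sketch on a 3-by-2 matrix (chosen here only to keep the output short; tuples is a hypothetical name). One caveat: dec2base switches to letters for digits above 9, so subtracting '0' only gives valid digits while the base is at most 10.

```matlab
A = [1 4
     2 5
     3 6];
[b, n] = size(A);

% All n-digit numbers in base b, one digit per column, digits 0..b-1
truthtable = dec2base(0:power(b, n) - 1, b) - '0';

% Digit d in column j maps to linear index (j-1)*b + d + 1
idx = bsxfun(@plus, b * (0:n-1) + 1, truthtable);

tuples = A(idx);
disp(tuples)
```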
Another answer. Rather specific just to demonstrate the concept, but can easily be adapted.
A = [1,4,7,10;2,5,8,11;3,6,9,12];
P = perms(1:3)'
[X,Y,Z,W] = ndgrid(1:6,1:6,1:6,1:6);
You now have 1296 permutations. If you wanted to access, say, the 400th one:
Permutation_within_column = [P(:,X(400)), P(:,Y(400)), P(:,Z(400)), P(:,W(400))];
ColumnOffset = repmat([0:3]*3,[3,1])
My_permutation = Permutation_within_column + ColumnOffset; % results in valid linear indices
This approach allows you to obtain the 400th permutation on demand; if you prefer to have all possible permutations concatenated in the 3rd dimension, (i.e. a 3x4x1296 matrix), you can either do this with a for loop, or simply adapt the above and vectorise; for example, if you wanted to create a 3x4x2 matrix holding the first two permutations along the 3rd dimension:
Permutations_within_columns = reshape(P(:,X(1:2)),3,1,[]);
Permutations_within_columns = cat(2, Permutations_within_columns, reshape(P(:,Y(1:2)),3,1,[]));
Permutations_within_columns = cat(2, Permutations_within_columns, reshape(P(:,Z(1:2)),3,1,[]));
Permutations_within_columns = cat(2, Permutations_within_columns, reshape(P(:,W(1:2)),3,1,[]));
ColumnOffsets = repmat([0:3]*3,[3,1,2]);
My_permutations = Permutations_within_columns + ColumnOffsets;
Generate pairs of points using a nested for loop

As an example, I have a matrix [1,2,3,4,5]'. This matrix contains one column and 5 rows, and I have to generate a pair of points like (1,2),(1,3)(1,4)(1,5),(2,3)(2,4)(2,5),(3,4)(3,5)(4,5).
I have to store these values in 2 columns in a matrix. I have the following code, but it isn't quite giving me the right answer.
for s = 1:5
    for tb = (s+1):5
        if tb>s
            in = sub2ind(size(pairpoints),(tb-1),1);
            pairpoints(in) = s;
            in = sub2ind(size(pairpoints),(tb-1),2);
            pairpoints(in) = tb;
        end
    end
end
With this code, I got (1,2),(2,3),(3,4),(4,5). What should I do, and what is the general formula for the number of pairs?
One way, though it is limited by how many different elements there are to choose from, is to use nchoosek as follows:
pairpoints = nchoosek([1:5],2)
pairpoints =
1 2
1 3
1 4
1 5
2 3
2 4
2 5
3 4
3 5
4 5
See the limitations of this function in the provided link.
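The question also asks for a general formula for the number of pairs. For n distinct elements, the number of unordered pairs is n(n-1)/2, which is the binomial coefficient "n choose 2". A short sketch that checks this against the nchoosek output (numPairs is a hypothetical name):

```matlab
n = 5;
pairpoints = nchoosek(1:n, 2);   % one row per unordered pair
numPairs   = n * (n - 1) / 2;    % closed-form count

% Both counts agree with the scalar form of nchoosek
disp([size(pairpoints, 1), numPairs, nchoosek(n, 2)])
```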
An alternative is to just iterate over each element and combine it with the remaining elements in the list (assumes that all are distinct)
pairpoints = [];
data = [1:5]';
len = length(data);
for k=1:len
    pairpoints = [pairpoints ; [repmat(data(k),len-k,1) data(k+1:end)]];
end
This method just concatenates each element in data with the remaining elements in the list to get the desired pairs.
Try either of the above and see what happens!
Another suggestion I can add to the mix if you don't want to rely on nchoosek is to generate an upper triangular matrix full of ones, disregarding the diagonal, and use find to generate the rows and columns of where the matrix is equal to 1. You can then concatenate both of these into a single matrix. By generating an upper triangular matrix this way, the locations of the matrix where they're equal to 1 exactly correspond to the row and column pairs that you are seeking. As such:
%// Highest value in your data
N = 5;
[rows,cols] = find(triu(ones(N),1));
pairPoints = [rows,cols]
pairPoints =
1 2
1 3
2 3
1 4
2 4
3 4
1 5
2 5
3 5
4 5
Bear in mind that this will be unsorted (i.e. not in the order that you specified in your question). If order matters to you, then use the sortrows command in MATLAB so that we can get this into the proper order that you're expecting:
pairPoints = sortrows(pairPoints)
pairPoints =
1 2
1 3
1 4
1 5
2 3
2 4
2 5
3 4
3 5
4 5
Take note that I specified an additional parameter to triu which denotes how much of an offset you want away from the diagonal. The default offset is 0, which includes the diagonal when you extract the upper triangular matrix. I specified 1 as the second parameter because I want to move away from the diagonal towards the right by 1 unit so I don't want to include the diagonal as part of the upper triangular decomposition.
for loop approach
If you truly desire the for loop approach, going with your model, you'll need two for loops, and you need to keep track of the row you are currently at so that you can step through the remaining columns until the end. You can also use @GeoffHayes's approach of using just a single for loop to generate your indices, but when you're new to a language, one key piece of advice I will always give is to code for readability and not for efficiency. Once you get it working, if you have some way of measuring performance, you can then try to make the code faster and more efficient. This kind of programming is also endorsed by Jon Skeet, the resident StackOverflow ninja, and I got that from this post here.
As such, you can try this:
pairPoints = []; %// Initialize
N = 5; %// Highest value in your data
for row = 1 : N
    for col = row + 1 : N
        pairPoints = [pairPoints; [row col]]; %// Add row-column pair to matrix
    end
end
We get the equivalent output:
pairPoints =
1 2
1 3
1 4
1 5
2 3
2 4
2 5
3 4
3 5
4 5
Small caveat
This method will only work if your data is enumerated from 1 to N.
Edit - August 20th, 2014
You wish to generalize this to any array of values. You also want to stick with the for loop approach. You can still keep the original for loop code there. You would simply have to add a couple more lines to index your new array. As such, supposing your data array was:
dat = [12, 45, 56, 44, 62];
You would use the pairPoints matrix and use each column to index into the data array to access your values. Also, you need to make sure your data is a column vector, or this won't work. If it weren't, we would be creating a 1D array and concatenating rows, and that's obviously not what we're looking for. In other words:
dat = [12, 45, 56, 44, 62];
dat = dat(:); %// Make column vector - Important!
N = numel(dat); %// Total number of elements in your data array
pairPoints = []; %// Initialize
%// Skip if the array is empty
if (N ~= 0)
    for row = 1 : N
        for col = row + 1 : N
            pairPoints = [pairPoints; [row col]]; %// Add row-column pair to matrix
        end
    end
    vals = [dat(pairPoints(:,1)) dat(pairPoints(:,2))];
else
    vals = [];
end
Take note that I have made a provision where if the array is empty, don't even bother doing any calculations. Just output an empty matrix.
We thus get:
vals =
12 45
12 56
12 44
12 62
45 56
45 44
45 62
56 44
56 62
44 62
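If the for loop is not a hard requirement, the same value pairs can also be obtained directly, since nchoosek accepts a vector and returns combinations of its elements. A short sketch on the same data, subject to the nchoosek limitations mentioned earlier:

```matlab
dat = [12, 45, 56, 44, 62];

% Each row is one pair of values, in the same order as the loop produces
vals = nchoosek(dat, 2);
disp(vals)
```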
Finding maxima in 2D matrix along certain dimension with indices

I have a <206x193> matrix A. It contains the values of a parameter at 206 different locations at 193 time steps. I am interested in the maximum value at each location over all times as well as the corresponding indices. I have another matrix B with the same dimensions of A and I'm interested in values for each location at the time that A's value at that location was maximal.
I've tried [max_val pos] = max(A,[],2), which gives the right maximum values, but A(pos) does not equal max_val.
How exactly does this function work?
I tried a smaller example as well. Still I don't understand the meaning of the indices....
>> H
H(:,:,1) =
1 2
3 4
H(:,:,2) =
5 6
7 8
>> [val pos] = max(H,[],2)
val(:,:,1) =
2
4
val(:,:,2) =
6
8
pos(:,:,1) =
2
2
pos(:,:,2) =
2
2
The indices in idx represent the index of the max value in the corresponding row. You can use sub2ind to create a linear index if you want to test if A(pos)=max_val
A=rand(206, 193);
[max_val, idx]=max(A, [], 2);
A_max=A(sub2ind(size(A), (1:size(A,1))', idx));
Similarly, you can access the values of B with:
B_Amax=B(sub2ind(size(A), (1:size(A,1))', idx));
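A small sketch with made-up 2-by-3 matrices shows why A(idx) does not return the maxima while the sub2ind form does. The values of A and B here are illustrative only; lin is a hypothetical name.

```matlab
A = [1 9 3
     7 2 5];
B = [10 20 30
     40 50 60];

[max_val, idx] = max(A, [], 2);   % idx holds COLUMN positions, one per row

% A(idx) treats idx as linear indices into A, which is not what we want
wrong = A(idx);

% Combine each row number with its column position into a linear index
lin    = sub2ind(size(A), (1:size(A, 1)).', idx);
A_max  = A(lin);
B_Amax = B(lin);
disp([max_val, wrong, A_max, B_Amax])
```

Here idx is [2; 1], so A(idx) picks the linear elements 2 and 1 of A (7 and 1) rather than the row maxima 9 and 7.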
From your example:
H(:,:,2) =
5 6
7 8
[val pos] = max(H,[],2)
val(:,:,2) =
6
8
pos(:,:,2) =
2
2
The reason why pos(:,:,2) is [2; 2] is because the maximum is at position 2 for both rows.
max is primarily intended for use with vectors. By default, even multi-dimensional arrays are treated as a series of vectors along which the max function is applied.
So, to get the values in B at each location at the time where A is maximal, you can do the following:
% find the maximum values and positions in A
[c,i] = max(A, [], 2);
% iterate along the first dimension, to retrieve the corresponding values in B
C = [];
for k=1:size(A,1)
    C(k) = B(k,i(k));
end
You can refer to @Jigg's answer for a more concise way of creating matrix C