MATLAB find mean of column in matrix using two different indices - matlab

I have a 22007x3 matrix with data in column 3 and two separate indices in columns 1 and 2.
eg.
x =
1 3 4
1 3 5
1 3 5
1 16 4
1 16 3
1 16 4
2 4 1
2 4 3
2 11 2
2 11 3
2 11 2
I need to find the mean of the values in column 3 when the values in column 1 are the same AND the values in column 2 are the same, to end up with something like:
ans =
1 3 4.6667
1 16 3.6667
2 4 2
2 11 2.3333
Please bear in mind that in my data, the number of times the values in column 1 and 2 occur can be different.
Two options I've tried already are the meshgrid/accumarray option, using two distinct unique functions and a 3D array:
[U, ix, iu] = unique(x(:, 1));
[U2,ix2,iu2] = unique(x(:,2));
[c, r, j] = meshgrid((1:size(x(:, 1), 2)), iu, iu2);
totals = accumarray([r(:), c(:), j(:)], x(:), [], #nanmean);
which gives me this:
??? Maximum variable size allowed by the program is exceeded.
Error in ==> meshgrid at 60
xx = xx(ones(ny,1),:,ones(nz,1));
and the loop option,
for i=1:size(x,1)
if x(i,2)== x(i+1,2);
totals(i,:)=accumarray(x(:,1),x(:,3),[],#nanmean);
end
end
which is obviously so very, very wrong, not least because of the x(i+1,2) bit.
I'm also considering creating separate matrices depending on how many times a value in column 1 occurs, but that would be long and inefficient, so I'm loathe to go down that road.

Group on the first two columns with a unique(...,'rows'), then accumulate only the third column (always the best approach to accumulate only where accumulation really happens, thus avoiding indices, i.e. the first two columns, which you can reattach with unX):
[unX,~,subs] = unique(x(:,1:2),'rows');
out = [unX accumarray(subs,x(:,3),[],#nanmean)];
out =
1 3 4.6667
1 16 3.6667
2 4 2
2 11 2.33

This is an ideal opportunity to use sparse matrix math.
x = [ 1 2 5;
1 2 7;
2 4 6;
3 4 6;
1 4 8;
2 4 8;
1 1 10]; % for example
SM = sparse(x(:,1),x(:,2), x(:,3);
disp(SM)
Result:
(1,1) 10
(1,2) 12
(1,4) 8
(2,4) 14
(3,6) 7
As you can see, we did the "accumulate same indices into same container" in one fell swoop. Now you need to know how many elements you have:
NE = sparse(x(:,1), x(:,2), ones(size(x(:,1))));
disp(NE);
Result:
(1,1) 1
(1,2) 2
(1,4) 1
(2,4) 2
(3,6) 1
Finally, you divide one by the other to get the mean (only use elements that have a value):
matrixMean = SM;
nz = find(NE>0);
matrixMean(nz) = SM(nz) ./ NE(nz);
If you then disp(matrixMean), you get
(1,1) 10
(1,2) 6
(1,4) 8
(2,4) 7
(3,6) 7
If you want to access the individual elements differently, then after you have computed SM and NE you can do
[i j n] = find(NE);
matrixMean = SM(i,j)./NE(i,j);
disp([i(:) j(:) nonzeros(matrixMean)]);

Related

dot product of matrix columns

I have a 4x8 matrix which I want to select two different columns of it then derive dot product of them and then divide to norm values of that selected columns, and then repeat this for all possible two different columns and save the vectors in a new matrix. can anyone provide me a matlab code for this purpose?
The code which I supposed to give me the output is:
A=[1 2 3 4 5 6 7 8;1 2 3 4 5 6 7 8;1 2 3 4 5 6 7 8;1 2 3 4 5 6 7 8;];
for i=1:8
for j=1:7
B(:,i)=(A(:,i).*A(:,j+1))/(norm(A(:,i))*norm(A(:,j+1)));
end
end
I would approach this a different way. First, create two matrices where the corresponding columns of each one correspond to a unique pair of columns from your matrix.
Easiest way I can think of is to create all possible combinations of pairs, and eliminate the duplicates. You can do this by creating a meshgrid of values where the outputs X and Y give you a pairing of each pair of vectors and only selecting out the lower triangular part of each matrix offsetting by 1 to get the main diagonal just one below the diagonal.... so do this:
num_columns = size(A,2);
[X,Y] = meshgrid(1:num_columns);
X = X(tril(ones(num_columns),-1)==1); Y = Y(tril(ones(num_columns),-1)==1);
In your case, here's what the grid of coordinates looks like:
>> [X,Y] = meshgrid(1:num_columns)
X =
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
Y =
1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6
7 7 7 7 7 7 7 7
8 8 8 8 8 8 8 8
As you can see, if we select out the lower triangular part of each matrix excluding the diagonal, you will get all combinations of pairs that are unique, which is what I did in the last parts of the code. Selecting the lower-part is important because by doing this, MATLAB selects out values column-wise, and traversing the columns of the lower-triangular part of each matrix gives you the exact orderings of each pair of columns in the right order (i.e. 1-2, 1-3, ..., 1-7, 2-3, 2-4, ..., etc.)
The point of all of this is that can then use X and Y to create two new matrices that contain the columns located at each pair of X and Y, then use dot to apply the dot product to each matrix column-wise. We also need to divide the dot product by the multiplication of the magnitudes of the two vectors respectively. You can't use MATLAB's built-in function norm for this because it will compute the matrix norm for matrices. As such, you have to sum over all of the rows for each column respectively for each of the two matrices then multiply both of the results element-wise then take the square root - this is the last step of the process:
matrix1 = A(:,X);
matrix2 = A(:,Y);
B = dot(matrix1, matrix2, 1) ./ sqrt(sum(matrix1.^2,1).*sum(matrix2.^2,1));
I get this for B:
>> B
B =
Columns 1 through 11
1 1 1 1 1 1 1 1 1 1 1
Columns 12 through 22
1 1 1 1 1 1 1 1 1 1 1
Columns 23 through 28
1 1 1 1 1 1
Well.. this isn't useful at all. Why is that? What you are actually doing is finding the cosine angle between two vectors, and since each vector is a scalar multiple of another, the angle that separates each vector is in fact 0, and the cosine of 0 is 1.
You should try this with different values of A so you can see for yourself that it works.
To make this code compatible for copying and pasting, here it is:
%// Define A here:
A = repmat(1:8, 4, 1);
%// Code to produce dot products here
num_columns = size(A,2);
[X,Y] = meshgrid(1:num_columns);
X = X(tril(ones(num_columns),-1)==1); Y = Y(tril(ones(num_columns),-1)==1);
matrix1 = A(:,X);
matrix2 = A(:,Y);
B = dot(matrix1, matrix2, 1) ./ sqrt(sum(matrix1.^2,1).*sum(matrix2.^2,1));
Minor Note
If you have a lot of columns in A, this may be very memory intensive. You can get your original code to work with loops, but you need to change what you're doing at each column.
You can do something like this:
num_columns = nchoosek(size(A,2),2);
B = zeros(1, num_columns);
counter = 1;
for ii = 1 : size(A,2)
for jj = ii+1 : size(A,2)
B(counter) = dot(A(:,ii), A(:,jj), 1) / (norm(A(:,ii))*norm(A(:,jj)));
counter = counter + 1;
end
end
Note that we can use norm because we're specifying vectors for each of the inputs into the function. We first preallocate a matrix B that will contain the dot products of all possible combinations. Then, we go through each pair of combinations - take note that the inner for loop starts from the outer most for loop index added with 1 so you don't look at any duplicates. We take the dot product of the corresponding columns referenced by positions ii and jj and store the results in B. I need an external counter so we can properly access the right slot to place our result in for each pair of columns.

maximum value column with the condition of another column [duplicate]

I need to find the maximum among values with same labels, in matlab, and I am trying to avoid using for loops.
Specifically, I have an array L of labels and an array V of values, same size. I need to produce an array S which contains, for each value of L, the maximum value of V. An example will explain better:
L = [1,1,1,2,2,2,3,3,3,4,4,4,1,2,3,4]
V = [5,4,3,2,1,2,3,4,5,6,7,8,9,8,7,6]
Then, the values of the output array S will be:
s(1) = 9 (the values V(i) such that L(i) == 1 are: 5,4,3,9 -> max = 9)
s(2) = 8 (the values V(i) such that L(i) == 2 are: 2,1,2,8 -> max = 8)
s(3) = 7 (the values V(i) such that L(i) == 3 are: 3,4,5,7 -> max = 7)
s(4) = 8 (the values V(i) such that L(i) == 4 are: 6,7,8,6 -> max = 8)
this can be trivially implemented by traversing the arrays L and V with a for loop, but in Matlab for loops are slow, so I was looking for a faster solution. Any idea?
This is a standard job for accumarray.
Three cases need to be considered, with increasing generality:
Integer labels.
Integer labels, specify fill value.
Remove gaps; or non-integer labels. General case.
Integer labels
You can just use
S = accumarray(L(:), V(:), [], #max).';
In your example, this gives
>> L = [1 1 1 2 2 2 3 3 3 4 4 4 1 2 3 7];
>> V = [5 4 3 2 1 2 3 4 5 6 7 8 9 8 7 6];
>> S = accumarray(L(:), V(:), [], #max).'
S =
9 8 7 8
Integer labels, specify fill value
If there are gaps between integers in L, the above will give a 0 result for the non-existing labels. If you want to change that fill value (for example to NaN), use a fifth input argument in acccumarray:
S = accumarray(L(:), V(:), [], #max, NaN).';
Example:
>> L = [1 1 1 2 2 2 3 3 3 4 4 4 1 2 3 7]; %// last element changed
>> V = [5 4 3 2 1 2 3 4 5 6 7 8 9 8 7 6]; %// same as in your example
>> S = accumarray(L(:), V(:), [], #max, NaN).'
S =
9 8 7 8 NaN NaN 6
Remove gaps; or non-integer labels. General case
When the gaps between integer labels are large, using a fill value may be inefficient. In that case you may want to get only the meaningful values in S, without fill values, i.e.skip non-existing labels. Also, it may be the case that L doesn't necessarily contain integers.
These two issues are solved by applying unique to the labels before using accumarray:
[~, ~, Li] = unique(L); %// transform L into consecutive integers
S = accumarray(Li(:), V(:), [], #max, NaN).';
Example:
>> L = [1.5 1.5 1.5 2 2 2 3 3 3 4 4 4 1 2 3 7.8]; %// note: non-integer values
>> V = [5 4 3 2 1 2 3 4 5 6 7 8 9 8 7 6 ]; %// same as in your example
>> [~, ~, Li] = unique(L); %// transform L into consecutive integers
>> S = accumarray(Li(:), V(:), [], #max, NaN).'
S =
9 5 8 7 8 6
helper=[L.', V.'];
helper=sortrows(helper,-2);
[~,idx,~]=unique(helper(:,1));
S=helper(idx,2);
What I do is: I join the two arrays as columns. Then I sort them regarding second column with biggest element first. Then I get the idx of the unique Values in L before I return the corresponding Values from V.
The solution from Luis Mendo is faster. But as far as I see his solution doesn't work if there is a zero,negative value or a noninteger inside L:
Luis solution: Elapsed time is 0.722189 seconds.
My solution: Elapsed time is 2.575943 seconds.
I used:
L= ceil(rand(1,500)*10);
V= ceil(rand(1,500)*250);
and ran the code 10000 times.

How can I go through the columns of a matrix in matlab and add them each to a specific column of a sum matrix in matlab?

Supose there is a Matrix
A =
1 3 2 4
4 2 5 8
6 1 4 9
and I have a Vector containing the "class" of each column of this matrix for example
v = [1 , 1 , 2 , 3]
How can I sum the columns of the matrix to a new matrix as column vectors each to the column of their class? In this example columns 1 and 2 of A would added to the first column of the new matrix, column 2 to the 3 to the 2nd, column 4 the the 3rd.
Like
SUM =
4 2 4
6 5 8
7 4 9
Is this possible without loops?
One of the perfect scenarios to combine the powers of accumarray and bsxfun -
%// Since we are to accumulate columns, first step would be to transpose A
At = A.' %//'
%// Create a vector of linear IDs for use with ACCUMARRAY later on
idx = bsxfun(#plus,v(:),[0:size(A,1)-1]*max(v))
%// Use ACCUMARRAY to accumulate rows from At, i.e. columns from A based on the IDs
out = reshape(accumarray(idx(:),At(:)),[],size(A,1)).'
Sample run -
A =
1 3 2 4 6 0
4 2 5 8 9 2
6 1 4 9 8 9
v =
1 1 2 3 3 2
out =
4 2 10
6 7 17
7 13 17
An alternative with accumarray in 2D. Generate a grid with the vector v and then apply accumarray:
A = A.';
v = [1 1 2 3];
[X, Y] = ndgrid(v,1:size(A,2));
Here X and Y look like this:
X =
1 1 1
1 1 1
2 2 2
3 3 3
Y =
1 2 3
1 2 3
1 2 3
1 2 3
Then apply accumarray:
B=accumarray([X(:) Y(:)],A(:)),
SUM = B.'
SUM =
4 2 4
6 5 8
7 4 9
As you see, using [X(:) Y(:)] create the following array:
ans =
1 1
1 1
2 1
3 1
1 2
1 2
2 2
3 2
1 3
1 3
2 3
3 3
in which the vector v containing the "class" is replicated 3 times since there are 3 unique classes that are to be summed up together.
EDIT:
As pointed out by knedlsepp you can get rid of the transpose to A and B like so:
[X2, Y2] = ndgrid(1:size(A,1),v);
B = accumarray([X2(:) Y2(:)],A(:))
which ends up doing the same. I find it a bit more easier to visualize with the transposes but that gives the same result.
How about a one-liner?
result = full(sparse(repmat(v,size(A,1),1), repmat((1:size(A,1)).',1,size(A,2)), A));
Don't optimize prematurely!
The for loop performs fine for your problem:
out = zeros(size(A,1), max(v));
for i = 1:numel(v)
out(:,v(i)) = out(:,v(i)) + A(:,i);
end
BTW: With fine, I mean: fast, fast, fast!

Matlab: avoid for loops to find the maximum among values with same labels

I need to find the maximum among values with same labels, in matlab, and I am trying to avoid using for loops.
Specifically, I have an array L of labels and an array V of values, same size. I need to produce an array S which contains, for each value of L, the maximum value of V. An example will explain better:
L = [1,1,1,2,2,2,3,3,3,4,4,4,1,2,3,4]
V = [5,4,3,2,1,2,3,4,5,6,7,8,9,8,7,6]
Then, the values of the output array S will be:
s(1) = 9 (the values V(i) such that L(i) == 1 are: 5,4,3,9 -> max = 9)
s(2) = 8 (the values V(i) such that L(i) == 2 are: 2,1,2,8 -> max = 8)
s(3) = 7 (the values V(i) such that L(i) == 3 are: 3,4,5,7 -> max = 7)
s(4) = 8 (the values V(i) such that L(i) == 4 are: 6,7,8,6 -> max = 8)
this can be trivially implemented by traversing the arrays L and V with a for loop, but in Matlab for loops are slow, so I was looking for a faster solution. Any idea?
This is a standard job for accumarray.
Three cases need to be considered, with increasing generality:
Integer labels.
Integer labels, specify fill value.
Remove gaps; or non-integer labels. General case.
Integer labels
You can just use
S = accumarray(L(:), V(:), [], #max).';
In your example, this gives
>> L = [1 1 1 2 2 2 3 3 3 4 4 4 1 2 3 7];
>> V = [5 4 3 2 1 2 3 4 5 6 7 8 9 8 7 6];
>> S = accumarray(L(:), V(:), [], #max).'
S =
9 8 7 8
Integer labels, specify fill value
If there are gaps between integers in L, the above will give a 0 result for the non-existing labels. If you want to change that fill value (for example to NaN), use a fifth input argument in acccumarray:
S = accumarray(L(:), V(:), [], #max, NaN).';
Example:
>> L = [1 1 1 2 2 2 3 3 3 4 4 4 1 2 3 7]; %// last element changed
>> V = [5 4 3 2 1 2 3 4 5 6 7 8 9 8 7 6]; %// same as in your example
>> S = accumarray(L(:), V(:), [], #max, NaN).'
S =
9 8 7 8 NaN NaN 6
Remove gaps; or non-integer labels. General case
When the gaps between integer labels are large, using a fill value may be inefficient. In that case you may want to get only the meaningful values in S, without fill values, i.e.skip non-existing labels. Also, it may be the case that L doesn't necessarily contain integers.
These two issues are solved by applying unique to the labels before using accumarray:
[~, ~, Li] = unique(L); %// transform L into consecutive integers
S = accumarray(Li(:), V(:), [], #max, NaN).';
Example:
>> L = [1.5 1.5 1.5 2 2 2 3 3 3 4 4 4 1 2 3 7.8]; %// note: non-integer values
>> V = [5 4 3 2 1 2 3 4 5 6 7 8 9 8 7 6 ]; %// same as in your example
>> [~, ~, Li] = unique(L); %// transform L into consecutive integers
>> S = accumarray(Li(:), V(:), [], #max, NaN).'
S =
9 5 8 7 8 6
helper=[L.', V.'];
helper=sortrows(helper,-2);
[~,idx,~]=unique(helper(:,1));
S=helper(idx,2);
What I do is: I join the two arrays as columns. Then I sort them regarding second column with biggest element first. Then I get the idx of the unique Values in L before I return the corresponding Values from V.
The solution from Luis Mendo is faster. But as far as I see his solution doesn't work if there is a zero,negative value or a noninteger inside L:
Luis solution: Elapsed time is 0.722189 seconds.
My solution: Elapsed time is 2.575943 seconds.
I used:
L= ceil(rand(1,500)*10);
V= ceil(rand(1,500)*250);
and ran the code 10000 times.

Extracting unique values

I have data in two columns that looks as follows:
A B
1,265848208 3
-0,608043611 0
-0,285735893 0
0,006895134 7
0 7
-0,004526196 7
0,176326617 10
-0,159688071 2
0,22439945 2
-0,991045044 1
0,178022324 1
-0,270967397 4
0,285849994 4
1,881705539 23
1,057184204 10
NaN 10
For all unique values in B I want to extract the corresponding value in column A and move it to a new matrix. I'm looking to then compute the mean of all the corresponding values in A and use as a dependent variable (weighted by no of observations per value in B) in a regression with the common value of B being the independent variable to reduce noise. Any help would on how to do this in Matlab (except running the regression) would be great!
Thanks
Oscar
Here is an efficient solution:
X = [
1.265848208 3
-0.608043611 0
-0.285735893 0
0.006895134 7
0 7
-0.004526196 7
0.176326617 10
-0.159688071 2
0.22439945 2
-0.991045044 1
0.178022324 1
-0.270967397 4
0.285849994 4
1.881705539 23
1.057184204 10
NaN 10
];
%# unique values in B, and their indices
[valB,~,subs] = unique(X(:,2));
%# values of A for each unique number in B (cellarray)
valA = accumarray(subs, X(:,1), [], #(x) {x});
%# mean of each group
meanValA = cellfun(#nanmean, valA)
%# perform regression here...
The result:
%# B values, mean of corresponding values in A, number of A values
>> [valB meanValA cellfun(#numel,valA)]
ans =
0 -0.44689 2
1 -0.40651 2
2 0.032356 2
3 1.2658 1
4 0.0074413 2
7 0.00078965 3
10 0.61676 3
23 1.8817 1