I need to find the maximum among values with the same label in MATLAB, and I am trying to avoid using for loops.
Specifically, I have an array L of labels and an array V of values of the same size. I need to produce an array S that contains, for each value of L, the maximum value of V. An example will explain better:
L = [1,1,1,2,2,2,3,3,3,4,4,4,1,2,3,4]
V = [5,4,3,2,1,2,3,4,5,6,7,8,9,8,7,6]
Then, the values of the output array S will be:
S(1) = 9 (the values V(i) such that L(i) == 1 are: 5,4,3,9 -> max = 9)
S(2) = 8 (the values V(i) such that L(i) == 2 are: 2,1,2,8 -> max = 8)
S(3) = 7 (the values V(i) such that L(i) == 3 are: 3,4,5,7 -> max = 7)
S(4) = 8 (the values V(i) such that L(i) == 4 are: 6,7,8,6 -> max = 8)
This can be trivially implemented by traversing the arrays L and V with a for loop, but in MATLAB for loops are slow, so I was looking for a faster solution. Any ideas?
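For reference, here is a minimal sketch of the loop the question describes. It assumes the labels are positive integers and is only meant as a baseline for checking the vectorized answers.

```matlab
% Loop baseline: assumes L contains positive integer labels.
L = [1,1,1,2,2,2,3,3,3,4,4,4,1,2,3,4];
V = [5,4,3,2,1,2,3,4,5,6,7,8,9,8,7,6];
S = -inf(1, max(L));              % one slot per label 1..max(L); unused labels stay -Inf
for k = 1:numel(L)
    S(L(k)) = max(S(L(k)), V(k)); % keep the running maximum for label L(k)
end
% S is now [9 8 7 8]
```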
This is a standard job for accumarray.
Three cases need to be considered, with increasing generality:
Integer labels.
Integer labels, specify fill value.
Remove gaps; or non-integer labels. General case.
Integer labels
You can just use
S = accumarray(L(:), V(:), [], @max).';
In your example, this gives
>> L = [1 1 1 2 2 2 3 3 3 4 4 4 1 2 3 4];
>> V = [5 4 3 2 1 2 3 4 5 6 7 8 9 8 7 6];
>> S = accumarray(L(:), V(:), [], @max).'
S =
9 8 7 8
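The third argument of accumarray (left empty above) sets the size of the output. As a sketch, assuming you know the total number of labels K in advance (K = 6 here is a made-up value for illustration), you can force S to have one entry per label even if the largest labels never occur:

```matlab
L = [1 1 1 2 2 2 3 3 3 4 4 4 1 2 3 4];
V = [5 4 3 2 1 2 3 4 5 6 7 8 9 8 7 6];
K = 6;                                     % hypothetical total number of labels
S = accumarray(L(:), V(:), [K 1], @max).'; % output sized K-by-1, then transposed
% labels 5 and 6 never occur, so they get the default fill value 0
```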
Integer labels, specify fill value
If there are gaps between integers in L, the above will give a 0 result for the non-existing labels. If you want to change that fill value (for example to NaN), use a fifth input argument in accumarray:
S = accumarray(L(:), V(:), [], @max, NaN).';
Example:
>> L = [1 1 1 2 2 2 3 3 3 4 4 4 1 2 3 7]; %// last element changed
>> V = [5 4 3 2 1 2 3 4 5 6 7 8 9 8 7 6]; %// same as in your example
>> S = accumarray(L(:), V(:), [], @max, NaN).'
S =
9 8 7 8 NaN NaN 6
Remove gaps; or non-integer labels. General case
When the gaps between integer labels are large, using a fill value may be inefficient. In that case you may want to get only the meaningful values in S, without fill values, i.e. skip non-existing labels. Also, L may not contain integers at all.
These two issues are solved by applying unique to the labels before using accumarray:
[~, ~, Li] = unique(L); %// transform L into consecutive integers
S = accumarray(Li(:), V(:), [], @max, NaN).';
Example:
>> L = [1.5 1.5 1.5 2 2 2 3 3 3 4 4 4 1 2 3 7.8]; %// note: non-integer values
>> V = [5 4 3 2 1 2 3 4 5 6 7 8 9 8 7 6 ]; %// same as in your example
>> [~, ~, Li] = unique(L); %// transform L into consecutive integers
>> S = accumarray(Li(:), V(:), [], @max, NaN).'
S =
9 5 8 7 8 6
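Since unique also returns the sorted distinct labels as its first output, a small extension of the general case keeps each maximum next to its label. Because Li already consists of consecutive integers, no fill value is needed in this sketch:

```matlab
L = [1.5 1.5 1.5 2 2 2 3 3 3 4 4 4 1 2 3 7.8];
V = [5 4 3 2 1 2 3 4 5 6 7 8 9 8 7 6];
[U, ~, Li] = unique(L);                 % U: sorted distinct labels
S = accumarray(Li(:), V(:), [], @max);  % one maximum per distinct label
result = [U(:) S];                      % column 1: label, column 2: maximum
```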
helper=[L.', V.'];
helper=sortrows(helper,-2);
[~,idx]=unique(helper(:,1),'first'); % first row of each label = largest V after the descending sort
S=helper(idx,2);
What I do: I join the two arrays as columns, then sort the rows by the second column in descending order, so that the largest value of each label comes first. Then I take the index of the first occurrence of each unique value in L and return the corresponding values from V.
Luis Mendo's solution is faster. As far as I can see, though, the plain accumarray call does not work if L contains a zero, a negative value or a non-integer (the unique-based general case handles these). Timings:
Luis's solution: Elapsed time is 0.722189 seconds.
My solution: Elapsed time is 2.575943 seconds.
I used:
L= ceil(rand(1,500)*10);
V= ceil(rand(1,500)*250);
and ran the code 10000 times.
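A rough sketch of such a benchmark with tic/toc, using fewer repetitions than the 10000 quoted above to keep it quick. The 'first' flag is passed to unique explicitly so the sort-based method does not depend on the default occurrence order. Absolute times will depend on your machine and MATLAB version:

```matlab
% Rough benchmark of accumarray vs. sortrows+unique on random labels 1..10.
L = ceil(rand(1,500)*10);
V = ceil(rand(1,500)*250);
nRuns = 1000;   % the timings above used 10000 runs

tic
for r = 1:nRuns
    S1 = accumarray(L(:), V(:), [], @max);   % one maximum per label 1..max(L)
end
tAccum = toc;

tic
for r = 1:nRuns
    helper = sortrows([L.', V.'], -2);       % largest values first
    [~, idx] = unique(helper(:,1), 'first'); % first (largest) row of each label
    S2 = helper(idx, 2);
end
tSort = toc;

% S1 and S2 agree whenever every label 1..10 occurs (near-certain with 500 samples)
fprintf('accumarray: %.3f s, sortrows+unique: %.3f s\n', tAccum, tSort);
```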
Related
I would like to align vectors with different time stamps and sum the counts of the corresponding bins.
Let's assume I have three matrices from [N,edges] = histcounts with the following structure: the first row holds the edges (i.e. the bins) and the second row holds the values. I would like to sum all values that share the same bin.
A = [0 1 2 3 4 5;
5 5 6 7 8 5]
B = [1 2 3 4 5 6;
2 5 7 8 5 4]
C = [2 3 4 5 6 7 8;
1 2 6 7 4 3 2]
Now I want to sum the values of identical bins. My final result should be:
result = [0 1 2 3 4 5 6 7 8;
5 7 12 16 ...]
I could loop over all numbers, but I would like to have it fast.
You can use accumarray:
H = [A B C].';  % concatenate the histograms; each row is a (bin, value) pair
V = [unique(H(:,1)) accumarray(H(:,1)+1, H(:,2))].';  % find the unique bins and accumulate
V =
0 1 2 3 4 5 6 7 8
5 7 12 16 22 17 8 3 2
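Note: the H(:,1)+1 shifts the bins so that they become positive integers, because accumarray only accepts positive integer subscripts (a bin of 0 would otherwise make MATLAB complain). The output V still shows the actual bins. This version also assumes the bins are consecutive integers starting at 0, so that the accumarray output lines up with unique(H(:,1)). To avoid these restrictions, as @Daniel says in the comments, use the third output of unique (see: https://stackoverflow.com/a/27783568/2732801):
H = [A B C].';  % concatenate the histograms; each row is a (bin, value) pair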
[U, ~, IU] = unique(H(:,1));
V = [U accumarray(IU, H(:,2))].';
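As a quick check, here is a self-contained run of the unique-based code on the A, B and C from the question. Because it groups through the third output of unique instead of shifting the bins, the same lines should also work for non-integer or negative bin edges:

```matlab
A = [0 1 2 3 4 5; 5 5 6 7 8 5];
B = [1 2 3 4 5 6; 2 5 7 8 5 4];
C = [2 3 4 5 6 7 8; 1 2 6 7 4 3 2];
H = [A B C].';                      % one row per (bin, value) pair
[U, ~, IU] = unique(H(:,1));        % distinct bins and a group index for every row
V = [U accumarray(IU, H(:,2))].';   % row 1: bins, row 2: summed values
```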
If you're only doing this with the 3 variables you've shown, looping over them is unlikely to cause a noticeable performance hit.
But if you are really averse to the looping idea, then you can do it using arrayfun.
bins = 0:8;  % not named rng, which would shadow the built-in rng function
output = arrayfun(@(x) sum([A(2,A(1,:) == x), B(2,B(1,:) == x), C(2,C(1,:) == x)]), bins);
output = cat(1, bins, output);
output =
0 1 2 3 4 5 6 7 8
5 7 12 16 22 17 8 3 2
This can be beneficial for particularly large A, B and C, since they are never concatenated into a new array, so there is no copying of data.
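For comparison, a minimal sketch of the plain loop mentioned above. It assumes the histograms are kept in a cell array (hists is a hypothetical name) and that each histogram lists every bin at most once:

```matlab
A = [0 1 2 3 4 5; 5 5 6 7 8 5];
B = [1 2 3 4 5 6; 2 5 7 8 5 4];
C = [2 3 4 5 6 7 8; 1 2 6 7 4 3 2];
hists = {A, B, C};                          % hypothetical container: [bins; values] per cell
bins = unique([A(1,:) B(1,:) C(1,:)]);      % every bin that occurs anywhere
total = zeros(size(bins));
for k = 1:numel(hists)
    Hk = hists{k};
    [~, loc] = ismember(Hk(1,:), bins);     % position of each bin of Hk in 'bins'
    total(loc) = total(loc) + Hk(2,:);      % safe because bins are unique within Hk
end
result = [bins; total];
```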
To keep it simple, assume that I have a 10x3 matrix in MATLAB. The numbers in the first two columns of each row are the x and y position, and the number in the third column is the corresponding value. For instance, [1 4 12] means that the value of the function at x=1 and y=4 is 12. The same x and y can appear in several rows, and I want to average the values that share the same x,y and replace all of those rows with a single row holding the average.
For example :
A = [1 4 12
1 4 14
1 4 10
1 5 5
1 5 7];
I want to have
B = [1 4 12
1 5 6]
I really appreciate your help
Thanks
Ali
Like this?
A = [1 4 12;1 4 14;1 4 10; 1 5 5;1 5 7];
[x,y] = consolidator(A(:,1:2),A(:,3),@mean);
B = [x,y]
B =
1 4 12
1 5 6
Consolidator is on the File Exchange.
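Using built-in functions (this assumes positive integer x and y; a pair whose mean is exactly 0 would be dropped by find):
sparsemean = accumarray(A(:,1:2), A(:,3), [], @mean, 0, true);  % sparse matrix of means
[i,j,v] = find(sparsemean);  % row, column and value of every nonzero entry
B = [i j v];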
A = [1 4 12;1 4 14;1 4 10; 1 5 5;1 5 7]; %your example data
B = unique(A(:, 1:2), 'rows'); %find the unique xy pairs
C = nan(size(B, 1), 1);
% calculate means
for ii = 1:size(B, 1)
C(ii) = mean(A(A(:, 1) == B(ii, 1) & A(:, 2) == B(ii, 2), 3));
end
C =
12
6
The step inside the for loop uses logical indexing to find the mean of rows that match the current xy pair in the loop.
Use unique to get the unique rows, use the returned index array to identify which rows should be averaged together, and let accumarray do the averaging:
[C,~,J]=unique(A(:,1:2), 'rows');
B=[C, accumarray(J,A(:,3),[],@mean)];
For your example
>> [C,~,J]=unique(A(:,1:2), 'rows')
C =
1 4
1 5
J =
1
1
1
2
2
C contains the unique rows, and J shows which row of C each row of the original matrix corresponds to. Then
>> accumarray(J,A(:,3),[],@mean)
ans =
12
6
returns the desired averages and
>> B=[C, accumarray(J,A(:,3),[],@mean)]
B =
1 4 12
1 5 6
is the answer.
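If you also want to know how many rows went into each average, the same J can be reused with a value of 1 per row. A small sketch that appends the count as a fourth column:

```matlab
A = [1 4 12; 1 4 14; 1 4 10; 1 5 5; 1 5 7];
[C, ~, J] = unique(A(:,1:2), 'rows');     % unique (x, y) pairs and a group index per row
means  = accumarray(J, A(:,3), [], @mean); % average value per pair
counts = accumarray(J, ones(size(J)));     % number of rows per pair
B = [C, means, counts];                    % [x y mean count]
```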
I have a 22007x3 matrix with data in column 3 and two separate indices in columns 1 and 2.
e.g.
x =
1 3 4
1 3 5
1 3 5
1 16 4
1 16 3
1 16 4
2 4 1
2 4 3
2 11 2
2 11 3
2 11 2
I need to find the mean of the values in column 3 when the values in column 1 are the same AND the values in column 2 are the same, to end up with something like:
ans =
1 3 4.6667
1 16 3.6667
2 4 2
2 11 2.3333
Please bear in mind that in my data, the number of times the values in column 1 and 2 occur can be different.
Two options I've tried already are the meshgrid/accumarray option, using two distinct unique functions and a 3D array:
[U, ix, iu] = unique(x(:, 1));
[U2,ix2,iu2] = unique(x(:,2));
[c, r, j] = meshgrid((1:size(x(:, 1), 2)), iu, iu2);
totals = accumarray([r(:), c(:), j(:)], x(:), [], @nanmean);
which gives me this:
??? Maximum variable size allowed by the program is exceeded.
Error in ==> meshgrid at 60
xx = xx(ones(ny,1),:,ones(nz,1));
and the loop option,
for i=1:size(x,1)
if x(i,2)== x(i+1,2);
totals(i,:)=accumarray(x(:,1),x(:,3),[],@nanmean);
end
end
which is obviously so very, very wrong, not least because of the x(i+1,2) bit.
I'm also considering creating separate matrices depending on how many times a value in column 1 occurs, but that would be long and inefficient, so I'm loath to go down that road.
Group on the first two columns with unique(...,'rows'), then accumulate only the third column. It is always best to accumulate only where accumulation really happens, which avoids accumulating the indices (i.e. the first two columns); you can reattach those afterwards with unX:
[unX,~,subs] = unique(x(:,1:2),'rows');
out = [unX accumarray(subs,x(:,3),[],@nanmean)];  % nanmean is from the Statistics and Machine Learning Toolbox
out =
1 3 4.6667
1 16 3.6667
2 4 2
2 11 2.3333
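If the Statistics and Machine Learning Toolbox (which provides nanmean) is not available, a sketch with an anonymous function that drops NaN values before averaging (nanmeanFn is a hypothetical name) gives the same result on the question's data:

```matlab
x = [1 3 4; 1 3 5; 1 3 5; 1 16 4; 1 16 3; 1 16 4;
     2 4 1; 2 4 3; 2 11 2; 2 11 3; 2 11 2];
nanmeanFn = @(v) mean(v(~isnan(v)));          % hypothetical helper: mean ignoring NaN
[unX, ~, subs] = unique(x(:,1:2), 'rows');    % group rows by (column 1, column 2)
out = [unX accumarray(subs, x(:,3), [], nanmeanFn)];
```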
This is an ideal opportunity to use sparse matrix math.
x = [ 1 2 5;
1 2 7;
2 4 6;
3 4 6;
1 4 8;
2 4 8;
1 1 10]; % for example
SM = sparse(x(:,1),x(:,2), x(:,3));
disp(SM)
Result:
(1,1) 10
(1,2) 12
(1,4) 8
(2,4) 14
(3,4) 6
As you can see, we did the "accumulate same indices into same container" in one fell swoop. Now you need to know how many elements went into each entry:
NE = sparse(x(:,1), x(:,2), ones(size(x(:,1))));
disp(NE);
Result:
(1,1) 1
(1,2) 2
(1,4) 1
(2,4) 2
(3,4) 1
Finally, you divide one by the other to get the mean (only use elements that have a value):
matrixMean = SM;
nz = find(NE>0);
matrixMean(nz) = SM(nz) ./ NE(nz);
If you then disp(matrixMean), you get
(1,1) 10
(1,2) 6
(1,4) 8
(2,4) 7
(3,4) 6
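If you would rather list each index pair with its mean, then after you have computed SM and NE you can do
[i, j, n] = find(NE);               % index pairs that occur, with their counts
idx = sub2ind(size(NE), i, j);      % linear indices, so pairs are matched element-wise
matrixMean = full(SM(idx)) ./ n;    % SM(i,j) would index a whole submatrix instead
disp([i j matrixMean]);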
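Applied to the data from the question, a sketch of the sparse approach (sums and counts per index pair, then a division at the stored entries) gives the same means as the unique-based answer. find returns the pairs in column-major order, so sortrows at the end puts them in the same order:

```matlab
x = [1 3 4; 1 3 5; 1 3 5; 1 16 4; 1 16 3; 1 16 4;
     2 4 1; 2 4 3; 2 11 2; 2 11 3; 2 11 2];
SM = sparse(x(:,1), x(:,2), x(:,3));             % sum of values per (col1, col2) pair
NE = sparse(x(:,1), x(:,2), ones(size(x,1), 1)); % number of rows per pair
[i, j, n] = find(NE);                            % pairs that occur, with their counts
means = full(SM(sub2ind(size(NE), i, j))) ./ n;  % element-wise division at those pairs
out = sortrows([i j means]);                     % sort by column 1, then column 2
```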
What is an efficient MATLAB way (without loops) to do the following: transform a vector input into a vector output such that output(i) is the number of elements of input that are less than or equal to input(i)?
For example:
input = [5 3 3 2 4 4 4]
would give:
output = [7 3 3 1 6 6 6]
First of all, don't use input as a variable name: it shadows the built-in input function. I'll use X here instead.
An alternative way to obtain your desired result would be:
[U, V] = meshgrid(1:numel(X), 1:numel(X));
Y = sum(X(U) >= X(V))
and here's a one-liner:
Y = sum(bsxfun(@ge, X, X'))
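Both versions above build an n-by-n comparison matrix, so memory grows quadratically with numel(X). Here is a sketch of an alternative that is not from the answers here. It ranks the distinct values with unique and counts them with accumarray and cumsum, which avoids the n-by-n intermediate:

```matlab
X = [5 3 3 2 4 4 4];
[~, ~, j] = unique(X(:));                 % j(k): rank of X(k) among the sorted distinct values
c = cumsum(accumarray(j, ones(size(j)))); % c(r): number of elements <= r-th distinct value
Y = reshape(c(j), size(X));               % look up the count for each original element
```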
EDIT:
If X has multiple rows and you want to apply this operation on each row, this is a little bit trickier. Here's what you can do:
[U, V] = meshgrid(1:numel(X), 1:size(X, 2));
V = V + size(X, 2) * floor((U - 1) / size(X, 2));  % idivide would require integer-class inputs
Xt = X';
Y = reshape(sum(Xt(U) >= Xt(V))', size(Xt))'
Example:
X =
5 3 3 2 4 4 4
3 9 7 7 1 2 2
Y =
7 3 3 1 6 6 6
4 7 6 6 1 3 3
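A more compact sketch for the multi-row case (an alternative, not the answer's code) compares every element with the others in its row through a 3-D bsxfun. It needs m*n*n memory for an m-by-n X:

```matlab
X = [5 3 3 2 4 4 4; 3 9 7 7 1 2 2];
% cmp(r,k,c) is true when X(r,k) <= X(r,c)
cmp = bsxfun(@le, X, permute(X, [1 3 2]));
Y = permute(sum(cmp, 2), [1 3 2]);  % count per element, back to m-by-n
```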
I have found a possible answer:
output = arrayfun(@(x) sum(x>=input),input)
but it doesn't take advantage of vectorization.