MATLAB: "grouping mean"

Suppose I have the vectors:
y = [1 1.01 1.02 1.03 2 2.01 2.02 3 3.01 3.02 3.03];
c = [0 0 0 0 1 1 1 2 2 2 2 ];
Is there a vectorized way to get a "grouping mean", that is, the mean value of y for each unique value of c? (This is a simplified example; I have something similar but the vector size is in the thousands and there are hundreds of values of c)
I can do it in a for-loop, just wondering if it could be vectorized. Here's my for-loop implementation:
function [my,mc] = groupmean(y,c)
my = [];
mc = [];
for ci = unique(c(:))' % c(:) makes the loop visit one value at a time for row and column inputs alike
    mc(end+1) = ci;
    my(end+1) = mean(y(c==ci));
end

Short answer:
>> y = [1 1.01 1.02 1.03 2 2.01 2.02 3 3.01 3.02 3.03];
>> c = [0 0 0 0 1 1 1 2 2 2 2 ];
>> groupmeans = accumarray(c'+1,y',[],@mean)
groupmeans =
1.015
2.01
3.015
To explain the above: accumarray is a bit cryptic, but it is very fast and worth getting to know. The first input is a vector of subscripts that assigns each element of the second input to a group (both need to be column vectors, which is why it's c' and y'). The subscripts are used as indices into the output, so they must be positive integers, which is why I've added 1 to c'. The third input, [], lets accumarray choose the output size itself, and the fourth is a handle to the function that is applied to the values of y in each group.
Hope that makes sense! If not, doc accumarray :)
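When the group labels are not small non-negative integers, adding 1 is not enough. A minimal sketch, assuming the labels can be any numeric values: the third output of unique maps each label to a positive integer, which can then be used as the accumarray subscript. The names mc, idx and my are illustrative.

```matlab
y = [1 1.01 1.02 1.03 2 2.01 2.02 3 3.01 3.02 3.03];
c = [0 0 0 0 1 1 1 2 2 2 2];

% mc holds the sorted unique labels; idx(k) is the position of c(k) in mc,
% so idx is always a vector of positive integers.
[mc, ~, idx] = unique(c(:));

% Group the values of y by idx and take the mean of each group.
my = accumarray(idx(:), y(:), [], @mean);
```

Here mc lists the labels and my the matching means, in the same order.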

Related

Applying median filter to data with 2 axes

I have the following code:
x = VarName3;
y = VarName4;
x = (x/6000)/60;
plot(x, y)
Where VarName3 and VarName4 are 3000x1. I would like to apply a median filter to this in MATLAB. However, the problem I am having is that, if I use medfilt1, then I can only enter a single array of variables as the first argument. And for medfilt2, I can only enter a matrix as the first argument. But the data looks very obscured if I convert x and y into a matrix.
The x is time and y is a list of integers. I'd like to be able to filter out spikes and dips. How do I go about doing this? I was thinking of just eliminating the erroneous data points by direct manipulation of the data file. But then, I don't really get the effect of a median filter.
I found a solution using sort.
The median is the center element, so you can sort three elements and take the middle one as the median.
sort also returns, as its second output, the index of each sorted element in the input.
I used that index information to recover the matching value of X.
Here is my code sample:
%X - simulates time.
X = [1 2 3 4 5 6 7 8 9 10];
%Y - simulates data
Y = [0 1 2 0 100 1 1 1 2 3];
%Create three vectors:
Y0 = [0, Y(1:end-1)]; %Left elements [0 0 1 2 0 100 1 1 1 2]
Y1 = Y; %Center elements [0 1 2 0 100 1 1 1 2 3]
Y2 = [Y(2:end), 0]; %Right elements [1 2 0 100 1 1 1 2 3 0]
%Concatenate Y0, Y1 and Y2.
YYY = [Y0; Y1; Y2];
%Sort YYY:
%sortedYYY(2, :) equals medfilt1(Y)
%I(2, :) equals the index: value 1 for Y0, 2 for Y1 and 3 for Y2.
[sortedYYY, I] = sort(YYY);
%Median is the center of sorted 3 elements.
medY = sortedYYY(2, :);
%Corrected X index of medY
medX = X + I(2, :) - 2;
%Protect X from exceeding original boundaries.
medX = min(max(medX, min(X)), max(X));
Result:
medX =
1 2 2 3 6 7 7 8 9 9
>> medY
medY =
0 1 1 2 1 1 1 1 2 2
Use a sliding window on the data vector centred at a given time. The value of your filtered output at that time is the median value of the data in the sliding window. The size of the sliding window is an odd value, not necessarily fixed to 3.
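The sliding-window idea can be sketched with a plain loop that needs no toolbox. This is only an illustration under stated assumptions: the window length w is odd, and the window is simply truncated at the two ends of the data (other edge treatments, such as zero padding, give different values there). The names w, h and Yf are illustrative. Only Y is filtered; the time vector X is left untouched and the result is plotted against it.

```matlab
X = [1 2 3 4 5 6 7 8 9 10];        % time
Y = [0 1 2 0 100 1 1 1 2 3];       % data with a spike at X = 5

w = 3;                             % window length, assumed odd
h = (w - 1) / 2;                   % half-width of the window
n = numel(Y);
Yf = zeros(size(Y));
for k = 1:n
    lo = max(1, k - h);            % clip the window at the left edge
    hi = min(n, k + h);            % clip the window at the right edge
    Yf(k) = median(Y(lo:hi));      % median of the samples inside the window
end
% plot(X, Yf) then shows the filtered data on the original time axis.
```

If the Signal Processing Toolbox is available, medfilt1(Y, w) applies the same kind of filter, and newer MATLAB releases also provide movmedian; both may treat the edges differently from this sketch.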

MATLAB: After reshaping matrix to array, how can we know back where the value originally belongs to?

Let z = [1 3 5 6]. Taking all pairwise differences between its elements:
bsxfun(@minus, z', z)
ans =
0 -2 -4 -5
2 0 -2 -3
4 2 0 -1
5 3 1 0
I now want to order these values in ascending order and remove the duplicates. So:
sort(reshape(bsxfun(@minus, z', z),1,16))
ans =
Columns 1 through 13
-5 -4 -3 -2 -2 -1 0 0 0 0 1 2 2
Columns 14 through 16
3 4 5
C = unique(sort(reshape(bsxfun(@minus, z', z),1,16)))
C =
-5 -4 -3 -2 -1 0 1 2 3 4 5
But looking at -5 in [-5 -4 -3 -2 -1 0 1 2 3 4 5],
how can I tell where it comes from? Reading the matrix myself,
0 -2 -4 -5
2 0 -2 -3
4 2 0 -1
5 3 1 0
I know it comes from z(1) - z(4), i.e. row 1 column 4.
Also, 2 comes from both z(3) - z(2) and z(2) - z(1), i.e. from two places. Without reading the original matrix itself, how can we know that the 2 in [-5 -4 -3 -2 -1 0 1 2 3 4 5] originally sits in row 3 column 2 and row 2 column 1 of the matrix?
So, for each element in [-5 -4 -3 -2 -1 0 1 2 3 4 5], how can we efficiently find where it comes from in the original matrix? I need this because I want to do an operation on each difference and the indices that produce it. For example, for -5 I compute (-5)*1*4, since z(1) - z(4) = -5. But for 2 I need to compute 2*(3*2+2*1), since z(3) - z(2) = 2 and z(2) - z(1) = 2, so the source is not unique.
Thinking about it, I probably should not reshape bsxfun(@minus, z', z) into an array. I would also create two index arrays so that I can do operations like the (-5)*1*4 above efficiently. However, this is easier said than done, and I also have to take care of non-distinct sources. Or should I do the desired operations first?
Use the third output from unique. And don't sort, unique will do that for you.
[sortedOutput,~,linearIndices] = unique(reshape(bsxfun(@minus, z', z),[1 16]))
You can reconstruct the result from bsxfun like so:
distances = reshape(sortedOutput(linearIndices),[4 4]);
If you want to know where a certain value appears, you write
targetValue = -5;
targetValueIdx = find(sortedOutput==targetValue);
linearIndexIntoDistances = find(targetValueIdx==linearIndices);
[row,col] = ind2sub([4 4],linearIndexIntoDistances);
This works because linearIndices equals k wherever the k-th value of sortedOutput appears in the original vector (so it is 1 wherever the first value appears, and so on).
If you save the result of bsxfun in an intermediate variable:
distances=bsxfun(@minus, z', z)
Then you can look for the values of C in distances using find iteratively.
[rows,cols]=find(C(i)==distances)
This will give all rows and cols if the values are repeated. You just need to then use them for your equation.
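The find-based approach can be written out as a short loop. This is a sketch under an assumption about what the asker wants: each difference multiplied by the sum of row*column index products of the positions where it occurs (the 2*(3*2+2*1) pattern from the question). The variable result is illustrative.

```matlab
z = [1 3 5 6];
distances = bsxfun(@minus, z', z);   % 4-by-4 matrix of pairwise differences
C = unique(distances);               % sorted unique differences (column vector)

result = zeros(size(C));
for i = 1:numel(C)
    % All positions of the i-th unique difference, including repeats.
    [rows, cols] = find(distances == C(i));
    % Assumed operation: difference times the sum of row*col products.
    result(i) = C(i) * sum(rows .* cols);
end
```

For the value 2, which sits at (2,1) and (3,2), this gives 2*(2*1 + 3*2) = 16.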
You can use accumarray to collect all row and column indices that correspond to the same value in the matrix of differences:
z = [1 3 5 6]; % data vector
zd = bsxfun(@minus, z.', z); % matrix of differences
[C, ~, ind] = unique(zd); % unique values and indices
[rr, cc] = ndgrid(1:numel(z)); % template for row and col indices
f = @(x){x}; % anonymous function to collect row and col indices
row = accumarray(ind, rr(:), [], f); % group row indices according to ind
col = accumarray(ind, cc(:), [], f); % same for col indices
For example, C(6) is value 0, which appears four times in zd, at positions given by row{6} and col{6}:
>> row{6}.'
ans =
3 2 1 4
>> col{6}.'
ans =
3 2 1 4
As you see, the results are not guaranteed to be sorted. If you need to sort them in linear order:
rowcol = cellfun(@(r,c)sortrows([r c]), row, col, 'UniformOutput', false);
so now
>> rowcol{6}
ans =
1 1
2 2
3 3
4 4
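If the row and column indices are only needed to feed a sum, the cell arrays can be skipped and accumarray can add up the products directly. A minimal sketch, assuming the goal is each difference times the sum of row*column index products over its occurrences (the 2*(3*2+2*1) pattern from the question); s and result are illustrative names.

```matlab
z = [1 3 5 6];                       % data vector
zd = bsxfun(@minus, z.', z);         % matrix of differences
[C, ~, ind] = unique(zd);            % unique values and group index per element
[rr, cc] = ndgrid(1:numel(z));       % row and column index of every element

% Sum of row*col over all positions that share the same difference.
s = accumarray(ind(:), rr(:) .* cc(:));

% Assumed final operation: scale each sum by its difference.
result = C(:) .* s;
```

Repeated values such as 2 are handled automatically, because every occurrence contributes to the same group.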
I'm not sure I've followed exactly but some points to consider:
unique will sort the data for you by default so you don't need to call sort first
unique actually has three outputs and you can recover your original vector (i.e. with duplicates) using the third output so
[C,~,ic] = unique(reshape(bsxfun(@minus, z', z),1,16))
now you can get back to bsxfun(@minus, z', z) by calling
reshape(C(ic), numel(z), numel(z))
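You might be more interested in the second output of unique, which tells you at what index each unique value was found in your 1-by-16 vector. Note that it reports only one occurrence per value (the first or the last, depending on the MATLAB version and options), so a repeated difference such as 2 is listed only once. It really depends on what you're trying to do, though. With this you could get a list of row-column pairs to match your unique values: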
[rows, cols] = ndgrid(1:4);
coords = [rows(:), cols(:)];
[C, ia] = unique(reshape(bsxfun(@minus, z', z),1,16));
coords_pairs = coords(ia,:)
which results in
coords_pairs =
1 4
1 3
2 4
2 3
3 4
4 4
4 3
3 2
4 2
3 1
4 1

Efficient representation of low-complexity integer vectors

Let V be a vector of integers, and let L be the length of V.
Now, suppose that the number N of distinct values in V is much smaller than L.
One may also assume that V is sorted, so that it can be construed as the concatenation of N consecutive constant "blocks".
Lastly, one may assume that, once initialized, V is henceforth read-only (aka immutable).
(In the case I'm working with at the moment, L is between 10^6 and 10^7, and N is about 20.)
It is wasteful to store such low-complexity data in a standard MATLAB L-long vector. Does MATLAB have any built-in [1] data structure that
(a) has the same interface as a regular vector (e.g. one can read its k-th element with the expression V(k), its last element with V(end), ranges of locations with V(p:q), etc.), and
(b) uses much less storage space than L × the size of an integer?
BTW, the problem here is reminiscent to that of sparse-array representation, but not quite the same (at least AFAICT).
OK, here's my solution, based on gariepy's answer:
block_sizes = [5, 4, 3, 2];
block_values = 1000 + (1:numel(block_sizes));
interpolation_table = [0 cumsum(block_sizes)];
V = @(i) interp1(interpolation_table, [NaN block_values], i, 'next');
V(0)
ans =
NaN
V(1:5)
ans =
1001 1001 1001 1001 1001
V(6:9)
ans =
1002 1002 1002 1002
V(10:12)
ans =
1003 1003 1003
V(13:14)
ans =
1004 1004
V(15)
ans =
NaN
It has a tiny wart, though:
V(end)
ans =
1001
(It would have been better if it raised an exception when given end as an argument, rather than giving a completely crazy answer.)
[1] Of course, I know that I can always try to roll my own implementation of such a thing, but I prefer not to re-invent wheels if I can avoid it.
One possible method to represent this data is an interpolation table.
Assume vec is your length L vector. First, count the number of occurrences:
[num_occurrences, y_values] = hist(vec, unique(vec));
Then, build the interpolation representation:
interp_table = zeros(1,length(y_values) + 1);
interp_table(1) = 1;
y_values(end+1) = y_values(end) + 1; % Need to create a "fake" value at the end of the interpolation table
for i = 2:length(y_values)
interp_table(i) = interp_table(i-1) + num_occurrences(i-1);
end
Finally, define a function handle to give you the "array-like" access you want.
my_fun = @(x) interp1(interp_table, y_values, x, 'previous');
Example:
>> vec = [1 1 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 ];
>> my_fun(1)
ans =
1
>> my_fun(2)
ans =
1
>> my_fun(3)
ans =
2
>> my_fun(6)
ans =
2
>> my_fun(7)
ans =
2
>> my_fun(8)
ans =
3
>> my_fun(17)
ans =
3
>> my_fun(18) %% This is vec(L+1), so it should never be needed
ans =
4
>> my_fun(19) %% Above index L+1, values are not defined
ans =
NaN
The examples demonstrate a minor caveat: values above my_fun(L) should not be used, where L is the length of the original vector that is being represented by the interpolation table. So this gives you array-like access, though you cannot directly calculate the "length" of this interpolation table.
EDIT: note that you CAN do ranges with this interpolation function:
>> my_fun(1:17)
ans =
Columns 1 through 15
1 1 2 2 2 2 2 3 3 3 3 3 3 3 3
Columns 16 through 17
3 3
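As an alternative to interp1, the same block lookup can be sketched with cumsum and a comparison, which also keeps the length of the represented vector available. This is an illustration, not a drop-in vector replacement: lookup, block_ends and L are hypothetical names, indices below 1 are not checked, and an index above L fails with an out-of-range error instead of returning NaN.

```matlab
block_sizes  = [5 4 3 2];
block_values = 1000 + (1:numel(block_sizes));
block_ends   = cumsum(block_sizes);   % last index of each block: [5 9 12 14]
L            = block_ends(end);       % length of the represented vector

% For each index k, count the blocks that end before k; the next block
% is the one that contains k.
lookup = @(k) block_values(sum(bsxfun(@lt, block_ends(:), k(:).'), 1) + 1);

first_block = lookup(1:5);            % every index of the first block
mixed       = lookup([5 6 12 13]);    % indices on both sides of block edges
last_value  = lookup(L);              % plays the role of V(end)
```

The comparison builds an N-by-numel(k) temporary, which stays small when N is about 20 but grows with the number of indices requested at once.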

matlab indexing with multiple condition

I can't figure out how to create a vector based on conditions on more than one other vector. I have three vectors, and I need the values of one vector where the values of the other vectors satisfy a condition.
As an example, below I would like to choose the values from vector a where b==2 and c==0; I expect [2 4].
a = [1 2 3 4 5 6 7 8 9 10];
b = [1 2 1 2 1 2 1 2 1 2];
c = [0 0 0 0 0 1 1 1 1 1]
I thought something like:
d = a(b==2) & a(c==0)
but I get d = 1 1 1 1 1, and I'm not sure why.
It seems to be a basic problem, but I can't find a solution for it.
In your case you can consider using a(b==2 & c==0)
Use ismember to find the matching indices along the rows after concatenating b and c and then index to a.
Code
a(ismember([b;c]',[2 0],'rows'))
Output
ans =
2
4
You may use bsxfun too for the same result -
a(all(bsxfun(@eq,[b;c],[2 0]'),1))
Or you may just tweak your method to get the correct result -
a(b==2 & c==0)
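To see why the original attempt returns all ones, it helps to look at the two operands separately. The sketch below uses illustrative names (lhs, rhs, d_wrong, mask) and contrasts combining the extracted values with combining the conditions before indexing.

```matlab
a = [1 2 3 4 5 6 7 8 9 10];
b = [1 2 1 2 1 2 1 2 1 2];
c = [0 0 0 0 0 1 1 1 1 1];

lhs = a(b == 2);             % [2 4 6 8 10]: values of a where b is 2
rhs = a(c == 0);             % [1 2 3 4 5]:  values of a where c is 0
d_wrong = lhs & rhs;         % & treats every nonzero number as true,
                             % so this is five logical ones

mask = (b == 2) & (c == 0);  % combine the conditions element by element
d = a(mask);                 % then index a once with the combined mask
```

The & operator has to act on the logical masks, not on the values that were already picked out of a.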

cumsum only within groups?

Let's say I have 2 vectors:
a=[0 1 0 1 1 0 1 0 0 0 1 1 1];
b=[1 1 1 1 1 1 2 2 2 3 3 3 3];
For every group of numbers in b I want to sum the matching values of a, so that the result looks like this:
c=[1 3;2 1;3 3]
That is, group 1 in b contains three ones in a, group 2 contains only one, and so on.
There have been some complicated answers so far. Try accumarray(b',a').
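A short sketch of how the accumarray suggestion produces the two-column result from the question. Mapping b through unique first is an added assumption, so that the labels do not have to be the integers 1, 2, 3; groups, idx and sums are illustrative names.

```matlab
a = [0 1 0 1 1 0 1 0 0 0 1 1 1];
b = [1 1 1 1 1 1 2 2 2 3 3 3 3];

% groups holds the distinct labels of b; idx assigns each element to a group.
[groups, ~, idx] = unique(b(:));

% With no function handle, accumarray sums the values in each group.
sums = accumarray(idx(:), a(:));

c = [groups, sums];   % one row per group: [label, sum of a in that group]
```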
If you're looking for a solution where b can be anything, then a combination of hist and unique will help:
num = unique(b(logical(a))); %# identify the numbers in b with non-zero counts
cts = hist(b(logical(a)),num); %# count
c = [num(:),cts(:)]; %# combine.
If you want the first column of c to go from 1 to the maximum of b, then you can rewrite the first line as num=1:max(b), and you'll also get rows in c where the counts are zero.
Assuming that b is monotonically increasing by 1:
c = cell2mat(transpose(arrayfun( @(x) [ x sum(a(find( b == x ))) ], min(b):max(b), 'UniformOutput',false)))
should give the right answer in a one liner format, or:
for ii=min(b):max(b)
II = find( b == ii );
v = sum(a(II));
c(ii,:) = [ii v];
end
which is a bit easier to read. Hope this helps.