Summing data by window size in grouped data - matlab

I am quite new to matlab and I am trying to find a way to accomplish the following task without using for loops:
I have a data set looking like this:
data = [1 5; 1 3; 1 8; 2 1; 2 2; 2 5; 3 3; 3 8; 3 4]
The first column is a group index (in the future it will be a month-year combination).
Now I want to calculate a forward-looking sum over the second column with a given window size, but only over rows with the same group index. Near the end of a group, where fewer rows than the window size remain, the sum should cover only the rows that are left in that group.
With window size=2 I would like to create the following result:
summed_data = [1 8; 1 11; 1 8; 2 3; 2 7; 2 5; 3 11; 3 12; 3 4]
With a window size of 3 the result would look like this:
summed_data = [1 16; 1 11; 1 8; 2 8; 2 7; 2 5; 3 15; 3 12; 3 4]
and so on.
I thought about using accumarray with suitably constructed subindices, but I am stuck on handling the window size and the fact that the sums overlap.
Does anyone have an idea how to implement this without using loops?
Thanks in advance and best regards
stephan

This seems to work:
ws = 2; k = [ones(ws,1);zeros(mod(ws,2),1)];
C = accumarray(data(:,1),data(:,2),[],@(v){conv(v,k,'same')})
You seem to be anchoring the window at the current element and looking forward. Is that correct?
Not sure if this covers all corner cases, but it might steer you in the right direction.
Test: ws = 2
ws = 2; k = [ones(ws,1);zeros(mod(ws,2),1)];
C = accumarray(data(:,1),data(:,2),[],@(v){conv(v,k,'same')});
summed_data = [data(:,1) vertcat(C{:})]
summed_data =
1 8
1 11
1 8
2 3
2 7
2 5
3 11
3 12
3 4
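Test: ws = 3
ws = 3; k = [ones(ws,1);zeros(mod(ws,2),1)];
C = accumarray(data(:,1),data(:,2),[],@(v){conv(v,k,'same')});
summed_data = [data(:,1) vertcat(C{:})]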
summed_data =
1 16
1 11
1 8
2 8
2 7
2 5
3 15
3 12
3 4
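The kernel padding above is shown only for ws = 2 and ws = 3, and the answer itself is unsure about corner cases, so it may not stay aligned for larger windows. A minimal sketch of one alternative that should not depend on how 'same' centers the kernel is to run a causal moving sum with filter over each flipped group. This assumes the group ids are positive integers and the rows are already sorted by group; the name fwdSum is made up for illustration.

```matlab
data = [1 5; 1 3; 1 8; 2 1; 2 2; 2 5; 3 3; 3 8; 3 4];
ws = 4;                          % window size (any positive integer)
k = ones(ws, 1);                 % moving-sum kernel

% filter(k,1,x) sums the current and the previous ws-1 samples, treating
% samples before the start as zero. Flipping the group before and after
% turns that into a forward-looking sum truncated at the end of the group.
fwdSum = @(v) {flipud(filter(k, 1, flipud(v)))};

% accumarray hands each group's values to fwdSum as a column vector.
C = accumarray(data(:,1), data(:,2), [], fwdSum);
summed_data = [data(:,1) vertcat(C{:})];
```

With ws = 4 every group here has only three rows, so the result matches the ws = 3 table above. If the group column is not a run of small positive integers (for example a month-year code), the third output of unique could be used to map it to 1..n first.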

Related

resample data based on a particular variable

I have a large dataset as below. From the data, I want to randomly sample based on 'id'. Since the data has 5 ids, I would like to sample 5 ids with replacement and produce a new dataset with observations of sampled ids.
id value var1 var2 …
1 1
1 2
1 3
1 4
2 5
2 6
2 7
3 8
3 9
3 10
4 11
4 12
4 13
5 14
5 15
5 16
Let's suppose, I randomly draw 5 values from 1 to 5 (because there are 5 unique ids) and the result is (2 4 3 2 1). Then, I would like to have this data
id value var1 var2 …
2 5
2 6
2 7
4 11
4 12
4 13
3 8
3 9
3 10
2 5
2 6
2 7
1 1
1 2
1 3
1 4
Here is some sample code for ids ranging from 1 through 5.
% data = [1 1; 1 2; 1 3; 1 4; 2 5; 2 6; 2 7; 3 8; 3 9; 3 10; 4 11; 4 12; 4 13;...
% 5 14; 5 15; 5 16];
data = rand(10000000,10);
data(:,1) = randi([1,5], length(data),1);
% Get all the indices from the 1st column;
indxCell = cell(5,1);
for i=1:5
    tmpIndx = find(data(:,1) == i);
    indxCell{i} = tmpIndx;
end
% Rearrange the indices
randIndx = randperm(5);
randIndxCell = indxCell(randIndx, 1);
% Generate a vector of indices by rearranging the 1st column of data matrix.
numDataPts = length(data);
newIndices = zeros(numDataPts,1);
endIndx = 1;
for i=1:5
    startIndx = endIndx;
    endIndx = startIndx + length(randIndxCell{i});
    newIndices(startIndx:endIndx-1, 1) = randIndxCell{i};
end
newData = data(newIndices,:);
For more unique ids, you could modify the code.
Edits: Modified the data size and also rewrote the 2nd for-loop.
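Note that randperm reorders the ids without repeating any of them, whereas the question asks for a draw with replacement (such as 2 4 3 2 1). A minimal sketch of that variant, using the small example data from the question, might look like this; the variable names drawn and rowsPerDraw are made up for illustration.

```matlab
data = [1 1; 1 2; 1 3; 1 4; 2 5; 2 6; 2 7; 3 8; 3 9; 3 10; ...
        4 11; 4 12; 4 13; 5 14; 5 15; 5 16];

ids = unique(data(:,1));                  % distinct ids, sorted
nIds = numel(ids);

% Draw nIds ids with replacement; the same id may appear more than once.
drawn = ids(randi(nIds, nIds, 1));

% For each drawn id, collect the row numbers of all its observations,
% then stack them in draw order.
rowsPerDraw = arrayfun(@(id) find(data(:,1) == id), drawn, ...
                       'UniformOutput', false);
newData = data(vertcat(rowsPerDraw{:}), :);
```

Because an id can be drawn several times, newData generally has a different number of rows than data, which is why the indices are concatenated instead of being written into a preallocated vector of fixed length.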

Matlab subscript indices error

Help me please. I always get a "Subscript indices must either be real positive integers or logicals" error whenever I put a 0 value in my "data". How can I get rid of it? I need to have a zero there. Whenever there is a zero, I want the voltage to be treated as 1 (i.e. Voltage(1,0) = 1), but I can't get it to work.
Voltage = [0 1 1 3 4 1; 1 0 5 4 5 3; 6 4 0 4 5 7; 9 3 4 0 6 4; 7 8 5 6 0 7; 4 5 6 7 3 0];
data =[0 2 3 4; 5 6 7 8; 2 3 4 5; 4 5 6 7; 3 4 5 6; 1 3 5 7; 1 2 3 4; 3 4 5 6];
Vm = data(:,1);
Vn = data(:,2);
R = data(:,3);
X1 = data(:,4);
sz=max(Vn)
y=1:sz
for Vm=data(:,1)
if Vm==0
Voltage(y,Vm)=1
Voltage(y,Vm)=logical(Voltage(y,Vm));
Current = Voltage(y,Vm)-Voltage(y,Vn);
else Vm >= 1
Current = Voltage(y,Vm)-Voltage(y,Vn);
end
end
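MATLAB indices start at 1, so any subscript equal to 0 raises this error. The first row of your data has Vm = 0, so Voltage(y,Vm) fails as soon as it is evaluated. Note that y = 1:sz is a vector (if you display y you will see 1 2 3 4 5 6), and indexing with a vector is valid on its own. There are several other problems in the loop, one of which is:
else Vm >= 1
Current = Voltage(y,Vm)-Voltage(y,Vn);
Here else does not take a condition (use elseif Vm >= 1 instead). Also, for Vm=data(:,1) runs only once, with Vm set to the whole column, so if Vm==0 is true only when every entry is zero.
To fix it, loop over the rows by index (or use a logical mask) and handle the Vm == 0 case before indexing into Voltage.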
Let me know if you want a further explanation.
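One way to get the "zero means a voltage of 1" behaviour without ever indexing with 0 is a logical mask. The sketch below is only a guess at what the question intends: it assumes the currents are computed from a single row of Voltage, and the names r, Vfrom, Vto and mask are hypothetical.

```matlab
Voltage = [0 1 1 3 4 1; 1 0 5 4 5 3; 6 4 0 4 5 7; ...
           9 3 4 0 6 4; 7 8 5 6 0 7; 4 5 6 7 3 0];
data = [0 2 3 4; 5 6 7 8; 2 3 4 5; 4 5 6 7; ...
        3 4 5 6; 1 3 5 7; 1 2 3 4; 3 4 5 6];
Vm = data(:,1);
Vn = data(:,2);

r = 1;                           % assumed: row of Voltage to read from

% Start with the default value 1, then overwrite only the entries whose
% index is a valid (positive) subscript.
Vfrom = ones(size(Vm));
mask = Vm > 0;
Vfrom(mask) = Voltage(r, Vm(mask));

Vto = ones(size(Vn));
mask = Vn > 0;
Vto(mask) = Voltage(r, Vn(mask));

Current = Vfrom - Vto;           % one value per row of data
```

The zero in the first row of data never reaches Voltage as a subscript, so the error does not occur.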

MATLAB vectorization to create a matrix

I want to create a matrix like
[1 2;
1 3;
1 4;
1 5;
2 3;
2 4;
2 5;
3 4;
3 5;
4 5 ]
when the size is 5. I aim to have sizes greater than 100. How can I create a matrix like this using vectorization in MATLAB?
You're looking for all 2-element combinations of 1:5, so use the built-in nchoosek command. For example, the matrix in your question can be generated by:
A = nchoosek(1:5, 2)
This results in:
A =
1 2
1 3
1 4
1 5
2 3
2 4
2 5
3 4
3 5
4 5
Another solution, where N is the size (N = 5 in your example):
[r,c]=find(tril(ones(N),-1));
result = [c,r];
As a bonus, you can get the number of rows in such matrix with
nrows = nchoosek(N,2);
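As a quick check, the two approaches above can be compared directly. This is a small sketch with N = 5; the names A, B and sameResult are only for illustration.

```matlab
N = 5;

% Approach 1: enumerate all 2-element combinations of 1:N.
A = nchoosek(1:N, 2);

% Approach 2: positions of the strictly lower triangle of an N-by-N
% matrix. find walks the matrix column by column, so swapping the
% outputs gives pairs [smaller larger] in the same order as A.
[r, c] = find(tril(ones(N), -1));
B = [c, r];

sameResult = isequal(A, B);      % true when both give the same matrix
nrows = N*(N-1)/2;               % number of pairs, equal to nchoosek(N,2)
```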

How can I sum specific values of a column corresponding to unique values of another column, without using the "accumarray" command?

I have a matrix in matlab:
a=[1 1; 1 2; 1 3; 2 1; 2 5; 2 7; 3 2; 3 1; 3 4];
if
a1=[1 1 1 2 2 2 3 3 3]; is the first column
and
a2=[1 2 3 1 5 7 2 1 4]; is the second column
of this matrix, I want for the repeated values "unique(a1)" of a1 to sum the corresponding values of a2, so as to get this:
a3=[1+2+3 1+5+7 2+1+4]=[6 13 7]
but without using the "accumarray" command
Any help please?
My consolidator tool does this for you, even offering a tolerance.
[a1cons,a2cons] = consolidator(a1',a2',@sum)
a1cons =
1
2
3
a2cons =
6
13
7
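If you would rather stay with built-in functions, a minimal sketch using unique and logical indexing (no accumarray) could look like this. The names groups and idx are made up for illustration.

```matlab
a = [1 1; 1 2; 1 3; 2 1; 2 5; 2 7; 3 2; 3 1; 3 4];

% idx maps every row to the position of its key in groups.
[groups, ~, idx] = unique(a(:,1));

% For each group, sum the second-column values of the matching rows.
a3 = arrayfun(@(g) sum(a(idx == g, 2)), 1:numel(groups));
```

For the matrix in the question this gives a3 = [6 13 7], with groups holding the corresponding keys 1, 2 and 3.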

Expand matrix based on first row value (MATLAB)

My input is the following:
X = [1 1; 1 2; 1 3; 1 4; 2 5; 1 6; 2 7; 1 8];
X =
1 1
1 2
1 3
1 4
2 5
1 6
2 7
1 8
I am looking to output a new matrix based on the value in the first column. If the value is equal to 1, the output remains the same; if the value is equal to 2, I would like to output the value contained in the second column twice. Like this:
Y =
1
2
3
4
5
5
6
7
7
8
Where 5 is output two times because the value in the first column is 2 and the same for 7
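Here it is (vectorized). It assumes the values in the second column are positive and increasing, and that A does not already exist in the workspace:
C = cumsum(X(:,1));   % position of the last copy of each value
A(C) = X(:,2);        % place the values, leaving zeros in the gaps
D = hankel(A);        % column j holds A(j:end), padded with zeros
D(D==0) = inf;        % make the gaps and the padding invisible to min
Y = min(D)            % first placed value at or after each position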
Edit:
Had a small bug, now it works.
% untested code:
Y = []; % would be better to pre-allocate
for ii = 1:size(X,1)
Y = [Y; X(ii,2)*ones(X(ii,1),1)];
end
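For reference, here is a sketch of two other ways to do the same expansion without relying on the values being increasing. The first uses repelem, which is only available in newer MATLAB releases (R2015a and later, as far as I know); the second is a cumsum-based index trick that assumes every repeat count is at least 1. The names vals, n, starts, marks and Y2 are made up for illustration.

```matlab
X = [1 1; 1 2; 1 3; 1 4; 2 5; 1 6; 2 7; 1 8];
n = X(:,1);                      % repeat counts
vals = X(:,2);                   % values to repeat

% Option 1: repelem repeats each element of vals n(i) times.
Y = repelem(vals, n);

% Option 2: mark where each run starts, then a running sum of the
% marks gives, for every output position, the index of its source row.
starts = cumsum([1; n(1:end-1)]);
marks = zeros(sum(n), 1);
marks(starts) = 1;
Y2 = vals(cumsum(marks));
```

Both give the column vector 1 2 3 4 5 5 6 7 7 8 for the X in the question.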