Cumulative Summation in intervals - MATLAB

Suppose I have 2 input vectors x and reset of the same size
x = [1 2 3 4 5 6]
reset = [0 0 0 1 0 0]
and an output y which is the cumulative sum of the elements in x. Whenever the corresponding value in reset is 1, the cumulative sum restarts from that element, as shown below:
y = [1 3 6 4 9 15]
How would I implement this in Matlab?

One approach with diff and cumsum -
%// Setup few arrays:
cx = cumsum(x) %// Continuous Cumsumed version
reset_mask = reset==1 %// We want to create a logical array version of
%// reset for use as logical indexing next up
%// Setup ID array of same size as input array and with differences between
%// cumsumed values of each group placed at places where reset==1, 0s elsewhere
%// The groups are the islands of 0s and bordered at 1s in reset array.
id = zeros(size(reset))
diff_values = x(reset_mask) - cx(reset_mask)
id(reset_mask) = diff([0 diff_values])
%// "Under-compensate" the continuous cumsumed version cx with the
%// "grouped diffed cumsum version" to get the desired output
y = cx + cumsum(id)
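To see how the compensation behaves with more than one reset, here is a small trace of the same steps. The input with two resets is an assumption made up for illustration; the intermediate values in the comments follow from it.

```matlab
x = [1 2 3 4 5 6];
reset = [0 1 0 1 0 0];                          % two resets instead of one

cx = cumsum(x);                                 % [1 3 6 10 15 21]
reset_mask = reset == 1;

% Offset needed at each reset so the running sum restarts at x itself
diff_values = x(reset_mask) - cx(reset_mask);   % [-1 -6]

% Store only the change in offset, so that cumsum rebuilds the full offset
id = zeros(size(reset));
id(reset_mask) = diff([0 diff_values]);         % -1 at position 2, -5 at position 4

y = cx + cumsum(id);                            % [1 2 5 4 9 15]
```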

Here's a way:
result = accumarray(1+cumsum(reset(:)), x(:), [], @(t) {cumsum(t).'});
result = [result{:}];
This works because if the first input to accumarray is sorted, the order within each group of the second input is preserved (more about this here).
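The group labels are what make this work, so it can help to look at them directly. The sketch below uses a made-up input with two resets and checks the result against a plain loop; grp, parts and y_ref are illustrative names, not part of the answer.

```matlab
x = [1 2 3 4 5 6];
reset = [0 1 0 1 0 0];

% Each reset starts a new group; the labels are sorted by construction
grp = 1 + cumsum(reset(:));                       % [1 2 2 3 3 3].'

% One cell per group, each holding that group's cumulative sum as a row
parts = accumarray(grp, x(:), [], @(t) {cumsum(t).'});
y = [parts{:}];                                   % [1 2 5 4 9 15]

% Plain loop used only as a reference for comparison
y_ref = zeros(size(x));
running = 0;
for k = 1:numel(x)
    if reset(k) == 1
        running = 0;                              % restart at a reset
    end
    running = running + x(k);
    y_ref(k) = running;
end
```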

Related

MATLAB: Applying vectors of row and column indices without looping

I have a situation analogous to the following
z = magic(3) % Data matrix
y = [1 2 2]' % Column indices
So,
z =
8 1 6
3 5 7
4 9 2
y represents the column index I want for each row. It's saying I should take row 1 column 1, row 2 column 2, and row 3 column 2. The correct output is therefore 8 5 9.
I worked out I can get the correct output with the following
x = 1:3;
for i = 1:3
result(i) = z(x(i),y(i));
end
However, is it possible to do this without looping?
Two other ways I can suggest: the first is to use sub2ind to find the linear indices, which you can then use to sample the matrix directly:
z = magic(3);
y = [1 2 2];
ind = sub2ind(size(z), 1:size(z,1), y);
result = z(ind);
We get:
>> result
result =
8 5 9
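For intuition, sub2ind on a 2-D matrix reduces to simple column-major arithmetic. A minimal sketch, assuming the same z and y as above (ind_manual is a name used only for illustration):

```matlab
z = magic(3);
y = [1 2 2];
rows = 1:size(z,1);

% Column-major layout: element (r,c) sits at linear index r + (c-1)*nrows
ind_manual = rows + (y - 1) * size(z,1);          % [1 5 6]

% Same indices from the built-in helper
ind = sub2ind(size(z), rows, y);

result = z(ind_manual);                           % [8 5 9]
```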
Another way is to use sparse to create a sparse matrix which you can turn into a logical matrix and then sample from the matrix with this logical matrix.
s = sparse(1:size(z,1), y, 1, size(z,1), size(z,2)) == 1; % Turn into logical
result = z(s);
We also get:
>> result
result =
8
5
9
Be advised that this relies on each row being selected exactly once, with the row indices running from 1 up to the number of rows. Logical indexing reads the matrix in column-major order, so the values come back in row order only when the column indices in y never decrease, as is the case in this example. Also note that the output is a column vector rather than a row vector.
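A quick check of the ordering behaviour, using column indices that are deliberately not sorted. The values of y here are made up for illustration, and the transpose-based mask at the end is just one possible workaround, not something from the answer above.

```matlab
z = magic(3);
y = [2 1 2];                                % column indices, not sorted

% Row-by-row result we actually want: z(1,2), z(2,1), z(3,2)
wanted = z(sub2ind(size(z), 1:3, y)).';     % [1; 3; 9]

% Logical mask on z is read column by column, so the order changes
s = sparse(1:3, y, 1, 3, 3) == 1;
got = z(s);                                 % [3; 1; 9]

% Masking the transpose instead reads z row by row
st = sparse(y, 1:3, 1, 3, 3) == 1;
zt = z.';
fixed = zt(st);                             % [1; 3; 9]
```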
The link posted by Adriaan is a great read for the next steps in accessing elements in a vectorized way: Linear indexing, logical indexing, and all that.
There are many ways to do this. One interesting way is to work out the linear indices you want directly:
v = 0:numel(y)-1; % offsets 0, 1, ..., numel(y)-1, one per row of z
ind = y(:).' + v*size(z,2); % linear indices into the transposed matrix, one per row of z
zinv = z';
zinv(ind)
>> ans =
8 5 9

How to compare columns of a binary matrix and compare elements in matlab?

I have a [sentences x words] matrix, as shown below:
out = 0 1 1 0 1
1 1 0 0 1
1 0 1 1 0
0 0 0 1 0
I want to process this matrix so that it reports that W1 and W2 occur with the same value in sentence 2 and sentence 4, i.e. 1 1 and 0 0. The output should be as follows:
output{1,2}= 2 4
output{1,2} says that words 1 and 2 occur with the same value in sentences 2 and 4.
After comparing W1 and W2, the next candidates are W1 and W3, which occur with the same value in sentences 3 and 4:
output{1,3}= 3 4
and so on, until every word has been compared with every other word and the results saved.
This would be one vectorized approach -
%// Get number of columns in input array for later usage
N = size(out,2);
%// Get indices for pairwise combinations between columns of input array
[idx2,idx1] = find(bsxfun(@gt,[1:N]',[1:N])); %//'
%// Get indices for matches between the paired columns of out. The row indices
%// represent the occurrence values for the final output and columns for the
%// indices of the final output.
[R,C] = find(out(:,idx1) == out(:,idx2))
%// Form cells off each unique C (these will be final output values)
output_vals = accumarray(C(:),R(:),[],@(x) {x})
%// Setup output cell array
output = cell(N,N)
%// Indices for places in output cell array where occurrence values are to be put
all_idx = sub2ind(size(output),idx1,idx2)
%// Finally store the output values at appropriate indices
output(all_idx(1:max(C))) = output_vals
You can get a logical matrix of size #words-by-#words-by-#sentences easily using bsxfun:
coc = bsxfun( @eq, permute( out, [3 2 1]), permute( out, [2 3 1] ) );
In this logical array, coc( wi, wj, si ) is true iff word wi and word wj occur in sentence si with the same value.
To get the output cell array from coc you need
nw = size( out, 2 ); %// number of words
output = cell(nw,nw);
for wi = 1:(nw-1)
for wj = (wi+1):nw
output{wi,wj} = find( coc(wi,wj,:) );
output{wj,wi} = output{wi,wj}; %// you can force it to be symmetric if you want
end
end
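As a sanity check, the 3-D comparison can be run on the sample matrix from the question and compared with a direct column comparison. This is only a sketch; s12 and s13 are illustrative names, and squeeze is used so that find sees an ordinary column vector.

```matlab
out = [0 1 1 0 1
       1 1 0 0 1
       1 0 1 1 0
       0 0 0 1 0];

% words-by-words-by-sentences logical array
coc = bsxfun(@eq, permute(out, [3 2 1]), permute(out, [2 3 1]));
sz = size(coc);                               % [5 5 4]

% Sentences in which words 1 and 2 (resp. 1 and 3) share the same value
s12 = find(squeeze(coc(1,2,:)));              % [2; 4]
s13 = find(squeeze(coc(1,3,:)));              % [3; 4]

% The same answer from comparing the two columns directly
s12_direct = find(out(:,1) == out(:,2));
```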

How to generate this matrix in matlab

The H matrix is n-by-n, with n = 10000. I can use a loop to generate this matrix in MATLAB. I just wonder whether there is a way to do this without looping.
You can see that, in column j (for j >= 2), the entries above the diagonal are 1/sqrt(j*(j-1)) and the diagonal entry is -(j-1)/sqrt(j*(j-1)). The first column consists of 1/sqrt(n), and the rest of the elements are zero.
We could generate a full matrix whose first column is all 1/sqrt(n) and whose remaining columns hold 1/sqrt(j*(j-1)), and then modify that matrix to include the rest of what you want.
As such, let's concentrate on the elements that start from row 2, column 2 as these follow a pattern. Once we're done, we can construct the other things that build up the final matrix.
x = 2:n;
Hsmall = repmat([1./sqrt(x.*(x-1))], n-1, 1);
Next, we will tackle the diagonal elements:
Hsmall(logical(eye(n-1))) = -(x-1)./sqrt(x.*(x-1));
Now, let's zero the rest of the elements:
Hsmall(tril(logical(ones(n-1)),-1)) = 0;
Now that we're done, let's create a new matrix that pieces all of this together:
H = [1/sqrt(n) 1./sqrt(x.*(x-1)); repmat(1/sqrt(n), n-1, 1) Hsmall];
Therefore, the full code is:
x = 2:n;
Hsmall = repmat([1./sqrt(x.*(x-1))], n-1, 1);
Hsmall(logical(eye(n-1))) = -(x-1)./sqrt(x.*(x-1));
Hsmall(tril(logical(ones(n-1)),-1)) = 0;
H = [1/sqrt(n) 1./sqrt(x.*(x-1)); repmat(1/sqrt(n), n-1, 1) Hsmall];
Here's an example with n = 6:
>> H
H =
Columns 1 through 3
0.408248290463863 0.707106781186547 0.408248290463863
0.408248290463863 -0.707106781186547 0.408248290463863
0.408248290463863 0 -0.816496580927726
0.408248290463863 0 0
0.408248290463863 0 0
0.408248290463863 0 0
Columns 4 through 6
0.288675134594813 0.223606797749979 0.182574185835055
0.288675134594813 0.223606797749979 0.182574185835055
0.288675134594813 0.223606797749979 0.182574185835055
-0.866025403784439 0.223606797749979 0.182574185835055
0 -0.894427190999916 0.182574185835055
0 0 -0.912870929175277
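One way to gain confidence in the vectorized construction is to compare it with a straightforward loop for a small n. The loop below is a reference written from the description of the matrix given above, so treat it as an assumption about the intended H rather than the asker's original code; Href and c are illustrative names.

```matlab
n = 6;

% Vectorized construction, as in the answer
x = 2:n;
Hsmall = repmat(1./sqrt(x.*(x-1)), n-1, 1);
Hsmall(logical(eye(n-1))) = -(x-1)./sqrt(x.*(x-1));
Hsmall(tril(true(n-1), -1)) = 0;
H = [1/sqrt(n) 1./sqrt(x.*(x-1)); repmat(1/sqrt(n), n-1, 1) Hsmall];

% Loop reference: fill one column at a time
Href = zeros(n);
Href(:,1) = 1/sqrt(n);
for j = 2:n
    c = 1/sqrt(j*(j-1));
    Href(1:j-1, j) = c;            % entries above the diagonal
    Href(j, j) = -(j-1)*c;         % diagonal entry
end

max_err = max(abs(H(:) - Href(:)));
```

For this construction the columns also come out orthonormal, so checking that H.'*H is close to eye(n) is another cheap test.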
Since you are working with a pretty large n value of 10000, you might want to squeeze out as much performance as possible.
Going with that, you can use an efficient approach based on cumsum -
%// Values to be set in each column for the upper triangular region
upper_tri = 1./sqrt([1:n].*(0:n-1));
%// Diagonal indices
diag_idx = [1:n+1:n*n];
%// Setup output array
out = zeros(n,n);
%// Set the first row of output array with upper triangular values
out(1,:) = upper_tri;
%// Set the diagonal elements with the negative triangular values.
%// The intention here is to perform CUMSUM across each column later on,
%// thus there would be zeros beyond the diagonal positions for each column
out(diag_idx) = -upper_tri;
%// Set the first element of output array with n^(-1/2), so that CUMSUM
%// fills the whole first column with 1/sqrt(n)
out(1) = 1/sqrt(n);
%// Finally, perform CUMSUM as suggested earlier
out = cumsum(out,1);
%// Set the diagonal elements with the actually expected values
out(diag_idx(2:end)) = upper_tri(2:end).*[-1:-1:-(n-1)];
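The core trick is easier to see on a single column: put +v in the first row and -v in the diagonal row, and a cumulative sum down the column yields v above the diagonal and zeros from the diagonal down. A minimal illustration with made-up values (m, j, v and col exist only for this demonstration):

```matlab
m = 5;                  % column length
j = 3;                  % row of the diagonal element in this column
v = 0.5;                % value wanted above the diagonal

col = zeros(m, 1);
col(1) = v;             % start contributing v from the first row
col(j) = -v;            % cancel the contribution at the diagonal row

filled = cumsum(col);   % [0.5; 0.5; 0; 0; 0]
```

The diagonal entry itself ends up as zero here, which is why the diagonal has to be overwritten with its real value after the cumulative sum.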
Runtime Tests
(I) With n = 10000, the runtime at my end was - Elapsed time is 0.457543 seconds.
(II) Now, as the final performance-squeezing practice, you can replace the pre-allocation step for out with a faster pre-allocation scheme as listed in this MATLAB Undocumented Blog. Thus, the pre-allocation step would look like this -
out(n,n) = 0;
The runtime with this edited code was - Elapsed time is 0.400399 seconds.
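A small sketch of what that pre-allocation idiom does, on a tiny size. It assumes the variable does not exist beforehand; if it does, the assignment only sets one element and keeps the old contents. A is an illustrative name.

```matlab
n = 4;
clear A              % the idiom assumes A does not exist yet

A(n,n) = 0;          % assigning the last element grows A to n-by-n, zero-filled

same_as_zeros = isequal(A, zeros(n));
```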
(III) The runtime for n = 10000 with the other answer by @rayryeng yielded - Elapsed time is 1.306339 seconds.

Every possible difference among multiple vectors

I have 3 vectors, v1, v2, v3. What I want to get is the difference between every possible pair of them, that is, v1-v2, v1-v3, v2-v3. How can I do this without looping in matlab?
Thank you.
Just use nchoosek to generate the combinations first and then use them to index into your array of row-vectors:
Test case:
numVectors = 3;
dim = 5;
Vs = rand(numVectors, dim);
Actual computation:
combs = nchoosek(1:size(Vs,1), 2);
differences = Vs(combs(:,1),:) - Vs(combs(:,2),:);
The above creates 3 random row vectors of dimension 5. So in your case, you may want to replace the creation of the random matrix with Vs = [v1; v2; v3]; if your vectors are row vectors; or transpose the vectors using Vs = [v1, v2, v3].'; if your data are column vectors.
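With small concrete vectors the result is easy to read off. The values of v1, v2 and v3 below are made up for illustration:

```matlab
v1 = [1 2];
v2 = [10 20];
v3 = [0 0];
Vs = [v1; v2; v3];                         % one vector per row

combs = nchoosek(1:size(Vs,1), 2);         % [1 2; 1 3; 2 3]

% Row k holds Vs(combs(k,1),:) - Vs(combs(k,2),:)
differences = Vs(combs(:,1),:) - Vs(combs(:,2),:);
% differences = [ -9 -18     (v1 - v2)
%                  1   2     (v1 - v3)
%                 10  20 ]   (v2 - v3)
```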
Using bsxfun:
clear
clc
%// Sample vectors.
v1 = [1 2];
v2 = [10 20];
v3 = [0 0];
Out = bsxfun(@minus,[v1 v2 v3], [v1 v2 v3].')
Out =
0 1 9 19 -1 -1
-1 0 8 18 -2 -2
-9 -8 0 10 -10 -10
-19 -18 -10 0 -20 -20
1 2 10 20 0 0
1 2 10 20 0 0
Reasoning: Each difference is computed starting from the 1st element of the 1st vector until the 2nd element of the last vector.
The 1st column contains all the differences for the 1st element of the 1st vector, i.e. (1 - 1), (1 - 2), (1 - 10), (1 - 20), (1 - 0), (1 - 0).
The 2nd column is the same thing, but this time with the 2: (2 - 1), (2 - 2), (2 - 10), and so on.
Sorry if my explanations are unclear haha, I don't know the right terms in English. Please ask for more details.
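If whole-vector differences are wanted rather than element-by-element ones, one possible variation is to stack the vectors as rows and let bsxfun expand along a third dimension. This is a sketch of that idea, not part of the answer above; V, D, d12 and d23 are illustrative names.

```matlab
v1 = [1 2];
v2 = [10 20];
v3 = [0 0];
V = [v1; v2; v3];                          % one vector per row

% D(i,j,:) holds V(i,:) - V(j,:)
D = bsxfun(@minus, permute(V, [1 3 2]), permute(V, [3 1 2]));

d12 = squeeze(D(1,2,:)).';                 % v1 - v2 = [-9 -18]
d23 = squeeze(D(2,3,:)).';                 % v2 - v3 = [10 20]
```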
Code
%// Concatenate all vectors to form a 2D array
V = cat(2,v1(:),v2(:),v3(:),v4(:),v5(:))
N = size(V,2) %// number of vectors
%// Find all IDs of all combinations as x,y
[y,x] = find(bsxfun(@gt,[1:N]',[1:N])) %//'
%// OR [y,x] = find(tril(true(size(V,2)),-1))
%// Use matrix indexing to collect vector data for all combinations with those
%// x-y IDs from V. Then, perform subtractions across them for final output
diff_array = V(:,x) - V(:,y)
Few points about the code
bsxfun with find gets us the IDs for forming pairwise combinations.
We use those IDs to index into the 2D concatenated array and perform subtractions between them to get the final output.
Bonus Stuff
If you look closely into the part where it finds the IDs of all combinations, that is basically nchoosek(1:..,2).
So, basically one can have alternatives to nchoosek(1:N,2) as:
[Y,X] = find(bsxfun(@gt,[1:N]',[1:N]))
[Y,X] = find(tril(true(N),-1))
with [X Y] forming those pairwise combinations. It might be interesting to benchmark them!
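A quick way to confirm that the alternatives produce the same pairs is to compare them for a small N. The check below sorts the rows first so that it does not depend on the order in which either method lists the pairs; pairs_tril and pairs_bsx are illustrative names.

```matlab
N = 4;

% Pairs from the lower-triangular mask
[Y1, X1] = find(tril(true(N), -1));
pairs_tril = [X1 Y1];

% Pairs from the bsxfun comparison
[Y2, X2] = find(bsxfun(@gt, (1:N).', 1:N));
pairs_bsx = [X2 Y2];

% Reference pairs from nchoosek
pairs_ref = nchoosek(1:N, 2);

same_tril = isequal(sortrows(pairs_tril), sortrows(pairs_ref));
same_bsx  = isequal(sortrows(pairs_bsx),  sortrows(pairs_ref));
```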

Indices of constant consecutive values in a matrix, and number of constant values

I have a matrix with constant consecutive values randomly distributed throughout the matrix. I want the indices of the consecutive values, and further, I want a matrix of the same size as the original matrix, where the number of consecutive values are stored in the indices of the consecutive values. For Example
original_matrix = [1 1 1;2 2 3; 1 2 3];
output_matrix = [3 3 3;2 2 0;0 0 0];
I have struggled mightily to find a solution to this problem. It has relevance for meteorological data quality control. For example, if I have a matrix of temperature data from a number of sensors, and I want to know what days had constant consecutive values, and how many days were constant, so I can then flag the data as possibly faulty.
The temperature matrix is number of days x number of stations, and I want an output matrix that is also number of days x number of stations, where the consecutive values are flagged as described above.
If you have a solution to that, please provide! Thank you.
For this kind of problem, I made my own utility function runlength:
function RL = runlength(M)
% calculates length of runs of consecutive equal items along columns of M
% work along columns, so that you can use linear indexing
% find locations where items change along column
jumps = diff(M) ~= 0;
% add implicit jumps at start and end
ncol = size(jumps, 2);
jumps = [true(1, ncol); jumps; true(1, ncol)];
% find linear indices of starts and stops of runs
ijump = find(jumps);
nrow = size(jumps, 1);
istart = ijump(rem(ijump, nrow) ~= 0); % remove fake starts in last row
istop = ijump(rem(ijump, nrow) ~= 1); % remove fake stops in first row
rl = istop - istart;
assert(sum(rl) == numel(M))
% make matrix of 'derivative' of runlength
% don't need last row, but needs same size as jumps for indices to be valid
dRL = zeros(size(jumps));
dRL(istart) = rl;
dRL(istop) = dRL(istop) - rl;
% remove last row and 'integrate' to get runlength
RL = cumsum(dRL(1:end-1,:));
It only works along columns since it uses linear indexing. Since you want to do something similar along rows, you need to transpose back and forth, so you could use it for your case like so:
>> original = [1 1 1;2 2 3; 1 2 3];
>> original = original.'; % transpose, since runlength works along columns
>> output = runlength(original);
>> output = output.'; % transpose back
>> output(output == 1) = 0; % see hitzg's comment
>> output
output =
3 3 3
2 2 0
0 0 0
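For a single sensor (one column of readings), the same run-length idea can be sketched without the helper function by labelling each run and counting its members. This is an alternative illustration, not the runlength function above; the readings in t and the names starts, runId, runLen and rl are made up.

```matlab
t = [20; 20; 20; 21; 21; 19];        % hypothetical daily readings from one station

% A run starts at the first reading and wherever the value changes
starts = [true; diff(t) ~= 0];

% Label every reading with the number of its run
runId = cumsum(starts);              % [1 1 1 2 2 3].'

% Count readings per run, then map the counts back onto the readings
runLen = accumarray(runId, 1);       % [3; 2; 1]
rl = runLen(runId);                  % [3 3 3 2 2 1].'

% Runs of length 1 are not "constant consecutive" values, so zero them
rl(rl == 1) = 0;                     % [3 3 3 2 2 0].'
```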