Cumulative Sum with >=0 Restriction in Matlab

I want to calculate the cumulative sum of a vector, but stop summing up once the sum becomes negative, and start again at positive elements.
Example:
We have a vector:
[1 1 -1 -1 -1 -1 1 1 1 1]
The normal cumulative sum would then be:
[1 2 1 0 -1 -2 -1 0 1 2]
But I want:
[1 2 1 0 0 0 1 2 3 4]
The only solution I could come up with was to loop over the elements of the vector like this:
test = [1 1 -1 -1 -1 -1 1 1 1 1];
testCumsum = zeros(size(test));
for i=1:length(test)
if i==1
testCumsum(i) = test(i);
else
testCumsum(i) = testCumsum(i-1) + test(i);
end
if testCumsum(i)<0
testCumsum(i) = 0;
end
end
Is there a more MATLAB-ish solution?
(The sum can become negative an arbitrary number of times, the vectors can become pretty large, and the elements can be any number, not just 1 and -1)

This is hard to vectorize directly, since you have to decide on each element based on the previous ones. You could find regions of positive and negative runs, but that would be unnecessarily complex, and I don't know whether it would gain anything over your own solution.
Here is a simplification of your code for input A and output C:
C=A;
C(1) = max(C(1), 0);
for k=2:numel(C)
C(k) = max(C(k-1)+C(k), 0);
end
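For comparison, here is a hedged sketch of a loop-free variant. It is not part of the answer above; it relies on the identity that the clamped sum equals the plain cumulative sum minus the most negative value that sum has reached so far, and it assumes cummin is available in your MATLAB release. The variable names S and Cvec are made up for illustration.

```matlab
A = [1 1 -1 -1 -1 -1 1 1 1 1];

% Reference result: the simplified loop from the answer above
C = A;
C(1) = max(C(1), 0);
for k = 2:numel(C)
    C(k) = max(C(k-1) + C(k), 0);
end

% Loop-free sketch: subtract the running minimum of the plain cumulative
% sum, but only once that minimum has dropped below zero
S = cumsum(A);
Cvec = S - min(cummin(S), 0);

disp(C)
disp(Cvec)
```

With floating-point input the two results may differ by rounding error, so compare them with a tolerance rather than isequal in that case.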

Call your vector x:
y = x > 0
z = x.*y
sum(z)
The y vector is 1 where the elements of x are greater than 0 and 0 elsewhere. The element-wise product that gives z sets your negative values to 0, and then you can sum.
Ah, I see more clearly what you want to do now. Looping is probably going to be quickest. If the array is large, you could break it into block segments and use parfor to speed it up.

How to find number of occurrences of a subset of elements in a vector without using loops in MATLAB?

Say X is the given vector:
X=[1
2
4
2
3
1
4
5
2
4
5];
And Y is the given subset of elements from X:
Y=[3
4
5];
The required output is the number of times the elements in Y occur in X:
out=[1
3
2];
My solution to do this would be to use for loop:
for i=1:size(X,1)
temp = X(X(:,1)==Y(i,1),:);
out(i,1) = size(temp,1);
end
But when X and Y are large, this is inefficient. So, how to do it faster making use of vectorization? I know about hist and histc, but I can't think of how to use them in this case to get the desired output.
A Fast Option
You could use bsxfun combined with sum to compute this
sum(bsxfun(@eq, Y, X.'), 2)
Explanation
In this example, bsxfun performs a given operation on every combination of elements in X and Y. The operation we're going to use is eq (a check for equality). The result is a matrix that has a row for each element in Y and a column for each element in X. It will have a 1 value if the element in X equals the element in Y that corresponds to a given row.
bsxfun(@eq, Y, X.')
% 0 0 0 0 1 0 0 0 0 0 0
% 0 0 1 0 0 0 1 0 0 1 0
% 0 0 0 0 0 0 0 1 0 0 1
We can then sum across the columns to count the number of elements in X that were equal to a given value in Y.
sum(bsxfun(@eq, Y, X.'), 2)
% 1
% 3
% 2
On newer versions of MATLAB (since R2016b), you can omit the bsxfun since the equality operation will automatically broadcast.
sum(Y == X.', 2)
A Memory-Efficient Option
The first option isn't the most efficient since it requires creating a matrix that is [numel(Y), numel(X)] elements large. Another way which may be more memory efficient may be to use the second output of ismember combined with accumarray
[tf, ind] = ismember(X, Y);
counts = accumarray(ind(tf), ones(sum(tf), 1), [numel(Y), 1], @numel);
Explanation
ismember is used to determine if the values in one array are in another. The first output tells us whether each element of the first input is in the second input, and the second output tells us where in the second input each element of the first input was found.
[tf, ind] = ismember(X, Y);
% 0 0 1 0 1 0 1 1 0 1 1
% 0 0 2 0 1 0 2 3 0 2 3
We can use the second output to "group" the same values together. The accumarray function does exactly this: it uses the ind variable above to determine groups and then applies a given operation to each group. In our case, we simply want the number of elements within each group. To do that, we pass as the second input a vector of ones with one entry per matched element (the size of ind minus the entries that didn't match), and then use numel as the operation (it counts the elements in each group).
counts = accumarray(ind(tf), ones(sum(tf), 1), [numel(Y), 1], @numel);
% 1
% 3
% 2
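As a quick sanity check, the two options above can be run side by side on the example data. This is only a sketch: countsA and countsB are illustrative names, and the accumarray call is shortened on the assumption that a scalar 1 as the second input simply adds one per matched element.

```matlab
X = [1 2 4 2 3 1 4 5 2 4 5].';
Y = [3 4 5].';

% Option 1: compare every element of Y against every element of X
countsA = sum(bsxfun(@eq, Y, X.'), 2);

% Option 2: look up each element of X in Y, then tally the hits per index
[tf, ind] = ismember(X, Y);
countsB = accumarray(ind(tf), 1, [numel(Y), 1]);

disp([countsA countsB])
```

Both columns should agree. The second approach avoids building the numel(Y)-by-numel(X) comparison matrix.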

Cross-Correlation of two signals

I want to find the correlation between two signals x1 and x2.
x1 = [1 1 1 1 1]
x2 = [1 1 1 1 1]
r1 = xcorr(x1,x2) % MATLAB function that computes the cross-correlation of x1 and x2
x1 and x2 both look like this
and their cross-correlation looks like this
I understand that correlation measures the degree of similarity between two signals, giving highest value to the point which corresponds to maximum similarity (the two signals are shifted relative to each other to measure similarity at different points right?). So in that case, the cross correlation should give a high value at all points but this is not so. The maximum value is at 5th position. Why is that? Can someone explain this to me?
You seem to have a slight misunderstanding of how cross-correlation works. Cross-correlation takes one signal, and compares it with shifted versions of another signal. If you recall, the (unnormalized) cross-correlation of two signals is defined as:
(source: jiracek at www-rohan.sdsu.edu)
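The formula itself is not reproduced here. One common way to write the unnormalized cross-correlation of two real discrete signals, offered as an assumption about what the cited source shows, is the following, where k is the shift (lag) and the sum runs over all samples where both signals are defined. Sign and ordering conventions for the lag differ between references.

```latex
r_{sh}[k] = \sum_{n} s[n] \, h[n - k]
```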
s and h are two signals. Therefore, we shift versions of the second signal h, take element-by-element products, and sum them all together. The horizontal axis of the cross-correlation plot denotes shifts, while the vertical axis denotes the output of the cross-correlation at each shift. Let's compute the cross-correlation by hand for the signal so we can better understand the output that MATLAB is giving us.
To compute the outputs, both signals need to be zero-padded to accommodate the first point where both signals start to overlap. Specifically, we need to zero-pad so that we have N2-1 zeroes to the left of s and N2-1 zeroes to the right of s in order to facilitate our computation of the cross-correlation. N2 in this case is the length of h. Each time you calculate the cross-correlation for a given shift of the signal h, you create a signal of all zeros that is the same size as the zero-padded version of s, then place the original signal h within this larger signal. You then compare this new signal with the zero-padded version of s.
Strictly speaking, cross-correlation is not commutative: for real signals, swapping s and h mirrors the output about the zero-shift point. That still leaves you free to choose which signal stays stationary. If one signal is longer than the other, it is easier to leave the long signal stationary and shift the shorter one. You get the same values either way, just in reversed order, so you should always choose the easier path!
Back to where we were, this is what the first value of the cross correlation looks like (shift = 1).
s = [0 0 0 0 1 1 1 1 1 0 0 0 0]
h = [1 1 1 1 1 0 0 0 0 0 0 0 0]
The second signal slides from left to right, and we start where the right end of h begins to overlap the first signal, which is s. We do a point-by-point multiplication between s and h, and we sum up the elements. In this case, we get:
s ** h = (0)(1) + (0)(1) + (0)(1) + (0)(1) + (1)(1) + (1)(0) + (1)(0) + (1)(0) + (1)(0) + (0)(0) + (0)(0) + (0)(0) + (0)(0)
= 1
The ** in this case is (my version of) the cross-correlation operator. Let's look at shift = 2:
s = [0 0 0 0 1 1 1 1 1 0 0 0 0]
h = [0 1 1 1 1 1 0 0 0 0 0 0 0]
Remember, we are shifting towards the right by 1 more and s stays the same. Doing the same calculations as above, we should get:
s ** h = (0)(0) + (0)(1) + (0)(1) + (0)(1) + (1)(1) + (1)(1) + (1)(0) + (1)(0) + (1)(0) + (0)(0) + (0)(0) + (0)(0) + (0)(0)
= 2
If you repeat this for the other shifts, you'll see that the values keep increasing by 1, up until we have total overlap, which is the fifth shift (shift = 5). In this case, we get:
s = [0 0 0 0 1 1 1 1 1 0 0 0 0]
h = [0 0 0 0 1 1 1 1 1 0 0 0 0]
When you compute the cross-correlation, we get 5. Now, when we compute the sixth shift (shift = 6), we move to the right by 1, and that's when the cross-correlation starts to drop. Specifically:
s = [0 0 0 0 1 1 1 1 1 0 0 0 0]
h = [0 0 0 0 0 1 1 1 1 1 0 0 0]
If you go ahead and compute the cross-correlation, you'll see that the result is 4. You keep shifting to the right, and you'll see that the values keep decreasing by 1 per shift we take. You get to the final point where there is only one point where both s and h overlap, which is here:
s = [0 0 0 0 1 1 1 1 1 0 0 0 0]
h = [0 0 0 0 0 0 0 0 1 1 1 1 1]
By computing the cross-correlation, we only get the value of 1. You'll also see that this is at shift = 9. Therefore, this explains your graph where the cross-correlation starts to increase, because there is an increasing amount of overlap. It then reaches the maximum at shift = 5 because there is total overlap of the two signals. The cross-correlation then starts to decrease because the amount of overlap is also starting to decrease.
You'll also notice that the total number of shifts that we need to compute is N1 + N2 - 1, and this is a property of cross correlation. N1 and N2 are the lengths of s and h respectively. As such, given that N1 = N2 = 5, we see that the total number of shifts is N1 + N2 - 1 = 9, which also corresponds to the last shift we computed above.
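The hand computation above can be reproduced with a short loop. This is a minimal sketch that follows the zero-padding and shifting steps as described. It deliberately avoids xcorr so that it runs without any toolbox, and the names sPad, hShift and r are made up for illustration.

```matlab
s = [1 1 1 1 1];
h = [1 1 1 1 1];
N1 = numel(s);
N2 = numel(h);

% Zero-pad s with N2-1 zeros on each side
sPad = [zeros(1, N2-1), s, zeros(1, N2-1)];

numShifts = N1 + N2 - 1;
r = zeros(1, numShifts);
for k = 1:numShifts
    % Place h inside an all-zero signal the same size as sPad,
    % starting at position k (shift = k in the text above)
    hShift = zeros(size(sPad));
    hShift(k:k+N2-1) = h;
    % Point-by-point product, then sum
    r(k) = sum(sPad .* hShift);
end

disp(r)   % 1 2 3 4 5 4 3 2 1
```

The peak of 5 sits at shift = 5, where the two signals overlap completely.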
Hope this helps!

Finding all possible “lists” of possible pairs in Matlab

I have been thinking about a problem for the last few days but as I am a beginner in MATLAB, I have no clue how to solve it. Here is the background. Suppose that you have a symmetric N×N matrix where each element is either 0 or 1, and N = (1,2,...,n).
For example:
A =
0 1 1 0
1 0 0 1
1 0 0 0
0 1 0 0
If A(i,j) == 1, then it is possible to form the pair (i,j) and if A(i,j)==0 then it is NOT possible to form the pair (i,j). For example, (1,2) is a possible pair, as A(1,2)==A(2,1)==1 but (3,4) is NOT a possible pair as A(3,4)==A(4,3)==0.
Here is the problem. Suppose that a member of the set N can only form a pair with at most one other distinct member of the set N (i.e., if 1 forms a pair with 2, then 1 cannot form a pair with 3). How can I find all possible “lists” of possible pairs? In the above example, one “list” would only consist of the pair (1,2). If this pair is formed, then it is not possible to form any other pairs. Another “list” would be: ((1,3),(2,4)). I have searched the forum and found that the latter “list” is the maximal matching that can be found, e.g., by using a bipartite graph approach. However, I am not necessarily only interested in finding the maximal matching; I am interested in finding ALL possible “lists” of possible pairs.
Another example:
A =
0 1 1 1
1 0 0 1
1 0 0 0
1 1 0 0
In this example, there are three possible lists:
(1,2)
((1,3),(2,4))
(1,4)
I hope that you can understand my question, and I apologize if I am unclear. I appreciate all the help I can get. Many thanks!
This might be a fast approach.
Code
%// Given data, A
A =[ 0 1 1 1;
1 0 0 1;
1 0 0 0;
1 1 0 0];
%%// The lists will be stored in 'out' as a cell array and can be accessed as out{1}, out{2}, etc.
out = cell(size(A,1)-1,1);
%%// Code that detects the lists using "selective" diagonals
for k = 1:size(A,1)-1
[x,y] = find(triu(A,k).*(~triu(ones(size(A)),k+1)));
out(k) = {[x y]};
end
out(cellfun('isempty',out))=[]; %%// Remove empty lists
%%// Verification - Print out the lists
for k = 1:numel(out)
disp(out{k})
end
Output
1 2
1 3
2 4
1 4
EDIT 1
Basically, I calculate all the pairwise indices of the matrix that satisfy the criteria set in the question and then simply map them over the given matrix. Finding the "valid" indices is obviously the tedious part, and the brute-force approach in this code (it enumerates permutations) gets expensive for input matrices larger than about 10×10.
Code
%// Given data, A
A = [0 1 1 1; 1 0 1 1; 1 1 0 1; 1 1 1 0]
%%// Get all pairwise combinations starting with 1
all_combs = sortrows(perms(1:size(A,1)));
all_combs = all_combs(all_combs(:,1)==1,:);
%%// Get the "valid" indices
all_combs_diff = diff(all_combs,1,2);
valid_ind_mat = all_combs(all(all_combs_diff(:,1:2:end)>0,2),:);
valid_ind_mat = valid_ind_mat(all(diff(valid_ind_mat(:,1:2:end),1,2)>0,2),:);
%%// Map the ones of A onto the valid indices to get the lists in a matrix and then cell array
out_cell = mat2cell(valid_ind_mat,repmat(1,[1 size(valid_ind_mat,1)]),repmat(2,[1 size(valid_ind_mat,2)/2]));
A_masked = A(sub2ind(size(A),valid_ind_mat(:,1:2:end),valid_ind_mat(:,2:2:end)));
out_cell(~A_masked)={[]};
%%// Remove empty lists
out_cell(all(cellfun('isempty',out_cell),2),:)=[];
%%// Verification - Print out the lists
disp('Lists =');
for k1 = 1:size(out_cell,1)
disp(strcat(' List',num2str(k1),':'));
for k2 = 1:size(out_cell,2)
if ~isempty(out_cell{k1,k2})
disp(out_cell{k1,k2})
end
end
end
Output
A =
0 1 1 1
1 0 1 1
1 1 0 1
1 1 1 0
Lists =
List1:
1 2
3 4
List2:
1 3
2 4
List3:
1 4
2 3
I'm sure there's a faster way to do it, but here's the obvious solution:
%// Set top half to 0, and find indices of all remaining 1's
A(triu(A)==1) = 0;
[ii,jj] = find(A);
%// Put these in a matrix for further processing
P = [ii jj];
%// Sort indices into 'lists' of the kind you defined
X = repmat({}, size(P,1),1);
for ii = 1:size(P,1)-1
X{ii}{1} = P(ii,:);
for jj = ii+1:size(P,1)
if ~any(ismember(P(ii,:), P(jj,:)))
X{ii}{end+1} = P(jj,:);
end
end
end
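For small matrices, the lists from the question's second example can also be enumerated by brute force. The sketch below is not taken from either answer. It assumes that a "list" means a set of pairs in which no member appears twice and to which no further possible pair can be added, which is how the three lists in the question's example read. It tries every subset of the possible pairs, so it is only practical when the number of possible pairs is small. The names E, pick, used and lists are made up for illustration.

```matlab
A = [0 1 1 1;
     1 0 0 1;
     1 0 0 0;
     1 1 0 0];

% Each possible pair once, with the smaller index first
[ii, jj] = find(triu(A, 1));
E = [ii jj];
m = size(E, 1);

lists = {};
for code = 1:2^m - 1
    pick = bitget(code, 1:m) == 1;   % which pairs are in this subset
    used = E(pick, :);
    members = used(:);
    if numel(unique(members)) < numel(members)
        continue;                    % some member appears in two pairs
    end
    % Keep the subset only if every remaining pair clashes with it,
    % i.e. no further pair could still be added
    rest = E(~pick, :);
    if all(any(ismember(rest, members), 2))
        lists{end+1} = used;
    end
end

for k = 1:numel(lists)
    disp(lists{k})
    disp('---')
end
```

For the matrix above this yields the three lists from the question: (1,2), (1,4) and ((1,3),(2,4)).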

finding the minimum between 3 numbers

How can I find the smallest non-zero number among three numbers?
I tried introducing a very small number eps = 1e-6 (my numbers are either zero or clearly larger than eps) and doing tests between min(x,eps), min(y,eps) etc. I didn't get anything. Is there a way to do that with a function?
If the numbers are all stored in a vector x you could do the following:
x = [1 0 2 0 3 0 4];
y = min(x(x>0));
This is based on your statement that
numbers are either zero or clearly larger than eps
If you mean larger in magnitude and you want to accept non-zero negative values you could use:
x = [1 0 -2 0 3 0 4];
y = min(x(x~=0));
Note that this will return the most negative number when negative numbers are present, rather than the number with the smallest non-zero magnitude. To get the number with the smallest non-zero magnitude, you could use:
x = [1 0 -2 0 3 0 4];
xnonzero = x(x~=0);
[~,idx] = min(abs(xnonzero));
y = xnonzero(idx);
It doesn't seem very elegant. There is probably a more direct way.
numbers = [1 3 4 -2 1 0];
answer = min(numbers(numbers>0));
answer == 1
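For the literal case in the question, three separate numbers, the same idea can be applied after collecting them into a vector. This is a small sketch with made-up values a, b and c. It returns NaN when all three are zero, which is an assumption about the desired behaviour rather than something stated in the question.

```matlab
a = 0; b = 7; c = 3;

vals = [a b c];
nz = vals(vals ~= 0);          % drop the zeros

if isempty(nz)
    m = NaN;                   % no non-zero value to report
else
    [~, idx] = min(abs(nz));   % smallest non-zero magnitude
    m = nz(idx);
end

disp(m)   % 3
```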

find non-overlapping sequences of zeros in matlab arrays

This is related to:
Finding islands of zeros in a sequence.
However, the problem is not exactly the same:
Let's take the same vector as in the post above for the purpose of comparison:
sig = [1 1 0 0 0 0 1 1 1 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0];
What I am trying to find are the starting indices of islands of n consecutive zeros; however, overlapping is not allowed. For example for n=2, I want the result:
v=[3, 5, 14, 25];
I found the solution of Amro brilliant as a starting point (especially with regards to strfind), but the second part of his answer does not give me the result that I expect. This is a non-vectorized solution that I have so far:
function v=findIslands(sig, n)
% Finds indices of unique islands
% sig --> target vector
% n --> This is the length of the island
% This will find the starting indices for all "islands" of zeros
% but it marks long strings multiple times
startIndex = strfind(sig, zeros(1,n));
L=length(startIndex);
% ongoing gap counter
spc=0;
if L>0 % Check if empty
v=startIndex(1);
for i=2:L
% Count the distance
spc=spc+(startIndex(i)-startIndex(i-1));
if spc>=n
v=[v,startIndex(i)];
% Reset odometer
spc=0;
end
end
else
v=[];
disp('No Islands Found!')
end
I was wondering if someone has a faster vectorized solution to the above problem.
You can convert everything into strings and use regular expressions:
regexp(sprintf('%d', sig(:)), sprintf('%d', zeros(n, 1)))
Example
>> sig = [1 1 0 0 0 0 1 1 1 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0];
>> n = 2;
>> regexp(sprintf('%d', sig(:)), sprintf('%d', zeros(n, 1)))
ans =
3 5 14 25
Do this:
As an example, let's look at the case where the run length you want is 2.
1. Convert the vector to a binary number n.
2. Set index = size-1 and starting = [].
3. Loop until n < 4:
   Is n divisible by 4?
   Yes? Append index to starting. Set n = n / 4 and index = index - 2.
   No? Set n = floor(n / 2) and index = index - 1.
   Goto 3.
For any other run length, replace 4 with 2**run.
Use gnovice's answer from the same linked question. It's vectorized, and the runs where duration == n are the ones you want.
https://stackoverflow.com/a/3274416/105904
Take the runs with duration >= n, and then divide duration by n, and that'll tell you how many consecutive runs you have at each position and how to expand the index list. This could end up faster than the regexp version, if your island density isn't too high.
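The run-length idea described above can be sketched as follows. This is not gnovice's code from the linked answer, only an illustration of the same approach under the assumption that sig contains nothing but 0s and 1s. The names runStart, runEnd, duration and count are made up. The final expansion uses a short loop over the runs, not over the individual samples.

```matlab
sig = [1 1 0 0 0 0 1 1 1 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0];
n = 2;

% Pad with ones so that runs touching either end are detected too
d = diff([1, sig, 1]);
runStart = find(d == -1);          % first index of each run of zeros
runEnd   = find(d == 1) - 1;       % last index of each run of zeros
duration = runEnd - runStart + 1;

% Keep runs that can hold at least one island of length n
keep = duration >= n;
runStart = runStart(keep);
count = floor(duration(keep) / n); % non-overlapping islands per run

% Expand each run into its island start indices
v = [];
for k = 1:numel(runStart)
    v = [v, runStart(k) + n*(0:count(k)-1)];
end

disp(v)   % 3 5 14 25
```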