Cross-Correlation of two signals - matlab

I want to find the correlation between two signals x1 and x2.
x1 = [1 1 1 1 1]
x2 = [1 1 1 1 1]
r1 = xcorr(x1,x2) % MATLAB function that computes the cross-correlation of x1 and x2
x1 and x2 both look like this (figure: plot of x1 and x2),
and their cross-correlation looks like this (figure: plot of r1).
I understand that correlation measures the degree of similarity between two signals, giving the highest value at the point of maximum similarity (the two signals are shifted relative to each other to measure similarity at different points, right?). In that case, the cross-correlation should give a high value at all points, but it does not: the maximum value is at the 5th position. Why is that? Can someone explain this to me?

You seem to have a slight misunderstanding of how cross-correlation works. Cross-correlation takes one signal, and compares it with shifted versions of another signal. If you recall, the (unnormalized) cross-correlation of two signals is defined as:
(source: jiracek at www-rohan.sdsu.edu)
s and h are two signals. We take shifted versions of the second signal h, multiply them element by element with s, and sum the products. The horizontal axis of the cross-correlation plot denotes the shifts, while the vertical axis denotes the output of the cross-correlation at each shift. Let's compute the cross-correlation by hand for these signals so we can better understand the output that MATLAB is giving us.
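For reference, one common way to write the unnormalized cross-correlation of two real-valued discrete signals, using the names s and h from this answer, is the following. The indexing convention is an assumption and may differ from the one in the cited source:

```latex
r_{sh}[k] = \sum_{n} s[n]\, h[n - k]
```

Here k is the shift applied to h: each increase of k slides h one more sample to the right before the element-by-element products are summed. The labels shift = 1, ..., 9 used in the rest of this answer simply count these positions starting from the leftmost overlap.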
To compute the outputs, both signals need to be zero-padded to accommodate the first point where the two signals start to overlap. Specifically, we zero-pad so that there are N2-1 zeroes to the left of s and N2-1 zeroes to the right of s, where N2 is the length of h. Each time you calculate the cross-correlation for a given shift of h, you create a signal of all zeros that is the same size as the zero-padded version of s, then place the original signal h within this larger signal. You then compare this new signal with the zero-padded version of s.
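Note that cross-correlation is not strictly commutative: for real-valued signals, swapping the two signals mirrors the result about zero shift, so you get the same values in reverse order. If you had one signal that was longer and one that was shorter, it would be easier to leave the long signal stationary while you shift the shorter one. Bear in mind that you'll get the same values whichever one you choose to shift, just reversed along the shift axis, so pick the easier path! For the two identical signals here the result is symmetric, so the order makes no difference at all.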
Back to where we were, this is what the first value of the cross correlation looks like (shift = 1).
s = [0 0 0 0 1 1 1 1 1 0 0 0 0]
h = [1 1 1 1 1 0 0 0 0 0 0 0 0]
The second signal slides from left to right, and we start where the right end of h begins to overlap the first signal, which is s. We do a point-by-point multiplication between s and h, and we sum up the elements. In this case, we get:
s ** h = (0)(1) + (0)(1) + (0)(1) + (0)(1) + (1)(1) + (1)(0) + (1)(0) + (1)(0) + (1)(0) + (0)(0) + (0)(0) + (0)(0) + (0)(0)
= 1
The ** in this case is (my version of) the cross-correlation operator. Let's look at shift = 2:
s = [0 0 0 0 1 1 1 1 1 0 0 0 0]
h = [0 1 1 1 1 1 0 0 0 0 0 0 0]
Remember, we are shifting towards the right by 1 more and s stays the same. Doing the same calculations as above, we should get:
s ** h = (0)(0) + (0)(1) + (0)(1) + (0)(1) + (1)(1) + (1)(1) + (1)(0) + (1)(0) + (1)(0) + (0)(0) + (0)(0) + (0)(0) + (0)(0)
= 2
If you repeat this for the other shifts, you'll see that the values keep increasing by 1, up until we have total overlap, which is the fifth shift (shift = 5). In this case, we get:
s = [0 0 0 0 1 1 1 1 1 0 0 0 0]
h = [0 0 0 0 1 1 1 1 1 0 0 0 0]
Computing the cross-correlation here gives 5. At the sixth shift (shift = 6), we move h to the right by 1 more, and that's when the cross-correlation starts to drop. Specifically:
s = [0 0 0 0 1 1 1 1 1 0 0 0 0]
h = [0 0 0 0 0 1 1 1 1 1 0 0 0]
If you go ahead and compute the cross-correlation, you'll see that the result is 4. As you keep shifting to the right, the values keep decreasing by 1 per shift. Eventually you reach the final position, where s and h overlap at only one point:
s = [0 0 0 0 1 1 1 1 1 0 0 0 0]
h = [0 0 0 0 0 0 0 0 1 1 1 1 1]
By computing the cross-correlation, we only get the value of 1. You'll also see that this is at shift = 9. Therefore, this explains your graph where the cross-correlation starts to increase, because there is an increasing amount of overlap. It then reaches the maximum at shift = 5 because there is total overlap of the two signals. The cross-correlation then starts to decrease because the amount of overlap is also starting to decrease.
You'll also notice that the total number of shifts that we need to compute is N1 + N2 - 1, and this is a property of cross correlation. N1 and N2 are the lengths of s and h respectively. As such, given that N1 = N2 = 5, we see that the total number of shifts is N1 + N2 - 1 = 9, which also corresponds to the last shift we computed above.
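The hand computation above can be reproduced with a short loop. This is a minimal sketch in base MATLAB/Octave (it avoids xcorr, so no Signal Processing Toolbox is assumed); the names sPad, hShift and r are illustrative and not from the original post.

```matlab
s = [1 1 1 1 1];
h = [1 1 1 1 1];
N1 = numel(s);
N2 = numel(h);

% Zero-pad s with N2-1 zeros on each side, as described above
sPad = [zeros(1, N2-1), s, zeros(1, N2-1)];

% One output value per shift; there are N1 + N2 - 1 shifts in total
r = zeros(1, N1 + N2 - 1);
for k = 1:(N1 + N2 - 1)
    % Place h inside an all-zero signal the same size as sPad,
    % starting at position k (shift = k in the text)
    hShift = zeros(size(sPad));
    hShift(k:k+N2-1) = h;
    % Element-by-element product, then sum
    r(k) = sum(sPad .* hShift);
end

disp(r)   % 1 2 3 4 5 4 3 2 1
```

For these two signals the values ramp up to 5 at the fifth shift and back down to 1 at the ninth, which is the triangular shape the question describes.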
Hope this helps!

Related

Count length and frequency of island of consecutive numbers

I have a sequence of ones and zeros and I would like to count how often islands of consecutive ones appear.
Given:
S = [1 1 0 0 1 1 1 0 1 0 0 1 0 0 0 1 1 0 0 1 0 0 1 1 0 1]
By counting the islands of consecutive ones I mean this:
R = [4 3 1]
…because there are four single ones, three double ones and a single triplet of ones.
So that when multiplied by the lengths of the islands, [1 2 3]:
[4 3 1] * [1 2 3]' = 13
which corresponds to sum(S), because there are thirteen ones.
I hope to vectorize the solution rather than loop something.
I came up with something like:
R = histcounts(diff( [0 (find( ~ (S > 0) ) ) numel(S)+1] ))
But the result does not make much sense. It counts too many triplets.
All pieces of code I find on the internet revolve around diff([0 something numel(S)]) but the questions are always slightly different and don’t really help me
Thankful for any advice!
The following should do it. Hopefully the comments are clear.
S = [1 1 0 0 1 1 1 0 1 0 0 1 0 0 0 1 1 0 0 1 0 0 1 1 0 1];
% use diff to find the rising and falling edges, padding the start and end with 0
edges = diff([0,S,0]);
% get a list of the rising edges
rising = find(edges==1);
% and falling edges
falling = find(edges==-1);
% and thereby get the lengths of all the runs
SRuns = falling - rising;
% The longest run
maxRun = max(SRuns);
% Finally make a histogram, putting the bin centres at 1:maxRun
R = hist(SRuns,1:maxRun);
You could also obtain the same result with:
x = find(S==1)-(1:sum(S)) % give a specific value to each group of 1s
h = histc(x,x) % compute the length of each group; you can also use histc(x,unique(x))
r = histc(h,1:max(h)) % count the occurrences of each length
Result:
r =
4,3,1
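If you would rather avoid hist and histc, the same tally can be sketched with accumarray. The snippet below is only an illustration: it assumes S contains at least one 1, and runLengths is a name I made up for the example.

```matlab
S = [1 1 0 0 1 1 1 0 1 0 0 1 0 0 0 1 1 0 0 1 0 0 1 1 0 1];

% Rising edges (+1) mark island starts, falling edges (-1) mark island ends
d = diff([0 S 0]);
runLengths = find(d == -1) - find(d == 1);

% Count how many islands there are of each length 1..max(runLengths)
R = accumarray(runLengths(:), 1).';

disp(R)   % 4 3 1
```

The check from the question still holds: R * (1:numel(R))' equals sum(S).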

How does Y = eye(K)(y, :); replace a "for" loop? Coursera

Working on an assignment from Coursera Machine Learning. I'm curious how this works... From an example, this much simpler code:
% K is the number of classes.
K = num_labels;
Y = eye(K)(y, :);
seems to be a substitute for the following:
I = eye(num_labels);
Y = zeros(m, num_labels);
for i=1:m
Y(i, :)= I(y(i), :);
end
and I have no idea how. I'm having some difficulty Googling this info as well.
Thanks!
Your variable y in this case must be an m-element vector containing integers in the range of 1 to num_labels. The goal of the code is to create a matrix Y that is m-by-num_labels where each row k will contain all zeros except for a 1 in column y(k).
A way to generate Y is to first create an identity matrix using the function eye. This is a square matrix of all zeroes except for ones along the main diagonal. Row k of the identity matrix will therefore have one non-zero element in column k. We can therefore build matrix Y out of rows indexed from the identity matrix, using y as the row index. We could do this with a for loop (as in your second code sample), but that's not as simple and efficient as using a single indexing operation (as in your first code sample).
Let's look at an example (in MATLAB):
>> num_labels = 5;
>> y = [2 3 3 1 5 4 4 4]; % The columns where the ones will be for each row
>> I = eye(num_labels)
I =
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
>> Y = I(y, :)
Y =
% 1 in column ...
0 1 0 0 0 % 2
0 0 1 0 0 % 3
0 0 1 0 0 % 3
1 0 0 0 0 % 1
0 0 0 0 1 % 5
0 0 0 1 0 % 4
0 0 0 1 0 % 4
0 0 0 1 0 % 4
NOTE: Octave allows you to index function return arguments without first placing them in a variable, but MATLAB does not (at least, not very easily). Therefore, the syntax:
Y = eye(num_labels)(y, :);
only works in Octave. In MATLAB, you have to do it as in my example above, or use one of the other options here.
The first set of code is Octave, which has some additional indexing functionality that MATLAB does not have. The second set of code is how the operation would be performed in MATLAB.
In both cases Y is a matrix generated by re-arranging the rows of an identity matrix. In both cases it may also be possible to calculate Y = T*y for a suitable linear transformation matrix T.
(The above assumes that y is a vector of integers that is being used as an index into the rows. If that's not the case then the code most likely throws an error.)
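To convince yourself that the single indexing operation and the for loop build the same matrix, here is a small check that should run in both MATLAB and Octave. The bsxfun comparison at the end is an extra alternative that is not in the answers above; Yidx, Yloop and Ycmp are illustrative names.

```matlab
num_labels = 5;
y = [2 3 3 1 5 4 4 4];
m = numel(y);

% Indexing version: pick row y(k) of the identity matrix for each k
I = eye(num_labels);
Yidx = I(y, :);

% Loop version from the question
Yloop = zeros(m, num_labels);
for i = 1:m
    Yloop(i, :) = I(y(i), :);
end

% Alternative: compare each label against the column numbers 1..num_labels
Ycmp = double(bsxfun(@eq, y(:), 1:num_labels));

disp(isequal(Yidx, Yloop) && isequal(Yidx, Ycmp))   % 1
```

Each row of the result has exactly one 1, in the column given by the corresponding entry of y.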

Cumulative Sum with >=0 Restriction in Matlab

I want to calculate the cumulative sum of a vector, but stop summing up once the sum becomes negative, and start again at positive elements.
Example:
We have a vector:
[1 1 -1 -1 -1 -1 1 1 1 1]
The normal cumulative sum would then be:
[1 2 1 0 -1 -2 -1 0 1 2]
But I want:
[1 2 1 0 0 0 1 2 3 4]
The only solution I could come up with was to loop over the elements of the vector like this:
test = [1 1 -1 -1 -1 -1 1 1 1 1];
testCumsum = zeros(size(test));
for i=1:length(test)
    if i==1
        testCumsum(i) = test(i);
    else
        testCumsum(i) = testCumsum(i-1) + test(i);
    end
    if testCumsum(i)<0
        testCumsum(i) = 0;
    end
end
Is there a more MATLAB-ish solution?
(The sum can become negative an arbitrary number of times, the vectors can become pretty large, and the elements can be any number, not just 1 and -1)
You won't be able to vectorize it easily, since you have to decide on each element based on the previous ones. You can find regions of positive and negative runs, but it would be unnecessarily complex and I don't know if you would gain anything over your own solution.
Here is a simplification of your code for input A and output C:
C=A;
C(1) = max(C(1), 0);
for k=2:numel(C)
    C(k) = max(C(k-1)+C(k), 0);
end
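For what it's worth, this particular recurrence also has a closed form: with S = cumsum(A), the clamped sum equals S minus the running minimum of S, where that running minimum is capped at zero. The sketch below compares it against the loop; it assumes cummin is available (MATLAB R2014b or later, or Octave), and Cloop, Cvec are illustrative names. With non-integer data the two results may differ by floating-point rounding.

```matlab
A = [1 1 -1 -1 -1 -1 1 1 1 1];

% Loop version, as above
Cloop = A;
Cloop(1) = max(Cloop(1), 0);
for k = 2:numel(Cloop)
    Cloop(k) = max(Cloop(k-1) + Cloop(k), 0);
end

% Closed form: subtract the lowest point the plain running sum has
% reached so far, but only when that lowest point is below zero
S = cumsum(A);
Cvec = S - min(cummin(S), 0);

disp(Cvec)   % 1 2 1 0 0 0 1 2 3 4
```

Whether this is actually faster than the loop depends on the vector size and the MATLAB release, so it is worth timing both on your own data.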
Call your vector x:
y = x > 0
z = x.*y
sum(z)
The y vector is 1 where the elements of x are greater than 0 and 0 elsewhere; the element-wise product that gives z sets your negative values to 0, and then you can sum.
Ah, I see more clearly what you want to do now. Looping is probably going to be quickest; if the array is large, you could break it into block segments and use parfor to speed it up.

How to replace non-zero elements randomly with zero?

I have a matrix including 1 and 0 elements like below which is used as a network adjacency matrix.
A =
0 1 1 1
1 1 0 1
1 1 0 1
1 1 1 0
I want to simulate an attack on the network, so I must replace some specific percent of 1 elements randomly with 0. How can I do this in MATLAB?
I know how to replace a percentage of elements randomly with zeros, but I must be sure that the element that is replaced randomly, is one of the 1 elements of matrix not zeros.
If you want to change each 1 with a certain probability:
p = 0.1; % desired probability of change
A_ones = find(A); % linear index of ones in A
A_ones_change = A_ones(rand(size(A_ones))<=p); % entries to be changed
A(A_ones_change) = 0; % apply changes in those entries
If you want to randomly change a fixed fraction of the 1 entries:
f = 0.1; % desired fraction
A_ones = find(A);
n = round(f*length(A_ones));
A_ones_change = randsample(A_ones,n);
A(A_ones_change) = 0;
Note that in this case the resulting fraction may be different to that intended, because of the need to round to an integer number of entries.
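randsample is part of the Statistics Toolbox. If that is not available, a similar fixed-fraction selection can be sketched with randperm from base MATLAB/Octave. The fraction 0.25 and the names idx, pick and B are assumptions made for this example.

```matlab
A = [0 1 1 1;
     1 1 0 1;
     1 1 0 1;
     1 1 1 0];
f = 0.25;                         % desired fraction of ones to remove

idx = find(A);                    % linear indices of the ones in A
n = round(f * numel(idx));        % number of ones to remove
pick = idx(randperm(numel(idx), n));   % n distinct ones chosen at random

B = A;
B(pick) = 0;                      % only entries that were 1 are touched

fprintf('%d ones before, %d after\n', nnz(A), nnz(B));
```

Because pick is drawn from idx, zeros in A are never selected, and the number of ones removed is exactly n.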
@horchler's point is a good one. However, if we keep it simple, you can just multiply your input matrix element-wise by a mask matrix.
>> a1=randint(5,5,[0 1]) % before replacing 1->0
a1 =
1 1 1 0 1
0 1 1 1 0
0 1 0 0 1
0 0 1 0 1
1 0 1 0 1
>> a2=random('unif',0,1,5,5) % Assuming the frequency distribution is uniform ('unif')
a2 =
0.7889 0.3200 0.2679 0.8392 0.6299
0.4387 0.9601 0.4399 0.6288 0.3705
0.4983 0.7266 0.9334 0.1338 0.5751
0.2140 0.4120 0.6833 0.2071 0.4514
0.6435 0.7446 0.2126 0.6072 0.0439
>> a1.*(a2>0.1) % And the replacement prob. is 0.1
ans =
1 1 1 0 1
0 1 1 1 0
0 1 0 0 1
0 0 1 0 1
1 0 1 0 0
Other tricks can be added to the mask matrix (a2), such as a different frequency distribution, or a structure (e.g. once a cell is replaced, the adjacent cells become less likely to be replaced, and so on).
Cheers.
The function find is your friend:
indices = find(A);
This will return an array of the linear indices of the nonzero (here, 1) elements in your matrix A, and you can apply your method of replacing a percentage of elements with zero to this array to pick a subset, subsetIndices. Then,
A(subsetIndices) = 0;
will set the selected entries of A to zero.

find non-overlapping sequences of zeros in matlab arrays

This is related to:
Finding islands of zeros in a sequence.
However, the problem is not exactly the same:
Let's take the same vector as in the above post for the purpose of comparison:
sig = [1 1 0 0 0 0 1 1 1 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0];
What I am trying to find are the starting indices of islands of n consecutive zeros; however, overlapping is not allowed. For example for n=2, I want the result:
v=[3, 5, 14, 25];
I found the solution of Amro brilliant as a starting point (especially with regards to strfind), but the second part of his answer does not give me the result that I expect. This is a non-vectorized solution that I have so far:
function v=findIslands(sig, n)
% Finds indices of unique islands
% sig --> target vector
% n --> This is the length of the island
% This will find the starting indices for all "islands" of zeros
% but it marks long strings multiple times
startIndex = strfind(sig, zeros(1,n));
L=length(startIndex);
% ongoing gap counter
spc=0;
if L>0 % Check if empty
    v=startIndex(1);
    for i=2:L
        % Count the distance
        spc=spc+(startIndex(i)-startIndex(i-1));
        if spc>=n
            v=[v,startIndex(i)];
            % Reset odometer
            spc=0;
        end
    end
else
    v=[];
    display('No Islands Found!')
end
I was wondering if someone has a faster vectorized solution to the above problem.
You can convert everything into strings and use regular expressions:
regexp(sprintf('%d', sig(:)), sprintf('%d', zeros(n, 1)))
Example
>> sig = [1 1 0 0 0 0 1 1 1 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0];
>> n = 2;
>> regexp(sprintf('%d', sig(:)), sprintf('%d', zeros(n, 1)))
ans =
3 5 14 25
Do this:
As an example let's look at the case where the run length you want is 2.
1. Convert the vector to a binary number n.
2. Set index = size-1, set starting = [].
3. Loop until n < 4: is n divisible by 4?
   Yes? Append index to starting. Set n = n / 4.
   No? Set n = n / 2.
4. Goto 3.
For any other run length replace 4 with 2**run.
Use gnovice's answer from the same linked question. It's vectorized, and the runs where duration == n are the ones you want.
https://stackoverflow.com/a/3274416/105904
Take the runs with duration >= n, and then divide duration by n, and that'll tell you how many consecutive runs you have at each position and how to expand the index list. This could end up faster than the regexp version, if your island density isn't too high.
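A rough sketch of that run-length idea follows. It is my own reading of the suggestion rather than gnovice's code: it finds each run of zeros, works out how many whole islands of length n fit in it, and expands the start indices accordingly. The names z, starts, len, cnt and pieces are illustrative.

```matlab
sig = [1 1 0 0 0 0 1 1 1 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0];
n = 2;

% Locate the runs of zeros: where each one starts and how long it is
z = (sig == 0);
d = diff([0 z 0]);
starts = find(d == 1);
len = find(d == -1) - starts;

% Number of non-overlapping islands of length n that fit in each run
cnt = floor(len / n);

% Expand each run into its island start indices: start, start+n, ...
pieces = arrayfun(@(s, c) s + (0:c-1) * n, starts, cnt, ...
                  'UniformOutput', false);
v = [pieces{:}];

disp(v)   % 3 5 14 25
```

On this example the result agrees with the regexp version above; runs shorter than n get a count of 0 and contribute nothing.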