Finding contiguous points in an increasing range - matlab

I have a set of data points in a vector. For example,
[NaN, NaN, NaN, -1.5363, NaN -1.7664, -1.7475];
These data result from code that selects 3 points within a specified range (specifically, -0.6 to 0.6). If three points in the column do not fall within this range, the range is incrementally expanded until three points are found. In the above example, the range was increased to -1.8 to 1.8. However, the data we are analyzing is erratic, with random peaks and troughs, so points that are not contiguous get accepted into the range (element 3 is chosen as valid, but not element 4).
What would be the best way to go about this? I already have code that incrementally increases the range until it finds three points; I just need to modify it so that it doesn't stop at any three points, but keeps increasing the range until it finds three CONTIGUOUS points. If that were done for the above example, I would then evaluate slopes to remove the 3rd element (since the slope between 3 and 4 is negative).
Thanks.

Assuming your data as provided in the example is in the variable x, you can use isnan and findstr like so:
x = [NaN, NaN, NaN, -1.5363, NaN -1.7664, -1.7475, 123];
~isnan(x)
ans =
0 0 0 1 0 1 1 1
pos = findstr(~isnan(x), [1 1 1]);
The reason for using findstr like this is that we want to find the sequence [1 1 1] within the logical array returned by ~isnan(x), and findstr returns the starting index of every position in that array where the sequence appears.
For your example data this returns [], but for the data in my example it returns 6, and you can extract the contiguous region with x(pos:pos+2). You will have to be a bit careful about cases where there are more than 3 contiguous values (with 4, it would return [6 7]) and cases where there is more than one contiguous region. If you don't need to do anything meaningful with these cases, just use pos(1).
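If you would rather not depend on findstr, a moving-window sum gives the same start indices: a window of three consecutive non-NaN flags sums to exactly 3. This is a sketch that swaps in conv for findstr (it is not what the answer itself uses), run on the example data with one more contiguous value appended:

```matlab
% Example data from above, plus a fourth contiguous value (456)
x = [NaN, NaN, NaN, -1.5363, NaN, -1.7664, -1.7475, 123, 456];
valid = double(~isnan(x));                      % 1 where a value exists
windowSums = conv(valid, ones(1, 3), 'valid');  % sum over each window of 3 elements
pos = find(windowSums == 3);                    % windows made entirely of valid values
disp(pos)                                       % start index of every run of 3
```

As with findstr, a run of 4 valid values yields two overlapping start indices.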
If you want to extract the entirety of the first contiguous region whose length is greater than or equal to 3, you could do something like:
x = [NaN, NaN, NaN, -1.5363, NaN -1.7664, -1.7475, 123, 456, 789];
startPos = [];
stopPos = [];
pos = findstr(~isnan(x), [1 1 1]);
if ~isempty(pos)
    startPos = pos(1);
    stopPos = startPos + 2;
    % Find any cases where we have consecutive numbers in pos
    if length(pos) > 1 && any(diff(pos) == 1)
        % We have a contiguous section longer than 3 elements
        % Find the NaNs
        nans = find(isnan(x));
        % Find the first NaN after pos(1), or the index of the last element
        stopPos = nans(nans > startPos);
        if ~isempty(stopPos)
            stopPos = stopPos(1) - 1; % Don't want the NaN
        else
            stopPos = length(x);
        end
    end
end
x(startPos:stopPos)
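The loop the question actually asks for (widen the range until three contiguous points fall inside it) could combine range expansion with a contiguity test. A minimal sketch, assuming a symmetric range [-r, r] grown by a fixed step; the raw data values and the 0.1 step are hypothetical (the question shows neither), and conv stands in for findstr:

```matlab
% Hypothetical raw data: the question does not show the unfiltered values
data = [2.3, -2.1, 1.9, -1.5363, 2.4, -1.7664, -1.7475, 0.9];
r0 = 0.6;               % starting half-width of the range (from the question)
step = 0.1;             % ASSUMED increment; the question does not state it
maxR = max(abs(data));  % once r exceeds this, every point is in range
k = 0;
startIdx = [];
while true
    r = r0 + k*step;                      % compute r directly to avoid drift
    inRange = abs(data) <= r;             % points inside [-r, r]
    windowSums = conv(double(inRange), ones(1, 3), 'valid');
    startIdx = find(windowSums == 3, 1);  % first run of 3 contiguous points
    if ~isempty(startIdx) || r > maxR
        break
    end
    k = k + 1;
end
if ~isempty(startIdx)
    contiguous = data(startIdx:startIdx+2);
end
```

For these made-up values the isolated -1.5363 is no longer enough on its own: the loop stops at r of about 1.8 and returns elements 6 to 8.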

Related

How to create an adjacency/joint probability matrix in matlab

From a binary matrix, I want to calculate a kind of adjacency/joint probability density matrix (I'm not quite sure what to call it, so please feel free to rename it).
For example, I start with this matrix:
A = [1 1 0 1 1
1 0 0 1 1
0 0 0 1 0]
I want to produce this output:
Output = [1 4/5 2/5
4/5 1 3/5
2/5 3/5 1]
Basically, for each pair of rows, I want to calculate the proportion of columns in which they agree (1 and 1, or 0 and 0). A row always agrees with itself, so the diagonal is all 1s. No matter how many columns (j) are added, the result is still 3x3, but an extra row (i) results in a 4x4.
I like to think of the rows (i) of A as people and the columns (j) as questions, so the final output is a 3x3 (number of people) matrix.
I am having some trouble doing this in MATLAB. If you could point me in the right direction, that would be fabulous.
So, you can do this in two parts.
bothOnes = A*A';
gives you a matrix showing how many 1s each pair of rows share, and
bothZeros = (1-A)*(1-A)';
gives you a matrix showing how many 0s each pair of rows share.
If you just add them up, you get how many elements they share of either type:
bothSame = A*A' + (1-A)*(1-A)';
Then just divide by the row length to get the desired fractional representation:
output = (A*A' + (1-A)*(1-A)') / size(A, 2);
That should get you there.
Note that this only works if A contains only 1's and 0's, but it can be adapted for other cases.
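Running the formula on the example A, and checking it against a plain double loop, is a quick way to confirm that it counts both kinds of agreement. A sketch:

```matlab
A = [1 1 0 1 1
     1 0 0 1 1
     0 0 0 1 0];
bothOnes  = A*A';              % count of columns where both rows are 1
bothZeros = (1-A)*(1-A)';      % count of columns where both rows are 0
output = (bothOnes + bothZeros) / size(A, 2);
% Brute-force cross-check: fraction of columns where rows i and j agree
m = size(A, 1);
check = zeros(m);
for i = 1:m
    for j = 1:m
        check(i, j) = mean(A(i, :) == A(j, :));
    end
end
disp(output)
```

For this A, rows 1 and 2 agree in 4 of 5 columns, rows 1 and 3 in 2 of 5, and rows 2 and 3 in 3 of 5.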
Here are some alternatives, assuming A can only contain 0 and 1:
If you have the Statistics Toolbox:
result = 1-squareform(pdist(A, 'hamming'));
Manual approach with implicit expansion:
result = mean(permute(A, [1 3 2])==permute(A, [3 1 2]), 3);
Using bitwise operations. This is a more esoteric approach, and is only valid if A has at most 53 columns, due to floating-point limitations:
t = bin2dec(char(A+'0')); % convert each row from binary to decimal
u = bitxor(t, t.'); % bitwise xor
v = mean(dec2bin(u, size(A,2))-'0', 2); % compute desired values (pad every XOR to size(A,2) bits)
result = 1 - reshape(v, size(A,1), []); % reshape to obtain result
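One pitfall in the bitwise route: dec2bin(u) pads only to the width of the largest XOR value, so mean divides by too few bits whenever the rows differ only in their low-order columns. Passing size(A,2) as the second (minimum number of digits) argument of dec2bin avoids this. A small sketch on a hypothetical 2x3 matrix; bsxfun is used instead of implicit expansion so it also runs on older releases:

```matlab
A = [1 0 0
     1 0 1];                                 % rows differ only in the last bit
t = bin2dec(char(A + '0'));                  % each row as an integer: [4; 5]
u = bsxfun(@bitxor, t, t.');                 % pairwise XOR of the rows
bitsPadded = dec2bin(u, size(A, 2)) - '0';   % every XOR padded to 3 bits
result = 1 - reshape(mean(bitsPadded, 2), size(A, 1), []);
% Without the width argument, dec2bin uses only 1 bit here (max XOR is 1)
bitsShort = dec2bin(u) - '0';
wrong = 1 - reshape(mean(bitsShort, 2), size(A, 1), []);
```

result is [1 2/3; 2/3 1], matching the direct comparison, while wrong comes out as the identity matrix.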

Every possible difference among multiple vectors

I have 3 vectors, v1, v2, v3. What I want to get is the difference between every possible pair of them, that is, v1-v2, v1-v3, v2-v3. How can I do this without looping in MATLAB?
Thank you.
Just use nchoosek to generate the combinations first and then use them to index into your array of row-vectors:
Test case:
numVectors = 3;
dim = 5;
Vs = rand(numVectors, dim);
Actual computation:
combs = nchoosek(1:size(Vs,1), 2);
differences = Vs(combs(:,1),:) - Vs(combs(:,2),:);
The above creates 3 random row vectors of dimension 5. So in your case, you may want to replace the creation of the random matrix with Vs = [v1; v2; v3]; if your vectors are row vectors; or transpose the vectors using Vs = [v1, v2, v3].'; if your data are column vectors.
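With concrete row vectors (reusing the sample values from the bsxfun answer), the row order of the result follows the rows of combs:

```matlab
v1 = [1 2];  v2 = [10 20];  v3 = [0 0];   % sample row vectors
Vs = [v1; v2; v3];                         % one vector per row
combs = nchoosek(1:size(Vs, 1), 2);        % [1 2; 1 3; 2 3]
differences = Vs(combs(:,1), :) - Vs(combs(:,2), :);
% Row r is Vs(combs(r,1),:) - Vs(combs(r,2),:), i.e. v1-v2, v1-v3, v2-v3
disp(differences)
```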
Using bsxfun:
clear
clc
%// Sample vectors.
v1 = [1 2];
v2 = [10 20];
v3 = [0 0];
Out = bsxfun(@minus,[v1 v2 v3], [v1 v2 v3].')
Out =
0 1 9 19 -1 -1
-1 0 8 18 -2 -2
-9 -8 0 10 -10 -10
-19 -18 -10 0 -20 -20
1 2 10 20 0 0
1 2 10 20 0 0
Reasoning: all the vectors are concatenated into one row, so a difference is computed for every pair of elements, from the 1st element of the 1st vector through the 2nd element of the last vector.
The 1st column contains all the differences for the 1st element of the 1st vector, i.e. (1 - 1), (1 - 2), (1 - 10), (1 - 20), (1 - 0), (1 - 0).
The 2nd column does the same with the 2: (2 - 1), (2 - 2), (2 - 10), and so on.
Note that this gives the difference between every pair of individual elements; the vector difference v1-v2, for example, is the diagonal of the block Out(3:4,1:2), i.e. [-9 -18].
Sorry if my explanation is unclear; I don't know the right terms in English. Please ask for more details.
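If the goal is the whole-vector differences v1-v2, v1-v3, v2-v3 rather than every pair of individual elements, the same bsxfun idea can be pushed into a third dimension. A sketch (this variant is not part of the answer above; it computes all N-by-N ordered pairs, so half of the result is redundant):

```matlab
v1 = [1 2];  v2 = [10 20];  v3 = [0 0];
Vs = [v1; v2; v3];                                   % one vector per row
% D(i,j,:) = Vs(i,:) - Vs(j,:) for every ordered pair of vectors
D = bsxfun(@minus, permute(Vs, [1 3 2]), permute(Vs, [3 1 2]));
v1_minus_v2 = squeeze(D(1, 2, :)).';                 % back to a row vector
v2_minus_v3 = squeeze(D(2, 3, :)).';
```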
Code
%// Concatenate all vectors (as columns) to form a 2D array; add more vectors as needed
V = cat(2,v1(:),v2(:),v3(:))
N = size(V,2) %// number of vectors
%// Find all IDs of all combinations as x,y
[y,x] = find(bsxfun(@gt,(1:N)',1:N))
%// OR [y,x] = find(tril(true(size(V,2)),-1))
%// Use matrix indexing to collect vector data for all combinations with those
%// x-y IDs from V. Then, perform subtractions across them for final output
%// (columns are v1-v2, v1-v3, v2-v3)
diff_array = V(:,x) - V(:,y)
A few points about the code
bsxfun with find gets us the IDs for forming pairwise combinations.
We use those IDs to index into the 2D concatenated array and perform subtractions between them to get the final output.
Bonus Stuff
If you look closely at the part that finds the IDs of all combinations, it is basically nchoosek(1:..,2).
So, one can use these as alternatives to nchoosek(1:N,2):
[y,x] = find(bsxfun(@gt,(1:N)',1:N))
[y,x] = find(tril(true(N),-1))
with [x y] forming the pairwise combinations. It might be interesting to benchmark them!
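A quick way to confirm the equivalence, including the row order, for a small N (N = 4 is an arbitrary choice):

```matlab
N = 4;
[y1, x1] = find(bsxfun(@gt, (1:N)', 1:N));  % lower triangle via comparison
[y2, x2] = find(tril(true(N), -1));         % lower triangle via tril
pairs = nchoosek(1:N, 2);                    % reference combinations
disp([x1 y1])
```

Because find scans column by column, both variants list the pairs in the same order as nchoosek.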

Indices of constant consecutive values in a matrix, and number of constant values

I have a matrix with constant consecutive values randomly distributed throughout the matrix. I want the indices of the consecutive values, and further, I want a matrix of the same size as the original matrix, where the number of consecutive values are stored in the indices of the consecutive values. For example:
original_matrix = [1 1 1;2 2 3; 1 2 3];
output_matrix = [3 3 3;2 2 0;0 0 0];
I have struggled mightily to find a solution to this problem. It is relevant for meteorological data quality control: for example, given a matrix of temperature data from a number of sensors, I want to know which days had constant consecutive values, and for how many days, so that I can flag the data as possibly faulty.
The temperature matrix is (number of days) x (number of stations), and I want an output matrix that is also (number of days) x (number of stations), with the consecutive values flagged as described above.
If you have a solution, please share it! Thank you.
For this kind of problem, I made my own utility function, runlength:
function RL = runlength(M)
% calculates length of runs of consecutive equal items along columns of M
% work along columns, so that you can use linear indexing
% find locations where items change along column
jumps = diff(M) ~= 0;
% add implicit jumps at start and end
ncol = size(jumps, 2);
jumps = [true(1, ncol); jumps; true(1, ncol)];
% find linear indices of starts and stops of runs
ijump = find(jumps);
nrow = size(jumps, 1);
istart = ijump(rem(ijump, nrow) ~= 0); % remove fake starts in last row
istop = ijump(rem(ijump, nrow) ~= 1); % remove fake stops in first row
rl = istop - istart;
assert(sum(rl) == numel(M))
% make matrix of 'derivative' of runlength
% don't need last row, but needs same size as jumps for indices to be valid
dRL = zeros(size(jumps));
dRL(istart) = rl;
dRL(istop) = dRL(istop) - rl;
% remove last row and 'integrate' to get runlength
RL = cumsum(dRL(1:end-1,:));
It only works along columns, since it uses linear indexing. Since you want to do something similar along rows, you need to transpose back and forth, so you could use it for your case like so:
>> original = [1 1 1;2 2 3; 1 2 3];
>> original = original.'; % transpose, since runlength works along columns
>> output = runlength(original);
>> output = output.'; % transpose back
>> output(output == 1) = 0; % see hitzg's comment
>> output
output =
3 3 3
2 2 0
0 0 0
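For checking the vectorized function on small inputs, a plain loop that scans each row for runs of equal values can be handy. A sketch (slower than runlength and not part of the answer above); like the transposed call, it works along rows and leaves runs of length 1 at 0:

```matlab
original = [1 1 1; 2 2 3; 1 2 3];
[nRows, nCols] = size(original);
output = zeros(nRows, nCols);
for r = 1:nRows
    c = 1;
    while c <= nCols
        e = c;                            % extend the run while values repeat
        while e < nCols && original(r, e+1) == original(r, c)
            e = e + 1;
        end
        runLen = e - c + 1;
        if runLen > 1                     % single values are not flagged
            output(r, c:e) = runLen;
        end
        c = e + 1;                        % continue after this run
    end
end
```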

Sensor time distributions from arrays and cell arrays

I have a cell array and a numeric array in MATLAB which are inherently linked. The numeric array (A) contains a series of times from several data sources, e.g. the time of each measurement. The array is measurements (rows) by sensors (columns). It is filled with -1 by default, since 0 is a valid time.
A = [ [ 100 110 -1 -1 ] ; ...
[ -1 200 180 -1 ] ; ...
[ -1 200 210 240 ] ; ...
[ 400 -1 -1 450 ] ];
The cell array contains the sensors, in chronological order, for each row of the numeric array. Each cell element contains a vector showing the sensors in the order they made the measurements.
C = { [1 2] [3 2] [2 3 4] [1 4]};
I want to see the distribution of times relative to each sensor e.g. what is the distribution of times from sensor 2/3/4 (when they are present), relative to sensor?
For example...
Sensor 1 is involved in the first and fourth measurements and the other detectors were +10 (100 -> 110) and +50 (400 -> 450). In this case I'm looking to return an array such as [10 50].
Sensor 2 is involved in the first three events, one of which is a three-way event. Here sensor 2 isn't always the first to trigger, so some values will be negative. In this case I'm looking to return [-10 -20 +10 +40].
Using the same logic sensor3 should return [20 -10 30] and sensor4 [-40 -30 -50].
I'm sure there should be an easy way to do this, but I can't get my head around it. Of course, the example I've given is a very simple one; normally I'm dealing with tens of sensors and hundreds of thousands of measurements, so looping over each and every column/row would take a long time, and would often yield little when only two (or so) of the sensors trigger in each measurement. For this reason I was hoping to use the elements in the cell array to access only the relevant elements of the numeric array.
Any thoughts?
If I have understood the problem correctly, it seems you don't need C for the output. Here's the code -
num_sensors = size(A,2)%// No. of sensors
A = A'; %//' The tracking goes row-wise, so transpose the input array
A(A==-1)=nan; %//set minus 1's to NaNs as excluding elements
out = cell(num_sensors,1); %// storage for output
for k1 = 1:num_sensors
%// Per sensor subtractions
per_sensor_subt = bsxfun(@minus,A,A(k1,:));
%// Set all elements of its own row to NaNs to exclude own subtractions
per_sensor_subt(k1,:)=nan;
%// Get all the non-nans that correspond to the valid output
out{k1} = per_sensor_subt(~isnan(per_sensor_subt));
end
Output -
>> celldisp(out)
out{1} =
10
50
out{2} =
-10
-20
10
40
out{3} =
20
-10
30
out{4} =
-40
-30
-50
As you have confirmed that the order of the output for each cell isn't important, you can employ a simplified approach that could be faster -
num_sensors = size(A,2)%// No. of sensors
A(A==-1)=nan; %//set minus 1's to NaNs as excluding elements
out = cell(num_sensors,1); %// storage for output
for k1 = 1:num_sensors
%// Per sensor subtractions
per_sensor_subt = bsxfun(@minus,A,A(:,k1));
%// Set all elements of its own column to NaNs to exclude own subtractions
per_sensor_subt(:,k1)=nan;
%// Get all the non-nans that correspond to the valid output
out{k1} = per_sensor_subt(~isnan(per_sensor_subt));
end
Fully vectorized solution if memory permits -
[m,n] = size(A)%// No. of measurements (m) and sensors (n)
A(A==-1)=nan; %//set minus 1's to NaNs as excluding elements
%// Per sensor subtractions: element (i,j,k) is A(i,j) - A(i,k)
per_sensor_subt = bsxfun(@minus,A,permute(A,[1 3 2]))
%// Set the self-subtractions (j == k) to NaNs to exclude them
own_idx = bsxfun(@plus,bsxfun(@plus,(1:m)',(0:n-1)*numel(A)),(0:n-1)*m);
per_sensor_subt(own_idx)=nan;
%// Linear and row-col-dim3 indices of valid subtractions
idx = find(~isnan(per_sensor_subt))
[x,y,z] = ind2sub(size(per_sensor_subt),idx)
%// Get per sensor output
out = arrayfun(@(n) per_sensor_subt(idx(z==n)),1:n,'un',0)
If you would like to calculate C, use this approach -
%// Needs the original A (with -1, not NaN, marking missing times)
%// Sort A row-wise
[sortedA,sorted_idx] = sort(A,2)
%// Set all invalid indices to zeros, so that later on we can use `nonzeros`
%// to extract out the valid indices
valid_sorted_idx = sorted_idx.*(sortedA~=-1)
%// Convert to a cell array
valid_sorted_idx_cell = mat2cell(valid_sorted_idx,ones(1,size(A,1)),size(A,2))
%// Extract the valid ones(nonzero indices) for the final output, C
C = cellfun(@(x) nonzeros(x), valid_sorted_idx_cell,'un',0)
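The question hoped to use C to touch only the entries of A that actually hold times. A loop over measurements that does exactly that is another option. This is a sketch, not part of the answer above; growing out{...} inside the loop will be slow for very large inputs unless you preallocate (for example by first counting entries per sensor):

```matlab
A = [100 110 -1 -1; -1 200 180 -1; -1 200 210 240; 400 -1 -1 450];
C = {[1 2] [3 2] [2 3 4] [1 4]};
numSensors = size(A, 2);
out = cell(numSensors, 1);
for i = 1:numel(C)
    s = C{i};                            % sensors that fired in measurement i
    t = A(i, s);                         % their times
    for k = 1:numel(s)
        others = t([1:k-1, k+1:end]);    % times of the other sensors
        out{s(k)} = [out{s(k)}, others - t(k)];  % other minus this sensor
    end
end
```

Each out{k} holds the times of the other sensors relative to sensor k, in measurement order.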

Matlab find Series of First Negative Number

With a matrix of numbers in Matlab, how would you find the first negative number after a series of positive numbers?
So far, the only answer I could come up with was to write a loop to check for the first negative number, then record it, then look for the first positive number, and truncate the array there, then start over. Is there a vectorized way to do this?
e.g., I have x = [ -1 -5 -2 3 4 8 -2 -3 1 9], and I want this function or script to give me an array of y = [1 7].
One option is:
find(diff(sign([1 x]))<0)
That is, find the locations in x where the sign decreases between successive elements; a 1 is prepended to x to handle the case where the first element is already negative.
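The sign-based one-liner treats 0 as a change of sign (sign(0) is 0), so a zero that follows positive values is also reported. If only strictly negative values should count, a logical mask avoids that, and it simply returns an empty result when x has no negative values. A sketch comparing the two:

```matlab
x = [-1 -5 -2 3 4 8 -2 -3 1 9];
y = find(diff([false, x < 0]) == 1);     % positions where a negative run begins

x2 = [3 0 -1];
ySign = find(diff(sign([1 x2])) < 0);    % also reports the zero at position 2
yMask = find(diff([false, x2 < 0]) == 1);

x3 = [3 4 8];                            % no negative values at all
yNone = find(diff([false, x3 < 0]) == 1);
```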
This is a rather homemade solution; check it out:
x = [ -1 -5 -2 3 4 8 -2 -3 1 9]
neg_idx = find(x < 0) % // negative values
z = [0 diff(neg_idx)] % // increments among indices of successive values;
% // consecutive indices return a difference of 1 whereas
% // non consecutive return a value greater than 1.
% // because the first element must be distinguished
% // we put the zero in front
id = find(z ~= 1) % // Every time there is a jump, a non consecutive neg.
% // value appears
% // thus the solution is in
y = neg_idx(id)
y =
1 7
If neg_idx is empty (i.e. x has no negative values), you will get an "Index exceeds matrix dimensions" error, although this case is easy to check for.
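Guarding the homemade approach against that case only takes an isempty check. A sketch running it on the example and on a hypothetical input with no negative values:

```matlab
inputs = {[-1 -5 -2 3 4 8 -2 -3 1 9], [3 4 8 1 9]};  % second input has no negatives
results = cell(size(inputs));
for n = 1:numel(inputs)
    x = inputs{n};
    neg_idx = find(x < 0);
    if isempty(neg_idx)
        y = [];                          % nothing to report
    else
        z = [0 diff(neg_idx)];           % 1 marks a continuing negative run
        y = neg_idx(z ~= 1);             % starts of negative runs
    end
    results{n} = y;
end
```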